Opened 2 years ago

Closed 2 years ago

#5574 closed enhancement (complete)

make the db lost callback a member of the host manager

Reported by: fdupont Owned by: fdupont
Priority: medium Milestone: Kea1.4
Component: database-all Version: git
Keywords: Cc:
CVSS Scoring: Parent Tickets:
Sensitive: no Defect Severity: N/A
Sub-Project: DHCP Feature Depending on Ticket:
Estimated Difficulty: 0 Add Hours to Ticket: 0
Total Hours: 0 Internal?: no

Description

Required to merge #5533 aka host cache to current master.

Subtickets

Change History (16)

comment:1 follow-up: Changed 2 years ago by fdupont

Facts:

  • database reconnect adds an optional reconnect callback named db_lost_callback to HostDataSource constructor.
  • this callback is in fact a parameter of createManager passed down.
  • trac5533a includes the changes for multiple alternate host data sources which are orthogonal to reconnect so a merge of trac5533a into master branch gives a lot of conflicts pretty hard to solve manually.

Proposal:

  • not solve conflicts manually at merge time without a review.
  • instead of passing the reconnect callback down makes it a host manager service.
  • the callback closure parameter of createManager will be registered.
  • database code which wants to reconnect will call the host manager reconnect service.
  • the ticket branch can be reviewed before a far easier merge of trac5533a.

comment:2 in reply to: ↑ 1 ; follow-up: Changed 2 years ago by tmark

Replying to fdupont:

Facts:

  • database reconnect adds an optional reconnect callback named db_lost_callback to HostDataSource constructor.
  • this callback is in fact a parameter of createManager passed down.
  • trac5533a includes the changes for multiple alternate host data sources which are orthogonal to reconnect so a merge of trac5533a into master branch gives a lot of conflicts pretty hard to solve manually.

Proposal:

  • not solve conflicts manually at merge time without a review.

Agreed.

  • instead of passing the reconnect callback down makes it a host manager service.

There is no reconnect callback. There is dbReconnect() method on ControlledDhcpv4/6Srv, which
is initially called by ControlledDhcpv4/6Srv::dbLostCallback() IF recovery is enabled.
This method invokes CfgMgr::createManagers(); and reschedules calls to itself creation fails.

What exactly would this reconnect service do?

  • the callback closure parameter of createManager will be registered.

Registered where?

  • database code which wants to reconnect will call the host manager reconnect service.

What "database code" are you referring to? Where would calls to this reconnect service be, precisely?

  • the ticket branch can be reviewed before a far easier merge of trac5533a.

Currently the mechanism is as follows:

The Controller passes it's dbLostCallback method to CfgDbAccesss::createManagers()
This is propagated all the way down to DatabaseConnection? derivations used by
both LeaseMgrs? and HostDataSources?.

When a DatabaseConnection? detects an unrecoverable error it invokes this
callback and then throws an exception so the code which invoked the failed DB
operation can still fail, allowing the current server activity, such as responding
to a client query to fail gracefully.

Meanwhile, the Controller will have suspended DHCP service and begun
reconnect attempts, assuming they are enabled, by invoking its dbReconnect()
function. This function calls CfgDbAccess::createManagers() and either restores
DHCP service upon success, or schedules a call to itself upon failure unless
retry attempts have been exhausted.

The concept is that the server cannot meaningfully function unless all of its
data sources are online. Thus, the server must stop DHCP service while it
attempts to reconnect, and then restore it once it has.

How do your changes fit into or alter this basic scheme?

I'm guessing you don't want to simply recreate host data sources as this would
wipe out existing cache content. Do we need a reconnectManagers() service on
CfgDbAccess? to distinguish that from createManagers()?



comment:3 in reply to: ↑ 2 Changed 2 years ago by fdupont

Replying to tmark:

Replying to fdupont:

Facts:

  • database reconnect adds an optional reconnect callback named db_lost_callback to HostDataSource constructor.
  • this callback is in fact a parameter of createManager passed down.
  • trac5533a includes the changes for multiple alternate host data sources which are orthogonal to reconnect so a merge of trac5533a into master branch gives a lot of conflicts pretty hard to solve manually.

Proposal:

  • not solve conflicts manually at merge time without a review.

Agreed.

  • instead of passing the reconnect callback down makes it a host manager service.

There is no reconnect callback.

=> I meant the database lost callback.

There is dbReconnect() method on ControlledDhcpv4/6Srv, which
is initially called by ControlledDhcpv4/6Srv::dbLostCallback() IF recovery is enabled.
This method invokes CfgMgr::createManagers(); and reschedules calls to itself creation fails.

What exactly would this reconnect service do?

=> the same that the the current dbLostCallback. What I want to change (and only change)
is the way it is called: instead of getting the callback as a constructor argument use a
host manager entry point.

  • the callback closure parameter of createManager will be registered.

Registered where?

=> in the host manager.

  • database code which wants to reconnect will call the host manager reconnect service.

What "database code" are you referring to? Where would calls to this reconnect service be, precisely?

=> s/reconnect/lost/ (I abuse the fact that the lost callback reconnects).

How do your changes fit into or alter this basic scheme?

=> no, I want only to change the way the service is invoked.

I'm guessing you don't want to simply recreate host data sources as this would
wipe out existing cache content. Do we need a reconnectManagers() service on
CfgDbAccess? to distinguish that from createManagers()?

=> ideal we should do this, i.e., only reconnect lost connections. But it does not
matter today because host cache is not used with real databases. I believe we should
keep the things simple until we have a good reason to address that. In fact
I think I only add the type of the caller with existing parameters (retries/timeout).
Makes sense as we can have multiple backends but it will be used only for
debug logs.

comment:4 Changed 2 years ago by tmark

I'm not entirely clear on the details but it seems like your on a workable path. I'd suggest you go ahead with your plans and we'll review it prior to merge.

comment:5 Changed 2 years ago by fdupont

The whole goal of this ticket is to make you able to review it before merge.

comment:6 Changed 2 years ago by fdupont

  • Component changed from Unclassified to database-all
  • Milestone changed from Kea-proposed to Kea1.4
  • Owner set to fdupont
  • Status changed from new to accepted

comment:7 follow-up: Changed 2 years ago by fdupont

Did the library code. Finally I use a static member of DataConnection?
(makes sense as it is lease and host).
Two questions:

  • the static member is named `db_lost_callback': should it be a Camel name?
  • I added a backend_name in the ReconnectCtl class. Should it be type as it is the type which goes into it?

I am working on src/bin but someone forgot to regenerate flex file and my compile does not like register...

comment:8 in reply to: ↑ 7 ; follow-up: Changed 2 years ago by tmark

Replying to fdupont:

Did the library code. Finally I use a static member of DataConnection?
(makes sense as it is lease and host).
Two questions:

  • the static member is named `db_lost_callback': should it be a Camel name?

I suppose that's debatable as it's a function pointer, one could argue it should be camel back,
but I believe our convention is to name as we would data members, which would be underscored.

  • I added a backend_name in the ReconnectCtl class. Should it be type as it is the type which goes into it?

Do you mean type as in "memfile, mysql, postgresql..." or type as in "lease or host"?

I am working on src/bin but someone forgot to regenerate flex file and my compile does not like register...

comment:9 in reply to: ↑ 8 ; follow-up: Changed 2 years ago by fdupont

Replying to tmark:

Replying to fdupont:

Did the library code. Finally I use a static member of DataConnection?
(makes sense as it is lease and host).
Two questions:

  • the static member is named `db_lost_callback': should it be a Camel name?

I suppose that's debatable as it's a function pointer, one could argue it should be camel back,
but I believe our convention is to name as we would data members, which would be underscored.

=> it is a class member. There is not enough to get the convention. BTW as class vs instance member I did not put a final _.

  • I added a backend_name in the ReconnectCtl class. Should it be type as it is the type which goes into it?

Do you mean type as in "memfile, mysql, postgresql..." or type as in "lease or host"?

=> the first, If one day we had the second (useful too as database setting can be very different) I propose to name it "kind".

I am working on src/bin but someone forgot to regenerate flex file and my compile does not like register...

=> I found a duplicate entry I removed in master.

I can't find the unit tests exercising the lost callback. I believe I'll dig directly in there reconnect ticket. Perhaps it was not yet merged?
Note with my proposal the change for database code is simpler: just invoke the callback (same API), no need to pass down it.

comment:10 in reply to: ↑ 9 ; follow-up: Changed 2 years ago by tmark

Replying to fdupont:

Replying to tmark:

Replying to fdupont:

Did the library code. Finally I use a static member of DataConnection?
(makes sense as it is lease and host).
Two questions:

  • the static member is named `db_lost_callback': should it be a Camel name?

I suppose that's debatable as it's a function pointer, one could argue it should be camel back,
but I believe our convention is to name as we would data members, which would be underscored.

=> it is a class member. There is not enough to get the convention. BTW as class vs instance member I did not put a final _.

  • I added a backend_name in the ReconnectCtl class. Should it be type as it is the type which goes into it?

Do you mean type as in "memfile, mysql, postgresql..." or type as in "lease or host"?

=> the first, If one day we had the second (useful too as database setting can be very different) I propose to name it "kind".

I'd suggest "type" as that's what the value is named within database parameters and used within the lease and host factories.

I am working on src/bin but someone forgot to regenerate flex file and my compile does not like register...

=> I found a duplicate entry I removed in master.

I can't find the unit tests exercising the lost callback. I believe I'll dig directly in there reconnect ticket. Perhaps it was not yet merged?
Note with my proposal the change for database code is simpler: just invoke the callback (same API), no need to pass down it.

The unit tests were added under 5556, which has not yet been merged, it's stuck waiting for review. They don't verify actual reconnect, only the callback's invocation when connectivity is actually lost.

comment:11 in reply to: ↑ 10 Changed 2 years ago by fdupont

Replying to tmark:

I'd suggest "type" as that's what the value is named within database parameters and used within the lease and host factories.

=> change done.

The unit tests were added under 5556, which has not yet been merged, it's stuck waiting for review. They don't verify actual reconnect, only the callback's invocation when connectivity is actually lost.

=> got the explanation. BTW now the question is to merge 5574 first and update 5556 so it can be reviewed and merged easily, or use another order.

comment:12 Changed 2 years ago by fdupont

  • Owner changed from fdupont to tmark
  • Status changed from accepted to reviewing

For final review (if needed).

comment:13 follow-up: Changed 2 years ago by tmark

I am not thrilled with the DatabaseConnection::db_lost_callback being a static. I can see the argument for it but I can also make an argument against it. By making it static, all DatabaseConnection? derivations in a given process will share the same callback or lack thereof. Making it static is simpler in that it doesn't have to be passed down, but it also means a loss of flexibility. Should we ever want to have different callbacks for different connections we cannot. Imagine a server running a worker thread with a secondary connection to the lease database for doing statistics work in the background. We might or might not want that thread's connection to have the same callback. To that extent by making it static, you are tying our hands.

comment:14 Changed 2 years ago by tmark

  • Owner changed from tmark to fdupont

comment:15 in reply to: ↑ 13 Changed 2 years ago by fdupont

Replying to tmark:

I am not thrilled with the DatabaseConnection::db_lost_callback being a static. I can see the argument for it but I can also make an argument against it. By making it static, all DatabaseConnection? derivations in a given process will share the same callback or lack thereof. Making it static is simpler in that it doesn't have to be passed down, but it also means a loss of flexibility. Should we ever want to have different callbacks for different connections we cannot. Imagine a server running a worker thread with a secondary connection to the lease database for doing statistics work in the background. We might or might not want that thread's connection to have the same callback. To that extent by making it static, you are tying our hands.

=> yes but at the other hand the lost callback is per server so it is not a problem and greatly simplify things. Note if we want to have different processing it can be enough to add new parameters (e.g. like the backend type) via DatabaseConnection? and ReconnectCtl? instances. So IMHO we do not lose critical (including in future) functionality.

comment:16 Changed 2 years ago by fdupont

  • Resolution set to complete
  • Status changed from reviewing to closed

Merged. Closing even the subject is not fully closed...

Note: See TracTickets for help on using tickets.