Opened 2 years ago

Closed 2 years ago

Last modified 2 years ago

#5322 closed task (complete)

Subnet manipulation design

Reported by: tomek Owned by: UnAssigned
Priority: medium Milestone: Kea1.3 beta
Component: configuration Version: git
Keywords: Cc:
CVSS Scoring: Parent Tickets:
Sensitive: no Defect Severity: N/A
Sub-Project: DHCP Feature Depending on Ticket:
Estimated Difficulty: 0 Add Hours to Ticket: 0
Total Hours: 0 Internal?: no

Description (last modified by tomek)

#5284 defined the requirements and API for the subnet manipulation commands.

Marcin proposed the following:

In case of host reservations and leases we have implemented entities
that provide an API to communicate with the databases: HostMgr and
LeaseMgr. I'd suggest that, now that we are writing code that adds and
modifies certain bits of the configuration, it may be a good time to
create such an entity for the Kea configuration. We can't call it CfgMgr
because this name is already taken. We can hold a poll to pick a good
name. Anyway, this "manager" would provide an API to manipulate the
configuration and would eventually be able to talk to MySQL, Postgres or
Cassandra. For now it would only talk to the in-memory configuration storage
held in CfgMgr.
 
This entity would be a friend of the CfgMgr and thus it would have
access to the restricted functions, such as those that return non-const
pointers to the "committed" configuration. That way the manager would be
the only class allowed to modify committed configuration directly. Every
other class would not be a friend, thus would only be able to read the
configuration as it does now.
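
The friend-based access restriction proposed above could be sketched roughly as follows. This is only an illustration under stated assumptions: `SrvConfig` and `CfgMgr` here are simplified stand-ins rather than the real Kea classes, and `ConfigBackendMgr`, `getCurrentCfgMutable()`, `addSubnet()` and `delSubnet()` are hypothetical names chosen for the example.

```cpp
#include <cassert>
#include <map>
#include <memory>
#include <string>

// Simplified stand-in for the committed configuration (not the real Kea type).
struct SrvConfig {
    std::map<int, std::string> subnets;  // subnet id -> prefix
};

class ConfigBackendMgr;  // hypothetical name for the proposed "manager"

class CfgMgr {
public:
    // Read-only access, available to every class.
    std::shared_ptr<const SrvConfig> getCurrentCfg() const { return current_; }

private:
    // Restricted: non-const pointer to the committed configuration.
    std::shared_ptr<SrvConfig> getCurrentCfgMutable() { return current_; }

    std::shared_ptr<SrvConfig> current_ = std::make_shared<SrvConfig>();

    // The manager is the only class allowed to use the restricted accessor.
    friend class ConfigBackendMgr;
};

class ConfigBackendMgr {
public:
    explicit ConfigBackendMgr(CfgMgr& cfg_mgr) : cfg_mgr_(cfg_mgr) {}

    // Returns false if a subnet with this id already exists.
    bool addSubnet(int id, const std::string& prefix) {
        assert(!prefix.empty());
        return cfg_mgr_.getCurrentCfgMutable()->subnets.emplace(id, prefix).second;
    }

    // Returns false if there is no subnet with this id.
    bool delSubnet(int id) {
        return cfg_mgr_.getCurrentCfgMutable()->subnets.erase(id) == 1;
    }

private:
    CfgMgr& cfg_mgr_;
};
```

Any other class calling `getCurrentCfgMutable()` would fail to compile, which is the property the proposal relies on.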

This ticket covers the general design with the caveat that only the parts needed for subnet manipulation are expected to be detailed. No significant amount of time should be spent on other aspects.

The following existing documents are relevant: SubnetCommands and Commands.

Depending on the approach, there should be some way to check whether the configuration is up to date, has been changed or tampered with. The following tickets could be considered here: #5299 (timestamp of the last configuration change), #5300 (configuration is versioned) or #5301 (configuration digest). Other approaches to the problem are possible as well.
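
To make the three options concrete, here is a minimal sketch that keeps a timestamp, a version counter and a digest side by side. All names (`ConfigStamp`, `TrackedConfig`, `isCurrent()`) are hypothetical, and `std::hash` is used only as a placeholder digest: it is not cryptographic, so it can flag accidental changes but would not be adequate for detecting tampering.

```cpp
#include <cassert>
#include <chrono>
#include <cstddef>
#include <cstdint>
#include <functional>
#include <string>

// Hypothetical change-tracking record combining the three approaches.
struct ConfigStamp {
    std::uint64_t version = 0;                            // versioned configuration
    std::chrono::system_clock::time_point last_change{};  // time of the last change
    std::size_t digest = 0;                               // placeholder digest (std::hash)
};

class TrackedConfig {
public:
    // Replaces the serialized configuration and refreshes all three markers.
    void update(const std::string& serialized) {
        text_ = serialized;
        ++stamp_.version;
        stamp_.last_change = std::chrono::system_clock::now();
        stamp_.digest = std::hash<std::string>{}(text_);
    }

    const ConfigStamp& stamp() const { return stamp_; }

    // True if the caller's earlier view still matches the current configuration.
    bool isCurrent(const ConfigStamp& seen) const {
        return seen.version == stamp_.version && seen.digest == stamp_.digest;
    }

private:
    std::string text_;
    ConfigStamp stamp_;
};
```

A controlling client could remember the stamp it last saw and compare it before sending a change.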

Change History (11)

comment:1 Changed 2 years ago by tomek

  • Milestone changed from Kea-proposed to Kea1.3

Accepting as discussed on the 2017-06-22 call.

comment:2 Changed 2 years ago by tomek

  • Description modified (diff)

comment:3 Changed 2 years ago by marcin

  • Owner set to marcin
  • Status changed from new to accepted

comment:4 Changed 2 years ago by marcin

  • Owner changed from marcin to UnAssigned
  • Status changed from accepted to reviewing

I have created the first version of the design here: http://kea.isc.org/wiki/SubnetCommandsDesign

Please read and comment.

comment:5 follow-up: Changed 2 years ago by fdupont

Using threads to handle concurrent read and write changes is not a valid solution: to keep consistency, write changes require locking the impacted data structures, and in the end we could finish with only one thread able to do useful work, all the others waiting for something to do, including for the lock to be released: no parallelism plus lock complexity, i.e. no benefit at all.
Note that we can't use things like RCU because changes must be committed before returning the status to the caller. So the advantage of being single-threaded is that there is no need to maintain consistency at a fine-grained level, only between phases of a big update interleaved with the service activity (cf. the AFTR design).

comment:6 Changed 2 years ago by fdupont

About http://kea.isc.org/wiki/SubnetCommands I believe the update operation is done by a delete followed by an add, isn't it?
As it is not atomic, it requires a way to roll back the delete, but an easy (and expensive) way to do this is just to write the config and to reload it if something goes wrong...
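
For comparison, a finer-grained rollback could look roughly like the sketch below: the old entry is saved before the delete and restored if the add fails. This assumes the update really is a delete followed by an add, as asked above. `SubnetMap`, `isValidPrefix()` and `updateSubnet()` are hypothetical names, and the validity check is only a stand-in for whatever could make the add fail.

```cpp
#include <cassert>
#include <map>
#include <string>

using SubnetMap = std::map<int, std::string>;  // subnet id -> prefix

// Hypothetical check standing in for any reason the add step could fail.
bool isValidPrefix(const std::string& prefix) {
    return prefix.find('/') != std::string::npos;
}

// Update implemented as delete + add, with a rollback of the delete on failure.
bool updateSubnet(SubnetMap& subnets, int id, const std::string& new_prefix) {
    auto it = subnets.find(id);
    if (it == subnets.end()) {
        return false;  // nothing to update
    }
    const std::string saved = it->second;  // kept so the delete can be undone
    subnets.erase(it);                     // step 1: delete
    if (!isValidPrefix(new_prefix)) {      // step 2: the add fails
        subnets.emplace(id, saved);        // rollback: restore the old entry
        return false;
    }
    subnets.emplace(id, new_prefix);       // step 2: the add succeeds
    return true;
}
```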

comment:7 in reply to: ↑ 5 ; follow-up: Changed 2 years ago by marcin

Replying to fdupont:

Using threads to handle concurrent read and write changes is not a valid solution: to keep consistency, write changes require locking the impacted data structures, and in the end we could finish with only one thread able to do useful work, all the others waiting for something to do, including for the lock to be released: no parallelism plus lock complexity, i.e. no benefit at all.

There are tradeoffs obviously and I am aware of all the issues. The major problem to solve is how to guarantee minimal delays in response to the controlling client, how to guarantee that the DHCP service remains responsive while we deal with large configuration rewrites etc. You don't seem to propose any solution for these issues. Do you?

Note that we can't use things like RCU because changes must be committed before returning the status to the caller. So the advantage of being single-threaded is that there is no need to maintain consistency at a fine-grained level, only between phases of a big update interleaved with the service activity (cf. the AFTR design).

Whether changes have to be committed before we respond is a question. Note that all current proposals I have seen propose to not commit any configuration changes until the controlling client performs config-get and then config-set. I am not terribly pleased with this solution because it generates additional traffic over the wire without any real benefit. But, it also indicates that we're somewhat open to accepting inconsistent states until the changes are written to disk. I think I am ok with allowing inconsistent states (perhaps enabled by configuration parameter) but without unnecessarily generating additional traffic. Always writing changes to disk before responding means serious performance implications in my opinion and, again, I don't see how you propose to deal with this.

One way to slightly mitigate the performance issues might be to clone the current configuration as a background task after this configuration has been committed and the server has started using it. Then, when someone wants to apply a partial change to the existing configuration, he will be given a pointer to this cloned configuration and will modify it, and then the server will start using this new configuration. If the client comes again before the cloned configuration has been created, he will be blocked waiting for it to be ready, but this is not significantly worse than doing everything single-threaded. The other issue is that we don't have a way to clone/deep copy an existing configuration right now.
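
A rough sketch of the clone-in-background idea, under several assumptions: `Config` is a toy type with a trivial deep copy (which, as noted, Kea does not have today), `ConfigSwitcher` and its members are hypothetical names, and `apply()` and `current()` are assumed to be called from a single server thread, with only the cloning done in the background.

```cpp
#include <cassert>
#include <functional>
#include <future>
#include <map>
#include <memory>
#include <string>
#include <utility>

// Toy configuration; the implicit copy constructor plays the role of a deep copy.
struct Config {
    std::map<int, std::string> subnets;
};

class ConfigSwitcher {
public:
    ConfigSwitcher() : current_(std::make_shared<const Config>()) { startClone(); }

    // Configuration currently used by the server (read-only).
    std::shared_ptr<const Config> current() const { return current_; }

    // Applies a partial change to the pre-made clone, then switches to it.
    void apply(const std::function<void(Config&)>& change) {
        std::unique_ptr<Config> next = spare_.get();  // blocks if the clone is not ready
        change(*next);
        current_ = std::shared_ptr<const Config>(std::move(next));
        startClone();  // prepare the clone for the next change
    }

private:
    // Clones the committed configuration in a background task.
    void startClone() {
        std::shared_ptr<const Config> snapshot = current_;
        spare_ = std::async(std::launch::async, [snapshot] {
            return std::make_unique<Config>(*snapshot);
        });
    }

    std::shared_ptr<const Config> current_;
    std::future<std::unique_ptr<Config>> spare_;
};
```

Because the committed configuration is never modified after the switch, the background task can read it without locking. Holders of the old pointer keep a consistent view until they release it.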

comment:8 in reply to: ↑ 7 Changed 2 years ago by fdupont

Replying to marcin:

Replying to fdupont:
You don't seem to propose any solution for these issues. Do you?

=> indirectly with the AFTR reference (which has similar constraints so a solution we can take ideas from).

Whether changes have to be committed before we respond is a question. Note that all current proposals I have seen propose to not commit any configuration changes until the controlling client performs config-get and then config-set. I am not terribly pleased with this solution because it generates additional traffic over the wire without any real benefit.

=> I understand the config-set, as it implements brute-force rollback, but not the config-get. Do you mean the changes are made in the staging config and the result is extracted and applied as a whole? Note that it makes sense, but if the first (apply changes to staging) and the second (staging get) phases can be done in parallel, this implies a big stop to switch configuration at the config-set, i.e., it reuses the current code but it does not improve its main problem...
BTW, when I talked about RCU it was about fine-grained RCUs vs locks. Here the method you describe is a kind of global RCU.

But, it also indicates that we're somewhat open to accepting inconsistent states until the changes are written to disk. I think I am ok with allowing inconsistent states (perhaps enabled by configuration parameter) but without unnecessarily generating additional traffic. Always writing changes to disk before responding means serious performance implications in my opinion and, again, I don't see how you propose to deal with this.

=> instead of writing the whole config and then resuming the service, we need to use a journal with incremental changes and consolidate a whole config copy by parts, or with another tool, or...
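
The journal idea might be sketched as follows. This is an in-memory illustration only; a real journal would be appended to disk before responding. `JournalEntry`, `Journal`, `record()` and `consolidate()` are hypothetical names.

```cpp
#include <cassert>
#include <cstddef>
#include <map>
#include <string>
#include <utility>
#include <vector>

using SubnetMap = std::map<int, std::string>;  // subnet id -> prefix

// One incremental change to the configuration.
struct JournalEntry {
    enum class Op { Add, Del } op;
    int id;
    std::string prefix;  // unused for Del
};

class Journal {
public:
    // Cheap per-command step: append the change instead of rewriting everything.
    void record(JournalEntry entry) { entries_.push_back(std::move(entry)); }

    // Consolidation: replay the journal on top of the last full snapshot.
    // This could run later, in parts, or in a separate tool.
    SubnetMap consolidate(SubnetMap snapshot) const {
        for (const auto& entry : entries_) {
            if (entry.op == JournalEntry::Op::Add) {
                snapshot[entry.id] = entry.prefix;
            } else {
                snapshot.erase(entry.id);
            }
        }
        return snapshot;
    }

    std::size_t size() const { return entries_.size(); }

private:
    std::vector<JournalEntry> entries_;
};
```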

One way to slightly mitigate the performance issues might be to clone the current configuration as a background task after this configuration has been committed and the server has started using it. Then, when someone wants to apply a partial change to the existing configuration, he will be given a pointer to this cloned configuration and will modify it, and then the server will start using this new configuration. If the client comes again before the cloned configuration has been created, he will be blocked waiting for it to be ready, but this is not significantly worse than doing everything single-threaded. The other issue is that we don't have a way to clone/deep copy an existing configuration right now.

=> this supposes that the deep copy (*) and the configuration switch (**) are both fast.
(*) this is related to trace-and-copy (aka two-space) garbage collectors.
(**) this might not be the case when the new configuration is a deep copy, i.e., this means nothing points to the previous configuration at switch time.

comment:9 Changed 2 years ago by fdupont

An idea: if we want to work on a staging config in another task/process, it should be easier to start from a config in the ElementPtr format, i.e., to call toElement() first. This gives a free backup too...

comment:10 Changed 2 years ago by tomek

  • Resolution set to complete
  • Status changed from reviewing to closed

The code for subnet manipulation was implemented a couple of weeks ago. I have updated the design document slightly. Sure, it could be improved further, but we'll go with what we have now. After discussing the matter briefly on jabber, Thomas and Tomek agree that this ticket can be closed.

If you disagree with that, please reopen.

comment:11 Changed 2 years ago by vicky

  • Milestone changed from Kea1.3 to Kea1.3 beta

Milestone renamed
