Opened 5 years ago

Closed 5 years ago

Last modified 4 years ago

#3973 closed task (complete)

Implement lease reclamation routines for DHCPv4 and DHCPv6

Reported by: marcin Owned by: marcin
Priority: high Milestone: Kea1.0-beta
Component: dhcp Version: git
Keywords: expiration Cc:
CVSS Scoring: Parent Tickets:
Sensitive: no Defect Severity: N/A
Sub-Project: DHCP Feature Depending on Ticket:
Estimated Difficulty: 0 Add Hours to Ticket: 0
Total Hours: 0 Internal?: no

Description

See the Design for the details.

Subtickets

Change History (8)

comment:1 Changed 5 years ago by marcin

  • Keywords expiration added

comment:2 Changed 5 years ago by marcin

  • Owner set to UnAssigned
  • Status changed from new to reviewing

Implemented leases reclamation routines for DHCPv4 and DHCPv6. I also added new unit tests for testing different scenarios, e.g. DDNS and no DDNS, statistics, limit on the number of leases reclaimed in a single shot etc.

What is still to be implemented is the execution of lease4_expire and lease6_expire hooks but this is a matter for #3972. Note that implementation of the hooks would allow me to easier test the timeout for the execution of the lease reclamation routines. So for now the timeout parameter is not covered in the unit tests, but appropriate todos have been added.

The lease reclamation routines are not used by the server yet, so I think it is not necessary to add a changelog for it.

comment:3 Changed 5 years ago by tomek

  • Owner changed from UnAssigned to tomek

comment:4 follow-up: Changed 5 years ago by tomek

  • Owner changed from tomek to marcin

alloc_engine_messages.mes
ALLOC_ENGINE_LEASE_RECLAIMED - The text should explain what 'reclaim'
means. A simple explanation like "The lease is now available for
assignment." would do the trick.

ALLOC_ENGINE_REMOVAL_NCR_FAILEDALLOC_ENGINE_REMOVAL_NCR_FAILED
Please don't use D2. It's our internal nickname.

ALLOC_ENGINE_V4_LEASE_RECLAMATION_FAILED
The text says "The error may be triggered in the lease expiration
hook". Do you mean the skip flag? If that's the case, I think
the wording should be tweaked here. Setting the flag should not
cause the server to log anything on error. Neither the actions
conducted as results should be referred to as failures. I'd like
to avoid a situation, when hook implementer uses the API correctly
and the server starts producing errors described in the doc as
failures.

ALLOC_ENGINE_V4_LEASES_RECLAMATION_COMPLETE mentions that the
maximum number of leases reclaimed is configurable, but does not
mention the parameter name.

% ALLOC_ENGINE_V4_LEASES_RECLAMATION_START
(limit = %1 leases, timeout = %2 seconds). It may be easier to
understand if you reword it as (limit = %1 leases or %2 seconds).

The last 3 comments also apply to their v6 equivalents.

alloc_engine.h

template<typename LeasePtrType?, typename IdentifierType?>
Comment for queueRemovalNameChangeRequest describes Identifer
template parameter, but the actual parameter is called IdentifierType?.

In "For DHCPv4 it will be hardware address, client
identifier." I would substitute coma with "or".

alloc_engine.cc
reclaimExpiredLeases6, in line 1311, lease->duid_ is dereferenced.
We had this assumption that DUID is always set, but after adding
declined leases (leases that don't have any client information
associated with them) this may or may not be true. We have 2 options
here:

  1. assume that DUID is always set, but may sometimes represent an

empty DUID. We'd need to loosen the check in DUID for non-emptiness.
We'd need to make sure that all backends initialize DUID, even if
there's no data in the database/lease file.

  1. assume that DUID may be null. This would require reviewing current

code and finding all instances where we assumed that DUID is always
set.

Due to the difficulty of conducting option 2, my personal opinion is
that 1. is the way to go. If you agree with that, then there's nothing
to do in that context in this ticket, but probably a separate ticket
needs to be created for checking that the backends really initialize
empty DUIDs.

Nevertheless, I'd add safety check before dereferencing lease->duid_.

I think the @todo for checking timeout is misplaced. It should appear
somewhere near the end of the loop. That applies to both 4 and 6
versions.

You did update statistics, but there are no doc updates. See
doc/guide/dhcp{4,6}-srv.xml. Search for dhcp{4,6}-stats section.

alloc_engine_expiration_unittest.cc
Let me say these are very thorough and well designed tests. Excellent work!
Now, on to the details...

The description of ExpirationAllocEngineTest? should mention statistics
in its description.

testStatistics description should document return value.

Return comment for leaseReclaimed should be updated, as the method
checks more than just the lease state.

The code checks whether NCR is created for a given lease, but I
couldn't find a check if it is of proper type. Please either
add a check or @todo to address that in the future.


Code compiled and unit-tests passed on Ubuntu 15.04 x64.

comment:5 in reply to: ↑ 4 ; follow-up: Changed 5 years ago by marcin

  • Owner changed from marcin to tomek

Replying to tomek:

alloc_engine_messages.mes
ALLOC_ENGINE_LEASE_RECLAIMED - The text should explain what 'reclaim'
means. A simple explanation like "The lease is now available for
assignment." would do the trick.

Added the sentence.

ALLOC_ENGINE_REMOVAL_NCR_FAILEDALLOC_ENGINE_REMOVAL_NCR_FAILED
Please don't use D2. It's our internal nickname.

I was always confused whether it is internal or external. If this is internal, I don't know what the external name is. I changed it to DHCP DDNS.

ALLOC_ENGINE_V4_LEASE_RECLAMATION_FAILED
The text says "The error may be triggered in the lease expiration
hook". Do you mean the skip flag? If that's the case, I think
the wording should be tweaked here. Setting the flag should not
cause the server to log anything on error. Neither the actions
conducted as results should be referred to as failures. I'd like
to avoid a situation, when hook implementer uses the API correctly
and the server starts producing errors described in the doc as
failures.

No, it is not skip. It is an exception thrown from the hook. The hook should basically handle its own exception and log errors, but if it doesn't for any reason we must catch it here to not fall over. Skip will be handled in a separate ticket.

ALLOC_ENGINE_V4_LEASES_RECLAMATION_COMPLETE mentions that the
maximum number of leases reclaimed is configurable, but does not
mention the parameter name.

Ok, I have added parameters by their name, but they haven't been implemented yet. However, this shouldn't be a problem because it will be added pretty soon and we don't claim that everything on master is releasable.

% ALLOC_ENGINE_V4_LEASES_RECLAMATION_START
(limit = %1 leases, timeout = %2 seconds). It may be easier to
understand if you reword it as (limit = %1 leases or %2 seconds).

Modifed as suggested.

The last 3 comments also apply to their v6 equivalents.

Ok.

alloc_engine.h

template<typename LeasePtrType?, typename IdentifierType?>
Comment for queueRemovalNameChangeRequest describes Identifer
template parameter, but the actual parameter is called IdentifierType?.

Got it.

In "For DHCPv4 it will be hardware address, client
identifier." I would substitute coma with "or".

Added or.

alloc_engine.cc
reclaimExpiredLeases6, in line 1311, lease->duid_ is dereferenced.
We had this assumption that DUID is always set, but after adding
declined leases (leases that don't have any client information
associated with them) this may or may not be true. We have 2 options
here:

  1. assume that DUID is always set, but may sometimes represent an

empty DUID. We'd need to loosen the check in DUID for non-emptiness.
We'd need to make sure that all backends initialize DUID, even if
there's no data in the database/lease file.

  1. assume that DUID may be null. This would require reviewing current

code and finding all instances where we assumed that DUID is always
set.

Due to the difficulty of conducting option 2, my personal opinion is
that 1. is the way to go. If you agree with that, then there's nothing
to do in that context in this ticket, but probably a separate ticket
needs to be created for checking that the backends really initialize
empty DUIDs.

I think 1 is my preference too.

Nevertheless, I'd add safety check before dereferencing lease->duid_.

Added check

I think the @todo for checking timeout is misplaced. It should appear
somewhere near the end of the loop. That applies to both 4 and 6
versions.

It is judgement call. I just put it next to Stopwatch declaration because it will be used to track timeout. In any case, I am going to remove this todo pretty soon anyway, so does it matter?

You did update statistics, but there are no doc updates. See
doc/guide/dhcp{4,6}-srv.xml. Search for dhcp{4,6}-stats section.

I am planning to address documentation in separate tickets.

alloc_engine_expiration_unittest.cc
Let me say these are very thorough and well designed tests. Excellent work!
Now, on to the details...

Thanks.

The description of ExpirationAllocEngineTest? should mention statistics
in its description.

Mentioned it now.

testStatistics description should document return value.

Added.

Return comment for leaseReclaimed should be updated, as the method
checks more than just the lease state.

Updated.

The code checks whether NCR is created for a given lease, but I
couldn't find a check if it is of proper type. Please either
add a check or @todo to address that in the future.

I added this check in one of the "algorithm" functions.


Code compiled and unit-tests passed on Ubuntu 15.04 x64.

Thanks.

comment:6 in reply to: ↑ 5 Changed 5 years ago by tomek

  • Owner changed from tomek to marcin

All changes are OK, please merge.


Updated code compiles on Ubuntu 15.04 x64 and all unit-tests pass.

comment:7 Changed 5 years ago by marcin

  • Resolution set to complete
  • Status changed from reviewing to closed

Merged with commit d24119e808bf98f71b32cee6e568babfe4ceb7ad

comment:8 Changed 4 years ago by tomek

  • Milestone changed from Kea1.0 to Kea1.0-beta

Milestone renamed

Note: See TracTickets for help on using tickets.