Opened 3 years ago

Closed 3 years ago

#5058 closed defect (fixed)

Kea4 fail to start when there is more than 100 declined leases in csv file.

Reported by: wlodekwencel Owned by: tmark
Priority: medium Milestone: Kea1.2
Component: dhcp4 Version: git
Keywords: Cc:
CVSS Scoring: Parent Tickets:
Sensitive: no Defect Severity: N/A
Sub-Project: DHCP Feature Depending on Ticket:
Estimated Difficulty: 0 Add Hours to Ticket: 1
Total Hours: 8 Internal?: no

Description (last modified by wlodekwencel)

When kea4 decline an address it puts lease without client id or mac address into csv file:

172.31.26.225,,,86400,1477425929,1,0,0,,1

but this kind of lease trigger error while kea reads it from file after restart. When csv file contain 101 or more such leases Kea wont start and log:

kea-dhcp4[14193]: 2016-10-25 16:35:28.129 ERROR [kea-dhcp4.dhcp4/14193] DHCP4_INIT_FAIL failed to
initialize Kea server: configuration error using file '/etc/kea/kea-dhcp4.conf': exceeded maximum
number of failures 100 to read a lease from the lease file /var/lib/kea/kea-leases4.csv

Included leases file that triggers an error which was submitted by Munroe Sollog.

Subtickets

Attachments (1)

csv.tar.gz (781.1 KB) - added by wlodekwencel 3 years ago.

Download all attachments as: .zip

Change History (11)

Changed 3 years ago by wlodekwencel

comment:1 Changed 3 years ago by wlodekwencel

  • Description modified (diff)

comment:2 Changed 3 years ago by wlodekwencel

  • Milestone changed from Kea-proposed to Kea1.2

To consider: during startup Kea should log messages about each invalid lease in file when configured to DEBUG mode

comment:3 Changed 3 years ago by tmark

A bit of digging reveals that CSVLeaseFile4::readHWAddr() throws an exception because hwaddr column is empty. This exception is caught in CSVLeaseFile4::next(), which increments the read error count, saves the error message (which we do not appear to report), and discards the record.

So currently we do not accept lease rows that do not have a MAC address. The client id can be empty, but not mac address. We would need to alter the validation logic to accept leases with no hwaddr AND the state is declined.

The error message mechanism doesn't seem to work properly. It appears as though we save the message but it is never reported anywhere. We should probably report each error as it occurs and go on. This is what ISC DHCP does.

We might also consider a command line flag or perhaps a configuration parameter to override the number of errors allowed, in case someone wants to force a file load.

comment:4 Changed 3 years ago by tmark

  • Owner set to tmark
  • Status changed from new to assigned

comment:5 Changed 3 years ago by tmark

  • Owner changed from tmark to Unassigned
  • Status changed from assigned to reviewing
  • Total Hours changed from 0 to 4

CSVLeaseFile4 loading was modified to accept empty hardware addresses only if the lease state is declined.

A error log message is now emitted by LeaseFileLoader? for each invalid row it encounters as it is loading the file.

As to whether or not we add a configuration parameter to override the maximum number of errors, that probably warrants a separate ticket, if we decide to do it.

ChangeLog? entry:

11xx.   [bug]       tmark
    kea-dhcp4 now correctly loads declined leases from CSV
    lease files. Prior to this, declined leases were being
    incorrectly and silentlydiscarded.  In addition, both
    kea-dhcp4 and kea-dhcp6 will now emit an error log for 
    each invalid row encountered when loading leases from 
    CSV files.
    (Trac #5058, git TBD)

comment:6 Changed 3 years ago by tomek

  • Owner changed from Unassigned to tomek

comment:7 Changed 3 years ago by tomek

  • Add Hours to Ticket changed from 0 to 2
  • Owner changed from tomek to tmark
  • Total Hours changed from 4 to 6

csv_lease_file4_unittest.cc
s/delcined/declined/

s/address or only/address are only/

The comment in line 397 seems odd "and report as needing downgrade". I
don't get it. needsConversion() is expected to return false when the
schema is current. This seems to be the case here. So why "downgrade"?


This ticket fixes the original issue of not loading declined leases
correctly. It addresses this particular issue, but does not cover the
case when the leases are broken for whatever other reason. We still
have that problem of bailing out after 100 errors. The value is
hardcoded in MAX_LEASE_ERRORS in memfile_lease_mgr.cc. 100 is a fine
default value, but it should be configurable. If you agree with this
approach, we have couple options. First, if you think it's proper to
do it within this ticket, please do so. If you prefer it to be done
is a separate ticket, can you create one?


This solves the issue for v4. How about v6? I looked at the code and
the leases seem to have the same characteristic. I compared
Dhcpv4Srv::declineLease with Dhcpv6Srv::declineLease and both
seem to remove the identifier (HWAddr in v4 or DUID in v6).
However, the difference is in the csv loading routines. While the
CSVLeaseFile4::readHWAddr had an exception clause, the
CSVLeaseFile6::readDUID() doesn't. So the code looks not vulnerable.

Nevertheless, it would be useful to have similar check in
CSVLeaseFile6::next() for empty DUID and state being declined as you
added in its v4 counterpart. Given the diff for this ticket is only
166 long, I think it's reasonable to include this (smallish) task
within this ticket.


The code compiles and unit-tests pass on Ubuntu 16.04 x64.


The changelog looks almost ok, but there should be space in
"silentlydiscarded".

comment:8 Changed 3 years ago by tmark

  • Owner changed from tmark to tomek

Replying to tomek:

csv_lease_file4_unittest.cc
s/delcined/declined/

s/address or only/address are only/

Fixed

The comment in line 397 seems odd "and report as needing downgrade". I
don't get it. needsConversion() is expected to return false when the
schema is current. This seems to be the case here. So why "downgrade"?

Comment was a copy-and-paste left over. Deleted it.


This ticket fixes the original issue of not loading declined leases
correctly. It addresses this particular issue, but does not cover the
case when the leases are broken for whatever other reason. We still
have that problem of bailing out after 100 errors. The value is
hardcoded in MAX_LEASE_ERRORS in memfile_lease_mgr.cc. 100 is a fine
default value, but it should be configurable. If you agree with this
approach, we have couple options. First, if you think it's proper to
do it within this ticket, please do so. If you prefer it to be done
is a separate ticket, can you create one?

I consider that a separate issue and have created a new ticket, #5059.


This solves the issue for v4. How about v6? I looked at the code and
the leases seem to have the same characteristic. I compared
Dhcpv4Srv::declineLease with Dhcpv6Srv::declineLease and both
seem to remove the identifier (HWAddr in v4 or DUID in v6).
However, the difference is in the csv loading routines. While the
CSVLeaseFile4::readHWAddr had an exception clause, the
CSVLeaseFile6::readDUID() doesn't. So the code looks not vulnerable.

I did verify that v6 declined leases load without issue but neglected to mention it in my ticket comments.

Nevertheless, it would be useful to have similar check in
CSVLeaseFile6::next() for empty DUID and state being declined as you
added in its v4 counterpart. Given the diff for this ticket is only
166 long, I think it's reasonable to include this (smallish) task
within this ticket.

Actually, Lease6::decline() sets the DUID to a 1 byte value of 0x0, as DUID instances cannot created with no data. When the lease is written out the, DUID is written "00" and and therefore not physically empty. As rows with no DUID value, (i.e "") are never valid, I have added to logic to ensure that rows whose state is declined permit this value.

I replaced DUID::generateEmpty() which constructs a DuidPtr?, with DUID::EMPTY() which returns a constant reference to a DUID. This allows one to consistently test for the "empty" value.


The code compiles and unit-tests pass on Ubuntu 16.04 x64.

Fabulous! Thanks.


The changelog looks almost ok, but there should be space in
"silentlydiscarded".

:
Duly noted.

Last edited 3 years ago by tmark (previous) (diff)

comment:9 Changed 3 years ago by tomek

  • Add Hours to Ticket changed from 2 to 1
  • Owner changed from tomek to tmark
  • Total Hours changed from 6 to 7

I have reviewed your updates. They are good. Unit-tests passed again. Code is ready to go.

Thanks for submitting #5059 and dealing with this ticket.

comment:10 Changed 3 years ago by tmark

  • Resolution set to fixed
  • Status changed from reviewing to closed
  • Total Hours changed from 7 to 8

Changes merged with git 29b088079bed3c5059fdf8a43a4e79cd7f9a4207
Added Changelog entry 1183.

Ticket is complete.

Note: See TracTickets for help on using tickets.