Opened 7 years ago

Closed 6 years ago

#2940 closed defect (fixed)

DHCPv4 server crashes when using Memfile backend and running perfdhcp

Reported by: marcin Owned by: marcin
Priority: medium Milestone: Kea0.8
Component: dhcp4 Version:
Keywords: Cc:
CVSS Scoring: Parent Tickets:
Sensitive: no Defect Severity: Medium
Sub-Project: DHCP Feature Depending on Ticket:
Estimated Difficulty: 0 Add Hours to Ticket: 0
Total Hours: 0 Internal?: no

Description

I am running the following setup:

  • Kea (DHCPv4) installed on Debian 7 with a single pool configured
  • perfdhcp installed on the Debian 6 system directly connected to the server system
  • Memfile backend is used to handle lease assignments

Server starts successfully. I start perfdhcp on the client machine:

./perfdhcp -4 -R 65535 -r 10 -n 100 172.16.1.1

When perfdhcp is started, the server crashes immediately with the following error log:

b10-dhcp4: /usr/include/boost/smart_ptr/shared_ptr.hpp:418: boost::shared_ptr<T>::reference boost::shared_ptr<T>::operator*() const [with T = isc::dhcp::ClientId; boost::shared_ptr<T>::reference = isc::dhcp::ClientId&]: Assertion `px != 0' failed.
2013-04-30 15:14:36.026 INFO  [b10-init.init/4796] BIND10_PROCESS_ENDED process 4803 of b10-dhcp4 ended with status 6
2013-04-30 15:14:36.026 ERROR [b10-init.init/4796] BIND10_COMPONENT_FAILED component b10-dhcp4 (pid 4803) failed: process terminated with exit status 6 (killed by signal 6: SIGABRT)
2013-04-30 15:14:41.011 INFO  [b10-init.init/4796] BIND10_RECEIVED_SIGNAL received signal SIGINT
2013-04-30 15:14:41.011 INFO  [b10-init.init/4796] BIND10_SHUTDOWN stopping the server
2013-04-30 15:14:41.024 INFO  [b10-init.init/4796] BIND10_CONFIGURATOR_STOP bind10 component configurator is shutting down
2013-04-30 15:14:41.024 INFO  [b10-init.init/4796] BIND10_COMPONENT_STOP component Socket creator is being stopped
2013-04-30 15:14:41.024 INFO  [b10-init.init/4796] BIND10_SOCKCREATOR_TERMINATE terminating socket creator
2013-04-30 15:14:41.026 INFO  [b10-init.init/4796] BIND10_COMPONENT_STOP component msgq is being stopped
2013-04-30 15:14:41.028 INFO  [b10-init.init/4796] BIND10_COMPONENT_STOP component b10-cmdctl is being stopped
2013-04-30 15:14:41.028 INFO  [b10-init.init/4796] BIND10_STOP_PROCESS asking b10-cmdctl to shut down
2013-04-30 15:14:41.030 INFO  [b10-init.init/4796] BIND10_COMPONENT_STOP component cfgmgr is being stopped
2013-04-30 15:14:41.030 INFO  [b10-init.init/4796] BIND10_STOP_PROCESS asking cfgmgr to shut down
2013-04-30 15:14:41.030 INFO  [b10-init.init/4796] BIND10_COMPONENT_STOP component b10-stats is being stopped
2013-04-30 15:14:41.030 INFO  [b10-init.init/4796] BIND10_STOP_PROCESS asking b10-stats to shut down
2013-04-30 15:14:41.031 INFO  [b10-cmdctl.cmdctl/4802] CMDCTL_EXITING exiting
2013-04-30 15:14:41.038 INFO  [b10-cfgmgr.cfgmgr/4800] CFGMGR_STOPPED_BY_COMMAND received shutdown command, shutting down
2013-04-30 15:14:41.047 INFO  [b10-stats.stats/4804] STATS_RECEIVED_SHUTDOWN_COMMAND shutdown command received
2013-04-30 15:14:41.048 INFO  [b10-stats.stats/4804] STATS_EXITING exiting
2013-04-30 15:14:42.035 INFO  [b10-init.init/4796] BIND10_PROCESS_ENDED process 4797 of Socket creator ended with status 0
2013-04-30 15:14:42.035 INFO  [b10-init.init/4796] BIND10_PROCESS_ENDED process 4800 of cfgmgr ended with status 0
2013-04-30 15:14:42.036 INFO  [b10-init.init/4796] BIND10_PROCESS_ENDED process 4802 of b10-cmdctl ended with status 0
2013-04-30 15:14:42.036 INFO  [b10-init.init/4796] BIND10_PROCESS_ENDED process 4804 of b10-stats ended with status 0
2013-04-30 15:14:42.040 INFO  [b10-init.init/4796] BIND10_SEND_SIGTERM sending SIGTERM to msgq (PID 4798)
2013-04-30 15:14:42.145 INFO  [b10-init.init/4796] BIND10_PROCESS_ENDED process 4798 of msgq ended with status 0
2013-04-30 15:14:42.145 INFO  [b10-init.init/4796] BIND10_SHUTDOWN_COMPLETE all processes ended, shutdown complete

Similar issue has been discovered a few months back when using MySQL backend. It has been reported and resolved as #2723.

Subtickets

Attachments (1)

patch.2940 (8.2 KB) - added by dclink 6 years ago.

Download all attachments as: .zip

Change History (12)

comment:1 Changed 6 years ago by dclink

  • Owner set to dclink
  • Status changed from new to assigned

Changed 6 years ago by dclink

comment:2 Changed 6 years ago by dclink

  • Owner changed from dclink to marcin
  • Status changed from assigned to reviewing

It happens when the client id is null. Marcin and I decided to create a new small method for Lease4 which returns an empty vector if client_id_ is null. Besides, the custom KeyExtractor? for lease4 storage was simply replaced by a const function access to this new method.
For last, a new small set of unit tests comes with (Plus a small update of ClientIdNull? one to check if it throws an exception...).

comment:3 Changed 6 years ago by marcin

  • Milestone changed from DHCP Outstanding Tasks to Sprint-DHCP-20131016

comment:4 Changed 6 years ago by marcin

I applied a patch and then made the following extensions to the code:

  • Created the unit tests for Lease4 and moved the unit test which tests returning client id as vector to this new file.
  • Extended the LeaseMgr unit tests to make more extensive use of leases with NULL client id.
  • Fixed typos and updated some doxygen comments.

Proposed ChangeLog entry:

XXX.	[bug]		dclink,marcin
	libdhcpsrv: Fixed a bug in Memfile lease database backend which
	caused DHCPv4 server crashes when leases with NULL client id
	were present.
	(Trac #2940, git cafebeef)

I am now in the process of system testing this code. Once I confirm that the server is not crashing anymore the ticket goes for the further review to someone else.

comment:5 Changed 6 years ago by marcin

  • Owner changed from marcin to UnAssigned

I tried to reproduce the problem described in this ticket after applying this patch and the crash doesn't show up anymore. So I concluded that the patch works and I move the ticket to review queue.

comment:6 Changed 6 years ago by tomek

  • Owner changed from UnAssigned to tomek

comment:7 follow-up: Changed 6 years ago by tomek

  • Owner changed from tomek to marcin

lease.cc
getClientIdVector() returns a copy. This has a potential of making a copy of every client-id in the db when we use specific search index. Please consider returning a reference.

memfile_lease_mgr.cc
Lines 83 and 127: please add a space after 'if'.

Would it be possible to modify mutlti-index container to do the same with DUIDs as it does with clientIds now? That would eliminate the need to use KeyFromKeyExtractor?. This is an honest question. No is a perfectly good answer here :)

Please move content of MemfileLeaseMgrTest?.{getLease4ClientId, getLease4NullClientId, getLease4ClientIdHWAddrSubnetId} to GenericLeaseMgrTest?. Those tests should be perfectly valid for reuse for MySQL (and upcoming PostgreSQL, too).

I do not have any specific preference how this move is being conducted, but it is very important to share those tests among all backends. It will be even more important with every backend we add.

comment:8 in reply to: ↑ 7 Changed 6 years ago by marcin

  • Owner changed from marcin to tomek

Replying to tomek:

lease.cc
getClientIdVector() returns a copy. This has a potential of making a copy of every client-id in the db when we use specific search index. Please consider returning a reference.

I now return a reference. That change required changes to the ClientId::getClientId function to return a const reference too. Otherwise, by using the local copy of the client id returned by the ClientId::getClientId() I would cause a segfault.

memfile_lease_mgr.cc
Lines 83 and 127: please add a space after 'if'.

Added.

Would it be possible to modify mutlti-index container to do the same with DUIDs as it does with clientIds now? That would eliminate the need to use KeyFromKeyExtractor?. This is an honest question. No is a perfectly good answer here :)

I was going to say No. But on reflection I decided to do it because it simplifies the definition of the multi index container a lot.

Please move content of MemfileLeaseMgrTest?.{getLease4ClientId, getLease4NullClientId, getLease4ClientIdHWAddrSubnetId} to GenericLeaseMgrTest?. Those tests should be perfectly valid for reuse for MySQL (and upcoming PostgreSQL, too).

I do not have any specific preference how this move is being conducted, but it is very important to share those tests among all backends. It will be even more important with every backend we add.

I moved them, but the MySQL backend still has private unit tests. We will have to unify these tests at some point.

comment:9 Changed 6 years ago by tomek

  • Milestone changed from Sprint-DHCP-20131016 to Sprint-DHCP-20131204

comment:10 Changed 6 years ago by tomek

  • Owner changed from tomek to marcin

Changes look good. Please merge.

Make sure you add the ChangeLog? entry and add a 'thanks' clause (see previous entry, just search for 'dclink').

comment:11 Changed 6 years ago by marcin

  • Resolution set to fixed
  • Status changed from reviewing to closed

Merged to master with commit a232f3d7d92ebcfb7793dc6b67914299c45c715b

Note: See TracTickets for help on using tickets.