Opened 2 years ago

Closed 2 years ago

Last modified 2 years ago

#5333 closed defect (fixed)

kea can't open socket after reconfigure

Reported by: wlodekwencel Owned by: fdupont
Priority: medium Milestone: Kea1.3 beta
Component: dhcp4 Version: git
Keywords: Cc:
CVSS Scoring: Parent Tickets:
Sensitive: no Defect Severity: N/A
Sub-Project: DHCP Feature Depending on Ticket:
Estimated Difficulty: 0 Add Hours to Ticket: 1
Total Hours: 1 Internal?: no

Description

After reconfigure kea4 is unable to open socket just logging:

DHCPSRV_OPEN_SOCKET_FAIL failed to open socket: failed to open socket on interface eth2, reason: failed to bind fallback socket to address 192.168.50.253, port 67, reason: Address already in use - is another DHCP server running?

Looks like during reconfiguration old socket is not closed.

Kea logs attached shows scenario:

  1. start kea
  2. reconfigure kea (it is not able to respond)
  3. close kea
  4. start kea
  5. it's now creating socket without problem and respond messages

That regression was possibly introduced with d996bf25

Subtickets

Attachments (1)

kealog (17.3 KB) - added by wlodekwencel 2 years ago.

Download all attachments as: .zip

Change History (18)

Changed 2 years ago by wlodekwencel

comment:1 follow-up: Changed 2 years ago by fdupont

Perhaps it is an unwanted side-effect of #3389 (branch 3389a). I recommend:

  • use srace (Linux) to get system calls
  • retry with interface re-detection disabled.

BTW the socket is opened with SO_REUSEADDR so this should not happen...

comment:2 in reply to: ↑ 1 Changed 2 years ago by marcin

Replying to fdupont:

Perhaps it is an unwanted side-effect of #3389 (branch 3389a). I recommend:

  • use srace (Linux) to get system calls
  • retry with interface re-detection disabled.

BTW the socket is opened with SO_REUSEADDR so this should not happen...

Opening with SO_REUSEADDR might improve the situation but the problem is not eliminated. In my opinion, the following change is a problem:

18:36:07 marcin: MacBook-Pro-Marcin:parsers marcin$ git blame -L48,49 ifaces_config_parser.cc
d996bf25 (Francis Dupont 2017-03-16 23:38:52 +0100 48)         IfaceMgr::instance().clearIfaces();
d996bf25 (Francis Dupont 2017-03-16 23:38:52 +0100 49)         IfaceMgr::instance().detectIfaces();

It clears interface information including sockets open for each of the interfaces. That way, when closeSockets() is called it has no effect because the sockets structure looses information about the open sockets. The information about opened sockets should be preserved before calling clearIfaces.

comment:3 Changed 2 years ago by fdupont

  • Milestone changed from Kea-proposed to Kea1.3
  • Owner set to fdupont
  • Status changed from new to accepted

Moves to 1.3 as it was introduced by a 1.3 ticket and is clearly to be fixed ASAP.

comment:4 Changed 2 years ago by fdupont

  • Add Hours to Ticket changed from 0 to 1
  • Owner changed from fdupont to UnAssigned
  • Status changed from accepted to reviewing
  • Total Hours changed from 0 to 1

comment:5 Changed 2 years ago by fdupont

Ready for review but the closeSockets is not reverted if the re-config fails. Note it is not enough to move the interface list parse=ing later as it will allow to specify a dangling interface in subnets.
IMHO the interface manager code should be revamped to add a commit / rollback phase to a new not destructive re-scan. As it is a major change I postpone it into a new ticket. For now I propose to add a known problem asking to reload a working config when a re-config failed... note it is already required for hook problems.

comment:6 Changed 2 years ago by marcin

  • Owner changed from UnAssigned to marcin

I have reviewed your change and tested that it resolve the problem stated in the description. It seems that two things are now needed:

  • ChangeLog entry proposal,
  • A new ticket which covers commit /rollback in interface manager.

Can you please provide those?

Last edited 2 years ago by marcin (previous) (diff)

comment:7 Changed 2 years ago by marcin

  • Owner changed from marcin to fdupont

comment:8 Changed 2 years ago by fdupont

I can answer to the second with doing one or two other things in parallel: #5334
Waiting for next free time for the ChangeLog (if you have no proposal) as this fix adds a new way to shoot in one's feet.

comment:9 Changed 2 years ago by fdupont

As to merge it we have first to agree about the ChangeLog (i.e. we can change it after but not add one later).
I have at least 2 proposals, depending on whether #5334 will be done.

1- Now all interface service sockets are closed before interface re-detection.
2- Now all interface service sockets are closed before interface re-detection. Note if the re-configuration fails they remain closed.

We can do more than 2 but I expect we shan't stay with this, i.e. either #5334 will be included in 1.3, addressed and merged, or re-config call sequence will be changed to reopen sockets even on failure.
BTW as a subnet can refer to a disappearing interface it is not possible to simply move the code in order to lower the chance of a too late failure as it was done for hooks.

comment:10 Changed 2 years ago by fdupont

  • Owner changed from fdupont to marcin

comment:11 Changed 2 years ago by fdupont

I changed the Linux code to use addInterface as others.

comment:12 Changed 2 years ago by fdupont

  • Owner changed from marcin to UnAssigned

As Marcin can't confirm it is ready before being back from holidays I reset the ownership.

comment:13 Changed 2 years ago by tmark

  • Owner changed from UnAssigned to tmark

comment:14 follow-up: Changed 2 years ago by tmark

Changes function as described so go ahead and merge them. I notice that syntax errors rollback Ok, because they're caught before we clear the interfaces and re-scan so reloading isn't always necessary. We should probably try to describe this in the known issues entry. Too bad we can't get this fixed in 1.3, this is pretty obnoxious issue.


comment:15 Changed 2 years ago by tmark

  • Owner changed from tmark to fdupont

comment:16 in reply to: ↑ 14 Changed 2 years ago by fdupont

  • Resolution set to fixed
  • Status changed from reviewing to closed

Replying to tmark:

Changes function as described so go ahead and merge them.

=> merged.

I notice that syntax errors rollback Ok, because they're caught before we clear the interfaces and re-scan so reloading isn't always necessary.

=> IMHO the best test is to add a subnet with an not-existing (or no longer existing) interface

We should probably try to describe this in the known issues entry.

=> yes, this should be described in known issues.

Too bad we can't get this fixed in 1.3, this is pretty obnoxious issue.

=> I'll propose again #5334 for 1.3-final.

Closing...

comment:17 Changed 2 years ago by vicky

  • Milestone changed from Kea1.3 to Kea1.3 beta

Milestone renamed

Note: See TracTickets for help on using tickets.