#5654 closed defect (fixed)

HA: correct 'max-unacked-clients' function in hot standby mode

Reported by: marcin Owned by: marcin
Priority: medium Milestone: Kea1.5
Component: high-availability Version: git
Keywords: Cc:
CVSS Scoring: Parent Tickets:
Sensitive: no Defect Severity: N/A
Sub-Project: DHCP Feature Depending on Ticket:
Estimated Difficulty: 0 Add Hours to Ticket: 0
Total Hours: 0 Internal?: no

Description

Consider the following case. The HA hooks library is operating in hot-standby mode. The primary server goes down, and the secondary keeps sending heartbeats to it. After a while, the secondary finds that its heartbeats are not being answered, so it should start examining the DHCP traffic directed to the primary for response delays. However, the current code does the following:

        // Delay detection (failureDetected()) is consulted only while the DHCP service is enabled.
        if (network_state_->isServiceEnabled() &&
            ((config_->getHAMode() == HAConfig::LOAD_BALANCING) ||
             (config_->getThisServerConfig()->getRole() == HAConfig::PeerConfig::STANDBY))) {
            return (communication_state_->failureDetected());
        }

which performs delay detection only when the DHCP service is enabled. In hot-standby mode the secondary server serves no scopes, so its DHCP service will normally be disabled. We didn't find this issue because, apparently, a timer is set during synchronization between the servers that enables the service after 60 seconds. The service therefore probably gets enabled on its own, and the whole mechanism works most of the time. It fails only when the primary repeatedly comes back up and goes down again. In that case, the secondary transitions to the partner-down state without checking for delays in responses to DHCP queries.


Change History (8)

comment:1 Changed 20 months ago by marcin

  • Component changed from documentation to hook-lease-cmds

comment:2 Changed 20 months ago by marcin

  • Component changed from hook-lease-cmds to high-availability

comment:3 Changed 20 months ago by marcin

  • Milestone changed from Kea-proposed to Kea1.5

As this appears to be a significant issue in the HA hooks library, I am moving it to 1.5 after checking with Tomek.

comment:4 Changed 20 months ago by marcin

  • Owner set to marcin
  • Status changed from new to accepted

comment:5 Changed 20 months ago by marcin

  • Owner changed from marcin to UnAssigned
  • Status changed from accepted to reviewing

This ticket is now ready for review. Proposed ChangeLog entry:

14XX.	[bug]		marcin
	Corrected behavior of the standby server in the HA hot-standby
	mode, which failed to monitor delays in responses to the
	DHCP queries sent to the primary server after the primary
	server became unavailable. This resulted in transition of
	the standby server to the partner-down state immediately
	after detecting interruption in communication with the
	primary over the control channel.
	(Trac #5654, git cafe)

comment:6 Changed 20 months ago by fdupont

  • Owner changed from UnAssigned to fdupont

comment:7 Changed 20 months ago by fdupont

  • Owner changed from fdupont to marcin

Typo: 'Patrtner' -> 'Partner'. I also ran ispell on the diff; it found 'patner' and 'reponse' as well.

The code looks sane on reading; it compiles and passes the tests.

comment:8 Changed 20 months ago by marcin

  • Resolution set to fixed
  • Status changed from reviewing to closed

Merged with commit 7a83f05fe40fb1b6812b055e2d6d633d9e00160c.
