#5603 closed enhancement (complete)

HA: handle clock skew

Reported by: marcin Owned by: marcin
Priority: medium Milestone: Kea1.4-final
Component: high-availability Version: git
Keywords: Cc:
CVSS Scoring: Parent Tickets:
Sensitive: no Defect Severity: N/A
Sub-Project: DHCP Feature Depending on Ticket:
Estimated Difficulty: 0 Add Hours to Ticket: 0
Total Hours: 0 Internal?: no

Description

The existing heartbeat command returns the partner's timestamp. The local server can use it to determine the clock skew. The HA hook library should handle clock skew, at least to determine which server has newer information about a lease. Currently we don't handle that; we assume that the clocks on both servers are exactly the same.
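As a rough illustration of the idea in the description, the sketch below derives a clock offset from the partner's heartbeat timestamp and uses it to compare lease update times. All names (`SysClock`, `clockOffset`, `toLocalTime`, `partnerLeaseIsNewer`) are hypothetical and are not taken from the HA hook library. The sketch also assumes the heartbeat round-trip time is negligible.

```cpp
#include <chrono>

using SysClock = std::chrono::system_clock;

// Hypothetical helper: signed offset between the local clock and the
// partner's clock, computed from the timestamp carried in a heartbeat
// response. A positive value means the local clock is ahead.
// Assumption: network latency is ignored.
std::chrono::seconds clockOffset(SysClock::time_point local_now,
                                 SysClock::time_point partner_now) {
    return std::chrono::duration_cast<std::chrono::seconds>(local_now - partner_now);
}

// Hypothetical helper: translate a timestamp taken on the partner's
// clock into the local server's time base.
SysClock::time_point toLocalTime(SysClock::time_point partner_time,
                                 std::chrono::seconds offset) {
    return partner_time + offset;
}

// Returns true if the partner's lease update is newer than the local
// one once the partner's timestamp has been corrected by the offset.
bool partnerLeaseIsNewer(SysClock::time_point local_update,
                         SysClock::time_point partner_update,
                         std::chrono::seconds offset) {
    return toLocalTime(partner_update, offset) > local_update;
}
```

With a local clock 45 seconds ahead of the partner, a partner update stamped 950 maps to local time 995, so it would be treated as newer than a local update stamped 990.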

Change History (8)

comment:1 Changed 18 months ago by marcin

  • Milestone changed from Kea-proposed to Kea1.4-final

Moving to 1.4 final per Kea call on April 26th.

comment:2 Changed 17 months ago by marcin

  • Owner set to marcin
  • Status changed from new to accepted

comment:3 Changed 17 months ago by marcin

  • Owner changed from marcin to UnAssigned
  • Status changed from accepted to reviewing

This ticket is now ready for review. The changes are in both the premium and the main repositories. In the main repository it is merely the User's Guide update.

It introduces a new state, terminated, which the HA state machine transitions to if the clocks are more than 60s apart. If the clocks are more than 30s apart, a warning is issued.
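The two thresholds can be sketched as a small classification function. This is an illustration only: `SkewAction` and `classifySkew` are hypothetical names, and the strict "greater than" comparison at exactly 30 and 60 seconds is an assumption based on the wording "exceeds" in the proposed ChangeLog entry.

```cpp
#include <chrono>

// Hypothetical outcome of a clock skew check.
enum class SkewAction { None, Warn, Terminate };

// Classify the skew between the two servers' clocks.
// Assumption: thresholds are exclusive (skew must exceed 30s / 60s).
SkewAction classifySkew(std::chrono::seconds skew) {
    using std::chrono::seconds;
    // Only the magnitude matters, not which server is ahead.
    if (skew < seconds::zero()) {
        skew = -skew;
    }
    if (skew > seconds(60)) {
        return SkewAction::Terminate;  // state machine goes to "terminated"
    }
    if (skew > seconds(30)) {
        return SkewAction::Warn;       // keep running, but log a warning
    }
    return SkewAction::None;
}
```

For example, a skew of 45 seconds would yield `SkewAction::Warn`, while 61 seconds in either direction would yield `SkewAction::Terminate`.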

Proposed ChangeLog for premium:

XX.	[func]		marcin
	The HA hook library monitors clock skew between the active
	servers and the HA service is stopped if the clock skew exceeds
	60 seconds. If it exceeds 30 seconds a warning message is
	issued.
	(Trac #5603, git cafe)

and for the main repo:

14XX.	[doc]		marcin
	Documented in the User's Guide how Kea HA service behaves
	when the clock skew between active servers becomes too
	high.
	(Trac #5603, git cafe)

comment:4 Changed 17 months ago by tmark

  • Owner changed from UnAssigned to tmark

comment:5 follow-up: Changed 17 months ago by tmark

  • Owner changed from tmark to marcin

Made some minor wording edits in both repos, so please pull first to pick up my changes.

Everything builds and unit tests pass under CentOS 7.


You may wish to note this "

/ @note Currently, restarting the HA service requires restarting the
/ DHCP server. In the future, we will provide a command to restart
/ the HA service."

in the user's guide under the new "terminated" state section.


Q: What happens if a user reconfigures a server in "terminated" state? Will
it attempt to re-enter normal operations?


HAService::shouldTerminate()

Once you have servers that are > 30 seconds apart, the warning log will get
emitted with every call to this method, which could be pretty often.
I'm thinking you might need some sort of gating mechanism, so it only logs
every so often, like once every 30 seconds or so.

HA_HIGH_CLOCK_SKEW_CAUSES_TERMINATION: this should be an error, not a warning.

comment:6 in reply to: ↑ 5 Changed 17 months ago by marcin

  • Owner changed from marcin to tmark

Replying to tmark:

Made some minor wording edits in both repos, so please pull first to pick up my changes.

Everything builds and unit tests pass under CentOS 7.

Thanks.


You may wish to note this "

/ @note Currently, restarting the HA service requires restarting the
/ DHCP server. In the future, we will provide a command to restart
/ the HA service."

in the user's guide under the new "terminated" state section.

Added the note.


Q: What happens if a user reconfigures a server in "terminated" state? Will
it attempt to re-enter normal operations?

Reconfiguring should also restart the state machine, but I think I'd prefer that people restart the server.


HAService::shouldTerminate()

Once you have servers that are > 30 seconds apart, the warning log will get
emitted with every call to this method, which could be pretty often.
I'm thinking you might need some sort of gating mechanism, so it only logs
every so often, like once every 30 seconds or so.

Added a gating mechanism. I didn't go with exactly what you suggested: I made the minimum interval 60 seconds rather than 30. I guess 1 minute is also pretty frequent, but I don't want it to be more often than heartbeats (in some configurations).

HA_HIGH_CLOCK_SKEW_CAUSES_TERMINATION: this should be an error, not a warning.

Right. I had intended to make it an error but copy-pasted from somewhere else.
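The gating mechanism discussed in this reply can be sketched as a small rate limiter that allows a warning at most once per minimum interval. This is a hedged illustration, not the code that was merged: `WarningGate` and `shouldLog` are hypothetical names, and the use of `std::chrono::steady_clock` is an assumption made so that the gate itself is unaffected by wall-clock adjustments.

```cpp
#include <chrono>

// Hypothetical rate limiter for the clock skew warning.
class WarningGate {
public:
    using Clock = std::chrono::steady_clock;

    explicit WarningGate(std::chrono::seconds min_interval)
        : min_interval_(min_interval), has_logged_(false), last_() {}

    // Returns true if a warning may be logged at time `now`, and records
    // that time. Returns false if the previous warning was logged less
    // than `min_interval_` ago.
    bool shouldLog(Clock::time_point now) {
        if (has_logged_ && (now - last_) < min_interval_) {
            return false;
        }
        last_ = now;
        has_logged_ = true;
        return true;
    }

private:
    std::chrono::seconds min_interval_;  // e.g. 60 seconds, as in this reply
    bool has_logged_;                    // false until the first warning
    Clock::time_point last_;             // time of the most recent warning
};
```

A caller would construct the gate once, for example `WarningGate gate(std::chrono::seconds(60));`, and emit the warning only when `gate.shouldLog(WarningGate::Clock::now())` returns true.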

comment:7 Changed 17 months ago by tmark

  • Owner changed from tmark to marcin

Changes are fine. Please merge.

comment:8 Changed 17 months ago by marcin

  • Resolution set to complete
  • Status changed from reviewing to closed

Merged with commits 3ecc84eb38651466ec78470eaa0ef7bbf791c4b5 and ffaff4d2a03600bb4f81d335b49a840e31d03c8c.
