#5611 closed defect (duplicate)

During traffic spikes that exceed Kea's throughput capacity, handle backlog more effectively

Reported by: cathya
Owned by: UnAssigned
Priority: high
Milestone: Kea1.5
Component: Unclassified
Version: git
Keywords:
Cc:
CVSS Scoring:
Parent Tickets:
Sensitive: no
Defect Severity: Very High
Sub-Project: DHCP
Feature Depending on Ticket:
Estimated Difficulty: 0
Add Hours to Ticket: 0
Total Hours: 0
Internal?: no

Description

The current Kea implementation processes the inbound socket buffer as a simple queue - first in, first out. When the server is under pressure and not handling client packets as fast as they are arriving, a backlog will build up.

If the situation continues for long enough, the client packets that the server is handling will already have timed out on the client side, so it is pointless to spend time processing them. Moreover, wasting time on these old packets prevents the server from handling newer packets until they too have timed out. Effectively, the server stops responding to active clients because it never gets through the backlog fast enough to reach the most recent inbound packets.

Even though the initial spike in traffic may have subsided, the degraded performance can cause clients to change their behaviour, adding retries to the backlog and/or reverting to initial discovery. This increases the backlog of packets to be processed and makes recovery unlikely without restarting the server to clear the backlog.

We need to handle this situation better so that, even when swamped, Kea servers are able to process a proportion of recently received client packets, instead of none of them because the server is 'stuck' on the oldest ones.
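To make the failure mode concrete, here is a minimal simulation sketch. It is not Kea code: the function name `simulate`, the tick-based timing, and all rates and timeouts are assumptions chosen purely for illustration. It compares an unbounded first-in, first-out backlog with a bounded backlog that discards its oldest entries when full.

```python
from collections import deque


def simulate(arrivals_per_tick, served_per_tick, client_timeout, ticks,
             max_backlog=None):
    """Count packets answered in time versus answered too late.

    max_backlog=None models a plain unbounded FIFO; an integer bounds the
    backlog, and a full backlog silently drops its oldest entries.
    All parameters are hypothetical and measured in abstract ticks.
    """
    backlog = deque(maxlen=max_backlog)  # stores the arrival tick of each packet
    fresh = stale = 0
    for now in range(ticks):
        for _ in range(arrivals_per_tick):
            backlog.append(now)  # a full bounded deque discards its oldest item
        for _ in range(served_per_tick):
            if not backlog:
                break
            arrived = backlog.popleft()  # always serve the oldest retained packet
            if now - arrived <= client_timeout:
                fresh += 1
            else:
                stale += 1
    return fresh, stale


if __name__ == "__main__":
    # Sustained overload: 10 packets arrive per tick, only 5 can be served.
    print("unbounded FIFO :", simulate(10, 5, 4, 100))
    print("bounded backlog:", simulate(10, 5, 4, 100, max_backlog=20))
```

With these made-up numbers the unbounded queue answers only the packets from the first few ticks in time and then works exclusively on expired ones, while the bounded backlog keeps serving packets that are about one tick old.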

The suggestions mooted so far involve an independent socket-reading thread (or process) that manages the inbound traffic and pulls it off the sockets/interfaces on which the Kea server is listening. This would prevent the UDP buffers from overflowing and would allow the socket reader to apply better logic to:

  • discarding the oldest client packets in favour of the most recently received
  • managing the 'waiting' buffers appropriately to the throughput capacity of the server

Maximum per-server throughput will be highly dependent on both configuration and the choice of back-end (e.g. database or memfile and, if a database, how and where it is deployed), so it would be good for the I/O handler to be tunable too, for example not discarding too soon on a fast server.
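A tunable backlog of the kind described above could look roughly like the following sketch. The class name `PacketBacklog` and its `capacity` and `max_age` parameters are hypothetical and do not correspond to any existing Kea option. `capacity` bounds how many packets wait, and `max_age` decides when a waiting packet is considered too old to be worth processing.

```python
import time
from collections import deque


class PacketBacklog:
    """Hypothetical bounded backlog with two tuning knobs.

    capacity: maximum number of waiting packets; when full, the oldest
              waiting packet is discarded to make room for the newest.
    max_age:  seconds after which a waiting packet is skipped on pop().
    """

    def __init__(self, capacity, max_age, clock=time.monotonic):
        self._items = deque(maxlen=capacity)
        self._max_age = max_age
        self._clock = clock  # injectable so the behaviour can be tested

    def push(self, packet):
        # Record the receive time next to the packet.
        self._items.append((self._clock(), packet))

    def pop(self):
        """Return the oldest packet that is still fresh, or None."""
        now = self._clock()
        while self._items:
            received, packet = self._items.popleft()
            if now - received <= self._max_age:
                return packet
            # Otherwise the packet is too old: drop it and keep looking.
        return None

    def __len__(self):
        return len(self._items)
```

A fast server might use a large `capacity` and a generous `max_age` so that little is discarded, whereas a server with a slow back-end might keep both small so that it spends its time only on packets whose clients are probably still waiting.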

There's no clear operational mitigation strategy for this, other than ensuring sufficient headroom when provisioning so that there are no peaks in client traffic that could overwhelm the servers' maximum capacity.

(Notably, increasing inbound UDP buffers is likely to make the situation worse rather than better.)

Change History (7)

comment:1 Changed 18 months ago by tomek

  • Milestone changed from Kea-proposed to Kea1.5

comment:2 Changed 17 months ago by fdupont

Associated with #5555 so I suppose this one is about design. At least I can see 3 things we can do:

1- Manage kernel socket buffers. This is easy (there is a standard ioctl() to do this) but not the best option: when its queue is full, the kernel drops new incoming packets, whereas it is clearly the oldest ones that should be dropped.

2- When incoming packets are read, instead of reading just one, read all of them (a) into a ring buffer (b).

a- using ioctl FIONREAD to check if there is still something to read
b- a ring buffer will drop the oldest read packet once it is full

An extra complexity is when more than one socket is available for reading. Fortunately this condition is returned by the select() system call so should not be too hard to handle.

3- The last option is to perform the activity from 2- in a separate thread. Note that there are lock-free (at least on common CPUs) structures we can use with C++11.
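Option 2 can be sketched as follows. This is an illustration only, not the proposed Kea design: the function name `drain_ready_sockets` is hypothetical, the sockets are assumed to be non-blocking, and instead of querying FIONREAD the sketch simply reads until the socket reports that nothing is left, which has the same effect of emptying the kernel queue. The ring buffer is a `collections.deque` with `maxlen`, which discards its oldest entry when a new one is appended to a full buffer.

```python
import select
import socket
from collections import deque


def drain_ready_sockets(sockets, ring, timeout=0.0, max_datagram=65535):
    """Move every pending datagram from the readable sockets into ring.

    sockets: non-blocking datagram sockets (assumption of this sketch)
    ring:    deque created with maxlen=N; the oldest entries fall out first
    Returns the number of datagrams read from the kernel.
    """
    # select() reports which of the sockets currently have data to read.
    readable, _, _ = select.select(sockets, [], [], timeout)
    count = 0
    for sock in readable:
        while True:
            try:
                data, addr = sock.recvfrom(max_datagram)
            except BlockingIOError:
                break  # kernel queue for this socket is now empty
            ring.append((sock, addr, data))
            count += 1
    return count


if __name__ == "__main__":
    rx = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    rx.bind(("127.0.0.1", 0))
    rx.setblocking(False)
    ring = deque(maxlen=1000)  # hypothetical ring size
    print(drain_ready_sockets([rx], ring, timeout=0.1), "datagrams read")
```

For option 3, the same function could be called in a loop from a dedicated receiver thread, with the packet-processing code consuming entries from the other end of the ring.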

comment:3 Changed 17 months ago by fdupont

  • Owner set to fdupont
  • Status changed from new to accepted

comment:4 Changed 17 months ago by fdupont

I am writing a proposal (design of a receiver thread / queue).

comment:5 Changed 17 months ago by fdupont

  • Owner changed from fdupont to UnAssigned
  • Status changed from accepted to reviewing

Initial proposal in https://kea.isc.org/wiki/Receiver

comment:6 Changed 17 months ago by tomek

  • Priority changed from very high to high

comment:7 Changed 15 months ago by tomek

  • Resolution set to duplicate
  • Status changed from reviewing to closed