Opened 4 years ago

Closed 4 years ago

#4493 closed defect (fixed)

perfdhcp: inefficient removal of timed out transactions causes outbound rate degradation

Reported by: marcin Owned by: marcin
Priority: medium Milestone: Kea1.1
Component: perfdhcp Version: git
Keywords: Cc:
CVSS Scoring: Parent Tickets:
Sensitive: no Defect Severity: Low
Sub-Project: DHCP Feature Depending on Ticket:
Estimated Difficulty: 0 Add Hours to Ticket: 0
Total Hours: 0 Internal?: no

Description

When running perfdhcp at the specified desired rate, the actual rate is gradually decreasing due to inefficient indexing of outbound packets being matched with server responses. The graphs showing this issue are attached to the ticket.

Subtickets

Attachments (1)

perfdhcp-issue.png (108.6 KB) - added by marcin 4 years ago.

Download all attachments as: .zip

Change History (9)

Changed 4 years ago by marcin

comment:1 Changed 4 years ago by marcin

  • Owner set to UnsAssigned
  • Status changed from new to reviewing
  • Summary changed from perfdhcp: inefficient sent packet indexing causes outbound rate degradation to perfdhcp: inefficient removal of timed out transactions causes outbound rate degradation

perfdhcp is using quite complex algorithm for matching inbound packets with requests sent to the DHCP server. Part of this is "unordered" packet lookup when perfdhcp retrieves a "bucket" with packets and iterates over them to find the transaction id match. While it is doing so, the timed out packets are being removed. If it finds a transaction id match, it stops the iterations and doesn't remove any more timed out packets in the bucket. As a result, the list of sent packets is growing which increases memory consumption which will eventually result in perfdhcp crash. Before the crash occurs, the perfdhcp outbound rate is gradually falling because of the need to work on large packet buckets.

With this ticket, I make sure that all timed out packets are removed from the bucket selected for packet matching. This means that perfdhcp spends a little while more on packet matching as it also needs to process additional packets to check if they should be timed out. But, that results in working on smaller buckets and thus less time needed for packet matching overall. It also guarantees relatively low memory consumption throughout the test.

Proposed ChangeLog:

11XX.	[bug]		marcin
	perfdhcp: Improved algorithm for dropping timed out transactions.
	This prevents growing memory consumption due to storing timed
	out transactions when the DHCP server drops many messages.
	(Trac #4493, git abcd)

comment:2 Changed 4 years ago by marcin

  • Owner changed from UnsAssigned to marcin

I am reassigning this ticket to myself to address one more issue with how we present the server response rate. This is currently the number of new DO or SA exchanges, while it should be the number of ACKs/Replies over time. This should however exclude the renewal exchanges.

comment:3 Changed 4 years ago by marcin

  • Owner changed from marcin to UnAssigned

I have updated the output containing statistics. There still seem to be some things to be improved:

  • Better reporting of Solicit-Reply (rapid commit case). Currently we don't have Solicit-Reply exchange defined
  • Sanity checks on received packets: was a lease really assigned, was Rapid Commit supported etc.

However, these are out of scope for this ticket. I will create another ticket to cover it.

comment:4 Changed 4 years ago by fdupont

  • Owner changed from UnAssigned to fdupont

comment:5 Changed 4 years ago by fdupont

  • Owner changed from fdupont to marcin

As far as I can remember the packets in buckets are in timeout order so the patch can do extra work, i.e., you move from remove timeout packets until the matching one is found to independently remove timeout packets and look for the matching one. But for the first part the stop condition is there is no further timeout packets. Note I don't suggest to do 2 passes but to record the 2 stop conditions
and to skip each test when the corresponding condition was reached (so for instance the getTransid() == should get a !packet_found first).
To check the 2 conditions are independent I suggest to add a test to match a timeout packet.
And ispell on the diff returned: orhpans -> orphans

comment:6 Changed 4 years ago by marcin

  • Owner changed from marcin to fdupont

The previous code was using hashed_non_unique index and thus the order of elements (even within the bucket) was not guaranteed. I have changed that type of the index to ordered_non_unique, in which case I could assume that when I find the first non-expired transaction I can stop doing further checks.

As for the test to match a timeout packet I actually have such test there already. Note that the SendReceiveCollected invokes 10 tests for matching different transaction ids, including those timed out.

comment:7 Changed 4 years ago by fdupont

  • Owner changed from fdupont to marcin

Fine. As Wlodek already verified it fixes the issue there is no reason to not merge (with a ChangeLog please, perhaps we are not the only users of perfdhcp :-).

comment:8 Changed 4 years ago by marcin

  • Resolution set to fixed
  • Status changed from reviewing to closed

Merged with commit 9757a93110afb82c5379643f2f48e223d497efae and added ChangeLog? entry too.

Note: See TracTickets for help on using tickets.