Opened 8 years ago

Closed 5 years ago

#1753 closed task (wontfix)

use object pool for in-memory finder contexts

Reported by: jinmei
Owned by: jinmei
Priority: medium
Milestone: DNS Outstanding Tasks
Component: data source
Version:
Keywords:
Cc:
CVSS Scoring:
Parent Tickets:
Sensitive: no
Defect Severity: N/A
Sub-Project: DNS
Feature Depending on Ticket: auth performance
Estimated Difficulty: 5
Add Hours to Ticket: 0
Total Hours: 0
Internal?: no

Description

By default ZoneFinder::Context needs to be allocated from the heap
(by new) for every call to find(). The cost would not be negligible
for a performance-sensitive version like the one for the in-memory data
source.

I'd suggest introducing an object pool such as boost::object_pool
so the major resources of Context can be reused over multiple calls to
find(). I'd maintain the pool in the InMemoryClient that created
the finder and use the same pool for all zones.

The in-memory version of the Context class will need to be "resettable" so
that we release the resources held in the context when it goes back to the
pool. It will also require some modifications to the base Context class.
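
A minimal sketch of the idea, assuming boost::object_pool; the class and
member names (InMemoryContext, acquireContext(), releaseContext(), contexts_)
are illustrative only and do not correspond to the actual code:

// Hypothetical sketch; names do not match the real BIND 10 sources.
#include <boost/pool/object_pool.hpp>

class InMemoryContext {
public:
    // A "resettable" context releases its per-query resources before it
    // goes back to the pool, so the next caller starts from a clean state.
    void reset() { /* release per-query resources */ }
};

class InMemoryClient {
public:
    // Called at the beginning of find(): reuse a pooled context if possible.
    InMemoryContext* acquireContext() {
        // construct() takes a chunk from the pool (growing it only when
        // needed) and then runs the InMemoryContext constructor on it.
        return contexts_.construct();
    }

    // Called when the caller is done with the context.
    void releaseContext(InMemoryContext* ctx) {
        ctx->reset();
        // destroy() runs the destructor and returns the chunk to the pool
        // instead of freeing it, so the next acquireContext() is cheap.
        contexts_.destroy(ctx);
    }

private:
    // One pool shared by all zones served by this client, as proposed above.
    boost::object_pool<InMemoryContext> contexts_;
};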

Subtickets

Change History (16)

comment:1 Changed 8 years ago by jelte

  • Estimated Difficulty changed from 0 to 5

comment:2 Changed 8 years ago by jelte

  • Milestone changed from Next-Sprint-Proposed to Sprint-20120320

comment:3 Changed 8 years ago by jinmei

  • Owner set to jinmei
  • Status changed from new to assigned

comment:4 follow-up: Changed 8 years ago by jinmei

The trac1753 branch is *completed*.

I believe the implementation is mostly straightforward, although it
has some dirty kludge due to the basic design issue of the entire
in-memory data source implementation (the "finder" does too many
things, and the relationship between client/finder/rrset is not well
clarified). I'd suggest revisiting these things as part of the
inevitable cleanup/refactoring.

But this task has a more fundamental issue: my local benchmark showed
it didn't improve performance much. In fact, sometimes it could be
slower than the code without this change. Some results:

I used a simplified (without RRSIGs) root zone snapshot, and performed
queryperf and querybench runs using some realistic query samples and a
single NXDOMAIN-only query (with the latter, find() would be called
twice and other overhead such as name compression should be small, so
this case should basically favor this optimization).

I created two branches for the experiment, trac1753bench0 and
trac1753bench1. Both include the major optimizations we have made so far
(including the ones not yet merged to master), and bench1 contains the
changes from trac1753 (bench0 doesn't).

queryperf result:

w/ 1753
  real query data: Queries per second:   18163.467979 qps
  NXDOMAIN only: Queries per second:     24534.480037 qps
w/o 1753
  real: Queries per second:              18174.745761 qps
  NXDOMAIN: Queries per second:          23972.857990 qps
FYI: bind10-devel-20120301
  real: Queries per second:   6458.807839 qps

querybench result:

w/ 1753
  real: Processed 965600 queries in 26.074064s (37032.97qps)
  NXDOMAIN: Processed 1000000 queries in 16.556643s (60398.72qps)
w/o 1753
  real: Processed 965600 queries in 25.970008s (37181.35qps)
  NXDOMAIN: Processed 1000000 queries in 16.015743s (62438.56qps)

Basically, I didn't see any significant difference between the two
implementations.

I don't mind dropping the idea at this point. But I'd at least try
the comparison again after #1767. Maybe if we eliminate the
construction overhead of the base class we see some difference.

I'll put this to the review queue anyway.

comment:5 Changed 8 years ago by jinmei

  • Owner changed from jinmei to UnAssigned
  • Status changed from assigned to reviewing

comment:6 Changed 8 years ago by vorner

  • Owner changed from UnAssigned to vorner

comment:7 in reply to: ↑ 4 Changed 8 years ago by vorner

  • Owner changed from vorner to jinmei

Hello

Replying to jinmei:

I believe the implementation is mostly straightforward, although it
has some dirty kludge due to the basic design issue of the entire
in-memory data source implementation (the "finder" does too many
things, and the relationship between client/finder/rrset is not well
clarified). I'd suggest revisiting these things as part of the
inevitable cleanup/refactoring.

Well, I don't find it too problematic, but the new check for compatible pointer breaks tests in Query:

[----------] 83 tests from QueryTest
[ RUN      ] QueryTest.noZone
terminate called after throwing an instance of 'isc::InvalidParameter'
  what():  NULL or incompatible pointer is passed to InMemoryClient::addZone()
/bin/sh: line 5: 29155 Aborted                 (core dumped) ${dir}$tst
FAIL: run_unittests
===================================

But this task has a more fundamental issue: my local benchmark showed
it didn't improve performance much. In fact, sometimes it could be
slower than the code without this change. Some results:

Well, here's what I make of what I found about the pool:

  • The constructor is still run, so there isn't less code to execute. The only saving would be the allocation of the memory block itself.
  • A memory pool helps when there are many same-sized objects that are born and die „chaotically“: the cache of the normal allocator gets fragmented and searching through it gets slow, as the slabs (or whatever they are called) need to be split and joined again. But this is not the case here ‒ there is usually a single Context object per query, and it is returned right back. So the pool hands out the same object over and over again, but the standard allocator probably doesn't get confused by this pattern either and has a block of memory at hand every time (see the sketch below).

So the real overhead would be having a shared pointer and virtual methods here, if any.
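
For illustration (a hypothetical sketch, not code from the trac1753 branch):
the pool only changes where the memory chunk comes from; the constructor and
destructor of the context run in both variants, so the per-query work is
nearly the same.

// Hypothetical illustration; Context and the two helpers are not real
// BIND 10 code.
#include <boost/pool/object_pool.hpp>

struct Context {
    Context() { /* the same initialization work runs in both variants */ }
};

void oneQueryWithPool(boost::object_pool<Context>& pool) {
    Context* ctx = pool.construct();   // chunk from the pool + constructor
    // ... use ctx to answer the query ...
    pool.destroy(ctx);                 // destructor + chunk back to the pool
}

void oneQueryWithNew() {
    Context* ctx = new Context;        // malloc + constructor
    // ... use ctx to answer the query ...
    delete ctx;                        // destructor + free
    // With at most one Context live per query, malloc tends to hand back the
    // same recently freed block anyway, so the pool has little room to win.
}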

I don't mind dropping the idea at this point. But I'd at least try
the comparison again after #1767. Maybe if we eliminate the
construction overhead of the base class we see some difference.

OK, I agree here. But we really need the „dormant“ state of tickets for this O:-).

comment:8 Changed 8 years ago by jinmei

Replying to vorner:

Thanks for the review (if it counts as one), or at least for the
comments.

Replying to jinmei:

I believe the implementation is mostly straightforward, although it
has some dirty kludge due to the basic design issue of the entire
in-memory data source implementation (the "finder" does too many
things, and the relationship between client/finder/rrset is not well
clarified). I'd suggest revisiting these things as part of the
inevitable cleanup/refactoring.

Well, I don't find it too problematic, but the new check for compatible pointer breaks tests in Query:

[----------] 83 tests from QueryTest
[ RUN      ] QueryTest.noZone
terminate called after throwing an instance of 'isc::InvalidParameter'
  what():  NULL or incompatible pointer is passed to InMemoryClient::addZone()
/bin/sh: line 5: 29155 Aborted                 (core dumped) ${dir}$tst
FAIL: run_unittests
===================================

Ah, right, I was lazy and didn't check that. Although it may now be moot,
I've fixed it within this branch. Such a cleanup would be a good thing
even if the main feature of this branch is dropped.

But this task has a more fundamental issue: my local benchmark showed
it didn't improve performance much. In fact, sometimes it could be
slower than the code without this change. Some results:

Well, here's what I make of what I found about the pool:

  • The constructor is still run, so there isn't less code to execute. The only saving would be the allocation of the memory block itself.
  • A memory pool helps when there are many same-sized objects that are born and die „chaotically“: the cache of the normal allocator gets fragmented and searching through it gets slow, as the slabs (or whatever they are called) need to be split and joined again. But this is not the case here ‒ there is usually a single Context object per query, and it is returned right back. So the pool hands out the same object over and over again, but the standard allocator probably doesn't get confused by this pattern either and has a block of memory at hand every time.

Yeah, that's basically my theory too.

So the real overhead would be having a shared pointer and virtual methods here, if any.

I don't mind dropping the idea at this point. But I'd at least try
the comparison again after #1767. Maybe if we eliminate the
construction overhead of the base class we see some difference.

OK, I agree here. But we really need the „dormant“ state of tickets for this O:-).

For now I'll remove this ticket from the review queue but keep it assigned
to me (so as not to waste others' time peeking into it).

comment:9 Changed 8 years ago by jinmei

  • Status changed from reviewing to accepted

comment:10 Changed 8 years ago by jinmei

  • Status changed from accepted to assigned

comment:11 Changed 8 years ago by jinmei

  • Priority changed from high to medium

comment:12 follow-up: Changed 8 years ago by shane

Just to be clear, this ticket is blocked, waiting to be looked at again after #1767, right? Should we remove this from the current sprint? (#1767 already mentions this ticket so it should not be forgotten.)

comment:13 in reply to: ↑ 12 Changed 8 years ago by jinmei

Replying to shane:

Just to be clear, this ticket is blocked, waiting to be looked at again
after #1767, right?

Right.

Should we remove this from the current sprint? (#1767 already mentions this ticket so it should not be forgotten.)

I don't know. I thought #1767 was intended to be addressed pretty
soon (it's a kind of 'hardening' task), so if, for example, we
include #1767 in the next sprint, it's more reasonable to keep this
ticket here and carry it over to the next sprint.

If it's too noisy to keep this ticket in the current sprint, I don't
mind removing it, but I'm afraid we'll then forget to re-include it
along with #1767.

comment:14 Changed 8 years ago by jinmei

  • Milestone changed from Sprint-20120417 to Year 3 Task Backlog

(moving it to the backlog as #1767 hasn't been adopted for this sprint)

comment:15 Changed 6 years ago by stephen

  • Milestone set to DNS Outstanding Tasks

comment:16 Changed 5 years ago by tomek

  • Resolution set to wontfix
  • Status changed from assigned to closed

The DNS and BIND10 framework is outside the scope of the Kea project.
The corresponding code has been removed from the Kea git repository.
If you want to follow up on DNS or former BIND10 issues, see the
http://bundy-dns.de project.

Closing ticket.
