Opened 8 years ago
Closed 5 years ago
#1753 closed task (wontfix)
use object pool for in-memory finder contexts
Reported by: | jinmei | Owned by: | jinmei |
---|---|---|---|
Priority: | medium | Milestone: | DNS Outstanding Tasks |
Component: | data source | Version: | |
Keywords: | | Cc: | |
CVSS Scoring: | | Parent Tickets: | |
Sensitive: | no | Defect Severity: | N/A |
Sub-Project: | DNS | Feature Depending on Ticket: | auth performance |
Estimated Difficulty: | 5 | Add Hours to Ticket: | 0 |
Total Hours: | 0 | Internal?: | no |
Description
By default, ZoneFinder::Context needs to be allocated from the heap
(by new) for every call to find(). The cost would not be negligible
for a performance-sensitive version such as the one for the in-memory
data source.
I'd suggest introducing an object pool such as boost::object_pool
so the major resources of a Context can be reused over multiple calls to
find(). I'd maintain the pool in the InMemoryClient that created
the finder and use the same pool for all zones.
The in-memory version of the Context class will need to be "resettable" so
we can release the resources held in the context when it goes back to the pool.
This will also require some modifications to the base Context class.
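To make the proposal concrete, here is a minimal sketch (not actual BIND 10 code) of the kind of pooling described above, using boost::object_pool. The Context stand-in, its reset() hook, and the InMemoryClientLike/acquireContext()/releaseContext() names are all invented for illustration; the real candidate for holding the pool would be InMemoryClient, behind the existing ZoneFinder::Context interface.

#include <boost/pool/object_pool.hpp>
#include <string>
#include <vector>

// Stand-in for the in-memory ZoneFinder::Context; reset() releases the
// per-query resources before the object goes back to the pool.
struct Context {
    std::vector<std::string> answer_rrsets;  // illustrative per-query state
    void reset() { answer_rrsets.clear(); }
};

// Hypothetical holder of the shared pool (in the real code this role would
// fall to InMemoryClient, with one pool used for all of its zones).
class InMemoryClientLike {
public:
    Context* acquireContext() {
        return pool_.construct();   // runs Context's constructor
    }
    void releaseContext(Context* ctx) {
        ctx->reset();               // drop per-query resources
        pool_.destroy(ctx);         // runs destructor; memory stays in the pool
    }
private:
    boost::object_pool<Context> pool_;
};

int main() {
    InMemoryClientLike client;
    for (int i = 0; i < 3; ++i) {   // e.g. one context per find() call
        Context* ctx = client.acquireContext();
        ctx->answer_rrsets.push_back("example.org. 3600 IN A 192.0.2.1");
        client.releaseContext(ctx);
    }
    return 0;
}

Note that object_pool only recycles the memory block; the constructor and destructor still run on every construct()/destroy(), so the Context itself also needs to be cheap to set up.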
Subtickets
Change History (16)
comment:1 Changed 8 years ago by jelte
- Estimated Difficulty changed from 0 to 5
comment:2 Changed 8 years ago by jelte
- Milestone changed from Next-Sprint-Proposed to Sprint-20120320
comment:3 Changed 8 years ago by jinmei
- Owner set to jinmei
- Status changed from new to assigned
comment:4 follow-up: ↓ 7 Changed 8 years ago by jinmei
I believe the implementation is mostly straightforward, although it
has some dirty kludges due to a basic design issue in the entire
in-memory data source implementation (the "finder" does too many
things, and the relationship between client/finder/rrset is not well
clarified). I'd suggest revisiting these things as part of the
inevitable cleanup/refactoring.
But this task has a more fundamental issue: my local benchmark showed
it didn't improve performance much. In fact, sometimes it could be
slower than the code without this change. Some results:
I used a simplified (without RRSIGs) root zone snapshot and performed
queryperf and querybench using some realistic query samples and a
single NXDOMAIN-only query (with the latter, find() would be called
twice and other overhead such as name compression should be small, so
that case should basically favor this optimization).
I created two branches for the experiment, trac1753bench0 and
trac1753bench1. Both include the major optimizations we have made so far
(including the ones not yet merged to master), and bench1 contains the
changes in trac1753 (bench0 doesn't).
queryperf result:
querybench result:
Basically, I didn't see any significant difference between the two
implementations.
I don't mind dropping the idea at this point. But I'd at least try
the comparison again after #1767. Maybe if we eliminate the
construction overhead of the base class we see some difference.
I'll put this in the review queue anyway.
comment:5 Changed 8 years ago by jinmei
- Owner changed from jinmei to UnAssigned
- Status changed from assigned to reviewing
comment:6 Changed 8 years ago by vorner
- Owner changed from UnAssigned to vorner
comment:7 in reply to: ↑ 4 Changed 8 years ago by vorner
- Owner changed from vorner to jinmei
Hello
Replying to jinmei:
I believe the implementation is mostly straightforward, although it
has some dirty kludges due to a basic design issue in the entire
in-memory data source implementation (the "finder" does too many
things, and the relationship between client/finder/rrset is not well
clarified). I'd suggest revisiting these things as part of the
inevitable cleanup/refactoring.
Well, I don't find it too problematic, but the new compatible-pointer check breaks the tests in Query:
[----------] 83 tests from QueryTest
[ RUN ] QueryTest.noZone
terminate called after throwing an instance of 'isc::InvalidParameter'
  what():  NULL or incompatible pointer is passed to InMemoryClient::addZone()
/bin/sh: line 5: 29155 Aborted (core dumped) ${dir}$tst
FAIL: run_unittests
===================================
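(As an aside, purely for illustration and not the actual BIND 10 code: a compatibility check producing the error above could look roughly like the sketch below, with the stand-in classes invented here and isc::InvalidParameter replaced by std::invalid_argument so the example stays self-contained.)

#include <boost/shared_ptr.hpp>
#include <stdexcept>

// Minimal polymorphic finder hierarchy so dynamic_pointer_cast has something
// to check against; these are stand-ins, not the real BIND 10 classes.
class ZoneFinder { public: virtual ~ZoneFinder() {} };
class InMemoryZoneFinder : public ZoneFinder {};
typedef boost::shared_ptr<ZoneFinder> ZoneFinderPtr;

// Sketch of the kind of check addZone() might perform: reject a NULL pointer
// or a finder that is not actually an in-memory finder.
void addZone(const ZoneFinderPtr& zone_finder) {
    if (!zone_finder ||
        !boost::dynamic_pointer_cast<InMemoryZoneFinder>(zone_finder)) {
        throw std::invalid_argument(
            "NULL or incompatible pointer is passed to InMemoryClient::addZone()");
    }
    // ...register the finder in the client's zone table...
}

int main() {
    addZone(ZoneFinderPtr(new InMemoryZoneFinder()));  // accepted
    try {
        addZone(ZoneFinderPtr());   // rejected, as in the failing test above
    } catch (const std::invalid_argument&) {
        // the test setup hit this case without expecting the exception
    }
    return 0;
}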
But this task has a more fundamental issue: my local benchmark showed
it didn't improve performance much. In fact, sometimes it could be
slower than the code without this change. Some results:
Well, here's what I gather from what I found about the pool:
- The constructor is still run, so there is no less code executed for that. The only advantage would be in the allocation of the memory block for it.
- A memory pool helps when there are many same-sized objects that are born and die "chaotically". The memory cache of the normal allocator gets fragmented, and sifting through it gets slow, as the slabs (or whatever they are called) need to be split and joined again. But that is not the case here: there's usually a single Context object per query, and it is returned right back. So the same object would be handed out over and over again by the pool, but the standard allocator probably doesn't get confused by this either and has a bit of memory at hand for it every time.
So the real overhead, if any, would be having a shared pointer and virtual methods here.
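To illustrate the allocation pattern described above, here is a rough, self-contained micro-benchmark sketch (written for this write-up, not taken from the ticket's branches; the trivial Context stand-in and the query count are arbitrary) comparing plain new/delete per "query" with boost::object_pool:

#include <boost/pool/object_pool.hpp>
#include <chrono>
#include <iostream>

struct Context { int state[16]; };   // trivial stand-in for a finder context

int main() {
    const int kQueries = 1000000;    // one allocation per simulated query

    // Plain heap allocation, freed right away, as in the current code path.
    std::chrono::steady_clock::time_point t0 = std::chrono::steady_clock::now();
    for (int i = 0; i < kQueries; ++i) {
        Context* ctx = new Context();
        delete ctx;
    }
    std::chrono::steady_clock::time_point t1 = std::chrono::steady_clock::now();

    // Pooled allocation; the constructor still runs for every object.
    boost::object_pool<Context> pool;
    for (int i = 0; i < kQueries; ++i) {
        Context* ctx = pool.construct();
        pool.destroy(ctx);
    }
    std::chrono::steady_clock::time_point t2 = std::chrono::steady_clock::now();

    std::cout << "new/delete:  "
              << std::chrono::duration_cast<std::chrono::milliseconds>(t1 - t0).count()
              << " ms\n"
              << "object_pool: "
              << std::chrono::duration_cast<std::chrono::milliseconds>(t2 - t1).count()
              << " ms\n";
    return 0;
}

With this pattern both variants pay the constructor cost, and the standard allocator tends to hand back the same block every time, which matches the observation that the pool brings little benefit here.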
I don't mind dropping the idea at this point. But I'd at least try
the comparison again after #1767. Maybe if we eliminate the
construction overhead of the base class we see some difference.
OK, I agree here. But we really need the "dormant" state for tickets for this O:-).
comment:8 Changed 8 years ago by jinmei
Replying to vorner:
Thanks for the review (if you reviewed it), or at least for the comments.
Replying to jinmei:
I believe the implementation is mostly straightforward, although it
has some dirty kludges due to a basic design issue in the entire
in-memory data source implementation (the "finder" does too many
things, and the relationship between client/finder/rrset is not well
clarified). I'd suggest revisiting these things as part of the
inevitable cleanup/refactoring.
Well, I don't find it too problematic, but the new compatible-pointer check breaks the tests in Query:
[----------] 83 tests from QueryTest
[ RUN ] QueryTest.noZone
terminate called after throwing an instance of 'isc::InvalidParameter'
  what():  NULL or incompatible pointer is passed to InMemoryClient::addZone()
/bin/sh: line 5: 29155 Aborted (core dumped) ${dir}$tst
FAIL: run_unittests
===================================
Ah, right, I was lazy and didn't check that. Although it may now be moot,
I've fixed it within this branch. Such a cleanup would be a good thing
even if the main feature of this branch is dropped.
But this task has a more fundamental issue: my local benchmark showed
it didn't improve performance much. In fact, sometimes it could be
slower than the code without this change. Some results:
Well, here's what I gather from what I found about the pool:
- The constructor is still run, so there is no less code executed for that. The only advantage would be in the allocation of the memory block for it.
- A memory pool helps when there are many same-sized objects that are born and die "chaotically". The memory cache of the normal allocator gets fragmented, and sifting through it gets slow, as the slabs (or whatever they are called) need to be split and joined again. But that is not the case here: there's usually a single Context object per query, and it is returned right back. So the same object would be handed out over and over again by the pool, but the standard allocator probably doesn't get confused by this either and has a bit of memory at hand for it every time.
Yeah, that's basically my theory too.
So the real overhead, if any, would be having a shared pointer and virtual methods here.
I don't mind dropping the idea at this point. But I'd at least try
the comparison again after #1767. Maybe if we eliminate the
construction overhead of the base class we see some difference.
OK, I agree here. But we really need the "dormant" state for tickets for this O:-).
For now I'll remove this ticket from the review queue but keep it assigned
to me (so as not to waste others' time peeking into it).
comment:9 Changed 8 years ago by jinmei
- Status changed from reviewing to accepted
comment:10 Changed 8 years ago by jinmei
- Status changed from accepted to assigned
comment:11 Changed 8 years ago by jinmei
- Priority changed from high to medium
comment:12 follow-up: ↓ 13 Changed 8 years ago by shane
comment:13 in reply to: ↑ 12 Changed 8 years ago by jinmei
Replying to shane:
Just to be clear, this ticket is blocked, waiting to be looked at again
after #1767, right?
Right.
Should we remove this from the current sprint? (#1767 already mentions this ticket so it should not be forgotten.)
I don't know. I thought #1767 was intended to be addressed pretty
soon (it's a kind of 'hardening' task), so if, for example, we
include #1767 in the next sprint, it's more reasonable to keep this
ticket here and carry it over to the next sprint.
If it's too noisy to keep this ticket in the current sprint, I don't
mind removing it, but I'm afraid we'll then forget to re-include it
along with #1767.
comment:14 Changed 8 years ago by jinmei
- Milestone changed from Sprint-20120417 to Year 3 Task Backlog
(moving it to the backlog as #1767 hasn't been adopted for this sprint)
comment:15 Changed 6 years ago by stephen
- Milestone set to DNS Outstanding Tasks
comment:16 Changed 5 years ago by tomek
- Resolution set to wontfix
- Status changed from assigned to closed
The DNS and BIND10 framework is outside the scope of the Kea project.
The corresponding code has been removed from the Kea git repository.
If you want to follow up on DNS or former BIND10 issues, see the
http://bundy-dns.de project.
Closing ticket.