Opened 8 years ago

Closed 8 years ago

Last modified 8 years ago

#1607 closed task (complete)

define ZoneFinder::findAdditional() and implement its default version

Reported by: jinmei Owned by: jinmei
Priority: medium Milestone: Sprint-20120320
Component: b10-auth Version:
Keywords: Cc:
CVSS Scoring: Parent Tickets:
Sensitive: no Defect Severity: N/A
Sub-Project: DNS Feature Depending on Ticket: auth performance
Estimated Difficulty: 4 Add Hours to Ticket: 0
Total Hours: 30.87 Internal?: no

Description

This is a subtask for "Pre-establish shortcuts for additional section processing" described in
https://lists.isc.org/pipermail/bind10-dev/2012-January/002985.html

We add a new method findAdditional() to the ZoneFinder class. It
would look like as follows:

void
ZoneFinder::findAdditional(const isc::dns::AbstractRRset& rrset,
                           const std::vector<isc::dns::RRType>& requested_types,
                           std::vector<isc::dns::ConstRRsetPtr>& result,
                           FindOptions options);

We provide the default implementation of it. The default version
behaves Query::addAdditional() and Query::addAdditionalAddrs()
currently do. Query::addAdditional() will now just call
the new method. requested_types stores the desired RR types to be
returned as associated (additional) data for rrset. In practice
it will be either or both of A and AAAA. findAdditional() returns any
found RRsets in the result variable. In options we can specify
whether glue is okay or not.

With this update Query::addAdditionalAddrs() will become unnecessary
so should be removed.

Subtickets

Change History (21)

comment:1 Changed 8 years ago by jinmei

  • Type changed from defect to task

comment:2 Changed 8 years ago by jelte

  • Milestone changed from Year 3 Task Backlog to Next-Sprint-Proposed

comment:3 Changed 8 years ago by jelte

  • Estimated Difficulty changed from 0 to 4

comment:4 Changed 8 years ago by jelte

  • Milestone changed from Year 3 Task Backlog to Next-Sprint-Proposed

comment:5 Changed 8 years ago by jelte

  • Milestone changed from Next-Sprint-Proposed to Sprint-20120306

comment:6 Changed 8 years ago by jinmei

  • Owner set to jinmei
  • Status changed from new to accepted

comment:7 Changed 8 years ago by jinmei

trac1607 is ready for review.

I've extended the scope of this ticket as discussed on the dev list.
See the one with the subject of "proposal: a performance extendable
auth query logic framework".

Unfortunately the amount of diff is big (again...), but hopefully it's
not as bad as it might look; most part of the changes are
straightforward refactoring due to the interface change.

  • the first two commits are relatively straightforward refactoring but the diff is big, so it's probably better to review these separately.
  • 118f1f9 and ac29935 are unrelated extension for test utility and can be reviewed separately.
  • from 4bcec5f to 1633572 (excluding 118f1f9 and ac29935) are the main changes for this branch.
  • 29fab38 and 6f31530 are updates to the auth::Query class due to the interface change. I believe these are straightforward.
  • 64611b9 is an unrelated optimization. If it looks too out-of-scope I'm okay with excluding it.

This is the proposed changelog entry:

392.?	[func]*		jinmei
	libdatasrc: change the return type of ZoneFinder::find() so it can
	contain more context of the search, which can be used for
	optimizing post find() processing.  A new method getAdditional()
	is added to it for finding additional RRsets based on the result
	of find().  External behavior shouldn't change.  The query
	handling code of b10-auth now uses the new interface.
	(Trac #1607, git TBD)

comment:8 Changed 8 years ago by jinmei

  • Owner changed from jinmei to UnAssigned
  • Status changed from accepted to reviewing

comment:9 Changed 8 years ago by vorner

  • Owner changed from UnAssigned to vorner

comment:10 follow-up: Changed 8 years ago by vorner

  • Owner changed from vorner to jinmei

I'm thinking, if we're returning such a complicated result already, do we need a special interface for findAll? Wouldn't it be better to just have the context be able to fill all the results into a target passed to it, no matter if there's one or many? The context already has a copy of the vector for the findAll case anyway and it could simplify processing both inside the context and outside.

Another thing, maybe we want to have a abstract base class that contains no data members like the vector for the all thing. I could imagine the in-memory remembering the node only and iterating that one when needed, instead of copying it out to a vector to iterate later.

The python interface is not changed. Is it planned to any other ticket? Because it would not be possible to call the getAdditional, etc, from python right now.

Also, could a test check that the vector passed to getAdditional is not cleaned by it, that is, the data are just appended? I think all tests are done with empty vector now.

I noticed an indentation problem here:

@@ -821,17 +826,17 @@ DatabaseClient::Finder::findNoNameResult(const Name& name, const RRType& type,
             arg(accessor_->getDBName()).arg(name);
         const ConstRRsetPtr nsec = dnssec_data ? findNSECCover(name) :
             ConstRRsetPtr();
-        return (FindResult(NXRRSET, nsec,
-                           nsec ? RESULT_NSEC_SIGNED : RESULT_DEFAULT));
+        return (ResultContext(NXRRSET, nsec,
+                        nsec ? RESULT_NSEC_SIGNED : RESULT_DEFAULT));
     } else if ((options & NO_WILDCARD) == 0) {

comment:11 in reply to: ↑ 10 ; follow-up: Changed 8 years ago by jinmei

Replying to vorner:

Thanks for the review.

I'm thinking, if we're returning such a complicated result already, do we need a special interface for findAll? Wouldn't it be better to just have the context be able to fill all the results into a target passed to it, no matter if there's one or many? The context already has a copy of the vector for the findAll case anyway and it could simplify processing both inside the context and outside.

Hmm, good point. I didn't think about it but that seems to make some
sense. I personally still think it makes sense to define findAll()
separately from find() rather than relying on the implicit meaning of
qtype=ANY, but at least we can unify the interface.

Would you like to introduce the change now, or should it go to a
separate ticket?

Another thing, maybe we want to have a abstract base class that contains no data members like the vector for the all thing. I could imagine the in-memory remembering the node only and iterating that one when needed, instead of copying it out to a vector to iterate later.

Right. I was aware of that, and in fact I liked the clearer
separation of abstract interface from stuff that may not be used by
all derived classes. I still pushed vector, etc, to the base class
because if a derived class wants partial customization (e.g., use
customized version for getNegativeProof() if it's ever introduced but
still use the default version of getAdditional()), it'll still needs
to get access to things for the default version. I also expect if we
extend the context it may need to remember other information such as
the original qname (in case wildcard expansion happens), and so the
base class would retain a pointer blob, such as:

class Context {
...
   BaseContextData* base_data;
};

and allow a fully-customized implementation to keep it NULL.

Alternatively, we could define a protected method such as
getAllRRsets(), getFindOptions(), etc, and the default implementation
of getAdditional() would use getAllRRsets() to get access to the
RRsets that were possibly found by findAll(). Then we can move the
member variables to a separate derived class:

class Context {
...
protected:
    virtual vector<ConstRRset>* getAllRRsets() const = 0;
    virtual getAdditional(...) {
        // if this context was created from findAll():
        allsets = getAllRRsets();
        for_each(allsets->begin(), allsets->end(), ...);
    }
};

class DefaultContext {
protected:
    virtual vector<ConstRRset>* getAllRRsets() const {
         return (all_sets_); 
    }

private:
    vector<ConstRRsets> all_sets_;
};

class InMemoryContext {
protected:
    // it fully implements getAdditional() internally.

    virtual vector<ConstRRset>* getAllRRsets() const {
         return (NULL); 
    }
};

Regarding this, my suggestion is to keep it for now, and see how
easy/difficult to implement the customized in-memory version, and then
think about better reorganization based on the experience.

The python interface is not changed. Is it planned to any other ticket? Because it would not be possible to call the getAdditional, etc, from python right now.

I was aware of that, too (but I should've mentioned it). Right now
I'm not sure if we want to provide python interface for these. The
context class and its methods are basically intended for allowing
actual data source implementation to introduce customized (often
optimized) behavior. The services available via the context can be
achieved using the existing ZoneFinder?() interface. So, right now, I
don't see much need for the python wrapper of the context interface.

On the other hand, in terms of completeness it's generally better to
provide the same interface for both C++ and python.

Maybe we should just create a ticket mentioning the possibility so we
can revisit the point later. If we see as stronger need at that
point, we can add it; if not, maybe that indicates it's unlikely that
python apps need it and we can forget it for a relatively longer term.

Also, could a test check that the vector passed to getAdditional is not cleaned by it, that is, the data are just appended? I think all tests are done with empty vector now.

Good point, added it.

I noticed an indentation problem here:

@@ -821,17 +826,17 @@ DatabaseClient::Finder::findNoNameResult(const Name& name, const RRType& type,
             arg(accessor_->getDBName()).arg(name);
         const ConstRRsetPtr nsec = dnssec_data ? findNSECCover(name) :
             ConstRRsetPtr();
-        return (FindResult(NXRRSET, nsec,
-                           nsec ? RESULT_NSEC_SIGNED : RESULT_DEFAULT));
+        return (ResultContext(NXRRSET, nsec,
+                        nsec ? RESULT_NSEC_SIGNED : RESULT_DEFAULT));
     } else if ((options & NO_WILDCARD) == 0) {

Thanks, fixed.

comment:12 Changed 8 years ago by jinmei

  • Owner changed from jinmei to vorner

comment:13 in reply to: ↑ 11 ; follow-up: Changed 8 years ago by vorner

  • Owner changed from vorner to jinmei

Hello

Replying to jinmei:

Hmm, good point. I didn't think about it but that seems to make some
sense. I personally still think it makes sense to define findAll()
separately from find() rather than relying on the implicit meaning of
qtype=ANY, but at least we can unify the interface.

Agreed.

Would you like to introduce the change now, or should it go to a
separate ticket?

Hmm, I don't really want to slow the ticket down too much, but it would be nice to have all the interface changes at once, if at all possible. So, if it isn't too much work, I'd prefer it now.

Another thing, maybe we want to have a abstract base class that contains no data members like the vector for the all thing. I could imagine the in-memory remembering the node only and iterating that one when needed, instead of copying it out to a vector to iterate later.

Right. I was aware of that, and in fact I liked the clearer
separation of abstract interface from stuff that may not be used by
all derived classes. I still pushed vector, etc, to the base class
because if a derived class wants partial customization (e.g., use
customized version for getNegativeProof() if it's ever introduced but
still use the default version of getAdditional()), it'll still needs
to get access to things for the default version. I also expect if we
extend the context it may need to remember other information such as
the original qname (in case wildcard expansion happens), and so the
base class would retain a pointer blob, such as:
and allow a fully-customized implementation to keep it NULL.

Alternatively, we could define a protected method such as
getAllRRsets(), getFindOptions(), etc, and the default implementation
of getAdditional() would use getAllRRsets() to get access to the
RRsets that were possibly found by findAll(). Then we can move the
member variables to a separate derived class:

Well, my original idea was to have the abstract base class without any guts. Then there'd be a default base class (something like the current version), that would be used if someone wanted no customization or just a partial customization. And if some datasource felt really brave, it could go and do full customization on top of the very abstract one.

I don't think it makes sense to do anything too complicated because of the vector, if they'll be reused, an empty vector that is just sitting there is not a big problem.

Regarding this, my suggestion is to keep it for now, and see how
easy/difficult to implement the customized in-memory version, and then
think about better reorganization based on the experience.

OK, no problem.

The python interface is not changed. Is it planned to any other ticket? Because it would not be possible to call the getAdditional, etc, from python right now.

[...]

Maybe we should just create a ticket mentioning the possibility so we
can revisit the point later. If we see as stronger need at that
point, we can add it; if not, maybe that indicates it's unlikely that
python apps need it and we can forget it for a relatively longer term.

OK, that sounds reasonable. If someone wants decent speed with python… Hmm, right, makes no sense anyway. A ticket seems enough (maybe with the ticket number mentioned in the wrappers code somewhere).

Thanks

comment:14 in reply to: ↑ 13 Changed 8 years ago by jinmei

Replying to vorner:

Would you like to introduce the change now, or should it go to a
separate ticket?

Hmm, I don't really want to slow the ticket down too much, but it would be nice to have all the interface changes at once, if at all possible. So, if it isn't too much work, I'd prefer it now.

On thinking about it further, I'd like to propose deferring this after
the cleaner abstraction (which would also be deferred to post #1608)
and #1747. Depending on the former, we may change how the application
gets the result (with the pure abstract model it will need to get the
resulting RRset, etc, via an accessor method, not just by referring to
a const member object). Also, #1747 will give a more concrete view on
how the application wants to get information from the context.

If this is okay I'll create a ticket and keep this branch intact.

Well, my original idea was to have the abstract base class without any guts. Then there'd be a default base class (something like the current version), that would be used if someone wanted no customization or just a partial customization. And if some datasource felt really brave, it could go and do full customization on top of the very abstract one.

I don't think it makes sense to do anything too complicated because of the vector, if they'll be reused, an empty vector that is just sitting there is not a big problem.

Okay, not fully thought about it, but this model seems quite clean to
me. In any case, I propose to defer this work to the separate ticket
I suggested above.

If these plans are acceptable, I think the branch is now okay for
merge with creating deferred tickets. If misunderstand it or if you
don't agree with the plan please let me know.

comment:15 Changed 8 years ago by jinmei

  • Owner changed from jinmei to vorner

comment:16 follow-up: Changed 8 years ago by vorner

  • Total Hours changed from 0 to 3.37

I think that is OK, please merge.

comment:17 Changed 8 years ago by vorner

  • Owner changed from vorner to jinmei

comment:18 in reply to: ↑ 16 Changed 8 years ago by jinmei

Replying to vorner:

I think that is OK, please merge.

Okay thanks. Merge done, tickets created, closing.

comment:19 Changed 8 years ago by jinmei

  • Resolution set to complete
  • Status changed from reviewing to closed
  • Total Hours changed from 3.37 to 30.87

comment:20 follow-up: Changed 8 years ago by jreed

Here are some qps results:

Before:

4a82ece4c4e353ab8e3ceb01fd8e0f4824ac6bbf builtin--.nxdomain 3692.754604 qps
4a82ece4c4e353ab8e3ceb01fd8e0f4824ac6bbf builtin--.soa 3409.415857 qps
4a82ece4c4e353ab8e3ceb01fd8e0f4824ac6bbf builtin--.success 23433.920469 qps
4a82ece4c4e353ab8e3ceb01fd8e0f4824ac6bbf largehost-sqlite3-.nxdomain 0.008074 qps
4a82ece4c4e353ab8e3ceb01fd8e0f4824ac6bbf largehost-sqlite3-.soa 1657.549388 qps
4a82ece4c4e353ab8e3ceb01fd8e0f4824ac6bbf largehost-sqlite3-.success 1555.479221 qps
4a82ece4c4e353ab8e3ceb01fd8e0f4824ac6bbf root-memory-.nxdomain 28050.868567 qps
4a82ece4c4e353ab8e3ceb01fd8e0f4824ac6bbf root-memory-.soa 5841.863467 qps
4a82ece4c4e353ab8e3ceb01fd8e0f4824ac6bbf root-memory-.success 5162.483596 qps
4a82ece4c4e353ab8e3ceb01fd8e0f4824ac6bbf root-sqlite3-nocache.nxdomain 51.393433 qps
4a82ece4c4e353ab8e3ceb01fd8e0f4824ac6bbf root-sqlite3-nocache.soa 386.524202 qps
4a82ece4c4e353ab8e3ceb01fd8e0f4824ac6bbf root-sqlite3-nocache.success 407.885328 qps
4a82ece4c4e353ab8e3ceb01fd8e0f4824ac6bbf root-sqlite3-.nxdomain 52.398443 qps
4a82ece4c4e353ab8e3ceb01fd8e0f4824ac6bbf root-sqlite3-.soa 364.882712 qps
4a82ece4c4e353ab8e3ceb01fd8e0f4824ac6bbf root-sqlite3-.success 396.496054 qps
4a82ece4c4e353ab8e3ceb01fd8e0f4824ac6bbf smallzone-memory-.nxdomain 26705.076400 qps
4a82ece4c4e353ab8e3ceb01fd8e0f4824ac6bbf smallzone-memory-.soa 23378.067109 qps
4a82ece4c4e353ab8e3ceb01fd8e0f4824ac6bbf smallzone-memory-.success 25052.125958 qps
4a82ece4c4e353ab8e3ceb01fd8e0f4824ac6bbf smallzone-sqlite3-nocache.nxdomain 2523.513468 qps
4a82ece4c4e353ab8e3ceb01fd8e0f4824ac6bbf smallzone-sqlite3-nocache.soa 2707.114333 qps
4a82ece4c4e353ab8e3ceb01fd8e0f4824ac6bbf smallzone-sqlite3-nocache.success 2971.678972 qps

After:

2e940ea65d5b9f371c26352afd9e66719c38a6b9 builtin--.nxdomain 3071.227113 qps
2e940ea65d5b9f371c26352afd9e66719c38a6b9 builtin--.soa 3423.179190 qps
2e940ea65d5b9f371c26352afd9e66719c38a6b9 builtin--.success 22562.482508 qps
2e940ea65d5b9f371c26352afd9e66719c38a6b9 largehost-memory-.nxdomain 22928.230329 qps
2e940ea65d5b9f371c26352afd9e66719c38a6b9 largehost-memory-.soa 16004.200526 qps
2e940ea65d5b9f371c26352afd9e66719c38a6b9 largehost-memory-.success 16294.292123 qps
2e940ea65d5b9f371c26352afd9e66719c38a6b9 largehost-sqlite3-.nxdomain 0.008065 qps
2e940ea65d5b9f371c26352afd9e66719c38a6b9 largehost-sqlite3-.soa 1542.643974 qps
2e940ea65d5b9f371c26352afd9e66719c38a6b9 largehost-sqlite3-.success 1459.850292 qps
2e940ea65d5b9f371c26352afd9e66719c38a6b9 root-memory-.nxdomain 26893.654851 qps
2e940ea65d5b9f371c26352afd9e66719c38a6b9 root-memory-.soa 5568.907072 qps
2e940ea65d5b9f371c26352afd9e66719c38a6b9 root-memory-.success 5326.642048 qps
2e940ea65d5b9f371c26352afd9e66719c38a6b9 root-sqlite3-nocache.nxdomain 51.312237 qps
2e940ea65d5b9f371c26352afd9e66719c38a6b9 root-sqlite3-nocache.soa 375.367218 qps
2e940ea65d5b9f371c26352afd9e66719c38a6b9 root-sqlite3-nocache.success 405.669859 qps
2e940ea65d5b9f371c26352afd9e66719c38a6b9 root-sqlite3-.nxdomain 56.554757 qps
2e940ea65d5b9f371c26352afd9e66719c38a6b9 root-sqlite3-.soa 386.966231 qps
2e940ea65d5b9f371c26352afd9e66719c38a6b9 root-sqlite3-.success 418.413817 qps
2e940ea65d5b9f371c26352afd9e66719c38a6b9 smallzone-memory-.nxdomain 26824.650266 qps
2e940ea65d5b9f371c26352afd9e66719c38a6b9 smallzone-memory-.soa 22369.809276 qps
2e940ea65d5b9f371c26352afd9e66719c38a6b9 smallzone-memory-.success 24351.056519 qps
2e940ea65d5b9f371c26352afd9e66719c38a6b9 smallzone-sqlite3-nocache.nxdomain 2501.184361 qps
2e940ea65d5b9f371c26352afd9e66719c38a6b9 smallzone-sqlite3-nocache.soa 2727.912269 qps
2e940ea65d5b9f371c26352afd9e66719c38a6b9 smallzone-sqlite3-nocache.success 2961.104635 qps

Some explanations of these tests are at DnsBenchmarks.

Basically everything had a decrease except for successful delegations for root zone. (Note that the largehost-memory results weren't done for the first test. It timed out and I didn't look further.)

comment:21 in reply to: ↑ 20 Changed 8 years ago by jinmei

Replying to jreed:

Here are some qps results:

[...]

Some explanations of these tests are at DnsBenchmarks.

Basically everything had a decrease except for successful delegations for root zone. (Note that the largehost-memory results weren't done for the first test. It timed out and I didn't look further.)

Thanks, from a quick look they are within the range of my expectation.
I believe #1608 (when completed) will outweigh the overhead with a
large margin, and #1753 should mostly eliminate the overhead itself
(at least for in-memory).

Note: See TracTickets for help on using tickets.