Opened 8 years ago

Closed 8 years ago

#1688 closed task (fixed)

duplicate RR suprression in auth::Query

Reported by: jinmei Owned by: jinmei
Priority: medium Milestone: Sprint-20120403
Component: b10-auth Version:
Keywords: Cc:
CVSS Scoring: Parent Tickets:
Sensitive: no Defect Severity: N/A
Sub-Project: DNS Feature Depending on Ticket:
Estimated Difficulty: 5 Add Hours to Ticket: 0
Total Hours: 5.5 Internal?: no

Description (last modified by jinmei)

There are several cases where an answer from b10-auth can have
duplicate RRs (sometimes in the same section, sometimes over
multiple sections). We catch and suppress the possible duplicates
separately (avoid adding duplicate A/AAAA in answer and additional,
avoid adding duplicate NSECs) and sometimes just ignore such cases.

It would be better to unify such processing. Also, since duplicate
check is generally expensive (the generic way requires name
comparison), it would also be nicer if we take into account possible
further optimization (but that would be beyond the scope of this
ticket).

See also the review discussion in #1583.

It's probably easier to do this ticket after #1747 and #1748.

Subtickets

Change History (24)

comment:1 Changed 8 years ago by jelte

Not sure if we should just rely on this in the general case (And not care about it during actual query processing), but currently NSEC3 records can also be added multiple times.

In ticket #1696, one such test was added (but disable because it failed); if the NSEC3 that matches the wildcard happens to be the same as the NSEC3 that denies the requested record, it is added twice. Once this ticket has been done (or if we decide that's a query handling issue, once the ticket to fix that has been addressed), the scenario should be enabled. Scenario: 7.2.2 other; Name Error where one NSEC3 covers multiple parts of proof)

comment:2 Changed 8 years ago by jelte

  • Milestone changed from Year 3 Task Backlog to Next-Sprint-Proposed

comment:3 Changed 8 years ago by jinmei

  • Description modified (diff)

comment:4 follow-up: Changed 8 years ago by vorner

Thinking about it, as the message renderer does name coparison already (in the name compression), would it make sense to make it there?

comment:5 in reply to: ↑ 4 Changed 8 years ago by jinmei

Replying to vorner:

Thinking about it, as the message renderer does name coparison already (in the name compression), would it make sense to make it there?

I'm afraid it would make the renderer implementation too complicated.
The renderer basically only maintains label offsets they first appear
in its output stream. But for the RRset (kind) comparison it needs to
remember other attributes, type and class, and also needs to remember
other RRsets that are not needed for name compression purposes. So I
think it's way beyond the responsibility of the renderer.

comment:6 Changed 8 years ago by jelte

  • Estimated Difficulty changed from 0 to 5

comment:7 Changed 8 years ago by jelte

  • Milestone changed from Next-Sprint-Proposed to Sprint-20120320

comment:8 Changed 8 years ago by stephen

  • Owner set to stephen
  • Status changed from new to assigned

comment:9 Changed 8 years ago by stephen

  • Owner changed from stephen to UnAssigned
  • Status changed from assigned to reviewing

Ready for review. The idea behind the duplicate detection is described in the header comments for RRsetInserter (in query.c).

I don't think this requires a ChangeLog entry.

comment:10 Changed 8 years ago by jinmei

I have some preliminary design level comments:

  • Is it really a better choice to use a std::set? While it'd be more efficient for the search itself, it needs to allocate new resource every time we insert an entry (and we need to release the resource on destruction). On the other hand, if we perform duplicate check directly in the vectors, it'd be slower for search, but no new resource should be allocated (and the resources for the vectors are reused), and iterating over the vector would be quite fast because they are essentially an array. In many cases the number of RRsets in the response should be quite small, so I at least don't think the tradeoff is so obvious.
  • If we really conclude the set is the right choice, we should remove (Abstract)RRset::isSameKind(). Realistically, the only reason we add this to the RRset class even if functionality wise it doesn't have to be a member function is because we wanted to optimize the duplicate RRset detection for in-memory data source. If the only reason is gone, then we should keep the class simpler with the cleanup.
  • And, if we continue using the set and "lthan", I think we should reconsider the naming of the method. "lthan" (or less than) sounds quite generic and sounds like a predicate based on the comparison of the entire RRset (the awkward name of isSameKind() was to avoid such confusion).

A couple of minor things I just happened to notice (I've not yet
reviewed all of the branch):

  • I made a few minor editorial fixes.
  • This comment in rrset.h seems to be incomplete:
        /// check for equality.  It only needs to do a "less than" check on C.
        /// equality.  It only needs to do one check on C,
    

comment:11 Changed 8 years ago by jinmei

  • Owner changed from UnAssigned to stephen

comment:12 follow-up: Changed 8 years ago by stephen

  • Milestone changed from Sprint-20120403 to Sprint-20120320
  • Owner changed from stephen to jinmei

Is it really a better choice to use a std::set? While it'd be more efficient for the search itself, it needs to allocate new resource every time we insert an entry (and we need to release the resource on destruction).

That's a fair point. I've altered the RRsetInserter code to use a (pre-allocated) vector instead of a set and this is searched sequentially (so O(n2) comparisons). To avoid constant construction and destruction, that is a member of the Query class and is reset prior to each query. This change addresses all the other points.

If all is OK can you please merge it, or put is back to UnAssigned as I won't be around for the next week and a half.

comment:13 Changed 8 years ago by jinmei

  • Milestone changed from Sprint-20120320 to Sprint-20120403

(I believe this should have been moved to the current sprint, so moving it)

comment:14 in reply to: ↑ 12 Changed 8 years ago by jinmei

Replying to stephen:

Is it really a better choice to use a std::set? While it'd be more efficient for the search itself, it needs to allocate new resource every time we insert an entry (and we need to release the resource on destruction).

That's a fair point. I've altered the RRsetInserter code to use a (pre-allocated) vector instead of a set and this is searched sequentially (so O(n2) comparisons). To avoid constant construction and destruction, that is a member of the Query class and is reset prior to each query. This change addresses all the other points.

The basic idea looks okay. I originally meant performing duplicate
check on the answers_/authorities_/additionals_ vectors directly
rather than introducing a separate vector. But after experimenting
this approach I didn't see much difference in performance (and in some
cases the current branch version is faster), so, in the end, I'm okay
with that.

But, I have some non trivial comments on the latest version. As
you'll be off for development for a while, I took over it and
addressed my own comments. I'll ask someone else for the changes I
made.

I've also checked performance implication. While the test cases are
limited, the revised version is actually generally faster than the
previous one, using std::set, despite the O(n2) search. The test
case includes a relatively "large scale" case of a delegation from
root to com (without DNSSEC), which involves 15 RRsets.

Now, my comments follow:

Major point

  • I'd like to avoid exposing member variables directly, even as protected, and even if it's intended only for tests. It's so especially when they are mutable. I've self-addressed this pint at 4ad96ab. As a result, the Inserter class was renamed to ResponseCreator and publicly visible (for the test).

Relatively minor points

  • I think we need a changelog for this change. This is my proposed entry:
    407.?	[bug]		stephen, jinmei
    	b10-auth didn't remove duplicate RRsets from responses.
    	(Trac #1688, git TBD)
    
  • I'd suggest using vector::insert instead of copy:
    #if 0
        copy(rrset_vector.begin() + 2, rrset_vector.begin() + 4,
             back_inserter(answer));
    #endif
        answer.insert(answer.end(), rrset_vector.begin() + 2,
                      rrset_vector.begin() + 4);
    
    My understanding is that it's generally advisable because it's a vector's "native" method. See also the book "Effective STL" if you have it. I've self-addressed this at fd5a4de.
  • Also, I guess it's safer to use rrset_vector.end() instead of '+ 9'
        copy(rrset_vector.begin() + 7, rrset_vector.begin() + 9,
             back_inserter(additional));
    
    because '+9' would exceeds the valid range. I've not confirmed what the STL standard says on this point, and in event, it wouldn't cause a disruption in the case of vector iterators (in many cases they are just a plain old pointer), but at least using end() shouldn't do any harm. I self-addresses these at fd5a4de.
  • I'd suggest testing the case where duplicate happens in the same section (this can happen in practice), and I self-addressed this at 847525d.
  • I'd suggest using BOOST_FOREACH instead of code like this:
        for (i = answers.begin(); i != answers.end(); ++i) {
            addRRset(response, Message::SECTION_ANSWER, *i, dnssec);
        }
    
    I self-addressed this point at a6d9e2b. See the commit log.
  • We should now be able to skip the following duplicate checks:
        // Add the (no-) wildcard proof only when it's different from the NSEC
        // that proves NXDOMAIN; sometimes they can be the same.
        // Note: name comparison is relatively expensive.  When we are at the
        // stage of performance optimization, we should consider optimizing this
        // for some optimized data source implementations.
        if (nsec->getName() != fcontext->rrset->getName()) {
            authorities_.push_back(fcontext->rrset);
        }
    ...
        if (nsec->getName() != fcontext->rrset->getName()) {
            // one NSEC RR proves wildcard_nxrrset that no matched QNAME.
            authorities_.push_back(fcontext->rrset);
        }
    
    I've addressed this at e70bd3b.
  • I can now re-enable the nsec3 lettuce scenarios that were disabled due to the missing suppression of duplicate RRsets. I did it at 6f8e187.

comment:15 Changed 8 years ago by jinmei

Could someone review the changes I made (and the proposed changelog)?

They are c0f704f and later (of which the first two should be trivial).

comment:16 Changed 8 years ago by jinmei

  • Owner changed from jinmei to UnAssigned

comment:17 Changed 8 years ago by jelte

  • Owner changed from UnAssigned to jelte

comment:18 follow-up: Changed 8 years ago by jelte

  • Owner changed from jelte to jinmei

I must say I also didn't see the use of an extra class for this; the complexity should be the same if we use a local helper method instead of direct push_back for the three vectors in the Query class (which would only add to the given vector if they are not present in any). The only use this has is that now, it can never happen that rrsets are not put in answer if they are already present in authority or additional (whereas a local checker would result in first-come-first-serve instead of prioritizing answer>auth>additional)

comments on the code itself:

  • ResponseCreator::create(): additionals argument is not a reference
  • new includes in rbnode_rrset_unittests.cc? (<algorithm> and <sstream>, I suspect these are leftovers from now-removed code)
  • leftover newline in rbnode_rrset.cc
  • there are new includes in rrset_unittest too (iostream)

Oh and I suggest the changelog entry does not do 'used-to'; i.e. make it something like "b10-auth now filters out duplicate rrsets when building a response message" (can still say what it used to do, as long as we explicitly say what it does now :))

For the rest it looks ok

comment:19 in reply to: ↑ 18 ; follow-up: Changed 8 years ago by jinmei

Replying to jelte:

I must say I also didn't see the use of an extra class for this; the complexity should be the same if we use a local helper method instead of direct push_back for the three vectors in the Query class (which would only add to the given vector if they are not present in any). The only use this has is that now, it can never happen that rrsets are not put in answer if they are already present in authority or additional (whereas a local checker would result in first-come-first-serve instead of prioritizing answer>auth>additional)

I'm not sure if I understand the local method approach...at least I
interpreted it as you didn't request a change to the branch by saying
this, so I didn't do anything on the code for this part. If you
suggested a specific change, please say so (and clarify the
suggestion:-).

comments on the code itself:

  • ResponseCreator::create(): additionals argument is not a reference

Ah, good catch. Fixed.

  • new includes in rbnode_rrset_unittests.cc? (<algorithm> and <sstream>, I suspect these are leftovers from now-removed code)

algorithm really doesn't seem to be necessary, so I removed it.
sstream does actually seem to be necessary because we use
ostringstream (although it's not an issue introduced in this task).
So I kept it.

I also used this opportunity to align the order of header files based
on our recent consensus.

  • leftover newline in rbnode_rrset.cc

Removed.

  • there are new includes in rrset_unittest too (iostream)

Right, iostream doesn't seem to be necessary, but like
rbnode_rrset_unittests, technically we'd need sstream. So I added.

Also reordered the header files.

Finally, there was one more thing I intended to make (but forgot to do
before asking for review), although it was an unrelated cleanup: see
953d4bc.

Oh and I suggest the changelog entry does not do 'used-to'; i.e. make it something like "b10-auth now filters out duplicate rrsets when building a response message" (can still say what it used to do, as long as we explicitly say what it does now :))

Okay, and I realized we should clarify it's for the new query logic.

407.?	[bug]		stephen, jinmei
	b10-auth now filters out duplicate RRsets when building a
	response message using the new query handling logic.  It's
	currently only used with the in-memory data source, but will
	also be used for others soon.
	(Trac #1688, git TBD)

comment:20 Changed 8 years ago by jinmei

  • Owner changed from jinmei to jelte

comment:21 in reply to: ↑ 19 ; follow-up: Changed 8 years ago by jelte

  • Owner changed from jelte to jinmei

Replying to jinmei:

I'm not sure if I understand the local method approach...at least I
interpreted it as you didn't request a change to the branch by saying
this, so I didn't do anything on the code for this part. If you
suggested a specific change, please say so (and clarify the
suggestion:-).

no, i did not request a change, i just noted that had I done it, i would probably have not created a new class, but changed the various calls to additionals_.push_back(rrset) (and the other two vectors) to a private method addRRset(vector&, rrset) which did the duplicate checks, by checking all three vectors. It's complexity would've been the same, there would be no additional vector to keep track, but the end result might be different (in which message section rrsets end up would depend on to which section they would've been added first).

all good, please go ahead and merge :)

comment:22 in reply to: ↑ 21 Changed 8 years ago by jinmei

Replying to jelte:

no, i did not request a change, i just noted that had I done it, i would probably have not created a new class, but changed the various calls to additionals_.push_back(rrset) (and the other two vectors) to a private method addRRset(vector&, rrset) which did the duplicate checks, by checking all three vectors. It's complexity would've been the same, there would be no additional vector to keep track, but the end result might be different (in which message section rrsets end up would depend on to which section they would've been added first).

So you mean we'd also pass a helper function (or perhaps a functor
object) to getAdditional()? I see the possibility, although it may
not be as efficient as it might look thanks to saving the additional
storage - calling the helper function for every RRset may not be
negligible.

Anyway, this is only for discussion for now. I've merged the branch,
and am closing the ticket.

Thanks,

comment:23 Changed 8 years ago by jinmei

  • Total Hours changed from 0 to 5.5

comment:24 Changed 8 years ago by jinmei

  • Resolution set to fixed
  • Status changed from reviewing to closed
Note: See TracTickets for help on using tickets.