Opened 9 years ago

Closed 9 years ago

#551 closed enhancement (complete)

wildcard handling for memory zone: find (main cases)

Reported by: jinmei Owned by: vorner
Priority: medium Milestone: A-Team-Sprint-20110223
Component: data source Version:
Keywords: Cc:
CVSS Scoring: Parent Tickets:
Sensitive: no Defect Severity:
Sub-Project: Feature Depending on Ticket:
Estimated Difficulty: 8.0 Add Hours to Ticket: 0
Total Hours: 0 Internal?: no

Description

See the analysis ticket (#506) for the big picture.

This is a substask for finding (matching with) wildcard, covering
major cases.

This subtask requres #550. On top of that, update MemoryZone::find()
as follows:

If the search result is PARTIALMATCH and it's not a delegation,
perform the wildcard check. It works as follows:

  1. if the search stops at a node marked as "wild", that may be a wildcard match. otherwise, it's not (this case is possible if there are *.example.com and foo.example.com, and the query name is bar.foo.example.com. This case should not result in wildcard match according to Section 4.3.3 of RFC1034).
  2. if it can be a wildcard match, construct the wildcard name by prepending '*' to the node's abstract name.
  3. get the RBTree node for that wildcard name by searching RBTree.
  4. reject the case where the query name is a subdomain of an empty non terminal node under the node marked as "wild" (which was found in the first find()). This process is complicated, so it will be handled in a separate subtask.
  5. if the wildcard match is allowed, use the RRsets for the wildcard node, and return any positive response with replacing the owner name of the RRset (which is the wildcard name) with the query name.

Note that we should not do wildcard matching if the search finds a
zone cut (delegation): According to RFC1034, we cancel wildcard matching
on delegation (Section 4.3.3). The above algorithm will implicitly
reject such cases, but tests should be written explicitly.

Subtickets

Change History (16)

comment:1 Changed 9 years ago by jinmei

  • Estimated Difficulty changed from 0.0 to 8.0

comment:2 Changed 9 years ago by vorner

  • Milestone changed from A-Team-Task-Backlog to A-Team-Sprint-20110209
  • Owner set to vorner
  • Status changed from new to accepted

comment:3 follow-ups: Changed 9 years ago by vorner

  • Owner changed from vorner to UnAssigned
  • Status changed from accepted to reviewing

It is ready for review.

I'm not sure if it is correct in the way how the strange nested wildcards are handled (eg. wild.*.foo.example.org). See the tests, please, for that.

I think we can add the changelog entry with #553, as this will complete the wildcard family of tasks.

comment:4 in reply to: ↑ 3 Changed 9 years ago by jinmei

This branch contains a lot of irrelevant diff due to the intermediate merge. What's the easiest way to retrieve the
relevant stuff?

comment:5 Changed 9 years ago by vorner

I'm not sure if it is the easiest way, but I know of three ways:

  • Diff the part before the merge and after the merge separately. It's not much of a solution, more like ignoring the problem.
  • Create a temporary branch over master and rebase/cherry-pick everything except the merge there. That way you can view the changes, but is maybe little bit overkill.
  • Create a temporary branch over master and merge the reviewed branch into it, then diff against master. That way you view the changes that the merge does, for example this way (I believe this one is most accurate and is quite easy):
git checkout origin/master -b temporary --no-track
git merge origin/trac551
git diff origin/master

(The --no-track is there to stop the temporary branch from tracking master. It's not actually needed for anything, but ensures that if you typed git push by accident, it does not push the temporary merge to master).

You can then remove the branch:

git checkout master
git branch -D temporary

Maybe I should document this on the workflow page.

comment:6 Changed 9 years ago by jinmei

  • Owner changed from UnAssigned to jinmei

comment:7 in reply to: ↑ 3 ; follow-up: Changed 9 years ago by jinmei

Replying to vorner:

For review, I've created a local branch from master, merged trac551
and taken the diff from the branch point.

I'm not sure if it is correct in the way how the strange nested wildcards are handled (eg. wild.*.foo.example.org). See the tests, please, for that.

The implementation looks correct, but I have some suggestions about
more test cases. See below. I'd also suggest when you are not sure
about specific behavior you check how other implementations do. That
would also be useful/helpful even if you believe you are doing the
right thing. Sometimes you find you were actually wrong (which is
obviously useful), sometimes you find it might be controversial
behavior in non standardized cases, and in some rare cases you even
find a bug of other implementations.

I think we can add the changelog entry with #553, as this will complete the wildcard family of tasks.

Yes, and I actually think the changelog will be about a complete "in
memory data source (in terms of query handling (without DNSSEC)),
giving credit to all A team developers.

Detailed comments:

memory_datasrc.cc

  • I noticed an editorial error in my original implementation (code and comment didn't match) and fixed it.
  • MemoryZoneImpl::find() now seems too big to understand. I'd like to refactor it by (e.g) moving the PARTIALMATCH case outside the method. But that would be beyond the scope of this ticket. I'm okay with moving forward and leaving the refactoring to a separate task.
  • as for the note about RRSIG:
                // TODO What about signatures? If we change the name, it would be
                // wrong anyway...
    
    I wouldn't worry about it for now. My expectation is that the protocol wise consideration for wildcard + RRSIG will be naturally implemented.

memory_datasrc_unittest.cc

  • findTest()
    • maybe something like "actualIt" is better than "gotIt" (to be consistent with output from gtest)? That may be a minor preference matter, though.
    • shouldn't the value of gotIt be find_result.rrset->getRdataIterator()? if so, it makes me think the tests might not be developed with the spirit of "test driven" (i.e., starting with failures)
    • I guess we should generalize the RDATA check in lib/testutils/. (but that would better be deferred to a separate task)
    • I guess one of the following is intended for RRclass:
                          EXPECT_EQ(answer->getType(),
                              find_result.rrset->getType());
                          EXPECT_EQ(answer->getType(),
                              find_result.rrset->getType());
      
  • this comment can now be removed?
        // TODO:
        // glue name would match a wildcard under a zone cut: wildcard match
        // shouldn't happen under a cut and result must be PARTIALMATCH
        // (This case cannot be tested yet)
    
  • I don't understand what "it will be here" means.
        // Search at the parent. The parent will not have the A, but it will be here
    
  • delegatedWildcard test should also check the GLUEOK case.
  • emptyWildcard should check 'wild.bar.foo.example.org' doesn't match.
  • nestedEmptyWildcard should test match and unmatch cases:
    • baz.foo.*.bar.example.org (should match)
    • baz.foo.baz.bar.example.org (should not match)
    • *.foo.baz.bar.example.org (should not match)
  • we should also test the case where we have *.example.com and foo.example.com, with checking bar.foo.example.com doesn't match.

comment:8 Changed 9 years ago by jinmei

  • Owner changed from jinmei to vorner

comment:9 in reply to: ↑ 7 ; follow-up: Changed 9 years ago by vorner

  • Owner changed from vorner to jinmei

Replying to jinmei:

  • MemoryZoneImpl::find() now seems too big to understand. I'd like to refactor it by (e.g) moving the PARTIALMATCH case outside the method. But that would be beyond the scope of this ticket. I'm okay with moving forward and leaving the refactoring to a separate task.

OK, I'll create a task for it in the backlog.

  • as for the note about RRSIG:
                // TODO What about signatures? If we change the name, it would be
                // wrong anyway...
    
    I wouldn't worry about it for now. My expectation is that the protocol wise consideration for wildcard + RRSIG will be naturally implemented.

Should I remove it? It crossed my mind at the time, so I marked the place where the original signature is lost. I didn't really worry about it.

memory_datasrc_unittest.cc

  • shouldn't the value of gotIt be find_result.rrset->getRdataIterator()? if so, it makes me think the tests might not be developed with the spirit of "test driven" (i.e., starting with failures)

It should. However, I developed it in two rounds. First, I implemented tests to see the wildcard RRset is returned unmodified. That did fail and I implemented finding it. In the second round, I modified the tests to check the name and content of the answer. It did fail with wrong name and I wasn't bothered that it doesn't fail with wrong content, because the content was there already because of the first round.

  • I guess we should generalize the RDATA check in lib/testutils/. (but that would better be deferred to a separate task)

Another task to backlog?

  • emptyWildcard should check 'wild.bar.foo.example.org' doesn't match.
  • nestedEmptyWildcard should test match and unmatch cases:
    • baz.foo.*.bar.example.org (should match)
    • baz.foo.baz.bar.example.org (should not match)
    • *.foo.baz.bar.example.org (should not match)

I take by doesn't match you mean NXDOMAIN, right? Because that fails and I think it should return NXRRSET, not NXDOMAIN. I reason this way:

While the RFC recommends not having RRset with multiple * in the name, it doesn't forbid it. However, if we load wild.*.foo.example.org, then the *.foo.example.org domain exists as empty nonterminal domain. Therefore we should match against that one (with *=wild.bar) and return NXRRSET, because that domain is empty.

Similarly with the baz.foo.baz.bar.example.org and *.foo.baz.bar.example.org.

Is there a problem in the reasoning? Anyway, as BIND9 rejects loading such thing (as you mentioned), this explanation doesn't go against it. And the algorithm that you described works this way as well.

comment:10 in reply to: ↑ 9 ; follow-up: Changed 9 years ago by jinmei

Replying to vorner:

  • as for the note about RRSIG:
                // TODO What about signatures? If we change the name, it would be
                // wrong anyway...
    
    I wouldn't worry about it for now. My expectation is that the protocol wise consideration for wildcard + RRSIG will be naturally implemented.

Should I remove it? It crossed my mind at the time, so I marked the place where the original signature is lost. I didn't really worry about it.

It's up to you, but I'd leave it as a note.

  • I guess we should generalize the RDATA check in lib/testutils/. (but that would better be deferred to a separate task)

Another task to backlog?

Yes, I think so.

  • emptyWildcard should check 'wild.bar.foo.example.org' doesn't match.
  • nestedEmptyWildcard should test match and unmatch cases:
    • baz.foo.*.bar.example.org (should match)
    • baz.foo.baz.bar.example.org (should not match)
    • *.foo.baz.bar.example.org (should not match)

I take by doesn't match you mean NXDOMAIN, right? Because that fails and I think it should return NXRRSET, not NXDOMAIN. I reason this way:

While the RFC recommends not having RRset with multiple * in the name, it doesn't forbid it. However, if we load wild.*.foo.example.org, then the *.foo.example.org domain exists as empty nonterminal domain. Therefore we should match against that one (with *=wild.bar) and return NXRRSET, because that domain is empty.

Similarly with the baz.foo.baz.bar.example.org and *.foo.baz.bar.example.org.

Is there a problem in the reasoning?

Yes, there's a problem. By matching wild.bar.foo.example.org against
wild.*.foo.example.org. (resulting in NXRRSET), you substitute a label
for '*' not located in the first label of the name (in this example
it's the second label). RFC doesn't allow such matching:

<quote>
The owner name of the wildcard RRs is of
the form "*.<anydomain>", where <anydomain> is any domain name.
</quote>

Anyway, as BIND9 rejects loading such thing (as you mentioned), this explanation doesn't go against it. And the algorithm that you described works this way as well.

BIND9 rejects (loading) it *by default*. You can override the
behavior by adding the following to 'options':

options {
...
	check-names master ignore;
};

NSD (more accurately its zonec utility) also allows it.

For corner case scenarios like this, I normally check what other
implementations do, not only for BIND 9 but also at least one other
non ISC server (often NSD, sometimes powerdns also). I'd suggest this
practice to you.

Note: I've not checked the revised (if it's been revised) code yet.

comment:11 Changed 9 years ago by jinmei

  • Owner changed from jinmei to vorner

comment:12 in reply to: ↑ 10 Changed 9 years ago by jinmei

Replying to jinmei:

Similarly with the baz.foo.baz.bar.example.org and *.foo.baz.bar.example.org.

Is there a problem in the reasoning?

Yes, there's a problem. By matching wild.bar.foo.example.org against
wild.*.foo.example.org. (resulting in NXRRSET), you substitute a label
for '*' not located in the first label of the name (in this example
it's the second label).

Hmm, apparently I was totally confused about this case with other
corner cases. Sorry for the confusion, please forget what I've said
on this point so far. Here's a revised version of suggested tests:

  • delegatedWildcard test should also check the GLUEOK case.
  • emptyWildcard should check 'wild.bar.foo.example.org' results in NXRRSET
  • nestedEmptyWildcard should test:
    • baz.foo.*.bar.example.org (should result in NXRRSET)
    • baz.foo.baz.bar.example.org (should result in NXRRSET)
    • *.foo.baz.bar.example.org (should result in NXRRSET)
  • we should also test the case where we have *.example.com and foo.example.com, with checking bar.foo.example.com doesn't match (result in NXDOMAIN)

comment:13 follow-up: Changed 9 years ago by vorner

  • Owner changed from vorner to jinmei

OK, thanks and don't worry about that.

I think everything should be addressed from my point. Can you have another look, please?

Thanks

comment:14 in reply to: ↑ 13 Changed 9 years ago by jinmei

Replying to vorner:

OK, thanks and don't worry about that.

I think everything should be addressed from my point. Can you have another look, please?

I suspect you need to specify GLUE_OK explicitly for the glue_ok + wildcard test:

         SCOPED_TRACE("Looking under delegation point in GLUE_OK mode");
         findTest(Name("a.child.example.org"), RRType::A(), Zone::DELEGATION,
-            true, rr_child_ns_);
+                 true, rr_child_ns_, NULL, NULL, Zone::FIND_GLUE_OK);

Other than that it looks okay.

comment:15 Changed 9 years ago by jinmei

  • Owner changed from jinmei to vorner

comment:16 Changed 9 years ago by vorner

  • Resolution set to complete
  • Status changed from reviewing to closed

Thanks, fixed, merged and closing.

Note: See TracTickets for help on using tickets.