Opened 9 years ago

Closed 9 years ago

#506 closed enhancement (fixed)

Analysis of Wildcard Processing

Reported by: stephen Owned by: jinmei
Priority: medium Milestone: A-Team-Sprint-20110209
Component: b10-auth Version:
Keywords: Cc:
CVSS Scoring: Parent Tickets:
Sensitive: no Defect Severity:
Sub-Project: Feature Depending on Ticket:
Estimated Difficulty: 3.0 Add Hours to Ticket: 0
Total Hours: 0 Internal?: no

Description

An analysis of wildcard processing. An expected output of this is a task breakdown for the implementation of the feature.

Subtickets

Change History (11)

comment:1 Changed 9 years ago by jinmei

  • Owner set to jinmei
  • Status changed from new to accepted

comment:2 Changed 9 years ago by jinmei

I've been looking at BIND 9's wildcard handling.

I still need to read more for relatively minor cases, but
the basic ideas are:

When loading

If the owner name is a wildcard (e.g. *.foo.example.com) and the RR type is
!NS && !NSEC3, enable callback for the parent node (e.g. foo.example.com),
and mark that node as "wild".

When finding

  • The search context (FindState in our implementation) has a new (boolean) field, "wild".
  • In the zone cut callback, if no NS/DNAME is found and the node is marked as "wild", set the context's 'wild' field to 'true'. The callback shouldn't stop the search, because the wildcard may not be the best match.

  • when rbtree search completes with "PARTIALMATCH", and the search context indicates there has been a possible wildcard match (from the 'wild' field), find the wildcard node in the tree. This has complicated logic due to various minor cases, but the common case is to follow the search chain (which we'd also use for empty node processing) and, if a chain node is a wildcard, use it as a wildcard match.
  • once a wildcard match is found, use it as a normal match, except the owner name must be dynamically created.

My proposal is to estimate this simple scenario as an initial task
and add it to the task list.

The analysis should continue to cover minor cases.

comment:3 Changed 9 years ago by jinmei

This is a more complete version of BIND 9's wildcard handling
in its rbtdb. I've simplified the logic a bit for (the current
implementation of) the BIND 10 in-memory zone.

Loading

The basic idea is to mark the parent node of a wildcard name
(e.g. example.com for *.example.com) in order to force the find()
logic to perform special processing if the best match node indicates
the existence of a wildcard node below it.

This process is two-fold:

  • when loading an RRset, if the owner name is a wildcard (e.g. *.foo.example.com) and the RR type is !NS && !NSEC3, mark its parent node (e.g. foo.example.com) as "wild". Also make sure the parent node exists in the tree by explicitly inserting it.
  • for any owner name, check if any of its ancestors is a wildcard. This is the case, e.g., when adding "foo.*.example.com", which would result in an empty non terminal wildcard node for name "*.example.com". Note: adding a name such as "foo.*.example.com" is almost bogus, and BIND 9 rejects loading them by default. It's still not prohibited by the protocol spec, however. If an ancestor name is a wildcard, explicitly add an rbt node for that name (to make sure find() can find it as an exact match), and treat its parent as described in the first bullet.
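
As a rough illustration of the two loading rules above, here is a minimal Python sketch. It is not BIND 10 code: a plain dict stands in for the RBTree, and the names Zone, add, nodes and wild are hypothetical. It also assumes that the NS/NSEC3 exemption applies only to the owner name itself and not to wildcard ancestors, which the analysis does not state explicitly.

```python
# Illustrative sketch only: a dict stands in for the RBTree, and the names
# Zone, add, nodes and wild are hypothetical, not BIND 10 APIs.
# Names are lowercase strings without a trailing dot.
EXEMPT_TYPES = {"NS", "NSEC3"}  # wildcard owners of these types are not marked


def parent(name):
    """Return the name with its leftmost label removed."""
    return name.split(".", 1)[1]


class Zone:
    def __init__(self, origin):
        self.origin = origin
        self.nodes = {origin: {}}  # name -> {rrtype: [rdata, ...]}
        self.wild = set()          # names of nodes marked "wild"

    def add(self, name, rrtype, rdata):
        if name != self.origin and not name.endswith("." + self.origin):
            raise ValueError("name is outside the zone")
        self.nodes.setdefault(name, {}).setdefault(rrtype, []).append(rdata)
        # Walk from the owner name up to (but excluding) the origin.
        current = name
        while current != self.origin:
            is_owner = current == name
            exempt = is_owner and rrtype in EXEMPT_TYPES  # assumption, see above
            if current.startswith("*.") and not exempt:
                # Make sure both the wildcard node and its parent exist,
                # then mark the parent as "wild".
                self.nodes.setdefault(current, {})
                self.nodes.setdefault(parent(current), {})
                self.wild.add(parent(current))
            current = parent(current)
```

With this sketch, adding "foo.*.example.com" creates an empty node for "*.example.com" and marks "example.com" as wild, while adding an NS RRset at "*.bar.example.com" leaves "bar.example.com" unmarked.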

Finding (wildcard matching)

We need a new extension to RBTree::find(), which remembers the
"previous existing node" for the query name if an exact match isn't
found. This is necessary for a minor case of wildcard matching,
but will also be necessary for DNSSEC later.

  • in find(), if the search result is PARTIALMATCH and it's not a delegation, perform the wildcard check. It works as follows:
    1. if the search stops at a node marked as "wild", that may be a wildcard match. otherwise, it's not (this case is possible if there are *.example.com and foo.example.com, and the query name is bar.foo.example.com. This case should not result in wildcard match according to Section 4.3.3 of RFC1034).
    2. if it can be a wildcard match, construct the wildcard name by prepending '*' to the node's absolute name.
    3. get the RBTree node for that wildcard name by searching RBTree.
    4. reject the case where the query name is a subdomain of an empty non terminal node under the node marked as "wild" (which was found in the first find()). This process is complicated, so it's described separately below.
    5. if the wildcard match is allowed, use the RRsets for the wildcard node, and return any positive response after replacing the owner name of the RRset (which is the wildcard name) with the query name.

Note that we should not do wildcard matching if the search finds a
zone cut (delegation): According to RFC1034, we cancel wildcard matching
on delegation (Section 4.3.3). The above algorithm will implicitly
reject such cases, but tests should be written explicitly.
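
The wildcard check and the delegation rule can be sketched as follows. This is only a model under stated assumptions: a dict keyed by name replaces the RBTree, the function name find and the result labels are hypothetical, DNAME is ignored, and step 4 (the empty non terminal rejection) is deliberately left out.

```python
# Illustrative sketch only: "nodes" maps name -> {rrtype: [rdata, ...]} and
# "wild" is the set of names marked "wild". Step 4 is not implemented here.
def find(nodes, wild, origin, qname, qtype):
    # Build the chain origin, ..., qname (qname must be at or below origin).
    chain = [qname]
    while chain[-1] != origin:
        chain.append(chain[-1].split(".", 1)[1])
    chain.reverse()

    # Walk down from the origin, remembering the deepest existing node and
    # stopping at the first zone cut (an NS RRset below the origin).
    closest = origin
    for name in chain:
        if name in nodes:
            closest = name
            if name != origin and "NS" in nodes[name]:
                return ("DELEGATION", name, nodes[name]["NS"])

    if closest == qname:
        node = nodes[qname]              # exact match
    elif closest in wild and "*." + closest in nodes:
        node = nodes["*." + closest]     # steps 1-3: wildcard candidate
    else:
        return ("NXDOMAIN", qname, None)

    # Step 5: answer with the query name as the owner name.
    rdata = node.get(qtype)
    return ("SUCCESS", qname, rdata) if rdata else ("NXRRSET", qname, None)
```

In this model a query for bar.foo.example.com stops at foo.example.com, which is not marked wild, so *.example.com is never consulted; a query below a delegated name returns the delegation before any wildcard lookup happens.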

Cancel wildcard match due to an empty non terminal

This process details step 4 of the wildcard check in the previous
section. It implements the following part of Section 4.3.3 of RFC1034:

   - When the query name or a name between the wildcard domain and
     the query name is know to exist.

Specifically, when example.com has the wildcard name *.example.com and
bar.foo.example.com, this process will prevent aaa.foo.example.com
and zzz.foo.example.com from being matched against the wildcard.

The necessary steps are as follows:

  • get the existing previous node of the original query name, and the next node of the previous. For the query name of aaa.foo.example.com, they are *.example.com and bar.foo.example.com, respectively; for the query name of zzz.foo.example.com, they are bar.foo.example.com and something else (or none), respectively.
  • check if any ancestor name of the query name (including the qname itself) up to the node marked as "wild" (in this case example.com) is a super domain of either the previous or the next node name. For aaa.foo.example.com, its 1-generation ancestor, foo.example.com, is a super domain of the next, "bar.foo.example.com"; for zzz.foo.example.com, its 1-generation ancestor, again foo.example.com, is a super domain of the previous, "bar.foo.example.com". This means the query name is a subdomain of an empty non terminal under the "wild" node, foo.example.com, and the wildcard match should not be allowed.
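
The two steps above can be modelled in a few lines of Python. This is a sketch, not the proposed RBTree primitive: it sorts all node names in DNSSEC-style order (labels compared right to left) and uses bisect to locate the previous and next nodes, and the function names neighbours and wildcard_cancelled are hypothetical. It assumes lowercase ASCII names without a trailing dot and a query name that is not itself a node.

```python
# Illustrative sketch only: a sorted list replaces the RBTree, and the
# function names are hypothetical.
import bisect


def sort_key(name):
    """Order names by their labels, rightmost label first."""
    return tuple(reversed(name.split(".")))


def is_subdomain(name, ancestor):
    return name == ancestor or name.endswith("." + ancestor)


def neighbours(names, qname):
    """Return the existing names just before and just after qname."""
    ordered = sorted(names, key=sort_key)
    keys = [sort_key(n) for n in ordered]
    i = bisect.bisect_left(keys, sort_key(qname))
    prev = ordered[i - 1] if i > 0 else None
    nxt = ordered[i] if i < len(ordered) else None
    return prev, nxt


def wildcard_cancelled(names, wild_node, qname):
    """True if qname lies under an empty non terminal below wild_node."""
    prev, nxt = neighbours(names, qname)
    current = qname
    while current != wild_node:
        for neighbour in (prev, nxt):
            if neighbour is not None and is_subdomain(neighbour, current):
                return True
        current = current.split(".", 1)[1]
    return False
```

For the zone in the example (example.com, *.example.com and bar.foo.example.com), neighbours() returns *.example.com and bar.foo.example.com for aaa.foo.example.com, matching the description above, and wildcard_cancelled() rejects both aaa.foo.example.com and zzz.foo.example.com while still allowing baz.example.com.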

proposed subtasks

I propose breaking down the above into the following 4 sub tasks:

  • loading part
  • finding part (except the empty non terminal rejection)
  • a new extension to RBTree::find() to remember and retrieve the "previous existing node" for the query name if an exact match isn't found. This task can be done separately from others, but it will use the framework of #517, so it should be done after #517.
  • implement canceling wildcard match due to an empty non terminal. it requires the new primitive described in the previous bullet.

comment:4 Changed 9 years ago by jinmei

  • Owner changed from jinmei to UnAssigned
  • Status changed from accepted to reviewing

I believe I completed this task.

I'll make it "reviewing state" for the moment. If someone can take a
look at it and point out any obvious errors or unclear points, that
would be appreciated. If I don't hear anything soon, I'll close this
ticket anyway, and create new development tasks based on the proposal.

comment:5 Changed 9 years ago by vorner

  • Owner changed from UnAssigned to vorner

I'll have a look at it, so we have some kind of formal review.

comment:6 follow-up: Changed 9 years ago by vorner

  • Owner changed from vorner to jinmei

I didn't check it against the RFC. Should I?

It mostly makes sense, but I have a few questions.

  • What if we add "*.example.com." and "example.com." isn't there yet? Do we add it? Won't it change NXDOMAIN into NXRRSET?
  • The wildcards are handled inside MemoryZone? Including creation of the names?
  • Do we store the „I met a wild node“ on the path here? The first part suggests yes, but the second doesn't mention it and it seems to me looking at the node we got from last partial match is enough.
  • How does the case with "foo.*.example.com." work while searching? We will encounter "example.com." and it is wild, therefore we would like to search for "*.example.com.", but that one isn't there. What do we do then? (Provided we are looking, for example, for "foo.bar.example.com.")
  • The task for previous primitive for RBTree should list next as well?

comment:7 in reply to: ↑ 6 Changed 9 years ago by jinmei

Replying to vorner:

Thanks for the review.

I didn't check it against the RFC. Should I?

I don't think so. But I think anyone who actually picks up a
development (sub)task, except the "getting previous node" primitive,
should read at least Section 4.3.3 of RFC1034 (which is short).

It mostly makes sense, but I have a few questions.

  • What if we add "*.example.com." and "example.com." isn't there yet? Do we add it? Won't it change NXDOMAIN into NXRRSET?

Ah, good point. We need to explicitly create a node for "example.com"
if it doesn't yet exist. That doesn't change the semantics about NXDOMAIN
vs NXRRSET: the existence of *.example.com automatically implies the existence
of example.com.
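
To illustrate the point about NXDOMAIN vs NXRRSET, here is a small sketch. It assumes the usual rule that a name exists if it, or any name below it, owns records; the function names are hypothetical, and wildcard synthesis is intentionally ignored so that only the existence rule is shown.

```python
# Illustrative sketch only: "owner_names" is the set of names that own
# RRsets; the helper names are hypothetical.
def name_exists(owner_names, name):
    """A name exists if it or any of its descendants owns records."""
    suffix = "." + name
    return any(o == name or o.endswith(suffix) for o in owner_names)


def negative_answer(owner_names, qname):
    """Pick the negative result for a name that has no data of its own."""
    return "NXRRSET" if name_exists(owner_names, qname) else "NXDOMAIN"
```

With only "*.example.com" loaded, negative_answer() gives NXRRSET for "example.com" whether or not a node was created for it explicitly, so inserting the node does not change the observable result.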

  • The wildcards are handled inside MemoryZone? Including creation of the names?

Yes (at least that's what BIND 9 does).

  • Do we store the „I met a wild node“ on the path here? The first part suggests yes, but the second doesn't mention it and it seems to me looking at the node we got from last partial match is enough.

Do you mean that the returned node should be marked as 'wild'? Hmm, you're
right (it looks like that's not (always) the case with BIND 9 due to other
features of its rbtdb). On rethinking it in our simplified scenario,
we probably don't need the "I met a wild node" flag in FindState or invoking
the callback at the "wild" node.

To reject wildcard match under a zone cut, however, we should be careful
not to perform wildcard matching if the search also indicates it has
encountered a zone cut.

  • How does the case with "foo.*.example.com." work while searching? We will encounter "example.com." and it is wild, therefore we would like to search for "*.example.com.", but that one isn't there. What do we do then? (Provided we are looking, for example, for "foo.bar.example.com.")

First off, foo.bar.example.com doesn't (shouldn't) match foo.*.example.com.
Adding foo.*.example.com implicitly creates *.example.com. As for the
existence of the *.example.com node, it's handled in the second bullet of the
"Loading" part, i.e., we explicitly add *.example.com, and mark example.com as 'wild'.
(From re-reading it, I see it was not clear that we add *.example.com.
Thanks for pointing it out.)

  • The task for previous primitive for RBTree should list next as well?

No, getting the next is necessary for general empty non terminal support,
and it's already included in #517 (the interface should be consistent,
so this subtask should be done after #517).

comment:8 Changed 9 years ago by jinmei

  • Owner changed from jinmei to vorner

comment:9 follow-up: Changed 9 years ago by vorner

  • Owner changed from vorner to jinmei

Ok, thanks for the explanations. So, let's call it reviewed.

comment:10 in reply to: ↑ 9 Changed 9 years ago by jinmei

Replying to vorner:

Ok, thanks for the explanations. So, let's call it reviewed.

Ack, thanks. I've updated the original analysis
(http://bind10.isc.org/ticket/506?replyto=9#comment:3)
with the clarifications based on the discussion and some additional
things/corrections I noticed.

I'm closing this ticket.

comment:11 Changed 9 years ago by jinmei

  • Resolution set to fixed
  • Status changed from reviewing to closed