Opened 9 years ago

Closed 9 years ago

#550 closed enhancement (complete)

wildcard handling for memory zone: load

Reported by: jinmei Owned by: jinmei
Priority: medium Milestone: A-Team-Sprint-20110223
Component: data source Version:
Keywords: Cc:
CVSS Scoring: Parent Tickets:
Sensitive: no Defect Severity:
Sub-Project: Feature Depending on Ticket:
Estimated Difficulty: 5.0 Add Hours to Ticket: 0
Total Hours: 0 Internal?: no

Description

See the analysis ticket (#506) for the big picture.

This is a substask for the loading part:

The basic idea is to mark the parent node of a wildcard name
(e.g. example.com for *.example.com) in order to force the find()
logic to perform special processing if the best match node indicates
the existence of a wildcard node below it.
This proces is two-fold:

  • when loading an RRset, if the owner name is a wildcard (e.g. *.foo.example.com) and the RR type is !NS && !NSEC3, mark that node as "wild". Also make sure the parent node exists in the tree by explicitly inserting it.
  • for any owner name, check if any of its ancestor is a wildcard. this is the case, e.g., when adding "foo.*.example.com", which would result in an empty non terminal wildcard node for name "*.example.com". Note: adding such a name as "foo.*.example.com" is almost bogus, and BIND 9 rejects loading them by default. It's still not prohibited by the protocol spec, however. If an ancestor name is a wildcard, explicitly add a rbt node for that name (to make sure if can find it as an exact match by find()), and treat its parent as described in the first bullet.

Subtickets

Change History (10)

comment:1 Changed 9 years ago by jinmei

  • Estimated Difficulty changed from 0.0 to 5.0

comment:2 Changed 9 years ago by jinmei

  • Milestone changed from A-Team-Task-Backlog to A-Team-Sprint-20110209
  • Owner set to jinmei
  • Status changed from new to accepted

comment:3 Changed 9 years ago by jinmei

Branch trac550 is ready for review.

It begins with a refactoring of tests: bc1b548a3828f5cd722496dd8eb5c09fc1f95187
I hope this one is simpler (although we need to provide RDATA even when
it's not necessary for the test) and can be extended more easily.
This change is irrelevant to the topic of this ticket, and it
shouldn't change the existing test semantics.

The rest of the change is basically a straightforward implementation
of the plan. Some possibly non trivial parts are:

  • I changed the "callback" property of RBNode to generic "flags" and adjusted the interface accordingly. both "callback" and "wild" are now implemented as flags.
  • now the add() method has become pretty large and is getting unreadable, I refactored the code by moving the initial validation to a separate helper method: b832c3c975bcae66717ce28e392a189887b6f08f
  • as noted in the test code, the current test relies on an incomplete version of "empty node" support (I incorporated it from #507), and it relies on its "incompleteness". Once we fully support empty nodes the tests will become meaningless. The plan is to clean up when we complete #551.
  • I couldn't find a way to test the part of setting "wild" mark. so I'd defer that part of test to #551 (testing wildcard find() will indirectly test the "wild" marking)

comment:4 Changed 9 years ago by jinmei

  • Owner changed from jinmei to UnAssigned
  • Status changed from accepted to reviewing

comment:5 follow-up: Changed 9 years ago by vorner

  • Owner changed from UnAssigned to jinmei

Hello

There are three minor nits:

  • There's an C-like artifact in addWildcards ‒ result is defined on the top and not used for a while. Is there an actual reason for that?
  • You changed the name to call to rrset->getName in addValidation. Not that I would oppose that, I don't really care, but I'm curious why.
  • The flags in the Node, do we expect that we will have so many flags that uint32_t will be needed? And how was 0x8000000U chosen?

comment:6 in reply to: ↑ 5 ; follow-up: Changed 9 years ago by jinmei

Replying to vorner:

Hello

There are three minor nits:

  • There's an C-like artifact in addWildcards ‒ result is defined on the top and not used for a while. Is there an actual reason for that?

Do you mean we should (be able to) do this

DomainTree::Result result = domains.insert(wname.split(1), &node);

instead of this?

                DomainTree::Result result;
[comment lines]
                result = domains.insert(wname.split(1), &node);

Hmm, I would normally prefer the former. I don't remember exactly
what I was thinking at that time, but I probably tried to keep the two
sets of "insert() and assert()" consistent. In this particular case
that wouldn't be much different either way, because we don't do
anything between the definition and the first use of result. But in
case we may add something between them it may be a bit better to defer
the definition until we really need it (i.e., change it to the latter
style).

I don't have a strong opinion. If you (even slightly) prefer the
latter, I'll change it.

  • You changed the name to call to rrset->getName in addValidation. Not that I would oppose that, I don't really care, but I'm curious why.

This was because we don't need a copy object of the name in
addValidation() (unlike the add() itself, where we may modify the
name), and passing rrset->getName() as a (const) reference seemed
redundant (because it's part of another parameter, rrset), and because
in this case getName() of the rrset should be very cheap (it's quite
likely the rrset is actually a basic version of RRset, whose getName()
simply returns a reference to its internal object).

But it's not a strong opinion and can change if necessary or if it's
deemed to be better.

  • The flags in the Node, do we expect that we will have so many flags that uint32_t will be needed? And how was 0x8000000U chosen?

Good question. I actually thought about it (and noted a bit about it
in the documentation), but frankly I'm not entirely sure at this
moment. There are some specific possibilities in my mind though:

  • As documented, I expect we'll incorporate the node color to the flags (as a private 1-bit flag).
  • I also expect eventually we want to have a backpointer from a root of a sub tree (in the recursive tree organization of RBTree) to its parent tree and have a private "root" bit in the flags to identify the root nodes for performance optimizations. (This is what BIND 9 does also).
  • BIND 9 also uses some bit fields for NSEC processing. I'm not sure we'll end up doing the same thing, but there may be possibilities.
  • BIND 9 also embeds node name related information such as the length of the name in a "flag" (in the form of bitfields).
  • Finally, we may want to store the node "ID" for faster name compression by exploiting the RBTree structure. For that purpose we might want to have e.g. 4 real flags and a 28-bit wide field for the ID (to store up to about 260M nodes).

That said, these all are just possibilities and may be premature at
this stage. So I don't mind beginning with a smaller width field.

As for 0x8000000U, I was thinking we reserve higher several bits for
application defined usage...hmm, it was a typo; it should have been
0x80000000U in that sense. In any case, even after the correction
this is an ad hoc choice and may be changed in future versions. So
I'm okay with beginning with a more straightforward definition, such
as 0x1 and 0x2.

comment:7 Changed 9 years ago by jinmei

  • Owner changed from jinmei to vorner

comment:8 in reply to: ↑ 6 ; follow-up: Changed 9 years ago by vorner

  • Owner changed from vorner to jinmei

Replying to jinmei:

Do you mean we should (be able to) do this

DomainTree::Result result = domains.insert(wname.split(1), &node);

instead of this?

                DomainTree::Result result;
[comment lines]
                result = domains.insert(wname.split(1), &node);

Well, yes, for the sake of consistance (I also prefer the way with parenthesis instead of equal sign, as it is clear the constructor is called, not assignment operator, but it doesn't matter).

I don't have a strong opinion. If you (even slightly) prefer the
latter, I'll change it.

I don't have any strong opinion either, it just caught my eye, and it looked strange to see it like that. If you can change it, it might be slightly nicer, but no reason to stop to review that again I guess.

This was because we don't need a copy object of the name in
addValidation() (unlike the add() itself, where we may modify the
name), and passing rrset->getName() as a (const) reference seemed
redundant (because it's part of another parameter, rrset), and because
in this case getName() of the rrset should be very cheap (it's quite
likely the rrset is actually a basic version of RRset, whose getName()
simply returns a reference to its internal object).

ACK, the way it was it seemed shorter code, but who cares ;-).

  • The flags in the Node, do we expect that we will have so many flags that uint32_t will be needed? And how was 0x8000000U chosen?
  • Finally, we may want to store the node "ID" for faster name compression by exploiting the RBTree structure. For that purpose we might want to have e.g. 4 real flags and a 28-bit wide field for the ID (to store up to about 260M nodes).

What if we have more of them? That seems like a limitation.

That said, these all are just possibilities and may be premature at
this stage. So I don't mind beginning with a smaller width field.

It doesn't really matter now, but if I ever would write the hybrid tree/radix tree backend, I wanted to save as much space as possible (to minimise data loads), so I was more wondering than anything else.

Anyway, let's merge it, I guess.

comment:9 in reply to: ↑ 8 Changed 9 years ago by jinmei

Replying to vorner:

I don't have any strong opinion either, it just caught my eye, and it looked strange to see it like that. If you can change it, it might be slightly nicer, but no reason to stop to review that again I guess.

Okay, changed.

  • The flags in the Node, do we expect that we will have so many flags that uint32_t will be needed? And how was 0x8000000U chosen?

(I've added one more "0" to the value, which was the originally intended
value as I explained)

  • Finally, we may want to store the node "ID" for faster name compression by exploiting the RBTree structure. For that purpose we might want to have e.g. 4 real flags and a 28-bit wide field for the ID (to store up to about 260M nodes).

What if we have more of them? That seems like a limitation.

Admittedly, not fully thought about it, and it's not even clear if
we'll be able to have 28 bits for this purpose. What I roughly
imagined is that we have two types of trees, one of which is
specifically designed for very large zones with a separate node ID
field, and have the administrator of such zones configure it
accordingly and manually (based on the observation that over 200M
nodes should be sufficient for the vast majority). But this may not
be a good idea in the end, especially when the zone can dynamically
grow via IXFR or dynamic update.

That said, these all are just possibilities and may be premature at
this stage. So I don't mind beginning with a smaller width field.

It doesn't really matter now, but if I ever would write the hybrid tree/radix tree backend, I wanted to save as much space as possible (to minimise data loads), so I was more wondering than anything else.

Anyway, let's merge it, I guess.

Okay, now merging, and closing the ticket. Thanks for the review.

comment:10 Changed 9 years ago by jinmei

  • Resolution set to complete
  • Status changed from reviewing to closed
Note: See TracTickets for help on using tickets.