Opened 8 years ago

Closed 8 years ago

#1608 closed task (complete)

implement ZoneFinder::Context::getAdditional() for in memory data source

Reported by: jinmei Owned by: jinmei
Priority: high Milestone: Sprint-20120320
Component: data source Version:
Keywords: Cc:
CVSS Scoring: Parent Tickets:
Sensitive: no Defect Severity: N/A
Sub-Project: DNS Feature Depending on Ticket: auth performance
Estimated Difficulty: 5 Add Hours to Ticket: 0
Total Hours: 16 Internal?: no

Description (last modified by jinmei)

This is a subtask for "Pre-establish shortcuts for additional section processing" described in
https://lists.isc.org/pipermail/bind10-dev/2012-January/002985.html

It depends on #1605 and #1607.

In this task we extend the (tentatively named) RBNodeRRset so that
it stores a shortcut to the RBNode corresponding to its each RDATA.
For example, an NS RBNodeRRset maintains a list of pointers to the
RBNodes corresponding to the NS names of its RDATAs. In our current
implementation this only happens for NS and MX (in future we should
extend and generalize it).

In my experimental implementation, I built a tentative list of all NS
and MX RRsets while loading the zone content, and once the zone is
built, build the additional pointer list for each NS and MX RRset
stored in the tentative list. Then release the list. This
tentatively requires more memory, so if there's a simpler way to build
it more incremental, that would be nice.

On top of this setup, the in memory version of
ZoeFinder::Context::getAdditional() checks
the pointers to the RBNodes, and search for requested types of RRsets
at each node. Then push all found RRsets to the result vector and
return.

This task completes the entire shortcut feature.

Subtickets

Change History (19)

comment:1 Changed 8 years ago by jelte

  • Milestone changed from Year 3 Task Backlog to Next-Sprint-Proposed

comment:2 follow-up: Changed 8 years ago by vorner

Hello

I could think of a lazy implementation. All these hint pointers are left out as NULL at the beginning (and some kind of flag „precomputed“ is set to false). Once the findAdditional is called for the RRset, the flag is checked. If it is false, the default ZoneFinder::findAdditional is found and the result is stored. If the flag is true, it just returns whatever is stored there.

It has the slight disadvantage that it slows down answers when serving a little bit shortly after the startup. But considering most of the queries are to the same names, the once-time real lookup should not be visible.

Also, this might be extended in future when we have some kind of updates at runtime. The flag could be turned to generation, all A/AAAA would have their generation and if they are not the same, the hint would be recomputed.

But I don't know if we really want to go to the lazy approach, since it is slightly more work, if we want to redesign the internal structure anyway.

comment:3 Changed 8 years ago by jelte

  • Estimated Difficulty changed from 0 to 5

comment:4 Changed 8 years ago by jinmei

  • Feature Depending on Ticket set to auth performance

comment:5 Changed 8 years ago by jelte

  • Milestone changed from Year 3 Task Backlog to Next-Sprint-Proposed

comment:6 Changed 8 years ago by jinmei

  • Description modified (diff)
  • Milestone changed from Year 3 Task Backlog to Next-Sprint-Proposed
  • Summary changed from implement findAdditional() for in memory data source to implement ZoneFinder::Context::getAdditional() for in memory data source

comment:7 Changed 8 years ago by jinmei

  • Priority changed from major to critical

comment:8 Changed 8 years ago by jelte

  • Milestone changed from Next-Sprint-Proposed to Sprint-20120320

comment:9 in reply to: ↑ 2 Changed 8 years ago by jinmei

Replying to vorner:

I'm going to pick up this task. Before doing that, on this point:

I could think of a lazy implementation.

Yeah, we could think about optimizing it further. I've thought about
these kinds of ideas a bit, but right now I'm inclined to stick to
a naive but simple version. Due to the footprint issue, the in-memory
data source isn't realistically usable for larger zones anyway (and
for smaller zones for which the current in-memory is usable in
practice, the various overhead of this naive approach should also be
marginal), so for now I'd complete the basic feature first in a
simplest way.

comment:10 Changed 8 years ago by jinmei

  • Owner set to jinmei
  • Status changed from new to accepted

comment:11 Changed 8 years ago by jinmei

trac1608 is ready for review.

The first two commits are setup for subsequent change and they should
be straightforward (while the size of diff is big). So it may be
easier to review separately (then the size of the remaining diff will
be a more focused one).

The changes around InMemoryZoneFinder::Context and
InMemoryZoneFinder::load() are the main changes for the branch. I
believe the former is basically straightforward. The latter may
require some patience...hopefully the comments are clear. The rest of
the changes are basically for adjusting the interface.

There's one open issue: the current implementation cannot handle the
case where the name of an additional record requires wildcard match.
It's tested in getAdditionalDelegationWithWild and disabled for
in-memory for now. I could address this within this branch, but it's
now getting quite big, and the case should be very minor (and the
solution wouldn't be trivial), so I propose completing the current set
first and defer the wildcard case to a separate ticket.

Due to this minor regression we may need a changelog entry for this,
but expecting we'll address this before the next release, I'd omit it
for now. The change should be otherwise transparent (except for the
better performance, but for that we'll probably create a single
unified changelog entry).

comment:12 Changed 8 years ago by jinmei

  • Owner changed from jinmei to UnAssigned
  • Status changed from accepted to reviewing

comment:13 Changed 8 years ago by jelte

  • Owner changed from UnAssigned to jelte

comment:14 follow-up: Changed 8 years ago by jelte

  • Owner changed from jelte to jinmei
  • Total Hours changed from 0 to 3

In general, the code looks ok, I do have a couple of comments;

  • The two mappings of DomainNode::FLAG_USER are defined in different scopes. This may not be a problem (and in fact, defining them each in the smallest context as possible can be argued for), but it does feel like an abstraction violation, and I wonder if we shouldn't make define them in the same place.
  • Do we want to support additional data and glue that is defined outside of this zone, but available to the system (i.e. in a different zone, or maybe a different data source)? Esp. from a different data source would require more memory (and scary reverse dependencies), since that can't simply point to existing memory, but if we use a 'real' lookup for the actual additional data, but we'd get wildcard matching for 'free'. But even for just the one in-mem datasource, we'd also need some way to reset them for any zone (if a child zone gets loaded later), even if we don't do dynamic updating of zone data. The answer may be 'no' :) (and if yes, out of scope for this ticket). Oh, and if 'no'; should we warn or error or missing needed glue?
  • This comment suggests we are doing things that are very very dirty:
    This is public only because it needs to be used outside
    // of the \c RBNodeRRset class, but conceptually this is a private type.
    

It's less ugly than the comment would suggest, but still :) (private for whom, for instance) Any specific reason to prefer this over an abstract base class?

  • One of the lettuce tests fails too (nsec3_auth.feature scenario 5; fails on the now missing glue from wildcards). We can temporarily change that one as well, but we should give a high priority to fixing this

comment:15 in reply to: ↑ 14 Changed 8 years ago by jinmei

Replying to jelte:

Thanks for the review.

First, this one:

  • One of the lettuce tests fails too (nsec3_auth.feature scenario 5; fails on the now missing glue from wildcards). We can temporarily change that one as well, but we should give a high priority to fixing this

Good catch, and actually it's not about the deferred wildcard case,
but something I intended to include in this branch but forgot. I
fixed it with a corresponding unittest case (62a3bb1).

  • The two mappings of DomainNode::FLAG_USER are defined in different scopes. This may not be a problem (and in fact, defining them each in the smallest context as possible can be argued for), but it does feel like an abstraction violation, and I wonder if we shouldn't make define them in the same place.

I'm not sure if it addresses the concern of "abstract violation", but
I agree we should define them in some concept of the "same set" so
it's clear we define different flag values for different purposes and
have no collisions. I tried to address this point by introducing a
specific namespace (only visible in the .cc file) and define both
flags there. Doe that address your points?

  • Do we want to support additional data and glue that is defined outside of this zone, but available to the system (i.e. in a different zone, or maybe a different data source)? Esp. from a different data source would require more memory (and scary reverse dependencies), since that can't simply point to existing memory, but if we use a 'real' lookup for the actual additional data, but we'd get wildcard matching for 'free'. But even for just the one in-mem datasource, we'd also need some way to reset them for any zone (if a child zone gets loaded later), even if we don't do dynamic updating of zone data. The answer may be 'no' :) (and if yes, out of scope for this ticket). Oh, and if 'no'; should we warn or error or missing needed glue?

Good point. I'd first like to point out that these concerns are not
specific to getAddtional(); it's basically a (possible) "shortcut" of
what the generic auth::Query code did, and these points apply to the
original code, too. So the question is whether we want to return
out-of-zone additional records from the auth server in general. Maybe
we should discuss this at a biweekly call and/or bind10-dev. For the
purpose of this branch, I added an explicit note that it only returns
in-zone (+ glue) records in the doxygen comment. Regarding warning,
maybe we could do at the construction time and especially for missing
in-zone glue (under a zone cut), that would be helpful because it's
more likely an operational error (BIND 9 warns about it as a post-load
check IIRC) - but I'd defer it to a separate ticket (or to the general
"zone loader" work).

  • This comment suggests we are doing things that are very very dirty:
    This is public only because it needs to be used outside
    // of the \c RBNodeRRset class, but conceptually this is a private type.
    

It's less ugly than the comment would suggest, but still :) (private for whom, for instance) Any specific reason to prefer this over an abstract base class?

Yeah, I know it's dirty. Actually, I could completely hide this thing
by introducing an additional layer of pimpl to RBNodeRRset. But for
now I preferred convenience since we'll soon need to revisit the
fundamental structure of this stuff (we cannot hold the full RRset
objects in-memory for various reasons including memory footprint), and
since this header file isn't expected to be used by anything else than
the in-memory implementation and its tests (in fact, if it doesn't
have to be tested directly, the whole definition could be hidden
within .cc). I tried to add some more notes that hopefully are helpful.

comment:16 Changed 8 years ago by jinmei

  • Owner changed from jinmei to jelte

comment:17 follow-up: Changed 8 years ago by jelte

  • Owner changed from jelte to jinmei

Ack, all good, please go ahead and merge.

I've added the question about glue to today's meeting pad

comment:18 in reply to: ↑ 17 Changed 8 years ago by jinmei

Replying to jelte:

Ack, all good, please go ahead and merge.

I've added the question about glue to today's meeting pad

Thanks, merge done, closing.

comment:19 Changed 8 years ago by jinmei

  • Resolution set to complete
  • Status changed from reviewing to closed
  • Total Hours changed from 3 to 16
Note: See TracTickets for help on using tickets.