Opened 8 years ago

Closed 8 years ago

#1775 closed task (fixed)

update in-memory getAdditional() to handle wildcard match for additional names

Reported by: jinmei Owned by: jinmei
Priority: high Milestone: Sprint-20120403
Component: Unclassified Version:
Keywords: Cc:
CVSS Scoring: Parent Tickets:
Sensitive: no Defect Severity: N/A
Sub-Project: DNS Feature Depending on Ticket: auth performance
Estimated Difficulty: 5 Add Hours to Ticket: 0
Total Hours: 16.64 Internal?: no

Description

This one was deferred from #1608. See the ticket.

Also, ZoneFinderContextTest::getAdditionalDelegationWithWild() fails
due to this issue. With this task completed, the test should succeed
and be re-enabled.

Subtickets

Change History (17)

comment:1 Changed 8 years ago by jelte

  • Estimated Difficulty changed from 0 to 5

comment:2 Changed 8 years ago by jelte

  • Milestone changed from Next-Sprint-Proposed to Sprint-20120403

comment:3 Changed 8 years ago by jinmei

  • Owner set to jinmei
  • Status changed from new to accepted

comment:4 Changed 8 years ago by jinmei

trac1775 is ready for review.

The first 3 commits are actually just set up. It's basically an
internal refactoring so we can share the code logic of identifying
the bestmaching node for a given name in the zone. To do so
I extracted the first part of InMemoryZoneFinderImpl::find() to
a separate function and moved it to the ZoneData? structure.
The first commit (beb36d9) is the main part of this refactoring,
and it should essentially be straightforward extraction. The diff
of this commit itself is pretty large, however, so it's probably
easier to review the branch if this commit is first checked
separately.

I also believe this refactoring is a good cleanup even if it's not for
this task so the find() function is more concise and readable (see
also #578).

The 2nd and 3rd commits are setup for the main change by changing
addAdditional() to use the extracted function.

The last commit was the main change. I believe this one is pretty
easy to understand. As noted, this change involves heavy copies that
could be avoided if our goal is to minimize memory footprint. But
also as noted, I believe this should actually be a pretty rare case in
practice, so I think it should be acceptable. (Total memory footprint
is still an issue, but that applies for the entire data structure, and
we'll need to revisit the whole design fundamentally anyway. We can
revisit the memory footprint for this particular point at that point).

I don't plan to add a changelog for this task.

comment:5 Changed 8 years ago by jinmei

  • Owner changed from jinmei to UnAssigned
  • Status changed from accepted to reviewing

comment:6 Changed 8 years ago by vorner

  • Owner changed from UnAssigned to vorner

comment:7 follow-up: Changed 8 years ago by vorner

  • Owner changed from vorner to jinmei

Hello

I have few notes. First, the cppcheck fails here:

src/lib/datasrc/memory_datasrc.cc:376: check_fail: Possible null pointer dereference: node (error,nullPointer)

I believe this probably can't happen to be NULL, but we might want to assert or check and throw in case it is.

Also, it is not obvious why the size of the tree that is created to hold the wildcard-constructed additionals can't grow and grow (I thing I found a reason after some time of thinking, though). Would it be possible to comment on why the size is bounded by the characteristics of the zone?

And, I do not think this is safe:

        } else {
            // For the tradeoff between safety and performance (of the
            // const version), we discard the constness here.  In practice
            // this should be okay because internally this node was retrieved
            // from the zone tree as a mutable pointer anyway.
            *nodep = const_cast<DomainNode*>(result.node);
        }

To be clear, I do not see any specific problem. But every time I see a const_cast, I get a bad feeling about it. While reinterpret_cast might be completely safe in some circumstances, the const cast probably is never safe if we were to take the C/C++ standards literally. The compiler can detect something is const and optimise based on it. It is not required to check if the const data really is const or if it is broken in some cast way. So every time we use it, we risk a future problem when some compiler introduces an optimisation that takes advantage of it (like it happened with gcc and strict aliasing rules ‒ a lot of code broke and there's the --no-strict-aliasing parameter since then to „fix“ it). Do you think it could be done in a way without that, maybe by reusing the code through a template, or something (hmm, that wouldn't look nice either)?

Thank you

comment:8 in reply to: ↑ 7 ; follow-up: Changed 8 years ago by jinmei

Replying to vorner:

I have few notes. First, the cppcheck fails here:

src/lib/datasrc/memory_datasrc.cc:376: check_fail: Possible null pointer dereference: node (error,nullPointer)

I believe this probably can't happen to be NULL, but we might want to assert or check and throw in case it is.

Hmm, my version of cppcheck (1.52) didn't complain about that, but I
added an assert(). We have similar assert, so this would actually be
more consistent with other parts of the code.

Also, it is not obvious why the size of the tree that is created to hold the wildcard-constructed additionals can't grow and grow (I thing I found a reason after some time of thinking, though). Would it be possible to comment on why the size is bounded by the characteristics of the zone?

I'm not sure if I understand the concern, but I tried to add some more
explanations at d3e876a. Does that address it?

And, I do not think this is safe:

        } else {
            // For the tradeoff between safety and performance (of the
            // const version), we discard the constness here.  In practice
            // this should be okay because internally this node was retrieved
            // from the zone tree as a mutable pointer anyway.
            *nodep = const_cast<DomainNode*>(result.node);
        }

To be clear, I do not see any specific problem. But every time I see a const_cast, I get a bad feeling about it.

Yeah I know, and I didn't like that either. But when I first explored
several ways (including template) to avoid the cast, what I came up
with was either requesting code duplicate or additional code for the
performance sensitive path (which actually slowed it down, although
for a small amount) or quite unreadable code, so I gave up and adopted
the cast.

As explicitly commented, however, I re-read the code, and found one
way that may not be so ugly or sacrifice the performance (40d727b).
Would it be better?

In any case, I'd note that one fundamental background issue is that
RBTree::find() (which findNode() internally uses) breaks constness:
even if it's a const member function, it actually returns a mutable
pointer via 'node' which points to its internal data. Ideally, we
should provide const and non const versions of RBTree::find(). Then
the issue around findNode() could be solved much more cleanly. But
that would also be non trivial (if we also want to avoid logic
duplicate), and would be a separate issue anyway.

If the latest branch addresses the comments, I think I've addressed
all points so far. But I plan to add some more things after that, so
regardless of whether they are okay, I'll still continue to work on
this ticket (but it'd be nice if you could confirm the updates so far).

Thanks,

comment:9 Changed 8 years ago by jinmei

  • Owner changed from jinmei to vorner

comment:10 in reply to: ↑ 8 ; follow-up: Changed 8 years ago by vorner

  • Owner changed from vorner to jinmei

Hello

Replying to jinmei:

Also, it is not obvious why the size of the tree that is created to hold the wildcard-constructed additionals can't grow and grow (I thing I found a reason after some time of thinking, though). Would it be possible to comment on why the size is bounded by the characteristics of the zone?

I'm not sure if I understand the concern, but I tried to add some more
explanations at d3e876a. Does that address it?

More or less. I was worried that someone could create queries where each one would add a new name to the tree, which would make it eat all memory. However, this can't happen since the name must be as the RData in some other thing in the zone. This is easier to follow from this description than before.

As explicitly commented, however, I re-read the code, and found one
way that may not be so ugly or sacrifice the performance (40d727b).
Would it be better?

I believe so.

In any case, I'd note that one fundamental background issue is that
RBTree::find() (which findNode() internally uses) breaks constness:
even if it's a const member function, it actually returns a mutable
pointer via 'node' which points to its internal data. Ideally, we
should provide const and non const versions of RBTree::find(). Then
the issue around findNode() could be solved much more cleanly. But
that would also be non trivial (if we also want to avoid logic
duplicate), and would be a separate issue anyway.

Well, it breaks constnest on the logical level, but not on the compiler definition level. The compiler now has all information on what can or can not change.

If the latest branch addresses the comments, I think I've addressed
all points so far. But I plan to add some more things after that, so
regardless of whether they are okay, I'll still continue to work on
this ticket (but it'd be nice if you could confirm the updates so far).

I think they are OK.

Thanks

comment:11 in reply to: ↑ 10 ; follow-up: Changed 8 years ago by jinmei

Replying to vorner:

I'm not sure if I understand the concern, but I tried to add some more
explanations at d3e876a. Does that address it?

More or less. I was worried that someone could create queries where each one would add a new name to the tree, which would make it eat all memory. However, this can't happen since the name must be as the RData in some other thing in the zone. This is easier to follow from this description than before.

Ah, okay, then I misunderstood you, but in any event that doesn't
happen. As you probably now understand, the auxiliary tree is fixed
at the time of initial loading (maintaining it with dynamic update is
an issue, but by that time we'll have to fundamentally revisit the
structure anyway).

If the latest branch addresses the comments, I think I've addressed
all points so far. But I plan to add some more things after that, so
regardless of whether they are okay, I'll still continue to work on
this ticket (but it'd be nice if you could confirm the updates so far).

I think they are OK.

Okay, good.

I've updated the branch for addressing a couple of more corner cases
(and that's all I intended to make). Unfortunately, it made the
implementation a bit more complicated than I originally expected, but
I believe the changes are not too big to read.

The commit logs and additional tests should clarify the motivation of
the changes.

comment:12 Changed 8 years ago by jinmei

  • Owner changed from jinmei to vorner

comment:13 in reply to: ↑ 11 ; follow-up: Changed 8 years ago by vorner

  • Owner changed from vorner to jinmei

Hello

Replying to jinmei:

Ah, okay, then I misunderstood you, but in any event that doesn't
happen. As you probably now understand, the auxiliary tree is fixed
at the time of initial loading (maintaining it with dynamic update is
an issue, but by that time we'll have to fundamentally revisit the
structure anyway).

By the time, I hope to have the new in-memory data structure. We should soon stop improving this one and start working on the design of the one that can handle the shared memory and dynamic updates.

I've updated the branch for addressing a couple of more corner cases
(and that's all I intended to make). Unfortunately, it made the
implementation a bit more complicated than I originally expected, but
I believe the changes are not too big to read.

The commit logs and additional tests should clarify the motivation of
the changes.

To say the truth, the logic in that file is getting very complicated. Few more changes and it will be virtually impossible to review or fix.

Anyway, that's not really because the new changes, it's just there's too much functionality. I believe the new changes are OK (but I'm surprised the tree does not preserve pointers when updated ‒ after all, it is a tree and it has little reason to change positions of nodes in memory).

comment:14 Changed 8 years ago by vorner

  • Total Hours changed from 0 to 3.98

comment:15 in reply to: ↑ 13 Changed 8 years ago by jinmei

Replying to vorner:

By the time, I hope to have the new in-memory data structure. We should soon stop improving this one and start working on the design of the one that can handle the shared memory and dynamic updates.

[...]

To say the truth, the logic in that file is getting very complicated. Few more changes and it will be virtually impossible to review or fix.

Yeah, I full agree. memory_datasrc.cc has already become non
understandable.

Anyway, that's not really because the new changes, it's just there's too much functionality. I believe the new changes are OK (but I'm surprised the tree does not preserve pointers when updated ‒ after all, it is a tree and it has little reason to change positions of nodes in memory).

The problem is that when we extract node for "b.example" from
"a.b.example" on inserting the former, the current implementation
creates a new node (which is unavoidable of course) and moves the
ownership of the existing name (a.b.example) to the new node. In the
original BIND 9 implementation it preserves existing nodes' ownership
when adding a new node - we should do the same thing.

From a quick look this is the only point that led to the additional
hack for the purpose of this branch. We could create a new ticket to
fix that (and probably also remove the hack at the same time), but
just as we discussed above, we might rather focus on redesigning. Do
you have an opinion?

At the moment, I'm going to add the following comment so we don't
forget it even if we don't do this soon.

// Note: when we redesign this (still keeping the basic concept), we should
// change this part so the newly created node will be used for the inserted
// name (and therefore the name for the existing node doesn't change).
// Otherwise, things like shortcut links between nodes won't work.

(I don't think we need to have an explicit review cycle for this, so
I'll directly add it to the master with the merge. But if you see any
problem please raise it).

Now merge done, closing the ticket.

Thanks.

comment:16 Changed 8 years ago by jinmei

  • Total Hours changed from 3.98 to 16.64

comment:17 Changed 8 years ago by jinmei

  • Resolution set to fixed
  • Status changed from reviewing to closed
Note: See TracTickets for help on using tickets.