Opened 8 years ago

Closed 8 years ago

Last modified 8 years ago

#1430 closed defect (fixed)

Data source error with mismatched label counts

Reported by: shane Owned by: shane
Priority: high Milestone: Sprint-20120110
Component: Unclassified Version:
Keywords: Cc:
CVSS Scoring: Parent Tickets:
Sensitive: no Defect Severity: Medium
Sub-Project: DNS Feature Depending on Ticket: none
Estimated Difficulty: 2 Add Hours to Ticket: 0
Total Hours: 2.80 Internal?: no

Description

I have a zone that looks like this:

time-travellers.nl.eu.org. 3600 IN      SOA     b.time-travellers.nl.eu.org. shane_kerr.yahoo.com. 2011111807 300 300 1209600 1800
time-travellers.nl.eu.org. 3600 IN      NS      b.time-travellers.nl.eu.org.
time-travellers.nl.eu.org. 3600 IN      NS      he.time-travellers.nl.eu.org.
time-travellers.nl.eu.org. 3600 IN      NS      borg.c-l-i.net.

This causes an exception in xfrin. I tracked it down and it looks like the problem is caused because of a sign error when looking up the short NS in the longer actual domain. So, looking up BORG.C-L-I.NET in TIME-TRAVELLERS.NL.EU.ORG causes the problem.

Subtickets

Change History (23)

comment:1 Changed 8 years ago by shane

The following patch fixes the problem:

diff --git a/src/lib/datasrc/database.cc b/src/lib/datasrc/database.cc
index 1bf93fc..053d4bc 100644
--- a/src/lib/datasrc/database.cc
+++ b/src/lib/datasrc/database.cc
@@ -520,7 +520,7 @@ DatabaseClient::Finder::find(const isc::dns::Name& name,
                 // It's not empty non-terminal. So check for wildcards.
                 // We remove labels one by one and look for the wildcard there.
                 // Go up to first non-empty domain.
-                for (size_t i(1); i <= current_label_count - last_known; ++i) {
+                for (size_t i(1); i + last_known <= current_label_count; ++i) {
                     // Construct the name with *
                     const Name superdomain(name.split(i));
                     const string wildcard("*." + superdomain.toText());

I still need to make a unit test that causes the failure.

comment:2 Changed 8 years ago by jelte

  • Owner set to shane
  • Status changed from new to assigned

This looks OK, I can confirm it solves the problem, and the change itself shouldn't hurt either way.

comment:3 Changed 8 years ago by jinmei

Note: we need a test that would fail without the fix to complete this ticket.

comment:4 Changed 8 years ago by shane

  • Milestone changed from New Tasks to Next-Sprint-Proposed

comment:5 Changed 8 years ago by shane

  • Feature Depending on Ticket set to none

comment:6 Changed 8 years ago by jelte

  • Estimated Difficulty changed from 0 to 2

comment:7 Changed 8 years ago by jelte

  • Milestone changed from Next-Sprint-Proposed to Sprint-20111220

comment:8 Changed 8 years ago by shane

  • Owner changed from shane to UnAssigned

comment:9 Changed 8 years ago by jelte

  • Priority changed from major to critical

comment:10 Changed 8 years ago by jinmei

Shane:

I've looked into the ticket, and while I see the potential problem in
this area, the specific problem description didn't make sense to me.

Do you mean xfrin tries to look up in the
time-travellers.nl.eu.org. zone for borg.c-l-i.net.? If so,
(referring to the 20111128 release version) it seems an exception
should be triggered at this point:

    const size_t remove_labels(current_label_count - origin_label_count);

    // Now go trough all superdomains from origin down
    for (int i(remove_labels); i > 0; --i) {
        Name superdomain(name.split(i));

which is before the point your suggested patch changes.

So it doesn't make sense to me the patch really solved the problem.

Do you have a copy of the exception output from xfrin or can you
reproduce the problem by reverting your fix?

Also, if xfrin really looked up an out-of-zone name, xfrin itself is
buggy there and should be fixed, too. Even though find() shouldn't
throw an exception this way (i.e., it shouldn't naively produce an
overflow and surprise itself), the general assumption is that the
caller is expected to call find() after identifying the correct zone.
Any call to find() for an out-of-zone name is therefore consider a bug
of the caller, whether or not it results in an exception.

comment:11 Changed 8 years ago by jinmei

  • Owner changed from UnAssigned to shane

comment:12 Changed 8 years ago by shane

  • Owner changed from shane to jinmei

Jinmei reports that he sorted everything on this out except he thinks maybe it was xfrout and not xfrin. I can't remember to be honest, maybe I reported the wrong program. :(

Passing ticket back!

comment:13 Changed 8 years ago by jinmei

trac1430 is ready for review.

First, self comment on my previous comments: On further look I
realized this couldn't trigger an exception:

    const size_t remove_labels(current_label_count - origin_label_count);

    // Now go trough all superdomains from origin down
    for (int i(remove_labels); i > 0; --i) {
        Name superdomain(name.split(i));

If overflow happens "i" would be a negative value so the entire for
loop should be skipped.

Second, as a result of refactoring the find() method an exception
couldn't happen anyway. But, more fundamentally, we should have
caught and rejected out-of-zone query name earlier in the find()
method; otherwise expressions like `name.getLabelCount() -
origin_label_count` would have a bogus value and confuse the rest of
the code. Even if it doesn't trigger an exception the end result is
also garbage (for example, it could be NXRRSET, which is obviously
incorrect).

So, in this branch I added an explicit check for an out-of-zone name
and made sure it's rejected with NXDOMAIN (which is compatible with
the inmemory data source behavior in this case).

As for the application, I suspect it was actually xfrout, not xfrin.
xfrout naively calls find() for NS names in notify_out (which was
actually written by myself, but the older code using the older API had
the same problem. It just didn't result in an exception). As I noted
in my previous comment, this behavior of application is also a bug,
and should be fixed even if it doesn't die due to an exception. I'll
make a separate ticket for this (right now it won't die so it's not
urgent).

After fixing notify_out (for xfrout), I think we should actually
rather throw an exception for out-of-zone qname for find(). At least
returning NXDOMAIN doesn't seem to be the best choice, because then
the caller may misunderstand the name could belong to the zone but
actually doesn't exist. Also, since this is basically an
application's bug should it happen, it would be better to signal it
more explicitly rather than allowing the application to be unaware of
it. I plan to make a separate ticket for this, too.

The proposed changelog entry is:

350.?	[bug]		jinmei
	ZoneFinder::find() for database based data sources didn't
	correctly identify out-of-zone query name and could return a
	confusing result such as NXRRSET.  It now returns NXDOMAIN with an
	empty RRset.  Note: we should rather throw an exception in such a
	case, which should be revisited later.
	(Trac #1430, git TBD)

comment:14 Changed 8 years ago by jinmei

  • Owner changed from jinmei to UnAssigned
  • Status changed from assigned to reviewing

comment:15 Changed 8 years ago by vorner

  • Owner changed from UnAssigned to vorner

comment:16 follow-up: Changed 8 years ago by vorner

  • Owner changed from vorner to jinmei

Hello

The code looks nice and sane. However, can we check somehow this really solves the problem?

Thanks

comment:17 in reply to: ↑ 16 ; follow-up: Changed 8 years ago by jinmei

Replying to vorner:

Hello

The code looks nice and sane. However, can we check somehow this really solves the problem?

Okay, then I'll tentatively give the ticket to Shane so he can at
least confirm this version doesn't re-introduce his problem. (I
suspect the find() refactoring already "hid" the problem so it would
actually be pretty difficult to check if this change "fixes" the
problem).

comment:18 Changed 8 years ago by jinmei

  • Owner changed from jinmei to shane

comment:19 in reply to: ↑ 17 Changed 8 years ago by jinmei

Replying to jinmei:

The code looks nice and sane. However, can we check somehow this really solves the problem?

Okay, then I'll tentatively give the ticket to Shane so he can at
least confirm this version doesn't re-introduce his problem. (I
suspect the find() refactoring already "hid" the problem so it would
actually be pretty difficult to check if this change "fixes" the
problem).

Shane, do you think you can do this check soon (e.g. within this
year)? If you are busy for other higher priority tasks, I think we
can simply merge the branch and close the ticket for now. While
explicit confirmation is nice, I'm quite sure the change itself
doesn't do any harm. If for some other reasons the same/similar
symptom happens we can open a new ticket.

comment:20 Changed 8 years ago by shane

No worries Jinmei, I'll have time tomorrow!

comment:21 Changed 8 years ago by jinmei

Okay, I think it's time up. I've just merged the branch to master,
and will close this ticket for now, mainly for not keeping the review
queue unnecessarily longer.

If this change triggers an unexpected side effect or regression,
please create a new ticket.

comment:22 Changed 8 years ago by jinmei

  • Resolution set to fixed
  • Status changed from reviewing to closed
  • Total Hours changed from 0 to 2.80

comment:23 Changed 8 years ago by shane

No worries. I did finally get time to try this today, and it seems as expected everything is working fine.

Note: See TracTickets for help on using tickets.