Opened 8 years ago

Closed 7 years ago

Last modified 7 years ago

#1771 closed defect (fixed)

database datasource incorrectly rejects "on zonecut" glue

Reported by: jinmei Owned by: jinmei
Priority: medium Milestone: Sprint-20120703
Component: data source Version:
Keywords: Cc:
CVSS Scoring: Parent Tickets:
Sensitive: no Defect Severity: N/A
Sub-Project: DNS Feature Depending on Ticket:
Estimated Difficulty: 3 Add Hours to Ticket: 0
Total Hours: 3.5 Internal?: no

Description (last modified by jinmei)

e.g. in the "example" zone,

child.example. IN NS child.example.
child.example. IN AAAA 2001:db8::1

It rejects this setup (at the time of find()) due to this:

DatabaseClient::Finder::getRRsets(const string& name, const WantedTypes& types,
                                  bool check_ns, const string* construct_name,
                                  bool any)
...
    if (check_ns && seen_ns && seen_other) {
        isc_throw(DataSourceError, "NS shares domain " << name <<
                  " with something else");
    }

This is incorrect and should be fixed.

I'm temporarily disabling tests in zone_finder_context_unittest due to
this bug (in the work for #1608). When this ticket is resolved please
re-enable those tests too.

Subtickets

Change History (27)

comment:1 Changed 8 years ago by jinmei

  • Description modified (diff)

comment:2 Changed 8 years ago by jelte

  • Estimated Difficulty changed from 0 to 3

comment:3 Changed 8 years ago by jinmei

  • Milestone set to Next-Sprint-Proposed
  • Priority changed from medium to high

comment:4 Changed 8 years ago by jelte

  • Milestone changed from Next-Sprint-Proposed to Sprint-20120529

comment:5 Changed 8 years ago by jinmei

  • Priority changed from high to medium

comment:6 Changed 8 years ago by muks

  • Owner set to muks
  • Status changed from new to assigned

Picking

comment:7 Changed 8 years ago by muks

  • Owner changed from muks to UnAssigned
  • Status changed from assigned to reviewing

Up for review.

I'm not sure if this is the correct approach, but if it isn't, please point it out in the review. getAdditionalDelegationAtZoneCut does not call find() with FIND_GLUE_OK.

comment:8 Changed 8 years ago by vorner

  • Owner changed from UnAssigned to vorner

comment:9 follow-up: Changed 8 years ago by vorner

  • Owner changed from vorner to muks

Hello

I think this should work. However, I'd appreciate two things:

  • Test that the glue can be extracted by calling find.
  • An explanation comment about the may_have_glue variable. The name is not crystal-clear and I think, while the purpose is obvious in the context of this single diff, it would take some time to grasp the purpose when reading the function.

Also, this allows the domain name to contain any A/AAAA, not just the glue one. I'm not sure there's a reasonable way to check the A/AAAA is really glue, though. So maybe comment the check may have a false negative in that regard?

And, as this is a publicly-observable bug, I think it should have a changelog entry.

Thank you

comment:10 follow-up: Changed 8 years ago by jinmei

Maybe we should discuss this in a wider place (in terms of the
possibly philosophical discussion of the definition of "glue"), but
I personally wouldn't limit the search to A/AAAA.

At least it's different from the original intent of the FIND_GLUE_OK
option (e.g., read its description in zone.h. it doesn't mention
about particular RR types). The in-memory implementation is also
RR-type agnostic.

comment:11 in reply to: ↑ 9 Changed 8 years ago by muks

Hi vorner

Replying to vorner:

I think this should work. However, I'd appreciate two things:

  • Test that the glue can be extracted by calling find.

This bug (tests which were broken) doesn't use FIND_GLUE_OK though.

  • An explanation comment about the may_have_glue variable. The name is not crystal-clear and I think, while the purpose is obvious in the context of this single diff, it would take some time to grasp the purpose when reading the function.

I have added an example of what it indicates as a comment.

Also, this allows the domain name to contain any A/AAAA, not just the glue one. I'm not sure there's a reasonable way to check the A/AAAA is really glue, though. So maybe comment the check may have a false negative in that regard?

I didn't follow what you mean here by any A/AAAA. Can you give me an example of what case you mean?

And, as this is a publicly-observable bug, I think it should have a changelog entry.

How about this:

XYZ.    [bug]           muks
        The database datasource has been fixed to not reject glue records.

comment:12 in reply to: ↑ 10 ; follow-up: Changed 8 years ago by muks

Replying to jinmei:

Maybe we should discuss this in a wider place (in terms of the
possibly philosophical discussion of the definition of "glue"), but
I personally wouldn't limit the search to A/AAAA.

This restriction has been removed now, so all 'other' RRtypes are allowed.

comment:13 Changed 8 years ago by muks

  • Owner changed from muks to vorner

comment:14 in reply to: ↑ 12 ; follow-up: Changed 8 years ago by vorner

  • Owner changed from vorner to muks

Hello

Replying to muks:

Replying to vorner:

I think this should work. However, I'd appreciate two things:

  • Test that the glue can be extracted by calling find.

This bug (tests which were broken) doesn't use FIND_GLUE_OK though.

That is true. But getting the glue from the place where the NS is is valid usage and it was not possible before this fix (maybe that's why the test was not written) and it should be possible now, thanks to the change. So I think a new test should be written to check the scenario.

Also, this allows the domain name to contain any A/AAAA, not just the glue one. I'm not sure there's a reasonable way to check the A/AAAA is really glue, though. So maybe comment the check may have a false negative in that regard?

I didn't follow what you mean here by any A/AAAA. Can you give me an example of what case you mean?

Well, let's say we have this in the same domain name domain.example.org:
NS sub.domain.example.org.
NS sub2.domain.example.org.
A 192.0.2.1

The A record is not the glue, it is not referenced by the NS. But this will not complain. However, such checks would be crazy to code and mostly useless anyway. I just wanted to note it.

And, as this is a publicly-observable bug, I think it should have a changelog entry.

How about this:

XYZ.    [bug]           muks
        The database datasource has been fixed to not reject glue records.

Not all glue records were rejected before, were they?

Replying to muks:

Replying to jinmei:

Maybe we should discuss this in a wider place (in terms of the
possibly philosophical discussion of the definition of "glue"), but
I personally wouldn't limit the search to A/AAAA.

This restriction has been removed now, so all 'other' RRtypes are allowed.

This raises another question. What is the situation when the exception is thrown? Can it happen at all? And if so, in what circumstances? Can it be a false positive too?

    if (check_ns && seen_ns && (!may_have_glue && seen_other)) {
        isc_throw(DataSourceError, "NS shares domain " << name <<
                  " with something else");
    }

comment:15 in reply to: ↑ 14 ; follow-up: Changed 8 years ago by jinmei

Replying to jinmei:

Maybe we should discuss this in a wider place (in terms of the
possibly philosophical discussion of the definition of "glue"), but
I personally wouldn't limit the search to A/AAAA.

This restriction has been removed now, so all 'other' RRtypes are allowed.

This raises another question. What is the situation when the exception is thrown? Can it happen at all? And if so, in what circumstances? Can it be a false positive too?

    if (check_ns && seen_ns && (!may_have_glue && seen_other)) {
        isc_throw(DataSourceError, "NS shares domain " << name <<
                  " with something else");
    }

And, actually, I'd rather basically omit the "NS + seen_other" check,
maybe except for specific cases like NS + DNAME that are prohibited by
the protocol.

We already ensure that records on and under a zone cut are hidden at
the caller side of getRRsets() (unless GLUE_OK is specified), and, as
long as it's ensured I think it's too restrictive to reject all other
cases. IMO this layer should be flexible about what kind of data can
be a glue (except for those already prohibited by the protocol), and
let the higher layer such as auth::Query class make that decision.

In any case the current implementation seems quite tricky:

                seen_ns = true;

                if (Name(columns[DatabaseAccessor::RDATA_COLUMN]) == Name(name))
                    may_have_glue = true;

I cannot understand from a glance what this condition really means, in
which case it doesn't hold, and for what purpose we do this (besides,
it needs editorial fixes: it's too long, and misses braces). Even if
we still want to selectively reject some cases, I'd like to see more
intuitive (and if possible more efficient - it requires text-to-name
conversions twice and comparison of two names, which are not very
cheap) implementation; and, even if the end result is still this
check, I think we need to explain it in comments.

comment:16 in reply to: ↑ 15 ; follow-ups: Changed 8 years ago by muks

  • Owner changed from muks to vorner

Replying to vorner:

I didn't follow what you mean here by any A/AAAA. Can you give me an example of what case you mean?

Well, let's say we have this in the same domain name domain.example.org:
NS sub.domain.example.org.
NS sub2.domain.example.org.
A 192.0.2.1

The A record is not the glue, it is not referenced by the NS. But this will not complain. However, such checks would be crazy to code and mostly useless anyway. I just wanted to note it.

may_have_glue doesn't allow all in-bailiwick cases, but only the exact match. i.e., may_have_glue is only true if NS domain.example.org exists and then the A RR would be allowed. In the above case, may_have_glue would be false.

And, as this is a publicly-observable bug, I think it should have a changelog entry.

How about this:

XYZ.    [bug]           muks
        The database datasource has been fixed to not reject glue records.

Not all glue records were rejected before, were they?

You are right. Before, there was a reject coded only for the cases where a NS record existed and the delegated zone name had other RRs. In-domain cases where the NS record did not equal the delegated zone name (such as the example you've provided above) would have passed. I'll update the ChangeLog? message for it.

Replying to muks:
This raises another question. What is the situation when the exception is thrown? Can it happen at all? And if so, in what circumstances? Can it be a false positive too?

    if (check_ns && seen_ns && (!may_have_glue && seen_other)) {
        isc_throw(DataSourceError, "NS shares domain " << name <<
                  " with something else");
    }

In the example that you provided above, this exception would be thrown now.

Replying to jinmei:

And, actually, I'd rather basically omit the "NS + seen_other" check,
maybe except for specific cases like NS + DNAME that are prohibited by
the protocol.

We already ensure that records on and under a zone cut are hidden at
the caller side of getRRsets() (unless GLUE_OK is specified), and, as
long as it's ensured I think it's too restrictive to reject all other
cases. IMO this layer should be flexible about what kind of data can
be a glue (except for those already prohibited by the protocol), and
let the higher layer such as auth::Query class make that decision.

Removing that check (or replacing with NS+DNAME check) causes some tests to fail. I have to check why it does (and read the datasrc code more completely). But for now, for this bug, can we use the code in this branch which does not change behavior significantly?

In any case the current implementation seems quite tricky:

                seen_ns = true;

                if (Name(columns[DatabaseAccessor::RDATA_COLUMN]) == Name(name))
                    may_have_glue = true;

I cannot understand from a glance what this condition really means, in
which case it doesn't hold, and for what purpose we do this (besides,
it needs editorial fixes: it's too long, and misses braces). Even if
we still want to selectively reject some cases, I'd like to see more
intuitive (and if possible more efficient - it requires text-to-name
conversions twice and comparison of two names, which are not very
cheap) implementation; and, even if the end result is still this
check, I think we need to explain it in comments.

This specifically checks for the in-bailiwick case where the NS name equals the delegated zone name exactly, and allows glue. Before these changes, the code in this method allowed other kinds of _valid glue_ anyway, just not the NS name == delegated zone name case. But in something like the following, the A record is NOT glue:

foo.example.org. NS 3600 ns.example.com.
foo.example.org. A  3600 192.0.2.1

The Name(name) has been replaced with a Name object outside the loop.

comment:17 in reply to: ↑ 16 Changed 8 years ago by muks

Replying to muks:
But in something like the following, the A record is NOT glue:

foo.example.org. NS 3600 ns.example.com.
foo.example.org. A  3600 192.0.2.1

What I mean here is that this is why the NS+seen_other check is present in this method.

comment:18 in reply to: ↑ 16 Changed 8 years ago by jinmei

Replying to muks:

We already ensure that records on and under a zone cut are hidden at
the caller side of getRRsets() (unless GLUE_OK is specified), and, as
long as it's ensured I think it's too restrictive to reject all other
cases. IMO this layer should be flexible about what kind of data can
be a glue (except for those already prohibited by the protocol), and
let the higher layer such as auth::Query class make that decision.

Removing that check (or replacing with NS+DNAME check) causes some tests to fail. I have to check why it does (and read the datasrc code more completely). But for now, for this bug, can we use the code in this branch which does not change behavior significantly?

Hmm, I'm personally tempted to do it in a clean way from the beginning
(assuming we basically agree on what would be cleaner) as I'm afraid
it's quite likely to be left undone as a lower priority thing for a
long period, resulting in keeping less understandable code. But I'm
just an observer in the review process for this ticket anyway, so I'd
leave it to you two.

comment:19 Changed 8 years ago by vorner

  • Owner changed from vorner to muks

Hello

After reading it two more times (the whole function), I came to conclusion that it does what it should. However, the fact that I needed to read it so carefully probably means something. I think it should be cleaned up somehow, but I'm not really sure how.

If it wouldn't be possible, would you add a comment noting to the counterintuitive thing that this compares to the RDATA, not to the name (this was how I was confused before)?

Regarding the comment from Jinmei, I'm not sure. Maybe because I don't know what should and what should not be considered glue. So I'll leave it up for you.

comment:20 Changed 8 years ago by muks

comment:21 Changed 7 years ago by jinmei

I'll take a deeper look into it as there seems to be some confusion
based on the quick chat at yesterday's daily call.

comment:22 Changed 7 years ago by jinmei

  • Owner changed from muks to jinmei

Okay, on looking into it more closely, I've come up with a specific
suggestion, implemented it in trac1771-2, and pushed it.

My suggestion is to make it getRRsets() more dumb: it basically just
sees whether the specified type of RR exists in the given context and
returns it; it doesn't have to care about whether an NS and other
type of RR coexist.

Of course, we need to make sure that nothing on or under zone cut
(except for the delegation information) would accidentally leak to the
caller unless GLUE_OK is specified, but the existing code actually
already ensures that.

So, by making getRRsets() more dumb, the only question is whether we
still want to catch such case explicitly and throw an exception. This
may be debatable, but in my personal opinion we don't have to do
that. Although it's a bit strange configuration and useless in
practice (because it's hidden), it does not necessarily cause harmful
result like ambiguity due to multiple CNAMEs.

The initial commit ef571c2 implements this policy. I think it's quite
simple and straightforward. Although we need to adjust an existing
test case, the affected part is limited. And, without changing any
other part of the main implementation, the previously failing tests
now pass.

The rest of the branch is just for cleanup and documentation update
(which will clarify the related issue a bit more). But even including
these, the entire branch should be pretty concise.

So, in conclusion, my suggestion is to use this version of the change
instead of current trac1771. If we can agree that it's okay to not
throw on seeing "NS+other", I believe this implementation is much
simpler and easier to understand and maintain.

comment:23 Changed 7 years ago by jinmei

  • Owner changed from jinmei to muks

comment:24 follow-up: Changed 7 years ago by muks

  • Owner changed from muks to jinmei

This branch is fine. I agree the code is cleaner now, but should we add a ChangeLog? entry for it because the new branch allows non-glue records now? Even with GLUE_OK, this isn't strictly glue for any of the accepted meanings:

     {"brokenns2.example.org.", "NS", "3600", "", "ns.example.com."},
     {"brokenns2.example.org.", "A", "3600", "", "192.0.2.1"},

Minor change: you may want to move the is_origin assignment into the if (found.first) block:

// @@ DatabaseClient::Finder::findInternal(const Name& name, const RRType& type,

     const bool is_origin = (name == getOrigin());
     final_types.insert(type);
     const FoundRRsets found = getRRsets(name.toText(), final_types,
-                                        !is_origin, NULL,
-                                        type == RRType::ANY());
+                                        NULL, type == RRType::ANY());
     FindDNSSECContext dnssec_ctx(*this, options);
     if (found.first) {
         // Something found at the domain name.  Look into it further to get

Please also repeat for not_origin in a previous hunk too. No need to get these reviewed. :)
You can go ahead and merge.

comment:25 in reply to: ↑ 24 Changed 7 years ago by jinmei

Replying to muks:

Thanks for the review.

This branch is fine. I agree the code is cleaner now, but should we add a ChangeLog? entry for it because the new branch allows non-glue records now? Even with GLUE_OK, this isn't strictly glue for any of the accepted meanings:

     {"brokenns2.example.org.", "NS", "3600", "", "ns.example.com."},
     {"brokenns2.example.org.", "A", "3600", "", "192.0.2.1"},

I've made the following changelog

451.	[bug]		muks, jinmei
	libdatasrc: the database-based data source now correctly returns
	glue records on (not under) a zone cut, such as in the case where
	the NS name of an NS record is identical to its owner name. (Note:
	libdatasrc itself doesn't judge what kind of record type can be a
	"glue"; it's the caller's responsibility.)
	(Trac #1771, git 483f1075942965f0340291e7ff7dae7806df22af)

I'd personally still argue the definition of glue is not fully fixed
and that it's still open whether the above A record can be a glue or
not, although it looks like I'm in the minority group within the BIND
10 team and I know it's seemingly different from what's described in
Peter Koch's draft. In any event, the intent of GLUE_OK (which was
borrowed from a BIND 9 API) was not to give this definition. In
retrospect it should have been named some like "IGNORE_ZONECUT", which
would indicate what this option intends to be and wouldn't cause
unnecessary confusion about the definition of "glue".

I'll make a ticket to make this change and see whether others think it
important enough to address.

Minor change: you may want to move the is_origin assignment into the if (found.first) block:

Please also repeat for not_origin in a previous hunk too. No need to get these reviewed. :)
You can go ahead and merge.

Okay, changed these as suggested, merged, and closing.

Please also delete the original trac1771 branch from the public repo.

comment:26 Changed 7 years ago by jinmei

  • Resolution set to fixed
  • Status changed from reviewing to closed

comment:27 Changed 7 years ago by jinmei

  • Total Hours changed from 0 to 3.5
Note: See TracTickets for help on using tickets.