Opened 9 years ago

Closed 9 years ago

#1177 closed task (complete)

NSEC support in new data source

Reported by: jinmei Owned by: vorner
Priority: medium Milestone: Sprint-20111011
Component: data source Version:
Keywords: Cc:
CVSS Scoring: Parent Tickets:
Sensitive: no Defect Severity: N/A
Sub-Project: DNS Feature Depending on Ticket:
Estimated Difficulty: 5 Add Hours to Ticket: 0
Total Hours: 0 Internal?: no

Description

Like #1176, we should also support the ability to include NSEC (and
its RRSIG) for negative answers in DatabaseClient::Finder::find().
This should be done after #1176.

Subtickets

Attachments (2)

jinmei.zone.signed (98.4 KB) - added by jinmei 9 years ago.
zone.h.diff (1.2 KB) - added by jinmei 9 years ago.

Download all attachments as: .zip

Change History (29)

comment:1 Changed 9 years ago by shane

  • Milestone changed from Next-Sprint-Proposed to Sprint-20110830

comment:2 Changed 9 years ago by vorner

  • Owner set to vorner
  • Status changed from new to accepted

comment:3 Changed 9 years ago by stephen

  • Estimated Difficulty changed from 0 to 4

comment:4 Changed 9 years ago by stephen

  • Estimated Difficulty changed from 4 to 5

comment:5 follow-up: Changed 9 years ago by vorner

  • Owner changed from vorner to UnAssigned
  • Status changed from accepted to reviewing

I decided to split this, so the continuation lives in #1244. This part implements only the part for DatabaseClient? and the accessors.

Currently, it returns NSEC records when NXRRSET, NXDOMAIN.

As the logic in isc::auth::Query will need to add more NSEC records in case of wildcard, new result codes are added for them. There might be also need for empty nonterminal case.

The new result codes are optional for any backend not supporting DNSSEC, so the in-memory is not updated.

If the accessor doesn't support DNSSEC and the DNSSEC is requested, the NSEC records are not present, but it doesn't throw. Is it OK?

I don't know what is the correct way to handle empty nonterminal asterisk cases (eg. if there's a.*.example.org, then *.example.org is empty nonterminal and therefore b.example.org must give NXRRSET to the user, but what is the correct proof in this case? I think I got lost in the RFCs enough). Or, should we say this doesn't happen and we don't really care much?

I'm going to be away next week, so if there are any bigger changes needed, it might better be given to unassigned for someone else to make them. I'll be online from time to time, so smaller changes should be OK.

Thanks.

comment:6 Changed 9 years ago by jinmei

  • Owner changed from UnAssigned to jinmei

comment:7 follow-up: Changed 9 years ago by jinmei

I'll give the "refactoring" part a review first.

In general, I support the idea of this refactoring: it seems to be
clearer to let getRRset(s) be responsible only for a single task.

I've noted some points where variables can be const. These should be
quite trivial and non controversial, so I made the change directly.

getRRsets()

  • as a result of replacing getRRset() with this one, most of the documentation for the original method was removed, which doesn't seem to be good (even though it may not be fully detailed as it's private).
  • exception parameter variables aren't used and can be removed, e.g.:
            } catch (const InvalidRRType&) {
    
  • I'm not sure if this comment is relevant, but at least NSEC should be able to coexist with NS/CNAME (and as commented DS should be okay with NS, too):
                } else if (cur_type != RRType::RRSIG()) {//RRSIG can live anywhere
                    // FIXME: Is something else allowed in the delegation point? DS?
                    seen_other = true;
                }
    
  • regarding the previous point, doesn't it make more sense to move this type of check outside of this method completely?
  • A map<RRType, RRsetPtr> object is copied twice: when constructing FoundRRsets at the end of the method, and when passing the return value to the caller. Unlike the previous type (a shared pointer), the copy overhead is bigger. Should we care about it? (perhaps doesn't matter much, because operations requiring DB access are generally slow anyway).

Code in the unnamed namespace

  • empty_types is defined as a namespace scope static variable, which seems to be dangerous in terms of static initialization fiasco. Looking into how it's used, I suspect we actually don't (necessarily) need it, especially with taking a risk of introducing the fiasco:
        //static WantedTypes nsec_types(empty_types + RRType::NSEC());
        static WantedTypes nsec_types(WantedTypes() + RRType::NSEC());
    
  • operator+: I personally don't think this definition is intuitive. I'd expect an 'operator+' (with two params) to take the same types of arguments and to meet other natural expectations derived from the arithmetic '+' operator (such as the commutative law). Also, while admittedly this may purely be a matter of taste, I generally agree with the concerns documented in the google's guideline: http://google-styleguide.googlecode.com/svn/trunk/cppguide.xml#Operator_Overloading
  • Considering these, I'd suggest something like this:
    • Define nsec_types etc via a proxy function:
      const set<RRType>&
      NSEC_TYPES() {
          static set<RRType>* types;
          if (types == NULL) {
              auto_ptr<set<RRType> > types_ptr(new set<RRType>()); 
              types_ptr->insert(RRType::NSEC());
              types = types_ptr.release();
          }
          return (*types);
      }
      
    • For 'final_type' and 'wildcard_type', use the same technique with explicit insert():
              WantedTypes final_types(FINAL_TYPES());
              final_types.insert(type);
      
      (btw: with this approach you could also combine the definition of "final" and "wildcard")

find()

  • the changes in the first for loop is not a straightforward refactoring of the original code, and it's difficult to be sure if it really preserves the original behavior. For example, is the behavior same when DNAME exists and the first 'if' conditions are met?
  • as a more fundamental point, this method is now generally too big and complicated, and difficult to follow (although it's not a result of this branch). I suspect we'll need overall cleanup/refacotring to improve clarity and readability. (does #1198 intend to improve the situation?)
  • (not actually for this branch but) I'd avoid the construction overhead of "star" by defining a proxy function:
    const Name&
    STAR_NAME() {
        static Name star("*");
        return (star);
    }
    
  • (ditto) this "TODO" now seems to be doable:
                        // TODO: Once the underlying DatabaseAccessor takes
                        // string, do the concatenation on strings, not
                        // Names
    

comment:8 follow-up: Changed 9 years ago by jinmei

Overall the code looks okay, but (as usual) I have some points to
discuss.

I've also made some editorial/style changes directory on the branch.

Higher level points

  • This should be beyond the scope of this ticket anyway, but I've noticed our approach to findPreviousName() using the reversed name may not work for labels using the \ddd notation. For example, while \001.example.com. < a.example.com. < \200.example.com. (where '<' should be considered isc::dns::Name::operator<) our current DB schema and findPrevious implementation cannot order them correctly. We should probably discuss this at the dev list.
  • In the current implementation the negative proof is always looked for when FIND_DNSSEC flag is on. Assuming it's on when the corresponding query from the client at the higher level has the DO bit, the flag would actually be often on (both BIND 9 and unbound set it by default, as far as I know) while on the other hand many zones would actually not be signed. So the attempt of getting the proof is in many cases unnecessary overhead, which may be too costly even though the database involved lookup is already generally expensive. So, in order to avoid that, I wonder we should probably introduce the notion of a flag for a zone that identifies whether it's "signed" or not. BIND 9 has this flag depending on whether the zone has a DNSKEY record. For database backed zones this may not be trivial because the DNSKEY may be added or deleted bypassing the BIND 10 API, but maybe we should still consider it even if the result is an incomplete solution as a compromise. In any case, this point would also be beyond the scope of this ticket. I'm fine with skipping this point for now.

auth/query.cc

Do we need a test case for the "default"?

zone.h

  • "the DNSSEC/NSEC order" sounds a bit awkward:
        /// Gets the previous name in the DNSSEC/NSEC order. This can be used
        /// to find the correct NSEC records for proving nonexistence
        /// of domains.
    
    At least I've never heard the phrase of "NSEC order". To be fully pedantic, it would be the "canonical order (as specified in RFC 4034)"; to be a bit informal (I'm okay with it) I'd simply say "the DNSSEC order". The same comment applies to several other parts of the diff.

database.cc/h

  • According to lcov there are some untested code fragments:
            } catch (const rdata::InvalidRdataText& ird) {
                isc_throw(DataSourceError, "Invalid rdata in database for " <<
                          name << ": " << columns[DatabaseAccessor::
                          RDATA_COLUMN]);
    [...]
                                if (cni != found.second.end() &&
                                    type != RRType::CNAME()) {
                                    result_rrset = cni->second;
                                    result_status = CNAME;
                                } else if (nsi != found.second.end()) {
                                    result_rrset = nsi->second;
                                    result_status = DELEGATION;
    [...]
                            // The previous doesn't contain NSEC, bug?
                            isc_throw(DataSourceError, "No NSEC in " +
    
  • DatabaseAccessor::findPreviousName(), documentation: I suspect it's not so obvious why we need to use the reversed form, or even what's the "reversed form" in the first place. Some more details about the rationale with a couple of examples would help.
  • DatabaseAccessor::findPreviousName(): what if the name is out of zone and there's actually no "previous name"?
  • DatabaseClient::Finder::findPreviousName(): what if the accessor_->findPreviousName() returns a bogus name (like "example..com")? okay to propagate an exception from the Name class?
  • find(): relating to whether to consider the existence of NSEC in the accessor's findPreviousName() (see the comment for sqlite3_accessor), I'd explicitly expect the case where NSEC doesn't exist here rather than letting the accessor implementation check that:
                        } else {
                            // The previous doesn't contain NSEC, bug?
                            isc_throw(DataSourceError, "No NSEC in " +
                                      coverName.toText() + ", but it was "
                                      "returned as previous - "
                                      "accessor error?");
                        }
    
    and return NXDOMAIN/NXRRSET without a proof.
  • find(): maybe we should log the event here?
                    catch (const isc::NotImplemented&) {
                        // Well, they want DNSSEC, but there is no available.
                        // So we don't provide anything.
                    }
    

database_unittest

  • Is this a question of what's the expected proof of empty non-terminal wildcard match?
            // FIXME: What should be returned in this case? How does the
            // DNSSEC logic handle it?
    
    If so, I'm not 100% sure either, but I guess it's basically no different from other empty non-terminal case. That is, with having wild.*.foo.example.org, the negative proof for a.foo.example.org would be "prev_name(wild.*.foo.example.org) NSEC wild.*.foo.example.org..." I'd also suggest you check this behavior with other implementation. (Note, however, that you'll need "check-names master ignore;" for BIND 9 to force it to accept empty wilds)
  • If I understand the question correctly...
        // FIXME: Is the nonexistence of this node really the correct proof
        // we need?
    
    ...I believe the answer is yes. The NSEC for "l.example.org." whose next name is "empty.nonterminal.example.org" proves that both a node "nonterminal.example.org" exists and it's empty. You can also see how BIND 9 responds to this case by "dig @ns1.jinmei.org pt.y.jinmei.org +dnssec". (If you send it to ns.jinmei.org, it's BIND 10, and it doesn't seem to handle this case correctly:-). Like the previous case, when you are not sure about a certain corner case I'd strongly suggest seeing how other implementation does. Often these things were already considered and handled, and we can simply steal their lesson rather than wasting our time. In some rare cases we find a bug of the other implementation, which is also not a bad thing.
  • Do you mean we might want to differentiate empty non terminal (especially with a proof by NSEC) from "normal" NXRRSET cases?
        doFindTest(*finder, isc::dns::Name("nonterminal.example.org."),
                   isc::dns::RRType::TXT(), isc::dns::RRType::NSEC(), this->rrttl_,
                   ZoneFinder::NXRRSET, // FIXME: Do we want to have specific code?
                   this->expected_rdatas_, this->expected_sig_rdatas_,
                   Name("l.example.org."), ZoneFinder::FIND_DNSSEC);
    
    If so, I'm not sure. BIND 9 seems to have some intent to differentiate these two cases, but it doesn't actually seem to provide different behaviors for these cases.
  • "previous" test: Like the sqlite3 (or accessor in general) case, I suspect the "wrap around" behavior is too much at this moment. Depending on how we should do this, the test may also have to be adjusted.

sqlite3_accessor (general)

  • the different between FIND_PREVIOUS and FIND_PREVIOUS_WRAP doesn't look so obvious. Some comments would help.
  • comment for the wrong line?
        "WHERE zone_id=?1 AND rdtype = 'NSEC' AND "
        "rname < $2 ORDER BY rname DESC LIMIT 1", // FIND_PREVIOUS_WRAP
        "SELECT name FROM records " <== the comment should be here
    
  • convertToPlainCharInternal(): this function was moved to SQLite3Accessor::Context() mainly for omitting specifying 'db'. Now this advantage is effectively gone, I think it's even better to move it back from SQLite3Accessor::Context(), simply naming it convertToPlainChar().

sqlite3_accessor: findPreviousName()

  • when we solely expect SQLITE_OK, I wouldn't bother to get the actual return code (consuming a mutable variable). I would also use sqlite3_errmsg() to provide more informative error message. That is, instead of
        int rc = sqlite3_bind_int(dbparameters_->statements_[FIND_PREVIOUS], 1,
                                  zone_id);
        if (rc != SQLITE_OK) {
            isc_throw(SQLite3Error, "Could not bind zone ID " << zone_id <<
                      " to SQL statement (find previous)");
        }
    
    I'd do this:
        if (sqlite3_bind_int(...) != SQLITE_OK) {
            isc_throw(SQLite3Error, "Could not bind zone ID " << zone_id <<
                      " to SQL statement (find previous): " <<
    		  sqlite3_errmsg(dbparameters_->db_));
        }
    
  • I don't see why we need to include the condition of "rdtype = 'NSEC'" in the FIND_PREVIOUS statement. In fact, at least the current set of unittests all passes even without this condition. The notion of "previous name" is independent of whether the name has NSEC or not (although in practice the only use of it may be DNSSEC related), and it's also more consistent to the "keep it dumb" design principle to exclude this condition. So, I'd suggest simplifying the statement by removing "rdtype = 'NSEC'". On the other hand, if there is a specific reason why we really need to include this condition, we should document it as part of the abstract interface requirement. The same discussion applies to FIND_PREVIOUS_WRAP, but see the next bullet first.
  • Do we really need FIND_PREVIOUS_WRAP? As the code comment seemingly indicates, it doesn't seem to be a functional requirement. And since the implementation of this part has somewhat lengthy (although it's quite straightforward), I'd not include this behavior at the moment until/unless we see the real need for it through out experiences. In either case, I believe the expected behavior in this case (i.e., what happens if the given name is "smaller" than the zone's origin name) should be clearly documented at the abstract interface level.
  • Do you mean "Could not get..." here?
        if (rc != SQLITE_ROW && rc != SQLITE_DONE) {
            // Some kind of error
            isc_throw(SQLite3Error, "Could get data for previous name");
        }
    

sqlite3_accessor_unittest

  • I'd test the case where it would result in the largest name.
  • I'd test the case for an out of domain name (which would be placed before the origin name, such as previous name of "example.com" in zone "example.org". Note that depending on whether to wrap around, the expected result will be different.
  • I'd check (confirm) the comparison is case insensitive.
  • findPreviousNoData: this doesn't seem to be a good test case for the intended scenario (because the input data is bogus not only because it's missing NSEC but also is out of zone). I'd use something like "com.example.sql2.www.", which would make the test succeed if the "previous" name had NSEC. But note also that I personally suspect we shouldn't include the consideration for NSEC at this level in the first place. That discussion may affect the meaning of this test more fundamentally.

comment:9 Changed 9 years ago by jinmei

  • Owner changed from jinmei to vorner

comment:10 in reply to: ↑ 5 Changed 9 years ago by jinmei

Replying to vorner:

I don't know what is the correct way to handle empty nonterminal asterisk cases (eg. if there's a.*.example.org, then *.example.org is empty nonterminal and therefore b.example.org must give NXRRSET to the user, but what is the correct proof in this case? I think I got lost in the RFCs enough). Or, should we say this doesn't happen and we don't really care much?

I believe I've answered this question in the review comment (for the
database test). But I'm not 100% sure about it either. I'd suggest
you trying that case with other implementations.

comment:11 in reply to: ↑ 7 ; follow-up: Changed 9 years ago by vorner

Hello

So, answering the first part first, the comments are long.

Replying to jinmei:

  • as a result of replacing getRRset() with this one, most of the documentation for the original method was removed, which doesn't seem to be good (even though it may not be fully detailed as it's private).

Yes, good catch, I intended to adapt it and forgot about it.

  • I'm not sure if this comment is relevant, but at least NSEC should be able to coexist with NS/CNAME (and as commented DS should be okay with NS, too):

I added the NSEC and NSEC3 (I expect that one is there too). The DS needs separate
bool variable, as it is not allowed with CNAME.

  • regarding the previous point, doesn't it make more sense to move this type of check outside of this method completely?

From the design point of view, maybe. But this is the place where it seem easiest. If we were to take them out, all the NS and CNAME and other types would have to be returned every time and every time a check would have to be performed, which looks like a bad idea.

  • A map<RRType, RRsetPtr> object is copied twice: when constructing FoundRRsets at the end of the method, and when passing the return value to the caller. Unlike the previous type (a shared pointer), the copy overhead is bigger. Should we care about it? (perhaps doesn't matter much, because operations requiring DB access are generally slow anyway).

As the map will contain only few elements, it shouldn't be much of a problem. And I don't think it is unrealistic to expect the compiler to optimise both copies away and create the result in-place, as the complete code of map is available.

  • empty_types is defined as a namespace scope static variable, which seems to be dangerous in terms of static initialization fiasco. Looking into how it's used, I suspect we actually don't (necessarily) need it, especially with taking a risk of introducing the fiasco:
        //static WantedTypes nsec_types(empty_types + RRType::NSEC());
        static WantedTypes nsec_types(WantedTypes() + RRType::NSEC());
    

Ah, right. Well, I don't think there could be the fiasco, but from second look at it, it was probably mixture of premature optimisation on my side and confusion about what I'm going to write anyway.

  • operator+: I personally don't think this definition is intuitive. I'd expect an 'operator+' (with two params) to take the same types of arguments and to meet other natural expectations derived from the arithmetic '+' operator (such as the commutative law). Also, while admittedly this may purely be a matter of taste, I generally agree with the concerns documented in the google's guideline: http://google-styleguide.googlecode.com/svn/trunk/cppguide.xml#Operator_Overloading

Actually, I don't agree there at all with the google styleguide. Then we're getting to the java world, where you have to write string_a.equals(string_b), because it can't overload operators. That is much less readable code than with ==.

As with this concrete operator ‒ adding an element to a set is something I did see written this way in mathematics, mainly in graph theory and combinatorics (and even substracting an element from set). So, to me, the use of the operator for this is much more intuitive than + with strings. You have a valid point with the commutative thing, but it isn't used the other way around (in the code, nor in mathematics), even if it could be, so I just saved myself writing another copy with switched order of parameters.

Even if the definition might be less intuitive, I think the intention is clear where used and probably can't be confused with any other possible use of the + operator.

  • Considering these, I'd suggest something like this:
    • Define nsec_types etc via a proxy function:
      const set<RRType>&
      NSEC_TYPES() {
          static set<RRType>* types;
          if (types == NULL) {
              auto_ptr<set<RRType> > types_ptr(new set<RRType>()); 
              types_ptr->insert(RRType::NSEC());
              types = types_ptr.release();
          }
          return (*types);
      }
      

That's what I tried before introducing the operator and it turned out to be a lot of typing and quite a lot of code, which was much less readable than simple inline list of types separated by + signs.

So, I'd rather like to keep the operator there. It's pity we can't overload the , operator, because that one would be even better for the list. But this is C++, not perl6.

  • the changes in the first for loop is not a straightforward refactoring of the original code, and it's difficult to be sure if it really preserves the original behavior. For example, is the behavior same when DNAME exists and the first 'if' conditions are met?

Well, as I get it, that is the exact reason why we have tests. All the previous tests did pass, so the behaviour we are interested in should be the same. And even when I decided the original code was too complicated and I just rewrote this bit and it potentially acted differently in some situation than the original, if this one looks correct (and it does to me), it should be OK, because we did not try the original in anything but the unittests anyway.

So, is there a reason to stick to the exact original code?

  • as a more fundamental point, this method is now generally too big and complicated, and difficult to follow (although it's not a result of this branch). I suspect we'll need overall cleanup/refacotring to improve clarity and readability. (does #1198 intend to improve the situation?)

Yes, that's the exact reason for #1198, you are not the first one to notice ;-).

  • (ditto) this "TODO" now seems to be doable:

Yes, right, I modified that one. That one solves the star thing as well.

comment:12 in reply to: ↑ 8 ; follow-ups: Changed 9 years ago by vorner

  • Owner changed from vorner to jinmei

Hello

Replying to jinmei:

  • This should be beyond the scope of this ticket anyway, but I've noticed our approach to findPreviousName() using the reversed name may not work for labels using the \ddd notation. For example, while \001.example.com. < a.example.com. < \200.example.com. (where '<' should be considered isc::dns::Name::operator<) our current DB schema and findPrevious implementation cannot order them correctly. We should probably discuss this at the dev list.

Crap. That complicates things.

As I don't see an obvious way how to solve this, I agree we want to put it to the dev list to discuss a possible solution. Also, the original SQLit3 implementation has this problem.

  • In the current implementation the negative proof is always looked for when FIND_DNSSEC flag is on. Assuming it's on when the corresponding query from the client at the higher level has the DO bit, the flag would actually be often on (both BIND 9 and unbound set it by default, as far as I know) while on the other hand many zones would actually not be signed. So the attempt of getting the proof is in many cases unnecessary overhead, which may be too costly even though the database involved lookup is already generally expensive. So, in order to avoid that, I wonder we should probably introduce the notion of a flag for a zone that identifies whether it's "signed" or not. BIND 9 has this flag depending on whether the zone has a DNSKEY record. For database backed zones this may not be trivial because the DNSKEY may be added or deleted bypassing the BIND 10 API, but maybe we should still consider it even if the result is an incomplete solution as a compromise. In any case, this point would also be beyond the scope of this ticket. I'm fine with skipping this point for now.

OK, let's skip it. Anyway, we may want to set the flag depending on if the DB is expected to support it/the zone is signed, so this might even be modification somewhere else.

auth/query.cc

Do we need a test case for the "default"?

I don't think so. This is only temporary, so it compiles. #1244 should handle it (part split off this ticket) and we can't really put the new code into use without it. Writing test for this temporary behaviour for the next week or so would be a waste of effort.

  • According to lcov there are some untested code fragments:

I added some more tests which should test it.

  • DatabaseClient::Finder::findPreviousName(): what if the accessor_->findPreviousName() returns a bogus name (like "example..com")? okay to propagate an exception from the Name class?

Good catch. I added a test and code to handle it (the code is little bit awkward, but commented. It's due to Name(...) throwing quite a bunch of different exceptions which don't have common NameError? ancestor).

  • find(): relating to whether to consider the existence of NSEC in the accessor's findPreviousName() (see the comment for sqlite3_accessor), I'd explicitly expect the case where NSEC doesn't exist here rather than letting the accessor implementation check that:

For one, I copied it from the original code, so I thought there was a reason for it protocol-wise.
For second, I kind of expected if there are unsigned delegations or unsigned glue data, they wouldn't have NSEC even if the zone was properly signed. In that case, we should skip them, right? Or am I wrong at the assumption? Or you think the higher level should iterate until it finds the correct one with NSEC? What if there's no NSEC in the whole zone? Wouldn't it be expensive?

  • Is this a question of what's the expected proof of empty non-terminal wildcard match? [...] If so, I'm not 100% sure either, but I guess it's basically no different from other empty non-terminal case. That is, with having wild.*.foo.example.org, the negative proof for a.foo.example.org would be "prev_name(wild.*.foo.example.org) NSEC wild.*.foo.example.org..." I'd also suggest you check this behavior with other implementation. (Note, however, that you'll need "check-names master ignore;" for BIND 9 to force it to accept empty wilds)

After an hour of fighting bind9, I didn't manage to make it return signatures and NSEC data, so I gave up and implemented it this way. I seem to be a bad admin :-|.

  • Do you mean we might want to differentiate empty non terminal (especially with a proof by NSEC) from "normal" NXRRSET cases?

Well, it kind of depends if we need to add another NSEC to prove nonexistence of something else. But I guess we don't so it's OK. I'm still not completely clear of how to prove nonexistence of each single scenario, I'll need to draw some pictures and diagrams if I take #1244.

  • "previous" test: Like the sqlite3 (or accessor in general) case, I suspect the "wrap around" behavior is too much at this moment. Depending on how we should do this, the test may also have to be adjusted.

Well, it depends. We might want to prove that something is before the origin of zone in some way. In that case, we should get the last one somehow and know it was the case before anything. What do you propose to do about it? Or that it should never get into this part of code anyway?

sqlite3_accessor: findPreviousName()

  • when we solely expect SQLITE_OK, I wouldn't bother to get the actual return code (consuming a mutable variable). I would also use sqlite3_errmsg() to provide more informative error message. That is, instead of

OK, but I don't think the mutable variable is a problem to anything. This might be slightly simpler code, the original one slightly more consistent, but I don't care here.

  • I don't see why we need to include the condition of "rdtype = 'NSEC'" in the FIND_PREVIOUS statement. In fact, at least the current set of unittests all passes even without this condition. The notion of "previous name" is independent of whether the name has NSEC or not (although in practice the only use of it may be DNSSEC related), and it's also more consistent to the "keep it dumb" design principle to exclude this condition. So, I'd suggest simplifying the statement by removing "rdtype = 'NSEC'". On the other hand, if there is a specific reason why we really need to include this condition, we should document it as part of the abstract interface requirement. The same discussion applies to FIND_PREVIOUS_WRAP, but see the next bullet first.

As I noted above, we may want the behaviour or not. I'm not completely sure if it is needed, if it isn't, I'm OK with removing it. If it is, I'll write a test checking this and add it to the interface requirement.

  • Do we really need FIND_PREVIOUS_WRAP? As the code comment seemingly indicates, it doesn't seem to be a functional requirement. And since the implementation of this part has somewhat lengthy (although it's quite straightforward), I'd not include this behavior at the moment until/unless we see the real need for it through out experiences. In either case, I believe the expected behavior in this case (i.e., what happens if the given name is "smaller" than the zone's origin name) should be clearly documented at the abstract interface level.

You think it won't be needed? I'm actually OK with both options (better documentation/removing it), but I kept it for now, because I think it will be needed. Do you see a way how to do it without this support?

Thank you & sorry for taking so long

comment:13 in reply to: ↑ 11 Changed 9 years ago by jinmei

Replying to vorner:

  • I'm not sure if this comment is relevant, but at least NSEC should be able to coexist with NS/CNAME (and as commented DS should be okay with NS, too):

I added the NSEC and NSEC3 (I expect that one is there too). The DS needs separate
bool variable, as it is not allowed with CNAME.

Hmm, that's true for NSEC3, but I see some subtle difference for that
case. NSEC3 RRs effectively belong to a separate name space of the
zone, so the rationale why the owner name of an NSEC3 can have other
types is different from the reason for RRSIG and NSEC. In that sense
I'd update the code comment to reflect the subtlety:

                // NSEC and RRSIG can coexist with anything, otherwise
                // we've seen something that can't live together with potential
                // CNAME or NS

(maybe it's sufficient to add NSEC3 in addition to NSEC though)

  • operator+: I personally don't think this definition is intuitive. I'd expect an 'operator+' (with two params) to take the same types of arguments and to meet other natural expectations derived from the arithmetic '+' operator (such as the commutative law). Also, while admittedly this may purely be a matter of taste, I generally agree with the concerns documented in the google's guideline: http://google-styleguide.googlecode.com/svn/trunk/cppguide.xml#Operator_Overloading

Actually, I don't agree there at all with the google styleguide. Then we're getting to the java world, where you have to write string_a.equals(string_b), because it can't overload operators. That is much less readable code than with ==.

Actually, I said "*generally* agree" due to such examples. My opinion
is that there are cases where operator overloading contributes to
better readability, and if the benefit outweighs the pitfalls I'd
rather prefer using them than dogmatically rejecting them. So it's
case by case basis, and I said "generally" because IMO different
people tend to have different views on clarity or intuitiveness and a
simple operator can often cause more confusion than clarity.

Now, for this particular case...

As with this concrete operator ‒ adding an element to a set is something I did see written this way in mathematics, mainly in graph theory and combinatorics (and even substracting an element from set).

...I've not seen textbooks in these theoretical areas for more than a
decade so I don't quite remember the exact notation in such cases. I
can see it if the operator + is used exactly as the operation of
union, i.e., setA + { element_being_added }, and I can imagine that
the literature may introduce a convenient abbreviation of allowing
"{}" to be omitted.

But when I first saw the code, it was just confusing; I first thought
it was one of the magical class methods of std::map and went to see an
online manual (unsuccessfully), then I found the private definition in
database.cc with my natural expectation that it should be a symmetric
operator, then found it was not, surprised, read the definition more
closely, then finally understood it.

Once I followed all these steps, I saw the code was correct, and once
I understood it, it was not difficult to read the rest of the code.
But I needed to spend a certain amount of time and brain cycles until
I reached that point. In such a case I wouldn't consider the use of
operator overloading a reasonable abstraction. Note specifically that
I had to go to cplusplus.com before knowing this operator is actually
our own invention.

So, to me, the use of the operator for this is much more intuitive than + with strings. You have a valid point with the commutative thing, but it isn't used the other way around (in the code, nor in mathematics), even if it could be, so I just saved myself writing another copy with switched order of parameters.
Even if the definition might be less intuitive, I think the intention is clear where used and probably can't be confused with any other possible use of the + operator.

As I said above, the intention is understandable (not sure if it's
"clear") once you follow the non trivial steps. But the question is
how intuitive and easy to understand the intent is to the majority of
the readers of the code who tries to understand and possibly update
it, compared to other, possibly more verbose ways. Due to the hassles
I explain, it's not so obvious to me at best.

Citing another reference as you don't seem to like the google
guideline, I also "generally" with what is written in the "C++ Coding
Standards" book on this matter. If you don't have it, go to
http://www.amazon.com/exec/obidos/ASIN/0321113586,
click on "look inside" and search for "Tensor". "Programmers hate
surprises" (I was surprised when I first saw the implementation using
the operator+). "Programmers expect operators to come in bundles" (I
did when I first saw the implementation, and I was confused as I found
the expectation didn't hold). "Named functions ... therefore should
be preferred for clearer code if there can be any doubt about
semantics" (at least I had a doubt about it when I first saw it,
although I can't speak for many others).

  • Considering these, I'd suggest something like this:
    • Define nsec_types etc via a proxy function:
      const set<RRType>&
      NSEC_TYPES() {
          static set<RRType>* types;
          if (types == NULL) {
              auto_ptr<set<RRType> > types_ptr(new set<RRType>()); 
              types_ptr->insert(RRType::NSEC());
              types = types_ptr.release();
          }
          return (*types);
      }
      

That's what I tried before introducing the operator and it turned out to be a lot of typing and quite a lot of code, which was much less readable than simple inline list of types separated by + signs.

(I just noticed we could actually replace new + auto_ptr with another
static object, but that probably doesn't make it so different in this
context)

I admit this approach looks less stylish and requires longer lines of
code (which I generally don't like either). But regarding the
readability I don't necessarily think it's worse. If I encounter
code:

                                    found = getRRsets(wildcard, NSEC_TYPES(),
                                                      true);

I'd assume NSEC_TYPES would be effectively a constant object just
idiomatically returned by a proxy function. So I'd then search for
the definition of it, and will find it:

// Convenient shortcut for a set of { NSEC }
const set<RRType>&
NSEC_TYPES() {
    static set<RRType> types;
    static bool initialized(false);

    if (!initialized) {
        types.insert(RRType::NSEC());
	initialized = true;
    }
    return (types);
}

it doesn't contradict my expectation, and at least to me pretty easy
to understand without any surprise, unlike the case with operator+.

We'll need to do the same thing for DELEGATION_TYPES(), FINAL_TYPES()
and WILDCARD_TYPES(), and I admit repeating the same pattern is dull,
but it seems unlikely that we have to continue this practice, so it's
basically one time effort. The only other places where we need to
perform insert() are these:

        found = getRRsets(name.toText(), final_types + type, name != origin);
...
                    found = getRRsets(wildcard, wildcard_types + type, true,
                                      &construct_name);

which would look like

        WantedTypes wanted_types(FINAL_TYPES());
	wanted_types.insert(type);
        found = getRRsets(name.toText(), wanted_types, name != origin);
...
                    WantedTypes wanted_types(WILDCARD_TYPES());
                    wanted_types.insert(type);
                    found = getRRsets(wildcard, wanted_types, true,
                                      &construct_name);

and while it results in 4 more lines which is not good, I can
personally accept that.

So, I'd rather like to keep the operator there. It's pity we can't overload the , operator, because that one would be even better for the list. But this is C++, not perl6.

(Not sure what you mean by we can't - C++ allows overloading
"operator," but that doesn't matter much in this context anyway) I
don't claim the proxy function approach is obviously better. While
personally I prefer that because it's more straightforward and can
understand it without a surprise, I also see its drawbacks. And
although I less prefer the overloading approach, I don't see a
technical flaw in it (unlike statically initialized namespace-scope
objects). If you still believe the approach with an overloaded
operator with taking account into all I explained above, especially
that different people have different senses of clarity and it may not
be so obvious as the original code author tends to think (to whom it
tends to be so obvious because it's based on his/her own sense), I
won't fight against that further. But I have two final
suggestions/comments in that case:

  • If possible I'd hope using something different from operator+ because this one is quite confusing at least to me. I once thought about operator<<, but probably you might not like that either because it may look modifying the left operand.
  • Because of the point that clarity/readability is in the eye of the beholder, I'd suggest revising this comment:
    // This arguably produces more readable code than having bunch of proxy
    // functions and set.insert calls scattered through the code.
    
    I'd simply say, following the previous paragraph, something like "this will help keep the lines of code shorter".
  • the changes in the first for loop is not a straightforward refactoring of the original code, and it's difficult to be sure if it really preserves the original behavior. For example, is the behavior same when DNAME exists and the first 'if' conditions are met?

Well, as I get it, that is the exact reason why we have tests. All the previous tests did pass, so the behaviour we are interested in should be the same. And even when I decided the original code was too complicated and I just rewrote this bit and it potentially acted differently in some situation than the original, if this one looks correct (and it does to me), it should be OK, because we did not try the original in anything but the unittests anyway.

So, is there a reason to stick to the exact original code?

In general, I don't necessarily oppose to piggy-backing improvements
(changing the behavior) as part of "refactoring". In fact I often do
it myself. I see two issues in this particular context, however.
First, tests only provide necessary checks, not sufficient ones. The
fact that the new code doesn't break existing tests means it doesn't
break the original behavior only covered by the tests. And, as I
noted in the subsequent review comments, there were in fact uncovered
part of the code. Second, I wouldn't be too much worry about
piggy-backing if the original code were reasonably simple and easy to
understand as I could be quite confident that the behavior change is
safe. In the case of this code, however, as we both seem to have
agreed the code is quite complicated, and I cannot be sure that the
behavior change doesn't have other adverse effects I'm missing.

So, in this specific case, I'd be much happier if we could make it
more gradually, i.e., fast refactor the code in its strictness sense
without introducing behavior changes, then improve the behavior based
on the (hopefully) now more understandable code.

That said, rewinding everything at this point may not be an effective
way of development. If you can list up things you changed with why,
and if it's reasonable, that's probably sufficient.

comment:14 in reply to: ↑ 12 Changed 9 years ago by jinmei

Replying to jinmei:

One quick point I clearly need to correct (myself):

  • Is this a question of what's the expected proof of empty non-terminal wildcard match? [...] If so, I'm not 100% sure either, but I guess it's basically no different from other empty non-terminal case. That is, with having wild.*.foo.example.org, the negative proof for a.foo.example.org would be "prev_name(wild.*.foo.example.org) NSEC wild.*.foo.example.org..." I'd also suggest you check this behavior with other implementation. (Note, however, that you'll need "check-names master ignore;" for BIND 9 to force it to accept empty wilds)

After an hour of fighting bind9, I didn't manage to make it return signatures and NSEC data, so I gave up and implemented it this way. I seem to be a bad admin :-|.

After playing with existing implementations and re-reading RFC4035, I
realized I had been a bit confused when I made this comment. Using
the above example, this should be the case for Section 3.1.3.2 of
RFC4035. And,

   o  An NSEC RR proving that there is no exact match for <SNAME,
      SCLASS>.

should be:

prev_name(a.foo.example.org.) NSEC next_name(a.foo.example.org.) ...

while

   o  An NSEC RR proving that the zone contains no RRsets that would
      match <SNAME, SCLASS> via wildcard name expansion.

should be

prev_name(*.foo.example.org) NSEC wild.*.foo.example.org. ...

And, in my understanding of the current branch, the latter is out of
scope (deferred to #1244), right? So, in the context of this branch
what should be returned is the former. Note that
prev_name(a.foo.example.org.) may not be equal to
wild.*.foo.example.org. If there's an RR for 1.foo.example.org, for
example, that will be the previous name.

So, commit dd340b3 is not really (yet) correct, and should be revised.

As for experiments with other implementations, try the attached zone
file for zone "jinmei.org". For BIND 9, you'll need to specify
check-names in the options section:

options {
...
	check-names master ignore;
};
zone "jinmei.org" {
...
     file "jinmei.zone.signed";
};

and query for c.b.jinmei.org and ham.b.jinmei.org. Actually, BIND 9
doesn't seem to return a correct answer. I also tried it with NSD
3.2.7, which seems to provide the really expected behavior.

Changed 9 years ago by jinmei

comment:15 in reply to: ↑ 12 ; follow-up: Changed 9 years ago by jinmei

Replying to vorner:

Replying to jinmei:

  • This should be beyond the scope of this ticket anyway, [...]

Crap. That complicates things.

As I don't see an obvious way how to solve this, I agree we want to put it to the dev list to discuss a possible solution. Also, the original SQLit3 implementation has this problem.

Okay, and I know the problem has been in the original implementation.

  • In the current implementation the negative proof is always looked for when FIND_DNSSEC flag is on. [...]

OK, let's skip it. Anyway, we may want to set the flag depending on if the DB is expected to support it/the zone is signed, so this might even be modification somewhere else.

Okay.

  • DatabaseClient::Finder::findPreviousName(): what if the accessor_->findPreviousName() returns a bogus name (like "example..com")? okay to propagate an exception from the Name class?

Good catch. I added a test and code to handle it (the code is little bit awkward, but commented. It's due to Name(...) throwing quite a bunch of different exceptions which don't have common NameError? ancestor).

I'd personally catch isc::Exception as a whole, but I'm okay with your
code, too. Also, I've been aware that we probably need some more
class hierarchies for exceptions (and some exception classes are too
detailed - we probably only need one "NameError?" exception class).
Hopefully we can revisit this in near future and simplify the catching
side of code.

  • find(): relating to whether to consider the existence of NSEC in the accessor's findPreviousName() (see the comment for sqlite3_accessor), I'd explicitly expect the case where NSEC doesn't exist here rather than letting the accessor implementation check that:

For one, I copied it from the original code, so I thought there was a reason for it protocol-wise.

As far as I know there's no protocol restriction in this context
(except the fact that only protocol elements using the notion of
"previous name" would probably be NSEC (and deprecated NXT)). I know
it's derived from the original implementation. And, as far as I can
see, the original code doesn't exploit the fact that
findPreviousName() includes the existence of an NSEC RR in the
condition of "previous"; the caller doesn't assume the existence.

For second, I kind of expected if there are unsigned delegations or unsigned glue data, they wouldn't have NSEC even if the zone was properly signed. In that case, we should skip them, right? Or am I wrong at the assumption? Or you think the higher level should iterate until it finds the correct one with NSEC? What if there's no NSEC in the whole zone? Wouldn't it be expensive?

Hmm, for glues (under a zone cut), you're probably right. They'll not
have an NSEC (or RRSIG for that matter), and if we didn't include the
existence of NSEC as part of the "previous name" condition the lookup
result would be incorrect or the process would be even more expensive.

So, now I see the need for it. On thinking further with this in mind,
however, it seems the essential point is that findPreviousName()
should search names above and the parent half of zone cuts
(i.e. excluding anything below a zone cut). Using the existence (or
non existence) of NSEC is one practical way to implement the concept,
but at this level of abstraction it seems to me too restrictive to
assume that way of implementation - a different database backend may
introduce an explicit notion of zone cut (for efficiency
reasons or something) and may still be able to meet the requirement of
findPreviousName() without relying on NSEC.

So, my revised suggestion is:

  • add the documentation of DatabaseAccessor::findPreviousName() that names under a zone cut be excluded (but ones on the parent half of cuts must be included). Also describe that it's up to specific derived implementation how to realize that, but one practical way is to include the existence of an NSEC RR.
  • in the caller side, don't assume the existence of NSEC even if findPreviousName() successfully returns. If NSEC RR cannot be found on the name, either simply skip including NSEC or throw an exception, which would subsequently result in SERVFAIL in practice. (in the latter case the end result is the same as the current implementation, but the assumption is different - it's not necessarily an implementation bug, but may be broken or not properly signed database).

I don't know what you mean by "unsigned delegations", btw. Do you
mean opt-out? If so, my understanding is that it's NSEC3 specific.
Or perhaps are you confused with DS?

  • "previous" test: Like the sqlite3 (or accessor in general) case, I suspect the "wrap around" behavior is too much at this moment. Depending on how we should do this, the test may also have to be adjusted.

Well, it depends. We might want to prove that something is before the origin of zone in some way. In that case, we should get the last one somehow and know it was the case before anything. What do you propose to do about it? Or that it should never get into this part of code anyway?

I'd first point out that this behavior doesn't exist the current
implementation, so this is adding a new behavior without a clear
reason. We *might* want that behavior in future for some yet-to-know
reason, but until then I'd rather keep the interface and
implementation smaller and simpler. Currently, and in practice, this
interface would be used after identifying a zone containing the query
name, so it shouldn't happen the query name is "smaller" than zone
origin. Should that happen, I'd just throw an exception at the moment.

  • I don't see why we need to include the condition of "rdtype = 'NSEC'" in the FIND_PREVIOUS statement.

As I noted above, we may want the behaviour or not. I'm not completely sure if it is needed, if it isn't, I'm OK with removing it. If it is, I'll write a test checking this and add it to the interface requirement.

Now I see why:-) Let's keep it.

  • Do we really need FIND_PREVIOUS_WRAP? As the code comment seemingly indicates, it doesn't seem to be a functional requirement. And since the implementation of this part has somewhat lengthy (although it's quite straightforward), I'd not include this behavior at the moment until/unless we see the real need for it through out experiences. In either case, I believe the expected behavior in this case (i.e., what happens if the given name is "smaller" than the zone's origin name) should be clearly documented at the abstract interface level.

You think it won't be needed? I'm actually OK with both options (better documentation/removing it), but I kept it for now, because I think it will be needed. Do you see a way how to do it without this support?

See above. If you can show a specific usage where we need it, I
wouldn't object. If you simply feel we might need it, I'd suggest
leaving it out of scope (with exception) unless and until we really
need it.

Finally, one small question. What's the purpose of this record? It's
not clear from the diff itself (f16de89).

+    {"delegation.example.org.", "DS", "3600", "", "1 RSAMD5 2 abcd"},

comment:16 Changed 9 years ago by jinmei

  • Owner changed from jinmei to vorner

comment:17 in reply to: ↑ 15 ; follow-up: Changed 9 years ago by vorner

  • Owner changed from vorner to jinmei

Hello

Replying to jinmei:

Hmm, that's true for NSEC3, but I see some subtle difference for that
case. NSEC3 RRs effectively belong to a separate name space of the
zone, so the rationale why the owner name of an NSEC3 can have other
types is different from the reason for RRSIG and NSEC. In that sense
I'd update the code comment to reflect the subtlety:

                // NSEC and RRSIG can coexist with anything, otherwise
                // we've seen something that can't live together with potential
                // CNAME or NS

(maybe it's sufficient to add NSEC3 in addition to NSEC though)

I added a comment there why they are ignored right now. I think it is enough to just ignore them and not care.

But when I first saw the code, it was just confusing; I first thought
it was one of the magical class methods of std::map and went to see an
online manual (unsuccessfully), then I found the private definition in
database.cc with my natural expectation that it should be a symmetric
operator, then found it was not, surprised, read the definition more
closely, then finally understood it.

OK, seems like my sense of intuitiveness is not so usual. This code is longer, but hopefully more people find it readable.

(Not sure what you mean by we can't - C++ allows overloading
"operator," but that doesn't matter much in this context anyway) I
don't claim the proxy function approach is obviously better. While
personally I prefer that because it's more straightforward and can
understand it without a surprise, I also see its drawbacks. And
although I less prefer the overloading approach, I don't see a
technical flaw in it (unlike statically initialized namespace-scope
objects). If you still believe the approach with an overloaded
operator with taking account into all I explained above, especially
that different people have different senses of clarity and it may not
be so obvious as the original code author tends to think (to whom it
tends to be so obvious because it's based on his/her own sense), I
won't fight against that further. But I have two final
suggestions/comments in that case:

I didn't know the comma can actually be overloaded, but it seems it is strange operator then anyway. It could be made to create lists of stuff, but it would look even more counter-intuitive in C++ to call it with (this one would show the true power of C++, but would have quite hight „What the hell“ effect):

findRRsets(name, (RRType::A(), RRType::NS(), RRType::NSEC()), ...)

That said, rewinding everything at this point may not be an effective
way of development. If you can list up things you changed with why,
and if it's reasonable, that's probably sufficient.

I probably didn't make myself clear enough.

I did not do any modifications to it on purpose, nor I know of any. I just found the original code complicated enough to read it, understand the purpose and write it again with the use of the new function instead of trying to adapt it, because adapting it would be more work.

So I admit that there might be some slight changes in observable behaviour in corner cases we did not think about before nor now, but I think the chance of the behaviour being wrong is the same with new and old version of code, simply because they are the cases we didn't think about. I did try to write it so it acts the same. I know of one place where I'm not sure if I preserved the exact same behaviour ‒ if there are DNAME and NS records in non-apex domain, I'm not sure they were rejected before or one of them was picked up. Now they should be rejected, as this is invalid.

I might try to identify the differences in behaviour, if it helps you in reviewing it, though (or find out that there are none).

Replying to jinmei:

Replying to jinmei:

One quick point I clearly need to correct (myself):

  • Is this a question of what's the expected proof of empty non-terminal wildcard match? [...] If so, I'm not 100% sure either, but I guess it's basically no different from other empty non-terminal case. That is, with having wild.*.foo.example.org, the negative proof for a.foo.example.org would be "prev_name(wild.*.foo.example.org) NSEC wild.*.foo.example.org..." I'd also suggest you check this behavior with other implementation. (Note, however, that you'll need "check-names master ignore;" for BIND 9 to force it to accept empty wilds)

After an hour of fighting bind9, I didn't manage to make it return signatures and NSEC data, so I gave up and implemented it this way. I seem to be a bad admin :-|.

After playing with existing implementations and re-reading RFC4035, I
realized I had been a bit confused when I made this comment. Using
the above example, this should be the case for Section 3.1.3.2 of
RFC4035. And,

   o  An NSEC RR proving that there is no exact match for <SNAME,
      SCLASS>.

should be:

prev_name(a.foo.example.org.) NSEC next_name(a.foo.example.org.) ...

while

   o  An NSEC RR proving that the zone contains no RRsets that would
      match <SNAME, SCLASS> via wildcard name expansion.

should be

prev_name(*.foo.example.org) NSEC wild.*.foo.example.org. ...

And, in my understanding of the current branch, the latter is out of
scope (deferred to #1244), right? So, in the context of this branch
what should be returned is the former. Note that
prev_name(a.foo.example.org.) may not be equal to
wild.*.foo.example.org. If there's an RR for 1.foo.example.org, for
example, that will be the previous name.

Well, to be consistent with the rest, I think it should return the second (eg. prev_name(*.foo.example.org)). Then the WILDCARD_NXRRSET or WILDCARD result status indicates that the #1244 should add an NSEC proving that the exact match doesn't exist. This is because the Query class won't know the exact name for the wildcard, but it will know the exact name for the original query. Regarding this, I'm afraid we might need to signal WILDCARD_CNAME and WILDCARD_DELEGATION as well for the same reason, but I'm not sure, so I left it out for now. It shouldn't be problem to add when needed.

It should look the same as with real WILDCARD_NXRRSET, were it returns NSEC to prove this node doesn't contain the required RRset and #1244 will add the nonexistence of the original query.

So, I think this does return what it should. Or did I overlook some difference?

Replying to jinmei:

I'd personally catch isc::Exception as a whole, but I'm okay with your
code, too. Also, I've been aware that we probably need some more
class hierarchies for exceptions (and some exception classes are too
detailed - we probably only need one "NameError?" exception class).
Hopefully we can revisit this in near future and simplify the catching
side of code.

Well, adding a common ancestor should be enough, we wouldn't need to change the current throws/catches and future ones could be simplified.

So, now I see the need for it. On thinking further with this in mind,
however, it seems the essential point is that findPreviousName()
should search names above and the parent half of zone cuts
(i.e. excluding anything below a zone cut). Using the existence (or
non existence) of NSEC is one practical way to implement the concept,
but at this level of abstraction it seems to me too restrictive to
assume that way of implementation - a different database backend may
introduce an explicit notion of zone cut (for efficiency
reasons or something) and may still be able to meet the requirement of
findPreviousName() without relying on NSEC.

I added a comment to the documentation saying so and modified the text of the exception thrown to say this might be a problem with the accessor or the zone. DataSourceError? is probably correct exception for both these cases.

I don't know what you mean by "unsigned delegations", btw. Do you
mean opt-out? If so, my understanding is that it's NSEC3 specific.
Or perhaps are you confused with DS?

I might have confused it with DS records. There's a lot to learn in the DNSSEC world for me.

I'd first point out that this behavior doesn't exist the current
implementation, so this is adding a new behavior without a clear
reason. We *might* want that behavior in future for some yet-to-know
reason, but until then I'd rather keep the interface and
implementation smaller and simpler. Currently, and in practice, this
interface would be used after identifying a zone containing the query
name, so it shouldn't happen the query name is "smaller" than zone
origin. Should that happen, I'd just throw an exception at the moment.

OK, changing to throwing an exception, because I don't know of specific scenario when this would be needed. If it turns out to be needed, it should be possible to extract it from the history.

As I noted above, we may want the behaviour or not. I'm not completely sure if it is needed, if it isn't, I'm OK with removing it. If it is, I'll write a test checking this and add it to the interface requirement.

Now I see why:-) Let's keep it.

I added a test for it to make sure they are really skipped and commented it is a heuristic to skip under-the-cut data.

Finally, one small question. What's the purpose of this record? It's
not clear from the diff itself (f16de89).

+    {"delegation.example.org.", "DS", "3600", "", "1 RSAMD5 2 abcd"},

This one tests that DS is actually allowed in the same node as non-apex NS. Some tests (like the ones checking delegations) would fail if it existed there and wouldn't be allowed. I didn't want to write a whole new test for it, just misused one already in place to do it. Should I add some kind of comment?

Thank you

comment:18 in reply to: ↑ 17 ; follow-up: Changed 9 years ago by jinmei

Replying to vorner:

The latest code looks mostly okay. I still have a few issues to
discuss (thanks for your patience:-).

Regarding these two points:

So, now I see the need for it. On thinking further with this in mind,
however, it seems the essential point is that findPreviousName()
should search names above and the parent half of zone cuts
(i.e. excluding anything below a zone cut). [...]

I added a comment to the documentation saying so and modified the text of the exception thrown to say this might be a problem with the accessor or the zone. DataSourceError? is probably correct exception for both these cases.

I'd first point out that this behavior doesn't exist the current
implementation, so this is adding a new behavior without a clear
reason. [...]

OK, changing to throwing an exception, because I don't know of specific scenario when this would be needed. If it turns out to be needed, it should be possible to extract it from the history.

I'm afraid we need some clarification even though it may not have to be now.

There are several possible error cases:

  1. The input is bogus, and smaller than the zone's origin name
  2. The zone is not intended to be signed in the first place (so there's no NSEC)
  3. The zone is supposed to be signed but is somehow broken, and NSEC RR does not exist on a "previous name"
  4. (Similar to 2, but) The specific accessor class doesn't support this feature

Case 1 shouldn't happen in the normal query processing context. In
this case we should throw an exception that clearly indicates a
program error such as InvalidParameter?.

Case 2 shouldn't trigger an exception because it's a completely valid
(and actually today's most common) case.

In case 3, we should probably thrown DataSourceError?, which would
subsequently result in SERVFAIL to the query.

Case 4 should probably be handled like case 2.

To implement this properly, I guess we'll need a notion of a flag that
indicates whether a zone is signed (as we discussed in this ticket),
and the code logic would be:

    // below, zone_is_signed is false in case of 2 and 4
    if (dnssec_required && zone_is_signed) {
       prev_name = findPreviousName();
       nsec = getRRsets(prev_name, NSEC);
       if (!nsec) {
         throw(DataSourceError); // case 3
       }
       return (nsec); // good case for a signed zone
    } else {
       // case 2 or 4, don't have to do anything.
    }

And, in the implementation of findPreviousName(), we'd return a (new)
exception "NotFound?", which is expected to cover case 1 should that
ever happen. Case 2 would also cause this error, but we'd generally
expect this case is avoided at a higher level (like the above sample
code).

But the notion of "is_signed" flag is beyond the scope of this ticket,
I cannot implement this idea yet. So, as a short term compromise, I'd
suggest the following logic for now:

    if (dnssec_required) {
       try {
           prev_name = findPreviousName();
           if (!nsec) {
             // depending on the implementation of findPreviousName(),
             // this is either case 2 or 3, but we cannot tell.  Since
             // we cannot SERVFAIL for case 2, we ignore that and simply
             // skip including NSEC.
	     ;    
           } else {
             return (nsec);
           }
      } catch (const NotImplemented&) {
           // This is case 4, or if the implementation of
           // findPreviousName() relies on NSEC, either 2 or 3.  For
           // the same reason as above, we'll ignore the exception.
           ;
      }
    }

And, in our sqlite3's findPreviousName(), if it cannot find the
requested previous name with NSEC, it will throw NotImplemented?. It's
not a good exception for cases 2 or 3, but as a short term workaround
it would be okay. Note also that this implementation would be very
expensive for negative responses with today's common zones (i.e.,
unsigned ones), because all negative processing involves exception
handling. Again, as a short term workaround it should be acceptable.
Case 1 shouldn't happen in this context, so we ignore it for now.

If the above makes sense, my latest suggestion is to slightly adjust
the code to implement the latter logic (we probably need a couple of
additional tests), create a new ticket for the notion of "is signed"
flag of a zone, and refer to this discussion with a note that this
issues should be solved in that ticket, too.

Also, if you do this it will also be solved but I found a minor bug in
the current code for the "empty non-terminal asterisk" case.

                        if (dnssec_data) {
                            // Which one should contain the NSEC record?
                            const Name
                                coverName(findPreviousName(Name(wildcard)));
                            // Get the record and copy it out
                            found = getRRsets(coverName.toText(), NSEC_TYPES(),
...

sqlite3 version of findPreviousName() would throw for an unsigned
zone, so we need to protect the case with a try-catch block.

And one more thing:

  • In sqlite3_accessor_unittest, I'd add the case for "com.example" (i.e. qname is smaller than the zone origin). Also for the database test.

Finally, responses to a couple of points of your comment:

Replying to jinmei:

That said, rewinding everything at this point may not be an effective
way of development. If you can list up things you changed with why,
and if it's reasonable, that's probably sufficient.

I probably didn't make myself clear enough. [...]

So I admit that there might be some slight changes in observable behaviour in corner cases we did not think about before nor now, but I think the chance of the behaviour being wrong is the same with new and old version of code, simply because they are the cases we didn't think about.

Hmm, I'm not entirely happy about this in terms of keeping the
confidence about the code accuracy and quality, but I can live with
that. Hopefully we can identify and fix hidden bugs (either original
or as a result of regression) in #1198.

One quick point I clearly need to correct (myself): [...]

Well, to be consistent with the rest, I think it should return the
second (eg. prev_name(*.foo.example.org)). [...]

It should look the same as with real WILDCARD_NXRRSET, were it returns NSEC to prove this node doesn't contain the required RRset and #1244 will add the nonexistence of the original query.

Ah, okay, I was confused. Then the code and test are okay (except the
above bug regarding exception). This also makes me wonder we should
describe in ZoneFinder::find() more details about what's expected in
various negative cases especially when FIND_DNSSEC is on.

comment:19 Changed 9 years ago by jinmei

  • Owner changed from jinmei to vorner

comment:20 in reply to: ↑ 18 ; follow-up: Changed 9 years ago by vorner

  • Owner changed from vorner to jinmei

Hello

Replying to jinmei:

Case 2 shouldn't trigger an exception because it's a completely valid
(and actually today's most common) case.

I think that the higher level should not set FIND_DNSSEC in that case, to avoid looking for the data.

So, in that sense, it shouldn't happen here as well.

In case 3, we should probably thrown DataSourceError?, which would
subsequently result in SERVFAIL to the query.

ACK.

But the notion of "is_signed" flag is beyond the scope of this ticket,
I cannot implement this idea yet. So, as a short term compromise, I'd
suggest the following logic for now:

Yes, agreed. But it seems to be needed. I added some FIXME notes about the places needing update and I'll create the ticket when merging this.

If the above makes sense, my latest suggestion is to slightly adjust
the code to implement the latter logic (we probably need a couple of
additional tests), create a new ticket for the notion of "is signed"
flag of a zone, and refer to this discussion with a note that this
issues should be solved in that ticket, too.

I think it should be implemented now.

I also unified the code that gets the covering NSEC RRset, so they are the same.

Ah, okay, I was confused. Then the code and test are okay (except the
above bug regarding exception). This also makes me wonder we should
describe in ZoneFinder::find() more details about what's expected in
various negative cases especially when FIND_DNSSEC is on.

I added some documentation, hopefully I didn't forget anything that is not clear.

comment:21 in reply to: ↑ 20 ; follow-up: Changed 9 years ago by jinmei

Replying to vorner:

The revised code looks okay except some editorial things for doxygen
comments. I also noticed one point where we can constify a variable.
I made that change directly.

Comments:

  • "In case of empty nonterminal cases" sound a bit awkward due to the duplicate "case":
        /// that matched the query name). In case of empty nonterminal cases,
    
    maybe just "In case of (an) empty nonterminal"?
  • I didn't really understand this phrase:
        /// lives, which is the one ending in the subdomain of the empty
        /// nonterminal.
    
    (I already know what it is because of my knowledge of the protocol and the implementation, but if I didn't have the prior knowledge I wouldn't understand what this comment means).
  • Also, even with improving the statement, I guess some examples help here because the concept is quite tricky. I'm attaching a suggested diff to zone.h.

Changed 9 years ago by jinmei

comment:22 Changed 9 years ago by jinmei

  • Owner changed from jinmei to vorner

comment:23 in reply to: ↑ 21 ; follow-up: Changed 9 years ago by vorner

  • Owner changed from vorner to jinmei

Hello

I applied your patch and tried to improve the text a little bit. Is it OK now?

Thanks

comment:24 Changed 9 years ago by jelte

  • Milestone changed from Sprint-20110927 to Sprint-20111011

comment:25 in reply to: ↑ 23 Changed 9 years ago by jinmei

Replying to vorner:

Hello

I applied your patch and tried to improve the text a little bit. Is it OK now?

I made two small suggested changes:

  • Replace 'no x.example.com' with 'no record in x.example.com' as the former more sounds like an NXDOMAIN case.
  • combine two short lines

If these are okay, please merge.

comment:26 Changed 9 years ago by jinmei

  • Owner changed from jinmei to vorner

comment:27 Changed 9 years ago by vorner

  • Resolution set to complete
  • Status changed from reviewing to closed

Thanks

Note: See TracTickets for help on using tickets.