Opened 9 years ago

Closed 8 years ago

#1198 closed enhancement (complete)

Split DatabaseClient::Finder::find

Reported by: vorner Owned by: stephen
Priority: low Milestone: Sprint-20111220
Component: data source Version:
Keywords: Cc:
CVSS Scoring: Parent Tickets:
Sensitive: no Defect Severity: High
Sub-Project: DNS Feature Depending on Ticket: Datasrc refactoring
Estimated Difficulty: 4 Add Hours to Ticket: 0
Total Hours: 0 Internal?: no

Description

The function is getting rather long and should be split into smaller ones in some reasonable way to preserve sanity and prevent spaghettification of the code.

Subtickets

Attachments (1)

database.diff (27.9 KB) - added by jinmei 8 years ago.

Download all attachments as: .zip

Change History (28)

comment:1 Changed 8 years ago by stephen

  • Milestone changed from Next-Sprint-Proposed to Sprint-20110927

This needs to wait until #1177 has been completed.

comment:2 Changed 8 years ago by jelte

  • Estimated Difficulty changed from 0 to 4

comment:3 Changed 8 years ago by jelte

  • Milestone changed from Sprint-20110927 to Sprint-20111011

comment:4 Changed 8 years ago by jelte

  • Priority changed from major to minor

comment:5 Changed 8 years ago by jelte

  • Milestone changed from Sprint-20111011 to Sprint-20111025

comment:6 Changed 8 years ago by jelte

  • Milestone changed from Sprint-20111025 to Year 3 Task Backlog

comment:7 Changed 8 years ago by jinmei

  • Defect Severity changed from N/A to High
  • Milestone changed from Year 3 Task Backlog to Next-Sprint-Proposed

comment:8 Changed 8 years ago by jelte

  • Milestone changed from Next-Sprint-Proposed to Sprint-20111122

comment:9 Changed 8 years ago by stephen

  • Owner set to stephen
  • Status changed from new to assigned

comment:10 follow-up: Changed 8 years ago by stephen

  • Owner changed from stephen to UnAssigned
  • Status changed from assigned to reviewing

Extracted from find() the logic that searches for delegations and the logic that checks for wildcard matches into separate methods.

In the header for those functions added a a description of the functionality beyond that in the original code based on my understanding of the source.

comment:11 Changed 8 years ago by jinmei

  • Owner changed from UnAssigned to jinmei

comment:12 in reply to: ↑ 10 Changed 8 years ago by jinmei

Replying to stephen:

Extracted from find() the logic that searches for delegations and the logic that checks for wildcard matches into separate methods.

In the header for those functions added a a description of the functionality beyond that in the original code based on my understanding of the source.

Basically, it looked very good. The resulting code was much, much
easier to understand.

But, I wanted to make the main find() method even more concise and (at
least to me) understandable. Instead of pointing out how we could do
it in the form review comment, I've made several proposed commits to
the branch as I thought that would be easier to convey the intent.
(of course, you may or may not agree with the changes, so they can be
reverted based on the review discussion). The suggested changes are
commits from cf8596b to b447162. The key observation is that the code
before the proposed changes were still not very readable due to
intermediate mutable variables: result_status, result_rrset,
record_found, and get_cover. In particular, it was quite difficult to
figure out how record_found and get_cover affect the final return
value because the points where they are set and used were not close.
On looking into the code more closely, I found that the cases where
the special handling is necessary are actually quite limited. So
after some incremental cycles of further refactoring, I eliminated the
use of these variables and let each "(else) if" case return the
determined result at that point.

I've also introduced a new private method findNoNameResult(), mainly
for making find() more concise. This may not absolutely be necessary,
but IMO the resulting code is now easy to understand "at a glance"
thanks to its concision.

Commit b447162 may be most controversial: it introduced another method
to log the event, which was lost due to the first attempt of the
refactoring. Also, we need to provide more log IDs for each specific
case in find(). And, while I didn't do it in my proposal, we'll also
need to do similar thing for findNoNameResult().

If you agree with the general intent of my proposal, please also note
that you'll need to update the documentation of some of the methods
you introduced in the branch and provide document for newer methods
and new log messages.

Now, here are comments on the original branch before applying my
suggestion.

  • First, I noted some variables can be defined as const. Since it should be obvious and less controversial, I made the changes directly (commit b77f5d1).
  • (In case we agree on the further refactoring idea) I'd also like to simplify the for loop of findWildcardMatch() using the same/similar technique as find(). But this method may not be so unreadable in its current form, so I didn't touch it.
  • (Although this may not actually be a topic of this refactoring task) in findWildcardMatch(), isn't it possible (thought probably rare or due to a pathological database) that getting NSEC fails here?
                            found = getRRsets(wildcard, NSEC_TYPES(), true);
                            result_rrset =
                                found.second.find(RRType::NSEC())->second;
    
    should we handle such case add a test case for it?
  • findWildcardMatch document: there's another case (correctly) considered in the code: wildcard match was canceled due to the existence of a more specific subdomain.
  • A minor style issue: there are some blank lines for unclear purposes and with seemingly not so consistent policy. I'd personally remove these lines because it would make the methods shorter while the blank lines don't improve readability (at least to me). At least the policy of when to insert blank should be consistent. e.g. there's a blank line before 'else if':
                result_rrset = nsi->second;
    
            } else if (type != isc::dns::RRType::CNAME() &&
    
    but not here:
                }
            } else if (dnssec_data) {
    
    There is a blank line after 'if':
        if (!result_rrset) { // Only if we didn't find a redirect already
    
    
    but not here:
            if (found.first) {
                if (first_ns) {
    
    etc.

comment:13 Changed 8 years ago by jinmei

  • Owner changed from jinmei to stephen

comment:14 Changed 8 years ago by stephen

This is just a comment - the changes still have to be made

Basically, it looked very good. The resulting code was much, much easier to understand.

Thank you.

But, I wanted to make the main find() method even more concise and (at least to me) understandable. Instead of pointing out how we could do it in the form review comment, I've made several proposed commits to the branch as I thought that would be easier to convey the intent.

That's fine.

The key observation is that the code before the proposed changes were still not very readable due to intermediate mutable variables: result_status, result_rrset, record_found, and get_cover.

Accepted - I'm afraid I was taking the simplest route and on the return from the method trying to set the state of find() back to what it was before I extracted a block of code into a separate method.

If you agree with the general intent of my proposal, please also note that you'll need to update the documentation of some of the methods you introduced in the branch and provide document for newer methods and new log messages.

What would really help is to have a general description of the algorithm somewhere. I found it difficult to follow the logic in some places - I was reverse engineering the documentation from the code.

First, I noted some variables can be defined as const. Since it should be obvious and less controversial, I made the changes directly (commit b77f5d1).

NP

(In case we agree on the further refactoring idea) I'd also like to simplify the for loop of findWildcardMatch() using the same/similar technique as find(). But this method may not be so unreadable in its current form, so I didn't touch it.

I'll have a look at it.

Does DelegationSearchResult? have to be public?

My bad, it should be private.

(Although this may not actually be a topic of this refactoring task) in findWildcardMatch(), isn't it possible (thought probably rare or due to a pathological database) that getting NSEC fails here?

    found = getRRsets(wildcard, NSEC_TYPES(), true);
    result_rrset =
        found.second.find(RRType::NSEC())->second;

should we handle such case add a test case for it?

Probably - I'll investigate that one further.

findWildcardMatch document: there's another case (correctly) considered in the code: wildcard match was canceled due to the existence of a more specific subdomain.

I'll add documentation for it.

A minor style issue: there are some blank lines for unclear purposes and with seemingly not so consistent policy. I'd personally remove these lines because it would make the methods shorter while the blank lines don't improve readability (at least to me). At least the policy of when to insert blank should be consistent. e.g. there's a blank line before 'else if':

The idea is to make it easier to read - divide the code into "paragraphs" that do some particular task. I find that easier to comprehend than one solid block of code and comments (even with syntax colouring). I admit it's a bit inconsistent - for example, a two lines each containing a brace already gives enough white space separating the "paragraph" above from the one below. They are separated more on visual effect than any particular policy.

comment:15 follow-up: Changed 8 years ago by stephen

  • Owner changed from stephen to jinmei

Additional comments (to be read in conjunction with those above):

If you agree with the general intent of my proposal, please also note that you'll need to update the documentation of some of the methods you introduced in the branch and provide document for newer methods and new log messages.

What would really help is to have a general description of the algorithm somewhere. I found it difficult to follow the logic in some places - I was reverse engineering the documentation from the code.

I've now done that and added more detailed comments to the code. At this point I'm hoping that the reading the comments + code should allow someone to understand what's happing without too much effort on their part.

(In case we agree on the further refactoring idea) I'd also like to simplify the for loop of findWildcardMatch() using the same/similar technique as find(). But this method may not be so unreadable in its current form, so I didn't touch it.

I'll have a look at it.

With the added comments I think it's now understandable enough without further decomposition into separate functions.

(Although this may not actually be a topic of this refactoring task) in findWildcardMatch(), isn't it possible (thought probably rare or due to a pathological database) that getting NSEC fails here?

It could do. I thought about testing for it here, but doesn't that fall into the more general case of requesting DNSSEC data from a zone we believe to be signed and not finding it? I'd prefer to put this into a separate ticket.

Two other things:

  • I've added a small utility to the tools directory to process message files and put them in alphabetical order of message ID. The message file has been run through this.
  • There are four TODOs in database.cc: three are requesting clarification of the algorithm (in particular, reasons why it does/does not do something). The fourth - line 792 - proposes alternate processing at a given condition. Comments/thoughts requested.

comment:16 in reply to: ↑ 15 Changed 8 years ago by jinmei

Replying to stephen:

A couple of preliminary comments/counter questions:

  • Is change 717946a only a style fix of comments? If so, I can concentrate on the essential changes and it would be quite helpful as the change in this commit is big.
  • About this:

    There are four TODOs in database.cc: three are requesting clarification of the algorithm (in particular, reasons why it does/does not do something). The fourth - line 792 - proposes alternate processing at a given condition. Comments/thoughts requested.

    Are you asking me to clarify these TODOs to complete this ticket, or is this just a general note? Especially in the case of the former, I'd wonder whether you are seeking answers to all four TODOs, since some of them seem to have been there before this branch started.

comment:17 Changed 8 years ago by stephen

Is change 717946a only a style fix of comments? If so, I can concentrate on the essential changes and it would be quite helpful as the change in this commit is big.

Yes (as well as adding a "\brief" tag to some of the headers).

Are you asking me to clarify these TODOs to complete this ticket, or is this just a general note?

For the first three, it just a general note - if you have any insights it would be useful. The fourth has come up as a result of the refactoring and a specific comment is sought.

comment:18 Changed 8 years ago by shane

  • Feature Depending on Ticket set to Datasrc refactoring

comment:19 Changed 8 years ago by jinmei

General

What would really help is to have a general description of the algorithm somewhere. I found it difficult to follow the logic in some places - I was reverse engineering the documentation from the code.

I've now done that and added more detailed comments to the code. At this point I'm hoping that the reading the comments + code should allow someone to understand what's happing without too much effort on their part.

(In case we agree on the further refactoring idea) I'd also like to simplify the for loop of findWildcardMatch() using the same/similar technique as find(). But this method may not be so unreadable in its current form, so I didn't touch it.

I'll have a look at it.

With the added comments I think it's now understandable enough without further decomposition into separate functions.

I agree the added comments help understand what's happening (and in
some specific cases why), and they are very good for that purpose.
But at the same time I'm afraid the massive amount of in-method
comments rather reduces readability at a glance and would make it more
difficult to extend the code later with confidence. For example, if
we see the following organization:

    if (some_condition) {
        // this is one possible case.
        return (something);
    } else if (some_other_condition) {
        // this is another case.
        return (other_thing);
    } else if (yet_another_condition) {
        // this is yet another case.
        return (yet_another_thing);
    }
    // A last resort case
    return (last_resort_result);

and need to extend the "last resort" case, we can immediately be sure
that added code doesn't affect any of the first three cases because
it's clear at a glance that all cases have been done with return
before the flow reaches the "last resort" part.

Now, imagine that each comment provides very detailed explanation of
what is going on each case and why we do that, etc, and consumes the
half of "vertical screen size". Even if the essential code logic is
trivial, we'd need to scroll up and down to check whether it's really
safe to add code at the very end part of the function. I'm afraid
that's happening in the revised code of findWildcardMatch() and the
main find() method.

So, I'd first like to propose one thing: move the detailed explanation
to the head of the method, and keep the body of the method concise.
In order to make what I mean clearer, I experimentally did it for
findWildcardMatch(), and looked into the resulting organization of the
method...then I noticed the code logic is now almost duplicate of that
of find() and we could combine these two cases. So I extended the
experiment further and did it also. This additional refactoring was
not only for reducing duplicate code; we can now adopt the consistent
policy on what to do if we encounter multiple CNAMEs.

After these changes, the main find() has become unbelievably small
(considering its original length...with over 600 lines of code). So,
while it still contains some detailed in-method comments, it seems to
me sufficiently concise, so I'm okay with that. The extracted method,
findOnNameResult(), also contains in-method comments. The amount of
the comments is "borderline" in terms of readability to me, but for
now I didn't touch them.

And, I then noticed that findWildcardMatch() could be simplified more,
eliminating the need for some temporary mutable variables. So in my
experiment I did it also.

This time I'm attaching a diff to implement the idea (rather than
committing it). Please let me know if you also think it improves the
code.

reader_message_file.py

  • removeEmptyLeadingTrailing and processFile should be named remove_empty_leading_trailing and process_file?

findDelegationPoint

  • About this:
        // TODO: where does it say we can't return wildcard glue?
    
    The rationale is this part of Section 4.3.3 of RFC1034:
       - When the query is in another zone.  That is, delegation cancels
         the wildcard defaults.
    

findWildcardMatch

  • About this TODO, rfc2672bis-dname-25 Section 3.3 basically invalidates the case. So I think it's okay simply to say the behavior of such a case is undefined (it would be a nice idea to explicitly test it nevertheless).
            // TODO What do we do about DNAME here?
    
  • Personally, I'd use assert for the "sanity check" for FIND_GLUE_OK because should this happen it's a pure internal bug within database.cc (even a broken database table cannot make this happen).
  • Why are there two blank lines here?
                    result_rrset = dresult.first_ns;
    
    
                } else if (!hasSubdomains(name.split(i - 1).toText())) {
    

findNoNameResult

  • I'm not sure "not exist in the zone" is correct, because a formal definition of a "zone" could be the tree. I'd say "not in the database".
            // If so, return NXRRSET instead of NXDOMAIN (as although the name does
            // not exist in the zone, it does exist in the DNS tree).
    
  • I'd remove this comment. I believe the detailed comment before this provides sufficient explanation.
            // pretend something is here as well.
    
  • I suggest revising the !NO_WILDCARD case as follows:
            // It's not an empty non-terminal and wildcard matching is not
            // disabled, so check for wildcards.  If there's some wildcard match
            // (i.e., all results except NXDOMAIN) return it; otherwise fall
            // through to the NXDOMAIN case below.
            const ZoneFinder::FindResult wresult =
                findWildcardMatch(name, type, options, dresult);
            if (wresult.code != NXDOMAIN) {
                return (FindResult(wresult.code, wresult.rrset));
            }
        }
    
    This makes the code flow a bit more complicated, but we can unify the NXDOMAIN cases, and (more important) make sure it's logged.

find()

  • DNAME applies only to subdomains of the DNAME's owner name. (Remember the whole chaos of BNAME, etc discussion. One reason for that is this property of DNAME).
        // TODO: Why is DNAME ignored?
    
    Or, if we want to refer to an authentic reference, see Section 2.3 of rfc2672bis-dname-25:
       Unlike a CNAME RR, a DNAME RR redirects DNS names subordinate to its
       owner name; the owner name of a DNAME is not redirected itself.
    
  • Regarding the note about the "resolver":
            // We are not searching for a CNAME but nevertheless we have found one
            // at the name we are searching so we return it. (A resolver could have
            // originated the query that caues this result.  If so, it will restart
            // the resolution process with the name that is the target of this
            // CNAME.)
    
    I don't think it a matter of this level of lookup (a single search, rather than how we respond to query from a resolver). That's a discussion for, e.g., auth::Query::process(). So I suggest removing this note. Or, if you actually meant the caller of the find() method by "a resolver", I'd revised it to, e.g.:
            // We are not searching for a CNAME but nevertheless we have found one
            // at the name we are searching so we return it. (The caller
            // may want to continue the lookup by replacing the query name
            // to the canonical name with the original RR type).
    
  • As for this "TODO":
            // TODO: Check that throwing an exception here is correct.
    
    In general, throwing an exception from find() is okay. The basic design assumption is that while we generally expect the underlying data source is sane in terms of the DNS protocol (e.g., there should normally be no duplicate CNAMEs, and tools like b10-loadzone is expected to ensure that), it's not 100% guaranteed, and the data source implementation is allowed to react to such unexpected-but-possible case with an exception. In the case of a database backend-ed data source, it's also possible that a connection to the database fails in the middle of a lookup. Such a case can also be signaled with an exception. (As part of the API contract) Applications are expected to be thrown DataSourceError? for these events and to handle it appropriately in their application context. In the case of b10-auth, it would be to return SERVFAIL to the client, and it indeed behaves that way:
        try {
            const RRType& qtype = question->getType();
            const Name& qname = question->getName();
            auth::Query(*memory_client_, qname, qtype, *message).process();
        } catch (const Exception& ex) {
            LOG_ERROR(auth_logger, AUTH_PROCESS_FAIL).arg(ex.what());
            makeErrorMessage(message, buffer, Rcode::SERVFAIL());
            return (true);
        }
    
    Note also that this is not the only place we throw from find() and its helpers. If we worry about the exception in this specific case, we'll need to revisit many other parts, too (but as explained above we shouldn't worry about it)..


  • A related note: in answering the previous point I noticed the description of the base class API (ZoneFinder::find())is stale:
        /// A derived version of this method may involve internal resource
        /// allocation, especially for constructing the resulting RRset, and may
        /// throw an exception if it fails.
        /// It throws DuplicateRRset exception if there are duplicate rrsets under
        /// the same domain.
        /// It should not throw other types of exceptions.
    
    Most important, the last sentence is now false. I'd suggest revising this part within this ticket too:
        /// \exception std::bad_alloc Memory allocation such as for constructing
        ///  the resulting RRset fails
        /// \exception DataSourceError Derived class specific exception, e.g.
        /// when encountering a bad zone configuration or database connection
        /// failure.  Although these are considered rare, exceptional events,
        /// it can happen under relatively usual conditions (unlike memory
        /// allocation failure).  So, in general, the application is expected
        /// to catch this exception, either specifically or as a result of
        /// catching a base exception class, and handle it gracefully.
    

datasrc_messages.mes

  • DATASRC_DATABASE_FOUND_CNAME: the same comment about the "resolver" above applies.
  • DATASRC_DATABASE_FOUND_NXRRSET_NSEC: the longer description doesn't seem to be correct:
    DNSSEC information
    has been requested, but there is no NSEC record corresponding to the node.
    
    Actually, in this case NSEC was found and returned.
  • Note: I've only checked these two messages, assuming the others were just reordered.

Other points

  • I made a few trivial cleanups, and directly committed the changes to the branch.
  • About blank lines:

    The idea is to make it easier to read - divide the code into "paragraphs" that do some particular task. I find that easier to comprehend than one solid block of code and comments (even with syntax colouring). I admit it's a bit inconsistent - for example, a two lines each containing a brace already gives enough white space separating the "paragraph" above from the one below. They are separated more on visual effect than any particular policy.

    Hmm, I'd like to respect your policy, but I guess we need some consistent guideline. To me, these lines just seem to make the code (vertically) longer without improving readability, so I'd tend to remove them as part of cleanup when I work on code including them. If and when you work on that part of the code next time, you'll then notice the need for blank lines and will insert them; and next time I work on it....so, I'd suggest discussing it project-wide to have some consistent policy (admittedly the same thing could happen to other blank lines for readability, but in practice I believe we share a quite consistent sense of which is readable and which is redundant in most of other cases).

Changed 8 years ago by jinmei

comment:20 Changed 8 years ago by jinmei

  • Owner changed from jinmei to stephen

comment:21 follow-up: Changed 8 years ago by stephen

  • Owner changed from stephen to jinmei

I agree the added comments help understand what's happening (and in some specific cases why), and they are very good for that purpose. But at the same time I'm afraid the massive amount of in-method comments rather reduces readability at a glance and would make it more difficult to extend the code later with confidence. For example, if we see the following organization...

Here I have to disagree with you. I find it more awkward to have to bounce back and forth between the header and the code to decide what is happening. I think the header should be for the general overview of what the function does, perhaps with details of the algorithm. But if that gets too detailed, it becomes pseudocode. (Also, the function header is usually put in the .h file, so is separate from the code.)

Now, imagine that each comment provides very detailed explanation of what is going on each case and why we do that, etc, and consumes the half of "vertical screen size". Even if the essential code logic is trivial, we'd need to scroll up and down to check whether it's really safe to add code at the very end part of the function. I'm afraid that's happening in the revised code of findWildcardMatch() and the main find() method.

I think that if you are not clear what is happening - and very importantly, why it is happening - because of insufficient detail, it is probably unsafe to add code anyway.

However...

I think we are both agreed on the need for adequate documentation, we just have a different opinion of where it should be put. (I mention in passing that a number of reviews have taken me a long time because I had to decipher what is going on to understand whether the code being added was relevant and correct - something that would have been considerably eased with better documentation.) In the end though it doesn't really matter where the documentation is, providing it is there; so if you prefer to have all the documentation in the header, I'm fine with that.

Regarding the other changes (findOnNameResult() etc.) they look fine. However, when I tried to apply the diffs file I got a number of rejections - not all changes were applied. So I have not applied any.

reader_message_file.py
removeEmptyLeadingTrailing and processFile should be named remove_empty_leading_trailing and process_file?

They should and have been changed. (And I'm mortified about that! - I picked someone else up on that very point in a review a few days ago. A question of "do as I say" and not "do as I do" I'm afraid :-$)

findWildcardMatch

Personally, I'd use assert for the "sanity check" for FIND_GLUE_OK because should this happen it's a pure internal bug within database.cc (even a broken database table cannot make this happen).

True, but if it ever happens in the wild due to a subsequent change to the code, we end up with the server stopping and a cryptic abort message. At least with an exception (a) the problem is reported through the normal logging channels, (b) it is not disabled if NDEBUG is set and (c) we have to opportunity to take remedial action, e.g. abort the current query. That way we could at least run a degraded service instead of no service.

Why are there two blank lines here?

Over-enthusiasm about the aesthetics of the code layout. (Either that or a mistake.) Spurious blank line removed.

I'm not sure "not exist in the zone" is correct, because a formal definition of a "zone" could be the tree. I'd say "not in the database".

Changed.

I'd remove this comment. I believe the detailed comment before this provides sufficient explanation.

Removed.

I suggest revising the !NO_WILDCARD case as follows:

Good catch, changed.

DNAME applies only to subdomains of the DNAME's owner name. (Remember the whole chaos of BNAME, etc discussion. One reason for that is this property of DNAME).

Comment updated.

Regarding the note about the "resolver":

Note modified. I felt it important to explain why we terminate the search even though we were not searching for a CNAME.

As for this "TODO":

Thanks for the explanation and sanity check, TODO comment removed.

datasrc_messages.mes

Changes applied.

comment:22 in reply to: ↑ 21 Changed 8 years ago by jinmei

Replying to stephen:

First, I've applied the proposed diff to the latest version of the
branch as requested.

I agree the added comments help understand what's happening (and
in some specific cases why), and they are very good for that
purpose. But at the same time I'm afraid the massive amount of
in-method comments rather reduces readability at a glance and would
make it more difficult to extend the code later with confidence. [...]

Here I have to disagree with you. I find it more awkward to have to
bounce back and forth between the header and the code to decide what
is happening. I think the header should be for the general overview
of what the function does, perhaps with details of the algorithm.
But if that gets too detailed, it becomes pseudocode. (Also, the
function header is usually put in the .h file, so is separate from
the code.)

IMO in general, ideally we should make the code itself sufficiently
clear that wouldn't require such literal comments (whether it's in the
body or the head) just to understand what's happening. If one cannot
understand what a method does without that level of detailed comments
we should actually fix it by refactoring the code itself (just as we
are doing now) rather than just trying to explain it in comments.

So, the basic assumption of mine here is that we have made the code
reasonably readable. Detailed comments may still be useful,
especially for someone who first tries to read the code, so I'm
positive about leaving the comments, but with the assumption that the
code itself should already be quite clean and understandable,
excessive amount of in-function comments will rather reduce the
readability.

I agree that it's awkward and inconvenient to have to bounce between
code and comments, but if inserting the comments in the function body
reduces the readability, I'd rather live with the inconvenience and
keep the code body concise.

This is what I wanted to convey via the example below.

Now, imagine that each comment provides very detailed explanation of what is going on each case and why we do that, etc, and consumes the half of "vertical screen size". [...]

I think that if you are not clear what is happening - and very importantly, why it is happening - because of insufficient detail, it is probably unsafe to add code anyway.

Of course, but the assumption is that the code should already be
pretty clear about what's happening.

The logic duplicate between find() and findWildcardMatch() was a good
example of this. The latter was long and less readable (to me) due to
the inserted comments, so I first tried to get the entire view by
extracting the code structure

                if (cni != found.second.end() && type != RRType::CNAME()) {
                } else if (nsi != found.second.end()) {
                } else if (wti != found.second.end()) {
                } else {
                    // Found a wildcard match to the name but not to the type
                    // (i.e. NXRRSET).
               }

and then noticed it was quite similar to the code logic of find():

    if (!is_origin && ((options & FIND_GLUE_OK) == 0) &&
        nsi != found.second.end()) {
    } else if (type != RRType::CNAME() && cni != found.second.end()) {
    } else if (wti != found.second.end()) {
    } else if (!found.first) {
        return (findNoNameResult(name, type, options, dresult));
    } else {
         // NSEC case
    }

It would have been much easier to notice if each block of if/else if
had had a few of lines of code (with possibly a small amount of
comments). But the large amount of comments in the actual code
obscured the entire structure so I had to manually extracted it.

However...

And that said...

I think we are both agreed on the need for adequate documentation, we just have a different opinion of where it should be put. [...] In the end though it doesn't really matter where the documentation is, providing it is there; so if you prefer to have all the documentation in the header, I'm fine with that.

I also admit what's readable is basically a subjective matter, and
since you were the original author of this refactoring, if we still
disagree I respect your preference.

In my latest push to the branch the in-function comments for
findWildcardMatch were moved to the head of the function, but if you
like I'm okay with reverting them to the body on merge.

findWildcardMatch

Personally, I'd use assert for the "sanity check" for FIND_GLUE_OK because should this happen it's a pure internal bug within database.cc (even a broken database table cannot make this happen).

True, but if it ever happens in the wild due to a subsequent change to the code, we end up with the server stopping and a cryptic abort message. At least with an exception (a) the problem is reported through the normal logging channels, (b) it is not disabled if NDEBUG is set and (c) we have to opportunity to take remedial action, e.g. abort the current query. That way we could at least run a degraded service instead of no service.

I see the points of the exception approach. I'm still personally
afraid about being too lenient about clear internal bugs, because if
the bug actually severely breaks the internal integrity (e.g. a large
amount of memory corruption) it may be just dangerous to pretend we
can safely recover from it and keep running. I think we should seek a
good balance between minimizing the risk of keeping (possibly)
corrupted state and improving resilience.

I think this is a bigger issue than this particular case (maybe a good
discussion topic for the next f2f meeting?). For the purpose of this
branch, I'm okay with keeping the current approach (if you don't want
to change it for now).

comment:23 Changed 8 years ago by jinmei

  • Owner changed from jinmei to stephen

comment:24 follow-up: Changed 8 years ago by stephen

  • Owner changed from stephen to jinmei

I am comfortable with the changes. A couple of minor points:

findOnNameResult() needs a Doxygen header.

The logic in logAndCreateResult() is perhaps a good example of why the check for unused arguments in calls to logging should be removed (as mentioned in the post on bind10-dev today. Without it, the LOG_DEBUG call could be reduced to:

LOG_DEBUG(logger, DBG_TRACE_DETAILED, log_id).
          arg(accessor_->getDBName()).arg(name).
          arg(type).arg(getClass()).
          arg(wildname == NULL ? "" : *wildname).
          arg(rrset == NULL ? "" : *rrset);

... with the different message descriptions omitting %5 and %6 as required.

comment:25 in reply to: ↑ 24 Changed 8 years ago by jinmei

Replying to stephen:

I am comfortable with the changes. A couple of minor points:

findOnNameResult() needs a Doxygen header.

I know that, but thought it was homework of the main author of the
ticket:-) But it was me who proposed this method anyway, so I wrote
the documentation and pushed it.

If it looks okay, please feel free to merge.

The logic in logAndCreateResult() is perhaps a good example of why the check for unused arguments in calls to logging should be removed (as mentioned in the post on bind10-dev today. Without it, the LOG_DEBUG call could be reduced to:

LOG_DEBUG(logger, DBG_TRACE_DETAILED, log_id).
          arg(accessor_->getDBName()).arg(name).
          arg(type).arg(getClass()).
          arg(wildname == NULL ? "" : *wildname).
          arg(rrset == NULL ? "" : *rrset);

... with the different message descriptions omitting %5 and %6 as required.

Yes, I was aware of that kind of extension. That's a tradeoff issue
between safety and convenience, and I'm not sure right now which
should be considered more important, but that's certainly worth a
discussion.

comment:26 Changed 8 years ago by jinmei

  • Owner changed from jinmei to stephen

comment:27 Changed 8 years ago by stephen

  • Resolution set to complete
  • Status changed from reviewing to closed

Merged into master with commit 95b7c29f155b8fa21cf85c6d3afbe3f510db83d1

As this is just an internal refactoring, no ChangeLog entry was created.

Note: See TracTickets for help on using tickets.