Opened 7 years ago

Closed 7 years ago

Last modified 7 years ago

#2053 closed task (fixed)

LabelSequence::toText() and operator<<

Reported by: jinmei Owned by: muks
Priority: medium Milestone: Sprint-20120717
Component: libdns++ Version:
Keywords: Cc:
CVSS Scoring: Parent Tickets:
Sensitive: no Defect Severity: N/A
Sub-Project: DNS Feature Depending on Ticket:
Estimated Difficulty: 0 Add Hours to Ticket: 0
Total Hours: 2.25 Internal?: no

Description (last modified by jinmei)

This is not absolutely necessary, but for debugging purposes it would
be helpful if we support toText() for LabelSequence? and corresponding
operator<<. The implementation should be easy as we already have them
for Name (we need to share as much code as possible with the Name
class).

I'd also suggest deprecating the getName() method and use the
new toText() method in the test (which is the only place where
getName() is currently used and only for calling its toText()).
getName() won't be well undefined when we support construction of
a label sequence from serialized data, and using its toText() is not
really good to dump the content of the label sequence anyway because
the sequence may have been stripped and may not be absolute.

Subtickets

Attachments (1)

lseq.diff (1.4 KB) - added by jinmei 7 years ago.

Download all attachments as: .zip

Change History (19)

comment:1 Changed 7 years ago by jinmei

  • Description modified (diff)

comment:2 Changed 7 years ago by jinmei

  • Priority changed from medium to low

(pushing this to the current sprint at a lower priority in case we run out of open
tickets)

comment:3 Changed 7 years ago by jinmei

  • Milestone set to Sprint-20120703

comment:4 Changed 7 years ago by muks

  • Owner changed from UnAssigned to muks
  • Status changed from new to assigned

Picking as I had implemented most of it yesterday

comment:5 Changed 7 years ago by muks

  • Owner changed from muks to UnAssigned

Pushed. Up for review.

comment:6 Changed 7 years ago by muks

  • Status changed from assigned to reviewing

comment:7 follow-up: Changed 7 years ago by jinmei

It's quite likely LabelSequence will stop holding a Name object
directly (see #2086). So I'd avoid relying on it. Perhaps we should
do it in a reversed way: move the current Name::toText logic to
LabelSequence, which for now uses the internal data structure of
Name (eventually changed to even lower-level representation).
Name::toText() will create LabelSequence and call its revised
toText().

comment:8 Changed 7 years ago by jinmei

  • Owner changed from UnAssigned to muks

comment:9 in reply to: ↑ 7 Changed 7 years ago by muks

  • Owner changed from muks to jinmei

Hi Jinmei

Replying to jinmei:

It's quite likely LabelSequence will stop holding a Name object
directly (see #2086). So I'd avoid relying on it. Perhaps we should
do it in a reversed way: move the current Name::toText logic to
LabelSequence, which for now uses the internal data structure of
Name (eventually changed to even lower-level representation).
Name::toText() will create LabelSequence and call its revised
toText().

Done. The code is also a little bit smaller now due to the reorganization.

comment:10 Changed 7 years ago by jinmei

  • Priority changed from low to medium

comment:11 follow-up: Changed 7 years ago by jinmei

I made a few style/editorial cleanups.

  • For LabelSequence, I don't think it makes sense to specify omit_final_dot=true, because we should always want to know whether the sequence is absolute or not and therefore we should always want to include the final_dot when it's absolute. I guess this version of toText() exists only fro the convenience of Name::toText(). If so, I can think of two alternatives:
    • make the version of LabelSequence::toText(bool) private and make it's a friend of Name.
    • unify this version of toText() to the other one so omit_final_dot would always be false. Let Name::toText() adjust the returned string depending on the value of omit_final_dot and the size of the returned string.
  • The declarations of the two toText()s in the header have a lot of duplicates. I'd unify them and refer to the unified text from one of them (but note that this point may become moot depending on how we resolve the previous point).
  • We shouldn't need a loop to locate the start point:
        for (unsigned int i = 0; i < first_label_; i++) {
            count = *np++;
            np += count;
        }
    
    e.g., This should suffice:
        Name::NameString::const_iterator np = name_.ndata_.begin() +
            name_.offsets_[first_label_];
    
  • What's the purpose of this check? This doesn't seem to be necessary.
        if ((first_label_ > name_.labelcount_) ||
            (last_label_ > name_.labelcount_) ||
            (first_label_ > last_label_)) {
            isc_throw(BadValue, "Bad first label indices were passed");
        }
    
  • In general, I'd like to avoid using Name members when possible: Use getLabelCount() instead of Name::labelcount_; Use getDataLength() instead of Name::length_.
  • Related to the previous point, this comment and the code are not really accurate:
        // result string: it will roughly have the same length as the wire format
        // name data.  reserve that length to minimize reallocation.
        std::string result;
        result.reserve(name_.length_);
    
    (Consider the case where the original name contains 127 labels but the sequence has been reduced to a single label)
  • Not specific to this branch, but it seems to be better to handle this as a special case:
        if (name_.length_ == 1) {
            //
            // Special handling for the root label.  We ignore omit_final_dot.
            //
            assert(name_.labelcount_ == 1 && name_.ndata_[0] == '\0');
            return (".");
        }
    
    I'd handle that case within the while loop like this:
            if (count == 0) {
                // We've reached the "final dot".  If we've not dumped any
                // character, the entire label sequence is the root name.
                // In that case we don't omit the final dot.
                if (!omit_final_dot || result.empty()) {
                    result.push_back('.');
                }
                break;
            }
    
  • Due to the class integrity this should actually never happen, so it could be assert:
                isc_throw(BadLabelType, "unknown label type in name data");
    
    That said, with LabelSequence revised to allow construction from raw data, this will become less reliable, so I'm okay with keeping it as an exception.
  • This comment does not make much sense any more:
        assert(count == 0);         // a valid name must end with a 'dot'.
    
    and the trick making this check pass is not clean:
            if (labels == 0) {
                count = 0;
                break;
            }
    
    We should revise it so it will be a reasonable extension as a result of the generalization.
  • This doesn't seem to be correct:
        EXPECT_EQ("example.com.", ls2.toText());
        ls2.stripRight(1);
        EXPECT_EQ("example", ls2.toText());
        ls2.stripRight(1);
        EXPECT_EQ("", ls2.toText());
    
    After the first strip, it should be "example.com" (without '.'). Likewise, the result of toText() should never be an empty string. There should be a bug in the main code.
  • I don't see the need for this check in the context of this test
        EXPECT_THROW(ls7.stripLeft(1), isc::OutOfRange);
    
  • testing another boundary case, i.e., a longest possible name (label seq) and possible longest labels would be a good idea.
  • If we really want to allow LabelSequence::toText() to take the omit_final_dot parameter, we should have tests for that version, too.

comment:12 Changed 7 years ago by jinmei

  • Owner changed from jinmei to muks

comment:13 in reply to: ↑ 11 ; follow-up: Changed 7 years ago by muks

  • Owner changed from muks to jinmei

Hi Jinmei

Replying to jinmei:

I made a few style/editorial cleanups.

  • For LabelSequence, I don't think it makes sense to specify omit_final_dot=true, because we should always want to know whether the sequence is absolute or not and therefore we should always want to include the final_dot when it's absolute. I guess this version of toText() exists only fro the convenience of Name::toText(). If so, I can think of two alternatives:
    • make the version of LabelSequence::toText(bool) private and make it's a friend of Name.
    • unify this version of toText() to the other one so omit_final_dot would always be false. Let Name::toText() adjust the returned string depending on the value of omit_final_dot and the size of the returned string.

I think the former approach is better. It involves use of friend, but is nicer than the second that involves further processing. The code has been changed.

  • The declarations of the two toText()s in the header have a lot of duplicates. I'd unify them and refer to the unified text from one of them (but note that this point may become moot depending on how we resolve the previous point).

They have been unified.

  • We shouldn't need a loop to locate the start point:
        for (unsigned int i = 0; i < first_label_; i++) {
            count = *np++;
            np += count;
        }
    
    e.g., This should suffice:
        Name::NameString::const_iterator np = name_.ndata_.begin() +
            name_.offsets_[first_label_];
    

Done. :)

  • What's the purpose of this check? This doesn't seem to be necessary.
        if ((first_label_ > name_.labelcount_) ||
            (last_label_ > name_.labelcount_) ||
            (first_label_ > last_label_)) {
            isc_throw(BadValue, "Bad first label indices were passed");
        }
    

It was necessary when this code was in the Name class, and remained when it was copied over. It has been removed now.

  • In general, I'd like to avoid using Name members when possible: Use getLabelCount() instead of Name::labelcount_; Use getDataLength() instead of Name::length_.

Done. :)

  • Related to the previous point, this comment and the code are not really accurate:
        // result string: it will roughly have the same length as the wire format
        // name data.  reserve that length to minimize reallocation.
        std::string result;
        result.reserve(name_.length_);
    
    (Consider the case where the original name contains 127 labels but the sequence has been reduced to a single label)

We can simply remove the reserve() call. The reallocation is going to not result in data copying in a paging kernel, and wire rep is small enough that the amount of buffer will suffice after a small number of reallocs (log_2 of max size).

  • Not specific to this branch, but it seems to be better to handle this as a special case:
        if (name_.length_ == 1) {
            //
            // Special handling for the root label.  We ignore omit_final_dot.
            //
            assert(name_.labelcount_ == 1 && name_.ndata_[0] == '\0');
            return (".");
        }
    
    I'd handle that case within the while loop like this:
            if (count == 0) {
                // We've reached the "final dot".  If we've not dumped any
                // character, the entire label sequence is the root name.
                // In that case we don't omit the final dot.
                if (!omit_final_dot || result.empty()) {
                    result.push_back('.');
                }
                break;
            }
    

Done. :)

  • Due to the class integrity this should actually never happen, so it could be assert:
                isc_throw(BadLabelType, "unknown label type in name data");
    
    That said, with LabelSequence revised to allow construction from raw data, this will become less reliable, so I'm okay with keeping it as an exception.

I've left it as it is.

  • This comment does not make much sense any more:
        assert(count == 0);         // a valid name must end with a 'dot'.
    
    and the trick making this check pass is not clean:
            if (labels == 0) {
                count = 0;
                break;
            }
    
    We should revise it so it will be a reasonable extension as a result of the generalization.

I don't follow what you mean by "is not clean". If labels is 0, we have reached the right end of the label sequence and need to break out. For other cases, the (count == 0) assertion is still a reasonable one to have. So we assign 0 to count before breaking out.

  • This doesn't seem to be correct:
        EXPECT_EQ("example.com.", ls2.toText());
        ls2.stripRight(1);
        EXPECT_EQ("example", ls2.toText());
        ls2.stripRight(1);
        EXPECT_EQ("", ls2.toText());
    
    After the first strip, it should be "example.com" (without '.'). Likewise, the result of toText() should never be an empty string. There should be a bug in the main code.

The tests were fixed and the bug was fixed.

  • I don't see the need for this check in the context of this test
        EXPECT_THROW(ls7.stripLeft(1), isc::OutOfRange);
    

This was written as an assertion when the toText() implementation was in the Name class so that it didn't called with bogus values. It can be removed, but as it's a logical sequence in that list of strip tests, I just left it in.

  • testing another boundary case, i.e., a longest possible name (label seq) and possible longest labels would be a good idea.

Such tests have been added now.

  • If we really want to allow LabelSequence::toText() to take the omit_final_dot parameter, we should have tests for that version, too.

This type of LabelSequence::toText() is private now. Name is the only user of it and its tests check both true and false cases on it.

comment:14 in reply to: ↑ 13 ; follow-up: Changed 7 years ago by jinmei

Replying to muks:

  • For LabelSequence, I don't think it makes sense to specify omit_final_dot=true, because we should always want to know whether the sequence is absolute or not and therefore we should always want to include the final_dot when it's absolute. I guess this version of toText() exists only fro the convenience of Name::toText(). If so, I can think of two alternatives:
    • make the version of LabelSequence::toText(bool) private and make it's a friend of Name.
    • unify this version of toText() to the other one so omit_final_dot would always be false. Let Name::toText() adjust the returned string depending on the value of omit_final_dot and the size of the returned string.

I think the former approach is better. It involves use of friend, but is nicer than the second that involves further processing. The code has been changed.

Okay, but in that case please make it clear in comments (for the
private method) that we use this function only as a helper for
Name::toText().

It was necessary when this code was in the Name class, and remained when it was copied over. It has been removed now.

  • In general, I'd like to avoid using Name members when possible: Use getLabelCount() instead of Name::labelcount_; Use getDataLength() instead of Name::length_.

Done. :)

Actually, I meant LabelSequence::getDataLength(), not Name::getData().
The use of the latter will soon be impossible, which was the point of
this comment.

  • Related to the previous point, this comment and the code are not really accurate:

We can simply remove the reserve() call. The reallocation is going to not result in data copying in a paging kernel, and wire rep is small enough that the amount of buffer will suffice after a small number of reallocs (log_2 of max size).

Yeah, in retrospect this might be a premature optimization (and these
methods don't have to be super efficient anyway). I see it still
there, and I'm okay with either way. If I keep it, however, we should
use getDataLength() (see above), and I'd revise the comment
accordingly.

  • This comment does not make much sense any more:
        assert(count == 0);         // a valid name must end with a 'dot'.
    
    and the trick making this check pass is not clean:
            if (labels == 0) {
                count = 0;
                break;
            }
    
    We should revise it so it will be a reasonable extension as a result of the generalization.

I don't follow what you mean by "is not clean". If labels is 0, we have reached the right end of the label sequence and need to break out. For other cases, the (count == 0) assertion is still a reasonable one to have. So we assign 0 to count before breaking out.

What I meant is that the above code tweaked the semantics of 'count'.
Due to this code, this variable is not (always) used to indicate the
number of characters in the label that was examined last. Such semantics
overloading makes the code less understandable.

And, on looking at it more closely with the bug I pointed out, I guess
the more fundamental issue is that it still tries to reach the end of
the original name data, while in LabelSequence it may have been
right-stripped. What we should actually do is, IMO, to limit the loop
to the end of the sequence, and the check should be revised whether we
really reached there (i.e., not unexpectedly exit from the loop).

I'm attaching a suggested patch. With this change we don't even have
to do this check at all while still avoiding the bug with right-strip.
The suggested patch also includes suggested changes to the
string::reserve() (above).

If you're okay with that, please apply it and merge with the comment
on the private function.

Last edited 7 years ago by jinmei (previous) (diff)

Changed 7 years ago by jinmei

comment:15 Changed 7 years ago by jinmei

  • Owner changed from jinmei to muks

comment:16 in reply to: ↑ 14 Changed 7 years ago by muks

Hi Jinmei

Replying to jinmei:

Okay, but in that case please make it clear in comments (for the
private method) that we use this function only as a helper for
Name::toText().

Comment has been updated.

We can simply remove the reserve() call. The reallocation is going to not result in data copying in a paging kernel, and wire rep is small enough that the amount of buffer will suffice after a small number of reallocs (log_2 of max size).

Yeah, in retrospect this might be a premature optimization (and these
methods don't have to be super efficient anyway). I see it still
there, and I'm okay with either way. If I keep it, however, we should
use getDataLength() (see above), and I'd revise the comment
accordingly.

I've left it in its updated form as-is.

  • This comment does not make much sense any more:
        assert(count == 0);         // a valid name must end with a 'dot'.
    
    and the trick making this check pass is not clean:
            if (labels == 0) {
                count = 0;
                break;
            }
    
    We should revise it so it will be a reasonable extension as a result of the generalization.

I don't follow what you mean by "is not clean". If labels is 0, we have reached the right end of the label sequence and need to break out. For other cases, the (count == 0) assertion is still a reasonable one to have. So we assign 0 to count before breaking out.

What I meant is that the above code tweaked the semantics of 'count'.
Due to this code, this variable is not (always) used to indicate the
number of characters in the label that was examined last. Such semantics
overloading makes the code less understandable.

And, on looking at it more closely with the bug I pointed out, I guess
the more fundamental issue is that it still tries to reach the end of
the original name data, while in LabelSequence it may have been
right-stripped. What we should actually do is, IMO, to limit the loop
to the end of the sequence, and the check should be revised whether we
really reached there (i.e., not unexpectedly exit from the loop).

I'm attaching a suggested patch. With this change we don't even have
to do this check at all while still avoiding the bug with right-strip.
The suggested patch also includes suggested changes to the
string::reserve() (above).

If you're okay with that, please apply it and merge with the comment
on the private function.

I follow what you mean. The code also is better now.

comment:17 Changed 7 years ago by muks

  • Resolution set to fixed
  • Status changed from reviewing to closed

Pushed to master in commit 1fc2b06b57a008ec602daa2dac79939b3cc6b65d:

* 9771dfb [2053] Add comment that toText(bool) variant is a helper for Name::toText()
* 398b221 [2053] Use getDataLength() to find out the right end to stop
* 23ae3c2 [2053] Add checks with long label sequences
* 3a1e804 [2053] Fix output of right-stripped label sequences
* d83d9fe [2053] Remove obsolete comment
* fdcc93e [2053] Rewrite code to handle the root name
* 09d780b [2053] Use methods on Name instead of accessing member variables directly
* e20691f [2053] Remove checks that are no longer necessary
* 15364a5 [2053] Remove loop to locate the start point
* 6c78bb4 [2053] Make LabelSequence::toText(bool) a private method
* b945fd1 [2053] style fixes: constify, long line
* efb3f33 [2053] Move toText() implementation from Name to LabelSequence class
* fb5902b [2053] Remove the dns::LabelSequence::getName() method
* 36d3b72 [2053] Add operator<< to dns::LabelSequence
* 0bebde9 [2053] Add toText() method to dns::LabelSequence

Resolving as fixed. Thank you for the reviews Jinmei.

comment:18 Changed 7 years ago by jinmei

  • Total Hours changed from 0 to 2.25
Note: See TracTickets for help on using tickets.