Opened 9 years ago

Closed 9 years ago

#256 closed defect (fixed)

bugs in base32.cc

Reported by: jinmei Owned by: jinmei
Priority: medium Milestone: 06. 4th Incremental Release
Component: libdns++ Version:
Keywords: Cc:
CVSS Scoring: Parent Tickets:
Sensitive: no Defect Severity:
Sub-Project: Feature Depending on Ticket:
Estimated Difficulty: Add Hours to Ticket:
Total Hours: Internal?: no

Description

The current implementation of lib/dns/base.cc is buggy.

Among the bugs is this one:

        string group;

        iss >> setw(8) >> group;
        if (iss.bad() || iss.fail()) {
            isc_throw(BadBase32String,
                      "Could not parse base32 input: " << base32);
        }

setw() isn't meaningful in the context of input stream. With SunStudio? this will trigger the exception.

There are other bugs (see, e.g., disabled tests in tests/base32_unittest.cc).

The setw issue will be solved with the attached patch, but I'd rather make a complete fix later.

Subtickets

Attachments (1)

base32.diff (2.3 KB) - added by jinmei 9 years ago.

Download all attachments as: .zip

Change History (11)

Changed 9 years ago by jinmei

comment:1 Changed 9 years ago by jinmei

  • billable set to 1
  • Internal? unset

branches/trac256 is ready for review. The diff can be retrieved by:

svn diff -r 2481 svn+ssh://bind10.isc.org/svn/bind10/branches/trac256

With this fix and trac #288 the sunstudio build will fully compile and
pass tests.

I've used this opportunity to clean up some other things (see below),
which make the diff relatively big, but function-wise the essential
changes are as follows:

  • dns/util/base_n.cc
  • dns/util/base{16,32hex}_from_binary.h
  • dns/util/binary_from_base{16,32hex}.h
  • dns/tests/base{16,32hex}_unittest.cc

Rather than trying to identify and fix each and every bug of the
original base32 (which should have been called base32hex btw, so I
renamed it in this patch) I chose to generalize the base64 wrapper for
boost base64-from-binary and binary-from-base64, and use the same code
base for all base64, base32hex and "hex" (which used the base16
encoding). This way we can centralize the source of bug (if any), and
can be confident that if it works with one encoding algorithm it
should work with others equally.

base_n.cc is the generalized wrapper implementation. It's essentially
the same as the original base64.cc, but employs internal class
templates to support different encoding algorithm with different
parameters. The implementation may not look so trivial, so I added
relatively detailed comments on how it generally works.

With this change, we need to provide translation table for base32hex
and base16 encodings (because they are not included in boost).
baseXX_from_binary and binary_from_baseXX define these tables. They
should be a pretty straightforward variation of the boost counterpart:
boost/archive/iterators/base64_from_binary.hpp and
boost/archive/iterators/binary_from_base64.hpp. So, for review, I
believe we can simply perform sanity checks on the table definition,
rather than checking the tricky template definitions.

Other notes about base-XX changes:

  • I've also updated base32hex and base16 unit tests.
  • With this change, the original base32.cc and hex.cc were removed.
  • I introduced a generic exception class defined in exceptions.h for errors in the base-XX processing, rather than the overly specialized BaseXX exception classes.

Other cleanup: in this change I made a separate subdirectory 'util'
under lib/dns for internal utility tools used in libdns++ internally,
and move existing source files that fall into this category to this
new directory. baseXX files were also moved to util.

Proposed ChangeLog entry:

  75.?	[bug]		jinmei
	lib/dns: Fixed miscellaneous bugs in the base32 (hex) and hex
	(base16) implementation, including incorrect padding handling,
	parser failure in decoding with a SunStudio build, missing
	validation on the length of encoded hex string.  Test cases were
	more detailed to identify these bugs and confirm the fix.  Also
	renamed the incorrect term of "base32" to "base32hex".  This
	changed the API, but they are not intended to be used outside
	libdns++, so we don't consider it a backward incompatible change.
	(Trac #256, rTBD)

comment:2 Changed 9 years ago by jinmei

  • Owner changed from jinmei to UnAssigned
  • Status changed from new to reviewing

comment:3 Changed 9 years ago by stephen

  • Owner changed from UnAssigned to stephen

comment:4 follow-up: Changed 9 years ago by stephen

  • Owner changed from stephen to jinmei

Branch created: revision 2481. Files checked: revision 2508

src/lib/datasrc/data_source.cc
src/lib/exception/exceptions.h
src/lib/dns/rdata/generic/rrsig_46.cc
src/lib/dns/rdata/generic/nsec3param_51.cc
src/lib/dns/rdata/generic/nsec_47.cc
src/lib/dns/rdata/generic/dnskey_48.cc
src/lib/dns/rdata/generic/ds_43.cc
src/lib/dns/rdata/generic/nsec3_50.cc
src/lib/dns/tests/rdata_nsec3param_unittest.cc
src/lib/dns/tests/sha1_unittest.cc
src/lib/dns/tests/base64_unittest.cc
src/lib/dns/tests/rdata_dnskey_unittest.cc
Changes are OK

src/lib/dns/tests/rdata_rrsig_unittest.cc
src/lib/dns/tests/rdata_nsec3_unittest.cc
Differences OK. Good idea to just put currently failing test cases in a disabled test rather than comment them out - it leaves them visible when the tests are run.

src/lib/dns/tests/base32hex_unittest.cc
src/lib/dns/tests/hex_unittest.cc
Differences OK, although it took a few minutes to work out the logic for the decodeMap test (i.e. that a string with only the last character not equal to the zero character would result in a data stream with the last byte equal to the numeric value represented by the that character); additional comments here would be useful. (Admittedly it would have been helpful to look at that test in hex_unittest.cc first, being the easier case.)

src/lib/dns/util/sha1.h
src/lib/dns/util/sha1.cc
Not checked. It is assumed that these are Donald Eastlake's implementation of SHA1 and have just been moved from their original location.

dns/util/base_n.cc
Header documentation in the file omits reference to the base16 classes.

class EncodeNormalizer
class !DecodeNormalizer
The intention of EncodeNormalizer appears to be that when the iterator reaches base_end it will return a pad character, and continue to return a pad character no matter how many times it is subsequently incremented. However, when initialized with with both arguments equal (i.e. the current iterator being equal to the end iterator) a dereference operation will dereference the end iterator. Initializing in_pad_ to be equal to (base == base_end) should fix this.

A similar comment apply to DecodeNormalizer; if initialized with base equal to base_beginpad, it should be regarded as being in the padding region.

(Admittedly the only occasion where both constructor arguments are set to the same value is where the constructed object is used as a target of a comparison with another iterator, so correcting these items will have no effect on the operation of the existing code; the above comments are more for completeness should the classes be used elsewhere.)

struct BaseNTransformer
The documentation in the header could be a bit clearer as to why the third and fourth template parameters in the BaseNTransformer class are what they are. It took a bit of rooting through the boost documentation to work out that they are correct.

In the calculation of padbits in BaseNTransformer::decode, the use of "& ~7" is a neat trick to round down to the nearest multiple of 8. It took me a few seconds to work out why it was there though - perhaps a comment would be in order?

Not sure that the comment "Confirm the original BaseX text is the canonical encoding of the data" is completely correct - all the code does is check that the padding in the decoded data does not conflict with the padding in the baseN-encoded text.

Not sure why this is a "struct" rather than a "class" like EncodeNormalizer and DecodeNormalizer.

src/lib/dns/Makefile.am
The files util/base16_from_binary.h and util/binary_from_base16.h are not in the list of libdns++ sources (but they are included in util/base_n.cc)

src/lib/dns/tests/Makefile.am
OK (but see "Other Comments" below).

base16_from_binary.h
base32hex_from_binary.h
binary_from_base16.h
binary_from_base32hex.h
OK - in both these cases the differences from the original boost file seem reasonable. The tables have been checked against those in RFC4648 and appear to be correct.

Comment: The check on "t" in the "to_n_bit::operator()" method is "<= 127" in binary_from_base32hex.h and "< sizeof(lookup_table)" in binary_from_base16.h. Both have the same effect (the lookup table is 128 bytes long), although the latter is preferable.

src/lib/dns/util/README
OK. Although it is not clear why the classes should not be used within other modules of BIND10. So far the contents of util are binary<-->base16/32 converters and a copy of Donald Eastlake's implementation of the SHA1 code in RFC4634. These are not really DNS-specific, so could be in a general BIND-10 library.

Other Comments
The version of the code checked did not include any unit tests of the base16 classes.

comment:5 in reply to: ↑ 4 Changed 9 years ago by jinmei

Thanks for the very careful review and detailed comments.

src/lib/dns/tests/rdata_rrsig_unittest.cc
src/lib/dns/tests/rdata_nsec3_unittest.cc
Differences OK. Good idea to just put currently failing test cases in a disabled test rather than comment them out - it leaves them visible when the tests are run.

Agreed, these ifdefs are just an ugly leftover when I didn't know how to disable specific tests temporarily. This is actually a separate issue from the main topic of this ticket, and fixing them is quite trivial, so I directly made that change on trunk.

src/lib/dns/tests/base32hex_unittest.cc
src/lib/dns/tests/hex_unittest.cc
Differences OK, although it took a few minutes to work out the logic for the decodeMap test (i.e. that a string with only the last character not equal to the zero character would result in a data stream with the last byte equal to the numeric value represented by the that character); additional comments here would be useful. (Admittedly it would have been helpful to look at that test in hex_unittest.cc first, being the easier case.)

Okay, added some comments (r2528).

src/lib/dns/util/sha1.h
src/lib/dns/util/sha1.cc
Not checked. It is assumed that these are Donald Eastlake's implementation of SHA1 and have just been moved from their original location.

dns/util/base_n.cc
Header documentation in the file omits reference to the base16 classes.

Ah, good catch, fixed (r2535).

class EncodeNormalizer
class !DecodeNormalizer

I believe r2535 addressed these comments. Some specific points follow:

The intention of EncodeNormalizer appears to be that when the iterator reaches base_end it will return a pad character, and continue to return a pad character no matter how many times it is subsequently incremented. However, when initialized with with both arguments equal (i.e. the current iterator being equal to the end iterator) a dereference operation will dereference the end iterator. Initializing in_pad_ to be equal to (base == base_end) should fix this.

A similar comment apply to DecodeNormalizer; if initialized with base equal to base_beginpad, it should be regarded as being in the padding region.

I've thought about it, and chose to just make comments on the assumption without modifying the code. As you also noted, these are only intended to be used within this file, each used only once, so it wouldn't be much beneficial to try to validate the arguments 100%. Also, if we make it very sure they are valid, we should also think about other invalid cases such as base_end beyond the end of the valid range, base_beginpad_ > base_end_, etc. Making these perfectly correct would complicate the resulting code, and I suspect the accuracy isn't worth the complexity in this case.

struct BaseNTransformer
The documentation in the header could be a bit clearer as to why the third and fourth template parameters in the BaseNTransformer class are what they are. It took a bit of rooting through the boost documentation to work out that they are correct.

I admit these are not easy to understand. I've added more information about these boost-derived classes at the top of this file. Hopefully these address your comment. (We might make it even more detailed, but it will eventually be the boost documentation itself. So we need to seek some reasonable balance).

In the calculation of padbits in BaseNTransformer::decode, the use of "& ~7" is a neat trick to round down to the nearest multiple of 8. It took me a few seconds to work out why it was there though - perhaps a comment would be in order?

I agree this is not trivial. Added more comments.

Not sure that the comment "Confirm the original BaseX text is the canonical encoding of the data" is completely correct - all the code does is check that the padding in the decoded data does not conflict with the padding in the baseN-encoded text.

Again, I agree this is not trivial, and, as I re-read the code I found the implementation could be simpler for this purpose. I've added more comments, and revised the code (the new code passes tests).

Not sure why this is a "struct" rather than a "class" like EncodeNormalizer and DecodeNormalizer.

I follow one common (but probably not so widely deployed) convetion: use struct for stateless class, and use class if it maintains states. But, admittedly, I'm not sure if this separation is helpful, and I also suspect I'm not always consistent in this separation. If you prefer consistency, I don't mind changing BaseNTransformer to a class.

src/lib/dns/Makefile.am
The files util/base16_from_binary.h and util/binary_from_base16.h are not in the list of libdns++ sources (but they are included in util/base_n.cc)

Another good catch, addressed (r2536).

src/lib/dns/tests/Makefile.am
OK (but see "Other Comments" below).

base16_from_binary.h
base32hex_from_binary.h
binary_from_base16.h
binary_from_base32hex.h
OK - in both these cases the differences from the original boost file seem reasonable. The tables have been checked against those in RFC4648 and appear to be correct.

Comment: The check on "t" in the "to_n_bit::operator()" method is "<= 127" in binary_from_base32hex.h and "< sizeof(lookup_table)" in binary_from_base16.h. Both have the same effect (the lookup table is 128 bytes long), although the latter is preferable.

Ah, I was not consistent here. I begain with keeping the style of the original boost base64 implementation (for base32hex), but then departed from that style for base16 because I thought it was "better". In any case, it's better to make them consistent, and as you pointed out, it's probably a bit better to use sizeof. Changed it (r2537).

src/lib/dns/util/README
OK. Although it is not clear why the classes should not be used within other modules of BIND10. So far the contents of util are binary<-->base16/32 converters and a copy of Donald Eastlake's implementation of the SHA1 code in RFC4634. These are not really DNS-specific, so could be in a general BIND-10 library.

Good point, I tried to clarify it in a revised README (r2539). Actually, I'm not 100% sure this is an agreed development policy. Maybe we should discuss it in the weekly call.

Other Comments
The version of the code checked did not include any unit tests of the base16 classes.

It should be covered in hex_unittest.

I know the terminology isn't consistent. I don't know why it was originally named using "hex" (in the code I revised), but it's probably because BIND 9 called it "hex" and RFC3658/5155 simply call the corresponding representation "hex digits". On the other hand, in util I mainly refer to the terminology defined in RFC4648, where "hex" is a (kind of) alias of base 16, so I chose base16.

But should we rather be consistent on the terminology?

...or, are you talking about something different?

comment:6 Changed 9 years ago by jinmei

  • Owner changed from jinmei to stephen

comment:7 follow-up: Changed 9 years ago by stephen

  • Owner changed from stephen to jinmei

dns/util/base_n.cc
:
I've thought about it, and chose to just make comments on the assumption without modifying the
code. As you also noted, these are only intended to be used within this file, each used only
once, so it wouldn't be much beneficial to try to validate the arguments 100%. Also, if we make
it very sure they are valid, we should also think about other invalid cases such as base_end
beyond the end of the valid range, base_beginpad_ > base_end_, etc. Making these perfectly
correct would complicate the resulting code, and I suspect the accuracy isn't worth the
complexity in this case.

Fair enough. No need to make any further modifications here. If we use the code elsewhere though, we should revisit it. I was thinking that these iterators could be useful in other circumstances where handling transformations to and from base-n encoded text is required. There shouldn't be too much extra effort, as the iterator checks can only be simple - unless an iterator is a random-access iterator it is not possible to compare iterators for anything other than equality.

I follow one common (but probably not so widely deployed) convetion: use struct for stateless
class, and use class if it maintains states. But, admittedly, I'm not sure if this separation
is helpful, and I also suspect I'm not always consistent in this separation. If you prefer
consistency, I don't mind changing BaseNTransformer to a class.

I've not come across that one. My own use has been to use "struct" where it comprises only data members (with, perhaps, a constructor) and "class" where it has methods. But that is really a hangover from C programming. As "class" and "struct" are equivalent (apart from the default visibility of members), it doesn't really matter apart from the look of the thing. Leave it as it is.

In the calculation of padbits in BaseNTransformer::decode, the use of "& ~7" is a neat trick
to round down to the nearest multiple of 8. It took me a few seconds to work out why it was
there though - perhaps a comment would be in order?

I agree this is not trivial. Added more comments.

The "& ~7" still needs a brief comment (e.g. "the bitwise AND with ~7 clears the lowest three bits, so has the effect of rounding the result down to the nearest multiple of 8"). I only mentioned this as it's non-obvious so it's something I think a reader would look at and puzzle over until they work it out. (Well, I did!)

src/lib/dns/util/README
:
Good point, I tried to clarify it in a revised README (r2539). Actually, I'm not 100% sure
this is an agreed development policy. Maybe we should discuss it in the weekly call.

I think that would be a useful discussion.

comment:8 in reply to: ↑ 7 Changed 9 years ago by jinmei

  • Owner changed from jinmei to stephen

Replying to stephen:

The "& ~7" still needs a brief comment (e.g. "the bitwise AND with ~7 clears the lowest three bits, so has the effect of rounding the result down to the nearest multiple of 8"). I only mentioned this as it's non-obvious so it's something I think a reader would look at and puzzle over until they work it out. (Well, I did!)

Hmm, maybe this is a typical example of the use of C as a bit more readable assembly, and explaining it may be either crucial or redundant depending on where the reader lives in the programming language hierarchy:-)

Anyway, I admit this may look tricky, so added some comments (mostly adopting your example). r2546.

src/lib/dns/util/README
:
Good point, I tried to clarify it in a revised README (r2539). Actually, I'm not 100% sure
this is an agreed development policy. Maybe we should discuss it in the weekly call.

I think that would be a useful discussion.

My understanding is it will go to a separate ticket. For the moment I think we can move forward with the current README.

Is there any outstanding issue, or can I now merge it to trunk?

comment:9 follow-up: Changed 9 years ago by stephen

  • Owner changed from stephen to jinmei

All is OK now - please go ahead and merge.

comment:10 in reply to: ↑ 9 Changed 9 years ago by jinmei

  • Resolution set to fixed
  • Status changed from reviewing to closed

Replying to stephen:

All is OK now - please go ahead and merge.

okay, thanks. merged to trunk, closing ticket.

Note: See TracTickets for help on using tickets.