Opened 8 years ago

Closed 7 years ago

#1774 closed defect (fixed)

use uint8_t for "characters" of Name data

Reported by: jinmei Owned by: muks
Priority: medium Milestone: Sprint-20120703
Component: libdns++ Version:
Keywords: Cc:
CVSS Scoring: Parent Tickets:
Sensitive: no Defect Severity: N/A
Sub-Project: DNS Feature Depending on Ticket:
Estimated Difficulty: 3 Add Hours to Ticket: 0
Total Hours: 0 Internal?: no

Description

The libdns++ Name class uses std::string as the internal
representation of name data

private:
    std::string ndata_;

In retrospect this was probably a suboptimal choice because
std::string is an alias of basic_string<char>, and char may be signed
(and may be larger than an 8-bit integer). Since domain name labels
can contain any 8-bit character, other parts of the code uses unsigned
char, uint8_t, etc, and we often need to convert them using a cast.
It's probably cleaner and possibly even safer if we use an unsigned
integer in the first place, and ideally an 8-bit unsigned integer.

From a quick experiment, we won't have to change much of the existing
code even if we change the type of 'ndata_' to basic_string<uint8_t>,
so I propose we do this as a clean up task.

Subtickets

Attachments (2)

test.diff (1.9 KB) - added by jinmei 7 years ago.
name.diff (1.5 KB) - added by jinmei 7 years ago.

Download all attachments as: .zip

Change History (18)

comment:1 Changed 8 years ago by jelte

  • Estimated Difficulty changed from 0 to 3

comment:2 Changed 8 years ago by jinmei

  • Milestone changed from Year 3 Task Backlog to Next-Sprint-Proposed

comment:3 Changed 8 years ago by jelte

  • Milestone Next-Sprint-Proposed deleted

comment:4 Changed 8 years ago by jinmei

  • Milestone set to Next-Sprint-Proposed

comment:5 Changed 7 years ago by jinmei

  • Milestone set to Next-Sprint-Proposed

comment:6 Changed 7 years ago by shane

  • Milestone changed from Next-Sprint-Proposed to Sprint-20120703

comment:7 Changed 7 years ago by muks

  • Owner changed from UnAssigned to muks
  • Status changed from new to assigned

Picking

comment:8 Changed 7 years ago by muks

  • Owner changed from muks to UnAssigned
  • Status changed from assigned to reviewing

Up for review.

There was one case of strcmp() in libdns++, and one of individual character compares in the tests. Both were for equality (strcmp() == 0 or ch1 == ch2), but if it were for greater or lesser, note that strcmp() takes (const char *) pointers in its args. But it still seems to do an unsigned compare internally:

[muks@jurassic ~]$ cat strcmp.c
#include <stdio.h>
#include <string.h>

int
main (int argc, char *argv[])
{
  unsigned char *p;
  char a[] = "C";
  char b[] = "D";

  printf ("%d\n", strcmp (a, b));

  p = (unsigned char *) b;
  *p = 0xff;

  printf ("%d\n", strcmp (a, b));

  return 0;
}
[muks@jurassic ~]$ gcc -Wall -Wwrite-strings -O2 -o strcmp strcmp.c
[muks@jurassic ~]$ ./strcmp 
-1
-188
[muks@jurassic ~]$ 

comment:9 Changed 7 years ago by jinmei

  • Owner changed from UnAssigned to jinmei

comment:10 follow-up: Changed 7 years ago by jinmei

  • maybe a matter of taste, especially because it's class-private data, but I'd personally define a custom type instead of directly using string<uint8_t>:
        typedef std::basic_string<uint8_t> NameString;
        NameString ndata_;
    
  • labelsequence should use uint8_t rather than unsigned char
  • same for messagenrenderer.
  • it looks like some other places in name.cc could use uint8_t instead of unsigned char
  • this doesn't seem to be correct in various points:
            return (std::strncmp((const char*) data,
                                 (const char*) other_data,
                                 len) == 0);
    
    
    First, although this is not because of this branch, I suspect str[n]cmp cannot be used here because data/other_data can contain \0 as part of a valid label. I also suggest adding a specific test that can either confirm or deny this. And, in any case, memcmp is a better choice here, and probably the only correct choice if my suspicion is correct. Third, we shouldn't deceive compiler by cast in this type of case. Fourth, even if this were one of such rare cases, we should never use C-style cast. Combining these, this part should be:
            // and type of data/other_data should be changed to uint8_t
            return (std::memcmp(data, other_data, len) == 0);
    
  • same sense of comment applies to labelsequence test. I'd first clarify the type of getData() and test data, too. On top of that I'd make overall cleanup. If it still seems to require casting from unsigned char* to char*, something should be wrong.

comment:11 Changed 7 years ago by jinmei

  • Owner changed from jinmei to muks

comment:12 in reply to: ↑ 10 ; follow-up: Changed 7 years ago by muks

  • Owner changed from muks to jinmei

Replying to jinmei:

  • maybe a matter of taste, especially because it's class-private data, but I'd personally define a custom type instead of directly using string<uint8_t>:
        typedef std::basic_string<uint8_t> NameString;
        NameString ndata_;
    

Good idea.. done. :)

  • labelsequence should use uint8_t rather than unsigned char

Done.

  • same for messagenrenderer.

Done.

  • it looks like some other places in name.cc could use uint8_t instead

Done in places where it's applicable. But there are still uses of unsigned char which comes from Name() accepting std::string.

of unsigned char

  • this doesn't seem to be correct in various points:
            return (std::strncmp((const char*) data,
                                 (const char*) other_data,
                                 len) == 0);
    
    
    First, although this is not because of this branch, I suspect str[n]cmp cannot be used here because data/other_data can contain \0 as part of a valid label. I also suggest adding a specific test that can either confirm or deny this. And, in any case, memcmp is a better choice here, and probably the only correct choice if my suspicion is correct. Third, we shouldn't deceive compiler by cast in this type of case. Fourth, even if this were one of such rare cases, we should never use C-style cast. Combining these, this part should be:
            // and type of data/other_data should be changed to uint8_t
            return (std::memcmp(data, other_data, len) == 0);
    

Agreed and fixed. Tests have been added for it (make check fails with strncmp() now).

  • same sense of comment applies to labelsequence test. I'd first clarify the type of getData() and test data, too. On top of that I'd make overall cleanup. If it still seems to require casting from unsigned char* to char*, something should be wrong.

This is because string literals are of type (const char *). They need casting to unsigned, but instead of doing it in a dozen places, it is cast inside getDataCheck().

comment:13 in reply to: ↑ 12 Changed 7 years ago by jinmei

Replying to muks:

  • same sense of comment applies to labelsequence test. I'd first clarify the type of getData() and test data, too. On top of that I'd make overall cleanup. If it still seems to require casting from unsigned char* to char*, something should be wrong.

This is because string literals are of type (const char *). They need casting to unsigned, but instead of doing it in a dozen places, it is cast inside getDataCheck().

This can be avoided by providing a few-line wrapper function. See
my proposed patch: test.diff.

I also suggest one last update: using uint8_t for Name::offsets_
(see patch "name.diff"). Using unsigned char is not necessarily
incorrect, but when we allow LabelSequence to be constructed from
serialized raw data it would be more convenient if it's defined as
uint8_t (so we can construct it in the unified way either from a
Name or from serialized data).

If you're okay with these suggestions, please apply them and merge.
If not, please continue the discussion.

Changed 7 years ago by jinmei

Changed 7 years ago by jinmei

comment:14 Changed 7 years ago by jinmei

  • Owner changed from jinmei to muks

comment:15 Changed 7 years ago by muks

The patches look fine. I have applied them and also used a typedef for the offsets.

comment:16 Changed 7 years ago by muks

  • Resolution set to fixed
  • Status changed from reviewing to closed

Merged to master:

* 3dab9ab [1774] Added NameOffsets typedef
* 8acfde4 [1774] Use uint8_t for name offsets
* b574c19 [1774] Use a wrapper to convert string literals to uint8_t*
* 24c9af4 [1774] Add tests with zero name data
* bf0f1e0 [1774] Use uint8_t instead of unsigned char (more places)
* 846dbfd [1774] Use uint8_t instead of unsigned char
* 31ee75a [1774] Define and use type for Name data
* de9c0a0 [1774] Use uint8_t for characters of Name data

Resolving as fixed. Thank you for the reviews Jinmei.

Note: See TracTickets for help on using tickets.