Opened 9 years ago

Closed 9 years ago

Last modified 9 years ago

#363 closed defect (fixed)

"H" and "I" are probably harmful for PyArg_ParseTuple()

Reported by: jinmei Owned by: zzchen_pku
Priority: low Milestone: Sprint-20110419
Component: libdns++ Version:
Keywords: Cc:
CVSS Scoring: Parent Tickets:
Sensitive: no Defect Severity:
Sub-Project: Feature Depending on Ticket:
Estimated Difficulty: 3.0 Add Hours to Ticket: 0
Total Hours: 0 Internal?: no

Description

Some of our python binding code uses "H" (unsigned short) and "I" (unsigned int) with !PyArg_ParseTuple().

I suspect most (if not all) of this usage is questionable because overflows are ignored. For example, the following test current passes:

        self.assertEqual(RRType("A"), RRType(0x100000001))

(assuming sizeof(int) is 4)

Admittedly this is an artificial case but it still makes it awkward even if we don't call it a bug.

I think we should use a signed version of larger size of integer ("h", "i", etc) and perform more strict range checks, i.e., reject value < 0 and value > possible_max.

Subtickets

Change History (25)

comment:1 Changed 9 years ago by jinmei

btw, the following is a list of lines containing "H" or "I" with !PyArg_ParseTuple():

edns_python.cc:303:    if (!PyArg_ParseTuple(args, "I", &size)) {
message_python.cc:568:    if (PyArg_ParseTuple(args, "I", &i)) {
message_python.cc:652:    if (!PyArg_ParseTuple(args, "H", &id)) {
message_python.cc:884:    if (PyArg_ParseTuple(args, "I", &i)) {
messagerenderer_python.cc:174:    if (!PyArg_ParseTuple(args, "I", &lengthlimit)) {
name_python.cc:371:    if (!PyArg_ParseTuple(args, "I", &pos)) {
name_python.cc:484:    } else if (PyArg_ParseTuple(args, "I", &n)) {
rrclass_python.cc:170:        } else if (PyArg_ParseTuple(args, "I", &i)) {
rrttl_python.cc:163:        } else if (PyArg_ParseTuple(args, "I", &i)) {
rrtype_python.cc:200:        } else if (PyArg_ParseTuple(args, "I", &i)) {

comment:2 Changed 9 years ago by stephen

  • Milestone changed from y2 12 month milestone to A-Team-Sprint-20110223

comment:3 Changed 9 years ago by jinmei

making this ticket unassigned. it's relatively minor, and I've not spent much time on it yet anyway.

comment:4 Changed 9 years ago by jinmei

  • Owner changed from jinmei to UnAssigned
  • Status changed from new to assigned

comment:5 Changed 9 years ago by zzchen_pku

  • Owner changed from UnAssigned to zzchen_pku

comment:6 follow-up: Changed 9 years ago by zzchen_pku

  • Owner changed from zzchen_pku to UnAssigned
  • Status changed from assigned to reviewing

Branch #363 is ready for review.

comment:7 in reply to: ↑ 6 ; follow-up: Changed 9 years ago by jinmei

Replying to zzchen_pku:

Branch #363 is ready for review.

It didn't pass a test for me:

Running test: message_python_test.py
..........E...........
======================================================================
ERROR: test_header_flag (__main__.MessageTest)
----------------------------------------------------------------------
Traceback (most recent call last):
  File "/Users/jinmei/src/isc/git/bind10-363/src/lib/dns/python/tests/message_python_test.py", line 114, in test_header_flag
    self.assertRaises(TypeError, self.r.set_header_flag, 0x80000000)
  File "/opt/local/Library/Frameworks/Python.framework/Versions/3.1/lib/python3.1/unittest.py", line 589, in assertRaises
    callableObj(*args, **kwargs)
OverflowError: Message header flag out of range

Also, it was not clear to me why you chose long (instead of int) in
some cases while choosing int for others. Please clarify the
rationale.

(Note: as a test failed I've not looked at the code so closely yet)

comment:8 Changed 9 years ago by jinmei

  • Owner changed from UnAssigned to zzchen_pku

comment:9 in reply to: ↑ 7 ; follow-up: Changed 9 years ago by zzchen_pku

  • Owner changed from zzchen_pku to jinmei

Replying to jinmei:

It didn't pass a test for me:

Running test: message_python_test.py
..........E...........
======================================================================
ERROR: test_header_flag (__main__.MessageTest)
----------------------------------------------------------------------
Traceback (most recent call last):
  File "/Users/jinmei/src/isc/git/bind10-363/src/lib/dns/python/tests/message_python_test.py", line 114, in test_header_flag
    self.assertRaises(TypeError, self.r.set_header_flag, 0x80000000)
  File "/opt/local/Library/Frameworks/Python.framework/Versions/3.1/lib/python3.1/unittest.py", line 589, in assertRaises
    callableObj(*args, **kwargs)
OverflowError: Message header flag out of range

For my 32bit OS,the range of interger is (-0x80000000, 0x7fffffff),so the test will throw TypeError? for 0x80000000. Since the unittest is platform dependent,I don't think it make sense to keep it.

Also, it was not clear to me why you chose long (instead of int) in
some cases while choosing int for others. Please clarify the
rationale.

I find this in python2 doc:

python2 supports numeric types: plain integers, long integers
Plain integers (also just called integers) are implemented using long in C, which gives
them at least 32 bits of precision,Long integers have unlimited precision.

and also

In Python 3, there is only one integer type, called int, which mostly behaves like the 
long type in Python 2. 

PyArg_ParseTuple() is responsible for converting a Python integer to a plain C numeric type, as described above, I think long precision is more suitable to prevent overflow.

comment:10 in reply to: ↑ 9 ; follow-up: Changed 9 years ago by jinmei

Replying to zzchen_pku:

It didn't pass a test for me:

For my 32bit OS, the range of interger is (-0x80000000, 0x7fffffff),so the test will throw TypeError? for 0x80000000. Since the unittest is platform dependent, I don't think it make sense to keep it.

You mean "since the unittest shouldn't be platform dependent"? Anyway
I see the point. We could still test the overflow cases selectively
(with a helper routine to identifying the size of int, long, etc), but
that's probably overkilling. So I'm okay with removing it.

Also, it was not clear to me why you chose long (instead of int) in
some cases while choosing int for others. Please clarify the
rationale.

I find this in python2 doc:

python2 supports numeric types: plain integers, long integers
Plain integers (also just called integers) are implemented using long in C, which gives
them at least 32 bits of precision,Long integers have unlimited precision.

and also

In Python 3, there is only one integer type, called int, which mostly behaves like the 
long type in Python 2. 

PyArg_ParseTuple() is responsible for converting a Python integer to a plain C numeric type, as described above, I think long precision is more suitable to prevent overflow.

On thinking about it further, I now agree long is better than int, but
not for that reason. The above citation is about internal
representation of integers in the python language, and regardless of
the internal representation PyArg_ParseTuple() can convert them to
various C integer types, so the question of int vs long (vs short,
etc) still holds.

In our usage many integers are unsigned 16-bit ones. To convert them
safely (without worrying about overflow), we need at least 32-bit
signed integers. That's why I agree long (which is ensured to be at
least 32-bit) is better than int (which can be e.g. 16-bit). For this
rationale, in some cases we really don't need long, and in some other
cases even long is not safe enough. See specific cases below.

Now, specific comments about the code:

  • this may not specific to this patch, but some calls to PyErr_Clear() seem to be redundant. example:
        if (messageflag < 0 || messageflag > 0xffff) {
            PyErr_Clear();
            PyErr_SetString(PyExc_OverflowError, "Message header flag out of range");
    
  • As far as I can see, some of the long int variables don't have to be so (they can be a plain int): Message_init, Message_clear, MessageRenderer_setCompressMode. (although being long is not necessarily incorrect; it just consumes more (stack) space unnecessarily). Actually, in these cases I think it make more sense to do this check within the corresponding C++ class (Message, MessageRenderer?, etc) and delegate the check to it.
  • on the other hand, in this case the integer has to be long:
    • Message_setQid() (because size of int may be 16 bits)
  • Message_addRRset: I'd avoid using hardcoded range here. Actually, in this case I think it make more sense to delegate the check to the C++ code.
  • Name_init: I wouldn't bother to check '> 0xffff'. In fact, the buffer could store longer data than 64KB. Invalid (positive) positions will be rejected in setPosition() anyway. (with removing that check) I'd also consider this case a "type error" rather than overflow, that is, a negative integer type is passed where an unsigned is expected. Same comment applies to other similar cases.
  • Name_at, Name_split: likewise. And, in this case it doesn't even have to be long (int should be sufficient).
  • Name_split: is this case tested? (catching it seems to be correct and good)
    +    PyErr_Clear();
    +    PyErr_SetString(PyExc_TypeError,
    +                    "No valid type in split argument");
    
  • Rcode_init: ext_code doesn't have to be long.
  • RRTTL_init: this doesn't seem to be correct:
    -        } else if (PyArg_ParseTuple(args, "I", &i)) {
    +        } else if (PyArg_ParseTuple(args, "k", &i)) {
    
    "k" omits overflow check, so it essentially has the same problem as "I". One solution is to use "L" with long long (assuming our target system has it; this is probably acceptable. We already rely on uint64_t etc, anyway). Another solution is to ignore overflow in this specific case.
  • message_python_test.py: the change makes the following comment incorrect:
             # this would cause overflow and result in a "valid" flag
    
  • tests in general. I've not closely checked the test coverage, but from a quick look there seems to be some cases that were not tested (e.g. for RRTTL). Please make sure all new code is covered by tests (note also that the coverage problem is unlikely to happen if it was developed test driven); or, if I miss something and the code was fully covered already, please say so.

comment:11 Changed 9 years ago by jinmei

  • Owner changed from jinmei to zzchen_pku

comment:12 in reply to: ↑ 10 ; follow-up: Changed 9 years ago by zzchen_pku

  • Owner changed from zzchen_pku to jinmei

Replying to jinmei:

On thinking about it further, I now agree long is better than int, but
not for that reason. The above citation is about internal
representation of integers in the python language, and regardless of
the internal representation PyArg_ParseTuple() can convert them to
various C integer types, so the question of int vs long (vs short,
etc) still holds.

In our usage many integers are unsigned 16-bit ones. To convert them
safely (without worrying about overflow), we need at least 32-bit
signed integers. That's why I agree long (which is ensured to be at
least 32-bit) is better than int (which can be e.g. 16-bit). For this
rationale, in some cases we really don't need long, and in some other
cases even long is not safe enough. See specific cases below.

Make sense.

Now, specific comments about the code:

  • this may not specific to this patch, but some calls to PyErr_Clear() seem to be redundant. example:
        if (messageflag < 0 || messageflag > 0xffff) {
            PyErr_Clear();
            PyErr_SetString(PyExc_OverflowError, "Message header flag out of range");
    

Removed redundant PyErr_Clear()

  • As far as I can see, some of the long int variables don't have to be so (they can be a plain int): Message_init, Message_clear, MessageRenderer_setCompressMode. (although being long is not necessarily incorrect; it just consumes more (stack) space unnecessarily). Actually, in these cases I think it make more sense to do this check within the corresponding C++ class (Message, MessageRenderer?, etc) and delegate the check to it.
  • on the other hand, in this case the integer has to be long:
    • Message_setQid() (because size of int may be 16 bits)
  • Message_addRRset: I'd avoid using hardcoded range here. Actually, in this case I think it make more sense to delegate the check to the C++ code.
  • Name_init: I wouldn't bother to check '> 0xffff'. In fact, the buffer could store longer data than 64KB. Invalid (positive) positions will be rejected in setPosition() anyway. (with removing that check) I'd also consider this case a "type error" rather than overflow, that is, a negative integer type is passed where an unsigned is expected. Same comment applies to other similar cases.
  • Name_at, Name_split: likewise. And, in this case it doesn't even have to be long (int should be sufficient).

Done.

  • Name_split: is this case tested? (catching it seems to be correct and good)
    +    PyErr_Clear();
    +    PyErr_SetString(PyExc_TypeError,
    +                    "No valid type in split argument");
    

It has been covered by:

self.assertRaises(TypeError, self.name1.split, "wrong", 1) 
self.assertRaises(TypeError, self.name1.split, 1, "wrong")
  • Rcode_init: ext_code doesn't have to be long.

Done.

  • RRTTL_init: this doesn't seem to be correct:
    -        } else if (PyArg_ParseTuple(args, "I", &i)) {
    +        } else if (PyArg_ParseTuple(args, "k", &i)) {
    
    "k" omits overflow check, so it essentially has the same problem as "I". One solution is to use "L" with long long (assuming our target system has it; this is probably acceptable. We already rely on uint64_t etc, anyway). Another solution is to ignore overflow in this specific case.

I think ignore overflow is fine for this specific case, so I'll keep "k", is this ok?

  • message_python_test.py: the change makes the following comment incorrect:
             # this would cause overflow and result in a "valid" flag
    

Updated.

  • tests in general. I've not closely checked the test coverage, but from a quick look there seems to be some cases that were not tested (e.g. for RRTTL). Please make sure all new code is covered by tests (note also that the coverage problem is unlikely to happen if it was developed test driven); or, if I miss something and the code was fully covered already, please say so.

It should has been covered by original unittest:

self.assertRaises(InvalidRRTTL, RRTTL, "4294967296")

Hopefully I haven't missed anything.
Thanks.

comment:13 in reply to: ↑ 12 ; follow-up: Changed 9 years ago by jinmei

Replying to zzchen_pku:

  • As far as I can see, some of the long int variables don't have to be so (they can be a plain int): Message_init, Message_clear, MessageRenderer_setCompressMode. (although being long is not necessarily incorrect; it just consumes more (stack) space unnecessarily). Actually, in these cases I think it make more sense to do this check within the corresponding C++ class (Message, MessageRenderer?, etc) and delegate the check to it.

The "Actually..." part hasn't been addressed (but to address that
we'll also need to update the C++ Message/MessageRenderer?
implementation. So this set of changes would probably go to a
separate ticket).

  • Name_init: I wouldn't bother to check '> 0xffff'. In fact, the buffer could store longer data than 64KB. Invalid (positive) positions will be rejected in setPosition() anyway. (with removing that check) I'd also consider this case a "type error" rather than overflow, that is, a negative integer type is passed where an unsigned is expected. Same comment applies to other similar cases.

Did you clarify the 'type error' vs 'overflow' for negative values
throughout the interfaces (not only for Name_init)? The error policy
should be consistent.

  • Name_at, Name_split: likewise. And, in this case it doesn't even have to be long (int should be sufficient).

These don't seem to be addressed in that it still checks 0xffff (and
it still returns 'overflow' for a negative 'pos', see above).

  • RRTTL_init: this doesn't seem to be correct:

I think ignore overflow is fine for this specific case, so I'll keep "k", is this ok?

If I were you I'd rather provide consistent policy on overflow checks
at the minor risk of portability issue, but if you have a reason that
it's better to ignore, I'd not necessarily object to that. In that
case, however, please leave comment that this case is different from
other checks of this wrapper with your reason, and add the pydoc style
API documentation about the difference.

Also, in that case I wonder why this didn't cause a problem in your
32-bit environment:

    unsigned long i;
...
            if (i > 0xffffffff) {

I suspect g++ would normally warn about this situation like this:

warning: comparison is always false due to limited range of data type

which subsequently makes the build fail due to -Werror.

Besides, since "k" doesn't perform overflow checking, the check is
already quite moot.

  • message_python_test.py: the change makes the following comment incorrect:
             # this would cause overflow and result in a "valid" flag
    

Updated.

  • tests in general. I've not closely checked the test coverage, but from a quick look there seems to be some cases that were not tested (e.g. for RRTTL). Please make sure all new code is covered by tests (note also that the coverage problem is unlikely to happen if it was developed test driven); or, if I miss something and the code was fully covered already, please say so.

It should has been covered by original unittest:

self.assertRaises(InvalidRRTTL, RRTTL, "4294967296")

No, because "4294967296" is a string. Also, this answer seems to
suggest that the way of development may not really honor the spirit of
TDD. Since the previous code used "I", which allows overflow, a test
for 4294967296 (as an integer) would have had failed. When we wanted
to fix that, we first add a test that actually makes it fail, add code
that should fix it, and then confirm it really fixes the problem (but
with "k" we wouldn't be able to be sure about it in the end).

Besides, simply responding to the RRTTL case doesn't really address my
main concern. My question was more general, not only specific to
RRTTL: whether you confirmed all of your new code was really covered
by tests (preferably in the way of TDD). As I said, this was probably
not the case. Please make sure everything was really covered.

And one more thing regarding tests: now we tightened the range check
at the level of python wrapper, I'd also confirm the check isn't too
tight with the possible min/max values.

comment:14 Changed 9 years ago by jinmei

  • Owner changed from jinmei to zzchen_pku

comment:15 in reply to: ↑ 13 ; follow-up: Changed 9 years ago by zzchen_pku

  • Owner changed from zzchen_pku to jinmei

Replying to jinmei:

The "Actually..." part hasn't been addressed (but to address that
we'll also need to update the C++ Message/MessageRenderer?
implementation. So this set of changes would probably go to a
separate ticket).

Message_init, Message_clear, MessageRenderer_setCompressMode already have value check, for example:

if (i == Message::PARSE) {
    self->message = new Message(Message::PARSE);
    return (0);
} else if (i == Message::RENDER) {
    self->message = new Message(Message::RENDER);
    return (0);
} else {
    PyErr_SetString(PyExc_TypeError, "Message mode must be Message.PARSE or Message.RENDER");
    return (-1);
}

All other values will throw PyExc_TypeError exception.

Did you clarify the 'type error' vs 'overflow' for negative values
throughout the interfaces (not only for Name_init)? The error policy
should be consistent.
These don't seem to be addressed in that it still checks 0xffff (and
it still returns 'overflow' for a negative 'pos', see above).

After checking python doc again, I think ValueError? and IndexError? are more suitable for clarifying the exceptions, how do you fell?

  • RRTTL_init: this doesn't seem to be correct:

I think ignore overflow is fine for this specific case, so I'll keep "k", is this ok?

Okay, I use long long type instead.

Besides, simply responding to the RRTTL case doesn't really address my
main concern. My question was more general, not only specific to
RRTTL: whether you confirmed all of your new code was really covered
by tests (preferably in the way of TDD). As I said, this was probably
not the case. Please make sure everything was really covered.

Obviously, I missed something, should be more careful later.
I runned coverage tool again, all of my new code has been covered by unittest.

And one more thing regarding tests: now we tightened the range check
at the level of python wrapper, I'd also confirm the check isn't too
tight with the possible min/max values.

Most of them just check the possible range for corresponding type, which should be ok, did you find something with too tight checking?

comment:16 Changed 9 years ago by stephen

  • Milestone A-Team-Task-Backlog deleted

Milestone A-Team-Task-Backlog deleted

comment:17 Changed 9 years ago by shane

  • Milestone set to Sprint-20110405

comment:18 in reply to: ↑ 15 ; follow-up: Changed 9 years ago by jinmei

Replying to zzchen_pku:

First off: I made a minor cleanup and pushed it to the branch:
21718d6 [trac363] cleanup: removed a white space at EOL.

The "Actually..." part hasn't been addressed (but to address that
we'll also need to update the C++ Message/MessageRenderer?
implementation. So this set of changes would probably go to a
separate ticket).

Message_init, Message_clear, MessageRenderer_setCompressMode already have value check, for example:

if (i == Message::PARSE) {
    self->message = new Message(Message::PARSE);
    return (0);
} else if (i == Message::RENDER) {
    self->message = new Message(Message::RENDER);
    return (0);
} else {
    PyErr_SetString(PyExc_TypeError, "Message mode must be Message.PARSE or Message.RENDER");
    return (-1);
}

Either you didn't take my point or I was not clear enough. What I meant was that it would make more sense to update, e.g., MessageImpl::init() so that it would reject invalid "mode":

MessageImpl::init() {
    if (mode_ != Message::PARSE || mode_ != Message::RENDER) {
        isc_throw(something);
    }
    ...
}

and remove the checks in the python wrapper (like the above if-else if-else).

And, such updates would be beyond the scope of this ticket, and it would make more sense to leave them to a separate ticket.

These don't seem to be addressed in that it still checks 0xffff (and
it still returns 'overflow' for a negative 'pos', see above).

After checking python doc again, I think ValueError? and IndexError? are more suitable for clarifying the exceptions, how do you fell?

I don't have a strong opinion, but after re-checking the python document, I tend to agree that IndexError? is a bit better than ValueError? for the cases you chaged. So the latest code is okay for me on this point.

One more thing. I suspect the limitation in MessageRenderer_setLengthLimit() is too restrictive because the corresponding libdns++ backend expects size_t. I'd simply reject negative value.

And one more thing regarding tests: now we tightened the range check
at the level of python wrapper, I'd also confirm the check isn't too
tight with the possible min/max values.

Most of them just check the possible range for corresponding type, which should be ok, did you find something with too tight checking?

From a quick eye-grep, no, but the point is that it would be safer to make it sure by tests. Boundary conditions are one of the common points where human beings make mistakes.

comment:19 Changed 9 years ago by jinmei

  • Owner changed from jinmei to zzchen_pku

comment:20 in reply to: ↑ 18 ; follow-up: Changed 9 years ago by zzchen_pku

  • Owner changed from zzchen_pku to jinmei

Replying to jinmei:

First off: I made a minor cleanup and pushed it to the branch:
21718d6 [trac363] cleanup: removed a white space at EOL.

Thanks.

Either you didn't take my point or I was not clear enough. What I meant was that it would make more sense to update, e.g., MessageImpl::init() so that it would reject invalid "mode":

MessageImpl::init() {
    if (mode_ != Message::PARSE || mode_ != Message::RENDER) {
        isc_throw(something);
    }
    ...
}

and remove the checks in the python wrapper (like the above if-else if-else).

And, such updates would be beyond the scope of this ticket, and it would make more sense to leave them to a separate ticket.

Got it :), I have created #842 for it.

One more thing. I suspect the limitation in MessageRenderer_setLengthLimit() is too restrictive because the corresponding libdns++ backend expects size_t. I'd simply reject negative value.

Updated.

From a quick eye-grep, no, but the point is that it would be safer to make it sure by tests. Boundary conditions are one of the common points where human beings make mistakes.

Refined the unittest and added more boundary test cases.

comment:21 in reply to: ↑ 20 ; follow-up: Changed 9 years ago by jinmei

Replying to zzchen_pku:

And, such updates would be beyond the scope of this ticket, and it would make more sense to leave them to a separate ticket.

Got it :), I have created #842 for it.

Okay, thanks. I've updated that text with a reference to this ticket
for the background.

From a quick eye-grep, no, but the point is that it would be safer to make it sure by tests. Boundary conditions are one of the common points where human beings make mistakes.

Refined the unittest and added more boundary test cases.

In name_python_test.py, this test seems to be duplicate. Missing
cleanup or intentional?

        s = self.name1.split(0)
        self.assertEqual("example.com.", s.to_text())

I've also added some more tests for boundary cases (pushed).

If the above is just missing cleanup. please clean it up and merge it.

comment:22 Changed 9 years ago by jinmei

  • Owner changed from jinmei to zzchen_pku

comment:23 in reply to: ↑ 21 Changed 9 years ago by zzchen_pku

Replying to jinmei:

In name_python_test.py, this test seems to be duplicate. Missing
cleanup or intentional?

        s = self.name1.split(0)
        self.assertEqual("example.com.", s.to_text())

Oh, it is just missing cleanup.

I've also added some more tests for boundary cases (pushed).

Thanks.

comment:24 Changed 9 years ago by zzchen_pku

  • Resolution set to fixed
  • Status changed from reviewing to closed

Merged, closing it. Thanks.

comment:25 Changed 9 years ago by zzchen_pku

  • Estimated Difficulty changed from 0.0 to 3.0
Note: See TracTickets for help on using tickets.