Opened 9 years ago

Closed 9 years ago

#703 closed enhancement (complete)

Testing resolver with bad packets

Reported by: stephen Owned by: stephen
Priority: medium Milestone: Sprint-20110419
Component: Unclassified Version:
Keywords: Cc:
CVSS Scoring: Parent Tickets:
Sensitive: no Defect Severity:
Sub-Project: Feature Depending on Ticket:
Estimated Difficulty: 8.0 Add Hours to Ticket: 0
Total Hours: 0 Internal?: no

Description

Part of #561, this will test the resolver by sending corrupt packets to it and checking the response.

Subtickets

Attachments (1)

BadPacketTests_2011-04-13.txt (4.7 KB) - added by stephen 9 years ago.
Results of tests carried out on 13 April 2011

Download all attachments as: .zip

Change History (13)

comment:1 Changed 9 years ago by stephen

  • Owner set to stephen
  • Status changed from new to assigned

comment:2 Changed 9 years ago by stephen

  • Milestone R-Team-Task-Backlog deleted

Milestone R-Team-Task-Backlog deleted

comment:3 Changed 9 years ago by stephen

  • Milestone set to Sprint-20110405

comment:4 Changed 9 years ago by stephen

  • Owner changed from stephen to UnAssigned
  • Status changed from assigned to reviewing

First pass of the bad packet program is now ready for review. It will send a set of packets to the target nameserver, varying the contents of the flags word (the second 16 bits of the header).

comment:5 Changed 9 years ago by vorner

  • Owner changed from UnAssigned to vorner

comment:6 Changed 9 years ago by vorner

  • Owner changed from vorner to stephen

Hello

First, something that probably will not result in any code changes, but:

  • Why C++? Wouldn't python be enough? I believe the performance here is not that important (after all, single run can generate some 64k packets only). Not that it would make sense to rewrite it now, of course.
  • When this is merged, is it expected that the ticket will continue? Because this, for sure, aren't all possible ways to break the packet, I'd guess things like setting a name compression pointer or some length to get out of the packet would be more dangerous (I could imagine the server to crash more willingly to such broken packet than just a strange bits, in which case I would expect a broken server to simply misbehave).
  • Did you run it against the server? Everything OK?
  • We now have tests/tools and tools folders. Do we want to unify it somehow?

Then, some high-level observations:

  • You seem to use the DWIM behaviour in case of ranges, etc. Isn't it better to complain loudly if I set a range of flags 1-10000? Because then, I'm surprised it won't send 1000 packets at the server. Or I could have just mistyped it and didn't notice, so I spent quite some time observing the wrong results. Also, you do it not only with the parser, but with setFlag methods ‒ if any code tries to put something too large there, the code contains a bug and it should fail I believe. Or is there any reason for this behaviour I didn't notice?
  • You name each flag as separate symbol, function, etc. That leads to quite repetitive long code, really nested code, etc. If there was an enum of the flags, array of the masks and shifts for each of them, and the flags was an array as well, a lot of the code could be made a lot shorter. The only disadvantage would be it would be called something like this:
    flags.setFlag(RCODE, 3);
    

which can be little bit more error prone (eg. there's a need to check that the passed enum is in range).

BTW, don't we have such constants as these somewhere in the code?

  • There are tests for the code, so I guess this is something we want to distribute as a tool for people to use. I'm not sure if everything is tested, there's quite some code, but it seems there are less test filest than the .cc files. Or did I miscount?

Minor points about the actual code:

  • /// TODO: Need to randomise the source port
    

You have this in the middle of doxygen. Is it really something to be in the API level documentation, shouldn't the TODO be inside the implementation? And if so, wouldn't \todo be better, since doxygen understands it and creates indices of all such todos in the code, not to mention nicer formatting.

  • You mention that port has a default of 53, but the function header has no default at the port argument.
  • You have a different ordering of '\param' directives than the real parameters. This could be misleading.
  • Typo/missing word? „that invalid in a query“.
  • This is missing something:
    /// Parses the command-line options and returns the results in an Options
    /// structure.  If the 
    void
    
  • Did we talk about getopt_long? I don't really remember, but is it available everywhere?
  • You talk about boost::program_options namespace, but I didn't find anything using it.
  • This isn't very helpful (I can parse it, but some friendlier error wouldn't hurt):
    19:03 vorner@hydra ~/work/bind10/tests/tools/badpacket $ ./badpacket --qr bla
    terminate called after throwing an instance of 'boost::exception_detail::clone_impl<boost::exception_detail::error_info_injector<boost::bad_lexical_cast> >'
      what():  bad lexical cast: source type value could not be interpreted as target
      Aborted (core dumped)
    

With regards

Changed 9 years ago by stephen

Results of tests carried out on 13 April 2011

comment:7 follow-up: Changed 9 years ago by stephen

  • Estimated Difficulty changed from 3.0 to 8
  • Owner changed from stephen to vorner

Why C++? Wouldn't python be enough? I believe the performance here is not that important (after all, single run can generate some 64k packets only). Not that it would make sense to rewrite it now, of course.

Mainly because I'm more familiar with C++ and the various C++ libraries we use to handle sending and receiving DNS messages, so it was faster to write it that language. Also, since the program uses the existing IOFetch class for the I/O, the code essentially consists of little more than parsing options and setting fields in a message buffer.

When this is merged, is it expected that the ticket will continue? Because this, for sure, aren't all possible ways to break the packet, I'd guess things like setting a name compression pointer or some length to get out of the packet would be more dangerous (I could imagine the server to crash more willingly to such broken packet than just a strange bits, in which case I would expect a broken server to simply misbehave).

I wanted to wrap this ticket up why I restricted it to just setting flags. However in the changes made to this review I have extended it to include setting the various section counts and to artificially alter the message length. The README file (and a TODO comment in badpacket.cc) contain these suggestions and I will create additional tickets for the extension.

Did you run it against the server? Everything OK?

Duh! I was so focused on getting the code into a state where it could be included in the distribution, I omitted to report on the tests. I've run the current version of the program against the current development version of the resolver - the results are in the attachment to this ticket. In summary, the resolver passed the tests, although I do have a question as to the rcode returned when a NOTIFY message is received.

We now have tests/tools and tools folders. Do we want to unify it somehow?

That's a longer-term goal. The idea behind this program is to make it one of the system tests (which is why it is in the test folder) rather than to make it a fully-supported utility. Hence placing it in tests.

You seem to use the DWIM behaviour in case of ranges, etc. Isn't it better to complain loudly if I set a range of flags 1-10000? ...

The reason was that the tool is not intended for general use, so I assumed that the person using it would know what the bounds for the flags are.

However, you are right - since the code checks the value given against limits, there is no reason why it can't output an error message if the value given falls outside these limits. The code has been changed to report the error.

You name each flag as separate symbol, function, etc. That leads to quite repetitive long code, really nested code, etc. If there was an enum of the flags, array of the masks and shifts for each of them, and the flags was an array as well, a lot of the code could be made a lot shorter.

That's what you get when you focus on a limited set of objectives and not on extensibility. I have altered it to use arrays and made the changes you suggested. And in fairness, it did make it easier to make the extensions I described above.

BTW, don't we have such constants as these somewhere in the code?

We have some, but not all, in src/lib/dns/message.h (e.g. we don't have the offsets). For completeness I specified them all here.

There are tests for the code, so I guess this is something we want to distribute as a tool for people to use.

No - all code should be tested, even if we aren't distributing it. Whether we want this as a fully-supported tool is something that needs to be discussed.

...I'm not sure if everything is tested, there's quite some code, but it seems there are less test filest than the .cc files. Or did I miscount?

The .cc files for which there are no tests are badpacket.cc - which contains little more than "parse command line then call the scan" and scan.c, the scan itself. The latter is difficult to test as it actually sends and receives packets. However, a visual check of the output confirms that the scan module is behaving as expected.

TODO: Need to randomise the source port

This was left over from an earlier implementation of IOFetch. As port randomisation has been included in a previous ticket, the comment has been removed.

You mention that port has a default of 53, but the function header has no default at the port argument.

This was missed in a previous ticket. The comment has been removed as the "port" argument is followed by arguments that have no defaults.

You have a different ordering of '\param' directives than the real parameters. This could be misleading.

This was missed in a previous ticket. Fixed.

Typo/missing word? „that invalid in a query“.

Fixed.

This is missing something...

Fixed.

Did we talk about getopt_long? I don't really remember, but is it available everywhere?

The last time I looked I it was, although on one system the calling sequence is different (there is an additional argument). I thought that this was Solaris but that's not the case - the code built on Solaris, OS X and FreeBSD. In this case getopt_long is needed because there are so many cryptic options, so if the build fails I'll add an #ifdef for that operating system.

You talk about boost::program_options namespace, but I didn't find anything using it.

Removed. Originally I was going to use program_options but this requires linking in another library.

This isn't very helpful (I can parse it, but some friendlier error wouldn't hurt):

The bad lexical cast exception is now intercepted and a more friendly message is output.

Thank you for the helpful review.

comment:8 in reply to: ↑ 7 Changed 9 years ago by vorner

  • Owner changed from vorner to stephen

Hello

First of all, I bow to your incredible typing skills. To write enough code so it presents me with 6k lines of patches in a day, it's something to be proud of ;-).

Replying to stephen:

Did you run it against the server? Everything OK?

Duh! I was so focused on getting the code into a state where it could be included in the distribution, I omitted to report on the tests. I've run the current version of the program against the current development version of the resolver - the results are in the attachment to this ticket. In summary, the resolver passed the tests, although I do have a question as to the rcode returned when a NOTIFY message is received.

ACK.

There are tests for the code, so I guess this is something we want to distribute as a tool for people to use.

No - all code should be tested, even if we aren't distributing it. Whether we want this as a fully-supported tool is something that needs to be discussed.

Well, that would be really hard, if all test code would have to be tested as well, we would need to write a valid test code that could test something else and itself as well. It probably could be proven such code exists, but I think we have an exception that test code doesn't have to be tested ;-). Of course, testing some of it voluntarily isn't a problem.

Did we talk about getopt_long? I don't really remember, but is it available everywhere?

The last time I looked I it was, although on one system the calling sequence is different (there is an additional argument). I thought that this was Solaris but that's not the case - the code built on Solaris, OS X and FreeBSD. In this case getopt_long is needed because there are so many cryptic options, so if the build fails I'll add an #ifdef for that operating system.

I see. It's not a problem with a test program.

This isn't very helpful (I can parse it, but some friendlier error wouldn't hurt):

The bad lexical cast exception is now intercepted and a more friendly message is output.

Well, the unhelpful text of the what() was part of it, but you seem to rely on the fact that the exception is printed automatically. But it isn't so much a problem I guess, if it will be only us running it.

I noticed you used the enum as a parameter, but went for int in some later version. Why?

And, is it allowed by standard to have multiple enum elements with the same numerical value? Isn't it just that gcc is benevolent?

With regards

comment:9 follow-up: Changed 9 years ago by stephen

  • Owner changed from stephen to vorner

Well, the unhelpful text of the what() was part of it, but you seem to rely on the fact that the exception is printed automatically. But it isn't so much a problem I guess, if it will be only us running it.

The code does rely on getopt_long() printing the option in error because the loop that interprets the options only receives a '?' in the case of an unrecognised option. (In this case the code terminates by throwing an exception derived from isc::Exception which, like all such exceptions in this program, is caught (and printed) in main().)

I noticed you used the enum as a parameter, but went for int in some later version. Why?

The code was changed to use arrays and the elements of the enum are now used to name specific indexes in those arrays. Also declared in the enum is a variable denoting the size of the arrays. So the enum is really being used as a convenient way of declaring integer constants. This was reinforced by a change to the code that involves iterating through the arrays; the index element in these cases is an int. As there as there is no default conversion between an "int" and an element of an "enum", the compiler was generating errors when the index was passed to a method expecting an enum. So the decision was made to move to int's.

Of course, it would have been possible to provide an explicit cast between int and the enum. An alternative would have been to overload operator++() etc. to provide iterator-like behaviour to variables of that type. But this would have required additional work and it was judged just not to be worth it for the gain.

And, is it allowed by standard to have multiple enum elements with the same numerical value? Isn't it just that gcc is benevolent?

I've not found mention of any restriction. If there turns out to be, it would be trivial to define another enum containing the synonyms.

BTW, the intended ChangeLog entry is:

  XXX.  [func]          stephen
        Added the 'badpacket' program for testing which sends a set of 
        (potentially) bad packets to a nameserver and prints the responses.
        (Trac #703, git yyyyy...)

comment:10 in reply to: ↑ 9 Changed 9 years ago by vorner

  • Owner changed from vorner to shane

Hello

Replying to stephen:

Well, the unhelpful text of the what() was part of it, but you seem to rely on the fact that the exception is printed automatically. But it isn't so much a problem I guess, if it will be only us running it.

The code does rely on getopt_long() printing the option in error because the loop that interprets the options only receives a '?' in the case of an unrecognised option. (In this case the code terminates by throwing an exception derived from isc::Exception which, like all such exceptions in this program, is caught (and printed) in main().)

Ah, sorry, I seem to have overlooked the try-catch in the main and though it's the automatic output of uncaught exception.

The code was changed to use arrays and the elements of the enum are now used to name specific indexes in those arrays. Also declared in the enum is a variable denoting the size of the arrays. So the enum is really being used as a convenient way of declaring integer constants. This was reinforced by a change to the code that involves iterating through the arrays; the index element in these cases is an int. As there as there is no default conversion between an "int" and an element of an "enum", the compiler was generating errors when the index was passed to a method expecting an enum. So the decision was made to move to int's.

Crap, yes, I see, the ++ doesn't work on them in C++. It must have been C, where I used a for cycle to iterate over enum. Anyway, never mind.

And, is it allowed by standard to have multiple enum elements with the same numerical value? Isn't it just that gcc is benevolent?

I've not found mention of any restriction. If there turns out to be, it would be trivial to define another enum containing the synonyms.

OK. It just looked suspicious.

BTW, the intended ChangeLog entry is:

  XXX.  [func]          stephen
        Added the 'badpacket' program for testing which sends a set of 
        (potentially) bad packets to a nameserver and prints the responses.
        (Trac #703, git yyyyy...)

OK.

So, please merge.

Thanks

comment:11 Changed 9 years ago by shane

  • Owner changed from shane to stephen

comment:12 Changed 9 years ago by stephen

  • Resolution set to complete
  • Status changed from reviewing to closed

Merged with master in commit 1b666838b6c0fe265522b30971e878d9f0d21fde

Note: See TracTickets for help on using tickets.