Opened 5 years ago

Closed 5 years ago

#3743 closed enhancement (fixed)

Feature suggestion: log of packet drops and NAK reasons

Reported by: nicolas.chaigneau Owned by: tmark
Priority: high Milestone: Kea0.9.2-beta
Component: Unclassified Version: git
Keywords: Cc:
CVSS Scoring: Parent Tickets:
Sensitive: no Defect Severity: N/A
Sub-Project: DHCP Feature Depending on Ticket:
Estimated Difficulty: 0 Add Hours to Ticket: 0
Total Hours: 0 Internal?: no

Description

I'm opening this ticket following a discussion I started on the ML, archived here:
https://lists.isc.org/pipermail/kea-dev/2015-March/000279.html

The initial mail below:

I know you already have plan to improve logging.
We previously discussed with Tomek of what we're going to do with hooks: allowing to log each DHCP packet received and sent, along with some information such as: who sent the packet, which lease was requested and so on.

This could be useful to other people, so we agreed it would be nice to have this in Kea kernel code.

To expand on this, I thought of other things that would be very useful to us.

For each packet dropped by Kea, I would like to see a log describing the precise reason why it was dropped.
And same thing for rejects (NAKs).

For example:

2015-03-03 09:08:03.520 INFO [kea-packet-info/2720] DHCP4_DROP_INFO (1) DHCPv4 Request packet (xid: 3211589407) dropped: it contains foreign server identifier
2015-03-03 09:08:17.315 INFO [kea-packet-info/2720] DHCP4_DROP_INFO (2) BOOTP packet dropped: BOOTP is not supported
2015-03-03 09:08:34.139 INFO [kea-packet-info/2720] DHCP4_DROP_INFO (3) DHCPv4 Inform packet (xid: 4435657643) dropped: client 6c:3e:6d:b2:62:02 is unreachable through address 10.156.0.127
2015-03-03 09:11:02.120 INFO [kea-packet-info/2720] DHCP4_REJECT_INFO (42) DHCPv4 Request packet (xid: 5532642) rejected: invalid address 10.106.238.106 requested by INIT-REBOOT client 30:a8:db:89:90:68

And so on.

It should be possible to automatically parse these logs without prior knowledge of each possible case.
To do so, I suggest to have:

  • a common message identifier (for example: DHCP4_DROP_INFO, DHCP4_REJECT_INFO).
  • a numeric category identifier, allowing to classify the different reasons for dropping / rejecting.
  • and a text reason which can vary for each packet, containing relevant information.

Additionally, I suggest adding a new logger to handle these messages.
This would allow to easily enable or disable such logs independently of any other logs.

Let me expand a bit on why I would like to have this feature:

This is a pretty important need to me, because I'm replaying network captures of real DHCP traffic on our lab environment.
I need to be able to explain everything that Kea is doing, such as how many packets are dropped or rejected and why exactly.
Then I can conclude if there are issues that should be reported.

Currently, this is an empirical and time consuming task.
I have to look into the DEBUG logs and try and find relevant messages (sometimes there are none), and correlate with information from the network capture.
The capture is pretty big, and I need to do this each time a new version of Kea is available. Furthermore, in the future we plan to take and replay more network captures.

Subtickets

Change History (13)

comment:1 Changed 5 years ago by hschempf

  • Milestone changed from Kea-proposed to Kea0.9.2

comment:2 Changed 5 years ago by hschempf

  • Priority changed from medium to high

comment:3 Changed 5 years ago by tmark

  • Owner set to tmark
  • Status changed from new to accepted

comment:4 Changed 5 years ago by tmark

Ticket is ready for review, git 086343cf7c919885c6b3a0f84d4ae3db94ab107f:

I added a new logger, "bad_packet" to the DHCPv4 server. It is a child of the server's root logger "kea-dhcp4". It may be configured as "kea-dhcp4.bad_packet".

Log statements for packets that were either dropped by the server or to which the server replied with a NAK are now sent to this new logger. By default, these log statements will follow the root logger configuration.

I also changed the log message identifiers for these message to be either DHCP4_PACKET_DROP_<xxxx> or DHCP4_PACKET_NAK_<xxxx> where <xxxx> is a unique number. This seemed to be reasonable way to implement what the requester was seeking: a means to parse for drops or NAKs without having to know all the possible message identifiers. Note that I did not change the message text for these messages, only the message identifiers.

I updated the Admin guide section on loggers to include the new logger.

I suggest the following as a ChangeLog entry:

xxx.    [func]      tmark
    The DHCPv4 server now logs packets it has either dropped as invalid
    or to which it has replied with a NAK to a separate logger,
    "kea_dhcp4.bad_packet".  These log entries logs will contain an
    identifier of either DHCP4_PACKET_DROP_<xxxx> or DHCP4_PACKET_NAK_<xxxx>
    where <xxx> is a numeric value unique to the cause of the packet action
    along with text explanation.    
    (Trac #3743 git TBD)

Last edited 5 years ago by tmark (previous) (diff)

comment:5 Changed 5 years ago by tmark

  • Owner changed from tmark to UnAssigned
  • Status changed from accepted to reviewing

comment:6 Changed 5 years ago by marcin

  • Owner changed from UnAssigned to marcin

comment:7 Changed 5 years ago by marcin

  • Owner changed from marcin to tmark

Reviewed commit 086343cf7c919885c6b3a0f84d4ae3db94ab107f

doc/guide/logging.xml
Typo "clien" instead of "client".

src/bin/dhcp4/dhcp4_log.cc
Please update copyright date.

You should probably use the DHCP4_APP_LOGGER_NAME in the dhcp4_logger's constructor?

src/bin/dhcp4/dhcp4_log.h
Please update copyright date.

src/bin/dhcp4/dhcp4_messages.mes
The description of the DHCP4_PACKET_NAK_0001 should be corrected to indicate that it is the error message, not the warning messsage.

src/bin/dhcp4/dhcp4_srv.cc
A general comment which applies to all messages logged from the dhcp4_srv.cc is that they should better identify the packets to which they pertain.

One of the goals for the "Logging and Diagnostics" feature for 0.9.2 is to improve logging in dhcp4_srv.cc such that the log messages include the client's unique identifier: HW address or/and Client Id and transaction id. See requirements here: http://kea.isc.org/wiki/DiagnosticsRequirements.

I think that appropriate changes for the "bad_packet" logger messages should be done within this ticket.

I wonder if the DHCP4_PACKET_DROP_0002 should be logged at INFO level. I think it should rather be DEBUG just like other messages of this type. And, yes maybe that was even me who wrote this part, but people makes mistakes, no?

When I am running unit tests with verbose logging turned on I get the following error message before each logged message by bad-packet logger:

2015-04-17 12:11:58.145 ERROR [kea.bad-packet/14061] Unable to lock logger lockfile
2015-04-17 12:11:58.145 DEBUG [kea.bad-packet/14061] DHCP4_PACKET_DROP_0004 packet received on interface eth1 dropped, because of missing msg-type option

I am using OS-X for this test. Can you see the same thing?

BTW, it is sort of strange that the logger name which appears in the log file is [kea.bad-packet], rather than [kea-dhcp4.bad-packet] but this is probably the name which we use for unit tests?

ChangeLog
This is too many details in my opinion:

 " These log entries logs will contain an
    identifier of either DHCP4_PACKET_DROP_<xxxx> or DHCP4_PACKET_NAK_<xxxx>
    where <xxx> is a numeric value unique to the cause of the packet action
    along with text explanation."

Maybe just remove it.

comment:8 Changed 5 years ago by tmark

  • Owner changed from tmark to marcin

Replying to marcin:

Reviewed commit 086343cf7c919885c6b3a0f84d4ae3db94ab107f

doc/guide/logging.xml
Typo "clien" instead of "client".

Fixed.

src/bin/dhcp4/dhcp4_log.cc
Please update copyright date.

Fixed.

You should probably use the DHCP4_APP_LOGGER_NAME in the dhcp4_logger's constructor?

Oops. Fixed.

src/bin/dhcp4/dhcp4_log.h
Please update copyright date.

Fixed.

src/bin/dhcp4/dhcp4_messages.mes
The description of the DHCP4_PACKET_NAK_0001 should be corrected to indicate that it is the error message, not the warning messsage.

Ok, I have corrected your mistake. ;)

src/bin/dhcp4/dhcp4_srv.cc
A general comment which applies to all messages logged from the dhcp4_srv.cc is that they should better identify the packets to which they pertain.

One of the goals for the "Logging and Diagnostics" feature for 0.9.2 is to improve logging in dhcp4_srv.cc such that the log messages include the client's unique identifier: HW address or/and Client Id and transaction id. See requirements here: http://kea.isc.org/wiki/DiagnosticsRequirements.

I think that appropriate changes for the "bad_packet" logger messages should be done within this ticket.

I added the following methods:

virtual std::string Pkt::getLabel() 
static std::string Pkt4::makeLabel(HWAddrPtr hwaddr, OptionPtr client_id, uint32_t transid);
std::string Pkt4::getLabel()  - this calls Pkt4::makeLabel

and modified the DROP_ and NAK_ log messages to use the getLabel() call and to rearranged them to be more or less uniform, i.e. parsable.

I wonder if the DHCP4_PACKET_DROP_0002 should be logged at INFO level. I think it should rather be DEBUG just like other messages of this type. And, yes maybe that was even me who wrote this part, but people makes mistakes, no?

My you're fickle. I have changed it to DEBUG for you.

When I am running unit tests with verbose logging turned on I get the following error message before each logged message by bad-packet logger:

2015-04-17 12:11:58.145 ERROR [kea.bad-packet/14061] Unable to lock logger lockfile
2015-04-17 12:11:58.145 DEBUG [kea.bad-packet/14061] DHCP4_PACKET_DROP_0004 packet received on interface eth1 dropped, because of missing msg-type option

I am using OS-X for this test. Can you see the same thing?

Yes, it occurs on Centos as well. It has to do with our interaction with GoogleTest?. I have
verified that the server works normally outside GoogleTest?. I have created #3813 to address
the issue.

BTW, it is sort of strange that the logger name which appears in the log file is [kea.bad-packet], rather than [kea-dhcp4.bad-packet] but this is probably the name which we use for unit tests?

Yes, this is appears to because we call isc::log::initLogger() with no arguments for unit testing.
The name is correct under normal server operation.

ChangeLog
This is too many details in my opinion:

 " These log entries logs will contain an
    identifier of either DHCP4_PACKET_DROP_<xxxx> or DHCP4_PACKET_NAK_<xxxx>
    where <xxx> is a numeric value unique to the cause of the packet action
    along with text explanation."

Maybe just remove it.

Ok. Will shorten it.

comment:9 follow-up: Changed 5 years ago by marcin

  • Owner changed from marcin to tmark

Reviewed commit 2c0e24ab5bd406fdb4ce76d335af50d6a47d35fd

pk4.h

I think that Pkt::getLabel could easily be const. No?

In addition, parameters passed to the Pkt4::makeLabel should could be passed by const reference and the transid could be const.

Some typos in the pkt4.h: "Returns text representation primary packet identifiers". It should be "Returns text representation of the primary packet identifiers".

Also "makeLabel()()" shoul be "makeLabel()".

For the makeLabel it maybe worth to mention that passed pointers may be NULL.

The makeLabel uses OptionPtr? as a source of the client id and it calls the OptionPtr::toText to obtain the textual representation of the option. I think it would be better to use ClientId? class instead. As it is now, it gives the following output:

2015-04-20 18:49:22.316 DEBUG [kea.bad-packet/8918] DHCP4_PACKET_DROP_0002 hwaddr=[hwtype=1 ], client-id=[type=61, len=4: 64:65:66:67], transid=0x4d2, from interface eth1: 

Don't you think that the information is a bit redundant with the "type=61"?. Also, what about the case when you actually don't have the option holding client id, but the ClientId? object, like in the AllocEngine::ClientContext4. Also, we are basically aiming to extract the data from the option only once and keep it in the AllocEngine::ClientContext4 (held in Dhcp4Exchange) throughout the transaction.

pkt4.cc

makeLabel: The tmp.str() should be in parens. Also, tmp is probably not the best name but, up to you.:-)

comment:10 in reply to: ↑ 9 ; follow-up: Changed 5 years ago by tmark

Replying to marcin:

Reviewed commit 2c0e24ab5bd406fdb4ce76d335af50d6a47d35fd

pk4.h

I think that Pkt::getLabel could easily be const. No?

Done.

In addition, parameters passed to the Pkt4::makeLabel should could be passed by const reference and the transid could be const.

Done, but aren't there compilers somewhere that will complain about "const" on an integer parameter? If so, you can fix the build when it breaks.

Some typos in the pkt4.h: "Returns text representation primary packet identifiers". It should be "Returns text representation of the primary packet identifiers".

Also "makeLabel()()" shoul be "makeLabel()".

Done.

For the makeLabel it maybe worth to mention that passed pointers may be NULL.

Done.

The makeLabel uses OptionPtr? as a source of the client id and it calls the OptionPtr::toText to obtain the textual representation of the option. I think it would be better to use ClientId? class instead. As it is now, it gives the following output:

2015-04-20 18:49:22.316 DEBUG [kea.bad-packet/8918] DHCP4_PACKET_DROP_0002 hwaddr=[hwtype=1 ], client-id=[type=61, len=4: 64:65:66:67], transid=0x4d2, from interface eth1: 

Don't you think that the information is a bit redundant with the "type=61"?. Also, what about the case when you actually don't have the option holding client id, but the ClientId? object, like in the AllocEngine::ClientContext4. Also, we are basically aiming to extract the data from the option only once and keep it in the AllocEngine::ClientContext4 (held in Dhcp4Exchange) throughout the transaction.

I have altered Pkt4::makeLabel() to expect a ClientIdPtr now so we get more concise
output. Obviously this required altering Pkt4::getLabel() to create a ClientIdPtr from
the client option if it exists in the packet.

As per our jabber session, it might be more appropriate or efficient to have Pkt4 extract the client id option once upon unpacking it and retain the value as an instance member but I am inclined to leave that to future work.

pkt4.cc

makeLabel: The tmp.str() should be in parens. Also, tmp is probably not the best name but, up to you.:-)

Well since I was following the precedent for the name "tmp" set in Pkt4::toText() I really didn't concern myself too much. While its use is rather self-evident in such a small method body I have renamed so you would not lose further sleep over the matter.

comment:11 Changed 5 years ago by tmark

  • Owner changed from tmark to marcin

comment:12 in reply to: ↑ 10 Changed 5 years ago by marcin

  • Owner changed from marcin to tmark

Replying to tmark:

Replying to marcin:

Reviewed commit 2c0e24ab5bd406fdb4ce76d335af50d6a47d35fd

pk4.h

I think that Pkt::getLabel could easily be const. No?

Done.

In addition, parameters passed to the Pkt4::makeLabel should could be passed by const reference and the transid could be const.

Done, but aren't there compilers somewhere that will complain about "const" on an integer parameter? If so, you can fix the build when it breaks.

cppcheck doesn't like const before enums, but I am not aware of any issues with const before integers.

Some typos in the pkt4.h: "Returns text representation primary packet identifiers". It should be "Returns text representation of the primary packet identifiers".

Also "makeLabel()()" shoul be "makeLabel()".

Done.

For the makeLabel it maybe worth to mention that passed pointers may be NULL.

Done.

The makeLabel uses OptionPtr? as a source of the client id and it calls the OptionPtr::toText to obtain the textual representation of the option. I think it would be better to use ClientId? class instead. As it is now, it gives the following output:

2015-04-20 18:49:22.316 DEBUG [kea.bad-packet/8918] DHCP4_PACKET_DROP_0002 hwaddr=[hwtype=1 ], client-id=[type=61, len=4: 64:65:66:67], transid=0x4d2, from interface eth1: 

Don't you think that the information is a bit redundant with the "type=61"?. Also, what about the case when you actually don't have the option holding client id, but the ClientId? object, like in the AllocEngine::ClientContext4. Also, we are basically aiming to extract the data from the option only once and keep it in the AllocEngine::ClientContext4 (held in Dhcp4Exchange) throughout the transaction.

I have altered Pkt4::makeLabel() to expect a ClientIdPtr now so we get more concise
output. Obviously this required altering Pkt4::getLabel() to create a ClientIdPtr from
the client option if it exists in the packet.

As per our jabber session, it might be more appropriate or efficient to have Pkt4 extract the client id option once upon unpacking it and retain the value as an instance member but I am inclined to leave that to future work.

pkt4.cc

makeLabel: The tmp.str() should be in parens. Also, tmp is probably not the best name but, up to you.:-)

Well since I was following the precedent for the name "tmp" set in Pkt4::toText() I really didn't concern myself too much. While its use is rather self-evident in such a small method body I have renamed so you would not lose further sleep over the matter.

I have applied minor fixes to the code, so please pull. If you're ok with my fixes the code is ready for take off.

comment:13 Changed 5 years ago by tmark

  • Resolution set to fixed
  • Status changed from reviewing to closed

Merged with git cb91ca851099423e1b6c39cca3f3e2ba29795a51
Added ChangeLog? entry 923.

Ticket is cloesed.

Note: See TracTickets for help on using tickets.