Opened 7 years ago

Closed 6 years ago

#2765 closed defect (fixed)

b10-dhcp4 silently fails if bootp/dhcp4 port is already used on one interface

Reported by: cas Owned by: marcin
Priority: medium Milestone: Kea0.8
Component: dhcp4 Version:
Keywords: Cc:
CVSS Scoring: Parent Tickets:
Sensitive: no Defect Severity: N/A
Sub-Project: DHCP Feature Depending on Ticket:
Estimated Difficulty: 0 Add Hours to Ticket: 0
Total Hours: 0 Internal?: no

Description

I had the isc-dhcp DHCP relay running on one interface, and was hoping that b10-dhcp4 would ignore that interface (with a log error) but would bind to the other interfaces on the machine.

However it seems that b10-dhcp4 will silently fail if the bootp port is used on one interface. I could not find any error messages in the logs about it.

Once I stopped dhcrelay, it all started fine

Subtickets

Attachments (1)

patch.2765 (2.8 KB) - added by dclink 6 years ago.

Download all attachments as: .zip

Change History (15)

comment:1 Changed 7 years ago by jelte

  • Milestone changed from New Tasks to DHCP Outstanding Tasks

comment:2 Changed 7 years ago by dclink

  • Milestone changed from DHCP Outstanding Tasks to Sprint-DHCP-20130904
  • Owner set to dclink
  • Status changed from new to assigned

comment:3 Changed 7 years ago by dclink

I provide a patch for when a socket is created with LPF Packet Filter. It adds a quick check before Filter creation. Given with a new Linux specific unit test.

Last edited 6 years ago by dclink (previous) (diff)

Changed 6 years ago by dclink

comment:4 Changed 6 years ago by dclink

  • Owner changed from dclink to UnAssigned
  • Status changed from assigned to reviewing

comment:5 Changed 6 years ago by tomek

  • Milestone changed from Sprint-DHCP-20130904 to Sprint-DHCP-20130918

comment:6 Changed 6 years ago by tomek

  • Milestone changed from Sprint-DHCP-20130918 to Sprint-DHCP-20131016

comment:7 Changed 6 years ago by marcin

  • Owner changed from UnAssigned to marcin

comment:8 Changed 6 years ago by tomek

  • Milestone changed from Sprint-DHCP-20131016 to Sprint-DHCP-20131204

comment:9 Changed 6 years ago by marcin

  • Owner changed from marcin to UnAssigned

dclink, many thanks for providing a patch. That was a good idea to open another socket to check if there is anything already bound. The problem I found with the patch was that IP/UDP socket was closed as soon as we checked that there is no other process bound to the address/port. If we close the socket and leave the raw socket, the kernel doesn't know that Kea is listening to the DHCP traffic and it is possible to open another DHCP server instance and bind to the same address and port. In order to make it a little bit better I decided that the IP/udp socket remains open as long as the raw socket is open. Following the terminology of the isc-dhcp, I refer to it as ''fallback'' socket. So, neither the fallback socket will open if there is another DHCP instance bound to the address/port nor any other instance will bind to the same address/port as Kea if Kea's socket is already bound. Obviously, if the other DHCP instance uses raw sockets only, it can still listen on the same address/port.

Another important change that comes with this ticket is to gracefully handle failures to open a socket. In other words, we usually don't want the IfaceMgr::openSockets4 to throw an exception when socket fails to open because this will effectively mean that no other sockets (on other interfaces) will open. Instead we should log warning. The libdhcp++ doesn't have any logger associated with it and it communicates errors through exceptions. I was wondering how to extend the IfaceMgr::openSockets4 function not to throw an exception when it fails to open a socket, but in the same time to communicate the error to the server. I realized that having an error handler function which would be called on error, is the simplest and the most flexible approach. If the handler is not installed, the default behaviour is preserved: exception is thrown. The error handler function on the server side will simply print the warning.

When I was writing unit tests for the new callback functionality I realized that IfaceMgr::openSockets4 has NO unit tests. Unit testing this function is not trivial because function's behavior depends on the network interfaces present. I created the test which removes any existing interfaces and configures the IfaceMgr with a set of well known fake interfaces. In order to use fictitious interfaces I had to inject the customized code for opening the sockets on them etc. I used a class derived from isc::dhcp::PktFilter to provide a no-op version of openSocket4.

Note, that the PktFilter-derived objects are used for DHCPv4. Therefore, it is impossible to use the same testing approach for openSockets6 at the moment. I submitted this ticket: http://bind10.isc.org/ticket/3251 to resolve the problem for DHCPv6.

Another new ticket: http://bind10.isc.org/ticket/3252 has been submitted to extend the IfaceMgr::openSockets6 function to not throw an exception when failed to open a socket.

Proposed ChangeLog entry for #2765:

XXX.	[func]		marcin,dclink
	b10-dhcp4: If server failed to open a socket on one interface it
	will log a warning and continue to open sockets on other interfaces.
	The warning message is communicated from the libdhcp++ via the
	error handler function supplied by the DHCPv4 server. Thanks to
	David Carlier for providing a patch.
	(Trac #2765, git abc)

comment:10 Changed 6 years ago by tmark

  • Owner changed from UnAssigned to tmark

comment:11 follow-up: Changed 6 years ago by tmark

  • Owner changed from tmark to marcin

This was tricky bit of work. The use of the error handler is a good approach.


pkt_filter.h and pkt_filter.cc

Please check your line width. Quite a few comment lines that spill over 80.


pkt_filter.h

openFallbackSocket() has no @throw.


pkt_filter_inet.h -

Commentary for openSocket() has no @throw. Neither does receive(), nor send().


pkt_filter.cc

openFallbackSocket()

You may wish to consider adding use of strerror(errno) as was done recently in
iface_mgr.cc. As it stands, if there are errors there is no OS level reason
for the failure.


pkt_filter_lpf.cc

In openSocket(), Why do this?

    SocketInfo sock_desc(addr, port, sock, fallback);
    return (sock_desc);

instead of:

    return (SocketInfo sock_desc(addr, port, sock, fallback));

Doing it your way, creates a copy locally only to then copies it onto the stack
to return it; instead of making it directly on the stack? It is also a bit
clearer in my mind, that you intend to return a copy on the stack if you use
the second form.


pkt_filter_lpf.cc

In receive(), you loop through reading the fallback socket:

    do {
        datalen = recv(socket_info.fallbackfd_, raw_buf, sizeof(raw_buf), 0);
    } while (datalen >= 0);

First, I think it should be > not >=.

Upon successful completion, recv() shall return the length of the message in bytes. If no messages are available to be received and the peer has performed an orderly shutdown, recv() shall return 0. Otherwise, -1 shall be returned and errno set to indicate the error.

You should only try it again if it returns greater than zero.

Also, is there any chance we could get stuck here if something were spewing
data at that socket? Should we consider some sort of throttle mechanism?


iface_mgr.h

handleSocketConfigError
The brief description says "during operation on the socket". It is not clear,
from the commentary that this handler is only for certain socket operations.
I think it would be good to list the conditions under which it will be invoked.
Certainly, it should state that it is not called during read, write, or close
operations.


iface_mgr.h

openSockets4

The parameter commentary for error_handler states that "the function will
continue when the callback returns control.". It does not say what it will
continue doing. There is no detailed discussion for what the function does.
You should add a descriptive paragraph for the function. It should perhaps
state the ramifications of the caller's handler doing something like throwing.


iface_mgr_unittest.cc

checkPacketFilterLPFSocket - It would help if this test had some commentary
describing what it tests.

AT 1137, you have this:

    // Surprisingly we managed to open another socket. We have to close it
    // to prevent resource leak.
    if (socket2 >= 0) {
        close(socket2);
    }

Shouldn't this be a test failure? The whole intent is to not be able to open
a second socket, shouldn't you do something like:

 ADD_FAILURE() << "we opened two!" ?

While not serious, running the unit the disabled tests as root with valgrind,
it warns numerous times about attempting to close invalid file descriptor.
Here's a snippet:

==17891== Warning: invalid file descriptor -1 in syscall close()
[       OK ] IfaceMgrTest.socket4 (5 ms)
[ RUN      ] IfaceMgrTest.openSockets4
==17891== Warning: invalid file descriptor -1 in syscall close()
==17891== Warning: invalid file descriptor -1 in syscall close()
[       OK ] IfaceMgrTest.openSockets4 (13 ms)
[ RUN      ] IfaceMgrTest.openSockets4IfaceDown
==17891== Warning: invalid file descriptor -1 in syscall close()
[       OK ] IfaceMgrTest.openSockets4IfaceDown (8 ms)
[ RUN      ] IfaceMgrTest.openSockets4IfaceInactive
==17891== Warning: invalid file descriptor -1 in syscall close()
[       OK ] IfaceMgrTest.openSockets4IfaceInactive (9 ms)
[ RUN      ] IfaceMgrTest.openSockets4NoErrorHandler
==17891== Warning: invalid file descriptor -1 in syscall close()
==17891== Warning: invalid file descriptor -1 in syscall close()
[       OK ] IfaceMgrTest.openSockets4NoErrorHandler (10 ms)
[ RUN      ] IfaceMgrTest.openSocket4ErrorHandler
==17891== Warning: invalid file descriptor -1 in syscall close()
==17891== Warning: invalid file descriptor -1 in syscall close()
[       OK ] IfaceMgrTest.openSocket4ErrorHandler (9 ms)

You should probably fix this.


The code passed cppcheck, and the unit tests were run under both OS-X and Debian6.

comment:12 in reply to: ↑ 11 Changed 6 years ago by marcin

  • Owner changed from marcin to tmark

Replying to tmark:

This was tricky bit of work. The use of the error handler is a good approach.

Thanks.

I apologize but I had one outstanding commit which adds a new paragraph to the developer's guide. I forgot to push this commit prior to the review that you have done. I pushed it now.


pkt_filter.h and pkt_filter.cc

Please check your line width. Quite a few comment lines that spill over 80.

Fixed.


pkt_filter.h

openFallbackSocket() has no @throw.

Added


pkt_filter_inet.h -

Commentary for openSocket() has no @throw. Neither does receive(), nor send().

Added.
One comment: in some cases it is a rat hole to add throw tags for all types of exceptions being thrown because they very much depend on the exceptions thrown by functions they call. Even worse, exceptions thrown by functions being called from the particular function may change and all dependent throw tags would have to be updated accordingly.


pkt_filter.cc

openFallbackSocket()

You may wish to consider adding use of strerror(errno) as was done recently in
iface_mgr.cc. As it stands, if there are errors there is no OS level reason
for the failure.

I consider it a good idea. So, I added it.


pkt_filter_lpf.cc

In openSocket(), Why do this?

    SocketInfo sock_desc(addr, port, sock, fallback);
    return (sock_desc);

instead of:

    return (SocketInfo sock_desc(addr, port, sock, fallback));

Doing it your way, creates a copy locally only to then copies it onto the stack
to return it; instead of making it directly on the stack? It is also a bit
clearer in my mind, that you intend to return a copy on the stack if you use
the second form.

First, I think you meant this:

     return (SocketInfo(addr, port, sock, fallback));

Secondly, I don't quite understand your point. In both cases the SocketInfo? structure is created on the stack - they are local variables. The only difference is that in the latter case, we create an anonymous variable which may be used in the point where it has been created and immediately destroyed - doesn't live in the function scope like in the former case. Before it is destroyed it is copied and returned. Same in the former case: the variable is created, copied and destroyed. They are even destroyed in the same time, because the function hits ''return'' immediately after creation of the named variable.

In addition, both versions produce exactly the same assembler code.


pkt_filter_lpf.cc

In receive(), you loop through reading the fallback socket:

    do {
        datalen = recv(socket_info.fallbackfd_, raw_buf, sizeof(raw_buf), 0);
    } while (datalen >= 0);

First, I think it should be > not >=.

Upon successful completion, recv() shall return the length of the message in bytes. If no messages are available to be received and the peer has performed an orderly shutdown, recv() shall return 0. Otherwise, -1 shall be returned and errno set to indicate the error.

You should only try it again if it returns greater than zero.

I was thinking that I should return on error, and 0 is not an indication of error. But yes, there is rather no chance that I will get anymore data from this socket if it returned 0. So I changed that as requested.

Also, is there any chance we could get stuck here if something were spewing
data at that socket? Should we consider some sort of throttle mechanism?

I was considering this issue. Ultimately, I decided not to implement it, because I don't know if such a problem even has any chance to exist. If it does, it can be a problem of any (not only fallback) socket we open. On the other hand, during the normal operation both primary and fallback socket should be in sync - they are configured to listen/filter the same traffic. So, I really don't think it is a problem but I added a todo to reconsider this design if needed.


iface_mgr.h

handleSocketConfigError
The brief description says "during operation on the socket". It is not clear,
from the commentary that this handler is only for certain socket operations.
I think it would be good to list the conditions under which it will be invoked.
Certainly, it should state that it is not called during read, write, or close
operations.

Fixed.


iface_mgr.h

openSockets4

The parameter commentary for error_handler states that "the function will
continue when the callback returns control.". It does not say what it will
continue doing. There is no detailed discussion for what the function does.
You should add a descriptive paragraph for the function. It should perhaps
state the ramifications of the caller's handler doing something like throwing.

Added quite extensive description.


iface_mgr_unittest.cc

checkPacketFilterLPFSocket - It would help if this test had some commentary
describing what it tests.

I added test description.

AT 1137, you have this:

    // Surprisingly we managed to open another socket. We have to close it
    // to prevent resource leak.
    if (socket2 >= 0) {
        close(socket2);
    }

Shouldn't this be a test failure? The whole intent is to not be able to open
a second socket, shouldn't you do something like:

 ADD_FAILURE() << "we opened two!" ?

I added ADD_FAILURE but it doesn't change much. Note that the following part of the test:

    // The socket is open and bound. Another attempt to open socket and
    // bind to the same address and port should result in an exception.
    EXPECT_THROW(
        socket2 = iface_mgr2->openSocket(LOOPBACK, loAddr,
                                         DHCP4_SERVER_PORT + 10000),
        isc::dhcp::SocketConfigError
    );

already causes the test to fail if the socket opens. The other part is simply to perform a cleanup after the test if it opens the socket for some reason.


While not serious, running the unit the disabled tests as root with valgrind,
it warns numerous times about attempting to close invalid file descriptor.
Here's a snippet:

==17891== Warning: invalid file descriptor -1 in syscall close()
[       OK ] IfaceMgrTest.socket4 (5 ms)
[ RUN      ] IfaceMgrTest.openSockets4
==17891== Warning: invalid file descriptor -1 in syscall close()
==17891== Warning: invalid file descriptor -1 in syscall close()
[       OK ] IfaceMgrTest.openSockets4 (13 ms)
[ RUN      ] IfaceMgrTest.openSockets4IfaceDown
==17891== Warning: invalid file descriptor -1 in syscall close()
[       OK ] IfaceMgrTest.openSockets4IfaceDown (8 ms)
[ RUN      ] IfaceMgrTest.openSockets4IfaceInactive
==17891== Warning: invalid file descriptor -1 in syscall close()
[       OK ] IfaceMgrTest.openSockets4IfaceInactive (9 ms)
[ RUN      ] IfaceMgrTest.openSockets4NoErrorHandler
==17891== Warning: invalid file descriptor -1 in syscall close()
==17891== Warning: invalid file descriptor -1 in syscall close()
[       OK ] IfaceMgrTest.openSockets4NoErrorHandler (10 ms)
[ RUN      ] IfaceMgrTest.openSocket4ErrorHandler
==17891== Warning: invalid file descriptor -1 in syscall close()
==17891== Warning: invalid file descriptor -1 in syscall close()
[       OK ] IfaceMgrTest.openSocket4ErrorHandler (9 ms)

You should probably fix this.

You deserve to take position in the QA team. :) Seriously, thanks for that. I fixed it - the IfaceMgr had a bug which caused the fallback socket to be closed if its descriptor was non-zero (including negative value).


The code passed cppcheck, and the unit tests were run under both OS-X and Debian6.

You really do deserve being in QA team ;)

comment:13 Changed 6 years ago by tmark

  • Owner changed from tmark to marcin

Replying to marcin:

Replying to tmark:

This was tricky bit of work. The use of the error handler is a good approach.

Thanks.

I apologize but I had one outstanding commit which adds a new paragraph to the developer's guide. I forgot to push this commit prior to the review that you have done. I pushed it now.


pkt_filter.h and pkt_filter.cc

Please check your line width. Quite a few comment lines that spill over 80.

Fixed.


pkt_filter.h

openFallbackSocket() has no @throw.

Added


pkt_filter_inet.h -

Commentary for openSocket() has no @throw. Neither does receive(), nor send().

Added.
One comment: in some cases it is a rat hole to add throw tags for all types of exceptions being thrown because they very much depend on the exceptions thrown by functions they call. Even worse, exceptions thrown by functions being called from the particular function may change and all dependent throw tags would have to be updated accordingly.

Agreed, there is a practical limit, however I think it is important to state
that a method may result in a throw, even indirectly. I believe I have used commentary to the effect "@throw Does not throw directly, but underlying functions may.".


pkt_filter.cc

openFallbackSocket()

You may wish to consider adding use of strerror(errno) as was done recently in
iface_mgr.cc. As it stands, if there are errors there is no OS level reason
for the failure.

I consider it a good idea. So, I added it.

Marcin, this is one potential issue in the manner you're accessing errno.
When you call close() on the socket, this may set errno to reflect the outcome
of the close. If that doesn't, the code produced by the logger macro might.
The most reliably thing to do is to grab the error message immediately following the failed function:

    if (bind(sock, reinterpret_cast<struct sockaddr*>(&addr4),
             sizeof(addr4)) < 0) {
        // Remember to close the socket if we failed to bind it.
        const char* errstr = strerror(errno);
        close(sock);
        isc_throw(SocketConfigError, "failed to bind fallback socket to"
                  " address " << addr.toText() << ", port " << port
                  << ", reason: " << errstr
                  << " - is another DHCP server running?");
    }

pkt_filter_lpf.cc

In openSocket(), Why do this?

    SocketInfo sock_desc(addr, port, sock, fallback);
    return (sock_desc);

instead of:

    return (SocketInfo sock_desc(addr, port, sock, fallback));

Doing it your way, creates a copy locally only to then copies it onto the stack
to return it; instead of making it directly on the stack? It is also a bit
clearer in my mind, that you intend to return a copy on the stack if you use
the second form.

First, I think you meant this:

     return (SocketInfo(addr, port, sock, fallback));

Yes that is what I meant.

Secondly, I don't quite understand your point. In both cases the SocketInfo? structure is created on the stack - they are local variables. The only difference is that in the latter case, we create an anonymous variable which may be used in the point where it has been created and immediately destroyed - doesn't live in the function scope like in the former case. Before it is destroyed it is copied and returned. Same in the former case: the variable is created, copied and destroyed. They are even destroyed in the same time, because the function hits ''return'' immediately after creation of the named variable.

In addition, both versions produce exactly the same assembler code.

If the assembler code is no different than I don't suppose there is any issue.


pkt_filter_lpf.cc

In receive(), you loop through reading the fallback socket:

    do {
        datalen = recv(socket_info.fallbackfd_, raw_buf, sizeof(raw_buf), 0);
    } while (datalen >= 0);

First, I think it should be > not >=.

Upon successful completion, recv() shall return the length of the message in bytes. If no messages are available to be received and the peer has performed an orderly shutdown, recv() shall return 0. Otherwise, -1 shall be returned and errno set to indicate the error.

You should only try it again if it returns greater than zero.

I was thinking that I should return on error, and 0 is not an indication of error. But yes, there is rather no chance that I will get anymore data from this socket if it returned 0. So I changed that as requested.

Also, is there any chance we could get stuck here if something were spewing
data at that socket? Should we consider some sort of throttle mechanism?

I was considering this issue. Ultimately, I decided not to implement it, because I don't know if such a problem even has any chance to exist. If it does, it can be a problem of any (not only fallback) socket we open. On the other hand, during the normal operation both primary and fallback socket should be in sync - they are configured to listen/filter the same traffic. So, I really don't think it is a problem but I added a todo to reconsider this design if needed.

I am good with this.


iface_mgr.h

handleSocketConfigError
The brief description says "during operation on the socket". It is not clear,
from the commentary that this handler is only for certain socket operations.
I think it would be good to list the conditions under which it will be invoked.
Certainly, it should state that it is not called during read, write, or close
operations.

Fixed.


iface_mgr.h

openSockets4

The parameter commentary for error_handler states that "the function will
continue when the callback returns control.". It does not say what it will
continue doing. There is no detailed discussion for what the function does.
You should add a descriptive paragraph for the function. It should perhaps
state the ramifications of the caller's handler doing something like throwing.

Added quite extensive description.

The description is very good.


iface_mgr_unittest.cc

checkPacketFilterLPFSocket - It would help if this test had some commentary
describing what it tests.

I added test description.

AT 1137, you have this:

    // Surprisingly we managed to open another socket. We have to close it
    // to prevent resource leak.
    if (socket2 >= 0) {
        close(socket2);
    }

Shouldn't this be a test failure? The whole intent is to not be able to open
a second socket, shouldn't you do something like:

 ADD_FAILURE() << "we opened two!" ?

I added ADD_FAILURE but it doesn't change much. Note that the following part of the test:

    // The socket is open and bound. Another attempt to open socket and
    // bind to the same address and port should result in an exception.
    EXPECT_THROW(
        socket2 = iface_mgr2->openSocket(LOOPBACK, loAddr,
                                         DHCP4_SERVER_PORT + 10000),
        isc::dhcp::SocketConfigError
    );

already causes the test to fail if the socket opens. The other part is simply to perform a cleanup after the test if it opens the socket for some reason.


While not serious, running the unit the disabled tests as root with valgrind,
it warns numerous times about attempting to close invalid file descriptor.
Here's a snippet:

==17891== Warning: invalid file descriptor -1 in syscall close()
[       OK ] IfaceMgrTest.socket4 (5 ms)
[ RUN      ] IfaceMgrTest.openSockets4
==17891== Warning: invalid file descriptor -1 in syscall close()
==17891== Warning: invalid file descriptor -1 in syscall close()
[       OK ] IfaceMgrTest.openSockets4 (13 ms)
[ RUN      ] IfaceMgrTest.openSockets4IfaceDown
==17891== Warning: invalid file descriptor -1 in syscall close()
[       OK ] IfaceMgrTest.openSockets4IfaceDown (8 ms)
[ RUN      ] IfaceMgrTest.openSockets4IfaceInactive
==17891== Warning: invalid file descriptor -1 in syscall close()
[       OK ] IfaceMgrTest.openSockets4IfaceInactive (9 ms)
[ RUN      ] IfaceMgrTest.openSockets4NoErrorHandler
==17891== Warning: invalid file descriptor -1 in syscall close()
==17891== Warning: invalid file descriptor -1 in syscall close()
[       OK ] IfaceMgrTest.openSockets4NoErrorHandler (10 ms)
[ RUN      ] IfaceMgrTest.openSocket4ErrorHandler
==17891== Warning: invalid file descriptor -1 in syscall close()
==17891== Warning: invalid file descriptor -1 in syscall close()
[       OK ] IfaceMgrTest.openSocket4ErrorHandler (9 ms)

You should probably fix this.

You deserve to take position in the QA team. :) Seriously, thanks for that. I fixed it - the IfaceMgr had a bug which caused the fallback socket to be closed if its descriptor was non-zero (including negative value).

Gee, thanks... I think....


The code passed cppcheck, and the unit tests were run under both OS-X and Debian6.

You really do deserve being in QA team ;)

You trying to get rid of me?

I do not need to see this ticket again.

comment:14 Changed 6 years ago by marcin

  • Resolution set to fixed
  • Status changed from reviewing to closed

Merged with commit f49c4b8942cdbafb85414a1925ff6ca1d381f498

Note: See TracTickets for help on using tickets.