Opened 6 years ago

Closed 6 years ago

#3437 closed defect (fixed)

kea6 running on FreeBSD not responding to messages / not receiving messages

Reported by: wlodekwencel
Owned by: marcin
Priority: medium
Milestone: Kea0.9
Component: dhcp6
Version:
Keywords:
Cc:
CVSS Scoring:
Parent Tickets:
Sensitive: no
Defect Severity: N/A
Sub-Project: DHCP
Feature Depending on Ticket:
Estimated Difficulty: 0
Add Hours to Ticket: 1
Total Hours: 44
Internal?: no

Description

Kea6 running on FreeBSD 9.2 does not respond to messages (Solicit). There is no sign of incoming messages in the logs (no firewall was enabled, and no external application that could drop messages was running).

Kea6 configs:

config set Dhcp6/renew-timer 1000
config set Dhcp6/rebind-timer 2000
config set Dhcp6/preferred-lifetime 3000
config set Dhcp6/valid-lifetime 4000
config add Dhcp6/subnet6
config set Dhcp6/subnet6[0]/subnet "3000::/64"
config set Dhcp6/subnet6[0]/pool [ "3000::1-3000::ff" ]
config set Dhcp6/subnet6[0]/interface "em1"
config set Dhcp6/interfaces[0] "em1"

Logs from bindctl and bind10 (nohup) are attached.

The patch provided by marcin fixes this issue:

diff --git a/src/lib/dhcp/iface_mgr.cc b/src/lib/dhcp/iface_mgr.cc
index e6aea3d..3b3092f 100644
--- a/src/lib/dhcp/iface_mgr.cc
+++ b/src/lib/dhcp/iface_mgr.cc
@@ -952,7 +952,6 @@ Pkt6Ptr IfaceMgr::receive6(uint32_t timeout_sec, uint32_t timeout_usec /* = 0 */
 
             // Only deal with IPv6 addresses.
             if (s->addr_.isV6()) {
-
                 // Add this socket to listening set
                 FD_SET(s->sockfd_, &sockets);
                 if (maxfd < s->sockfd_) {
diff --git a/src/lib/dhcp/iface_mgr_bsd.cc b/src/lib/dhcp/iface_mgr_bsd.cc
index 7a01228..fb3eeda 100644
--- a/src/lib/dhcp/iface_mgr_bsd.cc
+++ b/src/lib/dhcp/iface_mgr_bsd.cc
@@ -155,11 +155,13 @@ IfaceMgr::openMulticastSocket(Iface& iface,
                               const isc::asiolink::IOAddress& addr,
                               const uint16_t port,
                               IfaceMgrErrorMsgCallback error_handler) {
+    int sock;
     try {
+
         // This should open a socket, bound it to link-local address
         // and join multicast group.
-        openSocket(iface.getName(), addr, port,
-                   iface.flag_multicast_);
+        sock =  openSocket(iface.getName(), addr, port,
+                           iface.flag_multicast_);
 
     } catch (const Exception& ex) {
         IFACEMGR_ERROR(SocketConfigError, error_handler,
@@ -169,6 +171,26 @@ IfaceMgr::openMulticastSocket(Iface& iface,
         return (false);
 
     }
+
+    if (iface.flag_multicast_) {
+        try {
+            openSocket(iface.getName(),
+                       IOAddress(ALL_DHCP_RELAY_AGENTS_AND_SERVERS),
+                       port);
+        } catch (const Exception& ex) {
+            // An attempt to open and bind a socket to multicast addres
+            // has failed. We have to close the socket we previously
+            // bound to link-local address - this is everything or
+            // nothing strategy.
+            iface.delSocket(sock);
+            IFACEMGR_ERROR(SocketConfigError, error_handler,
+                           "Failed to open multicast socket on"
+                           " interface " << iface.getName()
+                           << ", reason: " << ex.what());
+            return (false);
+        }
+    }
+    // Both sockets have opened successfully.
     return (true);
 }

Attachments (4)

logs (7.7 KB) - added by wlodekwencel 6 years ago.
nohup.out (7.0 KB) - added by wlodekwencel 6 years ago.
bsd.patch (2.3 KB) - added by wlodekwencel 6 years ago.
netbsd5.1x64-make-check-error.log (23.8 KB) - added by tomek 6 years ago.


Change History (19)

Changed 6 years ago by wlodekwencel

Changed 6 years ago by wlodekwencel

Changed 6 years ago by wlodekwencel

comment:1 Changed 6 years ago by wlodekwencel

  • Component changed from Unclassified to dhcp6

comment:2 Changed 6 years ago by tomek

  • Milestone changed from Kea-proposed to Kea0.9
  • Priority changed from medium to low

comment:3 Changed 6 years ago by wlodekwencel

  • Owner set to wlodekwencel
  • Status changed from new to assigned

comment:4 Changed 6 years ago by wlodekwencel

The changes cause unit tests to fail on FreeBSD 9.2:

[  FAILED  ] 7 tests, listed below:
[  FAILED  ] IfaceMgrTest.openSockets6LinkLocal
[  FAILED  ] IfaceMgrTest.openSockets6NoLinkLocal
[  FAILED  ] IfaceMgrTest.openSockets6NotMulticast
[  FAILED  ] IfaceMgrTest.openSockets6Unicast
[  FAILED  ] IfaceMgrTest.openSockets6UnicastOnly
[  FAILED  ] IfaceMgrTest.openSockets6IfaceDown
[  FAILED  ] IfaceMgrTest.openSockets6IfaceInactive

 7 FAILED TESTS
  YOU HAVE 3 DISABLED TESTS

FAIL: libdhcp++_unittests
================================
1 of 1 test failed
Please report to kea-dev@isc.org
================================
*** [check-TESTS] Error code 1

Stop in /home/test/jenkins_lab/build/bind10-jenkins/src/lib/dhcp/tests.
*** [check-am] Error code 1

Stop in /home/test/jenkins_lab/build/bind10-jenkins/src/lib/dhcp/tests.
*** [check-recursive] Error code 1

Stop in /home/test/jenkins_lab/build/bind10-jenkins/src/lib/dhcp/tests.
*** [check-recursive] Error code 1

Stop in /home/test/jenkins_lab/build/bind10-jenkins/src/lib/dhcp.
*** [check-recursive] Error code 1

Stop in /home/test/jenkins_lab/build/bind10-jenkins/src/lib.
*** [check-recursive] Error code 1

comment:5 Changed 6 years ago by wlodekwencel

  • Owner changed from wlodekwencel to UnAssigned

comment:6 Changed 6 years ago by tomek

  • Priority changed from low to medium

comment:7 Changed 6 years ago by marcin

  • Owner changed from UnAssigned to marcin
  • Status changed from assigned to accepted

comment:8 Changed 6 years ago by marcin

  • Add Hours to Ticket changed from 0 to 27
  • Owner changed from marcin to UnAssigned
  • Status changed from accepted to reviewing
  • Total Hours changed from 0 to 27

I spent most of the time on this ticket investigating portability issues of the socket API across Linux, FreeBSD, NetBSD and OpenBSD.

The following two problems have been fixed in this ticket:

  1. sin6_scope_id was incorrectly set when the socket was being bound to a global unicast address. FreeBSD refused to bind the socket with sin6_scope_id set to anything other than 0 for a global unicast address.
  2. Multicast traffic was not received on BSD systems at all.
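
To illustrate the first issue, here is a minimal sketch of building the address passed to bind() so that the scope id is set only for link-local scoped addresses. The helper name makeBindAddress is hypothetical and is not part of IfaceMgr; the rule it applies is my reading of the behaviour described above, not a copy of the merged code.

```cpp
#include <arpa/inet.h>
#include <netinet/in.h>
#include <cstdint>
#include <cstring>
#include <stdexcept>
#include <string>

// Hypothetical helper (not part of IfaceMgr): builds the sockaddr_in6
// that would be passed to bind() for the given address and port.
sockaddr_in6 makeBindAddress(const std::string& text, uint16_t port,
                             unsigned int ifindex) {
    sockaddr_in6 sa;
    std::memset(&sa, 0, sizeof(sa));
    sa.sin6_family = AF_INET6;
    sa.sin6_port = htons(port);
    if (inet_pton(AF_INET6, text.c_str(), &sa.sin6_addr) != 1) {
        throw std::invalid_argument("not an IPv6 address: " + text);
    }
    // Assumption: only link-local scoped addresses carry the interface
    // index. For a global unicast address the scope id stays 0, which is
    // what FreeBSD requires for bind() to succeed.
    if (IN6_IS_ADDR_LINKLOCAL(&sa.sin6_addr) ||
        IN6_IS_ADDR_MC_LINKLOCAL(&sa.sin6_addr)) {
        sa.sin6_scope_id = ifindex;
    }
    return sa;
}
```

With this rule, an address such as fe80::1 on interface index 2 gets scope id 2, while 3000::1 keeps scope id 0 regardless of the interface.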

The rest of this description is about the second issue.

In order to receive multicast traffic on Linux, we had Linux-specific code that created two sockets:

  • one bound to link-local address - used to send responses from the server
  • second bound to ff02::1:2 multicast address to receive multicast packets

The first socket was joined to the multicast group, but that didn't really help with receiving multicast traffic. The multicast traffic was received because there was a second socket bound to ff02::1:2.

On BSD, only one socket was opened, on the link-local address, and it was joined to the multicast group. Joining a socket to the multicast group is not enough to receive multicast traffic, because binding the socket to a link-local address already seems to make the kernel filter out packets sent to anything other than that link-local address. All the documentation available to me pointed out that the socket has to be bound to the port only (to the "any" address) if it is to receive multicast traffic by joining the multicast group.

I tested it on:

  • Debian,
  • FreeBSD10
  • NetBSD6.1
  • OpenBSD5.5

and it worked fine.

However, this approach has an obvious drawback. If we bind the socket to the "any" address on an interface, we receive all traffic sent to this interface on the DHCP port, including unicasts. This is not the case when we open two sockets and bind one of them to ff02::1:2 (as we currently do on Linux). Therefore I tried to implement the same mechanism on BSD. It seems to work on FreeBSD (FreeBSD is fine with binding to a multicast address), but unfortunately it doesn't work on NetBSD. I didn't try OpenBSD in that respect, but I guess it wouldn't work either.

So, if we want portability we should probably stick to one socket bound to in6addr_any and joining multicast group because it seems to work on every OS (even including Solaris - based on some docs I read). Note that isc-dhcp does that very same thing - it binds to in6addr_any!
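
For reference, a minimal sketch of the single-socket approach using plain POSIX calls: bind to in6addr_any on the DHCP port and join ff02::1:2 on one interface. Both function names are hypothetical, and this is not the packet filter code from the branch, which also sets further socket options.

```cpp
#include <arpa/inet.h>
#include <net/if.h>
#include <netinet/in.h>
#include <sys/socket.h>
#include <unistd.h>
#include <cassert>
#include <cstdint>
#include <cstring>

// Hypothetical helper: membership request for ff02::1:2 on one interface.
ipv6_mreq makeDhcpServersMreq(unsigned int ifindex) {
    ipv6_mreq mreq;
    std::memset(&mreq, 0, sizeof(mreq));
    const int rc = inet_pton(AF_INET6, "ff02::1:2", &mreq.ipv6mr_multiaddr);
    assert(rc == 1);  // the literal above is a valid IPv6 address
    (void)rc;
    mreq.ipv6mr_interface = ifindex;
    return mreq;
}

// Hypothetical helper: returns a socket descriptor, or -1 on any failure.
int openAnyAddressMulticastSocket(const char* ifname, uint16_t port) {
    const unsigned int ifindex = if_nametoindex(ifname);
    if (ifindex == 0) {
        return -1;  // no such interface
    }
    const int fd = socket(AF_INET6, SOCK_DGRAM, IPPROTO_UDP);
    if (fd < 0) {
        return -1;
    }
    // Bind to the port only, i.e. to the unspecified ("any") address.
    sockaddr_in6 sa;
    std::memset(&sa, 0, sizeof(sa));
    sa.sin6_family = AF_INET6;
    sa.sin6_port = htons(port);
    sa.sin6_addr = in6addr_any;
    // Join the multicast group on the selected interface only.
    const ipv6_mreq mreq = makeDhcpServersMreq(ifindex);
    if (bind(fd, reinterpret_cast<const sockaddr*>(&sa), sizeof(sa)) < 0 ||
        setsockopt(fd, IPPROTO_IPV6, IPV6_JOIN_GROUP, &mreq, sizeof(mreq)) < 0) {
        close(fd);
        return -1;
    }
    return fd;
}
```

Binding to port 547 normally requires elevated privileges, so an unprivileged experiment would need a high port number instead.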

The problem with receiving unicast packets on the socket bound to the "any" address has to be resolved. The most straightforward way is to filter out unicast packets received over a socket bound to the "any" address in the DHCP server (strictly speaking, in libdhcp++). This has some performance implications, but I don't think they are that big.
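
A rough sketch of such a filter, assuming the destination address of each packet is already known (for example from IPV6_PKTINFO ancillary data when IPV6_RECVPKTINFO is enabled on the socket). The function name shouldProcess is hypothetical; nothing like it exists in libdhcp++ yet.

```cpp
#include <arpa/inet.h>
#include <netinet/in.h>

// Hypothetical filter: decides whether a received packet should be
// handed to the server. bound_to_any tells whether the receiving socket
// is bound to the unspecified address; destination is the packet's
// destination address.
bool shouldProcess(bool bound_to_any, const in6_addr& destination) {
    if (!bound_to_any) {
        // Sockets bound to a specific address only see traffic sent to
        // that address, so nothing needs to be filtered.
        return true;
    }
    // On the "any" socket keep multicast traffic and drop unicasts.
    return IN6_IS_ADDR_MULTICAST(&destination);
}
```

The cost is one address check per received packet, which is why I don't expect the performance impact to be large.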

For the time being we are supporting Kea on Linux. So, I didn't want to change the Linux code. I used the approach with "any" address for BSDs and this is what has to be mostly reviewed for this ticket. I tested it on FreeBSD10, NetBSD6.1 and OpenBSD5.5.

We currently have a hybrid solution: one for BSD and a different one for Linux. That's fine, because we can use the BSD-specific code to test the solution and maybe one day use it for Linux too. I'd really like the IfaceMgr code to diverge as little as possible between supported systems. Otherwise, maintenance is a pain.

comment:9 Changed 6 years ago by marcin

Forgot to add a ChangeLog:

XXX.	[bug]		marcin
	DHCPv6 server can receive messages sent to ff02::1:2 multicast
	address on FreeBSD, NetBSD and OpenBSD. Also, fixed the bug
	whereby the DHCPv6 server failed to bind the socket to
	global unicast address on BSD systems due to invalid scope id
	setting.
	(Trac #3437, git abcd)

comment:10 Changed 6 years ago by tomek

  • Owner changed from UnAssigned to tomek

Changed 6 years ago by tomek

comment:11 follow-up: Changed 6 years ago by tomek

  • Add Hours to Ticket changed from 27 to 12
  • Owner changed from tomek to marcin
  • Total Hours changed from 27 to 39

Code compiled on OpenBSD 5.4 x64, passed make check. b10-dhcp6 was able to receive traffic from Linux client and configure it properly.

Code compiled on NetBSD 5.1 x64, but unit-tests (DHCPv4.sigterm) failed. However, Kea6 did build. b10-dhcp6 was able to receive traffic from Linux client and configure it properly.

Code on FreeBSD10 failed compilation. The exact error is below:

Making all in tests
Making all in util
Making all in .
  CXX      csv_file.lo
In file included from csv_file.cc:15:
In file included from ../../../src/lib/util/csv_file.h:19:
In file included from /usr/local/include/boost/lexical_cast.hpp:166:
/usr/local/include/boost/math/special_functions/fpclassify.hpp:98:29: error: unused parameter 't' [-Werror,-Wunused-parameter]
inline bool is_nan_helper(T t, const boost::true_type&)
                            ^
1 error generated.
*** Error code 1

Stop.
make[5]: stopped in /usr/home/thomson/devel/kea/src/lib/util

This is from the latest ports, downloaded using portsnap. This affects multiple directories. First, I tried to manually add -Wno-error to Makefile.am in src/lib/asiolink, dns, hooks/tests, log/compiler, log/tests and util, but eventually gave up and used ./configure --with-werror=no.

With that option, I managed to compile the code. Unit-tests passed. I was able to run the Kea6 server and provision a Linux client.

During my tests, I noticed that doc/examples/kea6/several-subnets.json, the sole example for Kea6, does not have a lease database selected. I know that this ticket is not about fixing examples, but can I ask you to add an appropriate lease-database section? That's a very small thing.

--- code review comments ---
pkt_filter_inet6.cc

PktFilterInet6::openSocket()
You forgot to remove std::cout in line 78.

iface_mgr_bsd.cc
When join_multicast is passed to IfaceMgr::openSocket6(), the
info about addr is discarded and :: is used instead. Is there any
information about the interface retained? I'm thinking about a
case where there are two interfaces, e.g. em1 and em2 and the server
is configured to listen only on em1. Will this change cause the server
to listen on em2?


When running on FreeBSD 10, I got the following warnings in the log file:

2014-06-30 18:39:18.141 WARN  [kea.dhcp6/19104] DHCP6_OPEN_SOCKET_FAIL failed to create socket: Failed to open link-local socket on  interface em1: Failed to bind socket 5 to ::/port=547: No error: 0
48
2014-06-30 18:39:18.141 WARN  [kea.dhcp6/19104] DHCP6_OPEN_SOCKET_FAIL failed to create socket: Failed to open link-local socket on  interface em2: Failed to bind socket 5 to ::/port=547: No error: 0

My config has interfaces: [ "em1" ], so I have 2 questions:

  1. Why is em2 mentioned at all?
  2. What does "No error: 0" mean? If "No" is short for "Number", it should be changed to "Error code". If it means "No error occurred", why is it logged as a warning?

Your proposed ChangeLog entry is almost ok, but I would start it with something like "DHCPv6 component is now usable on BSD systems."


Sorry it took so long, but I don't use my BSD vms too frequently, and they required some non-trivial maintenance. Also, I had to set up FreeBSD10.0 from scratch.

comment:12 in reply to: ↑ 11 Changed 6 years ago by marcin

  • Owner changed from marcin to tomek

Replying to tomek:

Code compiled on OpenBSD 5.4 x64, passed make check. b10-dhcp6 was able to receive traffic from Linux client and configure it properly.

Code compiled on NetBSD 5.1 x64, but unit-tests (DHCPv4.sigterm) failed. However, Kea6 did build. b10-dhcp6 was able to receive traffic from Linux client and configure it properly.

This failure is not relevant to this ticket so I am not going to fix it here. The code that this branch is based on is actually older than the code on master branch. Later on I fixed a couple of timing issues in the keactrl tests and b10-dhcp4/6 tests and I am pretty confident that they are fine on master branch.

If you have time, can you please rerun this test on your NetBSD box from the master branch?

Code on FreeBSD10 failed compilation. The exact error is below:

Making all in tests
Making all in util
Making all in .
  CXX      csv_file.lo
In file included from csv_file.cc:15:
In file included from ../../../src/lib/util/csv_file.h:19:
In file included from /usr/local/include/boost/lexical_cast.hpp:166:
/usr/local/include/boost/math/special_functions/fpclassify.hpp:98:29: error: unused parameter 't' [-Werror,-Wunused-parameter]
inline bool is_nan_helper(T t, const boost::true_type&)
                            ^
1 error generated.
*** Error code 1

Stop.
make[5]: stopped in /usr/home/thomson/devel/kea/src/lib/util

This is from the latest ports, downloaded using portsnap. This affects multiple directories. First, I tried to manually add -Wno-error to Makefile.am in src/lib/asiolink, dns, hooks/tests, log/compiler, log/tests and util, but eventually gave up and used ./configure --with-werror=no.

With that option, I managed to compile the code. Unit-tests passed. I was able to run the Kea6 server and provision a Linux client.

This is a known problem and I am using the --without-error option to overcome it on my FreeBSD10 system. Also, since it is not relevant to this work, I will not try to address it here.

During my tests, I noticed that doc/examples/kea6/several-subnets.json, the sole example for Kea6, does not have a lease database selected. I know that this ticket is not about fixing examples, but can I ask you to add an appropriate lease-database section? That's a very small thing.

Ok, fixed.

--- code review comments ---
pkt_filter_inet6.cc

PktFilterInet6::openSocket()
You forgot to remove std::cout in line 78.

Thanks. Removed.

iface_mgr_bsd.cc
When join_multicast is passed to IfaceMgr::openSocket6(), the
info about addr is discarded and :: is used instead. Is there any
information about the interface retained? I'm thinking about a
case where there are two interfaces, e.g. em1 and em2 and the server
is configured to listen only on em1. Will this change cause the server
to listen on em2?

Yes, the socket is bound to the unspecified address but the socket being opened is added to the specific interface:

int
IfaceMgr::openSocket6(Iface& iface, const IOAddress& addr, uint16_t port,
                      const bool join_multicast) {
    // On BSD, we bind the socket to in6addr_any and join multicast group
    // to receive multicast traffic. So, if the multicast is requested,
    // replace the address specified by the caller with the "unspecified"
    // address.
    IOAddress actual_address = join_multicast ? IOAddress("::") : addr;
    SocketInfo info = packet_filter6_->openSocket(iface, actual_address, port,
                                                  join_multicast);
    iface.addSocket(info);
    return (info.sockfd_);
}


When running on FreeBSD 10, I got the following warnings in the log file:

2014-06-30 18:39:18.141 WARN  [kea.dhcp6/19104] DHCP6_OPEN_SOCKET_FAIL failed to create socket: Failed to open link-local socket on  interface em1: Failed to bind socket 5 to ::/port=547: No error: 0
48
2014-06-30 18:39:18.141 WARN  [kea.dhcp6/19104] DHCP6_OPEN_SOCKET_FAIL failed to create socket: Failed to open link-local socket on  interface em2: Failed to bind socket 5 to ::/port=547: No error: 0

My config has interfaces: [ "em1" ], so I have 2 questions:

  1. Why is em2 mentioned at all?

The Dhcpv6Srv constructor opens sockets on all available interfaces before the server is configured. When the configuration is parsed, it closes the open sockets and reopens those that were configured to listen for DHCP packets. I think this is unnecessary, so I removed the code from the constructor. You should no longer see an attempt to open a socket on an interface that is not specified in the configuration file.

  2. What does "No error: 0" mean? If "No" is short for "Number", it should be changed to "Error code". If it means "No error occurred", why is it logged as a warning?

I fixed this in the IfaceMgr. The problem was that when the bind operation failed the close() function was invoked to close other sockets. The close() function succeeded and reset the errno to 0. The updated code collects the error information before calling close() when the errno is still set to the actual error value.
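
The idea, as a minimal sketch: save errno right after the failed bind() and only then clean up. The function name bindOrClose is hypothetical, and for brevity it closes the socket that failed to bind rather than other sockets; the actual IfaceMgr code differs.

```cpp
#include <sys/socket.h>
#include <unistd.h>
#include <cerrno>

// Hypothetical helper: returns 0 on success, otherwise the errno value
// set by the failed bind(). The socket is closed on failure.
int bindOrClose(int fd, const sockaddr* addr, socklen_t len) {
    if (bind(fd, addr, len) == 0) {
        return 0;
    }
    // Capture the error first: close() is allowed to change errno, so
    // reading errno after the cleanup may report the wrong value.
    const int bind_errno = errno;
    close(fd);
    return bind_errno;
}
```

The returned value can then be passed to strerror() when building the log message, so the warning reports the real reason for the bind failure.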


Your proposed ChangeLog entry is almost ok, but I would start it with something like "DHCPv6 component is now usable on BSD systems."

So something like this:

XXX.	[bug]		marcin
	DHCPv6 server is now usable on FreeBSD, NetBSD and OpenBSD systems.
        It can receive messages sent to ff02::1:2 multicast address. Also,
        fixed the bug whereby the DHCPv6 server failed to bind the socket to
	global unicast address on BSD systems due to invalid scope id
	setting.
	(Trac #3437, git abcd)

?


Sorry it took so long, but I don't use my BSD vms too frequently, and they required some non-trivial maintenance. Also, I had to set up FreeBSD10.0 from scratch.

Thanks for doing that.

comment:13 Changed 6 years ago by marcin

  • Add Hours to Ticket changed from 12 to 4
  • Total Hours changed from 39 to 43

comment:14 Changed 6 years ago by tomek

  • Add Hours to Ticket changed from 4 to 1
  • Owner changed from tomek to marcin
  • Total Hours changed from 43 to 44

Thanks a lot for cleaning up the opening sockets code. I think it was a leftover from times where we didn't have a configuration at all, so the code was trying to listen on all interfaces.

Your changes look fine. I checked that the updated code builds and unit-tests pass on NetBSD (except DHCPv4.sigterm and DHCPv6.sigint), FreeBSD and OpenBSD. As those signal failures are not related to this ticket, the code is ready for merge.

comment:15 Changed 6 years ago by marcin

  • Resolution set to fixed
  • Status changed from reviewing to closed

Merged with commit f4c2fe2fc37a37f1510e138e1f6c4ccd757e1f06
