Opened 10 years ago

Closed 9 years ago

#183 closed defect (fixed)

UNIX domain socket for msgq

Reported by: larissas Owned by: jelte
Priority: medium Milestone: 04. 2nd Incremental Release: Early Adopters
Component: message-library Version:
Keywords: Cc:
CVSS Scoring: Parent Tickets:
Sensitive: no Defect Severity:
Sub-Project: Feature Depending on Ticket:
Estimated Difficulty: Add Hours to Ticket:
Total Hours: Internal?:

Description

(task 39 - 1 week estimate)

Subtickets

Attachments (2)

session.diff (1.2 KB) - added by jinmei 9 years ago.
session-defarg.diff (3.6 KB) - added by jinmei 9 years ago.

Download all attachments as: .zip

Change History (21)

comment:1 Changed 10 years ago by jelte

  • Status changed from new to accepted

just as a note, msgq tests committed in r1900, main part of the code committed in r1901

i had to change lib/python/cc/session.py to lib/python/cc/session.py.in and add a file lib/cc/session_config.h.pre.in for the configure-time defaults. Methinks we should be able to come up with a more direct way to pass stuff like this...

two open items right now:

  • run_X scripts should set a local file in BIND10_MSGQ_SOCKET_FILE (so if installed dirs don't exist run from source fails right now)
  • the c++ boost::asio version has no 'manual override', only environment variable or default. (which is better than what it was before, only hardcoded port 9912), fixing this would mean an API change somewhere

comment:2 Changed 9 years ago by jelte

  • Owner changed from jelte to UnAssigned
  • Status changed from accepted to reviewing

Ok, addressed those two issues as well, I think this is ready for review. (r1913, r1916 and r1917 are the other relevant revisions, full diff from branch point is svn diff -r 1899:HEAD)

comment:3 Changed 9 years ago by jreed

FWIW: I ran this and it worked for me. I think the socket file maybe should be located under @localstatedir@/run/@PACKAGE@/.

comment:4 Changed 9 years ago by jreed

Note that after this okayed, I can update the documentation accordingly.

comment:5 follow-up: Changed 9 years ago by jinmei

failed to compile on my machine:

session.cc: In member function ‘virtual void isc::cc::SocketSession::establish(const char*)’:
session.cc:226: error: aggregate ‘isc::cc::sockaddr_un sun’ has incomplete type and cannot be defined

comment:6 Changed 9 years ago by jinmei

  • Owner changed from UnAssigned to jinmei

comment:7 in reply to: ↑ 5 Changed 9 years ago by jinmei

Replying to jinmei:

failed to compile on my machine:

session.cc: In member function ‘virtual void isc::cc::SocketSession::establish(const char*)’:
session.cc:226: error: aggregate ‘isc::cc::sockaddr_un sun’ has incomplete type and cannot be defined

I needed to include <sys/un.h>. This is so trivial (unless we need to autoconfigure the existence of it), so I directly fixed it on the branch. r1922.

comment:8 Changed 9 years ago by jinmei

A meta level comment: don't we have to worry about portability of UNIX domain sockets? I actually don't remember why we chose to change the transport from TCP to UNIX domain (to which I don't necessarily object), but that won't work for Windows (do it?), and there may be other UNIX-like systems that cause portability problems with UNIX domain sockets.

Specific comments:

  • about r1900
    • why add the 'pass' to the end of some tests?
    • test_open_socket_default() I think if you delete BIND10_MSGQ_SOCKET_FILE from os.environ you should recover it within the test. Otherwise this test case overrides a global state of the test process and may affect other test results.
  • about r1901 (bind10.py.in): is this necessary? __init__ did the same thing, and self.msgq_socket_file doesn't seem to be modified after that.
        def startup(self):
            if self.msgq_socket_file is not None:
                 c_channel_env["BIND10_MSGQ_SOCKET_FILE"] = self.msgq_socket_file 
    
    Also, the assumption of the self.msgq_socket_file type is inconsistent with __init__:
            if self.msgq_socket_file is not None:
                os.environ['BIND10_MSGQ_SOCKET_FILE'] = str(self.msgq_socket_file)
    
    That is, in __init__ self.msgq_socket_file is assumed to be (possibly) non string.
  • msgq.py.in.setup_listener Why are the following lines commented out? Clearly we don't need bind('127.0.0.1'), and I also don't think we need SO_REUSEADDR (it's only necessary for dealing with TCP's TIME_WAIT periods after closing a socket at the server side). So I basically we can simply purge these lines. If there's a reason to keep them as comments, please also explain that as comments.
        #self.listen_socket.setsockopt(socket.SOL_SOCKET, socket.SO_REUSEADDR, 1)
        #self.listen_socket.bind(("127.0.0.1", self.c_channel_port))
  • lib/cc/session.cc:SocketSession::establish() shouldn't we need something equvalent to this?
    #ifdef HAVE_SIN_LEN
        sin.sin_len = sizeof(struct sockaddr_in); 
    #endif
    
    BSDs have sun_len member in sockaddr_un{}.
  • lib/cc/session.cc:SocketSession::establish() I don't think it's a good practice to blindly null-terminate sun_path:
        sun.sun_family = AF_UNIX;
        strncpy(sun.sun_path, socket_file, sizeof(sun.sun_path) - 1);
        sun.sun_path[sizeof(sun.sun_path) - 1] = '\0';
    
    IMO if socket_file is too long we should explicitly fail (for example, the auto-truncated file may exist, then something surprising will happen). In any case please write a test to cover this case.
  • lib/python/isc/cc/session.py.in:Session.__init__ (not specific to this patch, but) shouldn't we close the socket in these cases?
                if not env:
                    raise ProtocolError("Could not get local name")
                self._lname = msg["lname"]
                if not self._lname:
                    raise ProtocolError("Could not get local name")
            except socket.error as se:
                    raise SessionError(se)
    
    If so, please also write a test case to cover the bug (maybe as a separate ticket).
  • about r1916 Do we have a case that we should use the default parameter of "socket_file"? In general, I think we should avoid the use of default parameters unless there's a strong reason for that (because such a implicit default behavior often misleads the code reader, while it's sometimes convenient).

comment:9 Changed 9 years ago by jinmei

  • Owner changed from jinmei to jelte

comment:10 follow-ups: Changed 9 years ago by jelte

  • Owner changed from jelte to jinmei

Ok i have addressed almost all of your issues, except two, which I'm not sure about, see below. But first some comments about what i did in r1933;

  • Those pass statements shouldn't have been there, nor should the commented-out lines, so i removed them.
  • I have added a session_unittest file, which right now *only* tests for that big filename exception (which is now thrown instead of cutting the file), i think we should make a seperate task for writing tests for this (since there didn't appear to be any, and most of the other tests are imho out of scope for this ticket).
  • Re-add the environment variable again (and added a try-catch block to the tests, if prefix/install dirs aren't available the test would have failed

For the not-larger-than-sun_path check, do you think we should make the test so that it checks for the boundary (which differs per system) or leave it like this, at something that is certainly too big?

And the two other things;

  • SIN/SUN_LEN, I could add it, i was looking at configure and i think the way it is in trunk doesn't work (is SIN_LEN defined on your system? configure checks for another field than the one used in that ifdef statement...), if it is needed i can make a check for .sun_len and set it
  • Do you propose to not make a default value and explicitly use establish(NULL) in auth? (well actually, we should probably add a -m <path> option to auth in the first place. I left in the default value so as not to break compatibility, and changing the interface was hard enough as it was, with boost.asio and the pimpl idiom there :p

But I don't have a strong opinion, and have no problem removing the = NULL there.

comment:11 in reply to: ↑ 10 ; follow-up: Changed 9 years ago by jinmei

Replying to jelte:

Ok i have addressed almost all of your issues, except two, which I'm not sure about, see below. But first some comments about what i did in r1933;

  • Those pass statements shouldn't have been there, nor should the commented-out lines, so i removed them.
  • I have added a session_unittest file, which right now *only* tests for that big filename exception (which is now thrown instead of cutting the file), i think we should make a seperate task for writing tests for this (since there didn't appear to be any, and most of the other tests are imho out of scope for this ticket).
  • Re-add the environment variable again (and added a try-catch block to the tests, if prefix/install dirs aren't available the test would have failed

So far, Looks okay (modulo r1940).

For the not-larger-than-sun_path check, do you think we should make the test so that it checks for the boundary (which differs per system) or leave it like this, at something that is certainly too big?

I'd check the boundary, both the okay and NG cases (although I can live with "reasonably too long" a name if that's difficult). As for the per-system difference, can't we use sizeof(sockaddr_un.sun_path)?

now looking at "the other two"...

Changed 9 years ago by jinmei

comment:12 in reply to: ↑ 11 Changed 9 years ago by jinmei

Replying to jinmei:

Replying to jelte:

Ok i have addressed almost all of your issues, except two, which I'm not sure about, see below. But first some comments about what i did in r1933;

  • Those pass statements shouldn't have been there, nor should the commented-out lines, so i removed them.
  • I have added a session_unittest file, which right now *only* tests for that big filename exception (which is now thrown instead of cutting the file), i think we should make a seperate task for writing tests for this (since there didn't appear to be any, and most of the other tests are imho out of scope for this ticket).
  • Re-add the environment variable again (and added a try-catch block to the tests, if prefix/install dirs aren't available the test would have failed

So far, Looks okay (modulo r1940).

Actually, it wasn't okay:-)

This code is not exception safe:

    if (strlen(socket_file) >= sizeof(sun.sun_path)) {
        isc_throw(SessionError, "Unable to connect to message queue; socket file path too long");
    }

We could close it, but I think it's better to validate the input even before opening the socket. And if we do so, I'd also move to the other initialization part of sockaddr_sun to the top, so that the reader can understand the set of validation-then-initialization more easily.

And for that matter,

  • I'd fold the isc_throw line (I know it's a personal preference, but I believe it's a widely adopted style to keep lines short for readability)
  • I'd include the offending file name in the exception message (although it may not be a good idea in this case because it might be absurdly long - I'd leave the decision to you)

Applying all of these, it would be the attached patch.

comment:13 in reply to: ↑ 10 ; follow-up: Changed 9 years ago by jinmei

Replying to jelte:

And the two other things;

  • SIN/SUN_LEN, I could add it, i was looking at configure and i think the way it is in trunk doesn't work (is SIN_LEN defined on your system? configure checks for another field than the one used in that ifdef statement...), if it is needed i can make a check for .sun_len and set it

I've looked at configure.ac. I think what we should do is to just test the (non)existence of sa_len and apply the result regardless of the specific AF (AF_INET, AF_INET6, AF_UNIX, or even AF_INET8). We could borrow the check of BIND 9, for example:

AC_MSG_CHECKING(for sa_len in struct sockaddr)
AC_TRY_COMPILE([
#include <sys/types.h>
#include <sys/socket.h>],
[struct sockaddr sa; sa.sa_len = 0; return (0);],
	[AC_MSG_RESULT(yes)
	ISC_PLATFORM_HAVESALEN="#define ISC_PLATFORM_HAVESALEN 1"
	LWRES_PLATFORM_HAVESALEN="#define LWRES_PLATFORM_HAVESALEN 1"],
	[AC_MSG_RESULT(no)
	ISC_PLATFORM_HAVESALEN="#undef ISC_PLATFORM_HAVESALEN"
	LWRES_PLATFORM_HAVESALEN="#undef LWRES_PLATFORM_HAVESALEN"])
AC_SUBST(ISC_PLATFORM_HAVESALEN)

In fact, the kernel assume the same organization for the first several fields of sockaddr_xxx, so it should be safe.

(but if we complete the migration to ASIO, we won't have to be bothered with such a low level portability issues anyway...)

  • Do you propose to not make a default value and explicitly use establish(NULL) in auth? (well actually, we should probably add a -m <path> option to auth in the first place. I left in the default value so as not to break compatibility, and changing the interface was hard enough as it was, with boost.asio and the pimpl idiom there :p

I'm not sure how pimpl is related to this issue, but anyway, after looking at the code more closely, I've noticed we can (and IMO should) cnetralize configuration of socket_file in Session::establish(), not in the derived implementation classes. Right each derived class has now the following duplicate code fragment:

    if (socket_file != NULL) {
        socket_file = getenv("BIND10_MSGQ_SOCKET_FILE");
    }
    if (socket_file != NULL) {
        socket_file = BIND10_MSGQ_SOCKET_FILE;
    }

This is a common pattern where we need refactoring.

As an additional bonus, by doing so we can ensure that we pass non NULL constant string to derived implementation classes, so we can use a const reference (then the reader don't have to worry about the possible case it's accidentally null).

I'm attaching a suggested patch.

Changed 9 years ago by jinmei

comment:14 in reply to: ↑ 13 Changed 9 years ago by jinmei

  • Owner changed from jinmei to jelte

And one final small thing: we should catch references by reference:

    } catch (boost::system::system_error se) {
        isc_throw(SessionError, se.what());
    }

I believe it's non controversial and trivial, so I've directly fixed it in the branch (r1943).

Giving the ticket back to Jelte.

comment:15 follow-up: Changed 9 years ago by jelte

applied your patches (with a minor cleanup) in r1949. Also changed IPPROTO_TCP to 0, it fails to setup the socket on my system with the macro. Again, I expect this part to be removed once we have non-boost ASIO.

So that only leaves SUN_LEN, right?

comment:16 Changed 9 years ago by jelte

looks good, tried it and it no longer fails for me.

One note, I see you added a constant CONFIG_MODULE_NAME (good! :) ), there is still one place where "config" is directly used, in the creation of the fake module in bindctl-source.py(.in)

comment:17 Changed 9 years ago by jelte

ignore last comment, that one was supposed to go to ticket #201

comment:18 in reply to: ↑ 15 Changed 9 years ago by jinmei

Replying to jelte:

applied your patches (with a minor cleanup) in r1949. Also changed IPPROTO_TCP to 0, it fails to setup the socket on my system with the macro.

Changing 0 looks correct (I should have noticed it in my review). I don't understand, however, why the problem is uncovered with my patch; IPPROTO_TCP was used in your original branch code...

Again, I expect this part to be removed once we have non-boost ASIO.

Right.

So that only leaves SUN_LEN, right?

I think so.

comment:19 Changed 9 years ago by jelte

  • Resolution set to fixed
  • Status changed from reviewing to closed

Re-added sa_len configure check. Merged to trunk in r2009 and r2010.

Removal of custom socket code left to a later ticket.

Closing this one.

Note: See TracTickets for help on using tickets.