Opened 8 years ago

Closed 8 years ago

#1454 closed task (complete)

Pass UPDATE packets from b10-auth to DDNS module

Reported by: jelte Owned by: vorner
Priority: low Milestone: Sprint-20120207
Component: ddns Version:
Keywords: Cc:
CVSS Scoring: Parent Tickets:
Sensitive: no Defect Severity: N/A
Sub-Project: DNS Feature Depending on Ticket: DDNS
Estimated Difficulty: 3 Add Hours to Ticket: 0
Total Hours: 12.88 Internal?: no

Description

(again, barring a general receptionist or receptionist-like framework), we need to be able to pass on DDNS UPDATE packets from b10-auth to the ddns module from #1451.

For UDP support, this also depends on #1453.

If there is extra code involved that is similar to the one used in the xfr's, we should put this in a shared python module somewhere.

Subtickets

Change History (21)

comment:1 Changed 8 years ago by jelte

  • Estimated Difficulty changed from 0 to 3

comment:2 Changed 8 years ago by jinmei

  • Milestone changed from Year 3 Task Backlog to Next-Sprint-Proposed

comment:3 Changed 8 years ago by jelte

  • Milestone changed from Next-Sprint-Proposed to Sprint-20120110

comment:4 Changed 8 years ago by jelte

  • Priority changed from major to minor

comment:5 Changed 8 years ago by vorner

  • Owner set to vorner
  • Status changed from new to accepted

comment:6 Changed 8 years ago by vorner

  • Owner changed from vorner to UnAssigned
  • Status changed from accepted to reviewing

Hello

This branch contains the part in DDNS, the one that creates the listening unix socket, accepts connections on it and uses the SocketSessionReceiver? to receive incoming requests. I'd like this to be reviewed now, separately from the other part in b10-auth (which I put into ticket #1539), as they are mostly independent and because further tickets will need to continue on top of this (as this one creates the handle_request method, where the packets should be handled).

Currently, it is built around select in a single thread, mostly for simplicity, avoiding all the hassle with synchronisation. If we need more performance, we'll need to introduce threads on some level (but I'd prefer having worker threads, receive the requests in single thread like now and put them into a workqueue in the handle_request). But that'd be some time in future.

I don't think this needs a Changelog entry, because it does nothing visible, there is no other side of this yet and it has even no public API.

comment:7 Changed 8 years ago by jinmei

  • Owner changed from UnAssigned to jinmei

comment:8 follow-up: Changed 8 years ago by jinmei

It generally looks good. I have some miscellaneous comments, most of
which are minor.

ddns.py

  • not necessarily objecting, but is there any reason for using a different term than "TRACE_BASIC"?
    DBG_ACTIVITY = logger.DBGLVL_TRACE_BASIC
    
    I think it's generally better to use the same terminology for overall consistency unless there's a reason for not doing so.
  • I'd say "messages (or requests)" instead of "packets":
            # List of the sessions where we get the packets
            self._socket_sessions = {}
    
    as "packet" is not technically correct in the case of TCP.
  • "socket sessions" could be misleading because they are actually "session receivers" ("sessions" are things popped out from each receiver):
            self._socket_sessions = {}
    
    I'd name it something like "_socksession_receivers". Same comment applies to some other "session" variables used in the main code and test (I'd rename them to something like "receiver").
  • I think it's more helpful if we log the "from address"
            socket = self._listen_socket.accept()
            fileno = socket.fileno()
            logger.debug(DBG_ACTIVITY, DDNS_NEW_CONN, fileno)
    
    instead of, or in addition to, the file descriptor. I also don't see the need for the temporary "fileno" variable.
  • handle_request: I don't understand the first paragraph of pydoc:
            This is called whenever a new DDNS request comes. Here the magic
            happens, the rest is either subroutines of the magic or the accounting
            around it that calls it from time to time.
    
    what's the "rest"? what's "accounting around it"? What's it in "that calls it"?
  • handle_request: I'd remove eg. (or replace it with "i.e.") from the second paragraph of pydoc:
            It is called with the request being session as received from
            SocketSessionReceiver, eg. tuple
            (socket, local_address, remote_address, data).
    
  • handle_incoming: it seems to me "incoming" is a bit broad. maybe something like "handle_session" or "handle_new_session" is better. or, since this should always be a DDNS request message in practice (except for some unexpected or buggy cases), we may name this "accept_request" (so it can be named differently from "handle_request"). Same sense of comment applies to "DDNS_INCOMING".
  • the temporary "request" variable doesn't seem to be necessary:
                request = session.pop()
                self.handle_request(request)
    
    Or, we could separate these two and move the second line to an "else" block (which should be added)
  • I'd move logging after close() so that even if logging raises an exception it at least completes the cleanup:
                logger.warn(DDNS_DROP_CONN, fileno, se)
                del self._socket_sessions[fileno]
                socket.close()
    
  • run: it seems that the comment about the exception at the beginning of the while loop is losing its sense. It was originally for this specific line:
                        self._cc.check_command(True)
    
    and I guess it meant exceptions from check_command(). Now it's unclear due to the distance between the comment and the line, and even if we move the comment I'm not sure if it makes much sense because we now have more code in the loop.
  • run: is there any reason we don't handle exceptions?
                # TODO: Catch select exceptions, like EINTR
    
    Also, what if self.accept() or self.handle_incoming() raises an exception?

ddns_messages.mes

  • If you didn't you may want to "normalize" the file using tools/reorder_message_file.py
  • DDNS_NEW_CONN: I'd clarify (in the detailed description) that it's a connection on a UNIX domain socket, and should basically be from a b10-auth instance.

ddns_test.py

  • I'd name FakeSession FakeSessionReceiver as (in the underlying API) "session" means a different notion.
  • TestDDNSServer: self.cc_session should better be "private"?
  • test_accept: I'd add a comment about where the magic number of '3' comes from:
            # Now the new socket session receiver is stored in the dict
            self.assertEqual([3], list(self.ddns_server._socket_sessions.keys()))
    
    e.g. "_listen_socket is FakeSocket?(2), whose accept() will return another FakeSocket? with a fake fileno of 3 (2 + 1)"

comment:9 Changed 8 years ago by jinmei

  • Owner changed from jinmei to vorner

comment:10 Changed 8 years ago by jinmei

I just realized I overlooked one important point(s):

  • DDNSServer._listen_socket isn't set in the production (i.e. non test) class
  • More important, I think this ticket should also cover creating the listening UNIX domain socket and actually listening on it

comment:11 in reply to: ↑ 8 ; follow-up: Changed 8 years ago by vorner

  • Owner changed from vorner to jinmei

Hello

Replying to jinmei:

  • I think it's more helpful if we log the "from address"
            socket = self._listen_socket.accept()
            fileno = socket.fileno()
            logger.debug(DBG_ACTIVITY, DDNS_NEW_CONN, fileno)
    
    instead of, or in addition to, the file descriptor. I also don't see the need for the temporary "fileno" variable.

I don't know if the from address is any more helpful than the fileno, as it is just some random path, but it is true there's no reason not to log it.

  • handle_request: I don't understand the first paragraph of pydoc:
            This is called whenever a new DDNS request comes. Here the magic
            happens, the rest is either subroutines of the magic or the accounting
            around it that calls it from time to time.
    
    what's the "rest"? what's "accounting around it"? What's it in "that calls it"?

Maybe this version is slightly better. The description could also change when it is actually implemented.

  • run: it seems that the comment about the exception at the beginning of the while loop is losing its sense. It was originally for this specific line:
                        self._cc.check_command(True)
    
    and I guess it meant exceptions from check_command(). Now it's unclear due to the distance between the comment and the line, and even if we move the comment I'm not sure if it makes much sense because we now have more code in the loop.

I understood the comment to mean like if there's bug/problem parsing the packet, or something like that, we might want to drop the packet, log the exception, but continue handling the other packets that are coming in. In that sense, it is still valid. If it was for check_command only, shouldn't we rather catch it inside the function and return a negative/error answer instead of falling through the cc session code?

  • run: is there any reason we don't handle exceptions?
                # TODO: Catch select exceptions, like EINTR
    

This one needs EINTR ignoring, right.

Also, what if self.accept() or self.handle_incoming() raises an
exception?

Well, accept should not raise EINTR, since the connection is already waiting there at the time, and others would be fatal. If these would be caught, it would be in the place where the above comment is. Similar for handle_incoming. Or do you have an idea what to do with such an exception?

Also, the listen socket was actually forgotten, not left out intentionally, so I added it.

comment:12 in reply to: ↑ 11 ; follow-up: Changed 8 years ago by jinmei

Replying to vorner:

  • run: it seems that the comment about the exception at the beginning of the while loop is losing its sense. It was originally for this specific line:
                        self._cc.check_command(True)
    
    and I guess it meant exceptions from check_command(). Now it's unclear due to the distance between the comment and the line, and even if we move the comment I'm not sure if it makes much sense because we now have more code in the loop.

I understood the comment to mean like if there's bug/problem parsing the packet, or something like that, we might want to drop the packet, log the exception, but continue handling the other packets that are coming in. In that sense, it is still valid.

On re-reading it, the comment now doesn't look that awkward. Since it
took some time already I don't quite remember why I first saw it
awkward (maybe "here" seemed to indicate a small fragment of code),
but it doesn't seem to be a big issue to address.

That said, the comment is now actually not very accurate because the
code in the while loop now catches one exception. So I'd suggest
revise the comment to something like:

            # In this event loop we propage most of exceptions, which will
            # subsequently kill the b10-ddns process, but ideally it would be
            # better to catch any exceptions that b10-ddns can recover from.
            # We currently have no exception hierarchy to make such a
            # distinction easily, but once we do, we should catch and handle
            # non fatal exceptions here and continue the process.

If it was for check_command only, shouldn't we rather catch it inside the function and return a negative/error answer instead of falling through the cc session code?

check_command() is in a different module so it's not that trivial
whether we can safely modify it for the convenience of ddns.

Also, what if self.accept() or self.handle_incoming() raises an
exception?

Well, accept should not raise EINTR, since the connection is already waiting there at the time, and others would be fatal. If these would be caught, it would be in the place where the above comment is. Similar for handle_incoming. Or do you have an idea what to do with such an exception?

accept() involves the creation of SocketSessionReceiver?, which could
fail and result in an exception. I also think some error from the
accept system call such as ECONNABORTED or EMFILE could be handled
more gracefully because it's only for a single forwarder. In general,
I rather think it's okay to catch all BIND10-internal exceptions from
accept() and handle_session(), (and maybe also check_command()), but
as commented above it's currently not easily distinguished.

So, as a compromise it may be okay for now to let it die in that case,
but I'd at least explicitly log that event within the loop before
doing so.

I have a few more minor comments on the revised code:

  • handle_session: due to the change of the method name, it's probably better to pydoc too, e.g:
            Handle incoming session on the socket with given fileno.
    


  • Likewise, test_incoming_called would better be renamed
  • This comment doesn't seem to be addressed (even if only to reject it): I'd move logging after close() so that even if logging raises an exception it at least completes the cleanup:

comment:13 Changed 8 years ago by jinmei

  • Owner changed from jinmei to vorner

comment:14 in reply to: ↑ 12 ; follow-up: Changed 8 years ago by vorner

  • Owner changed from vorner to jinmei

Hello

Replying to jinmei:

That said, the comment is now actually not very accurate because the
code in the while loop now catches one exception. So I'd suggest
revise the comment to something like:

Yes, you're right, it got outdated, so I applied your version.

If it was for check_command only, shouldn't we rather catch it inside the function and return a negative/error answer instead of falling through the cc session code?

check_command() is in a different module so it's not that trivial
whether we can safely modify it for the convenience of ddns.

No, I meant catching them inside the command_handler and config_handler, so the exception would be converted to negative answer for the caller. If the check_command itself threw, assuming because it lost the connection to the CC, we really should terminate.

Also, what if self.accept() or self.handle_incoming() raises an
exception?

Well, accept should not raise EINTR, since the connection is already waiting there at the time, and others would be fatal. If these would be caught, it would be in the place where the above comment is. Similar for handle_incoming. Or do you have an idea what to do with such an exception?

accept() involves the creation of SocketSessionReceiver?, which could
fail and result in an exception. I also think some error from the
accept system call such as ECONNABORTED or EMFILE could be handled
more gracefully because it's only for a single forwarder. In general,
I rather think it's okay to catch all BIND10-internal exceptions from
accept() and handle_session(), (and maybe also check_command()), but
as commented above it's currently not easily distinguished.

Hmm, right, there might be some that would be recoverable. But I don't really think these would happen in practice enough to get worried about them (eg. ECONNABORTED could happen, but auth does not give up connecting now).

So, as a compromise it may be okay for now to let it die in that case,
but I'd at least explicitly log that event within the loop before
doing so.

There's bunch of log-exception calls in the main function (down in the file, outside of the class). I think these could be enough.

  • This comment doesn't seem to be addressed (even if only to reject it): I'd move logging after close() so that even if logging raises an exception it at least completes the cleanup:

I probably moved a different line previously. Anyway, this is not so critical with python, because it would lose the reference and it'd close it automatically.

comment:15 in reply to: ↑ 14 ; follow-up: Changed 8 years ago by jinmei

Replying to vorner:

accept() involves the creation of SocketSessionReceiver?, which could
fail and result in an exception. [...]

Hmm, right, there might be some that would be recoverable. But I don't really think these would happen in practice enough to get worried about them (eg. ECONNABORTED could happen, but auth does not give up connecting now).

In my experience, any things "that wouldn't happen in practice" will
happen in real production systems and will surprise us. As for
ECONNABORTED I can imagine an instance of b10-auth could crash
immediately after trying to establish a connection. But from what you
said the main difference in our views doesn't seem to be how likely
the specific exceptions can happen, but whether we'd expect having
recoverable exceptions in this context in general. I thought we
would, but you rather seem to assume all non recoverable errors should
be handled within the underlying subroutines (right now they are
check_command(), accept(), handle_session(), there will be more as we
extend the module). I'm not so sure if that's reasonable (for
example, this assumes check_command should only raise fatal
exceptions, but we cannot fully control the behavior in terms of
modularity), but I see that's one approach. In that case, however, we
should revise the comment at the beginning of the while loop: we need
to clarify the assumption on the subroutines.

So, as a compromise it may be okay for now to let it die in that case,
but I'd at least explicitly log that event within the loop before
doing so.

There's bunch of log-exception calls in the main function (down in the file, outside of the class). I think these could be enough.

I suspect this is another form of difference on the view point. I
thought we'd expect various different types of exceptions that we'd
like to catch here but will let them be propagated as a compromise.
So they would be caught in the generic "Exception" block at the top
level, which wouldn't be helpful very much because it's too generic.

If, on the other hand, we really assume there would only be fatal
exceptions, it makes sense to simply pass them through to the top
level where they are logged.

So, in summary, my revised suggestion is to match the comment and the
code assumption:

  • If we really expect not to see non fatal exceptions in the while loop, revise the comment accordingly and add note that the subroutines shouldn't propagate recoverable errors as exceptions

OR

  • If we propagate exceptions (that may or may not be fatal) as a compromise (as commented), catch them in the while loop and log that we catch them and will let the process die as a compromise.
  • This comment doesn't seem to be addressed (even if only to reject it): I'd move logging after close() so that even if logging raises an exception it at least completes the cleanup:

I probably moved a different line previously. Anyway, this is not so critical with python, because it would lose the reference and it'd close it automatically.

But in the original code self._socksession_receivers[fileno] would
stay. Anyway I saw the code was updated so I have no problem with the
latest code.

comment:16 Changed 8 years ago by jinmei

  • Owner changed from jinmei to vorner

comment:17 in reply to: ↑ 15 ; follow-up: Changed 8 years ago by vorner

  • Owner changed from vorner to jinmei

Hello

Replying to jinmei:

If, on the other hand, we really assume there would only be fatal
exceptions, it makes sense to simply pass them through to the top
level where they are logged.

OK, you persuaded me, I stop being lazy and pretending the exceptions aren't interesting will be solved in future O:-).

I updated the comment in the loop and added few TODO ones to some of the handlers.

I also started handling the exceptions on accept.

I believe the functions themself have better knowledge of what exceptions would be fatal and what not. A socket error from the accept is recoverable, we just didn't get the exception, but socket error from the check_command might not, for example (if it ever got out, I guess it should not, but if there was a bug…).

Would this work?

Thank you

comment:18 in reply to: ↑ 17 Changed 8 years ago by jinmei

Replying to vorner:

I updated the comment in the loop and added few TODO ones to some of the handlers.

I also started handling the exceptions on accept.

I believe the functions themself have better knowledge of what exceptions would be fatal and what not. A socket error from the accept is recoverable, we just didn't get the exception, but socket error from the check_command might not, for example (if it ever got out, I guess it should not, but if there was a bug…).

Would this work?

Looks okay, with one minor nit:

(probably coming from b10-auth). We continue serving on whanever other

s/whanever/whatever/ (or whichever?)

With fixing this please feel free to merge.

comment:19 Changed 8 years ago by jinmei

  • Owner changed from jinmei to vorner

comment:20 Changed 8 years ago by jinmei

  • Total Hours changed from 0 to 4.88

comment:21 Changed 8 years ago by vorner

  • Resolution set to complete
  • Status changed from reviewing to closed
  • Total Hours changed from 4.88 to 12.88

Thanks, I fixed one more thing with paths during distcheck and merged.

Note: See TracTickets for help on using tickets.