Opened 7 years ago

Closed 7 years ago

#1986 closed defect (fixed)

b10-auth shouldn't try to forward update requests without ddns running

Reported by: jinmei Owned by: jelte
Priority: medium Milestone: Sprint-20120731
Component: b10-auth Version:
Keywords: Cc:
CVSS Scoring: Parent Tickets:
Sensitive: no Defect Severity: N/A
Sub-Project: DNS Feature Depending on Ticket: DDNS
Estimated Difficulty: 5 Add Hours to Ticket: 0
Total Hours: 10 Internal?: no

Description

See http://bind10.isc.org/ticket/1539#comment:8

It would not be good for b10-auth to always try to forward update
requests whether or not b10-ddns is running. There can be several
ways to do this, but one possible approach is to use #1985
(if we do that); b10-auth creates (and uses) the forwarder only when
it gets the ddns config. In this approach, it would still be open
how b10-auth stops forwarding when ddns was once running but then
stopped, but in that case we can at least have a workaround of
restarting the system (or b10-auth).

I think we should provide this feature if only as a quick hack (e.g.,
if we don't do #1985 or don't have time for it) by the time of
next release.

Subtickets

Change History (21)

comment:1 Changed 7 years ago by jinmei

  • Summary changed from b10-auth shouldn't try to forward update requests without to b10-auth shouldn't try to forward update requests without ddns running

comment:2 Changed 7 years ago by jinmei

Now the details and plans of #1985 are not so clear, we'll need to
consider this task independently.

I can think of two options:

  1. easy but ad-hoc: introduce a new config option for b10-auth, say, "use-ddns" to manually control that
  2. probably more correct for longer term but will require more work: have ddns tell auth (or in general anyone who's interested) when it's ready for accepting requests, and extend auth to dynamically create the forwarder. With this approach we could also solve #1985 by having b10-ddns include the UNIX socket file path in the notification.

We'll first need to decide whether, how, when we solve this, and then
need to give it a fresh estimation. I suspect any prior estimations
on this ticket are now useless.

As for "when", my revised suggestion is the hardening sprint.

comment:3 follow-up: Changed 7 years ago by vorner

The point 2. would really use the support of notifications, to be implemented properly. Should we consider doing them?

comment:4 in reply to: ↑ 3 ; follow-up: Changed 7 years ago by jinmei

Replying to vorner:

The point 2. would really use the support of notifications, to be implemented properly. Should we consider doing them?

Do you mean one of the msgq tasks by notifications?

I simply don't know if it should be considered part of it, mainly due
to my poorer understanding of the whole things. But if a general
framework can solve this particular issue cleanly, that will be
certainly a longer term move.

In any case I personally think the goal of this ticket should be
achieved by the next release, so we should choose a reasonably
lightweight approach, considering the tradeoff between having another
quick hack and cleaner but timeconsuming solution.

comment:5 in reply to: ↑ 4 ; follow-up: Changed 7 years ago by vorner

Replying to jinmei:

Replying to vorner:

The point 2. would really use the support of notifications, to be implemented properly. Should we consider doing them?

Do you mean one of the msgq tasks by notifications?

Yes, thats the exact thing I mean.

I simply don't know if it should be considered part of it, mainly due
to my poorer understanding of the whole things. But if a general
framework can solve this particular issue cleanly, that will be
certainly a longer term move.

Yes, the auth would register for notification „DDNS ready“ and then act accordingly when it comes (it would ask over some call for the path of the socket, both on startup, which could fail if auth started sooner, and when the notification comes). Then it would register the disconnection notification, so it would know when DDNS stops.

In any case I personally think the goal of this ticket should be
achieved by the next release, so we should choose a reasonably
lightweight approach, considering the tradeoff between having another
quick hack and cleaner but timeconsuming solution.

If it is so, I'd prefer having a real ugly hack that would force us doing it properly sometime. We have a huge army of reasonable compromises that just live there forever because there's no real driving to do it properly.

But I don't know how time consuming would be make the notifications work. For this one to work, we probably only need like two small wrapper functions, one to send the notification with some corresponding address and one to register the callback for it in C++. In long term, we would need registration of the callback on python too, which would be tricky, and having the list of notifications in the spec file, but I think this could wait.

comment:6 Changed 7 years ago by jinmei

  • Milestone set to Next-Sprint-Proposed

comment:7 in reply to: ↑ 5 Changed 7 years ago by jinmei

Replying to vorner:

But I don't know how time consuming would be make the notifications work. For this one to work, we probably only need like two small wrapper functions, one to send the notification with some corresponding address and one to register the callback for it in C++. In long term, we would need registration of the callback on python too, which would be tricky, and having the list of notifications in the spec file, but I think this could wait.

Could you be more specific about this, probably in a different ticket?
I can then give it an estimate and decide which way we'll solve it
for the moment.

comment:8 Changed 7 years ago by shane

  • Milestone changed from Next-Sprint-Proposed to Sprint-20120703

comment:9 Changed 7 years ago by jelte

  • Owner set to jelte
  • Status changed from new to assigned

comment:10 Changed 7 years ago by jelte

  • Owner changed from jelte to UnAssigned
  • Status changed from assigned to reviewing
  • Total Hours changed from 0 to 7

Ready for review.

I went for the messaging approach:

AuthSrv[Impl] now has a few methods to initialize/reset, and unset the SocketSessionForwarderHolder object, which are controlled through messages;

  • If b10-ddns starts, it sends a command to auth to create the forwarder object.
  • When b10-ddns stops (cleanly), it sends a command to stop.
  • both of these expect a success response; b10-ddns would currently be useless without auth to pass it messages anyway, and this way it can report a problem if setup fails for some reason.

I left the initial construction of AuthSrv intact; i.e. you still need to pass it the SocketSessionForwarder; auth will delay creation that specialized holder class until it receives the first message there.

Additionally, when b10-auth starts, it sends a message to b10-ddns as well ('auth_started'). It does *not* expect a response here. This is simply a hint that if b10-ddns exists, it should resend its request to forward updates. In this way, if b10-auth is restarted, the connection is automatically restored. (we may want to generalize this construct, but going all yagni I just made it to ddns directly for now).

I did not include any arguments in these commands; the initialization and fixed locations are the same to keep the diff reasonable (it's still a bit big, but largely due to class fluff)

Note: I don't think we have a ticket for it, but with this change the problem that the first UPDATE after b10-ddns is restarted fails is also fixed. And of course b10-auth now responds with NOTIMP if b10-ddns is not running (Subnote: if b10-ddns crashes uncleanly, during the restart or if it is not restarted at all, auth will still respond with SERVFAIL, but I think this is ok, since this really is an internal server failure scenario).

comment:11 Changed 7 years ago by jinmei

  • Owner changed from UnAssigned to jinmei

comment:12 follow-up: Changed 7 years ago by jinmei

other/general

  • I made a few small editorial changes.
  • I think we need a changelog for this.
  • About the protocol: What if start/stop_ddns_forwarder takes place multiple times without having the other?
  • Are we eventually going to generalize the notification mechanism? I think the concept is pretty generic, so ideally we have a unified framework and each module just uses the common interface.

auth.spec

  • Are these expected to be invoked by a user, too? If not, what if it happens?
  • I'd use 'message(s)' instead of 'packets':
            "command_description": "(Re)start internal forwarding of DDNS Update packets. This is automatically called if b10-ddns is started.",
    
    same for the stop counterpart.

auth_messages.mes

Just a comment: I wonder whether these messages could be INFO, because
it should be sufficiently rare and reports some important changes on
how the system will work.

auth_src.cc

  • Is hasDDNSForwarder() necessary? Although the policy already seems to be inconsistent (and I may be one of those introducing the confusion), since the entire AuthSrvImpl is essentially a private member of the AuthSrv class I don't see the strong need for introducing further accessibility barrier within the impl class. And, if we make ddns_forwarder_ public, there won't be need for the explicit accessor either. (and I then noticed the same discussion applies to the crete/destroy methods in the Impl class).

auth_src.h

  • I'd use 'message' instead of 'packet' where appropriate.

auth_srv_unittest

shouldn't we test various new cases that can now happen? e.g., make
sure auth returns NOTIMP if forwarder isn't created.

I also guess we should probably confirm the new auth commands actually
create/destroy the forwarder at the unit test level.

ddns.py

  • I'd do s/packets/messges/
            # Notify Auth server that DDNS update packets can now be forwarded
            self.__notify_start_forwarder()
    
    same for docstring for __notify_start/stop_forwarder, and some other places too.
  • command_handler: not only for this case, but I'm not a fan of duplicate hardcoding things like auth_started. maybe we can consider a generic, cleaner and unified way to handle these cases at this opportunity? (maybe a hardening topic?)
  • __notify_start_forwarder: what if group_send/recvmsg raises an exception? Same for stop.
  • __notify_stop_forwarder: docstring doesn't seem to be correct (copy-paste error).
  • If start_forwarder in response to auth_started fails, it can happen both auth and ddns are running but update requests will keep resulting in NOTIMP. I wouldn't require this be resolved in this ticket, but I think we need to define design principles of inter module communication on, e.g., how much of reliability we'd expect and what each module should do to ensure that.

ddns_messages.mes

  • 'DDNS_START_FOWARDER_ERROR' should be '...FORWARDER...'. And this seems to suggest we don't have sufficient tests.
  • the description of the error message: I'd describe how this could happen and what the operator should do if this message is printed.

ddns_test

  • check_session_stop_forwarder_called: docstring isn't correct (copy paste)
  • As noted above, test cases where sendmsg etc fail are missing.
  • shouldn't we (also) check "start update" has been sent immediately after the creation of DDNS? Likewise, I guess we should test the 'auth_started' command really triggers this message (for that matter, this command doesn't seem to be tested at all).

comment:13 Changed 7 years ago by jinmei

  • Owner changed from jinmei to jelte

comment:14 in reply to: ↑ 12 ; follow-up: Changed 7 years ago by jelte

  • Owner changed from jelte to jinmei

Replying to jinmei:

other/general

  • I made a few small editorial changes.

thnx

  • I think we need a changelog for this.

Ack, proposal:
[bug] jelte
b10-auth no longer tries to send DDNS UPDATE messages to b10-ddns if b10-ddns is not running. Sending an UPDATE to BIND 10 that is not configured to run DDNS will now result in a response with rcode NOTIMP instead of SERVFAIL.

  • About the protocol: What if start/stop_ddns_forwarder takes place multiple times without having the other?

For multiple starts, it results in the connection being re-established (to cover the case where b10-ddns dies without being able to send its goodbye message). For multiple stops, it is a no-op after the first one (or before start has even happened).

  • Are we eventually going to generalize the notification mechanism? I think the concept is pretty generic, so ideally we have a unified framework and each module just uses the common interface.

I'm not against it, but have no current idea as to how.

auth.spec

  • Are these expected to be invoked by a user, too? If not, what if it happens?

They are not, but we need some form of 'internal' designator for commands, so we can verify them but reject them for the user. There is an old ticket for that, #941.

  • I'd use 'message(s)' instead of 'packets':
            "command_description": "(Re)start internal forwarding of DDNS Update packets. This is automatically called if b10-ddns is started.",
    
    same for the stop counterpart.

done

auth_messages.mes

Just a comment: I wonder whether these messages could be INFO, because
it should be sufficiently rare and reports some important changes on
how the system will work.

I am not heavily opposed, but I chose (low) debug, to not add to the number of startup messages.

auth_src.cc

  • Is hasDDNSForwarder() necessary? Although the policy already seems to be inconsistent (and I may be one of those introducing the confusion), since the entire AuthSrvImpl is essentially a private member of the AuthSrv class I don't see the strong need for introducing further accessibility barrier within the impl class. And, if we make ddns_forwarder_ public, there won't be need for the explicit accessor either. (and I then noticed the same discussion applies to the crete/destroy methods in the Impl class).

They are not necessary, I tend to default to writing accessors over direct manipulation of members. Removed.

auth_src.h

  • I'd use 'message' instead of 'packet' where appropriate.

only one case, right? fixed.

auth_srv_unittest

shouldn't we test various new cases that can now happen? e.g., make
sure auth returns NOTIMP if forwarder isn't created.

I also guess we should probably confirm the new auth commands actually
create/destroy the forwarder at the unit test level.

oh, right, kind of forgot about that given the system tests.

Added a test for this, which also checks behaviour is correct when sending multiple starts and stops.

ddns.py

  • I'd do s/packets/messges/
            # Notify Auth server that DDNS update packets can now be forwarded
            self.__notify_start_forwarder()
    
    same for docstring for __notify_start/stop_forwarder, and some other places too.

done

  • command_handler: not only for this case, but I'm not a fan of duplicate hardcoding things like auth_started. maybe we can consider a generic, cleaner and unified way to handle these cases at this opportunity? (maybe a hardening topic?)

yeah, I didn't want to write a wrapper just for a message string at this moment, but we can certainly consider a better way to do this.

  • __notify_start_forwarder: what if group_send/recvmsg raises an exception? Same for stop.

catching it and logging error now

  • __notify_stop_forwarder: docstring doesn't seem to be correct (copy-paste error).

fixed

  • If start_forwarder in response to auth_started fails, it can happen both auth and ddns are running but update requests will keep resulting in NOTIMP. I wouldn't require this be resolved in this ticket, but I think we need to define design principles of inter module communication on, e.g., how much of reliability we'd expect and what each module should do to ensure that.

yes that is the price of not wanting to forward packets when it is not running (or more generally, in order to make use of inter-module communication you need to rely on inter-module communication, and that the communicating agents play nice)

what exactly did you have in mind?

ddns_messages.mes

  • 'DDNS_START_FOWARDER_ERROR' should be '...FORWARDER...'. And this seems to suggest we don't have sufficient tests.

hmm, indeed. Fixed and added (also for exceptions from sendmsg/recvmsg)

  • the description of the error message: I'd describe how this could happen and what the operator should do if this message is printed.

added about the same info as for when exceptions are raised from send/recv, but for the real cause of the error one should check auth's logs, this module won't know any more than that it got an error back.

ddns_test

  • check_session_stop_forwarder_called: docstring isn't correct (copy paste)

fixed

  • shouldn't we (also) check "start update" has been sent immediately after the creation of DDNS? Likewise, I guess we should test

that was already tested, added a comment (though the easiest way to test the next one included an extra check for this as well)

the 'auth_started' command really triggers this message (for that
matter, this command doesn't seem to be tested at all).

added

comment:15 in reply to: ↑ 14 ; follow-up: Changed 7 years ago by jinmei

Replying to jelte:

Ack, proposal:
[bug] jelte
b10-auth no longer tries to send DDNS UPDATE messages to b10-ddns if b10-ddns is not running. Sending an UPDATE to BIND 10 that is not configured to run DDNS will now result in a response with rcode NOTIMP instead of SERVFAIL.

Looks okay.

auth.spec

  • Are these expected to be invoked by a user, too? If not, what if it happens?

They are not, but we need some form of 'internal' designator for commands, so we can verify them but reject them for the user. There is an old ticket for that, #941.

In that case I'd at least clarify in the command_description that they
are not supposed to be invoked by users.

auth_src.cc

  • Is hasDDNSForwarder() necessary? Although the policy already seems to be inconsistent (and I may be one of those introducing the confusion), since the entire AuthSrvImpl is essentially a private member of the AuthSrv class I don't see the strong need for introducing further accessibility barrier within the impl class. And, if we make ddns_forwarder_ public, there won't be need for the explicit accessor either. (and I then noticed the same discussion applies to the crete/destroy methods in the Impl class).

They are not necessary, I tend to default to writing accessors over direct manipulation of members. Removed.

In general I prefer providing accessors too because it's usually
safer. But in this case it seemed to be overkilling because
ddns_forwarder_ etc are essentially a private member of the
AuthSrv class (using a separate struct/class as pimpl is just an
idiom to hide these details from the users of the AuthSrv class).

  • If start_forwarder in response to auth_started fails, it can happen both auth and ddns are running but update requests will keep resulting in NOTIMP. I wouldn't require this be resolved in this ticket, but I think we need to define design principles of inter module communication on, e.g., how much of reliability we'd expect and what each module should do to ensure that.

yes that is the price of not wanting to forward packets when it is not running (or more generally, in order to make use of inter-module communication you need to rely on inter-module communication, and that the communicating agents play nice)

what exactly did you have in mind?

I don't have a specific idea or proposal right now. But I guess the
bind10 process keeps track of more (if it's not sufficient currently)
details of which module is running and which is not, and how many if a
module runs multiple instances, and if a module needs to know whether
or not another module is running (per module or per instance) it gets
updates on that information from bind10. The current shutting down
mechanism will be extended so that it will be more robust and the
bind10 process can use it.

  • shouldn't we (also) check "start update" has been sent immediately after the creation of DDNS? Likewise, I guess we should test

that was already tested, added a comment (though the easiest way to test the next one included an extra check for this as well)

I knew check_session_start_forwarder_called() is called in
test_session_msg() etc, but I thought it's doesn't clearly indicate
it's done "immediately" after the DDNS creation (depending on what
other things setUp() does, "start forwarder" may actually be sent
after some specific event that setUp() happens to invoke. This is
actually not the case in the current test code, but the distance
between the creation and the check seems to make it less reliable.

One possibility is to do this check within setUp() (a bit redundant to
do this for every test, though). Another is to create a separate test
case where we create a separate DDNSServer object with a separate
MyCCSession, and do check_session_start_forwarder_called() for these
pair immediate after the creation.

Comments on the revised code follow:

general

  • This didn't occur to me in the first review, but I now wonder how this notification works with multiple instances of b10-auth. For example, I guess all of them responds to "start forwarding"; could it cause any disruption? Even if not, I suspect it's possible that only a subset of auth instance succeeds in setting up the forwarder (although it may not be so different from the case where a single auth instance fails). Unless it causes a trouble in normal operation, I think it's okay to leave this open, but this seems to be another topic in the context of general framework.
  • I've made some editorial cleanups.

auth_src_unittest

  • This comment doesn't parse to me:
        // Test that AuthSrv returns NOTIMP before ddns forwarder is created,
        // that is is connected when the start_ddns_forwarder command is sent,
        // and that is it is no longer connected and returns NOTIMP after
        // the stop_ddns_forwarding command is sent.
    
    s/that is is/that it is/? and s/that is it is/that it is/?

comment:16 Changed 7 years ago by jinmei

  • Owner changed from jinmei to jelte

comment:17 in reply to: ↑ 15 ; follow-up: Changed 7 years ago by jelte

  • Owner changed from jelte to jinmei

Replying to jinmei:

auth.spec

  • Are these expected to be invoked by a user, too? If not, what if it happens?

They are not, but we need some form of 'internal' designator for commands, so we can verify them but reject them for the user. There is an old ticket for that, #941.

In that case I'd at least clarify in the command_description that they
are not supposed to be invoked by users.

done (and that they will be removed as public commands in the future)

I don't have a specific idea or proposal right now. But I guess the
bind10 process keeps track of more (if it's not sufficient currently)
details of which module is running and which is not, and how many if a
module runs multiple instances, and if a module needs to know whether
or not another module is running (per module or per instance) it gets
updates on that information from bind10. The current shutting down
mechanism will be extended so that it will be more robust and the
bind10 process can use it.

ah, right, yes, IIRC that was part of the intended msgq changes. I was thinking more generally of communication protocols between modules (even one-shot module-to-module commands)

  • shouldn't we (also) check "start update" has been sent immediately after the creation of DDNS? Likewise, I guess we should test

that was already tested, added a comment (though the easiest way to test the next one included an extra check for this as well)

I knew check_session_start_forwarder_called() is called in
test_session_msg() etc, but I thought it's doesn't clearly indicate
it's done "immediately" after the DDNS creation (depending on what
other things setUp() does, "start forwarder" may actually be sent
after some specific event that setUp() happens to invoke. This is
actually not the case in the current test code, but the distance
between the creation and the check seems to make it less reliable.

One possibility is to do this check within setUp() (a bit redundant to
do this for every test, though). Another is to create a separate test
case where we create a separate DDNSServer object with a separate
MyCCSession, and do check_session_start_forwarder_called() for these
pair immediate after the creation.

oh actually it is a relatively light check that can be performed in setup, moved it.
(the only potential problem is that other tests might fail on this if there is a bug here, if we see that as a problem, but if any fail this would need to be fixed anyway)

Comments on the revised code follow:

general

  • This didn't occur to me in the first review, but I now wonder how this notification works with multiple instances of b10-auth. For example, I guess all of them responds to "start forwarding"; could it cause any disruption? Even if not, I suspect it's possible that only a subset of auth instance succeeds in setting up the forwarder (although it may not be so different from the case where a single auth instance fails). Unless it causes a trouble in normal operation, I think it's okay to leave this open, but this seems to be another topic in the context of general framework.

ack

  • I've made some editorial cleanups.

of course :)

auth_src_unittest

  • This comment doesn't parse to me:
        // Test that AuthSrv returns NOTIMP before ddns forwarder is created,
        // that is is connected when the start_ddns_forwarder command is sent,
        // and that is it is no longer connected and returns NOTIMP after
        // the stop_ddns_forwarding command is sent.
    
    s/that is is/that it is/? and s/that is it is/that it is/?

ack, also made explicit references to forwarder and auth in an attempt to be more clear

comment:18 in reply to: ↑ 17 Changed 7 years ago by jinmei

I made one last minor editorial cleanup. If you have confirmed this branch
works with multiple b10-auth instances, please merge.

comment:19 Changed 7 years ago by jinmei

  • Total Hours changed from 7 to 10

comment:20 Changed 7 years ago by jinmei

  • Owner changed from jinmei to jelte

comment:21 Changed 7 years ago by jelte

  • Resolution set to fixed
  • Status changed from reviewing to closed

Thanks, merged, closing ticket.

Note: See TracTickets for help on using tickets.