Opened 9 years ago

Closed 8 years ago

#805 closed enhancement (complete)

Client side code to request new sockets (C++)

Reported by: stephen Owned by: vorner
Priority: very high Milestone: Sprint-20120124
Component: Unclassified Version:
Keywords: Cc:
CVSS Scoring: Parent Tickets:
Sensitive: no Defect Severity: N/A
Sub-Project: DNS Feature Depending on Ticket: Socket creator
Estimated Difficulty: 5 Add Hours to Ticket: 0
Total Hours: 20 Internal?: no

Description


Subtickets

Attachments (2)

requestor.diff (15.5 KB) - added by jinmei 8 years ago.
libexecpath.diff (5.9 KB) - added by jinmei 8 years ago.

Download all attachments as: .zip

Change History (32)

comment:1 Changed 8 years ago by jelte

  • Defect Severity set to N/A
  • Estimated Difficulty changed from 0.0 to 5
  • Sub-Project set to DNS

comment:2 Changed 8 years ago by jelte

  • Milestone changed from Year 3 Task Backlog to Sprint-20111206

comment:3 Changed 8 years ago by shane

  • Feature Depending on Ticket set to Socket creator

comment:4 Changed 8 years ago by vorner

  • Owner set to vorner
  • Status changed from new to accepted

comment:5 Changed 8 years ago by vorner

Some notes from some thinking & reading of docs (mostly so I don't forget over the weekend).

We'll need to split this into two parts. One part will be responsible for getting the socket from the boss. It'll need a ccsession and a socket to ask for. It'll send the request over the ccsession, wait for answer, connect to the unix socket there and pick up the file descriptor. The unix socket must be kept open (an, hopefully, reused). This should be relatively easy to write, but hard to test (no idea how to test this). The ccsession has group_sendmsg and group_recvmsg for this. We may want to use a batched mode, to send out all the requests at once and then pick them all at once, to minimise the RTTs, but probably not now.

The other part is the dns service. It'd probably get the file descriptor and protocol and will put it inside. It'll need the assign method most of the asio sockets have.

Then the installAddresses function will connect these two together.

Also, we need a place to store the old sockets to drop them once they are changed. Maybe we can first request all the new ones, then clean them from the dns service, then put the new ones inside and last of all release the old ones, as it is safer and we will get just dups of the original ones if they are the same (with using the SAMEAPP share).

comment:6 Changed 8 years ago by jelte

  • Priority changed from major to blocker

comment:7 follow-up: Changed 8 years ago by jinmei

While looking at #1522 I've noticed some portability and other issues
on the base #805 implementation. You may want to take a look at that
part of the comment:
http://bind10.isc.org/ticket/1522#comment:5

I'd also suggest building and running the code on other flavor of OSes
(some kind of BSD and ideally also Solaris) before merging the branch.

comment:8 in reply to: ↑ 7 Changed 8 years ago by vorner

  • Owner changed from vorner to UnAssigned
  • Status changed from accepted to reviewing

Hello

Replying to jinmei:

While looking at #1522 I've noticed some portability and other issues
on the base #805 implementation. You may want to take a look at that
part of the comment:
http://bind10.isc.org/ticket/1522#comment:5

Thanks for them, I'll have a look at them.

I'd also suggest building and running the code on other flavor of OSes
(some kind of BSD and ideally also Solaris) before merging the branch.

I don't have them at hand :-(. I'll ask Jeremy to put it to the build bots sometime before merge.

Anyway, despite the fact I didn't yet have a look at the code based on the comments, I'd put it to review, so I can fix it and have it being reviewed in parallel.

This branch does these things:

  • Allows the DNS service to add a listening sockets based on FD as well as address and port pair.
  • Makes the portconfig library use the socket creator instead of creating the sockets itself. It does not use any kind of sharing for now and hardcodes the share name, but this has the minimal impact of the rest of the code and makes auth and resolver use the socket creator right away (the only change in them is including the initialization of the socket requestor).
  • Defines the interface of the socket requestor, but the real implementation is in #1522. I want to merge that one to this branch first and try it out if the whole socket creator thing works (there's a lot of stuff that is tested only by unit tests and it is very untried concept ‒ I expect it'll break somewhere).
  • Creates a test socket requestor, so it can be used in other tests (portconfig tests, auth tests, resolver tests).

The portconfig library isn't tested completely ‒ the tests disable putting the sockets to ASIO, as the ASIO would like to close the passed file descriptors. However, the test socket requestor makes the file descriptors up, which becomes quite messy then, if they are closed.

Anyway, the branch can be reviewed on three parts, if reviewing at once is too much, eg:

  • f9cbe6fb6e0a3d3dc03218429f530e6c01920169..5345a3c2ba2daf9329b33984782ad50f72734837 (changes to put FDs to the DNS service)
  • c8caecc35195080d091611bb0809cbc2834ba3f4 (interface for the socket requestor)
  • The rest ‒ portconfig library and tests

I'd propose this changelog entry (probably together with #1522):

[func]		vorner,jelte
The authoritative server and resolver request their listening sockets
from the socket creator. The change should be transparent to user, but
it should be possible to restart them or configure privileged ports
even later, when they run as non-root.

comment:9 follow-up: Changed 8 years ago by vorner

So, answering this from #1522:

dns_server_unittest.cc
This is not for this branch (I guess it's a result of importing #805),
but the tests didn't pass for me, too. I needed to add this:

         struct sockaddr_in addr;
+        memset(&addr, 0, sizeof(addr));
         addr.sin_family = AF_INET;

While looking at it I noticed some other things for this file:

  • some of the above comments apply to this implementation, too: exception safety, consideration for sin_len, non plain-old static object, unnecessary namescope-level static.
  • I would personally use getaddrinfo to create a sockaddr() to avoid this kind of portability issue.
  • I would use IPv6 for testing for future generation software like BIND 10 (we could also test IPv4 if we want).
  • technically, passing protocol 0 to socket for AF_INET isn't good:
            int result(socket(AF_INET, type, 0));
    
    If and when DCCP or SCTP is more common this can be ambiguous.

I switched for the getaddrinfo way. I'm not sure if the protocol field is needed, as the type is SOCK_STREAM or SOCK_DGRAM always (and for the DCCP or SCTP, there'd be a different type), but I use it anyway.

An IPv6 address is used (I didn't want to change the whole tests again to run twice, once with one and once with other address).

I did try to find some of the unnecessary static ones (but there are some that are needed, some are function-level and some make it not compile when removed). Also, the std::string, while arguably safe as it is standard library, is now char*.

I didn't notice anything about exception safety, only the socket wasn't released when it returned an error. But if there was an exception, the test would terminate right away anyway, so I don't think it is important.

Did I miss something important?

Thank you

comment:10 Changed 8 years ago by jinmei

  • Owner changed from UnAssigned to jinmei

comment:11 follow-up: Changed 8 years ago by jinmei

Not sure if it still helps, but I'm dumping what I've noted about the
branch. It's only for asiodns changes.

asiodns

  • The 'v6' parameter seems to be ad hoc. I suspect it's also less usable for the caller, because the caller needs to identify which version of IP it is passing just like you indeed did in portconfig.cc:setAddresses:
            bool is_v6(IOAddress(address.first).getFamily() == AF_INET6);
    
    I'd rather pass opaque ID of the address family (in practice it would be an integer and either AF_INET or AF_INET6) and keep the caller agnostic about the address-IPversion mapping...and, on further thought, I guess it makes more sense to have the boss process return to the requestor the information of the family as well as 'token' and 'path', and have requestor return it as part of the return value of requestSocket(). After all, the boss process is the one that necessarily interprets the address so it would make most sense if we centralize the mapping processing there.
  • Don't we need a log message when adding a server?
  • dns_service.h: we need parameter descriptions of addServerXXX. Also note about possible exceptions (if any).
  • addServerFromFD: we need to catch asio specific exceptions and convert them to IOError, etc. We also need to test it (as the fact that I have to point it out seems to indicate there's no test for it).
  • The naming of addServerTCP and addServerUDP is not very intuitive in that they don't indicate we are creating a server object from an FD. Maybe something like add{TCP,UDP}ServerFromFD()?

comment:12 in reply to: ↑ 11 Changed 8 years ago by vorner

Hello

Replying to jinmei:

Not sure if it still helps, but I'm dumping what I've noted about the
branch. It's only for asiodns changes.

Thanks, any bit helps.

asiodns

  • The 'v6' parameter seems to be ad hoc. I suspect it's also less usable for the caller, because the caller needs to identify which version of IP it is passing just like you indeed did in portconfig.cc:setAddresses:
            bool is_v6(IOAddress(address.first).getFamily() == AF_INET6);
    
    I'd rather pass opaque ID of the address family (in practice it would be an integer and either AF_INET or AF_INET6) and keep the caller agnostic about the address-IPversion mapping...and, on further thought, I guess it makes more sense to have the boss process return to the requestor the information of the family as well as 'token' and 'path', and have requestor return it as part of the return value of requestSocket(). After all, the boss process is the one that necessarily interprets the address so it would make most sense if we centralize the mapping processing there.

I changed it for passing the address family for now. It might be better to get it from the boss (though dragging the information so far away seems a waste when it is possible to get it so easily), but I'd rather avoid this change for now. This branch is large enough and this touches changes both here, in the #1522 branch and in the boss itself.

  • Don't we need a log message when adding a server?

Added.

  • dns_service.h: we need parameter descriptions of addServerXXX. Also note about possible exceptions (if any).

Added.

  • addServerFromFD: we need to catch asio specific exceptions and convert them to IOError, etc. We also need to test it (as the fact that I have to point it out seems to indicate there's no test for it).

I added something. I convert all the exceptions (but only around the asio things that might throw), because it started to throw some really strange things from the boost namespace.

  • The naming of addServerTCP and addServerUDP is not very intuitive in that they don't indicate we are creating a server object from an FD. Maybe something like add{TCP,UDP}ServerFromFD()?

Changed.

comment:13 follow-up: Changed 8 years ago by jinmei

More comments on asiodns changes. I've done for this part.

I made some suggested changes directly to the branch.

misc small comments on asiodns

  • logger.{h,cc}: copyright year should be 2012 as they are new files?
  • tcp_server.cc: isn't it better to use LOG_DEBUG instead of logger.debug to avoid unnecessary overhead? Same for udp_server.
  • tcp_server.cc: is accesptor_.reset() exception free? If not it's safer to include it in the try block. Same for udp_server.
  • tcp_server.cc: I thought we generally place 'catch' at the same line as the closing brace for 'try':
        } catch (const std::exception& exception) {
            // Whatever the thing throws, it is something from ASIO and we convert
            // it
            isc_throw(IOError, exception.what());
        }
    
    maybe a matter of taste except for consistency, though. Same for udp_server.cc
  • udp_server.cc: I don't see the need for the temporary variable 'proto' and simplify the code (just like the TCP counterpart)
    +        try {
    +            socket_->assign(af == AF_INET6 ? udp::v6() : udp::v4(), fd);
    +        }
    
    
  • I have some suggestions on the doxygen description of dns_service.h. directly committed.

dns_server_unittest.cc

  • why do we use SetUp() and commonSetup()? Using constructors seem to be more natural, and by doing so we could constify some member variable(s) (such as server_address_).
  • freeaddrinfo should be removed here:
    +        if (error != 0) {
    +            freeaddrinfo(res);
    
  • I'd personally want to make it more exception safe by defining 'checker_', 'lookup_' etc as scoped pointers. See also the discussion about FD leak below. But this branch isn't responsible for it and we are running out of time, so I'm okay with deferring it to a later cleanup (maybe we should create a ticket and get it done in the next sprint).
  • FdInit::SetUp?() could leak created sockets on exception. While in this case we could say it doesn't matter much because the test process would immediately die on the exception anyway, I'd personally like to make our code as clean as possible, even for tests. If we leave such loose code I'm afraid it would degrade the entire product by giving the impression that we generally code it carelessly. I'm also afraid that such misdemeanor bugs would introduce a larger amount of noise if and when we apply static analysis tools to tests, thereby obscuring other, more serious bugs. Finally, leaks in tests may introduce other unexpected side effects such as exceeding the allowable limit of open files, making some subsequent tests fail. Although it would not be a real concern in this particular case, I'd rather try to make all code safe than selectively being loose on cases where "it should be safe". I understand brevity is important, and it could be possible that sometimes brevity may be more important than correctness for tests, so if we needed complicated/tricky workaround to make the code "safer", maybe it could be justified to ignore the error case (and in that case we'd add a false-positive filter or something for static analysis tools). But in this specific case I believe making the code safer doesn't require a large trick. We can simply introduce and use a wrapper container like the ScopedSocket structure I used in socketsession.cc.

That said, considering we are running out of time again, I'm okay
with deferring this, too. I thought I proposed making ScopedSocket?
a semi-public utility, so we might defer the cleanup until that
point.

  • it makes me nervous to define non plain-old static object like this:
    oasio::io_service DNSServerTest::service;
    
    If this really needs to be (class) static, I'd define it as a pointer and new it only when it's NULL in the constructor.
            static asio::io_service* service;
            DNSServerTest() {
                if (service == NULL) {
                    service = new asio::io_service;
                }
            }
    

comment:14 follow-up: Changed 8 years ago by jinmei

I've completed my review.

First, I made a few minor changes directly to the branch which I
believe are correct and/or non controversial.

Second, while looking at details, a lot of design level
questions/suggestions/concerns arose. I think these are substantial
(at least to me) to some extent, but I'm afraid we don't have time to
resolve all of them before the release (even if the "resolution"
results in rejecting the issue after discussions). And, in some sense
many of these points are minor in that it would work somehow as an
initial proof of concept. So, maybe the possible way forward is to
only address easy points and merge the result, then work on more
fundamental points immediately after the release.

higher level points

The following points are some of the design level points. I'd like to
resolve the first one before the release (I have a concrete proposal
so if we agree with it that part is easy), but the other two can be
deferred.

  • I have to say I don't think it a good idea to use multiple inheritance for TestSocketRequestor?. While I'm generally more convinced that multiple inheritance does more harm than good in many cases, I'm also willing to be flexible and can live with it if it really makes the design elegant which couldn't be realized in other ways. Still, in this specific case, it seems to me an unnecessary abuse of it. I have a counter proposal of doing the same thing using more straightforward composition while avoiding (diamond style) multiple inheritance. Please see the attached diff.
  • although this is not in the direct scope of this branch: while thinking about various issues I encountered in the branch (detailed below), I now wonder why we need to make SocketRequestor? a singleton by design. Singleton is a very strong idiom and inherits many bad properties of global variables (the fact that we need to use something like "initTest" for tests is one of them), so we'd rather avoid relying on it unless we have a strong need for it. In our use case, the application would still need to explicitly create a specific instance either by SocketRequestor::init() or instantiating TestSocketRequestor?, we could make the init function a factory (returning an instance) and use a pointer or reference to it wherever we need to get access to it. Then the most part of the application code won't have to be aware that SocketRequestor? is a singleton, and, in particular, we should be able to simplify tests (we'd be able to just create and discard a test requestor for each test case). If we still want to make sure the requestor is never instantiated twice in normal applications by interface, we could implement that logic in the factory function.
  • more or less related to the above two points, it seems to me the portconfig logic is getting more complicated with various states (at least the requestor and "current_sockets" were newly introduced states that are cooperating in the same context), so I wonder it makes more sense to introduce a class to enclose the configuration logic. Then we'd construct a "configurator" object with a pointer or reference to the socket requestor (thereby eliminating the need for getting access to the requestor via the singleton interface) and have the object maintain "current sockets" as a class attribute (thereby eliminating the use of semi-global namespace scope static variable). But, aside from whether this idea is a good one, I also think we need to revisit how to configure listen ports in general (detailed below). So this point should be considered as part of revisiting the entire scheme.

(test) socket_request.h

  • can't we avoid using the global portconfig::test_mode variable? I understand we sometimes have to sacrifice the cleanness for testability as a matter of practice, but this variable is accessed far from the original module (i.e. server_common) and it's difficult to track the dependency. We could, for example, introduce an abstraction of DNSService and use a special derived class of it for tests, where addServerXXXFromFD does nothing.
  • personally I'd hide gtest many of the function definitions from the header file. in particular, I'd avoid directly relying on gtest specific definitions in a header file (and including gtest.h, which subsequently imports many other definitions including macros and will make this header file more susceptible to ordering issues).

portconfig.cc

  • use of namespace scope static variable doesn't seem to be good:
    vector<string> current_sockets;
    
    both in terms of the risk of having initialization fiasco and of introducing a semi-global mutable variable, which is difficult to be tracked (causing bugs, making debug more difficult).
  • setAddresses() doesn't provide the strong exception guarantee (for current_sockets), and if something goes wrong in the middle of this function current_sockets and the boss will (possibly) have some inconsistent states. In this context note also that even releaseSocket() could fail and throw; it's not a usual resource release function (which would generally be considered exception free), but involves many complicated things that can fail, including message creation and network communication. Further, the entire logic of the port configuration doesn't seem to be a good one to me in the first place. It first clears all server objects (so the corresponding sockets will be closed) even if we are reusing the some of the address/port pairs. This could lead to dropping queries even if the server could otherwise handle. We should at least avoid the unnecessary close-rebind process. Also, as commented in installListenAddresses(), it doesn't seem to be the best behavior to try to use the all or nothing policy here. We can discuss this, but my expected behavior would be to handle each address-port config separately and if some of them fails we basically keep going with others (while leaving error logs for the failure). For that matter, that's how BIND 9 works, too.

Frankly, I don't know what we should do with thin in the mean time.
All these weird things seem to suggest that it's time to revisit the
whole design of port configuration, especially with taking into
account the introduction of the socket creator. On the other hand,
I don't think we can complete that task before the coming release.
One possible compromise might be to introduce this version for the
release, ignoring all these issues, and immediately revisit the
design after the release at the highest priority.

  • setAddresses(): what's the magic string of "dummy_app"?
  • While this is minor compared to the bigger issue explained in the bigger comment above, the recovery process in installListenAddresses() is now even more unreliable:
            try {
                setAddresses(service, addressStore);
            }
    
    because even releasing the address in setAddresses() may now fail.
  • I don't understand this comment:
                // Releasing them should really work
    
    (What does "should really work" mean?)

auth_srv_unittest.cc

  • In the following way, dnss_ and address_store_ are passed to TestSocketRequestor? before being initialized.
        AuthSrvTest() :
            TestSocketRequestor(dnss_, address_store_, 53210),
    
    It may or may not be (happen to be) safe depending on the details of TestSocketRequestor?, but not a good practice anyway. But more fundamentally I'd rather avoid using inheritance here (see the higher level issues above). Note: this initialization issue exists in other cases where TestSocketRequestor?() is initialized. Note also that this problem doesn't exist in the proposed diff.

portconfig_unittest.cc

  • I suspect this TODO is now even more impossible.
        // TODO Maybe some test to actually connect to them
    
  • Shouldn't the exception class be more specific?
        // This should not bind them, but should leave the original addresses
        EXPECT_THROW(installListenAddresses(invalid_, store_, dnss_), exception);
    
    Same for the brokenRollback test.
  • I don't understand this comment. What's 'everything'? Perhaps you mean 'releases' instead of 'returns'?
    // Try it at least returns everything when even the rollback fails.
    
  • in the following comment, "These" should be "the first pair"?
        // These should be requested in the first part of the failure to bind
        // and the second pair in the first part of rollback
    
  • I'd say there will be more error cases to test. For example, there's a case when release fails. But I'm not sure if we bother to be complete here, if our next action is to revisit the design fundamentally.

Changed 8 years ago by jinmei

comment:15 Changed 8 years ago by jinmei

  • Owner changed from jinmei to vorner

comment:16 in reply to: ↑ 9 Changed 8 years ago by jinmei

Replying to a couple of points from the previous responses (I don't
have any issues with others).

I didn't notice anything about exception safety, only the socket wasn't released when it returned an error. But if there was an exception, the test would terminate right away anyway, so I don't think it is important.

Did I miss something important?

No, it's more about an opinion matter. I've explained my point
in http://bind10.isc.org/ticket/805?replyto=9#comment:13.

I changed it for passing the address family for now. It might be better to get it from the boss (though dragging the information so far away seems a waste when it is possible to get it so easily), but I'd rather avoid this change for now. This branch is large enough and this touches changes both here, in the #1522 branch and in the boss itself.

This is okay for me.

comment:17 in reply to: ↑ 13 Changed 8 years ago by vorner

Hello

Replying to jinmei:

I made some suggested changes directly to the branch.

Thanks for them.

  • logger.{h,cc}: copyright year should be 2012 as they are new files?

Ah, it's a new year already, I didn't get used to it yet :-|.

  • tcp_server.cc: is accesptor_.reset() exception free? If not it's safer to include it in the try block. Same for udp_server.

The reset itself would be exception safe (because the pointer is null, so it calls no delete and the only code inside would be assignment to a pointer).

But thinking about it, who knows what the constructor of the socket might throw, so I included it inside the block.

  • tcp_server.cc: I thought we generally place 'catch' at the same line as the closing brace for 'try':

Yes, I just forget about it, I'm used to a different style.

  • why do we use SetUp() and commonSetup()? Using constructors seem to be more natural, and by doing so we could constify some member variable(s) (such as server_address_).

Changed, with the exception of SetUp? of the FdInit?, because that one uses the ASSERT_* macros and they contain return.

  • I'd personally want to make it more exception safe by defining 'checker_', 'lookup_' etc as scoped pointers. See also the discussion about FD leak below. But this branch isn't responsible for it and we are running out of time, so I'm okay with deferring it to a later cleanup (maybe we should create a ticket and get it done in the next sprint).

As the destructor calls delete on those, it is exception safe. But the scoped pointers could make it shorter (no need for the deletes in the destructor). Anyway, I'd postpone it for some future cleanup ticket. Would you like to create it?

  • FdInit::SetUp?() could leak created sockets on exception. While in [...]

That said, considering we are running out of time again, I'm okay
with deferring this, too. I thought I proposed making ScopedSocket?
a semi-public utility, so we might defer the cleanup until that
point.

I do agree that we should try to have the cleanest code possible.

Is there a ticket for this proposal? So we could add a note about it there? Or should I put some try-catch(...) block around it, as a short-term workaround?

  • it makes me nervous to define non plain-old static object like this:
    oasio::io_service DNSServerTest::service;
    
    If this really needs to be (class) static, I'd define it as a pointer and new it only when it's NULL in the constructor.

I didn't really create the variable, and I believe this usage should be safe, but I did something with it nevertheless. Instead of reusing the same service, the service is now member of the test object (so each test gets fresh one) and the static one is only a pointer to the active one.

Thanks

comment:18 in reply to: ↑ 14 ; follow-up: Changed 8 years ago by vorner

  • Owner changed from vorner to jinmei

Helo

Replying to jinmei:

Second, while looking at details, a lot of design level
questions/suggestions/concerns arose. I think these are substantial
(at least to me) to some extent, but I'm afraid we don't have time to
resolve all of them before the release (even if the "resolution"
results in rejecting the issue after discussions). And, in some sense
many of these points are minor in that it would work somehow as an
initial proof of concept. So, maybe the possible way forward is to
only address easy points and merge the result, then work on more
fundamental points immediately after the release.

That sounds reasonable.

I must admit I did this branch in a way to do minimal impact on the rest of the code, which is not always for the best, and even more when I just kept misusing stuff which was there already for slightly different purposes. Even the old code had some of the problems you describe and I agree it would need some bigger refactoring and redesign. Also, the current approach doesn't use some features of the socket creator, like being able to have multiple auth servers sharing a socket (which is not possible for other reasons as well now).

I don't know if immediately after the release is the best time, though. In case we would go for some more drastic receptionist approach, it could render the portconfig module deprecated, so it would make more sense to just keep it in the proof-of-concept state for a month or two before the receptionist would be implemented.

  • I have to say I don't think it a good idea to use multiple inheritance for TestSocketRequestor?. While I'm generally more convinced that multiple inheritance does more harm than good in many cases, I'm also willing to be flexible and can live with it if it really makes the design elegant which couldn't be realized in other ways. Still, in this specific case, it seems to me an unnecessary abuse of it. I have a counter proposal of doing the same thing using more straightforward composition while avoiding (diamond style) multiple inheritance. Please see the attached diff.

I find the way with member less intuitive/harder to imagine what is going on, so I went for the way I used it. Also, the code would be longer to write at the time when I needed the change. But provided C++ misses the tools to do it cleanly (like mixins of D or roles from perl6) and you already wrote it, we can as well use your version, which is nicer code-wise.

  • although this is not in the direct scope of this branch: while thinking about various issues I encountered in the branch (detailed below), I now wonder why we need to make SocketRequestor? a singleton by design. Singleton is a very strong idiom and inherits many bad properties of global variables (the fact that we need to use something like "initTest" for tests is one of them), so we'd rather avoid relying on it unless we have a strong need for it. In our use case, the application would still need to explicitly create a specific instance either by SocketRequestor::init() or instantiating TestSocketRequestor?, we could make the init function a factory (returning an instance) and use a pointer or reference to it wherever we need to get access to it. Then the most part of the application code won't have to be aware that SocketRequestor? is a singleton, and, in particular, we should be able to simplify tests (we'd be able to just create and discard a test requestor for each test case). If we still want to make sure the requestor is never instantiated twice in normal applications by interface, we could implement that logic in the factory function.

It is singleton now for few reasons:

  • When I started implementing it, I started with a namespace with functions, like in the case of portconfig. I thought of it like a portconfig extension. But it turned out it would be hard to test without the ability to subclass it and that there could be some variables.
  • I don't see a reason why a real application would need more than one of it.
  • I didn't want to pass the object around, at last not at the time of writing it, because it would need to modify too many files, while using it as a global variable allowed me not to change the interface of portconfig.
  • more or less related to the above two points, it seems to me the portconfig logic is getting more complicated with various states (at least the requestor and "current_sockets" were newly introduced states that are cooperating in the same context), so I wonder it makes more sense to introduce a class to enclose the configuration logic. Then we'd construct a "configurator" object with a pointer or reference to the socket requestor (thereby eliminating the need for getting access to the requestor via the singleton interface) and have the object maintain "current sockets" as a class attribute (thereby eliminating the use of semi-global namespace scope static variable). But, aside from whether this idea is a good one, I also think we need to revisit how to configure listen ports in general (detailed below). So this point should be considered as part of revisiting the entire scheme.

Yes, that one grew too much.

  • can't we avoid using the global portconfig::test_mode variable? I understand we sometimes have to sacrifice the cleanness for testability as a matter of practice, but this variable is accessed far from the original module (i.e. server_common) and it's difficult to track the dependency. We could, for example, introduce an abstraction of DNSService and use a special derived class of it for tests, where addServerXXXFromFD does nothing.

It could be avoided in that way, but the branch would get even bigger than now. The way it is, the code needed for the test is quite short, but, as you say, not really nice. I'd be for leaving this out until we decide how the new interface should look like, it may turn out it wouldn't need such measures for the tests anyway.

  • personally I'd hide gtest many of the function definitions from the header file. in particular, I'd avoid directly relying on gtest specific definitions in a header file (and including gtest.h, which subsequently imports many other definitions including macros and will make this header file more susceptible to ordering issues).

But then, I'd need a .cc file to place the methods with the EXPECT_* macros into. And it would introduce a problem with ordering of library compilation, as the testutils library would need to use the server_common library and the server_common tests would need to use the testutils library. Therefore we couldn't have the tests as a subdirectory of server_common, but the lib/Makefile.am would need to include something like:

SUBDIRS = … server_common testutils server_common/tests

which looks ugly.

Or is there a solution I don't see?

Frankly, I don't know what we should do with thin in the mean time.
All these weird things seem to suggest that it's time to revisit the
whole design of port configuration, especially with taking into
account the introduction of the socket creator. On the other hand,
I don't think we can complete that task before the coming release.
One possible compromise might be to introduce this version for the
release, ignoring all these issues, and immediately revisit the
design after the release at the highest priority.

Yes, as I say above, I'd agree with this solution as well. I'd like to see if it works at last to some extent.

  • setAddresses(): what's the magic string of "dummy_app"?

The share name for the socket creator. As the share mode is set to DONT_SHARE, it doesn't matter, so I didn't like to introduce a way to configure the current application name for now. That's why there's "dummy_app" to fill something into the parameter, at last until we start using the sharing somehow. I added a comment explaining it.

  • While this is minor compared to the bigger issue explained in the bigger comment above, the recovery process in installListenAddresses() is now even more unreliable:
            try {
                setAddresses(service, addressStore);
            }
    
    because even releasing the address in setAddresses() may now fail.

Well, failure to release them would mean either of:

  • serious bug in the communication, which should be fixed soon
  • boss disappeared
  • msgq disappeared

These are extremes under which we would probably exit anyway, and I didn't see an easy way to do it in a more reliable way. Requesting (and returning) sockets from/to different application seems cumbersome as of itself.

It may or may not be (happen to be) safe depending on the details of
TestSocketRequestor?, but not a good practice anyway. But more
fundamentally I'd rather avoid using inheritance here (see the
higher level issues above). Note: this initialization issue exists
in other cases where TestSocketRequestor?() is initialized. Note
also that this problem doesn't exist in the proposed diff.

It was, as the constructor only stored references to them and they were not used before being initialized, but as you say, your way doesn't have this problem anyway.

  • I suspect this TODO is now even more impossible.
        // TODO Maybe some test to actually connect to them
    

Well, we could fake the test socket requestor to give us a real socket and then try it. And even if the TODO is hard, I believe it should still stay there, because it is a valid test improvement.

  • Shouldn't the exception class be more specific?
        // This should not bind them, but should leave the original addresses
        EXPECT_THROW(installListenAddresses(invalid_, store_, dnss_), exception);
    

Hmm, right, it could be.

  • I don't understand this comment. What's 'everything'? Perhaps you mean 'releases' instead of 'returns'?
    // Try it at least returns everything when even the rollback fails.
    

I meant returns like returns the socket back to the socket creator/boss, but you're right this could be confused with return as from function. Releases is better.

  • I'd say there will be more error cases to test. For example, there's a case when release fails. But I'm not sure if we bother to be complete here, if our next action is to revisit the design fundamentally.

Well, because I don't now how to handle the failure mode (I don't think there's any other way than just kill the application), I don't check for it (by the principle „don't check for errors you're unable to handle“).

But maybe having the case we don't know how to handle is a signal it needs redesign as well.

I don't think we can come up with anything significantly better than this, if we are about to make it in time to release. So the question is, should we include this? I guess yes, because even a possibly suboptimal user-visible feature is still user-visible and gives an impression we do something. So, do you see any other problems that are not hard-to-solve design issues? We might maybe find a short time to discuss what we want from the configuration subsystem during the F2F, at last in 2 or 3 people.

comment:19 in reply to: ↑ 18 Changed 8 years ago by jinmei

Replying to vorner:

In short, I'd give this branch a go.

I made a few more changes while checking the updates. Most should be
trivial, but please note 3fcadc1a85ca56620c418e8b18cf43e43afbe98d,
which fixed a real regression. Ideally this should have been caught
by a test.

I don't know if immediately after the release is the best time, though. In case we would go for some more drastic receptionist approach, it could render the portconfig module deprecated, so it would make more sense to just keep it in the proof-of-concept state for a month or two before the receptionist would be implemented.

My concern with this approach is that merging something to master will
create inertia against further cleanups. If we allow an option of
(possibly) cleaning them up "some time later", that some time will
never come and we'll suffer from dirty state of the design and code.
IMO, even if it may end up being thrown away, making all efforts
waste, we shouldn't allow keeping the master code too dirty for an
undecided period. So I'd insist either: merge this with a promise of
cleaning up very soon, understanding it can delay development of other
stuff such as DDNS or NSEC3, or don't merge. I believe we need
discipline before merge to avoid the bad inertia.

Which level of cleanup is necessary can be discussed, though.

As for the receptionist approach, just like the case with msgq
replacement I don't like to use it as an excuse to skip the cleanup;
we don't even know whether we adopt it, much less whether it really
solves the issues we are thinking about now.

  • although this is not in the direct scope of this branch: while thinking about various issues I encountered in the branch (detailed below), I now wonder why we need to make SocketRequestor? a singleton by design. [...]

It is singleton now for few reasons:

  • When I started implementing it, I started with a namespace with functions, like in the case of portconfig. I thought of it like a portconfig extension. But it turned out it would be hard to test without the ability to subclass it and that there could be some variables.
  • I don't see a reason why a real application would need more than one of it.
  • I didn't want to pass the object around, at last not at the time of writing it, because it would need to modify too many files, while using it as a global variable allowed me not to change the interface of portconfig.

To me these cannot be a justifiable reason for relying on a global
instance (which is what singleton is effectively). I don't request it
be changed for now, but if these are the reason I'd more strongly
suggest revisiting it as part of the cleanup.

  • personally I'd hide gtest many of the function definitions from the header file. [...]

But then, I'd need a .cc file to place the methods with the EXPECT_* macros into. And it would introduce a problem with ordering of library compilation, as the testutils library would need to use the server_common library and the server_common tests would need to use the testutils library. Therefore we couldn't have the tests as a subdirectory of server_common, but the lib/Makefile.am would need to include something like:

SUBDIRS = … server_common testutils server_common/tests

which looks ugly.

Or is there a solution I don't see?

Ah, okay. So the mutual dependency is the more fundamental problem.
Basically, when I first created this helper library (I think it was
me) I didn't intend to introduce a reverse dependency from this
library to higher level modules such as auth/resolver servers.
server_common is essentially part of these applications, so something
that depends on server_common in the testutil lib breaks the
assumption in the first place. IMO this should go (at least
conceptually) to somewhere around server_common/tests and
auth/resolver tests refer to it, independently from whether or not to
move the definition to .cc.

I suggest this should be cleaned up in a later ticket, too.

  • While this is minor compared to the bigger issue explained in the bigger comment above, the recovery process in installListenAddresses() is now even more unreliable:
            try {
                setAddresses(service, addressStore);
            }
    
    because even releasing the address in setAddresses() may now fail.

Well, failure to release them would mean either of:

  • serious bug in the communication, which should be fixed soon
  • boss disappeared
  • msgq disappeared

These are extremes under which we would probably exit anyway, and I didn't see an easy way to do it in a more reliable way. Requesting (and returning) sockets from/to different application seems cumbersome as of itself.

That's probably true that they are extremes, but the common practice
is that releasing resource (close(), free(), delete, etc) should
generally be 100% error free even considering such extremes. So IMO
we should basically try to meet the common expectation.

I think that in the possible redesign of port config, we should
separate the error free part and other related operations that can
fail. The former is local only operations such as closing the socket
or releasing the corresponding server object. Once making that sure,
we can perform the other part, taking into account failure cases and
whether or how to recover from it.

But as you said, it's also true that the idea of the socket creator
approach itself is cumbersome. We are now seeing specific cases
around it, and the end result may be that it's not a good idea after
all for production systems. It will be a pity if that's a case, but I
think it's okay as long as we are allowed to use time for such
try-and-errors - there are something we cannot know until we actually
play with them.

I don't think we can come up with anything significantly better than this, if we are about to make it in time to release. So the question is, should we include this? I guess yes, because even a possibly suboptimal user-visible feature is still user-visible and gives an impression we do something. So, do you see any other problems that are not hard-to-solve design issues? We might maybe find a short time to discuss what we want from the configuration subsystem during the F2F, at last in 2 or 3 people.

As stated above, I'm okay with including it if we make a promise of
making the follow up cleanups very soon at a higher priority.

comment:20 Changed 8 years ago by jinmei

  • Owner changed from jinmei to vorner

comment:21 follow-up: Changed 8 years ago by vorner

Hello

I'll skip the discussion for today, as I spent most of my time trying to make it work in practice. I wasn't yet fully successful, but it gets nearer. I fixed some problems, the last commit is some work in progress while trying to find a problem. For example, the close doesn't have a check if it succeeded (but it is unclear on what to do if it didn't).

The current problem is, it leaks the socket somewhere. It provides the socket the first time, but the startup is kind of broken, so it resets the sockets once again, it returns them (according to the log messages), but it fails to create them again with „Address already in use“. If you would like to have a look (or anybody else), it's in the branch trac805-merged. Unluckily for me, I can't do much testing, I have problems installing, but I don't know exactly why.

comment:22 in reply to: ↑ 21 ; follow-up: Changed 8 years ago by jinmei

Replying to vorner:

The current problem is, it leaks the socket somewhere. It provides the socket the first time, but the startup is kind of broken, so it resets the sockets once again, it returns them (according to the log messages), but it fails to create them again with „Address already in use“. If you would like to have a look (or anybody else), it's in the branch trac805-merged. Unluckily for me, I can't do much testing, I have problems installing, but I don't know exactly why.

It seems to me that 01d2bc6 fixed the leak. I suspect, from that you
(reportedly) couldn't install it, that you tested it "from source"
using run_bind10.sh without even selectively installing the socket
creator program.

If my guess is correct, you are hit by another, unrelated bug that the
bind10.special_component module unconditionally invokes the installed
version of socket creator. If that is the case please try the
attached patch.

As for how to test SO_REUSEADDR, if it's okay to actually get the
socket the creator code we can test it using getsockopt.

I've noticed a few of other things:

  • at least per API even close(2) could fail. I believe we should catch that case and kill the socket creator should that happen.
  • same for send_fd()
  • sockcreator.cc breaks many style guidelines.

Changed 8 years ago by jinmei

comment:23 in reply to: ↑ 22 ; follow-up: Changed 8 years ago by vorner

  • Owner changed from vorner to jinmei

Hello

Replying to jinmei:

It seems to me that 01d2bc6 fixed the leak. I suspect, from that you
(reportedly) couldn't install it, that you tested it "from source"
using run_bind10.sh without even selectively installing the socket
creator program.

If my guess is correct, you are hit by another, unrelated bug that the
bind10.special_component module unconditionally invokes the installed
version of socket creator. If that is the case please try the
attached patch.

Thanks, yes, you're right. I included the patch. I'll have to solve the installation problem of mine sometime.

As for how to test SO_REUSEADDR, if it's okay to actually get the
socket the creator code we can test it using getsockopt.

I added the test. I also added another option ‒ the ipv6 only for INET6 sockets the same way, to keep compatibility. But I think we might want to rethink using this one, as the only „problem“ it solves is different default list of addresses for different systems.

  • at least per API even close(2) could fail. I believe we should catch that case and kill the socket creator should that happen.
  • same for send_fd()

I added the termination of the application. But I'm not sure if we really do want to do that, if it isn't better to leak the socket than to bring the whole bind10 down. Anyway, close shouldn't fail for the sockets in socket creator, so it probably doesn't matter.

  • sockcreator.cc breaks many style guidelines.

Well, the code there is 1.5 years old, maybe some of the guidelines are newer than that. If we want to clean it up, should it go to another ticket?

Thank you

comment:24 in reply to: ↑ 23 Changed 8 years ago by jinmei

Replying to vorner:

If my guess is correct, you are hit by another, unrelated bug that the
bind10.special_component module unconditionally invokes the installed
version of socket creator. If that is the case please try the
attached patch.

Thanks, yes, you're right. I included the patch. I'll have to solve the installation problem of mine sometime.

Okay, good to know that. And I believe we can now remove the setting
of "PATH" from run_bind10.sh. (But maybe we want to defer it to a
separate cleanup ticket to avoid unexpected regression due to that).

As for how to test SO_REUSEADDR, if it's okay to actually get the
socket the creator code we can test it using getsockopt.

I added the test. I also added another option ‒ the ipv6 only for INET6 sockets the same way, to keep compatibility. But I think we might want to rethink using this one, as the only „problem“ it solves is different default list of addresses for different systems.

The test didn't pass on my environment (I suspect it doesn't pass for
BSD variants in general). I've fixed it. See b37cfa5.

As for IPV6_V6ONLY, we need it. Maybe you are not aware of it (not
surprisingly - it's quite subtle) it does prevent various kinds of
real harm (but I wouldn't bother to discuss details now).

  • at least per API even close(2) could fail. I believe we should catch that case and kill the socket creator should that happen.
  • same for send_fd()

I added the termination of the application. But I'm not sure if we really do want to do that, if it isn't better to leak the socket than to bring the whole bind10 down. Anyway, close shouldn't fail for the sockets in socket creator, so it probably doesn't matter.

I've made slight modifications - see 61d3224. Regarding whether
aborting is the best option, we can discuss it. Personally, I'd
rather stop everything if the socket creator encounters an unexpected
error. I'd generally assume it's designed and implemented to be very
simple and robust and practically never see such cases - if we really
need to worry about cases where the creator actually has unexpected
failure and triggers the entire system stop, I'm afraid that it
indicates either the design or the implementation of the creator
should be revisited. Also, since the socket creator doesn't leave
logs, ignoring failures doesn't seem to be good in general.

  • sockcreator.cc breaks many style guidelines.

Well, the code there is 1.5 years old, maybe some of the guidelines are newer than that. If we want to clean it up, should it go to another ticket?

Most of the guidelines that it breaks (in particular: camel typing for
functions and using () for return) had existed even before you joined
BIND10.

It's easy to fix them, but at this stage it's probably better to
complete the branch first. So a separate cleanup ticket is okay.

As we mention cleanups, I have a couple of more things to fix in that
context (okay in the separate ticket):

  • I'd avoid using hardcoded magic numbers.
  • I'd really avoid the complicated macro used in the test. That's largely my personal preference, but I encountered one practical issue in this branch due to the use of macro. As noted above, some of the getsockopt tests failed for me. But gtest indicated the line where the macro was used, not the exact point in the macro where it failed, e.g.
    sockcreator_tests.cc:106: Failure
    Value of: on
      Actual: 4
    Expected: 1
    
    And line 106 (at that time, in my local modified copy) was this one:
        TEST_ANY_CREATE(SOCK_DGRAM, sockaddr_in, AF_INET, sin_family, INADDR_SET,
                        UDP_CHECK);
    
    which took me some time to identify the specific test that failed. Eliminating the use of macro may not be the only solution to such problems, but taking into account many other known bad effects of macros, I'd say this incident is a good reason for cleaning it up.

And, one last comment:

  • can this comment be now removed?
                        // FIXME: Check the output and write a test for it
    

comment:25 Changed 8 years ago by jinmei

  • Owner changed from jinmei to vorner

comment:26 Changed 8 years ago by jinmei

I've played with it more and noticed several things to be fixed.
I've committed and pushed the fixes to the trac805-merged branch.

Another thing: I don't think /tmp is the best place for the UNIX
domain socket to exchange file descriptors, especially in the
installed environment. @@LOCALSTATEDIR@@/@PACKAGE@ should be better.

comment:27 follow-up: Changed 8 years ago by vorner

  • Owner changed from vorner to jinmei

Hello

I merged to master after removing the comment you mentioned. I'll keep the ticket open for a while, I want to go through it to see if all tickets arising from it were created and if there's anything that needs further discussion.

And I noticed it is missing a changelog entry:

[func]        vorner,jelte,jinmei
The socket creator is now used to provide sockets. It means you can reconfigure the ports and addresses
at runtime even when the rest of the bind10 runs as non root user.

comment:28 in reply to: ↑ 27 ; follow-up: Changed 8 years ago by jinmei

Replying to vorner:

I merged to master after removing the comment you mentioned. I'll keep the ticket open for a while, I want to go through it to see if all tickets arising from it were created and if there's anything that needs further discussion.

It's up to you. Is there any reason you gave the ticket back to me?

And I noticed it is missing a changelog entry:

[func]        vorner,jelte,jinmei
The socket creator is now used to provide sockets. It means you can reconfigure the ports and addresses
at runtime even when the rest of the bind10 runs as non root user.

You previously proposed a changelog entry:-) But in any event this
version looks okay, too.

comment:29 in reply to: ↑ 28 Changed 8 years ago by vorner

  • Owner changed from jinmei to vorner

Hello

Replying to jinmei:

It's up to you. Is there any reason you gave the ticket back to me?

Just the changelog. I'm taking it back, now.

You previously proposed a changelog entry:-) But in any event this
version looks okay, too.

I must be blind, this fine <what time of the day it is with timeshift> O:-).

Thank you

comment:30 Changed 8 years ago by vorner

  • Resolution set to complete
  • Status changed from reviewing to closed
  • Total Hours changed from 0 to 20

Tickets created, closing this one.

Note: See TracTickets for help on using tickets.