Opened 8 years ago

Closed 8 years ago

Last modified 8 years ago

#1429 closed task (complete)

Commands for sockets

Reported by: vorner Owned by: vorner
Priority: medium Milestone: Sprint-20111220
Component: ~Boss of BIND (obsolete) Version:
Keywords: Cc:
CVSS Scoring: Parent Tickets:
Sensitive: no Defect Severity: N/A
Sub-Project: Core Feature Depending on Ticket: Socket creator
Estimated Difficulty: 0 Add Hours to Ticket: 0
Total Hours: 2.53 Internal?: no

Description

This is subtask of #802 and will connect code created in #1427 and #1428 together. The interfaces are already created as part of trac802 branch, so this doesn't have to wait for the other subtasks.

It should:

  • Create the _socket_cache as isc.bind10.socket_cache.Cache after the socket creator is created.
  • Create a CC command for requesting a socket. It should call the _socket_cache.get_token and send the token returned and the path for unix domain socket (in _socket_path) back over the CC.
  • Create a command to release a socket (identified by a corresponding token). It should call the _socket_cache.drop_socket.
  • Implement the socket_request_handler. It should call the _socket_cache.get_socket and send the socket over unix_socket. Something from the unix_socket (maybe the filenum?) should be used as the application identifier parameter of get_socket.
  • Implement the socket_consumer_dead. It should call _socket_cache.drop_application.

Subtickets

Attachments (1)

bind10-test.diff (1.6 KB) - added by jinmei 8 years ago.

Download all attachments as: .zip

Change History (20)

comment:1 Changed 8 years ago by vorner

  • Owner set to vorner
  • Status changed from new to accepted

comment:2 Changed 8 years ago by vorner

  • Owner changed from vorner to UnAssigned
  • Status changed from accepted to reviewing

It should be ready for review. It is based on the trac802 code, so it should be reviewed since 51f3cb54492ef02e4951afb15a9c40ba0cdff4ce.

There are some small edits elsewhere than required by this task, but both are needed because of tests and should be done by #1427 and #1428 and will be removed/replaced on the merge. There are both commented.

Thanks

comment:3 Changed 8 years ago by jinmei

Test didn't pass:

ERROR: test_insert_creator (__main__.TestBoB)
----------------------------------------------------------------------
Traceback (most recent call last):
  File "/Users/jinmei/src/isc/git/bind10-1429/src/bin/bind10/tests/bind10_test.py", line 337, in test_insert_creator
    self.assertIsInstance(bob._socket_cache, isc.bind10.socket_cache.Cache)
AttributeError: 'TestBoB' object has no attribute 'assertIsInstance'

======================================================================
ERROR: test_get_socket_error (__main__.TestCacheCommands)
----------------------------------------------------------------------
Traceback (most recent call last):
  File "/Users/jinmei/src/isc/git/bind10-1429/src/bin/bind10/tests/bind10_test.py", line 260, in test_get_socket_error
    check_code(1, mod_args('port', 65536))
  File "/Users/jinmei/src/isc/git/bind10-1429/src/bin/bind10/tests/bind10_test.py", line 250, in check_code
    self.assertIsInstance(ranswer, str)
AttributeError: 'TestCacheCommands' object has no attribute 'assertIsInstance'

======================================================================
ERROR: test_get_socket_ok (__main__.TestCacheCommands)
----------------------------------------------------------------------
Traceback (most recent call last):
  File "/Users/jinmei/src/isc/git/bind10-1429/src/bin/bind10/tests/bind10_test.py", line 229, in test_get_socket_ok
    self.assertIsInstance(addr, IPAddr)
AttributeError: 'TestCacheCommands' object has no attribute 'assertIsInstance'

comment:4 Changed 8 years ago by jinmei

  • Owner changed from UnAssigned to vorner

comment:5 Changed 8 years ago by vorner

  • Owner changed from vorner to jinmei

Ah, sorry, of course. Python 3.1 doesn't have isInstance, replaced by assertTrue(isinstance).

comment:6 follow-up: Changed 8 years ago by jinmei

bind10_src.py

  • command_handler() is getting quite big. Maybe we should start considering refactoring it, e.g., using the command design pattern. but in any event it's not a direct subject of this ticket. this is just a comment at a higher level.
  • I don't know the details of socket cache's drop_socket() implementation, but depending on how it handles invalid input, isn't it simpler to let it check the case where 'token' isn't specified?
                elif command == "drop_socket":
                    try:
                        self._socket_cache.drop_socket(args.get("token"))
                        answer = isc.config.ccsession.create_answer(0)
                    except Exception as e:
                        answer = isc.config.ccsession.create_answer(1, str(e))
    
    
  • socket_request_handler
    • Why is logging still TODO? Same for socket_consumer_dead()
    • I don't like to see a magic number (or string) like "1\n" or "0\n" hardcoded this way.
    • I don't know how exactly this method is used, but get_socket()/sendall()/send_fd() could block, right? Is it okay for the boss to hang due to an unresponsive remote end? This may not only be for the socket passing, but would also be an issue for, e.g., command interface even if it's not hang but delay, but as we introduce more communication channels/blocking points, I think we should start considering the effect more seriously. Not necessarily a blocking issue for this ticket, though.
  • socket_consumer_dead()
    • is it okay not to catch exceptions?
  • if the argument to drop_application() is the same one as the 'application' parameter of get_socket(), the assumptions seem to be consistent: here it assumes to be an FD; in the latter it seems to be undecided yet according to the docstring.
  • insert_creator(): maybe set_creator() is a better name based on general convention used throughout the BIND 10 code

bind10_test.py

  • Maybe we should rather import everything from bind10_src now?
    from bind10_src import *
    #from bind10_src import ProcessInfo, BoB, parse_args, dump_pid, unlink_pid_file, _BASETIME
    #import bind10_src
    
  • "get_socket" test: I'd use an IPv6 address for this test (so we can cover both the future and mostly-deprecated versions of IP). We'll need to replace _socket_cache.get_token to make it meaningful.

comment:7 Changed 8 years ago by jinmei

  • Owner changed from jinmei to vorner

comment:8 Changed 8 years ago by jinmei

Forgot to mention/push this: I've noticed some minor editorial issues
and directly fixed them on the branch.

comment:9 in reply to: ↑ 6 ; follow-up: Changed 8 years ago by vorner

  • Owner changed from vorner to jinmei

Hello

Replying to jinmei:

  • command_handler() is getting quite big. Maybe we should start considering refactoring it, e.g., using the command design pattern. but in any event it's not a direct subject of this ticket. this is just a comment at a higher level.

Yes, it might be nice to have a dict name:function and pick them from there, or something like this. But it's not something I'd like to include here, while simple, it will be change with large diff and will take nontrivial time.

  • I don't know the details of socket cache's drop_socket() implementation, but depending on how it handles invalid input, isn't it simpler to let it check the case where 'token' isn't specified?
                elif command == "drop_socket":
                    try:
                        self._socket_cache.drop_socket(args.get("token"))
                        answer = isc.config.ccsession.create_answer(0)
                    except Exception as e:
                        answer = isc.config.ccsession.create_answer(1, str(e))
    
    

Well, it would certainly be slightly simpler, but the error message would be less descriptive, as it would say something like there's no socket identified by token None. On the other hand, user probably should never call this method manually, so it's a question what we want to do.

  • socket_request_handler
    • Why is logging still TODO? Same for socket_consumer_dead()

It was caused by inherently unreliable human memory. It should be fixed now.

  • I don't know how exactly this method is used, but get_socket()/sendall()/send_fd() could block, right? Is it okay for the boss to hang due to an unresponsive remote end? This may not only be for the socket passing, but would also be an issue for, e.g., command interface even if it's not hang but delay, but as we introduce more communication channels/blocking points, I think we should start considering the effect more seriously. Not necessarily a blocking issue for this ticket, though.

Yes, they could block. Actually, get_socket can't block, it's get_token which can, as it talks to the socket creator.

However, I believe that:

  • Socket creator is on the same machine and the chance of it to lock up is as low as having an infinite loop in the boss itself (or I could think even lower, as the boss is more complicated code). So the only delay here would be two context switches, which is less than is needed for a command to pass to remote application and answer to return.
  • The sendall and send_fd ‒ the boss is sending only small amounts of data. It should fit into the OS buffer of the socket without any problems. If the remote application locks up, it doesn't request any new sockets, so the buffer wouldn't fill.
  • There'll be only small number of sockets requested and only on startup and reconfiguration, so even if there are small delays, they shouldn't cause any performance problems.

On the other hand, if we were to make sure they don't block, the handling around select would become significantly more complicated or we would need to introduce threads, with some kind of locking, etc.

My assumptions may be wrong, but if they are right, then solving the problem would be waste of time. So I'd propose waiting with the solution until we find out it is a problem. Would it be OK?

  • socket_consumer_dead()
    • is it okay not to catch exceptions?

Ups, you're right. It's maybe even worse than you thought, as the exception can happen in normal operation and is perfectly OK. Added.

  • if the argument to drop_application() is the same one as the 'application' parameter of get_socket(), the assumptions seem to be consistent: here it assumes to be an FD; in the latter it seems to be undecided yet according to the docstring.

Hmm, I fixed the docstring in the cache. Or did you mean some other? The cache itself doesn't care, so it was actually decided in this ticket.

  • insert_creator(): maybe set_creator() is a better name based on general convention used throughout the BIND 10 code

OK, no problem.

bind10_test.py

  • Maybe we should rather import everything from bind10_src now?
    from bind10_src import *
    #from bind10_src import ProcessInfo, BoB, parse_args, dump_pid, unlink_pid_file, _BASETIME
    #import bind10_src
    

The lower import of bind10_src is there because I need to replace a call from library that one uses, this one:

bind10_src.libutil_io_python.send_fd = self.__send_fd

I'm not sure if it worked if it was imported by * (eg. if * imports the things the other module imports into itself), but even if it did, I thing it would be quite unreadable. This intention is more visible I believe.

  • "get_socket" test: I'd use an IPv6 address for this test (so we can cover both the future and mostly-deprecated versions of IP). We'll need to replace _socket_cache.get_token to make it meaningful.

Hmm, I'm not sure if I got the correct idea which test you mean. The stuff in this ticket only passes the address through (OK, it creates the IPAddr from the string), so it doesn't really matter which address it is. It is distinguished in the IPAddr class (which is tested) and deep down in the socket creator class that encodes it to the socket and sends it to the socket creator process (which is tested too). Did you mean this test, or another?

Thank you

With regard

comment:10 in reply to: ↑ 9 ; follow-up: Changed 8 years ago by jinmei

Replying to vorner:

  • socket_request_handler
    • Why is logging still TODO? Same for socket_consumer_dead()

It was caused by inherently unreliable human memory. It should be fixed now.

  • BIND10_LOST_SOCKET_CONSUMER doesn't include the parameter in the message description, resulting in:
    BIND10_LOST_SOCKET_CONSUMER consumer of sockets disconnected, considering all its sockets closed @@Missing placeholder %1 for '42'@@
    
    (If you normally don't do this) I suggest trying to have the program print newly added log message by 'B10_LOGGER_DESTINATION=stdout make check | grep <msg_id>'. This is my usual practice when I add a new log ID.
  • Also about BIND10_LOST_SOCKET_CONSUMER: I'd not categorize it as an "error". Even though it may be somewhat unexpected if the remote process is abruptly terminated, it can happen and in the boss's point of view in this context it's an expected event that the connection is reset in this case. So I'd suggest using INFO in this case.
  • For a similar reason I'm not sure if "error" is good for BIND10_NO_SOCKET, although this case is mooter because it might be an internal error, too. I'd still think info is better if the "most common reason" is outside the boss process, but I'd leave it to you.
  • As for the description of BIND10_LOST_SOCKET_CONSUMER, if the applications are generally expected to gently release the requested sockets before shutting down themselves, I'd clarify that "terminated" is something like unexpected crash or something.
  • I don't know how exactly this method is used, but get_socket()/sendall()/send_fd() could block, right? [...]

Yes, they could block. Actually, get_socket can't block, it's get_token which can, as it talks to the socket creator.

However, I believe that:

[...]

On the other hand, if we were to make sure they don't block, the handling around select would become significantly more complicated or we would need to introduce threads, with some kind of locking, etc.

My assumptions may be wrong, but if they are right, then solving the problem would be waste of time. So I'd propose waiting with the solution until we find out it is a problem. Would it be OK?

My experiences tell me that such optimistic assumptions on system
reliability are often broken with a surprise in production systems of
the real world. But in this case I agree that the possible concerns
do not outweigh the additional implementation complexity (in fact I
didn't mean we should address this particular blocking issue within
this ticket). My suggestion is to leave some comments about
considerations on blocking for now.

  • socket_consumer_dead()
    • is it okay not to catch exceptions?

Ups, you're right. It's maybe even worse than you thought, as the exception can happen in normal operation and is perfectly OK. Added.

Okay, but is it okay to only catch ValueError??

  • if the argument to drop_application() is the same one as the 'application' parameter of get_socket(), the assumptions seem to be consistent: here it assumes to be an FD; in the latter it seems to be undecided yet according to the docstring.

Hmm, I fixed the docstring in the cache. Or did you mean some other? The cache itself doesn't care, so it was actually decided in this ticket.

Oops I meant "inconsistent" here...anyway, the revised version looks
okay on this point.

bind10_test.py

  • Maybe we should rather import everything from bind10_src now?
    from bind10_src import *
    #from bind10_src import ProcessInfo, BoB, parse_args, dump_pid, unlink_pid_file, _BASETIME
    #import bind10_src
    

The lower import of bind10_src is there because I need to replace a call from library that one uses, this one:

bind10_src.libutil_io_python.send_fd = self.__send_fd

I'm not sure if it worked if it was imported by * (eg. if * imports the things the other module imports into itself), but even if it did, I thing it would be quite unreadable. This intention is more visible I believe.

It just looked a bit awkward to me to have two imports of the same
module, at least in that the policy of whether to omit the
"bind10_src" prefix is inconsistent. Not a big deal anyway, but may
be adding comments like this?

# In some cases we allow omitting "bind10_src" for brevity, in some
# other cases we'll be more explicit to make it clear it belongs to
# the bind10_src module.
from bind10_src import ProcessInfo, BoB, parse_args, dump_pid, unlink_pid_file, _BASETIME
import bind10_src
  • "get_socket" test: I'd use an IPv6 address for this test (so we can cover both the future and mostly-deprecated versions of IP). We'll need to replace _socket_cache.get_token to make it meaningful.

Hmm, I'm not sure if I got the correct idea which test you mean. The stuff in this ticket only passes the address through (OK, it creates the IPAddr from the string), so it doesn't really matter which address it is. It is distinguished in the IPAddr class (which is tested) and deep down in the socket creator class that encodes it to the socket and sends it to the socket creator process (which is tested too). Did you mean this test, or another?

I meant this one:

        bob._get_socket = get_socket
        args = {
            "port": 53,
            "address": "0.0.0.0",
            "protocol": "UDP",
            "share_mode": "ANY",
            "share_name": "app"
        }
        # Test it just returns whatever it got. The real function doesn't
        # work like this, but we don't want the command_handler to touch it
        # at all and this is the easiest way to check.
        self.assertEqual({'result': [0, args]},
                         bob.command_handler("get_socket", args))

And, recognizing you changed the IPv4 address used in
TestCacheCommands? to an IPv6 address, what I would have suggested is
something like an attached diff.

Changed 8 years ago by jinmei

comment:11 Changed 8 years ago by jinmei

  • Owner changed from jinmei to vorner

comment:12 in reply to: ↑ 10 ; follow-up: Changed 8 years ago by vorner

  • Owner changed from vorner to jinmei

Hello

Replying to jinmei:

  • BIND10_LOST_SOCKET_CONSUMER doesn't include the parameter in the message description, resulting in:
    BIND10_LOST_SOCKET_CONSUMER consumer of sockets disconnected, considering all its sockets closed @@Missing placeholder %1 for '42'@@
    
    (If you normally don't do this) I suggest trying to have the program print newly added log message by 'B10_LOGGER_DESTINATION=stdout make check | grep <msg_id>'. This is my usual practice when I add a new log ID.

I often am too lazy to do so, as I need to read the documentation of the logging system to know what the correct name of the environment variable is :-|. I'll have to write a script to do it for me.

  • As for the description of BIND10_LOST_SOCKET_CONSUMER, if the applications are generally expected to gently release the requested sockets before shutting down themselves, I'd clarify that "terminated" is something like unexpected crash or something.

Well, it doesn't have to crash. Even if it returns all the sockets correctly, it still needs to close the unix-domain socket and the boss doesn't distinguish the difference. Maybe it could, but I prefer not to as it doesn't seem to bring anything important functionality-wise.

  • socket_consumer_dead()
    • is it okay not to catch exceptions?

Ups, you're right. It's maybe even worse than you thought, as the exception can happen in normal operation and is perfectly OK. Added.

Okay, but is it okay to only catch ValueError??

The function is not supposed to raise anything else. It should be simple enough and touches only local data. So, if it raises anything else, it is either a programming error or something really nasty, like out-of-memory. I think it is best to just let it go through and boss to terminate.

I meant this one:

        bob._get_socket = get_socket
        args = {
            "port": 53,
            "address": "0.0.0.0",
            "protocol": "UDP",
            "share_mode": "ANY",
            "share_name": "app"
        }
        # Test it just returns whatever it got. The real function doesn't
        # work like this, but we don't want the command_handler to touch it
        # at all and this is the easiest way to check.
        self.assertEqual({'result': [0, args]},
                         bob.command_handler("get_socket", args))

And, recognizing you changed the IPv4 address used in
TestCacheCommands? to an IPv6 address, what I would have suggested is
something like an attached diff.

That slightly changes the semantics of the test, but I don't oppose. I cleaned it a little bit.

Thanks

comment:13 in reply to: ↑ 12 Changed 8 years ago by jinmei

Replying to vorner:

Looks okay, except this one:

this ticket). My suggestion is to leave some comments about
considerations on blocking for now.

which just seems to be forgotten (it's not even clear whether you
agreed or disagreed with the suggestion).

comment:14 Changed 8 years ago by jinmei

  • Owner changed from jinmei to vorner

comment:15 follow-up: Changed 8 years ago by vorner

  • Owner changed from vorner to jinmei

Ah, sorry, I overlooked it. It kind of looked like you agree with my intention not solving it in this ticket, so I probably didn't read it thoroughfully enough. Added.

comment:16 in reply to: ↑ 15 Changed 8 years ago by jinmei

Replying to vorner:

Ah, sorry, I overlooked it. It kind of looked like you agree with my intention not solving it in this ticket, so I probably didn't read it thoroughfully enough. Added.

Okay, please feel free to merge.

comment:17 Changed 8 years ago by jinmei

  • Owner changed from jinmei to vorner

comment:18 Changed 8 years ago by vorner

  • Resolution set to complete
  • Status changed from reviewing to closed

comment:19 Changed 8 years ago by jinmei

  • Total Hours changed from 0 to 2.53
Note: See TracTickets for help on using tickets.