Opened 9 years ago

Closed 9 years ago

#352 closed enhancement (fixed)

Improve Xfrout request handling strategy

Reported by: zzchen_pku Owned by: zhanglikun
Priority: medium Milestone:
Component: xfrout Version:
Keywords: Cc:
CVSS Scoring: Parent Tickets:
Sensitive: no Defect Severity:
Sub-Project: Feature Depending on Ticket:
Estimated Difficulty: 0.0 Add Hours to Ticket: 0
Total Hours: 1.5 Internal?: no

Description

Xfrout server polls for shutdown every 0.1 seconds. Polling reduces Xfrout server's responsiveness to a shutdown request and wastes cpu at all other times. Consider using interrupt-driven instead of polling.

Subtickets

Change History (17)

comment:1 Changed 9 years ago by zhanglikun

  • Owner set to jinmei
  • Status changed from new to reviewing

Hi jinmei, would you like to have a review the branch? since the problem was reported by you, :). Code has finished in trac352. Thanks

comment:2 in reply to: ↑ description ; follow-up: Changed 9 years ago by jinmei

  • Owner changed from jinmei to zzchen_pku

Replying to zzchen_pku:

Xfrout server polls for shutdown every 0.1 seconds. Polling reduces Xfrout server's responsiveness to a shutdown request and wastes cpu at all other times. Consider using interrupt-driven instead of polling.

The idea looks good, but I have some possibly substantial comments about the implementation.

serve_mixin.py

  • I think we need more documentation about this class. there seem to be many hidden assumptions, which are not clear from the code itself and the current documentation. For example, it seems this class is assumed to be used with socketserver.ThreadingMixIn?, but it's not clear. It's also not clear whether we expect to have multiple (derived) instances of this mixin class. Also non obvious is what we'd expect with poll_interval, which is unused. Overall, we need to descrive the intention of the class, expected usage, and other non trivial implementation decisions.
  • Related to the point of whether we have multiple instances, is it okay to make _is_shut_down and _read/write_sock class variables? If I understana it correctly they will be shared by all instances, and I suspect it would cause undesirable effects.
  • we may want to declare ServeMixIn?._serving as "private" more clearly by renaming it to serving
  • Related to the documentation issue, just from the code the purpose of the poll_interval of serve_forever() isn't clear because it's unused. we need some note in comments about this point. Also, I'm not sure whether we need to bother to provide the default parameter.
  • There's inconsistency between comment and code:
                # block until the self.socket or self._read_sock is readable 
                try:
                    r, w, e = select.select([self, self._read_sock], [], [])
    
    comment says self.socket whle the code uses just self. Also, while not incorrect, having "self" (which is a socketserver object) and "_read_sock" (which is a socket object) in a list seems a bit awkward.
  • I'd add a sanity check for the received data from the socket pair, i.e., check to see if the received data is 'anydata'. (and I'd define a constant variable to minimize hardcoding the magic term)

serve_mixin_test.py

  • I suggest replace 'localhost' with 127.0.0.1. the former may require transaction with an external DNS server, which may block and/or fail.

utils/tests/Makefile.am

  • there's a typo that makes the test fail. I've fixed it in the branch (r3138)

ChangeLog

  • is the "author" (=zhang likun) correct?

comment:3 Changed 9 years ago by jinmei

  • Add Hours to Ticket changed from 0.0 to 1.5
  • Total Hours changed from 0.0 to 1.5

comment:4 in reply to: ↑ 2 ; follow-up: Changed 9 years ago by zhanglikun

Replying to jinmei:

Replying to zzchen_pku:

Xfrout server polls for shutdown every 0.1 seconds. Polling reduces Xfrout server's responsiveness to a shutdown request and wastes cpu at all other times. Consider using interrupt-driven instead of polling.

The idea looks good, but I have some possibly substantial comments about the implementation.

serve_mixin.py

  • I think we need more documentation about this class. there seem to be many hidden assumptions, which are not clear from the code itself and the current documentation. For example, it seems this class is assumed to be used with socketserver.ThreadingMixIn?, but it's not clear. It's also not clear whether we expect to have multiple (derived) instances of this mixin class. Also non obvious is what we'd expect with poll_interval, which is unused. Overall, we need to descrive the intention of the class, expected usage, and other non trivial implementation decisions.

Agree, will do as your suggestion.

  • Related to the point of whether we have multiple instances, is it okay to make _is_shut_down and _read/write_sock class variables? If I understana it correctly they will be shared by all instances, and I suspect it would cause undesirable effects.

it's class variable, but it was used as self._is_shut_down, then the class variable will change to instance variable, so it will not cause undesirable effects. Also, this is what I learned from ThreadingMixIn?() in python code.

  • we may want to declare ServeMixIn?._serving as "private" more clearly by renaming it to serving

yes. I will do some test, since TCPServer() also has that variable with the same name.

  • Related to the documentation issue, just from the code the purpose of the poll_interval of serve_forever() isn't clear because it's unused. we need some note in comments about this point. Also, I'm not sure whether we need to bother to provide the default parameter.

I just want to keep the interface be same, since serve_forever() in base class has the parameter "poll_interval", though it never be used in the mixin class. I should comment it in the code

  • There's inconsistency between comment and code:
                # block until the self.socket or self._read_sock is readable 
                try:
                    r, w, e = select.select([self, self._read_sock], [], [])
    
    comment says self.socket whle the code uses just self. Also, while not incorrect, having "self" (which is a socketserver object) and "_read_sock" (which is a socket object) in a list seems a bit awkward.

I also feel awkward at begining, since it was copied from python's source code, after checking the documentation of select(), it says The first three arguments are sequences of ‘waitable objects’: either integers representing file descriptors or objects with a parameterless method named fileno() returning such an integer:
. I don't know whether I should change self to self.fileno(), since it's really python style.

  • I'd add a sanity check for the received data from the socket pair, i.e., check to see if the received data is 'anydata'. (and I'd define a constant variable to minimize hardcoding the magic term)

Yes, agree. will do it.

serve_mixin_test.py

  • I suggest replace 'localhost' with 127.0.0.1. the former may require transaction with an external DNS server, which may block and/or fail.

Good suggestion

utils/tests/Makefile.am

  • there's a typo that makes the test fail. I've fixed it in the branch (r3138)

Thanks for your fixing. How did I make this mistake?, :)

ChangeLog

  • is the "author" (=zhang likun) correct?

Right, jinmei, it was did by me.

comment:5 in reply to: ↑ 4 Changed 9 years ago by zhanglikun

  • Related to the point of whether we have multiple instances, is it okay to make _is_shut_down and _read/write_sock class variables? If I understana it correctly they will be shared by all instances, and I suspect it would cause undesirable effects.

it's class variable, but it was used as self._is_shut_down, then the class variable will change to instance variable, so it will not cause undesirable effects. Also, this is what I learned from ThreadingMixIn?() in python code.

I just find I made a mistake, I will move these class variable to be object variable.

comment:6 follow-up: Changed 9 years ago by zhanglikun

  • Owner changed from zzchen_pku to jinmei

The lastest change has been committed, see r3149, please have a review. thanks.

comment:7 in reply to: ↑ 6 ; follow-up: Changed 9 years ago by jinmei

  • Owner changed from jinmei to zhanglikun

Replying to zhanglikun:

The lastest change has been committed, see r3149, please have a review. thanks.

First, there are some redundant white spaces after EOLs. I've removed them in r3167.

Second, now that we've seen real race conditions (Trac #335), I've re-reviewed the entire code more carefully, and have had additional concerns. Revised review comments with the new issues follow:

  • I don't think documentation is sufficient yet. it still doesn't talk about why and when we need this mixin; it doesn't explain the seeming assumption of using threads (in the description of the ServeMixIn? class). I suggest you refer to socketserver.py and provide that level of details.
  • I still don't see why we need to give a specific default value (i.e. 0.5) to poll_interval. since it's not used, I'd specify None.
  • this operation seems to be redundant:
            self.__is_shut_down.clear()
    
    because at this point __is_shut_down should always be cleared.
  • do we need self.__serving? in which case do we need it even if we already have synchronization mechanisms? note also that accessing self.__serving isn't protected and can cause a race condition (consider, e.g. the case shutdown() is called immediately after invoking the thread, then serve_forever() is called from the thread. __serving will never be False.).
  • I'm afraid this raise is dangerous:
                except select.error as err:
                    if err.args[0] != EINTR:
                        raise
    
    consider the case where shutdown is called and the parent thread is waiting on __is_shut_down. what if the above raise is triggered under this condition? doesn't it cause a deadlock?

comment:8 in reply to: ↑ 7 ; follow-up: Changed 9 years ago by zhanglikun

  • Owner changed from zhanglikun to jinmei

Replying to jinmei:

Replying to zhanglikun:

The lastest change has been committed, see r3149, please have a review. thanks.

First, there are some redundant white spaces after EOLs. I've removed them in r3167.

Thanks for your fix, :)

Second, now that we've seen real race conditions (Trac #335), I've re-reviewed the entire code more carefully, and have had additional concerns. Revised review comments with the new issues follow:

The race condition problem should has been fixed in r3270, please have a look.

  • I don't think documentation is sufficient yet. it still doesn't talk about why and when we need this mixin; it doesn't explain the seeming assumption of using threads (in the description of the ServeMixIn? class). I suggest you refer to socketserver.py and provide that level of details.

Update some documentation. the mixin doesn't assumpt of using threads.

  • I still don't see why we need to give a specific default value (i.e. 0.5) to poll_interval. since it's not used, I'd specify None.

Already done as you suggested in the latest revision.

  • this operation seems to be redundant:
            self.__is_shut_down.clear()
    
    because at this point __is_shut_down should always be cleared.

Good suggestion. :) thanks

  • do we need self.__serving? in which case do we need it even if we already have synchronization mechanisms? note also that accessing self.__serving isn't protected and can cause a race condition (consider, e.g. the case shutdown() is called immediately after invoking the thread, then serve_forever() is called from the thread. __serving will never be False.).

race condition should has been fixed in the revision.

  • I'm afraid this raise is dangerous:
                except select.error as err:
                    if err.args[0] != EINTR:
                        raise
    
    consider the case where shutdown is called and the parent thread is waiting on __is_shut_down. what if the above raise is triggered under this condition? doesn't it cause a deadlock?

the raise has been removed, if the exception select.error, just continue.

comment:9 in reply to: ↑ 8 ; follow-up: Changed 9 years ago by jinmei

First off, if I understand it correctly, the change to the mixin requires changes to other part of the code (e.g. xfrout), but they've not been modified.

Secondly, while I re-read the documentation of the standard socketserver library, I noticed serve_forever() isn't expected to be overridden.

class TCPServer(BaseServer):
    ...
    Methods for the caller:
    ...
    - serve_forever(poll_interval=0.5)
    ...
    Methods that may be overridden:
    ...

Doesn't this mean this mixin implementation is somehow in a gray zone? That is, even if it may work with the current socketserver implementation it may not be guaranteed for future versions?

The lastest change has been committed, see r3149, please have a review. thanks.

First, there are some redundant white spaces after EOLs. I've removed them in r3167.

Thanks for your fix, :)

The latest revision introduced additional redundant white spaces. Fixed in r3287. I also fixed a minor typo in r3289.

I've noticed this is a repeated style issue in diffs committed by CNNIC developers. Perhaps there's some Chinese character-specific setting in your editor or IDE that tends to this insert the spaces at EOL? You might want to check your local configuration.

(My editor automatically removes such redundant spaces when I continue to a next line, so it's rare for me to see this type of style issue even if I don't carefully type)

Second, now that we've seen real race conditions (Trac #335), I've re-reviewed the entire code more carefully, and have had additional concerns. Revised review comments with the new issues follow:

The race condition problem should has been fixed in r3270, please have a look.

The race seems to be fixed. But I still see some substantial issues with the implementation. See below.

  • I don't think documentation is sufficient yet. it still doesn't talk about why and when we need this mixin; it doesn't explain the seeming assumption of using threads (in the description of the ServeMixIn? class). I suggest you refer to socketserver.py and provide that level of details.

Update some documentation. the mixin doesn't assumpt of using threads.

It's still not sufficient to me. IMO it still doesn't explain "why and when" sufficiently. It also doesn't explain the usage of the caller side (according to the newest test, it seems the caller is assumed to call server.serve_forever(). it's not obvious). It doesn't explain shutdown() cannot be called from the handler, which is internally invoked in a separate thread, because it would cause a dead lock (if I understand it correctly). the socketserver documentation describes that. Again, we need to provide the same level of details. This also seems to be another evidence that we are not so experienced with thread programming.

There's one other minor editorial issue: why did you change socketserver.TCPServer to socketserver::TCPServer? It seems to me the former is correct for python.

  • do we need self.__serving? in which case do we need it even if we already have synchronization mechanisms? note also that accessing self.__serving isn't protected and can cause a race condition (consider, e.g. the case shutdown() is called immediately after invoking the thread, then serve_forever() is called from the thread. __serving will never be False.).

race condition should has been fixed in the revision.

Fixing a clear bug like a race is a bottom line, but the more fundamental question is whether self.__serving is necessary in the first place. I still don't see why we need it. A mutable variable is always a source of troubles, so if we don't need one we should eliminate it.

  • I'm afraid this raise is dangerous:
                except select.error as err:
                    if err.args[0] != EINTR:
                        raise
    
    consider the case where shutdown is called and the parent thread is waiting on __is_shut_down. what if the above raise is triggered under this condition? doesn't it cause a deadlock?

the raise has been removed, if the exception select.error, just continue.

I'm not sure if continue is the correct behavior. I actually don't know what kind of error other than EINTR is expected in this context, but if it's a permanent error (e.g. due to some broken state in the socket), the error will repeat and this loop continues infinitely (until we get a shutdown message). We should clarify which errors we can receive here, and continue only for transient errors (which may be none or all).

comment:10 Changed 9 years ago by jinmei

  • Owner changed from jinmei to zhanglikun

comment:11 in reply to: ↑ 9 ; follow-ups: Changed 9 years ago by zhanglikun

  • Owner changed from zhanglikun to jinmei

Replying to jinmei:

First off, if I understand it correctly, the change to the mixin requires changes to other part of the code (e.g. xfrout), but they've not been modified.

After some hint from your review result, I changed the serve_forever() and shutdown() logic back. see r3298. so now we don't need to touch of the code of xfrout/cmdctl. Also, I realized the best way for avoid race condition is: avoid use the variable that's cause the problem.

Secondly, while I re-read the documentation of the standard socketserver library, I noticed serve_forever() isn't expected to be overridden.

class TCPServer(BaseServer):
    ...
    Methods for the caller:
    ...
    - serve_forever(poll_interval=0.5)
    ...
    Methods that may be overridden:
    ...

Doesn't this mean this mixin implementation is somehow in a gray zone? That is, even if it may work with the current socketserver implementation it may not be guaranteed for future versions?

Yes, the documentation say serve_forever() and shutdown() are not expected to be overridden. But also there is some TODO in the socketserver.serve_forever(): see

            # XXX: Consider using another file descriptor or
            # connecting to the socket to wake this up instead of
            # polling. Polling reduces our responsiveness to a
            # shutdown request and wastes cpu at all other times.

so we have to do it by ourselves before python fix the problem, and again, once python fix it, we can just drop this mixin.

I've noticed this is a repeated style issue in diffs committed by CNNIC developers. Perhaps there's some Chinese character-specific setting in your editor or IDE that tends to this insert the spaces at EOL? You might want to check your local configuration.

(My editor automatically removes such redundant spaces when I continue to a next line, so it's rare for me to see this type of style issue even if I don't carefully type)

Thanks for your fix, and the spaces at EOL was left when I edited the function description, some spaces between two words was not removed.

It's still not sufficient to me. IMO it still doesn't explain "why and when" sufficiently. It also doesn't explain the usage of the caller side (according to the newest test, it seems the caller is assumed to call server.serve_forever(). it's not obvious). It doesn't explain shutdown() cannot be called from the handler, which is internally invoked in a separate thread, because it would cause a dead lock (if I understand it correctly). the socketserver documentation describes that. Again, we need to provide the same level of details. This also seems to be another evidence that we are not so experienced with thread programming.

There's one other minor editorial issue: why did you change socketserver.TCPServer to socketserver::TCPServer? It seems to me the former is correct for python.

why and when ==
{{

Mix-In class to override the function serve_forever()
and shutdown() in class socketserver::TCPServer.

serve_forever() in socketserver.TCPServer use polling

for checking request, which reduces the responsiveness to
the shutdown request and wastes cpu at all other times.

}}

I can't find any better words to describe this mixin, I would appreciate your better words for 'why and when' if you can give. thanks.

Fixing a clear bug like a race is a bottom line, but the more fundamental question is whether self.__serving is necessary in the first place. I still don't see why we need it. A mutable variable is always a source of troubles, so if we don't need one we should eliminate it.

yes, this is the hint for the latest revision r3298

I'm not sure if continue is the correct behavior. I actually don't know what kind of error other than EINTR is expected in this context, but if it's a permanent error (e.g. due to some broken state in the socket), the error will repeat and this loop continues infinitely (until we get a shutdown message). We should clarify which errors we can receive here, and continue only for transient errors (which may be none or all).

I also don't know what kind of error other than EINTR will happen, but this time I change the code as print the error and break. Not sure it's the best way.

comment:12 in reply to: ↑ 11 Changed 9 years ago by jinmei

  • Owner changed from jinmei to zhanglikun

Unless I misunderstand something, the latest version reintroduces the problem it originally tried to solve, i.e., we now rely on a busy loop with polling again (in socketserver.serve_forever()).

comment:13 Changed 9 years ago by zhanglikun

there are three suggestion from jinmei:

  1. sleep(0.1) in the test case may not enough, the better way should to check whether self.is_shut_down is set?
  2. remove the line server_thread.setDaemon(True) in the test case, since it's not used now.
  3. try to update the serve_forever()/shutdown() logic used by zonemgr and notify-out now(should be a seperate ticket, since ticket 335 has been closed). maybe it can share the same logic with this mixin.

comment:14 in reply to: ↑ 11 ; follow-up: Changed 9 years ago by jinmei

Replying to zhanglikun:

why and when ==
{{

Mix-In class to override the function serve_forever()
and shutdown() in class socketserver::TCPServer.

serve_forever() in socketserver.TCPServer use polling

for checking request, which reduces the responsiveness to
the shutdown request and wastes cpu at all other times.

}}

I can't find any better words to describe this mixin, I would appreciate your better words for 'why and when' if you can give. thanks.

I've committed suggested text (r3324), and I have related additional suggestions:

  • I suggest the name of the mix-in class to NoPollMixIn? (documentation was revised with the new name) because "ServeMixIn?" is too generic. IMO the class name should briefly indicate the role of the class.
  • I also suggest change the module name from serve_mixin to socketserver_mixin so that it will be clear this mixin will be used with the socketserver module.

And one related comment: I believe what this class overrides is actually "BaseServer?" instead of "TCPServer" because, as far as I can see, none of the mix-in code assume specific part of TCPServer. Suggested documentation was updated on that point, too.

Please feel free to update the suggested doc further. In particular, whether you agree with the suggested class name or not, please make the documentation and the actual class name consistent (which are currently not).

I'm not sure if continue is the correct behavior. I actually don't know what kind of error other than EINTR is expected in this context, but if it's a permanent error (e.g. due to some broken state in the socket), the error will repeat and this loop continues infinitely (until we get a shutdown message). We should clarify which errors we can receive here, and continue only for transient errors (which may be none or all).

I also don't know what kind of error other than EINTR will happen, but this time I change the code as print the error and break. Not sure it's the best way.

I think just do 'break' is okay for now, but don't think it a good idea to dump the error to stderr since we cannot assume how this mixin is used. Ideally, the serve_forever() thread sends an exit code to the waiting thread, and the waiting thread raises an exception if something went wrong. But that may be too much at the moment. So we should probably simply remove this line for now:

                    sys.stderr.write("Error with select(), %s\n", err)

comment:15 in reply to: ↑ 14 ; follow-up: Changed 9 years ago by zhanglikun

  • Owner changed from zhanglikun to jinmei

Replying to jinmei:

Replying to zhanglikun:
I've committed suggested text (r3324), and I have related additional suggestions:

  • I suggest the name of the mix-in class to NoPollMixIn? (documentation was revised with the new name) because "ServeMixIn?" is too generic. IMO the class name should briefly indicate the role of the class.
  • I also suggest change the module name from serve_mixin to socketserver_mixin so that it will be clear this mixin will be used with the socketserver module.

And one related comment: I believe what this class overrides is actually "BaseServer?" instead of "TCPServer" because, as far as I can see, none of the mix-in code assume specific part of TCPServer. Suggested documentation was updated on that point, too.

Please feel free to update the suggested doc further. In particular, whether you agree with the suggested class name or not, please make the documentation and the actual class name consistent (which are currently not).

I like the two names, see the revision r3346 and r3347.

I think just do 'break' is okay for now, but don't think it a good idea to dump the error to stderr since we cannot assume how this mixin is used. Ideally, the serve_forever() thread sends an exit code to the waiting thread, and the waiting thread raises an exception if something went wrong. But that may be too much at the moment. So we should probably simply remove this line for now:

                    sys.stderr.write("Error with select(), %s\n", err)

yes,agree. has committed in r3346.

comment:16 in reply to: ↑ 15 Changed 9 years ago by jinmei

  • Owner changed from jinmei to zhanglikun

I'd rename utils/tests/serve_mixin_test.py to socketserver_mixin_test.py.

Other than that the branch looks okay.

comment:17 Changed 9 years ago by zhanglikun

  • Resolution set to fixed
  • Status changed from reviewing to closed

merged to trunk in r3366, so close this ticket.

Note: See TracTickets for help on using tickets.