Opened 7 years ago

Closed 7 years ago

#2190 closed defect (fixed)

(msgq) select.poll() broken on osx's python

Reported by: jelte Owned by: jelte
Priority: medium Milestone: Sprint-20120925
Component: ~msgq (obsolete) Version:
Keywords: Cc:
CVSS Scoring: Parent Tickets:
Sensitive: no Defect Severity: N/A
Sub-Project: DNS Feature Depending on Ticket:
Estimated Difficulty: 5 Add Hours to Ticket: 0
Total Hours: 0.53 Internal?: no

Description

We saw a failure recently where select.poll() returned a bad file descriptor, making msgq crash.

Traceback (most recent call last):
  File "/Local/Users/jelte/repos/bind10_clang/src/bin/msgq/b10-msgq", line 553, in <module>
    msgq.run()
  File "/Local/Users/jelte/repos/bind10_clang/src/bin/msgq/b10-msgq", line 461, in run
    self.run_poller()
  File "/Local/Users/jelte/repos/bind10_clang/src/bin/msgq/b10-msgq", line 481, in run_poller
    self.__process_write(fd)
  File "/Local/Users/jelte/repos/bind10_clang/src/bin/msgq/b10-msgq", line 392, in __process_write
    (_, msg) = self.sendbuffs[fileno]
KeyError: 8249280

(that value being an fd returned by poll())

I am not sure if we can and want to recover from this situation (it is unclear at this point whether these are 'extraneous' events or replacing actual events it should not miss).

One thing I did find is that the system-installed version of python has select.poll() completely disabled. In which case msgq falls back to using kqueue (i am running repeated tests now and have not seen any problems with kqueue yet). Since we use python 3 and have built our own, poll() is once again available, but bad.

This may be related: http://bugs.python.org/issue5154

It would appear this is a wont-fix because it is unclear why Apple disabled this in the first place.

So, perhaps we should use kqueue by default if available, and/or update the documentation so make sure the self-built python has select.poll() disabled (which is, well, bleh). Alternatively maybe we should replace the entire thing by select()...

I also wonder if we should try to reopen that ticket.

Subtickets

Change History (9)

comment:1 Changed 7 years ago by jinmei

Unless there's a specific reason for using poll, I'd suggest simply using select
for all platforms. That will be most portable.

comment:2 Changed 7 years ago by jinmei

  • Milestone changed from New Tasks to Next-Sprint-Proposed

comment:3 Changed 7 years ago by jelte

  • Milestone changed from Next-Sprint-Proposed to Sprint-20120918

comment:4 Changed 7 years ago by jelte

  • Owner set to jelte
  • Status changed from new to assigned

Since we already have code to use either kqueue or poll, i think we can simply switch the check around (use kqueue if available, poll otherwise), instead of rewriting it to use select.

comment:5 Changed 7 years ago by jinmei

maybe we can close #2019 when this ticket is done.

comment:6 Changed 7 years ago by jelte

  • Owner changed from jelte to UnAssigned
  • Status changed from assigned to reviewing

Since we had both poll and queue code in there already, it seemed even easier to just switch around the initialization (i.e. try setting up kqueue if available, if not, use poll, it's just a one-time thing anyway, and I do now know of any systems where kqueue is offered but known to be broken).

I did run into one problem, see second commit; the statshttpd tests set up a wrapped, but real, msgq (something we might need to look into, btw), and it never shut it down correctly; this didn't pose much of a problem apparently, but it did leave so many things open in my tests that it ran out of file descriptors.

I do not know why it had the weird shutdown-in-finally approach, but it seems to me a normal shutdown as patched in the last commit should work. Of course I may be missing something here :p

comment:7 Changed 7 years ago by vorner

  • Owner changed from UnAssigned to vorner

comment:8 Changed 7 years ago by vorner

  • Owner changed from vorner to jelte
  • Total Hours changed from 0 to 0.53

Hello

I think the changes are simple. But maybe this could be run through build bot first, to see if it works? I don't see much ways to check it does except trying. Anyway, it is ready for merge.

comment:9 Changed 7 years ago by jelte

  • Resolution set to fixed
  • Status changed from reviewing to closed

I had already put it through the buildbots :)

Thanks, merged, closing ticket.

Note: See TracTickets for help on using tickets.