Opened 7 years ago

Closed 5 years ago

#2249 closed defect (wontfix)

order of shutdowns should not allow msgq or cfgmgr to exit before other components

Reported by: jreed Owned by:
Priority: medium Milestone: Remaining BIND10 tickets
Component: ~Boss of BIND (obsolete) Version: bind10-old
Keywords: Cc:
CVSS Scoring: Parent Tickets:
Sensitive: no Defect Severity: N/A
Sub-Project: Core Feature Depending on Ticket:
Estimated Difficulty: 6 Add Hours to Ticket: 0
Total Hours: 0 Internal?: no

Description

See https://lists.isc.org/pipermail/bind10-dev/2012-September/003793.html

The order of the logging and timestamps may be misleading, but it seemed to show that cfgmgr shut down cleanly and msgq received a SIGTERM before b10-xfrin tried to use them.

2012-09-03 16:03:18.877 INFO  [b10-boss.boss] BIND10_STOP_PROCESS asking cfgmgr to shut down
...
2012-09-03 16:03:18.877 INFO  [b10-boss.boss] BIND10_STOP_PROCESS asking b10-xfrin to shut down
...
2012-09-03 16:03:19.878 INFO  [b10-boss.boss] BIND10_PROCESS_ENDED process 1953 of cfgmgr ended with status 0
2012-09-03 16:03:19.879 INFO  [b10-boss.boss] BIND10_PROCESS_ENDED process 1957 of b10-auth-1 ended with status 256
2012-09-03 16:03:19.879 INFO  [b10-boss.boss] BIND10_PROCESS_ENDED process 1959 of b10-stats ended with status 0
2012-09-03 16:03:19.879 INFO  [b10-boss.boss] BIND10_SEND_SIGTERM sending SIGTERM to msgq (PID 1952)
2012-09-03 16:03:19.879 INFO  [b10-boss.boss] BIND10_SEND_SIGTERM sending SIGTERM to b10-xfrin (PID 1954)
2012-09-03 16:03:19.879 INFO  [b10-boss.boss] BIND10_SEND_SIGTERM sending SIGTERM to b10-zonemgr (PID 1955)
2012-09-03 16:03:19.879 INFO  [b10-boss.boss] BIND10_SEND_SIGTERM sending SIGTERM to b10-cmdctl (PID 1958)
...
2012-09-03 16:03:19.884 ERROR [b10-xfrin.config] CONFIG_SESSION_STOPPING_FAILED error sending stopping message: [Errno 32] Broken pipe
2012-09-03 16:03:19.887 ERROR [b10-zonemgr.config] CONFIG_SESSION_STOPPING_FAILED error sending stopping message: [Errno 32] Broken pipe

Then zonemgr crashes (as seen at https://lists.isc.org/pipermail/bind10-users/2012-September/000391.html):

Traceback (most recent call last):
  File "/opt/bind10/libexec/bind10-devel/b10-zonemgr", line 699, in <module>
    zonemgrd = Zonemgr()
  File "/opt/bind10/libexec/bind10-devel/b10-zonemgr", line 523, in __init__
    self._setup_session()
  File "/opt/bind10/libexec/bind10-devel/b10-zonemgr", line 542, in _setup_session
    self._module_cc.add_remote_config(AUTH_SPECFILE_LOCATION)
  File "/opt/bind10/lib/python3.3/site-packages/isc/config/ccsession.py", line 419, in add_remote_config
    self._add_remote_config_internal(module_spec, config_update_callback)
  File "/opt/bind10/lib/python3.3/site-packages/isc/config/ccsession.py", line 343, in _add_remote_config_internal
    answer, _ = self._session.group_recvmsg(False, seq)
  File "/opt/bind10/lib/python3.3/site-packages/isc/cc/session.py", line 275, in group_recvmsg
    env, msg  = self.recvmsg(nonblock, seq)
  File "/opt/bind10/lib/python3.3/site-packages/isc/cc/session.py", line 130, in recvmsg
    data = self._receive_full_buffer(nonblock)
  File "/opt/bind10/lib/python3.3/site-packages/isc/cc/session.py", line 239, in _receive_full_buffer
    raise se
  File "/opt/bind10/lib/python3.3/site-packages/isc/cc/session.py", line 212, in _receive_full_buffer
    self._receive_len_data()
  File "/opt/bind10/lib/python3.3/site-packages/isc/cc/session.py", line 172, in _receive_len_data
    new_data = self._receive_bytes(self._recv_len_size)
  File "/opt/bind10/lib/python3.3/site-packages/isc/cc/session.py", line 158, in _receive_bytes
    data = self._socket.recv(size)
ConnectionResetError: [Errno 104] Connection reset by peer
Exception BrokenPipeError: BrokenPipeError(32, 'Broken pipe') in <bound method ModuleCCSession.__del__ of <isc.config.ccsession.ModuleCCSession object at 0x7f4106438390>> ignored

From the logging above, it appears that stopping msgq and cfgmgr was not postponed.

Also see ticket #2245, which has a similar example.
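
The "[Errno 32] Broken pipe" errors in the log are consistent with a component writing to a socket whose other end has already been closed. The following is a minimal sketch of that effect on a Unix-like system, not BIND 10 code: it uses a plain socket.socketpair() as a stand-in for the component-to-msgq connection, and the names send_after_peer_exit, component and msgq are hypothetical.

```python
import errno
import socket


def send_after_peer_exit():
    """Send on a socket after the peer end is closed; return the errno seen."""
    # Stand-in for the connection between a component and msgq (assumption).
    component, msgq = socket.socketpair()
    msgq.close()  # the "msgq" end goes away first
    try:
        component.sendall(b"stopping")
    except BrokenPipeError as err:
        # Python ignores SIGPIPE by default, so the failure surfaces as EPIPE.
        return err.errno
    finally:
        component.close()
    return None


if __name__ == "__main__":
    print(send_after_peer_exit() == errno.EPIPE)
```

If msgq is still running when a component sends its stopping message, this error should not occur, which is why the shutdown order matters here.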

Change History (4)

comment:1 Changed 7 years ago by vorner

It should be safe to stop the config manager first. Components aren't supposed to use it at shutdown; they only ask for configuration on startup and when it changes.

The msgq can be a problem, but if a component shuts down cleanly, it should happen before the msgq gets the SIGTERM. We may want to check the length of the delay before sending SIGTERMs.

However, it looks strange that msgq would stop first. It either crashed, or the process refused to stop by message and needed to be killed anyway. There's currently no way to stop msgq „nicely“, so it always waits for the second stage (sending SIGTERMs).
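
The two-stage shutdown described above (ask by message, wait, then SIGTERM), combined with the ordering this ticket asks for, could be sketched roughly as follows. This is an illustration only and not the boss implementation: FakeComponent, stop_group, staged_shutdown and INFRASTRUCTURE are hypothetical names, the components are in-memory stand-ins rather than real processes, and deferring cfgmgr as well as msgq is an assumption taken from the ticket title.

```python
import time

# Assumption: processes other components depend on, stopped last, in this order.
INFRASTRUCTURE = ("cfgmgr", "msgq")


class FakeComponent:
    """In-memory stand-in for a managed process (hypothetical)."""

    def __init__(self, name, stops_on_request=True):
        self.name = name
        self.stops_on_request = stops_on_request
        self.running = True

    def ask_stop(self, events):
        # Stage 1: polite request by message; some components ignore it.
        events.append(("ask", self.name))
        if self.stops_on_request:
            self.running = False

    def terminate(self, events):
        # Stage 2: stand-in for sending SIGTERM.
        events.append(("sigterm", self.name))
        self.running = False


def stop_group(group, events, grace=0.0, poll=0.01):
    """Ask every component in the group to stop, wait, then SIGTERM the rest."""
    for comp in group:
        comp.ask_stop(events)
    deadline = time.monotonic() + grace
    while any(c.running for c in group) and time.monotonic() < deadline:
        time.sleep(poll)
    for comp in group:
        if comp.running:
            comp.terminate(events)


def staged_shutdown(components, grace=0.0):
    """Stop ordinary components completely before touching infrastructure."""
    events = []
    ordinary = [c for c in components if c.name not in INFRASTRUCTURE]
    stop_group(ordinary, events, grace)
    for name in INFRASTRUCTURE:
        stop_group([c for c in components if c.name == name], events, grace)
    return events


if __name__ == "__main__":
    comps = [
        FakeComponent("msgq", stops_on_request=False),  # no "nice" stop exists
        FakeComponent("cfgmgr"),
        FakeComponent("b10-xfrin"),
        FakeComponent("b10-zonemgr", stops_on_request=False),
    ]
    for event in staged_shutdown(comps):
        print(event)
```

With this ordering, msgq receives its SIGTERM only after every other component has either exited or been terminated, so a cleanly stopping component can still send its stopping message.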

comment:2 Changed 7 years ago by shane

  • Milestone New Tasks deleted

comment:3 Changed 6 years ago by tomek

  • Milestone set to Remaining BIND10 tickets

comment:4 Changed 5 years ago by tomek

  • Resolution set to wontfix
  • Status changed from new to closed
  • Version set to old-bind10

This issue is related to bind10 code that is no longer part of Kea.

If you are interested in BIND10/Bundy framework or its DNS components,
please check http://bundy-dns.de.

Closing ticket.
