Opened 7 years ago

Closed 6 years ago

#3021 closed defect (wontfix)

bad configuration caused cc timeout and tracebacks

Reported by: jreed
Owned by:
Priority: medium
Milestone: Remaining BIND10 tickets
Component: Unclassified
Version: bind10-old
Keywords:
Cc:
CVSS Scoring:
Parent Tickets:
Sensitive: no
Defect Severity: N/A
Sub-Project: DNS
Feature Depending on Ticket:
Estimated Difficulty: 0
Add Hours to Ticket: 0
Total Hours: 0
Internal?: no

Description

I had a typo in a configuration and it resulted in the following:

DEBUG [b10-cfgmgr.pycc]: PYCC_LNAME_RECEIVED received local name: 51d186bf_2@bind10-testing1.lab.isc.org
INFO [b10-cfgmgr.cfgmgr]: CFGMGR_CONFIG_FILE Configuration manager starting with configuration file: /home/jreed/dnsbench/work/bind10-bind10-1.1.0-release/20130701102040/install/var/bind10/b10-config.db
FATAL [b10-cfgmgr.cfgmgr]: CFGMGR_DATA_READ_ERROR error reading configuration database from disk: Configuration file out of date or corrupt, please update or remove /home/jreed/dnsbench/work/bind10-bind10-1.1.0-release/20130701102040/install/var/bind10/b10-config.db
Traceback (most recent call last):
  File "/home/jreed/dnsbench/work/bind10-bind10-1.1.0-release/20130701102040/install/lib/python3.1/site-packages/isc/cc/session.py", line 223, in _receive_full_buffer
    self._receive_len_data()
  File "/home/jreed/dnsbench/work/bind10-bind10-1.1.0-release/20130701102040/install/lib/python3.1/site-packages/isc/cc/session.py", line 183, in _receive_len_data
    new_data = self._receive_bytes(self._recv_len_size)
  File "/home/jreed/dnsbench/work/bind10-bind10-1.1.0-release/20130701102040/install/lib/python3.1/site-packages/isc/cc/session.py", line 169, in _receive_bytes
    data = self._socket.recv(size)
socket.timeout: timed out

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/home/jreed/dnsbench/work/bind10-bind10-1.1.0-release/20130701102040/install/lib/python3.1/site-packages/isc/config/ccsession.py", line 371, in _add_remote_config_internal
    answer, _ = self._session.group_recvmsg(False, seq)
  File "/home/jreed/dnsbench/work/bind10-bind10-1.1.0-release/20130701102040/install/lib/python3.1/site-packages/isc/cc/session.py", line 306, in group_recvmsg
    env, msg  = self.recvmsg(nonblock, seq)
  File "/home/jreed/dnsbench/work/bind10-bind10-1.1.0-release/20130701102040/install/lib/python3.1/site-packages/isc/cc/session.py", line 139, in recvmsg
    data = self._receive_full_buffer(nonblock)
  File "/home/jreed/dnsbench/work/bind10-bind10-1.1.0-release/20130701102040/install/lib/python3.1/site-packages/isc/cc/session.py", line 238, in _receive_full_buffer
    raise SessionTimeout("recv() on cc session timed out")
isc.cc.session.SessionTimeout: recv() on cc session timed out

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/home/jreed/dnsbench/work/bind10-bind10-1.1.0-release/20130701102040/install/libexec/bind10/b10-msgq", line 863, in <module>
    msgq.socket_file)
  File "/home/jreed/dnsbench/work/bind10-bind10-1.1.0-release/20130701102040/install/lib/python3.1/site-packages/isc/config/ccsession.py", line 225, in __init__
    default_logconfig_handler)
  File "/home/jreed/dnsbench/work/bind10-bind10-1.1.0-release/20130701102040/install/lib/python3.1/site-packages/isc/config/ccsession.py", line 447, in add_remote_config
    self._add_remote_config_internal(module_spec, config_update_callback)
  File "/home/jreed/dnsbench/work/bind10-bind10-1.1.0-release/20130701102040/install/lib/python3.1/site-packages/isc/config/ccsession.py", line 375, in _add_remote_config_internal
    module_name)
isc.config.ccsession.ModuleCCSessionError: No answer from ConfigManager when asking about Remote module Logging
INFO [b10-init.init]: BIND10_STARTING starting BIND10: bind10 20110223 (BIND 10 1.1.0)
DEBUG [b10-init.init]: BIND10_CHECK_MSGQ_ALREADY_RUNNING checking if msgq is already running
INFO [b10-init.init]: BIND10_CONFIGURATOR_START bind10 component configurator is starting up
DEBUG [b10-init.init]: BIND10_CONFIGURATOR_BUILD building plan '{}' -> '{'sockcreator': {'priority': 200, 'kind': 'core', 'special': 'sockcreator'}, 'msgq': {'priority': 199, 'kind': 'core', 'special': 'msgq'}, 'cfgmgr': {'priority': 198, 'kind': 'core', 'special': 'cfgmgr'}}'
DEBUG [b10-init.init]: BIND10_CONFIGURATOR_RUN running plan of 3 tasks
DEBUG [b10-init.init]: BIND10_CONFIGURATOR_TASK performing task start on Socket creator
INFO [b10-init.init]: BIND10_COMPONENT_START component Socket creator is starting
INFO [b10-init.init]: BIND10_SOCKCREATOR_INIT initializing socket creator parser
DEBUG [b10-init.init]: BIND10_STARTED_PROCESS_PID started b10-sockcreator (PID 31516)
DEBUG [b10-init.init]: BIND10_CONFIGURATOR_TASK performing task start on msgq
INFO [b10-init.init]: BIND10_COMPONENT_START component msgq is starting
INFO [b10-init.init]: BIND10_STARTING_PROCESS starting process b10-msgq
DEBUG [b10-init.init]: BIND10_STARTED_PROCESS_PID started b10-msgq (PID 31517)
DEBUG [b10-init.pycc]: PYCC_LNAME_RECEIVED received local name: 51d186bf_1@bind10-testing1.lab.isc.org
DEBUG [b10-init.init]: BIND10_CONFIGURATOR_TASK performing task start on cfgmgr
INFO [b10-init.init]: BIND10_COMPONENT_START component cfgmgr is starting
INFO [b10-init.init]: BIND10_STARTING_PROCESS starting process b10-cfgmgr
DEBUG [b10-init.init]: BIND10_STARTED_PROCESS_PID started b10-cfgmgr (PID 31519)
DEBUG [b10-init.init]: BIND10_WAIT_CFGMGR waiting for configuration manager process to initialize
DEBUG [b10-init.init]: BIND10_WAIT_CFGMGR waiting for configuration manager process to initialize
DEBUG [b10-init.init]: BIND10_WAIT_CFGMGR waiting for configuration manager process to initialize
DEBUG [b10-init.init]: BIND10_WAIT_CFGMGR waiting for configuration manager process to initialize
DEBUG [b10-init.init]: BIND10_WAIT_CFGMGR waiting for configuration manager process to initialize
DEBUG [b10-init.init]: BIND10_WAIT_CFGMGR waiting for configuration manager process to initialize
ERROR [b10-init.init]: BIND10_COMPONENT_START_EXCEPTION component cfgmgr failed to start: Read of 0 bytes: connection closed
ERROR [b10-init.init]: BIND10_COMPONENT_FAILED component cfgmgr (pid None) failed: unknown condition
FATAL [b10-init.init]: BIND10_COMPONENT_UNSATISFIED component cfgmgr is required to run and failed
ERROR [b10-init.init]: BIND10_CONFIGURATOR_PLAN_INTERRUPTED configurator plan interrupted, only 2 of 3 done
INFO [b10-init.init]: BIND10_KILLING_ALL_PROCESSES killing all started processes
INFO [b10-init.init]: BIND10_SEND_SIGKILL sending SIGKILL to Socket creator (PID 31516)
WARN [b10-init.init]: BIND10_SOCKCREATOR_KILL killing the socket creator
INFO [b10-init.init]: BIND10_SEND_SIGKILL sending SIGKILL to msgq (PID 31517)
FATAL [b10-init.init]: BIND10_STARTUP_ERROR error during startup: Unable to start b10-cfgmgr: Read of 0 bytes: connection closed

Change History (8)

comment:1 Changed 7 years ago by vorner

So, what do you propose should happen when you:

  • edit the configuration that is not supposed to be edited by hand
  • do it wrong in a way it is impossible to parse

I don't think there is any chance the server could start. Even the first FATAL message says quite clearly what is wrong.

comment:2 Changed 7 years ago by jreed

I think it should fail sooner. The cc channel should not hang, and there shouldn't be any tracebacks. Maybe the cfgmgr could run without a configuration when reading it is fatal?
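
A minimal sketch of the "fail sooner, without tracebacks" behaviour, not taken from the BIND 10 code base. It assumes, purely for illustration, that the configuration file is JSON; `load_config` and `main` are hypothetical names. A read or parse error is reported as a single line and turned into a non-zero exit status instead of an unhandled exception.

```python
import json
import sys


def load_config(path):
    """Return the parsed configuration, or None if it cannot be read.

    Hypothetical helper: assumes the configuration file is JSON.
    """
    try:
        with open(path, encoding="utf-8") as f:
            return json.load(f)
    except (OSError, ValueError) as err:
        # json.JSONDecodeError is a subclass of ValueError, so both a
        # missing file and a corrupt file end up here.
        sys.stderr.write("FATAL: cannot read %s: %s\n" % (path, err))
        return None


def main(path):
    """Return a process exit status: 0 on success, 1 on a bad config."""
    config = load_config(path)
    if config is None:
        # Exit immediately with a non-zero status so that whatever
        # started this process can stop waiting for it.
        return 1
    return 0
```

A caller would typically end with `sys.exit(main(path))`, so the parent process sees the failure as an exit status rather than as a closed connection after a timeout.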

comment:3 Changed 7 years ago by shane

I suppose we could have the cfgmgr ask b10-init to shut down instead of exiting? It's non-trivial, as we need to make sure that b10-init doesn't block waiting for configuration and so on, but it is possible if we want to consider this a bug.

comment:4 Changed 7 years ago by vorner

Well, I think it would be easier to try reading the CC non-blocking more than once (with shorter timeouts, or with select) and also check whether the started process has exited.

Or we could just make sure we terminate correctly when the timeout happens.
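
The first option above could be sketched roughly as follows. This is an illustration under stated assumptions, not the b10-init implementation: `wait_for_child_message` is a hypothetical helper, `sock` stands in for the cc session socket, and `child` is assumed to be a `subprocess.Popen` object for the started process. It waits with `select` in short slices and checks between slices whether the child has already exited.

```python
import select
import time


def wait_for_child_message(sock, child, total_timeout=10.0, poll_interval=0.2):
    """Wait for data on sock, giving up early if child has exited.

    Hypothetical helper. Returns the received bytes. Raises RuntimeError
    if the peer closes the connection or the child exits first, and
    TimeoutError if nothing arrives within total_timeout seconds.
    """
    deadline = time.monotonic() + total_timeout
    while True:
        remaining = deadline - time.monotonic()
        if remaining <= 0:
            raise TimeoutError("no message within %.1f seconds" % total_timeout)
        # Block for at most one short slice instead of the full timeout.
        readable, _, _ = select.select(
            [sock], [], [], min(poll_interval, remaining))
        if readable:
            data = sock.recv(4096)
            if data:
                return data
            raise RuntimeError("connection closed by peer")
        # Nothing to read yet: check whether the child has already died.
        status = child.poll()
        if status is not None:
            raise RuntimeError("child exited with status %d" % status)
```

With this shape the caller learns within roughly one `poll_interval` that the started process is gone, rather than after the full receive timeout.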

comment:5 Changed 7 years ago by muks

We should get a few more comments to decide what the potential action would be.

comment:6 Changed 7 years ago by shane

  • Milestone New Tasks deleted

comment:7 Changed 6 years ago by tomek

  • Milestone set to Remaining BIND10 tickets

comment:8 Changed 6 years ago by tomek

  • Resolution set to wontfix
  • Status changed from new to closed
  • Version set to old-bind10

This issue is related to bind10 code that is no longer part of Kea.

If you are interested in BIND10/Bundy framework or its DNS components,
please check http://bundy-dns.de.

Closing ticket.
