Opened 8 years ago

Closed 7 years ago

#1597 closed defect (duplicate)

traceback in bind10 when cfgmgr can't start

Reported by: jreed Owned by:
Priority: medium Milestone:
Component: ~Boss of BIND (obsolete) Version:
Keywords: Cc:
CVSS Scoring: Parent Tickets:
Sensitive: no Defect Severity: High
Sub-Project: Core Feature Depending on Ticket:
Estimated Difficulty: 0 Add Hours to Ticket: 0
Total Hours: 0 Internal?: no

Description

This maybe related to #1291.

I tried bind10 --user. I forgot about config permissions.

This is from shutdown first:

2012-01-18 12:54:51.411 ERROR [b10-boss.boss] BIND10_CONFIGURATOR_PLAN_INTERRUPTED configurator plan interrupted, only 2 of 8 done
2012-01-18 12:54:52.503 INFO  [b10-boss.boss] BIND10_PROCESS_ENDED process 1481 of Socket creator ended with status 0

Not sure what that 2 of 8 means or if it matters. Anyways here is the problem:

2012-01-18 12:55:23.192 INFO  [b10-boss.boss] BIND10_STARTING starting BIND10: bind10 20110223 (BIND 10 20120119)
2012-01-18 12:55:23.192 DEBUG [b10-boss.boss] BIND10_CHECK_MSGQ_ALREADY_RUNNING checking if msgq is already running
2012-01-18 12:55:23.204 INFO  [b10-boss.boss] BIND10_CONFIGURATOR_START bind10 component configurator is starting up
2012-01-18 12:55:23.204 DEBUG [b10-boss.boss] BIND10_CONFIGURATOR_BUILD building plan '{}' -> '{'sockcreator': {'priority': 200, 'kind': 'core', 'special': 'sockcreator'}, 'msgq': {'priority': 199, 'kind': 'core', 'special': 'msgq'}, 'cfgmgr': {'priority': 198, 'kind': 'core', 'special': 'cfgmgr'}}'
2012-01-18 12:55:23.204 DEBUG [b10-boss.boss] BIND10_CONFIGURATOR_RUN running plan of 3 tasks
2012-01-18 12:55:23.204 DEBUG [b10-boss.boss] BIND10_CONFIGURATOR_TASK performing task start on Socket creator
2012-01-18 12:55:23.204 INFO  [b10-boss.boss] BIND10_COMPONENT_START component Socket creator is starting
2012-01-18 12:55:23.231 INFO  [b10-boss.boss] BIND10_SOCKCREATOR_INIT initializing socket creator parser
2012-01-18 12:55:23.231 DEBUG [b10-boss.boss] BIND10_STARTED_PROCESS_PID started b10-sockcreator (PID 10365)
2012-01-18 12:55:23.231 INFO  [b10-boss.boss] BIND10_SETUID setting UID to 1000
2012-01-18 12:55:23.232 DEBUG [b10-boss.boss] BIND10_CONFIGURATOR_TASK performing task start on msgq
2012-01-18 12:55:23.232 INFO  [b10-boss.boss] BIND10_COMPONENT_START component msgq is starting
2012-01-18 12:55:23.232 INFO  [b10-boss.boss] BIND10_STARTING_PROCESS starting process b10-msgq
2012-01-18 12:55:23.348 DEBUG [b10-boss.boss] BIND10_STARTED_PROCESS_PID started b10-msgq (PID 16077)
2012-01-18 12:55:23.569 DEBUG [b10-boss.boss] BIND10_CONFIGURATOR_TASK performing task start on cfgmgr
2012-01-18 12:55:23.569 INFO  [b10-boss.boss] BIND10_COMPONENT_START component cfgmgr is starting
2012-01-18 12:55:23.569 INFO  [b10-boss.boss] BIND10_STARTING_PROCESS starting process b10-cfgmgr
2012-01-18 12:55:23.637 DEBUG [b10-boss.boss] BIND10_STARTED_PROCESS_PID started b10-cfgmgr (PID 13175)
2012-01-18 12:55:23.637 DEBUG [b10-boss.boss] BIND10_WAIT_CFGMGR waiting for configuration manager process to initialize
2012-01-18 12:55:23.951 FATAL [b10-cfgmgr.cfgmgr] CFGMGR_DATA_READ_ERROR error reading configuration database from disk: Can't read configuration file: [Errno 13] Permission denied: '/home/reed/opt/bind10/var/bind10-devel/b10-config.db'
2012-01-18 12:55:23.967 DEBUG [b10-boss.boss] BIND10_WAIT_CFGMGR waiting for configuration manager process to initialize
2012-01-18 12:55:24.977 DEBUG [b10-boss.boss] BIND10_WAIT_CFGMGR waiting for configuration manager process to initialize
2012-01-18 12:55:25.987 DEBUG [b10-boss.boss] BIND10_WAIT_CFGMGR waiting for configuration manager process to initialize
2012-01-18 12:55:26.998 DEBUG [b10-boss.boss] BIND10_WAIT_CFGMGR waiting for configuration manager process to initialize
2012-01-18 12:55:28.008 DEBUG [b10-boss.boss] BIND10_WAIT_CFGMGR waiting for configuration manager process to initialize
2012-01-18 12:55:29.018 DEBUG [b10-boss.boss] BIND10_WAIT_CFGMGR waiting for configuration manager process to initialize
2012-01-18 12:55:30.029 DEBUG [b10-boss.boss] BIND10_WAIT_CFGMGR waiting for configuration manager process to initialize
2012-01-18 12:55:31.039 DEBUG [b10-boss.boss] BIND10_WAIT_CFGMGR waiting for configuration manager process to initialize
2012-01-18 12:55:32.050 DEBUG [b10-boss.boss] BIND10_WAIT_CFGMGR waiting for configuration manager process to initialize
2012-01-18 12:55:33.060 ERROR [b10-boss.boss] BIND10_COMPONENT_START_EXCEPTION component cfgmgr failed to start: Configuration manager process has not started
2012-01-18 12:55:33.061 ERROR [b10-boss.boss] BIND10_COMPONENT_FAILED component cfgmgr (pid None) failed with unknown exit status
2012-01-18 12:55:33.061 FATAL [b10-boss.boss] BIND10_COMPONENT_UNSATISFIED component cfgmgr is required to run and failed
2012-01-18 12:55:33.062 ERROR [b10-boss.boss] BIND10_CONFIGURATOR_PLAN_INTERRUPTED configurator plan interrupted, only 2 of 3 done
2012-01-18 12:55:33.062 INFO  [b10-boss.boss] BIND10_KILLING_ALL_PROCESSES killing all started processes
2012-01-18 12:55:33.063 INFO  [b10-boss.boss] BIND10_KILL_PROCESS killing process Socket creator
2012-01-18 12:55:33.063 WARN  [b10-boss.boss] BIND10_SOCKCREATOR_KILL killing the socket creator
Traceback (most recent call last):
  File "/home/reed/opt/bind10/lib/python3.1/site-packages/isc/bind10/component.py", line 165, in start
    self._start_internal()
  File "/home/reed/opt/bind10/lib/python3.1/site-packages/isc/bind10/component.py", line 403, in _start_internal
    procinfo = self._start_func()
  File "/home/reed/opt/bind10/sbin/bind10", line 476, in start_cfgmgr
    raise ProcessStartError("Configuration manager process has not started")
__main__.ProcessStartError: Configuration manager process has not started

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/home/reed/opt/bind10/sbin/bind10", line 638, in startup
    self.start_all_components()
  File "/home/reed/opt/bind10/sbin/bind10", line 599, in start_all_components
    self._component_configurator.startup(self.__core_components)
  File "/home/reed/opt/bind10/lib/python3.1/site-packages/isc/bind10/component.py", line 504, in startup
    self.__reconfigure_internal(self._components, configuration)
  File "/home/reed/opt/bind10/lib/python3.1/site-packages/isc/bind10/component.py", line 492, in __reconfigure_internal
    self._run_plan(self._build_plan(old, new))
  File "/home/reed/opt/bind10/lib/python3.1/site-packages/isc/bind10/component.py", line 633, in _run_plan
    component.start()
  File "/home/reed/opt/bind10/lib/python3.1/site-packages/isc/bind10/component.py", line 168, in start
    self.failed(None)
  File "/home/reed/opt/bind10/lib/python3.1/site-packages/isc/bind10/component.py", line 222, in failed
    self._boss.component_shutdown(1)
  File "/home/reed/opt/bind10/sbin/bind10", line 671, in component_shutdown
    raise Exception("Component failed during startup");
Exception: Component failed during startup

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/home/reed/opt/bind10/sbin/bind10", line 1170, in <module>
    main()
  File "/home/reed/opt/bind10/sbin/bind10", line 1148, in main
    startup_result = boss_of_bind.startup()
  File "/home/reed/opt/bind10/sbin/bind10", line 640, in startup
    self.kill_started_components()
  File "/home/reed/opt/bind10/sbin/bind10", line 343, in kill_started_components
    self.components[pid].kill(True)
  File "/home/reed/opt/bind10/lib/python3.1/site-packages/isc/bind10/special_component.py", line 68, in kill
    self.__creator.kill()
  File "/home/reed/opt/bind10/lib/python3.1/site-packages/isc/bind10/sockcreator.py", line 239, in kill
    self.__process.kill()
  File "/usr/pkg/lib/python3.1/subprocess.py", line 1403, in kill
    self.send_signal(signal.SIGKILL)
  File "/usr/pkg/lib/python3.1/subprocess.py", line 1393, in send_signal
    os.kill(self.pid, sig)
OSError: [Errno 1] Operation not permitted

I tried to start bind10 again without --user and it failed due to running still:

2012-01-18 12:56:05.693 INFO  [b10-boss.boss] BIND10_STARTING starting BIND10: bind10 20110223 (BIND 10 20120119)
2012-01-18 12:56:05.694 DEBUG [b10-boss.boss] BIND10_CHECK_MSGQ_ALREADY_RUNNING checking if msgq is already running
2012-01-18 12:56:05.695 FATAL [b10-boss.boss] BIND10_MSGQ_ALREADY_RUNNING msgq daemon already running, cannot start
2012-01-18 12:56:05.695 FATAL [b10-boss.boss] BIND10_STARTUP_ERROR error during startup: b10-msgq already running, or socket file not cleaned , cannot start
[b10-msgq] Closing socket fd 5
[b10-msgq] Receive error: EOF
[b10-msgq] Closing socket fd 4
[b10-msgq] Receive error: EOF
[b10-msgq] Closing socket fd 4
[b10-msgq] Receive error: EOF

with these processes:

root 10365  0.0  0.2  17920  1592 ttyp1 I    12:55PM  0:00.01 b10-sockcreator
reed 16077  0.0  1.7 109924 15296 ttyp1 I    12:55PM  0:00.28 /usr/pkg/bin/python3.1 /home/reed/opt/bind10/libexec/bind10-devel/b10-msgq

I killed these manually and started bind10 without --user.

The traceback should not happen.

sockcreator should never stay running (I have seen this before).

msgq should never stay running.

bind10 should say to stderr if can't start (don't assume logging works).

Subtickets

Change History (4)

comment:1 Changed 8 years ago by jreed

  • Summary changed from traceback in bind10 when cfgmrg can't start to traceback in bind10 when cfgmgr can't start

comment:2 Changed 8 years ago by shane

  • Milestone changed from New Tasks to Year 3 Task Backlog

comment:3 Changed 8 years ago by jreed

This is similar to #1449. Need to check this again.

comment:4 Changed 7 years ago by jinmei

  • Resolution set to duplicate
  • Status changed from new to closed

This should have been resolved in #2447.

Note: See TracTickets for help on using tickets.