Opened 8 years ago

Closed 6 years ago

#1703 closed defect (invalid)

should not shutdown if a component is later added and it fails

Reported by: jreed
Owned by:
Priority: medium
Milestone:
Component: ~Boss of BIND (obsolete)
Version:
Keywords:
Cc:
CVSS Scoring:
Parent Tickets:
Sensitive: no
Defect Severity: High
Sub-Project: DNS
Feature Depending on Ticket:
Estimated Difficulty: 5
Add Hours to Ticket: 0
Total Hours: 0
Internal?: no

Description

I did:

> config add Boss/components b10-auth-2
> config set Boss/components/b10-auth-2/special auth
> config set Boss/components/b10-auth-2/kind needed
> config commit

Failed to send request, the connection is closed

Logging showed:

2012-02-22 11:11:58.208 INFO  [b10-boss.boss] BIND10_CONFIGURATOR_RECONFIGURE reconfiguring running components
2012-02-22 11:11:58.208 INFO  [b10-boss.boss] BIND10_COMPONENT_START component b10-auth-2 is starting
2012-02-22 11:11:58.208 INFO  [b10-boss.boss] BIND10_STARTING_PROCESS starting process b10-auth
2012-02-22 11:11:58.219 INFO  [b10-auth.datasrc] DATASRC_CACHE_ENABLE enabling the hotspot cache
2012-02-22 11:11:58.220 INFO  [b10-auth.auth] AUTH_SERVER_CREATED server created
2012-02-22 11:11:58.235 FATAL [b10-auth.server_common] SRVCOMM_EXCEPTION_ALLOC exception when allocating a socket: File exists
2012-02-22 11:11:58.235 INFO  [b10-boss.boss] BIND10_LOST_SOCKET_CONSUMER consumer 22 of sockets disconnected, considering all its sockets closed
2012-02-22 11:11:58.236 INFO  [b10-boss.boss] BIND10_PROCESS_ENDED process 23261 of b10-auth-2 ended with status 6
2012-02-22 11:11:58.236 ERROR [b10-boss.boss] BIND10_COMPONENT_FAILED component b10-auth-2 (pid 23261) failed with 6 exit status
2012-02-22 11:11:58.236 FATAL [b10-boss.boss] BIND10_COMPONENT_UNSATISFIED component b10-auth-2 is required to run and failed
2012-02-22 11:11:58.236 INFO  [b10-boss.boss] BIND10_SHUTDOWN stopping the server
2012-02-22 11:11:58.236 INFO  [b10-boss.boss] BIND10_CONFIGURATOR_STOP bind10 component configurator is shutting down
2012-02-22 11:11:58.237 INFO  [b10-boss.boss] BIND10_COMPONENT_STOP component Socket creator is being stopped
2012-02-22 11:11:58.237 INFO  [b10-boss.boss] BIND10_SOCKCREATOR_TERMINATE terminating socket creator
2012-02-22 11:11:58.237 INFO  [b10-boss.boss] BIND10_COMPONENT_STOP component msgq is being stopped
2012-02-22 11:11:58.237 INFO  [b10-boss.boss] BIND10_COMPONENT_STOP component b10-xfrin is being stopped
2012-02-22 11:11:58.237 INFO  [b10-boss.boss] BIND10_STOP_PROCESS asking b10-xfrin to shut down
2012-02-22 11:11:58.237 INFO  [b10-boss.boss] BIND10_COMPONENT_STOP component b10-cmdctl is being stopped
2012-02-22 11:11:58.237 INFO  [b10-boss.boss] BIND10_STOP_PROCESS asking b10-cmdctl to shut down
2012-02-22 11:11:58.237 INFO  [b10-boss.boss] BIND10_COMPONENT_STOP component b10-xfrout is being stopped
2012-02-22 11:11:58.238 INFO  [b10-boss.boss] BIND10_STOP_PROCESS asking b10-xfrout to shut down
2012-02-22 11:11:58.238 INFO  [b10-boss.boss] BIND10_COMPONENT_STOP component b10-auth is being stopped
2012-02-22 11:11:58.238 INFO  [b10-boss.boss] BIND10_STOP_PROCESS asking b10-auth to shut down
2012-02-22 11:11:58.238 INFO  [b10-boss.boss] BIND10_COMPONENT_STOP component cfgmgr is being stopped
2012-02-22 11:11:58.238 INFO  [b10-boss.boss] BIND10_STOP_PROCESS asking cfgmgr to shut down
2012-02-22 11:11:58.238 INFO  [2012-02-22 11:11:58.238b10-xfrout.xfrout ] INFOXFROUT_RECEIVED_SHUTDOWN_COMMAND shutdown command received 
 [b10-boss.boss] BIND10_COMPONENT_STOP component b10-zonemgr is being stopped

***** unrelated but see the log output is unparsable above ****

2012-02-22 11:11:58.238 INFO  [b10-boss.boss] BIND10_STOP_PROCESS asking b10-zonemgr to shut down
2012-02-22 11:11:58.239 INFO  [b10-boss.boss] BIND10_COMPONENT_STOP component b10-stats is being stopped
2012-02-22 11:11:58.239 INFO  [b10-boss.boss] BIND10_STOP_PROCESS asking b10-stats to shut down
2012-02-22 11:11:58.239 INFO  [b10-boss.boss] BIND10_COMPONENT_STOP component b10-stats-httpd is being stopped
2012-02-22 11:11:58.239 INFO  [b10-boss.boss] BIND10_STOP_PROCESS asking b10-stats-httpd to shut down
2012-02-22 11:11:58.243 INFO  [b10-stats.stats] STATS_RECEIVED_SHUTDOWN_COMMAND shutdown command received
2012-02-22 11:11:58.244 INFO  [b10-stats-httpd.stats-httpd] STATHTTPD_SHUTDOWN shutting down
2012-02-22 11:11:58.244 INFO  [b10-stats-httpd.stats-httpd] STATHTTPD_CLOSING closing 127.0.0.1#8000
2012-02-22 11:11:59.239 INFO  [b10-boss.boss] BIND10_PROCESS_ENDED process 23222 of Socket creator ended with status 0
2012-02-22 11:11:59.239 INFO  [b10-boss.boss] BIND10_PROCESS_ENDED process 23224 of cfgmgr ended with status 0
2012-02-22 11:11:59.239 INFO  [b10-boss.boss] BIND10_PROCESS_ENDED process 23225 of b10-zonemgr ended with status 0
2012-02-22 11:11:59.240 INFO  [b10-boss.boss] BIND10_PROCESS_ENDED process 23226 of b10-stats ended with status 0
2012-02-22 11:11:59.240 INFO  [b10-boss.boss] BIND10_PROCESS_ENDED process 23227 of b10-xfrin ended with status 0
2012-02-22 11:11:59.240 INFO  [b10-boss.boss] BIND10_PROCESS_ENDED process 23228 of b10-cmdctl ended with status 0
2012-02-22 11:11:59.240 INFO  [b10-boss.boss] BIND10_PROCESS_ENDED process 23229 of b10-xfrout ended with status 0
2012-02-22 11:11:59.240 INFO  [b10-boss.boss] BIND10_PROCESS_ENDED process 23230 of b10-stats-httpd ended with status 0
2012-02-22 11:11:59.241 INFO  [b10-boss.boss] BIND10_PROCESS_ENDED process 23231 of b10-auth ended with status 0
2012-02-22 11:11:59.241 INFO  [b10-boss.boss] BIND10_SEND_SIGTERM sending SIGTERM to msgq (PID 23223)
2012-02-22 11:11:59.340 INFO  [b10-boss.boss] BIND10_PROCESS_ENDED process 23223 of msgq ended with status 0
2012-02-22 11:11:59.340 INFO  [b10-boss.boss] BIND10_SHUTDOWN_COMPLETE all processes ended, shutdown complete

Probably I shouldn't have used "kind needed", since a second auth server is probably not needed by default. (I got that idea from the guide.)

I don't know why it didn't start.

Change History (11)

comment:1 Changed 8 years ago by jreed

Even though bindctl exited and bind10 exited, the b10-config.db was updated with the new configuration. So next time it wouldn't start (see #1705).

comment:2 follow-up: Changed 8 years ago by vorner

Hello

This is, however, the expected and wanted behavior. There was no problem with the configuration change you made, and the component could start correctly, so the changes were committed to disk. The component then crashed later on (well, really soon after it started), and because you made it needed, boss correctly exited. I'm not sure what better behaviour there could be in this case. Unless you have a concrete idea of how it should behave, I propose to close this one.

However, #1705 is indeed a bug and I don't think it is caused by configuration. But I have no idea what that one means right now.

comment:3 in reply to: ↑ 2 Changed 8 years ago by jreed

According to documentation:

 If it is set to "needed" and it fails at startup, the
 whole bind10 shuts down and exits with error exit code.
 But if it fails some time later, it is just started again. 

So the problem should not be FATAL:

2012-02-22 11:11:58.236 FATAL [b10-boss.boss] BIND10_COMPONENT_UNSATISFIED component b10-auth-2 is required to run and failed

It should just try again, per the documentation. In this case (see #1705), we know it won't work again, but bind10 as a whole should not shut down.

Or the definition of "startup" in the documentation needs to be clarified. Does "startup" mean start of a new component at any time? Or does it mean the start of a component when bind10 boss is first starting everything up?
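
To make the two readings of "startup" concrete, here is a minimal sketch of the decision in Python. It is not the boss's actual code: the function name, the `Action` enum, the `startup_window` value of 10 seconds, and the `startup_means_boss_boot_only` flag are all hypothetical and exist only to illustrate the ambiguity.

```python
from enum import Enum


class Action(Enum):
    SHUTDOWN = "shut down bind10"
    RESTART = "restart the component"


def on_needed_component_failure(seconds_since_component_start,
                                boss_is_booting,
                                startup_window=10.0,
                                startup_means_boss_boot_only=False):
    """Decide what to do when a component of kind "needed" fails.

    Hypothetical illustration only. Two readings of "fails at startup":
      - per-component: the component failed shortly after it was started,
        no matter when it was added (startup_window is an assumed value);
      - boss boot only: the failure happened while the boss itself was
        still bringing everything up for the first time.
    """
    if startup_means_boss_boot_only:
        in_startup = boss_is_booting
    else:
        in_startup = seconds_since_component_start < startup_window
    # Documented rule: failure at startup is fatal, a later failure is not.
    return Action.SHUTDOWN if in_startup else Action.RESTART
```

In the log above, b10-auth-2 started at 11:11:58.208 and failed at 11:11:58.236, about 0.028 seconds later, long after the boss had finished booting. Under the per-component reading, `on_needed_component_failure(0.028, False)` gives `Action.SHUTDOWN`, which matches what happened. Under the boss-boot-only reading (`startup_means_boss_boot_only=True`), the same call gives `Action.RESTART`, which is the behaviour this ticket asks for.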

comment:4 Changed 8 years ago by jreed

My point is that bind10 should not shut down if a component is later added and it fails.

comment:5 Changed 8 years ago by shane

  • Component changed from Unclassified to Boss of BIND
  • Defect Severity changed from N/A to Medium
  • Milestone changed from New Tasks to Next-Sprint-Proposed

Assuming that we want to do something about this (and I basically agree with Jeremy), we can do it soon.

comment:6 Changed 8 years ago by jelte

  • Estimated Difficulty changed from 0 to 5

comment:7 Changed 8 years ago by jreed

  • Summary changed from add second b10-auth component crashes bind10 to should not shutdown if a component is later added and it fails

comment:8 Changed 8 years ago by jreed

  • Defect Severity changed from Medium to High
  • Milestone changed from Year 3 Task Backlog to Next-Sprint-Proposed

comment:9 Changed 7 years ago by jreed

This was duplicated by #2455 (nine months later).

comment:10 Changed 7 years ago by jreed

  • Milestone set to Next-Sprint-Proposed

comment:11 Changed 6 years ago by shane

  • Resolution set to invalid
  • Status changed from new to closed

We really need some offline behavior.

However, I'm going to resolve this particular ticket, as the system is indeed operating as expected. :-/
