Opened 8 years ago

Closed 7 years ago

#1937 closed defect (wontfix)

FATAL [b10-auth.auth] AUTH_SERVER_FAILED server failed: Can't assign requested address

Reported by: jreed Owned by: jreed
Priority: high Milestone: Sprint-20130108
Component: b10-auth Version:
Keywords: Cc:
CVSS Scoring: Parent Tickets:
Sensitive: no Defect Severity: High
Sub-Project: DNS Feature Depending on Ticket:
Estimated Difficulty: 8 Add Hours to Ticket: 0
Total Hours: 0 Internal?: no

Description

2012-05-01 23:28:55.494 FATAL [b10-auth.auth] AUTH_SERVER_FAILED server failed: Can't assign requested address
2012-05-01 23:28:55.496 INFO  [b10-boss.boss] BIND10_LOST_SOCKET_CONSUMER consumer 26 of sockets disconnected, considering all its sockets closed
2012-05-01 23:28:55.497 INFO  [b10-boss.boss] BIND10_PROCESS_ENDED process 9114 of b10-auth-3 ended with status 256
2012-05-01 23:28:55.497 ERROR [b10-boss.boss] BIND10_COMPONENT_FAILED component b10-auth-3 (pid 9114) failed: process exited normally with exit status 256
2012-05-01 23:28:55.497 INFO  [b10-boss.boss] BIND10_COMPONENT_START component b10-auth-3 is starting
2012-05-01 23:28:55.497 INFO  [b10-boss.boss] BIND10_STARTING_PROCESS starting process b10-auth
2012-05-01 23:28:55.586 INFO  [b10-auth.auth] AUTH_SERVER_STARTED server started
2012-05-01 23:45:54.161 FATAL [b10-auth.auth] AUTH_SERVER_FAILED server failed: Can't assign requested address
2012-05-01 23:45:54.164 INFO  [b10-boss.boss] BIND10_LOST_SOCKET_CONSUMER consumer 34 of sockets disconnected, considering all its sockets closed
2012-05-01 23:45:54.164 INFO  [b10-boss.boss] BIND10_PROCESS_ENDED process 9116 of b10-auth-7 ended with status 256
2012-05-01 23:45:54.164 ERROR [b10-boss.boss] BIND10_COMPONENT_FAILED component b10-auth-7 (pid 9116) failed: process exited normally with exit status 256
2012-05-01 23:45:54.164 INFO  [b10-boss.boss] BIND10_COMPONENT_START component b10-auth-7 is starting
2012-05-01 23:45:54.164 INFO  [b10-boss.boss] BIND10_STARTING_PROCESS starting process b10-auth
2012-05-01 23:45:54.254 INFO  [b10-auth.auth] AUTH_SERVER_STARTED server started
2012-05-01 23:45:59.157 FATAL [b10-auth.auth] AUTH_SERVER_FAILED server failed: Can't assign requested address
2012-05-01 23:45:59.159 INFO  [b10-boss.boss] BIND10_LOST_SOCKET_CONSUMER consumer 35 of sockets disconnected, considering all its sockets closed
2012-05-01 23:45:59.160 INFO  [b10-boss.boss] BIND10_PROCESS_ENDED process 9117 of b10-auth-8 ended with status 256
2012-05-01 23:45:59.160 ERROR [b10-boss.boss] BIND10_COMPONENT_FAILED component b10-auth-8 (pid 9117) failed: process exited normally with exit status 256
2012-05-01 23:45:59.160 INFO  [b10-boss.boss] BIND10_COMPONENT_START component b10-auth-8 is starting
2012-05-01 23:45:59.160 INFO  [b10-boss.boss] BIND10_STARTING_PROCESS starting process b10-auth
2012-05-01 23:45:59.251 INFO  [b10-auth.auth] AUTH_SERVER_STARTED server started
2012-05-01 23:46:09.173 FATAL [b10-auth.auth] AUTH_SERVER_FAILED server failed: Can't assign requested address
2012-05-01 23:46:09.175 INFO  [b10-boss.boss] BIND10_LOST_SOCKET_CONSUMER consumer 21 of sockets disconnected, considering all its sockets closed
2012-05-01 23:46:09.176 INFO  [b10-boss.boss] BIND10_PROCESS_ENDED process 8581 of b10-auth-2 ended with status 256
2012-05-01 23:46:09.176 ERROR [b10-boss.boss] BIND10_COMPONENT_FAILED component b10-auth-2 (pid 8581) failed: process exited normally with exit status 256
2012-05-01 23:46:09.176 INFO  [b10-boss.boss] BIND10_COMPONENT_START component b10-auth-2 is starting
2012-05-01 23:46:09.176 INFO  [b10-boss.boss] BIND10_STARTING_PROCESS starting process b10-auth
2012-05-01 23:46:09.300 INFO  [b10-auth.auth] AUTH_SERVER_STARTED server started
2012-05-01 23:47:26.323 FATAL [b10-auth.auth] AUTH_SERVER_FAILED server failed: Can't assign requested address
2012-05-01 23:47:26.325 INFO  [b10-boss.boss] BIND10_LOST_SOCKET_CONSUMER consumer 18 of sockets disconnected, considering all its sockets closed
2012-05-01 23:47:26.326 INFO  [b10-boss.boss] BIND10_PROCESS_ENDED process 8354 of b10-auth ended with status 256
2012-05-01 23:47:26.326 ERROR [b10-boss.boss] BIND10_COMPONENT_FAILED component b10-auth (pid 8354) failed: process exited normally with exit status 256
2012-05-01 23:47:26.326 INFO  [b10-boss.boss] BIND10_COMPONENT_START component b10-auth is starting
2012-05-01 23:47:26.326 INFO  [b10-boss.boss] BIND10_STARTING_PROCESS starting process b10-auth
2012-05-01 23:47:26.419 INFO  [b10-auth.auth] AUTH_SERVER_STARTED server started

Not sure what debuglevel to set. This system has eight b10-auth components running and handling around 18000 queries per second.

Any ideas on how to research this FATAL [b10-auth.auth] AUTH_SERVER_FAILED server failed: Can't assign requested address ?

Subtickets

Attachments (1)

bind10_auth_no_catch.patch (10.9 KB) - added by jelte 7 years ago.
Patch for b10-auth that does not catch any exception while retaining existing cleanup code

Download all attachments as: .zip

Change History (18)

comment:1 Changed 8 years ago by jreed

  • Summary changed from ATAL [b10-auth.auth] AUTH_SERVER_FAILED server failed: Can't assign requested address to FATAL [b10-auth.auth] AUTH_SERVER_FAILED server failed: Can't assign requested address

comment:2 Changed 8 years ago by jreed

Documentation has:

AUTH_SERVER_FAILED server failed: %1

    The authoritative server has encountered a fatal error and is terminating. The reason for the failure is included in the message. 

Actually the reason is too vague. What requested address? And why does this happen? (It appears to be working fine.)

comment:3 Changed 8 years ago by jreed

The counts of each time a component crashed over last 15 hours:

4 b10-auth
4 b10-auth-2
4 b10-auth-3
3 b10-auth-4
2 b10-auth-5
4 b10-auth-6
6 b10-auth-7
3 b10-auth-8

comment:4 Changed 8 years ago by shane

  • Milestone changed from New Tasks to Next-Sprint-Proposed
  • Priority changed from medium to high

comment:5 Changed 8 years ago by jreed

The configuration was using default wildcards which don't work: #1196

But the output above didn't identify it and it appeared to be handling some traffic. And they didn't crash often -- maybe few hours before process crashed.

comment:6 Changed 7 years ago by jinmei

I've seen this on my personal server several times. I think we should fix this
or at least take a closer look at it at a higher priority.

comment:7 Changed 7 years ago by jinmei

  • Milestone set to Next-Sprint-Proposed

comment:9 Changed 7 years ago by jelte

  • Milestone changed from Next-Sprint-Proposed to Sprint-20121204

comment:10 Changed 7 years ago by jelte

AUTH_SERVER_FAILED is logged whenever *any* exception bubbles all the way up to main(). At this point it can't do anything but exit, and because we do have that catchall there debugging it is hard (no core, no exception type, only its message).

Normally whenever I start to debug anything with AUTH_SERVER_FAILED in the logs, I start out by removing this catch, then try to reproduce the issue.

So is this ticket now only about the log message? If so I think the 'fix' is to simply remove the try/catch block, and the message (and let it crash with an escaped exception).

Changed 7 years ago by jelte

Patch for b10-auth that does not catch any exception while retaining existing cleanup code

comment:11 follow-up: Changed 7 years ago by jelte

  • Owner set to jreed
  • Status changed from new to assigned

Not entirely sure which version we are running, but I've attached a patch against our latest release (though I suspect it'll apply cleanly to current master as well), which removes the catch, but still performs the cleanup calls (with a naive RAII-style cleaner class)

Could you get this patch applied and see what the core dump gives us?

(assigning this to you, hope you don't mind)

comment:12 in reply to: ↑ 11 Changed 7 years ago by jreed

Replying to jelte:

Could you get this patch applied and see what the core dump gives us?

Thanks. The patch is in place and new b10-auth processes started. I will let you know. It doesn't happen every day so we may need to wait.

comment:13 Changed 7 years ago by jelte

  • Milestone changed from Sprint-DHCP-20130103 to Sprint-20130108

comment:14 Changed 7 years ago by jreed

Can you provide a new patch? (I didn't look at it yet but it didn't apply to today's master.)

comment:15 follow-up: Changed 7 years ago by jinmei

There have been no news about this for quite some time. Unless there
is one soon, I suggest (and am going to) close this ticket for now.

comment:16 in reply to: ↑ 15 Changed 7 years ago by jinmei

Replying to jinmei:

There have been no news about this for quite some time. Unless there
is one soon, I suggest (and am going to) close this ticket for now.

Okay, closing. Let's create a new ticket if it happens again.

comment:17 Changed 7 years ago by jinmei

  • Resolution set to wontfix
  • Status changed from assigned to closed
Note: See TracTickets for help on using tickets.