Opened 10 years ago

Closed 9 years ago

#109 closed defect (fixed)

bind10 loop restarting b10-auth even if can't bind socket

Reported by: jreed Owned by: jreed
Priority: low Milestone: 05. 3rd Incremental Release: Serious Secondary
Component: b10-auth Version:
Keywords: Cc:
CVSS Scoring: Parent Tickets:
Sensitive: no Defect Severity:
Sub-Project: Feature Depending on Ticket:
Estimated Difficulty: Add Hours to Ticket:
Total Hours: Internal?:

Description

If something is using same port as b10-auth, bind10 will continue to retry starting it. Not sure if the behaviour is good, so opening ticket so don't forget about it.

Also the error is:

Permission denied

which is vague and doesn't tell you what BIND 10 compoent printed it.

Subtickets

Change History (17)

comment:1 Changed 10 years ago by shane

b10-auth needs a better error message. In the long run it will need a log message.

comment:2 Changed 10 years ago by larissas

  • Milestone set to 04. 2nd Incremental Release
  • Owner set to each
  • Status changed from new to assigned

comment:3 in reply to: ↑ description Changed 10 years ago by each

  • Owner changed from each to UnAssigned
  • Status changed from assigned to reviewing

Also the error is:

Permission denied

which is vague and doesn't tell you what BIND 10 compoent printed it.

Correction, the error is actually "binding socket failure". But yeah, it's vague.

Please review:
svn diff -r 1841:HEAD .

In addition to clarifying that specific error (and a few others), I preprended most messages with "[b10-auth]". I'm wondering if I should do the same in the AuthSrv? code as well. I also wonder if there should be a global variable "program" so that all the libraries can include the name of the module when printing error messages.

comment:4 follow-up: Changed 10 years ago by jreed

The "svn diff -r 1841:HEAD ." gives me an unrelated patch (for a loadzone typo). What revision to check?

As for the error message: I copied and pasted it. Now I test and copy and paste again:

Permission denied

comment:5 in reply to: ↑ 4 Changed 10 years ago by each

The "svn diff -r 1841:HEAD ." gives me an unrelated patch (for a loadzone typo). What revision to check?

Whoops I'm sorry, I meant on branches/trac109, not in trunk.

As for the error message: I copied and pasted it. Now I test and copy and paste again:

Permission denied

Tell me how to repeat the test? When I have something listening on port 5300 (I used a bind9 named) and try to run bind10, I get "binding socket failure".

comment:6 Changed 10 years ago by jreed

I duplicate by using the bind10 --port or b10-auth -p option to listen on a port already used.

ktrace showing same "Permission denied" that I also see on console.

 12124      1 b10-auth RET   read 1024/0x400
 12124      1 b10-auth CALL  fcntl(7,8,0x7f7fffffd1a0)
 12124      1 b10-auth RET   fcntl 0
 12124      1 b10-auth CALL  __socket30(2,2,0x11)
 12124      1 b10-auth RET   __socket30 8
 12124      1 b10-auth CALL  bind(8,0x7f7fffffd740,0x10)
 12124      1 b10-auth MISC  sockargs: 16, 000203b9000000000000000000000000
 12124      1 b10-auth RET   bind -1 errno 13 Permission denied

comment:7 follow-up: Changed 10 years ago by jreed

I forgot to mention that this was with boost-system so using the BOOST ASIO. I am compiling the branch now. As for the patch it looked to be only cosmetic -- but looks great!

comment:8 Changed 10 years ago by jreed

Now I realize the vague "Permission denied" was due to lack of permissions. Now I get "Address already in use" but no indication this error comes from "b10-auth".

comment:9 in reply to: ↑ 7 ; follow-up: Changed 10 years ago by each

I forgot to mention that this was with boost-system so using the BOOST ASIO. I am compiling the branch now. As for the patch it looked to be only cosmetic -- but looks great!

Thanks. I can merge if you think it's ready, but I'd like your input on a few things first...

Is there a problem with the restart-loop behavior mentioned in the ticket description? I'm inclined to think it's the right thing to do, as long as b10-auth is being clear about what's going wrong.

Also wondering if I should add more "[b10-auth]" prefixes in AuthSrv?. (There are some "[AuthSrv?]" prefixes in there already, and this seems wrong to me; I think it's more useful to tell people which program had trouble than which library function it was calling at the time.)

comment:10 Changed 9 years ago by shane

  • Component changed from Unclassified to b10-auth
  • Owner changed from UnAssigned to jreed

I think Evan is waiting on feedback from Jeremy here?

comment:11 in reply to: ↑ 9 Changed 9 years ago by jreed

Replying to each:

I forgot to mention that this was with boost-system so using the BOOST ASIO. I am compiling the branch now. As for the patch it looked to be only cosmetic -- but looks great!

Thanks. I can merge if you think it's ready, but I'd like your input on a few things first...

Is there a problem with the restart-loop behavior mentioned in the ticket description? I'm inclined to think it's the right thing to do, as long as b10-auth is being clear about what's going wrong.

Yes it is probably correct to loop, not sure how fast it should attempt to restart. But the error message must be clear. The error message as I see it is clear but doesn't indicate what component sent the message. So running BIND 10 you don't know which module prints the error.

Also wondering if I should add more "[b10-auth]" prefixes in AuthSrv?. (There are some "[AuthSrv?]" prefixes in there already, and this seems wrong to me; I think it's more useful to tell people which program had trouble than which library function it was calling at the time.)

I agree it is more useful to know which program. Later when we add a debugging framework we can improve this.

comment:12 Changed 9 years ago by jreed

  • Owner changed from jreed to UnAssigned

comment:13 Changed 9 years ago by shane

  • Owner changed from UnAssigned to each

comment:14 Changed 9 years ago by jreed

  • Owner changed from each to jreed

jreed will test this on trunk.

comment:15 Changed 9 years ago by shane

Was this tested? Can we close this ticket, or should I push it to the next release?

comment:16 Changed 9 years ago by jreed

  • Milestone changed from 04. 2nd Incremental Release: Early Adopters to 05. 3rd Incremental Release

Problem still exists. The problem is b10-auth (I assume) says "Address already in use" without saying what component has that problem. (Or if ran without privilege it says "Permission denied".

(EADDRINUSE and EACCES)

Unrelated to b10-auth, but msgq also has messages not prefixed with its name: "Connection"

comment:17 Changed 9 years ago by jreed

  • Resolution set to fixed
  • Status changed from reviewing to closed

For now, we will assume that the looping is fine. So my other complaint is the error message not identifying the component it came from. This is fixed in r2074 (trac #227) so closing this ticket.

Note: See TracTickets for help on using tickets.