Opened 7 years ago

Closed 7 years ago

#2447 closed enhancement (fixed)

User-friendly feedback when not running as root

Reported by: shane Owned by: jinmei
Priority: medium Milestone: Sprint-20121120
Component: ~Boss of BIND (obsolete) Version:
Keywords: Cc:
CVSS Scoring: Parent Tickets:
Sensitive: no Defect Severity: High
Sub-Project: Core Feature Depending on Ticket: alpha2
Estimated Difficulty: 3 Add Hours to Ticket: 0
Total Hours: 3.5 Internal?: no

Description

We had users complain about not running as root and getting log messages but no clear indication of what is wrong.

Running as non-root is probably a mistake.

One possible way forward is:

  1. Exit with an error if BIND 10 is not running as root
  2. Allow users to override this with a flag, such as '--non-root'

Because it is possible to run BIND 10 without using privileged ports, users may want to run without needing root access at all, so we add the '--non-root' option to support this mode.

Subtickets

Change History (22)

comment:1 Changed 7 years ago by shane

  • Sub-Project changed from DNS to Core

comment:2 Changed 7 years ago by vorner

I don't know if we want to hardcode this into boss. The boss should be
independent of what is being run under it. If I create some other server, that
doesn't need privileged ports, it makes no sense to have to pass that
parameter. Also, we might want to have the setuid bit on sockcreator, in which
case we want to run as non-root as well.

comment:3 in reply to: ↑ description ; follow-up: Changed 7 years ago by jinmei

Replying to shane:

We had users complain about not running as root and getting log messages but no clear indication of what is wrong.

Specifically which log messages are these?

comment:4 in reply to: ↑ 3 ; follow-up: Changed 7 years ago by jreed

Replying to jinmei:

Specifically which log messages are these?

I reproduce (by starting as non-root and then running in bindctl: "execute init_authoritative_server" and "config commit") to get some examples:

2012-11-02 10:45:24.836 INFO  [b10-boss.boss] BIND10_SOCKET_GET requesting socket [::]:53 of type TCP from the creator
2012-11-02 10:45:24.837 ERROR [b10-boss.boss] BIND10_SOCKET_ERROR error on bind call in the creator: 13/Permission denied
2012-11-02 10:45:24.839 ERROR [b10-auth.server_common] SRVCOMM_ADDRESS_FAIL failed to listen on addresses ("Error creating socket on bind")
2012-11-02 10:45:24.840 ERROR [b10-auth.auth] AUTH_CONFIG_LOAD_FAIL load of configuration failed: Server configuration failed: "Error creating socket on bind"
...

2012-11-02 10:45:24.969 FATAL [b10-auth.auth] AUTH_SERVER_FAILED server failed: Unknown uncaught exception from static createInstance: Failed to open master file: /Local/Users/jreed/opt/bind10/share/bind10-devel/static.zone: No such file or directory
2012-11-02 10:45:25.351 INFO  [b10-boss.boss] BIND10_PROCESS_ENDED process 68828 of b10-auth ended with status 11
2012-11-02 10:45:25.351 ERROR [b10-boss.boss] BIND10_COMPONENT_FAILED component b10-auth (pid 68828) failed: process terminated with exit status 11 (killed by signal 11: SIGSEGV)
2012-11-02 10:45:25.351 FATAL [b10-boss.boss] BIND10_COMPONENT_UNSATISFIED component b10-auth is required to run and failed

Actually above is multiple problems. The problem for this ticket is SRVCOMM_ADDRESS_FAIL

(I created tickets for the other problems: #2393, #2455, #2457.)

comment:5 in reply to: ↑ 4 Changed 7 years ago by jinmei

Replying to jreed:

Specifically which log messages are these?

2012-11-02 10:45:24.839 ERROR [b10-auth.server_common] SRVCOMM_ADDRESS_FAIL failed to listen on addresses ("Error creating socket on bind")

Actually above is multiple problems. The problem for this ticket is SRVCOMM_ADDRESS_FAIL

In that case I think the "solution" is to make the message "friendly",
rather than considering prohibiting the "non root" mode.

BTW, I don't know in which sense it's unfriendly. Would it become
"friendly" if it says something like this?

2012-11-02 10:45:24.839 ERROR [b10-auth.server_common] SRVCOMM_ADDRESS_FAIL failed to listen on addresses ("Error creating socket on bind: Operation not permitted")

I'd also note that we have this description for this log message:

The server failed to bind to one of the address/port pair it should according
to configuration, for reason listed in the message (usually because that pair
is already used by other service or missing privileges). The server will try
to recover and bind the address/port pairs it was listening to before (if any).

note the "missing privileges".

Although, admittedly people never (are willing to) read this and do
just complain - the sad reality is that such detailed description is
often for developer's self satisfaction.

comment:6 Changed 7 years ago by jreed

Also it is hidden among around 100 other lines of logging (excluding the log level). We need to reduce what is logged by default.

comment:7 Changed 7 years ago by shane

As an administrator, it would not be immediately obvious to me that the server was not functioning even if that was the only message that popped up when running.

One of my complaints about BIND 9 is that it starts and then goes into the background and fails and I have to dig through syslog output to see what has happened.

Probably what we need in the long run is a way to define a point where the server is successfully running, and if we have errors before then to exit telling the user.

However, I think in this case having a compliant about running as non-root is a simple check we can perform which is 99% correct, without much code.

comment:8 Changed 7 years ago by jelte

  • Defect Severity changed from N/A to High
  • Milestone changed from Next-Sprint-Proposed to Sprint-20121120

comment:9 follow-up: Changed 7 years ago by jinmei

Can we see the actual complaint from the user? I'm still not
convinced about what the real problem in this context is, and without
that I cannot be confident about what to do for that.

comment:10 in reply to: ↑ 9 Changed 7 years ago by jreed

Replying to jinmei:

Can we see the actual complaint from the user? I'm still not
convinced about what the real problem in this context is, and without
that I cannot be confident about what to do for that.

From an internal message:

My first thougtht, is that B10 should complain about not being run as
'root', when I first tried to launch the b10-auth module, it failed, and had
to look at the logfile to notice that there was a permission problem
creating the socket.

comment:11 Changed 7 years ago by jinmei

Replying to jreed:

Can we see the actual complaint from the user?

From an internal message:

My first thougtht, is that B10 should complain about not being run as
'root', when I first tried to launch the b10-auth module, it failed, and had
to look at the logfile to notice that there was a permission problem
creating the socket.

Okay, thanks. I've read the complete context of this comment, and it
still doesn't seem to inevitably require tweaking the bind10 process
so it will complain about running as a non root. I tend to agree with
Michal on his comment at: http://bind10.isc.org/ticket/2447?replyto=10#comment:2
At least architecturally, the problem didn't happen in the bind10
process, but primarily in sockcreator and, as a result, in b10-auth.
Tweaking bind10 for a problem of something else seems to be too ad
hoc.

comment:12 Changed 7 years ago by jinmei

  • Owner set to jinmei
  • Status changed from new to accepted

comment:13 follow-ups: Changed 7 years ago by jinmei

trac2447 is ready for review.

After having a discussion with the person who gave this feedback,
I figured out it was sufficient if we improve log message of b10-auth,
at least for him, and at least regarding the failure of binding a
socket to the privileged port.

The major issue for him seemed that the auth log message didn't say
anything like "permission denied":

2012-11-06 12:54:47.108 ERROR [b10-auth.auth] AUTH_CONFIG_LOAD_FAIL load of configuration failed: Server configuration failed: "Error creating socket on bind"

So, in 98a3c1b and 0590b6d, I improved the log message for that case:
boss's sockcreator module now includes more detailed info when binding
the socket fails (including the strerror), and the boss transparently
passes it to auth. So the log now looks like:

2012-11-08 00:40:31.213 ERROR [b10-auth.auth] AUTH_CONFIG_LOAD_FAIL
load of configuration failed: Server configuration failed: "Error
creating socket on bind to be bound to [::]:53: Permission denied -
probably need to restart BIND 10 as a super user"

If we also want to treat this case as a fatal error, it could be done
with a small extension, but I think we first need to discuss it, and
that should go to a separate ticket anyway; the current branch should
sufficiently address the user feedback.

The branch could stop here, but I addressed a few more issues
regarding "start as non root". Currently, if BIND 10 is installed as
a super user and you try to start it without the privilege, it fails
in an ugly way:

  • the bind10 process fails to dump the initial log, and dies with an uncaught exception
  • even if that particular issue is somehow addressed, it would still fail because msgq would fail to start due to the failure of opening the unix domain socket. this will then cause an ugly termination of the bind10 process

So, in the rest of the branch I tried to fix these. It contains two
independent fixes:

  • 88c9fb4 and ec63a9a address the first point.
  • the rest addresses the second point.

The entire branch shouldn't be very big, but since these are
technically off-topic extension, so if they look too much or
controversial, I'm okay with concentrating on the first two commits
for this branch.

And, one final note: the changes are generally log message updates, so
I didn't see the need for adding explicit unit tests for these. Only
86ed7ae is the non-editorial change, but it's basically a refactoring
by unifying two cases, and the existing test should cover it.

I don't think we need a changelog entry for these changes.

comment:14 Changed 7 years ago by jinmei

  • Owner changed from jinmei to UnAssigned
  • Status changed from accepted to reviewing

comment:15 Changed 7 years ago by jelte

  • Owner changed from UnAssigned to jelte

comment:16 follow-up: Changed 7 years ago by jelte

  • Owner changed from jelte to jinmei

I pushed one small typo fix, but apart from that (all) changes look fine.

comment:17 in reply to: ↑ 13 Changed 7 years ago by jreed

Replying to jinmei:

  • the bind10 process fails to dump the initial log, and dies with an uncaught exception
  • 88c9fb4 and ec63a9a address the first point.

That is #2258. Thanks!

comment:18 in reply to: ↑ 13 ; follow-up: Changed 7 years ago by jreed

Replying to jinmei:

  • even if that particular issue is somehow addressed, it would still fail because msgq would fail to start due to the failure of opening the unix domain socket. this will then cause an ugly termination of the bind10 process

Is this #1449 or #1168 (or maybe #1597)?

comment:19 in reply to: ↑ 16 ; follow-up: Changed 7 years ago by jinmei

Replying to jelte:

I pushed one small typo fix, but apart from that (all) changes look fine.

Thanks for the review and fix, and I noticed one regression I
introduced with a last-minute change. It's a minor typo in log ID.
I fixed it at b560a59. I believe it's trivial, so I'm going to merge
the branch.

comment:20 in reply to: ↑ 18 Changed 7 years ago by jinmei

Replying to jreed:

  • the bind10 process fails to dump the initial log, and dies with an uncaught exception
  • 88c9fb4 and ec63a9a address the first point.

That is #2258. Thanks!

Replying to jreed:

  • even if that particular issue is somehow addressed, it would still fail because msgq would fail to start due to the failure of opening the unix domain socket. this will then cause an ugly termination of the bind10 process

Is this #1449 or #1168 (or maybe #1597)?

Yes to all. I'll close these tickets too.

comment:21 in reply to: ↑ 19 Changed 7 years ago by jinmei

Replying to jinmei:

Replying to jelte:

I pushed one small typo fix, but apart from that (all) changes look fine.

Thanks for the review and fix, and I noticed one regression I
introduced with a last-minute change. It's a minor typo in log ID.
I fixed it at b560a59. I believe it's trivial, so I'm going to merge
the branch.

Now merge done, closing.

comment:22 Changed 7 years ago by jinmei

  • Resolution set to fixed
  • Status changed from reviewing to closed
  • Total Hours changed from 0 to 3.5
Note: See TracTickets for help on using tickets.