Opened 7 years ago

Closed 7 years ago

Last modified 7 years ago

#2178 closed defect (fixed)

Logging for Out-of-zone data

Reported by: jreed Owned by: jelte
Priority: medium Milestone: Sprint-20120904
Component: data source Version:
Keywords: Cc:
CVSS Scoring: Parent Tickets:
Sensitive: no Defect Severity: N/A
Sub-Project: DNS Feature Depending on Ticket:
Estimated Difficulty: 3 Add Hours to Ticket: 0
Total Hours: 3.5 Internal?: no

Description

2012-08-01 18:42:04.147 ERROR [b10-auth.config] CONFIG_CCSESSION_MSG_INTERNAL error handling CC session message: Out-of-zone data at line 1

This problem was caused by a data_sources param configuration with a typo in the origin.

1) This log message should have a more precises message label (instead of CONFIG_CCSESSION_MSG_INTERNAL).

2) It should indicate what zone had the problem. In that context I may see the misspelling faster.

Subtickets

Change History (16)

comment:1 Changed 7 years ago by jelte

  • Milestone changed from New Tasks to Sprint-20120821

comment:2 Changed 7 years ago by jelte

  • Owner set to jelte
  • Status changed from new to assigned

comment:3 follow-up: Changed 7 years ago by jelte

  • Owner changed from jelte to UnAssigned
  • Status changed from assigned to reviewing

Ready for review, I've added an optional argument to masterLoad() for even better error messages, and the configurableclientlist now specifically catches masterload errors. It simply logs them and continues (so that other datasources/zones still work).

I have found two additional things for which I have a patch, but have not included it in this branch;

  • the config update hack in auth's main.cc (line 205) appears to be unnecessary; an update is immediately called anyway.
  • a potentially bigger problem that is shown by this ticket is that reconfigure() in the DataSourceConfigurator? can throw, which it shouldn't, as it is used as a remote config callback; we should probably catch and log std::exception and maybe even (...) as well (in reconfigureInternal()).

I can add them or create a separate ticket.

Proposed changelog:
[bug]
b10-auth no longer exits upon encountering a zone content problem (such as out-of-zone data) in configured MasterFiles? zones. It only logs an error, and all other zones and datasources are still loaded. The error log message has been improved to include the zone origin and source file name.

comment:4 Changed 7 years ago by jinmei

  • Owner changed from UnAssigned to jinmei

comment:5 in reply to: ↑ 3 ; follow-up: Changed 7 years ago by jinmei

In short, I'm okay with merging the branch. I've fixed some small
things, and there are some other small open points, which I'd leave to
you. But larger discussion points occurred to me while reviewing the
code (not only for the changes in this branch).

Things already addressed:

  • client_list_unittest.cc didn't compile for me because the result class doesn't have operator<< (Does it work for a newer version of gtest?) I've fixed it myself and committed it. And, while doing so, I've renamed negativeResult_ to negative_result_ so it confirms to the variable naming convention.
  • I did some other minor editorial fixes.

Possible minor issues:

  • MasterLoadError can only happen from here:
                            if (type == "MasterFiles") {
                                finder->load(paramConf->get(*it)->stringValue());
    
    so we can narrow the try block.
  • I'd log the zone name and class in the unified form here:
                        } catch (const isc::dns::MasterLoadError& mle) {
                            LOG_ERROR(logger, DATASRC_MASTERLOAD_ERROR)
                                .arg(mle.what());
    

About Changelog:
I think this is primarily a fix to lib(b10-)datasrc, and the change in
b10-auth is an effect of it that is just most visible (and maybe the
only at this moment) to the user. So, I'd say something like this:

libdatasrc: the data source client list class now ignores zone content
problems (such as out-of-zone data) in MasterFiles type zones, instead
of aborting the entire configuration.  It only logs an error, and all
other zones and datasources are still loaded. The error log message
has been improved to include the zone origin and source file name.  As
a result of this change, b10-auth no longer exits upon encountering
such errors.

Bigger issues:

  • The current error handling in ConfigurableClientList::configure seems quite ad hoc and policyless. It's not clear in which case we (intend to) abort the configuration, accept partial error in a single data source, accept total error in a single data source, etc. What I really would like to see is a general policy about these, introduce higher level exceptions to represent the concept, and let the underlying data source (client) implementation throw them according to the severity of specific errors. I think it's also related to your second "addition" thing.
  • Related to the previous point, catching MasterLoadError seems fragile, because it's not even clear in client_list that the data source uses masterLoad() for loading the zone.
  • Also related, I guess it's probably too strict to refuse loading completely when we see an out-of-zone name. For example, BIND 9's zone parser warns about it and continue parsing. We should probably do the same.

But I don't request addressing these in this ticket. Some of them are
clearly beyond the scope of this ticket, and some may be too big
anyway, and we'll substantially change the loader and parser pretty
soon, so making big changes in this area immaturely wouldn't be a good
idea. It would still be a good idea to make some comments about these
points so we can remember it later.

Replying to jelte:

  • the config update hack in auth's main.cc (line 205) appears to be unnecessary; an update is immediately called anyway.

Is this ("unnecessary") true? My understanding is that even if the
update callback is called it doesn't provide the default
configuration. Has it been fixed, or is it different for remote
configuration?

  • a potentially bigger problem that is shown by this ticket is that reconfigure() in the DataSourceConfigurator? can throw, which it shouldn't, as it is used as a remote config callback; we should probably catch and log std::exception and maybe even (...) as well (in reconfigureInternal()).

See above. I agree we should revisit the error handling policy (and
implementation). But as for "it is used as a remote config callback",
I guess it's part of another big problem of what we should do if
remote config update fails in general. Also, I think std or even
unexpected type of exceptions should actually kill (maybe gracefully)
the process just like in other cases. We generally consider these
an unrecoverable error, and rather than trying to deal with it, just
let the process die and restart.

comment:6 Changed 7 years ago by jinmei

  • Owner changed from jinmei to jelte

comment:7 in reply to: ↑ 5 ; follow-ups: Changed 7 years ago by jelte

  • Owner changed from jelte to jinmei

Replying to jinmei:

In short, I'm okay with merging the branch. I've fixed some small
things, and there are some other small open points, which I'd leave to
you. But larger discussion points occurred to me while reviewing the
code (not only for the changes in this branch).

Things already addressed:

  • client_list_unittest.cc didn't compile for me because the result class doesn't have operator<< (Does it work for a newer version of gtest?) I've fixed it myself and committed it. And, while doing so, I've renamed negativeResult_ to negative_result_ so it confirms to the variable naming convention.

interesting, it certainly compiled here (but i had been wondering why the EQ(==) had been used in other places, since operator== obviously is defined); fyi, I have gtest 1.6.0

  • I did some other minor editorial fixes.

ok, they look fine.

Possible minor issues:

  • MasterLoadError can only happen from here:
                            if (type == "MasterFiles") {
                                finder->load(paramConf->get(*it)->stringValue());
    
    so we can narrow the try block.

Yes, originally i had narrowed it, but then you do get an empty added zone (since the 'zonefinder' object is still added. Which would be solved by either checking that it succeeded, or doing cache->addZone() it in both branches. I did the latter for now, and narrowed the try.

Passing this ticket back to you to check this change; discussion below is needed, but indeed out of scope for this ticket (except perhaps this first one about the unified form).

  • I'd log the zone name and class in the unified form here:
                        } catch (const isc::dns::MasterLoadError& mle) {
                            LOG_ERROR(logger, DATASRC_MASTERLOAD_ERROR)
                                .arg(mle.what());
    

I don't understand; the exception has zonename/class in it, or do you mean it should not be in the exception but in the log arguments?

About Changelog:
I think this is primarily a fix to lib(b10-)datasrc, and the change in
b10-auth is an effect of it that is just most visible (and maybe the
only at this moment) to the user. So, I'd say something like this:

libdatasrc: the data source client list class now ignores zone content
problems (such as out-of-zone data) in MasterFiles type zones, instead
of aborting the entire configuration.  It only logs an error, and all
other zones and datasources are still loaded. The error log message
has been improved to include the zone origin and source file name.  As
a result of this change, b10-auth no longer exits upon encountering
such errors.

ok

Bigger issues:

  • The current error handling in ConfigurableClientList::configure seems quite ad hoc and policyless. It's not clear in which case we (intend to) abort the configuration, accept partial error in a single data source, accept total error in a single data source, etc. What I really would like to see is a general policy about these, introduce higher level exceptions to represent the concept, and let the underlying data source (client) implementation throw them according to the severity of specific errors. I think it's also related to your second "addition" thing.

Probably, but I do hope we can get something easier to edit than command/config in auth :)

  • Related to the previous point, catching MasterLoadError seems fragile, because it's not even clear in client_list that the data source uses masterLoad() for loading the zone.

Indeed, however, in this specific case I didn't think too much of it because 'masterfiles' is already a special case in this implementation, with its own handling.

  • Also related, I guess it's probably too strict to refuse loading completely when we see an out-of-zone name. For example, BIND 9's zone parser warns about it and continue parsing. We should probably do the same.

in that case we'll have to differentiate either with separate exceptions; or come up with a different way to propagate problems.

out-of-zone data is the one that triggered this ticket, but i think most of the other ones are more severe (though none should have the old behaviour of completely crapping out on startup).

But I don't request addressing these in this ticket. Some of them are
clearly beyond the scope of this ticket, and some may be too big
anyway, and we'll substantially change the loader and parser pretty
soon, so making big changes in this area immaturely wouldn't be a good
idea. It would still be a good idea to make some comments about these
points so we can remember it later.

ack.

Replying to jelte:

  • the config update hack in auth's main.cc (line 205) appears to be unnecessary; an update is immediately called anyway.

Is this ("unnecessary") true? My understanding is that even if the
update callback is called it doesn't provide the default
configuration. Has it been fixed, or is it different for remote
configuration?

ah, indeed, we still need to fix that.

  • a potentially bigger problem that is shown by this ticket is that reconfigure() in the DataSourceConfigurator? can throw, which it shouldn't, as it is used as a remote config callback; we should probably catch and log std::exception and maybe even (...) as well (in reconfigureInternal()).

See above. I agree we should revisit the error handling policy (and
implementation). But as for "it is used as a remote config callback",
I guess it's part of another big problem of what we should do if
remote config update fails in general. Also, I think std or even
unexpected type of exceptions should actually kill (maybe gracefully)
the process just like in other cases. We generally consider these
an unrecoverable error, and rather than trying to deal with it, just
let the process die and restart.

well, there's two potential problems (which is why the set-remote-config-handler-call documentation specifies the handler should not throw);

  • the handler cannot be trusted to immediately report errors back to the user since the module may not be running, and even if we had 'offline' config, we still wouldn't know which modules might use it
  • more importantly, especially in this scenario, is that if the module is of kind 'needed', and there is an error in the config which kills the module, you can't start bind10 again and fix it; it will immediately exit during initialization.

So there may be classes of errors that should do so; i certainly don't think zone-loading is one of them :)

comment:8 in reply to: ↑ 7 Changed 7 years ago by jinmei

Replying to jelte:

Possible minor issues:

  • MasterLoadError can only happen from here:
                            if (type == "MasterFiles") {
                                finder->load(paramConf->get(*it)->stringValue());
    
    so we can narrow the try block.

Yes, originally i had narrowed it, but then you do get an empty added zone (since the 'zonefinder' object is still added. Which would be solved by either checking that it succeeded, or doing cache->addZone() it in both branches. I did the latter for now, and narrowed the try.

This looks okay, so please feel free to merge.

comment:9 Changed 7 years ago by jinmei

  • Owner changed from jinmei to jelte

comment:10 in reply to: ↑ 7 ; follow-up: Changed 7 years ago by jinmei

Responding to non-blocking points...

Replying to jelte:

Things already addressed:

  • client_list_unittest.cc didn't compile for me because the result class doesn't have operator<< (Does it work for a newer version of gtest?)

interesting, it certainly compiled here (but i had been wondering why the EQ(==) had been used in other places, since operator== obviously is defined); fyi, I have gtest 1.6.0

Then it looks like a change in 1.6. This issue has been a well known
problem, and you can find many such compromise by grepping the
keywords of 'EXPECT_TRUE' and '==' (I believe we leave comments about
why it cannot be EXPECT_EQ in some of the cases).

  • I'd log the zone name and class in the unified form here:
                        } catch (const isc::dns::MasterLoadError& mle) {
                            LOG_ERROR(logger, DATASRC_MASTERLOAD_ERROR)
                                .arg(mle.what());
    

I don't understand; the exception has zonename/class in it, or do you mean it should not be in the exception but in the log arguments?

Ah, okay, I didn't chase it that deeper. I'd still prefer including
the zone info explicitly, because the current code depends on the
specific what() format of MasterLoadError (that should be common to
all 'throw' cases), and this class is far from the data source
module. But, as discussed, we'll revise the code around this more
substantially pretty soon anyway, so I'm okay with keeping the current
version.

Bigger issues:

  • The current error handling in ConfigurableClientList::configure seems quite ad hoc and policyless. [...]

Probably, but I do hope we can get something easier to edit than command/config in auth :)

Hmm, I'm not sure how the auth config implementation is related in
this context. Out of curiosity, what kind of difficulty are you
seeing?

  • a potentially bigger problem that is shown by this ticket is that reconfigure() in the DataSourceConfigurator? can throw, which it shouldn't, as it is used as a remote config callback; we should probably catch and log std::exception and maybe even (...) as well (in reconfigureInternal()).

See above. I agree we should revisit the error handling policy (and
implementation). But as for "it is used as a remote config callback",
I guess it's part of another big problem of what we should do if
remote config update fails in general. Also, I think std or even
unexpected type of exceptions should actually kill (maybe gracefully)
the process just like in other cases. We generally consider these
an unrecoverable error, and rather than trying to deal with it, just
let the process die and restart.

well, there's two potential problems (which is why the set-remote-config-handler-call documentation specifies the handler should not throw);

  • the handler cannot be trusted to immediately report errors back to the user since the module may not be running, and even if we had 'offline' config, we still wouldn't know which modules might use it
  • more importantly, especially in this scenario, is that if the module is of kind 'needed', and there is an error in the config which kills the module, you can't start bind10 again and fix it; it will immediately exit during initialization.

So there may be classes of errors that should do so; i certainly don't think zone-loading is one of them :)

Content errors in a zone file shouldn't cause such disruption, but I
don't think it causes std or other abnormal type of exceptions.

Regarding the "don't throw" requirement in the remote config handler,
while I've not fully thought about it, in my gut feeling it's not
about throw or no throw, because it should be possible to catch
exceptions in the caller side of the module CC if it were only about
exception. The real issue seems, as I mentioned in the previous
comment (cited above), "what we should do if remote config update
fails in general", whether or not it's reported in the form of
exception. Errors can happen (just like we see it in this topic), and
even if we hide the corresponding exception, the problem of how we
ensure the system-level consistency among different modules that
share the config doesn't go away. That's what I meant by "part of
another big problem".

comment:11 in reply to: ↑ 10 ; follow-up: Changed 7 years ago by jelte

  • Owner changed from jelte to jinmei

Bigger issues:

  • The current error handling in ConfigurableClientList::configure seems quite ad hoc and policyless. [...]

Probably, but I do hope we can get something easier to edit than command/config in auth :)

Hmm, I'm not sure how the auth config implementation is related in
this context. Out of curiosity, what kind of difficulty are you
seeing?

while we have a number of config/command related handlers that are too flat and naive, IMO the configparser classes in auth overshot on the other side, and it has a bit too much abstraction and code going on for what it does, making it less flexible than it could be, and more complicated to change than necessary.

Perhaps if we filter out the essentials and make a framework out of it it would be better (keeping most if not all of the current things it does, like the build->commit design), as we would then at least have a standard way to do it, but right now it seems a bit overdone and too auth-specific.

Regarding the "don't throw" requirement in the remote config handler,
while I've not fully thought about it, in my gut feeling it's not
about throw or no throw, because it should be possible to catch
exceptions in the caller side of the module CC if it were only about
exception. The real issue seems, as I mentioned in the previous

and it does catch that, but as this ticket shows it may have lost too much context to be useful in error reporting :)

comment (cited above), "what we should do if remote config update
fails in general", whether or not it's reported in the form of
exception. Errors can happen (just like we see it in this topic), and
even if we hide the corresponding exception, the problem of how we
ensure the system-level consistency among different modules that
share the config doesn't go away. That's what I meant by "part of
another big problem".

that is one reason i wanted to keep things 'local' as much as possible (and why i tend to dislike system-wide settings). Unless whatever checks a certain snippet of config verifies it well and completely, this may happen, and whoever 'owns' the data is responsible for it, but cannot make sure that other 'users' of said data can handle it correctly.

hmm. monday-morning-wild-idea-time: perhaps config container classes should be shared libraries (shared by the modules that use them) as well.

comment:12 in reply to: ↑ 11 Changed 7 years ago by jinmei

Replying to jelte:

First off, to be clear, I'm okay with merging the branch as its
current form.

Bigger issues:

  • The current error handling in ConfigurableClientList::configure seems quite ad hoc and policyless. [...]

Probably, but I do hope we can get something easier to edit than command/config in auth :)

Hmm, I'm not sure how the auth config implementation is related in
this context. Out of curiosity, what kind of difficulty are you
seeing?

while we have a number of config/command related handlers that are too flat and naive, IMO the configparser classes in auth overshot on the other side, and it has a bit too much abstraction and code going on for what it does, making it less flexible than it could be, and more complicated to change than necessary.

Perhaps if we filter out the essentials and make a framework out of it it would be better (keeping most if not all of the current things it does, like the build->commit design), as we would then at least have a standard way to do it, but right now it seems a bit overdone and too auth-specific.

Okay, points taken. As the person who wrote that stuff first, I admit
in its current state the complexity of the abstraction may not buy
much. But configurable knobs tend to grow rapidly once we reach that
stage, and I guess we can generally agree that flat-style
configuration handler (which would normally have hundreds of if-else
of switch-case blocks) will soon have scalability problems, aside from
the current style of abstraction is a good way to address the issue.
I also wanted to make it externally pluggable in future, so someone
can extend the configuration just by defining a derived class of
ConfigParser, and somehow plug it into the system, preferably
without requiring rebuild.

As for auth-specific'ness, I'm not so sure if it's too much so. The
concept of build->commit is generic, and the base parser class is in
fact mostly independent from specifics of auth except for its name:
AuthConfigParser. I'm not sure if I explicitly documented it, but I
certainly intended to use it as a general framework some day, not only
for auth.

Anyway, whether or not if we use this specific design pattern, I agree
it's better we have a general framework for config parser/handler.

comment (cited above), "what we should do if remote config update
fails in general", whether or not it's reported in the form of
exception. Errors can happen (just like we see it in this topic), and
even if we hide the corresponding exception, the problem of how we
ensure the system-level consistency among different modules that
share the config doesn't go away. That's what I meant by "part of
another big problem".

that is one reason i wanted to keep things 'local' as much as possible (and why i tend to dislike system-wide settings). Unless whatever checks a certain snippet of config verifies it well and completely, this may happen, and whoever 'owns' the data is responsible for it, but cannot make sure that other 'users' of said data can handle it correctly.

Maybe I don't understand it well, but I personally don't think it's
because system-wide settings. Unless different modules can run
completely independently, we still encounter a situation where some of
them share some kind of state/information/config, etc, and ensuring
consistency will generally be an issue (and it's a difficult one).
That's the case whether we implement it as a system-wide configuration
(like the revised "data_sources") or a config in a single module that
can be referenced by other modules (like "secondary zones" in zonemgr
referenced by ddns).

hmm. monday-morning-wild-idea-time: perhaps config container classes should be shared libraries (shared by the modules that use them) as well.

I'm afraid, in general, module specific errors cannot be completely
avoided, but providing a unified config handler with unified error
handling policy will help ensure the consistency in practice.

comment:13 Changed 7 years ago by jinmei

  • Owner changed from jinmei to jelte

comment:14 Changed 7 years ago by jinmei

Hmm, I just noticed you've already merged the branch to master (which
I'm okay with).

So why is this ticket still open?

comment:15 Changed 7 years ago by jelte

  • Resolution set to fixed
  • Status changed from reviewing to closed

Because I forgot to close it :)

comment:16 Changed 7 years ago by jinmei

  • Total Hours changed from 0 to 3.5
Note: See TracTickets for help on using tickets.