Opened 7 years ago

Closed 7 years ago

#2421 closed defect (fixed)

don't reject configuring entire inmemory datasrc due to one broken zone

Reported by: jinmei Owned by: muks
Priority: medium Milestone: Sprint-20121120
Component: data source Version:
Keywords: Cc:
CVSS Scoring: Parent Tickets:
Sensitive: no Defect Severity: High
Sub-Project: DNS Feature Depending on Ticket:
Estimated Difficulty: 4 Add Hours to Ticket: 0
Total Hours: 2.49 Internal?: no

Description (last modified by jinmei)

We currently reject configuring a data source (with in-memory cache)
if loading one particular zone fails. That's probably not a good
behavior, and is at least incompatible with BIND 9. BIND 9 simply
marks it as a bad zone and returns SERVFAIL to queries to that zone,
still loading and serving any other valid zones.

I think that's more reasonable behavior, and I believe it's not very
difficult to implement it. So I suggest we fix it now.

It's also related to an issue reported by a user.

See also #1932. In that ticket I hinted deferring it until we
complete the generic loader work, but with the understanding of the
most recent code I think it can be done separately from the loader,
i.e., just updating the ConfigurableClientList class.

Subtickets

Change History (19)

comment:1 Changed 7 years ago by jinmei

  • Description modified (diff)

comment:2 Changed 7 years ago by jelte

  • Defect Severity changed from N/A to High
  • Milestone changed from Next-Sprint-Proposed to Sprint-20121120

comment:3 Changed 7 years ago by jinmei

  • Priority changed from high to medium

comment:4 Changed 7 years ago by muks

  • Owner set to muks
  • Status changed from new to assigned

I'm not able to reproduce this bug. A lettuce test returns REFUSED for the broken zone but it still serves the other valid zones (from the same datasource).

comment:5 Changed 7 years ago by muks

  • Owner changed from muks to UnAssigned
  • Status changed from assigned to reviewing

See trac2421 for the lettuce test. Please review if the test is OK and if this is reasonable, we can merge the test and simply close this bug.

comment:6 Changed 7 years ago by muks

  • Owner changed from UnAssigned to muks

Taking back. It seems that exceptions other than MasterLoadError (such as AddError) trigger this issue. Will modify the lettuce test for it and fix code.

comment:7 follow-up: Changed 7 years ago by muks

  • Owner changed from muks to UnAssigned

Back to review. It now handles NullRRset, AddError and EmptyZone exceptions.

comment:8 in reply to: ↑ 7 Changed 7 years ago by jinmei

Replying to muks:

Back to review. It now handles NullRRset, AddError and EmptyZone exceptions.

Not looking at the code, but I don't think it a good idea to catch
each and every possible exception if the above means that. I think we
use this opportunity to make it a bit more maintainable: introducing
some exception class hierarchy so we can distinguish minor errors for
a specific zone and more serious errors. The former should be caught
per zone basis and can be basically ignored with a log message; the
latter would need to be propagated to a higher level, and would either
result in dropping the entire configuration or even terminate the
program.

comment:9 follow-up: Changed 7 years ago by vorner

  • Owner changed from UnAssigned to muks

Hello

I agree with Jinmei it would be better to have some kind of exception hierarchy for this.

Also, there are some problems with the code:

  • You reuse log-ids. That's forbidden, each log ID must appear at exactly one place of the code.
  • Can a user cause DATASRC_LOAD_NULL_RRSET to happen? How? I think this is a programmer error (that there's a NULL RRset to start with). What would the user do with a null RRset if it happens? Would he at least guess what the problem is? (because from user point of view, there's no such thing as NULL RRset).
  • Could there also be unit tests for this, besides lettuce tests?

comment:10 in reply to: ↑ 9 ; follow-up: Changed 7 years ago by muks

Hi Michal, Jinmei

Replying to vorner:

I agree with Jinmei it would be better to have some kind of exception hierarchy for this.

This is added now.

Also, there are some problems with the code:

  • You reuse log-ids. That's forbidden, each log ID must appear at exactly one place of the code.

Fixed. :)

  • Can a user cause DATASRC_LOAD_NULL_RRSET to happen? How? I think this is a programmer error (that there's a NULL RRset to start with). What would the user do with a null RRset if it happens? Would he at least guess what the problem is? (because from user point of view, there's no such thing as NULL RRset).

There was a way before using the add() method which has been removed since. But even then, I feel it's better to keep the validation check in place. But we don't catch the NullRRset (it's not in the ZoneException hierarchy, but is an InvalidParameter) anymore.

  • Could there also be unit tests for this, besides lettuce tests?

I feel lettuce is the best place for it as this is a bug that shows up with bad user configuration. One more check has been added to the lettuce test. I think they cover the various cases now. Please see this and if you feel that a unit test would cover something and would be needed, I'll add it.

comment:11 Changed 7 years ago by muks

  • Owner changed from muks to vorner

comment:12 in reply to: ↑ 10 ; follow-up: Changed 7 years ago by vorner

  • Owner changed from vorner to muks

Hello

Replying to muks:

  • Could there also be unit tests for this, besides lettuce tests?

I feel lettuce is the best place for it as this is a bug that shows up with bad user configuration. One more check has been added to the lettuce test. I think they cover the various cases now. Please see this and if you feel that a unit test would cover something and would be needed, I'll add it.

I don't say there should not be the lettuce test. The lettuce test is indeed very useful.

But every piece of code should (at least in theory) be covered by unit tests, at least simple ones. You could mock the loader to throw something.

The reasons to have unit test as well as unit ones are two:

  • Developers usually run unit tests only, not the lettuce tests, during development. I suspect not everybody runs lettuce test at least when submitting the ticket to review. So unit tests find the problem sooner.
  • When there's a problem, the unit test has smaller scope, so it points closer to the problem.

Also, this description:

% DATASRC_LOAD_FROM_ITERATOR_ERROR %1
An error was found in the zone data when it was being loaded from an
iterator. The zone was not loaded. The specific error is shown in the
message, and should be addressed.

I don't know if the user knows enough of the internal to have clue what the iterator is. May I suggest to say something like „when it was being loaded from another data source“ or „from database“ or something like that?

Also, thinking about it, it might be useful to also log which zone it is (in both cases).

With regards

comment:13 Changed 7 years ago by jinmei

btw: this will definitely need a changelog entry. In general, I suggest you either explicitly
say you don't see the need for it (the reviewer may disagree in which case it should
be discussed) or provide one. If you're silent about it, it's not clear whether
you didn't think it was needed or just forgot it, and if the reviewer doesn't notice
it we merge the branch without a changelog when it would actually be needed.

comment:14 Changed 7 years ago by muks

ChangeLog entry:

XXX.    [bug]           muks
        When some zones have invalid data in them, b10-auth no
        longer exits when loading them. It continues and serves
        the good zones.
        (Trac #2421, git ...)

comment:15 in reply to: ↑ 12 ; follow-up: Changed 7 years ago by muks

  • Owner changed from muks to vorner

Hi Michal

Replying to vorner:

The reasons to have unit test as well as unit ones are two:

  • Developers usually run unit tests only, not the lettuce tests, during development. I suspect not everybody runs lettuce test at least when submitting the ticket to review. So unit tests find the problem sooner.
  • When there's a problem, the unit test has smaller scope, so it points closer to the problem.

That is good enough of a reason. :) I've added unit tests.

Also, this description:

% DATASRC_LOAD_FROM_ITERATOR_ERROR %1
An error was found in the zone data when it was being loaded from an
iterator. The zone was not loaded. The specific error is shown in the
message, and should be addressed.

I don't know if the user knows enough of the internal to have clue what the iterator is. May I suggest to say something like „when it was being loaded from another data source“ or „from database“ or something like that?

Updated.

Also, thinking about it, it might be useful to also log which zone it is (in both cases).

Done.

comment:16 in reply to: ↑ 15 ; follow-up: Changed 7 years ago by vorner

  • Owner changed from vorner to muks

Hello

Replying to muks:

That is good enough of a reason. :) I've added unit tests.

I have two very minor points.

First, it would be nice to have some kind of comment explaining the zones are broken and should not be loaded.

Also, you might consider adding EXPECT_NO_THROW around list_->configure(elem, true);

Other than that, it looks OK to merge.

comment:17 in reply to: ↑ 16 Changed 7 years ago by muks

  • Owner changed from muks to vorner

Hi Michal

Replying to vorner:

I have two very minor points.

First, it would be nice to have some kind of comment explaining the zones are broken and should not be loaded.

I've updated the comments.

Also, you might consider adding EXPECT_NO_THROW around list_->configure(elem, true);

Done. :)

comment:18 Changed 7 years ago by vorner

  • Owner changed from vorner to muks
  • Total Hours changed from 0 to 2.49

Hello

I think it is OK to merge.

comment:19 Changed 7 years ago by muks

  • Resolution set to fixed
  • Status changed from reviewing to closed

Merged to master branch in commit bbb0752c34f7712864b596f208cefd9746eab287:

* e650a0f [2421] Put code inside EXPECT_NO_THROW
* 744d1be [2421] Update comments about test data
* b37e972 [2421] Include zone name in log messages
* 7498576 [2421] Update DATASRC_LOAD_FROM_ITERATOR_ERROR description
* 49d6a08 [2421] Add unit tests for broken zones during configure()
* 4bc7823 [2421] Check that the load error messages are logged
* 2332578 [2421] Add a lettuce testcase where the zone file doesn't exist
* bc457a1 [2421] Throw ZoneLoaderException even for MasterLoadError
* 0d08fcb [2421] Catch ZoneValidationException at a higher level
* 5ba2cab [2421] Update InMemoryClient doc about exceptions
* 9e1942f [2421] Throw AddError instead of OutOfZone when loading out-of-zone data
* 6c30834 [2421] Add zone exceptions hierarchy
* cec4bf4 [2421] Handle exceptions when loading a zone
* 80c2a36 [2421] Add a testcase for empty zone too (which should throw EmptyZone)
* ecace4b [2421] Change test data to throw ZoneDataUpdater::AddError
* 0ff35f6 [2421] Add lettuce test with broken zone

Resolving as fixed. Thank you for the reviews Michal and Jinmei.

Note: See TracTickets for help on using tickets.