Opened 7 years ago

Closed 7 years ago

#2420 closed defect (fixed)

allow loading zones containing an orphan RRSIG

Reported by: jinmei Owned by: jinmei
Priority: medium Milestone: Sprint-20121204
Component: data source Version:
Keywords: Cc:
CVSS Scoring: Parent Tickets:
Sensitive: no Defect Severity: High
Sub-Project: DNS Feature Depending on Ticket:
Estimated Difficulty: 5 Add Hours to Ticket: 0
Total Hours: 13.57 Internal?: no

Description (last modified by jinmei)

We currently reject loading (to memory) an entire zone if we find
an RRSIG that doesn't have a covered RRset:

    // Right now, we don't accept RRSIG without covered RRsets (this
    // should eventually allowed, but to do so we'll need to update the
    // finder).

This is overkill, and won't become realistic check anyway when we
support the complete zone parser/loader because we cannot tell when
the covered RRSIG is added (unless we maintain a possibly huge size of
intermediate storage for the "orphan" RRSIGs). The behavior is also
incompatible with BIND 9.

There's also a report from a user who is suffering from this behavior,
so I suggest we should fix it now. In terms of data structures it
should already be possible, so it shouldn't be so difficult.

See also #2273.

Subtickets

Change History (18)

comment:1 Changed 7 years ago by jinmei

  • Description modified (diff)

comment:2 Changed 7 years ago by jelte

  • Defect Severity changed from N/A to High
  • Milestone changed from Next-Sprint-Proposed to Sprint-20121120

comment:3 Changed 7 years ago by jinmei

  • Priority changed from high to medium

comment:4 Changed 7 years ago by jinmei

  • Owner set to jinmei
  • Status changed from new to accepted

comment:5 Changed 7 years ago by jinmei

trac2420 is ready for review.

The main change is not big, but I needed to update various parts of
the implementation and add quite a few numbers of tests to catch
various different cases. So the entire diff is a bit big and changes
are scattered. Here's some suggested instruction for review in the
hope that it might reduce the pain:

  • The first commit (89a6779) is a pure refactoring (no behavior change), and only for the convenience of tests that are added later. I suggest reviewing this commit separately and then forget it.
  • commits from 1558c3b to 7eb7d5e are the main changes for the subject of this ticket. Among these d805ae6 is probably the most important change, and it's basically independent from others. So I suggest reviewing this commit next, and separately.
  • 798e61a is also an independent change, and was needed simply because new test data caused an exception in a test for the old version of in-memory data source. We should really deprecate this stuff soon, but until then I suggest we live with this workaround.
  • I think the rest of the changes between 1558c3b to 7eb7d5e is reasonably understandable. There are many test cases to cover various scenarios, but hopefully the comments help understand them.
  • Finally, 9397bd5 is a totally independent, and unrelated fix. As commented in the commit log, I noticed the current code could cause an unexpected assert() failure for a half-broken zone that is generally NSEC-signed but has no NSEC at the origin. We could exclude this change from this ticket, but since it could be potentially serious and the fix itself is small, I thought it might make sense to piggy back the fix.

The suggested changelog entry is:

503.?	[bug]		jinmei
	The in-memory data source now accepts an RRSIG provided without
	a covered RRset in loading.  A subsequent query for its owner name
	of the covered type would generally result in NXRRSET; if the
	covered RRset is of type NSEC3, the corresponding NSEC3 processing
	would result in SERVFAIL.
	(Trac #2420, git TBD)

comment:6 Changed 7 years ago by jinmei

  • Owner changed from jinmei to UnAssigned
  • Status changed from accepted to reviewing

comment:7 Changed 7 years ago by vorner

  • Owner changed from UnAssigned to vorner

comment:8 follow-ups: Changed 7 years ago by vorner

  • Owner changed from vorner to jinmei

Hello

The size was actually quite OK (the changes were related to each other). I just want to confirm, the incoming RRsets need to be (still) grouped by the name, right?

There's one serious thing. The tests don't pass:

[ RUN      ] ZoneDataUpdaterTest.rrisgForNSEC3Only
unknown file: Failure
C++ exception with description "Unknown NSEC3 algorithm: 200" thrown in the test body.
[  FAILED  ] ZoneDataUpdaterTest.rrisgForNSEC3Only (0 ms)

Also, several smaller ones.

In this diff, it looks the comment was there already, but adding the new line
makes it look the comment is about the new code:

 // Ok, we just put it in.
+        const bool is_origin = (node == zone_data_.getOriginNode());

Regarding this logging:

     LOG_DEBUG(logger, DBG_TRACE_DATA, DATASRC_MEMORY_MEM_ADD_RRSET).
-        arg(rrset->getName()).arg(rrset->getType()).arg(zone_name_);
+        arg(name).arg(rrtype).arg(zone_name_);

I think it might be confusing in case the only RRsig is added. Then it'll look
like the real RRset is being added, but that one does not exist.

I'm not sure it works with adding the RRset and RRSIG separately, as the
comment suggests. Won't it throw that such type is already there?

    /// At least one of \c rrset or \c sig_rrset must be non NULL.
    /// \c sig_rrset can be reasonably NULL when \c rrset is not signed in
    /// the zone; it's unusual that \c rrset is NULL, but is still possible
    /// if these RRsets are given separately to the loader, or if even the
    /// zone is half broken and really contains an RRSIG that doesn't have
    /// any covered RRset.  This implementation supports these cases.

About the „tricky case“ with NSEC3 ‒ in the reality, there is no way to get
such RRSIG out, even if the adding was impemented, right? Wouldn't it be better
to just warn and ignore the RRSIG? And thinking about it, would it be worth
warning about the orphan RRSIGs?

Isn't this just unneeded complexity?

    ~ZoneDataUpdaterTest() {
        // Make sure zone data is destroyed even if a test results in exception
        if (zone_data_ != NULL) {
            ZoneData::destroy(mem_sgmt_, zone_data_, zclass_);
        }
    }

I mean, they are tests only. They terminate in a very short while, so the
resource leak is not problematic from the point of view of eating a lot of
memory and not using it. And if there's an exception, the tests fail anyway, so
it is not needed from the point of view to have clear valgrind run (failure
like failure). It looks slightly awkward to have the ZoneData::destroy twice,
once in the destructor and once in TearDown?. Or, could the check for memory
leak be done in the destructor?

comment:9 in reply to: ↑ 8 ; follow-up: Changed 7 years ago by jinmei

Replying to vorner:

The size was actually quite OK (the changes were related to each other). I just want to confirm, the incoming RRsets need to be (still) grouped by the name, right?

There's one serious thing. The tests don't pass:

[ RUN      ] ZoneDataUpdaterTest.rrisgForNSEC3Only
unknown file: Failure
C++ exception with description "Unknown NSEC3 algorithm: 200" thrown in the test body.
[  FAILED  ] ZoneDataUpdaterTest.rrisgForNSEC3Only (0 ms)

One quick check: I can't reproduce it on my environment. Is this
specific to some special cases like distcheck? Or does that happen
even if you run it on the tests/memory directory by "./run_unittests"?
Is it possible that your environment uses the wrong version of shlib?

comment:10 in reply to: ↑ 9 Changed 7 years ago by vorner

Hello

Replying to jinmei:

Replying to vorner:

The size was actually quite OK (the changes were related to each other). I just want to confirm, the incoming RRsets need to be (still) grouped by the name, right?

There's one serious thing. The tests don't pass:

[ RUN      ] ZoneDataUpdaterTest.rrisgForNSEC3Only
unknown file: Failure
C++ exception with description "Unknown NSEC3 algorithm: 200" thrown in the test body.
[  FAILED  ] ZoneDataUpdaterTest.rrisgForNSEC3Only (0 ms)

One quick check: I can't reproduce it on my environment. Is this
specific to some special cases like distcheck? Or does that happen
even if you run it on the tests/memory directory by "./run_unittests"?
Is it possible that your environment uses the wrong version of shlib?

It happens every time. However, the reported number of algorithm differs from run to run.

Also, when run with GTEST_FILTER to run only the one test, it segfaults. Here's a stacktrace, if it is of any help:

Program received signal SIGSEGV, Segmentation fault.
0x000000000049dbf8 in isc::datasrc::memory::NSEC3Data::getSaltLen (this=0x10076ff3f) at ../../../../../src/lib/datasrc/memory/zone_data.h:161
1619    size_t getSaltLen() const { return (*getSaltBuf()); }
(gdb) bt
#0  0x000000000049dbf8 in isc::datasrc::memory::NSEC3Data::getSaltLen (this=0x10076ff3f) at ../../../../../src/lib/datasrc/memory/zone_data.h:161
#1  0x00007ffff7b8a1da in isc::datasrc::memory::ZoneDataUpdater::getNSEC3Hash (this=0x76fc20) at zone_data_updater.cc:220
#2  0x00007ffff7b8b79a in isc::datasrc::memory::ZoneDataUpdater::setupNSEC3<isc::dns::rdata::generic::NSEC3> (this=0x76fc20, rrset=...)
    at zone_data_updater.cc:240
#3  0x00007ffff7b8a296 in isc::datasrc::memory::ZoneDataUpdater::addNSEC3 (this=0x76fc20, name=..., rrset=..., rrsig=...) at zone_data_updater.cc:254
#4  0x00007ffff7b8a597 in isc::datasrc::memory::ZoneDataUpdater::addRdataSet (this=0x76fc20, name=..., rrtype=..., rrset=..., rrsig=...)
    at zone_data_updater.cc:286
#5  0x00007ffff7b8aea1 in isc::datasrc::memory::ZoneDataUpdater::add (this=0x76fc20, rrset=..., sig_rrset=...) at zone_data_updater.cc:376
#6  0x00000000004dc19d in (anonymous namespace)::ZoneDataUpdaterTest_rrisgForNSEC3Only_Test::TestBody (this=0x76fbc0) at zone_data_updater_unittest.cc:189
#7  0x00007ffff4f1bcff in HandleSehExceptionsInMethodIfSupported<testing::Test, void> (method=<optimized out>, object=<optimized out>, 
    location=<optimized out>) at src/gtest.cc:2090
#8  testing::internal::HandleExceptionsInMethodIfSupported<testing::Test, void> (object=0x76fbc0, method=&virtual testing::Test::TestBody(), 
    location=0x7ffff4f1c9cb "the test body") at src/gtest.cc:2126
#9  0x00007ffff4f13509 in testing::Test::Run (this=0x76fbc0) at src/gtest.cc:2162
#10 0x00007ffff4f135d6 in testing::TestInfo::Run (this=0x762030) at src/gtest.cc:2338
#11 0x00007ffff4f13716 in testing::TestCase::Run (this=0x761e10) at src/gtest.cc:2445
#12 0x00007ffff4f13abe in RunAllTests (this=0x753250) at src/gtest.cc:4237
#13 testing::internal::UnitTestImpl::RunAllTests (this=0x753250) at src/gtest.cc:4145
#14 0x00007ffff4f1b8df in HandleSehExceptionsInMethodIfSupported<testing::internal::UnitTestImpl, bool> (method=<optimized out>, object=<optimized out>, 
    location=<optimized out>) at src/gtest.cc:2090
#15 testing::internal::HandleExceptionsInMethodIfSupported<testing::internal::UnitTestImpl, bool> (object=0x753250, method=
    (bool (testing::internal::UnitTestImpl::*)(testing::internal::UnitTestImpl * const)) 0x7ffff4f137a0 <testing::internal::UnitTestImpl::RunAllTests()>, 
    location=0x7ffff4f1dab8 "auxiliary test code (environments or event listeners)") at src/gtest.cc:2126
#16 0x00007ffff4f12b7e in testing::UnitTest::Run (this=<optimized out>) at src/gtest.cc:3874
#17 0x00000000004e553c in isc::util::unittests::run_all () at run_all.cc:87
#18 0x000000000042f63a in main (argc=1, argv=0x7fffffffd658) at run_unittests.cc:25

From a quick look, nsec3_data might be uninitialized when being used here. But I didn't look how that could happen.

I'm not aware of using shlib, and emerge (gentoo's package manager) does not see any package with name or description of shlib.

Does the backtrace help, or should I try digging

Also, maybe pass the ticket to me if you have further questions. I noticed the question mostly by accident.

comment:11 in reply to: ↑ 8 ; follow-up: Changed 7 years ago by jinmei

Replying to vorner:

First, about the crash in tests.

There's one serious thing. The tests don't pass:

[ RUN      ] ZoneDataUpdaterTest.rrisgForNSEC3Only
unknown file: Failure
C++ exception with description "Unknown NSEC3 algorithm: 200" thrown in the test body.
[  FAILED  ] ZoneDataUpdaterTest.rrisgForNSEC3Only (0 ms)

I've found a bug in the test code and fixed it at bb38dbf. It should
fix this issue. I also noticed minor typo in test names and fixed it.

The size was actually quite OK (the changes were related to each other). I just want to confirm, the incoming RRsets need to be (still) grouped by the name, right?

For now, yes. But we'll need to be flexible soon, in the context of
the generic parser/loader feature. See below about passing RRs of the
same name separately.

In this diff, it looks the comment was there already, but adding the new line
makes it look the comment is about the new code:

 // Ok, we just put it in.
+        const bool is_origin = (node == zone_data_.getOriginNode());

Okay, fixed at commit f31f6dd.

Regarding this logging:

     LOG_DEBUG(logger, DBG_TRACE_DATA, DATASRC_MEMORY_MEM_ADD_RRSET).
-        arg(rrset->getName()).arg(rrset->getType()).arg(zone_name_);
+        arg(name).arg(rrtype).arg(zone_name_);

I think it might be confusing in case the only RRsig is added. Then it'll look
like the real RRset is being added, but that one does not exist.

Fixed at commit 4263ba5.

I'm not sure it works with adding the RRset and RRSIG separately, as the
comment suggests. Won't it throw that such type is already there?

    /// At least one of \c rrset or \c sig_rrset must be non NULL.
    /// \c sig_rrset can be reasonably NULL when \c rrset is not signed in
    /// the zone; it's unusual that \c rrset is NULL, but is still possible
    /// if these RRsets are given separately to the loader, or if even the
    /// zone is half broken and really contains an RRSIG that doesn't have
    /// any covered RRset.  This implementation supports these cases.

Right now, this would result in an exception. But as I noted at the
beginning of this response, we'll need to be more flexible about as
(advanced) part of generic zone parser/loader. We could note that
this behavior is for expected future changes if you want, but I
thought it's okay to mention it at this stage.

About the „tricky case“ with NSEC3 ‒ in the reality, there is no way to get
such RRSIG out, even if the adding was impemented, right? Wouldn't it be better
to just warn and ignore the RRSIG? And thinking about it, would it be worth
warning about the orphan RRSIGs?

I actually thought about it, but again, NSEC3 and its RRSIG can be
passed separately (interleaved with records of other names), so at
least we cannot simply ignore such RRSIG. Leaving a log might be
okay, but a warning is probably too noisy as we cannot assume in which
order the RRs are coming.

Isn't this just unneeded complexity?

    ~ZoneDataUpdaterTest() {
        // Make sure zone data is destroyed even if a test results in exception
        if (zone_data_ != NULL) {
            ZoneData::destroy(mem_sgmt_, zone_data_, zclass_);
        }
    }

I mean, they are tests only. They terminate in a very short while, so the
resource leak is not problematic from the point of view of eating a lot of
memory and not using it. And if there's an exception, the tests fail anyway, so
it is not needed from the point of view to have clear valgrind run (failure
like failure). It looks slightly awkward to have the ZoneData::destroy twice,
once in the destructor and once in TearDown?. Or, could the check for memory
leak be done in the destructor?

I know it may look overkill, but I generally want to make any of code
as safe as possible. I also understand that sometimes brevity
outweighs safety in tests, but in this case (admittedly subjectively)
I thought the additional complexity is acceptable.

I think we can unify the call to destroy() and the check for memory
leak in the destructor this way:

    ~ZoneDataUpdaterTest() {
        // Make sure zone data is destroyed even if a test results in exception
        if (zone_data_ != NULL) {
            ZoneData::destroy(mem_sgmt_, zone_data_, zclass_);
        }
        // catch any leak here.
        if (mem_sgmt_.allMemoryDeallocated()) {
            ADD_FAILURE() << "Memory leak detected";
        }
    }

(and remove TearDown()). If this is acceptable I'm okay with
changing it this way.

comment:12 Changed 7 years ago by jinmei

  • Owner changed from jinmei to vorner

comment:13 in reply to: ↑ 11 ; follow-up: Changed 7 years ago by vorner

  • Owner changed from vorner to jinmei

Hello

Replying to jinmei:

I've found a bug in the test code and fixed it at bb38dbf. It should
fix this issue. I also noticed minor typo in test names and fixed it.

Yes, it passes now.

I'm not sure it works with adding the RRset and RRSIG separately, as the
comment suggests. Won't it throw that such type is already there?

    /// At least one of \c rrset or \c sig_rrset must be non NULL.
    /// \c sig_rrset can be reasonably NULL when \c rrset is not signed in
    /// the zone; it's unusual that \c rrset is NULL, but is still possible
    /// if these RRsets are given separately to the loader, or if even the
    /// zone is half broken and really contains an RRSIG that doesn't have
    /// any covered RRset.  This implementation supports these cases.

Right now, this would result in an exception. But as I noted at the
beginning of this response, we'll need to be more flexible about as
(advanced) part of generic zone parser/loader. We could note that
this behavior is for expected future changes if you want, but I
thought it's okay to mention it at this stage.

I think the documentation should match the behaviour as much as we can make it.
It is quite frustrating when one writes a code that should be correct by the
documentation, but it doesn't work. After all, it is said that a bug in
documentation is worth two in the code.

About the „tricky case“ with NSEC3 ‒ in the reality, there is no way to get
such RRSIG out, even if the adding was impemented, right? Wouldn't it be better
to just warn and ignore the RRSIG? And thinking about it, would it be worth
warning about the orphan RRSIGs?

I actually thought about it, but again, NSEC3 and its RRSIG can be
passed separately (interleaved with records of other names), so at
least we cannot simply ignore such RRSIG. Leaving a log might be
okay, but a warning is probably too noisy as we cannot assume in which
order the RRs are coming.

But currently such scenario would throw anyway. I'm not saying this would be
for the long-term, since there'll be some substantial changes for merging
RRsets. I thought for now. But if you still think it is not worth it, then we
can probably leave it as it is.

I think we can unify the call to destroy() and the check for memory
leak in the destructor this way:

    ~ZoneDataUpdaterTest() {
        // Make sure zone data is destroyed even if a test results in exception
        if (zone_data_ != NULL) {
            ZoneData::destroy(mem_sgmt_, zone_data_, zclass_);
        }
9// catch any leak here.
        if (mem_sgmt_.allMemoryDeallocated()) {
            ADD_FAILURE() << "Memory leak detected";
        }
    }

(and remove TearDown()). If this is acceptable I'm okay with
changing it this way.

I think this would look better.

comment:14 in reply to: ↑ 13 Changed 7 years ago by jinmei

Replying to vorner:

I'm not sure it works with adding the RRset and RRSIG separately, as the
comment suggests. Won't it throw that such type is already there?

Right now, this would result in an exception. But as I noted at the
beginning of this response, we'll need to be more flexible about as
(advanced) part of generic zone parser/loader. We could note that
this behavior is for expected future changes if you want, but I
thought it's okay to mention it at this stage.

I think the documentation should match the behaviour as much as we can make it.
It is quite frustrating when one writes a code that should be correct by the
documentation, but it doesn't work. After all, it is said that a bug in
documentation is worth two in the code.

Okay, I updated the documentation clarifying the point.

About the „tricky case“ with NSEC3 ‒ in the reality, there is no way to get
such RRSIG out, even if the adding was impemented, right? Wouldn't it be better
to just warn and ignore the RRSIG? And thinking about it, would it be worth
warning about the orphan RRSIGs?

I actually thought about it, but again, NSEC3 and its RRSIG can be
passed separately (interleaved with records of other names), so at
least we cannot simply ignore such RRSIG. Leaving a log might be
okay, but a warning is probably too noisy as we cannot assume in which
order the RRs are coming.

But currently such scenario would throw anyway. I'm not saying this would be
for the long-term, since there'll be some substantial changes for merging
RRsets. I thought for now. But if you still think it is not worth it, then we
can probably leave it as it is.

I wouldn't strongly oppose to logging it, but don't see the strong
need for it either. At least as long as we throw, it's effectively
logged somewhere. So I don't touch it for now.

I think we can unify the call to destroy() and the check for memory
leak in the destructor this way:

I think this would look better.

Updated that way.

comment:15 Changed 7 years ago by jinmei

  • Owner changed from jinmei to vorner

comment:16 follow-up: Changed 7 years ago by vorner

  • Owner changed from vorner to jinmei
  • Total Hours changed from 0 to 3.32

Hello

Replying to jinmei:

But currently such scenario would throw anyway. I'm not saying this would be
for the long-term, since there'll be some substantial changes for merging
RRsets. I thought for now. But if you still think it is not worth it, then we
can probably leave it as it is.

I wouldn't strongly oppose to logging it, but don't see the strong
need for it either. At least as long as we throw, it's effectively
logged somewhere. So I don't touch it for now.

My point was not about the logging, but about accepting the zone where somebody
forgot to remove an NSEC3 RRSIG, or something. But it should be rare enough, so
I won't argue.

So I think it can be merged.

comment:17 in reply to: ↑ 16 Changed 7 years ago by jinmei

Replying to vorner:

But currently such scenario would throw anyway. I'm not saying this would be
for the long-term, since there'll be some substantial changes for merging
RRsets. I thought for now. But if you still think it is not worth it, then we
can probably leave it as it is.

I wouldn't strongly oppose to logging it, but don't see the strong
need for it either. At least as long as we throw, it's effectively
logged somewhere. So I don't touch it for now.

My point was not about the logging, but about accepting the zone
where somebody forgot to remove an NSEC3 RRSIG, or something. But it
should be rare enough, so I won't argue.

I don't understand what this ("somebody forgot...") means in this
context...but anyway, I suspect we'll need to revisit many things when
we are more flexible about load timings anyway, so I'd leave that
particular point to that part of work.

So I think it can be merged.

Okay, thanks, merge done, closing.

comment:18 Changed 7 years ago by jinmei

  • Resolution set to fixed
  • Status changed from reviewing to closed
  • Total Hours changed from 3.32 to 13.57
Note: See TracTickets for help on using tickets.