Opened 7 years ago

Closed 7 years ago

#2268 closed task (fixed)

some cleanups for in-memory zone load

Reported by: jinmei Owned by: muks
Priority: medium Milestone: Sprint-20121023
Component: data source Version:
Keywords: Cc:
CVSS Scoring: Parent Tickets:
Sensitive: no Defect Severity: N/A
Sub-Project: DNS Feature Depending on Ticket:
Estimated Difficulty: 5 Add Hours to Ticket: 0
Total Hours: 5.25 Internal?: no

Description (last modified by jinmei)

Here are some points to the new in-memory load() function (and its
helper methods). These are things I would have at least discussed
in review if I had been there, so please consider it a kind of delayed
review feedback.

  • stop using pimpl. The in-memory client is now almost private itself (not expected to be directly used outside of the data source implementation), so the advantages of pimpl wouldn't outweigh the overhead and complexity of it.
  • move the load logic to a separate .cc file than memory_client.cc. It's pretty complicated, and effectively acts like a zone updater class implementation. so it makes more sense to have it in a separate file for better readability, just like we separate the finder implementation into a separate file.
  • reuse the same RdataEncoder throughout the entire load(). If we use a helper class as suggested in #2267, that would be a member of it. (If #2267 is resolved without such a helper class, maybe we should consider introducing it here).
  • sort RdataSets in the order of type code. maybe it's a premature optimization, but I guess younger code types (A, NS, SOA, etc, in particular A) are looked up more often than minor ones. Since we use naive linear search, it's generally better to place such types of RdataSets before minor ones.

Subtickets

Change History (20)

comment:1 Changed 7 years ago by jinmei

  • Description modified (diff)

comment:2 Changed 7 years ago by muks

  • Estimated Difficulty changed from 0 to 5

comment:3 Changed 7 years ago by muks

  • Milestone changed from Next-Sprint-Proposed to Sprint-20121009

comment:4 Changed 7 years ago by muks

  • Owner set to muks
  • Status changed from new to assigned

Picking

comment:5 Changed 7 years ago by muks

  • Owner changed from muks to UnAssigned

Up for review.

A few notes:

  • This branch also contains the ZoneFinder? test cleanup mentioned in #2218 (see comments 23 and 25) as it depends on the refactored code.
  • For the sorting, tests insert RRTypes like A, AAAA and MX out of order and check that they are returned in order.
  • I have not added direct tests for ZoneDataUpdater as everything which happens inside it is tested thoroughly from InMemoryClient's testcode. Changing all those tests to directly work with ZoneData would be a lot of work (outside points for this bug) as I know it took a long time to write InMemoryClient's tests to handle every validation case. I don't think we need such excess testing as InMemoryClient already covers it, but if the reviewer says we do, let's create another bug for it.

comment:6 Changed 7 years ago by muks

  • Status changed from assigned to reviewing

comment:7 Changed 7 years ago by jinmei

  • Owner changed from UnAssigned to jinmei

comment:8 follow-up: Changed 7 years ago by jinmei

general

  • regarding tests, IMO we should have dedicated tests for ZoneDataUpdater (maybe mainly by porting memory client tests) because in general we should test a class as direct as possible, minimizing other dependency. Indirect tests often miss some cases (due to interface restrictions) and are harder to maintain (due to the additional dependency). But in this particular case I'm okay with deferring that to a separate ticket.

zone_data_updater.h

  • ZoneDataUpdater class needs class description doc. The general concept, expected usage, other design notes, why it's non copyable, etc.
  • ZoneDataUpdater: destructor is trivial and (the explicit definition) can be omitted.
  • ZoneDataUpdater constructor needs a description.
  • add() needs a description. explain that if it can throw.

zone_data_updater.cc

  • I'd incorporate more of the load() logic here. In fact, the functionality implemented in InMemoryClient::Loader should better belong here.
  • general: log messages are gone. please don't make such radical changes silently (and I actually don't like to remove them).
  • setupNSEC3: using NSEC3Hash looks like a good idea, but I'd avoid creating it every time. I guess we can make it a class member variable and create/initialize it only when necessary and first time.
  • addRdataSet(): if we want to keep the sorting and worry about the overhead, I guess we can relax the condition a bit and introduce some optimization: just sort A/AAAA/NS/SOA (and MX?) before others, and simplify and fasten the comparison logic using bitmaps (we only need 32 bits).
  • nit: technically namespace {...} is not "anonymous". It's an "unnamed" namespace.
    namespace { // anonymous namespace
    

memory_client.cc

  • InMemoryClient constructor: Probably not due to this branch, but it seems to be exception unsafe:
    InMemoryClient::InMemoryClient(util::MemorySegment& mem_sgmt,
                                   RRClass rrclass) :
        mem_sgmt_(mem_sgmt),
        rrclass_(rrclass),
        zone_count_(0),
        zone_table_(ZoneTable::create(mem_sgmt_, rrclass)),
        file_name_tree_(FileNameTree::create(mem_sgmt_, false))
    
    when FileNameTree::create() throws.

memory_client.h

  • The LoadCallback type is essentially private. I'd clarify that by putting it into a separate namespace (e.g. "internal"). But, this should probably go to zone data updater (see above) in the first place.

zone_finder_unittest.cc

  • The comment doesn't make sense any more:
        // simplified version of 'loading' data
        void addZoneData(const ConstRRsetPtr rrset) {
            updater_.add(rrset, rrset->getRRsig());
        }
    
    Also, we might just deprecate entire addZoneData() as its content is a simple one liner (for that matter, addZoneData() doesn't sound like a good name anyway - it should be named something like addRRsetToZone())
  • If we keep the per-type sorting, I think there should be more tests. For example, what if we add NS then NSEC? If NSEC then NS? Case with more than two RRsets, etc. Also, such part should have been written test driven: write some new test case, confirm it fails, and incrementally add the sorting logic (b3d2730 doesn't seem to be like that - it more suggests that the whole main code was written and then failed existing tests were adjusted).

comment:9 Changed 7 years ago by jinmei

  • Owner changed from jinmei to muks

comment:10 in reply to: ↑ 8 ; follow-up: Changed 7 years ago by muks

  • Owner changed from muks to jinmei

Hi Jinmei

Replying to jinmei:

general

I follow the style now for the opening brace.

  • regarding tests, IMO we should have dedicated tests for ZoneDataUpdater (maybe mainly by porting memory client tests) because in general we should test a class as direct as possible, minimizing other dependency. Indirect tests often miss some cases (due to interface restrictions) and are harder to maintain (due to the additional dependency). But in this particular case I'm okay with deferring that to a separate ticket.

Created bug #2338 for it.

zone_data_updater.h

  • ZoneDataUpdater class needs class description doc. The general concept, expected usage, other design notes, why it's non copyable, etc.

These are added.

  • ZoneDataUpdater: destructor is trivial and (the explicit definition) can be omitted.

It has a definition now.

  • ZoneDataUpdater constructor needs a description.

Added.

  • add() needs a description. explain that if it can throw.

Added.

zone_data_updater.cc

  • I'd incorporate more of the load() logic here. In fact, the functionality implemented in InMemoryClient::Loader should better belong here.

I have added Loader, but this is where I want to stop. After this, the next closest part of InMemoryClient contains code to populate the FileNameTree, ZoneTable, etc. which don't belong in a ZoneDataUpdater. I think all code left now in InMemoryClient belongs there.

  • general: log messages are gone. please don't make such radical changes silently (and I actually don't like to remove them).

It was not malicious. I just thought they were redundant next to the throws, as doing the logging when the exception is caught will still have the message with the problem. I've put them back.

  • setupNSEC3: using NSEC3Hash looks like a good idea, but I'd avoid creating it every time. I guess we can make it a class member variable and create/initialize it only when necessary and first time.

Done.

  • addRdataSet(): if we want to keep the sorting and worry about the overhead, I guess we can relax the condition a bit and introduce some optimization: just sort A/AAAA/NS/SOA (and MX?) before others, and simplify and fasten the comparison logic using bitmaps (we only need 32 bits).

Sorting is removed now.

  • nit: technically namespace {...} is not "anonymous". It's an "unnamed" namespace.
    namespace { // anonymous namespace
    

Fixed.

memory_client.cc

  • InMemoryClient constructor: Probably not due to this branch, but it seems to be exception unsafe:
    InMemoryClient::InMemoryClient(util::MemorySegment& mem_sgmt,
                                   RRClass rrclass) :
        mem_sgmt_(mem_sgmt),
        rrclass_(rrclass),
        zone_count_(0),
        zone_table_(ZoneTable::create(mem_sgmt_, rrclass)),
        file_name_tree_(FileNameTree::create(mem_sgmt_, false))
    
    when FileNameTree::create() throws.

I spent a lot of time trying to figure out why it was not being reported already as I'd added tests to catch such leaks (using the MemorySegmentTest class and a throw count loop). It was not creating a separate client for each such run, so the test was not working properly. I fixed the test and fixed this problem too.

memory_client.h

  • The LoadCallback type is essentially private. I'd clarify that by putting it into a separate namespace (e.g. "internal"). But, this should probably go to zone data updater (see above) in the first place.

This is still inside InMemoryClient as it's used there. I've put it into an internal namespace.

zone_finder_unittest.cc

  • The comment doesn't make sense any more:
        // simplified version of 'loading' data
        void addZoneData(const ConstRRsetPtr rrset) {
            updater_.add(rrset, rrset->getRRsig());
        }
    

Comment was removed.

Also, we might just deprecate entire addZoneData() as its content is
a simple one liner (for that matter, addZoneData() doesn't sound
like a good name anyway - it should be named something like
addRRsetToZone())

It's been renamed to addToZoneData, but I'd like to keep the method as a wrapper in case we want to add any further bits there.

  • If we keep the per-type sorting, I think there should be more tests. For example, what if we add NS then NSEC? If NSEC then NS? Case with more than two RRsets, etc. Also, such part should have been written test driven: write some new test case, confirm it fails, and incrementally add the sorting logic (b3d2730 doesn't seem to be like that - it more suggests that the whole main code was written and then failed existing tests were adjusted).

The tests were adjusted because I knew they were going to fail having written them in #2108 and this was a good way to test that the sorting was happening. The sorting code is gone now as we will revisit performance later.

comment:11 in reply to: ↑ 10 Changed 7 years ago by jinmei

Replying to muks:

I made a few suggested changes. Most of them are minor cleanups.
ed2b8eb is beyond that; see below.

zone_data_updater.cc

  • I'd incorporate more of the load() logic here. In fact, the functionality implemented in InMemoryClient::Loader should better belong here.

I have added Loader, but this is where I want to stop. After this, the next closest part of InMemoryClient contains code to populate the FileNameTree, ZoneTable, etc. which don't belong in a ZoneDataUpdater. I think all code left now in InMemoryClient belongs there.

I have multiple comments and suggestions on this part:

  • First, the loader class doesn't have to be part of the updater; the relationship would be rather opposite (but it doesn't have to be so either, so it's even better to separate them completely; in general it's better if two classes has less dependency between them). I thought this point is non controversial, so I made suggested changes directly to the branch (ed2b8eb)
  • The description of ZoneDataLoader now generally seems to be obsolete (e.g., it's not an "helper internal class for load()". Please update. Other public methods should have description.
  • I still suggest moving masterLoadWrapper/generateRRsetFromIterator, etc, to ZoneDataLoader. We'll soon have to extract the zone loading (building a zone data instance from file/iterator) as a more independent step so we can separate the building/swap/cleanup steps. So, the ZoneDataLoader class should have a file name or iterator, have the logic like masterLoadWrapper/generateRRsetFromIterator internally, and provide an interface for the caller to get access to the built zone data.
  • To this end, ZoneDataLoader doesn't even have to be a class but a free function (or functions with different signatures) as it's a blocking operation:
    ZoneData* loadZoneData(MemorySegment&, RRClass, Name, const string& zone_file);
    ZoneData* loadZoneData(MemorySegment&, RRClass, Name, ZoneIterator&);
    
    Although it may be a matter of preference.
  • Then the current InMemoryClient::load() would become like this:
    result::Result
    InMemoryClient::load(const isc::dns::Name& zone_name,
                         const std::string& filename) 
    {
        ZoneData* zone_data = loadZoneData(mem_sgmt_, rrclass_, zone_name, filename);
        loadInternal(zone_name, filename, zone_data);
    }
    
  • Also, I'd move the definition and declaration of the load class or function to a separate header and .cc.

I'm reluctant to spend even more time for this ticket, but I suspect
we'll need to do things like this anyway in the background loading
work (where we need to implement the separation of zone-data
building), so deferring that part wouldn't help much.

  • general: log messages are gone. please don't make such radical changes silently (and I actually don't like to remove them).

It was not malicious. I just thought they were redundant next to the throws, as doing the logging when the exception is caught will still have the message with the problem. I've put them back.

We could discuss the need for specific log messages (and, I tend to
agree with the ones with exceptions). But you removed all of them
(not only for the ones followed by exceptions), and do it silently
(without leaving any commit comment at 371a65 or in the ticket
comment), as part of seemingly straightforward port like 371a65. I'd
normally expect no behavioral change from such a big commit. So my
point is that if you want to introduce actual behavior change, it
should better to be done in a separate commit and/or explicitly noted
in some way.

memory_client.cc

  • InMemoryClient constructor: Probably not due to this branch, but it seems to be exception unsafe: when FileNameTree::create() throws.

I spent a lot of time trying to figure out why it was not being reported already as I'd added tests to catch such leaks (using the MemorySegmentTest class and a throw count loop). It was not creating a separate client for each such run, so the test was not working properly. I fixed the test and fixed this problem too.

The change to the code looks okay, but it wasn't clear to me what was
wrong with the previous tests. Can you explain that? Maybe as code
comment.

comment:12 Changed 7 years ago by jinmei

  • Owner changed from jinmei to muks

comment:13 Changed 7 years ago by muks

  • Owner changed from muks to jinmei

Hi Jinmei

Replying to jinmei:

Replying to muks:

I made a few suggested changes. Most of them are minor cleanups.
ed2b8eb is beyond that; see below.

zone_data_updater.cc

  • I'd incorporate more of the load() logic here. In fact, the functionality implemented in InMemoryClient::Loader should better belong here.

I have added Loader, but this is where I want to stop. After this, the next closest part of InMemoryClient contains code to populate the FileNameTree, ZoneTable, etc. which don't belong in a ZoneDataUpdater. I think all code left now in InMemoryClient belongs there.

I have multiple comments and suggestions on this part:

  • First, the loader class doesn't have to be part of the updater; the relationship would be rather opposite (but it doesn't have to be so either, so it's even better to separate them completely; in general it's better if two classes has less dependency between them). I thought this point is non controversial, so I made suggested changes directly to the branch (ed2b8eb)
  • The description of ZoneDataLoader now generally seems to be obsolete (e.g., it's not an "helper internal class for load()". Please update. Other public methods should have description.
  • I still suggest moving masterLoadWrapper/generateRRsetFromIterator, etc, to ZoneDataLoader. We'll soon have to extract the zone loading (building a zone data instance from file/iterator) as a more independent step so we can separate the building/swap/cleanup steps. So, the ZoneDataLoader class should have a file name or iterator, have the logic like masterLoadWrapper/generateRRsetFromIterator internally, and provide an interface for the caller to get access to the built zone data.
  • To this end, ZoneDataLoader doesn't even have to be a class but a free function (or functions with different signatures) as it's a blocking operation:
    ZoneData* loadZoneData(MemorySegment&, RRClass, Name, const string& zone_file);
    ZoneData* loadZoneData(MemorySegment&, RRClass, Name, ZoneIterator&);
    
    Although it may be a matter of preference.
  • Then the current InMemoryClient::load() would become like this:
    result::Result
    InMemoryClient::load(const isc::dns::Name& zone_name,
                         const std::string& filename) 
    {
        ZoneData* zone_data = loadZoneData(mem_sgmt_, rrclass_, zone_name, filename);
        loadInternal(zone_name, filename, zone_data);
    }
    
  • Also, I'd move the definition and declaration of the load class or function to a separate header and .cc.

I'm reluctant to spend even more time for this ticket, but I suspect
we'll need to do things like this anyway in the background loading
work (where we need to implement the separation of zone-data
building), so deferring that part wouldn't help much.

These have been done.

memory_client.cc

  • InMemoryClient constructor: Probably not due to this branch, but it seems to be exception unsafe: when FileNameTree::create() throws.

I spent a lot of time trying to figure out why it was not being reported already as I'd added tests to catch such leaks (using the MemorySegmentTest class and a throw count loop). It was not creating a separate client for each such run, so the test was not working properly. I fixed the test and fixed this problem too.

The change to the code looks okay, but it wasn't clear to me what was
wrong with the previous tests. Can you explain that? Maybe as code
comment.

Added.

master was also merged into this branch and some tests that were deleted in #2292 were added back.

comment:14 follow-up: Changed 7 years ago by jinmei

The latest branch now looks pretty good. I have a few more points though:

  • MockIterator variants have many duplicates. Obviously, this approach doesn't scale as we add more tests like these. I'd combine them, e.g., by passing the expected sequence of RRsets as vector<ConstRRsetPtr> or something and just iterate over them returning one by one in getNextRRset().
  • I believe addOutOfZoneThrows test is still needed. Consider the case we instantiate ZoneDataUpdater with zone data for example.org and call add() with a.example.com/A.
  • same for addNullRRsetThrows and addEmptyRRsetThrows.
  • addOutOfZoneThrows etc should now better belong to something like "loader (or updater) tests", although it may be a subject of the deferred ticket.

comment:15 Changed 7 years ago by jinmei

  • Owner changed from jinmei to muks

comment:16 in reply to: ↑ 14 ; follow-up: Changed 7 years ago by muks

  • Owner changed from muks to jinmei

Hi Jinmei

Replying to jinmei:

The latest branch now looks pretty good. I have a few more points though:

  • MockIterator variants have many duplicates. Obviously, this approach doesn't scale as we add more tests like these. I'd combine them, e.g., by passing the expected sequence of RRsets as vector<ConstRRsetPtr> or something and just iterate over them returning one by one in getNextRRset().

Done.

  • I believe addOutOfZoneThrows test is still needed. Consider the case we instantiate ZoneDataUpdater with zone data for example.org and call add() with a.example.com/A.

This is still tested by loadOutOfZoneThrows. As part of #2338, such tests can be ported to ZoneDataUpdater tests.

  • same for addNullRRsetThrows and addEmptyRRsetThrows.

addEmptyRRsetThrows equivalent is tested by one of the vector iterators now. For addNullRRsetThrows, there is no way to pass a NULL rrset through the InMemoryClient anymore, but this could happen when calling ZoneDataUpdater::add() directly. I've added a note in #2338 to check this case.

  • addOutOfZoneThrows etc should now better belong to something like "loader (or updater) tests", although it may be a subject of the deferred ticket.

Nod.

comment:17 in reply to: ↑ 16 ; follow-up: Changed 7 years ago by jinmei

Replying to muks:

I'd made one final suggested cleanup: using textToRRset for building
test RRsets instead of constructing RRset followed addRdata. (IMO) it
will generally make the code concise and more readable. I'd also
introduce a common variable for the SOA and zone name, but since some
of the tests will be moved to a different test case in #2338, I didn't
go that further at this time.

Whether or not you're okay with this change, the branch is ready for
merge. If you don't like the change for some reason, please simply
revert it.

I appreciate your patience:-)

comment:18 Changed 7 years ago by jinmei

  • Owner changed from jinmei to muks
  • Total Hours changed from 0 to 5.25

comment:19 in reply to: ↑ 17 Changed 7 years ago by muks

Replying to jinmei:

Replying to muks:

I'd made one final suggested cleanup: using textToRRset for building
test RRsets instead of constructing RRset followed addRdata. (IMO) it
will generally make the code concise and more readable. I'd also
introduce a common variable for the SOA and zone name, but since some
of the tests will be moved to a different test case in #2338, I didn't
go that further at this time.

Whether or not you're okay with this change, the branch is ready for
merge. If you don't like the change for some reason, please simply
revert it.

I appreciate your patience:-)

The changes to the tests look fine to me. You know I appreciate the patient reviews too. :)

comment:20 Changed 7 years ago by muks

  • Resolution set to fixed
  • Status changed from reviewing to closed

Merged to master in commit 83d00dc305a33e69e4bee379895697742a1eaab7:

* 66524e1 (origin/trac2268) [2268] use textToRRset instead of building RRset manually
* 0e2c140 [2268] Make minor code cleanup
* 45f0d1f [2168] Untabify
* 31d0926 [2268] Remove per-test iterator classes and use MockVectorIterator instead
* 3744e7a [2268] Add MockVectorIterator class
* e93e6f3 [2268] minor editorial cleanups
* 0e1a1b2 [2268] Remove obsolete comment (the bug was fixed)
* e2b120a [2268] Move FileNameDeleter out of InMemoryClient
* 1e6d210 [2268] Update API doc
* 8b59a7d [2268] Move ZoneDataLoader into an unnamed namespace
* f176512 [2268] Move ZoneDataLoader class out of its header
* e77ef46 [2268] Use loadZoneData() inside InMemoryClient
* 97a6125 [2268] Implement loadZoneData() functions
* 2150123 [2268] Move ZoneDataLoader to a separate file
* 6bceaa2 [2268] Rename load method to loadInternal()
* 65c54c9 [2268] Add a comment about InMemoryClient construction after setThrowCount()
* c1ce544 [2268] Add any attached RRSIGs when loading from an iterator
* eb84aca [2268] Fix disabled tests without using InMemoryClient::add()
* 31fdd06 Merge branch 'master' into trac2268
* ed2b8eb [2268] separated zone data Loader from ZoneDataUpdater.
* c16c1ab [2268] cleanup: removed (now-)unnecessary include
* 2eb96fb [2268] constify
* c6f7cce [2268] doxigen editorial fixes: use \code for code snippet; folded a long line
* 587a136 [2268] Add back logging calls
* ec2aeca [2268] Fix a leak upon exception in InMemoryClient's constructor
* 83a27d4 [2268] Remove redundant MemorySegmentTest definition
* ece0728 [2268] Move constructor and destructor to top of file
* d050293 [2268] Update ZoneDataUpdater API doc
* ef49ada [2268] Move EmptyZone exception to InMemoryClient class
* a475756 [2268] Remove the RdataSet sorting code
* 27be672 [2268] Just remove obsolete comment
* 18cff78 [2268] Rename method to addToZoneData()
* 3402084 [2268] Move LoadCallback typedef to an internal namespace
* ae53c8f [2268] Move Loader to live under ZoneDataUpdater
* 97fb03f [2268] Update comment about unnamed namespace
* 23d2eaf [2268] Use a single NSEC3Hash instance inside ZoneDataUpdater
* 1137e97 [2268] style fix (brace position) and clarify variable (s/set/rdataset/)
* 12bc898 [2268] even more editorial/style fixes
* bf382bb [2268] more style/editorial fixes
* 247cb9c [2268] minor editorial fixes: spacing, combining two short lines
* 25cd191 [2268] Make all hash related checks use isc::dns::NSEC3Hash
* e017162 [2268] Remove duplicate copies of NSEC3 check code
* b3d2730 [2268] Sort oft-used RRTypes to be in front of RdataSet lists
* b9250ec [2268] Remove unused includes from InMemoryClient implementation
* 8de1329 [2268] Untabify code
* 86fcb73 [2268] Remove pimpl from InMemoryClient
* 61beda1 [2268] Remove unused member variables from InMemoryClient::Loader
* 038ba06 [2268] Use the ZoneDataUpdater in test code too
* b643e4e [2268] Use ZoneDataUpdater in InMemoryClient
* 371a652 [2268] Add ZoneDataUpdater class

I've verified that both make check and lettuce passed on the merged master branch.

Resolving as fixed. Thank you for the reviews Jinmei.

Note: See TracTickets for help on using tickets.