Opened 9 years ago

Closed 9 years ago

#517 closed enhancement (complete)

Empty node processing in MemoryZone difficult Part

Reported by: hanfeng Owned by: hanfeng
Priority: medium Milestone: A-Team-Sprint-20110209
Component: data source Version:
Keywords: Cc:
CVSS Scoring: Parent Tickets:
Sensitive: no Defect Severity:
Sub-Project: Feature Depending on Ticket:
Estimated Difficulty: 0.0 Add Hours to Ticket: 0
Total Hours: 0 Internal?: no

Description

Add interface to rbtree to support find the next node for one node

Subtickets

Change History (16)

comment:1 Changed 9 years ago by hanfeng

  • Component changed from Unclassified to data source
  • Owner changed from hanfeng to UnAssigned
  • Status changed from new to reviewing

comment:2 Changed 9 years ago by hanfeng

  • Owner changed from UnAssigned to hanfeng

comment:3 Changed 9 years ago by hanfeng

  • Owner changed from hanfeng to UnAssigned

comment:4 Changed 9 years ago by jinmei

  • Owner changed from UnAssigned to jinmei

comment:5 follow-up: Changed 9 years ago by jinmei

I've taken a quick look at it, and made some minor editorial changes I'd
propose (2f151e4).

gcov indicated all code is covered by tests, which is good.

Most of the editorial changes are s/foo */foo*/, s/foo &/foo&/, etc.
This seems to be quite common patches you submitted. If this convention
is difficult for you to follow, I'd suggest you write a simple checker
script and run it (and fix any trivial style issues) before you ask a
review. That would help the reviewer focus on the more substantial part
of code and make the review process smoothier.

One non-editorial change is that for the RBTree class constructor.
I changed needsReturnEmptyNode_ to a const variable (so that we
don't have to worry about the case where it's dynamically modified),
and initialize the variables in a member initialization list
(as that's the only way to initialize a const variable).

Another observation. Some of the newly written documentation doesn't seem
to be sufficient. In general, we should provide for each method/class

  • one-or-two-line brief description
  • more description that normally consists of several paragraphs. it includes various design choices or usage notes, and in case of methods possible exceptions and if it's not exception free which level of exception guarantee is provided.
  • for methods params and return

There can be exceptions (e.g. for a very trivial method like a trivial
accessor), but normally if some of the above points are missing the
documentation is considered poor (or at least I'd consider it poor as a
reviewer:-) Please check if the new doc provides sufficient information
according to these points, and fix them as I review the rest of the code.

Review continues.

comment:6 in reply to: ↑ 5 Changed 9 years ago by hanfeng

Replying to jinmei:

Most of the editorial changes are s/foo */foo*/, s/foo &/foo&/, etc.
This seems to be quite common patches you submitted. If this convention
is difficult for you to follow, I'd suggest you write a simple checker

Yes, I am used to write "foo *" instead of "foo*", I will pay more
attention. Thanks

One non-editorial change is that for the RBTree class constructor.
I changed needsReturnEmptyNode_ to a const variable (so that we
don't have to worry about the case where it's dynamically modified),

Although make a bool value const is a little uncommon, it's acceptable
for current implementation.

Another observation. Some of the newly written documentation doesn't seem
to be sufficient. In general, we should provide for each method/class

More comment is added

comment:7 follow-up: Changed 9 years ago by jinmei

Review on rbtree.h.

RBTree forward declaration
Is this necessary?

template <typename T>
class RBTree;

RBNode

  • why successor() is a public method? It's not used anywhere, and the interface is not good from design point of view because it reveals an internal detail of RBTree implementation (that it's actually a tree of trees). The only justifiable reason for this to be public that I can think of is the convenience for testing, but the there's no direct test for it. On a related note, if it's really intended to be public, the fact that there's no direct test is bad, because we should generally test things as closely to the test target as possible.
  • documentation isn't sufficient. please explain more details about "successor". providing an example would also help. also explain when to return a null node.

RBTree class description

  • this sentence(?) doesn't parse (and s/ture/true/?):
     * The default policy is to NOT return an empty node to end user,
     * pass ture will get empty node during find is needed.
    
  • the following note is barely understandable, too:
     *  \note The search policy only affect find behavior of rbtree, but when
     *  inserting one name, if the node already exist not matter it's insert
     *  explictly by end user or empty node created by the rbtree itself,
     *  insert will return ALREADYEXISTS not matter we want expose empty node
     *  or not.
    
  • is this "todo" now implemented?
     *  - since \c RBNode only has down pointer without up pointer, the node path
     *    during finding should be recorded for later use
    

RBTree::NodeChain?

  • it needs more documentation, including "what it is (in more detail)", how it's expected to be used, design choice about why it's a stack, etc. usage example will also be helpful.
  • as for the design: I'm afraid stack is not an efficient choice. In addition, using a stack introduces another possibility of exception to find(), which is not good. Due to the limitations of DNS names, node chains have a fixed max size, and BIND 9 uses a pre-allocated fixed buffer internally held in the chain. we might want to do the same. Furthermore, we may not even need the chain framework in the first place. At least for the purpose of empty node support, what we really need is to identify the "next" node of a given RBTree node. The role of chaining for this purpose is to provide a back pointer when we need to go up from one (sub)tree to another. this could actually be done by an explicit pointer at the root node of (sub)trees. this will introduce different complexity, so may or may not be a good idea. For now I'm okay with keeping the chain approach, but I'd like to leave these considerations in documentation.

RBTree constructor

  • documentation: describe the newly introduce parameter.

about findEx

  • First off, why do we need to introduce a new method? My understanding is that this will be used only from MemoryZone::find() (is that correct?). And, so will the existing RBTree::find() with a callback. So, I'd simply merge these two cases specifically designed for MemoryZone? and described it as a note. As I repeatedly said, we don't need too much generalization; the MemoryZone? class is very special, and the functionality it requires won't be needed in other places.
  • If there's a strong reason for defining a new method, I personally would avoid unclear acronyms like "Ex". In fact, I don't know what it means in this context. I'd give it a more descriptive name.
  • I'd want to see more description, including why we need this (in addition to the existing find() variants) and when to use it. I'd update the general description of the "Find methods" in this regard. Note also that we may have to update existing description about the "no data OK" mode.
  • why is "nextNode()" a part of "Find methods"? Is that intentional? If so, the general description should be updated accordingly.

nextNode()

  • code seems to do what it claims to do, but I'm afraid it's unnecessarily inefficient: it involves a copy of current node chain and possibly with additional push/pop operations, which are also expensive. In BIND 9, "node_path" is mutable and it's modified in the "next" function. I think we can do the same thing, because node_path is quite likely to be a locally prepared structure for a specific search. We could also eliminate the expensive push/pop operations by revisiting data structure (see above).

comment:8 in reply to: ↑ 7 Changed 9 years ago by hanfeng

Replying to jinmei:

Review on rbtree.h.

RBTree forward declaration

Removed

RBNode

  • why successor() is a public method? It's not used anywhere, and the interface is not good from design point of view because it reveals an internal detail of RBTree implementation (that it's actually a tree of trees). The only justifiable reason for this to be public that I can think of is the convenience for testing, but the there's no direct test for it. On a related note, if it's

Successor should be private which should be only used by rbtree.
More comment is added

RBTree class description

  • this sentence(?) doesn't parse (and s/ture/true/?):
     * The default policy is to NOT return an empty node to end user,
     * pass ture will get empty node during find is needed.
    

Document is fixed

RBTree::NodeChain?

  • it needs more documentation, including "what it is (in more...

Implementation is modified, now using fixed size array.

RBTree constructor

  • documentation: describe the newly introduce parameter.

Document is added

about findEx

  • First off, why do we need to introduce a new method? My understanding is that this will be used only from

"Ex" means extended version, which is not clear. I have merge it with call back version find

nextNode()

  • code seems to do what it claims to do, but I'm afraid it's unnecessarily inefficient: it involves a copy of current node

NodeChain? implementation is modified and no copy now.

comment:9 follow-up: Changed 9 years ago by jinmei

I've completed the review. I have some design level comments (see below
for details).

Documentation in general

Overall, documentation quality didn't seem sufficient to me:

  • In some cases existing review comments weren't addressed.
  • in some cases some out-of-context copy-past blocks were inserted, making the entire description unreadable.
  • existing documentation that needs to be updated with the changes of this ticket were not fully updated.
  • exception considerations were generally missing
  • param/return were still missing in some cases

Rather than running further cycles of comment/revise/re-review,
however, I've taken the liberty of updating the documentation by
myself (not all of them, though), as you're taking off. They've been
committed and pushed to the repository.

Next time, please make sure the documentation covers these basic
points.

Style issues in general

newly committed code had a non negligible number of style
violations. I suspect the style you are familiar with is so
different from our guideline that you cannot follow the guideline
just by trying to be careful. next time please use a tool (some
script or a batch sequence of shell commands) to pre-detect common
violations (in particular, s/foo */foo* /, s/bar &/bar& /, and
s/return(x)/return (x)).

successor()

  • it didn't fully address the review comment: It still didn't explain when to return NULL. Exception consideration is also missing. I've updated the documentation including these points.

NodeChain?

  • it's now belong to the isc::datasrc namespace level, and the class name seems too generic. I'd either rename it to something like RBTNodeChain or move it inside the RBTree class.
  • there are many style violations (directly fixed). PLEASE introduce a tool to detect these and run it before asking for review.
  • this comment is not really correct:
    /// RBNode did not have parent
    /// pointers in them (for memory usage reasons) so there was no way to find
    /// the path back to the root from any given node.
    
    In fact, RBNode does have a parent pointer.
  • overall, documentation is way too insufficient. you seem to have taken some doc from BIND 9 (which is itself not necessarily bad), so look at the BIND 9 doc again; it explains a lot of details of the chain. We need the same level of documentation. methods need to be described. (I've not touched the documentation of this class yet)
  • now that this is a separate class with non trivial code logic, it must have tests. And, it should have been tested before. next time please make sure you write test first.
  • as for tests, we should especially cover invalid number of pops and pushes. I'd also write a test where an RBTree contains a maximum number of levels and a chain has the maximum number of nodes.
  • I'd use exception instead of assert() due to bad inputs for various methods. assert() is dangerous because careless use of library can crash the whole application. Also, exceptions are more easily tested.
  • why RBT_MAX_LEVEL is an enum? I'd use a static const.
  • what's the rationale of 254? I know BIND 9 uses it, but I suspect it's unnecessarily big, and that's probably due to the leftover from the version in which bitstring labels were supported. In any case, we should be able to explain why 254 is the maximum independently, and confirm it via a test.

nextNode()

  • after re-reading it, I now have some concern about the interface design: passing the node and chain is a bad idea, because it's not guaranteed the given chain is valid for the given node. In BIND 9, a node chain contains the "current node" (which is equivalent to the node parameter of the nextNode()). I guess we should do the same.
  • (the first point aside) I'd pass a reference to node instead of a pointer because the pointer should never be NULL in this case.
  • there's another design level issue. It can return a NULL_NODE(), but the existence of such a node is hidden (which I think is rather a good thing though), and that would confuse the caller. I don't know what is the best regarding this point, but maybe we should return a NULL pointer in case we'd otherwise return NULL_NODE.
  • It looks like this method can return an empty node regardless of how the RBTree is constructed. Is that correct, and if so, intentional? Should we rather skip empty nodes for this method by default, too? In any case this point should be clearly documented and tested.

tests

I've made some editorial changes, and added one minor test.

We need some more tests as pointed out before.

comment:10 Changed 9 years ago by jinmei

  • Owner changed from jinmei to hanfeng

comment:11 in reply to: ↑ 9 Changed 9 years ago by hanfeng

Replying to jinmei:

NodeChain?

  • it's now belong to the isc::datasrc namespace level, and the class name seems too generic. I'd either rename it to something like\

RBTreeNodeChain is the new name for it and its interface is limited accessible
by class except RBTree

In fact, RBNode does have a parent pointer.

parent pointer here means up pointer, I have changed it

  • overall, documentation is way too insufficient. you seem to have taken some doc from BIND 9 (which is itself not necessarily bad),

More document is added

  • now that this is a separate class with non trivial code logic, it must have tests. And, it should have been tested before. next time please make sure you write test first.
  • as for tests, we should especially cover invalid number of pops and pushes. I'd also write a test where an RBTree contains a maximum number of levels and a chain has the maximum number of nodes.

For the maximun is should be 128 which is decided by the MAX_LABEL in Name,
since each node stores at least one label. For push, it's quite hard to
test the exception, since Name has TooLongName? check which stop me from
create too deep tree.

  • I'd use exception instead of assert() due to bad inputs for various methods. assert() is dangerous because careless use of library can crash the whole application. Also, exceptions are more easily tested.
  • why RBT_MAX_LEVEL is an enum? I'd use a static const.

Exception is used instead of assert
using enum is quite old style, since old version g++ doesn't support static const variable
declare in class declare. I have modified it to const static int

nextNode()

  • after re-reading it, I now have some concern about the interface design: passing the node and chain is a bad idea, because it's not
  • (the first point aside) I'd pass a reference to node instead of a pointer because the pointer should never be NULL in this case.

RBTreeNodeChain is the only parameter for this function, and the only
valid way to initialize it is through find function in RBTree

  • there's another design level issue. It can return a NULL_NODE(), but the existence of such a node is hidden (which I think is rather a good thing though), and that would confuse the caller. I don't know what is the best regarding this point, but maybe we should return a NULL pointer in case we'd otherwise return NULL_NODE.
  • It looks like this method can return an empty node regardless of how the RBTree is constructed. Is that correct, and if so, intentional? Should we rather skip empty nodes for this method by default, too? In any case this point should be clearly documented and tested.

NULLNODE is only used by RBTree inner logic, so returning NULL is the correct
way, and for successor, since it is used by RBTree, so it's ok for it to
return NULLNODE. nextNode is not like find, it's quite easy for user to skip empty
node by keep invoking it. Comment has added for this behavior

comment:12 Changed 9 years ago by hanfeng

  • Owner changed from hanfeng to jinmei

comment:13 Changed 9 years ago by jinmei

Some higher level points:

  • we agreed on adding [tracXXX] to commit logs. from now on, please add this when you make commits.
  • there's still common style violation (directly fixed). writing a checker script may be a nice side work if you need to kill your time in vacation:-)

I made overall cleanups on documentation and tests, and directly committed
proposed changes.

I also some non trivial changes:

  • use the absolute maximum for RBT_MAX_LEVEL. check this through tests
  • change exceptions to assert in private functions. see log and comments.
  • have RBTree methods throw exceptions against bogus chains.

My changes can be retrieved by "git diff 2565b9fe".

If these changes are okay, please merge it to trunk (or I can do that if
you don't have time). If not, please respond in this ticket.

comment:14 Changed 9 years ago by jinmei

  • Owner changed from jinmei to hanfeng

comment:15 Changed 9 years ago by jinmei

changes were okayed via a off-ticket emails.

merged to master and pushed. closing ticket.

comment:16 Changed 9 years ago by jinmei

  • Resolution set to complete
  • Status changed from reviewing to closed
Note: See TracTickets for help on using tickets.