Opened 9 years ago

Closed 9 years ago

#397 closed enhancement (fixed)

Port generic red-black tree (RBT) data structure from BIND-9

Reported by: zzchen_pku Owned by: hanfeng
Priority: medium Milestone:
Component: data source Version:
Keywords: Cc:
CVSS Scoring: Parent Tickets:
Sensitive: no Defect Severity:
Sub-Project: Feature Depending on Ticket:
Estimated Difficulty: 0.0 Add Hours to Ticket: 0
Total Hours: 0 Internal?: no

Description

It is a sub-task of data source performance enhancement.
For the first sprint:
*We can simplify the internal a bit: e.g. omit the supplemental hash table for performance or incremental deletion
*It should be sufficient to provide basic interfaces: add/remove/search.

Subtickets

Attachments (1)

rbtree-leak.diff (612 bytes) - added by jinmei 9 years ago.

Download all attachments as: .zip

Change History (48)

comment:1 Changed 9 years ago by shane

  • Resolution set to duplicate
  • Status changed from new to closed

comment:2 Changed 9 years ago by zzchen_pku

  • Resolution duplicate deleted
  • Status changed from closed to reopened

repon for implementation

comment:3 follow-up: Changed 9 years ago by zzchen_pku

  • Owner changed from hanfeng to UnAssigned
  • Status changed from reopened to reviewing

branches/trac397 is ready for review now.
The initial version will onpy provide basic interfaces: add/remove/search.

comment:4 in reply to: ↑ 3 Changed 9 years ago by jinmei

  • Owner changed from UnAssigned to zzchen_pku

Replying to zzchen_pku:

branches/trac397 is ready for review now.
The initial version will onpy provide basic interfaces: add/remove/search.

Some prilimiray feedback:

  • I've noticed there are trivial violations of coding style guidelines. could you first fix trivial issues so that we can focus on the code logic later?
    • brace position
    • missing '()' after retrun
    • missing braces
  • the code generally seems to be undocumented. in particular, rbt_datasrc.h doen't have any class/member documentation. please add them.
  • some major part of the code aren't tested according to lcov. e.g. about a half of RBNode::successor() isn't tested. please run lcov and make sure most major part of the code are covered.

comment:5 follow-up: Changed 9 years ago by zzchen_pku

  • Owner changed from zzchen_pku to jinmei

Okay, update code according to coding style guidelines and add more unittest, please check.

comment:6 in reply to: ↑ 5 ; follow-up: Changed 9 years ago by jinmei

  • Owner changed from jinmei to zzchen_pku

Replying to zzchen_pku:

Okay, update code according to coding style guidelines and add more unittest, please check.

Many part of RBTree::deleteRebalance still don't seem to be tested.

There are missing braces.

Positions of many */& are incorrect in terms of the coding guideline.

'return' still miss parentheses.

comment:7 in reply to: ↑ 6 ; follow-up: Changed 9 years ago by zzchen_pku

  • Owner changed from zzchen_pku to jinmei

Replying to jinmei:

Many part of RBTree::deleteRebalance still don't seem to be tested.

Added more unittest to improve test coverage, almost cover all exceuted paths except printTree() function now.

There are missing braces.
Positions of many */& are incorrect in terms of the coding guideline.
'return' still miss parentheses.

Updated.
Please check, thanks.

comment:8 in reply to: ↑ 7 Changed 9 years ago by jinmei

Replying to zzchen_pku:

Positions of many */& are incorrect in terms of the coding guideline.
'return' still miss parentheses.

Updated.
Please check, thanks.

Test coverage now looks good. I'll look into it once I'm done with the writable data source branch.

Meanwhile, could you update the documentation more? Currently the amount and quality of the document are not sufficient to me. I'd expect

  • for each class, describe what this class is responsible for; how generally the application uses it; non trivial implimentation decisions (and the consequences of them, e.g. restriction of the usage); other implementation notes; and, things like rbtree, a simple diagram how the tree would look like.
  • for each enum value or constant class variable, give a short description of what it means
  • for each member function, describe what it does; whether/when it throws exceptions; parameters and return value in the doxygen format; other non trivial implementation notes, if any
  • same for other global functions, constant values, etc.

See the current documentation at:
http://bind10.isc.org/~jinmei/bind10-trac397/cpp/classisc_1_1datasrc_1_1_r_b_tree.html
http://bind10.isc.org/~jinmei/bind10-trac397/cpp/classisc_1_1datasrc_1_1_r_b_node.html

and compare it with, e.g., this one:
http://bind10.isc.org/~jinmei/bind10-trac397/cpp/classisc_1_1dns_1_1_e_d_n_s.html#_details

When someone says a branch is "ready for review", I normally expect the latter level of documentation is included. This is not only for completeness; sometimes you notice missing features or defect in the implementation as you try to provide detailed document of the implementation. So, it's also for maturing the implementation.

comment:9 follow-up: Changed 9 years ago by zzchen_pku

Feng has updated the documentation, please check the latest version.

comment:10 in reply to: ↑ 9 Changed 9 years ago by jinmei

Replying to zzchen_pku:

Feng has updated the documentation, please check the latest version.

To me, it's still not quite sufficient. As I said, refer to this one as an example:
http://bind10.isc.org/~jinmei/bind10-trac397/cpp/classisc_1_1dns_1_1_e_d_n_s.html

and consider wheter the current documentation provides the same level of information. If you also agree it's still not sufficient, please improve it. I don't think it a blocking issue for continuing review, however, so I'm now taking a closer looking at the code.

I also still saw many editorial issues such as missing braces, and made proposed fixes directly to the branch (r3523). Please check, and please also consider introduce some automatic check or careful edit or something to avoid having this level of dicussions in subsequent reviews.

comment:11 Changed 9 years ago by jinmei

A couple of higher level comments:

  • why did you make it really a "tree of trees"? that is, RBNode::down_ points to a different RBTree, rather than RBNode. This is different from the BIND 9 implementation, and, although it's not necessarily a bad thing per se, since this is basically a simple port of the working implementation if we intentionally take a different approach we should clarify why.
  • "RBT" is not only intended to store RRsets. In fact, we'll use it for the zone table. So we cannot assume that it's used with RRsets or that an RBNode has an attribute of "delegate" or "non terminal". Likewise, we cannot have higher level methods including isDelegate(), isNonterminal() or addRRset() in this current form (thing like these should go to an equivalent of "rbtdb", not "rbt").

comment:12 follow-up: Changed 9 years ago by hanfeng

why did you make it really a "tree of trees"?
I really haven't read the code in bind9 about the rbtree, it's a little bit long and hard to understand. So I just get the document from Shane about what tree gonna looks like after insert a bunch of domain names and refer to your branch. So from this point of view, this is not a simple porting, I write it according to my understanding. The reason why i make it a real tree in tree is that from implementation point of view, It's more natural. With the recursive data struct the code is less but much easier to understand.

"RBT" is not only intended to store RRsets. ....
This is my misunderstand about the rbtree, so if it doesn't intended to only store rrsets but anything which related to one domain name, the interface of it should be modified and maybe we can make it a template class. I will finish the modification in two days.

comment:13 in reply to: ↑ 12 Changed 9 years ago by jinmei

Replying to hanfeng:

why did you make it really a "tree of trees"?
I really haven't read the code in bind9 about the rbtree, it's a little bit long and hard to understand. So I just get the document from Shane about what tree gonna looks like after insert a bunch of domain names and refer to your branch. So from this point of view, this is not a simple porting, I write it according to my understanding. The reason why i make it a real tree in tree is that from implementation point of view, It's more natural. With the recursive data struct the code is less but much easier to understand.

So, basically everything was written from the scratch?

BTW this is one of the things that should be documented as a design decision.

"RBT" is not only intended to store RRsets. ....
This is my misunderstand about the rbtree, so if it doesn't intended to only store rrsets but anything which related to one domain name, the interface of it should be modified and maybe we can make it a template class. I will finish the modification in two days.

More importantly we cannot assume an existence of an "NS" record to identify a partial match. You'll need some non trivial tricks such as BIND 9's "rbtnodechain" or manipulating parent pointers. Specifically, we cannot simply do the following in RBTree::findHelper()

            } else if (NameComparisonResult::SUBDOMAIN == relation) {
                if (node->isDelegate()) {
                    *tree = (RBTree*)this;
                    *ret = node;
                    return (RBTree::FINDREFERRAL);
                } else if ...

comment:14 Changed 9 years ago by jinmei

  • Owner changed from jinmei to hanfeng

Some more generic comments. I'll give this ticket back to the author at this point because we'll need some substantial changes.

  • there seem to be many points we can make things (method parameter, temporary variables in a function, etc). please "constify" things wherever possible.
  • IMO we shouldn't use C-style cast. also, apparently these casts are to remove const, but (if possible) it's much better if we can (re)organize the code so that we don't have to break constness in the first place. I'd suggest revisiting the logic to see if we can achieve what we do without relying on const_cast.
  • overall the code doesn't seem to be exception safe where it involves resource allocation. for example, the following code fragment isn't exception safe.
                        current->setDownTree(new RBTree());
                        RBNode* sub_root;
                        current->down_->insert(sub_name, &sub_root);
    
    because insert() also involves resource allocation and it can throw on failure. Furthermore, even if we make it exception safe in a straightforward way, the resulting code won't provide strong exception guarantee, i.e., if an exception is thrown the tree structure cannot always revert to the original state. The original BIND 9 implementation ensures this property. We'll probably need the same for this structure.

comment:15 Changed 9 years ago by hanfeng

  • Owner changed from hanfeng to jinmei

remove c style conversion.
add exception safe especially for bad_alloc check.

comment:16 follow-up: Changed 9 years ago by jinmei

I've committed:

  • some trivial editorial fixes (r3604)
  • a missing typename, without which my version of gcc refuses to compile it

to the branch. Please check.

(to be continued)

comment:17 in reply to: ↑ 16 Changed 9 years ago by jinmei

I'd like to propose constifying methods/temporary variables as much as possible. I've committed a proposed change to the branch (r3611). Please check.

Making find(), findHelper(), successor() const may be controversial, and, in fact, I suspect we'll eventually see the need for getting a mutable node via find() (e.g., when manipulating RRsets for a rbt node) but I believe in that case we should provide both mutable and immutable version of find() just like std::map::find() has two versions.

(to be continued)

comment:18 Changed 9 years ago by jinmei

The current use of NULLNODE doesn't make sense to me:

  • passing a "nullnode" to RBNode constructors makes the code fragile. If the caller omit the parameter or intentionally specifies NULL, many part of the code won't work correctly because they assume the specific property of the nullnode. For example, successor() will be broken.
  • btw, the doxygen comment of the constructor doesn't parse to me.
  • in either case, null node determination like this doesn't seem to be good:
        if (right_ != right_->right_) {
    
    it's an implicit statement and therefore less understandable, and is fragile in that it assumes the specific structure of the null node.

A related note: one of the RBNode constructors were unused.

I've committed a proposed change to address these concerns (r3617). Please check.

(to be continued)

comment:19 Changed 9 years ago by jinmei

The code comment on the latter half of successor() didn't actually parse to me:

    // otherwise return the parent without left child or
    // current node is not its right child

but if I my guess of "the parent without left child" is correct, it doesn't describe what the code does. This code actually checks if "parent" reaches the root node.

Also, for the same reason I pointed out in the previous review comment, the code "s != s->left_" is confusing. it's far better to check if it's the null node explicitly.

I've committed my proposed fix to these issues (r3618). Please check.

comment:20 follow-up: Changed 9 years ago by jinmei

  • Owner changed from jinmei to hanfeng

Okay, finally completed review. Here are comments. Giving the ticket back to feng.

First, I made some (more) minor cleanups I noted directly to the branch (r3633). Please check them.

Meta Comments

Was it developed test driven? From the ordering of commit messages
I'm afraid it was actually "code driven". I'm even afraid the code
and tests were written by different developers. If it went that
way, please seriously consider revisiting the development style.
Without practice we will never be able to master TDD. If this is done
by a pair, it shouldn't be a pair of "main coder" and "test coder".
Each person should do both, in the order of test then code.

General Comments

  • I still don't understand why we need to rewrite the whole logic from the scratch. This is complicated stuff by nature, so doing it from the scratch only seems to increase the risk of introducing bugs. In fact, I've already found some strange behaviors and points that don't meet our requirements (see below for the specifics). I couldn't also be sure about the correctness of one part of the code (see the comment for eraseNode() below). Redblack tree algorithm itself is very complicated and difficult to write correctly or difficult to be sure if it's correct, so why don't we simply port the existing, known to work code? That way, we can at least be sure that it should be as correct as the original code. Code written from the scratch requires more load for review and is normally more buggy. "The original code is complicated" is true, but that is not a convincing reason to me because this code is also complicated. So, I'd like to ask revisiting the approach again. If, you still think the scratch code is better, that's a decision, but please describe why we can't use the original code in the class description in detail.
  • A related point: the current design seems to be much less efficient in many points. For example, it stores name objects, which contain std::string and std::vector objects, and therefore heavy in terms of memory footprint. Likewise, having a separate tree object for each subtree can be heavy wrt memory footprint (consider, e.g., a large TLD that contains many <subdomain>.<tld> names with "in bailiwick" NS names such as nsX.<subdomain>.<tld>. Then we'll have a large number of sub trees). find() involves possibly many recursive function calls, which will be slower than a single loop. These things may be acceptable for initial implementation if it has other advantage such as code simplicity (I'm personally okay with having name objects for now, for example), but these are important design decision and should be explicitly documented.
  • As we discussed before, I'm personally against using boost::noncopyable in a public header file. At the very least we should be consistent about the use of it (right now this is the only place we use it). If you still believe increasing the reliance on boost is the way to go, please raise it as a general matter and get consensus, then introduce it throughout our code tree.
  • Admitting this may be a matter of taste, this type of syntax looks awkward, and reduces code readability (by confusing the reader) to me:
                                if (-1 == ret) {
    
    I guess this tries to proactively avoid a common error of mistyping '==' with '='. On one hand, this might be considered a good practice; on the other hand, where I stand, this is an outdated hack that just makes the code awkward. Modern compilers warn about this type mistypes, and by treating warnings as an error we can avoid this type of bugs. For example, if I modify this code to:
                                if (ret = -1) {
    
    my gcc warns as follows:
    rbt_datasrc.h:525: warning: suggest parentheses around assignment used as truth value
    
    In addition, particularly in C++, we can declare many variables const, which also help trigger a compiler error due to an accidental assignment.

Besides, the style isn't consistent even in this file. In some
other parts a "more natural" style is used:

            if (common_label_count == 1) {

The inconsistency is a source of confusion and reduces readability.

If, we agree with adopting this as a coding guideline and using it
throughout our source tree consistently, I could live with that.
Until/unless that happens, I suggest we keep the "natural" style
consistently.

  • About doxygen comment: comments on private methods will be ignored by doxygen. it's still sometimes useful to provide detailed comments on them, but if it's related to a general design matter, it might be better to describe the points in, e.g., the class description.
  • I'm afraid it's not a good idea to store raw "type-T" data in RBNode. We need to represent a "node without data", and the notion of "shadow" isn't sufficient for this purpose (see also discussion about "NXRRSET" below). Also, depending on the implementation of the type, raw data may be heavyweight in terms of memory footprint, even if it's a default-constructed one. So we should probably store the data as a pointer(-like) type object that can be comparable with "NULL".
  • class naming: it may be better to introduce one level of abstraction than "red black tree", because from user's point of view the internal structure doesn't (shouldn't) matter. Also, we may actually want to change the internal structure to something different (maybe a hash table, or other tree-like structure as Michal proposed).
  • Likewise, I suspect the module and filename are not appropriate. This stuff should probably go to lib/datasrc. And, "rbt_datasrc.h" is not a good name in two reasons: this is a generic backend and is not a data source per se; it would be better to raise the abstraction level as discussed above.
  • documentation in general: we need two types of documentation: one for API users, and the other for developers who try to read/modify the code. For the former (or both), we need to describe "what this class is", "how to use it", "when this method throws an exception", "what this parameter should be", "what this method returns in which case", etc. Especially for the latter, we need to describe "why we design it that way", "how this is different from other implementation (e.g. BIND 9) and why", "what is missing in the current implementation", etc. These things are largely missing in this implementation.

operator- for Name

  • technically, an unnamed namespace shouldn't be used in a public header file because it would easily cause breaking the C++ one definition rule. I suspect there's even a compiler that rejects this code. Assuming we'll eventually (actually soon) use more efficient data structure than Name objects, it may be okay for now. But we should at least note these in comments.

RBNode class

  • RBTreeColor: a minor point, but why does it start with 1?
  • RBTreeColor: it seems this can (should) be non public
  • getName(): we'll actually need the full name of the node (not relative)
  • successor()
    • why is it public (other than for testing purpose)? it's not always useful for applications because it only provides the "successor" of a single subtree. What the application would really need is the "next" node of the entire tree-of-tree (= zone). and, it can even be harmful in that it returns the internal null node, whose property is completely hidden in the implementation and will confuse the application.
    • documentation needs to be improved:
      • define "big"
      • what if this is the "biggest" node?
      • note also that "whose name is bigger" is ambiguous. It normally doesn't specify a single node because there are normally more than one "bigger" nodes.
    • cloneDNSData(): "rbt" is now non DNS specific thing, so this method should be renamed.
    • setDownTree(): doc: it's not a complete sentence. also, what it tries to say isn't clear.
    • not clear why we need is_shadow_ (unless you read other part of the code carefully). This should be fully documented (probably in the class description of RBTree).
    • name_: unclear description.
  • setDownTree(): I didn't understand the description.

RBTree class

  • documentation: this is insufficient. see above for the general matter. this class should specifically describe overall data structure, complexity, overall user interfaces, footprint consideration, ownership property (whether the application should keep the ownership of the data stored in the "tree"), etc.
  • FindResult?: we may want to make the naming consistent with ZoneTable?, where we use SUCCESS instead of EXACTMATCH.
  • find()
    • the documentation should clarify when to use which version of this method.
    • semantically, the "mutable" version shouldn't be a const member function, because the caller can effectively modify the tree through the returned node pointer. IMO, it's safer to remove the const and warn that (in doc) this is a dangerous version of find() and should be used only when necessary and with caution.
    • in BIND 9 we add argument validation for this type of API: requiring inserted_node != NULL && *inserted_node = NULL. we may want to do the same check. same comment applies to insert().
  • for what purpose do we need getNodeCount()? for debug? the intent should be documented, and if we don't need it in practice, we should probably remove it. I'd also note that "node created common suffix node" is completely implementation specific, and isn't appropriate for a public interface (it's susceptible to implementation changes)
  • same comment applies to getNameCount()?
  • printTree(): our usual practice is to define toText() and (if necessary) define operator<<() separately (but for this class toText() may not always make sense because the resulting text may become very large). Note also that the naming of "printTree()" is too specific to the current implementation, and "Tree" is actually redundant because it's a member function of "RBTree". maybe we should just call it "print()".
  • insert()
    • I don't like it to return magic numbers such as 0/1/-1. Why don't we generalize the result code (EXACTMATCH, etc) and use it here? Note: ZoneTable? adopts that approach.
    • In any case, I don't think it a good idea to handle the 'out of memory' error via the return value. I'll discuss it more with exception safety below.
    • we may eventually want to introduce an iterator, and want to have insert() return an iterator. for now I'm okay without this extension, but it would be better to note this possibility in the class documentation.
  • erase(): like insert(), I don't like it to return magic numbers.
  • The diagram after the class definition: this should be included in the doxygen document.

~RBTree()

  • This seems to be quite inefficient (consider how many nodes we need to visit before deleting the entire tree). This seems to me to be another bad reason for redeveloping the wheel.

findHelper()

  • I don't think it correct to return NOTFOUND when the search finds an exact match with "is_shadow_" because there's a case like "NXRRSET", where the node is empty but should match the search key. Note that handling this case wouldn't be that trivial, because there will be indirect ways to change the node property, e.g. an application first finds a "shadow" node in an "NXRRSET mode", and then perform setData() on it, which effectively makes that node "non shadow". This design should be substantially revisited.
  • if we specify 'const' for 'tree', why not do that for 'ret', too? note that 'tree' is also subject to a subsequent change, where we need const_cast (see erase()). ideally we should avoid such casting, but I can live with it in this case, understanding that it's used in mutable and immutable contexts and that it's a private helper function. however, such a tricky property should be consistent so that the intent is clearer to code readers. the intent should also be documented in detail, so that when someone needs to modify the code the interface won't confuse the developer.

getNodeCount/Helper()

  • this requires traversing the entire tree just to count the total number of nodes. depending on the purpose, it seems too expensive. Note: in BIND 9 version of rbt this is a constant operation.

getNameCounter()

  • same comment applies.

insert()

  • this is still not exception safe, and this usage of try-catch is not consistent with our convention, and makes the code unreadable.

First, it's still not safe. For example, operator- for names or
cloneDNSData() will require resource allocation for new name
objects and can throw. If this happens the memory allocated for
new_sub_tree will leak. In general, it's very error prone to try
to address exception safety this way unless the code is very short
and only uses very primitive interfaces, because we can easily miss
a function that can throw an exception. It's also fragile to
future changes for the same reason. In fact, this is one of the
reasons "why we shouldn't use exceptions" often claimed by
exception opponents.

Second, in a (C++) package relying on exceptions, handling
exceptions locally doesn't make sense in general. This is first
because how to handle the situation that an exception occurs is
often unclear at a lower level like this (e.g. at a top application
level, we might be able to drop some unnecessary data such as
caches to make more space for getting mandatory resources). It's
also because adding try-catch in a lower level decreases code
readability with other disadvantages of exceptions (such as higher
costs and higher possibility of resource leaks). In fact, the
resulting code is quite unreadable due to the handling of rare
exceptional cases. If we were to handle these error cases at this
point, it's much better to use no-throw new and write the C-style
if-fail-then-(goto)-cleanup code.

I'd personally prefer allocating resources in an RAII manner and
simply let exceptions be propagated. We could use a shared
pointer, but in this case it's a bit overkilling, and the standard
auto_ptr should suffice. For example, the new_sub_tree handling
would look like this:

                    std::auto_ptr<RBTree<T> > new_sub_tree =
                        std::auto_ptr<RBTree<T> >(new RBTree());
                    RBNode<T>* sub_root;
                    new_sub_tree->insert(sub_name, &sub_root);
                    int ret = 0;
                    if (name.getLabelCount() != common_label_count) {
                        ret = new_sub_tree->insert(name - common_ancestor,
                                                   new_node);
                    }
                    RBTree<T>* down_old = current->down_;
                    current->setDownTree(new_sub_tree.get());
                    ...
                    current->is_shadow_ = true;
                    new_sub_tree.release(); // there's no worry for exception
  • in the common_ancestor case, shouldn't we leave the original data in the new "current"?
  • I suspect we shouldn't do this when new_node is NULL:
                            if (name.getLabelCount() == common_label_count) {
                                *new_node = current;
    
    (or we might require new_node must always be non NULL)

insertRebalance()

  • shouldn't we care about the case of 'node == root_' in the while condition?


leftRotate()

  • largely a matter of preference, but I'd use more descriptive variable names like 'child', 'parent', instead of 'c', 'p', except commonly used idioms such as 'i' for a loop counter. Same comment applies to rightRotate().
  • I suggest adding const to p and c as follows:
    RBTree<T>::leftRotate(RBNode<T>* const p) {
        RBNode<T>* const c = p->right_;
    
    that way we can be sure p or c will be never replaced in the function, which helps read the code. Same comment applies to rightRotate().
  • why does this function return a value? it doesn't seem to be used. Same comment applies to rightRotate().

erase()

  • in the case of 'node->down_ != NULL', I don't think it a good idea to leave the data in the node. what if it's a 100MB of object?
  • in the "merge down to up" case, it assumes the "tree->root" node is not "shadow". It's not obvious to me if this is met. Isn't it possible to have the following scenario?
    level L: two nodes=N1, N2
      N1: shadow, with down
      N2: (any type of node)
    level L+1 (from N1): two nodes=N3, N4
      N3: shadow, with down
      N4: (any type of node)
    
    Then suppose removing N4. Tree L+1 has now only one node (N3), which has an upper tree (L) and the upper node (N1) is shadow. N3, a shadow node, will be merged to N1, which now becomes a non shadow node. Even if such case shouldn't happen due to some non obvious constraints, it's far from clear, and we at least need comments why it cannot happen (we should probably add an assert() here, too). It would even be better if the whole code logic is more easy to understand.
  • In any case I see a more fundamental issue in this trick: this part can throw an exception (from Name::concatenate() or cloneDNSData(), which can throw in copying the name, or depending on the implementation of type T, in its reassignment, or from the merge_name copy). In general, methods like "erase" or destructors should not throw, so we need careful reconsideration here.

eraseNode()

  • like left/rightRotate, I'd use more descriptive variable names than 'x', 'y'. It's also awkward that 'y' is declared before 'x'. Meaningless initialization of these to NULLNODE is also a bad practice. These variables are essentially const (i.e. initialized once and never replaced), so I'd revise the first part of the code as follows:
    template <typename T>
    void
    RBTree<T>::eraseNode(RBNode<T>* const node) {
        RBNode<T>* const to_delete =
            (node->left_ == NULLNODE || node->right_ == NULLNODE) ?
            node : const_cast<RBNode<T>*>(node->successor());
    
        RBNode<T>* const child = (y->left_ != NULLNODE) ?
            to_delete->left_ : to_delete->right_;
    
  • and, after reading the code of eraseNode() and deleteRebalance() for over an hour, I finally gave up confirming this is really correct in terms of the red black tree algorithm. It's mostly impossible to be sure about such non trivial code without any comment. Please do either:
    • clarify each chunk of the code with comment about what it intends to do with any non trivial remarks, or
    • fall back to a more straightforward port of BIND 9's rbt (which I still recommend). This way we'll be able to be sure at least that it's as correct as the original code that has been proved to be working.

deleteRebalance()

  • I cannot be sure this code is correct, like eraseNode(). same suggestion applies.
  • variable name 'w' is bad. please use a more descriptive one.

INDNET()

  • it should be INDENT (I fixed it in the branch).
  • We should avoid using a macro this way, especially in a (possibly) public header file. It should be a function (ideally in an unnamed namespace, but also note the issue with an unnamed namespace in a header file).
  • in either case, I'd replace the for loop as follows:
        os << std::string(depth * 5, ' ');
    
  • I'm a magic number hater:-) Why is 5 chosen (not 4 for example)?

printTree()

  • Just checking: is it intentional that \n is used, not std::endl? same question applies to printTreeHelper().

printTreeHelper()

  • at the end of the function, it seems we can avoid the if-else by handling null node case at the beginning of the function:
    {
        INDENT(os, depth);
        if (node == NULLNODE) {
            os << "NULL\n";
            return;
        }
    
    then we can simply the end of the function:
        printTreeHelper(os, node->left_, depth + 1);
        printTreeHelper(os, node->right_, depth + 1);
    
  • maybe a matter of preference, but I'd avoid inserting a blank line for "visible" nodes, and avoid consuming a line just for 'invisible'. so, I prefer
         d.e.f. (black) [invisible]
    
    instead of
         d.e.f. (black)
    [invisible]
    
  • I'd add ' ' to 'end down from' (hey, this type of error could be avoided if you strictly do it test driven:-)
  • "tree has node XX" doesn't parse well. I'd rephrase it "tree has XX node(s)"

rbt_datasrc_unittest

  • insertNames: this comment is not correct any more:
        // a node is considered to "formally" exist only if it has data
        // associated with it
    
    (I guess it was based on the first implementation?) In the current implementation, a node will be considered "existent" once it's explicitly inserted whether or not the data is associated. so for example the following test would pass:
        EXPECT_EQ(0, rbtree.insert(Name("d.e.f"), &rbtnode));
        EXPECT_EQ(1, rbtree.insert(Name("d.e.f"), &rbtnode));
    
  • successor
    • as commented in the main code, it may not be a good idea to have it as a public method.
    • the last successor() operation after "g.h" seems to be no-op
          EXPECT_EQ(Name("g.h"), successor_node->getName());
          successor_node = successor_node->successor();
      
      (although I know there's probably no meaningful test for this)
  • eraseName
    • the diagrams explaining the tests are helpful:-)
    • this comment doesn't seem to match what's tested:
          // o would not be rejoined with w.y if w.y had data
          // associated with the key
      
    • Strange thing happens here:
          // z will rejoin with d.e.f since d.e.f has no data
          EXPECT_EQ(0, rbtree.erase(Name("s.d.e.f")));
      
      until this point, z.d.e.f is "shadow", but once we erase s.d.e.f it becomes "visible". so, the following test would pass
          EXPECT_EQ(RBTree<int>::NOTFOUND, rbtree.find(Name("z.d.e.f"), &crbtnode));
          EXPECT_EQ(0, rbtree.erase(Name("s.d.e.f")));
          EXPECT_EQ(RBTree<int>::EXACTMATCH, rbtree.find(Name("z.d.e.f"), &crbtnode));
      
      I guess the issue I pointed out for erase() is realized here. (This is more about the implementation rather than for the test itself)

comment:21 in reply to: ↑ 20 Changed 9 years ago by hanfeng

  • Owner changed from hanfeng to jinmei


Meta Comments
Was it developed test driven?

To be honest, the implementation is written by me and test code is written by Jerry, we will
change the mode

General Comments

  • I still don't understand why we need to rewrite the whole logic from the scratch. This is complicated stuff by nature, so doing it from the scratch only seems to increase the risk of introducing bugs.

I have modify the code from really tree in tree to the bind9 design, use down node to
point to the sub tree, and remove the recursive code. I think the code isn't totally new
compared with bind9, the idea is same. Bind9 code consist of a lot of logic except just simple tree, so it will be more simpler for me to write, but it really cost your more review effort.

  • A related point: the current design seems to be much less efficient in many points. ....

For efficiency problem, I have remove the recursive code, use boost pool to manage the memory, so if we delete the whole tree, this is no need to walk through the whole tree from node to node. Also the creation and destroy tree node will be more efficient.

  • As we discussed before, I'm personally against using boost::noncopyable in a public header file. ....

I think we should depend less on boost compile part, but if only needs head file, I don't see the difference between depending on one file and more files.

  • Admitting this may be a matter of taste, this type of syntax looks awkward, and reduces code readability (by confusing the reader) to ..

Yes, it's avoid the mistake that make assignment but when we really want to do comparison. It decrease the readability of the code, since compiler can avoid that, I have remove that kink code

  • I'm afraid it's not a good idea to store raw "type-T" data in RBNode. ....

T hasn't to be real data, it can be a shard_ptr.

  • class naming: it may be better to introduce one level of abstraction than "red black tree", because from user's point of...

I do not totally understand it, if what we will provide the class memory data base, we
should add another abstraction, so user don't need to know what concrete data struct we use. but
this part code is red black tree porting.

  • documentation in general: we need two types of documentation: one for API users, and the other for developers who try to read/modify the code.

I will add more comments for user who want to modify the code

operator- for Name

  • technically, an unnamed namespace shouldn't be used in a public header file because it would easily cause breaking the C

I have move it into a inner namespace, originally it with the macro all reside
in cpp file, i forget to modify them.

RBNode class
RBTree class

No magic number is used
Use boost::object_pool to management memory to keep code exception safe
The variables in function is modified to more meaningful name
The erase logic is modified that when delete node with down pointer, instead just set it
to shadow, if it has only one child, it will merge with its down node.
PrintTree? rename to dumpTree, several mistake is removed
Shadow node cann't be seen by end user, because from user perspective, if I insert four names into the rbtree, I don't expect get five or even more names from it.

Add iterator into rbtree to support iterate the whole tree, next step can modify erase and find to return or use iterator.

comment:22 follow-up: Changed 9 years ago by jinmei

A quick check: the response doesn't seem to cover all of the review comments. Does that mean you are still working on other comments, or does that mean you simply adopted the review suggestions?

comment:23 in reply to: ↑ 22 ; follow-up: Changed 9 years ago by hanfeng

Replying to jinmei:

A quick check: the response doesn't seem to cover all of the review comments. Does that mean you are still working on other comments, or does that mean you simply adopted the review suggestions?

I think i covered all the comments, except the following two points:
1 Add more documentation for different kind of user
2 Function rename like copyDNSData
I am still working on them. The comment is really huge, so if I miss anything, please tell me, I will fix them ASAP.

comment:24 in reply to: ↑ 23 ; follow-up: Changed 9 years ago by jinmei

Replying to hanfeng:

A quick check: the response doesn't seem to cover all of the review comments. Does that mean you are still working on other comments, or does that mean you simply adopted the review suggestions?

I think i covered all the comments, except the following two points:
1 Add more documentation for different kind of user
2 Function rename like copyDNSData
I am still working on them. The comment is really huge, so if I miss anything, please tell me, I will fix them ASAP.

Actually there seem to be several open points (I'll send them separately).

It was also not clear whether the introduction of iterator was needed to solve a problem of the original code or was rather a new feature. We shouldn't include a new feature into such an already big ticket - it should go to a separate task. I'm sorry if I was not clear on this point in my previous comments. I intended to mean so by this comment:

  • we may eventually want to introduce an iterator, and want to have insert() return an iterator. for now I'm okay without this extension, but it would be better to note this possibility in the class documentation.

Anyway: for this sprint, we only need a much smaller subset of the feature we're currently having. Specifically:

  • create the table
  • insert entries
  • find a longest match entry

We don't need to erase entries; we don't need iterators. Can we focus on a subset of this branch that only covers the 3 crucial points? It will help complete the review process faster and give us time to work on other tasks. The rest of the features/code will be a subject of future sprints.

comment:25 follow-ups: Changed 9 years ago by jinmei

Here are some major outstanding issues. Some of them are simply not
addressed since the previous comments; some others are a revised
discussion based on the response.

  • this stuff should go to src/lib/datasrc
  • class/module/file name

I do not totally understand it, if what we will provide the class
memory data base, we should add another abstraction, so user don't
need to know what concrete data struct we use. but this part code is
red black tree porting

Actually, there are two issues here. First: Whether we want to make the
rbt stuff externally public or we want to keep it a private helper
used only inside our .cc's. I originally thought the former, but
maybe we should at least begin with the latter approach, in which
case this point may be less critical. The second issue is that it
still affects our internal modules that use this stuff. Consider
the ZoneTable? class. It will contain the "RBTree" and calls its
find() method. If we change the internal data structure from the
red black tree based one (e.g. to a B-tree like thing), the class
name "RBTree" is not correct and we'd like to change the interface
to "BTree" or something. Then we'll need to change the caller side
of the code even if the interface and the provided service
(i.e. giving a longest match for a given name) don't change. From
the user's point of view, the important point is the interface, not
the internal data structure. When naming something that can be
used by other things than that "something", we should focus on the
interface, not the internal implementation. Finally, in any case,
"_datasrc" in "rbt_datasrc.h" doesn't make sense anymore because
the defined stuff in the header file isn't actually a "data source"
itself.

  • boost noncopyable

I think we should depend less on boost compile part, but if only needs head file, I don't see the difference between depending on one file and more files.

No, the point is that boost is a moving target, and using it in a
public header file increases the risk of introducing binary backward
incompatibility. More boost header files to be included, more risks
we have. So, basically, unless it's very difficult to find a
reasonable alternative to the feature that a particular boost module
provides, we shouldn't use it in a public header file. So far, the
only exception is shared_ptr. Its benefit is too big, and due to
its complexity it's very difficult to have a non boost alternative.
A relatively more subtle exception is boost/function, which is
included from two of our public header files. Hopefully we can
remove the dependency on it eventually, but for now we have it as a
compromise. We don't use any other boost header files in our public
header files. IN the case of noncopyable, it's just a syntax sugar,
and its alternative is just a few line of additional C++ code. It's
way below the level of inevitable dependency.
Another issue is overall code consistency. If, for some reason, we
agree on using boost noncopyable, we should use it consistently
throughout the code. Using it only in a single header file is
confusing. We should either use it for all noncopyable classes or
not use it anywhere.

  • exception safety: changing new to object_pool doesn't solve the essential problem. See, e.g. nodeFission(). if the code following the createNode() call throws an exception (and it can throw), the allocated resource will still be hanging inside the pool, without being used by anyone, until we destruct the entire pool. The new code indeed "fixes" memory leak in the long run, but in practice the memory is leaking from the user's point of view. For the code to be really exception safe, we should clean up all newly allocated resources immediately after exception. object_pool is primarily for performance improvement, and the use of it may or may not be a good idea (although I'd defer the tuning later - making such a drastic change to an already big patch will make review harder), but I cannot be free from the exception safety issue regardless of the use of the pool.
  • A related point: again, whether or not using object_pools, returning an error code for the "no memory" condition is IMO not a good idea because it's not consistent with our error handling convention. I already pointed it out in my previous review, and it's still outstanding.
  • NXRRSET vs the notion of "shadow": I pointed this out in my previous review as part of comment on findHelper(). It's not addressed yet.
    • suggestion: drop the idea of shadow; require type T be comparable with NULL; replace the logic that checks whether the node is checked to be shadow with checking T == NULL. So, for example, instead of
                  if (!node->is_shadow_) {
                      *target = node;
                      ret = EXACTMATCH;
                  }
      
      in findHelper(), do this:
                  if (node->data_ != NULL) {
                      *target = node;
                      ret = EXACTMATCH;
                  }
      
      Note: this doesn't yet solve the NXRRSET case, but let's resolve this matter later in a future sprint.
  • documentation is still quite insufficient. quoting my previous comments: "documentation in general: we need two types of documentation: one for API users, and the other for developers who try to read/modify the code..."
  • existing (minor) point that isn't addressed:
    • The diagram after the (RBTree) class definition: this should be included in the doxygen document.

comment:26 in reply to: ↑ 24 Changed 9 years ago by jinmei

  • Owner changed from jinmei to hanfeng

Replying to jinmei:

Anyway: for this sprint, we only need a much smaller subset of the feature we're currently having. Specifically:

  • create the table
  • insert entries
  • find a longest match entry

We don't need to erase entries; we don't need iterators. Can we focus on a subset of this branch that only covers the 3 crucial points? It will help complete the review process faster and give us time to work on other tasks. The rest of the features/code will be a subject of future sprints.

I've created a sub branch named "trac397focused" based on the latest version of trac397, and made suggested initial changes including:

  • limit the features to those we need for this sprint
  • rename the file (for not still called rbt(ree))
  • remove the place to store the file from auth to datasrc

Is it okay to continue our review on this focused branch? (note: it doesn't mean trac397 is dead. We'll import additional features from it as we see the need for them in future sprints).

Giving the ticket back to Feng at this point.

comment:27 in reply to: ↑ 25 ; follow-up: Changed 9 years ago by hanfeng

Replying to jinmei:

Actually, there are two issues here. First: Whether we want to make the
rbt stuff externally public or we want to keep it a private helper
used only inside our .cc's. I originally thought the former, but
maybe we should at least begin with the latter approach, ...

To make the rbt stuff externally, to my understanding, we will create one abstract map, the key
of the map is domain name, and the value part can be anything, so the data struct behind is replaceable and opaque to end user. Is it what you mean? Then the interface will be more like map
in STL, just the key type is restrict to Name instance. If we want to have something like this, after finish rbt tree porting, we can add that abstract layer.

  • boost noncopyable

I think we should depend less on boost compile part, but if only needs head file, I don't see the difference between depending on one file and more files.

No, the point is that boost is a moving target, and using it in a
public header file increases the risk of introducing binary backward
incompatibility. More boost header files to be included, more risks
we have. So, basically, unless it's very difficult to find a

Yes, boost is moving, new stuff is adding into it. And some part of it is absorbed by stand library in the near future. And with class struct modified, binary compatibility will be damaged. But there are a lot of greate things in boost, shared_ptr is one of them, now I think we have also used for_each, maybe bind, functor. If you shared_ptr is used, possibly weak_ptr will be used later. for non_copyable, it's code is quite simple, we can write it. But do we really want to reinvent the wheel. for the compatibility issue, we can includes one version of boost into our code base just like asio. And I agree that we should keep the code consistent, if we decide to use boost::non_copyable, all the code should be modified accordingly.

  • NXRRSET vs the notion of "shadow": I pointed this out in my previous review as part of comment on findHelper(). It's not addressed yet.

I am not sure that i understand the problem. So let me clarify my thinking about NXDOMAIN & NXRRSET. In case, we have zone b. and in it, we only have c.a.b
c.a.b. 86400 A 1.1.1.1
If we

dig @localhost a.b A

BIND9 will return NOERROR instead of NXDOMAIN.
If we

dig @localhost c.a.b mx

BIND9 will return NOERROR

But if do nsupdate, with

prereq yxdomain a.b

BIND9 will return NXDOMAIN
or

prereq yxrrset c.a.b mx

BIND9 will return NXRRSET
so for nsupdate, the rcode returned is quite correct. but for dig, I don't know why BIND9 return noerror not NXDOMAIN & NXRRSET.

In principle, one name is think to be not exist, that means it has no rrsets with it. Otherwise, if it exists but no some kind of rrset user want, we will return NXRRSET, is it correct?

comment:28 in reply to: ↑ 27 Changed 9 years ago by jinmei

Replying to hanfeng:

Replying to jinmei:

Actually, there are two issues here. First: Whether we want to make the
rbt stuff externally public or we want to keep it a private helper
used only inside our .cc's. I originally thought the former, but
maybe we should at least begin with the latter approach, ...

To make the rbt stuff externally, to my understanding, we will create one abstract map, the key
of the map is domain name, and the value part can be anything, so the data struct behind is replaceable and opaque to end user. Is it what you mean? Then the interface will be more like map
in STL, just the key type is restrict to Name instance. If we want to have something like this, after finish rbt tree porting, we can add that abstract layer.

We don't necessarily have to have an additional abstraction layer (while it depends on the definition of "layer"). What I meant is to define something like "NameMap?" class with keeping all the internals that the current RBTree class has.

But I'm okay with keeping the name at the moment, especially if the intended use is to keep the definition semi-private (i.e., not installing the header file) and revisit the issue later.

  • boost noncopyable

I think we should depend less on boost compile part, but if only needs head file, I don't see the difference between depending on one file and more files.

No, the point is that boost is a moving target, and using it in a
public header file increases the risk of introducing binary backward
incompatibility. More boost header files to be included, more risks
we have. So, basically, unless it's very difficult to find a

Yes, boost is moving, new stuff is adding into it. And some part of it is absorbed by stand library in the near future. And with class struct modified, binary compatibility will be damaged. But there are a lot of greate things in boost, shared_ptr is one of them, now I think we have also used for_each, maybe bind, functor. If you shared_ptr is used, possibly weak_ptr will be used later. for non_copyable, it's code is quite simple, we can write it. But do we really want to reinvent the wheel. for the compatibility issue, we can includes one version of boost into our code base just like asio. And I agree that we should keep the code consistent, if we decide to use boost::non_copyable, all the code should be modified accordingly.

We do use boost for_each, but are very carful not to use them in
public header files (we only use it in .cc's). That also applies to,
for example, lexical_cast. As I mentioned, two of 79 header files
include boost/function.hpp, but in my understanding we'd rather avoid
relying on it, and that's why (in my understanding) Jelte avoided to
use it in his writable data source work.

  • NXRRSET vs the notion of "shadow": I pointed this out in my previous review as part of comment on findHelper(). It's not addressed yet.

I am not sure that i understand the problem. So let me clarify my thinking about NXDOMAIN & NXRRSET. In case, we have zone b. and in it, we only have c.a.b

Ah, I see, I was certainly not very accurate, sorry. What I actually meant was "empty non terminals".

c.a.b. 86400 A 1.1.1.1
If we

dig @localhost a.b A

BIND9 will return NOERROR instead of NXDOMAIN.

with an empty answer section, yes. Now, assume we also have

d.a.b. 86400 A 192.0.2.1

Then our rbt will look like this:

   a.b (shadow)
    |
    c
   / \
 nul  d

We should still return the NOERROR + empty answer to "dig a.b A". How can we make it possible with our current rbt backend? As far as I understand it we cannot distinguish the result from "NXDOMAIN" because "find()" just returns NOTFOUND. (btw, it's also not trivial to do it without d.a.b, but that's another question). We need to detect whether a node "has data" anyway, regardless of whether we has explicitly inserted the name for the node, and then the notion of shadow is mostly meaningless.

comment:29 in reply to: ↑ 25 Changed 9 years ago by hanfeng

  • Owner changed from hanfeng to jinmei

Replying to jinmei:

  • exception safety: changing new to object_pool doesn't solve the essential problem. See, e.g. nodeFission(). if the code following the createNode() call throws an exception (and it can throw), the
  • A related point: again, whether or not using object_pools, returning an error code for the "no memory" condition is IMO not a good idea because it's not consistent with our error handling convention......

object pool is removed, whether to use it needs discussion. To handle exception safe
especially bad_alloc, use auto_ptr. So there is NOMEM flag any more

  • NXRRSET vs the notion of "shadow": I pointed this out in my previous review as part of comment on findHelper(). It's not addressed yet.
    • suggestion: drop the idea of shadow; require type T be comparable with NULL; replace the logic that checks whether the node is

shadow is removed, <code>find</code>will return non-empty nodes, so to distinguish
NXDOMAIN from empty non-terminal is left to be done in latter sprint.

  • documentation is still quite insufficient. quoting my previous comments: "documentation in general: we need two types of documentation: one for API users, and the other for developers who try to read/modify the code..."
  • existing (minor) point that isn't addressed:
    • The diagram after the (RBTree) class definition: this should be included in the doxygen document.

Add more document and move the diagram into the doxygen part. The document maybe still insufficient. It's really my problem, I seldom writing comments for my code. If more description should be add to specific interface or design, please let me know

comment:30 Changed 9 years ago by jinmei

I'm now looking at the latest code, but code coverage matter first: some part of the main code are not covered, including this one:

http://gyazo.com/c4c894297dea90a10e754edc43903512.png

The rotate functions have uncovered parts, too.

Please run lcov and make sure all important methods are covered by the tests.

comment:31 Changed 9 years ago by jinmei

Another quick thing. Please make sure the doxygen document is reasonably formatted according to the doxygen grammar. Right now it doesn't even produce any document (at least in my environment).

comment:32 Changed 9 years ago by jinmei

deleteHelper() leaks memory, and re-introduces efficient loop.

As for memory leak, I'd suggest adding an assertion in the destructor:

template <typename T>
RBTree<T>::~RBTree() {
    deleteHelper(root_);
    assert(node_count_ == 0);
}

(or having explicit tests, which is better but doesn't seem to be trivial)

comment:33 Changed 9 years ago by jinmei

hmm, I see the efficiency issue has been addressed. I'm attaching a patch to the leak.

Changed 9 years ago by jinmei

comment:34 follow-up: Changed 9 years ago by jinmei

  • Owner changed from jinmei to hanfeng

The latest code looks quite good, but I have still some substantial
comments. See below.

I've made some minor editorial fixes to the branch (r3788).

Documentation now looks much better, although I think we need some
substantial cleanups and additions. But for now let's focus on the
code.

namespace helper

  • this will cause duplicate definitions of the functions defined in the namespace, resulting in a linker error once more than one .cc includes this header file.
  • one possible approach would be to make these functions inline, and/or to only give declarations in .h and give the definitions in a separate .cc file.
  • in either case, I'm not very comfortable to make these functions publicly accessible. putting them inside a separate namespace just obscures the existence and location of these essentially private functions. It doesn't really prohibit user apps from using them. As a compromise, however, we limit the use of this header file for our internal purposes only (and document so), and keep "hiding" them by not installing the header file in a public space. I'm okay with that approach for the purpose of this ticket.

RBTreeColor

  • my previous comment hasn't been addressed:
    • RBTreeColor: a minor point, but why does it start with 1?

RBNode class definition

  • shouldn't we make constructors private? (we previously did, and I think it made sense)
  • type of data_: I don't think it works well. getting a pointer from a possibly different module and freeing it in the callee (=rbtree) is a bad practice, because the caller and callee may use different new/delete. My suggestion is to impose requirement to T:
    • it's assignable and copy constructible (and should be exception free)
    • comparable to NULL This effectively means T must be a pointer(-like) type, and if it's a bare pointer the caller must be responsible for freeing it. (so, again, effectively it should be a smart pointer type)

RBTree class definition

  • this comment is not really correct any more because findHelper() isn't called recursively:
        /// Each public function has related recursive helper function
    
  • why did you remove const from 'up'? This essentially changed two things: changing the type for 'up' from RBTree to RBNode, and removing the const. I see the reason for the former. I don't for the latter.
  • Is getNameCount() (and name_count_) really necessary? After all, the notion of number of "explicitly added names" is moot, because our interest is whether a name has data or not, rather than how it appears in the tree. I suggest we remove name_count_ and getNameCount().

RBTree<T>::findHelper

  • Note: up_node is currently unused. I'm okay with keeping it for now though because we'll probably need it soon. but please leave comments about that.
  • This use of 'using namespace' may be okay, but I'd like to be conservative and avoid taking a risk of using it in a header file. in any case note that we cannot use a namespace here (see above).
  • this assumption doesn't hold:
                    if (node->isEmpty()) {
                        assert(node->down_ != NULL);
    
    Consider the following simple case:
        RBTree<int> rbtree;
        rbtree.insert(Name("a"), &rbtnode);
        rbtree.find(Name("b.a"), &rbtnode);
    
    AFAICS we can (should) simply merge empty/non empty cases.
  • BTW, maybe shouldn't we use null node also for the down pointer? Then this code could be simplified
                        if (node->down_ != NULL) {
                            node = node->down_;
                        } else {
                            break;
                        }
    
    to this:
                        node = node->down_;
    

RBTree<T>::insert

  • related to whether we need name_count_ (see above), I don't see the benefit of changing the behavior in the "already exist" case based on whether the node is "empty" or not:
                if (current->isEmpty()) {
                    ++name_count_;
                    return (SUCCEED);
                } else {
                    return (ALREADYEXIST);
                }
    
    I'd drop the notion of name_count_ and merge these cases:
                return (ALREADYEXIST);
    
    IMO this is much simpler and makes the code much robust (for cases like we explicitly inserted a node and then set/reset the data).
  • the purpose of using auto_ptr is probably not clear. please add comments on why need to do this.

nodeFission

  • down_node initialization could be simplified (and more efficient):
        std::auto_ptr<RBNode<T> > down_node(new RBNode<T>(sub_name));
    
  • do we need to do RBNode::swap()? After swapping, 3 of the six member variables of node and down_node are replaced anyway, and we could do the same thing with the same number of lines without using RBNode::swap():
        std::swap(node.data_, down_node->data_);
        down_node->down_ = node.down_;
        node.name_ = base_name;
        node.down_ = down_node.get();
    
  • It seems we could reuse the most of the code logic of nodeFission() for the subdomain + current->down==NULL case at the cost of making it a bit more complicated. specifically, we could modify it to:
    template <typename T>
    RBNode<T>*
    RBTree<T>::nodeFission(RBNode<T>& node,
                           const isc::dns::Name& down_name,
                           const isc::dns::Name& base_name)
    {
        using namespace helper;
        const isc::dns::Name sub_name = down_name - base_name;
        std::auto_ptr<RBNode<T> > down_node(new RBNode<T>(sub_name));
        if (&node.name_ == &down_name) {
            std::swap(node.data_, down_node->data_);
            node.name_ = base_name;
        }
        down_node->down_ = node.down_;
        node.down_ = down_node.get();
        //root node of sub tree, the initial color is BLACK
        down_node->color_ = BLACK;
        ++node_count_;
        return (down_node.release());
    }
    
    then we can reuse it for the other case as follows:
                            RBNode<T>* new_down =
                                nodeFission(*current, name, current->name_);
                            if (new_node != NULL) {
                                *new_node = new_down;
                            }
                            ++name_count_; // note: I think we don't need this
    

dumpTree()

  • one of my previous currents still stand:
    • "tree has node XX" doesn't parse well. I'd rephrase it "tree has XX node(s)"

dumpTreeHelper()

  • one of my previous currents still stand:
    • at the end of the function, it seems we can avoid the if-else by handling null node case at the beginning of the function: [...]

set_get_Data test

  • might better be named "setGetData" based on coding guideline.

insertNames test

  • the beginning comment is not true any more "a node is considered..." In fact, this test succeeds:
        EXPECT_EQ(RBTree<int>::SUCCEED, rbtree.insert(Name("d.e.f"), &rbtnode));
        // re-add after "explicitly inserted"
        EXPECT_EQ(RBTree<int>::SUCCEED, rbtree.insert(Name("d.e.f"), &rbtnode));
    
    see also a relevant comment about when to return SUCCEED and when ALREADYEXIST
  • same note applies to some other comments

comment:35 in reply to: ↑ 34 ; follow-up: Changed 9 years ago by hanfeng

  • Owner changed from hanfeng to jinmei

Replying to jinmei:

namespace helper

  • this will cause duplicate definitions of the functions defined in the namespace, resulting in a linker error once more than one .cc

indent has been moved to RBTree has a static helper function, name operator plus is made inline

RBTreeColor

  • my previous comment hasn't been addressed:
    • RBTreeColor: a minor point, but why does it start with 1?

It's quite personal prefer, no reason :)

RBNode class definition

  • shouldn't we make constructors private? (we previously did, and I think it made sense)

because deconstructor will be called by auto_ptr, so it has to be public, so I move the
constructor into public, I have move them back to private and give comment

  • type of data_: I don't think it works well. getting a pointer from a possibly different module and freeing it in the callee (=rbtree) is a bad practice, because the caller and callee may use different new/delete. My suggestion is to impose requirement to T:
    • it's assignable and copy constructible (and should be exception free)
    • comparable to NULL This effectively means T must be a pointer(-like) type, and if it's a bare pointer the caller must be responsible for freeing it. (so, again, effectively it should be a smart pointer type)

I quite agree it's bad to split the alloc and dealloc into different class. I think a better
way is let end user provider deconstruction functor, since boost::functor isn't allowed to use
now, so i haven't figure out a clean way to do it. I will keep the simplest design now, and think about it. also current design make some problem to smart point as template parameter

RBTree class definition

  • this comment is not really correct any more because findHelper() isn't called recursively:

invalid comment is removed

  • why did you remove const from 'up'? This essentially changed two things: changing the type for 'up' from RBTree to RBNode, and removing the const. I see the reason for the former. I don't for the latter.

up node has been add const

  • Is getNameCount() (and name_count_) really necessary? After all, the notion of number of "explicitly added names" is moot, because our interest is wheth...

getNameCount is removed

RBTree<T>::findHelper

  • this assumption doesn't hold:
                    if (node->isEmpty()) {
                        assert(node->down_ != NULL);
    

yes the assumption isn't correct, I have fixed it

AFAICS we can (should) simply merge empty/non empty cases.

  • BTW, maybe shouldn't we use null node also for the down pointer?

Yes, make the default value of down pointer as NULLNODE, make the code more consistent and
simple, I have modified accordingly

nodeFission

  • down_node initialization could be simplified (and more efficient):
        std::auto_ptr<RBNode<T> > down_node(new RBNode<T>(sub_name));
    
  • do we need to do RBNode::swap()? After swapping, 3 of the six member variables of node and down_node are replaced anyway, and we could do the same thing with the same number of lines without using

The way you proposed is more efficient, i have modified. But I cann't reuse the nodeFission
logic to do new down node creation, since there is no fission at all. Because use NULLNODE as
the down node default value, the creation new down node in insert is removed please have a check

dumpTree()

  • one of my previous currents still stand:
    • "tree has node XX" doesn't parse well. I'd rephrase it "tree has XX node(s)"

Modified accordingly

dumpTreeHelper()

  • one of my previous currents still stand:
    • at the end of the function, it seems we can avoid the if-else by handling null node case at the beginning of the function: [...]

Modified accordingly

set_get_Data test

  • might better be named "setGetData" based on coding guideline.

Modified accordingly

insertNames test

  • the beginning comment is not true any more "a node is considered..." In fact, this test succeeds:
        EXPECT_EQ(RBTree<int>::SUCCEED, rbtree.insert(Name("d.e.f"), &rbtnode));
        // re-add after "explicitly inserted"
        EXPECT_EQ(RBTree<int>::SUCCEED, rbtree.insert(Name("d.e.f"), &rbtnode));
    
    see also a relevant comment about when to return SUCCEED and when ALREADYEXIST
  • same note applies to some other comments

Modified accordingly

comment:36 in reply to: ↑ 35 ; follow-up: Changed 9 years ago by jinmei

  • Owner changed from jinmei to hanfeng

First, the previous comment about test coverage doesn't seem to be
addressed:
http://bind10.isc.org/ticket/397#comment:30

namespace helper

  • this will cause duplicate definitions of the functions defined in the namespace, resulting in a linker error once more than one .cc

indent has been moved to RBTree has a static helper function, name
operator plus is made inline

Okay, but leave a comment on indent() that the only reason for having
it as a member function (even though it doesn't have to get access to
the private members) is to hide a helper function.

RBTreeColor

  • my previous comment hasn't been addressed:
    • RBTreeColor: a minor point, but why does it start with 1?

It's quite personal prefer, no reason :)

Oops, I referenced the wrong previous comment. I intended this one:

  • RBTreeColor: it seems this can (should) be non public
  • type of data_: I don't think it works well. getting a pointer from

[snip]

I quite agree it's bad to split the alloc and dealloc into different
class. I think a better
way is let end user provider deconstruction functor, since boost::functor
isn't allowed to use
now, so i haven't figure out a clean way to do it. I will keep the
simplest design now, and think about it. also current design make some
problem to smart point as template parameter

Hmm, I don't see why we cannot use smart pointers (of course we
couldn't call "delete" for example but addressing such things should
be quite trivial).

RBTree class definition

  • RBTree<T>::findHelper
  • this assumption doesn't hold:
                if (node->isEmpty()) {
                    assert(node->down_ != NULL);

yes the assumption isn't correct, I have fixed it

Okay, and I'll refactor the resulting code a little bit:

                if (!node->isEmpty()) {
                    ret = RBTree<T>::PARTIALMATCH;
                    *target = node;
                }
                node = node->down_;

nodeFission

  • down_node initialization could be simplified (and more efficient):

This point isn't addressed.

    std::auto_ptr<RBNode<T> > down_node(new RBNode<T>(sub_name));
  • do we need to do RBNode::swap()? After swapping, 3 of the six member variables of node and down_node are replaced anyway, and we could do the same thing with the same number of lines without using

The way you proposed is more efficient, i have modified. But I cann't
reuse the nodeFission
logic to do new down node creation, since there is no fission at all.
Because use NULLNODE as
the down node default value, the creation new down node in insert is
removed please have a check

There's still redundant re-initialization

    down_node->name_ = sub_name; //<-- we don't need to do this

Also, now that RBNode<T>::swap() isn't used, let's remove them (if
it's needed in future, reintroduce it at that point).

The revised logic for the subdomain case looks okay. Actually, I like
the simplified code. One nit though: as far as I can see we don't
need the auto_ptr hack at the end of the function:

    std::auto_ptr<RBNode<T> > node(new RBNode<T>(name));

(but we should probably note that we don't need to do that explicitly)

RBTree<T>::findHelper

  • this one still doesn't seem to be addressed: Note: up_node is currently unused. I'm okay with keeping it for now though because we'll probably need it soon. but please leave comments about that.

comment:37 in reply to: ↑ 36 ; follow-up: Changed 9 years ago by hanfeng

  • Owner changed from hanfeng to jinmei

Replying to jinmei:

First, the previous comment about test coverage doesn't seem to be
addressed:

Done

namespace helper
Okay, but leave a comment on indent() that the only reason for having
it as a member function (even though it doesn't have to get access to
the private members) is to hide a helper function.

comment added

RBTreeColor

  • RBTreeColor: it seems this can (should) be non public

Basically, no matter which RBTree class will be instantiated(according to the template parameter)
they all use the same color, so I put it outside RBNode and RBTree template class definition.

  • type of data_: I don't think it works well. getting a pointer from

Hmm, I don't see why we cannot use smart pointers (of course we
couldn't call "delete" for example but addressing such things should
be quite trivial).

Now use shared_ptr to save the node data

RBTree class definition
Okay, and I'll refactor the resulting code a little bit:

                if (!node->isEmpty()) {
                    ret = RBTree<T>::PARTIALMATCH;
                    *target = node;
                }
                node = node->down_;

Modified accordingly.

nodeFission

  • down_node initialization could be simplified (and more efficient):

There's still redundant re-initialization

Modified accordingly

Also, now that RBNode<T>::swap() isn't used, let's remove them (if
it's needed in future, reintroduce it at that point).

swap is removed

The revised logic for the subdomain case looks okay. Actually, I like
the simplified code. One nit though: as far as I can see we don't
need the auto_ptr hack at the end of the function:

I will keep it. actually, start from memory allocation to
the end of the function call, we are not sure whether there will be exception raised

RBTree<T>::findHelper

Comment is added.

comment:38 Changed 9 years ago by jinmei

  • Owner changed from jinmei to hanfeng

The latest code doesn't compile:

Making all in datasrc
Making all in .
make[5]: *** No rule to make target `rbtree.cc', needed by `rbtree.lo'.  Stop.

If you really introduced rbtree.cc, you probably forgot to perform svn add.

comment:39 in reply to: ↑ 37 Changed 9 years ago by jinmei

Replying to hanfeng:

Hmm, it compiles without rbtree.cc. Maybe the addition to Makefile.am was a typo?

I've added some more suggested editorial changes to comments (r3852). Please check.

RBTreeColor

  • RBTreeColor: it seems this can (should) be non public

Basically, no matter which RBTree class will be instantiated(according to the template parameter)
they all use the same color, so I put it outside RBNode and RBTree template class definition.

This logic doesn't make sense to me. First, whether it's in a class or not, things that are not supposed to be seen in public shouldn't be published in general. Second, with this logic, any (static) class member variable or enum of a templated class would have to be defined outside of the class unless they depend on a templated parameter.

  • type of data_: I don't think it works well. getting a pointer from

Hmm, I don't see why we cannot use smart pointers (of course we
couldn't call "delete" for example but addressing such things should
be quite trivial).

Now use shared_ptr to save the node data

The bare pointer version of setData() essentially has the same problem of the original design and still makes me nervous. I'd suggest we begin with a minimum set of interfaces (i.e. only having the shared pointer version of setData() and modify the test code accordingly) and extend it in a safer way if and when we see the need for it.

The revised logic for the subdomain case looks okay. Actually, I like
the simplified code. One nit though: as far as I can see we don't
need the auto_ptr hack at the end of the function:

I will keep it. actually, start from memory allocation to
the end of the function call, we are not sure whether there will be exception raised

Okay, but please clarify that in a comment. Otherwise someone may "optimize" it later without understanding the intent.

comment:40 follow-up: Changed 9 years ago by hanfeng

  • Owner changed from hanfeng to jinmei

Color definition is moved into RBNode
setData interface for bare pointer is removed
more comment is added for auto_ptr

comment:41 in reply to: ↑ 40 Changed 9 years ago by jinmei

Replying to hanfeng:

Color definition is moved into RBNode
setData interface for bare pointer is removed
more comment is added for auto_ptr

datasrc/Makefile.am still has rbtree.cc while that .cc isn't in the branch. So it doesn't even compile. Either Makefile.am is wrong or .cc is forgotten to be added. In case it's the former did you really compile and test it?

comment:42 Changed 9 years ago by jinmei

  • Owner changed from jinmei to hanfeng

comment:43 follow-up: Changed 9 years ago by hanfeng

  • Owner changed from hanfeng to jinmei

rbtree.cc isn't in the branch and is useless, I have modified the Makefile.am to remove it.

comment:44 in reply to: ↑ 43 ; follow-up: Changed 9 years ago by jinmei

  • Owner changed from jinmei to hanfeng

Replying to hanfeng:

rbtree.cc isn't in the branch and is useless, I have modified the Makefile.am to remove it.

Okay, I think it's now mostly ready for merge. I've made a minor fix in comment wording (r3873).

One minor suggestion: on re-reading, "NodeDataType?" sounds a bit awkward to me. I'd suggest changing this to, e.g., "NodeData?" or "NodeDataPtr?".

And some final comments: I'd like to keep it a "hidden" backend only used by in memory zone and zone table implementations for now, because we defer some important open design related questions, including

  • reliance on boost definitions
  • class name

"hidden" means other modules are not supposed to use this header file even though it's visible to them.

If we agree, please add a note about this in comments at the beginning of the file.

And one really final thing: if you want to add a changelog entry for this ticket, please provide proposed text.

comment:45 in reply to: ↑ 44 ; follow-up: Changed 9 years ago by hanfeng

  • Owner changed from hanfeng to jinmei

One minor suggestion: on re-reading, "NodeDataType?" sounds a bit awkward to me. I'd suggest changing this to, e.g., "NodeData?" or "NodeDataPtr?".

NodeDataPtr? is the final name

If we agree, please add a note about this in comments at the beginning of the file.

Note is added

And one really final thing: if you want to add a changelog entry for this ticket, please provide proposed text.

  1. [func] feng

src/lib/datasrc: Introduced two new template classes RBTree and RBNode to provide
the generic map with domain name as key and anything as the value, because of
some unresolved design issue, the new classes is only intended to be used by memory zone
and zone table.
(Trac #397, svn r3736)

If the Changelog is ok, I will merge the branch to trunk.

comment:46 in reply to: ↑ 45 Changed 9 years ago by jinmei

  • Owner changed from jinmei to hanfeng

Replying to hanfeng:

If we agree, please add a note about this in comments at the beginning of the file.

Note is added

I've made some suggested cleanups on the note and some other part of the file (r3883). Please check.

And one really final thing: if you want to add a changelog entry for this ticket, please provide proposed text.

  1. [func] feng

src/lib/datasrc: Introduced two new template classes RBTree and RBNode to provide
the generic map with domain name as key and anything as the value, because of
some unresolved design issue, the new classes is only intended to be used by memory zone
and zone table.
(Trac #397, svn r3736)

Looks okay with a few nits:

  • "... the value, because of..." should be "... the value. Because of...". BTW "the comma should be a period" seems to be a common editorial nit in your documentation. You may want to include this point in your self checklist (if any)
  • "the new classes is only..." should be "the new classes are only..."
  • svn number is the one you'll use when merging it to trunk (which is currently unknown).

With these fixes (and if the suggested committed cleanups are okay) please merge the branch to trunk.

comment:47 Changed 9 years ago by hanfeng

  • Resolution set to fixed
  • Status changed from reviewing to closed
Note: See TracTickets for help on using tickets.