Opened 9 years ago

Closed 7 years ago

#55 closed task (fixed)

comments on rrsetlist implementation

Reported by: jinmei Owned by: jinmei
Priority: medium Milestone:
Component: libdns++ Version:
Keywords: Cc:
CVSS Scoring: Parent Tickets:
Sensitive: no Defect Severity: N/A
Sub-Project: DNS Feature Depending on Ticket:
Estimated Difficulty: 0.0 Add Hours to Ticket:
Total Hours: Internal?: no

Description

I've looked at the rrsetlist.{h,cc} implementation and have some
(many?) comments. Based on discussions on design/style issues so far,
some of the comments may be controversial and may simply be a
preference/opinion matter. But I'll give them a try anyway...

As of this writing I'm referring to rev 954.

First, passing "const RRsetPtr" to a function probably does not
implement what was intended:

-RRsetList::addRRset(const RRsetPtr rrsetptr)

It's as useless as passing "const int" as a function argument.

The intention was probably an equivalent of "const RRset*", which is
ConstRRsetPtr. (I was actually confused about these two in the
initial design phase, and that's partially my fault).

As far as I can see, all occurrences of RRsetPtr in rrsetlist.* can
(and should) be ConstRRsetPtr.

(more comments to come)

Subtickets

Attachments (2)

rrsetlist.diff (4.6 KB) - added by jinmei 9 years ago.
rrsetlist.2.diff (4.5 KB) - added by jinmei 9 years ago.

Download all attachments as: .zip

Change History (24)

Changed 9 years ago by jinmei

comment:1 follow-up: Changed 9 years ago by jinmei

Second, maybe it was intended to be a short term hack, but exposing
vector<RRsetPtr>::(const_)iterator via begin() and end() is not a good
practice:

  • typedef std::vector<RRsetPtr>::iterator iterator;
  • iterator begin() { return (rrsets_.begin()); }

It effectively breaks the encapsulation that hides the implementation
detail of "rrsets_".

In general, if we want to define a STL-container like user defined
class used with STL-compatible algorithms (including BOOST_FOREACH),
we should define the corresponding iterator classes inherited from
std::iterator variants.

I'm attaching a proposed patch to "fix" the first and second comments.

Changed 9 years ago by jinmei

comment:2 Changed 9 years ago by jinmei

Replying to jinmei:

As far as I can see, all occurrences of RRsetPtr in rrsetlist.* can
(and should) be ConstRRsetPtr.

Hmm this was not really correct. I didn't realize the main user (data
source) of the RRsetList hasn't been merged to trunk. Data source
heavily uses the entries of RRsetList as mutable objects, so we
actually cannot specify many of them as ConstRRsetPtr. I think the
user side can be more careful about const-ness, but for now I've
changed the RRsetList items to mutable.

A revised patch is attached.

comment:3 Changed 9 years ago by jinmei

  • Milestone set to 03. Authoritative-only server
  • Owner changed from jinmei to each
  • Status changed from new to assigned

giving it to Evan (more comments will come)

comment:4 in reply to: ↑ 1 ; follow-up: Changed 9 years ago by each

  • Owner changed from each to jinmei

Replying to jinmei:

Second, maybe it was intended to be a short term hack, but exposing
vector<RRsetPtr>::(const_)iterator via begin() and end() is not a good
practice:

I have no strong objection to the proposed iterator change, since it doesn't affect the way RRsetList is used. However, I also don't understand why this is a bad practice. We have an object into which we place RRsetPtrs, and we iterate over that object and get RRsetPtrs back. Having a uniquely-defined iterator template for the RRsetList class gives us exactly the same interface and behavior as aliasing the iterators in the underlying object, at the cost of making rrsetlist.h more complex. I don't see a practical benefit to the duplication.

I don't see much harm either, though, so you can merge it if you like.

comment:5 in reply to: ↑ 4 Changed 9 years ago by jinmei

Replying to each:

Replying to jinmei:

Second, maybe it was intended to be a short term hack, but exposing
vector<RRsetPtr>::(const_)iterator via begin() and end() is not a good
practice:

I have no strong objection to the proposed iterator change, since it doesn't affect the way RRsetList is used. However, I also don't understand why this is a bad practice.

As I stated, it's because it effectively breaks the encapsulation.
Essentially, this makes the rrsets_ member (a vector<RRsetPtr> object)
public, and we'll lose the flexibility of changing the internal
implementation in the future.

Besides, vector::iterator is a random access iterator, which is the
most powerful one among the standard iterator family. For example, a
random access iterator allows you to do this:

vector::iterator it = v.begin();
it += 5; jump to the 5th element.

We can't do this with a weaker, simple input iterator, and the
functionality of input iterator is exactly what you wanted:

We have an object into which we place RRsetPtrs, and we iterate

over that object and get RRsetPtrs back.

Random access iterator is much more powerful than this, and once we
publish a powerful interface, we can never weaken it because someone
may depend on the powerful feature. On the other hand, we can always
begin with a minimal necessary set of features and add stronger
functionality as we see the real need for it. This is another reason
(actually this is a corollary of the first point).

Yet another reason is that I believe we should eventually make this
library more portable, by hiding explicit reference to std lib or
exception stuff because they are not generally binary compatible. So,
I've been trying to adopt the pimpl approach where its overhead is
deemed to be acceptable. A user defined iterator for a user defined
STL-like container is essential in this sense (of course, my patch is
incomplete in this sense as it exposes the vector in .h).

There are more reasons, but if you're still not convinced so far, they
won't change it, so I'll stop here, especially because you are already
okay with the change itself.

I don't see much harm either, though, so you can merge it if you like.

okay, but I actually have more fundamental comments on this
implementation. so stay tuned:-)

comment:6 Changed 9 years ago by jinmei

minor points for rrsetlist_unittest. already fixed and committed, but
logged here just for the record.

  • in generall EXPECT_EQ(a,b) is betterh than EXPECT_TRUE(a==b) because on failure we can see the vlaues of a and b (fixed)
  • iterate test: has_xxx should be explicitly initialized (fixed)
  • simplification: rdata_xxx constants doesn't have to be static const members. (they are in anonymous namespace so there's no name conflict issue) (fixed)

comment:7 follow-up: Changed 9 years ago by jinmei

Relatively more substantial issues on rrsetlist and its tests:

  • rrsetlist
    • findRRset(type/class version) I don't think it's a good idea to rely on the default parameter:

RRsetPtr findRRset(const RRType& rrtype,

const RRClass& rrclass = RRClass::IN());

Generally, I think we should use default parameters judiciously,
and in this case it seems to go too far. And, in fact, it causes
a counter intuitive behavior (I'd actually call it a "bug")
through operator[] that relies on the default parameter.
(see the commented-out randomAccess test I added)

Maybe the default parameter was introduced exactly for the
convenience of "operator[]", which can only take one argument.
I actually the usage of operator[] is not a good idea either, and
strongly suggest avoiding this overloading and the use of default
parameter. I'll discuss operator[] separately.

  • in addRRset, the following line is ugly:-) dns_throw(DuplicateRRset, ""); I'm also afraid "DuplicateRRset" is too generic.
  • if we want to have a container-like thing, it's more intuitive if find() variants provide a STL-compatible behavior IMO (i.e, on no match, return end() iterator rather than throwing an exception). That is, findRRset would return a RRsetList::iterator rather than RRsetPtr. I admit this is more or less a matter of style, though.
  • RRsetPtr findRRset(ConstRRsetPtr rrsetptr); As far as I can see this is not used anywhere (The whole tree compiles even if I commented out this member function). If it's not used please remove it. If it's really necessary, I'm not sure the "find" logic is reasonable. It basically finds by matching pointers, not by contents. Is that the intent?
  • a bikeshed point (although I think it's not that minor) I wonder whether we could find a better name than "RRsetList". IMO it's not a good practice to name objects using particular data structures such as "list", because it could mislead the user and let them assume a specific operational characteristics provided by such specific data structures. For example, by "list" I'd normally assume it accepts duplicate entries to be inserted, but this implementation actually doesn't allow that. I'd also assume insertion/deletion are O(1) operations, but in the current implementation using std::vector it's not correct. "list" is particularly confusing because it's in STL containers, so people might assume it's internally implemented using an STL list and assume the operational/performance characteristics that the std list provides.

I wish I could propose a better name, but I don't have a good
idea. I'd say "RRsets", but it's also confusing in that it's very
similar to "RRset". "SetOfRRsets" would best represent the nature
of this object (though "set" is also in STL containers), but may
sound awkward.

Hopefully we can come up with a better name, but for the year1
goal, it's certainly minor. So I'm fine with deferring this point
as a year2 fodder.

  • rrsetlist_unittest
    • findRRset: need tests for the pointer version (but I suspect we

should rather remove this method. see above)

  • findRRset, operator[]: need tests for no match case (but for operator[] see the more fundamental point above)

comment:8 in reply to: ↑ 7 Changed 9 years ago by each

I agree with most of these points.

RRsetList is called that because RRsetSet sounded weird and RRsetContainer was too long and vague. But I'm happy to change the name if you can think of a better one. It was originally just a typedef to a std::vector; turning it into a class and making it possible to index by type was a later addition (mostly done by Michael).

I would probably not have had a class parameter in findRRset() at all; my inclination is to have RRsetList enforce a rule that RRsets of only one class can be inserted--once the first one has gone in, all subsequent ones must match it or throw an exception.

comment:9 Changed 9 years ago by jinmei

another comment on RRsetList.

I guess it's designed to be non-copyable class. If so, I suggest making
the copy constructor and operator= private explicitly.

That will prevent a possible bug that you incorrectly define a function
parameter as 'RRsetList' (an object) where it should be 'RRsetList&'.

I've made this change and committed it to the trunk (rev 1032).

On the ohter hand, if the intent is to allow copying RRsetList objects
with the default version of copy ctor and operator=, please explicitly
make comments so (I'm generally trying to keep this convention for
classes I'm defining).

comment:10 Changed 9 years ago by jinmei

Now, probably a more controversial point.

I don't think the overloading of operator[] for this class makes sense.

First, as I already pointed out in the previous comment message, it
doesn't work unless we fix the type or class (although there are
programming techniques to work around this restriction). I don't
think it's a reasonable assumption for general purpose libraries like
this even though in the vast majority of the cases it would be class
IN.

Second, IMO this usage of operator[] is misleading regarding the
underlying behavior. Many programmers expect operator[] works as an
O(1) operation (in fact the corresponding unittest name "randomAccess"
suggests the tester thought it just works like an index-based array
access...even though he actually knew how it was implemented). But
this implementation of operator[] actually performs linear search, and
could be pretty expensive if it contains many elements. I've checked
how it's used in the data source implementation, and noted there can
probably be only a few elements in an RRsetList object, so, if this is
a private class hidden in the data source implementation, the worst
case performance may be justified. But now it's included in a public
library, which I hope is specifically intended to be other developers,
we can't assume that other developers use this class only in such a
restricted way. Of course, documentation helps, but the best thing is
to provide intuitive behavior that generally matches developer's
expectation.

I'd also like to note that STL containers generally don't provide
operator[] where it couldn't be implemented as an O(1) operation. The
only exception is std::map, but even for map it's guaranteed to be an
O(log(N)) operation where N is the number of elements. In particular,
std::list doesn't have operator[] interestingly.

Google's coding guideline also strongly discourages the use of
operator overloading, seemingly for this (among other) reason:
http://google-styleguide.googlecode.com/svn/trunk/cppguide.xml#Operator_Overloading
(not to argue we should follow their style, though, and I myself do
not agree everything stated there)

I guess RRsetList::operator[] is provided as an analogy from python or
ruby. From a syntax point of view, that may be more intuitive than
the direct use of findRRset(), especially when performance is not
necessarily a top-priority like in these script languages. As I
stated above, however, I suspect the expectation is not so obvious in
the case of C++, and it's IMO even misleading.

I believe I'm providing a convincing argument, but as I already
admitted, this is probably still controversial and may be a matter of
style or opinion. If you (or Michael) still don't agree, I won't
fight against on this point. We should focus on other things that are
more essential for our year 1 release. But if you think this is a
reasonable argument, I suggest dropping operator[] and replace it with
findRRset():
from

if (!target[rt]) {

to

if (target.findRRset(rt, rrclass) == NULL) {

Of course, if we keep operator[], please fix the current buggy
behavior. And, there's one last comment in that case. The standard
expectation is that operator[] returns a reference, not the content
object itself, so that it can be used as a "lvalue":

target[rt] = new_rrsetptr;

unless there's specific reason not to follow the standard convention,
it would be generally better to follow it. Otherwise, it should be
documented explicitly with why it's prohibited.

comment:11 Changed 9 years ago by jinmei

And, finally, even more controversial comment (note, however: I
wouldn't necessarily require this to be addressed for the year 1
release):

I did't understand why we need this user defined container-like class.

I've looked into the data source code to see its usage, but at least
from the current usage I didn't understand why we needed to have a
specialized container.

The heaviest user of the RRsetList is Sqlite3DataSrc::findRecords(),
(which is quite complicated so I may miss something) but in many cases
we only need to store one RRset in effect. RRSIG may also be stored,
but since it's actually tagged into the main RRset that has the sig,
so the stored RRSIG doesn't actually seem to be used.

The only tricky case I can see in the code is type ANY query, but if
it's the only real usage of RRsetList, I suspect we can handle the
special case without having such a large device of user-defined
separate container class.

Next, if we really need some container-like thing, I didn't understand
why std::map wasn't sufficient (where the key is std::pair of rrtype
(and rrclass), and the value is RRsetPtr). Assuming the currently
unused pointer-based matching (see my other comments) is actually
unnecessary, std::map seems to me to provide everything that's
required here. std::map has some additional advantage over the
current vector-based implementation: find works as an O(log(N))
operation. If the ANY query really requires container like storage,
and if the database has many many different RRtypes of records, this
difference in efficiency can matter when handling an ANY query for
such a name.

Anyway, as long as it's currently working, this is probably not worth
discussing for the year-1 release. But I strongly propose we revisit
the design as a year 2 fodder.

comment:12 Changed 9 years ago by jinmei

  • Owner changed from jinmei to each

I've completed my review on RRsetList.

Admittedly some of the points were too detailed as we are running out
of time. For the remaining stuff maybe I should limit myself to
sanity check level review...

Anyway, giving the ticket back to the code author.

comment:13 Changed 9 years ago by shane

Evan, what's the status here? Should I push this into backlog or maybe the Y2 3rd release?

comment:14 Changed 9 years ago by shane

  • Milestone changed from 05. 3rd Incremental Release: Serious Secondary to 06. 4th Incremental Release

comment:15 Changed 9 years ago by jelte

Don't think i've seen it in the comments above, so I have another one; findRRset only matches on type and class, but not name, and since the addition function does a check, you cannot have an RRsetList with the name type/class but different names. This does not sound like a property a list of rrsets would have to me, given the definition of an RRset.

(btw, in the place i need it i could just as well use a vector right now, though for now in my branch i have made it to match name/class/type)

comment:16 follow-up: Changed 9 years ago by each

  • Status changed from assigned to accepted

I have been putting this at low priority because the code we have serves our needs for the moment, but I do agree with many of Jinmei's comments above. As time permits I've been working on it, and I'm close to having some code for review.

We do need a data structure here, but the one we have was a quick hack (by a C++ newbie at that) to fit the immediate needs of the code we were writing; it wasn't designed, as such.

So, going back to basics: The purpose of this class is short-term storage of a closely related collection of RRsets, which are the result of a single QueryTask or other basic data source operation, e.g.:

  • <qname>/A and/or <qname>/AAAA for glue query tasks
  • <qname>/NS, <qname>/DS and/or <qname>/DNAME for referral query tasks
  • <qname>/<qtype> or <qname>/CNAME for normal query tasks (including wildcard)
  • <targetname>/<targettype> (or <targetname>/CNAME) for CNAME chasing
  • <previousname>/NSEC for covering-NSEC lookups
  • <hashname>/NSEC3 for covering-NSEC3 lookups
  • <synthname>/CNAME for DNAME answers

There is never a reason for a single RRsetList to contain data from more than one RRClass--though there are corner cases when you might not know in advance what the class will be. I believe this is also true for names: you don't know always know what the name will be until the lookup is complete (as in the NSEC and NSEC3 cases, particularly), but you should never have an RRsetList with more than one name.

Essentially, my proposal is to turn this class into a wrapper for std::map<RRType,RRsetPtr> instead of std::vector<RRsetPtr>. (It could even be a derived class from the map, but for various reasons I think a wrapper would be cleaner and more portable.)

I would also like to change the name, as RRsetList was always a misnomer (it isn't a list), but I haven't settled on a better name yet, and I think it's best to table that until after the design has been settled.

In any case, its API would differ from std::map in a few major ways:

  • It only accepts RRsets of one RRClass, defined in advance. If the initial class is ANY, then it accepts any _other_ RRClass for the first insertion, but then requires all subsequent insertions to match the first one.
  • It only accepts RRsets of one name, which may optionally be defined in advance. If not defined, then it accepts any name for the first insertion but requires all subsequent insertions to match the first one.
  • The iterator returns values only--that is, RRsetPtr--rather than key/value pairs as a std::map ordinarily does. (There's no particular architectural reason for this, but it simplifies the code in several places, and iterating over the keys isn't necessary.)

I'll submit some code for discussion shortly. (Not "for review"; we're not that far along yet. I'd like to get some buy-in on the general design decisions, and then write a lot of documentation, and then ask for review.)

comment:17 in reply to: ↑ 16 ; follow-up: Changed 9 years ago by each

Replying to each:

I'll submit some code for discussion shortly. (Not "for review"; we're not that far along yet. I'd like to get some buy-in on the general design decisions, and then write a lot of documentation, and then ask for review.)

Please see r2418.

comment:18 in reply to: ↑ 17 Changed 9 years ago by jinmei

  • billable set to 0
  • Estimated Difficulty set to 0.0
  • Internal? unset

Replying to each:

I'll submit some code for discussion shortly. (Not "for review"; we're not that far along yet. I'd like to get some buy-in on the general design decisions, and then write a lot of documentation, and then ask for review.)

Please see r2418.

First impression (not necessarily insisting it, but as requested):

Clarifying the role of what was called RRsetList is good. based on
the clarification, I now have a slightly (or perhaps substantially)
different opinion than that I originally stated.

According to the clarification, the requested feature is quite limited,
while "RRsetList" is trying to be quite generic. I suspect the scope
mismatch between the need and the solution is the essential cause of
the various problems I pointed out in this ticket.

Of course, sometimes we need to provide generic interface, and need to
address difficulties that are often inevitable with a generic class
(such as usage clarity or performance consideration in general case).
But, in our case, we don't need this generality; no other part than
the data source code uses RRsetList. And let's not say "but someone
may need this feature for a different purpose in the future". We
should know in many cases there won't be such "someone". It should
also be noted that if one wants to have a "set of RRsets" for some
specific needs, one can generally use std::vector<RRset>,
std::map<RRset>, etc. Overall, the current implementation of
RRsetList and attempt of fixing it as a general class doesn't seem to
be worthwhile to me.

With this view, using std::map internally is probably not a good idea
(I know it was me who first mentioned it, but at that time I wasn't so
sure if this class must be generic), because it's generally expensive
with smaller numbers of elements. In particular, inserting an element
to std::map inevitably involves memory allocation (we cannot
preallocate/reuse the room for it), it's probably not suitable in a
performance sensitive path like the query look code path.

I'm not so sure what it should actually look like, but my first
impression is that what we need is a different types of RRset "tuples"
based on your clarification:

  • Address tuples: (A, AAAA)
  • Referral tuples: (NS, DS, DNAME)
  • etc.

where some elements of tuples can be empty.

I'd rather consider how we reasonably model this concept, than trying
to fix the generic "RRsetList". I'd also move the resulting class (if
it's a class) to the data source module.

p.s. in any case, inheriting from std::map wouldn't be an option.
(In my understanding) it's a generally discouraged practice.

comment:19 Changed 8 years ago by stephen

  • Owner changed from each to UnAssigned
  • Status changed from accepted to assigned

comment:20 follow-up: Changed 7 years ago by shane

  • Defect Severity set to N/A
  • Owner changed from UnAssigned to jinmei
  • Sub-Project set to DNS

What's the status of this ticket? Does it actually apply to modern BIND 10? Is there work we need to do?

Passing to Jinmei as he was the instigator here.

comment:21 in reply to: ↑ 20 Changed 7 years ago by jinmei

Replying to shane:

What's the status of this ticket? Does it actually apply to modern BIND 10? Is there work we need to do?

Passing to Jinmei as he was the instigator here.

Another issue that is quite specific to the current (old) data
source. We can get rid of this class once we migrate to the new data
source implementation/API.

comment:22 Changed 7 years ago by shane

  • Resolution set to fixed
  • Status changed from assigned to closed

Okay, closing with the expectation that it will be resolved shortly by other work.

Note: See TracTickets for help on using tickets.