Opened 9 years ago

Closed 9 years ago

Last modified 9 years ago

#404 closed task (complete)

serialize RDATA for in memory data source

Reported by: jinmei Owned by: vorner
Priority: medium Milestone: Sprint-20110503
Component: data source Version:
Keywords: Cc:
CVSS Scoring: Parent Tickets:
Sensitive: no Defect Severity: N/A
Sub-Project: DNS Feature Depending on Ticket:
Estimated Difficulty: 3.0 Add Hours to Ticket: 0
Total Hours: 0 Internal?: no

Description

This is (incorrectly) categorized as a sub task of "handle zone transfer" in the A-team sprint.

I'd break down this task a bit further: rather than extending the idea to RRset, I'll focus on an in-memory representation of RDATA.

Subtickets

Change History (30)

comment:1 in reply to: ↑ description Changed 9 years ago by jinmei

  • Owner changed from jinmei to UnAssigned
  • Status changed from new to reviewing

branches/trac404 is ready for review.

For this task I've extended the MessageRenderer? class so that we can have different versions of it from an abstract base class. For this particular purpose we could do it in a different way, but I'm pretty sure we'll eventually need another derived class for high performance authoritative server implementation. So this should be considered an initial step toward that direction. At this moment, not all existing users of this class have been modified to minimize changes. This will be part of a next step task.

The number of affected files is quite large as such, but most of them are trivial replacement due to the introduction of the abstract base class. The eseential part of this branch is rdatafields.h and rdatafields.cc in src/lib/dns and their tests.

This is proposed changelog entry:

  115.	[func]*		jinmei
	src/lib/dns: Introduced a new RdataFields class to represent RDATA
	in more efficient way.  This is intended to be used for
	performance improvement of b10-auth, and is not expected to be
	used by normal applications.  At this moment this class isn't used
	anywhere in the package yet.  This change also introduced an
	extension to the MessageRenderer class: it is now a derived base
	class of an abstract base class so that we can customize the
	rendering implementation in a polymorphic way.  It requires some
	trivial interface changes to the existing API.
	(Trac #404, svn rTBD)

comment:2 Changed 9 years ago by jinmei

Additional pre-review comment:

I've added a benchmark test to measure rendering performance of basic Rdata and RdataFields.

Surprisingly to me, RdataFields was much slower than Rdata. I first thought it was due to the inefficient logic for rendering name fields, but it was also the case for test data that only contained A and AAAA RDATA. I've pasted some results at the end of this note.

Chasing it a little deeper, I've found that if I omit the "else" part of !RdataFields::toWire() the RdataFields version ran much faster (of course, for data that doesn't contain names).

        } else {
            InputBuffer buffer(data_ + offset, fields_[i].len);
            renderer.writeName(Name(buffer),
                               fields_[i].type == COMPRESSIBLE_NAME ?
                               true : false);
        }

So, compiling this part made it much slower even if it was not used run time. Maybe this is related to the code size and whether it fits in a cache.

Anyway, rendering performance is not the primary target of this work, so I think it's okay for now. We should revisit this issue when we are in performance improvement phase.

Test results (g++)

  Input File: benchmarkdata/rdatarender_data_com
Benchmark for rendering with standard Rdata
Performed 280000 iterations in 0.099544s (2812826.49ips)
Benchmark for rendering with RdataFields
Performed 280000 iterations in 0.403323s (694232.66ips)

  Input File: benchmarkdata/rdatarender_data_org
Benchmark for rendering with standard Rdata
Performed 180000 iterations in 0.073044s (2464268.11ips)
Benchmark for rendering with RdataFields
Performed 180000 iterations in 0.213702s (842294.41ips)

  Input File: benchmarkdata/rdatarender_data_nxdomain
Benchmark for rendering with standard Rdata
Performed 60000 iterations in 0.046006s (1304177.72ips)
Benchmark for rendering with RdataFields
Performed 60000 iterations in 0.127334s (471201.72ips)

Test results (clang++)

  Input File: benchmarkdata/rdatarender_data_com
Benchmark for rendering with standard Rdata
Performed 280000 iterations in 0.090705s (3086930.16ips)
Benchmark for rendering with RdataFields
Performed 280000 iterations in 0.406702s (688464.77ips)

  Input File: benchmarkdata/rdatarender_data_org
Benchmark for rendering with standard Rdata
Performed 180000 iterations in 0.063396s (2839295.85ips)
Benchmark for rendering with RdataFields
Performed 180000 iterations in 0.201228s (894507.72ips)

  Input File: benchmarkdata/rdatarender_data_nxdomain
Benchmark for rendering with standard Rdata
Performed 60000 iterations in 0.044533s (1347315.47ips)
Benchmark for rendering with RdataFields
Performed 60000 iterations in 0.135239s (443659.00ips)

comment:3 Changed 9 years ago by jinmei

Additional notes based on jabber chat with Jelte:

  • we may want to use a kind of RdataFields within specific derived classes of Rdata directly. i.e., instead of listing uintN_t/Name/string members, each Rdata variant class contains a single RdataFields object, and its getter method retrieves the corresponding value from the RdataFields data. Accessing individual RDATA fields is likely to be a rare operations, so optimizing other jobs at the cost of heavier accessing method may make more sense.
  • we may want to provide an interface to convert an RdataFields object to corresponding Rdata. (hmm, to make it possible RdataFields or the interface needs to know its RR class and type, so this may not be as trivial as it may look)

In any case, I think these should go to future extensions of this task.

comment:4 Changed 9 years ago by jinmei

(in case it's not clear, this ticket still needs formal review)

comment:5 follow-up: Changed 9 years ago by vorner

  • Owner changed from UnAssigned to vorner

I'll take the review, I haven't done any in some time (I didn't update the sprint page, there seems to be only the old one right now).

comment:6 in reply to: ↑ 5 Changed 9 years ago by jinmei

Replying to vorner:

I'll take the review, I haven't done any in some time (I didn't update the sprint page, there seems to be only the old one right now).

Actually this turned out to be non urgent. This can (or even should) wait until post next tar ball release.

comment:7 Changed 9 years ago by vorner

Ok, I'm running out of time today anyway. I'll see tomorrow what I can look into.

comment:8 follow-up: Changed 9 years ago by vorner

  • Owner changed from vorner to jinmei

First, I want to say I like the approach. But large changeset deserves larger amount of comments ;-)

In the benchmark, using vector<uint8> as a buffer and using casted pointer to &buffer[0] seems kludge. First of, the fact that it should work is not so apparent from the vector interface, one needs to know the guarantee that the elements must be stored in continuous array of memory. Second, sizeof returns the size in chars and the standard guarantees aliasing compatibility with unsigned char pointers only, not with uint8 *. So, the compiler is free to deduce these two pointers can never point to the same memory and the cast breaks that. So, if you want to use vector and not bare pointer to array, could you please use unsigned chars?

In the AbstractMessageRenderer?, all the writeUint8 and alike methods are really small inside and I can't really imagine any way they would do anything else than just write the data into some buffer (not counting writeName). The fact you made them virtual, I guess you made the call to them much more costly than the code inside them and made them impossible to inline. Isn't there a way to have only one implementation for them? The dividing into fields could be done in other functions, like writeName.

Is std::vector really not portable? We use it in headers all the time. Is the hiding in pimpl really worth the prize of the overhead and more complicated implementation?

RdataFieldComposer? has many small methods. Is it worth writing them into the class header and then putting them outside? Isn't it nice to put their bodies inside the class and hint the compiler these methods are worth inlining (though the compiler does mostly what it wants today anyway).

The RdataFields::toWire has a weard statement: condition ? true : false. Wouldn't condition (without the ?) be enough?

Looking at the FieldSpec? ‒ is it probable we use all 16 bits of the length field? Surely not for a name and if it is possible to have such long binary fields (I doubt there are any in the wild), we could split it into more consecutive fields. If we spared 2 bits from the length, we could put the type of field to the same 2-byte word (either by manually masking & shifting) or by field width specification (but I fear it's feature of C99). Since now it takes 3 bytes and another byte will be padding, because the uint16 can't live on odd address, this would make the size half.

comment:9 Changed 9 years ago by shane

So... what's the status of this work?

comment:10 Changed 9 years ago by shane

  • Milestone changed from y2 12 month milestone to feature backlog item

comment:11 Changed 9 years ago by jinmei

  • Owner changed from jinmei to UnAssigned

comment:12 Changed 9 years ago by stephen

  • Milestone feature backlog item deleted

Milestone feature backlog item deleted

comment:13 Changed 9 years ago by shane

  • Milestone set to Sprint-20110405

comment:14 Changed 9 years ago by vorner

  • Owner changed from UnAssigned to vorner
  • Status changed from reviewing to accepted

After asking on jabber, I'll take and implement the suggestions, passing it to someone else for review, so this moves forward.

comment:15 follow-up: Changed 9 years ago by vorner

  • Owner changed from vorner to UnAssigned
  • Status changed from accepted to reviewing

Hello

I implemented most of the ideas. I left out the one with squeezing each FieldSpec? into 2 bytes, as premature optimisation. We might want it in future, so I just put a TODO there. I did implement the removal of virtual methods. For one, it's less code and, for second, if they are already in one place, there's no need for virtual functions (at last for now).

Can someone have a look?

comment:16 Changed 9 years ago by jinmei

  • Owner changed from UnAssigned to jinmei
  • Status changed from reviewing to accepted

comment:17 in reply to: ↑ 8 Changed 9 years ago by jinmei

(I'll first provide some responses to the original review)

Replying to vorner:

In the benchmark, using vector<uint8> as a buffer and using casted pointer to &buffer[0] seems kludge. First of, the fact that it should work is not so apparent from the vector interface, one needs to know the guarantee that the elements must be stored in continuous array of memory.

I admit this syntax may look tricky. But at the same time I thought this was a quite well-known and commoly used idiom for decent level of C++/STL programmers, and I would say we should allow this usage in our code when need conversion between the STL world and lower-level pure C interfaces. I noticed you made explict comments about this point in your revised branch, to which I wouldn't necessarily make an objection.

Second, sizeof returns the size in chars and the standard guarantees aliasing compatibility with unsigned char pointers only, not with uint8 *. So, the compiler is free to deduce these two pointers can never point to the same memory and the cast breaks that. So, if you want to use vector and not bare pointer to array, could you please use unsigned chars?

I'm okay with changing uint8 to char (noticing you didn't change that part). Frankly, though, if we worry about possible incompatibility between uint8 and unsigned char, I'm afraid we'll find much more serious cases in our main code (rather than in a benchmark tool).

In the AbstractMessageRenderer?, all the writeUint8 and alike methods are really small inside and I can't really imagine any way they would do anything else than just write the data into some buffer (not counting writeName). The fact you made them virtual, I guess you made the call to them much more costly than the code inside them and made them impossible to inline. Isn't there a way to have only one implementation for them? The dividing into fields could be done in other functions, like writeName.

Personally I'd think it's premature to worry about the overhead of the
virtual functions at this moment (I wouldn't deny the possibility of
finding it to be a critical performance bottleneck when we reach the
stage of optimization). But I also see most of the writeXXX() methods
would be a duplicate of the base class version, so, as a conclusion
I'm okay with defining them non virtual at least until we see the need
for making them virtual. I have some other comments regarding this
point for the actual patch, which I'll explain separately.

Is std::vector really not portable? We use it in headers all the time. Is the hiding in pimpl really worth the prize of the overhead and more complicated implementation?

It's quite likely to be non portable (or non compatible) at the binary
level when we change the STL implementation (even the debug/normal
modes of STLPort are incompatible; for example, we'd have to build and
use gtest with the same mode of STLPort). Basically, I want to
minimize the risk of breaking the compatibility unless there's a
strong need to expose it. I admit we are already breaking this level
of compatibility in various places of our code, but if we use the fact
as an excuse to break it further, it will be more and more difficult
to restore the compatibility when if and we want to do that.

Performance reasons can be a "strong need" in future, so, once we are
in the optimization phase and find the overhead of indirection is a
critical bottleneck, I'm okay with exposing the vector definition.
Until then, I'd rather hide it.

RdataFieldComposer? has many small methods. Is it worth writing them into the class header and then putting them outside? Isn't it nice to put their bodies inside the class and hint the compiler these methods are worth inlining (though the compiler does mostly what it wants today anyway).

I suspect this comment doesn't apply to your own patch, which
eliminated many of the RdataFieldComposer::writeXXX methods, so I'll
skip this point.

The RdataFields::toWire has a weard statement: condition ? true : false. Wouldn't condition (without the ?) be enough?

Ah, this seems to me a common bad practice of mine. I'm not sure why
I tend to make this type of expression unnecessarily complicated, but,
you're right; we don't (shouldn't) need ?:.

Looking at the FieldSpec? ‒ is it probable we use all 16 bits of the length field? Surely not for a name and if it is possible to have such long binary fields (I doubt there are any in the wild), we could split it into more consecutive fields. If we spared 2 bits from the length, we could put the type of field to the same 2-byte word (either by manually masking & shifting) or by field width specification (but I fear it's feature of C99). Since now it takes 3 bytes and another byte will be padding, because the uint16 can't live on odd address, this would make the size half.

I agree it makes sense to consider compressing the fields further, but
that would probably be a next step, probably after measuring the
actual memory footprint with real world example (big) zones.

comment:18 in reply to: ↑ 15 ; follow-up: Changed 9 years ago by jinmei

Replying to vorner:

I implemented most of the ideas. I left out the one with squeezing each FieldSpec? into 2 bytes, as premature optimisation. We might want it in future, so I just put a TODO there. I did implement the removal of virtual methods. For one, it's less code and, for second, if they are already in one place, there's no need for virtual functions (at last for now).

Can someone have a look?

Now the review on the branch.

messagerenderer

  • As I said in the previous comment, I'm okay with making most of the writeXXX() non virtual. But the intent wouldn't be so clear (e.g., one would wonder why only these are non virtual while writeName is virtual). So I'd describe the design intent as a doxygen comment.
  • Also as I said, I'd still hide the buffer using pimpl at the moment.
  • In general I don't like to have a non private member variable unless it's absolutely necessary (just as many C++ books suggest), and in that sense I don't like to make AbstractMessageRenderer::buffer_ protected. IMO in this case it's not "absolute necessary", and I'd make it private (whether or not via pimpl) and have derived classes access it through public writeXXX interfaces. (I'm noticing that in order to do that we'd also need some extensions to AbstractMessageRenderer?)
  • [comment] This is an off topic for this ticket, but I've been thinking it makes more sense to hide the output buffer inside the (Abstract)MessageRenderer? (again, whether or not pimpl) rather than passing it via the constructor. Previously the renderer class didn't have getData() and getDataLength(), so the caller needs to get access to the row data via the OutputBuffer? interfaces. But now the render also provides the way to get access to the raw data via its own getData()/getDataLength(), it makes the caller simpler if the renderer internally constructs the OutputBuffer? (or even some other specialized data structure). Now we are touching the renderer class, this may be a good opportunity to make this cleanup. But since it will affect many other parts of the entire BIND 10 code (we'll also need to update the python binding and its users, for example), it may better be deffered to a separate ticket. (If this change makes sense) either is okay for me.

rdatafields.h

  • I don't understand the point of changing the return type of getFieldSpecData(). Which specific alias problem does the change try to solve?
  • In any case, I'd avoid offending comments like this:
    +        // Nasty cast, because C++ authors didn't read the C specification
    +        // about pointers. We can't return void* from it, that would break
    
    It's an unfounded accusation (how could we know whether C++ authors read the specification?), and even if it's the fact, this type of comment only makes us look vulgar and doesn't help anything IMO. Let's be professional and just provide technical explanation.
  • And in any case, if the type change and cast are really necessary (which I didn't understand yet), I'd avoid using C-style cast.

minor editorial points

  • In messagerenderer.h, 'at last' should be 'at least':
    +    /// It was decided that there's no need to have this in every subclass,
    +    /// at last not now, and this reduces code size and gives compiler a better
    
  • in rdatafields.cc, typo here also? (s/arriwe/arrive/?)
        // So new data can arriwe without us knowing it, this considers all new
    

comment:19 Changed 9 years ago by jinmei

  • Owner changed from jinmei to vorner
  • Status changed from accepted to reviewing

comment:20 in reply to: ↑ 18 ; follow-up: Changed 9 years ago by vorner

  • Owner changed from vorner to jinmei

Hello

Replying to jinmei:

I admit this syntax may look tricky. But at the same time I thought this was a quite well-known and commoly used idiom for decent level of C++/STL programmers, and I would say we should allow this usage in our code when need conversion between the STL world and lower-level pure C interfaces. I noticed you made explict comments about this point in your revised branch, to which I wouldn't necessarily make an objection.

Actually, I wasn't worried about the syntax. It was that this kind of thing wouldn't work if it was, let's say, a list. It actually uses an implementation detail of vector to work, which is not advertised trough its API, but only guaranteed somewhere in the documentation. Such things are generally used when coding in perl, but I consider them hard to read, as it needs nontrivial joining of information to understand what is actually happening.

So I'm not really opposing the use, because it has some advantages (like the size can grow by itself), but I think these parts of code really do deserve a comment whenever used to show that the author did it in conscious way, not naively guessing.

I'm okay with changing uint8 to char (noticing you didn't change that part). Frankly, though, if we worry about possible incompatibility between uint8 and unsigned char, I'm afraid we'll find much more serious cases in our main code (rather than in a benchmark tool).

Well, the worry wasn't about incompatibility of their binary representation (if there's, lets say, 9bit char, the whole code is, I guess, unusable), but the original worry was about aliasing, that if uint8_t and unsigned char were declared as different types. But then I decided that it makes no sense to declare new types when an typedef alias is enough, so we shouldn't meet this in practice and saved myself some work.

Is std::vector really not portable? We use it in headers all the time. Is the hiding in pimpl really worth the prize of the overhead and more complicated implementation?

It's quite likely to be non portable (or non compatible) at the binary
level when we change the STL implementation (even the debug/normal
modes of STLPort are incompatible; for example, we'd have to build and
use gtest with the same mode of STLPort). Basically, I want to
minimize the risk of breaking the compatibility unless there's a
strong need to expose it. I admit we are already breaking this level
of compatibility in various places of our code, but if we use the fact
as an excuse to break it further, it will be more and more difficult
to restore the compatibility when if and we want to do that.

I'd really be worried to go that way, it really complicates matters a lot ‒ we wouldn't be able to put anything except basic types into the headers, not even a shared pointer. This kind of thing is usually solved by distro maintainers anyway. So, maybe it's a thing we should bring up to some call? Or, was it already discussed sometime?

Looking at the FieldSpec? ‒ is it probable we use all 16 bits of the length field? Surely not for a name and if it is possible to have such long binary fields (I doubt there are any in the wild), we could split it into more consecutive fields. If we spared 2 bits from the length, we could put the type of field to the same 2-byte word (either by manually masking & shifting) or by field width specification (but I fear it's feature of C99). Since now it takes 3 bytes and another byte will be padding, because the uint16 can't live on odd address, this would make the size half.

I agree it makes sense to consider compressing the fields further, but
that would probably be a next step, probably after measuring the
actual memory footprint with real world example (big) zones.

Yes, we seem to agree on that one then.

Replying to jinmei:

Replying to vorner:

  • Also as I said, I'd still hide the buffer using pimpl at the moment.

Well, there are actually two issues on that. One is it brings quite non-trivial overhead (which might or might not be important, but it's not only the instructions of calling a functions, it's that the compiler loses most of its invariants across a call to unknown function, because it must be in a different file). The second, probably more important one, is code readability. Too little of structure/objects make the code hard to follow, leading to spaghetti code. But the opposite, too many layers of indirection, make the code equally hard to follow. If you look at the writeUint8 function, for example, it is already only a wrapper function to call the same on a buffer, which is, in turn, a single-line function with assignment. If we add a pimpl there, there's another indirection that brings no simplification whatsoever, only adding yet another file that must be opened (bringing it up to 3) to find out there's nothing fancy happening at all.

So I'm little bit reluctant to do that. Is it about the binary compatibility as well?

  • In general I don't like to have a non private member variable unless it's absolutely necessary (just as many C++ books suggest), and in that sense I don't like to make AbstractMessageRenderer::buffer_ protected. IMO in this case it's not "absolute necessary", and I'd make it private (whether or not via pimpl) and have derived classes access it through public writeXXX interfaces. (I'm noticing that in order to do that we'd also need some extensions to AbstractMessageRenderer?)

Again, what good does it bring? Because anybody is able to fully overwrite the buffer with the write* methods, extend it, shrink it, etc. So it doesn't bring any protection of integrity. It only complicates matters for whoever is writing the writeName() function. The rule in whatever book is rule of thumb, that you don't want just anybody being able to dig in your guts. This doesn't apply here, since everybody is already able to do so (and, by definition of what the class does, must be). Or do I miss some benefit here?

  • [comment] This is an off topic for this ticket, but I've been thinking it makes more sense to hide the output buffer inside the (Abstract)MessageRenderer? (again, whether or not pimpl) rather than passing it via the constructor. Previously the renderer class didn't have getData() and getDataLength(), so the caller needs to get access to the row data via the OutputBuffer? interfaces. But now the render also provides the way to get access to the raw data via its own getData()/getDataLength(), it makes the caller simpler if the renderer internally constructs the OutputBuffer? (or even some other specialized data structure). Now we are touching the renderer class, this may be a good opportunity to make this cleanup. But since it will affect many other parts of the entire BIND 10 code (we'll also need to update the python binding and its users, for example), it may better be deffered to a separate ticket. (If this change makes sense) either is okay for me.

It does make sense in the terms of clean design. On the other side, it might make one optimisation a little bit harder ‒ currently, it is possible to reuse the same buffer to render many messages. Maybe it would be possible to reuse the whole renderer, though.

Anyway, could I propose that we take the lazy approach, make a ticket or comment at the interface and leave it as it is until the change brings any advantage?

rdatafields.h

  • I don't understand the point of changing the return type of getFieldSpecData(). Which specific alias problem does the change try to solve?

Thinking about it once again, I was wrong. The problem would be if anybody tried to access the data by a pointer to something different than char, FieldSpec or Type/uint16_t (depending on the part of data accessed). So, in that sense, void * is incompatible with the data, meaning that you can't access the void pointed to because of aliasing rules. However, no one can really access void, so it's not an issue.

So I might return it back to void *. However, thinking about it, why we returned void * in the first place, when the docs say it can be type-casted to FieldSpec? In that sense, the API seems inconsistent, since void * usually describe „typeless unknown data“. Wouldn't returning a FieldSpec * make more sense?

  • In any case, I'd avoid offending comments like this:
    +        // Nasty cast, because C++ authors didn't read the C specification
    +        // about pointers. We can't return void* from it, that would break
    
    It's an unfounded accusation (how could we know whether C++ authors read the specification?), and even if it's the fact, this type of comment only makes us look vulgar and doesn't help anything IMO. Let's be professional and just provide technical explanation.

ACK. Anyway, it seems strange that using an unsafe cast to void * is allowed (this exact cast can be used to break the aliasing rules), but cast to char * that is completely safe regarding these is not.

So, I'm deffering some implementation until we get to some consensus there.

Sorry for disagreeing so much, but I can't help myself feeling it differently.

With regards

comment:21 in reply to: ↑ 20 ; follow-up: Changed 9 years ago by jinmei

Replying to vorner:

It's quite likely to be non portable (or non compatible) at the binary
level when we change the STL implementation (even the debug/normal
modes of STLPort are incompatible; for example, we'd have to build and
use gtest with the same mode of STLPort). Basically, I want to
minimize the risk of breaking the compatibility unless there's a
strong need to expose it. I admit we are already breaking this level
of compatibility in various places of our code, but if we use the fact
as an excuse to break it further, it will be more and more difficult
to restore the compatibility when if and we want to do that.

I'd really be worried to go that way, it really complicates matters a lot ‒ we wouldn't be able to put anything except basic types into the headers, not even a shared pointer. This kind of thing is usually solved by distro maintainers anyway. So, maybe it's a thing we should bring up to some call? Or, was it already discussed sometime?

I don't think the incompatibility like the one between debug/normal
modes of STLPort will be taken care of by the distro. As for the
call/previous discussion, we've discussed things related to this topic
time to time, but as far as I know we don't really have a clear
consensus about how much we'd allow non basic types to be in public
interfaces.

I'm happy to raise this project wise, but I'm a bit pessimistic we can
have a consensus. For this specific case, as you seem to want to
avoid hiding buffer quite strongly, and as I have to admit it wouldn't
be very convincing to insist on this case while leaving many others,
I'm okay with leaving the decision on this point to you.

Replying to jinmei:

Replying to vorner:

  • Also as I said, I'd still hide the buffer using pimpl at the moment.

Well, there are actually two issues on that. [...]

So I'm little bit reluctant to do that. Is it about the binary compatibility as well?

Yes, that's the same discussion. I simply repeated it as a summarized
specific point to the branch code itself. See above.

  • In general I don't like to have a non private member variable unless it's absolutely necessary (just as many C++ books suggest), [...]

Again, what good does it bring? Because anybody is able to fully overwrite the buffer with the write* methods, extend it, shrink it, etc. So it doesn't bring any protection of integrity. It only complicates matters for whoever is writing the writeName() function. The rule in whatever book is rule of thumb, that you don't want just anybody being able to dig in your guts. This doesn't apply here, since everybody is already able to do so (and, by definition of what the class does, must be). Or do I miss some benefit here?

My main concern in this context is to ensure the (larger) flexibility
of changing the internal representation before many other clients,
hopefully third party ones, relies on the public interfaces. Once
someone starts relying on the fact that the backend of MessageRenderer?
is OutputBuffer?, we won't be able to change that without breaking
compatibility of that deployment. It's true that it's not a problem
today, and, if we are lucky (or unlucky that no one else is interested
in using our APIs) this may never be an issue in future. But exactly
because it's not an immediate threat we tend to underestimate the
risk, and will only regret the decision when the risk is realized and
it's too late.

I understand it's a tradeoff between that kind of robustness and
complexity due to more indirections. I also understand when there are
many public methods that access the internal data it can effectively
mean exposing the data itself. But, in this case, taking these into
account I still prefer publishing the internal data representation.

Like I said in the above paragraph, I understand it's a controversial
point and different people may have different opinion. If you still
don't like my suggestion, I propose bringing this to the dev list and
resolve the conflict consensus-basis.

  • [comment] This is an off topic for this ticket, but I've been

thinking it makes more sense to hide the output buffer inside the
(Abstract)MessageRenderer? [...]

It does make sense in the terms of clean design. On the other side, it might make one optimisation a little bit harder ‒ currently, it is possible to reuse the same buffer to render many messages. Maybe it would be possible to reuse the whole renderer, though.

MessageRender? should also be able to be reused. In fact that was my
original intent.

Anyway, could I propose that we take the lazy approach, make a
ticket or comment at the interface and leave it as it is until the
change brings any advantage?

That's fine. In general I agree we should limit the scope of
individual tickets.

rdatafields.h

  • I don't understand the point of changing the return type of getFieldSpecData(). Which specific alias problem does the change try to solve?

Thinking about it once again, I was wrong. The problem would be if anybody tried to access the data by a pointer to something different than char, FieldSpec or Type/uint16_t (depending on the part of data accessed). So, in that sense, void * is incompatible with the data, meaning that you can't access the void pointed to because of aliasing rules. However, no one can really access void, so it's not an issue.

Okay.

So I might return it back to void *. However, thinking about it, why we returned void * in the first place, when the docs say it can be type-casted to FieldSpec? In that sense, the API seems inconsistent, since void * usually describe „typeless unknown data“. Wouldn't returning a FieldSpec * make more sense?

Hmm, good point. I don't remember what I exactly thought about the
interface consistency when I first wrote the original code, but my
general intent is to treat the exposed data as opaque as possible.
So, if it doesn't break other part, it will probably make more sense
to change the second constructor of RdataFields? so that it takes void*
instead of FieldSpec?* (and internally convert it to FieldSpec?* using
cast). And, in that case, it may also make sense to use the actual
data length rather than the number of spec's.

Sorry for disagreeing so much, but I can't help myself feeling it differently.

No worries. I understand many of the above points are controversial
where people's opinions may vary.

comment:22 Changed 9 years ago by jinmei

  • Owner changed from jinmei to vorner

comment:23 in reply to: ↑ 21 ; follow-up: Changed 9 years ago by vorner

  • Owner changed from vorner to jinmei

Hello

Replying to jinmei:

I'm happy to raise this project wise, but I'm a bit pessimistic we can
have a consensus. For this specific case, as you seem to want to
avoid hiding buffer quite strongly, and as I have to admit it wouldn't
be very convincing to insist on this case while leaving many others,
I'm okay with leaving the decision on this point to you.

OK, I sent a mail to the dev list, let's at last try and wait a little bit. This task isn't in any real hurry anyway.

So I'm little bit reluctant to do that. Is it about the binary compatibility as well?

Yes, that's the same discussion. I simply repeated it as a summarized
specific point to the branch code itself. See above.

OK, so I'm leaving this open for now.

My main concern in this context is to ensure the (larger) flexibility
of changing the internal representation before many other clients,
hopefully third party ones, relies on the public interfaces. Once
someone starts relying on the fact that the backend of MessageRenderer?
is OutputBuffer?, we won't be able to change that without breaking
compatibility of that deployment. It's true that it's not a problem
today, and, if we are lucky (or unlucky that no one else is interested
in using our APIs) this may never be an issue in future. But exactly
because it's not an immediate threat we tend to underestimate the
risk, and will only regret the decision when the risk is realized and
it's too late.

Hmm, I see, I didn't think about this issue. Yes, it makes sense, I tried to hide it inside. But I still had to provide the getBuffer() method, because the MessageRenderer? needs to store references to them from time to time. It could be refactored so it wouldn't need it, but it seemed too big change inside this ticket. Should I create a ticket for it?

Anyway, could I propose that we take the lazy approach, make a
ticket or comment at the interface and leave it as it is until the
change brings any advantage?

That's fine. In general I agree we should limit the scope of
individual tickets.

I put a doxygen todo at the interface.

So I might return it back to void *. However, thinking about it, why we returned void * in the first place, when the docs say it can be type-casted to FieldSpec? In that sense, the API seems inconsistent, since void * usually describe „typeless unknown data“. Wouldn't returning a FieldSpec * make more sense?

Hmm, good point. I don't remember what I exactly thought about the
interface consistency when I first wrote the original code, but my
general intent is to treat the exposed data as opaque as possible.
So, if it doesn't break other part, it will probably make more sense
to change the second constructor of RdataFields? so that it takes void*
instead of FieldSpec?* (and internally convert it to FieldSpec?* using
cast). And, in that case, it may also make sense to use the actual
data length rather than the number of spec's.

I changed it to void* and the field length function to return number of bytes. I also changed the getData to return void* and the constructor to take void* pointers to make it little bit more consistent and noted that it isn't actually expected that users would like to dig trough the data.

I hope I didn't forget anything.

With regards

comment:24 in reply to: ↑ 23 ; follow-up: Changed 9 years ago by jinmei

Replying to vorner:

I'm happy to raise this project wise, but I'm a bit pessimistic we can
have a consensus. For this specific case, as you seem to want to
avoid hiding buffer quite strongly, and as I have to admit it wouldn't
be very convincing to insist on this case while leaving many others,
I'm okay with leaving the decision on this point to you.

OK, I sent a mail to the dev list, let's at last try and wait a little bit. This task isn't in any real hurry anyway.

Ack (but I'm okay with merging this anyway and creating a new task if
the result of the dev list discussion suggests a change. actually it
may be better than holding off in terms of task progress management as
long as we can keep track of the possibly open point).

Hmm, I see, I didn't think about this issue. Yes, it makes sense, I tried to hide it inside. But I still had to provide the getBuffer() method, because the MessageRenderer? needs to store references to them from time to time. It could be refactored so it wouldn't need it, but it seemed too big change inside this ticket. Should I create a ticket for it?

Yes, please. I'm okay with moving forward with getBuffer() for now.

I changed it to void* and the field length function to return number of bytes. I also changed the getData to return void* and the constructor to take void* pointers to make it little bit more consistent and noted that it isn't actually expected that users would like to dig trough the data.

Looks good, but we'll then need getFieldCount() in addition to
getFieldDataSize() (btw, this one may better be named
getFieldDataSpecSize() to be consistent with getFieldSpecData());
otherwise the application cannot easily iterate over all fields using
getFieldSpec() (it could still compute the # of fields by
getFieldDataSpecSize() / sizeof(FieldSpec?), but it's inconvenient and
dangerous in that it relies too much on the internal representation).

Some other comments to the revised code:

  • can't this be just a call to (Abstract)MessageRenderer::getLength()?
    -    const size_t offset = buffer_.getLength();
    +    const size_t offset = getBuffer().getLength();
    
  • Same for this:
    +    virtual const void* getData() const { return (getBuffer().getData()); }
    +    virtual size_t getLength() const { return (getBuffer().getLength()); }
    
    (actually shouldn't this be just removed because these methods of AbstractMessageRenderer? aren't virtual?)
  • And same for these:
    +        last_data_pos_ = getBuffer().getLength();
    ...
    +        if (getBuffer().getLength() == last_data_pos_) {
    ...
    +        fields_.back().len += getBuffer().getLength() - last_data_pos_;
    +        last_data_pos_ = getBuffer().getLength();
    
  • This one can avoid the use of getBuffer() if we first dump the name to a local OutputBuffer? and copy it using AbstraceMessageRenderer::writeData(). But that will be a bit more expensive due to the additional copy, so if you want to avoid this approach for this reason, I'm okay about it for now.
    +        name.toWire(getBuffer());
    

I hope I didn't forget anything.

Except the small things mentioned above, it mostly looks okay.

comment:25 Changed 9 years ago by jinmei

  • Owner changed from jinmei to vorner

comment:26 in reply to: ↑ 24 ; follow-up: Changed 9 years ago by vorner

  • Owner changed from vorner to jinmei

Hello

Replying to jinmei:

Looks good, but we'll then need getFieldCount() in addition to

ACK, added.

+    virtual const void* getData() const { return (getBuffer().getData()); }
+    virtual size_t getLength() const { return (getBuffer().getLength()); }

(actually shouldn't this be just removed because these methods of
AbstractMessageRenderer? aren't virtual?)

Hmm, yes, I'm just blind.

  • This one can avoid the use of getBuffer() if we first dump the name to a local OutputBuffer? and copy it using AbstraceMessageRenderer::writeData(). But that will be a bit more expensive due to the additional copy, so if you want to avoid this approach for this reason, I'm okay about it for now.

I added a TODO note there. We need the getBuffer now because of the MessageRenderer? anyway, so it can wait until we can actually remove it.

With regards

comment:27 in reply to: ↑ 26 Changed 9 years ago by jinmei

Replying to vorner:

The latest code looks okay with one minor nit (which is actually not for the latest diff though):

    unsigned int getFieldSpecDataSize() const { return (nfields_ *
                                                    sizeof *fields_); }

We generally add parentheses after 'sizeof' (following the BIND 9 coding
guideline).

With fixing this please feel free to merge it. Please also feel free
to add your name to the changelog entry if you like.

comment:28 Changed 9 years ago by jinmei

  • Owner changed from jinmei to vorner

comment:29 Changed 9 years ago by vorner

  • Resolution set to complete
  • Status changed from reviewing to closed

Ok, thanks.

comment:30 Changed 9 years ago by vorner

  • Defect Severity set to N/A
  • Estimated Difficulty changed from 0.0 to 3
  • Sub-Project set to DNS
Note: See TracTickets for help on using tickets.