Opened 8 years ago

Closed 8 years ago

#1259 closed task (complete)

framework for adding/deleting RR in datasource from xfrin

Reported by: jinmei Owned by: vorner
Priority: very high Milestone: Sprint-20111011
Component: xfrin Version:
Keywords: Cc:
CVSS Scoring: Parent Tickets:
Sensitive: no Defect Severity: N/A
Sub-Project: DNS Feature Depending on Ticket:
Estimated Difficulty: 0 Add Hours to Ticket: 0
Total Hours: 0 Internal?: no

Description

This is a subtask of #1212 and can be independently done from other
subtasks.

It does not necessarily have to be exactly the same, but the tentative
proposal is to port BIND 9's ixfr_putdata(), ixfr_apply(), and
ixfr_commit() (implemented in lib/dns/xfrin.c), and
dns_difftuple_create() and dns_diff_apply() (lib/dns/diff.c). We
introduce a notion of "RR diff", which (conceptually) consists of an
operation mode (add or delete) and an RR (which would be an RRset
object containing only one RDATA in our API). dns_diff_apply()
accepts a list of RR diffs, combining them into RRset diffs (for
efficiency reasons), then make the corresponding changes on the zone
DB. ixfr_putdata() takes an operation (add or delete) and RR from the
main IXFR logic, creates an RR diff object, and maintains it in a
list. For every 100 diffs, it calls ixfr_apply(). ixfr_apply()
starts a new transaction (if it has already started transaction,
continue it) and makes changes for the RR diff list accumulated so far
under the transaction. ixfr_commit() calls ixfr_apply() to apply any
remaining diff and commits the transaction.

Subtickets

Change History (16)

comment:1 Changed 8 years ago by vorner

  • Owner set to vorner
  • Status changed from new to accepted

comment:2 Changed 8 years ago by vorner

As of 9b23d60d6f58b18da3995dc3e090d7fd63233bcc, the code should be able to perform anything needed functionally-wise. All the performance stuff (once every 100 RRs, diff compaction) as well as some checks and exceptions are missing, but this should be enough to start developing on top of it. So #1261 and #1262 could probably start, if anybody is available (and willing to merge the current yet-unmerged branches together in some temporary branch).

I'm going to finish this tomorrow and put it to review.

comment:3 Changed 8 years ago by vorner

  • Owner changed from vorner to UnAssigned
  • Status changed from accepted to reviewing

Hello

It should be ready for review. The interface is similar to the proposed one, just more object oriented and pythonic.

With regards

comment:4 Changed 8 years ago by jinmei

  • Owner changed from UnAssigned to jinmei

comment:5 follow-up: Changed 8 years ago by jinmei

It basically looks clean and okay. I have some minor comments for the
implementation and some general discussions (the latter is not
necessarily a showstopper).

general

  • This is not directly related to this branch, but why does it import isc.dns, not pydnspp? Or, why do we have both isc.dns (as a forwarder module) and pydnspp in the first place? Maybe we should deprecate pydnspp and unify them into isc.dns?
  • This module doesn't do any logging. Is it okay? Note that I think there's at least one event we would like to log (see below)
  • Is the module name 'xfrin' appropriate? I guess we'll reuse (whole or part) of this code for dynamic update. We may also want to support other ways to change zone data (e.g. via bindctl), and it may also want to use this module. Perhaps we should eventually have a single isc.dns module (package) and make this module a sub module of it, like isc.dns.diff?
  • About the magic number of 100: maybe we should define it via a constant variable with some comments on the rationale (actually it was simply derived from BIND 9's implementation and we don't know how it was chosen).
    # explain what this is and how we choose the value
    DIFF_APPLY_THRESHOLD = 100
    

diff.py

  • Diff.init(): the argument name "datasource" might be a bit misleading, because it's actually a datasource "client".
  • Diff.init(): it creates an updater at initialization time. This is slightly different from the original BIND 9 behavior. In BIND 9, we delay starting the transaction when first apply() happens, thereby limiting the duration of the transaction even if the remote server is slowly feeding the data. If you thought that would be a premature optimization and intentionally skipped that, I'm okay with it; just checking the intent.
  • Diff.compact(): in a future version we should probably treat RRSIGs that cover different RR types separately, just as BIND 9 does. I'm okay with deferring it as in practice we won't have to worry about the case, and in any event that wouldn't matter in our current sqlite3 schema (and it's the only writable data source for now).
  • Diff.compact(): BIND 9 checks if the TTLs of the combined RRs are the same, and a warning log if they aren't. We should probably do the same. (and we need a test for that case)
  • Diff.add(), remove(): maybe we should check if the RR class is identical to that associated with the client (updater). hmm, BIND 9 doesn't check this at all, maybe trusting the caller. I know in BIND 9 the DNS message parser ensures all RR classes are the same, so it probably relies on that check. Either way is okay for me - but if we assume the RR class is valid within this module, this should be documented for the class pydoc.
  • Diff.compact(): Related to the previous point, should we also take into account the RR class? (i.e., check the RR class and don't combine them if the classes are different)...probably not in any case, though: If we assume the RR class is valid, we should rely on the assumption here; if we have add()/remove() check the classes, compact() can rely on the result.
  • Diff.apply(): maybe we should make the object unusable (e.g. by adding a flag) when it sees an exception from the data source in add_rrset(), remove_rrset() and commit(), so that even a careless application cannot accidentally continue the operation? Not a strong opinion, maybe noting we should stop using it when that happens, just a thought.

diff_tests.py

  • data_common(): argument 'name' is a bit confusing as it may sound like a domain name, I'd rename it 'operation', 'op', etc.
  • data_common(): a minor point, but technically, "add" may not be 100% correct because it may be remove:
            # Add some proper data
            method(self.__rrset1)
            method(self.__rrset2)
    

comment:6 Changed 8 years ago by jinmei

  • Owner changed from jinmei to vorner

comment:7 in reply to: ↑ 5 ; follow-up: Changed 8 years ago by vorner

  • Owner changed from vorner to jinmei

Hello

Replying to jinmei:

general

  • This is not directly related to this branch, but why does it import isc.dns, not pydnspp? Or, why do we have both isc.dns (as a forwarder module) and pydnspp in the first place? Maybe we should deprecate pydnspp and unify them into isc.dns?

I don't know. I just looked into some module around and found it is using isc.dns, so I used that too. It looks nicer to me, so I'm in favor of having just one name for the module, isc.dns is more consistent.

But if I should use pydnspp, I'm OK with that of course. Should I replace it?

  • Is the module name 'xfrin' appropriate? I guess we'll reuse (whole or part) of this code for dynamic update. We may also want to support other ways to change zone data (e.g. via bindctl), and it may also want to use this module. Perhaps we should eventually have a single isc.dns module (package) and make this module a sub module of it, like isc.dns.diff?

I'm not sure. I just needed some module and didn't want to spend too much time on deciding about the name, when we need it now. If you have a better name, I'll switch it.

  • Diff.init(): it creates an updater at initialization time. This is slightly different from the original BIND 9 behavior. In BIND 9, we delay starting the transaction when first apply() happens, thereby limiting the duration of the transaction even if the remote server is slowly feeding the data. If you thought that would be a premature optimization and intentionally skipped that, I'm okay with it; just checking the intent.

I don't know if it optimises anything at all, but that's not the reason why I put it directly to the constructor. I want to raise errors (like non-existant zone or the data source not implementing updates) as soon as possible. I'd either need to ask for the zone ID and delay the real creation for later, but that one looks more complicated and more work for computer as well, so I just did it this way.

  • Diff.compact(): in a future version we should probably treat RRSIGs that cover different RR types separately, just as BIND 9 does. I'm okay with deferring it as in practice we won't have to worry about the case, and in any event that wouldn't matter in our current sqlite3 schema (and it's the only writable data source for now).

ACK, let's leave it for later.

  • Diff.compact(): BIND 9 checks if the TTLs of the combined RRs are the same, and a warning log if they aren't. We should probably do the same. (and we need a test for that case)

I added the check and test. I'm not sure what prefix I should give to the logging messages, it may depend if we want to rename the library to anything.

  • Diff.add(), remove(): maybe we should check if the RR class is identical to that associated with the client (updater). hmm, BIND 9 doesn't check this at all, maybe trusting the caller. I know in BIND 9 the DNS message parser ensures all RR classes are the same, so it probably relies on that check. Either way is okay for me - but if we assume the RR class is valid within this module, this should be documented for the class pydoc.

OK, the check was added.

Thanks

comment:8 in reply to: ↑ 7 ; follow-up: Changed 8 years ago by jinmei

Replying to vorner:

general

  • This is not directly related to this branch, but why does it import isc.dns, not pydnspp? Or, why do we have both isc.dns (as a forwarder module) and pydnspp in the first place? Maybe we should deprecate pydnspp and unify them into isc.dns?

I don't know. I just looked into some module around and found it is using isc.dns, so I used that too. It looks nicer to me, so I'm in favor of having just one name for the module, isc.dns is more consistent.

But if I should use pydnspp, I'm OK with that of course. Should I replace it?

No, I don't think so. This is a larger topic, and not only for this
module anyway. Let's discuss it separately (outside of this ticket).

  • Is the module name 'xfrin' appropriate? I guess we'll reuse (whole or part) of this code for dynamic update. We may also want to support other ways to change zone data (e.g. via bindctl), and it may also want to use this module. Perhaps we should eventually have a single isc.dns module (package) and make this module a sub module of it, like isc.dns.diff?

I'm not sure. I just needed some module and didn't want to spend too much time on deciding about the name, when we need it now. If you have a better name, I'll switch it.

One possibility would be making it a submodule of the datasrc package:
isc.datasrc.diff (on second thought it would at least be better than a
submodule of isc.dns (or pydnspp) as the diff module depends on the
concept of datasrc). But it's not a well-baked thought, just an idea.
If you like it, please feel free to use it. If you don't or simply am
not sure, I'm okay with keeping the current module name.

  • Diff.init(): it creates an updater at initialization time. This is slightly different from the original BIND 9 behavior. In BIND 9, we delay starting the transaction when first apply() happens, thereby limiting the duration of the transaction even if the remote server is slowly feeding the data. If you thought that would be a premature optimization and intentionally skipped that, I'm okay with it; just checking the intent.

I don't know if it optimises anything at all, but that's not the reason why I put it directly to the constructor. I want to raise errors (like non-existant zone or the data source not implementing updates) as soon as possible. I'd either need to ask for the zone ID and delay the real creation for later, but that one looks more complicated and more work for computer as well, so I just did it this way.

In practice I suspect these errors normally don't occur in the context
where this class is used: the application (e.g. xfrin or dyn-update
handler) should have confirmed the zone exists in the corresponding
data source, and I guess the writability is ensured via the
configurations for these apps.

As for optimization, maybe it's not much. As long as the application
creates the Diff object on receiving a first IXFR response (or the
dynamic update request), the saved duration of transaction is just for
building the data locally (rather than the one involving network
delay), which would be negligible.

In the end, if you think the current way makes the implementation
simpler, I'm okay with it.

I have a couple of more comments about the revised code:

  • I'd add to the TTL log message something similar to BIND 9's log "adjusting %lu -> %lu". I'd also explain which TTL is overridden in the detailed version of the log description.
  • Related to this point, I'll test the resulting TTL as part of test_ttl (maybe for both cases of first TTL > second and first < second to see it's just about the ordering, not the smallest/largest)

comment:9 Changed 8 years ago by jinmei

  • Owner changed from jinmei to vorner

comment:10 in reply to: ↑ 8 ; follow-up: Changed 8 years ago by vorner

  • Owner changed from vorner to jinmei

Hello

Replying to jinmei:

Replying to vorner:

I'm not sure. I just needed some module and didn't want to spend too much time on deciding about the name, when we need it now. If you have a better name, I'll switch it.

One possibility would be making it a submodule of the datasrc package:
isc.datasrc.diff (on second thought it would at least be better than a
submodule of isc.dns (or pydnspp) as the diff module depends on the
concept of datasrc). But it's not a well-baked thought, just an idea.
If you like it, please feel free to use it. If you don't or simply am
not sure, I'm okay with keeping the current module name.

I don't think I like submodule of datasrc for two reasons.

One is technical, as datasrc is C++ module and adding a python submodule is non-trivial (there are some rests of the original version of datasrc and we should get rid of them soon, the ugliness of the submoduling is one of the reasons).

Another one is the fact that the C++ version of module don't have the submodule. That might be confusing, as we try to make the versions close to each other, someone might try to find the diffs there.

So, can we wait until it gets shared with DDNS and then see how to name it?

I have a couple of more comments about the revised code:

  • I'd add to the TTL log message something similar to BIND 9's log "adjusting %lu -> %lu". I'd also explain which TTL is overridden in the detailed version of the log description.
  • Related to this point, I'll test the resulting TTL as part of test_ttl (maybe for both cases of first TTL > second and first < second to see it's just about the ordering, not the smallest/largest)

Added.

Thanks

comment:11 in reply to: ↑ 10 ; follow-up: Changed 8 years ago by jinmei

Replying to vorner:

One possibility would be making it a submodule of the datasrc package:
isc.datasrc.diff (on second thought it would at least be better than a
submodule of isc.dns (or pydnspp) as the diff module depends on the
concept of datasrc). But it's not a well-baked thought, just an idea.
If you like it, please feel free to use it. If you don't or simply am
not sure, I'm okay with keeping the current module name.

I don't think I like submodule of datasrc for two reasons.

One is technical, as datasrc is C++ module and adding a python submodule is non-trivial (there are some rests of the original version of datasrc and we should get rid of them soon, the ugliness of the submoduling is one of the reasons).

Another one is the fact that the C++ version of module don't have the submodule. That might be confusing, as we try to make the versions close to each other, someone might try to find the diffs there.

Hmm, I don't think sharing .so and .py modules in the same package is
in itself a problem (it's not uncommon for standard python libraries,
for example). But I missed the fact that datasrc.so is actually a
single module containing multiple classes. Having both a multi-class
module (datasrc.so) and a single class one (diff.py) may be confusing.
I also see datasrc.so might become a non-packaged module once we
remove the old library.

So, can we wait until it gets shared with DDNS and then see how to name it?

As a result, I'm okay with this. (If this module is really only for
xfrin, I'd rather move it to src/bin/xfrin, though). But please add
to the module pydoc (top of diff.cc) that we may revisit this
module-name point so we don't forget.

I have a couple of more comments about the revised code:

  • I'd add to the TTL log message something similar to BIND 9's log "adjusting %lu -> %lu". I'd also explain which TTL is overridden in the detailed version of the log description.
  • Related to this point, I'll test the resulting TTL as part of test_ttl (maybe for both cases of first TTL > second and first < second to see it's just about the ordering, not the smallest/largest)

Added.

I don't see an update in the branch. Have you pushed it?

comment:12 Changed 8 years ago by jinmei

  • Owner changed from jinmei to vorner

comment:13 in reply to: ↑ 11 ; follow-up: Changed 8 years ago by vorner

  • Owner changed from vorner to jinmei

Hello

Replying to jinmei:

So, can we wait until it gets shared with DDNS and then see how to name it?

As a result, I'm okay with this. (If this module is really only for
xfrin, I'd rather move it to src/bin/xfrin, though). But please add
to the module pydoc (top of diff.cc) that we may revisit this
module-name point so we don't forget.

Added.

I have a couple of more comments about the revised code:

  • I'd add to the TTL log message something similar to BIND 9's log "adjusting %lu -> %lu". I'd also explain which TTL is overridden in the detailed version of the log description.
  • Related to this point, I'll test the resulting TTL as part of test_ttl (maybe for both cases of first TTL > second and first < second to see it's just about the ordering, not the smallest/largest)

Added.

I don't see an update in the branch. Have you pushed it?

Yes, you're right, I forgot. Pushed now. After pushing, I noticed I have wrong numbers in the square brackets at commit messages. If you don't mind, I'd fix it by little branch rewriting during merge.

comment:14 in reply to: ↑ 13 Changed 8 years ago by jinmei

Replying to vorner:

All okay, please merge. I don't care about commit message mangling, btw.

comment:15 Changed 8 years ago by jinmei

  • Owner changed from jinmei to vorner

comment:16 Changed 8 years ago by vorner

  • Resolution set to complete
  • Status changed from reviewing to closed

Thanks, closing

Note: See TracTickets for help on using tickets.