Opened 8 years ago

Closed 8 years ago

#1261 closed task (complete)

IXFR-in protocol handling

Reported by: jinmei Owned by: jelte
Priority: very high Milestone: Sprint-20111011
Component: xfrin Version:
Keywords: Cc:
CVSS Scoring: Parent Tickets:
Sensitive: no Defect Severity: N/A
Sub-Project: DNS Feature Depending on Ticket:
Estimated Difficulty: 0 Add Hours to Ticket: 0
Total Hours: 0 Internal?: no

Description

This is a subtask of #1212 and depends on #1258, #1259, and #1260.
But it should be possible to start working on it without waiting for
them to fully complete.

In this subtask we'll update the do_xfrin() function of the current
implementation:

            if ret == XFRIN_OK:
                logger.info(XFRIN_AXFR_TRANSFER_STARTED, self._zone_name)
                self._send_query(RRType.AXFR())
                isc.datasrc.sqlite3_ds.load(self._db_file, self._zone_name,
                                            self._handle_xfrin_response)

Unless IXFR is disabled, send IXFR query instead of AXFR, and instead
of doing the sqlite3 specific stuff, implement and perform a
completely separate code logic, where:

  • receive IXFR messages (until the xfrin process completes), parse it with PRESERVE_ORDER mode
  • handle each RR in the responses according to RFC1995. call ixfr_putdata() for each RR, and ixfr_commit() for each "difference" (a single set of deletes and adds for one SOA change). I also suggest checking BIND 9's lib/dns/xfrin.c:xfr_rr() to know any corner cases and hints for tests.
  • in this task we only handle real incremental transfer (i.e., non AXFR-compatible mode). The AXFR-style behavior will be deferred to a separate ticket (to be created)

Subtickets

Change History (14)

comment:1 Changed 8 years ago by jinmei

  • Owner set to jinmei
  • Status changed from new to accepted

comment:2 Changed 8 years ago by jinmei

trac1261 is ready for review.

It turned out to be a bit bigger than I originally expected (as a
hindsight we could have separated this task for the send-query part
and the handle-response part). I also needed to do some small
changes in various places (not only for xfrin) to make it really work.

My suggestion for the reviewer is:

  • first look at the diff for the "send query" part. It can be retrieved by 'git diff e4c78a2..38e530b' This is reasonably small and I believe it is quite straightforward.
  • commits for the Diff python lib (3bfaa40 and f6c675c) and commit for sqlite3 accessor (2de8b71) are not directly related to xfrin, but were necessary to make it work. See the commit logs for the rationale of the change. These can be reviewed separately.
  • The others are mainly for the "handle response" part (and a bunch of small cleanups).

As for the last part, it basically copies the current AXFR-only
implementation to (_handle_axfrin_response() - a bit renamed) to a
separate method (_handle_xfrin_responses()) and adjusts it to support
the IXFR protocol. As for the protocol handling, I basically ported
the BIND 9's xfrin logic (implemented in xfr_rr() and
xfrin_recv_done() of lib/dns/xfrin.c) in relatively straightforward
way, but using the state design pattern instead of the compound
switch-case structure. This decision may be debatable - I thought the
size of the original switch-case (with goto's) was already unreadable
enough and refactoring the code in a more object-oriented way made
sense here (and the abstraction overhead wouldn't matter much for
xfrin).

I don't think we need a changelog entry yet (we'll probably need one
when #1262 is completed).

comment:3 Changed 8 years ago by jinmei

  • Owner changed from jinmei to UnAssigned
  • Status changed from accepted to reviewing

comment:4 Changed 8 years ago by jelte

  • Owner changed from UnAssigned to jelte

comment:5 follow-up: Changed 8 years ago by vorner

Hello

I'm not going to review it, but I'm reading the code to know how to integrate #1262. Is it OK to commit the diff after each sequence? Shouldn't the whole transfer be atomic?

Thanks

comment:6 follow-up: Changed 8 years ago by jelte

  • Owner changed from jelte to jinmei

I took the liberty of correcting a few typos, see commit
cf136247fad510f55ba230f746558274fada1de6

Code looks good, i like the use of the state design pattern.

Just a few comments:

xfrin.py.in:

just a note about the doc on line 215:
handle_rr() technically, should it not be overridden, it returns None due to the pass in the default implementation. (i think it could even raise an exception if we don't expect the superclass to be called)

232: s/given/received/ ?

Wouldn't XfrinFirstData? fail (or rather, behave strangely and NOT fail) if the second RR in the response is a SOA with a different serial than we currently have?

Should we also make a note somewhere on which errors we should fall back to AXFR (and how such errors fall into the design pattern)?

I wonder (line 444), that if we find that we have a clearly inconsistent database, whether it is acceptable to just drop it and AXFR-in a fresh one.

Come to think of it, should the exception we raise when we don't have the zone or its SOA at all be a more specific one (so we don't need to fall back on any xfrinexception)? Technically this may not even be enough of an exceptional case to warrant an exception in the first place, though I don't feel to strongly about that.

xfrin.py.in:550 logstr is never used? leftover from old logging?

comment:7 in reply to: ↑ 5 Changed 8 years ago by jinmei

Replying to vorner:

I'm not going to review it, but I'm reading the code to know how to integrate #1262. Is it OK to commit the diff after each sequence? Shouldn't the whole transfer be atomic?

We (briefly) discussed this in the parent ticket (#1212) and we were
not sure about that.
http://bind10.isc.org/ticket/1212#comment:7

If I understand it correctly, this is what BIND 9 does, too. I posted
an email to bind-users asking for a clarification, but I've not got
any response:
https://lists.isc.org/pipermail/bind-users/2011-October/085161.html

As mentioned in that email message, this may not necessarily break
what RFC1995 says, depending on the interpretation of "all the
differences" and "older/newer version".

For now, I suggest keeping the current code and adding the following
note to the XfrinState? class documentation:

    Note that changes are committed for every "difference sequence"
    (i.e. changes for one SOA update).  This means when an IXFR response
    contains multiple difference sequences and something goes wrong
    after several commits, these changes have been published and visible
    to clients even if the IXFR session is subsequently aborted.
    It is not clear if this is valid in terms of the protocol specification.
    Section 4 of RFC 1995 states:

       An IXFR client, should only replace an older version with a newer
       version after all the differences have been successfully processed.

    If this "replacement" is for the changes of one difference sequence
    and "all the differences" mean the changes for that sequence, this
    implementation strictly follows what RFC states.  If this is for
    the entire IXFR response (that may contain multiple sequences),
    we should implement it with one big transaction and one final commit
    at the very end.

    For now, we implement it with multiple smaller commits for two
    reasons.  First, this is what BIND 9 does, and we generally port
    the implementation logic here.  BIND 9 has been supporting IXFR
    for many years, so the fact that it still behaves this way
    probably means it at least doesn't cause a severe operational
    problem in practice.  Second, especially because BIND 10 would
    often use a database backend, a larger transaction could cause an
    undesirable effects, e.g. suspending normal lookups for a longer
    period depending on the characteristics of the database.  Even if
    we find something wrong in a later sequeunce and abort the
    session, we can start another incremental update from what has
    been validated, or we can switch to AXFR to replace the zone
    completely.

comment:8 in reply to: ↑ 6 Changed 8 years ago by jinmei

Replying to jelte:

I took the liberty of correcting a few typos, see commit
cf136247fad510f55ba230f746558274fada1de6

Looks good, thanks.

Code looks good, i like the use of the state design pattern.

Just a few comments:

xfrin.py.in:

just a note about the doc on line 215:
handle_rr() technically, should it not be overridden, it returns None due to the pass in the default implementation. (i think it could even raise an exception if we don't expect the superclass to be called)

Yeah I actually thought that option, and now that you also pointed it
out, it would be a better idea. Revised it that way.

232: s/given/received/ ?

Okay, changed.

Wouldn't XfrinFirstData? fail (or rather, behave strangely and NOT fail) if the second RR in the response is a SOA with a different serial than we currently have?

Actually this is an interesting point, but in this context I believe
we can keep the current behavior. I've provided more details about this
point in the style of documentation. Please check it.

My personal take is to add a check in the not-yet-implemented AXFREnd
state about the SOA consistency and just leave a log message if they
mismatch (but accept it); or we could reject the transfer at the point.

Should we also make a note somewhere on which errors we should fall back to AXFR (and how such errors fall into the design pattern)?

I'd leave it to the "fall back task".

I wonder (line 444), that if we find that we have a clearly inconsistent database, whether it is acceptable to just drop it and AXFR-in a fresh one.

I'm not sure which revision of line 444 you're referring to:-), but are you
talking about this one?

            # Especially for database-based zones, a working zone may be in
            # a broken state where it has more than one SOA RR.  We proactively
            # check the condition and abort the xfr attempt if we identify it.
            if soa_rrset.get_rdata_count() != 1:
                raise XfrinException('Invalid number of SOA RRs for ' +
                                     self.zone_str() + ': ' +
                                     str(soa_rrset.get_rdata_count()))

If so, falling back to AXFR in this case could be an option. I think
this will be done at one level higher, maybe reporting the error to
the zone manager and let it decide the next action. In any case I'd
leave this to the "fallback ticket", too.

Come to think of it, should the exception we raise when we don't have the zone or its SOA at all be a more specific one (so we don't need to fall back on any xfrinexception)? Technically this may not even be enough of an exceptional case to warrant an exception in the first place, though I don't feel to strongly about that.

Yeah perhaps. I was thinking about something similar. We should probably
introduce an exception that indicates an IXFR failure that should (better)
be followed by a fall-back AXFR.

As for whether to use an exception or not, I don't have a strong
opinion. I heard it's actually "Pythononic" to use exceptions to
handle general error cases, rather than handling it as an error code, etc.

xfrin.py.in:550 logstr is never used? leftover from old logging?

Ah, good catch...actually it's not used even before this branch, but
it would be good to clean it up at this opportunity. So I did it.

comment:9 Changed 8 years ago by jinmei

  • Owner changed from jinmei to jelte

comment:10 follow-up: Changed 8 years ago by jelte

  • Owner changed from jelte to jinmei

Ok, looks good.

If we leave the 'specific exception for axfr fallback issues' to the fallback ticket as well, the rest is good to go.

comment:11 in reply to: ↑ 10 Changed 8 years ago by jinmei

Replying to jelte:

Ok, looks good.

If we leave the 'specific exception for axfr fallback issues' to the fallback ticket as well, the rest is good to go.

Okay, thanks, I've also incorporated the additional documentation
I proposed in http://bind10.isc.org/ticket/1261?replyto=10#comment:7
and then merged.

In the merge attempt I made some last-minute updates that may not
be super trivial. I believe it was okay so I pushed them master,
but if you could make a quick double-check, that would be nider.

The diff can be seen by

git diff 95cbc4e c96f735 src/bin/xfrin/tests/xfrin_test.py src/lib/python/isc/xfrin/diff.py src/lib/python/isc/xfrin/tests/diff_tests.py

(Note: it contains some unrelated diffs derived from the master branch)

I'll open follow up tickets and then close this one.

comment:12 Changed 8 years ago by jinmei

  • Owner changed from jinmei to jelte

comment:13 Changed 8 years ago by jinmei

I've created two follow up tickets: #1278 and #1279.

I also updated #1209 about the SOA mismatch for AXFR.

Now I'm going to close this ticket.

comment:14 Changed 8 years ago by jinmei

  • Resolution set to complete
  • Status changed from reviewing to closed
Note: See TracTickets for help on using tickets.