Opened 8 years ago

Closed 8 years ago

#1279 closed task (complete)

IXFR to AXFR fallback

Reported by: jinmei Owned by: vorner
Priority: medium Milestone: Sprint-20111108
Component: xfrin Version:
Keywords: Cc:
CVSS Scoring: Parent Tickets:
Sensitive: no Defect Severity: N/A
Sub-Project: DNS Feature Depending on Ticket:
Estimated Difficulty: 5 Add Hours to Ticket: 0
Total Hours: 0 Internal?: no

Description

This is a deferred task about IXFR-in.

We need to handle various cases of IXFR failure and subsequent
fallback to AXFR. See the discussion in #1261.

Note that there are several types of fallback:

  • fallback from IXFR to AXFR in the same TCP connection
  • fallback to AXFR after completing the IXFR session. This may be further broken down into two cases: immediate fallback after failure and scheduling a next xfr at the zonemgr indicating this should be AXFR.

See also BIND 9's xfrin_recv_done() (lib/dns/xfrin.c). It covers
some fallback scenario.

Subtickets

Change History (12)

comment:1 Changed 8 years ago by jelte

  • Milestone changed from Next-Sprint-Proposed to Sprint-20111025

comment:2 Changed 8 years ago by jreed

Also consider about using AXFR only when zone is not known. (I know that it is the "current" behaviour.) Maybe open a new ticket if that is not handled in this ticket.

comment:3 Changed 8 years ago by jelte

  • Estimated Difficulty changed from 0 to 5

comment:4 Changed 8 years ago by vorner

  • Owner set to vorner
  • Status changed from new to accepted

comment:5 Changed 8 years ago by vorner

  • Owner changed from vorner to UnAssigned
  • Status changed from accepted to reviewing

Hello

While I see the differences between the various ways to fallback, I believe it's OK to implement just one.

I've chosen the one with creating a new connection immediately. I believe we need a new connection, as some server might just close the connection after answer for IXFR (I think BIND10 does that) and some servers might get confused if they don't know the IXFR. And I don't see any reason to wait and it would only complicate matters, as we would need to keep the state somewhere.

But the change contains a controversial topic. I don't care how or why the IXFR failed, if it failed, I just retry with AXFR. I do it for two reason. One is simplicity ‒ there's no need to write bunch of tests and handle a lot of different failure cases. Another one is, IXFR could fail in an unpredictable way (I've seen a bug in a server yesterday that returned SERVFAIL if it didn't have the differences stored). In many such cases, the AXFR could help, so I think it's worth at last trying (because if the AXFR fails as well, we wasted only few packets over one TCP connection).

As for the zone that is not known, I think it will fall back to AXFR after trying to initialize the IXFR and failing even before sending anything out. That isn't the most elegant way to do it, but I guess it's OK, as the external functionality is the same, except for little logging.

Anyway, one unrelated note. The process_xfrin function should perform some actions no matter if the transfer succeeds or fails (eg. decrementing the number of active transfers, notifying zone manager, etc), but if there's an unexpected exception, it does not. I think it should be put into some try-finally block. Should it be done right now when the code is being modified, or a new ticket created?

Thanks

comment:6 Changed 8 years ago by vorner

I also forkot to mention. Should we:

  • Enable IXFR by default for zones?
  • Enable/finish the fallback system tests? I think there were some.

comment:7 Changed 8 years ago by jinmei

Quick comment: as usual with cases with controversy, I suggest you check
what other implementations do. Thinking about details from the scratch
without prejudice is a good thing, but at the same time we should learn
from the lessons of others.

comment:8 Changed 8 years ago by jelte

  • Owner changed from UnAssigned to vorner

Initially I thought the method with the extra variable was a bit weird (and rather move the executing code to a separate call, and call that twice), but I suppose we need this to be able to only fall back if the initial connection() succeeded.

More generally, I think we may want to differentiate between errors that should result in fallback and errors that should not. However, I also think that this is 'good enough' as an approach and important enough to get in as it is.

I would like to see a comment at or near the retry variable and while loop as to why this construction is chosen. (oh and there is a typo, 'manu such cases', i'd have fixed it if i hadn't asked for a little extra comments :p)

As for tests, if we have them already, but disabled, then yes we should re-enable them. If we do not, they can be deferred of the IXFR system tests in the upcoming framework IMO.

comment:9 Changed 8 years ago by jelte

  • Milestone changed from Sprint-20111025 to Sprint-20111108

comment:10 Changed 8 years ago by vorner

  • Owner changed from vorner to jelte

I added the comment, merged master into it (because it had old version of systests) and solved some conflicts.

The system test is updated, but not enabled yet, as bind9 is not willing not to support IXFR at all. So I'd leave that one for now.

comment:11 Changed 8 years ago by jelte

  • Owner changed from jelte to vorner

ok, looks good. For the system test perhaps we can set something up with bind10 as the master with the stuff in #1290 :)

comment:12 Changed 8 years ago by vorner

  • Resolution set to complete
  • Status changed from reviewing to closed

As agreed on jabber, the system test will be postponed when the cucumber one is ready. So, merged.

Note: See TracTickets for help on using tickets.