Opened 8 years ago

Closed 8 years ago

#1553 closed defect (complete)

b10-resolver basic feature test in lettuce

Reported by: jreed Owned by: jelte
Priority: medium Milestone: Sprint-20120124
Component: resolver Version:
Keywords: Cc:
CVSS Scoring: Parent Tickets:
Sensitive: no Defect Severity: N/A
Sub-Project: DNS Feature Depending on Ticket:
Estimated Difficulty: 0 Add Hours to Ticket: 0
Total Hours: 0.5 Internal?: no

Description

We should add a lettuce test to enable b10-resolver and to test it.

For this initial test, I think querying for an external answer is okay (later we can be more complex and run the auth servers too for it).

(This would be useful for the automated lettuce system to catch regressions, like seen in #1546.)

Subtickets

Change History (13)

comment:1 Changed 8 years ago by jelte

  • Milestone changed from New Tasks to Sprint-20120124

comment:2 Changed 8 years ago by jelte

  • Owner set to jelte
  • Status changed from new to assigned

comment:3 Changed 8 years ago by jelte

  • Owner changed from jelte to UnAssigned
  • Status changed from assigned to reviewing

Ready for review.

Since we (currently) have one 'built-in' record, and the goal for this test is mainly to prevent the recent regression where it didn't listen at all, the test shouldn't even try to contact the real world.

So the test is quite short, it just checks if querying for l.root-servers.net returns an answer.

One additional change; the 'query for' step in terrain/querying.py didn't accept '-' in labels, so I added that as well (if we want to test for weirder queries we can make it accept even more, but for now this should be enough, and it is quite easy to detect if you try to use the step and it doesn't recognize the domain name)

comment:4 Changed 8 years ago by vorner

  • Owner changed from UnAssigned to vorner

comment:5 Changed 8 years ago by vorner

  • Total Hours changed from 0 to 0.5

Hello

Looks OK, please merge.

comment:6 Changed 8 years ago by jinmei

Not looking at the code, but does (can) it involve queries outside the
test node? If so, I think it should be self contained. Considering
the purpose is to just confirm it's actually listening, I think it
suffices to set up all-reject ACL and check the response is REFUSED.

comment:7 follow-up: Changed 8 years ago by vorner

No, all the queries are local now, because the address of l root server is hardcoded (I'm sure about it, my provider blocks all DNS traffic except to his own servers and the test passed here). This'll not be always the case, but it works for now and we'll need some more tests when we do more on the resolver.

comment:8 in reply to: ↑ 7 Changed 8 years ago by jinmei

Replying to vorner:

No, all the queries are local now, because the address of l root server is hardcoded (I'm sure about it, my provider blocks all DNS traffic except to his own servers and the test passed here). This'll not be always the case, but it works for now and we'll need some more tests when we do more on the resolver.

But doesn't that necessarily mean it never sends queries outside? Or
is that what you mean by "not always the case"?
(Besides, this sounds like the resolver returns records from its
"hints" if all external servers are unreachable. I suspect it's the
wrong behavior - at least should be incompatible with BIND 9. But of
course that's a different matter).

My point is that (IMO) it should never send queries outside the test
node, especially should not spam a public root server. It's not
whether the test succeeds when the node is disconnected (which is
itself important though).

Hmm, lettuce seems to be completely broken in my environment now, so I
cannot test it myself.

And btw, I noticed one minor editorial nit, so I fixed it.

comment:9 follow-up: Changed 8 years ago by vorner

  • Owner changed from vorner to jelte

Hello

For what I know, the current bug is even worse. It just returns the hints it has right away and the hints are hardcoded (I tried it in ISC, where there wasn't any blocking of DNS and it returned the l root server only and it answers here right away, so it couldn't have sent & timed out queries). This is definitely wrong.

And it's what I mean it won't always be the case, the bug will be fixed sometime in future. When it happens, we need to change this test. But until then, I believe it is OK this way.

(And I forgot to reassign the ticket back to Jelte the first time)

comment:10 in reply to: ↑ 9 Changed 8 years ago by jinmei

Replying to vorner:

For what I know, the current bug is even worse. It just returns the hints it has right away and the hints are hardcoded (I tried it in ISC, where there wasn't any blocking of DNS and it returned the l root server only and it answers here right away, so it couldn't have sent & timed out queries). This is definitely wrong.

And it's what I mean it won't always be the case, the bug will be fixed sometime in future. When it happens, we need to change this test. But until then, I believe it is OK this way.

Okay. While I'd rather make it more reliable from the beginning
e.g. by using all-reject ACL, which shouldn't be hard, I'd leave the
decision to the original developer and reviewer as long as it never
leaks the test query outside the test node.

comment:11 Changed 8 years ago by jelte

Just for clarity, right now it actually works (as in, is contained) because we are not doing priming right; we are not using root hints as a base for priming, but setting the address of a root server permanently in our cache; if we implement priming (one of the first tasks to do when we continue work on the resolver, imo), this test *would* send out queries (simply for starting a resolver). We will then also need to make priming so that we can configure the root hints and set it to localhost where we configure a fake root server as well (but currently, as we don't do real priming at all, this isn't possible).

What is running is an actual resolver, and any queries not for l.root-servers.net (btw, as a trivial task, shall we change it to f?) sent to it would result in actual queries. But it is configured to only listen on localhost port 47806, so I don't see this as a problem.

btw, I've also confirmed this experimentally; I ran this in an environment without any network connectivity, and one with network but with running. tcpdump reported no network activity, and as vorner already confirmed, without (dns) networking, the test succeeds as well.

comment:12 Changed 8 years ago by jelte

The suggestion to set the ACL did give me an idea for an additional test; this has been a past regression as well, so I am still convinced we need to do a 'real' query and see that NOERROR is returned.

So I've changed it a bit; initially, the server is started with acl REJECT, and it tests to see if REFUSED is the rcode. Then it sets it to ACCEPT, queries, checks NOERROR, and on success immediately sets REJECT again.

This not only checks whether acls work, but also reduces the window in which is is acting on queries.

comment:13 Changed 8 years ago by jelte

  • Resolution set to complete
  • Status changed from reviewing to closed

Merged, closing ticket

Note: See TracTickets for help on using tickets.