Opened 9 years ago

Closed 9 years ago

#495 closed enhancement (complete)

Hook up NSAS to iterator/cache

Reported by: shane Owned by: jelte
Priority: medium Milestone: R-Team-Sprint-20110316
Component: resolver Version:
Keywords: Cc:
CVSS Scoring: Parent Tickets:
Sensitive: no Defect Severity:
Sub-Project: Feature Depending on Ticket:
Estimated Difficulty: 3.0 Add Hours to Ticket: 0
Total Hours: 0 Internal?: no

Description

Hook up NSAS to iterator/cache

  • Update NSAS based on child response

Subtickets

Change History (7)

comment:1 Changed 9 years ago by jelte

  • Owner set to jelte
  • Status changed from new to assigned

comment:2 Changed 9 years ago by jelte

  • Owner changed from jelte to UnAssigned
  • Status changed from assigned to reviewing

it's alive! ALIVE!!!
(and ready for review)

notable changes that were needed for the actual implementation:

  • removed the 'dummy' asiolink from nsas (it even had a different api than the real one!)
  • removed a few unused parameters in asio (!) and asiolink
  • move recursive_query itself to lib/resolve
  • added NXRRSET result (name exists, but no data for this type) to the possible classifications in response_classifier
  • lots of changes in RunningQuery? (naturally, one would almost say)
  • added a 'fake prime' to resolver; just a hardcoded packet to initialize the cache with a root address

parent of the first commit is ba727c17d3517232a3c40fa3a30c6924a30ed7dc
current head is 87308eeddf767dea581b817d1d2ab2aaa4a99dd3

There are some open todo's, but I think we should have those as separate tasks (so as not to make this one bigger than it already is):

  • RecursiveQuery? and RunningQuery? (both of which could use a rename btw), get quite a lot of arguments, i think we should have something like an ResolverOptions? class for most of these.
  • General cleanup of RunningQuery?; mainly moving methods and names around (didn't do this now so you can still diff the old lib/asiolink/recursive_query.cc and the new lib/resolve/recursive_query and see what the actual changes are now)
  • Real priming (does not need to be extensive, just something that actually sends a priming query when needed and caches the result)
  • Add control and configuration options for the cache (cache_size, flush, etc)
  • This ticket add the use of the NSAS, but it isn't perfect yet i think (i.e. general recursive behaviour)
  • We need to define what exactly to do when in 'forwarder' mode
  • and of course glue correction, which is already a ticket

comment:3 Changed 9 years ago by stephen

  • Owner changed from UnAssigned to stephen

comment:4 follow-up: Changed 9 years ago by stephen

  • Owner changed from stephen to jelte

it's alive! ALIVE!!!

Congratulations Dr Frankenstein. Does this mean that instead of bug reports we'll get users carrying flaming torches hammering on the doors of 950 Charter Street?

src/bin/resolver/main.cc
Why is the try/catch commented out?

src/lib/asiolink/io_address.h
Do we need "#include <iostream>" here? There is nothing in the header file that requires it.

src/lib/nsas/nameserver_address_store.{cc,h}
In cancel(), the "callback" argument can be passed by reference.

src/lib/nsas/zone_entry.{cc,h}
removeCallback(): are we sure at this point that this is the only instance of this particular callback object in the list? (addCallback just pushes the callback on the end of the callbacks_[family] vector without checking whether it already exists in the list.)

The callback argument can be passed by reference.

src/lib/resolve/recursive_query.{cc,h}
Should rename the "#ifdef" variable in recursive_query.h to __RECURSIVE_QUERY_H

send(): should add a TODO to change the call to rand() to a call to a Boost PRNG (or a yet-to-be provided utilities "rand()" equivalent using Boost).

handleRecursiveAnswer(): suggest a TODO be added to the effect that cur_zone_ should be a "Name" object instead of a string. (I think you made a comment like this in another ticket and I agree - we should try to avoid conversions to strings if not required.)

RunningQuery constructor: it should be possible to initialize nsas_callback_ with a plain "new ResolverNSASCallback(this)" and not use an intermediate shared_ptr.

Comment. The outstanding events pointer is fine, but is there a way we could handle the reference-counting semi-automatically, e.g. passing a shared_ptr to the RunningQuery as an argument to the callbacks? Then the object should be automatically deleted when all callbacks have fired without the need to maintain our own count of outstanding events.

operator(): when calculating the elapsed time, do you need a test before doing the calculation? (Also, the test is wrong - the AND should be an OR.)

recursive_query_unittest.cc
The test_data variable encodes the count in the first two bytes of the sent data. As test_data is a byte array, the data sent should be in the order given. However, the first two bytes of a TCP message are interpreted as a 16-bit value and converted from network byte order to host byte order. Is there a chance that on some machines, the byte order will be flipped? (However, in the rest of the test the first two bytes are not interpreted as a value, so perhaps it doesn't matter.)

We should add some tests that check whether the cache was updated on successful queries (and that invalid replies were not added).

comment:5 in reply to: ↑ 4 Changed 9 years ago by jelte

  • Owner changed from jelte to stephen

Replying to stephen:

it's alive! ALIVE!!!

Congratulations Dr Frankenstein. Does this mean that instead of bug reports we'll get users carrying flaming torches hammering on the doors of 950 Charter Street?

as long as the angry mob goes there, i'm ok with it ;)

src/bin/resolver/main.cc
Why is the try/catch commented out?

arg, i was so happy i finally got it working i forgot to remove some changes i needed for debugging it... (on a side note, catch-alls like these, while nice-to-look-at from a user point of view, are really annoying if you want to know what is wrong, but i think i've ranted about that a couple of times before. This catch block is usually the first i remove when i need to fix something...)

src/lib/asiolink/io_address.h
Do we need "#include <iostream>" here? There is nothing in the header file that requires it.

nope, see previous point, forgot to remove it, sorry

src/lib/nsas/nameserver_address_store.{cc,h}
In cancel(), the "callback" argument can be passed by reference.

ack

src/lib/nsas/zone_entry.{cc,h}
removeCallback(): are we sure at this point that this is the only instance of this particular callback object in the list? (addCallback just pushes the callback on the end of the callbacks_[family] vector without checking whether it already exists in the list.)

oh ok, i'll let it loop through the whole vector

The callback argument can be passed by reference.

src/lib/resolve/recursive_query.{cc,h}
Should rename the "#ifdef" variable in recursive_query.h to __RECURSIVE_QUERY_H

done

send(): should add a TODO to change the call to rand() to a call to a Boost PRNG (or a yet-to-be provided utilities "rand()" equivalent using Boost).

added

handleRecursiveAnswer(): suggest a TODO be added to the effect that cur_zone_ should be a "Name" object instead of a string. (I think you made a comment like this in another ticket and I agree - we should try to avoid conversions to strings if not required.)

added (to the member declaration as well)

RunningQuery constructor: it should be possible to initialize nsas_callback_ with a plain "new ResolverNSASCallback(this)" and not use an intermediate shared_ptr.

ack, done

Comment. The outstanding events pointer is fine, but is there a way we could handle the reference-counting semi-automatically, e.g. passing a shared_ptr to the RunningQuery as an argument to the callbacks? Then the object should be automatically deleted when all callbacks have fired without the need to maintain our own count of outstanding events.

and then put the 'final handling' (i.e. call our own callback if not done yet, etc), in the destructor of runningquery? Interesting, I'll need to think about that.

operator(): when calculating the elapsed time, do you need a test before doing the calculation? (Also, the test is wrong - the AND should be an OR.)

I think so, as we don't want the difference to be negative or 0, which are results that are possible with gettimeofday. I think there are some clock-tick-dependent time functions that wouldn't have this property, but I don't think any of those are as portable.
AND should indeed be OR, good catch.

recursive_query_unittest.cc
The test_data variable encodes the count in the first two bytes of the sent data. As test_data is a byte array, the data sent should be in the order given. However, the first two bytes of a TCP message are interpreted as a 16-bit value and converted from network byte order to host byte order. Is there a chance that on some machines, the byte order will be flipped? (However, in the rest of the test the first two bytes are not interpreted as a value, so perhaps it doesn't matter.)

perhaps not here, but i was wondering this about our readUint16() method in buffer actually

We should add some tests that check whether the cache was updated on successful queries (and that invalid replies were not added).

I've added this as a TODO as well. But I'm only being half-lazy; We could add said check to the current disabled tests that do actual lookups, but even those don't work at this point (within the context of these tests, those lookups result in SERVFAIL), and I think we really do need that framework to even be able to test this right.

Updates in commit 454739d105be01d7b9711038ca68271d0f4fd02c

comment:6 Changed 9 years ago by stephen

  • Owner changed from stephen to jelte

operator(): when calculating the elapsed time, do you need a test before doing the calculation?
:

I think so, as we don't want the difference to be negative or 0, which are results that are possible with gettimeofday. I think there are some clock-tick-dependent time functions that wouldn't have this property, but I don't think any of those are as portable.

As both cur_time and current_ns_qsent_time are obtained via calls to gettimeofday() (which should provide a monotonically increasing time), we should not need to worry about a negative difference. It could be a zero, in which case a check after the calculation (the statement "rtt = std::max(1, rtt)"?) is required.

but i was wondering this about our readUint16() method in buffer actually

I've added a line to ResolverTestingBrainstorming about checking TCP communication between machines of different "endian-ness" to address this.

All OK, please merge.

comment:7 Changed 9 years ago by jelte

  • Resolution set to complete
  • Status changed from reviewing to closed
Note: See TracTickets for help on using tickets.