Opened 9 years ago

Closed 8 years ago

#509 closed task (complete)

Define required statistics counters

Reported by: stephen Owned by: y-aharen
Priority: medium Milestone: Sprint-20111122
Component: statistics Version:
Keywords: Cc:
CVSS Scoring: Parent Tickets:
Sensitive: no Defect Severity: N/A
Sub-Project: DNS Feature Depending on Ticket:
Estimated Difficulty: 5.0 Add Hours to Ticket: 0
Total Hours: 0 Internal?: no

Description

Document what data should be collected

Change History (25)

comment:1 Changed 9 years ago by stephen

  • Milestone A-Team-Task-Backlog deleted

comment:2 Changed 9 years ago by y-aharen

  • Defect Severity set to N/A
  • Milestone set to New Tasks
  • Owner set to y-aharen
  • Status changed from new to accepted
  • Sub-Project set to DNS

I've created a wiki page which describes the statistics items. It is ready for review.

StatisticsItems

comment:3 Changed 9 years ago by y-aharen

  • Owner changed from y-aharen to UnAssigned
  • Status changed from accepted to reviewing

comment:4 Changed 9 years ago by shane

  • Milestone changed from New Tasks to Sprint-20110517

Moving this to the current sprint, so people can see that it is ready for review.

comment:5 Changed 9 years ago by stephen

  • Owner changed from UnAssigned to stephen

comment:6 Changed 9 years ago by stephen

  • Owner changed from stephen to UnAssigned

Can you add a key: what do "x" and "o" mean?

Requests
There is a group of items:

  • Authoritative queries rejected
  • Recursive queries rejected
  • Zone transfer requests rejected
  • Update queries rejected

... but there don't appear to be corresponding items for the number received (or accepted, although I think the total received would be an easier statistic to collect.)

Queries resulting in SERVFAIL, FORMERR etc. The IANA registry records RCODEs in the range 0-22 (with values 11-15 unassigned). Why not generalise the statistics and allow the retrieval of "Queries resulting in RCODE = n" with 24 bins (0 - 22 plus one for 'other').
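A minimal sketch of the 24-bin idea, assuming a simple in-process array of counters. The class and constant names are hypothetical and not part of any existing BIND 10 module; the 0-22 range is the one cited above.

```python
# Hypothetical sketch: one counter per RCODE 0-22, plus one 'other' bin.
MAX_TRACKED_RCODE = 22              # top of the RCODE range cited above
OTHER_BIN = MAX_TRACKED_RCODE + 1   # index 23 collects every other value


class RcodeCounters:
    """Counts responses by RCODE in 24 bins (0-22 plus 'other')."""

    def __init__(self) -> None:
        self.bins = [0] * (OTHER_BIN + 1)

    def record(self, rcode: int) -> None:
        # Values outside 0-22 (e.g. larger extended RCODEs) go to 'other'.
        if 0 <= rcode <= MAX_TRACKED_RCODE:
            self.bins[rcode] += 1
        else:
            self.bins[OTHER_BIN] += 1

    def get(self, rcode: int) -> int:
        """Return the count for 'Queries resulting in RCODE = n'."""
        if 0 <= rcode <= MAX_TRACKED_RCODE:
            return self.bins[rcode]
        return self.bins[OTHER_BIN]


counters = RcodeCounters()
for rc in (0, 0, 2, 3, 4095):   # NOERROR x2, SERVFAIL, NXDOMAIN, out of range
    counters.record(rc)
print(counters.get(0), counters.get(2), counters.get(3), counters.bins[OTHER_BIN])
# 2 1 1 1
```

The same structure would serve the resolver's "responses received" counters, with one instance per direction.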

Resolver
I'm not clear what "duplicated recursive queries" is.

"Dropped recursive queries". Presumably these are queries dropped because the server was too busy to handle them. In this case, should there be a similar statistic for the authoritative server ("Dropped authoritative queries")?

"NXDOMAIN/SERVFAIL/FORMERR/Other responses received" etc. As above, can this be generalised to allow for up to 24 RCODE values?

Clarification: I assume "Mismatch Responses Received" is when a reply is received for which the QNAME/QID does not match that of the question?

"Queries aborted due to quota control". What is this?

"Failures opening query sockets" - this really belongs in the "Sockets" section of the table.

DNSSEC: it occurs to me that a useful statistic (at least for general interest) would be an indication of the algorithms used when decoding a key. At the very least it would give useful information about the take-up of SHA256 over SHA1.
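A rough sketch of what such a statistic could look like, assuming the server simply tallies the algorithm field of each DNSKEY it processes. The function names are hypothetical; the numbers 5, 7 and 8 are the IANA assignments for RSASHA1, RSASHA1-NSEC3-SHA1 and RSASHA256, and other algorithms are reported by number.

```python
from collections import Counter

# Mnemonics for a few IANA DNSSEC algorithm numbers; others stay numeric.
ALGORITHM_NAMES = {5: "RSASHA1", 7: "RSASHA1-NSEC3-SHA1", 8: "RSASHA256"}

algorithm_counts = Counter()  # hypothetical per-algorithm counters


def record_key_algorithm(algorithm: int) -> None:
    """Increment the counter for the algorithm field of one DNSKEY."""
    algorithm_counts[algorithm] += 1


def report() -> dict:
    """Return counts keyed by mnemonic where known, otherwise by number."""
    return {ALGORITHM_NAMES.get(alg, str(alg)): n
            for alg, n in sorted(algorithm_counts.items())}


for alg in (8, 8, 5, 7, 13):
    record_key_algorithm(alg)
print(report())
# {'RSASHA1': 1, 'RSASHA1-NSEC3-SHA1': 1, 'RSASHA256': 2, '13': 1}
```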

RTT frequency table. Question - are the RTT bins passed with the request, or are they hard-coded in the server?
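To make the question concrete, here is a sketch in which the bin boundaries are a parameter rather than hard-coded, so they could come from configuration or from the request. The class name and the boundary values in the usage line are illustrative assumptions only.

```python
import bisect
from typing import List, Sequence


class RttHistogram:
    """Frequency table of round-trip times with caller-supplied bins.

    Boundaries are upper limits in milliseconds. Bin 0 holds rtt <= b[0],
    bin i holds b[i-1] < rtt <= b[i], and the last bin holds rtt > b[-1].
    """

    def __init__(self, boundaries_ms: Sequence[float]) -> None:
        self.boundaries = sorted(boundaries_ms)
        self.counts: List[int] = [0] * (len(self.boundaries) + 1)

    def record(self, rtt_ms: float) -> None:
        # bisect_left places an RTT equal to a boundary in that boundary's bin.
        self.counts[bisect.bisect_left(self.boundaries, rtt_ms)] += 1


hist = RttHistogram([10, 100, 500, 800, 1600])   # example boundaries only
for rtt in (3, 10, 42, 650, 2000):
    hist.record(rtt)
print(hist.counts)
# [2, 1, 0, 1, 0, 1]
```

If the bins were passed with the request instead, the server would have to keep raw samples (or a fine-grained table) and rebin on demand, which is more expensive than fixed bins.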

Sockets
Number of sockets closed: presumably the idea here is that the difference between the number opened and the number closed gives the total still open?
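If that is the intent, a sketch (hypothetical names) would keep two monotonic counters and derive the number still open rather than storing it:

```python
class SocketCounters:
    """Hypothetical sketch: monotonic open/close counters plus a derived gauge."""

    def __init__(self) -> None:
        self.opened = 0
        self.closed = 0

    def on_open(self) -> None:
        self.opened += 1

    def on_close(self) -> None:
        self.closed += 1

    @property
    def currently_open(self) -> int:
        # Derived, not stored: sockets opened minus sockets closed.
        return self.opened - self.closed


s = SocketCounters()
for _ in range(5):
    s.on_open()
s.on_close()
s.on_close()
print(s.opened, s.closed, s.currently_open)
# 5 2 3
```

Keeping both totals (rather than a single "open now" gauge) also lets a monitoring tool compute open and close rates over time.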

I've put this back to "UnAssigned" for review to encourage others to comment.

comment:7 Changed 9 years ago by zhanglikun

  • Owner changed from UnAssigned to zhanglikun

comment:8 Changed 9 years ago by zhanglikun

  • Owner changed from zhanglikun to UnAssigned

The following are my comments. I will keep the ticket unassigned so that it can get more comments or be closed by the next reviewer. :)

Resolver:

  • What does "Query timeouts" mean? The average query timeout?
  • Do we need to collect the resolver cache hit rate, or should that be handled by another module?
  • "TCP requests" is listed under Requests; maybe we need to add it to the resolver as well.

Sockets:

I don't know how it will be implemented, since the sockets belong to different modules, even though they can be created by the socketCreator.

Zones:
What is the meaning of "Number of SOA queries in progress"? Is it to check how many zone transfers will be triggered?

Msgq:
For the item "Number of timed out messages", do we need to distinguish between receive and send timeouts?

Cmdctl:
I have two more statistics items to suggest:

  • Last login time for each user.
  • Number of users who have logged in since cmdctl started.
Last edited 9 years ago by zhanglikun

comment:9 Changed 9 years ago by jelte

Just some minor comments in addition to the ones above:

I'd also suggest counting queries with DO=1 in Requests.

BTW, there was a question about what 'dropped' means. It could be either because the server is too busy, or because it couldn't resolve the query before a timeout. In the latter case, there are two timeouts we could keep statistics for.

comment:10 Changed 9 years ago by jelte

Those two timeouts are 'client timeout' (when a SERVFAIL is sent back, while the resolver keeps on resolving so it can cache it for subsequent queries), and 'lookup timeout' (when it really gives up because it just takes too long).
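A sketch of how the two timeouts could map onto two counters, assuming client_timeout < lookup_timeout. The dataclass, the function and the timeout values in the usage lines are hypothetical. A lookup that outlasts both timeouts increments both counters, since the client gets SERVFAIL first and the resolver gives up later.

```python
from dataclasses import dataclass


@dataclass
class ResolverTimeoutStats:
    client_timeouts: int = 0  # SERVFAIL sent to the client; resolution continued
    lookup_timeouts: int = 0  # resolution abandoned altogether


def record_lookup(stats: ResolverTimeoutStats, duration: float,
                  client_timeout: float, lookup_timeout: float) -> None:
    """Account for one lookup that took (or would have taken) `duration` seconds.

    Assumes client_timeout < lookup_timeout.
    """
    if duration >= client_timeout:
        # Client answered with SERVFAIL; resolver keeps going to fill the cache.
        stats.client_timeouts += 1
    if duration >= lookup_timeout:
        # Resolver stopped trying at lookup_timeout.
        stats.lookup_timeouts += 1


stats = ResolverTimeoutStats()
for d in (0.2, 3.0, 12.0):  # seconds
    record_lookup(stats, d, client_timeout=2.0, lookup_timeout=10.0)
print(stats)
# ResolverTimeoutStats(client_timeouts=2, lookup_timeouts=1)
```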

comment:11 Changed 9 years ago by y-aharen

  • Owner changed from UnAssigned to y-aharen

comment:12 Changed 8 years ago by y-aharen

  • Owner changed from y-aharen to UnAssigned

Thank you for your comments. I've updated the wiki page. It is ready for review.

I've moved the socket statistics into each module. We should decide how much compatibility with BIND 9 we want to ensure, since the number of items is rather large.

StatisticsItems

comment:13 Changed 8 years ago by zhanglikun

  • Owner changed from UnAssigned to zhanglikun

comment:14 Changed 8 years ago by jreed

I'd like to suggest that item names (as seen in bindctl etc.) use slashes instead of periods. For example:

auth.[zonename].qtype.ipseckey

to

auth/[zonename]/qtype/ipseckey

comment:15 Changed 8 years ago by jreed

Also consider some delimiter that cannot appear in a zone name (a slash may not be good either). But in a URL, a literal slash could be encoded as %2F.
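A sketch of both points, with hypothetical helper names. A DNS label may legitimately contain '/', for example in RFC 2317-style classless reverse zones such as 0/26.2.0.192.in-addr.arpa, so a plain '/'-joined item name becomes ambiguous. Percent-encoding each component with Python's urllib.parse.quote and safe="" turns a literal slash into %2F before the components are joined.

```python
from urllib.parse import quote

DELIMITER = "/"  # delimiter proposed in this ticket instead of "."


def item_name(*components: str) -> str:
    """Join components, e.g. auth/[zonename]/qtype/ipseckey (may be ambiguous)."""
    return DELIMITER.join(components)


def item_url_path(*components: str) -> str:
    """Percent-encode each component (safe='' so '/' becomes %2F), then join."""
    return DELIMITER.join(quote(c, safe="") for c in components)


print(item_name("auth", "example.com", "qtype", "ipseckey"))
# auth/example.com/qtype/ipseckey
print(item_url_path("auth", "0/26.2.0.192.in-addr.arpa", "qtype", "ipseckey"))
# auth/0%2F26.2.0.192.in-addr.arpa/qtype/ipseckey
```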

comment:16 Changed 8 years ago by zhanglikun

  • Owner changed from zhanglikun to UnAssigned

  1. To make my point from the f2f meeting clear: we may have too many stats items for one module or zone. If we divide the stats items into two types (basic, all), the system can run more quickly when it runs in basic stats mode.
  2. We will add views to BIND 10, but I didn't see views in the stats items (just a suggestion so we don't forget it :) ).
  3. Some possible stats items:

auth: served zone counts/names?
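A sketch of the basic/all split in point 1, assuming each counter is registered with a level and counters above the configured mode are skipped with a cheap early return. All class and counter names are hypothetical.

```python
from enum import IntEnum
from typing import Dict


class StatsLevel(IntEnum):
    BASIC = 1
    ALL = 2


class Counters:
    """Hypothetical sketch: counters above the configured level are not updated."""

    def __init__(self, mode: StatsLevel) -> None:
        self.mode = mode
        self.levels: Dict[str, StatsLevel] = {}
        self.values: Dict[str, int] = {}

    def register(self, name: str, level: StatsLevel) -> None:
        self.levels[name] = level
        self.values[name] = 0

    def inc(self, name: str) -> None:
        # Early exit for counters not enabled in the current mode.
        if self.levels[name] > self.mode:
            return
        self.values[name] += 1


c = Counters(StatsLevel.BASIC)
c.register("request.tcp", StatsLevel.BASIC)
c.register("qtype.ipseckey", StatsLevel.ALL)
c.inc("request.tcp")
c.inc("qtype.ipseckey")
print(c.values)
# {'request.tcp': 1, 'qtype.ipseckey': 0}
```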

I don't think I have covered everything, so I'm setting the ticket to unassigned to wait for further comments.

comment:17 Changed 8 years ago by naokikambe

Trivial comment: we should be careful about XSS in stats httpd if it displays part of a URL requested by a user.
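A minimal sketch of the precaution, assuming the page is built as a string in Python: escape any user-supplied part of the URL with the standard library's html.escape before embedding it. The function name and page layout are hypothetical, not the actual stats httpd code.

```python
import html


def render_not_found(requested_path: str) -> str:
    """Build an error page that echoes the requested path safely.

    html.escape() (quote=True by default) converts &, <, >, " and ' into
    character references, so the path cannot inject markup or script.
    """
    return ("<html><body><p>Item not found: "
            + html.escape(requested_path)
            + "</p></body></html>")


page = render_not_found("/<script>alert(1)</script>")
print(page)
```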

comment:18 Changed 8 years ago by jelte

  • Milestone changed from Sprint-20110927 to Sprint-20111011

comment:19 Changed 8 years ago by stephen

  • Owner changed from UnAssigned to stephen

comment:20 Changed 8 years ago by stephen

  • Owner changed from stephen to y-aharen

At the moment, there are three counters for incoming requests:

  • auth.[zonename].request.v{4|6}: Total number of requests over IPv4/6
  • auth.[zonename].request.tcp: Total number of TCP requests.

(The total number of UDP requests can be inferred from these counters.) Would it be better to alter the counters (and add another one) as follows:

  • auth.[zonename].request.udp{4|6}: Total number of UDP requests over IPv4/6
  • auth.[zonename].request.tcp{4|6}: Total number of TCP requests over IPv4/6.

The total numbers of IPv4 and IPv6 requests, as well as the total numbers of UDP and TCP requests, can still be derived, but in addition the counters give a more detailed breakdown.
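A sketch of the proposed four-counter scheme, with hypothetical names and an in-memory Counter standing in for the real statistics store, showing that the coarser totals fall out by simple addition:

```python
from collections import Counter
from typing import Dict

request_counters = Counter()  # hypothetical store for request.{udp,tcp}{4,6}


def count_request(transport: str, family: int) -> None:
    """Increment one of request.udp4, request.udp6, request.tcp4, request.tcp6."""
    if transport not in ("udp", "tcp") or family not in (4, 6):
        raise ValueError(f"unexpected transport/family: {transport}/{family}")
    request_counters[f"request.{transport}{family}"] += 1


def derived_totals(c: Counter) -> Dict[str, int]:
    """Recover per-family and per-transport totals from the four counters."""
    return {
        "v4": c["request.udp4"] + c["request.tcp4"],
        "v6": c["request.udp6"] + c["request.tcp6"],
        "udp": c["request.udp4"] + c["request.udp6"],
        "tcp": c["request.tcp4"] + c["request.tcp6"],
    }


for transport, family in [("udp", 4)] * 3 + [("tcp", 4), ("udp", 6), ("udp", 6)]:
    count_request(transport, family)
print(derived_totals(request_counters))
# {'v4': 4, 'v6': 2, 'udp': 5, 'tcp': 1}
```

With the current v4/v6/tcp counters only the UDP total (v4 + v6 - tcp) can be inferred; the split of TCP requests by address family is lost.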

For the resolver, the counters resolver.rdtype.xxx (where xxx is a, ns, mx etc.) are described as the number of responses received. Is that responses received from upstream servers, or should the description read "queries received"?

In the resolver, the counters resolver.requestv4, resolver.requestv6 and resolver.reqtcp give the number of IPv4, IPv6 and TCP requests received. For the same reasons as above, would it be better to create four counters: resolver.reqtcp{4|6} and resolver.requdp{4|6}?

resolver.qryrecursion. Is this needed in BIND 10, given that the resolver and the authoritative server are now separate and so will have separate counters?

comment:21 Changed 8 years ago by jelte

  • Milestone changed from Sprint-20111011 to Sprint-20111025

comment:22 Changed 8 years ago by y-aharen

  • Status changed from reviewing to accepted

comment:23 Changed 8 years ago by jelte

  • Milestone changed from Sprint-20111025 to Sprint-20111108

comment:24 Changed 8 years ago by jelte

  • Milestone changed from Sprint-20111108 to Sprint-20111122

comment:25 Changed 8 years ago by jelte

  • Resolution set to complete
  • Status changed from accepted to closed