Opened 7 years ago

Closed 7 years ago

#2179 closed enhancement (fixed)

Update Stats to support partial statistics updates

Reported by: naokikambe Owned by: naokikambe
Priority: medium Milestone: Sprint-20120904
Component: statistics Version:
Keywords: Cc:
CVSS Scoring: Parent Tickets:
Sensitive: no Defect Severity: N/A
Sub-Project: DNS Feature Depending on Ticket:
Estimated Difficulty: 8 Add Hours to Ticket: 0
Total Hours: 37 Internal?: no

Description (last modified by naokikambe)

According to StatsModule#Partialstatisticsupdates, update the stats module to support partial statistics updates.

When the target module returns statistics data, the stats module should replace the existent one with the new one if it already has existent statistics data of the corresponding name. That is, the method update_statistics_data in stats.py should be updated.

This ticket depends on #2136. So this updating should be done based on the branch trac2136 if it hasn't merged with the master branch yet.

[Additional Description]

  • Update the stats module to update statistics items which each target module specifies. It doesn't need to discard unspecified item when each module replies. It needs to keep it.
  • Add tests so that it is clear that named_set type in a statistics spec is acceptable.
  • Update the stats module so that a configuration unique identifier, e.g. sss/ttt[i], is acceptable for using in a item name as an answer of the getstats command.

Subtickets

Change History (15)

comment:1 Changed 7 years ago by naokikambe

  • Owner set to naokikambe
  • Status changed from new to accepted

comment:2 Changed 7 years ago by shane

  • Milestone changed from New Tasks to StatsRedesign

comment:3 Changed 7 years ago by naokikambe

  • Owner changed from naokikambe to UnAssigned
  • Status changed from accepted to reviewing

The branch is ready for reviewing. I've created it based on the latest trac2136.
But since reviewing of trac2136 is going on, this branch might have to be revised due to the changes.
A proposed ChangeLog entry of this ticket is included in the branch.

comment:4 Changed 7 years ago by jelte

  • Milestone changed from StatsRedesign to Sprint-20120821

comment:5 Changed 7 years ago by naokikambe

  • Description modified (diff)

comment:6 follow-up: Changed 7 years ago by jinmei

I'm assuming it's better to review this ticket once #2136 is
completed.

BTW, I guess this is the stats counterpart of #2196. If so, I suspect
we should discuss what we should actually do about the "differential"
thing before proceeding to the actual implementation/review. As
briefly discussed in #2196 the proposed behavior doesn't make much
sense to me. Michal seems to have the same/similar concern.

comment:7 in reply to: ↑ 6 ; follow-up: Changed 7 years ago by naokikambe

Replying to jinmei:

I'm assuming it's better to review this ticket once #2136 is
completed.

I agree.

BTW, I guess this is the stats counterpart of #2196. If so, I suspect
we should discuss what we should actually do about the "differential"
thing before proceeding to the actual implementation/review. As
briefly discussed in #2196 the proposed behavior doesn't make much
sense to me. Michal seems to have the same/similar concern.

Strictly speaking, #2179 isn't a counterpart of #2196. #2179 matches with StatsModule but #2196 doesn't. "differential" in #2179 might exactly mean "partial". The current stats module can accept a part of statistics items. Thus, the target module, e.g. auth module, doesn't always need to return complete statistics items.

"differential" in #2196 means real "differential". When the auth module returns a counter by the "getstats_delta" command, then it reset it to zero. The current stats module cannot accept such a relative value. However two commands were implemented on #2196. The first one "getstats" returns absolute values of full statistics items. The second one "getstats_delta" returns relative values of partial statistics items.

IMO, "getstats" of #2196 should return absolute values of partial statistics items. "getstats_delta" is not needed for the current stats module yet. Implementing the stats module for supporting a real "differential" update would be done after the end of Sept. because of the complexity.

BTW, #2158 uses a part of codes of #2179. #2158 cannot be merged unless #2179 is merged.

Regards,

comment:8 in reply to: ↑ 7 Changed 7 years ago by naokikambe

Replying to naokikambe:

Implementing the stats module for supporting a real "differential" update would be done after the end of Sept. because of the complexity.

This is not directly related to this ticket. For supporting a real "differential" update, I think we should introduce a new attribute into the statistics spec. For example, this case is that a "item_aggregation" attribute is introduced (Everyone has likes or dislikes about that naming:):

{ "item_name": "name",
  "item_type": "type",
  "item_optional": True | False,
  "item_default": "something",
  "item_aggregation": "cumulative" | "earliest" | "latest"
}

The value of "item_aggregation" can be "cumulative" or "earliest" or "latest". "cumulative" means that multiple values are summable. This case matches with a value returned from the getstats_delta command of #2196. "earliest" means that once the value is set then the value is never replaced with other. "latest" means that the value is always replaced with the newest one. If this attribute is omitted, the default behavior should be "cumulative". Because most of statistics items in BIND 10 are counters, and such counter values should be summed.

comment:9 Changed 7 years ago by jelte

  • Owner changed from UnAssigned to jelte

comment:10 follow-up: Changed 7 years ago by jelte

  • Owner changed from jelte to naokikambe

First of all, i'd be clear and talk about 'partial' rather than 'differential' data.

Changelog:

  • see above (differential->partial)
  • and a few language improvements; i propose:

The stats module now supports partial statistics updates. Each
module can return only statistics data which have been updated since
the last time it sent them to the stats module. The purpose of partial
updates is to reduce the amount of statistics data sent through the
message queue.

And a side note: it's easier to propose a changelog ticket here (on trac) rather than through the branch, as it will most certainly cause an unnecessary conflict (just add it when you merge), and you'll need to update it to get the commit id anyway.

But since it's there I have a comment, we use tabs instead of 8 spaces :)

stats.py.in:

in merge_oldnew(), I'd check whether the types are the same, and raise an error if they arent (i'm guessing you could have some pretty unexpected results if they are not, for some reason, as you then just drop the old).

Also, I wonder if the lists version is useful; if they are of different length, can you be sure that the shorter list is the same as the start of the longer list?

comment:11 in reply to: ↑ 10 Changed 7 years ago by naokikambe

  • Description modified (diff)
  • Summary changed from Update Stats to support differential statistics updates to Update Stats to support partial statistics updates

Hello,

Replying to jelte:

First of all, i'd be clear and talk about 'partial' rather than 'differential' data.
Changelog:

  • see above (differential->partial)
  • and a few language improvements; i propose:

...

And a side note: it's easier to propose a changelog ticket here (on trac) rather than through the branch, as it will most certainly cause an unnecessary conflict (just add it when you merge), and you'll need to update it to get the commit id anyway.

But since it's there I have a comment, we use tabs instead of 8 spaces :)

Thank you very much! That proposal is OK for me. I also updated the file on the repository for consistency like that. And I also updated the ticket summary and description and the wikipage StatsModule: differential -> partial.

stats.py.in:
in merge_oldnew(), I'd check whether the types are the same, and raise an error if they arent (i'm guessing you could have some pretty unexpected results if they are not, for some reason, as you then just drop the old).

I'm not sure we should be so pessimistic. It checks validity by validate_statistics() before it examines the values by merge_oldnew(). So the types of two values must be same at the time when invoking merge_oldnew(). Even if the types are different, such type can be str or float or int or bool or None. In that case, I think it can just replace the old with the new without any exception.

Also, I wonder if the lists version is useful; if they are of different length, can you be sure that the shorter list is the same as the start of the longer list?

I think this case is very rare. If the new list is longer than the old, it should just replace the old with the new. Otherwise, if the old list is longer than the new, values of the latter indexes of the old list shouldn't be discarded, I think. Because that partial update is supported and the existing values not specified in argument should be preserved. For example, if we set the following two values:

> old = [1, 2, 3, 4]
> new = [5, 6]

then, we would get the following merged value:

> stats.merge_oldnew(old, new)
> [5, 6, 3, 4]

Regards,

comment:12 Changed 7 years ago by naokikambe

  • Owner changed from naokikambe to jelte

comment:13 follow-ups: Changed 7 years ago by jelte

  • Owner changed from jelte to naokikambe

the thing i was worried about for lists is that if you have for instance
[ 1, 2, 3, 4, 5 ]
And you only want to update the last three, you cannot do it

So I guess in practice lists will probably always be sent completely (or until we have a standardized json-diff representation)

Ok on the other points, this can be merged (assuming all its dependencies are)

comment:14 in reply to: ↑ 13 Changed 7 years ago by naokikambe

Replying to jelte:

the thing i was worried about for lists is that if you have for instance
[ 1, 2, 3, 4, 5 ]
And you only want to update the last three, you cannot do it
So I guess in practice lists will probably always be sent completely (or until we have a standardized json-diff representation)

That's right, we cannot do that by using merge_old_new() if we update only the last three values.

For supporting updates of that type, for example, the following representation can be used:

{
   'foo[2]' : 1,
   'foo[3]' : 2,
   'foo[4]' : 3
}

Here, the above array is assumed to be assigned to the variable foo on a spec file. It internally does isc.cc.data.set() for each element of the above dictionary instead of doing stats.merge_oldnew().

Ok on the other points, this can be merged (assuming all its dependencies are)

Thank you for reviewing! I will merge later.

comment:15 in reply to: ↑ 13 Changed 7 years ago by naokikambe

  • Resolution set to fixed
  • Status changed from reviewing to closed
  • Total Hours changed from 0 to 37

Thanks for your reviewing. I've merged. I'm closing.

Note: See TracTickets for help on using tickets.