#5589 closed defect (complete)

Implement StatLib, a new, non-premium hook library

Reported by: tmark Owned by: tmark
Priority: medium Milestone: Kea1.4
Component: hooks Version: git
Keywords: Cc:
CVSS Scoring: Parent Tickets:
Sensitive: no Defect Severity: N/A
Sub-Project: DHCP Feature Depending on Ticket: Shared Lease Statistics
Estimated Difficulty: 0 Add Hours to Ticket: 0
Total Hours: 0 Internal?: no

Description

Implement the new hook library described in section 4.2 of the Shared Lease Stats Design (http://kea.isc.org/wiki/SharedLeaseStorageStats).

(Depends on #5585)

Subtickets

Change History (10)

comment:1 Changed 19 months ago by tmark

  • Milestone changed from Kea-proposed to Kea1.4

comment:2 Changed 18 months ago by tmark

  • Owner set to tmark
  • Status changed from new to assigned

comment:3 Changed 18 months ago by tmark

  • Owner changed from tmark to UnAssigned
  • Status changed from assigned to reviewing

New library is complete, including developer and guide documentation.

One thing to note, I did not implement the "from-storage" input parameter mentioned in the design. After doing the Memfile implementation I decided that was practical value added.

ChangeLog:

13xx.   [bug]       tmark
    A new hooks library, Stat Cmds, has been added to the open source
    distribution.  This library provides commands for fetching lease
    allocation statistics using lease backend as the source for
    lease counts per state.  This resolves an issue in deployements, 
    where multiple Kea servers share a common lease backend, which made
    it difficult to obtain accurate lease statistics. The commands
    also provide the ability to get the statics for all subnets, per
    subnet, or subnet range.
    (Trac #5589, git TBD)

comment:4 Changed 17 months ago by marcin

  • Owner changed from UnAssigned to marcin

comment:5 follow-up: Changed 17 months ago by marcin

  • Owner changed from marcin to tmark

Reviewed up to commit 7614f221ea7a797bcbd4f938287ac6d3b4bbf990

The code builds and the unit tests pass for me on macOS 10.13.4.

Because the ChangeLog is long, I suggest that you drop the last sentence from it, as it is least important and goes into too much details.

Also, this is not a "bug" category of ChangeLog as it says.

I applied some changes to the Guide and to one of the .cc files so please pull.

doc/guide/hooks-stat-cmds.xml
I have applied numerous changes to this file, mostly correcting XML structure, rather than the text itself. Nevertheless, there are some little text corrections too.

Shouldn't User's Guide mention the name of the new logger added with stats library? Otherwise people won't know how to configure this logger.

Can you provide an example of the shared backend setup immediatelly following this sentence?

These commands were added to address an issue with obtaining accurate lease statistics in deployments running multiple Kea servers that use shared lease back end

Maybe something like this: ", e.g. two DHCP server instances pointing at the same MySQL lease database".

In the following example: "The following command would fetch lease statistcis for all known subnets from kea-dhcp4 server:"

is this required to specify empty "arguments" parameter or it may be skipped altogether?

You're listing response codes to the commands, i.e. "0 - command successful, 1 - an error ...." etc. First of all, it may not be neccessary to provide such list because it is common set of result codes for almost any commmand. If we add additional status codes you then have to remember to update this list too.

If you still think that it should remain, you need to fix error code for "no matching data" case, because it is 3 rather than 2. The code of 2 indicates that the command is not supported, which may also be needed on your list in case the library fails to load for any reason.

The desription of the "assigned-nas" and the "assigned-pds" says that it is a "REQUEST" case when the statistics is bumped up as a result of new lease allocation and that it is decreased in the "RELEASE" case. It is tricky in the DHCPv6 case because you may get new allocations as a result Renew/Rebind? (additional addresses allocated to additional IAs). You may also get new allocations for Solicit with Rapid Commit option. Finally, you may get "releases" as a result of Renew and Rebind as well. So, maybe just not go into too much detail here but say that it is bumped up for new allocations and decreased whenever a lease is removed.

Regarding the timestamp:

"timestamp": "2018-05-04 15:03:37.000000"

Are you actually going to use micro or milli seconds precision? Or it is always rounded to full seconds?

I am asking because in HTTP and HA I am using time formats per RFC 1123. Now I wonder if millisecond precision wouldn't be better for HA, however this is a minor point right now because the timestamps in HA are rather used intenally. Nevertheless, it may be good to unify the timestamp formats across various commands.

There seem to be a copy-paste error in the DHCPv6 response example. It contains the same result set as in the DHCPv4 case but the number of columns is larger (6 rather than 4).

src/hooks/dhcp/stat_cmds/stat_cmds.cc
If the NotFound exception and the LeaseStatCmdsImpl are used only within this file, they could be enclosed within an anonymous namespace.

createResultSet: provided you're using seconds (rather than microsec) clock for generating the timestamp the example provdided in the description of this function is probably wrong, as it contains non-zero decimal values.

getParameters: shouldn't you check that the subnet-id and subnet range are mutually exclusive?

Also, wouldn't it be better to allow specifying no arguments (for cases when people want stats for all subnets) rather than complaining about missing parameters?

src/hooks/dhcp/stat_cmds/stat_cmds.dox

src/hooks/dhcp/stat_cmds/stat_cmds.h
The copyright header is probably incorrect as it includes 2017 for a new file.

There seems to be no reason to include "cc/data.h" in this file.

I suggest using "statistics" instead of "stat" in the comments as it may not be immediatelly clear for everyone what "stat" is.

I am not very fond of providing command examples which do not include actual data, like "subnet-id": x, because some people may not be able to understand what type of data "x" is. If nothing else, the comment could explain that x must be a positive integer. It could also be something like "< int >" notation as it is for the message.

The description of the StatCmds constructor is wrong. It does nothing because individual functions create an instance of LeaseStatsCmdsImpl every time they are called.

Can you remove the impl_ declaration from the header file? If it gets to be needed at some point, you can always add it back.

src/hooks/dhcp/stat_cmds/stat_cmds_callouts.cc
I believe that the following statements may be harmlessly removed:

using namespace isc::config;
using namespace isc::data;

src/hooks/dhcp/stat_cmds/stat_cmds_messages.mes
I can hardly match the argument of STAT_LEASE4_GET and STAT_LEASE4_GET_FAILED (same for v6) with the actual text being provided to those log messages. What are the parameters in that case?

Examples of log messages I captured:

STAT_LEASE4_GET_FAILED stat-lease4-get command failed (parameters: malformed command, reason: 'subnet-id' parameter must be > 0)
STAT_LEASE4_GET stat-lease4-get command successful (parameters: stat-lease4-get: 5 rows found)

The log message doesn't make sense to me. Did you intend to say "result" instead?

Also, why aren't they prefixed with STAT_CMDS?

src/hooks/dhcp/stat_cmds/tests/Makefile.am
I am pretty sure you should link unit tests with the same set of libs as the tested code, because some linkers may report missing symbols otherwise.

src/hooks/dhcp/stat_cmds/tests/run_unittests.cc
The copyright date is wrong.

src/hooks/dhcp/stat_cmds/tests/stat_cmds_unittest.cc
Providing tests which load and unload libraries is usually a bad idea. I think that Francis had some clue how to test basic operations like loading and unloading the library in some other "framework". However, loading a library within a unit test would cause problems with logging initialization and deinitialization. That's because there will be two loggers existing at the same time, one loaded by the library, another loaded by the unit tests. They both attempt to register the same set of log messages and, when the library gets unloaded, it also removes the log messages (which are common for it and unit test). The net result is that subsequent unit test will complain about missing placeholders for the provided logging arguments, because they won't be aware of the messages being logged (and the corresponding placeholders).

What I am trying to do to maximize the code coverage by unit test is to simply test the "implementation" class, e.g. LeaseStatCmds directly, rather than load the hooks library.

I note that all your unit tests are currently loading the library which is preventing the problem described above. However, if you ever add any classes (which perform logging) and you want to test them directly that problem will occur. So, because it is currently harmless you may just apply a @todo in the tests code to say that when new classes are tested directly, the existing tests should be modified to not load the library. But, I still think it would be better to avoid loading the library in the new code, then come up with generic testing approach to be used for all libs.

addPrefix function has not description.

Extraneous using namespace isc::dhcp in line 588.

comment:6 in reply to: ↑ 5 Changed 17 months ago by tmark

  • Owner changed from tmark to marcin

This was a very thorough review, thank you.

Replying to marcin:

Reviewed up to commit 7614f221ea7a797bcbd4f938287ac6d3b4bbf990

The code builds and the unit tests pass for me on macOS 10.13.4.

Because the ChangeLog is long, I suggest that you drop the last sentence from it, as it is least important and goes into too much details.

Ok.

Also, this is not a "bug" category of ChangeLog as it says.

Tomek says to make it just [func].

I applied some changes to the Guide and to one of the .cc files so please pull.

Thanks

doc/guide/hooks-stat-cmds.xml
I have applied numerous changes to this file, mostly correcting XML structure, rather than the text itself. Nevertheless, there are some little text corrections too.

Ok.

Shouldn't User's Guide mention the name of the new logger added with stats library? Otherwise people won't know how to configure this logger.

I could not find where any of the other hooks libraries mentioned their loggers,
though they should. I added it to the "Currently defined loggers" section of the
main guide. I also changed the logger name to use hyphens not underscores as that
seems to be the proper naming for them.

Can you provide an example of the shared backend setup immediatelly following this sentence?

These commands were added to address an issue with obtaining accurate lease statistics in deployments running multiple Kea servers that use shared lease back end

Maybe something like this: ", e.g. two DHCP server instances pointing at the same MySQL lease database".

I added it but a little further down.

In the following example: "The following command would fetch lease statistcis for all known subnets from kea-dhcp4 server:"

is this required to specify empty "arguments" parameter or it may be skipped altogether?

Good catch. I did not really mean for it to be required. I altered the parameter parsing
so it can be omitted entirely and added tests for it. I also removed the empty arguments
paramter from the example in the doc.

You're listing response codes to the commands, i.e. "0 - command successful, 1 - an error ...." etc. First of all, it may not be neccessary to provide such list because it is common set of result codes for almost any commmand. If we add additional status codes you then have to remember to update this list too.

If you still think that it should remain, you need to fix error code for "no matching data" case, because it is 3 rather than 2. The code of 2 indicates that the command is not supported, which may also be needed on your list in case the library fails to load for any reason.

The desription of the "assigned-nas" and the "assigned-pds" says that it is a "REQUEST" case when the statistics is bumped up as a result of new lease allocation and that it is decreased in the "RELEASE" case. It is tricky in the DHCPv6 case because you may get new allocations as a result Renew/Rebind? (additional addresses allocated to additional IAs). You may also get new allocations for Solicit with Rapid Commit option. Finally, you may get "releases" as a result of Renew and Rebind as well. So, maybe just not go into too much detail here but say that it is bumped up for new allocations and decreased whenever a lease is removed.

I lifted these directly from the main admin guide section on statistics for v6. Rather
than attempt get into a long discussion of exactly how the values change I opted to
shorten them a bit.

Regarding the timestamp:

"timestamp": "2018-05-04 15:03:37.000000"

Are you actually going to use micro or milli seconds precision? Or it is always rounded to full seconds?

The format here is identical to how Stats Manager outputs stat times:

tmark@cserver socat $ cat stat.txt | sudo socat - UNIX:/tmp/kea-dhcp4 | python -m json.tool
{
    "arguments": {
        "declined-addresses": [
            [
                0,
                "2018-05-15 09:58:37.378110"
            ]
        ],
        "reclaimed-declined-addresses": [
            [
                0,
                "2018-05-15 09:58:37.378118"
            ]
        ],
        "reclaimed-leases": [
            :

I am asking because in HTTP and HA I am using time formats per RFC 1123. Now I wonder if millisecond precision wouldn't be better for HA, however this is a minor point right now because the timestamps in HA are rather used intenally. Nevertheless, it may be good to unify the timestamp formats across various commands.

There seem to be a copy-paste error in the DHCPv6 response example. It contains the same result set as in the DHCPv4 case but the number of columns is larger (6 rather than 4).

Oops. Spruced it up a bit.

src/hooks/dhcp/stat_cmds/stat_cmds.cc
If the NotFound exception and the LeaseStatCmdsImpl are used only within this file, they could be enclosed within an anonymous namespace.

Meh.

createResultSet: provided you're using seconds (rather than microsec) clock for generating the timestamp the example provdided in the description of this function is probably wrong, as it contains non-zero decimal values.

I've changed it to use the mircosec.

getParameters: shouldn't you check that the subnet-id and subnet range are mutually exclusive?

It does test for this, look closer. I did neglect to unit test scenarios for that. I
have added them.

Also, wouldn't it be better to allow specifying no arguments (for cases when people want stats for all subnets) rather than complaining about missing parameters?

See above changes.

src/hooks/dhcp/stat_cmds/stat_cmds.dox

src/hooks/dhcp/stat_cmds/stat_cmds.h
The copyright header is probably incorrect as it includes 2017 for a new file.

fixed.

There seems to be no reason to include "cc/data.h" in this file.

I suggest using "statistics" instead of "stat" in the comments as it may not be immediatelly clear for everyone what "stat" is.

I am not very fond of providing command examples which do not include actual data, like "subnet-id": x, because some people may not be able to understand what type of data "x" is. If nothing else, the comment could explain that x must be a positive integer. It could also be something like "< int >" notation as it is for the message.

I changed them to numbers.

The description of the StatCmds constructor is wrong. It does nothing because individual functions create an instance of LeaseStatsCmdsImpl every time they are called.

Can you remove the impl_ declaration from the header file? If it gets to be needed at some point, you can always add it back.

cleaned ctor and impl_.

src/hooks/dhcp/stat_cmds/stat_cmds_callouts.cc
I believe that the following statements may be harmlessly removed:

using namespace isc::config;
using namespace isc::data;

Got em.

src/hooks/dhcp/stat_cmds/stat_cmds_messages.mes
I can hardly match the argument of STAT_LEASE4_GET and STAT_LEASE4_GET_FAILED (same for v6) with the actual text being provided to those log messages. What are the parameters in that case?

Examples of log messages I captured:

STAT_LEASE4_GET_FAILED stat-lease4-get command failed (parameters: malformed command, reason: 'subnet-id' parameter must be > 0)
STAT_LEASE4_GET stat-lease4-get command successful (parameters: stat-lease4-get: 5 rows found)

The log message doesn't make sense to me. Did you intend to say "result" instead?

I have revamped these with my apologies. This was stuff I initially tweaked from lease_cmds and then forgot to go back and clean up. This required introducing a couple of new
messages and reworking the handlers a bit.

Also, why aren't they prefixed with STAT_CMDS?

They are now.

src/hooks/dhcp/stat_cmds/tests/Makefile.am
I am pretty sure you should link unit tests with the same set of libs as the tested code, because some linkers may report missing symbols otherwise.

src/hooks/dhcp/stat_cmds/tests/run_unittests.cc
The copyright date is wrong.

Fixed.

src/hooks/dhcp/stat_cmds/tests/stat_cmds_unittest.cc
Providing tests which load and unload libraries is usually a bad idea. I think that Francis had some clue how to test basic operations like loading and unloading the library in some other "framework". However, loading a library within a unit test would cause problems with logging initialization and deinitialization. That's because there will be two loggers existing at the same time, one loaded by the library, another loaded by the unit tests. They both attempt to register the same set of log messages and, when the library gets unloaded, it also removes the log messages (which are common for it and unit test). The net result is that subsequent unit test will complain about missing placeholders for the provided logging arguments, because they won't be aware of the messages being logged (and the corresponding placeholders).

What I am trying to do to maximize the code coverage by unit test is to simply test the "implementation" class, e.g. LeaseStatCmds directly, rather than load the hooks library.

I note that all your unit tests are currently loading the library which is preventing the problem described above. However, if you ever add any classes (which perform logging) and you want to test them directly that problem will occur. So, because it is currently harmless you may just apply a @todo in the tests code to say that when new classes are tested directly, the existing tests should be modified to not load the library. But, I still think it would be better to avoid loading the library in the new code, then come up with generic testing approach to be used for all libs.

I have opted for the todo at this point. Given our current schedule and doing what you suggest would require a pretty big
refactor I'm going to defer it.

addPrefix function has not description.

Fixed.
Done

Extraneous using namespace isc::dhcp in line 588.

Gone.

Ticket is ready for review.

comment:7 follow-up: Changed 17 months ago by marcin

  • Owner changed from marcin to tmark

doc/guide/hooks-stat-cmds.xml
I fixed a typo in this doc, so please pull my change.

You still haven't addressed the issue with listing error codes returned in response to stat commands. The status code of 2 means "unsupported command" rather than "empty response". See my previous review.

You've added this this logger: kea-dhcp4.stat-cmds-hooks but there is no corresponding logger for v6 added in the doc.

src/hooks/dhcp/stat_cmds/stat_cmds_messages.mes
Minor point, but why did you rename: STAT_CMDS_DEINIT_OK to STAT_CMDS_CMDS_DEINIT_OK ? Wasn't the former better because shorter?

The "Extraneous using namespace isc::dhcp" is still there in line 594.

comment:8 in reply to: ↑ 7 Changed 17 months ago by tmark

  • Owner changed from tmark to marcin

Replying to marcin:

doc/guide/hooks-stat-cmds.xml
I fixed a typo in this doc, so please pull my change.

Thanks

You still haven't addressed the issue with listing error codes returned in response to stat commands. The status code of 2 means "unsupported command" rather than "empty response". See my previous review.

Dang it. You complained about so many things, it's no wonder I missed a few. ;)

You've added this this logger: kea-dhcp4.stat-cmds-hooks but there is no corresponding logger for v6 added in the doc.

src/hooks/dhcp/stat_cmds/stat_cmds_messages.mes
Minor point, but why did you rename: STAT_CMDS_DEINIT_OK to STAT_CMDS_CMDS_DEINIT_OK ? Wasn't the former better because shorter?

Oops. Fixed.

The "Extraneous using namespace isc::dhcp" is still there in line 594.

I know I deleted it once. Apparently that line had the sticky bit set.

Please re-review.

comment:9 Changed 17 months ago by marcin

  • Owner changed from marcin to tmark

I revewied commit 71131063fd0eed2967348ffa911eea51cf6ea49d. The changes look good. Go ahead and merge it.

comment:10 Changed 17 months ago by tmark

  • Resolution set to complete
  • Status changed from reviewing to closed

Changes merged with git# 36f20f1c8b28f629fe2896b817ac0f3c6026fe0e
Added ChangeLog? entry 1400.

ticket is complete.

Note: See TracTickets for help on using tickets.