Opened 8 years ago

Closed 8 years ago

#2042 closed enhancement (complete)

Prototype backend performance microbenchmark: MySQL

Reported by: tomek Owned by: tomek
Priority: medium Milestone: Sprint-DHCP-20120903
Component: database-all Version:
Keywords: Cc:
CVSS Scoring: Parent Tickets:
Sensitive: no Defect Severity: N/A
Sub-Project: DHCP Feature Depending on Ticket:
Estimated Difficulty: 0 Add Hours to Ticket: 0
Total Hours: 0 Internal?: no

Description

Goal of this task is to develop a simple backend prototype that will measure performance of typical DHCP database operation:

  • write X leases
  • search a lease in X leases Y times
  • delete Z leases out of X leases

This ticket is about backend that will use MySQL. See #2040 and #2041 for corresponding work with other backends.

Subtickets

Change History (15)

comment:1 Changed 8 years ago by tomek

  • Component changed from Unclassified to dhcpdb
  • Milestone changed from New Tasks to Sprint-DHCP-20120703
  • Sub-Project changed from DNS to DHCP

comment:2 Changed 8 years ago by tomek

  • Type changed from defect to enhancement

comment:3 Changed 8 years ago by tomek

  • Owner set to tomek
  • Status changed from new to accepted

comment:4 Changed 8 years ago by tomek

  • Owner changed from tomek to UnAssigned
  • Status changed from accepted to reviewing

MySQL benchmark has been implemented. As this is not the production code and is expected to be used only internally, it does not feature tests or extensive comments.

Please review.

comment:5 Changed 8 years ago by tomek

Reviewer can review trac2040 branch that includes all changes from tickets 2041 and 2042.

comment:6 Changed 8 years ago by stephen

  • Owner changed from UnAssigned to stephen

comment:7 follow-ups: Changed 8 years ago by stephen

  • Owner changed from stephen to tomek

Reviewed commit 6b80e1e57145e8b0796d872f51ca39a0f4bbc17a

Note: a number of small changes (mainly to comments and word spacing) have been made and the results pushed to the repository.

General
If these benchmarks are going to form part of the BIND-10 distribution, then the source files do need comments and the methods need unit tests. (The absence of unit tests was agreed on the assumption that they were not going to be distributed.)

It is OK that they do not form part of the overall BIND 10 build - they are completely separate and MySql is not likely to be installed on all systems.

The most important change needed to get realistic results is to try prepared statements in the database measurements.

tests/tools/dhcp-ubench/Makefile
When looking at it, my editor complained about the lack of a newline at the end of the file, so I added one. I've also corrected a couple of minor errors concerning which flags are on which command line.

tests/tools/dhcp-ubench/README
OK, but I think there could perhaps be a fuller explanation if this will be part of the distributed code.

tests/tools/dhcp-ubench/benchmark.cc
When parsing an integer, suggest that boost::lexical_cast is used. The problem with strtol() is that it ignores trailing invalid characters in a number (although they can be detected by checking the value returned in the endptr variable).

When parsing command switches 's' and 'v', the "if" block could be replaced by:

<variable> = !strcasecmp(optarg, "yes") || !strcmp(optarg, "1");

There is no need for the "case ':'" immediately before "default" - the latter covers it.

tests/tools/dhcp-ubench/benchmark.h
Member variable names should start with a lower-case letter.

Variable "ts" should be named "ts_" as it is a member variable.

tests/tools/dhcp-ubench/dhcp-perf-guide.{html,xml}
Good start. We'll do an editing pass on it once there is more there.

tests/tools/dhcp-ubench/memfile_ubench.cc
Member variable names should start with a lower-case letter.

tests/tools/dhcp-ubench/memfile_ubench.h
Member variable names should start with a lower-case letter.

Members of Lease4 structure should include inline comments.

getLease(): "if" block should be enclosed in braces.

tests/tools/dhcp-ubench/mysql_ubench.cc
The variables initialized in createLease4Test() look a lot like those in the memory benchmark: they could perhaps be member variables in the base class and initialized in the constructor.

The benchmark does not make optimum use of MySql. Creating an SQL statement via sprintf() and and passing it to mysql_real_query() incurs overhead in formatting the binary data, as well as overhead in repeatedly parsing the statement and preparing the execution plan. Faster is to use mysql_stmt_init() and mysql_stmt_prepare() to prepare the statement, mysql_stmt_bind_param() to bind parameters to it and mysql_stmt_execute() to execute it. The first two calls need only be done once for the loop. See http://dev.mysql.com/doc/refman/5.5/en/c-api-prepared-statements.html for more details.

hitRatio is 0.5 for the memory benchmark and 0.9 for the MySql one. To eliminate some uncertainty in the comparison, they should be the same for each. (In fact, the overall logic of the benchmark could be in the base class, with the access to the underlying database abstracted into methods in the database specialization classes).

tests/tools/dhcp-ubench/mysql_ubench.h
Member variable names should start with a lower-case letter.

tests/tools/dhcp-ubench/sqlite_ubench.cc
Same point concerning the use of prepared statements made above applies here: see http://www.sqlite.org/c3ref/prepare.html.

Also applicable here are the points made in the mySql about createLease4() data initializations and the hitRatio value.

tests/tools/dhcp-ubench/sqlite_ubench.h
Member variable names should start with a lower-case letter.

comment:8 in reply to: ↑ 7 Changed 8 years ago by tomek

Replying to stephen:

Reviewed commit 6b80e1e57145e8b0796d872f51ca39a0f4bbc17a

Note: a number of small changes (mainly to comments and word spacing) have been made and the results pushed to the repository.

Thank you.


General
If these benchmarks are going to form part of the BIND-10 distribution, then the source files do need comments and the methods need unit tests. (The absence of unit tests was agreed on the assumption that they were not going to be distributed.)

I take it as a suggestion to write missing comments. The code is now properly documented.

It is OK that they do not form part of the overall BIND 10 build - they are completely separate and MySql is not likely to be installed on all systems.

Ok.

The most important change needed to get realistic results is to try prepared statements in the database measurements.

Still TBD.

tests/tools/dhcp-ubench/Makefile
When looking at it, my editor complained about the lack of a newline at the end of the file, so I added one. I've also corrected a couple of minor errors concerning which flags are on which command line.

Thank you.

tests/tools/dhcp-ubench/README
OK, but I think there could perhaps be a fuller explanation if this will be part of the distributed code.

The idea here is to have a single point where detailed documentation is maintained (the performance guide) with other types of documentation (like README) kept to absolute minimum. All the details are in the performance guide. It will be much easier to maintain in the long run.

tests/tools/dhcp-ubench/benchmark.cc
When parsing an integer, suggest that boost::lexical_cast is used. The problem with strtol() is that it ignores trailing invalid characters in a number (although they can be detected by checking the value returned in the endptr variable).

Done.

When parsing command switches 's' and 'v', the "if" block could be replaced by:

<variable> = !strcasecmp(optarg, "yes") || !strcmp(optarg, "1");

That is shorter, but less readable IMHO. Done as suggested.

There is no need for the "case ':'" immediately before "default" - the latter covers it.

Removed.

tests/tools/dhcp-ubench/benchmark.h
tests/tools/dhcp-ubench/mysql_ubench.h
tests/tools/dhcp-ubench/sqlite_ubench.h
tests/tools/dhcp-ubench/memfile_ubench.h
Member variable names should start with a lower-case letter.

Converted all member variables to lower-case with trailing underscore.

Members of Lease4 structure should include inline comments.

Lease4 structure is now properly documented.

getLease(): "if" block should be enclosed in braces.

Fixed.

tests/tools/dhcp-ubench/mysql_ubench.cc
The variables initialized in createLease4Test() look a lot like those in the memory benchmark: they could perhaps be member variables in the base class and initialized in the constructor.

Using exactly the same values in all back-ends serve no purpose, except aesthetic. It does not impact performance in any way. Refactoring the code to use exactly the same values in every benchmark will not be productive. If you think otherwise, please create separate ticket for that.


hitRatio is 0.5 for the memory benchmark and 0.9 for the MySql one. To eliminate some uncertainty in the comparison, they should be the same for each. (In fact, the overall logic of the benchmark could be in the base class, with the access to the underlying database abstracted into methods in the database specialization classes).

hitratio is now a member variable of the base class. Regarding moving overall logic of the base class - it would only complicate the design. Once the compiled statement support is implemented, where whould you suggest to keep compiled statement? It would have to be shared between calls. That is obviously a local information, so it should not be promoted to class variable.

Again, it is a trade off between perfect design and time constraints. While I agree in principle that refactoring the code until it is a model C++ implementation would be nice, but I'm sure our customers perfer working DHCP server by end of the year.

If you still think we have enough time for this, please create a ticket.

tests/tools/dhcp-ubench/sqlite_ubench.cc
Also applicable here are the points made in the mySql about createLease4() data initializations and the hitRatio value.

hitratio_ moved to base class.

That is a response with all comments, except compiled statements. I will address that in the next comment.

comment:9 in reply to: ↑ 7 Changed 8 years ago by tomek

Replying to stephen:

The benchmark does not make optimum use of MySql. Creating an SQL statement via sprintf() and and passing it to mysql_real_query() incurs overhead in formatting the binary data, as well as overhead in repeatedly parsing the statement and preparing the execution plan. Faster is to use mysql_stmt_init() and mysql_stmt_prepare() to prepare the statement, mysql_stmt_bind_param() to bind parameters to it and mysql_stmt_execute() to execute it. The first two calls need only be done once for the loop. See http://dev.mysql.com/doc/refman/5.5/en/c-api-prepared-statements.html for more details.

Support for compiled statements in MySQL has been added in commits:
6a212576107d4b72f31249850b634eb6f7fec5ce
8c3cf6602f8977639b544fefbc6eb869b01eab07

I was disappointed at the results. I haven't run extensive tests, but quick runs should around 10% improvement with compiled statement. It is paid for with increased code complexity.

Both query types ("classic" and compiled statements) has been implemented within the same methods, e.g. createLease4Test() with several "if (compiled_stmt_) {...}" clauses. It would be cleaner to implement them as separate methods, but then it would be difficult to maintain the same parameters in both approaches.

Ok, now on to the last thing: compiled statements in SQLite.

comment:10 Changed 8 years ago by tomek

  • Owner changed from tomek to UnAssigned

Support for compiled statements in SQLite has been added. They are now enabled by default. There is an additional command-line parameter (-c 0|1) that controls whether compiled statements should be used or not.

It is recommended to start the review by reading tests/tools/dhcp-ubench/dhcp-perf-guide.html first.

The code is ready for review.

comment:11 Changed 8 years ago by marcin

  • Owner changed from UnAssigned to marcin

comment:12 follow-up: Changed 8 years ago by marcin

  • Owner changed from marcin to tomek

I have reviewed commit 272d9f6f5bb929d81f1236ad953378d145225ade.

General
Minor: In the code you're using mixture of std::cout and printf. Im my opinion it would be "cleaner" to use one of them all the time. In any case, this is not very important.

tests/tools/dhcp-ubench/dhcp-perf-guide.html
The information in the document is very useful (especially the results presented on the graph).
One concern I usually have with benchmarks is how consistent the results are. In other words, if I run this test a couple of times in turn, would it give the same result?

tests/tools/dhcp-ubench/benchmark.h
Suggest that void print_clock(...) is ranamed to printClock(...) so as it is consistent with naming convention used for other functions. The same applies to struct timespec get_time().

In doxygen comments to several functions there is a reference to Num_ member that does not exist. It should be replaced with num_ which is declared as uint32_t num_ in protected scope.

tests/tools/dhcp-ubench/benchmark.cc
Although it is not a big deal when calling it once, but the following assignment hitratio_ = 0.9f in constructor's body could be moved to constructor's initialization list.

parseCmdline(): Error string "Failed to iterations (-n option)." does not make sense to me.

parseCmdLine(): The lexical_cast<int> prevents specifying non-numeric arguments but it will not prevent negative arguments like -n-20. Suggest additional check that num_ > 0.

usage(): Parameter -c is not documented.

get_time(): clock_gettime may return negative value which indicates error. This should be checked prior to returning ts. If it occurs it is fatal error.

get_time(): This is a side note. I have been using boost::posix_time::ptime class which I found very useful and more portable than clock_gettime. With the latter you have to take care of ifdefs for particular OSes.

tests/tools/dhcp-ubench/memfile_ubench.cc
The class members Filename_, Sync_ and File_ should have names starting with lowercase letter.

createLease4Test(): Not very important here but it is recommended to use ++i in for() loops rather than i++. The same thing can be found in multiple other places too.

createLease4Test(): For people who are less familiar with the code it would be good to have comments what the magic numbers: 19, 65, 33 are. The other solution would be to define constants with meaningful names.

searchLease4Test(): The variable hitRatio should be renamed to hit_ratio. Otherwise it suggests that hitRatio is a function.

tests/tools/dhcp-ubench/mysql_ubench.h
MySQL_uBenchmark constructor: descirption of parameter hostname says "Name of the hostname to connect to". Suggest that this is changed to "Name of the host to connect to".

tests/tools/dhcp-ubench/mysql_ubench.cc
createLease4Test(): Comment at pool_id declaration does not make sense to me. Variable is initialized to 1000 while comments says 'Let's use pools 0-99'

createLease4Test(): Again, it would be good to have some more comments describing magic numbers: 19,65 etc.

tests/tools/dhcp-ubench/sqlite_ubench.cc
disconnect(): Function closes connection if db_ object is does nothing if db_ object does not exist. It could be better maybe to throw exception (say InvalidOperation?) if db_ does not exist to inform caller that nothing has been done?> The mysql_ubench::disconnect() behaves that way.

search_callback(): Shouldn't be the code under #ifdef 0 executed in verbose mode and not executed in non-verbose?

comment:13 in reply to: ↑ 12 ; follow-up: Changed 8 years ago by tomek

  • Owner changed from tomek to marcin

Replying to marcin:

I have reviewed commit 272d9f6f5bb929d81f1236ad953378d145225ade.

General
Minor: In the code you're using mixture of std::cout and printf. Im my opinion it would be "cleaner" to use one of them all the time. In any case, this is not very important.

All converted to cout.

tests/tools/dhcp-ubench/dhcp-perf-guide.html
The information in the document is very useful (especially the results presented on the graph).
One concern I usually have with benchmarks is how consistent the results are. In other words, if I run this test a couple of times in turn, would it give the same result?

Good point. There's planned re-run with compiled statements implemented. I will rerun the tests 10 times and will provide average.

tests/tools/dhcp-ubench/benchmark.h
Suggest that void print_clock(...) is ranamed to printClock(...) so as it is consistent with naming convention used for other functions. The same applies to struct timespec get_time().

Done.

In doxygen comments to several functions there is a reference to Num_ member that does not exist. It should be replaced with num_ which is declared as uint32_t num_ in protected scope.

Done.

tests/tools/dhcp-ubench/benchmark.cc
Although it is not a big deal when calling it once, but the following assignment hitratio_ = 0.9f in constructor's body could be moved to constructor's initialization list.

Done.

parseCmdline(): Error string "Failed to iterations (-n option)." does not make sense to me.

parseCmdLine(): The lexical_cast<int> prevents specifying non-numeric arguments but it will not prevent negative arguments like -n-20. Suggest additional check that num_ > 0.

Fixed.

usage(): Parameter -c is not documented.

It is now documented.

get_time(): clock_gettime may return negative value which indicates error. This should be checked prior to returning ts. If it occurs it is fatal error.

get_time(): This is a side note. I have been using boost::posix_time::ptime class which I found very useful and more portable than clock_gettime. With the latter you have to take care of ifdefs for particular OSes.

Good information, but it does not offer enough precision. ptime is implemented on Linux using GetTimeOfDay?, which offers microsecond resolution, while clock_gettime offers nanosecond resolution. Please note that in the most efficient cases we are able to do well over 1 million searches per second, so each individual operation may take less than a microsecond. It is useful to have clock resolution greater than the event we are trying to measure.

(Of course, no sane person would measure just a single event, but it is good to keep the measurement error as low as possible).

tests/tools/dhcp-ubench/memfile_ubench.cc
The class members Filename_, Sync_ and File_ should have names starting with lowercase letter.

createLease4Test(): Not very important here but it is recommended to use ++i in for() loops rather than i++. The same thing can be found in multiple other places too.

That is true only if increased entity is an object, not a primitive type. For objects, a temporary object is created.

createLease4Test(): For people who are less familiar with the code it would be good to have comments what the magic numbers: 19, 65, 33 are. The other solution would be to define constants with meaningful names.

Replaced 19 with "hwaddr_len - 1", 65 with 'A' and commented on 33.

searchLease4Test(): The variable hitRatio should be renamed to hit_ratio. Otherwise it suggests that hitRatio is a function.

It now uses Benchmark::hitratio_, just as two other benchmarks do.

tests/tools/dhcp-ubench/mysql_ubench.h
MySQL_uBenchmark constructor: descirption of parameter hostname says "Name of the hostname to connect to". Suggest that this is changed to "Name of the host to connect to".

Done.

tests/tools/dhcp-ubench/mysql_ubench.cc
createLease4Test(): Comment at pool_id declaration does not make sense to me. Variable is initialized to 1000 while comments says 'Let's use pools 0-99'

Comment corrected.

createLease4Test(): Again, it would be good to have some more comments describing magic numbers: 19,65 etc.

Done.

tests/tools/dhcp-ubench/sqlite_ubench.cc
disconnect(): Function closes connection if db_ object is does nothing if db_ object does not exist. It could be better maybe to throw exception (say InvalidOperation?) if db_ does not exist to inform caller that nothing has been done?> The mysql_ubench::disconnect() behaves that way.

An exception is now thrown. We can't use typical exception types used in BIND10, as this code is an independent tool (no dependencies on any other BIND10 code).

search_callback(): Shouldn't be the code under #ifdef 0 executed in verbose mode and not executed in non-verbose?

No. Verbose only tracks progress by printing dots. Printing lease content when search is executed 1000s of times would be overkill. The code has been modified to actually consume contents of found lease.

Thanks for thorough review.

comment:14 in reply to: ↑ 13 Changed 8 years ago by marcin

  • Owner changed from marcin to tomek

Replying to tomek:

Replying to marcin:

I have reviewed commit 272d9f6f5bb929d81f1236ad953378d145225ade.

General
Minor: In the code you're using mixture of std::cout and printf. Im my opinion it would be "cleaner" to use one of them all the time. In any case, this is not very important.

All converted to cout.

tests/tools/dhcp-ubench/dhcp-perf-guide.html
The information in the document is very useful (especially the results presented on the graph).
One concern I usually have with benchmarks is how consistent the results are. In other words, if I run this test a couple of times in turn, would it give the same result?

Good point. There's planned re-run with compiled statements implemented. I will rerun the tests 10 times and will provide average.

tests/tools/dhcp-ubench/benchmark.h
Suggest that void print_clock(...) is ranamed to printClock(...) so as it is consistent with naming convention used for other functions. The same applies to struct timespec get_time().

Done.

In doxygen comments to several functions there is a reference to Num_ member that does not exist. It should be replaced with num_ which is declared as uint32_t num_ in protected scope.

Done.

tests/tools/dhcp-ubench/benchmark.cc
Although it is not a big deal when calling it once, but the following assignment hitratio_ = 0.9f in constructor's body could be moved to constructor's initialization list.

Done.

parseCmdline(): Error string "Failed to iterations (-n option)." does not make sense to me.

parseCmdLine(): The lexical_cast<int> prevents specifying non-numeric arguments but it will not prevent negative arguments like -n-20. Suggest additional check that num_ > 0.

Fixed.

usage(): Parameter -c is not documented.

It is now documented.

get_time(): clock_gettime may return negative value which indicates error. This should be checked prior to returning ts. If it occurs it is fatal error.

get_time(): This is a side note. I have been using boost::posix_time::ptime class which I found very useful and more portable than clock_gettime. With the latter you have to take care of ifdefs for particular OSes.

Good information, but it does not offer enough precision. ptime is implemented on Linux using GetTimeOfDay?, which offers microsecond resolution, while clock_gettime offers nanosecond resolution. Please note that in the most efficient cases we are able to do well over 1 million searches per second, so each individual operation may take less than a microsecond. It is useful to have clock resolution greater than the event we are trying to measure.

Agreed.
The API exposed by boost::posix_time classes is "ready" for nanoseconds precision but that's true that actual timestamps have microsecond precision only (I just confirmed this with some quick experiments.

(Of course, no sane person would measure just a single event, but it is good to keep the measurement error as low as possible).

tests/tools/dhcp-ubench/memfile_ubench.cc
The class members Filename_, Sync_ and File_ should have names starting with lowercase letter.

createLease4Test(): Not very important here but it is recommended to use ++i in for() loops rather than i++. The same thing can be found in multiple other places too.

That is true only if increased entity is an object, not a primitive type. For objects, a temporary object is created.

Even if you increase value of primitive type, the original value has to be stored in temporary location prior to increment operation. If you do ++i it is first incremented and then its value is returned. There is no need to copy original value to temporary location. For this reason there is a redundancy with primitives too.

createLease4Test(): For people who are less familiar with the code it would be good to have comments what the magic numbers: 19, 65, 33 are. The other solution would be to define constants with meaningful names.

Replaced 19 with "hwaddr_len - 1", 65 with 'A' and commented on 33.

searchLease4Test(): The variable hitRatio should be renamed to hit_ratio. Otherwise it suggests that hitRatio is a function.

It now uses Benchmark::hitratio_, just as two other benchmarks do.

tests/tools/dhcp-ubench/mysql_ubench.h
MySQL_uBenchmark constructor: descirption of parameter hostname says "Name of the hostname to connect to". Suggest that this is changed to "Name of the host to connect to".

Done.

tests/tools/dhcp-ubench/mysql_ubench.cc
createLease4Test(): Comment at pool_id declaration does not make sense to me. Variable is initialized to 1000 while comments says 'Let's use pools 0-99'

Comment corrected.

createLease4Test(): Again, it would be good to have some more comments describing magic numbers: 19,65 etc.

Done.

tests/tools/dhcp-ubench/sqlite_ubench.cc
disconnect(): Function closes connection if db_ object is does nothing if db_ object does not exist. It could be better maybe to throw exception (say InvalidOperation?) if db_ does not exist to inform caller that nothing has been done?> The mysql_ubench::disconnect() behaves that way.

An exception is now thrown. We can't use typical exception types used in BIND10, as this code is an independent tool (no dependencies on any other BIND10 code).

search_callback(): Shouldn't be the code under #ifdef 0 executed in verbose mode and not executed in non-verbose?

No. Verbose only tracks progress by printing dots. Printing lease content when search is executed 1000s of times would be overkill. The code has been modified to actually consume contents of found lease.

Thanks for thorough review.

Code is good. Please merge.

comment:15 Changed 8 years ago by tomek

  • Resolution set to complete
  • Status changed from reviewing to closed

Thanks for the review. Code is merged.

Note: See TracTickets for help on using tickets.