Opened 7 years ago

Closed 6 years ago

#3080 closed enhancement (fixed)

New DB backend: Postgres (patch)

Reported by: dclink Owned by: tmark
Priority: medium Milestone: Kea0.8
Component: database-all Version:
Keywords: Cc:
CVSS Scoring: Parent Tickets:
Sensitive: no Defect Severity: N/A
Sub-Project: DHCP Feature Depending on Ticket: 2592
Estimated Difficulty: 3 Add Hours to Ticket: 6
Total Hours: 61 Internal?: no

Description

Attempts to put additional PostgreSQL backend.

Subtickets

Attachments (3)

patch.3080 (63.3 KB) - added by dclink 7 years ago.
Early version of pgsql backend
patch.2.3080 (128.2 KB) - added by dclink 7 years ago.
Version fixed + unit tests ... trac2952 needs to be merged related to tomek's update (creation of GenericLeastMgTest?...)
postgres-valgrind.log (1.1 MB) - added by marcin 6 years ago.
Valgrind failires on Marcin's Ubuntu 13.10

Download all attachments as: .zip

Change History (24)

Changed 7 years ago by dclink

Early version of pgsql backend

Changed 7 years ago by dclink

Version fixed + unit tests ... trac2952 needs to be merged related to tomek's update (creation of GenericLeastMgTest?...)

comment:1 Changed 7 years ago by dclink

  • Component changed from dhcp to dhcpdb
  • Feature Depending on Ticket set to 2592
  • Type changed from defect to enhancement

comment:2 Changed 7 years ago by dclink

  • Owner set to tomek
  • Status changed from new to reviewing

comment:3 Changed 6 years ago by tomek

  • Milestone changed from New Tasks to DHCP-Kea-proposed
  • Priority changed from low to high

After today's presentation for the Board, Postgres is not a low priority anymore.

comment:4 Changed 6 years ago by tomek

  • Summary changed from DHCP Server new alternative DB backend to New DB backend: Postgres (patch)

comment:5 Changed 6 years ago by tomek

Before we get to this, #3359 has to be done.

comment:6 Changed 6 years ago by tomek

  • Milestone changed from DHCP-Kea-proposed to DHCP-Kea0.9-alpha

Moving tasks related to additional backends to Kea0.9-alpha (as discussed with Marcin, Stephen, Vicky and Jeff in London).

comment:7 Changed 6 years ago by tomek

  • Status changed from reviewing to accepted

comment:8 Changed 6 years ago by tomek

  • Status changed from accepted to assigned

comment:9 Changed 6 years ago by tomek

  • Priority changed from high to medium

comment:10 Changed 6 years ago by tomek

  • Owner changed from tomek to UnAssigned
  • Status changed from assigned to reviewing

The code is now available for review. See trac3080.

I have extensively cleaned up the patch, added over 20 unit-tests, added User's Guide sections, added Developer's Guide text and made several other smaller improvements.

Please review.

Note that the code still requires some work, but I decided that it is stable and clean for now to make its first attempt at review.

I think we'll need to consider (as additional tickets):

  • using boost::lexical_cast<> instead of stringstream
  • many parameters can be consts
  • better line wrapping
  • mode detailed comments for code in pgsql_lease_mgr.cc

comment:11 Changed 6 years ago by tomek

  • Add Hours to Ticket changed from 0 to 32
  • Total Hours changed from 0 to 32

comment:12 Changed 6 years ago by tmark

  • Owner changed from UnAssigned to tmark

comment:13 Changed 6 years ago by tmark

  • Add Hours to Ticket changed from 32 to 10
  • Total Hours changed from 32 to 42

As it stands there are build and runtime issues that need to be fixed before
the review can continue. I am listing build and runtime issues first, then
other issues:

BUILD ISSUES:


Using Postgres 9.3.2, I encountered build issues on both OS-X 10.9.2
(Mavericks) and Centos 6.4:

The value returned by pg_config used for LDFLAGS in configured begins with
"-L../../../src/common".

When building dhcpsrv, libtool fails as it cannot turn this
"../../../src/common" into an absolute path.

Under OS-X, the pg_config file defines LDFLAGS as this:

LDFLAGS = -L../../../src/common -L/usr/local/Cellar/ossp-uuid/1.6.2/lib -Wl,-dead_strip_dylibs

under Centos, it is this:

LDFLAGS = -L../../../src/common -L/usr/lib64 -Wl,--as-needed

Removing "-L../../../src/common" from the Makfile and it builds without error.

On Centos 6.4, configure fails to build Postgresql test code due to missing
libraries, namely it cannot find libpq. By adding LIBDIR from pg_config to
LDFLAGS in configure, this check passes.

I think the solution is to change configure.ac to use LIBDIR from pg_config.


pgsql_lease_mgr.cc

Under clang++ on Mavericks:

pgsql_lease_mgr.cc:38:14: error: unused variable 'ADDRESS6_TEXT_MAX_LEN'

[-Werror,-Wunused-const-variable]


pgsql_lease_mgr.h

There are structures instantiations which such as the following:

        PgSqlParam paddr4 = { .value = tmp.str() };

Not all compilers are going to like this, gcc 4.4.2 on Centos produces this error:

pgsql_lease_mgr.cc:250: error: expected primary-expression before ‘.’ token

There dozens of occurances. I think it would be far more portable and far cleaner to add constructors to the structs.


There are several cppcheck complaints:

src/lib/dhcpsrv/pgsql_lease_mgr.cc:621: check_fail: The scope of the variable 'r' can be reduced. (style,variableScope)
src/lib/dhcpsrv/pgsql_lease_mgr.cc:47: check_fail: struct or union member 'TaggedStatement::index' is never used. (style,unusedStructMember)
src/lib/dhcpsrv/pgsql_lease_mgr.cc:289: check_fail: Unused variable: valid_lft_str (style,unusedVariable)
src/lib/dhcpsrv/pgsql_lease_mgr.cc:300: check_fail: Unused variable: subnet_id_str (style,unusedVariable)
src/lib/dhcpsrv/pgsql_lease_mgr.cc:431: check_fail: Unused variable: valid_lft_str (style,unusedVariable)
src/lib/dhcpsrv/pgsql_lease_mgr.cc:887: check_fail: Unused variable: baddr (style,unusedVariable)
src/lib/dhcpsrv/pgsql_lease_mgr.cc:229: check_fail: Member variable 'PgSqlLease4Exchange::hwaddr_length_' is not initialized in the constructor. (warning,uninitMemberVar)
src/lib/dhcpsrv/pgsql_lease_mgr.cc:229: check_fail: Member variable 'PgSqlLease4Exchange::client_id_length_' is not initialized in the constructor. (warning,uninitMemberVar)
src/lib/dhcpsrv/pgsql_lease_mgr.cc:395: check_fail: Member variable 'PgSqlLease6Exchange::duid_length_' is not initialized in the constructor. (warning,uninitMemberVar)
src/lib/dhcpsrv/tests/pgsql_lease_mgr_unittest.cc:117: check_fail: The scope of the variable 'r' can be reduced. (style,variableScope)
src/lib/dhcpsrv/tests/pgsql_lease_mgr_unittest.cc:139: check_fail: The scope of the variable 'r' can be reduced. (style,variableScope)

RUNTIME ISSUES:


pgsql_lease_mgr.cc

Use of "SET AUTOCOMMIT TO OFF" is no longer supported and causes most of the
unit tests to fail with this message:

STATEMENT:  SET AUTOCOMMIT TO OFF
ERROR:  SET AUTOCOMMIT TO OFF is no longer supported

It is currently being used in methods that fetch, such as getLeaseCollection.
Since commits are meaningless for fetches there is no sense in doing it even
if we could.

Similarly, fetch methods do not need the BEGIN and END statements. This is
an attempt to wrap the read in a transaction. Transactions should be
meaningless in terms of fetches. This should be examined and they should be
removed unless there is a compelling reason to keep them.


None of the PgSqlLeaseMgrTest? tests pass under Mavericks/9.3.2. They all
fail on cltt comparisons:

[ RUN      ] PgSqlLeaseMgrTest.basicLease4
WARNING:  there is no transaction in progress
test_utils.cc:53: Failure
Value of: second->cltt_
  Actual: 105456
Expected: first->cltt_
Which is: 123456
test_utils.cc:53: Failure
Value of: second->cltt_
  Actual: 216567
Expected: first->cltt_
Which is: 234567
test_utils.cc:53: Failure
Value of: second->cltt_
  Actual: 216567
Expected: first->cltt_
Which is: 234567
test_utils.cc:53: Failure
Value of: second->cltt_
  Actual: 216567
Expected: first->cltt_
Which is: 234567
NOTICE:  there is no transaction in progress
[  FAILED  ] PgSqlLeaseMgrTest.basicLease4 (52 ms)

The issue is that calculation of cltt_ when extracting it from the database
does not account for localtime. Note the actual and expected times are 5
hours apart and I am currently in EDT.


Typcial Review Issues:


pgsql_lease_mgr.cc

The Oid integer values need to be replaced either with constants we define or
use the #defines in server/catalog/pg_type.h. However, including pg_type.h
does come with a twist, you need to include a higher level include such as
server/postgres_fe.h but that introduces some redefines. It can be
circumvented as follows:

// Undefine the following from our config.h so they do not collide
// with those in postgres_fe.h.
#undef PACKAGE_BUGREPORT
#undef PACKAGE_NAME
#undef PACKAGE_STRING
#undef PACKAGE_TARNAME
#undef PACKAGE_VERSION

#include <postgres_fe.h>
#include <catalog/pg_type.h>

pgsql_lease_mgr.h

The typedef "bindparams" should not be lowercased.


pgsql_lease_mgr.cc

several comments still refer to MYSQL_BIND


pgsql_lease_mgr.cc

TBD

Conversion of hwaddr_ to string seems sketchy...

         PgSqlParam pdest = { .value = string(lease_->hwaddr_.begin(),
                                                 lease_->hwaddr_.end()),

Seems like this should be treated the same way as client_id, I think it
should be this:

         PgSqlParam pdest = { .value = reinterpret_cast<char*>(&(lease_->hwaddr_[0]));

And of course ".value =" must be replaced as mentioned earlier.


pgsql_lease_mgr.cc

Remove the "inline" from convertToQuery:

inline void
PgSqlLeaseMgr::convertToQuery(...

pgsql_lease_mgr.cc

PgSqlLeaseExchange::stringToBool, the commentary should mention that postgressql stores
booleans as NULL, 't', or 'f'. or call it pgBoolToBool().


pgsql_lease_mgr.cc

Spacing is wrong on these:

    PgSqlParam fqdn_fwd = { .value = lease_->fqdn_fwd_?"TRUE":"FALSE" };

pgsql_lease_mgr.cc

This constant is actually wrong, the maximum number of parameters used SO FAR,
is 13 used with UPDATE_LEASE6:

// Maximum number of parameters used in any signle query
const size_t MAX_PARAMETERS_IN_QUERY = 12;

This may explain why it TaggedStatement::types is dimensioned as as
MAX_PARAMETERS_IN_QUERY + 1.

There ought to be a cleaner way to do this but at the very least it should be
accurate. There is no reason (at least none documented) for the " + 1".


pgsql_lease_mgr.cc

I am not really sure I see any actual value in using the BOOST_STATIC_ASSERTs,
and at the very least the asserts in MySqlLease6Exchange() and
PgSqlLease6Exchange() aren't even testing against the correct number.
They check that LEASE_COLUMNS is greater than 8, when in fact there are 12
column names:

    PgSqlLease6Exchange() {
        memset(duid_buffer_, 0, sizeof(duid_buffer_));
        // Set the column names (for error messages)
        columns_[0] = "address";
        columns_[1] = "duid";
        columns_[2] = "valid_lifetime";
        columns_[3] = "expire";
        columns_[4] = "subnet_id";
        columns_[5] = "pref_lifetime";
        columns_[6] = "lease_type";
        columns_[7] = "iaid";
        columns_[8] = "prefix_len";
        columns_[9] = "fqdn_fwd";
        columns_[10]= "fqdn_rev";
        columns_[11]= "hostname";
        BOOST_STATIC_ASSERT(8 < LEASE_COLUMNS);

        params.reserve(LEASE_COLUMNS);
    }

How much value is there in doing a static assert on two constants anyway?


At this point I suspending further review, pending correction of the build and
test failures.

As tomek is out of the office for two weeks, I will be making these fixes myself.

comment:14 Changed 6 years ago by tmark

  • Add Hours to Ticket changed from 10 to 8
  • Total Hours changed from 42 to 50

I have addressed most of the above issues with
git commits efbde0adb7a7dcff8438fc7a5209afe77b4b1b32 and 6adb8b731be3a72e9a6e407ca076a7ee1810d826

With these changes it should build and pass unit testing on OS-X and Centos
with Postgres9.3.4. There are still runtime issues to address. I have run
some basic manual tests on Centos and verified v4 and v6 leases can be stored
and retrieved.

Changed the definition of PGSQL_LIBS for building Kea with PostgreSQL
back end to use pg_config value for LIBDIR rather than LDFLAGS. The latter
did not build with PostgreSQL 9.3.4 on OS-X or Centos.

Added OS-X version numbers 10.9.1 and 10.9.2 to the test for setting the value
of bind10_undefined_pthread_behavior. Without this the death test for
conditional variables fails as the problem introduced in 10.9 is still there
as of 10.9.2. This is unrelated to PostgreSQL.

Changed expire column type in lease tables to be "TIMESTAMP WITH TIME ZONE",
and added methods to convert to and from such fields to LeaseExchange?. This
corrects mismatched time conversion to and from database which was causing unit tests to fail.

Added constructors to PgSqlParam? to eliminate use of ".value" initializers and
to provide a safe, uniform way to create parameters for binary data. Prior to
this valgrind was reporting invalid reads when vectors were statically cast
to char*.

Removed superfluous BOOST_STATIC_ASSERT and corrected values tested in remaining
check.

Removed use of "SET AUTOCOMMIT TO" as it is no longer supported in PostgreSQL.

Altered failure logic in PgSqlLeaseMgr::openDatabase() to release connection
if it is not NULL. This was causing memory leak in unit tests.

Added PQfinish call to createSchema() function to release the connection to fix
memory leaks during unit testing.

Cleaned most cppcheck complaints.

Outstanding Issues:

RUNTIME ISSUE:


Several methods are structured as a series of sql statement executes and
are therefore wrapped in a transaction by executing SQL "BEGIN" and "END"
statements. The most pressing issue with this that these methods do not
properly clean up if errors occur. Rather they throw and therefore bypass
executing the "END" or attempting a rollback. This leaves the current
transaction open and all subsequent transactions fail. The only way to
recover is to restart the server.

In addition to error handling problem, structuring these methods as multiple
statement executions is going to be costly from a performance stand point.
Their counterparts in the MySQL implementation are performed as single
statements.

At the very least, we need to alter the error handling to clean up, but a
better approach would be to do away with need for multiple statement
executions.


Minor Review Issues:


pgsql_lease_mgr.cc

The Oid integer values need to be replaced either with constants we define or
use the #defines in server/catalog/pg_type.h. However, including pg_type.h
does come with a twist, you need to include a higher level include such as
server/postgres_fe.h but that introduces some redefines. It can be
circumvented as follows:

// Undefine the following from our config.h so they do not collide
// with those in postgres_fe.h.
#undef PACKAGE_BUGREPORT
#undef PACKAGE_NAME
#undef PACKAGE_STRING
#undef PACKAGE_TARNAME
#undef PACKAGE_VERSION

#include <postgres_fe.h>
#include <catalog/pg_type.h>

pgsql_lease_mgr.h

The typedef "bindparams" should not be lowercased.


pgsql_lease_mgr.cc

Several comments still refer to MYSQL_BIND


pgsql_lease_mgr.cc

Remove the "inline" from convertToQuery:

inline void
PgSqlLeaseMgr::convertToQuery(...

I have not yet reviewed the unit tests themselves for completeness.

comment:15 follow-up: Changed 6 years ago by marcin

I ran some manual tests to make ure the documentation is correct and that the setup works.

A couple of observations:


When compiling with the configure option --with-dhcp-mysql the compilation failed because mysql_lease_mgr.cc was not modified after splitting it into schema_mysql_copy.h and schema_pgsql_copy.h. I fixed it on the branch.


It is not explained anywhere in the bind10-guide that kea requires special setup for unit tests when using Postgres (compiling with --with-dhcp-pgsql). Actually the same is the case for MySQL, I think. If this is explained in the developer's guide there could be a reference to the developer's guide or at least it should be mentioned that this is in developer's guide (if not a direct reference).


bind10-guide There should be no apostrophe here "postgres=#> GRANT ALL PRIVILEGES ON DATABASE database-name TO 'user-name;" - before username


The following text in bind10-guide is a little unclear:

If instead of seeing keatest=> prompt, your login will be refused with error code about failed peer or indent authentication, it means that PostgreSQL is configured to check unix username and reject login attepts if PostgreSQL names are different. To alter that, PostgreSQL configuration must be changed. That file is located at /etc/postgresql/9.1/main/pg_hba.conf on Ubuntu 13.10. Its location may be different on your system. Please consult your PostgreSQL user manual before applying those changes as those changes may expose your other databases that you run on the same system.

It would be better to copy-paste exact error messages.

Also, it would be good to explain what exactly has to be changed in the pg_hba.conf file, e.g. replace ''peer'' with ''trust''.

BTW, these are the steps that I used to setup db in postgres:

CREATE DATABASE keatest;
\i /opt/bind10/share/bind10/dhcpdb_create.pgsql
CREATE USER keatest WITH PASSWORD 'keatest';
GRANT ALL PRIVILEGES ON ALL TABLES IN SCHEMA public
\c keatest
SELECT 'ALTER TABLE '|| tablename ||' OWNER TO keatest;' FROM pg_tables WHERE SCHEMANAME = 'public'
edit /etc/postgresql/9.1/main/pg_hba.conf and replace peer with trust
service postgresql restart

\c keatest
ALTER TABLE lease6 OWNER TO keatest;
postgres=# \c keatest
You are now connected to database "keatest" as user "postgres".
keatest=# ALTER TABLE lease6 OWNER TO keatest;
ALTER TABLE
keatest=# ALTER TABLE lease6_types OWNER TO keatest;
ALTER TABLE
keatest=# ALTER TABLE schema_version OWNER TO keatest

According to our jabber discussion, some of them are redundant, but I paste it here just for clarity what I did, versus what is in the doc.

Maybe it should be put as a note or warning in the bind10-guide that the keatest or kea or any other user MUST be an owner of tables!


Postgres backend has invalid logging statement in addLeaseCommon:

    LOG_DEBUG(dhcpsrv_logger, DHCPSRV_DBG_TRACE_DETAIL,
              DHCPSRV_PGSQL_ADD_ADDR4).arg(statements_[stindex].stmt_name);

It should be logged in the caller function, not here. Also stmt_name is not the address :)


Postgres backend is using MySQL logging messages in several places.


Not strictly related to this work but... DHCPSRV_ADDRESS6_ALLOC_ERROR - this error message in alloc_engine is wrong because it talks about IPv6 addresses only. If you fail to allocate the prefix (not a 128 address) it is quite misleading. And it was misleading when I spotted it for the first time in the kea log file, when running system tests.


I had valgrind failing on Ubuntu 13.10 (gcc 4.7 and 4.8) but I was never able to reproduce it elsewhere. It worked fine on Debian7. So, maybe my Ubuntu setup is foo-bared. I attached the log from the valgrind run.


There is one thing that I saw initially under valgrind, but I couldn't reproduce later:

convertToQuery function initializes values three vectors passed by reference. The ultimate sizes of vectors are equal to size of ''params'' vector, so given that we never pass empty params vector it should be fine to do this:

    convertToQuery(params, out_values, out_lengths, out_formats);

    PGresult * r = PQexecPrepared(conn_, statements_[stindex].stmt_name,
                                  statements_[stindex].stmt_nbparams,
                                  &out_values[0], &out_lengths[0],
                                  &out_formats[0], 0);

Trying to access element with index 0 of out_values and out_formats may result in segfault if params size is 0. So, in other words, the code could check for empty vectors returned. But, given that we will remove these bits of code soon, it is not a MUST.


I found some local variables terminated with underscore (see addLeaseCommon):

    vector<const char *> params_;
    vector<int> lengths_;
    vector<int> formats_;
    convertToQuery(params, params_, lengths_, formats_);

Changed 6 years ago by marcin

Valgrind failires on Marcin's Ubuntu 13.10

comment:16 Changed 6 years ago by marcin

I ran the following tests:

  • K4 using PG backend on Debian7, perdhcp running on the other machine. I ended up having 100.000 leases in the database.
  • The same test for v6 addresses
  • The same test for v6 prefixes
  • isc-dhcp requesting a lease in two variants: without client identifier, and with client identifier

All tests passed.

comment:17 in reply to: ↑ 15 Changed 6 years ago by tmark


It is not explained anywhere in the bind10-guide that kea requires special setup for unit tests when using Postgres (compiling with --with-dhcp-pgsql). Actually the same is the case for MySQL, I think. If this is explained in the developer's guide there could be a reference to the developer's guide or at least it should be mentioned that this is in developer's guide (if not a direct reference).

Since the bind10-guide is currently devoid of unit test discussion, I
have not altered this. I think the developer's guide discussion is sufficient.


bind10-guide There should be no apostrophe here "postgres=#> GRANT ALL PRIVILEGES ON DATABASE database-name TO 'user-name;" - before username


The following text in bind10-guide is a little unclear:

If instead of seeing keatest=> prompt, your login will be refused with error code about failed peer or indent authentication, it means that PostgreSQL is configured to check unix username and reject login attepts if PostgreSQL names are different. To alter that, PostgreSQL configuration must be changed. That file is located at /etc/postgresql/9.1/main/pg_hba.conf on Ubuntu 13.10. Its location may be different on your system. Please consult your PostgreSQL user manual before applying those changes as those changes may expose your other databases that you run on the same system.

It would be better to copy-paste exact error messages.

Also, it would be good to explain what exactly has to be changed in the pg_hba.conf file, e.g. replace ''peer'' with ''trust''.

BTW, these are the steps that I used to setup db in postgres:

CREATE DATABASE keatest;
\i /opt/bind10/share/bind10/dhcpdb_create.pgsql
CREATE USER keatest WITH PASSWORD 'keatest';
GRANT ALL PRIVILEGES ON ALL TABLES IN SCHEMA public
\c keatest
SELECT 'ALTER TABLE '|| tablename ||' OWNER TO keatest;' FROM pg_tables WHERE SCHEMANAME = 'public'
edit /etc/postgresql/9.1/main/pg_hba.conf and replace peer with trust
service postgresql restart

\c keatest
ALTER TABLE lease6 OWNER TO keatest;
postgres=# \c keatest
You are now connected to database "keatest" as user "postgres".
keatest=# ALTER TABLE lease6 OWNER TO keatest;
ALTER TABLE
keatest=# ALTER TABLE lease6_types OWNER TO keatest;
ALTER TABLE
keatest=# ALTER TABLE schema_version OWNER TO keatest

According to our jabber discussion, some of them are redundant, but I paste it here just for clarity what I did, versus what is in the doc.

Maybe it should be put as a note or warning in the bind10-guide that the keatest or kea or any other user MUST be an owner of tables!

I have reworked this section in guide.


Postgres backend has invalid logging statement in addLeaseCommon:

    LOG_DEBUG(dhcpsrv_logger, DHCPSRV_DBG_TRACE_DETAIL,
              DHCPSRV_PGSQL_ADD_ADDR4).arg(statements_[stindex].stmt_name);

It should be logged in the caller function, not here. Also stmt_name is not the address :)


I deleted this call.

Postgres backend is using MySQL logging messages in several places.


Fixed this.

Not strictly related to this work but... DHCPSRV_ADDRESS6_ALLOC_ERROR - this error message in alloc_engine is wrong because it talks about IPv6 addresses only. If you fail to allocate the prefix (not a 128 address) it is quite misleading. And it was misleading when I spotted it for the first time in the kea log file, when running system tests.


This is out-of-scope for this ticket. I have created #3381 to clarify IPv6 logging messages to distinguish between allocation types.

I had valgrind failing on Ubuntu 13.10 (gcc 4.7 and 4.8) but I was never able to reproduce it elsewhere. It worked fine on Debian7. So, maybe my Ubuntu setup is foo-bared. I attached the log from the valgrind run.


I am unable to reproduce this under Centos6.4/Postgresql 9.3.4, or Fedora19/Postgresql9.2.6
There are mentions of other people experiencing similiar errors underneath PQconnectdb:

http://osdir.com/ml/pgsql-admin/2010-06/msg00113.html
http://stackoverflow.com/questions/19484230/checking-memory-with-valgrind

It seems like they tend to boil down to the SSL libraries that are getting used by PQConnect. Without more investigation on Marcin's failing environment there
isn't much to be done with it. Should we consider a new ticket?

Other issues, stemming from PGgetresult, should be re-examined as part of #3382, which calls for a rework of the bind array handling.

There is one thing that I saw initially under valgrind, but I couldn't reproduce later:

convertToQuery function initializes values three vectors passed by reference. The ultimate sizes of vectors are equal to size of ''params'' vector, so given that we never pass empty params vector it should be fine to do this:

    convertToQuery(params, out_values, out_lengths, out_formats);

    PGresult * r = PQexecPrepared(conn_, statements_[stindex].stmt_name,
                                  statements_[stindex].stmt_nbparams,
                                  &out_values[0], &out_lengths[0],
                                  &out_formats[0], 0);

Trying to access element with index 0 of out_values and out_formats may result in segfault if params size is 0. So, in other words, the code could check for empty vectors returned. But, given that we will remove these bits of code soon, it is not a MUST.


This is going to go away with #3382.

I found some local variables terminated with underscore (see addLeaseCommon):

    vector<const char *> params_;
    vector<int> lengths_;
    vector<int> formats_;
    convertToQuery(params, params_, lengths_, formats_);

This is going to be reworked with #3382.

comment:18 Changed 6 years ago by tmark

  • Add Hours to Ticket changed from 8 to 5
  • Owner changed from tmark to marcin
  • Total Hours changed from 50 to 55

comment:19 Changed 6 years ago by marcin

I went over the code in commit 8e7f2f910d6c7901706e60656b9107177470e725

Several things listed below are nits. Several other things are more important and indicate that operations we perform here are unsafe (e.g. memcpy). In most cases they will not show up I guess. In case of memcpy, unless someone messes up in the lease database, the length of the data will remain within the boundary of the allocated buffer.

Also, I don't think that our server's code will allow passing NULL leases to functions like addLease etc, therefore the NULL pointer assert may not show up.

But, I insist that we rework the whole backend as soon as possible after the BIND10 1.2 release and treat this as urgent.

The comments below will be referenced in #3382 as we probably want to address most of these issues there.

src/lib/dhcpsrv/pgsql_lease_mgr.cc
Functions in PgSqlLease4Exchange and PgSqlLease6Exchange should be documented.
Also, the inner parts of the functions deserve some commentary too.

createBindForSend doesn't check the lease object for being NULL. Note, that this function is directly called by the public functions of the lease manager. When someone passes a NULL pointer to these functions they will result in an assert which gets logged but is completely useless because there is no information where the assert origins.

The exception thrown at line 282 is fubared. There is an opening paranthesis after ''longer'' but not closing one. The maximal value is not in paranthesis at all.

Nit: I tend to agree with Stephen that we should rather avoid the names for variables like ''tmp''. This simply doesn't mean anything, except that it is a temporary variable (as almost all variables in the function scope) :)

I guess it was out of scope to turn istringstream to lexical_cast?

I wonder what is the benefit of initializing ''params'' as a member of the !PgSqlLease4Exchange? There may be a benefit if params is returned as const reference, so as you don't have to copy construct the returned value. But, params are returned by value. Maybe they should be returned by const reference?

convertFromDatabase: so pointer to ''r'' is passed by reference? Any need for this?

    convertFromDatabase(PGresult *& r, int line) {

I wonder if this is really safe operation:

        memcpy(hwaddr_buffer_, hwaddr_str, hwaddr_length_);
        memcpy(client_id_buffer_, client_id_str, client_id_length_);

The hwaddr_length_ and client_id_length_ are set by the PQunescapeBytea which have undefined length? If they happen to be longer than the !HWAddr::MAX_HWADDR_LEN and !ClientId::MAX_CLIENT_ID_LEN they the memcpy will overrun the allocated array and cause undefined behavior.

PgSqlLease4Exchange: hostname_ and hwaddr_ are unused.

PgSqlLease6Exchange: duid_ is unused.

convertFromDatabase: Perhaps instead of the switch construct we could just use cast:

        if (lease_type > Lease6::TYPE_PD) {
            isc_throw(BadValue, "invalid lease type returned (" <<
                      lease_type_ << ") for lease with address " <<
                      addr6_ << ". Only 0, 1, or 2 are allowed.");
        }
        Lease6::Type type = static_cast<Lease6::Type>(lease_type);

openDatabase:

    if (PQstatus(conn_) != CONNECTION_OK) {
        // If we have a connection object, we have to call finish
        // to release it, but grab the error message first.
        std::string error_message = PQerrorMessage(conn_);
        PQfinish(conn_);
        isc_throw(DbOpenError, error_message);
    }

AIUI, when database fails to open, the PQfinish will release the memory for conn_, but are we sure that the conn_ will be set to NULL by PQfinish? My guess is that it is not guaranteed, which in turn implies that the destructor may not invoke this:

    if (conn_) {
        // Deallocate the prepared queries.
        PGresult* r = PQexec(conn_, "DEALLOCATE all");
        if(PQresultStatus(r) != PGRES_COMMAND_OK) {
            // Highly unlikely but we'll log it and go on.
            LOG_ERROR(dhcpsrv_logger, DHCPSRV_PGSQL_DEALLOC_ERROR)
                      .arg(PQerrorMessage(conn_));
        }

        PQclear(r);
        PQfinish(conn_);
    }

I am not sure whether it is an issue or not, but I woiuld set the conn_ to NULL to be sure.

PgSqlLeaseMgr? constructor: conn_, exchange4_ and exchange6_ could be initialized using the constructor's initialization list, rather than be initialized in the constructor's body.

prepareStatements: There is no reason to call clear() before resize(). The resize() and then bunch of assignments should effectively remove and destroy objects present in the container.

comment:20 Changed 6 years ago by marcin

  • Owner changed from marcin to tmark

The changes in bind10-guide are good.

comment:21 Changed 6 years ago by tmark

  • Add Hours to Ticket changed from 5 to 6
  • Resolution set to fixed
  • Status changed from reviewing to closed
  • Total Hours changed from 55 to 61

Replying to marcin:

I went over the code in commit 8e7f2f910d6c7901706e60656b9107177470e725

Several things listed below are nits. Several other things are more important and indicate that operations we perform here are unsafe (e.g. memcpy). In most cases they will not show up I guess. In case of memcpy, unless someone messes up in the lease database, the length of the data will remain within the boundary of the allocated buffer.

Also, I don't think that our server's code will allow passing NULL leases to functions like addLease etc, therefore the NULL pointer assert may not show up.

But, I insist that we rework the whole backend as soon as possible after the BIND10 1.2 release and treat this as urgent.

The comments below will be referenced in #3382 as we probably want to address most of these issues there.

src/lib/dhcpsrv/pgsql_lease_mgr.cc
Functions in PgSqlLease4Exchange and PgSqlLease6Exchange should be documented.
Also, the inner parts of the functions deserve some commentary too.

createBindForSend doesn't check the lease object for being NULL. Note, that this function is directly called by the public functions of the lease manager. When someone passes a NULL pointer to these functions they will result in an assert which gets logged but is completely useless because there is no information where the assert origins.

The exception thrown at line 282 is fubared. There is an opening paranthesis after ''longer'' but not closing one. The maximal value is not in paranthesis at all.

I have fixed the exception.

Nit: I tend to agree with Stephen that we should rather avoid the names for variables like ''tmp''. This simply doesn't mean anything, except that it is a temporary variable (as almost all variables in the function scope) :)

I guess it was out of scope to turn istringstream to lexical_cast?

I wonder what is the benefit of initializing ''params'' as a member of the !PgSqlLease4Exchange? There may be a benefit if params is returned as const reference, so as you don't have to copy construct the returned value. But, params are returned by value. Maybe they should be returned by const reference?

convertFromDatabase: so pointer to ''r'' is passed by reference? Any need for this?

    convertFromDatabase(PGresult *& r, int line) {

I wonder if this is really safe operation:

        memcpy(hwaddr_buffer_, hwaddr_str, hwaddr_length_);
        memcpy(client_id_buffer_, client_id_str, client_id_length_);

The hwaddr_length_ and client_id_length_ are set by the PQunescapeBytea which have undefined length? If they happen to be longer than the !HWAddr::MAX_HWADDR_LEN and !ClientId::MAX_CLIENT_ID_LEN they the memcpy will overrun the allocated array and cause undefined behavior.

PgSqlLease4Exchange: hostname_ and hwaddr_ are unused.

PgSqlLease6Exchange: duid_ is unused.

convertFromDatabase: Perhaps instead of the switch construct we could just use cast:

        if (lease_type > Lease6::TYPE_PD) {
            isc_throw(BadValue, "invalid lease type returned (" <<
                      lease_type_ << ") for lease with address " <<
                      addr6_ << ". Only 0, 1, or 2 are allowed.");
        }
        Lease6::Type type = static_cast<Lease6::Type>(lease_type);

openDatabase:

    if (PQstatus(conn_) != CONNECTION_OK) {
        // If we have a connection object, we have to call finish
        // to release it, but grab the error message first.
        std::string error_message = PQerrorMessage(conn_);
        PQfinish(conn_);
        isc_throw(DbOpenError, error_message);
    }

AIUI, when database fails to open, the PQfinish will release the memory for conn_, but are we sure that the conn_ will be set to NULL by PQfinish? My guess is that it is not guaranteed, which in turn implies that the destructor may not invoke this:

    if (conn_) {
        // Deallocate the prepared queries.
        PGresult* r = PQexec(conn_, "DEALLOCATE all");
        if(PQresultStatus(r) != PGRES_COMMAND_OK) {
            // Highly unlikely but we'll log it and go on.
            LOG_ERROR(dhcpsrv_logger, DHCPSRV_PGSQL_DEALLOC_ERROR)
                      .arg(PQerrorMessage(conn_));
        }

        PQclear(r);
        PQfinish(conn_);
    }

I am not sure whether it is an issue or not, but I woiuld set the conn_ to NULL to be sure.

This is a good catch. In fact, none of the Postgresql functions which free
resources set the pointer to NULL. I have added a set of conn_ to NULL.

PgSqlLeaseMgr? constructor: conn_, exchange4_ and exchange6_ could be initialized using the constructor's initialization list, rather than be initialized in the constructor's body.

Fixed.

prepareStatements: There is no reason to call clear() before resize(). The resize() and then bunch of assignments should effectively remove and destroy objects present in the container.

There is one case in which you would need it, though it's rather esoteric. If
upon entry the vector contained n elements and NUM_STATEMENTS is greater than n
but the index of the tagged_statement element that is empty is less than n-1,
you could end up with vestigial" elements. Since this is not something called
often, typically once at start-up, I'm going to leave it for now.

The remainder of the issues will be addressed under 3382.

Changes are merged in with git 1aae8b1fab3008e62c4f085948b1abadad512447
Added ChangeLog? entry 777 for it.

Note: See TracTickets for help on using tickets.