Opened 8 years ago

Closed 8 years ago

#1292 closed defect (fixed)

Xfrin retransfer shared object "sqlite3_ds.so" not found

Reported by: jreed Owned by: jelte
Priority: high Milestone: Sprint-20111206
Component: Unclassified Version:
Keywords: Cc:
CVSS Scoring: Parent Tickets:
Sensitive: no Defect Severity: High
Sub-Project: DNS Feature Depending on Ticket: none
Estimated Difficulty: 4 Add Hours to Ticket: 0
Total Hours: 16.17 Internal?: no

Description (last modified by jreed)

Running from source tree, when I attempt an "Xfrin retransfer":

Exception in thread Thread-1:
Traceback (most recent call last):
  File "/usr/pkg/lib/python3.1/threading.py", line 509, in _bootstrap_inner
    self.run()
  File "/usr/pkg/lib/python3.1/threading.py", line 462, in run
    self._target(*self._args, **self._kwargs)
  File "/home/reed/work/isc/build/src/bin/xfrin/b10-xfrin", line 819, in process_xfrin
    datasrc_client = DataSourceClient(datasrc_type, datasrc_config)
isc.datasrc.Error: Failed to create DataSourceClient of type sqlite3:Shared object "sqlite3_ds.so" not found

I have:

./src/lib/datasrc/.libs/sqlite3_ds.so.0.0
./src/lib/datasrc/.libs/sqlite3_ds.so.0
./src/lib/datasrc/.libs/sqlite3_ds.so
./src/lib/datasrc/.libs/sqlite3_ds.so.0.0T

Subtickets

Change History (39)

comment:1 Changed 8 years ago by jreed

On FreeBSD i386 system:

Exception in thread Thread-1:
Traceback (most recent call last):
  File "/usr/local/lib/python3.1/threading.py", line 516, in _bootstrap_inner
    self.run()
  File "/usr/local/lib/python3.1/threading.py", line 469, in run
    self._target(*self._args, **self._kwargs)
  File "/usr/home/jreed/builder/work/BIND10/20111011192506-FreeBSD8-i386/install/libexec/bind10-devel/b10-xfrin", line 819, in process_xfrin
    datasrc_client = DataSourceClient(datasrc_type, datasrc_config)
isc.datasrc.Error: Failed to create DataSourceClient of type sqlite3:Shared object "sqlite3_ds.so" not found, required by "python3.1"

comment:2 Changed 8 years ago by jreed

  • Defect Severity changed from N/A to High
  • Description modified (diff)
  • Priority changed from major to critical

So running from install I didn't see any output regarding this, no logging related to it. So I didn't know what was wrong. But Xfrin retransfer is broken.

Later attempts always say: "error": "zone xfrin is in progress"

My workaround was env LD_LIBRARY_PATH=/home/reed/work/isc/bind10-install-2/lib before running bind10. Then Xfrin retransfer worked.

In jabber, jinmei suggested:

1. in some environment dynamically loadable data source module cannot be found.

2. when an xfrin connection thread dies, it silently leaves a "lock" about an ongoing xfrin session, preventing further transfers.

if we complain something about logging, we might also want to include "it's not logged when the thread dies", though.

comment:3 Changed 8 years ago by jreed

Or when running from source tree I used:

env LD_LIBRARY_PATH=/home/reed/work/isc/build-2/bind10/src/lib/datasrc/.libs

comment:4 Changed 8 years ago by jreed

A temporary workaround is #1300.

comment:5 Changed 8 years ago by jreed

The fix for this ticket also needs to handle exception cleanly.

comment:6 Changed 8 years ago by shane

  • Milestone changed from New Tasks to Next-Sprint-Proposed

comment:7 Changed 8 years ago by jelte

  • Add Hours to Ticket changed from 0 to 4

comment:8 Changed 8 years ago by jelte

  • Add Hours to Ticket 4 deleted
  • Estimated Difficulty changed from 0 to 4

comment:9 Changed 8 years ago by jelte

  • Milestone changed from Next-Sprint-Proposed to New Tasks

comment:10 Changed 8 years ago by jreed

(12:42:53) jinmei: jeremy: I've addressed part of your concerns of #1292 in #1028.

(12:43:33) jinmei: xfrin would now log the event and also make sure subsequent transfer attempt is allowed.

comment:11 Changed 8 years ago by jinmei

  • Milestone changed from New Tasks to Next-Sprint-Proposed

comment:12 Changed 8 years ago by jelte

  • Milestone changed from Next-Sprint-Proposed to Sprint-20111122

comment:13 Changed 8 years ago by jelte

  • Owner set to jelte
  • Status changed from new to assigned

comment:14 Changed 8 years ago by jelte

  • Add Hours to Ticket set to 5
  • Owner changed from jelte to UnAssigned
  • Status changed from assigned to reviewing

Regarding the file not found error itself, looks like it is just a matter of setting the SET_ENV_LIBRARY_PATH used by the in-tree scripts for freebsd as well.

The branch actually contains mostly modifications and extensions to the lettuce tests, so i could make a nice test out of it :)

(most important change there is that the 'wait for output' step can now get a line you do NOT want (like XFRIN FAILED)), and prints the actual line in the error instead of just 'wrong match', so you can see the error (which in this case would be the sqlite3_ds.so not found).

comment:15 Changed 8 years ago by stephen

  • Owner changed from UnAssigned to stephen

comment:16 follow-up: Changed 8 years ago by stephen

  • Owner changed from stephen to jelte

tests/lettuce/features/terrain/bind10_control.py
Docstring for send_command is the same as for set_config_command.

Minor thing and probably not worth changing it for here but maybe something we could do later: the code for spawning off bindctl is duplicated in send_config_command and send_command. A common function to invoke bindctl, perhaps accepting a sequence of commands as a list, may be useful.

tests/lettuce/features/terrain/steps.py
Question: in wait_for_message, is it possible to leave out the message saying what you are looking for but only including the string noting what you are not looking for (i.e.

Wait for new bind10 stderr message not ERROR_FOUND

... which returns success in 10 second if the text ERROR_FOUND is not in the stderr stream.

If the Docstring is fixed, it can be merged with no further review needed.

comment:17 in reply to: ↑ 16 Changed 8 years ago by jelte

  • Resolution set to fixed
  • Status changed from reviewing to closed

Replying to stephen:

tests/lettuce/features/terrain/bind10_control.py
Docstring for send_command is the same as for set_config_command.

fixed.

Minor thing and probably not worth changing it for here but maybe something we could do later: the code for spawning off bindctl is duplicated in send_config_command and send_command. A common function to invoke bindctl, perhaps accepting a sequence of commands as a list, may be useful.

Noted, will do so for next addition :)

tests/lettuce/features/terrain/steps.py
Question: in wait_for_message, is it possible to leave out the message saying what you are looking for but only including the string noting what you are not looking for (i.e.

Wait for new bind10 stderr message not ERROR_FOUND

... which returns success in 10 second if the text ERROR_FOUND is not in the stderr stream.

We *could* (if message is None, catch timeout and return, if not
caught, raise error), but one problem is that it would always take the
full time to get there. So IMO, unless we actually need it for
something, and there is no other way to do whatever test we need it
for, I rather leave the option out for now :)

If the Docstring is fixed, it can be merged with no further review needed.

Ok, thanks, merged, closing ticket.

comment:18 Changed 8 years ago by jinmei

Actually, the real problem hasn't been solved yet.

Although the problem statement of the ticket could be misleading,
it's not only for the "in source" environment. It happens for
an installed xfrin program, and it's actually more substantial.

I'm reopening the ticket.

comment:19 Changed 8 years ago by jinmei

  • Resolution fixed deleted
  • Status changed from closed to reopened

comment:20 follow-up: Changed 8 years ago by jelte

  • Owner changed from jelte to jinmei
  • Status changed from reopened to reviewing

(I am assigning this directly back to you jinmei, since you had the original request and proposal in the first place, feel free to make unassigned if you don't have time before tomorrow, or throw chairs in my direction if you hate the idea too much)

Okay, I have a proposal for a slightly different middle-term solution to this problem than the one you sent to bind10-dev; there are no additional values passed, but the DataSourceClient? container now calls a function that makes an absolute path for dlopen() to use;

  • if the file does not end with .so, _ds.so is appended (so that once we do have configuration, any name can be used for home-grown modules)
  • if the file is not absolute, it is made absolute. By default this uses $(prefix)/lib (but we can change to whatever directory we want to put our modules in), unless B10_FROM_BUILD is set in the environment (at this moment I can't think of any way not to need something like that somewhere in the code path; we'll always have to be able to override the value for in-tree runs and tests), then it will use $(abs_top_builddir)/src/datasrc/.libs.

I have an initial implementation of this in trac1292_2; please take a look if you have the opportunity.

This branch also has a slight addition to the lettuce test; it prints bindctl stdout/stderr if bindctl calls fails when sending a command. Also, a number of tests now need to set B10_FROM_BUILD so that the install prefix isn't used when using the factory loader.

BTW, I have not checked, but if this works, i suspect it solves several other related tickets as well (#1385 and #1421, maybe more)

comment:21 in reply to: ↑ 20 ; follow-up: Changed 8 years ago by jinmei

Replying to jelte:

First, this point:

BTW, I have not checked, but if this works, i suspect it solves several other related tickets as well (#1385 and #1421, maybe more)

For #1421, probably. I suspect #1385 is a separate issue.

About the branch:

general

I'm not very happy about introducing yet another hardcode of things
like B10_FROM_BUILD in our main source code. Is the plan to have this
as a short to middle term workaround and eventually replace it with a
more generic framework? If so, I can live with that (but please make
that clear in comments). If not...we should discuss it:-)

datasrc_config.h.in

  • We need a header guard for this file.
  • I'd avoid using macro whenever possible, and I think in this case we can avoid that (it should be able to be a const char* const). I'd also put the definition in a namespace isc::datasrc.
  • We need documentation for MODULE_PATH.
  • Although this is independent from the topic of this ticket, I personally think @prefix@/lib is not the right place for loadable modules for two reasons: 1. they are not really "libraries", 2. they are quite BIND 10 specific. So I'd suggest changing it to @localstatedir@/@PACKAGE@/modules (which would be expanded to something like /usr/local/libexec/bind10-devel/modules), either within this ticket or in a new ticket after creating it.

bind10_src.py.in

Why did you reintroduce start_zonemgr and start_stats? Also, although
this is a subject of #1388, we should actually even be able to
obsolete (i.e. remove) start_xfrout() and start_xfrin() and remove
XfrIn/XfrOut? from isc.bind10.special_component.py (and these would
also require updating bob.spec).

We could leave this task to #1388 as originally planned, or complete
that part within this ticket.

factory.cc

  • maybe a matter of taste in the case of std::string, but I suggest using empty() instead of length() == 0:
        if (type.length() == 0) {
    
    At least for some STL containers that's a better practice, and I think using consistent convention would make sense here, too.
  • I'd also move the definition of lib_file just before its first use:
        std::string lib_file = type;
        const int ext_pos = lib_file.rfind(".so");
    
  • Don't we need a test for getDataSourceLibFile?

datasrc/tests/Makefile.am

  • Why was this commented out, not simply removed?
    -run_unittests_SOURCES += factory_unittest.cc
    +#run_unittests_SOURCES += factory_unittest.cc
    
  • Regarding run_unittests_factory, if the intent is as commented:
    # Also, we only want to do this when static building is not used.
    
    Shouldn't we wrap this with "if !USE_STATIC_LINK/endif"?

lettuce bind10_control.py

I've not confirmed this by running it, but is this okay?

    result = bindctl.wait()
    (stdout, stderr) = bindctl.communicate()

According to the documentation communicate() seems to have the
functionality of wait().

lettuce xfrin_bind10.feature

I'm afraid I'll need to learn more about lettuce to understand this
change. I'm willing to do that, but it will take some time. For now,
maybe it's better to have someone more familiar with lettuce (if any)
quickly review and approve it.

comment:22 Changed 8 years ago by jinmei

  • Owner changed from jinmei to jelte

comment:23 Changed 8 years ago by jinmei

Forgot to mention this: I made some trivial (I hope) cleanups directly
on the branch and pushed them.

comment:24 in reply to: ↑ 21 ; follow-up: Changed 8 years ago by jelte

  • Owner changed from jelte to jinmei

Thanks for looking at it on such short notice :)

Replying to jinmei:

Replying to jelte:

First, this point:

BTW, I have not checked, but if this works, i suspect it solves several other related tickets as well (#1385 and #1421, maybe more)

For #1421, probably. I suspect #1385 is a separate issue.

he hasn't answered my question there yet, but i have seen this problem
too; when i had an older installed version that wasn't
binary-compatible with recent changes. Exact same symptoms, caused by
dlopen picking the wrong one. So if that is the case for him too, the
root cause is of course different (we need version check), but the
system should fall in that trap a lot less (theoretically, never,
even, unless you start mixing versions).

About the branch:

general

I'm not very happy about introducing yet another hardcode of things
like B10_FROM_BUILD in our main source code. Is the plan to have this
as a short to middle term workaround and eventually replace it with a
more generic framework? If so, I can live with that (but please make
that clear in comments). If not...we should discuss it:-)

Actually, I just don't see another way (yet? perhaps you have an
idea). Even if we have a configuration value, we'll still need to
override the default when running from source tree (either by the
caller or the callee, but i'd prefer to put it in as central a
location as possible). And not only for tests, also for run_bind10.sh.
I did not want to add yet another env var, and the location should
always be fixed from the build directory, so if we do need one, this
one seems best to me. This is a very similar problem to the database
file one.

datasrc_config.h.in

  • We need a header guard for this file.

ack

  • I'd avoid using macro whenever possible, and I think in this case

we can avoid that (it should be able to be a const char* const). I'd
also put the definition in a namespace isc::datasrc.

ack

  • We need documentation for MODULE_PATH.

ack

  • Although this is independent from the topic of this ticket, I personally think @prefix@/lib is not the right place for loadable modules for two reasons: 1. they are not really "libraries", 2. they are quite BIND 10 specific. So I'd suggest changing it to @localstatedir@/@PACKAGE@/modules (which would be expanded to something like /usr/local/libexec/bind10-devel/modules), either within this ticket or in a new ticket after creating it.

ah, yes i asked around for better suggestions but hadn't gotten any,
so I defaulted to what we were already using. The annoying thing about
localstatedir is that it expands to a shell variable, not an absolute
directory (for that matter, even if it would be prefix/lib it
should've been libdir which does the same, the reason being that
make is supposed to be able to modify it as well, after configure has
run).

Oh, localstate wouldn't expand to libexec, but to var, btw. I like
libexec a bit more though (not that it's exec, but it's not really
var/ material either, and this way it is closer to the binary).
Anyway, I have no really strong opinion, and now that it is expanded
correctly, we can easily change it.

bind10_src.py.in

Why did you reintroduce start_zonemgr and start_stats? Also, although
this is a subject of #1388, we should actually even be able to
obsolete (i.e. remove) start_xfrout() and start_xfrin() and remove
XfrIn/XfrOut? from isc.bind10.special_component.py (and these would
also require updating bob.spec).

Oops. Merge error (started out in the existing branch, then figured
out there was no reason for that, and move changes to master. That
brought along some old code. Reapplying cleanly now).

We could leave this task to #1388 as originally planned, or complete
that part within this ticket.

Actually I prefer doing it here, so we can show that it works.

factory.cc

  • maybe a matter of taste in the case of std::string, but I suggest using empty() instead of length() == 0:
        if (type.length() == 0) {
    

ok

  • I'd also move the definition of lib_file just before its first use:
        std::string lib_file = type;
        const int ext_pos = lib_file.rfind(".so");
    
  • Don't we need a test for getDataSourceLibFile?

did we decide on a good approach for testing
anonymous-namespace-functions? We can of course make it a function in
isc::datasrc otherwise.
(i.e. have not done this yet)

datasrc/tests/Makefile.am

  • Why was this commented out, not simply removed?
    -run_unittests_SOURCES += factory_unittest.cc
    +#run_unittests_SOURCES += factory_unittest.cc
    
  • Regarding run_unittests_factory, if the intent is as commented:
    # Also, we only want to do this when static building is not used.
    
    Shouldn't we wrap this with "if !USE_STATIC_LINK/endif"?

yes, sorry, forgot about this in my hurry to get you to take a look at
it :p Moved the if check and removed the old part.

lettuce bind10_control.py

I've not confirmed this by running it, but is this okay?

    result = bindctl.wait()
    (stdout, stderr) = bindctl.communicate()

According to the documentation communicate() seems to have the
functionality of wait().

It works (communicate calls wait() again, which immediately returns),
but it is probably cleaner to only use communicate, then check the
returncode attribute, so I've changed that.

lettuce xfrin_bind10.feature

I'm afraid I'll need to learn more about lettuce to understand this
change. I'm willing to do that, but it will take some time. For now,
maybe it's better to have someone more familiar with lettuce (if any)
quickly review and approve it.

Ok, will do.

comment:25 in reply to: ↑ 24 Changed 8 years ago by jinmei

Replying to jelte:

First off, distcheck failed:

ERROR: files left in build directory after distclean:
./src/lib/datasrc/datasrc_config.h

BTW, I have not checked, but if this works, i suspect it solves several other related tickets as well (#1385 and #1421, maybe more)

For #1421, probably. I suspect #1385 is a separate issue.

he hasn't answered my question there yet, but i have seen this problem
too; when i had an older installed version that wasn't
binary-compatible with recent changes. Exact same symptoms, caused by
dlopen picking the wrong one. So if that is the case for him too, the
root cause is of course different (we need version check), but the
system should fall in that trap a lot less (theoretically, never,
even, unless you start mixing versions).

I'm still not sure whether dlopen() helps #1385 - could it be an
incompatible version of our own loadable modules? Anyway, that's not
a showstopper issue for this ticket. As for #1385 itself maybe you
should poke Kevin on jabber or give the ticket back to him.

About the branch:

general

I'm not very happy about introducing yet another hardcode of things
like B10_FROM_BUILD in our main source code. [...]

Actually, I just don't see another way (yet? perhaps you have an
idea). Even if we have a configuration value, we'll still need to
override the default when running from source tree (either by the
caller or the callee, but i'd prefer to put it in as central a
location as possible). And not only for tests, also for run_bind10.sh.
I did not want to add yet another env var, and the location should
always be fixed from the build directory, so if we do need one, this
one seems best to me. This is a very similar problem to the database
file one.

I've not fully thought about it either, but I can imagine something
like this:

  • We (eventually) introduce more generic data source configuration where we can specify the default module path
  • We specify the installed version of module path for the installed version of the spec file for the data source config "module"
  • For in-source run environment, we might either provide a hacked version of spec file (if possible) or provide a template b10-config file that overrides the module path config to the in-source path
  • For unit tests, we can probably provide a utility module/helper library that tweaks the path and have tests that need to use data sources import/call it

This is not a baked idea and may or may not be feasible, but in
general if we can replace this kind of harcode in the core
module/program with configuration and helper modules specific for
tests (while reasonably avoiding duplicate setups for tests), I'd
prefer that.

Anyway, for the purpose of this ticket I'm okay with the current
approach (even if the above idea makes sense it would require generic
data source config framework, which would be way beyond the scope for
this ticket).

datasrc_config.h.in

  • Although this is independent from the topic of this ticket, I personally think @prefix@/lib is not the right place for loadable modules [...]

ah, yes i asked around for better suggestions but hadn't gotten any,
[...]

Oh, localstate wouldn't expand to libexec, but to var, btw. I like
libexec a bit more though (not that it's exec, but it's not really
var/ material either, and this way it is closer to the binary).
Anyway, I have no really strong opinion, and now that it is expanded
correctly, we can easily change it.

Ah, yes, what I meant was libexec rather than localstatedir (maybe I
incorrectly assumed the latter would expand to libexec when I made the
comment). The revised code on this point looks okay.

And this means the existing modules could now be cleaned up, and to
note that I think this ticket needs a changelog entry.

bind10_src.py.in

Why did you reintroduce start_zonemgr and start_stats? [...]


Oops. Merge error (started out in the existing branch, then figured
out there was no reason for that, and move changes to master. That
brought along some old code. Reapplying cleanly now).

We could leave this task to #1388 as originally planned, or complete
that part within this ticket.

Actually I prefer doing it here, so we can show that it works.

Okay, then for the next step we should at least remove start_xfrin/out
and update bob.spec and isc.bind10.special_component accordingly.

I think that changes regarding this should also be noted in the
changelog.

factory.cc

  • Don't we need a test for getDataSourceLibFile?

did we decide on a good approach for testing
anonymous-namespace-functions? We can of course make it a function in

I don't even remember whether we have ever discussed this, but in
general I don't think we can directly test things in an unnamed (aka
"anonymous") namespace. In this case I can think of two
possibilities:

  • define it in something like isc::datasrc::detail (with an explicit note that it's only for tests) and test the function directly.
  • test the behavior indirectly, e.g., by passing a non existent path to DataSourceClientContainer? and check the resulting what() string on exception.

(If the second option isn't feasible) It would be case-by-case whether
we want to test a particular function/method even if it would
technically publish internal details. I personally think the
testability is more important in this case because we can often
overlook some corner case errors in this type of string manipulation.

datasrc/tests/Makefile.am

yes, sorry, forgot about this in my hurry to get you to take a look at
it :p Moved the if check and removed the old part.

Okay, then I have some additional comments:

  • is this about "the libraries" or "modules"?
    # For the factory unit tests, we need to specify that we want
    # the libraries from the build tree, ...
    
  • also, maybe we should copy the reason why we limit this to !USE_STATIC_LINK from the now-removed comment:
    # This test uses dynamically loadable module.  It will cause various
    # troubles with static link such as "missing" symbols in the static object
    # for the module.  As a workaround we disable this particualr test
    # in this case.
    

Changes I've not explicitly commented on above are okay to me.

comment:26 Changed 8 years ago by jinmei

  • Owner changed from jinmei to jelte

comment:27 Changed 8 years ago by jelte

  • Owner changed from jelte to jinmei

Replying to jinmei:

Replying to jelte:

First off, distcheck failed:

ERROR: files left in build directory after distclean:
./src/lib/datasrc/datasrc_config.h

fixed

BTW, I have not checked, but if this works, i suspect it solves several other related tickets as well (#1385 and #1421, maybe more)

For #1421, probably. I suspect #1385 is a separate issue.

he hasn't answered my question there yet, but i have seen this problem
too; when i had an older installed version that wasn't
binary-compatible with recent changes. Exact same symptoms, caused by
dlopen picking the wrong one. So if that is the case for him too, the
root cause is of course different (we need version check), but the
system should fall in that trap a lot less (theoretically, never,
even, unless you start mixing versions).

I'm still not sure whether dlopen() helps #1385 - could it be an
incompatible version of our own loadable modules? Anyway, that's not
a showstopper issue for this ticket. As for #1385 itself maybe you
should poke Kevin on jabber or give the ticket back to him.

Yes, incompatible loaded libraries (at least in the case where I ran
into it), needs some form of version check, but it also needs to
(be able to) pick the right file in the first place (otherwise you can
only detect something is wrong, not do anything about it).

About the branch:

I've not fully thought about it either, but I can imagine something
like this:

  • We (eventually) introduce more generic data source configuration where we can specify the default module path
  • We specify the installed version of module path for the installed version of the spec file for the data source config "module"
  • For in-source run environment, we might either provide a hacked version of spec file (if possible) or provide a template b10-config file that overrides the module path config to the in-source path
  • For unit tests, we can probably provide a utility module/helper library that tweaks the path and have tests that need to use data sources import/call it

This is not a baked idea and may or may not be feasible, but in
general if we can replace this kind of harcode in the core
module/program with configuration and helper modules specific for
tests (while reasonably avoiding duplicate setups for tests), I'd
prefer that.

(this has now been kind of superseded by the meeting discussion, but
leaving it in for reference)

Hmm, differing spec-files may indeed be the best approach here. One
open question could be whether it should be possible to work with
'staging setups', where the target of 'make install' is not the final
location on the system (not saying that it should, just wondering).

This could then also be used for things like the current
Auth/database_file problem (where some environment variables even
cause failing tests right now).

Also not fully thought out yet, but possibly a nice approach could be
to just modify them upon installation (or recreate them with
install-values at that point), so there's no confusion in-tree.

I think we should make removing environment variables a goal (or at
least reduce them to the bare minimum). A slightly different,
but related issue is the current run-scripts, and generated python
files.

So I propose to keep the approach in the branch for now, then

  • identify the environment variables we use
  • discuss a final approach to remove or consolidate them
  • implement that approach

And the same for the generated files and run_scripts (perhaps even in
one go, though that might be a bit much).

I'll suggest this as a topic for the meeting today.

Anyway, for the purpose of this ticket I'm okay with the current
approach (even if the above idea makes sense it would require generic
data source config framework, which would be way beyond the scope for
this ticket).

ack

Ah, yes, what I meant was libexec rather than localstatedir (maybe I
incorrectly assumed the latter would expand to libexec when I made the
comment). The revised code on this point looks okay.

And this means the existing modules could now be cleaned up, and to
note that I think this ticket needs a changelog entry.

See below for a proposal for changelog.

bind10_src.py.in

Actually I prefer doing it here, so we can show that it works.

Okay, then for the next step we should at least remove start_xfrin/out
and update bob.spec and isc.bind10.special_component accordingly.

I think that changes regarding this should also be noted in the
changelog.

Of course, done, should now be started as dispensable normal
components.

factory.cc

  • Don't we need a test for getDataSourceLibFile?

did we decide on a good approach for testing
anonymous-namespace-functions? We can of course make it a function in

I don't even remember whether we have ever discussed this, but in
general I don't think we can directly test things in an unnamed (aka
"anonymous") namespace. In this case I can think of two
possibilities:

  • define it in something like isc::datasrc::detail (with an explicit note that it's only for tests) and test the function directly.
  • test the behavior indirectly, e.g., by passing a non existent path to DataSourceClientContainer? and check the resulting what() string on exception.

I personally am not fond of changing or adding interfaces just for
testing (in general, of course there are always exceptions), so adding
a specialized namespace for this would not be my favourite option. And
neither would another way we could go; introduce a member variable
that stores the file that has been loaded.

But checking the error message is a good idea, added a test. Do you
know if there are dlopen() implementations that form different errors?
(if so i'll have to modify the test a bit to first discover that error,
or do some form of string.startswith check, but the error a direct
EXPECT_EQ() gives is much nicer.

datasrc/tests/Makefile.am

yes, sorry, forgot about this in my hurry to get you to take a look at
it :p Moved the if check and removed the old part.

Okay, then I have some additional comments:

  • is this about "the libraries" or "modules"?
    # For the factory unit tests, we need to specify that we want
    # the libraries from the build tree, ...
    

Ack changed (into backend libraries, as per the call just now)

  • also, maybe we should copy the reason why we limit this to !USE_STATIC_LINK from the now-removed comment:
    # This test uses dynamically loadable module.  It will cause various
    # troubles with static link such as "missing" symbols in the static object
    # for the module.  As a workaround we disable this particualr test
    # in this case.
    

ok, readded

Changes I've not explicitly commented on above are okay to me.

Thanks.

Finally, here's the changelog proposal (it's a pretty long one for a
relatively small change :p)

[build?] jelte

The DataSourceClient? class that dynamically loads datasource backend
libraries no longer provides just a .so file name to its call to
dlopen(), but passes it an absolute path. This means that it is no
longer an system implementation detail that depends on
[DY]LD_LIBRARY_PATH which file is chosen, should there be multiple
options (for instance, when test-running a new build while a different
version is installed).
These loadable modules are also no longer installed in the default
library path, but in a subdirectory of the libexec directory of the
target ($prefix/libexec/[version]/backends).
This also removes the need to handle b10-xfin and b10-xfrout as
'special' hardcoded components, and they are now started as regular
components as dictated by the configuration of the boss process.

comment:28 follow-up: Changed 8 years ago by jinmei

Replying to jelte:

Also not fully thought out yet, but possibly a nice approach could be
to just modify them upon installation (or recreate them with
install-values at that point), so there's no confusion in-tree.

Yes, that's one concrete form of realization of the general idea.

So I propose to keep the approach in the branch for now, then

  • identify the environment variables we use
  • discuss a final approach to remove or consolidate them
  • implement that approach

And the same for the generated files and run_scripts (perhaps even in
one go, though that might be a bit much).

This is fine to me.

factory.cc

  • Don't we need a test for getDataSourceLibFile?

But checking the error message is a good idea, added a test. Do you
know if there are dlopen() implementations that form different errors?
(if so i'll have to modify the test a bit to first discover that error,
or do some form of string.startswith check, but the error a direct
EXPECT_EQ() gives is much nicer.

Yeah actually that failed for my environment:

factory_unittest.cc:41: Failure
Value of: error
  Actual: "dlopen(/no_such_file.so, 10): image not found"
Expected: expected_error
Which is: "/no_such_file.so: cannot open shared object file: No such file or directory"

I think it makes more sense to make the what() message independent
from dlopen() implementation details, e.g.:

        isc_throw(DataSourceLibrarySymbolError, "dlopen failed for "
        << name << ": " << dlsym_error);

and search for "dlopen failed for /no_such_file.so", etc. That may
result in some redundant message like repeating the path name, but
assuming it's basically something "exceptional" I think it's
acceptable.

Also, pathtest_helper would better be named pathtestHelper per naming
guideline.

Finally, here's the changelog proposal (it's a pretty long one for a
relatively small change :p)

[build?] jelte

The DataSourceClient? class that dynamically loads datasource backend
libraries no longer provides just a .so file name to its call to
dlopen(), but passes it an absolute path. This means that it is no
longer an system implementation detail that depends on
[DY]LD_LIBRARY_PATH which file is chosen, should there be multiple
options (for instance, when test-running a new build while a different
version is installed).
These loadable modules are also no longer installed in the default
library path, but in a subdirectory of the libexec directory of the
target ($prefix/libexec/[version]/backends).
This also removes the need to handle b10-xfin and b10-xfrout as
'special' hardcoded components, and they are now started as regular
components as dictated by the configuration of the boss process.

Overall it looks okay. I have some minor comments.

  • I think this is categorized as "bug" (for which we currently provide workaround, and this change is a more complete fix)
  • This is a backward incompatible change (someone who tweaks bob.spec will need to update it), so we'll need '*'. We may also explicitly note the incompatibility.
  • I would consistently call the .so's "modules" (this changelog entry first calls it "library" then "module". See also below for library vs module)
  • We may want to note that previously installed .so's can now be removed.

BTW I think it's okay to write a long entry. Our general consensus is
to rather provide longer ones than helplessly concise ones like just
"Fixed a b10-auth crash bug".

factory.h

I don't see the reason for changing "module" to "library" here:

-/// The value of type can be a specific loadable module; if it already ends
+/// The value of type can be a specific loadable library; if it already ends
 /// with '.so', the loader will not add '_ds.so'.
 /// It may also be an absolute path; if it starts with '/', nothing is
-/// prepended. If it does not, the loadable module will be taken from the
-/// installation library directory.
+/// prepended. If it does not, the loadable library will be taken from the
+/// libexec/backends/ installation directory.

I see the motivation of not using "module" for the installation
directory, but in this context we don't see any confusion in using
"module". I also believe the term (dynamic) "loadable module" is
pretty well known (just like indicated by the "-module" LD flag).
Besides, even if we want to avoid "module" for them they are not
really "libraries" in their usual sense either, so I don't think the
latter is necessarily a better choice anyway. Same comment applies to
tests/Makefile.am.

Also, I'm not sure if we really want to hardcode "libexec/backends"
here because we might change the decision of the naming/path, in which
case we could easily just update Makefile.am and forget update this
one.

comment:29 Changed 8 years ago by jinmei

  • Owner changed from jinmei to jelte

comment:30 in reply to: ↑ 28 Changed 8 years ago by jelte

  • Owner changed from jelte to jinmei

Replying to jinmei:

factory.cc

  • Don't we need a test for getDataSourceLibFile?

But checking the error message is a good idea, added a test. Do you
know if there are dlopen() implementations that form different errors?
(if so i'll have to modify the test a bit to first discover that error,
or do some form of string.startswith check, but the error a direct
EXPECT_EQ() gives is much nicer.

Yeah actually that failed for my environment:

factory_unittest.cc:41: Failure
Value of: error
  Actual: "dlopen(/no_such_file.so, 10): image not found"
Expected: expected_error
Which is: "/no_such_file.so: cannot open shared object file: No such file or directory"

I was afraid of that :) (and it's even worse than I feared).

I think it makes more sense to make the what() message independent
from dlopen() implementation details, e.g.:

        isc_throw(DataSourceLibrarySymbolError, "dlopen failed for "
        << name << ": " << dlsym_error);

and search for "dlopen failed for /no_such_file.so", etc. That may
result in some redundant message like repeating the path name, but
assuming it's basically something "exceptional" I think it's
acceptable.

Right. Done.

Also, pathtest_helper would better be named pathtestHelper per naming
guideline.

ack, done

Finally, here's the changelog proposal (it's a pretty long one for a
relatively small change :p)

<snip>

BTW I think it's okay to write a long entry. Our general consensus is
to rather provide longer ones than helplessly concise ones like just
"Fixed a b10-auth crash bug".

ack. I'll take your suggestions for the final entry

factory.h

I don't see the reason for changing "module" to "library" here:

-/// The value of type can be a specific loadable module; if it already ends
+/// The value of type can be a specific loadable library; if it already ends
 /// with '.so', the loader will not add '_ds.so'.
 /// It may also be an absolute path; if it starts with '/', nothing is
-/// prepended. If it does not, the loadable module will be taken from the
-/// installation library directory.
+/// prepended. If it does not, the loadable library will be taken from the
+/// libexec/backends/ installation directory.

I see the motivation of not using "module" for the installation
directory, but in this context we don't see any confusion in using
"module". I also believe the term (dynamic) "loadable module" is
pretty well known (just like indicated by the "-module" LD flag).
Besides, even if we want to avoid "module" for them they are not
really "libraries" in their usual sense either, so I don't think the
latter is necessarily a better choice anyway. Same comment applies to
tests/Makefile.am.

(see also discussion in jabber, I myself have no strong preference for
any specific name either but we should indeed be consistent)

Oh yeah I missed a number of them, updated, also updated MODULE_PATH
to BACKEND_LIBRARY_PATH.

Also, I'm not sure if we really want to hardcode "libexec/backends"
here because we might change the decision of the naming/path, in which
case we could easily just update Makefile.am and forget update this
one.

Replaced it with a reference to the const value in the
datasrc_config.h file.

comment:31 follow-up: Changed 8 years ago by jinmei

The branch now seems to be mostly okay. I have a few comments on
FactoryTest::paths.

  • for pathtestHelper() I'd make sure the actual error string has a sufficient length:
        ASSERT_LT(expected_error.size(), error.size());
        EXPECT_EQ(expected_error, error.substr(0, expected_error.size()));
    
  • I think we should also cover the case for an empty string.
  • I'd also like to test the case when B10_FROM_BUILD isn't set. That would require unsetenv it once (and probably recover it immediately after that specific test case), though. Hopefully its side effect is acceptable.
  • I'd also check some tricky name like "no_such_file.so.noso" (the intermediate '.so' shouldn't confuse the parser)
  • What should we do if we just pass ".so" (no prefix)?

comment:32 Changed 8 years ago by jinmei

  • Owner changed from jinmei to jelte

comment:33 in reply to: ↑ 31 Changed 8 years ago by jelte

  • Owner changed from jelte to jinmei

Replying to jinmei:

The branch now seems to be mostly okay. I have a few comments on
FactoryTest::paths.

  • for pathtestHelper() I'd make sure the actual error string has a sufficient length:
        ASSERT_LT(expected_error.size(), error.size());
        EXPECT_EQ(expected_error, error.substr(0, expected_error.size()));
    
  • I think we should also cover the case for an empty string.
  • I'd also like to test the case when B10_FROM_BUILD isn't set. That would require unsetenv it once (and probably recover it immediately after that specific test case), though. Hopefully its side effect is acceptable.
  • I'd also check some tricky name like "no_such_file.so.noso" (the intermediate '.so' shouldn't confuse the parser)

Ack, all added

  • What should we do if we just pass ".so" (no prefix)?

Hmm, yes, that should probably explicitely fail too, added a check for
that.

BTW, also changed the error that is raised to DataSourceLibraryError?
(instead of DataSourceError?)

comment:34 Changed 8 years ago by shane

  • Feature Depending on Ticket set to none

comment:35 Changed 8 years ago by jinmei

One minor nit:

I believe there should be a space between "called" and "with".

        isc_throw(DataSourceLibraryError, "DataSourceClient container called"
                                          "with bad type or file name");

With fixing this the branch is okay to merge.

comment:36 Changed 8 years ago by jinmei

  • Owner changed from jinmei to jelte

comment:37 Changed 8 years ago by jinmei

Oh, and I think we can also close #1388.

comment:38 Changed 8 years ago by jinmei

  • Total Hours changed from 0 to 6.17

comment:39 Changed 8 years ago by jelte

  • Add Hours to Ticket changed from 5 to 0
  • Resolution set to fixed
  • Status changed from reviewing to closed
  • Total Hours changed from 6.17 to 16.17

Thanks, merged, closing ticket.

Note: See TracTickets for help on using tickets.