Opened 9 years ago

Closed 9 years ago

Last modified 9 years ago

#931 closed task (complete)

Implement signing part in b10-auth

Reported by: vorner Owned by: vorner
Priority: medium Milestone: Sprint-20110531
Component: Unclassified Version:
Keywords: Cc:
CVSS Scoring: Parent Tickets:
Sensitive: no Defect Severity: N/A
Sub-Project: DNS Feature Depending on Ticket: tsig
Estimated Difficulty: 3.0 Add Hours to Ticket: 0
Total Hours: 0 Internal?: no

Description

This is the final part of #875. This will implement signing the responses when they go out. This uses code from #925, even if it is not merged yet.

Subtickets

Attachments (6)

propsal.diff (5.9 KB) - added by jinmei 9 years ago.
cfgmgr.diff (2.9 KB) - added by jinmei 9 years ago.
make.log (10.0 KB) - added by jinmei 9 years ago.
ccsessiontest.diff (3.5 KB) - added by jinmei 9 years ago.
fetchRemoteSpec.diff (2.6 KB) - added by jinmei 9 years ago.
ccsession.diff (3.0 KB) - added by jinmei 9 years ago.

Download all attachments as: .zip

Change History (35)

comment:1 Changed 9 years ago by shane

  • Milestone changed from New Tasks to Sprint-20110517
  • Owner changed from vorner to UnAssigned
  • Status changed from new to assigned

comment:2 Changed 9 years ago by vorner

It's been handed to UnAssigned?, because I don't have enough time today to finish it before the end of sprint, so if someone wants to take it...

There are some tests already in the branch, the actual code needs to be written, probably into the lookup handler of auth_srv.cc.

If noone takes it today to finish it now, I'll take it back.

comment:3 Changed 9 years ago by vorner

  • Owner changed from UnAssigned to vorner

Hmm, nobody took it, so I'm going to take it back.

comment:4 follow-ups: Changed 9 years ago by vorner

  • Owner changed from vorner to jinmei
  • Status changed from assigned to reviewing

It is ready for review. I tested it with dig and it seems to work (I didn't check the signature itself, but there's one).

There are few unrelated bug fixes, but it didn't work without them. Maybe they could be done better or might need some tests for them (I tested it when it stopped crashing, instead of writing tests first, it's faster when trying to find the source of problem). Is it OK to do it in another ticket?

The proposed changelog is:

Authoritative server can now sign the answers using TSIG (configured in tsig_keys/keys,
list of strings like "name:c2VjcmV0Cg==:sha1-hmac"). It doesn't use them for ACL yet,
only signs if the request is signed.

comment:5 follow-up: Changed 9 years ago by jinmei

(The following are comments for changes in auth, i.e., the main part
of the branch)

First, I've made some minor changes and pushed them to the repo.

Higher level comment

(I don't request it to be addressed within this ticket, but would like
to discuss it separately.)

I personally don't like to refer to isc::server_common::keyring (a
global variable) so directly in various places of code. (specifically
in this branch, both in test and auth_srv.cc). This way we need to
have the knowledge that the auth server internally refers to the
global variable.

Also, due to the dependency on the global variable, we need to be
careful in clearing the keyring in each test:

    isc::server_common::keyring.reset(new TSIGKeyRing);
    isc::server_common::keyring->add(key);
    server.processMessage(*io_message, parse_message, response_obuffer,
                          &dnsserv);
    isc::server_common::keyring.reset();

so that we won't accidentally leave temporary key in the global
storage.

I think we should have one level abstraction like:

  • add an explicit interface to install a keyring to a server, say, AuthServ::setTSIGKeyring()
  • In auth/main.cc we use this interface to specify the global key ring
  • In tests, the test fixture creates its own keyring and sets its in the server
  • within server, it always refers to the keyring given via setTSIGKeyring(), whether it's a global one or some local data.

Also, eventually, we'll probably want to have multiple key rings (per
view, etc), so a single initKeyring() may not suffice. but for now,
I'm okay with the current design.

auth_srv.cc

Other than the higher level point, it looks okay.

auth_srv_unittest.cc

AuthSrvTest::TSIGSigned

  • I'm not sure what this "TODO" means, but in this case we wouldn't expect it to be counted. getRRCount() doesn't count meta RRs (by design).
        // TODO Is the TSIG counted here?
        headerCheck(*parse_message, default_qid, Rcode::NOERROR(),
                    opcode.getCode(), QR_FLAG | AA_FLAG, 1, 1, 1, 0);
    
  • I'd avoid implicit conversion from a pointer to bool:
        ASSERT_TRUE(tsig) << "Missing TSIG signature";
    
    I'd either say "tsig != NULL" or ASSERT_NE(static_cast<void*>(NULL), tsg). (unfortunately static_cast is necessary for some deviant compilers)

AuthSrvTest::TSIGSignedNoKey

The comment and the test name don't seem to match the actual test
code. It seems to test a badsig case. BTW, assuming you meant
TSIGBadSig, I'd suggest:

  • Change the 3rd arg to headerCheck to TSIGError::BAD_SIG().toRcode() even if the result is the same
        headerCheck(*parse_message, default_qid, TSIGError::BAD_KEY().toRcode(),
                    opcode.getCode(), QR_FLAG, 1, 0, 0, 0);
    
  • and, add a more explicit test about the TSIG error code:
        EXPECT_EQ(TSIGError::BAD_KEY_CODE, tsig->getRdata().getError());
    

AuthSrvTest::TSIGBadSig

same mismatch for TSIGBadSig. same sense of comments apply.

Both for TSIGSignedNoKey and TSIGBadSig

Regardless of the previous point: I don't understand this:

    // Make sure we get the extended error code as well
    request_message.setEDNS(ConstEDNSPtr(new EDNS));
    createRequestPacket(request_message, IPPROTO_UDP, &context);

For what do you want to get the extended rcode? If you mean TSIG
error code, it's directly coded in the TSIG RR, not in the form of
the extended RCODE.

Other

I'd check if TSIG is checked before anything else. For example, give
a request of an unsupported query with a bad sig, and confirm we have
a bad sig (+ NOTAUTH) response rather than NOTIMP. But the current
code looks correct on this point, so (considering we're running out of
time) I'm okay with deferring it to a later ticket.

comment:6 in reply to: ↑ 5 ; follow-up: Changed 9 years ago by vorner

Hello

Replying to jinmei:

I think we should have one level abstraction like:

  • add an explicit interface to install a keyring to a server, say, AuthServ::setTSIGKeyring()
  • In auth/main.cc we use this interface to specify the global key ring
  • In tests, the test fixture creates its own keyring and sets its in the server
  • within server, it always refers to the keyring given via setTSIGKeyring(), whether it's a global one or some local data.

This will come with some technical issues, because the keyring is created every time it is loaded, but generally I agree that something like this is a good idea.

Also, eventually, we'll probably want to have multiple key rings (per
view, etc), so a single initKeyring() may not suffice. but for now,
I'm okay with the current design.

I don't want to worry about views, at last not now. The code is not ready for views at all yet and we didn't decide how we'll handle them.

  • I'm not sure what this "TODO" means, but in this case we wouldn't expect it to be counted. getRRCount() doesn't count meta RRs (by design).

Thanks. I forgot to remove the TODO when I discovered they are not counted in the count RRs.

I'd either say "tsig != NULL" or ASSERT_NE(static_cast<void*>(NULL), tsg).
(unfortunately static_cast is necessary for some deviant compilers)

OK.

AuthSrvTest::TSIGSignedNoKey

The comment and the test name don't seem to match the actual test
code. It seems to test a badsig case. BTW, assuming you meant
TSIGBadSig, I'd suggest:

Bad copy-paste when writing the tests. I accidentally switched them, it should be correct now.

Regardless of the previous point: I don't understand this:

    // Make sure we get the extended error code as well
    request_message.setEDNS(ConstEDNSPtr(new EDNS));
    createRequestPacket(request_message, IPPROTO_UDP, &context);

For what do you want to get the extended rcode? If you mean TSIG
error code, it's directly coded in the TSIG RR, not in the form of
the extended RCODE.

That's why the tests didn't fail when I switched them. Thanks for clarification, this is not needed then. I assumed it is returned in the extended RCODE.

I'd check if TSIG is checked before anything else. For example, give
a request of an unsupported query with a bad sig, and confirm we have
a bad sig (+ NOTAUTH) response rather than NOTIMP. But the current
code looks correct on this point, so (considering we're running out of
time) I'm okay with deferring it to a later ticket.

You mean like another test for it, right?

So, how is it now? Should I create a follow-up ticket for the cleanups after the merge and say it's still subtask of #875?

Thanks

comment:7 Changed 9 years ago by jinmei

For the other part of fix (for session.cc, etc) I have counter/additional
proposal. I'm attaching a diff file, and will explain it.

Changed 9 years ago by jinmei

Changed 9 years ago by jinmei

comment:8 in reply to: ↑ 4 Changed 9 years ago by jinmei

Replying to vorner:

There are few unrelated bug fixes, but it didn't work without them. Maybe they could be done better or might need some tests for them (I tested it when it stopped crashing, instead of writing tests first, it's faster when trying to find the source of problem). Is it OK to do it in another ticket?

Okay, I've finally completed my review on this part.

In short, I'd not give it a go as it is. I have some counter
proposals (and some additional patch, the bug that it fixes should be
fixed anyway whether or not by this patch). If we can quickly agree
on the proposed fix, I'd say it can be merged. If you don't agree
with some part of it and want to discuss it more, that's fine, but in
that case I have to say we should delay the merge.

As for lacking tests...normally I wouldn't accept a proposal like a
patch without tests "because it's working". But in this case I'd like
as much to see the feature to be included in the release...so, I'd
personally loosen my usual criterion. I'd suggest you to check if
it's okay with others on jabber, though.

Now, about the code:

cfgmgr.py

Your patch changed the semantics of get_module_spec(): it now returns
an empty ModuleSpec? object when "module_name && module_name not in
self.module_specs". When I first looked at it I failed to understand
how such a radical change works without tweaking other parts...then I
noticed it was due to lack of tests. See the additional tests in
cfgmgr.diff. It would fail with the original patch.

I'm not sure whether there exist other module that relies on the
original behavior, but especially since we're trying to make a last
minute change without many tests, I think we should be careful about
compatibility. I'm also not sure if returning an "empty ModuleSpec?"
is safe. The caller would just consider it a successful result and
would try normal methods like get_module_name(), and then see
surprising disruption.

My counter proposal patch in cfgmgr.diff tries to achieve the same
thing that your fix seemingly tried to achieve with preserving the
original behavior as much as possible. Actually, my proposed patch is
uglier, so, if we can find a cleaner and still safe solution later
(with more tests), I'm okay with changing it.

ccsession.cc

The fix to addRemoteConfig() looks correct.

But, this method is quite unreadable to me, which I suspect is a
background reason why we had this type of bug. Specifically, it's not
really clear at a glance how module_name and rmod_spec are set to the
expected values and whether the processing is correct, especially in
the !spec_is_filename case. The method is also too long (at least in
my sense).

If we have enough time, I'd further refactor this method, but,
considering we don't have time, that would be a subject of a separate
ticket.

session.cc

I'm afraid this change is too tricky and affects a so fundamental part
of the system at this critical time. I think I understand the problem
correctly, and would like to proposal an alternative fix. It's in
proposal.diff (for main.cc, ccsession.cc, ccsession.h, and, it assumes
we use the unmodified session.cc although the diff file doesn't
include that part of diff for brevity). I believe the intent is clear
from the patch, but to explain it the idea is to delay starting
asynchronous read until the auth server actually starts asio loop.

Of course, this is not a complete fix. For example, we wouldn't be
able to follow changes to the keyring (although I'm not sure if it was
really possible in the original patch). And, my patch doesn't come
with tests either, which is bad. We'll need to develop a better
solution pretty soon, and, of course, with a sufficient number of
tests.

Others

I also noticed some other problems and fixed them locally. The fix is
in propsal.diff.

  • the original code crashed if we didn't include tsig_key in the config. the change to keyring.cc fixes that (again, we need tests)
  • the original code prevent system tests from working correctly. change to bind10_config.py.in fixes that.

comment:9 in reply to: ↑ 4 Changed 9 years ago by jinmei

Replying to vorner:

As for changelog:

The proposed changelog is:

Authoritative server can now sign the answers using TSIG (configured in tsig_keys/keys,
list of strings like "name:c2VjcmV0Cg==:sha1-hmac"). It doesn't use them for ACL yet,
only signs if the request is signed.

I'd use a hmac-md5 example with omitting the algorithm because this
format isn't compatible with BIND 9 dig and could be confusing. Also,
I would try to do something so that people naively copy this secret to
their configuration (e.g., using a bogus string like <base64-secret>
or adding a note that "this secret is example only; don't copy it to
your configuration" (although it may sound too verbose)).

Also, "only signs if the request is signed." is not 100% correct. I'd
say "only verifies the request if it's signed and sends signed
responses" or something (but this point is minor. it's up to you)

comment:10 in reply to: ↑ 6 Changed 9 years ago by jinmei

Replying to vorner:

I'd check if TSIG is checked before anything else. For example, give
a request of an unsupported query with a bad sig, and confirm we have
a bad sig (+ NOTAUTH) response rather than NOTIMP. But the current
code looks correct on this point, so (considering we're running out of
time) I'm okay with deferring it to a later ticket.

You mean like another test for it, right?

Yes.

So, how is it now? Should I create a follow-up ticket for the cleanups after the merge and say it's still subtask of #875?

This part now looks okay. For the deferring things I'd suggest
creating separate tickets (for different topics). I don't know if
we'd call it a subtask of #875 or consider it new ones, I'd leave it
to you.

comment:11 Changed 9 years ago by jinmei

  • Owner changed from jinmei to vorner

comment:12 Changed 9 years ago by vorner

  • Owner changed from vorner to jinmei

Hello

As we are no longer in hurry, I tried to address everything (if I missed something, it's probably I overlooked it). The new code is in trac931_2, as I did some history rewrites. The current changes are:

  • It is the same up to a8307030f7af9fc88e3e66b6eefcc89f6b6e15c5.
  • The fix for sending correct module name in addRemoteConfig got a test (actually, it's not a new test, but extended the original). It's now directly in the commit with the fix, but except adding the check, the commit is the same.
  • I've thrown out my fixes and included yours (I didn't throw out the size_t → uint32_t change, but I have no idea how to test it, when it didn't produce any bugs, only looked completely wrong).
  • I added test for the double-read bug.
  • Added a test for the signature and invalid request (using unknown opcode).
  • Then there's a merge from master, it brings another parameter for the addRemoteConfig function (the original #931 was based on older version of #925).
  • The addRemoteConfig got some more attention, I split part of it into separate function and added some spaces and comments, but I don't know if there's a way to clarify it more.
  • Implemented the part with auth server not referring directly to the global keyring variable.

I hope it looks better now.

Regarding the changelog, the sha1-hmac is there for the exact reason that it's a different format than dig, so they can see it. I think confusion from the changelog would be smaller than confusion when the software rejects their keys.

So, what about this?

Authoritative server can now sign the answers using TSIG
(configured in tsig_keys/keys, list of strings like
"name:<base64-secret>:sha1-hmac"). It doesn't use them for
ACL yet, only verifies them and signs if the request is signed.

Thanks

comment:13 Changed 9 years ago by vorner

And, I forgot to mention, reloading keys works even with your version, I both tried it and it makes sense in theory ‒ the change comes as a command from cfgmgr.

comment:14 Changed 9 years ago by jinmei

It didn't compile for me. Not fully looking into it yet. Attaching the log of make for the moment.

Changed 9 years ago by jinmei

comment:15 Changed 9 years ago by jinmei

  • Owner changed from jinmei to vorner

comment:16 Changed 9 years ago by vorner

  • Owner changed from vorner to jinmei

I hope it should compile now, I added missing headers and removed one warning from the file with asio.hpp. I hope it doesn't matter that much (the removed warning), since it's only unused parameter and it's in the tests. And, if we move away from msgq, this one would go anyway.

Thanks

comment:17 Changed 9 years ago by jinmei

Comments on the changes to the point of merging with master (857abf9).

First, I made some editorial cleanups and directly pushed them.

session.cc

I don't understand exactly what kind of problem that the change from
size_t to uint32_t tried to solve. The change even seems to be more
slippery in that it now passes a uint32_t to asio::buffer(), which
expects a size_t value.

If the size of size_t can be smaller than 32 bits (I don't know if the
standard defines that) I see there's a problem, but as far as I can
see the change doesn't solve that problem either.

To me, the current implementation is safer in that we limit the use of
fixed size variable to the point exactly where we need to do that and
use the generic size_t (which many other APIs expect) everywhere else.
If we wanted to deal with issues due to the possible size difference
between uint32_t and size_t, I'd rather catch and handle it within
SessionImpl::readDataLength().

server_common/keyring.cc

  • do we have a test for the change to keyring.cc?

ccsession_unittests.cc (and relevant change to ccsession.cc)

  • I'd add a comment about why we need to include asio.hpp (with a note of understanding this is generally discouraged unless there's a strong need for it), but see below for the delayedStart test first.
  • CCSessionTest::remoteConfig: at first glance it was not clear to me where the value of 1 for "item1" comes from:
            EXPECT_NO_THROW(item1 =
                            mccs.getRemoteConfigValue(module_name,
                                                      "item1")->intValue());
            EXPECT_EQ(1, item1);
    
    due to the fact that this test passes an empty config and due to the following comment in the ccsession.cc
            // Merge the received config into the defaults
    
    Then I found this comment was actually incorrect (or very confusing at best). I fixed the comment and added a clarification comment to the test and pushed the changes (e37672e).
  • delayedStart: this test seems to be too tricky (and have larger side effects such as requiring asio.hpp or adding a test-only parameter to a public method) to me. At a higher level this problem is that we try to do (a variant of) recvmsg() after starting asynchronous read on the underlying Session. I'd write a test at that level, rather than going deeper into the lower IPC level. I'm attaching a sample diff to implement this idea.

On one hand, this alternative test doesn't directly test what kind
of bad thing happens at the lower level if we do recvmsg() while we
are in the asynchronous mode, but IMO if we really want to test
that, that test should go to session_unittests. On the other hand,
the alternative approach is even better because it has a larger
potential of catching similar types of bugs within ModuleCCSession.

Please look at the diff. If you agree with the general idea, please
apply it (with cleanups: some explanation in comments, throw an
exception instead of assert(), write some more supplemental tests to
catch the exception, etc) instead of the current delayedStart. You
may still want to add tests the lower level thing at the session
class level, and if so, go ahead, but I'd suggest deferring it to a
separate ticket (this ticket has already do too many things for the
original purpose of TSIG support in b10-auth).

If you don't agree and still argue keeping delayedStart, let's
discuss it.

  • I'd add a test for ModuleCCSession::start() (to see if it throws when called twice, either implicitly or explicitly)

Changed 9 years ago by jinmei

comment:18 follow-up: Changed 9 years ago by jinmei

Comments on the rest of the changes. This completes my review.

The comments contain subtle and perhaps controversial design points.
I believe some are worth discussing, but don't think we should tweak
this branch too much at this stage. It already did too many things
for the original purpose of the ticket.

ccsession.cc

Refactoring of addRemoteConfig() looks good, but I'd go one step
further in order to make sure fetchRemoteSpec() do only one thing and
make more things const. See attached diff. It may be a matter of
thing, however, so I'd leave the decision to you.

Looking at fetchRemoteSpec(), I wonder if we have a test for the case
where module != spec.getModuleName(). If not, we should have it.

BTW, I also had a concern (that I forgot to mention previously) that
a variable like spec_is_filename is IMO error prone especially when
there's an implicit default. This variable enables to make
addRemoteConfig() do quite different types of jobs, and it's not so
clear why it's reasonable to define a default and 'true' is a better
default (except, perhaps, for compatibility). If I were to add this
feature, I'd consider an interface where the user of it can be more
clearly aware of what they are doing, e.g., by separating the methods
name to addRemoteConfigFromFile() and addRemoteConfigForModule(), or
if we have a reason we rather want to make these two modes agnostic,
by introducing an abstract notion of "spec identifier" and encapsulate
the difference there.

You may or may not agree with this view, and in any case it's far
beyond the scope of this ticket. If you think this makes sense,
please create a separate ticket; if you don't, I don't fight against
that at this moment.

auth_srv (and test)

First, let me be clear: this part of code is okay. I don't require a
change for merging this ticket. On top that...

Why do we use a pointer to a shared pointer? hmm...okay it's
explained, but it led to my another level concern I had when I first
saw the original implementation: it seemed too low-level to allow
applications to refer to keyring object (via a shared pointer)
directly. For example, right now applications can freely add or
remove a specific key from a keyring, but should that be allowed (I
simply don't know, but at least from the current implementation the
intent rather seems to be that keyring is read-only outside the
server_common/keyring module).

Allowing applications to refer to a non primitive object (in this case
a shared pointer object) that is statically defined has another usual
problem: static initialization fiasco (see below).

So, if I were to implement this stuff, I'd introduce one higher level
of abstraction, for example:

Define a new class KeyRingAccessor? like this

class KeyRingAccessor {
public:
    const KeyRing& get() const {
        if (!keyring_) {
            create();
        }
        return (*keyring_);
    }
    keyRing& getMutable() {
        if (!keyring) {
            create();
        }
        return (*keyring_);
    }
    void create() {        // this also removes existing one if there is
        keyring_.reset(new TSIGKeyRing);
    }
private:
    scoped_ptr<KeyRing> keyring_;
};

Within keyring.cc define a factory function that returns a reference
to a singleton const KeyRingAccessor? object. Also, keyring.cc itself
has a way to get access to the single object as a mutable object.

AuthSrv? is constructed with that reference, and it always refers to
the global KeyRing? via the accessor: accessor_.get(). (As a bonus the
server doesn't have to handle the case where keyring is NULL).

Tests create their own accessor and constructs AuthSrv? with it.

Like the discussion for addRemoteConfig(), I recognize this is far
beyond the scope of this ticket. For this topic I have a relatively
stronger opinion, but I'd defer further discussion whether or not you
agree with this proposal.

Miscellaneous final things

  • server_common/keyring.cc: (not directly related, but happen to notice in review) keyring (a shared pointer) is defined as a namespace scope static object. Such an object can easily cause static initialization fiasco. Consider some application defines its own static object that relies on that pointer:
    #include <server_common/keyring.h>
    
    namespace {
    SomeSingleton singleton(isc::server_common::keyring);
    }
    

Of course, the other namespace scope static object is also an example
of not-so-good practice, and/or we could say don't initialize a static
object with isc::server_common::keyring, but in general it's better to
ensure that doesn't happen, especially for a public library.

An additional indirection like KeyRingAccessor? (see above) can prevent
this issue.

Once again, whether or not you agree this is an issue to be addressed,
I think this is beyond the scope of this ticket.

  • As for updating the key ring:

And, I forgot to mention, reloading keys works even with your
version, I both tried it and it makes sense in theory ‒ the change
comes as a command from cfgmgr.

Oh, okay, understood. But (with my workaround) I suspect we still
cannot do addRemoteConfig() after starting the server (we don't have
to be able to do this today, but for the dynamic nature of BIND 10 I
can easily imagine a future possibility for that extension). So, IMO,
we should eventually make the internal send/recv in addRemoteConfig()
asynchronous (or define an asynchronous version of addRemoteConfig()),
and adjust the key initialization code accordingly.

BTW, do we have a test of key update cases?

  • changelog

Regarding the use of sha-1: I think we should actually provide
compatibility for both dig and drill, but that's a different topic,
and okay to keep the current (revised) text on this point.

Changed 9 years ago by jinmei

comment:19 Changed 9 years ago by jinmei

  • Owner changed from jinmei to vorner

comment:20 in reply to: ↑ 18 ; follow-ups: Changed 9 years ago by vorner

  • Owner changed from vorner to jinmei

Hello

Replying to jinmei:

session.cc

I don't understand exactly what kind of problem that the change from
size_t to uint32_t tried to solve. The change even seems to be more
slippery in that it now passes a uint32_t to asio::buffer(), which
expects a size_t value.

It seems I was already confused at the time after a lot of debugging, I overlooked the length is uint32_t already. I was afraid that it could read too many bytes from the socket. I reverted that commit, it was correct. Sorry for the confusion.

server_common/keyring.cc

  • do we have a test for the change to keyring.cc?

Which change? The only change that happened from last time was adding of the parameter for the remote config from master and that one is implicitly tested from the usual tests. Actually, it was the tests which brought attention to the need to adding the false to the call.

Or do you mean some other change?

ccsession_unittests.cc (and relevant change to ccsession.cc)

  • delayedStart: this test seems to be too tricky (and have larger side effects such as requiring asio.hpp or adding a test-only parameter to a public method) to me. At a higher level this problem is that we try to do (a variant of) recvmsg() after starting asynchronous read on the underlying Session. I'd write a test at that level, rather than going deeper into the lower IPC level. I'm attaching a sample diff to implement this idea.

I didn't really like the test as well. So I took your version, but I don't know if it's enough, it doesn't really cause the bug before the fix.

Anyway, if we do replace msgq (which I hope we'll do), all this might turn out as unneeded, I expect any grown up message queue to be able to listen for commands and request blocking answer at the same time.

Please look at the diff. If you agree with the general idea, please
apply it (with cleanups: some explanation in comments, throw an
exception instead of assert(), write some more supplemental tests to
catch the exception, etc) instead of the current delayedStart. You

I did try adding some, but I feel like writing tests for the fake session. Do you have any idea for a better test? Or is this enough?

Replying to jinmei:

ccsession.cc

Refactoring of addRemoteConfig() looks good, but I'd go one step
further in order to make sure fetchRemoteSpec() do only one thing and
make more things const. See attached diff. It may be a matter of
thing, however, so I'd leave the decision to you.

OK, I have no problem with this.

Looking at fetchRemoteSpec(), I wonder if we have a test for the case
where module != spec.getModuleName(). If not, we should have it.

Added.

BTW, I also had a concern (that I forgot to mention previously) that
a variable like spec_is_filename is IMO error prone especially when
there's an implicit default. This variable enables to make
addRemoteConfig() do quite different types of jobs, and it's not so
clear why it's reasonable to define a default and 'true' is a better
default (except, perhaps, for compatibility). If I were to add this
feature, I'd consider an interface where the user of it can be more
clearly aware of what they are doing, e.g., by separating the methods
name to addRemoteConfigFromFile() and addRemoteConfigForModule(), or
if we have a reason we rather want to make these two modes agnostic,
by introducing an abstract notion of "spec identifier" and encapsulate
the difference there.

The original purpose of the default value was for compatibility, yes. Otherwise, I would let user explicitly specify the boolean. I don't mind two different methods, but if you don't mind, I'd put it into a separate ticket. I don't like the spec identifier, it looks too complicated for such simple thing as passing an parameter.

Why do we use a pointer to a shared pointer? hmm...okay it's
explained, but it led to my another level concern I had when I first
saw the original implementation: it seemed too low-level to allow
applications to refer to keyring object (via a shared pointer)
directly. For example, right now applications can freely add or
remove a specific key from a keyring, but should that be allowed (I
simply don't know, but at least from the current implementation the
intent rather seems to be that keyring is read-only outside the
server_common/keyring module).

Well, yes, modifying the keyring is a bad idea and the code doesn't explicitly disallow it. However, I believe that by saying it can be replaced by a different one at any time implies that one probably doesn't want to do it.

I know that the pointer to shared pointer looks suspicious, but the accessor class looks at last equally inelegant. It looks like too much code for something that does nothing and is hard to build flow-graphs in the mind when getting trough the levels of such thing.

Allowing applications to refer to a non primitive object (in this case
a shared pointer object) that is statically defined has another usual
problem: static initialization fiasco (see below).

OK, this one probably needs a solution. Should I do something about it now, or open another ticket?

BTW, do we have a test of key update cases?

The tests in server_common contain one such case. I think that updating the keys is not a concern of the AuthSrv? class, so there's no test for that.

comment:21 in reply to: ↑ 20 Changed 9 years ago by jinmei

Replying to vorner:

One quick point:

server_common/keyring.cc

  • do we have a test for the change to keyring.cc?

Which change? The only change that happened from last time was adding of the parameter for the remote config from master and that one is implicitly tested from the usual tests. Actually, it was the tests which brought attention to the need to adding the false to the call.

Sorry for not being clear. I meant the added check for list == NULL
in updateKeyring():

void
updateKeyring(const std::string&, ConstElementPtr data) {
...
    for (size_t i(0); list && i < list->size(); ++ i) {

(in 67d9442a23eb4fab4dff3f4d1ab142948e21e880)

I provided this patch without a test at that time.

comment:22 in reply to: ↑ 20 ; follow-up: Changed 9 years ago by jinmei

Replying to vorner:

server_common/keyring.cc

  • do we have a test for the change to keyring.cc?

Which change? The only change that happened from last time was adding of the parameter for the remote config from master and that one is implicitly tested from the usual tests. Actually, it was the tests which brought attention to the need to adding the false to the call.

See my previous short response, but I didn't want to delay the process
due to this further, so I wrote the test myself (e9975a7). As noted
in the code, this means if the default could ever be non empty list
the keyring module would need to extract the default value explicitly.
I personally think this is a flaw of the config module interface (it
simply doesn't make sense to me to put such a burden to the
application; it shouldn't be bothered with what's the default, it
should only need to deal with the "currently effective" set of
configuration), but that's totally a different topic.

ccsession_unittests.cc (and relevant change to ccsession.cc)

Please look at the diff. If you agree with the general idea, please
apply it (with cleanups: some explanation in comments, throw an
exception instead of assert(), write some more supplemental tests to
catch the exception, etc) instead of the current delayedStart. You

I did try adding some, but I feel like writing tests for the fake session. Do you have any idea for a better test? Or is this enough?

I've added a test to reproduce the situation we had: call
addRemoteConfig() without delaying asynchronous read on
ModuleCCSession and confirm it results in an exception.

I also added a note to the addRemoteConfig() documentation about this
restriction.

These are c8cf644 and 1ccec99.

And, on writing these, I'd now rather think we should be more
explicit: let addRemoteConfig() throw an exception unless
ModuleCCSession hasn't been "started". Then if some other app
incorrectly tries to use addRemoteConfig() it can see what's wrong
sooner and more explicitly (rather than via scratching the head in
seeing the strange timeout, etc). I'm attaching a diff of this
version of change. Note that with this change the above added test
won't work as expected any more.

I'd leave it to you whether to make this additional change.

BTW, I also had a concern (that I forgot to mention previously) that

The original purpose of the default value was for compatibility, yes. Otherwise, I would let user explicitly specify the boolean. I don't mind two different methods, but if you don't mind, I'd put it into a separate ticket. I don't like the spec identifier, it looks too complicated for such simple thing as passing an parameter.

Deferring it to a separate ticket is fine by me (actually I don't like
to delay this ticket due to this). Note also that I'm not necessarily
sticking to the specific idea of separating the methods. My point is
that the current interface is not very intuitive and is error prone.
Any solution to address this concern is good to me.

Why do we use a pointer to a shared pointer? hmm...okay it's

[...]

Well, yes, modifying the keyring is a bad idea and the code doesn't explicitly disallow it. However, I believe that by saying it can be replaced by a different one at any time implies that one probably doesn't want to do it.

I'd say that's naive (or I simply don't trust people being smart
enough:-). I'm okay with leaving the design for now as I already
said, but please add an explicit comment about this restriction in
keyring.h (oh btw maybe the top description of keyring.h is now a bit
stale due to the recent change).

I know that the pointer to shared pointer looks suspicious, but the accessor class looks at last equally inelegant. It looks like too much code for something that does nothing and is hard to build flow-graphs in the mind when getting trough the levels of such thing.

The awkwardness of a pointer to shared pointer is not the main issue
to me. It's more that the interface is fragile and error prone (at
least IMO). While I kind of admit the sample access class doesn't
look so sophisticated, I still personally believe this type of
solution is much better. But, again, I'm okay with (or actually
rather prefer) leaving this out of this ticket. If you agree this is
worth further discussion, please create a separate file. If you don't
even agree with that, that's also fine for now. I may or may not
create one myself if and when I see a stronger need for it.

Allowing applications to refer to a non primitive object (in this case
a shared pointer object) that is statically defined has another usual
problem: static initialization fiasco (see below).

OK, this one probably needs a solution. Should I do something about it now, or open another ticket?

Another ticket please:-) It's not a problem right now.

BTW, do we have a test of key update cases?

The tests in server_common contain one such case. I think that updating the keys is not a concern of the AuthSrv? class, so there's no test for that.

Ah, okay I missed that. I didn't mean a new test to AuthSrv?, so this
point is okay.

Changed 9 years ago by jinmei

comment:23 Changed 9 years ago by jinmei

  • Owner changed from jinmei to vorner

comment:24 in reply to: ↑ 22 ; follow-up: Changed 9 years ago by vorner

  • Owner changed from vorner to jinmei

Hello

Replying to jinmei:

See my previous short response, but I didn't want to delay the process
due to this further, so I wrote the test myself (e9975a7). As noted
in the code, this means if the default could ever be non empty list
the keyring module would need to extract the default value explicitly.
I personally think this is a flaw of the config module interface (it
simply doesn't make sense to me to put such a burden to the
application; it shouldn't be bothered with what's the default, it
should only need to deal with the "currently effective" set of
configuration), but that's totally a different topic.

OK, thanks.

And I agree with the point about currently effective thing. It might be worth a discussion or something.

And, on writing these, I'd now rather think we should be more
explicit: let addRemoteConfig() throw an exception unless
ModuleCCSession hasn't been "started". Then if some other app
incorrectly tries to use addRemoteConfig() it can see what's wrong
sooner and more explicitly (rather than via scratching the head in
seeing the strange timeout, etc). I'm attaching a diff of this
version of change. Note that with this change the above added test
won't work as expected any more.

I don't think so. Because, as I look at it, the inability to do the blocking read when we already have a nonblocking is a bug of the underlying session. It should be fixed sometime. This does only say there's the bug and only in one situation. The current workaround is, I hope, only workaround.

If we really want to add an exception, we should do it in the session, checking there are not two reads at once and it should be some kind of NotImplemented? or something.

Deferring it to a separate ticket is fine by me (actually I don't like
to delay this ticket due to this). Note also that I'm not necessarily
sticking to the specific idea of separating the methods. My point is
that the current interface is not very intuitive and is error prone.
Any solution to address this concern is good to me.

OK.

The awkwardness of a pointer to shared pointer is not the main issue
to me. It's more that the interface is fragile and error prone (at
least IMO). While I kind of admit the sample access class doesn't
look so sophisticated, I still personally believe this type of
solution is much better. But, again, I'm okay with (or actually
rather prefer) leaving this out of this ticket. If you agree this is
worth further discussion, please create a separate file. If you don't
even agree with that, that's also fine for now. I may or may not
create one myself if and when I see a stronger need for it.

I agree this needs a solution, so I'll open a ticket upon merging this. But sophisticated is exactly the opposite of what I find appropriate ‒ simple problems should have simple solutions.

I added the comment to keyring and I agree with the changes you did in git. Is there anything else left?

Thanks

comment:25 in reply to: ↑ 24 Changed 9 years ago by jinmei

Replying to vorner:

I added the comment to keyring and I agree with the changes you did in git. Is there anything else left?

No, everything looks okay. Please merge (finally:-)

comment:26 Changed 9 years ago by jinmei

  • Owner changed from jinmei to vorner

comment:27 Changed 9 years ago by vorner

Ok, thanks. Merged, keeping the ticket open until tomorrow as a reminder to create the tickets.

comment:28 Changed 9 years ago by vorner

  • Resolution set to complete
  • Status changed from reviewing to closed

comment:29 Changed 9 years ago by stephen

  • Estimated Difficulty changed from 0.0 to 3
Note: See TracTickets for help on using tickets.