Opened 8 years ago

Closed 8 years ago

#1104 closed task (complete)

support TSIG in DNS (Request) ACL

Reported by: jinmei Owned by: jinmei
Priority: medium Milestone: Sprint-20110816
Component: xfrout Version:
Keywords: Cc:
CVSS Scoring: Parent Tickets:
Sensitive: no Defect Severity: N/A
Sub-Project: DNS Feature Depending on Ticket:
Estimated Difficulty: 5 Add Hours to Ticket: 0
Total Hours: 0 Internal?: no

Description

This is necessary for the expected ACL support for xfrout.

The difficult point is that TSIG keys are identified as DNS names,
so naive comparison as string may result in the wrong (mis)match.
Using dns:Name object is one solution, but it adds dependency from
the ACL library to libdns++ (we may end up having it for a different
reason, but right now there's no such dependency, and in general
it would be better to have fewer dependency).

Also, whether we use (some canonical type of) string or bare Name
object, comparing these is generally expensive. (Although it may not
matter much if we only use TSIG based ACL for performance insensitive
operations).

What I'm thinking is to give unique integer IDs to each TSIG key
(a monotinically increasing global counter would propbably suffice),
have the application of the ACL extract it and pass it to the ACL
library, and within ACL TSIG keys are simply compared as integers.
But this is just a not fully baked idea. Whoever actually works this
may find a better way (or reasonable short term solution even if
it's, e.g., inefficient).

Subtickets

Change History (19)

comment:1 Changed 8 years ago by stephen

  • Milestone changed from Next-Sprint-Proposed to Sprint-20110802

comment:2 Changed 8 years ago by jinmei

  • Owner set to jinmei
  • Status changed from new to accepted

comment:3 Changed 8 years ago by jinmei

trac1104 is ready for review.

After thinking about how to handle TSIG keys in the ACL, I've chosen
a quite straightforward way in the end: simply passing TSIGRecord
to RequestContext? and having the context retrieve the key name
from it. Comparison is based on Name::operator==(). I chose this
approach so that the caller can construct the context in the most
intuitive way like this:

        getQueryACL().execute(acl::dns::RequestContext(
                                  client.getRequestSourceIPAddress(),
                                  query_message->getTSIGRecord())));

The dependency on libdns++ (and pydnspp in the Python case) might be
a disadvantage, but if an application needs DNS related ACLs it's
quite likely that the application also needs libdns++/pydnspp anyway.
Comparing two names as Name object is relatively expensive, but
at least for now we don't expect to use this ACL in a very performance
sensitive path (such as for ordinary queries to auth), so it should
be okay. We could consider optimizing it if and when we see the real
need for it.

If this design is okay, the implementation should be quite
straightforward. Probably the most tricky part is the Python wrapper.
Since it's mostly impossible to refer to symbols in the C++ Python
binding for libdns++ from dns.so (or in general between different .so
modules), I introduced a frontend .py to preprocess TSIGRecord passed
to RequestContext? so that the C++ implementation for the context class
doesn't have to use symbols in pydnspp bindings directly.

Another note: right now b10-resolver cannot use TSIG based ACL in
practice because there's no way to configure TSIG keys for it right
now. I could add that part in this ticket, but it's not an urgent
goal for us, so I'd rather defer it to a separate task.

Proposed changelog:

274.?	[func]		jinmei
	Added support for TSIG key matching in ACLs.  The xfrout ACL can
	now refer to TSIG key names using the "key" attribute.  For
	example, the following specifies an ACL that allows zone transfer
	if and only if the request is signed with a TSIG of a key name
	"key.example":
	> config set Xfrout/query_acl[0] {"action": "ACCEPT", \
	                                  "key": "key.example"}
	(Trac #1104, git TBD)

comment:4 Changed 8 years ago by jinmei

  • Owner changed from jinmei to UnAssigned
  • Status changed from accepted to reviewing

comment:5 Changed 8 years ago by stephen

  • Owner changed from UnAssigned to stephen

comment:6 follow-up: Changed 8 years ago by stephen

  • Owner changed from stephen to jinmei

src/bin/xfrout/tests/xfrout_test.py.in
In the part of test_parse_query_message that does the TSIG ACL checks, TSIG_KEY is added to the "self.xfrsess" key ring multiple times - is this needed?

src/lib/acl/tests/dnsname_check_unittest.cc
In the "match" test, the superdomain against which the check should be made should be "com", not "org".

src/lib/python/isc/acl/_dns.py
Is this really a good name for this file? _dns.py" is very close to "dns.py".

A comment in this file refers to "log.so", which appears not to be relevant here.

ChangeLog
Looks OK

Miscellaneous
The TSIG ACL check is only on the basis of record name, which prompts the question "can we guarantee that the TSIG data is always checked?". In other words, could it be possible for a user to construct an ACL for some operation that includes a check on the TSIG key, but for the code for that operation not to check it? In which case security could be subverted by sending through a key of a given name but with arbitrary data.

comment:7 in reply to: ↑ 6 Changed 8 years ago by jinmei

Thanks for the review.

Replying to stephen:

src/bin/xfrout/tests/xfrout_test.py.in
In the part of test_parse_query_message that does the TSIG ACL checks, TSIG_KEY is added to the "self.xfrsess" key ring multiple times - is this needed?

Ah, good catch. No, they are not needed (not harmful in this context, but
simply redundant). Removed them, and added supplemental tests to catch
it if we do this again due to a naive copy.

src/lib/acl/tests/dnsname_check_unittest.cc
In the "match" test, the superdomain against which the check should be made should be "com", not "org".

Another good catch (and it's embarassing in that this type of error
shouldn't happen with a pure TDD). Fixed.

src/lib/python/isc/acl/_dns.py
Is this really a good name for this file? _dns.py" is very close to "dns.py".

I don't know if it's "good" or not:-) but it seems to be the common
convention in Python libraries. For example, socket.py uses
_socket.py the sqlite3 package uses _sqlite3.so, etc.

A comment in this file refers to "log.so", which appears not to be relevant here.

Yet another good catch. In fact, the entire comment should rather be
the same one as other .py in that directory. So I replaced it
as a whole.

I also noticed some editorial errors in comment lines of a .cc file,
so I fixed them.

Miscellaneous
The TSIG ACL check is only on the basis of record name, which prompts the question "can we guarantee that the TSIG data is always checked?". In other words, could it be possible for a user to construct an ACL for some operation that includes a check on the TSIG key, but for the code for that operation not to check it? In which case security could be subverted by sending through a key of a given name but with arbitrary data.

We cannot guarantee that. The assumption is that if an application uses
a TSIG based ACL, it's application's responsibility to perform TSIG
validation before performing the ACL check. Maybe we should document
it as a note somewhere (but I'm not sure what's the best place for that.
In changelog at the moment?)

comment:8 Changed 8 years ago by jinmei

  • Owner changed from jinmei to stephen

comment:9 follow-up: Changed 8 years ago by stephen

  • Owner changed from stephen to jinmei

Changes are OK. (Although I think you're being a bit hard on yourself about the "com" v "org" test - the name in question matches neither and I can't think of an easy way to check that you're not matching the the name you don't want to match against :-))

We cannot guarantee that. The assumption is that if an application uses a TSIG based ACL, it's application's responsibility to perform TSIG validation before performing the ACL check. Maybe we should document it as a note somewhere (but I'm not sure what's the best place for that. In changelog at the moment?)

I think that's dangerous. As the ACL is set up by the user (who may decide to include a TSIG check for a whole host of reasons) and the ACL checking code can be used by any application, the application may not know that the ACL contains a TSIG check. For that reason, it seems logical that once the ACL code has checked that the name matches, it should also check that the TSIG key data is correct.

As we are both at the IETF next week, rather than leave this ticket in limbo I suggest that you merge the code to master and we continue this discussion on the bind10-dev list. Another ticket can be created if the conclusion is to add the TSIG key data check to the code.

comment:10 in reply to: ↑ 9 Changed 8 years ago by jinmei

Replying to stephen:

Changes are OK. (Although I think you're being a bit hard on yourself about the "com" v "org" test - the name in question matches neither and I can't think of an easy way to check that you're not matching the the name you don't want to match against :-))

Okay, I've pushed the change, and I'll close this ticket.

Regarding the possible open issue...

We cannot guarantee that. The assumption is that if an application uses a TSIG based ACL, it's application's responsibility to perform TSIG validation before performing the ACL check. Maybe we should document it as a note somewhere (but I'm not sure what's the best place for that. In changelog at the moment?)

I think that's dangerous. As the ACL is set up by the user (who may decide to include a TSIG check for a whole host of reasons) and the ACL checking code can be used by any application, the application may not know that the ACL contains a TSIG check. For that reason, it seems logical that once the ACL code has checked that the name matches, it should also check that the TSIG key data is correct.

...one possible solution is to enhance the TSIGRecord class so that it
can store a state indicating the record has been verified, and have
TSIGContext::verify() change the state to "verified" on success (to
make it possible we need to stop constifying the 'record' parameter,
though). Then we can change RequestContext? so that it will ignore the
given TSIG record unless its state is "verified".

If that works for you, I'll create a separate ticket for it.

I'm not sure if it requires a community wide discussion at the
bind10-dev list just yet, so I'll simply keep this ticket open for now.

comment:11 Changed 8 years ago by jinmei

  • Owner changed from jinmei to stephen

comment:12 follow-up: Changed 8 years ago by stephen

  • Owner changed from stephen to jinmei

...one possible solution is to enhance the TSIGRecord class so that it can store a state indicating the record has been verified, and have TSIGContext::verify() change the state to "verified" on success (to make it possible we need to stop constifying the 'record' parameter,

though).
As part of a solution to avoid avoid multiple verifications of the same record, that would work. To avoid the overhead involved with "unconstifying" TSIGContext, we could make the internal "verified" flag mutable.

Then we can change RequestContext so that it will ignore the given TSIG record unless its state is "verified".

I'm not sure what you meant here. If an ACL requires a check on a TSIG record, that check needs to be carried out; you can't ignore the record just because it has not been verified.

Using your idea, I was more envisaging TSIGContext::verify() being called during the ACL checking procedure (and in other places), performing the verification if called for the first time and returning the stored verification state on subsequent calls.

comment:13 Changed 8 years ago by stephen

  • Estimated Difficulty changed from 0.0 to 5

comment:14 in reply to: ↑ 12 Changed 8 years ago by jinmei

Replying to stephen:

...one possible solution is to enhance the TSIGRecord class so that it can store a state indicating the record has been verified, and have TSIGContext::verify() change the state to "verified" on success (to make it possible we need to stop constifying the 'record' parameter,

though).
As part of a solution to avoid avoid multiple verifications of the same record, that would work. To avoid the overhead involved with "unconstifying" TSIGContext, we could make the internal "verified" flag mutable.

Firs off: in case I was not clear, what I suggested is to introduce a
"verified" flag to the TSIGRecord class, not TSIGContext. And as a
result we'll have to change from:

TSIGError
TSIGContext::verify(const TSIGRecord* const record, const void* const data,
                    const size_t data_len)

to:

TSIGError
TSIGContext::verify(TSIGRecord* const record, const void* const data,
                    const size_t data_len)

(where the "verified" flag of 'record' may be modified).

I don't think it a good idea to deceive the compiler with mutable in
this case. When we set the "verified" flag of TSIGRecord, it actually
changes externally observable behavior. IMO hiding the fact with
mutable goes beyond the acceptable usage of it.

Another approach would be to pass TSIGContext instead of TSIGRecord to
RequestContext?. TSIGContext (internally) contains information of the
result of the latest verification and the associated TSIG key, so we
could expose these to RequestContext? as read only attributes. One
possible drawback is that this approach would be a bit more error
prone than the approach of passing TSIGRecord. For example, an
extremely buggy implementation may reuse a TSIGContext that has
validated a signature multiple times for different ACL checks. Such a
bug can happen with TSIGRecord, too, but it will be less likely if the
expected usage is to pass the result of Message::getTSIGRecord() to
the RequestContext? constructor (then the application never holds a
reference to the TSIG record itself).

Then we can change RequestContext so that it will ignore the given TSIG record unless its state is "verified".

I'm not sure what you meant here. If an ACL requires a check on a TSIG record, that check needs to be carried out; you can't ignore the record just because it has not been verified.

For the purpose of ACL we only use the TSIG key name (at least in the
current implementation, and as in BIND 9 and NSD). So the proposed
logic would be to change this:

template<>
bool
NameCheck<RequestContext>::matches(const RequestContext& request) const {
    return (request.tsig != NULL && request.tsig->getName() == name_);
}

to this:

template<>
bool
NameCheck<RequestContext>::matches(const RequestContext& request) const {
    return (request.tsig != NULL && request.tsig->isVerified() &&
            request.tsig->getName() == name_);
}

Using your idea, I was more envisaging TSIGContext::verify() being called during the ACL checking procedure (and in other places), performing the verification if called for the first time and returning the stored verification state on subsequent calls.

I was aware of that possibility, but didn't think it the best away to
handle this issue. IMO, if an application wants to handle TSIG, it
should explicitly do it at an appropriate position of the code. On
one hand it may look convenient if we merge TSIG verification in ACL
checks (in addition to doing verification separately), but it will
make the ACL module have tighter coupling with DNS specific logic,
which will subsequently lead to reducing clarity. It would also
require that RequestContext? be modifiable in matches(), which is
against an assumption of the method. Further, deeper inspection
before performing TSIG verification could be even dangerous, because
some of the checks may depend on the integrity of the message content
(e.g., whether a particular header flag is on/off) but we cannot rely
on the integrity until we perform the signature.

I still tend to agree on providing a safety net for naive/buggy
implementations that don't correctly perform TSIG check but still
do pass a TSIG record to RequestContext? (with which, as you pointed
out, an ACL with TSIG key would cause a confusing/insecure result).
But I believe it should be a minimal guard that is sufficient to
prevent the harmful result, and shouldn't try to be too smart.

Now, as this point seems to be not so trivial, my current suggestion
is to create a new ticket as an open enhancement, referring to this
discussion, and close this one. Would that be okay for you? (If you
like it I don't mind with keeping this ticket open instead of creating
a new one).

comment:15 Changed 8 years ago by jinmei

  • Owner changed from jinmei to stephen

comment:16 follow-up: Changed 8 years ago by stephen

  • Owner changed from stephen to jinmei

First off, I agree with the suggestion to close this ticket and open a new one.

what I suggested is to introduce a "verified" flag to the TSIGRecord class, not TSIGContext

My mistake, apologies.

I don't think it a good idea to deceive the compiler with mutable in this case. When we set the "verified" flag of TSIGRecord, it actually changes externally observable behavior

Only if we are allowing for the possibility that the contents of the TSIG keyring may have changed between two calls to verify(). If we are using the paradigm that all calls to verify() for a particular record are against the same instance of the keyring, if the first call to verify() succeeds, all subsequent calls will succeed. In this case the TSIGrecord is still logically "const", the mutable flag being used to cache a result to avoid expensive computation.

IMO, if an application wants to handle TSIG, it should explicitly do it at an appropriate position of the code.

But there is a separation between ACLs and the application; the ACL is set by the user, the packet is checked against it, then the (rest of the) application is invoked. If we restrict the TSIG check to just the name, we leave open the possibility that an incorrect packet may get through the check.

comment:17 in reply to: ↑ 16 Changed 8 years ago by jinmei

Replying to stephen:

First off, I agree with the suggestion to close this ticket and open a new one.

I don't think it a good idea to deceive the compiler with mutable in this case. When we set the "verified" flag of TSIGRecord, it actually changes externally observable behavior

Only if we are allowing for the possibility that the contents of the TSIG keyring may have changed between two calls to verify().

It can also happen if we call verify() multiple times for the same
record (with different data and/or TSIG context). Admittedly these
are an unlikely scenario in practice, but IMO we should not rely on
likely common cases in designing an API.

Also, while it was my own idea I now think it may not be that a good
idea to have TSIGRecord a state of "validated", because such a state
cannot be well defined with the record itself (we need data and
context).

IMO, if an application wants to handle TSIG, it should explicitly do it at an appropriate position of the code.

But there is a separation between ACLs and the application; the ACL is set by the user, the packet is checked against it, then the (rest of the) application is invoked. If we restrict the TSIG check to just the name, we leave open the possibility that an incorrect packet may get through the check.

RequestContext? is constructed with TSIGRecord (or it may have to be
TSIGContext according to the discussion of the previous paragraph),
not a particular name. So, it's impossible even for a naive/buggy
implementation to accidentally bypass the ACL with TSIG validation if
we add some kind of "validated-or-not" state to TSIGRecord or
TSIGContext and have RequestContext? check that on construction.

That is, a good and complete implementation would do this:

    tsig = message->getTSIGRecord();
    ctx.verify(tsig, data, data_len);
    acl.execute(RequestContext(from_addr, tsig));

A good bug incomplete (missing TSIG support) would do this:

    acl.execute(RequestContext(from_addr, NULL));

A naive application would do this:

    acl.execute(RequestContext(from_addr, message->getTSIGRecord()));

But even in the last case, the ACL check would identify the record has
not been verified and avoid accidental match.

comment:18 Changed 8 years ago by jinmei

Follow up ticket created: #1164.

Closing this one.

comment:19 Changed 8 years ago by jinmei

  • Resolution set to complete
  • Status changed from reviewing to closed
Note: See TracTickets for help on using tickets.