Opened 9 years ago

Closed 9 years ago

#1067 closed task (complete)

support zone iterator in the new data source API

Reported by: jinmei Owned by: vorner
Priority: medium Milestone: Sprint-20110830
Component: data source Version:
Keywords: Cc:
CVSS Scoring: Parent Tickets:
Sensitive: no Defect Severity: N/A
Sub-Project: DNS Feature Depending on Ticket:
Estimated Difficulty: 4.0 Add Hours to Ticket: 0
Total Hours: 0 Internal?: no

Description

This task includes:

  • define the (abstract) ZoneIterator? class
  • implement DatabaseZoneIterator? class.
  • define DatabaseConnection::traverseZone() class.
  • implement SQLite3 version of traverseZone.

See commit 144caea728054ddcd91c908c6177e089e20aa5ef of trac817a for
prototype implementation.

(but the actual implementation doesn't have to stick to the way that
the prototype does)

This task depends on #1060 and #1061, but otherwise can be done in
parallel with other data source refactoring tasks.

Subtickets

Change History (14)

comment:1 Changed 9 years ago by stephen

  • Estimated Difficulty changed from 0.0 to 4

comment:2 Changed 9 years ago by stephen

  • Milestone changed from Next-Sprint-Proposed to Sprint-20110816

comment:3 Changed 9 years ago by vorner

  • Owner set to vorner
  • Status changed from new to accepted

comment:4 Changed 9 years ago by vorner

  • Owner changed from vorner to UnAssigned
  • Status changed from accepted to reviewing

Hello

It should be ready for review. There's a merge with #1061 to bring in some trivial changes from there, which however cause a lot of changes and collisions. And it is based on yet unmerged branch, so the changes inside this branch can be requested by git diff origin/trac1061....

I implemented it not as a direct function on the DatabaseAbstraction? (originally DatabaseConnection?), as it seems to be wrong way to do so for some databases, like sqlite3, but as another object. The object can hold the whole connection if needed, or just the one statement in progress. I also included interation of in-memory data source.

One possibly controversial thing is the use of NotImplemented? as the default behaviour. I do it because some user might want to try writing his own data source sometime in future and this will help him try it quickly, without having to implement too many empty methods.

With regards

comment:5 Changed 9 years ago by vorner

I updated the interface a little, so it is more similar to the one in #1062. I also merged with master (where #1061 was merged before) to incorporate latest changes from there. So current diff can be requested by git diff master....

comment:6 Changed 9 years ago by jelte

  • Owner changed from UnAssigned to jelte

comment:7 follow-up: Changed 9 years ago by jelte

  • Owner changed from jelte to vorner

I have taken the liberty to commit a few style-fixes.

I also have some naming suggestions: first is getIterator (as discussed, i want
to modify searchForRecords to return an iterator as well; i think we should
consistently name them; either we make them something like getRecords(name) and
getAllRecords(), or getRecordIterator() and getZoneIterator())

And even though is it a local class, i suggest the name of Iterator is changed
to DatabaseIterator? (so it matches the naming structure of MemoryIterator?)

sqlite3_accessor.cc:
getNext(); I don't think sqlite3_step() should only check for SQLITE_ROW as a
result value; it can signal the 'end' of the data directly with SQLITE_DONE, and
otherwise there is an error we should probably throw an exception (that includes
sqlite3_errmsg()).

as discussed on jabber, instead of getstr() we can use the convertToPlainChar()
that is in master now (it also does a bit of extra error checking), though i
would not object to renaming it :)

also, we should use an enum for the columns instead of hardcoded numbers for the
different fields. I was going to suggest reusing the same we use in
getNextRecord() (the RecordColumns? enum), but it's actually different values
that are returned here, so we can't simply do that.

I originally intended to suggest discussing either making it 4 direct arguments
again, but if we want one interface for both (and perhaps more) iterator types,
we can't do that either. Nor can we use a trick like getNext(std::string
(&data)[4]) to enforce the row size (We might change the number of columns in
the other call to 3).

So I would suggest to add a second argument to getNext(), the size of the array.
Checking this number (and passing the right one) will be a contract-level issue,
but it should, at least in theory, reduce the change of out-of-bounds writing to
the array.

iterator.h: getNextRRset(); we should probably add to the documentation that if
a (datasource)error is raised, the iterator cannot safely be used anymore (data
might get lost, and with at least the current implementation, if it is a
different-TTL problem, you'll get a looping error if you happen to catch the
exception inside the loop that calls getNextRRset())

memory_datasrc.h: 'friend' statements tend to set off alarm bells, so i suggest
we add some comments as to why we use/need them.

database.cc: note; it could be open for discussion, but the 1062 version 'fixes'
differing TTL values (by modifying the ttl to the lowest one it finds, and
logging an INFO message that the data should be fixed), instead of raising an
error.

And speaking of logging, perhaps we can use a few log messages here as well
(though if we take the ones i added in 1062 as a measure it would be a couple at
most, esp. since two or three in there will disappear once we use RAII through
an iterator like this)

comment:8 in reply to: ↑ 7 ; follow-ups: Changed 9 years ago by vorner

  • Owner changed from vorner to jelte

Hello

Replying to jelte:

I also have some naming suggestions: first is getIterator (as discussed, i want
to modify searchForRecords to return an iterator as well; i think we should
consistently name them; either we make them something like getRecords(name) and
getAllRecords(), or getRecordIterator() and getZoneIterator())

I went for the getAllRecords(), but I'm waiting for the modification to return the iterator for renaming the second part.

also, we should use an enum for the columns instead of hardcoded numbers for the
different fields. I was going to suggest reusing the same we use in
getNextRecord() (the RecordColumns? enum), but it's actually different values
that are returned here, so we can't simply do that.

I unified it by adding a fifth (well, in C++, fourth) column for the name. That one is unused in case of findRecords, but I think it doesn't matter much. Now we can reuse the iterator, if it is (in case of SQLite one) slightly modified in the constructor.

So I would suggest to add a second argument to getNext(), the size of the array.
Checking this number (and passing the right one) will be a contract-level issue,
but it should, at least in theory, reduce the change of out-of-bounds writing to
the array.

I don't think it helps much, but anyway, I added it.

database.cc: note; it could be open for discussion, but the 1062 version 'fixes'
differing TTL values (by modifying the ttl to the lowest one it finds, and
logging an INFO message that the data should be fixed), instead of raising an
error.

I'm not sure I like that way. The zone is clearly inconsistent and broken, so in that case we're serving bad data.

And anyway, this should be at last WARN, I believe. What else should we warn about, if not about clearly bad data ;-). And we might create a tool that would scan whole zone/datasource and show suggestions/recommendations sometime in future.

comment:9 in reply to: ↑ 8 Changed 9 years ago by jinmei

Replying to vorner:

database.cc: note; it could be open for discussion, but the 1062 version 'fixes'
differing TTL values (by modifying the ttl to the lowest one it finds, and
logging an INFO message that the data should be fixed), instead of raising an
error.

I'm not sure I like that way. The zone is clearly inconsistent and broken, so in that case we're serving bad data.

I expect we'll allow the data source to be updated directly, i.e., not
through the BIND 10 data source API. For example, a user may want to
update the data source via the corresponding DBMS directly. Since
each RR is stored in a separate row (which is the case at least for
the current sqlite3 data source schema), it's possible that two RRs of
the same RRset could have different TTLs.

We could ban that situation by returning SERVFAIL for normal queries
and entirely failing zone transfers, or we could try to be a bit
flexible and internally convert the inconsistent set to normalized
RRsets. I'm personally inclined to do the latter. It's analogous to
common implementations of master file parser (both BIND 9 and NSD work
that way. b10-loadzone doesn't even care about TTL differences -
although it's probably just naive rather than a result of a deliberate
decision).

comment:10 in reply to: ↑ 8 Changed 9 years ago by jelte

  • Owner changed from jelte to vorner

Replying to vorner:

I pushed one const fix that some compilers gonna want in sqlite3_accessor_unittest.cc

I went for the getAllRecords(), but I'm waiting for the modification to return the iterator for renaming the second part.

ack. Let's defer the other change to the unification (or refactor-refactor-)ticket, which i'll create shortly

also, we should use an enum for the columns instead of hardcoded numbers for the
different fields. I was going to suggest reusing the same we use in
getNextRecord() (the RecordColumns? enum), but it's actually different values
that are returned here, so we can't simply do that.

I unified it by adding a fifth (well, in C++, fourth) column for the name. That one is unused in case of findRecords, but I think it doesn't matter much. Now we can reuse the iterator, if it is (in case of SQLite one) slightly modified in the constructor.

ok cool

So I would suggest to add a second argument to getNext(), the size of the array.
Checking this number (and passing the right one) will be a contract-level issue,
but it should, at least in theory, reduce the change of out-of-bounds writing to
the array.

I don't think it helps much, but anyway, I added it.

As discussed shortly on jabber, if we are going for a always-fixed-length array in which, depending on context, the calls update specific fields, we can actually enforce the array size compile-time (by using a ref-to-array), and we don't need the second argument anymore.

But this too we can defer to the refactor-refactor-ticket

database.cc: note; it could be open for discussion, but the 1062 version 'fixes'
differing TTL values (by modifying the ttl to the lowest one it finds, and
logging an INFO message that the data should be fixed), instead of raising an
error.

I'm not sure I like that way. The zone is clearly inconsistent and broken, so in that case we're serving bad data.

And anyway, this should be at last WARN, I believe. What else should we warn about, if not about clearly bad data ;-). And we might create a tool that would scan whole zone/datasource and show suggestions/recommendations sometime in future.

See jinmei's comment :)

Other than that the changes look OK to me.

comment:11 Changed 9 years ago by vorner

  • Owner changed from vorner to jinmei

OK, I changed it to warning. But I still believe the other one should be warning as well.

So, is it ready to merge now?

Thank you

comment:12 Changed 9 years ago by vorner

  • Owner changed from jinmei to jelte

Why do we have so many people starting with 'j'? O:-)

comment:13 Changed 9 years ago by jelte

  • Owner changed from jelte to vorner

the log call didn't have the right arguments, but I have taken the liberty to update it (also changed the text and description a little bit). I also added a bit of the description you added to the 'other' ttl log message.

If you would like to verify the change, thanks, and then it can indeed be merged :)

comment:14 Changed 9 years ago by vorner

  • Resolution set to complete
  • Status changed from reviewing to closed

Thanks, closed.

Note: See TracTickets for help on using tickets.