Opened 8 years ago

Closed 7 years ago

#1756 closed defect (fixed)

revise "ITERATE" query in sqlite3_accessor

Reported by: jinmei Owned by: muks
Priority: medium Milestone: Sprint-20130205
Component: data source Version:
Keywords: Cc:
CVSS Scoring: Parent Tickets:
Sensitive: no Defect Severity: N/A
Sub-Project: DNS Feature Depending on Ticket:
Estimated Difficulty: 6 Add Hours to Ticket: 0
Total Hours: 0.74 Internal?: no

Description

See https://lists.isc.org/pipermail/bind10-dev/2012-March/003167.html

With the current schema zone iterator will be slow for a huge zone (I
actually confirmed it via the command line interface of sqlite3).

We should either:

  • sort the result only by rname, or
  • add an index on (rname, rdtype)

The latter has a side effect of making the DB bigger (in my
experimental data it made the DB file 20% bigger), and will make
updates slower (I have no data about this though).

Subtickets

Change History (28)

comment:1 Changed 7 years ago by jinmei

  • Milestone set to Next-Sprint-Proposed

comment:2 Changed 7 years ago by jelte

  • Milestone changed from Next-Sprint-Proposed to Sprint-20120925

comment:3 Changed 7 years ago by muks

  • Owner set to muks
  • Status changed from new to assigned

Picking

comment:4 Changed 7 years ago by muks

I've updated the ANY_SUB query too as it's mentioned in the mailing list. Here is what happens with the new syntax:

sqlite> --- old:
sqlite> explain query plan SELECT rdtype, ttl, sigtype, rdata FROM records WHERE zone_id=5 AND name LIKE ('%.' || 'example.org.');
0|0|0|SCAN TABLE records (~50000 rows)
sqlite> --- new:
sqlite> explain query plan SELECT rdtype, ttl, sigtype, rdata FROM records WHERE zone_id=5 AND rname LIKE 'org.example.%';
0|0|0|SEARCH TABLE records USING INDEX records_byrname (rname>? AND rname<?) (~3125 rows)
sqlite> 

comment:5 Changed 7 years ago by muks

Here are some measurements on the records table:

32 seconds for insert without index of 299957 records
-rw-rw-r--   1 muks muks 95246336 Sep 20 15:36 tmp2.sqlite3-without-index

7 seconds for creating index after inserting records
-rw-rw-r--   1 muks muks 114114560 Sep 20 15:39 tmp2.sqlite3-with-index-after
~20% increase in file size

41 seconds for insert with index of 299957 records
-rw-rw-r--   1 muks muks 114114560 Sep 20 15:44 tmp2.sqlite3-with-index-before
~20% increase in file size
~28% increase in insert time

comment:6 Changed 7 years ago by muks

  • Owner changed from muks to UnAssigned
  • Status changed from assigned to reviewing

When RR separation is not done by the iterator, it does seem to want rows for each RRType together. I have pushed this index creation to the branch, along with the the ANY_SUB changes. Please review them.

I don't know if a schema version upgrade is necessary. Things seem to work without it, but please tell me if I am mistaken.

comment:7 Changed 7 years ago by vorner

  • Owner changed from UnAssigned to vorner

comment:8 follow-up: Changed 7 years ago by vorner

  • Owner changed from vorner to muks

Hello

First, I'm not completely sure the tradeoff is worth it (larger DB and slower updates for faster iteration). Should we ask on the ML about it?

Anyway, the change should have a changelog entry I think. Also, I think this is the kind of change in DB to increase the minor version (it is backwards compatible, but it is different). And the update utility should be extended to be able to add the indices to already existing database.

I'm not against the change to sqlite3_datasrc.cc, but we should drop this file really soon now. I don't think anything uses it any more.

Also, it seems the lettuce tests fail for me. Did you try running them? I tried 3 times and all three times it failed on the same test:

last bindctl output should not contain "error"                                                               # features/terrain/bind10_control.py:152
    Then wait for new master stderr message XFROUT_RECEIVED_GETSTATS_COMMAND                                     # features/terrain/steps.py:34
    Traceback (most recent call last):
      File "/home/vorner/.local/lib64/python2.7/site-packages/lettuce/core.py", line 117, in __call__
        ret = self.function(self.step, *args, **kw)
      File "/home/vorner/work/bind10/tests/lettuce/features/terrain/steps.py", line 52, in wait_for_stderr_message
        (found, line) = world.processes.wait_for_stderr_str(process_name, strings, new, int(times))
      File "/home/vorner/work/bind10/tests/lettuce/features/terrain/terrain.py", line 344, in wait_for_stderr_str
        matches)
      File "/home/vorner/work/bind10/tests/lettuce/features/terrain/terrain.py", line 244, in wait_for_stderr_str
        strings, only_new, matches)
      File "/home/vorner/work/bind10/tests/lettuce/features/terrain/terrain.py", line 228, in _wait_for_output_str
        assert False, "Timeout waiting for process output: " + str(strings)
    AssertionError: Timeout waiting for process output: [u'XFROUT_RECEIVED_GETSTATS_COMMAND']
    The counter notifyoutv4 for the zone _SERVER_ should be 0                                                    # features/terrain/bind10_control.py:395

With regards

comment:9 in reply to: ↑ 8 Changed 7 years ago by muks

Hi Michal

Replying to vorner:

Hello

First, I'm not completely sure the tradeoff is worth it (larger DB and slower updates for faster iteration). Should we ask on the ML about it?

I've asked on the mailing list: https://lists.isc.org/pipermail/bind10-dev/2012-September/003828.html

Anyway, the change should have a changelog entry I think.

How does this look:

XXX.    [func]          muks
	Added an SQLite3 index on records(rname, rdtype). This decreases
        insert performance by ~28% and adds about ~20% to the file size,
        but increases zone iteration performance. As it introduces a new
        index, a database upgrade would be required.
        (Trac #1756, git ...)

Also, I think this is the kind of change in DB to increase the minor version (it is backwards compatible, but it is different). And the update utility should be extended to be able to add the indices to already existing database.

Done.

I'm not against the change to sqlite3_datasrc.cc, but we should drop this file really soon now. I don't think anything uses it any more.

Nod.

Also, it seems the lettuce tests fail for me. Did you try running them? I tried 3 times and all three times it failed on the same test:

last bindctl output should not contain "error"                                                               # features/terrain/bind10_control.py:152
    Then wait for new master stderr message XFROUT_RECEIVED_GETSTATS_COMMAND                                     # features/terrain/steps.py:34
    Traceback (most recent call last):
      File "/home/vorner/.local/lib64/python2.7/site-packages/lettuce/core.py", line 117, in __call__
        ret = self.function(self.step, *args, **kw)
      File "/home/vorner/work/bind10/tests/lettuce/features/terrain/steps.py", line 52, in wait_for_stderr_message
        (found, line) = world.processes.wait_for_stderr_str(process_name, strings, new, int(times))
      File "/home/vorner/work/bind10/tests/lettuce/features/terrain/terrain.py", line 344, in wait_for_stderr_str
        matches)
      File "/home/vorner/work/bind10/tests/lettuce/features/terrain/terrain.py", line 244, in wait_for_stderr_str
        strings, only_new, matches)
      File "/home/vorner/work/bind10/tests/lettuce/features/terrain/terrain.py", line 228, in _wait_for_output_str
        assert False, "Timeout waiting for process output: " + str(strings)
    AssertionError: Timeout waiting for process output: [u'XFROUT_RECEIVED_GETSTATS_COMMAND']
    The counter notifyoutv4 for the zone _SERVER_ should be 0                                                    # features/terrain/bind10_control.py:395

I think this was an issue with the stats counter updates when the branch was made, and has since been fixed in master.

comment:10 Changed 7 years ago by muks

This bug is waiting on a decision whether the index should be added or not, due to the performance impact of adding it.

comment:11 Changed 7 years ago by muks

This index is also needed by one of the queries in #1899, so we should include it.

comment:12 Changed 7 years ago by muks

We do need to return RRs of the same type consecutively for the iterator to put in a single RRset when separate_rrs=false. We cannot remove that ORDER BY. So we should have this key unless we want iteration to remain slow with large zones.

Jinmei has replied to the mailing list that he's going to change the way RRTypes are stored from text to integer. This would indeed make key comparison faster, and hence insertions faster. We can wait and try this then.

I don't like moving the RRType sorting within the iterator itself, as that'd involve sorting it every time the iterator is used rather than just once.

This bug remains blocked.

comment:13 Changed 7 years ago by muks

Bug #2301 was created for the ANY_SUB query optimization discussed earlier in the bug.

comment:14 Changed 7 years ago by jinmei

This ticket is now pending, and needs to wait for more higher level
discussions including more fundamental changes to the schema.

For now, I'm removing this ticket from the review queue and from the
current sprint.

comment:15 Changed 7 years ago by jinmei

  • Milestone changed from Sprint-20121009 to New Tasks
  • Owner changed from muks to jinmei
  • Status changed from reviewing to accepted

comment:16 Changed 7 years ago by jinmei

  • Owner changed from jinmei to muks
  • Status changed from accepted to assigned

comment:17 Changed 7 years ago by shane

  • Milestone New Tasks deleted

comment:18 Changed 7 years ago by jinmei

See the discussion around this:
https://lists.isc.org/pipermail/bind10-dev/2013-January/004204.html

I guess it's time to complete this task with *adopting* it.

If our basic expectation is to not use SQLite3 for huge zones,
this change won't particularly help but won't do harm either
(because the additional overhead would also be marginal).

If the user still wants to use SQLite3 with understanding the basic
expectation and do it at their own risk, it helps them. And the
additional overhead is part of the cost they need to pay in that case.

So, either way, it doesn't harm and help some cases. Since we
basically completed the task itself, all we have to do is to just
merge it.

comment:19 Changed 7 years ago by jinmei

  • Milestone set to Next-Sprint-Proposed
  • Priority changed from medium to high

comment:20 Changed 7 years ago by jelte

  • Milestone changed from Next-Sprint-Proposed to Sprint-20130205

comment:21 Changed 7 years ago by muks

I'll rebase the patch on current master and put it to review.

comment:22 Changed 7 years ago by jinmei

  • Priority changed from high to medium

comment:23 Changed 7 years ago by muks

  • Owner changed from muks to UnAssigned
  • Status changed from assigned to reviewing

This is now rebased on a more recent master branch and is ready for review.

Please review the trac1756_2 branch.

comment:24 Changed 7 years ago by vorner

  • Owner changed from UnAssigned to vorner

comment:25 follow-up: Changed 7 years ago by vorner

  • Owner changed from vorner to muks

Hello

I think there should be a test checking the dbutil can upgrade from 2.1 to 2.2. Otherwise it looks OK.

comment:26 in reply to: ↑ 25 Changed 7 years ago by muks

  • Owner changed from muks to vorner

Hi Michal

Replying to vorner:

Hello

I think there should be a test checking the dbutil can upgrade from 2.1 to 2.2. Otherwise it looks OK.

Done. :)

I've also patched the test script to update the section numbers which are printed, so that we don't have to adjust all of them manually every time we change anything in-between.

comment:27 Changed 7 years ago by vorner

  • Owner changed from vorner to muks
  • Total Hours changed from 0 to 0.74

It seems OK to merge now.

comment:28 Changed 7 years ago by muks

  • Resolution set to fixed
  • Status changed from reviewing to closed

Merged to master branch in commit 9b3c959af13111af1fa248c5010aa33ee7e307ee:

* 6d750aa [1756] Add V2.1 to 2.2 upgrade tests
* 750c848 [1756] Use a variable to print section numbers in dbutil tests
* c00795e [1756] Bump schema version to 2.2 and update dbutil
* 388434f [1756] Add an index on (rname, rdtype) to records SQLite3 table

Resolving as fixed. Thank you for the reviews Michal.

Note: See TracTickets for help on using tickets.