Opened 8 years ago

Closed 7 years ago

#1899 closed defect (fixed)

performance issue of SQLite3 "iterate" query (w/ or w/o NSEC3)

Reported by: jinmei Owned by: muks
Priority: medium Milestone: Sprint-20121009
Component: data source Version:
Keywords: Cc:
CVSS Scoring: Parent Tickets:
Sensitive: no Defect Severity: Medium
Sub-Project: DNS Feature Depending on Ticket:
Estimated Difficulty: 6 Add Hours to Ticket: 0
Total Hours: 1.5 Internal?: no

Description

Our current implementation of iterating a zone in an SQLite3 data
source isn't scalable and won't work well for a huge zone (e.g.,
when enumerating the zone's RRs for xfrout). It was already so
before #1782. See the very end of
https://lists.isc.org/pipermail/bind10-dev/2012-March/003167.html

But I suspect #1782 has made it worse due to the introduction of union:

SELECT rdtype, ttl, sigtype, rdata, name, rname FROM records
WHERE zone_id = ?1
UNION
SELECT rdtype, ttl, "NSEC3", rdata, owner, owner FROM nsec3
WHERE zone_id = ?1 ORDER by rname, rdtype

Even if we define optimized indices for both normal and nsec3
tables, the union will require generating temporary B-tree and will
make the iteration unacceptably slow:

sqlite> explain query plan SELECT rdtype, ttl, sigtype, rdata, name, rname FROM records WHERE zone_id = 1 union select rdtype, ttl, "NSEC3", rdata, owner, owner FROM nsec3 WHERE zone_id = 1 limit 10 ;
1|0|0|SCAN TABLE records (~100000 rows)
2|0|0|SCAN TABLE nsec3 (~100000 rows)
0|0|0|COMPOUND SUBQUERIES 1 AND 2 USING TEMP B-TREE (UNION)

There seem to be several choices:

  • Avoid sorting and use 'UNION ALL' (the latter would make sense anyway, as long as we use UNION). Personally, I'd still like to keep the result at least sorted by names, however. (I know there are discussions about whether we need to ensure the iteration result is sorted)
  • Use two separate queries for these tables and return the combined result via the iterator object.
  • Maybe some other way

Subtickets

Change History (20)

comment:1 Changed 8 years ago by shane

  • Defect Severity changed from N/A to Medium
  • Milestone New Tasks deleted

I kind of think that using two separate queries and merging the results in the code is the best way. (Although I wonder what the implications of using a non-sorted AXFR/IXFR is... perhaps we should try it and see what happens!)

comment:2 Changed 7 years ago by jinmei

  • Milestone set to Next-Sprint-Proposed

comment:3 Changed 7 years ago by jinmei

  • Milestone set to Next-Sprint-Proposed

comment:4 Changed 7 years ago by jelte

  • Milestone changed from Next-Sprint-Proposed to Sprint-20120925

comment:5 Changed 7 years ago by muks

This is surprising, that the query planner for SQLite3 doesn't merge the two sides of the union as they are already returned in sorted order.

comment:6 Changed 7 years ago by muks

Something like this avoids the temporary Btree construction:

sqlite> explain query plan SELECT rdtype, ttl, sigtype, rdata, name, rname FROM (SELECT *, '' as hash, '' as owner FROM records WHERE zone_id=5 UNION SELECT *, owner as name, owner as rname, '' as sigtype FROM nsec3 WHERE zone_id=5 ORDER by rname, rdtype);
2|0|0|SCAN TABLE records USING INDEX records_byrname_and_rdtype (~2 rows)
3|0|0|SCAN TABLE nsec3 USING INDEX nsec3_byowner_and_rdtype (~1 rows)
1|0|0|COMPOUND SUBQUERIES 2 AND 3 (UNION)
0|0|0|SCAN SUBQUERY 1 (~3 rows)

comment:7 Changed 7 years ago by muks

With 300000 records each in records and nsec3 tables:

[muks@jurassic jreed]$ time (echo "SELECT rdtype, ttl, sigtype, rdata, name, rname FROM records WHERE zone_id = 5 UNION SELECT rdtype, ttl, 'NSEC3', rdata, owner, owner FROM nsec3 WHERE zone_id = 5 ORDER by rname, rdtype;" | sqlite3 tmp2.sqlite3 > /dev/null)

real	0m15.763s
user	0m7.698s
sys	0m7.982s

Add the relevant keys and change query style:

[muks@jurassic jreed]$ time (echo "SELECT rdtype, ttl, sigtype, rdata, name, rname FROM (SELECT *, '' as hash, '' as owner FROM records WHERE zone_id=5 UNION SELECT *, owner as name, owner as rname, '' as sigtype FROM nsec3 WHERE zone_id=5 ORDER by rname, rdtype);" | sqlite3 tmp2.sqlite3 > /dev/null)

real	0m9.565s
user	0m6.766s
sys	0m2.772s

comment:8 Changed 7 years ago by muks

Please strike those last two comments, it's merging columns in the wrong order together.

comment:9 Changed 7 years ago by muks

  • Owner set to muks
  • Status changed from new to assigned

Picking

comment:10 Changed 7 years ago by muks

The UNION has been split up into two SQL clauses with manual merging.

This still is buggy code, and it has always been buggy (with the UNION too) because it tries to merge rname (reversed name) from the records table with owner column in the nsec3 table. So you have results out of order. There was no unit test to check if both sides of the UNION were being merged properly before. Such a check has been added now, but the problem exists.

We'd have to fix this by adding nsec3.rowner column.

comment:11 Changed 7 years ago by muks

  • Owner changed from muks to UnAssigned
  • Status changed from assigned to reviewing

I've updated it by matching the rname vs. a concatenated string of zone_name.reverse() + hash, to return results in the same order as the records table. I'm putting this bug up for review.

comment:12 Changed 7 years ago by vorner

  • Owner changed from UnAssigned to vorner

comment:13 Changed 7 years ago by vorner

  • Owner changed from vorner to muks

Hello

First, I'd clarify why the result was sorted in the first place. We don't
really need it sorted (I think the protocol does not mandate any order for
xfrout, nor does when we insert stuff into the in-memory). We just need to
collate the RRsets together. It is best done when we have a guarantee that all
RRs of the same RRset are consecutive in the stream. And that is easier done by
some kind of sorting. The fact that the NSEC3 namespace is sorted by a slightly
different key is not really very interesting.

When modifying the iteration back then to include NSEC3 namespace, I tried
sorting each of the smaller queries together and then concatenating them
together by union. But it might be that my knowledge of SQL is not good enough
O:-). If you have an idea how to solve the above problem without sorting (and
without accumulating the whole zone in memory), I'm all for it. But the extra
work to make sure the NSEC3 things and the other RRs are sorted the exact same
way is IMO not needed and only complicates the code.

Also, the tests fail for me in the dbutil directory. I think you modified the
schema only in some places, but not others:

5.2. Database is an old V1 database - upgrade
2012-09-26 11:56:50.327 INFO  [b10-dbutil.dbutil] DBUTIL_FILE Database file: /home/vorner/work/bind10/bind10-devel-20120817/_build/src/bin/dbutil/tests/dbutil_test_tempfile_13574
2012-09-26 11:56:50.327 INFO  [b10-dbutil.dbutil] DBUTIL_BACKUP created backup of /home/vorner/work/bind10/bind10-devel-20120817/_build/src/bin/dbutil/tests/dbutil_test_tempfile_13574 in /home/vorner/work/bind10/bind10-devel-20120817/_build/src/bin/dbutil/tests/dbutil_test_tempfile_13574.backup
2012-09-26 11:56:50.328 INFO  [b10-dbutil.dbutil] DBUTIL_UPGRADING upgrading database from V1.0 to V2.0
2012-09-26 11:56:50.331 INFO  [b10-dbutil.dbutil] DBUTIL_UPGRADING upgrading database from V2.0 to V2.1
2012-09-26 11:56:50.331 INFO  [b10-dbutil.dbutil] DBUTIL_UPGRADE_SUCCESFUL database upgrade successfully completed
ERROR: upgraded schema not as expected
*** FAIL

Thank you

comment:14 Changed 7 years ago by muks

I forgot to commit changes to the v2_1.sqlite3 test data file which is why make check would have failed. I've pushed it now. Will check on the other bits later.

comment:15 Changed 7 years ago by muks

  • Owner changed from muks to vorner

I have made a new branch trac1899_2. It reimplements the bug fix:

  • A check constraint is added on nsec3 table to ensure that only one NSEC3 RR exists per-owner, per-zone.
  • The UNION clause is split into two individual clauses per table.
  • Unnecessary columns are no longer selected.
  • The ORDER BY for nsec3 table results is gone (as there's only one result per-owner and they can be in any order when iterating a zone).
  • A testcase was updated to check that records in the nsec3 table were actually being returned.

comment:16 Changed 7 years ago by vorner

  • Owner changed from vorner to muks

Hello

The code looks mostly OK, but I'm not completely sure about the design.

First, is it true there can be only one NSEC3 per name? There can be multiple
NSEC3 chains (with different parameters). That could in theory intersect at
some places. I'm not saying our code works correctly at the moment with this,
but hardcoding the brokenness it into the DB seems wrong. However, I don't
think these NSEC3s would be collated to the same RRset anyway, because they
are, effectively, each in its own namespace and not talking to each other.

Also, the restriction in your case is even stronger. The table contains both
NSEC3s and their RRSIGs, which won't be possible in your case.

And then, when iterating, the NSEC3s and their RRsets may want to end up
together (I didn't look at how we collate RRsets with their RRSIGs, but I
believe they should meet).

comment:17 Changed 7 years ago by muks

  • Owner changed from muks to vorner

Hi vorner

  • The unique constraint on NSEC3 owner per zone is now removed
  • NSEC3 RRSIGs are now returned next to their RR set
  • Test was adjusted to check for RRSIGs too

This still doesn't add or have existing indexes for queries like the first half of the union on the records table (bug #1756), or index on zone_id column used in queries.

comment:18 Changed 7 years ago by muks

(Note that the work is in trac1899_2 branch.)

comment:19 Changed 7 years ago by vorner

  • Owner changed from vorner to muks
  • Total Hours changed from 0 to 1.5

I think it now can be merged.

comment:20 Changed 7 years ago by muks

  • Resolution set to fixed
  • Status changed from reviewing to closed

Merged to master in commit daedcaea541b1193018cb71d3cc68ba3ba9ff59c:

* f5bd136 [1899] Allow other RRtypes in nsec3 table (for RRSIGs)
* a38fe35 [1899] Add another unittest to check that NSEC3 RRs are being returned
* 66f45b3 [1899] Break up UNION SQL query into two and merge manually
* f284e18 [1899] Add a unique constraint to nsec3 table

Resolving as fixed. Thank you for reviews Michal.

Note: See TracTickets for help on using tickets.