Opened 7 years ago

Closed 5 years ago

#2610 closed defect (wontfix)

Query for large zone in SQLite datasrc takes a very long time

Reported by: vorner
Owned by:
Priority: medium
Milestone: Remaining BIND10 tickets
Component: data source
Version: bind10-old
Keywords:
Cc:
CVSS Scoring:
Parent Tickets:
Sensitive: no
Defect Severity: Medium
Sub-Project: DNS
Feature Depending on Ticket:
Estimated Difficulty: discuss
Add Hours to Ticket: 0
Total Hours: 0
Internal?: no

Description

I tried loading a large zone into the SQLite data source (without the in-memory
cache; over 3M records). Then I queried for the SOA record of the zone. The query
got answered correctly, but only after something like 12 seconds of b10-auth running
at 100% CPU. And my CPU is pretty fast.

Such performance is unusable for any practical purpose with a zone this large. I
thought we had fixed this kind of problem with some indices, but it seems they
didn't help enough.

Subtickets

Change History (21)

comment:1 in reply to: ↑ description Changed 7 years ago by jinmei

Replying to vorner:

I tried loading a large zone into the SQLite data source (without the in-memory
cache; over 3M records). Then I queried for the SOA record of the zone. The query
got answered correctly, but only after something like 12 seconds of b10-auth running
at 100% CPU. And my CPU is pretty fast.

Does this mean you only sent one query (for SOA), b10-auth answered
it, and then it started eating CPU time (without handling any other
queries)? Or did you send more (same or different) queries, which
caused the problem?

comment:2 follow-up: Changed 7 years ago by vorner

Maybe I wasn't clear in the description. I sent a query. The server started to
eat 100% CPU and then, after working for those 12 seconds, it sent the answer (and
returned to being idle).

If more queries are sent in that time, they get queued in the UDP socket
and answered afterwards (after another 12 seconds of hard work).

So I'm not complaining that it acts incorrectly; it is just incredibly slow.

comment:3 in reply to: ↑ 2 ; follow-up: Changed 7 years ago by jinmei

Replying to vorner:

Maybe I wasn't clear in the description. I sent a query. The server started to
eat 100% CPU and then, after working for those 12 seconds, it sent the answer (and
returned to being idle).

If more queries are sent in that time, they get queued in the UDP socket
and answered afterwards (after another 12 seconds of hard work).

So I'm not complaining that it acts incorrectly; it is just incredibly slow.

Okay, in that case dumping detailed logs may help diagnose the issue.
BTW, I thought we fixed the issue(s) that would have caused this type
of performance problem as you mentioned in the ticket description,
so I suspect it's a new issue or some kind of regression from the
previous fixes.

comment:4 in reply to: ↑ 3 ; follow-up: Changed 7 years ago by jinmei

Replying to jinmei:

Maybe I wasn't clear in the description. I sent a query. The
server started to eat 100% CPU and then, after working for those 12
seconds, it sent the answer (and returned to being idle).

If more queries are sent in that time, they get queued in the UDP socket
and answered afterwards (after another 12 seconds of hard work).

So I'm not complaining that it acts incorrectly; it is just incredibly slow.

Okay, in that case dumping detailed logs may help diagnose the issue.
BTW, I thought we fixed the issue(s) that would have caused this type
of performance problem as you mentioned in the ticket description,
so I suspect it's a new issue or some kind of regression from the
previous fixes.

FYI, I just tested a simple query at the apex of a very large zone
and didn't see this problem. So it may also be related to the
characteristics of your zone data.

comment:5 in reply to: ↑ 4 ; follow-up: Changed 7 years ago by jinmei

Replying to jinmei:

Maybe I wasn't clear in the description. I sent a query. The
server started to eat 100% CPU and then, after working for those 12
seconds, it sent the answer (and returned to being idle).

FYI, I just tested a simple query at the apex of a very large zone
and didn't see this problem. So it may also be related to the
characteristics of your zone data.

I guess you created your sqlite3 DB file with (new) b10-loadzone from
scratch. In that case the creation of the DB and tables is done
by sqlite3_accessor, and it doesn't create indexes. If my guess is
correct, that's quite likely the reason for your problem.

An operational workaround would be to create the indexes by hand:

% sqlite3 zone.sqlite3
sqlite> CREATE INDEX zones_byname ON zones (name);
sqlite> CREATE INDEX records_byname ON records (name);
sqlite> CREATE INDEX records_byrname ON records (rname);
sqlite> CREATE INDEX records_bytype_and_rname ON records (rdtype, rname);
sqlite> CREATE INDEX nsec3_byhash ON nsec3 (hash);
sqlite> CREATE INDEX nsec3_byhash_and_rdtype ON nsec3 (hash, rdtype);

For a more complete solution, I propose scheduling #1861 at this
opportunity, and having (new) b10-loadzone use it.

comment:6 in reply to: ↑ 5 Changed 7 years ago by jinmei

Replying to jinmei:

FYI, I just tested a simple query at the apex of a very large zone
and didn't see this problem. So it may also be related to the
characteristics of your zone data.

I guess you created your sqlite3 DB file with (new) b10-loadzone from
scratch. In that case the creation of the DB and tables is done
by sqlite3_accessor, and it doesn't create indexes.

Sorry, I was confused. It does create indexes.

If my guess is correct, that's quite likely the reason for your problem.

comment:7 follow-up: Changed 7 years ago by vorner

So, I looked into it a little bit more.

The indices are there. I just checked.

It's really because of what the zone looks like. It's a somewhat broken, old
copy of the .cz zone. It turns out it is missing address records for 3 of the
nameservers (actually, there are no records with their names at all). This leads
to a call to DatabaseClient::Finder::findNoNameResult for each of them. The
hasSubdomains call takes some time and the findWildcardMatch even more. I added
two more log messages to the method to see what takes the time, one at the
beginning (DATASRC_DATABASE_FIND_NONAME) and the second just before the
findWildcardMatch (TRY_WILDCARD). Here's what I get:

2013-01-08 13:34:07.352 DEBUG [b10-auth.auth/18468] AUTH_PACKET_RECEIVED message received:
;; ->>HEADER<<- opcode: QUERY, status: NOERROR, id: 64678
;; flags: rd ad; QUERY: 1, ANSWER: 0, AUTHORITY: 0, ADDITIONAL: 1

;; OPT PSEUDOSECTION:
; EDNS: version: 0, flags:; udp: 4096

;; QUESTION SECTION:
;cz. IN SOA

2013-01-08 13:34:07.352 DEBUG [b10-auth.datasrc/18468] DATASRC_DATABASE_FIND_RECORDS looking in datasource sqlite3_zone.sqlite3 for record cz./SOA/IN
2013-01-08 13:34:07.353 DEBUG [b10-auth.datasrc/18468] DATASRC_DATABASE_FOUND_RRSET search in datasource sqlite3_zone.sqlite3 resulted in RRset cz. 18000 IN SOA a.ns.nic.cz. hostmaster.nic.cz. 1283265961 900 300 604800 900

2013-01-08 13:34:07.353 DEBUG [b10-auth.datasrc/18468] DATASRC_DATABASE_FIND_RECORDS looking in datasource sqlite3_zone.sqlite3 for record cz./NS/IN
2013-01-08 13:34:07.353 DEBUG [b10-auth.datasrc/18468] DATASRC_DATABASE_FOUND_RRSET search in datasource sqlite3_zone.sqlite3 resulted in RRset cz. 18000 IN NS a.ns.nic.cz.
cz. 18000 IN NS b.ns.nic.cz.
cz. 18000 IN NS c.ns.nic.cz.
cz. 18000 IN NS d.ns.nic.cz.
cz. 18000 IN NS f.ns.nic.cz.

2013-01-08 13:34:07.353 DEBUG [b10-auth.datasrc/18468] DATASRC_DATABASE_FIND_RECORDS looking in datasource sqlite3_zone.sqlite3 for record a.ns.nic.cz./A/IN
2013-01-08 13:34:07.354 DEBUG [b10-auth.datasrc/18468] DATASRC_DATABASE_FOUND_RRSET search in datasource sqlite3_zone.sqlite3 resulted in RRset a.ns.nic.cz. 18000 IN A 194.0.12.1

2013-01-08 13:34:07.354 DEBUG [b10-auth.datasrc/18468] DATASRC_DATABASE_FIND_RECORDS looking in datasource sqlite3_zone.sqlite3 for record a.ns.nic.cz./AAAA/IN
2013-01-08 13:34:07.354 DEBUG [b10-auth.datasrc/18468] DATASRC_DATABASE_FOUND_RRSET search in datasource sqlite3_zone.sqlite3 resulted in RRset a.ns.nic.cz. 18000 IN AAAA 2001:678:f::1

2013-01-08 13:34:07.354 DEBUG [b10-auth.datasrc/18468] DATASRC_DATABASE_FIND_RECORDS looking in datasource sqlite3_zone.sqlite3 for record b.ns.nic.cz./A/IN
2013-01-08 13:34:07.354 DEBUG [b10-auth.datasrc/18468] DATASRC_DATABASE_FOUND_RRSET search in datasource sqlite3_zone.sqlite3 resulted in RRset b.ns.nic.cz. 18000 IN A 194.0.13.1

2013-01-08 13:34:07.354 DEBUG [b10-auth.datasrc/18468] DATASRC_DATABASE_FIND_RECORDS looking in datasource sqlite3_zone.sqlite3 for record b.ns.nic.cz./AAAA/IN
2013-01-08 13:34:07.355 DEBUG [b10-auth.datasrc/18468] DATASRC_DATABASE_FOUND_RRSET search in datasource sqlite3_zone.sqlite3 resulted in RRset b.ns.nic.cz. 18000 IN AAAA 2001:678:10::1

2013-01-08 13:34:07.355 DEBUG [b10-auth.datasrc/18468] DATASRC_DATABASE_FIND_RECORDS looking in datasource sqlite3_zone.sqlite3 for record c.ns.nic.cz./A/IN
2013-01-08 13:34:07.355 DEBUG [b10-auth.datasrc/18468] DATASRC_DATABASE_FIND_NONAME Special cases for no such zone c.ns.nic.cz.
2013-01-08 13:34:08.395 DEBUG [b10-auth.datasrc/18468] TRY_WILDCARD Trying wildcard c.ns.nic.cz.
2013-01-08 13:34:10.424 DEBUG [b10-auth.datasrc/18468] DATASRC_DATABASE_NO_MATCH not match for c.ns.nic.cz./A/IN in sqlite3_zone.sqlite3
2013-01-08 13:34:10.424 DEBUG [b10-auth.datasrc/18468] DATASRC_DATABASE_FIND_RECORDS looking in datasource sqlite3_zone.sqlite3 for record c.ns.nic.cz./AAAA/IN
2013-01-08 13:34:10.425 DEBUG [b10-auth.datasrc/18468] DATASRC_DATABASE_FIND_NONAME Special cases for no such zone c.ns.nic.cz.
2013-01-08 13:34:11.455 DEBUG [b10-auth.datasrc/18468] TRY_WILDCARD Trying wildcard c.ns.nic.cz.
2013-01-08 13:34:13.502 DEBUG [b10-auth.datasrc/18468] DATASRC_DATABASE_NO_MATCH not match for c.ns.nic.cz./AAAA/IN in sqlite3_zone.sqlite3
2013-01-08 13:34:13.502 DEBUG [b10-auth.datasrc/18468] DATASRC_DATABASE_FIND_RECORDS looking in datasource sqlite3_zone.sqlite3 for record d.ns.nic.cz./A/IN
2013-01-08 13:34:13.503 DEBUG [b10-auth.datasrc/18468] DATASRC_DATABASE_FOUND_RRSET search in datasource sqlite3_zone.sqlite3 resulted in RRset d.ns.nic.cz. 18000 IN A 193.29.206.1

2013-01-08 13:34:13.503 DEBUG [b10-auth.datasrc/18468] DATASRC_DATABASE_FIND_RECORDS looking in datasource sqlite3_zone.sqlite3 for record d.ns.nic.cz./AAAA/IN
2013-01-08 13:34:13.503 DEBUG [b10-auth.datasrc/18468] DATASRC_DATABASE_FOUND_RRSET search in datasource sqlite3_zone.sqlite3 resulted in RRset d.ns.nic.cz. 18000 IN AAAA 2001:678:1::1

2013-01-08 13:34:13.504 DEBUG [b10-auth.datasrc/18468] DATASRC_DATABASE_FIND_RECORDS looking in datasource sqlite3_zone.sqlite3 for record f.ns.nic.cz./A/IN
2013-01-08 13:34:13.504 DEBUG [b10-auth.datasrc/18468] DATASRC_DATABASE_FIND_NONAME Special cases for no such zone f.ns.nic.cz.
2013-01-08 13:34:14.520 DEBUG [b10-auth.datasrc/18468] TRY_WILDCARD Trying wildcard f.ns.nic.cz.
2013-01-08 13:34:16.583 DEBUG [b10-auth.datasrc/18468] DATASRC_DATABASE_NO_MATCH not match for f.ns.nic.cz./A/IN in sqlite3_zone.sqlite3
2013-01-08 13:34:16.583 DEBUG [b10-auth.datasrc/18468] DATASRC_DATABASE_FIND_RECORDS looking in datasource sqlite3_zone.sqlite3 for record f.ns.nic.cz./AAAA/IN
2013-01-08 13:34:16.584 DEBUG [b10-auth.datasrc/18468] DATASRC_DATABASE_FIND_NONAME Special cases for no such zone f.ns.nic.cz.
2013-01-08 13:34:17.605 DEBUG [b10-auth.datasrc/18468] TRY_WILDCARD Trying wildcard f.ns.nic.cz.
2013-01-08 13:34:19.652 DEBUG [b10-auth.datasrc/18468] DATASRC_DATABASE_NO_MATCH not match for f.ns.nic.cz./AAAA/IN in sqlite3_zone.sqlite3
2013-01-08 13:34:19.653 DEBUG [b10-auth.auth/18468] AUTH_SEND_NORMAL_RESPONSE sending a normal response (297 bytes):
;; ->>HEADER<<- opcode: QUERY, status: NOERROR, id: 64678
;; flags: qr aa rd; QUERY: 1, ANSWER: 1, AUTHORITY: 5, ADDITIONAL: 8

;; OPT PSEUDOSECTION:
; EDNS: version: 0, flags:; udp: 4096

;; QUESTION SECTION:
;cz. IN SOA

;; ANSWER SECTION:
cz. 18000 IN SOA a.ns.nic.cz. hostmaster.nic.cz. 1283265961 900 300 604800 900

;; AUTHORITY SECTION:
cz. 18000 IN NS a.ns.nic.cz.
cz. 18000 IN NS b.ns.nic.cz.
cz. 18000 IN NS c.ns.nic.cz.
cz. 18000 IN NS d.ns.nic.cz.
cz. 18000 IN NS f.ns.nic.cz.

;; ADDITIONAL SECTION:
a.ns.nic.cz. 18000 IN A 194.0.12.1
a.ns.nic.cz. 18000 IN AAAA 2001:678:f::1
b.ns.nic.cz. 18000 IN A 194.0.13.1
b.ns.nic.cz. 18000 IN AAAA 2001:678:10::1
d.ns.nic.cz. 18000 IN A 193.29.206.1
d.ns.nic.cz. 18000 IN AAAA 2001:678:1::1

We may be missing some indexes for matches of this type.
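
For instance (just a guess at this point, not verified against this schema, and
the index name below is made up), a composite index over exactly the columns the
subdomain lookup filters on, zone_id and rname, might help:

% sqlite3 zone.sqlite3
sqlite> CREATE INDEX records_byzone_and_rname ON records (zone_id, rname);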

comment:8 Changed 7 years ago by jwright

  • Defect Severity changed from N/A to Medium
  • Milestone changed from New Tasks to Next-Sprint-Proposed

comment:9 in reply to: ↑ 7 ; follow-up: Changed 7 years ago by jinmei

Replying to vorner:

Where do these come from?

2013-01-08 13:34:07.355 DEBUG [b10-auth.datasrc/18468] DATASRC_DATABASE_FIND_NONAME Special cases for no such zone c.ns.nic.cz.
2013-01-08 13:34:08.395 DEBUG [b10-auth.datasrc/18468] TRY_WILDCARD Trying wildcard c.ns.nic.cz.

I can't find these in the source code.

comment:10 in reply to: ↑ 9 ; follow-up: Changed 7 years ago by vorner

Replying to jinmei:

Where do these come from?

[...]

I can't find these in the source code.

They are not in the code. I just temporarily added them locally, into the
findNoNameResult method (one at the very beginning, the other just before the
findWildcardMatch call in there).

comment:11 in reply to: ↑ 10 ; follow-up: Changed 7 years ago by jinmei

Replying to vorner:

I can't find these in the source code.

They are not in the code. I just temporary added them locally, into the
findNoNameResult method (one at the very beginning, the other before the
findWildcardMatch call in there).

Okay, then I suspect the most likely bottleneck is "hasSubdomains()",
which requires this SQL query:

    // ANY_SUB:
    // This query returns records in the specified zone for the domain
    // matching the passed name, and its sub-domains.
    "SELECT rdtype, ttl, sigtype, rdata "
        "FROM records WHERE zone_id=?1 AND rname LIKE ?2",

So I'd try this manually:

sqlite> SELECT rdtype, ttl, sigtype, rdata FROM records WHERE zone_id=XX AND rname LIKE 'cz.nic.ns.c.%';
sqlite> explain query plan SELECT rdtype, ttl, sigtype, rdata FROM records WHERE zone_id=XX AND rname LIKE 'cz.nic.ns.c.%';

(replace 'XX' with the actual zone ID in the table)

and, if this works fast, add logs before and after sqlite3_step here:

    bool getNext(std::string (&data)[COLUMN_COUNT]) {
        // If there's another row, get it
        // If finalize has been called (e.g. when previous getNext() got
        // SQLITE_DONE), directly return false
        while (statement_ != NULL) {
            rc_ = sqlite3_step(statement_);

to see how much time it takes in the C++ code.
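
(A quicker way to get rough numbers for the SQL part alone, assuming the sqlite3
shell is recent enough to have the .timer dot-command, would be:

sqlite> .timer on
sqlite> SELECT rdtype, ttl, sigtype, rdata FROM records WHERE zone_id=XX AND rname LIKE 'cz.nic.ns.c.%';

which prints timing information after each statement. It won't show how much
additional time is spent in the C++ wrapper, though.)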

comment:12 in reply to: ↑ 11 ; follow-up: Changed 7 years ago by vorner

Hello

Replying to jinmei:

Okay, then I suspect the most likely bottleneck is "hasSubdomains()",
which requires this SQL query:

That seems possible: the query itself takes about a second, and it is also called
multiple times from findWildcardMatch (the other query called from that method is
the usual one for a direct lookup).

So I'd try this manually:

sqlite> SELECT rdtype, ttl, sigtype, rdata FROM records WHERE zone_id=XX AND rname LIKE 'cz.nic.ns.c.%';
sqlite> explain query plan SELECT rdtype, ttl, sigtype, rdata FROM records WHERE zone_id=XX AND rname LIKE 'cz.nic.ns.c.%';

It is indeed slow:

SELECT rdtype, ttl, sigtype, rdata FROM records WHERE zone_id=1 AND rname LIKE 'cz.nic.ns.c.%';
CPU Time: user 0.966666 sys 0.080000

explain query plan SELECT rdtype, ttl, sigtype, rdata FROM records WHERE zone_id=1 AND rname LIKE 'cz.nic.ns.c.%';
0|0|0|SCAN TABLE records (~50000 rows)

It doesn't seem to use the index much (it probably does something, since it
says it would scan ~50k rows, while the table contains 1911305 rows for the zone).

According to the sqlite3 documentation, it should use the index, but I don't know
why it doesn't.
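
One thing that might be worth trying (untested, and assuming rnames only contain
the usual hostname characters, all of which sort below '~') is rewriting the LIKE
prefix match as an explicit range on rname, which should be able to use the
records_byrname index:

sqlite> SELECT rdtype, ttl, sigtype, rdata FROM records WHERE zone_id=1 AND rname >= 'cz.nic.ns.c.' AND rname < 'cz.nic.ns.c.' || '~';
sqlite> explain query plan SELECT rdtype, ttl, sigtype, rdata FROM records WHERE zone_id=1 AND rname >= 'cz.nic.ns.c.' AND rname < 'cz.nic.ns.c.' || '~';

Whether the planner actually picks records_byrname here can be seen from the
explain query plan output.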

Anyway, we don't really need to know whether the result is NXRRSET or NXDOMAIN in
this case, since these are additional records and the answer would be the same
either way. Should we add a flag to disable that kind of subdomain check? We
probably still need it for the general case, though, so we should also make the
query itself faster somehow.

comment:13 in reply to: ↑ 12 Changed 7 years ago by jinmei

Replying to vorner:

It doesn't seem to use the index much (it probably does something, since it
says it would scan ~50k rows, while the table contains 1911305 rows for the zone).

According to the sqlite3 documentation, it should use the index, but I don't know
why it doesn't.

In my experiment it should use an index. I don't know why that doesn't
happen in your setup, but I'd primarily check the schema and the sqlite3
version.

Anyway, we don't really need to know whether the result is NXRRSET or NXDOMAIN in
this case, since these are additional records and the answer would be the same
either way. Should we add a flag to disable that kind of subdomain check? We
probably still need it for the general case, though, so we should also make the
query itself faster somehow.

There could be some cases where we can skip some of the processing,
but I suspect we'll still hit the unacceptable bottleneck while
checking wildcards (which cannot be omitted even for glue records).
So, such a micro optimization wouldn't be a solution for this ticket.

My suggestion is: first figure out why the index doesn't work for you;
if it cannot be resolved by upgrading the schema and/or sqlite3, or
by some operational fix, I think we just have to give up and say
"basically, the sqlite3 data source isn't expected to work for a very
large zone".

comment:14 Changed 7 years ago by vorner

I don't know why the index isn't used. This is what I get (it includes the sqlite3 version and the schema version). The index seems to be there:

$ sqlite3 zone.sqlite3 
SQLite version 3.7.15.2 2013-01-09 11:53:05
Enter ".help" for instructions
Enter SQL statements terminated with a ";"
sqlite> .schema
CREATE TABLE diffs (id INTEGER PRIMARY KEY,
                    zone_id INTEGER NOT NULL,
                    version INTEGER NOT NULL,
                    operation INTEGER NOT NULL,
                    name TEXT NOT NULL COLLATE NOCASE,
                    rrtype TEXT NOT NULL COLLATE NOCASE,
                    ttl INTEGER NOT NULL,
                    rdata TEXT NOT NULL);
CREATE TABLE nsec3 (id INTEGER PRIMARY KEY,
                    zone_id INTEGER NOT NULL,
                    hash TEXT NOT NULL COLLATE NOCASE,
                    owner TEXT NOT NULL COLLATE NOCASE,
                    ttl INTEGER NOT NULL,
                    rdtype TEXT NOT NULL COLLATE NOCASE,
                    rdata TEXT NOT NULL);
CREATE TABLE records (id INTEGER PRIMARY KEY,
                    zone_id INTEGER NOT NULL,
                    name TEXT NOT NULL COLLATE NOCASE,
                    rname TEXT NOT NULL COLLATE NOCASE,
                    ttl INTEGER NOT NULL,
                    rdtype TEXT NOT NULL COLLATE NOCASE,
                    sigtype TEXT COLLATE NOCASE,
                    rdata TEXT NOT NULL);
CREATE TABLE schema_version (version INTEGER NOT NULL,
                    minor INTEGER NOT NULL DEFAULT 0);
CREATE TABLE zones (id INTEGER PRIMARY KEY,
                    name TEXT NOT NULL COLLATE NOCASE,
                    rdclass TEXT NOT NULL COLLATE NOCASE DEFAULT 'IN',
                    dnssec BOOLEAN NOT NULL DEFAULT 0);
CREATE INDEX nsec3_byhash ON nsec3 (hash);
CREATE INDEX nsec3_byhash_and_rdtype ON nsec3 (hash, rdtype);
CREATE INDEX records_byname ON records (name);
CREATE INDEX records_byrname ON records (rname);
CREATE INDEX records_bytype_and_rname ON records
                       (rdtype, rname);
CREATE INDEX zones_byname ON zones (name);
sqlite> select * from schema_version;
2|1

comment:15 Changed 7 years ago by shane

  • Milestone changed from Previous-Sprint-Proposed to Next-Sprint-Proposed

comment:16 Changed 7 years ago by shane

  • Milestone changed from Previous-Sprint-Proposed to Next-Sprint-Proposed

comment:17 Changed 7 years ago by shane

  • Milestone changed from Previous-Sprint-Proposed to Next-Sprint-Proposed

comment:18 Changed 7 years ago by shane

  • Milestone changed from Previous-Sprint-Proposed to Next-Sprint-Proposed

comment:19 Changed 7 years ago by shane

  • Milestone changed from Previous-Sprint-Proposed to Next-Sprint-Proposed

comment:20 Changed 6 years ago by tomek

  • Milestone set to Remaining BIND10 tickets

comment:21 Changed 5 years ago by tomek

  • Resolution set to wontfix
  • Status changed from new to closed
  • Version set to old-bind10

This issue is related to bind10 code that is no longer part of Kea.

If you are interested in the BIND10/Bundy framework or its DNS components,
please check http://bundy-dns.de.

Closing ticket.
