
Benchmarking Qizx

Since querying speed is one of Qizx's distinctive characteristics, we think it fair to provide concrete execution measurements on realistically sized examples. In the following:

  • Qizx is compared with two open-source products on the XMark benchmark.
  • An example of multi-criteria search is given on a large database of small documents.
  • Measurements of document import and indexing operations are also included.

Please remember that, as with any benchmark, the figures given here depend heavily on test conditions (detailed below). They may therefore vary from one platform to another, and are given for information purposes only.

Test platform

All the following tests were performed on the same commodity PC:

  • processor: Dual core Pentium at 2.8 GHz,
  • memory: 1 Gigabyte,
  • hard-disk: SATA 7200 rpm,
  • JVM: Sun Hotspot™ 1.6.0 used in optimized ("server") mode,
  • OS: GNU/Linux Ubuntu 7.0 (kernel 2.6.20). Note: measurements on Windows XP™ are nearly identical.

Database creation and indexing

These figures cover a complete load and indexing process with all indexes enabled (including numeric, date and full-text indexes).

  • Indexing jobs were run in a Java VM limited to a maximum of 256 MB of memory.
  • Disabling full-text indexing generally makes the process nearly twice as fast.

  • In all cases, indexing is performed in a single transaction.

    Note: this matters only when indexing many small documents. The cost of a commit is not negligible, so committing after each document import would make the last example much slower.
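The cost argument can be sketched in Java. This is a minimal illustration under stated assumptions: the `DocumentStore` interface and its `importDocument`/`commit` methods are hypothetical and are not the Qizx API, and the `/mega/docN.xml` paths are made up. The counting test double simply shows that committing per document pays the fixed commit cost once per document instead of once per batch.

```java
import java.util.List;

/** Sketch: one transaction for a whole batch vs. one commit per document. */
public class BatchImport {

    /** Hypothetical storage interface (NOT the actual Qizx API). */
    interface DocumentStore {
        void importDocument(String path, String xml);
        void commit();
    }

    /** Imports all documents, then commits once (as in the tests above). */
    static void importInOneTransaction(DocumentStore store, List<String> docs) {
        for (int i = 0; i < docs.size(); i++) {
            store.importDocument("/mega/doc" + i + ".xml", docs.get(i)); // hypothetical path
        }
        store.commit(); // the fixed commit cost is paid once
    }

    /** Commits after every document: the fixed commit cost is paid N times. */
    static void importWithCommitPerDocument(DocumentStore store, List<String> docs) {
        for (int i = 0; i < docs.size(); i++) {
            store.importDocument("/mega/doc" + i + ".xml", docs.get(i));
            store.commit();
        }
    }

    /** Test double that only counts calls. */
    static class CountingStore implements DocumentStore {
        int imports, commits;
        public void importDocument(String path, String xml) { imports++; }
        public void commit() { commits++; }
    }

    public static void main(String[] args) {
        List<String> docs = List.of("<doc id='1'/>", "<doc id='2'/>", "<doc id='3'/>");
        CountingStore single = new CountingStore();
        importInOneTransaction(single, docs);
        CountingStore perDoc = new CountingStore();
        importWithCommitPerDocument(perDoc, docs);
        System.out.println(single.commits + " vs " + perDoc.commits); // prints "1 vs 3"
    }
}
```

With two million documents, the per-document variant would pay the commit overhead two million times.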

| Database | Import and indexing time | Speed | Final database size |
| --- | --- | --- | --- |
| Aircraft Maintenance Manual (single document, 128 MB) | 74 seconds | 6.2 GB per hour | 60 MB |
| XMark size factor 1 (single document, 112 MB) | 79 seconds | 5.1 GB per hour | 101 MB |
| XMark size factor 18 (single document, 2 GB) | 30.5 minutes | 3.9 GB per hour | 1.8 GB |
| Two million small artificial documents (average size 700 bytes) | 66 minutes (one transaction) | 505 documents per second | 2.3 GB |
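The Speed column can be recomputed from the size and time columns. The Java sketch below assumes decimal units (1 GB = 1000 MB, with the 2 GB document taken as 2000 MB); this is an inference, but it reproduces all three GB-per-hour figures. The values in the comments are what `printf` prints for these inputs.

```java
/** Recomputes the "Speed" column from the size and time columns. */
public class IndexingThroughput {

    /** Throughput in decimal gigabytes per hour (assumes 1 GB = 1000 MB). */
    static double gbPerHour(double sizeMb, double seconds) {
        return sizeMb / seconds * 3600.0 / 1000.0;
    }

    /** Throughput in documents per second. */
    static double docsPerSecond(long documents, double seconds) {
        return documents / seconds;
    }

    public static void main(String[] args) {
        System.out.printf("Aircraft manual: %.1f GB/h%n", gbPerHour(128, 74));           // 6.2
        System.out.printf("XMark factor 1:  %.1f GB/h%n", gbPerHour(112, 79));           // 5.1
        System.out.printf("XMark factor 18: %.1f GB/h%n", gbPerHour(2000, 30.5 * 60));   // 3.9
        System.out.printf("2M documents:    %.0f docs/s%n",
                docsPerSecond(2_000_000, 66 * 60));                                      // 505
    }
}
```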

Simple queries on two million documents

We generated a database of two million small documents from the Cartesian product of six parameters (item, date, color, origin, weight and price), each taking 10 different values (20 for the date), hence 10⁵ × 20 = 2,000,000 documents. Parameters are present both as attributes and as elements with simple content. Additional data (random text) is also generated to bring the average document size to 700 bytes. The test has also been performed on a similar 4-million-document database.
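Because the database is a complete Cartesian product, the size of any selection follows directly from the number of values per parameter: fixing one value divides the document count by that parameter's number of values. The Java sketch below (class and method names are illustrative) reproduces the 2 million total and the 200,000, 20,000 and 200 document counts used in the queries. Note that a single date value selects only 100,000 documents, since the date takes 20 values.

```java
/** Selectivity arithmetic for the synthetic Cartesian-product database. */
public class CartesianSelectivity {

    // Number of distinct values per parameter, in this order:
    // item, date, color, origin, weight, price (20 values for the date).
    static final int[] VALUES = {10, 20, 10, 10, 10, 10};

    /** Total number of documents = product of the value counts. */
    static long totalDocuments() {
        long total = 1;
        for (int v : VALUES) total *= v;
        return total;
    }

    /**
     * Documents matching one fixed value for each given parameter index:
     * every combination of the remaining parameters is still present.
     */
    static long matching(int... fixedParams) {
        long n = totalDocuments();
        for (int p : fixedParams) n /= VALUES[p];
        return n;
    }

    public static void main(String[] args) {
        final int ITEM = 0, COLOR = 2, ORIGIN = 3, WEIGHT = 4;
        System.out.println(totalDocuments());                      // 2000000
        System.out.println(matching(ITEM));                        // 200000
        System.out.println(matching(ITEM, COLOR));                 // 20000
        System.out.println(matching(ITEM, COLOR, ORIGIN, WEIGHT)); // 200
    }
}
```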

Sample document:

<?xml version='1.0' encoding='UTF-8'?>
<doc id='500'>
<info item='car' date='2004-05-13' color='red' 
      origin='Canada' weight='1000' price='10' />
<item>car</item>
<date>2004-05-13</date>
<color>red</color>
<origin>Canada</origin>
<weight>1000</weight>
<price>10</price>
...random text...
</doc>

The following queries measure the ability to intersect large subsets of the database: each value of the item, color, origin, weight or price parameter selects a set of 200,000 documents.

The following query intersects 4 sets of 200,000 documents each and returns 200 documents. It takes about 15 milliseconds (with hot caches).

count( collection("/mega")//doc[ item="car" and 
                                 color = "red" and 
                                 @origin="Italy" and
                                 weight = 1000 ] )
(: result: 200 :)

As a reference point, an RDBMS such as MySQL, with a fully indexed table containing the same data (2 million rows), takes about 500 ms to complete the equivalent query.
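A common way for an index to answer such a conjunctive query is to keep, for each value, a sorted list of matching document IDs and to intersect these lists, starting with the shortest. The Java sketch below shows that generic technique only; it makes no claim about Qizx's internal index structures, and the small ID lists are made up.

```java
import java.util.Arrays;

/** Generic sketch: intersecting sorted document-ID lists (posting lists). */
public class PostingIntersection {

    /** Linear merge of two sorted, duplicate-free ID arrays. */
    static int[] intersect(int[] a, int[] b) {
        int[] out = new int[Math.min(a.length, b.length)];
        int i = 0, j = 0, n = 0;
        while (i < a.length && j < b.length) {
            if (a[i] < b[j]) i++;
            else if (a[i] > b[j]) j++;
            else { out[n++] = a[i]; i++; j++; }
        }
        return Arrays.copyOf(out, n);
    }

    /** Intersects one or more lists, shortest first, stopping early when empty. */
    static int[] intersectAll(int[]... lists) {
        int[][] sorted = lists.clone();
        Arrays.sort(sorted, (x, y) -> Integer.compare(x.length, y.length));
        int[] result = sorted[0];
        for (int k = 1; k < sorted.length && result.length > 0; k++) {
            result = intersect(result, sorted[k]);
        }
        return result;
    }

    public static void main(String[] args) {
        // Hypothetical posting lists for item="car", color="red", @origin="Italy".
        int[] car   = {1, 4, 7, 9, 12, 15};
        int[] red   = {4, 5, 9, 12, 20};
        int[] italy = {2, 4, 12, 15, 20};
        System.out.println(Arrays.toString(intersectAll(car, red, italy))); // [4, 12]
    }
}
```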

The following query intersects 2 sets of 200,000 documents each and returns 20,000 items. It takes 150 to 300 milliseconds, depending on the selection parameters used.

count( collection("/mega")//doc[ item="chair" and color = "black" ] )
(: result: 20000 :)

Note that actually retrieving data from the documents takes more time: in this example, from 1 to 3 milliseconds per document on the first fetch (then much less, thanks to caches).

XMark

XMark is an XML database benchmark designed by CWI Amsterdam and other organizations. It consists of 20 queries on a single document, whose reference size is 112 MB but can be scaled at will.

Note: another comparison of several products on XMark is available on the Monet DB website, but it does not include Qizx and was performed on a different platform.

XMark favors massive scans and joins, and puts little emphasis on value-based queries (unlike the example in the previous section).
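The difference between the two kinds of queries can be sketched in Java: a value-based query such as XMark Q01 (find the person whose id attribute is `person0`) can be answered from an index mapping attribute values to elements, whereas a scan examines every element. Here a plain `HashMap` stands in for a value index; the `Person` class and the generated data are hypothetical and do not reflect how Qizx actually stores its indexes.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/** Sketch: full scan vs. value-index lookup for a unique attribute value. */
public class ValueIndexLookup {

    /** Hypothetical element with a unique "id" attribute. */
    static final class Person {
        final String id, name;
        Person(String id, String name) { this.id = id; this.name = name; }
    }

    /** Full scan: examines elements one by one, O(n) per lookup. */
    static Person scan(List<Person> people, String id) {
        for (Person p : people) {
            if (p.id.equals(id)) return p;
        }
        return null;
    }

    /** Builds a value index once (e.g. at import time): attribute value -> element. */
    static Map<String, Person> buildIndex(List<Person> people) {
        Map<String, Person> index = new HashMap<>();
        for (Person p : people) index.put(p.id, p);
        return index;
    }

    public static void main(String[] args) {
        List<Person> people = new ArrayList<>();
        for (int i = 0; i < 100_000; i++) people.add(new Person("person" + i, "Name " + i));
        Map<String, Person> index = buildIndex(people);

        // Both strategies find the same element; the index avoids the scan.
        System.out.println(scan(people, "person0").name); // Name 0
        System.out.println(index.get("person0").name);    // Name 0
    }
}
```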

The following test compares Qizx with Monet DB XQuery 4.26, the XQuery database also developed by CWI. Monet DB is written in C and has so far been the fastest XQuery database engine available. We also included eXist, a widely used open-source native XML database written in Java.

These three products share the characteristic that queries can be executed "out of the box", without the need to define indexes manually.

For legal reasons, we do not include tests performed on commercial products (some vendors explicitly prohibit it), but that would not have made a difference anyway. We did not make comprehensive measurements on open-source products that require indexes to be defined explicitly, such as Sedna or Berkeley DB-XML.

Test conditions: each query is executed 2 times and the average execution time is computed.

These measurements are made after an initial "warm-up" phase (executing all queries twice). This is necessary because of Java dynamic class loading and compilation, and implies that caches are not empty. Such conditions are deemed realistic in a server environment.
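The measurement protocol can be sketched as a small Java harness. It is a simplification under stated assumptions: here each query is warmed up individually, whereas the tests above warm up all queries as one phase, and the summing lambda is a hypothetical stand-in for a real query. For rigorous JVM micro-benchmarks, a dedicated harness such as JMH also guards against effects like dead-code elimination.

```java
import java.util.function.Supplier;

/** Sketch of the measurement protocol: warm-up runs, then averaged timed runs. */
public class QueryTimer {

    /**
     * Runs the query warmupRuns times untimed (class loading, JIT, caches),
     * then measuredRuns times timed, and returns the mean time in milliseconds.
     */
    static <T> double averageMillis(Supplier<T> query, int warmupRuns, int measuredRuns) {
        for (int i = 0; i < warmupRuns; i++) {
            query.get(); // result discarded; only the warm-up side effects matter
        }
        long totalNanos = 0;
        for (int i = 0; i < measuredRuns; i++) {
            long start = System.nanoTime();
            query.get();
            totalNanos += System.nanoTime() - start;
        }
        return totalNanos / (double) measuredRuns / 1_000_000.0;
    }

    public static void main(String[] args) {
        // Hypothetical stand-in for a query: sums a range of numbers.
        Supplier<Long> fakeQuery = () -> {
            long sum = 0;
            for (int i = 0; i < 1_000_000; i++) sum += i;
            return sum;
        };
        // 2 warm-up runs, then 2 measured runs averaged, as in the protocol above.
        System.out.printf("average: %.3f ms%n", averageMillis(fakeQuery, 2, 2));
    }
}
```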

XMark queries were modified to simply count the result items instead of serializing them as output, but we made sure that every system actually materializes the results (we did not rely on the count() function, which some engines optimize specially).
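Why counting by iteration matters can be modeled in a few lines of Java: an engine may answer `count()` from known metadata without building any item, whereas iterating over the result forces each item to be produced. The `LazyResult` class below is hypothetical and only models this idea; it is not how any of the tested engines represents results.

```java
import java.util.Iterator;
import java.util.NoSuchElementException;

/** Sketch: a count obtained by iteration forces each result item to be built. */
public class MaterializingCount {

    /** Hypothetical lazy query result that builds items only when iterated. */
    static final class LazyResult implements Iterable<String> {
        final int size;
        int materialized = 0; // how many items were actually built

        LazyResult(int size) { this.size = size; }

        /** Shortcut an engine might use for count(): no item is built. */
        int knownSize() { return size; }

        public Iterator<String> iterator() {
            return new Iterator<String>() {
                int next = 0;
                public boolean hasNext() { return next < size; }
                public String next() {
                    if (!hasNext()) throw new NoSuchElementException();
                    materialized++;
                    return "<item n='" + (next++) + "'/>"; // build the item
                }
            };
        }
    }

    /** Counts items one by one, so every item is materialized. */
    static int countByIteration(Iterable<String> items) {
        int n = 0;
        for (String ignored : items) n++;
        return n;
    }

    public static void main(String[] args) {
        LazyResult shortcut = new LazyResult(1000);
        System.out.println(shortcut.knownSize() + " items, " + shortcut.materialized + " built");
        LazyResult iterated = new LazyResult(1000);
        System.out.println(countByIteration(iterated) + " items, " + iterated.materialized + " built");
    }
}
```

The first line reports 1000 items with none built; the second reports 1000 items with all 1000 built.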

  • All times are in milliseconds.
  • In any case, please remember that measured times in Qizx depend heavily on caching effects: the first execution of a query can take much longer than subsequent executions, because the system needs to fetch data from disk.
  • Unlike Qizx, Qizx/open is not a database engine: it has to parse and build the 112 MB document in memory.
  • DNF means "did not finish within 15 minutes". AL means "terminated, but abnormally long".

 

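Comparison of Qizx 3.0 with several open-source databases on XMark 112 MB (times in milliseconds)

| Query | Qizx | Qizx/open | Monet DB XML 4.26 | eXist 1.2.5 new core |
| --- | --- | --- | --- | --- |
| Q01 | 1 | 20 | 105 | 450 |
| Q02 | 230 | 34 | 170 | 1230 |
| Q03 | 235 | 95 | 350 | 3280 |
| Q04 | 260 | 92 | 280 | AL |
| Q05 | 60 | 25 | 130 | 480 |
| Q06 | 7 | 190 | 55 | 40 |
| Q07 | 24 | 440 | 90 | 180 |
| Q08 | 450 | 175 | 490 | DNF |
| Q09 | 810 | 310 | 660 | DNF |
| Q10 | 2080 | 1180 | 2050 | DNF |
| Q11 | 1170 | 820 | 4170 | DNF |
| Q12 | 880 | 770 | AL | DNF |
| Q13 | 155 | 41 | 160 | 290 |
| Q14 | 1260 | 440 | 1305 | 1650 |
| Q15 | 285 | 20 | 175 | 260 |
| Q16 | 290 | 35 | 200 | 1450 |
| Q17 | 240 | 75 | 250 | 1520 |
| Q18 | 440 | 56 | 70 | 495 |
| Q19 | 1050 | 270 | 380 | 2900 |
| Q20 | 190 | 120 | 390 | 1950 |
| Average | 506 | 260 | 604 | ≫ 1100 |
| Geometric average [2] | 213 | 125 | 289 | ≫ 1000 |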

Import time (seconds)49 (without full-text)824150 (with full-text)

Remarks:

  1. Qizx/open is faster than Qizx on quite a few queries, because it works in memory, whereas Qizx incurs an overhead for reading from persistent storage. But when value indexes are actually used (Q01, Q06, Q07), Qizx is significantly faster.
  2. The geometric average is more meaningful when aggregating both short and long queries, because it is not dominated by the few longest ones.
To conclude, Qizx is overall slightly faster than Monet DB on XMark (using the geometric average and ignoring Q12, on which Monet DB is abnormally slow). The most visible and significant difference is on value-based queries (here Q01: lookup of an element by a unique attribute value), which are rare in XMark yet very common in practice.
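The two summary rows of the XMark table can be recomputed from the Qizx column. The Java sketch below computes the geometric mean through logarithms, a standard way to avoid overflow when many values are multiplied; the values in the comments match the table (506 and 213).

```java
/** Recomputes the "Average" and "Geometric average" rows for the Qizx column. */
public class XMarkAverages {

    // Qizx times for Q01..Q20 in milliseconds, copied from the XMark table.
    static final double[] QIZX_MS = {
        1, 230, 235, 260, 60, 7, 24, 450, 810, 2080,
        1170, 880, 155, 1260, 285, 290, 240, 440, 1050, 190
    };

    static double arithmeticMean(double[] xs) {
        double sum = 0;
        for (double x : xs) sum += x;
        return sum / xs.length;
    }

    /** Geometric mean via logarithms, which avoids overflow for long series. */
    static double geometricMean(double[] xs) {
        double logSum = 0;
        for (double x : xs) logSum += Math.log(x);
        return Math.exp(logSum / xs.length);
    }

    public static void main(String[] args) {
        System.out.printf("arithmetic: %.0f ms%n", arithmeticMean(QIZX_MS)); // 506
        System.out.printf("geometric:  %.0f ms%n", geometricMean(QIZX_MS));  // 213
    }
}
```

Q10 alone (2080 ms) accounts for about 20% of the 10,117 ms total behind the arithmetic average, which illustrates why the geometric average, where each query weighs equally in relative terms, is preferred when short and long queries are mixed.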