Performance Facts

Here are a few tests showing how Qizx 4.1 performs on large XML databases and very large individual documents.

See also: benchmark

Test platform

All of the following tests were performed on the same commodity PC:

  • processor: Intel i7 860 quad core at 2.8 GHz (only one core is actually used),
  • memory: 8 Gigabytes,
  • hard-disk: SATA 3 Gb/s, 7200 rpm,
  • JVM: Sun Hotspot™ 1.6.0 64-Bit Server,
  • OS: GNU/Linux OpenSuse 11.2 (kernel 2.6.31). Measurements on Windows 7™ are very close.

Query measurements were made with QizxStudio: the indicated query times include counting the number of results and retrieving the first 100 items.
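As an illustration of what such a measurement covers, here is a minimal Python sketch that times a query by counting all of its results and then fetching the first 100 items. It is not how QizxStudio is implemented: `run_query` is a hypothetical callable, standing in for a real database call, that returns a fresh iterable of results on each call.

```python
import itertools
import time


def timed_query(run_query, first_n=100):
    """Time one query: count all results, then fetch the first `first_n` items.

    `run_query` is a hypothetical stand-in for a real database call; it must
    return a fresh iterable of results each time it is called.
    """
    start = time.perf_counter()
    count = sum(1 for _ in run_query())                   # count every result
    head = list(itertools.islice(run_query(), first_n))   # fetch the first items
    elapsed_ms = (time.perf_counter() - start) * 1000.0
    return count, head, elapsed_ms


if __name__ == "__main__":
    # Fake query returning 20,000 integers instead of XML nodes.
    count, head, elapsed_ms = timed_query(lambda: range(20000))
    print(count, len(head), f"{elapsed_ms:.2f} ms")
```

Running the measurement several times and discarding the first run gives figures comparable to the "hot caches" timings quoted further down.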

Medium-size database creation and indexing

This measures a complete load and indexing process, with all indexes (including numeric, date and full-text).

  • Indexing jobs were run with a Java VM using a maximum of 150 Mb of memory.
  • Disabling full-text indexing generally makes the process nearly twice as fast.

  • In all cases, indexing is done in a single transaction.

    Note: this matters only when indexing many small documents. The cost of a commit is not negligible, so committing after each document import would make the last example much slower.

Data                                                              Import and indexing time                  Speed                       Final database size
Aircraft Maintenance Manual (single document, 129 Mb)             23 seconds                                19.7 Gb per hour            62 Mb
XMark size factor 1 (single document, 112 Mb)                     26.2 seconds                              15 Gb per hour              102 Mb
XMark size factor 10 (single document, 1120 Mb)                   265 seconds                               15 Gb per hour              994 Mb
Two million small artificial documents (average size 780 bytes)   22 minutes 40 seconds (one transaction)   1460 documents per second   2.36 Gb
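The speed figures can be rechecked from the sizes and times given for each data set. The short Python sketch below does the arithmetic, assuming that "Mb" and "Gb" mean megabytes and gigabytes with 1 Gb = 1024 Mb (an assumption, not stated in the text). Under that assumption the three single-document speeds match the reported values to within rounding, and the small-documents rate comes out near 1470 per second, slightly above the 1460 reported.

```python
def gb_per_hour(size_mb, seconds):
    """Import speed in Gb per hour, assuming 1 Gb = 1024 Mb."""
    return size_mb / seconds * 3600 / 1024


def docs_per_second(doc_count, seconds):
    """Average number of documents imported per second."""
    return doc_count / seconds


if __name__ == "__main__":
    rows = [
        ("Aircraft Maintenance Manual", 129, 23),
        ("XMark size factor 1", 112, 26.2),
        ("XMark size factor 10", 1120, 265),
    ]
    for name, size_mb, seconds in rows:
        print(f"{name}: {gb_per_hour(size_mb, seconds):.1f} Gb per hour")

    # 22 minutes 40 seconds = 1360 seconds for two million documents
    rate = docs_per_second(2_000_000, 22 * 60 + 40)
    print(f"Small documents: {rate:.0f} documents per second")
```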

Simple queries on two million documents

We generated a database of two million small documents by taking the Cartesian product of six parameters (item, date, color, origin, weight, price), each varying over 10 different values (20 values for the date), hence 2 million documents.

Parameters are present both as attributes and as elements with simple content. Additional data (random text) is also generated to produce documents with an average size of 700 bytes.

Sample document:

<?xml version='1.0' encoding='UTF-8'?>
<Doc id='500'>
<info item='car' date='2004-05-13' color='red' 
      origin='Canada' weight='1000' price='10' />
<item>car</item>
<date>2004-05-13</date>
<color>red</color>
<origin>Canada</origin>
<weight>1000</weight>
<price>10</price>
...random text...
</Doc>
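A generator for documents of this shape could look like the following Python sketch. It is an illustration, not the tool actually used: the value lists are hypothetical (the text only gives their sizes and a few sample values), and the random text is left out.

```python
import itertools
import math
import xml.etree.ElementTree as ET

# Hypothetical value lists: only their sizes (10 values per parameter,
# 20 for the date) come from the description above.
PARAMS = {
    "item":   [f"item{i}" for i in range(10)],
    "date":   [f"2004-05-{d:02d}" for d in range(1, 21)],
    "color":  [f"color{i}" for i in range(10)],
    "origin": [f"origin{i}" for i in range(10)],
    "weight": [str(100 * (i + 1)) for i in range(10)],
    "price":  [str(10 * (i + 1)) for i in range(10)],
}


def total_docs(params=PARAMS):
    """Size of the Cartesian product of all parameter value lists."""
    return math.prod(len(values) for values in params.values())


def generate_docs(params=PARAMS):
    """Lazily yield one serialized <Doc> per combination of parameter values."""
    names = list(params)
    combos = itertools.product(*params.values())
    for doc_id, combo in enumerate(combos, start=1):
        values = dict(zip(names, combo))
        doc = ET.Element("Doc", id=str(doc_id))
        ET.SubElement(doc, "info", values)      # parameters as attributes
        for name in names:                      # same parameters as elements
            ET.SubElement(doc, name).text = values[name]
        yield ET.tostring(doc, encoding="unicode")


if __name__ == "__main__":
    print(total_docs())             # number of documents in the full product
    print(next(generate_docs()))    # first generated document
```

Because the generator is lazy, documents can be handed to an import API one at a time without holding two million of them in memory.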

Creating the Qizx database takes 22 min 40 s on the above hardware (synthetic documents are stored directly through the API, without using an XML parser).

The following queries measure the capacity to perform intersections of large subsets of a database: each particular parameter value selects a set of 200,000 documents.

The following query intersects 4 sets of 200,000 documents each and returns 200 documents. It takes about 11 milliseconds (with hot caches).

count( /Doc[ info/item="car" and 
             info/color = "red" and 
             info/@origin="USA" and
             info/@weight = 100 ] )
= 200

The following query intersects 2 sets of 200,000 documents each and returns 20,000 items. It takes 20 to 40 milliseconds (depending on selection parameters used).

count( /Doc[ .//item="chair" and .//color = "black" ] )
= 20000
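The result counts of the two queries above follow directly from the Cartesian-product construction, as the Python sketch below shows by intersecting sets of document indexes. This only illustrates the set arithmetic and says nothing about how Qizx indexes are implemented; the numbering of documents and values is an assumption made for the example.

```python
import math

# Number of values per parameter, in generation order (from the text above).
RADICES = {"item": 10, "date": 20, "color": 10,
           "origin": 10, "weight": 10, "price": 10}
TOTAL = math.prod(RADICES.values())  # 2,000,000 documents


def stride(name):
    """Number of consecutive document indexes sharing one value of `name`."""
    names = list(RADICES)
    later = names[names.index(name) + 1:]
    return math.prod(RADICES[n] for n in later)


def posting(name, value_index):
    """Set of document indexes whose parameter `name` has the given value."""
    s, r = stride(name), RADICES[name]
    return {i
            for start in range(value_index * s, TOTAL, s * r)
            for i in range(start, start + s)}


def count_matching(**criteria):
    """Count documents matching every (parameter, value index) criterion."""
    sets = [posting(name, v) for name, v in criteria.items()]
    return len(set.intersection(*sets))


if __name__ == "__main__":
    print(len(posting("item", 0)))                                # one value
    print(count_matching(item=0, color=1, origin=2, weight=3))    # four sets
    print(count_matching(item=0, color=1))                        # two sets
```

Each non-date value selects one tenth of the documents, so fixing four such parameters leaves 2,000,000 / 10^4 documents and fixing two leaves 2,000,000 / 10^2.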

Note that actually retrieving data from documents takes more time: in this example, about 0.5 milliseconds per document for the first fetch (then much less thanks to caches).

An RDBMS such as MySQL, with a fully indexed table containing the same data (2 million rows), takes significantly more time to complete the equivalent queries.

Database with 100 million documents

We created and stored in Qizx 100 million documents of 3.3 Kilobytes each, representing financial transactions between 2 parties in FpML (Financial Product Markup Language).

  • The database was created in 20 batches of 5 million documents each (one transaction for each batch).
  • Storing and indexing took about 23.5 hours (full-text indexing was disabled).
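
The batching scheme described above (one commit per batch rather than per document) can be sketched in Python as follows. The `store` and `commit` callables are hypothetical placeholders for whatever the database API provides; they are not Qizx functions.

```python
def import_in_batches(docs, store, commit, batch_size=5_000_000):
    """Store every document, committing once per full batch.

    `store` and `commit` are hypothetical callbacks standing in for a real
    database API. Returns the number of commits performed.
    """
    pending = 0
    commits = 0
    for doc in docs:
        store(doc)
        pending += 1
        if pending == batch_size:
            commit()
            commits += 1
            pending = 0
    if pending:          # commit the last, partially filled batch
        commit()
        commits += 1
    return commits


if __name__ == "__main__":
    stored = []
    # Scaled-down run: 100 documents in batches of 5 gives 20 commits,
    # the same ratio as 100 million documents in batches of 5 million.
    print(import_in_batches(range(100), stored.append, lambda: None, batch_size=5))
```

Documents are streamed rather than collected into a list, so memory use does not grow with the batch size.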

Queries, with number of results and time in milliseconds:

All transactions involving party 10151:

/FpML[party/partyId = "10151"]
100000 items, 400 ms to 3500 ms

All transactions between party 10010 and party 10000:

/FpML[party/partyId = "10010" 
      and party/partyId = "10000"]
4400 items, 1510 ms

Full-text search on English Wikipedia

English Wikipedia represents some 9.5 million pages and 38 Gb of XML.

  • We have converted each Wikipedia page into an XML document using an ad hoc schema.

    Note: the WikiMedia format is fairly complex; we have only represented the overall structure and the links.
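
A conversion of this kind could be sketched in Python as below. The ad hoc schema itself is not given here, so the output element names (`page`, `title`, `link`, `body`) are hypothetical, and the input is assumed to follow the usual export layout with `page`, `title` and `text` elements. Only plain `[[Target]]` and `[[Target|label]]` links are recognized.

```python
import io
import re
import xml.etree.ElementTree as ET

# Captures the target of [[Target]] or [[Target|label]] (assumed link syntax).
LINK_RE = re.compile(r"\[\[([^\]|#]+)")


def local(tag):
    """Strip any '{namespace}' prefix from an element tag."""
    return tag.rsplit("}", 1)[-1]


def pages_to_docs(dump):
    """Yield one small XML document (as a string) per <page> of the dump."""
    for _event, elem in ET.iterparse(dump, events=("end",)):
        if local(elem.tag) != "page":
            continue
        title, text = "", ""
        for child in elem.iter():
            if local(child.tag) == "title":
                title = child.text or ""
            elif local(child.tag) == "text":
                text = child.text or ""
        page = ET.Element("page")                  # hypothetical output schema
        ET.SubElement(page, "title").text = title
        for target in LINK_RE.findall(text):
            ET.SubElement(page, "link").text = target.strip()
        ET.SubElement(page, "body").text = text
        yield ET.tostring(page, encoding="unicode")
        elem.clear()  # release the parsed page; the full dump is very large


if __name__ == "__main__":
    sample = io.BytesIO(
        b"<mediawiki><page><title>XML</title><revision>"
        b"<text>See [[Markup language|markup]] and [[SGML]].</text>"
        b"</revision></page></mediawiki>"
    )
    for doc in pages_to_docs(sample):
        print(doc)
```

Streaming with `iterparse` keeps memory use bounded, which matters for an input of this size.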

Queries, with number of results and time in milliseconds:

All pages containing the word "XML":

//page[. contains text "XML" ]
15405640 ms

All pages containing the words "XML" and "database":

//page[. contains text "XML database" all words ]
2327950 ms

All pages containing the phrase "XML database":

//page[. contains text "XML database" ]
801050 ms

Very Large Document

This test used the "XL" edition, which has a theoretical limit of approximately 1 Terabyte for a single document, instead of 2 Gb in the standard version.

  • A 43 Gigabyte document, containing a GML representation of the town of Ettenheim in Germany, was stored and indexed (without full-text). This took 1 hour 23 minutes.

  • The document contains 349,540,929 elements and 138,656,540 attributes.

Queries, with number of results and time in milliseconds:

Number of polygons:

//*:Polygon
60340764195 ms

Polygons that have an interior (i.e. a hole):

declare namespace gml="http://www.opengis.net/gml";

//gml:Polygon[gml:interior]
10415686325 ms