Performance Facts

Here are a few tests showing how Qizx 4.1 performs on large XML databases and on very large individual documents.
| Data | Import and indexing time | Speed | Final database size |
|---|---|---|---|
| Aircraft Maintenance Manual (single document, 129 MB) | 23 seconds | 19.7 GB per hour | 62 MB |
| XMark, size factor 1 (single document, 112 MB) | 26.2 seconds | 15 GB per hour | 102 MB |
| XMark, size factor 10 (single document, 1,120 MB) | 265 seconds | 15 GB per hour | 994 MB |
| Two million small artificial documents (average size 780 bytes) | 22 minutes 40 seconds (one transaction) | 1,460 documents per second | 2.36 GB |
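The Speed column follows from the size and time columns. A minimal check in Python, assuming "MB" and "GB" are binary units (1 GB = 1024 MB), which is the reading that reproduces the listed figures:

```python
# Recompute the "Speed" column of the table above from size and time.
# Assumption: 1 GB = 1024 MB (this reproduces the published figures).


def gb_per_hour(size_mb: float, seconds: float) -> float:
    """Import throughput in GB per hour."""
    return size_mb / seconds * 3600 / 1024


def docs_per_second(n_docs: int, seconds: float) -> float:
    """Import throughput in documents per second."""
    return n_docs / seconds


rows = {
    "Aircraft Maintenance Manual": (129, 23),
    "XMark size factor 1": (112, 26.2),
    "XMark size factor 10": (1120, 265),
}

for name, (size_mb, seconds) in rows.items():
    print(f"{name}: {gb_per_hour(size_mb, seconds):.1f} GB per hour")

# 22 min 40 s = 1360 s for the two-million-document run.
print(f"Small documents: {docs_per_second(2_000_000, 22 * 60 + 40):.0f} documents per second")
```

This gives 19.7, 15.0 and 14.9 GB per hour, matching the table after rounding, and about 1,470 documents per second for the last row.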
We generated a database of two million small documents by taking the Cartesian product of six parameters (item, date, color, origin, weight and price), each varying over 10 different values (20 for the date): 10 × 20 × 10 × 10 × 10 × 10 = 2,000,000 documents.
Parameters are present both as attributes and as elements with simple content. Additional data (random text) is also generated to produce documents with an average size of 700 bytes.
Sample document:
```xml
<?xml version='1.0' encoding='UTF-8'?>
<Doc id='500'>
  <info item='car' date='2004-05-13' color='red'
        origin='Canada' weight='1000' price='10' />
  <item>car</item>
  <date>2004-05-13</date>
  <color>red</color>
  <origin>Canada</origin>
  <weight>1000</weight>
  <price>10</price>
  ...random text...
</Doc>
```

Creating the Qizx database takes 22 min 40 s on the above hardware (the synthetic documents are stored directly through the API, without using an XML parser).
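The generation scheme can be sketched in Python with `itertools.product` and `xml.etree.ElementTree`. Only a few parameter values appear in the text ("car", "chair", "red", "black", "Canada", "USA", 100, 1000, 10); the other values, the date range and the filler-text generator below are hypothetical placeholders. The actual benchmark stored documents through the Qizx API rather than building XML this way.

```python
import itertools
import math
import random
import string
import xml.etree.ElementTree as ET

# Hypothetical value lists: 10 values per parameter, 20 for the date.
PARAMETERS = {
    "item": ["car", "chair"] + [f"item{i}" for i in range(2, 10)],
    "date": [f"2004-05-{day:02d}" for day in range(1, 21)],
    "color": ["red", "black"] + [f"color{i}" for i in range(2, 10)],
    "origin": ["Canada", "USA"] + [f"origin{i}" for i in range(2, 10)],
    "weight": [str(100 * (i + 1)) for i in range(10)],  # 100 ... 1000
    "price": [str(10 * (i + 1)) for i in range(10)],    # 10 ... 100
}

# Size of the Cartesian product: 10**5 * 20 = 2,000,000 documents.
TOTAL_DOCS = math.prod(len(values) for values in PARAMETERS.values())


def make_doc(doc_id, values, rng):
    """Build one <Doc>: parameters as <info> attributes and as simple elements."""
    doc = ET.Element("Doc", id=str(doc_id))
    ET.SubElement(doc, "info", dict(zip(PARAMETERS, values)))
    for name, value in zip(PARAMETERS, values):
        ET.SubElement(doc, name).text = value
    # Hypothetical stand-in for the random padding text, placed after <price>.
    words = ("".join(rng.choices(string.ascii_lowercase, k=6)) for _ in range(60))
    doc[-1].tail = " ".join(words)
    return doc


def generate_docs(seed=0):
    """Lazily yield one document per combination of parameter values."""
    rng = random.Random(seed)
    combos = itertools.product(*PARAMETERS.values())
    for doc_id, values in enumerate(combos):
        yield make_doc(doc_id, values, rng)


first = next(generate_docs())
print(TOTAL_DOCS)
print(ET.tostring(first, encoding="unicode")[:120])
```

Because `generate_docs` is a generator, the two million documents are never held in memory at once. Each item, color, origin, weight or price value appears in exactly `TOTAL_DOCS // 10` = 200,000 of them.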
The following queries measure the ability to intersect large subsets of the database: each parameter value selects a set of 200,000 documents (100,000 for each of the 20 date values).
The following query intersects 4 sets of 200,000 documents each and returns 200 documents. It takes about 11 milliseconds (with hot caches).
```xquery
count( /Doc[ info/@item = "car" and
             info/@color = "red" and
             info/@origin = "USA" and
             info/@weight = 100 ] )
= 200
```

The following query intersects 2 sets of 200,000 documents each and returns 20,000 documents. It takes 20 to 40 milliseconds (depending on the selection parameters used).
```xquery
count( /Doc[ .//item = "chair" and .//color = "black" ] ) = 20000
```
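Both queries reduce to intersecting sets of document identifiers, one set per parameter value. The sketch below shows one common way to do this, merging sorted posting lists starting from the shortest; it is a generic illustration, not a description of Qizx's internal indexes. It assumes, hypothetically, that document IDs follow the Cartesian-product order of the generated data (price varying fastest, item slowest), so each posting list can be computed arithmetically instead of being read from an index.

```python
import math

# Number of values per parameter, in generation order:
# item, date, color, origin, weight, price.
RADICES = [10, 20, 10, 10, 10, 10]
N_DOCS = math.prod(RADICES)  # 2,000,000
ITEM, DATE, COLOR, ORIGIN, WEIGHT, PRICE = range(6)


def posting_list(position, value):
    """Sorted IDs of the documents whose parameter at `position` has index `value`."""
    stride = math.prod(RADICES[position + 1:])  # length of a run of equal values
    period = stride * RADICES[position]         # distance between two such runs
    ids = []
    for start in range(value * stride, N_DOCS, period):
        ids.extend(range(start, start + stride))
    return ids


def intersect_sorted(a, b):
    """Merge-style intersection of two ascending ID lists."""
    i = j = 0
    out = []
    while i < len(a) and j < len(b):
        if a[i] == b[j]:
            out.append(a[i])
            i += 1
            j += 1
        elif a[i] < b[j]:
            i += 1
        else:
            j += 1
    return out


def intersect_all(lists):
    """Intersect several posting lists, shortest first, stopping early if empty."""
    ordered = sorted(lists, key=len)
    result = ordered[0]
    for other in ordered[1:]:
        if not result:
            break
        result = intersect_sorted(result, other)
    return result


# Four parameter values of 200,000 documents each, as in the first query.
four = intersect_all([posting_list(p, 1) for p in (ITEM, COLOR, ORIGIN, WEIGHT)])
# Two parameter values, as in the second query.
two = intersect_all([posting_list(ITEM, 0), posting_list(COLOR, 3)])
print(len(four), len(two))
```

It prints 200 and 20000, the result counts reported for the two queries. Starting from the shortest list keeps the intermediate results small when selectivities differ.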
Note that actually retrieving data from the documents takes more time: in this example, about 0.5 milliseconds per document on the first fetch (much less afterwards, thanks to caching).
An RDBMS such as MySQL, with a fully indexed table containing the same data (2 million rows), takes significantly more time to complete the equivalent queries.
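For reference, the relational equivalent is a single table with one indexed column per parameter. The sketch below uses Python's built-in `sqlite3` module as a stand-in for MySQL, with a scaled-down, hypothetical set of values so that it runs instantly; it shows the shape of the equivalent SQL, not the benchmark itself.

```python
import itertools
import sqlite3

COLUMNS = ("item", "date", "color", "origin", "weight", "price")

# Hypothetical, scaled-down value lists (the benchmark uses 10 or 20 values each).
VALUES = (
    ("car", "chair", "bike"),
    ("2004-05-13", "2004-05-14"),
    ("red", "black", "blue"),
    ("USA", "Canada", "France"),
    (100, 1000, 500),
    (10, 20),
)


def build_table(conn):
    """Create one row per combination of values, then index every parameter column."""
    conn.execute(
        "CREATE TABLE doc (id INTEGER PRIMARY KEY, item TEXT, date TEXT, "
        "color TEXT, origin TEXT, weight INTEGER, price INTEGER)"
    )
    rows = ((i, *combo) for i, combo in enumerate(itertools.product(*VALUES)))
    conn.executemany("INSERT INTO doc VALUES (?, ?, ?, ?, ?, ?, ?)", rows)
    for col in COLUMNS:  # column names come from a fixed tuple, not user input
        conn.execute(f"CREATE INDEX idx_{col} ON doc ({col})")


conn = sqlite3.connect(":memory:")
build_table(conn)

# Equivalent of the four-way XQuery predicate.
(four,) = conn.execute(
    "SELECT COUNT(*) FROM doc WHERE item = ? AND color = ? AND origin = ? AND weight = ?",
    ("car", "red", "USA", 100),
).fetchone()
# Equivalent of the two-way predicate.
(two,) = conn.execute(
    "SELECT COUNT(*) FROM doc WHERE item = ? AND color = ?", ("chair", "black")
).fetchone()
print(four, two)
```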
We created and stored in Qizx 100 million FpML (Financial products Markup Language) documents of 3.3 KB each, representing financial transactions between two parties.
| Query | Number of results | Time |
|---|---|---|
| All transactions involving party 10151: `/FpML[party/partyId = "10151"]` | 100,000 | 400 ms to 3,500 ms |
| All transactions between party 10010 and party 10000: `/FpML[party/partyId = "10010" and party/partyId = "10000"]` | 4,400 | 1,510 ms |
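In XPath, the general comparison `=` is existential: `party/partyId = "10010"` is true if *any* `party/partyId` of the document equals "10010". That is why the second query, which tests two different values against the same path, selects the transactions in which one party is 10010 and the other is 10000. A sketch of that semantics with Python's `xml.etree.ElementTree`, on simplified, hypothetical FpML-like fragments (real FpML documents use a namespace and much more structure):

```python
import xml.etree.ElementTree as ET

# Simplified, hypothetical transactions: each one has two <party> elements.
DOCS = [
    "<FpML><party><partyId>10010</partyId></party>"
    "<party><partyId>10000</partyId></party></FpML>",
    "<FpML><party><partyId>10010</partyId></party>"
    "<party><partyId>10151</partyId></party></FpML>",
    "<FpML><party><partyId>10151</partyId></party>"
    "<party><partyId>10000</partyId></party></FpML>",
]


def party_ids(doc):
    """All values reached by the path party/partyId."""
    return {p.text for p in doc.findall("party/partyId")}


def general_eq(doc, value):
    """Existential test, like XPath's party/partyId = value."""
    return value in party_ids(doc)


docs = [ET.fromstring(text) for text in DOCS]
involving_10151 = [d for d in docs if general_eq(d, "10151")]
between = [d for d in docs if general_eq(d, "10010") and general_eq(d, "10000")]
print(len(involving_10151), len(between))
```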
The English Wikipedia contains some 9.5 million pages, representing 38 GB of XML.
We converted each Wikipedia page into an XML document using an ad hoc schema.
Note: the MediaWiki markup format is fairly complex; we have only represented the overall structure and the links.
| Query | Number of results | Time |
|---|---|---|
| All pages containing the word "XML": `//page[. contains text "XML"]` | 15,405 | 640 ms |
| All pages containing the words "XML" and "database": `//page[. contains text "XML database" all words]` | 2,327 | 950 ms |
| All pages containing the phrase "XML database": `//page[. contains text "XML database"]` | 80 | 1,050 ms |
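The three full-text queries differ only in their match options. In XQuery Full Text, a search string with no match option is matched as a phrase, while `all words` only requires every word to occur somewhere in the page, which is why the counts shrink from the second query to the third. The sketch below illustrates the difference with a deliberately naive tokenizer (lowercase runs of letters and digits, no stemming or stop words) and hypothetical page texts; it is not Qizx's full-text engine.

```python
import re


def tokens(text):
    """Naive tokenizer: lowercase runs of letters and digits."""
    return re.findall(r"[a-z0-9]+", text.lower())


def contains_all_words(text, query):
    """Every query word occurs somewhere in the text ("all words")."""
    words = set(tokens(text))
    return all(w in words for w in tokens(query))


def contains_phrase(text, query):
    """Query words occur consecutively and in order (phrase match)."""
    t, q = tokens(text), tokens(query)
    return any(t[i:i + len(q)] == q for i in range(len(t) - len(q) + 1))


pages = {
    "a": "Qizx is an XML database engine.",
    "b": "This database stores XML documents.",
    "c": "XSL-FO is an XML vocabulary.",
}

for name, text in pages.items():
    print(name,
          contains_all_words(text, "XML"),
          contains_all_words(text, "XML database"),
          contains_phrase(text, "XML database"))
```

All three pages contain "XML", pages a and b contain both words, and only page a contains the phrase, mirroring the ordering of the result counts in the table.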
This was achieved using the "XL" edition, which has a theoretical limit of approximately 1 terabyte for a single document, instead of 2 GB in the standard edition.
A 43 GB document containing a GML representation of the town of Ettenheim, Germany, was stored and indexed (without full-text indexing) in 1 hour 23 minutes.
| Query | Number of results | Time |
|---|---|---|
| Number of polygons: `//*:Polygon` | 60,340,764 | 195 ms |
| Polygons that have an interior (i.e. a hole): `declare namespace gml="http://www.opengis.net/gml"; //gml:Polygon[gml:interior]` | 1,041,568 | 6,325 ms |
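The two queries differ in how they treat namespaces: `//*:Polygon` matches a `Polygon` element in any namespace, whereas `//gml:Polygon[gml:interior]` needs the `gml` prefix bound to the GML namespace and keeps only polygons that have an interior ring. The same selections can be written with Python's `xml.etree.ElementTree` (the `{*}` wildcard requires Python 3.8 or later). The tiny GML fragment is hypothetical, and this in-memory approach is for illustration only, not for a 43 GB document.

```python
import xml.etree.ElementTree as ET

GML_NS = "http://www.opengis.net/gml"
NS = {"gml": GML_NS}

# Hypothetical fragment: one plain polygon and one polygon with a hole.
SAMPLE = f"""<city xmlns:gml="{GML_NS}">
  <gml:Polygon>
    <gml:exterior><gml:LinearRing/></gml:exterior>
  </gml:Polygon>
  <gml:Polygon>
    <gml:exterior><gml:LinearRing/></gml:exterior>
    <gml:interior><gml:LinearRing/></gml:interior>
  </gml:Polygon>
</city>"""

root = ET.fromstring(SAMPLE)

# //*:Polygon  ->  any element named Polygon, whatever its namespace
all_polygons = root.findall(".//{*}Polygon")

# //gml:Polygon[gml:interior]  ->  GML polygons with an interior child
with_hole = [p for p in root.findall(".//gml:Polygon", NS)
             if p.find("gml:interior", NS) is not None]

print(len(all_polygons), len(with_hole))
```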