Benchmarking Qizx

Since one of Qizx's distinctive characteristics is its querying speed, we think it fair to provide some concrete execution measurements on realistically sized examples. In the following:
Please remember that, as with any benchmark, the figures given here depend heavily on test conditions (details below); they will vary from one platform to another and are given for information purposes only.

Test platform

All of the following tests were performed on the same commodity PC:
Database creation and indexing

These figures cover a complete load and indexing process, with all indexes (including numeric, date and full-text).
Simple queries on two million documents

We generated a database of two million small documents by taking the Cartesian product of six parameters (item, date, color, origin, weight and price), each taking 10 different values except the date, which takes 20 (10^5 × 20 = 2,000,000 documents). Each parameter is present both as an attribute and as an element with simple content. Additional data (random text) is also generated to bring the average document size to 700 bytes. The test has also been performed on a similar database of 4 million documents.

Sample document:

<?xml version='1.0' encoding='UTF-8'?>
<doc id='500'>
<info item='car' date='2004-05-13' color='red'
origin='Canada' weight='1000' price='10' />
<item>car</item>
<date>2004-05-13</date>
<color>red</color>
<origin>Canada</origin>
<weight>1000</weight>
<price>10</price>
...random text...
</doc>

The following queries measure the capacity to intersect large subsets of a database: each particular parameter value selects a set of 200,000 documents (100,000 for a date value, since the date takes 20 values). The following query intersects 4 sets of 200,000 documents each and returns 200 documents. It takes about 15 milliseconds (with hot caches).

count( collection("/mega")//doc[ item="car" and
color = "red" and
@origin="Italy" and
weight = 1000 ] )
= 200

As a reference point, an RDBMS such as MySQL, with a fully indexed table containing the same data (2 million rows), takes about 500 ms to complete the equivalent query.

The following query intersects 2 sets of 200,000 documents each and returns 20,000 items. It takes 150 to 300 milliseconds (depending on the selection parameters used).

count( collection("/mega")//doc[ item="chair" and color = "black" ] )
= 20000

Note that actually retrieving data from the documents takes more time: in this example, 1 to 3 milliseconds per document for the first fetch (then much less, thanks to caches).

XMark

XMark is an XML database benchmark designed by CWI Amsterdam and other organizations. It consists of 20 queries on a single document (whose reference size is 112 megabytes, but which can be scaled at will).

Note: another comparison of several products on XMark is available on the Monet DB site, but it does not include Qizx and was performed on a different platform.

XMark favors massive scans and joins, and puts little emphasis on value-based queries (unlike the examples in the section above).

The following test compares Qizx with Monet DB XQuery 4.26, the XML Query database also developed by CWI. Monet DB is written in C and has so far been the fastest XQuery database engine available. We also included eXist, a widely used open-source native XML database written in Java. These 3 products share the characteristic that queries can be executed "out of the box", without any need to define indexes manually.

For legal reasons, we do not include tests performed on commercial products (some vendors explicitly prohibit it...), but that would not have made a difference anyway. We did not make comprehensive measurements on open-source products that require explicit definition of indexes, such as Sedna or Berkeley DB-XML.

Test conditions: each query is executed 2 times and the average execution time is computed. These measurements are made after an initial "warm-up" phase (executing all queries twice). This is made necessary by Java dynamic class loading and compilation, and implies that caches are not empty. Such conditions are deemed realistic in a server environment. XMark queries are modified to simply count the result items without actually performing output, but we made sure that both systems actually materialize results (we did not rely on the count() function, which is sometimes specially optimized).
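The test conditions described above (a warm-up phase that executes all queries twice, followed by two timed executions per query whose times are averaged) can be sketched as a small harness. This is only an illustration under stated assumptions, not the harness used for these measurements: `run_query` is a hypothetical callable standing in for whatever executes one query and fully materializes its results, and the names `benchmark`, `warmup_passes`, `timed_runs` and `clock` are likewise invented for this sketch.

```python
import time
from statistics import mean
from typing import Callable, Dict, Iterable


def benchmark(
    queries: Iterable[str],
    run_query: Callable[[str], object],  # hypothetical: runs one query, materializes results
    warmup_passes: int = 2,              # "executing all queries twice"
    timed_runs: int = 2,                 # "each query is executed 2 times"
    clock: Callable[[], float] = time.perf_counter,
) -> Dict[str, float]:
    """Return the average execution time of each query, in milliseconds."""
    queries = list(queries)

    # Warm-up phase: lets the runtime load and compile code and fills the
    # caches, so the timed runs below measure "hot" behaviour.
    for _ in range(warmup_passes):
        for query in queries:
            run_query(query)

    averages_ms: Dict[str, float] = {}
    for query in queries:
        samples = []
        for _ in range(timed_runs):
            start = clock()
            run_query(query)
            samples.append(clock() - start)
        # Average of the timed runs, converted from seconds to milliseconds.
        averages_ms[query] = mean(samples) * 1000.0
    return averages_ms
```

Passing the clock as a parameter is a design choice made here so that the harness can be checked with a deterministic fake clock; in real use the default `time.perf_counter` would normally be kept.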
Remarks:
© 2003-2010 Pixware. Updated on 2010/2/15 using Qizx/open. Java and all Java-based marks are trademarks or registered trademarks of Sun Microsystems, Inc. in the U.S. and other countries. Acrobat and PostScript are trademarks of Adobe Systems Incorporated.