
Application Examples

Querying XML B2B exchange data

More and more corporations use XML to exchange data between their information systems. Examples: XBRL, FIX, FpML®, IFX, SWIFTStandards, etc.

Exchanged data is typically imported into a Relational Database Management System (RDBMS), and the original XML form of the data is discarded. Some information is generally lost during this import.

How Qizx is used:

Qizx can be used as a logger indexing all the XML exchange data sent or received by an information system. This allows:

  • Solving data exchange issues by retrieving/querying what has actually been sent and received over the last several months.
  • Performing data mining on exchanged data (as opposed to data stored into the information system).
  • Quickly prototyping all kinds of reports thanks to the power of the XQuery language.
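As a rough illustration of the kind of report this enables, here is a minimal sketch in Python using only the standard library. It does not use Qizx or XQuery; the log layout and the names `exchange-log`, `message`, `direction`, `partner`, and `sent_totals_by_partner` are all invented for the example.

```python
import xml.etree.ElementTree as ET

# Hypothetical log of exchanged messages. Element and attribute names are
# invented for this sketch and belong to no real B2B standard.
LOG = """
<exchange-log>
  <message direction="sent" date="2008-03-20" partner="ACME">
    <order id="A1"><amount currency="EUR">1200.50</amount></order>
  </message>
  <message direction="received" date="2008-03-22" partner="ACME">
    <order id="A2"><amount currency="EUR">300.00</amount></order>
  </message>
  <message direction="sent" date="2008-03-25" partner="Globex">
    <order id="G7"><amount currency="USD">75.25</amount></order>
  </message>
</exchange-log>
"""


def sent_totals_by_partner(log_xml, since):
    """Sum the amounts of messages sent on or after `since` (ISO date), per partner."""
    root = ET.fromstring(log_xml)
    totals = {}
    for msg in root.findall("message[@direction='sent']"):
        # ISO 8601 dates compare correctly as plain strings.
        if msg.get("date") >= since:
            amount = float(msg.findtext("order/amount"))
            partner = msg.get("partner")
            totals[partner] = totals.get(partner, 0.0) + amount
    return totals


report = sent_totals_by_partner(LOG, "2008-03-01")
print(report)
```

An indexing engine would answer the same question without scanning every message; the sketch only shows the shape of the query.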

Benefits of Qizx:

  • Efficient indexing process: gigabytes of data can be added per day without affecting the performance of consultation applications.

 

XML Repositories

In this very general kind of application, Qizx is used as an advanced search engine to leverage information assets already in XML form.

  • The purpose of the system is to provide services to a number of client applications:
    • consultation, search: possibly involving sophisticated full-text search
    • reporting, statistics computation: on a daily, weekly, monthly basis
    • data mining: involving bespoke sophisticated queries
  • The actual XML repository resides in a file-system, a database, or any other legacy system.
  • XML documents are added to the repository as a continuous flow.

How Qizx is used:

Qizx is used as an indexing and querying engine, in parallel with an existing infrastructure. Thus it can be added without affecting the existing system.

Benefits of Qizx:

  • Power of the XQuery language
  • Fast query engine available: allows many simultaneous queries; allows massive queries to be completed in reasonable time (reporting, data mining).
  • Efficient indexing process: gigabytes of data can be added per day without affecting the performance of consultation applications.
  • Modest additional storage volume: typically about the size of the stored XML documents.

 

Non-XML Data

An XML query engine can handle more than pure XML data: many non-XML file formats are semi-structured, or contain structured meta-data, and as such may be easily parsed as if they were XML[1].

Examples:

  • email messages: headers and text
  • log records: fields and data
  • PDF: meta-data and text
  • image files: embedded meta-data, etc.

By parsing these documents as XML and indexing in Qizx what has been parsed, you can overcome some limitations of traditional full-text engines:

  • Most if not all full-text engines can only handle text fields, not numeric or date fields
  • In most full-text engines, queries cannot include inequalities or range conditions, like "date greater than '2008-03-21'". Actually even a query like "field = value" is not always easy to perform.
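The idea of parsing a non-XML document as XML and then applying a typed comparison can be sketched as follows. This is a minimal Python illustration using only the standard library, not the Qizx API; the output element names (`email`, `from`, `to`, `subject`, `date`, `body`) and the helper `email_to_xml` are assumptions made for the example.

```python
import xml.etree.ElementTree as ET
from datetime import date
from email import message_from_string
from email.utils import parsedate_to_datetime

RAW = """\
From: alice@example.com
To: bob@example.com
Subject: Quarterly report
Date: Mon, 24 Mar 2008 10:15:00 +0000

Please find the figures attached.
"""


def email_to_xml(raw):
    """Turn the headers and text of an email message into a small XML tree."""
    msg = message_from_string(raw)
    root = ET.Element("email")
    for name in ("From", "To", "Subject"):
        ET.SubElement(root, name.lower()).text = msg[name]
    # Store the date in ISO form so it can later be treated as a date value.
    sent = parsedate_to_datetime(msg["Date"])
    ET.SubElement(root, "date").text = sent.date().isoformat()
    ET.SubElement(root, "body").text = msg.get_payload().strip()
    return root


doc = email_to_xml(RAW)

# A range condition on a date field: "date greater than 2008-03-21".
cutoff = date(2008, 3, 21)
is_recent = date.fromisoformat(doc.findtext("date")) > cutoff

print(ET.tostring(doc, encoding="unicode"))
print(is_recent)
```

Once the date is an ordinary element holding an ISO value, the condition is a plain comparison on a date rather than a text match.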

Benefits of Qizx:

  • Qizx can run queries that combine full-text, string, numeric, and date values, with no need to assemble a relational DBMS and a full-text engine.
  • XQuery expressions include equality, inequalities, and range comparisons; Qizx executes these kinds of queries efficiently.

 

Content Delivery on CD

Some companies or organizations publish applications on CD/DVD that contain large amounts of structured data and require an advanced search engine: for example, an Aircraft Maintenance Manual that comes with an application helping maintenance agents quickly diagnose faults or issues according to the configuration of the particular aircraft under repair.

Such an application must be able to perform complex queries efficiently using a multiplicity of criteria ("applicabilities"). Representing the information in XML facilitates integration into the editorial process and makes updates easier.

How Qizx is used:

An XML Library indexed by Qizx can be burned to CD/DVD and used in read-only mode.

Qizx provides storage for the data that is both compact and fast.

Benefits of Qizx:

  • Powerful standalone query engine, featuring full-text.
  • Well adapted to relatively slow CD drives.
  • Makes the application implementation much easier through the use of XQuery.
  • All of the XML content is stored by Qizx and available through the XQuery language and the API. This allows all kinds of processing and generation of content.

Massive XML processing

Processing massive XML documents (size 100 megabytes or more) to extract and format relevant information is usually not an easy task:

  • Parsing documents into DOM requires huge amounts of memory. DOM lacks high-level functions, and does not rely on indexing to boost searching.
  • XSLT and simple XQuery processors (like Qizx/open) are much higher-level tools but, like DOM, still require parsing the whole document into memory.
  • Using SAX is memory-efficient, but it is very low-level and requires greater programming skill.
  • Some XQuery processors are called "streaming", i.e. they build in memory only what is necessary for the XQuery script to execute, but in the current state of the art they often fail to optimize even simple scripts and end up building the whole document.
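To make the SAX trade-off concrete, here is a minimal sketch in Python using the standard `xml.sax` module. It extracts the text of every `title` element without building a tree; the sample document and the names `manual`, `task`, `title`, and `TitleCollector` are invented for the example.

```python
import io
import xml.sax


class TitleCollector(xml.sax.ContentHandler):
    """Collects the text of every <title> element; nothing else is retained."""

    def __init__(self):
        super().__init__()
        self.titles = []
        self._buffer = None  # None means "not currently inside a <title>"

    def startElement(self, name, attrs):
        if name == "title":
            self._buffer = []

    def characters(self, content):
        # The parser may deliver text in several chunks, so accumulate them.
        if self._buffer is not None:
            self._buffer.append(content)

    def endElement(self, name):
        if name == "title" and self._buffer is not None:
            self.titles.append("".join(self._buffer))
            self._buffer = None


# Tiny stand-in for a document that would really be 100 megabytes or more.
SAMPLE = (
    b"<manual>"
    b"<task><title>Replace filter</title><step>...</step></task>"
    b"<task><title>Check valve</title></task>"
    b"</manual>"
)

handler = TitleCollector()
xml.sax.parse(io.BytesIO(SAMPLE), handler)
print(handler.titles)
```

Memory use stays flat regardless of document size, but even this simple extraction needs explicit state tracking, and every new question means another full pass over the input.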

How Qizx would be used:

The document to process is stored and indexed in a temporary XML Library (database). This takes about 8 times longer than building a DOM, but it needs to be done only once. Many queries can then be executed on this document.

Benefits of Qizx:

  • Expressive power and simplicity of the XQuery language.
  • Indexes are automatically built (no configuration or tuning needed), and automatically used to boost data retrieval tremendously.
  • Formatting of results can be achieved by piping into any XSLT processor supporting the JAXP interfaces, like Saxon.

 


Notes

  1. Several libraries, commercial or open-source, exist for parsing a variety of formats.

    From a technical point of view, Qizx Java APIs allow on-the-fly storing and indexing of documents without generating intermediate XML and re-parsing.