Application Examples

Querying XML B2B exchange data

More and more corporations use XML to exchange data between their
information systems. Examples: XBRL, FIX, FpML®, IFX, SWIFTStandards, etc.
Exchanged data is typically imported into a Relational Database Management
System (RDBMS), and the original XML form of the data is discarded.
Generally, some information is lost during this import process.

How Qizx is used:

Qizx can be used as a logger that indexes all the XML exchange data sent or
received by an information system. This allows:

- Solving data exchange issues by retrieving/querying what has
actually been sent and received over the last several months.
- Performing data mining on exchanged data (as opposed to data stored
in the information system).
- Quickly prototyping all kinds of reports thanks to the power of the
XQuery language.

Benefits of Qizx:

- Efficient indexing process: allows adding gigabytes of data per day
without affecting the performance of consultation applications.

XML Repositories

In this very general kind of application, Qizx is used as an advanced search
engine to leverage information assets already in XML form.

- The purpose of the system is to provide services to a number of
client applications:
  - consultation, search: possibly involving sophisticated
  full-text search
  - reporting, statistics computation: on a daily, weekly, or monthly
  basis
  - data mining: involving bespoke sophisticated queries
- The actual XML repository resides in a file-system, a database, or
any other legacy system.
- XML documents are added to the repository as a continuous flow.

How Qizx is used:

Qizx is used as an indexing and querying engine, in parallel with an
existing infrastructure. Thus it can be added without affecting the
existing system.

Benefits of Qizx:

- Power of the XQuery language
- Fast query engine available: allows many simultaneous queries;
allows massive queries to be completed in reasonable time (reporting,
data mining).
- Efficient indexing process, allows adding gigabytes of data per day
without affecting the performance of consultation applications.
- Modest additional storage volume: typically about the size of the
stored XML documents.

Non-XML Data

An XML query engine can handle more than pure XML data: many non-XML file
formats are semi-structured, or contain structured meta-data, and as such
can easily be parsed as if they were XML[1].

Examples:

- email messages: headers and text
- log records: fields and data
- PDF: meta-data and text
- image files (embedded metadata) etc.
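
As an illustration of the first example, here is a minimal Python sketch that maps an email message to a small XML tree that could then be indexed. It uses only the Python standard library, not the Qizx API; the element names (`message`, `headers`, `header`, `body`) and the sample message are invented for this example, and only non-multipart messages are handled.

```python
from email import message_from_string
import xml.etree.ElementTree as ET


def email_to_xml(raw_message: str) -> ET.Element:
    """Map an email message to a small XML tree (hypothetical layout)."""
    msg = message_from_string(raw_message)
    root = ET.Element("message")
    headers = ET.SubElement(root, "headers")
    for name, value in msg.items():
        # One element per header; the header name is kept as an attribute.
        header = ET.SubElement(headers, "header", name=name)
        header.text = value
    body = ET.SubElement(root, "body")
    # Only plain, non-multipart bodies are copied in this sketch.
    body.text = "" if msg.is_multipart() else msg.get_payload()
    return root


# Hypothetical sample message.
raw = (
    "From: alice@example.com\n"
    "To: bob@example.com\n"
    "Date: Fri, 21 Mar 2008 10:15:00 +0000\n"
    "Subject: Invoice 42\n"
    "\n"
    "Please find the invoice attached.\n"
)
doc = email_to_xml(raw)
print(ET.tostring(doc, encoding="unicode"))
```

Once the headers are elements in their own right, a date or a sender address becomes a field that can be queried separately from the body text.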

By parsing these documents as XML and indexing in Qizx what has
been parsed, you can overcome some limitations of traditional full-text
engines:

- Most if not all full-text engines can only handle text fields, not
numeric or date fields
- In most full-text engines, queries cannot include inequalities or
range conditions, like "date greater than '2008-03-21'". Actually even a
query like "field = value" is not always easy to perform.

Benefits of Qizx:

- Qizx can run queries involving full-text, string values, numeric
values, date values. No need to combine a relational DBMS and a
full-text engine.
- XQuery expressions include equality, inequalities, range
comparisons; Qizx executes these kinds of queries efficiently.
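
To make the kind of condition discussed above concrete ("date greater than '2008-03-21'", combined with a numeric bound), here is a small Python sketch that evaluates it over an in-memory XML document. It only illustrates typed comparisons on parsed fields; it is not how Qizx evaluates XQuery, and the element names and sample data are invented for this example.

```python
from datetime import date
import xml.etree.ElementTree as ET

# Hypothetical sample data: three messages with a date, a size and a subject.
SAMPLE = """
<messages>
  <message><date>2008-03-20</date><size>1200</size><subject>Old report</subject></message>
  <message><date>2008-03-22</date><size>300</size><subject>New report</subject></message>
  <message><date>2008-04-01</date><size>5000</size><subject>Quarterly figures</subject></message>
</messages>
"""


def select(root, after, max_size):
    """Return subjects of messages dated after `after` and at most `max_size` in size."""
    hits = []
    for message in root.iter("message"):
        # The date is compared as a date, not as a piece of text.
        sent = date.fromisoformat(message.findtext("date"))
        # The size is compared as a number.
        size = int(message.findtext("size"))
        if sent > after and size <= max_size:
            hits.append(message.findtext("subject"))
    return hits


root = ET.fromstring(SAMPLE)
result = select(root, date(2008, 3, 21), 1000)
print(result)
```

This sketch scans every message in turn; an indexed engine is meant to answer the same kind of condition without a full scan.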

Content Delivery on CD

Some companies or organizations publish on CD/DVD applications that contain
large amounts of structured data and require an advanced search engine: for
example, an Aircraft Maintenance Manual that comes with an application helping
maintenance agents quickly diagnose faults or issues according to the
configuration of the particular aircraft under repair.

Such an application must be able to perform complex queries efficiently using a
multiplicity of criteria ("applicabilities"). Using an XML format to
represent the information facilitates integration into the editorial
process and makes updates easier.

How Qizx is used:

An XML Library indexed by Qizx can be burnt on CD/DVD and used in read-only
mode. Qizx provides storage for the data that is both compact and fast.

Benefits of Qizx:

- Powerful standalone query engine, featuring full-text.
- Well adapted to relatively slow CD drives.
- Makes the application implementation much easier through the use of
XQuery
- The totality of XML contents is stored by Qizx and available through
the XQuery language and the API. This allows all kinds of processing and
generation of contents.

Massive XML processing

Processing massive XML documents (size 100 megabytes or more) to extract and
format relevant information is usually not an easy task:

- Parsing documents into DOM requires huge amounts of memory. DOM
lacks high-level functions, and does not rely on indexing to boost
searching.
- XSLT and simple XQuery processors (like Qizx/open) are much higher-level tools but,
like DOM, still require parsing the whole document into memory.
- Using SAX is memory efficient, but very low level and requires
higher programming skills.
- Some XQuery processors are called "streaming", i.e. they build in memory
only what is necessary for the XQuery script to execute, but in the
current state of the art they often fail to optimize even simple scripts and
end up building the whole document.
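
The trade-off described for event-based parsing can be seen in a short Python sketch. It uses the standard library's incremental parser `xml.etree.ElementTree.iterparse`, which works in the same spirit as SAX, to sum one attribute without keeping the content of processed elements. The `order` element, its `amount` attribute, and the sample data are invented for this example.

```python
import io
import xml.etree.ElementTree as ET


def sum_amounts(source, tag="order"):
    """Stream through `source`, summing the `amount` attribute of each `tag` element."""
    total = 0.0
    count = 0
    for _event, elem in ET.iterparse(source, events=("end",)):
        if elem.tag == tag:
            total += float(elem.get("amount", "0"))
            count += 1
            # Drop the element's children, text and attributes once processed;
            # the emptied element itself stays attached to its parent.
            elem.clear()
    return count, total


# Hypothetical sample; a real input would be a large file opened in binary mode.
sample = io.BytesIO(
    b"<orders>"
    b"<order amount='10.5'/>"
    b"<order amount='4.5'/>"
    b"<note>ignored</note>"
    b"</orders>"
)
count, total = sum_amounts(sample)
print(count, total)
```

Even for this simple aggregate, the traversal and the bookkeeping are written by hand, and every new question about the document means another full pass over it.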

How Qizx would be used:

The document to process is stored and indexed into a temporary XML Library
(database). This takes about 8 times longer than building a DOM, but needs to
be done only once. Then many queries can be executed on this document.

Benefits of Qizx:

- Expressive power and simplicity of the XQuery language.
- Indexes are automatically built (no configuration or tuning needed),
and automatically used to boost data retrieval tremendously.
- Formatting of results can be achieved by piping into any XSLT
processor supporting the JAXP interfaces, like Saxon.

Notes

[1] Several libraries, commercial or open-source, exist
for parsing a variety of formats. From a technical point of view,
the Qizx Java APIs allow on-the-fly storing and indexing of documents
without generating intermediate XML and re-parsing.