Authors:
Anders W. Tell, Financial Toolsmiths AB, Maintainer
Design
BML Design document.
Goals:
Primary
-
A binary stream-representation of XML 1.0 information-items, all or a subset.
-
As simple a stream-representation as possible (KISS).
-
Support all XML Schema datatypes
-
Use native and IEEE datatypes to facilitate fast stream to/from memory
conversions.
-
Fast parsing and conversion to in-memory trees, W3C DOM or others.
-
Simple and small codebase for encoders/decoders.
-
The core BML files should be compatible with Java Micro Edition CLDC 1.0
Secondary
-
Small binary-stream size compared to XML text-stream-representation, on
average anyway.
However, GZIP-compressed sizes are a secondary goal compared to performance and
ease of use.
-
Extensible in terms of tokens and datatypes.
-
Allow encapsulation of OMG IIOP messages in a self describing stream-representation.
-
Include a small and fast Tree API.
Status (v0.40)
-
All four representations are now supported.
-
A small XML parser has been added.
-
Only a few datatypes are currently supported, but most of the rest are scheduled
to be added in the next version.
-
The codebase should be really usable when version 0.6 is released.
Changes
Major changes are collected and described in the Changelog
TODO
A TODO list is kept in the TODO
Space considerations
Stream space-saving techniques
-
Tag compression
-
Native datatypes
-
... there is more in there ...
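As an illustration of the tag compression idea, the following is a minimal sketch in which a tag name is written in full the first time it appears and as a short table index afterwards. The class name `TagTableSketch`, the marker bytes `NEW_TAG` and `TAG_REF`, and the 2-byte index are all assumptions made up for this example; the actual BML token layout is not described in this document and may differ. The sketch uses standard `java.io` classes from Java SE.

```java
import java.io.ByteArrayOutputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.util.HashMap;
import java.util.Map;

/** Hypothetical sketch of tag compression; NOT the actual BML token layout. */
class TagTableSketch {
    // Hypothetical marker bytes, invented for this example.
    static final int NEW_TAG = 0x01; // followed by a length-prefixed tag name
    static final int TAG_REF = 0x02; // followed by a 2-byte table index

    private final Map<String, Integer> indexByName = new HashMap<String, Integer>();

    /** Writes a tag name in full the first time, and as a short index afterwards. */
    void writeTag(DataOutputStream out, String name) throws IOException {
        Integer index = indexByName.get(name);
        if (index == null) {
            // First occurrence: remember the name and emit it once.
            indexByName.put(name, Integer.valueOf(indexByName.size()));
            out.writeByte(NEW_TAG);
            out.writeUTF(name); // 2-byte length followed by modified UTF-8 bytes
        } else {
            // Repeated occurrence: emit only the index of the known name.
            out.writeByte(TAG_REF);
            out.writeShort(index.intValue());
        }
    }

    /** Encodes a sequence of tag names and returns the resulting bytes. */
    static byte[] encode(String[] names) {
        try {
            ByteArrayOutputStream bytes = new ByteArrayOutputStream();
            DataOutputStream out = new DataOutputStream(bytes);
            TagTableSketch table = new TagTableSketch();
            for (int i = 0; i < names.length; i++) {
                table.writeTag(out, names[i]);
            }
            out.flush();
            return bytes.toByteArray();
        } catch (IOException e) {
            // In-memory streams are not expected to fail.
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        String[] names = { "invoice", "line", "line", "line" };
        System.out.println(encode(names).length + " bytes");
    }
}
```

In this sketch every repeated tag costs three bytes regardless of the length of its name, so the saving grows with long or frequently repeated element names.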
Performance testing
The framework for testing performance is described here.
Stream performance enhancement techniques
-
Binary tokens
-
No need to discover or look for where the next token is.
-
No need to handle multibyte character encodings in tokenization.
-
Native datatypes
-
No need to do character-to-native datatype conversion, which is a very
expensive operation.
-
Matches the representation of native datatypes in many implementation
languages and machines
(follows CDR encoding/decoding rules without alignment, which could be added
later).
-
Strings are preceded by a length indicator which improves reading and memory
handling.
-
Explicit datatyping reduces the amount of syntax and error handling code.
-
Relatively small and simple grammar
-
Smaller code for parsing and reading.
-
Fewer bugs since the codebase is simpler and more understandable.
-
Produces, on average, a smaller stream size
-
Reusable information
-
Recurring information may be encoded only once and reused in other parts.
-
Namespace handling is simpler since no prefixes are used. Elements, attributes
and datatypes are all automatically bundled with namespace and localname.
-
Some information is packaged in a way that a tree implementation may reuse
it without doing extensive / expensive information restructuring and conversion.
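The points above about binary tokens, native datatypes and length-prefixed strings can be sketched as follows. This is an illustrative example only: the class name `TypedTokenSketch` and the token codes `T_INT`, `T_DOUBLE` and `T_STRING` are hypothetical, and the byte order is simply the big-endian layout that `java.io.DataOutputStream` produces, not a statement about the actual BML encoding. The reader always knows how many bytes follow a token, so it never scans for delimiters or parses digits from text.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInputStream;
import java.io.DataOutputStream;
import java.io.IOException;

/** Hypothetical sketch of typed binary tokens; NOT the actual BML encoding. */
class TypedTokenSketch {
    // Hypothetical one-byte token codes, invented for this example.
    static final int T_INT = 0x10;    // followed by a 4-byte int
    static final int T_DOUBLE = 0x11; // followed by an 8-byte IEEE 754 double
    static final int T_STRING = 0x12; // followed by a 4-byte length and UTF-8 bytes

    /** Writes one int, one double and one string as typed tokens. */
    static byte[] encode(int i, double d, String s) {
        try {
            ByteArrayOutputStream bytes = new ByteArrayOutputStream();
            DataOutputStream out = new DataOutputStream(bytes);
            out.writeByte(T_INT);
            out.writeInt(i);
            out.writeByte(T_DOUBLE);
            out.writeDouble(d);
            byte[] utf8 = s.getBytes("UTF-8");
            out.writeByte(T_STRING);
            out.writeInt(utf8.length); // length indicator precedes the string
            out.write(utf8);
            out.flush();
            return bytes.toByteArray();
        } catch (IOException e) {
            throw new IllegalStateException(e);
        }
    }

    /** Reads every token and renders it as text, one token per line. */
    static String dump(byte[] stream) {
        try {
            DataInputStream in = new DataInputStream(new ByteArrayInputStream(stream));
            StringBuilder sb = new StringBuilder();
            while (in.available() > 0) {
                int token = in.readUnsignedByte();
                switch (token) {
                case T_INT:
                    sb.append("int ").append(in.readInt());
                    break;
                case T_DOUBLE:
                    sb.append("double ").append(in.readDouble());
                    break;
                case T_STRING:
                    // The length is known up front, so the buffer is sized once.
                    byte[] utf8 = new byte[in.readInt()];
                    in.readFully(utf8);
                    sb.append("string ").append(new String(utf8, "UTF-8"));
                    break;
                default:
                    throw new IOException("unknown token " + token);
                }
                sb.append('\n');
            }
            return sb.toString();
        } catch (IOException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        System.out.print(dump(encode(42, 1.5, "hello")));
    }
}
```

Because each token code implies its datatype, the decoder in this sketch needs no number parsing and no character-level error handling; an unknown token code is the only syntax error it has to report.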
Requirements
-
JDK 1.3.x (JDK 1.1.x and JDK 1.2.x should work, but have not been
tested)
-
xerces.jar - for SAX and DOM conversions
-
junit.jar v3.5 - for testing
XML file to BML stream compressor:
java -client org.openebxml.comp.bml.ext.appl.BMLCompressor -options filename
Creates a file named 'filename.bml'.
BML stream to XML stream:
java -client org.openebxml.comp.bml.ext.appl.BMLDecompressor -options filename
Creates a file named 'filename.xml'.
BML Assembler
java -client org.openebxml.comp.bml.ext.appl.BMLTokens filename
Parses a BML-encoded stream and prints the tokens to the screen.
Examples are found in the 'examples' directory
NOTE: Many more examples are scheduled to be created.