Version:
0.1 April 2001
Performance testing of XML based Markup Languages
Version:
0.4 June 2001
Authors:
Anders W. Tell, Financial Toolsmiths AB
Content
In this document, performance testing of XML-based markup languages is defined
as a set of measurements taken from transformation processes
that operate on representations (instances) of an information
model (see XML Information Set) and produce a new representation.
[representation] -> [transformation process] -> [representation]
The representations and the transformation processes are characterized
by a number of properties, and each performance measurement is valid
only with respect to a specific set of properties.
A baseline measurement is a measurement taken with respect to
a specific set of properties, to which other measurements are related.
Usually the baseline set of properties is selected to produce
what is believed to be one of the fastest possible measurements.
Baselines are used to minimize the effect of machine configuration
and to create relative measurements that can more easily be compared
across different machines.
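As a minimal sketch, a relative measurement can be expressed as the ratio of a measured duration to the baseline duration taken on the same machine. Expressing the relation as a ratio is an assumption, and the class `RelativeMeasurement` and the example numbers are hypothetical.

```java
/**
 * Hypothetical helper: expresses a measurement relative to a baseline
 * taken on the same machine, so that results from different machines
 * can be compared as ratios rather than as absolute times.
 */
final class RelativeMeasurement {

    private RelativeMeasurement() {
    }

    /** Returns measured / baseline; 1.0 means "as fast as the baseline". */
    static double relativeTo(double measuredNanos, double baselineNanos) {
        if (baselineNanos <= 0) {
            throw new IllegalArgumentException("baseline must be positive");
        }
        return measuredNanos / baselineNanos;
    }

    public static void main(String[] args) {
        // Hypothetical numbers: a validating parse taking 3 ms against
        // a non-validating baseline of 2 ms on the same machine.
        double ratio = relativeTo(3_000_000, 2_000_000);
        System.out.println("relative duration = " + ratio);
    }
}
```

A ratio of 1.5 here would read as "50% slower than the baseline", independent of how fast the machine itself is.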
NOTE: Properties are name-value pairs. Property names are
shown in bold and their possible values are enclosed in
quotes.
Boolean properties, however, are shown with their name in quotes, with
implicit true/false as possible values.
Stream
This representation corresponds to a serialization of an instance.
Properties:
-
Name: name of stream representation.
-
Identifier: identifier of the stream representation (file name, URI, etc.).
-
Encoding: "UTF8", "UTF16", "ISO-8859-1", ...
The character encoding used.
-
StreamType:
"File"
The stream is stored in an operating-system file. Information in the
stream is accessed through I/O, which degrades performance depending on
the machine configuration and on how the algorithm that accesses
the stream is constructed.
"ByteArray"
The stream is stored in memory, in an array of bytes. No I/O is necessary
to access the stream information.
-
"W3C XML 1.0"
The stream syntax is valid according to W3C XML 1.0 specification.
-
"W3C XML Namespace"
The stream syntax is valid according to W3C XML Namespace specification.
-
Size: stream size, expressed in bytes.
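In Java, the two StreamType values might map onto the input sources below. This is a sketch under the assumption that, for "ByteArray", the file is read into a byte array once, before timing starts, so that the measured runs involve no I/O; `StreamSources` and its `StreamType` enum are hypothetical names.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;

/** Hypothetical mapping of the StreamType property onto java.io streams. */
final class StreamSources {

    enum StreamType { FILE, BYTE_ARRAY }

    private StreamSources() {
    }

    /**
     * Opens the input for one transformation run.
     * FILE goes through operating-system I/O on every call;
     * BYTE_ARRAY serves bytes that were loaded once, before timing starts.
     */
    static InputStream open(StreamType type, Path file, byte[] preloaded) throws IOException {
        switch (type) {
            case FILE:
                return Files.newInputStream(file);
            case BYTE_ARRAY:
                return new ByteArrayInputStream(preloaded);
            default:
                throw new IllegalArgumentException("unknown stream type: " + type);
        }
    }

    public static void main(String[] args) throws IOException {
        Path file = Files.createTempFile("sample", ".xml");
        Files.write(file, "<doc/>".getBytes(StandardCharsets.UTF_8));
        byte[] preloaded = Files.readAllBytes(file); // outside any timed region
        try (InputStream in = open(StreamType.BYTE_ARRAY, file, preloaded)) {
            System.out.println(in.available() + " bytes available without I/O");
        } finally {
            Files.delete(file);
        }
    }
}
```

Both variants deliver the same bytes, so the difference between a "File" and a "ByteArray" measurement isolates the I/O cost.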
Tree
An instance is represented as a traversable in-memory tree.
Properties:
-
Name: name of tree representation.
"W3C DOM"
The tree representation is according to W3C DOM specification.
-
Threading:
"S" - Single reader and writer
"M" - Multiple readers and writers
-
"defer-node-expansion"
If this feature is set to true, the tree nodes in the returned document
are expanded as the tree is traversed. This feature allows the parser to
return a document faster than if the tree is fully expanded during parsing
and improves memory usage when the whole tree is not traversed.
-
"Include ignorable whitespace"
Includes text nodes that can be considered "ignorable whitespace" in
the tree.
-
"create-entity-ref-nodes"
Create EntityReference nodes in the tree. The EntityReference nodes
and their child nodes will be read-only.
-
Size: in-memory size, expressed in bytes.
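With JAXP, the Tree properties can be approximated on a `DocumentBuilderFactory`. This is a sketch, and `TreeConfig` is a hypothetical name: "Include ignorable whitespace" and "create-entity-ref-nodes" are mapped onto the standard `setIgnoringElementContentWhitespace` and `setExpandEntityReferences` setters, while "defer-node-expansion" is passed as the Xerces feature URI `http://apache.org/xml/features/dom/defer-node-expansion`, which a non-Xerces parser may reject.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import org.w3c.dom.Document;

/** Hypothetical mapping of the Tree properties onto a JAXP DocumentBuilder. */
final class TreeConfig {

    // Xerces-specific feature URI; assumed to be honoured only by Xerces-based parsers.
    static final String DEFER_NODE_EXPANSION =
            "http://apache.org/xml/features/dom/defer-node-expansion";

    private TreeConfig() {
    }

    static DocumentBuilder newBuilder(boolean deferNodeExpansion,
                                      boolean includeIgnorableWhitespace,
                                      boolean createEntityRefNodes)
            throws ParserConfigurationException {
        DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
        factory.setNamespaceAware(true);
        factory.setValidating(false);
        // Per the JAXP docs this only drops whitespace in element-only content,
        // which depends on a DTD content model and on validating mode.
        factory.setIgnoringElementContentWhitespace(!includeIgnorableWhitespace);
        // false keeps EntityReference nodes in the tree instead of expanding them.
        factory.setExpandEntityReferences(!createEntityRefNodes);
        try {
            factory.setAttribute(DEFER_NODE_EXPANSION, deferNodeExpansion);
        } catch (IllegalArgumentException notRecognized) {
            // The parser does not know this feature; the run proceeds without it.
        }
        return factory.newDocumentBuilder();
    }

    public static void main(String[] args) throws Exception {
        DocumentBuilder builder = newBuilder(true, false, false);
        Document doc = builder.parse(new ByteArrayInputStream(
                "<root><child/></root>".getBytes(StandardCharsets.UTF_8)));
        System.out.println(doc.getDocumentElement().getNodeName());
    }
}
```

Because "defer-node-expansion" may silently be unavailable, the property set actually in effect should be recorded alongside each tree measurement.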
Events
An instance is represented as a sequence of events, usually defined
by an API and a set of sequencing rules.
Properties:
-
Name: "SAX2", "SAX"
Name of representation.
-
"namespace-prefixes"
Report the original prefixed names and attributes used for XML Namespace
declarations.
-
"java-string-interning"
All element names, prefixes, attribute names, XML Namespace URIs, and
local names are interned using java.lang.String.intern.
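For SAX2 in Java, the Events properties correspond to standard SAX2 feature URIs (`namespaces`, `namespace-prefixes`, `string-interning`). The sketch below assumes a JAXP `SAXParserFactory`; `EventConfig` and `trySetFeature` are hypothetical names. Features a parser rejects are skipped rather than treated as errors, since some parsers fix `string-interning` to a single value.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import javax.xml.parsers.ParserConfigurationException;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;
import org.xml.sax.SAXNotRecognizedException;
import org.xml.sax.SAXNotSupportedException;
import org.xml.sax.XMLReader;
import org.xml.sax.helpers.DefaultHandler;

/** Hypothetical helper that maps the Events properties onto SAX2 feature flags. */
final class EventConfig {

    static final String NAMESPACES = "http://xml.org/sax/features/namespaces";
    static final String NAMESPACE_PREFIXES = "http://xml.org/sax/features/namespace-prefixes";
    static final String STRING_INTERNING = "http://xml.org/sax/features/string-interning";

    private EventConfig() {
    }

    /** Sets a SAX2 feature; returns false if the parser rejects the requested value. */
    static boolean trySetFeature(XMLReader reader, String uri, boolean value) {
        try {
            reader.setFeature(uri, value);
            return true;
        } catch (SAXNotRecognizedException | SAXNotSupportedException e) {
            return false;
        }
    }

    static XMLReader newReader(boolean namespacePrefixes, boolean stringInterning)
            throws ParserConfigurationException, SAXException {
        SAXParserFactory factory = SAXParserFactory.newInstance();
        factory.setNamespaceAware(true);
        factory.setValidating(false);
        XMLReader reader = factory.newSAXParser().getXMLReader();
        trySetFeature(reader, NAMESPACES, true);
        trySetFeature(reader, NAMESPACE_PREFIXES, namespacePrefixes);
        // Some parsers treat string-interning as fixed; the value actually
        // in effect should be read back and recorded with the measurement.
        trySetFeature(reader, STRING_INTERNING, stringInterning);
        return reader;
    }

    public static void main(String[] args) throws Exception {
        XMLReader reader = newReader(false, true);
        reader.setContentHandler(new DefaultHandler());
        reader.parse(new InputSource(new ByteArrayInputStream(
                "<doc xmlns='urn:example'/>".getBytes(StandardCharsets.UTF_8))));
        System.out.println("string-interning = " + reader.getFeature(STRING_INTERNING));
    }
}
```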
Transformation process
Properties:
-
Iterations: the number of times the transformation was performed.
-
"validation"
The transformation process includes an activity that validates the
input representation against a schema.
-
ValidationSchema: "XML-DTD-x.x", "XML-Schema-x.x", "TREX-x.x", "RELAX-x.x",...
-
"external-general-entities"
Include external general entities
-
"external-parameter-entities"
Include external parameter entities or the external DTD subset.
-
"W3C XML Namespace handling"
Perform XML namespace processing: prefixes will be stripped off element
and attribute names and replaced with the corresponding namespace URIs.
By default, the two will simply be concatenated, but the namespace-sep
core property allows the application to specify a delimiter string for
separating the URI part and the local part.
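In a SAX2-based Java implementation, several of these transformation properties correspond to standard SAX2 feature URIs. The sketch below assumes that mapping; `TransformationFeatures` is a hypothetical name, and ValidationSchema is not covered because the SAX2 core features have no flag for choosing a schema language.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import org.xml.sax.SAXNotRecognizedException;
import org.xml.sax.SAXNotSupportedException;
import org.xml.sax.XMLReader;

/** Hypothetical translation of transformation properties into SAX2 feature flags. */
final class TransformationFeatures {

    private TransformationFeatures() {
    }

    static Map<String, Boolean> saxFeatures(boolean validation,
                                            boolean externalGeneralEntities,
                                            boolean externalParameterEntities,
                                            boolean namespaceHandling) {
        Map<String, Boolean> features = new LinkedHashMap<>();
        features.put("http://xml.org/sax/features/validation", validation);
        features.put("http://xml.org/sax/features/external-general-entities", externalGeneralEntities);
        features.put("http://xml.org/sax/features/external-parameter-entities", externalParameterEntities);
        features.put("http://xml.org/sax/features/namespaces", namespaceHandling);
        return features;
    }

    /** Applies the flags and returns the URIs the reader rejected, so they can be reported. */
    static List<String> apply(XMLReader reader, Map<String, Boolean> features) {
        List<String> rejected = new ArrayList<>();
        for (Map.Entry<String, Boolean> e : features.entrySet()) {
            try {
                reader.setFeature(e.getKey(), e.getValue());
            } catch (SAXNotRecognizedException | SAXNotSupportedException ex) {
                rejected.add(e.getKey());
            }
        }
        return rejected;
    }
}
```

SAX2 passes the namespace URI and the local name to `startElement` as separate arguments, so the separator string mentioned for namespace handling does not arise in this sketch.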
Duration
Measured as the time to complete a transformation.
CodeSize
Measured as the size of the "binary" codebase of the piece of software
which performs the transformation.
Size
Measured as the size of the resulting representation.
-
stream: size of the stream
-
tree: size of the in-memory representation
-
event: not applicable
Symmetry
The symmetry measurement is the ratio between a transformation measurement
and the measurement of the reverse transformation.
An example: the time of a stream -> event transformation divided
by the time of an event -> stream transformation.
Memory
The memory measurement is the difference between the amount of
runtime memory in use before and after the transformation.
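A minimal Java sketch of the Memory measurement, assuming that `Runtime.totalMemory() - Runtime.freeMemory()` is an acceptable proxy for runtime memory in use. `MemoryMeasurement` is a hypothetical name; `System.gc()` is only a request to the JVM, so the result is approximate, and `Reference.reachabilityFence` requires Java 9 or later.

```java
import java.lang.ref.Reference;
import java.util.function.Supplier;

/** Hypothetical helper for the Memory measurement: heap growth across one transformation. */
final class MemoryMeasurement {

    private MemoryMeasurement() {
    }

    /** Heap currently in use, in bytes, as reported by the JVM. */
    static long usedHeap() {
        Runtime rt = Runtime.getRuntime();
        return rt.totalMemory() - rt.freeMemory();
    }

    /** Returns used heap after minus used heap before the transformation. */
    static <T> long measure(Supplier<T> transformation) {
        System.gc(); // a request, not a guarantee
        long before = usedHeap();
        T result = transformation.get();
        System.gc(); // try to discard temporary garbage while the result is still referenced
        long after = usedHeap();
        Reference.reachabilityFence(result); // keep the result alive until after the reading
        return after - before;
    }

    public static void main(String[] args) {
        long delta = measure(() -> new byte[16 * 1024 * 1024]);
        System.out.println("approximate growth: " + delta + " bytes");
    }
}
```

Keeping the resulting representation reachable is the important design choice: without it, the second collection could reclaim the result and the measurement would understate its size.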
General rules:
-
Java
-
a forced garbage collection SHOULD be performed before each transformation
-
Streams
-
the activity of flushing intermediate buffers MUST be included in any time
measurements.
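The Duration measurement, the Iterations property and the two rules above could be combined into a small timing harness. This is a sketch: `DurationMeasurement` is a hypothetical name, it reports the mean over all iterations, and it does not separate JIT warm-up runs, which a real benchmark would probably need to do.

```java
/** Hypothetical harness for the Duration measurement over a number of Iterations. */
final class DurationMeasurement {

    private DurationMeasurement() {
    }

    /**
     * Runs the transformation the given number of times and returns the mean
     * duration per run in nanoseconds. A collection is requested before each
     * run, outside the timed region. Any flushing of output buffers must happen
     * inside transformation.run() so that it is included in the measured time.
     */
    static double meanNanos(Runnable transformation, int iterations) {
        if (iterations <= 0) {
            throw new IllegalArgumentException("iterations must be positive");
        }
        long total = 0;
        for (int i = 0; i < iterations; i++) {
            System.gc();
            long start = System.nanoTime();
            transformation.run();
            total += System.nanoTime() - start;
        }
        return (double) total / iterations;
    }

    public static void main(String[] args) {
        StringBuilder sb = new StringBuilder();
        double mean = meanNanos(() -> sb.append("<doc/>"), 10);
        System.out.printf("mean duration: %.0f ns%n", mean);
    }
}
```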
Basic Transformation Processes
The following list describes nine basic transformation processes. For
each process, a baseline is defined.
NOTE: The absence of a baseline property name means that its value
is either empty or false.
1. Stream -> Event
Baseline properties:
-
Encoding: UTF8
-
StreamType: ByteArray
-
W3C XML 1.0
-
W3C XML namespace
-
SAX2
-
NoValidation
2. Stream -> Tree
Baseline properties:
-
Encoding: UTF8
-
StreamType: ByteArray
-
W3C XML 1.0
-
W3C XML namespace
-
W3C DOM
-
NoValidation
3. Event -> Tree
Baseline properties:
-
SAX2
-
W3C DOM
-
NoValidation
4. Event -> Stream
Baseline properties:
-
SAX2
-
Encoding: UTF8
-
StreamType: ByteArray
-
W3C XML 1.0
-
W3C XML namespace
-
NoValidation
5. Tree -> Event
Baseline properties:
-
W3C DOM
-
SAX2
-
NoValidation
6. Tree -> Stream
Baseline properties:
-
W3C DOM
-
Encoding: UTF8
-
StreamType: ByteArray
-
W3C XML 1.0
-
W3C XML namespace
-
NoValidation
7. Stream -> Stream
Baseline properties:
-
Encoding: UTF8
-
StreamType: ByteArray
-
W3C XML 1.0
-
W3C XML namespace
-
NoValidation
8. Tree -> Tree
Baseline properties:
9. Event -> Event
Baseline properties:
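The baselines for processes 1 to 7 can be sketched with the JAXP and SAX2 APIs bundled with Java; processes 8 and 9 have no baseline properties yet and are not covered. This is a minimal sketch under several assumptions: the class `Baselines` and the fixed event sequence in `emitSample` are hypothetical, the default `TransformerFactory` is assumed to also be a `SAXTransformerFactory` (as the JDK's bundled one is), and identity transforms stand in for whatever event and serializer code a real implementation uses.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.SAXParserFactory;
import javax.xml.transform.OutputKeys;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.dom.DOMResult;
import javax.xml.transform.dom.DOMSource;
import javax.xml.transform.sax.SAXResult;
import javax.xml.transform.sax.SAXTransformerFactory;
import javax.xml.transform.sax.TransformerHandler;
import javax.xml.transform.stream.StreamResult;
import javax.xml.transform.stream.StreamSource;
import org.w3c.dom.Document;
import org.xml.sax.ContentHandler;
import org.xml.sax.InputSource;
import org.xml.sax.XMLReader;
import org.xml.sax.helpers.AttributesImpl;

/** Hypothetical baseline runs: UTF-8 byte arrays, namespaces on, no validation. */
final class Baselines {
    // 1. Stream -> Event: parse UTF-8 bytes and deliver SAX2 events to the handler.
    static void streamToEvent(byte[] utf8, ContentHandler handler) throws Exception {
        SAXParserFactory f = SAXParserFactory.newInstance();
        f.setNamespaceAware(true);
        f.setValidating(false);
        XMLReader reader = f.newSAXParser().getXMLReader();
        reader.setContentHandler(handler);
        InputSource in = new InputSource(new ByteArrayInputStream(utf8));
        in.setEncoding("UTF-8");
        reader.parse(in);
    }
    // 2. Stream -> Tree: parse UTF-8 bytes into a W3C DOM Document.
    static Document streamToTree(byte[] utf8) throws Exception {
        DocumentBuilderFactory f = DocumentBuilderFactory.newInstance();
        f.setNamespaceAware(true);
        f.setValidating(false);
        return f.newDocumentBuilder().parse(new ByteArrayInputStream(utf8));
    }
    // Identity handler that turns SAX2 events into a Result. Output properties
    // are set before setResult(), since some implementations read them there.
    private static TransformerHandler identityHandler() throws Exception {
        SAXTransformerFactory f = (SAXTransformerFactory) TransformerFactory.newInstance();
        TransformerHandler h = f.newTransformerHandler();
        h.getTransformer().setOutputProperty(OutputKeys.ENCODING, "UTF-8");
        return h;
    }
    // Hypothetical fixed event sequence standing in for a recorded SAX2 source.
    static void emitSample(ContentHandler h) throws Exception {
        h.startDocument();
        h.startElement("", "order", "order", new AttributesImpl());
        h.startElement("", "item", "item", new AttributesImpl());
        h.endElement("", "item", "item");
        h.endElement("", "order", "order");
        h.endDocument(); // a stream serializer is expected to flush here, inside the timing
    }
    // 3. Event -> Tree
    static Document eventToTree() throws Exception {
        TransformerHandler h = identityHandler();
        DOMResult result = new DOMResult();
        h.setResult(result);
        emitSample(h);
        return (Document) result.getNode();
    }
    // 4. Event -> Stream
    static byte[] eventToStream() throws Exception {
        TransformerHandler h = identityHandler();
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        h.setResult(new StreamResult(out));
        emitSample(h);
        return out.toByteArray();
    }
    // Identity Transformer shared by the tree- and stream-based runs.
    private static Transformer identity() throws Exception {
        Transformer t = TransformerFactory.newInstance().newTransformer();
        t.setOutputProperty(OutputKeys.ENCODING, "UTF-8");
        return t;
    }
    // 5. Tree -> Event
    static void treeToEvent(Document doc, ContentHandler handler) throws Exception {
        identity().transform(new DOMSource(doc), new SAXResult(handler));
    }
    // 6. Tree -> Stream
    static byte[] treeToStream(Document doc) throws Exception {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        identity().transform(new DOMSource(doc), new StreamResult(out));
        return out.toByteArray();
    }
    // 7. Stream -> Stream: identity copy of the serialized document.
    static byte[] streamToStream(byte[] utf8) throws Exception {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        identity().transform(new StreamSource(new ByteArrayInputStream(utf8)),
                new StreamResult(out));
        return out.toByteArray();
    }
}
```

Factories are created inside each method for brevity; a benchmark would more likely create them once, outside the timed region, so that only the transformation itself is measured.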
A proof-of-concept implementation has been created; see the files
section for the bml-x.x.tar.gz file.
The JUnit testcases are found in the 'test' directory.
TODO: create test code and test cases based on unit testing with the
JUnit package.