This specification explains the CML dictionary mechanism and defines the minimum requirements for interoperability.
1. Introduction
1.1 Notational Conventions
1.2 Namespaces
2. The Dictionary Mechanism
2.1 Dictionary Namespace
3. Convention
4. Authoring Dictionaries
4.1 Dictionary Element
4.2 Entry Element
5. Standard Dictionaries
A. References
B. Acknowledgements
Dictionaries allow concepts beyond those defined by the core CML schema types to be understood by machines. Much of physical science can be managed by referencing appropriate dictionary entries, which can represent data types (e.g. string, integer, double), units (e.g. grams, Kelvin), physical properties (e.g. heat of formation, melting point) and other scientific concepts (e.g. point group).
Dictionaries also provide disambiguation between similarly named concepts. Different programs sometimes produce data with the same label but a different interpretation: does density mean electron density or physical density? This ambiguity can be resolved by referencing specific dictionary entries.
The keywords "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT", "SHOULD", "SHOULD NOT", "RECOMMENDED", "MAY", and "OPTIONAL" in this document are to be interpreted as described in RFC 2119 [IETF RFC 2119].
The terms "element", "attribute", "child" and "parent" in this document are to be interpreted as described in the W3C Recommendation for Extensible Markup Language (XML) [W3C XML].
This specification uses the following prefixes to denote namespaces:
Prefix | Namespace URI | Description |
---|---|---|
cml | http://www.xml-cml.org/schema | Chemical Markup Language elements |
convention | http://www.xml-cml.org/convention/ | Standard Chemical Markup Language convention namespace |
xhtml | http://www.w3.org/1999/xhtml | XHTML |
CML dictionaries consist of a collection of entries, each of which defines a single concept. Dictionaries are constructed using the dictionary element, and entries using the entry element.
Concepts defined by dictionary entries can be referenced from other locations using dictionary references, specified using dictRef attributes.
In order to be referenceable, a dictionary MUST specify a namespace using the namespace attribute, and entries MUST specify an ID using the id attribute, which MUST be unique within the scope of the entry's parent dictionary.
The value of the dictRef attribute MUST be a QName which expands to the namespace and id of the dictionary entry defining the concept being referenced.
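The QName expansion can be sketched in a few lines of Python. This is a minimal sketch, not a reference implementation: `expand_dict_ref` is a hypothetical helper, and it assumes the namespace bindings in scope where the dictRef appears have already been collected into a plain mapping, and that every dictRef carries a prefix.

```python
def expand_dict_ref(dict_ref: str, prefixes: dict[str, str]) -> tuple[str, str]:
    """Expand a prefixed dictRef such as "dummy:mw" into (namespace URI, entry id).

    Hypothetical helper: `prefixes` holds the prefix -> namespace URI bindings
    in scope where the dictRef attribute appears.
    """
    prefix, sep, local = dict_ref.partition(":")
    if not sep or not prefix or not local:
        raise ValueError(f"dictRef is not a prefixed QName: {dict_ref!r}")
    if prefix not in prefixes:
        raise ValueError(f"prefix {prefix!r} is not bound to a namespace")
    return prefixes[prefix], local


bindings = {"dummy": "http://www.xml-cml.org/dictionary/dummy/"}
print(expand_dict_ref("dummy:mw", bindings))
# ('http://www.xml-cml.org/dictionary/dummy/', 'mw')
```

The pair it returns is exactly what identifies an entry: the dictionary's namespace attribute plus the entry's id attribute.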
The value of an entry's id attribute MUST start with a letter, and MUST only contain letters, numbers, dot, hyphen or underscore.
IdStartChar ::= [A-Z] | [a-z]
IdChar      ::= IdStartChar | [0-9] | "." | "-" | "_"
Id          ::= IdStartChar (IdChar)*
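The Id production maps directly onto a regular expression. A minimal Python sketch, with `is_valid_entry_id` as a hypothetical helper name; note that `[A-Za-z]` limits "letters" to ASCII, following the grammar rather than any wider reading of the word.

```python
import re

# Id ::= IdStartChar (IdChar)*
# IdStartChar is an ASCII letter; IdChar adds digits, ".", "-" and "_".
# "-" is placed last in the character class so it is taken literally.
_ID = re.compile(r"[A-Za-z][A-Za-z0-9._-]*")


def is_valid_entry_id(value: str) -> bool:
    """Return True if `value` matches the Id production in full."""
    return _ID.fullmatch(value) is not None


for candidate in ["mw", "heat.formation_2-a", "2mw", "_mw", "m w", ""]:
    print(repr(candidate), is_valid_entry_id(candidate))
```

Using `fullmatch` rather than `match` matters here: it rejects values such as "mw!" whose prefix happens to be a valid id.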
A minimal CML dictionary:
<?xml version="1.0" encoding="UTF-8"?>
<dictionary xmlns="http://www.xml-cml.org/schema"
            namespace="http://www.xml-cml.org/dictionary/dummy/">
  <entry id="mw" term="Molecular Weight" />
</dictionary>
A CML property referencing a concept defined by a dictionary entry:
<?xml version="1.0" encoding="UTF-8"?>
<cml xmlns="http://www.xml-cml.org/schema"
     xmlns:xsd="http://www.w3.org/2001/XMLSchema"
     xmlns:dummy="http://www.xml-cml.org/dictionary/dummy/">
  <property dictRef="dummy:mw">
    <scalar dataType="xsd:double">72.02</scalar>
  </property>
</cml>
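To illustrate how the two examples fit together, the following Python sketch (standard library only) indexes the dictionary by (namespace, id) and resolves each dictRef in the CML document against that index. The helper names `index_dictionary` and `resolve_dict_refs` are hypothetical; the sketch tracks prefix declarations per element scope but does not handle unprefixed dictRef values or reject malformed ones.

```python
import io
import xml.etree.ElementTree as ET

CML_NS = "http://www.xml-cml.org/schema"

# The two example documents, as bytes so the XML declaration is honoured.
DICTIONARY_XML = b"""<?xml version="1.0" encoding="UTF-8"?>
<dictionary xmlns="http://www.xml-cml.org/schema"
            namespace="http://www.xml-cml.org/dictionary/dummy/">
  <entry id="mw" term="Molecular Weight" />
</dictionary>"""

CML_XML = b"""<?xml version="1.0" encoding="UTF-8"?>
<cml xmlns="http://www.xml-cml.org/schema"
     xmlns:xsd="http://www.w3.org/2001/XMLSchema"
     xmlns:dummy="http://www.xml-cml.org/dictionary/dummy/">
  <property dictRef="dummy:mw">
    <scalar dataType="xsd:double">72.02</scalar>
  </property>
</cml>"""


def index_dictionary(xml_bytes):
    """Map (dictionary namespace, entry id) -> entry element."""
    root = ET.fromstring(xml_bytes)
    namespace = root.get("namespace")
    return {(namespace, e.get("id")): e for e in root.iter(f"{{{CML_NS}}}entry")}


def resolve_dict_refs(xml_bytes, index):
    """Return (dictRef, term or None) for every element carrying a dictRef."""
    scopes = [{}]   # stack of in-scope prefix -> namespace URI bindings
    pending = {}    # bindings declared on the element about to start
    results = []
    events = ("start-ns", "start", "end")
    for event, item in ET.iterparse(io.BytesIO(xml_bytes), events=events):
        if event == "start-ns":
            prefix, uri = item
            pending[prefix] = uri
        elif event == "start":
            # Enter the element's scope: inherit the parent's bindings.
            scopes.append({**scopes[-1], **pending})
            pending = {}
            dict_ref = item.get("dictRef")
            if dict_ref is not None:
                prefix, _, local = dict_ref.partition(":")
                entry = index.get((scopes[-1].get(prefix), local))
                results.append((dict_ref, entry.get("term") if entry is not None else None))
        else:  # "end": leave the element's namespace scope
            scopes.pop()
    return results


print(resolve_dict_refs(CML_XML, index_dictionary(DICTIONARY_XML)))
```

Run as-is, this prints `[('dummy:mw', 'Molecular Weight')]`. Keying the index on the namespace URI rather than the prefix is deliberate: prefixes are chosen per document, so only the expanded pair is stable.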
A dictionary's namespace URI SHOULD resolve to a representation of the dictionary. A dictionary's namespace URI SHOULD end with either a '/' character or a '#' character so that referenced terms may be resolved by appending them to the URI.
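Resolution by appending amounts to string concatenation. In the hypothetical `entry_url` helper below, the appended part is assumed to be the entry's id (the local part of a dictRef such as dummy:mw), and the sketch chooses to reject namespaces that do not end in '/' or '#'; because the requirement is only a SHOULD, a more lenient tool might warn instead.

```python
def entry_url(namespace: str, entry_id: str) -> str:
    """Hypothetical helper: resolve an entry by appending its id to the namespace URI."""
    if not namespace.endswith(("/", "#")):
        # The requirement is a SHOULD; this sketch chooses to be strict.
        raise ValueError(f"namespace does not end with '/' or '#': {namespace!r}")
    return namespace + entry_id


print(entry_url("http://www.xml-cml.org/dictionary/dummy/", "mw"))
# http://www.xml-cml.org/dictionary/dummy/mw
```

With a '#'-terminated namespace the id becomes a fragment, so all entries resolve to the same document; with a '/'-terminated namespace each entry can resolve to its own resource.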
In order for dictionary namespaces to be resolvable, a CML document SHOULD NOT contain more than one dictionary.
The CML schema only loosely defines the structure of a dictionary. More detailed structure is imposed through use of conventions.
To promote interoperability between CML-aware software and dictionaries, all dictionaries SHOULD specify a convention using the convention attribute. The RECOMMENDED convention for a dictionary is http://www.xml-cml.org/convention/dictionary.
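A tool could check this recommendation when loading a dictionary. The Python sketch below is illustrative only: `check_convention` is a hypothetical helper, and it compares the attribute value literally against the full URI given above, so a value written as a prefixed name would need to be expanded first.

```python
import xml.etree.ElementTree as ET

# Recommended convention URI, as given in the text above.
RECOMMENDED_CONVENTION = "http://www.xml-cml.org/convention/dictionary"


def check_convention(dictionary_xml: bytes) -> list[str]:
    """Return warnings about the dictionary's convention attribute (empty if fine)."""
    root = ET.fromstring(dictionary_xml)
    convention = root.get("convention")
    if convention is None:
        return ["dictionary does not specify a convention (SHOULD)"]
    if convention != RECOMMENDED_CONVENTION:
        return [f"convention {convention!r} is not the recommended {RECOMMENDED_CONVENTION!r}"]
    return []


print(check_convention(b'<dictionary xmlns="http://www.xml-cml.org/schema"/>'))
```

Returning warnings rather than raising reflects that a missing or different convention is a SHOULD-level issue, not a hard error.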
A dictionary element SHOULD have a namespace attribute defining the scope within which the entry terms are unique.
A dictionary element MAY have a title attribute specifying a human-readable name for the dictionary.
A dictionary element MAY have a description child element specifying a human-readable description of the dictionary.
A dictionary element MAY have a dictionaryPrefix attribute specifying a default prefix to use when referencing dictionary entries.
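The dictionary-level metadata above can be gathered into a small record. In this Python sketch, `DictionaryInfo` and `read_dictionary_info` are hypothetical names, the description child is assumed to be in the CML namespace, and any markup inside it (for example XHTML) is flattened to plain text; the example title and prefix values are invented for illustration.

```python
from dataclasses import dataclass
from typing import Optional
import xml.etree.ElementTree as ET

CML_NS = "http://www.xml-cml.org/schema"


@dataclass
class DictionaryInfo:
    """Hypothetical container for the dictionary-level metadata."""
    namespace: Optional[str]          # namespace attribute (SHOULD)
    title: Optional[str]              # title attribute (MAY)
    description: Optional[str]        # text of the description child (MAY)
    dictionary_prefix: Optional[str]  # dictionaryPrefix attribute (MAY)


def read_dictionary_info(dictionary_xml: bytes) -> DictionaryInfo:
    root = ET.fromstring(dictionary_xml)
    # Assumes the description child is in the CML namespace; nested markup
    # is flattened by itertext().
    desc = root.find(f"{{{CML_NS}}}description")
    return DictionaryInfo(
        namespace=root.get("namespace"),
        title=root.get("title"),
        description="".join(desc.itertext()).strip() if desc is not None else None,
        dictionary_prefix=root.get("dictionaryPrefix"),
    )


example = b"""<dictionary xmlns="http://www.xml-cml.org/schema"
    namespace="http://www.xml-cml.org/dictionary/dummy/"
    title="Dummy dictionary" dictionaryPrefix="dummy">
  <description>Entries for testing purposes.</description>
</dictionary>"""
print(read_dictionary_info(example))
```

Optional fields stay `None` when absent, so callers can distinguish "not provided" from an empty value.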
An entry element SHOULD have an id attribute specifying a unique ID, within the scope of the dictionary, by which the entry can be referenced.
An entry element SHOULD have a term attribute specifying a unique noun phrase linguistically identifying the subject of the entry.
An entry element MAY have a child definition element containing a noun phrase defining the subject of the entry more verbosely than the term.
An entry element MAY have a child description element containing further information regarding the term.
An entry element MAY have a dataType attribute specifying the data type of values referencing the entry.
An entry element MAY have a unitType attribute specifying the unit type of values referencing the entry.
An entry element MAY have a units attribute specifying the units of values referencing the entry.
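The entry rules above suggest a simple reader plus a checker for the most important requirements. This Python sketch is an assumption-laden illustration, not part of the specification: `Entry`, `read_entries` and `entry_warnings` are hypothetical names, child elements are assumed to be in the CML namespace, and only missing ids, duplicate ids and missing terms are flagged.

```python
from dataclasses import dataclass
from typing import Optional
import xml.etree.ElementTree as ET

CML_NS = "http://www.xml-cml.org/schema"


@dataclass
class Entry:
    id: Optional[str]           # id attribute (SHOULD; MUST to be referenceable)
    term: Optional[str]         # term attribute (SHOULD)
    definition: Optional[str]   # text of the definition child (MAY)
    description: Optional[str]  # text of the description child (MAY)
    data_type: Optional[str]    # dataType attribute (MAY)
    unit_type: Optional[str]    # unitType attribute (MAY)
    units: Optional[str]        # units attribute (MAY)


def _child_text(entry: ET.Element, name: str) -> Optional[str]:
    """Flattened text of a CML-namespaced child element, or None if absent."""
    child = entry.find(f"{{{CML_NS}}}{name}")
    return "".join(child.itertext()).strip() if child is not None else None


def read_entries(dictionary_xml: bytes) -> list[Entry]:
    root = ET.fromstring(dictionary_xml)
    return [
        Entry(
            id=e.get("id"),
            term=e.get("term"),
            definition=_child_text(e, "definition"),
            description=_child_text(e, "description"),
            data_type=e.get("dataType"),
            unit_type=e.get("unitType"),
            units=e.get("units"),
        )
        for e in root.iter(f"{{{CML_NS}}}entry")
    ]


def entry_warnings(entries: list[Entry]) -> list[str]:
    """Flag missing ids, duplicate ids (must be unique) and missing terms."""
    warnings, seen = [], set()
    for n, entry in enumerate(entries):
        if entry.id is None:
            warnings.append(f"entry {n} has no id (SHOULD)")
        elif entry.id in seen:
            warnings.append(f"duplicate id {entry.id!r}")
        else:
            seen.add(entry.id)
        if entry.term is None:
            warnings.append(f"entry {entry.id or n} has no term (SHOULD)")
    return warnings


example = b"""<dictionary xmlns="http://www.xml-cml.org/schema"
    namespace="http://www.xml-cml.org/dictionary/dummy/">
  <entry id="mw" term="Molecular Weight" dataType="xsd:double">
    <definition>Example definition text.</definition>
  </entry>
  <entry id="mw" />
</dictionary>"""
entries = read_entries(example)
print(entries[0].term, entries[0].data_type)
print(entry_warnings(entries))
```

For the example, the checker reports the repeated id and the second entry's missing term; validating ids against the Id grammar could be added as a further check.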
Where concepts are defined by the CML schema they SHOULD NOT be redefined using the dictionary mechanism.
Where concepts are defined by standard dictionaries, those entries SHOULD be referenced rather than the concept being redefined in another dictionary.
Standard (widely used) dictionaries SHOULD be namespaced under http://www.xml-cml.org/dictionary/ and registered with the xml-cml.org site.
The following standard dictionaries are defined, or are expected to be defined:
For testing purposes: conceptually equivalent to http://example.com.