This specification defines the requirements of the Chemical Markup Language dictionary convention.
1. Introduction
1.1 Notational Conventions
1.2 Namespaces
2. Applying the dictionary convention
3. Dictionary Element
3.1 Namespace
3.2 Prefix
3.3 Title
3.4 Description
3.5 Entries
4. Entry Elements
4.1 Id
4.2 Term
4.3 Definition
4.4 Description
4.5 Data type
4.6 Unit type
4.7 Default units
5. Example Dictionary
A. References
B. Acknowledgements
1. Introduction
Dictionaries allow CML to be understood by machines. Much of physical science is managed through the dictionary mechanism: we identify terms and units relating to an aspect of science (such as heat of formation, melting point or point group) and create entries for these items in a dictionary.
An entry can consist of just a unique id (within the dictionary's namespace) and a human-readable definition, but we strongly encourage authors to give more information: for instance, the units, any upper and lower bounds, and the data type (string, integer, float, etc.).
Different programs sometimes produce data with the same label but a different interpretation: does density mean electron density or physical density? Therefore each computational chemistry code will have its own dictionary, and the community can then decide which concepts to group together.
Where concepts are defined by the CML schema they SHOULD NOT be redefined using the dictionary mechanism.
Where concepts are defined by standard dictionaries these entries SHOULD be referenced, rather than redefining the concept in another dictionary.
1.1 Notational Conventions
The keywords "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT", "SHOULD", "SHOULD NOT", "RECOMMENDED", "MAY", and "OPTIONAL" in this document are to be interpreted as described in RFC 2119 [IETF RFC 2119].
The terms "element", "attribute", "child" and "parent" in this document are to be interpreted as described in the W3C Recommendation for Extensible Markup Language (XML) [W3C XML].
1.2 Namespaces
This specification uses the following prefixes to denote namespaces:

| Prefix | Namespace URI | Description |
|---|---|---|
| cml | http://www.xml-cml.org/schema | Chemical Markup Language elements |
| convention | http://www.xml-cml.org/convention/ | Standard Chemical Markup Language convention namespace |
| xhtml | http://www.w3.org/1999/xhtml | XHTML |
2. Applying the dictionary convention
The dictionary convention MUST be specified using the convention
attribute on either a cml or a dictionary element.
If the convention is specified on a cml element then that element
MUST contain a single child element in the http://www.xml-cml.org/schema
namespace, which MUST be a dictionary element.
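The two placement rules above can be made concrete with a minimal Python sketch (standard library only) that locates the dictionary element a processor would work on. The function name `find_dictionary` is hypothetical. To keep the sketch short, it compares the literal attribute value `convention:dictionary` instead of resolving the QName prefix, so it assumes the `convention` prefix is bound as listed in section 1.2.

```python
import xml.etree.ElementTree as ET

CML = "{http://www.xml-cml.org/schema}"

def find_dictionary(root: ET.Element) -> ET.Element:
    """Return the dictionary element governed by the dictionary convention.

    Hypothetical helper. Simplification: the convention attribute is compared
    as a literal string instead of resolving its QName prefix.
    """
    if root.get("convention") != "convention:dictionary":
        raise ValueError("root element does not declare the dictionary convention")
    if root.tag == CML + "dictionary":
        return root
    if root.tag == CML + "cml":
        # Only children in the CML namespace are counted; others are ignored.
        cml_children = [child for child in root if child.tag.startswith(CML)]
        if len(cml_children) == 1 and cml_children[0].tag == CML + "dictionary":
            return cml_children[0]
        raise ValueError("a cml wrapper must contain exactly one CML child: a dictionary")
    raise ValueError(f"convention placed on unexpected element {root.tag}")
```

A processor could call this once on the parsed root and then apply the checks of sections 3 and 4 to the element it returns.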
3. Dictionary Element
3.1 Namespace
The dictionary element MUST have a namespace attribute,
the value of which MUST be a valid URI defining the scope within which the
entry terms are unique.
The dictionary's namespace URI SHOULD resolve to a representation of the dictionary.
The dictionary's namespace URI SHOULD end with either a '/' character or a '#' character
so that terms may be referenced by appending them to the URI.
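When the namespace ends in `/` or `#`, an entry's URI can be formed by plain string concatenation. The sketch below is written under that reading. `entry_uri` and `resolve_reference` are hypothetical helper names, and the caller supplies the prefix-to-namespace map.

```python
def entry_uri(namespace: str, entry_id: str) -> str:
    """Build the URI of an entry by appending its id to the dictionary namespace."""
    # The rule above is only a SHOULD; this sketch is stricter and refuses
    # namespaces that cannot be extended by simple concatenation.
    if not namespace.endswith(("/", "#")):
        raise ValueError(f"namespace {namespace!r} does not end with '/' or '#'")
    return namespace + entry_id


def resolve_reference(reference: str, prefixes: dict) -> str:
    """Expand a prefixed reference such as 'dummy:molecmass' to a full URI."""
    prefix, sep, local = reference.partition(":")
    if not sep:
        raise ValueError(f"{reference!r} has no prefix")
    return entry_uri(prefixes[prefix], local)
```

For example, `resolve_reference("dummy:molecmass", {"dummy": "http://www.xml-cml.org/dictionary/dummy/"})` returns `http://www.xml-cml.org/dictionary/dummy/molecmass`.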
3.2 Prefix
The dictionary element SHOULD have a dictionaryPrefix
attribute specifying the default prefix to use when referencing dictionary entries.
The dictionaryPrefix MUST be a valid XML QName prefix, and SHOULD be
unique within the CML domain.
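A check on a proposed dictionaryPrefix could look like the following sketch. The regular expression is a simplified, ASCII-only approximation of the NCName production from Namespaces in XML (the full production also admits many non-ASCII name characters). The `registered` set is a hypothetical stand-in for whatever record of prefixes already used in the CML domain is available; no registry API is assumed.

```python
import re

# ASCII-only approximation of an XML NCName (a QName prefix has this form).
NCNAME_ASCII = re.compile(r"[A-Za-z_][A-Za-z0-9._-]*")

def prefix_problems(prefix: str, registered: set) -> list:
    """List reasons why prefix is unsuitable as a dictionaryPrefix (empty if none)."""
    problems = []
    if NCNAME_ASCII.fullmatch(prefix) is None:
        problems.append(f"{prefix!r} is not a valid QName prefix")
    # 'registered' is a hypothetical collection of prefixes already in use.
    if prefix in registered:
        problems.append(f"{prefix!r} is already used by another dictionary")
    return problems
```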
3.3 Title
The dictionary element SHOULD have a title
attribute, the value of which is a human-readable title for the dictionary.
3.4 Description
The dictionary element SHOULD have a single description
child element, the contents of which provide a human-readable description of the
domain of the dictionary.
The description element MUST contain one or more child elements in the
http://www.w3.org/1999/xhtml namespace.
The description element MUST NOT contain any child elements not in
the http://www.w3.org/1999/xhtml namespace.
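The two content rules for description reduce to one test: the element has at least one child element, and every child element is in the XHTML namespace. Here is a minimal sketch. The function name is hypothetical, and text appearing between child elements is not inspected.

```python
import xml.etree.ElementTree as ET

XHTML = "{http://www.w3.org/1999/xhtml}"

def is_xhtml_container(element: ET.Element) -> bool:
    """True if element has at least one child element and every child is XHTML.

    Only child elements are inspected; stray text between them is not checked.
    """
    children = list(element)
    return bool(children) and all(child.tag.startswith(XHTML) for child in children)
```

The same test fits the definition element (section 4.3) and the entry-level description (section 4.4), which carry identical content rules.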
3.5 Entries
The dictionary element MUST contain one or more child entry
elements, and MUST NOT contain any child elements from the
http://www.xml-cml.org/schema namespace other than entry elements
and the description element described in section 3.4.
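A sketch of the entries rule follows. It treats the description child allowed by section 3.4 as permitted, as in the example dictionary, and ignores children from other namespaces. The function name is hypothetical.

```python
import xml.etree.ElementTree as ET

CML = "{http://www.xml-cml.org/schema}"
# description is permitted next to the entries (section 3.4).
ALLOWED_CHILDREN = {CML + "entry", CML + "description"}

def dictionary_child_problems(dictionary: ET.Element) -> list:
    """Report a missing entry and any disallowed CML child elements."""
    problems = []
    if dictionary.find(CML + "entry") is None:
        problems.append("dictionary contains no entry elements")
    for child in dictionary:
        # Children outside the CML namespace are not covered by this rule.
        if child.tag.startswith(CML) and child.tag not in ALLOWED_CHILDREN:
            problems.append(f"unexpected CML child element {child.tag}")
    return problems
```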
4. Entry Elements
4.1 Id
An entry element MUST have an id attribute, the
value of which MUST be unique within the scope of the dictionary.
The value of the id attribute MUST start with a letter, and
MUST only contain letters, numbers, dot, hyphen or underscore.
IdStartChar ::= [A-Z] | [a-z]
IdChar ::= IdStartChar | [0-9] | "." | "-" | "_"
Id ::= IdStartChar (IdChar)*
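The Id grammar maps directly onto a regular expression. The Python sketch below checks an id against it and reports ids that violate the uniqueness rule. The helper names are hypothetical. `fullmatch` is used so that the whole value must conform, not just a leading part of it.

```python
import re
from collections import Counter

# Id ::= IdStartChar (IdChar)*, where IdStartChar is [A-Za-z] and
# IdChar adds digits, ".", "-" and "_".
ID_PATTERN = re.compile(r"[A-Za-z][A-Za-z0-9._-]*")

def is_valid_id(value: str) -> bool:
    """True if value matches the Id production in full."""
    return ID_PATTERN.fullmatch(value) is not None

def duplicate_ids(ids) -> list:
    """Ids that occur more than once in one dictionary, in sorted order."""
    return sorted(i for i, count in Counter(ids).items() if count > 1)
```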
4.2 Term
An entry element MUST have a term attribute, the
value of which provides a unique noun phrase linguistically identifying
the subject of the entry.
The value of the term attribute MAY contain any valid Unicode
character; however, it is RECOMMENDED that any character outside
the ASCII subset (codepoints 32-127) be represented using an entity reference.
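When writing a term attribute as raw XML, one way to follow this recommendation is to replace every code point outside 32-127 with a numeric character reference such as `&#xC5;`. Strictly, these are character references rather than entity references. Unlike named entities such as `&Aring;`, they need no DTD. The helper name `escape_term` is hypothetical. It also escapes `&`, `<`, `>` and `"` because its output is meant to go inside a double-quoted attribute. Do not pass the output to ElementTree's `set()`, which escapes on its own.

```python
from xml.sax.saxutils import escape

def escape_term(term: str) -> str:
    """Return term as raw text for a double-quoted XML attribute value."""
    # escape() handles &, < and >; the mapping adds the double quote.
    text = escape(term, {'"': "&quot;"})
    # Replace every code point outside 32-127 with a hexadecimal character reference.
    return "".join(ch if 32 <= ord(ch) <= 127 else f"&#x{ord(ch):X};" for ch in text)
```

For example, `escape_term("\u00c5ngstr\u00f6m")` returns `&#xC5;ngstr&#xF6;m`.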
4.3 Definition
An entry element MUST contain a single definition
child element, the content of which provides a noun phrase defining the
subject of the entry more verbosely than the term.
The definition element MUST contain one or more child elements in the
http://www.w3.org/1999/xhtml namespace.
The definition element MUST NOT contain any child elements not in
the http://www.w3.org/1999/xhtml namespace.
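A check for the definition rules might look like the following Python sketch. `definition_problems` is a hypothetical name. It returns a list so that a validator can report every problem, and it only considers definition elements that are direct children of the entry.

```python
import xml.etree.ElementTree as ET

CML = "{http://www.xml-cml.org/schema}"
XHTML = "{http://www.w3.org/1999/xhtml}"

def definition_problems(entry: ET.Element) -> list:
    """Check that entry has exactly one definition holding only XHTML children."""
    definitions = entry.findall(CML + "definition")
    if len(definitions) != 1:
        return [f"expected exactly one definition, found {len(definitions)}"]
    children = list(definitions[0])
    if not children:
        return ["definition has no XHTML child elements"]
    # Any child element outside the XHTML namespace is reported individually.
    return [f"non-XHTML child {c.tag}" for c in children if not c.tag.startswith(XHTML)]
```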
4.4 Description
An entry element MAY have a single description
child element, the content of which provides further information regarding
the term, including, but not limited to: examples, human-readable semantics
and hyperlinks to other useful resources.
The description element MUST contain one or more child elements in the
http://www.w3.org/1999/xhtml namespace.
The description element MUST NOT contain any child elements not in
the http://www.w3.org/1999/xhtml namespace.
4.5 Data type
When applicable to the concept defined, an entry SHOULD have
a dataType attribute, the value of which is a QName
referencing the data type of any value defined using the entry,
for example xsd:string, xsd:double, xsd:integer or xsd:boolean.
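A consumer might use dataType to convert raw values, as in the sketch below. It is keyed by the literal QName, which assumes the `xsd` prefix is bound to http://www.w3.org/2001/XMLSchema as in the example dictionary. Python's `int` and `float` are slightly more permissive than the XSD lexical forms (for instance, they accept underscores between digits), so this is an approximation and not a schema validator. `parse_value` is a hypothetical name.

```python
def _parse_boolean(text: str) -> bool:
    value = text.strip()
    if value in ("true", "1"):
        return True
    if value in ("false", "0"):
        return False
    raise ValueError(f"not an xsd:boolean value: {text!r}")

# Keyed by literal QName; assumes xsd is bound to the XML Schema namespace.
PARSERS = {
    "xsd:string": lambda text: text,  # strings are kept exactly as written
    "xsd:double": lambda text: float(text.strip()),
    "xsd:integer": lambda text: int(text.strip()),
    "xsd:boolean": _parse_boolean,
}

def parse_value(text: str, data_type: str):
    """Convert a raw value to a Python object according to an entry's dataType."""
    parser = PARSERS.get(data_type)
    if parser is None:
        raise ValueError(f"no parser for dataType {data_type!r}")
    return parser(text)
```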
4.6 Unit type
An entry MUST have a unitType attribute, the value of which is a QName
referencing the unit type (e.g. temperature) of any value defined using
the entry.
An entry describing a concept that has no units, for example the name
of a program, should have the unitType none from the standard CML
unitType dictionary.
4.7 Default units
When applicable to the concept defined, an entry SHOULD have
a units attribute, the value of which is a QName
referencing the default units (e.g. kelvin) of any value defined using the
entry.
If the unitType is expressly given as unknown
then the units attribute MUST NOT be present.
If the unitType is expressly given as none
then the units attribute MUST be present and its value MUST point to
http://www.xml-cml.org/unit/si#none.
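Sections 4.6 and 4.7 together amount to a small consistency check. The sketch below assumes QNames have already been expanded to URIs. It also assumes the standard unitType dictionary is the http://www.xml-cml.org/unit/unitType/ namespace declared in the example dictionary, so `none` and `unknown` become that namespace plus the local name. Both that assumption and the function name are illustrative. Because units is otherwise only a SHOULD, a missing units attribute is not reported in other cases.

```python
from typing import List, Optional

# Assumed from the unitType namespace declared in the example dictionary.
UNIT_TYPE_NS = "http://www.xml-cml.org/unit/unitType/"
SI_NONE = "http://www.xml-cml.org/unit/si#none"

def unit_problems(unit_type: Optional[str], units: Optional[str]) -> List[str]:
    """Check the unitType/units rules of sections 4.6 and 4.7.

    Both arguments are fully expanded URIs, or None if the attribute is absent.
    """
    if unit_type is None:
        return ["entry has no unitType attribute"]
    if unit_type == UNIT_TYPE_NS + "unknown" and units is not None:
        return ["units must be absent when unitType is unknown"]
    if unit_type == UNIT_TYPE_NS + "none" and units != SI_NONE:
        return [f"units must be {SI_NONE} when unitType is none"]
    return []
```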
5. Example Dictionary
<?xml version="1.0" encoding="UTF-8" ?>
<dictionary xmlns="http://www.xml-cml.org/schema"
xmlns:convention="http://www.xml-cml.org/convention/"
xmlns:unit="http://www.xml-cml.org/unit/nonSi/"
xmlns:unitType="http://www.xml-cml.org/unit/unitType/"
xmlns:xhtml="http://www.w3.org/1999/xhtml"
xmlns:xsd="http://www.w3.org/2001/XMLSchema"
convention="convention:dictionary"
title="fundamental chemistry concepts"
namespace="http://www.xml-cml.org/dictionary/dummy/"
dictionaryPrefix="dummy">
<description>
<xhtml:p>
This is an example dictionary
</xhtml:p>
</description>
<entry id="molecmass" term="Molecular Mass"
dataType="xsd:double" unitType="unitType:amount" units="unit:dalton">
<definition>
<xhtml:p>
The mass of one molecule of a substance in unified atomic mass units (daltons).
</xhtml:p>
</definition>
<description>
<xhtml:p>
The molecular mass (m) of a substance is the mass of one molecule of that substance,
in unified atomic mass unit(s) u (equal to 1/12 the mass of one atom of the isotope
carbon-12). This is numerically equivalent to the relative molecular mass (Mr) of a
molecule, frequently referred to by the term molecular weight, which is the ratio of
the mass of that molecule to 1/12 of the mass of carbon-12 and is a dimensionless
number. Thus, it is incorrect to express relative molecular mass (molecular weight)
in daltons (Da). Unfortunately, the terms molecular weight and molecular mass have
been confused on numerous websites, which often state that molecular weight was used
in the past as another term for molecular mass.
</xhtml:p>
<xhtml:p>
Molecular mass differs from more common measurements of the mass of chemicals, such
as molar mass, by taking into account the isotopic composition of a molecule rather
than the average isotopic distribution of many molecules. As a result, molecular mass
is a more precise number than molar mass; however it is more accurate to use molar
mass on bulk samples. This means that molar mass is appropriate most of the time
except when dealing with single molecules.
</xhtml:p>
</description>
</entry>
<entry id="molarmass" term="Molar Mass"
dataType="xsd:double" unitType="unitType:amount" units="unit:dalton">
<definition>
<xhtml:p>
The mass per amount of substance.
</xhtml:p>
</definition>
<description>
<xhtml:p>
Molar mass, symbol M, is a physical property characteristic of a given substance
(chemical element or chemical compound), namely its mass per amount of substance.
The base SI unit for mass is the kilogram and that for amount of substance is
the mole. Thus, the derived unit for molar mass is kg/mol. However, for both
practical and historical reasons, molar masses are almost always quoted in grams
per mole (g/mol or g mol<xhtml:sup>−1</xhtml:sup>), especially in chemistry.
</xhtml:p>
<xhtml:p>
Molar mass is closely related to the relative molar mass (Mr) of a compound, the
older term formula weight and to the standard atomic masses of its constituent
elements. However, it should be distinguished from the molecular mass (also
known as molecular weight), which is the mass of one molecule (of any single
isotopic composition) and is not directly related to the atomic mass, the mass
of one atom (of any single isotope). The dalton, symbol Da, is also sometimes
used as a unit of molar mass, especially in biochemistry, with the definition
1 Da = 1 g/mol, despite the fact that it is strictly a unit of molecular mass
(1 Da = 1.660 538 782(83)×10<xhtml:sup>−27</xhtml:sup> kg).
</xhtml:p>
</description>
</entry>
</dictionary>
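To show how the pieces fit together, the sketch below reads a dictionary like the one above using Python's standard library. It maps each entry's full URI (namespace plus id) to its term and its expanded units URI. It is illustrative only: `load_entries` is a hypothetical name, all namespace declarations are assumed to sit on the root element, and attribute QNames are assumed to carry a prefix.

```python
import io
import xml.etree.ElementTree as ET

CML = "{http://www.xml-cml.org/schema}"

def load_entries(xml_text: str) -> dict:
    """Map each entry's full URI to its term and expanded units URI.

    Simplification: every namespace declaration is assumed to be made once,
    on the root element, so a flat prefix-to-URI map is enough.
    """
    prefixes = {}
    root = None
    source = io.BytesIO(xml_text.encode("utf-8"))
    for event, item in ET.iterparse(source, events=("start-ns", "start")):
        if event == "start-ns":
            prefix, uri = item
            prefixes[prefix] = uri
        elif root is None:
            root = item  # the first start event belongs to the root element

    def expand(qname):
        # Assumes a prefixed QName such as "unit:dalton".
        if qname is None:
            return None
        prefix, _, local = qname.partition(":")
        return prefixes[prefix] + local

    namespace = root.get("namespace")
    return {
        namespace + entry.get("id"): {
            "term": entry.get("term"),
            "units": expand(entry.get("units")),
        }
        for entry in root.iter(CML + "entry")
    }
```

For the example dictionary, the molar mass entry is keyed by `http://www.xml-cml.org/dictionary/dummy/molarmass`, and its units expand to `http://www.xml-cml.org/unit/nonSi/dalton`.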