This specification defines the requirements of the Chemical Markup Language dictionary convention.
1. Introduction
1.1 Notational Conventions
1.2 Namespaces
2. Applying the dictionary convention
3. Dictionary Element
3.1 Namespace
3.2 Prefix
3.3 Title
3.4 Description
3.5 Entries
4. Entry Elements
4.1 Id
4.2 Term
4.3 Definition
4.4 Description
4.5 Data type
4.6 Unit type
4.7 Default units
5. Example Dictionary
A. References
B. Acknowledgements
Dictionaries allow CML to be understood by machines. Much of physical science is managed through the dictionary mechanism. We find terms and units relating to an aspect of science (such as heat of formation, melting point, point group) and create entries for these items in a dictionary.
An entry can consist of just a unique id (within the dictionary's namespace) and a human-understandable definition, but we strongly encourage providing more information: for instance, the units, any upper and lower bounds, and the type of the data (string, integer, float, etc.).
Different programs sometimes produce data with the same label but a different interpretation: does density mean electron density or physical density? Therefore each computational chemistry code will have its own dictionary, and the community can then decide to group particular concepts together.
Where concepts are defined by the CML schema they SHOULD NOT be redefined using the dictionary mechanism.
Where concepts are defined by standard dictionaries these entries SHOULD be referenced, rather than redefining the concept in another dictionary.
The keywords "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT", "SHOULD", "SHOULD NOT", "RECOMMENDED", "MAY", and "OPTIONAL" in this document are to be interpreted as described in RFC 2119 [IETF RFC 2119].
The terms "element", "attribute", "child" and "parent" in this document are to be interpreted as described in the W3C Recommendation for Extensible Markup Language (XML) [W3C XML].
The use of fonts is as follows:
this font.
This specification uses the following namespaces and prefixes to indicate those namespaces:
Prefix | Namespace URI | Description |
---|---|---|
cml | http://www.xml-cml.org/schema | Chemical Markup Language elements |
convention | http://www.xml-cml.org/convention/ | Standard Chemical Markup Language convention namespace |
xhtml | http://www.w3.org/1999/xhtml | XHTML |
The dictionary convention MUST be specified using the convention attribute on either a cml or a dictionary element.
If the convention is specified on a cml element then that element MUST contain a single child element in the http://www.xml-cml.org/schema namespace, which MUST be a dictionary element.
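The two rules above can be checked mechanically. The following is a minimal sketch, assuming Python's standard xml.etree.ElementTree module. The helper name find_dictionary is hypothetical, and for simplicity it compares the convention attribute with the literal string convention:dictionary (the form used in the example dictionary) instead of resolving the prefix against the in-scope namespace declarations.

```python
import xml.etree.ElementTree as ET

CML_NS = "http://www.xml-cml.org/schema"


def find_dictionary(root):
    """Return the dictionary element the convention applies to, or None.

    Assumption: the convention attribute uses the literal value
    "convention:dictionary"; the prefix is not resolved here.
    """
    declares_convention = root.get("convention") == "convention:dictionary"
    if not declares_convention:
        return None
    if root.tag == f"{{{CML_NS}}}dictionary":
        return root
    if root.tag == f"{{{CML_NS}}}cml":
        # Only children in the CML namespace count towards the "single child" rule.
        cml_children = [c for c in root if c.tag.startswith(f"{{{CML_NS}}}")]
        if len(cml_children) == 1 and cml_children[0].tag == f"{{{CML_NS}}}dictionary":
            return cml_children[0]
    return None
```

A caller would parse a document with ET.fromstring or ET.parse and pass the root element; a return value of None means the convention was not applied in one of the two permitted ways.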
The dictionary element MUST have a namespace attribute, the value of which MUST be a valid URI defining the scope within which the entry terms are unique.
The dictionary's namespace URI SHOULD resolve to a representation of the dictionary.
The dictionary's namespace URI SHOULD end with either a '/' character or a '#' character so that terms may be referenced by appending them to the URI.
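As an illustration of why the trailing '/' or '#' matters, the sketch below builds a reference to an entry by simple concatenation. The function name entry_uri is hypothetical, and the sketch treats the SHOULD as a hard requirement purely to keep the example short.

```python
def entry_uri(namespace, entry_id):
    """Build a URI for an entry by appending its id to the dictionary namespace.

    Assumption: a namespace without a trailing '/' or '#' is rejected here,
    although the specification only says SHOULD.
    """
    if not namespace.endswith(("/", "#")):
        raise ValueError("namespace URI should end with '/' or '#'")
    return namespace + entry_id


# Namespace and id taken from the example dictionary in this specification.
print(entry_uri("http://www.xml-cml.org/dictionary/dummy/", "molecmass"))
```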
The dictionary element SHOULD have a dictionaryPrefix attribute specifying the default prefix to use when referencing dictionary entries.
The dictionaryPrefix MUST be a valid XML QName prefix, and SHOULD be unique within the CML domain.
The dictionary element SHOULD have a title attribute intended for human-readability.
The dictionary element SHOULD have a single description child element, the contents of which provide a human-readable description of the domain of the dictionary.
The description element MUST contain one or more child elements in the http://www.w3.org/1999/xhtml namespace.
The description element MUST NOT contain any child elements not in the http://www.w3.org/1999/xhtml namespace.
The dictionary element MUST contain one or more child entry elements, and MUST NOT contain any other child elements from the http://www.xml-cml.org/schema namespace.
An entry element MUST have an id attribute, the value of which MUST be unique within the scope of the dictionary.
The value of the id attribute MUST start with a letter, and MUST only contain letters, numbers, dot, hyphen or underscore.
IdStartChar ::= [A-Z] | [a-z]
IdChar ::= IdStartChar | [0-9] | "." | "-" | "_"
Id ::= IdStartChar (IdChar)*
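The Id production translates directly into a regular expression. The following is a small sketch using Python's re module; the names ID_PATTERN, is_valid_id and duplicate_ids are hypothetical helpers, not part of CML.

```python
import re
from collections import Counter

# IdStartChar is [A-Za-z]; IdChar adds digits, dot, hyphen and underscore.
ID_PATTERN = re.compile(r"[A-Za-z][A-Za-z0-9._-]*")


def is_valid_id(value):
    """Return True if value matches the Id production in full."""
    return ID_PATTERN.fullmatch(value) is not None


def duplicate_ids(ids):
    """Return the ids that occur more than once, sorted alphabetically."""
    return sorted(i for i, count in Counter(ids).items() if count > 1)
```

Running is_valid_id over every entry id and then duplicate_ids over the full list covers both the syntax rule and the uniqueness rule for a single dictionary.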
An entry element MUST have a term attribute, the value of which provides a unique nounal phrase linguistically identifying the subject of the entry.
The value of the term attribute MAY contain any valid Unicode character; however, it is RECOMMENDED that any character from outside of the ASCII subset (codepoints 32-127) is represented using an entity reference.
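One way to follow this recommendation when generating a dictionary is sketched below. It relies on the xmlcharrefreplace error handler from Python's standard codecs, which emits numeric character references (such as &#197;); the function name ascii_safe_term is hypothetical. Note that this only handles non-ASCII characters: markup characters such as & and < still need the usual XML attribute escaping.

```python
def ascii_safe_term(term):
    """Replace every non-ASCII character with a numeric character reference."""
    return term.encode("ascii", "xmlcharrefreplace").decode("ascii")


print(ascii_safe_term("\u00c5ngstr\u00f6m"))  # prints &#197;ngstr&#246;m
```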
An entry element MUST contain a single definition child element, the content of which provides a nounal phrase defining the subject of the entry more verbosely than the term.
The definition element MUST contain one or more child elements in the http://www.w3.org/1999/xhtml namespace.
The definition element MUST NOT contain any child elements not in the http://www.w3.org/1999/xhtml namespace.
An entry element MAY have a single description child element, the content of which provides further information regarding the term, including, but not limited to: examples, human-readable semantics and hyperlinks to other useful resources.
The description element MUST contain one or more child elements in the http://www.w3.org/1999/xhtml namespace.
The description element MUST NOT contain any child elements not in the http://www.w3.org/1999/xhtml namespace.
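The same pair of constraints applies to both definition and description elements: at least one child, and every child in the XHTML namespace. A minimal sketch of that check, assuming Python's standard xml.etree.ElementTree module and a hypothetical helper name has_only_xhtml_children, could look like this:

```python
import xml.etree.ElementTree as ET

XHTML_NS = "http://www.w3.org/1999/xhtml"


def has_only_xhtml_children(element):
    """Return True if element has one or more children, all in the XHTML namespace."""
    children = list(element)
    if not children:
        return False
    # ElementTree stores namespaced tags in the form "{namespace}localname".
    return all(child.tag.startswith("{" + XHTML_NS + "}") for child in children)
```

The function inspects direct children only, which is all the two rules require.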
When applicable to the concept defined, an entry SHOULD have a dataType attribute, the value of which is a QName referencing the data type of any value defined using the entry.
xsd:string
xsd:double
xsd:integer
xsd:boolean
An entry MUST have a unitType attribute, the value of which is a QName referencing the unit type (e.g. temperature) of any value defined using the entry.
An entry describing a concept that has no units (for example, the name of a program) should have the unitType none from the standard CML unitType dictionary.
When applicable to the concept defined, an entry SHOULD have a units attribute, the value of which is a QName referencing the default units (e.g. Kelvin) of any value defined using the entry.
If the unitType is expressly given as unknown then the units attribute MUST NOT be present.
If the unitType is expressly given as none then the units attribute MUST be present and its value must point to http://www.xml-cml.org/unit/si#none.
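The interaction between unitType and units can be expressed as a small consistency check. The sketch below is illustrative only and makes two assumptions that are not stated in this specification: that the unknown and none unit types live in the http://www.xml-cml.org/unit/unitType/ namespace used by the example dictionary, and that QNames are resolved by appending the local name to the namespace bound to the prefix. The names resolve_qname and unit_errors, and the si prefix in the usage lines, are hypothetical.

```python
# Assumption: "none" and "unknown" are entries in this unitType namespace.
UNIT_TYPE_NS = "http://www.xml-cml.org/unit/unitType/"
SI_NONE = "http://www.xml-cml.org/unit/si#none"


def resolve_qname(qname, prefixes):
    """Resolve "prefix:local" to a URI by appending local to the prefix's namespace."""
    prefix, separator, local = qname.partition(":")
    if not separator:
        raise ValueError("expected a prefixed QName, got " + repr(qname))
    return prefixes[prefix] + local


def unit_errors(unit_type_uri, units_uri):
    """Return rule violations for one entry; units_uri is None if units is absent."""
    errors = []
    if unit_type_uri == UNIT_TYPE_NS + "unknown" and units_uri is not None:
        errors.append("units MUST NOT be present when unitType is unknown")
    if unit_type_uri == UNIT_TYPE_NS + "none" and units_uri != SI_NONE:
        errors.append("units MUST be present and point to " + SI_NONE)
    return errors


prefixes = {"unitType": UNIT_TYPE_NS, "si": "http://www.xml-cml.org/unit/si#"}
unit_type = resolve_qname("unitType:none", prefixes)
print(unit_errors(unit_type, resolve_qname("si:none", prefixes)))  # prints []
```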
<?xml version="1.0" encoding="UTF-8" ?>
<dictionary xmlns="http://www.xml-cml.org/schema"
            xmlns:convention="http://www.xml-cml.org/convention/"
            xmlns:unit="http://www.xml-cml.org/unit/nonSi/"
            xmlns:unitType="http://www.xml-cml.org/unit/unitType/"
            xmlns:xhtml="http://www.w3.org/1999/xhtml"
            xmlns:xsd="http://www.w3.org/2001/XMLSchema"
            convention="convention:dictionary"
            title="fundamental chemistry concepts"
            namespace="http://www.xml-cml.org/dictionary/dummy/"
            dictionaryPrefix="dummy">
  <description>
    <xhtml:p>
      This is an example dictionary
    </xhtml:p>
  </description>
  <entry id="molecmass" term="Molecular Mass" dataType="xsd:double"
         unitType="unitType:amount" units="unit:dalton">
    <definition>
      <xhtml:p>
        The mass of one mole of a substance in unified atomic mass units (Dalton).
      </xhtml:p>
    </definition>
    <description>
      <xhtml:p>
        The molecular mass (m) of a substance is the mass of one molecule of that
        substance, in unified atomic mass unit(s) u (equal to 1/12 the mass of one
        atom of the isotope carbon-12). This is numerically equivalent to the
        relative molecular mass (Mr) of a molecule, frequently referred to by the
        term molecular weight, which is the ratio of the mass of that molecule to
        1/12 of the mass of carbon-12 and is a dimensionless number. Thus, it is
        incorrect to express relative molecular mass (molecular weight) in daltons
        (Da). Unfortunately, the terms molecular weight and molecular mass have been
        confused on numerous websites, which often state that molecular weight was
        used in the past as another term for molecular mass.
      </xhtml:p>
      <xhtml:p>
        Molecular mass differs from more common measurements of the mass of
        chemicals, such as molar mass, by taking into account the isotopic
        composition of a molecule rather than the average isotopic distribution of
        many molecules. As a result, molecular mass is a more precise number than
        molar mass; however it is more accurate to use molar mass on bulk samples.
        This means that molar mass is appropriate most of the time except when
        dealing with single molecules.
      </xhtml:p>
    </description>
  </entry>
  <entry id="molarmass" term="Molar Mass" dataType="xsd:double"
         unitType="unitType:amount" units="unit:dalton">
    <definition>
      <xhtml:p>
        The mass per amount of substance.
      </xhtml:p>
    </definition>
    <description>
      <xhtml:p>
        Molar mass, symbol M, is a physical property characteristic of a given
        substance (chemical element or chemical compound), namely its mass per
        amount of substance. The base SI unit for mass is the kilogram and that for
        amount of substance is the mole. Thus, the derived unit for molar mass is
        kg/mol. However, for both practical and historical reasons, molar masses are
        almost always quoted in grams per mole (g/mol or g mol−1), especially in
        chemistry.
      </xhtml:p>
      <xhtml:p>
        Molar mass is closely related to the relative molar mass (Mr) of a compound,
        the older term formula weight and to the standard atomic masses of its
        constituent elements. However, it should be distinguished from the molecular
        mass (also known as molecular weight), which is the mass of one molecule (of
        any single isotopic composition) and is not directly related to the atomic
        mass, the mass of one atom (of any single isotope). The dalton, symbol Da, is
        also sometimes used as a unit of molar mass, especially in biochemistry, with
        the definition 1 Da = 1 g/mol, despite the fact that it is strictly a unit of
        molecular mass (1 Da = 1.660 538 782(83)×10−27 kg).
      </xhtml:p>
    </description>
  </entry>
</dictionary>