This specification defines the requirements of the Chemical Markup Language Molecular convention.
This document describes the concepts which are introduced in the molecular convention, explains how to compose a document that conforms to the molecular convention and illustrates these with examples.
The molecular convention is used to specify chemistry relating to molecules: for example connection tables, formulae, names and properties. Molecules can also contain spectra, although these have their own conventions depending on the type of spectrum (infrared, NMR, etc.).
Where the behaviour of an element or attribute is completely explained by the schema, it is not elaborated further in this document; typically in these cases an entry only states whether the node is required, suggested or optional.
Except where expressly forbidden, the convention allows users to include other cml elements and attributes, as well as elements and attributes from foreign namespaces. In general, tools are expected to silently ignore this extra information because they will not be able to interpret it.
The keywords "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT", "SHOULD", "SHOULD NOT", "RECOMMENDED", "MAY", and "OPTIONAL" in this document are to be interpreted as described in RFC 2119 [ IETF RFC 2119 ].
The terms "element", "attribute", "child" and "parent" in this document are to be interpreted as described in the W3C Recommendation for Extensible Markup Language (XML) [ W3C XML ].
This specification uses the following namespaces and prefixes to indicate those namespaces:
| Prefix | Namespace URI | Description |
|---|---|---|
| cml | http://www.xml-cml.org/schema | Chemical Markup Language elements |
| convention | http://www.xml-cml.org/convention/ | Standard Chemical Markup Language convention namespace |
The molecular convention MUST be specified using the convention attribute
on a molecule element or a cml element. The value of the attribute
MUST be a QName that represents the molecular convention, i.e. convention:molecular.
If the molecular convention is specified on a cml element then that element
MUST have at least one child molecule element that either has no convention specified or
specifies the molecular convention.
For example, the molecular convention can be specified on a molecule element:
<molecule xmlns="http://www.xml-cml.org/schema"
xmlns:convention="http://www.xml-cml.org/convention/"
convention="convention:molecular" id="m1">
<!-- body is omitted. -->
</molecule>
It can also be specified on a cml element that contains a child molecule:
<cml xmlns="http://www.xml-cml.org/schema"
xmlns:convention="http://www.xml-cml.org/convention/"
convention="convention:molecular">
<molecule id="m1">
<!-- body is omitted. -->
</molecule>
</cml>
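The following is not valid, because the cml element specifies the molecular convention but contains no child molecule:
<cml xmlns="http://www.xml-cml.org/schema" xmlns:convention="http://www.xml-cml.org/convention/" convention="convention:molecular"> </cml>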
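The following is not valid, because the convention is specified on a formula element rather than on a molecule or cml element:
<formula xmlns="http://www.xml-cml.org/schema"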
xmlns:convention="http://www.xml-cml.org/convention/"
convention="convention:molecular">
<!-- body is omitted. -->
</formula>
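The placement rules above can be checked mechanically. The following is a minimal sketch using Python's standard xml.etree.ElementTree; the function name check_convention_placement is hypothetical and not part of any CML tool. Because ElementTree does not resolve prefixes inside attribute values, the sketch compares the literal string convention:molecular and so assumes the convention prefix is bound as in the namespace table.

```python
import xml.etree.ElementTree as ET

CML = "{http://www.xml-cml.org/schema}"


def check_convention_placement(xml_text):
    """Return a list of violations of the convention:molecular placement rules.

    Assumption: the QName value uses the "convention" prefix literally,
    since ElementTree does not resolve prefixes in attribute values.
    """
    root = ET.fromstring(xml_text)
    problems = []
    for elem in root.iter():
        if elem.get("convention") != "convention:molecular":
            continue
        if elem.tag not in (CML + "molecule", CML + "cml"):
            problems.append(f"convention specified on {elem.tag}")
        elif elem.tag == CML + "cml":
            # A cml element needs a molecule child with no convention or the same one.
            suitable = [
                mol for mol in elem.findall(CML + "molecule")
                if mol.get("convention") in (None, "convention:molecular")
            ]
            if not suitable:
                problems.append("cml element has no suitable molecule child")
    return problems
```

Applied to the two invalid examples above, this reports one problem each, while both valid examples produce an empty list.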
Within the molecular convention, a molecule is REQUIRED to be
a child of either cml or molecule elements.
A molecule in the molecular convention is used to hold any combination of:
- names: a chemical name held in a name element, a label held in a label element, or an inline representation held in the inline attribute of a formula element;
- a formula: either a concise attribute on a formula element, or an explicit listing of the atoms in an atomArray that is a child of either a molecule or a formula;
- a connection table: atoms connected by bonds. By definition a molecule is a connected set, so hydrochloric acid (bonded) would be a single molecule, whilst H+ Cl- could be represented as a parent molecule containing two child molecules (one containing H+, the other containing Cl-).
A molecule element MUST have an id attribute, the
value of which MUST be unique amongst the molecules within the scope of the document.
The value of the id attribute MUST start with a letter, and
MUST only contain letters, numbers, dot, hyphen or underscore.
IdStartChar ::= [A-Z] | [a-z]
IdChar ::= IdStartChar | [0-9] | "." | "-" | "_"
Id ::= IdStartChar (IdChar)*
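The Id production maps directly onto a regular expression. A minimal Python sketch, assuming ASCII-only letters and digits exactly as the grammar states; is_valid_id and duplicate_ids are illustrative names, and the duplicate check assumes the caller has already collected the ids that share one scope (here, all molecule ids in a document).

```python
import re

# Id ::= IdStartChar (IdChar)*
# IdStartChar ::= [A-Z] | [a-z]
# IdChar ::= IdStartChar | [0-9] | "." | "-" | "_"
ID_PATTERN = re.compile(r"[A-Za-z][A-Za-z0-9._-]*")


def is_valid_id(value):
    """Return True if the whole of value matches the Id production."""
    return ID_PATTERN.fullmatch(value) is not None


def duplicate_ids(values):
    """Return the ids that occur more than once, in first-seen order."""
    seen, dupes = set(), []
    for v in values:
        if v in seen and v not in dupes:
            dupes.append(v)
        seen.add(v)
    return dupes
```

fullmatch is used rather than match so that trailing characters such as a space cannot slip through.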
A molecule that is a child of another molecule MUST have a
count attribute specified. The value of this attribute MUST be a non-negative
number.
In the following example, each child molecule has a non-negative count:
<cml:molecule convention="convention:molecular" id="parentMol">
<cml:molecule id="childMol1" count="1">
<!-- body is omitted. -->
</cml:molecule>
<cml:molecule id="childMol2" count="0.5">
<!-- body is omitted. -->
</cml:molecule>
</cml:molecule>
The following is not valid, because the child molecule has no count attribute:
<cml:molecule convention="convention:molecular" id="parentMol">
<cml:molecule id="childMol1">
<!-- body is omitted. -->
</cml:molecule>
</cml:molecule>
A molecule that is not a child of another molecule MUST NOT have a count attribute. The following is therefore not valid:
<cml:molecule convention="convention:molecular" id="parentMol" count="2">
<!-- body is omitted. -->
</cml:molecule>
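Both count rules depend on whether a molecule's parent is itself a molecule, which ElementTree does not expose directly. A minimal Python sketch, assuming a child-to-parent map is built first; check_molecule_counts is a hypothetical name, and treating any value float() accepts as a "number" is an assumption about how the schema's number type is parsed.

```python
import xml.etree.ElementTree as ET

MOLECULE = "{http://www.xml-cml.org/schema}molecule"


def check_molecule_counts(root):
    """Check the count rules for every molecule element under root."""
    # ElementTree has no parent pointers, so build a child -> parent map.
    parent = {child: p for p in root.iter() for child in p}
    problems = []
    for mol in root.iter(MOLECULE):
        label = mol.get("id")
        up = parent.get(mol)
        count = mol.get("count")
        if up is not None and up.tag == MOLECULE:
            if count is None:
                problems.append(f"{label}: child molecule without count")
                continue
            try:
                ok = float(count) >= 0
            except ValueError:
                ok = False
            if not ok:
                problems.append(f"{label}: count {count!r} is not non-negative")
        elif count is not None:
            problems.append(f"{label}: count on a molecule that is not a child")
    return problems
```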
A molecule MAY contain a single atomArray child except when it
contains child molecules.
The following is not valid, because the molecule contains both a child molecule and an atomArray:
<cml:molecule convention="convention:molecular" id="parentMol">
<cml:molecule id="childMol1" count="1">
<!-- body is omitted. -->
</cml:molecule>
<cml:atomArray>
<!-- body is omitted. -->
</cml:atomArray>
</cml:molecule>
A molecule MAY contain a single bondArray child provided
that it does not contain child molecules.
A molecule MAY contain any number of formula children.
A molecule MAY contain any number of property children.
A molecule MAY contain any number of label children.
A molecule MAY contain any number of name children.
A molecule SHOULD have a formalCharge attribute specified.
For molecules that have child molecules, the value of the formalCharge attribute
SHOULD equal the sum, over all child molecules, of each child's formalCharge multiplied
by its count.
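As a worked example of this rule, a parent holding two H+ children (count 2, formalCharge 1) and one sulfate child (count 1, formalCharge -2) should carry formalCharge 0, since (2 x 1) + (1 x -2) = 0. A minimal Python sketch of the check follows; the function names are hypothetical, and treating a missing formalCharge on a child as 0 is an assumption, not something this convention states.

```python
import xml.etree.ElementTree as ET

MOLECULE = "{http://www.xml-cml.org/schema}molecule"


def expected_formal_charge(mol):
    """Sum of formalCharge * count over the direct child molecules of mol.

    Assumption: a child with no formalCharge contributes 0, and a child with
    no count (already invalid under this convention) is counted once.
    """
    return sum(
        float(child.get("formalCharge", "0")) * float(child.get("count", "1"))
        for child in mol.findall(MOLECULE)
    )


def formal_charge_consistent(mol, tolerance=1e-9):
    """True unless mol has child molecules and a mismatching formalCharge."""
    declared = mol.get("formalCharge")
    if declared is None or mol.find(MOLECULE) is None:
        return True  # the SHOULD rule above does not apply
    return abs(float(declared) - expected_formal_charge(mol)) <= tolerance
```

Floats are used because count may be fractional (for example 0.5), so the product need not be an integer.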
The value of the formalCharge attribute on a molecule that does not
contain child molecules is less well defined. In general, the molecule's value is more
important than those of the individual atoms (e.g. a cyclopentadienyl anion
would have formalCharge="-1" on the molecule but would not necessarily have a formalCharge
attribute on any of its atoms).
A molecule SHOULD have a spinMultiplicity attribute specified.
A molecule MAY have a chirality attribute specified.
A molecule MAY have child spectrum elements. Each spectrum
element MUST use the convention attribute to specify the convention to which it
conforms.
An atomArray element MUST be a child of either a molecule or a
formula element. The atomArray is simply a container for atoms.
An atomArray element MUST contain at least one child atom element.
A bondArray element MUST be a child of a molecule element.
The bondArray is simply a container for bonds.
A bondArray element MUST contain at least one child bond element.
A formula element MUST be the child of either molecule or
formula elements.
A formula MUST have at least one of an atomArray child, a concise attribute
and an inline attribute.
A formula that is a child of another formula MUST have a
count attribute specified. The value of this attribute MUST be a non-negative
number.
In the following example, the formula nested inside another formula has a count attribute:
<cml:molecule convention="convention:molecular" id="ml">
<cml:formula>
<cml:formula count="1">
<!-- body is omitted. -->
</cml:formula>
</cml:formula>
</cml:molecule>
A formula that is not a child of a formula element MUST NOT
have count specified.
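The following is not valid, because the formula is a direct child of the molecule but has a count attribute:
<cml:molecule convention="convention:molecular" id="ml">
<cml:formula count="1">
<!-- body is omitted. -->
</cml:formula>
</cml:molecule>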
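The count rule and the "at least one of atomArray, concise or inline" rule for formulae can be checked together. A minimal Python sketch using ElementTree, with check_formulas as a hypothetical name; it does not validate the count value itself or the formula's parent element.

```python
import xml.etree.ElementTree as ET

CML = "{http://www.xml-cml.org/schema}"


def check_formulas(root):
    """Check the count and content rules for every formula under root."""
    parent = {child: p for p in root.iter() for child in p}
    problems = []
    for formula in root.iter(CML + "formula"):
        up = parent.get(formula)
        nested = up is not None and up.tag == CML + "formula"
        if nested and formula.get("count") is None:
            problems.append("nested formula without count")
        elif not nested and formula.get("count") is not None:
            problems.append("formula with count that is not nested in a formula")
        # A formula needs an atomArray child, a concise attribute or an inline attribute.
        if (formula.find(CML + "atomArray") is None
                and formula.get("concise") is None
                and formula.get("inline") is None):
            problems.append("formula without atomArray, concise or inline")
    return problems
```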
A formula element MAY contain a single atomArray element.
A formula element SHOULD have a concise attribute if possible, i.e.
if it can be calculated from the atoms in the formula's atomArray or
potentially from the parent molecule's atoms.
The concise attribute
is used to hold an unstructured formula, i.e. one without submolecules. The schema defines the
allowed pattern for the concise attribute.
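When the concise value is calculated from atoms, as suggested above, the core step is counting element types. A minimal Python sketch, assuming a space-separated "symbol count" layout with an optional trailing integer charge, and assuming Hill ordering (C first, then H, then the rest alphabetically); neither the ordering nor the function name concise_formula comes from this specification, and the schema's pattern remains authoritative.

```python
from collections import Counter


def concise_formula(element_types, charge=0):
    """Build a concise-style formula string from a list of element symbols.

    Assumptions: Hill ordering (C first, then H, then the other elements
    alphabetically; purely alphabetical when there is no carbon), and an
    integer charge appended at the end only when it is non-zero.
    """
    counts = Counter(element_types)
    if "C" in counts:
        order = ["C"] + (["H"] if "H" in counts else [])
        order += sorted(el for el in counts if el not in ("C", "H"))
    else:
        order = sorted(counts)
    parts = [f"{el} {counts[el]}" for el in order]
    if charge:
        parts.append(str(charge))
    return " ".join(parts)
```

For example, a single H atom with charge 1 gives "H 1 1". Atoms carrying a count attribute would need to be expanded (or their counts added) before calling this function.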
A formula element MAY have an inline attribute.
The inline attribute
can be used to hold any information. There is no fixed way to specify markup, but
LaTeX style is recommended, e.g. H_{3}O^{+} to represent the
hydroxonium ion.
A property element is used to wrap a scalar and define to what
the scalar value relates.
A property MUST have a single scalar child that gives the value
of the property.
A property MUST have a dictRef attribute,
the value of which is a QName referencing an entry in a dictionary
which defines how this property should be interpreted.
It is RECOMMENDED that property elements have a title attribute
intended for human-readability.
The title attribute MUST NOT be empty and MUST contain at least one non-whitespace character.
The value of the title attribute MAY contain any valid Unicode character; however, it is RECOMMENDED that any character outside the ASCII subset (code points 32-127) is represented using an entity reference.
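A minimal Python sketch of the title rules, assuming numeric character references (such as &#233;) are an acceptable form of the escaping recommended above. The name escape_title is hypothetical; the output is intended to be written into a double-quoted attribute in the XML text directly, not passed to a serializer that would escape the "&" a second time.

```python
from xml.sax.saxutils import escape


def escape_title(title):
    """Validate a property title and escape it for a double-quoted attribute.

    "&", "<", ">" and '"' are escaped first; any remaining non-ASCII
    character then becomes a numeric character reference.
    """
    if not title.strip():
        raise ValueError("title must contain at least one non-whitespace character")
    escaped = escape(title, {'"': "&quot;"})
    return escaped.encode("ascii", "xmlcharrefreplace").decode("ascii")
```

Escaping the markup characters before the ASCII conversion matters: doing it afterwards would turn "&#233;" into "&amp;#233;".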
A scalar MUST have
a units attribute, the value of which is a QName
referencing the units of the value defined
using the scalar.
A scalar element MUST have
a dataType attribute, the value of which is a QName
referencing the data type of the value defined.
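The property and scalar requirements above are all structural, so they can be checked without resolving any dictionary. A minimal Python sketch using ElementTree; check_property is a hypothetical name, and it only checks that the QName attributes are present, not that they refer to real dictionary or unit entries.

```python
import xml.etree.ElementTree as ET

CML = "{http://www.xml-cml.org/schema}"


def check_property(prop):
    """Check one property element against the property and scalar rules."""
    problems = []
    if prop.get("dictRef") is None:
        problems.append("property without dictRef")
    title = prop.get("title")
    if title is not None and not title.strip():
        problems.append("empty title")
    scalars = prop.findall(CML + "scalar")
    if len(scalars) != 1:
        problems.append(f"expected one scalar child, found {len(scalars)}")
    for scalar in scalars:
        for attr in ("units", "dataType"):
            if scalar.get(attr) is None:
                problems.append(f"scalar without {attr}")
    return problems
```

In a quick test, QNames such as ex:molwt and ex:gPerMol would be hypothetical placeholders rather than real dictionary entries.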
The semantics of the label are not defined in the schema, but labels are normally
standard or semi-standard text strings in common use.
A label MUST have a dictRef attribute,
the value of which is a QName referencing an entry in a dictionary
which defines how this label should be interpreted.
A name element contains a string that is the chemical name of the molecule.
The name does not need to be a structural chemical name. It is RECOMMENDED that formatting
and foreign (non-ASCII) characters are encoded using LaTeX-style markup.
A name MUST have a dictRef attribute,
the value of which is a QName referencing an entry in a dictionary
which defines how this name should be interpreted.
An atom MUST be a child of atomArray.
An atom MUST have an elementType attribute.
An atom MUST have an id attribute unless it is part of an
atomArray in a formula, in which case the id is optional.
In the following example, the atom in the molecule's atomArray has an id:
<cml:molecule convention="convention:molecular" id="ml" formalCharge="1">
<cml:atomArray>
<cml:atom elementType="H" id="a1" formalCharge="1"/>
</cml:atomArray>
</cml:molecule>
The following is not valid, because the atom in the molecule's atomArray has no id:
<cml:molecule convention="convention:molecular" id="ml" formalCharge="1">
<cml:atomArray>
<cml:atom elementType="H" formalCharge="1"/>
</cml:atomArray>
</cml:molecule>
The following is valid, because the atom without an id is part of an atomArray in a formula:
<cml:molecule convention="convention:molecular" id="ml" formalCharge="1">
<cml:formula concise="H 1 1">
<cml:atomArray>
<cml:atom elementType="H" formalCharge="1" />
</cml:atomArray>
</cml:formula>
</cml:molecule>
The value of the id MUST be unique amongst the atoms within the
eldest containing molecule.
The value of the id attribute MUST start with a letter, and
MUST only contain letters, numbers, dot, hyphen or underscore.
IdStartChar ::= [A-Z] | [a-z]
IdChar ::= IdStartChar | [0-9] | "." | "-" | "_"
Id ::= IdStartChar (IdChar)*
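Checking atom ids requires knowing, for each atom, its eldest (outermost) containing molecule and whether it sits inside a formula. A minimal Python sketch that walks up a child-to-parent map; check_atom_ids is a hypothetical name, and it reads the uniqueness rule literally, so atoms inside a formula that do carry ids share the same scope as the molecule's other atoms.

```python
import xml.etree.ElementTree as ET

CML = "{http://www.xml-cml.org/schema}"


def check_atom_ids(root):
    """Check atom id presence, and uniqueness within the eldest molecule."""
    parent = {child: p for p in root.iter() for child in p}
    used = {}  # eldest containing molecule -> atom ids seen so far
    problems = []
    for atom in root.iter(CML + "atom"):
        # Walk up the tree, keeping the outermost molecule seen and noting
        # whether the atom sits inside a formula.
        eldest, in_formula, node = None, False, parent.get(atom)
        while node is not None:
            if node.tag == CML + "molecule":
                eldest = node
            elif node.tag == CML + "formula":
                in_formula = True
            node = parent.get(node)
        atom_id = atom.get("id")
        if atom_id is None:
            if not in_formula:
                problems.append("atom without id outside a formula")
            continue
        ids = used.setdefault(eldest, set())
        if atom_id in ids:
            problems.append(f"duplicate atom id {atom_id}")
        ids.add(atom_id)
    return problems
```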
An atom MAY have an x2 attribute, the value of which is used for displaying the
object in two dimensions. This is unrelated to the 3-D coordinates of the object.
If an x2 attribute is present there MUST also be a y2 attribute.
An atom MAY have a y2 attribute, the value of which is used for displaying the
object in two dimensions. This is unrelated to the 3-D coordinates of the object.
If a y2 attribute is present there MUST also be an x2 attribute.
An atom MAY have an x3 attribute, the value of which is the x
coordinate of a three-dimensional object. The units are Angstrom and the axis system is always
right-handed.
If an x3 attribute is present there MUST also be y3 and z3 attributes.
An atom MAY have a y3 attribute, the value of which is the y
coordinate of a three-dimensional object. The units are Angstrom and the axis system is always
right-handed.
If a y3 attribute is present there MUST also be x3 and z3 attributes.
An atom MAY have a z3 attribute, the value of which is the z
coordinate of a three-dimensional object. The units are Angstrom and the axis system is always
right-handed.
If a z3 attribute is present there MUST also be x3 and y3 attributes.
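The pairing rules above amount to "all or none" within each coordinate group. A minimal Python sketch operating on an atom's attribute mapping (for example element.attrib from ElementTree); check_coordinates is a hypothetical name, and it does not check that the values are numeric.

```python
def check_coordinates(attrs):
    """Check that 2-D and 3-D coordinates on an atom come as complete sets.

    attrs is an atom's attribute mapping, e.g. element.attrib.
    """
    problems = []
    for group in (("x2", "y2"), ("x3", "y3", "z3")):
        present = [name for name in group if name in attrs]
        if present and len(present) != len(group):
            missing = [name for name in group if name not in attrs]
            problems.append(f"{', '.join(present)} without {', '.join(missing)}")
    return problems
```

The two groups are checked independently, since 2-D display coordinates are unrelated to 3-D coordinates.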
An atom MAY have a formalCharge attribute.
An atom MAY have an isotopeNumber attribute.
An atom MAY have a spinMultiplicity attribute.
An atom that is a descendant of a formula MAY have a
count attribute. If it does not have a count attribute
it is assumed to be present only once.
An atom that is not a descendant of a formula MAY have an
atomParity element child.
An atom MAY contain any number of label children.
An atom MAY contain any number of property children.
An atomParity element MUST be the child of an atom. The atomParity
defines the stereochemistry around an atom centre.
An atomParity MUST have an atomRefs4 attribute, the value of which MUST
be the space separated ids of four different atoms which MUST be in the same overall parent
molecule as the atomParity.
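A minimal Python sketch of the atomRefs4 checks, assuming "the same overall parent molecule" means the outermost molecule that contains the atomParity; that reading, and the name check_atom_parity, are assumptions. It does not check what the parity value itself means.

```python
import xml.etree.ElementTree as ET

CML = "{http://www.xml-cml.org/schema}"


def check_atom_parity(root):
    """Check atomRefs4 on every atomParity under root."""
    parent = {child: p for p in root.iter() for child in p}
    problems = []
    for parity in root.iter(CML + "atomParity"):
        # Find the outermost molecule containing this atomParity.
        eldest, node = None, parent.get(parity)
        while node is not None:
            if node.tag == CML + "molecule":
                eldest = node
            node = parent.get(node)
        known = set()
        if eldest is not None:
            known = {atom.get("id") for atom in eldest.iter(CML + "atom")}
        refs = parity.get("atomRefs4", "").split()
        if len(refs) != 4 or len(set(refs)) != 4:
            problems.append("atomRefs4 must list four different atom ids")
        problems += [f"atomRefs4 refers to unknown atom {r}" for r in refs if r not in known]
    return problems
```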
A bond element MUST be the child of a bondArray. In the molecular
convention a bond MUST be between exactly two atoms, and these atoms
MUST (by definition) have the same molecule parent.
A bond MUST have an atomRefs2 attribute, the value of which
MUST be the space-separated ids of two different atoms which
MUST be in the same molecule.
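Because both atoms of a bond must share the bond's molecule parent, atomRefs2 can be resolved against that molecule's own atomArray alone. A minimal Python sketch using ElementTree path expressions; check_bonds is a hypothetical name, and it checks one molecule at a time, so nested molecules must be passed in individually.

```python
import xml.etree.ElementTree as ET

CML = "{http://www.xml-cml.org/schema}"


def check_bonds(molecule):
    """Check atomRefs2 on the bonds of one molecule element.

    Only this molecule's own atomArray is searched, because the two atoms
    of a bond MUST share the same molecule parent.
    """
    atom_ids = {a.get("id") for a in molecule.findall(f"{CML}atomArray/{CML}atom")}
    problems = []
    for bond in molecule.findall(f"{CML}bondArray/{CML}bond"):
        refs = bond.get("atomRefs2", "").split()
        if len(refs) != 2 or refs[0] == refs[1]:
            problems.append(f"bond {bond.get('id')}: needs two different atoms")
        problems += [
            f"bond {bond.get('id')}: {r} is not an atom of molecule {molecule.get('id')}"
            for r in refs if r not in atom_ids
        ]
    return problems
```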
A bond MUST have an order attribute.
It is RECOMMENDED that the value of order is not numeric (for example, D rather than 2). If the
value is other, the bond SHOULD have a dictRef attribute
to add further information.
It is RECOMMENDED that a bond has an id attribute so that it can
be referenced. The id of a bond MUST be unique amongst the bonds
of the eldest containing molecule.
The value of the id attribute MUST start with a letter, and
MUST only contain letters, numbers, dot, hyphen or underscore.
IdStartChar ::= [A-Z] | [a-z]
IdChar ::= IdStartChar | [0-9] | "." | "-" | "_"
Id ::= IdStartChar (IdChar)*
A bond MAY have a bondStereo element child.
A bond MAY have any number of label children.
The bondStereo element MUST be a child of a bond. bondStereo
is a container used primarily to support cis (C), trans (T), wedge (W) and hatch (H)
stereochemistry, but other forms may also be supported.
If the value of the bondStereo is W or H, there MUST
be an atomRefs2 attribute present, the value of which MUST be the space-separated
ids of the two atoms in the parent bond. The order of the
ids is important: the first is the sharp end of the wedge or hatch and the second is
the blunt end.
If an atomRefs2 attribute is present there MUST NOT be an atomRefs4
attribute present.
If the value of the bondStereo is C or T, there MUST
be an atomRefs4 attribute present, the value of which MUST be the space-separated
ids of four different atoms. Two of the ids in the
atomRefs4 MUST be the ids of the atoms in the parent
bond.
The four atoms in atomRefs4 define a torsion: if cis, it will be syn-periplanar; if trans,
it will be anti-periplanar. Typically the two central atoms will be bonded to each other
(and the bondStereo element will be a child of this bond) with a bond order of D,
and the two terminal atoms will be bonded directly to these; however, this does not have to
be the case.
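When 3-D coordinates are available, the torsion defined by atomRefs4 can be computed and compared with a C or T value. A minimal Python sketch using the standard atan2 form of the dihedral angle; the function names are hypothetical, and the 90 degree cut-off between C and T is an assumption for illustration, not a threshold defined by this convention.

```python
import math


def _sub(a, b):
    return [a[i] - b[i] for i in range(3)]


def _cross(a, b):
    return [a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0]]


def _dot(a, b):
    return sum(a[i] * b[i] for i in range(3))


def dihedral_degrees(p0, p1, p2, p3):
    """Torsion angle p0-p1-p2-p3 in degrees, in the range -180 to 180.

    Points are (x3, y3, z3) tuples for the four atoms in atomRefs4 order.
    """
    b0, b1, b2 = _sub(p1, p0), _sub(p2, p1), _sub(p3, p2)
    n1, n2 = _cross(b0, b1), _cross(b1, b2)  # normals of the two planes
    unit_b1 = [c / math.sqrt(_dot(b1, b1)) for c in b1]
    m1 = _cross(n1, unit_b1)  # lies in the first plane, perpendicular to b1
    return math.degrees(math.atan2(_dot(m1, n2), _dot(n1, n2)))


def cis_or_trans(p0, p1, p2, p3):
    """Classify the torsion as "C" or "T" with a 90 degree cut-off (an assumption)."""
    return "C" if abs(dihedral_degrees(p0, p1, p2, p3)) < 90 else "T"
```

A syn-periplanar arrangement gives an angle near 0 and an anti-periplanar one an angle near 180 (in magnitude), which is why only the absolute value is compared.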
If an atomRefs4 attribute is present there MUST NOT be an atomRefs2
attribute present.
If the value of the bondStereo is other, the element MUST have a
dictRef attribute to add further semantics.
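The bondStereo rules above can be checked against the parent bond in a few lines. A minimal Python sketch, assuming the stereo value (W, H, C, T or other) is the text content of the bondStereo element; check_bond_stereo is a hypothetical name. The wedge and hatch ordering (sharp end first) cannot be verified from the ids alone, so only the set of ids is compared.

```python
import xml.etree.ElementTree as ET


def check_bond_stereo(stereo, bond):
    """Check a bondStereo element against the atomRefs and dictRef rules.

    Assumes the stereo value is the element's text content; bond is the
    parent bond element.
    """
    value = (stereo.text or "").strip()
    refs2, refs4 = stereo.get("atomRefs2"), stereo.get("atomRefs4")
    bond_atoms = bond.get("atomRefs2", "").split()
    problems = []
    if refs2 is not None and refs4 is not None:
        problems.append("atomRefs2 and atomRefs4 are both present")
    if value in ("W", "H"):
        if refs2 is None or sorted(refs2.split()) != sorted(bond_atoms):
            problems.append("W/H needs atomRefs2 with the two atoms of the bond")
    elif value in ("C", "T"):
        ids = (refs4 or "").split()
        if len(ids) != 4 or len(set(ids)) != 4 or not set(bond_atoms) <= set(ids):
            problems.append("C/T needs atomRefs4 with four atoms including the bond's")
    elif value == "other" and stereo.get("dictRef") is None:
        problems.append("other needs a dictRef")
    return problems
```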
This work is licensed under a Creative Commons
Attribution 3.0 Unported License.