This specification defines the requirements of the Chemical Markup Language Molecular convention.
This document describes the concepts which are introduced in the molecular convention, explains how to compose a document that conforms to the molecular convention and illustrates these with examples.
The molecular convention is used to specify chemistry relating to molecules, for example connection tables, formulae, names and properties. Molecules can also contain spectra, though these have their own conventions depending on the type of spectrum (infrared, NMR, etc.).
Where the behaviour of an element or attribute is completely explained by the schema it is not further elaborated on in this document; typically in these cases an entry will only state whether the node is required, suggested or optional.
Except where they are expressly forbidden, the convention allows users to optionally include both other cml elements and attributes, and foreign namespaced elements and attributes. It is expected that in general tools will silently ignore the extra information because they will not be able to understand it.
The keywords "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT", "SHOULD", "SHOULD NOT", "RECOMMENDED", "MAY", and "OPTIONAL" in this document are to be interpreted as described in RFC 2119 [ IETF RFC 2119 ].
The terms "element", "attribute", "child" and "parent" in this document are to be interpreted as described in the W3C Recommendation for Extensible Markup Language (XML) [ W3C XML ].
The use of fonts is as follows: names taken from the schema (e.g. dictRef) are written in a distinct font.
This specification uses the following namespaces and prefixes to indicate those namespaces:
Prefix | Namespace URI | Description |
---|---|---|
cml | http://www.xml-cml.org/schema | Chemical Markup Language elements |
convention | http://www.xml-cml.org/convention/ | Standard Chemical Markup Language convention namespace |
The molecular convention MUST be specified using the convention attribute on a molecule element or a cml element. The value of the attribute MUST be a QName that represents the molecular convention, i.e. convention:molecular.
If the molecular convention is specified on a cml element then that element MUST have at least one child molecule element that either has no convention specified or specifies the molecular convention.
Valid: the convention is specified on a molecule element.
<molecule xmlns="http://www.xml-cml.org/schema" xmlns:convention="http://www.xml-cml.org/convention/" convention="convention:molecular" id="m1"> <!-- body is omitted. --> </molecule>
Valid: the convention is specified on a cml element that has a child molecule.
<cml xmlns="http://www.xml-cml.org/schema" xmlns:convention="http://www.xml-cml.org/convention/" convention="convention:molecular"> <molecule id="m1"> <!-- body is omitted. --> </molecule> </cml>
Invalid: the cml element has no child molecule.
<cml xmlns="http://www.xml-cml.org/schema" xmlns:convention="http://www.xml-cml.org/convention/" convention="convention:molecular"> </cml>
Invalid: the convention is specified on an element that is neither a molecule nor a cml element.
<formula xmlns="http://www.xml-cml.org/schema" xmlns:convention="http://www.xml-cml.org/convention/" convention="convention:molecular"> <!-- body is omitted. --> </formula>
Within the molecular convention, a molecule element is REQUIRED to be a child of either a cml or a molecule element.
A molecule in the molecular convention is used to hold any combination of:
- an identifier (a name held in a name element, a label held in a label element or an inline representation in an inline attribute on a formula element).
- the atomic composition (given either by a concise attribute on a formula element, or by specifically listing the atoms in an atomArray that is a child of either a molecule or a formula).
- a connection table of atoms, connected by bonds. By definition a molecule is a connected set, therefore hydrochloric acid (bonded) would be a single molecule, whilst H+ Cl- could be represented as a parent molecule containing two child molecules (one containing H+, the other containing Cl-).
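The H+ Cl- case described above could be written along the following lines. This is a sketch only: the id values and the decision to put formalCharge on both the molecules and the atoms are illustrative choices, not requirements taken from this section.

```xml
<cml:molecule xmlns:cml="http://www.xml-cml.org/schema"
              xmlns:convention="http://www.xml-cml.org/convention/"
              convention="convention:molecular" id="hcl.ions" formalCharge="0">
  <!-- each ion is a separate connected set, so each is its own child molecule -->
  <cml:molecule id="cation" count="1" formalCharge="1">
    <cml:atomArray>
      <cml:atom id="a1" elementType="H" formalCharge="1"/>
    </cml:atomArray>
  </cml:molecule>
  <cml:molecule id="anion" count="1" formalCharge="-1">
    <cml:atomArray>
      <cml:atom id="a2" elementType="Cl" formalCharge="-1"/>
    </cml:atomArray>
  </cml:molecule>
</cml:molecule>
```

The parent holds no atomArray of its own; the atoms live only in the child molecules.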
A molecule element MUST have an id attribute, the value of which MUST be unique amongst the molecules within the scope of the document.
The value of the id attribute MUST start with a letter, and MUST only contain letters, numbers, dot, hyphen or underscore.
IdStartChar ::= [A-Z] | [a-z]
IdChar ::= IdStartChar | [0-9] | "." | "-" | "_"
Id ::= IdStartChar (IdChar)*
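As an illustration of the Id production, the fragment below shows two id values that match it; the values themselves are made up for this sketch.

```xml
<cml:cml xmlns:cml="http://www.xml-cml.org/schema"
         xmlns:convention="http://www.xml-cml.org/convention/"
         convention="convention:molecular">
  <!-- both ids start with a letter and use only letters, digits, dot, hyphen and underscore -->
  <cml:molecule id="m1"> <!-- body is omitted. --> </cml:molecule>
  <cml:molecule id="ring.2-a_b"> <!-- body is omitted. --> </cml:molecule>
  <!-- values such as "1m" (starts with a digit) or "m 1" (contains a space) would not match -->
</cml:cml>
```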
A molecule that is a child of another molecule MUST have a count attribute specified. The value of this attribute MUST be a non-negative number.
Valid: each child molecule specifies a count.
<cml:molecule convention="convention:molecular" id="parentMol"> <cml:molecule id="childMol1" count="1"> <!-- body is omitted. --> </cml:molecule> <cml:molecule id="childMol2" count="0.5"> <!-- body is omitted. --> </cml:molecule> </cml:molecule>
Invalid: the child molecule does not specify a count.
<cml:molecule convention="convention:molecular" id="parentMol"> <cml:molecule id="childMol1"> <!-- body is omitted. --> </cml:molecule> </cml:molecule>
A molecule that is not a child of another molecule MUST NOT have a count specified.
Invalid: a molecule that is not a child of another molecule specifies a count.
<cml:molecule convention="convention:molecular" id="parentMol" count="2"> <!-- body is omitted. --> </cml:molecule>
A molecule MAY contain a single atomArray child except when it contains child molecules.
Invalid: the molecule contains both a child molecule and an atomArray.
<cml:molecule convention="convention:molecular" id="parentMol"> <cml:molecule id="childMol1" count="1"> <!-- body is omitted. --> </cml:molecule> <cml:atomArray> <!-- body is omitted. --> </cml:atomArray> </cml:molecule>
A molecule MAY contain a single bondArray child provided that it does not contain child molecules.
A molecule MAY contain any number of formula children.
A molecule MAY contain any number of property children.
A molecule MAY contain any number of label children.
A molecule MAY contain any number of name children.
A molecule SHOULD have a formalCharge attribute specified.
For molecules that have child molecules, the value of the formalCharge SHOULD be equal to the sum of the formalCharges of the child molecules, each multiplied by the count value of that molecule.
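To make the arithmetic concrete, here is a hedged sketch for calcium chloride written as ions. The ids are hypothetical; the parent charge follows from (+2 x 1) + (-1 x 2) = 0.

```xml
<cml:molecule xmlns:cml="http://www.xml-cml.org/schema"
              xmlns:convention="http://www.xml-cml.org/convention/"
              convention="convention:molecular" id="cacl2" formalCharge="0">
  <!-- one calcium cation: contributes +2 x 1 -->
  <cml:molecule id="ca" count="1" formalCharge="2">
    <cml:atomArray>
      <cml:atom id="a1" elementType="Ca" formalCharge="2"/>
    </cml:atomArray>
  </cml:molecule>
  <!-- two chloride anions: contribute -1 x 2 -->
  <cml:molecule id="cl" count="2" formalCharge="-1">
    <cml:atomArray>
      <cml:atom id="a2" elementType="Cl" formalCharge="-1"/>
    </cml:atomArray>
  </cml:molecule>
</cml:molecule>
```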
The value of the formalCharge attribute on a molecule that does not contain child molecules is less well defined. In general its value is more important than that of the individual atoms (e.g. a cyclopentadienyl anion would have formalCharge="-1" on the molecule but would not necessarily have a formalCharge attribute on any of the atoms).
A molecule SHOULD have a spinMultiplicity attribute specified.
A molecule MAY have a chirality attribute specified.
A molecule MAY have child spectrum elements. Each spectrum element MUST use the convention attribute to specify the convention to which it conforms.
An atomArray element MUST be a child of either a molecule or a formula element. The atomArray is simply a container for atoms.
An atomArray element MUST contain at least one child atom element.
A bondArray element MUST be a child of a molecule element. The bondArray is simply a container for bonds.
A bondArray element MUST contain at least one child bond element.
A formula element MUST be the child of either a molecule or a formula element.
A formula MUST have at least one of an atomArray child, a concise attribute or an inline attribute.
A formula that is a child of another formula MUST have a count attribute specified. The value of this attribute MUST be a non-negative number.
Valid: the nested formula specifies a count.
<cml:molecule convention="convention:molecular" id="ml"> <cml:formula> <cml:formula count="1"> <!-- body is omitted. --> </cml:formula> </cml:formula> </cml:molecule>
A formula that is not a child of a formula element MUST NOT have count specified.
Invalid: a formula that is not a child of another formula specifies a count.
<cml:molecule convention="convention:molecular" id="ml"> <cml:formula count="1"> <!-- body is omitted. --> </cml:formula> </cml:molecule>
A formula element MAY contain a single atomArray element.
A formula element SHOULD have a concise attribute if possible, i.e. if it can be calculated from the atoms in the formula's atomArray or potentially from the parent molecule's atoms.
The concise attribute is used to hold an (unstructured) formula, i.e. one with no submolecules. The schema defines the allowed pattern for the concise attribute.
A formula element MAY have an inline attribute.
The inline attribute can be used to hold any information. There is no fixed way for markup to be specified but it is recommended that LaTeX style is used, e.g. H_{3}O^{+} to represent the hydroxonium ion.
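A formula carrying both attributes might look like the sketch below. The concise value is an assumption: it follows the element, count and trailing charge layout of the concise="H 1 1" example used elsewhere in this document, and the schema remains the authority on the allowed pattern.

```xml
<cml:molecule xmlns:cml="http://www.xml-cml.org/schema"
              xmlns:convention="http://www.xml-cml.org/convention/"
              convention="convention:molecular" id="hydroxonium" formalCharge="1">
  <!-- concise: machine-readable composition; inline: LaTeX style text for display -->
  <cml:formula concise="H 3 O 1 1" inline="H_{3}O^{+}"/>
</cml:molecule>
```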
A property element is used to wrap a scalar and define to what the scalar value relates.
A property MUST have a single scalar child that gives the value of the property.
A property MUST have a dictRef attribute, the value of which is a QName referencing an entry in a dictionary which defines how this property should be interpreted.
It is RECOMMENDED that property elements have a title attribute intended for human-readability.
The title attribute MUST NOT be empty and MUST contain at least one non-whitespace character.
The value of the title attribute MAY contain any valid Unicode character, however it is RECOMMENDED that any character from outside of the ASCII subset (codepoints 32-127) is represented using an entity reference.
A scalar MUST have a units attribute, the value of which is a QName referencing the units of the value defined using the scalar.
A scalar element MUST have a dataType attribute, the value of which is a QName referencing the data type of the value defined.
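The property and scalar rules combine as in the following sketch. The ex and exUnits namespaces and the entries ex:meltingPoint and exUnits:celsius are hypothetical placeholders, not real dictionaries; the use of an XML Schema type for dataType is likewise an assumption.

```xml
<cml:molecule xmlns:cml="http://www.xml-cml.org/schema"
              xmlns:convention="http://www.xml-cml.org/convention/"
              xmlns:xsd="http://www.w3.org/2001/XMLSchema"
              xmlns:ex="http://example.org/dictionary/"
              xmlns:exUnits="http://example.org/units/"
              convention="convention:molecular" id="water">
  <!-- dictRef says what the value means; title is for human readers -->
  <cml:property dictRef="ex:meltingPoint" title="melting point">
    <!-- exactly one scalar child, with both units and dataType given as QNames -->
    <cml:scalar dataType="xsd:double" units="exUnits:celsius">0.0</cml:scalar>
  </cml:property>
</cml:molecule>
```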
The semantics of a label are not defined in the schema; labels are normally commonly used standard or semi-standard text strings.
A label MUST have a dictRef attribute, the value of which is a QName referencing an entry in a dictionary which defines how this label should be interpreted.
A name element contains a string that is the chemical name of the molecule. The name does not need to be a structural chemical name. It is RECOMMENDED that formatting and foreign (non-ASCII) characters are encoded using LaTeX style markup.
A name MUST have a dictRef attribute, the value of which is a QName referencing an entry in a dictionary which defines how this name should be interpreted.
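A molecule with a name and a label could be sketched as follows. The ex namespace and its entries are hypothetical, and holding the label text in a value attribute is an assumption about the schema rather than something stated in this section.

```xml
<cml:molecule xmlns:cml="http://www.xml-cml.org/schema"
              xmlns:convention="http://www.xml-cml.org/convention/"
              xmlns:ex="http://example.org/dictionary/"
              convention="convention:molecular" id="ethanol">
  <!-- the dictRef tells a consumer which kind of name this is -->
  <cml:name dictRef="ex:trivialName">ethanol</cml:name>
  <!-- a short semi-standard string, again qualified by a dictRef -->
  <cml:label dictRef="ex:abbreviation" value="EtOH"/>
</cml:molecule>
```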
An atom MUST be a child of atomArray.
An atom MUST have an elementType attribute.
An atom MUST have an id attribute unless it is part of an atomArray in a formula (in which case the id is optional).
Valid: an atom in the atomArray of a molecule, with an id.
<cml:molecule convention="convention:molecular" id="ml" formalCharge="1"> <cml:atomArray> <cml:atom elementType="H" id="a1" formalCharge="1"/> </cml:atomArray> </cml:molecule>
Invalid: an atom in the atomArray of a molecule, without an id.
<cml:molecule convention="convention:molecular" id="ml" formalCharge="1"> <cml:atomArray> <cml:atom elementType="H" formalCharge="1"/> </cml:atomArray> </cml:molecule>
Valid: an atom in the atomArray of a formula, without an id.
<cml:molecule convention="convention:molecular" id="ml" formalCharge="1"> <cml:formula concise="H 1 1"> <cml:atomArray> <cml:atom elementType="H" formalCharge="1" /> </cml:atomArray> </cml:formula> </cml:molecule>
The value of the id MUST be unique amongst the atoms within the eldest containing molecule.
The value of the id attribute MUST start with a letter, and MUST only contain letters, numbers, dot, hyphen or underscore.
IdStartChar ::= [A-Z] | [a-z]
IdChar ::= IdStartChar | [0-9] | "." | "-" | "_"
Id ::= IdStartChar (IdChar)*
An atom MAY have an x2 attribute, the value of which is used for displaying the object in 2 dimensions. This is unrelated to the 3-D coordinates for the object. If an x2 attribute is present there MUST also be a y2 attribute.
An atom MAY have a y2 attribute, the value of which is used for displaying the object in 2 dimensions. This is unrelated to the 3-D coordinates for the object. If a y2 attribute is present there MUST also be an x2 attribute.
An atom MAY have an x3 attribute, the value of which is the x coordinate of a 3 dimensional object. The units are Angstrom and the axis system is always right handed. If an x3 attribute is present there MUST also be y3 and z3 attributes present.
An atom MAY have a y3 attribute, the value of which is the y coordinate of a 3 dimensional object. The units are Angstrom and the axis system is always right handed. If a y3 attribute is present there MUST also be x3 and z3 attributes present.
An atom MAY have a z3 attribute, the value of which is the z coordinate of a 3 dimensional object. The units are Angstrom and the axis system is always right handed. If a z3 attribute is present there MUST also be x3 and y3 attributes present.
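The pairing rules mean coordinates appear as a complete 2-D pair, a complete 3-D triple, or both. The sketch below uses illustrative values for a water molecule; the numbers are for demonstration only.

```xml
<cml:molecule xmlns:cml="http://www.xml-cml.org/schema"
              xmlns:convention="http://www.xml-cml.org/convention/"
              convention="convention:molecular" id="water">
  <cml:atomArray>
    <!-- x2/y2 are display coordinates; x3/y3/z3 are in Angstrom -->
    <cml:atom id="a1" elementType="O" x2="0.0" y2="0.0" x3="0.000" y3="0.000" z3="0.117"/>
    <cml:atom id="a2" elementType="H" x2="-1.0" y2="-0.5" x3="0.000" y3="0.757" z3="-0.469"/>
    <cml:atom id="a3" elementType="H" x2="1.0" y2="-0.5" x3="0.000" y3="-0.757" z3="-0.469"/>
  </cml:atomArray>
</cml:molecule>
```

An atom giving only x2 and y2, or only x3, y3 and z3, would also satisfy the rules; an atom giving x3 without y3 and z3 would not.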
An atom MAY have a formalCharge attribute.
An atom MAY have an isotopeNumber attribute.
An atom MAY have a spinMultiplicity attribute.
An atom that is a descendant of a formula MAY have a count attribute. If it does not have a count attribute it is assumed to be present only once.
An atom that is not a descendant of a formula MAY have an atomParity element child.
An atom MAY contain any number of label children.
An atom MAY contain any number of property children.
An atomParity element MUST be the child of an atom. The atomParity defines the stereochemistry around an atom centre.
An atomParity MUST have an atomRefs4 attribute, the value of which MUST be the space separated ids of four different atoms which MUST be in the same overall parent molecule as the atomParity.
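A stereocentre with an atomParity could be sketched as below for bromochlorofluoromethane. The element content of atomParity (here 1) is an assumption about how the parity value is encoded; this section only constrains the atomRefs4 attribute. Bonds are left out to keep the sketch short.

```xml
<cml:molecule xmlns:cml="http://www.xml-cml.org/schema"
              xmlns:convention="http://www.xml-cml.org/convention/"
              convention="convention:molecular" id="chfclbr">
  <cml:atomArray>
    <cml:atom id="a1" elementType="C">
      <!-- four different atoms, all in the same molecule as the atomParity -->
      <cml:atomParity atomRefs4="a2 a3 a4 a5">1</cml:atomParity>
    </cml:atom>
    <cml:atom id="a2" elementType="H"/>
    <cml:atom id="a3" elementType="F"/>
    <cml:atom id="a4" elementType="Cl"/>
    <cml:atom id="a5" elementType="Br"/>
  </cml:atomArray>
</cml:molecule>
```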
A bond element MUST be the child of a bondArray. In the molecular convention a bond MUST be between only two atoms and these atoms MUST (by definition) have the same molecule parent.
A bond MUST have an atomRefs2 attribute, the value of which MUST be the space separated ids of two different atoms which MUST be in the same molecule.
A bond MUST have an order attribute. It is RECOMMENDED that the value of the order attribute is not numeric. If the value is other the bond SHOULD have a dictRef to add further information.
It is RECOMMENDED that a bond has an id attribute so that it can be referenced. The id of a bond MUST be unique amongst the bonds of the eldest containing molecule.
The value of the id attribute MUST start with a letter, and MUST only contain letters, numbers, dot, hyphen or underscore.
IdStartChar ::= [A-Z] | [a-z]
IdChar ::= IdStartChar | [0-9] | "." | "-" | "_"
Id ::= IdStartChar (IdChar)*
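Putting the bond rules together, carbon dioxide could be sketched as below. The ids are illustrative, and the non-numeric order value D for a double bond is the one this document mentions elsewhere.

```xml
<cml:molecule xmlns:cml="http://www.xml-cml.org/schema"
              xmlns:convention="http://www.xml-cml.org/convention/"
              convention="convention:molecular" id="co2" formalCharge="0">
  <cml:atomArray>
    <cml:atom id="a1" elementType="O"/>
    <cml:atom id="a2" elementType="C"/>
    <cml:atom id="a3" elementType="O"/>
  </cml:atomArray>
  <cml:bondArray>
    <!-- atomRefs2 names two different atoms of this molecule; ids make the bonds referenceable -->
    <cml:bond id="b1" atomRefs2="a1 a2" order="D"/>
    <cml:bond id="b2" atomRefs2="a2 a3" order="D"/>
  </cml:bondArray>
</cml:molecule>
```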
A bond MAY have a bondStereo element child.
A bond MAY have any number of label children.
The bondStereo element MUST be a child of a bond. bondStereo is a container used to support primarily cis (C) / trans (T) and wedge (W) / hatch (H) stereochemistry but other forms may also be supported.
If the value of the bondStereo is W or H there MUST be an atomRefs2 attribute present, the value of which MUST be the space separated ids of the two atoms in the parent bond. The order of the ids is important; the first is the sharp end of the wedge or hatch and the second is the blunt end.
If an atomRefs2 attribute is present there MUST NOT be an atomRefs4 attribute present.
If the value of the bondStereo is C or T there MUST be an atomRefs4 attribute present, the value of which MUST be the space separated ids of four different atoms. Two of the ids in the atomRefs4 MUST be the ids of the atoms in the parent bond.
The atomRefs4 define a system; if cis this will be syn-periplanar, if trans this will be anti-periplanar. Typically the two central atoms will be bonded to each other (and the bondStereo element will be a child of this bond) with a bond order of D and the two terminal atoms will be bonded directly to these, however this does not have to be the case.
If an atomRefs4 attribute is present there MUST NOT be an atomRefs2 attribute present.
If the value of the bondStereo is other the element MUST have a dictRef attribute, used to add further semantics.
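The two main cases can be sketched side by side. The first molecule is cis-1,2-difluoroethene with a C bondStereo on the double bond; the second shows a wedge on a single bond. The ids, the order value S for single bonds and the encoding of the stereo value as element content are assumptions made for this illustration.

```xml
<cml:cml xmlns:cml="http://www.xml-cml.org/schema"
         xmlns:convention="http://www.xml-cml.org/convention/"
         convention="convention:molecular">
  <cml:molecule id="cisDifluoroethene">
    <cml:atomArray>
      <cml:atom id="a1" elementType="F"/>
      <cml:atom id="a2" elementType="C"/>
      <cml:atom id="a3" elementType="C"/>
      <cml:atom id="a4" elementType="F"/>
      <cml:atom id="a5" elementType="H"/>
      <cml:atom id="a6" elementType="H"/>
    </cml:atomArray>
    <cml:bondArray>
      <cml:bond id="b1" atomRefs2="a1 a2" order="S"/>
      <cml:bond id="b2" atomRefs2="a2 a3" order="D">
        <!-- cis: atomRefs4 includes the two atoms of the parent bond (a2, a3) -->
        <cml:bondStereo atomRefs4="a1 a2 a3 a4">C</cml:bondStereo>
      </cml:bond>
      <cml:bond id="b3" atomRefs2="a3 a4" order="S"/>
      <cml:bond id="b4" atomRefs2="a2 a5" order="S"/>
      <cml:bond id="b5" atomRefs2="a3 a6" order="S"/>
    </cml:bondArray>
  </cml:molecule>
  <cml:molecule id="wedgeExample">
    <cml:atomArray>
      <cml:atom id="a1" elementType="C"/>
      <cml:atom id="a2" elementType="F"/>
      <!-- remaining atoms are omitted. -->
    </cml:atomArray>
    <cml:bondArray>
      <cml:bond id="b1" atomRefs2="a1 a2" order="S">
        <!-- wedge: a1 is the sharp end, a2 the blunt end -->
        <cml:bondStereo atomRefs2="a1 a2">W</cml:bondStereo>
      </cml:bond>
    </cml:bondArray>
  </cml:molecule>
</cml:cml>
```

Note that each bondStereo carries only one of atomRefs2 and atomRefs4, never both.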
This work is licensed under a Creative Commons Attribution 3.0 Unported License.