CML - Some basic concepts

There are three fundamental concepts in CML: the schema, conventions and dictionaries, which provide the vocabulary, grammar and semantic definitions respectively. Interoperability is provided through namespaces, which must be unique. To qualify as "well-formed", CML must comply with all of these; compliance can be checked with the validation service provided.

The CML Schema

XML schemas describe the structure of an XML (CML) document. The CML schema defines the elements and attributes that can appear in a CML document, together with their data types.

The schema content model in CML has been relaxed quite substantially between version 2.4 and version 3. Further details can be found in the 'schema' section.

CML conventions

Different domains of chemistry think about chemistry differently; in practice a domain tends to specify the rules for its own area of expertise very tightly and to apply few, if any, rules to the rest. The loosening of the CML content model in Schema 3 allows users to combine elements and attributes as they need to represent their data. However, users still need to be able to specify a set of rules (constraints) which model their particular domain. The elements and attributes of CML can be thought of as the allowed vocabulary, and the set of rules as a grammar specifying how these words may be put together. The entire set of constraints to which a CML document should conform is called a convention.
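
As a rough illustration of the vocabulary-versus-grammar idea, a convention can be pictured as a list of rule functions applied to a parsed document. The Python sketch below is not how the CML validation service works; the rule (every atom carries an id and an elementType), the function names and the sample document are assumptions made up for this example, and namespaces are left out for brevity.

```python
import xml.etree.ElementTree as ET


def atoms_have_id_and_element_type(root):
    """Hypothetical rule: every <atom> needs both an id and an elementType."""
    problems = []
    for atom in root.iter("atom"):
        for required in ("id", "elementType"):
            if required not in atom.attrib:
                problems.append(f"atom {atom.get('id', '?')} lacks {required}")
    return problems


# A toy "convention": simply the list of rules a document must satisfy.
TOY_CONVENTION = [atoms_have_id_and_element_type]


def check_convention(xml_text, rules=TOY_CONVENTION):
    """Run every rule over the document and collect the reported problems."""
    root = ET.fromstring(xml_text)
    problems = []
    for rule in rules:
        problems.extend(rule(root))
    return problems


# Made-up sample: the second atom breaks the hypothetical rule.
SAMPLE = """
<molecule id="m1">
  <atomArray>
    <atom id="a1" elementType="H"/>
    <atom id="a2"/>
  </atomArray>
</molecule>
"""

if __name__ == "__main__":
    for problem in check_convention(SAMPLE):
        print(problem)
```

The schema would accept both atoms here, since the attributes themselves are part of the vocabulary; it is the toy convention that rejects the second one.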

This website provides a list of all currently supported conventions.

CML dictionaries

Dictionaries allow CML to be understood by machines. Much of physical science is managed through the dictionary mechanism. We find terms and units relating to an aspect of science (such as heat of formation, melting point, point group) and create entries for these items in a dictionary. Different programs sometimes produce data with the same label but a different interpretation, so each computational chemistry code has its own dictionary reflecting its specific vocabulary. Some concepts are sufficiently universal that the community can decide to group particular concepts together into a higher-level dictionary.

An entry can consist of just a unique id (within the dictionary's namespace) and a human-understandable definition, but we strongly encourage giving more information: for instance the units, any upper and lower bounds, and the type of the data (string, integer, float, etc.).
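
To make the shape of an entry concrete, here is a minimal Python sketch of the information described above. It is not the CML dictionary format; the class name, the field names, the bounds check and the melting-point values are all assumptions chosen for illustration.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class DictionaryEntry:
    """Hypothetical in-memory model of one dictionary entry."""
    entry_id: str                      # unique within the dictionary's namespace
    definition: str                    # human-understandable definition
    data_type: Optional[str] = None    # e.g. "string", "integer", "float"
    units: Optional[str] = None
    min_value: Optional[float] = None  # lower bound, if any
    max_value: Optional[float] = None  # upper bound, if any

    def accepts(self, value: float) -> bool:
        """Return True if the value lies within whatever bounds are declared."""
        if self.min_value is not None and value < self.min_value:
            return False
        if self.max_value is not None and value > self.max_value:
            return False
        return True


# The bare minimum: an id and a definition.
minimal = DictionaryEntry("meltingPoint", "Temperature at which a solid becomes liquid.")

# The same concept with the extra information that is encouraged.
detailed = DictionaryEntry(
    entry_id="meltingPoint",
    definition="Temperature at which a solid becomes liquid.",
    data_type="float",
    units="kelvin",   # illustrative label, not a real CML units reference
    min_value=0.0,
)
```

With only an id and a definition, software has nothing to check a value against; once a lower bound is declared, a melting point of -5 can be rejected automatically.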

This website provides a list of all currently supported dictionaries.


XML (and CML) encourage the creation of vocabularies (dictionaries). But what if two authors choose identical tags for chemical concepts? Similarly, different programs sometimes produce data with the same label but a different interpretation; does 'density' mean 'electron density' or 'physical density'? The problem is circumvented by using namespaces based on unique URIs to differentiate the dictionaries. Each name in a dictionary has both a namespace prefix and a local name, and it is the combination of the two that uniquely identifies it. Thus:

<property dictRef="comp:density" xmlns:comp="">...</property>
does not represent the same property as
<property dictRef="geo:density" xmlns:geo="">...</property>
The namespace prefix can be any valid XML name mapped onto a URI. Namespace URIs should not be confused with Internet URLs: while it is best practice for them to resolve to a page containing information about the resource they identify, it is not an absolute requirement. Some namespaces are reserved. The xml: prefix is reserved by the XML specification itself, and the W3C defines the namespaces conventionally bound to xsl:, xsd:, xsi: etc. For example: <xsl:stylesheet xmlns:xsl=""/>
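
The point that the prefix and local name only identify something together can be checked mechanically. The Python sketch below resolves each dictRef prefix to the URI it is bound to, using the standard library's xml.etree.ElementTree.iterparse with "start-ns" events. The example.org URIs, the wrapping list element and the function name are assumptions for this illustration, and the flat prefix table ignores namespace scoping, which a fuller implementation would need to handle.

```python
import io
import xml.etree.ElementTree as ET


def resolve_dict_refs(xml_text):
    """Return a (namespace URI, local name) pair for every dictRef found."""
    prefixes = {}   # prefix -> URI; simplification: scoping is not tracked
    resolved = []
    source = io.BytesIO(xml_text.encode("utf-8"))
    for event, payload in ET.iterparse(source, events=("start-ns", "start")):
        if event == "start-ns":
            # Declarations are reported before the element that carries them.
            prefix, uri = payload
            prefixes[prefix] = uri
        else:
            dict_ref = payload.get("dictRef")
            if dict_ref is not None:
                # Assumes the value has the form prefix:localName.
                prefix, local_name = dict_ref.split(":", 1)
                resolved.append((prefixes[prefix], local_name))
    return resolved


# Placeholder URIs; real dictionaries would use their own namespaces.
SAMPLE = """
<list>
  <property dictRef="comp:density" xmlns:comp="http://example.org/dict/comp">1.0</property>
  <property dictRef="geo:density" xmlns:geo="http://example.org/dict/geo">2.0</property>
</list>
"""

if __name__ == "__main__":
    for uri, local_name in resolve_dict_refs(SAMPLE):
        print(uri, local_name)
```

Both references share the local name density, but they resolve to different URIs, so software comparing the resolved pairs treats them as different properties.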

Multiple references

Documents may refer to a large number of different online resources, each with its own namespace.

<?xml version="1.0"?>
<cml:molecule xmlns:atomic="" xmlns:my="" xmlns:xsi=""
    xsi:schemaLocation=" ../../schema.xsd" xmlns:cml=""> 
    <cml:atom my:annotation="specialBasisSet" elementType="H" id="a1">
        <my:basisSet type="minimalist"> <cml:scalar units="atomic:bohr" dictRef="my:radius">1.234</cml:scalar> </my:basisSet>