Chemical Markup Language - CompChem Convention

24 April 2011

This version:
http://www.xml-cml.org/convention/compchem-20110524
Latest version:
http://www.xml-cml.org/convention/compchem
Authors:
See acknowledgements.
Editors:
Weerapong Phadungsukanan, University of Cambridge
Sam Adams, University of Cambridge
Joe Townsend, University of Cambridge

Abstract

This specification defines the requirements of the Chemical Markup Language CompChem convention.


Table of Contents

TODO


1. Introduction

The CompChem convention is used to specify computational chemistry documents. It is designed to capture the typical underlying processes of quantum calculations and their relationships in a well-defined implicit semantic structure using Chemical Markup Language (CML).

This document describes the concepts which are introduced in CompChem, explains how to compose a CompChem document and illustrates it with examples. The core concepts that make up CompChem, each described in section 3, are the job list, the job, model initialization, model calculation, model finalization and the computing environment.

The CompChem convention also relies on the molecular convention, or other RECOMMENDED molecular conventions, which define a set of constraints for storing representations of molecules in CML. CompChem requires that any molecules, e.g. the input molecular geometry and the optimised geometry result, conform to the molecular convention. The molecular convention is not explained here and the reader should consult the molecular convention specification for more information.

Except in the few instances where they are expressly forbidden, the convention allows users to include both other CML elements and attributes, and foreign namespaced elements and attributes. It is expected that tools will in general silently ignore this extra information because they will not be able to understand it.

1.1 Notational Conventions

The keywords "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT", "SHOULD", "SHOULD NOT", "RECOMMENDED", "MAY", and "OPTIONAL" in this document are to be interpreted as described in RFC 2119 [RFC2119].

The terms "element", "attribute", "child" and "parent" in this document are to be interpreted as described in the W3C Recommendation for Extensible Markup Language (XML) [XML].

The use of fonts is as follows:

1.2 Namespaces

This specification uses the following namespaces and prefixes to indicate those namespaces:

Prefix      Namespace URI                                 Description
cml         http://www.xml-cml.org/schema                 Chemical Markup Language elements
convention  http://www.xml-cml.org/convention/            Standard Chemical Markup Language convention namespace
compchem    http://www.xml-cml.org/dictionary/compchem/   CompChem Dictionary namespace
xhtml       http://www.w3.org/1999/xhtml                  XHTML
dc          http://purl.org/dc/elements/1.1/              Dublin Core Metadata Element Set, Version 1.1

2. Applying the CompChem convention to a CML module element

The CompChem convention MUST be specified using the convention attribute on a module element, and the value MUST be a QName that represents the CompChem convention, i.e. convention:compchem. Such a module is referred to as a CompChem module or a CompChem module element in this convention. Example 1 shows the namespace declarations and the convention attribute for a CompChem module element.

Example 1: Minimal requirements for CompChem module element.

<module xmlns="http://www.xml-cml.org/schema"
        xmlns:convention="http://www.xml-cml.org/convention/"
        convention="convention:compchem">

        <!-- body is omitted. -->

</module>
            

A CompChem module MUST contain at least one jobList module element, e.g. <module dictRef="compchem:jobList" id="jobList-0000">, see Example 2.

A CompChem module MAY also contain any number of child elements in any namespace, see Example 2.

Example 2: A CompChem module as the root element, containing jobList modules.

<?xml version="1.0" encoding="UTF-8" ?>
<module xmlns="http://www.xml-cml.org/schema"
        xmlns:dc="http://purl.org/dc/elements/1.1/"
        xmlns:compchem="http://www.xml-cml.org/dictionary/compchem/"
        xmlns:convention="http://www.xml-cml.org/convention/"
        convention="convention:compchem">

    <module dictRef="compchem:jobList" id="jobList-0001">
        <!-- 1st job list body is omitted. -->
    </module>

    <module dictRef="compchem:jobList" id="jobList-0002">
        <!-- 2nd job list body is omitted. -->
    </module>

    <dc:creator>Weerapong Phadungsukanan</dc:creator>
    <dc:title>Geometry optimization for hydrocarbon species</dc:title>
    <dc:description>Consists of CH4 and CH3 molecules</dc:description>
    <dc:date>2011-03-25</dc:date>

</module>
            

3. CompChem Concepts and Elements

3.1 Joblist

A quantum chemistry calculation often comprises a series of subtasks, e.g. coarse optimisation -> fine optimisation -> NMR spectrum analysis, because most quantum chemistry software packages are modular and perform only a single task at a time. The jobList concept is introduced to capture such a series of successive subtasks and to link the information from one subtask to the next.

A job list concept in CompChem is represented by a module element with a dictRef="compchem:jobList" attribute. Such a module is referred to as a jobList module or a jobList module element in this convention.

A jobList module is constructed according to the following rules:

Example 3: A jobList module

<module dictRef="compchem:jobList" id="jobList-0000" >
    <module dictRef="compchem:job" title="Geometry optimization for hydrocarbon species">
        <!-- job body is omitted. -->
    </module>

    <dc:date>2011-03-25</dc:date>

</module>
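
The series of subtasks described at the start of this section could be captured as sketched below: a single jobList module holding three job modules in the order in which they were run. This is only an illustrative sketch; the id and title values are invented for the example and the bodies of the initialization modules are omitted.

```xml
<module xmlns="http://www.xml-cml.org/schema"
        xmlns:compchem="http://www.xml-cml.org/dictionary/compchem/"
        dictRef="compchem:jobList" id="jobList-0000">

    <!-- Subtask 1: coarse optimisation -->
    <module dictRef="compchem:job" id="job-0001" title="coarse geometry optimisation">
        <module dictRef="compchem:initialization">
            <!-- init body is omitted. -->
        </module>
    </module>

    <!-- Subtask 2: fine optimisation, starting from the result of subtask 1 -->
    <module dictRef="compchem:job" id="job-0002" title="fine geometry optimisation">
        <module dictRef="compchem:initialization">
            <!-- init body is omitted. -->
        </module>
    </module>

    <!-- Subtask 3: NMR spectrum analysis on the optimised geometry -->
    <module dictRef="compchem:job" id="job-0003" title="NMR spectrum analysis">
        <module dictRef="compchem:initialization">
            <!-- init body is omitted. -->
        </module>
    </module>

</module>
```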
            

3.2 Job

A job concept represents a computational job performed by quantum chemistry software, e.g. geometry optimisation job, frequency analysis job. The job concept is the smallest unit which can fully describe a general picture of computational modelling.

A job consists of model parameters (3.3), model optimisations or calculations (3.4), model results (3.5) and computing environments (3.6). These four components are fundamental to modelling in every field. However, a job does not require all four components. Only model parameters are mandatory.

A job concept in CompChem is represented by a module element with a dictRef="compchem:job" attribute. Such a module is referred to as a job module or a job module element in this convention.

A job module is constructed according to the following rules:

The only essential child element of a job module is an initialization module which represents a model input for a computational job, see Example 4.

Example 4: Minimal requirements for a job module

<module dictRef="compchem:job" id="job-0000" title="input for coarse geometry optimisation">
    <module dictRef="compchem:initialization">
        <!-- required init body is omitted. -->
    </module>
</module>
            

Example 5: A job module with multiple calculation steps

<module dictRef="compchem:job" id="job-0000" title="coarse geometry optimisation">
    <module dictRef="compchem:initialization">
        <!-- required init body is omitted. -->
    </module>
    <module dictRef="compchem:calculation">
        <!-- required calculation body is omitted. -->
    </module>
    <module dictRef="compchem:calculation">
        <!-- required calculation body is omitted. -->
    </module>
    <module dictRef="compchem:calculation">
        <!-- required calculation body is omitted. -->
    </module>
    <module dictRef="compchem:finalization">
        <!-- required final body is omitted. -->
    </module>
</module>
            

Example 6: A job module without calculation steps

<module dictRef="compchem:job" id="job-0000" title="coarse geometry optimisation">
    <module dictRef="compchem:initialization">
        <!-- required init body is omitted. -->
    </module>
    <module dictRef="compchem:finalization">
        <!-- required final body is omitted. -->
    </module>
</module>
            

3.3 Model Initialization

The model initialization concept represents the model parameters and inputs for a computational job. The model parameters are one of the most important elements and exist in every modelling study. Therefore, this concept is a REQUIRED element in the CompChem convention.

The model initialization concept in CompChem is represented by a module element with a dictRef="compchem:initialization" attribute. Such a module is referred to as an initialization module or an initialization module element in this convention.

An initialization module is constructed according to the following rules:

Example 7: An initialization module

<module dictRef="compchem:initialization">
    <molecule convention="convention:molecular">
        <!-- consult molecular convention or recommended molecular convention. -->
    </molecule>
    <parameterList>
        <!-- parameterList body is omitted. -->
    </parameterList>
</module>
            

3.4 Model Calculation

A model calculation concept represents the computation, optimisation or iteration processes of a computational job. The calculation process may or may not be of interest to some scientists; therefore, the model calculation is OPTIONAL information in CompChem.

A model calculation concept in CompChem is represented by a module element with a dictRef="compchem:calculation" attribute. Such a module is referred to as a calculation module or a calculation module element in this convention.

A calculation module is constructed according to the following rules:

Example 8: CompChem Calculation module

<module dictRef="compchem:calculation">
    <molecule convention="convention:molecular">
        <!-- consult molecular convention and molecule section below. -->
    </molecule>
    <parameterList>
        <!-- parameterList body is omitted. -->
    </parameterList>
    <propertyList>
        <!-- propertyList body is omitted. -->
    </propertyList>
</module>
            

3.5 Model Finalization

A model finalization concept represents the model outputs or results of a computational job. In some cases, a CompChem module MAY represent only the model inputs and contain no calculations; therefore, the model finalization is OPTIONAL information in CompChem.

A model finalization concept in CompChem is represented by a module element with a dictRef="compchem:finalization" attribute. Such a module is referred to as a finalization module or a finalization module element in this convention.

A finalization module is constructed according to the following rules:

Example 9: A finalization module

<module dictRef="compchem:finalization">
    <molecule convention="convention:molecular">
        <!-- consult molecular convention and molecule section below. -->
    </molecule>
    <propertyList>
        <!-- propertyList body is omitted. -->
    </propertyList>
</module>
            

3.6 Computing Environment

The computing environment concept refers to the hardware platform, software application, operating system and any hardware and software configuration used to run the job or computational task. The environment also includes metadata such as machine ID, username, start and finish date and time, tools, compilers, IP address, etc.

This information is not related to the input and output of the model, but the software application needs it in order to run properly and it may vary from machine to machine. Therefore, the computing environment is an OPTIONAL element in the CompChem convention.

The computing environment concept in CompChem is represented by a module element with a dictRef="compchem:environment" attribute. Such a module is referred to as an environment module or an environment module element in this convention.

An environment module is constructed according to the following rules:

Although CompChem allows an environment module element to contain elements that represent the concept of parameters, this is NOT RECOMMENDED.

Example 10: An environment module

<module dictRef="compchem:environment">
    <propertyList>
        <!-- propertyList body is omitted. -->
    </propertyList>
</module>
            

3.7 User Defined Concept

CompChem allows users or authors to define their own concepts if the RECOMMENDED concepts above do not meet their requirements.

A user defined concept in CompChem is represented by a module element with a dictRef attribute whose value points to an entry in a dictionary that defines the concept, see Example 11. Such a module is referred to as a user defined module or a user defined module element in this convention.

Users are free to design any structure for a user defined module. However, it is RECOMMENDED to use existing structures or a structure that has a schema for validation.

Information in a user defined module cannot be guaranteed to be understandable by all processing software tools.

Example 11: A user defined module

<module dictRef="prefix:identifier">

</module>
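
As a slightly more concrete sketch of the same idea, the fragment below binds a dictionary prefix to a namespace URI and uses it in the dictRef value. The prefix mydict, the URI http://example.org/dictionary/mydict/ and the entry name solvationSetup are all hypothetical and are used only for illustration; they are not defined by CompChem.

```xml
<module xmlns="http://www.xml-cml.org/schema"
        xmlns:mydict="http://example.org/dictionary/mydict/"
        dictRef="mydict:solvationSetup">
    <!-- mydict is a hypothetical user dictionary; the internal
         structure of a user defined module is left to the author. -->
</module>
```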
            

4. Metadata

5. Adding Properties and Parameters

Authors MUST follow the RECOMMENDED style for properties and parameters; see the external link.

6. Recommended Properties and Parameters for CompChem

6.1 Recommended Properties for an environment module

TABLE

6.2 Recommended Parameters for an initialization module

TABLE

6.3 Recommended Properties and Parameters for a calculation module

TABLE

6.4 Recommended Properties for a finalization module

TABLE

7. Recommended Conventions for a molecule element

TABLE

4. Data Structure with CML Elements

CompChem has adopted a set of elements from CML to store the contents of the computational data. This section describes how CompChem constructs the data structure from this set of CML elements.

NOTE: elements in the CML namespace (http://www.xml-cml.org/schema) are denoted using the (arbitrary) prefix cml:.

4.1 Molecule

CompChem uses cml:molecule to store molecular geometry and coordinates. cml:molecule specifications are defined in the molecular convention so the convention of the molecule element MUST be specified using the convention attribute with the value of convention:molecular. Readers SHOULD consult the specifications of the molecular convention for more details.

4.2 z-Matrix

In many quantum chemistry calculations, the coordinates of the atoms are represented using a z-Matrix coordinate system. CompChem adopts the z-Matrix concept from the CML schema. Z-Matrices and Cartesian coordinates can be converted in both directions, but the results may differ in orientation and translation. Conventions for ordering atoms in z-matrices may exist, but adherence to them cannot be guaranteed.

4.3 ParameterList

CompChem uses cml:parameterList to group the cml:parameter(s). cml:parameterList MUST have only cml:parameter(s) as child element(s).
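
A minimal sketch of such a grouping is shown below. It assumes a dictionary bound to the prefix ex, which is hypothetical and stands in for whatever dictionary actually defines the parameters; the two values are taken from the full example at the end of this section.

```xml
<parameterList xmlns="http://www.xml-cml.org/schema"
               xmlns:xsd="http://www.w3.org/2001/XMLSchema"
               xmlns:ex="http://example.org/dictionary/">
    <!-- ex: is a hypothetical dictionary prefix used only for this sketch. -->
    <!-- Every child element is a parameter, and each parameter holds one value container. -->
    <parameter dictRef="ex:method">
        <scalar dataType="xsd:string">B97-1</scalar>
    </parameter>
    <parameter dictRef="ex:basis">
        <scalar dataType="xsd:string">6-311+G(d,p)</scalar>
    </parameter>
</parameterList>
```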

4.4 PropertyList

CompChem uses cml:propertyList to group the cml:property(s). cml:propertyList MUST have only cml:property(s) as child element(s).
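
A corresponding sketch for properties is shown below. As before, the prefix ex and its namespace URI are hypothetical placeholders for a real dictionary; the numerical values and unit identifiers are copied from the full example at the end of this section.

```xml
<propertyList xmlns="http://www.xml-cml.org/schema"
              xmlns:xsd="http://www.w3.org/2001/XMLSchema"
              xmlns:units="http://www.xml-cml.org/units/units"
              xmlns:ex="http://example.org/dictionary/">
    <!-- ex: is a hypothetical dictionary prefix used only for this sketch. -->
    <!-- Every child element is a property; numeric values carry units. -->
    <property dictRef="ex:scfEnergy">
        <scalar dataType="xsd:double" units="units:hartree">-1108.82623721</scalar>
    </property>
    <property dictRef="ex:enthalpyOfFormation298">
        <scalar dataType="xsd:double" units="units:kcal.mol-1">-61.5885860322</scalar>
    </property>
</propertyList>
```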

4.5 Parameter

CompChem uses cml:parameter to represent the inputs of the quantum chemistry calculations.

A CML parameter is defined by a cml:parameter element and MUST have a reference to an entry in a dictionary which specifies the meaning of the parameter.

A parameter MUST contain only one of the value containers given in section 4.8.

4.6 Property

CompChem uses cml:property to represent the concept of the outputs of the quantum chemistry calculations.

A CML property is defined by a cml:property element and MUST have a reference to an entry in a dictionary which specifies the meaning of the property.

A property MUST contain only one of the value containers given in section 4.8.

4.7 Units

Units are among the most important information in scientific data. All value containers, see section 4.8, MUST have units associated with them except for string or text values.

4.8 Value container

CML provides elements to wrap many different types of mathematical, scientific and computational values and variables, e.g. scalar, vector, matrix, array, etc., which are referred to as value containers in the current context. CompChem currently allows only scalar, array and matrix as value containers. This should be sufficient for many computational outputs. Other types of CML containers will be added later and may be built from combinations of primitive containers.

4.8.1 Scalar

A CML scalar element in CompChem is used to hold scalar data, which can be a single value of dataType integer, real, boolean, string or date.

In CompChem, a cml:scalar MUST conform to the following rules:

4.8.2 Array

A CML array element in CompChem is used to hold a one-dimensional array data structure of either integer or real values only, i.e. a collection of integer or real values where each element is identified by an index. Thus, the data type of the CML array MUST be either xsd:integer or xsd:double. It MUST also conform to the following rules:
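
As an illustration of the data structure described above, an array of three real values might be written as in the sketch below. The dataType, units and size attributes and the values follow the full example at the end of this section; the dictionary prefix ex and its namespace URI are hypothetical and serve only to make the fragment self-contained.

```xml
<property xmlns="http://www.xml-cml.org/schema"
          xmlns:xsd="http://www.w3.org/2001/XMLSchema"
          xmlns:units="http://www.xml-cml.org/units/units"
          xmlns:ex="http://example.org/dictionary/"
          dictRef="ex:rotationalConstants">
    <!-- ex: is a hypothetical dictionary prefix used only for this sketch. -->
    <!-- Three whitespace-separated xsd:double values sharing one unit. -->
    <array dataType="xsd:double" units="units:GHz" size="3">
        9.610774107948902 1.6540686120943338 1.4111941740951228
    </array>
</property>
```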

4.8.3 Matrix

A CML matrix element in CompChem is used to hold a two-dimensional rectangular matrix data structure of either integer or real values only. Thus, the data type of this value container MUST be either xsd:integer or xsd:double. It MUST also conform to the following rules:

Example: Fully qualified CompChem document (not currently valid owing to changes in the rules)

<?xml version="1.0" encoding="UTF-8"?>
<module cmlx:note="A valid compchem containing two jobs is valid."
     convention="convention:compchem"
     xmlns="http://www.xml-cml.org/schema"
     xmlns:convention="http://www.xml-cml.org/convention/"
     xmlns:compchem="http://www.xml-cml.org/dictionary/compchem/"
     xmlns:units="http://www.xml-cml.org/units/units"
     xmlns:xsd="http://www.w3.org/2001/XMLSchema"
     xmlns:cmlx="http://www.xml-cml.org/schema/cmlx">

     <module dictRef="compchem:jobList">

        <identifier convention="chemid:EmpiricalFormula" value="CCl2O2"/>
        <identifier convention="chemid:InChI" value="InChI=1/CCl2O2/c2-1(4)5-3"/>
        <identifier convention="chemid:CanonicalSmiles" value="ClOC(=O)Cl"/>
        <identifier convention="chemid:IsomericSmiles" value="C(=O)(Cl)OCl"/>

        <module dictRef="compchem:job">
            <module dictRef="compchem:initialization">
                <parameterList>
                    <parameter dictRef="cmlqm:qm.method">
                        <scalar dataType="xsd:string">B97-1</scalar>
                    </parameter>
                    <parameter dictRef="cmlqm:qm.basis">
                        <scalar dataType="xsd:string">6-311+G(d,p)</scalar>
                    </parameter>
                    <parameter dictRef="cmlqm:qm.integral">
                        <scalar dataType="xsd:string">UltraFine</scalar>
                    </parameter>
                    <parameter dictRef="cmlqm:qm.symm">
                        <scalar dataType="xsd:string">None</scalar>
                    </parameter>
                    <parameter dictRef="cmlqm:qm.pop">
                        <scalar dataType="xsd:string">Regular</scalar>
                    </parameter>
                    <parameter dictRef="cmlqm:cmd.freq">
                        <scalar dataType="xsd:string">Frequency analysis</scalar>
                    </parameter>
                    <parameter dictRef="cmlqm:cmd.geo">
                        <scalar dataType="xsd:string">Geometry optimization</scalar>
                    </parameter>
                    <parameter dictRef="cmlqm:cmd.scf">
                        <scalar dataType="xsd:string">Tight, SCF Convergence Criteria of 10^-8</scalar>
                    </parameter>
                </parameterList>
                <molecule convention="convention:molecular" formalCharge="0" spinMultiplicity="1">
                    <atomArray>
                        <atom id="a1" elementType="C" x3="-1.6629" y3="-1.148" z3="-0.0578"/>
                        <atom id="a2" elementType="Cl" x3="-3.0407" y3="-0.0568" z3="-0.2355"/>
                        <atom id="a3" elementType="O" x3="-0.5528" y3="-0.8196" z3="0.1622"/>
                        <atom id="a4" elementType="O" x3="-2.1799" y3="-2.3872" z3="-0.2293"/>
                        <atom id="a5" elementType="Cl" x3="-1.0066" y3="-3.6732" z3="-0.1014"/>
                    </atomArray>
                    <bondArray>
                        <bond atomRefs2="a1 a2" order="S" id="a1_a2"/>
                        <bond atomRefs2="a1 a3" order="D" id="a1_a3"/>
                        <bond atomRefs2="a1 a4" order="S" id="a1_a4"/>
                        <bond atomRefs2="a4 a5" order="S" id="a4_a5"/>
                    </bondArray>
                </molecule>
            </module>
            <module dictRef="compchem:finalization">
                <molecule convention="convention:molecular" formalCharge="0" spinMultiplicity="1">
                    <atomArray>
                        <atom id="a1" elementType="C" x3="-1.661722" y3="-1.150648" z3="-0.057744"/>
                        <atom id="a2" elementType="Cl" x3="-3.041068" y3="-0.059007" z3="-0.235698"/>
                        <atom id="a3" elementType="O" x3="-0.55104" y3="-0.821319" z3="0.162394"/>
                        <atom id="a4" elementType="O" x3="-2.180501" y3="-2.387568" z3="-0.229457"/>
                        <atom id="a5" elementType="Cl" x3="-1.00857" y3="-3.666258" z3="-0.101295"/>
                    </atomArray>
                    <bondArray>
                        <bond atomRefs2="a1 a2" order="S" id="a1_a2"/>
                        <bond atomRefs2="a1 a3" order="D" id="a1_a3"/>
                        <bond atomRefs2="a1 a4" order="S" id="a1_a4"/>
                        <bond atomRefs2="a4 a5" order="S" id="a4_a5"/>
                    </bondArray>
                </molecule>
                <propertyList>
                    <property dictRef="cmlqm:property.nsymrot">
                        <scalar dataType="xsd:integer">1</scalar>
                    </property>
                    <property dictRef="cmlqm:property.rotconstants">
                        <array dataType="xsd:double" units="units:GHz" delimiter="" size="3">
                        9.610774107948902 1.6540686120943338 1.4111941740951228
                        </array>
                    </property>
                    <property dictRef="cmlqm:property.pmoi">
                        <array dataType="xsd:double" units="units:amu.rbohr2" delimiter="" size="3">
                        187.7832739452039 1091.092965516907 1278.876188881425
                        </array>
                    </property>
                    <property dictRef="cmlqm:property.pmoiaxis">
                        <matrix rows="3" columns="3" dataType="xsd:double">
                        0.4547378900162555 0.8730253212283468 -0.17618240513089256
                        -0.8902808990268888 0.45108016953679547 -0.06266260031733822
                        0.02476635240479569 0.18534688868388882 0.9823610123797482
                        </matrix>
                    </property>
                    <property dictRef="cmlqm:property.scfenergy">
                        <scalar dataType="xsd:double" units="units:hartree">-1108.82623721</scalar>
                    </property>
                    <property dictRef="cmlqm:property.hf298">
                        <scalar units="units:kcal.mol-1" dataType="xsd:double">-61.5885860322</scalar>
                    </property>
                    <property dictRef="cmlqm:property.freqmode">
                        <array dataType="xsd:double" units="units:cm-1" delimiter="" size="9">
                        142.6122 218.2282 326.208 471.9528 665.9777 708.8044 841.3894 1047.4277 1905.0634
                        </array>
                    </property>
                    <property dictRef="cmlqm:property.forceconstant">
                        <array dataType="xsd:double" units="units:mDyne.A-1" delimiter="" size="9">
                        0.2216 0.5379 1.7308 2.7391 3.3245 5.34 6.1336 8.6831 28.3818
                        </array>
                    </property>
                    <property dictRef="cmlqm:property.irintensity">
                        <array dataType="xsd:double" units="units:KM.mol-1" delimiter="" size="9">
                        0.4756 1.7945 5.3957 3.2514 12.5217 26.7365 85.8421 376.7849 400.4058
                        </array>
                    </property>
                    <property dictRef="cmlqm:property.reducedmass">
                        <array dataType="xsd:double" units="units:amu" delimiter="" size="9">
                        18.4952 19.1715 27.6069 20.8721 12.7222 18.04 14.7051 13.4331 13.273
                        </array>
                    </property>
                </propertyList>
            </module>
        </module>
    </module>
</module>
            

A. References

[RFC2119]
IETF RFC 2119: Key words for use in RFCs to Indicate Requirement Levels, S. Bradner, March 1997. Available at http://www.ietf.org/rfc/rfc2119.txt.
[XML]
Extensible Markup Language (XML) 1.0 (Fifth Edition), T. Bray, J. Paoli, C.M. Sperberg-McQueen, E. Maler and F. Yergeau, Editors. World Wide Web Consortium. 26 November 2008. This version is http://www.w3.org/TR/2008/REC-xml-20081126. The latest version of XML is available at http://www.w3.org/TR/REC-xml.

B. Acknowledgements


Creative Commons Licence
This work is licensed under a Creative Commons Attribution 3.0 Unported License.