Chemical Markup Language - CompChem Convention

24 April 2011

This version:
http://www.xml-cml.org/convention/compchem-20110524
Latest version:
http://www.xml-cml.org/convention/compchem
Authors:
See acknowledgments.
Editors:
Weerapong Phadungsukanan, University of Cambridge
Sam Adams, University of Cambridge
Joe Townsend, University of Cambridge
Jens Thomas, STFC Daresbury Laboratory

Abstract

This specification defines the requirements of the Chemical Markup Language CompChem convention.


Table of Contents

1. Introduction
    1.1 Notational Conventions
    1.2 Namespaces
    1.3 Development
2. Applying the CompChem convention on CML module element
3. CompChem Concepts and Elements
    3.1 JobList
    3.2 Job
    3.3 Model Initialization
    3.4 Model Calculation
    3.5 Model Finalization
    3.6 Computing Environment
    3.7 User Defined Concept
4. Data Structure with CML Elements
    4.1 Molecule
    4.2 z-Matrix
    4.3 ParameterList
    4.4 PropertyList
    4.5 Parameter
    4.6 Property
    4.7 Units
    4.8 Value container
        4.8.1 Scalar
        4.8.2 Array
        4.8.3 Matrix
5. Adding Properties, Parameters and other objects
6. Recommended Properties and Parameters for CompChem
    6.1 Recommended Properties for an environment module
    6.2 Recommended Parameters and objects for an initialization module
    6.3 Recommended Properties and objects for a finalization module
7. Example Files

Appendices

A. References
B. Acknowledgements


1. Introduction

The CompChem convention is used to specify computational chemistry documents. It is designed to capture the typical underlying processes of quantum calculations and their relationships in a well defined implicit semantic structure using Chemical Markup Language (CML).

This document describes the concepts which are introduced in CompChem, explains how to compose a CompChem document and illustrates it with examples. This document references and should be read in conjunction with the CompChem dictionary, which is a core part of the convention mechanism and defines the terms used within it.

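The core concepts that make up CompChem are the job list, the job, model initialization, model calculation, model finalization, the computing environment and user defined concepts; each is described in Section 3.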

The CompChem convention also relies on the molecular convention, or on other RECOMMENDED molecular conventions, which define a set of constraints for storing representations of molecules in CML. CompChem requires that any molecules, e.g. the input molecular geometry and the optimised geometry result, conform to the molecular convention. The molecular convention is not explained here; the reader should consult the molecular convention specification for more information.

Except in the few instances where they are expressly forbidden, the convention allows users to include both other CML elements and attributes, and foreign namespaced elements and attributes. It is expected that tools will in general silently ignore this extra information because they will not be able to understand it. This also permits the conventions and dictionaries to be developed around working tools, and only updated at well-defined intervals.

The current recommendation for the development of more complex objects not already represented within CML is that they are expressed as a CML list with a dictRef attribute that points to the relevant entry in the CompChem dictionary. Examples of this are the current representations for basis sets and DFT functionals.

1.1 Notational Conventions

The keywords "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT", "SHOULD", "SHOULD NOT", "RECOMMENDED", "MAY", and "OPTIONAL" in this document are to be interpreted as described in RFC 2119 [ IETF RFC 2119 ].

The terms "element", "attribute", "child" and "parent" in this document are to be interpreted as described in the W3C Recommendation for Extensible Markup Language (XML) [ W3C XML ].

The use of fonts is as follows:

1.2 Namespaces

This specification uses the following namespaces and prefixes to indicate those namespaces:

Prefix      Namespace URI                                Description
cml         http://www.xml-cml.org/schema                Chemical Markup Language elements
convention  http://www.xml-cml.org/convention/           Standard Chemical Markup Language convention namespace
compchem    http://www.xml-cml.org/dictionary/compchem/  CompChem Dictionary namespace
xhtml       http://www.w3.org/1999/xhtml                 XHTML
dc          http://purl.org/dc/elements/1.1/             Dublin Core Metadata Element Set, Version 1.1
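
As a non-normative illustration, the prefix bindings listed above can be held in a simple mapping and used to expand QName values such as convention:compchem. The sketch below is written in Python; the names NAMESPACES and expand_qname are hypothetical and are not part of CML or of any CML library. It assumes the prefixes are bound exactly as listed here, whereas a real document may bind different prefixes to the same namespace URIs.

```python
# Prefix-to-namespace bindings copied from the namespace table of this specification.
NAMESPACES = {
    "cml": "http://www.xml-cml.org/schema",
    "convention": "http://www.xml-cml.org/convention/",
    "compchem": "http://www.xml-cml.org/dictionary/compchem/",
    "xhtml": "http://www.w3.org/1999/xhtml",
    "dc": "http://purl.org/dc/elements/1.1/",
}


def expand_qname(qname, bindings=NAMESPACES):
    """Split 'prefix:local' and return (namespace URI, local name).

    Hypothetical helper for illustration only.
    """
    prefix, sep, local = qname.partition(":")
    if not sep or not local:
        raise ValueError(f"not a prefixed QName: {qname!r}")
    if prefix not in bindings:
        raise KeyError(f"unbound prefix: {prefix!r}")
    return bindings[prefix], local


# The convention name and a dictionary term used later in this document.
convention_name = expand_qname("convention:compchem")
job_list_term = expand_qname("compchem:jobList")
```

Comparing the expanded pair, rather than the raw string, keeps a tool independent of the prefix an author happens to choose.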

1.3 Development

The conventions, the dictionaries and CML are ongoing efforts. This convention serves to document the procedures used in working software (such as Chempound), but is also intended as a discussion document and will be revised as its scope is extended and additional areas of computational chemistry are explored. We very much welcome input from all members of the community (practitioners, developers, teachers, students, etc.), so please feel free to contact us and get involved if this is of use to you in your work.

It is hoped that the convention in its current form is both broad enough in scope and flexible enough to be extended to cover most areas of interest in computational chemistry. At the same time, there should be enough structure present that it is already possible to develop tools that rely on the convention and will not be broken by future updates; they may simply be unable to take full advantage of the extensions. For example, the results of a job are always present in the finalization module. Tools can rely on finding the results they know about in this module (and new scalar, array and matrix properties that may appear can often be handled with no additional code). More complex objects (such as orbitals) will also appear in this module, but as the tool will not expect them, they can just be ignored.
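
The tool behaviour described in the paragraph above can be sketched, non-normatively, in Python. The sketch assumes that results are stored as property elements carrying a dictRef attribute somewhere inside the finalization module, and that the compchem prefix is bound to the CompChem dictionary namespace; the function name known_results and the term compchem:totalEnergy in the demonstration are hypothetical placeholders, not entries confirmed by this document.

```python
import xml.etree.ElementTree as ET

CML = "{http://www.xml-cml.org/schema}"  # Clark-notation prefix for CML element names


def known_results(job, known_terms):
    """Map each recognised dictRef in the job's finalization module to its element.

    Assumption: results are <property dictRef="..."> elements inside the
    finalization module. Everything else is skipped without raising an error.
    """
    results = {}
    for module in job.findall(f"{CML}module"):
        if module.get("dictRef") != "compchem:finalization":
            continue
        for prop in module.iter(f"{CML}property"):
            term = prop.get("dictRef")
            if term in known_terms:
                results[term] = prop
    return results


# Demonstration with a hypothetical term and an object the tool does not know about.
demo_job = ET.fromstring(
    '<module xmlns="http://www.xml-cml.org/schema" dictRef="compchem:job">'
    '<module dictRef="compchem:initialization"/>'
    '<module dictRef="compchem:finalization">'
    '<propertyList>'
    '<property dictRef="compchem:totalEnergy"><scalar>-40.5</scalar></property>'
    '</propertyList>'
    '<list dictRef="compchem:someNewObject"/>'
    '</module>'
    '</module>'
)
found = known_results(demo_job, {"compchem:totalEnergy"})
```

Because unrecognised content is simply never looked up, a later extension of the convention adds entries the tool ignores rather than entries that break it.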

The "human-readable" text for the conventions currently resides in the Bitbucket repository for the xml-cml.org website.

The rules that the convention texts describe are implemented in XSLT within the CML validator, which is currently hosted on Bitbucket. The rules defining the CompChem convention are implemented in the compchem-rules.xsl file. Any change or update to the convention generally requires editing BOTH the convention text and the rules file.

The reference software for generating CompChem convention compliant CML is currently the NWChem Jumbo Converters. These will parse a subset of the standard NWChem log files into CML, which can then be imported into Chempound. Documentation on the template parsing approach used in the Jumbo converters can be found on the Quixote Wiki.

2. Applying the CompChem convention on CML module element

The CompChem convention MUST be specified using the convention attribute on a module element, and the value MUST be a QName that represents the CompChem convention, i.e. convention:compchem. Such a module is referred to as a CompChem module or a CompChem module element in this convention. Example 1 shows the declarations of the namespaces and the convention for the CompChem module element.

Example 1: Minimal requirements for a CompChem module element.

<module xmlns="http://www.xml-cml.org/schema"
        xmlns:convention="http://www.xml-cml.org/convention/"
        convention="convention:compchem">

        <!-- body is omitted. -->

</module>
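
A non-normative Python sketch of the check implied by Example 1 follows. It resolves the prefix used in the convention attribute against the namespace declarations in scope, so it does not depend on the prefix being spelled "convention". The function name is_compchem_root is hypothetical, and the sketch only inspects the root element.

```python
import io
import xml.etree.ElementTree as ET

CML_NS = "http://www.xml-cml.org/schema"
CONVENTION_NS = "http://www.xml-cml.org/convention/"


def is_compchem_root(xml_text):
    """Return True if the root is a CML module whose convention attribute
    is a QName resolving to {CONVENTION_NS}compchem."""
    bindings = {}  # prefix -> namespace URI declared on the root element
    source = io.BytesIO(xml_text.encode("utf-8"))
    for event, payload in ET.iterparse(source, events=("start-ns", "start")):
        if event == "start-ns":
            prefix, uri = payload
            bindings[prefix] = uri
            continue
        # The first "start" event is the root element; its attributes are set.
        root = payload
        if root.tag != f"{{{CML_NS}}}module":
            return False
        prefix, sep, local = root.get("convention", "").partition(":")
        return bool(sep) and bindings.get(prefix) == CONVENTION_NS and local == "compchem"
    return False


example_1 = (
    '<module xmlns="http://www.xml-cml.org/schema" '
    'xmlns:convention="http://www.xml-cml.org/convention/" '
    'convention="convention:compchem"></module>'
)
example_1_ok = is_compchem_root(example_1)
```

The namespace declarations are collected from the parser's start-ns events because ElementTree does not keep prefix bindings on the parsed elements.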
            

A CompChem module MUST contain at least one jobList module element, e.g. <module dictRef="compchem:jobList" id="jobList-0000">, see Example 2.

A CompChem module MAY also contain any number of child elements in any namespace, see Example 2.

Example 2: A CompChem module as the root element, containing jobList modules.

<?xml version="1.0" encoding="UTF-8" ?>
<module xmlns="http://www.xml-cml.org/schema"
        xmlns:dc="http://purl.org/dc/elements/1.1/"
        xmlns:compchem="http://www.xml-cml.org/dictionary/compchem/"
        xmlns:convention="http://www.xml-cml.org/convention/"
        convention="convention:compchem">

    <module dictRef="compchem:jobList" id="jobList-0001">
        <!-- 1st job list body is omitted. -->
    </module>

    <module dictRef="compchem:jobList" id="jobList-0002">
        <!-- 2nd job list body is omitted. -->
    </module>

    <dc:creator>Weerapong Phadungsukanan</dc:creator>
    <dc:title>Geometry optimization for hydrocarbon species</dc:title>
    <dc:description>Consists of CH4 and CH3 molecules</dc:description>
    <dc:date>2011-03-25</dc:date>

</module>
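
The requirement that a CompChem module contain at least one jobList module can be checked with a few lines of Python. This is a non-normative sketch: the function names are hypothetical, and for brevity it assumes the dictionary prefix is literally "compchem", as in Example 2, rather than resolving the prefix against the namespace declarations.

```python
import xml.etree.ElementTree as ET

CML = "{http://www.xml-cml.org/schema}"  # Clark-notation prefix for CML element names


def job_list_modules(compchem_module, prefix="compchem"):
    """Return the direct child modules whose dictRef is '<prefix>:jobList'."""
    return [
        child
        for child in compchem_module.findall(f"{CML}module")
        if child.get("dictRef") == f"{prefix}:jobList"
    ]


def check_has_job_list(compchem_module):
    """Raise ValueError if the CompChem module has no jobList child."""
    if not job_list_modules(compchem_module):
        raise ValueError("a CompChem module MUST contain at least one jobList module")


# A cut-down version of Example 2; the dc:title child is allowed and ignored.
example_2 = ET.fromstring(
    '<module xmlns="http://www.xml-cml.org/schema" '
    'xmlns:dc="http://purl.org/dc/elements/1.1/" '
    'xmlns:convention="http://www.xml-cml.org/convention/" '
    'convention="convention:compchem">'
    '<module dictRef="compchem:jobList" id="jobList-0001"/>'
    '<module dictRef="compchem:jobList" id="jobList-0002"/>'
    '<dc:title>Geometry optimization for hydrocarbon species</dc:title>'
    '</module>'
)
check_has_job_list(example_2)
job_list_ids = [m.get("id") for m in job_list_modules(example_2)]
```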
            

3. CompChem Concepts and Elements

3.1 JobList

A quantum chemistry calculation often comprises a series of subtasks, e.g. coarse optimisation -> fine optimisation -> NMR spectrum analysis, because most quantum chemistry software packages are modular and perform only a single task at a time. The jobList concept is introduced to capture such a series of successive subtasks and to link the information from one subtask to the next.

The jobList concept in CompChem is represented by a module element with a dictRef="compchem:jobList" attribute. Such a module is referred to as a jobList module or a jobList module element in this convention.

A jobList module is constructed according to the following rules:

Example 3: A jobList module

<module dictRef="compchem:jobList" id="jobList-0000" >
    <module dictRef="compchem:job" title="Geometry optimization for hydrocarbon species">
        <!-- job body is omitted. -->
    </module>

    <dc:date>2011-03-25</dc:date>

</module>
            

3.2 Job

A job concept represents a computational job performed by quantum chemistry software, e.g. geometry optimisation job, frequency analysis job. The job concept is the smallest unit which can fully describe a general picture of computational modelling.

A job consists of model parameters (3.3), model optimisations or calculations (3.4), model results (3.5) and computing environments (3.6). These four components are fundamental to modelling in every field. However, a job does not require all four components: only an initialization module with the model parameters is mandatory.

The job and calculation modules are functionally almost identical, as many of the calculations within a job could be run as separate jobs themselves (e.g. a single-point calculation in an optimisation run could be submitted separately). A job serves to group one or more calculations into a logical unit of work; it can therefore be thought of as the unit of work that would be submitted to a computational resource. The main differences between jobs and calculations are that jobs may contain an environment module and cannot themselves contain other jobs, whereas calculations can be nested within other calculations to any degree.

The job concept in CompChem is represented by a module element with a dictRef="compchem:job" attribute. Such a module is referred to as a job module or a job module element in this convention.

A job module is constructed according to the following rules:

The only essential child element of a job module is an initialization module which represents a model input for a computational job, see Example 4.

Example 4: Minimal requirements for a job module

<module dictRef="compchem:job" id="job-0000" title="input for coarse geometry optimisation">
    <module dictRef="compchem:initialization">
        <!-- required init body is omitted. -->
    </module>
</module>
            

Example 5: A job module with multiple calculation steps

<module dictRef="compchem:job" id="job-0000" title="coarse geometry optimisation">
    <module dictRef="compchem:initialization">
        <!-- required init body is omitted. -->
    </module>
    <module dictRef="compchem:calculation">
        <!-- required calculation body is omitted. -->
    </module>
    <module dictRef="compchem:calculation">
        <!-- required calculation body is omitted. -->
    </module>
    <module dictRef="compchem:calculation">
        <!-- required calculation body is omitted. -->
    </module>
    <module dictRef="compchem:finalization">
        <!-- required final body is omitted. -->
    </module>
</module>
            

Example 6: A job module without calculation steps

<module dictRef="compchem:job" id="job-0000" title="coarse geometry optimisation">
    <module dictRef="compchem:initialization">
        <!-- required init body is omitted. -->
    </module>
    <module dictRef="compchem:finalization">
        <!-- required final body is omitted. -->
    </module>
</module>
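
Examples 4 to 6 differ only in which optional child modules are present. The non-normative Python sketch below counts the child modules of a job by dictRef and enforces the one requirement stated in this section, namely that an initialization module is present. The function name summarise_job is hypothetical, and the literal prefix "compchem" is assumed.

```python
import xml.etree.ElementTree as ET
from collections import Counter

CML = "{http://www.xml-cml.org/schema}"  # Clark-notation prefix for CML element names


def summarise_job(job):
    """Count a job's direct child modules by kind.

    Raises ValueError when the mandatory initialization module is missing.
    """
    counts = Counter(child.get("dictRef") for child in job.findall(f"{CML}module"))
    if counts["compchem:initialization"] == 0:
        raise ValueError("a job module requires an initialization module")
    return {
        "initialization": counts["compchem:initialization"],
        "calculation": counts["compchem:calculation"],
        "finalization": counts["compchem:finalization"],
    }


# Same shape as Example 5: one initialization, three calculations, one finalization.
example_5 = ET.fromstring(
    '<module xmlns="http://www.xml-cml.org/schema" dictRef="compchem:job" id="job-0000">'
    '<module dictRef="compchem:initialization"/>'
    '<module dictRef="compchem:calculation"/>'
    '<module dictRef="compchem:calculation"/>'
    '<module dictRef="compchem:calculation"/>'
    '<module dictRef="compchem:finalization"/>'
    '</module>'
)
summary = summarise_job(example_5)
```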
            

3.3 Model Initialization

The model initialization concept represents the model parameters and inputs for a computational job or calculation. The model parameters are one of the most important elements and exist in every modelling study. Therefore, this concept is a REQUIRED element in the CompChem convention. The module defines the job or calculation, so it should be possible to reproduce the job or calculation based solely on the data in this module.

The model initialization concept in CompChem is represented by a module element with a dictRef="compchem:initialization" attribute. Such a module is referred to as an initialization module or an initialization module element in this convention.

An initialization module is constructed according to the following rules:

Example 7: An initialization module

<module dictRef="compchem:initialization">
    <molecule convention="convention:molecular">
        <!-- consult molecular convention or recommended molecular convention. -->
    </molecule>
    <parameterList>
        <!-- parameterList body is omitted. -->
    </parameterList>
</module>
            

3.4 Model Calculation

A model calculation concept represents the computation, optimisation or iteration processes of a computational job. Almost any computational procedure is a calculation, and calculations can be nested to any level. As an example, an SCF calculation consists of an initial guess calculation and a number of iterative calculations, with the output of the final iteration constituting the results. An SCF geometry optimisation process consists of multiple calculation steps, each of which consists of an SCF calculation followed by a gradient calculation.

Calculations can also inherit attributes from their parents, so for example, a basis set need not be contained within each SCF iteration calculation, as the parent calculation's basis set can be inherited by the iteration calculation.
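
The inheritance described above amounts to a lookup that walks from a calculation up through its ancestors until the wanted item is found. The following non-normative Python sketch illustrates this under several assumptions that this section does not confirm: that the inherited items are parameter elements with a dictRef attribute held in an initialization child module, that the prefix is literally "compchem", and that compchem:basisSet is the relevant term. The function name find_inherited_parameter is hypothetical.

```python
import xml.etree.ElementTree as ET

CML = "{http://www.xml-cml.org/schema}"  # Clark-notation prefix for CML element names


def find_inherited_parameter(root, module, dict_ref):
    """Return the nearest parameter with the given dictRef, searching the
    module's own initialization first and then each ancestor's in turn."""
    # ElementTree elements have no parent pointer, so build a child -> parent map.
    parent_of = {child: parent for parent in root.iter() for child in parent}
    current = module
    while current is not None:
        for init in current.findall(f"{CML}module"):
            if init.get("dictRef") != "compchem:initialization":
                continue
            for param in init.iter(f"{CML}parameter"):
                if param.get("dictRef") == dict_ref:
                    return param
        current = parent_of.get(current)
    return None


# An outer calculation that declares a basis set and an inner one that does not.
job = ET.fromstring(
    '<module xmlns="http://www.xml-cml.org/schema" dictRef="compchem:job">'
    '<module dictRef="compchem:calculation" id="scf">'
    '<module dictRef="compchem:initialization">'
    '<parameterList>'
    '<parameter dictRef="compchem:basisSet"><scalar>6-31G</scalar></parameter>'
    '</parameterList>'
    '</module>'
    '<module dictRef="compchem:calculation" id="iteration-1"/>'
    '</module>'
    '</module>'
)
iteration = job.find(f".//{CML}module[@id='iteration-1']")
inherited = find_inherited_parameter(job, iteration, "compchem:basisSet")
```

Searching the nearest module first means a nested calculation that does declare its own value takes precedence over the value of its parent.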

The calculation process may or may not be of interest to some scientists; therefore, the model calculation is OPTIONAL information in CompChem.

The model calculation concept in CompChem is represented by a module element with a dictRef="compchem:calculation" attribute. Such a module is referred to as a calculation module or a calculation module element in this convention.

A calculation module is constructed according to the following rules: