Chemical Markup Language - Dictionary Convention

8 April 2011

This version:
http://www.xml-cml.org/convention/dictionary-20110408
Latest version:
http://www.xml-cml.org/convention/dictionary
Previous version:
http://www.xml-cml.org/convention/dictionary-20110209
Authors:
See acknowledgments.
Editors:
Sam Adams, University of Cambridge
Joe Townsend, University of Cambridge

Abstract

This specification defines the requirements of the Chemical Markup Language dictionary convention.


Table of Contents

1. Introduction
    1.1 Notational Conventions
    1.2 Namespaces
2. Applying the dictionary convention
3. Dictionary Element
    3.1 Namespace
    3.2 Prefix
    3.3 Title
    3.4 Description
    3.5 Entries
4. Entry Elements
    4.1 ID
    4.2 Term
    4.3 Definition
    4.4 Description
    4.5 Data type
    4.6 Unit type
    4.7 Default units
5. Example Dictionary

Appendices

A. References
B. Acknowledgements


1. Introduction

Dictionaries allow CML to be understood by machines. Much of physical science is managed through the dictionary mechanism. We find terms and units relating to an aspect of science (such as heat of formation, melting point, point group) and create entries for these items in a dictionary.

An entry can consist of just a unique id (within the dictionary's namespace) and a human-readable definition, but we strongly encourage giving more information: for instance, the units, any upper and lower bounds, and the type of the data (string, integer, float, etc.).

Different programs sometimes produce data with the same label but a different interpretation: does density mean electron density or physical density? Each computational chemistry code will therefore have its own dictionary, and the community can then decide to group particular concepts together.

Where concepts are defined by the CML schema they SHOULD NOT be redefined using the dictionary mechanism.

Where concepts are defined by standard dictionaries these entries SHOULD be referenced, rather than redefining the concept in another dictionary.

1.1 Notational Conventions

The keywords "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT", "SHOULD", "SHOULD NOT", "RECOMMENDED", "MAY", and "OPTIONAL" in this document are to be interpreted as described in RFC 2119 [RFC2119].

The terms "element", "attribute", "child" and "parent" in this document are to be interpreted as described in the W3C Recommendation for Extensible Markup Language (XML) [XML].

The use of fonts is as follows:

1.2 Namespaces

This specification uses the following namespaces and prefixes to indicate those namespaces:

Prefix      Namespace URI                        Description
cml         http://www.xml-cml.org/schema        Chemical Markup Language elements
convention  http://www.xml-cml.org/convention/   Standard Chemical Markup Language convention namespace
xhtml       http://www.w3.org/1999/xhtml         XHTML

2. Applying the dictionary convention

The dictionary convention MUST be specified using the convention attribute on either a cml or a dictionary element. If the convention is specified on a cml element then that element MUST contain a single child element in the http://www.xml-cml.org/schema namespace, which MUST be a dictionary element.
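The rule above can be sketched as a small check. This is a minimal illustration, not a reference validator: the function names are hypothetical, only the root element is inspected, and it assumes that a convention QName is resolved by appending its local part to the namespace URI bound to its prefix.

```python
import io
import xml.etree.ElementTree as ET

CML_NS = "http://www.xml-cml.org/schema"
# Assumption: "convention:dictionary" resolves to prefix URI + local name.
DICTIONARY_CONVENTION = "http://www.xml-cml.org/convention/dictionary"


def cml(tag):
    """Return the ElementTree name of a tag in the CML namespace."""
    return "{" + CML_NS + "}" + tag


def parse_with_prefixes(xml_text):
    """Parse XML and return (root, prefix -> URI map).

    Simplification: the map is document-wide, so a prefix redeclared
    on a nested element overwrites the earlier binding.
    """
    prefixes = {}
    root = None
    source = io.BytesIO(xml_text.encode("utf-8"))
    for event, payload in ET.iterparse(source, events=("start-ns", "start")):
        if event == "start-ns":
            prefix, uri = payload
            prefixes[prefix] = uri
        elif root is None:
            root = payload  # first "start" event is the root element
    return root, prefixes


def resolve_qname(value, prefixes):
    """Resolve 'prefix:local' to a URI, or None if it cannot be resolved."""
    prefix, sep, local = value.partition(":")
    if not sep or prefix not in prefixes:
        return None
    return prefixes[prefix] + local


def applies_dictionary_convention(xml_text):
    """Check the root element against the rule for applying the convention."""
    root, prefixes = parse_with_prefixes(xml_text)
    if root.tag not in (cml("cml"), cml("dictionary")):
        return False
    convention = resolve_qname(root.get("convention", ""), prefixes)
    if convention != DICTIONARY_CONVENTION:
        return False
    if root.tag == cml("cml"):
        # A cml wrapper must hold exactly one CML child: a dictionary.
        cml_children = [c for c in root if c.tag.startswith("{" + CML_NS + "}")]
        return len(cml_children) == 1 and cml_children[0].tag == cml("dictionary")
    return True
```

The check accepts either a bare dictionary root or a cml root wrapping a single dictionary, and rejects a cml root whose CML child is any other element.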

3. Dictionary Element

3.1 Namespace

The dictionary element MUST have a namespace attribute, the value of which MUST be a valid URI defining the scope within which the entry terms are unique. The dictionary's namespace URI SHOULD resolve to a representation of the dictionary. The dictionary's namespace URI SHOULD end with either a '/' character or a '#' character so that terms may be referenced by appending them to the URI.
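To illustrate why the trailing '/' or '#' matters, here is a minimal sketch of building a reference to an entry by appending its id to the dictionary namespace. The function names are hypothetical, and the sketch assumes the appended term is the entry's id.

```python
def is_appendable_namespace(namespace):
    """True if the namespace URI ends with '/' or '#', as recommended."""
    return namespace.endswith(("/", "#"))


def entry_uri(namespace, entry_id):
    """Reference an entry by appending its id to the dictionary namespace."""
    return namespace + entry_id
```

For the namespace http://www.xml-cml.org/dictionary/dummy/ and the id molecmass, this yields http://www.xml-cml.org/dictionary/dummy/molecmass. Without the trailing separator, the same concatenation would run the last path segment and the id together.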

3.2 Prefix

The dictionary element SHOULD have a dictionaryPrefix attribute specifying the default prefix to use when referencing dictionary entries. The dictionaryPrefix MUST be a valid XML QName prefix, and SHOULD be unique within the CML domain.

3.3 Title

The dictionary element SHOULD have a title attribute intended for human-readability.

3.4 Description

The dictionary element SHOULD have a single description child element, the contents of which provide a human-readable description of the domain of the dictionary. The description element MUST contain one or more child elements in the http://www.w3.org/1999/xhtml namespace. The description element MUST NOT contain any child elements not in the http://www.w3.org/1999/xhtml namespace.

3.5 Entries

The dictionary element MUST contain one or more child entry elements, and MUST NOT contain any other child elements from the http://www.xml-cml.org/schema namespace.

4. Entry Elements

4.1 ID

An entry element MUST have an id attribute, the value of which MUST be unique within the scope of the dictionary.

The value of the id attribute MUST start with a letter, and MUST only contain letters, numbers, dot, hyphen or underscore.

IdStartChar ::= [A-Z] | [a-z]
IdChar ::= IdStartChar | [0-9] | "." | "-" | "_"
Id ::= IdStartChar (IdChar)*
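The grammar above translates directly into a regular expression. The following is a minimal sketch; the names ID_PATTERN and is_valid_entry_id are illustrative only.

```python
import re

# IdStartChar followed by zero or more IdChar, as in the grammar above.
ID_PATTERN = re.compile(r"[A-Za-z][A-Za-z0-9._-]*")


def is_valid_entry_id(value):
    """True if the whole string matches the Id production."""
    return ID_PATTERN.fullmatch(value) is not None
```

Note that the grammar admits only ASCII letters and digits, so the pattern uses explicit character ranges rather than Unicode-aware classes such as \w. Uniqueness within the dictionary has to be checked separately.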

4.2 Term

An entry element MUST have a term attribute, the value of which provides a unique nounal phrase linguistically identifying the subject of the entry.

The value of the term attribute MAY contain any valid unicode character, however it is RECOMMENDED that any character from outside of the ASCII subset (codepoints 32-127) is represented using an entity reference.
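As an illustration of the recommendation above, the sketch below rewrites a term so that every non-ASCII character becomes a numeric character reference. It assumes numeric character references are an acceptable form of reference here; the function name is hypothetical.

```python
from xml.sax.saxutils import escape


def ascii_safe_term(term):
    """Return the term escaped for use as an ASCII-only attribute value."""
    # Escape XML special characters first, so that the '&' introduced
    # by the character references below is not escaped a second time.
    escaped = escape(term, {'"': "&quot;"})
    # Replace each non-ASCII character with a reference such as &#197;.
    return escaped.encode("ascii", "xmlcharrefreplace").decode("ascii")
```

For example, the term "Ångström" becomes "&#197;ngstr&#246;m", which an XML parser reads back as the original string.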

4.3 Definition

An entry element MUST contain a single definition child element, the content of which provides a nounal phrase defining the subject of the entry more verbosely than the term.

The definition element MUST contain one or more child elements in the http://www.w3.org/1999/xhtml namespace. The definition element MUST NOT contain any child elements not in the http://www.w3.org/1999/xhtml namespace.

4.4 Description

An entry element MAY have a single description child element, the content of which provides further information regarding the term, including, but not limited to: examples, human-readable semantics and hyperlinks to other useful resources.

The description element MUST contain one or more child elements in the http://www.w3.org/1999/xhtml namespace. The description element MUST NOT contain any child elements not in the http://www.w3.org/1999/xhtml namespace.
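The content rule shared by the definition and description elements can be sketched as follows. This is a minimal check on direct children only, and the function name is hypothetical.

```python
import xml.etree.ElementTree as ET

XHTML_NS = "http://www.w3.org/1999/xhtml"


def has_only_xhtml_children(element):
    """True if the element has one or more children, all in the XHTML namespace."""
    children = list(element)
    if not children:
        return False
    prefix = "{" + XHTML_NS + "}"
    # Comments and processing instructions have a non-string tag.
    return all(isinstance(c.tag, str) and c.tag.startswith(prefix) for c in children)
```

A definition holding a single xhtml:p passes. One holding only bare text, or a child from any other namespace, does not.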

4.5 Data type

When applicable to the concept defined, an entry SHOULD have a dataType attribute, the value of which is a QName referencing the data type of any value defined using the entry.

Common data types:

4.6 Unit type

An entry MUST have a unitType attribute, the value of which is a QName referencing the unit type (e.g. temperature) of any value defined using the entry.

An entry describing a concept which should not have units (for example, the name of a program) should have the unitType none from the standard CML unitType dictionary.

4.7 Default units

When applicable to the concept defined, an entry SHOULD have a units attribute, the value of which is a QName referencing the default units (e.g. Kelvin) of any value defined using the entry.

If the unitType is expressly given as unknown then the units attribute MUST NOT be present.

If the unitType is expressly given as none then the units attribute MUST be present and its value MUST point to http://www.xml-cml.org/unit/si#none.
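The two rules above can be sketched as a small check. The function name is hypothetical, and the sketch assumes that "expressly given as unknown" or "none" means the local part of the unitType QName, and that the caller has already resolved the default-units QName to a full URI (or None when the attribute is absent).

```python
SI_NONE = "http://www.xml-cml.org/unit/si#none"


def units_problems(unit_type, units_uri):
    """Return a list of rule violations for one entry.

    unit_type: raw unitType attribute value, e.g. "unitType:none".
    units_uri: resolved URI of the default units, or None if absent.
    """
    local = unit_type.rpartition(":")[2]  # local part of the QName
    problems = []
    if local == "unknown" and units_uri is not None:
        problems.append("unitType is unknown, so default units must be absent")
    if local == "none" and units_uri != SI_NONE:
        problems.append("unitType is none, so default units must be " + SI_NONE)
    return problems
```

An empty list means that neither rule is violated. Entries with any other unitType are not constrained by this check.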

5. Example Dictionary

<?xml version="1.0" encoding="UTF-8" ?>
<dictionary xmlns="http://www.xml-cml.org/schema"
            xmlns:convention="http://www.xml-cml.org/convention/"
            xmlns:unit="http://www.xml-cml.org/dictionary/unit/"
            xmlns:unitType="http://www.xml-cml.org/dictionary/unitType/"
            xmlns:xhtml="http://www.w3.org/1999/xhtml"
            xmlns:xsd="http://www.w3.org/2001/XMLSchema"
            convention="convention:dictionary"
            title="fundamental chemistry concepts"
            namespace="http://www.xml-cml.org/dictionary/dummy/"
            dictionaryPrefix="dummy">

    <entry id="molecmass" term="Molecular Mass"
           dataType="xsd:double" unitType="unitType:amount" units="unit:amu">
        <definition>
            <xhtml:p>
                The mass of one molecule of a substance in unified atomic mass units.
            </xhtml:p>
        </definition>
        <description>
            <xhtml:p>
                The molecular mass (m) of a substance is the mass of one molecule of that substance,
                in unified atomic mass unit(s) u (equal to 1/12 the mass of one atom of the isotope
                carbon-12). This is numerically equivalent to the relative molecular mass (Mr) of a
                molecule, frequently referred to by the term molecular weight, which is the ratio of
                the mass of that molecule to 1/12 of the mass of carbon-12 and is a dimensionless
                number. Thus, it is incorrect to express relative molecular mass (molecular weight)
                in daltons (Da). Unfortunately, the terms molecular weight and molecular mass have
                been confused on numerous websites, which often state that molecular weight was used
                in the past as another term for molecular mass.
            </xhtml:p>
            <xhtml:p>
                Molecular mass differs from more common measurements of the mass of chemicals, such
                as molar mass, by taking into account the isotopic composition of a molecule rather
                than the average isotopic distribution of many molecules. As a result, molecular mass
                is a more precise number than molar mass; however it is more accurate to use molar
                mass on bulk samples. This means that molar mass is appropriate most of the time
                except when dealing with single molecules.
            </xhtml:p>
        </description>
    </entry>

    <entry id="molarmass" term="Molar Mass"
           dataType="xsd:double" unitType="unitType:amount" units="unit:amu">
        <definition>
            <xhtml:p>
                The mass per amount of substance.
            </xhtml:p>
        </definition>
        <description>
            <xhtml:p>
                Molar mass, symbol M, is a physical property characteristic of a given substance
                (chemical element or chemical compound), namely its mass per amount of substance.
                The base SI unit for mass is the kilogram and that for amount of substance is
                the mole. Thus, the derived unit for molar mass is kg/mol. However, for both
                practical and historical reasons, molar masses are almost always quoted in grams
                per mole (g/mol or g mol⁻¹), especially in chemistry.
            </xhtml:p>
            <xhtml:p>
                Molar mass is closely related to the relative molar mass (Mr) of a compound, the
                older term formula weight and to the standard atomic masses of its constituent
                elements. However, it should be distinguished from the molecular mass (also
                known as molecular weight), which is the mass of one molecule (of any single
                isotopic composition) and is not directly related to the atomic mass, the mass
                of one atom (of any single isotope). The dalton, symbol Da, is also sometimes
                used as a unit of molar mass, especially in biochemistry, with the definition
                1 Da = 1 g/mol, despite the fact that it is strictly a unit of molecular mass
                (1 Da = 1.660 538 782(83)×10⁻²⁷ kg).
            </xhtml:p>
        </description>
    </entry>

</dictionary>

A. References

[RFC2119]
IETF RFC 2119: Key words for use in RFCs to Indicate Requirement Levels, S. Bradner, March 1997. Available at http://www.ietf.org/rfc/rfc2119.txt.
[XML]
Extensible Markup Language (XML) 1.0 (Fifth Edition), T. Bray, J. Paoli, C.M. Sperberg-McQueen, E. Maler and F. Yergeau, Editors. World Wide Web Consortium. 26 November 2008. This version is http://www.w3.org/TR/2008/REC-xml-20081126. The latest version of XML is available at http://www.w3.org/TR/REC-xml.

B. Acknowledgements


This work is licensed under a Creative Commons Attribution 3.0 Unported License.