Chemical Markup Language - Dictionaries

9 February 2011

This version:
http://www.xml-cml.org/spec/dictionaries-20110209
Latest version:
http://www.xml-cml.org/spec/dictionaries
Authors:
See Acknowledgements.
Editors:
Sam Adams, University of Cambridge
Joe Townsend, University of Cambridge

Abstract

This specification explains the CML dictionaries mechanism, and defines the minimum requirements for interoperability.


Table of Contents

1. Introduction
    1.1 Notational Conventions
    1.2 Namespaces
2. The Dictionary Mechanism
    2.1 Dictionary Namespace
3. Convention
4. Authoring Dictionaries
    4.1 Dictionary Element
    4.2 Entry Element
5. Standard Dictionaries

Appendices

A. References
B. Acknowledgements


1. Introduction

Dictionaries allow concepts beyond those defined by the core CML schema types to be understood by machines. Much of physical science can be managed through referencing appropriate dictionary entries, which can represent data types (e.g. string, integer, double), units (e.g. grams, Kelvin), physical properties (e.g. heat of formation, melting point) and other scientific concepts (e.g. point group).

Dictionaries also provide for disambiguation between similarly named concepts. Different programs sometimes produce data with the same label but a different interpretation: does density mean electron density or physical density? This ambiguity can be resolved through referencing specific dictionary entries.

1.1 Notational Conventions

The keywords "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT", "SHOULD", "SHOULD NOT", "RECOMMENDED", "MAY", and "OPTIONAL" in this document are to be interpreted as described in RFC 2119 [RFC2119].

The terms "element", "attribute", "child" and "parent" in this document are to be interpreted as described in the W3C Recommendation for Extensible Markup Language (XML) [XML].

1.2 Namespaces

This specification uses the following namespaces and prefixes to indicate those namespaces:

Prefix      Namespace URI                        Description
cml         http://www.xml-cml.org/schema        Chemical Markup Language elements
convention  http://www.xml-cml.org/convention/   Standard Chemical Markup Language convention namespace
xhtml       http://www.w3.org/1999/xhtml         XHTML

2. The Dictionary Mechanism

CML dictionaries consist of a collection of entries, each of which defines a single concept. Dictionaries are constructed using the dictionary element, and entries using the entry element.

Concepts defined by dictionary entries can be referenced from other locations using dictionary references, specified using dictRef attributes. In order to be referenceable a dictionary MUST specify a namespace using the namespace attribute, and entries MUST specify an ID using the id attribute, which MUST be unique within the scope of the entry's parent dictionary. The value of the dictRef attribute MUST be a QName which expands to the namespace and id of the dictionary entry defining the concept being referenced.

The value of an entry's id attribute MUST start with a letter, and MUST only contain letters, numbers, dot, hyphen or underscore.

IdStartChar ::= [A-Z] | [a-z]
IdChar ::= IdStartChar | [0-9] | "." | "-" | "_"
Id ::= IdStartChar (IdChar)*
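
The Id production above can be checked mechanically. The following is a minimal Python sketch, not part of this specification; the names ID_PATTERN and is_valid_entry_id are illustrative assumptions. It transcribes the grammar into a regular expression and requires the whole value to match.

```python
import re

# Transcription of the Id production: one ASCII letter, followed by any
# number of ASCII letters, digits, dots, hyphens or underscores.
# ID_PATTERN and is_valid_entry_id are hypothetical names for this sketch.
ID_PATTERN = re.compile(r"[A-Za-z][A-Za-z0-9._-]*")


def is_valid_entry_id(value: str) -> bool:
    """Return True if the whole of value matches the Id production."""
    return ID_PATTERN.fullmatch(value) is not None
```

Under this sketch "mw" and "heat_of_formation" are accepted, while "2theta" (starts with a digit) and "melting point" (contains a space) are rejected.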

A minimal CML dictionary:

<?xml version="1.0" encoding="UTF-8"?>
<dictionary xmlns="http://www.xml-cml.org/schema"
        namespace="http://www.xml-cml.org/dictionary/dummy/">

    <entry id="mw" term="Molecular Weight" />

</dictionary>
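
As an illustration of how software might consume such a dictionary, the following Python sketch indexes each entry by the pair (namespace, id), which is the pair a dictRef expands to. It uses only the standard library; the function name index_dictionary and the choice of a tuple key are assumptions made for this example, not requirements of this specification.

```python
import xml.etree.ElementTree as ET

CML_NS = "http://www.xml-cml.org/schema"

# The minimal dictionary shown above, as bytes.
DICTIONARY_XML = b"""<?xml version="1.0" encoding="UTF-8"?>
<dictionary xmlns="http://www.xml-cml.org/schema"
        namespace="http://www.xml-cml.org/dictionary/dummy/">
    <entry id="mw" term="Molecular Weight" />
</dictionary>
"""


def index_dictionary(xml_bytes: bytes) -> dict[tuple[str, str], str]:
    """Map (dictionary namespace, entry id) to the entry's term."""
    root = ET.fromstring(xml_bytes)
    if root.tag != f"{{{CML_NS}}}dictionary":
        raise ValueError("root element is not a CML dictionary")
    namespace = root.get("namespace")
    if namespace is None:
        raise ValueError("a dictionary without a namespace is not referenceable")
    index: dict[tuple[str, str], str] = {}
    for entry in root.iter(f"{{{CML_NS}}}entry"):
        entry_id = entry.get("id")
        if entry_id is None:
            continue  # an entry without an id cannot be referenced
        key = (namespace, entry_id)
        if key in index:
            raise ValueError(f"duplicate entry id: {entry_id}")
        index[key] = entry.get("term", "")
    return index
```

Calling index_dictionary(DICTIONARY_XML) yields a single key, ("http://www.xml-cml.org/dictionary/dummy/", "mw"), mapped to "Molecular Weight".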

A CML property referencing a concept defined by a dictionary entry:

<?xml version="1.0" encoding="UTF-8"?>
<cml xmlns="http://www.xml-cml.org/schema"
        xmlns:xsd="http://www.w3.org/2001/XMLSchema"
        xmlns:dummy="http://www.xml-cml.org/dictionary/dummy/">

    <property dictRef="dummy:mw">
        <scalar dataType="xsd:double">72.02</scalar>
    </property>

</cml>
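
Expanding a dictRef requires the namespace declarations that are in scope at the referencing element. The sketch below is one possible approach in Python, assuming the standard library's xml.etree.ElementTree; because parsed elements do not retain prefix bindings, it tracks them from the "start-ns" and "end-ns" events of iterparse. The function name resolve_dict_refs and the shape of its result are illustrative assumptions.

```python
import io
import xml.etree.ElementTree as ET

# A document like the example above; xmlns:xsd is declared here so that the
# dataType QName also has a namespace binding.
SAMPLE = b"""<?xml version="1.0" encoding="UTF-8"?>
<cml xmlns="http://www.xml-cml.org/schema"
        xmlns:xsd="http://www.w3.org/2001/XMLSchema"
        xmlns:dummy="http://www.xml-cml.org/dictionary/dummy/">
    <property dictRef="dummy:mw">
        <scalar dataType="xsd:double">72.02</scalar>
    </property>
</cml>
"""


def resolve_dict_refs(xml_bytes: bytes) -> list[tuple[str, str, str]]:
    """Return (element tag, dictionary namespace, entry id) per dictRef."""
    scopes: list[tuple[str, str]] = []  # stack of in-scope (prefix, uri) pairs
    resolved = []
    events = ("start-ns", "end-ns", "start")
    for event, payload in ET.iterparse(io.BytesIO(xml_bytes), events=events):
        if event == "start-ns":
            scopes.append(payload)  # payload is a (prefix, uri) tuple
        elif event == "end-ns":
            scopes.pop()  # one declaration goes out of scope
        else:
            ref = payload.get("dictRef")
            if ref is None:
                continue
            prefix, colon, entry_id = ref.partition(":")
            if not colon:
                raise ValueError(f"dictRef is not a prefixed QName: {ref}")
            bindings = dict(scopes)  # inner declarations override outer ones
            if prefix not in bindings:
                raise ValueError(f"undeclared prefix in dictRef: {ref}")
            resolved.append((payload.tag, bindings[prefix], entry_id))
    return resolved
```

For SAMPLE, the only dictRef resolves to the namespace http://www.xml-cml.org/dictionary/dummy/ and the entry id mw, which identifies the entry in the minimal dictionary shown earlier.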

2.1 Dictionary Namespace

A dictionary's namespace URI SHOULD resolve to a representation of the dictionary. A dictionary's namespace URI SHOULD end with either a '/' character or a '#' character so that referenced terms may be resolved by appending them to the URI.
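
The appending rule can be sketched in a few lines of Python. The function name term_uri is an assumption for this example, and the sketch is deliberately stricter than the text: it rejects a namespace that does not end in '/' or '#', whereas this specification only says such a namespace SHOULD be avoided.

```python
def term_uri(namespace: str, entry_id: str) -> str:
    """Build the URI of a dictionary term by appending its id to the namespace.

    Hypothetical helper: treats the SHOULD on the trailing '/' or '#' as a
    hard requirement so that the result is always unambiguous.
    """
    if not namespace.endswith(("/", "#")):
        raise ValueError(f"namespace does not end with '/' or '#': {namespace}")
    return namespace + entry_id
```

With the dummy dictionary used in the examples, term_uri("http://www.xml-cml.org/dictionary/dummy/", "mw") gives http://www.xml-cml.org/dictionary/dummy/mw.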

In order for dictionary namespaces to be resolvable a CML document SHOULD NOT contain more than one dictionary.

3. Convention

The CML schema only loosely defines the structure of a dictionary. More detailed structure is imposed through use of conventions.

To promote interoperability between CML-aware software and dictionaries, all dictionaries SHOULD specify a convention, using the convention attribute. The RECOMMENDED convention for a dictionary is http://www.xml-cml.org/convention/dictionary.

4. Authoring Dictionaries

4.1 Dictionary Element

A dictionary element SHOULD have a namespace attribute defining the scope within which the entry terms are unique.

A dictionary element MAY have a title attribute specifying a human-readable name for the dictionary.

A dictionary element MAY have a description child element specifying a human-readable description of the dictionary.

A dictionary element MAY have a dictionaryPrefix attribute specifying a default prefix to use when referencing dictionary entries.

4.2 Entry Element

An entry element SHOULD have an id attribute specifying a unique ID, within the scope of the dictionary, by which the entry can be referenced.

An entry element SHOULD have a term attribute specifying a unique nounal phrase that linguistically identifies the subject of the entry.

An entry element MAY have a child definition element containing a nounal phrase defining the subject of the entry more verbosely than the term.

An entry element MAY have a child description element containing further information regarding the term.

An entry element MAY have a dataType attribute specifying the data type of values referencing the entry.

An entry element MAY have a unitType attribute specifying the unit type of values referencing the entry.

An entry element MAY have a units attribute specifying the units of values referencing the entry.
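
The SHOULD-level recommendations of this section, together with the id rules of section 2, lend themselves to an automated check. The Python sketch below is one possible linter and not a normative validator; the function name lint_dictionary and the wording of its messages are assumptions, and it assumes the document's root element is the dictionary and that entries are its direct children.

```python
import re
import xml.etree.ElementTree as ET

CML_NS = "http://www.xml-cml.org/schema"
# Same Id grammar as in section 2: a letter, then letters, digits, '.', '-', '_'.
ID_PATTERN = re.compile(r"[A-Za-z][A-Za-z0-9._-]*")


def lint_dictionary(xml_bytes: bytes) -> list[str]:
    """Return human-readable problems found in a dictionary document."""
    root = ET.fromstring(xml_bytes)
    problems = []
    if root.get("namespace") is None:
        problems.append("dictionary: missing namespace attribute")
    seen_ids = set()
    entries = root.findall(f"{{{CML_NS}}}entry")
    for position, entry in enumerate(entries, start=1):
        label = f"entry #{position}"
        entry_id = entry.get("id")
        if entry_id is None:
            problems.append(f"{label}: missing id attribute")
        elif ID_PATTERN.fullmatch(entry_id) is None:
            problems.append(f"{label}: id {entry_id!r} does not match the Id grammar")
        elif entry_id in seen_ids:
            problems.append(f"{label}: duplicate id {entry_id!r}")
        else:
            seen_ids.add(entry_id)
        if entry.get("term") is None:
            problems.append(f"{label}: missing term attribute")
    return problems
```

An empty list means none of these checks failed. The optional title, description, definition, dataType, unitType and units items are not inspected, since their absence is permitted.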

5. Standard Dictionaries

Where concepts are defined by the CML schema they SHOULD NOT be redefined using the dictionary mechanism.

Where concepts are defined by standard dictionaries these entries SHOULD be referenced, rather than redefining the concept in another dictionary. Standard (widely used) dictionaries SHOULD be namespaced under the http://www.xml-cml.org/dictionary/ domain, and registered with the xml-cml.org site.

The following standard dictionaries are defined, or are expected to be defined.

5.x Dummy

For testing purposes: conceptually equivalent to http://example.com.

5.x Core Concepts

5.x Units

5.x Comp Chem

5.x Crystallography

5.x Polymers

5.x Reactions

5.x Spectroscopy

A. References

[RFC2119]
IETF RFC 2119: Key words for use in RFCs to Indicate Requirement Levels, S. Bradner, March 1997. Available at http://www.ietf.org/rfc/rfc2119.txt.
[XML]
Extensible Markup Language (XML) 1.0 (Fifth Edition), T. Bray, J. Paoli, C.M. Sperberg-McQueen, E. Maler and F. Yergeau, Editors. World Wide Web Consortium, 26 November 2008. This version is http://www.w3.org/TR/2008/REC-xml-20081126. The latest version of XML is available at http://www.w3.org/TR/REC-xml.

B. Acknowledgements


Creative Commons Licence
This work is licensed under a Creative Commons Attribution 3.0 Unported License.