Chemical Markup Language - UnitType Dictionary Convention

10 March 2011

This version:
http://www.xml-cml.org/convention/unitType-dictionary-20110310
Latest version:
http://www.xml-cml.org/convention/unitType-dictionary
Authors:
See acknowledgments.
Editors:
Sam Adams, University of Cambridge
Joe Townsend, University of Cambridge

Abstract

This specification defines the requirements of the Chemical Markup Language unitType-dictionary convention.


Table of Contents

1. Introduction
    1.1 Notational Conventions
    1.2 Namespaces
2. Applying the unitType-dictionary convention
3. UnitTypeList Element
    3.1 Namespace
    3.2 Title
    3.3 Description
    3.4 UnitTypes
4. UnitType Elements
    4.1 ID
    4.2 Name
    4.3 Title
    4.4 Preserve
    4.5 Dimensions
    4.6 Definition
    4.7 Description
5. Dimension Elements
    5.1 Name
    5.2 UnitType
    5.3 Power
6. Example UnitType Dictionary

Appendices

A. References
B. Acknowledgements


1. Introduction

Units are required throughout CML, but both humans and machines need to know how they can be combined. To support this, CML includes the unitType concept. Each unitType needs a unique identifier and a definition that both humans and machines can understand.

Lists of unitTypes are similar to dictionaries but require more information for each "entry", such as its dimensionality. For example, the units metre, angstrom and picometre all have the dimension length, whereas the unit square metre has the dimension length squared. The phrases "unitType list" and "unitType dictionary" are used interchangeably; the only reason for choosing one term over the other is readability.

Where unitTypes are already defined in the standard unitType dictionaries (see http://www.xml-cml.org/unit/) these unitTypes SHOULD be used, rather than redefining the concepts in another unitType dictionary.

1.1 Notational Conventions

The keywords "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT", "SHOULD", "SHOULD NOT", "RECOMMENDED", "MAY", and "OPTIONAL" in this document are to be interpreted as described in RFC 2119 [RFC2119].

The terms "element", "attribute", "child" and "parent" in this document are to be interpreted as described in the W3C Recommendation for Extensible Markup Language (XML) [XML].

1.2 Namespaces

This specification uses the following namespaces and prefixes to indicate those namespaces:

Prefix      Namespace URI                         Description
cml         http://www.xml-cml.org/schema         Chemical Markup Language elements
convention  http://www.xml-cml.org/convention/    Standard Chemical Markup Language convention namespace
xhtml       http://www.w3.org/1999/xhtml          XHTML

2. Applying the unitType-dictionary convention

The unitType-dictionary convention MUST be specified using the convention attribute on a unitTypeList element.
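As a rough illustration, a processor could check this requirement by resolving the QName in the convention attribute against the namespace declarations in the document. The Python sketch below uses only the standard library's xml.etree.ElementTree; the function name uses_unittype_convention is hypothetical, and it assumes each prefix is declared once (typically on the root element), so it does not model per-element namespace scoping.

```python
import io
import xml.etree.ElementTree as ET

CML_NS = "http://www.xml-cml.org/schema"
CONVENTION_NS = "http://www.xml-cml.org/convention/"


def uses_unittype_convention(xml_text: str) -> bool:
    """Return True if the root element is a cml unitTypeList whose convention
    attribute resolves to the unitType-dictionary convention.

    Simplification: only the first binding seen for each prefix is kept,
    which is adequate when prefixes are declared once on the root element."""
    prefixes = {}
    root = None
    source = io.BytesIO(xml_text.encode("utf-8"))
    # 'start-ns' yields (prefix, uri) before the element that declares it.
    for event, item in ET.iterparse(source, events=("start-ns", "start")):
        if event == "start-ns":
            prefix, uri = item
            prefixes.setdefault(prefix, uri)
        elif root is None:
            root = item  # the first 'start' event is the document element
    if root is None or root.tag != f"{{{CML_NS}}}unitTypeList":
        return False
    prefix, sep, local = root.get("convention", "").partition(":")
    return bool(sep) and prefixes.get(prefix) == CONVENTION_NS and local == "unitType-dictionary"
```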

3. UnitTypeList Element

3.1 Namespace

The unitTypeList element MUST have a namespace attribute, the value of which MUST be a valid URI defining the scope within which the unitType ids are unique.

The unitTypeList's namespace URI SHOULD resolve to a representation of the dictionary of unitTypes. The unitTypeList's namespace URI SHOULD end with either a '/' character or a '#' character so that terms may be referenced by appending them to the URI.
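A minimal sketch of how a term URI might be formed under this rule, assuming the URI of a unitType is simply the namespace URI followed by its id. The helper term_uri is hypothetical, and it deliberately treats the SHOULD about a trailing '/' or '#' as an error.

```python
def term_uri(namespace: str, unit_type_id: str) -> str:
    """Build a URI for a unitType by appending its id to the dictionary's
    namespace URI (hypothetical helper for illustration)."""
    if not namespace.endswith(("/", "#")):
        # The convention only says SHOULD here; this sketch is deliberately
        # strict because appending to any other ending blurs the term boundary.
        raise ValueError(f"namespace {namespace!r} should end with '/' or '#'")
    return namespace + unit_type_id
```

For example, the id length in the dictionary http://www.xml-cml.org/unit/unitType/ would be referenced as http://www.xml-cml.org/unit/unitType/length.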

3.2 Title

The unitTypeList element SHOULD have a title attribute intended for human-readability.

The title attribute MUST NOT be empty and MUST contain at least one non-whitespace character.

The value of the title attribute MAY contain any valid Unicode character; however, it is RECOMMENDED that any character outside the ASCII subset (codepoints 32-127) be represented using an entity reference.
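One way to follow this recommendation when writing attribute values by hand is sketched below. It assumes that numeric character references such as &#197; are acceptable in place of entity references (XML's predefined entities cover only &, <, >, ' and "), and the function name escape_attr_value is hypothetical. The same approach would apply to the name attributes described later.

```python
from xml.sax.saxutils import escape


def escape_attr_value(text: str) -> str:
    """Prepare text for a double-quoted XML attribute: codepoints 32-127 are
    kept (with &, <, > and " escaped) and every other character is written
    as a numeric character reference."""
    text = escape(text, {'"': "&quot;"})  # XML-significant ASCII characters
    return "".join(ch if 32 <= ord(ch) <= 127 else f"&#{ord(ch)};" for ch in text)
```

For instance, a title of Ångström would be written as &#197;ngstr&#246;m.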

3.3 Description

The unitTypeList element SHOULD have a single description child element, the contents of which provide a human-readable description of the domain of the dictionary. The description element MUST contain one or more child elements in the http://www.w3.org/1999/xhtml namespace. The description element MUST NOT contain any child elements not in the http://www.w3.org/1999/xhtml namespace.
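The two MUST rules above could be checked with a few lines of Python, for example as follows. This is a sketch only: description_is_valid is a hypothetical name, and it assumes the description element has already been parsed with xml.etree.ElementTree, which drops comments by default so only element children are inspected.

```python
import xml.etree.ElementTree as ET

XHTML = "{http://www.w3.org/1999/xhtml}"


def description_is_valid(description: ET.Element) -> bool:
    """Check the two MUST rules for a description element: at least one
    child element, and no child element outside the XHTML namespace."""
    children = list(description)  # direct child elements only
    return bool(children) and all(child.tag.startswith(XHTML) for child in children)
```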

3.4 UnitTypes

The unitTypeList element MUST contain one or more child unitType elements, and MUST NOT contain any other child elements from the http://www.xml-cml.org/schema namespace apart from the description element defined in section 3.3.
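A sketch of this check, reading section 3.3 as permitting the description element alongside the unitType children. The name unit_type_list_children_ok is hypothetical; children in namespaces other than http://www.xml-cml.org/schema (for example Dublin Core metadata) are ignored, since the rule does not restrict them.

```python
import xml.etree.ElementTree as ET

CML = "{http://www.xml-cml.org/schema}"


def unit_type_list_children_ok(unit_type_list: ET.Element) -> bool:
    """At least one cml:unitType child, and no cml children other than
    unitType and description; children in other namespaces are ignored."""
    cml_tags = [child.tag for child in unit_type_list if child.tag.startswith(CML)]
    allowed = {CML + "unitType", CML + "description"}
    return (CML + "unitType") in cml_tags and set(cml_tags) <= allowed
```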

<?xml version="1.0" encoding="UTF-8" ?>
<unitTypeList
        xmlns="http://www.xml-cml.org/schema"
        xmlns:unitType="http://www.xml-cml.org/unit/unitType/"
        xmlns:xhtml="http://www.w3.org/1999/xhtml"
        xmlns:convention="http://www.xml-cml.org/convention/"
        namespace="http://www.xml-cml.org/unit/unitType/"
        title="example unitType dictionary"
        convention="convention:unitType-dictionary" >
     <description>
         <xhtml:p>
            This is an example unitType list for demonstration purposes
         </xhtml:p>
     </description>
     <unitType>
     <!-- rest of document omitted -->
     </unitType>
</unitTypeList>

4. UnitType Elements

4.1 ID

A unitType element MUST have an id attribute, the value of which MUST be unique within the scope of the unitTypeList.

The value of the id attribute MUST start with a letter, and MUST only contain letters, numbers, dot, hyphen or underscore.

IdStartChar ::= [A-Z] | [a-z]
IdChar ::= IdStartChar | [0-9] | "." | "-" | "_"
Id ::= IdStartChar (IdChar)*
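The grammar translates directly into a regular expression. The sketch below (hypothetical names is_valid_id and duplicate_ids) also reports ids repeated within one unitTypeList, assuming the ids have already been collected from the unitType elements.

```python
import re
from collections import Counter

# Id ::= IdStartChar (IdChar)*, with IdStartChar ::= [A-Z] | [a-z]
# and IdChar ::= IdStartChar | [0-9] | "." | "-" | "_"
ID_PATTERN = re.compile(r"[A-Za-z][A-Za-z0-9._-]*")


def is_valid_id(value: str) -> bool:
    return ID_PATTERN.fullmatch(value) is not None


def duplicate_ids(ids):
    """Ids that occur more than once in one unitTypeList (they must be unique)."""
    return sorted(i for i, n in Counter(ids).items() if n > 1)
```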

4.2 Name

A unitType element MUST have a name attribute, the value of which linguistically identifies the type of the unit. For example, 'length', 'volume', 'velocity', 'concentration' and 'wavenumber' would be suitable names for unitTypes.

The name attribute MUST NOT be empty and MUST contain at least one non-whitespace character.

The value of the name attribute MAY contain any valid Unicode character; however, it is RECOMMENDED that any character outside the ASCII subset (codepoints 32-127) be represented using an entity reference.

4.3 Title

A unitType element MUST have a title attribute, the value of which will typically be more descriptive than the name, e.g. 'concentration (moles of substance per volume of solution)'.

The title attribute MUST NOT be empty and MUST contain at least one non-whitespace character.

4.4 Preserve

A unitType element MAY have a preserve attribute. It is RECOMMENDED that the value is "true" or "false", although "1" (for true) and "0" (for false) are also allowed. The attribute has no default value, but in practice, if the preserve attribute is absent, it will generally be assumed that the dimensions are not preserved.
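A processor might interpret the attribute as below. This is a hedged sketch: preserve_dimensions is a hypothetical name, treating an absent attribute as False follows the general assumption stated above rather than a normative default, and rejecting any other value (and stripping surrounding whitespace) are design choices of this example.

```python
from typing import Optional


def preserve_dimensions(value: Optional[str]) -> bool:
    """Interpret a unitType's preserve attribute; None means 'absent'."""
    if value is None:
        return False  # the general assumption when the attribute is missing
    v = value.strip()
    if v in ("true", "1"):
        return True
    if v in ("false", "0"):
        return False
    raise ValueError(f"invalid preserve value: {value!r}")
```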

Dimensions of the same type can be combined algebraically using their power attributes. Normally, dimensions will be aggregated and cancelled algebraically, but the preserve attribute can be used to prevent this. Thus a velocity gradient (a velocity divided by a length) can be defined as:

        <unitType id="a1" preserve="true">
            <dimension name="length" power="1" />
            <dimension name="time" power="-1" />
            <dimension name="length" power="-1" />
        </unitType>
    
whereas cancelling the dimensions would give:
        <unitType id="a1" preserve="false">
            <dimension name="time" power="-1" />
        </unitType>
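The cancellation step can be sketched as summing the powers of each named dimension and dropping any that reach zero. Here dimensions are assumed to be supplied as (name, power) pairs taken from the dimension children; combine_dimensions is a hypothetical name, and the exact comparison with zero assumes integer or half-integer powers, which are represented exactly as doubles.

```python
from collections import defaultdict


def combine_dimensions(dimensions, preserve=False):
    """Combine (name, power) pairs for one unitType.

    With preserve=False, powers of the same dimension are summed and any
    dimension whose total power is zero is dropped; with preserve=True the
    pairs are returned unchanged, in their original order."""
    if preserve:
        return list(dimensions)
    totals = defaultdict(float)  # insertion-ordered, like any dict
    for name, power in dimensions:
        totals[name] += power
    return [(name, p) for name, p in totals.items() if p != 0]
```

Applied to the velocity gradient above, the default behaviour leaves only time to the power -1, while preserve=True keeps all three dimension entries.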
    

4.5 Dimensions

A unitType element MUST contain at least one dimension child. These elements define the unitType in terms of other unitTypes to particular powers.
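Because each dimension refers to another unitType, a unitType can be rewritten recursively in terms of base unitTypes, which refer only to themselves (see section 5.2). The sketch below assumes the dictionary has been loaded into a plain mapping from id to (referenced id, power) pairs; expand_to_base and the velocity and acceleration entries are hypothetical, and the result always cancels dimensions, so it ignores any preserve attribute.

```python
from collections import defaultdict


def expand_to_base(unit_type_id, definitions, _path=()):
    """Rewrite a unitType as powers of base unitTypes.

    definitions maps a unitType id to its dimension children as
    (referenced unitType id, power) pairs. A unitType whose dimensions refer
    only to itself is treated as a base unitType."""
    if unit_type_id in _path:
        raise ValueError(f"circular definition through {unit_type_id!r}")
    pairs = definitions[unit_type_id]
    if all(ref == unit_type_id for ref, _ in pairs):
        return {unit_type_id: 1.0}
    totals = defaultdict(float)
    for ref, power in pairs:
        # Each referenced unitType contributes its own base powers, scaled.
        for base, base_power in expand_to_base(ref, definitions, _path + (unit_type_id,)).items():
            totals[base] += power * base_power
    # Cancel dimensions whose powers sum to zero.
    return {base: p for base, p in totals.items() if p != 0}
```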

4.6 Definition

A unitType element MUST contain a single definition child element, the content of which provides a concise human-readable definition of the unit. For example, the definition of the SI unit of time (the second) would be:
"The SI base unit of time, equal to the duration of 9192631770 periods of the radiation corresponding to the transition between the two hyperfine levels of the ground state of the caesium-133 atom."

The definition element MUST contain one or more child elements in the http://www.w3.org/1999/xhtml namespace.

The content of the XHTML child elements MUST include at least one non-whitespace character.
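A sketch of the non-whitespace check, which could be applied equally to the description element in section 4.7. It interprets the rule as applying to the full text content of the XHTML children, including nested elements such as xhtml:b; has_xhtml_text is a hypothetical name.

```python
import xml.etree.ElementTree as ET

XHTML = "{http://www.w3.org/1999/xhtml}"


def has_xhtml_text(container: ET.Element) -> bool:
    """True if container (a definition or description element) has at least
    one XHTML child element whose text content, including nested elements,
    contains a non-whitespace character."""
    return any(
        "".join(child.itertext()).strip()
        for child in container
        if child.tag.startswith(XHTML)
    )
```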

4.7 Description

A unitType element MAY have a single description child element, the content of which provides further information regarding the unit, including, but not limited to: examples, human-readable semantics and hyperlinks to other useful resources.

The description element MUST contain one or more child elements in the http://www.w3.org/1999/xhtml namespace.

The content of the XHTML child elements MUST include at least one non-whitespace character.

5. Dimension Elements

5.1 Name

A dimension element MUST have a name attribute, the value of which linguistically identifies the type of the unit. For example, 'mass', 'length' and 'time' would be suitable names for dimensions.

The name attribute MUST NOT be empty and MUST contain at least one non-whitespace character.

The value of the name attribute MAY contain any valid Unicode character; however, it is RECOMMENDED that any character outside the ASCII subset (codepoints 32-127) be represented using an entity reference.

5.2 UnitType

Every dimension element MUST have a unitType attribute, the value of which is a QName referencing the unit type (e.g. time, temperature, length, force) of the unit. The prefix of the QName MUST be bound to a declared namespace which MUST be a valid URI and SHOULD be a valid URL. There SHOULD be an element with the same id as the local-name of the QName available at this URL giving further information.
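Resolving the QName might look like the following sketch, which assumes (as section 3.1 suggests for dictionary terms) that the target URI is the bound namespace URI with the local name appended. The prefixes mapping is a hypothetical stand-in for the namespace declarations in scope, and resolve_unit_type is likewise an illustrative name.

```python
def resolve_unit_type(qname: str, prefixes: dict) -> str:
    """Resolve a dimension's unitType QName to a URI by appending the local
    name to the namespace URI bound to its prefix."""
    prefix, sep, local = qname.partition(":")
    if not (sep and prefix and local):
        raise ValueError(f"{qname!r} is not a prefixed QName")
    if prefix not in prefixes:
        raise ValueError(f"prefix {prefix!r} is not declared")
    return prefixes[prefix] + local
```

With the unitType prefix bound to http://www.xml-cml.org/unit/unitType/, resolving unitType:length gives http://www.xml-cml.org/unit/unitType/length, which is also the URI of the length entry itself in the example dictionary.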

For base unitTypes, e.g. mass, length and time, the unitType attribute will be self-referential.

5.3 Power

A dimension element MUST have a power attribute, the value of which MUST be a double and defines the power to which the dimension is raised.
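A strict reading of "double" could be enforced as below, assuming the XML Schema xsd:double lexical space is intended. parse_power is a hypothetical name; it accepts the decimal and exponent forms but, as a design choice, rejects INF, -INF and NaN, which are valid doubles but not meaningful powers. Matching the pattern before calling float() also rejects inputs that Python would accept but xsd:double would not, such as "1_0".

```python
import re

# Decimal and exponent forms of xsd:double (ASCII digits only). INF, -INF and
# NaN are also legal xsd:double values but are rejected here as powers.
DOUBLE_PATTERN = re.compile(r"[+-]?([0-9]+(\.[0-9]*)?|\.[0-9]+)([eE][+-]?[0-9]+)?")


def parse_power(value: str) -> float:
    text = value.strip()  # surrounding whitespace is collapsed for xsd:double
    if not DOUBLE_PATTERN.fullmatch(text):
        raise ValueError(f"power {value!r} is not a finite double")
    return float(text)
```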

6. Example UnitType Dictionary

<?xml version="1.0" encoding="UTF-8" ?>
<unitTypeList
        xmlns="http://www.xml-cml.org/schema"
        xmlns:dc="http://purl.org/dc/elements/1.1/"
        xmlns:dummyDictionary="http://www.xml-cml.org/dictionary/"
        xmlns:convention="http://www.xml-cml.org/convention/"
        xmlns:xhtml="http://www.w3.org/1999/xhtml"
        xmlns:unitType="http://www.xml-cml.org/unit/unitType/"
        convention="convention:unitType-dictionary"
        namespace="http://www.xml-cml.org/unit/unitType/"
        title="example unitType dictionary" >
    <description>
         <xhtml:p>
            This is an example unitType list for demonstration purposes
         </xhtml:p>
    </description>

    <unitType id="dimensionless" name="Dimensionless" title="Dimensionless">
        <dimension name="dimensionless" unitType="unitType:dimensionless" power="1"/>
        <definition>
            <xhtml:p>A quantity without an associated physical dimension</xhtml:p>
        </definition>
        <description>
            <xhtml:p>This is a difficult concept and users should read the Wikipedia article and further references.
                <xhtml:code>dimensionless</xhtml:code>
                can be used for numeric quantities which formally have no units such as
                dimensionless constants. It can also be used for ratios of quantities with units (e.g. mg/kg).
                Cancelling
                units loses semantic information and we shall try in CML to avoid this if possible.
            </xhtml:p>
            <xhtml:p>
                <xhtml:code>dimensionless</xhtml:code>
                MUST NOT be used for concepts which are not numeric (e.g. algorithms) or for pure numbers;
                instead,
                <xhtml:a href="#none">none</xhtml:a>
                SHOULD be used.
            </xhtml:p>
            <xhtml:p>
                <xhtml:a href="http://en.wikipedia.org/wiki/Dimensionless_quantity">Dimensionless</xhtml:a>
                in Wikipedia
            </xhtml:p>
        </description>
        <dc:source>Peter Murray-Rust</dc:source>
    </unitType>

    <unitType id="mass"
              name="Mass"
              title="Mass">
        <dimension name="mass" unitType="unitType:mass" power="1"/>
        <definition>
            <xhtml:p>Mass used as a type of unit corresponding to dimension
                <xhtml:code>
                    <xhtml:b>M</xhtml:b>
                </xhtml:code>
                .
            </xhtml:p>
        </definition>
        <description>
            <xhtml:p>Mass is a difficult concept. In chemistry the three types of mass (inertial mass, active gravitational
                mass,
                and passive gravitational mass) are generally used interchangeably. Here
                <xhtml:code>mass</xhtml:code>
                is used as the type of the
                unit and can be linked to the fundamental
                <xhtml:b>M</xhtml:b>
                dimension.
            </xhtml:p>
            <xhtml:p>Mass should never be used as a synonym for weight.</xhtml:p>
            <xhtml:p>See
                <xhtml:a href="http://en.wikipedia.org/wiki/Mass">Mass</xhtml:a>
                in Wikipedia
            </xhtml:p>
        </description>
        <dc:source>Wikipedia</dc:source>
    </unitType>
    
    <unitType id="length" name="Length" title="Length">
        <dimension name="length" unitType="unitType:length" power="1"/>
        <dc:source>Wikipedia</dc:source>
        <definition>
            <xhtml:p>Length is a measure of one dimension</xhtml:p>
        </definition>
        <description>
            <xhtml:p>Length is a measure of one dimension, whereas area is a measure of two dimensions (length squared) and volume is a measure of three dimensions (length cubed). In most systems of measurement, the unit of length is a fundamental unit, from which other units are defined.</xhtml:p>
            <xhtml:p>In the physical sciences and engineering, when one speaks of "units of length", the word "length" is synonymous with "distance".</xhtml:p>
            <xhtml:p>The metre (or meter), symbol m, is the base unit of length in the International System of Units (SI). Originally intended to be one ten-millionth of the distance from the Earth's equator to the North Pole (at sea level), its definition has been periodically refined to reflect growing knowledge of metrology. Since 1983, it is defined as the length of the path travelled by light in vacuum in 1⁄299,792,458 of a second.</xhtml:p>
        </description>
    </unitType>
    
    <unitType id="reciprocal_length" name="reciprocal length" title="reciprocal length">
        <dimension name="length" unitType="unitType:length" power="-1"/>
        <definition>
            <xhtml:p>Reciprocal length or inverse length is often used in spectroscopy as a measure of energy but in other
                areas it could be the absorption coefficient or attenuation coefficient in materials science or
                the curvature of a line in mathematics.
            </xhtml:p>
        </definition>
        <description>
            <xhtml:p>Reciprocal length or inverse length is a measurement used in several branches of
                science and mathematics. As the reciprocal of length, common units used for this
                measurement include the reciprocal metre or inverse metre (m⁻¹), the reciprocal centimetre or inverse
                centimetre (cm⁻¹), and, in optics, the dioptre.
            </xhtml:p>
        </description>
    </unitType>
    
</unitTypeList>   

A. References

[RFC2119]
IETF RFC 2119: Key words for use in RFCs to Indicate Requirement Levels, S. Bradner, March 1997. Available at http://www.ietf.org/rfc/rfc2119.txt.
[XML]
Extensible Markup Language (XML) 1.0 (Fifth Edition), T. Bray, J. Paoli, C.M. Sperberg-McQueen, E. Maler and F. Yergeau, Editors. World Wide Web Consortium, 26 November 2008. This version is http://www.w3.org/TR/2008/REC-xml-20081126. The latest version of XML is available at http://www.w3.org/TR/REC-xml.

B. Acknowledgements


This work is licensed under a Creative Commons Attribution 3.0 Unported License.