From XiphWiki
Revision as of 14:35, 9 March 2006 by Andrel (Talk | contribs) (rm spam)

CMML stands for Continuous Media Markup Language and is to audio or video what HTML is to text. CMML is essentially a timed text codec. It structures a time-continuously sampled data file by dividing it into temporal sections (so-called clips) and attaches additional information to each clip. This information is HTML-like and is essentially a textual representation of the audio or video file. CMML thereby enables textual searches on these otherwise binary files.

CMML is appropriate for use with all Ogg media formats, to provide subtitles and timed metadata. This description gives only a quick introduction and explains how to map CMML into Ogg. For the full specification, see the CMML specification.

Before describing the actual data that goes into a logical Ogg bitstream, we need to understand what the stand-alone "codec" packets contain.

CMML basically consists of:

  • a head tag which contains information for the complete audio/video file
  • a set of clip tags, each of which contains information on a temporal section of the file
  • for authoring purposes, CMML also allows a stream tag which specifies the file it describes

An example CMML file looks like this:

 <?xml version="1.0" encoding="UTF-8" standalone="yes"?>
 <!DOCTYPE cmml SYSTEM "cmml.dtd">

 <cmml lang="en" id="simple" granulerate="1000/1">

 <stream id="fish" basetime="0">
  <import id="videosrc" lang="en" title="Video fish" 
          granulerate="25/1" contenttype="video/theora" 
          src="fish.ogg" start="0" end="360">
    <param id="vheight" name="video.height" value="250"/>
    <param id="vwidth"  name="video.width"  value="180"/>
  </import>
 </stream>

 <head>
  <title>Types of fish</title>
  <meta name="Producer" content="Joe Ordinary"/>
  <meta name="DC.Author" content="Joe's friend"/>
 </head>

 <clip id="intro" start="0">
  <a href="">Read more about fish</a>
  <desc>This is the introduction to the film Joe made about fish.</desc>
 </clip>

 <clip id="dolphin" start="npt:3.5" end="npt:5:5.9">
  <img src="dolphin.jpg"/>
  <desc>Here, Joe caught sight of a dolphin in the ocean.</desc>
  <meta name="Subject" content="dolphin"/>
 </clip>

 <clip id="goldfish" start="npt:5:5.9">
  <a href="">More video clips on goldfish.</a>
  <img src=""/>
  <desc>Joe has a fishtank at home with many colourful fish. The common goldfish is one of them and Joe's favourite.
        Here are some fabulous pictures he has taken of them.</desc>
  <meta name="Location" content="Joe's fishtank"/>
  <meta name="Subject" content="goldfish"/>
 </clip>

 </cmml>


The head element is a standard head element from HTML.

Clips contain (amongst others) the following information:

  • a name in the id attribute, so that the clip can be addressed (the Web server needs to support this)
  • a start and possibly an end attribute, which locate the clip in time
  • a title attribute to give it a short description
  • meta elements to provide it with structured metadata as name-value pairs
  • an img element which links to a picture that represents the content of the clip visually
  • an a element which puts a hyperlink to another Web resource into the clip
  • a desc element giving a long, free-text description/annotation/transcription for the clip

Most of this information is optional.
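As a rough illustration of how these fields can be read back out, the following Python sketch parses a small, well-formed CMML document with the standard library's xml.etree.ElementTree and runs a simple textual search over the clip descriptions. The sample document is a cut-down version of the fish example (the DOCTYPE is omitted for brevity), and the function names clip_summaries and search_clips are made up for this illustration; they are not part of any CMML library.

```python
import xml.etree.ElementTree as ET

# Hypothetical sample: a shortened, well-formed version of the fish example.
SAMPLE = """<cmml lang="en" id="simple" granulerate="1000/1">
<head>
 <title>Types of fish</title>
</head>
<clip id="intro" start="0">
 <desc>This is the introduction to the film Joe made about fish.</desc>
</clip>
<clip id="dolphin" start="npt:3.5" end="npt:5:5.9">
 <img src="dolphin.jpg"/>
 <desc>Here, Joe caught sight of a dolphin in the ocean.</desc>
 <meta name="Subject" content="dolphin"/>
</clip>
</cmml>"""


def clip_summaries(cmml_text):
    """Return one dict per clip with its id, times, metadata and description."""
    root = ET.fromstring(cmml_text)
    summaries = []
    for clip in root.findall("clip"):
        summaries.append({
            "id": clip.get("id"),
            "start": clip.get("start"),
            "end": clip.get("end"),  # None when the clip has no end attribute
            "meta": {m.get("name"): m.get("content") for m in clip.findall("meta")},
            "desc": (clip.findtext("desc") or "").strip(),
        })
    return summaries


def search_clips(cmml_text, word):
    """Return the ids of clips whose description mentions the given word."""
    word = word.lower()
    return [c["id"] for c in clip_summaries(cmml_text) if word in c["desc"].lower()]
```

Calling search_clips(SAMPLE, "dolphin") on the sample returns the id of the dolphin clip, whose start attribute then says where to seek in the media file.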

CMML mapping into Ogg

When CMML is mapped into an Ogg logical bitstream it needs to be serialised first. XML is a hierarchical file format, so it is not generally serialisable. However, CMML has been designed to be serialised easily.

CMML is serialised by having some initial header packets that set up the CMML decoding environment and contain header-type information. The content packets of a CMML logical bitstream then consist of clip tags only. The stream tag is not copied into the CMML bitstream, as it controls the authoring only.

All of the CMML bitstream information is text. As it gets encoded into a binary bitstream, an encoding format has to be specified. To simplify things, UTF-8 is defined as the mandatory encoding format for all data in a CMML binary bitstream. Also, the encoding process MUST ensure that newline characters are represented as LF (or "\n" in C) only and replace any new line representations that come as CR LF combinations (or "\r\n" in C) with LF only.
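The encoding rule can be sketched in a few lines of Python. The function name encode_cmml_text is an assumption of this example. Only the CR LF case named above is handled; how a lone CR should be treated is not stated here, so the sketch leaves it untouched.

```python
def encode_cmml_text(text):
    """Encode CMML text for a bitstream packet: LF-only newlines, UTF-8 bytes."""
    # Replace every CR LF pair with a single LF, as the mapping requires.
    normalised = text.replace("\r\n", "\n")
    # UTF-8 is the mandatory encoding for all data in the CMML bitstream.
    return normalised.encode("utf-8")
```

For example, a clip tag authored on a system that writes CR LF line endings comes out with plain LF line endings, and any non-ASCII characters in a desc element are stored as UTF-8 byte sequences.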

The media mapping for CMML into Ogg is as follows:

  • The bos page contains a CMML ident packet.
  • The first secondary header packet of CMML contains the XML preamble.
  • The second secondary header packet contains the CMML "head" tag.
  • The content or data packets for CMML contain the CMML "clip" tags each encoded in their own packet and inserted at the accurate time.
  • The eos page contains a packet with an empty clip tag.

The CMML ident header packet

The CMML logical bitstream starts with an ident header which is encapsulated into the CMML bos page. The ident header contains all information required to identify the CMML bitstream and to set up a CMML decoder. It has the following format:

 0                   1                   2                   3
 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1| Byte
| Identifier 'CMML\0\0\0\0'                                     | 0-3
|                                                               | 4-7
| Version major                 | Version minor                 | 8-11
| Granulerate numerator                                         | 12-15
|                                                               | 16-19
| Granulerate denominator                                       | 20-23
|                                                               | 24-27
| Granuleshift  |                                                 28
| ...

The CMML version as described here is major=2 minor=1.
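The layout in the diagram can be packed and parsed with Python's struct module, as in the minimal sketch below. Two things are assumptions of this sketch rather than statements from the text: the byte order (little-endian is used here) and the signedness of the two 64-bit granulerate fields (signed is used here). The names pack_ident and parse_ident are likewise hypothetical.

```python
import struct

# Assumed layout: little-endian, no padding.
#   8s  identifier 'CMML\0\0\0\0'   bytes 0-7
#   H   version major               bytes 8-9
#   H   version minor               bytes 10-11
#   q   granulerate numerator       bytes 12-19
#   q   granulerate denominator     bytes 20-27
#   B   granuleshift                byte 28
# Byte order and signedness are assumptions; the diagram only fixes
# the offsets and the field widths.
IDENT_FORMAT = "<8sHHqqB"
IDENT_MAGIC = b"CMML\x00\x00\x00\x00"


def pack_ident(major, minor, rate_num, rate_den, granuleshift):
    """Build the 29 fixed bytes of a CMML ident header."""
    return struct.pack(IDENT_FORMAT, IDENT_MAGIC, major, minor,
                       rate_num, rate_den, granuleshift)


def parse_ident(packet):
    """Read the fixed fields from the start of an ident header packet."""
    if len(packet) < struct.calcsize(IDENT_FORMAT):
        raise ValueError("packet too short for a CMML ident header")
    # unpack_from ignores any bytes that follow the fixed fields.
    magic, major, minor, num, den, shift = struct.unpack_from(IDENT_FORMAT, packet)
    if magic != IDENT_MAGIC:
        raise ValueError("not a CMML ident header")
    return {
        "version": (major, minor),
        "granulerate": (num, den),
        "granuleshift": shift,
    }
```

With the version given here and the granulerate attribute of the example file, pack_ident(2, 1, 1000, 1, 32) produces a 29-byte header that parse_ident reads back unchanged.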

The granulerate represents the temporal resolution of the logical bitstream in Hz given as a rational number in the same way as the OggSkeleton fisbone secondary header specifies granulerate. It enables a mapping of granule position of the data pages to time by calculating "granulepos / granulerate".
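The "granulepos / granulerate" mapping can be written out as follows. This is a small sketch using exact rational arithmetic; the function name granule_to_seconds is an assumption of the example.

```python
from fractions import Fraction


def granule_to_seconds(granulepos, rate_num, rate_den):
    """Map a granule position to a time in seconds.

    The granulerate is the rational number rate_num / rate_den in Hz,
    so time = granulepos / (rate_num / rate_den).
    """
    if rate_num <= 0 or rate_den <= 0:
        raise ValueError("granulerate numerator and denominator must be positive")
    return Fraction(granulepos * rate_den, rate_num)
```

With the granulerate="1000/1" of the example file, a granule position of 3500 maps to 3.5 seconds; with the 25/1 rate of the imported video, granule position 75 maps to 3 seconds.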

The default granule rate for CMML is 1000/1, i.e. 1000 Hz, so that one granule corresponds to one millisecond.

The granuleshift is a 1-byte integer that specifies whether the granule_position of the CMML logical bitstream is partitioned into two parts, and how many of the lower bits are used for the second part. The upper bits then still signify a time-continuous granule position for a directly decodable and presentable data granule. The lower bits allow the granule position of a previous CMML data packet (i.e. "clip" element) to be specified, which helps to identify how far backwards to seek to reach the last, still active "clip" element (of the given track). The granuleshift is therefore the base-2 logarithm of the maximum possible clip spacing.

The default granuleshift is 32, which splits the 64-bit granule position into two halves to make room for the backwards pointer.
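The bit arithmetic behind this partitioning can be sketched as below. The sketch only shows how a granule position is split into its upper and lower parts for a given granuleshift and how the two parts are joined again; what exactly an encoder stores in each part is as described above and is not further interpreted here. The function names are hypothetical.

```python
def split_granulepos(granulepos, granuleshift):
    """Split a granule position into (upper bits, lower bits)."""
    upper = granulepos >> granuleshift
    # Mask that keeps only the lowest `granuleshift` bits.
    lower = granulepos & ((1 << granuleshift) - 1)
    return upper, lower


def join_granulepos(upper, lower, granuleshift):
    """Combine the two parts into a single granule position."""
    if lower >> granuleshift:
        raise ValueError("lower part does not fit into granuleshift bits")
    return (upper << granuleshift) | lower
```

With a granuleshift of 32, each part gets 32 of the 64 bits. With a granuleshift of 0 the lower part is always 0 and the granule position is not partitioned at all.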

The CMML secondary header packets

The CMML secondary headers are a sequence of two packets that contain the CMML and XML "setup" information:

  • one packet with the CMML XML preamble and cmml tag.
  • one packet with the CMML head tag.

These packets contain textual, not binary information.

The CMML preamble tags are all single-line tags, such as the xml processing instruction (<?xml...>) and the document type declaration (<!DOCTYPE...>).

The only CMML tag that is not already serialised in a CMML file is the cmml tag, as it encloses all the other content tags. To serialise it, the cmml start tag is transformed into a processing instruction, retaining all its attributes (<?cmml ...>), and the cmml end tag is deleted.

The first CMML secondary header packet has the following format:

 0                   1                   2                   3
 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1| Byte
| <?xml ...                                                     | 0-
| ...                                                           |
| <!DOCTYPE ...                                                 |
| ...                                                           |
| <?cmml ...                                                    |
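A minimal sketch of how this packet could be assembled is shown below. The function names are made up for this illustration. The regular expression is a simplification that assumes no ">" character occurs inside an attribute value, and the choice of one line per tag with a trailing LF is an assumption of the sketch, not something the format description above prescribes.

```python
import re


def cmml_tag_to_pi(start_tag):
    """Turn '<cmml a="b">' into '<?cmml a="b"?>', keeping the attributes verbatim."""
    # Simplification: assumes that no attribute value contains '>'.
    match = re.fullmatch(r"\s*<cmml\b([^>]*)>\s*", start_tag)
    if match is None:
        raise ValueError("expected a cmml start tag")
    attributes = match.group(1).rstrip()
    return "<?cmml" + attributes + "?>"


def first_secondary_header(xml_decl, doctype, cmml_start_tag):
    """Build the first secondary header packet as UTF-8 bytes with LF newlines."""
    lines = [xml_decl.strip(), doctype.strip(), cmml_tag_to_pi(cmml_start_tag)]
    return ("\n".join(lines) + "\n").encode("utf-8")
```

Applied to the cmml start tag of the example file, cmml_tag_to_pi yields <?cmml lang="en" id="simple" granulerate="1000/1"?>. The cmml end tag is simply never written.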

The second CMML secondary header packet contains the CMML head element with all its attributes and contained elements, and has the following format:

 0                   1                   2                   3
 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1| Byte
| <head ...                                                     | 0-
| ...                                                           |
| </head>                                                       |

The CMML data packets

The data packets of the CMML bitstream contain the CMML clip elements. Their start and end attributes, however, exist only for authoring purposes and are not copied into the bitstream (to avoid contradictory information). Instead, they are represented through the time mapping of the encapsulation format that interleaves CMML data with data from other time-continuous bitstreams. Generally, the time mapping is done through some timestamp representation and through the position in the stream.

A clip tag is encoded with all its attributes and child elements (except for the start and end attributes) as a string printed into a clip packet. The clip tag's start attribute tells the encapsulator at what time to insert the clip packet into the bitstream. If an end attribute is present, it leads to the creation of another clip packet, unless another clip packet starts on the same track beforehand. This additional clip packet contains an "empty" clip tag, i.e. a clip tag without meta, a, img or desc elements and with no attribute values except for a copy of the track attribute from the original clip tag.

 0                   1                   2                   3
 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1| Byte
| <clip ...                                                     | 0-
| ...                                                           |
| </clip>                                                       |
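The step from an authored clip tag to its data packets can be sketched with xml.etree.ElementTree as follows. The function name clip_to_packets is hypothetical. The sketch returns the start and end values as the authored strings without converting them to granule positions, and it does not implement the rule that the "empty" clip is skipped when another clip starts on the same track first; a real encapsulator would have to check that across clips.

```python
import xml.etree.ElementTree as ET


def _serialise(element):
    """Serialise an element to UTF-8 bytes without an XML declaration."""
    return ET.tostring(element, encoding="unicode").encode("utf-8")


def clip_to_packets(clip_xml):
    """Return a list of (time, packet_bytes) pairs for one authored clip tag."""
    clip = ET.fromstring(clip_xml)
    # start and end are authoring-only: remove them from the tag and keep
    # their values as the insertion times (None if the attribute is absent).
    start = clip.attrib.pop("start", None)
    end = clip.attrib.pop("end", None)
    packets = [(start, _serialise(clip))]
    if end is not None:
        # The "empty" clip carries no child elements and only the track attribute.
        empty = ET.Element("clip")
        track = clip.get("track")
        if track is not None:
            empty.set("track", track)
        packets.append((end, _serialise(empty)))
    return packets
```

For the dolphin clip of the example file this gives two packets: the clip content to be inserted at npt:3.5, and an empty clip to be inserted at npt:5:5.9.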


Ogg CMML is being supported by the following projects:

External links

  • CMML is described in more detail in the CMML v2.1 specification: I-D in svn or I-D