From XiphWiki
Revision as of 14:54, 26 January 2008 by Aleksandersen (Describing performers)

The following is a draft.

It is at best incomplete and at worst completely broken. In any case, it is not an "official" Xiph spec/codec, so use with care.

See other suggested metadata methods.

This document describes the proposed Multimedia Metadata Format (M3F) for the Ogg container. The format is built on the Extensible Markup Language (XML). It is intended to describe any kind of multimedia (audio, video, text, images, …) that can reside in an Ogg container.

Format description

Multimedia Metadata Format documents describe media resources in Ogg containers and streams. The format can link resources with one another for media players that support rendering multiple kinds of media, such as audio tracks with album art, or video with commentary audio overlays.

No element except ‘metadata’ is required. But some elements have required attributes.

All dates must be formatted as ISO 8601:2000 – International Date and Time Format.

XML, declaration, and name spaces

A metadata document must have a standard XML declaration on the very first line. The XML declaration must contain the ‘version’ and ‘encoding’ attributes, in that order.

<?xml version="1.0" encoding="UTF-8" ?>

Encoding should default to Unicode/UTF-8. The XML version should always be the oldest version in which the desired features are available, as per the XML specifications. A newer version should be used only when features not present in older XML versions are required.

In addition, the XML attributes ‘base’ and ‘lang’ may be used. Refer to them with an ‘xml:’ prefix if used in any element except the XML declaration. The optional ‘base’ attribute defines the base URI for completing relative URIs. This value must default to the current Ogg container. (Basically all other URIs in the format are relative. The attribute is useful for formats other than Ogg.) The attribute is inherited by all children. The optional ‘lang’ attribute describes the language of a resource using three-letter ISO 639-3 codes. Note that this attribute should only be used on the ‘resource’ element or in the XML declaration for easier software parsing. The attribute is inherited by all children and has no default value.

The ‘metadata’ element is required as the top level container. It must contain at least one XML name space defining the format via the ‘xmlns’ attribute. (The URL used in the example is not the final address, as no name space has been created yet.)

<metadata xmlns="http://xmlns.xiph.org/metadata/0.1/">

The Multimedia Metadata Format can be extended by adding further XML name spaces to the ‘metadata’ element. As with any other XML format, software may not add, modify, or expect elements and attributes not defined by an XML name space.
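As a rough illustration of the declaration and name space rules above, the following Python sketch parses a small document with the standard library and checks the root element. The name space URI is the placeholder from this draft, not a final address. The helper name `load_metadata` is hypothetical, and the sample resource leaves out its addressing attributes for brevity.

```python
import xml.etree.ElementTree as ET

# Placeholder name space URI taken from the draft; not a final address.
M3F_NS = "http://xmlns.xiph.org/metadata/0.1/"

DOCUMENT = (
    '<?xml version="1.0" encoding="UTF-8" ?>\n'
    f'<metadata xmlns="{M3F_NS}">\n'
    '    <resource type="audio/vorbis" />\n'
    '</metadata>\n'
)


def load_metadata(xml_bytes):
    """Parse a document and check that its root is the M3F 'metadata' element."""
    root = ET.fromstring(xml_bytes)
    if root.tag != f"{{{M3F_NS}}}metadata":
        raise ValueError(f"unexpected root element: {root.tag}")
    return root


root = load_metadata(DOCUMENT.encode("utf-8"))
# Children in the default name space are looked up with the same qualified form.
resources = root.findall(f"{{{M3F_NS}}}resource")
```

ElementTree reports element names in the `{namespace}name` form, which is why the check compares against the qualified tag rather than the bare word ‘metadata’.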


The ‘xml:lang’ attribute can be used on any element, and it is inherited from parent elements. In the example below, the first title element inherits ‘eng’ as its language from the ‘resource’ parent element. A player may present any title, but should prefer the global language (the language of the ‘metadata’ element) or the user’s language, as specified by the player. If the player’s interface language (or possibly a dedicated metadata language option) is German (ISO 639-3: deu), then it should assume the user prefers the German title.

<resource […] xml:lang="eng">
	<title>The Science of Sleep</title>
	<title xml:lang="deu">Anleitung zum Träumen</title>
	<title xml:lang="fra">La science des rêves</title>
</resource>
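The inheritance rule can be sketched in Python as follows. This is only one possible reading: the function name `pick_title` is hypothetical, and falling back to the first title when nothing matches is an assumption, since the draft only says a player may present any title.

```python
import xml.etree.ElementTree as ET

# ElementTree reports xml:lang under the predefined XML name space.
XML_LANG = "{http://www.w3.org/XML/1998/namespace}lang"

FRAGMENT = """
<resource xml:lang="eng">
    <title>The Science of Sleep</title>
    <title xml:lang="deu">Anleitung zum Träumen</title>
</resource>
"""


def pick_title(resource, preferred):
    """Return the title whose effective language equals `preferred`.

    A title without its own xml:lang inherits the language of `resource`.
    Falls back to the first title when nothing matches (an assumption).
    """
    inherited = resource.get(XML_LANG)
    titles = resource.findall("title")
    for title in titles:
        if title.get(XML_LANG, inherited) == preferred:
            return title.text
    return titles[0].text if titles else None


resource = ET.fromstring(FRAGMENT.strip())
german_title = pick_title(resource, "deu")
english_title = pick_title(resource, "eng")
```

Only one level of inheritance is handled here. A fuller implementation would also walk up to the ‘metadata’ element.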

Addressing the media resource

Media resources in the stream are described as ‘resource’ children of the ‘metadata’ element. Each resource element must have an ‘oggserial’ attribute linking it to the correct logical bitstream in the stream. It must also have a ‘type’ attribute with the native MIME type of the resource.

<resource oggserial="0xEXAMPLE" type="audio/vorbis">

The ‘uri’ attribute may be used instead of ‘oggserial’. For Ogg serials the URI would be ‘urn:oggserial:#0xEXAMPLE’. (Other container and native file formats may specify any URI that works with that format.)

Resource elements can also have an optional unique ‘id’ attribute. The ‘id’ attribute is used as a label when the resource needs to be addressed by another resource element.

<resource id="unique-resource-id" […]>

Note: It is good practise to include every resource in a stream as a resource element. This makes it easier to link and describe relationships with other resources, for example between films and subtitles, or between music and album art.

Describing the media resource

The ‘resource’ element has many child elements. All are optional, and each can be used with any resource. Media-type-specific children are grouped together, however, as they do not make much sense for every media type.

Describing audiences

The ‘audience’ element is a self-regulated filtering mechanism intended for parental control. Each attribute below is a space-separated list with one or more of the given values. All attributes are optional except ‘context’, which is required.

  • ‘nudity’: ‘breasts’, ‘buttocks’, and ‘genitals’.
  • ‘sexual’: ‘kissing’, ‘sexact’, ‘touching’, ‘sexlanguage’, ‘erections’, and ‘erotica’.
  • ‘violence’: ‘rape’, ‘human-injury’, ‘animal-injury’, ‘anime-injury’, ‘human-blod’, ‘animal-blod’, ‘anime-blod’, ‘human-torture’, ‘animal-torture’, and ‘anime-torture’.
  • ‘language’: ‘vulgar’, ‘swear’, and ‘mild’.
  • ‘harmful’: ‘tobacco’, ‘alcohol’, ‘drug’, ‘weapons’, ‘example-dangerous’, ‘horror’, and ‘discrimination’.
  • ‘context’ (required): ‘artistic’, ‘educational’, ‘medical’, ‘sports’, and ‘news context’.

<audience context="educational" language="mild" sexual="sexact kissing" […] />
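A validator for these attribute lists might look like the Python sketch below. It is illustrative only: the function name `audience_problems` is hypothetical, only three of the attributes are covered, and the check on ‘context’ is limited to its presence.

```python
# Allowed values for a subset of the 'audience' attributes, copied from the
# lists above. The remaining attributes would be added in the same way.
ALLOWED = {
    "nudity": {"breasts", "buttocks", "genitals"},
    "sexual": {"kissing", "sexact", "touching", "sexlanguage", "erections", "erotica"},
    "language": {"vulgar", "swear", "mild"},
}


def audience_problems(attrs):
    """Return a list of human-readable problems found in an attribute mapping."""
    problems = []
    if "context" not in attrs:
        problems.append("missing required attribute: context")
    for name, allowed in ALLOWED.items():
        # Each attribute is a space-separated list of values.
        for value in attrs.get(name, "").split():
            if value not in allowed:
                problems.append(f"{name}: unexpected value {value!r}")
    return problems


ok = audience_problems(
    {"context": "educational", "language": "mild", "sexual": "sexact kissing"}
)
bad = audience_problems({"nudity": "sexact"})
```

Here `ok` is an empty list, while `bad` reports both the missing ‘context’ attribute and a ‘sexual’ value placed in the ‘nudity’ list.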

Note: The filters are based on the work of the Internet Content Rating Association (ICRA). See their label generator for the full meaning of the values.

Describing categories

The ‘category’ element describes the listing genre of the resource. The required ‘sort’ attribute describes the preferred genre for listing.

<category sort="metal">

The optional ‘genre’ child element describes more in-depth sorting for the resource.

<category sort="metal">
	<genre>symphonic metal</genre>
	<genre>goth metal</genre>
</category>

Describing collections

Media resources may appear in collections (DVD box sets, CD albums, etc.). The ‘collection’ element describes the resource's relation to, and order/place in, a collection. The optional ‘date’ attribute describes the date the collection was made publicly available (its ‘release date’). The optional ‘track’ attribute describes the resource's order/place in the collection. The optional ‘tracks’ attribute describes the total number of resources in the collection. The optional ‘uri’ attribute should uniquely identify the collection as a whole.

<collection date="2019-01-15" track="2" tracks="12" uri="urn:x-isrc:0123456789">

The optional ‘artwork’ child element links the collection to an image resource. The required ‘uri’ attribute should either be a resource's ‘id’ attribute value with a ‘#’ prefix (as below) or a web URL. The optional ‘type’ attribute should be the MIME type of the image resource. The attribute should not be used when linking to other internal resources, but is encouraged when linking to external resources (such as web URLs).

	<artwork uri="#embedded-image" />
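Resolving such a ‘#’-prefixed reference to the resource it names could be sketched in Python like this. The helper name `resolve_internal` is hypothetical. The sample document leaves out the name space and the addressing attributes for brevity, and external URLs are simply reported as unresolved.

```python
import xml.etree.ElementTree as ET

DOC = """
<metadata>
    <resource id="embedded-image" type="image/png" />
    <resource id="track" type="audio/vorbis">
        <collection>
            <artwork uri="#embedded-image" />
        </collection>
    </resource>
</metadata>
"""


def resolve_internal(metadata, uri):
    """Return the resource addressed by a '#id' URI, or None for anything else."""
    if not uri.startswith("#"):
        return None  # external URI (such as a web URL); not handled in this sketch
    wanted = uri[1:]
    for resource in metadata.findall("resource"):
        if resource.get("id") == wanted:
            return resource
    return None


metadata = ET.fromstring(DOC.strip())
artwork = metadata.find("resource/collection/artwork")
image = resolve_internal(metadata, artwork.get("uri"))
```

Because the target is an internal resource, its MIME type can be read from the resource's own ‘type’ attribute, which is consistent with the advice not to repeat ‘type’ on internal links.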

The optional ‘title’, ‘subtitle’, and ‘tagline’ child elements work the same way as the ‘title’, ‘subtitle’, and ‘tagline’ children of the ‘resource’ element.

	<title>Great Audio VI</title>
	<tagline>Music that rocks you!</tagline>

Note that CD singles are indeed collections too.

Describing encodings

The ‘encoding’ element describes the encoding or digitisation of the resource.


The optional ‘date’ child element describes when the file was last encoded. When the file is re-encoded, the original date of encoding should be preserved, and another ‘date’ element should be added with the date of re-encoding.


The optional ‘source’ child element describes the original media source for the encoding. The required ‘media’ attribute must be either ‘cd’, ‘dvd’, ‘tape’, ‘web-stream’, ‘tv-stream’, ‘radio-stream’, ‘file’, or ‘unknown’. The optional ‘uri’ attribute should uniquely identify the media.

	<source media="cd" uri="urn:x-isrc:0123456789" />

The optional ‘software’ child element describes the software used for the encoding. The optional ‘title’ attribute describes the software name. The optional ‘version’ attribute describes the software version. The required ‘uri’ attribute should uniquely identify the software (and version).

	<software title="flac" version="2.2" uri="http://xiph.org/flac/" />

Note: The software version attribute is important because it makes it much easier to find out which files in a large collection need to be re-encoded if a bug is ever found in a software release.
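The use case in the note can be sketched in Python as a search over ‘software’ elements. The function name `encoded_with`, the resource ids, and version ‘2.3’ are hypothetical, and the name space is left out for brevity.

```python
import xml.etree.ElementTree as ET

LIBRARY = """
<metadata>
    <resource id="old-track" type="audio/flac">
        <encoding>
            <software title="flac" version="2.2" uri="http://xiph.org/flac/" />
        </encoding>
    </resource>
    <resource id="new-track" type="audio/flac">
        <encoding>
            <software title="flac" version="2.3" uri="http://xiph.org/flac/" />
        </encoding>
    </resource>
</metadata>
"""


def encoded_with(metadata, uri, version):
    """Return the ids of resources encoded with the given software URI and version."""
    matches = []
    for resource in metadata.findall("resource"):
        for software in resource.findall("encoding/software"):
            if software.get("uri") == uri and software.get("version") == version:
                matches.append(resource.get("id"))
                break  # one matching software entry per resource is enough
    return matches


metadata = ET.fromstring(LIBRARY.strip())
affected = encoded_with(metadata, "http://xiph.org/flac/", "2.2")
```

Matching on the required ‘uri’ attribute rather than the optional ‘title’ keeps the search reliable even when the title is missing.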

Describing performers

The ‘performers’ element describes who performed the resource. The optional ‘sort’ attribute describes the preferred performer for listing. (This sorting attribute was included for backwards compatibility with music library managers/players that list only one artist's name.)

<performers sort="White Stripes, The">

The optional ‘person’ element describes a performer. The ‘name’ child element describes the performer’s name.

	<person>
		<name>Jack White</name>
	</person>

The optional ‘instrument’ child element describes the instrument used by the performer, with one of the following values: ‘wind’, ‘lamellophone’, ‘percussion’, ‘string’, ‘voice’, ‘electronic’, and ‘keyboard’. Two additional values for this child element are ‘vocal’ and ‘lead-vocal’. The value can be blank (<instrument />), which indicates that the person is a musician even when the instrument is unknown.

	<person>
		<name>Jack White</name>
		<instrument>string</instrument>
	</person>

Note: When searching for ‘Jack White’ as a guitarist, the above example should suffice because a guitar is grouped under string instruments. This should be considered when implementing the above elements in search engines.
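A search that follows this note could be sketched in Python as below. The lookup table `INSTRUMENT_CLASS` and the function name `performs` are hypothetical. Only the guitar-to-string pairing comes from the note, and a real search engine would need a much fuller table.

```python
import xml.etree.ElementTree as ET

# Hypothetical lookup table; only the guitar -> string pairing comes from the note.
INSTRUMENT_CLASS = {"guitar": "string"}

FRAGMENT = """
<performers sort="White Stripes, The">
    <person>
        <name>Jack White</name>
        <instrument>string</instrument>
    </person>
</performers>
"""


def performs(performers, name, instrument):
    """Return True if `name` is listed with `instrument` or its instrument class."""
    wanted = INSTRUMENT_CLASS.get(instrument, instrument)
    for person in performers.findall("person"):
        if person.findtext("name") != name:
            continue
        classes = [element.text for element in person.findall("instrument")]
        if wanted in classes:
            return True
    return False


performers = ET.fromstring(FRAGMENT.strip())
is_guitarist = performs(performers, "Jack White", "guitar")
```

The query term is mapped to its instrument class before comparison, so a search for ‘guitar’ matches a performer recorded only as ‘string’.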


The optional ‘role’ attribute is a space-separated list of roles in films, with one or more of the following values: ‘actor’, ‘director’, and ‘producer’. (NEEDS WORK.) The optional ‘as’ attribute describes the name of the character the performer played in a film.

The required ‘actor’ child element has no value. The ‘portrait’ child element describes the fictional character that an actor portrays in a film (the role). The ‘portrait’ child element must not be empty.

	<person>
		<name>Gael García Bernal</name>
		<actor />
		<portrait>Stéphane Miroux</portrait>
	</person>

Other and organisations

Describing related texts (lyrics and subtitles)

The ‘texts’ element links media resources with CMML text resources in the stream, such as song lyrics and film subtitles. The required ‘uri’ attribute must point to another resource's ‘id’ attribute or to an external web URL. The optional ‘type’ attribute specifies the MIME type of an external resource. (Using the ‘type’ attribute or an external URL is discouraged. Keep things in the stream, so to speak.)

<texts uri="#example-text-resource" />

Describing recordings

The ‘recording’ element describes recording conditions.


The optional ‘date’ child element describes when the recording was made.


The optional ‘duration’ child element describes how long the recording lasts. The value must be given as colon-separated fields in the order days:hours:minutes:seconds:milliseconds. A field that is not needed should be left blank or set to zero (‘0’). For example, ‘0:0:7:4:54’ means zero days, zero hours, seven minutes, four seconds, and 54 milliseconds.
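A parser for this duration syntax might look like the Python sketch below. The function name `parse_duration` is hypothetical, and requiring exactly five fields is an assumption, since the draft does not say whether leading fields may be dropped entirely.

```python
from datetime import timedelta


def parse_duration(value):
    """Parse 'days:hours:minutes:seconds:milliseconds' into a timedelta.

    Blank fields are treated as zero, as the text allows.
    """
    fields = value.split(":")
    if len(fields) != 5:
        raise ValueError(f"expected 5 colon-separated fields, got {len(fields)}")
    days, hours, minutes, seconds, millis = (
        int(field) if field.strip() else 0 for field in fields
    )
    return timedelta(
        days=days, hours=hours, minutes=minutes, seconds=seconds, milliseconds=millis
    )


with_zeros = parse_duration("0:0:7:4:54")
with_blanks = parse_duration("::7:4:54")
```

Both spellings give the same duration of seven minutes, four seconds, and 54 milliseconds.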


The optional ‘location’ child element describes where the recording was made in a human-readable way. The optional ‘lat’ and ‘long’ attributes are the machine-readable latitude and longitude position of the recording.

	<location lat="22.20N" long="114.11E">Hong Kong, China, Earth</location>

Describing rights

The ‘rights’ element describes the copyright and license status of the resource.


The optional ‘date’ child element describes when the copyright took effect. This is especially useful for determining when a work's copyright expires.


The optional ‘license’ child element is a short and human-readable version of the full license.

	<license>© 2018 Recording Company. All distribution rights reserved.</license>

The optional ‘link’ child element can point, via its ‘uri’ attribute, to any URI where a full version of the license is available. This means it can also point to a ‘resource’ element via that element's ‘id’ attribute.

	<link type="text/html" uri="http://licenses.record-company.com/artist.html" />

Describing titles, subtitles, and taglines

The ‘title’ element describes the resource's title.

<title>Awesome Audio Track</title>

The ‘subtitle’ element describes a secondary title.

<subtitle>The Sound of Music</subtitle>

The ‘tagline’ element describes promotional taglines and slogans.

<tagline>Get to the real sound!</tagline>

Below are media type specific children of the ‘resource’ element. The elements are grouped by the media type they describe.


  • 2007-11-25 – Began work on simplifying the format.
  • 2007-09-08 – Wiki page created based on the original format and suggestions from the email list.
  • 2007-09-06 – Format suggested on Xiph's ogg-dev email list by Daniel Aleksandersen.