ROE: Difference between revisions

From XiphWiki
Revision as of 23:28, 12 April 2009

Rich Open multitrack media Exposition (ROE)


Overview

ROE (Rich Open multitrack media Exposition) is a way of describing the relationships between tracks of media in a stream. It is used to group tracks which have similar purpose and to identify alternatives.

Usage

Authoring

One use of ROE is to author a multi-track audio-visual stream from multiple input files. In this document, we present a description of how to use ROE to author multi-track Ogg files.

Dynamic Web Requests

Another use of ROE is in a Web client-server scenario. The Web server uses ROE as a means of representing the different tracks that are available for a multi-track Web resource. A Web client may not require all available tracks to present the resource to the user. It may decide to request the ROE representation first and then request only a subset of tracks from the server, e.g. only the English soundtrack. Or it may directly request particular tracks only. The server will use the request from the client to dynamically compose a multi-track stream with the requested tracks and mandatory tracks and serve this to satisfy the resource request.
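
The server-side composition step described above can be sketched roughly as follows. This is only an illustration: the function name `compose_stream`, the track IDs, and the idea of passing mandatory tracks as a plain list are assumptions made for the sketch, not part of the ROE draft.

```python
def compose_stream(available, requested, mandatory):
    """Return the track IDs to multiplex, kept in the server's own track order.

    All three arguments are lists of hypothetical track IDs.
    """
    unknown = set(requested) - set(available)
    if unknown:
        raise ValueError(f"unknown track IDs requested: {sorted(unknown)}")
    # The served stream contains the requested tracks plus the mandatory ones.
    wanted = set(requested) | set(mandatory)
    return [track_id for track_id in available if track_id in wanted]


available = ["video", "audio-en", "audio-de", "captions-en"]
selection = compose_stream(available, requested=["audio-en"], mandatory=["video"])
print(selection)
```

Here the client asks only for the English soundtrack, and the server adds the track it considers mandatory before serving the composed stream.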

ROE in Use

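A draft version of the spec is in use in the MediaWiki extension MetaVidWiki (http://metavid.org/w/index.php/MetaVidWiki). The extension runs on the site metavid.org, where it is used for remote embedding. In this blog (http://metavid-mike.blogspot.com/), for example, all the clips reference a single ROE file to expose multiple video tracks and text transcripts. Sample ROE output from metavid: http://metavid.org/w/index.php?title=Special:MvExportStream&feed_format=roe&stream_name=House_proceeding_06-09-08_01&t=0%3A01%3A38%2F0%3A10%3A00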

The ROE model

Here we describe two representations of ROE: that of ROE XML, and that of ROE in Ogg Skeleton. Each representation is capable of entirely encoding the relationships of the ROE model, such that it is possible to losslessly convert between them.

ROE XML

ROE XML is an XML markup language that describes a hierarchical serialization of the ROE model.

A ROE XML file is an instance document of the ROE XML schema.

Inside the root <ROE> element, it is composed of an optional <head> tag followed by a <body> tag.


Head Element

Head Tags

The <head> tag is optional and may contain:

  • a <title> tag to provide a textual description for the multi-track stream,
  • a set of <link> tags that provide an alternative representation of the multi-track stream, e.g. as an HTML document,
  • an <img> tag to provide a representative thumbnail for the multi-track stream,
  • a set of <meta> tags that provide structured name-value annotations of the multi-track stream,
  • a <base> tag to provide a base URI for resources referred to in the ROE file, and
  • a set of <profile> tags that allows description of so-called track profiles.

The <title>, <link>, <meta>, and <base> tags are borrowed from XHTML and serve the same purpose as they do there.

Track Profiles

A track profile is a combination of tracks that is pre-defined within the ROE file and can be accessed directly by Web clients or authoring applications. Examples of such profiles are a director's cut or an Australian version.

A profile lists references to the tracks of a media resource and, for each track, may select one of that track's alternative media sources to use in that profile.

To that end, the <profile> element has a child element called <partial>, whose "track" attribute holds the ID of a selected track and whose optional "select" attribute holds the ID of a selected alternate media source for that track.

An example profile is:

 <profile name="director's cut">
   <partial track="v" select="v1" />
   <partial track="a" />
 </profile>

The <head> tag essentially separates the profiles from the core document structure being provided in the <body> element.
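
As a rough illustration of how a client might read a profile, the sketch below looks up a profile by name and returns the track-to-source selection it declares. It uses Python's standard xml.etree.ElementTree module. The helper name `profile_selection` is hypothetical, and the audio track is simplified to a single made-up media source to keep the document short.

```python
import xml.etree.ElementTree as ET

# A trimmed ROE document; the audio source "a1" and its URL are made up here.
ROE_DOC = """
<ROE>
  <head>
    <profile name="director's cut">
      <partial track="v" select="v1" />
      <partial track="a" />
    </profile>
  </head>
  <body>
    <track id="v" provides="video">
      <switch distinction="angle" default="v1">
        <mediaSource id="v1" content-type="video/theora" src="http://example.com/angle1.ogv" />
        <mediaSource id="v2" content-type="video/theora" src="http://example.com/angle2.ogv" />
      </switch>
    </track>
    <track id="a" provides="audio">
      <mediaSource id="a1" content-type="audio/vorbis" src="http://example.com/lang1.oga" />
    </track>
  </body>
</ROE>
"""


def profile_selection(root, profile_name):
    """Map each track ID named by the profile to its selected source ID (or None)."""
    for profile in root.iter("profile"):
        if profile.get("name") == profile_name:
            return {p.get("track"): p.get("select") for p in profile.findall("partial")}
    raise KeyError(profile_name)


root = ET.fromstring(ROE_DOC)
selection = profile_selection(root, "director's cut")
print(selection)
```

A value of None means the profile names the track without picking a particular alternate source, as with track "a" in the example profile.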


Body Element

The <body> tag consists of a sequence of <track> elements that each describe a logical media track.

The Track Tag

A media track may consist of one of:

  • a media source, such as an audio, video, or text stream, described in a <mediaSource> tag,
  • a sequence of media sources described in a <seq> tag with start and end times, or
  • a set of alternate media sources described in a <switch> tag, only one of which can be selected.

The <track> element has a mandatory "provides" attribute, which introduces a virtual label such as "commentary", "video", "audio", "textoverlay", "closedcaption", "logo", or "scoreboard". The label names the kind of content that the track provides.

The Switch Tag

The <switch> tag provides a choice between alternates that differ in one respect, such as language or bitrate. That respect is given in the "distinction" attribute of the <switch> tag.

Inside a <switch> tag, the choices can be specified through the following means:

  • directly as a <mediaSource>,
  • as a sequence of media sources in a <seq> element, or
  • as the outcome of another <switch> tag.

Example <switch> element:

 <switch distinction="language" default="a3">
   <switch id="a1" distinction="bitrate" default="a1b1">
    <mediaSource id="a1b1" lang="en" content-type="audio/vorbis" src="http://example.com/lang1b1.oga" />
    <mediaSource id="a1b2" lang="en" content-type="audio/vorbis" src="http://example.com/lang1b2.oga" />
   </switch>
   <mediaSource id="a2" lang="de" content-type="audio/vorbis" src="http://example.com/lang2.oga" />
   <seq id="a3">
     <mediaSource id="a3a" lang="fr" content-type="audio/vorbis" src="http://example.com/lang3a.oga" />
     <mediaSource id="a3b" lang="fr" content-type="audio/vorbis" src="http://example.com/lang3b.oga" />
   </seq>
 </switch>

In this example, we have a choice between three languages: en, de and fr. The English language track also comes in two different bitrates. The French language track comes in two different files that should be played in sequence.
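
One way to picture how such a nested structure could be resolved into a play list is sketched below. It assumes that the "default" attribute of a <switch> names the "id" of one of its direct children, which is how the example reads but is an interpretation rather than a statement from the draft. The function name `resolve` is hypothetical.

```python
import xml.etree.ElementTree as ET

SWITCH = """
<switch distinction="language" default="a3">
  <switch id="a1" distinction="bitrate" default="a1b1">
    <mediaSource id="a1b1" lang="en" content-type="audio/vorbis" src="http://example.com/lang1b1.oga" />
    <mediaSource id="a1b2" lang="en" content-type="audio/vorbis" src="http://example.com/lang1b2.oga" />
  </switch>
  <mediaSource id="a2" lang="de" content-type="audio/vorbis" src="http://example.com/lang2.oga" />
  <seq id="a3">
    <mediaSource id="a3a" lang="fr" content-type="audio/vorbis" src="http://example.com/lang3a.oga" />
    <mediaSource id="a3b" lang="fr" content-type="audio/vorbis" src="http://example.com/lang3b.oga" />
  </seq>
</switch>
"""


def resolve(element, choice=None):
    """Return the list of source URLs to play for an element.

    `choice` optionally overrides the "default" attribute of the outermost switch.
    """
    if element.tag == "mediaSource":
        return [element.get("src")]
    if element.tag == "seq":
        # A sequence plays all of its children, in document order.
        urls = []
        for child in element:
            urls.extend(resolve(child))
        return urls
    if element.tag == "switch":
        # A switch plays exactly one child: the chosen one, else the default.
        wanted = choice or element.get("default")
        for child in element:
            if child.get("id") == wanted:
                return resolve(child)
        raise KeyError(f"no alternative with id {wanted!r}")
    raise ValueError(f"unexpected element <{element.tag}>")


switch = ET.fromstring(SWITCH)
default_urls = resolve(switch)
english_urls = resolve(switch, choice="a1")
print(default_urls)
print(english_urls)
```

With no explicit choice the French sequence is selected, yielding both files in order; choosing "a1" descends into the nested bitrate switch and takes its default.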

Inline XML files

Some media sources are XML documents themselves. These can be represented inline in a ROE file, so that all or some of the annotation information of a media resource is contained in one XML file. Accordingly, the "inline" attribute can have the values "false", "partial" or "full".

An example inline XML file is the use of CMML inside a ROE track:

 <track id="t1" provides="caption">
   <mediaSource id="c" src="http://example.com/cmml1.cmml" inline="partial" content-type="text/cmml" >
     <cmml role="caption" xmlns:cmml="http://www.annodex.org/spec/cmml/cmml40" xmlns:html="http://www.w3.org/1999/xhtml">
       <cmml:head>
         <cmml:title>random 1</cmml:title>
       </cmml:head>
       <cmml:clip start="t1" end="t2">
         <cmml:body>
           <html:p><html:span>rillian:</html:span>FOMS rocks</html:p>
         </cmml:body>
       </cmml:clip>
     </cmml>
   </mediaSource>
 </track>
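
A consumer could pull the inlined captions out of such a track along the following lines. This is a sketch under stated assumptions: the "html" prefix is bound to the XHTML namespace here so that the fragment parses, a missing "inline" attribute is treated as "false", and the helper name `inline_captions` is hypothetical.

```python
import xml.etree.ElementTree as ET

CMML_NS = "http://www.annodex.org/spec/cmml/cmml40"
XHTML_NS = "http://www.w3.org/1999/xhtml"  # assumed binding for the html: prefix

TRACK = f"""
<track id="t1" provides="caption">
  <mediaSource id="c" src="http://example.com/cmml1.cmml" inline="partial" content-type="text/cmml">
    <cmml role="caption" xmlns:cmml="{CMML_NS}" xmlns:html="{XHTML_NS}">
      <cmml:head>
        <cmml:title>random 1</cmml:title>
      </cmml:head>
      <cmml:clip start="t1" end="t2">
        <cmml:body>
          <html:p><html:span>rillian:</html:span>FOMS rocks</html:p>
        </cmml:body>
      </cmml:clip>
    </cmml>
  </mediaSource>
</track>
"""


def inline_captions(track):
    """Return (start, end, text) for every CMML clip inlined in the track."""
    source = track.find("mediaSource")
    if source.get("inline", "false") == "false":
        return []  # nothing inlined; the content lives only at the src URL
    ns = {"cmml": CMML_NS, "html": XHTML_NS}
    captions = []
    for clip in source.iterfind(".//cmml:clip", ns):
        text = "".join(clip.find("cmml:body", ns).itertext()).strip()
        captions.append((clip.get("start"), clip.get("end"), text))
    return captions


captions = inline_captions(ET.fromstring(TRACK))
print(captions)
```

For a "partial" inline source, a client would presumably still fetch the src URL for the rest of the annotations; that step is not shown.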

An example ROE XML file

Putting it all together, here is an example of a ROE XML file:

 <?xml version="1.0"?>
 <xs:schema targetNamespace="http://www.xiph.org/roe1.0"
            xmlns:xs="http://www.w3.org/2001/XMLS
            xmlns:html="http://www.w3.org/1999/xhtml"
            elementFormDefault="qualified"
            attributeFormDefault="unqualified">
 <ROE>
   <head>
     <link id="html_linkback" rel="alternate" type="text/html" href="http://example.com/full_video.html"/>
     <img id="stream_thumb" src="http://example.com/full_video.jpg"/>
     <title>Example video</title>
     <profile name="director's cut">
       <partial track="v" select="v1" />
       <partial track="a" />
     </profile>
   </head>
   <body>
     <track id="v" provides="video">
       <switch distinction="angle" default="v1">
         <mediaSource id="v1" content-type="video/theora" src="http://example.com/angle1.ogv?track=v1&amp;t=t1/t2" />
         <mediaSource id="v2" content-type="video/theora" src="http://example.com/angle2.ogv" />
       </switch>
     </track>
     <track id="a" provides="audio">
       <switch distinction="Content-Language" default="a3">
         <switch id="a1" distinction="bitrate" default="a1b1">
           <mediaSource id="a1b1" lang="en" content-type="audio/vorbis" src="http://example.com/lang1b1.oga" />
           <mediaSource id="a1b2" lang="en" content-type="audio/vorbis" src="http://example.com/lang1b2.oga" />
         </switch>
         <mediaSource id="a2" lang="de" content-type="audio/vorbis" src="http://example.com/lang2.oga" />
         <seq id="a3">
           <mediaSource id="a3a" lang="fr" content-type="audio/vorbis" src="http://example.com/lang3a.oga" />
           <mediaSource id="a3b" lang="fr" content-type="audio/vorbis" src="http://example.com/lang3b.oga" />
         </seq>
       </switch>
     </track>
     <track id="t" provides="textoverlay">
       <switch distinction="Content-Language" default="t1">
         <mediaSource id="t1" lang="en" content-type="text/cmml" src="http://example.com/transcript1.cmml" />
         <mediaSource id="t2" lang="de" content-type="text/cmml" src="http://example.com/transcript2.cmml" />
         <mediaSource id="t3" lang="fr" content-type="text/cmml" src="http://example.com/transcript3.cmml" />
       </switch>
     </track>
     <track id="l" provides="logo" default="O1">
       <seq>
         <mediaSource id="O1" content-type="application/ogg" src="http://example.com/mng.ogx?track=1" />
         <mediaSource id="O2" content-type="application/ogg" src="http://example.com/mng.ogx?track=2" />
       </seq>
     </track>
   </body>
 </ROE>

Representation in Skeleton

When the relationships described by ROE are written into an Ogg stream, they are encoded using the message header fields of Ogg Skeleton fisbones for each track. One of the primary design goals for fisbone headers is to minimize the need for global information to be stored in a stream. Each track's fisbone contains headers describing only itself and its relationship to other tracks in the stream. This allows tracks to be inserted or removed at the Ogg level without needing to modify any data in individual headers.

Relationships

Relationships between tracks are given by the following headers:

Provides

Provides introduces a virtual label, such as "commentary", which this track provides. Many tracks may provide the same label, and as long as one of them is present, a dependency on that label is satisfied.

Depends

This declares that it is not valid to include this track in a stream unless the track it depends on is present. An example use of this might be the generic captioning of sound effects for the deaf, which may not make sense unless the captioning of speech (in an appropriate language) is also rendered. Depends refers to either a virtual label provided by another track, or an explicit track ID.

When removing a track from a file, any other tracks dependent on it must also be removed.
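
The removal rule, combined with the rule that any one provider satisfies a dependency on a label, can be sketched as below. The dictionary layout, the track IDs, and the function name `remove_track` are assumptions made for illustration; they do not represent the fisbone header format itself.

```python
def remove_track(tracks, track_id):
    """Remove a track, then repeatedly drop tracks whose Depends can no longer be met.

    `tracks` maps a track ID to {"provides": [labels], "depends": [labels or IDs]}.
    """
    remaining = {tid: info for tid, info in tracks.items() if tid != track_id}
    changed = True
    while changed:
        changed = False
        # A dependency is met by an explicit track ID or by any provided label.
        available = set(remaining)
        for info in remaining.values():
            available.update(info.get("provides", []))
        for tid, info in list(remaining.items()):
            if any(dep not in available for dep in info.get("depends", [])):
                del remaining[tid]
                changed = True
                break  # recompute what is available before checking further
    return remaining


tracks = {
    "speech-en": {"provides": ["captions"], "depends": []},
    "speech-de": {"provides": ["captions"], "depends": []},
    "effects": {"provides": ["sfx-captions"], "depends": ["captions"]},
}
after_one = remove_track(tracks, "speech-en")
after_both = remove_track(after_one, "speech-de")
print(sorted(after_one), sorted(after_both))
```

Removing one speech caption track leaves the sound-effects captions valid because the other track still provides the "captions" label; removing both takes the dependent track with them.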

Recommends

Recommends refers to either a virtual label provided by another track, or an explicit track ID.

Suggests

Suggests refers to either a virtual label provided by another track, or an explicit track ID.

Conflicts

Conflicts refers to either a virtual label provided by another track, or an explicit track ID.


Serving Suggestions

Disposition

HTTP-style message headers for client-server negotiation