ROE

From XiphWiki

Latest revision as of 06:28, 13 April 2009

Rich Open multitrack media Exposition (ROE)


Overview

ROE (Rich Open multitrack media Exposition) is a way of describing the relationships between tracks of media in a stream. It is used to group tracks which have similar purpose and to identify alternatives.

Usage

Authoring

One use of ROE is to author a multi-track audio-visual stream from multiple input files. In this document, we present a description of how to use ROE to author multi-track Ogg files.

Dynamic Web Requests

Another use of ROE is in a Web client-server scenario. The Web server uses ROE as a means of representing the different tracks that are available for a multi-track Web resource. A Web client may not require all available tracks to present the resource to the user. It may decide to request the ROE representation first and then request only a subset of tracks from the server, e.g. only the English soundtrack. Or it may directly request particular tracks only. The server will use the request from the client to dynamically compose a multi-track stream with the requested tracks and mandatory tracks and serve this to satisfy the resource request.
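The client side of this scenario can be sketched as follows. This is a minimal illustration, not part of the specification: it assumes the client has already fetched a ROE XML document, and the helper name `sources_for` and the sample document are hypothetical. It only picks out the media sources of the English soundtrack; how the request for those tracks is then sent to the server is not defined here.

```python
import xml.etree.ElementTree as ET

# Hypothetical ROE document, as a client might receive it from a server.
ROE_XML = """
<ROE>
  <body>
    <track id="a" provides="audio">
      <switch distinction="language">
        <mediaSource id="a1" lang="en" content-type="audio/vorbis" src="http://example.com/lang1.oga" />
        <mediaSource id="a2" lang="de" content-type="audio/vorbis" src="http://example.com/lang2.oga" />
      </switch>
    </track>
  </body>
</ROE>
"""

def sources_for(roe_xml, provides, lang):
    """Return (id, src) pairs of media sources in tracks with the given
    "provides" label whose "lang" attribute matches."""
    root = ET.fromstring(roe_xml)
    found = []
    for track in root.iter("track"):
        if track.get("provides") != provides:
            continue
        for source in track.iter("mediaSource"):
            if source.get("lang") == lang:
                found.append((source.get("id"), source.get("src")))
    return found

# Only the English soundtrack is of interest to this client.
english_audio = sources_for(ROE_XML, "audio", "en")
```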

ROE in Use

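A draft version of the spec is in use in the MediaWiki extension MetaVidWiki (http://metavid.org/w/index.php/MetaVidWiki). It runs on the site http://metavid.org and is used for remote embedding. In this blog (http://metavid-mike.blogspot.com/), for example, all the clips reference a single ROE file to expose multiple video tracks and text transcripts. Sample ROE output from metavid: http://metavid.org/w/index.php?title=Special:MvExportStream&feed_format=roe&stream_name=House_proceeding_06-09-08_01&t=0%3A01%3A38%2F0%3A10%3A00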

The ROE model

Here we describe two representations of ROE: that of ROE XML, and that of ROE in Ogg Skeleton. Each representation is capable of entirely encoding the relationships of the ROE model, such that it is possible to losslessly convert between them.

ROE XML

ROE XML is an XML markup language that describes a hierarchical serialization of the ROE model.

A ROE XML file is an instance document of the ROE XML schema.

It is composed of a <head> tag followed by a <body> tag.


Head Element

Head Tags

The <head> tag is optional and may contain:

  • a <title> tag to provide a textual description for the multi-track stream,
  • a set of <link> tags that provide an alternative representation of the multi-track stream, e.g. as an HTML document,
  • an <img> tag to provide a representative thumbnail for the multi-track stream,
  • a set of <meta> tags that provide structured name-value annotations of the multi-track stream,
  • a <base> tag to provide a base URI for resources referred to in the ROE file, and
  • a set of <profile> tags that allows description of so-called track profiles.

The <title>, <link>, <meta>, and <base> tags are borrowed from XHTML and serve the same purpose as they do there.

Track Profiles

A track profile is a combination of tracks that is pre-defined within the ROE file and can be accessed by Web clients or authoring applications directly. Examples of such profiles are the Director's cut, or the Australian version.

A profile defines a list of references to the tracks of a media resource and possibly a selection from the alternative media sources of the track, to use for a particular pre-defined profile of the resource.

To that end, the profile element has a subelement called "partial" which contains the ID of a selected track and potentially the ID of a selected alternate media source for the track.

An example profile is:

 <profile name="director's cut">
   <partial track="v" select="v1" />
   <partial track="a" />
 </profile>

The <head> tag essentially separates the profiles from the core document structure being provided in the <body> element.
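As a rough sketch of how an application might read a track profile, the following Python snippet maps each <partial> of a named profile to its selected track and optional media source. The function name `resolve_profile` and the trimmed-down sample document are illustrative assumptions; the element and attribute names come from the example above.

```python
import xml.etree.ElementTree as ET

# Trimmed-down, hypothetical ROE document with one profile and three tracks.
ROE_XML = """
<ROE>
  <head>
    <profile name="director's cut">
      <partial track="v" select="v1" />
      <partial track="a" />
    </profile>
  </head>
  <body>
    <track id="v" provides="video" />
    <track id="a" provides="audio" />
    <track id="t" provides="text overlay" />
  </body>
</ROE>
"""

def resolve_profile(roe_xml, name):
    """Map each track ID in the named profile to the ID of the selected
    media source, or None when the profile leaves the choice open."""
    root = ET.fromstring(roe_xml)
    for profile in root.iter("profile"):
        if profile.get("name") == name:
            return {p.get("track"): p.get("select") for p in profile.iter("partial")}
    raise KeyError(name)

selection = resolve_profile(ROE_XML, "director's cut")
```

Here the track "t" is not part of the profile, so it does not appear in the result, and the audio track "a" is included without a particular media source being pinned.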


Body Element

The <body> tag consists of a sequence of <track> elements that each describe a logical media track.

The Track Tag

A media track may consist of one of:

  • a media source, such as an audio, video, or text stream described in a <mediaSource> tag,
  • a sequence of media sources described in a <seq> tag with start and end times, or
  • a set of alternate media sources described in a <switch> tag, only one of which can be selected.

The <track> element contains a mandatory "provides" attribute, which introduces a virtual label such as "commentary", "video", "audio", "textoverlay", "closedcaption", "logo", or "scoreboard". The track provides that kind of content.

The Switch Tag

The <switch> tag provides a choice between alternates, distinguished for a specific reason. The reason is given in the "distinction" attribute of the <switch> tag.

Inside a <switch> tag, the choices can be specified through the following means:

  • directly as a <mediaSource>,
  • as a sequence of media sources in a <seq> element, or
  • as the outcome of another <switch> tag.

Example <switch> element:

 <switch distinction="language" default="a3">
   <switch id="a1" distinction="bitrate" default="a1b1">
    <mediaSource id="a1b1" lang="en" content-type="audio/vorbis" src="http://example.com/lang1b1.oga" />
    <mediaSource id="a1b2" lang="en" content-type="audio/vorbis" src="http://example.com/lang1b2.oga" />
   </switch>
   <mediaSource id="a2" lang="de" content-type="audio/vorbis" src="http://example.com/lang2.oga" />
   <seq id="a3">
     <mediaSource id="a3a" lang="fr" content-type="audio/vorbis" src="http://example.com/lang3a.oga" />
     <mediaSource id="a3b" lang="fr" content-type="audio/vorbis" src="http://example.com/lang3b.oga" />
   </seq>
 </switch>

In this example, we have a choice between three languages: en, de and fr. The English language track also comes in two different bitrates. The French language track comes in two different files that should be played in sequence.
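One way to resolve such a nested <switch> into the files that would actually be played is sketched below. It rests on an assumption that the text above does not state explicitly: that the "default" attribute names the "id" of the child to use when no other choice is made. The function name `resolve` and its `choice` parameter are hypothetical.

```python
import xml.etree.ElementTree as ET

# The <switch> example from above, with content-type attributes omitted for brevity.
SWITCH_XML = """
<switch distinction="language" default="a3">
  <switch id="a1" distinction="bitrate" default="a1b1">
    <mediaSource id="a1b1" lang="en" src="http://example.com/lang1b1.oga" />
    <mediaSource id="a1b2" lang="en" src="http://example.com/lang1b2.oga" />
  </switch>
  <mediaSource id="a2" lang="de" src="http://example.com/lang2.oga" />
  <seq id="a3">
    <mediaSource id="a3a" lang="fr" src="http://example.com/lang3a.oga" />
    <mediaSource id="a3b" lang="fr" src="http://example.com/lang3b.oga" />
  </seq>
</switch>
"""

def resolve(element, choice=None):
    """Return the list of src URLs to play, in order, for an element.

    Assumption: a <switch> falls back to the child whose id equals its
    "default" attribute when no explicit choice is given.
    """
    if element.tag == "mediaSource":
        return [element.get("src")]
    if element.tag == "seq":
        urls = []
        for child in element:
            urls.extend(resolve(child))  # play every member in document order
        return urls
    if element.tag == "switch":
        wanted = choice or element.get("default")
        if wanted is None:
            raise ValueError("switch has neither a choice nor a default")
        for child in element:
            if child.get("id") == wanted:
                return resolve(child)  # nested switches use their own default
        raise ValueError("no alternative with id %r" % wanted)
    raise ValueError("unexpected element <%s>" % element.tag)

switch = ET.fromstring(SWITCH_XML)
default_urls = resolve(switch)        # French sequence, the declared default
english_urls = resolve(switch, "a1")  # English, at its default bitrate
```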

Inline XML files

Some media source elements are XML documents themselves. These can be represented inline in a ROE file. The purpose of this is to contain all or some of the annotation information of a media resource inside one XML file. Thus, the "inline" attribute can have the values "false", "partial" or "full".

An example inline XML file is the use of CMML inside a ROE track:

 <track id="t1" provides="caption">
   <mediaSource id="c" src="http://example.com/cmml1.cmml" inline="partial" content-type="text/cmml" >
     <cmml role="caption" xmlns:cmml="http://www.annodex.org/spec/cmml/cmml40" xmlns:html="http://www.w3.org/1999/xhtml">
       <cmml:head>
         <cmml:title>random 1</cmml:title>
       </cmml:head>
       <cmml:clip start="t1" end="t2">
         <cmml:body>
           <html:p><html:span>rillian:</html:span>FOMS rocks</html:p>
         </cmml:body>
       </cmml:clip>
     </cmml>
   </mediaSource>
 </track>
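A consumer could inspect the "inline" attribute roughly as follows. This is a sketch under two assumptions that the text does not confirm: that a missing "inline" attribute means "false", and that only "full" lets the consumer skip fetching the resource named in "src". The function name `inline_status` is hypothetical.

```python
import xml.etree.ElementTree as ET

# Shortened, hypothetical version of the caption track above.
TRACK_XML = """
<track id="t1" provides="caption">
  <mediaSource id="c" src="http://example.com/cmml1.cmml" inline="partial" content-type="text/cmml">
    <cmml role="caption" />
  </mediaSource>
</track>
"""

def inline_status(source):
    """Return (inline value, has inline content, needs fetching from src)."""
    inline = source.get("inline", "false")  # assumed default: not inline
    if inline not in ("false", "partial", "full"):
        raise ValueError("unknown inline value %r" % inline)
    has_inline_content = len(source) > 0    # any child elements present?
    needs_fetch = inline != "full"          # assumption: "full" is self-contained
    return inline, has_inline_content, needs_fetch

source = ET.fromstring(TRACK_XML).find("mediaSource")
status = inline_status(source)
```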

An example ROE XML file

Putting it all together, here is an example of a ROE XML file:

 <?xml version="1.0"?>
 <ROE xmlns="http://www.xiph.org/roe1.0">
   <head>
     <link id="html_linkback" rel="alternate" type="text/html" href="http://example.com/full_video.html"/>
     <img id="stream_thumb" src="http://example.com/full_video.jpg"/>
     <title>Example video</title>
     <profile name="director's cut">
       <partial track="v" select="v1" />
       <partial track="a" />
     </profile>
   </head>
   <body>
     <track id="v" provides="video">
       <switch distinction="angle" default="v1">
         <mediaSource id="v1" content-type="video/theora" src="http://example.com/angle1.ogv?track=v1&amp;t=t1/t2" />
         <mediaSource id="v2" content-type="video/theora" src="http://example.com/angle2.ogv" />
       </switch>
     </track>
     <track id="a" provides="audio">
       <switch distinction="Content-Language" default="a3">
         <switch id="a1" distinction="bitrate" default="a1b1">
           <mediaSource id="a1b1" lang="en" content-type="audio/vorbis" src="http://example.com/lang1b1.oga" />
           <mediaSource id="a1b2" lang="en" content-type="audio/vorbis" src="http://example.com/lang1b2.oga" />
         </switch>
         <mediaSource id="a2" lang="de" content-type="audio/vorbis" src="http://example.com/lang2.oga" />
         <seq id="a3">
           <mediaSource id="a3a" lang="fr" content-type="audio/vorbis" src="http://example.com/lang3a.oga" />
           <mediaSource id="a3b" lang="fr" content-type="audio/vorbis" src="http://example.com/lang3b.oga" />
         </seq>
       </switch>
     </track>
     <track id="t" provides="text overlay">
       <switch distinction="Content-Language" default="t1">
         <mediaSource id="t1" lang="en" content-type="text/cmml" src="http://example.com/transcript1.cmml" />
         <mediaSource id="t2" lang="de" content-type="text/cmml" src="http://example.com/transcript2.cmml" />
         <mediaSource id="t3" lang="fr" content-type="text/cmml" src="http://example.com/transcript3.cmml" />
       </switch>
     </track>
     <track id="l" provides="logo" default="O1">
       <seq>
         <mediaSource id="O1" content-type="application/ogg" src="http://example.com/mng.ogx?track=1" />
         <mediaSource id="O2" content-type="application/ogg" src="http://example.com/mng.ogx?track=2" />
       </seq>
     </track>
   </body>
 </ROE>

Representation in Skeleton

When the relationships described by ROE are written into an Ogg stream, they are encoded using the message header fields of Ogg Skeleton fisbones for each track. One of the primary design goals for fisbone headers is to minimize the need for global information to be stored in a stream. Each track's fisbone contains headers describing only itself and its relationship to other tracks in the stream. This allows tracks to be inserted or removed at the Ogg level without needing to modify any data in individual headers.

Relationships

Relationships between tracks are given by the following headers:

Provides

Provides introduces a virtual label such as "commentary", which this track provides. Many tracks may provide the same label; as long as at least one of them is present, a dependency on that label is satisfied.

Depends

This declares that it is not valid to include this track in a stream unless the track it depends on is present. An example use of this might be the generic captioning of sound effects for the deaf, which may not make sense unless the captioning of speech (in an appropriate language) is also rendered. Depends refers to either a virtual label provided by another track, or an explicit track ID.

When removing a track from a file, any other tracks dependent on it must also be removed.
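The removal rule can be sketched as a small fixed-point computation. The data model below (a dictionary of track IDs with "provides" and "depends" entries) and the track names are purely illustrative and say nothing about how fisbone headers are actually encoded. A dependency counts as satisfied if it names a remaining track ID or a label that some remaining track still provides.

```python
def remove_track(tracks, track_id):
    """Return the tracks left after removing track_id and every track
    whose dependencies can no longer be satisfied."""
    remaining = {tid: t for tid, t in tracks.items() if tid != track_id}
    changed = True
    while changed:
        changed = False
        # Labels still offered by at least one remaining track.
        labels = {t["provides"] for t in remaining.values()}
        for tid, track in list(remaining.items()):
            unsatisfied = [dep for dep in track["depends"]
                           if dep not in remaining and dep not in labels]
            if unsatisfied:
                del remaining[tid]
                changed = True
                break  # labels may have changed, so start over
    return remaining

# Hypothetical stream: two speech caption tracks, sound-effect captions that
# depend on the "speech-captions" label, and notes that depend on track "sfx".
tracks = {
    "speech_en": {"provides": "speech-captions", "depends": []},
    "speech_de": {"provides": "speech-captions", "depends": []},
    "sfx": {"provides": "effect-captions", "depends": ["speech-captions"]},
    "sfx_notes": {"provides": "notes", "depends": ["sfx"]},
}

after_first = remove_track(tracks, "speech_en")        # label still provided
after_second = remove_track(after_first, "speech_de")  # label gone, cascade
```

Removing the first speech track leaves the sound-effect captions in place because the other speech track still provides the label. Removing the second one takes the sound-effect captions with it, and then the notes that depend on them.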

Recommends

Recommends refers to either a virtual label provided by another track, or an explicit track ID.

Suggests

Suggests refers to either a virtual label provided by another track, or an explicit track ID.

Conflicts

Conflicts refers to either a virtual label provided by another track, or an explicit track ID.


Serving Suggestions

Disposition

HTTP-style message headers for client-server negotiation

Retrieved from "http://wiki.xiph.org/ROE"