ROE (Rich Open multitrack media Exposition) is a way of describing the relationships between tracks of media in a stream. It is used to group tracks that have a similar purpose and to identify alternatives.
One use of ROE is to author a multi-track audio-visual stream from multiple input files. In this document, we present a description of how to use ROE to author multi-track Ogg files.
Another use of ROE is in a Web client-server scenario. The Web server uses ROE as a means of representing the different tracks that are available for a multi-track Web resource. A Web client may not require all available tracks to present the resource to the user. It may decide to request the ROE representation first and then request only a subset of tracks from the server, e.g. only the English soundtrack. Or it may directly request particular tracks only. The server will use the request from the client to dynamically compose a multi-track stream with the requested tracks and mandatory tracks and serve this to satisfy the resource request.
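The server-side selection described above can be sketched as follows. This is a minimal illustration, not part of ROE itself: the `compose_track_set` function, the track IDs, and the `mandatory` flag are all hypothetical names chosen for the example.

```python
def compose_track_set(available, requested):
    """Return the IDs of the tracks to serve.

    `available` maps a track ID to a dict with a hypothetical "mandatory"
    flag; `requested` is the set of track IDs named by the client.
    Requested IDs that the resource does not offer are ignored.
    """
    selected = []
    for track_id, info in available.items():
        if info.get("mandatory") or track_id in requested:
            selected.append(track_id)
    return selected


# Illustrative resource: one mandatory video track, two optional soundtracks.
available = {
    "video": {"mandatory": True},
    "audio_en": {"mandatory": False},
    "audio_de": {"mandatory": False},
}

# The client asks only for the English soundtrack.
served = compose_track_set(available, {"audio_en"})
print(served)
```

In this sketch the video track is served even though the client did not name it, because it is marked mandatory, and the German soundtrack is left out.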
The ROE model
Here we describe two representations of ROE: ROE XML, and ROE in Ogg Skeleton. Each representation can fully encode the relationships of the ROE model, so it is possible to convert between them losslessly.
ROE XML is a hierarchical serialization of the ROE model for a particular stream.
Representation in Skeleton
When the relationships described by ROE are written into an Ogg stream, they are encoded using the message header fields of Ogg Skeleton fisbones for each track. One of the primary design goals for fisbone headers is to minimize the need for global information to be stored in a stream. Each track's fisbone contains headers describing only itself and its relationship to other tracks in the stream. This allows tracks to be inserted or removed at the Ogg level without needing to modify any data in individual headers.
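The following sketch illustrates why self-describing per-track headers make removal cheap. It assumes, for illustration only, that each field is rendered as a `Name: value` line terminated by CRLF; the track IDs, the field values, and the `serialize_headers` helper are hypothetical, and the header names used are two of those described in this section.

```python
# Hypothetical per-track records: each track lists only its own relationships.
tracks = {
    "speech_en": {"Provides": ["captions"]},
    "effects": {"Depends": ["captions"]},
    "commentary": {},
}


def serialize_headers(fields):
    """Render one track's fields as 'Name: value' lines (assumed layout)."""
    return "".join(
        f"{name}: {value}\r\n"
        for name, values in fields.items()
        for value in values
    )


# Serialize every track, drop one track, then serialize again.
before = {tid: serialize_headers(f) for tid, f in tracks.items()}
del tracks["commentary"]
after = {tid: serialize_headers(f) for tid, f in tracks.items()}

# The headers of the remaining tracks are byte-for-byte identical.
unchanged = all(after[tid] == before[tid] for tid in after)
print(unchanged)
```

Because no track stores a global table of the stream's contents, dropping the "commentary" track here leaves every other track's serialized headers untouched.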
Relationships between tracks are given by the following headers:
Provides introduces a virtual label, such as "commentary", that this track provides. Many tracks may provide the same label; a dependency on that label is satisfied as long as at least one of them is present.
Depends declares that it is not valid to include this track in a stream unless the track it depends on is present. An example use of this might be the generic captioning of sound effects for the deaf, which may not make sense unless the captioning of speech (in an appropriate language) is also rendered. Depends refers to either a virtual label provided by another track, or an explicit track ID.
When removing a track from a file, any other tracks dependent on it must also be removed.
Recommends refers to either a virtual label provided by another track, or an explicit track ID.
Suggests refers to either a virtual label provided by another track, or an explicit track ID.
Conflicts refers to either a virtual label provided by another track, or an explicit track ID.
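The rules for Provides and Depends, including the removal rule, can be sketched as below. This is a hedged illustration rather than a reference implementation: the dictionary layout, the function names, and the track IDs are hypothetical. The Conflicts check additionally assumes that a track may not share a stream with any other track matching a name it conflicts with; the text above only states what Conflicts may refer to.

```python
def provided_names(tracks):
    """Map each referable name (track ID or virtual label) to its supplying track IDs."""
    names = {}
    for tid, fields in tracks.items():
        names.setdefault(tid, set()).add(tid)
        for label in fields.get("provides", []):
            names.setdefault(label, set()).add(tid)
    return names


def violations(tracks):
    """List (track, header, reference) triples that make the track set invalid."""
    names = provided_names(tracks)
    problems = []
    for tid, fields in tracks.items():
        for ref in fields.get("depends", []):
            if ref not in names:  # nothing present satisfies the dependency
                problems.append((tid, "depends", ref))
        for ref in fields.get("conflicts", []):
            if names.get(ref, set()) - {tid}:  # assumed semantics, see lead-in
                problems.append((tid, "conflicts", ref))
    return problems


def remove_track(tracks, track_id):
    """Remove a track, then repeatedly remove tracks whose dependencies broke."""
    remaining = {tid: f for tid, f in tracks.items() if tid != track_id}
    while True:
        names = provided_names(remaining)
        broken = [
            tid for tid, f in remaining.items()
            if any(ref not in names for ref in f.get("depends", []))
        ]
        if not broken:
            return remaining
        for tid in broken:
            del remaining[tid]


tracks = {
    "video": {},
    "speech_en": {"provides": ["captions"]},
    "speech_de": {"provides": ["captions"]},
    "effects": {"depends": ["captions"]},
}

one_removed = remove_track(tracks, "speech_en")
both_removed = remove_track(one_removed, "speech_de")
print(sorted(one_removed))
print(sorted(both_removed))
```

After the first removal the "effects" track stays, because "speech_de" still provides the "captions" label. Once both caption tracks are gone, nothing satisfies the dependency and "effects" is removed as well.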