SkeletonHeaders

From XiphWiki
Revision as of 03:33, 20 March 2010 by Silvia (talk | contribs) (added roles section)

Adding Required Headers to Skeleton

With the HTML5 video element, Ogg is now a major format on the Web and is being applied to use cases that it was designed to support but has not had to handle before; see http://www.xiph.org/ogg/doc/oggstream.html.

One such use case is multitrack audio and video: for example, videos with multiple view angles encoded in one file, or videos with a sign language video track, an audio description track, a caption track and several subtitle tracks in different languages (i.e. several Theora, several Vorbis and several Kate tracks).

Encoding multitrack files is already possible, but it is unclear how such files should be rendered and how tracks should be differentiated and addressed (e.g. from a JavaScript API). Skeleton was designed to be extensible with message header fields for this purpose.

This wiki page collects such new message header fields.


Content-type

Right now, there is one mandatory message header field for all logical bitstreams: the "Content-type" header field, which contains the MIME type of the track. The MIME types in use here are listed at http://wiki.xiph.org/MIME_Types_and_File_Extensions#Codec_MIME_types.
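To illustrate how an application might read such fields, here is a minimal sketch. It assumes the message header fields arrive as HTTP-style "Name: value" lines terminated by CRLF; the function name parse_message_headers and the sample bytes are hypothetical and not part of any Xiph library.

```python
def parse_message_headers(raw):
    """Parse CRLF-separated 'Name: value' lines into a dict.

    Field names are lower-cased so lookups are case-insensitive
    (an assumption of this sketch).
    """
    fields = {}
    for line in raw.decode("utf-8").split("\r\n"):
        if not line.strip():
            continue  # skip empty lines, e.g. after the final CRLF
        name, sep, value = line.partition(":")
        if not sep:
            raise ValueError("malformed header line: %r" % line)
        fields[name.strip().lower()] = value.strip()
    return fields


# Hypothetical header block for a Theora track.
raw = b"Content-type: video/theora\r\nLanguage: en-US\r\n"
headers = parse_message_headers(raw)

# Content-type is the one mandatory field, so treat its absence as an error.
if "content-type" not in headers:
    raise ValueError("missing mandatory Content-type field")
print(headers)
```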


Language

Content in a track usually originates from a specific language. This language can be specified in a Language message header field. The language code is formed according to http://www.w3.org/TR/ltli/ and http://www.rfc-editor.org/rfc/bcp/bcp47.txt.

For audio tracks with speech, the Language would be the language that dominates.

For video tracks, it might be the language that is signed (if it is a sign language video), or the language that is most often represented in scene text.

For text tracks, it is the dominating language in the text, e.g. English or German subtitles.

Examples are: en-US, de-DE, sgn-ase, en-cockney
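A player that wants to pick tracks by language could compare a requested language against each track's Language value. The sketch below uses simple case-insensitive prefix matching in the style of the basic filtering scheme of RFC 4647; this choice of matching rule and the function name language_matches are assumptions for illustration, not something this page prescribes.

```python
def language_matches(tag, wanted):
    """Return True if `tag` equals `wanted` or is a more specific form of it.

    Comparison is case-insensitive; "en" matches "en-US" but not "eng".
    """
    tag = tag.lower()
    wanted = wanted.lower()
    return tag == wanted or tag.startswith(wanted + "-")


# Language values taken from the examples above.
track_languages = ["en-US", "de-DE", "sgn-ase", "en-cockney"]
english = [t for t in track_languages if language_matches(t, "en")]
print(english)
```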


Role

Role describes what semantic type of content is contained in a track. Every track can only have a single role value, so the most appropriate role has to be chosen. The same role can be used across multiple tracks.

The following lists some commonly used roles. Other roles are possible, too, but should only be used or introduced if there is really a need for them.

Text tracks:

  • "text/caption"
  • "text/subtitle"
  • "text/textaudiodesc"
  • "text/karaoke"
  • "text/chapters"
  • "text/tickertext"
  • "text/lyrics"

Video tracks:

  • "video/main"
  • "video/alternate" (e.g. different camera angle)
  • "video/sign" (for sign language)
  • "video/alpha" (a track to alpha blend)

Audio tracks:

  • "audio/main"
  • "audio/alternate" (probably linked to an alternate video track)
  • "audio/dub"
  • "audio/audiodesc"
  • "audio/music"
  • "audio/speech"
  • "audio/sfx" (sound effects)

Notice how we are re-using the Content-type approach of specifying the main semantic type of the track first. This is necessary because MIME types don't always provide the right main content type (e.g. application/kate is semantically a text format).
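As a small illustration of why the main type is carried in the role, the following sketch splits a role value into its main type and its specific purpose. The helper split_role and the dictionary layout of the track are hypothetical.

```python
def split_role(role):
    """Split a role such as 'text/subtitle' into (main_type, purpose)."""
    main_type, sep, purpose = role.partition("/")
    if not sep or not main_type or not purpose:
        raise ValueError("role must look like 'type/purpose': %r" % role)
    return main_type, purpose


# A Kate track: the MIME type says "application", the role says "text".
track = {"content-type": "application/kate", "role": "text/subtitle"}
main_type, purpose = split_role(track["role"])
print(main_type, purpose)
```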


Name

This field provides the opportunity to associate a free text string with the track to allow direct addressing of the track through its name.

Characters allowed are basically all the characters that are also allowed for XML id fields:

the first character has to be one of:
[A-Z] | "_" | [a-z] | [#xC0-#xD6] | [#xD8-#xF6] | [#xF8-#x2FF] | [#x370-#x37D] | [#x37F-#x1FFF] | [#x200C-#x200D] |
[#x2070-#x218F] | [#x2C00-#x2FEF] | [#x3001-#xD7FF] | [#xF900-#xFDCF] | [#xFDF0-#xFFFD] | [#x10000-#xEFFFF]
any following characters can be one of:
[A-Z] | "_" | [a-z] | [#xC0-#xD6] | [#xD8-#xF6] | [#xF8-#x2FF] | [#x370-#x37D] | [#x37F-#x1FFF] | [#x200C-#x200D] | 
[#x2070-#x218F] | [#x2C00-#x2FEF] | [#x3001-#xD7FF] | [#xF900-#xFDCF] | [#xFDF0-#xFFFD] | [#x10000-#xEFFFF] | 
"-" | "." | [0-9] | #xB7 | [#x0300-#x036F] | [#x203F-#x2040]


The name needs to be unique among all track names; otherwise it is undefined which of the tracks is retrieved when addressing by name.
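The character rules listed above can be sketched as a regular expression. The ranges below are copied from the list in this section; the names NAME_START, NAME_REST and is_valid_name are hypothetical.

```python
import re

# Characters allowed in the first position.
NAME_START = (
    r"A-Z_a-z\u00C0-\u00D6\u00D8-\u00F6\u00F8-\u02FF"
    r"\u0370-\u037D\u037F-\u1FFF\u200C-\u200D"
    r"\u2070-\u218F\u2C00-\u2FEF\u3001-\uD7FF"
    r"\uF900-\uFDCF\uFDF0-\uFFFD\U00010000-\U000EFFFF"
)
# Characters allowed in any following position.
NAME_REST = NAME_START + r"\-.0-9\u00B7\u0300-\u036F\u203F-\u2040"

NAME_RE = re.compile("[" + NAME_START + "][" + NAME_REST + "]*")


def is_valid_name(name):
    """Return True if the whole string satisfies the character rules."""
    return NAME_RE.fullmatch(name) is not None


print(is_valid_name("Madonna_singing"))
print(is_valid_name("1st_track"))
```

Here the first call returns True and the second returns False, because a digit is only allowed after the first character.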

An example means of addressing the track by name is: track[name="Madonna_singing"]
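A lookup by name might be sketched as follows. Since the result is undefined when names collide, this sketch chooses to raise an error in that case; that behaviour, the function find_track_by_name and the list-of-dicts track representation are all assumptions for illustration.

```python
def find_track_by_name(tracks, name):
    """Return the single track whose 'name' equals `name`, or None."""
    matches = [t for t in tracks if t.get("name") == name]
    if len(matches) > 1:
        # The page leaves this case undefined; the sketch refuses to guess.
        raise ValueError("track name is not unique: %r" % name)
    return matches[0] if matches else None


# Hypothetical track list.
tracks = [
    {"name": "Madonna_singing", "role": "audio/main"},
    {"name": "Lyrics_en", "role": "text/lyrics"},
]
track = find_track_by_name(tracks, "Madonna_singing")
print(track["role"])
```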


Track order

In many applications it is necessary to walk through all the tracks in a media file and address tracks by an index.

In Ogg, tracks are numbered by the order in which their BOS (beginning-of-stream) pages appear in the Ogg stream. If a file is re-encoded, this order may change, so index-based addressing is only reliable as long as the file doesn't change.

For example, a video file with the following composition would have the following indexes:

  • track[0]: Skeleton BOS
  • track[1]: Theora BOS for main video
  • track[2]: Vorbis BOS for main audio
  • track[3]: Kate BOS for English captions
  • track[4]: Kate BOS for German subtitles
  • track[5]: Vorbis BOS for audio descriptions
  • track[6]: Theora BOS for sign language

This track order exists only to provide a means of addressing tracks by index. It has no influence on which track is displayed on top of another.
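The indexing rule can be sketched as follows, assuming the pages of a stream have already been read into (serial number, is-BOS flag) pairs by some Ogg demuxer. The function index_tracks, the serial numbers and the page list are hypothetical.

```python
def index_tracks(pages):
    """Return track serial numbers in the order their BOS pages appear."""
    order = []
    for serialno, is_bos in pages:
        if is_bos and serialno not in order:
            order.append(serialno)
    return order


# Hypothetical page sequence: (serial number, is this a BOS page?)
pages = [
    (1001, True),   # Skeleton BOS
    (2002, True),   # Theora BOS for main video
    (3003, True),   # Vorbis BOS for main audio
    (2002, False),  # later Theora page, does not affect the order
    (3003, False),  # later Vorbis page
]
order = index_tracks(pages)

# Map each serial number to its track index.
index_of = {serialno: i for i, serialno in enumerate(order)}
print(index_of[2002])
```

With this input, the Skeleton track gets index 0, the Theora track index 1 and the Vorbis track index 2.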