SkeletonHeaders: Difference between revisions

Revision as of 10:33, 20 March 2010

Adding Required Headers to Skeleton

With the HTML5 video element, Ogg is now a major format on the Web and is being applied to use cases that it has not had to solve before but was built to allow (see http://www.xiph.org/ogg/doc/oggstream.html).

One such use case is multitrack audio and video: for example, videos with multiple view angles encoded in one file, or videos with a sign language video track, an audio description track, a caption track and several subtitle tracks in different languages (i.e. several Theora, several Vorbis and several Kate tracks).

While encoding of multitrack files is already possible, it is unclear how such files would be rendered, how tracks would be differentiated and addressed (e.g. from a JavaScript API), etc. Skeleton was designed to be extensible through message header fields for this purpose.

This wiki page collects such new message header fields.

Content-type

Right now, there is one mandatory message header field for all of the logical bitstreams: the "Content-type" header field, which contains the MIME type of the track. The MIME types in use here are listed at http://wiki.xiph.org/MIME_Types_and_File_Extensions#Codec_MIME_types.
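As an illustration of how a reader might check for the mandatory field, here is a minimal sketch. It assumes the message header fields of a track are available as a byte string of "Name: value" lines separated by CRLF, and that field names compare case-insensitively; both are assumptions of this example, and parse_message_headers is a hypothetical helper, not part of any existing library.

```python
def parse_message_headers(raw):
    """Parse 'Name: value' lines into a dict keyed by lower-cased field name.

    Assumption: fields are CRLF-separated lines in UTF-8.
    """
    fields = {}
    for line in raw.decode("utf-8").split("\r\n"):
        if not line:
            continue  # skip the empty string after the final CRLF
        name, sep, value = line.partition(":")
        if not sep:
            raise ValueError("malformed header line: %r" % line)
        fields[name.strip().lower()] = value.strip()
    # Content-type is the one mandatory field described above.
    if "content-type" not in fields:
        raise ValueError("missing mandatory Content-type field")
    return fields
```

A track whose headers lack Content-type is rejected, while additional fields such as Language, Role or Name are simply collected alongside it.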

Language

Content in a track usually originates from a specific language. This language can be specified in a Language message header field. The language code is formed according to http://www.w3.org/TR/ltli/ and http://www.rfc-editor.org/rfc/bcp/bcp47.txt.

For audio tracks with speech, the Language would be the language that dominates.

For video tracks, it might be the language that is signed (if it is a sign language video), or the language that is most often represented in scene text.

For text tracks, it is the dominant language in the text, e.g. English or German subtitles.

Examples are: en-US, de-DE, sgn-ase, en-cockney
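A loose shape check for such values could look like the sketch below. It only tests that the value consists of hyphen-separated alphanumeric subtags of 1 to 8 characters with an alphabetic first subtag; it is deliberately not a full BCP 47 validator and does not consult the subtag registry. The function name is hypothetical.

```python
import re

# Shape check only: an alphabetic primary subtag followed by optional
# hyphen-separated alphanumeric subtags, each 1 to 8 characters long.
TAG_RE = re.compile(r"[A-Za-z]{1,8}(-[A-Za-z0-9]{1,8})*")

def looks_like_language_tag(tag):
    """Return True if tag has the general shape of a language tag."""
    return TAG_RE.fullmatch(tag) is not None
```

All four example values above pass this check, while a value such as en_US (underscore instead of hyphen) does not.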

Role

Role describes the semantic type of content contained in a track. Every track can have only a single role value, so the most appropriate role has to be chosen. The same role can be used across multiple tracks.

The following lists some commonly used roles. Other roles are possible, but should only be introduced if there is a real need for them.

Text tracks:

"text/caption"
"text/subtitle"
"text/textaudiodesc"
"text/karaoke"
"text/chapters"
"text/tickertext"
"text/lyrics"

Video tracks:

"video/main"
"video/alternate" (e.g. different camera angle)
"video/sign" (for sign language)
"video/alpha" (a track to alpha blend)

Audio tracks:

"audio/main"
"audio/alternate" (probably linked to an alternate video track)
"audio/dub"
"audio/audiodesc"
"audio/music"
"audio/speech"
"audio/sfx" (sound effects)

Notice how we are re-using the Content-type approach of specifying the main semantic type of the track first. This is necessary, since MIME types don't always provide the right main content type (e.g. application/kate is semantically a text format).
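The "main type first" structure makes role values easy to take apart. The following sketch splits a role into its main type and subtype and checks it against the roles listed above. The table and function names are hypothetical, and since other roles are possible, an unlisted role is reported rather than treated as an error.

```python
# Roles listed on this page, keyed by main semantic type.
KNOWN_ROLES = {
    "text": {"caption", "subtitle", "textaudiodesc", "karaoke",
             "chapters", "tickertext", "lyrics"},
    "video": {"main", "alternate", "sign", "alpha"},
    "audio": {"main", "alternate", "dub", "audiodesc",
              "music", "speech", "sfx"},
}

def split_role(role):
    """Split 'main/sub' into (main, sub); raise ValueError if malformed."""
    main, sep, sub = role.partition("/")
    if not sep or not main or not sub:
        raise ValueError("role must look like 'type/subtype': %r" % role)
    return main, sub

def is_listed_role(role):
    """Return True if role is one of the commonly used roles above."""
    main, sub = split_role(role)
    return sub in KNOWN_ROLES.get(main, set())
```

For a Kate track, split_role("text/caption") yields the main type "text" even though the track's Content-type is application/kate, which is the point made above.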

Name

This field provides the opportunity to associate a free text string with the track to allow direct addressing of the track through its name.

The allowed characters are essentially those that are also allowed in XML id fields:

the first character has to be one of:
[A-Z] | "_" | [a-z] | [#xC0-#xD6] | [#xD8-#xF6] | [#xF8-#x2FF] | [#x370-#x37D] | [#x37F-#x1FFF] | [#x200C-#x200D] |
[#x2070-#x218F] | [#x2C00-#x2FEF] | [#x3001-#xD7FF] | [#xF900-#xFDCF] | [#xFDF0-#xFFFD] | [#x10000-#xEFFFF]

any following characters can be one of:
[A-Z] | "_" | [a-z] | [#xC0-#xD6] | [#xD8-#xF6] | [#xF8-#x2FF] | [#x370-#x37D] | [#x37F-#x1FFF] | [#x200C-#x200D] | 
[#x2070-#x218F] | [#x2C00-#x2FEF] | [#x3001-#xD7FF] | [#xF900-#xFDCF] | [#xFDF0-#xFFFD] | [#x10000-#xEFFFF] | 
"-" | "." | [0-9] | #xB7 | [#x0300-#x036F] | [#x203F-#x2040]

The name needs to be unique among all track names; otherwise it is undefined which of the tracks is retrieved when addressing by name.

An example means of addressing the track by name is: track[name="Madonna_singing"]
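The character rules and the uniqueness requirement can be sketched as follows. The character classes are transcribed from the two lists above; the function names and the representation of tracks as dictionaries with a "name" key are assumptions made for illustration only.

```python
import re

# Ranges transcribed from the "first character" list above.
NAME_START = (
    "A-Z_a-z"
    "\u00C0-\u00D6\u00D8-\u00F6\u00F8-\u02FF"
    "\u0370-\u037D\u037F-\u1FFF\u200C-\u200D"
    "\u2070-\u218F\u2C00-\u2FEF\u3001-\uD7FF"
    "\uF900-\uFDCF\uFDF0-\uFFFD"
    "\U00010000-\U000EFFFF"
)
# Following characters additionally allow digits, ".", "-" and a few others.
NAME_REST = NAME_START + "0-9.\u00B7\u0300-\u036F\u203F-\u2040\\-"

NAME_RE = re.compile("[" + NAME_START + "][" + NAME_REST + "]*")

def is_valid_track_name(name):
    """Return True if name uses only the characters allowed above."""
    return NAME_RE.fullmatch(name) is not None

def find_track_by_name(tracks, name):
    """Return the track with the given name, or None if there is none.

    Raises ValueError when the name is not unique, since the result of
    addressing by name is undefined in that case.
    """
    matches = [t for t in tracks if t.get("name") == name]
    if len(matches) > 1:
        raise ValueError("track name %r is not unique" % name)
    return matches[0] if matches else None
```

Raising on duplicate names is a design choice of this sketch: it surfaces the undefined case explicitly instead of silently picking one of the tracks.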

Track order

In many applications it is necessary to walk through all the tracks in a media file and address tracks by an index.

In Ogg, tracks are numbered by the order in which their BOS (beginning of stream) pages appear in the Ogg stream. If a file is re-encoded, the order may change, so this index can only be relied upon for addressing as long as the file doesn't change.

For example, a video file with the following composition would have the following indexes:

track[0]: Skeleton BOS
track[1]: Theora BOS for main video
track[2]: Vorbis BOS for main audio
track[3]: Kate BOS for English captions
track[4]: Kate BOS for German subtitles
track[5]: Vorbis BOS for audio descriptions
track[6]: Theora BOS for sign language

This track order exists only to provide a means of addressing tracks through an index. It has no influence on what should be displayed on top of which other track.
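To illustrate how such an index could be derived, the sketch below walks the leading pages of an Ogg stream held in memory and numbers the tracks in the order of their BOS pages. It relies on the Ogg page layout (a 27-byte header starting with "OggS", the BOS flag 0x02 in the header type byte, the serial number at offset 14, followed by the segment table). The codec detection by magic bytes covers only the four codecs mentioned on this page and is an assumption of this example, as is the function name. CRC checking and chained streams are ignored.

```python
import struct

BOS_FLAG = 0x02  # header_type bit marking a beginning-of-stream page

# Leading bytes of the first packet, used here to guess the codec
# (assumed subset, for illustration only).
MAGIC = [
    (b"fishead\x00", "Skeleton"),
    (b"\x80theora", "Theora"),
    (b"\x01vorbis", "Vorbis"),
    (b"\x80kate", "Kate"),
]

def list_tracks(data):
    """Return [(index, serial, codec)] in the order the BOS pages appear."""
    tracks = []
    pos = 0
    while pos + 27 <= len(data) and data[pos:pos + 4] == b"OggS":
        header_type = data[pos + 5]
        serial = struct.unpack_from("<I", data, pos + 14)[0]
        n_segments = data[pos + 26]
        table = data[pos + 27:pos + 27 + n_segments]
        body_start = pos + 27 + n_segments
        body_len = sum(table)
        if not header_type & BOS_FLAG:
            break  # BOS pages come first; stop at the first other page
        body = data[body_start:body_start + body_len]
        codec = next((name for magic, name in MAGIC
                      if body.startswith(magic)), "unknown")
        tracks.append((len(tracks), serial, codec))
        pos = body_start + body_len
    return tracks
```

The index in each tuple corresponds to the track[n] numbering in the example above; the serial number is what identifies the logical bitstream within the file itself.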
