SkeletonHeaders: Difference between revisions

Revision as of 11:32, 20 March 2010

Adding Required Headers to Skeleton

With the HTML5 video element, Ogg is now a major format on the Web. It is being applied to use cases that it has not had to solve before but was designed to allow; see http://www.xiph.org/ogg/doc/oggstream.html.

One such use case is multitrack audio and video: for example, videos with multiple view angles encoded in one file, or videos with a sign language video track, an audio description track, a caption track and several subtitle tracks in different languages (i.e. several Theora, several Vorbis and several Kate tracks).

While encoding multitrack files is already possible, it is unclear how such files should be rendered, how tracks should be differentiated and addressed (e.g. from a JavaScript API), etc. Skeleton was designed to be extensible with message header fields for this purpose.

On this wiki page, we are collecting such new information fields.

Content-type

Right now, there is one mandatory message header field for all of the logical bitstreams: the "Content-type" header field, which contains the MIME type of the track. The MIME types in use here are listed at http://wiki.xiph.org/MIME_Types_and_File_Extensions#Codec_MIME_types.

Language

Content in a track is usually in a specific language. This language can be specified in a Language message header field. The language code is formed according to http://www.w3.org/TR/ltli/ and http://www.rfc-editor.org/rfc/bcp/bcp47.txt.

For audio tracks with speech, the Language would be the language that dominates.

For video tracks, it might be the language that is signed (if it is a sign language video), or the language that is most often represented in scene text.

For text tracks, it is the dominating language in the text, e.g. English or German subtitles.

Examples are: en-US, de-DE, sgn-ase, en-cockney
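As a rough illustration of how a reader might collect the Content-type and Language fields for one track, here is a minimal sketch. It assumes the fields arrive as "Name: value" lines of text, one per line; that layout, the function name parse_message_headers and the sample values are assumptions made for this example and are not defined on this page.

```python
def parse_message_headers(raw: str) -> dict:
    """Parse 'Name: value' lines into a dict keyed by lower-cased field name.

    The line-based layout is an assumption of this sketch.
    """
    fields = {}
    for line in raw.splitlines():
        if not line.strip():
            continue  # ignore blank lines
        name, sep, value = line.partition(":")
        if not sep:
            raise ValueError(f"not a header field: {line!r}")
        fields[name.strip().lower()] = value.strip()
    return fields


# Hypothetical header block for an English Vorbis audio track.
example = "Content-type: audio/vorbis\r\nLanguage: en-US\r\n"
headers = parse_message_headers(example)
```

Field names are lower-cased here only so that lookups do not depend on capitalisation; whether field names are case-sensitive is not stated above.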

Role

Role describes the semantic type of content contained in a track. Every track can have only a single role value, so the most appropriate role has to be chosen. The same role can be used across multiple tracks.

The following lists some commonly used roles. Other roles are possible, too, but should only be used or introduced if there is a real need for them.

Text tracks:

"text/caption"
"text/subtitle"
"text/textaudiodesc"
"text/karaoke"
"text/chapters"
"text/tickertext"
"text/lyrics"
"text/activeregion"
"text/metadata"
"text/annotation"
"text/transcript"
"text/linguistic"

Video tracks:

"video/main"
"video/alternate" (e.g. different camera angle)
"video/sign" (for sign language)
"video/alpha" (a track to alpha blend)

Audio tracks:

"audio/main"
"audio/alternate" (probably linked to an alternate video track)
"audio/dub"
"audio/audiodesc"
"audio/music"
"audio/speech"
"audio/sfx" (sound effects)

Notice how we are re-using the Content-type approach of specifying the main semantic type of the track first. This is necessary, since MIME types don't always provide the right main content type (e.g. application/kate is semantically a text format).

There may also be parameters to describe the roles better, such as "video/alternate;angle=nw".
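A Role value of this shape can be split into its main type, role name and optional parameters. The following is a minimal sketch; the function name parse_role and the assumption that parameters are separated by ";" and written as key=value (as in the "angle=nw" example) are illustrative only.

```python
def parse_role(value: str):
    """Split a Role value into (main type, role name, parameters)."""
    head, *param_parts = [part.strip() for part in value.split(";")]
    main_type, sep, role_name = head.partition("/")
    if not sep or not main_type or not role_name:
        raise ValueError(f"expected 'type/role', got {head!r}")
    params = {}
    for part in param_parts:
        key, sep, val = part.partition("=")
        if not sep or not key.strip():
            raise ValueError(f"expected 'key=value', got {part!r}")
        params[key.strip()] = val.strip()
    return main_type, role_name, params


alternate = parse_role("video/alternate;angle=nw")
caption = parse_role("text/caption")
```

With this split, a player could group tracks by main type ("video", "audio", "text") without inspecting the MIME type, which is the point made above about application/kate.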

Display-hint

Media players that are not told how a content author intends a media file to be displayed have no chance of displaying the content "correctly". This is why the Display-hint message header field allows hints to be provided on how a certain track should be displayed. A media player can of course decide to ignore these hints.

Example hints are:

pip(x,y,w,h) on a video track - picture-in-picture display in relation to the "main" video track with x,y providing the origin of the top left corner of the PIP video and w,h the width and height

mask(x,y,w,h,img) on a video track - use the image given at img url (?) as a video mask to allow the video to appear in shapes other than rectangular. The masking image should be a black shape on a white background. The image is placed at offset x,y and scaled to width and height w and h. Pixels under the white background are made transparent and only pixels under the black shape are retained.

overlay(transparency) on a video track -

alpha(trackref) on a video track -
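The hints above all share the form name(arguments), so a player could split them generically before interpreting them. This is a minimal sketch under that assumption; parse_display_hint is a hypothetical helper, the arguments are kept as strings because their units are not specified here, and an img URL that itself contains a comma would need different handling.

```python
import re

# name(arg1,arg2,...) with optional surrounding whitespace
HINT_RE = re.compile(r"^\s*([A-Za-z]+)\s*\((.*)\)\s*$")


def parse_display_hint(value: str):
    """Split a Display-hint value into (hint name, list of argument strings)."""
    match = HINT_RE.match(value)
    if match is None:
        raise ValueError(f"not a display hint: {value!r}")
    name, arg_text = match.group(1), match.group(2)
    args = [arg.strip() for arg in arg_text.split(",")] if arg_text.strip() else []
    return name, args


# Hypothetical picture-in-picture hint: origin (20,20), size 160x120.
pip = parse_display_hint("pip(20,20,160,120)")
```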

Name

This field associates a free-text string with the track so that the track can be addressed directly by its name.

Characters allowed are basically all the characters that are also allowed for XML id fields:

the first character has to be one of:
[A-Z] | "_" | [a-z] | [#xC0-#xD6] | [#xD8-#xF6] | [#xF8-#x2FF] | [#x370-#x37D] | [#x37F-#x1FFF] | [#x200C-#x200D] |
[#x2070-#x218F] | [#x2C00-#x2FEF] | [#x3001-#xD7FF] | [#xF900-#xFDCF] | [#xFDF0-#xFFFD] | [#x10000-#xEFFFF]

any following characters can be one of:
[A-Z] | "_" | [a-z] | [#xC0-#xD6] | [#xD8-#xF6] | [#xF8-#x2FF] | [#x370-#x37D] | [#x37F-#x1FFF] | [#x200C-#x200D] | 
[#x2070-#x218F] | [#x2C00-#x2FEF] | [#x3001-#xD7FF] | [#xF900-#xFDCF] | [#xFDF0-#xFFFD] | [#x10000-#xEFFFF] | 
"-" | "." | [0-9] | #xB7 | [#x0300-#x036F] | [#x203F-#x2040]

The name needs to be unique among all the track names in the file; otherwise it is undefined which of the tracks is retrieved when addressing by name.

An example means of addressing the track by name is: track[name="Madonna_singing"]
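The character rules listed above translate directly into a regular expression. The sketch below copies the ranges as given on this page; the names NAME_RE and is_valid_track_name are made up for the example.

```python
import re

# Characters allowed in first position, copied from the ranges listed above.
START = (
    "A-Z_a-z"
    "\u00C0-\u00D6\u00D8-\u00F6\u00F8-\u02FF"
    "\u0370-\u037D\u037F-\u1FFF\u200C-\u200D"
    "\u2070-\u218F\u2C00-\u2FEF\u3001-\uD7FF"
    "\uF900-\uFDCF\uFDF0-\uFFFD"
    "\U00010000-\U000EFFFF"
)
# Following characters may additionally be "-", ".", digits and a few others.
FOLLOW = START + r"\-.0-9" + "\u00B7\u0300-\u036F\u203F-\u2040"

NAME_RE = re.compile(f"[{START}][{FOLLOW}]*")


def is_valid_track_name(name: str) -> bool:
    """Return True if the whole string satisfies the character rules above."""
    return NAME_RE.fullmatch(name) is not None


ok = is_valid_track_name("Madonna_singing")
bad = is_valid_track_name("1st_track")  # a digit may not come first
```

Uniqueness across tracks is a separate check that this function does not perform.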

Track order

In many applications it is necessary to walk through all the tracks in a media file and address tracks by an index.

In Ogg, tracks are numbered by the order in which their BOS (beginning-of-stream) pages appear in the Ogg stream. If a file is re-encoded, the order may change, so index-based addressing is only reliable as long as the file doesn't change.

For example, a video file with the following composition would have the following indexes:

track[0]: Skeleton BOS
track[1]: Theora BOS for main video
track[2]: Vorbis BOS for main audio
track[3]: Kate BOS for English captions
track[4]: Kate BOS for German subtitles
track[5]: Vorbis BOS for audio descriptions
track[6]: Theora BOS for sign language

This track order exists only to provide a means of addressing tracks by index. It has no influence on what should be displayed on top of which other track.
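The indexing rule can be sketched as a walk over the pages of a stream that records each track the first time its BOS page is seen. The Page tuple, its field names and the serial numbers below are hypothetical stand-ins for whatever a real Ogg reader exposes.

```python
from typing import NamedTuple


class Page(NamedTuple):
    serialno: int  # identifies the logical bitstream (track)
    bos: bool      # True if this is the track's beginning-of-stream page
    label: str     # human-readable note for this example only


def track_order(pages):
    """Return track serial numbers in the order their BOS pages appear."""
    order = []
    for page in pages:
        if page.bos and page.serialno not in order:
            order.append(page.serialno)
    return order


# Made-up stream: three BOS pages followed by a data page.
pages = [
    Page(101, True, "Skeleton BOS"),
    Page(202, True, "Theora BOS for main video"),
    Page(303, True, "Vorbis BOS for main audio"),
    Page(202, False, "Theora data"),
]
order = track_order(pages)
main_video = order[1]  # track[1] in the listing above
```

Because the index is derived purely from page order, the same track can receive a different index after the file is re-encoded, as noted above.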

Track dependencies