See also the [Google proposal].


[WebVTT] is a subtitle/text track format developed for the HTML5 media element and related applications. This is a proposal for how to encapsulate such tracks in the Matroska/WebM container.

WebVTT is based on the .srt format, effectively the simplest text track format in use, so we can use the [Matroska mapping for SRT] as a starting point.

- CodecID is S_WEBVTT
- TrackType is 0x11 subtitle
- File-header metadata, if any, is included verbatim in CodecPrivate
- Cue text/payload is included verbatim in the Block data
- Block timestamp and BlockDuration are set from the cue timestamps

The sequence number and presentation attributes from the cue header are not represented.
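As a rough illustration of the mapping above, here is a minimal Python sketch that splits a single WebVTT cue into the values that would feed a Block. The `MappedCue` class and `map_cue` function are hypothetical names invented for this example, and timestamps are kept in milliseconds on the assumption that the muxer scales them to the track's timestamp scale. It is not a full WebVTT parser.

```python
import re
from dataclasses import dataclass

# WebVTT timestamp: optional hours, then mm:ss.ttt
_TS = r"(?:(\d{2,}):)?(\d{2}):(\d{2})\.(\d{3})"
_TIMING = re.compile(rf"^{_TS}[ \t]+-->[ \t]+{_TS}(?:[ \t]+(.*))?$")


@dataclass
class MappedCue:  # hypothetical container, not a Matroska structure
    timestamp_ms: int  # would become the Block timestamp
    duration_ms: int   # would become BlockDuration
    payload: bytes     # would become the Block data, verbatim


def _to_ms(hours, minutes, seconds, millis):
    total_seconds = (int(hours or 0) * 60 + int(minutes)) * 60 + int(seconds)
    return total_seconds * 1000 + int(millis)


def map_cue(cue_text: str) -> MappedCue:
    lines = cue_text.strip("\n").split("\n")
    # An optional sequence number/identifier line precedes the timing
    # line; it is dropped, as described above.
    if "-->" not in lines[0]:
        lines = lines[1:]
    if not lines:
        raise ValueError("cue has no timing line")
    match = _TIMING.match(lines[0])
    if match is None:
        raise ValueError("not a WebVTT cue timing line: %r" % lines[0])
    groups = match.groups()
    start = _to_ms(*groups[0:4])
    end = _to_ms(*groups[4:8])
    # groups[8] holds the presentation attributes; they are discarded.
    payload = "\n".join(lines[1:]).encode("utf-8")
    return MappedCue(start, end - start, payload)
```

For example, the cue `00:00:01.000 --> 00:00:04.500 align:start` followed by a line of text would yield a timestamp of 1000 ms, a duration of 3500 ms, and the text as payload, with `align:start` lost.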

The 'Chapters' kind of WebVTT file should be translated into the equivalent Matroska elements for marking chapters. Both formats allow the same hierarchical/overlapping options, so this should be an equivalent representation.
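One possible way to build the hierarchy, sketched in Python under the assumption that nesting is inferred from time containment: a chapter cue whose interval lies inside another becomes its child. The `nest_chapters` function is hypothetical. The dictionary keys borrow Matroska element names, but the structure is simplified: in a real file ChapString sits inside a ChapterDisplay element, and the times would need converting from the milliseconds used here.

```python
def nest_chapters(cues):
    """Turn flat (start_ms, end_ms, title) cues into nested chapter atoms."""
    roots = []
    stack = []  # currently open atoms, outermost first
    # Sort by start time; for equal starts, the longer cue comes first
    # so that it becomes the parent.
    for start, end, title in sorted(cues, key=lambda c: (c[0], -c[1])):
        atom = {
            "ChapterTimeStart": start,
            "ChapterTimeEnd": end,
            "ChapString": title,
            "ChapterAtom": [],  # nested child atoms
        }
        # Close every open atom that does not fully contain this cue.
        while stack and not (
            stack[-1]["ChapterTimeStart"] <= start
            and end <= stack[-1]["ChapterTimeEnd"]
        ):
            stack.pop()
        siblings = stack[-1]["ChapterAtom"] if stack else roots
        siblings.append(atom)
        stack.append(atom)
    return roots
```

Cues that overlap without one containing the other end up as siblings in this sketch; whether that matches the intended 'overlapping' semantics is an open question.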


We need some way to signal the 'kind' attribute from the HTML5 embedding, that is, whether a given track is subtitles, captions, descriptions, chapters, or metadata. Use per-track Tag elements? Likewise for 'srclang'.

It would be nice to represent the presentation attributes. How should we do that?

Would it be easier to embed the complete cue as is? The parser already has to handle this format when cues arrive from JavaScript. However, the Matroska subtitle guidelines recommend against including the cue timings in the packet data because it makes the file harder to edit.