Revision as of 07:18, 5 January 2009

The following is a draft!

It is at best incomplete and at worst completely broken. In any case, it is not an “official” Xiph spec or codec, so use with care.

Introduction

This page specifies a subclass of HTML documents that serves as a time-aligned text format for audio-visual content. We call the format "timed divs within HTML", or TDHT. It is intended to be used only in a World Wide Web context, i.e. wherever Web browser functionality is available. Use cases for the format are subtitles, captions, annotations and other time-aligned text, as listed at http://wiki.xiph.org/index.php/OggText#Categories_of_Text_Codecs .

TDHT may be similar to W3C TimedText DFXP in many respects. Unlike DFXP, however, it does not re-invent HTML, CSS and effects, but uses existing HTML, CSS and JavaScript for these. The purpose of DFXP is to create a web-independent exchange format for timed text, which is why it cannot directly be specified as a subpart of HTML.

TDHT, in contrast, is HTML with a minimal number of changes. It is parsable by any HTML parser and works with CSS and JavaScript, so no new functionality has to be defined for it.

File Extension

Files in this format are to use the MIME type text/x-tdht.

Files in this format should have a file extension of .tdht .

The TDHT format changes from HTML

TDHT files are time-aligned text. This means there is a time association with blocks of text and there is time-based seeking functionality on those blocks of text.

Here is an example TDHT file for subtitles:

<html>
  <head>
    <title>Desperate Housewives - Season 5, Episode 6</title>
  </head>
  <body>
    <div start="00:00:00,070" end="00:00:02,270">
      <p>Previously on...</p>
    </div>
    <div start="00:00:02,280" end="00:00:04,270">
      <p>We had an agreement to keep things casual.</p>
    </div>
    <div start="00:00:04,280" end="00:00:06,660">
      <p>Susan made her feelings clear.</p>
    </div>
    <div start="00:00:06,800" end="00:00:10,100">
      <p>So if I was with another woman, that wouldn't bother you? No, it wouldn't.</p>
    </div>
  </body>
</html>

Right now, TDHT is based on HTML4.01, but it should also be possible to base it on HTML5, which is still in flux.

The following changes to HTML4.01 are made for TDHT:

1. The body element

In HTML4.01, the body element is defined as follows:

<!ELEMENT BODY O O (%block;|SCRIPT)+ +(INS|DEL) -- document body -->
<!ATTLIST BODY
  %attrs;                              -- %coreattrs, %i18n, %events --
  onload          %Script;   #IMPLIED  -- the document has been loaded --
  onunload        %Script;   #IMPLIED  -- the document has been removed --
  >

In TDHT1.0 we restrict it to just contain a sequence of div tags:

<!ELEMENT BODY O O (DIV)+ -- document body -->
<!ATTLIST BODY
  %attrs;                              -- %coreattrs, %i18n, %events --
  onload          %Script;   #IMPLIED  -- the document has been loaded --
  onunload        %Script;   #IMPLIED  -- the document has been removed --
  >

The div tags in turn can contain anything that HTML div tags can contain, thus enabling a very flexible, but time-aligned text model.

2. The div element

In HTML4.01, the div element is defined as follows:

<!ELEMENT DIV - - (%flow;)*            -- generic language/style container -->
<!ATTLIST DIV
  %attrs;                              -- %coreattrs, %i18n, %events --
  >

In TDHT1.0 we extend it with start and end time attributes:

<!ELEMENT DIV - - (%flow;)*            -- generic language/style container -->
<!ATTLIST DIV
  %attrs;                              -- %coreattrs, %i18n, %events --
  start           %Time;    #IMPLIED   -- start time --
  end             %Time;    #IMPLIED   -- end time --
  >

The Time entity is defined in HTML5: http://www.whatwg.org/specs/web-apps/current-work/#valid-time-string .
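
To make the attribute values concrete, the following is a minimal sketch of how a start or end value could be converted to seconds. The function name parseTdhtTime is hypothetical and not part of this draft. The example file above writes the fraction after a comma, whereas HTML5 time strings use a period; accepting both separators is an assumption of this sketch, not something the draft states.

```javascript
// Hypothetical helper: convert a TDHT start/end attribute value to seconds.
// Accepts "HH:MM:SS" with an optional fraction after "." or "," (assumption).
function parseTdhtTime(value) {
  const match = /^(\d{2,}):([0-5]\d):([0-5]\d)(?:[.,](\d+))?$/.exec(value.trim());
  if (match === null) {
    return NaN; // not a time string this sketch understands
  }
  const [, hours, minutes, seconds, fraction] = match;
  const whole = Number(hours) * 3600 + Number(minutes) * 60 + Number(seconds);
  // The fraction is optional; "270" after the separator means 0.270 seconds.
  return fraction === undefined ? whole : whole + Number("0." + fraction);
}
```

Returning NaN for unrecognised input keeps the helper side-effect free; a real implementation would need to decide how such a div is treated.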

Rendering in a Web Browser

A TDHT file is meant to be associated with an audio or video file and rendered in a Web browser in sync with the audio or video file.

For security reasons, the TDHT file's div elements are not rendered into an existing HTML page, but rather a TDHT file creates its own iframe-like new nested browsing context.

By default, the rendering area and CSS viewport are the rectangle occupied by the given <video> or <audio> tag; alternatively, they are an area provided by the parent HTML page. The zoom factor of the iframe must be set so that the viewport established by the iframe is as wide in CSS px as the video frame is wide in codec pixels. (Example: if the video encodes a frame that is 240 pixels wide but is displayed at 480 CSS px wide, the zoom factor of the iframe should be 200%, so that the box that measures 480 px on the outside appears as a box of 240 px from within the iframe.)
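
The zoom rule reduces to a single ratio. The following sketch illustrates it; the function name zoomPercent and its parameter names are hypothetical.

```javascript
// Hypothetical helper: zoom factor (in percent) for the nested browsing
// context, so that its viewport width in CSS px equals the codec frame width.
function zoomPercent(displayedCssWidth, codecPixelWidth) {
  if (!(displayedCssWidth > 0) || !(codecPixelWidth > 0)) {
    throw new RangeError("both widths must be positive numbers");
  }
  return (displayedCssWidth / codecPixelWidth) * 100;
}

console.log(zoomPercent(480, 240)); // 200, matching the example in the text
```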

When the browser encounters a TDHT file, it must create a document by calling createDocument() on DOMImplementation and then calling open() on the created document. The browser must insert an <html> element in the HTML namespace as the root element of the document and insert <head> and <body> elements in the HTML namespace into the root element.

A TDHT file can either be received by an HTML parser in one go (as a TDHT file), or a div-less TDHT file can be received together with its <head> tag and create an HTML shell into which the <div> elements can be added as they come (e.g. from a video file that is decoded and played back in parallel) by using an HTML fragment parser.

The browser must decode the <head> tag into a DOMString (using REPLACEMENT CHARACTER on errors) and set the TDHT DOM property of the <head> element to that DOMString.

As the browser plays the video, it must render the TDHT <div> tags in sync. As the start time of a <div> tag is reached, the <div> tag appears, and it is removed as the <div> tag's end time is reached. If no start time is given, the start is assumed to be 0, and if no end time is given, the end is assumed to be infinity.

Once the browser has parsed the TDHT file or its constituent <div> tags, it is expected to keep the structure in memory. When seeking happens on the video, it can then determine which <div> tags are supposed to be active at the seek time and display these.
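
The activation rule, including the defaults for missing start and end times, can be sketched as a filter over the parsed structure kept in memory. The object shape (start, end and text fields, with times in seconds) and the name activeDivs are hypothetical. Treating the end time as exclusive is also an assumption; the draft only says the element is removed as its end time is reached.

```javascript
// Hypothetical in-memory form of parsed <div> elements: times in seconds,
// with undefined meaning the attribute was absent.
function activeDivs(divs, time) {
  return divs.filter((div) => {
    const start = div.start === undefined ? 0 : div.start;
    const end = div.end === undefined ? Infinity : div.end;
    // Assumption: shown from the start time up to, but not including, the end time.
    return start <= time && time < end;
  });
}

const subtitles = [
  { start: 0.07, end: 2.27, text: "Previously on..." },
  { start: 2.28, end: 4.27, text: "We had an agreement to keep things casual." },
  { text: "A div with no times is always active." },
];

// After a seek to 3 seconds, only the second and third entries are selected.
console.log(activeDivs(subtitles, 3).map((div) => div.text));
```

A linear scan is enough for a sketch; a browser holding many divs might prefer a structure sorted by start time.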

Encapsulation into Ogg

The OggText specification is used to encapsulate a TDHT file into Ogg.

The codec-specific header data for the OggText ident header is the <head>..</head> part of the TDHT file. The complete <head> tag including all its subtags is encoded into the ident header unchanged.

The <div> tags are the data packets of the TDHT text codec and are thus encapsulated into the data packets as text codec data. Each complete <div>, including all its subtags, is encoded into one data packet.
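
The split into header data and data packets can be illustrated as follows. This sketch only separates the text; it does not perform any OggText framing. The function name splitTdht is hypothetical, and the code is a plain string scan rather than an HTML parser, so it assumes well-formed markup with explicit </div> end tags and no "<div" text inside comments, scripts or attribute values.

```javascript
// Hypothetical splitter: returns the <head> element text (ident header data)
// and one string per top-level <div> (one data packet each).
function splitTdht(source) {
  const headMatch = /<head\b[\s\S]*?<\/head\s*>/i.exec(source);
  const header = headMatch === null ? "" : headMatch[0];

  const packets = [];
  const tagPattern = /<(\/?)div\b[^>]*>/gi;
  let depth = 0;
  let packetStart = -1;
  let tag;
  while ((tag = tagPattern.exec(source)) !== null) {
    if (tag[1] === "") {
      // Opening tag: a new packet begins only at nesting depth 0.
      if (depth === 0) packetStart = tag.index;
      depth += 1;
    } else if (depth > 0) {
      // Closing tag: the packet is complete when the depth returns to 0.
      depth -= 1;
      if (depth === 0) {
        packets.push(source.slice(packetStart, tagPattern.lastIndex));
      }
    }
  }
  return { header, packets };
}
```

Tracking the nesting depth keeps a <div> nested inside a timed <div> in the same packet as its parent, which matches the requirement that a complete <div> including all its subtags goes into one packet.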
