Timed Divs HTML: Difference between revisions

From XiphWiki
Latest revision as of 10:45, 9 July 2010


Introduction

This page specifies a subclass of HTML documents that is a time-aligned text format for audio-visual content. We call the format "timed divs within HTML" or TDHT. It is intended to be used only in a World Wide Web context i.e. everywhere that Web browser functionality is available. Use cases for the format are subtitles, captions, annotations and other time aligned text as listed at http://wiki.xiph.org/index.php/OggText#Categories_of_Text_Codecs .

TDHT may be similar to W3C TimedText DFXP in many respects, but unlike DFXP it does not re-invent HTML, CSS and effects; it uses existing HTML, CSS and JavaScript for these. The purpose of DFXP is to create a web-independent exchange format for timed text, which is why it cannot directly be specified as a subpart of HTML.

TDHT in contrast is HTML with a minimum number of changes. TDHT is parsable by any HTML parser. It works with CSS and javascript. No new functionality has to be defined for TDHT.


File Extension

Files in this format are to be of the text/html MIME type, since they are valid HTML files apart from some extra attributes.

Files in this format should have a file extension of .tdht to distinguish them from plain HTML files.

The TDHT format changes from HTML

TDHT files are time-aligned text. This means there is a time association with blocks of text and there is time-based seeking functionality on those blocks of text.

Here is an example TDHT file for subtitles:

<html>
  <head>
    <title>Desperate Housewives - Season 5, Episode 6</title>
  </head>
  <body>
    <div start="00:00:00.070" end="00:00:02.270">
      <p>Previously on...</p>
    </div>
    <div start="00:00:02.280" end="00:00:04.270">
      <p>We had an agreement to keep things casual.</p>
    </div>
    <div start="00:00:04.280" end="00:00:06.660">
      <p>Susan made her feelings clear.</p>
    </div>
    <div start="00:00:06.800" end="00:00:10.100">
      <p>So if I was with another woman, that wouldn't bother you? No, it wouldn't.</p>
    </div>
  </body>
</html>

The differences between TDHT and HTML are described in terms of HTML4.01, but the same changes apply to HTML5, which doesn't have a normative schema.

The following changes to HTML are made for TDHT:


1. The body element

In HTML4.01, the body element is defined as follows:

<!ELEMENT BODY O O (%block;|SCRIPT)+ +(INS|DEL) -- document body -->
<!ATTLIST BODY
  %attrs;                              -- %coreattrs, %i18n, %events --
  onload          %Script;   #IMPLIED  -- the document has been loaded --
  onunload        %Script;   #IMPLIED  -- the document has been removed --
  >

In TDHT1.0 we restrict body to just contain a sequence of div tags:

<!ELEMENT BODY O O (DIV)+ -- document body -->
<!ATTLIST BODY
  %attrs;                              -- %coreattrs, %i18n, %events --
  onload          %Script;   #IMPLIED  -- the document has been loaded --
  onunload        %Script;   #IMPLIED  -- the document has been removed --
  >

Any tags inside the body tag that do not conform to this specification (such as regular HTML tags that are otherwise allowed inside body) must be ignored for TDHT.

The div tags, however, can contain anything that HTML div tags can contain, thus enabling a very flexible, but time-aligned text model.

2. The div element

In HTML, the div element is defined as follows:

<!ELEMENT DIV - - (%flow;)*            -- generic language/style container -->
<!ATTLIST DIV
  %attrs;                              -- %coreattrs, %i18n, %events --
  >

In TDHT1.0 we extend it with start and end time attributes:

<!ELEMENT DIV - - (%flow;)*            -- generic language/style container -->
<!ATTLIST DIV
  %attrs;                              -- %coreattrs, %i18n, %events --
  start           %Time;    #IMPLIED   -- start time --
  end             %Time;    #IMPLIED   -- end time --
  >

The Time entity represents a valid time string according to HTML5: http://www.whatwg.org/specs/web-apps/current-work/#valid-time-string . The end time must be later than the start time; otherwise the div element does not exist for any duration and can never become active.
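
The rule above can be illustrated with a minimal JavaScript sketch. It assumes that only the HH:MM:SS(.fraction) form used in the example file needs to be handled; the function names parseTdhtTime and hasDuration are hypothetical and not part of this specification or of HTML5.

```javascript
// Convert a time string such as "00:00:02.270" to seconds.
// Assumption: only the HH:MM:SS(.fraction) form from the example file is handled.
function parseTdhtTime(text) {
  const match = /^(\d{2,}):([0-5]\d):([0-5]\d)(?:\.(\d+))?$/.exec(text);
  if (match === null) {
    return NaN; // not a time string in the assumed form
  }
  const [, hours, minutes, seconds, fraction = "0"] = match;
  return (
    Number(hours) * 3600 +
    Number(minutes) * 60 +
    Number(seconds) +
    Number("0." + fraction)
  );
}

// A div can only ever become active if its end lies strictly after its start.
// Unparsable values yield NaN, so the comparison is false for them as well.
function hasDuration(start, end) {
  return parseTdhtTime(end) > parseTdhtTime(start);
}
```

A div whose start and end are equal, or whose attributes do not parse in the assumed form, is reported as having no duration.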

<div> elements in a TDHT file should be ordered by start time to simplify parsing. Inside Ogg or when rendered, they will be ordered by start time.

Rendering in a Web Browser

A TDHT file is meant to be associated with an audio or video file and rendered in a Web browser in sync with that audio or video file.

The TDHT file's div elements are not rendered into an existing HTML page. Instead, a TDHT file creates its own iframe-like nested browsing context, which is linked to the parent HTML page through an itext element inserted as a child of the video element. A nested browsing context is important because a TDHT file can come from a different URI than the Web page. For security reasons and for general base URI computations, it is therefore better to keep the DOM nodes of the hosting page and the DOM nodes of the TDHT document in different owner documents. That way, the hosting document has the security origin of its own URL and the TDHT document has the security origin of its URL.

The rendering area and CSS viewport are by default the rectangle occupied by the given <video> or <audio> tag, or otherwise an area provided by the hosting HTML page through the itext element's properties. The zoom factor of the itext frame must be set such that the viewport established by the itext frame is as wide in CSS px as the video frame is wide in codec pixels. (Example: If the video encodes a frame that is 240 pixels wide but is displayed at 480 CSS px wide, the zoom factor of the itext frame should be 200% so that the box that measures 480 px on the outside seems like a box of 240 px from within the itext frame.)
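
The zoom computation in the example can be written as a short JavaScript sketch. The function name itextZoomPercent and its parameters are hypothetical, chosen only for illustration; the sketch assumes both widths are already known to the user agent.

```javascript
// Zoom factor (in percent) for the itext frame, so that its viewport is as
// wide in CSS px as the video frame is wide in codec pixels.
// Both parameter names are illustrative assumptions.
function itextZoomPercent(displayedCssWidth, codecPixelWidth) {
  if (!(displayedCssWidth > 0) || !(codecPixelWidth > 0)) {
    throw new RangeError("widths must be positive numbers");
  }
  return (displayedCssWidth / codecPixelWidth) * 100;
}
```

For the example above, itextZoomPercent(480, 240) gives 200.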

The itext frame is by default transparent.

A TDHT file can reach a browser either as an external resource, or as part of an audio or video resource (in particular inside Ogg, see below). Parsing in these two cases is slightly different for the browser.

For the external TDHT file case: The TDHT file is parsed using the HTML5 parsing algorithm in its normal mode into a non-rendered DOM. To render a div, the children of the div would be cloned into the body of the rendering shell document (replacing possible previous children of body).

For the Ogg-internal TDHT case: To multiplex an external TDHT file into Ogg, each div with its innerHTML would be placed into a data packet and the head data into an Ogg header. For decoding, the rendering shell document is set up and the head tag is included from the Ogg headers. To render a packet, the div and its innerHTML would be added to the innerHTML of the body element of the rendering shell document as the packets arrive. This will use the HTML fragment parser.

As the browser plays the video, it must render the TDHT <div> tags in sync. When the start time of a <div> tag is reached, the <div> tag is made active, and it is made inactive when its end time is reached. If no start time is given, the start is assumed to be 0, and if no end time is given, the end is assumed to be infinity.
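
The activation rule, including the defaults for missing attributes, can be sketched in JavaScript as follows. This is only an illustration under stated assumptions: the divs are represented as plain objects whose optional start and end properties are already converted to seconds, and the function name activeDivs is hypothetical.

```javascript
// Return the divs that are active at the given playback time (in seconds).
// Assumption: each div is a plain object with optional numeric `start` and
// `end` properties; a missing start means 0, a missing end means infinity.
function activeDivs(divs, currentTime) {
  return divs.filter((div) => {
    const start = div.start ?? 0;
    const end = div.end ?? Infinity;
    // Active from the moment the start is reached until the end is reached.
    return start <= currentTime && currentTime < end;
  });
}
```

Because the function depends only on the current time, the same check could be used during normal playback and after a seek.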

An "active" <div> tag may be a <div> tag that is being displayed ("display: block"), in contrast to an "inactive" <div> tag, which may not be displayed ("display: none"). For some text formats, however, the difference between "active" and "inactive" may be a background colour, the display location on screen, or some other mechanism. The default should be to switch between "block" and "none", but this should be changeable through CSS.

Once the browser has parsed the TDHT file or its constituent <div> tags, it is expected to keep the structure in memory. When seeking happens on the video, it can then decide which <div> tags are supposed to be active at the seek time and display these. [There is a discussion to be had here about the effect this has on the DOM. Different selectors may apply to a caption depending on whether the video was played back all the way there or seeking skipped over data to get there. It was suggested that inactive captions should be removed from the DOM, so there's always a well-defined small unambiguous DOM to match selectors against. However, this may for example not be desirable for some text display formats.]

Encapsulation into Ogg

The OggText specification is used to encapsulate a TDHT file into Ogg.

The codec-specific header data for the OggText ident header is the <head>..</head> part of the TDHT file. The complete <head> tag including all its subtags is encoded into the ident header unchanged.

The <div> elements with all their inner HTML are the data packets of the TDHT text codec and are thus encapsulated into the data packets as text codec data. Each complete <div>, including all its subtags, is encoded into one data packet.

Direct linking on a HTML5 page

Often, subtitles and other time-aligned text files are not actually provided inside a video stream (e.g. inside Ogg), but are referenced as a separate partner resource to a video.

To allow association of such files with a <video> or <audio> element, we propose the following approach:

<video id="video" src="http://example.com/video.ogv" controls>
  <itext id="caption1"  category="CC" lang="en/us" src="caption.srt" style=""></itext>
  <itext id="caption2"  category="CC" lang="de/de" src="caption.tdht" style=""></itext>
  <itext id="subtitle1" category="SUB" lang="de/de" src="german.dfxp" style=""></itext>
  <itext id="subtitle2" category="SUB" lang="jp" src="japanese.smil" style=""></itext>
  <itext id="subtitle3" category="SUB" lang="fr" src="translation_webservice/fr/caption.srt" style=""></itext>
</video>

Notice the second set of closed captions being a TDHT file.

The id attribute is simply a unique identifier for the tag. The category is taken from the Ogg text categories. The lang attribute contains a natural language according to language codes. The src attribute contains the URI of the actual file that we are after. The style attribute allows styling to be attached to marked-up import files.

The <itext> element would act like an <iframe> element and create the nested browsing context described earlier. It has been renamed from earlier mentions of this approach from <text> to <itext> to avoid name clashes with e.g. SVG.

The user agent would then provide an interface such as:

 interface MediaItextElement : HTMLElement {
   attribute DOMString src;
   attribute DOMString category;
   attribute DOMString lang;
   attribute DOMString id;
   attribute DOMString style;
 };

In JavaScript there will need to be additional functions such as:

 getItext(): returns an array of time-aligned text elements
 addItext({src,category,lang,style,name}): adds a time-aligned text element to a <video> or <audio> element
 enable(itextElement): activates display of an itext file
 disable(itextElement): deactivates display of an itext file
 delay(itextElement, seconds): delays the itext file in relation to the video file by a positive or negative number of seconds
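
How enable, disable and delay might interact with the activation of divs can be sketched with a small JavaScript model. Everything here is a hypothetical illustration rather than a browser API: the track is a plain object, createItextTrack and visibleDivs are invented helper names, div times are assumed to be in seconds, and delay is assumed to set (not accumulate) the offset.

```javascript
// Hypothetical in-memory model of one itext track; not a real browser API.
function createItextTrack(divs) {
  return { divs, enabled: false, delaySeconds: 0 };
}

function enable(track) {
  track.enabled = true;
}

function disable(track) {
  track.enabled = false;
}

// Assumption: the offset replaces any earlier delay. A positive value makes
// the text appear later relative to the video, a negative value earlier.
function delay(track, seconds) {
  track.delaySeconds = seconds;
}

// Divs to display at the given media time, honouring enable state and delay.
function visibleDivs(track, mediaTime) {
  if (!track.enabled) {
    return [];
  }
  const textTime = mediaTime - track.delaySeconds;
  return track.divs.filter((div) => div.start <= textTime && textTime < div.end);
}
```

With a delay of 1 second, a div that starts at 2 seconds would first be shown when the video reaches 3 seconds.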