@@ Line 1: / Line 1: @@
-== Introduction ==
+#REDIRECT [[OggWrit]]
-Ogg Writ is a text phrase codec.  While its primary purpose is to embed
-subtitles or captions in a Theora stream, its design makes it useful
-for many other purposes.  It could provide lyrics to a song encoded in
-Vorbis, a transcript to a political debate encoded in Speex, or even
-incorporate a live chat session as part of a continuous video stream.
-One of the unique aspects of Writ is its discontinuous nature: unlike
-other Ogg codecs, the granules which separate packets affect
-may overlap.  See the Granules and Muxing section
-below for how this works.
-=== SVN ===
-Current Ogg Writ development is on Xiph SVN as /trunk/writ/.  It's
-being developed to use libogg2, so you'll need both to work on it.
-The reference encoder and decoder are available as part of the py-ogg2
-package which is available on Xiph SVN as /trunk/py-ogg2/.
-<B>This is a (near final) working draft of the spec</B><BR>
-Writ has been designed so that encoders/decoders can support a bare
-minimum and be fully compatible with future subversions. Each subversion
-adds a new feature, some building on others, adding a new header packet
-and likely a new field to each body packet.
-<P>
-Decoders should ignore header packets beyond what they were written to
-support and also ignore extra fields in data packets beyond their
-current version.  This allows new features to be added without requiring
-all software, or even most software, to support them.
-<P>
-We will be conservative about adding future subversions.
-<pre>
-Header Packet 0 (BOS, 16 bytes):
- 0x00                                   ( 8 bit Header 0)
- "writ" (LSB 0x74697277)                (32 bit codec identification)
- version                                ( 8 bit unsigned int, 0 = Alpha)
- subversion                             ( 8 bit unsigned int)
- granulerate_numerator                  (32 bit unsigned int)
- granulerate_denominator                (32 bit unsigned int)
-Data Packet (each):
- 0xFF                                   ( 8 bit 0xFF = data packet)
- granule_start                          (64 bit signed integer)
- granule_duration                       (32 bit unsigned integer)
- text_length                            ( 8 bit unsigned integer)
- text_string                            (variable-length UTF-8 string)
-<B>Subversion 1 adds multiple language support</B>
-Header Packet 1 (Language Definition, 8+ bytes) :
- 0x01                                   ( 8 bit Header 1)
- "writ" (LSB 0x74697277)                (32 bit codec identification)
- num_languages                          ( 8 bit unsigned int)
- [repeated 1+num_languages times] :
-   language_length                      ( 8 bit unsigned int)
-   language_string                      (0+language_length rfc3066)
-   language_desc_length                 ( 8 bit unsigned int)
-   language_desc_string                 (0+language_desc_length UTF-8)
-Data Packet (each):
- 0xFF                                   ( 8 bit 0xFF = data packet)
- granule_start                          (64 bit signed integer)
- granule_duration                       (32 bit unsigned integer)
- [repeated num_languages times] :
-   text_length                          ( 8 bit unsigned integer)
-   text_string                          (variable-length UTF-8 string)
-<B>Subversion 2 adds text window support</B>
-Header Packet 2 (Window Definition, 10+ bytes) :
- 0x02                                   ( 8 bit Header 2)
- "writ" (LSB 0x74697277)                (32 bit codec identification)
- location_scale_x                       (16 bit unsigned int)
- location_scale_y                       (16 bit unsigned int)
- num_windows                            ( 8 bit unsigned int)
- [if (num_windows > 0) repeated num_windows times] :
-   location_x                           (variable length, see below)
-   location_y                           (variable length, see below)
-   location_width                       (variable length, see below)
-   location_height                      (variable length, see below)
-   alignment_x                          ( 2 bit alignment, see below)
-   alignment_y                          ( 2 bit alignment, see below)
-Data Packet (each):
- 0xFF                                   ( 8 bit 0xFF = data packet)
- granule_start                          (64 bit signed integer)
- granule_duration                       (32 bit unsigned integer)
- [repeated num_languages times] :
-   text_length                          ( 8 bit unsigned integer)
-   text_string                          (variable-length UTF-8 string)
- [if (num_windows > 1)] :
-   window_id                            ( 8 bit unsigned integer)
-<B>Example Stream</B>
- Header Packet 0
-  version 0
-  subversion 2
-  granulenum 1
-  granuledom 1
- \x00writ\x00\x02\x01\x00\x00\x00\x01\x00\x00\x00
- Header Packet 1
-  num_languages 2
-   Language 0:
-    language en
-    language_desc English
-   Language 1:
-    language es
-    language_desc Spanish
- \x01writ\x01\x02en\x07English\x02es\x07Spanish
- Header Packet 2
-  location_scale_x 4000 (12 bits)
-  location_scale_y 270  ( 9 bits)
-  num_windows 2
-   Window 0:
-    location_x 1
-    location_y 2
-    location_width 3
-    location_height 1
-    alignment_x 3 (Full)
-    alignment_y 3 (Full)
-   Window 1:
-    location_x 5
-    location_y 6
-    location_width 7
-    location_height 1
-    alignment_x 3 (Full)
-    alignment_y 3 (Full)
- \x02writ\xa0\x0f\x0e\x01\x02\x01\x20\x60\x00\x02\x7c\x01\x18\x38\x80\x00\x0f
- Phrase Packet:
-  granule_start 5
-  granule_duration 10
-  Language 0: "Hello World!"
-  Language 1: "Hola, Mundo!"
-  window_id 0
- \xff\x05\x00\x00\x00\x00\x00\x00\x00\x0a\x00\x00\x00\x0cHello World!\x0cHola, Mundo!\x00
- Phrase Packet:
-  granule_start 12
-  granule_duration 15
-  Language 0: "It's a beautiful day to be born."
-  Language 1: "Es un día hermoso para que se llevará."
-  window_id 1
- \xff\x0c\x00\x00\x00\x00\x00\x00\x00\x0f\x00\x00\x00\x20It's a beautiful day to be born.\x28Es un d\xc3\xada hermoso para que se llevar\xc3\xa1.\x01
-</pre>
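As a rough illustration of the layouts above, the following Python sketch parses Header Packet 0 and a phrase packet. It assumes little-endian integers, which is inferred from the example stream bytes rather than stated outright; the function names `parse_header0` and `parse_phrase` and the returned dictionary keys are hypothetical and are not part of libwrit or py-ogg2.

```python
import struct

WRIT_MAGIC = b"writ"


def parse_header0(packet: bytes) -> dict:
    """Parse Header Packet 0 (assumed little-endian, no padding)."""
    # B = packet type, 4s = codec id, B B = version/subversion,
    # I I = granulerate numerator/denominator
    ptype, magic, version, subversion, num, den = struct.unpack_from(
        "<B4sBBII", packet, 0
    )
    if ptype != 0x00 or magic != WRIT_MAGIC:
        raise ValueError("not a Writ header packet 0")
    return {
        "version": version,
        "subversion": subversion,
        "granulerate": (num, den),
    }


def parse_phrase(packet: bytes, num_texts: int = 1,
                 has_window_id: bool = False) -> dict:
    """Parse a data packet carrying num_texts length-prefixed strings."""
    if packet[0] != 0xFF:
        raise ValueError("not a Writ data packet")
    # q = 64 bit signed granule_start, I = 32 bit unsigned granule_duration
    start, duration = struct.unpack_from("<qI", packet, 1)
    pos = 1 + 8 + 4
    texts = []
    for _ in range(num_texts):
        length = packet[pos]          # 8 bit text_length
        pos += 1
        texts.append(packet[pos:pos + length].decode("utf-8"))
        pos += length
    # Bytes past what this decoder understands are simply left unread,
    # following the "ignore extra fields" rule.
    window_id = packet[pos] if has_window_id and pos < len(packet) else None
    return {"start": start, "duration": duration,
            "texts": texts, "window_id": window_id}
```

Feeding it the Header Packet 0 bytes and the first phrase packet from the example stream should yield subversion 2, a granule rate of 1/1, and the two "Hello World!" / "Hola, Mundo!" strings with window 0.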
-== Granules and Muxing ==
-Granulepos in Writ (as well as future discontinuous codecs) will be the
-start time, not the end time, of the data in a given page.
-This greatly simplifies this specification (see the old method below).
-All Writ phrases will be provided at and given the granulepos of their
-start time, ordered by their start time within the logical bitstream.
-Phrase packets with long durations should be repeated in the logical
-bitstream at regular intervals to ensure that a player seeking to the
-middle of their duration will still see them.  These packet copies will
-be identical to their original, including the start and duration fields;
-the granulepos of the page they reside on will be incremented for each
-copy to place it forward on the logical bitstream.
-No two phrases can start on the same granule. On decoding, each packet's
-start granule is checked against already known packets.  If a match is
-found the new packet is ignored.  This prevents phrase copies from being
-interpreted as new phrases.
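The duplicate check described above can be sketched in a few lines of Python. `PhraseStore` is a hypothetical name, not a libwrit API, and treating a phrase as active on the half-open interval from start to start plus duration is an assumption; the text does not say whether the final granule is inclusive.

```python
class PhraseStore:
    """Keeps phrases keyed by start granule so repeated copies are ignored."""

    def __init__(self):
        self._by_start = {}

    def add(self, start: int, duration: int, text: str) -> bool:
        """Store a phrase; return False if its start granule is already known."""
        if start in self._by_start:
            return False  # a repeated copy of an earlier phrase
        self._by_start[start] = (duration, text)
        return True

    def active_at(self, granule: int) -> list:
        """Return texts whose [start, start + duration) range covers granule."""
        return [
            text
            for start, (duration, text) in sorted(self._by_start.items())
            if start <= granule < start + duration
        ]
```

A player that seeks into the middle of a long phrase would add the next repeated copy it encounters (the store has not seen that start granule yet) and then silently drop any later copies.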
-== Seeking Example ==
-<pre>
-Here is a timeline (granule numbers at top, read down) of a sample stream:
-                        <- Granules ->
-0000000000111111111122222222223333333333444444444455555555556666666666
-0123456789012345678901234567890123456789012345678901234567890123456789
- ___________  ____________  ____________  ____________  _____________
-|_Vorbis____||_Vorbis_____||_Vorbis_____||_Vorbis_____||_Vorbis______|
- ____________________   ____________________________________
-|_A____________>_____| |_D____________>______________>______|
-     _________      ___    __________     ___________
-    |_B_______|    |_C_|  |_E________|   |_F_________|
- (note: these have been separated vertically for easy viewing only)
-Packet  Granule Description
- V H0   0       Vorbis Header 0x01 (page by itself)
- W H0   0       Writ Header 0 (page by itself)
- V H1   0       Vorbis Header 0x03
- V H2   0       Vorbis Header 0x05
- W H1   0       Writ Header 1 (Language Defs)
- W H2   0       Writ Header 2 (Window Defs)
- W A    0       Writ Phrase A
- W B    4       Writ Phrase B
- V      12      Vorbis 0-12
- W A    15      Writ Phrase A
- W C    19      Writ Phrase C
- W D    23      Writ Phrase D
- V      26      Vorbis 13-26
- W E    26      Writ Phrase E
- W D    38      Writ Phrase D
- V      40      Vorbis 27-40
- W F    41      Writ Phrase F
- W D    53      Writ Phrase D (EOF)
- V      54      Vorbis 41-54
- V      69      Vorbis 55-69 (EOF)
-</pre>
-Player begins decoding at beginning of stream.  It reads the BOS pages
-for both codecs, then receives a non-BOS page.  At this point it knows
-that it has two bitstreams to decode and has resolved that one is Writ
-and the other Vorbis.  It'll continue processing the headers for both.
-Next it's going to find two Writ packets (phrases A and B) and toss them
-into libwrit.  Then it'll get to the first Vorbis data page.  It now has
-data from both bitstreams, and it knows (from the granulepos on the
-Vorbis page) that it has enough data to run until 12.  If there were any
-Writ packets before 12 they would have appeared first.
-At around granule 9 the listener seeks forward to 24.  This will cause a
-rapid seek through the file to find the first page with a granulepos
-greater than the seek position and begin decoding at that point.
-It'll find a Vorbis packet containing 13-26 (and not use 13-23) and Writ
-phrase E.  Again, having data from both bitstreams it can begin playing.
-D would normally appear at granule 24 but is not known about yet.  The
-player knows that this is only enough to decode until 26 so, knowing
-enough to prebuffer, continues reading the file as it plays the media.
-The next packet it finds is Writ phrase D; on passing it to libwrit, it
-finds that the current granulepos is within the duration.  It is thus
-displayed immediately, as it's prebuffered, without waiting for
-granulepos 38.  It'll keep reading (because the maximum decoded Vorbis
-is still 26) and find a Vorbis packet with a 40 granulepos.
-As it nears 38 it'll read the file again and find Writ phrase F, which
-takes it out to 41.  Vorbis only goes until 40, so it'll have to keep
-reading until the next Vorbis packet.
-Next it'll find Writ phrase D, which will be ignored by libwrit because
-phrase D is already known (matches start granule of earlier D), and the
-EOF on that page marks this as the last of the Writ stream.
-It'll continue reading for the next Vorbis data and find the packet
-for granule 54, followed by the Vorbis packet for granule 69.  With that
-it's EOS, EOF, finished.
-This is of course a simplistic example; Writ and Vorbis will rarely have
-granules which equal the same amount of time.  Each bitstream has its
-own granule -> time mapping which is calculated when muxing concurrent
-bitstreams within the file.  So if there are 44100 Vorbis granules
-per second and only 4 Writ granules per second, pages would be ordered
-as W25 V297892 W31 V385932 W39 W41 V463057 etc.  The logic used in the
-above example works after this granule-time mapping is calculated.
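The ordering in that example can be reproduced with a small Python sketch that converts each page's granulepos to a timestamp and sorts on it. The helper names `granule_to_seconds` and `mux_order` are hypothetical, and expressing each stream's rate as granules per second is an assumption made for illustration.

```python
from fractions import Fraction


def granule_to_seconds(granulepos: int, granules_per_second) -> Fraction:
    """Map a granulepos to a timestamp using exact rational arithmetic."""
    return Fraction(granulepos) / Fraction(granules_per_second)


def mux_order(pages, rates):
    """Sort (stream, granulepos) pages by the time their granulepos represents.

    rates maps each stream label to its granules per second.
    """
    return sorted(
        pages,
        key=lambda page: granule_to_seconds(page[1], rates[page[0]]),
    )
```

With rates of 44100 for the Vorbis stream and 4 for the Writ stream, page W25 maps to 6.25 seconds and V297892 to roughly 6.75 seconds, so the Writ page is placed first, matching the sequence given above.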
-== Ongoing Discussion ==
-* How does this get "encoded" and "merged"?
-** &lt;purple_haese&gt; The muxing rule is pages are arranged in ascending order by the timestamp that is represented by their granulepos.
-* Why are there 0x00 and 0xFF bytes at the beginning of header and data packets respectively?
-** &lt;xiphmont&gt; If, after a seek, I hand your codec a header packet, what does the codec do?
-** &lt;xiphmont&gt; It does *nothing*.  If I haven't told it to reset, the header is not data, *it must ignore the header*.
-** &lt;xiphmont&gt; this eliminates a huge raft of special cases in Ogg seeking.
-== "The Old Way" ==
-<B>The section below is for historical purposes only!</B>
-<pre>
-- - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
-/08/17
-  In a lengthy discussion with Monty and Derf the decision to change the
-  behavior of discontinuous bitstreams in Ogg, or rather, extend the
-  current Ogg specification to handle discontinuous codecs, was made.
-- - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
-</pre>
-The Ogg granulepos of each page is equal to the expiration of the text;
-packets are ordered by expiration time and may overlap.  So, at or before
-text A is to be displayed, the following sequence is included:
-<pre>
-Physical        Text    Text    Text
-Location        Packet  Start   Expire  (text expire = page granulepos)
----------------------------------------
-              B       04      14
-              D       19      23
-              C       09      24
-              F       27      34
-              E       26      37
-              G       35      47
-              H       42      54
-              A       00      59
-              I       51      66
-</pre>
-So B, D, C, F, E, G, and H are all defined before A, building a FIFO (first
-in first out) buffer in the player.  Encoders should limit the extent of this
-behavior to reduce the necessary buffer size on the player side by prematurely
-expiring captions and recreating them periodically.
-The screen should not be updated with the new captions until they've all
-been processed to prevent "flicker".  New caption data to the same position
-will scroll the previous data upwards with no line breaks separating them
-(unless present in text).

Writ: Difference between revisions

Latest revision as of 16:10, 12 May 2007
