Difference between revisions of "Writ"

== Introduction ==
moved to [[OggWrit]]
Ogg Writ is a text phrase codec.  While its primary purpose is to embed
subtitles or captions in a Theora stream, its design makes it useful
for many other purposes.  It could provide lyrics to a song encoded in
Vorbis, a transcript to a political debate encoded in Speex, or even
incorporate a live chat session as part of a continuous video stream.
One of the unique aspects of Writ is its discontinuous nature: unlike
other Ogg codecs, the granule ranges that separate packets affect may
overlap.  See the Granules and Muxing section below for how this works.
=== SVN ===
Current Ogg Writ development is on Xiph SVN as /trunk/writ/.  It's
being developed to use libogg2, so you'll need both to work on it.
The reference encoder and decoder are available as part of the py-ogg2
package which is available on Xiph SVN as /trunk/py-ogg2/.
<B>This is a (near final) working draft of the spec</B><BR>
Writ has been designed so that encoders/decoders can support a bare
minimum and be fully compatible with future subversions. Each subversion
adds a new feature, some building on others, adding a new header packet
and likely a new field to each body packet.
Decoders should ignore header packets beyond what they were written to
support and also ignore extra fields in data packets beyond their
current version.  This allows new features to be added without requiring
all software, or even most software, to support them.
We will be conservative about adding future subversions.
Header Packet 0 (BOS, 16 bytes):
0x00                                  ( 8 bit Header 0)
"writ" (LSB 0x74697277)                (32 bit codec identification)
version                                ( 8 bit unsigned int, 0 = Alpha)
subversion                            ( 8 bit unsigned int)
granulerate_numerator                  (32 bit unsigned int)
granulerate_denominator                (32 bit unsigned int)
Data Packet (each):
0xFF                                  ( 8 bit 0xFF = data packet)
granule_start                          (64 bit signed integer)
granule_duration                      (32 bit unsigned integer)
text_length                            ( 8 bit unsigned integer)
text_string                            (variable-length UTF-8 string)
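The "ignore extra fields" rule can be sketched as a decoder that reads only the Subversion 0 fields and skips whatever follows. This is a minimal sketch, not the reference decoder: the function name is hypothetical, little-endian byte order is an assumption (it matches the "LSB" note on the codec identification and the example packets on this page), and text_length is assumed to count bytes.

```python
import struct

BASE_FIELDS = "<qIB"  # assumed little-endian: 64 bit signed, 32 bit unsigned, 8 bit unsigned


def parse_base_data_packet(packet: bytes):
    """Parse only the Subversion 0 fields of a Writ data packet (hypothetical helper)."""
    if not packet or packet[0] != 0xFF:
        raise ValueError("not a Writ data packet")
    # granule_start, granule_duration and text_length follow the 0xFF marker byte.
    granule_start, granule_duration, text_length = struct.unpack_from(BASE_FIELDS, packet, 1)
    text_offset = 1 + struct.calcsize(BASE_FIELDS)
    text = packet[text_offset:text_offset + text_length]
    if len(text) != text_length:
        raise ValueError("truncated text_string")
    # Bytes after the first text_string belong to later subversions and are ignored.
    return granule_start, granule_duration, text.decode("utf-8")
```

Fed a Subversion 2 packet, such a decoder would return the first language's text and drop the remaining fields.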
<B>Subversion 1 adds multiple language support</B>
Header Packet 1 (Language Definition, 8+ bytes) :
0x01                                  ( 8 bit Header 1)
"writ" (LSB 0x74697277)                (32 bit codec identification)
num_languages                          ( 8 bit unsigned int)
[repeated 1+num_languages times] :
  language_length                      ( 8 bit unsigned int)
  language_string                      (0+language_length rfc3066)
  language_desc_length                ( 8 bit unsigned int)
  language_desc_string                (0+language_desc_length UTF-8)
Data Packet (each):
0xFF                                  ( 8 bit 0xFF = data packet)
granule_start                          (64 bit signed integer)
granule_duration                      (32 bit unsigned integer)
[repeated num_languages times] :
  text_length                          ( 8 bit unsigned integer)
  text_string                          (variable-length UTF-8 string)
<B>Subversion 2 adds text window support</B>
Header Packet 2 (Window Definition, 10+ bytes) :
0x02                                  ( 8 bit Header 2)
"writ" (LSB 0x74697277)                (32 bit codec identification)
location_scale_x                      (16 bit unsigned int)
location_scale_y                      (16 bit unsigned int)
num_windows                            ( 8 bit unsigned int)
[if (num_windows > 0) repeated num_windows times] :
  location_x                          (variable length, see below)
  location_y                          (variable length, see below)
  location_width                      (variable length, see below)
  location_height                      (variable length, see below)
  alignment_x                          ( 2 bit alignment, see below)
  alignment_y                          ( 2 bit alignment, see below)
Data Packet (each):
0xFF                                  ( 8 bit 0xFF = data packet)
granule_start                          (64 bit signed integer)
granule_duration                      (32 bit unsigned integer)
[repeated num_languages times] :
  text_length                          ( 8 bit unsigned integer)
  text_string                          (variable-length UTF-8 string)
[if (num_windows > 1)] :
  window_id                            ( 8 bit unsigned integer)
<B>Example Stream</B>
Header Packet 0
  version 0
  subversion 2
  granulerate_numerator 1
  granulerate_denominator 1
Header Packet 1
  num_languages 2
  Language 0:
    language en
    language_desc English
  Language 1:
    language es
    language_desc Spanish
Header Packet 2
  location_scale_x 4000 (12 bits)
  location_scale_y 270  ( 9 bits)
  num_windows 2
  Window 0:
    location_x 1
    location_y 2
    location_width 3
    location_height 1
    alignment_x 3 (Full)
    alignment_y 3 (Full)
  Window 1:
    location_x 5
    location_y 6
    location_width 7
    location_height 1
    alignment_x 3 (Full)
    alignment_y 3 (Full)
Phrase Packet:
  granule_start 5
  granule_duration 10
  Language 0: "Hello World!"
  Language 1: "Hola, Mundo!"
  window_id 0
\xff\x05\x00\x00\x00\x00\x00\x00\x00\x0a\x00\x00\x00\x0cHello World!\x0cHola, Mundo!\x00
Phrase Packet:
  granule_start 12
  granule_duration 15
  Language 0: "It's a beautiful day to be born."
  Language 1: "Es un día hermoso para que se llevará."
  window_id 1
\xff\x0c\x00\x00\x00\x00\x00\x00\x00\x0f\x00\x00\x00\x20It's a beautiful day to be born.\x28Es un d\xc3\xada hermoso para que se llevar\xc3\xa1.\x01
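A data packet in the Subversion 2 layout can be assembled as in the sketch below, which reproduces the "Hello World!" phrase packet above byte for byte. The function name and signature are hypothetical; little-endian byte order is inferred from the example bytes, and text_length is assumed to count UTF-8 bytes rather than characters.

```python
import struct


def build_data_packet(granule_start, granule_duration, texts, window_id=None):
    """Assemble a Writ data packet (hypothetical helper, not the py-ogg2 API)."""
    packet = bytearray(b"\xff")  # 0xFF marks a data packet
    # Assumed little-endian: 64 bit signed start, 32 bit unsigned duration.
    packet += struct.pack("<qI", granule_start, granule_duration)
    for text in texts:  # one text per defined language, in header order
        encoded = text.encode("utf-8")
        if len(encoded) > 255:
            raise ValueError("text_string does not fit an 8 bit text_length")
        packet.append(len(encoded))
        packet += encoded
    if window_id is not None:  # only present when more than one window is defined
        packet.append(window_id)
    return bytes(packet)
```

Calling `build_data_packet(5, 10, ["Hello World!", "Hola, Mundo!"], window_id=0)` yields the first example packet.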
== Granules and Muxing ==
The granulepos of a page in Writ (and in future discontinuous codecs)
marks the start time of the data that page carries, not the end time.
This greatly simplifies this specification (see the old method below).
All Writ phrases will be provided at and given the granulepos of their
start time, ordered by their start time within the logical bitstream.
Phrase packets with long durations should be repeated in the logical
bitstream at regular intervals to ensure that a player seeking to the
middle of their duration will still see them.  These packet copies will
be identical to their original, including the start and duration fields;
only the granulepos of the page they reside on is advanced for each
copy, to place it further along the logical bitstream.
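As a rough illustration of the repetition rule, an encoder could compute the page granulepos for each copy as below. The function name, the `repeat_interval` parameter, and the choice to stop repeating once the phrase has expired are assumptions; the spec only says "regular intervals".

```python
def copy_granulepositions(granule_start, granule_duration, repeat_interval):
    """Return the page granulepos for a phrase and each of its copies (hypothetical helper)."""
    if repeat_interval <= 0:
        raise ValueError("repeat_interval must be positive")
    end = granule_start + granule_duration
    positions = [granule_start]  # the original always goes out at its start granule
    next_position = granule_start + repeat_interval
    while next_position < end:  # assumed: no copies once the phrase has expired
        positions.append(next_position)
        next_position += repeat_interval
    return positions
```

With a hypothetical duration of 35 and an interval of 15, a phrase starting at granule 23 would be placed at 23, 38 and 53; the packet payload stays the same in every copy.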
No two phrases can start on the same granule. On decoding, each packet's
start granule is checked against already known packets.  If a match is
found the new packet is ignored.  This prevents phrase copies from being
interpreted as new phrases.
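The duplicate check can be sketched with a table keyed by start granule. The class and method names are hypothetical and say nothing about how libwrit is actually structured; treating a phrase as visible on the half-open range [start, start + duration) is also an assumption.

```python
class PhraseTable:
    """Keeps known phrases keyed by start granule (illustrative sketch only)."""

    def __init__(self):
        self._by_start = {}

    def add(self, granule_start, granule_duration, texts):
        """Store a phrase; return False if its start granule is already known."""
        if granule_start in self._by_start:
            return False  # a repeated copy of a known phrase: ignore it
        self._by_start[granule_start] = (granule_duration, texts)
        return True

    def active_at(self, granule):
        """Return the texts of every phrase visible at the given granule."""
        return [
            texts
            for start, (duration, texts) in sorted(self._by_start.items())
            if start <= granule < start + duration  # assumed half-open interval
        ]
```

Because no two phrases may share a start granule, the start granule alone is enough to recognise a copy.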
== Seeking Example ==
Here is a timeline (granule numbers at top, read down) of a sample stream:
                        <- Granules ->
___________  ____________  ____________  ____________  _____________
____________________  ____________________________________
|_A____________>_____| |_D____________>______________>______|
    _________      ___    __________    ___________
    |_B_______|    |_C_|  |_E________|  |_F_________|
(note: these have been separated vertically for easy viewing only)
Packet  Granule  Description
V H0    0        Vorbis Header 0x01 (page by itself)
W H0    0        Writ Header 0 (page by itself)
V H1    0        Vorbis Header 0x03
V H2    0        Vorbis Header 0x05
W H1    0        Writ Header 1 (Language Defs)
W H2    0        Writ Header 2 (Window Defs)
W A     0        Writ Phrase A
W B     4        Writ Phrase B
V       12       Vorbis 0-12
W A     15       Writ Phrase A
W C     19       Writ Phrase C
W D     23       Writ Phrase D
V       26       Vorbis 13-26
W E     26       Writ Phrase E
W D     38       Writ Phrase D
V       40       Vorbis 27-40
W F     41       Writ Phrase F
W D     53       Writ Phrase D (EOF)
V       54       Vorbis 41-54
V       69       Vorbis 55-69 (EOF)
The player begins decoding at the beginning of the stream.  It reads the BOS pages
for both codecs, then receives a non-BOS page.  At this point it knows
that it has two bitstreams to decode and has resolved that one is Writ
and the other Vorbis.  It'll continue processing the headers for both.
Next it's going to find two Writ packets (phrases A and B) and toss them
into libwrit.  Then it'll get to the first Vorbis data page.  It now has
data from both bitstreams, and it knows (from the granulepos on the
Vorbis page) that it has enough data to run until 12.  If there were any
Writ packets before 12 they would have appeared first.
At around granule 9 the listener seeks forward to 24.  This will cause a
rapid seek through the file to find the first page with a granulepos
greater than the seek position and begin decoding at that point.
It'll find a Vorbis packet containing 13-26 (and not use 13-23) and Writ
phrase E.  Again, having data from both bitstreams it can begin playing.
Phrase D would normally be on screen at granule 24, but the player does
not know about it yet.  The player knows that this is only enough to
decode until 26 so, knowing enough to prebuffer, it continues reading
the file as it plays the media.
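The seek step can be modelled on the data pages from the table above. This is only a list-based sketch: a real player bisects over byte offsets in the file, and the names `PAGES` and `pages_from_seek` are made up for illustration.

```python
from bisect import bisect_right

# (packet, granulepos) for the data pages in the sample stream, in file order.
PAGES = [
    ("W A", 0), ("W B", 4), ("V", 12), ("W A", 15), ("W C", 19),
    ("W D", 23), ("V", 26), ("W E", 26), ("W D", 38), ("V", 40),
    ("W F", 41), ("W D", 53), ("V", 54), ("V", 69),
]


def pages_from_seek(pages, seek_granule):
    """Return the pages read after seeking: the first page whose granulepos
    is greater than the seek position, and everything after it."""
    granule_positions = [granulepos for _, granulepos in pages]
    first = bisect_right(granule_positions, seek_granule)
    return pages[first:]
```

Seeking to 24 lands on the Vorbis page for 13-26, then Writ phrase E, then the repeated copy of phrase D at 38, which is the order the walkthrough describes.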
The next packet it finds is Writ phrase D; on passing it to libwrit, it
finds that the current granulepos is within the phrase's duration.  The
phrase is thus displayed immediately, as it's prebuffered, without
waiting for granulepos 38.  It'll keep reading (because the maximum decoded Vorbis
is still 26) and find a Vorbis packet with a 40 granulepos.
As it nears 38 it'll read the file again and find Writ phrase F, which
takes it out to 41.  Vorbis only goes until 40, so it'll have to keep
reading until the next Vorbis packet.
Next it'll find Writ phrase D, which will be ignored by libwrit because
phrase D is already known (matches start granule of earlier D), and the
EOF on that page marks this as the last of the Writ stream.
It'll continue reading for the next Vorbis data and find the packet
for granule 54, followed by the Vorbis packet for granule 69.  With that
it's EOS, EOF, finished.
This is of course a simplistic example; Writ and Vorbis will rarely have
granules which equal the same amount of time.  Each bitstream has its
own granule -> time mapping which is calculated when muxing concurrent
bitstreams within the file.  So if there are 44100 Vorbis granules
per second and only 4 Writ granules per second, pages would be ordered
as W25 V297892 W31 V385932 W39 W41 V463057 etc.  The logic used in the
above example works after this granule-time mapping is calculated.
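The ordering in that example can be checked by converting each granulepos to seconds and sorting on the result. This is a sketch under the rates quoted above; the names are hypothetical, and exact fractions are used so that no rounding affects the comparison.

```python
from fractions import Fraction

# Granules per second for each logical bitstream, as quoted in the text.
GRANULES_PER_SECOND = {"V": 44100, "W": 4}


def page_time(stream, granulepos):
    """Map a page's granulepos to a timestamp in seconds."""
    return Fraction(granulepos, GRANULES_PER_SECOND[stream])


def mux_order(pages):
    """Arrange (stream, granulepos) pages in ascending order of timestamp."""
    return sorted(pages, key=lambda page: page_time(*page))
```

For instance, W25 maps to 6.25 seconds and V297892 to roughly 6.75 seconds, so the Writ page comes first.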
== Ongoing Discussion ==
* How does this get "encoded" and "merged"?
** &lt;purple_haese&gt; The muxing rule is pages are arranged in ascending order by the timestamp that is represented by their granulepos.
* Why do header and data packets begin with a 0x00 and a 0xFF byte, respectively?
** &lt;xiphmont&gt; If, after a seek, I hand your codec a header packet, what does the codec do?
** &lt;xiphmont&gt; It does *nothing*.  If I haven't told it to reset, the header is not data, *it must ignore the header*.
** &lt;xiphmont&gt; this eliminates a huge raft of special cases in Ogg seeking.
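One way to read this is that the first byte lets a decoder tell headers from data without any other context. The sketch below filters a packet sequence accordingly; the function name and the `headers_done` flag are hypothetical, and treating every first byte other than 0xFF as a header is an assumption based on the packet layouts listed earlier.

```python
def packets_to_decode(packets, headers_done):
    """Yield only the packets a decoder should act on (illustrative sketch)."""
    for packet in packets:
        if not packet:
            continue  # nothing to classify in an empty packet
        is_data = packet[0] == 0xFF
        if headers_done and not is_data:
            # A header seen after setup (for example after a seek) is not data
            # and must be ignored unless the decoder has been reset.
            continue
        yield packet
```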
== "The Old Way" ==
<B>The section below is for historical purposes only!</B>
- - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
  In a lengthy discussion with Monty and Derf the decision to change the
  behavior of discontinuous bitstreams in Ogg, or rather, extend the
  current Ogg specification to handle discontinuous codecs, was made.
- - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
The Ogg granulepos of each page is equal to the expiration of the text;
packets are ordered by expiration time and may overlap.  So, at or before
the time text A is to be displayed, the following sequence is included:
Physical  Text    Text   Text
Location  Packet  Start  Expire  (text expire = page granulepos)
00        B       04     14
00        D       19     23
00        C       09     24
00        F       27     34
00        E       26     37
00        G       35     47
00        H       42     54
00        A       00     59
51        I       51     66
So B, D, C, F, E, G, and H are all defined before A, building a FIFO (first
in first out) buffer in the player.  Encoders should limit the extent of this
behavior to reduce necessary buffer size on the player side by prematurely
expiring captions and recreating them periodically.
The screen should not be updated with the new captions until they've all
been processed to prevent "flicker".  New caption data to the same position
will scroll the previous data upwards with no line breaks separating them
(unless present in text).

Revision as of 05:06, 5 January 2005
