== Introduction ==

Ogg Writ is a text phrase codec. While its primary purpose is to embed
subtitles or captions in a Theora stream, its design makes it useful
for many other purposes. It could provide lyrics to a song encoded in
Vorbis, a transcript of a political debate encoded in Speex, or even
incorporate a live chat session as part of a continuous video stream.

One of the unique aspects of Writ is its discontinuous nature: unlike
other Ogg codecs, the granule ranges that separate packets affect may
overlap. See the Granules and Muxing section below for how this works.

=== SVN ===

Current Ogg Writ development is on Xiph SVN as /trunk/writ/. It's
being developed to use libogg2, so you'll need both to work on it.
The reference encoder and decoder are available as part of the py-ogg2
package, which is on Xiph SVN as /trunk/py-ogg2/.

<B>This is a (near final) working draft of the spec</B><BR>
Writ has been designed so that encoders/decoders can support a bare
minimum and be fully compatible with future subversions. Each subversion
adds a new feature, some building on others, adding a new header packet
and likely a new field to each data packet.
<P>
Decoders should ignore header packets beyond what they were written to
support and also ignore extra fields in data packets beyond their
current version. This allows new features to be added without requiring
all software, or even most software, to support them.
<P>
We will be conservative about adding future subversions.
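
The forward-compatibility rule can be illustrated with a minimal sketch of a decoder that only understands subversion 0. It reads the Header Packet 0 and Data Packet fields defined in this draft and skips any trailing bytes that later subversions append. The function names are hypothetical, and little-endian byte order is an assumption taken from the example stream bytes; this is not the reference decoder from py-ogg2.

```python
import struct

WRIT_MAGIC = b"writ"

def parse_header0(packet: bytes) -> dict:
    """Parse Header Packet 0 (assumed little-endian, 15 bytes of fields)."""
    kind, magic, version, subversion, num, den = struct.unpack_from("<B4sBBII", packet, 0)
    if kind != 0x00 or magic != WRIT_MAGIC:
        raise ValueError("not a Writ Header Packet 0")
    return {
        "version": version,
        "subversion": subversion,
        "granulerate_numerator": num,
        "granulerate_denominator": den,
    }

def parse_phrase_v0(packet: bytes) -> tuple:
    """Read only the subversion-0 fields of a data packet.

    Anything after the first text string (extra language strings,
    window_id, fields from future subversions) is deliberately ignored.
    """
    if not packet or packet[0] != 0xFF:
        raise ValueError("not a Writ data packet")
    start, duration, text_len = struct.unpack_from("<qIB", packet, 1)
    text_off = 1 + struct.calcsize("<qIB")
    # Assumption: text_length counts bytes of the UTF-8 encoding.
    text = packet[text_off:text_off + text_len].decode("utf-8")
    return start, duration, text
```

Fed the first phrase packet from the example stream in this draft (a subversion 2 packet), this sketch returns only the first language's text and leaves the second string and the window_id untouched.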
<pre>
Header Packet 0 (BOS, 15 bytes):
  0x00                      ( 8 bit Header 0)
  "writ" (LSB 0x74697277)   (32 bit codec identification)
  version                   ( 8 bit unsigned int, 0 = Alpha)
  subversion                ( 8 bit unsigned int)
  granulerate_numerator     (32 bit unsigned int)
  granulerate_denominator   (32 bit unsigned int)

Data Packet (each):
  0xFF                      ( 8 bit 0xFF = data packet)
  granule_start             (64 bit signed integer)
  granule_duration          (32 bit unsigned integer)
  text_length               ( 8 bit unsigned integer)
  text_string               (variable-length UTF-8 string)


<B>Subversion 1 adds multiple language support</B>

Header Packet 1 (Language Definition, 8+ bytes):
  0x01                      ( 8 bit Header 1)
  "writ" (LSB 0x74697277)   (32 bit codec identification)
  num_languages             ( 8 bit unsigned int)
  [repeated 1+num_languages times]:
    language_length         ( 8 bit unsigned int)
    language_string         (0+language_length RFC 3066)
    language_desc_length    ( 8 bit unsigned int)
    language_desc_string    (0+language_desc_length UTF-8)

Data Packet (each):
  0xFF                      ( 8 bit 0xFF = data packet)
  granule_start             (64 bit signed integer)
  granule_duration          (32 bit unsigned integer)
  [repeated num_languages times]:
    text_length             ( 8 bit unsigned integer)
    text_string             (variable-length UTF-8 string)


<B>Subversion 2 adds text window support</B>

Header Packet 2 (Window Definition, 10+ bytes):
  0x02                      ( 8 bit Header 2)
  "writ" (LSB 0x74697277)   (32 bit codec identification)
  location_scale_x          (16 bit unsigned int)
  location_scale_y          (16 bit unsigned int)
  num_windows               ( 8 bit unsigned int)
  [if (num_windows > 0) repeated num_windows times]:
    location_x              (variable length, see below)
    location_y              (variable length, see below)
    location_width          (variable length, see below)
    location_height         (variable length, see below)
    alignment_x             ( 2 bit alignment, see below)
    alignment_y             ( 2 bit alignment, see below)

Data Packet (each):
  0xFF                      ( 8 bit 0xFF = data packet)
  granule_start             (64 bit signed integer)
  granule_duration          (32 bit unsigned integer)
  [repeated num_languages times]:
    text_length             ( 8 bit unsigned integer)
    text_string             (variable-length UTF-8 string)
  [if (num_windows > 1)]:
    window_id               ( 8 bit unsigned integer)


<B>Example Stream</B>
Header Packet 0
  version                   0
  subversion                2
  granulerate_numerator     1
  granulerate_denominator   1
  \x00writ\x00\x02\x01\x00\x00\x00\x01\x00\x00\x00

Header Packet 1
  num_languages             2  (stored as \x01; the header repeats 1+num_languages entries)
  Language 0:
    language                en
    language_desc           English
  Language 1:
    language                es
    language_desc           Spanish
  \x01writ\x01\x02en\x07English\x02es\x07Spanish

Header Packet 2
  location_scale_x          4000 (12 bits)
  location_scale_y          270  ( 9 bits)
  num_windows               2
  Window 0:
    location_x              1
    location_y              2
    location_width          3
    location_height         1
    alignment_x             3 (Full)
    alignment_y             3 (Full)
  Window 1:
    location_x              5
    location_y              6
    location_width          7
    location_height         1
    alignment_x             3 (Full)
    alignment_y             3 (Full)
  \x02writ\xa0\x0f\x0e\x01\x02\x01\x20\x60\x00\x02\x7c\x01\x18\x38\x80\x00\x0f

Phrase Packet:
  granule_start             5
  granule_duration          10
  Language 0: "Hello World!"
  Language 1: "Hola, Mundo!"
  window_id                 0
  \xff\x05\x00\x00\x00\x00\x00\x00\x00\x0a\x00\x00\x00\x0cHello World!\x0cHola, Mundo!\x00

Phrase Packet:
  granule_start             12
  granule_duration          15
  Language 0: "It's a beautiful day to be born."
  Language 1: "Es un día hermoso para que se llevará."
  window_id                 1
  \xff\x0c\x00\x00\x00\x00\x00\x00\x00\x0f\x00\x00\x00\x20It's a beautiful day to be born.\x28Es un d\xc3\xada hermoso para que se llevar\xc3\xa1.\x01

</pre>
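
As a cross-check of the layouts above, here is a minimal encoder sketch that rebuilds three packets of the example stream byte for byte. The function names are hypothetical and this is not the py-ogg2 reference encoder. It assumes little-endian integers, length prefixes that count bytes, and a num_languages byte that stores the language count minus one, all of which are inferred from the example bytes. Header Packet 2 is left out because its variable-length fields are not defined in this section.

```python
import struct

def _short_string(data: bytes) -> bytes:
    """Prefix data with its length as a single unsigned byte."""
    if len(data) > 255:
        raise ValueError("string too long for an 8 bit length field")
    return bytes([len(data)]) + data

def build_header0(version: int, subversion: int, rate_num: int, rate_den: int) -> bytes:
    """Header Packet 0: id byte, magic, version, subversion, granule rate."""
    return struct.pack("<B4sBBII", 0x00, b"writ", version, subversion, rate_num, rate_den)

def build_header1(languages: list) -> bytes:
    """Header Packet 1 from a list of (language tag, description) pairs."""
    if not 1 <= len(languages) <= 256:
        raise ValueError("need between 1 and 256 languages")
    # Assumption: the field stores count - 1, matching "[repeated 1+num_languages times]".
    out = struct.pack("<B4sB", 0x01, b"writ", len(languages) - 1)
    for tag, desc in languages:
        out += _short_string(tag.encode("ascii"))
        out += _short_string(desc.encode("utf-8"))
    return out

def build_phrase(start: int, duration: int, texts: list, window_id=None) -> bytes:
    """Data packet with one text per language and an optional window_id."""
    out = struct.pack("<BqI", 0xFF, start, duration)
    for text in texts:
        out += _short_string(text.encode("utf-8"))
    if window_id is not None:
        out += bytes([window_id])
    return out

if __name__ == "__main__":
    print(build_header0(0, 2, 1, 1))
    print(build_header1([("en", "English"), ("es", "Spanish")]))
    print(build_phrase(5, 10, ["Hello World!", "Hola, Mundo!"], window_id=0))
```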

== Granules and Muxing ==

Granulepos in Writ (as well as in future discontinuous codecs) is the
start time, not the end time, of the data in a given page.
This greatly simplifies this specification (see the old method below).

All Writ phrases are placed at, and given the granulepos of, their
start time, and are ordered by start time within the logical bitstream.

Phrase packets with long durations should be repeated in the logical
bitstream at regular intervals to ensure that a player seeking to the
middle of their duration will still see them. These packet copies are
identical to the original, including the start and duration fields;
only the granulepos of the page they reside on is advanced for each
copy, to place it further along the logical bitstream.

No two phrases can start on the same granule. On decoding, each packet's
start granule is checked against already known packets. If a match is
found, the new packet is ignored. This prevents phrase copies from being
interpreted as new phrases.
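
The duplicate check can be sketched as a small phrase store keyed by start granule. `Phrase` and `PhraseStore` are hypothetical names, not libwrit's API, and treating a phrase as visible over the half-open range from start to start + duration is an assumption this draft does not state.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Phrase:
    start: int      # granule_start from the data packet
    duration: int   # granule_duration from the data packet
    text: str

class PhraseStore:
    """Keeps one phrase per start granule and drops repeated copies."""

    def __init__(self):
        self._by_start = {}

    def add(self, phrase: Phrase) -> bool:
        """Return True if the phrase is new, False if it is a repeated copy."""
        if phrase.start in self._by_start:
            return False
        self._by_start[phrase.start] = phrase
        return True

    def active_at(self, granule: int) -> list:
        """Phrases visible at the given granule (assumed half-open interval)."""
        return [p for p in self._by_start.values()
                if p.start <= granule < p.start + p.duration]

if __name__ == "__main__":
    store = PhraseStore()
    d = Phrase(start=23, duration=37, text="D")   # illustrative values only
    print(store.add(d))                           # first copy is accepted
    print(store.add(Phrase(23, 37, "D")))         # repeated copy is ignored
    print([p.text for p in store.active_at(24)])
```

Because copies carry the original start and duration, a player that seeks into the middle of a long phrase can add the next copy it meets and show it straight away.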

== Seeking Example ==
<pre>

Here is a timeline (granule numbers at top, read down) of a sample stream:

                            <- Granules ->
0000000000111111111122222222223333333333444444444455555555556666666666
0123456789012345678901234567890123456789012345678901234567890123456789
 ___________  ____________  ____________  ____________  _____________
|_Vorbis____||_Vorbis_____||_Vorbis_____||_Vorbis_____||_Vorbis______|
 ____________________   ____________________________________
|_A____________>_____| |_D____________>______________>______|
     _________      ___    __________     ___________
    |_B_______|    |_C_|  |_E________|   |_F_________|

(note: these have been separated vertically for easy viewing only)

Packet   Granule   Description
V H0      0        Vorbis Header 0x01 (page by itself)
W H0      0        Writ Header 0 (page by itself)
V H1      0        Vorbis Header 0x03
V H2      0        Vorbis Header 0x05
W H1      0        Writ Header 1 (Language Defs)
W H2      0        Writ Header 2 (Window Defs)
W A       0        Writ Phrase A
W B       4        Writ Phrase B
V        12        Vorbis 0-12
W A      15        Writ Phrase A
W C      19        Writ Phrase C
W D      23        Writ Phrase D
V        26        Vorbis 13-26
W E      26        Writ Phrase E
W D      38        Writ Phrase D
V        40        Vorbis 27-40
W F      41        Writ Phrase F
W D      53        Writ Phrase D (EOF)
V        54        Vorbis 41-54
V        69        Vorbis 55-69 (EOF)

</pre>

The player begins decoding at the beginning of the stream. It reads the
BOS pages for both codecs, then receives a non-BOS page. At this point it
knows that it has two bitstreams to decode and has resolved that one is
Writ and the other Vorbis. It'll continue processing the headers for both.

Next it's going to find two Writ packets (phrases A and B) and toss them
into libwrit. Then it'll get to the first Vorbis data page. It now has
data from both bitstreams, and it knows (from the granulepos on the
Vorbis page) that it has enough data to run until 12. If there were any
Writ packets before 12 they would have appeared first.

At around granule 9 the listener seeks forward to 24. This will cause a
rapid seek through the file to find the first page with a granulepos
greater than the seek position and begin decoding at that point.

It'll find a Vorbis packet containing 13-26 (and not use 13-23) and Writ
phrase E. Again, having data from both bitstreams, it can begin playing.
D would normally be on screen at granule 24 but is not known about yet.
The player knows that this is only enough to decode until 26 so, knowing
enough to prebuffer, it continues reading the file as it plays the media.

The next packet it finds is Writ phrase D. When this is passed to libwrit,
the current granulepos is found to be within the phrase's duration. It is
thus displayed immediately, as it's prebuffered, without waiting for
granulepos 38. It'll keep reading (because the maximum decoded Vorbis
is still 26) and find a Vorbis packet with a 40 granulepos.

As it nears 38 it'll read the file again and find Writ phrase F, which
takes it out to 41. Vorbis only goes until 40, so it'll have to keep
reading until the next Vorbis packet.

Next it'll find Writ phrase D, which will be ignored by libwrit because
phrase D is already known (it matches the start granule of the earlier D),
and the EOF on that page marks this as the last of the Writ stream.

It'll continue reading for the next Vorbis data and find the packet
for granule 54, followed by the Vorbis packet for granule 69. With that
it's EOS, EOF, finished.

This is of course a simplistic example; Writ and Vorbis will rarely have
granules which equal the same amount of time. Each bitstream has its
own granule -> time mapping, which is calculated when muxing concurrent
bitstreams within the file. So if there are 44100 Vorbis granules
per second and only 4 Writ granules per second, pages would be ordered
as W25 V297892 W31 V385932 W39 W41 V463057 etc. The logic used in the
above example works after this granule-time mapping is calculated.
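
The page ordering quoted above can be reproduced with a short sketch that converts each granulepos to seconds and sorts on the result. The helper names and the stream labels "V" and "W" are illustrative only. It assumes the granule rate is expressed as granules per second, which for Writ would be granulerate_numerator / granulerate_denominator.

```python
from fractions import Fraction

def page_time(granulepos: int, granules_per_second: Fraction) -> Fraction:
    """Timestamp in seconds represented by a page's granulepos."""
    return Fraction(granulepos) / granules_per_second

def mux_order(pages: list, rates: dict) -> list:
    """Sort (stream, granulepos) pairs by ascending timestamp."""
    return sorted(pages, key=lambda page: page_time(page[1], rates[page[0]]))

if __name__ == "__main__":
    rates = {"V": Fraction(44100), "W": Fraction(4)}
    # The pages from the text, deliberately shuffled.
    pages = [("V", 297892), ("W", 41), ("W", 25), ("V", 463057),
             ("W", 31), ("V", 385932), ("W", 39)]
    ordered = mux_order(pages, rates)
    print(" ".join(f"{stream}{gp}" for stream, gp in ordered))
```

Exact fractions are used instead of floats so that two pages from streams with different rates compare reliably when their timestamps are very close.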


== Ongoing Discussion ==

* How does this get "encoded" and "merged"?
** <purple_haese> The muxing rule is pages are arranged in ascending order by the timestamp that is represented by their granulepos.

* Why are the 0x00 and 0xFF bytes at the beginning of header and data packets respectively?
** <xiphmont> If, after a seek, I hand your codec a header packet, what does the codec do?
** <xiphmont> It does *nothing*. If I haven't told it to reset, the header is not data, *it must ignore the header*.
** <xiphmont> this eliminates a huge raft of special cases in Ogg seeking.
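
A minimal sketch of how a decoder might use that leading byte after a seek follows. The function names are hypothetical. Treating every first byte other than 0xFF as a header is an assumption, chosen so that header packets from future subversions are skipped as well.

```python
def classify_packet(packet: bytes) -> str:
    """Return "data" for a 0xFF-prefixed packet and "header" for anything else."""
    if not packet:
        raise ValueError("empty packet")
    return "data" if packet[0] == 0xFF else "header"

def packets_to_decode(packets: list) -> list:
    """After a seek (no reset requested), keep only the data packets."""
    return [p for p in packets if classify_packet(p) == "data"]

if __name__ == "__main__":
    header0 = b"\x00writ\x00\x02\x01\x00\x00\x00\x01\x00\x00\x00"
    phrase = b"\xff\x05\x00\x00\x00\x00\x00\x00\x00\x0a\x00\x00\x00\x0cHello World!"
    print([classify_packet(p) for p in (header0, phrase)])
    print(len(packets_to_decode([header0, phrase])))
```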


== "The Old Way" ==
<B>The section below is for historical purposes only!</B>
<pre>
- - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
2003/08/17
In a lengthy discussion with Monty and Derf the decision to change the
behavior of discontinuous bitstreams in Ogg, or rather, extend the
current Ogg specification to handle discontinuous codecs, was made.
- - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
</pre>

The Ogg granulepos of each page is equal to the expiration of the text;
packets are ordered by expiration time and may overlap. So, at or before
the time text A is to be displayed, the following sequence is included:

<pre>
Physical   Text     Text    Text
Location   Packet   Start   Expire   (text expire = page granulepos)
---------------------------------------
   00        B       04      14
   00        D       19      23
   00        C       09      24
   00        F       27      34
   00        E       26      37
   00        G       35      47
   00        H       42      54
   00        A       00      59
   51        I       51      66
</pre>
So B, D, C, F, E, G, and H are all defined before A, building a FIFO (first
in first out) buffer in the player. Encoders should limit the extent of this
behavior to reduce the necessary buffer size on the player side by prematurely
expiring captions and recreating them periodically.

The screen should not be updated with the new captions until they've all
been processed, to prevent "flicker". New caption data to the same position
will scroll the previous data upwards with no line breaks separating them
(unless present in text).