transOgg Page Primitive
transOgg transport consists of a single encapsulation primitive, based on the original Ogg page. Pages are byte aligned; in the diagram below, bytes are encoded left to right, top to bottom. Greater-than-byte quantities are encoded in little-endian byte order.
          01234567 01234567 01234567 01234567
             0        1        2        3
   --    |-------- -------- -------- --------|    --
   ^    0|          capture pattern          |3   ^
   |     |--------|-------- -------- --------|    |
   |    4| sgmnts |  stream identification   |7   |
   |     |------|-|--------|-------- --------|    |
   20   8| flag |  hbytes  |     dbytes      |11  |
   |     |------|- --------|-------- --------|    |
   |   12|          32 bit checksum          |15  |
   |     |-------- -------- -------- --------|    |
   V   16|          DTS [low] word           |19  |
   --    |-------- -------- -------- --------|    |
       20?          DTS high word            | sgmnts+
         |-------- -------- -------- --------  hbytes+
         ? sequence...                           19
         |--------                                |
         ? distance...                            |
         |--------                                |
         ? segment table...                       |
         |--------                                |
         ? pp-reldistance?...                     |
         ? pp-delay?...                           |
         ? pp-duration?...                        |
         ? pp-ppflags?...                         V
   --    |--------                                --
   ^     ?
   |     ?
 dbytes  ? data payloads...
   |     ?
   V     ?
   --    |--------
The capture pattern consists of four 7-bit-clean ASCII bytes: 'Og2S' (0x4F, 0x67, 0x32, 0x53, in order).
'sgmnts' (8 bits, 0-255) indicates the number of packet segments encoded in this page. The first and/or last packets may be partial as specified by the FROM/TO flags (below).
The stream identification ID is a 24-bit pseudo-random number that uniquely identifies this media stream within the larger multiplexed stream. It must be unique both within the current multiplexed section and globally within a chained stream. The 24-bit ID is large enough to act as a weak hash, making it highly unlikely that a stream's ID number will need to be rewritten (and all of its pages rechecksummed) when multiplexing or concatenating.
'flags' defines seven bit flags (bits 0-6 of byte 8) as follows:
0 FROM: set   == initial packet continued from previous page
        note: unset if page contains no packets
1 TO  : set   == final packet continued on next page
        note: unset if page contains no packets
2 CRC : set   == checksum applies to header and data
        unset == checksum applies to header fields only
3 BREF: set   == backreference field present; no packets on this page represent a syncpoint
        unset == backreference field not present; interpretation subject to SUBR field
4 SUBR: set   == If BREF also set: explicit backreference distances are encoded for each packet. The page must not contain a syncpoint.
                 If BREF unset: all packets on the page are at max backreference distance, ie, max preroll. Intra-only stream types should use this flag combination for all pages along with a preroll value of zero.
        unset == If BREF set: all packets on this page share the single encoded backreference value.
                 If BREF also unset: first packet is a syncpoint (implicit backreference value of zero).
5 SEQ : set   == sequence field is present
6 DURA: set   == full raw duration encoding present
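The flag layout above can be sketched in Python as follows. This is only an illustration: the names `PageFlags` and `backref_mode`, and the short mode strings it returns, are hypothetical and not part of the format.

```python
from enum import IntFlag


class PageFlags(IntFlag):
    """Bits 0-6 of header byte 8 (names mirror the list above)."""
    FROM = 1 << 0
    TO = 1 << 1
    CRC = 1 << 2
    BREF = 1 << 3
    SUBR = 1 << 4
    SEQ = 1 << 5
    DURA = 1 << 6


def backref_mode(byte8: int) -> str:
    """Summarise the BREF/SUBR combination of a page (labels are made up)."""
    flags = PageFlags(byte8 & 0x7F)  # bit 7 of byte 8 belongs to 'hbytes'
    bref = PageFlags.BREF in flags
    subr = PageFlags.SUBR in flags
    if bref and subr:
        return "per-packet"       # explicit distance for each packet
    if bref:
        return "shared"           # one distance shared by all packets
    if subr:
        return "max-preroll"      # all packets at max backreference distance
    return "syncpoint-first"      # first packet is a syncpoint
```

For example, `backref_mode(0x10)` (SUBR only) reports the max-preroll case that intra-only streams are expected to use.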
header variable fields bytes
'hbytes' (bit 7 of byte 8 and bits 0-7 of byte 9 for 9 bits total, 0-511) encodes the number of bytes spanned by the variable-length header fields (DTS high word, sequence, distance, lacing, and per-packet fields). The actual number of header bytes is computed as (19 + segments + hbytes); this computed value serves as a direct byte offset to the page data, as measured from the first byte of the page.
data payload bytes
'dbytes' (16 bits, 0-65535) indicates the number of bytes of data payload.
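A minimal Python sketch of reading the fixed fields described so far (capture pattern through checksum). It assumes that bytes 8-9 are read as one little-endian 16-bit word, so that bit 7 of byte 8 is the least significant bit of 'hbytes'; the text does not state the bit order within 'hbytes' explicitly. `FixedHeader` and `parse_fixed_header` are hypothetical names.

```python
import struct
from typing import NamedTuple

CAPTURE_PATTERN = b"Og2S"


class FixedHeader(NamedTuple):
    sgmnts: int
    stream_id: int
    flags: int
    hbytes: int
    dbytes: int
    checksum: int


def parse_fixed_header(page: bytes) -> FixedHeader:
    """Parse bytes 0-15 of a page; the DTS and later fields are not handled."""
    if len(page) < 16:
        raise ValueError("need at least 16 bytes of page header")
    if page[0:4] != CAPTURE_PATTERN:
        raise ValueError("capture pattern not found")
    sgmnts = page[4]
    stream_id = int.from_bytes(page[5:8], "little")  # 24-bit ID
    # Bytes 8-9: flags + hbytes, bytes 10-11: dbytes, bytes 12-15: checksum.
    flag_word, dbytes, checksum = struct.unpack_from("<HHI", page, 8)
    flags = flag_word & 0x7F   # bits 0-6 of byte 8
    hbytes = flag_word >> 7    # assumed: remaining 9 bits, LSB first
    return FixedHeader(sgmnts, stream_id, flags, hbytes, dbytes, checksum)
```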
The checksum is a 32-bit CRC value (direct algorithm, initial value and final XOR = 0, generator polynomial = 0x04c11db7) encoded in the page header in little-endian format. The checksum is computed over the 20+hbytes header bytes, skipping the CRC bytes. When the CRC flag is set, the CRC continues over the entire page body (dbytes).
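The checksum computation might be sketched in Python as below. This assumes the CRC is unreflected (bits processed most significant first), as in the original Ogg container; the text above only says "direct algorithm". The function names are hypothetical, and the caller is expected to pass the complete header and the page body as separate byte strings.

```python
POLY = 0x04C11DB7


def crc32_update(crc: int, data: bytes) -> int:
    """Bitwise CRC-32 update, MSB first, no reflection, no final XOR."""
    for byte in data:
        crc ^= byte << 24
        for _ in range(8):
            if crc & 0x80000000:
                crc = ((crc << 1) ^ POLY) & 0xFFFFFFFF
            else:
                crc = (crc << 1) & 0xFFFFFFFF
    return crc


def page_checksum(header: bytes, body: bytes, crc_flag: bool) -> int:
    """CRC over the header, skipping the checksum field at bytes 12-15."""
    crc = crc32_update(0, header[:12])
    crc = crc32_update(crc, header[16:])
    if crc_flag:  # CRC flag set: continue over the data payload
        crc = crc32_update(crc, body)
    return crc
```

A table-driven variant would be faster; the bitwise loop is used here only because it is easy to check against the stated parameters.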
The DTS field is a variable-length encoded delivery time stamp value, equivalent to the high bits of the granule position in the original Ogg container. The DTS value is encoded in V32/64 format and is either 4 or 8 bytes in total.
The DTS value for a page in a continuous stream is equal to the end-time of the data returned up to the last completed packet in the stream (the ending DTS of the last completed packet). When a continuous stream page contains no completed packets, the DTS is equal to the end-time of the last previously completed packet. Otherwise, the DTS of the page is equal to the end-time of the last packet completed on the page.
The DTS value for a page in a discontinuous stream is equal to the start time of the first packet begun on the page. ( more to write here on correct discont timing and packet spanning behavior )
The sequence field is present when the SEQ flag is set; this field orders any sequence of pages that have the same DTS, such as pages without complete packets, or pages containing only packets with zero duration. The sequence value is encoded in V8/16/32 format. The first page in a sequence of pages with identical DTS does not set the SEQ flag. The second page in a sequence sets the SEQ flag and the sequence value to zero. The third page in a sequence sets the SEQ flag and the sequence value to one, etc.
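The numbering rule above can be illustrated with a short Python sketch. `assign_sequence` is a hypothetical helper that takes the DTS of each page in stream order and returns, per page, whether the SEQ flag is set and which sequence value would be written.

```python
from typing import List, Optional, Tuple


def assign_sequence(page_dts: List[int]) -> List[Tuple[bool, Optional[int]]]:
    """Return (SEQ flag, sequence value) for each page, in order."""
    out: List[Tuple[bool, Optional[int]]] = []
    prev: Optional[int] = None
    run = 0  # number of earlier pages sharing the current DTS
    for dts in page_dts:
        run = run + 1 if dts == prev else 0
        prev = dts
        if run == 0:
            out.append((False, None))    # first page with this DTS: no SEQ
        else:
            out.append((True, run - 1))  # second page gets 0, third gets 1, ...
    return out
```

For pages with DTS values 100, 100, 100, 250 this yields no sequence field on the first and last pages, and sequence values 0 and 1 on the second and third.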
The distance field is conditionally present only if the SYNC flag is unset. It is equal to the DTS of the current page minus the DTS of the previous syncpoint packet (not page) minus one. The value is encoded in V8/16/32 format.
Lacing values encode the length of each payload segment in the page into the segment table. Lacing values are coded in one, two or three bytes. Lacing values are coded until the total number of coded segment lengths == 'sgmnts-1'. Length of the last segment is implicit, equalling the unencoded remainder of 'dbytes', which may be zero.
Overrunning the number of declared segments (ie, a zero run encodes past the 'sgmnts'-1 limit), or underrunning the expected number of segments (ie, reading to the end of hbytes before seeing the expected number of segments) shall be considered an error condition rendering the page undecodable. Lacing values may encode no more than 255 segments total (including the implicit last segment), null or otherwise, in a single page.
If 'sgmnts' is zero, the page is a null-page containing no data. 'dbytes' must also be zero. More on proper encoding and use of null pages
The delay, duration and per-packet flags are collectively byte-aligned and fill out the remainder of the 'hbytes' span not filled by the DTS, sequence, distance and lacing fields. Within this span, the individual delay, duration and flag fields are bit-aligned using a big-endian byte-packer.
Delay values are written first. A field of N bits is written for every packet completed on the page. N is set in the stream metaheader; N may be zero, in which case no delay values are written. The value encoded is the PTS minus the DTS of the packet. The value is unsigned (positive).
Duration values are written next, bit aligned to the end of the delay values. A duration value is written for each packet completed on the page.
If the DURA flag is unset, each duration value is written as an N bit quantity, where N is set in the metadata header. N may be zero. The value as written is interpreted according to the duration base, duration multiplier and duration table declared in the metadata header.
If the DURA flag is set, each duration value is instead encoded as a V8/16/32 value and interpreted directly against the stream's master PTS/DTS timebase.
per-packet 'private' flags
Codec-private 'per packet' flags are encoded next. A field of N bits is written for every packet completed on the page. N is set in the stream metadata; N may be zero in which case no packet flags are written.
Any unused bits needed to fill out the last byte, such that the flags are written into an integral number of bytes, are set to zero. The hbytes field may not be used to 'pad' the flags fields with extra space; more than seven 'left over' bits (((hbytes + 4 - DTS bytes - sequence field bytes - distance field bytes - segment table bytes) * 8 - flag bits) > 7) shall be considered an error rendering the page undecodable.
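The packing of the per-packet fields might look like the following Python sketch for the DURA-unset case (fixed-width durations). It assumes that "big-endian byte-packer" means each byte is filled from its most significant bit downward, which the text does not spell out. `pack_fields` and `pack_per_packet` are hypothetical names, and the field widths are passed in rather than read from a stream header.

```python
from typing import Iterable, List, Sequence, Tuple


def pack_fields(fields: Iterable[Tuple[int, int]]) -> bytes:
    """Pack (value, nbits) pairs MSB first; zero-pad the final byte."""
    acc = 0
    total_bits = 0
    for value, nbits in fields:
        if value < 0 or value >> nbits:
            raise ValueError("value does not fit in its field")
        acc = (acc << nbits) | value
        total_bits += nbits
    pad = -total_bits % 8  # 0-7 unused bits, all written as zero
    acc <<= pad
    return acc.to_bytes((total_bits + pad) // 8, "big")


def pack_per_packet(delays: Sequence[int], durations: Sequence[int],
                    flags: Sequence[int], delay_bits: int,
                    duration_bits: int, flag_bits: int) -> bytes:
    """All delays first, then all durations, then all per-packet flags."""
    fields: List[Tuple[int, int]] = [(d, delay_bits) for d in delays]
    fields += [(d, duration_bits) for d in durations]
    fields += [(f, flag_bits) for f in flags]
    return pack_fields(fields)
```

Because the padding never exceeds seven bits, output produced this way cannot trip the 'left over' bits check described above.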
Data payload is byte-aligned and immediately follows the last flag byte. The size of the data payload is equal to dbytes.
Lacing codewords are as follows:
first byte
  == 0 through 251 : stop reading, use unsigned value as size of packet. Note that zero is a valid packet size.
  == 252 : read a second byte; packet size is the unsigned value of the second byte + 252.
  == 253 : read a second byte; packet size is the unsigned value of the second byte + 508.
  == 254 : read a second byte; packet size is the unsigned value of the second byte + 764.
  == 255 : read a second and third unsigned byte;
      If the second byte is <= 251, packet size is (second byte << 8) + third byte + 1020.
      If the second byte == 252, read a third byte.
          If the third byte is < 4, packet size is (second byte << 8) + third byte + 1020.
          If the third byte is >= 4, this indicates the presence of (third byte) zero-length packets in sequence. It is always more efficient to code more than three zero-length packets in sequence using this three-byte signalling; however, muxers MAY use either encoding. Demuxers MUST handle both cases.
      If the second byte > 252, this indicates a case reserved for future use; this shall render the page not decodable.
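A segment table decoder following the codewords above might be sketched in Python as below. `decode_lacing` is a hypothetical name; it takes the raw table bytes plus the 'sgmnts' and 'dbytes' header values. Treating coded lengths that add up to more than 'dbytes' as an error is an assumption of this sketch, since the text only defines the last segment as the remainder.

```python
from typing import List, Tuple


def decode_lacing(table: bytes, sgmnts: int, dbytes: int) -> Tuple[List[int], int]:
    """Return (segment sizes, number of table bytes consumed)."""
    if sgmnts == 0:
        if dbytes != 0:
            raise ValueError("null page must have dbytes == 0")
        return [], 0
    coded = sgmnts - 1  # the last segment length is implicit
    sizes: List[int] = []
    pos = 0

    def next_byte() -> int:
        nonlocal pos
        if pos >= len(table):
            raise ValueError("segment table underrun")
        value = table[pos]
        pos += 1
        return value

    while len(sizes) < coded:
        first = next_byte()
        if first <= 251:
            sizes.append(first)
        elif first <= 254:
            # 252 -> +252, 253 -> +508, 254 -> +764
            sizes.append(next_byte() + 252 + 256 * (first - 252))
        else:
            second = next_byte()
            if second > 252:
                raise ValueError("reserved lacing codeword")
            third = next_byte()
            if second == 252 and third >= 4:
                # run of 'third' zero-length packets
                if len(sizes) + third > coded:
                    raise ValueError("zero run overruns declared segments")
                sizes.extend([0] * third)
            else:
                sizes.append((second << 8) + third + 1020)
    remainder = dbytes - sum(sizes)
    if remainder < 0:
        raise ValueError("coded segment lengths exceed dbytes")
    sizes.append(remainder)
    return sizes, pos
```

With this scheme the largest explicitly coded length is (252 << 8) + 3 + 1020 = 65535, which matches the upper limit of 'dbytes'.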
V32/64 is a bit-extension format that signals the codeword length using the leading bit (LSbit of the codeword) to indicate if the codeword is 32 or 64 bits.
initial [low] bit:
  0 -> codeword is 32 bits total; upper 31 bits encode an unsigned value between 0 and 2^31 - 1
  1 -> codeword is 64 bits total; upper 63 bits encode an unsigned value between 2^31 and 2^63 + 2^31 - 1
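A minimal Python sketch of V32/64 coding follows. It makes two assumptions that should be checked against a reference implementation: the codeword is read as a little-endian integer, so the length bit is bit 0 of the first byte, and the 64-bit form stores the value minus 2^31, as implied by the ranges given above. The function names are hypothetical.

```python
import struct
from typing import Tuple

OFFSET_64 = 1 << 31  # smallest value carried by the 64-bit form (assumed)


def decode_v32_64(buf: bytes, pos: int = 0) -> Tuple[int, int]:
    """Return (value, bytes consumed) for the codeword starting at pos."""
    (low,) = struct.unpack_from("<I", buf, pos)
    if low & 1 == 0:
        return low >> 1, 4
    (word,) = struct.unpack_from("<Q", buf, pos)
    return (word >> 1) + OFFSET_64, 8


def encode_v32_64(value: int) -> bytes:
    """Encode value using the shortest form that can hold it."""
    if 0 <= value < OFFSET_64:
        return struct.pack("<I", value << 1)
    if OFFSET_64 <= value < (1 << 63) + OFFSET_64:
        return struct.pack("<Q", ((value - OFFSET_64) << 1) | 1)
    raise ValueError("value out of range for V32/64")
```

Under these assumptions a DTS below 2^31 occupies only the low word at bytes 16-19 of the page, and the high word is present only when the low bit is set.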
V8/16/32 is a bit-extension format that signals the codeword length using the leading bit[s] (LSbit[s] of the codeword) to indicate if the codeword is 8, 16 or 32 bits.
initial [low] bit[s]:
  0  -> codeword is 8 bits total; upper 7 bits encode an unsigned value between 0 and 127 (2^7 - 1)
  10 -> codeword is 16 bits total; upper 15 bits encode an unsigned value between 128 (2^7) and 32895 (2^15 + 2^7 - 1)
  11 -> codeword is 32 bits total; upper 31 bits encode an unsigned value between 32896 (2^15 + 2^7) and 2147516543 (2^31 + 2^15 + 2^7 - 1)
Syncpoints / Keyframes
Streams in which not every frame serves as a syncpoint may place only one syncpoint (keyframe) packet per page. The syncpoint packet must be the first packet begun (the first segment) and the first packet completed on the page (if any).