TransOgg Page

From XiphWiki

TransOgg Page Primitive

The TransOgg transport consists of a single encapsulation primitive, based on the original Ogg page. Pages are byte-aligned; in the diagram below, bytes are encoded left to right, top to bottom. Quantities larger than one byte are encoded in little-endian byte order.

          01234567 01234567 01234567 01234567
             0        1        2        3
   --    |-------- -------- -------- --------|
        0|          capture pattern          |3
         |--------|-------- -------- --------|
        4| sgmnts |   stream identification  |7
         |------|-|--------|-------- --------|
   20   8| flag |  hbytes  |      dbytes     |11
         |------|- --------|-------- --------|
       12|          32 bit checksum          |15
         |-------- -------- -------- --------|
       16|          DTS [low] word           |19
   --    |-------- -------- -------- --------|
       20?           DTS high word           |
         |-------- -------- -------- --------
         ? sequence...
 hbytes  ? distance...
         ? segment table...
         ? delay... duration... ppflags...
   --    |--------
 dbytes  ? data payloads...
   --    |--------                   

capture pattern

The capture pattern consists of four 7-bit-clean ASCII bytes: 'tOgS' (0x74, 0x4F, 0x67, 0x53 in order).

segment count

'sgmnts' (8 bits, 0-255) indicates the number of packet segments encoded in this page. The first and/or last packets may be partial as specified by the FROM/TO flags (below).

stream identification

The stream identification is a 24-bit pseudo-random number that uniquely identifies this media stream within the larger multiplexed stream. It must be unique both within the current multiplexed section and across the whole of a chained stream. The 24-bit ID is deliberately large so that it behaves like a weak hash: it should be highly unlikely that a stream's ID number must be rewritten (and all of its pages re-checksummed) when multiplexing or concatenating.


flags

'flags' defines seven bit flags (bits 0-6 of byte 8) as follows:

  0 FROM:   set == initial packet continued from previous page
                   note: unset if page contains no packets
  1 TO  :   set == final packet continued on next page
                   note: unset if page contains no packets
  2 CRC :   set == checksum applies to header and data
          unset == checksum applies to header fields only
  3 SYNC:   set == payload data begins with a syncpoint/keyframe
                   note: always set for keyframeless codecs
                   note: set if a keyframe/syncpoint packet is continued
                         onto the current page
  4 SEQ :   set == sequence field is present
  5 DURA:   set == full raw duration encoding present 
  6 EVIL:          as specified in RFC 3514
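
As an illustration only, the flag bits listed above can be modelled with a small enumeration. The names `PageFlag` and `decode_flags` are hypothetical, and the sketch assumes that "bit 0" means the least significant bit of byte 8, which the page does not state explicitly.

```python
from enum import IntFlag


class PageFlag(IntFlag):
    """Flag bits 0-6 of header byte 8 (assumption: bit 0 is the LSB)."""
    FROM = 1 << 0  # initial packet continued from previous page
    TO = 1 << 1    # final packet continued on next page
    CRC = 1 << 2   # checksum covers header and data
    SYNC = 1 << 3  # payload begins with a syncpoint/keyframe
    SEQ = 1 << 4   # sequence field is present
    DURA = 1 << 5  # full raw duration encoding present
    EVIL = 1 << 6  # as specified in RFC 3514


def decode_flags(byte8: int) -> PageFlag:
    """Mask off bit 7 (which belongs to 'hbytes') and return the flags."""
    return PageFlag(byte8 & 0x7F)
```

For example, `decode_flags(0x09)` yields `PageFlag.FROM | PageFlag.SYNC` under this bit-numbering assumption.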

header variable fields bytes

'hbytes' (bit 7 of byte 8 and bits 0-7 of byte 9 for 9 bits total, 0-511) indicates the number of bytes spanned by the variable-length header fields (DTS high word, sequence, distance, segment table, and delay/duration/ppflags fields).

data payload bytes

'dbytes' (16 bits, 0-65535) indicates the number of bytes of data payload.
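
A minimal sketch of reading the 20-byte fixed header laid out in the diagram. `FixedHeader` and `parse_fixed_header` are hypothetical names. Two points are assumptions rather than statements from this page: that bytes 8-9 form one little-endian 16-bit word whose low 7 bits are the flags and whose upper 9 bits are 'hbytes', and that the ASCII literal 'tOgS' is the authoritative capture pattern.

```python
import struct
from typing import NamedTuple


class FixedHeader(NamedTuple):
    sgmnts: int
    stream_id: int
    flags: int
    hbytes: int
    dbytes: int
    checksum: int
    dts_low: int


def parse_fixed_header(page: bytes) -> FixedHeader:
    """Parse bytes 0-19 of a page; raise ValueError if it cannot be a page."""
    if len(page) < 20:
        raise ValueError("need at least 20 bytes for the fixed header")
    if page[0:4] != b"tOgS":
        raise ValueError("capture pattern not found")
    sgmnts = page[4]
    stream_id = int.from_bytes(page[5:8], "little")  # 24-bit ID
    # Assumption: flags are bits 0-6 and hbytes bits 7-15 of this LE word.
    word = int.from_bytes(page[8:10], "little")
    flags = word & 0x7F
    hbytes = word >> 7
    # dbytes (16 bits), checksum (32 bits), DTS low word (32 bits).
    dbytes, checksum, dts_low = struct.unpack_from("<HII", page, 10)
    return FixedHeader(sgmnts, stream_id, flags, hbytes, dbytes,
                       checksum, dts_low)
```

Under these assumptions the whole page occupies 20 + hbytes + dbytes bytes, so a demuxer could use the returned values to locate the start of the next page.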


checksum

The checksum is a 32-bit CRC value (direct algorithm, initial value and final XOR = 0, generator polynomial = 0x04c11db7) encoded in the page header in little-endian format. The checksum is computed over the 20+hbytes header bytes, skipping the CRC bytes. When the CRC flag is set, the CRC continues over the entire page body (dbytes).
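
The checksum described above could be computed roughly as follows. This is a sketch, not a reference implementation: `crc32_direct` and `page_checksum` are hypothetical names, and "skipping the CRC bytes" is interpreted here as leaving bytes 12-15 out of the input entirely (rather than treating them as zero), which is an assumption.

```python
POLY = 0x04C11DB7


def _make_table():
    """Build the 256-entry table for the direct (MSB-first) CRC-32."""
    table = []
    for byte in range(256):
        reg = byte << 24
        for _ in range(8):
            reg = ((reg << 1) ^ POLY) if reg & 0x80000000 else (reg << 1)
            reg &= 0xFFFFFFFF
        table.append(reg)
    return table


_TABLE = _make_table()


def crc32_direct(data: bytes, crc: int = 0) -> int:
    """CRC-32, polynomial 0x04c11db7, initial value 0, no final XOR."""
    for b in data:
        crc = ((crc << 8) & 0xFFFFFFFF) ^ _TABLE[(crc >> 24) ^ b]
    return crc


def page_checksum(page: bytes, hbytes: int, dbytes: int,
                  crc_flag: bool) -> int:
    """Checksum over the header (minus bytes 12-15) and, if the CRC
    flag is set, over the data payload as well."""
    header_end = 20 + hbytes
    crc = crc32_direct(page[:12])
    crc = crc32_direct(page[16:header_end], crc)
    if crc_flag:
        crc = crc32_direct(page[header_end:header_end + dbytes], crc)
    return crc
```

Passing the running value back in as `crc` lets the computation continue across the gap left by the checksum field without copying the page.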

delivery/decode-time stamp

The DTS field is a variable-length encoded delivery time stamp value, equivalent to the high bits of the granule position in the original Ogg container. The DTS value is encoded in V32/64 format and is either 4 or 8 bytes in total.

sequence field

The sequence field is present when the SEQ flag is set; this field orders any sequence of pages that have the same DTS, such as pages without complete packets, or pages containing only packets with zero duration. The sequence value is encoded in V8/16/32 format. The first page in a sequence of pages with identical DTS does not set the SEQ flag. The second page in a sequence sets the SEQ flag and the sequence value to zero. The third page in a sequence sets the SEQ flag and the sequence value to one, etc.
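
The numbering rule above can be sketched as a small helper that, given the DTS of each page in stream order, reports whether SEQ is set and which sequence value is written. `assign_sequence` is a hypothetical name; the V8/16/32 encoding of the value itself is not covered here.

```python
def assign_sequence(page_dts):
    """Return a list of (seq_flag, seq_value) pairs, one per page.

    The first page with a given DTS has SEQ unset (value None); the
    second has SEQ set with value 0, the third value 1, and so on.
    """
    result = []
    prev = None
    run = 0  # number of earlier consecutive pages sharing this DTS
    for dts in page_dts:
        run = run + 1 if dts == prev else 0
        if run == 0:
            result.append((False, None))
        else:
            result.append((True, run - 1))
        prev = dts
    return result
```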

distance field

The distance field is present only if the SYNC flag is unset. It is equal to the DTS of the current page, minus the DTS of the previous syncpoint _packet_ (not page), minus one. The value is encoded in V8/16/32 format (see below).

segment table

Lacing values encode the length of each payload segment in the page into the segment table. Lacing values are coded in one, two or three bytes. Lacing values are coded until the total number of coded segment lengths == 'sgmnts'-1. The length of the last segment is implicit, equalling the unencoded remainder of 'dbytes', which may be zero.

Overrunning the number of declared segments (i.e., a zero run encodes past the 'sgmnts'-1 limit), or underrunning the expected number of segments (i.e., reading to the end of hbytes before seeing the expected number of segments) shall be considered an error condition rendering the page undecodable. Lacing values may encode no more than 255 segments total (including the implicit last segment), null or otherwise, in a single page.

If 'sgmnts' is zero, the page is a null-page containing no data. 'dbytes' must also be zero.

per-packet fields

the delay, duration and per-packet flags are collectively byte-aligned and fill out the remainder of the 'hbytes' span not filled by the DTS, sequence, distance and lacing fields. Within this span, the individual delay, duration and flag fields are bit-aligned using a big-endian byte-packer.

delay values

Delay values are written first. A field of N bits is written for every packet completed on the page. N is set in the stream metaheader; N may be zero, in which case no delay values are written. The value encoded is the PTS minus the DTS of the packet. The value is unsigned (non-negative).

duration values

Duration values are written next, bit aligned to the end of the delay values. A duration value is written for each packet completed on the page.

If the DURA flag is unset, each duration value is written as an N bit quantity, where N is set in the metadata header. N may be zero. The value as written is interpreted according to the duration base, duration multiplier and duration table declared in the metadata header.

If the DURA flag is set, each duration value is instead encoded as a V8/16/32 value (see below) and interpreted directly against the stream's master PTS/DTS timebase.

per-packet 'private' flags

Codec-private 'per packet' flags are encoded next. A field of N bits is written for every packet completed on the page. N is set in the stream metadata; N may be zero in which case no packet flags are written.
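
To make the bit layout of the three per-packet fields concrete, here is a sketch of a reader for the case where the DURA flag is unset. `BitReader` and `read_per_packet_fields` are hypothetical names, and "big-endian byte-packer" is assumed to mean that bits are consumed most-significant-bit first within each byte. The field widths would come from the stream metadata.

```python
class BitReader:
    """Reads unsigned fields MSB-first from a byte string."""

    def __init__(self, data: bytes):
        self.data = data
        self.pos = 0  # current bit offset from the start of data

    def read(self, nbits: int) -> int:
        if self.pos + nbits > len(self.data) * 8:
            raise ValueError("read past end of per-packet field span")
        value = 0
        for _ in range(nbits):
            byte = self.data[self.pos >> 3]
            bit = (byte >> (7 - (self.pos & 7))) & 1
            value = (value << 1) | bit
            self.pos += 1
        return value


def read_per_packet_fields(span, packets, delay_bits, duration_bits,
                           flag_bits):
    """Return (delays, durations, flags) for the completed packets.

    All delay values come first, then all durations, then all flags,
    each group bit-aligned to the end of the previous one. A width of
    zero yields zeros without consuming any bits.
    """
    reader = BitReader(span)
    delays = [reader.read(delay_bits) for _ in range(packets)]
    durations = [reader.read(duration_bits) for _ in range(packets)]
    flags = [reader.read(flag_bits) for _ in range(packets)]
    return delays, durations, flags
```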


Any unused bits needed to fill out the last byte, so that the flags are written into an integral number of bytes, are set to zero. The hbytes field may not be used to 'pad' the flags fields with extra space; more than seven 'left over' bits (((hbytes + 4 - DTS bytes - sequence field bytes - distance field bytes - segment table bytes) * 8 - flag bits) > 7) shall be considered an error rendering the page undecodable.
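
The padding rule can be checked with simple arithmetic, sketched below. `check_padding` is a hypothetical name. The sketch assumes that "flag bits" in the expression counts every bit written in the per-packet span (delay, duration and private flags together), which the text does not spell out.

```python
def check_padding(hbytes, dts_bytes, seq_bytes, dist_bytes, table_bytes,
                  field_bits):
    """Return the number of zero padding bits, or raise ValueError.

    dts_bytes is 4 or 8; the '+ 4' accounts for the DTS low word, which
    sits in the fixed header and is not counted in hbytes.
    """
    span_bytes = (hbytes + 4 - dts_bytes - seq_bytes - dist_bytes
                  - table_bytes)
    leftover = span_bytes * 8 - field_bits
    if leftover < 0:
        raise ValueError("per-packet fields do not fit in hbytes")
    if leftover > 7:
        raise ValueError("hbytes pads the per-packet fields")
    return leftover
```

For instance, with hbytes = 10, an 8-byte DTS, one sequence byte, one distance byte and a two-byte segment table, two bytes remain for the per-packet fields; 12 bits of fields then leave 4 padding bits, which is acceptable, whereas 4 bits of fields would leave 12 and make the page undecodable.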

data payload

Data payload is byte-aligned and immediately follows the last flag byte. The size of the data payload is equal to dbytes.

Value encodings

Lacing codewords

Lacing codewords are as follows:

  first byte == 0 through 
                251 : stop reading, use unsigned value as sizeof packet.  
                      Note that zero is a valid packet size.
             == 252 : read a second byte; packet size is the unsigned value
                      of the second byte + 252.
             == 253 : read a second byte; packet size is the unsigned value
                      of the second byte + 508.
             == 254 : read a second byte; packet size is the unsigned value
                      of the second byte + 764.
             == 255 : read a second and third unsigned byte;
                      If the second byte is < 251, packet size is
                         (second byte << 8) + third byte + 1020.
                      If the second byte == 252, read a third byte.
                         If the third byte is < 4, packet size is
                            (second byte << 8) + third byte + 1020.
                         If the third byte is >= 4, this indicates the 
                            presence of (third byte) zero-length packets 
                            in sequence.  It is always more efficient to 
                            code more than three zero-length packets in 
                            sequence using this three-byte signalling, 
                            however muxers MAY use either encoding.  
                            Demuxers MUST handle both cases.
                      If the second byte > 252, this indicates a case reserved
                      for future use; this shall render the page not 

V32/64 format

V32/64 is a simple bit-extension format that uses a single leading bit (LSbit of the codeword) to indicate if the codeword is 32 or 64 bits.

initial bit: 0 -> codeword is 32 bits; upper 31 bits encode an unsigned value between 0 and 2^31-1
            1 -> codeword is 64 bits; upper 63 bits encode an unsigned value between 2^31 and 2^63 + 2^31 -1
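
A sketch of V32/64 encoding and decoding follows. `encode_v32_64` and `decode_v32_64` are hypothetical names. Two assumptions are made: the codeword is stored little-endian like the other multi-byte quantities, and the 64-bit form stores the value minus 2^31 in its upper 63 bits, as the stated range suggests.

```python
BIAS = 1 << 31


def encode_v32_64(value: int) -> bytes:
    """Encode value as a 4- or 8-byte little-endian codeword."""
    if value < 0:
        raise ValueError("value must be non-negative")
    if value < BIAS:
        return (value << 1).to_bytes(4, "little")  # LSbit 0: 32 bits
    biased = value - BIAS
    if biased >= 1 << 63:
        raise ValueError("value too large for V32/64")
    return ((biased << 1) | 1).to_bytes(8, "little")  # LSbit 1: 64 bits


def decode_v32_64(buf: bytes):
    """Return (value, bytes_consumed) for the codeword at buf[0:]."""
    if len(buf) < 4:
        raise ValueError("need at least 4 bytes")
    low = int.from_bytes(buf[:4], "little")
    if low & 1 == 0:
        return low >> 1, 4
    if len(buf) < 8:
        raise ValueError("64-bit codeword is truncated")
    word = int.from_bytes(buf[:8], "little")
    return (word >> 1) + BIAS, 8
```

For the DTS, the low word occupies header bytes 16-19 and the high word, when present, is the first variable-length field, so under these assumptions the codeword is the contiguous slice of the page starting at byte 16.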

Syncpoints / Keyframes

Streams in which not every frame serves as a syncpoint may place only one syncpoint (keyframe) packet per page. The syncpoint packet must be the first packet completed on the page (if any).
