TransOgg

From XiphWiki

The core philosophy of building a layered container out of a basic streaming transport is unchanged from the original Ogg. Because the basic design is the same, it is best to look at transOgg from the standpoint of what changes from the original Ogg.

Transport Changes

Full stream metadata on every packet

Most containers replicate some codec data into the container layer for ease of implementation, balancing overhead and complexity against convenience. This pushes back (but does not eliminate) the need for codec-specific 'stubs' or 'packetizers'.

The original Ogg design took a 'maximally minimalist' stance on stream metadata, not replicating into the container layer any data that could be provided by a codec stub. This was not a popular design decision, to put it mildly, mainly because it pushed more of the implementation work out of the container library and onto external framework implementors.

For that reason, transOgg goes in the opposite direction. It stamps full container metadata on every packet, pushing back the need for packetizers even further and, hopefully, saving more work.
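
Because every packet carries its own stamps, a demuxer can find a decode start point without consulting the codec. The following is a minimal sketch of that idea. The per-packet record holds the fields listed in the design points (DTS, PTS, duration, delay, syncpoint distance). The `PacketStamp` type, its field names and the `find_decode_start` helper are hypothetical and do not describe transOgg's wire format. The sketch also assumes that syncpoint distance counts packets back to the most recent syncpoint, with 0 meaning the packet is itself a syncpoint.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass(frozen=True)
class PacketStamp:
    """Hypothetical per-packet container stamp (field names are illustrative)."""
    dts: int                 # decode timestamp, in the stream's timebase units
    pts: int                 # presentation timestamp, same units
    duration: int            # packet duration, same units
    delay: int               # coding delay, same units
    syncpoint_distance: int  # assumed: packets back to the last syncpoint (0 = is one)


def find_decode_start(packets: List[PacketStamp], target_pts: int) -> Optional[int]:
    """Return the index of the packet a decoder must start from in order to
    present target_pts, using only container stamps (no codec knowledge).
    Returns None if no packet covers target_pts."""
    for i, p in enumerate(packets):
        # Find the packet whose presentation interval covers the target.
        if p.pts <= target_pts < p.pts + p.duration:
            start = i - p.syncpoint_distance  # step back to its syncpoint
            if start < 0:
                raise ValueError("syncpoint lies before the first packet given")
            return start
    return None
```

For example, in a ten-packet stream with a syncpoint every fourth packet, a seek to PTS 6 starts decoding at packet 4.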

Generalized/formalized timing and interleave metadata

The timing, interleave, structure, and codec-specific fields must be fully generalized, specified and declared by the container.
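
One concrete model for container-declared timing is the Theora-style Ogg granule position. In that scheme the high bits give the index of the last keyframe and the low `shift` bits give the distance from it. The following is a minimal sketch that generalizes this pattern with a per-stream rational timebase, so positions convert to exact times. The `GranuleParams` type, its fields and the exact bit layout are assumptions for illustration. They are not the transOgg granule format.

```python
from dataclasses import dataclass
from fractions import Fraction
from typing import Tuple


@dataclass(frozen=True)
class GranuleParams:
    """Hypothetical per-stream timing declaration (illustrative names)."""
    timebase: Fraction  # seconds per granule unit, e.g. Fraction(1001, 30000)
    shift: int          # number of low bits holding the distance from the syncpoint


def pack_granule(params: GranuleParams, syncpoint_index: int, offset: int) -> int:
    """Pack a syncpoint index and the distance since it into one granule value."""
    if not 0 <= offset < (1 << params.shift):
        raise ValueError("offset does not fit in the low bits")
    return (syncpoint_index << params.shift) | offset


def unpack_granule(params: GranuleParams, granule: int) -> Tuple[int, int]:
    """Split a granule value into (syncpoint index, distance from syncpoint)."""
    return granule >> params.shift, granule & ((1 << params.shift) - 1)


def granule_to_seconds(params: GranuleParams, granule: int) -> Fraction:
    """Exact (rational) time of the unit named by a granule value."""
    syncpoint, offset = unpack_granule(params, granule)
    return (syncpoint + offset) * params.timebase
```

Using `Fraction` keeps the arithmetic exact, which matches the stated goal of positioning without approximation.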

New lacing scheme

Packet size encoding is tweaked to use a new extension pivot: rather than extending the packet size encoding from an extension value of 255, the new pivot is 252. This allows the length of any packet segment in a page to be encoded in at most three bytes. It also preserves small-packet encoding efficiency and allows signalling for runs of zero-length packets in null-packet-based variable-frame-rate (VFR) schemes.
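
To see why the pivot matters, compare lacing costs. Classic Ogg lacing writes one 255 value per full 255 bytes of packet, followed by a terminating value below 255, so the cost grows linearly with packet size. This page does not give the transOgg bit layout. The 252-pivot encoder below therefore uses a hypothetical layout, chosen only to show how a pivot below 255 frees escape values and bounds the length field at three bytes:

- values 0-251 are literal lengths;
- values 252-254 prefix one extra byte;
- value 255 prefixes two extra bytes.

Zero-packet run signalling is not modeled.

```python
from typing import Tuple

PIVOT = 252  # first escape value in this hypothetical layout


def ogg_lacing_bytes(length: int) -> int:
    """Lacing values classic Ogg spends on one packet: floor(len / 255) + 1."""
    return length // 255 + 1


def encode_len_252(length: int) -> bytes:
    """Hypothetical 252-pivot length code (illustrative, not the transOgg spec).

    0..251       -> 1 byte: the literal length
    252..1019    -> 2 bytes: escape 252..254 carries the high part
    1020..66555  -> 3 bytes: escape 255 + 16-bit big-endian remainder
    """
    if length < 0:
        raise ValueError("negative length")
    if length < PIVOT:
        return bytes([length])
    rest = length - PIVOT
    if rest < 3 * 256:
        return bytes([PIVOT + rest // 256, rest % 256])
    rest -= 3 * 256
    if rest < 1 << 16:
        return bytes([255]) + rest.to_bytes(2, "big")
    raise ValueError("segment too long for a three-byte length")


def decode_len_252(data: bytes) -> Tuple[int, int]:
    """Decode one length field; return (length, bytes consumed)."""
    b0 = data[0]
    if b0 < PIVOT:
        return b0, 1
    if b0 < 255:
        return PIVOT + (b0 - PIVOT) * 256 + data[1], 2
    return PIVOT + 3 * 256 + int.from_bytes(data[1:3], "big"), 3
```

Under this layout a 65,000-byte segment needs a 3-byte length. Classic Ogg lacing spends 255 lacing bytes on the same packet.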

Metadata

As in the original Ogg design, metadata is encapsulated within the low-level transport as a stream. Unlike the original Ogg, transOgg metadata is mandatory and necessary for stream operation. Metadata uses stream ID 0 in all links. All other stream IDs must be unique within the complete stream.
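
The stream ID rule can be checked mechanically. The following is a minimal sketch that represents a chained stream as a list of links, each link being a list of the stream IDs it contains. That representation and the `check_stream_ids` function are illustrative assumptions.

```python
from typing import Dict, Iterable, List

METADATA_STREAM_ID = 0  # metadata uses stream ID 0 in every link


def check_stream_ids(links: Iterable[List[int]]) -> List[str]:
    """Return a list of rule violations (empty if the stream IDs are valid)."""
    problems: List[str] = []
    seen: Dict[int, int] = {}  # non-metadata stream ID -> link that first used it
    for link_no, ids in enumerate(links):
        if METADATA_STREAM_ID not in ids:
            problems.append(f"link {link_no}: no metadata stream (ID 0)")
        for sid in ids:
            if sid == METADATA_STREAM_ID:
                continue  # ID 0 is expected to repeat in every link
            if sid in seen:
                problems.append(
                    f"link {link_no}: stream ID {sid} already used in link {seen[sid]}"
                )
            else:
                seen[sid] = link_no
    return problems
```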

Structural metadata

This is the essential metadata required for operation of the transOgg mux layer. As in the original Ogg, no metadata is required to capture, parse and recover packets from the page stream; however, structural metadata is required to interpret many of the values contained in the page. This data exists in the form of a header/footer pair and is mandatory.

  • Per-stream metadata, such as timing, sync/order flags, codec flags, etc.
  • Mandatory stream-global information (enumeration of codecs)
  • Per-stream codec setup information (codec headers)
  • Seeking index (optional)
  • Chaining linkage
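
Chaining linkage in the footer lets a reader enumerate concatenated links from the end of a file without bisection search. The following is a minimal sketch built on two assumptions made purely for illustration. First, each link's footer records the byte offset where that link begins. Second, the reader can fetch the footer that ends at a given offset. The `Footer` type, the `link_start` field and the `read_footer_ending_at` callback are hypothetical and are not transOgg structures.

```python
from dataclasses import dataclass
from typing import Callable, List, Tuple


@dataclass(frozen=True)
class Footer:
    """Hypothetical metadata footer: where the link it closes begins."""
    link_start: int  # byte offset of the link's first page


def enumerate_links(
    file_size: int, read_footer_ending_at: Callable[[int], Footer]
) -> List[Tuple[int, int]]:
    """Walk footers backward from the end of the file.

    Returns the (start, end) byte range of every chained link, in file order.
    Each step costs one footer read rather than a bisection search.
    """
    links: List[Tuple[int, int]] = []
    end = file_size
    while end > 0:
        footer = read_footer_ending_at(end)
        if not 0 <= footer.link_start < end:
            raise ValueError(f"bad reverse link at offset {end}")
        links.append((footer.link_start, end))
        end = footer.link_start  # the previous link ends where this one starts
    links.reverse()
    return links
```

For example, suppose footers end at offsets 1000, 2500 and 4000 and point back to 0, 1000 and 2500. Three footer reads then recover all three links.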

Semantic metadata

This is metadata used to properly present or semantically augment the data of the stream itself.

  • Stream relationships (primary/secondary angles, languages, overlays, etc.)
  • Non-linear features such as chapters
  • Fixed (header/footer) and streamable semantic metadata
  • Streamed (rolling) headers for in-band capture

Revision as of 10:42, 27 May 2010

Due to wiki-syntax limitations, this page cannot be correctly titled 'transOgg'; the 't' is lowercase.

What is transOgg?

TransOgg is an updated Ogg container (think of it as Ogg 2.0) that makes some changes to the Ogg transport layer and more directly tackles metadata. The page 'transOgg: Changes from Ogg' summarizes the major changes from the original Ogg container design. This page presents a long-winded, standalone overview of transOgg design points and rationale.

Design Points

In no particular order:

  • transOgg is designed for local storage and packet-based transport. It is arranged to minimize round-trips ('seeks' or 'random access') for potentially ultra-high-latency systems such as HTTP streaming, and to avoid any structural requirement to burst large amounts of data in a constrained-rate stream. Some intended uses include:
    • HTTP push/pull streaming (e.g., HTML5, Icecast/SHOUTcast)
    • local file storage (e.g., digital video storage and online distribution)
    • physical media (e.g., digital video distribution on optical media)
    • packet broadcast (e.g., UDP multicast, encrypted multicast)
  • transOgg is designed for variable, unpredictably sized data payloads with no minimum or maximum size.
  • The transOgg container is structurally designed for streaming (both live and progressive download). It is not possible to construct a valid transOgg stream that is unsuitable for streaming.
  • transOgg defines two stream types: continuous-time and discontinuous-time. Continuous-time streams are gapless media such as video and audio. Discontinuous-time streams are media types with unpredictably or irregularly placed data, such as subtitles and timed metadata. (expand on continuous types and how to 'suspend' the stream using null pages/null packets and duration fields)
  • transOgg metadata is structurally encapsulated into the transport stream but located at fixed, predictable positions (except for streamed metadata, which is treated as a discontinuous stream).
  • transOgg retains Ogg's non-hierarchical page structure. The new page structure is a blend of Ogg pages, Matroska clusters/blocks, and NUT packets. Achievable minimum overhead drops to under 0.04%; practical overhead improves upon NUT, Ogg and Matroska.
  • A transOgg stream always captures and begins demux within a maximum of 128 kB. Fine-grained capture is necessary for efficient streaming, seeking and scrubbing. The overhead cost of a frequent capture pattern is negligible and is fully offset by other improvements.
  • Multiplexing of multiple elementary streams is performed by interleaving at the page level. The multiplexing algorithm is fully specified, deterministic and delivers optimal buffering behavior. There is no educated guessing and no choice among multiple possible practices.
  • transOgg buffering is simple and explicitly specified.
  • transOgg implements nonlinear features such as menus, chapters, loop points, and branch points out of its linear stream transport by borrowing as completely as possible from CMML, Skeleton and Matroska's EBML metadata specification.
  • Valid transOgg streams may be concatenated to form a new, valid transOgg stream. Mandatory reverse linkage at the end of each stream eliminates the need for interpolated bisection search when opening concatenated streams. Cross-link metadata provides file-global indexing and chaptering for chained streams.
  • transOgg metadata begins and ends every stream. It is mandatory, fully specified, and part of 'container knowledge'.
    • A transOgg stream must begin with the master metadata header. This master header is the first page[s] of the physical transOgg bitstream as well as the logical master metadata stream.
    • The metadata stream is a discontinuous stream that may provide additional timed metadata and events throughout the stream, similar to NUT 'info packets'.
    • A transOgg stream must end with metadata footer page[s] that provide reverse linkage to the beginning of the stream.
    • Tags (user-contributed metadata) and Cues (the index) may appear at the head of the stream or in the footer, as may any other metadata elements that could not be known before stream end in a live stream (e.g., duration). Single-pass creation tools write these elements in the footer metadata; tools can later move them to the header metadata. All other metadata elements may appear only in the header.
  • Headerless capture, multicast, and stateless unicast MAY be supported within the metadata stream using "rolling headers", similar to the "rolling intra" mechanism proposed for the Theora video codec. This allows stream capture and playback within a bounded time period without out-of-band (OOB) transmission of headers or bitrate spikes. It also facilitates file recovery in the event the stream headers are lost.
  • Structural codec metadata, such as timebase, keyframing, coding delay, page duration, etc., is replicated in the transOgg container. Unlike Ogg (and, to a lesser degree, Matroska), no codec-specific knowledge needs to be queried or assumed in order to mux, demux, remux, repaginate, or seek in a bitstream.
  • As in NUT, every stream has its own rational timebase. The encoding used is a parameterized generalization of Ogg granule positions. The granule timebase and parameters are fully specified and declared in the container. The granule mechanism is capable of exact sample positioning without approximation. It can express the PTS and DTS of out-of-order encodings, the preroll/delay of keyframe-less codecs, and the distance from the last syncpoint.
  • All encapsulated packets are stamped with full DTS, PTS, duration, delay, and syncpoint distance.
  • Whenever possible, the transOgg specification presents a single, correct, optimal MUST behavior, and the container design seeks to make MUST behaviors structural. We avoid handwaving essential behaviors into 'best practices' documents 'to be specified later'.
  • The core transOgg container seeks to avoid optional structures, switches, code paths, and features in its framing mechanisms. Optional structures and features are acceptable (and necessary) within metadata.
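
The capture bound in the design points can be stated as a property that a muxer or validator could check: from any byte position, the next capture point must lie within the bound. The following is a minimal sketch that represents capture points as byte offsets and reads "128 kB" as 128 KiB (131,072 bytes). Both choices and the `max_capture_gap`/`meets_capture_bound` helpers are illustrative assumptions. The sketch models only finding a capture point, not acquiring the metadata needed to begin demux.

```python
from typing import Sequence

CAPTURE_BOUND = 128 * 1024  # assumption: "128 kB" read as 128 KiB


def max_capture_gap(capture_offsets: Sequence[int], file_size: int) -> int:
    """Upper bound on the bytes a reader may scan, from any starting
    position, before reaching a capture point or the end of the file."""
    points = [0, *sorted(capture_offsets), file_size]
    return max(b - a for a, b in zip(points, points[1:]))


def meets_capture_bound(capture_offsets: Sequence[int], file_size: int,
                        bound: int = CAPTURE_BOUND) -> bool:
    """True if no stretch of the file without a capture point exceeds the bound."""
    return max_capture_gap(capture_offsets, file_size) <= bound
```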