transOgg

From XiphWiki

(Difference between revisions)
m (formatting)
m (I lied (it didn't work correctly under Preview))
 
(14 intermediate revisions not shown)
Line 1: Line 1:
-
Due to limitations of wiki-syntax, it should be noted that this page cannot be correctly titled 'transOgg'.  The 't' is lowercase.
+
{{DISPLAYTITLE:transOgg}}
 +
[[Image:proposed-transogg-logo.png|415px|right]]
= What is transOgg? =
= What is transOgg? =
-
TransOgg is an updated Ogg container (think of it as Ogg 2.0) that makes some changes to the
+
For a long time there have been discussions of what we in Xiph would change in the Ogg container once we considered it appropriate to break spec. transOgg is an updated Ogg container (ie Ogg v 2) that makes some changes to the
-
Ogg transport layer and more directly tackles metadata.  The core philosophy of
+
Ogg transport layer and more directly tackles metadata.  [[TransOggChangesFromOgg|transOgg: Changes from Ogg]] summarizes the major changes from the original Ogg container design.
-
building a layered container out of a basic streaming transport is
+
This page presents an overview of nebulous transOgg design points and rationale.
-
unchanged.  It's best to look at transOgg from the standpoint of what
+
-
changes from the original Ogg as the basic design is the same.
+
-
== Transport Changes ==
+
As of today, transOgg exists only in the form of the whitepapers and structure proposals here.  This spec is only in the very early stages of being written.  No code exists as yet.
-
=== Full stream metadata on every packet ===
+
= Design Points =
-
Most containers replicate some codec data into the container layer for ease of implementation, balancing overhead/complexity with convenience.  This pushes back (but does not eliminate) the need for a codec-specific 'stubs' or 'packetizers'.
+
In no particular order:
-
The original Ogg design took a 'maximally minimalist' stance on stream
+
* transOgg is designed for local storage and packet-based transport. Some intended uses include:
-
metadata, not replicating any data into the container layer that could
+
** HTTP push/pull streaming (eg, HTML5, icecast/shoutcast).
-
be provided by a codec stub. This was not a popular design decision
+
** local file storage (eg, digital video storage and online distribution)
-
to put it mildly, mainly because it pushed more of the implementation
+
** physical media (eg, digital video distribution on optical media)
-
work out of the container lib and onto external framework implementors.
+
** packet broadcast (eg, UDP multicast, encrypted multicast)
-
For that reason, transOgg goes the opposite direction.  It stamps full
+
* transOgg is designed for variable, unpredictably sized data payloads with no minimum or maximum size.
-
container metadata on every packet, pushing back the need for
+
-
packetizers even further, and hopefully saving more work.
+
-
=== Generalized/formalized timing and interleave metadata ===
+
* The transOgg container is structurally designed for streaming (both live and progressive download).  It is not possible to construct a valid transOgg stream that is unsuitable for streaming.
-
The timing, interleave, structure, and codec-specific fields must be
+
* transOgg defines two stream types: continuous-time and discontinuous-time.  Continuous-time streams are gapless media such as video and audio. Discontinuous-time streams are media types with unpredictably or irregularly placed data, such as subtitles and timed metadata.
-
fully generalized, specified and declared by the container.
+
-
=== New lacing scheme ===
+
* transOgg metadata is structurally encapsulated into the transport stream but located at fixed, predictable positions (excepting streamed metadata, which are treated as a discontinuous stream).
-
Packet size encoding is tweaked to use a new extension pivot; rather
+
* transOgg retains Ogg's flat page structure. The new/tweaked page primitive is a blend of Ogg pages, Matroska clusters/blocks and NUT packets.  Achievable minimum overhead drops to under .04%; practical overhead improves upon NUT, Ogg and Matroska.
-
than extending the packet size encoding from an extension value of
+
-
255, the new pivot is 252.  This allows the length of any packet
+
-
segment in a page to be encoded in at most three bytes, preserves
+
-
small-packet encoding efficiency and also allows signalling for runs
+
-
of zero-packets in null-packet based VFR schemes.
+
-
== Metadata ==
+
* A transOgg stream always captures and begins demux within 128kB maximum. Fine-grained capture is necessary for efficient streaming, seeking and scrubbing.  The overhead tradeoff of a frequent capture pattern is negligible and fully offset by other improvements.
-
As in the original Ogg design, metadata is encapsulated within the
+
* Multiplexing of multiple elementary streams is performed by interleaving at the page level.  The multiplexing algorithm is fully specified, deterministic and delivers optimal buffering behavior. There is no educated guessing or multiple possible practices.
-
low-level transport as a stream.  Unlike the original Ogg, transOgg
+
-
metadata is mandatory and necessary for stream operation. Metadata
+
-
uses stream ID 0 in all links.  All other stream IDs must be unique to
+
-
the complete stream.
+
-
=== Structural metadata ===
+
* transOgg buffering is simple and explicitly specified.
-
This is the essential metadata required for operation of the
+
* transOgg implements nonlinear features such as menus, chapters, loop points, and branch points out of its linear stream transport by borrowing from CMML, Skeleton and Matroska's EBML metadata specification.
-
transOgg's mux layer.  As in the original Ogg, no metadata is required
+
-
to capture, parse and recover packets from the page stream, however
+
-
structural metadata is required to interpret many of the values
+
-
contained in the page.  This data exists in the form of a
+
-
header/footer pair and is mandatory.
+
-
* Per-stream metadata, such as timing, sync/order flags, codec flags, etc.
+
* Valid transOgg streams may be concatenated to form a new, valid transOgg stream. Mandatory reverse linkage at the end of each stream eliminates the need for interpolated bisection search when opening concatenated streams.  Cross-link metadata provides file-global indexing and chaptering for chained streams.
-
* Mandatory stream-global information (enumeration of codecs)
+
-
* Per-stream codec setup information (codec headers)
+
-
* seeking index (optional)
+
-
* Chaining linkage
+
-
=== Semantic metadata ===
+
* transOgg metadata begins and ends every stream.  Metadata is mandatory, fully specified, and part of 'container knowledge'.
 +
** A transOgg stream must begin with the master metadata header.  This master header is the first page[s] of the physical transOgg bitstream as well as the logical master metadata stream.
 +
** The metadata stream is a discontinuous stream that may  provide additional timed metadata and events throughout the stream,  similar to NUT 'info packets'.
 +
** A transOgg stream must end with metadata footer page[s] that provide, among other things, reverse linkage to the beginning of the stream.
 +
** Tags (user contributed metadata) and Cues (the index)  may appear at the head of the stream or in the footer, as may any  other metadata elements that could not be known before stream end in  a live stream (eg, duration).  Single-pass creation tools write these elements in the footer metadata.  Tools can later move these elements to the header metadata.  All other metadata elements may  appear only in the header.
-
This is metadata used to properly present or semantically augment the
+
* Headerless capture, multicast, and stateless unicast MAY be supported within the metadata stream using "rolling headers", similar to the "rolling intra" mechanism proposed for the Theora video codec. This allows stream capture and playback in a bounded time period without OOB transmission of headers or bitrate spikes.  It also facilitates file recovery in the event the stream headers are lost.
-
data of the stream itself.
+
-
* Stream relationships (primary/secondary angles, languages, overlays, etc)
+
* Structural codec metadata, such as timebase, keyframing, coding delay, page duration, etc, are replicated in the transOgg container.  Unlike Ogg (and to a lesser degree Matroska), no knowledge must be queried or assumed based on the specific codecs in use in order to mux, demux, remux, repaginate, transmux, or seek in a bitstream.
-
* Non-linear features such as chapters
+
-
* Fixed (header/footer) and streamable semantic metadata
+
-
* Streamed (rolling) headers for in-band capture
+
-
= transOgg Design Points =
+
* As in NUT, all streams have their own rational timebase.  The encoding used is a parameterized generalization of Ogg granule positions. The granule timebase and parameters are fully specified and declared in the container. The granule mechanism is capable of exact sample positioning without approximation, expressing PTS and DTS of out-of-order encodings, preroll/delay of keyframe-less codecs, and distance from last syncpoint.
-
In no particular order:
+
* All encapsulated packets are stamped with full DTS, PTS, duration, delay, and syncpoint distance.
-
* [discuss] transOgg is designed for local storage and packet-based
+
* Whenever possible, the transOgg specification presents a single, correct, optimal MUST behavior.  Whenever possible, the container design seeks to make MUST behaviors structural.  We avoid handwaving essential behaviors into 'best practices' documents 'to be specified later'.
-
transport.  It is arranged to minimize round-trips ('seeks' or 'random
+
-
access') for potentially ultra-high latency systems like HTTP
+
-
streaming, as well as avoid any structural requirement to burst large
+
-
amounts of data in a constrained-rate stream.
+
-
Some intended uses include:
+
* The core transOgg container seeks to avoid optional structures, switches, code paths, and features in its framing mechanisms. Optional structures and features are acceptable (and necessary) within metadata.
-
 
+
-
** HTTP push/pull streaming (eg, HTML5, icecast/shoutcast).
+
-
** local file storage (eg, digital video storage and online distribution)
+
-
** physical media (eg, digital video distribution on optical media)
+
-
** packet broadcast (eg, UDP multicast, encrypted multicast)
+
-
*[discuss] transOgg is designed for variable, unpredictably sized
+
= High level design =
-
data payloads with no minimum or maximum size.
+
-
*[discuss] The transOgg container is structurally designed for
+
The high level transOgg design consists of a transport, metadata, and
-
streaming (both live and progressive download).  It is not possible to
+
specified practices.  These pieces are conceptually separable, but the
-
construct a valid transOgg stream that is unsuitable for streaming.
+
container cannot succeed missing any one.
-
*[discuss] transOgg defines two stream types: continuous-time and
+
== Transport ==
-
discontinuous-time.  Continuous-time streams are gapless media such as
+
-
video and audio. Discontinuous-time streams are media types with
+
-
unpredictably or irregularly placed data, such as subtitles and timed
+
-
metadata.
+
-
(expand on continuous types and how to 'suspend' the stream using null
+
Transport is the mechanism of encapsulating and delivering data.
-
pages/null packets and duration fields)
+
transOgg uses a modified/updated Ogg page mechanism for data and
 +
metadata delivery.
 +
 
 +
Transport benefits from a simple, fixed encoding.  Optional features,
 +
arbitrary extensibility, recursive or non-flat hierarchy, and
 +
conditional semantic encoding are undesirable complications in a low
 +
level transport and should be used only when clearly advantageous or
 +
unavoidable.  Specifying transport as a self-contained layer also
 +
separates correct transport behavior and corner cases from the rest of
 +
the container behavior.
 +
 
 +
Raw A/V media is fundamentally time-linear in atomic form.  Networks
 +
and storage media deliver data for consumption in a time-linear stream
 +
of bytes.  Both suggest that a linear encoding is optimal for the
 +
low-level encapsulation.  Metadata can build non-linear presentation
 +
from linear segments.  Nonlinear structural metadata appears at the
 +
beginning and end of the stream; as such, this metadata can also be
 +
placed in the linear transport easily as the beginning and end of
 +
data.  Encapsulating metadata in the transport like the streaming data
 +
also makes it trivial to support streamed metadata and 'rolling
 +
headers' using preexisting transport mechanisms. (*-- discuss both
 +
chaining and multi-segment; metadata that can reach across segments?
 +
etc?)
 +
 
 +
== Metadata ==
-
*[discuss] transOgg metadata is structurally encapsulated into the
+
Metadata is everything in a stream/file that is not the media stream
-
transport stream but located at fixed, predictable positions
+
itself.  transOgg proposes to use a packed encoding for metadata types unlikely to see much flux, and an extensibly-structured encoding for more free-form types (eg, Matroska-style metadata in an EBML
-
(excepting streamed metadata, which are treated as a discontinuous
+
encoding for stream tagging).
-
stream).
+
-
*[discuss] transOgg retains Ogg's non-hierarchical page structure.
+
Metadata encompasses a number of semantically quite different
-
The new page structure is a blend of Ogg pages, Matroska
+
concepts, eg:
-
clusters/blocks and NUT packets.  Achievable minimum overhead drops to
+
-
under .04%; practical overhead improves upon NUT, Ogg and Matroska.
+
-
*[discuss] A transOgg stream always captures and begins demux within
+
* 1: data about how the individual streams are encoded and encapsulated (codec id, timebase, continuous/discontinuous encoding, codec private data, etc). This metadata is essential to base container operation and must function as container knowledge.  It is always located in a fixed position at the beginning of the file as it must be read to bootstrap container operation.
-
128kB maximum. Fine-grained capture is necessary for efficient
+
-
streaming, seeking and scrubbing.  The overhead tradeoff of a frequent
+
-
capture pattern is negligible and fully offset by other improvements.
+
-
*[discuss] Multiplexing of multiple elementary streams is performed by
+
* 2: data about navigating the file as it's currently arranged (linkages, indexing, chapters).  This data is either essential to high-level container operation or essential to the application depending on how the implementation abstractions work out.
-
interleaving at the page level.  The multiplexing algorithm is fully
+
-
specified, deterministic and delivers optimal buffering behavior.
+
-
There is no educated guessing or multiple possible practices.
+
-
*[discuss] transOgg buffering is simple and explicitly specified.
+
* 3: data about how the streams are presented for playback (language, primary angle, available soundtrack languages, menus).  This data is needed by the application.
-
*[discuss] transOgg implements nonlinear features such as menus,
+
* 4: user-supplied comments, one-shot auxiliary data (tags, album art). This data is needed by the application and the user.
-
chapters, loop points, and branch points out of its linear stream
+
-
transport by borrowing as completely as possible from CMML, Skeleton
+
-
and Matroska's EBML metadata specification.
+
-
*[discuss] Valid transOgg streams may be concatenated to form a new,
+
Each kind of metadata shares some basic traits.  It is hierarchical,
-
valid transOgg stream.  Mandatory reverse linkage at the end of each
+
largely conditional, and benefits from a rich stable of optional
-
stream eliminates the need for interpolated bisection search when
+
elements to be used as appropriate.  It is also likely that aside from
-
opening concatenated streams.  Cross-link metadata provides
+
the MUST elements required for playback (mostly from list 1), not all
-
file-global indexing and chaptering for chained streams.
+
metadata will be interesting to all players.  An obvious use case is a
 +
memory and CPU constrained mobile device with no bitmapped display
 +
which would want to entirely ignore/skip large album art chunks.
-
*[discuss] transOgg metadata begins and ends every stream.  It is
+
== Specified practices ==
-
mandatory, fully specified, and part of 'container knowledge'.
+
-
 
+
-
**[discuss] A transOgg stream must begin with the master metadata
+
-
  header.  This master header is the first page[s] of the physical
+
-
  transOgg bitstream as well as the logical master metadata stream.
+
-
+
-
**[discuss] The metadata stream is a discontinuous stream that may
+
-
  provide additional timed metadata and events throughout the stream,
+
-
  similar to NUT 'info packets'.
+
-
**[discuss] A transOgg stream must end with metadata footer page[s]
+
Specified practices provide the instruction manual for proper use of
-
  that provide reverse linkage to the beginning of the stream.
+
the container system in all cases where the container structure allows
 +
multiple behavioral or encoding possibilities.  'best practices' is a
 +
more common term, however 'best' connotes that such guidelines are
 +
merely suggestive.  Specified practices are effectively MUST clauses
 +
that govern proper behavior rather than valid data.
-
**[discuss] Tags (user contributed metadata) and Cues (the index)
+
Specified practices are an especially weak point of current FOSS
-
  may appear at the head of the stream or in the footer, as may any
+
container offerings.  Developers of open projects often value a system
-
  other metadata elements that could not be known before stream end in
+
that offers many equivalent ways of performing a task, and leave it to
-
  a live stream (eg, duration).  Single-pass creation tools write these
+
higher-level developers (or worse, the user) to somehow make an
-
  elements in the footer metadata.  Tools can later move these
+
informed decision of how to proceed.  In the container space, this is
-
  elements to the header metadata.  All other metadata elements may
+
undesirable bordering on disaster; 'more' is 'less'.
-
  appear only in the header.
+
-
*[discuss] Headerless capture, multicast, and stateless unicast MAY
+
To choose an absurd example, it might be technologically nifty to be
-
be supported within the metadata stream using "rolling headers",
+
able to place an index into the middle of a file, but what could it
-
similar to the "rolling intra" mechanism proposed for the Theora video
+
possibly accomplish?  Absurd flexibility results in absurd bugs.  When
-
codec. This allows stream capture and playback in a bounded time
+
absurd or clearly suboptimal choices are structurally possible, the
-
period without OOB transmission of headers or bitrate spikes.  It also
+
spec should not be silent on the subject.  The spec MUST explicitly
-
facilitates file recovery in the event the stream headers are lost.*
+
address allowed and disallowed behavior to the most complete degree
 +
achievable.
-
*[discuss] Structural codec metadata, such as timebase, keyframing,
+
= Proposals / Commentary Requested =
-
coding delay, page duration, etc, are replicated in the transOgg
+
-
container.  Unlike Ogg (and to a lesser degree Matroska), no knowledge
+
-
must be queried or assumed based on the specific codecs in use in
+
-
order to mux, demux, remux, repaginate, or seek in a bitstream.
+
-
*[discuss] As in NUT, all streams have their own rational
+
transOgg is a specification in the early stages of design. As such, we're soliciting feedback on specific design proposals below, with more to be added over time as the design process approaches new milestones.
-
timebase.  The encoding used is a parameterized generalization of Ogg
+
-
granule positions. The granule timebase and parameters are fully
+
-
specified and declared in the container. The granule mechanism is
+
-
capable of exact sample positioning without approximation, expressing
+
-
PTS and DTS of out-of-order encodings, preroll/delay of keyframe-less
+
-
codecs, and distance from last syncpoint.
+
-
*[discuss] All encapsulated packets are stamped with full DTS, PTS,
+
* [[TransOgg_Seeking_Proposals]]: Three proposed seeking mechanism variants for transOgg.
-
duration, delay, and syncpoint distance.
+
-
*[discuss] Whenever possible, the transOgg specification presents a single,
+
= Specification =
-
correct, optimal MUST behavior.  Whenever possible, the container
+
-
design seeks to make MUST behaviors structural.  We avoid handwaving
+
-
essential behaviors into 'best practices' documents 'to be specified
+
-
later'.
+
-
*[discuss]the core transOgg container seeks to avoid optional structures,
+
* [[TransOgg_Page]]: Specification of transOgg page primitive
-
switches, code paths, and features in its framing mechanisms.
+
* [[TransOgg_Transport]]: Specification of transOgg stream
-
Optional structures and features are acceptable (and necessary) within
+
* [[TransOgg_Metadata]]: Specification of transOgg stream metadata
-
metadata.
+

Latest revision as of 16:32, 28 September 2012

What is transOgg?

For a long time there have been discussions of what we in Xiph would change in the Ogg container once we considered it appropriate to break spec. transOgg is an updated Ogg container (ie Ogg v 2) that makes some changes to the Ogg transport layer and more directly tackles metadata. transOgg: Changes from Ogg summarizes the major changes from the original Ogg container design. This page presents an overview of nebulous transOgg design points and rationale.

As of today, transOgg exists only in the form of the whitepapers and structure proposals here. This spec is only in the very early stages of being written. No code exists as yet.

Design Points

In no particular order:

  • transOgg is designed for local storage and packet-based transport. Some intended uses include:
    • HTTP push/pull streaming (eg, HTML5, icecast/shoutcast).
    • local file storage (eg, digital video storage and online distribution)
    • physical media (eg, digital video distribution on optical media)
    • packet broadcast (eg, UDP multicast, encrypted multicast)
  • transOgg is designed for variable, unpredictably sized data payloads with no minimum or maximum size.
  • The transOgg container is structurally designed for streaming (both live and progressive download). It is not possible to construct a valid transOgg stream that is unsuitable for streaming.
  • transOgg defines two stream types: continuous-time and discontinuous-time. Continuous-time streams are gapless media such as video and audio. Discontinuous-time streams are media types with unpredictably or irregularly placed data, such as subtitles and timed metadata.
  • transOgg metadata is structurally encapsulated into the transport stream but located at fixed, predictable positions (excepting streamed metadata, which are treated as a discontinuous stream).
  • transOgg retains Ogg's flat page structure. The new/tweaked page primitive is a blend of Ogg pages, Matroska clusters/blocks and NUT packets. Achievable minimum overhead drops to under .04%; practical overhead improves upon NUT, Ogg and Matroska.
  • A transOgg stream always captures and begins demux within 128kB maximum. Fine-grained capture is necessary for efficient streaming, seeking and scrubbing. The overhead tradeoff of a frequent capture pattern is negligible and fully offset by other improvements.
  • Multiplexing of multiple elementary streams is performed by interleaving at the page level. The multiplexing algorithm is fully specified, deterministic and delivers optimal buffering behavior. There is no educated guessing or multiple possible practices.
  • transOgg buffering is simple and explicitly specified.
  • transOgg implements nonlinear features such as menus, chapters, loop points, and branch points out of its linear stream transport by borrowing from CMML, Skeleton and Matroska's EBML metadata specification.
  • Valid transOgg streams may be concatenated to form a new, valid transOgg stream. Mandatory reverse linkage at the end of each stream eliminates the need for interpolated bisection search when opening concatenated streams. Cross-link metadata provides file-global indexing and chaptering for chained streams.
  • transOgg metadata begins and ends every stream. Metadata is mandatory, fully specified, and part of 'container knowledge'.
    • A transOgg stream must begin with the master metadata header. This master header is the first page[s] of the physical transOgg bitstream as well as the logical master metadata stream.
    • The metadata stream is a discontinuous stream that may provide additional timed metadata and events throughout the stream, similar to NUT 'info packets'.
    • A transOgg stream must end with metadata footer page[s] that provide, among other things, reverse linkage to the beginning of the stream.
    • Tags (user contributed metadata) and Cues (the index) may appear at the head of the stream or in the footer, as may any other metadata elements that could not be known before stream end in a live stream (eg, duration). Single-pass creation tools write these elements in the footer metadata. Tools can later move these elements to the header metadata. All other metadata elements may appear only in the header.
  • Headerless capture, multicast, and stateless unicast MAY be supported within the metadata stream using "rolling headers", similar to the "rolling intra" mechanism proposed for the Theora video codec. This allows stream capture and playback in a bounded time period without OOB transmission of headers or bitrate spikes. It also facilitates file recovery in the event the stream headers are lost.
  • Structural codec metadata, such as timebase, keyframing, coding delay, page duration, etc, are replicated in the transOgg container. Unlike Ogg (and to a lesser degree Matroska), no knowledge must be queried or assumed based on the specific codecs in use in order to mux, demux, remux, repaginate, transmux, or seek in a bitstream.
  • As in NUT, all streams have their own rational timebase. The encoding used is a parameterized generalization of Ogg granule positions. The granule timebase and parameters are fully specified and declared in the container. The granule mechanism is capable of exact sample positioning without approximation, expressing PTS and DTS of out-of-order encodings, preroll/delay of keyframe-less codecs, and distance from last syncpoint.
  • All encapsulated packets are stamped with full DTS, PTS, duration, delay, and syncpoint distance.
  • Whenever possible, the transOgg specification presents a single, correct, optimal MUST behavior. Whenever possible, the container design seeks to make MUST behaviors structural. We avoid handwaving essential behaviors into 'best practices' documents 'to be specified later'.
  • The core transOgg container seeks to avoid optional structures, switches, code paths, and features in its framing mechanisms. Optional structures and features are acceptable (and necessary) within metadata.
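
The rational-timebase point in the list above can be illustrated with exact arithmetic. This is a minimal sketch only, not the transOgg granule encoding (which has not been specified yet): the `Timebase` class, its method names, and the 48 kHz and 29.97 fps rates are assumptions made for illustration. It shows why a per-stream rational timebase allows positions in streams with different rates to be related without approximation.

```python
from fractions import Fraction


class Timebase:
    """Hypothetical per-stream rational timebase: one granule lasts num/den seconds."""

    def __init__(self, num: int, den: int) -> None:
        self.tick = Fraction(num, den)

    def to_seconds(self, granule: int) -> Fraction:
        # Exact rational result; no floating-point rounding is involved.
        return granule * self.tick

    def from_seconds(self, t: Fraction) -> int:
        # Last granule that starts at or before time t (floor division).
        return int(t // self.tick)


audio = Timebase(1, 48000)     # assumed 48 kHz audio: one granule per sample
video = Timebase(1001, 30000)  # assumed 29.97 fps video: one granule per frame

# Audio sample position that lines up with the start of video frame 30000.
sample = audio.from_seconds(video.to_seconds(30000))
```

With these assumed rates, frame 30000 starts at exactly 1001 seconds, so `sample` is an exact integer sample index rather than a rounded float.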

High level design

The high level transOgg design consists of a transport, metadata, and specified practices. These pieces are conceptually separable, but the container cannot succeed missing any one.

Transport

Transport is the mechanism of encapsulating and delivering data. transOgg uses a modified/updated Ogg page mechanism for data and metadata delivery.

Transport benefits from a simple, fixed encoding. Optional features, arbitrary extensibility, recursive or non-flat hierarchy, and conditional semantic encoding are undesirable complications in a low-level transport and should be used only when clearly advantageous or unavoidable. Specifying transport as a self-contained layer also separates correct transport behavior and corner cases from the rest of the container behavior.

Raw A/V media is fundamentally time-linear in atomic form. Networks and storage media deliver data for consumption in a time-linear stream of bytes. Both suggest that a linear encoding is optimal for the low-level encapsulation. Metadata can build non-linear presentation from linear segments. Nonlinear structural metadata appears at the beginning and end of the stream; as such, this metadata can also be placed in the linear transport easily as the beginning and end of data. Encapsulating metadata in the transport like the streaming data also makes it trivial to support streamed metadata and 'rolling headers' using preexisting transport mechanisms. (*-- discuss both chaining and multi-segment; metadata that can reach across segments? etc?)
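
The idea that metadata can build a non-linear presentation from linear segments can be sketched as follows. Nothing here comes from a transOgg specification: `Segment`, `play_ranges`, and the segment names and granule values are hypothetical, chosen only to show a presentation order (including a repeated span, as a loop point would produce) being resolved into ranges of the linear stream.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass(frozen=True)
class Segment:
    """Hypothetical metadata entry naming a span of the linear stream."""
    name: str
    start: int  # first granule of the span (inclusive)
    end: int    # end of the span (exclusive)


def play_ranges(segments: List[Segment], order: List[str]) -> List[Tuple[int, int]]:
    """Resolve a presentation order (by segment name) into linear (start, end) ranges."""
    by_name = {s.name: s for s in segments}
    return [(by_name[name].start, by_name[name].end) for name in order]


segments = [
    Segment("intro", 0, 100),
    Segment("verse", 100, 400),
    Segment("outro", 400, 500),
]

# The media stays linear on disk; only the metadata-driven order is non-linear.
ranges = play_ranges(segments, ["intro", "verse", "verse", "outro"])
```

The transport never needs to know about the loop: the player simply seeks within the same linear data as directed by the metadata.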

Metadata

Metadata is everything in a stream/file that is not the media stream itself. transOgg proposes to use a packed encoding for metadata types unlikely to see much flux, and an extensibly-structured encoding for more free-form types (eg, Matroska-style metadata in an EBML encoding for stream tagging).
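
To make the contrast between the two encoding styles concrete, here is a hedged sketch. The packed layout (`STREAM_DESC`, three 32-bit fields) is purely hypothetical, since transOgg defines no such structure yet. The variable-length integer functions follow the EBML size encoding used by Matroska, where the number of leading zero bits in the first byte gives the total length; they are shown as an example of an extensible encoding, not as a transOgg decision.

```python
import struct
from typing import Tuple

# Hypothetical packed layout for a fixed stream descriptor:
# stream id, timebase numerator, timebase denominator (all big-endian u32).
STREAM_DESC = struct.Struct(">III")


def pack_stream_desc(stream_id: int, tb_num: int, tb_den: int) -> bytes:
    """Fixed-size record: compact and trivial to parse, but not extensible."""
    return STREAM_DESC.pack(stream_id, tb_num, tb_den)


def encode_vint(value: int) -> bytes:
    """Encode a size as an EBML variable-length integer (1 to 8 bytes)."""
    if value < 0:
        raise ValueError("value must be non-negative")
    for length in range(1, 9):
        # The all-ones payload of each length is reserved, hence the "- 1".
        if value < (1 << (7 * length)) - 1:
            marker = 1 << (7 * length)  # length marker bit above the payload
            return (marker | value).to_bytes(length, "big")
    raise ValueError("value too large for an 8-byte EBML integer")


def decode_vint(data: bytes) -> Tuple[int, int]:
    """Return (value, bytes consumed) for an EBML variable-length integer."""
    if not data or data[0] == 0:
        raise ValueError("invalid EBML integer")
    length = 9 - data[0].bit_length()  # leading zero bits + 1
    if len(data) < length:
        raise ValueError("truncated EBML integer")
    raw = int.from_bytes(data[:length], "big")
    return raw & ((1 << (7 * length)) - 1), length
```

The packed record always costs 12 bytes and cannot grow new fields, while the variable-length form spends one byte on small values and lets unknown elements be sized and skipped.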

Metadata encompasses a number of semantically quite different concepts, eg:

  • 1: data about how the individual streams are encoded and encapsulated (codec id, timebase, continuous/discontinuous encoding, codec private data, etc). This metadata is essential to base container operation and must function as container knowledge. It is always located in a fixed position at the beginning of the file as it must be read to bootstrap container operation.
  • 2: data about navigating the file as it's currently arranged (linkages, indexing, chapters). This data is either essential to high-level container operation or essential to the application depending on how the implementation abstractions work out.
  • 3: data about how the streams are presented for playback (language, primary angle, available soundtrack languages, menus). This data is needed by the application.
  • 4: user-supplied comments, one-shot auxiliary data (tags, album art). This data is needed by the application and the user.

Each kind of metadata shares some basic traits. It is hierarchical, largely conditional, and benefits from a rich stable of optional elements to be used as appropriate. It is also likely that aside from the MUST elements required for playback (mostly from list 1), not all metadata will be interesting to all players. An obvious use case is a memory- and CPU-constrained mobile device with no bitmapped display which would want to entirely ignore/skip large album art chunks.
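
The album-art case above amounts to skipping elements by their declared size. The sketch below assumes a made-up framing (a 4-byte ASCII tag followed by a 4-byte big-endian payload size); the tag names, `ELEMENT_HEADER`, and `wanted_elements` are hypothetical and are not part of any transOgg or Matroska definition.

```python
import struct
from typing import Iterator, Set, Tuple

# Hypothetical element framing: 4-byte ASCII tag, 4-byte big-endian payload size.
ELEMENT_HEADER = struct.Struct(">4sI")


def wanted_elements(buf: bytes, wanted: Set[bytes]) -> Iterator[Tuple[bytes, bytes]]:
    """Yield (tag, payload) for wanted tags and step over everything else."""
    pos = 0
    while pos + ELEMENT_HEADER.size <= len(buf):
        tag, size = ELEMENT_HEADER.unpack_from(buf, pos)
        pos += ELEMENT_HEADER.size
        if pos + size > len(buf):
            raise ValueError("truncated element")
        if tag in wanted:
            yield tag, buf[pos:pos + size]
        pos += size  # unwanted payloads are skipped without being copied


blob = (
    ELEMENT_HEADER.pack(b"TITL", 4) + b"Song"
    + ELEMENT_HEADER.pack(b"APIC", 1000) + bytes(1000)  # stand-in for album art
    + ELEMENT_HEADER.pack(b"LANG", 2) + b"en"
)

# A display-less player asks only for the small text elements.
found = list(wanted_elements(blob, {b"TITL", b"LANG"}))
```

Because every element declares its size up front, the constrained device never has to buffer or decode the art payload.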

Specified practices

Specified practices provide the instruction manual for proper use of the container system in all cases where the container structure allows multiple behavioral or encoding possibilities. 'Best practices' is the more common term; however, 'best' connotes that such guidelines are merely suggestive. Specified practices are effectively MUST clauses that govern proper behavior rather than valid data.

Specified practices are an especially weak point of current FOSS container offerings. Developers of open projects often value a system that offers many equivalent ways of performing a task, and leave it to higher-level developers (or worse, the user) to somehow make an informed decision of how to proceed. In the container space, this is undesirable bordering on disaster; 'more' is 'less'.

To choose an absurd example, it might be technologically nifty to be able to place an index into the middle of a file, but what could it possibly accomplish? Absurd flexibility results in absurd bugs. When absurd or clearly suboptimal choices are structurally possible, the spec should not be silent on the subject. The spec MUST explicitly address allowed and disallowed behavior to the most complete degree achievable.
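
One way to read "the spec should not be silent" is that a disallowed layout should be mechanically rejectable. The following sketch assumes a drastically simplified model in which a stream is a list of page labels (`"header"`, `"index"`, `"media"`, `"footer"`); the labels, the `check_layout` function, and the exact rules are hypothetical and only mirror the index-in-the-middle example above.

```python
from typing import List


def check_layout(pages: List[str]) -> None:
    """Raise ValueError if the page sequence breaks the sketched placement rules."""
    if not pages or pages[0] != "header":
        raise ValueError("stream must begin with header metadata")
    media = [i for i, page in enumerate(pages) if page == "media"]
    tail_start = media[-1] + 1 if media else 1
    if "footer" not in pages[tail_start:]:
        raise ValueError("stream must end with footer metadata")
    if media:
        # An index is allowed before the first or after the last media page only.
        for i in range(media[0], media[-1]):
            if pages[i] == "index":
                raise ValueError(f"index at page {i} is in the middle of the stream")


# Accepted: index in the header region, or in the footer region.
check_layout(["header", "index", "media", "media", "footer"])
check_layout(["header", "media", "media", "index", "footer"])
```

A muxer or validator built this way turns the specified practice into a hard failure instead of leaving the choice to each implementation.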

Proposals / Commentary Requested

transOgg is a specification in the early stages of design. As such, we're soliciting feedback on specific design proposals below, with more to be added over time as the design process approaches new milestones.

  • TransOgg_Seeking_Proposals: Three proposed seeking mechanism variants for transOgg.

Specification

  • TransOgg_Page: Specification of transOgg page primitive
  • TransOgg_Transport: Specification of transOgg stream
  • TransOgg_Metadata: Specification of transOgg stream metadata
