TransOgg Seeking Proposals

From XiphWiki

(Difference between revisions)
Jump to: navigation, search
(Blow new text document into wiki; not formatted as yet.)
m (formatting)
 
Line 1: Line 1:
Three proposals for seeking structures/methods in transOgg
Three proposals for seeking structures/methods in transOgg
-
Seeking within a container is an example of 'we can't have everything,
+
=Overview=
-
so we have to choose what we really want'.  We consider the following
+
-
design requirements wants and needs:
+
-
  1) Efficient seeking with minimum latency.  Minimum overall latency
+
Seeking within a container is an example of 'we can't have everything, so we have to choose what we really want'We consider the following design requirements wants and needs:
-
  can depend heavily on the scenario, where seeking within a file on a
+
-
  hard disk, over http on a high bandwidth link/high latency and over
+
-
  http on a low bandwidth/low latency link will all have different
+
-
  optimal design requiresThus the design will have to shoot for
+
-
  good behavior across the lot.
+
-
  2) Low or no additional overhead.  Because, hey, wasted space is
+
# Efficient seeking with minimum latency.  Minimum overall latency can depend heavily on the scenario, where seeking within a file on a hard disk, over http on a high bandwidth link/high latency and over http on a low bandwidth/low latency link will all have different optimal design requires.  Thus the design will have to shoot for good behavior across the lot.
-
  wasted space.
+
# Low or no additional overhead.  Because, hey, wasted space is wasted space.
 +
# Seeking must be available within Ogg's specific streaming design philosophy; an encoder must be able to write a seekable file in one pass, and a demuxer must be able to seek in any valid Ogg file.
 +
# Muxing must not require repaginating existing data (the 'deck ofcards' metaphor)
 +
# A design that the designers can actually document completely and that implementors are actually willing and able to implement.
-
  3) Seeking must be available within Ogg's specific streaming design
+
Seeking in original Ogg treated 2, 3, 4 as paramount, 1 as secondary, and was a disaster due to 5.  transOgg will need to strike a different balance.  I submit three seeking mechanism proposals for consideration.
-
  philosophy; an encoder must be able to write a seekable file in one
+
-
  pass, and a demuxer must be able to seek in any valid Ogg file.
+
-
  4) Muxing must not require repaginating existing data (the 'deck of
+
= Proposal 1: The original Ogg bisection search only =
-
  cards' metaphor)
+
-
  5) A design that the designers can actually document completely and
+
We stick with the original seeking algorithm as intended in the original Ogg bitstream and correct the corner cases that made it a serious pain.  There is no index. Seeking is done via bisection search:
-
  that implementors are actually willing and able to implement.
+
-
Seeking in original Ogg treated 2, 3, 4 as paramount, 1 as
+
# Make an initial guess as to the physical position of the desired seekpoint based on known stream duration and simple linear interpolation.
-
secondary, and was a disaster due to 5. transOgg will need to strike
+
# Seek and capture a page
-
a different balance. I submit three seeking mechanism proposals for
+
# Construct a new interpolated best guess as to the desired seek location based on DTS of captured page
-
consideration.
+
# repeat steps 2 and 3 until we have seen contiguous pages, the first with a time stamp immediately preceeding the desired point in time and the second with a time stamp equal-to-or-following the desired point.
 +
#
 +
## If none of the codec streams being decoded use backreferences, we're finished.
 +
## In the event of codec streams with backreferences (such as is typical in video), the pages for the codec stream also include the DTS of the earliest required reference.  Read forward until we've seen one page from each decoded stream using backreferences; save the earliest backreference DTS.
 +
# Repeat the bisection search in 2-4 using the backreference DTS as the desired final seek location.
-
Proposal 1: The original Ogg bisection search only
+
This algorithm appears more complex than it is; bisection searching is a well understood technique, and the average bisection search in a 1GB ogg file (of mixed Vorbis audio and Theora video) required approximately 3.5 seeks for the initial seek and just over one more seek for the second (as the first search can inform the second quite well).
-
  We stick with the original seeking algorithm as intended in the
+
This original Ogg seeking method was unpopular for several reasons:
-
  original Ogg bitstream and correct the corner cases that made it a
+
-
  serious pain.  There is no index.  Seeking is done via bisection
+
-
  search:
+
-
  1) Make an initial guess as to the physical position of the desired
+
# The DTS and backreference encoding was defined by-codec, and thus decoding the timestamps required additional infrastructure that most frameworks had to code from scratch. Several frameworks never implemented precise seeking for this reason alone.
-
    seekpoint based on known stream duration and simple linear
+
# Corner cases introduced by specific design decisions (eg, the presence of pages with no timestamps) complicated the actual implementation.  In addition, correct behavior for these corner cases was not well documented or not documented at all.
-
    interpolation.
+
# Xiph never implemented its own all-encompassing framework to provide an example of complete seeking that worked in any Ogg file.  In general, we hardwired our own apps for seeking in either Vorbis alone or Vorbis+Theora only.
 +
# Many frameworks that attempted to implement proper bisection seeking had bugs that went unnoticed or unfixed (owing to all of the above). These bugs could increase the number of seeks in a bisection search by a factor of 10X-100X. This gave bisection seeking a reputation for inherent poor performace.  Even Vorbisfile has suffered from a number of bugs of exactly this sort at several points.
 +
# Stream structure discovery was also based on performing multiple bisection searches to find link boundaries. Given all the points above, and the fact that stream structure decisions added additional corner cases to implementations, structure discovery also tended to be buggy and slow.
-
  2) Seek and capture a page
+
The lesson of the above is that it is not enough to have a conceptually simple design.  The actual implementation complexity can easily explode given seemingly small design decisions going in the wrong direction.
-
  3) Construct a new interpolated best guess as to the desired seek
+
The primary strikes against proposal 1 are:
-
    location based on DTS of captured page
+
-
  4) repeat steps 2 and 3 until we have seen contiguous pages, the
+
# Given the practically poor and unstable performance of preceding bisection seeking implmentations, it will be difficult to sell adopter opinion on an updated bisection design, even one that farts unicorns.
-
    first with a time stamp immediately preceeding the desired point
+
# Even when the bisection search performs perfectly, it still requires multiple seeks (4.5 on average for exact seeking in a very smart bisector, and an upper bound of 7 for precise seek in a simplest-possible-correct implementation).
-
    in time and the second with a time stamp equal-to-or-following
+
# Backreference discovery requires reading enough data to find at least one page of each stream that uses backreferences.  Although the required amount of data is bounded (and on average, much less than the upper bound) this too increases the seeking latency.
-
    the desired point.
+
-
  5.a) If none of the codec streams being decoded use backreferences,
+
The primary advantages of proposal 1 are:
-
    we're finished.
+
-
  5.b) In the event of codec streams with backreferences (such as is
+
# The design is conceptually simple amd presents a single mechanism making use of non-redundant, authoritative data.  Although it is still possible to mux an invalid file, it is much harder to end up with a stream in which second-order seeking 'helper' structures disagree with the authoritative timestamps within the page data.
-
    typical in video), the pages for the codec stream also include
+
# Proper design would allow an optimal seek implementation nearly as simple as the design.
-
    the DTS of the earliest required reference. Read forward
+
# Zero additional data overhead.
-
    until we've seen one page from each decoded stream using
+
# No additional requirements on a muxer (muxers need only obey timestamp ordering. The small deviations from this plan in original Ogg were rather serious errors in hindsight)
-
    backreferences; save the earliest backreference DTS.
+
# Strict adherence to the overall Ogg muxing philosophy of having no features unavailable in a stream context (every feature of the stream format is available to a streaming muxer).
-
  6) Repeat the bisection search in 2-4 using the backreference
+
= Proposal 2: Mandatory bisection index with an optional head-or-tail index =
-
    DTS as the desired final seek location.
+
-
  This algorithm appears more complex than it is; bisection searching
+
The structure of the Proposal 2 Ogg file is as in Proposal 1, with
-
  is a well understood technique, and the average bisection search in
+
no changes to the bisection seeking algorithm.  However, in addition
-
  a 1GB ogg file (of mixed Vorbis audio and Theora video) required
+
to the bisection seeking mechanism, an index may be optionally
-
  approximately 3.5 seeks for the initial seek and just over one more
+
present.
-
  seek for the second (as the first search can inform the second quite
+
-
  well).
+
-
  This original Ogg seeking method was unpopular for several reasons:
+
# The index may be in either the header or footer of the file (but not both). --- point of discussion; perhaps only in the tail? This would make the index a fully encode-stream-available feature.
 +
# The index consists of hard byte offsets to the earliest backreference required to decode all continuous media types from the specified DTS. -- point of discussion; secondary index for discontinuous media streams? Discontinuous spacing can easily be too wide to include in a single-backreference index system; there would potentially need to be a dedicated index per discontinuous stream.
 +
# The index does not have a specific, hard resolution.  It may be as fine or coarse as desired.
 +
# The index is ephemeral and non-authoritative. It can be generated or regenerated trivially via a linear scan of the stream. Remuxing a stream would necessarily invalidate any preexisting index.
 +
# The index is not mandatory and cannot be relied upon to exist.
-
  1) The DTS and backreference encoding was defined by-codec,
+
Experience with Matroska and the original version of Ogg shows that
-
    and thus decoding the timestamps required additional
+
where an index and a bisection mechanism coexist, implementations
-
    infrastructure that most frameworks had to code from scratch.
+
usually ignore the bisection mechanism entirely and rely on the
-
    Several frameworks never implemented precise seeking for this
+
index alone, refusing to seek when no index is present. This had
-
    reason alone.
+
been the original reason for resisting an index in Ogg, borne out by
 +
the the fact that few Matroska demuxers implement the mandatory
 +
bisection seeking and so can seek only in Matroska files with
 +
indexes.  WebM went one step further by mandating an index and
 +
eliminating the Matroska bisection seek mechanism entirely.
-
  2) Corner cases introduced by specific design decisions (eg, the
+
An index is undeniably useful.  Bisection seeking requires 3.5 seeks
-
    presence of pages with no timestamps) complicated the actual
+
on average (without backreferences) and between 4.5 and 7 average
-
    implementationIn addition, correct behavior for these corner
+
seeks in a file with backreferences, depending on the sophistication
-
    cases was not well documented or not documented at all.
+
of the search algorithmIndexes do not necessarily eliminate all
 +
seeks, and may not improve upon the amount of data that needs to be
 +
read, but it's possible to generate an index that does so whenever
 +
such an index is warranted.
-
  3) Xiph never implemented its own all-encompassing framework to
+
Thus Proposal 2 assumes that the benefit of an index (improved
-
    provide an example of complete seeking that worked in any Ogg
+
seeking performance) outweighs the disadvantages (incomplete demuxer
-
    file.  In general, we hardwired our own apps for seeking in
+
implementations that cannot seek when the index is absent).
-
    either Vorbis alone or Vorbis+Theora only.
+
-
  4) Many frameworks that attempted to implement proper bisection
+
The disadvantages of Proposal 2 are:
-
    seeking had bugs that went unnoticed or unfixed (owing to all of
+
-
    the above). These bugs could increase the number of seeks in a
+
-
    bisection search by a factor of 10X-100X. This gave bisection
+
-
    seeking a reputation for inherent poor performace.  Even
+
-
    Vorbisfile has suffered from a number of bugs of exactly this
+
-
    sort at several points.
+
-
  5) Stream structure discovery was also based on performing multiple
+
# Redundant, non-structural information. Having a non-authoritative index increases the opportunity for mux and demux implementation bugs. There will certainly be cases in the wild where the index does not agree with the stream.  This disagreement isn't possible if there's no non-structural or redundant seeking information.
-
    bisection searches to find link boundaries. Given all the points
+
# A front-positioned index breaks the Ogg 'everything is always a stream' model, as a front-positioned index necessarily requires a second pass.
-
    above, and the fact that stream structure decisions added
+
# Experience in multiple containers show that implementors will tend to implement index-only seeking [and even tell users their files are 'broken' if the files don't contain an index].
-
    additional corner cases to implementations, structure discovery
+
-
    also tended to be buggy and slow.
+
-
  The lesson of the above is that it is not enough to have a
+
The advantages of Proposal 2 are:
-
  conceptually simple design.  The actual implementation complexity
+
-
  can easily explode given seemingly small design decisions going in
+
-
  the wrong direction.
+
-
  The primary strikes against proposal 1 are:
+
# Lower latency. Indexes can be generated for consumption models like the web where seeks can be very high latency, and reducing the seek number to 'always only 1' is a boon.
 +
# A tail-positioned index would preserve the streaming model and keep time-to-first-frame low by avoiding a large index in the header at stream startup.  The index can be grabbed from the stream tail asynchronously.
 +
# Coarse-grained indexes can be used as a mechanism to inform bisection search, still resulting in fewer seeks while also reducing the index overhead in situations where this overhead is critical.
-
  1) Given the practically poor and unstable performance of preceding
+
= Proposal 3: Bisection search with threaded backreference =
-
    bisection seeking implmentations, it will be difficult to sell
+
-
    adopter opinion on an updated bisection design, even one that
+
-
    farts unicorns.
+
-
  2) Even when the bisection search performs perfectly, it still
+
Proposal 3 makes use of bisection seeking as above, but rather than
-
    requires multiple seeks (4.5 on average for exact seeking in a
+
using an index, adds an auxiliary backreference structure threaded
-
    very smart bisector, and an upper bound of 7 for precise seek in
+
through the stream in order to eliminate the second stage of the
-
    a simplest-possible-correct implementation).
+
bisection seek. The bisection sarch algorithm always short-circuits
 +
at step 5:
-
  3) Backreference discovery requires reading enough data to find at
+
# Make an initial guess as to the physical position of the desired seekpoint based on known stream duration and simple linear interpolation.
-
    least one page of each stream that uses backreferences. Although
+
# Seek and capture a page
-
    the required amount of data is bounded (and on average, much less
+
# Construct a new interpolated best guess as to the desired seek location based on DTS of captured page
-
    than the upper bound) this too increases the seeking latency.
+
# repeat steps 2 and 3 until we have seen contiguous pages, the first with a time stamp immediately preceeding the desired point in time and the second with a time stamp equal-to-or-following the desired point.
 +
#
 +
## If none of the codec streams being decoded use backreferences, we're finished.
 +
## If we need to seek to a backreference, the auxiliary backreference structure gives us a hard byte offset to seek back to.
-
  The primary advantages of proposal 1 are:
+
The proposed threaded backreference mechanism would append a single,
 +
variable-length-encoded byte offset to the end of each page after
 +
the packet data.  This addiiton would modify the capture criteria
 +
such that capture must account for the optional, unsignaled field.
 +
After reading packet data, demux framing would have to check for
 +
either a capture pattern, or a backreference followed by a capture
 +
pattern.
-
  1) The design is conceptually simple amd presents a single mechanism
+
Variations of this proposal could also include bisection hints to
-
    making use of non-redundant, authoritative data.  Although it is
+
further reduce the order of complexity of the bisection search.
-
    still possible to mux an invalid file, it is much harder to end
+
-
    up with a stream in which second-order seeking 'helper'
+
-
    structures disagree with the authoritative timestamps within the
+
-
    page data.
+
-
  2) Proper design would allow an optimal seek implementation nearly as
+
As I see it, this middle of the road approach is a compromise
-
    simple as the design.
+
without real advantages. In its favor:
-
  3) Zero additional data overhead.
+
# Streams that make use of backreferences (most AV streams) would avoid the second bisection search, always reducing the average number of seeks by a little more than one.
-
  4) No additional requirements on a muxer (muxers need only obey
+
However, there are significant relative disadvantages.
-
    timestamp ordering. The small deviations from this plan in
+
-
    original Ogg were rather serious errors in hindsight)
+
-
  5) Strict adherence to the overall Ogg muxing philosophy of having
+
# Like Proposal 2, the threaded backreference mechanism means seeking makes use of two ~ unrelated mechanisms instead of one. The threaded backreference increases mux and demux complexity.
-
    no features unavailable in a stream context (every feature of the
+
# The average seek number drops from ~ 4.5 physcal seeks to ~ 3.5. This is not a significant improvement given the additional compexity.  Proposal 2 is simpler, similar overhead, and drops best case seeks to 1.
-
    stream format is available to a streaming muxer).
+
# Like proposal 2, the additional backreference offset is redundant, non-authoritative, and non-structural.
-
 
+
-
Proposal 2: Mandatory bisection index with an optional head-or-tail index
+
-
 
+
-
  The structure of the Proposal 2 Ogg file is as in Proposal 1, with
+
-
  no changes to the bisection seeking algorithm.  However, in addition
+
-
  to the bisection seeking mechanism, an index may be optionally
+
-
  present.
+
-
 
+
-
  1) The index may be in either the header or footer of the file (but
+
-
    not both). --- point of discussion; perhaps only in the tail?
+
-
    This would make the index a fully encode-stream-available
+
-
    feature.
+
-
 
+
-
  2) The index consists of hard byte offsets to the earliest
+
-
    backreference required to decode all continuous media types from
+
-
    the specified DTS. -- point of discussion; secondary index for
+
-
    discontinuous media streams? Discontinuous spacing can easily be
+
-
    too wide to include in a single-backreference index system; there
+
-
    would potentially need to be a dedicated index per discontinuous
+
-
    stream.
+
-
 
+
-
  3) The index does not have a specific, hard resolution.  It may be
+
-
    as fine or coarse as desired.
+
-
 
+
-
  4) The index is ephemeral and non-authoritative. It can be generated
+
-
    or regenerated trivially via a linear scan of the stream.
+
-
    Remuxing a stream would necessarily invalidate any preexisting
+
-
    index.
+
-
 
+
-
  5) The index is not mandatory and cannot be relied upon to exist.
+
-
 
+
-
  Experience with Matroska and the original version of Ogg shows that
+
-
  where an index and a bisection mechanism coexist, implementations
+
-
  usually ignore the bisection mechanism entirely and rely on the
+
-
  index alone, refusing to seek when no index is present.  This had
+
-
  been the original reason for resisting an index in Ogg, borne out by
+
-
  the the fact that few Matroska demuxers implement the mandatory
+
-
  bisection seeking and so can seek only in Matroska files with
+
-
  indexes.  WebM went one step further by mandating an index and
+
-
  eliminating the Matroska bisection seek mechanism entirely.
+
-
 
+
-
  An index is undeniably useful.  Bisection seeking requires 3.5 seeks
+
-
  on average (without backreferences) and between 4.5 and 7 average
+
-
  seeks in a file with backreferences, depending on the sophistication
+
-
  of the search algorithm.  Indexes do not necessarily eliminate all
+
-
  seeks, and may not improve upon the amount of data that needs to be
+
-
  read, but it's possible to generate an index that does so whenever
+
-
  such an index is warranted.
+
-
 
+
-
  Thus Proposal 2 assumes that the benefit of an index (improved
+
-
  seeking performance) outweighs the disadvantages (incomplete demuxer
+
-
  implementations that cannot seek when the index is absent).
+
-
 
+
-
  The disadvantages of Proposal 2 are:
+
-
 
+
-
  1) Redundant, non-structural information. Having a non-authoritative
+
-
    index increases the opportunity for mux and demux implementation
+
-
    bugs. There will certainly be cases in the wild where the index
+
-
    does not agree with the stream.  This disagreement isn't possible
+
-
    if there's no non-structural or redundant seeking information.
+
-
 
+
-
  2) A front-positioned index breaks the Ogg 'everything is always a
+
-
    stream' model, as a front-positioned index necessarily requires a
+
-
    second pass.
+
-
 
+
-
  3) Experience in multiple containers show that implementors will
+
-
    tend to implement index-only seeking [and even tell users their
+
-
    files are 'broken' if the files don't contain an index].
+
-
 
+
-
  The advantages of Proposal 2 are:
+
-
 
+
-
  1) Latency. Indexes can be generated for consumption models like the
+
-
    web where seeks can be very high latency, and reducing the seek
+
-
    number to 'always only 1' is a boon.
+
-
 
+
-
  2) A tail-positioned index would preserve the streaming model and
+
-
    keep time-to-first-frame low by avoiding a large index in the
+
-
    header at stream startup.  The index can be grabbed from the
+
-
    stream tail asynchronously.
+
-
 
+
-
  3) Coarse-grained indexes can be used as a mechanism to inform
+
-
    bisection search, still resulting in fewer seeks while also
+
-
    reducing the index overhead in situations where this overhead is
+
-
    critical.
+
-
 
+
-
Proposal 3: Bisection search with threaded backreference
+
-
 
+
-
  Proposal 3 makes use of bisection seeking as above, but rather than
+
-
  using an index, adds an auxiliary backreference structure threaded
+
-
  through the stream in order to eliminate the second stage of the
+
-
  bisection seek.  The bisection sarch algorithm always short-circuits
+
-
  at step 5:
+
-
 
+
-
  1) Make an initial guess as to the physical position of the desired
+
-
    seekpoint based on known stream duration and simple linear
+
-
    interpolation.
+
-
 
+
-
  2) Seek and capture a page
+
-
 
+
-
  3) Construct a new interpolated best guess as to the desired seek
+
-
    location based on DTS of captured page
+
-
 
+
-
  4) repeat steps 2 and 3 until we have seen contiguous pages, the
+
-
    first with a time stamp immediately preceeding the desired point
+
-
    in time and the second with a time stamp equal-to-or-following
+
-
    the desired point.
+
-
 
+
-
  5.a) If none of the codec streams being decoded use backreferences,
+
-
    we're finished.
+
-
 
+
-
  5.b) If we need to seek to a backreference, the auxiliary
+
-
    backreference structure gives us a hard byte offset to seek back
+
-
    to.
+
-
  The proposed threaded backreference mechanism would append a single,
+
-
  variable-length-encoded byte offset to the end of each page after
+
-
  the packet data.  This addiiton would modify the capture criteria
+
-
  such that capture must account for the optional, unsignaled field.
+
-
  After reading packet data, demux framing would have to check for
+
-
  either a capture pattern, or a backreference followed by a capture
+
-
  pattern.
+
-
 
+
-
  Variations of this proposal could also include bisection hints to
+
-
  further reduce the order of complexity of the bisection search.
+
-
 
+
-
  As I see it, this middle of the road approach is a compromise
+
-
  without real advantages.  In its favor:
+
-
 
+
-
  1) Streams that make use of backreferences (most AV streams) would
+
-
    avoid the second bisection search, always reducing the average
+
-
    number of seeks by a little more than one.
+
-
 
+
-
  However, there are significant disadvantages.
+
-
 
+
-
  1) Like Proposal 2, the threaded backreference mechanism means
+
-
    seeking makes use of two ~ unrelated mechanisms instead of one.
+
-
    The threaded backreference increases mux and demux complexity.
+
-
 
+
-
  2) The average seek number drops from ~ 4.5 physcal seeks to ~ 3.5.
+
-
    This is not a significant improvement given the additional
+
-
    compexity.  Proposal 2 is simpler, similar overhead, and drops
+
-
    best case seeks to 1.
+
-
 
+
-
  3) Like proposal 2, the additional backreference offset is
+
-
    redundant, non-authoritative, and non-structural.
+
In short, this mechanism has the disadvantages of Proposal 2 (in
In short, this mechanism has the disadvantages of Proposal 2 (in
addition to being weird) but does not deliver as much advantage as
addition to being weird) but does not deliver as much advantage as
Proposal 2.
Proposal 2.

Latest revision as of 22:14, 18 September 2012

Three proposals for seeking structures/methods in transOgg

Contents

Overview

Seeking within a container is an example of 'we can't have everything, so we have to choose what we really want'. We consider the following design requirements wants and needs:

  1. Efficient seeking with minimum latency. Minimum overall latency can depend heavily on the scenario, where seeking within a file on a hard disk, over http on a high bandwidth link/high latency and over http on a low bandwidth/low latency link will all have different optimal design requires. Thus the design will have to shoot for good behavior across the lot.
  2. Low or no additional overhead. Because, hey, wasted space is wasted space.
  3. Seeking must be available within Ogg's specific streaming design philosophy; an encoder must be able to write a seekable file in one pass, and a demuxer must be able to seek in any valid Ogg file.
  4. Muxing must not require repaginating existing data (the 'deck ofcards' metaphor)
  5. A design that the designers can actually document completely and that implementors are actually willing and able to implement.

Seeking in original Ogg treated 2, 3, 4 as paramount, 1 as secondary, and was a disaster due to 5. transOgg will need to strike a different balance. I submit three seeking mechanism proposals for consideration.

Proposal 1: The original Ogg bisection search only

We stick with the original seeking algorithm as intended in the original Ogg bitstream and correct the corner cases that made it a serious pain. There is no index. Seeking is done via bisection search:

  1. Make an initial guess as to the physical position of the desired seekpoint based on known stream duration and simple linear interpolation.
  2. Seek and capture a page
  3. Construct a new interpolated best guess as to the desired seek location based on DTS of captured page
  4. repeat steps 2 and 3 until we have seen contiguous pages, the first with a time stamp immediately preceeding the desired point in time and the second with a time stamp equal-to-or-following the desired point.
    1. If none of the codec streams being decoded use backreferences, we're finished.
    2. In the event of codec streams with backreferences (such as is typical in video), the pages for the codec stream also include the DTS of the earliest required reference. Read forward until we've seen one page from each decoded stream using backreferences; save the earliest backreference DTS.
  5. Repeat the bisection search in 2-4 using the backreference DTS as the desired final seek location.

This algorithm appears more complex than it is; bisection searching is a well understood technique, and the average bisection search in a 1GB ogg file (of mixed Vorbis audio and Theora video) required approximately 3.5 seeks for the initial seek and just over one more seek for the second (as the first search can inform the second quite well).

This original Ogg seeking method was unpopular for several reasons:

  1. The DTS and backreference encoding was defined by-codec, and thus decoding the timestamps required additional infrastructure that most frameworks had to code from scratch. Several frameworks never implemented precise seeking for this reason alone.
  2. Corner cases introduced by specific design decisions (eg, the presence of pages with no timestamps) complicated the actual implementation. In addition, correct behavior for these corner cases was not well documented or not documented at all.
  3. Xiph never implemented its own all-encompassing framework to provide an example of complete seeking that worked in any Ogg file. In general, we hardwired our own apps for seeking in either Vorbis alone or Vorbis+Theora only.
  4. Many frameworks that attempted to implement proper bisection seeking had bugs that went unnoticed or unfixed (owing to all of the above). These bugs could increase the number of seeks in a bisection search by a factor of 10X-100X. This gave bisection seeking a reputation for inherent poor performace. Even Vorbisfile has suffered from a number of bugs of exactly this sort at several points.
  5. Stream structure discovery was also based on performing multiple bisection searches to find link boundaries. Given all the points above, and the fact that stream structure decisions added additional corner cases to implementations, structure discovery also tended to be buggy and slow.

The lesson of the above is that it is not enough to have a conceptually simple design. The actual implementation complexity can easily explode given seemingly small design decisions going in the wrong direction.

The primary strikes against proposal 1 are:

  1. Given the practically poor and unstable performance of preceding bisection seeking implmentations, it will be difficult to sell adopter opinion on an updated bisection design, even one that farts unicorns.
  2. Even when the bisection search performs perfectly, it still requires multiple seeks (4.5 on average for exact seeking in a very smart bisector, and an upper bound of 7 for precise seek in a simplest-possible-correct implementation).
  3. Backreference discovery requires reading enough data to find at least one page of each stream that uses backreferences. Although the required amount of data is bounded (and on average, much less than the upper bound) this too increases the seeking latency.

The primary advantages of proposal 1 are:

  1. The design is conceptually simple amd presents a single mechanism making use of non-redundant, authoritative data. Although it is still possible to mux an invalid file, it is much harder to end up with a stream in which second-order seeking 'helper' structures disagree with the authoritative timestamps within the page data.
  2. Proper design would allow an optimal seek implementation nearly as simple as the design.
  3. Zero additional data overhead.
  4. No additional requirements on a muxer (muxers need only obey timestamp ordering. The small deviations from this plan in original Ogg were rather serious errors in hindsight)
  5. Strict adherence to the overall Ogg muxing philosophy of having no features unavailable in a stream context (every feature of the stream format is available to a streaming muxer).

Proposal 2: Mandatory bisection index with an optional head-or-tail index

The structure of the Proposal 2 Ogg file is as in Proposal 1, with no changes to the bisection seeking algorithm. However, in addition to the bisection seeking mechanism, an index may be optionally present.

  1. The index may be in either the header or footer of the file (but not both). --- point of discussion; perhaps only in the tail? This would make the index a fully encode-stream-available feature.
  2. The index consists of hard byte offsets to the earliest backreference required to decode all continuous media types from the specified DTS. -- point of discussion; secondary index for discontinuous media streams? Discontinuous spacing can easily be too wide to include in a single-backreference index system; there would potentially need to be a dedicated index per discontinuous stream.
  3. The index does not have a specific, hard resolution. It may be as fine or coarse as desired.
  4. The index is ephemeral and non-authoritative. It can be generated or regenerated trivially via a linear scan of the stream. Remuxing a stream would necessarily invalidate any preexisting index.
  5. The index is not mandatory and cannot be relied upon to exist.

Experience with Matroska and the original version of Ogg shows that where an index and a bisection mechanism coexist, implementations usually ignore the bisection mechanism entirely and rely on the index alone, refusing to seek when no index is present. This had been the original reason for resisting an index in Ogg, borne out by the the fact that few Matroska demuxers implement the mandatory bisection seeking and so can seek only in Matroska files with indexes. WebM went one step further by mandating an index and eliminating the Matroska bisection seek mechanism entirely.

An index is undeniably useful. Bisection seeking requires 3.5 seeks on average (without backreferences) and between 4.5 and 7 average seeks in a file with backreferences, depending on the sophistication of the search algorithm. Indexes do not necessarily eliminate all seeks, and may not improve upon the amount of data that needs to be read, but it's possible to generate an index that does so whenever such an index is warranted.

Thus Proposal 2 assumes that the benefit of an index (improved seeking performance) outweighs the disadvantages (incomplete demuxer implementations that cannot seek when the index is absent).

The disadvantages of Proposal 2 are:

  1. Redundant, non-structural information. Having a non-authoritative index increases the opportunity for mux and demux implementation bugs. There will certainly be cases in the wild where the index does not agree with the stream. This disagreement isn't possible if there's no non-structural or redundant seeking information.
  2. A front-positioned index breaks the Ogg 'everything is always a stream' model, as a front-positioned index necessarily requires a second pass.
  3. Experience in multiple containers show that implementors will tend to implement index-only seeking [and even tell users their files are 'broken' if the files don't contain an index].

The advantages of Proposal 2 are:

  1. Lower latency. Indexes can be generated for consumption models like the web where seeks can be very high latency, and reducing the seek number to 'always only 1' is a boon.
  2. A tail-positioned index would preserve the streaming model and keep time-to-first-frame low by avoiding a large index in the header at stream startup. The index can be grabbed from the stream tail asynchronously.
  3. Coarse-grained indexes can be used as a mechanism to inform bisection search, still resulting in fewer seeks while also reducing the index overhead in situations where this overhead is critical.

Proposal 3: Bisection search with threaded backreference

Proposal 3 makes use of bisection seeking as above, but rather than using an index, adds an auxiliary backreference structure threaded through the stream in order to eliminate the second stage of the bisection seek. The bisection sarch algorithm always short-circuits at step 5:

  1. Make an initial guess as to the physical position of the desired seekpoint based on known stream duration and simple linear interpolation.
  2. Seek and capture a page
  3. Construct a new interpolated best guess as to the desired seek location based on DTS of captured page
  4. repeat steps 2 and 3 until we have seen contiguous pages, the first with a time stamp immediately preceeding the desired point in time and the second with a time stamp equal-to-or-following the desired point.
    1. If none of the codec streams being decoded use backreferences, we're finished.
    2. If we need to seek to a backreference, the auxiliary backreference structure gives us a hard byte offset to seek back to.

The proposed threaded backreference mechanism would append a single, variable-length-encoded byte offset to the end of each page after the packet data. This addiiton would modify the capture criteria such that capture must account for the optional, unsignaled field. After reading packet data, demux framing would have to check for either a capture pattern, or a backreference followed by a capture pattern.

Variations of this proposal could also include bisection hints to further reduce the order of complexity of the bisection search.

As I see it, this middle of the road approach is a compromise without real advantages. In its favor:

  1. Streams that make use of backreferences (most AV streams) would avoid the second bisection search, always reducing the average number of seeks by a little more than one.

However, there are significant relative disadvantages.

  1. Like Proposal 2, the threaded backreference mechanism means seeking makes use of two ~ unrelated mechanisms instead of one. The threaded backreference increases mux and demux complexity.
  2. The average seek number drops from ~ 4.5 physcal seeks to ~ 3.5. This is not a significant improvement given the additional compexity. Proposal 2 is simpler, similar overhead, and drops best case seeks to 1.
  3. Like proposal 2, the additional backreference offset is redundant, non-authoritative, and non-structural.

In short, this mechanism has the disadvantages of Proposal 2 (in addition to being weird) but does not deliver as much advantage as Proposal 2.

Personal tools


Main Page

Xiph.Org Projects

Audio—

Video—

Text—

Container—

Streaming—