TransOgg Seeking Proposals: Difference between revisions

From XiphWiki
Jump to navigation Jump to search
(Blow new text document into wiki; not formatted as yet.)
 
m (formatting)
 
Line 1: Line 1:
Three proposals for seeking structures/methods in transOgg
Three proposals for seeking structures/methods in transOgg


Seeking within a container is an example of 'we can't have everything,
=Overview=
so we have to choose what we really want'.  We consider the following
design requirements wants and needs:


  1) Efficient seeking with minimum latency.  Minimum overall latency
Seeking within a container is an example of 'we can't have everything, so we have to choose what we really want'We consider the following design requirements wants and needs:
  can depend heavily on the scenario, where seeking within a file on a
  hard disk, over http on a high bandwidth link/high latency and over
  http on a low bandwidth/low latency link will all have different
  optimal design requiresThus the design will have to shoot for
  good behavior across the lot.


  2) Low or no additional overhead.  Because, hey, wasted space is
# Efficient seeking with minimum latency.  Minimum overall latency can depend heavily on the scenario, where seeking within a file on a hard disk, over http on a high bandwidth link/high latency and over http on a low bandwidth/low latency link will all have different optimal design requires.  Thus the design will have to shoot for good behavior across the lot.
  wasted space.
# Low or no additional overhead.  Because, hey, wasted space is wasted space.
# Seeking must be available within Ogg's specific streaming design philosophy; an encoder must be able to write a seekable file in one pass, and a demuxer must be able to seek in any valid Ogg file.
# Muxing must not require repaginating existing data (the 'deck ofcards' metaphor)
# A design that the designers can actually document completely and that implementors are actually willing and able to implement.


  3) Seeking must be available within Ogg's specific streaming design
Seeking in original Ogg treated 2, 3, 4 as paramount, 1 as secondary, and was a disaster due to 5.  transOgg will need to strike a different balance.  I submit three seeking mechanism proposals for consideration.
  philosophy; an encoder must be able to write a seekable file in one
  pass, and a demuxer must be able to seek in any valid Ogg file.


  4) Muxing must not require repaginating existing data (the 'deck of
= Proposal 1: The original Ogg bisection search only =
  cards' metaphor)


  5) A design that the designers can actually document completely and
We stick with the original seeking algorithm as intended in the original Ogg bitstream and correct the corner cases that made it a serious pain.  There is no index. Seeking is done via bisection search:
  that implementors are actually willing and able to implement.


Seeking in original Ogg treated 2, 3, 4 as paramount, 1 as
# Make an initial guess as to the physical position of the desired seekpoint based on known stream duration and simple linear interpolation.
secondary, and was a disaster due to 5. transOgg will need to strike
# Seek and capture a page
a different balanceI submit three seeking mechanism proposals for
# Construct a new interpolated best guess as to the desired seek location based on DTS of captured page
consideration.
# repeat steps 2 and 3 until we have seen contiguous pages, the first with a time stamp immediately preceeding the desired point in time and the second with a time stamp equal-to-or-following the desired point.
#
## If none of the codec streams being decoded use backreferences, we're finished.
## In the event of codec streams with backreferences (such as is typical in video), the pages for the codec stream also include the DTS of the earliest required referenceRead forward until we've seen one page from each decoded stream using backreferences; save the earliest backreference DTS.
# Repeat the bisection search in 2-4 using the backreference DTS as the desired final seek location.


Proposal 1: The original Ogg bisection search only
This algorithm appears more complex than it is; bisection searching is a well understood technique, and the average bisection search in a 1GB ogg file (of mixed Vorbis audio and Theora video) required approximately 3.5 seeks for the initial seek and just over one more seek for the second (as the first search can inform the second quite well).


  We stick with the original seeking algorithm as intended in the
This original Ogg seeking method was unpopular for several reasons:
  original Ogg bitstream and correct the corner cases that made it a
  serious pain.  There is no index.  Seeking is done via bisection
  search:


  1) Make an initial guess as to the physical position of the desired
# The DTS and backreference encoding was defined by-codec, and thus decoding the timestamps required additional infrastructure that most frameworks had to code from scratch. Several frameworks never implemented precise seeking for this reason alone.
    seekpoint based on known stream duration and simple linear
# Corner cases introduced by specific design decisions (eg, the presence of pages with no timestamps) complicated the actual implementation.  In addition, correct behavior for these corner cases was not well documented or not documented at all.
    interpolation.
# Xiph never implemented its own all-encompassing framework to provide an example of complete seeking that worked in any Ogg file.  In general, we hardwired our own apps for seeking in either Vorbis alone or Vorbis+Theora only.
# Many frameworks that attempted to implement proper bisection seeking had bugs that went unnoticed or unfixed (owing to all of the above). These bugs could increase the number of seeks in a bisection search by a factor of 10X-100X. This gave bisection seeking a reputation for inherent poor performace.  Even Vorbisfile has suffered from a number of bugs of exactly this sort at several points.
# Stream structure discovery was also based on performing multiple bisection searches to find link boundaries. Given all the points above, and the fact that stream structure decisions added additional corner cases to implementations, structure discovery also tended to be buggy and slow.


  2) Seek and capture a page
The lesson of the above is that it is not enough to have a conceptually simple design.  The actual implementation complexity can easily explode given seemingly small design decisions going in the wrong direction.


  3) Construct a new interpolated best guess as to the desired seek
The primary strikes against proposal 1 are:
    location based on DTS of captured page


  4) repeat steps 2 and 3 until we have seen contiguous pages, the
# Given the practically poor and unstable performance of preceding bisection seeking implmentations, it will be difficult to sell adopter opinion on an updated bisection design, even one that farts unicorns.
    first with a time stamp immediately preceeding the desired point
# Even when the bisection search performs perfectly, it still requires multiple seeks (4.5 on average for exact seeking in a very smart bisector, and an upper bound of 7 for precise seek in a simplest-possible-correct implementation).
    in time and the second with a time stamp equal-to-or-following
# Backreference discovery requires reading enough data to find at least one page of each stream that uses backreferences.  Although the required amount of data is bounded (and on average, much less than the upper bound) this too increases the seeking latency.
    the desired point.


  5.a) If none of the codec streams being decoded use backreferences,
The primary advantages of proposal 1 are:
    we're finished.


  5.b) In the event of codec streams with backreferences (such as is
# The design is conceptually simple amd presents a single mechanism making use of non-redundant, authoritative data. Although it is still possible to mux an invalid file, it is much harder to end up with a stream in which second-order seeking 'helper' structures disagree with the authoritative timestamps within the page data.
    typical in video), the pages for the codec stream also include
# Proper design would allow an optimal seek implementation nearly as simple as the design.
    the DTS of the earliest required reference. Read forward
# Zero additional data overhead.
    until we've seen one page from each decoded stream using
# No additional requirements on a muxer (muxers need only obey timestamp ordering. The small deviations from this plan in original Ogg were rather serious errors in hindsight)
    backreferences; save the earliest backreference DTS.
# Strict adherence to the overall Ogg muxing philosophy of having no features unavailable in a stream context (every feature of the stream format is available to a streaming muxer).


  6) Repeat the bisection search in 2-4 using the backreference
= Proposal 2: Mandatory bisection index with an optional head-or-tail index =
    DTS as the desired final seek location.


  This algorithm appears more complex than it is; bisection searching
The structure of the Proposal 2 Ogg file is as in Proposal 1, with
  is a well understood technique, and the average bisection search in
no changes to the bisection seeking algorithm.  However, in addition
  a 1GB ogg file (of mixed Vorbis audio and Theora video) required
to the bisection seeking mechanism, an index may be optionally
  approximately 3.5 seeks for the initial seek and just over one more
present.
  seek for the second (as the first search can inform the second quite
  well).


  This original Ogg seeking method was unpopular for several reasons:
# The index may be in either the header or footer of the file (but not both). --- point of discussion; perhaps only in the tail? This would make the index a fully encode-stream-available feature.
# The index consists of hard byte offsets to the earliest backreference required to decode all continuous media types from the specified DTS. -- point of discussion; secondary index for discontinuous media streams? Discontinuous spacing can easily be too wide to include in a single-backreference index system; there would potentially need to be a dedicated index per discontinuous stream.
# The index does not have a specific, hard resolution.  It may be as fine or coarse as desired.
# The index is ephemeral and non-authoritative. It can be generated or regenerated trivially via a linear scan of the stream. Remuxing a stream would necessarily invalidate any preexisting index.
# The index is not mandatory and cannot be relied upon to exist.


  1) The DTS and backreference encoding was defined by-codec,
Experience with Matroska and the original version of Ogg shows that
    and thus decoding the timestamps required additional
where an index and a bisection mechanism coexist, implementations
    infrastructure that most frameworks had to code from scratch.
usually ignore the bisection mechanism entirely and rely on the
    Several frameworks never implemented precise seeking for this
index alone, refusing to seek when no index is present. This had
    reason alone.
been the original reason for resisting an index in Ogg, borne out by
the the fact that few Matroska demuxers implement the mandatory
bisection seeking and so can seek only in Matroska files with
indexes.  WebM went one step further by mandating an index and
eliminating the Matroska bisection seek mechanism entirely.


  2) Corner cases introduced by specific design decisions (eg, the
An index is undeniably useful.  Bisection seeking requires 3.5 seeks
    presence of pages with no timestamps) complicated the actual
on average (without backreferences) and between 4.5 and 7 average
    implementationIn addition, correct behavior for these corner
seeks in a file with backreferences, depending on the sophistication
    cases was not well documented or not documented at all.
of the search algorithmIndexes do not necessarily eliminate all
seeks, and may not improve upon the amount of data that needs to be
read, but it's possible to generate an index that does so whenever
such an index is warranted.


  3) Xiph never implemented its own all-encompassing framework to
Thus Proposal 2 assumes that the benefit of an index (improved
    provide an example of complete seeking that worked in any Ogg
seeking performance) outweighs the disadvantages (incomplete demuxer
    file.  In general, we hardwired our own apps for seeking in
implementations that cannot seek when the index is absent).
    either Vorbis alone or Vorbis+Theora only.


  4) Many frameworks that attempted to implement proper bisection
The disadvantages of Proposal 2 are:
    seeking had bugs that went unnoticed or unfixed (owing to all of
    the above). These bugs could increase the number of seeks in a
    bisection search by a factor of 10X-100X. This gave bisection
    seeking a reputation for inherent poor performace.  Even
    Vorbisfile has suffered from a number of bugs of exactly this
    sort at several points.


  5) Stream structure discovery was also based on performing multiple
# Redundant, non-structural information. Having a non-authoritative index increases the opportunity for mux and demux implementation bugs. There will certainly be cases in the wild where the index does not agree with the stream.  This disagreement isn't possible if there's no non-structural or redundant seeking information.
    bisection searches to find link boundaries. Given all the points
# A front-positioned index breaks the Ogg 'everything is always a stream' model, as a front-positioned index necessarily requires a second pass.
    above, and the fact that stream structure decisions added
# Experience in multiple containers show that implementors will tend to implement index-only seeking [and even tell users their files are 'broken' if the files don't contain an index].
    additional corner cases to implementations, structure discovery
    also tended to be buggy and slow.


  The lesson of the above is that it is not enough to have a
The advantages of Proposal 2 are:
  conceptually simple design.  The actual implementation complexity
  can easily explode given seemingly small design decisions going in
  the wrong direction.


  The primary strikes against proposal 1 are:
# Lower latency. Indexes can be generated for consumption models like the web where seeks can be very high latency, and reducing the seek number to 'always only 1' is a boon.
# A tail-positioned index would preserve the streaming model and keep time-to-first-frame low by avoiding a large index in the header at stream startup.  The index can be grabbed from the stream tail asynchronously.
# Coarse-grained indexes can be used as a mechanism to inform bisection search, still resulting in fewer seeks while also reducing the index overhead in situations where this overhead is critical.


  1) Given the practically poor and unstable performance of preceding
= Proposal 3: Bisection search with threaded backreference =
    bisection seeking implmentations, it will be difficult to sell
    adopter opinion on an updated bisection design, even one that
    farts unicorns.


  2) Even when the bisection search performs perfectly, it still
Proposal 3 makes use of bisection seeking as above, but rather than
    requires multiple seeks (4.5 on average for exact seeking in a
using an index, adds an auxiliary backreference structure threaded
    very smart bisector, and an upper bound of 7 for precise seek in
through the stream in order to eliminate the second stage of the
    a simplest-possible-correct implementation).
bisection seek.  The bisection sarch algorithm always short-circuits
at step 5:


  3) Backreference discovery requires reading enough data to find at
# Make an initial guess as to the physical position of the desired seekpoint based on known stream duration and simple linear interpolation.
    least one page of each stream that uses backreferences. Although
# Seek and capture a page
    the required amount of data is bounded (and on average, much less
# Construct a new interpolated best guess as to the desired seek location based on DTS of captured page
    than the upper bound) this too increases the seeking latency.
# repeat steps 2 and 3 until we have seen contiguous pages, the first with a time stamp immediately preceeding the desired point in time and the second with a time stamp equal-to-or-following the desired point.
#
## If none of the codec streams being decoded use backreferences, we're finished.
## If we need to seek to a backreference, the auxiliary backreference structure gives us a hard byte offset to seek back to.


  The primary advantages of proposal 1 are:
The proposed threaded backreference mechanism would append a single,
variable-length-encoded byte offset to the end of each page after
the packet data.  This addiiton would modify the capture criteria
such that capture must account for the optional, unsignaled field.
After reading packet data, demux framing would have to check for
either a capture pattern, or a backreference followed by a capture
pattern.


  1) The design is conceptually simple amd presents a single mechanism
Variations of this proposal could also include bisection hints to
    making use of non-redundant, authoritative data.  Although it is
further reduce the order of complexity of the bisection search.
    still possible to mux an invalid file, it is much harder to end
    up with a stream in which second-order seeking 'helper'
    structures disagree with the authoritative timestamps within the
    page data.


  2) Proper design would allow an optimal seek implementation nearly as
As I see it, this middle of the road approach is a compromise
    simple as the design.
without real advantages. In its favor:


  3) Zero additional data overhead.
# Streams that make use of backreferences (most AV streams) would avoid the second bisection search, always reducing the average number of seeks by a little more than one.


  4) No additional requirements on a muxer (muxers need only obey
However, there are significant relative disadvantages.
    timestamp ordering. The small deviations from this plan in
    original Ogg were rather serious errors in hindsight)


  5) Strict adherence to the overall Ogg muxing philosophy of having
# Like Proposal 2, the threaded backreference mechanism means seeking makes use of two ~ unrelated mechanisms instead of one. The threaded backreference increases mux and demux complexity.
    no features unavailable in a stream context (every feature of the
# The average seek number drops from ~ 4.5 physcal seeks to ~ 3.5. This is not a significant improvement given the additional compexity.  Proposal 2 is simpler, similar overhead, and drops best case seeks to 1.
    stream format is available to a streaming muxer).
# Like proposal 2, the additional backreference offset is redundant, non-authoritative, and non-structural.
 
Proposal 2: Mandatory bisection index with an optional head-or-tail index
 
  The structure of the Proposal 2 Ogg file is as in Proposal 1, with
  no changes to the bisection seeking algorithm.  However, in addition
  to the bisection seeking mechanism, an index may be optionally
  present.
 
  1) The index may be in either the header or footer of the file (but
    not both). --- point of discussion; perhaps only in the tail?
    This would make the index a fully encode-stream-available
    feature.
 
  2) The index consists of hard byte offsets to the earliest
    backreference required to decode all continuous media types from
    the specified DTS. -- point of discussion; secondary index for
    discontinuous media streams? Discontinuous spacing can easily be
    too wide to include in a single-backreference index system; there
    would potentially need to be a dedicated index per discontinuous
    stream.
 
  3) The index does not have a specific, hard resolution.  It may be
    as fine or coarse as desired.
 
  4) The index is ephemeral and non-authoritative. It can be generated
    or regenerated trivially via a linear scan of the stream.
    Remuxing a stream would necessarily invalidate any preexisting
    index.
 
  5) The index is not mandatory and cannot be relied upon to exist.
 
  Experience with Matroska and the original version of Ogg shows that
  where an index and a bisection mechanism coexist, implementations
  usually ignore the bisection mechanism entirely and rely on the
  index alone, refusing to seek when no index is present.  This had
  been the original reason for resisting an index in Ogg, borne out by
  the the fact that few Matroska demuxers implement the mandatory
  bisection seeking and so can seek only in Matroska files with
  indexes.  WebM went one step further by mandating an index and
  eliminating the Matroska bisection seek mechanism entirely.
 
  An index is undeniably useful.  Bisection seeking requires 3.5 seeks
  on average (without backreferences) and between 4.5 and 7 average
  seeks in a file with backreferences, depending on the sophistication
  of the search algorithm.  Indexes do not necessarily eliminate all
  seeks, and may not improve upon the amount of data that needs to be
  read, but it's possible to generate an index that does so whenever
  such an index is warranted.
 
  Thus Proposal 2 assumes that the benefit of an index (improved
  seeking performance) outweighs the disadvantages (incomplete demuxer
  implementations that cannot seek when the index is absent).
 
  The disadvantages of Proposal 2 are:
 
  1) Redundant, non-structural information. Having a non-authoritative
    index increases the opportunity for mux and demux implementation
    bugs. There will certainly be cases in the wild where the index
    does not agree with the stream.  This disagreement isn't possible
    if there's no non-structural or redundant seeking information.
 
  2) A front-positioned index breaks the Ogg 'everything is always a
    stream' model, as a front-positioned index necessarily requires a
    second pass.
 
  3) Experience in multiple containers show that implementors will
    tend to implement index-only seeking [and even tell users their
    files are 'broken' if the files don't contain an index].
 
  The advantages of Proposal 2 are:
 
  1) Latency. Indexes can be generated for consumption models like the
    web where seeks can be very high latency, and reducing the seek
    number to 'always only 1' is a boon.
 
  2) A tail-positioned index would preserve the streaming model and
    keep time-to-first-frame low by avoiding a large index in the
    header at stream startup.  The index can be grabbed from the
    stream tail asynchronously.
 
  3) Coarse-grained indexes can be used as a mechanism to inform
    bisection search, still resulting in fewer seeks while also
    reducing the index overhead in situations where this overhead is
    critical.
 
Proposal 3: Bisection search with threaded backreference
 
  Proposal 3 makes use of bisection seeking as above, but rather than
  using an index, adds an auxiliary backreference structure threaded
  through the stream in order to eliminate the second stage of the
  bisection seek.  The bisection sarch algorithm always short-circuits
  at step 5:
 
  1) Make an initial guess as to the physical position of the desired
    seekpoint based on known stream duration and simple linear
    interpolation.
 
  2) Seek and capture a page
 
  3) Construct a new interpolated best guess as to the desired seek
    location based on DTS of captured page
 
  4) repeat steps 2 and 3 until we have seen contiguous pages, the
    first with a time stamp immediately preceeding the desired point
    in time and the second with a time stamp equal-to-or-following
    the desired point.
 
  5.a) If none of the codec streams being decoded use backreferences,
    we're finished.
 
  5.b) If we need to seek to a backreference, the auxiliary
    backreference structure gives us a hard byte offset to seek back
    to.
  The proposed threaded backreference mechanism would append a single,
  variable-length-encoded byte offset to the end of each page after
  the packet data.  This addiiton would modify the capture criteria
  such that capture must account for the optional, unsignaled field.
  After reading packet data, demux framing would have to check for
  either a capture pattern, or a backreference followed by a capture
  pattern.
 
  Variations of this proposal could also include bisection hints to
  further reduce the order of complexity of the bisection search.
 
  As I see it, this middle of the road approach is a compromise
  without real advantages.  In its favor:
 
  1) Streams that make use of backreferences (most AV streams) would
    avoid the second bisection search, always reducing the average
    number of seeks by a little more than one.
 
  However, there are significant disadvantages.
 
  1) Like Proposal 2, the threaded backreference mechanism means
    seeking makes use of two ~ unrelated mechanisms instead of one.
    The threaded backreference increases mux and demux complexity.
 
  2) The average seek number drops from ~ 4.5 physcal seeks to ~ 3.5.
    This is not a significant improvement given the additional
    compexity.  Proposal 2 is simpler, similar overhead, and drops
    best case seeks to 1.
 
  3) Like proposal 2, the additional backreference offset is
    redundant, non-authoritative, and non-structural.


In short, this mechanism has the disadvantages of Proposal 2 (in
In short, this mechanism has the disadvantages of Proposal 2 (in
addition to being weird) but does not deliver as much advantage as
addition to being weird) but does not deliver as much advantage as
Proposal 2.
Proposal 2.

Latest revision as of 15:14, 18 September 2012

Three proposals for seeking structures/methods in transOgg

Overview

Seeking within a container is an example of 'we can't have everything, so we have to choose what we really want'. We consider the following design requirements wants and needs:

  1. Efficient seeking with minimum latency. Minimum overall latency can depend heavily on the scenario, where seeking within a file on a hard disk, over http on a high bandwidth link/high latency and over http on a low bandwidth/low latency link will all have different optimal design requires. Thus the design will have to shoot for good behavior across the lot.
  2. Low or no additional overhead. Because, hey, wasted space is wasted space.
  3. Seeking must be available within Ogg's specific streaming design philosophy; an encoder must be able to write a seekable file in one pass, and a demuxer must be able to seek in any valid Ogg file.
  4. Muxing must not require repaginating existing data (the 'deck ofcards' metaphor)
  5. A design that the designers can actually document completely and that implementors are actually willing and able to implement.

Seeking in original Ogg treated 2, 3, 4 as paramount, 1 as secondary, and was a disaster due to 5. transOgg will need to strike a different balance. I submit three seeking mechanism proposals for consideration.

Proposal 1: The original Ogg bisection search only

We stick with the original seeking algorithm as intended in the original Ogg bitstream and correct the corner cases that made it a serious pain. There is no index. Seeking is done via bisection search:

  1. Make an initial guess as to the physical position of the desired seekpoint based on known stream duration and simple linear interpolation.
  2. Seek and capture a page
  3. Construct a new interpolated best guess as to the desired seek location based on DTS of captured page
  4. repeat steps 2 and 3 until we have seen contiguous pages, the first with a time stamp immediately preceeding the desired point in time and the second with a time stamp equal-to-or-following the desired point.
    1. If none of the codec streams being decoded use backreferences, we're finished.
    2. In the event of codec streams with backreferences (such as is typical in video), the pages for the codec stream also include the DTS of the earliest required reference. Read forward until we've seen one page from each decoded stream using backreferences; save the earliest backreference DTS.
  5. Repeat the bisection search in 2-4 using the backreference DTS as the desired final seek location.

This algorithm appears more complex than it is; bisection searching is a well understood technique, and the average bisection search in a 1GB ogg file (of mixed Vorbis audio and Theora video) required approximately 3.5 seeks for the initial seek and just over one more seek for the second (as the first search can inform the second quite well).

This original Ogg seeking method was unpopular for several reasons:

  1. The DTS and backreference encoding was defined by-codec, and thus decoding the timestamps required additional infrastructure that most frameworks had to code from scratch. Several frameworks never implemented precise seeking for this reason alone.
  2. Corner cases introduced by specific design decisions (eg, the presence of pages with no timestamps) complicated the actual implementation. In addition, correct behavior for these corner cases was not well documented or not documented at all.
  3. Xiph never implemented its own all-encompassing framework to provide an example of complete seeking that worked in any Ogg file. In general, we hardwired our own apps for seeking in either Vorbis alone or Vorbis+Theora only.
  4. Many frameworks that attempted to implement proper bisection seeking had bugs that went unnoticed or unfixed (owing to all of the above). These bugs could increase the number of seeks in a bisection search by a factor of 10X-100X. This gave bisection seeking a reputation for inherent poor performace. Even Vorbisfile has suffered from a number of bugs of exactly this sort at several points.
  5. Stream structure discovery was also based on performing multiple bisection searches to find link boundaries. Given all the points above, and the fact that stream structure decisions added additional corner cases to implementations, structure discovery also tended to be buggy and slow.

The lesson of the above is that it is not enough to have a conceptually simple design. The actual implementation complexity can easily explode given seemingly small design decisions going in the wrong direction.

The primary strikes against proposal 1 are:

  1. Given the practically poor and unstable performance of preceding bisection seeking implmentations, it will be difficult to sell adopter opinion on an updated bisection design, even one that farts unicorns.
  2. Even when the bisection search performs perfectly, it still requires multiple seeks (4.5 on average for exact seeking in a very smart bisector, and an upper bound of 7 for precise seek in a simplest-possible-correct implementation).
  3. Backreference discovery requires reading enough data to find at least one page of each stream that uses backreferences. Although the required amount of data is bounded (and on average, much less than the upper bound) this too increases the seeking latency.

The primary advantages of proposal 1 are:

  1. The design is conceptually simple amd presents a single mechanism making use of non-redundant, authoritative data. Although it is still possible to mux an invalid file, it is much harder to end up with a stream in which second-order seeking 'helper' structures disagree with the authoritative timestamps within the page data.
  2. Proper design would allow an optimal seek implementation nearly as simple as the design.
  3. Zero additional data overhead.
  4. No additional requirements on a muxer (muxers need only obey timestamp ordering. The small deviations from this plan in original Ogg were rather serious errors in hindsight)
  5. Strict adherence to the overall Ogg muxing philosophy of having no features unavailable in a stream context (every feature of the stream format is available to a streaming muxer).

Proposal 2: Mandatory bisection index with an optional head-or-tail index

The structure of the Proposal 2 Ogg file is as in Proposal 1, with no changes to the bisection seeking algorithm. However, in addition to the bisection seeking mechanism, an index may be optionally present.

  1. The index may be in either the header or footer of the file (but not both). --- point of discussion; perhaps only in the tail? This would make the index a fully encode-stream-available feature.
  2. The index consists of hard byte offsets to the earliest backreference required to decode all continuous media types from the specified DTS. -- point of discussion; secondary index for discontinuous media streams? Discontinuous spacing can easily be too wide to include in a single-backreference index system; there would potentially need to be a dedicated index per discontinuous stream.
  3. The index does not have a specific, hard resolution. It may be as fine or coarse as desired.
  4. The index is ephemeral and non-authoritative. It can be generated or regenerated trivially via a linear scan of the stream. Remuxing a stream would necessarily invalidate any preexisting index.
  5. The index is not mandatory and cannot be relied upon to exist.

Experience with Matroska and the original version of Ogg shows that where an index and a bisection mechanism coexist, implementations usually ignore the bisection mechanism entirely and rely on the index alone, refusing to seek when no index is present. This had been the original reason for resisting an index in Ogg, borne out by the the fact that few Matroska demuxers implement the mandatory bisection seeking and so can seek only in Matroska files with indexes. WebM went one step further by mandating an index and eliminating the Matroska bisection seek mechanism entirely.

An index is undeniably useful. Bisection seeking requires 3.5 seeks on average (without backreferences) and between 4.5 and 7 average seeks in a file with backreferences, depending on the sophistication of the search algorithm. Indexes do not necessarily eliminate all seeks, and may not improve upon the amount of data that needs to be read, but it's possible to generate an index that does so whenever such an index is warranted.

Thus Proposal 2 assumes that the benefit of an index (improved seeking performance) outweighs the disadvantages (incomplete demuxer implementations that cannot seek when the index is absent).

The disadvantages of Proposal 2 are:

  1. Redundant, non-structural information. Having a non-authoritative index increases the opportunity for mux and demux implementation bugs. There will certainly be cases in the wild where the index does not agree with the stream. This disagreement isn't possible if there's no non-structural or redundant seeking information.
  2. A front-positioned index breaks the Ogg 'everything is always a stream' model, as a front-positioned index necessarily requires a second pass.
  3. Experience in multiple containers show that implementors will tend to implement index-only seeking [and even tell users their files are 'broken' if the files don't contain an index].

The advantages of Proposal 2 are:

  1. Lower latency. Indexes can be generated for consumption models like the web where seeks can be very high latency, and reducing the seek number to 'always only 1' is a boon.
  2. A tail-positioned index would preserve the streaming model and keep time-to-first-frame low by avoiding a large index in the header at stream startup. The index can be grabbed from the stream tail asynchronously.
  3. Coarse-grained indexes can be used as a mechanism to inform bisection search, still resulting in fewer seeks while also reducing the index overhead in situations where this overhead is critical.

Proposal 3: Bisection search with threaded backreference

Proposal 3 makes use of bisection seeking as above, but rather than using an index, adds an auxiliary backreference structure threaded through the stream in order to eliminate the second stage of the bisection seek. The bisection sarch algorithm always short-circuits at step 5:

  1. Make an initial guess as to the physical position of the desired seekpoint based on known stream duration and simple linear interpolation.
  2. Seek and capture a page
  3. Construct a new interpolated best guess as to the desired seek location based on DTS of captured page
  4. repeat steps 2 and 3 until we have seen contiguous pages, the first with a time stamp immediately preceeding the desired point in time and the second with a time stamp equal-to-or-following the desired point.
    1. If none of the codec streams being decoded use backreferences, we're finished.
    2. If we need to seek to a backreference, the auxiliary backreference structure gives us a hard byte offset to seek back to.

The proposed threaded backreference mechanism would append a single, variable-length-encoded byte offset to the end of each page after the packet data. This addiiton would modify the capture criteria such that capture must account for the optional, unsignaled field. After reading packet data, demux framing would have to check for either a capture pattern, or a backreference followed by a capture pattern.

Variations of this proposal could also include bisection hints to further reduce the order of complexity of the bisection search.

As I see it, this middle of the road approach is a compromise without real advantages. In its favor:

  1. Streams that make use of backreferences (most AV streams) would avoid the second bisection search, always reducing the average number of seeks by a little more than one.

However, there are significant relative disadvantages.

  1. Like Proposal 2, the threaded backreference mechanism means seeking makes use of two ~ unrelated mechanisms instead of one. The threaded backreference increases mux and demux complexity.
  2. The average seek number drops from ~ 4.5 physcal seeks to ~ 3.5. This is not a significant improvement given the additional compexity. Proposal 2 is simpler, similar overhead, and drops best case seeks to 1.
  3. Like proposal 2, the additional backreference offset is redundant, non-authoritative, and non-structural.

In short, this mechanism has the disadvantages of Proposal 2 (in addition to being weird) but does not deliver as much advantage as Proposal 2.