https://wiki.xiph.org/index.php?title=TransOgg_Seeking_Proposals&feed=atom&action=historyTransOgg Seeking Proposals - Revision history2024-03-19T10:10:30ZRevision history for this page on the wikiMediaWiki 1.40.1https://wiki.xiph.org/index.php?title=TransOgg_Seeking_Proposals&diff=13672&oldid=prevXiphmont: formatting2012-09-18T22:14:46Z<p>formatting</p>
<a href="https://wiki.xiph.org/index.php?title=TransOgg_Seeking_Proposals&diff=13672&oldid=13671">Show changes</a>Xiphmonthttps://wiki.xiph.org/index.php?title=TransOgg_Seeking_Proposals&diff=13671&oldid=prevXiphmont: Blow new text document into wiki; not formatted as yet.2012-09-18T21:51:30Z<p>Blow new text document into wiki; not formatted as yet.</p>
<p><b>New page</b></p><div>Three proposals for seeking structures/methods in transOgg<br />
<br />
Seeking within a container is an example of 'we can't have everything,<br />
so we have to choose what we really want'. We consider the following<br />
design requirements wants and needs:<br />
<br />
1) Efficient seeking with minimum latency. Minimum overall latency<br />
can depend heavily on the scenario, where seeking within a file on a<br />
hard disk, over http on a high bandwidth link/high latency and over<br />
http on a low bandwidth/low latency link will all have different<br />
optimal design requires. Thus the design will have to shoot for<br />
good behavior across the lot.<br />
<br />
2) Low or no additional overhead. Because, hey, wasted space is<br />
wasted space.<br />
<br />
3) Seeking must be available within Ogg's specific streaming design<br />
philosophy; an encoder must be able to write a seekable file in one<br />
pass, and a demuxer must be able to seek in any valid Ogg file.<br />
<br />
4) Muxing must not require repaginating existing data (the 'deck of<br />
cards' metaphor)<br />
<br />
5) A design that the designers can actually document completely and<br />
that implementors are actually willing and able to implement.<br />
<br />
Seeking in original Ogg treated 2, 3, 4 as paramount, 1 as<br />
secondary, and was a disaster due to 5. transOgg will need to strike<br />
a different balance. I submit three seeking mechanism proposals for<br />
consideration.<br />
<br />
Proposal 1: The original Ogg bisection search only<br />
<br />
We stick with the original seeking algorithm as intended in the<br />
original Ogg bitstream and correct the corner cases that made it a<br />
serious pain. There is no index. Seeking is done via bisection<br />
search:<br />
<br />
1) Make an initial guess as to the physical position of the desired<br />
seekpoint based on known stream duration and simple linear<br />
interpolation.<br />
<br />
2) Seek and capture a page<br />
<br />
3) Construct a new interpolated best guess as to the desired seek<br />
location based on DTS of captured page<br />
<br />
4) repeat steps 2 and 3 until we have seen contiguous pages, the<br />
first with a time stamp immediately preceeding the desired point<br />
in time and the second with a time stamp equal-to-or-following<br />
the desired point.<br />
<br />
5.a) If none of the codec streams being decoded use backreferences,<br />
we're finished.<br />
<br />
5.b) In the event of codec streams with backreferences (such as is<br />
typical in video), the pages for the codec stream also include<br />
the DTS of the earliest required reference. Read forward<br />
until we've seen one page from each decoded stream using<br />
backreferences; save the earliest backreference DTS.<br />
<br />
6) Repeat the bisection search in 2-4 using the backreference<br />
DTS as the desired final seek location.<br />
<br />
This algorithm appears more complex than it is; bisection searching<br />
is a well understood technique, and the average bisection search in<br />
a 1GB ogg file (of mixed Vorbis audio and Theora video) required<br />
approximately 3.5 seeks for the initial seek and just over one more<br />
seek for the second (as the first search can inform the second quite<br />
well).<br />
<br />
This original Ogg seeking method was unpopular for several reasons:<br />
<br />
1) The DTS and backreference encoding was defined by-codec,<br />
and thus decoding the timestamps required additional<br />
infrastructure that most frameworks had to code from scratch.<br />
Several frameworks never implemented precise seeking for this<br />
reason alone.<br />
<br />
2) Corner cases introduced by specific design decisions (eg, the<br />
presence of pages with no timestamps) complicated the actual<br />
implementation. In addition, correct behavior for these corner<br />
cases was not well documented or not documented at all.<br />
<br />
3) Xiph never implemented its own all-encompassing framework to<br />
provide an example of complete seeking that worked in any Ogg<br />
file. In general, we hardwired our own apps for seeking in<br />
either Vorbis alone or Vorbis+Theora only.<br />
<br />
4) Many frameworks that attempted to implement proper bisection<br />
seeking had bugs that went unnoticed or unfixed (owing to all of<br />
the above). These bugs could increase the number of seeks in a<br />
bisection search by a factor of 10X-100X. This gave bisection<br />
seeking a reputation for inherent poor performace. Even<br />
Vorbisfile has suffered from a number of bugs of exactly this<br />
sort at several points.<br />
<br />
5) Stream structure discovery was also based on performing multiple<br />
bisection searches to find link boundaries. Given all the points<br />
above, and the fact that stream structure decisions added<br />
additional corner cases to implementations, structure discovery<br />
also tended to be buggy and slow.<br />
<br />
The lesson of the above is that it is not enough to have a<br />
conceptually simple design. The actual implementation complexity<br />
can easily explode given seemingly small design decisions going in<br />
the wrong direction.<br />
<br />
The primary strikes against proposal 1 are:<br />
<br />
1) Given the practically poor and unstable performance of preceding<br />
bisection seeking implmentations, it will be difficult to sell<br />
adopter opinion on an updated bisection design, even one that<br />
farts unicorns.<br />
<br />
2) Even when the bisection search performs perfectly, it still<br />
requires multiple seeks (4.5 on average for exact seeking in a<br />
very smart bisector, and an upper bound of 7 for precise seek in<br />
a simplest-possible-correct implementation).<br />
<br />
3) Backreference discovery requires reading enough data to find at<br />
least one page of each stream that uses backreferences. Although<br />
the required amount of data is bounded (and on average, much less<br />
than the upper bound) this too increases the seeking latency.<br />
<br />
The primary advantages of proposal 1 are:<br />
<br />
1) The design is conceptually simple amd presents a single mechanism<br />
making use of non-redundant, authoritative data. Although it is<br />
still possible to mux an invalid file, it is much harder to end<br />
up with a stream in which second-order seeking 'helper'<br />
structures disagree with the authoritative timestamps within the<br />
page data.<br />
<br />
2) Proper design would allow an optimal seek implementation nearly as<br />
simple as the design.<br />
<br />
3) Zero additional data overhead.<br />
<br />
4) No additional requirements on a muxer (muxers need only obey<br />
timestamp ordering. The small deviations from this plan in<br />
original Ogg were rather serious errors in hindsight)<br />
<br />
5) Strict adherence to the overall Ogg muxing philosophy of having<br />
no features unavailable in a stream context (every feature of the<br />
stream format is available to a streaming muxer).<br />
<br />
Proposal 2: Mandatory bisection index with an optional head-or-tail index<br />
<br />
The structure of the Proposal 2 Ogg file is as in Proposal 1, with<br />
no changes to the bisection seeking algorithm. However, in addition<br />
to the bisection seeking mechanism, an index may be optionally<br />
present.<br />
<br />
1) The index may be in either the header or footer of the file (but<br />
not both). --- point of discussion; perhaps only in the tail?<br />
This would make the index a fully encode-stream-available<br />
feature.<br />
<br />
2) The index consists of hard byte offsets to the earliest<br />
backreference required to decode all continuous media types from<br />
the specified DTS. -- point of discussion; secondary index for<br />
discontinuous media streams? Discontinuous spacing can easily be<br />
too wide to include in a single-backreference index system; there<br />
would potentially need to be a dedicated index per discontinuous<br />
stream.<br />
<br />
3) The index does not have a specific, hard resolution. It may be<br />
as fine or coarse as desired.<br />
<br />
4) The index is ephemeral and non-authoritative. It can be generated<br />
or regenerated trivially via a linear scan of the stream.<br />
Remuxing a stream would necessarily invalidate any preexisting<br />
index.<br />
<br />
5) The index is not mandatory and cannot be relied upon to exist.<br />
<br />
Experience with Matroska and the original version of Ogg shows that<br />
where an index and a bisection mechanism coexist, implementations<br />
usually ignore the bisection mechanism entirely and rely on the<br />
index alone, refusing to seek when no index is present. This had<br />
been the original reason for resisting an index in Ogg, borne out by<br />
the the fact that few Matroska demuxers implement the mandatory<br />
bisection seeking and so can seek only in Matroska files with<br />
indexes. WebM went one step further by mandating an index and<br />
eliminating the Matroska bisection seek mechanism entirely.<br />
<br />
An index is undeniably useful. Bisection seeking requires 3.5 seeks<br />
on average (without backreferences) and between 4.5 and 7 average<br />
seeks in a file with backreferences, depending on the sophistication<br />
of the search algorithm. Indexes do not necessarily eliminate all<br />
seeks, and may not improve upon the amount of data that needs to be<br />
read, but it's possible to generate an index that does so whenever<br />
such an index is warranted.<br />
<br />
Thus Proposal 2 assumes that the benefit of an index (improved<br />
seeking performance) outweighs the disadvantages (incomplete demuxer<br />
implementations that cannot seek when the index is absent).<br />
<br />
The disadvantages of Proposal 2 are:<br />
<br />
1) Redundant, non-structural information. Having a non-authoritative<br />
index increases the opportunity for mux and demux implementation<br />
bugs. There will certainly be cases in the wild where the index<br />
does not agree with the stream. This disagreement isn't possible<br />
if there's no non-structural or redundant seeking information.<br />
<br />
2) A front-positioned index breaks the Ogg 'everything is always a<br />
stream' model, as a front-positioned index necessarily requires a<br />
second pass.<br />
<br />
3) Experience in multiple containers show that implementors will<br />
tend to implement index-only seeking [and even tell users their<br />
files are 'broken' if the files don't contain an index].<br />
<br />
The advantages of Proposal 2 are:<br />
<br />
1) Latency. Indexes can be generated for consumption models like the<br />
web where seeks can be very high latency, and reducing the seek<br />
number to 'always only 1' is a boon.<br />
<br />
2) A tail-positioned index would preserve the streaming model and<br />
keep time-to-first-frame low by avoiding a large index in the<br />
header at stream startup. The index can be grabbed from the<br />
stream tail asynchronously.<br />
<br />
3) Coarse-grained indexes can be used as a mechanism to inform<br />
bisection search, still resulting in fewer seeks while also<br />
reducing the index overhead in situations where this overhead is<br />
critical.<br />
<br />
Proposal 3: Bisection search with threaded backreference<br />
<br />
Proposal 3 makes use of bisection seeking as above, but rather than<br />
using an index, adds an auxiliary backreference structure threaded<br />
through the stream in order to eliminate the second stage of the<br />
bisection seek. The bisection sarch algorithm always short-circuits<br />
at step 5:<br />
<br />
1) Make an initial guess as to the physical position of the desired<br />
seekpoint based on known stream duration and simple linear<br />
interpolation.<br />
<br />
2) Seek and capture a page<br />
<br />
3) Construct a new interpolated best guess as to the desired seek<br />
location based on DTS of captured page<br />
<br />
4) repeat steps 2 and 3 until we have seen contiguous pages, the<br />
first with a time stamp immediately preceeding the desired point<br />
in time and the second with a time stamp equal-to-or-following<br />
the desired point.<br />
<br />
5.a) If none of the codec streams being decoded use backreferences,<br />
we're finished.<br />
<br />
5.b) If we need to seek to a backreference, the auxiliary<br />
backreference structure gives us a hard byte offset to seek back<br />
to.<br />
The proposed threaded backreference mechanism would append a single,<br />
variable-length-encoded byte offset to the end of each page after<br />
the packet data. This addiiton would modify the capture criteria<br />
such that capture must account for the optional, unsignaled field.<br />
After reading packet data, demux framing would have to check for<br />
either a capture pattern, or a backreference followed by a capture<br />
pattern.<br />
<br />
Variations of this proposal could also include bisection hints to<br />
further reduce the order of complexity of the bisection search.<br />
<br />
As I see it, this middle of the road approach is a compromise<br />
without real advantages. In its favor:<br />
<br />
1) Streams that make use of backreferences (most AV streams) would<br />
avoid the second bisection search, always reducing the average<br />
number of seeks by a little more than one.<br />
<br />
However, there are significant disadvantages.<br />
<br />
1) Like Proposal 2, the threaded backreference mechanism means<br />
seeking makes use of two ~ unrelated mechanisms instead of one.<br />
The threaded backreference increases mux and demux complexity.<br />
<br />
2) The average seek number drops from ~ 4.5 physcal seeks to ~ 3.5.<br />
This is not a significant improvement given the additional<br />
compexity. Proposal 2 is simpler, similar overhead, and drops<br />
best case seeks to 1.<br />
<br />
3) Like proposal 2, the additional backreference offset is<br />
redundant, non-authoritative, and non-structural.<br />
<br />
In short, this mechanism has the disadvantages of Proposal 2 (in<br />
addition to being weird) but does not deliver as much advantage as<br />
Proposal 2.</div>Xiphmont