<?xml version="1.0"?>
<feed xmlns="http://www.w3.org/2005/Atom" xml:lang="en">
	<id>https://wiki.xiph.org/index.php?action=history&amp;feed=atom&amp;title=TransOgg_Seeking_Proposals</id>
	<title>TransOgg Seeking Proposals - Revision history</title>
	<link rel="self" type="application/atom+xml" href="https://wiki.xiph.org/index.php?action=history&amp;feed=atom&amp;title=TransOgg_Seeking_Proposals"/>
	<link rel="alternate" type="text/html" href="https://wiki.xiph.org/index.php?title=TransOgg_Seeking_Proposals&amp;action=history"/>
	<updated>2026-05-31T02:12:17Z</updated>
	<subtitle>Revision history for this page on the wiki</subtitle>
	<generator>MediaWiki 1.45.1</generator>
	<entry>
		<id>https://wiki.xiph.org/index.php?title=TransOgg_Seeking_Proposals&amp;diff=13672&amp;oldid=prev</id>
		<title>Xiphmont: formatting</title>
		<link rel="alternate" type="text/html" href="https://wiki.xiph.org/index.php?title=TransOgg_Seeking_Proposals&amp;diff=13672&amp;oldid=prev"/>
		<updated>2012-09-18T22:14:46Z</updated>

		<summary type="html">&lt;p&gt;formatting&lt;/p&gt;
&lt;a href=&quot;https://wiki.xiph.org/index.php?title=TransOgg_Seeking_Proposals&amp;amp;diff=13672&amp;amp;oldid=13671&quot;&gt;Show changes&lt;/a&gt;</summary>
		<author><name>Xiphmont</name></author>
	</entry>
	<entry>
		<id>https://wiki.xiph.org/index.php?title=TransOgg_Seeking_Proposals&amp;diff=13671&amp;oldid=prev</id>
		<title>Xiphmont: Blow new text document into wiki; not formatted as yet.</title>
		<link rel="alternate" type="text/html" href="https://wiki.xiph.org/index.php?title=TransOgg_Seeking_Proposals&amp;diff=13671&amp;oldid=prev"/>
		<updated>2012-09-18T21:51:30Z</updated>

		<summary type="html">&lt;p&gt;Blow new text document into wiki; not formatted as yet.&lt;/p&gt;
&lt;p&gt;&lt;b&gt;New page&lt;/b&gt;&lt;/p&gt;&lt;div&gt;Three proposals for seeking structures/methods in transOgg&lt;br /&gt;
&lt;br /&gt;
Seeking within a container is an example of &amp;#039;we can&amp;#039;t have everything,&lt;br /&gt;
so we have to choose what we really want&amp;#039;.  We consider the following&lt;br /&gt;
design requirements wants and needs:&lt;br /&gt;
&lt;br /&gt;
  1) Efficient seeking with minimum latency.  Minimum overall latency&lt;br /&gt;
  can depend heavily on the scenario, where seeking within a file on a&lt;br /&gt;
  hard disk, over http on a high bandwidth link/high latency and over&lt;br /&gt;
  http on a low bandwidth/low latency link will all have different&lt;br /&gt;
  optimal design requires.  Thus the design will have to shoot for&lt;br /&gt;
  good behavior across the lot.&lt;br /&gt;
&lt;br /&gt;
  2) Low or no additional overhead.  Because, hey, wasted space is&lt;br /&gt;
  wasted space.&lt;br /&gt;
&lt;br /&gt;
  3) Seeking must be available within Ogg&amp;#039;s specific streaming design&lt;br /&gt;
  philosophy; an encoder must be able to write a seekable file in one&lt;br /&gt;
  pass, and a demuxer must be able to seek in any valid Ogg file.&lt;br /&gt;
&lt;br /&gt;
  4) Muxing must not require repaginating existing data (the &amp;#039;deck of&lt;br /&gt;
  cards&amp;#039; metaphor)&lt;br /&gt;
&lt;br /&gt;
  5) A design that the designers can actually document completely and&lt;br /&gt;
  that implementors are actually willing and able to implement.&lt;br /&gt;
&lt;br /&gt;
Seeking in original Ogg treated 2, 3, 4 as paramount, 1 as&lt;br /&gt;
secondary, and was a disaster due to 5.  transOgg will need to strike&lt;br /&gt;
a different balance.  I submit three seeking mechanism proposals for&lt;br /&gt;
consideration.&lt;br /&gt;
&lt;br /&gt;
Proposal 1: The original Ogg bisection search only&lt;br /&gt;
&lt;br /&gt;
  We stick with the original seeking algorithm as intended in the&lt;br /&gt;
  original Ogg bitstream and correct the corner cases that made it a&lt;br /&gt;
  serious pain.  There is no index.  Seeking is done via bisection&lt;br /&gt;
  search:&lt;br /&gt;
&lt;br /&gt;
  1) Make an initial guess as to the physical position of the desired&lt;br /&gt;
     seekpoint based on known stream duration and simple linear&lt;br /&gt;
     interpolation.&lt;br /&gt;
&lt;br /&gt;
  2) Seek and capture a page&lt;br /&gt;
&lt;br /&gt;
  3) Construct a new interpolated best guess as to the desired seek&lt;br /&gt;
     location based on DTS of captured page&lt;br /&gt;
&lt;br /&gt;
  4) repeat steps 2 and 3 until we have seen contiguous pages, the&lt;br /&gt;
     first with a time stamp immediately preceeding the desired point&lt;br /&gt;
     in time and the second with a time stamp equal-to-or-following&lt;br /&gt;
     the desired point.&lt;br /&gt;
&lt;br /&gt;
  5.a) If none of the codec streams being decoded use backreferences,&lt;br /&gt;
     we&amp;#039;re finished.&lt;br /&gt;
&lt;br /&gt;
  5.b) In the event of codec streams with backreferences (such as is&lt;br /&gt;
     typical in video), the pages for the codec stream also include&lt;br /&gt;
     the DTS of the earliest required reference.  Read forward&lt;br /&gt;
     until we&amp;#039;ve seen one page from each decoded stream using&lt;br /&gt;
     backreferences; save the earliest backreference DTS.&lt;br /&gt;
&lt;br /&gt;
  6) Repeat the bisection search in 2-4 using the backreference&lt;br /&gt;
     DTS as the desired final seek location.&lt;br /&gt;
&lt;br /&gt;
  This algorithm appears more complex than it is; bisection searching&lt;br /&gt;
  is a well understood technique, and the average bisection search in&lt;br /&gt;
  a 1GB ogg file (of mixed Vorbis audio and Theora video) required&lt;br /&gt;
  approximately 3.5 seeks for the initial seek and just over one more&lt;br /&gt;
  seek for the second (as the first search can inform the second quite&lt;br /&gt;
  well).&lt;br /&gt;
&lt;br /&gt;
  This original Ogg seeking method was unpopular for several reasons:&lt;br /&gt;
&lt;br /&gt;
  1) The DTS and backreference encoding was defined by-codec,&lt;br /&gt;
     and thus decoding the timestamps required additional&lt;br /&gt;
     infrastructure that most frameworks had to code from scratch.&lt;br /&gt;
     Several frameworks never implemented precise seeking for this&lt;br /&gt;
     reason alone.&lt;br /&gt;
&lt;br /&gt;
  2) Corner cases introduced by specific design decisions (eg, the&lt;br /&gt;
     presence of pages with no timestamps) complicated the actual&lt;br /&gt;
     implementation.  In addition, correct behavior for these corner&lt;br /&gt;
     cases was not well documented or not documented at all.&lt;br /&gt;
&lt;br /&gt;
  3) Xiph never implemented its own all-encompassing framework to&lt;br /&gt;
     provide an example of complete seeking that worked in any Ogg&lt;br /&gt;
     file.  In general, we hardwired our own apps for seeking in&lt;br /&gt;
     either Vorbis alone or Vorbis+Theora only.&lt;br /&gt;
&lt;br /&gt;
  4) Many frameworks that attempted to implement proper bisection&lt;br /&gt;
     seeking had bugs that went unnoticed or unfixed (owing to all of&lt;br /&gt;
     the above). These bugs could increase the number of seeks in a&lt;br /&gt;
     bisection search by a factor of 10X-100X. This gave bisection&lt;br /&gt;
     seeking a reputation for inherent poor performace.  Even&lt;br /&gt;
     Vorbisfile has suffered from a number of bugs of exactly this&lt;br /&gt;
     sort at several points.&lt;br /&gt;
&lt;br /&gt;
  5) Stream structure discovery was also based on performing multiple&lt;br /&gt;
     bisection searches to find link boundaries. Given all the points&lt;br /&gt;
     above, and the fact that stream structure decisions added&lt;br /&gt;
     additional corner cases to implementations, structure discovery&lt;br /&gt;
     also tended to be buggy and slow.&lt;br /&gt;
&lt;br /&gt;
  The lesson of the above is that it is not enough to have a&lt;br /&gt;
  conceptually simple design.  The actual implementation complexity&lt;br /&gt;
  can easily explode given seemingly small design decisions going in&lt;br /&gt;
  the wrong direction.&lt;br /&gt;
&lt;br /&gt;
  The primary strikes against proposal 1 are:&lt;br /&gt;
&lt;br /&gt;
  1) Given the practically poor and unstable performance of preceding&lt;br /&gt;
     bisection seeking implmentations, it will be difficult to sell&lt;br /&gt;
     adopter opinion on an updated bisection design, even one that&lt;br /&gt;
     farts unicorns.&lt;br /&gt;
&lt;br /&gt;
  2) Even when the bisection search performs perfectly, it still&lt;br /&gt;
     requires multiple seeks (4.5 on average for exact seeking in a&lt;br /&gt;
     very smart bisector, and an upper bound of 7 for precise seek in&lt;br /&gt;
     a simplest-possible-correct implementation).&lt;br /&gt;
&lt;br /&gt;
  3) Backreference discovery requires reading enough data to find at&lt;br /&gt;
     least one page of each stream that uses backreferences.  Although&lt;br /&gt;
     the required amount of data is bounded (and on average, much less&lt;br /&gt;
     than the upper bound) this too increases the seeking latency.&lt;br /&gt;
&lt;br /&gt;
  The primary advantages of proposal 1 are:&lt;br /&gt;
&lt;br /&gt;
  1) The design is conceptually simple amd presents a single mechanism&lt;br /&gt;
     making use of non-redundant, authoritative data.  Although it is&lt;br /&gt;
     still possible to mux an invalid file, it is much harder to end&lt;br /&gt;
     up with a stream in which second-order seeking &amp;#039;helper&amp;#039;&lt;br /&gt;
     structures disagree with the authoritative timestamps within the&lt;br /&gt;
     page data.&lt;br /&gt;
&lt;br /&gt;
  2) Proper design would allow an optimal seek implementation nearly as&lt;br /&gt;
     simple as the design.&lt;br /&gt;
&lt;br /&gt;
  3) Zero additional data overhead.&lt;br /&gt;
&lt;br /&gt;
  4) No additional requirements on a muxer (muxers need only obey&lt;br /&gt;
     timestamp ordering. The small deviations from this plan in&lt;br /&gt;
     original Ogg were rather serious errors in hindsight)&lt;br /&gt;
&lt;br /&gt;
  5) Strict adherence to the overall Ogg muxing philosophy of having&lt;br /&gt;
     no features unavailable in a stream context (every feature of the&lt;br /&gt;
     stream format is available to a streaming muxer).&lt;br /&gt;
&lt;br /&gt;
Proposal 2: Mandatory bisection index with an optional head-or-tail index&lt;br /&gt;
&lt;br /&gt;
  The structure of the Proposal 2 Ogg file is as in Proposal 1, with&lt;br /&gt;
  no changes to the bisection seeking algorithm.  However, in addition&lt;br /&gt;
  to the bisection seeking mechanism, an index may be optionally&lt;br /&gt;
  present.&lt;br /&gt;
&lt;br /&gt;
  1) The index may be in either the header or footer of the file (but&lt;br /&gt;
     not both). --- point of discussion; perhaps only in the tail?&lt;br /&gt;
     This would make the index a fully encode-stream-available&lt;br /&gt;
     feature.&lt;br /&gt;
&lt;br /&gt;
  2) The index consists of hard byte offsets to the earliest&lt;br /&gt;
     backreference required to decode all continuous media types from&lt;br /&gt;
     the specified DTS. -- point of discussion; secondary index for&lt;br /&gt;
     discontinuous media streams? Discontinuous spacing can easily be&lt;br /&gt;
     too wide to include in a single-backreference index system; there&lt;br /&gt;
     would potentially need to be a dedicated index per discontinuous&lt;br /&gt;
     stream.&lt;br /&gt;
&lt;br /&gt;
  3) The index does not have a specific, hard resolution.  It may be&lt;br /&gt;
     as fine or coarse as desired.&lt;br /&gt;
&lt;br /&gt;
  4) The index is ephemeral and non-authoritative. It can be generated&lt;br /&gt;
     or regenerated trivially via a linear scan of the stream.&lt;br /&gt;
     Remuxing a stream would necessarily invalidate any preexisting&lt;br /&gt;
     index.&lt;br /&gt;
&lt;br /&gt;
  5) The index is not mandatory and cannot be relied upon to exist.&lt;br /&gt;
&lt;br /&gt;
  Experience with Matroska and the original version of Ogg shows that&lt;br /&gt;
  where an index and a bisection mechanism coexist, implementations&lt;br /&gt;
  usually ignore the bisection mechanism entirely and rely on the&lt;br /&gt;
  index alone, refusing to seek when no index is present.  This had&lt;br /&gt;
  been the original reason for resisting an index in Ogg, borne out by&lt;br /&gt;
  the the fact that few Matroska demuxers implement the mandatory&lt;br /&gt;
  bisection seeking and so can seek only in Matroska files with&lt;br /&gt;
  indexes.  WebM went one step further by mandating an index and&lt;br /&gt;
  eliminating the Matroska bisection seek mechanism entirely.&lt;br /&gt;
&lt;br /&gt;
  An index is undeniably useful.  Bisection seeking requires 3.5 seeks&lt;br /&gt;
  on average (without backreferences) and between 4.5 and 7 average&lt;br /&gt;
  seeks in a file with backreferences, depending on the sophistication&lt;br /&gt;
  of the search algorithm.  Indexes do not necessarily eliminate all&lt;br /&gt;
  seeks, and may not improve upon the amount of data that needs to be&lt;br /&gt;
  read, but it&amp;#039;s possible to generate an index that does so whenever&lt;br /&gt;
  such an index is warranted.&lt;br /&gt;
&lt;br /&gt;
  Thus Proposal 2 assumes that the benefit of an index (improved&lt;br /&gt;
  seeking performance) outweighs the disadvantages (incomplete demuxer&lt;br /&gt;
  implementations that cannot seek when the index is absent).&lt;br /&gt;
&lt;br /&gt;
  The disadvantages of Proposal 2 are:&lt;br /&gt;
&lt;br /&gt;
  1) Redundant, non-structural information. Having a non-authoritative&lt;br /&gt;
     index increases the opportunity for mux and demux implementation&lt;br /&gt;
     bugs. There will certainly be cases in the wild where the index&lt;br /&gt;
     does not agree with the stream.  This disagreement isn&amp;#039;t possible&lt;br /&gt;
     if there&amp;#039;s no non-structural or redundant seeking information.&lt;br /&gt;
&lt;br /&gt;
  2) A front-positioned index breaks the Ogg &amp;#039;everything is always a&lt;br /&gt;
     stream&amp;#039; model, as a front-positioned index necessarily requires a&lt;br /&gt;
     second pass.&lt;br /&gt;
&lt;br /&gt;
  3) Experience in multiple containers show that implementors will&lt;br /&gt;
     tend to implement index-only seeking [and even tell users their&lt;br /&gt;
     files are &amp;#039;broken&amp;#039; if the files don&amp;#039;t contain an index].&lt;br /&gt;
&lt;br /&gt;
  The advantages of Proposal 2 are:&lt;br /&gt;
&lt;br /&gt;
  1) Latency. Indexes can be generated for consumption models like the&lt;br /&gt;
     web where seeks can be very high latency, and reducing the seek&lt;br /&gt;
     number to &amp;#039;always only 1&amp;#039; is a boon.&lt;br /&gt;
&lt;br /&gt;
  2) A tail-positioned index would preserve the streaming model and&lt;br /&gt;
     keep time-to-first-frame low by avoiding a large index in the&lt;br /&gt;
     header at stream startup.  The index can be grabbed from the&lt;br /&gt;
     stream tail asynchronously.&lt;br /&gt;
&lt;br /&gt;
  3) Coarse-grained indexes can be used as a mechanism to inform&lt;br /&gt;
     bisection search, still resulting in fewer seeks while also&lt;br /&gt;
     reducing the index overhead in situations where this overhead is&lt;br /&gt;
     critical.&lt;br /&gt;
&lt;br /&gt;
Proposal 3: Bisection search with threaded backreference&lt;br /&gt;
&lt;br /&gt;
  Proposal 3 makes use of bisection seeking as above, but rather than&lt;br /&gt;
  using an index, adds an auxiliary backreference structure threaded&lt;br /&gt;
  through the stream in order to eliminate the second stage of the&lt;br /&gt;
  bisection seek.  The bisection sarch algorithm always short-circuits&lt;br /&gt;
  at step 5:&lt;br /&gt;
&lt;br /&gt;
  1) Make an initial guess as to the physical position of the desired&lt;br /&gt;
     seekpoint based on known stream duration and simple linear&lt;br /&gt;
     interpolation.&lt;br /&gt;
&lt;br /&gt;
  2) Seek and capture a page&lt;br /&gt;
&lt;br /&gt;
  3) Construct a new interpolated best guess as to the desired seek&lt;br /&gt;
     location based on DTS of captured page&lt;br /&gt;
&lt;br /&gt;
  4) repeat steps 2 and 3 until we have seen contiguous pages, the&lt;br /&gt;
     first with a time stamp immediately preceeding the desired point&lt;br /&gt;
     in time and the second with a time stamp equal-to-or-following&lt;br /&gt;
     the desired point.&lt;br /&gt;
&lt;br /&gt;
  5.a) If none of the codec streams being decoded use backreferences,&lt;br /&gt;
     we&amp;#039;re finished.&lt;br /&gt;
&lt;br /&gt;
  5.b) If we need to seek to a backreference, the auxiliary&lt;br /&gt;
     backreference structure gives us a hard byte offset to seek back&lt;br /&gt;
     to.&lt;br /&gt;
  The proposed threaded backreference mechanism would append a single,&lt;br /&gt;
  variable-length-encoded byte offset to the end of each page after&lt;br /&gt;
  the packet data.  This addiiton would modify the capture criteria&lt;br /&gt;
  such that capture must account for the optional, unsignaled field.&lt;br /&gt;
  After reading packet data, demux framing would have to check for&lt;br /&gt;
  either a capture pattern, or a backreference followed by a capture&lt;br /&gt;
  pattern.&lt;br /&gt;
&lt;br /&gt;
  Variations of this proposal could also include bisection hints to&lt;br /&gt;
  further reduce the order of complexity of the bisection search.&lt;br /&gt;
&lt;br /&gt;
  As I see it, this middle of the road approach is a compromise&lt;br /&gt;
  without real advantages.  In its favor:&lt;br /&gt;
&lt;br /&gt;
  1) Streams that make use of backreferences (most AV streams) would&lt;br /&gt;
     avoid the second bisection search, always reducing the average&lt;br /&gt;
     number of seeks by a little more than one.&lt;br /&gt;
&lt;br /&gt;
  However, there are significant disadvantages.&lt;br /&gt;
&lt;br /&gt;
  1) Like Proposal 2, the threaded backreference mechanism means&lt;br /&gt;
     seeking makes use of two ~ unrelated mechanisms instead of one.&lt;br /&gt;
     The threaded backreference increases mux and demux complexity.&lt;br /&gt;
&lt;br /&gt;
  2) The average seek number drops from ~ 4.5 physcal seeks to ~ 3.5.&lt;br /&gt;
     This is not a significant improvement given the additional&lt;br /&gt;
     compexity.  Proposal 2 is simpler, similar overhead, and drops&lt;br /&gt;
     best case seeks to 1.&lt;br /&gt;
&lt;br /&gt;
  3) Like proposal 2, the additional backreference offset is&lt;br /&gt;
     redundant, non-authoritative, and non-structural.&lt;br /&gt;
&lt;br /&gt;
In short, this mechanism has the disadvantages of Proposal 2 (in&lt;br /&gt;
addition to being weird) but does not deliver as much advantage as&lt;br /&gt;
Proposal 2.&lt;/div&gt;</summary>
		<author><name>Xiphmont</name></author>
	</entry>
</feed>