Talk:TransOgg Seeking Proposals

From XiphWiki
Revision as of 14:12, 9 April 2013 by Derf

> The DTS and backreference encoding was defined by-codec,
> and thus decoding the timestamps required additional
> infrastructure that most frameworks had to code from
> scratch. Several frameworks never implemented precise
> seeking for this reason alone.

In our defense, this allowed people to repeatedly mux things into Ogg that would have been impossible based on what we knew when Ogg was designed: the Theora keyframe backreference scheme, the Dirac generalized B-frame scheme, the VP8 "invisible" ALT refs, the Opus pre-skip and pre-roll, etc.
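To make the "defined by-codec" point concrete, here is a minimal Python sketch of two of the mappings mentioned above, each turning a raw granule position into something a seeker can use. It assumes the published Theora mapping, where the keyframe number sits in the high bits and the frames since that keyframe sit in the low KFGSHIFT bits (the shift comes from the Theora identification header). It also assumes the Opus mapping, where the granule position is a 48 kHz sample count that includes pre-skip. The function names are hypothetical, chosen for illustration only.

```python
# Hypothetical helpers illustrating two per-codec granule position mappings.


def theora_granule_split(granulepos: int, kfgshift: int) -> tuple[int, int]:
    """Split a Theora granulepos into (keyframe number, frames since keyframe).

    The keyframe number is the backreference: a seeker has to start decoding
    at that keyframe to reconstruct the frame this page ends on.
    """
    if granulepos < 0:
        # -1 means no packet finishes on this page.
        raise ValueError("page carries no granule position")
    keyframe = granulepos >> kfgshift
    delta = granulepos & ((1 << kfgshift) - 1)
    return keyframe, delta


def theora_frame_count(granulepos: int, kfgshift: int) -> int:
    """Sum of the two fields. For bitstream version 3.2.1 and later this is a
    1-based frame count, so the 0-based frame index is one less."""
    keyframe, delta = theora_granule_split(granulepos, kfgshift)
    return keyframe + delta


def opus_pcm_position(granulepos: int, pre_skip: int) -> int:
    """Playable 48 kHz samples up to the end of the page: the granule position
    also counts the pre-skip samples the decoder discards, so subtract them."""
    return granulepos - pre_skip


if __name__ == "__main__":
    gp = (10 << 6) | 3  # keyframe 10, three frames after it, KFGSHIFT = 6
    print(theora_granule_split(gp, 6))             # (10, 3)
    print(theora_frame_count(gp, 6))               # 13
    print(opus_pcm_position(48312, 312) / 48000)   # 1.0 (seconds)
```

A generic demuxer can't do either computation without knowing which codec it is looking at, and that is exactly the extra per-codec infrastructure the quoted text describes.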

We can argue that we now know all these things and that no more new ones will come along, but the MKV folks thought that, too, and the last two (alt-refs and pre-skip/pre-roll) proved them wrong.

Of course, they also don't handle basic Vorbis end-time trimming correctly, so it's not as if they even got right the things that were known at the time. A good design here is hard work. That's not a reason not to try, and I know you know all of this, but I think we should be aware of the risks.

> Xiph never implemented its own all-encompassing framework
> to provide an example of complete seeking that worked in
> any Ogg file.

Arguably, even if we had, people who already have their own frameworks (gstreamer, FFmpeg, VLC, etc.) wouldn't have used it.

> Stream structure discovery was also based on performing
> multiple bisection searches to find link boundaries.

The important point here is that because you have no information about where the link boundary is, nor what kind of streams are contained in the data after the boundary, you can't do much better than dumb bisection. (I tried in libopusfile, with some success, but that basically improved the constant factor out front: each bisection was still O(log N), where N is the size of the link.)
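A minimal sketch of the dumb bisection described above, under a deliberately simplified model: `serials[i]` is the serial number of page i, each link carries a single logical stream, and no serial number is reused by a later link, so "this page still belongs to the current link" flips from true to false exactly once. Real code bisects byte offsets rather than page indices and has to resynchronise on the "OggS" capture pattern after every seek; the function names here are hypothetical.

```python
def find_link_end(serials: list[int], start: int) -> tuple[int, int]:
    """Return (index of the first page of the next link, number of probes).

    The index equals len(serials) if the link runs to the end of the file.
    Assumes serials[start] is the first page of the link being searched.
    """
    link_serial = serials[start]
    # Invariant: page lo is in the link; the boundary lies in (lo, hi].
    lo, hi = start, len(serials)
    probes = 0
    while hi - lo > 1:
        mid = (lo + hi) // 2
        probes += 1  # in a real file: one seek plus a capture-pattern resync
        if serials[mid] == link_serial:
            lo = mid
        else:
            hi = mid
    return hi, probes


def enumerate_links(serials: list[int]) -> tuple[list[int], int]:
    """Return the first page index of every link and the total probe count."""
    starts, total = [], 0
    pos = 0
    while pos < len(serials):
        starts.append(pos)
        pos, probes = find_link_end(serials, pos)
        total += probes
    return starts, total


if __name__ == "__main__":
    chain = [1] * 1000 + [2] * 500 + [3] * 24
    print(enumerate_links(chain))
```

Every boundary costs O(log N) probes, so enumerating a chain of L links costs on the order of L log N seeks before any actual seeking within a link has happened.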

But this can be solved by a simple back-pointer at the end of each link. That makes link enumeration cost one seek per link.
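A minimal sketch of the back-pointer idea, assuming a hypothetical trailer written after the last page of every link: an 8-byte marker followed by the absolute byte offset where that link began. Nothing like this exists in the current Ogg format, and the trailer layout and function names are made up for illustration. Starting from EOF, each trailer read yields the start of its link, which is also where the previous link's trailer ends.

```python
import struct

# Hypothetical trailer: 8-byte marker + little-endian u64 start offset of the link.
TRAILER = struct.Struct("<8sQ")
TRAILER_MAGIC = b"LINKBACK"


def append_link(out: bytearray, link_bytes: bytes) -> None:
    """Append one link's pages followed by its back-pointer trailer."""
    start = len(out)
    out.extend(link_bytes)
    out.extend(TRAILER.pack(TRAILER_MAGIC, start))


def enumerate_links(data: bytes) -> tuple[list[tuple[int, int]], int]:
    """Walk back-pointers from EOF; return ([(start, end), ...], seeks)."""
    links, seeks = [], 0
    end = len(data)
    while end > 0:
        seeks += 1  # one seek: read the trailer that ends at `end`
        trailer_pos = end - TRAILER.size
        if trailer_pos < 0:
            raise ValueError("truncated trailer")
        magic, start = TRAILER.unpack_from(data, trailer_pos)
        if magic != TRAILER_MAGIC or start > trailer_pos:
            raise ValueError(f"bad back-pointer at offset {trailer_pos}")
        links.append((start, end))
        end = start  # the previous link's trailer ends where this link starts
    links.reverse()
    return links, seeks


if __name__ == "__main__":
    buf = bytearray()
    for payload in (b"a" * 100, b"b" * 50, b"c" * 7):
        append_link(buf, payload)
    print(enumerate_links(bytes(buf)))
```

A real design would want the pointer inside a checksummed structure (Ogg pages already carry a CRC), since data at the end of a file is the most likely to be truncated or damaged.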

> Given the practically poor and unstable performance of
> preceding bisection seeking implementations, it will be
> difficult to sell adopter opinion on an updated bisection
> design, even one that farts unicorns.

This is actually a really important point. I think we should get buy-in from people who actually want to use this for something before we go off and build it. The guys making NUT spent just as much effort on design work as we did (their conversations actually sounded an awful lot like the ones we had designing Ogg), and yet... no one uses NUT. Our resources and attention are finite.

> Although it is still possible to mux an invalid file, it
> is much harder to end up with a stream in which
> second-order seeking 'helper' structures disagree with
> the authoritative timestamps within the page data.

The seeking implementation still has to deal with a bunch of potentially invalid data to avoid going off the rails. For example: timestamps that aren't ordered properly, timestamps that don't lie between the stream's start and end times, timestamps whose difference overflows a 64-bit signed integer, etc. Large amounts of garbage data at the end of the file make most code used to find the stream length O(N^2) instead of O(N) (it's not hard to write O(N) code, but most people just don't realize the potential problem). The list goes on...
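The O(N^2) trap when searching backward for the last page can be sketched as follows. `looks_like_page()` is a simplified, hypothetical stand-in that checks only the "OggS" capture pattern and a zero version byte (a real demuxer must also parse the header and verify the page CRC). The chunk size is likewise an arbitrary assumption. The naive version backs up one chunk at a time but rescans everything from there to EOF, so a long run of trailing garbage costs quadratic work; the linear version scans each byte position once.

```python
from typing import Optional

CHUNK = 4096  # hypothetical read size


def looks_like_page(data: bytes, pos: int) -> bool:
    """Capture pattern "OggS" followed by stream_structure_version 0."""
    return data.startswith(b"OggS", pos) and pos + 4 < len(data) and data[pos + 4] == 0


def last_page_naive(data: bytes, chunk: int = CHUNK) -> tuple[Optional[int], int]:
    """O(N^2) in the amount of trailing garbage: each step backs up one chunk
    and then rescans everything from there to EOF."""
    start, examined = len(data), 0
    while True:
        start = max(0, start - chunk)
        last = None
        for pos in range(start, len(data)):
            examined += 1
            if looks_like_page(data, pos):
                last = pos
        if last is not None or start == 0:
            return last, examined


def last_page_linear(data: bytes, chunk: int = CHUNK) -> tuple[Optional[int], int]:
    """O(N): each chunk is scanned once, back to front, and positions that
    were already examined are never revisited."""
    end, examined = len(data), 0
    while end > 0:
        start = max(0, end - chunk)
        for pos in range(end - 1, start - 1, -1):
            examined += 1
            if looks_like_page(data, pos):
                return pos, examined
        end = start
    return None, examined


if __name__ == "__main__":
    data = b"OggS\x00" + b"\x01" * 100 + b"\xff" * 50000
    print(last_page_naive(data))
    print(last_page_linear(data))
```

With real buffered reads, a candidate near the end of a chunk also needs the bytes that follow it, so the linear version would keep a few bytes of overlap between reads rather than rereading whole chunks.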

> WebM went one step further by mandating an index and
> eliminating the Matroska bisection seek mechanism entirely.

Which they then walked back when they realized they wanted live streaming support, too. But that doesn't stop lots of things from breaking on indexless files.

> The index can be grabbed from the stream tail
> asynchronously.

The likelihood of this getting implemented is very low. Every single media framework wants to know two things before it starts playing: is the file seekable, and how long is it. This affects the controls presented, the playback time display, etc. Nobody is prepared for the answer to either question to change in the middle of playback.

If you need an index to seek, you need to check if it's there before you say you can seek (because it's often missing or damaged, especially if it's at the end of a file). If the index is at the end, it just makes this more expensive. The only real advantage is that you can write the file in a single pass.
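A minimal sketch of the open-time check described above. It assumes a hypothetical index footer at the very end of the file: an 8-byte marker, the index length, and a CRC-32 of the index body. None of these names or layouts come from an actual proposal. The point it illustrates is that the player has to read and validate the tail before it can report whether seeking works, and has to have a fallback when the index is missing or damaged.

```python
import struct
import zlib
from typing import Optional

# Hypothetical footer: marker, index body length (u32), CRC-32 of the body (u32).
FOOTER = struct.Struct("<8sII")
INDEX_MAGIC = b"IDXFOOT!"


def find_index(tail: bytes) -> Optional[bytes]:
    """Return the index body if the tail ends with an intact footer, else None."""
    if len(tail) < FOOTER.size:
        return None
    body_end = len(tail) - FOOTER.size
    magic, length, crc = FOOTER.unpack_from(tail, body_end)
    if magic != INDEX_MAGIC or length > body_end:
        return None
    body = tail[body_end - length:body_end]
    if zlib.crc32(body) != crc:
        return None  # damaged, e.g. a truncated download
    return body


def open_time_probe(random_access: bool, tail: bytes) -> str:
    """Decide, before playback starts, how (or whether) seeking will work."""
    if not random_access:
        return "not seekable"        # e.g. a live stream
    if find_index(tail) is not None:
        return "seek via index"
    return "seek via bisection"      # index missing or damaged: fall back
```

When the index lives at the end, producing this answer costs an extra seek and read of the tail at open time; the only thing bought in exchange is single-pass writing.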