Problems resulting from design of Ogg
[work in progress]
The sections and bullet points on this page are a dump of slides from a Shane Stephens FOMS 2008 talk. They reflect a number of 'problems' developers regularly bring up with Ogg, both legitimately and erroneously. Because these points come up regularly, and are often used incorrectly to argue for other formats [such as Matroska and NUT], we've added them to the Wiki here along with a response to, or discussion of, each claimed problem. For the most part, they reflect an inadequacy in existing software, an inadequacy in existing documentation, and/or a misunderstanding of the Ogg encapsulation. Consider this Wiki page a first step toward rectifying a legitimate lack of documentation.
Seeking and Editing Problems
These mostly boil down to a lack of reference library software that performs seeking and tracking for the application developer. Libogg as it exists now is a very low-level library that provides only the rudimentary routines necessary to produce valid stream building blocks (pages and packets). It does not provide any higher-level or automatic stream handling functionality, so each application developer has had to reinvent the higher-level routines over and over again. This creates the impression that Ogg is an overly complex and error-prone encapsulation, because 100 different applications each have their own homegrown stream handling routines, each with its own bugs and unimplemented functions. The chaos that reigns in the Ogg application space is not an indictment of Ogg; it is due to a lack of stable, documented, high-level stream handling libraries from the beginning.
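To make the "rudimentary building blocks" concrete, here is a minimal Python sketch (not libogg's API) of what splitting one page back into packet data involves. The field layout follows the page format described in RFC 3533; the name `parse_page` and its return shape are illustrative only, and CRC verification and cross-page packet reassembly are deliberately left out, which is exactly the kind of work each application ends up reimplementing.

```python
import struct

# Fixed 27-byte part of an Ogg page header (layout per RFC 3533):
# "OggS", version, header-type flags, granule position, bitstream serial
# number, page sequence number, CRC checksum, number of page segments.
PAGE_HEADER = struct.Struct("<4sBBqIIIB")


def parse_page(buf, offset=0):
    """Split one Ogg page into header fields and packet data.

    Returns (fields, packets, unfinished, next_offset). `packets` holds the
    packet pieces on this page; `unfinished` is True when the last piece
    continues on the next page (final lacing value == 255). CRC checking
    and reassembly of packets spanning pages are not done here.
    """
    (capture, version, flags, granulepos,
     serial, seqno, _crc, nsegs) = PAGE_HEADER.unpack_from(buf, offset)
    if capture != b"OggS":
        raise ValueError("no OggS capture pattern at offset %d" % offset)
    table_start = offset + PAGE_HEADER.size
    lacing = buf[table_start:table_start + nsegs]   # the segment table
    pos = table_start + nsegs                        # start of page body
    packets, current = [], bytearray()
    for value in lacing:
        current += buf[pos:pos + value]
        pos += value
        if value < 255:            # a lacing value below 255 ends a packet
            packets.append(bytes(current))
            current = bytearray()
    unfinished = bool(lacing) and lacing[-1] == 255
    if unfinished:                 # trailing piece of a packet that continues
        packets.append(bytes(current))
    fields = {"version": version, "flags": flags, "granulepos": granulepos,
              "serial": serial, "seqno": seqno}
    return fields, packets, unfinished, pos
```

A complete demuxer would also have to verify each page's CRC, resynchronise on the `OggS` capture pattern after corruption, stitch continued packets together using the header-type flags, and route pages to logical streams by serial number.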
- jagged edges [ie, coarse granularity]
- In Ogg, a codec's frames (packets) are encapsulated into pages, singly or in groups. It is typical for an audio stream to store many packets on a single page and for a video stream to store only one or possibly two packets on a page. The individual ['logical'] audio and video streams are multiplexed into a single 'physical' stream at the page level. This is done for two reasons: first, to make it easy to multiplex and demultiplex streams into new arrangements, and second, to reduce the overhead of encapsulation. By grouping small packets into a larger page, the overhead of the page header is spread across the packets. A well-formed Ogg stream has a typical overhead of about 1%, regardless of the media types it encapsulates.
- The 'jagged edges' complaint arises because default libogg1 behavior fills all pages to ~4 kB regardless of stream bitrate. A given audio page might contain a full second of audio packets while a video page contains a single video frame, so we would see 20 or 30 video pages for each audio page.
- This arrangement is not incorrect; it is merely suboptimal when optimizing for minimal buffering and seeking: packet- and sample-precise seeking takes longer, and buffering overhead is higher. However, it can trigger bugs in a poorly written stream handler that makes invalid assumptions about Ogg streams for convenience, and then either the stream or Ogg itself is blamed for allowing such 'stupid' streams. Note that even such a 'broken' stream (it is not broken, merely suboptimal) can be losslessly repaginated into an optimal arrangement.
- The root of the 'balancing' problem is a lack of functionality. By default, libogg always flushes pages at just over 4 kB (rather than by timestamp or some other better criterion). Changing this behavior requires manual intervention in stream building, when it should be automatic in libogg. Chalk this one up to 'software flaw', not 'inadequacy in Ogg'.
- wide variance in location of cotemporal data
- This is a different way of describing the 'jagged edges' problem above. Because a suboptimally chosen interleave can space low-bitrate pages far apart, the audio frames for a given point in time may be physically located well away from the time-matching video data. The response above applies here too.
- impossible to reconstruct all granulepos values around holes
- granulepos / timeval mapping inconsistencies
- poorly sorted streams are rife
- impossible to efficiently seek with noncontinuous data
- no absolute clock (no presentation timestamps)
- no way to correct for clock skew between audio/video encoding
- end-time ordering
- except when we have non-continuous data
- Ordering isn't only an issue for non-continuous data. In theory, an idiot can fit up to ~30 minutes of Speex audio (silence) in a single page (or 4 minutes of actual speech).
- inefficient lacing values for video
- ad-hoc granulepos retrofitting for video, CMML
- seeking is hard
- pages, and libogg's behaviour when creating them
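The overhead figures in the 'jagged edges' discussion can be checked with a little arithmetic. Every page carries a 27-byte fixed header plus one lacing byte per segment, and a packet of n bytes that ends on the page needs n // 255 + 1 lacing values (one 255 per full 255-byte chunk plus a terminating value below 255). The Python sketch below assumes every packet is complete on the page; the helper names are illustrative.

```python
HEADER_BYTES = 27        # fixed part of every Ogg page header
MAX_SEGMENTS = 255       # a page's segment table holds at most 255 entries


def lacing_values(packet_len):
    """Lacing entries for a packet that ends on this page: one 255 per
    full 255-byte chunk plus one terminating value below 255."""
    return packet_len // 255 + 1


def page_overhead(packet_sizes):
    """Return (header_bytes, payload_bytes, overhead_fraction) for one
    page holding the given complete packets."""
    segments = sum(lacing_values(n) for n in packet_sizes)
    if segments > MAX_SEGMENTS:
        raise ValueError("packets do not fit on one page (%d segments)" % segments)
    header = HEADER_BYTES + segments
    payload = sum(packet_sizes)
    return header, payload, header / (header + payload)
```

For example, `page_overhead([200] * 20)` gives a 47-byte header on 4000 bytes of payload (about 1.2%), and `page_overhead([4000])` gives 43 bytes (about 1.1%). The same arithmetic shows where the 'inefficient lacing values for video' complaint comes from: a large frame pays roughly one lacing byte per 255 bytes of payload (about 0.4%) on top of the page headers, whatever the page size.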
What use are...
- serial numbers?
- packet numbers?
- Useful for audio (preventing ear damage), but could be optional for video
- We should not need to know the type of a stream if we are not decoding the stream
- granulepos interpretations
- Skeleton goes some way towards fixing this
- Stupid decision for flushing pages
- Makes it generally easy to build broken files.
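The 'granulepos interpretations' point is easiest to see side by side. The same 64-bit field is a plain sample count in Vorbis but a packed (keyframe, offset) pair in Theora, where the split point is the keyframe granule shift (`KFGSHIFT`) from the Theora identification header. The Python sketch below uses hypothetical helper names; it shows why a demuxer cannot turn granulepos into time without codec-specific knowledge, which is the gap Skeleton partly fills.

```python
def vorbis_seconds(granulepos, sample_rate):
    """Vorbis: granulepos is the PCM sample position (per channel) at the
    end of the last packet completed on the page."""
    return granulepos / sample_rate


def theora_split(granulepos, granule_shift):
    """Theora: the bits above `granule_shift` hold the last keyframe's
    number; the low `granule_shift` bits count frames since that keyframe."""
    keyframe = granulepos >> granule_shift
    since_keyframe = granulepos & ((1 << granule_shift) - 1)
    return keyframe, since_keyframe


def theora_frame_index(granulepos, granule_shift, v321_or_later=True):
    """0-based frame index. Bitstreams of version 3.2.1 and later store a
    1-based frame count rather than an index, hence the adjustment
    (the flag name is an assumption of this sketch)."""
    keyframe, since_keyframe = theora_split(granulepos, granule_shift)
    return keyframe + since_keyframe - (1 if v321_or_later else 0)
```

With a granule shift of 6, the value `(30 << 6) | 5` splits into keyframe 30 plus 5 frames, while the same integer in a Vorbis stream would simply mean 1925 samples.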
Short-term workarounds (Ogg1-compatible)
- Don't use partial packets unless absolutely necessary
- If absolutely necessary, don't share the pages with other packets
- Specify that pages should not contain more than X ms of data (let's say 250-500 ms)
- Put Theora keyframes alone on their page??
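The 'no more than X ms per page' workaround amounts to adding a time limit next to the byte limit when deciding where to end a page. The sketch below is a simplified model, not libogg code: packets are (size in bytes, duration in ms) pairs, a page closes once its payload reaches `max_bytes` (a rough stand-in for libogg's 'just over 4 kB' default) or, optionally, once it covers `max_ms`. Packets are never split, in line with the first workaround, and the 255-segment limit is ignored. All names are hypothetical.

```python
def paginate(packets, max_bytes=4096, max_ms=None):
    """Group (size_bytes, duration_ms) packets into pages.

    A page is closed as soon as its payload reaches `max_bytes` or, if
    `max_ms` is given, as soon as it covers at least `max_ms` of media.
    Packets are never split across pages in this simplified model.
    """
    pages, current, size, dur = [], [], 0, 0
    for nbytes, ms in packets:
        current.append((nbytes, ms))
        size += nbytes
        dur += ms
        if size >= max_bytes or (max_ms is not None and dur >= max_ms):
            pages.append(current)
            current, size, dur = [], 0, 0
    if current:                       # flush the final, partly filled page
        pages.append(current)
    return pages
```

With 40-byte, 20 ms packets (a low-bitrate speech stream), the byte rule alone puts 103 packets, just over two seconds of audio, on the first page, while `max_ms=250` closes each page after 13 packets (260 ms). A real muxer would apply such a rule per logical stream before interleaving the resulting pages.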
A successor to Ogg
- It should be called (Ogg2|Ogg3|Ogg++|OggNG|Ogh|Foo|Dumplings|AdvancedOgg or AOgg or Ogg+A|ggo|SOgg)
- The design should be done from desired capabilities and desired properties
- These capabilities and properties should come from AV experts, web-page designers, system administrators, and users
- Simple seeking
- Cleanly cuttable
- Robust to errors
- Supports arbitrary stream types
- Low bit cost
- Easy to chunk
- Low decode cost
- Supports multiple streams of each type
Untied We Stand
- Can cotemporal data be colocated?
- streams & bundles
- great for cutting
- OK for demultiplexing
- “should” cut down on bit overhead
- hugely simplifies seeking
Gimme a Hint
- Can we add seeking hints to the stream?
- these can be tiny and infrequent
- awesome for standalone files
- what do we do when streaming?
- hint correction packets?
- is this turtles all the way down?
- Would an up-front index be better?
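The trade-off behind 'Would an up-front index be better?' can be sketched in a few lines. Without an index, a player bisects the file, and every probe is a seek plus a read (and, over HTTP, a round trip); with an index, one in-memory lookup yields the byte offset. In this Python sketch the index format, the `read_page_time` callback and the function names are hypothetical. Real Ogg bisection is further complicated by probing at byte offsets, resynchronising on the next page, and granulepos describing the last packet completed on a page rather than the page's start.

```python
from bisect import bisect_right


def seek_with_index(index, target_ms):
    """`index` is a hypothetical up-front table of (time_ms, byte_offset)
    pairs sorted by time. Return the byte offset of the last entry at or
    before `target_ms`, clamped to the first entry."""
    times = [t for t, _ in index]     # a player would keep this precomputed
    i = bisect_right(times, target_ms) - 1
    return index[max(i, 0)][1]


def seek_by_bisection(read_page_time, num_pages, target_ms):
    """No index: binary-search page numbers, calling `read_page_time(k)`
    (one probe into the file per call) to learn page k's time.
    Returns (last page at or before target, number of probes)."""
    lo, hi, probes = 0, num_pages - 1, 0
    while lo < hi:
        mid = (lo + hi + 1) // 2      # round up so `lo = mid` always advances
        probes += 1
        if read_page_time(mid) <= target_ms:
            lo = mid
        else:
            hi = mid - 1
    return lo, probes
```

On a toy file of 1024 pages, `seek_by_bisection` needs 10 probes to land on the right page, while `seek_with_index` needs none beyond reading the index itself.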
- These problems aren't insurmountable
- but we're only finding some of them now, and we've been working around others for years
- Nobody will adopt another container format
- Nobody cares about <insert hated feature here> anyway
- Even if we have Ogg2, we'll still be stuck having to support Ogg1 and broken files