Daala on Wheels
See also: https://xiph.org/daala/
We've been having weekly progress meetings on Mumble.
- 2012 June 4 minutes (actually a work week)
- 2012 June 22 minutes
- 2012 June 29 minutes recording
- 2012 July 6 minutes
- 2012 July 13 minutes
- 2012 July 20 minutes
- 2012 July 27 minutes
- 2012 August 3 minutes recording
- 2012 August 10 minutes
- 2012 August 17 - no meeting
- 2012 August 24 - minutes recording
- 2012 August 31 - no meeting
- 2012 September 7 - no meeting
- 2012 September 14 - no meeting
- 2012 September 21 - minutes
- 2012 September 28 - minutes recording
- 2012 October 5 - recording
Other components which have been discussed include:
Techniques applicable to all frame types
- Multisymbol arithmetic coding
- Mode prediction using the previously decoded data, e.g. coding the mode using a probability function derived from trained predictors on the surrounding blocks.
- Explore Legendre polynomial basis transforms instead of DCT
- May have better perceptual properties and/or result in 'less compromised' efficient implementations.
- Coefficient domain prediction to allow efficient energy preserving quantization.
- Variable partition size/shape and the use of good predictors appear to remove most of the benefit of directional transforms.
- Perhaps 45deg is still useful?
- Perturbed quantization mode-signalling has been discussed but mostly laughed at. ;)
- Special block modes well suited to solid-color/cartoon-like content, avoiding ringing.
- Are pixel prediction modes too slow?
- In general: which Markov random field techniques can be applied with acceptable performance? Any?
- Designed for parallel encode and decode within each frame
- Important because
- the proposed techniques need a lot more CPU than H.264 and VP8 for both encode and decode
- Moore's law for single-threaded throughput is dead. Future hardware is all multicore/GPU.
- Getting the order of application right for the lapping filters.
- Important because
- Using PVQ and energy conservation: see http://jmvalin.ca/video/video_pvq_v3.pdf
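As a rough illustration of the PVQ and energy-conservation idea in the list above, here is a minimal gain-shape sketch. It is not taken from the linked PDF or from any Daala code: the greedy pulse search, the uniform gain quantizer, and the names `pvq_shape`, `encode`, `decode`, and `gain_step` are all assumptions made for this example. The point it shows is that the decoded vector's energy depends only on the coded gain, however coarsely the shape is quantized.

```python
import math

def pvq_shape(x, k):
    """Greedily place k unit pulses so that y (with sum(|y_i|) == k)
    points in roughly the same direction as x. Hypothetical helper."""
    y = [0] * len(x)
    for _ in range(k):
        xy = sum(abs(a) * b for a, b in zip(x, y))  # current correlation
        yy = sum(b * b for b in y)                  # current squared norm
        best_i, best_score = 0, -1.0
        for i, v in enumerate(x):
            num = xy + abs(v)            # correlation after adding a pulse at i
            den = yy + 2 * y[i] + 1      # squared norm after adding a pulse at i
            score = num * num / den      # proportional to squared cosine
            if score > best_score:
                best_i, best_score = i, score
        y[best_i] += 1
    # Restore the signs of the input coefficients.
    return [b if v >= 0 else -b for v, b in zip(x, y)]

def encode(x, k, gain_step):
    """Code the gain (L2 norm) and the shape (pulse vector) separately."""
    gain = math.sqrt(sum(v * v for v in x))
    return round(gain / gain_step), pvq_shape(x, k)

def decode(gain_index, y, gain_step):
    """Rescale the unit-normalised shape by the decoded gain."""
    norm = math.sqrt(sum(v * v for v in y))
    gain = gain_index * gain_step
    return [gain * v / norm for v in y]

x = [3.0, -1.0, 0.5, 0.0]
gain_index, y = encode(x, k=4, gain_step=0.5)
rec = decode(gain_index, y, gain_step=0.5)
```

Because the shape is normalised before scaling, the L2 norm of `rec` equals `gain_index * gain_step` whatever `k` is. A scalar quantizer applied per coefficient gives no such guarantee.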
Techniques applicable to inter frames
- Using x264 as a test-bed, Jason and Loren demonstrated 15% rate/distortion improvements from using 10-bit intermediaries and references, estimated as being 1/3rd from quality calculation in the 10-bit space, 1/3rd from the higher precision references, and 1/3rd from higher intermediate precision in calculations (e.g. MC filter processing).
- Super-resolution techniques for motion-compensation references have been discussed. In particular, it appears that the half-pel location is where intelligent filtering matters the most, so staged computation could be used to allow more expensive filtering at that level.
- YUV 4:4:4, 4:2:2, 4:2:0 subsamplings; 8-bit, 10-bit.
- Alpha channel — need testing material!
- 8-bit RGB compatible mode? (e.g. YCoCg, internally or at least flagging for it)
- Efficient 3D? — need testing material!
- Good support for decode-side droppable frames?
- Optionally storing a checksum of the expected decoded frame for decoder/encoder mismatch detection.
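For the RGB-compatible mode mentioned in the list above, one candidate is the reversible YCoCg-R lifting transform, sketched below. The page only names YCoCg, so picking the "-R" variant is an assumption of this example, as are the function names. The transform is exactly invertible in integer arithmetic, but for 8-bit RGB input the two chroma channels need 9 bits, which matters if the mode is meant to stay within 8-bit planes.

```python
def rgb_to_ycocg_r(r, g, b):
    """Forward YCoCg-R lifting steps on integer samples."""
    co = r - b
    t = b + (co >> 1)   # >> on Python ints floors, also for negative values
    cg = g - t
    y = t + (cg >> 1)
    return y, co, cg

def ycocg_r_to_rgb(y, co, cg):
    """Inverse: undo the lifting steps in reverse order."""
    t = y - (cg >> 1)
    g = cg + t
    b = t - (co >> 1)
    r = b + co
    return r, g, b

# Round-trip a single pixel.
pixel = (200, 30, 120)
assert ycocg_r_to_rgb(*rgb_to_ycocg_r(*pixel)) == pixel
```

Each lifting step adds or subtracts a value that the decoder can recompute exactly, which is why no rounding error accumulates.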
Crazy crap that might be interesting or at least fun to make fun of...
- Possible compromise: the video reference structure contains a backbone that can be decoded at only N bits of depth (e.g. 10), and higher precisions are only supported outside of this reference chain.
- Some high-end digital cameras are operating JPEG derivatives in a special mode that keeps the image in the native linear RGB Bayer format in order to avoid lossy/slow demosaicing on the camera. In particular this allows white balancing in post without excessive loss. Probably out of scope for Daala itself.
- Lossless intra-ability: The ability to losslessly rewrite any frame as an intra frame (perhaps with significant bitrate overhead) in order to make frame accurate cuts possible.
- Internal overlays which could be swapped without re-encoding? (e.g. advertising, station ID). Could also be automatically generated by a Sufficiently Advanced™ encoder to improve efficiencies for static sprites over moving backgrounds.
- Could be done externally to the video codec, but if so it's not likely to be useful for anyone ever.
- Parametric decode-side blur.
- Fancy block property prediction. (Not clear how these predictions interact with intra pred)
- Using kurtosis for detecting text in a frame
- The idea was to detect a Bernoulli distribution but it's not robust and too noisy
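To make the kurtosis idea above concrete: a two-level (Bernoulli) distribution with equal weights has a kurtosis of exactly 1, the lowest possible value, while a uniform ramp sits near 1.8 and Gaussian noise near 3. The sketch below computes the sample kurtosis of a block of pixel values. How the statistic was actually used in the experiments is not recorded here, so the function name and the flat-block convention are assumptions.

```python
def kurtosis(pixels):
    """Sample kurtosis m4 / m2^2 (not excess kurtosis) of a pixel list.
    Returns None for a flat block, where the statistic is undefined."""
    n = len(pixels)
    mean = sum(pixels) / n
    m2 = sum((p - mean) ** 2 for p in pixels) / n
    if m2 == 0:
        return None
    m4 = sum((p - mean) ** 4 for p in pixels) / n
    return m4 / (m2 * m2)

# Bilevel block, half black and half white, as in crisp text on a plain background.
text_like = [0] * 32 + [255] * 32
# Smooth ramp covering every 8-bit level once.
ramp = list(range(256))

k_text = kurtosis(text_like)
k_ramp = kurtosis(ramp)
```

A few noisy or anti-aliased pixels move the value quickly away from 1, and an unbalanced two-level split raises it as well, which fits the observation that the detector was not robust.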