<?xml version="1.0"?>
<?xml-stylesheet type="text/css" href="http://wiki.xiph.org/skins/common/feed.css?272"?>
<feed xmlns="http://www.w3.org/2005/Atom" xml:lang="en">
		<id>http://wiki.xiph.org/index.php?title=Special:Contributions/Rillian&amp;feed=atom&amp;limit=50&amp;target=Rillian&amp;year=&amp;month=</id>
		<title>XiphWiki - User contributions [en]</title>
		<link rel="self" type="application/atom+xml" href="http://wiki.xiph.org/index.php?title=Special:Contributions/Rillian&amp;feed=atom&amp;limit=50&amp;target=Rillian&amp;year=&amp;month="/>
		<link rel="alternate" type="text/html" href="http://wiki.xiph.org/Special:Contributions/Rillian"/>
		<updated>2013-05-20T12:01:58Z</updated>
		<subtitle>From XiphWiki</subtitle>
		<generator>MediaWiki 1.16.1</generator>

	<entry>
		<id>http://wiki.xiph.org/Vorbis-tools</id>
		<title>Vorbis-tools</title>
		<link rel="alternate" type="text/html" href="http://wiki.xiph.org/Vorbis-tools"/>
				<updated>2013-02-13T19:06:02Z</updated>
		
		<summary type="html">&lt;p&gt;Rillian: grammar fix&lt;/p&gt;
&lt;hr /&gt;
&lt;div&gt;Several tools to use, manipulate and create Vorbis files.&lt;br /&gt;
&lt;br /&gt;
* [http://downloads.xiph.org/releases/vorbis/ Release packages]&lt;br /&gt;
* [https://svn.xiph.org/trunk/vorbis-tools/ Source repository]&lt;br /&gt;
&lt;br /&gt;
Contains:&lt;br /&gt;
* ogg123&lt;br /&gt;
* oggenc&lt;br /&gt;
* oggdec&lt;br /&gt;
* ogginfo&lt;br /&gt;
* vcut&lt;br /&gt;
* vorbiscomment&lt;br /&gt;
&lt;br /&gt;
[[Category:Vorbis]]&lt;/div&gt;</summary>
		<author><name>Rillian</name></author>	</entry>

	<entry>
		<id>http://wiki.xiph.org/Vorbis-tools</id>
		<title>Vorbis-tools</title>
		<link rel="alternate" type="text/html" href="http://wiki.xiph.org/Vorbis-tools"/>
				<updated>2013-02-13T19:05:42Z</updated>
		
		<summary type="html">&lt;p&gt;Rillian: Replace out of date download link with generic ones.&lt;/p&gt;
&lt;hr /&gt;
&lt;div&gt;Several tools to use, manipulate and create Vorbis files.&lt;br /&gt;
&lt;br /&gt;
* [http://downloads.xiph.org/releases/vorbis/ Release packages]&lt;br /&gt;
* [https://svn.xiph.org/trunk/vorbis-tools/ Source repository]&lt;br /&gt;
&lt;br /&gt;
Contain:&lt;br /&gt;
* ogg123&lt;br /&gt;
* oggenc&lt;br /&gt;
* oggdec&lt;br /&gt;
* ogginfo&lt;br /&gt;
* vcut&lt;br /&gt;
* vorbiscomment&lt;br /&gt;
&lt;br /&gt;
[[Category:Vorbis]]&lt;/div&gt;</summary>
		<author><name>Rillian</name></author>	</entry>

	<entry>
		<id>http://wiki.xiph.org/Daala</id>
		<title>Daala</title>
		<link rel="alternate" type="text/html" href="http://wiki.xiph.org/Daala"/>
				<updated>2012-12-07T22:44:34Z</updated>
		
		<summary type="html">&lt;p&gt;Rillian: /* Weekly meetings */ Fix date&lt;/p&gt;
&lt;hr /&gt;
&lt;div&gt;Daala is the current working name of a next generation video codec— to be renamed once someone insists on something better. So far the best proposed alternative is PatentCake.&lt;br /&gt;
&lt;br /&gt;
For now, the purpose of this page is to collect notes on topics raised in informal public IRC discussions about the next generation initiative. Participants in these discussions have included Timothy Terriberry, Jason Garrett-Glaser, Loren Merritt, Ben Schwartz, Greg Maxwell, and others. &lt;br /&gt;
&lt;br /&gt;
See also: [https://xiph.org/daala/ https://xiph.org/daala/]&lt;br /&gt;
&lt;br /&gt;
== Weekly meetings ==&lt;br /&gt;
&lt;br /&gt;
We've been having weekly progress meetings on mumble.&lt;br /&gt;
&lt;br /&gt;
* 2012 June 4  [https://people.xiph.org/~giles/2012/daala_20120604.txt minutes] (actually a work week)&lt;br /&gt;
* 2012 June 22 [https://people.xiph.org/~giles/2012/daala_20120622.txt minutes]&lt;br /&gt;
* 2012 June 29 [https://people.xiph.org/~giles/2012/daala_20120629.txt minutes] [https://people.xiph.org/~giles/2012/daala_20120629.opus recording]&lt;br /&gt;
* 2012 July 6 [https://people.xiph.org/~giles/2012/daala_20120706.txt minutes]&lt;br /&gt;
* 2012 July 13 [https://people.xiph.org/~giles/2012/daala_20120713.txt minutes]&lt;br /&gt;
* 2012 July 20 [https://people.xiph.org/~giles/2012/daala_20120720.txt minutes]&lt;br /&gt;
* 2012 July 27 [https://people.xiph.org/~giles/2012/daala_20120727.txt minutes]&lt;br /&gt;
* 2012 August 3 [https://people.xiph.org/~giles/2012/daala_20120803.txt minutes] [https://people.xiph.org/~giles/2012/daala_20120803.opus recording]&lt;br /&gt;
* 2012 August 10 [https://people.xiph.org/~giles/2012/daala_20120810.txt minutes]&lt;br /&gt;
* 2012 August 17 - no meeting&lt;br /&gt;
* 2012 August 24 - [https://people.xiph.org/~giles/2012/daala_20120824.txt minutes] [https://people.xiph.org/~giles/2012/daala_20120824.opus recording]&lt;br /&gt;
* 2012 August 31 - no meeting&lt;br /&gt;
* 2012 September 7 - no meeting&lt;br /&gt;
* 2012 September 14 - no meeting&lt;br /&gt;
* 2012 September 21 - [https://people.xiph.org/~giles/2012/daala_20120921.txt minutes]&lt;br /&gt;
* 2012 September 28 - [https://people.xiph.org/~giles/2012/daala_20120928.txt minutes] [https://people.xiph.org/~giles/2012/daala_20120928.opus recording]&lt;br /&gt;
* 2012 October 5 - [https://people.xiph.org/~giles/2012/daala_20121005.opus recording]&lt;br /&gt;
* 2012 October 26 - &lt;br /&gt;
* 2012 November 2 - no meeting&lt;br /&gt;
* 2012 December 7 - [https://people.xiph.org/~giles/2012/daala_20121207.txt minutes]&lt;br /&gt;
&lt;br /&gt;
= Techniques =&lt;br /&gt;
&lt;br /&gt;
The overall structure discussed so far is a variable-size, lapped-DCT, block-based codec, with lapping done via pre/post filtering with a specially structured (lifting) linear phase transform along the edges, along with overlapped block motion compensation and the expected trimmings. The lapping can be optimized for energy compaction and other useful properties, including invertibility, and yields excellent results with efficient finite precision math.&lt;br /&gt;
&lt;br /&gt;
Other components which have been discussed include:&lt;br /&gt;
&lt;br /&gt;
==Techniques applicable to all frame types==&lt;br /&gt;
* Multisymbol arithmetic coding &lt;br /&gt;
** Timothy has some trial code showing speed-up proportional to the number of bits coded at once. (ec_test.c)&lt;br /&gt;
* Mode prediction using the previously decoded data, e.g. coding the mode using a probability function derived from trained predictors on the surrounding blocks. &lt;br /&gt;
** This will be terrible for robustness but may significantly reduce signalling overhead, allowing many more modes, and provide continuous adaptation between signalling free and fully signalled modes.&lt;br /&gt;
* Explore Legendre polynomial basis transforms instead of DCT&lt;br /&gt;
** May have better perceptual properties and/or result in 'less compromised' efficient implementations.  &lt;br /&gt;
* Coefficient domain prediction to allow efficient energy preserving quantization.&lt;br /&gt;
* Variable partition size/shape and the use of good predictors appear to remove most of the benefit of directional transforms.&lt;br /&gt;
** Perhaps 45deg is still useful?&lt;br /&gt;
** How does this change with partition sizes? Directional transforms are clearly not that useful with 4x4. &lt;br /&gt;
* Transform-post filtering to allow merging smaller transform blocks (like TF merging in CELT) may allow more flexible partitioning than outright using mixed block sizes.&lt;br /&gt;
* Perturbed quantization mode-signalling has been discussed but mostly laughed at. ;) &lt;br /&gt;
* Special block modes well suited to solid color/cartoon like content— avoiding ringing.&lt;br /&gt;
** Are pixel prediction modes too slow?&lt;br /&gt;
* In general, what Markov random field techniques can be applied with acceptable performance? Any?&lt;br /&gt;
* Designed for parallel encode and decode within each frame&lt;br /&gt;
** Important because&lt;br /&gt;
*** the proposed techniques need a lot more CPU than H.264 and VP8 for both encode and decode&lt;br /&gt;
*** Moore's law for single-threaded throughput is dead.  Future hardware is all multicore/GPU.&lt;br /&gt;
** Implies&lt;br /&gt;
*** Getting the order of application right for the lapping filters.&lt;br /&gt;
*** Mandatory slicing? Maybe some kind of multilevel entropy coding to reduce redundancy between slices while minimizing the single-threaded portion of decode.&lt;br /&gt;
&lt;br /&gt;
* Using PVQ and energy conservation: see http://jmvalin.ca/video/video_pvq_v3.pdf&lt;br /&gt;
&lt;br /&gt;
==Techniques applicable to inter frames==&lt;br /&gt;
* Using x264 as a test-bed Jason and Loren demonstrated 15% rate/distortion improvements from using 10-bit intermediaries and references, estimated as being 1/3rd from quality calculation in the 10-bit space, 1/3rd from the higher precision references, and 1/3rd from higher intermediate precision in calculations (e.g. MC filter processing).&lt;br /&gt;
** Increased reference precision competes for memory with increased number of references. The improvements demonstrated appear to be a greater win than increasing the reference count once there are four references or so.&lt;br /&gt;
* Super-resolution techniques for motion-compensation references have been discussed— in particular it appears that the half-pel location is where intelligent filtering matters the most so staged computation could be effectively used to allow more expensive filtering at that level.&lt;br /&gt;
** Edge-directed interpolation techniques might be effectively applied to increase motion compensation accuracy, but most of the techniques known to be very effective are too slow.&lt;br /&gt;
** Speculation has been offered that a significant part of MC inaccuracy may be due to blending in a physically incorrect (gamma-corrected) space, though no real conclusions were made. Academic papers on motion compensation accuracy seem to have ignored this issue.&lt;br /&gt;
* Timothy has an example code base for a variable partition size blocking-free motion compensation scheme which merges OBMC (overlapped block motion compensation) and CGI (control-grid interpolation) with an interesting prediction/sub-division scheme and whole-frame trellis optimization of motion vectors. (daala-exp)&lt;br /&gt;
&lt;br /&gt;
==Basic features==&lt;br /&gt;
* YUV 4:4:4, 4:2:2, 4:2:0 subsamplings; 8-bit, 10-bit.&lt;br /&gt;
* Alpha channel — need testing material!&lt;br /&gt;
* 8-bit RGB compatible mode? (e.g. YCoCg, internally or at least flagging for it)&lt;br /&gt;
* Efficient 3D? — need testing material!&lt;br /&gt;
* Lossless?&lt;br /&gt;
** The value of this is disputable. If nothing else it's arguable that stuffing lossless into a lossy format may be the only way to get lossless into many people's hands. Also, see below&lt;br /&gt;
* Good support for decode side droppable frames?&lt;br /&gt;
** Hopefully the referencing structure will be flexible enough to enable this even if it's not an intentional feature.&lt;br /&gt;
&lt;br /&gt;
==Frills==&lt;br /&gt;
* Optionally storing a checksum of the expected decoded frame for decoder/encoder mismatch detection.&lt;br /&gt;
* Expose the number of referential descendants of a given frame (or even the whole reference DAG) for most efficient allocation of FEC.&lt;br /&gt;
&lt;br /&gt;
==Wingdings==&lt;br /&gt;
Crazy crap that might be interesting or at least fun to make fun of... &lt;br /&gt;
* &amp;gt;10bit?&lt;br /&gt;
** Use cases don't seem well enough defined yet. Significant complexity. Any prospective hardware developer may hire assassins.&lt;br /&gt;
** Possible compromise: the video reference structure contains a backbone that can be decoded at only N bits of depth (e.g. 10), and higher precisions are only supported outside of this reference chain.&lt;br /&gt;
**# Precision by truncation: decode is performed twice on each frame, identically, at low and high precision.  The only difference between them is the bit-depth of the transform, or possibly of the transform and MC filters.  Only low-precision outputs can be referenced by subsequent frames.  Useful if high-precision content is still worth watching at low precision.&lt;br /&gt;
**# Precision by gamma: decode is performed once at low precision as normal.  Then the output frame is converted to linear-light at high precision, after which another layer of residuals is added.  The second layer can be permitted to reference previous high-precision frames... tricky to use both sets of references though. Useful if high precision is used for storing linear data, but people still want to watch it on &amp;quot;low-end&amp;quot; hardware.&lt;br /&gt;
* Some high end digital cameras are operating jpeg-derivatives in a special mode that keeps the image in the native linear RGB bayer format in order to avoid lossy/slow demosaicing on the camera. In particular this allows white balancing in post without excessive loss. Probably out of scope for Daala itself.&lt;br /&gt;
** Bayer, 4:2:0, 4:2:2, and Interlacing are all special cases of a more general pattern in which the output frames are decimated/subsampled in a regular fashion.  All such subsamplings could be supported by a unified framework in which the video is always stored with all planes fully sampled, with a header indicating the recommended subsampling for display.  In such cases, the encoder can regard the transform as highly overcomplete, and simply ignore unneeded coefficients (presumably by leaving high frequency residuals coded as zero).  This structure would in effect turn the codec into a motion-compensated interpolating/deinterlacing filter.  Whether this approach is sensible presumably depends in part on how the transform is structured.  It would be especially easy if the transform's highest-frequencies were coded by a wavelet-like layer.&lt;br /&gt;
* Lossless intra-ability: The ability to losslessly rewrite any frame as an intra frame (perhaps with significant bitrate overhead) in order to make frame accurate cuts possible.&lt;br /&gt;
** Or best handled by making sure that containers have working pre-roll, but presumably common GOP sizes will be greater than the number of references so even if losslessly reencoding the references is expensive it may be cheaper than pre-roll. Do both?&lt;br /&gt;
** Can be had for 'free' if lossless is supported, plus the right header flags to restuff the references from lossless copies in a packed hidden frame.&lt;br /&gt;
** Use of explicit (rather than staged) super-resolution and/or deeper references may make this functionality unattractive due to increased overhead.&lt;br /&gt;
* Internal overlays which could be swapped without re-encoding? (e.g. advertising, station ID). Could also be automatically generated by a Sufficiently Advanced™ encoder to improve efficiencies for static sprites over moving backgrounds.&lt;br /&gt;
** Complicates making the complexity bounded. No Sufficiently Advanced™ encoder likely to ever exist. But perhaps the station id/advertising uses fully justify this.&lt;br /&gt;
** Could be done externally to the video codec, but if so it's not likely to be useful for anyone ever.&lt;br /&gt;
* A secondary reference implementation in OpenCL, maintained throughout development, to make sure that the codec is GPU-friendly and can be done efficiently using OpenCL primitives.&lt;br /&gt;
* SWAR-friendly arithmetic.  For example, choosing transform coefficients so that no intermediate product overflows 16 bits (tricky for signed values) can sometimes enable (e.g.) 4 parallel operations in one uint64_t.  This can allow a pure C reference implementation to run faster, which is valuable for initial adoption and ports to new platforms.&lt;br /&gt;
* Parametric decode-side blur.&lt;br /&gt;
** Symmetrical blur in regions that are smooth on scales longer than the block size.  Could be signaled or derived from observed DC values.&lt;br /&gt;
** Motion blur so that moving objects are blurred along the motion vector.  May require coding a shutter speed parameter (0..1 as a fraction of the inter-frame interval).&lt;br /&gt;
* Fancy block property prediction.  (Not clear how these predictions interact with intra pred)&lt;br /&gt;
** Predict block properties (quantizer, energy, etc.) from MV.  (0,0) probably means small delta.  Larger MV's may correspond to larger deltas ... although  at low shutter speeds large MVs may correlate with reduced overall HF energy.&lt;br /&gt;
** Predict delta spectral shape from source block spectral shape.  HF/LF ratio of the delta may be correlated with the same ratio in its source blocks.  Works well with decode-side fDCT.&lt;br /&gt;
&lt;br /&gt;
==Negative results==&lt;br /&gt;
&lt;br /&gt;
* Using Kurtosis for detecting text in a frame&lt;br /&gt;
** The idea was to detect a Bernoulli distribution but it's not robust and too noisy&lt;/div&gt;</summary>
		<author><name>Rillian</name></author>	</entry>

	<entry>
		<id>http://wiki.xiph.org/Daala</id>
		<title>Daala</title>
		<link rel="alternate" type="text/html" href="http://wiki.xiph.org/Daala"/>
				<updated>2012-12-07T22:43:51Z</updated>
		
		<summary type="html">&lt;p&gt;Rillian: /* Weekly meetings */ link to meeting summary&lt;/p&gt;
&lt;hr /&gt;
&lt;div&gt;Daala is the current working name of a next generation video codec— to be renamed once someone insists on something better. So far the best proposed alternative is PatentCake.&lt;br /&gt;
&lt;br /&gt;
For now, the purpose of this page is to collect notes on topics raised in informal public IRC discussions about the next generation initiative. Participants in these discussions have included Timothy Terriberry, Jason Garrett-Glaser, Loren Merritt, Ben Schwartz, Greg Maxwell, and others. &lt;br /&gt;
&lt;br /&gt;
See also: [https://xiph.org/daala/ https://xiph.org/daala/]&lt;br /&gt;
&lt;br /&gt;
== Weekly meetings ==&lt;br /&gt;
&lt;br /&gt;
We've been having weekly progress meetings on mumble.&lt;br /&gt;
&lt;br /&gt;
* 2012 June 4  [https://people.xiph.org/~giles/2012/daala_20120604.txt minutes] (actually a work week)&lt;br /&gt;
* 2012 June 22 [https://people.xiph.org/~giles/2012/daala_20120622.txt minutes]&lt;br /&gt;
* 2012 June 29 [https://people.xiph.org/~giles/2012/daala_20120629.txt minutes] [https://people.xiph.org/~giles/2012/daala_20120629.opus recording]&lt;br /&gt;
* 2012 July 6 [https://people.xiph.org/~giles/2012/daala_20120706.txt minutes]&lt;br /&gt;
* 2012 July 13 [https://people.xiph.org/~giles/2012/daala_20120713.txt minutes]&lt;br /&gt;
* 2012 July 20 [https://people.xiph.org/~giles/2012/daala_20120720.txt minutes]&lt;br /&gt;
* 2012 July 27 [https://people.xiph.org/~giles/2012/daala_20120727.txt minutes]&lt;br /&gt;
* 2012 August 3 [https://people.xiph.org/~giles/2012/daala_20120803.txt minutes] [https://people.xiph.org/~giles/2012/daala_20120803.opus recording]&lt;br /&gt;
* 2012 August 10 [https://people.xiph.org/~giles/2012/daala_20120810.txt minutes]&lt;br /&gt;
* 2012 August 17 - no meeting&lt;br /&gt;
* 2012 August 24 - [https://people.xiph.org/~giles/2012/daala_20120824.txt minutes] [https://people.xiph.org/~giles/2012/daala_20120824.opus recording]&lt;br /&gt;
* 2012 August 31 - no meeting&lt;br /&gt;
* 2012 September 7 - no meeting&lt;br /&gt;
* 2012 September 14 - no meeting&lt;br /&gt;
* 2012 September 21 - [https://people.xiph.org/~giles/2012/daala_20120921.txt minutes]&lt;br /&gt;
* 2012 September 28 - [https://people.xiph.org/~giles/2012/daala_20120928.txt minutes] [https://people.xiph.org/~giles/2012/daala_20120928.opus recording]&lt;br /&gt;
* 2012 October 5 - [https://people.xiph.org/~giles/2012/daala_20121005.opus recording]&lt;br /&gt;
* 2012 October 26 - &lt;br /&gt;
* 2012 November 2 - no meeting&lt;br /&gt;
* 2012 December 6 - [https://people.xiph.org/~giles/2012/daala_20121206.txt minutes]&lt;br /&gt;
&lt;br /&gt;
= Techniques =&lt;br /&gt;
&lt;br /&gt;
The overall structure discussed so far is a variable-size, lapped-DCT, block-based codec, with lapping done via pre/post filtering with a specially structured (lifting) linear phase transform along the edges, along with overlapped block motion compensation and the expected trimmings. The lapping can be optimized for energy compaction and other useful properties, including invertibility, and yields excellent results with efficient finite precision math.&lt;br /&gt;
&lt;br /&gt;
Other components which have been discussed include:&lt;br /&gt;
&lt;br /&gt;
==Techniques applicable to all frame types==&lt;br /&gt;
* Multisymbol arithmetic coding &lt;br /&gt;
** Timothy has some trial code showing speed-up proportional to the number of bits coded at once. (ec_test.c)&lt;br /&gt;
* Mode prediction using the previously decoded data, e.g. coding the mode using a probability function derived from trained predictors on the surrounding blocks. &lt;br /&gt;
** This will be terrible for robustness but may significantly reduce signalling overhead, allowing many more modes, and provide continuous adaptation between signalling free and fully signalled modes.&lt;br /&gt;
* Explore Legendre polynomial basis transforms instead of DCT&lt;br /&gt;
** May have better perceptual properties and/or result in 'less compromised' efficient implementations.  &lt;br /&gt;
* Coefficient domain prediction to allow efficient energy preserving quantization.&lt;br /&gt;
* Variable partition size/shape and the use of good predictors appear to remove most of the benefit of directional transforms.&lt;br /&gt;
** Perhaps 45deg is still useful?&lt;br /&gt;
** How does this change with partition sizes? Directional transforms are clearly not that useful with 4x4. &lt;br /&gt;
* Transform-post filtering to allow merging smaller transform blocks (like TF merging in CELT) may allow more flexible partitioning than outright using mixed block sizes.&lt;br /&gt;
* Perturbed quantization mode-signalling has been discussed but mostly laughed at. ;) &lt;br /&gt;
* Special block modes well suited to solid color/cartoon like content— avoiding ringing.&lt;br /&gt;
** Are pixel prediction modes too slow?&lt;br /&gt;
* In general, what Markov random field techniques can be applied with acceptable performance? Any?&lt;br /&gt;
* Designed for parallel encode and decode within each frame&lt;br /&gt;
** Important because&lt;br /&gt;
*** the proposed techniques need a lot more CPU than H.264 and VP8 for both encode and decode&lt;br /&gt;
*** Moore's law for single-threaded throughput is dead.  Future hardware is all multicore/GPU.&lt;br /&gt;
** Implies&lt;br /&gt;
*** Getting the order of application right for the lapping filters.&lt;br /&gt;
*** Mandatory slicing? Maybe some kind of multilevel entropy coding to reduce redundancy between slices while minimizing the single-threaded portion of decode.&lt;br /&gt;
&lt;br /&gt;
* Using PVQ and energy conservation: see http://jmvalin.ca/video/video_pvq_v3.pdf&lt;br /&gt;
&lt;br /&gt;
==Techniques applicable to inter frames==&lt;br /&gt;
* Using x264 as a test-bed Jason and Loren demonstrated 15% rate/distortion improvements from using 10-bit intermediaries and references, estimated as being 1/3rd from quality calculation in the 10-bit space, 1/3rd from the higher precision references, and 1/3rd from higher intermediate precision in calculations (e.g. MC filter processing).&lt;br /&gt;
** Increased reference precision competes for memory with increased number of references. The improvements demonstrated appear to be a greater win than increasing the reference count once there are four references or so.&lt;br /&gt;
* Super-resolution techniques for motion-compensation references have been discussed— in particular it appears that the half-pel location is where intelligent filtering matters the most so staged computation could be effectively used to allow more expensive filtering at that level.&lt;br /&gt;
** Edge-directed interpolation techniques might be effectively applied to increase motion compensation accuracy, but most of the techniques known to be very effective are too slow.&lt;br /&gt;
** Speculation has been offered that a significant part of MC inaccuracy may be due to blending in a physically incorrect (gamma-corrected) space, though no real conclusions were made. Academic papers on motion compensation accuracy seem to have ignored this issue.&lt;br /&gt;
* Timothy has an example code base for a variable partition size blocking-free motion compensation scheme which merges OBMC (overlapped block motion compensation) and CGI (control-grid interpolation) with an interesting prediction/sub-division scheme and whole-frame trellis optimization of motion vectors. (daala-exp)&lt;br /&gt;
&lt;br /&gt;
==Basic features==&lt;br /&gt;
* YUV 4:4:4, 4:2:2, 4:2:0 subsamplings; 8-bit, 10-bit.&lt;br /&gt;
* Alpha channel — need testing material!&lt;br /&gt;
* 8-bit RGB compatible mode? (e.g. YCoCg, internally or at least flagging for it)&lt;br /&gt;
* Efficient 3D? — need testing material!&lt;br /&gt;
* Lossless?&lt;br /&gt;
** The value of this is disputable. If nothing else it's arguable that stuffing lossless into a lossy format may be the only way to get lossless into many people's hands. Also, see below&lt;br /&gt;
* Good support for decode side droppable frames?&lt;br /&gt;
** Hopefully the referencing structure will be flexible enough to enable this even if it's not an intentional feature.&lt;br /&gt;
&lt;br /&gt;
==Frills==&lt;br /&gt;
* Optionally storing a checksum of the expected decoded frame for decoder/encoder mismatch detection.&lt;br /&gt;
* Expose the number of referential descendants of a given frame (or even the whole reference DAG) for most efficient allocation of FEC.&lt;br /&gt;
&lt;br /&gt;
==Wingdings==&lt;br /&gt;
Crazy crap that might be interesting or at least fun to make fun of... &lt;br /&gt;
* &amp;gt;10bit?&lt;br /&gt;
** Use cases don't seem well enough defined yet. Significant complexity. Any prospective hardware developer may hire assassins.&lt;br /&gt;
** Possible compromise: the video reference structure contains a backbone that can be decoded at only N bits of depth (e.g. 10), and higher precisions are only supported outside of this reference chain.&lt;br /&gt;
**# Precision by truncation: decode is performed twice on each frame, identically, at low and high precision.  The only difference between them is the bit-depth of the transform, or possibly of the transform and MC filters.  Only low-precision outputs can be referenced by subsequent frames.  Useful if high-precision content is still worth watching at low precision.&lt;br /&gt;
**# Precision by gamma: decode is performed once at low precision as normal.  Then the output frame is converted to linear-light at high precision, after which another layer of residuals is added.  The second layer can be permitted to reference previous high-precision frames... tricky to use both sets of references though. Useful if high precision is used for storing linear data, but people still want to watch it on &amp;quot;low-end&amp;quot; hardware.&lt;br /&gt;
* Some high end digital cameras are operating jpeg-derivatives in a special mode that keeps the image in the native linear RGB bayer format in order to avoid lossy/slow demosaicing on the camera. In particular this allows white balancing in post without excessive loss. Probably out of scope for Daala itself.&lt;br /&gt;
** Bayer, 4:2:0, 4:2:2, and Interlacing are all special cases of a more general pattern in which the output frames are decimated/subsampled in a regular fashion.  All such subsamplings could be supported by a unified framework in which the video is always stored with all planes fully sampled, with a header indicating the recommended subsampling for display.  In such cases, the encoder can regard the transform as highly overcomplete, and simply ignore unneeded coefficients (presumably by leaving high frequency residuals coded as zero).  This structure would in effect turn the codec into a motion-compensated interpolating/deinterlacing filter.  Whether this approach is sensible presumably depends in part on how the transform is structured.  It would be especially easy if the transform's highest-frequencies were coded by a wavelet-like layer.&lt;br /&gt;
* Lossless intra-ability: The ability to losslessly rewrite any frame as an intra frame (perhaps with significant bitrate overhead) in order to make frame accurate cuts possible.&lt;br /&gt;
** Or best handled by making sure that containers have working pre-roll, but presumably common GOP sizes will be greater than the number of references so even if losslessly reencoding the references is expensive it may be cheaper than pre-roll. Do both?&lt;br /&gt;
** Can be had for 'free' if lossless is supported, plus the right header flags to restuff the references from lossless copies in a packed hidden frame.&lt;br /&gt;
** Use of explicit (rather than staged) super-resolution and/or deeper references may make this functionality unattractive due to increased overhead.&lt;br /&gt;
* Internal overlays which could be swapped without re-encoding? (e.g. advertising, station ID). Could also be automatically generated by a Sufficiently Advanced™ encoder to improve efficiencies for static sprites over moving backgrounds.&lt;br /&gt;
** Complicates making the complexity bounded. No Sufficiently Advanced™ encoder likely to ever exist. But perhaps the station id/advertising uses fully justify this.&lt;br /&gt;
** Could be done externally to the video codec, but if so it's not likely to be useful for anyone ever.&lt;br /&gt;
* A secondary reference implementation in OpenCL, maintained throughout development, to make sure that the codec is GPU-friendly and can be done efficiently using OpenCL primitives.&lt;br /&gt;
* SWAR-friendly arithmetic.  For example, choosing transform coefficients so that no intermediate product overflows 16 bits (tricky for signed values) can sometimes enable (e.g.) 4 parallel operations in one uint64_t.  This can allow a pure C reference implementation to run faster, which is valuable for initial adoption and ports to new platforms.&lt;br /&gt;
* Parametric decode-side blur.&lt;br /&gt;
** Symmetrical blur in regions that are smooth on scales longer than the block size.  Could be signaled or derived from observed DC values.&lt;br /&gt;
** Motion blur so that moving objects are blurred along the motion vector.  May require coding a shutter speed parameter (0..1 as a fraction of the inter-frame interval).&lt;br /&gt;
* Fancy block property prediction.  (Not clear how these predictions interact with intra pred)&lt;br /&gt;
** Predict block properties (quantizer, energy, etc.) from MV.  (0,0) probably means small delta.  Larger MV's may correspond to larger deltas ... although  at low shutter speeds large MVs may correlate with reduced overall HF energy.&lt;br /&gt;
** Predict delta spectral shape from source block spectral shape.  HF/LF ratio of the delta may be correlated with the same ratio in its source blocks.  Works well with decode-side fDCT.&lt;br /&gt;
&lt;br /&gt;
==Negative results==&lt;br /&gt;
&lt;br /&gt;
* Using Kurtosis for detecting text in a frame&lt;br /&gt;
** The idea was to detect a Bernoulli distribution but it's not robust and too noisy&lt;/div&gt;</summary>
		<author><name>Rillian</name></author>	</entry>

	<entry>
		<id>http://wiki.xiph.org/Opus-1.0.2</id>
		<title>Opus-1.0.2</title>
		<link rel="alternate" type="text/html" href="http://wiki.xiph.org/Opus-1.0.2"/>
				<updated>2012-12-06T03:48:52Z</updated>
		
		<summary type="html">&lt;p&gt;Rillian: /* Other minor changes */ s/minor//&lt;/p&gt;
&lt;hr /&gt;
&lt;div&gt;Opus 1.0.2 fixes an out-of-bounds read that could be triggered by a malicious Opus packet causing an integer wrap-around in the padding code. Considering that the packet would have to be at least 16 MB in size and that no out-of-bounds write is possible, the severity is very low. This new release also has the following changes:&lt;br /&gt;
&lt;br /&gt;
== Quality-impacting ==&lt;br /&gt;
* Changed the behaviour of the PLC to always fill the caller's buffer&lt;br /&gt;
* Properly decode in-band FEC for packets with multiple Opus frames&lt;br /&gt;
* Hybrid mode quality improvements and fixes&lt;br /&gt;
* Fixed bugs in the CELT mode PLC&lt;br /&gt;
* Redundant mode transition fixes&lt;br /&gt;
&lt;br /&gt;
== Other changes ==&lt;br /&gt;
* Stack reduction&lt;br /&gt;
* Doc fixes (many)&lt;br /&gt;
* 16-bit fixes&lt;br /&gt;
* Misc build fixes&lt;br /&gt;
* New API calls: OPUS_GET_LAST_PACKET_DURATION ctl() and opus_packet_get_nb_samples()&lt;br /&gt;
* Minor code cleanup&lt;/div&gt;</summary>
		<author><name>Rillian</name></author>	</entry>

	<entry>
		<id>http://wiki.xiph.org/Opus-1.0.2</id>
		<title>Opus-1.0.2</title>
		<link rel="alternate" type="text/html" href="http://wiki.xiph.org/Opus-1.0.2"/>
				<updated>2012-12-05T23:10:04Z</updated>
		
		<summary type="html">&lt;p&gt;Rillian: /* Other minor changes */ s/Extra/New/ API&lt;/p&gt;
&lt;hr /&gt;
&lt;div&gt;Opus 1.0.2 fixes an out-of-bounds read that could be triggered by a malicious Opus packet causing an integer wrap-around in the padding code. Considering that the packet would have to be at least 16 MB in size and that no out-of-bounds write is possible, the severity is very low. This new release also has the following changes:&lt;br /&gt;
&lt;br /&gt;
== Quality-impacting ==&lt;br /&gt;
* Changed the behaviour of the PLC to always fill the caller's buffer&lt;br /&gt;
* Hybrid mode quality improvements and fixes&lt;br /&gt;
* Fixed bugs in the CELT mode PLC&lt;br /&gt;
* Redundant mode transition fixes&lt;br /&gt;
&lt;br /&gt;
== Other minor changes ==&lt;br /&gt;
* Stack reduction&lt;br /&gt;
* Doc fixes (many)&lt;br /&gt;
* 16-bit fixes&lt;br /&gt;
* Misc build fixes&lt;br /&gt;
* New API calls: OPUS_GET_LAST_PACKET_DURATION ctl() and opus_packet_get_nb_samples()&lt;br /&gt;
* Minor code cleanup&lt;/div&gt;</summary>
		<author><name>Rillian</name></author>	</entry>

	<entry>
		<id>http://wiki.xiph.org/Opus-1.0.2</id>
		<title>Opus-1.0.2</title>
		<link rel="alternate" type="text/html" href="http://wiki.xiph.org/Opus-1.0.2"/>
				<updated>2012-12-05T22:58:04Z</updated>
		
		<summary type="html">&lt;p&gt;Rillian: /* Quality-impacting */&lt;/p&gt;
&lt;hr /&gt;
&lt;div&gt;Opus 1.0.2 fixes an out-of-bounds read that could be triggered by a malicious Opus packet causing an integer wrap-around in the padding code. Considering that the packet would have to be at least 16 MB in size and that no out-of-bounds write is possible, the severity is very low. This new release also has the following changes:&lt;br /&gt;
&lt;br /&gt;
== Quality-impacting ==&lt;br /&gt;
* Changed the behaviour of the PLC to always fill the caller's buffer&lt;br /&gt;
* Hybrid mode quality improvements and fixes&lt;br /&gt;
* Fixed bugs in the CELT mode PLC&lt;br /&gt;
* Redundant mode transition fixes&lt;br /&gt;
&lt;br /&gt;
== Other minor changes ==&lt;br /&gt;
* Stack reduction&lt;br /&gt;
* Doc fixes (many)&lt;br /&gt;
* 16-bit fixes&lt;br /&gt;
* Misc build fixes&lt;br /&gt;
* Extra API: OPUS_GET_LAST_PACKET_DURATION ctl() and opus_packet_get_nb_samples()&lt;br /&gt;
* Minor code cleanup&lt;/div&gt;</summary>
		<author><name>Rillian</name></author>	</entry>

	<entry>
		<id>http://wiki.xiph.org/OPUS_TODO</id>
		<title>OPUS TODO</title>
		<link rel="alternate" type="text/html" href="http://wiki.xiph.org/OPUS_TODO"/>
				<updated>2012-12-04T18:03:46Z</updated>
		
		<summary type="html">&lt;p&gt;Rillian: /* Spec */ link to drafts&lt;/p&gt;
&lt;hr /&gt;
&lt;div&gt;== 1.0.2 ==&lt;br /&gt;
&lt;br /&gt;
* multi-frame FEC/PLC fix&lt;br /&gt;
* PLC fix&lt;br /&gt;
* opus_packet_get_duration()&lt;br /&gt;
* OPUS_GET_FRAME_SIZE() for decoder??&lt;br /&gt;
* &amp;lt;strike&amp;gt;Add license headers to all dist files&amp;lt;/strike&amp;gt; DONE&lt;br /&gt;
* &amp;lt;strike&amp;gt;Fix remaining build issues with MSVC&amp;lt;/strike&amp;gt; DONE&lt;br /&gt;
* &amp;lt;strike&amp;gt;Add OPUS_EXPORT override for chrome&amp;lt;/strike&amp;gt; DONE&lt;br /&gt;
&lt;br /&gt;
== 1.1-beta ==&lt;br /&gt;
&lt;br /&gt;
* tune transient detector&lt;br /&gt;
* variable frame size?&lt;br /&gt;
* LOTS of testing&lt;br /&gt;
* re-tune hybrid rate allocation&lt;br /&gt;
* re-tune mode switching decisions&lt;br /&gt;
* figure out how to use speech/music detection optimally&lt;br /&gt;
* everything from 1.0.2&lt;br /&gt;
&lt;br /&gt;
== Lower priority ==&lt;br /&gt;
&lt;br /&gt;
* Handle packets with PLC frames followed by FEC&lt;br /&gt;
* Better handling for the case where FEC has a different bandwidth than the current mode&lt;br /&gt;
* PLC transitions on unprotected SILK-SILK bandwidth changes?&lt;br /&gt;
&lt;br /&gt;
== Spec ==&lt;br /&gt;
* Ogg mapping. See [http://tools.ietf.org/html/draft-ietf-codec-oggopus IETF draft]&lt;br /&gt;
* Matroska mapping. See: [[MatroskaOpus]]&lt;br /&gt;
* RTP payload format. See [http://tools.ietf.org/html/draft-spittka-payload-rtp-opus IETF draft]&lt;br /&gt;
&lt;br /&gt;
== Website ==&lt;br /&gt;
* De-uglify webpage&lt;br /&gt;
* Promotional material&lt;br /&gt;
&lt;br /&gt;
== Other ==&lt;br /&gt;
&lt;br /&gt;
* Oggz-validate (should also validate opus toc)&lt;br /&gt;
&lt;br /&gt;
== Opus-tools ==&lt;br /&gt;
* A simple real time streaming example tool&lt;br /&gt;
* Replaygain (half done— needs a gain tool)&lt;br /&gt;
&lt;br /&gt;
== Experiments ==&lt;br /&gt;
&lt;br /&gt;
* Test exp_analysis and void_my_warranty.patch&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
== Future work ==&lt;br /&gt;
* Smart automatic mode decision&lt;br /&gt;
* psymodel based VBR&lt;br /&gt;
* Remove copy in inverse MDCT&lt;br /&gt;
* Save some float&amp;lt;-&amp;gt;int conversions&lt;br /&gt;
* Improvements to LP mode CBR (greg has some code)&lt;/div&gt;</summary>
		<author><name>Rillian</name></author>	</entry>

	<entry>
		<id>http://wiki.xiph.org/OPUS_TODO</id>
		<title>OPUS TODO</title>
		<link rel="alternate" type="text/html" href="http://wiki.xiph.org/OPUS_TODO"/>
				<updated>2012-12-04T18:00:21Z</updated>
		
		<summary type="html">&lt;p&gt;Rillian: /* 1.0.2 */ other recent changes&lt;/p&gt;
&lt;hr /&gt;
&lt;div&gt;== 1.0.2 ==&lt;br /&gt;
&lt;br /&gt;
* multi-frame FEC/PLC fix&lt;br /&gt;
* PLC fix&lt;br /&gt;
* opus_packet_get_duration()&lt;br /&gt;
* OPUS_GET_FRAME_SIZE() for decoder??&lt;br /&gt;
* &amp;lt;strike&amp;gt;Add license headers to all dist files&amp;lt;/strike&amp;gt; DONE&lt;br /&gt;
* &amp;lt;strike&amp;gt;Fix remaining build issues with MSVC&amp;lt;/strike&amp;gt; DONE&lt;br /&gt;
* &amp;lt;strike&amp;gt;Add OPUS_EXPORT override for chrome&amp;lt;/strike&amp;gt; DONE&lt;br /&gt;
&lt;br /&gt;
== 1.1-beta ==&lt;br /&gt;
&lt;br /&gt;
* tune transient detector&lt;br /&gt;
* variable frame size?&lt;br /&gt;
* LOTS of testing&lt;br /&gt;
* re-tune hybrid rate allocation&lt;br /&gt;
* re-tune mode switching decisions&lt;br /&gt;
* figure out how to use speech/music detection optimally&lt;br /&gt;
* everything from 1.0.2&lt;br /&gt;
&lt;br /&gt;
== Lower priority ==&lt;br /&gt;
&lt;br /&gt;
* Handle packets with PLC frames followed by FEC&lt;br /&gt;
* Better handling for the case where FEC has a different bandwidth than the current mode&lt;br /&gt;
* PLC transitions on unprotected SILK-SILK bandwidth changes?&lt;br /&gt;
&lt;br /&gt;
== Spec ==&lt;br /&gt;
* Ogg mapping&lt;br /&gt;
* Matroska mapping. See: [[MatroskaOpus]]&lt;br /&gt;
* RTP payload format&lt;br /&gt;
&lt;br /&gt;
== Website ==&lt;br /&gt;
* De-uglify webpage&lt;br /&gt;
* Promotional material&lt;br /&gt;
&lt;br /&gt;
== Other ==&lt;br /&gt;
&lt;br /&gt;
* Oggz-validate (should also validate opus toc)&lt;br /&gt;
&lt;br /&gt;
== Opus-tools ==&lt;br /&gt;
* A simple real time streaming example tool&lt;br /&gt;
* Replaygain (half done— needs a gain tool)&lt;br /&gt;
&lt;br /&gt;
== Experiments ==&lt;br /&gt;
&lt;br /&gt;
* Test exp_analysis and void_my_warranty.patch&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
== Future work ==&lt;br /&gt;
* Smart automatic mode decision&lt;br /&gt;
* psymodel based VBR&lt;br /&gt;
* Remove copy in inverse MDCT&lt;br /&gt;
* Save some float&amp;lt;-&amp;gt;int conversions&lt;br /&gt;
* Improvements to LP mode CBR (greg has some code)&lt;/div&gt;</summary>
		<author><name>Rillian</name></author>	</entry>

	<entry>
		<id>http://wiki.xiph.org/Daala</id>
		<title>Daala</title>
		<link rel="alternate" type="text/html" href="http://wiki.xiph.org/Daala"/>
				<updated>2012-11-02T22:01:33Z</updated>
		
		<summary type="html">&lt;p&gt;Rillian: /* Weekly meetings */ I've not been making updates lately&lt;/p&gt;
&lt;hr /&gt;
&lt;div&gt;Daala is the current working name of a next generation video codec— to be renamed once someone insists on something better. So far the best proposed alternative is PatentCake.&lt;br /&gt;
&lt;br /&gt;
For now, the purpose of this page is to collect notes about things that have been discussed in informal public IRC discussions about the next-generation initiative. Participants in these discussions have included Timothy Terriberry, Jason Garrett-Glaser, Loren Merritt, Ben Schwartz, Greg Maxwell, and others.&lt;br /&gt;
&lt;br /&gt;
See also: [https://xiph.org/daala/ https://xiph.org/daala/]&lt;br /&gt;
&lt;br /&gt;
== Weekly meetings ==&lt;br /&gt;
&lt;br /&gt;
We've been having weekly progress meetings on mumble.&lt;br /&gt;
&lt;br /&gt;
* 2012 June 4  [https://people.xiph.org/~giles/2012/daala_20120604.txt minutes] (actually a work week)&lt;br /&gt;
* 2012 June 22 [https://people.xiph.org/~giles/2012/daala_20120622.txt minutes]&lt;br /&gt;
* 2012 June 29 [https://people.xiph.org/~giles/2012/daala_20120629.txt minutes] [https://people.xiph.org/~giles/2012/daala_20120629.opus recording]&lt;br /&gt;
* 2012 July 6 [https://people.xiph.org/~giles/2012/daala_20120706.txt minutes]&lt;br /&gt;
* 2012 July 13 [https://people.xiph.org/~giles/2012/daala_20120713.txt minutes]&lt;br /&gt;
* 2012 July 20 [https://people.xiph.org/~giles/2012/daala_20120720.txt minutes]&lt;br /&gt;
* 2012 July 27 [https://people.xiph.org/~giles/2012/daala_20120727.txt minutes]&lt;br /&gt;
* 2012 August 3 [https://people.xiph.org/~giles/2012/daala_20120803.txt minutes] [https://people.xiph.org/~giles/2012/daala_20120803.opus recording]&lt;br /&gt;
* 2012 August 10 [https://people.xiph.org/~giles/2012/daala_20120810.txt minutes]&lt;br /&gt;
* 2012 August 17 - no meeting&lt;br /&gt;
* 2012 August 24 - [https://people.xiph.org/~giles/2012/daala_20120824.txt minutes] [https://people.xiph.org/~giles/2012/daala_20120824.opus recording]&lt;br /&gt;
* 2012 August 31 - no meeting&lt;br /&gt;
* 2012 September 7 - no meeting&lt;br /&gt;
* 2012 September 14 - no meeting&lt;br /&gt;
* 2012 September 21 - [https://people.xiph.org/~giles/2012/daala_20120921.txt minutes]&lt;br /&gt;
* 2012 September 28 - [https://people.xiph.org/~giles/2012/daala_20120928.txt minutes] [https://people.xiph.org/~giles/2012/daala_20120928.opus recording]&lt;br /&gt;
* 2012 October 5 - [https://people.xiph.org/~giles/2012/daala_20121005.opus recording]&lt;br /&gt;
* 2012 October 26 - &lt;br /&gt;
* 2012 November 2 - no meeting&lt;br /&gt;
&lt;br /&gt;
= Techniques =&lt;br /&gt;
&lt;br /&gt;
The overall structure discussed so far is a variable-size, lapped-DCT, block-based codec, with lapping done via pre/post filtering using a specially structured (lifting) linear-phase transform along the edges, together with overlapped block motion compensation and the expected trimmings. The lapping can be optimized for energy compaction and other useful properties, including invertibility, and yields excellent results with efficient finite-precision math.&lt;br /&gt;
&lt;br /&gt;
Other components which have been discussed include:&lt;br /&gt;
&lt;br /&gt;
==Techniques applicable to all frame types==&lt;br /&gt;
* Multisymbol arithmetic coding &lt;br /&gt;
** Timothy has some trial code showing speed-up proportional to the number of bits coded at once. (ec_test.c)&lt;br /&gt;
* Mode prediction using the previously decoded data, e.g. coding the mode using a probability function derived from trained predictors on the surrounding blocks. &lt;br /&gt;
** This will be terrible for robustness but may significantly reduce signalling overhead, allowing many more modes, and provide continuous adaptation between signalling free and fully signalled modes.&lt;br /&gt;
* Explore Legendre polynomial basis transforms instead of DCT&lt;br /&gt;
** May have better perceptual properties and/or result in 'less compromised' efficient implementations.  &lt;br /&gt;
* Coefficient domain prediction to allow efficient energy preserving quantization.&lt;br /&gt;
* Variable partition size/shape and the use of good predictors appear to remove most of the benefit of directional transforms.&lt;br /&gt;
** Perhaps 45deg is still useful?&lt;br /&gt;
** How does this change with partition sizes? Directional transforms are clearly not that useful with 4x4. &lt;br /&gt;
* Transform-post filtering to allow merging smaller transform blocks (like TF merging in CELT) may allow more flexible partitioning than outright using mixed block sizes.&lt;br /&gt;
* Perturbed quantization mode-signalling has been discussed but mostly laughed at. ;) &lt;br /&gt;
* Special block modes well suited to solid color/cartoon like content— avoiding ringing.&lt;br /&gt;
** Are pixel prediction modes too slow?&lt;br /&gt;
* In general, what Markov random field techniques can be applied with acceptable performance? Any?&lt;br /&gt;
* Designed for parallel encode and decode within each frame&lt;br /&gt;
** Important because&lt;br /&gt;
*** the proposed techniques need a lot more CPU than H.264 and VP8 for both encode and decode&lt;br /&gt;
*** Moore's law for single-threaded throughput is dead.  Future hardware is all multicore/GPU.&lt;br /&gt;
** Implies&lt;br /&gt;
*** Getting the order of application right for the lapping filters.&lt;br /&gt;
*** Mandatory slicing? Maybe some kind of multilevel entropy coding to reduce redundancy between slices while minimizing the single-threaded portion of decode.&lt;br /&gt;
&lt;br /&gt;
* Using PVQ and energy conservation: see http://jmvalin.ca/video/video_pvq_v3.pdf&lt;br /&gt;
&lt;br /&gt;
==Techniques applicable to inter frames==&lt;br /&gt;
* Using x264 as a test-bed, Jason and Loren demonstrated 15% rate/distortion improvements from using 10-bit intermediaries and references, estimated as being 1/3rd from quality calculation in the 10-bit space, 1/3rd from the higher precision references, and 1/3rd from higher intermediate precision in calculations (e.g. MC filter processing).&lt;br /&gt;
** Increased reference precision competes for memory with increased number of references. The improvements demonstrated appear to be a greater win than increasing the reference count once there are four references or so.&lt;br /&gt;
* Super-resolution techniques for motion-compensation references have been discussed— in particular it appears that the half-pel location is where intelligent filtering matters the most so staged computation could be effectively used to allow more expensive filtering at that level.&lt;br /&gt;
** Edge-directed interpolation techniques might be effectively applied to increase motion compensation accuracy, but most of the techniques known to be very effective are too slow.&lt;br /&gt;
** Speculation has been offered that a significant part of MC inaccuracy may be due to blending in a physically incorrect (gamma-corrected) space, though no real conclusions were made. Academic papers on motion compensation accuracy seem to have ignored this issue.&lt;br /&gt;
* Timothy has an example code base for a variable partition size blocking-free motion compensation scheme which merges OBMC (overlapped block motion compensation) and CGI (control-grid interpolation) with an interesting prediction/sub-division scheme and whole-frame trellis optimization of motion vectors. (daala-exp)&lt;br /&gt;
&lt;br /&gt;
==Basic features==&lt;br /&gt;
* YUV 4:4:4, 4:2:2, 4:2:0 subsamplings; 8-bit and 10-bit.&lt;br /&gt;
* Alpha channel — need testing material!&lt;br /&gt;
* 8-bit RGB compatible mode? (e.g. YCoCg, internally or at least flagging for it)&lt;br /&gt;
* Efficient 3D? — need testing material!&lt;br /&gt;
* Lossless?&lt;br /&gt;
** The value of this is disputable. If nothing else, it's arguable that stuffing lossless into a lossy format may be the only way to get lossless into many people's hands. Also, see below.&lt;br /&gt;
* Good support for decode side droppable frames?&lt;br /&gt;
** Hopefully the referencing structure will be flexible enough to enable this even if it's not an intentional feature.&lt;br /&gt;
&lt;br /&gt;
==Frills==&lt;br /&gt;
* Optionally storing a checksum of the expected decoded frame for decoder/encoder mismatch detection.&lt;br /&gt;
* Expose the number of referential descendants of a given frame (or even the whole reference DAG) for most efficient allocation of FEC.&lt;br /&gt;
&lt;br /&gt;
==Wingdings==&lt;br /&gt;
Crazy crap that might be interesting or at least fun to make fun of... &lt;br /&gt;
* &amp;gt;10bit?&lt;br /&gt;
** Use cases don't seem well enough defined yet. Significant complexity. Any prospective hardware developer may hire assassins.&lt;br /&gt;
** Possible compromise: the video reference structure contains a backbone that can be decoded at only N bits of depth (e.g. 10), and higher precisions are only supported outside of this reference chain.&lt;br /&gt;
**# Precision by truncation: decode is performed twice on each frame, identically, at low and high precision.  The only difference between them is the bit-depth of the transform, or possibly of the transform and MC filters.  Only low-precision outputs can be referenced by subsequent frames.  Useful if high-precision content is still worth watching at low precision.&lt;br /&gt;
**# Precision by gamma: decode is performed once at low precision as normal.  Then the output frame is converted to linear-light at high precision, after which another layer of residuals is added.  The second layer can be permitted to reference previous high-precision frames... tricky to use both sets of references though. Useful if high precision is used for storing linear data, but people still want to watch it on &amp;quot;low-end&amp;quot; hardware.&lt;br /&gt;
* Some high-end digital cameras operate JPEG derivatives in a special mode that keeps the image in the native linear RGB Bayer format in order to avoid lossy/slow demosaicing on the camera. In particular, this allows white balancing in post without excessive loss. Probably out of scope for Daala itself.&lt;br /&gt;
** Bayer, 4:2:0, 4:2:2, and interlacing are all special cases of a more general pattern in which the output frames are decimated/subsampled in a regular fashion.  All such subsamplings could be supported by a unified framework in which the video is always stored with all planes fully sampled, with a header indicating the recommended subsampling for display.  In such cases, the encoder can regard the transform as highly overcomplete, and simply ignore unneeded coefficients (presumably by leaving high frequency residuals coded as zero).  This structure would in effect turn the codec into a motion-compensated interpolating/deinterlacing filter.  Whether this approach is sensible presumably depends in part on how the transform is structured.  It would be especially easy if the transform's highest frequencies were coded by a wavelet-like layer.&lt;br /&gt;
* Lossless intra-ability: The ability to losslessly rewrite any frame as an intra frame (perhaps with significant bitrate overhead) in order to make frame accurate cuts possible.&lt;br /&gt;
** Or best handled by making sure that containers have working pre-roll, but presumably common GOP sizes will be greater than the number of references so even if losslessly reencoding the references is expensive it may be cheaper than pre-roll. Do both?&lt;br /&gt;
** Can be had for 'free' if lossless is supported, plus the right header flags to restuff the references from lossless copies in a packed hidden frame.&lt;br /&gt;
** Use of explicit (rather than staged) super-resolution and/or deeper references may make this functionality unattractive due to increased overhead.&lt;br /&gt;
* Internal overlays which could be swapped without re-encoding? (e.g. advertising, station ID). Could also be automatically generated by a Sufficiently Advanced™ encoder to improve efficiencies for static sprites over moving backgrounds.&lt;br /&gt;
** Complicates making the complexity bounded. No Sufficiently Advanced™ encoder is likely to ever exist. But perhaps the station ID/advertising uses fully justify this.&lt;br /&gt;
** Could be done externally to the video codec, but if so it's not likely to be useful for anyone ever.&lt;br /&gt;
* A secondary reference implementation in OpenCL, maintained throughout development, to make sure that the codec is GPU-friendly and can be done efficiently using OpenCL primitives.&lt;br /&gt;
* SWAR-friendly arithmetic.  For example, choosing transform coefficients so that no intermediate product overflows 16 bits (tricky for signed values) can sometimes enable (e.g.) 4 parallel operations in one uint64_t.  This can allow a pure C reference implementation to run faster, which is valuable for initial adoption and ports to new platforms.&lt;br /&gt;
* Parametric decode-side blur.&lt;br /&gt;
** Symmetrical blur in regions that are smooth on scales longer than the block size.  Could be signaled or derived from observed DC values.&lt;br /&gt;
** Motion blur so that moving objects are blurred along the motion vector.  May require coding a shutter speed parameter (0..1 as a fraction of the inter-frame interval).&lt;br /&gt;
* Fancy block property prediction.  (Not clear how these predictions interact with intra pred)&lt;br /&gt;
** Predict block properties (quantizer, energy, etc.) from MV.  (0,0) probably means small delta.  Larger MVs may correspond to larger deltas ... although at low shutter speeds large MVs may correlate with reduced overall HF energy.&lt;br /&gt;
** Predict delta spectral shape from source block spectral shape.  HF/LF ratio of the delta may be correlated with the same ratio in its source blocks.  Works well with decode-side fDCT.&lt;br /&gt;
&lt;br /&gt;
==Negative results==&lt;br /&gt;
&lt;br /&gt;
* Using Kurtosis for detecting text in a frame&lt;br /&gt;
** The idea was to detect a Bernoulli distribution, but it is not robust and is too noisy&lt;/div&gt;</summary>
		<author><name>Rillian</name></author>	</entry>

	<entry>
		<id>http://wiki.xiph.org/Daala</id>
		<title>Daala</title>
		<link rel="alternate" type="text/html" href="http://wiki.xiph.org/Daala"/>
				<updated>2012-10-05T20:30:35Z</updated>
		
		<summary type="html">&lt;p&gt;Rillian: /* Weekly meetings */ Recording link for today's meeting.&lt;/p&gt;
&lt;hr /&gt;
&lt;div&gt;Daala is the current working name of a next generation video codec— to be renamed once someone insists on something better. So far the best proposed alternative is PatentCake.&lt;br /&gt;
&lt;br /&gt;
For now, the purpose of this page is to collect notes about things that have been discussed in informal public IRC discussions about the next-generation initiative. Participants in these discussions have included Timothy Terriberry, Jason Garrett-Glaser, Loren Merritt, Ben Schwartz, Greg Maxwell, and others.&lt;br /&gt;
&lt;br /&gt;
See also: [https://xiph.org/daala/ https://xiph.org/daala/]&lt;br /&gt;
&lt;br /&gt;
== Weekly meetings ==&lt;br /&gt;
&lt;br /&gt;
We've been having weekly progress meetings on mumble.&lt;br /&gt;
&lt;br /&gt;
* 2012 June 4  [https://people.xiph.org/~giles/2012/daala_20120604.txt minutes] (actually a work week)&lt;br /&gt;
* 2012 June 22 [https://people.xiph.org/~giles/2012/daala_20120622.txt minutes]&lt;br /&gt;
* 2012 June 29 [https://people.xiph.org/~giles/2012/daala_20120629.txt minutes] [https://people.xiph.org/~giles/2012/daala_20120629.opus recording]&lt;br /&gt;
* 2012 July 6 [https://people.xiph.org/~giles/2012/daala_20120706.txt minutes]&lt;br /&gt;
* 2012 July 13 [https://people.xiph.org/~giles/2012/daala_20120713.txt minutes]&lt;br /&gt;
* 2012 July 20 [https://people.xiph.org/~giles/2012/daala_20120720.txt minutes]&lt;br /&gt;
* 2012 July 27 [https://people.xiph.org/~giles/2012/daala_20120727.txt minutes]&lt;br /&gt;
* 2012 August 3 [https://people.xiph.org/~giles/2012/daala_20120803.txt minutes] [https://people.xiph.org/~giles/2012/daala_20120803.opus recording]&lt;br /&gt;
* 2012 August 10 [https://people.xiph.org/~giles/2012/daala_20120810.txt minutes]&lt;br /&gt;
* 2012 August 17 - no meeting&lt;br /&gt;
* 2012 August 24 - [https://people.xiph.org/~giles/2012/daala_20120824.txt minutes] [https://people.xiph.org/~giles/2012/daala_20120824.opus recording]&lt;br /&gt;
* 2012 August 31 - no meeting&lt;br /&gt;
* 2012 September 7 - no meeting&lt;br /&gt;
* 2012 September 14 - no meeting&lt;br /&gt;
* 2012 September 21 - [https://people.xiph.org/~giles/2012/daala_20120921.txt minutes]&lt;br /&gt;
* 2012 September 28 - [https://people.xiph.org/~giles/2012/daala_20120928.txt minutes] [https://people.xiph.org/~giles/2012/daala_20120928.opus recording]&lt;br /&gt;
* 2012 October 5 - [https://people.xiph.org/~giles/2012/daala_20121005.opus recording]&lt;br /&gt;
&lt;br /&gt;
= Techniques =&lt;br /&gt;
&lt;br /&gt;
The overall structure discussed so far is a variable-size, lapped-DCT, block-based codec, with lapping done via pre/post filtering using a specially structured (lifting) linear-phase transform along the edges, together with overlapped block motion compensation and the expected trimmings. The lapping can be optimized for energy compaction and other useful properties, including invertibility, and yields excellent results with efficient finite-precision math.&lt;br /&gt;
&lt;br /&gt;
Other components which have been discussed include:&lt;br /&gt;
&lt;br /&gt;
==Techniques applicable to all frame types==&lt;br /&gt;
* Multisymbol arithmetic coding &lt;br /&gt;
** Timothy has some trial code showing speed-up proportional to the number of bits coded at once. (ec_test.c)&lt;br /&gt;
* Mode prediction using the previously decoded data, e.g. coding the mode using a probability function derived from trained predictors on the surrounding blocks. &lt;br /&gt;
** This will be terrible for robustness but may significantly reduce signalling overhead, allowing many more modes, and provide continuous adaptation between signalling free and fully signalled modes.&lt;br /&gt;
* Explore Legendre polynomial basis transforms instead of DCT&lt;br /&gt;
** May have better perceptual properties and/or result in 'less compromised' efficient implementations.  &lt;br /&gt;
* Coefficient domain prediction to allow efficient energy preserving quantization.&lt;br /&gt;
* Variable partition size/shape and the use of good predictors appear to remove most of the benefit of directional transforms.&lt;br /&gt;
** Perhaps 45deg is still useful?&lt;br /&gt;
** How does this change with partition sizes? Directional transforms are clearly not that useful with 4x4. &lt;br /&gt;
* Transform-post filtering to allow merging smaller transform blocks (like TF merging in CELT) may allow more flexible partitioning than outright using mixed block sizes.&lt;br /&gt;
* Perturbed quantization mode-signalling has been discussed but mostly laughed at. ;) &lt;br /&gt;
* Special block modes well suited to solid color/cartoon like content— avoiding ringing.&lt;br /&gt;
** Are pixel prediction modes too slow?&lt;br /&gt;
* In general, what Markov random field techniques can be applied with acceptable performance? Any?&lt;br /&gt;
* Designed for parallel encode and decode within each frame&lt;br /&gt;
** Important because&lt;br /&gt;
*** the proposed techniques need a lot more CPU than H.264 and VP8 for both encode and decode&lt;br /&gt;
*** Moore's law for single-threaded throughput is dead.  Future hardware is all multicore/GPU.&lt;br /&gt;
** Implies&lt;br /&gt;
*** Getting the order of application right for the lapping filters.&lt;br /&gt;
*** Mandatory slicing? Maybe some kind of multilevel entropy coding to reduce redundancy between slices while minimizing the single-threaded portion of decode.&lt;br /&gt;
&lt;br /&gt;
* Using PVQ and energy conservation: see http://jmvalin.ca/video/video_pvq_v3.pdf&lt;br /&gt;
&lt;br /&gt;
==Techniques applicable to inter frames==&lt;br /&gt;
* Using x264 as a test-bed, Jason and Loren demonstrated 15% rate/distortion improvements from using 10-bit intermediaries and references, estimated as being 1/3rd from quality calculation in the 10-bit space, 1/3rd from the higher precision references, and 1/3rd from higher intermediate precision in calculations (e.g. MC filter processing).&lt;br /&gt;
** Increased reference precision competes for memory with increased number of references. The improvements demonstrated appear to be a greater win than increasing the reference count once there are four references or so.&lt;br /&gt;
* Super-resolution techniques for motion-compensation references have been discussed— in particular it appears that the half-pel location is where intelligent filtering matters the most so staged computation could be effectively used to allow more expensive filtering at that level.&lt;br /&gt;
** Edge-directed interpolation techniques might be effectively applied to increase motion compensation accuracy, but most of the techniques known to be very effective are too slow.&lt;br /&gt;
** Speculation has been offered that a significant part of MC inaccuracy may be due to blending in a physically incorrect (gamma-corrected) space, though no real conclusions were made. Academic papers on motion compensation accuracy seem to have ignored this issue.&lt;br /&gt;
* Timothy has an example code base for a variable partition size blocking-free motion compensation scheme which merges OBMC (overlapped block motion compensation) and CGI (control-grid interpolation) with an interesting prediction/sub-division scheme and whole-frame trellis optimization of motion vectors. (daala-exp)&lt;br /&gt;
&lt;br /&gt;
==Basic features==&lt;br /&gt;
* YUV 4:4:4, 4:2:2, 4:2:0 subsamplings; 8-bit and 10-bit.&lt;br /&gt;
* Alpha channel — need testing material!&lt;br /&gt;
* 8-bit RGB compatible mode? (e.g. YCoCg, internally or at least flagging for it)&lt;br /&gt;
* Efficient 3D? — need testing material!&lt;br /&gt;
* Lossless?&lt;br /&gt;
** The value of this is disputable. If nothing else, it's arguable that stuffing lossless into a lossy format may be the only way to get lossless into many people's hands. Also, see below.&lt;br /&gt;
* Good support for decode side droppable frames?&lt;br /&gt;
** Hopefully the referencing structure will be flexible enough to enable this even if it's not an intentional feature.&lt;br /&gt;
&lt;br /&gt;
==Frills==&lt;br /&gt;
* Optionally storing a checksum of the expected decoded frame for decoder/encoder mismatch detection.&lt;br /&gt;
* Expose the number of referential descendants of a given frame (or even the whole reference DAG) for most efficient allocation of FEC.&lt;br /&gt;
&lt;br /&gt;
==Wingdings==&lt;br /&gt;
Crazy crap that might be interesting or at least fun to make fun of... &lt;br /&gt;
* &amp;gt;10bit?&lt;br /&gt;
** Use cases don't seem well enough defined yet. Significant complexity. Any prospective hardware developer may hire assassins.&lt;br /&gt;
** Possible compromise: the video reference structure contains a backbone that can be decoded at only N bits of depth (e.g. 10), and higher precisions are only supported outside of this reference chain.&lt;br /&gt;
**# Precision by truncation: decode is performed twice on each frame, identically, at low and high precision.  The only difference between them is the bit-depth of the transform, or possibly of the transform and MC filters.  Only low-precision outputs can be referenced by subsequent frames.  Useful if high-precision content is still worth watching at low precision.&lt;br /&gt;
**# Precision by gamma: decode is performed once at low precision as normal.  Then the output frame is converted to linear-light at high precision, after which another layer of residuals is added.  The second layer can be permitted to reference previous high-precision frames... tricky to use both sets of references though. Useful if high precision is used for storing linear data, but people still want to watch it on &amp;quot;low-end&amp;quot; hardware.&lt;br /&gt;
* Some high end digital cameras are operating jpeg-derivatives in a special mode that keeps the image in the native linear RGB bayer format in order to avoid lossy/slow demosaicing on the camera. In particular this allows white balancing in post without excessive loss. Probably out of scope for Daala itself.&lt;br /&gt;
** Bayer, 4:2:0, 4:2:2, and Interlacing are all special cases of a more general pattern in which the output frames are decimated/subsampled in a regular fashion.  All such subsamplings could be supported by a unified framework in which the video is always stored with all planes fully sampled, with a header indicating the recommended subsampling for display.  In such cases, the encoder can regard the transform as highly overcomplete, and simply ignore unneeded coefficients (presumably by leaving high frequency residuals coded as zero).  This structure would in effect turn the codec into a motion-compensated interpolating/deinterlacing filter.  Whether this approach is sensible presumably depends in part on how the transform is structured.  It would be especially easy if the transform's highest-frequencies were coded by a wavelet-like layer.&lt;br /&gt;
* Lossless intra-ability: The ability to losslessly rewrite any frame as an intra frame (perhaps with significant bitrate overhead) in order to make frame accurate cuts possible.&lt;br /&gt;
** Or best handled by making sure that containers have working pre-roll, but presumably common GOP sizes will be greater than the number of references so even if losslessly reencoding the references is expensive it may be cheaper than pre-roll. Do both?&lt;br /&gt;
** Can be had for 'free' if lossless is supported, plus the right header flags to restuff the references from lossless copies in a packed hidden frame.&lt;br /&gt;
** Use of explicit (rather than staged) super-resolution and/or deeper references may make this functionality unattractive due to increased overhead.&lt;br /&gt;
* Internal overlays which could be swapped without re-encoding? (e.g. advertising, station ID). Could also be automatically generated by a Sufficiently Advanced™ encoder to improve efficiencies for static sprites over moving backgrounds.&lt;br /&gt;
** Complicates making the complexity bounded. No Sufficiently Advanced™ encoder likely to ever exist. But perhaps the station id/advertising uses fully justify this.&lt;br /&gt;
** Could be done externally to the video codec, but if so it's not likely to be useful for anyone ever.&lt;br /&gt;
* A secondary reference implementation in OpenCL, maintained throughout development, to make sure that the codec is GPU-friendly and can be done efficiently using OpenCL primitives.&lt;br /&gt;
* SWAR-friendly arithmetic.  For example, choosing transform coefficients so that no intermediate product overflows 16 bits (tricky for signed values) can sometimes enable (e.g.) 4 parallel operations in one uint64_t.  This can allow a pure C reference implementation to run faster, which is valuable for initial adoption and ports to new platforms.&lt;br /&gt;
* Parametric decode-side blur.&lt;br /&gt;
** Symmetrical blur in regions that are smooth on scales longer than the block size.  Could be signaled or derived from observed DC values.&lt;br /&gt;
** Motion blur so that moving objects are blurred along the motion vector.  May require coding a shutter speed parameter (0..1 as a fraction of the inter-frame interval).&lt;br /&gt;
* Fancy block property prediction.  (Not clear how these predictions interact with intra prediction.)&lt;br /&gt;
** Predict block properties (quantizer, energy, etc.) from MV.  (0,0) probably means small delta.  Larger MVs may correspond to larger deltas, although at low shutter speeds large MVs may correlate with reduced overall HF energy.&lt;br /&gt;
** Predict delta spectral shape from source block spectral shape.  HF/LF ratio of the delta may be correlated with the same ratio in its source blocks.  Works well with decode-side fDCT.&lt;br /&gt;
&lt;br /&gt;
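The SWAR idea above can be illustrated with a minimal sketch, assuming unsigned 16-bit lanes and a small constant coefficient. The helper names (pack, unpack, swar_scale) are hypothetical and not taken from any Daala code; Python integers stand in for a uint64_t, with the modulo emulating 64-bit wraparound.&lt;br /&gt;

```python
LANE_BITS = 16
LANES = 4
LANE_SIZE = 2 ** LANE_BITS             # number of values one lane can hold
WORD_SIZE = 2 ** (LANE_BITS * LANES)   # emulates uint64_t wraparound


def pack(values):
    """Pack four unsigned 16-bit values into one 64-bit word (lane 0 lowest)."""
    if len(values) != LANES:
        raise ValueError("expected exactly four lanes")
    word = 0
    for i, v in enumerate(values):
        if v not in range(LANE_SIZE):
            raise ValueError("lane value does not fit in 16 bits")
        word += v * LANE_SIZE ** i
    return word


def unpack(word):
    """Split a 64-bit word back into its four 16-bit lanes."""
    return [(word // LANE_SIZE ** i) % LANE_SIZE for i in range(LANES)]


def swar_scale(word, coeff):
    """Scale all four lanes with a single multiply.

    Only correct when no lane product reaches 2**16; otherwise the
    excess carries into the neighbouring lane.
    """
    return (word * coeff) % WORD_SIZE


samples = [100, 2000, 30, 4]
scaled = unpack(swar_scale(pack(samples), 7))
print(scaled)  # [700, 14000, 210, 28]
```

If a lane product does overflow (say 40000 scaled by 2), the carry corrupts the next lane, which is why the coefficients would have to be chosen with the 16-bit bound in mind. Signed values need extra care that this sketch does not attempt.&lt;br /&gt;
&lt;br /&gt;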
==Negative results==&lt;br /&gt;
&lt;br /&gt;
* Using Kurtosis for detecting text in a frame&lt;br /&gt;
** The idea was to detect a Bernoulli distribution, but the measure is too noisy to be robust.&lt;/div&gt;</summary>
		<author><name>Rillian</name></author>	</entry>

	<entry>
		<id>http://wiki.xiph.org/Daala</id>
		<title>Daala</title>
		<link rel="alternate" type="text/html" href="http://wiki.xiph.org/Daala"/>
				<updated>2012-09-28T22:48:35Z</updated>
		
		<summary type="html">&lt;p&gt;Rillian: /* Weekly meetings */ Link to minutes and recording&lt;/p&gt;
&lt;hr /&gt;
&lt;div&gt;Daala is the current working name of a next-generation video codec, to be renamed once someone insists on something better. So far the best proposed alternative is PatentCake.&lt;br /&gt;
&lt;br /&gt;
For now the purpose of this page is to collect notes on topics raised in informal public IRC discussions about the next-generation initiative. Participants in these discussions have included Timothy Terriberry, Jason Garrett-Glaser, Loren Merritt, Ben Schwartz, Greg Maxwell, and others. &lt;br /&gt;
&lt;br /&gt;
See also: [https://xiph.org/daala/ https://xiph.org/daala/]&lt;br /&gt;
&lt;br /&gt;
== Weekly meetings ==&lt;br /&gt;
&lt;br /&gt;
We've been having weekly progress meetings on mumble.&lt;br /&gt;
&lt;br /&gt;
* 2012 June 4  [https://people.xiph.org/~giles/2012/daala_20120604.txt minutes] (actually a work week)&lt;br /&gt;
* 2012 June 22 [https://people.xiph.org/~giles/2012/daala_20120622.txt minutes]&lt;br /&gt;
* 2012 June 29 [https://people.xiph.org/~giles/2012/daala_20120629.txt minutes] [https://people.xiph.org/~giles/2012/daala_20120629.opus recording]&lt;br /&gt;
* 2012 July 6 [https://people.xiph.org/~giles/2012/daala_20120706.txt minutes]&lt;br /&gt;
* 2012 July 13 [https://people.xiph.org/~giles/2012/daala_20120713.txt minutes]&lt;br /&gt;
* 2012 July 20 [https://people.xiph.org/~giles/2012/daala_20120720.txt minutes]&lt;br /&gt;
* 2012 July 27 [https://people.xiph.org/~giles/2012/daala_20120727.txt minutes]&lt;br /&gt;
* 2012 August 3 [https://people.xiph.org/~giles/2012/daala_20120803.txt minutes] [https://people.xiph.org/~giles/2012/daala_20120803.opus recording]&lt;br /&gt;
* 2012 August 10 [https://people.xiph.org/~giles/2012/daala_20120810.txt minutes]&lt;br /&gt;
* 2012 August 17 - no meeting&lt;br /&gt;
* 2012 August 24 - [https://people.xiph.org/~giles/2012/daala_20120824.txt minutes] [https://people.xiph.org/~giles/2012/daala_20120824.opus recording]&lt;br /&gt;
* 2012 August 31 - no meeting&lt;br /&gt;
* 2012 September 7 - no meeting&lt;br /&gt;
* 2012 September 14 - no meeting&lt;br /&gt;
* 2012 September 21 - [https://people.xiph.org/~giles/2012/daala_20120921.txt minutes]&lt;br /&gt;
* 2012 September 28 - [https://people.xiph.org/~giles/2012/daala_20120928.txt minutes] [https://people.xiph.org/~giles/2012/daala_20120928.opus recording]&lt;br /&gt;
&lt;br /&gt;
= Techniques =&lt;br /&gt;
&lt;br /&gt;
The overall structure discussed so far is a variable-size, block-based lapped-DCT codec. Lapping is done via pre/post filtering with a specially structured (lifting) linear-phase transform along the block edges, combined with overlapped block motion compensation and the expected trimmings. The lapping can be optimized for energy compaction and other useful properties, including invertibility, and yields excellent results with efficient finite-precision math.&lt;br /&gt;
&lt;br /&gt;
Other components which have been discussed include:&lt;br /&gt;
&lt;br /&gt;
==Techniques applicable to all frame types==&lt;br /&gt;
* Multisymbol arithmetic coding &lt;br /&gt;
** Timothy has some trial code showing speed-up proportional to the number of bits coded at once. (ec_test.c)&lt;br /&gt;
* Mode prediction using the previously decoded data, e.g. coding the mode using a probability function derived from trained predictors on the surrounding blocks. &lt;br /&gt;
** This will be terrible for robustness but may significantly reduce signalling overhead, allowing many more modes, and provide continuous adaptation between signalling free and fully signalled modes.&lt;br /&gt;
* Explore Legendre polynomial basis transforms instead of DCT&lt;br /&gt;
** May have better perceptual properties and/or result in 'less compromised' efficient implementations.  &lt;br /&gt;
* Coefficient domain prediction to allow efficient energy preserving quantization.&lt;br /&gt;
* Variable partition size/shape and the use of good predictors appears to remove most of the benefit of directional transforms.&lt;br /&gt;
** Perhaps 45deg is still useful?&lt;br /&gt;
** How does this change with partition sizes? Directional transforms are clearly not that useful with 4x4. &lt;br /&gt;
* Transform-post filtering to allow merging smaller transform blocks (like TF merging in CELT) may allow more flexible partitioning than outright using mixed block sizes.&lt;br /&gt;
* Perturbed quantization mode-signalling has been discussed but mostly laughed at. ;) &lt;br /&gt;
* Special block modes well suited to solid color/cartoon like content— avoiding ringing.&lt;br /&gt;
** Are pixel prediction modes too slow?&lt;br /&gt;
* In general, which Markov random field techniques can be applied with acceptable performance? Any?&lt;br /&gt;
* Designed for parallel encode and decode within each frame&lt;br /&gt;
** Important because&lt;br /&gt;
*** the proposed techniques need a lot more CPU than H.264 and VP8 for both encode and decode&lt;br /&gt;
*** Moore's law for single-threaded throughput is dead.  Future hardware is all multicore/GPU.&lt;br /&gt;
** Implies&lt;br /&gt;
*** Getting the order of application right for the lapping filters.&lt;br /&gt;
*** Mandatory slicing? Maybe some kind of multilevel entropy coding to reduce redundancy between slices while minimizing the single-threaded portion of decode.&lt;br /&gt;
&lt;br /&gt;
* Using PVQ and energy conservation: see http://jmvalin.ca/video/video_pvq_v3.pdf&lt;br /&gt;
&lt;br /&gt;
==Techniques applicable to inter frames==&lt;br /&gt;
* Using x264 as a test-bed Jason and Loren demonstrated 15% rate/distortion improvements from using 10-bit intermediaries and references, estimated as being 1/3rd from quality calculation in the 10-bit space, 1/3rd from the higher precision references, and 1/3rd from higher intermediate precision in calculations (e.g. MC filter processing).&lt;br /&gt;
** Increased reference precision competes for memory with increased number of references. The improvements demonstrated appear to be a greater win than increasing the reference count once there are four references or so.&lt;br /&gt;
* Super-resolution techniques for motion-compensation references have been discussed— in particular it appears that the half-pel location is where intelligent filtering matters the most so staged computation could be effectively used to allow more expensive filtering at that level.&lt;br /&gt;
** Edge-directed interpolation techniques might be effectively applied to increase motion compensation accuracy, but most of the techniques known to be very effective are too slow.&lt;br /&gt;
** Speculation has been offered that a significant part of MC inaccuracy may be due to blending in a physically incorrect (gamma-corrected) space, though no real conclusions were made. Academic papers on motion compensation accuracy seem to have ignored this issue.&lt;br /&gt;
* Timothy has an example code base for a variable partition size blocking-free motion compensation scheme which merges OBMC (overlapped block motion compensation) and CGI (control-grid interpolation) with an interesting prediction/sub-division scheme and whole-frame trellis optimization of motion vectors. (daala-exp)&lt;br /&gt;
&lt;br /&gt;
==Basic features==&lt;br /&gt;
* YUV 4:4:4, 4:2:2, and 4:2:0 subsamplings; 8-bit and 10-bit depth.&lt;br /&gt;
* Alpha channel — need testing material!&lt;br /&gt;
* 8-bit RGB compatible mode? (e.g. YCoCg, internally or at least flagging for it)&lt;br /&gt;
* Efficient 3D? — need testing material!&lt;br /&gt;
* Lossless?&lt;br /&gt;
** The value of this is disputable. If nothing else it's arguable that stuffing lossless into a lossy format may be the only way to get lossless into many people's hands. Also, see below&lt;br /&gt;
* Good support for decode side droppable frames?&lt;br /&gt;
** Hopefully the referencing structure will be flexible enough to enable this even if it's not an intentional feature.&lt;br /&gt;
&lt;br /&gt;
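The RGB-compatible and lossless items above could meet in a reversible colour transform. The following is a minimal sketch of the lifting-based YCoCg-R variant, offered as one assumed candidate rather than anything the discussions settled on; the function names are hypothetical. For 8-bit RGB input, Co and Cg need one extra bit of range.&lt;br /&gt;

```python
def rgb_to_ycocg_r(r, g, b):
    """Forward YCoCg-R: integer lifting steps, exactly invertible."""
    co = r - b
    t = b + co // 2      # floor division matches an arithmetic right shift
    cg = g - t
    y = t + cg // 2
    return y, co, cg


def ycocg_r_to_rgb(y, co, cg):
    """Inverse YCoCg-R: undo the lifting steps in reverse order."""
    t = y - cg // 2
    g = cg + t
    b = t - co // 2
    r = b + co
    return r, g, b


pixel = (200, 17, 96)
restored = ycocg_r_to_rgb(*rgb_to_ycocg_r(*pixel))
print(restored == pixel)  # True
```

Because each step adds or subtracts a value that the inverse can recompute exactly, the round trip loses nothing, which is what a lossless mode would require.&lt;br /&gt;
&lt;br /&gt;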
==Frills==&lt;br /&gt;
* Optionally storing a checksum of the expected decoded frame for decoder/encoder mismatch detection.&lt;br /&gt;
* Expose the number of referential descendants of a given frame (or even the whole reference DAG) for most efficient allocation of FEC.&lt;br /&gt;
&lt;br /&gt;
==Wingdings==&lt;br /&gt;
Crazy crap that might be interesting or at least fun to make fun of... &lt;br /&gt;
* &amp;gt;10bit?&lt;br /&gt;
** Use cases don't seem well enough defined yet. Significant complexity. Any prospective hardware developer may hire assassins.&lt;br /&gt;
** Possible compromise: the video reference structure contains a backbone that can be decoded at only N bits of depth (e.g. 10), and higher precisions are only supported outside of this reference chain.&lt;br /&gt;
**# Precision by truncation: decode is performed twice on each frame, identically, at low and high precision.  The only difference between them is the bit-depth of the transform, or possibly of the transform and MC filters.  Only low-precision outputs can be referenced by subsequent frames.  Useful if high-precision content is still worth watching at low precision.&lt;br /&gt;
**# Precision by gamma: decode is performed once at low precision as normal.  Then the output frame is converted to linear-light at high precision, after which another layer of residuals is added.  The second layer can be permitted to reference previous high-precision frames... tricky to use both sets of references though. Useful if high precision is used for storing linear data, but people still want to watch it on &amp;quot;low-end&amp;quot; hardware.&lt;br /&gt;
* Some high end digital cameras are operating jpeg-derivatives in a special mode that keeps the image in the native linear RGB bayer format in order to avoid lossy/slow demosaicing on the camera. In particular this allows white balancing in post without excessive loss. Probably out of scope for Daala itself.&lt;br /&gt;
** Bayer, 4:2:0, 4:2:2, and Interlacing are all special cases of a more general pattern in which the output frames are decimated/subsampled in a regular fashion.  All such subsamplings could be supported by a unified framework in which the video is always stored with all planes fully sampled, with a header indicating the recommended subsampling for display.  In such cases, the encoder can regard the transform as highly overcomplete, and simply ignore unneeded coefficients (presumably by leaving high frequency residuals coded as zero).  This structure would in effect turn the codec into a motion-compensated interpolating/deinterlacing filter.  Whether this approach is sensible presumably depends in part on how the transform is structured.  It would be especially easy if the transform's highest-frequencies were coded by a wavelet-like layer.&lt;br /&gt;
* Lossless intra-ability: The ability to losslessly rewrite any frame as an intra frame (perhaps with significant bitrate overhead) in order to make frame accurate cuts possible.&lt;br /&gt;
** Or best handled by making sure that containers have working pre-roll, but presumably common GOP sizes will be greater than the number of references so even if losslessly reencoding the references is expensive it may be cheaper than pre-roll. Do both?&lt;br /&gt;
** Can be had for 'free' if lossless is supported, plus the right header flags to restuff the references from lossless copies in a packed hidden frame.&lt;br /&gt;
** Use of explicit (rather than staged) super-resolution and/or deeper references may make this functionality unattractive due to increased overhead.&lt;br /&gt;
* Internal overlays which could be swapped without re-encoding? (e.g. advertising, station ID). Could also be automatically generated by a Sufficiently Advanced™ encoder to improve efficiencies for static sprites over moving backgrounds.&lt;br /&gt;
** Complicates making the complexity bounded. No Sufficiently Advanced™ encoder likely to ever exist. But perhaps the station id/advertising uses fully justify this.&lt;br /&gt;
** Could be done externally to the video codec, but if so it's not likely to be useful for anyone ever.&lt;br /&gt;
* A secondary reference implementation in OpenCL, maintained throughout development, to make sure that the codec is GPU-friendly and can be done efficiently using OpenCL primitives.&lt;br /&gt;
* SWAR-friendly arithmetic.  For example, choosing transform coefficients so that no intermediate product overflows 16 bits (tricky for signed values) can sometimes enable (e.g.) 4 parallel operations in one uint64_t.  This can allow a pure C reference implementation to run faster, which is valuable for initial adoption and ports to new platforms.&lt;br /&gt;
* Parametric decode-side blur.&lt;br /&gt;
** Symmetrical blur in regions that are smooth on scales longer than the block size.  Could be signaled or derived from observed DC values.&lt;br /&gt;
** Motion blur so that moving objects are blurred along the motion vector.  May require coding a shutter speed parameter (0..1 as a fraction of the inter-frame interval).&lt;br /&gt;
* Fancy block property prediction.  (Not clear how these predictions interact with intra prediction.)&lt;br /&gt;
** Predict block properties (quantizer, energy, etc.) from MV.  (0,0) probably means small delta.  Larger MVs may correspond to larger deltas, although at low shutter speeds large MVs may correlate with reduced overall HF energy.&lt;br /&gt;
** Predict delta spectral shape from source block spectral shape.  HF/LF ratio of the delta may be correlated with the same ratio in its source blocks.  Works well with decode-side fDCT.&lt;br /&gt;
&lt;br /&gt;
==Negative results==&lt;br /&gt;
&lt;br /&gt;
* Using Kurtosis for detecting text in a frame&lt;br /&gt;
** The idea was to detect a Bernoulli distribution, but the measure is too noisy to be robust.&lt;/div&gt;</summary>
		<author><name>Rillian</name></author>	</entry>

	<entry>
		<id>http://wiki.xiph.org/Daala</id>
		<title>Daala</title>
		<link rel="alternate" type="text/html" href="http://wiki.xiph.org/Daala"/>
				<updated>2012-09-21T20:39:09Z</updated>
		
		<summary type="html">&lt;p&gt;Rillian: /* Weekly meetings */ Post minutes from today's meeting.&lt;/p&gt;
&lt;hr /&gt;
&lt;div&gt;Daala is the current working name of a next-generation video codec, to be renamed once someone insists on something better. So far the best proposed alternative is PatentCake.&lt;br /&gt;
&lt;br /&gt;
For now the purpose of this page is to collect notes on topics raised in informal public IRC discussions about the next-generation initiative. Participants in these discussions have included Timothy Terriberry, Jason Garrett-Glaser, Loren Merritt, Ben Schwartz, Greg Maxwell, and others. &lt;br /&gt;
&lt;br /&gt;
See also: [https://xiph.org/daala/ https://xiph.org/daala/]&lt;br /&gt;
&lt;br /&gt;
== Weekly meetings ==&lt;br /&gt;
&lt;br /&gt;
We've been having weekly progress meetings on mumble.&lt;br /&gt;
&lt;br /&gt;
* 2012 June 4  [https://people.xiph.org/~giles/2012/daala_20120604.txt minutes] (actually a work week)&lt;br /&gt;
* 2012 June 22 [https://people.xiph.org/~giles/2012/daala_20120622.txt minutes]&lt;br /&gt;
* 2012 June 29 [https://people.xiph.org/~giles/2012/daala_20120629.txt minutes] [https://people.xiph.org/~giles/2012/daala_20120629.opus recording]&lt;br /&gt;
* 2012 July 6 [https://people.xiph.org/~giles/2012/daala_20120706.txt minutes]&lt;br /&gt;
* 2012 July 13 [https://people.xiph.org/~giles/2012/daala_20120713.txt minutes]&lt;br /&gt;
* 2012 July 20 [https://people.xiph.org/~giles/2012/daala_20120720.txt minutes]&lt;br /&gt;
* 2012 July 27 [https://people.xiph.org/~giles/2012/daala_20120727.txt minutes]&lt;br /&gt;
* 2012 August 3 [https://people.xiph.org/~giles/2012/daala_20120803.txt minutes] [https://people.xiph.org/~giles/2012/daala_20120803.opus recording]&lt;br /&gt;
* 2012 August 10 [https://people.xiph.org/~giles/2012/daala_20120810.txt minutes]&lt;br /&gt;
* 2012 August 17 - no meeting&lt;br /&gt;
* 2012 August 24 - [https://people.xiph.org/~giles/2012/daala_20120824.txt minutes] [https://people.xiph.org/~giles/2012/daala_20120824.opus recording]&lt;br /&gt;
* 2012 August 31 - no meeting&lt;br /&gt;
* 2012 September 7 - no meeting&lt;br /&gt;
* 2012 September 14 - no meeting&lt;br /&gt;
* 2012 September 21 - [https://people.xiph.org/~giles/2012/daala_20120921.txt minutes]&lt;br /&gt;
&lt;br /&gt;
= Techniques =&lt;br /&gt;
&lt;br /&gt;
The overall structure discussed so far is a variable-size, block-based lapped-DCT codec. Lapping is done via pre/post filtering with a specially structured (lifting) linear-phase transform along the block edges, combined with overlapped block motion compensation and the expected trimmings. The lapping can be optimized for energy compaction and other useful properties, including invertibility, and yields excellent results with efficient finite-precision math.&lt;br /&gt;
&lt;br /&gt;
Other components which have been discussed include:&lt;br /&gt;
&lt;br /&gt;
==Techniques applicable to all frame types==&lt;br /&gt;
* Multisymbol arithmetic coding &lt;br /&gt;
** Timothy has some trial code showing speed-up proportional to the number of bits coded at once. (ec_test.c)&lt;br /&gt;
* Mode prediction using the previously decoded data, e.g. coding the mode using a probability function derived from trained predictors on the surrounding blocks. &lt;br /&gt;
** This will be terrible for robustness but may significantly reduce signalling overhead, allowing many more modes, and provide continuous adaptation between signalling free and fully signalled modes.&lt;br /&gt;
* Explore Legendre polynomial basis transforms instead of DCT&lt;br /&gt;
** May have better perceptual properties and/or result in 'less compromised' efficient implementations.  &lt;br /&gt;
* Coefficient domain prediction to allow efficient energy preserving quantization.&lt;br /&gt;
* Variable partition size/shape and the use of good predictors appears to remove most of the benefit of directional transforms.&lt;br /&gt;
** Perhaps 45deg is still useful?&lt;br /&gt;
** How does this change with partition sizes? Directional transforms are clearly not that useful with 4x4. &lt;br /&gt;
* Transform-post filtering to allow merging smaller transform blocks (like TF merging in CELT) may allow more flexible partitioning than outright using mixed block sizes.&lt;br /&gt;
* Perturbed quantization mode-signalling has been discussed but mostly laughed at. ;) &lt;br /&gt;
* Special block modes well suited to solid color/cartoon like content— avoiding ringing.&lt;br /&gt;
** Are pixel prediction modes too slow?&lt;br /&gt;
* In general, which Markov random field techniques can be applied with acceptable performance? Any?&lt;br /&gt;
* Designed for parallel encode and decode within each frame&lt;br /&gt;
** Important because&lt;br /&gt;
*** the proposed techniques need a lot more CPU than H.264 and VP8 for both encode and decode&lt;br /&gt;
*** Moore's law for single-threaded throughput is dead.  Future hardware is all multicore/GPU.&lt;br /&gt;
** Implies&lt;br /&gt;
*** Getting the order of application right for the lapping filters.&lt;br /&gt;
*** Mandatory slicing? Maybe some kind of multilevel entropy coding to reduce redundancy between slices while minimizing the single-threaded portion of decode.&lt;br /&gt;
&lt;br /&gt;
* Using PVQ and energy conservation: see http://jmvalin.ca/video/video_pvq_v3.pdf&lt;br /&gt;
&lt;br /&gt;
==Techniques applicable to inter frames==&lt;br /&gt;
* Using x264 as a test-bed Jason and Loren demonstrated 15% rate/distortion improvements from using 10-bit intermediaries and references, estimated as being 1/3rd from quality calculation in the 10-bit space, 1/3rd from the higher precision references, and 1/3rd from higher intermediate precision in calculations (e.g. MC filter processing).&lt;br /&gt;
** Increased reference precision competes for memory with increased number of references. The improvements demonstrated appear to be a greater win than increasing the reference count once there are four references or so.&lt;br /&gt;
* Super-resolution techniques for motion-compensation references have been discussed— in particular it appears that the half-pel location is where intelligent filtering matters the most so staged computation could be effectively used to allow more expensive filtering at that level.&lt;br /&gt;
** Edge-directed interpolation techniques might be effectively applied to increase motion compensation accuracy, but most of the techniques known to be very effective are too slow.&lt;br /&gt;
** Speculation has been offered that a significant part of MC inaccuracy may be due to blending in a physically incorrect (gamma-corrected) space, though no real conclusions were made. Academic papers on motion compensation accuracy seem to have ignored this issue.&lt;br /&gt;
* Timothy has an example code base for a variable partition size blocking-free motion compensation scheme which merges OBMC (overlapped block motion compensation) and CGI (control-grid interpolation) with an interesting prediction/sub-division scheme and whole-frame trellis optimization of motion vectors. (daala-exp)&lt;br /&gt;
&lt;br /&gt;
==Basic features==&lt;br /&gt;
* YUV 4:4:4, 4:2:2, and 4:2:0 subsamplings; 8-bit and 10-bit depth.&lt;br /&gt;
* Alpha channel — need testing material!&lt;br /&gt;
* 8-bit RGB compatible mode? (e.g. YCoCg, internally or at least flagging for it)&lt;br /&gt;
* Efficient 3D? — need testing material!&lt;br /&gt;
* Lossless?&lt;br /&gt;
** The value of this is disputable. If nothing else it's arguable that stuffing lossless into a lossy format may be the only way to get lossless into many people's hands. Also, see below&lt;br /&gt;
* Good support for decode side droppable frames?&lt;br /&gt;
** Hopefully the referencing structure will be flexible enough to enable this even if it's not an intentional feature.&lt;br /&gt;
&lt;br /&gt;
==Frills==&lt;br /&gt;
* Optionally storing a checksum of the expected decoded frame for decoder/encoder mismatch detection.&lt;br /&gt;
* Expose the number of referential descendants of a given frame (or even the whole reference DAG) for most efficient allocation of FEC.&lt;br /&gt;
&lt;br /&gt;
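The checksum frill above can be sketched in a few lines. This is an illustration only: the choice of CRC-32, the plane ordering, and the names frame_checksum and frame_matches are assumptions, not part of any proposed bitstream.&lt;br /&gt;

```python
import zlib


def frame_checksum(planes):
    """Fold every plane of a decoded frame into one running CRC-32."""
    crc = 0
    for plane in planes:
        crc = zlib.crc32(bytes(plane), crc)
    return crc


def frame_matches(planes, expected):
    """Decoder-side check against the checksum the encoder stored."""
    return frame_checksum(planes) == expected


# Encoder side: checksum of its own reconstruction of the frame.
encoder_recon = [bytes([16, 17, 18, 19]), bytes([128, 128]), bytes([128, 128])]
stored = frame_checksum(encoder_recon)

# Decoder side: a single differing sample is enough to flag a mismatch.
decoder_output = [bytes([16, 17, 18, 20]), bytes([128, 128]), bytes([128, 128])]
print(frame_matches(encoder_recon, stored), frame_matches(decoder_output, stored))
```

A mismatch tells the decoder that its reconstruction has drifted from the encoder's, without saying where or why.&lt;br /&gt;
&lt;br /&gt;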
==Wingdings==&lt;br /&gt;
Crazy crap that might be interesting or at least fun to make fun of... &lt;br /&gt;
* &amp;gt;10bit?&lt;br /&gt;
** Use cases don't seem well enough defined yet. Significant complexity. Any prospective hardware developer may hire assassins.&lt;br /&gt;
** Possible compromise: the video reference structure contains a backbone that can be decoded at only N bits of depth (e.g. 10), and higher precisions are only supported outside of this reference chain.&lt;br /&gt;
**# Precision by truncation: decode is performed twice on each frame, identically, at low and high precision.  The only difference between them is the bit-depth of the transform, or possibly of the transform and MC filters.  Only low-precision outputs can be referenced by subsequent frames.  Useful if high-precision content is still worth watching at low precision.&lt;br /&gt;
**# Precision by gamma: decode is performed once at low precision as normal.  Then the output frame is converted to linear-light at high precision, after which another layer of residuals is added.  The second layer can be permitted to reference previous high-precision frames... tricky to use both sets of references though. Useful if high precision is used for storing linear data, but people still want to watch it on &amp;quot;low-end&amp;quot; hardware.&lt;br /&gt;
* Some high end digital cameras are operating jpeg-derivatives in a special mode that keeps the image in the native linear RGB bayer format in order to avoid lossy/slow demosaicing on the camera. In particular this allows white balancing in post without excessive loss. Probably out of scope for Daala itself.&lt;br /&gt;
** Bayer, 4:2:0, 4:2:2, and Interlacing are all special cases of a more general pattern in which the output frames are decimated/subsampled in a regular fashion.  All such subsamplings could be supported by a unified framework in which the video is always stored with all planes fully sampled, with a header indicating the recommended subsampling for display.  In such cases, the encoder can regard the transform as highly overcomplete, and simply ignore unneeded coefficients (presumably by leaving high frequency residuals coded as zero).  This structure would in effect turn the codec into a motion-compensated interpolating/deinterlacing filter.  Whether this approach is sensible presumably depends in part on how the transform is structured.  It would be especially easy if the transform's highest-frequencies were coded by a wavelet-like layer.&lt;br /&gt;
* Lossless intra-ability: The ability to losslessly rewrite any frame as an intra frame (perhaps with significant bitrate overhead) in order to make frame accurate cuts possible.&lt;br /&gt;
** Or best handled by making sure that containers have working pre-roll, but presumably common GOP sizes will be greater than the number of references so even if losslessly reencoding the references is expensive it may be cheaper than pre-roll. Do both?&lt;br /&gt;
** Can be had for 'free' if lossless is supported, plus the right header flags to restuff the references from lossless copies in a packed hidden frame.&lt;br /&gt;
** Use of explicit (rather than staged) super-resolution and/or deeper references may make this functionality unattractive due to increased overhead.&lt;br /&gt;
* Internal overlays which could be swapped without re-encoding? (e.g. advertising, station ID). Could also be automatically generated by a Sufficiently Advanced™ encoder to improve efficiencies for static sprites over moving backgrounds.&lt;br /&gt;
** Complicates making the complexity bounded. No Sufficiently Advanced™ encoder likely to ever exist. But perhaps the station id/advertising uses fully justify this.&lt;br /&gt;
** Could be done externally to the video codec, but if so it's not likely to be useful for anyone ever.&lt;br /&gt;
* A secondary reference implementation in OpenCL, maintained throughout development, to make sure that the codec is GPU-friendly and can be done efficiently using OpenCL primitives.&lt;br /&gt;
* SWAR-friendly arithmetic.  For example, choosing transform coefficients so that no intermediate product overflows 16 bits (tricky for signed values) can sometimes enable (e.g.) 4 parallel operations in one uint64_t.  This can allow a pure C reference implementation to run faster, which is valuable for initial adoption and ports to new platforms.&lt;br /&gt;
* Parametric decode-side blur.&lt;br /&gt;
** Symmetrical blur in regions that are smooth on scales longer than the block size.  Could be signaled or derived from observed DC values.&lt;br /&gt;
** Motion blur so that moving objects are blurred along the motion vector.  May require coding a shutter speed parameter (0..1 as a fraction of the inter-frame interval).&lt;br /&gt;
* Fancy block property prediction.  (Not clear how these predictions interact with intra prediction.)&lt;br /&gt;
** Predict block properties (quantizer, energy, etc.) from MV.  (0,0) probably means small delta.  Larger MVs may correspond to larger deltas, although at low shutter speeds large MVs may correlate with reduced overall HF energy.&lt;br /&gt;
** Predict delta spectral shape from source block spectral shape.  HF/LF ratio of the delta may be correlated with the same ratio in its source blocks.  Works well with decode-side fDCT.&lt;br /&gt;
&lt;br /&gt;
==Negative results==&lt;br /&gt;
&lt;br /&gt;
* Using Kurtosis for detecting text in a frame&lt;br /&gt;
** The idea was to detect a Bernoulli distribution, but the measure is not robust and is too noisy.&lt;/div&gt;</summary>
		<author><name>Rillian</name></author>	</entry>

	<entry>
		<id>http://wiki.xiph.org/MatroskaOpus</id>
		<title>MatroskaOpus</title>
		<link rel="alternate" type="text/html" href="http://wiki.xiph.org/MatroskaOpus"/>
				<updated>2012-09-14T22:49:38Z</updated>
		
		<summary type="html">&lt;p&gt;Rillian: /* DRAFT */ empty is the same as not present&lt;/p&gt;
&lt;hr /&gt;
&lt;div&gt;== '''DRAFT''' ==&lt;br /&gt;
&lt;br /&gt;
This is an encapsulation spec for the [[Opus]] codec in [[http://matroska.org/ Matroska]]. There are a number of outstanding functional issues with muxing Opus in Matroska, and until those are resolved, use of this spec is NOT RECOMMENDED.&lt;br /&gt;
&lt;br /&gt;
 - CodecID is A_OPUS&lt;br /&gt;
 - SamplingFrequency is 48000&lt;br /&gt;
 - Channels is number of output PCM channels&lt;br /&gt;
 - CodecPrivate is the 'OpusHead' packet, identical to the Ogg mapping&lt;br /&gt;
&lt;br /&gt;
The 'OpusHead' format is defined by the [[http://tools.ietf.org/html/draft-terriberry-oggopus Ogg Opus]] mapping. In particular it includes pre-skip, gain, and the channel mapping table required for correct surround output.&lt;br /&gt;
&lt;br /&gt;
The second 'OpusTags' header packet from Ogg Opus is not used in the Matroska encapsulation. Matroska has its own system for tag metadata, and this avoids duplicating it and the need for sub-framing to index multiple packets within the CodecPrivate element.&lt;br /&gt;
&lt;br /&gt;
If the CodecPrivate is empty or not present and Channels is 1 or 2, players MAY assume a sane set of defaults, e.g. channel mapping family 0 with no pre-skip or gain. For Channels &amp;gt; 2 the track MUST be rejected, since there's no way to map the encoded substreams to channels.&lt;br /&gt;
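&lt;br /&gt;
A minimal Python sketch of the fallback rule above, not a normative implementation. The function name opus_track_params is hypothetical. The byte offsets assume the OpusHead layout from the Ogg Opus mapping: an 8-byte magic, version, channel count, 16-bit little-endian pre-skip, 32-bit input sample rate, 16-bit signed output gain, then the mapping family byte.&lt;br /&gt;

```python
def opus_track_params(codec_private, channels):
    """Return (pre_skip, gain_q8, mapping_family) for a Matroska Opus track.

    Hypothetical helper: an empty or absent CodecPrivate falls back to
    defaults for mono/stereo and is rejected for anything else.
    """
    if not codec_private:
        # Defaults are only meaningful when the channel layout is implied.
        if channels not in (1, 2):
            raise ValueError("no channel mapping available; reject track")
        return (0, 0, 0)

    # The fixed part of OpusHead is 19 bytes long (assumed layout).
    head = bytes(codec_private[:19])
    if len(head) != 19 or head[:8] != b"OpusHead":
        raise ValueError("CodecPrivate is not an OpusHead packet")

    pre_skip = int.from_bytes(head[10:12], "little")
    gain_q8 = int.from_bytes(head[16:18], "little", signed=True)
    mapping_family = head[18]
    return (pre_skip, gain_q8, mapping_family)
```

A player could call this once per track and refuse to open the track when it raises.&lt;br /&gt;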
&lt;br /&gt;
== Open Questions ==&lt;br /&gt;
&lt;br /&gt;
Seeking in Opus files requires a pre-roll (recommended to be at least 80 ms). However, currently Matroska requires its index entries to point directly to the data whose timestamp matches the corresponding seek point, not some place arbitrarily before that timestamp. These two requirements are incompatible, and mean that seeking in Opus will be broken in all existing Matroska software. In particularly unlucky cases (e.g., around a transient), playing back audio decoded without any pre-roll can produce extremely loud (possibly equipment-damaging) results. We need a new element to signal this, e.g. Track::TrackEntry::PreRoll.&lt;br /&gt;
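&lt;br /&gt;
To make the pre-roll requirement concrete, here is a small Python sketch of what a player would have to do on seek if the container could point it at earlier data. The names seek_plan and PRE_ROLL_SAMPLES are illustrative assumptions; positions are counted in 48 kHz samples.&lt;br /&gt;

```python
# 80 ms of pre-roll expressed in samples at the 48 kHz Opus rate.
PRE_ROLL_SAMPLES = 48000 * 80 // 1000

def seek_plan(target_sample, pre_roll=PRE_ROLL_SAMPLES):
    """Return (decode_start, discard) for a seek to target_sample.

    Decoding starts pre_roll samples early (clamped at the start of the
    stream) and the decoded samples before the target are thrown away.
    """
    decode_start = max(0, target_sample - pre_roll)
    discard = target_sample - decode_start
    return decode_start, discard
```

A seek to the one-second mark would begin decoding at sample 44160 and discard the first 3840 decoded samples. An index entry that points exactly at the target timestamp gives the player no way to find that earlier position.&lt;br /&gt;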
&lt;br /&gt;
Should we say muxers MAY or SHOULD NOT produce simple streams without filling in CodecPrivate?&lt;br /&gt;
&lt;br /&gt;
How does the OpusHead pre-skip field interact with the timestamps? The SimpleBlock timestamp is signed 16 bits, so the format can signal about half of the pre-skip if playback timestamps are to start at zero. Moritz suggests this won't work because the resolution of the timestamps is controlled by the muxer, so the SimpleBlock timestamp offset isn't sample accurate anyway.[[http://lists.matroska.org/pipermail/matroska-devel/2012-September/004254.html ref]]&lt;br /&gt;
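&lt;br /&gt;
The resolution problem can be checked with a little arithmetic. The Python sketch below assumes Matroska's default TimecodeScale of 1,000,000 ns per tick; the helper names are hypothetical. It converts a pre-skip in samples to ticks and reports whether a whole-tick offset within the signed 16-bit range could represent it exactly.&lt;br /&gt;

```python
from fractions import Fraction

OPUS_RATE = 48000  # Opus timestamps are counted at 48 kHz

def pre_skip_as_ticks(pre_skip_samples, timecode_scale_ns=1_000_000):
    """Return the pre-skip duration as an exact (possibly fractional) tick count."""
    duration_ns = Fraction(pre_skip_samples * 1_000_000_000, OPUS_RATE)
    return duration_ns / timecode_scale_ns

def is_exact_block_offset(ticks):
    """True if ticks is a whole number no larger than 32768 (the signed 16-bit limit)."""
    return ticks.denominator == 1 and ticks.numerator in range(0, 32769)
```

With the assumed default scale, a pre-skip of 312 samples is 6.5 ticks and cannot be signaled exactly, while 480 samples is exactly 10 ticks.&lt;br /&gt;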
&lt;br /&gt;
One could set an incorrect timestamp on the skipped blocks, and rely on the decoder to drop them based on the OpusHead preskip value. As long as the initial blocks are timestamped &amp;lt;= start of output this shouldn't affect seeking.&lt;br /&gt;
&lt;br /&gt;
How important is it that timestamps start at zero in a Matroska file?&lt;br /&gt;
&lt;br /&gt;
The SimpleBlock structure also has an 'invisible' bit, which tells the player to decode, but not display, the contained frames. This lets the muxer signal the pre-skip semantics with frame accuracy, but not sample accuracy. If players implement this it will at least help with sync. Libav does not appear to support the invisible bit.&lt;br /&gt;
&lt;br /&gt;
How can sample-accurate end-time trimming work in Matroska? Currently all software encapsulating Vorbis in Matroska is broken in this regard, and muxing a Vorbis file in Matroska causes it to get longer (i.e., produce more audio output than the original Ogg file). It would be unfortunate to repeat this disaster for Opus. This needs a new element specifying the number of samples to trim, perhaps a new BlockGroup child.&lt;br /&gt;
&lt;br /&gt;
If new elements are required, can they be defined so as to enable correct seeking in rolling intra (a.k.a. intra refresh) video as well?&lt;/div&gt;</summary>
		<author><name>Rillian</name></author>	</entry>

	<entry>
		<id>http://wiki.xiph.org/MatroskaOpus</id>
		<title>MatroskaOpus</title>
		<link rel="alternate" type="text/html" href="http://wiki.xiph.org/MatroskaOpus"/>
				<updated>2012-09-14T21:45:47Z</updated>
		
		<summary type="html">&lt;p&gt;Rillian: /* Open Questions */ updates from today's discussion with Moritz&lt;/p&gt;
&lt;hr /&gt;
&lt;div&gt;== '''DRAFT''' ==&lt;br /&gt;
&lt;br /&gt;
This is an encapsulation spec for the [[Opus]] codec in [[http://matroska.org/ Matroska]]. There are a number of outstanding functional issues with muxing Opus in Matroska, and until those are resolved, use of this spec is NOT RECOMMENDED.&lt;br /&gt;
&lt;br /&gt;
 - CodecID is A_OPUS&lt;br /&gt;
 - SamplingFrequency is 48000&lt;br /&gt;
 - Channels is number of output PCM channels&lt;br /&gt;
 - CodecPrivate is the 'OpusHead' packet, identical to the Ogg mapping&lt;br /&gt;
&lt;br /&gt;
The 'OpusHead' format is defined by the [[http://tools.ietf.org/html/draft-terriberry-oggopus Ogg Opus]] mapping. In particular it includes pre-skip, gain, and the channel mapping table required for correct surround output.&lt;br /&gt;
&lt;br /&gt;
The second 'OpusTags' header packet from Ogg Opus is not used in the Matroska encapsulation. Matroska has its own system for tag metadata, and this avoids duplicating it and the need for sub-framing to index multiple packets within the CodecPrivate element.&lt;br /&gt;
&lt;br /&gt;
If the CodecPrivate is empty and Channels is 1 or 2, players MAY assume a sane set of defaults, e.g. channel mapping family 0 with no pre-skip or gain. For Channels &amp;gt; 2 the track MUST be rejected, since there's no way to map the encoded substreams to channels.&lt;br /&gt;
&lt;br /&gt;
== Open Questions ==&lt;br /&gt;
&lt;br /&gt;
Seeking in Opus files requires a pre-roll (recommended to be at least 80 ms). However, currently Matroska requires its index entries to point directly to the data whose timestamp matches the corresponding seek point, not some place arbitrarily before that timestamp. These two requirements are incompatible, and mean that seeking in Opus will be broken in all existing Matroska software. In particularly unlucky cases (e.g., around a transient), playing back audio decoded without any pre-roll can produce extremely loud (possibly equipment-damaging) results. We need a new element to signal this, e.g. Track::TrackEntry::PreRoll.&lt;br /&gt;
&lt;br /&gt;
Should we say muxers MAY or SHOULD NOT produce simple streams without filling in CodecPrivate?&lt;br /&gt;
&lt;br /&gt;
How does the OpusHead pre-skip field interact with the timestamps? The SimpleBlock timestamp is signed 16 bits, so the format can signal about half of the pre-skip if playback timestamps are to start at zero. Moritz suggests this won't work because the resolution of the timestamps is controlled by the muxer, so the SimpleBlock timestamp offset isn't sample accurate anyway.[[http://lists.matroska.org/pipermail/matroska-devel/2012-September/004254.html ref]]&lt;br /&gt;
&lt;br /&gt;
One could set an incorrect timestamp on the skipped blocks, and rely on the decoder to drop them based on the OpusHead preskip value. As long as the initial blocks are timestamped &amp;lt;= start of output this shouldn't affect seeking.&lt;br /&gt;
&lt;br /&gt;
How important is it that timestamps start at zero in a Matroska file?&lt;br /&gt;
&lt;br /&gt;
The SimpleBlock structure also has an 'invisible' bit, which tells the player to decode, but not display, the contained frames. This lets the muxer signal the pre-skip semantics with frame accuracy, but not sample accuracy. If players implement this it will at least help with sync. Libav does not appear to support the invisible bit.&lt;br /&gt;
&lt;br /&gt;
How can sample-accurate end-time trimming work in Matroska? Currently all software encapsulating Vorbis in Matroska is broken in this regard, and muxing a Vorbis file in Matroska causes it to get longer (i.e., produce more audio output than the original Ogg file). It would be unfortunate to repeat this disaster for Opus. This needs a new element specifying the number of samples to trim, perhaps a new BlockGroup child.&lt;/div&gt;</summary>
		<author><name>Rillian</name></author>	</entry>

	<entry>
		<id>http://wiki.xiph.org/MatroskaOpus</id>
		<title>MatroskaOpus</title>
		<link rel="alternate" type="text/html" href="http://wiki.xiph.org/MatroskaOpus"/>
				<updated>2012-09-14T17:20:29Z</updated>
		
		<summary type="html">&lt;p&gt;Rillian: /* Open Questions */ remove duplicate paragraph&lt;/p&gt;
&lt;hr /&gt;
&lt;div&gt;== '''DRAFT''' ==&lt;br /&gt;
&lt;br /&gt;
This is an encapsulation spec for the [[Opus]] codec in [[http://matroska.org/ Matroska]]. There are a number of outstanding functional issues with muxing Opus in Matroska, and until those are resolved, use of this spec is NOT RECOMMENDED.&lt;br /&gt;
&lt;br /&gt;
 - CodecID is A_OPUS&lt;br /&gt;
 - SamplingFrequency is 48000&lt;br /&gt;
 - Channels is number of output PCM channels&lt;br /&gt;
 - CodecPrivate is the 'OpusHead' packet, identical to the Ogg mapping&lt;br /&gt;
&lt;br /&gt;
The 'OpusHead' format is defined by the [[http://tools.ietf.org/html/draft-terriberry-oggopus Ogg Opus]] mapping. In particular it includes pre-skip, gain, and the channel mapping table required for correct surround output.&lt;br /&gt;
&lt;br /&gt;
The second 'OpusTags' header packet from Ogg Opus is not used in the Matroska encapsulation. Matroska has its own system for tag metadata, and this avoids duplicating it and the need for sub-framing to index multiple packets within the CodecPrivate element.&lt;br /&gt;
&lt;br /&gt;
If the CodecPrivate is empty and Channels is 1 or 2, players MAY assume a sane set of defaults, e.g. channel mapping family 0 with no pre-skip or gain. For Channels &amp;gt; 2 the track MUST be rejected, since there's no way to map the encoded substreams to channels.&lt;br /&gt;
&lt;br /&gt;
== Open Questions ==&lt;br /&gt;
&lt;br /&gt;
Seeking in Opus files requires a pre-roll (recommended to be at least 80 ms). However, currently Matroska requires its index entries to point directly to the data whose timestamp matches the corresponding seek point, not some place arbitrarily before that timestamp. These two requirements are incompatible, and mean that seeking in Opus will be broken in all existing Matroska software. In particularly unlucky cases (e.g., around a transient), playing back audio decoded without any pre-roll can produce extremely loud (possibly equipment-damaging) results. We need a new element to signal this, e.g. Track::TrackEntry::PreRoll.&lt;br /&gt;
&lt;br /&gt;
Should we say muxers MAY or SHOULD NOT produce simple streams without filling in CodecPrivate?&lt;br /&gt;
&lt;br /&gt;
How does the OpusHead pre-skip field interact with the timestamps? The SimpleBlock timestamp is signed 16 bits, so the format can signal about half of the pre-skip if playback timestamps are to start at zero.&lt;br /&gt;
&lt;br /&gt;
One could set an incorrect timestamp on the skipped blocks, and rely on the decoder to drop them based on the OpusHead preskip value. As long as the initial blocks are timestamped &amp;lt;= start of output this shouldn't affect seeking.&lt;br /&gt;
&lt;br /&gt;
How important is it that timestamps start at zero in a Matroska file?&lt;br /&gt;
&lt;br /&gt;
The SimpleBlock structure also has an 'invisible' bit, which tells the player to decode, but not display, the contained frames. This lets the muxer signal the pre-skip semantics with frame accuracy, but not sample accuracy. If players implement this it will at least help with sync. Libav does not appear to support the invisible bit.&lt;br /&gt;
&lt;br /&gt;
How can sample-accurate end-time trimming work in Matroska? Currently all software encapsulating Vorbis in Matroska is broken in this regard, and muxing a Vorbis file in Matroska causes it to get longer (i.e., produce more audio output than the original Ogg file). It would be unfortunate to repeat this disaster for Opus.&lt;/div&gt;</summary>
		<author><name>Rillian</name></author>	</entry>

	<entry>
		<id>http://wiki.xiph.org/MatroskaOpus</id>
		<title>MatroskaOpus</title>
		<link rel="alternate" type="text/html" href="http://wiki.xiph.org/MatroskaOpus"/>
				<updated>2012-09-14T17:19:19Z</updated>
		
		<summary type="html">&lt;p&gt;Rillian: /* Open Questions */ typo&lt;/p&gt;
&lt;hr /&gt;
&lt;div&gt;== '''DRAFT''' ==&lt;br /&gt;
&lt;br /&gt;
This is an encapsulation spec for the [[Opus]] codec in [[http://matroska.org/ Matroska]]. There are a number of outstanding functional issues with muxing Opus in Matroska, and until those are resolved, use of this spec is NOT RECOMMENDED.&lt;br /&gt;
&lt;br /&gt;
 - CodecID is A_OPUS&lt;br /&gt;
 - SamplingFrequency is 48000&lt;br /&gt;
 - Channels is number of output PCM channels&lt;br /&gt;
 - CodecPrivate is the 'OpusHead' packet, identical to the Ogg mapping&lt;br /&gt;
&lt;br /&gt;
The 'OpusHead' format is defined by the [[http://tools.ietf.org/html/draft-terriberry-oggopus Ogg Opus]] mapping. In particular it includes pre-skip, gain, and the channel mapping table required for correct surround output.&lt;br /&gt;
&lt;br /&gt;
The second 'OpusTags' header packet from Ogg Opus is not used in the Matroska encapsulation. Matroska has its own system for tag metadata, and this avoids duplicating it and the need for sub-framing to index multiple packets within the CodecPrivate element.&lt;br /&gt;
&lt;br /&gt;
If the CodecPrivate is empty and Channels is 1 or 2, players MAY assume a sane set of defaults, e.g. channel mapping family 0 with no pre-skip or gain. For Channels &amp;gt; 2 the track MUST be rejected, since there's no way to map the encoded substreams to channels.&lt;br /&gt;
&lt;br /&gt;
== Open Questions ==&lt;br /&gt;
&lt;br /&gt;
Seeking in Opus files requires a pre-roll (recommended to be at least 80 ms). However, currently Matroska requires its index entries to point directly to the data whose timestamp matches the corresponding seek point, not some place arbitrarily before that timestamp. These two requirements are incompatible, and mean that seeking in Opus will be broken in all existing Matroska software. In particularly unlucky cases (e.g., around a transient), playing back audio decoded without any pre-roll can produce extremely loud (possibly equipment-damaging) results. We need a new element to signal this, e.g. Track::TrackEntry::PreRoll.&lt;br /&gt;
&lt;br /&gt;
Should we say muxers MAY or SHOULD NOT produce simple streams without filling in CodecPrivate?&lt;br /&gt;
&lt;br /&gt;
How does the OpusHead pre-skip field interact with the timestamps? The SimpleBlock timestamp is signed 16 bits, so the format can signal about half of the pre-skip if playback timestamps are to start at zero.&lt;br /&gt;
&lt;br /&gt;
One could set an incorrect timestamp on the skipped blocks, and rely on the decoder to drop them based on the OpusHead preskip value. As long as the initial blocks are timestamped &amp;lt;= start of output this shouldn't affect seeking.&lt;br /&gt;
&lt;br /&gt;
How important is it that timestamps start at zero in a Matroska file?&lt;br /&gt;
&lt;br /&gt;
The SimpleBlock structure also has an 'invisible' bit, which tells the player to decode, but not display, the contained frames. This lets the muxer signal the pre-skip semantics with frame accuracy, but not sample accuracy. If players implement this it will at least help with sync. Libav does not appear to support the invisible bit.&lt;br /&gt;
&lt;br /&gt;
How can sample-accurate end-time trimming work in Matroska? Currently all software encapsulating Vorbis in Matroska is broken in this regard, and muxing a Vorbis file in Matroska causes it to get longer (i.e., produce more audio output than the original Ogg file). It would be unfortunate to repeat this disaster for Opus.&lt;/div&gt;</summary>
		<author><name>Rillian</name></author>	</entry>

	<entry>
		<id>http://wiki.xiph.org/MatroskaOpus</id>
		<title>MatroskaOpus</title>
		<link rel="alternate" type="text/html" href="http://wiki.xiph.org/MatroskaOpus"/>
				<updated>2012-09-14T17:16:44Z</updated>
		
		<summary type="html">&lt;p&gt;Rillian: /* DRAFT */ Link to the OggOpus draft&lt;/p&gt;
&lt;hr /&gt;
&lt;div&gt;== '''DRAFT''' ==&lt;br /&gt;
&lt;br /&gt;
This is an encapsulation spec for the [[Opus]] codec in [[http://matroska.org/ Matroska]]. There are a number of outstanding functional issues with muxing Opus in Matroska, and until those are resolved, use of this spec is NOT RECOMMENDED.&lt;br /&gt;
&lt;br /&gt;
 - CodecID is A_OPUS&lt;br /&gt;
 - SamplingFrequency is 48000&lt;br /&gt;
 - Channels is number of output PCM channels&lt;br /&gt;
 - CodecPrivate is the 'OpusHead' packet, identical to the Ogg mapping&lt;br /&gt;
&lt;br /&gt;
The 'OpusHead' format is defined by the [[http://tools.ietf.org/html/draft-terriberry-oggopus Ogg Opus]] mapping. In particular it includes pre-skip, gain, and the channel mapping table required for correct surround output.&lt;br /&gt;
&lt;br /&gt;
The second 'OpusTags' header packet from Ogg Opus is not used in the Matroska encapsulation. Matroska has its own system for tag metadata, and this avoids duplicating it and the need for sub-framing to index multiple packets within the CodecPrivate element.&lt;br /&gt;
&lt;br /&gt;
If the CodecPrivate is empty and Channels is 1 or 2, players MAY assume a sane set of defaults, e.g. channel mapping family 0 with no pre-skip or gain. For Channels &amp;gt; 2 the track MUST be rejected, since there's no way to map the encoded substreams to channels.&lt;br /&gt;
&lt;br /&gt;
== Open Questions ==&lt;br /&gt;
&lt;br /&gt;
Seeking in Opus files requires a pre-roll (recommended to be at least 80 ms). However, currently Matroska requires its index entries to point directly to the data whose timestamp matches the corresponding seek point, not some place arbitrarily before that timestamp. These two requirements are incompatible, and mean that seeking in Opus will be broken in all existing Matroska software. In particularly unlucky cases (e.g., around a transient), playing back audio decoded without any pre-roll can produce extremely loud (possibly equipment-damaging) results. We need a new element to signal this, e.g. Track::TrackEntry::PreRoll.&lt;br /&gt;
&lt;br /&gt;
Should we say muxers MAY or SHOULD NOT produce simple streams without filling in CodecPrivate?&lt;br /&gt;
&lt;br /&gt;
How does the OpusHead pre-skip field interact with the timestamps? The SimpleBlock timestamp is signed 16 bits, so the format can signal about half of the pre-skip if playback timestamps are to start at zero.&lt;br /&gt;
&lt;br /&gt;
One could set an incorrect timestamp on the skipped blocks, and rely on the decoder to drop them based on the OpusHead preskip value. As long as the initial blocks are timestamped &amp;lt;= start of output this shouldn't affect seeking.&lt;br /&gt;
&lt;br /&gt;
How important is it that timestamps start at zero in a Matroska file?&lt;br /&gt;
&lt;br /&gt;
The SimpleBlock structure also has an 'invisible' bit, which tells the player to decode, but not display, the contained frames. This lets the muxer signal the pre-skip semantics with frame accuracy, but not sample accuracy. If players implement this it will at least help with sync. Libav does not appear to support the invisible bit.&lt;br /&gt;
&lt;br /&gt;
How can sample-accurate end-time trimming work in Matroska? Currently all software encapsulating Vorbis in Matroska is broken in this regard, and muxing a Vorbis file in Matroska causes it to get longer (i.e., produce more audio output than the original Ogg file). It would be unfortunate to repeat this disaster for Opus.&lt;/div&gt;</summary>
		<author><name>Rillian</name></author>	</entry>

	<entry>
		<id>http://wiki.xiph.org/MatroskaOpus</id>
		<title>MatroskaOpus</title>
		<link rel="alternate" type="text/html" href="http://wiki.xiph.org/MatroskaOpus"/>
				<updated>2012-09-14T17:11:24Z</updated>
		
		<summary type="html">&lt;p&gt;Rillian: /* DRAFT */ remove unhelpful quotes&lt;/p&gt;
&lt;hr /&gt;
&lt;div&gt;== '''DRAFT''' ==&lt;br /&gt;
&lt;br /&gt;
This is an encapsulation spec for the [[Opus]] codec in [[http://matroska.org/ Matroska]]. There are a number of outstanding functional issues with muxing Opus in Matroska, and until those are resolved, use of this spec is NOT RECOMMENDED.&lt;br /&gt;
&lt;br /&gt;
 - CodecID is A_OPUS&lt;br /&gt;
 - SamplingFrequency is 48000&lt;br /&gt;
 - Channels is number of output PCM channels&lt;br /&gt;
 - CodecPrivate is the 'OpusHead' packet, identical to the OggOpus mapping&lt;br /&gt;
&lt;br /&gt;
The second 'OpusTags' header packet from OggOpus is not used in the Matroska encapsulation. Matroska has its own system for tag metadata, and this avoids duplicating it and the need for sub-framing to index multiple packets within the CodecPrivate element.&lt;br /&gt;
&lt;br /&gt;
If the CodecPrivate is empty and Channels is 1 or 2, players MAY assume a sane set of defaults, e.g. channel mapping family 0 with no pre-skip or gain. For Channels &amp;gt; 2 the track MUST be rejected, since there's no way to map the encoded substreams to channels.&lt;br /&gt;
&lt;br /&gt;
== Open Questions ==&lt;br /&gt;
&lt;br /&gt;
Seeking in Opus files requires a pre-roll (recommended to be at least 80 ms). However, currently Matroska requires its index entries to point directly to the data whose timestamp matches the corresponding seek point, not some place arbitrarily before that timestamp. These two requirements are incompatible, and mean that seeking in Opus will be broken in all existing Matroska software. In particularly unlucky cases (e.g., around a transient), playing back audio decoded without any pre-roll can produce extremely loud (possibly equipment-damaging) results. We need a new element to signal this, e.g. Track::TrackEntry::PreRoll.&lt;br /&gt;
&lt;br /&gt;
Should we say muxers MAY or SHOULD NOT produce simple streams without filling in CodecPrivate?&lt;br /&gt;
&lt;br /&gt;
How does the OpusHead pre-skip field interact with the timestamps? The SimpleBlock timestamp is signed 16 bits, so the format can signal about half of the pre-skip if playback timestamps are to start at zero.&lt;br /&gt;
&lt;br /&gt;
One could set an incorrect timestamp on the skipped blocks, and rely on the decoder to drop them based on the OpusHead preskip value. As long as the initial blocks are timestamped &amp;lt;= start of output this shouldn't affect seeking.&lt;br /&gt;
&lt;br /&gt;
How important is it that timestamps start at zero in a Matroska file?&lt;br /&gt;
&lt;br /&gt;
The SimpleBlock structure also has an 'invisible' bit, which tells the player to decode, but not display, the contained frames. This lets the muxer signal the pre-skip semantics with frame accuracy, but not sample accuracy. If players implement this it will at least help with sync. Libav does not appear to support the invisible bit.&lt;br /&gt;
&lt;br /&gt;
How can sample-accurate end-time trimming work in Matroska? Currently all software encapsulating Vorbis in Matroska is broken in this regard, and muxing a Vorbis file in Matroska causes it to get longer (i.e., produce more audio output than the original Ogg file). It would be unfortunate to repeat this disaster for Opus.&lt;/div&gt;</summary>
		<author><name>Rillian</name></author>	</entry>

	<entry>
		<id>http://wiki.xiph.org/MatroskaOpus</id>
		<title>MatroskaOpus</title>
		<link rel="alternate" type="text/html" href="http://wiki.xiph.org/MatroskaOpus"/>
				<updated>2012-09-14T17:10:24Z</updated>
		
		<summary type="html">&lt;p&gt;Rillian: /* Questions */ open questions are open&lt;/p&gt;
&lt;hr /&gt;
&lt;div&gt;== '''DRAFT''' ==&lt;br /&gt;
&lt;br /&gt;
This is an encapsulation spec for the [[Opus]] codec in [[http://matroska.org/ Matroska]]. There are a number of outstanding functional issues with muxing Opus in Matroska, and until those are resolved, use of this spec is NOT RECOMMENDED.&lt;br /&gt;
&lt;br /&gt;
 - CodecID is A_OPUS&lt;br /&gt;
 - SamplingFrequency is 48000&lt;br /&gt;
 - Channels is number of output PCM channels&lt;br /&gt;
 - CodecPrivate is the 'OpusHead' packet, identical to the OggOpus mapping&lt;br /&gt;
&lt;br /&gt;
The second 'OpusTags' header packet from OggOpus is not used in the Matroska encapsulation. Matroska has its own system for tag metadata, and this avoids duplicating it and the need for sub-framing to index multiple packets within the CodecPrivate element.&lt;br /&gt;
&lt;br /&gt;
If the CodecPrivate is empty and 'Channels' is 1 or 2, players MAY assume a sane set of defaults, e.g. channel mapping family 0 with no pre-skip or gain. For 'Channels &amp;gt; 2' the track MUST be rejected, since there's no way to map the encoded substreams to channels.&lt;br /&gt;
&lt;br /&gt;
== Open Questions ==&lt;br /&gt;
&lt;br /&gt;
Seeking in Opus files requires a pre-roll (recommended to be at least 80 ms). However, currently Matroska requires its index entries to point directly to the data whose timestamp matches the corresponding seek point, not some place arbitrarily before that timestamp. These two requirements are incompatible, and mean that seeking in Opus will be broken in all existing Matroska software. In particularly unlucky cases (e.g., around a transient), playing back audio decoded without any pre-roll can produce extremely loud (possibly equipment-damaging) results. We need a new element to signal this, e.g. Track::TrackEntry::PreRoll.&lt;br /&gt;
&lt;br /&gt;
Should we say muxers MAY or SHOULD NOT produce simple streams without filling in CodecPrivate?&lt;br /&gt;
&lt;br /&gt;
How does the OpusHead pre-skip field interact with the timestamps? The SimpleBlock timestamp is signed 16 bits, so the format can signal about half of the pre-skip if playback timestamps are to start at zero.&lt;br /&gt;
&lt;br /&gt;
One could set an incorrect timestamp on the skipped blocks, and rely on the decoder to drop them based on the OpusHead preskip value. As long as the initial blocks are timestamped &amp;lt;= start of output this shouldn't affect seeking.&lt;br /&gt;
&lt;br /&gt;
How important is it that timestamps start at zero in a Matroska file?&lt;br /&gt;
&lt;br /&gt;
The SimpleBlock structure also has an 'invisible' bit, which tells the player to decode, but not display, the contained frames. This lets the muxer signal the pre-skip semantics with frame accuracy, but not sample accuracy. If players implement this it will at least help with sync. Libav does not appear to support the invisible bit.&lt;br /&gt;
&lt;br /&gt;
How can sample-accurate end-time trimming work in Matroska? Currently all software encapsulating Vorbis in Matroska is broken in this regard, and muxing a Vorbis file in Matroska causes it to get longer (i.e., produce more audio output than the original Ogg file). It would be unfortunate to repeat this disaster for Opus.&lt;/div&gt;</summary>
		<author><name>Rillian</name></author>	</entry>

	<entry>
		<id>http://wiki.xiph.org/MatroskaOpus</id>
		<title>MatroskaOpus</title>
		<link rel="alternate" type="text/html" href="http://wiki.xiph.org/MatroskaOpus"/>
				<updated>2012-09-14T17:10:04Z</updated>
		
		<summary type="html">&lt;p&gt;Rillian: /* Questions */ update questions with the PreRoll element idea&lt;/p&gt;
&lt;hr /&gt;
&lt;div&gt;== '''DRAFT''' ==&lt;br /&gt;
&lt;br /&gt;
This is an encapsulation spec for the [[Opus]] codec in [[http://matroska.org/ Matroska]]. There are a number of outstanding functional issues with muxing Opus in Matroska, and until those are resolved, use of this spec is NOT RECOMMENDED.&lt;br /&gt;
&lt;br /&gt;
 - CodecID is A_OPUS&lt;br /&gt;
 - SamplingFrequency is 48000&lt;br /&gt;
 - Channels is number of output PCM channels&lt;br /&gt;
 - CodecPrivate is the 'OpusHead' packet, identical to the OggOpus mapping&lt;br /&gt;
&lt;br /&gt;
The second 'OpusTags' header packet from OggOpus is not used in the Matroska encapsulation. Matroska has its own system for tag metadata, and this avoids duplicating it and the need for sub-framing to index multiple packets within the CodecPrivate element.&lt;br /&gt;
&lt;br /&gt;
If the CodecPrivate is empty and 'Channels' is 1 or 2, players MAY assume a sane set of defaults, e.g. channel mapping family 0 with no pre-skip and no gain. For 'Channels &amp;gt; 2' the track MUST be rejected, since without the OpusHead there is no way to map the encoded substreams to channels.&lt;br /&gt;
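&lt;br /&gt;
A minimal sketch of the fallback rule above, in Python. It assumes CodecPrivate carries the 19-byte fixed part of the OggOpus 'OpusHead' layout (magic, version, channel count, pre-skip, input sample rate, output gain, mapping family). The function name and the returned dictionary are hypothetical and do not belong to any Matroska library.&lt;br /&gt;
&lt;br /&gt;
```python
def parse_opus_head(codec_private, channels):
    """Return decoder settings for a track, or raise ValueError to reject it.

    codec_private: bytes of the CodecPrivate element (may be empty).
    channels: value of the track's Channels element.
    """
    if len(codec_private) == 0:
        # Empty CodecPrivate: defaults are only safe for mono and stereo.
        if channels not in (1, 2):
            raise ValueError("no OpusHead: cannot map substreams to channels")
        return {"channels": channels, "pre_skip": 0, "gain": 0, "mapping_family": 0}

    # Only the fixed 19-byte part is read here (assumed layout); the channel
    # mapping table that follows for other mapping families is ignored.
    head = codec_private[:19]
    if len(head) != 19 or head[:8] != b"OpusHead":
        raise ValueError("CodecPrivate is not an OpusHead packet")
    return {
        "channels": head[9],
        "pre_skip": int.from_bytes(head[10:12], "little"),
        "gain": int.from_bytes(head[16:18], "little", signed=True),
        "mapping_family": head[18],
    }
```
&lt;br /&gt;
A player would call this once per track and treat a ValueError as "track not playable".&lt;br /&gt;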
&lt;br /&gt;
== Questions ==&lt;br /&gt;
&lt;br /&gt;
Seeking in Opus files requires a pre-roll (recommended to be at least 80 ms). However, currently Matroska requires its index entries to point directly to the data whose timestamp matches the corresponding seek point, not some place arbitrarily before that timestamp. These two requirements are incompatible, and mean that seeking in Opus will be broken in all existing Matroska software. In particularly unlucky cases (e.g., around a transient), playing back audio decoded without any pre-roll can produce extremely loud (possibly equipment-damaging) results. We need a new element to signal this, e.g. Track::TrackEntry::PreRoll.&lt;br /&gt;
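&lt;br /&gt;
The pre-roll requirement can be sketched as follows: to play from a target position, start decoding at least 80 ms earlier and discard the decoded samples that precede the target. This is only an illustration; the function name is hypothetical, positions are counted in 48 kHz samples, and packet boundaries are ignored (a real demuxer would round the start down to a block boundary).&lt;br /&gt;
&lt;br /&gt;
```python
SAMPLE_RATE = 48000                          # Opus decodes at 48 kHz
PREROLL_SAMPLES = SAMPLE_RATE * 80 // 1000   # 80 ms of pre-roll = 3840 samples


def seek_plan(target_sample, preroll=PREROLL_SAMPLES):
    """Return (decode_from, discard) for a seek to target_sample.

    decode_from: sample position where decoding should start.
    discard: number of decoded samples to drop before output begins.
    """
    decode_from = max(0, target_sample - preroll)
    return decode_from, target_sample - decode_from
```
&lt;br /&gt;
An index entry that points directly at the target position gives the player decode_from equal to target_sample and a discard of zero, which is the broken case described above.&lt;br /&gt;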
&lt;br /&gt;
Should we say muxers MAY or SHOULD NOT produce simple streams without filling in CodecPrivate?&lt;br /&gt;
&lt;br /&gt;
How does the OpusHead pre-skip field interact with the timestamps? The SimpleBlock timestamp is signed 16 bits, so the format can signal about half of the pre-skip if playback timestamps are to start at zero.&lt;br /&gt;
&lt;br /&gt;
One could set an incorrect timestamp on the skipped blocks, and rely on the decoder to drop them based on the OpusHead pre-skip value. As long as the initial blocks are timestamped &amp;lt;= start of output, this shouldn't affect seeking.&lt;br /&gt;
&lt;br /&gt;
How important is it that timestamps start at zero in a Matroska file?&lt;br /&gt;
&lt;br /&gt;
The SimpleBlock structure also has an 'invisible' bit, which tells the player to decode, but not display, the contained frames. This lets the muxer signal the pre-skip semantics with frame accuracy, but not sample accuracy. If players implement this, it will at least help with sync. Libav does not appear to support the invisible bit.&lt;br /&gt;
&lt;br /&gt;
How can sample-accurate end-time trimming work in Matroska? Currently all software encapsulating Vorbis in Matroska is broken in this regard, and muxing a Vorbis file in Matroska causes it to get longer (i.e., produce more audio output than the original Ogg file). It would be unfortunate to repeat this disaster for Opus.&lt;/div&gt;</summary>
		<author><name>Rillian</name></author>	</entry>

	<entry>
		<id>http://wiki.xiph.org/MatroskaOpus</id>
		<title>MatroskaOpus</title>
		<link rel="alternate" type="text/html" href="http://wiki.xiph.org/MatroskaOpus"/>
				<updated>2012-09-14T17:08:09Z</updated>
		
		<summary type="html">&lt;p&gt;Rillian: /* DRAFT */ remove the simpler mapping since we have consensus for the OpusHead proposal&lt;/p&gt;
&lt;hr /&gt;
&lt;div&gt;== '''DRAFT''' ==&lt;br /&gt;
&lt;br /&gt;
This is an encapsulation spec for the [[Opus]] codec in [http://matroska.org/ Matroska]. There are a number of outstanding functional issues with muxing Opus in Matroska, and until those are resolved, use of this spec is NOT RECOMMENDED.&lt;br /&gt;
&lt;br /&gt;
 - CodecID is A_OPUS&lt;br /&gt;
 - SamplingFrequency is 48000&lt;br /&gt;
 - Channels is number of output PCM channels&lt;br /&gt;
 - CodecPrivate is the 'OpusHead' packet, identical to the OggOpus mapping&lt;br /&gt;
&lt;br /&gt;
The second 'OpusTags' header packet from OggOpus is not used in the Matroska encapsulation. Matroska has its own system for tag metadata, and this avoids duplicating it and the need for sub-framing to index multiple packets within the CodecPrivate element.&lt;br /&gt;
&lt;br /&gt;
If the CodecPrivate is empty and 'Channels' is 1 or 2, players MAY assume a sane set of defaults, e.g. channel mapping family 0 with no pre-skip and no gain. For 'Channels &amp;gt; 2' the track MUST be rejected, since without the OpusHead there is no way to map the encoded substreams to channels.&lt;br /&gt;
&lt;br /&gt;
== Questions ==&lt;br /&gt;
&lt;br /&gt;
Should we say muxers MAY or SHOULD NOT produce simple streams without filling in CodecPrivate?&lt;br /&gt;
&lt;br /&gt;
How does the OpusHead pre-skip field interact with the timestamps? The SimpleBlock timestamp is signed 16 bits, so the format can signal about half of the pre-skip if playback timestamps are to start at zero.&lt;br /&gt;
&lt;br /&gt;
One could set an incorrect timestamp on the skipped blocks, and rely on the decoder to drop them based on the OpusHead pre-skip value. As long as the initial blocks are timestamped &amp;lt;= start of output, this shouldn't affect seeking.&lt;br /&gt;
&lt;br /&gt;
How important is it that timestamps start at zero in a Matroska file?&lt;br /&gt;
&lt;br /&gt;
The SimpleBlock structure also has an 'invisible' bit, which tells the player to decode, but not display, the contained frames. This lets the muxer signal the pre-skip semantics with frame accuracy, but not sample accuracy. If players implement this, it will at least help with sync. Libav does not appear to support the invisible bit.&lt;br /&gt;
&lt;br /&gt;
Seeking in Opus files requires a pre-roll (recommended to be at least 80 ms). However, currently Matroska requires its index entries to point directly to the data whose timestamp matches the corresponding seek point, not some place arbitrarily before that timestamp. These two requirements are incompatible, and mean that seeking in Opus will be broken in all existing Matroska software. In particularly unlucky cases (e.g., around a transient), playing back audio decoded without any pre-roll can produce extremely loud (possibly equipment-damaging) results.&lt;br /&gt;
&lt;br /&gt;
How can sample-accurate end-time trimming work in Matroska? Currently all software encapsulating Vorbis in Matroska is broken in this regard, and muxing a Vorbis file in Matroska causes it to get longer (i.e., produce more audio output than the original Ogg file). It would be unfortunate to repeat this disaster for Opus.&lt;/div&gt;</summary>
		<author><name>Rillian</name></author>	</entry>

	<entry>
		<id>http://wiki.xiph.org/Daala</id>
		<title>Daala</title>
		<link rel="alternate" type="text/html" href="http://wiki.xiph.org/Daala"/>
				<updated>2012-09-08T17:29:48Z</updated>
		
		<summary type="html">&lt;p&gt;Rillian: /* Weekly meetings */ no meeting last this week: travel and Opus work&lt;/p&gt;
&lt;hr /&gt;
&lt;div&gt;Daala is the current working name of a next generation video codec— to be renamed once someone insists on something better. So far the best proposed alternative is PatentCake.&lt;br /&gt;
&lt;br /&gt;
For now the purpose of this page is to collect notes about things that have been discussed in informal public IRC discussions about the next-generation initiative. Participants in these discussions have included Timothy Terriberry, Jason Garrett-Glaser, Loren Merritt, Ben Schwartz, Greg Maxwell, and others.&lt;br /&gt;
&lt;br /&gt;
See also: [https://xiph.org/daala/ https://xiph.org/daala/]&lt;br /&gt;
&lt;br /&gt;
== Weekly meetings ==&lt;br /&gt;
&lt;br /&gt;
We've been having weekly progress meetings on mumble.&lt;br /&gt;
&lt;br /&gt;
* 2012 June 4  [https://people.xiph.org/~giles/2012/daala_20120604.txt minutes] (actually a work week)&lt;br /&gt;
* 2012 June 22 [https://people.xiph.org/~giles/2012/daala_20120622.txt minutes]&lt;br /&gt;
* 2012 June 29 [https://people.xiph.org/~giles/2012/daala_20120629.txt minutes] [https://people.xiph.org/~giles/2012/daala_20120629.opus recording]&lt;br /&gt;
* 2012 July 6 [https://people.xiph.org/~giles/2012/daala_20120706.txt minutes]&lt;br /&gt;
* 2012 July 13 [https://people.xiph.org/~giles/2012/daala_20120713.txt minutes]&lt;br /&gt;
* 2012 July 20 [https://people.xiph.org/~giles/2012/daala_20120720.txt minutes]&lt;br /&gt;
* 2012 July 27 [https://people.xiph.org/~giles/2012/daala_20120727.txt minutes]&lt;br /&gt;
* 2012 August 3 [https://people.xiph.org/~giles/2012/daala_20120803.txt minutes] [https://people.xiph.org/~giles/2012/daala_20120803.opus recording]&lt;br /&gt;
* 2012 August 10 [https://people.xiph.org/~giles/2012/daala_20120810.txt minutes]&lt;br /&gt;
* 2012 August 17 - no meeting&lt;br /&gt;
* 2012 August 24 - [https://people.xiph.org/~giles/2012/daala_20120824.txt minutes] [https://people.xiph.org/~giles/2012/daala_20120824.opus recording]&lt;br /&gt;
* 2012 August 31 - no meeting&lt;br /&gt;
* 2012 September 7 - no meeting&lt;br /&gt;
&lt;br /&gt;
= Techniques =&lt;br /&gt;
&lt;br /&gt;
The overall structure discussed so far is a block-based codec using variable-size lapped DCTs, with the lapping done via pre/post filtering with a specially structured (lifting) linear-phase transform along the block edges, combined with overlapped block motion compensation and the expected trimmings. The lapping can be optimized for energy compaction and other useful properties, including invertibility, and yields excellent results with efficient finite-precision math.&lt;br /&gt;
&lt;br /&gt;
Other components which have been discussed include:&lt;br /&gt;
&lt;br /&gt;
==Techniques applicable to all frame types==&lt;br /&gt;
* Multisymbol arithmetic coding &lt;br /&gt;
** Timothy has some trial code showing speed-up proportional to the number of bits coded at once. (ec_test.c)&lt;br /&gt;
* Mode prediction using the previously decoded data, e.g. coding the mode using a probability function derived from trained predictors on the surrounding blocks. &lt;br /&gt;
** This will be terrible for robustness but may significantly reduce signalling overhead, allowing many more modes, and provide continuous adaptation between signalling free and fully signalled modes.&lt;br /&gt;
* Explore Legendre polynomial basis transforms instead of DCT&lt;br /&gt;
** May have better perceptual properties and/or result in 'less compromised' efficient implementations.  &lt;br /&gt;
* Coefficient domain prediction to allow efficient energy preserving quantization.&lt;br /&gt;
* Variable partition size/shape and the use of good predictors appears to remove most of the benefit of directional transforms.&lt;br /&gt;
** Perhaps 45deg is still useful?&lt;br /&gt;
** How does this change with partition sizes? Directional transforms are clearly not that useful with 4x4. &lt;br /&gt;
* Transform-post filtering to allow merging smaller transform blocks (like TF merging in CELT) may allow more flexible partitioning than outright using mixed block sizes.&lt;br /&gt;
* Perturbed quantization mode-signalling has been discussed but mostly laughed at. ;) &lt;br /&gt;
* Special block modes well suited to solid color/cartoon like content— avoiding ringing.&lt;br /&gt;
** Are pixel prediction modes too slow?&lt;br /&gt;
* In general, what Markov random field techniques can be applied with acceptable performance? Any?&lt;br /&gt;
* Designed for parallel encode and decode within each frame&lt;br /&gt;
** Important because&lt;br /&gt;
*** the proposed techniques need a lot more CPU than H.264 and VP8 for both encode and decode&lt;br /&gt;
*** Moore's law for single-threaded throughput is dead.  Future hardware is all multicore/GPU.&lt;br /&gt;
** Implies&lt;br /&gt;
*** Getting the order of application right for the lapping filters.&lt;br /&gt;
*** Mandatory slicing? Maybe some kind of multilevel entropy coding to reduce redundancy between slices while minimizing the single-threaded portion of decode.&lt;br /&gt;
&lt;br /&gt;
* Using PVQ and energy conservation: see http://jmvalin.ca/video/video_pvq_v3.pdf&lt;br /&gt;
&lt;br /&gt;
==Techniques applicable to inter frames==&lt;br /&gt;
* Using x264 as a test-bed Jason and Loren demonstrated 15% rate/distortion improvements from using 10-bit intermediaries and references, estimated as being 1/3rd from quality calculation in the 10-bit space, 1/3rd from the higher precision references, and 1/3rd from higher intermediate precision in calculations (e.g. MC filter processing).&lt;br /&gt;
** Increased reference precision competes for memory with increased number of references. The improvements demonstrated appear to be a greater win than increasing the reference count once there are four references or so.&lt;br /&gt;
* Super-resolution techniques for motion-compensation references have been discussed— in particular it appears that the half-pel location is where intelligent filtering matters the most so staged computation could be effectively used to allow more expensive filtering at that level.&lt;br /&gt;
** Edge-directed interpolation techniques might be effectively applied to increase motion compensation accuracy, but most of the techniques known to be very effective are too slow.&lt;br /&gt;
** Speculation has been offered that a significant part of MC inaccuracy may be due to blending in a physically incorrect (gamma-corrected) space, though no real conclusions were made. Academic papers on motion compensation accuracy seem to have ignored this issue.&lt;br /&gt;
* Timothy has an example code base for a variable partition size blocking-free motion compensation scheme which merges OBMC (overlapped block motion compensation) and CGI (control-grid interpolation) with an interesting prediction/sub-division scheme and whole-frame trellis optimization of motion vectors. (daala-exp)&lt;br /&gt;
&lt;br /&gt;
==Basic features==&lt;br /&gt;
* YUV 4:4:4, 4:2:2, and 4:2:0 subsamplings at 8-bit and 10-bit depth.&lt;br /&gt;
* Alpha channel — need testing material!&lt;br /&gt;
* 8-bit RGB compatible mode? (e.g. YCoCg, internally or at least flagging for it)&lt;br /&gt;
* Efficient 3D? — need testing material!&lt;br /&gt;
* Lossless?&lt;br /&gt;
** The value of this is disputable. If nothing else it's arguable that stuffing lossless into a lossy format may be the only way to get lossless into many people's hands. Also, see below&lt;br /&gt;
* Good support for decode side droppable frames?&lt;br /&gt;
** Hopefully the referencing structure will be flexible enough to enable this even if it's not an intentional feature.&lt;br /&gt;
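&lt;br /&gt;
As an illustration of what an RGB-compatible mode could look like, here is a sketch of the reversible YCoCg-R lifting transform in Python. The notes above only mention YCoCg as an example; picking the lossless "-R" variant is an assumption made here, and the function names are hypothetical. Note that Co and Cg need one more bit than the 8-bit inputs.&lt;br /&gt;
&lt;br /&gt;
```python
def rgb_to_ycocg_r(r, g, b):
    """Forward YCoCg-R: integer lifting steps, exactly invertible."""
    co = r - b
    t = b + co // 2      # floor division acts as an arithmetic shift by one
    cg = g - t
    y = t + cg // 2
    return y, co, cg


def ycocg_r_to_rgb(y, co, cg):
    """Inverse YCoCg-R: undo the lifting steps in reverse order."""
    t = y - cg // 2
    g = cg + t
    b = t - co // 2
    r = b + co
    return r, g, b
```
&lt;br /&gt;
Because each lifting step is undone exactly, ycocg_r_to_rgb(*rgb_to_ycocg_r(r, g, b)) returns the original (r, g, b) for any integer input.&lt;br /&gt;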
&lt;br /&gt;
==Frills==&lt;br /&gt;
* Optionally storing a checksum of the expected decoded frame for decoder/encoder mismatch detection.&lt;br /&gt;
* Expose the number of referential descendants of a given frame (or even the whole reference DAG) for most efficient allocation of FEC.&lt;br /&gt;
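&lt;br /&gt;
The checksum idea above can be sketched in a few lines of Python. The choice of CRC-32, the plane ordering, and both function names are assumptions for illustration only; no checksum algorithm has been settled on in these notes.&lt;br /&gt;
&lt;br /&gt;
```python
import zlib


def frame_checksum(planes):
    """CRC-32 over the decoded planes, passed as bytes objects in a fixed order."""
    crc = 0
    for plane in planes:
        # zlib.crc32 accepts a running value, so planes are chained without copying.
        crc = zlib.crc32(plane, crc)
    return crc


def decoder_matches(planes, expected):
    """True when the decoder output hashes to the checksum stored by the encoder."""
    return frame_checksum(planes) == expected
```
&lt;br /&gt;
The encoder would store frame_checksum() of its own reconstruction, and a decoder could optionally verify it after decoding each frame.&lt;br /&gt;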
&lt;br /&gt;
==Wingdings==&lt;br /&gt;
Crazy crap that might be interesting or at least fun to make fun of... &lt;br /&gt;
* &amp;gt;10bit?&lt;br /&gt;
** Use cases don't seem well enough defined yet. Significant complexity. Any prospective hardware developer may hire assassins.&lt;br /&gt;
** Possible compromise: the video reference structure contains a backbone that can be decoded at only N bits of depth (e.g. 10), and higher precisions are only supported outside of this reference chain.&lt;br /&gt;
**# Precision by truncation: decode is performed twice on each frame, identically, at low and high precision.  The only difference between them is the bit-depth of the transform, or possibly of the transform and MC filters.  Only low-precision outputs can be referenced by subsequent frames.  Useful if high-precision content is still worth watching at low precision.&lt;br /&gt;
**# Precision by gamma: decode is performed once at low precision as normal.  Then the output frame is converted to linear-light at high precision, after which another layer of residuals is added.  The second layer can be permitted to reference previous high-precision frames... tricky to use both sets of references though. Useful if high precision is used for storing linear data, but people still want to watch it on &amp;quot;low-end&amp;quot; hardware.&lt;br /&gt;
* Some high-end digital cameras operate JPEG derivatives in a special mode that keeps the image in the native linear RGB Bayer format in order to avoid lossy/slow demosaicing on the camera. In particular this allows white balancing in post without excessive loss. Probably out of scope for Daala itself.&lt;br /&gt;
** Bayer, 4:2:0, 4:2:2, and Interlacing are all special cases of a more general pattern in which the output frames are decimated/subsampled in a regular fashion.  All such subsamplings could be supported by a unified framework in which the video is always stored with all planes fully sampled, with a header indicating the recommended subsampling for display.  In such cases, the encoder can regard the transform as highly overcomplete, and simply ignore unneeded coefficients (presumably by leaving high frequency residuals coded as zero).  This structure would in effect turn the codec into a motion-compensated interpolating/deinterlacing filter.  Whether this approach is sensible presumably depends in part on how the transform is structured.  It would be especially easy if the transform's highest-frequencies were coded by a wavelet-like layer.&lt;br /&gt;
* Lossless intra-ability: The ability to losslessly rewrite any frame as an intra frame (perhaps with significant bitrate overhead) in order to make frame accurate cuts possible.&lt;br /&gt;
** Or best handled by making sure that containers have working pre-roll, but presumably common GOP sizes will be greater than the number of references so even if losslessly reencoding the references is expensive it may be cheaper than pre-roll. Do both?&lt;br /&gt;
** Can be had for 'free' if lossless is supported, plus the right header flags to restuff the references from lossless copies in a packed hidden frame.&lt;br /&gt;
** Use of explicit (rather than staged) super-resolution and/or deeper references may make this functionality unattractive due to increased overhead.&lt;br /&gt;
* Internal overlays which could be swapped without re-encoding? (e.g. advertising, station ID). Could also be automatically generated by a Sufficiently Advanced™ encoder to improve efficiencies for static sprites over moving backgrounds.&lt;br /&gt;
** Complicates making the complexity bounded. No Sufficiently Advanced™ encoder likely to ever exist. But perhaps the station id/advertising uses fully justify this.&lt;br /&gt;
** Could be done externally to the video codec, but if so it's not likely to be useful for anyone ever.&lt;br /&gt;
* A secondary reference implementation in OpenCL, maintained throughout development, to make sure that the codec is GPU-friendly and can be done efficiently using OpenCL primitives.&lt;br /&gt;
* SWAR-friendly arithmetic.  For example, choosing transform coefficients so that no intermediate product overflows 16 bits (tricky for signed values) can sometimes enable (e.g.) 4 parallel operations in one uint64_t.  This can allow a pure C reference implementation to run faster, which is valuable for initial adoption and ports to new platforms.&lt;br /&gt;
* Parametric decode-side blur.&lt;br /&gt;
** Symmetrical blur in regions that are smooth on scales longer than the block size.  Could be signaled or derived from observed DC values.&lt;br /&gt;
** Motion blur so that moving objects are blurred along the motion vector.  May require coding a shutter speed parameter (0..1 as a fraction of the inter-frame interval).&lt;br /&gt;
* Fancy block property prediction.  (Not clear how these predictions interact with intra prediction.)&lt;br /&gt;
** Predict block properties (quantizer, energy, etc.) from MV.  (0,0) probably means small delta.  Larger MVs may correspond to larger deltas ... although at low shutter speeds large MVs may correlate with reduced overall HF energy.&lt;br /&gt;
** Predict delta spectral shape from source block spectral shape.  HF/LF ratio of the delta may be correlated with the same ratio in its source blocks.  Works well with decode-side fDCT.&lt;br /&gt;
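&lt;br /&gt;
The SWAR item above can be illustrated with a small Python model of a uint64_t holding four unsigned 16-bit lanes. This is a sketch under stated assumptions: the helper names are hypothetical, only unsigned values are handled (signed lanes are the tricky case mentioned above), and the result is only valid when every per-lane product stays below 2**16.&lt;br /&gt;
&lt;br /&gt;
```python
LANE = 2 ** 16   # each lane holds one unsigned 16-bit value
WORD = 2 ** 64   # model a uint64_t with Python integers


def pack4(values):
    """Pack four unsigned 16-bit values into one 64-bit word, lane 0 lowest."""
    return sum(v * LANE ** i for i, v in enumerate(values))


def unpack4(word):
    """Split a 64-bit word back into its four 16-bit lanes."""
    return [word // LANE ** i % LANE for i in range(4)]


def scale4(word, coeff):
    """Multiply all four lanes by coeff using a single multiplication.

    Correct only if no lane product reaches 2**16; otherwise the carry
    spills into the neighbouring lane and corrupts it.
    """
    return word * coeff % WORD
```
&lt;br /&gt;
For example, unpack4(scale4(pack4([100, 200, 300, 400]), 3)) gives [300, 600, 900, 1200], whereas a lane value of 30000 with the same coefficient would overflow into the next lane.&lt;br /&gt;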
&lt;br /&gt;
==Negative results==&lt;br /&gt;
&lt;br /&gt;
* Using Kurtosis for detecting text in a frame&lt;br /&gt;
** The idea was to detect a Bernoulli distribution, but the measure is not robust and is too noisy.&lt;/div&gt;</summary>
		<author><name>Rillian</name></author>	</entry>

	<entry>
		<id>http://wiki.xiph.org/Daala</id>
		<title>Daala</title>
		<link rel="alternate" type="text/html" href="http://wiki.xiph.org/Daala"/>
				<updated>2012-08-31T20:08:18Z</updated>
		
		<summary type="html">&lt;p&gt;Rillian: /* Weekly meetings */  No meeting this week.&lt;/p&gt;
&lt;hr /&gt;
&lt;div&gt;Daala is the current working name of a next generation video codec— to be renamed once someone insists on something better. So far the best proposed alternative is PatentCake.&lt;br /&gt;
&lt;br /&gt;
For now the purpose of this page is to collect notes about things that have been discussed in informal public IRC discussions about the next-generation initiative. Participants in these discussions have included Timothy Terriberry, Jason Garrett-Glaser, Loren Merritt, Ben Schwartz, Greg Maxwell, and others.&lt;br /&gt;
&lt;br /&gt;
See also: [https://xiph.org/daala/ https://xiph.org/daala/]&lt;br /&gt;
&lt;br /&gt;
== Weekly meetings ==&lt;br /&gt;
&lt;br /&gt;
We've been having weekly progress meetings on mumble.&lt;br /&gt;
&lt;br /&gt;
* 2012 June 4  [https://people.xiph.org/~giles/2012/daala_20120604.txt minutes] (actually a work week)&lt;br /&gt;
* 2012 June 22 [https://people.xiph.org/~giles/2012/daala_20120622.txt minutes]&lt;br /&gt;
* 2012 June 29 [https://people.xiph.org/~giles/2012/daala_20120629.txt minutes] [https://people.xiph.org/~giles/2012/daala_20120629.opus recording]&lt;br /&gt;
* 2012 July 6 [https://people.xiph.org/~giles/2012/daala_20120706.txt minutes]&lt;br /&gt;
* 2012 July 13 [https://people.xiph.org/~giles/2012/daala_20120713.txt minutes]&lt;br /&gt;
* 2012 July 20 [https://people.xiph.org/~giles/2012/daala_20120720.txt minutes]&lt;br /&gt;
* 2012 July 27 [https://people.xiph.org/~giles/2012/daala_20120727.txt minutes]&lt;br /&gt;
* 2012 August 3 [https://people.xiph.org/~giles/2012/daala_20120803.txt minutes] [https://people.xiph.org/~giles/2012/daala_20120803.opus recording]&lt;br /&gt;
* 2012 August 10 [https://people.xiph.org/~giles/2012/daala_20120810.txt minutes]&lt;br /&gt;
* 2012 August 17 - no meeting&lt;br /&gt;
* 2012 August 24 - [https://people.xiph.org/~giles/2012/daala_20120824.txt minutes] [https://people.xiph.org/~giles/2012/daala_20120824.opus recording]&lt;br /&gt;
* 2012 August 31 - no meeting&lt;br /&gt;
&lt;br /&gt;
= Techniques =&lt;br /&gt;
&lt;br /&gt;
The overall structure discussed so far is a block-based codec using variable-size lapped DCTs, with the lapping done via pre/post filtering with a specially structured (lifting) linear-phase transform along the block edges, combined with overlapped block motion compensation and the expected trimmings. The lapping can be optimized for energy compaction and other useful properties, including invertibility, and yields excellent results with efficient finite-precision math.&lt;br /&gt;
&lt;br /&gt;
Other components which have been discussed include:&lt;br /&gt;
&lt;br /&gt;
==Techniques applicable to all frame types==&lt;br /&gt;
* Multisymbol arithmetic coding &lt;br /&gt;
** Timothy has some trial code showing speed-up proportional to the number of bits coded at once. (ec_test.c)&lt;br /&gt;
* Mode prediction using the previously decoded data, e.g. coding the mode using a probability function derived from trained predictors on the surrounding blocks. &lt;br /&gt;
** This will be terrible for robustness but may significantly reduce signalling overhead, allowing many more modes, and provide continuous adaptation between signalling free and fully signalled modes.&lt;br /&gt;
* Explore Legendre polynomial basis transforms instead of DCT&lt;br /&gt;
** May have better perceptual properties and/or result in 'less compromised' efficient implementations.  &lt;br /&gt;
* Coefficient domain prediction to allow efficient energy preserving quantization.&lt;br /&gt;
* Variable partition size/shape and the use of good predictors appears to remove most of the benefit of directional transforms.&lt;br /&gt;
** Perhaps 45deg is still useful?&lt;br /&gt;
** How does this change with partition sizes? Directional transforms are clearly not that useful with 4x4. &lt;br /&gt;
* Transform-post filtering to allow merging smaller transform blocks (like TF merging in CELT) may allow more flexible partitioning than outright using mixed block sizes.&lt;br /&gt;
* Perturbed quantization mode-signalling has been discussed but mostly laughed at. ;) &lt;br /&gt;
* Special block modes well suited to solid color/cartoon like content— avoiding ringing.&lt;br /&gt;
** Are pixel prediction modes too slow?&lt;br /&gt;
* In general, what Markov random field techniques can be applied with acceptable performance? Any?&lt;br /&gt;
* Designed for parallel encode and decode within each frame&lt;br /&gt;
** Important because&lt;br /&gt;
*** the proposed techniques need a lot more CPU than H.264 and VP8 for both encode and decode&lt;br /&gt;
*** Moore's law for single-threaded throughput is dead.  Future hardware is all multicore/GPU.&lt;br /&gt;
** Implies&lt;br /&gt;
*** Getting the order of application right for the lapping filters.&lt;br /&gt;
*** Mandatory slicing? Maybe some kind of multilevel entropy coding to reduce redundancy between slices while minimizing the single-threaded portion of decode.&lt;br /&gt;
&lt;br /&gt;
* Using PVQ and energy conservation: see http://jmvalin.ca/video/video_pvq_v3.pdf&lt;br /&gt;
&lt;br /&gt;
==Techniques applicable to inter frames==&lt;br /&gt;
* Using x264 as a test-bed Jason and Loren demonstrated 15% rate/distortion improvements from using 10-bit intermediaries and references, estimated as being 1/3rd from quality calculation in the 10-bit space, 1/3rd from the higher precision references, and 1/3rd from higher intermediate precision in calculations (e.g. MC filter processing).&lt;br /&gt;
** Increased reference precision competes for memory with increased number of references. The improvements demonstrated appear to be a greater win than increasing the reference count once there are four references or so.&lt;br /&gt;
* Super-resolution techniques for motion-compensation references have been discussed— in particular it appears that the half-pel location is where intelligent filtering matters the most so staged computation could be effectively used to allow more expensive filtering at that level.&lt;br /&gt;
** Edge-directed interpolation techniques might be effectively applied to increase motion compensation accuracy, but most of the techniques known to be very effective are too slow.&lt;br /&gt;
** Speculation has been offered that a significant part of MC inaccuracy may be due to blending in a physically incorrect (gamma-corrected) space, though no real conclusions were made. Academic papers on motion compensation accuracy seem to have ignored this issue.&lt;br /&gt;
* Timothy has an example code base for a variable partition size blocking-free motion compensation scheme which merges OBMC (overlapped block motion compensation) and CGI (control-grid interpolation) with an interesting prediction/sub-division scheme and whole-frame trellis optimization of motion vectors. (daala-exp)&lt;br /&gt;
&lt;br /&gt;
==Basic features==&lt;br /&gt;
* YUV 4:4:4, 4:2:2, and 4:2:0 subsamplings at 8-bit and 10-bit depth.&lt;br /&gt;
* Alpha channel — need testing material!&lt;br /&gt;
* 8-bit RGB compatible mode? (e.g. YCoCg, internally or at least flagging for it)&lt;br /&gt;
* Efficient 3D? — need testing material!&lt;br /&gt;
* Lossless?&lt;br /&gt;
** The value of this is disputable. If nothing else it's arguable that stuffing lossless into a lossy format may be the only way to get lossless into many people's hands. Also, see below&lt;br /&gt;
* Good support for decode side droppable frames?&lt;br /&gt;
** Hopefully the referencing structure will be flexible enough to enable this even if it's not an intentional feature.&lt;br /&gt;
&lt;br /&gt;
==Frills==&lt;br /&gt;
* Optionally storing a checksum of the expected decoded frame for decoder/encoder mismatch detection.&lt;br /&gt;
* Expose the number of referential descendants of a given frame (or even the whole reference DAG) for most efficient allocation of FEC.&lt;br /&gt;
&lt;br /&gt;
==Wingdings==&lt;br /&gt;
Crazy crap that might be interesting or at least fun to make fun of... &lt;br /&gt;
* &amp;gt;10bit?&lt;br /&gt;
** Use cases don't seem well enough defined yet. Significant complexity. Any prospective hardware developer may hire assassins.&lt;br /&gt;
** Possible compromise: the video reference structure contains a backbone that can be decoded at only N bits of depth (e.g. 10), and higher precisions are only supported outside of this reference chain.&lt;br /&gt;
**# Precision by truncation: decode is performed twice on each frame, identically, at low and high precision.  The only difference between them is the bit-depth of the transform, or possibly of the transform and MC filters.  Only low-precision outputs can be referenced by subsequent frames.  Useful if high-precision content is still worth watching at low precision.&lt;br /&gt;
**# Precision by gamma: decode is performed once at low precision as normal.  Then the output frame is converted to linear-light at high precision, after which another layer of residuals is added.  The second layer can be permitted to reference previous high-precision frames... tricky to use both sets of references though. Useful if high precision is used for storing linear data, but people still want to watch it on &amp;quot;low-end&amp;quot; hardware.&lt;br /&gt;
* Some high-end digital cameras operate JPEG derivatives in a special mode that keeps the image in the native linear RGB Bayer format in order to avoid lossy/slow demosaicing on the camera. In particular this allows white balancing in post without excessive loss. Probably out of scope for Daala itself.&lt;br /&gt;
** Bayer, 4:2:0, 4:2:2, and Interlacing are all special cases of a more general pattern in which the output frames are decimated/subsampled in a regular fashion.  All such subsamplings could be supported by a unified framework in which the video is always stored with all planes fully sampled, with a header indicating the recommended subsampling for display.  In such cases, the encoder can regard the transform as highly overcomplete, and simply ignore unneeded coefficients (presumably by leaving high frequency residuals coded as zero).  This structure would in effect turn the codec into a motion-compensated interpolating/deinterlacing filter.  Whether this approach is sensible presumably depends in part on how the transform is structured.  It would be especially easy if the transform's highest-frequencies were coded by a wavelet-like layer.&lt;br /&gt;
* Lossless intra-ability: The ability to losslessly rewrite any frame as an intra frame (perhaps with significant bitrate overhead) in order to make frame accurate cuts possible.&lt;br /&gt;
** Or best handled by making sure that containers have working pre-roll, but presumably common GOP sizes will be greater than the number of references so even if losslessly reencoding the references is expensive it may be cheaper than pre-roll. Do both?&lt;br /&gt;
** Can be had for 'free' if lossless is supported, plus the right header flags to restuff the references from lossless copies in a packed hidden frame.&lt;br /&gt;
** Use of explicit (rather than staged) super-resolution and/or deeper references may make this functionality unattractive due to increased overhead.&lt;br /&gt;
* Internal overlays which could be swapped without re-encoding? (e.g. advertising, station ID). Could also be automatically generated by a Sufficiently Advanced™ encoder to improve efficiencies for static sprites over moving backgrounds.&lt;br /&gt;
** Complicates making the complexity bounded. No Sufficiently Advanced™ encoder likely to ever exist. But perhaps the station id/advertising uses fully justify this.&lt;br /&gt;
** Could be done externally to the video codec, but if so it's not likely to be useful for anyone ever.&lt;br /&gt;
* A secondary reference implementation in OpenCL, maintained throughout development, to make sure that the codec is GPU-friendly and can be done efficiently using OpenCL primitives.&lt;br /&gt;
* SWAR-friendly arithmetic.  For example, choosing transform coefficients so that no intermediate product overflows 16 bits (tricky for signed values) can sometimes enable (e.g.) 4 parallel operations in one uint64_t.  This can allow a pure C reference implementation to run faster, which is valuable for initial adoption and ports to new platforms.&lt;br /&gt;
* Parametric decode-side blur.&lt;br /&gt;
** Symmetrical blur in regions that are smooth on scales longer than the block size.  Could be signaled or derived from observed DC values.&lt;br /&gt;
** Motion blur so that moving objects are blurred along the motion vector.  May require coding a shutter speed parameter (0..1 as a fraction of the inter-frame interval).&lt;br /&gt;
* Fancy block property prediction.  (Not clear how these predictions interact with intra pred)&lt;br /&gt;
** Predict block properties (quantizer, energy, etc.) from MV.  (0,0) probably means small delta.  Larger MVs may correspond to larger deltas ... although at low shutter speeds large MVs may correlate with reduced overall HF energy.&lt;br /&gt;
** Predict delta spectral shape from source block spectral shape.  HF/LF ratio of the delta may be correlated with the same ratio in its source blocks.  Works well with decode-side fDCT.&lt;br /&gt;
&lt;br /&gt;
==Negative results==&lt;br /&gt;
&lt;br /&gt;
* Using kurtosis for detecting text in a frame&lt;br /&gt;
** The idea was to detect a Bernoulli distribution, but it's not robust and is too noisy&lt;/div&gt;</summary>
		<author><name>Rillian</name></author>	</entry>

	<entry>
		<id>http://wiki.xiph.org/Daala</id>
		<title>Daala</title>
		<link rel="alternate" type="text/html" href="http://wiki.xiph.org/Daala"/>
				<updated>2012-08-24T21:54:37Z</updated>
		
		<summary type="html">&lt;p&gt;Rillian: /* Weekly meetings */ Recording link for today's meeting.&lt;/p&gt;
&lt;hr /&gt;
&lt;div&gt;Daala is the current working name of a next generation video codec— to be renamed once someone insists on something better. So far the best proposed alternative is PatentCake.&lt;br /&gt;
&lt;br /&gt;
For now, the purpose of this page is to collect notes on things that have been discussed in informal public IRC discussions about the next generation initiative. Participants in these discussions have included Timothy Terriberry, Jason Garrett-Glaser, Loren Merritt, Ben Schwartz, Greg Maxwell, and others. &lt;br /&gt;
&lt;br /&gt;
See also: [https://xiph.org/daala/ https://xiph.org/daala/]&lt;br /&gt;
&lt;br /&gt;
== Weekly meetings ==&lt;br /&gt;
&lt;br /&gt;
We've been having weekly progress meetings on mumble.&lt;br /&gt;
&lt;br /&gt;
* 2012 June 4  [https://people.xiph.org/~giles/2012/daala_20120604.txt minutes] (actually a work week)&lt;br /&gt;
* 2012 June 22 [https://people.xiph.org/~giles/2012/daala_20120622.txt minutes]&lt;br /&gt;
* 2012 June 29 [https://people.xiph.org/~giles/2012/daala_20120629.txt minutes] [https://people.xiph.org/~giles/2012/daala_20120629.opus recording]&lt;br /&gt;
* 2012 July 6 [https://people.xiph.org/~giles/2012/daala_20120706.txt minutes]&lt;br /&gt;
* 2012 July 13 [https://people.xiph.org/~giles/2012/daala_20120713.txt minutes]&lt;br /&gt;
* 2012 July 20 [https://people.xiph.org/~giles/2012/daala_20120720.txt minutes]&lt;br /&gt;
* 2012 July 27 [https://people.xiph.org/~giles/2012/daala_20120727.txt minutes]&lt;br /&gt;
* 2012 August 3 [https://people.xiph.org/~giles/2012/daala_20120803.txt minutes] [https://people.xiph.org/~giles/2012/daala_20120803.opus recording]&lt;br /&gt;
* 2012 August 10 [https://people.xiph.org/~giles/2012/daala_20120810.txt minutes]&lt;br /&gt;
* 2012 August 17 - no meeting&lt;br /&gt;
* 2012 August 24 - [https://people.xiph.org/~giles/2012/daala_20120824.txt minutes] [https://people.xiph.org/~giles/2012/daala_20120824.opus recording]&lt;br /&gt;
&lt;br /&gt;
= Techniques =&lt;br /&gt;
&lt;br /&gt;
The overall structure discussed so far is a variable-size, block-based lapped-DCT codec. Lapping is done via pre/post filtering with a specially structured (lifting) linear-phase transform along the block edges, combined with overlapped block motion compensation and the expected trimmings. The lapping can be optimized for energy compaction and other useful properties, including invertibility, and yields excellent results with efficient finite-precision math.&lt;br /&gt;
&lt;br /&gt;
Other components which have been discussed include:&lt;br /&gt;
&lt;br /&gt;
==Techniques applicable to all frame types==&lt;br /&gt;
* Multisymbol arithmetic coding &lt;br /&gt;
** Timothy has some trial code showing speed-up proportional to the number of bits coded at once. (ec_test.c)&lt;br /&gt;
* Mode prediction using the previously decoded data, e.g. coding the mode using a probability function derived from trained predictors on the surrounding blocks. &lt;br /&gt;
** This will be terrible for robustness but may significantly reduce signalling overhead, allowing many more modes, and provide continuous adaptation between signalling free and fully signalled modes.&lt;br /&gt;
* Explore Legendre polynomial basis transforms instead of DCT&lt;br /&gt;
** May have better perceptual properties and/or result in 'less compromised' efficient implementations.  &lt;br /&gt;
* Coefficient domain prediction to allow efficient energy preserving quantization.&lt;br /&gt;
* Variable partition size/shape and the use of good predictors appear to remove most of the benefit of directional transforms.&lt;br /&gt;
** Perhaps 45deg is still useful?&lt;br /&gt;
** How does this change with partition sizes? Directional transforms are clearly not that useful with 4x4. &lt;br /&gt;
* Transform-post filtering to allow merging smaller transform blocks (like TF merging in CELT) may allow more flexible partitioning than outright using mixed block sizes.&lt;br /&gt;
* Perturbed quantization mode-signalling has been discussed but mostly laughed at. ;) &lt;br /&gt;
* Special block modes well suited to solid color/cartoon like content— avoiding ringing.&lt;br /&gt;
** Are pixel prediction modes too slow?&lt;br /&gt;
* In general— what Markov random field techniques can be applied with acceptable performance? Any?&lt;br /&gt;
* Designed for parallel encode and decode within each frame&lt;br /&gt;
** Important because&lt;br /&gt;
*** the proposed techniques need a lot more CPU than H.264 and VP8 for both encode and decode&lt;br /&gt;
*** Moore's law for single-threaded throughput is dead.  Future hardware is all multicore/GPU.&lt;br /&gt;
** Implies&lt;br /&gt;
*** Getting the order of application right for the lapping filters.&lt;br /&gt;
*** Mandatory slicing? Maybe some kind of multilevel entropy coding to reduce redundancy between slices while minimizing the single-threaded portion of decode.&lt;br /&gt;
&lt;br /&gt;
* Using PVQ and energy conservation: see http://jmvalin.ca/video/video_pvq_v3.pdf&lt;br /&gt;
&lt;br /&gt;
==Techniques applicable to inter frames==&lt;br /&gt;
* Using x264 as a test-bed Jason and Loren demonstrated 15% rate/distortion improvements from using 10-bit intermediaries and references, estimated as being 1/3rd from quality calculation in the 10-bit space, 1/3rd from the higher precision references, and 1/3rd from higher intermediate precision in calculations (e.g. MC filter processing).&lt;br /&gt;
** Increased reference precision competes for memory with increased number of references. The improvements demonstrated appear to be a greater win than increasing the reference count once there are four references or so.&lt;br /&gt;
* Super-resolution techniques for motion-compensation references have been discussed— in particular it appears that the half-pel location is where intelligent filtering matters the most so staged computation could be effectively used to allow more expensive filtering at that level.&lt;br /&gt;
** Edge-directed interpolation techniques might be effectively applied to increase motion compensation accuracy, but most of the techniques known to be very effective are too slow.&lt;br /&gt;
** Speculation has been offered that a significant part of MC inaccuracy may be due to blending in a physically incorrect (gamma-corrected) space, though no real conclusions were made. Academic papers on motion compensation accuracy seem to have ignored this issue.&lt;br /&gt;
* Timothy has an example code base for a variable partition size blocking-free motion compensation scheme which merges OBMC (overlapped block motion compensation) and CGI (control-grid interpolation) with an interesting prediction/sub-division scheme and whole-frame trellis optimization of motion vectors. (daala-exp)&lt;br /&gt;
&lt;br /&gt;
==Basic features==&lt;br /&gt;
* YUV 4:4:4, 4:2:2, 4:2:0 subsamplings; 8-bit and 10-bit.&lt;br /&gt;
* Alpha channel — need testing material!&lt;br /&gt;
* 8-bit RGB compatible mode? (e.g. YCoCg, internally or at least flagging for it)&lt;br /&gt;
* Efficient 3D? — need testing material!&lt;br /&gt;
* Lossless?&lt;br /&gt;
** The value of this is disputable. If nothing else it's arguable that stuffing lossless into a lossy format may be the only way to get lossless into many people's hands. Also, see below&lt;br /&gt;
* Good support for decode side droppable frames?&lt;br /&gt;
** Hopefully the referencing structure will be flexible enough to enable this even if it's not an intentional feature.&lt;br /&gt;
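&lt;br /&gt;
The RGB compatible mode listed above could rest on a reversible integer colour transform. A minimal sketch, assuming the lifting-based YCoCg-R variant (the notes only name YCoCg; the choice of variant and the function names rgb_to_ycocg_r and ycocg_r_to_rgb are illustrative assumptions, not part of the discussion):&lt;br /&gt;

```python
def rgb_to_ycocg_r(r, g, b):
    """Forward YCoCg-R lifting steps on integer samples (assumed variant)."""
    co = r - b
    t = b + (co >> 1)
    cg = g - t
    y = t + (cg >> 1)
    return y, co, cg


def ycocg_r_to_rgb(y, co, cg):
    """Inverse lifting steps; each one undoes a forward step exactly."""
    t = y - (cg >> 1)
    g = cg + t
    b = t - (co >> 1)
    r = b + co
    return r, g, b


# Round trip on one 8-bit pixel.
print(ycocg_r_to_rgb(*rgb_to_ycocg_r(255, 0, 128)))
```

Because every step is an integer lifting step, the round trip is exact. The price is that Co and Cg need one more bit than the input samples.&lt;br /&gt;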
&lt;br /&gt;
==Frills==&lt;br /&gt;
* Optionally storing a checksum of the expected decoded frame for decoder/encoder mismatch detection.&lt;br /&gt;
* Expose the number of referential descendants of a given frame (or even the whole reference DAG) for most efficient allocation of FEC.&lt;br /&gt;
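&lt;br /&gt;
Counting the referential descendants of a frame amounts to a reachability query on the reference DAG. A sketch, assuming a hypothetical refs mapping from each frame number to the frames it references (this data layout and the name count_descendants are assumptions for illustration):&lt;br /&gt;

```python
def count_descendants(refs):
    """Map each frame to the number of frames that depend on it, directly or not."""
    # Invert the edges: for each frame, which frames reference it directly?
    children = {frame: set() for frame in refs}
    for frame, parents in refs.items():
        for parent in parents:
            children.setdefault(parent, set()).add(frame)

    counts = {}
    for frame in children:
        # Depth-first walk over everything reachable from this frame.
        seen = set()
        stack = list(children[frame])
        while stack:
            node = stack.pop()
            if node not in seen:
                seen.add(node)
                stack.extend(children.get(node, ()))
        counts[frame] = len(seen)
    return counts


# Frame 0 is intra; 1 references 0; 2 references 0 and 1; 3 references 2.
example = {0: [], 1: [0], 2: [0, 1], 3: [2]}
print(count_descendants(example))
```

A frame with many descendants damages more of the stream when lost, so it would presumably merit a larger share of the FEC budget.&lt;br /&gt;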
&lt;br /&gt;
==Wingdings==&lt;br /&gt;
Crazy crap that might be interesting or at least fun to make fun of... &lt;br /&gt;
* &amp;gt;10bit?&lt;br /&gt;
** Use cases don't seem well enough defined yet. Significant complexity. Any prospective hardware developer may hire assassins.&lt;br /&gt;
** Possible compromise: the video reference structure contains a backbone that can be decoded at only N bits of depth (e.g. 10), and higher precisions are only supported outside of this reference chain.&lt;br /&gt;
**# Precision by truncation: decode is performed twice on each frame, identically, at low and high precision.  The only difference between them is the bit-depth of the transform, or possibly of the transform and MC filters.  Only low-precision outputs can be referenced by subsequent frames.  Useful if high-precision content is still worth watching at low precision.&lt;br /&gt;
**# Precision by gamma: decode is performed once at low precision as normal.  Then the output frame is converted to linear-light at high precision, after which another layer of residuals is added.  The second layer can be permitted to reference previous high-precision frames... tricky to use both sets of references though. Useful if high precision is used for storing linear data, but people still want to watch it on &amp;quot;low-end&amp;quot; hardware.&lt;br /&gt;
* Some high end digital cameras are operating jpeg-derivatives in a special mode that keeps the image in the native linear RGB bayer format in order to avoid lossy/slow demosaicing on the camera. In particular this allows white balancing in post without excessive loss. Probably out of scope for Daala itself.&lt;br /&gt;
** Bayer, 4:2:0, 4:2:2, and Interlacing are all special cases of a more general pattern in which the output frames are decimated/subsampled in a regular fashion.  All such subsamplings could be supported by a unified framework in which the video is always stored with all planes fully sampled, with a header indicating the recommended subsampling for display.  In such cases, the encoder can regard the transform as highly overcomplete, and simply ignore unneeded coefficients (presumably by leaving high frequency residuals coded as zero).  This structure would in effect turn the codec into a motion-compensated interpolating/deinterlacing filter.  Whether this approach is sensible presumably depends in part on how the transform is structured.  It would be especially easy if the transform's highest-frequencies were coded by a wavelet-like layer.&lt;br /&gt;
* Lossless intra-ability: The ability to losslessly rewrite any frame as an intra frame (perhaps with significant bitrate overhead) in order to make frame accurate cuts possible.&lt;br /&gt;
** Or best handled by making sure that containers have working pre-roll, but presumably common GOP sizes will be greater than the number of references so even if losslessly reencoding the references is expensive it may be cheaper than pre-roll. Do both?&lt;br /&gt;
** Can be had for 'free' if lossless is supported, plus the right header flags to restuff the references from lossless copies in a packed hidden frame.&lt;br /&gt;
** Use of explicit (rather than staged) super-resolution and/or deeper references may make this functionality unattractive due to increased overhead.&lt;br /&gt;
* Internal overlays which could be swapped without re-encoding? (e.g. advertising, station ID). Could also be automatically generated by a Sufficiently Advanced™ encoder to improve efficiencies for static sprites over moving backgrounds.&lt;br /&gt;
** Complicates making the complexity bounded. No Sufficiently Advanced™ encoder likely to ever exist. But perhaps the station id/advertising uses fully justify this.&lt;br /&gt;
** Could be done externally to the video codec, but if so it's not likely to be useful for anyone ever.&lt;br /&gt;
* A secondary reference implementation in OpenCL, maintained throughout development, to make sure that the codec is GPU-friendly and can be done efficiently using OpenCL primitives.&lt;br /&gt;
* SWAR-friendly arithmetic.  For example, choosing transform coefficients so that no intermediate product overflows 16 bits (tricky for signed values) can sometimes enable (e.g.) 4 parallel operations in one uint64_t.  This can allow a pure C reference implementation to run faster, which is valuable for initial adoption and ports to new platforms.&lt;br /&gt;
* Parametric decode-side blur.&lt;br /&gt;
** Symmetrical blur in regions that are smooth on scales longer than the block size.  Could be signaled or derived from observed DC values.&lt;br /&gt;
** Motion blur so that moving objects are blurred along the motion vector.  May require coding a shutter speed parameter (0..1 as a fraction of the inter-frame interval).&lt;br /&gt;
* Fancy block property prediction.  (Not clear how these predictions interact with intra pred)&lt;br /&gt;
** Predict block properties (quantizer, energy, etc.) from MV.  (0,0) probably means small delta.  Larger MVs may correspond to larger deltas ... although at low shutter speeds large MVs may correlate with reduced overall HF energy.&lt;br /&gt;
** Predict delta spectral shape from source block spectral shape.  HF/LF ratio of the delta may be correlated with the same ratio in its source blocks.  Works well with decode-side fDCT.&lt;br /&gt;
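&lt;br /&gt;
The SWAR-friendly arithmetic item above can be illustrated with plain integers. This sketch assumes four unsigned 16-bit lanes packed into one 64-bit word, with inputs small enough that no lane sum overflows; a Python integer stands in for uint64_t, and the helper names pack4 and unpack4 are hypothetical:&lt;br /&gt;

```python
LANE_BITS = 16
LANE_SIZE = 2 ** LANE_BITS  # number of values one lane can hold


def pack4(lanes):
    """Pack four unsigned 16-bit values into one 64-bit integer."""
    word = 0
    for i, value in enumerate(lanes):
        word += value * LANE_SIZE ** i
    return word


def unpack4(word):
    """Split a 64-bit integer back into its four 16-bit lanes."""
    return [(word >> (LANE_BITS * i)) % LANE_SIZE for i in range(4)]


a = pack4([100, 2000, 30000, 7])
b = pack4([23, 4000, 30000, 9])
# One integer addition performs all four lane additions. This is valid
# only because every lane sum fits in 16 bits, so nothing carries over
# into the neighbouring lane.
print(unpack4(a + b))
```

If any lane sum exceeded 16 bits, the carry would corrupt the next lane, which is why the coefficient ranges have to be chosen with this in mind.&lt;br /&gt;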
&lt;br /&gt;
==Negative results==&lt;br /&gt;
&lt;br /&gt;
* Using kurtosis for detecting text in a frame&lt;br /&gt;
** The idea was to detect a Bernoulli distribution, but it's not robust and is too noisy&lt;/div&gt;</summary>
		<author><name>Rillian</name></author>	</entry>

	<entry>
		<id>http://wiki.xiph.org/Daala</id>
		<title>Daala</title>
		<link rel="alternate" type="text/html" href="http://wiki.xiph.org/Daala"/>
				<updated>2012-08-24T20:35:17Z</updated>
		
		<summary type="html">&lt;p&gt;Rillian: /* Weekly meetings */ upload today's meeting minutes&lt;/p&gt;
&lt;hr /&gt;
&lt;div&gt;Daala is the current working name of a next generation video codec— to be renamed once someone insists on something better. So far the best proposed alternative is PatentCake.&lt;br /&gt;
&lt;br /&gt;
For now, the purpose of this page is to collect notes on things that have been discussed in informal public IRC discussions about the next generation initiative. Participants in these discussions have included Timothy Terriberry, Jason Garrett-Glaser, Loren Merritt, Ben Schwartz, Greg Maxwell, and others. &lt;br /&gt;
&lt;br /&gt;
See also: [https://xiph.org/daala/ https://xiph.org/daala/]&lt;br /&gt;
&lt;br /&gt;
== Weekly meetings ==&lt;br /&gt;
&lt;br /&gt;
We've been having weekly progress meetings on mumble.&lt;br /&gt;
&lt;br /&gt;
* 2012 June 4  [https://people.xiph.org/~giles/2012/daala_20120604.txt minutes] (actually a work week)&lt;br /&gt;
* 2012 June 22 [https://people.xiph.org/~giles/2012/daala_20120622.txt minutes]&lt;br /&gt;
* 2012 June 29 [https://people.xiph.org/~giles/2012/daala_20120629.txt minutes] [https://people.xiph.org/~giles/2012/daala_20120629.opus recording]&lt;br /&gt;
* 2012 July 6 [https://people.xiph.org/~giles/2012/daala_20120706.txt minutes]&lt;br /&gt;
* 2012 July 13 [https://people.xiph.org/~giles/2012/daala_20120713.txt minutes]&lt;br /&gt;
* 2012 July 20 [https://people.xiph.org/~giles/2012/daala_20120720.txt minutes]&lt;br /&gt;
* 2012 July 27 [https://people.xiph.org/~giles/2012/daala_20120727.txt minutes]&lt;br /&gt;
* 2012 August 3 [https://people.xiph.org/~giles/2012/daala_20120803.txt minutes] [https://people.xiph.org/~giles/2012/daala_20120803.opus recording]&lt;br /&gt;
* 2012 August 10 [https://people.xiph.org/~giles/2012/daala_20120810.txt minutes]&lt;br /&gt;
* 2012 August 17 - no meeting&lt;br /&gt;
* 2012 August 24 - [https://people.xiph.org/~giles/2012/daala_20120824.txt minutes]&lt;br /&gt;
&lt;br /&gt;
= Techniques =&lt;br /&gt;
&lt;br /&gt;
The overall structure discussed so far is a variable-size, block-based lapped-DCT codec. Lapping is done via pre/post filtering with a specially structured (lifting) linear-phase transform along the block edges, combined with overlapped block motion compensation and the expected trimmings. The lapping can be optimized for energy compaction and other useful properties, including invertibility, and yields excellent results with efficient finite-precision math.&lt;br /&gt;
&lt;br /&gt;
Other components which have been discussed include:&lt;br /&gt;
&lt;br /&gt;
==Techniques applicable to all frame types==&lt;br /&gt;
* Multisymbol arithmetic coding &lt;br /&gt;
** Timothy has some trial code showing speed-up proportional to the number of bits coded at once. (ec_test.c)&lt;br /&gt;
* Mode prediction using the previously decoded data, e.g. coding the mode using a probability function derived from trained predictors on the surrounding blocks. &lt;br /&gt;
** This will be terrible for robustness but may significantly reduce signalling overhead, allowing many more modes, and provide continuous adaptation between signalling free and fully signalled modes.&lt;br /&gt;
* Explore Legendre polynomial basis transforms instead of DCT&lt;br /&gt;
** May have better perceptual properties and/or result in 'less compromised' efficient implementations.  &lt;br /&gt;
* Coefficient domain prediction to allow efficient energy preserving quantization.&lt;br /&gt;
* Variable partition size/shape and the use of good predictors appear to remove most of the benefit of directional transforms.&lt;br /&gt;
** Perhaps 45deg is still useful?&lt;br /&gt;
** How does this change with partition sizes? Directional transforms are clearly not that useful with 4x4. &lt;br /&gt;
* Transform-post filtering to allow merging smaller transform blocks (like TF merging in CELT) may allow more flexible partitioning than outright using mixed block sizes.&lt;br /&gt;
* Perturbed quantization mode-signalling has been discussed but mostly laughed at. ;) &lt;br /&gt;
* Special block modes well suited to solid color/cartoon like content— avoiding ringing.&lt;br /&gt;
** Are pixel prediction modes too slow?&lt;br /&gt;
* In general— what Markov random field techniques can be applied with acceptable performance? Any?&lt;br /&gt;
* Designed for parallel encode and decode within each frame&lt;br /&gt;
** Important because&lt;br /&gt;
*** the proposed techniques need a lot more CPU than H.264 and VP8 for both encode and decode&lt;br /&gt;
*** Moore's law for single-threaded throughput is dead.  Future hardware is all multicore/GPU.&lt;br /&gt;
** Implies&lt;br /&gt;
*** Getting the order of application right for the lapping filters.&lt;br /&gt;
*** Mandatory slicing? Maybe some kind of multilevel entropy coding to reduce redundancy between slices while minimizing the single-threaded portion of decode.&lt;br /&gt;
&lt;br /&gt;
* Using PVQ and energy conservation: see http://jmvalin.ca/video/video_pvq_v3.pdf&lt;br /&gt;
&lt;br /&gt;
==Techniques applicable to inter frames==&lt;br /&gt;
* Using x264 as a test-bed Jason and Loren demonstrated 15% rate/distortion improvements from using 10-bit intermediaries and references, estimated as being 1/3rd from quality calculation in the 10-bit space, 1/3rd from the higher precision references, and 1/3rd from higher intermediate precision in calculations (e.g. MC filter processing).&lt;br /&gt;
** Increased reference precision competes for memory with increased number of references. The improvements demonstrated appear to be a greater win than increasing the reference count once there are four references or so.&lt;br /&gt;
* Super-resolution techniques for motion-compensation references have been discussed— in particular it appears that the half-pel location is where intelligent filtering matters the most so staged computation could be effectively used to allow more expensive filtering at that level.&lt;br /&gt;
** Edge-directed interpolation techniques might be effectively applied to increase motion compensation accuracy, but most of the techniques known to be very effective are too slow.&lt;br /&gt;
** Speculation has been offered that a significant part of MC inaccuracy may be due to blending in a physically incorrect (gamma-corrected) space, though no real conclusions were made. Academic papers on motion compensation accuracy seem to have ignored this issue.&lt;br /&gt;
* Timothy has an example code base for a variable partition size blocking-free motion compensation scheme which merges OBMC (overlapped block motion compensation) and CGI (control-grid interpolation) with an interesting prediction/sub-division scheme and whole-frame trellis optimization of motion vectors. (daala-exp)&lt;br /&gt;
&lt;br /&gt;
==Basic features==&lt;br /&gt;
* YUV 4:4:4, 4:2:2, 4:2:0 subsamplings; 8-bit and 10-bit.&lt;br /&gt;
* Alpha channel — need testing material!&lt;br /&gt;
* 8-bit RGB compatible mode? (e.g. YCoCg, internally or at least flagging for it)&lt;br /&gt;
* Efficient 3D? — need testing material!&lt;br /&gt;
* Lossless?&lt;br /&gt;
** The value of this is disputable. If nothing else it's arguable that stuffing lossless into a lossy format may be the only way to get lossless into many people's hands. Also, see below&lt;br /&gt;
* Good support for decode side droppable frames?&lt;br /&gt;
** Hopefully the referencing structure will be flexible enough to enable this even if it's not an intentional feature.&lt;br /&gt;
&lt;br /&gt;
==Frills==&lt;br /&gt;
* Optionally storing a checksum of the expected decoded frame for decoder/encoder mismatch detection.&lt;br /&gt;
* Expose the number of referential descendants of a given frame (or even the whole reference DAG) for most efficient allocation of FEC.&lt;br /&gt;
&lt;br /&gt;
==Wingdings==&lt;br /&gt;
Crazy crap that might be interesting or at least fun to make fun of... &lt;br /&gt;
* &amp;gt;10bit?&lt;br /&gt;
** Use cases don't seem well enough defined yet. Significant complexity. Any prospective hardware developer may hire assassins.&lt;br /&gt;
** Possible compromise: the video reference structure contains a backbone that can be decoded at only N bits of depth (e.g. 10), and higher precisions are only supported outside of this reference chain.&lt;br /&gt;
**# Precision by truncation: decode is performed twice on each frame, identically, at low and high precision.  The only difference between them is the bit-depth of the transform, or possibly of the transform and MC filters.  Only low-precision outputs can be referenced by subsequent frames.  Useful if high-precision content is still worth watching at low precision.&lt;br /&gt;
**# Precision by gamma: decode is performed once at low precision as normal.  Then the output frame is converted to linear-light at high precision, after which another layer of residuals is added.  The second layer can be permitted to reference previous high-precision frames... tricky to use both sets of references though. Useful if high precision is used for storing linear data, but people still want to watch it on &amp;quot;low-end&amp;quot; hardware.&lt;br /&gt;
* Some high end digital cameras are operating jpeg-derivatives in a special mode that keeps the image in the native linear RGB bayer format in order to avoid lossy/slow demosaicing on the camera. In particular this allows white balancing in post without excessive loss. Probably out of scope for Daala itself.&lt;br /&gt;
** Bayer, 4:2:0, 4:2:2, and Interlacing are all special cases of a more general pattern in which the output frames are decimated/subsampled in a regular fashion.  All such subsamplings could be supported by a unified framework in which the video is always stored with all planes fully sampled, with a header indicating the recommended subsampling for display.  In such cases, the encoder can regard the transform as highly overcomplete, and simply ignore unneeded coefficients (presumably by leaving high frequency residuals coded as zero).  This structure would in effect turn the codec into a motion-compensated interpolating/deinterlacing filter.  Whether this approach is sensible presumably depends in part on how the transform is structured.  It would be especially easy if the transform's highest-frequencies were coded by a wavelet-like layer.&lt;br /&gt;
* Lossless intra-ability: The ability to losslessly rewrite any frame as an intra frame (perhaps with significant bitrate overhead) in order to make frame accurate cuts possible.&lt;br /&gt;
** Or best handled by making sure that containers have working pre-roll, but presumably common GOP sizes will be greater than the number of references so even if losslessly reencoding the references is expensive it may be cheaper than pre-roll. Do both?&lt;br /&gt;
** Can be had for 'free' if lossless is supported, plus the right header flags to restuff the references from lossless copies in a packed hidden frame.&lt;br /&gt;
** Use of explicit (rather than staged) super-resolution and/or deeper references may make this functionality unattractive due to increased overhead.&lt;br /&gt;
* Internal overlays which could be swapped without re-encoding? (e.g. advertising, station ID). Could also be automatically generated by a Sufficiently Advanced™ encoder to improve efficiencies for static sprites over moving backgrounds.&lt;br /&gt;
** Complicates making the complexity bounded. No Sufficiently Advanced™ encoder likely to ever exist. But perhaps the station id/advertising uses fully justify this.&lt;br /&gt;
** Could be done externally to the video codec, but if so it's not likely to be useful for anyone ever.&lt;br /&gt;
* A secondary reference implementation in OpenCL, maintained throughout development, to make sure that the codec is GPU-friendly and can be done efficiently using OpenCL primitives.&lt;br /&gt;
* SWAR-friendly arithmetic.  For example, choosing transform coefficients so that no intermediate product overflows 16 bits (tricky for signed values) can sometimes enable (e.g.) 4 parallel operations in one uint64_t.  This can allow a pure C reference implementation to run faster, which is valuable for initial adoption and ports to new platforms.&lt;br /&gt;
* Parametric decode-side blur.&lt;br /&gt;
** Symmetrical blur in regions that are smooth on scales longer than the block size.  Could be signaled or derived from observed DC values.&lt;br /&gt;
** Motion blur so that moving objects are blurred along the motion vector.  May require coding a shutter speed parameter (0..1 as a fraction of the inter-frame interval).&lt;br /&gt;
* Fancy block property prediction.  (Not clear how these predictions interact with intra pred)&lt;br /&gt;
** Predict block properties (quantizer, energy, etc.) from MV.  (0,0) probably means small delta.  Larger MVs may correspond to larger deltas ... although at low shutter speeds large MVs may correlate with reduced overall HF energy.&lt;br /&gt;
** Predict delta spectral shape from source block spectral shape.  HF/LF ratio of the delta may be correlated with the same ratio in its source blocks.  Works well with decode-side fDCT.&lt;br /&gt;
&lt;br /&gt;
==Negative results==&lt;br /&gt;
&lt;br /&gt;
* Using kurtosis for detecting text in a frame&lt;br /&gt;
** The idea was to detect a Bernoulli distribution, but it's not robust and is too noisy&lt;/div&gt;</summary>
		<author><name>Rillian</name></author>	</entry>

	<entry>
		<id>http://wiki.xiph.org/Daala</id>
		<title>Daala</title>
		<link rel="alternate" type="text/html" href="http://wiki.xiph.org/Daala"/>
				<updated>2012-08-20T21:34:45Z</updated>
		
		<summary type="html">&lt;p&gt;Rillian: /* Weekly meetings */ last week's meeting was skipped&lt;/p&gt;
&lt;hr /&gt;
&lt;div&gt;Daala is the current working name of a next generation video codec— to be renamed once someone insists on something better. So far the best proposed alternative is PatentCake.&lt;br /&gt;
&lt;br /&gt;
For now, the purpose of this page is to collect notes on things that have been discussed in informal public IRC discussions about the next generation initiative. Participants in these discussions have included Timothy Terriberry, Jason Garrett-Glaser, Loren Merritt, Ben Schwartz, Greg Maxwell, and others. &lt;br /&gt;
&lt;br /&gt;
See also: [https://xiph.org/daala/ https://xiph.org/daala/]&lt;br /&gt;
&lt;br /&gt;
== Weekly meetings ==&lt;br /&gt;
&lt;br /&gt;
We've been having weekly progress meetings on mumble.&lt;br /&gt;
&lt;br /&gt;
* 2012 June 4  [https://people.xiph.org/~giles/2012/daala_20120604.txt minutes] (actually a work week)&lt;br /&gt;
* 2012 June 22 [https://people.xiph.org/~giles/2012/daala_20120622.txt minutes]&lt;br /&gt;
* 2012 June 29 [https://people.xiph.org/~giles/2012/daala_20120629.txt minutes] [https://people.xiph.org/~giles/2012/daala_20120629.opus recording]&lt;br /&gt;
* 2012 July 6 [https://people.xiph.org/~giles/2012/daala_20120706.txt minutes]&lt;br /&gt;
* 2012 July 13 [https://people.xiph.org/~giles/2012/daala_20120713.txt minutes]&lt;br /&gt;
* 2012 July 20 [https://people.xiph.org/~giles/2012/daala_20120720.txt minutes]&lt;br /&gt;
* 2012 July 27 [https://people.xiph.org/~giles/2012/daala_20120727.txt minutes]&lt;br /&gt;
* 2012 August 3 [https://people.xiph.org/~giles/2012/daala_20120803.txt minutes] [https://people.xiph.org/~giles/2012/daala_20120803.opus recording]&lt;br /&gt;
* 2012 August 10 [https://people.xiph.org/~giles/2012/daala_20120810.txt minutes]&lt;br /&gt;
* 2012 August 17 - no meeting&lt;br /&gt;
&lt;br /&gt;
= Techniques =&lt;br /&gt;
&lt;br /&gt;
The overall structure discussed so far is a variable-size, block-based lapped-DCT codec. Lapping is done via pre/post filtering with a specially structured (lifting) linear-phase transform along the block edges, combined with overlapped block motion compensation and the expected trimmings. The lapping can be optimized for energy compaction and other useful properties, including invertibility, and yields excellent results with efficient finite-precision math.&lt;br /&gt;
&lt;br /&gt;
Other components which have been discussed include:&lt;br /&gt;
&lt;br /&gt;
==Techniques applicable to all frame types==&lt;br /&gt;
* Multisymbol arithmetic coding &lt;br /&gt;
** Timothy has some trial code showing speed-up proportional to the number of bits coded at once. (ec_test.c)&lt;br /&gt;
* Mode prediction using the previously decoded data, e.g. coding the mode using a probability function derived from trained predictors on the surrounding blocks. &lt;br /&gt;
** This will be terrible for robustness but may significantly reduce signalling overhead, allowing many more modes, and provide continuous adaptation between signalling free and fully signalled modes.&lt;br /&gt;
* Explore Legendre polynomial basis transforms instead of DCT&lt;br /&gt;
** May have better perceptual properties and/or result in 'less compromised' efficient implementations.  &lt;br /&gt;
* Coefficient domain prediction to allow efficient energy preserving quantization.&lt;br /&gt;
* Variable partition size/shape and the use of good predictors appear to remove most of the benefit of directional transforms.&lt;br /&gt;
** Perhaps 45deg is still useful?&lt;br /&gt;
** How does this change with partition sizes? Directional transforms are clearly not that useful with 4x4. &lt;br /&gt;
* Transform-post filtering to allow merging smaller transform blocks (like TF merging in CELT) may allow more flexible partitioning than outright using mixed block sizes.&lt;br /&gt;
* Perturbed quantization mode-signalling has been discussed but mostly laughed at. ;) &lt;br /&gt;
* Special block modes well suited to solid-color/cartoon-like content, avoiding ringing.&lt;br /&gt;
** Are pixel prediction modes too slow?&lt;br /&gt;
* In general, which Markov random field techniques can be applied with acceptable performance? Any?&lt;br /&gt;
* Designed for parallel encode and decode within each frame&lt;br /&gt;
** Important because&lt;br /&gt;
*** the proposed techniques need a lot more CPU than H.264 and VP8 for both encode and decode&lt;br /&gt;
*** Moore's law for single-threaded throughput is dead.  Future hardware is all multicore/GPU.&lt;br /&gt;
** Implies&lt;br /&gt;
*** Getting the order of application right for the lapping filters.&lt;br /&gt;
*** Mandatory slicing? Maybe some kind of multilevel entropy coding to reduce redundancy between slices while minimizing the single-threaded portion of decode.&lt;br /&gt;
&lt;br /&gt;
* Using PVQ and energy conservation: see http://jmvalin.ca/video/video_pvq_v3.pdf&lt;br /&gt;
&lt;br /&gt;
==Techniques applicable to inter frames==&lt;br /&gt;
* Using x264 as a test-bed Jason and Loren demonstrated 15% rate/distortion improvements from using 10-bit intermediaries and references, estimated as being 1/3rd from quality calculation in the 10-bit space, 1/3rd from the higher precision references, and 1/3rd from higher intermediate precision in calculations (e.g. MC filter processing).&lt;br /&gt;
** Increased reference precision competes for memory with increased number of references. The improvements demonstrated appear to be a greater win than increasing the reference count once there are four references or so.&lt;br /&gt;
* Super-resolution techniques for motion-compensation references have been discussed— in particular it appears that the half-pel location is where intelligent filtering matters the most so staged computation could be effectively used to allow more expensive filtering at that level.&lt;br /&gt;
** Edge-directed interpolation techniques might be effectively applied to increase motion compensation accuracy, but most of the techniques known to be very effective are too slow.&lt;br /&gt;
** Speculation has been offered that a significant part of MC inaccuracy may be due to blending in a physically incorrect (gamma-corrected) space, though no real conclusions were made. Academic papers on motion compensation accuracy seem to have ignored this issue.&lt;br /&gt;
* Timothy has an example code base for a variable partition size blocking-free motion compensation scheme which merges OBMC (overlapped block motion compensation) and CGI (control-grid interpolation) with an interesting prediction/sub-division scheme and whole-frame trellis optimization of motion vectors. (daala-exp)&lt;br /&gt;
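&lt;br /&gt;
The speculation above about blending in a gamma-corrected space can be illustrated with a small numeric sketch. It assumes a pure power-law transfer function with exponent 2.2, which is only a stand-in for whatever curve real content uses; the function names are hypothetical and not taken from any Daala code.&lt;br /&gt;

```python
# Assumption: a pure power-law transfer curve with exponent 2.2.
GAMMA = 2.2


def to_linear(v):
    """Convert a gamma-encoded sample in [0, 1] to linear light."""
    return v ** GAMMA


def to_gamma(v):
    """Convert a linear-light sample in [0, 1] back to gamma encoding."""
    return v ** (1.0 / GAMMA)


def blend_gamma(a, b, w=0.5):
    """Blend two gamma-encoded samples directly on the stored values."""
    return (1.0 - w) * a + w * b


def blend_linear(a, b, w=0.5):
    """Blend the same samples in linear light, then re-encode."""
    return to_gamma((1.0 - w) * to_linear(a) + w * to_linear(b))


if __name__ == "__main__":
    black, white = 0.0, 1.0
    # The two results differ noticeably for a high-contrast edge.
    print("blend on stored values:", blend_gamma(black, white))
    print("blend in linear light: ", blend_linear(black, white))
```

For a half-and-half blend of black and white, the two approaches disagree by a large margin under this assumed curve, which is the kind of error the speculation refers to. Whether it matters for real motion compensation is exactly the open question noted above.&lt;br /&gt;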
&lt;br /&gt;
==Basic features==&lt;br /&gt;
* YUV 4:4:4, 4:2:2, 4:2:0 subsamplings; 8-bit, 10-bit.&lt;br /&gt;
* Alpha channel — need testing material!&lt;br /&gt;
* 8-bit RGB compatible mode? (e.g. YCoCg, internally or at least flagging for it)&lt;br /&gt;
* Efficient 3D? — need testing material!&lt;br /&gt;
* Lossless?&lt;br /&gt;
** The value of this is disputable. If nothing else it's arguable that stuffing lossless into a lossy format may be the only way to get lossless into many people's hands. Also, see below&lt;br /&gt;
* Good support for decode-side droppable frames?&lt;br /&gt;
** Hopefully the referencing structure will be flexible enough to enable this even if it's not an intentional feature.&lt;br /&gt;
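&lt;br /&gt;
As a sketch of what an RGB-compatible mode via YCoCg could look like, the following uses the lifting-based reversible variant (commonly called YCoCg-R). This is an illustration under that assumption, not a description of anything decided for Daala; note that for 8-bit RGB input the Co and Cg outputs need one extra bit of range.&lt;br /&gt;

```python
def rgb_to_ycocg_r(r, g, b):
    """Forward reversible YCoCg transform using integer lifting steps."""
    co = r - b
    t = b + co // 2      # floor division matches an arithmetic right shift
    cg = g - t
    y = t + cg // 2
    return y, co, cg


def ycocg_r_to_rgb(y, co, cg):
    """Inverse transform: undo the lifting steps in reverse order."""
    t = y - cg // 2
    g = cg + t
    b = t - co // 2
    r = b + co
    return r, g, b


if __name__ == "__main__":
    pixel = (200, 120, 30)
    coded = rgb_to_ycocg_r(*pixel)
    # The round trip is exact because each lifting step is undone exactly.
    print(pixel, coded, ycocg_r_to_rgb(*coded))
```

Because every step is an integer lifting step, the round trip is bit-exact, which is also what would make such a mode compatible with the lossless idea listed above.&lt;br /&gt;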
&lt;br /&gt;
==Frills==&lt;br /&gt;
* Optionally storing a checksum of the expected decoded frame for decoder/encoder mismatch detection.&lt;br /&gt;
* Expose the number of referential descendants of a given frame (or even the whole reference DAG) for most efficient allocation of FEC.&lt;br /&gt;
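&lt;br /&gt;
A minimal sketch of the descendant count mentioned above, assuming the reference structure is available as a mapping from each frame to the frames it references directly. The function name and the data layout are hypothetical.&lt;br /&gt;

```python
def count_descendants(refs):
    """Count, for every frame, how many frames depend on it.

    refs maps a frame id to the list of frame ids it references directly.
    The result maps each frame id to the number of distinct frames that
    reference it directly or transitively.
    """
    # Invert the graph: for each frame, which frames reference it directly?
    children = {frame: set() for frame in refs}
    for frame, parents in refs.items():
        for parent in parents:
            children.setdefault(parent, set()).add(frame)

    counts = {}
    for frame in children:
        seen = set()
        stack = list(children[frame])
        # Depth-first walk over everything reachable from this frame.
        while stack:
            current = stack.pop()
            if current not in seen:
                seen.add(current)
                stack.extend(children.get(current, ()))
        counts[frame] = len(seen)
    return counts


if __name__ == "__main__":
    # Frame 0 is a keyframe; 1 references 0; 2 references 0 and 1; 3 references 2.
    refs = {0: [], 1: [0], 2: [0, 1], 3: [2]}
    print(count_descendants(refs))
```

A frame with many descendants is one whose loss damages many later frames, so a sender could plausibly give it more FEC than a frame nothing else references.&lt;br /&gt;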
&lt;br /&gt;
==Wingdings==&lt;br /&gt;
Crazy crap that might be interesting or at least fun to make fun of... &lt;br /&gt;
* &amp;gt;10bit?&lt;br /&gt;
** Use cases don't seem well enough defined yet. Significant complexity. Any prospective hardware developer may hire assassins.&lt;br /&gt;
** Possible compromise: the video reference structure contains a backbone that can be decoded at only N bits of depth (e.g. 10), and higher precisions are only supported outside of this reference chain.&lt;br /&gt;
**# Precision by truncation: decode is performed twice on each frame, identically, at low and high precision.  The only difference between them is the bit-depth of the transform, or possibly of the transform and MC filters.  Only low-precision outputs can be referenced by subsequent frames.  Useful if high-precision content is still worth watching at low precision.&lt;br /&gt;
**# Precision by gamma: decode is performed once at low precision as normal.  Then the output frame is converted to linear-light at high precision, after which another layer of residuals is added.  The second layer can be permitted to reference previous high-precision frames... tricky to use both sets of references though. Useful if high precision is used for storing linear data, but people still want to watch it on &amp;quot;low-end&amp;quot; hardware.&lt;br /&gt;
* Some high-end digital cameras operate JPEG derivatives in a special mode that keeps the image in the native linear RGB Bayer format in order to avoid lossy/slow demosaicing on the camera. In particular, this allows white balancing in post without excessive loss. Probably out of scope for Daala itself.&lt;br /&gt;
** Bayer, 4:2:0, 4:2:2, and interlacing are all special cases of a more general pattern in which the output frames are decimated/subsampled in a regular fashion.  All such subsamplings could be supported by a unified framework in which the video is always stored with all planes fully sampled, with a header indicating the recommended subsampling for display.  In such cases, the encoder can regard the transform as highly overcomplete, and simply ignore unneeded coefficients (presumably by leaving high frequency residuals coded as zero).  This structure would in effect turn the codec into a motion-compensated interpolating/deinterlacing filter.  Whether this approach is sensible presumably depends in part on how the transform is structured.  It would be especially easy if the transform's highest frequencies were coded by a wavelet-like layer.&lt;br /&gt;
* Lossless intra-ability: The ability to losslessly rewrite any frame as an intra frame (perhaps with significant bitrate overhead) in order to make frame accurate cuts possible.&lt;br /&gt;
** Or best handled by making sure that containers have working pre-roll, but presumably common GOP sizes will be greater than the number of references so even if losslessly reencoding the references is expensive it may be cheaper than pre-roll. Do both?&lt;br /&gt;
** Can be had for 'free' if lossless is supported, plus the right header flags to restuff the references from lossless copies in a packed hidden frame.&lt;br /&gt;
** Use of explicit (rather than staged) super-resolution and/or deeper references may make this functionality unattractive due to increased overhead.&lt;br /&gt;
* Internal overlays which could be swapped without re-encoding? (e.g. advertising, station ID). Could also be automatically generated by a Sufficiently Advanced™ encoder to improve efficiencies for static sprites over moving backgrounds.&lt;br /&gt;
** Complicates making the complexity bounded. No Sufficiently Advanced™ encoder is likely to ever exist. But perhaps the station ID/advertising uses fully justify this.&lt;br /&gt;
** Could be done externally to the video codec, but if so it's not likely to be useful for anyone ever.&lt;br /&gt;
* A secondary reference implementation in OpenCL, maintained throughout development, to make sure that the codec is GPU-friendly and can be done efficiently using OpenCL primitives.&lt;br /&gt;
* SWAR-friendly arithmetic.  For example, choosing transform coefficients so that no intermediate product overflows 16 bits (tricky for signed values) can sometimes enable (e.g.) 4 parallel operations in one uint64_t.  This can allow a pure C reference implementation to run faster, which is valuable for initial adoption and ports to new platforms.&lt;br /&gt;
* Parametric decode-side blur.&lt;br /&gt;
** Symmetrical blur in regions that are smooth on scales longer than the block size.  Could be signaled or derived from observed DC values.&lt;br /&gt;
** Motion blur so that moving objects are blurred along the motion vector.  May require coding a shutter speed parameter (0..1 as a fraction of the inter-frame interval).&lt;br /&gt;
* Fancy block property prediction.  (Not clear how these predictions interact with intra pred)&lt;br /&gt;
** Predict block properties (quantizer, energy, etc.) from MV.  (0,0) probably means small delta.  Larger MVs may correspond to larger deltas ... although at low shutter speeds large MVs may correlate with reduced overall HF energy.&lt;br /&gt;
** Predict delta spectral shape from source block spectral shape.  HF/LF ratio of the delta may be correlated with the same ratio in its source blocks.  Works well with decode-side fDCT.&lt;br /&gt;
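&lt;br /&gt;
The SWAR-friendly arithmetic item above can be sketched as follows. Python integers are unbounded, so this only models what a C implementation would do with a single uint64_t multiply or add; the lane layout and function names are assumptions made for illustration, and only unsigned lanes are shown (the signed case is the tricky one mentioned above).&lt;br /&gt;

```python
LANE = 2 ** 16   # each lane is 16 bits wide
LANES = 4        # four lanes fill one 64-bit word


def pack(values):
    """Pack four unsigned 16-bit values into one word, lane 0 lowest."""
    assert len(values) == LANES
    word = 0
    for i, v in enumerate(values):
        assert v in range(LANE), "lane value must fit in 16 bits"
        word += v * LANE ** i
    return word


def unpack(word):
    """Split a packed word back into its four 16-bit lanes."""
    lanes = []
    for _ in range(LANES):
        word, lane = divmod(word, LANE)
        lanes.append(lane)
    return lanes


def swar_scale(word, c):
    """Multiply all four lanes by c using one multiplication.

    Only valid if every per-lane product still fits in 16 bits;
    otherwise the carry spills into the neighbouring lane.
    """
    return word * c


if __name__ == "__main__":
    packed = pack([1, 2, 3, 4])
    print(unpack(swar_scale(packed, 1000)))              # products fit: lanes stay independent
    print(unpack(swar_scale(pack([100, 0, 0, 0]), 1000)))  # overflow leaks into lane 1
```

The second call shows why the transform coefficients would have to be chosen so that no intermediate product overflows a lane: once it does, neighbouring lanes are silently corrupted.&lt;br /&gt;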
&lt;br /&gt;
==Negative results==&lt;br /&gt;
&lt;br /&gt;
* Using Kurtosis for detecting text in a frame&lt;br /&gt;
** The idea was to detect a Bernoulli distribution, but the measure is not robust and is too noisy&lt;/div&gt;</summary>
		<author><name>Rillian</name></author>	</entry>

	<entry>
		<id>http://wiki.xiph.org/Daala</id>
		<title>Daala</title>
		<link rel="alternate" type="text/html" href="http://wiki.xiph.org/Daala"/>
				<updated>2012-08-13T17:09:44Z</updated>
		
		<summary type="html">&lt;p&gt;Rillian: /* Weekly meetings */ upload friday's meeting minutes&lt;/p&gt;
&lt;hr /&gt;
&lt;div&gt;Daala is the current working name of a next generation video codec— to be renamed once someone insists on something better. So far the best proposed alternative is PatentCake.&lt;br /&gt;
&lt;br /&gt;
For now, the purpose of this page is to collect notes on things that have been discussed in informal public IRC discussions about the next-generation initiative. Participants in these discussions have included Timothy Terriberry, Jason Garrett-Glaser, Loren Merritt, Ben Schwartz, Greg Maxwell, and others.&lt;br /&gt;
&lt;br /&gt;
See also: [https://xiph.org/daala/ https://xiph.org/daala/]&lt;br /&gt;
&lt;br /&gt;
== Weekly meetings ==&lt;br /&gt;
&lt;br /&gt;
We've been having weekly progress meetings on Mumble.&lt;br /&gt;
&lt;br /&gt;
* 2012 June 4  [https://people.xiph.org/~giles/2012/daala_20120604.txt minutes] (actually a work week)&lt;br /&gt;
* 2012 June 22 [https://people.xiph.org/~giles/2012/daala_20120622.txt minutes]&lt;br /&gt;
* 2012 June 29 [https://people.xiph.org/~giles/2012/daala_20120629.txt minutes] [https://people.xiph.org/~giles/2012/daala_20120629.opus recording]&lt;br /&gt;
* 2012 July 6 [https://people.xiph.org/~giles/2012/daala_20120706.txt minutes]&lt;br /&gt;
* 2012 July 13 [https://people.xiph.org/~giles/2012/daala_20120713.txt minutes]&lt;br /&gt;
* 2012 July 20 [https://people.xiph.org/~giles/2012/daala_20120720.txt minutes]&lt;br /&gt;
* 2012 July 27 [https://people.xiph.org/~giles/2012/daala_20120727.txt minutes]&lt;br /&gt;
* 2012 August 3 [https://people.xiph.org/~giles/2012/daala_20120803.txt minutes] [https://people.xiph.org/~giles/2012/daala_20120803.opus recording]&lt;br /&gt;
* 2012 August 10 [https://people.xiph.org/~giles/2012/daala_20120810.txt minutes]&lt;br /&gt;
&lt;br /&gt;
= Techniques =&lt;br /&gt;
&lt;br /&gt;
The overall structure discussed so far is a variable-size, block-based lapped-DCT codec, with lapping done via pre/post filtering using a specially structured (lifting) linear-phase transform along the block edges, together with overlapped block motion compensation and the expected trimmings. The lapping can be optimized for energy compaction and other useful properties, including invertibility, and yields excellent results with efficient finite-precision math.&lt;br /&gt;
&lt;br /&gt;
Other components which have been discussed include:&lt;br /&gt;
&lt;br /&gt;
==Techniques applicable to all frame types==&lt;br /&gt;
* Multisymbol arithmetic coding &lt;br /&gt;
** Timothy has some trial code showing speed-up proportional to the number of bits coded at once. (ec_test.c)&lt;br /&gt;
* Mode prediction using the previously decoded data, e.g. coding the mode using a probability function derived from trained predictors on the surrounding blocks. &lt;br /&gt;
** This will be terrible for robustness but may significantly reduce signalling overhead, allowing many more modes, and provide continuous adaptation between signalling free and fully signalled modes.&lt;br /&gt;
* Explore Legendre polynomial basis transforms instead of DCT&lt;br /&gt;
** May have better perceptual properties and/or result in 'less compromised' efficient implementations.  &lt;br /&gt;
* Coefficient domain prediction to allow efficient energy preserving quantization.&lt;br /&gt;
* Variable partition size/shape and the use of good predictors appear to remove most of the benefit of directional transforms.&lt;br /&gt;
** Perhaps 45deg is still useful?&lt;br /&gt;
** How does this change with partition sizes? Directional transforms are clearly not that useful with 4x4. &lt;br /&gt;
* Transform-post filtering to allow merging smaller transform blocks (like TF merging in CELT) may allow more flexible partitioning than outright using mixed block sizes.&lt;br /&gt;
* Perturbed quantization mode-signalling has been discussed but mostly laughed at. ;) &lt;br /&gt;
* Special block modes well suited to solid-color/cartoon-like content, avoiding ringing.&lt;br /&gt;
** Are pixel prediction modes too slow?&lt;br /&gt;
* In general, which Markov random field techniques can be applied with acceptable performance? Any?&lt;br /&gt;
* Designed for parallel encode and decode within each frame&lt;br /&gt;
** Important because&lt;br /&gt;
*** the proposed techniques need a lot more CPU than H.264 and VP8 for both encode and decode&lt;br /&gt;
*** Moore's law for single-threaded throughput is dead.  Future hardware is all multicore/GPU.&lt;br /&gt;
** Implies&lt;br /&gt;
*** Getting the order of application right for the lapping filters.&lt;br /&gt;
*** Mandatory slicing? Maybe some kind of multilevel entropy coding to reduce redundancy between slices while minimizing the single-threaded portion of decode.&lt;br /&gt;
&lt;br /&gt;
* Using PVQ and energy conservation: see http://jmvalin.ca/video/video_pvq_v3.pdf&lt;br /&gt;
&lt;br /&gt;
==Techniques applicable to inter frames==&lt;br /&gt;
* Using x264 as a test-bed Jason and Loren demonstrated 15% rate/distortion improvements from using 10-bit intermediaries and references, estimated as being 1/3rd from quality calculation in the 10-bit space, 1/3rd from the higher precision references, and 1/3rd from higher intermediate precision in calculations (e.g. MC filter processing).&lt;br /&gt;
** Increased reference precision competes for memory with increased number of references. The improvements demonstrated appear to be a greater win than increasing the reference count once there are four references or so.&lt;br /&gt;
* Super-resolution techniques for motion-compensation references have been discussed— in particular it appears that the half-pel location is where intelligent filtering matters the most so staged computation could be effectively used to allow more expensive filtering at that level.&lt;br /&gt;
** Edge-directed interpolation techniques might be effectively applied to increase motion compensation accuracy, but most of the techniques known to be very effective are too slow.&lt;br /&gt;
** Speculation has been offered that a significant part of MC inaccuracy may be due to blending in a physically incorrect (gamma-corrected) space, though no real conclusions were made. Academic papers on motion compensation accuracy seem to have ignored this issue.&lt;br /&gt;
* Timothy has an example code base for a variable partition size blocking-free motion compensation scheme which merges OBMC (overlapped block motion compensation) and CGI (control-grid interpolation) with an interesting prediction/sub-division scheme and whole-frame trellis optimization of motion vectors. (daala-exp)&lt;br /&gt;
&lt;br /&gt;
==Basic features==&lt;br /&gt;
* YUV 4:4:4, 4:2:2, 4:2:0 subsamplings; 8-bit, 10-bit.&lt;br /&gt;
* Alpha channel — need testing material!&lt;br /&gt;
* 8-bit RGB compatible mode? (e.g. YCoCg, internally or at least flagging for it)&lt;br /&gt;
* Efficient 3D? — need testing material!&lt;br /&gt;
* Lossless?&lt;br /&gt;
** The value of this is disputable. If nothing else it's arguable that stuffing lossless into a lossy format may be the only way to get lossless into many people's hands. Also, see below&lt;br /&gt;
* Good support for decode-side droppable frames?&lt;br /&gt;
** Hopefully the referencing structure will be flexible enough to enable this even if it's not an intentional feature.&lt;br /&gt;
&lt;br /&gt;
==Frills==&lt;br /&gt;
* Optionally storing a checksum of the expected decoded frame for decoder/encoder mismatch detection.&lt;br /&gt;
* Expose the number of referential descendants of a given frame (or even the whole reference DAG) for most efficient allocation of FEC.&lt;br /&gt;
&lt;br /&gt;
==Wingdings==&lt;br /&gt;
Crazy crap that might be interesting or at least fun to make fun of... &lt;br /&gt;
* &amp;gt;10bit?&lt;br /&gt;
** Use cases don't seem well enough defined yet. Significant complexity. Any prospective hardware developer may hire assassins.&lt;br /&gt;
** Possible compromise: the video reference structure contains a backbone that can be decoded at only N bits of depth (e.g. 10), and higher precisions are only supported outside of this reference chain.&lt;br /&gt;
**# Precision by truncation: decode is performed twice on each frame, identically, at low and high precision.  The only difference between them is the bit-depth of the transform, or possibly of the transform and MC filters.  Only low-precision outputs can be referenced by subsequent frames.  Useful if high-precision content is still worth watching at low precision.&lt;br /&gt;
**# Precision by gamma: decode is performed once at low precision as normal.  Then the output frame is converted to linear-light at high precision, after which another layer of residuals is added.  The second layer can be permitted to reference previous high-precision frames... tricky to use both sets of references though. Useful if high precision is used for storing linear data, but people still want to watch it on &amp;quot;low-end&amp;quot; hardware.&lt;br /&gt;
* Some high-end digital cameras operate JPEG derivatives in a special mode that keeps the image in the native linear RGB Bayer format in order to avoid lossy/slow demosaicing on the camera. In particular, this allows white balancing in post without excessive loss. Probably out of scope for Daala itself.&lt;br /&gt;
** Bayer, 4:2:0, 4:2:2, and interlacing are all special cases of a more general pattern in which the output frames are decimated/subsampled in a regular fashion.  All such subsamplings could be supported by a unified framework in which the video is always stored with all planes fully sampled, with a header indicating the recommended subsampling for display.  In such cases, the encoder can regard the transform as highly overcomplete, and simply ignore unneeded coefficients (presumably by leaving high frequency residuals coded as zero).  This structure would in effect turn the codec into a motion-compensated interpolating/deinterlacing filter.  Whether this approach is sensible presumably depends in part on how the transform is structured.  It would be especially easy if the transform's highest frequencies were coded by a wavelet-like layer.&lt;br /&gt;
* Lossless intra-ability: The ability to losslessly rewrite any frame as an intra frame (perhaps with significant bitrate overhead) in order to make frame accurate cuts possible.&lt;br /&gt;
** Or best handled by making sure that containers have working pre-roll, but presumably common GOP sizes will be greater than the number of references so even if losslessly reencoding the references is expensive it may be cheaper than pre-roll. Do both?&lt;br /&gt;
** Can be had for 'free' if lossless is supported, plus the right header flags to restuff the references from lossless copies in a packed hidden frame.&lt;br /&gt;
** Use of explicit (rather than staged) super-resolution and/or deeper references may make this functionality unattractive due to increased overhead.&lt;br /&gt;
* Internal overlays which could be swapped without re-encoding? (e.g. advertising, station ID). Could also be automatically generated by a Sufficiently Advanced™ encoder to improve efficiencies for static sprites over moving backgrounds.&lt;br /&gt;
** Complicates making the complexity bounded. No Sufficiently Advanced™ encoder is likely to ever exist. But perhaps the station ID/advertising uses fully justify this.&lt;br /&gt;
** Could be done externally to the video codec, but if so it's not likely to be useful for anyone ever.&lt;br /&gt;
* A secondary reference implementation in OpenCL, maintained throughout development, to make sure that the codec is GPU-friendly and can be done efficiently using OpenCL primitives.&lt;br /&gt;
* SWAR-friendly arithmetic.  For example, choosing transform coefficients so that no intermediate product overflows 16 bits (tricky for signed values) can sometimes enable (e.g.) 4 parallel operations in one uint64_t.  This can allow a pure C reference implementation to run faster, which is valuable for initial adoption and ports to new platforms.&lt;br /&gt;
* Parametric decode-side blur.&lt;br /&gt;
** Symmetrical blur in regions that are smooth on scales longer than the block size.  Could be signaled or derived from observed DC values.&lt;br /&gt;
** Motion blur so that moving objects are blurred along the motion vector.  May require coding a shutter speed parameter (0..1 as a fraction of the inter-frame interval).&lt;br /&gt;
* Fancy block property prediction.  (Not clear how these predictions interact with intra pred)&lt;br /&gt;
** Predict block properties (quantizer, energy, etc.) from MV.  (0,0) probably means small delta.  Larger MVs may correspond to larger deltas ... although at low shutter speeds large MVs may correlate with reduced overall HF energy.&lt;br /&gt;
** Predict delta spectral shape from source block spectral shape.  HF/LF ratio of the delta may be correlated with the same ratio in its source blocks.  Works well with decode-side fDCT.&lt;br /&gt;
&lt;br /&gt;
==Negative results==&lt;br /&gt;
&lt;br /&gt;
* Using Kurtosis for detecting text in a frame&lt;br /&gt;
** The idea was to detect a Bernoulli distribution, but the measure is not robust and is too noisy&lt;/div&gt;</summary>
		<author><name>Rillian</name></author>	</entry>

	<entry>
		<id>http://wiki.xiph.org/Daala</id>
		<title>Daala</title>
		<link rel="alternate" type="text/html" href="http://wiki.xiph.org/Daala"/>
				<updated>2012-08-03T22:46:19Z</updated>
		
		<summary type="html">&lt;p&gt;Rillian: /* Weekly meetings */ Link to this week's recording&lt;/p&gt;
&lt;hr /&gt;
&lt;div&gt;Daala is the current working name of a next generation video codec— to be renamed once someone insists on something better. So far the best proposed alternative is PatentCake.&lt;br /&gt;
&lt;br /&gt;
For now, the purpose of this page is to collect notes on things that have been discussed in informal public IRC discussions about the next-generation initiative. Participants in these discussions have included Timothy Terriberry, Jason Garrett-Glaser, Loren Merritt, Ben Schwartz, Greg Maxwell, and others.&lt;br /&gt;
&lt;br /&gt;
See also: [https://xiph.org/daala/ https://xiph.org/daala/]&lt;br /&gt;
&lt;br /&gt;
== Weekly meetings ==&lt;br /&gt;
&lt;br /&gt;
We've been having weekly progress meetings on Mumble.&lt;br /&gt;
&lt;br /&gt;
* 2012 June 4  [https://people.xiph.org/~giles/2012/daala_20120604.txt minutes] (actually a work week)&lt;br /&gt;
* 2012 June 22 [https://people.xiph.org/~giles/2012/daala_20120622.txt minutes]&lt;br /&gt;
* 2012 June 29 [https://people.xiph.org/~giles/2012/daala_20120629.txt minutes] [https://people.xiph.org/~giles/2012/daala_20120629.opus recording]&lt;br /&gt;
* 2012 July 6 [https://people.xiph.org/~giles/2012/daala_20120706.txt minutes]&lt;br /&gt;
* 2012 July 13 [https://people.xiph.org/~giles/2012/daala_20120713.txt minutes]&lt;br /&gt;
* 2012 July 20 [https://people.xiph.org/~giles/2012/daala_20120720.txt minutes]&lt;br /&gt;
* 2012 July 27 [https://people.xiph.org/~giles/2012/daala_20120727.txt minutes]&lt;br /&gt;
* 2012 August 3 [https://people.xiph.org/~giles/2012/daala_20120803.txt minutes] [https://people.xiph.org/~giles/2012/daala_20120803.opus recording]&lt;br /&gt;
&lt;br /&gt;
= Techniques =&lt;br /&gt;
&lt;br /&gt;
The overall structure discussed so far is a variable-size, block-based lapped-DCT codec, with lapping done via pre/post filtering using a specially structured (lifting) linear-phase transform along the block edges, together with overlapped block motion compensation and the expected trimmings. The lapping can be optimized for energy compaction and other useful properties, including invertibility, and yields excellent results with efficient finite-precision math.&lt;br /&gt;
&lt;br /&gt;
Other components which have been discussed include:&lt;br /&gt;
&lt;br /&gt;
==Techniques applicable to all frame types==&lt;br /&gt;
* Multisymbol arithmetic coding &lt;br /&gt;
** Timothy has some trial code showing speed-up proportional to the number of bits coded at once. (ec_test.c)&lt;br /&gt;
* Mode prediction using the previously decoded data, e.g. coding the mode using a probability function derived from trained predictors on the surrounding blocks. &lt;br /&gt;
** This will be terrible for robustness but may significantly reduce signalling overhead, allowing many more modes, and provide continuous adaptation between signalling free and fully signalled modes.&lt;br /&gt;
* Explore Legendre polynomial basis transforms instead of DCT&lt;br /&gt;
** May have better perceptual properties and/or result in 'less compromised' efficient implementations.  &lt;br /&gt;
* Coefficient domain prediction to allow efficient energy preserving quantization.&lt;br /&gt;
* Variable partition size/shape and the use of good predictors appear to remove most of the benefit of directional transforms.&lt;br /&gt;
** Perhaps 45deg is still useful?&lt;br /&gt;
** How does this change with partition sizes? Directional transforms are clearly not that useful with 4x4. &lt;br /&gt;
* Transform-post filtering to allow merging smaller transform blocks (like TF merging in CELT) may allow more flexible partitioning than outright using mixed block sizes.&lt;br /&gt;
* Perturbed quantization mode-signalling has been discussed but mostly laughed at. ;) &lt;br /&gt;
* Special block modes well suited to solid-color/cartoon-like content, avoiding ringing.&lt;br /&gt;
** Are pixel prediction modes too slow?&lt;br /&gt;
* In general, which Markov random field techniques can be applied with acceptable performance? Any?&lt;br /&gt;
* Designed for parallel encode and decode within each frame&lt;br /&gt;
** Important because&lt;br /&gt;
*** the proposed techniques need a lot more CPU than H.264 and VP8 for both encode and decode&lt;br /&gt;
*** Moore's law for single-threaded throughput is dead.  Future hardware is all multicore/GPU.&lt;br /&gt;
** Implies&lt;br /&gt;
*** Getting the order of application right for the lapping filters.&lt;br /&gt;
*** Mandatory slicing? Maybe some kind of multilevel entropy coding to reduce redundancy between slices while minimizing the single-threaded portion of decode.&lt;br /&gt;
&lt;br /&gt;
* Using PVQ and energy conservation: see http://jmvalin.ca/video/video_pvq_v3.pdf&lt;br /&gt;
&lt;br /&gt;
==Techniques applicable to inter frames==&lt;br /&gt;
* Using x264 as a test-bed Jason and Loren demonstrated 15% rate/distortion improvements from using 10-bit intermediaries and references, estimated as being 1/3rd from quality calculation in the 10-bit space, 1/3rd from the higher precision references, and 1/3rd from higher intermediate precision in calculations (e.g. MC filter processing).&lt;br /&gt;
** Increased reference precision competes for memory with increased number of references. The improvements demonstrated appear to be a greater win than increasing the reference count once there are four references or so.&lt;br /&gt;
* Super-resolution techniques for motion-compensation references have been discussed— in particular it appears that the half-pel location is where intelligent filtering matters the most so staged computation could be effectively used to allow more expensive filtering at that level.&lt;br /&gt;
** Edge-directed interpolation techniques might be effectively applied to increase motion compensation accuracy, but most of the techniques known to be very effective are too slow.&lt;br /&gt;
** Speculation has been offered that a significant part of MC inaccuracy may be due to blending in a physically incorrect (gamma-corrected) space, though no real conclusions were made. Academic papers on motion compensation accuracy seem to have ignored this issue.&lt;br /&gt;
* Timothy has an example code base for a variable partition size blocking-free motion compensation scheme which merges OBMC (overlapped block motion compensation) and CGI (control-grid interpolation) with an interesting prediction/sub-division scheme and whole-frame trellis optimization of motion vectors. (daala-exp)&lt;br /&gt;
&lt;br /&gt;
==Basic features==&lt;br /&gt;
* YUV 4:4:4, 4:2:2, 4:2:0 subsamplings; 8-bit, 10-bit.&lt;br /&gt;
* Alpha channel — need testing material!&lt;br /&gt;
* 8-bit RGB compatible mode? (e.g. YCoCg, internally or at least flagging for it)&lt;br /&gt;
* Efficient 3D? — need testing material!&lt;br /&gt;
* Lossless?&lt;br /&gt;
** The value of this is disputable. If nothing else it's arguable that stuffing lossless into a lossy format may be the only way to get lossless into many people's hands. Also, see below&lt;br /&gt;
* Good support for decode-side droppable frames?&lt;br /&gt;
** Hopefully the referencing structure will be flexible enough to enable this even if it's not an intentional feature.&lt;br /&gt;
&lt;br /&gt;
==Frills==&lt;br /&gt;
* Optionally storing a checksum of the expected decoded frame for decoder/encoder mismatch detection.&lt;br /&gt;
* Expose the number of referential descendants of a given frame (or even the whole reference DAG) for most efficient allocation of FEC.&lt;br /&gt;
&lt;br /&gt;
==Wingdings==&lt;br /&gt;
Crazy crap that might be interesting or at least fun to make fun of... &lt;br /&gt;
* &amp;gt;10bit?&lt;br /&gt;
** Use cases don't seem well enough defined yet. Significant complexity. Any prospective hardware developer may hire assassins.&lt;br /&gt;
** Possible compromise: the video reference structure contains a backbone that can be decoded at only N bits of depth (e.g. 10), and higher precisions are only supported outside of this reference chain.&lt;br /&gt;
**# Precision by truncation: decode is performed twice on each frame, identically, at low and high precision.  The only difference between them is the bit-depth of the transform, or possibly of the transform and MC filters.  Only low-precision outputs can be referenced by subsequent frames.  Useful if high-precision content is still worth watching at low precision.&lt;br /&gt;
**# Precision by gamma: decode is performed once at low precision as normal.  Then the output frame is converted to linear-light at high precision, after which another layer of residuals is added.  The second layer can be permitted to reference previous high-precision frames... tricky to use both sets of references though. Useful if high precision is used for storing linear data, but people still want to watch it on &amp;quot;low-end&amp;quot; hardware.&lt;br /&gt;
* Some high-end digital cameras operate JPEG derivatives in a special mode that keeps the image in the native linear RGB Bayer format in order to avoid lossy/slow demosaicing on the camera. In particular this allows white balancing in post without excessive loss. Probably out of scope for Daala itself.&lt;br /&gt;
** Bayer, 4:2:0, 4:2:2, and Interlacing are all special cases of a more general pattern in which the output frames are decimated/subsampled in a regular fashion.  All such subsamplings could be supported by a unified framework in which the video is always stored with all planes fully sampled, with a header indicating the recommended subsampling for display.  In such cases, the encoder can regard the transform as highly overcomplete, and simply ignore unneeded coefficients (presumably by leaving high frequency residuals coded as zero).  This structure would in effect turn the codec into a motion-compensated interpolating/deinterlacing filter.  Whether this approach is sensible presumably depends in part on how the transform is structured.  It would be especially easy if the transform's highest-frequencies were coded by a wavelet-like layer.&lt;br /&gt;
* Lossless intra-ability: The ability to losslessly rewrite any frame as an intra frame (perhaps with significant bitrate overhead) in order to make frame accurate cuts possible.&lt;br /&gt;
** Or best handled by making sure that containers have working pre-roll, but presumably common GOP sizes will be greater than the number of references so even if losslessly reencoding the references is expensive it may be cheaper than pre-roll. Do both?&lt;br /&gt;
** Can be had for 'free' if lossless is supported, plus the right header flags to restuff the references from lossless copies in a packed hidden frame.&lt;br /&gt;
** Use of explicit (rather than staged) super-resolution and/or deeper references may make this functionality unattractive due to increased overhead.&lt;br /&gt;
* Internal overlays which could be swapped without re-encoding? (e.g. advertising, station ID). Could also be automatically generated by a Sufficiently Advanced™ encoder to improve efficiencies for static sprites over moving backgrounds.&lt;br /&gt;
** Complicates making the complexity bounded. No Sufficiently Advanced™ encoder is likely to ever exist. But perhaps the station ID/advertising uses fully justify this.&lt;br /&gt;
** Could be done externally to the video codec, but if so it's not likely to be useful for anyone ever.&lt;br /&gt;
* A secondary reference implementation in OpenCL, maintained throughout development, to make sure that the codec is GPU-friendly and can be done efficiently using OpenCL primitives.&lt;br /&gt;
* SWAR-friendly arithmetic.  For example, choosing transform coefficients so that no intermediate product overflows 16 bits (tricky for signed values) can sometimes enable (e.g.) 4 parallel operations in one uint64_t.  This can allow a pure C reference implementation to run faster, which is valuable for initial adoption and ports to new platforms.&lt;br /&gt;
* Parametric decode-side blur.&lt;br /&gt;
** Symmetrical blur in regions that are smooth on scales longer than the block size.  Could be signaled or derived from observed DC values.&lt;br /&gt;
** Motion blur so that moving objects are blurred along the motion vector.  May require coding a shutter speed parameter (0..1 as a fraction of the inter-frame interval).&lt;br /&gt;
* Fancy block property prediction.  (Not clear how these predictions interact with intra pred.)&lt;br /&gt;
** Predict block properties (quantizer, energy, etc.) from MV.  (0,0) probably means small delta.  Larger MVs may correspond to larger deltas ... although at low shutter speeds large MVs may correlate with reduced overall HF energy.&lt;br /&gt;
** Predict delta spectral shape from source block spectral shape.  HF/LF ratio of the delta may be correlated with the same ratio in its source blocks.  Works well with decode-side fDCT.&lt;br /&gt;
&lt;br /&gt;
==Negative results==&lt;br /&gt;
&lt;br /&gt;
* Using Kurtosis for detecting text in a frame&lt;br /&gt;
** The idea was to detect a Bernoulli distribution, but the measure is not robust and is too noisy.&lt;/div&gt;</summary>
		<author><name>Rillian</name></author>	</entry>

	<entry>
		<id>http://wiki.xiph.org/Daala</id>
		<title>Daala</title>
		<link rel="alternate" type="text/html" href="http://wiki.xiph.org/Daala"/>
				<updated>2012-08-03T21:46:43Z</updated>
		
		<summary type="html">&lt;p&gt;Rillian: /* Weekly meetings */ post today's meeting minutes&lt;/p&gt;
&lt;hr /&gt;
&lt;div&gt;Daala is the current working name of a next generation video codec— to be renamed once someone insists on something better. So far the best proposed alternative is PatentCake.&lt;br /&gt;
&lt;br /&gt;
For now the purpose of this page is to collect notes about things which have been discussed in informal public IRC discussion about the next generation initiative. Participants in these discussions have included Timothy Terriberry, Jason Garrett-Glaser, Loren Merritt, Ben Schwartz, Greg Maxwell, and others. &lt;br /&gt;
&lt;br /&gt;
See also: [https://xiph.org/daala/ https://xiph.org/daala/]&lt;br /&gt;
&lt;br /&gt;
== Weekly meetings ==&lt;br /&gt;
&lt;br /&gt;
We've been having weekly progress meetings on mumble.&lt;br /&gt;
&lt;br /&gt;
* 2012 June 4  [https://people.xiph.org/~giles/2012/daala_20120604.txt minutes] (actually a work week)&lt;br /&gt;
* 2012 June 22 [https://people.xiph.org/~giles/2012/daala_20120622.txt minutes]&lt;br /&gt;
* 2012 June 29 [https://people.xiph.org/~giles/2012/daala_20120629.txt minutes] [https://people.xiph.org/~giles/2012/daala_20120629.opus recording]&lt;br /&gt;
* 2012 July 6 [https://people.xiph.org/~giles/2012/daala_20120706.txt minutes]&lt;br /&gt;
* 2012 July 13 [https://people.xiph.org/~giles/2012/daala_20120713.txt minutes]&lt;br /&gt;
* 2012 July 20 [https://people.xiph.org/~giles/2012/daala_20120720.txt minutes]&lt;br /&gt;
* 2012 July 27 [https://people.xiph.org/~giles/2012/daala_20120727.txt minutes]&lt;br /&gt;
* 2012 August 3 [https://people.xiph.org/~giles/2012/daala_20120803.txt minutes]&lt;br /&gt;
&lt;br /&gt;
= Techniques =&lt;br /&gt;
&lt;br /&gt;
The discussed overall structure so far has been a variable size lapped-DCT block based codec with lapping done via pre/post filtering with a specially structured (lifting) linear phase transform along the edges, along with overlapped block motion compensation and the expected trimmings. The lapping can be optimized for energy compaction and other useful properties, including invertibility, and yields excellent results with efficient finite precision math.&lt;br /&gt;
&lt;br /&gt;
Other components which have been discussed include:&lt;br /&gt;
&lt;br /&gt;
==Techniques applicable to all frame types==&lt;br /&gt;
* Multisymbol arithmetic coding &lt;br /&gt;
** Timothy has some trial code showing speed-up proportional to the number of bits coded at once. (ec_test.c)&lt;br /&gt;
* Mode prediction using the previously decoded data, e.g. coding the mode using a probability function derived from trained predictors on the surrounding blocks. &lt;br /&gt;
** This will be terrible for robustness but may significantly reduce signalling overhead, allowing many more modes, and provide continuous adaptation between signalling free and fully signalled modes.&lt;br /&gt;
* Explore Legendre polynomial basis transforms instead of DCT&lt;br /&gt;
** May have better perceptual properties and/or result in 'less compromised' efficient implementations.  &lt;br /&gt;
* Coefficient domain prediction to allow efficient energy preserving quantization.&lt;br /&gt;
* Variable partition size/shape and the use of good predictors appears to remove most of the benefit of directional transforms.&lt;br /&gt;
** Perhaps 45deg is still useful?&lt;br /&gt;
** How does this change with partition sizes? Directional transforms are clearly not that useful with 4x4. &lt;br /&gt;
* Transform-post filtering to allow merging smaller transform blocks (like TF merging in CELT) may allow more flexible partitioning than outright using mixed block sizes.&lt;br /&gt;
* Perturbed quantization mode-signalling has been discussed but mostly laughed at. ;) &lt;br /&gt;
* Special block modes well suited to solid color/cartoon like content— avoiding ringing.&lt;br /&gt;
** Are pixel prediction modes too slow?&lt;br /&gt;
* In general, which Markov random field techniques can be applied with acceptable performance? Any?&lt;br /&gt;
* Designed for parallel encode and decode within each frame&lt;br /&gt;
** Important because&lt;br /&gt;
*** the proposed techniques need a lot more CPU than H.264 and VP8 for both encode and decode&lt;br /&gt;
*** Moore's law for single-threaded throughput is dead.  Future hardware is all multicore/GPU.&lt;br /&gt;
** Implies&lt;br /&gt;
*** Getting the order of application right for the lapping filters.&lt;br /&gt;
*** Mandatory slicing? Maybe some kind of multilevel entropy coding to reduce redundancy between slices while minimizing the single-threaded portion of decode.&lt;br /&gt;
&lt;br /&gt;
* Using PVQ and energy conservation: see http://jmvalin.ca/video/video_pvq_v3.pdf&lt;br /&gt;
&lt;br /&gt;
==Techniques applicable to inter frames==&lt;br /&gt;
* Using x264 as a test-bed Jason and Loren demonstrated 15% rate/distortion improvements from using 10-bit intermediaries and references, estimated as being 1/3rd from quality calculation in the 10-bit space, 1/3rd from the higher precision references, and 1/3rd from higher intermediate precision in calculations (e.g. MC filter processing).&lt;br /&gt;
** Increased reference precision competes for memory with increased number of references. The improvements demonstrated appear to be a greater win than increasing the reference count once there are four references or so.&lt;br /&gt;
* Super-resolution techniques for motion-compensation references have been discussed— in particular it appears that the half-pel location is where intelligent filtering matters the most so staged computation could be effectively used to allow more expensive filtering at that level.&lt;br /&gt;
** Edge-directed interpolation techniques might be effectively applied to increase motion compensation accuracy, but most of the techniques known to be very effective are too slow.&lt;br /&gt;
** Speculation has been offered that a significant part of MC inaccuracy may be due to blending in a physically incorrect (gamma-corrected) space, though no real conclusions were made. Academic papers on motion compensation accuracy seem to have ignored this issue.&lt;br /&gt;
* Timothy has an example code base for a variable partition size blocking-free motion compensation scheme which merges OBMC (overlapped block motion compensation) and CGI (control-grid interpolation) with an interesting prediction/sub-division scheme and whole-frame trellis optimization of motion vectors. (daala-exp)&lt;br /&gt;
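&lt;br /&gt;
The gamma-space blending speculation above can be illustrated numerically. The sketch below assumes the sRGB transfer function (the notes do not name one) and uses hypothetical helper names; it averages two samples the way a half-pel or OBMC blend would, once on the coded values and once in linear light.&lt;br /&gt;

```python
# Sketch only: the sRGB transfer function is an assumption; the notes above
# do not say which gamma curve the codec would use.

def srgb_to_linear(c):
    """Convert a gamma-encoded sRGB value in [0, 1] to linear light."""
    if c > 0.04045:
        return ((c + 0.055) / 1.055) ** 2.4
    return c / 12.92


def linear_to_srgb(lin):
    """Convert a linear-light value in [0, 1] back to gamma-encoded sRGB."""
    if lin > 0.0031308:
        return 1.055 * lin ** (1 / 2.4) - 0.055
    return 12.92 * lin


def blend_gamma(a, b):
    """Average two coded samples directly, as a typical MC filter does."""
    return (a + b) / 2


def blend_linear(a, b):
    """Average the same samples in linear light, then re-encode."""
    return linear_to_srgb((srgb_to_linear(a) + srgb_to_linear(b)) / 2)


if __name__ == "__main__":
    black, white = 0.0, 1.0
    print("gamma-space blend: ", blend_gamma(black, white))
    print("linear-light blend:", round(blend_linear(black, white), 4))
```

For a black/white edge the two blends differ noticeably (0.5 versus roughly 0.74 in coded units). Whether that difference matters for rate/distortion is not established here.&lt;br /&gt;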
&lt;br /&gt;
==Basic features==&lt;br /&gt;
* YUV 4:4:4, 4:2:2, 4:2:0 subsamplings; 8-bit and 10-bit depths.&lt;br /&gt;
* Alpha channel — need testing material!&lt;br /&gt;
* 8-bit RGB compatible mode? (e.g. YCoCg, internally or at least flagging for it)&lt;br /&gt;
* Efficient 3D? — need testing material!&lt;br /&gt;
* Lossless?&lt;br /&gt;
** The value of this is disputable. If nothing else, it's arguable that stuffing lossless into a lossy format may be the only way to get lossless into many people's hands. Also, see below.&lt;br /&gt;
* Good support for decode side droppable frames?&lt;br /&gt;
** Hopefully the referencing structure will be flexible enough to enable this even if it's not an intentional feature.&lt;br /&gt;
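&lt;br /&gt;
As one illustration of an RGB compatible mode, the sketch below uses the lifting-based YCoCg-R variant, which is exactly reversible on integers. Choosing YCoCg-R (rather than plain YCoCg) and the function names are assumptions for the example, not something settled in the discussions.&lt;br /&gt;

```python
# Sketch only: YCoCg-R (lifting form) is assumed here; function names are
# hypothetical and not part of any Daala code.

def rgb_to_ycocg_r(r, g, b):
    """Forward lifting steps; Co and Cg need one extra bit of range."""
    co = r - b
    t = b + (co >> 1)
    cg = g - t
    y = t + (cg >> 1)
    return y, co, cg


def ycocg_r_to_rgb(y, co, cg):
    """Inverse: undo the lifting steps in reverse order."""
    t = y - (cg >> 1)
    g = cg + t
    b = t - (co >> 1)
    r = b + co
    return r, g, b


if __name__ == "__main__":
    pixel = (255, 0, 0)
    coded = rgb_to_ycocg_r(*pixel)
    print(coded, ycocg_r_to_rgb(*coded))
```

Because every step is an integer lifting step, the round trip is lossless, which would also suit a lossless mode. The cost is one extra bit of range in the two chroma planes.&lt;br /&gt;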
&lt;br /&gt;
==Frills==&lt;br /&gt;
* Optionally storing a checksum of the expected decoded frame for decoder/encoder mismatch detection.&lt;br /&gt;
* Expose the number of referential descendants of a given frame (or even the whole reference DAG) for most efficient allocation of FEC.&lt;br /&gt;
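&lt;br /&gt;
The frame checksum idea could look like the following sketch. CRC-32 via Python's zlib module, the plane layout, and all function names are assumptions for illustration; the notes do not specify a hash or a bitstream field.&lt;br /&gt;

```python
import zlib

# Sketch only: CRC-32 and the list-of-plane-bytes layout are assumptions.

def frame_checksum(planes):
    """CRC-32 over the decoded planes, chained in plane order."""
    crc = 0
    for plane in planes:
        crc = zlib.crc32(plane, crc)
    return crc


def decoder_matches(decoded_planes, stored_checksum):
    """True when the decoder output agrees with the encoder reconstruction."""
    return frame_checksum(decoded_planes) == stored_checksum


if __name__ == "__main__":
    # Encoder side: checksum its own reconstruction and store it with the frame.
    recon = [bytes([16] * 64), bytes([128] * 16), bytes([128] * 16)]
    stored = frame_checksum(recon)

    # Decoder side: a single wrong sample is detected.
    drifted = [bytes([16] * 63 + [17]), recon[1], recon[2]]
    print(decoder_matches(recon, stored), decoder_matches(drifted, stored))
```

The checksum is computed on the encoder's reconstruction rather than on the source frame, since that is what a conforming decoder is expected to reproduce.&lt;br /&gt;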
&lt;br /&gt;
==Wingdings==&lt;br /&gt;
Crazy crap that might be interesting or at least fun to make fun of... &lt;br /&gt;
* &amp;gt;10bit?&lt;br /&gt;
** Use cases don't seem well enough defined yet. Significant complexity. Any prospective hardware developer may hire assassins.&lt;br /&gt;
** Possible compromise: the video reference structure contains a backbone that can be decoded at only N bits of depth (e.g. 10), and higher precisions are only supported outside of this reference chain.&lt;br /&gt;
**# Precision by truncation: decode is performed twice on each frame, identically, at low and high precision.  The only difference between them is the bit-depth of the transform, or possibly of the transform and MC filters.  Only low-precision outputs can be referenced by subsequent frames.  Useful if high-precision content is still worth watching at low precision.&lt;br /&gt;
**# Precision by gamma: decode is performed once at low precision as normal.  Then the output frame is converted to linear-light at high precision, after which another layer of residuals is added.  The second layer can be permitted to reference previous high-precision frames... tricky to use both sets of references though. Useful if high precision is used for storing linear data, but people still want to watch it on &amp;quot;low-end&amp;quot; hardware.&lt;br /&gt;
* Some high-end digital cameras operate JPEG derivatives in a special mode that keeps the image in the native linear RGB Bayer format in order to avoid lossy/slow demosaicing on the camera. In particular this allows white balancing in post without excessive loss. Probably out of scope for Daala itself.&lt;br /&gt;
** Bayer, 4:2:0, 4:2:2, and Interlacing are all special cases of a more general pattern in which the output frames are decimated/subsampled in a regular fashion.  All such subsamplings could be supported by a unified framework in which the video is always stored with all planes fully sampled, with a header indicating the recommended subsampling for display.  In such cases, the encoder can regard the transform as highly overcomplete, and simply ignore unneeded coefficients (presumably by leaving high frequency residuals coded as zero).  This structure would in effect turn the codec into a motion-compensated interpolating/deinterlacing filter.  Whether this approach is sensible presumably depends in part on how the transform is structured.  It would be especially easy if the transform's highest-frequencies were coded by a wavelet-like layer.&lt;br /&gt;
* Lossless intra-ability: The ability to losslessly rewrite any frame as an intra frame (perhaps with significant bitrate overhead) in order to make frame accurate cuts possible.&lt;br /&gt;
** Or best handled by making sure that containers have working pre-roll, but presumably common GOP sizes will be greater than the number of references so even if losslessly reencoding the references is expensive it may be cheaper than pre-roll. Do both?&lt;br /&gt;
** Can be had for 'free' if lossless is supported, plus the right header flags to restuff the references from lossless copies in a packed hidden frame.&lt;br /&gt;
** Use of explicit (rather than staged) super-resolution and/or deeper references may make this functionality unattractive due to increased overhead.&lt;br /&gt;
* Internal overlays which could be swapped without re-encoding? (e.g. advertising, station ID). Could also be automatically generated by a Sufficiently Advanced™ encoder to improve efficiencies for static sprites over moving backgrounds.&lt;br /&gt;
** Complicates making the complexity bounded. No Sufficiently Advanced™ encoder is likely to ever exist. But perhaps the station ID/advertising uses fully justify this.&lt;br /&gt;
** Could be done externally to the video codec, but if so it's not likely to be useful for anyone ever.&lt;br /&gt;
* A secondary reference implementation in OpenCL, maintained throughout development, to make sure that the codec is GPU-friendly and can be done efficiently using OpenCL primitives.&lt;br /&gt;
* SWAR-friendly arithmetic.  For example, choosing transform coefficients so that no intermediate product overflows 16 bits (tricky for signed values) can sometimes enable (e.g.) 4 parallel operations in one uint64_t.  This can allow a pure C reference implementation to run faster, which is valuable for initial adoption and ports to new platforms.&lt;br /&gt;
* Parametric decode-side blur.&lt;br /&gt;
** Symmetrical blur in regions that are smooth on scales longer than the block size.  Could be signaled or derived from observed DC values.&lt;br /&gt;
** Motion blur so that moving objects are blurred along the motion vector.  May require coding a shutter speed parameter (0..1 as a fraction of the inter-frame interval).&lt;br /&gt;
* Fancy block property prediction.  (Not clear how these predictions interact with intra pred.)&lt;br /&gt;
** Predict block properties (quantizer, energy, etc.) from MV.  (0,0) probably means small delta.  Larger MVs may correspond to larger deltas ... although at low shutter speeds large MVs may correlate with reduced overall HF energy.&lt;br /&gt;
** Predict delta spectral shape from source block spectral shape.  HF/LF ratio of the delta may be correlated with the same ratio in its source blocks.  Works well with decode-side fDCT.&lt;br /&gt;
&lt;br /&gt;
==Negative results==&lt;br /&gt;
&lt;br /&gt;
* Using Kurtosis for detecting text in a frame&lt;br /&gt;
** The idea was to detect a Bernoulli distribution, but the measure is not robust and is too noisy.&lt;/div&gt;</summary>
		<author><name>Rillian</name></author>	</entry>

	<entry>
		<id>http://wiki.xiph.org/Daala</id>
		<title>Daala</title>
		<link rel="alternate" type="text/html" href="http://wiki.xiph.org/Daala"/>
				<updated>2012-07-29T15:44:18Z</updated>
		
		<summary type="html">&lt;p&gt;Rillian: /* Weekly meetings */ link minutes from this week's meeting&lt;/p&gt;
&lt;hr /&gt;
&lt;div&gt;Daala is the current working name of a next generation video codec— to be renamed once someone insists on something better. So far the best proposed alternative is PatentCake.&lt;br /&gt;
&lt;br /&gt;
For now the purpose of this page is to collect notes about things which have been discussed in informal public IRC discussion about the next generation initiative. Participants in these discussions have included Timothy Terriberry, Jason Garrett-Glaser, Loren Merritt, Ben Schwartz, Greg Maxwell, and others. &lt;br /&gt;
&lt;br /&gt;
See also: [https://xiph.org/daala/ https://xiph.org/daala/]&lt;br /&gt;
&lt;br /&gt;
== Weekly meetings ==&lt;br /&gt;
&lt;br /&gt;
We've been having weekly progress meetings on mumble.&lt;br /&gt;
&lt;br /&gt;
* 2012 June 4  [https://people.xiph.org/~giles/2012/daala_20120604.txt minutes] (actually a work week)&lt;br /&gt;
* 2012 June 22 [https://people.xiph.org/~giles/2012/daala_20120622.txt minutes]&lt;br /&gt;
* 2012 June 29 [https://people.xiph.org/~giles/2012/daala_20120629.txt minutes] [https://people.xiph.org/~giles/2012/daala_20120629.opus recording]&lt;br /&gt;
* 2012 July 6 [https://people.xiph.org/~giles/2012/daala_20120706.txt minutes]&lt;br /&gt;
* 2012 July 13 [https://people.xiph.org/~giles/2012/daala_20120713.txt minutes]&lt;br /&gt;
* 2012 July 20 [https://people.xiph.org/~giles/2012/daala_20120720.txt minutes]&lt;br /&gt;
* 2012 July 27 [https://people.xiph.org/~giles/2012/daala_20120727.txt minutes]&lt;br /&gt;
&lt;br /&gt;
= Techniques =&lt;br /&gt;
&lt;br /&gt;
The discussed overall structure so far has been a variable size lapped-DCT block based codec with lapping done via pre/post filtering with a specially structured (lifting) linear phase transform along the edges, along with overlapped block motion compensation and the expected trimmings. The lapping can be optimized for energy compaction and other useful properties, including invertibility, and yields excellent results with efficient finite precision math.&lt;br /&gt;
&lt;br /&gt;
Other components which have been discussed include:&lt;br /&gt;
&lt;br /&gt;
==Techniques applicable to all frame types==&lt;br /&gt;
* Multisymbol arithmetic coding &lt;br /&gt;
** Timothy has some trial code showing speed-up proportional to the number of bits coded at once. (ec_test.c)&lt;br /&gt;
* Mode prediction using the previously decoded data, e.g. coding the mode using a probability function derived from trained predictors on the surrounding blocks. &lt;br /&gt;
** This will be terrible for robustness but may significantly reduce signalling overhead, allowing many more modes, and provide continuous adaptation between signalling free and fully signalled modes.&lt;br /&gt;
* Explore Legendre polynomial basis transforms instead of DCT&lt;br /&gt;
** May have better perceptual properties and/or result in 'less compromised' efficient implementations.  &lt;br /&gt;
* Coefficient domain prediction to allow efficient energy preserving quantization.&lt;br /&gt;
* Variable partition size/shape and the use of good predictors appears to remove most of the benefit of directional transforms.&lt;br /&gt;
** Perhaps 45deg is still useful?&lt;br /&gt;
** How does this change with partition sizes? Directional transforms are clearly not that useful with 4x4. &lt;br /&gt;
* Transform-post filtering to allow merging smaller transform blocks (like TF merging in CELT) may allow more flexible partitioning than outright using mixed block sizes.&lt;br /&gt;
* Perturbed quantization mode-signalling has been discussed but mostly laughed at. ;) &lt;br /&gt;
* Special block modes well suited to solid color/cartoon like content— avoiding ringing.&lt;br /&gt;
** Are pixel prediction modes too slow?&lt;br /&gt;
* In general, which Markov random field techniques can be applied with acceptable performance? Any?&lt;br /&gt;
* Designed for parallel encode and decode within each frame&lt;br /&gt;
** Important because&lt;br /&gt;
*** the proposed techniques need a lot more CPU than H.264 and VP8 for both encode and decode&lt;br /&gt;
*** Moore's law for single-threaded throughput is dead.  Future hardware is all multicore/GPU.&lt;br /&gt;
** Implies&lt;br /&gt;
*** Getting the order of application right for the lapping filters.&lt;br /&gt;
*** Mandatory slicing? Maybe some kind of multilevel entropy coding to reduce redundancy between slices while minimizing the single-threaded portion of decode.&lt;br /&gt;
&lt;br /&gt;
* Using PVQ and energy conservation: see http://jmvalin.ca/video/video_pvq_v3.pdf&lt;br /&gt;
&lt;br /&gt;
==Techniques applicable to inter frames==&lt;br /&gt;
* Using x264 as a test-bed Jason and Loren demonstrated 15% rate/distortion improvements from using 10-bit intermediaries and references, estimated as being 1/3rd from quality calculation in the 10-bit space, 1/3rd from the higher precision references, and 1/3rd from higher intermediate precision in calculations (e.g. MC filter processing).&lt;br /&gt;
** Increased reference precision competes for memory with increased number of references. The improvements demonstrated appear to be a greater win than increasing the reference count once there are four references or so.&lt;br /&gt;
* Super-resolution techniques for motion-compensation references have been discussed— in particular it appears that the half-pel location is where intelligent filtering matters the most so staged computation could be effectively used to allow more expensive filtering at that level.&lt;br /&gt;
** Edge-directed interpolation techniques might be effectively applied to increase motion compensation accuracy, but most of the techniques known to be very effective are too slow.&lt;br /&gt;
** Speculation has been offered that a significant part of MC inaccuracy may be due to blending in a physically incorrect (gamma-corrected) space, though no real conclusions were made. Academic papers on motion compensation accuracy seem to have ignored this issue.&lt;br /&gt;
* Timothy has an example code base for a variable partition size blocking-free motion compensation scheme which merges OBMC (overlapped block motion compensation) and CGI (control-grid interpolation) with an interesting prediction/sub-division scheme and whole-frame trellis optimization of motion vectors. (daala-exp)&lt;br /&gt;
&lt;br /&gt;
==Basic features==&lt;br /&gt;
* YUV 4:4:4, 4:2:2, 4:2:0 subsamplings; 8-bit and 10-bit depths.&lt;br /&gt;
* Alpha channel — need testing material!&lt;br /&gt;
* 8-bit RGB compatible mode? (e.g. YCoCg, internally or at least flagging for it)&lt;br /&gt;
* Efficient 3D? — need testing material!&lt;br /&gt;
* Lossless?&lt;br /&gt;
** The value of this is disputable. If nothing else, it's arguable that stuffing lossless into a lossy format may be the only way to get lossless into many people's hands. Also, see below.&lt;br /&gt;
* Good support for decode side droppable frames?&lt;br /&gt;
** Hopefully the referencing structure will be flexible enough to enable this even if it's not an intentional feature.&lt;br /&gt;
&lt;br /&gt;
==Frills==&lt;br /&gt;
* Optionally storing a checksum of the expected decoded frame for decoder/encoder mismatch detection.&lt;br /&gt;
* Expose the number of referential descendants of a given frame (or even the whole reference DAG) for most efficient allocation of FEC.&lt;br /&gt;
&lt;br /&gt;
==Wingdings==&lt;br /&gt;
Crazy crap that might be interesting or at least fun to make fun of... &lt;br /&gt;
* &amp;gt;10bit?&lt;br /&gt;
** Use cases don't seem well enough defined yet. Significant complexity. Any prospective hardware developer may hire assassins.&lt;br /&gt;
** Possible compromise: the video reference structure contains a backbone that can be decoded at only N bits of depth (e.g. 10), and higher precisions are only supported outside of this reference chain.&lt;br /&gt;
**# Precision by truncation: decode is performed twice on each frame, identically, at low and high precision.  The only difference between them is the bit-depth of the transform, or possibly of the transform and MC filters.  Only low-precision outputs can be referenced by subsequent frames.  Useful if high-precision content is still worth watching at low precision.&lt;br /&gt;
**# Precision by gamma: decode is performed once at low precision as normal.  Then the output frame is converted to linear-light at high precision, after which another layer of residuals is added.  The second layer can be permitted to reference previous high-precision frames... tricky to use both sets of references though. Useful if high precision is used for storing linear data, but people still want to watch it on &amp;quot;low-end&amp;quot; hardware.&lt;br /&gt;
* Some high-end digital cameras operate JPEG derivatives in a special mode that keeps the image in the native linear RGB Bayer format in order to avoid lossy/slow demosaicing on the camera. In particular this allows white balancing in post without excessive loss. Probably out of scope for Daala itself.&lt;br /&gt;
** Bayer, 4:2:0, 4:2:2, and Interlacing are all special cases of a more general pattern in which the output frames are decimated/subsampled in a regular fashion.  All such subsamplings could be supported by a unified framework in which the video is always stored with all planes fully sampled, with a header indicating the recommended subsampling for display.  In such cases, the encoder can regard the transform as highly overcomplete, and simply ignore unneeded coefficients (presumably by leaving high frequency residuals coded as zero).  This structure would in effect turn the codec into a motion-compensated interpolating/deinterlacing filter.  Whether this approach is sensible presumably depends in part on how the transform is structured.  It would be especially easy if the transform's highest-frequencies were coded by a wavelet-like layer.&lt;br /&gt;
* Lossless intra-ability: The ability to losslessly rewrite any frame as an intra frame (perhaps with significant bitrate overhead) in order to make frame accurate cuts possible.&lt;br /&gt;
** Or best handled by making sure that containers have working pre-roll, but presumably common GOP sizes will be greater than the number of references so even if losslessly reencoding the references is expensive it may be cheaper than pre-roll. Do both?&lt;br /&gt;
** Can be had for 'free' if lossless is supported, plus the right header flags to restuff the references from lossless copies in a packed hidden frame.&lt;br /&gt;
** Use of explicit (rather than staged) super-resolution and/or deeper references may make this functionality unattractive due to increased overhead.&lt;br /&gt;
* Internal overlays which could be swapped without re-encoding? (e.g. advertising, station ID). Could also be automatically generated by a Sufficiently Advanced™ encoder to improve efficiencies for static sprites over moving backgrounds.&lt;br /&gt;
** Complicates making the complexity bounded. No Sufficiently Advanced™ encoder is likely to ever exist. But perhaps the station ID/advertising uses fully justify this.&lt;br /&gt;
** Could be done externally to the video codec, but if so it's not likely to be useful for anyone ever.&lt;br /&gt;
* A secondary reference implementation in OpenCL, maintained throughout development, to make sure that the codec is GPU-friendly and can be done efficiently using OpenCL primitives.&lt;br /&gt;
* SWAR-friendly arithmetic.  For example, choosing transform coefficients so that no intermediate product overflows 16 bits (tricky for signed values) can sometimes enable (e.g.) 4 parallel operations in one uint64_t.  This can allow a pure C reference implementation to run faster, which is valuable for initial adoption and ports to new platforms.&lt;br /&gt;
* Parametric decode-side blur.&lt;br /&gt;
** Symmetrical blur in regions that are smooth on scales longer than the block size.  Could be signaled or derived from observed DC values.&lt;br /&gt;
** Motion blur so that moving objects are blurred along the motion vector.  May require coding a shutter speed parameter (0..1 as a fraction of the inter-frame interval).&lt;br /&gt;
* Fancy block property prediction.  (Not clear how these predictions interact with intra pred.)&lt;br /&gt;
** Predict block properties (quantizer, energy, etc.) from MV.  (0,0) probably means small delta.  Larger MVs may correspond to larger deltas ... although at low shutter speeds large MVs may correlate with reduced overall HF energy.&lt;br /&gt;
** Predict delta spectral shape from source block spectral shape.  HF/LF ratio of the delta may be correlated with the same ratio in its source blocks.  Works well with decode-side fDCT.&lt;br /&gt;
&lt;br /&gt;
==Negative results==&lt;br /&gt;
&lt;br /&gt;
* Using Kurtosis for detecting text in a frame&lt;br /&gt;
** The idea was to detect a Bernoulli distribution, but the measure is not robust and is too noisy.&lt;/div&gt;</summary>
		<author><name>Rillian</name></author>	</entry>

	<entry>
		<id>http://wiki.xiph.org/Daala</id>
		<title>Daala</title>
		<link rel="alternate" type="text/html" href="http://wiki.xiph.org/Daala"/>
				<updated>2012-07-21T14:19:11Z</updated>
		
		<summary type="html">&lt;p&gt;Rillian: /* Weekly meetings */ Fix date on the latest minutes.&lt;/p&gt;
&lt;hr /&gt;
&lt;div&gt;Daala is the current working name of a next generation video codec— to be renamed once someone insists on something better. So far the best proposed alternative is PatentCake.&lt;br /&gt;
&lt;br /&gt;
For now the purpose of this page is to collect notes about things which have been discussed in informal public IRC discussion about the next generation initiative. Participants in these discussions have included Timothy Terriberry, Jason Garrett-Glaser, Loren Merritt, Ben Schwartz, Greg Maxwell, and others.&lt;br /&gt;
&lt;br /&gt;
See also: [https://xiph.org/daala/ https://xiph.org/daala/]&lt;br /&gt;
&lt;br /&gt;
== Weekly meetings ==&lt;br /&gt;
&lt;br /&gt;
We've been having weekly progress meetings on mumble.&lt;br /&gt;
&lt;br /&gt;
* 2012 June 4  [https://people.xiph.org/~giles/2012/daala_20120604.txt minutes] (actually a work week)&lt;br /&gt;
* 2012 June 22 [https://people.xiph.org/~giles/2012/daala_20120622.txt minutes]&lt;br /&gt;
* 2012 June 29 [https://people.xiph.org/~giles/2012/daala_20120629.txt minutes] [https://people.xiph.org/~giles/2012/daala_20120629.opus recording]&lt;br /&gt;
* 2012 July 6 [https://people.xiph.org/~giles/2012/daala_20120706.txt minutes]&lt;br /&gt;
* 2012 July 13 [https://people.xiph.org/~giles/2012/daala_20120713.txt minutes]&lt;br /&gt;
* 2012 July 20 [https://people.xiph.org/~giles/2012/daala_20120720.txt minutes]&lt;br /&gt;
&lt;br /&gt;
= Techniques =&lt;br /&gt;
&lt;br /&gt;
The overall structure discussed so far is a variable-size, block-based lapped-DCT codec. Lapping is done via pre/post filtering with a specially structured (lifting) linear-phase transform along the block edges, combined with overlapped block motion compensation and the expected trimmings. The lapping can be optimized for energy compaction and other useful properties, including invertibility, and yields excellent results with efficient finite-precision math.&lt;br /&gt;
&lt;br /&gt;
Other components which have been discussed include:&lt;br /&gt;
&lt;br /&gt;
==Techniques applicable to all frame types==&lt;br /&gt;
* Multisymbol arithmetic coding &lt;br /&gt;
** Timothy has some trial code showing speed-up proportional to the number of bits coded at once. (ec_test.c)&lt;br /&gt;
* Mode prediction using the previously decoded data, e.g. coding the mode using a probability function derived from trained predictors on the surrounding blocks. &lt;br /&gt;
** This will be terrible for robustness but may significantly reduce signalling overhead, allowing many more modes, and provide continuous adaptation between signalling free and fully signalled modes.&lt;br /&gt;
* Explore Legendre polynomial basis transforms instead of DCT&lt;br /&gt;
** May have better perceptual properties and/or result in 'less compromised' efficient implementations.  &lt;br /&gt;
* Coefficient domain prediction to allow efficient energy preserving quantization.&lt;br /&gt;
* Variable partition size/shape and the use of good predictors appears to remove most of the benefit of directional transforms.&lt;br /&gt;
** Perhaps 45deg is still useful?&lt;br /&gt;
** How does this change with partition sizes? Directional transforms are clearly not that useful with 4x4. &lt;br /&gt;
* Transform-post filtering to allow merging smaller transform blocks (like TF merging in CELT) may allow more flexible partitioning than outright using mixed block sizes.&lt;br /&gt;
* Perturbed quantization mode-signalling has been discussed but mostly laughed at. ;) &lt;br /&gt;
* Special block modes well suited to solid color/cartoon like content— avoiding ringing.&lt;br /&gt;
** Are pixel prediction modes too slow?&lt;br /&gt;
* In general, what Markov random field techniques can be applied with acceptable performance? Any?&lt;br /&gt;
* Designed for parallel encode and decode within each frame&lt;br /&gt;
** Important because&lt;br /&gt;
*** the proposed techniques need a lot more CPU than H.264 and VP8 for both encode and decode&lt;br /&gt;
*** Moore's law for single-threaded throughput is dead.  Future hardware is all multicore/GPU.&lt;br /&gt;
** Implies&lt;br /&gt;
*** Getting the order of application right for the lapping filters.&lt;br /&gt;
*** Mandatory slicing? Maybe some kind of multilevel entropy coding to reduce redundancy between slices while minimizing the single-threaded portion of decode.&lt;br /&gt;
&lt;br /&gt;
* Using PVQ and energy conservation: see http://jmvalin.ca/video/video_pvq_v3.pdf&lt;br /&gt;
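&lt;br /&gt;
As a rough illustration of what a PVQ codebook looks like: pyramid vector quantization codes integer vectors of dimension N whose absolute values sum to a fixed pulse count K. The sketch below only counts how many such vectors exist, using a standard recurrence. The function name is hypothetical, and nothing here is taken from the linked paper or from any Daala code.&lt;br /&gt;

```python
from functools import lru_cache

# Sketch: size of a PVQ codebook, i.e. the number of integer vectors of
# dimension n whose absolute values sum to exactly k pulses.
# The function name is an assumption made for this example.

@lru_cache(maxsize=None)
def pvq_codebook_size(n, k):
    """Count integer vectors of dimension n with k pulses in total."""
    if k == 0:
        return 1  # only the all-zero vector
    if n == 0:
        return 0  # pulses left over but no dimensions to put them in
    return (pvq_codebook_size(n - 1, k)
            + pvq_codebook_size(n, k - 1)
            + pvq_codebook_size(n - 1, k - 1))
```

The base-2 logarithm of this count is the number of bits needed to index one codeword at a uniform rate, which is one way to relate the pulse count K to bitrate.&lt;br /&gt;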
&lt;br /&gt;
==Techniques applicable to inter frames==&lt;br /&gt;
* Using x264 as a test-bed Jason and Loren demonstrated 15% rate/distortion improvements from using 10-bit intermediaries and references, estimated as being 1/3rd from quality calculation in the 10-bit space, 1/3rd from the higher precision references, and 1/3rd from higher intermediate precision in calculations (e.g. MC filter processing).&lt;br /&gt;
** Increased reference precision competes for memory with increased number of references. The improvements demonstrated appear to be a greater win than increasing the reference count once there are four references or so.&lt;br /&gt;
* Super-resolution techniques for motion-compensation references have been discussed— in particular it appears that the half-pel location is where intelligent filtering matters the most so staged computation could be effectively used to allow more expensive filtering at that level.&lt;br /&gt;
** Edge-directed interpolation techniques might be effectively applied to increase motion compensation accuracy, but most of the techniques known to be very effective are too slow.&lt;br /&gt;
** Speculation has been offered that a significant part of MC inaccuracy may be due to blending in a physically incorrect (gamma-corrected) space, though no real conclusions were made. Academic papers on motion compensation accuracy seem to have ignored this issue.&lt;br /&gt;
* Timothy has an example code base for a variable partition size blocking-free motion compensation scheme which merges OBMC (overlapped block motion compensation) and CGI (control-grid interpolation) with an interesting prediction/sub-division scheme and whole-frame trellis optimization of motion vectors. (daala-exp)&lt;br /&gt;
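&lt;br /&gt;
The speculation about blending in gamma-corrected space can be illustrated with a small sketch. It assumes a pure power-law transfer with exponent 2.2, which is a simplification chosen for the example (the notes do not name a transfer curve), and all function names are hypothetical.&lt;br /&gt;

```python
# Sketch: average two 8-bit samples in gamma space versus in linear light.
# GAMMA = 2.2 is an assumed pure power law, not a curve named in the notes.
GAMMA = 2.2

def to_linear(code):
    """Map an 8-bit gamma-coded sample to linear light in [0, 1]."""
    return (code / 255.0) ** GAMMA

def to_gamma(linear):
    """Map linear light in [0, 1] back to an 8-bit gamma-coded value."""
    return 255.0 * linear ** (1.0 / GAMMA)

def blend_gamma_space(a, b):
    """Half-pel style average taken directly on the coded values."""
    return (a + b) / 2.0

def blend_linear_light(a, b):
    """The same average taken on linear light, then re-encoded."""
    return to_gamma((to_linear(a) + to_linear(b)) / 2.0)
```

For a black sample next to a white one, the two functions give noticeably different results, while for two equal samples they agree. Whether this difference accounts for a meaningful share of MC inaccuracy is exactly the open question raised above.&lt;br /&gt;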
&lt;br /&gt;
==Basic features==&lt;br /&gt;
* YUV 4:4:4, 4:2:2, 4:2:0 subsamplings, 8-bit, 10-bit.&lt;br /&gt;
* Alpha channel — need testing material!&lt;br /&gt;
* 8-bit RGB compatible mode? (e.g. YCoCg, internally or at least flagging for it)&lt;br /&gt;
* Efficient 3D? — need testing material!&lt;br /&gt;
* Lossless?&lt;br /&gt;
** The value of this is disputable. If nothing else it's arguable that stuffing lossless into a lossy format may be the only way to get lossless into many people's hands. Also, see below&lt;br /&gt;
* Good support for decode side droppable frames?&lt;br /&gt;
** Hopefully the referencing structure will be flexible enough to enable this even if it's not an intentional feature.&lt;br /&gt;
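&lt;br /&gt;
For the RGB-compatible mode, one candidate is the reversible lifting variant of YCoCg, usually called YCoCg-R. The sketch below is illustrative only: the function names are hypothetical, and the notes do not say which YCoCg variant, if any, would be adopted. Note that Co and Cg need one more bit than the RGB inputs.&lt;br /&gt;

```python
# Sketch of the reversible YCoCg-R lifting transform on integer samples.
# Function names are hypothetical. "// 2" is floor division, which matches
# an arithmetic right shift by one bit.

def rgb_to_ycocg_r(r, g, b):
    """Forward lifting steps: RGB to (Y, Co, Cg)."""
    co = r - b
    t = b + co // 2
    cg = g - t
    y = t + cg // 2
    return y, co, cg

def ycocg_r_to_rgb(y, co, cg):
    """Inverse lifting steps, undoing the forward steps in reverse order."""
    t = y - cg // 2
    g = cg + t
    b = t - co // 2
    r = b + co
    return r, g, b
```

Because each lifting step is undone exactly by its inverse, the round trip is lossless for any integer input.&lt;br /&gt;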
&lt;br /&gt;
==Frills==&lt;br /&gt;
* Optionally storing a checksum of the expected decoded frame for decoder/encoder mismatch detection.&lt;br /&gt;
* Expose the number of referential descendants of a given frame (or even the whole reference DAG) for most efficient allocation of FEC.&lt;br /&gt;
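&lt;br /&gt;
Counting referential descendants could look roughly like the sketch below. The reference table is made-up example data, the function name is hypothetical, and how such counts would be signalled in a bitstream is left open.&lt;br /&gt;

```python
# Sketch: count the frames that depend, directly or transitively, on each frame.
# The reference table at the bottom is invented example data, not a real stream.

def count_descendants(references):
    """references maps a frame id to the list of frame ids it predicts from."""
    # Invert the table: for each frame, which frames reference it directly?
    children = {frame: [] for frame in references}
    for frame, refs in references.items():
        for ref in refs:
            children[ref].append(frame)

    def descendants(frame, seen):
        # Depth-first walk collecting every frame reachable through children.
        for child in children[frame]:
            if child not in seen:
                seen.add(child)
                descendants(child, seen)
        return seen

    return {frame: len(descendants(frame, set())) for frame in references}

# Frame 0 is intra; frames 3 and 4 both predict from frame 2.
example = {0: [], 1: [0], 2: [0, 1], 3: [2], 4: [2]}
counts = count_descendants(example)
```

A sender could then give more FEC protection to frames with higher counts, since losing them damages more of the stream.&lt;br /&gt;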
&lt;br /&gt;
==Wingdings==&lt;br /&gt;
Crazy crap that might be interesting or at least fun to make fun of... &lt;br /&gt;
* &amp;gt;10bit?&lt;br /&gt;
** Use cases don't seem well enough defined yet. Significant complexity. Any prospective hardware developer may hire assassins.&lt;br /&gt;
** Possible compromise: the video reference structure contains a backbone that can be decoded at only N bits of depth (e.g. 10), and higher precisions are only supported outside of this reference chain.&lt;br /&gt;
**# Precision by truncation: decode is performed twice on each frame, identically, at low and high precision.  The only difference between them is the bit-depth of the transform, or possibly of the transform and MC filters.  Only low-precision outputs can be referenced by subsequent frames.  Useful if high-precision content is still worth watching at low precision.&lt;br /&gt;
**# Precision by gamma: decode is performed once at low precision as normal.  Then the output frame is converted to linear-light at high precision, after which another layer of residuals is added.  The second layer can be permitted to reference previous high-precision frames... tricky to use both sets of references though. Useful if high precision is used for storing linear data, but people still want to watch it on &amp;quot;low-end&amp;quot; hardware.&lt;br /&gt;
* Some high-end digital cameras operate JPEG derivatives in a special mode that keeps the image in the native linear RGB Bayer format in order to avoid lossy/slow demosaicing on the camera. In particular, this allows white balancing in post without excessive loss. Probably out of scope for Daala itself.&lt;br /&gt;
** Bayer, 4:2:0, 4:2:2, and Interlacing are all special cases of a more general pattern in which the output frames are decimated/subsampled in a regular fashion.  All such subsamplings could be supported by a unified framework in which the video is always stored with all planes fully sampled, with a header indicating the recommended subsampling for display.  In such cases, the encoder can regard the transform as highly overcomplete, and simply ignore unneeded coefficients (presumably by leaving high frequency residuals coded as zero).  This structure would in effect turn the codec into a motion-compensated interpolating/deinterlacing filter.  Whether this approach is sensible presumably depends in part on how the transform is structured.  It would be especially easy if the transform's highest-frequencies were coded by a wavelet-like layer.&lt;br /&gt;
* Lossless intra-ability: The ability to losslessly rewrite any frame as an intra frame (perhaps with significant bitrate overhead) in order to make frame accurate cuts possible.&lt;br /&gt;
** Or best handled by making sure that containers have working pre-roll, but presumably common GOP sizes will be greater than the number of references so even if losslessly reencoding the references is expensive it may be cheaper than pre-roll. Do both?&lt;br /&gt;
** Can be had for 'free' if lossless is supported, plus the right header flags to restuff the references from lossless copies in a packed hidden frame.&lt;br /&gt;
** Use of explicit (rather than staged) super-resolution and/or deeper references may make this functionality unattractive due to increased overhead.&lt;br /&gt;
* Internal overlays which could be swapped without re-encoding? (e.g. advertising, station ID). Could also be automatically generated by a Sufficiently Advanced™ encoder to improve efficiencies for static sprites over moving backgrounds.&lt;br /&gt;
** Complicates making the complexity bounded. No Sufficiently Advanced™ encoder likely to ever exist. But perhaps the station id/advertising uses fully justify this.&lt;br /&gt;
** Could be done externally to the video codec, but if so it's not likely to be useful for anyone ever.&lt;br /&gt;
* A secondary reference implementation in OpenCL, maintained throughout development, to make sure that the codec is GPU-friendly and can be done efficiently using OpenCL primitives.&lt;br /&gt;
* SWAR-friendly arithmetic.  For example, choosing transform coefficients so that no intermediate product overflows 16 bits (tricky for signed values) can sometimes enable (e.g.) 4 parallel operations in one uint64_t.  This can allow a pure C reference implementation to run faster, which is valuable for initial adoption and ports to new platforms.&lt;br /&gt;
* Parametric decode-side blur.&lt;br /&gt;
** Symmetrical blur in regions that are smooth on scales longer than the block size.  Could be signaled or derived from observed DC values.&lt;br /&gt;
** Motion blur so that moving objects are blurred along the motion vector.  May require coding a shutter speed parameter (0..1 as a fraction of the inter-frame interval).&lt;br /&gt;
* Fancy block property prediction.  (Not clear how these predictions interact with intra prediction.)&lt;br /&gt;
** Predict block properties (quantizer, energy, etc.) from MV.  A (0,0) MV probably means a small delta.  Larger MVs may correspond to larger deltas, although at low shutter speeds large MVs may correlate with reduced overall HF energy.&lt;br /&gt;
** Predict delta spectral shape from source block spectral shape.  HF/LF ratio of the delta may be correlated with the same ratio in its source blocks.  Works well with decode-side fDCT.&lt;br /&gt;
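&lt;br /&gt;
The SWAR idea from the list above can be sketched with plain integer arithmetic. This is a toy model under stated assumptions: lanes are unsigned, no lane result reaches 65536, and Python integers stand in for a uint64_t (Python does not enforce the 64-bit limit). The helper names are hypothetical.&lt;br /&gt;

```python
# Sketch: four unsigned 16-bit lanes packed into one 64-bit word.
# Multiplication, division and remainder by 65536 stand in for shifts and masks.
LANE = 2 ** 16  # each lane holds an unsigned value below 65536

def pack(lanes):
    """Pack four unsigned lane values, lowest lane first, into one integer."""
    word = 0
    for value in reversed(lanes):
        word = word * LANE + value
    return word

def unpack(word):
    """Split a packed word back into its four lane values, lowest lane first."""
    lanes = []
    for _ in range(4):
        word, value = divmod(word, LANE)
        lanes.append(value)
    return lanes

a = pack([100, 200, 300, 400])
b = pack([1, 2, 3, 4])
summed = unpack(a + b)  # one addition updates all four lanes
scaled = unpack(a * 3)  # one multiplication scales all four lanes
```

The results are only correct because no lane overflows into its neighbour, which is why the choice of transform coefficients matters. Signed values need extra care that this sketch does not attempt.&lt;br /&gt;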
&lt;br /&gt;
==Negative results==&lt;br /&gt;
&lt;br /&gt;
* Using Kurtosis for detecting text in a frame&lt;br /&gt;
** The idea was to detect a Bernoulli distribution, but the measure is not robust and is too noisy.&lt;/div&gt;</summary>
		<author><name>Rillian</name></author>	</entry>

	<entry>
		<id>http://wiki.xiph.org/Daala</id>
		<title>Daala</title>
		<link rel="alternate" type="text/html" href="http://wiki.xiph.org/Daala"/>
				<updated>2012-07-20T22:11:30Z</updated>
		
		<summary type="html">&lt;p&gt;Rillian: /* Weekly meetings */ link minutes from this week's meeting&lt;/p&gt;
&lt;hr /&gt;
&lt;div&gt;Daala is the current working name of a next generation video codec— to be renamed once someone insists on something better. So far the best proposed alternative is PatentCake.&lt;br /&gt;
&lt;br /&gt;
For now the purpose of this page is to collect notes about things which have been discussed in informal public IRC discussion about the next generation initiative. Participants in these discussions have included Timothy Terriberry, Jason Garrett-Glaser, Loren Merritt, Ben Schwartz, Greg Maxwell, and others.&lt;br /&gt;
&lt;br /&gt;
See also: [https://xiph.org/daala/ https://xiph.org/daala/]&lt;br /&gt;
&lt;br /&gt;
== Weekly meetings ==&lt;br /&gt;
&lt;br /&gt;
We've been having weekly progress meetings on mumble.&lt;br /&gt;
&lt;br /&gt;
* 2012 June 4  [https://people.xiph.org/~giles/2012/daala_20120604.txt minutes] (actually a work week)&lt;br /&gt;
* 2012 June 22 [https://people.xiph.org/~giles/2012/daala_20120622.txt minutes]&lt;br /&gt;
* 2012 June 29 [https://people.xiph.org/~giles/2012/daala_20120629.txt minutes] [https://people.xiph.org/~giles/2012/daala_20120629.opus recording]&lt;br /&gt;
* 2012 July 6 [https://people.xiph.org/~giles/2012/daala_20120706.txt minutes]&lt;br /&gt;
* 2012 July 13 [https://people.xiph.org/~giles/2012/daala_20120713.txt minutes]&lt;br /&gt;
* 2012 July 20 [https://people.xiph.org/~giles/2012/daala_20120720.txt minutes]&lt;br /&gt;
&lt;br /&gt;
= Techniques =&lt;br /&gt;
&lt;br /&gt;
The overall structure discussed so far is a variable-size, block-based lapped-DCT codec. Lapping is done via pre/post filtering with a specially structured (lifting) linear-phase transform along the block edges, combined with overlapped block motion compensation and the expected trimmings. The lapping can be optimized for energy compaction and other useful properties, including invertibility, and yields excellent results with efficient finite-precision math.&lt;br /&gt;
&lt;br /&gt;
Other components which have been discussed include:&lt;br /&gt;
&lt;br /&gt;
==Techniques applicable to all frame types==&lt;br /&gt;
* Multisymbol arithmetic coding &lt;br /&gt;
** Timothy has some trial code showing speed-up proportional to the number of bits coded at once. (ec_test.c)&lt;br /&gt;
* Mode prediction using the previously decoded data, e.g. coding the mode using a probability function derived from trained predictors on the surrounding blocks. &lt;br /&gt;
** This will be terrible for robustness but may significantly reduce signalling overhead, allowing many more modes, and provide continuous adaptation between signalling free and fully signalled modes.&lt;br /&gt;
* Explore Legendre polynomial basis transforms instead of DCT&lt;br /&gt;
** May have better perceptual properties and/or result in 'less compromised' efficient implementations.  &lt;br /&gt;
* Coefficient domain prediction to allow efficient energy preserving quantization.&lt;br /&gt;
* Variable partition size/shape and the use of good predictors appears to remove most of the benefit of directional transforms.&lt;br /&gt;
** Perhaps 45deg is still useful?&lt;br /&gt;
** How does this change with partition sizes? Directional transforms are clearly not that useful with 4x4. &lt;br /&gt;
* Transform-post filtering to allow merging smaller transform blocks (like TF merging in CELT) may allow more flexible partitioning than outright using mixed block sizes.&lt;br /&gt;
* Perturbed quantization mode-signalling has been discussed but mostly laughed at. ;) &lt;br /&gt;
* Special block modes well suited to solid color/cartoon like content— avoiding ringing.&lt;br /&gt;
** Are pixel prediction modes too slow?&lt;br /&gt;
* In general, what Markov random field techniques can be applied with acceptable performance? Any?&lt;br /&gt;
* Designed for parallel encode and decode within each frame&lt;br /&gt;
** Important because&lt;br /&gt;
*** the proposed techniques need a lot more CPU than H.264 and VP8 for both encode and decode&lt;br /&gt;
*** Moore's law for single-threaded throughput is dead.  Future hardware is all multicore/GPU.&lt;br /&gt;
** Implies&lt;br /&gt;
*** Getting the order of application right for the lapping filters.&lt;br /&gt;
*** Mandatory slicing? Maybe some kind of multilevel entropy coding to reduce redundancy between slices while minimizing the single-threaded portion of decode.&lt;br /&gt;
&lt;br /&gt;
* Using PVQ and energy conservation: see http://jmvalin.ca/video/video_pvq_v3.pdf&lt;br /&gt;
&lt;br /&gt;
==Techniques applicable to inter frames==&lt;br /&gt;
* Using x264 as a test-bed Jason and Loren demonstrated 15% rate/distortion improvements from using 10-bit intermediaries and references, estimated as being 1/3rd from quality calculation in the 10-bit space, 1/3rd from the higher precision references, and 1/3rd from higher intermediate precision in calculations (e.g. MC filter processing).&lt;br /&gt;
** Increased reference precision competes for memory with increased number of references. The improvements demonstrated appear to be a greater win than increasing the reference count once there are four references or so.&lt;br /&gt;
* Super-resolution techniques for motion-compensation references have been discussed— in particular it appears that the half-pel location is where intelligent filtering matters the most so staged computation could be effectively used to allow more expensive filtering at that level.&lt;br /&gt;
** Edge-directed interpolation techniques might be effectively applied to increase motion compensation accuracy, but most of the techniques known to be very effective are too slow.&lt;br /&gt;
** Speculation has been offered that a significant part of MC inaccuracy may be due to blending in a physically incorrect (gamma-corrected) space, though no real conclusions were made. Academic papers on motion compensation accuracy seem to have ignored this issue.&lt;br /&gt;
* Timothy has an example code base for a variable partition size blocking-free motion compensation scheme which merges OBMC (overlapped block motion compensation) and CGI (control-grid interpolation) with an interesting prediction/sub-division scheme and whole-frame trellis optimization of motion vectors. (daala-exp)&lt;br /&gt;
&lt;br /&gt;
==Basic features==&lt;br /&gt;
* YUV 4:4:4, 4:2:2, 4:2:0 subsamplings, 8-bit, 10-bit.&lt;br /&gt;
* Alpha channel — need testing material!&lt;br /&gt;
* 8-bit RGB compatible mode? (e.g. YCoCg, internally or at least flagging for it)&lt;br /&gt;
* Efficient 3D? — need testing material!&lt;br /&gt;
* Lossless?&lt;br /&gt;
** The value of this is disputable. If nothing else it's arguable that stuffing lossless into a lossy format may be the only way to get lossless into many people's hands. Also, see below&lt;br /&gt;
* Good support for decode side droppable frames?&lt;br /&gt;
** Hopefully the referencing structure will be flexible enough to enable this even if it's not an intentional feature.&lt;br /&gt;
&lt;br /&gt;
==Frills==&lt;br /&gt;
* Optionally storing a checksum of the expected decoded frame for decoder/encoder mismatch detection.&lt;br /&gt;
* Expose the number of referential descendants of a given frame (or even the whole reference DAG) for most efficient allocation of FEC.&lt;br /&gt;
&lt;br /&gt;
==Wingdings==&lt;br /&gt;
Crazy crap that might be interesting or at least fun to make fun of... &lt;br /&gt;
* &amp;gt;10bit?&lt;br /&gt;
** Use cases don't seem well enough defined yet. Significant complexity. Any prospective hardware developer may hire assassins.&lt;br /&gt;
** Possible compromise: the video reference structure contains a backbone that can be decoded at only N bits of depth (e.g. 10), and higher precisions are only supported outside of this reference chain.&lt;br /&gt;
**# Precision by truncation: decode is performed twice on each frame, identically, at low and high precision.  The only difference between them is the bit-depth of the transform, or possibly of the transform and MC filters.  Only low-precision outputs can be referenced by subsequent frames.  Useful if high-precision content is still worth watching at low precision.&lt;br /&gt;
**# Precision by gamma: decode is performed once at low precision as normal.  Then the output frame is converted to linear-light at high precision, after which another layer of residuals is added.  The second layer can be permitted to reference previous high-precision frames... tricky to use both sets of references though. Useful if high precision is used for storing linear data, but people still want to watch it on &amp;quot;low-end&amp;quot; hardware.&lt;br /&gt;
* Some high-end digital cameras operate JPEG derivatives in a special mode that keeps the image in the native linear RGB Bayer format in order to avoid lossy/slow demosaicing on the camera. In particular, this allows white balancing in post without excessive loss. Probably out of scope for Daala itself.&lt;br /&gt;
** Bayer, 4:2:0, 4:2:2, and Interlacing are all special cases of a more general pattern in which the output frames are decimated/subsampled in a regular fashion.  All such subsamplings could be supported by a unified framework in which the video is always stored with all planes fully sampled, with a header indicating the recommended subsampling for display.  In such cases, the encoder can regard the transform as highly overcomplete, and simply ignore unneeded coefficients (presumably by leaving high frequency residuals coded as zero).  This structure would in effect turn the codec into a motion-compensated interpolating/deinterlacing filter.  Whether this approach is sensible presumably depends in part on how the transform is structured.  It would be especially easy if the transform's highest-frequencies were coded by a wavelet-like layer.&lt;br /&gt;
* Lossless intra-ability: The ability to losslessly rewrite any frame as an intra frame (perhaps with significant bitrate overhead) in order to make frame accurate cuts possible.&lt;br /&gt;
** Or best handled by making sure that containers have working pre-roll, but presumably common GOP sizes will be greater than the number of references so even if losslessly reencoding the references is expensive it may be cheaper than pre-roll. Do both?&lt;br /&gt;
** Can be had for 'free' if lossless is supported, plus the right header flags to restuff the references from lossless copies in a packed hidden frame.&lt;br /&gt;
** Use of explicit (rather than staged) super-resolution and/or deeper references may make this functionality unattractive due to increased overhead.&lt;br /&gt;
* Internal overlays which could be swapped without re-encoding? (e.g. advertising, station ID). Could also be automatically generated by a Sufficiently Advanced™ encoder to improve efficiencies for static sprites over moving backgrounds.&lt;br /&gt;
** Complicates making the complexity bounded. No Sufficiently Advanced™ encoder likely to ever exist. But perhaps the station id/advertising uses fully justify this.&lt;br /&gt;
** Could be done externally to the video codec, but if so it's not likely to be useful for anyone ever.&lt;br /&gt;
* A secondary reference implementation in OpenCL, maintained throughout development, to make sure that the codec is GPU-friendly and can be done efficiently using OpenCL primitives.&lt;br /&gt;
* SWAR-friendly arithmetic.  For example, choosing transform coefficients so that no intermediate product overflows 16 bits (tricky for signed values) can sometimes enable (e.g.) 4 parallel operations in one uint64_t.  This can allow a pure C reference implementation to run faster, which is valuable for initial adoption and ports to new platforms.&lt;br /&gt;
* Parametric decode-side blur.&lt;br /&gt;
** Symmetrical blur in regions that are smooth on scales longer than the block size.  Could be signaled or derived from observed DC values.&lt;br /&gt;
** Motion blur so that moving objects are blurred along the motion vector.  May require coding a shutter speed parameter (0..1 as a fraction of the inter-frame interval).&lt;br /&gt;
* Fancy block property prediction.  (Not clear how these predictions interact with intra prediction.)&lt;br /&gt;
** Predict block properties (quantizer, energy, etc.) from MV.  A (0,0) MV probably means a small delta.  Larger MVs may correspond to larger deltas, although at low shutter speeds large MVs may correlate with reduced overall HF energy.&lt;br /&gt;
** Predict delta spectral shape from source block spectral shape.  HF/LF ratio of the delta may be correlated with the same ratio in its source blocks.  Works well with decode-side fDCT.&lt;br /&gt;
&lt;br /&gt;
==Negative results==&lt;br /&gt;
&lt;br /&gt;
* Using Kurtosis for detecting text in a frame&lt;br /&gt;
** The idea was to detect a Bernoulli distribution, but the measure is not robust and is too noisy.&lt;/div&gt;</summary>
		<author><name>Rillian</name></author>	</entry>

	<entry>
		<id>http://wiki.xiph.org/Daala</id>
		<title>Daala</title>
		<link rel="alternate" type="text/html" href="http://wiki.xiph.org/Daala"/>
				<updated>2012-07-20T19:19:53Z</updated>
		
		<summary type="html">&lt;p&gt;Rillian: /* Weekly meetings */ upload minutes from last week's meeting. I didn't make a recording.&lt;/p&gt;
&lt;hr /&gt;
&lt;div&gt;Daala is the current working name of a next generation video codec— to be renamed once someone insists on something better. So far the best proposed alternative is PatentCake.&lt;br /&gt;
&lt;br /&gt;
For now the purpose of this page is to collect notes about things which have been discussed in informal public IRC discussion about the next generation initiative. Participants in these discussions have included Timothy Terriberry, Jason Garrett-Glaser, Loren Merritt, Ben Schwartz, Greg Maxwell, and others.&lt;br /&gt;
&lt;br /&gt;
See also: [https://xiph.org/daala/ https://xiph.org/daala/]&lt;br /&gt;
&lt;br /&gt;
== Weekly meetings ==&lt;br /&gt;
&lt;br /&gt;
We've been having weekly progress meetings on mumble.&lt;br /&gt;
&lt;br /&gt;
* 2012 June 4  [https://people.xiph.org/~giles/2012/daala_20120604.txt minutes] (actually a work week)&lt;br /&gt;
* 2012 June 22 [https://people.xiph.org/~giles/2012/daala_20120622.txt minutes]&lt;br /&gt;
* 2012 June 29 [https://people.xiph.org/~giles/2012/daala_20120629.txt minutes] [https://people.xiph.org/~giles/2012/daala_20120629.opus recording]&lt;br /&gt;
* 2012 July 6 [https://people.xiph.org/~giles/2012/daala_20120706.txt minutes]&lt;br /&gt;
* 2012 July 13 [https://people.xiph.org/~giles/2012/daala_20120713.txt minutes]&lt;br /&gt;
&lt;br /&gt;
= Techniques =&lt;br /&gt;
&lt;br /&gt;
The overall structure discussed so far is a variable-size, block-based lapped-DCT codec. Lapping is done via pre/post filtering with a specially structured (lifting) linear-phase transform along the block edges, combined with overlapped block motion compensation and the expected trimmings. The lapping can be optimized for energy compaction and other useful properties, including invertibility, and yields excellent results with efficient finite-precision math.&lt;br /&gt;
&lt;br /&gt;
Other components which have been discussed include:&lt;br /&gt;
&lt;br /&gt;
==Techniques applicable to all frame types==&lt;br /&gt;
* Multisymbol arithmetic coding &lt;br /&gt;
** Timothy has some trial code showing speed-up proportional to the number of bits coded at once. (ec_test.c)&lt;br /&gt;
* Mode prediction using the previously decoded data, e.g. coding the mode using a probability function derived from trained predictors on the surrounding blocks. &lt;br /&gt;
** This will be terrible for robustness but may significantly reduce signalling overhead, allowing many more modes, and provide continuous adaptation between signalling free and fully signalled modes.&lt;br /&gt;
* Explore Legendre polynomial basis transforms instead of DCT&lt;br /&gt;
** May have better perceptual properties and/or result in 'less compromised' efficient implementations.  &lt;br /&gt;
* Coefficient domain prediction to allow efficient energy preserving quantization.&lt;br /&gt;
* Variable partition size/shape and the use of good predictors appears to remove most of the benefit of directional transforms.&lt;br /&gt;
** Perhaps 45deg is still useful?&lt;br /&gt;
** How does this change with partition sizes? Directional transforms are clearly not that useful with 4x4. &lt;br /&gt;
* Transform-post filtering to allow merging smaller transform blocks (like TF merging in CELT) may allow more flexible partitioning than outright using mixed block sizes.&lt;br /&gt;
* Perturbed quantization mode-signalling has been discussed but mostly laughed at. ;) &lt;br /&gt;
* Special block modes well suited to solid color/cartoon like content— avoiding ringing.&lt;br /&gt;
** Are pixel prediction modes too slow?&lt;br /&gt;
* In general, what Markov random field techniques can be applied with acceptable performance? Any?&lt;br /&gt;
* Designed for parallel encode and decode within each frame&lt;br /&gt;
** Important because&lt;br /&gt;
*** the proposed techniques need a lot more CPU than H.264 and VP8 for both encode and decode&lt;br /&gt;
*** Moore's law for single-threaded throughput is dead.  Future hardware is all multicore/GPU.&lt;br /&gt;
** Implies&lt;br /&gt;
*** Getting the order of application right for the lapping filters.&lt;br /&gt;
*** Mandatory slicing? Maybe some kind of multilevel entropy coding to reduce redundancy between slices while minimizing the single-threaded portion of decode.&lt;br /&gt;
&lt;br /&gt;
* Using PVQ and energy conservation: see http://jmvalin.ca/video/video_pvq_v3.pdf&lt;br /&gt;
&lt;br /&gt;
==Techniques applicable to inter frames==&lt;br /&gt;
* Using x264 as a test-bed Jason and Loren demonstrated 15% rate/distortion improvements from using 10-bit intermediaries and references, estimated as being 1/3rd from quality calculation in the 10-bit space, 1/3rd from the higher precision references, and 1/3rd from higher intermediate precision in calculations (e.g. MC filter processing).&lt;br /&gt;
** Increased reference precision competes for memory with increased number of references. The improvements demonstrated appear to be a greater win than increasing the reference count once there are four references or so.&lt;br /&gt;
* Super-resolution techniques for motion-compensation references have been discussed— in particular it appears that the half-pel location is where intelligent filtering matters the most so staged computation could be effectively used to allow more expensive filtering at that level.&lt;br /&gt;
** Edge-directed interpolation techniques might be effectively applied to increase motion compensation accuracy, but most of the techniques known to be very effective are too slow.&lt;br /&gt;
** Speculation has been offered that a significant part of MC inaccuracy may be due to blending in a physically incorrect (gamma-corrected) space, though no real conclusions were made. Academic papers on motion compensation accuracy seem to have ignored this issue.&lt;br /&gt;
* Timothy has an example code base for a variable partition size blocking-free motion compensation scheme which merges OBMC (overlapped block motion compensation) and CGI (control-grid interpolation) with an interesting prediction/sub-division scheme and whole-frame trellis optimization of motion vectors. (daala-exp)&lt;br /&gt;
&lt;br /&gt;
==Basic features==&lt;br /&gt;
* YUV 4:4:4, 4:2:2, and 4:2:0 subsampling at 8-bit and 10-bit depth.&lt;br /&gt;
* Alpha channel — need testing material!&lt;br /&gt;
* 8-bit RGB compatible mode? (e.g. YCoCg, internally or at least flagging for it)&lt;br /&gt;
* Efficient 3D? — need testing material!&lt;br /&gt;
* Lossless?&lt;br /&gt;
** The value of this is disputable. If nothing else it's arguable that stuffing lossless into a lossy format may be the only way to get lossless into many people's hands. Also, see below&lt;br /&gt;
* Good support for decode side droppable frames?&lt;br /&gt;
** Hopefully the referencing structure will be flexible enough to enable this even if it's not an intentional feature.&lt;br /&gt;
&lt;br /&gt;
==Frills==&lt;br /&gt;
* Optionally storing a checksum of the expected decoded frame for decoder/encoder mismatch detection.&lt;br /&gt;
* Expose the number of referential descendants of a given frame (or even the whole reference DAG) for most efficient allocation of FEC.&lt;br /&gt;
&lt;br /&gt;
==Wingdings==&lt;br /&gt;
Crazy crap that might be interesting or at least fun to make fun of... &lt;br /&gt;
* &amp;gt;10bit?&lt;br /&gt;
** Use cases don't seem well enough defined yet. Significant complexity. Any prospective hardware developer may hire assassins.&lt;br /&gt;
** Possible compromise: the video reference structure contains a backbone that can be decoded at only N bits of depth (e.g. 10), and higher precisions are only supported outside of this reference chain.&lt;br /&gt;
**# Precision by truncation: decode is performed twice on each frame, identically, at low and high precision.  The only difference between them is the bit-depth of the transform, or possibly of the transform and MC filters.  Only low-precision outputs can be referenced by subsequent frames.  Useful if high-precision content is still worth watching at low precision.&lt;br /&gt;
**# Precision by gamma: decode is performed once at low precision as normal.  Then the output frame is converted to linear-light at high precision, after which another layer of residuals is added.  The second layer can be permitted to reference previous high-precision frames... tricky to use both sets of references though. Useful if high precision is used for storing linear data, but people still want to watch it on &amp;quot;low-end&amp;quot; hardware.&lt;br /&gt;
* Some high end digital cameras are operating jpeg-derivatives in a special mode that keeps the image in the native linear RGB bayer format in order to avoid lossy/slow demosaicing on the camera. In particular this allows white balancing in post without excessive loss. Probably out of scope for Daala itself.&lt;br /&gt;
** Bayer, 4:2:0, 4:2:2, and Interlacing are all special cases of a more general pattern in which the output frames are decimated/subsampled in a regular fashion.  All such subsamplings could be supported by a unified framework in which the video is always stored with all planes fully sampled, with a header indicating the recommended subsampling for display.  In such cases, the encoder can regard the transform as highly overcomplete, and simply ignore unneeded coefficients (presumably by leaving high frequency residuals coded as zero).  This structure would in effect turn the codec into a motion-compensated interpolating/deinterlacing filter.  Whether this approach is sensible presumably depends in part on how the transform is structured.  It would be especially easy if the transform's highest-frequencies were coded by a wavelet-like layer.&lt;br /&gt;
* Lossless intra-ability: The ability to losslessly rewrite any frame as an intra frame (perhaps with significant bitrate overhead) in order to make frame accurate cuts possible.&lt;br /&gt;
** Or best handled by making sure that containers have working pre-roll, but presumably common GOP sizes will be greater than the number of references so even if losslessly reencoding the references is expensive it may be cheaper than pre-roll. Do both?&lt;br /&gt;
** Can be had for 'free' if lossless is supported, plus the right header flags to restuff the references from lossless copies in a packed hidden frame.&lt;br /&gt;
** Use of explicit (rather than staged) super-resolution and/or deeper references may make this functionality unattractive due to increased overhead.&lt;br /&gt;
* Internal overlays which could be swapped without re-encoding? (e.g. advertising, station ID). Could also be automatically generated by a Sufficiently Advanced™ encoder to improve efficiencies for static sprites over moving backgrounds.&lt;br /&gt;
** Complicates making the complexity bounded. No Sufficiently Advanced™ encoder likely to ever exist. But perhaps the station id/advertising uses fully justify this.&lt;br /&gt;
** Could be done externally to the video codec, but if so it's not likely to be useful for anyone ever.&lt;br /&gt;
* A secondary reference implementation in OpenCL, maintained throughout development, to make sure that the codec is GPU-friendly and can be done efficiently using OpenCL primitives.&lt;br /&gt;
* SWAR-friendly arithmetic.  For example, choosing transform coefficients so that no intermediate product overflows 16 bits (tricky for signed values) can sometimes enable (e.g.) 4 parallel operations in one uint64_t.  This can allow a pure C reference implementation to run faster, which is valuable for initial adoption and ports to new platforms.&lt;br /&gt;
* Parametric decode-side blur.&lt;br /&gt;
** Symmetrical blur in regions that are smooth on scales longer than the block size.  Could be signaled or derived from observed DC values.&lt;br /&gt;
** Motion blur so that moving objects are blurred along the motion vector.  May require coding a shutter speed parameter (0..1 as a fraction of the inter-frame interval).&lt;br /&gt;
* Fancy block property prediction.  (Not clear how these predictions interact with intra pred)&lt;br /&gt;
** Predict block properties (quantizer, energy, etc.) from MV.  (0,0) probably means small delta.  Larger MV's may correspond to larger deltas ... although  at low shutter speeds large MVs may correlate with reduced overall HF energy.&lt;br /&gt;
** Predict delta spectral shape from source block spectral shape.  HF/LF ratio of the delta may be correlated with the same ratio in its source blocks.  Works well with decode-side fDCT.&lt;br /&gt;
&lt;br /&gt;
==Negative results==&lt;br /&gt;
&lt;br /&gt;
* Using Kurtosis for detecting text in a frame&lt;br /&gt;
** The idea was to detect a Bernoulli distribution but it's not robust and too noisy&lt;/div&gt;</summary>
		<author><name>Rillian</name></author>	</entry>

	<entry>
		<id>http://wiki.xiph.org/Daala</id>
		<title>Daala</title>
		<link rel="alternate" type="text/html" href="http://wiki.xiph.org/Daala"/>
				<updated>2012-07-09T21:35:18Z</updated>
		
		<summary type="html">&lt;p&gt;Rillian: /* Weekly meetings */ Minutes from the July 6 daala weekly workshop&lt;/p&gt;
&lt;hr /&gt;
&lt;div&gt;Daala is the current working name of a next generation video codec— to be renamed once someone insists on something better. So far the best proposed alternative is PatentCake.&lt;br /&gt;
&lt;br /&gt;
For now the purpose of this page is to collect notes about things which have been discussed in informal public IRC discussion about the next generation initiative. Participants in these discussions have included Timothy Terriberry, Jason Garrett-Glaser, Loren Merritt, Ben Schwartz, Greg Maxwell, and others. &lt;br /&gt;
&lt;br /&gt;
See also: [https://xiph.org/daala/ https://xiph.org/daala/]&lt;br /&gt;
&lt;br /&gt;
== Weekly meetings ==&lt;br /&gt;
&lt;br /&gt;
We've been having weekly progress meetings on mumble.&lt;br /&gt;
&lt;br /&gt;
* 2012 June 4  [https://people.xiph.org/~giles/2012/daala_20120604.txt minutes] (actually a work week)&lt;br /&gt;
* 2012 June 22 [https://people.xiph.org/~giles/2012/daala_20120622.txt minutes]&lt;br /&gt;
* 2012 June 29 [https://people.xiph.org/~giles/2012/daala_20120629.txt minutes] [https://people.xiph.org/~giles/2012/daala_20120629.opus recording]&lt;br /&gt;
* 2012 July 6 [https://people.xiph.org/~giles/2012/daala_20120706.txt minutes]&lt;br /&gt;
&lt;br /&gt;
= Techniques =&lt;br /&gt;
&lt;br /&gt;
The overall structure discussed so far is a variable-size, lapped-DCT, block-based codec. Lapping is done via pre/post filtering with a specially structured (lifting) linear-phase transform along the block edges, combined with overlapped block motion compensation and the expected trimmings. The lapping can be optimized for energy compaction and other useful properties, including invertibility, and yields excellent results with efficient finite-precision math.&lt;br /&gt;
&lt;br /&gt;
Other components which have been discussed include:&lt;br /&gt;
&lt;br /&gt;
==Techniques applicable to all frame types==&lt;br /&gt;
* Multisymbol arithmetic coding &lt;br /&gt;
** Timothy has some trial code showing speed-up proportional to the number of bits coded at once. (ec_test.c)&lt;br /&gt;
* Mode prediction using the previously decoded data, e.g. coding the mode using a probability function derived from trained predictors on the surrounding blocks. &lt;br /&gt;
** This will be terrible for robustness but may significantly reduce signalling overhead, allowing many more modes, and provide continuous adaptation between signalling free and fully signalled modes.&lt;br /&gt;
* Explore Legendre polynomial basis transforms instead of DCT&lt;br /&gt;
** May have better perceptual properties and/or result in 'less compromised' efficient implementations.  &lt;br /&gt;
* Coefficient domain prediction to allow efficient energy preserving quantization.&lt;br /&gt;
* Variable partition size/shape and the use of good predictors appears to remove most of the benefit of directional transforms.&lt;br /&gt;
** Perhaps 45deg is still useful?&lt;br /&gt;
** How does this change with partition sizes? Directional transforms are clearly not that useful with 4x4. &lt;br /&gt;
* Transform-post filtering to allow merging smaller transform blocks (like TF merging in CELT) may allow more flexible partitioning than outright using mixed block sizes.&lt;br /&gt;
* Perturbed quantization mode-signalling has been discussed but mostly laughed at. ;) &lt;br /&gt;
* Special block modes well suited to solid color/cartoon like content— avoiding ringing.&lt;br /&gt;
** Are pixel prediction modes too slow?&lt;br /&gt;
* In general, which Markov random field techniques can be applied with acceptable performance? Any?&lt;br /&gt;
* Designed for parallel encode and decode within each frame&lt;br /&gt;
** Important because&lt;br /&gt;
*** the proposed techniques need a lot more CPU than H.264 and VP8 for both encode and decode&lt;br /&gt;
*** Moore's law for single-threaded throughput is dead.  Future hardware is all multicore/GPU.&lt;br /&gt;
** Implies&lt;br /&gt;
*** Getting the order of application right for the lapping filters.&lt;br /&gt;
*** Mandatory slicing? Maybe some kind of multilevel entropy coding to reduce redundancy between slices while minimizing the single-threaded portion of decode.&lt;br /&gt;
&lt;br /&gt;
* Using PVQ and energy conservation: see http://jmvalin.ca/video/video_pvq_v3.pdf&lt;br /&gt;
&lt;br /&gt;
==Techniques applicable to inter frames==&lt;br /&gt;
* Using x264 as a test-bed Jason and Loren demonstrated 15% rate/distortion improvements from using 10-bit intermediaries and references, estimated as being 1/3rd from quality calculation in the 10-bit space, 1/3rd from the higher precision references, and 1/3rd from higher intermediate precision in calculations (e.g. MC filter processing).&lt;br /&gt;
** Increased reference precision competes for memory with increased number of references. The improvements demonstrated appear to be a greater win than increasing the reference count once there are four references or so.&lt;br /&gt;
* Super-resolution techniques for motion-compensation references have been discussed— in particular it appears that the half-pel location is where intelligent filtering matters the most so staged computation could be effectively used to allow more expensive filtering at that level.&lt;br /&gt;
** Edge-directed interpolation techniques might be effectively applied to increase motion compensation accuracy, but most of the techniques known to be very effective are too slow.&lt;br /&gt;
** Speculation has been offered that a significant part of MC inaccuracy may be due to blending in a physically incorrect (gamma-corrected) space, though no real conclusions were made. Academic papers on motion compensation accuracy seem to have ignored this issue.&lt;br /&gt;
* Timothy has an example code base for a variable partition size blocking-free motion compensation scheme which merges OBMC (overlapped block motion compensation) and CGI (control-grid interpolation) with an interesting prediction/sub-division scheme and whole-frame trellis optimization of motion vectors. (daala-exp)&lt;br /&gt;
&lt;br /&gt;
==Basic features==&lt;br /&gt;
* YUV 4:4:4, 4:2:2, and 4:2:0 subsampling at 8-bit and 10-bit depth.&lt;br /&gt;
* Alpha channel — need testing material!&lt;br /&gt;
* 8-bit RGB compatible mode? (e.g. YCoCg, internally or at least flagging for it)&lt;br /&gt;
* Efficient 3D? — need testing material!&lt;br /&gt;
* Lossless?&lt;br /&gt;
** The value of this is disputable. If nothing else it's arguable that stuffing lossless into a lossy format may be the only way to get lossless into many people's hands. Also, see below&lt;br /&gt;
* Good support for decode side droppable frames?&lt;br /&gt;
** Hopefully the referencing structure will be flexible enough to enable this even if it's not an intentional feature.&lt;br /&gt;
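&lt;br /&gt;
As an illustration of the RGB-compatible idea above, here is a minimal sketch of the reversible YCoCg-R lifting transform. This is one well-known candidate, not something the discussion settled on, and the function names are made up for this example. Note that Co and Cg need one more bit than the RGB input (they range over -255..255 for 8-bit input).&lt;br /&gt;
&lt;br /&gt;
```python
def rgb_to_ycocg_r(r, g, b):
    """Forward YCoCg-R lifting transform on integer samples."""
    co = r - b
    t = b + co // 2   # floor division behaves like an arithmetic right shift
    cg = g - t
    y = t + cg // 2
    return y, co, cg


def ycocg_r_to_rgb(y, co, cg):
    """Inverse transform: undoes the lifting steps in reverse order."""
    t = y - cg // 2
    g = cg + t
    b = t - co // 2
    r = b + co
    return r, g, b


if __name__ == "__main__":
    pixel = (255, 0, 0)
    coded = rgb_to_ycocg_r(*pixel)
    # The round trip is exact because each lifting step is undone exactly.
    print(coded, ycocg_r_to_rgb(*coded) == pixel)
```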
&lt;br /&gt;
==Frills==&lt;br /&gt;
* Optionally storing a checksum of the expected decoded frame for decoder/encoder mismatch detection.&lt;br /&gt;
* Expose the number of referential descendants of a given frame (or even the whole reference DAG) for most efficient allocation of FEC.&lt;br /&gt;
&lt;br /&gt;
==Wingdings==&lt;br /&gt;
Crazy crap that might be interesting or at least fun to make fun of... &lt;br /&gt;
* &amp;gt;10bit?&lt;br /&gt;
** Use cases don't seem well enough defined yet. Significant complexity. Any prospective hardware developer may hire assassins.&lt;br /&gt;
** Possible compromise: the video reference structure contains a backbone that can be decoded at only N bits of depth (e.g. 10), and higher precisions are only supported outside of this reference chain.&lt;br /&gt;
**# Precision by truncation: decode is performed twice on each frame, identically, at low and high precision.  The only difference between them is the bit-depth of the transform, or possibly of the transform and MC filters.  Only low-precision outputs can be referenced by subsequent frames.  Useful if high-precision content is still worth watching at low precision.&lt;br /&gt;
**# Precision by gamma: decode is performed once at low precision as normal.  Then the output frame is converted to linear-light at high precision, after which another layer of residuals is added.  The second layer can be permitted to reference previous high-precision frames... tricky to use both sets of references though. Useful if high precision is used for storing linear data, but people still want to watch it on &amp;quot;low-end&amp;quot; hardware.&lt;br /&gt;
* Some high end digital cameras are operating jpeg-derivatives in a special mode that keeps the image in the native linear RGB bayer format in order to avoid lossy/slow demosaicing on the camera. In particular this allows white balancing in post without excessive loss. Probably out of scope for Daala itself.&lt;br /&gt;
** Bayer, 4:2:0, 4:2:2, and Interlacing are all special cases of a more general pattern in which the output frames are decimated/subsampled in a regular fashion.  All such subsamplings could be supported by a unified framework in which the video is always stored with all planes fully sampled, with a header indicating the recommended subsampling for display.  In such cases, the encoder can regard the transform as highly overcomplete, and simply ignore unneeded coefficients (presumably by leaving high frequency residuals coded as zero).  This structure would in effect turn the codec into a motion-compensated interpolating/deinterlacing filter.  Whether this approach is sensible presumably depends in part on how the transform is structured.  It would be especially easy if the transform's highest-frequencies were coded by a wavelet-like layer.&lt;br /&gt;
* Lossless intra-ability: The ability to losslessly rewrite any frame as an intra frame (perhaps with significant bitrate overhead) in order to make frame accurate cuts possible.&lt;br /&gt;
** Or best handled by making sure that containers have working pre-roll, but presumably common GOP sizes will be greater than the number of references so even if losslessly reencoding the references is expensive it may be cheaper than pre-roll. Do both?&lt;br /&gt;
** Can be had for 'free' if lossless is supported, plus the right header flags to restuff the references from lossless copies in a packed hidden frame.&lt;br /&gt;
** Use of explicit (rather than staged) super-resolution and/or deeper references may make this functionality unattractive due to increased overhead.&lt;br /&gt;
* Internal overlays which could be swapped without re-encoding? (e.g. advertising, station ID). Could also be automatically generated by a Sufficiently Advanced™ encoder to improve efficiencies for static sprites over moving backgrounds.&lt;br /&gt;
** Complicates making the complexity bounded. No Sufficiently Advanced™ encoder likely to ever exist. But perhaps the station id/advertising uses fully justify this.&lt;br /&gt;
** Could be done externally to the video codec, but if so it's not likely to be useful for anyone ever.&lt;br /&gt;
* A secondary reference implementation in OpenCL, maintained throughout development, to make sure that the codec is GPU-friendly and can be done efficiently using OpenCL primitives.&lt;br /&gt;
* SWAR-friendly arithmetic.  For example, choosing transform coefficients so that no intermediate product overflows 16 bits (tricky for signed values) can sometimes enable (e.g.) 4 parallel operations in one uint64_t.  This can allow a pure C reference implementation to run faster, which is valuable for initial adoption and ports to new platforms.&lt;br /&gt;
* Parametric decode-side blur.&lt;br /&gt;
** Symmetrical blur in regions that are smooth on scales longer than the block size.  Could be signaled or derived from observed DC values.&lt;br /&gt;
** Motion blur so that moving objects are blurred along the motion vector.  May require coding a shutter speed parameter (0..1 as a fraction of the inter-frame interval).&lt;br /&gt;
* Fancy block property prediction.  (Not clear how these predictions interact with intra pred)&lt;br /&gt;
** Predict block properties (quantizer, energy, etc.) from MV.  (0,0) probably means small delta.  Larger MV's may correspond to larger deltas ... although  at low shutter speeds large MVs may correlate with reduced overall HF energy.&lt;br /&gt;
** Predict delta spectral shape from source block spectral shape.  HF/LF ratio of the delta may be correlated with the same ratio in its source blocks.  Works well with decode-side fDCT.&lt;br /&gt;
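&lt;br /&gt;
To make the SWAR item above concrete, here is a toy sketch that packs four unsigned 16-bit lanes into one 64-bit word and scales all of them with a single multiply. Python is used only for illustration (a real implementation would be C on uint64_t), the helper names are hypothetical, and the trick is only valid under the stated assumption that no per-lane product overflows 16 bits. Signed values need extra care and are not handled here.&lt;br /&gt;
&lt;br /&gt;
```python
LANE_BITS = 16
LANES = 4
WORD = 2 ** (LANE_BITS * LANES)  # emulate wrap-around of a 64-bit word


def pack(values):
    """Pack four unsigned 16-bit values into one 64-bit word, lane 0 lowest."""
    word = 0
    for i, v in enumerate(values):
        word += v * 2 ** (LANE_BITS * i)
    return word


def unpack(word):
    """Split a 64-bit word back into its four 16-bit lanes."""
    return [(word // 2 ** (LANE_BITS * i)) % 2 ** LANE_BITS for i in range(LANES)]


def swar_scale(word, k):
    """Multiply every lane by k with one multiply.

    Only correct if each lane times k still fits in 16 bits; otherwise
    the overflow carries into the neighbouring lane.
    """
    return (word * k) % WORD


if __name__ == "__main__":
    print(unpack(swar_scale(pack([1, 2, 3, 4]), 100)))
```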
&lt;br /&gt;
==Negative results==&lt;br /&gt;
&lt;br /&gt;
* Using Kurtosis for detecting text in a frame&lt;br /&gt;
** The idea was to detect a Bernoulli distribution but it's not robust and too noisy&lt;/div&gt;</summary>
		<author><name>Rillian</name></author>	</entry>

	<entry>
		<id>http://wiki.xiph.org/OggOpus</id>
		<title>OggOpus</title>
		<link rel="alternate" type="text/html" href="http://wiki.xiph.org/OggOpus"/>
				<updated>2012-07-05T22:44:49Z</updated>
		
		<summary type="html">&lt;p&gt;Rillian: Link to the ietf draft&lt;/p&gt;
&lt;hr /&gt;
&lt;div&gt;'''Superseded by [http://tools.ietf.org/html/draft-terriberry-oggopus the IETF draft].'''&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
== Ogg Mapping for Opus ==&lt;br /&gt;
&lt;br /&gt;
The IETF Opus codec is a low-latency audio codec optimized for both voice and general-purpose audio. See the [http://tools.ietf.org/html/draft-ietf-codec-opus Opus Specification] for technical details.&lt;br /&gt;
&lt;br /&gt;
Almost everything about Opus is either fixed or dynamically switchable, so most of the usual ID and setup header parameters in the header packets of an Ogg encapsulation aren't needed. In particular, bitrate, packet duration, mono/stereo flags, and coding modes are all dynamically switchable from packet to packet. The first one or two bytes in each data packet, the start of the 'TOC sequence' that defines the layout of the packet, specifies all of these parameters for that particular packet. See Section 3 of the Opus Specification for the exact format of the TOC sequence.&lt;br /&gt;
&lt;br /&gt;
The remaining parameters that must be signaled are&lt;br /&gt;
&lt;br /&gt;
* The magic number for stream identification,&lt;br /&gt;
* The stream count and coupling for multichannel audio, and&lt;br /&gt;
* Any metadata or tags.&lt;br /&gt;
&lt;br /&gt;
=== Content Type ===&lt;br /&gt;
&lt;br /&gt;
The recommended mime-type for Ogg Opus files is '''audio/ogg''', defined in [http://www.ietf.org/rfc/rfc5334.txt RFC 5334].&lt;br /&gt;
&lt;br /&gt;
If more specificity is desired, one can distinguish Opus files as 'audio/ogg; codecs=opus'.&lt;br /&gt;
&lt;br /&gt;
The recommended filename extension for Ogg Opus files is '''.opus'''.&lt;br /&gt;
&lt;br /&gt;
=== Packet Organization ===&lt;br /&gt;
&lt;br /&gt;
Opus is framed in a continuous logical [http://www.xiph.org/ogg/doc/framing.html Ogg stream]. &lt;br /&gt;
&lt;br /&gt;
There are two mandatory headers. The granule position of the pages containing these headers MUST be zero.&lt;br /&gt;
&lt;br /&gt;
The first packet in the logical Ogg stream MUST contain the identification header, which uniquely identifies a stream as Opus audio. It MUST begin with the 8 bytes &amp;quot;OpusHead&amp;quot;. It MUST be placed alone in the first page of the logical Ogg stream. This page MUST have the ’beginning of stream’ flag set.&lt;br /&gt;
&lt;br /&gt;
The second Opus packet MUST contain the comment header. It MUST begin with the 8 bytes &amp;quot;OpusTags&amp;quot;. It MAY span one or more pages, beginning on the second page of the logical stream. However many pages it spans, the comment header packet MUST finish the page on which it ends.&lt;br /&gt;
&lt;br /&gt;
All subsequent pages are audio data pages and the packets they contain are audio data packets. The first audio page SHOULD NOT have the 'continued packet' flag set (which would indicate the first audio packet is continued from a previous page). Packets MUST be placed into Ogg pages in order until the end of stream. Audio packets MAY span page boundaries. A decoder MUST treat a zero-byte audio packet as if it were an Opus packet with an illegal TOC sequence. The last page SHOULD have the 'end of stream' flag set, but implementations should be prepared to deal with truncated streams which do not have a page marked 'end of stream'. The final packet SHOULD complete on the last page, i.e., the final lacing value should be less than 255. There MUST NOT be any more pages in an Opus logical stream after a page marked 'end of stream'.&lt;br /&gt;
&lt;br /&gt;
=== Granule Position ===&lt;br /&gt;
&lt;br /&gt;
The granule position of an audio page encodes the total number of PCM samples in the stream up to and including the last fully-decodable sample from the last packet ''completed'' on that page. A page that is entirely spanned by a single packet (that completes on a subsequent page) has no granule position, and the granule position field MUST be set to the special value ’-1’ in two's complement.&lt;br /&gt;
&lt;br /&gt;
The granule position of an audio page is in units of PCM audio samples at a fixed rate of 48 kHz (per channel; a stereo stream’s granule position does not increment at twice the speed of a mono stream). It is possible to run a decoder at other sampling rates, but the format and this specification always count samples assuming a 48 kHz decoding rate.&lt;br /&gt;
&lt;br /&gt;
The duration of an Opus packet may be any multiple of 2.5 ms, up to a maximum of 120 ms. This duration is encoded in the TOC sequence at the beginning of each packet. The number of samples returned by a decoder corresponds to this duration exactly, even for the first few packets. For example, a 20 ms packet fed to a decoder running at 48 kHz will always return 960 samples. A demuxer can parse these TOC sequences to work backwards or forwards from a packet with a known granule position (i.e., the last packet completed on some page) in order to assign granule positions to every packet, or even every individual sample. The one exception is the last page in the stream, as described below.&lt;br /&gt;
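&lt;br /&gt;
As a rough sketch of the TOC parsing mentioned above, the following function returns the number of 48 kHz samples in a single frame, given the first TOC byte. It assumes the configuration layout from the Opus Specification (top five bits select the mode and frame size); the function name is hypothetical. A packet may carry more than one frame, so the full packet duration also depends on the frame count signalled in the TOC sequence, which this sketch does not parse.&lt;br /&gt;
&lt;br /&gt;
```python
def samples_per_frame_48k(toc):
    """Samples per frame at 48 kHz for the given TOC byte (0..255)."""
    config = toc // 8        # top five bits of the TOC byte
    size_code = config % 4   # low two bits of the configuration number
    if config // 16 == 1:
        # Configurations 16..31, CELT-only: 2.5, 5, 10 or 20 ms
        return 120 * 2 ** size_code
    if config // 4 == 3:
        # Configurations 12..15, hybrid: 10 or 20 ms
        return 960 if config % 2 else 480
    # Configurations 0..11, SILK-only: 10, 20, 40 or 60 ms
    return 2880 if size_code == 3 else 480 * 2 ** size_code


if __name__ == "__main__":
    # Configuration 1 is a 20 ms SILK frame: 960 samples at 48 kHz.
    print(samples_per_frame_48k(1 * 8))
```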
&lt;br /&gt;
All other pages with completed packets after the first MUST have a granule position equal to the number of samples contained in packets that complete on that page plus the granule position of the most recent page with completed packets. This guarantees that a demuxer can assign individual packets the same granule position when working forwards as when working backwards. There must not be any gaps. In order to support capturing a stream that uses discontinuous transmission (DTX), an encoder SHOULD emit packets that explicitly request the use of Packet Loss Concealment (PLC) (i.e., with a frame length of 0, as defined in Section 3.2.1 of the Opus Specification) in place of the packets that were not transmitted.&lt;br /&gt;
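&lt;br /&gt;
The continuity rule can be sketched as follows. The helper names are hypothetical, and each page is represented simply by its granule position plus the list of sample counts of the packets that complete on it.&lt;br /&gt;
&lt;br /&gt;
```python
def expected_granule(previous_granule, completed_packet_samples):
    """Granule position a page must carry, given the previous page's value."""
    return previous_granule + sum(completed_packet_samples)


def page_is_continuous(previous_granule, completed_packet_samples, page_granule):
    """True if the page leaves no gap after the previous completed page."""
    return page_granule == expected_granule(previous_granule, completed_packet_samples)


def packet_granules(page_granule, completed_packet_samples):
    """End position of each packet, working backwards from the page value."""
    positions = []
    position = page_granule
    for samples in reversed(completed_packet_samples):
        positions.append(position)
        position -= samples
    positions.reverse()
    return positions


if __name__ == "__main__":
    # Three 20 ms packets completing on a page whose granule position is 2880.
    print(packet_granules(2880, [960, 960, 960]))
    print(page_is_continuous(2880, [960, 960], 4800))
```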
&lt;br /&gt;
There is some amount of latency introduced during the decoding process, to allow for overlap in the MDCT modes, stereo mixing in the LP modes, and resampling, and the encoder will introduce even more latency (though the exact amount is not specified). Therefore the first few samples produced by the decoder do not correspond to any real, input audio, but are instead composed of padding inserted by the encoder to compensate for this latency. These samples must be stored and decoded, as Opus is an asymptotically convergent predictive codec, meaning the decoded contents of each frame depend on the recent history of decoder inputs. A 'pre-skip' field in the ID header signals the number of samples which should be skipped at the beginning of the stream. This provides sufficient history to the decoder so that it has already converged before the stream's output begins. It may also be used to perform sample-accurate cropping of existing encoded streams. This amount need not be a multiple of 2.5 ms, may be smaller than a single packet, or may span the contents of several packets.&lt;br /&gt;
&lt;br /&gt;
The PCM sample position is determined from the granule position using the formula&lt;br /&gt;
&lt;br /&gt;
 'PCM sample position' = 'granule position' - 'pre-skip' .&lt;br /&gt;
&lt;br /&gt;
For example, if the granule position of the first page is 59971, and the pre-skip is 11971, then the PCM sample position of the last decoded sample from the first page is 48000. This may be converted into a playback time using the formula&lt;br /&gt;
&lt;br /&gt;
                   'PCM sample position'&lt;br /&gt;
 'playback time' = --------------------- .&lt;br /&gt;
                          48000.0&lt;br /&gt;
&lt;br /&gt;
The initial PCM sample position before any samples are played is normally '0'. In this case, the PCM sample position of the first audio sample to be played starts at '1', because it marks the time on the clock ''after'' that sample has been played, and a stream that is exactly one second long has a final PCM sample position of '48000', as in the example here.&lt;br /&gt;
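&lt;br /&gt;
The two formulas above translate directly into code. This is a minimal sketch with hypothetical function names, using the numbers from the example.&lt;br /&gt;
&lt;br /&gt;
```python
OPUS_GRANULE_RATE = 48000  # granule positions always count 48 kHz samples


def pcm_sample_position(granule_position, pre_skip):
    """PCM sample position = granule position - pre-skip."""
    return granule_position - pre_skip


def playback_time(granule_position, pre_skip):
    """Playback time in seconds corresponding to a granule position."""
    return pcm_sample_position(granule_position, pre_skip) / OPUS_GRANULE_RATE


if __name__ == "__main__":
    print(pcm_sample_position(59971, 11971))  # 48000
    print(playback_time(59971, 11971))        # 1.0
```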
&lt;br /&gt;
Vorbis streams use a granule position smaller than the number of audio samples contained in the first page to indicate that some of those samples must be trimmed from the output. However, doing so requires that the first page contain exactly two packets, in order to allow the decoder to perform PCM position adjustments before needing to return any PCM data. Opus uses the pre-skip mechanism for this purpose instead, since the encoder may introduce more than a single packet's worth of latency, and since very large packets in streams with a very large number of channels may not fit on a single page.&lt;br /&gt;
&lt;br /&gt;
The page with the 'end of stream' flag set MAY have a granule position that indicates the page contains less audio data than would normally be returned by decoding up through the final packet. This is used to end the stream somewhere other than an even frame boundary. The granule position of the most recent audio page with completed packets is used to make this determination, or '0' is used if there were no previous audio pages with a completed packet. The difference between these granule positions indicates how many samples to keep after decoding the packets that completed on the final page. The remaining samples are discarded. The number of discarded samples SHOULD be smaller than the number decoded from the last packet.&lt;br /&gt;
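&lt;br /&gt;
A sketch of the end-of-stream trimming described above. The function names and the example numbers are made up for illustration, and the sketch assumes a valid stream in which the final granule position is not smaller than the previous one. The decoded samples are those from the packets that complete on the final page.&lt;br /&gt;
&lt;br /&gt;
```python
def samples_to_keep(final_granule, previous_granule=0):
    """Samples to keep from the final page.

    previous_granule is the granule position of the most recent audio page
    with a completed packet, or 0 if there was none.
    """
    return final_granule - previous_granule


def trim_final_page(decoded, final_granule, previous_granule=0):
    """Drop the samples past the end of the stream from the final page."""
    return decoded[:samples_to_keep(final_granule, previous_granule)]


if __name__ == "__main__":
    decoded = list(range(960))  # one 20 ms packet decoded at 48 kHz
    kept = trim_final_page(decoded, final_granule=96500, previous_granule=96000)
    print(len(kept), len(decoded) - len(kept))  # kept, discarded
```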
&lt;br /&gt;
The granule position of the first audio page with a completed packet MAY be larger than the number of samples contained in packets that complete on that page; however, it MUST NOT be smaller, unless that page has the 'end of stream' flag set. Allowing a granule position larger than the number of samples allows the beginning of a stream to be cropped without rewriting the granule position of all the remaining pages. This means that the PCM sample position just before the first sample to be played may be larger than '0', but the PCM sample position relative to '0' should still be used for the purposes of synchronization when multiplexing with other logical streams. This does not affect the behavior of pre-skip: exactly 'pre-skip' samples should be skipped from the beginning of the decoded output, even if the initial PCM sample position is greater than zero.&lt;br /&gt;
&lt;br /&gt;
On the other hand, a granule position that is smaller than the number of decoded samples prevents a demuxer from working backwards to assign each packet or each individual sample a valid granule position, since granule positions must be non-negative. A decoder MUST reject as invalid any stream where the granule position is smaller than the number of samples contained in packets that complete on the first page with a completed packet, unless that page has the 'end of stream' flag set. It MAY defer this action until it decodes the last packet completed on that page. If that page has the 'end of stream' flag set, a demuxer can work forwards from the granule position '0', but MUST reject as invalid any stream where the granule position is smaller than the 'pre-skip' amount. This would indicate that more samples should be skipped from the initial decoded output than exist in the stream.&lt;br /&gt;
&lt;br /&gt;
==== ID Header ====&lt;br /&gt;
&lt;br /&gt;
      0                   1                   2                   3&lt;br /&gt;
      0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1&lt;br /&gt;
     +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+&lt;br /&gt;
     |       'O'     |      'p'      |     'u'       |     's'       |&lt;br /&gt;
     +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+&lt;br /&gt;
     |       'H'     |       'e'     |     'a'       |     'd'       |&lt;br /&gt;
     +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+&lt;br /&gt;
     |  version = 1  | channel count |           pre-skip            |&lt;br /&gt;
     +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+&lt;br /&gt;
     |                original input sample rate in Hz               |&lt;br /&gt;
     +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+&lt;br /&gt;
     |    output gain Q7.8 in dB     |  channel map  |               |&lt;br /&gt;
     +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+               :&lt;br /&gt;
     |                                                               |&lt;br /&gt;
     :          optional channel mapping table...                    :&lt;br /&gt;
     |                                                               |&lt;br /&gt;
     +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+&lt;br /&gt;
&lt;br /&gt;
Brief description of each field:&lt;br /&gt;
&lt;br /&gt;
 - Magic signature: &amp;quot;OpusHead&amp;quot; (64 bits)&lt;br /&gt;
 - Version number (8 bits unsigned): 0x01 for this spec&lt;br /&gt;
 - Channel count 'c' (8 bits unsigned): MUST be &amp;gt; 0&lt;br /&gt;
 - Pre-skip (16 bits unsigned, little endian)&lt;br /&gt;
 - Input sample rate (32 bits unsigned, little endian): informational only&lt;br /&gt;
 - Output gain (16 bits, little endian, signed Q7.8 in dB) to apply when&lt;br /&gt;
   decoding&lt;br /&gt;
 - Channel mapping family (8 bits unsigned)&lt;br /&gt;
  --  0 = one stream: mono or L,R stereo&lt;br /&gt;
  --  1 = channels in vorbis spec order: mono or L,R stereo or ... or FL,C,FR,RL,RR,LFE, ...&lt;br /&gt;
  --  2..254 = reserved (treat as 255)&lt;br /&gt;
  --  255 = no defined channel meaning&lt;br /&gt;
 If channel mapping family &amp;gt; 0&lt;br /&gt;
 - Stream count 'N' (8 bits unsigned): MUST be &amp;gt; 0&lt;br /&gt;
 - Two-channel stream count 'M' (8 bits unsigned): MUST satisfy M &amp;lt;= N, M+N &amp;lt;= 255&lt;br /&gt;
 - Channel mapping (8*c bits)&lt;br /&gt;
   -- one stream index (8 bits unsigned) per channel (255 means silent throughout the file)&lt;br /&gt;
&lt;br /&gt;
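The field layout above can be sketched as a small parser. This is only an illustration under stated assumptions: the function name parse_opus_head and the dictionary keys are hypothetical, and the checks cover only the constraints listed in this section.&lt;br /&gt;
&lt;br /&gt;
```python
def parse_opus_head(packet):
    """Parse an OpusHead packet (bytes) into a dict; raise ValueError if malformed.

    Hypothetical helper: names and dict keys are illustrative only.
    """
    # Fixed part: 8 magic + 1 version + 1 channels + 2 pre-skip
    # + 4 input rate + 2 output gain + 1 mapping family = 19 bytes.
    if len(packet) in range(19) or packet[:8] != b"OpusHead":
        raise ValueError("not an OpusHead packet")
    head = {
        "version": packet[8],
        "channels": packet[9],
        "pre_skip": int.from_bytes(packet[10:12], "little"),
        "input_sample_rate": int.from_bytes(packet[12:16], "little"),
        "output_gain_q8": int.from_bytes(packet[16:18], "little", signed=True),
        "mapping_family": packet[18],
    }
    # Versions 16 and up are assumed incompatible (upper four bits differ).
    if head["version"] not in range(16):
        raise ValueError("incompatible version")
    c = head["channels"]
    if c == 0:
        raise ValueError("channel count must be nonzero")
    if head["mapping_family"] == 0:
        # Family 0: no further fields are coded; use the documented defaults.
        if c not in (1, 2):
            raise ValueError("family 0 allows only 1 or 2 channels")
        head["streams"] = 1
        head["coupled_streams"] = c - 1
        head["mapping"] = list(range(c))
    else:
        if len(packet) in range(21 + c):
            raise ValueError("truncated channel mapping table")
        n, m = packet[19], packet[20]
        # N must be nonzero, M must not exceed N, and M+N must fit in 255.
        if n == 0 or m not in range(n + 1) or n + m not in range(256):
            raise ValueError("invalid stream counts")
        head["streams"] = n
        head["coupled_streams"] = m
        head["mapping"] = list(packet[21:21 + c])
    return head
```
&lt;br /&gt;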
&lt;br /&gt;
Detailed definition of each field:&lt;br /&gt;
&lt;br /&gt;
* '''Magic signature'''&lt;br /&gt;
The magic signature &amp;quot;OpusHead&amp;quot; allows codec identification and is human readable. Starting with 'Op' helps distinguish it from data packets, as this is an invalid TOC sequence.&lt;br /&gt;
&lt;br /&gt;
* '''Version'''&lt;br /&gt;
The version number MUST always be '1' for this version of the encapsulation specification.&lt;br /&gt;
&lt;br /&gt;
Implementations SHOULD treat streams where the upper four bits of the version number match a recognized specification as backwards-compatible with that specification. That is, the version number can be considered split into &amp;quot;major&amp;quot; and &amp;quot;minor&amp;quot; version sub-fields, with changes to the &amp;quot;minor&amp;quot; sub-field in the lower four bits signaling compatible changes. For example, a decoder implementing this specification SHOULD accept any stream with a version number 15 or less, and SHOULD assume any stream with a version number 16 or greater is incompatible. The initial version '1' was chosen to keep implementations from relying on this byte as a null terminator for the OpusHead string.&lt;br /&gt;
&lt;br /&gt;
* '''Channel count''' 'c'&lt;br /&gt;
This byte specifies the number of output channels (1...255) for this Ogg Opus stream.&lt;br /&gt;
&lt;br /&gt;
* '''Pre-skip'''&lt;br /&gt;
This is the number of samples (at 48 kHz) to discard from the decoder output when starting playback, and also the number to subtract from a page's granule position to calculate its PCM sample position.&lt;br /&gt;
&lt;br /&gt;
When constructing cropped Ogg Opus streams, a pre-skip of at least 3840 samples (80 ms) is RECOMMENDED to ensure complete convergence.&lt;br /&gt;
&lt;br /&gt;
* '''Input sample rate'''&lt;br /&gt;
This is ''not'' the sample rate to use for playback of the encoded data.&lt;br /&gt;
&lt;br /&gt;
Opus has a handful of coding modes, with internal audio bandwidths of 4, 6, 8, 12, and 20 kHz. Each packet in the stream may have a different audio bandwidth. Regardless of the audio bandwidth, the reference decoder supports decoding any stream at a sample rate of 8, 12, 16, 24, or 48 kHz. The original sample rate of the encoder input is not preserved by the lossy compression.&lt;br /&gt;
&lt;br /&gt;
An Ogg Opus player SHOULD select the playback sample rate according to the following procedure:&lt;br /&gt;
* If the hardware supports 48 kHz playback, decode at 48 kHz,&lt;br /&gt;
* else if the hardware's highest available sample rate is a supported rate, decode at this sample rate,&lt;br /&gt;
* else if the hardware's highest available sample rate is less than 48 kHz, decode at the next higher supported rate and resample,&lt;br /&gt;
* else decode at 48 kHz and resample.&lt;br /&gt;
&lt;br /&gt;
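The selection procedure above could be written roughly as follows. This is a sketch, not a reference implementation: the function name choose_decode_rate and its return convention (decode rate in Hz plus a flag saying whether resampling is needed) are assumptions made for illustration.&lt;br /&gt;
&lt;br /&gt;
```python
import bisect

# Decoder output rates named in the text, in Hz, sorted ascending.
SUPPORTED_RATES = (8000, 12000, 16000, 24000, 48000)

def choose_decode_rate(hardware_rates):
    """Return (decode_rate_hz, needs_resample) for the given hardware rates."""
    # Rule 1: prefer 48 kHz whenever the hardware can play it.
    if 48000 in hardware_rates:
        return 48000, False
    highest = max(hardware_rates)
    # Rule 2: the highest hardware rate is itself a supported decode rate.
    if highest in SUPPORTED_RATES:
        return highest, False
    # Rule 3: pick the next supported rate above the hardware maximum.
    i = bisect.bisect_right(SUPPORTED_RATES, highest)
    if i != len(SUPPORTED_RATES):
        return SUPPORTED_RATES[i], True
    # Rule 4: hardware only offers rates above 48 kHz; decode at 48 kHz.
    return 48000, True
```
&lt;br /&gt;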
However, the 'input sample rate' field allows the encoder to pass the sample rate of the original input stream as metadata. This may be useful when the user requires the output sample rate to match the input sample rate. For example, a non-player decoder writing PCM format to disk might choose to resample the output audio back to the original input rate to reduce surprise to the user, who might reasonably expect to get back a file with the same sample rate as the one they fed to the encoder.&lt;br /&gt;
&lt;br /&gt;
A value of zero indicates 'unspecified'. Encoders SHOULD write the actual input rate or zero, but decoder implementations which do something with this field SHOULD take care to behave sanely if given crazy values (e.g. don't actually upsample the output to 10 MHz if requested).&lt;br /&gt;
&lt;br /&gt;
* '''Output gain'''&lt;br /&gt;
This is a gain to be applied by the decoder. Virtually all players and media frameworks should apply it by default. If a player chooses to apply any volume adjustment or gain modification, such as the R128_TRACK_GAIN or a user-facing volume knob, the adjustment MUST be applied ''in addition'' to this output gain in order to achieve playback at the desired volume.&lt;br /&gt;
&lt;br /&gt;
An encoder SHOULD set the output gain to zero, and instead apply any gain prior to encoding, when this is possible and does not conflict with the user's wishes. The output gain should only be nonzero when the gain is adjusted after encoding, or when the user wishes to adjust the gain for playback while preserving the ability to recover the original signal amplitude.&lt;br /&gt;
&lt;br /&gt;
Although the output gain has enormous range (+/- 128 dB, enough to amplify inaudible sounds to the threshold of physical pain), most applications can only reasonably use a small portion of this range around zero. The large range serves in part to ensure that gain can always be losslessly transferred between OpusHead and R128_TRACK_GAIN (see below) without saturating.&lt;br /&gt;
&lt;br /&gt;
The gain is the 20 log&amp;lt;sub&amp;gt;10&amp;lt;/sub&amp;gt; ratio of output to input sample values to be applied to the decoder output. E.g. &amp;lt;code&amp;gt;sample *= pow(10, header.gain/(20.*256))&amp;lt;/code&amp;gt; where header.gain is the raw 16 bit Q7.8 value from the header.&lt;br /&gt;
&lt;br /&gt;
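As a worked illustration of the formula above, the raw Q7.8 value is divided by 256 to get decibels and then converted to a linear amplitude factor. The function names below are hypothetical, and samples are assumed to be floating-point values.&lt;br /&gt;
&lt;br /&gt;
```python
def gain_to_scale(gain_q8):
    """Convert a raw signed Q7.8 gain in dB into a linear amplitude factor."""
    # gain_q8 / 256 is the gain in dB; 20*log10(scale) = dB.
    return 10.0 ** (gain_q8 / (20.0 * 256.0))

def apply_output_gain(samples, gain_q8):
    """Scale a sequence of float samples by the header output gain."""
    scale = gain_to_scale(gain_q8)
    return [s * scale for s in samples]
```
&lt;br /&gt;
For example, a raw value of 5120 is 20 dB, which corresponds to a factor of 10, while a raw value of 0 leaves the samples unchanged.&lt;br /&gt;
&lt;br /&gt;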
* '''Channel mapping family'''&lt;br /&gt;
This byte indicates the order and semantic meaning of the various channels encoded in each Opus packet.  &lt;br /&gt;
&lt;br /&gt;
Each possible value of this byte indicates a ''mapping family'', which defines a set of allowed numbers of channels, and the ordered set of channel names for each allowed number of channels. Currently there are three defined mapping families, although more may be added:&lt;br /&gt;
&lt;br /&gt;
* Family 0 (RTP mapping)&lt;br /&gt;
** Allowed numbers of channels: 1 or 2&lt;br /&gt;
** 1 channel: monophonic (mono)&lt;br /&gt;
** 2 channels: stereo (left, right)&lt;br /&gt;
** '''Special mapping''': this channel mapping value also indicates that the content consists of a single Opus stream that is stereo if and only if c==2, with stream index 0 mapped to channel 0, and (if stereo) stream index 1 mapped to channel 1.  When the channel mapping byte has this value, no further fields are present in OpusHead.&lt;br /&gt;
* Family 1 ([http://www.xiph.org/vorbis/doc/Vorbis_I_spec.html#x1-800004.3.9 Vorbis channel order])&lt;br /&gt;
** Allowed numbers of channels: 1 ... 8&lt;br /&gt;
** Channel meanings depend on the number of channels, see the Vorbis mapping for details.&lt;br /&gt;
* Family 255 (no defined channel meaning)&lt;br /&gt;
** Allowed numbers of channels: 1...255&lt;br /&gt;
** Channels are unidentified.  General-purpose players SHOULD NOT attempt to play these streams, and offline decoders MAY deinterleave the output into separate PCM files, one per channel. Decoders SHOULD NOT produce output for channels mapped to stream index 255 (pure silence) unless they have no other way to indicate the index of non-silent channels.&lt;br /&gt;
&lt;br /&gt;
The remaining channel mapping families (2...254) are reserved. A decoder encountering a reserved mapping byte should act as though the mapping byte is 255.&lt;br /&gt;
&lt;br /&gt;
An Ogg Opus player MUST play any Ogg Opus stream with a channel mapping family of 0 or 1, even if the number of channels does not match the physically connected audio hardware. Players SHOULD perform channel mixing to increase or reduce the number of channels as needed.&lt;br /&gt;
&lt;br /&gt;
* '''Stream count''' 'N'&lt;br /&gt;
This field indicates the total number of streams so the decoder can correctly parse the packed Opus packets inside the Ogg packet.&lt;br /&gt;
&lt;br /&gt;
For channel mapping family 0, this value defaults to 1, and is not coded.&lt;br /&gt;
&lt;br /&gt;
A multi-channel Opus file is composed of one or more individual Opus streams, each of which produces one or two channels of decoded data. Each Ogg packet contains one Opus packet from each stream. The first N-1 Opus packets are packed using the self-delimiting framing from Appendix B of the Opus Specification. The remaining Opus packet is packed using the regular, undelimited framing from Section 3 of the Opus Specification. All the Opus packets in a single Ogg packet MUST be constrained to produce the same number of decoded samples. A decoder SHOULD treat any Opus packet whose duration is different from that of the first Opus packet in an Ogg packet as if it were an Opus packet with an illegal TOC sequence.&lt;br /&gt;
&lt;br /&gt;
* '''Two-channel stream count''' 'M'&lt;br /&gt;
Describes the number of streams whose decoders should be configured to produce two channels. This must be no larger than the total number of streams.&lt;br /&gt;
&lt;br /&gt;
For channel mapping family 0, this value defaults to c-1 (i.e., 0 for mono and 1 for stereo), and is not coded.&lt;br /&gt;
&lt;br /&gt;
Each packet in an Opus stream has an internal channel count of 1 or 2, which can change from packet to packet. This is selected by the encoder depending on the bitrate and the contents being encoded. The original channel count of the encoder input is not preserved by the lossy compression.&lt;br /&gt;
&lt;br /&gt;
Regardless of the internal channel count, any Opus stream may be decoded as mono (a single channel) or stereo (two channels) by appropriate initialization of the decoder. The &amp;quot;two-channel stream count&amp;quot; field indicates that the first M Opus decoders should be initialized in stereo mode, and the remaining N-M decoders should be initialized in mono mode. The total number of decoded channels (M+N) MUST be no larger than 255, as there is no way to index more channels than that in the channel mapping.&lt;br /&gt;
&lt;br /&gt;
* '''Channel mapping'''&lt;br /&gt;
Contains one index per output channel indicating which decoded channel should be used. If the index is less than 2*M, the output MUST be taken from decoding stream (index/2) as stereo and selecting the left channel if index is even, and the right channel if index is odd. If the index is 2*M or larger, the output MUST be taken from decoding stream (index-M) as mono. As a special case, an index of 255 means that the corresponding output channel MUST contain pure silence.&lt;br /&gt;
&lt;br /&gt;
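The index rule above can be illustrated with a short sketch. The function name resolve_channel and its tuple return value are assumptions for illustration; coupled_streams is the two-channel stream count M.&lt;br /&gt;
&lt;br /&gt;
```python
def resolve_channel(index, coupled_streams):
    """Map one channel-mapping index to (stream, channel_in_stream, stereo).

    Returns None for index 255, meaning the output channel is pure silence.
    """
    if index == 255:
        return None
    if index in range(2 * coupled_streams):
        # Index below 2*M: stereo stream index/2, left if even, right if odd.
        return (index // 2, index % 2, True)
    # Index 2*M or larger: mono stream index-M, single channel 0.
    return (index - coupled_streams, 0, False)
```
&lt;br /&gt;
With M=2, for instance, index 3 selects the right channel of stereo stream 1, and index 4 selects mono stream 2.&lt;br /&gt;
&lt;br /&gt;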
For channel mapping family 0, the first index defaults to 0, and if c==2, the second index defaults to 1. Neither index is coded.&lt;br /&gt;
&lt;br /&gt;
The number of output channels (c) is not constrained to match the number of decoded channels (M+N). A single index MAY appear multiple times, i.e., the same decoded channel may be mapped to multiple output channels. Some decoded channels might not be assigned to any output channel, as well.&lt;br /&gt;
&lt;br /&gt;
==== Comment Header ====&lt;br /&gt;
&lt;br /&gt;
 - 8 byte 'OpusTags' magic signature (64 bits)&lt;br /&gt;
 - The remaining data follows the vorbis-comment header design used in OggVorbis (without the &amp;quot;framing-bit&amp;quot;), OggTheora, and Speex:&lt;br /&gt;
  * Vendor string (always present).&lt;br /&gt;
  ** 4-byte little-endian length field, followed by length bytes of UTF-8 vendor string.&lt;br /&gt;
  * TAG=value metadata strings (zero or more).&lt;br /&gt;
  ** 4-byte little-endian string count.&lt;br /&gt;
  ** Count strings consisting of 4-byte little-endian length and length bytes of UTF-8 string in &amp;quot;tag=value&amp;quot; form.&lt;br /&gt;
&lt;br /&gt;
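A minimal sketch of reading this layout follows. The function name parse_opus_tags and its return shape are hypothetical, and error handling is limited to truncation and a bad signature.&lt;br /&gt;
&lt;br /&gt;
```python
def parse_opus_tags(packet):
    """Return (vendor, comments) from an OpusTags packet given as bytes."""
    if packet[:8] != b"OpusTags":
        raise ValueError("not an OpusTags packet")
    pos = 8

    def read_u32():
        # 4-byte little-endian unsigned length or count field.
        nonlocal pos
        raw = packet[pos:pos + 4]
        if len(raw) != 4:
            raise ValueError("truncated length field")
        pos += 4
        return int.from_bytes(raw, "little")

    def read_string():
        # Length-prefixed UTF-8 string.
        nonlocal pos
        length = read_u32()
        raw = packet[pos:pos + length]
        if len(raw) != length:
            raise ValueError("truncated string")
        pos += length
        return raw.decode("utf-8")

    vendor = read_string()
    count = read_u32()
    comments = [read_string() for _ in range(count)]
    return vendor, comments
```
&lt;br /&gt;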
One new comment field is introduced for Ogg Opus:&lt;br /&gt;
 R128_TRACK_GAIN=-573  &lt;br /&gt;
representing the volume shift needed to normalize the track's volume. The gain is a Q7.8 fixed point number in dB, as in the OpusHead &amp;quot;output gain&amp;quot; field. This field is similar to the [[VorbisComment#Replay_Gain|REPLAYGAIN_TRACK_GAIN field in Vorbis]], although the normal volume reference is the [http://tech.ebu.ch/loudness EBU-R128] standard.&lt;br /&gt;
&lt;br /&gt;
An Ogg Opus file MUST NOT have more than one such field, and if present its value MUST be an integer from -32768 to +32767 inclusive, represented in ASCII with no whitespace. If present, it MUST correctly represent the R128 normalization gain (relative to the OpusHead output gain). If a player chooses to make use of the TRACK_GAIN, it MUST be applied ''in addition'' to the OpusHead output gain. If an encoder populates the TRACK_GAIN field, and the output gain is not otherwise constrained or specified, the encoder SHOULD write the R128 gain into the OpusHead output gain and write &amp;quot;R128_TRACK_GAIN=0&amp;quot;. If a tool modifies the OpusHead &amp;quot;output gain&amp;quot; field, it MUST also update or remove the R128_TRACK_GAIN comment field.&lt;br /&gt;
&lt;br /&gt;
There is no comment field corresponding to Replaygain's ALBUM_GAIN; that information should instead be stored in the OpusHead 'output gain' field.&lt;br /&gt;
&lt;br /&gt;
To avoid confusion with multiple normalization schemes, an OpusTags packet SHOULD NOT contain any of the REPLAYGAIN_TRACK_GAIN, REPLAYGAIN_TRACK_PEAK, REPLAYGAIN_ALBUM_GAIN, or REPLAYGAIN_ALBUM_PEAK fields.&lt;br /&gt;
&lt;br /&gt;
== Other Implementation Notes ==&lt;br /&gt;
&lt;br /&gt;
When seeking within an Ogg Opus stream, the decoder should start decoding (and discarding the output) at least 3840 samples (80 ms) prior to the seek point in order to ensure that the output audio is correct at the seek point.&lt;br /&gt;
&lt;br /&gt;
Technically valid Opus packets can be arbitrarily large due to the padding format, although the amount of non-padding data they can contain is bounded. These packets may be spread over a similarly enormous number of Ogg pages. Encoders SHOULD use no more padding than required to make a variable bitrate (VBR) stream constant bitrate (CBR). Decoders SHOULD avoid attempting to allocate excessive amounts of memory when presented with a very large packet. The presence of an extremely large packet in the stream could indicate a potential memory exhaustion attack or stream corruption. Decoders should reject a packet that is too large to process, and print a warning message.&lt;br /&gt;
&lt;br /&gt;
In an Ogg Opus stream, the largest possible valid packet that does not use padding has a size of (61,298*N - 2) bytes, or about 60 kB per Opus stream. With 255 streams, this is 15,630,988 bytes (14.9 MB) and can span up to 61,298 Ogg pages, all but one of which will have a granulepos of -1. This is of course a very extreme packet, consisting of 255 streams, each containing 120 ms of audio encoded as 2.5 ms frames, each frame using the maximum possible number of bytes (1275) and stored in the least efficient manner allowed (a VBR code 3 Opus packet). Even in such a packet, most of the data will be zeros, as 2.5 ms frames, which are required to run in the MDCT mode, cannot actually use all 1275 bytes. The largest packet consisting entirely of useful data is (15,326*N - 2) bytes, or about 15 kB per stream. This corresponds to 120 ms of audio encoded as 10 ms frames in either LP or Hybrid mode, but at a data rate of over 1 Mbps, which makes little sense for the quality achieved. A more reasonable limit is (7,664*N - 2) bytes, or about 7.5 kB per stream. This corresponds to 120 ms of audio encoded as 20 ms stereo MDCT-mode frames, with a total bitrate just under 511 kbps (not counting the Ogg encapsulation overhead). With N=8, the maximum useful number of streams for the channel meanings currently defined by mapping family 1, this gives a maximum packet size of 61,310 bytes, or just under 60 kB. This is still quite conservative, as it assumes each output channel is taken from one decoded channel of a stereo packet. An implementation could reasonably choose any of these numbers for its internal limits.&lt;br /&gt;
&lt;br /&gt;
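The size limits quoted above all follow the same pattern of a per-stream byte budget times N, minus 2. The sketch below reproduces that arithmetic; the function names and the choice of 7,664 bytes per stream as a default limit are assumptions for illustration, not requirements.&lt;br /&gt;
&lt;br /&gt;
```python
def max_packet_bytes(per_stream_bytes, stream_count):
    """Largest Ogg packet size for a per-stream budget, as in the text."""
    if stream_count not in range(1, 256):
        raise ValueError("stream count must be 1..255")
    return per_stream_bytes * stream_count - 2

def packet_too_large(packet_length, stream_count, per_stream_bytes=7664):
    """True if a packet exceeds the chosen internal limit and should be rejected."""
    limit = max_packet_bytes(per_stream_bytes, stream_count)
    return packet_length not in range(limit + 1)
```
&lt;br /&gt;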
== Test Vectors ==&lt;br /&gt;
&lt;br /&gt;
* [[OggOpus/testvectors|Planned test vectors for OggOpus]]&lt;br /&gt;
* Opus test vectors&lt;/div&gt;</summary>
		<author><name>Rillian</name></author>	</entry>

	<entry>
		<id>http://wiki.xiph.org/Daala</id>
		<title>Daala</title>
		<link rel="alternate" type="text/html" href="http://wiki.xiph.org/Daala"/>
				<updated>2012-06-29T23:32:21Z</updated>
		
		<summary type="html">&lt;p&gt;Rillian: link meeting minutes and mumble recording&lt;/p&gt;
&lt;hr /&gt;
&lt;div&gt;Daala is the current working name of a next generation video codec— to be renamed once someone insists on something better. So far the best proposed alternative is PatentCake.&lt;br /&gt;
&lt;br /&gt;
For now the purpose of this page is to collect notes about things which have been discussed in informal public IRC discussion about the next generation initiative. Participants in these discussions have included Timothy Terriberry, Jason Garrett-Glaser, Loren Merritt, Ben Schwartz, Greg Maxwell, and others. &lt;br /&gt;
&lt;br /&gt;
See also: [https://xiph.org/daala/ https://xiph.org/daala/]&lt;br /&gt;
&lt;br /&gt;
== Weekly meetings ==&lt;br /&gt;
&lt;br /&gt;
We've been having weekly progress meetings on mumble.&lt;br /&gt;
&lt;br /&gt;
* 2012 June 4  [https://people.xiph.org/~giles/2012/daala_20120604.txt minutes] (actually a work week)&lt;br /&gt;
* 2012 June 22 [https://people.xiph.org/~giles/2012/daala_20120622.txt minutes]&lt;br /&gt;
* 2012 June 29 [https://people.xiph.org/~giles/2012/daala_20120629.txt minutes] [https://people.xiph.org/~giles/2012/daala_20120629.opus recording]&lt;br /&gt;
&lt;br /&gt;
= Techniques =&lt;br /&gt;
&lt;br /&gt;
The overall structure discussed so far is a variable-size, lapped-DCT, block-based codec. Lapping is done via pre/post filtering with a specially structured (lifting) linear-phase transform along the block edges, combined with overlapped block motion compensation and the expected trimmings. The lapping can be optimized for energy compaction and other useful properties, including invertibility, and yields excellent results with efficient finite-precision math.&lt;br /&gt;
&lt;br /&gt;
Other components which have been discussed include:&lt;br /&gt;
&lt;br /&gt;
==Techniques applicable to all frame types==&lt;br /&gt;
* Multisymbol arithmetic coding &lt;br /&gt;
** Timothy has some trial code showing speed-up proportional to the number of bits coded at once. (ec_test.c)&lt;br /&gt;
* Mode prediction using the previously decoded data, e.g. coding the mode using a probability function derived from trained predictors on the surrounding blocks. &lt;br /&gt;
** This will be terrible for robustness but may significantly reduce signalling overhead, allowing many more modes, and provide continuous adaptation between signalling free and fully signalled modes.&lt;br /&gt;
* Explore Legendre polynomial basis transforms instead of DCT&lt;br /&gt;
** May have better perceptual properties and/or result in 'less compromised' efficient implementations.  &lt;br /&gt;
* Coefficient domain prediction to allow efficient energy preserving quantization.&lt;br /&gt;
* Variable partition size/shape and the use of good predictors appears to remove most of the benefit of directional transforms.&lt;br /&gt;
** Perhaps 45deg is still useful?&lt;br /&gt;
** How does this change with partition sizes? Directional transforms are clearly not that useful with 4x4. &lt;br /&gt;
* Transform-post filtering to allow merging smaller transform blocks (like TF merging in CELT) may allow more flexible partitioning than outright using mixed block sizes.&lt;br /&gt;
* Perturbed quantization mode-signalling has been discussed but mostly laughed at. ;) &lt;br /&gt;
* Special block modes well suited to solid color/cartoon like content— avoiding ringing.&lt;br /&gt;
** Are pixel prediction modes too slow?&lt;br /&gt;
* In general, what Markov random field techniques can be applied with acceptable performance? Any?&lt;br /&gt;
* Designed for parallel encode and decode within each frame&lt;br /&gt;
** Important because&lt;br /&gt;
*** the proposed techniques need a lot more CPU than H.264 and VP8 for both encode and decode&lt;br /&gt;
*** Moore's law for single-threaded throughput is dead.  Future hardware is all multicore/GPU.&lt;br /&gt;
** Implies&lt;br /&gt;
*** Getting the order of application right for the lapping filters.&lt;br /&gt;
*** Mandatory slicing? Maybe some kind of multilevel entropy coding to reduce redundancy between slices while minimizing the single-threaded portion of decode.&lt;br /&gt;
&lt;br /&gt;
* Using PVQ and energy conservation: see http://jmvalin.ca/video/video_pvq_v3.pdf&lt;br /&gt;
&lt;br /&gt;
==Techniques applicable to inter frames==&lt;br /&gt;
* Using x264 as a test-bed Jason and Loren demonstrated 15% rate/distortion improvements from using 10-bit intermediaries and references, estimated as being 1/3rd from quality calculation in the 10-bit space, 1/3rd from the higher precision references, and 1/3rd from higher intermediate precision in calculations (e.g. MC filter processing).&lt;br /&gt;
** Increased reference precision competes for memory with increased number of references. The improvements demonstrated appear to be a greater win than increasing the reference count once there are four references or so.&lt;br /&gt;
* Super-resolution techniques for motion-compensation references have been discussed— in particular it appears that the half-pel location is where intelligent filtering matters the most so staged computation could be effectively used to allow more expensive filtering at that level.&lt;br /&gt;
** Edge-directed interpolation techniques might be effectively applied to increase motion compensation accuracy, but most of the techniques known to be very effective are too slow.&lt;br /&gt;
** Speculation has been offered that a significant part of MC inaccuracy may be due to blending in a physically incorrect (gamma-corrected) space, though no real conclusions were made. Academic papers on motion compensation accuracy seem to have ignored this issue.&lt;br /&gt;
* Timothy has an example code base for a variable partition size blocking-free motion compensation scheme which merges OBMC (overlapped block motion compensation) and CGI (control-grid interpolation) with an interesting prediction/sub-division scheme and whole-frame trellis optimization of motion vectors. (daala-exp)&lt;br /&gt;
&lt;br /&gt;
==Basic features==&lt;br /&gt;
* YUV 4:4:4, 4:2:2, 4:2:0 subsamplings; 8-bit, 10-bit.&lt;br /&gt;
* Alpha channel — need testing material!&lt;br /&gt;
* 8-bit RGB compatible mode? (e.g. YCoCg, internally or at least flagging for it)&lt;br /&gt;
* Efficient 3D? — need testing material!&lt;br /&gt;
* Lossless?&lt;br /&gt;
** The value of this is disputable. If nothing else it's arguable that stuffing lossless into a lossy format may be the only way to get lossless into many people's hands. Also, see below&lt;br /&gt;
* Good support for decode side droppable frames?&lt;br /&gt;
** Hopefully the referencing structure will be flexible enough to enable this even if it's not an intentional feature.&lt;br /&gt;
&lt;br /&gt;
==Frills==&lt;br /&gt;
* Optionally storing a checksum of the expected decoded frame for decoder/encoder mismatch detection.&lt;br /&gt;
* Expose the number of referential descendants of a given frame (or even the whole reference DAG) for most efficient allocation of FEC.&lt;br /&gt;
&lt;br /&gt;
==Wingdings==&lt;br /&gt;
Crazy crap that might be interesting or at least fun to make fun of... &lt;br /&gt;
* &amp;gt;10bit?&lt;br /&gt;
** Use cases don't seem well enough defined yet. Significant complexity. Any prospective hardware developer may hire assassins.&lt;br /&gt;
** Possible compromise: the video reference structure contains a backbone that can be decoded at only N bits of depth (e.g. 10), and higher precisions are only supported outside of this reference chain.&lt;br /&gt;
**# Precision by truncation: decode is performed twice on each frame, identically, at low and high precision.  The only difference between them is the bit-depth of the transform, or possibly of the transform and MC filters.  Only low-precision outputs can be referenced by subsequent frames.  Useful if high-precision content is still worth watching at low precision.&lt;br /&gt;
**# Precision by gamma: decode is performed once at low precision as normal.  Then the output frame is converted to linear-light at high precision, after which another layer of residuals is added.  The second layer can be permitted to reference previous high-precision frames... tricky to use both sets of references though. Useful if high precision is used for storing linear data, but people still want to watch it on &amp;quot;low-end&amp;quot; hardware.&lt;br /&gt;
* Some high end digital cameras are operating jpeg-derivatives in a special mode that keeps the image in the native linear RGB bayer format in order to avoid lossy/slow demosaicing on the camera. In particular this allows white balancing in post without excessive loss. Probably out of scope for Daala itself.&lt;br /&gt;
** Bayer, 4:2:0, 4:2:2, and Interlacing are all special cases of a more general pattern in which the output frames are decimated/subsampled in a regular fashion.  All such subsamplings could be supported by a unified framework in which the video is always stored with all planes fully sampled, with a header indicating the recommended subsampling for display.  In such cases, the encoder can regard the transform as highly overcomplete, and simply ignore unneeded coefficients (presumably by leaving high frequency residuals coded as zero).  This structure would in effect turn the codec into a motion-compensated interpolating/deinterlacing filter.  Whether this approach is sensible presumably depends in part on how the transform is structured.  It would be especially easy if the transform's highest-frequencies were coded by a wavelet-like layer.&lt;br /&gt;
* Lossless intra-ability: The ability to losslessly rewrite any frame as an intra frame (perhaps with significant bitrate overhead) in order to make frame accurate cuts possible.&lt;br /&gt;
** Or best handled by making sure that containers have working pre-roll, but presumably common GOP sizes will be greater than the number of references so even if losslessly reencoding the references is expensive it may be cheaper than pre-roll. Do both?&lt;br /&gt;
** Can be had for 'free' if lossless is supported, plus the right header flags to restuff the references from lossless copies in a packed hidden frame.&lt;br /&gt;
** Use of explicit (rather than staged) super-resolution and/or deeper references may make this functionality unattractive due to increased overhead.&lt;br /&gt;
* Internal overlays which could be swapped without re-encoding? (e.g. advertising, station ID). Could also be automatically generated by a Sufficiently Advanced™ encoder to improve efficiencies for static sprites over moving backgrounds.&lt;br /&gt;
** Complicates making the complexity bounded. No Sufficiently Advanced™ encoder likely to ever exist. But perhaps the station id/advertising uses fully justify this.&lt;br /&gt;
** Could be done externally to the video codec, but if so it's not likely to be useful for anyone ever.&lt;br /&gt;
* A secondary reference implementation in OpenCL, maintained throughout development, to make sure that the codec is GPU-friendly and can be done efficiently using OpenCL primitives.&lt;br /&gt;
* SWAR-friendly arithmetic.  For example, choosing transform coefficients so that no intermediate product overflows 16 bits (tricky for signed values) can sometimes enable (e.g.) 4 parallel operations in one uint64_t.  This can allow a pure C reference implementation to run faster, which is valuable for initial adoption and ports to new platforms.&lt;br /&gt;
* Parametric decode-side blur.&lt;br /&gt;
** Symmetrical blur in regions that are smooth on scales longer than the block size.  Could be signaled or derived from observed DC values.&lt;br /&gt;
** Motion blur so that moving objects are blurred along the motion vector.  May require coding a shutter speed parameter (0..1 as a fraction of the inter-frame interval).&lt;br /&gt;
* Fancy block property prediction.  (Not clear how these predictions interact with intra pred)&lt;br /&gt;
** Predict block properties (quantizer, energy, etc.) from MV.  (0,0) probably means small delta.  Larger MV's may correspond to larger deltas ... although  at low shutter speeds large MVs may correlate with reduced overall HF energy.&lt;br /&gt;
** Predict delta spectral shape from source block spectral shape.  HF/LF ratio of the delta may be correlated with the same ratio in its source blocks.  Works well with decode-side fDCT.&lt;br /&gt;
&lt;br /&gt;
==Negative results==&lt;br /&gt;
&lt;br /&gt;
* Using Kurtosis for detecting text in a frame&lt;br /&gt;
** The idea was to detect a Bernoulli distribution, but it's not robust and too noisy&lt;/div&gt;</summary>
		<author><name>Rillian</name></author>	</entry>

	<entry>
		<id>http://wiki.xiph.org/OpusRelease</id>
		<title>OpusRelease</title>
		<link rel="alternate" type="text/html" href="http://wiki.xiph.org/OpusRelease"/>
				<updated>2012-06-29T21:41:41Z</updated>
		
		<summary type="html">&lt;p&gt;Rillian: /* Release management / marketing work */ We have opus-tools binaries now&lt;/p&gt;
&lt;hr /&gt;
&lt;div&gt;== Opus release planning ==&lt;br /&gt;
&lt;br /&gt;
This page is for organizing todo items for the initial opus release. Some items would best be dribbled out after the initial release while others are more release critical (tools binaries, test vectors, examples)&lt;br /&gt;
&lt;br /&gt;
=== AUTH48 ===&lt;br /&gt;
&lt;br /&gt;
* Update version in Makefile.draft&lt;br /&gt;
* Update rfcXXXX strings&lt;br /&gt;
* Update sha1 in the build script&lt;br /&gt;
&lt;br /&gt;
=== Release management / marketing work ===&lt;br /&gt;
&lt;br /&gt;
* &amp;lt;strike&amp;gt;Windows binaries&amp;lt;/strike&amp;gt;&lt;br /&gt;
* &amp;lt;strike&amp;gt;Mac binaries&amp;lt;/strike&amp;gt;&lt;br /&gt;
* Get binary packages of opus and opus-tools working for major distros&lt;br /&gt;
&lt;br /&gt;
* Ogg Opus test vectors&lt;br /&gt;
&lt;br /&gt;
* More 'brochure' level text about Opus and its advantages&lt;br /&gt;
&lt;br /&gt;
* &amp;lt;strike&amp;gt;Logo&amp;lt;/strike&amp;gt;&lt;br /&gt;
&lt;br /&gt;
* More images/graphs/art for the site (PEAQ sweeps?)&lt;br /&gt;
&lt;br /&gt;
* Overhaul of the comparative page&lt;br /&gt;
&lt;br /&gt;
* demo pages&lt;br /&gt;
** static demo with music, voice; like [https://people.xiph.org/~giles/2012/opus this one]&lt;br /&gt;
** albums&lt;br /&gt;
** audiobooks&lt;br /&gt;
** remote control encoder (oneman)&lt;br /&gt;
** emscripten decoder&lt;br /&gt;
** A recording of a remote jamming event (bemasc)&lt;br /&gt;
** update Greg's CELT try-tool[http://people.xiph.org/~greg/trial_tool.png] for libopus&lt;br /&gt;
** Son-of-[http://people.xiph.org/~xiphmont/demo/celt/demo.html Monty-CELT-demo] (An in-the-weeds technology demo)&lt;br /&gt;
*** This may actually need to be a series of two, one covering LP mode and one covering hybrid and the rest of the system.&lt;br /&gt;
&lt;br /&gt;
* Improved documentation (e.g. an API overview doc that isn't a doxygen dump), &amp;quot;idiot's guide to using libopus&amp;quot;&lt;br /&gt;
&lt;br /&gt;
* Bugtracker&lt;br /&gt;
&lt;br /&gt;
* Client applications beyond Firefox, esp VLC and foobar2000. See [[OPUS TODO]] for a list.&lt;br /&gt;
&lt;br /&gt;
* Icecast release with opus support [https://github.com/krad-radio/icecast-oneman/commit/35927ca52f8e538eb20d8a185b8c10f1f2e9118a patch here]&lt;br /&gt;
&lt;br /&gt;
* '''Expanding this list'''&lt;br /&gt;
&lt;br /&gt;
=== Development work ===&lt;br /&gt;
&lt;br /&gt;
* fix remaining fixed point overflows (gmaxwell/jmspeex)&lt;br /&gt;
* Make opus_custom_demo read/write little endian on BE hosts.&lt;br /&gt;
* Boring standard makefiles for many systems (VMS too? haha)&lt;br /&gt;
* Oggdropish GUI for opusenc&lt;br /&gt;
* Additional tools (validator, gain, udp streaming example)&lt;br /&gt;
* libao for opus-tools &lt;br /&gt;
* LP-mode CBR (boundary conditions, iterations, harmonic average)&lt;br /&gt;
* Fix -fstack-protector-all (e.g. mingw32) or remove it&lt;br /&gt;
* &amp;lt;strike&amp;gt;MIN32/MAX32 used stupidly in celt/ where the arguments are evaluated twice.&amp;lt;/strike&amp;gt;&lt;br /&gt;
* &amp;lt;strike&amp;gt;why does --speech + 24kbit make opus-tools code stereo where it's mono without speech?&amp;lt;/strike&amp;gt;&lt;br /&gt;
* make test_opus_encode test more stupidly low rates.&lt;br /&gt;
* random duration for the first/last frame in opus-tools&lt;br /&gt;
* Activity CTL&lt;br /&gt;
&lt;br /&gt;
== Later work ==&lt;br /&gt;
&lt;br /&gt;
* short block high rate waste in mdct mode&lt;br /&gt;
* (mdct mode) silence detection&lt;br /&gt;
* Prod the second implementation (Tim's) into something (quasi-)releasable&lt;/div&gt;</summary>
		<author><name>Rillian</name></author>	</entry>

	<entry>
		<id>http://wiki.xiph.org/Work_In_Progress</id>
		<title>Work In Progress</title>
		<link rel="alternate" type="text/html" href="http://wiki.xiph.org/Work_In_Progress"/>
				<updated>2012-06-29T21:39:32Z</updated>
		
		<summary type="html">&lt;p&gt;Rillian: s/CELT/Opus/&lt;/p&gt;
&lt;hr /&gt;
&lt;div&gt;* '''General Usage:'''&lt;br /&gt;
** [[Ogg_Index]]: Introducing index headers into Ogg&lt;br /&gt;
** [[Metadata]]: Various types of Ogg metadata including the [[M3F]] (Multimedia Metadata Format) and [[XMLEmbedding]]&lt;br /&gt;
** [[MIME_Types_and_File_Extensions]]: MIME Types and file extensions for Ogg multimedia files&lt;br /&gt;
** [[Subtle]]: Subtitling tool for professional use that intends to support most subtitle formats including CMML and OggKate&lt;br /&gt;
** [[OggText]]: A generic media mapping for (discontinuous) text codecs into Ogg&lt;br /&gt;
** [[ROE]]: A description format for describing the tracks and languages etc. of an Ogg multitrack composition&lt;br /&gt;
&lt;br /&gt;
* '''Compressed Codecs:'''&lt;br /&gt;
** [[OggOpus]]: A low-latency lossy audio codec&lt;br /&gt;
** [[Daala]]: A next-generation video codec research project&lt;br /&gt;
** [[Theora]]: A lossy video codec [[TheoraTodo]]&lt;br /&gt;
** [[OggDirac]]: The &amp;quot;next-generation&amp;quot; wavelet based video codec, lossy or lossless &lt;br /&gt;
** [[OggMNG]]: A mapping for encapsulating the MNG animation format in Ogg&lt;br /&gt;
&lt;br /&gt;
* '''Uncompressed Codecs:'''&lt;br /&gt;
** [[OggKate]]: A codec for karaoke and text encapsulation in Ogg&lt;br /&gt;
** [[OggPCM]]: Uncompressed PCM audio, currently being implemented&lt;br /&gt;
** [[OggSpots]]: A mapping for encapsulating timed images in Ogg&lt;br /&gt;
** [[OggUVS]]: Uncompressed RGB and YUV video&lt;br /&gt;
&lt;br /&gt;
* '''Abandonware''' (nobody working on those as far as we know)&lt;br /&gt;
** [[Ghost]]: A &amp;quot;next-generation&amp;quot; audio codec (vapourware so far -- don't hold your breath)&lt;br /&gt;
** [[Oggless]]: Embedding Xiph codecs like Vorbis in containers other than Ogg&lt;br /&gt;
** [[IceShare]]: P2P content distribution&lt;br /&gt;
** [[OggPCM_Draft1]]: Original uncompressed PCM audio proposal&lt;br /&gt;
** [[OggRGB]]: Original uncompressed RGB video proposal&lt;br /&gt;
** [[OggWrit]]: Text phrase codec (e.g. subtitles)&lt;br /&gt;
** [[OggYUV]]: Original uncompressed YUV video proposal&lt;br /&gt;
&lt;br /&gt;
[[Category:Developers stuff]]&lt;/div&gt;</summary>
		<author><name>Rillian</name></author>	</entry>

	<entry>
		<id>http://wiki.xiph.org/OggOpus/testvectors</id>
		<title>OggOpus/testvectors</title>
		<link rel="alternate" type="text/html" href="http://wiki.xiph.org/OggOpus/testvectors"/>
				<updated>2012-06-26T00:24:56Z</updated>
		
		<summary type="html">&lt;p&gt;Rillian: link to greg's file collection&lt;/p&gt;
&lt;hr /&gt;
&lt;div&gt;This page lists test vectors needed for OggOpus which are specific to the Ogg mapping (separate from the Opus bitstream test vectors, though they do some bitstream testing as a side effect)&lt;br /&gt;
&lt;br /&gt;
Greg is collecting a draft file set at https://people.xiph.org/~greg/opus_testvectors/&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
* All test vectors should be chained files with at least two parts&lt;br /&gt;
** Chained file where the second link has no pregap and starts with inter frames (to ensure that decoder state is reset)&lt;br /&gt;
* Pre-skip (set large pre-skip with a chime &amp;quot;if you just heard a chime, your player is broken&amp;quot;)&lt;br /&gt;
* Multichannel&lt;br /&gt;
** Multichannel stereo (e.g. mono+mono)&lt;br /&gt;
** Multichannel w/pre-skip and random channel maps&lt;br /&gt;
** Multichannel with silent channels&lt;br /&gt;
*** Totally silent multichannel  (Should this one be invalid?)&lt;br /&gt;
** Multichannel with repeated channels (i.e. one stream used for multiple channels)&lt;br /&gt;
** Multichannel with 256 channels&lt;br /&gt;
** Mapping tests for the Vorbis mappings (e.g. name of the speaker spoken by each speaker)&lt;br /&gt;
* Files with crazy input rate.&lt;br /&gt;
* Header-gain set very high with a very quiet input (silent if you don't implement header gain).&lt;br /&gt;
* Header-gain set very low with an input that will clip a decoder if the header gain is not done internally.&lt;br /&gt;
* Header-gain set very low, and R128_TRACK_GAIN to normalize it&lt;br /&gt;
** matching WAV outputs ... but matching to what?&lt;br /&gt;
* Single packet per page&lt;br /&gt;
* Utterly stuffed pages with constant continued pages&lt;br /&gt;
* Pages whose contents are entirely and partially dropped frames (len=0) (maybe redundant with bitstream tests)&lt;br /&gt;
* Files with chimes after the end (testing end length chopping)&lt;br /&gt;
* File with all opus modes and frame sizes&lt;br /&gt;
* Stereo files using many mono frames at the beginning/end&lt;br /&gt;
* OpusTags comment values containing very large nonsense comments, duplicate comment values etc.&lt;br /&gt;
* Files with non-zero initial granulepos, pre-skip, trimmed last page to check duration calculation&lt;br /&gt;
&lt;br /&gt;
=== Illegal test vectors that MUST fail ===&lt;br /&gt;
* Zero streams (N=0)&lt;br /&gt;
* Too many two-output streams&lt;br /&gt;
** M&amp;gt;N&lt;br /&gt;
** M&amp;lt;=N but M+N&amp;gt;255&lt;br /&gt;
* Channels mapped to nonexistent stream indices (255 &amp;gt; index &amp;gt;= M+N)&lt;br /&gt;
* Illegal OpusTags comments&lt;br /&gt;
** Total length larger or shorter than the packet&lt;br /&gt;
** Illegal field names&lt;br /&gt;
** Illegal field contents&lt;br /&gt;
** Illegal field (no &amp;quot;=&amp;quot;)&lt;br /&gt;
** Multiple R128_TRACK_GAIN comments (should this be required to fail?)&lt;br /&gt;
** R128_TRACK_GAIN comments containing illegal values (should this be required to fail?)&lt;br /&gt;
*** Non-ASCII encodings of correct-looking values&lt;br /&gt;
* All GP==0&lt;br /&gt;
* first data granulepos too small&lt;br /&gt;
* preskip &amp;gt; final granulepos&lt;/div&gt;</summary>
		<author><name>Rillian</name></author>	</entry>

	<entry>
		<id>http://wiki.xiph.org/OPUS_TODO</id>
		<title>OPUS TODO</title>
		<link rel="alternate" type="text/html" href="http://wiki.xiph.org/OPUS_TODO"/>
				<updated>2012-06-13T17:05:10Z</updated>
		
		<summary type="html">&lt;p&gt;Rillian: /* 1.0 Launch */ Some progress on these goals&lt;/p&gt;
&lt;hr /&gt;
&lt;div&gt;== IETF draft ==&lt;br /&gt;
* &amp;lt;s&amp;gt;New comparison tool&amp;lt;/s&amp;gt; done in draft-11&lt;br /&gt;
* &amp;lt;s&amp;gt;Update test vectors&amp;lt;/s&amp;gt; done in draft-11&lt;br /&gt;
&lt;br /&gt;
== Spec ==&lt;br /&gt;
&lt;br /&gt;
* &amp;lt;s&amp;gt;Finish codec draft&amp;lt;/s&amp;gt;&lt;br /&gt;
* Get draft through the RFC process&lt;br /&gt;
* &amp;lt;s&amp;gt;Ogg mapping (including multi-channel)&amp;lt;/s&amp;gt;. See: [[OggOpus]]&lt;br /&gt;
* Matroska mapping. See: [[MatroskaOpus]]&lt;br /&gt;
* RTP payload format&lt;br /&gt;
&lt;br /&gt;
== 1.0 Launch ==&lt;br /&gt;
* De-uglify webpage&lt;br /&gt;
* &amp;lt;s&amp;gt;Add logo&amp;lt;/s&amp;gt;&lt;br /&gt;
* FAQ&lt;br /&gt;
* Promotional material&lt;br /&gt;
* &amp;lt;s&amp;gt;Opus tools releases&amp;lt;/s&amp;gt;&lt;br /&gt;
&lt;br /&gt;
== Other ==&lt;br /&gt;
&lt;br /&gt;
* &amp;lt;s&amp;gt;Logo See: [https://bugzilla.mozilla.org/show_bug.cgi?id=689261 Mozilla bug 689261] for some discussion&amp;lt;/s&amp;gt;&lt;br /&gt;
* Test vectors&lt;br /&gt;
* Listening tests&lt;br /&gt;
* Documentation (at a minimum every exported symbol should have complete and accurate documentation)&lt;br /&gt;
* Add content to opus-codec.org&lt;br /&gt;
** The above documentation&lt;br /&gt;
** Presentations &lt;br /&gt;
** Examples and test results  (hyperlink to Monty's demo, gmaxwell's HA results page, etc)&lt;br /&gt;
* Oggz-validate (should also validate opus toc)&lt;br /&gt;
&lt;br /&gt;
== Opus-tools ==&lt;br /&gt;
* Build infrastructure (e.g. autotools)&lt;br /&gt;
* A simple real time streaming example tool&lt;br /&gt;
* &amp;lt;s&amp;gt;Multichannel support&amp;lt;/s&amp;gt; doneish.&lt;br /&gt;
* Replaygain (half done— needs a gain tool)&lt;br /&gt;
* &amp;lt;s&amp;gt;Testing (incl. jenkins automation)&amp;lt;/s&amp;gt; doneish&lt;br /&gt;
&lt;br /&gt;
== Third party software ==&lt;br /&gt;
* Support in ekiga&lt;br /&gt;
* Support in mumble&lt;br /&gt;
* Support in asterisk&lt;br /&gt;
* Support in icecast&lt;br /&gt;
* Support in firefox (rtcweb and in ogg)&lt;br /&gt;
* Support in VLC&lt;br /&gt;
* Support in ogg123&lt;br /&gt;
* Support in ffmpeg&lt;br /&gt;
* Support in rockbox&lt;br /&gt;
* Support in foobar2000&lt;br /&gt;
* Support in gstreamer&lt;br /&gt;
* Support in mplayer&lt;br /&gt;
* Support in xmms&lt;br /&gt;
* Support in oggdsf&lt;br /&gt;
* Support in xiphqt&lt;br /&gt;
* Support in RoarAudio (specs + roard + libroardsp)&lt;br /&gt;
&lt;br /&gt;
== Future work ==&lt;br /&gt;
* Smart automatic mode decision&lt;br /&gt;
* psymodel based VBR&lt;br /&gt;
* Remove copy in inverse MDCT&lt;br /&gt;
* Save some float&amp;lt;-&amp;gt;int conversions&lt;br /&gt;
* Improvements to LP mode CBR (greg has some code)&lt;/div&gt;</summary>
		<author><name>Rillian</name></author>	</entry>

	<entry>
		<id>http://wiki.xiph.org/OPUS_TODO</id>
		<title>OPUS TODO</title>
		<link rel="alternate" type="text/html" href="http://wiki.xiph.org/OPUS_TODO"/>
				<updated>2012-06-13T17:03:38Z</updated>
		
		<summary type="html">&lt;p&gt;Rillian: /* Spec */ We have a draft for the ogg mapping&lt;/p&gt;
&lt;hr /&gt;
&lt;div&gt;== IETF draft ==&lt;br /&gt;
* &amp;lt;s&amp;gt;New comparison tool&amp;lt;/s&amp;gt; done in draft-11&lt;br /&gt;
* &amp;lt;s&amp;gt;Update test vectors&amp;lt;/s&amp;gt; done in draft-11&lt;br /&gt;
&lt;br /&gt;
== Spec ==&lt;br /&gt;
&lt;br /&gt;
* &amp;lt;s&amp;gt;Finish codec draft&amp;lt;/s&amp;gt;&lt;br /&gt;
* Get draft through the RFC process&lt;br /&gt;
* &amp;lt;s&amp;gt;Ogg mapping (including multi-channel)&amp;lt;/s&amp;gt;. See: [[OggOpus]]&lt;br /&gt;
* Matroska mapping. See: [[MatroskaOpus]]&lt;br /&gt;
* RTP payload format&lt;br /&gt;
&lt;br /&gt;
== 1.0 Launch ==&lt;br /&gt;
* De-uglify webpage&lt;br /&gt;
* Add logo&lt;br /&gt;
* FAQ&lt;br /&gt;
* Promotional material&lt;br /&gt;
* Opus tools releases&lt;br /&gt;
&lt;br /&gt;
== Other ==&lt;br /&gt;
&lt;br /&gt;
* &amp;lt;s&amp;gt;Logo See: [https://bugzilla.mozilla.org/show_bug.cgi?id=689261 Mozilla bug 689261] for some discussion&amp;lt;/s&amp;gt;&lt;br /&gt;
* Test vectors&lt;br /&gt;
* Listening tests&lt;br /&gt;
* Documentation (at a minimum every exported symbol should have complete and accurate documentation)&lt;br /&gt;
* Add content to opus-codec.org&lt;br /&gt;
** The above documentation&lt;br /&gt;
** Presentations &lt;br /&gt;
** Examples and test results  (hyperlink to Monty's demo, gmaxwell's HA results page, etc)&lt;br /&gt;
* Oggz-validate (should also validate opus toc)&lt;br /&gt;
&lt;br /&gt;
== Opus-tools ==&lt;br /&gt;
* Build infrastructure (e.g. autotools)&lt;br /&gt;
* A simple real time streaming example tool&lt;br /&gt;
* &amp;lt;s&amp;gt;Multichannel support&amp;lt;/s&amp;gt; doneish.&lt;br /&gt;
* Replaygain (half done— needs a gain tool)&lt;br /&gt;
* &amp;lt;s&amp;gt;Testing (incl. jenkins automation)&amp;lt;/s&amp;gt; doneish&lt;br /&gt;
&lt;br /&gt;
== Third party software ==&lt;br /&gt;
* Support in ekiga&lt;br /&gt;
* Support in mumble&lt;br /&gt;
* Support in asterisk&lt;br /&gt;
* Support in icecast&lt;br /&gt;
* Support in firefox (rtcweb and in ogg)&lt;br /&gt;
* Support in VLC&lt;br /&gt;
* Support in ogg123&lt;br /&gt;
* Support in ffmpeg&lt;br /&gt;
* Support in rockbox&lt;br /&gt;
* Support in foobar2000&lt;br /&gt;
* Support in gstreamer&lt;br /&gt;
* Support in mplayer&lt;br /&gt;
* Support in xmms&lt;br /&gt;
* Support in oggdsf&lt;br /&gt;
* Support in xiphqt&lt;br /&gt;
* Support in RoarAudio (specs + roard + libroardsp)&lt;br /&gt;
&lt;br /&gt;
== Future work ==&lt;br /&gt;
* Smart automatic mode decision&lt;br /&gt;
* psymodel based VBR&lt;br /&gt;
* Remove copy in inverse MDCT&lt;br /&gt;
* Save some float&amp;lt;-&amp;gt;int conversions&lt;br /&gt;
* Improvements to LP mode CBR (greg has some code)&lt;/div&gt;</summary>
		<author><name>Rillian</name></author>	</entry>

	<entry>
		<id>http://wiki.xiph.org/OpusVersions</id>
		<title>OpusVersions</title>
		<link rel="alternate" type="text/html" href="http://wiki.xiph.org/OpusVersions"/>
				<updated>2012-05-28T19:33:05Z</updated>
		
		<summary type="html">&lt;p&gt;Rillian: /* Scheme 6: A.B */ scheme 6 is the same as scheme 3&lt;/p&gt;
&lt;hr /&gt;
&lt;div&gt;Possible version schemes:&lt;br /&gt;
&lt;br /&gt;
== Scheme 1: A.B.C.D ==&lt;br /&gt;
&lt;br /&gt;
* A=1&lt;br /&gt;
* B is the RFC version&lt;br /&gt;
* C is for feature improvements&lt;br /&gt;
* D is for bugfixes&lt;br /&gt;
&lt;br /&gt;
== Scheme 2: A.B.C ==&lt;br /&gt;
&lt;br /&gt;
* A is the RFC version&lt;br /&gt;
* B is for feature improvements&lt;br /&gt;
* C is for bugfixes&lt;br /&gt;
&lt;br /&gt;
== Scheme 3: A.B.C ==&lt;br /&gt;
&lt;br /&gt;
* A=1&lt;br /&gt;
* B is the RFC version and for feature improvements&lt;br /&gt;
* C is for bugfixes&lt;br /&gt;
&lt;br /&gt;
== Scheme 4: A.B.C ==&lt;br /&gt;
&lt;br /&gt;
* A is for RFC versions that aren't both backward- and forward-compatible (e.g. extensions)&lt;br /&gt;
* B is for feature improvements and RFC versions that are both backward- and forward-compatible (e.g. clarifications, minor fixes)&lt;br /&gt;
* C is for bugfixes&lt;br /&gt;
&lt;br /&gt;
== Scheme 5: A.B.C ==&lt;br /&gt;
&lt;br /&gt;
* A is for any new RFC&lt;br /&gt;
* B is for feature improvements compliant with A&lt;br /&gt;
* C is for bug fixes&lt;br /&gt;
&lt;br /&gt;
== Scheme 6: A.B ==&lt;br /&gt;
&lt;br /&gt;
* A is for features and RFC changes&lt;br /&gt;
* B is for bugfixes&lt;br /&gt;
&lt;br /&gt;
This is Scheme 3 with the initial '1' dropped.&lt;/div&gt;</summary>
		<author><name>Rillian</name></author>	</entry>

	<entry>
		<id>http://wiki.xiph.org/OpusVersions</id>
		<title>OpusVersions</title>
		<link rel="alternate" type="text/html" href="http://wiki.xiph.org/OpusVersions"/>
				<updated>2012-05-28T19:31:52Z</updated>
		
		<summary type="html">&lt;p&gt;Rillian: two more possibilities&lt;/p&gt;
&lt;hr /&gt;
&lt;div&gt;Possible version schemes:&lt;br /&gt;
&lt;br /&gt;
== Scheme 1: A.B.C.D ==&lt;br /&gt;
&lt;br /&gt;
* A=1&lt;br /&gt;
* B is the RFC version&lt;br /&gt;
* C is for feature improvements&lt;br /&gt;
* D is for bugfixes&lt;br /&gt;
&lt;br /&gt;
== Scheme 2: A.B.C ==&lt;br /&gt;
&lt;br /&gt;
* A is the RFC version&lt;br /&gt;
* B is for feature improvements&lt;br /&gt;
* C is for bugfixes&lt;br /&gt;
&lt;br /&gt;
== Scheme 3: A.B.C ==&lt;br /&gt;
&lt;br /&gt;
* A=1&lt;br /&gt;
* B is the RFC version and for feature improvements&lt;br /&gt;
* C is for bugfixes&lt;br /&gt;
&lt;br /&gt;
== Scheme 4: A.B.C ==&lt;br /&gt;
&lt;br /&gt;
* A is for RFC versions that aren't both backward- and forward-compatible (e.g. extensions)&lt;br /&gt;
* B is for feature improvements and RFC versions that are both backward- and forward-compatible (e.g. clarifications, minor fixes)&lt;br /&gt;
* C is for bugfixes&lt;br /&gt;
&lt;br /&gt;
== Scheme 5: A.B.C ==&lt;br /&gt;
&lt;br /&gt;
* A is for any new RFC&lt;br /&gt;
* B is for feature improvements compliant with A&lt;br /&gt;
* C is for bug fixes&lt;br /&gt;
&lt;br /&gt;
== Scheme 6: A.B ==&lt;br /&gt;
&lt;br /&gt;
* A is for features and RFC changes&lt;br /&gt;
* B is for bugfixes&lt;/div&gt;</summary>
		<author><name>Rillian</name></author>	</entry>

	<entry>
		<id>http://wiki.xiph.org/OggOpus</id>
		<title>OggOpus</title>
		<link rel="alternate" type="text/html" href="http://wiki.xiph.org/OggOpus"/>
				<updated>2012-05-23T23:40:12Z</updated>
		
		<summary type="html">&lt;p&gt;Rillian: /* Ogg Mapping for Opus */ mime-type and filename extension&lt;/p&gt;
&lt;hr /&gt;
&lt;div&gt;== Ogg Mapping for Opus ==&lt;br /&gt;
&lt;br /&gt;
The IETF Opus codec is a low-latency audio codec optimized for both voice and general-purpose audio. See the [http://tools.ietf.org/html/draft-ietf-codec-opus Opus Specification] for technical details.&lt;br /&gt;
&lt;br /&gt;
Almost everything about Opus is either fixed or dynamically switchable, so most of the usual ID and setup header parameters in the header packets of an Ogg encapsulation aren't needed. In particular, bitrate, packet duration, mono/stereo flags, and coding modes are all dynamically switchable from packet to packet. The first one or two bytes in each data packet, the start of the 'TOC sequence' that defines the layout of the packet, specifies all of these parameters for that particular packet. See Section 3 of the Opus Specification for the exact format of the TOC sequence.&lt;br /&gt;
&lt;br /&gt;
The remaining parameters that must be signaled are&lt;br /&gt;
&lt;br /&gt;
* The magic number for stream identification,&lt;br /&gt;
* The stream count and coupling for multichannel audio, and&lt;br /&gt;
* Any metadata or tags.&lt;br /&gt;
&lt;br /&gt;
=== Content Type ===&lt;br /&gt;
&lt;br /&gt;
The recommended mime-type for Ogg Opus files is '''audio/ogg''', defined in [http://www.ietf.org/rfc/rfc5334.txt RFC 5334].&lt;br /&gt;
&lt;br /&gt;
If more specificity is desired, one can distinguish Opus files as 'audio/ogg; codecs=opus'.&lt;br /&gt;
&lt;br /&gt;
The recommended filename extension for Ogg Opus files is '''.opus'''.&lt;br /&gt;
&lt;br /&gt;
=== Packet Organization ===&lt;br /&gt;
&lt;br /&gt;
Opus is framed in a continuous logical [http://www.xiph.org/ogg/doc/framing.html Ogg stream]. &lt;br /&gt;
&lt;br /&gt;
There are two mandatory headers. The granule position of the pages containing these headers MUST be zero.&lt;br /&gt;
&lt;br /&gt;
The first packet in the logical Ogg stream MUST contain the identification header, which uniquely identifies a stream as Opus audio. It MUST begin with the 8 bytes &amp;quot;OpusHead&amp;quot;. It MUST be placed alone in the first page of the logical Ogg stream. This page MUST have the ’beginning of stream’ flag set.&lt;br /&gt;
&lt;br /&gt;
The second Opus packet MUST contain the comment header. It MUST begin with the 8 bytes &amp;quot;OpusTags&amp;quot;. It MAY span one or more pages, beginning on the second page of the logical stream. However many pages it spans, the comment header packet MUST finish the page on which it ends.&lt;br /&gt;
&lt;br /&gt;
All subsequent pages are audio data pages and the packets they contain are audio data packets. The first audio page SHOULD NOT have the 'continued packet' flag set (which would indicate the first audio packet is continued from a previous page). Packets MUST be placed into Ogg pages in order until the end of stream. Audio packets MAY span page boundaries. A decoder MUST treat a zero-byte audio packet as if it were an Opus packet with an illegal TOC sequence. The last page SHOULD have the 'end of stream' flag set, but implementations should be prepared to deal with truncated streams which do not have a page marked 'end of stream'. The final packet SHOULD complete on the last page, i.e., the final lacing value should be less than 255. There MUST NOT be any more pages in an Opus logical stream after a page marked 'end of stream'.&lt;br /&gt;
&lt;br /&gt;
=== Granule Position ===&lt;br /&gt;
&lt;br /&gt;
The granule position of an audio page encodes the total number of PCM samples in the stream up to and including the last fully-decodable sample from the last packet ''completed'' on that page. A page that is entirely spanned by a single packet (that completes on a subsequent page) has no granule position, and the granule position field MUST be set to the special value ’-1’ in two's complement.&lt;br /&gt;
&lt;br /&gt;
The granule position of an audio page is in units of PCM audio samples at a fixed rate of 48 kHz (per channel; a stereo stream’s granule position does not increment at twice the speed of a mono stream). It is possible to run a decoder at other sampling rates, but the format and this specification always count samples assuming a 48 kHz decoding rate.&lt;br /&gt;
&lt;br /&gt;
The duration of an Opus packet may be any multiple of 2.5 ms, up to a maximum of 120 ms. This duration is encoded in the TOC sequence at the beginning of each packet. The number of samples returned by a decoder corresponds to this duration exactly, even for the first few packets. For example, a 20 ms packet fed to a decoder running at 48 kHz will always return 960 samples. A demuxer can parse these TOC sequences to work backwards or forwards from a packet with a known granule position (i.e., the last packet completed on some page) in order to assign granule positions to every packet, or even every individual sample. The one exception is the last page in the stream, as described below.&lt;br /&gt;
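&lt;br /&gt;
As a rough illustration of the duration arithmetic above, the sample count of a packet at the fixed 48 kHz rate could be computed as follows. The helper name samples_at_48khz is hypothetical and is not part of any Opus library; it only checks the limits stated in this section.&lt;br /&gt;
&lt;br /&gt;
```python
SAMPLES_PER_UNIT = 120  # one 2.5 ms unit at 48 kHz
MAX_UNITS = 48          # 120 ms divided by 2.5 ms


def samples_at_48khz(duration_ms):
    """Return the 48 kHz sample count for a packet duration in milliseconds."""
    units = duration_ms / 2.5
    # Durations are whole multiples of 2.5 ms, from 2.5 ms up to 120 ms.
    if units != int(units) or int(units) not in range(1, MAX_UNITS + 1):
        raise ValueError("duration must be a multiple of 2.5 ms, at most 120 ms")
    return int(units) * SAMPLES_PER_UNIT
```
&lt;br /&gt;
With this sketch, samples_at_48khz(20) gives the 960 samples mentioned above.&lt;br /&gt;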
&lt;br /&gt;
All other pages with completed packets after the first MUST have a granule position equal to the number of samples contained in packets that complete on that page plus the granule position of the most recent page with completed packets. This guarantees that a demuxer can assign individual packets the same granule position when working forwards as when working backwards. There must not be any gaps. In order to support capturing a stream that uses discontinuous transmission (DTX), an encoder SHOULD emit packets that explicitly request the use of Packet Loss Concealment (PLC) (i.e., with a frame length of 0, as defined in Section 3.2.1 of the Opus Specification) in place of the packets that were not transmitted.&lt;br /&gt;
&lt;br /&gt;
There is some amount of latency introduced during the decoding process, to allow for overlap in the MDCT modes, stereo mixing in the LP modes, and resampling, and the encoder will introduce even more latency (though the exact amount is not specified). Therefore the first few samples produced by the decoder do not correspond to any real, input audio, but are instead composed of padding inserted by the encoder to compensate for this latency. These samples must be stored and decoded, as Opus is an asymptotically convergent predictive codec, meaning the decoded contents of each frame depend on the recent history of decoder inputs. A 'pre-skip' field in the ID header signals the number of samples which should be skipped at the beginning of the stream. This provides sufficient history to the decoder so that it has already converged before the stream's output begins. It may also be used to perform sample-accurate cropping of existing encoded streams. This amount need not be a multiple of 2.5 ms, may be smaller than a single packet, or may span the contents of several packets.&lt;br /&gt;
&lt;br /&gt;
The PCM sample position is determined from the granule position using the formula&lt;br /&gt;
&lt;br /&gt;
 'PCM sample position' = 'granule position' - 'pre-skip' .&lt;br /&gt;
&lt;br /&gt;
For example, if the granule position of the first page is 59971, and the pre-skip is 11971, then the PCM sample position of the last decoded sample from the first page is 48000. This may be converted into a playback time using the formula&lt;br /&gt;
&lt;br /&gt;
                   'PCM sample position'&lt;br /&gt;
 'playback time' = --------------------- .&lt;br /&gt;
                          48000.0&lt;br /&gt;
&lt;br /&gt;
The initial PCM sample position before any samples are played is normally '0'. In this case, the PCM sample position of the first audio sample to be played starts at '1', because it marks the time on the clock ''after'' that sample has been played, and a stream that is exactly one second long has a final PCM sample position of '48000', as in the example here.&lt;br /&gt;
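&lt;br /&gt;
The two formulas above can be sketched in a few lines. The function names are illustrative assumptions, not an existing API.&lt;br /&gt;
&lt;br /&gt;
```python
OPUS_GRANULE_RATE = 48000.0  # granule positions always count 48 kHz samples


def pcm_sample_position(granule_position, pre_skip):
    """PCM sample position = granule position - pre-skip."""
    return granule_position - pre_skip


def playback_time(granule_position, pre_skip):
    """Playback time in seconds for the given granule position."""
    return pcm_sample_position(granule_position, pre_skip) / OPUS_GRANULE_RATE
```
&lt;br /&gt;
For the example values in the text, pcm_sample_position(59971, 11971) is 48000 and playback_time(59971, 11971) is 1.0 second.&lt;br /&gt;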
&lt;br /&gt;
Vorbis streams use a granule position smaller than the number of audio samples contained in the first page to indicate that some of those samples must be trimmed from the output. However, to do so it requires that the first page contains exactly two packets, in order to allow the decoder to perform PCM position adjustments before needing to return any PCM data. Opus uses the pre-skip mechanism for this purpose instead, since the encoder may introduce more than a single packet's worth of latency, and since very large packets in streams with a very large number of channels may not fit on a single page.&lt;br /&gt;
&lt;br /&gt;
The page with the 'end of stream' flag set MAY have a granule position that indicates the page contains less audio data than would normally be returned by decoding up through the final packet. This is used to end the stream somewhere other than an even frame boundary. The granule position of the most recent audio page with completed packets is used to make this determination, or '0' is used if there were no previous audio pages with a completed packet. The difference between these granule positions indicates how many samples to keep after decoding the packets that completed on the final page. The remaining samples are discarded. The number of discarded samples SHOULD be smaller than the number decoded from the last packet.&lt;br /&gt;
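&lt;br /&gt;
A minimal sketch of the end-trimming arithmetic described above follows. The function name and argument names are hypothetical, and the sketch assumes the final granule position does not claim more samples than were decoded.&lt;br /&gt;
&lt;br /&gt;
```python
def final_page_trim(final_granule, previous_granule, decoded_samples):
    """Return (samples_to_keep, samples_to_discard) for the last page.

    previous_granule is the granule position of the most recent audio page
    with a completed packet, or 0 if there was no such page.
    decoded_samples is the number of samples decoded from the packets that
    completed on the final page.
    """
    keep = final_granule - previous_granule
    discard = decoded_samples - keep
    return keep, discard
```
&lt;br /&gt;
For instance, with a previous granule position of 47040, a final granule position of 47500, and one 960-sample packet on the last page, 460 samples are kept and 500 are discarded.&lt;br /&gt;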
&lt;br /&gt;
The granule position of the first audio page with a completed packet MAY be larger than the number of samples contained in packets that complete on that page; however, it MUST NOT be smaller, unless that page has the 'end of stream' flag set. Permitting a larger granule position allows the beginning of a stream to be cropped without rewriting the granule position of all the remaining pages. This means that the PCM sample position just before the first sample to be played may be larger than '0', but the PCM sample position relative to '0' should still be used for the purposes of synchronization when multiplexing with other logical streams. This does not affect the behavior of pre-skip: exactly 'pre-skip' samples should be skipped from the beginning of the decoded output, even if the initial PCM sample position is greater than zero.&lt;br /&gt;
&lt;br /&gt;
On the other hand, a granule position that is smaller than the number of decoded samples prevents a demuxer from working backwards to assign each packet or each individual sample a valid granule position, since granule positions must be non-negative. A decoder MUST reject as invalid any stream where the granule position is smaller than the number of samples contained in packets that complete on the first page with a completed packet, unless that page has the 'end of stream' flag set. It MAY defer this action until it decodes the last packet completed on that page. If that page has the 'end of stream' flag set, a demuxer can work forwards from the granule position '0', but MUST reject as invalid any stream where the granule position is smaller than the 'pre-skip' amount. This would indicate that more samples should be skipped from the initial decoded output than exist in the stream.&lt;br /&gt;
&lt;br /&gt;
==== ID Header ====&lt;br /&gt;
&lt;br /&gt;
      0                   1                   2                   3&lt;br /&gt;
      0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1&lt;br /&gt;
     +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+&lt;br /&gt;
     |       'O'     |      'p'      |     'u'       |     's'       |&lt;br /&gt;
     +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+&lt;br /&gt;
     |       'H'     |       'e'     |     'a'       |     'd'       |&lt;br /&gt;
     +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+&lt;br /&gt;
     |  version = 1  | channel count |           pre-skip            |&lt;br /&gt;
     +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+&lt;br /&gt;
     |                original input sample rate in Hz               |&lt;br /&gt;
     +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+&lt;br /&gt;
     |    output gain Q7.8 in dB     |  channel map  |               |&lt;br /&gt;
     +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+               :&lt;br /&gt;
     |                                                               |&lt;br /&gt;
     :          optional channel mapping table...                    :&lt;br /&gt;
     |                                                               |&lt;br /&gt;
     +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+&lt;br /&gt;
&lt;br /&gt;
Brief description of each field:&lt;br /&gt;
&lt;br /&gt;
 - Magic signature: &amp;quot;OpusHead&amp;quot; (64 bits)&lt;br /&gt;
 - Version number (8 bits unsigned): 0x01 for this spec&lt;br /&gt;
 - Channel count 'c' (8 bits unsigned): MUST be &amp;gt; 0&lt;br /&gt;
 - Pre-skip (16 bits unsigned, little endian)&lt;br /&gt;
 - Input sample rate (32 bits unsigned, little endian): informational only&lt;br /&gt;
 - Output gain (16 bits, little endian, signed Q7.8 in dB) to apply when&lt;br /&gt;
   decoding&lt;br /&gt;
 - Channel mapping family (8 bits unsigned)&lt;br /&gt;
  --  0 = one stream: mono or L,R stereo&lt;br /&gt;
  --  1 = channels in Vorbis spec order: mono or L,R stereo or ... or FL,C,FR,RL,RR,LFE, ...&lt;br /&gt;
  --  2..254 = reserved (treat as 255)&lt;br /&gt;
  --  255 = no defined channel meaning&lt;br /&gt;
 If channel mapping family &amp;gt; 0&lt;br /&gt;
 - Stream count 'N' (8 bits unsigned): MUST be &amp;gt; 0&lt;br /&gt;
 - Two-channel stream count 'M' (8 bits unsigned): MUST satisfy M &amp;lt;= N, M+N &amp;lt;= 255&lt;br /&gt;
 - Channel mapping (8*c bits)&lt;br /&gt;
   -- one stream index (8 bits unsigned) per channel (255 means silent throughout the file)&lt;br /&gt;
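&lt;br /&gt;
The field layout above can be read with a few lines of code. The following is a minimal sketch, not a reference parser: the function name and the dictionary keys are hypothetical, the version test follows the major/minor rule described under Version, and the limit of two channels for mapping family 0 is taken from the family definitions.&lt;br /&gt;

```python
def parse_opus_head(data):
    """Parse an OpusHead packet into a dict; raise ValueError if it is malformed."""
    if 19 > len(data) or data[:8] != b"OpusHead":
        raise ValueError("not an OpusHead packet")
    version = data[8]
    if version >= 16:
        # Upper four bits are nonzero: treat as an incompatible version.
        raise ValueError("unsupported version")
    channels = data[9]
    if channels == 0:
        raise ValueError("channel count must be > 0")
    head = {
        "version": version,
        "channels": channels,
        "pre_skip": int.from_bytes(data[10:12], "little"),
        "input_sample_rate": int.from_bytes(data[12:16], "little"),
        "output_gain": int.from_bytes(data[16:18], "little", signed=True),
        "mapping_family": data[18],
    }
    if head["mapping_family"] == 0:
        # Family 0 codes no further fields; the defaults are implied.
        if channels > 2:
            raise ValueError("mapping family 0 allows only 1 or 2 channels")
        head.update(streams=1, coupled=channels - 1,
                    mapping=list(range(channels)))
        return head
    if 21 + channels > len(data):
        raise ValueError("truncated channel mapping table")
    streams, coupled = data[19], data[20]
    if streams == 0 or coupled > streams or streams + coupled > 255:
        raise ValueError("invalid stream counts")
    head.update(streams=streams, coupled=coupled,
                mapping=list(data[21:21 + channels]))
    return head
```

The sketch stores the output gain as the raw signed Q7.8 value and leaves interpretation of the mapping table to the caller.&lt;br /&gt;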
&lt;br /&gt;
&lt;br /&gt;
Detailed definition of each field:&lt;br /&gt;
&lt;br /&gt;
* '''Magic signature'''&lt;br /&gt;
The magic signature &amp;quot;OpusHead&amp;quot; allows codec identification and is human readable. Starting with 'Op' helps distinguish it from data packets, as this is an invalid TOC sequence.&lt;br /&gt;
&lt;br /&gt;
* '''Version'''&lt;br /&gt;
The version number MUST always be '1' for this version of the encapsulation specification.&lt;br /&gt;
&lt;br /&gt;
Implementations SHOULD treat streams where the upper four bits of the version number match a recognized specification as backwards-compatible with that specification. That is, the version number can be considered split into &amp;quot;major&amp;quot; and &amp;quot;minor&amp;quot; version sub-fields, with changes to the &amp;quot;minor&amp;quot; sub-field in the lower four bits signaling compatible changes. For example, a decoder implementing this specification SHOULD accept any stream with a version number 15 or less, and SHOULD assume any stream with a version number 16 or greater is incompatible. The initial version '1' was chosen to keep implementations from relying on this byte as a null terminator for the OpusHead string.&lt;br /&gt;
&lt;br /&gt;
* '''Channel count''' 'c'&lt;br /&gt;
The channel count byte specifies the number of output channels (1...255) for this Ogg Opus stream.&lt;br /&gt;
&lt;br /&gt;
* '''Pre-skip'''&lt;br /&gt;
This is the number of samples (at 48 kHz) to discard from the decoder output when starting playback, and also the number to subtract from a page's granule position to calculate its PCM sample position.&lt;br /&gt;
&lt;br /&gt;
When constructing cropped Ogg Opus streams, a pre-skip of at least 3840 samples (80 ms) is RECOMMENDED to ensure complete convergence.&lt;br /&gt;
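&lt;br /&gt;
As a small illustration of the subtraction described above (the helper names are hypothetical; the 48 kHz granule rate is the one stated in the text):&lt;br /&gt;

```python
GRANULE_RATE = 48000  # granule positions and pre-skip are counted at 48 kHz

def pcm_sample_position(granule_pos, pre_skip):
    """PCM sample position corresponding to a page's granule position."""
    return granule_pos - pre_skip

def playback_time_seconds(granule_pos, pre_skip):
    """The same position expressed in seconds of playback."""
    return pcm_sample_position(granule_pos, pre_skip) / GRANULE_RATE
```

For example, a page with granule position 48312 in a stream with a pre-skip of 312 ends exactly one second into playback.&lt;br /&gt;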
&lt;br /&gt;
* '''Input sample rate'''&lt;br /&gt;
This is ''not'' the sample rate to use for playback of the encoded data.&lt;br /&gt;
&lt;br /&gt;
Opus has a handful of coding modes, with internal audio bandwidths of 4, 6, 8, 12, and 20 kHz. Each packet in the stream may have a different audio bandwidth. Regardless of the audio bandwidth, the reference decoder supports decoding any stream at a sample rate of 8, 12, 16, 24, or 48 kHz. The original sample rate of the encoder input is not preserved by the lossy compression.&lt;br /&gt;
&lt;br /&gt;
An Ogg Opus player SHOULD select the playback sample rate according to the following procedure:&lt;br /&gt;
* If the hardware supports 48 kHz playback, decode at 48 kHz,&lt;br /&gt;
* else if the hardware's highest available sample rate is a supported rate, decode at this sample rate,&lt;br /&gt;
* else if the hardware's highest available sample rate is less than 48 kHz, decode at the next higher supported rate and resample,&lt;br /&gt;
* else decode at 48 kHz and resample.&lt;br /&gt;
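&lt;br /&gt;
The procedure above maps directly onto a short function. This is a sketch under the assumption that the player can list the sample rates its hardware supports; the function name and return shape are hypothetical.&lt;br /&gt;

```python
SUPPORTED_RATES = (8000, 12000, 16000, 24000, 48000)  # reference decoder rates in Hz

def choose_decode_rate(hardware_rates):
    """Return (decode_rate, needs_resample) for a non-empty list of hardware rates."""
    if 48000 in hardware_rates:
        return 48000, False
    highest = max(hardware_rates)
    if highest in SUPPORTED_RATES:
        return highest, False
    if 48000 > highest:
        # Decode at the next supported rate above the hardware maximum.
        return min(r for r in SUPPORTED_RATES if r > highest), True
    return 48000, True
```

With hardware limited to 44100 Hz this selects decoding at 48000 Hz followed by resampling; with hardware limited to 22050 Hz it selects 24000 Hz.&lt;br /&gt;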
&lt;br /&gt;
However, the 'input sample rate' field allows the encoder to pass the sample rate of the original input stream as metadata. This may be useful when the user requires the output sample rate to match the input sample rate. For example, a non-player decoder writing PCM format to disk might choose to resample the output audio back to the original input rate to reduce surprise to the user, who might reasonably expect to get back a file with the same sample rate as the one they fed to the encoder.&lt;br /&gt;
&lt;br /&gt;
A value of zero indicates 'unspecified'. Encoders SHOULD write the actual input rate or zero, but decoder implementations that act on this field SHOULD handle implausible values gracefully (e.g. don't actually upsample the output to 10 MHz if requested).&lt;br /&gt;
&lt;br /&gt;
* '''Output gain'''&lt;br /&gt;
This is a gain to be applied by the decoder. Virtually all players and media frameworks should apply it by default. If a player chooses to apply any volume adjustment or gain modification, such as the R128_TRACK_GAIN or a user-facing volume knob, the adjustment MUST be applied ''in addition'' to this output gain in order to achieve playback at the desired volume.&lt;br /&gt;
&lt;br /&gt;
An encoder SHOULD set the output gain to zero, and instead apply any gain prior to encoding, when this is possible and does not conflict with the user's wishes. The output gain should only be nonzero when the gain is adjusted after encoding, or when the user wishes to adjust the gain for playback while preserving the ability to recover the original signal amplitude.&lt;br /&gt;
&lt;br /&gt;
Although the output gain has enormous range (+/- 128 dB, enough to amplify inaudible sounds to the threshold of physical pain), most applications can only reasonably use a small portion of this range around zero. The large range serves in part to ensure that gain can always be losslessly transferred between OpusHead and R128_TRACK_GAIN (see below) without saturating.&lt;br /&gt;
&lt;br /&gt;
The gain is the 20 log&amp;lt;sub&amp;gt;10&amp;lt;/sub&amp;gt; ratio of output to input sample values to be applied to the decoder output. E.g. &amp;lt;code&amp;gt;sample *= pow(10, header.gain/(20.*256))&amp;lt;/code&amp;gt; where header.gain is the raw 16 bit Q7.8 value from the header.&lt;br /&gt;
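&lt;br /&gt;
The same conversion, written out as a sketch that also shows how the raw field is read as a signed little-endian value. The helper names are hypothetical.&lt;br /&gt;

```python
def read_output_gain(field):
    """Decode the two-byte output gain field as a signed Q7.8 integer."""
    return int.from_bytes(field, "little", signed=True)

def output_gain_scale(raw_gain):
    """Convert a signed Q7.8 gain in dB to a linear multiplier for sample values."""
    return 10.0 ** (raw_gain / (20.0 * 256.0))
```

A raw value of 0 leaves samples unchanged, and a raw value of 5120 (20 dB) multiplies them by 10.&lt;br /&gt;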
&lt;br /&gt;
* '''Channel mapping family'''&lt;br /&gt;
This byte indicates the order and semantic meaning of the various channels encoded in each Opus packet.  &lt;br /&gt;
&lt;br /&gt;
Each possible value of this byte indicates a ''mapping family'', which defines a set of allowed numbers of channels, and the ordered set of channel names for each allowed number of channels. Currently there are three defined mapping families, although more may be added:&lt;br /&gt;
&lt;br /&gt;
* Family 0 (RTP mapping)&lt;br /&gt;
** Allowed numbers of channels: 1 or 2&lt;br /&gt;
** 1 channel: monophonic (mono)&lt;br /&gt;
** 2 channels: stereo (left, right)&lt;br /&gt;
** '''Special mapping''': this channel mapping value also indicates that the contents consist of a single Opus stream that is stereo if and only if c==2, with stream index 0 mapped to channel 0, and (if stereo) stream index 1 mapped to channel 1.  When the channel mapping byte has this value, no further fields are present in OpusHead.&lt;br /&gt;
* Family 1 ([http://www.xiph.org/vorbis/doc/Vorbis_I_spec.html#x1-800004.3.9 Vorbis channel order])&lt;br /&gt;
** Allowed numbers of channels: 1 ... 8&lt;br /&gt;
** Channel meanings depend on the number of channels, see the Vorbis mapping for details.&lt;br /&gt;
* Family 255 (no defined channel meaning)&lt;br /&gt;
** Allowed numbers of channels: 1...255&lt;br /&gt;
** Channels are unidentified.  General-purpose players SHOULD NOT attempt to play these streams, and offline decoders MAY deinterleave the output into separate PCM files, one per channel. Decoders SHOULD NOT produce output for channels mapped to stream index 255 (pure silence) unless they have no other way to indicate the index of non-silent channels.&lt;br /&gt;
&lt;br /&gt;
The remaining channel mapping families (2...254) are reserved. A decoder encountering a reserved mapping byte should act as though the mapping byte is 255.&lt;br /&gt;
&lt;br /&gt;
An Ogg Opus player MUST play any Ogg Opus stream with a channel mapping family of 0 or 1, even if the number of channels does not match the physically connected audio hardware. Players SHOULD perform channel mixing to increase or reduce the number of channels as needed.&lt;br /&gt;
&lt;br /&gt;
* '''Stream count''' 'N'&lt;br /&gt;
This field indicates the total number of streams so the decoder can correctly parse the packed Opus packets inside the Ogg packet.&lt;br /&gt;
&lt;br /&gt;
For channel mapping family 0, this value defaults to 1, and is not coded.&lt;br /&gt;
&lt;br /&gt;
A multi-channel Opus file is composed of one or more individual Opus streams, each of which produces one or two channels of decoded data. Each Ogg packet contains one Opus packet from each stream. The first N-1 Opus packets are packed using the self-delimiting framing from Appendix B of the Opus Specification. The remaining Opus packet is packed using the regular, undelimited framing from Section 3 of the Opus Specification. All the Opus packets in a single Ogg packet MUST be constrained to produce the same number of decoded samples. A decoder SHOULD treat any Opus packet whose duration is different from that of the first Opus packet in an Ogg packet as if it were an Opus packet with an illegal TOC sequence.&lt;br /&gt;
&lt;br /&gt;
* '''Two-channel stream count''' 'M'&lt;br /&gt;
Describes the number of streams whose decoders should be configured to produce two channels. This must be no larger than the total number of streams.&lt;br /&gt;
&lt;br /&gt;
For channel mapping family 0, this value defaults to c-1 (i.e., 0 for mono and 1 for stereo), and is not coded.&lt;br /&gt;
&lt;br /&gt;
Each packet in an Opus stream has an internal channel count of 1 or 2, which can change from packet to packet. This is selected by the encoder depending on the bitrate and the contents being encoded. The original channel count of the encoder input is not preserved by the lossy compression.&lt;br /&gt;
&lt;br /&gt;
Regardless of the internal channel count, any Opus stream may be decoded as mono (a single channel) or stereo (two channels) by appropriate initialization of the decoder. The &amp;quot;two-channel stream count&amp;quot; field indicates that the first M Opus decoders should be initialized in stereo mode, and the remaining N-M decoders should be initialized in mono mode. The total number of decoded channels (M+N) MUST be no larger than 255, as there is no way to index more channels than that in the channel mapping.&lt;br /&gt;
&lt;br /&gt;
* '''Channel mapping'''&lt;br /&gt;
Contains one index per output channel indicating which decoded channel should be used. If the index is less than 2*M, the output MUST be taken from decoding stream (index/2) as stereo and selecting the left channel if index is even, and the right channel if index is odd. If the index is 2*M or larger, the output MUST be taken from decoding stream (index-M) as mono. As a special case, an index of 255 means that the corresponding output channel MUST contain pure silence.&lt;br /&gt;
&lt;br /&gt;
For channel mapping family 0, the first index defaults to 0, and if c==2, the second index defaults to 1. Neither index is coded.&lt;br /&gt;
&lt;br /&gt;
The number of output channels (c) is not constrained to match the number of decoded channels (M+N). A single index MAY appear multiple times, i.e., the same decoded channel may be mapped to multiple output channels. Some decoded channels might not be assigned to any output channel, as well.&lt;br /&gt;
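&lt;br /&gt;
The index rule for the channel mapping can be sketched as follows. The function name and the (stream, channel) return convention are hypothetical; 'coupled' stands for the two-channel stream count M.&lt;br /&gt;

```python
def resolve_channel(index, coupled):
    """Map a channel mapping index to (stream, channel_in_stream), or None for silence.

    channel_in_stream is 0 for left (or the only channel of a mono stream)
    and 1 for right.
    """
    if index == 255:
        return None  # this output channel is pure silence
    if 2 * coupled > index:
        return index // 2, index % 2  # one side of a stereo stream
    return index - coupled, 0  # a mono stream
```

With M=2, indices 0 to 3 select the left and right channels of streams 0 and 1, and index 4 selects mono stream 2.&lt;br /&gt;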
&lt;br /&gt;
==== Comment Header ====&lt;br /&gt;
&lt;br /&gt;
 - 8 byte 'OpusTags' magic signature (64 bits)&lt;br /&gt;
 - The remaining data follows the vorbis-comment header design used in OggVorbis (without the &amp;quot;framing-bit&amp;quot;), OggTheora, and Speex:&lt;br /&gt;
  * Vendor string (always present).&lt;br /&gt;
  ** 4-byte little-endian length field, followed by length bytes of UTF-8 vendor string.&lt;br /&gt;
  * TAG=value metadata strings (zero or more).&lt;br /&gt;
  ** 4-byte little-endian string count.&lt;br /&gt;
  ** Count strings consisting of 4-byte little-endian length and length bytes of UTF-8 string in &amp;quot;tag=value&amp;quot; form.&lt;br /&gt;
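&lt;br /&gt;
A minimal sketch of reading this layout, with bounds checks so that a corrupt length field cannot run past the end of the packet. The function name and the returned tuple are hypothetical.&lt;br /&gt;

```python
def parse_opus_tags(data):
    """Parse an OpusTags packet into (vendor, comments); raise ValueError if malformed."""
    if data[:8] != b"OpusTags":
        raise ValueError("missing OpusTags signature")
    pos = 8

    def read_u32():
        # Read a 4-byte little-endian unsigned integer at the current offset.
        nonlocal pos
        if pos + 4 > len(data):
            raise ValueError("truncated length field")
        value = int.from_bytes(data[pos:pos + 4], "little")
        pos += 4
        return value

    def read_string():
        # Read a length-prefixed UTF-8 string at the current offset.
        nonlocal pos
        length = read_u32()
        if pos + length > len(data):
            raise ValueError("truncated string")
        text = data[pos:pos + length].decode("utf-8")
        pos += length
        return text

    vendor = read_string()
    comments = [read_string() for _ in range(read_u32())]
    return vendor, comments
```

Invalid UTF-8 also surfaces as a ValueError here, since Python's UnicodeDecodeError is a subclass of it.&lt;br /&gt;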
&lt;br /&gt;
One new comment field is introduced for Ogg Opus:&lt;br /&gt;
 R128_TRACK_GAIN=-573  &lt;br /&gt;
representing the volume shift needed to normalize the track's volume. The gain is a Q7.8 fixed point number in dB, as in the OpusHead &amp;quot;output gain&amp;quot; field. This field is similar to the [[VorbisComment#Replay_Gain|REPLAYGAIN_TRACK_GAIN field in Vorbis]], although the normal volume reference is the [http://tech.ebu.ch/loudness EBU-R128] standard.&lt;br /&gt;
&lt;br /&gt;
An Ogg Opus file MUST NOT have more than one such field, and if present its value MUST be an integer from -32768 to +32767 inclusive, represented in ASCII with no whitespace. If present, it MUST correctly represent the R128 normalization gain (relative to the OpusHead output gain). If a player chooses to make use of the TRACK_GAIN, it MUST be applied ''in addition'' to the OpusHead output gain. If an encoder populates the TRACK_GAIN field, and the output gain is not otherwise constrained or specified, the encoder SHOULD write the R128 gain into the OpusHead output gain and write &amp;quot;R128_TRACK_GAIN=0&amp;quot;. If a tool modifies the OpusHead &amp;quot;output gain&amp;quot; field, it MUST also update or remove the R128_TRACK_GAIN comment field.&lt;br /&gt;
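&lt;br /&gt;
The constraints on this field can be checked as in the sketch below. The function names are hypothetical, and two points are assumptions rather than statements from the text: a leading '+' sign is accepted, and the field name is matched case-insensitively as is usual for Vorbis comments.&lt;br /&gt;

```python
def parse_r128_track_gain(comments):
    """Return the R128_TRACK_GAIN value as a Q7.8 integer, or None if absent."""
    prefix = "R128_TRACK_GAIN="
    values = [c[len(prefix):] for c in comments
              if c[:len(prefix)].upper() == prefix]
    if not values:
        return None
    if len(values) > 1:
        raise ValueError("more than one R128_TRACK_GAIN field")
    text = values[0]
    digits = text[1:] if text[:1] in ("+", "-") else text
    if not (digits.isascii() and digits.isdigit()):
        raise ValueError("R128_TRACK_GAIN is not a plain ASCII integer")
    gain = int(text)
    if gain > 32767 or -32768 > gain:
        raise ValueError("R128_TRACK_GAIN out of range")
    return gain

def playback_gain_db(output_gain, track_gain):
    """Total gain in dB when the track gain is applied in addition to the output gain."""
    return (output_gain + track_gain) / 256.0
```

Because both values are Q7.8 numbers in dB, applying one in addition to the other amounts to adding the raw integers before converting.&lt;br /&gt;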
&lt;br /&gt;
There is no comment field corresponding to Replaygain's ALBUM_GAIN; that information should instead be stored in the OpusHead 'output gain' field.&lt;br /&gt;
&lt;br /&gt;
To avoid confusion with multiple normalization schemes, an OpusTags packet SHOULD NOT contain any of the REPLAYGAIN_TRACK_GAIN, REPLAYGAIN_TRACK_PEAK, REPLAYGAIN_ALBUM_GAIN, or REPLAYGAIN_ALBUM_PEAK fields.&lt;br /&gt;
&lt;br /&gt;
== Other Implementation Notes ==&lt;br /&gt;
&lt;br /&gt;
When seeking within an Ogg Opus stream, the decoder should start decoding (and discarding the output) at least 3840 samples (80 ms) prior to the seek point in order to ensure that the output audio is correct at the seek point.&lt;br /&gt;
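&lt;br /&gt;
In terms of sample positions, the pre-roll can be sketched like this (the names are hypothetical; positions are PCM sample positions at 48 kHz):&lt;br /&gt;

```python
PRE_ROLL_SAMPLES = 3840  # 80 ms at 48 kHz

def seek_plan(target_sample):
    """Return (decode_start, samples_to_discard) for a seek to target_sample."""
    decode_start = max(0, target_sample - PRE_ROLL_SAMPLES)
    return decode_start, target_sample - decode_start
```

A real demuxer would begin at the nearest packet boundary at or before decode_start and discard correspondingly more output.&lt;br /&gt;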
&lt;br /&gt;
Technically valid Opus packets can be arbitrarily large due to the padding format, although the amount of non-padding data they can contain is bounded. These packets may be spread over a similarly enormous number of Ogg pages. Encoders SHOULD use no more padding than required to make a variable bitrate (VBR) stream constant bitrate (CBR). Decoders SHOULD avoid attempting to allocate excessive amounts of memory when presented with a very large packet. The presence of an extremely large packet in the stream could indicate a potential memory exhaustion attack or stream corruption. Decoders should reject a packet that is too large to process, and print a warning message.&lt;br /&gt;
&lt;br /&gt;
In an Ogg Opus stream, the largest possible valid packet that does not use padding has a size of (61,298*N - 2) bytes, or about 60 kB per Opus stream. With 255 streams, this is 15,630,988 bytes (14.9 MB) and can span up to 61,298 Ogg pages, all but one of which will have a granulepos of -1. This is of course a very extreme packet, consisting of 255 streams, each containing 120 ms of audio encoded as 2.5 ms frames, each frame using the maximum possible number of bytes (1275) and stored in the least efficient manner allowed (a VBR code 3 Opus packet). Even in such a packet, most of the data will be zeros, as 2.5 ms frames, which are required to run in the MDCT mode, cannot actually use all 1275 bytes.&lt;br /&gt;
&lt;br /&gt;
The largest packet consisting entirely of useful data is (15,326*N - 2) bytes, or about 15 kB per stream. This corresponds to 120 ms of audio encoded as 10 ms frames in either LP or Hybrid mode, but at a data rate of over 1 Mbps, which makes little sense for the quality achieved.&lt;br /&gt;
&lt;br /&gt;
A more reasonable limit is (7,664*N - 2) bytes, or about 7.5 kB per stream. This corresponds to 120 ms of audio encoded as 20 ms stereo MDCT-mode frames, with a total bitrate just under 511 kbps (not counting the Ogg encapsulation overhead). With N=8, the maximum useful number of streams for the channel meanings currently defined by mapping family 1, this gives a maximum packet size of 61,310 bytes, or just under 60 kB. This is still quite conservative, as it assumes each output channel is taken from one decoded channel of a stereo packet. An implementation could reasonably choose any of these numbers for its internal limits.&lt;br /&gt;
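&lt;br /&gt;
The three limits above follow one pattern, sketched below. The per-stream breakdown (one TOC byte, one frame count byte, a two-byte length for every frame once self-delimiting framing is used) is an interpretation of the Opus framing rather than something stated here, but it reproduces the per-stream figures of 61,298, 15,326 and 7,664 bytes. The function name is hypothetical.&lt;br /&gt;

```python
MAX_FRAME_BYTES = 1275  # largest Opus frame, as stated above

def max_packet_bytes(frames_per_packet, streams):
    """Largest unpadded Ogg Opus packet for the given frame count and stream count."""
    # Assumed layout per self-delimited stream: TOC byte + frame count byte,
    # then a 2-byte length and up to 1275 bytes of data for each frame.
    per_stream = 2 + frames_per_packet * (MAX_FRAME_BYTES + 2)
    # The last stream uses undelimited framing, saving one 2-byte length.
    return per_stream * streams - 2

# 120 ms of audio as 2.5 ms, 10 ms or 20 ms frames is 48, 12 or 6 frames.
EXTREME_LIMIT = max_packet_bytes(48, 255)
USEFUL_LIMIT_PER_STREAM = max_packet_bytes(12, 1)
REASONABLE_LIMIT_8_STREAMS = max_packet_bytes(6, 8)
```

An implementation might compare incoming packet sizes against one of these constants before allocating a buffer.&lt;br /&gt;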
&lt;br /&gt;
== Test Vectors ==&lt;br /&gt;
&lt;br /&gt;
* [[OggOpus/testvectors|Planned test vectors for OggOpus]]&lt;br /&gt;
* Opus test vectors&lt;/div&gt;</summary>
		<author><name>Rillian</name></author>	</entry>

	<entry>
		<id>http://wiki.xiph.org/OpusRelease</id>
		<title>OpusRelease</title>
		<link rel="alternate" type="text/html" href="http://wiki.xiph.org/OpusRelease"/>
				<updated>2012-05-21T18:46:50Z</updated>
		
		<summary type="html">&lt;p&gt;Rillian: /* Release management / marketing work */ link oneman's patch&lt;/p&gt;
&lt;hr /&gt;
&lt;div&gt;== Opus release planning ==&lt;br /&gt;
&lt;br /&gt;
This page is for organizing todo items for the initial opus release. Some items would best be dribbled out after the initial release, while others are more release-critical (tools binaries, test vectors, examples).&lt;br /&gt;
&lt;br /&gt;
=== Release management / marketing work ===&lt;br /&gt;
&lt;br /&gt;
* Windows binaries&lt;br /&gt;
* Mac binaries&lt;br /&gt;
* Get binary packages of opus and opus-tools working for major distros&lt;br /&gt;
&lt;br /&gt;
* Ogg Opus test vectors&lt;br /&gt;
&lt;br /&gt;
* More 'brochure' level text about Opus and its advantages&lt;br /&gt;
&lt;br /&gt;
* Logo&lt;br /&gt;
&lt;br /&gt;
* More images/graphs/art for the site (PEAQ sweeps?)&lt;br /&gt;
&lt;br /&gt;
* Overhaul of the comparative page&lt;br /&gt;
&lt;br /&gt;
* demo pages&lt;br /&gt;
** static demo with music, voice; like [https://people.xiph.org/~giles/2012/opus this one]&lt;br /&gt;
** albums&lt;br /&gt;
** audiobooks&lt;br /&gt;
** remote control encoder (oneman)&lt;br /&gt;
** emscripten decoder&lt;br /&gt;
** A recording of a remote jamming event (bemasc)&lt;br /&gt;
** update Greg's CELT try-tool[http://people.xiph.org/~greg/trial_tool.png] for libopus&lt;br /&gt;
** Son-of-[http://people.xiph.org/~xiphmont/demo/celt/demo.html Monty-CELT-demo] (An in-the-weeds technology demo)&lt;br /&gt;
*** This may actually need to be a series of two, one covering LP mode and one covering hybrid and the rest of the system.&lt;br /&gt;
&lt;br /&gt;
* Improved documentation (e.g. an API overview doc that isn't a doxygen dump), &amp;quot;idiots guide to using libopus&amp;quot;&lt;br /&gt;
&lt;br /&gt;
* Bugtracker&lt;br /&gt;
&lt;br /&gt;
* Client applications beyond Firefox, esp VLC and foobar2000. See [[OPUS TODO]] for a list.&lt;br /&gt;
&lt;br /&gt;
* Icecast release with opus support [https://github.com/krad-radio/icecast-oneman/commit/35927ca52f8e538eb20d8a185b8c10f1f2e9118a patch here]&lt;br /&gt;
&lt;br /&gt;
=== Development work ===&lt;br /&gt;
&lt;br /&gt;
* fix remaining fixed point overflows (gmaxwell/jmspeex)&lt;br /&gt;
* Make opus_custom_demo read/write little endian on BE hosts.&lt;br /&gt;
* Boring standard makefiles for many systems (VMS too? haha)&lt;br /&gt;
* Oggdropish GUI for opusenc&lt;br /&gt;
* Additional tools (validator, gain, udp streaming example)&lt;br /&gt;
* libao for opus-tools &lt;br /&gt;
* LP-mode CBR (boundary conditions, iterations, harmonic average)&lt;br /&gt;
* Fix -fstack-protector-all (e.g. mingw32) or remove it&lt;br /&gt;
* MIN32/MAX32 used stupidly in celt/ where the arguments are evaluated twice.&lt;br /&gt;
* why does --speech + 24kbit make opus tools code stereo where it's mono without speech?&lt;br /&gt;
* make test_opus_encode test more stupidly low rates.&lt;br /&gt;
* random duration for the first/last frame in opus-tools&lt;br /&gt;
* short block high rate waste in mdct mode&lt;br /&gt;
* (mdct mode) silence detection&lt;br /&gt;
* Prod the second implementation (Tim's) into something (quasi-)releasable&lt;/div&gt;</summary>
		<author><name>Rillian</name></author>	</entry>

	<entry>
		<id>http://wiki.xiph.org/OpusRelease</id>
		<title>OpusRelease</title>
		<link rel="alternate" type="text/html" href="http://wiki.xiph.org/OpusRelease"/>
				<updated>2012-05-17T17:22:50Z</updated>
		
		<summary type="html">&lt;p&gt;Rillian: /* Release management / marketing work */ I assume 'sweaps' was a typo.&lt;/p&gt;
&lt;hr /&gt;
&lt;div&gt;== Opus release planning ==&lt;br /&gt;
&lt;br /&gt;
This page is for organizing todo items for the initial opus release. Some items would best be dribbled out after the initial release, while others are more release-critical (tools binaries, test vectors, examples).&lt;br /&gt;
&lt;br /&gt;
=== Release management / marketing work ===&lt;br /&gt;
&lt;br /&gt;
* Windows binaries&lt;br /&gt;
* Mac binaries&lt;br /&gt;
* Get binary packages of opus and opus-tools working for major distros&lt;br /&gt;
&lt;br /&gt;
* Ogg Opus test vectors&lt;br /&gt;
&lt;br /&gt;
* More 'brochure' level text about Opus and its advantages&lt;br /&gt;
&lt;br /&gt;
* Logo&lt;br /&gt;
&lt;br /&gt;
* More images/graphs/art for the site (PEAQ sweeps?)&lt;br /&gt;
&lt;br /&gt;
* Overhaul of the comparative page&lt;br /&gt;
&lt;br /&gt;
* demo pages&lt;br /&gt;
** static demo with music, voice; like [https://people.xiph.org/~giles/2012/opus this one]&lt;br /&gt;
** albums&lt;br /&gt;
** audiobooks&lt;br /&gt;
** remote control encoder (oneman)&lt;br /&gt;
** emscripten decoder&lt;br /&gt;
** A recording of a remote jamming event (bemasc)&lt;br /&gt;
** update Greg's CELT try-tool[http://people.xiph.org/~greg/trial_tool.png] for libopus&lt;br /&gt;
** Son-of-[http://people.xiph.org/~xiphmont/demo/celt/demo.html Monty-CELT-demo] (An in-the-weeds technology demo)&lt;br /&gt;
*** This may actually need to be a series of two, one covering LP mode and one covering hybrid and the rest of the system.&lt;br /&gt;
&lt;br /&gt;
* Improved documentation (e.g. an API overview doc that isn't a doxygen dump), &amp;quot;idiots guide to using libopus&amp;quot;&lt;br /&gt;
&lt;br /&gt;
* Bugtracker&lt;br /&gt;
&lt;br /&gt;
* Client applications beyond Firefox, esp VLC and foobar2000. See [[OPUS TODO]] for a list.&lt;br /&gt;
&lt;br /&gt;
=== Development work ===&lt;br /&gt;
&lt;br /&gt;
* fix remaining fixed point overflows (gmaxwell/jmspeex)&lt;br /&gt;
* Make opus_custom_demo read/write little endian on BE hosts.&lt;br /&gt;
* Boring standard makefiles for many systems (VMS too? haha)&lt;br /&gt;
* Oggdropish GUI for opusenc&lt;br /&gt;
* Additional tools (validator, gain, udp streaming example)&lt;br /&gt;
* libao for opus-tools &lt;br /&gt;
* LP-mode CBR (boundary conditions, iterations, harmonic average)&lt;br /&gt;
* Fix -fstack-protector-all (e.g. mingw32) or remove it&lt;br /&gt;
* MIN32/MAX32 used stupidly in celt/ where the arguments are evaluated twice.&lt;br /&gt;
* why does --speech + 24kbit make opus tools code stereo where it's mono without speech?&lt;br /&gt;
* make test_opus_encode test more stupidly low rates.&lt;br /&gt;
* random duration for the first/last frame in opus-tools&lt;br /&gt;
* short block high rate waste in mdct mode&lt;br /&gt;
* (mdct mode) silence detection&lt;br /&gt;
* Prod the second implementation (Tim's) into something (quasi-)releasable&lt;/div&gt;</summary>
		<author><name>Rillian</name></author>	</entry>

	<entry>
		<id>http://wiki.xiph.org/OpusRelease</id>
		<title>OpusRelease</title>
		<link rel="alternate" type="text/html" href="http://wiki.xiph.org/OpusRelease"/>
				<updated>2012-05-17T00:11:16Z</updated>
		
		<summary type="html">&lt;p&gt;Rillian: /* Opus release planning */ test vectors&lt;/p&gt;
&lt;hr /&gt;
&lt;div&gt;== Opus release planning ==&lt;br /&gt;
&lt;br /&gt;
This page is for organizing todo items for the initial opus release.&lt;br /&gt;
&lt;br /&gt;
* Windows binaries&lt;br /&gt;
* Mac binaries&lt;br /&gt;
* Get binary packages of opus and opus-tools working for major distros&lt;br /&gt;
&lt;br /&gt;
* Ogg Opus test vectors&lt;br /&gt;
&lt;br /&gt;
* demo pages&lt;br /&gt;
** static demo with music, voice; like [https://people.xiph.org/~giles/2012/opus this one]&lt;br /&gt;
** albums&lt;br /&gt;
** audiobooks&lt;br /&gt;
** remote control encoder (oneman)&lt;br /&gt;
** emscripten decoder&lt;/div&gt;</summary>
		<author><name>Rillian</name></author>	</entry>

	<entry>
		<id>http://wiki.xiph.org/OpusRelease</id>
		<title>OpusRelease</title>
		<link rel="alternate" type="text/html" href="http://wiki.xiph.org/OpusRelease"/>
				<updated>2012-05-17T00:09:54Z</updated>
		
		<summary type="html">&lt;p&gt;Rillian: /* Opus release planning */ would be nice to get a js build working for broader demonstrations&lt;/p&gt;
&lt;hr /&gt;
&lt;div&gt;== Opus release planning ==&lt;br /&gt;
&lt;br /&gt;
This page is for organizing todo items for the initial opus release.&lt;br /&gt;
&lt;br /&gt;
* Windows binaries&lt;br /&gt;
* Mac binaries&lt;br /&gt;
* Get binary packages of opus and opus-tools working for major distros&lt;br /&gt;
&lt;br /&gt;
* demo pages&lt;br /&gt;
** static demo with music, voice; like [https://people.xiph.org/~giles/2012/opus this one]&lt;br /&gt;
** albums&lt;br /&gt;
** audiobooks&lt;br /&gt;
** remote control encoder (oneman)&lt;br /&gt;
** emscripten decoder&lt;/div&gt;</summary>
		<author><name>Rillian</name></author>	</entry>

	<entry>
		<id>http://wiki.xiph.org/OpusRelease</id>
		<title>OpusRelease</title>
		<link rel="alternate" type="text/html" href="http://wiki.xiph.org/OpusRelease"/>
				<updated>2012-05-17T00:09:19Z</updated>
		
		<summary type="html">&lt;p&gt;Rillian: /* Opus release planning */ Link to my demo page&lt;/p&gt;
&lt;hr /&gt;
&lt;div&gt;== Opus release planning ==&lt;br /&gt;
&lt;br /&gt;
This page is for organizing todo items for the initial opus release.&lt;br /&gt;
&lt;br /&gt;
* Windows binaries&lt;br /&gt;
* Mac binaries&lt;br /&gt;
* Get binary packages of opus and opus-tools working for major distros&lt;br /&gt;
&lt;br /&gt;
* demo pages&lt;br /&gt;
** static demo with music, voice; like [https://people.xiph.org/~giles/2012/opus this one]&lt;br /&gt;
** albums&lt;br /&gt;
** audiobooks&lt;br /&gt;
** remote control encoder (oneman)&lt;/div&gt;</summary>
		<author><name>Rillian</name></author>	</entry>

	<entry>
		<id>http://wiki.xiph.org/OpusRelease</id>
		<title>OpusRelease</title>
		<link rel="alternate" type="text/html" href="http://wiki.xiph.org/OpusRelease"/>
				<updated>2012-05-17T00:07:49Z</updated>
		
		<summary type="html">&lt;p&gt;Rillian: /* Opus release planning */ fix formatting&lt;/p&gt;
&lt;hr /&gt;
&lt;div&gt;== Opus release planning ==&lt;br /&gt;
&lt;br /&gt;
This page is for organizing todo items for the initial opus release.&lt;br /&gt;
&lt;br /&gt;
* Windows binaries&lt;br /&gt;
* Mac binaries&lt;br /&gt;
* Get binary packages of opus and opus-tools working for major distros&lt;br /&gt;
&lt;br /&gt;
* demo pages&lt;br /&gt;
** static demo with music, voice&lt;br /&gt;
** albums&lt;br /&gt;
** audiobooks&lt;br /&gt;
** remote control encoder (oneman)&lt;/div&gt;</summary>
		<author><name>Rillian</name></author>	</entry>

	</feed>