Intra: Difference between revisions

Revision as of 20:48, 8 February 2013

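In Daala, intra-frame predictors are coefficients that are used to predict the contents of a block from its three neighboring blocks (above-left, above, and left). Consider a 4x4 block $y$ together with those neighbors, whose values are labeled $x_{1}$ through $x_{48}$: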

$\left({\begin{array}{cccc|cccc}x_{1}&x_{2}&x_{3}&x_{4}&x_{5}&x_{6}&x_{7}&x_{8}\\x_{9}&x_{10}&x_{11}&x_{12}&x_{13}&x_{14}&x_{15}&x_{16}\\x_{17}&x_{18}&x_{19}&x_{20}&x_{21}&x_{22}&x_{23}&x_{24}\\x_{25}&x_{26}&x_{27}&x_{28}&x_{29}&x_{30}&x_{31}&x_{32}\\\hline x_{33}&x_{34}&x_{35}&x_{36}&y_{1}&y_{2}&y_{3}&y_{4}\\x_{37}&x_{38}&x_{39}&x_{40}&y_{5}&y_{6}&y_{7}&y_{8}\\x_{41}&x_{42}&x_{43}&x_{44}&y_{9}&y_{10}&y_{11}&y_{12}\\x_{45}&x_{46}&x_{47}&x_{48}&y_{13}&y_{14}&y_{15}&y_{16}\\\end{array}}\right)$

Assuming a linear predictor, we would like to find the 768 coefficients that best predict each $y_{i}$ from the neighboring $x_{j}$ . That is, for a given block with neighbor coefficients ${\hat {x}}$ (a row vector of 48 values) and block coefficients ${\hat {y}}$ (a row vector of 16 values), we would like a 48x16 matrix $\beta$ such that the residual ${\hat {e}}={\hat {y}}-{\hat {x}}\beta$ is minimized under some $p$ -norm. Because it is this residual ${\hat {e}}$ that is quantized and coded in the bitstream, we would prefer that most of its coefficients be zero. This will be discussed further when comparing the different $p$ -norms.
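As a minimal sketch of the relation ${\hat {e}}={\hat {y}}-{\hat {x}}\beta$ , the following plain Python computes a residual for one block. The helper names `predict` and `residual`, the toy data, and the averaging predictor are assumptions made for illustration; they are not taken from the Daala code.

```python
# Dimensions for 4x4 blocks: 48 neighbor values predict 16 block values.
M, L = 48, 16

def predict(x, beta):
    """Return the row vector x * beta, where beta is an M x L list of rows."""
    return [sum(x[j] * beta[j][i] for j in range(len(x)))
            for i in range(len(beta[0]))]

def residual(x, y, beta):
    """Return e = y - x * beta, the values that would be quantized and coded."""
    p = predict(x, beta)
    return [yi - pi for yi, pi in zip(y, p)]

# Hypothetical toy data: a flat neighborhood and a flat block.
x = [10.0] * M
y = [10.0] * L

# Hypothetical predictor: every output is the average of all 48 neighbors.
beta = [[1.0 / M] * L for _ in range(M)]

e = residual(x, y, beta)
num_coeffs = M * L  # one coefficient per (neighbor, output) pair
```

For this flat input the residual is zero up to floating-point rounding, which is the situation the predictor is meant to produce as often as possible.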

Block Modes

These predictors often follow a certain geometry in the time-domain image space. Imagine a vertical edge that runs through $y$ . A natural predictor might be to simply copy the values ${\begin{array}{cccc}x_{29}&x_{30}&x_{31}&x_{32}\end{array}}$ down into each row of the 4x4 block $y$ . In order to account for the different possible geometries, each block is assigned a mode that indicates which set of prediction coefficients (which $\beta$ ) to use. When encoding a block $y$ , the encoder chooses a mode, computes ${\hat {e}}$ , and writes both the block mode and the residual ${\hat {e}}$ to the bitstream.
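The vertical-edge example can be written as one particular $\beta$ . The sketch below assumes the row-major numbering of $x_{1},\dots ,x_{48}$ and $y_{1},\dots ,y_{16}$ shown in the block structure above, and works directly on time-domain values; the function name `vertical_beta` is hypothetical.

```python
M, L = 48, 16

def vertical_beta():
    """Build a 48x16 matrix that copies x_29..x_32 down each row of y."""
    beta = [[0.0] * L for _ in range(M)]
    for k in range(L):           # k is the 0-based index of y_{k+1}
        col = k % 4              # column of y_{k+1} inside the 4x4 block
        beta[28 + col][k] = 1.0  # x_29 has 0-based index 28
    return beta

def predict(x, beta):
    """Return the row vector x * beta."""
    return [sum(x[j] * beta[j][i] for j in range(M)) for i in range(L)]

# Set x_j = j so that each predicted value identifies its source.
x = [float(j + 1) for j in range(M)]
pred = predict(x, vertical_beta())
```

Each row of the predicted block comes out as 29, 30, 31, 32, that is, the row directly above $y$ repeated four times. Only 16 of the 768 entries of this $\beta$ are nonzero.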

Because the decoder must know the coefficients for each of the block modes, there is a tradeoff between how well we can predict the values in $y$ (which improves with the number of block modes) and decoder complexity. Our hypothesis is that the optimal number of block modes will correspond to how closely we can fit different geometries inside the block. For 4x4 blocks, this might correspond to a mode for each of the 8 cardinal directions (ask Jason to clarify), with larger blocks potentially supporting a larger number of directions. In addition, we would like to support the DC and True Motion modes from WebM/VP8, as they are often the best fit. For common video sequences, anywhere from 20% to 45% of the intra frames use the True Motion mode [1].

This is slightly complicated by the fact that our coefficients are not in the time domain, but are the result of a lapped transform (see TDLT). In practice all of these modes have an equivalent in the frequency domain. As a starting point, we are using the 10 modes from VP8 to classify the blocks from a set of sample images. Each category of blocks is then used to construct a set of predictors $\beta$ , which in turn is used to reclassify the blocks based on the Sum of Absolute Transform Differences (SATD). Tim suggested weighting each block's contribution based on SATD_bestfit-SATD_nearest.
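The reclassification step can be sketched as picking, for each block, the mode whose predictor leaves the smallest SATD. This sketch assumes the block and neighbor values are already transform coefficients, so that SATD reduces to the sum of absolute residual values; the names `satd` and `best_mode` and the two toy modes are hypothetical and do not reflect the actual Daala tools.

```python
def satd(e):
    """Sum of absolute values of a residual assumed to be in the transform domain."""
    return sum(abs(v) for v in e)

def predict(x, beta):
    """Return the row vector x * beta, where beta is a list of rows."""
    return [sum(xj * row[i] for xj, row in zip(x, beta))
            for i in range(len(beta[0]))]

def best_mode(x, y, betas):
    """Return (mode, score) for the predictor in betas with the smallest SATD."""
    scores = []
    for mode, beta in enumerate(betas):
        p = predict(x, beta)
        scores.append((satd([yi - pi for yi, pi in zip(y, p)]), mode))
    score, mode = min(scores)
    return mode, score

# Toy problem: 2 neighbor values, 2 block values, two hypothetical modes.
x = [4.0, 8.0]
y = [4.0, 4.0]
betas = [
    [[0.0, 0.0], [1.0, 1.0]],  # mode 0: copy x[1] into both outputs
    [[1.0, 1.0], [0.0, 0.0]],  # mode 1: copy x[0] into both outputs
]
mode, score = best_mode(x, y, betas)
```

Here mode 1 predicts the block exactly, so it is selected with a score of zero. Alternating between refitting each mode's $\beta$ on its assigned blocks and reassigning blocks this way gives the iterative procedure described above.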

$L^{2}$ -norm

Suppose for some mode we have $n$ blocks. We can then construct the matrices $X$ and $Y$ where the $i$ th row of each contains the zero-mean values from the $i$ th block, that is, each column has its mean over the $n$ blocks subtracted.

$X={\begin{pmatrix}x_{1,1}-{\bar {x_{1}}}&x_{1,2}-{\bar {x_{2}}}&\cdots &x_{1,m}-{\bar {x_{m}}}\\x_{2,1}-{\bar {x_{1}}}&x_{2,2}-{\bar {x_{2}}}&\cdots &x_{2,m}-{\bar {x_{m}}}\\\vdots &\vdots &\ddots &\vdots \\x_{n,1}-{\bar {x_{1}}}&x_{n,2}-{\bar {x_{2}}}&\cdots &x_{n,m}-{\bar {x_{m}}}\end{pmatrix}}$

$Y={\begin{pmatrix}y_{1,1}-{\bar {y_{1}}}&y_{1,2}-{\bar {y_{2}}}&\cdots &y_{1,l}-{\bar {y_{l}}}\\y_{2,1}-{\bar {y_{1}}}&y_{2,2}-{\bar {y_{2}}}&\cdots &y_{2,l}-{\bar {y_{l}}}\\\vdots &\vdots &\ddots &\vdots \\y_{n,1}-{\bar {y_{1}}}&y_{n,2}-{\bar {y_{2}}}&\cdots &y_{n,l}-{\bar {y_{l}}}\end{pmatrix}}$

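In the case of 4x4 blocks, $m=48$ and $l=16$ . Let $C=X^{T}X$ and $D=X^{T}Y$ . Then $\beta =C^{-1}D$ , which (provided $C$ is invertible) is the least-squares solution minimizing the $L^{2}$ -norm of $Y-X\beta$ .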
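A minimal sketch of this fit in plain Python follows. It forms $C$ and $D$ from mean-centered data and solves $C\beta =D$ by Gauss-Jordan elimination rather than forming $C^{-1}$ explicitly. All function names and the toy data are assumptions for illustration; a real training tool would more likely use a numerical library.

```python
def transpose(a):
    return [list(r) for r in zip(*a)]

def matmul(a, b):
    bt = transpose(b)
    return [[sum(p * q for p, q in zip(row, col)) for col in bt] for row in a]

def center(a):
    """Subtract each column's mean, as in the definitions of X and Y."""
    n = len(a)
    means = [sum(col) / n for col in zip(*a)]
    return [[v - mu for v, mu in zip(row, means)] for row in a]

def solve(c, d):
    """Solve C * beta = D by Gauss-Jordan elimination with partial pivoting."""
    m = len(c)
    aug = [list(c[i]) + list(d[i]) for i in range(m)]
    for k in range(m):
        piv = max(range(k, m), key=lambda r: abs(aug[r][k]))
        if abs(aug[piv][k]) < 1e-12:
            raise ValueError("C is singular")
        aug[k], aug[piv] = aug[piv], aug[k]
        pivot = aug[k][k]
        aug[k] = [v / pivot for v in aug[k]]
        for r in range(m):
            if r != k:
                f = aug[r][k]
                aug[r] = [vr - f * vk for vr, vk in zip(aug[r], aug[k])]
    return [row[m:] for row in aug]

def fit_beta(xs, ys):
    """Return beta = (X^T X)^{-1} X^T Y for row-per-block data xs, ys."""
    X, Y = center(xs), center(ys)
    Xt = transpose(X)
    return solve(matmul(Xt, X), matmul(Xt, Y))

# Toy data with m = 2 and l = 1, generated from y = 2*x1 - x2 + 5.
xs = [[1.0, 0.0], [0.0, 1.0], [2.0, 1.0], [3.0, 5.0]]
ys = [[2 * a - b + 5.0] for a, b in xs]
beta = fit_beta(xs, ys)
```

Because the toy data are exactly linear, the fit recovers the coefficients 2 and -1 up to rounding; the constant offset 5 is absorbed by the mean subtraction.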

4x4 Intra Predictors

Using the set of 50 images in subset1-y4m to train sparse (4*4*4 = 64 multiplies per block) intra predictors:

VP8 Intra Predictors
Mode 0 Blocks  379965 SATD 1030.8  Bits 136.827 Mean 1725.16 Var 591085 CgRef 12.9338 CgPred 14.4075 Pg 1.47375
Mode 1 Blocks  328826 SATD 1410.93 Bits 140.398 Mean 1755.67 Var 615743 CgRef 10.7723 CgPred 14.4396 Pg 3.66727
Mode 2 Blocks  364320 SATD 1318.99 Bits 140.038 Mean 1771.33 Var 613037 CgRef 12.2597 CgPred 14.321  Pg 2.0613
Mode 3 Blocks  461738 SATD 1266.24 Bits 138.956 Mean 1799.99 Var 581067 CgRef 12.4228 CgPred 14.4985 Pg 2.07569
Mode 4 Blocks  317707 SATD 1320.78 Bits 140.709 Mean 1738.33 Var 615735 CgRef 11.4334 CgPred 13.989  Pg 2.55553
Mode 5 Blocks  276650 SATD 1222.78 Bits 139.462 Mean 1691.97 Var 612193 CgRef 11.5441 CgPred 14.2428 Pg 2.69867
Mode 6 Blocks  264186 SATD 1238.89 Bits 139.503 Mean 1664.18 Var 603871 CgRef 11.7692 CgPred 14.2162 Pg 2.44696
Mode 7 Blocks  354682 SATD 1293.8  Bits 139.885 Mean 1654    Var 603002 CgRef 12.1736 CgPred 14.2243 Pg 2.05075
Mode 8 Blocks  278080 SATD 1322.45 Bits 140.806 Mean 1706.37 Var 634358 CgRef 11.6018 CgPred 14.0524 Pg 2.4506
Mode 9 Blocks  253209 SATD 1321.22 Bits 140.924 Mean 1722.95 Var 625202 CgRef 11.4638 CgPred 13.7727 Pg 2.30894
Pooled Blocks 3279363 SATD 1270.74 Bits 139.871 Mean 1727.99 Var 619488 CgRef 11.4416 CgPred 14.2047 Pg 2.76305

Daala Intra Predictors
Mode 0 Blocks  472906 SATD 1017.46 Bits 138.832 Mean 1728.18 Var 553398 CgRef 12.9443 CgPred 14.6488 Pg 1.7045
Mode 1 Blocks  261910 SATD 1519.87 Bits 146.67  Mean 1779.45 Var 471328 CgRef 9.70601 CgPred 12.4896 Pg 2.78355
Mode 2 Blocks  198050 SATD 1818.89 Bits 149.906 Mean 1778.28 Var 440574 CgRef 10.0855 CgPred 11.5142 Pg 1.42869
Mode 3 Blocks  400658 SATD 1091.01 Bits 138.022 Mean 1765.68 Var 699810 CgRef 14.9146 CgPred 17.0632 Pg 2.14857
Mode 4 Blocks  440888 SATD 1056.23 Bits 138.926 Mean 1768.35 Var 669611 CgRef 13.8652 CgPred 15.8732 Pg 2.008
Mode 5 Blocks  123873 SATD 2646.11 Bits 158.928 Mean 1761.66 Var 348360 CgRef 6.56612 CgPred 7.681   Pg 1.11488
Mode 6 Blocks  389669 SATD 1128.95 Bits 140.725 Mean 1607.85 Var 604298 CgRef 12.8995 CgPred 14.7494 Pg 1.84991
Mode 7 Blocks  364446 SATD 1132.17 Bits 139.413 Mean 1649.46 Var 674713 CgRef 14.0093 CgPred 16.1988 Pg 2.18954
Mode 8 Blocks  373419 SATD 1116.84 Bits 140.019 Mean 1720.85 Var 707029 CgRef 13.9352 CgPred 15.8526 Pg 1.91734
Mode 9 Blocks  253544 SATD 1571.98 Bits 147.453 Mean 1797.04 Var 502237 CgRef 10.7593 CgPred 12.3135 Pg 1.55419
Pooled Blocks 3279363 SATD 1261.89 Bits 142.641 Mean 1727.99 Var 619488 CgRef 12.3321 CgPred 14.4918 Pg 2.15973

Using the set of 1000 images in subset3-y4m to test the sparse (4*4*4 = 64 multiplies per block) intra predictors:

VP8 Intra Predictors
Mode 0 Blocks  8440618 SATD 773.069 Bits 129.891 Mean 1987.82 Var 714490 CgRef 16.1033 CgPred 17.7858 Pg 1.68249
Mode 1 Blocks  7177017 SATD 1097.06 Bits 133.975 Mean 1835.81 Var 682211 CgRef 13.2692 CgPred 17.2354 Pg 3.96624
Mode 2 Blocks  7486531 SATD 1129.14 Bits 136.141 Mean 1849.6  Var 699452 CgRef 14.1344 CgPred 16.3106 Pg 2.17612
Mode 3 Blocks  9420648 SATD 1030.04 Bits 133.978 Mean 1856.55 Var 667016 CgRef 14.7014 CgPred 16.9182 Pg 2.21677
Mode 4 Blocks  6537593 SATD 1074.28 Bits 135.633 Mean 1877.16 Var 703018 CgRef 13.7951 CgPred 16.4451 Pg 2.65005
Mode 5 Blocks  5764375 SATD 1037.61 Bits 135.162 Mean 1862.93 Var 714729 CgRef 13.7467 CgPred 16.5772 Pg 2.83053
Mode 6 Blocks  5526915 SATD 1047.34 Bits 135.235 Mean 1853.98 Var 725463 CgRef 13.9811 CgPred 16.5951 Pg 2.61399
Mode 7 Blocks  7493442 SATD 1061.93 Bits 134.918 Mean 1797.8  Var 701592 CgRef 14.467  CgPred 16.6788 Pg 2.21183
Mode 8 Blocks  5635636 SATD 1077.08 Bits 135.666 Mean 1850.12 Var 724307 CgRef 13.9346 CgPred 16.4812 Pg 2.54658
Mode 9 Blocks  5148581 SATD 1100.52 Bits 136.283 Mean 1895.9  Var 726050 CgRef 13.7549 CgPred 16.1284 Pg 2.37353
Pooled Blocks 68631356 SATD 1035.13 Bits 134.722 Mean 1868.07 Var 721805 CgRef 13.8553 CgPred 16.77   Pg 2.91471

Daala Intra Predictors
Mode 0 Blocks  9675144 SATD 841.841 Bits 133.86  Mean 1877.93 Var 662343 CgRef 15.701  CgPred 17.4209 Pg 1.71986
Mode 1 Blocks  4809122 SATD 1246.09 Bits 141.563 Mean 1764.73 Var 466747 CgRef 12.3822 CgPred 14.2618 Pg 1.8796
Mode 2 Blocks  3895161 SATD 1620.32 Bits 146.753 Mean 1763.23 Var 408176 CgRef 10.9493 CgPred 12.402  Pg 1.45271
Mode 3 Blocks  8512887 SATD 895.992 Bits 133.096 Mean 1838.33 Var 781431 CgRef 17.0033 CgPred 19.2762 Pg 2.27292
Mode 4 Blocks 10224556 SATD 788.516 Bits 131.961 Mean 2073.27 Var 777684 CgRef 16.8937 CgPred 18.964  Pg 2.07024
Mode 5 Blocks  2492854 SATD 2306.98 Bits 155.1   Mean 1781.56 Var 306540 CgRef 7.42632 CgPred 8.54354 Pg 1.11722
Mode 6 Blocks  7839240 SATD 960.763 Bits 136.271 Mean 1723.23 Var 684064 CgRef 15.2766 CgPred 17.1139 Pg 1.83731
Mode 7 Blocks  8127384 SATD 913.37  Bits 133.886 Mean 1865.64 Var 797013 CgRef 16.528  CgPred 18.8681 Pg 2.34004
Mode 8 Blocks  8289385 SATD 856.357 Bits 133.506 Mean 1963.06 Var 814308 CgRef 16.8012 CgPred 18.6876 Pg 1.88641
Mode 9 Blocks  4765623 SATD 1420.7  Bits 144.479 Mean 1773.33 Var 478656 CgRef 11.868  CgPred 13.3845 Pg 1.51648
Pooled Blocks 68631356 SATD 1030.34 Bits 137.446 Mean 1868.07 Var 721805 CgRef 14.964  CgPred 17.0421 Pg 2.07809

8x8 Intra Predictors

Using the set of 50 images in subset1-y4m to train sparse (4*8*8 = 256 multiplies per block) intra predictors:

VP8 Intra Predictors
Mode 0 Blocks 153189 SATD 6082.74 Bits 633.298 Mean 1720.67 Var 559647 CgRef 12.0272 CgPred 12.3392 Pg 0.311931
Mode 1 Blocks  49244 SATD 6482.51 Bits 631.517 Mean 1752.33 Var 634379 CgRef 12.983  CgPred 14.5119 Pg 1.52891
Mode 2 Blocks  81386 SATD 6317.27 Bits 633.888 Mean 1772.67 Var 617045 CgRef 13.2505 CgPred 13.8038 Pg 0.55328
Mode 3 Blocks 149817 SATD 6142.2  Bits 629.138 Mean 1821.6  Var 577836 CgRef 13.4642 CgPred 14.0662 Pg 0.601936
Mode 4 Blocks  59095 SATD 6488.09 Bits 639.403 Mean 1738.64 Var 635175 CgRef 12.8858 CgPred 13.546  Pg 0.660174
Mode 5 Blocks  51138 SATD 6321.64 Bits 636.812 Mean 1680.52 Var 614762 CgRef 12.5191 CgPred 13.2898 Pg 0.770625
Mode 6 Blocks  46860 SATD 6214.38 Bits 635.479 Mean 1636.45 Var 606974 CgRef 13.0011 CgPred 13.5919 Pg 0.590785
Mode 7 Blocks 101899 SATD 6579.86 Bits 637.096 Mean 1621    Var 597451 CgRef 12.9896 CgPred 13.5883 Pg 0.598636
Mode 8 Blocks  55425 SATD 6946.74 Bits 644.887 Mean 1714.81 Var 646845 CgRef 12.5421 CgPred 13.1933 Pg 0.651179
Mode 9 Blocks  43833 SATD 6717.76 Bits 640.445 Mean 1738.18 Var 634206 CgRef 12.5208 CgPred 13.1502 Pg 0.629351
Pooled Blocks 791886 SATD 6356.01 Bits 635.537 Mean 1728.58 Var 619153 CgRef 12.4832 CgPred 13.326  Pg 0.842891

Daala Intra Predictors
Mode 0 Blocks  62290 SATD 8842.88 Bits 695.702 Mean 1762.94 Var 546310 CgRef 10.527  CgPred 11.2558 Pg 0.728816
Mode 1 Blocks  51719 SATD 8525.04 Bits 687.553 Mean 1814.55 Var 507208 CgRef 10.9023 CgPred 11.7365 Pg 0.834129
Mode 2 Blocks  36010 SATD 12451.8 Bits 728.279 Mean 1791.12 Var 384211 CgRef 7.39062 CgPred 7.865   Pg 0.474379
Mode 3 Blocks 112574 SATD 5202.09 Bits 631.108 Mean 1783.42 Var 690506 CgRef 15.7576 CgPred 16.3784 Pg 0.620821
Mode 4 Blocks 115633 SATD 5215.31 Bits 633.79  Mean 1769.83 Var 629411 CgRef 14.1115 CgPred 14.8214 Pg 0.709915
Mode 5 Blocks  54821 SATD 9203.44 Bits 696.669 Mean 1744.5  Var 463757 CgRef 9.45701 CgPred 10.0708 Pg 0.613816
Mode 6 Blocks  87676 SATD 5900.82 Bits 646.94  Mean 1578.28 Var 591176 CgRef 13.1847 CgPred 13.8014 Pg 0.616685
Mode 7 Blocks  92133 SATD 5419.48 Bits 636.372 Mean 1634.71 Var 692905 CgRef 15.1441 CgPred 15.8044 Pg 0.660312
Mode 8 Blocks  76551 SATD 6501.37 Bits 658.66  Mean 1736.97 Var 612091 CgRef 12.6443 CgPred 13.4359 Pg 0.791668
Mode 9 Blocks 102479 SATD 5847.21 Bits 646.934 Mean 1733.7  Var 570790 CgRef 12.9057 CgPred 13.583  Pg 0.677333
Pooled Blocks 791886 SATD 6625.85 Bits 659.058 Mean 1728.58 Var 619153 CgRef 12.6445 CgPred 13.4208 Pg 0.776311

Using the set of 1000 images in subset3-y4m to test the sparse (4*8*8 = 256 multiplies per block) intra predictors:

VP8 Intra Predictors
Mode 0 Blocks  3285321 SATD 4831.25 Bits 609.544 Mean 1942.13 Var 681922 CgRef 15.1468 CgPred 15.5036 Pg 0.356785
Mode 1 Blocks  1097934 SATD 4966.25 Bits 600.994 Mean 1847.45 Var 682361 CgRef 16.1138 CgPred 17.8094 Pg 1.69564
Mode 2 Blocks  1708487 SATD 5380.91 Bits 616.147 Mean 1869.28 Var 707993 CgRef 15.3892 CgPred 15.978  Pg 0.58875
Mode 3 Blocks  3106373 SATD 5029.6  Bits 608.275 Mean 1879.34 Var 663156 CgRef 15.8358 CgPred 16.4784 Pg 0.642574
Mode 4 Blocks  1208481 SATD 5328.85 Bits 618.814 Mean 1882.27 Var 709893 CgRef 15.2482 CgPred 15.8695 Pg 0.621321
Mode 5 Blocks  1049455 SATD 5420.14 Bits 619.549 Mean 1855.61 Var 719825 CgRef 14.9456 CgPred 15.7082 Pg 0.762665
Mode 6 Blocks   959802 SATD 5356.04 Bits 618.585 Mean 1832.64 Var 732730 CgRef 15.2935 CgPred 15.878  Pg 0.584477
Mode 7 Blocks  2216886 SATD 5374.46 Bits 613.848 Mean 1782.23 Var 703311 CgRef 15.4919 CgPred 16.1339 Pg 0.64197
Mode 8 Blocks  1089024 SATD 5614.08 Bits 620.635 Mean 1830.07 Var 723888 CgRef 15.031  CgPred 15.6673 Pg 0.636324
Mode 9 Blocks   870162 SATD 5664.15 Bits 621.395 Mean 1877.86 Var 725235 CgRef 14.8465 CgPred 15.4287 Pg 0.582189
Pooled Blocks 16591925 SATD 5205.41 Bits 613.921 Mean 1868.35 Var 720473 CgRef 15.1122 CgPred 15.9821 Pg 0.869956

Daala Intra Predictors
Mode 0 Blocks  1179905 SATD 7866.76 Bits 682.077 Mean 1726.99 Var 519986 CgRef 12.0984 CgPred 12.4692 Pg 0.370783
Mode 1 Blocks  1008452 SATD 7410.96 Bits 669.185 Mean 1815.4  Var 486451 CgRef 12.7369 CgPred 13.1166 Pg 0.379717
Mode 2 Blocks   676403 SATD 11047.9 Bits 713.149 Mean 1806.12 Var 353773 CgRef 8.53864 CgPred 9.00691 Pg 0.46827
Mode 3 Blocks  2540750 SATD 4183.79 Bits 607.011 Mean 1934.57 Var 794608 CgRef 18.4004 CgPred 19.0367 Pg 0.636302
Mode 4 Blocks  2656446 SATD 3991.2  Bits 605.507 Mean 2030    Var 728302 CgRef 17.1992 CgPred 17.8889 Pg 0.68974
Mode 5 Blocks  1011671 SATD 8495.58 Bits 687.746 Mean 1744.3  Var 449801 CgRef 10.8417 CgPred 11.4205 Pg 0.578879
Mode 6 Blocks  1710619 SATD 5227.87 Bits 632.837 Mean 1648.81 Var 645320 CgRef 15.2445 CgPred 15.8622 Pg 0.617689
Mode 7 Blocks  2352651 SATD 3996.86 Bits 601.981 Mean 2013.93 Var 842396 CgRef 18.4642 CgPred 19.1573 Pg 0.693167
Mode 8 Blocks  1482503 SATD 5447.2  Bits 637.809 Mean 1810.08 Var 641515 CgRef 14.9565 CgPred 15.5448 Pg 0.588273
Mode 9 Blocks  1972525 SATD 5184.71 Bits 633.59  Mean 1822.5  Var 633855 CgRef 14.9489 CgPred 15.5997 Pg 0.650804
Pooled Blocks 16591925 SATD 5466.77 Bits 636.877 Mean 1868.35 Var 720473 CgRef 15.3899 CgPred 16.0271 Pg 0.63723

[1] http://blog.webmproject.org/2010/07/inside-webm-technology-vp8-intra-and.html

@@ Line 24: / Line 24: @@
 Because the decoder must know the coefficients for each of the block modes, there is a tradeoff between how well we can predict the values in <math>y</math> (number of block modes) with decoder complexity.  Our hypothesis is that the optimal number of block modes will correspond to how closely we can fit different geometries inside the block.  For 4x4 blocks, this might correspond to a mode for each of the 8 cardinal directions (ask Jason to clarify) with larger blocks potentially supporting a larger number of directions.  In addition, we would like to support the DC and True Motion modes from WebM/VP8 as they are often the best fit.  For common video sequences, anywhere from 20% to 45% of the intra frames use the True Motion mode [1].
-This is slightly complicated by the fact that our coefficients are not in the time-domain, but are the result of a lapped transform, see [[TDLT]].  In practice all of these modes have an equivalent in the frequency domain.  As a starting point, we are using the 10 modes from Theora to classify the blocks from a set of sample images.  Each category of blocks will be used to construct a set of predictors <math>\beta</math> which is then used to reclassify the blocks based on Sum of Absolute Transform Differences (SATD).  <i>Tim suggested weighting each blocks contribution based on SATD_bestfit-SATD_nearest</i>.
+This is slightly complicated by the fact that our coefficients are not in the time-domain, but are the result of a lapped transform, see [[TDLT]].  In practice all of these modes have an equivalent in the frequency domain.  As a starting point, we are using the 10 modes from VP8 to classify the blocks from a set of sample images.  Each category of blocks will be used to construct a set of predictors <math>\beta</math> which is then used to reclassify the blocks based on Sum of Absolute Transform Differences (SATD).  <i>Tim suggested weighting each blocks contribution based on SATD_bestfit-SATD_nearest</i>.
 == <math>L^2</math>-norm ==
