weekly-meeting Version 132859 Saved April 14, 2015

# Meeting 2015-04-14

Mumble: mf4.xiph.org:64738

# Agenda

- reviews
- 32x32 MV partitions
- Non-photographic stuff

# Attending

azita, daala-person, derf, jack, jmspeex, MrZeus, TD-Linux, tmatth, unlord, xiphmont, yushin

# reviews

# 32x32 MV partitions

- d: An update: previously our MV grid had an 8px offset relative to the rest of the frame. When 16x16 was the maximum block size, the MVs sitting 8px ahead of the macroblock you were trying to decode would give you the right pixels just in time. It also made the minimal number of MVs to encode the same as in traditional codecs. We have long suspected this caused inefficiency near the border, because the MVs near the frame edge are 0 and there are lots of splits near the boundary. I had to change the grid anyway, so I removed the offset, and it made things 1.4% worse. We're losing something like 6% at low rates according to JM. If someone wants to look at it, feel free; I can't figure out anything wrong with it. The next thing I'm trying is scaling up the sizes of all motion blocks by 2x. It's easier than adding extra levels, so the first step is to make things bigger and then go back later and add back the smaller blocks. Looking for volunteers to review that, as it's almost done.
- x: I'll have to do that anyway to get my full precision patches landed.

# full precision ref frames

- x: Turns out 12-bit is better than we thought. On video1-short it didn't get any gain on ducks, but it got a sizeable gain on everything else. Thomas also suggested the change would show up most in SSIM, and it does show the greatest change on that metric. On sequences where we saw quilting (changing seats and sintel) we were getting 6-7% improvement.
- d: Which graphs should I look at? nttshort doesn't show this.
- x: full_precision_branch_root and full_precision_branch_12bit. video1-short.
- d: We can do links now!
- x: https://www.arewecompressedyet.com/?full_precision_branch_root&full_precision_branch_12bit
- x: The most impressive gains will show up in SSIM, but not on ducks.
- td: You have no root on ntt short 1?
- x: If we want to look at ntt short, it's a different set: the 8bit and 12bit full_precision_branches. On ntt short 1 it gave gains on most clips. One or two clips gave a very slight loss; the others gave significant gains. I'm interested in where that loss is coming from.
- jm: parkscene is kind of odd. It shouldn't hurt.
- td: bq terrace is another with a loss.
- x: That's a larger loss, and I don't know why it's happening.
- j: Why would it ever get worse unless it was a bug?
- td: Noise.
- d: bq terrace is 5%, and the loss is right in the target bitrates.
- x: I will go understand that.

# Non-photographic stuff

- jm: I'm trying to understand how the algorithms work by reading papers, but I can't seem to make much out of them. I looked at Tim's presentation and am trying to get deeper, but I'm having a hard time.
- d: I didn't think the papers on SPIHT were that bad.
- jm: They may be good for someone in the field, but I'm not.
- d: You are now.
- jm: I see this tree stuff, but it's not clear what the root of the tree is. Is that the entire image?
- d: It's a wavelet coefficient at level 0, so it could be the whole thing.
- jm: The tree is centered at 1 DC?
- d: Rooted at 1 DC.
- jm: If you go several levels up, at the very first step you ignore all correlations across directions?
- d: Yes.
- y: They say ?? coefficient is not used.
- d: Part of the idea is that it's supposed to be a decorrelating transform anyway. I'm trying to remember whether ebcot tried to exploit some of those correlations.
- jm: What is ebcot?
- d: The coder in JPEG2000. The general idea you're trying to exploit is that coefficients are significant if their neighbors are significant. So you account for spatial correlation by looking at coefficients nearby at the same scale. You expect coefficients in the same band of the wavelet decomposition to be significant if others in that same band are. SPIHT is organized so that you're grouping clusters of 4 coefficients together. They all have a common parent, and you're trying to code significance at the parent level.
- jm: If each has children, the children are treated with respect to the parent; a bunch of four at the same level can have different probabilities from the next bunch spatially, right?
- d: Yes. You're trying to build a hierarchy that exploits this correlation. Ebcot does this more directly, by working in a flat array and estimating based on neighbors.
- jm: I still don't understand how it uses the max.
- d: You should read Yushin's paper for that.
- y: I have drawn more diagrams in my paper to make it easier to understand. I'll go find it for you. The wavelet defines three different spatial frequencies: H, V, and diagonal. The tree has three branches at the root, and four everywhere else. The max defines the maximum dynamic range of this one spatial coefficient tree. Between these two trees nothing is ever used (except DC). The way they code coefficients is quite painful to understand.
- jm: I don't want to do bitplane coding. It seems to make things more confusing.
- y: Bitplane coding definitely makes the paper harder to follow.
- d: The basic idea is that you want to know when a given coefficient and its children become significant at a level.
- jm: You treat all directions independently, right?
- d: You have a subtree with one coefficient at the current scale and some children at the next scale. There is some correlation across scales in the same place, and some correlation between things nearby at the same scale. You want to know at what level the coefficients in this subtree become significant, and that's why you take the max.

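The subtree-max idea above can be sketched in Python. This is only an illustrative sketch, not Daala code: the dict-based tree layout and function names are invented for the example. Each subtree's maximum absolute coefficient bounds everything below it, and the index of its most significant bit is the quantity that gets coded.

```python
def msb_index(v):
    """Index of the most significant 1 bit of v (requires v >= 1)."""
    return v.bit_length() - 1

def subtree_max(coeffs, node, children):
    """Max |coefficient| over the subtree rooted at `node`.
    `children` maps a node to its (up to 4) child nodes at the next scale."""
    m = abs(coeffs[node])
    for child in children.get(node, []):
        m = max(m, subtree_max(coeffs, child, children))
    return m

# Toy example: a root with four children, as in SPIHT's 4-coefficient groups.
coeffs = {0: 3, 1: 9, 2: 0, 3: -13, 4: 1}
children = {0: [1, 2, 3, 4]}
m = subtree_max(coeffs, 0, children)
# Every coefficient in the subtree is bounded by 2**(msb_index(m) + 1).
assert m == 13 and msb_index(m) == 3
```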
...

- jm: What do I code?
- d: You compute the max of all the coefficients and code the index of the most significant 1 bit in that max.
- jm: So the max rounded down to a power of 2.
- d: That tells you that all the coefficients in that subtree are bounded by that significant bit. The next thing you code is an offset to where the current coefficient becomes significant: how many bits down you have to go to get to where the root of this subtree becomes significant. You'd expect that to be close to the max, so you can use a unary code for how many bits down from the max you have to go to reach the level where that coefficient becomes significant.
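A minimal Python sketch of the scheme just described, with one convention invented here for illustration: a zero coefficient codes an offset one past the bottom bit, meaning "never significant". The MSB index of the subtree max is coded first, then each coefficient's unary offset down from that max.

```python
def msb_index(v):
    """Index of the most significant 1 bit of v (requires v >= 1)."""
    return v.bit_length() - 1

def encode_levels(coeffs):
    """Illustrative sketch (not Daala code): code the MSB index of the
    subtree max, then for each coefficient an offset saying how many
    bits below the max its own MSB sits (small offsets => cheap unary)."""
    m = max(abs(c) for c in coeffs)
    top = msb_index(m)            # all coefficients bounded by 2**(top + 1)
    symbols = [top]
    for c in coeffs:
        if c == 0:
            offset = top + 1      # assumed convention: "not significant at all"
        else:
            offset = top - msb_index(abs(c))
        symbols.append(offset)
    return symbols

# max is 13, so top = 3; offsets follow for each coefficient in turn.
assert encode_levels([3, 9, 0, -13, 1]) == [3, 2, 0, 4, 0, 3]
```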

...

- jm: In the subtree, if I'm not hitting the max, that doesn't mean my children aren't hitting the max.
- d: If your current subtree doesn't hit the max, you're going to code a new max for the subtree.
- jm: I'm using 5 levels (32x32); is that a lot or too little?
- d: I think JPEG2000 defaulted to 64x64, and Dirac did ??
- y: If you do 5 levels for 32x32 you end up with one coefficient in the lowest band. A larger number of coefficients is better.
- jm: I'm going to use the superblock size, whatever that ends up being.
- y: ??
- jm: Everything seems to be binary, but we don't use binary arithmetic coding.
- d: The code I showed you before showed how you could convert it. Instead of doing these unary codes you can just code the bits with probabilities. You can do a bunch of joint coding on the four children.
- jm: That makes sense. Do you code somewhere which of the four children is hitting the max if the higher-level parent hasn't?
- d: That's exactly what I do. If the parent hasn't hit the max, you know there are 15 possibilities. The other thing the authors did later was grouping magnitudes into different sets and coding which partition you're in. There's no reason those have to be powers of two. You have the set of all possible coefficient magnitudes, and you're partitioning it with a roughly logarithmic progression, but it doesn't have to be powers of two.
- jm: That sounds more complicated.
- d: It is, but not more complicated than what we did in Opus. The idea is that it gives you more flexibility in how you pick the partitions. It may make sense to have more partitions near zero. Instead of saying you become significant when you reach four or two, you have partitions for 1, 2, and 3. Having a few extra at small numbers s
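The partition-set idea can be sketched as follows; the boundary values below are invented for illustration, roughly logarithmic but deliberately not powers of two, with extra resolution near zero as the discussion suggests.

```python
import bisect

# Illustrative lower bounds of magnitude partitions (assumed, not Daala's):
# fine-grained near zero (1, 2, 3), coarser further out.
PARTITIONS = [1, 2, 3, 4, 6, 8, 12, 16, 24, 32]

def partition_index(magnitude):
    """Index of the partition a coefficient magnitude falls in (0 = zero).
    The coder would code this index, then any bits needed within the range."""
    return bisect.bisect_right(PARTITIONS, magnitude)

# 0 -> partition 0, 5 -> the [4, 6) partition, 13 -> the [12, 16) partition.
assert [partition_index(m) for m in (0, 1, 3, 5, 13)] == [0, 1, 3, 4, 7]
```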