DaalaMeeting20150210

# Meeting 2015-02-10

Mumble: mf4.xiph.org:64738

# Agenda

- reviews
- od_hv_intra_pred (yushin)
- Block size decision
- AWCY metrics
- Suggestion: more technical meeting time

# Attending

azita, yushin, jmspeex, td-linux, mrzeus, jack, tmatth, xiphmont

# reviews

reviews are good

# od_hv_intra_pred

- jm: I looked into this and discussed with Yushin. I believe the code is actually correct but the encoder is suboptimal. Basically the prediction attempts to put something in all the bins where it actually makes sense and leaves it up to the encoder to encode noref or not. In no case would not putting something in a band make it better.
- y: I checked that. My conclusion is that the idea is not working well. It works for 4x4 but not for larger block sizes. It fails most of the time. However, there is a noref that will cover up the bad predictor. I made a random change to not code outside of some 4x4 coeff, and there was a gain. But I'd like to check this issue more. I know we've spent a lot of time on TF switching, but that is really for intraprediction.
- jm: In theory, the same thing should work for any blocksize. What probably happened is that in some cases a prediction is so marginally useful that signaling "not noref" ends up costing more than the prediction is helping. This is an RDO problem in the encoder. It's known that the rate estimates on noref and gain/theta are not very good, and this would be a symptom of that. It would be fixable by fixing the RDO rates.
- y: My take is that it's more than RDO. The correlation between neighbor block and current block is dropping very quickly. The real world doesn't have checkerboards and wallpaper in general.
- jm: This wasn't to say I've solved intraprediction. In cases that are really really easy, this prevents us from looking really bad. It's almost free, so we might as well use it. It's kind of a baseline; it works in a few cases, but it doesn't hurt in other cases, and it's a benchmark for future predictors. If you can't beat this, then don't bother.
- y: There's room for improvement. I think we should enable this for 4x4 and disable elsewhere.
- jm: If you want to disable it for higher blocksizes then do it in the encoder.
- y: We can turn off noref for those higher blocks.
- jm: The way the bitstream is designed you have to code noref. It is jointly coded with gain and theta.
- y: I'll track this as a side task and try to find a slight improvement from the current state.
- jm: If we never code noref, it's free because it's jointly coded and adapted. I think it should work at 16x16. There's nothing fundamental about 4x4. The problem is that the encoder is using it too much.
- y: Beyond 8x8 I can hardly expect that the blocks are correlated.
- j: What happens when the blocksizes don't match?
- jm: Nathan has a patch that TFs, but currently it only applies when they are the same.
- y: The phase might not match, see what I mean? If there's a phase change, the DCT response is really different and you can't copy the coefficient.
- jm: If your pattern is pure horizontal stripes then this will work perfectly.
- y: But there are only rare cases for that.
- jm: I just did a test at forced 8x8 with checkerboard. Without prediction it is 98kB and with prediction it is 51kB.
- y: I heard from Tim that this gets up to 50% drop in rate.
- jm: The point is that it works; it's not about the block size.
- y: I don't agree. It would be useful for synthetic images, but real world images don't have these properties. There's no periodicity in real world images.
- jm: This isn't about periodicity. It's about continuing horizontal patterns. It's mostly equivalent to a freq domain version of the standard H and V predictors.
- y: From my small experiment, larger blocks are hardly ever predicted.
- jm: Any sort of intrapredictor works better at 4x4.
- y: Very few blocks are intrapredicted in the current scheme.
- jm: What I'm saying is that there are a lot of cases that are badly predicted. A few cases will be very well predicted, so we might as well catch these. And this has nothing to do with the blocksize. At 16x16 there is still a gain on synthetic stuff. This was never meant to make great gains. I don't see why we wouldn't include it.
- x: If it's only useful on a tenth of a percent, then it's not useful. On the other hand, this may be great if I get the directional transforms working.
- jm: The success ratio on natural stuff is very low.
- y: How much gain do we get from Nathan's adaptive blocksize patch?
- jm: I have no idea anymore.
- y: Ok. I will try to check that after I look more into this basic version.
- jm: Nathan's just TFs the neighbor block to the right size and then pretends they matched. If the current block is 8x8, and the neighbor is 16x16, then it will TF down the neighbor and use the 8x8 to predict.
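
Note (not from the meeting): the "freq domain version of the standard H predictor" point can be illustrated with a small sketch. It assumes the horizontal predictor simply copies the zero-horizontal-frequency column of DCT coefficients from the left neighbor; that is an assumption made for illustration, not Daala's actual `od_hv_intra_pred` code, and all names below (`dct_2d`, `predict_from_left`, the stripe generators) are hypothetical. With pure horizontal stripes the residual energy vanishes, while a diagonal pattern leaves a large residual.

```python
import math

N = 8  # block size for this demo (arbitrary choice)

def dct_1d(x):
    """Orthonormal DCT-II of a list of samples."""
    n = len(x)
    out = []
    for k in range(n):
        s = sum(x[i] * math.cos(math.pi * (i + 0.5) * k / n) for i in range(n))
        out.append(s * math.sqrt((1.0 if k == 0 else 2.0) / n))
    return out

def dct_2d(block):
    """Separable 2-D DCT, indexed coef[v][u] (v: vertical, u: horizontal frequency)."""
    rows = [dct_1d(r) for r in block]              # transform along x
    cols = [dct_1d(list(c)) for c in zip(*rows)]   # transform along y, one list per u
    return [list(r) for r in zip(*cols)]           # transpose back to coef[v][u]

def block_at(image_fn, y0, x0, n=N):
    """Sample an n x n block of a procedural image at (y0, x0)."""
    return [[image_fn(y0 + y, x0 + x) for x in range(n)] for y in range(n)]

def predict_from_left(left_coef):
    """Assumed H predictor: keep only the u == 0 column of the left neighbor."""
    n = len(left_coef)
    pred = [[0.0] * n for _ in range(n)]
    for v in range(n):
        pred[v][0] = left_coef[v][0]
    return pred

def residual_energy(coef, pred):
    """Sum of squared differences between coefficients and their prediction."""
    return sum((c - p) ** 2 for rc, rp in zip(coef, pred) for c, p in zip(rc, rp))

def horizontal_stripes(y, x):
    return 255.0 if (y // 3) % 2 == 0 else 0.0

def diagonal_stripes(y, x):
    return 255.0 if ((y + x) // 3) % 2 == 0 else 0.0

def energy_without_and_with_prediction(image_fn):
    """Residual energy of the current block with no predictor and with the H predictor."""
    left = dct_2d(block_at(image_fn, 0, 0))
    cur = dct_2d(block_at(image_fn, 0, N))
    zero = [[0.0] * N for _ in range(N)]
    return residual_energy(cur, zero), residual_energy(cur, predict_from_left(left))

if __name__ == "__main__":
    print("horizontal stripes:", energy_without_and_with_prediction(horizontal_stripes))
    print("diagonal stripes:  ", energy_without_and_with_prediction(diagonal_stripes))
```

The sketch says nothing about how often natural images look like the stripe case, which is the point the two speakers disagree on.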

# block size decision

- jm: This still really worries me. I've started making more experiments for BSS, and I think I've come across a problem that would even affect the ridiculous method in the paper Tim pointed out. Who here has looked at this?
- td: I looked a while ago.
- y: I am still very interested in this. Last update I got was that Jean-Marc had a small quick patch that fixed some things.
- jm: Absolutely not. That was a hack until we find the real solution.
- y: But the real solution would include that condition. I didn't mean that exact patch.
- jm: Yeah. I've been looking at solving the real problem because this is something that could kill the lapping approach. Right now the approach I tried to take is to have fixed lapping, try different block size decisions, and see what happens. But even the distortion metric itself causes problems with lapping. In the trivial case where the image is not moving at all, measuring the distortion of not moving at 4x4 and at 8x8 gives you really different distortion estimates because the whole thing is biorthogonal. If we do things in the lapped domain, we can't even resolve a case as simple as "everything will be skipped, so let's use the largest blocksize".
- td: I'm curious what comes out of smarter's only 4pt lapping thing.
- jm: His approach would have the same problem. If our distortion metrics cannot be equal in the case where we are skipping everything then it's kind of doomed. The minimum you would want is for your metric for skipping an 8x8 block to be the same as for skipping four 4x4 blocks in the same location.
- td: If he only uses 4pt lapping that is the case.
- jm: No because you'd have interior lapping in one case but not the other.
- td: You should be able to make a distortion metric that would be the same for both. That seems wrong.
- jm: It is wrong, and short of inverting all the lapping I don't see a way.
- j: We can invert the lapping here can't we?
- jm: Yes, but I think it would cause other problems. Where do you ignore distortion and that sort of thing. The more I look into this problem the less I can see how we can solve it. Codecs like 265 do really fancy block size search and get the optimal quadtree and we can't come up with something halfway decent.
- y: With other codecs they use real RDO and they do it at the top level. They probe full coding passes. I have changed my approach to block size decision to a larger scale. Last meeting I explained that we want an approximation of the lapped transform and PVQ coding. For each blocksize we will test the approximated version, and then we will get a rough estimate of rate and distortion for that blocksize.
- jm: Does the metric you have in mind give you the same distortion for 1 8x8 vs 4 4x4 in the skip case?
(back and forth explanation of the example)
- td: Why would lapping vs no-lapping change distortion?
- jm: Lapping is not orthogonal and doesn't preserve MSE.
- y: This is why I suggested an approximation of the lapped transform that gives a better estimate of distortion.
- td: Greg has those det1 filters that were never completed. Would those give equal MSE?
- jm: Those are also not orthogonal. There are orthogonal ones but they are awful.
- j: Do we pick the wrong blocksize here with the different distortions?
- jm: Absolutely. I simulated this with real rate and it picked 4x4 way more often than it should. What I did was that after figuring out all MVs and before doing the real coding, I run the actual encoder without making it output bits or modify the image. I run it at everything 4x4 and everything 8x8. In the coeffs domain I look at each 8x8 block and what the distortion is. Then I look at the 4x4 and I sum the four corresponding 4x4s and look at the distortion and then add lambda * rate difference. When I look at the results, for the vast majority of the blocks it thinks that 8x8 has better rate distortion.
- x: Minor nitpick: you shouldn't be summing should you?
- jm: D over 8x8 is sum of D over 4x4s.
- x: I thought we were concerned with distortion energy not amplitude.
- jm: My metric is MSE and adding energy.
- y: Is it correct to add 4 4x4s?
- jm: yes.
- y: I cannot see how these are the same. L1 norm is the same, but L2 norm not. I'll check it.
- jm: We're never doing the sqrt. Just summing squares of differences.
- y: Taking the sqrt is the same though right?
- jm: We must be talking about different things. In any case, I end up with 90% 4x4 despite most of the image being actual skips. This is how bad the distortion mismatch is.
- x: The 8x8 has 4x the support. Or rather it scales linearly but you may have the proportion wrong.
- jm: You can't just apply some kind of scaling. We'd like to have a metric where if you are getting the same output the distortion should be the same.
- x: I disagree.
- jm: Considering identical images. The first one is keyframe coded. The second one is identical to the first one. So we'll skip everything at 32x32. But let's consider just 8x8 and 4x4. We'd want to skip everything at 8x8 level. And we would achieve this because distortion would be the same for 8x8 and 4 4x4s.
- x: You're objecting that the answer is different but the answer *is* different. You're not going to solve this looking at individual blocks individually.
- jm: If I look globally then I'd still get different distortion.
- x: Don't constrain so far.
- jm: It seems like we're doing contortions in this simple case.
- x: They are not supposed to be equal at that level. So choose a level at which they are equal. In the current way the problem is set up there is no solution. I do not think that this means there is no solution with different formulations.
- jm: The only case where they would be equal is by undoing the lapping. And if we do that then we aren't considering ringing from block to block, which will bring another set of issues.
- x: You could fix this in the skip decision.
- jm: I don't want to add hacks on top of hacks. I want an actual metric that is not completely silly. I think other people should try to go hit this problem and see if they can come up with better suggestions. This is something that could break the entire toolchain we've created.
- td: are we always going to have time domain for both places? Can we subtract them?
- jm: We can totally subtract them.
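
Note (not from the meeting): two of the claims above can be checked with a toy sketch. First, sum-of-squares distortion is additive over sub-blocks ("D over 8x8 is sum of D over 4x4s"). Second, an orthonormal transform preserves squared error while a non-orthogonal one does not, so measuring distortion in two different non-orthogonal domains can rank identical outputs differently. The 2x2 shear matrices below are hypothetical stand-ins for two lapping configurations, not Daala's actual filters, and the lambda and rate values are arbitrary.

```python
import math

def apply(matrix, vec):
    """Multiply a small matrix (list of rows) by a vector."""
    return [sum(m * v for m, v in zip(row, vec)) for row in matrix]

def sse(a, b):
    """Sum of squared differences; no square root is ever taken."""
    return sum((x - y) ** 2 for x, y in zip(a, b))

def coef_domain_sse(transform, a, b):
    """Distortion measured after transforming both signals."""
    return sse(apply(transform, a), apply(transform, b))

def rd_cost(distortion, rate, lam):
    """Rate-distortion cost D + lambda * R."""
    return distortion + lam * rate

original = [10.0, 12.0, 7.0, 3.0]
reconstruction = [11.0, 9.0, 7.5, 5.0]

# 1) Additivity: distortion over the whole equals the sum over its parts.
d_whole = sse(original, reconstruction)
d_parts = (sse(original[:2], reconstruction[:2])
           + sse(original[2:], reconstruction[2:]))

# 2) Same pixel-domain error, measured in three transform domains.
c, s = math.cos(0.3), math.sin(0.3)
ROTATION = [[c, -s], [s, c]]           # orthonormal: preserves squared error
SHEAR_A = [[1.0, 0.5], [0.0, 1.0]]     # non-orthogonal (hypothetical "lapping A")
SHEAR_B = [[1.0, 0.0], [-0.75, 1.0]]   # non-orthogonal (hypothetical "lapping B")

x, y = original[:2], reconstruction[:2]
d_pixel = sse(x, y)                      # 10.0
d_rot = coef_domain_sse(ROTATION, x, y)  # equals d_pixel up to rounding
d_a = coef_domain_sse(SHEAR_A, x, y)     # 9.25
d_b = coef_domain_sse(SHEAR_B, x, y)     # 15.0625

# 3) Identical output and identical rate, yet the RD costs disagree.
LAMBDA, SKIP_RATE = 2.0, 1.0             # arbitrary illustrative values
cost_a = rd_cost(d_a, SKIP_RATE, LAMBDA)
cost_b = rd_cost(d_b, SKIP_RATE, LAMBDA)

if __name__ == "__main__":
    print("whole vs parts:", d_whole, d_parts)
    print("pixel / rotation / shear A / shear B:", d_pixel, d_rot, d_a, d_b)
    print("RD cost A vs B:", cost_a, cost_b)
```

Measuring in the pixel domain (that is, after undoing the transform) gives one distortion value for both cases in this toy, which matches the "undo the lapping" option raised above; it does not address the block-to-block ringing concern.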
