DaalaMeeting20150303

# Meeting 2015-03-03

Mumble: mf4.xiph.org:64738

# Agenda

- reviews
- State of block size decisions (TD-Linux, yushin)

# Attending

smarter, unlord, jack, derf, yushin, gianni, TD-Linux

# reviews

572 assigned to monty

# state of block size decisions

- td: I worked on the Tran paper. I got some results, but not as good as Jean-Marc's. I have not considered how to do activity masking, so it's turned off in my branch. I can't figure out how to do that other than turning on AM after the block size decisions are made.
- d: How do we get a version of each to compare?
- td: We turn off QM, AM, etc in Jean-Marc's.
- d: You can't turn off QM.
- td: He normalizes them. Maybe I could do that with a table. When he fixed his he got a really small improvement, so it's not a major contribution.
- d: It will definitely be normalized differently. The first results he reported were at the break-even stage, and now we're sitting at 18% on PSNR. What changes caused that?
- s: The RDO.
- d: The first changes had RDO in them.
- s: They also had tons of stuff disabled.
- d: They are still disabled.
- s: AM is the only thing still disabled.
- td: That's stuff I can't do yet.
- u: Didn't he add RDO to Haar DC?
- td: Mine also has that.
- y: Haar is only for intra right?
- d: Yes.
- d: We really only need to look at a 16x16 block at a time.
- td: I basically started with the more obvious way to do it. The 16x16 at a time is way more sane. I'm not exhaustively searching all combinations right now, but I could do that at 16x16 only.
- u: Do we still have a problem with the borders?
- d: We are already not accounting for that. We are doing 4pt on the borders. If you aren't searching all combinations, what are you searching?
- td: I try the smallest block size and merge to bigger blocks. Then 32x32 gets compared to the best from that. It worked pretty well but it's not exactly like the Tran paper.
- y: What you guys describe as dynamic programming is not really how I understand dynamic programming. What we're doing now is only finding a local minimum.
- d: Thomas's is computing the global optimum but for a problem that is not exactly the problem we want to solve.
- td: Tran's paper searches the whole space for the optimum.
- d: But he is not using dynamic programming for that, just enumerating all solutions.
- y: I remember if the solution for the subproblem is used more than once then you keep around the solution.
- s: I think we don't care whether it's called dynamic programming or not.
- u: I guess we aren't publishing but we should use the right word.
- y: Before external presentation we should probably check.
- u: So Thomas' approach is not exhaustive because it's not splitting blocks?
- s: It's greedy.
- td: It only tries joining one at a time to see if it improves.
- u: So it's hill climbing?
- d: It would be called iterated conditional modes. It's similar to what WebP was doing with intra modes. They would pick modes for every block, then go back and pick modes given knowledge of the coding artifacts of their neighbors. This is like what Nathan did for intrapaint.
- u: I was calling it a Viterbi trellis.
- d: That was dynamic programming for the updates, but scanning makes it ICM (iterated conditional modes).
- u: We still haven't done the test for exhaustive at 16x16?
- td: No. I should do that as well. Unfortunately what I'm doing is not exactly what's in the paper, which goes against some of the point.
- d: You said you did some comparison vs. Jean-Marc, but where are the numbers?
- td: They were all pasted into IRC in various conversations. I should redo it so that it's compared to the correct Jean-Marc branch.
- s: I was considering refactoring JM's branch so that we have defines for optimizing for PSNR etc., because JM's optimize-for-PSNR branch is not up to date. At least I think so.
- u: JM only does PSNR?
- d: JM has three different branches at least.
- s: Each one builds on the other ones. What I was thinking of doing is having defines for choosing which metric to optimize for. Switching distortion metric and quantization table.
- y: I was thinking if we did this as a runtime mode, then because JM is using a different filter set, based on the rate control mode we could switch the filters. Would that make sense? For example, for current master and JM's branch, the filters are different. So the decoders aren't compatible. So we should signal that.
- d: I think the idea is that we should only have one of these approaches and we just haven't decided yet.
- s: We could have an option for using the full lapping stuff, so that if we figure out how to make an encoder for it we can do it.
- u: If we do exhaustive search we should be able to get better performance.
- d: That remains to be shown. JM has a bunch of fancy metrics in his decision. Is there any reason we can't use the same metrics in your stuff Thomas?
- td: Yes, that is an easy path.
- s: You also need to change your QMs. That's why I want to refactor this stuff so we have a common base.
- td: The QMs have to be retuned.
- d: No one knows how because JM didn't document what he did.
- s: Does it make sense to fastssim on an 8x8 block?
- d: You could do something but it wouldn't be fastssim. The first thing it does is downscale 5 times.
- y: In freq domain, if you exclude high frequency, then I think you can achieve something like downscaling.
- d: You don't have five octaves in an 8x8 block.
- y: For 8pt DCT the 8 basis functions, each freq is close to pi/8 right? No, it's low scale. The HF basis is pi/2 to pi. And then the middle one (2nd left) is close to 16 to pi/8.
- d: They aren't octaves. An octave is a power of 2. In an 8x8 block you have room for 3 octaves. You can compose it into bands that are not octaves. You could do partial downsampling at different scales but...
- s: Would it be a good AM metric?
- d: Prior to Daala I would have said SSIM, but JM made AM which didn't make SSIM go up. When I did this in Theora it went up 3dB.
- s: Seems like it's gotten better since the original implementation for SSIM.
- d: There are papers on how to optimize locally for SSIM. Look at Bovik's papers. Thomas, do you know what you're doing next?
- td: Yes, at least for today. It shouldn't take long.
- y: What I tried is I wanted to have the assumption that JM wanted to avoid. That the amount of lapped pixels is different for each block size. For nxn there are 3*nxn lapped pixels. The reason I wanted to include these lapped pixels as input and output of the current block is that if you view from decoder ... The current result is that there is some improvement, but it is less than half of what JM has right now. If I compare to master it is giving me PSNR of -11%. Others are -3, -6, and -6. This is not a small gain. When we do RDO I added the overhead of splitting, because so far when you are adding the rate for block size information (for example 32x32 block and 4x4) we have added just a constant amount for each block size. But when you go from 32x32 to 4x4, you have to consider all the split information all the way to 4x4.
- d: This is the part I don't understand. Think about it as you're going up the tree. You're adding 1 bit for each of the 4x4s. And now you want to consider whether you are doing 1 8x8 or 4 4x4s (you add one bit), then when you go to do 16x16 vs some 8x8s or 4x4s, the ones that are 4x4s will have an extra 2 bits factored in, and then you'll add another 2 bits for the 16x16. When you consider a decision at the higher level, you absolutely consider the decisions at the lower level.
- y: Actually it's more than 2 bits. I'm adding some low scale of the level.
- d: You are adding much more than that for the lower sizes.
- y: I penalize more for small blocks.
- s: How do you compute your penalty?
- d: ???
- s: Have you looked at the decisions that your stuff does?
- y: They look reasonable to me. They split when there is an HF edge. I only checked at -v50. I didn't check large values.
- d: You've only looked at intra?
- y: I have tested both now. I use the same code for both. Half the improvement for inter as intra.
- d: JM's code was the other way around.
- y: If you can recall, JM started this RDO approach for inter.
- d: He did a version for intra, and he got about half the gain from inter. It seems odd that yours is the other way around. I was trying to say that I don't think your explanation of why you are penalizing is correct. It's entirely a bias that when you have relatively few blocks at that level, it's expensive to code a split there. Adding in a constant 2 bits is definitely not the right cost. We can measure this. Take JM's branch and look at the probabilities for his decisions. You can compute an average over all the frames in some subset of videos. That is a better static cost than 2 bits.
- y: I'm not sure JM is including the bit for split information.
- td: He added 1/2 bit for each split. Sorry, 2 bits for each split.
- y: Same at every level?
- d: Yes. Just go take some block size decision that does approximately the right thing, and measure that, and now you have a nice static probability table. If 32/n works better than that, then we don't understand something here.
- s: You're just penalizing more against small blocks, and it just happens to help for some metrics.
- d: JM already put in a hack to penalize small blocks.
- y: Another part I included was to exclude DC coeff during RDO. The DC part is already encoded by Haar before PVQ is coded.
- d: You have to do something different for DC, and JM did something, but I'm not sure what it is.
- td: I think he does partial Haar, depending on how far down the tree it is.
- u: We talked about doing this for partial prediction right?
- d: Not sure what you mean by partial prediction.
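
As a rough illustration of the split-cost discussion above (going up the tree, with a measured static probability table instead of a constant 2 bits per split), here is a minimal sketch. It is not code from any Daala branch. The names `split_probs`, `flag_bits`, `best_partition`, the `leaf_cost` callback, and the counts in `COUNTS` are all hypothetical, and the numbers are made up. It also assumes each block's cost is independent of its neighbors, which ignores lapping, so it is only optimal for that simplified problem.

```python
import math

def split_probs(counts, eps=1e-3):
    """Turn measured (n_split, n_total) counts per block size into probabilities.

    Clamped away from 0 and 1 so the bit cost below stays finite.
    """
    return {size: min(max(n / t, eps), 1.0 - eps)
            for size, (n, t) in counts.items()}

def flag_bits(probs, size, split):
    """Ideal cost in bits (-log2 p) of coding the split flag at this size."""
    p = probs[size]
    return -math.log2(p if split else 1.0 - p)

def best_partition(x, y, size, leaf_cost, probs, lam):
    """Bottom-up block size decision for the block at (x, y).

    leaf_cost(x, y, size) is a hypothetical callback returning
    (distortion, bits) for coding that block unsplit.
    Returns (cost, tree): tree is an int for an unsplit block, or a
    list of four subtrees for a split one.
    """
    d, r = leaf_cost(x, y, size)
    if size == 4:
        # 4x4 cannot split further, so no flag is coded here.
        return d + lam * r, 4
    whole = d + lam * (r + flag_bits(probs, size, False))
    half = size // 2
    kids = [best_partition(x + dx, y + dy, half, leaf_cost, probs, lam)
            for dy in (0, half) for dx in (0, half)]
    # Child costs already include every flag coded below this level.
    split = lam * flag_bits(probs, size, True) + sum(c for c, _ in kids)
    return (split, [t for _, t in kids]) if split < whole else (whole, size)

def count_leaves(tree):
    """Number of coded blocks in a partition tree."""
    return 1 if isinstance(tree, int) else sum(count_leaves(t) for t in tree)

# Made-up counts, standing in for statistics measured from real decisions.
COUNTS = {32: (60, 100), 16: (50, 100), 8: (30, 100)}
PROBS = split_probs(COUNTS)

# Toy cost: every unsplit block costs 1 bit and has no distortion.
cost, tree = best_partition(0, 0, 32, lambda x, y, s: (0.0, 1.0), PROBS, 1.0)
```

Because the split cost at each level is added on top of the children's totals, a 4x4 leaf ends up carrying the flags from the 32x32, 16x16, and 8x8 levels without any extra per-size penalty.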

(discussion about smarter's grayscale tool)

(discussion of quilting that builds up over time. not a lot of diagnosis of this so far)

- td: Nathan, are skip decisions in the visualizer yet?
- u: Not quite yet. I've got something and I'll have it posted today.
- d: Figuring out the quilting stuff is a pretty high priority for Monty. I'll talk to him about that when he gets in. We now have a test case on a video that we can distribute. It happens on sintel if you do 50 frames.
- u: Yushin, were you able to get the stream analyzer to build?
- y: No, a bunch of compile errors.
- u: I'll work with you to figure that out and then I'll commit it.
- d: Guillaume, I agree we should work with monochrome images for now. But we do have to fix the problem eventually.