DaalaMeeting20150127

# Meeting 2015-01-27

Mumble:  mf4.xiph.org:64738

# Agenda

- reviews
  - Let's land as much as we can before London
- London meeting agenda

# Attending

jmspeex, unlord, jack, yushin, td-linux, xiphmont, derf, smarter

# reviews

Everyone is fine with getting reviews done before next week (8-12% improvements in metrics)

# matrix interpolation

- y: I want to make sure we are looking at real images. I believe our metrics (fastssim and psnrhvsm especially) give better scores if we use an aggressive QM, but this causes artifacts. I want to make sure these QMs look ok visually. I heard that Jean-Marc checked images and it seemed ok.
- jm: I haven't done extensive testing of everything there. The interpolation I'm doing is between Yushin's matrix and a variant of what we had before (what we have in master now) that is tuned for PSNR-HVS and that we use at high rate.
- y: 32 means scale by 2.0. 16 means no scaling. What Jean-Marc is using is the minimum value. When I see other standards, they vary the quantization stepsizes by no more than two times because that's very risky. That's why I wanted to make sure visually.
- jm: At some point I was looking at the jpeg QM, and it had pretty huge values. Using 16 as meaning 1.0. The JPEG matrix has values up to 120.
- y: 8x then?
- jm: yep
- y: JPEG doesn't have inter, so it's quite different. They have very spiky coeffs. After inter and intra prediction, we only represent the residual. The AC coeffs are quite a bit smaller than JPEG. My claim is not really true for our intra picture, but I'm more concerned about our inter picture. It will give us a smaller residue. We must make sure our QM scaling is not too aggressive.
- jm: At some point you pointed me to QMs, and one reason the intra and inter matrices are different is that MPEG2 had a dead-zone quantizer.
- y: They have an option for non-linear quantization. Dead-zone means around zero, right?
- jm: They have a larger step around zero, which means they need a finer quantizer. If they have a quantizer of 2 then it is really a 3.
- y: If you use a larger quantizer then all the AC values will have less resolution.
- jm: The first reconstruction level is at 1.5 in MPEG2 right?
- d: Yes, I believe that's correct.
- jm: So looking at the QM there, a value of 32 is a value of 48 if you aren't using a deadzone quantizer.
- y: What you say is possible, but I think it's a special case of bias.
- jm: There's a bias, but we already have it (so do 264, etc). My understanding is that the reconstruction values themselves have a deadzone.
- d: This is different than Theora, where we didn't have a deadzone in the reconstruction. The original VP3 had different QMs for intra and inter, and when I replaced them with the same matrix, I got universal improvement across the board. VP8 uses flat except for DC (which is independently scaled).
- y: I will generate decoded sequences for current master, and then another for the relaxed one, and another flat, and then we can compare the images visually. 
- jm: Make sure that you check multiple rates. At low bitrates, relaxed is better. I don't know if the bitrate is too high for us to see much difference, but at v=20, the high bitrate matrix is used as is without interpolation. The interpolation is that anything below v=20 gets high rate, anything above 70 gets low rate, and anything in between is interpolated. So I think it's really 20 that we want to check.
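
The numeric conventions in this discussion can be made concrete with a small sketch. It is illustrative only and is not Daala's code: the function names are hypothetical, the blend between the two matrices is assumed to be linear in v (the minutes do not say how the interpolation is done), and the deadzone helper only restates the "32 becomes 48" arithmetic for a first reconstruction level of 1.5.

```python
# Sketch only: all names and the linear blend are assumptions, not Daala's code.

QM_UNITY = 16  # per the discussion: 16 means no scaling, 32 means scale by 2.0


def qm_scale(value):
    """Convert a QM entry to a step-size multiplier (16 -> 1.0, 32 -> 2.0)."""
    return value / QM_UNITY


def deadzone_equivalent(value, first_level=1.5):
    """QM entry a quantizer without a deadzone would need in order to match a
    deadzone quantizer whose first reconstruction level is `first_level` steps."""
    return value * first_level


def interpolate_qm(high_rate_qm, other_qm, v, v_lo=20, v_hi=70):
    """Blend two matrices entry by entry.

    At or below v_lo the high-rate matrix is used as is; at or above v_hi the
    other matrix is used as is. The linear ramp in between is an assumption.
    """
    t = min(1.0, max(0.0, (v - v_lo) / (v_hi - v_lo)))
    return [
        [(1.0 - t) * h + t * o for h, o in zip(h_row, o_row)]
        for h_row, o_row in zip(high_rate_qm, other_qm)
    ]


if __name__ == "__main__":
    print(qm_scale(120))            # JPEG-sized entry in units where 16 is 1.0
    print(deadzone_equivalent(32))  # the MPEG-2 example from the discussion
    print(interpolate_qm([[16, 16]], [[16, 32]], v=45))
```

With 16 as unity, a JPEG entry of 120 corresponds to a multiplier of 7.5, which is the "8x" figure mentioned above.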

# block size decisions

- y: ntt-short got better by using all 16x16 or all 32x32. For intra on video-short1, 16x16 is better. What I want to propose is to use RDO, but lapped transform makes that difficult.
- u: smarter and I talked about this at VDD. You code the image all 4x4 and all 8x8 and you compare the bit costs for each block, ignoring neighbors.
- y: For up to 16x16, there are 17 cases. For 32x32 it's 80k.
- d: If you look at the paper they were able to get decent results even with just up to 16x16. We aren't even doing well up to 16x16 yet, so let's try to make this into a more manageable problem.
- jm: Even that I don't know how to solve because of a minor detail related to us not having an actual distortion metric that can be used across blocksizes.
- d: This is in general a hard problem.
- jm: From what I understand of 264/vp8, they have MSE as a distortion metric. And that can be compared across block sizes. What we have is two levels more complicated. We have QMs and activity masking. Plus we have extra weight on the gain difference, and that changes based on different block sizes.
- y: We need to approximate lapped transforms, PVQ, adaptive quantization. Addressing Tran's paper, I think it can be a helpful reference because it does not do motion estimation. But in our coder, motion block sizes are completely unrelated. What I'm thinking is that usually they estimate bits, and the simplest way is SAD or SATD.
- jm: I'm not sure how well you can approximate PVQ for inter.
- y: It's possible that in the real world this cannot be done in the encoder, but we should show the maximum performance of our coder.
- d: We need to show ourselves because we need to know what we're approximating.
- u: Do we have an idea for distortion metric?
- y: That's easier, we can use MSE or SAD, or SATD or SSIM.
- jm: You want to run it on each block?
- u: No, run on the final image with every block decision determined.
- jm: How do they compute the distortion in the Tran paper?
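
The case counts quoted above (17 up to 16x16, roughly 80k for 32x32) are consistent with a quadtree in which each block is either kept whole or split into four independently partitioned quadrants, with 4x4 as the smallest size. The sketch below checks that arithmetic and shows the per-block comparison that ignores neighbors. The function names are hypothetical, and the comparison deliberately leaves out lapping, PVQ, and any cross-blocksize distortion weighting, which are the hard parts raised in the discussion.

```python
# Sketch only: names are illustrative, and costs are whatever per-block
# estimate is available (for example bit counts), compared without neighbors.


def count_partitions(size, min_size=4):
    """Number of ways to quadtree-partition a size x size block down to min_size."""
    if size == min_size:
        return 1
    # Either keep the block whole, or split it into four quadrants that are
    # each partitioned independently.
    return 1 + count_partitions(size // 2, min_size) ** 4


def choose_split(cost_whole, costs_quadrants):
    """Bottom-up decision: split only if the four children together are cheaper.

    Returns (split?, best_cost).
    """
    split_cost = sum(costs_quadrants)
    return split_cost < cost_whole, min(cost_whole, split_cost)


if __name__ == "__main__":
    for size in (8, 16, 32):
        print(size, count_partitions(size))
    print(choose_split(100, [20, 20, 20, 20]))
```

The rapid growth in the count is why an exhaustive search at 32x32 is impractical and a bottom-up or otherwise pruned decision is the usual fallback.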