# Meeting 2015-01-06
derf, jmspeex, azita, td-linux, unlord, yushin
all seems good
- jm: What's the status of all the papers? Mine has a completed first draft, so I'd like to get some review from others. http://jmvalin.ca/video/spie_pvq.pdf I've already gotten some good comments from Yushin.
- y: There are still a few mysterious parts, but I'm getting much more familiar.
- jm: Which parts are still hard?
- y: The big difference from standard codecs is companding quantization. There are two parameters controlling the strength of visual masking, which is controlled by gain, but it seems in the newest version one of the parameters is gone.
- jm: Beta is just 1/(1-alpha). It's just easier notation. Alpha and beta are representing the same thing.
- u: You may want to move the definition of beta up near equation 8 where you first use it.
- jm: Good idea.
- y: Section 5 is coding resolution.
- jm: Yeah, and I introduce beta after equation 9, but I agree with Nathan that I should move it after equation 8.
- j: How are the other papers coming?
- d: I still have a lot of writing to do.
- u: Likewise. And yesterday we talked about an additional experiment we need to run.
- y: Now that our codec works in Visual Studio, can I get a computer to do that?
- j: Yes, you can get a loaner from IT, or I can approve a second laptop, etc.
- y: Objectively I compared our codec to H.264. For intra, Daala is better than x264 in most of the cases. I checked in the practical quality ranges. 0.25 to 0.5 bpp (not exactly correct because it changes picture by picture). 30-32dB x264 was better in PSNR and SSIM. x265 I haven't tried. They are using adaptive quantization unlike HM and JM (standard reference). My first conclusion is that we can claim that x265 for intra. I think you already shared this fact. For inter coding, we have a problem that we need to solve. There are two parts, motion estimation (temporal prediction) and coefficient coding. The coefficient coding is shared by intra and inter, and we are spending 70-80% to code AC coeffs. If we improve inter, that will make us close to H.265 because that's the remaining part. I hope the arithmetic coding is doing a big part for the coding, but transform should give a benefit as well.
- jm: One thing that would be useful to check is whether we are doing worse because our prediction is worse, or because we are worse at coding the residual. In 264, the coeff coding is the same in both inter and intra, but in Daala, we have PVQ and noref and we are supposed to use the prediction much more often in inter.
- y: I think that's a valid point. Since PVQ is working well in intra, even though a predictor is used for inter, I think PVQ is not bad at inter coding. I don't know the real culprit, but I hope the motion estimation is the reason. There are a lot of AC coefficients still there, but it's possible PVQ is the reason there are still so many coeffs.
- u: Could we do this with a CG computation?
- d: I don't think you can compute a coding gain, but you'd want some distortion metric. But I wouldn't do 264 and Daala but the partner codec and Daala.
- t: Jack asked why we are worse at low bitrate than the partner codec. From visual comparison, their skipping is better than ours, and so is the inter coding. There are a bunch of talking heads videos in the NTT set, but we are coding a lot of flicker and noise in areas that aren't changing.
- jm: That is actually useful information. Are we coding DC flickering or AC flickering?
- t: Looks to be AC. In the areas where there is motion, we are coding coeffs up to 20px away from the motion. They seem to have perhaps broken the deblocking filter when they ripped out things. It works but it doesn't seem well tuned.
- d: Is that something you want to look into, Jean-Marc?
- jm: I think Yushin should look into the flickering. I may be able to fix it once it is identified, but I'm working on a ton of stuff. I'm working on auto-disabling AM on edges. I'm using the motion compensated reference and detecting edges on that, and then using that to control AM.
- y: Is this different code from the block splitting function?
- jm: The issue is that when you have edges, for intra we just rely on 4x4 classification. For inter, you'll have edges that are not 4x4. So what I do instead is that I look at the motion compensated reference, and if it has edges I disable masking.
- d: So this is running in the decoder?
- jm: Yes, running the edge detector in the decoder. I'm using the one from the Ptalarbvorm Theora. Most of it could be SIMDable. There was only one division that could be a table but not SIMD.
- y: I'm still investigating the block split function, but haven't come to any conclusion yet. Who designed this?
- jm: I did. Everything weird in Daala is mine.
- d: Not true. I did OBMC :)
- y: One question about block split is that it's purely based on CG and CG is subtle. The decision doesn't seem to prefer smaller blocks. We probably need to adopt some thresholding method instead of continuous decision. If the difference is more than a threshold, then we change the blocksize. Assume we have smaller CG for larger block, then we choose a smaller blocksize, right?
- jm: We're looking at a mix of CG and distortion.
- y: Specifically, instead of just comparing CG with < or >, I'm proposing to use some threshold. If the delta is bigger than the threshold, then switch sizes.
- jm: I'm not sure what you're trying to fix.
- y: It's similar to motion vectors. If the neighbor is a 0,0 vector, then the current motion vector wants to be close so that the motion field is smooth. For example, if you are shooting a panning scene of a large building with many small windows, you could get false motion vectors.
- jm: You are talking about doing RDO on block size decision...
- y: It splits into smaller blocksizes only when it is very confident.
- jm: This is what the CG parameters do.
- y: It looks like the CG is very dense and continuous.
- jm: If you want to bias large block sizes, you can just change the CG values you have. We already have that bias. If the CG of 32x32 is smaller, then it would use it.
- y: Another point I am checking is if block splitting is identifying textures and edges well. When I try to encode parkjoy, inter would not spend more than 2Mb. There are some lower-contrast parts on the upper right side. They are kind of washed out, and that concerned me. I hope I can find some relation between block size decision and this washed out quality. Maybe there's no relationship but I wanted to check.
- jm: Anything you can find that we're doing wrong would be useful.
- y: PVQ and AM want to preserve quality in low contrast parts, which is edges.
- jm: No, right now AM is destroying edges.
- y: You're right, AM is turned off for edges.
- jm: For inter, we're not just using 4x4 for edges, which is why AM causes issues there.
- y: The gain companding is adaptive to our quality factor right?
- jm: Yep.
- y: It controls the strengths of AM based on strengths of gain.
- jm: I'm not sure what your question is. The gain companding gives us non-uniform resolution.
- y: If the resolution (number of vertices in quant space) becomes very low, what would happen to our AM? It can wash out the quality right?
- jm: It would wash out less than if we didn't do companding.
- y: The resolution changes based on bitrate, so lower resolution for lower bitrate will destroy the quality right?
- jm: Lower resolution means worse quality for all codecs. If you compare AM to no-AM, you can see figure 4 in the PVQ paper. You can see the first level of quantization (level 1) on the blue curve with AM is at a lower gain than the first level in the green one.
- y: Gain itself is better coded than in a standard codec. I was expecting synthesized textures there; that's not happening?
- jm: If you are seeing things being completely washed out, then we're below the first level of AM. The quantizer is just too coarse.
- y: How about gain quantization. Is it quantized uniformly?
- jm: No, companding makes it non-uniform.
- d: One thing that would help in figure 4 would be to specify the step size in the caption.
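
To make the gain companding discussion above concrete, here is a minimal Python sketch. It uses the relation jm states, beta = 1/(1-alpha), and assumes a power-law compander of the form g_hat = q * gamma**beta, where gamma is the integer quantization index and q is a step size. That form, the function names, and the value alpha = 1/3 are illustrative assumptions, not taken from these notes or from the Daala source.

```python
def beta_from_alpha(alpha):
    """beta = 1/(1 - alpha), the relation stated in the meeting."""
    return 1.0 / (1.0 - alpha)


def quantize_gain(g, q, beta):
    """Map a non-negative gain g to an integer index gamma.

    Assumed compander: gamma = round((g / q) ** (1 / beta)).
    """
    return int(round((g / q) ** (1.0 / beta)))


def dequantize_gain(gamma, q, beta):
    """Reconstruct the gain from its index: g_hat = q * gamma ** beta (assumed form)."""
    return q * gamma ** beta


def reconstruction_levels(q, beta, n):
    """First n reconstructed gain values for indices 0..n-1."""
    return [dequantize_gain(k, q, beta) for k in range(n)]


if __name__ == "__main__":
    beta = beta_from_alpha(1.0 / 3.0)  # alpha = 1/3 is an illustrative choice
    print("uniform   (beta = 1):", reconstruction_levels(1.0, 1.0, 6))
    print("companded (beta = %.3f):" % beta, reconstruction_levels(1.0, beta, 6))
```

With beta = 1 the levels are evenly spaced. With beta > 1 the gap between consecutive levels grows with the gain, which is the non-uniform resolution jm refers to: larger gains are quantized more coarsely than small ones.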
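
Yushin's earlier proposal for the block size decision, splitting only when the coding gain (CG) difference exceeds a threshold rather than on a plain < or > comparison, could be sketched as follows. The function and parameter names are hypothetical, and the sketch ignores the distortion term that jm says is also part of the actual decision.

```python
def should_split(cg_large, cg_small, threshold=0.0):
    """Return True if the block should be split into smaller blocks.

    cg_large:  coding gain estimated for the larger block size
    cg_small:  coding gain estimated for the smaller block size
    threshold: extra margin the smaller size must win by (hypothetical knob)

    With threshold = 0 this is the plain comparison; a positive threshold
    splits only when the smaller size is clearly better.
    """
    return (cg_small - cg_large) > threshold


if __name__ == "__main__":
    print(should_split(10.0, 10.2))                 # plain comparison
    print(should_split(10.0, 10.2, threshold=0.5))  # thresholded decision
```

Since `cg_small - cg_large > threshold` is the same test as `cg_small > cg_large + threshold`, a fixed threshold is equivalent to adding a constant bias to the CG value of the larger block, which matches jm's remark that large block sizes can be favored by changing the CG values.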