DaalaMeeting20141216

# Meeting 2014-12-16

Mumble:  mf4.xiph.org:64738

# Agenda

- reviews
- update1
- goals
- ??

# Attending

# reviews

- no one waiting on anything

# update1

- x: Not a lot to decide other than if we want to update the pictures and graphs.
- jm: I think we should at the very least update it with more current curves and images. We've had big enough improvements that it's worth updating. The question is whether we want to add 32x32 DCT to that. Right now we have two things that are big improvements over master: actual 32x32 and the deringing postfilter.
- x: What is the actual improvement?
- jm: Between the july curve that we have and current master + 32x32, the difference is ~5%.
- u: There are places in our commit history where we broke everything. The 8-18 commit may just be a bad one. The curves I showed Andreas skipped those and don't show 15%.
- jm: On still images we got about 5% since July. The most improved metric is PSNR-HVS-M.
- j: I don't think we should gate the release on 32x32 and deringing.
- jm: I think we should include 32x32 and not deringing.
- x: I think we should compare against master. If we have to wait, that's ok.
- u: I agree with Jack. I would rather not commit to 2 days for 32x32.

# goals

- j: the plan is for everyone to commit to 1% on metrics in Q1.
- y: which metrics?
- d: focus on FAST-SSIM and PSNR-HVS-M. These goals will be evaluated by humans.
- y: How do others (external) consider these metrics?
- j: These metrics are for bonus calculation (the personal portion), and not for external verification.
- jm: If all four metrics say you were worse but the images are better, that's still good.
- j: I'm evaluating these, so we can work with that result.

# yushin question time

- y: I've already started asking Jean-Marc and Jack questions about PVQ. I'm trying to understand the code right now: trying to understand PVQ, why we're using it, and learning about the patent situation.
- x: We loved PVQ for audio because of gain preservation, and we thought that would be useful for video as well. The patent benefits are a nice fallout, but they weren't the first-order concern.
- jm: We don't need as much energy preservation for video, but it's still good. Another advantage is that we can do activity masking with no signaling, which means we can do it block by block or treat H and V separately. The last advantage is that from the N parameters it is applied to, it extracts gain and angle which have actual meaning, and we can use better statistics than just the normal statistics on coefficients.
- x: It partitions the search space into spaces with actual meaning.
- y: I don't have a concrete feeling yet for how it introduces errors when you manipulate the k value. With scalar, it is known how errors scale with the quantizer, but for PVQ I don't have this internalized yet. I will ask Jean-Marc about this.
- jm: Essentially, the values of k are chosen specifically for PVQ to follow the same quantization rules. With PVQ 1 bit per dimension will give you 6dB improvement on average. I don't have that math in the pvq demo, but it will be in the paper I am writing. k is selected to match the same distortion curves as scalar.
- y: That is very nice.
- jm: If you change q by a factor of 2, you will have twice as many levels for the gain. q is basically first mapped to the gain. The levels for the gain are exactly the same as they would be for scalar. If all your non-zero values are in the same axis, you would have the same scale as with scalar. If you look at the figures in the pvq demo, you can see what a scalar quantizer looks like in 2D. If you scroll to the figure after the two figures side by side, you can see with the same step size what PVQ looks like. The resolution is pretty similar. The difference between your circles is the same as scalar, and k is chosen to have uniform resolution. In the paper I have the actual derivation of what k should be.
- y: I'll ask more questions as they come up.
- d: I encourage you to ask in public IRC so everyone can learn those answers. Probably lots of people have the same question.
- jm: You (yushin) are in a special position now to be able to notice mistakes we made and tell us when we did something stupid.
- y: Companding is quite odd in a video codec.
- jm: That's the part that does the activity masking.
- y: Signal companding compresses the whole dynamic range so you get better quantization.
- jm: It's the same idea here: we compand the gain so we get better resolution for small gains. For audio we go logarithmic, but for video we don't go that far. In Opus we quantize the log of the gain. For video, logarithmic is way too much companding; linear means no activity masking. So we find an exponent that is in the middle (2/3 or 3/2 depending on which way you look). In the PVQ demo, you see a 2D quantizer. If you compare that one to the previous one I showed you, you see that for very small values you have more resolution, so for a very slight amount of texture you have code points that are close together. As the gain gets larger the spacing gets larger. (See the first sketch at the end of these notes.)
- y: That's what I'm calling non-uniform quantization.
- jm: As k gets larger, in all directions you get coarser quantization. This implements activity masking without signaling that you need to increase the resolution or decrease it. In a normal codec, you would have to signal that.
- y: If I do find anything stupid, I will definitely let you know. For now I will need some time to dig into the lapped transform and PVQ. The block size decision is a little odd too, since most codecs choose block sizes at the very last moment. In our case it's done early, but that's good for me because it's much simpler to follow.
- jm: The block size decision is done early for two reasons. One is that we can't do RDO like 264 and 265 because of lapping. We need to apply lapping before we do the DCTs, and the lapping will be different depending on the block size. We can't just decide to change the size of one block, because it will change the lapping for previous blocks, including the up neighbors. We can't just make decisions locally. (See the second sketch at the end of these notes.)
- y: Are we signaling block sizes?
- d: Yes, they are signaled.
- y: I don't see the constraint related to lapped transforms then. If you are encoding one superblock, we can consider two different encoding passes: one split into 4x4s only, and one with no splitting. Can we do this?
- d: The problem is that the 32x32 superblock is lapped with its neighbours. The decision depends on the block sizes of your neighbours. We don't encode which lapping to use; the lapping is determined from the block size.
- y: Can we assume the neighbours block sizes are decided?
- jm: No, because some neighbours are in the *next* row.
- y: Ok, I will think this over offline. This is similar to intra-prediction issues right?
- d: You can make greedy optimizations here. Greg tried this but gave up, and you might ask him about this problem.
- jm: The other difference is that for the actual blocksize decision that we code, we don't have exact measures of distortion and rate like 264/5, but we are trying to get a better distortion metric by using something better than MSE. If you look at od_split_superblock, this function tries to measure how visible the ringing would be from different sized transforms. It's assuming that the more texture you have, the more ringing you can have without the artifact being perceptible. It's trying to measure where the image is flat and where it has texture, and see how you can split blocks so that you don't have an edge leaking energy into the flat area. It's by no means perfect. It has to change with the rate, and that's just approximate. It's attempting to do something there, and it seems to make sensible decisions, mostly for still images. For video I'm not convinced yet. It's very hard to tune, because I've noticed when looking at metrics that biasing towards 16x16 gives better metrics despite the images looking pretty bad.
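
First sketch: a minimal C toy of the companded gain quantization and K selection described above. The helper names, the beta values, and the 2-D rule for picking K are illustrative assumptions, not Daala's actual code; the real N-dimensional derivation is in the PVQ paper and in Daala's pvq.c.

    /* Toy sketch of companded gain quantization and K selection.
     * Not Daala code: helper names, beta values and the 2-D K rule are
     * illustrative assumptions.  Build with: cc pvq_sketch.c -lm */
    #include <math.h>
    #include <stdio.h>

    #ifndef M_PI
    # define M_PI 3.14159265358979323846
    #endif

    /* Quantize the gain g with step q, companding by 1/beta.
       beta = 1.0 -> uniform gain levels (same resolution as scalar)
       beta = 1.5 -> finer levels for small gains (activity masking) */
    static int quantize_gain(double g, double q, double beta) {
      return (int)floor(pow(g/q, 1/beta) + 0.5);
    }

    /* Reconstructed gain for a quantized gain index. */
    static double dequantize_gain(int gamma_hat, double q, double beta) {
      return q*pow(gamma_hat, beta);
    }

    /* Assumed 2-D rule: PVQ with K pulses in 2 dimensions has 4*K codewords
       on a circle of radius g_hat, so the angular spacing is roughly
       (pi/2)*g_hat/K.  Pick K so that this matches the local spacing between
       gain levels, giving roughly uniform resolution in gain and angle. */
    static int compute_k(int gamma_hat, double q, double beta) {
      double g_hat = q*pow(gamma_hat, beta);
      double gain_step = q*(pow(gamma_hat + 1, beta) - pow(gamma_hat, beta));
      int k = (int)floor(M_PI*g_hat/(2*gain_step) + 0.5);
      return k < 1 ? 1 : k;
    }

    int main(void) {
      double q = 8.0;
      double betas[2] = {1.0, 1.5};
      int b;
      for (b = 0; b < 2; b++) {
        double g;
        printf("beta = %.1f\n", betas[b]);
        for (g = 4; g <= 256; g *= 2) {
          int gamma_hat = quantize_gain(g, q, betas[b]);
          printf("  g = %6.1f  gamma_hat = %3d  g_hat = %7.1f  K = %3d\n",
           g, gamma_hat, dequantize_gain(gamma_hat, q, betas[b]),
           compute_k(gamma_hat, q, betas[b]));
        }
      }
      return 0;
    }

With beta = 1.0 the gain levels are uniform and the angular spacing matches the scalar step. With beta = 1.5 both the gain spacing and the angular spacing grow as the gain grows, which is the activity masking with no signaling described above.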
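
Second sketch: a toy for the block size / lapping coupling. It assumes the lapping filter support at an edge follows the smaller of the two adjacent block sizes (a simplification; the real rule is in Daala's lapping code) and prints which edges get a different filter when a single block in a row is re-split.

    /* Toy illustration, not Daala code.  Assumed rule: the lapping filter
     * support at a block edge is the smaller of the two adjacent block
     * sizes.  Changing one block's size then changes the lapping (and thus
     * the transform input) of its neighbours on both sides. */
    #include <stdio.h>

    static int lap_support(int a, int b) {
      return a < b ? a : b;
    }

    int main(void) {
      int before[4] = {32, 32, 32, 32};  /* a row of 32x32 blocks */
      int after[4]  = {32, 16, 32, 32};  /* block 1 re-split to 16 */
      int i;
      for (i = 0; i + 1 < 4; i++) {
        int s0 = lap_support(before[i], before[i + 1]);
        int s1 = lap_support(after[i], after[i + 1]);
        printf("edge %d-%d: support %2d -> %2d%s\n", i, i + 1, s0, s1,
         s0 != s1 ? "  (neighbour's lapped input changes)" : "");
      }
      return 0;
    }

Both edges touching the re-split block change, so the neighbours on either side would have to be re-lapped and re-transformed, and as noted above some of the affected neighbours are in the next row, which hasn't been decided yet. That is why the size decision can't be made purely locally.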