DaalaMeeting20140128

# Meeting 2014-01-28

Mumble:  mf4.xiph.org:64738

# Agenda

- review review (27 open issues)
  - 144
  - 132
- pvq: activity masking http://people.xiph.org/~greg/daala/pvq_activity/psnr.png
- pvq: 16x16
- pvq: energy preservation
- pvq: demo 5
- work week agenda bashing - https://daala.etherpad.mozilla.org/workweek-201402

# Attending

xiphmont, jmspeex, jack, unlord, gmaxwell, derf

# activity masking

g: i achieved an impossible result. i managed to improve PSNR. what i'm doing here is changing the computation in the pvq code that selects the resolution and also changing how it quantizes the gain. i'm changing it to achieve implicit activity masking so that we give more rate to smoother areas where noise is more visible. this is achieved straightforwardly in the current code by companding and uncompanding the gain. at least in the case where there is no theta, is there more to it?
jm: even when there's no theta, there are factors of 1+2alpha that come up in a few places.
g: i'm aware of that.
jm: computing k is a function of the gain.
x: i'm missing the big picture here.
jm: monty, did you read the video pvq doc that i wrote? the whole idea here is that flat areas should have more resolution. highly textured areas should have less resolution. the first thing you do is when you quantize the gain you want more resolution in flat areas. instead of using uniform scalar quantization on the gain itself you use it on a warped version of the gain: the gain raised to a power less than one.
x: yes, i played with that.
jm: once you do that you also need to update how you determine k as a function of that gain.
x: so setting activity has to be taken care of elsewhere in the code. got it.
g: i didn't expect to achieve an improvement in PSNR. that blew my mind. the PSNR graph was surprising. this change should hurt PSNR and help SSIM, but i got an across the board improvement. maybe jm has ideas about where we were suboptimal before.
jm: i notice you're increasing the rate across the board which also influences other things. i wonder if while applying activity masking you could just scale the quality so that you achieve similar rate and see what happens?
g: i'm confused.
jm: basically adding this activity masking increases resolution across the board because all gain steps get closer. so i was thinking in the function itself you could also multiply the quality (-V option) by a constant factor that just gives us equivalent rate.
g: i can do that. i'm still at a loss as to where our behavior is non-optimal such that i get a PSNR improvement.
x: i found that we're currently in the case where the vast majority of the pvq partitions are quantizing to 0. maybe you got out of that. this is not a product of the RDO search. most gains are coming in below 0.5 and they quantize to 0. at moderate and low rates, when i look at pvq compared to scalar, ???? the current behavior of pvq completely defeats the whole purpose of using it. that is, energy preservation is not happening. high end detail is getting thrown away.
g: i think what we were trying to achieve is that we expect that pvq would actually result in a straight up MSE improvement as well. the old stuff did not result in an MSE improvement, so the point of the reimplementation exercise was to fix that.
jm: you are summarizing that perfectly.
x: i agree. however in my mind, i was hoping to get primarily psycho-visual results.
- jm: i thought about doing that as a second step. first get an MSE improvement, then improve visuals.
- x: quantizing all the low energy stuff to 0 appears to be baked in to the way it's being done.
- jm: it's absolutely not. baked into the encoder or the format?
- x: simply into the way we handle it right now.
- jm: i think a lot of this is the cost function
- x: i don't. the cost function is applied after things are quantized to zero.
- jm: right now the cost function does not value gain preservation.
- x: it is always quantized to zero before that point.
- jm: oh, you want to preserve 0.1 gains? that is for activity masking. the only way to do this is activity masking or some variant of it.
- x: i disagree with that but i do think that activity masking is a fine way to experiment.
- g: activity masking compands the gain and makes it less likely that small values will quantize to zero.
- x: i hope that's the case.
- g: as an aside, this code is hellaciously slow right now.
- jm: the loop for the non-predicted case tries all the gains instead of a small range. we're doing many calls to the pvq encoder instead of a single one. and the pvq encoder itself doesn't have the fast pass.
- g: we may need to improve performance to finish the work on this stuff because it's really slow.
- jm: one thing i realized from greg's results: it's always going up. without touching anything else it should go down. did you only change activity?
- g: no i put in the constants, but i'll check that i didn't do it backwards. the reason i'm perplexed about this is that my expectation is that nothing i'm doing should improve PSNR.
- jm: i've also got the activity variable backwards compared to my document. what did you set it to?
- g: i set it to 4/3rds.
- jm: it should be set to 3/4ths. the document says that 1 + 2alpha should be 4/3rds.
- g: i'll check that i have it right. that still doesn't explain a PSNR improvement of 0.5 dB.
- jm: i can see why it does. right now the pvq code is preserving more energy than is MSE optimal.
- g: how?
- jm: at high rate it is optimal. at low rate, i did the derivations of MSE optimal at low rate, and what is actually MSE optimal is to have a black hole in the middle. if you have 1000 coeffs, anything that has a gain below 5 or 10 you better have it just at zero. essentially if you have a really large N and the gain of your vector is 2, then you look at what the error would be if you set k according to gain = 2, and it gives you a distortion. you find that setting gain at 0 gives you a lower distortion. i suspect that doing activity masking backwards would cause the PSNR improvement.
- g: i'll go figure that out. i can see how that could happen now.
- jm: just look at where activity is applied. maybe it's worth not changing the value of activity but reversing where we use the inverse activity versus the activity itself.
- g: i'll go fix it.
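
Editor's sketch of the gain companding discussed above: quantize the gain raised to a power less than one with a uniform step, then undo the power on decode. This is a minimal illustration, not the Daala code. The function names, the `step` parameter, and the choice of 3/4 as the exponent (taken from the figures mentioned in the discussion) are assumptions, as is where exactly the exponent enters the real encoder.

```python
import math


def quantize_gain(gain, step, exponent=0.75):
    """Uniformly quantize the companded gain (gain ** exponent).

    `exponent` < 1 is the hypothetical companding power; 0.75 is only
    an illustrative value.
    """
    companded = gain ** exponent
    return int(math.floor(companded / step + 0.5))


def dequantize_gain(index, step, exponent=0.75):
    """Map a quantization index back to a gain by undoing the companding."""
    return (index * step) ** (1.0 / exponent)


if __name__ == "__main__":
    step = 1.0
    levels = [dequantize_gain(k, step) for k in range(6)]
    gaps = [b - a for a, b in zip(levels, levels[1:])]
    # Reconstruction levels are packed more tightly at low gain, so flat
    # (low-gain) areas get finer gain resolution than textured ones.
    print("levels:", levels)
    print("gaps:  ", gaps)
```

Because the levels are uniform in the companded domain, their spacing in the gain domain grows with the gain. The overall rate also changes, which is why jm suggests rescaling the quality to compare at equal rate.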

# 16x16

- x: there's not a lot to say about it other than it was reasonably easy, except for the constants sprinkled all over the code. the fact there were 4 PVQ partitions was hardcoded in 4 different places. it's straightforward. it's implemented and working. the only caveat is that i don't know what CDF stands for. the CDF for 16x16 is a placeholder and we need to code more than 16 choices. so it's coded as 4 bits using 8x8 and then 3 bits with flat probability. if it turns out not to be a premature optimization. i've been traipsing through this code, and perhaps i should document it. it's not so much that the code doesn't make sense, but there are a number of things that are named strangely and in weird places. i am preparing a patch that moves a few things to more logical places and fixes up the names. pvq_encoding is now laplace_encode_vector. the details aren't important; we can bikeshed on the patch.
- g: i agree. it's come up as a source of confusion a few times now.
- x: all these changes are minor though there will be some bikeshedding of naming.
- jm: you didn't see anything the code was doing that was stupid?
- x: there was some dead code and a few places where we were dealing with coding conventions. we can go into more details offline.
- greg: testing shows dead code:
https://mf4.xiph.org/jenkins/view/daala/job/daala-coverage/ws/unix/coverage/src/pvq_encoder.c.gcov.html#165
- x: as for removing the old pvq thing, it's already gone.
- jm: what's the status of your patches?
- x: they are in my local git. i was going to do the mild refactoring and commenting before submitting all of it.
- jm: if you have some that are ready to apply i have no problem looking and merging right away.
- x: ok
- jm: do you have them in public git? if you push them somewhere i can review from git.
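
Editor's sketch of the placeholder 16x16 coding x describes earlier in this section: a value with more than 16 possibilities is split into a 16-way symbol (which could reuse the existing 8x8 CDF) plus 3 extra bits coded with flat probability. The 128-value range, the function names, and the choice to put the modeled symbol in the high bits are all assumptions made for illustration; the notes do not say how the split is actually done.

```python
FLAT_BITS = 3                 # bits coded with flat probability
SYMBOL_COUNT = 16             # choices covered by the reused 16-entry CDF
MAX_CHOICES = SYMBOL_COUNT << FLAT_BITS  # 128 in this hypothetical layout


def split_choice(choice):
    """Split a choice into (modeled 4-bit symbol, raw 3-bit remainder)."""
    if not 0 <= choice < MAX_CHOICES:
        raise ValueError("choice out of range")
    return choice >> FLAT_BITS, choice & ((1 << FLAT_BITS) - 1)


def join_choice(symbol, raw_bits):
    """Inverse of split_choice, as a decoder would apply it."""
    return (symbol << FLAT_BITS) | raw_bits


if __name__ == "__main__":
    symbol, raw = split_choice(45)
    print(symbol, raw, join_choice(symbol, raw))
```

Only the 16-way symbol would go through the adaptive entropy model; the 3 raw bits cost a flat 3 bits regardless of their value.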

# energy preservation

- x: we were looking to pvq for two things. it's elegant and perhaps more efficient. the third thing is, is energy preservation actually useful? i think we should experiment with that. i'm certain it is going to behave in surprising ways and that it will be useful. i'm not sure that the small amount of efficiency we're seeing is worth the additional complexity. it's not particularly impressive in my testing. i'm seeing visual effects that are not as good as scalar. there are large sections of the image that just wash out.
- jack: can you bring some sample images to the work week so we can all look at them together?
- x: yes. even if you discard the images that are surprising, the other images look better with scalar.
- jm: these are regions where you have gain < 1?
- x: yes. these are regions where energy is disappearing. happens in both HF and LF. a large area of the image grays out slightly, but compared to scalar it's obvious.
- jm: activity masking is one part of the solution. another thing to experiment with (in parallel) is to have something explicitly handling really low gains. like adding 1 pulse at 0.2. some kind of spreading for just this issue.
- x: what i was going to propose experimenting with is to do away with the partitioning and apply a spectral tilt. try to keep the rough energy across the spectrum more or less intact, but do it in a reversible way like with CELT. (spectral tilt = whitening filter). what i'm looking for is energy preservation so that energy doesn't move as we quantize but we smoothly decorrelate. i want to see what it looks like when our quantization process moves to noise and grain that is true to the original spectral distribution rather than moving towards everything becoming a flat block.
- jm: if you apply this whitening filter and then inject noise...
- x: i don't want to inject noise. i want it to pick up noise instead of dropping energy.
- jm: if you apply the spectral tilt, then code, then apply it backwards, you're going to give better resolution to the stuff that was weaker.
- x: yep. we don't want to do a straight version of that. we're allocating our resolution based on energy, and when the LF is low energy you lose contrast. when the HF is low energy it turns into a flat area of perfect color.
- jm: that part i get. if the gain doesn't quantize to zero you'll be using a bunch of pulses, and maybe what we need as a first step is a single pulse for small gain so that you're not completely washed out.
- x: that's a good test as well. one reason i want to drop partitioning is that dividing a big number doesn't give you two little numbers, but two more big numbers that are just slightly smaller.
- jm: i was hoping the gains would be easy to code because we could code them jointly and predict them.
- x: that may be the case. what i'm seeing right now, is that until we get to that point it's not worth discussing because they are all zero.
- g: are you comparing these things at equal rates?
- x: yes. i realize that the rate is sort of a guess, but i'm running with RDO and matching rate.
- g: i think it's well accepted wisdom that at low bitrates you should be reducing resolution of your images. and that remains true even if you have good partitioning schemes in the codec. it's positive to me that our quantizer is making similar decisions. i wanted to make sure that what you were complaining about wasn't what was expected to be a good behavior.
- x: if i was taking this in a vacuum, i'd agree.
- jm: what is reported as the quantization factor?
- x: i have been testing at between 32 and 48. low enough rate that it's causing obvious differences but well above where it would totally fall apart.
- jm: the function that has k as a function of the gain is MSE optimal, and maybe we should deviate from that. it has the same error in the direction of the gain as it does in other directions. maybe it shouldn't do that.
- x: at the moment things are disappointing and we have reason to be concerned. perhaps activity masking will shape it right up.
- jm: the one i'm mentioning right now is what i consider to be an easy one that is complementary to activity masking. i always thought it was reasonable to check that early on.
- g: i will play with that. i wonder if there is a better way to test these visual tradeoffs.
- jm: that's one of the problems with pvq. there are thousands of interacting things.
- x: i think we might want to turn prediction off and try it that way. the one thing that pvq is doing right that scalar is botching is that scalar is waffling like hell and pvq is not.
- g: this may be the zigzags.
- all: ZIGZAGS!
- g: can you check to see if there is a difference in waffling between even and odd quantizer values?
- x: pvq shows no hint of this.
- g: pvq gets rid of the zigzag. i could see a rounding difference causing that.
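
Editor's sketch of the trade-off behind jm's "single pulse for small gain" suggestion in this section: reconstructing a low-energy band with one pulse scaled to the band's gain keeps its energy, but when the band is flat that pulse is poorly aligned with the input and the squared error is higher than simply zeroing the band. This is an illustration only: the function names are hypothetical, the gain is left unquantized, and it is not how Daala's PVQ encoder searches for pulses.

```python
import math


def energy(vec):
    """Sum of squares of a coefficient vector."""
    return sum(v * v for v in vec)


def squared_error(a, b):
    """Squared error between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def single_pulse(coeffs):
    """Reconstruct with one pulse at the largest coefficient.

    The pulse is scaled to the (unquantized) gain of the input, so the
    reconstruction has the same energy as the input.
    """
    gain = math.sqrt(energy(coeffs))
    recon = [0.0] * len(coeffs)
    if gain == 0.0:
        return recon
    peak = max(range(len(coeffs)), key=lambda i: abs(coeffs[i]))
    recon[peak] = math.copysign(gain, coeffs[peak])
    return recon


if __name__ == "__main__":
    flat = [0.1] * 64                 # low-energy, evenly spread band
    zeros = [0.0] * len(flat)
    pulse = single_pulse(flat)
    print("energy kept by pulse:", energy(pulse), "vs input:", energy(flat))
    print("error with pulse:", squared_error(flat, pulse))
    print("error with zeros:", squared_error(flat, zeros))
```

For a unit-norm codeword at angle theta to the input, the error with the gain preserved is 2g²(1 − cos theta), against g² for zeroing, so zeroing wins on MSE whenever cos theta < 1/2. That is consistent with jm's earlier remark that the MSE-optimal choice at low rate can be to code nothing, even though it is visually the washed-out result x objects to.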