DaalaMeeting20140225

# Meeting 2014-02-25

Mumble:  mf4.xiph.org:64738

# Agenda

- patch status
- fastssim metric?
- ietf89 planning

# Attending

jmspeex, unlord, derf, xiphmont, gmaxwell, jack

# patch status

- derf: 8x8 is in, but 16x16 is not in
- derf: i have not made a lot of progress on 32x32
- greg: i screwed up the training on 16x16 and that's why it's not done yet. i left it running for 2 days with the old code, and i didn't notice it until yesterday.
- xiphmont: i am running my own RDO tests but was curious what the status was.
- unlord: you're testing RDO?
- xiphmont: i'm just keeping track of what works and doesn't. i'm mostly focused on PVQ and trying to inform my intuition about where we are getting gains. i want to be able to command the knowledge without thinking about it a lot.
- unlord: RDO or RD curves?
- xiphmont: RD curves, sorry.
- jack: did the scaling patch land?
- derf: yes. we still need to do all the followup things: the transform multiplies might overflow now, CfL needs to be revisited since we were at the precision limit before, and i'm not sure what the third one was.
- greg: i ran the training tools in clang with integer overflow checker and it wasn't overflowing on me at 16x32. i only ran it on subset1.
- jm: when did the transforms land for 8x8?
- derf: yes. same patch as the tools. wait. there were two patches. one did non-neg constraint for search (was 4x4 only last I checked) and the other was changes to 8x8 including updating the tools to use the new transforms.
- jm: i'm looking at git, and trying to find 8x8 coeffs.
- derf: did you check that in greg?
- greg: maybe i forgot to push it. will fix.

# fastssim

- jack: fastssim was/is broken. should we delete it?
- derf: we can certainly stop running it, but it's probably a good idea to understand why it broke; that's low priority though. the nice advantage of fastssim is that it's faster, and the implementation is multi-scale, which has interesting properties. one of those properties is that it's one of the few metrics that shows the effect of moving the quant matrices beyond flat, e.g. to jpeg-style. multiscale SSIM will move where the other metrics don't. (a sketch of the multi-scale idea follows at the end of this section.)
- unlord: i can remove it from rd_plot.
- jack: perhaps also bd_rate.
- unlord: sounds good.
- xiphmont: what about people who notice we collect data but don't use it?
- jack: file a bug and put the bug # in the comment that disables it so people know why
- derf: traceability!
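
For reference, here is a minimal sketch of the multi-scale idea derf describes: SSIM is computed on successively downsampled versions of the image pair and the per-scale scores are combined with weights. This is not Daala's actual fastssim implementation; the whole-image SSIM and the standard MS-SSIM weights below are assumptions made only for illustration.

```python
import numpy as np

# Standard MS-SSIM per-scale weights (an assumption for this sketch, not
# necessarily what the fastssim tool uses).
WEIGHTS = (0.0448, 0.2856, 0.3001, 0.2363, 0.1333)

def ssim_global(x, y, c1=(0.01 * 255) ** 2, c2=(0.03 * 255) ** 2):
    # Whole-image SSIM; real implementations use a sliding window, but a
    # single window keeps the sketch short.
    mx, my = x.mean(), y.mean()
    vx, vy = x.var(), y.var()
    cov = ((x - mx) * (y - my)).mean()
    return ((2 * mx * my + c1) * (2 * cov + c2)) / \
           ((mx * mx + my * my + c1) * (vx + vy + c2))

def downsample2x(img):
    # Average 2x2 blocks, dropping any odd row/column at the edge.
    h, w = img.shape[0] & ~1, img.shape[1] & ~1
    img = img[:h, :w]
    return (img[0::2, 0::2] + img[1::2, 0::2] +
            img[0::2, 1::2] + img[1::2, 1::2]) / 4.0

def multiscale_ssim(ref, dec):
    # Scoring at several resolutions is what lets changes that move error
    # between frequency bands (flat vs. jpeg-style quant matrices) register
    # here even when single-scale metrics barely move.
    score = 1.0
    for w in WEIGHTS:
        score *= ssim_global(ref, dec) ** w
        ref, dec = downsample2x(ref), downsample2x(dec)
    return score
```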

# pvq

- jm: i ran some experiments with joint coding of gains, and in theory we may want to do that, but i've been disappointed with the results.
- derf: this is jointly coding the gains within a single block?
- jm: the octave experiment was dumping the values of the 3 gains in the top octave and comparing the sum of the independent entropies of the three gains with their joint entropy. the two differ by 3-4%, not nearly as much as i would have hoped. this is without prediction because i wasn't sure how to mix gain prediction into that. gain prediction gets us less than 1% currently. (a rough sketch of this kind of measurement follows at the end of this section.)
- derf: less than 1% of the rate on the gains?
- jm: less than 1% of total rate. the 5% figure was relative to the gain bits themselves.
- derf: 1% of total rate is a lot.
- jm: it was less than 1%. the difference in total rate is 0.5%. the gains themselves are about 10% of the total. gain prediction saves about 5% relative to the total gain bits.
- derf: we're spending 10% of the bits on gains. huh. that seems like an awful lot.
- greg: that's crazy
- jm: we're dividing into bands now
- x: we turn one large DC and a bunch of small AC into one large DC and 5-6 large AC coefficients. it defeats a large amount of the purpose.
- jm: it's about 80% of the bits for pvq.
- derf: how many dimensions in bands on avg?
- jm: this is with block switching. the band sizes are 15, 8, 32, 64, and maybe 16s.
- derf: maybe not so terrible but it's a little concerning.
- jm: i didn't verify that it matched. i dumped all the gains and measured the entropy, but i didn't check that it matched the rate the encoder actually spent. i'm not that concerned about that; it seemed to mostly match. there seems to be something like 10% on DC and all kinds of signaling, 70% on pulses, 10% on gains, and the rest on no-ref flags and thetas. i'm not sure i like the no-ref thing.
- derf: i'm sure i don't like it.
- jm: why?
- derf: seems like an odd thing to have. in particular, what this is saying is that our predictors are hurting us in a bunch of cases and we're going to spend bits to correct that.
- jm: which is what you do for scalar, except that we don't need to spend pulses on ignoring the predictor. for scalar, if a block is flat and you predict all kinds of crap... ??? ...with pvq, if the block is flat and you predict crap, you say ignore band 0. there were two choices: a no-ref flag, or flipping the angle so theta can go up to 180 degrees instead of 90. but that basically meant the equivalent of subtracting the prediction, which i didn't really like.
- d: the point was if you're going to do this in scalar, you could code a flag to ignore the prediction and you wouldn't have anything to cancel out. if we're going to compare pvq to scalar we need to compare apples to apples.
- jm: i'm trying to think if there is a better way. the only advantage of having the sign flip with theta is that there's no redundancy, whereas ignore prediction has some redundancy: the same point can be coded with the theta or without.
- g: it is also potentially redundant with our mode bits. in theory we should have a mode that is not predicting a whole bunch. so if really the prediction is terrible, that's the mode we should be using.
- jm: it's easy to say if we predict mode 0 then these flags are hardcoded and you don't do that.
- greg: ok. i retract that objection.
- derf: it's easy to do but we're not doing it.
- jm: i had to disable prediction for my experiments with gains. jointly coding thetas with gains when they don't have a theta becomes really messy and i didn't really like it.
- d: i'm not sure how you get around that.
- g: does it mess up the context modelling? i imagine the stats are pretty different.
- jm: right now there is not much context modelling. in terms of context for coding gains and thetas there are so many potential things to use and they are actually hard to use because we are switching and some have prediction and some don't. i'm not sure what the gains would be. they don't seem to be that large. for the gains and thetas you can use the gains and thetas of neighboring bands and blocks, the prediction itself.... i'm finding it hard to actually use this.
- d: this is the sort of thing that we need to spend an awful lot of time on at some point. this is a little more of an art than a science. i am hoping we can make reasonable progress without sinking a lot of time on it up front.
- jm: among the things that need figuring out is the band layout. maybe monty wants to look at that.
- x: i've been thinking about making it part of demo5 with some js so you could play with band partitioning in real time.
- jm: looking at curves?
- x: no. actual image output.
- jm: we'll need to figure out band layout. i'm not sure how to make it easy to change the layout and second how to actually decide that one is better than another one.
- d: the first one sounds like an engineering problem. i think we can figure that one out. for the second one, maybe looking at images will tell us something.
- jm: you need decent enough encoding that you can reliably use the image you're looking at.
- d: i'm not following. you mean the issue is being able to change the layout and have the stats updated?
- jm: yep.
- d: you're right, that is a little more challenging.
- jm: i did a few experiments with the k value for g=1. i think we can do a little bit better, but not huge gains. what i'm finding is that we need fewer pulses for small gains (a toy sketch follows at the end of this section). this is with curves, so it needs to be checked on real images. there's all the activity masking that goes into that as well.
- derf: activity masking would suggest we want more pulses?
- jm: no it says we want more gain resolution at small gains. k is a function of what we're quantizing. this is orthogonal to activity masking.
- d: is it?
- jm: mostly, although you can combine them.
- d: i guess you're right.
- jm: the idea is that MSE optimal means you want to skip gain=1 and not have pulses there. of course we're not doing that, but at the same time having fewer pulses (a smaller k) is good since you can use a smaller stepsize for the gain.
- d: this is what i've been telling people is great about pvq all the time
- jm: i'm still telling you where the energy is, just with not very much resolution.
- d: maybe we'll get to a point where we can experiment with folding or something
- jm: i did play with injecting noise. it still means you need to signal the gain
- x: quad trees!
- d: we can make that decision independently but scalar cannot. you can think of several examples where you'd want to do that.
- jm: what it also told me is that we also want to figure out activity masking before figuring out gain = 1.
- d: presumably greg will have spare time while the 16x32 filters are training. maybe he can figure out why that patch is hurting instead of helping.
- jm: on metrics on actual images?
- g: both. obviously something was busted.
- jm: hurting metrics is not that surprising.
- d: hurting SSIM is surprising.
- g: i expect it to help SSIM.
- d: when we added activity masking to theora we got a 3dB improvement to SSIM. that was using the old SSIM which was doubled, so 1.5dB with the current tool. but it was still quite a lot.
- jm: i think we want activity masking to be adaptive and not apply it to 4x4.
- g: that may be part of what i was messing up.
- jm: there's a patch that goes a long way for that (bss ???).
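
Two sketches related to the PVQ discussion above follow. First, a rough Python equivalent of the gain-entropy experiment jm describes (the original was done in Octave): compare the sum of the per-gain empirical entropies against the joint entropy of the gain triples. The data here is synthetic and purely illustrative; on real encoder dumps the gap was only 3-4%.

```python
import numpy as np
from collections import Counter

def entropy_bits(symbols):
    # Empirical entropy (bits per symbol) of a sequence of discrete symbols.
    counts = np.array(list(Counter(symbols).values()), dtype=float)
    p = counts / counts.sum()
    return float(-(p * np.log2(p)).sum())

def independent_vs_joint(gain_triples):
    # gain_triples: one (g0, g1, g2) per block, e.g. the three quantized
    # gains of the top octave dumped from the encoder.
    independent = sum(entropy_bits([g[i] for g in gain_triples])
                      for i in range(3))
    joint = entropy_bits([tuple(g) for g in gain_triples])
    return independent, joint

# Synthetic, correlated gains standing in for an encoder dump (hypothetical).
rng = np.random.default_rng(0)
common = rng.poisson(3.0, 10000)
triples = list(zip(common + rng.poisson(1.0, 10000),
                   common + rng.poisson(1.0, 10000),
                   common + rng.poisson(1.0, 10000)))
ind, joint = independent_vs_joint(triples)
print(f"independent: {ind:.2f} bits  joint: {joint:.2f} bits  "
      f"saving: {100 * (ind - joint) / ind:.1f}%")
```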
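Second, a toy illustration of the k-versus-gain point: at small quantized gains you can get away with very few pulses, which in turn lets the gain code use a smaller step size. The mapping below is invented purely for illustration and is not the rule used in the Daala code.

```python
import math

def pulses_for_gain(gain_q, n_dims):
    # Invented toy rule, NOT Daala's: give a band roughly enough pulses that
    # the resolution on the sphere keeps pace with the gain resolution, so
    # small gains (gain_q = 1, 2, ...) get very few pulses and gain 0 skips
    # the band entirely.
    if gain_q <= 0:
        return 0
    return max(1, round(gain_q * math.sqrt(n_dims) / 2))

# Example: a 32-dimension band at a few small quantized gains.
for g in range(5):
    print(g, pulses_for_gain(g, 32))
```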

# ietf89

- jack: anything we should be doing?
- jm: get cullen and magnus to agree on ptime.