# Meeting 2014-04-15

Mumble: mf4.xiph.org:64738

# Agenda

- Reviews
- Discuss JM's experiments
- Google meeting

# Attending

gmaxwell, unlord, jack, jmspeex, derf

# Reviews

no one is waiting on reviews

# JM's experiments

- jm: did a bunch of experiments in random directions; see the status update for the list. i tried splitting the lowest band in two. i tried splitting along horizontal and vertical, putting the diagonal on either side, and i also tried having horizontal and vertical in one band and the diagonal in another. that last one was the worse of the two. splitting horizontal vs. vertical gave similar results--maybe slightly worse. (a sketch of the split variants appears at the end of this section)
- derf: i assume the issue is overhead of the additional parameters.
- jm: i don't know
- d: how do we answer that question?
- jm: i don't know
- g: we could measure it while cheating with the additional parameters
- jm: it was very close. let me get the numbers. i was getting stuff like some metrics being 0.6% better, psnrhvsm being 1% worse. that sort of scale.
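
To make jm's variants concrete, here is a minimal sketch of how the lowest band could be split by coefficient position. The enum, the function name, and the assumption that the band is indexed by raster frequency coordinates (x, y) are illustrative only, not Daala's actual band layout.

```c
/* Hypothetical sketch of the two band-split variants above.
 * Variant 1: one sub-band per orientation, with the diagonal
 * (x == y) joining whichever side diag_to_a selects.
 * Variant 2: horizontal and vertical together in SUB_A, the
 * diagonal alone in SUB_B. */
typedef enum { SUB_A, SUB_B } subband;

static subband classify_coeff(int x, int y, int variant, int diag_to_a) {
  if (variant == 2) return x == y ? SUB_B : SUB_A;
  if (x == y) return diag_to_a ? SUB_A : SUB_B;
  /* Off-diagonal coefficients split by which frequency dominates. */
  return x > y ? SUB_A : SUB_B;
}
```

Each extra sub-band also means extra per-band parameters to code, which is derf's suspected source of the overhead.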

# Google meeting

- d: i don't know if you noticed the range of the changes the google team was talking about while we were there.
- u: it felt like a one-way conversation. they would like us to prove these techniques would work in libvpx, and then they'd love to use them.
- d: they phrased it better than they have in the past. the way they put it was that they wanted to build a library of techniques and then pick the things that worked. i don't think they quite fully understand that our techniques require some fundamental changes to the codec.
- d: ?? would they be upset if they took some of our ideas, and the answer is absolutely no.
- u: they wanted us to show it would be an improvement in libvpx
- d: that's what debargha wanted :)
- g: jim gave a different perspective on it. after we left the meeting we had some additional conversation about the ietf.
- d: they wanted to know what the ideal ask was. so i sent him an email with that.
- u: that sounds promising.
- jm: we should discuss lower-level things with them, like our investigations on entropy in block
- j: let's make an etherpad for an agenda and get the next one scheduled. put that item on there.
- d: i talked to them about june. will ping them.
- jm: in general we can talk about minor things we tried and things we are trying to figure out.
- d: i tried to bring up metrics, but they didn't seem to have any great ideas there either.

# JM's experiments part 2

- jm: that splitting thing was worse with all metrics with activity masking.
- d: did you do interband masking?
- jm: i did not. that may be part of the problem. basically it was between 1 and 2% worse. i tried doing RDO tuning and i was able to improve all 4 metrics while the images looked worse.
- j: patent that and submit it to mpeg!
- jm: if i disable intra on every band except n=15 for 4x4 and 8x8 it's a wash.
- d: it was like 0.3%.
- jm: less than that.
- d: it's not a surprise that the current predictors are not doing much in HF.
- jm: my thought was that, especially for 16x16, ?? trying to predict the first 15 properly, and not waste our nonzero coeffs on HF since we're not going to nail that anyway.
- d: isn't our training already doing that?
- jm: maybe for scalar it's not such a bad idea but for pvq having one nonzero coeff in the high band is completely useless. if you predict just one HF coeff you don't gain anything in pvq. in scalar you might have a tiny gain.
- g: i think if we're not predicting the HF then we lose a lot of what intra is supposed to gain us.
- jm: for 16x16 we're not even able to predict the LFs enough that it becomes useful, so i don't see how we're going to predict HF anytime soon. one thing i was thinking about was: can we do something along the lines of intra block prediction--predicting HF from LF?
- g: our original training tools were set up to assume a particular scan order and it added the previously scanned coeffs to the prediction
- jm: that won't work for pvq. i don't see how something linear would work, because if it did work, then the transform would suck. the dct is supposed to decorrelate all the components.
- d: the nonlinearity comes from a) you're looking at the residue at a particular mode and b) biorthogonality issues, but those are probably minor.
- jm: even if there's a biortho issue, it probably means your thing shouldn't be orthogonal.
- u: whatever happened to the idea of undoing the lapping?
- d: i don't know how to do it and it wasn't fully baked. i haven't worked on it yet.
- jm: i couldn't think through the entire system, but the general idea i had was something along the lines of trained VQ: training a codebook of HF (or one for each band), and then your codebook is 256 entries, and based on the freq ???? that gives you a set of maybe 4 or 8 code vectors that are allowed given the low frequencies. it makes your search faster, and what you need to code is not 8 bits but 2 or 4. (a sketch of this idea appears at the end of this section)
- d: i've been thinking of something along those lines for trying to use when the predictor is 0 or bad. something we could use to fill in HF to match whatever energy we had encoded. i hadn't thought of coding it conditionally based on the LF stuff. i'm not sure how that would work.
- jm: i know a way it could work, but it would be too complex: take the LF and do a quantization (k-means: figure out which vector is closest among a set of 256).
- d: that's too complex. you could do it over 4.
- jm: my idea is to have just enough to be able to learn what happens if you have an edge in a particular direction.
- d: we could just classify "do i have an edge in a direction" for a small set of directions to start with right?
- jm: what do you mean?
- d: instead of learning 128 of these things, build a classifier that tells you that you have an edge in direction X for some small set of directions.
- jm: we need directions and offsets. this is not a predictor.
- d: that makes it much more of a mess. i don't know how to do that in the freq domain.
- jm: the other thing i thought about is instead of using trained VQ to classify the LF, using PVQ. But then the problem is that at most we'd be able to use three pulses and they'd end up in the lowest freqs so there wouldn't be many combinations in practice.
- d: yep. i think you have almost no information about whether or not you have an edge.
- jm: the only thing i tried so far was the case where i didn't code anything, just trying to predict. doing VQ with a codebook of 256 in the low band and using that to predict the high band. that had very little prediction effect. it's quite possible that having just a few bits in the HF would have much more impact. i assume in many cases the HF could go one way or another but if you average them you get nothing.
- d: you're going to run into phase issues
- jm: you resolve phase with ???. i tried to predict ??? with the sine coeffs. there are a few factors that are skewed by 25-75%, but if you look at it overall we'd end up saving 0.1%. i haven't looked at what we gain by coding.
- d: how would this work for inter?
- jm: why would you want to do this? MC should be a lot better than what we get out of this.
- d: it's a lot better than what our inter is doing right now. but correlation in the HF is going to be around 75%. that's a lot smaller than 0.99, which is what i would like.
- jm: i was trying to get to a point where it's better than no prediction at all, because that's where we are now. ???
- g: the next thing we could look at is a time domain predictor
- d: have we measured the prediction power in the HF with the VP8 predictors? turn off lapping and use the DCT and we have the VP8 predictors implemented.
- jm: we can see from h265 (the fruits image at a ridiculous rate) that it's possible to predict something.
- d: my question is if using the same metrics you're using how it compares. that should tell us what is possible to do.
- jm: that's totally different. right now we have nothing. doing anything at all is already better.
- d: but i want to know what is achievable.
- jm: i don't know.
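
A rough sketch of the conditional-VQ idea jm describes above, under stated assumptions: the table sizes, names, and the plain nearest-neighbor classification are all placeholders, and as derf notes, a full 256-way search on the LF band is likely too expensive as written.

```c
/* Sketch of conditioning the HF codebook on a trained LF classifier.
 * The decoder runs the same classification on the decoded LF band,
 * so the class index itself costs no side information; only the
 * small per-class HF index is coded. */
#include <float.h>

#define LF_DIM 15        /* LF band vector length (hypothetical) */
#define NUM_CLASSES 256  /* trained LF codebook size */
#define HF_PER_CLASS 8   /* allowed HF code vectors per class */

/* Trained offline, e.g. by k-means; contents are placeholders. */
extern const float lf_codebook[NUM_CLASSES][LF_DIM];
extern const int hf_allowed[NUM_CLASSES][HF_PER_CLASS];

static int classify_lf(const float *lf) {
  int best = 0;
  float best_dist = FLT_MAX;
  for (int i = 0; i < NUM_CLASSES; i++) {
    float d = 0;
    for (int j = 0; j < LF_DIM; j++) {
      float e = lf[j] - lf_codebook[i][j];
      d += e * e;
    }
    if (d < best_dist) {
      best_dist = d;
      best = i;
    }
  }
  return best;
}
```

With 8 allowed vectors per class, the encoder searches only hf_allowed[classify_lf(lf)] and codes a 3-bit index instead of a full 8-bit one.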

# Activity masking

- d: i did some playing around but didn't find a whole lot. it didn't do that badly on the worst examples i could come up with. it does make text a little bit worse, but it's more moving noise around, and maybe adding some, than severely degrading anything. it's not that bad. i wanted to compare it to some of the scalar quant stuff i've been working on, but that's not ready yet. you pointed out that we can reduce luma DC resolution at 4x4.
- jm: i was thinking the larger ones. probably all of them though.
- d: the way scalar quant matrices work is that DC needs more resolution than the first two AC coeffs.
- jm: one thing i experimented with was making DC much worse for luma, and i found it surprising how much error we could tolerate in the DC. i suspected it was because in JPEG the error in DC causes more blocking, and we don't have that problem.
- d: humans are insensitive to DC shifts. we may be able to get away with worse than traditional codecs do there. i assume that we're terrible on metrics on your images?
- jm: i didn't even look. tim, you mentioned it not being good on text. AM is only enabled on 8x8 and 16x16 and disabled on 4x4, because you don't want to do it on edges. it would be interesting to see what it does on 4x4, but i haven't really tried because i didn't expect it to improve.
- d: i'll turn it on and run it on the text again and we'll find out.
- jm: http://jmvalin.ca/video/iena0.png http://jmvalin.ca/video/iena1.png http://jmvalin.ca/video/iena2.png http://jmvalin.ca/video/iena3.png
- jm: 0 is normal; the others change the resolution by 2^n. the 8x one is definitely extreme.
- d: did you just change the quantizer?
- jm: i only changed the quantizer. the effect is more noticeable on airforce.
- d: that's because that image is only DC.
- g: you can see in #3 that there is a band of darkness in the upper left corner.
- d: what it suggests is that the sky in the other images happens to be close to a quantization interval.
- jm: one thing i also considered was having large steps except for the very first step. if you're on a gradient you can code the small differences, but for larger changes you don't have that fine a resolution (see the sketch after this exchange). image 0 is 104kB, image 1 is 101.5kB, image 2 is 99kB, image 3 is 97kB. it's about 2.5kB every time i double the error.
- d: a bit over 2%. it seems like the step size you need to code gradients correctly depends on the slope of the gradient.
- jm: why would it?
- d: i have some huge gap in my dc quant levels. at some point i guess it's slow enough we can take one step.
- jm: yes, i mean just in this region. i'll show you airforce and you'll see what i mean.
- jm: http://jmvalin.ca/video/air_dc0.png http://jmvalin.ca/video/air_dc1.png http://jmvalin.ca/video/air_dc2.png http://jmvalin.ca/video/air_dc3.png
- jm: Maybe most of the rate would be in that first step.
- d: Yes, but you keep pointing out how much the tails matter.
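
As a concrete reading of jm's "large steps except for the very first step", here is a sketch of a non-uniform DC quantizer with a fine inner step and coarse outer steps. The function names and the fine/coarse split are illustrative, not tuned values.

```c
#include <math.h>
#include <stdlib.h>

/* Levels: 0 and +/-1 at fine spacing, coarse spacing beyond.
 * Reconstruction for |q| >= 1 is fine + (|q| - 1)*coarse. */
static int quantize_dc(float v, float fine, float coarse) {
  float a = fabsf(v);
  int s = v < 0 ? -1 : 1;
  if (a < 0.5f * fine) return 0;
  if (a < fine + 0.5f * coarse) return s;              /* level +/-1 */
  return s * (1 + (int)((a - fine) / coarse + 0.5f));  /* outer levels */
}

static float dequantize_dc(int q, float fine, float coarse) {
  if (q == 0) return 0.0f;
  return (q < 0 ? -1.0f : 1.0f) * (fine + (abs(q) - 1) * coarse);
}
```

The fine +/-1 step keeps slow gradients codable cheaply, while the coarse outer steps exploit how insensitive we are to large DC errors; as jm notes, most of the rate would then sit in that first step.
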
- jm: Because of scaling, I don't think we can even predict a flat image correctly when the block size changes.
- d: We certainly need to do something about the scaling, at least for DC.
- g: So we use a table of scaling factors.
- jm: There are too many even just for DC.
- g: so we scale things to a common reference, e.g. 16x16, and then scale back, so it's two multipliers per size (see the sketch below)
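
A minimal sketch of g's two-multipliers-per-size suggestion, with 16x16 as the common reference. The Q15 constants below are placeholder ratios, not trained values; the assumption that DC magnitude scales linearly between sizes is also illustrative.

```c
#include <stdint.h>

/* One pair of Q15 multipliers per block size (0: 4x4, 1: 8x8,
 * 2: 16x16): scale DC into the 16x16 reference domain, predict
 * there, then scale back.  32768 == 1.0 in Q15. */
typedef struct { int32_t to_ref; int32_t from_ref; } dc_scale;

static const dc_scale DC_SCALES[3] = {
  { 131072, 8192 },  /* 4x4:   x4 up, /4 down (placeholder) */
  { 65536, 16384 },  /* 8x8:   x2 up, /2 down (placeholder) */
  { 32768, 32768 },  /* 16x16: identity */
};

static int32_t dc_to_ref(int32_t dc, int sz) {
  return (int32_t)(((int64_t)dc * DC_SCALES[sz].to_ref) >> 15);
}

static int32_t dc_from_ref(int32_t ref, int sz) {
  return (int32_t)(((int64_t)ref * DC_SCALES[sz].from_ref) >> 15);
}
```

This keeps the table at two entries per size instead of one factor for every (predictor size, target size) pair.
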
- d: there were reports on IRC that 4:4:4 was broken. anyone want to look at that?
- u: I'll look at that. I think I noticed it a while ago and didn't follow up.