DaalaMeeting20140401

# Meeting 2014-04-01

Mumble: mf4.xiph.org:64738

# Agenda

- reviews
- Do we keep mumble?
  - But it finally works!
- the path to crushing jpeg
- path with PVQ?
- Are we finally happy the transforms are working right?
- Basis magnitude scaling

# Attending

jack, derf, unlord, gmaxwell, jmspeex

# reviews

- g: i'll take 226, 228, 229

# mumble

- jack: do we keep mumble? what alternative?
- nathan: tested vline with josh aas and it worked fine.
- jm: i would consider vidyo.
- j: we can invite external people, but they'll need the client.
- jm: can they dial in?
- j: yes.
- jm: we can keep mumble for now.
- j: ok. stay on mumble for now, switch the next time we have problems.

# path to crushing jpeg

- j: we're beating jpeg but not enough to impress in honolulu.
- jm: metrics or actual images?
- j: whatever helps
- g: both is best.
- d: we need to make things actually look better; that is the ultimate goal. we can talk around curves that don't look so great if we can show that shutting off the things that make the image look better makes the curves look better. it's a good part of our process that we don't have to care about the metrics.
- u: i feel like we should be able to quantify this in a meaningful fashion.
- d: that's a research project in itself
- u: josh aas is doing exactly this
- d: we added the sensible ones and only one metric is showing improvements based on jm's visual improvements.
- j: all the metrics are some flawed approx of human vision.
- u: PSNR-HVS-M has human in the name!
- g: for the honolulu meeting, we need to figure out a way to convert better visual performance into something that looks good on paper. we can't present images at the IETF; we must present numbers.
(discussion of honolulu bof strategy)
- g: as far as where we need to be, do we need to beat webm?
- d: that would be nice.
- g: it would certainly help. where are we on the graphs?
- j: not even halfway to vp8.
- j: do we need to beat webp?
(some discussion)
- j: summary: we don't need to beat webp for honolulu, but we do need to be better than where we are. what about what to do to fix performance?
- d: i don't want to spend more time on intrapred, but i have some ideas. instead of trying to train a predictor for specific modes, start with just the horizontal and vertical modes. first, undo the dct part of the transform, and then can i explicitly undo the lapping? given that i have my AR95 model of my image, what is the optimal predictor for the pixels i don't have, given the pixels i do have, at the output of the transform?
- jm: AR95 in one direction and trying to predict in the other direction?
- d: i'm looking at this as the 1d problem.
- g: what is the extension of the 1d line assuming you just go horiz and vert.
- d: right. this is where it starts to get very fuzzy. once you do that can you do it for diagonals?
- g: if that actually works it may make the rest of what we're doing already work better. you drop that in as a fixed mode and then see what happens to the rest.
- d: there's all kinds of stuff you can do. that's why i don't want to spend time to make it better.
- g: i think with all the mode stuff it may be one of those cases where, as long as we nail a few things very well, the performance of the others is not as important.
- d: that clearly doesn't seem to be true, since the h265 people added tons of additional diagonal modes.
- jm: so you're thinking of finding a closed form solution
- d: specifically there's a different one for each kind of lapping you're using.
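Below is a minimal sketch of the closed-form predictor idea discussed above, assuming a 1-d AR(1) ("AR95") model with rho = 0.95 and a plain orthonormal DCT standing in for Daala's lapped transform; the block size and all names are illustrative, not Daala code.

```python
import numpy as np

rho, n = 0.95, 8  # "AR95" model; 8-sample blocks (illustrative)

# Covariance of 2n consecutive samples of an AR(1) process:
# R[i, j] = rho^|i - j|.
idx = np.arange(2 * n)
R = rho ** np.abs(idx[:, None] - idx[None, :])

# Known block a = samples 0..n-1, unknown block b = samples n..2n-1.
Raa, Rba = R[:n, :n], R[n:, :n]

# Optimal (MMSE) linear predictor of b from a: W = R_ba R_aa^{-1}.
W = np.linalg.solve(Raa, Rba.T).T

# Orthonormal DCT-II, standing in for the codec's transform.
k, i = np.meshgrid(np.arange(n), np.arange(n), indexing="ij")
T = np.sqrt(2.0 / n) * np.cos(np.pi * (i + 0.5) * k / n)
T[0, :] /= np.sqrt(2.0)

# The same predictor acting on transform coefficients: undo the
# transform, predict in the sample domain, re-transform. A lapped
# transform would insert its pre/post-filter matrices here, which is
# why there is a different predictor for each kind of lapping.
P = T @ W @ T.T
print(np.round(P, 3))
```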
- jm: i think that's part of the problem we have right now. we're using the same predictors regardless of which transform size we're doing. we're using TF and mixing stuff that has different basis magnitudes.
- d: that's certainly a problem mixing det1 and not.
- jm: even without it you won't get the same magnitudes. if your intrapredictor is trying to be fancy you're going to be way off.
- u: our predictors are coupled to the size of the transform. for the other codecs, that's not the case.
- j: open loop can solve that right? do everything at 4x4?
- d: i still want to try that.
- jm: it still has basis problems
- d: yes, but that's what we slated for interns to look at.
- u: also more modes at larger sizes may help.
- jm: right now 45 degrees is not working.
- d: i want to spend time on this other idea, and figure out how to extend it to diagonal modes.
- jm: until we fix the ones we have it's no use adding more.
- j: what do we want to work on besides intrapred?
- d: block size switching
- jm: i don't think the problem is with switching
- d: right. we just want larger transforms to work.
- g: i'm convinced that the transforms are not the problem. i wanted to know if other people are convinced of this?
- jm: do you mean implementation or use of dct?
- g: our implementation.
- jm: i've always been convinced of this.
- d: you were the only one. i wasn't convinced because i wrote it.
- j: why else might they not be working?
- jm: in the other codecs, 16x16 works, but not for the reason people pretend it works. right now it's not helping us to understand that. i don't think it's about coding gain.
- d: what's it about?
- jm: i don't know. basically i've spent some time measuring coding gain on nonlapped 16x16. it's a tiny bit higher than 8x8, and if i actually compute distributions and their entropy, then the entropy is worse than or similar to 8x8. there's nothing there to show that it will perform better than 8x8. other codecs are getting better rd curves with those block sizes, but that's why i think it must be something else.
- d: you've never tried forcing everything to 16x16 or 8x8 in the other codecs?
- jm: i have not. i have tried measuring our forced 16x16 distributions. for example, in one case what i did was tune the BSS code to tend towards smaller block sizes so only flat blocks get 16x16. then i used that classification and did it at 8x8 and 16x16 so it was apples to apples, and even in that case it wasn't great. one thought i had was that at low bitrate you're not actually coding the high frequencies much, and that appears to be why 16x16 is doing worse than 8x8.
- d: your assumption of order-0 entropy may not be accurate?
- jm: it's possible but not the first thing i think of. it's also possible that there are many problems with the larger transforms each getting us a tiny bit worse. the problem there is you can't isolate anything.
- j: do you compute CG from math or from implementation?
- jm: i am measuring stddev from actual coeffs.
- g: do these agree with covariance matrices?
- jm: i didn't spend too much time, but there was nothing suspicious there. i've also been interested in looking at distribution gain, which is the entropy of the distribution minus log2 of the stddev; basically how hard the actual distribution is to code. for this i was seeing 4x4 easier than 8x8, easier than 16x16, at equal stddev. it seems like it would help to have someone who knows what they are doing look at this.
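For concreteness, here is a sketch of the two measurements jm describes: coding gain from the measured per-coefficient variances, and "distribution gain" as order-0 entropy minus log2 of the stddev. It assumes an orthonormal transform; the function names and the synthetic Laplacian data are made up for illustration.

```python
import numpy as np

def coding_gain_db(coeffs):
    # coeffs: (num_blocks, N) transform coefficients gathered from
    # real images. Coding gain relative to PCM under the usual
    # high-rate, orthonormal-transform assumptions.
    var = coeffs.var(axis=0)
    arith = var.mean()                  # arithmetic mean of variances
    geom = np.exp(np.log(var).mean())   # geometric mean of variances
    return 10.0 * np.log10(arith / geom)

def distribution_gain_bits(band, step=1.0):
    # Order-0 entropy of the quantized empirical distribution minus
    # log2 of its stddev (in quantizer steps), so distributions with
    # equal stddev can be compared for how hard they are to code.
    q = np.round(band / step).astype(int)
    _, counts = np.unique(q, return_counts=True)
    p = counts / counts.sum()
    entropy = -(p * np.log2(p)).sum()
    return entropy - np.log2(band.std() / step)

# Toy usage on synthetic Laplacian "coefficients".
rng = np.random.default_rng(0)
coeffs = rng.laplace(scale=np.linspace(8, 1, 16), size=(10000, 16))
print(coding_gain_db(coeffs))
print(distribution_gain_bits(coeffs[:, 0]))
```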
- d: which things should i not look at? :)
- g: most things we can turn off and still get anomalous results, so it must be entropy coding.
- d: run some images in another codec and see what it does for block sizes.
- g: if we code images of sky with no landscape it should be all large blocks.
- d: go run that in another codec and look at the decisions.
- g: the issue is that block size decisions are based on a perceptual model and not coding gain.
- u: what about brute force BSS that minimizes rate, but we don't just care about rate. but ???
- d: we should figure out how to set it, everyone else does
- g: i should go try to implement this then. i think i need to implement some stuff to snapshot the entropy coder state first, but it's all pretty simple.
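A toy sketch of that brute-force search, built around snapshot/rollback of the entropy coder state; the coder interface and the rate/distortion numbers here are hypothetical stand-ins, not Daala's actual API.

```python
import copy

class ToyCoder:
    # Hypothetical stand-in for an entropy coder that supports
    # snapshotting and rolling back its state.
    def __init__(self):
        self.bits = 0.0
    def snapshot(self):
        return copy.deepcopy(self)
    def rollback(self, state):
        self.bits = state.bits
    def tell(self):
        return self.bits

def code_block(coder, block, size):
    # Hypothetical: code `block` (ignored in this toy) at transform
    # size `size`, charge some bits, and return the distortion. Here
    # it's a made-up model where bigger blocks are cheaper to code
    # but slightly more distorted.
    coder.bits += 64.0 / size + 8.0
    return 0.01 * size

def best_block_size(coder, block, sizes=(32, 16, 8, 4), lam=1.0):
    best = None
    for size in sizes:
        state = coder.snapshot()          # save entropy coder state
        dist = code_block(coder, block, size)
        bits = coder.tell() - state.tell()
        cost = dist + lam * bits          # RD cost; lam = 0 is pure rate
        coder.rollback(state)             # undo the trial coding
        if best is None or cost < best[0]:
            best = (cost, size)
    code_block(coder, block, best[1])     # commit the winning size
    return best[1]

print(best_block_size(ToyCoder(), block=None))
```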
- u: as part of these alternatives, instead of using lapped transforms, do no lapping and use time domain prediction, and code the residual using dct and pvq. that would be a different test that isn't us trying to analyze other codecs.
- jm: from the tests i ran before computing actual distributions, 16x16 with no intra and nothing else was also not helping.
- u: i'm wondering if having time domain prediction is what's making the larger transforms work well.
- g: there are image codecs that are state of the art that don't use intra at all, just BSS. that's why i think it wasn't intra that's causing most of our problems.
- u: do you have a name of one of those?
- g: ADCT. no open source, but windows binary.

# path to pvq

- jm: mostly i want to figure out what the next steps are.
- d: it's getting increasingly harder to compare to scalar as pvq grows more things that scalar doesn't have. i don't know what to do about that. unless you want to implement RDO for scalar?
- jm: AM is something i can do, but RDO i tried and failed before.
- d: i don't think anyone else has any hope of figuring out your k tokenizer.
- jm: the final rate it should be using is not much different than if you were coding the coeffs independently. coding one extra pulse still has a cost, and coding low frequencies is still cheaper than HF. it shouldn't be really different than coding the coeffs separately.
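For reference, a sketch of what one extra pulse costs under a flat enumeration of the pyramid codebook. V(n, k) is the standard count of integer vectors of dimension n whose absolute values sum to k; Daala's actual pulse coding is adaptive, so this only gives a rough picture of the marginal cost.

```python
from functools import lru_cache
from math import log2

@lru_cache(maxsize=None)
def V(n, k):
    # Number of PVQ codewords: integer vectors of dimension n whose
    # absolute values sum to k.
    if k == 0:
        return 1
    if n == 0:
        return 0
    return V(n - 1, k) + V(n, k - 1) + V(n - 1, k - 1)

n = 15  # e.g. one AC band (illustrative size)
for k in range(1, 6):
    extra = log2(V(n, k)) - log2(V(n, k - 1))
    print(f"k={k}: codebook {log2(V(n, k)):.2f} bits, "
          f"extra pulse ~{extra:.2f} bits")
```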
- j: so work on AM first then RDO?
- d: let me clean up the quant matrices first.
- j: what's jm working on until then?
- jm: there's a whole bunch of cases in pvq where the ref vector is 0. i haven't yet checked whether it's because the area is so flat that there's nothing to predict or ... in any case, there's no point in coding flags and such in those cases. so right now we code them uselessly. ref = 0 is 1/3 of the cases i looked at. i still have on my todo list to include actual rate metrics, running the dct inside BSS.
- g: i was planning on doing something that was brute forcey. just code block every different way and measure performance coming out and not using perceptual metrics at all. what you're suggesting is halfway between those and i should try that too. i want to see what the largest improvement we can get from that path is.
- jm: with or without the prefilter?
- g: i'll have to figure out how to handle the prefilter.
- d: you're going to have to make some approximations.
- jm: an obvious one is not to run the prefilter. what's the status on training them?
- d: not spending more time on that.
- g: i don't want to spend any more time on it right now. read my status report from last week. i don't believe there are any prefilters in that design space that don't have worse DC response but better coding gain.
- jm: my thought was let's get rid of det1 for now, but right now we can use the DC basis metric to improve our non-det1 filters, especially the 16x. in some cases it seemed to be doing a bit of blocking.
- g: it's easy to spin up the code to do that. i'll do it.
- jm: and also, should we move 4x away from det1?
- d: keeping it det1 means we can use it for lossless. the question is: is there a lot to be gained by making it non-det1?
- jm: didn't it make fastssim worse?
- d: i don't remember
- g: we can look. i'll check that. i don't think it made anything worse and i did visual inspection too.
- d: i think it also wasn't much better and that was disappointing. i can't remember if it was before or after scaling.
- jm: in general these metrics are not that reliable.
- jm: all of the basis functions have a scaling less than 1.
- d: yep
- jm: so when i apply the scaling, the rd curves all change and nothing matches
- d: because none of your lambdas match
- jm: the RD is really not doing anything now that we're scaling up.
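A toy illustration of the lambda mismatch d points out (not Daala code): if a basis function's magnitude is scaled by s, a coefficient error of e becomes an output error of s*e, so squared-error distortion scales by s^2, and a lambda tuned for the unscaled basis must be rescaled by s^2 to land on the same RD trade-off. The rate model below is a crude order-0 entropy of the quantized symbols.

```python
import numpy as np

rng = np.random.default_rng(0)
x = rng.laplace(scale=10.0, size=100000)   # toy "coefficients"
s = 0.85                                   # basis magnitude scaling < 1
lam = 2.0                                  # lambda tuned for unscaled basis

def rd_cost(coeffs, step, lam):
    q = np.round(coeffs / step)
    dist = np.mean((coeffs - q * step) ** 2)
    _, counts = np.unique(q, return_counts=True)
    p = counts / counts.sum()
    rate = -(p * np.log2(p)).sum()         # crude order-0 rate model
    return dist + lam * rate

steps = np.linspace(1, 30, 59)
best = min(steps, key=lambda t: rd_cost(x, t, lam))
best_scaled = min(steps, key=lambda t: rd_cost(s * x, t, lam))
best_fixed = min(steps, key=lambda t: rd_cost(s * x, t, lam * s * s))
# Only the s^2-rescaled lambda reproduces the original operating
# point (approximately, up to the step grid).
print(best, best_scaled / s, best_fixed / s)
```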