DaalaMeeting20130917

# Meeting 2013-09-17

## Agenda

- daala-private list?
- summit innovation fair
- research talks (strategy, pvq, neon)
- coding party agenda
- TF ideas (monty)

## Attending

greg, jm, derf, unlord, jack, xiphmont

## daala-private

- jack: should we create this for private discussions?
- derf: no. this won't be a problem once monty leaves RH.
- ???: To be fair, it's not a problem so much as RH policy is still to keep any inventions secret until preliminary filing

## innovation fair

- jack: should we do this?
- gmaxwell: yes, we should do this.
- jack: i'll sign us up. probably just for toronto (me) and santa clara (greg and others)

## research talks

- jack: tim will give a talk to the research team on halloween. we also thought it might be good to give a talk on pvq as well.
- jm: it probably makes more sense as something at the coding party. i wanted to have these talks to introduce a topic.
- jack: let's give them at the coding party and then decide whether they warrant wider distribution.
- jm: my goal is to get everyone on the same page and a wider talk would be more for presenting something that already works. for this talk i want to be interrupted with specific questions and that would bore people from outside.

## coding party agenda

- jack: i invited alon to the meeting, so we should give talks on monday or tuesday. probably need a talk on the training infrastructure by nathan since he has an ML background too. pvq talk as well.
- greg: training infra is a good way to onboard new people.
- nathan: we need metrics for them to see if what they are getting is ok.
- jack: we have the metrics already, right? so we just need to document what they are and what the numbers mean.
- nathan: i meant guidance for what is a valid test and what we're looking for. for example, is it sufficient to test on still images or do we need to include video.
- derf: that depends on whether or not we're testing motion compensation.
- jack: let's plan for tuesday pre-lunch for pvq and training talk.
- derf: we should probably come up with a bunch of projects that people can complete in a week.
- greg: there's additional interest around the office. a few people plan to show up.
- jack: we have lots of infrastructural things like merging from reviews, etc.
- jm: I don't like automatic merges.
- greg: separate from the merge stuff, i'm fine with it being manual: a buildbot that automatically did reviews by attempting to test and collect metrics.
- nathan: this can feed into arewecompressedyet.
- greg: it would be the same infrastructure but running on uncommitted things.
- jm: it would be nice to upload stuff to try and see what the metrics are. having some kind of aggregated metric when you make a decision about whether a patch is good. ideally you want a single metric to see whether the net effect is positive or negative.
- greg: nathan's stuff already gives you two numbers like this. because it's across multiple rates it's sometimes opaque.
- jm: when you say multiple rates, you mean it doesn't get the same SNR at one point but at multiple quality levels?
- derf: it's looking at two RD curves and computing the gap between them.
- jack: seems like we could farm all this stuff to amazon.
- jm: i guess we want both the numbers and the curves as output. if i get those then i know if the patch is any good.
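The "gap between two RD curves" derf describes could be sketched as follows. This is a simplified assumption, not the actual tool: real Bjøntegaard-style metrics fit a polynomial in log-rate, while this just interpolates quality against log(bitrate) and averages the vertical gap over the overlapping range. All names and data here are hypothetical.

```python
# Sketch: average vertical gap between two rate-distortion curves,
# measured over the rate range where both curves are defined.
import numpy as np

def rd_curve_gap(rates_a, psnr_a, rates_b, psnr_b, samples=100):
    """Mean PSNR difference (curve A minus curve B) over the
    overlapping log-rate range. Positive => A is better."""
    la, lb = np.log(rates_a), np.log(rates_b)
    lo = max(la.min(), lb.min())
    hi = min(la.max(), lb.max())
    grid = np.linspace(lo, hi, samples)
    qa = np.interp(grid, la, psnr_a)   # quality of A at each grid rate
    qb = np.interp(grid, lb, psnr_b)   # quality of B at each grid rate
    return float(np.mean(qa - qb))

# Toy curves: B is a uniform 0.5 dB below A at every rate.
rates = np.array([100.0, 200.0, 400.0, 800.0])
psnr_a = np.array([30.0, 33.0, 36.0, 39.0])
psnr_b = psnr_a - 0.5
print(rd_curve_gap(rates, psnr_a, rates, psnr_b))  # → 0.5
```

A single number like this is exactly the kind of aggregate metric jm asks for, though as greg notes it can be opaque when the two curves cross.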

## TF ideas

- greg: jm, do you understand what monty has been doing?
- jm: i think i do. if it works it makes a lot of sense and it seems to work.
- greg: it sort of works. as tim was pointing out on irc it has problems in the textured areas.
- jm: what kind of problems? you're talking about the version with TF on AC?
- greg: or not
- jm: what's the problem?
- greg: the area on chapel building doesn't look as good as the lapped transform case.
- jm: maybe 2 months ago i proposed something similar. basically i proposed doing lapping on 4x4 and another level of transform, and all i got as a comment from Tim was JPEG-XR and i haven't figured out why it's bad.
- greg, derf: we said the same thing to monty.
- greg: the same thing that wavelet coders are doing, where you don't have ringing but you don't get any coding gain.
- jm: i'm talking about the case where you do TF the AC.
- greg: that's not what JPEG-XR does.
- jm: my original suggestion was to do 4x4 lapping and 4x4 DCTs and then do another level of lapping for each coefficient.
- greg: that's not what monty is doing
- jm: i'd like to know what's bad about this. i thought the quality looked generally good? what if we were to apply the lapping on the AC too?
- greg: the only other problem is that it leaves you to switch between 4x support and 16x support. there's no way to do the lapping with 2x that's not just really trivial. there's no room for the rotation.
- jm: 4x4 lapping on 8x8 may not be that bad. if you do it as a 4x4 and go up to 8x8, one issue is that you can't do a lapping of 2. i'm saying don't do lapping there, which is an 8x8 transform with 4x4 lapping, and i don't see why this would be bad.
- greg: it has a lot less coding gain than 8x8 lapping.
- jm: it's going to have worse coding gain but better ringing characteristics. and if you're 8x8 you are close to somewhere that could ring. it's worth testing but i suspect overall you won't be worse.
- greg: when i switch 8x8 to 4x4 lapping everywhere it costs 0.3 dB. it's rather a lot. that's including places where it would be on the border of an 8x8 and a 16x16 block.
- nathan: your test included 16x16 blocks?
- greg: correct
- nathan: the version i ran only has 4x and 8x. that's why we're different.
- greg: i ran it both ways and got similar results. that's interesting.
- jack: this seems like a Rube Goldberg machine.
- greg: this is actually simpler than what we currently have in the codec. things like h.264 and vp8 have second level transforms. vp8 and h.264 both do a degree of second level processing but are limited. monty has been talking about doing more complicated things at the second level.
- jm: do you have a link to the stuff it does on the front of the chapel?
- jack: why do we TF at small sizes? i thought it was for 16x to 32x.
- jm: because of intra prediction we don't have a choice. unless you want intra predictors for every combination of modes, you have to use TF. if you have a 16x16 block and above you there's a 4x4 block, then you have to TF to 16x16 to do an intra prediction.
- nathan: this is because we trained where all your neighbors are 16x16
- greg: monty is exploring replacing smaller transforms with TF versions.
- jack: does this buy us speed?
- greg: potentially.
- nathan: this is also an alternative to banding because you can lump together the DCs.
- greg: i wouldn't say that. it lets us have signalling choices.
- jm: looking at the front of the chapel, i'm not sure it's actually worse.
- greg: you have flat areas and textured areas which is the classic wavelet problem.
- jm: with or without AC coefficients?
- nathan: we use TF on chroma from luma to predict 4x4 luma blocks from 2x2 chroma blocks.
- jm: this happens if the HF have too little support. this is definitely not something we should do. i always considered the version where you do TF on AC as well.
- jack: why don't we train on all combinations?
- jm: if you considered all the possibilities you would do millions of things.
- nathan: there is also overhead on signalling. you might have more info than you need.
- jm: if you didn't mind carrying hundreds of MB of tables, then we should do brute force. even if we could train this at scale, there's no way we could include all the coefficients.
- jack: how much data can we ship with encoder and decoder?
- greg: we want this data to fit into L1 cache. couple hundred kilobytes. once we've converted to fixed point and dealt with precision we don't have a ton of free space.
- jack: why do we do so much work on intra prediction in general?
- greg: it's a lot of bits and it's where the interesting challenges are.
- jm: if we do the predictors up to 16x16 considering all the possible combinations of sizes, this would multiply the number of predictors by 83k. the three predictors we have take 150k of space. if we included everything the decoder would be 10GB.
- greg: if we plot out the FF download size over time, we'll fit this in a couple of years.
- derf: target cpu complexity should not be a modern desktop processor.
- jm: every 16x16 block can be split 17 ways so there are 17^4 combinations.
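A quick check of jm's arithmetic, under the assumption that block sizes are 4x4, 8x8, and 16x16: a 16x16 is either kept whole (1 way) or split into four 8x8 quadrants, each of which is either an 8x8 or four 4x4s (2^4 ways), giving 17 splittings; four 16x16 neighbors then give 17^4 combinations, which matches the ~83k factor cited above.

```python
# Worked version of the combination count (assumed sizes: 4x4/8x8/16x16).
ways_8x8 = 2                         # an 8x8 stays whole or splits into four 4x4s
ways_16x16 = 1 + ways_8x8**4         # whole, or four independent 8x8 quadrants
neighbor_combinations = ways_16x16**4  # four 16x16 neighboring blocks

print(ways_16x16)             # → 17
print(neighbor_combinations)  # → 83521, the ~83k multiplier jm cites
```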

## Opus 1.1

- jack: what will it take to ship this?
- jm: i'm temporarily off the critical path i think. what's left is merging the arm neon stuff and then a lot of testing.
- jack: who's doing the testing
- greg: i am
- jm: i did what i wanted to do for opus. there's one minor feature that could be done by someone else. the only opus stuff on my todo list is writing an article for the IETF journal. the only other stuff i have is daala.
- jm: i keep getting interrupted with issues like google dropping opus for complexity reasons.
- jm: i've changed focus on pvq writing stuff with double precision. it will take longer to get to a decoder but the research will be more reliable.
 
## JM updates on PVQ work

- jm: Checked in a double precision rewrite. Checking on the still view, it hardly codes anything on the still blocks.
- jm: Been wondering what to do with the gain, quantize the gain or the gain difference. The difference is that if it quantizes the gain directly then the reference value may not be a possible quantized value.
- tim: Yes, that's a problem.
- jm: One option is to code the gain as an integer, the other option is to code it as an integer difference.
- tim: I think it should be the second one.
- jm: [what happens when symmetric deltas would result in negative absolute gains]
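A minimal sketch of the tradeoff jm is weighing, not Daala's actual code: the function name, the snapping rule, and the clamp are all assumptions. Option 1 quantizes the gain directly, so the reference (predicted) gain need not land on a quantization point and the delta is not an integer; option 2 snaps the reference to the grid first and codes an integer difference, with a clamp so that a symmetric delta can never reconstruct a negative gain.

```python
# Hypothetical illustration of coding the gain as an integer difference
# from a quantized reference (option 2 in the discussion above).
def code_gain_delta(gain, ref_gain, q):
    """Return the integer delta to transmit and the reconstructed gain."""
    qref = round(ref_gain / q)      # snap the reference onto the grid
    delta = round(gain / q) - qref  # integer difference to code
    delta = max(delta, -qref)       # clamp: reconstruction stays >= 0
    return delta, (qref + delta) * q

delta, rec = code_gain_delta(gain=3.2, ref_gain=2.9, q=1.0)
print(delta, rec)  # → 0 3.0
```

The clamp is one possible answer to the bracketed question: a symmetric delta range around `qref` would otherwise allow negative absolute gains, which are meaningless for a PVQ gain.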