From XiphWiki
# Meeting 2014-07-01

Mumble:  mf4.xiph.org:64738

# Agenda

- reviews
- code party
  - https://daala.etherpad.mozilla.org/coding-party-201408
- gmaxwell's experiments
- Monty requests (please discuss, shoot some notes):
  - Continuous testing/integration plan or at least scope (for budget)
  - Rough list of tasks we could be doing right now if we had more engineers / resources

# Attending

gmaxwell, derf, jmspeex, td-linux, bkoc, jack, unlord

# Reviews

looking good

# Coding party


# Greg's Experiments

- g: i've been working on using multiple frames in MC. i've done two things. one is that i added additional buffers to do the initial search against the original (uncoded) frames. on the clips that i tested on, doing that didn't seem to improve the results at all, but didn't hurt either.
- jm: i would not expect it to help the curves, but it might help visually, mostly at low bitrates.
- g: i will check visually. I hadn't looked at it yet. What clips were you testing on?
- jm: I didn't observe cases where we needed that. It was a general comment. I assumed it might help on parkjoy. Also another reason I wanted to have the original frame lying around was for temporal RDO; a simple causal model. And that requires the uncompressed reference. So it's useful to have it lying around anyway. I don't know if we want to motion compensate the old reference though. 
- g: for temporal RDO i thought what we'd do there is remember the skip decisions from the prior frame.
- d: I don't think skip decisions help you at all. You're talking about feed-forward RDO right? You think it's not changing much so you bump the quality and now it changed.
- jm: it's more that I coded it at some precision and on the next keyframe I'll try for slightly better accuracy
- d: keyframe? for one then it's not a keyframe anymore.
- jm: i'm talking about encoder only temporal rdo. the main thing i wanted to work on was something simple for temporal rdo that helped backgrounds not moving.
- d: why wouldn't we do a two pass thing?
- jm: i wanted to do something that wasn't optimal but would catch the obvious stuff.
- g: i'll get what i did committed even if it's not useful for motion search because it sounds like you'll use it for other things.
- jm: the other thing I had discussed was encoding theta based on how much actual change in the image there is.
- d: in other words, when we've encoded crap
- jm: we've been talking about conservation of energy, i'm talking about conservation of amount of change. like conserving film noise based on theta in the compressed image. i'm not going to be working on this any time soon.
- d: in film noise, i think of it the other way, as adding changes
- jm: you can work it both ways.
- j: greg, what was the second thing?
- g: i have no clue how to do the prediction of the vectors when you change from one ref frame to another.
- jm: now you're talking about golden frame right?
- d: i thought about this a bit. it seems to me that probably the simplest thing you can do is to have a separate grid for each frame and propagate predictions across superblocks. when i actually code a vector from the golden frame, then do the splitting down in the grid to come up with a predictor for it for all the vectors i don't have for the same frame.
- g: so every time you have something you flood fill so you have a predictor?
- d: exactly.
- jm: my other proposal is you learn the offsets between the golden frame and the previous frame. if you've just coded an MV based on the golden frame, say 10px up and 1px left, you learn the offset (10, 1), and as you code more you learn this average offset.
- d: basically you learn a conversion factor to convert between the two frames.
- jm: basically. you have two choices when you code something. you can decide you're coding an actual prediction residual, or you can say that the value you're putting in for the prediction is 0 if you don't want the conversion factor's failure to cause problems. the prediction residual that we're coding is used to predict other superblocks.
- d: not the residual the vector you get at the end, but yes.
- jm: i'm just saying you could put in the predicted value. there's what you coded and what you'd use for prediction, and it doesn't have to be the same for both.
- d: why wouldn't you use what you actually coded?
- jm: at the very least you would want your list of superblock vectors to be all as if they were from the previous frame. you may also want instead to copy a prediction from the golden frame, converted as if it was from the previous frame.
- d: i don't understand. i have a vector i want to code from the golden frame, the neighbors are from the previous frame. i have a learned offset, i apply the offset and now i have a vector in the golden frame...
- jm: in terms of updating the state for the prediction, there are two choices. the obvious one is to undo the mapping and put the value there. option 2 is to take your original prediction you made from the previous frame, before you added the offset, and put that there.
- d: that's what i was saying before about each frame having its own separate grid
- jm: nope. you're not adapting anything for the golden frame. you're treating the previous frame as different. based on the assumption that you're only using the golden frame once in a while.
- g: you'd want to use it where there were occlusions.
- d: what i believe jm is proposing for the previous frame is identical to what i suggested. but for the golden frame it is different.
- g: i could try both of those relatively easily
- d: here's the fun case i don't understand. i'm coding a MV in the golden frame. i have two neighbors in the golden frame and two in the previous frame to inform the prediction.
- g: i don't think it is a complex case. 
- d: if you store the predictor in the previous frame, that's fine. but i still don't think it explains how to do the training.
- g: i think this might actually be bad in the case where it turned out you were always referencing the second frame; then it would be kind of pessimal because the predictors would all be stupid.
- d: not necessarily because you'd learn that offset.
- g: in the case where you back propagate the offset, that would be ok. but if you flood forward you wouldn't have a prediction.
- d: all your prediction would be the offset. which would be bad.
- g: alternatively if the neighbors are in the golden frame, then use the neighbors. i don't know when you'd make that cutoff and the code would be complicated. i think i need to do some work to figure out how often these predictors would be chosen.
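jm's learned-offset proposal above could be sketched roughly as follows. This is illustrative Python only, not Daala code; the class name, the adaptation rate, and the running-average update rule are all assumptions made for the example:

```python
# Sketch of learning the average offset between motion vectors coded
# against the golden frame and the predictor derived from previous-frame
# neighbors, then applying that offset when predicting (illustrative only).

class OffsetLearner:
    def __init__(self, rate=0.25):
        self.rate = rate          # adaptation rate (hypothetical value)
        self.offset = (0.0, 0.0)  # learned (dx, dy) golden-vs-previous offset

    def predict(self, prev_frame_pred):
        # Convert a previous-frame predictor into a golden-frame predictor
        # by adding the learned offset.
        px, py = prev_frame_pred
        ox, oy = self.offset
        return (px + ox, py + oy)

    def update(self, coded_golden_mv, prev_frame_pred):
        # After coding a golden-frame MV, adapt the running average of the
        # difference between it and the previous-frame prediction.
        ox, oy = self.offset
        dx = coded_golden_mv[0] - prev_frame_pred[0]
        dy = coded_golden_mv[1] - prev_frame_pred[1]
        self.offset = (ox + self.rate * (dx - ox),
                       oy + self.rate * (dy - oy))
```

With a simple exponential average like this, repeatedly coding golden-frame vectors about (10, 1) away from the previous-frame prediction drives the learned offset toward (10, 1), which is the "conversion factor" derf describes.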

# Continuous Integration

- j: jmspeex suggested N * 1hr for the machines. how long do the test vectors take to encode?
- jm: depends on how many frames, etc. i made a rough approximation a few weeks ago and it looked like N would be a few dozen machines. it depends on how good we want the test to be.
- j: what kind of stuff would we be doing if we had more engineers?
- j: fixing intra?
- g: broadening to still image related stuff
- d: that seems bad.
- j: so saying improving intra is probably the better way to phrase it.
- jm: the deringing filter is a job for a lot more engineering.
- d: we have no one doing research on improving complexity. not making it faster, but cheaper or more parallelizable.
- jm: better encoding search is something useful that usually people do after the format is frozen. i think it's better to do while you're still working on the format.
- d: the reason people put it off is that you can put out a format without doing that, so they don't. one of the dangers you run into is doing what google did, where you optimize for what you can search, and if you can't search for it, it doesn't go in.
- jm: it's also bad to put in stuff you don't know you can search either. opus was a good example where optimizing quality during development helped a bit. we probably don't want to go as far for video, but all the implicit stuff in opus was a win, which you had to optimize in the decoder. if you wait to optimize after bitstream freeze, you're going to realize you are coding things the wrong way. there are bitstream implications. I also have crazy ideas for coding MVs that we don't have time to implement ourselves.
- j: should we go through the roadmap ( https://daala.etherpad.mozilla.org/daala-plan-2014 ) to see what we aren't working on?
- d: yes, and also where the holes are in that list.
- jm: there's tons of PVQ experimentation to do as well that I don't have time to do right now.
- d: td-linux, are you interested in working on that?

# Jean-Marc's Crazy Idea Shop

- jm: I did this experiment with intra, and tried to do it for inter. ??? seemed like vq could be useful there depending on what the reference looks like.
- d: might work.
- g: do you have an idea about how to manage the complexity
- jm: the multiple codebooks are not an issue since they'd be selected based on which dimension was used for Householder. if using the angle and the gain you found you wanted k=2,3 then you'd use this codebook but not otherwise.
- g: how would this compare to a change of basis / rotation?
- jm: that would be more expensive.
- d: how to not do one that is O(n^3)
- jm: O(n^2) since matrix * vector. are you aware of cases where this wouldn't work?
- d: i'm trying to think of how this would interact with aliasing. if you are using an edge to predict an edge, the error you get is aliasing error. where does that show up?
- jm: my thought is just that if your edge was horizontal, then aliasing error would form a horizontal line.
- d: yeah
- jm: i don't know where or what it is, but my codebook would not contain vertical lines.
- d: ok I see.
- jm: I was just wondering if this could be generalized further to use finer temporal correlation.  Imagine we were coding 1px at a time, the variance you would pick for a pixel would be related to the local energy in the reference...
- d: Yes/no, certainly if you're weighting your quantization that way
- jm: no no even with a flat quantization, you're likely to have high variance in the residual when there is high variance in the image.
[bunch of conversation that I missed transcribing because I joined in]
- d: There are existing image coders that take advantage of this in a single frame subband by subband, e.g. using the statistics for neighboring blocks in the same subband to adjust the entropy model
- jm: So I may experiment with doing the training with one set per Householder mode
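On the complexity point above: a general change of basis costs O(n^2) as a matrix-vector product, but the Householder reflection used in PVQ prediction can be applied in O(n) per vector. A rough NumPy sketch, with the axis-selection and sign conventions simplified and assumed for illustration (not the actual Daala code):

```python
import numpy as np

def householder_reflect(x, r):
    """Reflect x with the Householder transform that maps the reference
    direction r onto a signed coordinate axis. Assumes r is nonzero.
    Conventions here are illustrative, not Daala's exact ones."""
    m = int(np.argmax(np.abs(r)))      # dimension used for the reflection
    s = 1.0 if r[m] >= 0 else -1.0     # sign convention
    v = np.asarray(r, dtype=float).copy()
    v[m] += s * np.linalg.norm(r)      # Householder vector v = r + s*|r|*e_m
    v /= np.linalg.norm(v)
    # H x = x - 2 (v . x) v is O(n); r itself maps to -s*|r|*e_m.
    return x - 2.0 * np.dot(v, x) * v, m
```

In jm's multiple-codebook idea as described above, the reflection dimension m (together with the coded gain and angle) would be what selects the codebook, so no extra signaling or search over codebooks is needed.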