
# Meeting 2014-07-15

Mumble: mf4.xiph.org:64738

# Agenda

- reviews
- code party
  - https://daala.etherpad.mozilla.org/coding-party-201408
- IETF 90
  - https://daala.etherpad.mozilla.org/video-codec-ietf90
- Visual Information Processing and Communication submissions
- tech discussion: new intra pred idea (jmspeex)

# Attending

jmspeex, td-linux, derf, bkoc, smarter

# Reviews

nothing to report. assigned a few to nathan since he wasn't here

# Coding party

Next week, need to ping the people you haven't heard from yet.

# IETF 90

tim needs to add agenda items

# VIPC

- d: sent email about this to the internal alias. extended abstracts are due next week.
- jm: i don't like copyright assignment. that's my only issue.
- d: i will raise that with the organizers.
- j: who's going to submit stuff to this? we should submit at least two things, since they want to do a special session on daala.
- jm: i think it makes sense to have a pvq paper
- j: should we have an overview paper?
- jm & d: probably not, can just cover the needed intro stuff in the more meaty presentations.
- d: we could do something on chroma from luma.
- smarter: what about the entropy coder?
- d: what's new there?
- jm: if we did, it would be about the use of multi-symbol coding in video.
- d: we could talk about the intra coding probabilities and the laplace encoder.
- jm: we could mention it, but i have a feeling that it will be completely rewritten.
- j: what about your motion compensation paper?
- d: it's less interesting if you don't do CGI, but i can try to come up with some results to make it interesting.
- jm: for what pieces of daala do we actually have results that we can present?
- j: nathan might be a good person for CfL
- d: nathan might want to write up stuff on training the transforms.
- jm: if extended abstract is 3-4 pages, how long is the actual paper?
- d: fantastic question!
- u: notification is oct 6.
- d: there are no guidelines on length. make it as long as you want.

# jean-marc's intrapred ideas

- jm: i mentioned it to tim last week but wanted to see if anyone had ideas for how to implement it. my latest idea is, instead of doing prediction at the same time as coding, to do a first pass that tries to get all the edges approximately right and then code the difference. for instance, chop the image into different block sizes. you code a 1d dct of all the edges, and then you apply an intra predictor to that (the standard type of intra pred based on the border). you get a coarse image and use it as a standard reference, just as if we had motion compensation.
- d: i don't see any reason it wouldn't work, but i don't know how you plan on handling the sudden changes at block boundaries.
- jm: you mean dealing with blocking? i'm still trying to figure that out. the other part i'm having a harder time with is how to not do it completely dumbly. the easiest way is to take all the border pixels and code them all with a 1d dct. once you have them all, you find the best predictor and then fill in all the contents. but the dct you're applying only looks at the border pixels, not the ones inside the block, and you should probably use those too. when you're quantizing the 1d dct, you're not even looking at what's inside the block, only at what's on the border. so if there is noise in the block, you're not taking that into account.
- d: what you want is: given the prediction you're going to use, what is the optimal set of pixels to code there?
- jm: i would like to have it structured such that having a greedy search is not completely dumb.
- d: i don't think this breaks that. i think the pixels you want to code are going to be different depending on the mode you choose.
- jm: any time you're encoding a block, you'll already have its top and left edges coded with the dct. then you need to figure out the intra prediction within that block. a vague idea i had: since the top and left edges are already coded, you decide on the mode you'll use for every block of the image, then you code the 1d dcts, and every time you code one you look at both sides of that edge. then you do a kind of blending: in every block you extend from 2 edges, do both extensions, and blend them within the block. that was my idea for avoiding discontinuities.
- d: my point is: say i'm looking at the right edge of a block and i want to code some dct there. i'm going to use it to predict both to the left and to the right. if the pred mode in the block on my left is horizontal, then that block's contribution should be the average across each row. but if i look to the right and it says to predict up-right, then i should be looking at the diagonal contribution and predicting that. the pixels are going to be different for every choice of intra modes.
- jm: how would you do the blending?
- d: no idea. maybe you would just do the obmc thing. if the mode is horizontal, you could do a bilinear weighting from the left edge to the right edge.
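
d's suggestion for the horizontal mode can be sketched in C as below. This is a minimal illustration, not Daala code: it assumes the pixel columns just left and just right of an n x n block have already been decoded (for example via the 1D edge DCTs jm describes, which are not shown), uses nonnegative integer pixels, and ignores lapping. The function name `predict_horizontal_blend` and the `MAX_N` bound are hypothetical.

```c
#include <assert.h>

#define MAX_N 32 /* hypothetical maximum block size */

/* Hypothetical sketch: fill an n x n block for the horizontal mode by
 * blending the decoded column just left of the block (left[i]) with the
 * decoded column just right of it (right[i]). Column j sits j + 1 pixels
 * from the left edge and n - j pixels from the right edge, so each row
 * ramps linearly from left[i] toward right[i]. */
void predict_horizontal_blend(const int *left, const int *right, int n,
                              int pred[MAX_N][MAX_N]) {
  assert(n > 0 && n <= MAX_N);
  for (int i = 0; i < n; i++) {
    for (int j = 0; j < n; j++) {
      int wl = n - j; /* weight of the left edge */
      int wr = j + 1; /* weight of the right edge */
      /* Rounded weighted average; the weights always sum to n + 1.
       * Rounding assumes nonnegative pixel values. */
      pred[i][j] = (wl * left[i] + wr * right[i] + (n + 1) / 2) / (n + 1);
    }
  }
}
```

Extending from both edges and blending like this is one way to avoid a hard discontinuity at the block's right edge; other modes would need their own weighting along their prediction direction.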

(discussion here was too fast for me to capture)

- d: the other thing i don't quite understand is switching between inter and intra.
- jm: you don't.
- d: i'd really like to.
- jm: this scheme is for keyframes. i have no solution for mixed frames. i see one way to do it, but it's another variant and i don't know how i would implement it with reasonable complexity. the idea would be to do it in the lapped domain. backtracking: instead of encoding the left and top edges and applying the filter, another way would be to code the 1d dct along the diagonal that is orthogonal to the prediction direction. that's another way to code the information there.
- d: you would code the average along each of those prediction lines, right?
- jm: exactly. you can also get lapping to work with that. i code the average as if there were no lapping, as though you generated this kind of pattern for an infinite block and then applied the lapping to it. that tells you what's inside this particular block. that brings out the problem with the scheme you suggested. with my modified scheme, you'd be able to say "i predict this pattern in this direction", but the actual 1d dct couldn't be predicted from anything else. you wouldn't be able to predict it from the neighboring block.
- d: you'd be able to do DC prediction right?
- jm: yes, but it wouldn't add anything. it would just be an overcomplete codebook. back to the other scheme for keyframes. in the case where you're actually using the left and top edge and predicting, your prediction gives you the bottom and right edge which you can code an error for. if you have a perfect pattern for the entire image your 1d dcts will be zeros. imagine if your entire image is a 45 degree pattern. once you've coded the top and left edge of your entire image, you propagate your pattern with the predictor within a block. once you propagate you get a prediction for the bottom and right edge and then you only need to encode the difference.
- d: i can see why that's harder to do if you're lapping things, but i'm not sure it's impossible.
- jm: if it were possible, it would also work for freq domain prediction. i think i want to play with it a little bit. is anyone else interested in this?
- td: i am i guess.
- jm: i'm not sure how to get started but i'd like to see what results we can get. i don't think we need lapping working as a first step.
- d: i think you can start with the keyframe approach and just not worry about it.
- jm: i'd like to see how good we can get fruits with just this and nothing extra.
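
The keyframe scheme jm describes for a 45-degree pattern can be sketched as below, under loud assumptions: integer pixels, a single down-right diagonal mode, no lapping, and hypothetical function names (`predict_diag45`, `edge_residual`). The coded top and left edges are propagated through the block, and only the bottom-row and right-column prediction errors are kept; in the scheme discussed, those errors would then be coded with a 1D DCT (not shown).

```c
#include <assert.h>

/* Hypothetical sketch: propagate coded edges through an n x n block with a
 * 45-degree (down-right) predictor. Pixel (i, j) copies the pixel reached by
 * walking up-left along its diagonal until leaving the block:
 *   top[0]      = top-left corner pixel,
 *   top[1..n-1] = row above the block (the last pixel of that row is not
 *                 needed for this direction),
 *   left[0..n-1] = column left of the block.
 * pred is row-major with stride n. */
void predict_diag45(const int *top, const int *left, int n, int *pred) {
  assert(n > 0);
  for (int i = 0; i < n; i++) {
    for (int j = 0; j < n; j++) {
      pred[i * n + j] = (j >= i) ? top[j - i] : left[i - j - 1];
    }
  }
}

/* Prediction error on the block's bottom row and right column: in this
 * sketch, the only values left to code once the predictor is chosen. */
void edge_residual(const int *actual, const int *pred, int n,
                   int *res_bottom, int *res_right) {
  for (int k = 0; k < n; k++) {
    res_bottom[k] = actual[(n - 1) * n + k] - pred[(n - 1) * n + k];
    res_right[k] = actual[k * n + (n - 1)] - pred[k * n + (n - 1)];
  }
}
```

For an image that is exactly a 45-degree pattern (each pixel a function of column minus row), both residual vectors come out all zero, which is the case jm points to where the 1D DCTs would be all zeros.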

# other intra ideas

- u: we have unlapping that we're working on. i have some bad results but no good results yet. the other one is the stuff that greg and i have been talking about, which is to manufacture a 4d covariance matrix and do sparsification on that. no training on actual images, just the synthesized stuff.
- jm: what you mean is figuring out the exact modes that vp8 does?
- u: any directional mode. some kind of 2d AR95
- d: you want a limit.
- u: if we have 32 of these at the 16x16 level we could sparsify them.
- jm: on unlapping we still have the block size mixing problem right?
- u: yes. what you end up doing is trying to separate the edges. when you look at certain edges in certain configurations, some of them you can undo all the way to the spatial domain and some are only half unlapped. it seems like you'd have to train for all possible combinations, but that seems intractable.
- jm: what are your bad results? still debugging or what's the issue?
- u: i've got a model that's learned on, say, the left edge. and i have a model for the horizontal mode, but i'm trying to learn something for a diagonal mode. i haven't got anything that makes sense yet, because it's basically just taking the trained data for that. i'll post some images of what's predicted and maybe it will make more sense.
- u: as for next steps: the horizontal mode works ok, but the same issues we had before are still there. if the block sizes are different, how does lapping interact? for a given angle you should be able to do a much larger least squares regression and predict a much larger region around that border. i haven't had a lot of luck with that one yet. the idea was to do something separable, because then you could do them independently, but i don't think you can really separate them.
- jm: on a different note, what is monty up to?
- j: monty has been doing his journal this week! http://people.xiph.org/~xiphmont/journal.txt
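
The synthesized-covariance idea from the other intra ideas discussion can be illustrated with a minimal sketch. Assumptions, not from the notes: "2d AR95" is read as a separable 2D AR(1) model with correlation 0.95 per pixel step, and the "4d" covariance is indexed by two pixel positions (y1, x1) and (y2, x2) in an n x n block, stored flattened as an (n*n) x (n*n) matrix. Directional modes, sparsification, and the regression step are not shown; `ar_cov4d` and `ipow` are hypothetical names.

```c
#include <assert.h>
#include <stdlib.h>

/* rho^k for k >= 0, computed by repeated multiplication (avoids libm). */
static double ipow(double rho, int k) {
  double r = 1.0;
  while (k-- > 0) r *= rho;
  return r;
}

/* Hypothetical sketch: covariance between every pair of pixels in an
 * n x n block under a separable 2D AR(1) model,
 *   cov((y1, x1), (y2, x2)) = rho^|x1 - x2| * rho^|y1 - y2|.
 * Row index is y1 * n + x1, column index is y2 * n + x2, so cov must hold
 * n^4 doubles. */
void ar_cov4d(int n, double rho, double *cov) {
  assert(n > 0 && rho > -1.0 && rho < 1.0);
  int m = n * n;
  for (int y1 = 0; y1 < n; y1++)
    for (int x1 = 0; x1 < n; x1++)
      for (int y2 = 0; y2 < n; y2++)
        for (int x2 = 0; x2 < n; x2++)
          cov[(y1 * n + x1) * m + (y2 * n + x2)] =
              ipow(rho, abs(x1 - x2)) * ipow(rho, abs(y1 - y2));
}
```

The usual linear least-squares (Wiener) predictor of interior pixels from border pixels uses the border-border and border-interior sub-blocks of such a matrix; a directional model would instead measure distances in coordinates rotated to the prediction angle.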