DaalaMeeting20140729

From XiphWiki
Revision as of 16:00, 6 February 2015 by Daala-ts (talk | contribs) (Add word wrapping)
# Meeting 2014-07-29

Mumble:  mf4.xiph.org:64738

# Agenda

- reviews
- coding party
- EDI bad results: https://martres.me/tmp/edihv/ vs https://martres.me/tmp/sinc/ see also https://martres.me/tmp/edi/
- Intra paint: http://jmvalin.ca/video/fruits_quant16b.png http://jmvalin.ca/video/fruits_quant16c.png
- Monty to Orban?

# Attending

jmspeex, gmaxwell, bkoc, smarter, xiphmont, unlord, derf, jack, TD-Linux

# Reviews

- smarter: i'm waiting on https://review.xiph.org/359/ https://review.xiph.org/358/ https://review.xiph.org/357/
- jack: don't assign all reviews to tim by default :)

# Coding party

- jack: anyone heard back from anyone?
- jm: tmatth is confirmed
- derf: mo from cisco is going to send one person, and one more who might participate remotely.
- unlord: silverdev says he would love to participate remotely
- x: david schleef will be remote
- jack: doesn't he live in bay area? :)
- jack: who invited MikeS?
- gm: that was me. I will try again.

# EDI Bad results

- smarter: right now it doesn't work and i'm trying to investigate why. tim suggested i look for DC shift, but I don't think it's a problem. He suggested I do a PSNR for each kind of pixel and I'm looking into doing that, but not sure what to do exactly.
- x: what is EDI?
- d: edge directed interpolation
- jm: i've never been convinced we want EDI. I don't know if it can actually help. To test whether the implementation works we could start with a 4k video, downsample it to 720p by only decimating with no filtering. this would create aliasing that EDI is able to compensate for. the reason I'm not convinced that EDI works is that there is already filtering in image acquisition to lowpass the image before it's captured. my reasoning is basically that you could create the sampling for which EDI is supposed to help by taking a high res image and decimating the pixels. if EDI doesn't work on that, then it could be a bug in the implementation.
- smarter: i'll try that.
- jm: if it does help, but not on normal videos, might be that EDI itself doesn't work.
- d: it's worth trying this. it's easier to do this than the classification stuff. the interpolation code classifies pixels already.
- s: it doesn't ???
- d: but i'm not worried about the source pixels. as long as you are doing actual interpolation those are never going to change.
- s: i thought you wanted to do classifier psnr compared to the original video.
- d: no i was looking more at like if you do a downsample and upsample, then the question is whether one of the modes/classes is broken. if none of those stand out, instead of using david's filters, try training a filter for each class. you should be able to do no worse than a normal linear filter.
- s: i'll try that. thanks.
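
A minimal sketch of the test discussed above: decimate with no filtering, upsample again, and report PSNR separately for each kind of pixel. Everything here is an illustrative assumption, not the Daala code: the function names are hypothetical, "kind of pixel" is taken to mean the (row, column) phase in a 2x upsampled grid, and `upsample_nearest` is only a placeholder for the EDI interpolator under test. For a 3840x2160 source, a decimation factor of 3 gives 1280x720.

```python
import math

def decimate(img, factor):
    """Keep every `factor`-th pixel in both directions, with no low-pass filter."""
    return [row[::factor] for row in img[::factor]]

def upsample_nearest(img, factor=2):
    """Placeholder upsampler (pixel replication); swap in the filter under test."""
    out = []
    for row in img:
        wide = [p for p in row for _ in range(factor)]
        out.extend([list(wide) for _ in range(factor)])
    return out

def psnr_by_phase(ref, test, peak=255):
    """PSNR of `test` against `ref`, split by (y % 2, x % 2) pixel phase.

    Phase (0, 0) holds the source pixels; the other three phases hold
    interpolated pixels (horizontal, vertical and diagonal neighbours).
    """
    sq_err = {}
    count = {}
    for y, (ref_row, test_row) in enumerate(zip(ref, test)):
        for x, (r, t) in enumerate(zip(ref_row, test_row)):
            key = (y % 2, x % 2)
            sq_err[key] = sq_err.get(key, 0) + (r - t) ** 2
            count[key] = count.get(key, 0) + 1
    result = {}
    for key in sq_err:
        mse = sq_err[key] / count[key]
        result[key] = math.inf if mse == 0 else 10 * math.log10(peak * peak / mse)
    return result

# Toy usage on a small synthetic ramp image.
ref = [[16 * (x + y) for x in range(4)] for y in range(4)]
rec = upsample_nearest(decimate(ref, 2), 2)
for phase, value in sorted(psnr_by_phase(ref, rec).items()):
    print(phase, value)
```

With a true interpolator the source phase is untouched, so its PSNR is infinite and only the three interpolated phases are informative. A phase that is much worse than the others would point at a broken mode.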

# Intra paint

- jm: this is what i've been working on for the past week or two. we discussed last week. it's for intra prediction.
- g: can you post an image for your latest results? the previous one had banding, but I think you fixed that.
- jm: http://jmvalin.ca/video/fruits_quant16b.png and http://jmvalin.ca/video/fruits_quant16c.png
- d: what happened at the bottom?
- jm: forget about that. it's an edge effect from not padding the image. the boundary is not handled correctly.
- x: there are several places where it looks like a decision error.
- jm: can you give me an example?
- x: first image, along the bottom where two sticks above the main branch make an H.
- jm: what area of image relative to the fruit?
- x: down and to the right of fruit. sharp edges are being interpolated directly horizontal. it doesn't look like there is anything horizontal there. why is it doing that?
- jm: this is probably an effect of blending across an edge. the other issue is that it hasn't worked very well but if you remove the b or the c image and look at quant16. that doesn't have what i thought you meant.
- x: later on I will look at it closer and circle things I'm noticing.
- jm: the one thing i know is an issue is i have a dc mode that is attempting to do some kind of interpolation to do something smooth over the block. this mode does not get picked up naturally; i have to bias towards it. the difference between quant16 and b or c is that i'm biasing for the dc mode. the biasing causes some directional blocks to be classified as gradient. another thing that is kind of buggy is the edge prediction. for each block i'm predicting right and bottom using top and left. doing that prediction reduces bitrate and artifacts because it makes things more continuous. there's still some bugs with some of the directions. i need to improve this. these images are 16x16 only, and i'd like it to be variable block size, but i don't have the BSS code yet, because it has to be a completely different decision than what we have now.
- d: i'm trying to figure out how to integrate this with the motion compensation.
- g: if you imagine the intrapred that is producing an image first, then this could fill in holes.
- jm: intrapred costs something. it's signaled. you can't just run it on the entire image and then run MC on top.
- g: but if you run MC on most of the image, you could use this for filling.
- jm: you could definitely do that, but you don't need this technique for that.
...
- jm: either you'd want to reuse the edge, and you would not be able to optimize that edge for the block on which you are doing prediction. so you might have slightly worse prediction but it's probably worth it.
- d: that's basically exactly what i want to do. using anything other than the MC, then you're going to have to blend some extra thing to avoid the discontinuity.
- jm: mostly it's just going to cost you extra. another thing you could do is if you decided you wanted to code all the edges, then you'd have to do blending on the other side, but you'd have to encode more side information.
- d: that's what I just said.
- jm: if you had a block in the middle, you end up coding 9 modes.
- d: i figured you'd just continue whatever mode you had into the adjacent blocks. the point is that you'd have to do some kind of blending.
- jm: if you decide to code just the inside then you will probably be considering more blocks to be intra. and if you did it on the outside, you'd code less. so it would probably end up the same amount. so far it seems like there is actual potential to beat 265 keyframes with this.
- d: i'm mostly worried about the blocking artifacts in all the images, but i assume you will fix that.
- jm: these are extremely low bitrate, but sure. i still don't have good edge prediction and block switching should make a big difference. speaking of BSS, do you see an issue with going to 64 just for this?
- d: i've been trying to avoid it. right now we do multiple passes but that's not how this is going to be done eventually.
- jm: what does it change if we have these ultra-blocks for only a few things?
- d: it doubles the buffering, because you have to get 64 lines at a time. i'm not saying no, but...
- jm: you could still code everything in superblock raster order.
- d: we are absolutely not coding things in multiple different orders.
- jm: what is the problem with that?
- d: the problem is that the encoder has to make the decisions in the order you code things. if you make it delay this extra stuff, the search problem becomes much much harder.
- jm: what i had in mind is you scan in superblock order, and any time you have a superblock that doesn't have the 64 data, you just code it there.
- d: what?
- jm: on even superblock rows you code it, but on odd rows you don't.
- d: you're missing the point. i've had to make decisions about the odd row before i coded anything, so i'd have to buffer all those until i code it. even just the loop filter in theora made this hard.
- g: it makes it hard to make jointly optimal decisions.
- jm: put it this way. it's already hard enough to do the decisions even without considering the residual. so i don't think we'll be considering how to code the residual anytime soon. well ok, i'll go with 32x32 for now. i'll need lookahead by at least 32 anyway. i assume that is fine?
- d: Yes, we have to do that for MC anyway.
- jm: monty you said you were going to give it more thought?
- x: i have, but nothing useful yet. one thing is rather than thinking in terms of blocks, think of doing things with a sweep from top down and from one side to the other and try to think if that relaxes any constraints. the thought is that you wouldn't necessarily be blending on block boundaries. you would be predicting edges from top to bottom and from right to left, but you wouldn't be trying to predict a half circle within a block. my ideas on that are not well formed enough yet. i want to see it predict something that is actually hard to predict. i think we're going in a direction where the prediction isn't giving us what is most useful. i have no idea what CG you are getting from this. it's possible i'm completely wrong.
- jm: i'm not yet sure what kind of rate we can afford for this. right now i'm coding pretty coarsely.
- x: i think we're in a state where neither of us has any results we can compare. i'm going to keep thinking about it. i'm mostly thinking about javascript and emscripten right now.
- jm: the thing i'm working on right now is the edge predicting. sometimes you are missing some pixels for the edge, like the up right directions. so what i've been trying to look at is either coding one edge without prediction and then using the right edge to code the bottom edge, and possibly in some cases reversing the order in which things are coded. for example you might have most of the bottom edge that you are able to predict from the top edge and use that edge to predict the right edge.
- x: i can see that working for some directions.
- x: I can see the minutiae mattering once we have numbers.
- jm: One thing I need to solve is the "DC mode": this is where a directional fill would create edge-like artifacts in a smooth block. See 16.png (not b/c). What the DC mode I've implemented does is interpolate using 4 pixels (up, down, left, right edges of the block) with a weight inversely related to the distance, which is bad but works better than just using a directional mode.
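
A rough sketch of the DC mode interpolation described in the last item, under stated assumptions: the weight is taken to be exactly 1/distance, the four taps are the edge pixels in the same row and column as the pixel being predicted, and the function name and argument layout are hypothetical. The actual intra paint code may differ on all of these.

```python
def dc_mode_predict(top, bottom, left, right):
    """Fill an n x n block from its four edges with inverse-distance weights.

    top and bottom are indexed by column x, left and right by row y.
    Each predicted pixel blends the four edge pixels that share its row
    or column, weighted by 1 / distance (an assumed weighting).
    """
    n = len(top)
    pred = [[0.0] * n for _ in range(n)]
    for y in range(n):
        for x in range(n):
            # (edge pixel value, distance from (x, y) to that edge pixel)
            taps = [
                (top[x], y + 1),
                (bottom[x], n - y),
                (left[y], x + 1),
                (right[y], n - x),
            ]
            weight_sum = sum(1.0 / d for _, d in taps)
            pred[y][x] = sum(v / d for v, d in taps) / weight_sum
    return pred

# Toy usage: dark top, left and bottom edges with a bright right edge.
block = dc_mode_predict([0, 0], [0, 0], [0, 0], [90, 90])
for row in block:
    print(row)
```

The result is a convex combination of the edge pixels, so flat edges give a flat block and no directional structure is introduced. Here the bottom and right edges are simply passed in; in the scheme discussed above they would come from the coded edge data, predicted from the top and left edges.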