DaalaMeeting20150512
# Meeting 2015-05-12

Mumble: mf4.xiph.org:64738

# Agenda

- reviews
- increasing the range of Motion Vectors?
- coordination of 1xMC and full precision
- PCS submission
- Screencasting (update, scope)
- Why do we suck on VC?
- IETF hackathon (please register)

# Attending

unlord, TD-Linux, derf, xiphmont, yushin, tmatth, MrZeus, jmspeex

# reviews

derf still owes jm and yushin a review each, possibly someone else too; he hasn't started, having been busy on the chroma and QM interpolation work.
TD: Nathan should review 731 and 704. They're easy.
jm: I approve of 731.
derf: I owe tristan 677.
jm: [construction noises]

# range of motion vectors

smarter: experimented with increasing the range of motion vectors to -63..64. It does not affect most clips; Keiba improves 25%, Parkjoy 2%, and a few others were hurt by about 0.05%.
derf: Why -63 and not -64?
TD: the codebase started out with -31, not -32.
[some debate, TD wins the bloodbath, but derf gets the late win]
yushin: have you turned off the hard-coded hit cache?
smarter: yes
yushin: It's also hardcoded in the BMA search.
smarter: yes, I know this code pretty well
derf: where is the patch?
smarter: https://github.com/smarter/daala/commits/incr_mv_limits
smarter: why do we need a limit? What happens if we don't have one?
jm: Oh, you're not changing the search, just what's allowed? That's why ChinaSpeed does not improve.
derf: right, it doesn't find it because there's no hierarchical search.
derf: also, motion estimation search performance has deteriorated badly (speed). We've lost assembly for 32x32 and things like that, which has degraded speed. The original reason for the limit was to keep things like the hit cache reasonable, and to keep things from wandering off into crazy-land, which is probably why ChinaSpeed got worse.
TD: we should land the limits increase anyway
smarter: but which number?
derf: pick a number that encompasses the test set. run tests.
derf: x264 does this in its lookahead thread. We don't have a lookahead thread. We could copy their design (which requires rate control to land), or not copy their design and design our own now.
[free range design debate]
jmspeex: the reason i was looking at correlation is because it's something you can actually interpolate
derf: too slow, especially compared to SAD, which i can compute in 1 cycle
jmspeex: so you keep something like survivors around?
derf: i wouldn't do anything that fancy in the first pass.
[debate wraps up]
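As a rough illustration of the two pieces discussed above (hypothetical helper functions, not Daala's actual motion search; the -63..64 limits are the ones from smarter's experiment), a first-pass full-pel search might clamp each candidate vector to the allowed range and rank candidates by SAD:

```c
#include <stdlib.h>

/* MV limits from smarter's experiment; the previous lower limit was -31. */
#define MV_MIN (-63)
#define MV_MAX (64)

/* Sum of absolute differences over a block: the cheap first-pass metric
   derf mentions. */
static unsigned block_sad(const unsigned char *src, const unsigned char *ref,
 int stride, int bw, int bh) {
  unsigned sad = 0;
  for (int i = 0; i < bh; i++) {
    for (int j = 0; j < bw; j++) {
      sad += abs(src[i*stride + j] - ref[i*stride + j]);
    }
  }
  return sad;
}

static int clamp(int v, int lo, int hi) {
  return v < lo ? lo : v > hi ? hi : v;
}

/* Evaluate one candidate vector, clamped to the allowed MV range and to the
   picture bounds, keeping it if it beats the best SAD so far. */
static void try_mv(const unsigned char *src, const unsigned char *ref,
 int stride, int bw, int bh, int x, int y, int w, int h,
 int mvx, int mvy, unsigned *best_sad, int *best_mvx, int *best_mvy) {
  mvx = clamp(mvx, MV_MIN, MV_MAX);
  mvy = clamp(mvy, MV_MIN, MV_MAX);
  if (x + mvx < 0 || y + mvy < 0 || x + mvx + bw > w || y + mvy + bh > h) {
    return;
  }
  unsigned sad = block_sad(src + y*stride + x,
   ref + (y + mvy)*stride + (x + mvx), stride, bw, bh);
  if (sad < *best_sad) { *best_sad = sad; *best_mvx = mvx; *best_mvy = mvy; }
}
```

Hardware SAD instructions (e.g. SSE2's psadbw) collapse the inner loop to a few operations per row, which is why derf prefers SAD over anything interpolation-based for the first pass.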
smarter: I think that's all I needed to talk about. I'll run numbers on AWCY.
derf: don't forget Sintel.
derf/TD: try some scene cuts; right now MC makes a black frame by copying black from other areas of the frame.
td: I'll make a scene cut set for AWCY
[lots of discussion of different types of scene cuts]
derf: it would be good to know if the MV limit increase will hurt or help scene cuts.

# coordination of 1xMC and full precision

monty: the stuff that yushin is working on right now, where he eliminates doing the upsampling on the spot...
derf: currently we are not doing any filtering
yushin: i think it's not a good time to land all this; ultimately we don't want to upsample at all stages, just on the fly as needed.
monty: that's what i want to talk about. the work i'm doing collides with this work, full precision specifically, so let's coordinate to avoid making this a race.
yushin: the next test is subpel refinement; it's about infrastructure and will take time
derf: i think this is a good improvement that will be needed for other things; it makes it easy to drop in other interpolation filters
yushin: SATD is only running for stage 4 (didn't hear properly?)
derf: i think you need to do this. we need better subpel filters; given how little we use 8th pel, it's not worth jumping to 16th pel.
yushin: 264 only has the test for [inaudible]
smarter: One thing I noticed is that on Keiba, before my patch it used half-pel, and after it used quarter-pel on the first inter frame. I don't know if that means anything.
derf: I don't either.
monty: full precision references are close to ready to land; they need review. conceptually there are few unknowns. i stopped working on it before because i thought 1xmc was about to land, so i went back to rate estimation.
derf: yushin just told me that redoing 1xmc will take a month, so let's do your thing first since it will be ready sooner.
smarter: what's the plan for full precision?
monty: it's going to be runtime, not compile time: a single number that changes, but you won't be able to change it during a run. we could run it only on high-depth input, or maybe not; it needs to be tested.
smarter: but if it's in the bitstream it needs to be supported by hardware in any case
jm: we are in the research stage at this point; we may keep just one or just the other
monty: i believe derf wanted the decision about how it's actually used to be left to the working group; we don't want to piss off hardware people.
jm: there's still the issue that when it's not on we have quilting
derf: so part of the point of having this at runtime and configurable is to show people "this is what happens when you do/don't have references".
jm: at that point, the only option would be that lapping goes, which seems terrible
monty: it's not worth being overly pessimistic about the WG

# PCS

derf: td, negge: PCS submission in 3 days?
td-linux: i emailed the PCS dude about the picture we need to encode
derf: this also requires a 5-page paper. i'm also ok if you guys think we should forgo it.
jm: i can help a bit
derf: it might be nice to get the paint deringing on for this
jm: i don't believe there's a decoder for paint deringing
td-linux: i thought it was just a postprocess pass
jm: there's the bilinear filter (no side info) and the deringing filter (which does need side info).
smarter: do you think it would be too much for one person?
jm: there's a new bilinear filter; we'd have to make sure there aren't unexpected interactions.
smarter: i can take a look
jm: this would be in a branch of course
derf: we need to provide a binary (encoder and decoder); we're still waiting on receiving the test data.
td-linux: it's 2.0 to 0.1 bpp
jm: seems like haar is pointless in this case
derf: you can submit different binaries for different metrics. they will feed data from this to a JPEG call for proposals. we will also get free human views of our output.
smarter: can we dedicate a page to bashing psnr for images?
conclusion -> td-linux, with help from smarter and jm, will try to submit by the deadline on the 15th
jm: quick aside, we need to take care of patents for the paint stuff

# IETF hackathon

unlord: make sure you sign up for the hackathon if you're going to IETF

Screencasting
-------------

jm: current update: it at least now works on moving pictures (i.e. doesn't crash); until now i had only tested on still pictures. there was some confusion on my part about what 2d haar was supposed to be. doing the haar in one direction then the other yields ugly results for text; quadtree is more promising.
smarter: why does haar make sense for text?
jm: text has details that are spatially local; you don't want them to spread around. haar is the most spatially local transform you can do in 2d.
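As an illustrative sketch of the locality jm is describing (not code from his branch): one level of the 2D Haar transform replaces each 2x2 block of pixels with an average and three differences, so no output coefficient mixes pixels more than one step apart. Recursing on the ll band gives the quadtree decomposition jm prefers for text; the "standard decomposition" he mentions below for lines and rectangles instead transforms all rows fully, then all columns.

```c
/* One level of an unnormalized 2D Haar transform on an 8-bit image.
   Assumes w and h are even; each output array holds (w/2)*(h/2)
   coefficients.  Every coefficient depends on a single 2x2 block
   {a b; c d} of the input, which is the spatial locality property. */
static void haar2d_level(const unsigned char *in, int stride, int w, int h,
 int *ll, int *lh, int *hl, int *hh) {
  for (int i = 0; i < h/2; i++) {
    for (int j = 0; j < w/2; j++) {
      int a = in[(2*i)*stride + 2*j];
      int b = in[(2*i)*stride + 2*j + 1];
      int c = in[(2*i + 1)*stride + 2*j];
      int d = in[(2*i + 1)*stride + 2*j + 1];
      ll[i*(w/2) + j] = a + b + c + d; /* 4x the block average */
      lh[i*(w/2) + j] = a - b + c - d; /* horizontal difference */
      hl[i*(w/2) + j] = a + b - c - d; /* vertical difference */
      hh[i*(w/2) + j] = a - b - c + d; /* diagonal difference */
    }
  }
}
```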
jm: compared to 265, we appear to do better on text: better on the eclipse-daala image, and ~20% better on my image of a bunch of white text on black background terminals. we are much worse on stuff with lines, rectangles, etc. For this type of content we should do something like the standard decomposition for haar, or make the basis overcomplete and signal rectangular stuff at a higher level, e.g. code the medians plus the really fine haar stuff.
derf: are you comparing with standard 265 or the screencasting extensions?
jm: the former (on awcy). i'm happy to compare with just that since the extensions are still a draft at this point. we are better than 265 on pure text; overall we are worse, and i want to fix that first.
derf: we should still check against 265's screencast extensions to see what's possible
jm: i still need to try switching it on/off automatically; i need to find out what 265's stuff does.
smarter: my guess is they have more intra modes, etc. it's generally easier for them to switch, since they don't have prefiltering.
jm: i should be able to figure out when to lap and when not to; i've already done some experiments. Looking at the absolute value of the difference of each pixel compared to the mean on either side of an edge was one thing i was looking at (see the sketch at the end of this section). Could also do a global flag.
derf: most encoders don't look at the whole image before coding anything
jm: it could be an encoder mode to differentiate between screencast and photographic content
derf: the question is how much improvement we can get from the screencasting work, and whether any of it carries over to photographic content
smarter: namely selectively enabling lapping
jm: it wouldn't be hard to do on actual superblock boundaries, being able to use different sizes (?) of lapping... the hard part is making decisions within a superblock. We could do the lapping selection on an edge-by-edge basis, and go as high as 32x32 or down to zero lapping, but we would do it as a presearch on the actual superblock boundaries, before making any blocksize decisions. What we need to determine is: once you make the lapping decision on the edges, how do you do the lapping inside the superblock?
smarter: make it a function of the edges?
jm: if you've decided to use 32x32 lapping on the edges of a block, you'll probably do a 32x32 transform in the block. But you could have a block where each of its 4 edges has different lapping, which makes things interesting!
derf: it sounds like there are implications that are important for the main encoder, so I think it's worth pursuing the screencasting work
jm: the DCT is able to compactly represent lines, rectangles, etc., where Haar is not
smarter: maybe try a mix of DCT and Haar?
jm: the problem with the Haar stuff is that it makes edges very jaggy
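A minimal sketch of one reading of jm's per-edge statistic (hypothetical helper; the quote is ambiguous about which mean each pixel is compared against, so this assumes each side's own mean): average the pixel row on each side of a horizontal block edge, then sum each pixel's absolute deviation from its own side's mean.

```c
#include <stdlib.h>

/* For the row just above and the row just below a horizontal block edge at
   y_edge, compute each side's mean, then sum each pixel's absolute
   deviation from its side's mean over n columns starting at x0.  Flat
   synthetic content (text, UI) scores near zero even across a sharp edge;
   textured photographic content scores high. */
static int edge_activity(const unsigned char *img, int stride,
 int x0, int y_edge, int n) {
  int mean_above = 0, mean_below = 0, score = 0;
  for (int j = 0; j < n; j++) {
    mean_above += img[(y_edge - 1)*stride + x0 + j];
    mean_below += img[y_edge*stride + x0 + j];
  }
  mean_above /= n;
  mean_below /= n;
  for (int j = 0; j < n; j++) {
    score += abs(img[(y_edge - 1)*stride + x0 + j] - mean_above);
    score += abs(img[y_edge*stride + x0 + j] - mean_below);
  }
  return score;
}
```

Presumably a low score with a large difference between the two means would indicate a hard synthetic edge where lapping smears detail, while a high score suggests photographic texture where lapping is safe.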
Videoconferencing
=================

derf: why do we suck here? We do better at 30fps than at 60fps.
jm: at decent rates we can't do worse than a 100% higher rate than 265, since half of our bits are in the keyframes (if the keyframe half is competitive, the delta-frame half can at most double our total).
derf: the ratio of keyframe to delta frames certainly plays a part; we could test that with longer 30fps sequences.
jm: the concern i had was that in general you expect a codec to spend half its bits on MVs, and we're spending way less than that
derf: for delta frames, half the bits should be for macroblock mode info and MVs
jm: i tried increasing the lambda and got a 1% improvement; decreasing it makes things worse. Decreasing it means putting more bits into MVs, and doing that makes us worse.
derf: for talking heads, motion compensation shouldn't be making a big difference
jm: we need to figure out why we're so bad here, just as previously we found out blocksize switching was broken
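To make the lambda discussion concrete (an illustrative sketch, not Daala's actual RDO code): MV selection typically minimizes a Lagrangian cost D + lambda*R, so lowering lambda makes MV bits cheaper and the search picks more precise, more expensive vectors, which is the direction jm says makes things worse here.

```c
/* Lagrangian motion-vector selection: pick the candidate minimizing
   distortion + lambda * rate.  A smaller lambda penalizes MV bits less,
   so more bits go to motion vectors; a larger lambda does the opposite. */
typedef struct {
  int mvx, mvy;
  unsigned distortion; /* e.g. SAD or SATD of the prediction residual */
  unsigned rate_bits;  /* estimated bits to code this MV */
} mv_candidate;

static int pick_mv(const mv_candidate *cand, int n, double lambda) {
  int best = 0;
  double best_cost = cand[0].distortion + lambda*cand[0].rate_bits;
  for (int i = 1; i < n; i++) {
    double cost = cand[i].distortion + lambda*cand[i].rate_bits;
    if (cost < best_cost) { best_cost = cost; best = i; }
  }
  return best; /* index of the rate-distortion-optimal candidate */
}
```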