DaalaMeeting20150512

# Meeting 2015-05-12
 
Mumble:  mf4.xiph.org:64738
 
# Agenda
 
- reviews
- increasing the range of Motion Vectors?
- coordination of 1xMC and full precision
- PCS submission
- Screencasting (update, scope)
- Why do we suck on VC?
- IETF hackathon (please register)
 
# Attending
 
unlord, TD-Linux, derf, xiphmont, yushin, tmatth, MrZeus, jmspeex
 
# reviews
derf still owes jm and yushin a review each, and possibly someone else; he hasn't started, having been busy on the chroma and QM interpolation work.
 
TD: Nathan should review 731 and 704.  They're easy.
jm: I approve of 731
derf: I owe tristan 677
jm: [construction noises]
 
# range of motion vectors
smarter: I experimented with increasing the range of motion vectors to -63..64. It does not affect most clips: Keiba improves 25%, Parkjoy 2%, and a few others were hurt by about 0.05%.
derf: Why -63 and not -64?
TD: codebase started out with -31, not -32
[some debate, TD wins the bloodbath, but derf gets the late win]
yushin: have you turned off the hard coded hit cache?
smarter: yes
yushin: Also hardcoded in the BMA search
smarter: yes, I know this code pretty well
derf: where is the patch?
smarter: https://github.com/smarter/daala/commits/incr_mv_limits
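The limit being discussed is a clamp on each MV component. A minimal sketch of the kind of range being widened, with made-up names rather than the actual patch:

```c
/* Hypothetical sketch of the MV component clamp being widened here;
   names are made up, not Daala's actual code. */
#define MV_MIN (-63) /* the codebase started out with -31 */
#define MV_MAX 64    /* ... and 32 */

static void clamp_mv(int *mvx, int *mvy) {
  if (*mvx < MV_MIN) *mvx = MV_MIN;
  if (*mvx > MV_MAX) *mvx = MV_MAX;
  if (*mvy < MV_MIN) *mvy = MV_MIN;
  if (*mvy > MV_MAX) *mvy = MV_MAX;
}
```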
smarter: why do we need a limit?  What happens if we don't have one?
jm: Oh, you're not changing the search, just what's allowed?
That's why ChinaSpeed does not improve.
derf: right, it doesn't find it because there's no hierarchical search.
derf: also, motion estimation search speed has deteriorated badly.
derf: we've lost assembly for 32x32 and stuff like that, which has degraded speed. The original reason for the limit was to keep things like the hit-cache reasonable.
And to keep things from wandering off into crazy-land, which is probably why ChinaSpeed got worse.
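One concrete way to see why the limit keeps the hit cache "reasonable": a direct-mapped table of already-tried candidates grows with the square of the MV range. A hypothetical sketch (symmetric -64..64 range for simplicity; names made up):

```c
#define MV_RANGE 64                /* one side of the search range */
#define CACHE_DIM (2*MV_RANGE + 1) /* doubling the range quadruples the table */

static unsigned char hit_cache[CACHE_DIM][CACHE_DIM];

/* Returns nonzero if this candidate was already evaluated, and marks it. */
static int already_tried(int mvx, int mvy) {
  unsigned char *entry = &hit_cache[mvy + MV_RANGE][mvx + MV_RANGE];
  int hit = *entry;
  *entry = 1;
  return hit;
}
```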
TD: we should land the limits increase anyway
smarter: but which number?
derf: pick a number that encompasses the test set. run tests
derf: x264 does this in its lookahead thread. We don't have a lookahead thread.  We could copy their design (which requires rate control to land) or not copy their design and design our own now.
[free range design debate]
 
jmspeex: the reason i was looking at correlation is because it's something you can actually interpolate
derf: too slow, especially compared to SAD, which i can compute in 1 cycle
jmspeex: so you keep something like survivors around?
derf: i wouldn't do anything that fancy in the first pass.
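For reference, SAD is cheap because it is pure additions over absolute differences and vectorizes trivially (e.g. x86's PSADBW handles 16 pixels per instruction); a plain-C sketch of an 8x8 SAD:

```c
#include <stdlib.h>

static unsigned sad_8x8(const unsigned char *src, int src_stride,
                        const unsigned char *ref, int ref_stride) {
  unsigned sad = 0;
  for (int i = 0; i < 8; i++) {
    for (int j = 0; j < 8; j++) sad += abs(src[j] - ref[j]);
    src += src_stride;
    ref += ref_stride;
  }
  return sad;
}
```

Correlation, by contrast, needs multiplications (and normalization to be useful), which is why it is considered too slow for a first pass.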
 
[debate wraps up]
smarter: I think that's all I needed to talk about. I'll run numbers on AWCY
derf: don't forget sintel
derf/TD: try some scene cuts; right now MC makes a black frame by copying black from other areas of the frame.
td: I'll make a scene cut set for AWCY
[lots of discussion of different types of scene cuts]
derf: good to know if the MV limit increase will hurt or help scene cuts.
 
# coordination of 1xMC and full precision
Monty: the stuff that yushin is working on right now where he eliminates doing upsampling on the spot
derf: currently we are not doing any filtering
yushin: i think it's not a good time to land all this, ultimately we don't want to upsample at all stages, just on the fly as needed.
monty: that's what i want to talk about; the work i'm doing collides with this, full precision specifically, so let's coordinate to avoid making this a race
yushin: the next test is subpel-refinement, it's about infrastructure and will take time
derf: i think this is a good improvement that will be needed for other things, it makes it easy to drop in other interpolation filters
yushin: SATD is only running for stage 4 (didn't hear properly?)
derf: i think you need to do this, we need better subpel filters, given how little we use 8th pel, not worth jumping to 16th pel
yushin: 264 only has the test for [inaudible]
smarter: One thing I noticed is that on Keiba, before my patch it used half-pel, after it used quarter-pel on the first inter frame, I don't know if that means anything
derf: I don't either.
monty: full precision reference is close to ready to land, needs review. conceptually there are few unknowns. i stopped working on it before because i thought 1xmc was about to land, so i went back to rate estimation
derf: yushin just told me that redoing 1xmc will take a month, so let's do your thing first since it will be ready sooner
smarter: what's the plan for full precision
monty: it's going to be runtime, not compile time: a single number that changes, but you won't be able to change it during a run. we could run it only on high-depth input; maybe not, it needs to be tested.
smarter: but if it's in the bitstream it needs to be supported by hardware in any case
jm: we are in research stage at this point, we may keep just one or just the other
monty: i believe derf wanted the decision about how it's actually used to be left to the working group, we don't want to piss off hardware people.
jm: there's still the issue that when it's not on we have quilting
derf: so part of the point of having this on at runtime and configurable is to show people "this is what happens when you do/don't have references".
jm: at that point, the option would be that lapping goes away, which seems terrible
monty: not worth being overly pessimistic about the WG
 
# PCS
 
derf: td, negge: PCS submission in 3 days?
td-linux: i emailed the PCS dude about the picture we need to encode
derf: this also requires a 5 page paper. i'm also ok if you guys think we should forgo it
jm: i can help a bit
derf: it might be nice to get the paint deringing on for this
jm: i don't believe there's a decoder for paint deringing
td-linux: i thought it was just a postprocess pass
jm: there's the bilinear filter (no side info) and the deringing filter (which does need side info).
smarter: you think it would be too much for one person?
jm: there's a new bilinear filter, we'd have to make sure there aren't unexpected interactions.
smarter: i can take a look
jm: this would be in a branch of course
derf: we need to provide a binary (encoder and decoder); we're still waiting to receive the test data.
td-linux: it's 2.0 down to 0.1 bpp
jm: seems like haar is pointless in this case
derf: you can submit different binaries for different metrics, they will feed data from this to a jpeg call for proposals. we will also get free human views of our output
smarter: can we dedicate a page to bashing psnr for images?
conclusion -> td-linux, with help from smarter and jm, will try to submit by the deadline on the 15th
 
jm: quick aside: we need to take care of patents for the paint stuff
unlord: make sure you sign up for the hackathon if you're going to IETF
 
# Screencasting
jm: current update: it at least now works on moving pictures (i.e. doesn't crash); i only tested on still pictures. there was some confusion on my part about what 2D Haar was supposed to be: doing the Haar in one direction then the other yields ugly results for text; quadtree is more promising.
smarter: why does Haar make sense for text?
jm: text has details that are spatially local; you don't want them to spread around. Haar is the most spatially local transform you can do in 2D. compared to 265, we appear to do better on text: better on the eclipse-daala image, ~20% better on my image of a bunch of white text on black-background terminals. we are much worse on stuff with lines, rectangles, etc. for this type of content, we should do something like the standard decomposition for Haar, or make the basis overcomplete and signal rectangular stuff at a higher level, e.g. code the medians and really fine Haar stuff
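For the record, the quadtree-style (nonstandard) decomposition is built from independent 2x2 Haar steps, so each detail coefficient depends only on a small pixel neighborhood; only the low-pass outputs are transformed again at the next level. A minimal unnormalized sketch (not Daala code):

```c
/* One 2x2 Haar step on pixels a b / c d. Unnormalized: the range grows
   by a factor of 2 per level; real code picks a scaling convention. */
static void haar_2x2(int a, int b, int c, int d,
                     int *lo, int *h, int *v, int *diag) {
  *lo   = a + b + c + d; /* low-pass: recursed on at the next level */
  *h    = a - b + c - d; /* horizontal detail */
  *v    = a + b - c - d; /* vertical detail */
  *diag = a - b - c + d; /* diagonal detail */
}
```

The standard decomposition instead runs the full 1D Haar over every row and then every column, spreading detail coefficients across a whole row or column; that is the "one direction then the other" variant that looked ugly on text.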
derf: are you comparing with standard 265 or screencasting extensions?
jm: the former (on awcy). i'm happy to compare with just that since the extensions are still a draft at this point. we are better than 265 on pure text, overall we are worse and i want to fix that first
derf: we should still check against 265's screencast extensions to see what's possible
jm: i still need to try switching it on/off automatically, need to find out what 265's stuff does.
smarter: my guess is they have more intra modes, etc. it's generally easier for them to switch, they don't have prefiltering
jm: i should be able to figure out when to lap and when not to; i've already done some experiments. one thing i was looking at was the absolute value of the difference of each pixel compared to the mean on either side of an edge. we could also do a global flag.
derf: most encoders don't look at the whole image before coding anything
jm: could be an encoder mode to differentiate between screencast and photographic content
derf: the question is how much improvement can we get from the screencasting work, and can any of it carry over to photographic content
smarter: namely selectively enabling lapping
jm: it wouldn't be hard to do it on actual superblock boundaries, being able to use different sizes (?) of lapping... the hard part is making decisions within a superblock. we could do the lapping selection on an edge-by-edge basis, going as high as 32x32 or down to zero lapping, but we would do it as a presearch on the actual superblock boundaries, before making any blocksize decisions. what we need to determine is, once you make the lapping decision on edges, how to do the lapping inside the superblock.
smarter: make it a function of the edges?
jm: if you've decided to use 32x32 lapping on the edges of a block, you'll probably do 32x32 transform in the block. But you could have a block where each of its 4 edges have different lapping, makes things interesting!
derf: it sounds like there are implications that are important for the main encoder, so I think it's worth pursuing the screencasting work
jm: DCT is able to compactly represent lines, rectangles etc. where Haar is not
smarter: maybe try a mix of DCT and Haar?
jm: problem with Haar stuff is that it makes edges very jaggy
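A sketch of the lapping on/off heuristic jm describes above (comparing each pixel to the mean of its own side of a block edge); the function name and threshold handling are hypothetical:

```c
#include <stdlib.h>

/* Sum of |pixel - mean of its side| for n pixels on each side of an edge.
   A small value suggests flat, hard-edged (screencast-like) content where
   lapping is likely to hurt; the caller compares against a tuned threshold. */
static int edge_activity(const unsigned char *left,
                         const unsigned char *right, int n) {
  int mean_l = 0, mean_r = 0, act = 0, i;
  for (i = 0; i < n; i++) { mean_l += left[i]; mean_r += right[i]; }
  mean_l /= n;
  mean_r /= n;
  for (i = 0; i < n; i++) {
    act += abs(left[i] - mean_l) + abs(right[i] - mean_r);
  }
  return act;
}
```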

# Videoconferencing
derf: why do we suck here? We do better on 30fps than 60fps.
jm: at decent rates we can't do worse than 100% higher rate than 265, since half of our bits are in the keyframes
derf: the ratio of keyframe to delta-frame bits certainly plays a role; we could test that with longer 30fps sequences.
jm: the concern i had was that in general you expect a codec to spend half its bits on MVs, and we're spending way less than that
derf: for delta frames, half the bits should be for macroblock mode info and MVs
jm: i tried increasing the lambda and got a 1% improvement; decreasing it makes things worse. decreasing it means putting more bits into MVs, and doing that makes us worse.
derf: for talking heads, motion compensation shouldn't be making a big difference
jm: we need to figure out why we're so bad here, just as previously we found out blocksize switching was broken
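For context on the lambda being tuned above: motion search typically minimizes a rate-distortion cost J = D + lambda * R over candidate MVs, so lowering lambda makes expensive MVs cheaper to pick (more bits in MVs), and raising it does the opposite. A generic sketch, not Daala's actual cost function:

```c
/* Generic rate-distortion cost for a motion vector candidate. */
static unsigned mv_rd_cost(unsigned sad, unsigned mv_bits, unsigned lambda) {
  return sad + lambda*mv_bits; /* distortion + lambda * rate */
}
```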