DaalaMeeting20150407

# Meeting 2015-04-07

Mumble:  mf4.xiph.org:64738

# Agenda

- reviews
- multi-ref

# Attending

daala-person, derf, jack, jmspeex, td-linux, tmatth, unlord, yushin

# reviews

- jm: I gave one to Thomas.
- t: I was afraid.

(A few patches from IETF92 are still waiting on various people.)

# multi-ref

- t: I have a blend function that works and a prediction function that predicts the right frame.
- d: Which prediction? Pixels or MVs?
- t: The MVs also worked. So far it looked like the MVs don't get screwed up, but I need to look at them more. For coding the predictors, I haven't touched that so it's bad.
- d: I've also been working on changing that code.

# MV prediction

- jm: I did some quick experiments. The current median of 4 predictors that we are using is still doing much better than using the previous frame's MV as the prediction (see the median sketch after this list). Is that expected?
- d: I don't know what you mean.
- jm: Looking at previous frame and using the same MV for the prediction.
- d: That isn't always available; we don't have a grid point there in all cases.
- jm: I only looked at level 0.
- d: It's very often not a great predictor. Sometimes it's really good and sometimes it's really bad. Every scheme that used temporal prediction used it as one of the possible predictors but there is always a spatial prediction fallback.
- jm: I tested on johnny so it seems like it should have worked well there.
- d: Why would it work well there?
- jm: Why would it not work well?
- d: ... The boundary between background and foreground is where you are going to run into problems and temporal is not going to do well there either (just like spatial). I would expect temporal to do worse than median of 4 spatial on large areas that move consistently. On the background which doesn't move, it doesn't matter which one you use. On the foreground, it doesn't make a big difference but I expect spatial to do a little bit better. I've never seen it where temporal is the only predictor you have.
- j: We could choose from both, without accounting for the signaling cost, just to check.
- jm: Also I could see how often it is used in that case.
- d: One thing I could think of is that it would do a good job if something is spinning.
- jm: Why?
- d: Always the same motion in the same place, but it varies spatially. So the spatial prediction is a bit off and the temporal predictor is pretty good.
- jm: Do we have any spinning videos?
- d: Some stuff spins in mobile and in cactus. Tractor has huge spinning wheels.
- jm: has anyone done long range predictors?
- d: People have done it with two pass. You use explicit signalling. It stinks.
- jm: The general idea I had is to compute a running histogram of MVs and make a predictor out of the one or two most likely ones (see the histogram sketch after this list).
- d: When you say 1 or 2, that gets "complex" in the computational sense. One is perhaps not that bad. Maybe one is that bad.
- jm: It could be the top non-zero one.
- t: How does this help over median predictors?
- d: You have some kind of pan and the background has consistent motion. I'm not sure it's that great because you still have up-right in most cases. Something should intersect the background. The current issue with the median of 4 is that when up-right is the only good predictor, you are going to throw it out. This is where HEVC and VP9 got gains: instead of discarding the others, they collect them as multiple predictors and signal which one they use (see the candidate-list sketch after this list).
- t: You could do something simple by weighting them by tossing them into the median 2 or 3 times.
- d: They build lists of unique MVs.
- jm: Unique as in very close or equal or different?
- d: Equal or different. Then they signal which one from the list they use. In HEVC this is complicated and I can't explain it well. But you do sort of what you said. You have a list, you pick from the list, and you code a residual. In VP9 it's a little different. You build the list and the first entry is the most popular, etc. Then you can either code something using the first one and code a residual, or take the first one with no residual or the second one with no residual. I think there is a fourth option too. And the fourth is use zero with no residual.
- jm: How are the first and second entries built?
- d: The most popular from your neighbors. If you don't have enough unique ones then you fill in with zeros.
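As a rough illustration of the median-of-4 spatial prediction jm asks about above, here is a minimal C sketch. It assumes the common convention of dropping the largest and smallest of the four candidate values and averaging the two in the middle; the candidate set and the temporal fallback are illustrative, not Daala's actual code.

```c
#include <stddef.h>

/* Illustrative MV type; not Daala's actual definition. */
typedef struct { int x; int y; } mv;

/* Median of 4: drop the min and max, average the two middle values. */
static int median4(int a, int b, int c, int d) {
  int lo = a;
  int hi = a;
  if (b < lo) lo = b;
  if (b > hi) hi = b;
  if (c < lo) lo = c;
  if (c > hi) hi = c;
  if (d < lo) lo = d;
  if (d > hi) hi = d;
  return (a + b + c + d - lo - hi + 1) >> 1;
}

/* Median of the four spatial candidates (e.g., left, up, up-right,
   up-left), applied separately to x and y. */
mv predict_mv_spatial(const mv cand[4]) {
  mv p;
  p.x = median4(cand[0].x, cand[1].x, cand[2].x, cand[3].x);
  p.y = median4(cand[0].y, cand[1].y, cand[2].y, cand[3].y);
  return p;
}

/* The temporal alternative jm tested: reuse the co-located MV from the
   previous frame when that grid point exists, otherwise fall back to the
   spatial median. */
mv predict_mv(const mv cand[4], const mv *prev_colocated) {
  if (prev_colocated != NULL) return *prev_colocated;
  return predict_mv_spatial(cand);
}
```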
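A small sketch of jm's running-histogram idea: track how often each non-zero MV has been coded and expose the most frequent one as an extra long-range predictor. The fixed-size table and the lack of any decay or eviction policy are assumptions for illustration only.

```c
#define HIST_SIZE 64

typedef struct { int x; int y; } mv;

/* Zero-initialize before use (e.g., with memset). */
typedef struct {
  mv vec[HIST_SIZE];
  int count[HIST_SIZE];
  int used;
} mv_hist;

/* Record one coded MV; only non-zero MVs are tracked. */
void mv_hist_add(mv_hist *h, mv v) {
  int i;
  if (v.x == 0 && v.y == 0) return;
  for (i = 0; i < h->used; i++) {
    if (h->vec[i].x == v.x && h->vec[i].y == v.y) {
      h->count[i]++;
      return;
    }
  }
  if (h->used < HIST_SIZE) {
    h->vec[h->used] = v;
    h->count[h->used] = 1;
    h->used++;
  }
  /* A real implementation would also decay or evict old entries. */
}

/* Returns 1 and writes the top non-zero MV if the histogram has one. */
int mv_hist_top(const mv_hist *h, mv *out) {
  int i;
  int best = -1;
  for (i = 0; i < h->used; i++) {
    if (best < 0 || h->count[i] > h->count[best]) best = i;
  }
  if (best < 0) return 0;
  *out = h->vec[best];
  return 1;
}
```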
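A sketch of the candidate-list scheme derf describes, loosely modeled on VP9's NEWMV/NEARESTMV/NEARMV/ZEROMV modes: collect the neighbors' MVs, keep the unique ones ordered by how often they occur, pad with (0,0), and then signal which entry is used and whether a residual follows. Everything below is illustrative, not actual VP9 or Daala code.

```c
typedef struct { int x; int y; } mv;

typedef enum {
  MODE_NEW,     /* predict from list[0] and code an MV residual */
  MODE_NEAREST, /* use list[0] with no residual */
  MODE_NEAR,    /* use list[1] with no residual */
  MODE_ZERO     /* use (0,0) with no residual */
} mv_mode;

/* Build a candidate list of up to `max` (<= 16) unique MVs from the `n`
   neighbor MVs, most frequent first; fill remaining slots with (0,0).
   Returns the number of real (pre-padding) entries. */
int build_ref_list(const mv *nbr, int n, mv *list, int max) {
  int cnt[16] = { 0 };
  int i, j, used = 0;
  /* Deduplicate while counting how often each unique MV appears. */
  for (i = 0; i < n; i++) {
    for (j = 0; j < used; j++) {
      if (list[j].x == nbr[i].x && list[j].y == nbr[i].y) break;
    }
    if (j < used) cnt[j]++;
    else if (used < max && used < 16) {
      list[used] = nbr[i];
      cnt[used] = 1;
      used++;
    }
  }
  /* Selection sort by count, most popular first. */
  for (i = 0; i < used; i++) {
    int best = i;
    for (j = i + 1; j < used; j++) if (cnt[j] > cnt[best]) best = j;
    if (best != i) {
      mv tv = list[i];
      int tc = cnt[i];
      list[i] = list[best];
      cnt[i] = cnt[best];
      list[best] = tv;
      cnt[best] = tc;
    }
  }
  for (i = used; i < max; i++) {
    list[i].x = 0;
    list[i].y = 0;
  }
  return used;
}
```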

(missed some stuff about packetization)

- d: The way decoders work is they want all the information for one superblock/macroblock at one time. In order to do that if you have all the info next to each other in the packet, once you get the packet you can decode that area of the screen. Whereas if you have them in different packets, if I get a packet for some set of DCT coeffs, now I need to wait for the packet with MVs before I can go about decoding the screen. And then I have to shuffle state and wait etc. It starts to make things a real mess.
- jm: And yet you can at least parallelize the entropy decoder by a factor of two.
- d: What you're already going to be doing is splitting up into tiles and parallelizing that way.
- jm: You can make tiles twice as big.
- d: At the cost of additional complexity though. If you split this, then you have dependent packets rather than independent ones.
- jm: You could have two packets per tile.
- d: The number of packets per tile depends on how complex the tile is. You might get 3 packets for residual and 2 packets for MVs and then you have to wait. We have things more separable than other codecs, but I'm hoping to fix that instead of institutionalizing it. Part of the 32x32 block size work is removing the border region around the grid. I still don't know if that actually works better. This would leave no grid points outside the frame.
- jm: Are you sure that is not hurting us somehow?
- d: That's what I'm trying to test.
- jm: That it's not clamping MVs to 0,0 anywhere? Can we make sure we have smooth MVs across the whole image? It hurts any time we have a border of non-zero; we're paying a price. Why was this done that way?
- d: To make the number of MVs that we had to signal the same as a normal codec. The number of level 0 MVs is the same as the number you'd have if you had one per macroblock. That was a thought I had in 2006 and I don't claim it is a great thought. In any case, I'm working on fixing this. If you are going to center on 32x32 all my borders are 4 grid points instead of 2 grid points, etc.
- u: What happens when we go to 64x64?
- d: If I do it right, nothing changes, because the grid now lines up with the transform blocks. If I instead make all the twos into fours, then I'd have to go back and make fours into eights. So fixing this by changing the structure makes it easier to keep expanding the block size.

...

(jack leaves, derf and yushin discuss "stage" vs. "phase" nomenclature)