DaalaMeeting20140812

<pre style="white-space: pre-wrap;
white-space: -moz-pre-wrap;
white-space: -pre-wrap;
white-space: -o-pre-wrap;
word-wrap: break-word;">
Latest revision as of 16:00, 6 February 2015

# Meeting 2014-08-12

Mumble:  mf4.xiph.org:64738

# Agenda

- reviews
- coding party
- Name change
- Multi-frame MC? (greg)
- ??

# Attending

bkoc, derf, gmaxwell, jack, jmspeex, smarter, TD-Linux

# Reviews

- jm: let's clean up Rietveld before the coding party. if you know of reviews that aren't going to land, please close them.

# Coding party

# Name Change

- d: As some people may be aware, Andreas requested we consider rebranding. We met with the PR and branding people last week and went through a bunch of names. If you want to say something about this, you need to speak up now.

(discussion of name ideas)

# Multi-frame MC

- g: I got stuck on the optimization parts on that. I'm not sure how to make it work. I need to spend some whiteboard time with Tim. The stuff with handling multiple buffers and doing initial searches is working fine. What was the patch you assigned me last night?
- d: The bug that smarter found where it was getting the costs wrong in some cases. I doubt it makes much difference because it was being triggered about 4x a frame.
- g: I wonder if there is some test we can construct to make sure this is doing it right. We know it's not getting the MVs massively wrong, but perhaps the optimization is not working.
- d: The problem is that the objective is not to capture true motion; the objective is to get good RD performance. I wouldn't expect a one-pixel-off error. The current code has a bunch of checks in it but we are not running them regularly.
- g: Sounds like consistency checking is about the best we can do.
- smarter: This would be useful to have in the encoder checks.
- jm: Could we run this in jenkins?
- d: yes.
- g: I can add that to the CI tools. Some of these things we should make abort when they hit with an override for data collection. I can do that.
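The abort-with-override behavior g describes could be sketched as follows in C. This is purely illustrative: the function names and the DAALA_CHECK_CONTINUE environment variable are hypothetical, not the actual CI tooling.

```c
#include <stdio.h>
#include <stdlib.h>

/* Hypothetical override: e.g. DAALA_CHECK_CONTINUE=1 ./encoder ...
 * lets a data-collection run log failures instead of aborting. */
static int check_allow_continue(void) {
  const char *s = getenv("DAALA_CHECK_CONTINUE");
  return s != NULL && s[0] == '1';
}

/* Consistency check that aborts by default when a computed cost does not
 * match the expected one, with the override above for data collection.
 * Returns 1 on success, 0 on (non-fatal) failure. */
static int mc_consistency_check(int expected_cost, int actual_cost,
                                const char *where) {
  if (expected_cost == actual_cost) return 1;
  fprintf(stderr, "consistency failure in %s: expected %d, got %d\n",
          where, expected_cost, actual_cost);
  if (!check_allow_continue()) abort();
  return 0;
}
```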

# Intra paint

- jm: I have code to encode the mode, but not yet for the top row and left column.
- d: What's the bitrate on the modes looking like?
- jm: For 16x16 I had an average between 3 and 3.5 over subset1. For 8x8 the rate is slightly lower. For 32x32 it's a bit higher. Block size switching right now has kind of a high rate for mode signalling. I don't know yet whether it's caused by the block size decorrelating the mode values or some kind of bug in the way I do the encoding for multiple block sizes.
- d: Is it the bitrate going up, or is it the bits you spend on block sizes that go up?
- jm: When I turn on BSS the number of bits on modes goes up.
- d: That's interesting but I don't know the answer.
- g: I can see why it would do that. If there is a region that is all the same direction you'd switch to a large block there and not use as many bits.
- jm: I'm definitely aware of that effect, but the amount it goes up suggests something else. The context is not dependent on the block size, so there is a cost there.
- d: What are you doing for the context modeling now?
- jm: I model the probability of DC based on whether the causal neighbors are DC. There are 8 possibilities for the context, and for each of the 8 I adapt a probability. For the directions, every time I consider a certain mode, any of the causal neighbors using that mode gives it 2 points, and any of them using that mode +1 or -1 gets 1 point. And for each of these values (0-6 points) I adapt probabilities.
- d: I don't understand.
- jm: Each time a mode gets 6 points, I take the probability from a table that's adapted, and based on whether the mode was selected or not I update the table.
- d: You don't adapt zero?
- jm: No, that's a special case. That's for modes that have no causal neighbors that are close. Right now I'm using an actual CDF for it.
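The neighbor scoring jm describes might look something like this minimal C sketch (illustrative names, not the intra paint code; it assumes three causal neighbors so that the score lands in the 0-6 range mentioned above):

```c
/* For a candidate mode, each causal neighbor using exactly that mode
 * contributes 2 points and each neighbor using the mode +/- 1 contributes
 * 1 point. With three causal neighbors the score is 0..6 and would index
 * an adaptive probability table. */
static int paint_mode_score(int mode, const int *neighbors, int nneighbors) {
  int score;
  int i;
  score = 0;
  for (i = 0; i < nneighbors; i++) {
    if (neighbors[i] == mode) score += 2;
    else if (neighbors[i] == mode - 1 || neighbors[i] == mode + 1) score++;
  }
  return score;
}
```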
- d: Are you using the same set of modes for every blocksize?
- jm: The same set except that it's scaled to the size. The number of modes is proportional to the blocksize.
- d: So if your block sizes differ by 2 or more, then you are scaling by 4 and only considering neighbors 1 away.
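d's point about scaling can be made concrete: with the number of modes proportional to block size, comparing a neighbor's mode across a two-step size difference means shifting the index by two bits. A hypothetical sketch, assuming mode counts double with each block-size step:

```c
/* Rescale a mode index between block sizes whose mode counts are
 * proportional to size. from_ln/to_ln are log2 block sizes, e.g. 3 for
 * 8x8 and 5 for 32x32; a two-step difference scales the index by 4. */
static int scale_mode(int mode, int from_ln, int to_ln) {
  if (to_ln >= from_ln) return mode << (to_ln - from_ln);
  return mode >> (from_ln - to_ln);
}
```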


- d: It spreads it out much wider than when it was modeling at 32x32.
- jm: This is a first attempt and I'm sure we can do better.
- d: My initial thought was to pick one resolution (whatever the smallest number of modes is)...
- jm: The smallest is 8x8. The resolution is variable. For 8x8 I could have up to 30 modes, but I'm only encoding 16.
- d: My thought was that you use the lowest resolution modes and encode one at that resolution. Then for larger blocksizes you use some kind of refinement on top.
- jm: I could do something like that. I thought of a similar idea of coding the LSBs but my main concern with that is that the LSBs are not actually flat because of perfectly horizontal and perfectly vertical. There is also a slight bias for 45 degrees.
- d: I wasn't suggesting it would be flat but you might be able to do some kind of residual there.
- jm: It's something that can and should be tried. Right now I'm trying to get an actual bitstream out. Eventually I'd like to do chroma as well, ideally by reusing the same modes, not coding an additional set of modes, and probably doing some kind of CfL on the edges. I've basically rewritten the equivalent of an entire still image codec. It's the same complexity except I do 1D instead of 2D DCTs.
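d's coarse-plus-refinement suggestion could be sketched like this (hypothetical names; it assumes 8x8 is the lowest mode resolution and that mode counts double with each block-size step, neither of which is confirmed above):

```c
/* Split a mode at a larger block size into a coarse index at the 8x8
 * resolution plus refinement LSBs, and join them back. */
typedef struct { int coarse; int refine; } paint_mode_code;

static paint_mode_code split_mode(int mode, int ln) {
  /* ln: log2 block size; 8x8 (ln == 3) is the base resolution. */
  paint_mode_code c;
  int shift = ln - 3;
  c.coarse = mode >> shift;
  c.refine = mode & ((1 << shift) - 1);
  return c;
}

static int join_mode(paint_mode_code c, int ln) {
  int shift = ln - 3;
  return (c.coarse << shift) | c.refine;
}
```

As jm cautions, the refinement bits would not actually be flat near horizontal, vertical, and 45 degrees, so they would still need modeling rather than being coded raw.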
- d: Do you have a rough idea of where this will be at the coding party and a list of tasks for people?
- jm: There is the encoder integration that TD is working on.
- td: I've been working on AWCY instead. I could switch if you want.
- jm: I'm at the point where if we had it in the encoder we could have a reasonable list of the rates. What's the priority of this vs. AWCY, I don't know.
- d: Obviously both as soon as possible. What's the state of conflict resolution for the existing EC2 stuff?
- td: Right now it looks to see if there are any processes running and then bails. The new one reserves instances for yourself, so it's more reliable.
- d: Can the coding party people use it as it works now?
- td: yes
- d: Given that, focus on the intra paint integration. Maybe we can get some folks working on AWCY at the coding party itself.
- jm: An example of things that can be done is to look at other transforms than the DCT. For one of the edges the appropriate one would be this DST 6 or 7.
- d: 7 I believe.
- jm: but for the other it would be something different because it gets predicted from both sides.
- d: That probably winds you up with a DCT again.
- jm: I'm not sure actually. For this particular edge, I tried random noise modulated by a Hamming window and tried to look at what the basis functions were like, and they were quite interesting. Half looked like windowed basis functions, and the other half were high on the edges, low in the window, and had low eigenvalues. There's also what you do about the corner pixel, which gets encoded twice. I'm not exactly sure we want power-of-2-minus-1 DCTs.
- d: you could force 0 for some of them.
- jm: I don't like that. It will increase the L1 norm. You'd want to do something that minimizes the L1 norm. The value that minimizes the L2 norm is to put a zero in the time domain, which is probably not the best. But you don't want to force a zero. Forcing a zero would make things worse than what we have right now. Repeating the value next to it sounds better to me.
There's also better search, doing chroma, and having an actual decoder to check the encoder is actually correct.
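The padding jm prefers, repeating the neighboring value rather than forcing a zero, is simple to sketch (hypothetical function, only to illustrate the choice):

```c
/* Bring a short edge (e.g. 2^n - 1 samples) up to a power-of-two DCT
 * length by repeating the last sample, instead of inserting a zero,
 * which would add a discontinuity and raise the L1 norm of the
 * coefficients. */
static void pad_edge(const int *x, int n, int *y, int padded_n) {
  int i;
  for (i = 0; i < padded_n; i++) y[i] = x[i < n ? i : n - 1];
}
```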
- d: Put all that on the etherpad.
- jm: Daala should be the first codec to standardize just the encoder, not the decoder.
- d: I'll quote Thomas: "I don't feel like we need to innovate in that area." If other people have ideas, please put them in the etherpad. My goal is to clean up the MC stuff. It will not be fully done by the coding party, but I'm trying.

# EDI status

- s: I'm running training but I don't have any results yet. I have some questions for Tim later.
- jm: Did you try the experiments we suggested?
- s: 0.1 to 0.3dB worse for EDI.
- jm: Using what downsampling?
- s: I tried 2x2, average, and ???
- jm: If you can't beat the current filters with actual decimation then something is wrong with EDI. Hopefully that will be solved by this training.
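The 2x2 box average s mentions as a baseline is the simplest of these downsamplers; a sketch for reference (illustrative, not the actual tools code):

```c
/* Downsample a w x h 8-bit plane by 2 in each dimension using a plain
 * 2x2 box average with rounding. dst is (w/2) x (h/2), tightly packed. */
static void downsample_2x2(const unsigned char *src, int w, int h,
                           int stride, unsigned char *dst) {
  int x;
  int y;
  for (y = 0; y < h / 2; y++) {
    for (x = 0; x < w / 2; x++) {
      int sum = src[(2*y)*stride + 2*x] + src[(2*y)*stride + 2*x + 1]
              + src[(2*y + 1)*stride + 2*x] + src[(2*y + 1)*stride + 2*x + 1];
      dst[y*(w/2) + x] = (unsigned char)((sum + 2) >> 2);
    }
  }
}
```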
- s: How do you feel about 16 tap filters?
- d: That seems a little large.
- jm: It's not so much about complexity, but it will ring all over the place.


- d: Don't train on subset4. That's the test subset. Train on subset3. I think there is more than enough data for a six-tap filter. The reason there are two is so that you can validate on the other one. The two are small enough that you can actually look at the images. But the purpose is the same for both test sets.
- d: The point is you're trying to reduce ringing along edges. But having long filters is not going to be helpful for that goal.
- jm: Once you have actual filters, one thing that is useful to look at is to look at the frequency response of the filters.
- s: Can you write that down on IRC?
- jm: freqz