DaalaMeeting20141014: Difference between revisions

From XiphWiki
Latest revision as of 15:57, 6 February 2015

# Meeting 2014-10-14

Mumble:  mf4.xiph.org:64738

# Agenda

- What monty's up to
- reviews
- work week https://daala.etherpad.mozilla.org/forth.orgworkweek-201410
- hv intra now with TF https://review.xiph.org/485/
- gain/theta/skip coding
- deringing
- go(ing) west

# Attending

jmspeex, unlord, xiphmont, jack, derf

# Reviews

- JM is waiting on review for #478.

# What Monty is doing

- x: Still working on pseudo-directional transforms. Though if it works (don't know yet), it is a variant of Jean-Marc's idea -- encoding a strip of pixels normal to the prediction direction. I'm still trying to do this via a postfilter after the transform that tries to compact the directional edges into a single column or single row. This is the kind of thing that is possible; the question is whether or not there exists a simple filter to do it so that it's practical to compute. That stalled before because I was flailing on the search mechanism, but I figured out how to do that and I've got code for it. If we do manage to compact the energy into a single row, it's a variant of Jean-Marc's thing. We only need to predict that single row and only from other single rows. That may save the original intraprediction. If this works out it dramatically reduces the cycles.
- d: The complexity just moves to this postfilter.
- x: That is the worry, but I'm hoping it's still a win. Worried that it won't be a big enough win.
- jm: You're trying to take into account lapping?
- x: Yes, but lapping makes it easier. The lapping increases compaction, so there are fewer coeffs.
- jm: You are trying to be critically sampled as well?
- x: Of course. For the search, I'm making the assumption that we have a limited number of lifts, and the lifting chains will not end up more than a couple of coeffs deep.
- d: Which block sizes?
- x: 4x4 only.
- d: Good because you can make 4x4 work but not anything else.
- x: I hope that it gives us a pattern for other block sizes.
- d: I'm less hopeful about that.
- jm: Are you considering multiple block sizes?
- x: Not yet, but I was hoping that if we have a single lift pattern that is expanded then that pattern would help us handle multiple block sizes without TF.
- jm: The main issue with block sizes is that your stuff doesn't need to interact with surrounding blocks, but you have the lapping problems on all four sides.
- x: This is entirely transform domain.
- jm: If you're going to have a 16x16 block, the lapping can be for 16, for 8 or for 4, same on bottom, left, and right.
- x: I'm not predicting from entire blocks anymore.
- d: JM is just talking about the transform.
- x: What is the worry?
- jm: The transform needs to account for lapping.
- x: In the transform domain they don't look different.
- jm: It's not the case of lapped and unlapped. There are 81 ways to do lapping for a 16x16 block.
- x: That changes the quantities but not the qualities. Different side lobe heights but not different side lobes. I am pursuing this with the assumption that the block sizes don't change it qualitatively.
- jm: The transform is supposed to predict a 45 degree pattern right?
- x: Right.
- jm: If I have one block with b&w with the limit being a 45 degree boundary, can you code that with N non-zero coeffs?
- x: Have you looked at what the lapping does in transform domain? It causes refractions and reflections along the edge. Those don't change with lapping support, only the strength changes. The filter doesn't eliminate that, there is still some information there. I don't have a filter present yet, but the lapping appears to reduce the amount of information I need to deal with. Given that, for directional edges, our 2D DCT doesn't compact those, the lapping reduces those.
- jm: What is expanding?
- x: Directional edges going through our lapped 2D transform. In the transform domain there is an apparent expansion of complexity. Lapping reduces that apparent expansion. I'm not attempting to find a perfect filter, just a very good filter. I don't think the lapping changes information in such a way that it needs a different filter for the combinations.
- d: I worry that you spend a bunch of time getting 4x4 working but it falls apart at higher sizes. That's what happened to the original stuff.
- x: Did 4x4 actually work without sparsification?
- u: Yes, it worked well both ways. I can dig up some old graphs if you want.
- d: One possibility is that we could do intra just for 4x4.
- x: The block size is so small that it can only win if you can do it without signalling.
- d: 4x4 worked when everything was 4x4. I don't think it worked with multiple block sizes.
- u: We need to run these tests again instead of speculating.
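
Note on the "81 ways" figure: it follows from jm's description if each of the four edges of a 16x16 block can independently use a lapping size of 4, 8, or 16, giving 3^4 combinations. The Python sketch below only enumerates those combinations; the names `LAP_SIZES`, `EDGES`, and `lapping_configs` are illustrative and not taken from the Daala source.

```python
from itertools import product

# Assumption for illustration: each edge of a 16x16 block independently
# uses a lapping size of 4, 8, or 16, as described in the discussion.
LAP_SIZES = (4, 8, 16)
EDGES = ("top", "bottom", "left", "right")


def lapping_configs(sizes=LAP_SIZES, edges=EDGES):
    """Return every assignment of a lapping size to each block edge."""
    return [dict(zip(edges, combo))
            for combo in product(sizes, repeat=len(edges))]


configs = lapping_configs()
print(len(configs))  # 3 ** 4 = 81
```

A filter search that had to treat each combination separately would need 81 cases at this block size alone, which is the concern jm raises; x's position is that the combinations differ only in strength, so one filter may cover them.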

# gain / theta / skip coding

- jm: Last week Tim suggested adding a special flag for skipping everything beyond this band. In most cases, the code we have right now will use gain=0 more often than skipping for high bands. The reason for this is that there isn't very much in the first place. A large fraction of the predictions we have will have less energy than the quant threshold.
- d: Which is completely expected.
- jm: I'm not exactly sure what to do about this. Especially as gain=0 has better distortion.
- d: Slightly better right? The thing that concerns me is that we have two ways to code this thing, but we should not spend a bit on signalling that. We should figure out a way to prefer one of these in terms of the signaling cost.
- jm: In the case you don't mind a desync, it's easy to see the gain of the reference is below a certain amount so don't allow gain=0 or skipping. This is easy to do. If you do mind desync on error, ...
- d: Which we do.
- jm: I have no idea what to do about that case.
- d: My gut says you prefer to skip always and make skipping cheap to code. But that still leaves this redundancy in there.
- jm: There are things you can do where you pretend you are skipping and the code behind it just sets gain=0 if it's better for distortion. One thing I'm trying to do at the same time is to jointly code theta and the gain. My first attempt didn't work well, so I've started doing this again. I can actually get some slight improvement by doing it as long as I don't attempt to put noref in there. I think noref is more efficiently encoded jointly across all bands.
- d: Maybe using something vaguely like the SPIHT stuff, just on the gains. Basically, at the top level you would code the maximum gain of all the bands below you. Then you signal whether the current band is the maximum or less and do that for the bands below that. In normal SPIHT you'd hierarchically do that for each band. The idea is that if I code that the maximum gain=0 at the top level, then I've told you what all the gains are.
- jm: A while ago I looked at this correlation and I didn't find very much correlation. Essentially there are a lot of things you could couple, but you can only couple two of them. Also, consider that for inter you are not coding gain, but the gain difference.
- d: My point is that you have some kind of hierarchical way to limit the range of these things up front. I would expect larger gain differences in LF vs. HF. If you can expect LF to have larger changes than HF, then when you code the maximum gain I'm going to have a biased probability for the gain change. And I've cut out a huge amount of possibilities.
- jm: For the lowest quadrant, if there isn't much change, I can code gain=0 or nonzero, combined with theta being noref, 0, or more than zero, combined with whether you want to skip everything else. You could have all this in a single symbol.
- d: My concern is less the number of symbols but cutting down the number of possibilities you have to encode. That's the point of coding the max instead of the sum.
- jm: We have 16 so what's the point in having less than 16?
- d: I'm not saying you have less than 16. In a single symbol my maximum gain would be up to 2^16 (log structure) and from there code what is the delta down to the actual gain. It is using two different symbols, but it still uses all 16 values.
- jm: What you're describing sounds like it would work at high bitrate, but I've mostly been looking at low bitrate cases.
- d: At low bitrates I think this works well. I could code all gains with one symbol. If one is 1 then it might take two symbols. It's not the hard and fast you have to skip everything or nothing.
- jm: I'd like to be able to, in a single symbol, say I'm coding some residual for a specific quadrant, and nothing for the higher frequencies.
- d: My idea would take two, but I'm not sure that's a huge disadvantage.
- jm: We're pretty much stuck having two schemes? One for robust and one for non-robust?
- d: Yeah, it sucks, but not sure how to avoid it.
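
Note on d's maximum-gain idea: one possible reading can be sketched in Python as below. It is a simplified illustration, not the scheme discussed for Daala. It assumes non-negative integer gains ordered from low to high frequency, codes the maximum over all remaining bands first, then codes each band as a delta down from that maximum, and stops as soon as the remaining maximum is 0. The function names and the raw list of symbols (no entropy coder, no theta, no noref) are assumptions made for the example.

```python
def encode_gains(gains):
    """Turn a non-empty list of band gains into a list of symbols.

    The first symbol is the maximum gain over all bands. For each band
    (except the last) we emit its delta below the current maximum; when
    a band attains the maximum we also emit the maximum of the bands
    that follow it. Coding stops once that maximum reaches zero.
    """
    n = len(gains)
    remaining_max = max(gains)
    symbols = [remaining_max]
    for i, g in enumerate(gains):
        if remaining_max == 0 or i == n - 1:
            break  # rest are all zero, or the last band is implied
        symbols.append(remaining_max - g)
        if g == remaining_max:
            remaining_max = max(gains[i + 1:])
            symbols.append(remaining_max)
    return symbols


def decode_gains(symbols, num_bands):
    """Invert encode_gains for a known number of bands."""
    it = iter(symbols)
    remaining_max = next(it)
    gains = []
    for i in range(num_bands):
        if remaining_max == 0:
            gains.append(0)  # everything left is known to be zero
        elif i == num_bands - 1:
            gains.append(remaining_max)  # last band must hold the max
        else:
            g = remaining_max - next(it)
            gains.append(g)
            if g == remaining_max:
                remaining_max = next(it)
    return gains


print(encode_gains([0, 0, 0, 0]))  # [0]
print(encode_gains([3, 1, 0, 0]))  # [3, 0, 1, 0, 0]
```

In this sketch an all-zero set of gains costs a single symbol, and each later symbol is bounded by the current maximum, which is the range-limiting property d describes. It does not give jm's single-symbol "residual in one quadrant, nothing above" case.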

# Deringing

- d: I have a few bugs in my rewrite of interp_pixel but I'm looking at that.
- jm: Do you want to see this landed?
- d: I don't see any real blockers to it at this point.
- jm: I'd like to add more directions.
- j: Can we signal this in the bitstream so that as complexity budget increases the directional search can increase?
- d: Yes. The limitation is implementation complexity, but this seems doable.
- j: Will this get landed before the work week?
- d: Probably a project for the work week.
- jm: Have you played with it on actual images?
- d: Enough to confirm that my version wasn't working :) It seems to do good things most of the time.
- jm: Are you just rewriting it to avoid sqrt etc?
- d: Exactly.
- jm: Is it bit exact?
- d: Down to rounding error.
- j: Is this one of the projects we can hand off to Andreas to optimize?
- d: Needs a bit of algorithmic optimization first.
- jm: There's a whole bunch of things that I'm considering changing. The direction search is costed on a single block, but probably we want to add up the directions from neighbors so the directions will be more consistent.
- d: Some kind of regularization.
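
Note on the regularization idea: the minimal Python sketch below shows one way the direction choice could take neighbors into account. It assumes each block already has a per-direction cost where lower is better, and adds a weighted sum of the four neighbors' costs before picking the minimum. The cost definition, the `weight` parameter, and the function name `pick_directions` are hypothetical; the actual deringing search may score directions differently.

```python
def pick_directions(costs, weight=0.5):
    """Pick one direction per block from neighbor-regularized costs.

    costs[y][x][d] is an assumed cost (lower is better) of direction d
    for the block at column x, row y. Each block's cost is augmented
    with `weight` times the costs of its up/down/left/right neighbors.
    """
    rows, cols = len(costs), len(costs[0])
    ndirs = len(costs[0][0])
    result = []
    for y in range(rows):
        row = []
        for x in range(cols):
            total = list(costs[y][x])
            for dy, dx in ((-1, 0), (1, 0), (0, -1), (0, 1)):
                ny, nx = y + dy, x + dx
                if 0 <= ny < rows and 0 <= nx < cols:
                    for d in range(ndirs):
                        total[d] += weight * costs[ny][nx][d]
            # Index of the smallest regularized cost.
            row.append(min(range(ndirs), key=total.__getitem__))
        result.append(row)
    return result


# One row of three blocks, two candidate directions each.
costs = [[[1, 5], [3, 2], [1, 5]]]
print(pick_directions(costs, weight=0.0))  # [[0, 1, 0]]
print(pick_directions(costs, weight=0.5))  # [[0, 0, 0]]
```

With `weight=0.0` the middle block picks its own marginally cheaper direction; with neighbors included it follows the direction both neighbors strongly prefer, which is the consistency jm is after.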