DaalaMeeting20150421

# Meeting 2015-04-21

Mumble:  mf4.xiph.org:64738

# Agenda

- reviews
- MONTY'S MAGIC 8-12-BIT BALL STUFF
- yushin's 'no sad' experiments & any update from jm (relating to scalar W * lambda * abs(X-Y))
- Haar update
- No MV grid border and 32x32
- call with Vidyo team at 10

# Attending

yushin, daala-person, jmspeex, MrZeus, TD-Linux, unlord, derf, xiphmont, tmatth

# reviews

- derf not blocked waiting for monty's reviews
- unlord to land Suhas' code from IETF

# monty (move from 8bit to 12bit ref frames)

- x: latest improvement gives 7.5% on some clips, on video1 short
- x: one concern I have remaining is that disabling all of the assembly makes this really slow.  On some clips, at exceptionally low bit rates, e.g. 0.007 bpp, the curves do cross, which is not what I would expect.
- x: as far as I know this is the only place where 12 bit is not as good as 8bit
- d: where are these places?
- x: let me look for those.  akiyo is one of them, and it happens on all metrics.
- x: that is something I would not expect, since on akiyo it is otherwise getting fairly solid gains.  I would like to figure out why that is before landing the patch.
- j: do you have any reason to believe the one in akiyo isn't just a fluke?
- x: no, but I have no reason to believe it is a fluke either.
- x: the answer is I don't know, but I would like to know.
- x: for the most part, the bugs that have been present have not been especially subtle.  The overflows have a tendency to create very small changes.  We have an automated overflow checking tool, but the problem is that full precision is slow (absent 16-bit versions of the assembly) and we cannot run it on AWCY.
- j: you could run it on just the akiyo example to see why its breaking there
- x: yes, though I only found out about that from AWCY this morning
- x: it is good news, but it's not complete.
- j: what was the original problem that made this not an improvement?
- x: two bugs, one was a miscast, the other was an overflow in 1fmv
- j: just plain bugs, as opposed to things we did not take into account.
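The kind of overflow being discussed can be sketched: an intermediate that fits in 16 bits with 8-bit references can silently wrap once the references are 12-bit. A minimal illustration (the 2-tap filter with weight 64 is made up for the example, not Daala's actual interpolation filter):

```python
def wrap16(x):
    """Wrap an integer to signed 16-bit, like overflow of a C int16_t."""
    x &= 0xFFFF
    return x - 0x10000 if x >= 0x8000 else x

def interp16(a, b):
    """Hypothetical 2-tap interpolation, 64*a + 64*b, accumulated in 16 bits."""
    return wrap16(64 * a + 64 * b)

# With 8-bit samples the worst case fits: 64*255 + 64*255 = 32640 < 32767.
assert interp16(255, 255) == 32640
# With 12-bit samples the same code wraps: 64*4095*2 = 524160 becomes -128.
assert interp16(4095, 4095) == -128
```

Because the wrapped result is only slightly wrong in low bits much of the time, such bugs produce the "very small changes" mentioned above rather than obvious corruption.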

- d: the code was intentionally overflowing before (as in the overflow was producing the correct result).
- j: nice
- x: you really should comment those places
- d: yes, there are many places in this code that should be improved
- d: I will say that the 1fmv8 is one fixed motion vector with 8 bit reference frames
- d: I hope you removed the 8
- x: my liberal agenda is to ensure there is only a 16 bit version
- x: but it even improves ducks_take_off now, which was surprising
- x: it shows about the amount of improvement on ducks_take_off that I would expect on all of the clips
- j: why were you surprised specifically about ducks_take_off?
- x: because ducks_take_off is not the kind of clip where you would expect large improvements
- d: I think the issue there is that because everything is being recoded all the time, everything is being shoved through the quantization loop.

- j: the improvement on johnny is pretty awesome
- x: on fastssim too
- j: chinaspeed is a regression, and a pretty significant one
- x: I have something new to play with then
- j: what is actually odd about chinaspeed is that the curve that looks odd is the 8bit one, it has this bump
- x: a bunch of them have that; it's not unique to chinaspeed, but you are right that a lot of the ones that have it are 8-bit
- x: this is the best example of it so this is the one to look at [to debug this]

# yushin 'no-sad' experiments

- y: I tried some variants of SAD in the frequency domain with the DCT, but they have not shown any gains yet
- y: I was wondering if there was any update with jean-marc relating to scalar
- j: I still need to write this up, but was waiting to see how yushin's experiments worked
- d: did you try regular SATD? With the Hadamard?
- y: Oh, no, this is DCT.

- d: so what specifically have you tried?  Have you tried replacing the one metric used everywhere?
- y: yes
- y: on master I tried changing the metric used for stage 3 and stage 4 and saw 8% and up to 27% gain
- y: I replaced SAD with SATD and used it in sub-pel refinement, but it's still
- y: by default stages 1-4 are on, but I wanted to know what happens if you turn off stage 4, the subpel refinement; turning this off you lose 7-8%
- y: turning off stage 3 you lose 27%
- d: I thought we had done this before and on ntt-short we lost 15%
- y: yes but I wanted to try this on current master
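For reference, SATD as discussed here is the sum of absolute values of the (typically Hadamard) transformed residual. A toy 4x4 sketch, assuming an unnormalized Hadamard, shows why it ranks residuals differently than SAD:

```python
import numpy as np

def hadamard4():
    """4x4 unnormalized Hadamard matrix, built from the 2x2 one."""
    h2 = np.array([[1, 1], [1, -1]])
    return np.kron(h2, h2)

def sad(a, b):
    return int(np.abs(a.astype(np.int32) - b.astype(np.int32)).sum())

def satd(a, b):
    h = hadamard4()
    d = a.astype(np.int32) - b.astype(np.int32)
    return int(np.abs(h @ d @ h.T).sum())

a = np.zeros((4, 4), dtype=np.int32)
flat = np.full((4, 4), 1, dtype=np.int32)   # constant residual: compresses well
spike = np.zeros((4, 4), dtype=np.int32)
spike[0, 0] = 16                            # impulse: spreads over all coefficients

assert sad(a, flat) == sad(a, spike) == 16  # SAD cannot tell these apart
assert satd(a, flat) == 16                  # all energy lands in one coefficient
assert satd(a, spike) == 256                # energy in every coefficient
```

The point is only that SATD better tracks post-transform coding cost, which is why swapping it in for SAD interacts with the lambda tuning discussed next.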

- d: one thing I was worried about was that for stages 1 and 2 there are places where we are tuned based on the metric used
- d: its basically the block size
- d: there is the termination threshold, the other thing you have to worry about is the lambda scaling
- y: I mostly tried changing the lambda
- d: I'm more worried you have to use a different one for SAD and SATD
- d: if you replaced the metric in some of those stages, did you also change the lambda in just those stages?
- y: yeah sure
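The lambda-scaling concern can be made concrete: in a cost J = D + lambda*R, switching D from SAD to SATD changes its typical magnitude, so decisions flip unless lambda is rescaled. A toy illustration with made-up rates and distortions:

```python
# Two candidate MVs: one cheap to code, with slightly worse distortion (made-up numbers).
rate = {"mv_a": 2, "mv_b": 10}             # bits
sad = {"mv_a": 120, "mv_b": 100}           # SAD distortion
satd = {k: 4 * v for k, v in sad.items()}  # pretend SATD runs ~4x larger on average

def best(dist, lam):
    """Pick the candidate minimizing D + lambda * R."""
    return min(dist, key=lambda k: dist[k] + lam * rate[k])

lam = 3.0
assert best(sad, lam) == "mv_a"       # with SAD, the cheap MV wins
assert best(satd, lam) == "mv_b"      # same lambda with SATD flips the decision
assert best(satd, 4 * lam) == "mv_a"  # rescaling lambda by the metric ratio restores it
```

This is only a sketch of the interaction; the actual ratio between the metrics (and hence the right lambda) has to be measured, which is what the per-stage tuning above is about.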
- j: so the hard thing is that this could have interactions with intra-mode, basically when we finally give up on trying to get a better motion vector
- y: no, I don't think so
- j: no, I mean right now we don't have intra, so we always choose motion vectors even when we are going to discard them all
- y: you mean the no-reference case?
- j: yes, later when we use no-ref; right now
- y: I know you have changed the default SATD value and got some gains, right?
- d: you mean the default lambda value
- y: that's right
- y: so, I'm going to spend some more time figuring out why it's not working
- y: currently my explanation is that motion vector coding is independent of the residual coding
- d: can you post your patches for that somewhere?
- y: yes, it's all checked in, I'll send you a link to that

# derf No MV grid border and 32x32

- d: I did find one more bug when removing the grid border stuff, a bug that kind of exists in master but does not matter there
- d: the bug is that we get the bounds of the motion vector search wrong for grid points that are on the border of the image frame
- j: the bounds that prevent you from actually looking way outside the image for no reason
- d: so what was actually happening was that for a motion vector grid point right on that edge: it's OBMC, so any grid point affects four block boundaries, and it would compute how many pixels to the right it affects, and how far it
- d: it was saying there are an extra 16 pixels to the right
- j: so it was forcing you to code non-zero motion vectors?
- d: yes, originally I thought this was a problem we had with frame sizes that were not a multiple of 32.
- d: I had thought the problem was that we were doing a bad job of estimating the motion vectors on the boundary, but we were both spending bits on it and screwing up our prediction of the padding which the PVQ stuff would then try to code
- j: so what is the impact on recentering the grid without changing the motion vector size
- d: so I did two runs on that, one where I changed the motion vector grid without improving the padding, and one where it has the current behavior where it fixes the padding (which is 5B)
- j: I see mvgrid5b; should that be compared to master, 0408?
- d: that's the one
- j: so 1.4% that is not so bad
- d: well, it was basically exactly the same as we had before, but we don't fall apart on some clips now.

- d: flowervase is my favorite example
- j: ouch
- d: what is happening is it is zooming out, and you specifically want motion vectors that point outside the frame, and it's clamping them to point inside the frame
- j: yeah I see
- d: that is the one I could not explain by this border stuff since that is an image size that is exactly a multiple of 32 so there is no padding at all.

- j: how is the 32x32 part going?
- d: I did a run yesterday and it was bad, so something is wrong.  One theory is that our flag probabilities are now very wrong since we do very little splitting
- j: so the code that right now at level 2 was 8x8, suddenly the same code becomes 16x16?
- d: all I did was the two line patch to change the minimum and maximum MVB sizes
- d: after I went through and replaced all these hard coded constants, that is what the patch devolved to
- x: everyone is using 64bit float today
- t: too bad 128bit float is not standardized

- x: I wanted to ask if there was anything specific to bring up during the call other than a status update
- d: I don't think there is anything non-obvious to talk about

# jm Haar

- j: so for Haar basically, I have been trying to understand how the SPIHT stuff works, and kind of failed
- j: for some reason the code I wrote worked out better than the code tim wrote
- d: the differences now come down to context modelling
- j: there is still something like a 7 or 8% difference on synthetic content
- d: did you read the paper that yushin posted yesterday
- j: so the problem I have is that I read these papers and have no idea how they work
- d: so would you like to go through it?
- j: yes, this is the one with the different levels
- j: http://www.cipr.rpi.edu/~pearlman/papers/TSP06_ZT_cp.pdf
- d: where does this stop making sense?
- j: right at the beginning, I don't even know what they mean by zero trees
- d: he has a nice picture of these, lets look at figure 1
- d: like all wavelet coders, it's a bit-plane method
- y: whenever you discard a bitplane, it's exactly the same as using a quantizer.  What coding theory calls this kind of quantization is successive approximation quantization: you can code or transmit the top bitplane and then transmit the other planes, and it's exactly the same as using a quantizer whose step is a power of two
- j: yeah, I understand it just sounds kind of silly because you lose all the RDO
- d: you get RDO in that you get a rate-distortion operating curve, just by picking a point in the stream
- d: but what you don't get is the chance to pick a quantizer and then do RDO
- d: so that's why bitplane coding
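The equivalence described above (dropping bitplanes = a power-of-two quantizer) can be checked directly; this sketch assumes a dead-zone quantizer that reconstructs at the bin floor:

```python
def truncate_bitplanes(c, n):
    """Keep only bitplanes n and above of |c| (drop the low n bits), preserving sign."""
    s = -1 if c < 0 else 1
    return s * ((abs(c) >> n) << n)

def deadzone_quantize(c, step):
    """Uniform dead-zone quantizer reconstructing at the bin floor."""
    s = -1 if c < 0 else 1
    return s * ((abs(c) // step) * step)

# Dropping the bottom 3 bitplanes is exactly dead-zone quantization with step 2**3.
for c in [-37, -8, 0, 5, 100]:
    assert truncate_bitplanes(c, 3) == deadzone_quantize(c, 8)
```

Truncating the stream one plane earlier doubles the step size, which is the rate-distortion operating curve derf mentions: each prefix of the bitstream is a valid coarser quantization.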
- d: back to figure 1, in the top bit plane here, bit plane 12, there is an example of a zero tree, you have one coefficient and you are declaring that all of the coefficients beneath it are zero.
- j: I don't understand why it is pointing to just some part of the high-high plane when its coming from all of the DC
- d: because remember, this is assuming you've done 5 levels of decomposition on an image that is much larger than 32x32, so the top level is not DC that covers the whole frame
- j: it appears to be taken from a high-high square at a much larger resolution, I don't understand why it becomes smaller all of a sudden
- j: I am looking at bit plane 12 and bit plane 11, the one in bit plane 11 is easier to see, it is a vertical square at relatively low frequency, why doesn't it propagate to the entire quadrant there
- d: the idea here is that it only covers the spatial coefficients, if you take one coefficient from the vertical band
- j: so now, what exactly do you call a zero tree?
- d: a zero tree is a collection of coefficients where you are saying all of the coefficients in this tree (everything at the same spatial location across bands) are insignificant at the current bit plane
- j: okay
- d: so what he defines in section 3 is this degree-k zero tree; the idea is that it's zero except at the top k levels
- j: so k=1 is more like what I am doing
- d: yes, you have symbols for decoding k=0 trees
- d: so yes, SPIHT is a degree-2 coding
- j: what do you mean by degree 2?
- d: as in it has symbols for both

- j: in the end if you do things optimally you end up using 4 bits, so it's equivalent
- d: I'm not sure you are doing things unary there
- j: is there an example of a degree-1 method there, and did what I wrote end up inadvertently being equivalent to that?
- d: I think what you wrote might have been a degree 1 method
- d: his argument in this paper is there is no advantage of degree 1 over degree 0
- d: that is the argument in the paper
- j: in my code, the place where I actually prune is saying when I am called with zero I actually abort
- d: what?
- j: never mind
- d: hopefully jm can understand this paper now, we can take it offline otherwise. I'd pay attention to the tables at the end.
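The zero-tree test derf describes can be sketched on a toy coefficient pyramid (the tree layout here is invented for illustration, not SPIHT's actual parent-child indexing):

```python
def significant(c, n):
    """A coefficient is significant at bitplane n if |c| >= 2**n."""
    return abs(c) >= (1 << n)

def is_zerotree(coef, children, node, n):
    """Degree-0 zero tree: the node and every descendant are insignificant at bitplane n."""
    if significant(coef[node], n):
        return False
    return all(is_zerotree(coef, children, ch, n) for ch in children.get(node, []))

# Toy pyramid: root 0 has children 1 and 2; node 2 has children 3 and 4.
children = {0: [1, 2], 2: [3, 4]}
coef = {0: 3, 1: 0, 2: 7, 3: 1, 4: 0}

assert is_zerotree(coef, children, 0, 3)      # all |c| < 8: zero tree at bitplane 3
assert not is_zerotree(coef, children, 0, 2)  # node 2 (|7| >= 4) breaks it at bitplane 2
```

A degree-k tree would instead allow the top k levels to be significant and only require the deeper descendants to be zero, which is what gives SPIHT its extra symbols.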



- j: last thing I wanted to ask, a variant of Haar that I was thinking about, and I'm wondering if that existed or had a name
- j: my idea is that right now with Haar if you have a continuous vertical line you end up with a whole bunch of identical coefficients
- d: you do?  I guess in the highest frequency band you do
- j: in the very highest frequency band you 
- d: now I don't know what you mean
- j: let's say you have a horizontal line across the entire image; as you move horizontally you will see the same coefficient repeated, so there is some redundancy there
- j: my thought is, what if we applied a 1-D Haar transform to these coefficients, so it would be vaguely wavelet-packets-lite; you would apply horizontal Haar on the vertical components and vertical Haar on the horizontal components, and the result would be that nearly all of the
- d: no I don't believe there is a name for this, but I am going to hand you a thesis
- j: the thesis is about exactly this?
- d: no it is a generalization of this, but lets talk about that on IRC
13:14:45 <+derf> jmspeex: 
https://people.xiph.org/~tterribe/tmp/Smith97-%20Integrated%20Spatial%20and%20Feature%20Image%20Systems-%20Retrieval,%20Analysis%20and%20Compression/
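The redundancy jm describes, and the effect of his proposed extra 1-D pass, can be sketched on a toy image (unnormalized Haar, made-up 8x8 example):

```python
import numpy as np

# Toy 8x8 image containing a single horizontal line.
img = np.zeros((8, 8), dtype=np.int64)
img[3, :] = 16

# Finest vertical-detail Haar coefficients: pairwise vertical differences.
detail = img[0::2, :] - img[1::2, :]
assert np.all(detail[1, :] == -16)  # one whole row of identical coefficients

# One extra 1-D horizontal Haar pass over that row compacts the redundancy:
row = detail[1, :]
out = np.concatenate([row[0::2] + row[1::2], row[0::2] - row[1::2]])
assert np.all(out[:4] == -32)  # the sum half carries the information
assert np.all(out[4:] == 0)    # the difference half vanishes entirely
```

Repeating the pass on the sum half would leave a single nonzero coefficient, which is the compaction the proposed variant is after.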