DaalaMeeting20140318

# Meeting 2014-03-18

Mumble: mf4.xiph.org:64738

# Agenda

- reviews
- future of det1
- fastssim
- Activity masking

# Attending

unlord, jmspeex, derf, gmaxwell, jack

# future of det1

- gmaxwell: jean-marc doesn't think it's necessary. because we're scaling up, we lose the bi-directional round-tripping feature.
- jm: the only case where det1 makes a difference now is lossless, not even high rate.
- g: that's assuming we don't turn down the scaling at some high rate
- d: i think it'll be a 0/1 switch. we won't turn it off.
- jm: there's really no point in not scaling up if we're not lossless. so det1 would only ever be useful at lossless.
- g: would we consider having separate prefilters at lossless vs. non-lossless?
- jm: i don't think we should make the normal non-lossless case worse for lossless.
- d: i don't think it's about making it worse. in theory it made it better except for 8x8.
- g: the dc basis function is not very smooth so it causes blocking artifacts.
- d: we had outlined 3 or 4 ways to smooth the basis functions. you tried yet another one and i don't know which one you tried.
- g: it was one of the ones we tried before. we measured the leakage (frequency response to the DC function). i'm not unconvinced that that could do something useful. the question is what does it buy us?
- d: it buys us stuff at lossless?
- g: at the cost of more multiplies. the rotations are comparable to the number of ?? in the steps. and the scales are related to the number of inputs.
- jm: if there are more multiplies it's because there are more degrees of freedom.
- d: there are fewer degrees of freedom.
- g: we definitely have more.
- d: then something is wrong and we need to sit down and figure out what.
- jm: speaking of DoF i came up with what we should normally have for a certain size. i pasted that in irc at the time.
- d: what you're trying to do is come up with DoF for an arbitrary det1 matrix, but that's not what we're doing.
- jm: what i got was consistent with some of the papers i saw. for the case of 16x8, there's 128 possible different transforms.
- d: you already lost me.
- jm: 16x8 transform. there are 16x8 coeffs. if the only constraint you have is invertibility then you can put whatever coeff you want in any element and it will work. then if you impose symmetry, you go to 64 DoF. if you don't want DC leakage, there's another 7 DoF that you lose. that means you're left with 57 DoF. if you consider that it doesn't matter if one basis function is scaled by 2 then you remove another 8 DoF (49). if you impose det1, then you get 48 DoF for a 16x8 transform. so i went the other way: if i impose that the core of the transform is orthonormal, how many degrees do i have for 8x8 orthonormal? 28. my reasoning here is that i have 48 DoF for the bi-orthogonal transform. 28 are taken by the DCT. so there are 20 DoF remaining.
- d: that's too many. that may be correct for a completely arbitrary lapped transform. but a good number of those DoF are useless because the guy who did this work did exhaustive searches and determined they were useless.
- jm: i looked at the paper and it said 8x8 had 21 DoF.
- d: what paper?
- jm: monty's demo. in monty's demo, see the v matrix near the bottom. s0 to s_n-1. you have the PNU type 3 or type 4. both of these would have 21 DoF for an 8x8 transform.
- d: what you're looking at has exactly 10.
- jm: what is n? 4 or 8?
- d: n is 4
- jm: this still has 12
- g: you can see it has the banded structure, so this has less.
- jm: would an arbitrary V matrix still have all the properties like symmetry and leakage?
- d: yes. the butterflies on the input are what guarantee symmetry. the prefilter and postfilter have the U and U^-1 matrices in them; the U matrix is arbitrary but it is set to the identity, so lots of DoF go away. setting it to the identity gets rid of DC leakage.
- jm: why are some DoF completely useless?
- d: i recommended reading tran's original research on the subject. nothing he did to U helped. for the V matrix, the stuff he did comes out as nearly band diagonal. that's what led to the type 3 and type 4 factorizations, which allow you to specify the coeffs on the diagonal very well and leave the off-diagonal stuff very small, which is what you want. you can do a more general version of the type 3 where you have pairwise rotations among all the pairs: n^2 multiplies instead of 2n.
- jm: the full n^2 would be redundant. in the case of 16x8 some of them may be useless but if you have more than 20 DoF it's not going to work.
- d: my point is that if you do enough of the pairs so that you don't have any redundancy then it works out correctly. doing extra pairs doesn't buy you anything. i only have his old paper that is split into part 1 and part 2. in part 2 section 3 he goes through the derivation and explains why these things are useless. i'll have to go over it with greg later and figure out how he got more DoF.
- jm: we can try det1 and we can keep it if it's good, but i wouldn't restrict myself to that.
- d: it has nice properties and when it made everything better it seemed like a good idea.
- jm: ??
- g: it did reduce the support of the transform. i think that in the 4x8 case, the ones we have checked in right now aren't problematic. now that i know i did something wrong with the lifters i should look into that. looking at the factorization i should lose 1 DoF. i'm not sure what i did, but i ended up with more.
- d: i know we did an early test to add in the extra pairs to make sure it didn't make a difference.
- g: i might have picked up the old code that had that. that's why i've been complaining about the search being a problem.
- jm: you mentioned you did some training with the leakage
- g: so far the results i'm getting out of that are crap. once i fix the DoF here that may change. taking 9 DoF out of your search is kind of magical.
- d: any point that was good in the old space doesn't mean you'd ever find it.
- g: i know what i need to do immediately. let's table this until i've done that.
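
jm's degree-of-freedom count for a 16x8 lapped transform (eight basis functions, each 16 samples long) can be re-traced one constraint at a time. This is only a sketch that redoes his arithmetic; the helper names are hypothetical, and reading each step as a formula in the block size (so it could be applied to other sizes) is an assumption that has only been checked against the 16x8 numbers from the meeting.

```python
# Re-tracing jm's degree-of-freedom (DoF) count for a 16x8 lapped transform:
# 8 basis functions, each spanning 16 input samples.
# Hypothetical helper names; only the 16x8 numbers come from the meeting.


def lapped_dof_steps(block=8, taps=16):
    """Running DoF count after each constraint, in the order jm listed them."""
    steps = []
    dof = block * taps                 # every coefficient free (invertibility only)
    steps.append(("invertible only", dof))
    dof //= 2                          # symmetric / antisymmetric basis functions
    steps.append(("symmetry", dof))
    dof -= block - 1                   # no DC leakage into the 7 AC basis functions
    steps.append(("no DC leakage", dof))
    dof -= block                       # scaling any one basis function doesn't matter
    steps.append(("ignore per-function scale", dof))
    dof -= 1                           # determinant fixed at 1 (det1)
    steps.append(("det1", dof))
    return steps


def orthonormal_dof(n):
    """An n x n orthonormal matrix has n(n-1)/2 free rotation angles."""
    return n * (n - 1) // 2


if __name__ == "__main__":
    steps = lapped_dof_steps()
    for name, dof in steps:
        print(f"{name:<28}{dof}")
    core = orthonormal_dof(8)          # DoF taken by an orthonormal 8x8 core (the DCT)
    print("left for the pre/postfilter:", steps[-1][1] - core)
```

Run as a script, it prints 128, 64, 57, 49 and 48 for the successive constraints, and 20 for what remains once the orthonormal 8x8 core takes its 28. d's objection in the meeting was not to this arithmetic but that many of those DoF turn out to be useless in practice.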
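
d's description of the prefilter (butterflies on the input, a U matrix fixed to the identity, and a free V matrix) can be checked numerically. The sketch below assumes a time-domain lapped transform form in the style of Tran's work, P = 1/2 · B · diag(U, V) · B with the butterfly B = [[I, J], [J, -I]] (J reverses sample order), U = I, and an arbitrary invertible V. The helper names, the sample ordering, and the example V are hypothetical; this is not Daala's actual (integer, lifting-based) filter code. It shows two things: the postfilter has the same structure with V replaced by V^-1, and a constant (DC) input passes through unchanged whatever V is, which is one way to see why fixing U to the identity removes DC leakage.

```python
# Sketch of a time-domain lapped-transform prefilter in the style d described:
#   P = 1/2 * B * diag(U, V) * B,  B = [[I, J], [J, -I]],  with U = I.
# Hypothetical helpers and layout, using plain floating-point matrix products.


def identity(n):
    return [[1.0 if i == j else 0.0 for j in range(n)] for i in range(n)]


def matmul(a, b):
    return [[sum(a[i][k] * b[k][j] for k in range(len(b))) for j in range(len(b[0]))]
            for i in range(len(a))]


def matvec(a, x):
    return [sum(a[i][k] * x[k] for k in range(len(x))) for i in range(len(a))]


def inverse(a):
    """Gauss-Jordan inverse with partial pivoting (adequate for a small, well-conditioned V)."""
    n = len(a)
    m = [row[:] + eye_row for row, eye_row in zip(a, identity(n))]
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[piv] = m[piv], m[col]
        scale = m[col][col]
        m[col] = [x / scale for x in m[col]]
        for r in range(n):
            if r != col:
                f = m[r][col]
                m[r] = [x - f * y for x, y in zip(m[r], m[col])]
    return [row[n:] for row in m]


def butterfly(n):
    """B = [[I, J], [J, -I]] on 2n samples (J reverses order); note B * B = 2I."""
    b = [[0.0] * (2 * n) for _ in range(2 * n)]
    for i in range(n):
        b[i][i] = 1.0                  # top-left I
        b[i][2 * n - 1 - i] = 1.0      # top-right J
        b[n + i][n - 1 - i] = 1.0      # bottom-left J
        b[n + i][n + i] = -1.0         # bottom-right -I
    return b


def prefilter(v):
    """P = 1/2 * B * diag(I, V) * B for an invertible n x n matrix V."""
    n = len(v)
    d = identity(2 * n)
    for i in range(n):
        for j in range(n):
            d[n + i][n + j] = v[i][j]  # U = I stays in the top-left block
    b = butterfly(n)
    return [[0.5 * x for x in row] for row in matmul(matmul(b, d), b)]


def postfilter(v):
    """Because B * B = 2I, P^-1 has the same structure with V replaced by V^-1."""
    return prefilter(inverse(v))


if __name__ == "__main__":
    V = [[1.25, 0.30, 0.00, 0.00],     # hypothetical, roughly band-diagonal V (n = 4)
         [0.00, 1.10, -0.20, 0.00],
         [0.00, 0.00, 0.90, 0.10],
         [0.00, 0.00, 0.00, 1.00]]
    P, Q = prefilter(V), postfilter(V)
    print(matvec(P, [3.0] * 8))        # constant input: all 3.0 up to rounding
    x = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0]
    print(matvec(P, x))                # a ramp is changed by the filter...
    print(matvec(Q, matvec(P, x)))     # ...and recovered by the postfilter
```

The sketch leaves V free apart from invertibility; the type 3 and type 4 factorizations d mentions constrain it much further, which is what the DoF argument in this section is about.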

# reviews

stuff is looking good

# fastssim

- jm: i want this back on
- j: it's broken
- g: monty also had positive opinions on fastssim.
- d: he's not on the call :(. my suspicion is that it's likely there are bugs in that code. someone should look at it and figure that out. monty had signed up to do that.
- g: i ran it against the TID database and it was so bad as to be broken on that. it didn't do the same thing as the SSIM in their paper and they didn't have numbers for the fast formulation, but it worked reasonably well on that database.
- d: we should coordinate with monty.
- g: i think jm is saying that it agrees with his visual inspection particularly on blocking artifacts.
- d: because of multiscale.
- jm: can we reenable it for now?
- d: i think that's fine. if you make graphs for public consumption then turn it off until we know more stuff.
- u: should i turn it back on in bd rate too?
- d: yes.
- u: i'll submit a patch.

# activity masking

- jm: i think we need to understand how AM works, but i don't. how do we figure it out? it sounds like the kind of thing we could spend two years doing fundamental research.
- g: what precise question should we ask?
- jm: what i want to do is understand the equivalent of what are the video critical bands when it comes to masking.
- d: right. the research i've seen suggests that they are directional and i've even seen there are 6 directions, which doesn't particularly work well for our square grids.
- g: the cells in the eyes are in a hexagonal pattern.
- d: whether that corresponds to how the brain interprets things god only knows. after that things are octave-band structured. where those bands lie depends on how far away you sit from the screen. an octave can cover a pretty large range of the freq space. it's not like audio where pitch is fixed at an absolute scale.
- jm: audio is not logarithmic in terms of freq.
- d: my point is the boundaries are not at fixed locations. it's like audio as if you were on a speeding train.
- jm: you could offset everything by half a critical band and it would still work. this is not about "it's between this and this frequency". it's "around this point the critical band is this wide".
- g: i can give you a hand wavy argument: the vision system works when objects are at different distances.
- d: i don't have a good handle on what kind of threshold differences will ???
- jm: if we assume this then it's really different from the AM in theora.
- d: yes. i don't know if theora was brilliant. i tried something and it worked.
- g: x264 was very similar to theora's analysis.
- d: it does things at the macroblock level and stuff too. there's a lot of details in the stuff that jm cares about.
- g: all x264 is doing is trying to classify regions that are textured. a whole wide general class of techniques may be effective but doesn't help you decide what to do per band or on directionality. 6 directions also don't fall nicely into our 2d separable transforms.
- jm: i would do an approximation there: horizontal, vertical, and diagonal. my main question is whether there is masking between horiz and vert or between two different octaves.
- g: i can propose a theory about how it might work. you've got difference detectors that run in each direction and there's effectively a high pass filter that comes out of the feedback detectors, so if there's a lot .... in areas with lots of contrast, the sensitivity to changes goes down. if you assume they run independently and there are multiple scales then you end up with the conclusion that you expect them to be independent in horiz and vert at the finest scales, and independent between bands.
- d: then you read the papers and they say something else. short story is that no one has any idea how this stuff fucking works.
- g: keep in mind that these studies are done with really simple stimulus images.
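
d's point that where the octave bands land depends on viewing distance comes down to a unit conversion: a transform band is fixed in cycles per pixel, but the visual system responds to cycles per degree of visual angle. This sketch does that conversion; the display pixel pitch and the two viewing distances are made-up illustration values, and the function names are hypothetical.

```python
import math


def pixels_per_degree(distance_mm, pixel_pitch_mm):
    """Pixels covered by one degree of visual angle centred on the line of sight."""
    return 2.0 * distance_mm * math.tan(math.radians(0.5)) / pixel_pitch_mm


def cycles_per_degree(cycles_per_pixel, distance_mm, pixel_pitch_mm):
    """Map a spatial frequency on the screen to a frequency in visual angle."""
    return cycles_per_pixel * pixels_per_degree(distance_mm, pixel_pitch_mm)


if __name__ == "__main__":
    pitch = 0.25                           # hypothetical display pixel pitch, mm
    band = (0.25, 0.5)                     # one octave of the transform, cycles/pixel
    for distance in (500.0, 1000.0):       # hypothetical viewing distances, mm
        lo, hi = (cycles_per_degree(f, distance, pitch) for f in band)
        print(f"{distance:.0f} mm: {lo:.1f} to {hi:.1f} cycles/degree")
```

With these values the same coefficient band sits at roughly 8.7 to 17.5 cycles/degree at 500 mm and 17.5 to 34.9 at 1000 mm. Doubling the distance shifts it up by exactly one octave, so band boundaries tuned for one viewing distance land somewhere else at another.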

# code party

- j: does the week of 6/9 at the SFO office work for everyone?
- jm: i'll need to check that.