Unimplemented x264 features in rav1e

From XiphWiki
Revision as of 13:19, 12 February 2019 by Unlord (talk | contribs) (Unlord moved page X264 features in rav1e to Unimplemented x264 features in rav1e)

This is x264's list of unimplemented features, annotated with how each item applies to rav1e: [1]

Motion Estimation

  • Successive elimination (SEA), used for exhaustive search, might be more generally applicable to algorithms like UMH, by letting us skip a lot of SADs. The downside is we won't be able to use SAD_X4 anymore.
    • rav1e: This would be nice to have. SAD_X4 is not applicable.
  • (T)ESA is currently wrong for motion searches done on weightp duplicates. This effect is minuscule, but it still should be fixed.
    • rav1e: doesn't even have TESA yet.
  • Hierarchical motion estimation might be a useful way to catch very long motion vectors without the cost of UMH or ESA. It might also help regularize motion.
    • rav1e: already has this, in the form of half-resolution and quarter-resolution search. It's enormously helpful for video game sequences.
  • Somehow take into account the effect of motion vector decision on future blocks.
    • rav1e: daala did this to enormous gain, should be doable in rav1e too.
  • We don't need to check all 11 predictors all the time for 16x16 fullpel motion search.
    • But how do we know which ones we can afford to skip, and when?
    • Xvid and libtheora have algorithms for this, but the former's is almost surely 100% useless and the latter doesn't seem impressive either.
    • rav1e: checking predictors is so cheap that this is a waste.
  • libtheora does fullpel motion estimation on the source pixels instead of decoded pixels. Does this give a better starting point for the subpel search and discourage "weird" MVs?
    • rav1e: worth trying but probably only for halfres and quarterres
  • With extremely fast encoding settings (subme 0), can we rip off lookahead MVs instead of doing a real search?
    • This seems to be awful from my testing, but maybe there's something we can do?
    • rav1e: sounds like a waste, if you're going this fast you shouldn't be doing lookahead
  • Try sub-8x8 partitions in B-frames. Is it at all useful?
    • rav1e: already does this. It might be worth investigating not doing it.
  • Try bidir motion estimation for fullpel. That is, considering L1's MV when doing L0 (or vice versa). Xvid does this. How much does it help?
    • rav1e: probably worth doing.
  • Fullpel chroma ME?
    • rav1e: already done?
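
The successive elimination item above relies on a simple bound: the absolute difference between the pixel sums of two blocks can never exceed their SAD, so any candidate whose sum difference is already at least the best SAD found so far can be skipped without computing its SAD. The following is a minimal sketch of that idea only; `sad`, `block_sum`, and `sea_search` are hypothetical names, blocks are flat slices, and nothing here is rav1e's or x264's actual API.

```rust
/// Sum of absolute differences between two equally sized blocks.
fn sad(cur: &[u8], cand: &[u8]) -> u32 {
    cur.iter()
        .zip(cand)
        .map(|(&a, &b)| (a as i32 - b as i32).unsigned_abs())
        .sum()
}

/// Sum of all pixels in a block.
fn block_sum(block: &[u8]) -> u32 {
    block.iter().map(|&p| p as u32).sum()
}

/// Exhaustive search with successive elimination (illustrative only).
/// Returns (index of the best candidate, its SAD, number of SADs computed).
fn sea_search(cur: &[u8], candidates: &[Vec<u8>]) -> Option<(usize, u32, usize)> {
    let cur_sum = block_sum(cur);
    let mut best: Option<(usize, u32)> = None;
    let mut sads_computed = 0;
    for (i, cand) in candidates.iter().enumerate() {
        // Assumption: a real encoder would read candidate sums from a
        // precomputed table instead of summing each block here.
        let lower_bound = cur_sum.abs_diff(block_sum(cand));
        if let Some((_, best_sad)) = best {
            // SAD >= lower_bound >= best_sad, so this candidate cannot win.
            if lower_bound >= best_sad {
                continue;
            }
        }
        let s = sad(cur, cand);
        sads_computed += 1;
        if best.map_or(true, |(_, b)| s < b) {
            best = Some((i, s));
        }
    }
    best.map(|(i, s)| (i, s, sads_computed))
}
```

Because each candidate is accepted or rejected on its own, the SADs that remain are scattered across the search window, which is why batched kernels such as SAD_X4 stop being a good fit.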

Intra Analysis

  • Make the early terminations smarter. Currently they're just hacks -- some statistical analysis might be useful.
    • rav1e: early termination is more straightforward but still needs tuning.

Mode Decision

  • Can we find more ways to skip more motion searches in multiref?
    • A while back, I tried using weaker motion searches on older refs. This helped a bit for speed-vs-compression, but is ironically the opposite of what one wants; older refs will be harder to find good MVs in, and therefore really need better searches.
  • On extremely fast encoding settings, fast skip is actually kind of slow. But anything dumber (e.g. SAD) is completely useless. Is there some better balance that can be achieved here?
    • Can we do something smart by analyzing fenc? It's impossible to tell whether a block is motionless by looking at fdec, but looking at the source pixels is useful. There's still complexity such as lower-QP-than-reference though.
  • See the TODOs for deblock-aware RD in common/deblock.c.
    • I tried correcting weightp references for deblock RDO, but it didn't help.
    • I tried chroma, too, and again, it didn't help measurably.
  • Is there a faster way than SA8D/SATD to do 8x8dct vs 4x4dct mode decision? At very fast settings, the time this uses is nontrivial.
    • Doing a merged 4x4/8x8 SATD would help here, but would require new asm.
    • rav1e: this speedup is probably less important
  • Is there a faster way than RD to do 8x8dct vs 4x4dct mode decision that's still better than SATD? RD takes over an order of magnitude more time than SATD, so it might be useful to have something in the middle.
    • rav1e: even more critical than in x264, see fast rdo patches
  • Is there some value to swapping the mode decision metric from SATD to SA8D if we think that the macroblock will use 8x8dct? This has been tried before, but only helped if our guess was extremely good (better than we could get in reality).
    • rav1e: only uses one metric at a time.
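
For readers unfamiliar with the metric discussed above, SATD applies a Hadamard transform to the prediction residual and sums the absolute values of the result. The sketch below shows a 4x4 version under stated assumptions: `hadamard4` and `satd4x4` are hypothetical names, blocks are row-major arrays, and the result is left unnormalized (real encoders typically apply a scaling factor, which differs between implementations).

```rust
/// Unnormalized 4-point Hadamard transform.
fn hadamard4(v: [i32; 4]) -> [i32; 4] {
    let (a, b, c, d) = (v[0] + v[1], v[0] - v[1], v[2] + v[3], v[2] - v[3]);
    [a + c, b + d, a - c, b - d]
}

/// Unnormalized 4x4 SATD between a source block and a prediction,
/// both stored row-major.
fn satd4x4(src: &[u8; 16], pred: &[u8; 16]) -> u32 {
    let mut m = [[0i32; 4]; 4];
    // Horizontal pass over each row of the residual.
    for r in 0..4 {
        let mut row = [0i32; 4];
        for c in 0..4 {
            row[c] = src[r * 4 + c] as i32 - pred[r * 4 + c] as i32;
        }
        m[r] = hadamard4(row);
    }
    // Vertical pass, accumulating absolute transformed values.
    let mut sum = 0u32;
    for c in 0..4 {
        let col = hadamard4([m[0][c], m[1][c], m[2][c], m[3][c]]);
        sum += col.iter().map(|x| x.unsigned_abs()).sum::<u32>();
    }
    sum
}
```

In this unnormalized form a flat residual of 1 on every pixel and a single-pixel residual of 1 both score 16, even though their SADs are 16 and 1: the flat residual collapses into one coefficient while the impulse spreads over all sixteen, which is the behaviour that makes SATD a closer proxy for coded cost than SAD.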

Psy

  • Psy-RD is a hack. It works, but it's a hack. If you apply QNS with Psy-RD as the metric, it goes way overboard and gives terrible results. This means that Psy-RD only works because normal mode decision is limited in the way it can modify the image to better suit the metric. Is there a way to make it better?
    • rav1e: see cdef-dist.
  • Should RD be linear at all? Perhaps we should weight more heavily against low quality blocks and also try to ignore minuscule distortion that viewers can't see.
    • rav1e: see cdef-dist.
  • Psy-trellis (and maybe psy-RD?) are too strong at very high QPs.
    • rav1e: see cdef-dist.
  • Psy-trellis should be merged with Psy-RD. There are patches for this, but they probably won't be committed until psy-RD itself is fixed.
    • rav1e: see cdef-dist.
  • RD should take into account local variance.
    • rav1e: see cdef-dist.
  • Lambda should be varied on a per-DCT-block basis instead of a per-macroblock basis.
    • rav1e: should already do this
  • Lambda should be picked independent of quantizer (i.e. with greater precision).
    • rav1e: already done
  • Classic problem: a block is mostly high complexity but has a small area of low complexity. How do we judge whether that area is important? Good example: sharp text on background with film grain; grain gets blurred out because of the text.
    • If we think it's important all the time, we ruin the quality of many clips that rely on raising complexity on edges (Touhou).
  • Should motion estimation lambda be as high as it is at very high quantizers? There's some value to capturing "true motion"...
  • Macroblock tree correlates pretty well with visual perception, in that its concept of "high complexity" matches the visual concept well, except for local illumination changes. Talk to Dark Shikari for a patch.
    • rav1e: still needs mbtree
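
Several of the items above concern how lambda is chosen. As a reminder of where it enters, here is a minimal sketch of Lagrangian mode decision, J = D + lambda * R, with lambda carried as a floating-point value rather than derived from a quantizer table. The `Candidate` type and both function names are hypothetical and only illustrate the cost function, not any encoder's real data structures.

```rust
/// One candidate coding choice (hypothetical type for illustration).
#[derive(Debug, Clone, Copy, PartialEq)]
struct Candidate {
    /// Distortion, e.g. sum of squared errors.
    distortion: f64,
    /// Estimated number of bits needed to code this choice.
    rate_bits: f64,
}

/// Lagrangian cost J = D + lambda * R.
fn rd_cost(c: Candidate, lambda: f64) -> f64 {
    c.distortion + lambda * c.rate_bits
}

/// Index of the candidate with the lowest RD cost, or None if there are none.
fn best_candidate(cands: &[Candidate], lambda: f64) -> Option<usize> {
    cands
        .iter()
        .enumerate()
        .min_by(|a, b| rd_cost(*a.1, lambda).total_cmp(&rd_cost(*b.1, lambda)))
        .map(|(i, _)| i)
}
```

With one cheap, high-distortion candidate and one expensive, low-distortion candidate, a small lambda selects the expensive one and a large lambda selects the cheap one, so the precision and granularity of lambda directly shape the decisions.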

Lookahead

  • Temporal MV predictors in lookahead? There's a patch for these somewhere, but they biased heavily in favor of B-frames, likely by improving the motion search.
    • rav1e: lookahead should totally use temporal candidates if available
  • Should lookahead use variable lambda based on quantizer (esp. due to adaptive quant)? If so, should it take into account estimated ratecontrol quantizer, too? If so, how?
    • rav1e: should already do this, file a bug if it doesn't

Quantization

  • There's room for something between trellis and deadzone in terms of complexity. libvpx has a good example -- it biases towards zero-runs in its "medium speed" quantizer. This can't be SIMD'd easily, but is still vastly faster than trellis. A nonlinear quantizer (be more likely to round up larger coefficients) might also be useful.
    • How useful is this with an entropy coder that doesn't really bias towards zero-runs, as in CABAC?
    • rav1e: same concern, with lv-map instead
  • Floyd-Steinberg for quantization? Try pushing quantization error to nearby DCT coefficients. Should this go from high to low or low to high?
    • rav1e: eeeeeeeeeeeeh
  • Energy-preserving quantizer -- maintain L1 (or maybe L2? I'm not sure) energy. Should we maintain it in the spatial domain (post-iDCT) or residual domain? Probably the former.
    • See saintdev's github for one attempt at this.
    • rav1e: this is pretty hard with just the quantizer, do delta q's first
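
For context, the "deadzone" end of the complexity range mentioned above is plain scalar quantization with a rounding offset below one half, so that small coefficients fall to zero more readily. The sketch below assumes a hypothetical interface in which the offset is given in 1/256 of a quantizer step; the names and the fixed-point convention are illustrative and not taken from x264, libvpx, or rav1e.

```rust
/// Quantize one coefficient with step `q` (must be positive).
/// `offset` is the rounding offset in 1/256 of a step:
/// 128 rounds to nearest, smaller values widen the dead zone around zero.
/// Assumes inputs are small enough that the i32 arithmetic cannot overflow.
fn quantize_deadzone(coeff: i32, q: i32, offset: i32) -> i32 {
    let level = (coeff.abs() * 256 + offset * q) / (q * 256);
    level * coeff.signum()
}

/// Reconstruct a coefficient from its quantized level.
fn dequantize(level: i32, q: i32) -> i32 {
    level * q
}
```

Every coefficient is handled independently here, which is what makes this approach easy to vectorize and what trellis quantization and the zero-run biasing described above give up in exchange for better rate-distortion behaviour.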

Transform

  • Analyze the error characteristics of the fDCT. Is there any way to make it more accurate without much speed loss? Particularly at extremely low quantizers, this might help.
    • rav1e: using the daala fDCT will accomplish this and also improve speed (the AV1 fDCT is much more accurate than the H.264 one)
  • Before forward transform, run a "blocking filter" that acts as the approximate inverse of the deblock filter. See this paper.
    • rav1e: even more applicable to AV1 due to the many postfilters.