Summer of Code 2021

Revision as of 22:07, 2 March 2021

Introduction

This year Xiph.org is focusing on the rav1e AV1 encoder for its GSoC participation. Both video and still images are currently hot topics, especially with the recent support for AVIF in browsers.

Below you'll find descriptions of the GSoC project ideas around the rav1e project.

If you want to know more about a particular idea, please get in touch with the people listed under "possible mentors". While there is no guarantee that the person will be the actual mentor for the task, they know the topic and will be happy to answer your questions.

In our previous participations we focused heavily on our multimedia codec projects. This turned out to be very challenging for students, so this year we are not offering project ideas for those. If you're a student interested in codec work, have previous experience with it, and are confident that you can convince us, you're welcome to get in touch.

Detailed Project Descriptions

These ideas were suggested by various members of the developer community as projects that would be beneficial and which we feel we can mentor. Students should feel free to select one of these, develop a variation, or propose their own ideas, ideally here.

rav1e-by-gop integration

rav1e-by-gop is an extended command-line encoder that provides additional encoding strategies, such as by-GOP parallel encoding across multiple machines.

Problem / Intro

rav1e-by-gop is currently only a command-line program. Some users might want to use its extended features from within other programs, even though those features do not strictly belong in an encoder.

Solution / Task

Make rav1e-by-gop expose the same API as regular rav1e, so that its multi-machine encoding features are easy to use from other programs.
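As a rough illustration of what such an API could look like, here is a minimal sketch, assuming an interface loosely modeled on the `send_frame`/`receive_packet` pattern of rav1e's `Context`. All names (`Frame`, `Packet`, `Encode`, `ByGopEncoder`) are hypothetical, and the actual encoding is stubbed out: each completed GOP is turned into a packet immediately instead of being dispatched to a (possibly remote) worker.

```rust
use std::collections::VecDeque;

/// Hypothetical, heavily simplified input frame: just its number and
/// whether scene detection marked it as the start of a new GOP.
#[derive(Clone, Debug)]
pub struct Frame {
    pub number: u64,
    pub is_scene_cut: bool,
}

/// Hypothetical output unit: one encoded GOP.
#[derive(Debug, PartialEq)]
pub struct Packet {
    pub gop_index: usize,
    pub frames: Vec<u64>,
}

/// Hypothetical streaming interface; `None` means "flush".
pub trait Encode {
    fn send_frame(&mut self, frame: Option<Frame>);
    fn receive_packet(&mut self) -> Option<Packet>;
}

pub struct ByGopEncoder {
    current: Vec<u64>,       // frames of the GOP being collected
    gop_count: usize,        // number of GOPs emitted so far
    ready: VecDeque<Packet>, // finished GOPs, in input order
}

impl ByGopEncoder {
    pub fn new() -> Self {
        ByGopEncoder { current: Vec::new(), gop_count: 0, ready: VecDeque::new() }
    }

    /// Close the current GOP. A real implementation would hand it to a
    /// worker here and reorder results; this stub "encodes" it at once.
    fn finish_gop(&mut self) {
        if self.current.is_empty() {
            return;
        }
        let frames = std::mem::take(&mut self.current);
        self.ready.push_back(Packet { gop_index: self.gop_count, frames });
        self.gop_count += 1;
    }
}

impl Encode for ByGopEncoder {
    fn send_frame(&mut self, frame: Option<Frame>) {
        match frame {
            Some(f) => {
                // A scene cut ends the previous GOP and starts a new one.
                if f.is_scene_cut {
                    self.finish_gop();
                }
                self.current.push(f.number);
            }
            None => self.finish_gop(),
        }
    }

    fn receive_packet(&mut self) -> Option<Packet> {
        self.ready.pop_front()
    }
}
```

The point of keeping the same send/receive shape is that a program already driving rav1e could switch to by-GOP encoding without restructuring its main loop; the hard parts (distributing GOPs, keeping packet order across workers) stay hidden behind the trait.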

Requirements

The student should be familiar with Rust or C programming.
Video knowledge is not strictly necessary, however a basic understanding of the concepts is vastly beneficial.

Possible Mentors

User:Tdaede User:Lu_zero

Support WebAssembly SIMD

rav1e supports the WASI platform, and its JavaScript API bindings rely on it.

Problem / Intro

WebAssembly SIMD is getting closer to being widely available, and rav1e should support it.

Solution / Task

Implement the dispatch logic for WASM SIMD, as already done for x86_64 and aarch64.
Implement the sum of absolute differences (SAD) and sum of absolute transformed differences (SATD).
Implement the inverse transforms (idct, iadst, identity, ...).
Implement motion compensation.
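SIMD kernels are usually checked against plain scalar versions. The sketch below, under our own assumptions, shows scalar SAD and a 4x4 Hadamard-based SATD, plus the compile-time dispatch shape WebAssembly needs: Rust has no runtime feature detection for wasm32, so whether `simd128` is used is decided when the module is built. The function names are hypothetical, the `simd128` branch is only a placeholder, and SATD scaling conventions differ between encoders (this one returns the unnormalized sum).

```rust
/// Scalar SAD over a `w` x `h` block of 8-bit pixels.
pub fn sad_scalar(src: &[u8], src_stride: usize, dst: &[u8], dst_stride: usize, w: usize, h: usize) -> u32 {
    let mut sum = 0u32;
    for y in 0..h {
        for x in 0..w {
            let a = src[y * src_stride + x] as i32;
            let b = dst[y * dst_stride + x] as i32;
            sum += (a - b).unsigned_abs();
        }
    }
    sum
}

/// 4-point Hadamard butterfly (output order does not matter for SATD).
fn hadamard4(v: [i32; 4]) -> [i32; 4] {
    let (a0, a1) = (v[0] + v[1], v[0] - v[1]);
    let (a2, a3) = (v[2] + v[3], v[2] - v[3]);
    [a0 + a2, a1 + a3, a0 - a2, a1 - a3]
}

/// Scalar 4x4 SATD: Hadamard-transform the residual, sum absolute values.
pub fn satd4x4_scalar(src: &[u8], src_stride: usize, dst: &[u8], dst_stride: usize) -> u32 {
    let mut d = [[0i32; 4]; 4];
    for y in 0..4 {
        for x in 0..4 {
            d[y][x] = src[y * src_stride + x] as i32 - dst[y * dst_stride + x] as i32;
        }
    }
    // Transform rows, then columns.
    for row in d.iter_mut() {
        *row = hadamard4(*row);
    }
    for x in 0..4 {
        let col = hadamard4([d[0][x], d[1][x], d[2][x], d[3][x]]);
        for y in 0..4 {
            d[y][x] = col[y];
        }
    }
    d.iter().flatten().map(|v| v.unsigned_abs()).sum()
}

/// Compile-time dispatch: built with `simd128` enabled on wasm32.
#[cfg(all(target_arch = "wasm32", target_feature = "simd128"))]
pub fn sad(src: &[u8], src_stride: usize, dst: &[u8], dst_stride: usize, w: usize, h: usize) -> u32 {
    // Placeholder: a hypothetical kernel using core::arch::wasm32
    // intrinsics would go here; it must match `sad_scalar` exactly.
    sad_scalar(src, src_stride, dst, dst_stride, w, h)
}

/// Fallback for every other target (and wasm32 without `simd128`).
#[cfg(not(all(target_arch = "wasm32", target_feature = "simd128")))]
pub fn sad(src: &[u8], src_stride: usize, dst: &[u8], dst_stride: usize, w: usize, h: usize) -> u32 {
    sad_scalar(src, src_stride, dst, dst_stride, w, h)
}
```

A single-pixel difference shows why SATD is used: SAD reports 1, but the transformed residual spreads that difference over all 16 coefficients, giving 16.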

Requirements

The student should be familiar with Rust, WASM, wasmtime and related tools. Knowledge of x86 or ARM assembly is not needed but will help.

Possible Mentors

User:Lu_zero

Implement butteraugli in av-metrics

av-metrics is a collection of video quality metrics; butteraugli is a promising psychovisual similarity metric.

Problem / Intro

Currently, butteraugli is implemented as a stand-alone codebase. The code is readable, but it could be faster.

Solution / Task

Implement Rust bindings to the reference butteraugli implementation.
Implement butteraugli in pure Rust within av-metrics.
Write integration and unit tests to make sure the implementation does not diverge from the reference.
Write criterion benchmarks.
Implement x86_64 or aarch64 optimizations for it, using intrinsics or plain assembly.

Requirements

The student should be familiar with Rust, C and C++. Knowledge of x86 or ARM assembly is welcome.

Possible Mentors

User:Lu_zero

Grain synthesis implementation inside of the rav1e encoder

Grain synthesis models noise both temporally and spatially, using noise estimation.

Problem / Intro

Preserving high-frequency detail and noise (dithering, camera noise, grain) with traditional encoder techniques is very expensive in terms of bitrate, and some tools meant to address this problem can create additional artifacts that are unpleasant to the viewer or detrimental to the fidelity of the image.

Solution / Task

Implement grain synthesis that models the noise parameters of a video, so that noise generated from those parameters is applied during decoding. This can save a large amount of bitrate while providing high subjective visual fidelity and appeal.

Make it faster than other grain synthesis implementations through smarter algorithms and various forms of threading, such as tile threading and integration with rav1e-by-gop, so that it can be used as part of any encoding workflow. This should help the technique be adopted as widely as possible.
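The encoder-side half of grain synthesis starts with estimating how much noise the source contains. A simple heuristic, used here purely as an illustration and not as rav1e's method, is to assume that the flattest blocks of a frame contain mostly noise, so their pixel variance approximates the noise variance. The function name and parameters are hypothetical. A real implementation would also have to derive the parameters AV1 film grain actually signals (a scaling function and autoregressive coefficients), which this sketch does not attempt.

```rust
/// Hypothetical helper: estimate the noise standard deviation of an
/// 8-bit plane by averaging the variance of its flattest blocks.
/// `flat_fraction` (0.0..=1.0) selects how many of the lowest-variance
/// blocks are used. Returns None if the plane holds no complete block.
pub fn estimate_noise_sigma(
    plane: &[u8],
    width: usize,
    height: usize,
    block: usize,
    flat_fraction: f64,
) -> Option<f64> {
    let mut variances = Vec::new();
    for by in (0..height / block).map(|i| i * block) {
        for bx in (0..width / block).map(|i| i * block) {
            let (mut sum, mut sum_sq) = (0f64, 0f64);
            for y in by..by + block {
                for x in bx..bx + block {
                    let v = plane[y * width + x] as f64;
                    sum += v;
                    sum_sq += v * v;
                }
            }
            let n = (block * block) as f64;
            let mean = sum / n;
            variances.push(sum_sq / n - mean * mean);
        }
    }
    if variances.is_empty() {
        return None;
    }
    // Flattest blocks first; variances are finite, so unwrap is safe.
    variances.sort_by(|a, b| a.partial_cmp(b).unwrap());
    let keep = ((variances.len() as f64 * flat_fraction).ceil() as usize).clamp(1, variances.len());
    let mean_var = variances[..keep].iter().sum::<f64>() / keep as f64;
    // Guard against tiny negative values from floating-point rounding.
    Some(mean_var.max(0.0).sqrt())
}
```

Because textured blocks are discarded rather than averaged in, detail in the image is less likely to be mistaken for grain; the weakness is that a frame with no flat areas at all will overestimate the noise.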

Requirements

The student should be familiar with Rust and C, and should have some background in visual media encoding, such as video and image compression. Complexity: Medium.

Possible Mentors

User:Lu_zero && XX

Adaptive quantization

Adaptive quantization (AQ) is the process of efficiently allocating bitrate among the blocks of a frame by varying the quantizer for each of them according to different visual targets.

Problem / Intro

Often, an encoder does not know the best way to allocate the bitrate budget across a frame, and may overspend a considerable amount of bitrate on regions that do not benefit from a low quantizer (low distortion, and therefore less compression) while not giving enough bitrate to regions that actually need it. This can also cause problems temporally, since bitrate allocation within a group of pictures (GOP) may be skewed towards complex, high-motion frames, leaving other frames with barely any bitrate to work with and creating visual artifacts such as blocking and banding.

Solution / Task

Implement one to three forms of adaptive quantization in rav1e, based on variance, complexity, and/or variance with a bias towards low-contrast frames. This will make bitrate allocation more efficient, avoid overspending bitrate on areas that do not need a low quantizer, help prevent lower-quality frames that detract from the viewer experience, and make objective and subjective quality targets easier to achieve. Combining strong adaptive quantization with grain synthesis would allow higher subjective quality at lower bitrates, while potentially lowering computational complexity considerably.
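To make the variance-based option concrete, here is a minimal sketch similar in spirit to the variance AQ found in x264, not an existing rav1e feature. Each block's offset is proportional to how far the log of its variance lies from the frame average, so the offsets roughly average to zero and the overall rate stays balanced. Flat blocks get negative offsets (a lower quantizer, hence more bits), since they are the most prone to banding and blocking. The function names, the `strength` parameter and the offset units are assumptions; how they would map onto rav1e's quantizer or AV1's per-superblock delta q signaling is left open.

```rust
/// Variance of an 8-bit block of pixels.
pub fn block_variance(pixels: &[u8]) -> f64 {
    let n = pixels.len() as f64;
    let mean = pixels.iter().map(|&p| p as f64).sum::<f64>() / n;
    pixels.iter().map(|&p| (p as f64 - mean).powi(2)).sum::<f64>() / n
}

/// Hypothetical variance AQ: one quantizer offset per block.
/// Negative offsets mean "lower quantizer, spend more bits here".
pub fn variance_aq_offsets(block_variances: &[f64], strength: f64, max_offset: i32) -> Vec<i32> {
    if block_variances.is_empty() {
        return Vec::new();
    }
    // log2(variance + 1) keeps flat blocks (variance 0) finite.
    let logs: Vec<f64> = block_variances.iter().map(|&v| (v + 1.0).log2()).collect();
    let mean = logs.iter().sum::<f64>() / logs.len() as f64;
    logs.iter()
        .map(|&l| ((strength * (l - mean)).round() as i32).clamp(-max_offset, max_offset))
        .collect()
}
```

The logarithm matters: variance spans several orders of magnitude between flat sky and foliage, and a linear mapping would hand almost all the adjustment to the most textured blocks.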

Optionally, create a smart AQ mode in which the encoder chooses which adaptive quantization algorithm to use depending on the content of each GOP. This is potentially difficult.

Requirements

The student should be familiar with Rust and C; a background in visual media encoding, such as video and image compression, is recommended.

Difficulty: Medium, depending on which targets the student chooses to pursue.

Possible Mentors

User:Lu_zero && XX

Visual metric targeting in rav1e-by-gop

Objective metrics are used to evaluate an encoder's performance across a diverse set of scenarios. Metrics such as PSNR, SSIM, DSSIM, VMAF and some closed-source no-reference metrics are used in the field to track encoder performance across versions, each trying to correlate closely with human perception.
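PSNR, the simplest of these, also shows why perceptual metrics are needed: it is a pure function of the mean squared pixel error, $\mathrm{PSNR} = 10 \log_{10}(\mathrm{MAX}^2 / \mathrm{MSE})$, and knows nothing about where in the image the error sits. The sketch below computes it for 8-bit planes; the function name is our own and is not the av-metrics API.

```rust
/// PSNR in dB between two 8-bit planes of the same size.
/// Returns None for empty or mismatched inputs, and
/// Some(f64::INFINITY) when the planes are identical (MSE = 0).
pub fn psnr_8bit(a: &[u8], b: &[u8]) -> Option<f64> {
    if a.is_empty() || a.len() != b.len() {
        return None;
    }
    // Sum of squared errors, in integers to avoid rounding drift.
    let sse: u64 = a
        .iter()
        .zip(b)
        .map(|(&x, &y)| {
            let d = x as i64 - y as i64;
            (d * d) as u64
        })
        .sum();
    if sse == 0 {
        return Some(f64::INFINITY);
    }
    let mse = sse as f64 / a.len() as f64;
    Some(10.0 * (255.0f64 * 255.0 / mse).log10())
}
```

For example, an MSE of 1 gives about 48.13 dB regardless of whether the error is invisible film grain or a visible block edge, which is exactly the gap VMAF and butteraugli try to close.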

Problem / Intro

Classical rate control methods such as ABR (average bitrate), fixed quantizers and even CRF (constant rate factor) do not target a specific quality level. This can result in starved encodes: when the bitrate budget has to be kept low so that the stream can be watched without interruption, some scenes reach exceptionally high quality by overspending bitrate, while others look very poor because they get too little, detracting from the viewer experience. CRF helps somewhat compared to ABR and fixed quantizers, but it still has to overshoot so that the lower-quality scenes do not suffer, and it does not adapt to the type of content being encoded, resulting in encodes of variable quality.

Solution / Task

Implement visual metric targeting based on VMAF (mainly used for video) and butteraugli (mainly used for images) in rav1e-by-gop, as a secondary rate control option.

Visual metric targeting in rav1e-by-gop would take full advantage of its adaptive keyframe placement and scene detection. This should allow very good rate control, since short scenes of 1 to 15 seconds are where visual metrics such as VMAF work best. The idea is to first encode each scene with a very fast speed preset at a preset quantizer to gauge its quality. If the visual metric target is not reached, the encoder adjusts the quantizer and tries again once or twice until it gets the right result. Instead of targeting an average bitrate, you would target a visual score, gaining both efficiency and subjective quality. This could also save encoding time: encoder complexity could be dialed back while keeping overall efficiency the same or higher, since efficiency depends on both the encoder's tools and its rate control.
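The probing loop described above can be sketched as a search over the quantizer. This version uses plain bisection, assuming a quantizer range of 0 to 255 and a score that does not increase as the quantizer grows. The closure `probe` is a hypothetical stand-in for "encode this scene at a fast preset with quantizer `q` and measure VMAF". The one-or-two-retries budget mentioned above would need a smarter first guess (for example, interpolating between probes); bisection is shown only because it is easy to verify.

```rust
/// Bisection over quantizers 0..=255, assuming `probe(q)` returns a
/// quality score that does not increase as `q` grows. Returns the
/// highest probed quantizer whose score reached `target`, or 0 (the
/// highest-quality setting) if none did within `max_probes`.
pub fn search_quantizer<F>(mut probe: F, target: f64, max_probes: usize) -> u8
where
    F: FnMut(u8) -> f64,
{
    // Candidate interval [lo, hi]; u16 avoids overflow at 0 and 255.
    let (mut lo, mut hi) = (0u16, 255u16);
    let mut best = 0u8;
    let mut probes = 0;
    while lo <= hi && probes < max_probes {
        let mid = (lo + hi) / 2;
        probes += 1;
        if probe(mid as u8) >= target {
            // Target met: keep it and try a higher (cheaper) quantizer.
            best = mid as u8;
            lo = mid + 1;
        } else if mid == 0 {
            // Even the lowest quantizer misses the target.
            break;
        } else {
            // Target missed: search the lower quantizers.
            hi = mid - 1;
        }
    }
    best
}
```

With a synthetic metric `100 - q / 4` and a target of 90, the search settles on quantizer 40 after eight probes; capping it at three probes returns 31, the best passing value seen so far, which is the safe fallback when the budget runs out.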

Implement butteraugli quality targeting as an option when using rav1e for AVIF images. Since visual quality requirements are considerably higher for intra-only (still image) media, keeping high visual fidelity is even more important than in video compression. Iterative quality targeting would also be quite useful here.

Requirements

The student should be familiar with Rust and C. A general interest in image and video coding is recommended.

Difficulty: Low-medium.

Possible Mentors

User:Lu_zero && XX
