Summer of Code 2021
Introduction
This year Xiph.org is focusing on the rav1e AV1 encoder for its GSoC participation. Both video and still images are currently hot topics, especially with the recent support of AVIF within browsers.
Below you'll find descriptions of this year's GSoC project ideas around the rav1e project.
If you want to know more about a particular idea, please get in touch with the people listed under "possible mentors". While there is no guarantee that the person will be the actual mentor for the task, they know it and will be happy to answer your questions. Please join #xiph on Freenode IRC for more information. If you don't have IRC set up, you can easily connect from your web browser.
Detailed Project Descriptions
These ideas were suggested by various members of the developer community as projects that would be beneficial and which we feel we can mentor. Students should feel free to select one of these, develop a variation, or propose their own ideas. Here, ideally.
Grain synthesis implementation inside of the rav1e encoder
Grain synthesis is the idea of modeling noise temporally and spatially using noise estimation.
Problem / Intro
Keeping high-frequency detail and noise (dithering, camera noise, grain) using traditional encoder techniques is very expensive in terms of bitrate allocation, and some tools implemented to address that problem can create additional artifacts that are unpleasant for the general viewer or detrimental to the fidelity of the image.
Solution / Task
Implementing grain synthesis that models the noise parameters of a video and applies the generated noise parameters during the decoding process, saving a large amount of bitrate while providing very high subjective visual fidelity and appeal.
Making it faster than other forms of grain synthesis via smarter algorithms and using various forms of threading to speed up its application, such as tile threading and integration with rav1e-by-gop, making it possible to use as part of any encoding workflow. This will make sure adoption of the technique becomes as widespread as possible.
Requirements
The student should be familiar with Rust and C, and must have a light background in general visual media encoding, such as video and image compression.
Difficulty: Medium to difficult depending on the novel ideas implemented and complexity of the final implementation.
Possible Mentors
Adaptive quantization
Adaptive quantization is the process of an algorithm trying to efficiently allocate bitrate among the various macroblocks found in a frame by varying the quantizer across each of them according to different visual targets.
Problem / Intro
Often, an encoder does not know the best way to allocate the bitrate budget across a frame, and may overspend a considerable amount of bitrate on regions that might not benefit from a low quantizer (low amounts of distortion, so less compression) while not giving enough bitrate to zones that might actually need it. This can even cause issues temporally, as bitrate allocation within a group of frames (GOP) may be skewed towards more complex and high-motion frames, while leaving other frames with barely any bitrate to work with, creating visual artifacts such as blocking and banding.
Solution / Task
Implementing one to three forms of adaptive quantization in rav1e, based on variance, complexity, and/or variance with a bias toward low-contrast frames. This will make bitrate allocation more efficient, avoid overspending bitrate in areas that do not need the lower quantizer, avoid the presence of lower quality frames that might detract from the viewer experience, and make objective/subjective quality targets easier to achieve. The combination of powerful adaptive quantization and grain synthesis would allow for a higher subjective quality viewing experience at lower bitrates while potentially lowering computational complexity by a good margin.
Making a smart adaptive AQ mode in which the encoder chooses which adaptive quantization algorithm to use depending on the scene featured in a GOP. Potentially difficult.
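As a rough illustration of the variance-based variant, the sketch below computes a per-block variance and turns it into a quantizer offset relative to the frame average. It is not rav1e code: the function names, the log-domain mapping, the `strength` parameter and the clamp range of -16..16 are all assumptions chosen for the example.

```rust
/// Population variance of the luma samples in one block.
fn block_variance(block: &[u8]) -> f64 {
    if block.is_empty() {
        return 0.0;
    }
    let n = block.len() as f64;
    let mean = block.iter().map(|&p| f64::from(p)).sum::<f64>() / n;
    block
        .iter()
        .map(|&p| (f64::from(p) - mean).powi(2))
        .sum::<f64>()
        / n
}

/// Hypothetical mapping from block variances to quantizer offsets.
/// Blocks flatter than the frame average get a negative offset (finer
/// quantization); busier blocks get a positive one. `strength` and the
/// clamp range are illustrative values, not tuned ones.
fn quantizer_offsets(variances: &[f64], strength: f64) -> Vec<i32> {
    if variances.is_empty() {
        return Vec::new();
    }
    // Work in the log domain so that very busy blocks do not dominate.
    let logs: Vec<f64> = variances.iter().map(|&v| (1.0 + v).log2()).collect();
    let mean_log = logs.iter().sum::<f64>() / logs.len() as f64;
    logs.iter()
        .map(|&l| ((strength * (l - mean_log)).round() as i32).clamp(-16, 16))
        .collect()
}
```

A frame whose blocks all have the same variance gets all-zero offsets, so such a mode would only change the encode where the content is actually uneven.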
Requirements
The student should be familiar with Rust and C; a background in general visual media encoding, such as video and image compression, is recommended.
Difficulty: Medium to high depending on which targets the student chooses to follow.
Possible Mentors
Make rav1e-by-gop expose a rav1e API
rav1e-by-gop is an extended command line encoder that provides additional encoding strategies such as by-gop parallel encoding across multiple machines.
Problem / Intro
rav1e-by-gop is currently a command line program. Some users might want to use its extended features from other programs, even if those programs are not themselves encoders.
Solution / Task
Make rav1e-by-gop expose the same API as the normal rav1e, to make it easy to use the multiple-machine encoding features from other programs.
Requirements
The student should be familiar with Rust or C programming.
Video knowledge is not strictly necessary; however, a basic understanding of the concepts is very beneficial.
Possible Mentors
Visual metric targeting in rav1e-by-gop
Objective metrics are used to evaluate an encoder's performance in a diverse set of scenarios. Different metrics such as PSNR, SSIM, DSSIM, VMAF and some closed no-reference metrics are used in the field to record encoder performance changes across versions trying to correlate closely with human perception.
Problem / Intro
Classical methods of rate control such as ABR (Average BitRate), fixed quantizers and even CRF (Constant Rate Factor) have the issue of not targeting a certain quality level. This can result in starved encodes where the bitrate budget has to be kept low so the stream stays watchable without interruption, leading to some scenes that look exceptionally good because they overspend bitrate and others that look very poor because they get too little, detracting from the viewer experience entirely. More advanced forms of rate control like CRF help somewhat, but they still have to overshoot so that the lower quality scenes do not suffer, and they do not adapt to the different types of content encoded, resulting in variable quality encodes.
Solution / Task
Implementing visual metric targeting based on VMAF (mainly used for video) and butteraugli (mainly used for images) as part of rav1e-by-gop as a secondary rate control option.
The application of visual metric targeting in rav1e-by-gop would take advantage of its adaptive keyframe placement and smart scene detection to its fullest. This would allow for the best rate control possible, as short scenes of 1-15 s in length are where visual metrics such as VMAF shine the most. The idea is to encode first with a very fast speed preset in the encoder to gauge the quality at a preset quantizer. If the visual metric target is not achieved, the encoder tries again once or twice until it gets the right result. With this method, instead of targeting an average bitrate, you would target a visual score, getting higher efficiency and higher subjective quality. This would also be advantageous in terms of encoding time spent, as encoder complexity could be dialed back while keeping overall efficiency the same or higher, with efficiency being a function of both encoder efficiency and rate control.
Implementing butteraugli quality targeting as an option using rav1e for AVIF images. Since visual quality requirements are considerably higher for intra-only (image-only) media, keeping high visual fidelity is even more important than in video compression. Quality targeting iterations would also be quite useful here.
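The retry loop described above can be sketched as a bounded search over the quantizer. This is only an illustration under stated assumptions: `pick_quantizer` and its parameters are hypothetical names, `probe` stands in for "encode the scene at this quantizer with a fast preset and measure the metric", and the score is assumed to be higher-is-better and non-increasing as the quantizer grows.

```rust
/// Search for the highest quantizer whose measured score still meets
/// `target`, using at most `max_probes` trial encodes.
///
/// `probe` is a hypothetical stand-in for a fast trial encode followed by
/// a metric measurement (for example a VMAF-like score).
/// If no probe meets the target, `min_q` (best quality) is returned.
fn pick_quantizer<F>(mut probe: F, target: f64, min_q: u8, max_q: u8, max_probes: u32) -> u8
where
    F: FnMut(u8) -> f64,
{
    let (mut lo, mut hi) = (min_q, max_q);
    let mut best = min_q;
    let mut probes = 0;
    while lo <= hi && probes < max_probes {
        let q = lo + (hi - lo) / 2;
        probes += 1;
        if probe(q) >= target {
            // Target met: remember this quantizer and try a coarser one.
            best = q;
            if q == max_q {
                break;
            }
            lo = q + 1;
        } else {
            // Target missed: a finer quantizer is needed.
            if q == min_q {
                break;
            }
            hi = q - 1;
        }
    }
    best
}
```

Capping the number of probes reflects the "once or twice" retry budget in the text: each probe costs a trial encode, so a small cap trades accuracy of the chosen quantizer for encoding time.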
Requirements
The student should be familiar with Rust and C. General interest in image and video coding is recommended.
Difficulty: Medium.
Possible Mentors
Improve fast scene detection modes for rav1e
Scene detection methods determine where it's necessary to split video sequences for optimal encoding efficiency.
Problem / Intro
The currently implemented fast scene detection method is not optimal and sometimes gives false results. This is also detrimental to per-scene visual metric quality targeting.
Solution / Task
Reevaluate the current fast scene detection method and improve its correctness. Possible features:
- Adaptive resolution scaling for fast scene detection.
- Implementing additional methods/metrics for fast scene detection.
- Adaptive threshold for scene detection.
- Better external scene detection for rav1e-by-gop per scene visual metric quality targeting.
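To make the "adaptive threshold" idea concrete, here is a minimal sketch that flags a scene cut when the mean absolute difference between consecutive luma planes exceeds a threshold that follows the recent activity of the current scene. It is not the method rav1e uses; the function names and the `floor` and `ratio` parameters are assumptions for illustration, and frames are assumed to be already downscaled, equally sized luma planes.

```rust
/// Mean absolute difference between two equally sized luma planes.
fn mean_abs_diff(a: &[u8], b: &[u8]) -> f64 {
    assert_eq!(a.len(), b.len(), "frames must have the same size");
    if a.is_empty() {
        return 0.0;
    }
    let total: u64 = a
        .iter()
        .zip(b)
        .map(|(&x, &y)| u64::from(x.abs_diff(y)))
        .sum();
    total as f64 / a.len() as f64
}

/// Returns the indices of frames that start a new scene.
///
/// The threshold is the larger of a fixed `floor` and `ratio` times the
/// running average difference seen so far in the current scene, so noisy
/// or high-motion scenes need a larger jump to trigger a cut.
fn detect_cuts(frames: &[Vec<u8>], floor: f64, ratio: f64) -> Vec<usize> {
    let mut cuts = Vec::new();
    let mut running = 0.0_f64; // average difference within the current scene
    let mut count = 0u32;
    for i in 1..frames.len() {
        let d = mean_abs_diff(&frames[i - 1], &frames[i]);
        let threshold = floor.max(ratio * running);
        if d > threshold {
            cuts.push(i);
            // Start collecting statistics for the new scene.
            running = 0.0;
            count = 0;
        } else {
            count += 1;
            running += (d - running) / f64::from(count);
        }
    }
    cuts
}
```

A harness built around functions like these could compare the detected cut positions against hand-labelled ones to quantify the false results mentioned above.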
Requirements
The student should be familiar with Rust. Video knowledge is preferable.
Complexity: Medium.
Possible Mentors
Improved cluster support for Icecast
Icecast servers deliver streams to millions of users simultaneously worldwide. Each instance can handle many thousands of clients at the same time. However, redundancy, scalability, hardware requirements, and most importantly network connectivity often require the use of several instances in a professional deployment.
Problem / Intro
Icecast is designed as a standalone application. While basic support exists (such as master-slave mode), support for clusters can be improved. At this point a cluster-level management instance seems to be missing.
Solution / Task
A solution for cluster management should be developed. A cluster controller should be implemented as well as support within Icecast. The focus is on Icecast itself at this point. However, the controller should at least demonstrate all features implemented in Icecast. Possible features:
- Automatic master-slave, and relay configuration.
- Load distribution.
- Statistic data collection.
- Log collection.
- Node monitoring.
- Signalling of cluster state to external components (e.g. for automatic cluster scaling)
Requirements
The student should be familiar with C. A basic understanding of HTTP, as well as other web technologies is helpful. Knowledge of visualisation technologies is not required.
Complexity: Medium to high.
Possible Mentors
phschafft (teamed with someone else)
Icecast currently supports navigation of listeners between different streams. This was developed mostly for fallbacks (providing alternative content if the primary source fails). This support should be improved to provide better interaction with contents.
Problem / Intro
The current implementation is designed to work very robustly for source-side events (such as fallbacks). However, it fails to meet two requirements:
- Listener initiated interaction such as adaptive streaming.
- Exact timing.
Solution / Task
The current concept is capable of being extended to the new requirements. Code should be written to add the new features on top of the existing (and well proven) infrastructure. The following additional features would be needed:
- Detection and matching of features within and between streams. This is required for any kind of synchronisation.
- Executing operations exactly at detected features.
- Adding ways to communicate features and operations between listener and Icecast, source and Icecast, and between multiple Icecast instances of the same cluster.
Requirements
The student should be familiar with C. A basic understanding of HTTP, Ogg, and Matroska/WebM is helpful.
Complexity: Medium to high.
Possible Mentors
phschafft (teamed with someone else)
Uniform return channel for Icecast
Icecast supports broadcasting media to several thousand listeners per instance. In a classic setup this is a one way process from the source (such as a radio or TV studio) to the consumer. However it is sometimes useful to provide a return channel, such as for implementing polls.
Problem / Intro
Returning information from the listener to the source is part of classic media. The need has become more relevant as the web has become more interactive. Several techniques have been used to implement this, including asking the listeners to call in, send e-mails, or comment on a web page.
Classic ways to implement feedback include a media break and are only loosely bound to the forward channel.
Solution / Task
A uniform return channel should be implemented that allows several types of data to be sent from the listener to the source. This includes three major parts:
- Improved session handling (both for listeners and for sources)
- Implementing a return channel for listeners.
- Implementing a return channel for sources.
Requirements
The student should be familiar with C and HTTP.
Complexity: Medium.
Possible Mentors
phschafft (teamed with someone else)
(Live) listener statistics for Icecast
Icecast supports writing a basic access.log that includes client information as well as connection time. In addition a playlist log is supported, and live statistic data via the STATS interface.
A standard solution to use this data for detailed listener statistics is missing.
Problem / Intro
Currently there is no off-the-shelf solution to process the statistic data provided by Icecast. The best available solutions are standard access log analysers. A solution for live statistics is completely missing. Statistics that take content into account are also absent.
Solution / Task
To improve the situation two major steps must be accomplished:
- The statistic interface of Icecast must be enhanced to provide the required information.
- A solution that analyses this data must be developed. The focus is on this part.
This project allows for a wide range of ideas from the participants to be incorporated. There is not yet any specific technical direction set. Evaluating different options is the first part of the project.
Requirements
The student should be familiar with C. A basic understanding of log analysis, and monitoring and/or data collecting systems is helpful.
Difficulty: Medium depending on which targets the student chooses to follow.
Possible Mentors
phschafft (teamed with someone else)
Support WebAssembly SIMD in rav1e
rav1e supports the WASI platform, and its JavaScript API bindings rely on it.
Problem / Intro
WebAssembly SIMD is getting closer to being available, so we should support it.
Solution / Task
- Implement the dispatch logic for WASM SIMD as done already for x86_64 and aarch64.
- Implement the sum of absolute differences (SAD) and the sum of absolute transformed differences (SATD).
- Implement the inverse transforms (idct, iadst, identity, ...)
- Implement the motion compensation.
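As a starting point for the first two items, the sketch below shows a scalar SAD reference together with a small dispatch function that selects a kernel by feature level. It is a minimal sketch, not rav1e's actual dispatch code: `CpuLevel`, `detect_level`, `select_sad` and `SadFn` are hypothetical names, and the `Simd128` arm falls back to the scalar kernel where a real WASM SIMD implementation would be plugged in.

```rust
/// Hypothetical feature levels for a WASM build.
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
enum CpuLevel {
    Scalar,
    Simd128,
}

/// WebAssembly has no runtime CPU detection, so the level is decided at
/// compile time from the enabled target features.
fn detect_level() -> CpuLevel {
    if cfg!(all(target_arch = "wasm32", target_feature = "simd128")) {
        CpuLevel::Simd128
    } else {
        CpuLevel::Scalar
    }
}

/// Scalar sum of absolute differences over a `width` x `height` block.
/// Strides are given in samples; a SIMD kernel must return the same value.
fn sad_scalar(
    src: &[u8],
    src_stride: usize,
    reference: &[u8],
    ref_stride: usize,
    width: usize,
    height: usize,
) -> u32 {
    let mut sum = 0u32;
    for y in 0..height {
        let s = &src[y * src_stride..y * src_stride + width];
        let r = &reference[y * ref_stride..y * ref_stride + width];
        sum += s
            .iter()
            .zip(r)
            .map(|(&a, &b)| u32::from(a.abs_diff(b)))
            .sum::<u32>();
    }
    sum
}

type SadFn = fn(&[u8], usize, &[u8], usize, usize, usize) -> u32;

/// Pick the SAD kernel for a feature level.
fn select_sad(level: CpuLevel) -> SadFn {
    match level {
        // Placeholder: a real simd128 kernel would be returned here.
        CpuLevel::Simd128 => sad_scalar,
        CpuLevel::Scalar => sad_scalar,
    }
}
```

Keeping the scalar kernel as the reference makes it straightforward to test each new SIMD kernel by comparing its output against `sad_scalar` on random blocks.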
Requirements
The student should be familiar with Rust, WASM, wasmtime and related tools. Knowledge of x86 or arm assembly is not needed but will help.
Possible Mentors
Implement butteraugli in av-metrics
av-metrics is a collection of video quality metrics. butteraugli is a promising psychovisual similarity metric.
Problem / Intro
Currently the implementation of butteraugli exists as a stand-alone codebase. The code is readable, but it could be faster.
Solution / Task
- Implement Rust bindings to the reference butteraugli.
- Implement butteraugli in pure Rust within av-metrics.
- Write integration and unit tests to make sure the implementation does not diverge from the reference.
- Write criterion benchmarks.
- Implement x86_64 or aarch64 optimizations for it, using intrinsics or plain ASM.
Requirements
The student should be familiar with Rust, C and C++. Knowledge of x86 or arm assembly is welcome.
Possible Mentors
Deploy Opus-in-MP4 in WebAssembly for Safari
Implement the missing Opus support for Safari.
Problem / Intro
Safari supports decoding Opus streams, but only when packed in CAF, and with no support for Media Source Extensions. It would be nice to deploy a middleware in WASM that remuxes non-fragmented MP4 to CAF to support progressive streams, and to determine if enough hooks exist for fragmented MP4s as well. Needless to say, audio and video need to remain in sync throughout the whole process.
Solution / Task
- Make sure the underlying Opus decoder works in Safari.
- Implement a minimal MP4-to-CAF demuxer/muxer in your language of choice.
- Compile this remuxer to JavaScript/WebAssembly.
- Develop a proof-of-concept streaming setup for a non-fragmented MP4 file.
- Package the system into a reusable library.
- Investigate if it's possible to port this functionality to fragmented MP4 as well.
Requirements
The student should be familiar with streaming protocols, encapsulation, and WebAssembly. Expect lots of hacking in JavaScript.