XiphWiki: OpusFAQ (https://wiki.xiph.org/index.php?title=OpusFAQ&diff=16606), revision of 2017-07-07T20:46:56Z<p>Derf: /* Wouldn't it be better to build an index? */</p>
<hr />
<div>[[Image:Opus logo trans.png]]<br />
<br />
If you are looking for info not covered in this FAQ, try the '''[https://opus-codec.org main Opus website]''' or the pages included in the '''[[:Category:Opus|Opus category]]''' of this wiki.<br />
<br />
== General Questions ==<br />
<br />
=== What is Opus? Who created it? ===<br />
<br />
Opus is a totally open, royalty-free, highly versatile audio codec.<br />
<br />
It is primarily designed for interactive speech and music transmission over the Internet, but is also applicable to storage and streaming applications. It incorporates technology from Skype's '''[https://en.wikipedia.org/wiki/SILK SILK]''' codec and Xiph.Org's '''[http://celt-codec.org/ CELT]''' codec. It has been standardized by the '''[https://www.ietf.org/ Internet Engineering Task Force]''' (IETF) as '''[https://tools.ietf.org/html/rfc6716 RFC 6716]'''. <br />
<br />
Opus has been in development since early 2007. Programmers associated with '''[https://xiph.org/ Xiph.Org]''', '''[https://www.skype.com/ Skype]''' and several other organizations have contributed to its development and to the standardization process as part of the '''[https://datatracker.ietf.org/wg/codec/charter/ IETF's Codec Working Group]'''.<br />
<br />
=== How does Opus compare to other codecs? ===<br />
<br />
Opus is distinguished from most high quality formats (e.g. [[Vorbis]], AAC, MP3) by having '''[https://tools.ietf.org/html/rfc6716#section-2 low delay]''' (5 to 66.5 ms), and from most low delay formats (e.g. [[Speex]], G.711, GSM) by supporting '''[https://tools.ietf.org/html/rfc6716#section-2.1.1 high audio quality]''' (from narrow-band all the way to full-band audio).<br />
<br />
It '''[https://opus-codec.org/comparison meets or exceeds existing codecs' quality]''' across a wide range of bitrates, and it operates at lower delay than virtually any existing compressed format.<br />
<br />
Most importantly, the Opus format and its reference implementation are both available under '''[https://opus-codec.org/license/ liberal, royalty-free licenses]'''.<br /><br />
This makes it:<br />
* easy to adopt<br />
* compatible with free software<br />
* suitable for use as part of the basic infrastructure of the Internet<br />
<br />
=== Does Opus make all those other lossy codecs obsolete? ===<br />
<br />
Yes.<br />
<br />
From a technical point of view (loss, delay, bitrates, ...), Opus renders '''[[Speex]]''' obsolete and should also replace '''[[Vorbis]]''' and the common proprietary codecs (e.g. AAC, MP3, ...).<br />
<br />
=== Will Opus replace Vorbis in video files? ===<br />
<br />
For '''[[Ogg]]''' video files (which use the '''[[Theora]]''' video codec), you ''can'' use Opus instead of Vorbis, but the overall size reduction will be minimal and it will break compatibility with existing players.<br />
<br />
For WebM video files, the convention is to use the '''[http://www.webmproject.org/vp9/ VP9 video codec]''' when using Opus as an audio codec.<br />
<br />
=== How do I use Opus? ===<br />
<br />
For now, the best way to '''encode''' audio into Opus files is to use the '''opusenc''' command-line tool from the '''[https://opus-codec.org/downloads/ opus-tools package]'''.<br />
<br />
If you want to encode many files at once (e.g. your music library), try the applications listed in the '''[[OpusSupport|Opus Support]]''' page.<br />
<br />
For rough guidelines on encoding settings, see the '''[[Opus Recommended Settings]]''' page.<br />
<br />
=== What programs support Opus? ===<br />
<br />
Opus decoding support is now included in '''[http://caniuse.com/opus some Internet browsers]''' and '''[[OpusSupport|many applications]]''', including '''[https://www.mozilla.org/firefox Firefox]''', '''[https://www.foobar2000.org/ foobar2000]''' and '''[https://www.videolan.org/vlc/ VLC]''', as well as in frameworks such as '''[https://gstreamer.freedesktop.org/ GStreamer]''' and '''[https://ffmpeg.org/ FFmpeg]'''.<br />
<br />
For real-time applications, Opus support is available in '''[https://www.webrtc.org/ Google's WebRTC codebase]'''.<br />
<br />
Opus is a relatively new codec: '''[[OpusSupport|many more applications]]''' will support it in the near future.<br />
<br />
=== Does Opus support higher sampling rates, such as 96 kHz or 192 kHz? ===<br />
<br />
Yes and no.<br />
<br />
Opus encoding tools like opusenc will happily encode input files that are sampled at 96 or 192 kHz.<br />
<br />
However, files at these rates are internally '''converted to 48 kHz''' and then only frequencies '''up to 20 kHz''' are encoded.<br />
<br />
The reason is simple: lossy codecs are designed to preserve audible details while discarding irrelevant information. Since the human ear can only hear up to 20 kHz at best (usually lower than that), frequency content above 20 kHz is the first thing to go.<br />
<br />
See Monty's '''[https://people.xiph.org/~xiphmont/demo/neil-young.html article]''' for more details.<br />
<br />
If you want a codec to handle higher sampling rates losslessly, use '''[[FLAC]]'''!<br />
<br />
=== What are the licensing requirements? ===<br />
<br />
The reference Opus source code is released under a three-clause BSD license, which is a very permissive Open Source license. Commercial use and distribution (including in proprietary software) are permitted, provided that some basic conditions specified in the license are met.<br />
<br />
Opus is also covered by some patents, for which royalty-free usage rights are granted, under conditions that the authors believe are compatible with (hopefully) all open source licenses, including the GPL (v2 and v3).<br />
<br />
See the '''[https://www.opus-codec.org/license/ Opus Licensing]''' page for details.<br />
<br />
=== Why make Opus free? ===<br />
<br />
On the Internet, protocol and codec standards are part of the common infrastructure everyone builds upon.<br />
<br />
Most of the value of a high-quality standard is the innovation and inter-operation provided by the systems built on top of it. When a few parties have monopoly rights to monetize a standard, that infrastructure stops being so common and everyone else has more reason to use their own solution instead, increasing cost and reducing efficiency.<br />
<br />
Imagine a road system where each type of car could only drive on its own manufacturer's pavement. We all benefit from living in a world where all the roads are connected.<br />
<br />
This is why Opus, unlike many codecs, is free.<br />
<br />
=== Is the SILK part of Opus compatible with the SILK implementation shipped in Skype? ===<br />
<br />
No.<br />
<br />
The SILK codec, as submitted by Skype to the IETF, was heavily modified as part of its integration within Opus. The modifications are significant enough that it is not possible to just write a "translator". Even sharing code between Opus and the "old SILK" would be highly complex.<br />
<br />
=== Why not keep the SILK and CELT codecs separate? ===<br />
Opus is more than just two independent codecs with a switch.<br />
<br />
In addition to a [https://en.wikipedia.org/wiki/Linear_predictive_coding Linear Prediction] '''SILK mode''' and an [https://en.wikipedia.org/wiki/Modified_discrete_cosine_transform MDCT] '''CELT mode''' it has a '''hybrid mode''', where speech frequencies up to 8 kHz are encoded with LP while those between 8 and 20 kHz are encoded with MDCT. This is what allows Opus to have such high speech quality around 32 kbps.<br />
<br />
Another advantage of the integration is the ability to switch between these 3 modes seamlessly, without any audible "glitches" and without any out-of-band signalling.<br />
<br />
=== Now that Opus is standardized, will its development stop or can it be further improved? ===<br />
Yes, Opus '''can''' and '''should''' be improved, because unlike most '''[https://en.wikipedia.org/wiki/ITU-T#Key_standards_published_by_ITU ITU-T codecs]''', Opus is only defined in terms of its decoder.<br />
<br />
The encoder can keep evolving as long as the bitstream it produces can be decoded by the reference decoder. This is what made it possible for modern MP3 encoders (e.g. '''[https://en.wikipedia.org/wiki/LAME LAME]''') to improve far beyond the original '''[https://en.wikipedia.org/wiki/L3enc L3enc]''' and '''dist10''' reference implementations.<br />
<br />
Although it is unlikely that Opus encoders will see such a spectacular evolution, we certainly hope that future encoders will become much better than the reference encoder.<br />
<br />
In fact, the 1.1 libopus release significantly improves on the reference encoder's quality. See '''[https://people.xiph.org/~xiphmont/demo/opus/demo3.shtml Monty's demo]''' for more details.<br />
<br />
=== Will all future Opus releases comply with the [https://tools.ietf.org/html/rfc6716 Opus specification]? ===<br />
<br />
Yes.<br />
<br />
=== In what ways is Opus optimized for the Internet? ===<br />
<br />
Opus has good packet loss robustness and concealment, but its optimizations go further.<br />
<br />
One of the first things we were asked when designing Opus was to make the rate '''really''' adaptable, because we never know what kind of rates will be available. This meant not only having a wide range of bitrates, but also being able to vary the rate in small increments.<br />
<br />
This is why Opus scales from about '''6''' to '''512 kb/s''', in increments of '''0.4 kb/s''' (one byte with 20 ms frames). Opus can have '''more than 1200 possible bitrates''' without spending any bits on signalling the bitrate, because UDP already encodes the packet size.<br />
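<br />
As a rough check of the arithmetic above, here is a small Python sketch. It is not part of libopus, and the function names are made up for illustration. It assumes 20 ms frames and takes the approximate 6 and 512 kb/s bounds quoted above at face value.<br />
<br />
```python
import math

FRAME_MS = 20    # frame duration assumed in the text above
MIN_KBPS = 6     # approximate lower bound quoted above
MAX_KBPS = 512   # approximate upper bound quoted above


def bitrate_kbps(payload_bytes, frame_ms=FRAME_MS):
    """Bitrate implied by sending payload_bytes once every frame_ms milliseconds."""
    return payload_bytes * 8 / frame_ms  # bits per millisecond equals kb/s


def payload_bytes_for(kbps, frame_ms=FRAME_MS):
    """Number of payload bytes per frame for a rate given in kb/s."""
    return round(kbps * frame_ms / 8)


# One extra byte per frame is the smallest possible rate change.
step = bitrate_kbps(1)
smallest = payload_bytes_for(MIN_KBPS)
largest = payload_bytes_for(MAX_KBPS)
count = largest - smallest + 1

# Size of a hypothetical explicit bitrate field able to name every rate.
explicit_field_bits = math.ceil(math.log2(count))

print(f"step: {step} kb/s, payload sizes: {smallest}..{largest} bytes")
print(f"distinct bitrates: {count}, explicit field would need {explicit_field_bits} bits")
```
<br />
Under these assumptions the step is 0.4 kb/s and the range covers 1266 packet sizes; an explicit field able to name each of them would need 11 bits.<br />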
<br />
One last aspect is that Opus is simple to transport over RTP, as can be seen from the [https://tools.ietf.org/html/rfc7587 Opus RTP payload format]. For example, it's possible to decode RTP packets without having even seen the SDP or any out-of-band signalling.<br />
<br />
=== What applications for Android can play Opus? ===<br />
<br />
Right now there are just a few, but the list is growing fast. Please see [https://android.stackexchange.com/q/37970/7425 this question on android.stackexchange.com]. Feel free to suggest other applications.<br />
<br />
=== When will the next version be released? ===<br />
<br />
When it's done. Seriously, we do not know.<br />
<br />
Opus is not a large project with a fixed release schedule.<br />
<br />
That being said, our '''[https://www.opus-codec.org/downloads/ pre-releases]''' and even the git repositories ('''[https://git.xiph.org/?p=opus.git Xiph]''', '''[https://github.com/xiph/opus GitHub]''') are pretty stable and given proper testing (which you should always do anyway), are safe to distribute.<br />
<br />
Just be aware that the API of new features (that have never been included in a stable release) could potentially still change.<br />
<br />
== Software Developers' Questions ==<br />
<br />
=== On what platforms does Opus run? ===<br />
<br />
The Opus code base is written in C89 and should run on the vast majority of recent (and not so recent) CPUs.<br />
<br />
Some of the platforms '''[https://mf4.xiph.org/jenkins/view/opus/ on which Opus has been tested]''' include x86, x86-64, ARM, Itanium, Blackfin, and SPARC.<br />
<br />
=== Is there a fixed-point implementation? ===<br />
<br />
Yes.<br />
<br />
The fixed-point and floating-point decoder and encoder implementations are part of the same code base.<br />
<br />
The code defaults to float, so you need to configure with '''--enable-fixed-point''' (or define '''FIXED_POINT''' if not using the configure script) to build the code for fixed-point.<br />
<br />
=== Which implementation should I use? ===<br />
<br />
While the implementation in RFC 6716 is what ''defines'' the standard, it is likely not the best and most up-to-date implementation.<br />
<br />
The [https://opus-codec.org/ Opus] website was set up for the purpose of continually improving the implementation — in terms of speed, encoding quality, device compatibility, etc — while still conforming to the standard.<br />
<br />
All Opus implementations are compatible by definition.<br />
<br />
=== How is supporting Opus different from supporting Speex/G.711/MP3? ===<br />
<br />
Opus has variable frame durations which can change on the fly, so an Opus decoder needs to be ready to accept packets with durations that are '''any multiple of 2.5 ms''' up to a '''maximum of 120 ms'''.<br />
<br />
The Opus encoder and decoder do not need to have matched sampling rates or channel counts. It is recommended to always decode at the highest rate the hardware supports (e.g. 48 kHz stereo) so the user gets the full quality of whatever the far end is sending.<br />
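<br />
The packet-duration rule above can be expressed as a quick sanity check. This is only an illustrative Python sketch: the helper name is hypothetical, and it assumes durations are counted in samples at 48 kHz.<br />
<br />
```python
SAMPLE_RATE = 48000                          # decode rate assumed for this sketch
UNIT_SAMPLES = SAMPLE_RATE * 25 // 10000     # 2.5 ms  -> 120 samples
MAX_SAMPLES = SAMPLE_RATE * 120 // 1000      # 120 ms  -> 5760 samples


def is_valid_packet_duration(samples):
    """True if a packet duration (in 48 kHz samples) is a multiple of
    2.5 ms and no longer than 120 ms."""
    return 0 < samples <= MAX_SAMPLES and samples % UNIT_SAMPLES == 0


for samples in (120, 960, 2880, 5760, 5880, 100):
    ms = samples * 1000 / SAMPLE_RATE
    print(f"{samples:5d} samples ({ms:6.2f} ms): {is_valid_packet_duration(samples)}")
```
<br />
One practical consequence is that a receive buffer sized for 5760 samples per channel at 48 kHz can hold any single packet's output.<br />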
<br />
=== My application doesn't work. Can anyone help me? ===<br />
<br />
It's possible to get help, but before doing so, there are a few basic things to try:<br />
<br />
* Implement your application with uncompressed audio instead of Opus. If it still doesn't work, then the problem isn't related to Opus.<br />
* Read the [https://www.opus-codec.org/docs/ Opus documentation].<br />
* Read the [https://git.xiph.org/?p=opus.git;a=blob;f=src/opus_demo.c opus_demo.c] source code to see how to use the encoder and decoder.<br />
<br />
If you still can't solve the problem, the best option is to ask for help on the [http://lists.xiph.org/mailman/listinfo/opus mailing list] or on the '''#opus''' IRC channel on '''irc.freenode.net'''.<br />
<br />
=== How do I report a bug? ===<br />
<br />
If you think you have found a bug in Opus (and not in your application), please [https://trac.xiph.org/newticket?component=Opus file a bug report].<br />
<br />
Please include a way for us to reproduce the problem. The best way to do this is to provide an input file, along with the opusenc/opusdec/opus_demo command line that causes the bug to occur.<br />
<br />
If the bug cannot be triggered by the command line tools, please provide a simple patch or C file that can help reproduce it. Please also provide any other relevant information, such as OS, CPU, build options, etc.<br />
<br />
Don't hesitate to also contact us on the [http://lists.xiph.org/mailman/listinfo/opus mailing list] or on [irc://irc.freenode.net/opus IRC].<br />
<br />
=== What is Opus Custom? ===<br />
<br />
Opus Custom is an '''optional''' part of the Opus standard that allows for sampling rates other than 8, 12, 16, 24, or 48 kHz and frame sizes other than multiples of 2.5 ms.<br />
<br />
Opus Custom requires additional out-of-band signalling that Opus does not normally require and disables many of Opus' coding modes. Also, because it is an optional part of the specification, using Opus Custom may lead to compatibility problems.<br />
<br />
For these reasons, '''its use is discouraged''' outside of very specific applications. <br />
<br />
You may want to use Opus Custom for:<br />
<br />
* ultra-low-delay applications, where synchronization with the soundcard buffer is important. <br />
* low-power embedded applications, where compatibility with others is not important.<br />
<br />
For almost all other types of applications, Opus Custom should not be used.<br />
<br />
=== How do I use 44.1 kHz or some other sampling rate not directly supported by Opus? ===<br />
<br />
Tools which read or write Opus should inter-operate with other sampling rates by transparently performing sample rate conversion behind the scenes whenever necessary. In particular, software developers should not use Opus Custom for 44.1 kHz support, except in the very specific circumstances outlined above.<br />
<br />
Note that it's generally preferable for a decoder to output at 48 kHz, even when you know the original input was 44.1 kHz. This is not only because you can skip resampling, but also because many cheaper audio interfaces have poor quality output at 44.1 kHz.<br />
<br />
The '''[https://opus-codec.org/downloads/ opus-tools]''' package source code contains a small, high quality, high performance, BSD licensed '''[https://github.com/xiph/opus-tools/blob/master/src/resample.c resampler]''' which can be used where resampling is required.<br />
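<br />
To give a feel for the conversion involved, the following Python sketch works out the 44.1 to 48 kHz ratio and the resulting sample counts. It only does the bookkeeping; it is not a resampler, the function names are hypothetical, and it ignores any filter delay a real resampler adds.<br />
<br />
```python
from fractions import Fraction


def resample_ratio(in_rate, out_rate=48000):
    """Exact output/input sample ratio for a rate conversion."""
    return Fraction(out_rate, in_rate)


def output_length(in_samples, in_rate, out_rate=48000):
    """Number of whole output samples produced from in_samples input samples."""
    return int(in_samples * resample_ratio(in_rate, out_rate))


ratio = resample_ratio(44100)
print(f"44.1 kHz -> 48 kHz ratio: {ratio.numerator}/{ratio.denominator}")
print("one second of input:", output_length(44100, 44100), "output samples")
```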
<br />
=== But won't the resampler hurt the quality? Isn't it better to use 44.1 kHz directly? ===<br />
<br />
Not really. The quality degradation caused by any reasonable resampler (SoX, libspeexdsp, libsamplerate, ...) is far less than the distortion caused by the best lossy codec at its highest bitrate. If you can't tolerate the quality degradation caused by a good 44.1 ↔ 48 kHz resampler, then you shouldn't be using a lossy codec in the first place. Similarly, the extra CPU spent in the resampler is small compared to the rest of the codec. Not only that, but many soundcards only support 48 kHz on playback, so players can directly play the output rather than resample it to 48 kHz (e.g. for a 44.1 kHz MP3). So effectively, Opus is only shifting the burden of resampling from the decoder side to the encoder side.<br />
<br />
One advantage of supporting only one internal rate is that it makes it possible for Opus to support many features, including efficient speech compression (through SILK) and real-time applications. It also means all the quality tuning effort can be spent on a single configuration, which helps bring even better quality.<br />
<br />
=== How is the bitrate setting used in VBR mode? ===<br />
<br />
Variable bitrate (VBR) mode allows the bitrate to automatically vary over time based on the audio being encoded, in order to achieve a consistent quality.<br />
<br />
The bitrate setting controls the desired quality, on a scale that is calibrated to closely approximate the average bitrate that would be obtained over a large and diverse collection of audio. The actual bitrate of any particular audio stream may be higher or lower than this average.<br />
<br />
=== What frame size should I use? ===<br />
<br />
A '''20 ms''' frame size works well for most applications. Smaller frame sizes may be used to achieve lower latency, but have lower quality at a given bitrate.<br />
<br />
Sizes greater than 20 ms increase latency and are generally beneficial only at fairly low bitrates, or when used to reduce external overhead (e.g. by reducing the number of packets that are sent). For file encoding, using a frame size larger than 20 ms will usually result in '''worse''' quality for the same bitrate because it constrains the encoder in the decisions it can make.<br />
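<br />
The "external overhead" mentioned above can be estimated with a short calculation. The Python sketch below assumes, purely for illustration, one Opus packet per frame duration carried over IPv4, UDP and RTP with no header extensions (20 + 8 + 12 bytes of headers per packet); your transport may differ.<br />
<br />
```python
# Assumed transport: IPv4 + UDP + RTP headers, no options or extensions.
HEADER_BYTES = 20 + 8 + 12


def packets_per_second(frame_ms):
    """Packet rate when each packet carries frame_ms milliseconds of audio."""
    return 1000 / frame_ms


def overhead_kbps(frame_ms, header_bytes=HEADER_BYTES):
    """Bitrate spent on packet headers alone, in kb/s."""
    return packets_per_second(frame_ms) * header_bytes * 8 / 1000


for frame_ms in (2.5, 5, 10, 20, 40, 60):
    print(f"{frame_ms:>4} ms: {packets_per_second(frame_ms):6.1f} packets/s, "
          f"{overhead_kbps(frame_ms):6.1f} kb/s of headers")
```
<br />
With these assumed headers, 20 ms packets cost 16 kb/s of overhead, which is why longer packets can pay off when the audio bitrate itself is low.<br />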
<br />
=== Forward Error correction (FEC) doesn't appear to do anything! HELP! ===<br />
<br />
The in-band FEC feature of Opus helps reduce the harm of packet loss by encoding some information about the prior packet.<br />
<br />
In order to make use of in-band FEC, the receiver must delay its output by at least one frame, so that when a packet is lost it can call the decoder with the decode_fec argument on the ''next'' packet in order to reconstruct the missed frame. This works best if it's integrated with a jitter buffer.<br />
<br />
FEC is only used by the encoder under certain conditions:<br />
* the feature must be enabled via the '''OPUS_SET_INBAND_FEC''' CTL<br />
* the encoder must be told to expect loss via the '''OPUS_SET_PACKET_LOSS_PERC''' CTL<br />
* the codec must be operated in any of the '''Linear Prediction''' or '''Hybrid''' modes<br />
<br />
Frame durations shorter than 10 ms and very high bitrates will use the MDCT modes, where FEC is not available.<br />
<br />
Even when FEC is not used, telling the encoder about the expected level of loss will help it make more intelligent decisions. By default, the implementation assumes there is no loss.<br />
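<br />
The receive-side logic described above can be sketched as follows. This Python example is a simplified model, not libopus code: <tt>decode</tt> is a hypothetical callable standing in for a call into the decoder (packet data plus a decode_fec flag), and the packet list stands in for a jitter buffer that already knows whether the next packet has arrived.<br />
<br />
```python
def decode_with_fec(packets, decode):
    """Decode a sequence of packets, using in-band FEC where possible.

    packets: list indexed by sequence number; None marks a lost packet.
    decode:  hypothetical callable (data, decode_fec) -> pcm that stands in
             for the real decoder call.
    """
    output = []
    for seq, data in enumerate(packets):
        if data is not None:
            # Packet arrived: normal decode.
            output.append(decode(data, False))
            continue
        # Packet lost: look one packet ahead (this is the one-frame delay).
        following = packets[seq + 1] if seq + 1 < len(packets) else None
        if following is not None:
            # Recover the lost frame from the FEC data in the next packet.
            output.append(decode(following, True))
        else:
            # Nothing to recover from: let the decoder conceal the loss.
            output.append(decode(None, False))
    return output


def fake_decode(data, decode_fec):
    """Stub decoder that just records how it was called."""
    if data is None:
        return "concealed"
    return f"fec({data})" if decode_fec else f"pcm({data})"


print(decode_with_fec(["a", None, "c", None, None, "f"], fake_decode))
```
<br />
Note that after a frame is recovered from FEC, the next packet is still decoded normally on the following iteration.<br />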
<br />
=== I can't use malloc or much stack on my embedded platform. How do I make Opus work? ===<br />
<br />
A normal build of libopus only uses <tt>malloc/free</tt> in the <tt>_create()</tt> and <tt>_destroy()</tt> calls, making it safe for realtime use as long as the codec state is pre-created.<br />
<br />
To build Opus without the references to <tt>malloc/free</tt>, you must:<br />
<br />
* use the <tt>_init()</tt> calls rather than the <tt>_create()</tt> calls in your application<br />
* compile with <tt>CFLAGS="-DOVERRIDE_OPUS_ALLOC -DOVERRIDE_OPUS_FREE -D'opus_alloc(x)=NULL' -D'opus_free(x)=NULL' "</tt>.<br />
<br />
If libopus is built with <tt>-DNONTHREADSAFE_PSEUDOSTACK</tt> (instead of <tt>VAR_ARRAYS</tt>, or <tt>USE_ALLOCA</tt>), it will use a user-provided block of heap instead of stack for many things, resulting in much lower stack usage.<br><br />
This makes the resulting library '''non-threadsafe''' and is '''not recommended''' on anything except limited embedded platforms.<br />
<br />
=== How can I ensure that my software interoperates with other software implementing Opus? ===<br />
<br />
For applications using Ogg files, there are some [https://people.xiph.org/~greg/opus_testvectors/ Ogg Opus testvectors] to test decoders and you can test encoders with opusdec. For RTP applications, the opusrtp tool can be useful.<br />
<br />
In general, here's a list of specific issues to check:<br />
* Can your application handle all frame sizes, including changing the frame size from frame to frame?<br />
* Does your application react properly to lost packets, by calling the decoder with a NULL packet?<br />
<br />
=== What is the complexity of Opus? ===<br />
<br />
The complexity of Opus varies by a large amount based on the settings used.<br />
<br />
It depends on the mode, audio bandwidth, number of channels, and even a "complexity knob" that can trade complexity for quality. It will run easily on any recent PC or smartphone. <br />
<br />
For slower embedded CPUs/DSPs, the amount of CPU required will vary depending on the configuration and the exact CPU, so you will need to experiment. Do not expect Opus to run quickly on really slow devices like 8-bit micro-controllers.<br />
<br />
=== Opus is using too much CPU for my application. What can I do? ===<br />
<br />
First, don't panic, and don't start writing assembly just yet.<br />
<br />
It's possible that you're just not using the right set of options.<br />
<br />
If you're targeting an embedded/mobile platform, chances are the fixed-point build will be faster, so make sure you're using '''--enable-fixed-point''' or defining '''FIXED_POINT''' in the build system.<br />
<br />
Opus also has a complexity option that can trade quality for complexity. The default is highest quality and highest complexity. You can control this using '''OPUS_SET_COMPLEXITY()''' (see the '''[https://www.opus-codec.org/docs/ Documentation]''' for details).<br />
<br />
If all else fails and you need to optimize the Opus code, see the next question.<br />
<br />
=== I would like to optimize/improve/help with Opus. Where should I start? ===<br />
<br />
Please '''[https://www.opus-codec.org/contact/ contact us]''' before you start, or at least before you get too far.<br />
<br />
This will help coordinate the efforts made on Opus and reduce the probability of wasting your time on duplicated effort or going down the wrong path. More details in the '''[[OpusContributing|contributing page]]'''.<br />
<br />
=== Does Opus have an echo canceller like Speex does? ===<br />
<br />
Echo cancellation is completely independent from codecs.<br />
<br />
You can use any echo canceller (including the one from libspeexdsp) along with Opus.<br />
<br />
That being said, among the free acoustic echo cancellers (AEC) we're aware of, the best is probably the Google AEC from the [https://code.google.com/p/webrtc/ WebRTC codebase].<br />
<br />
=== How do I get the duration of a .opus file? ===<br />
<br />
Use '''[https://mf4.xiph.org/jenkins/view/opus/job/opusfile-autotools/ws/doc/html/group__stream__info.html op_pcm_total()]''' from '''[https://opus-codec.org/downloads/ libopusfile]'''.<br />
<br />
If you want to implement this yourself, you need to:<br />
* Read the BOS (Beginning Of Stream) pages to enumerate the serial numbers of all concurrently multiplexed streams, identify the Opus stream you want, and get its preskip value.<br />
* Read up through the first complete audio data page to compute the starting granule position (since the timestamps might not start at 0, e.g., if the file was captured from a live stream that was joined after the start).<br />
* Seek near the end of a file and look for a page with the same serial number as found in the headers (just under 64 kB from the end should be sufficient to ensure you find a page, assuming the Opus data is not multiplexed with another stream and there is no trailing garbage in the file).<br />
* If you find a page whose serial number was not included in the original set of BOS pages, you have a chained stream. You need to bisect the file to identify the end of the first chain and the start of the next, and repeat this process for each link in the chain.<br />
* If you don't find any pages at all, or find a page whose serial number was included in the original set of BOS pages, but was not the serial number of the Opus stream you want, back up and try again (being careful to avoid rescanning the same data, which can produce quadratic worst-case complexity).<br />
* If you find a page whose serial number matches the Opus stream you want, look at its final granule position, and compute the total duration (in seconds) as (final_granule_position - initial_granule_position - preskip)/48000.0.<br />
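<br />
The final step above is plain arithmetic. Here is a minimal Python sketch of it, assuming the three values have already been extracted from the file as described; the function names and the sample numbers in the usage lines are illustrative only.<br />
<br />
```python
GRANULE_RATE = 48000.0  # the formula above divides granule positions by 48000


def opus_duration_seconds(final_gp, initial_gp, preskip):
    """Duration of one link: (final - initial - preskip) / 48000.0."""
    return (final_gp - initial_gp - preskip) / GRANULE_RATE


def format_duration(seconds):
    """Render seconds as e.g. '58m55.000s'."""
    minutes, rest = divmod(seconds, 60)
    return f"{int(minutes)}m{rest:06.3f}s"


# Illustrative values: a stream starting at granule position 0 with a
# preskip of 312 samples and 169680000 samples of audio.
duration = opus_duration_seconds(169680312, 0, 312)
print(duration, format_duration(duration))
```
<br />
For a chained file, the same computation is repeated for each link and the results are summed.<br />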
<br />
=== Why don't you store the duration in the header? Isn't all of that slow and complicated? ===<br />
<br />
Computing the duration directly from the file contents allows files to be written in a single pass, without any seeking, which is necessary for live streaming. Chaining also simplifies live streaming, as you can just pipe multiple files into the same network connection, with all associated metadata updates, etc., and the results are still valid .opus files (contrast with the '''[http://www.smackfu.com/stuff/programming/shoutcast.html hacks used to add metadata to MP3 streams]''').<br />
<br />
Opening a typical .opus file, which is not multiplexed and not chained, and computing the duration over the network requires just one extra HTTP request, which can proceed in parallel with the buffering in the main request. This is the behavior you will get from libopusfile's HTTP backend by default.<br />
<br />
Enumeration of chain boundaries can be expensive in files with many links, but in our testing libopusfile used nearly an order of magnitude fewer seeks to do this than some other media frameworks (at the time). Storing a duration in a header wouldn't solve this, since every link in a chain has its own, independent headers. If the cost of chain enumeration is a problem, the best way to avoid it is to store the links in separate files (i.e., don't use chaining).<br />
<br />
=== How do I seek in a .opus file? ===<br />
<br />
Use '''[https://mf4.xiph.org/jenkins/view/opus/job/opusfile-autotools/ws/doc/html/group__stream__seeking.html op_pcm_seek() or op_raw_seek()]''' from '''[https://opus-codec.org/downloads/ libopusfile]'''.<br />
<br />
If you want to implement seeking yourself, you need to:<br />
* Identify the link that contains the target (if you have a chained file).<br />
* Move the target back by 80 ms to get enough pre-roll data (to ensure the decoder will have converged by the time you reach the target), as recommended by '''[https://tools.ietf.org/html/rfc7845 RFC 7845]'''.<br />
* Estimate the location of the last audio data page with a completed packet prior to the adjusted target, using the duration and size (in bytes) of the link.<br />
* Seek to that location and scan forward until you find an audio data page with a completed packet (that contains a valid granule position).<br />
* If you think you are sufficiently close to the adjusted target, scan forward until you find the next audio data page with a completed packet.<br />
* If the adjusted target lies between the first audio data page with a completed packet you found and the next one, stop. You can decode forward from here and start playing when you reach your (original, unadjusted) target.<br />
* Otherwise, go back and re-estimate the seek location using the granule positions and file offsets of the page(s) you just found.<br />
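<br />
The estimate-and-refine loop above can be modelled in a few lines. The Python sketch below is a simplified illustration, not libopusfile's algorithm: the file is replaced by an in-memory list of (byte offset, granule position) pairs, "seeking" is simulated with a binary search over the offsets, and chaining, multiplexing and the worst-case fallbacks are left out. All names are hypothetical.<br />
<br />
```python
from bisect import bisect_left

PREROLL_SAMPLES = 3840  # 80 ms at 48 kHz


def adjusted_target(target_gp, first_gp=0):
    """Move the target back by the pre-roll, without going before the start."""
    return max(first_gp, target_gp - PREROLL_SAMPLES)


def estimate_offset(target_gp, lo, hi):
    """Linear interpolation of a byte offset between two known pages."""
    (lo_off, lo_gp), (hi_off, hi_gp) = lo, hi
    if hi_gp <= lo_gp:
        return lo_off
    frac = (target_gp - lo_gp) / (hi_gp - lo_gp)
    return lo_off + int(frac * (hi_off - lo_off))


def find_page_before(pages, target_gp):
    """Return (index, seeks): the last page whose granule position does not
    exceed target_gp, and how many simulated seeks were needed.
    index is None when the target lies before the end of the first page."""
    offsets = [off for off, _ in pages]
    lo, hi = 0, len(pages) - 1
    if target_gp < pages[lo][1]:
        return None, 0
    if target_gp >= pages[hi][1]:
        return hi, 0
    seeks = 0
    # Invariant: pages[lo] ends at or before the target, pages[hi] after it.
    while hi - lo > 1:
        guess = estimate_offset(target_gp, pages[lo], pages[hi])
        k = bisect_left(offsets, guess)      # first page at or after the guess
        k = min(max(k, lo + 1), hi - 1)      # always make progress
        seeks += 1
        if pages[k][1] <= target_gp:
            lo = k
        else:
            hi = k
    return lo, seeks


# Synthetic file: 100 pages of one second each, 4000 bytes per page.
pages = [(i * 4000, (i + 1) * 48000) for i in range(100)]
target = adjusted_target(int(50.5 * 48000))
index, seeks = find_page_before(pages, target)
print(f"decode forward from the page after index {index}, found in {seeks} seeks")
```
<br />
Playback would then decode forward from the page that follows the returned one and discard output until the original, unadjusted target is reached.<br />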
<br />
libopusfile includes fallbacks to prevent pathological worst-case behavior when its guesses are repeatedly wrong. Weighted bisection can degrade to a linear scan, but libopusfile's worst case is within a constant factor of naive bisection (i.e., logarithmic). We have only ever observed such pathological behavior in files we manually constructed to trigger it.<br />
<br />
libopusfile also takes shortcuts when the target location is near the current position, to make small seeks cheaper. In the best case it can loop forever over very short files whose data is contained in a single page (e.g., less than 1 second long with default encoder settings) without any seeking at all.<br />
<br />
You can find more information on seeking in files that contain Opus multiplexed with other streams (e.g., video) '''[[GranulePosAndSeeking|on this page]]'''.<br />
<br />
=== Wouldn't it be better to build an index? ===<br />
<br />
As with file durations, an index at the beginning of the file is incompatible with live streaming. It also means more data has to be fetched before a file can start playing over the network, because you must read past the index even when you don't intend to seek. The index could be stored at the end (which even still allows encoding the file in a single pass), but this requires one (or more) extra seeks to read the index (especially if its exact location at the end is not known), either on file open or on first seek. Unlike the final timestamp, which is small and fixed in size, an index grows with the file duration, and can have unbounded size. It is also easy for an index to become out of sync with a file that has been edited or damaged, in which case seeking will simply fail. By contrast, you can seek in a truncated .opus download without issues.<br />
<br />
In practice, bisection seeking on VBR audio achieves performance that is very nearly as good as seeking with an index, without any of the drawbacks of an index. libopusfile provides a test program called seeking_example which can be used to benchmark the performance on your files.<br />
<br />
On a 96 kbps VBR file nearly one hour long (the second movement of Mahler's Symphony No. 8 "Symphony of a Thousand"):<br />
<br />
Testing exact PCM seeking to random places in 169680000 samples (58m55.000s)...<br />
Total seek operations: 1020 (1.020 per exact seek, 2 maximum).<br />
<br />
On a chained file formed by concatenating the eight test vectors for the currently supported channel layouts in mapping family 1:<br />
<br />
Opened file containing 8 links with 18 seeks (2.250 per link).<br />
Testing exact PCM seeking to random places in 2759064 samples (57.481s)...<br />
Total seek operations: 946 (0.946 per exact seek, 2 maximum).<br />
<br />
That is, the number of physical seeks required is almost always 1, every once in a while 2, and in short files, sometimes even 0.<br />
<br />
[[Category:Opus]]</div>Derfhttps://wiki.xiph.org/index.php?title=OpusFAQ&diff=16605OpusFAQ2017-07-07T20:45:39Z<p>Derf: /* How do I seek in a .opus file? */</p>
<hr />
<div>[[Image:Opus logo trans.png]]<br />
<br />
If you are looking for info not covered in this FAQ, try the '''[https://opus-codec.org main Opus website]''' or the pages included in the '''[[:Category:Opus|Opus category]]''' of this wiki.<br />
<br />
== General Questions ==<br />
<br />
=== What is Opus? Who created it? ===<br />
<br />
Opus is a totally open, royalty-free, highly versatile audio codec.<br />
<br />
It is primarily designed for interactive speech and music transmission over the Internet, but is also applicable to storage and streaming applications. It incorporates technology from Skype's '''[https://en.wikipedia.org/wiki/SILK SILK]''' codec and Xiph.Org's '''[http://celt-codec.org/ CELT]''' codec. It has been standardized by the '''[https://www.ietf.org/ Internet Engineering Task Force]''' (IETF) as '''[https://tools.ietf.org/html/rfc6716 RFC 6716]'''. <br />
<br />
Opus has been in development since early 2007. Programmers associated with '''[https://xiph.org/ Xiph.Org]''', '''[https://www.skype.com/ Skype]''' and several other organizations have contributed to its development and to the standardization process as part of the '''[https://datatracker.ietf.org/wg/codec/charter/ IETF's Codec Working Group]'''.<br />
<br />
=== How does Opus compare to other codecs? ===<br />
<br />
Opus is distinguished from most high quality formats (e.g., [[Vorbis]], AAC, MP3) by having '''[https://tools.ietf.org/html/rfc6716#section-2 low delay]''' (5 to 66.5 ms) and from most low delay formats (e.g., [[Speex]], G.711, GSM) by supporting '''[https://tools.ietf.org/html/rfc6716#section-2.1.1 high audio quality]''' (from narrow-band all the way to full-band audio).<br />
<br />
It '''[https://opus-codec.org/comparison meets or exceeds existing codecs' quality]''' across a wide range of bitrates, and it operates at lower delay than virtually any existing compressed format.<br />
<br />
Most importantly, the Opus format and its reference implementation are both available under '''[https://opus-codec.org/license/ liberal, royalty-free licenses]'''.<br /><br />
This makes it:<br />
* easy to adopt<br />
* compatible with free software<br />
* suitable for use as part of the basic infrastructure of the Internet<br />
<br />
=== Does Opus make all those other lossy codecs obsolete? ===<br />
<br />
Yes.<br />
<br />
From a technical point of view (loss, delay, bitrates, ...) Opus renders '''[[Speex]]''' obsolete and should also replace '''[[Vorbis]]''' and the common proprietary codecs (e.g. AAC, MP3, ...).<br />
<br />
=== Will Opus replace Vorbis in video files? ===<br />
<br />
For '''[[Ogg]]''' video files (which use the '''[[Theora]]''' video codec), you ''can'' use Opus instead of Vorbis, but the overall size reduction will be minimal and it will break compatibility with existing players.<br />
<br />
For WebM video files, the convention is to use the '''[http://www.webmproject.org/vp9/ VP9 video codec]''' when using Opus as an audio codec.<br />
<br />
=== How do I use Opus? ===<br />
<br />
For now, the best way to '''encode''' audio into Opus files is to use the '''opusenc''' command-line tool from the '''[https://opus-codec.org/downloads/ opus-tools package]'''.<br />
<br />
If you want to encode many files at once (e.g. your music library), try the applications listed in the '''[[OpusSupport|Opus Support]]''' page.<br />
<br />
For rough guidelines on encoding settings, see the '''[[Opus Recommended Settings]]''' page.<br />
<br />
=== What programs support Opus? ===<br />
<br />
Opus decoding support is now included in '''[http://caniuse.com/opus some Internet browsers]''' and '''[[OpusSupport|many applications]]''', including '''[https://www.mozilla.org/firefox Firefox]''', '''[https://www.foobar2000.org/ foobar2000]''' and '''[https://www.videolan.org/vlc/ VLC]''', as well as in frameworks such as '''[https://gstreamer.freedesktop.org/ GStreamer]''' and '''[https://ffmpeg.org/ FFmpeg]'''.<br />
<br />
For real-time applications, Opus support is available in '''[https://www.webrtc.org/ Google's WebRTC codebase]'''.<br />
<br />
Opus is a relatively new codec: '''[[OpusSupport|many more applications]]''' will support it in the near future.<br />
<br />
=== Does Opus support higher sampling rates, such as 96 kHz or 192 kHz? ===<br />
<br />
Yes and no.<br />
<br />
Opus encoding tools like opusenc will happily encode input files that are sampled at 96 or 192 kHz.<br />
<br />
However, files at these rates are internally '''converted to 48 kHz''' and then only frequencies '''up to 20 kHz''' are encoded.<br />
<br />
The reason is simple: lossy codecs are designed to preserve audible details while discarding irrelevant information. Since the human ear can only hear up to 20 kHz at best (usually lower than that), frequency content above 20 kHz is the first thing to go.<br />
<br />
See Monty's '''[https://people.xiph.org/~xiphmont/demo/neil-young.html article]''' for more details.<br />
<br />
If you want a codec to handle higher sampling rates losslessly, use '''[[FLAC]]'''!<br />
<br />
=== What are the licensing requirements? ===<br />
<br />
The reference Opus source code is released under a three-clause BSD license, which is a very permissive Open Source license. Commercial use and distribution (including in proprietary software) are permitted, provided that some basic conditions specified in the license are met. <br />
<br />
Opus is also covered by some patents, for which royalty-free usage rights are granted, under conditions that the authors believe are compatible with (hopefully) all open source licenses, including the GPL (v2 and v3).<br />
<br />
See the '''[https://www.opus-codec.org/license/ Opus Licensing]''' page for details.<br />
<br />
=== Why make Opus free? ===<br />
<br />
On the Internet, protocol and codec standards are part of the common infrastructure everyone builds upon.<br />
<br />
Most of the value of a high-quality standard is the innovation and inter-operation provided by the systems built on top of it. When a few parties have monopoly rights to monetize a standard, that infrastructure stops being so common and everyone else has more reason to use their own solution instead, increasing cost and reducing efficiency.<br />
<br />
Imagine a road system where each type of car could only drive on its own manufacturer's pavement. We all benefit from living in a world where all the roads are connected.<br />
<br />
This is why Opus, unlike many codecs, is free.<br />
<br />
=== Is the SILK part of Opus compatible with the SILK implementation shipped in Skype? ===<br />
<br />
No.<br />
<br />
The SILK codec, as submitted by Skype to the IETF, was heavily modified as part of its integration within Opus. The modifications are significant enough that it is not possible to just write a "translator". Even sharing code between Opus and the "old SILK" would be highly complex.<br />
<br />
=== Why not keep the SILK and CELT codecs separate? ===<br />
Opus is more than just two independent codecs with a switch.<br />
<br />
In addition to a [https://en.wikipedia.org/wiki/Linear_predictive_coding Linear Prediction] '''SILK mode''' and an [https://en.wikipedia.org/wiki/Modified_discrete_cosine_transform MDCT] '''CELT mode''' it has a '''hybrid mode''', where speech frequencies up to 8 kHz are encoded with LP while those between 8 and 20 kHz are encoded with MDCT. This is what allows Opus to have such high speech quality around 32 kbps.<br />
<br />
Another advantage of the integration is the ability to switch between these 3 modes seamlessly, without any audible "glitches" and without any out-of-band signalling.<br />
<br />
=== Now that Opus is standardized, will its development stop or can it be further improved? ===<br />
Yes, Opus '''can''' and '''should''' be improved, because unlike most '''[https://en.wikipedia.org/wiki/ITU-T#Key_standards_published_by_ITU ITU-T codecs]''', Opus is only defined in terms of its decoder.<br />
<br />
The encoder can keep evolving as long as the bitstream it produces can be decoded by the reference decoder. This is what made it possible for modern MP3 encoders (e.g. '''[https://en.wikipedia.org/wiki/LAME LAME]''') to improve far beyond the original '''[https://en.wikipedia.org/wiki/L3enc L3enc]''' and '''dist10''' reference implementations.<br />
<br />
Although it is unlikely that Opus encoders will see such a spectacular evolution, we certainly hope that future encoders will become much better than the reference encoder.<br />
<br />
In fact, the 1.1 libopus release significantly improves on the reference encoder's quality. See '''[https://people.xiph.org/~xiphmont/demo/opus/demo3.shtml Monty's demo]''' for more details.<br />
<br />
=== Will all future Opus releases comply with the [https://tools.ietf.org/html/rfc6716 Opus specification]? ===<br />
<br />
Yes.<br />
<br />
=== In what ways is Opus optimized for the Internet? ===<br />
<br />
Opus has good packet loss robustness and concealment, but its optimisations go further.<br />
<br />
One of the first things we were asked when designing Opus was to make the rate '''really''' adaptable, because we never know what kind of rates will be available. This meant not only having a wide range of bitrates, but also being able to vary the rate in small increments.<br />
<br />
This is why Opus scales from about '''6''' to '''512 kb/s''', in increments of '''0.4 kb/s''' (one byte with 20 ms frames). Opus can have '''more than 1200 possible bitrates''' while spending only '''11 bits''' signalling the bitrate because UDP already encodes the packet size.<br />
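<br />
The increment and the number of available rates follow from simple arithmetic. The sketch below is only an illustration of that calculation: the constant names are made up for this example, and the 6 and 512 kb/s bounds are the approximate figures quoted above, not exact limits.<br />

```python
# Bitrate granularity when consecutive packet sizes differ by one byte.
FRAME_MS = 20      # frame duration used in the example above
MIN_BPS = 6_000    # approximate lower bound quoted above (assumption)
MAX_BPS = 512_000  # approximate upper bound quoted above (assumption)

# One extra byte is 8 bits per frame, and there are 1000 / FRAME_MS frames per second.
step_bps = 8 * 1000 // FRAME_MS

# Number of distinct bitrates between the two bounds, inclusive.
num_rates = (MAX_BPS - MIN_BPS) // step_bps + 1

print(step_bps)   # 400
print(num_rates)  # 1266
```

Shorter frames make each one-byte step larger in bits per second, which is why the 0.4 kb/s figure is tied to 20 ms frames.<br />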
<br />
One last aspect is that Opus is simple to transport over RTP, as can be seen from the [https://tools.ietf.org/html/rfc7587 Opus RTP payload format]. For example, it's possible to decode RTP packets without having even seen the SDP or any out-of-band signalling.<br />
<br />
=== What applications for Android can play Opus? ===<br />
<br />
Right now, there are just a few, but that list is growing fast. Please see [https://android.stackexchange.com/q/37970/7425 this question on android.stackexchange.com]. Feel free to suggest other applications.<br />
<br />
=== When will the next version be released? ===<br />
<br />
When it's done. Seriously, we do not know.<br />
<br />
Opus is not a large project with a fixed release schedule.<br />
<br />
That being said, our '''[https://www.opus-codec.org/downloads/ pre-releases]''' and even the git repositories ('''[https://git.xiph.org/?p=opus.git Xiph]''', '''[https://github.com/xiph/opus GitHub]''') are pretty stable and, given proper testing (which you should always do anyway), are safe to distribute.<br />
<br />
Just be aware that the API of new features (that have never been included in a stable release) could potentially still change.<br />
<br />
== Software Developers' Questions ==<br />
<br />
=== On what platforms does Opus run? ===<br />
<br />
The Opus code base is written in C89 and should run on the vast majority of recent (and not so recent) CPUs.<br />
<br />
Some of the platforms '''[https://mf4.xiph.org/jenkins/view/opus/ on which Opus has been tested]''' include x86, x86-64, ARM, Itanium, Blackfin, and SPARC.<br />
<br />
=== Is there a fixed-point implementation? ===<br />
<br />
Yes.<br />
<br />
The fixed-point and floating-point decoder and encoder implementations are part of the same code base.<br />
<br />
The code defaults to float, so you need to configure with '''--enable-fixed-point''' (or define '''FIXED_POINT''' if not using the configure script) to build the code for fixed-point.<br />
<br />
=== Which implementation should I use? ===<br />
<br />
While the implementation in RFC 6716 is what ''defines'' the standard, it is likely not the best and most up-to-date implementation.<br />
<br />
The [https://opus-codec.org/ Opus] website was set up for the purpose of continually improving the implementation (in terms of speed, encoding quality, device compatibility, etc.) while still conforming to the standard.<br />
<br />
All Opus implementations are compatible by definition.<br />
<br />
=== How is supporting Opus different from supporting Speex/G.711/MP3? ===<br />
<br />
Opus has variable frame durations which can change on the fly, so an Opus decoder needs to be ready to accept packets with durations that are '''any multiple of 2.5ms''' up to a '''maximum of 120ms'''. <br />
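<br />
As a rough illustration of what this means for buffer sizing, the sketch below enumerates the packet durations described above and converts them to sample counts at 48 kHz. The function and constant names are hypothetical and are not part of the libopus API.<br />

```python
SAMPLE_RATE = 48_000  # decode rate assumed for this example
STEP_US = 2_500       # packet durations are multiples of 2.5 ms
MAX_US = 120_000      # maximum packet duration of 120 ms


def packet_samples(duration_us):
    """Return the per-channel sample count at 48 kHz for a valid packet duration."""
    if duration_us <= 0 or duration_us > MAX_US or duration_us % STEP_US:
        raise ValueError(f"not a valid packet duration: {duration_us} us")
    return SAMPLE_RATE * duration_us // 1_000_000


# Every duration a decoder should be prepared to receive.
valid_durations_us = list(range(STEP_US, MAX_US + 1, STEP_US))

print(len(valid_durations_us))   # 48
print(packet_samples(20_000))    # 960
print(packet_samples(MAX_US))    # 5760
```

Under these assumptions, an output buffer holding 5760 samples per channel is large enough for any single packet decoded at 48 kHz.<br />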
<br />
The Opus encoder and decoder do not need to have matched sampling rates or channel counts. It is recommended to always just decode at the highest rate the hardware supports (e.g. 48kHz stereo) so the user gets the full quality of whatever the far end is sending.<br />
<br />
=== My application doesn't work. Can anyone help me? ===<br />
<br />
It's possible to get help, but before doing so, there are a few basic things to try:<br />
<br />
* Implement your application with uncompressed audio instead of Opus. If it still doesn't work, then the problem isn't related to Opus.<br />
* Read the [https://www.opus-codec.org/docs/ Opus documentation].<br />
* Read the [https://git.xiph.org/?p=opus.git;a=blob;f=src/opus_demo.c opus_demo.c] source code to see how to use the encoder and decoder.<br />
<br />
If you still can't solve the problem, the best option is to ask for help on the [http://lists.xiph.org/mailman/listinfo/opus mailing list] or on the '''#opus''' IRC channel on '''irc.freenode.net'''.<br />
<br />
=== How do I report a bug? ===<br />
<br />
If you think you have found a bug in Opus (and not in your application), please [https://trac.xiph.org/newticket?component=Opus file a bug report].<br />
<br />
Please include a way for us to reproduce the problem. The best way to do this is to provide an input file, along with the opusenc/opusdec/opus_demo command line that causes the bug to occur.<br />
<br />
If the bug cannot be triggered by the command line tools, please provide a simple patch or C file that can help reproduce it. Please also provide any other relevant information, such as OS, CPU, build options, etc.<br />
<br />
Don't hesitate to also contact us on the [http://lists.xiph.org/mailman/listinfo/opus mailing list] or on [irc://irc.freenode.net/opus IRC].<br />
<br />
=== What is Opus Custom? ===<br />
<br />
Opus Custom is an '''optional''' part of the Opus standard that allows for sampling rates other than 8, 12, 16, 24, or 48 kHz and frame sizes other than multiples of 2.5 ms.<br />
<br />
Opus Custom requires additional out-of-band signalling that Opus does not normally require and disables many of Opus' coding modes. Also, because it is an optional part of the specification, using Opus Custom may lead to compatibility problems.<br />
<br />
For these reasons, '''its use is discouraged''' outside of very specific applications. <br />
<br />
You may want to use Opus Custom for:<br />
<br />
* ultra-low-delay applications, where synchronization with the soundcard buffer is important. <br />
* low-power embedded applications, where compatibility with others is not important.<br />
<br />
For almost all other types of applications, Opus Custom should not be used.<br />
<br />
=== How do I use 44.1 kHz or some other sampling rate not directly supported by Opus? ===<br />
<br />
Tools which read or write Opus should inter-operate with other sampling rates by transparently performing sample rate conversion behind the scenes whenever necessary. In particular, software developers should not use Opus Custom for 44.1 kHz support, except in the very specific circumstances outlined above.<br />
<br />
Note that it's generally preferable for a decoder to output at 48kHz, even when you know the original input was 44.1kHz. This is not only because you can skip resampling, but also because many cheaper audio interfaces have poor quality output for 44.1kHz.<br />
<br />
The '''[https://opus-codec.org/downloads/ opus-tools]''' package source code contains a small, high quality, high performance, BSD licensed '''[https://github.com/xiph/opus-tools/blob/master/src/resample.c resampler]''' which can be used where resampling is required.<br />
<br />
=== But won't the resampler hurt the quality? Isn't it better to use 44.1 kHz directly? ===<br />
<br />
Not really. The quality degradation caused by any reasonable resampler (SoX, libspeexdsp, libsamplerate, ...) is far less than the distortion caused by the best lossy codec at its highest bitrate. If you can't tolerate the quality degradation caused by a good 44.1 ↔ 48 kHz resampler, then you shouldn't be using a lossy codec in the first place. Similarly, the extra CPU spent in the resampler is small compared to the rest of the codec. Not only that, but many soundcards only support 48 kHz on playback, so players can directly play the output rather than resample it to 48 kHz (e.g. for a 44.1 kHz MP3). So effectively, Opus is only shifting the burden of resampling from the decoder side to the encoder side.<br />
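<br />
For reference, the 44.1 to 48 kHz conversion is an exact rational ratio, which is what makes it straightforward for a resampler to implement. The sketch below only works out that ratio and the resulting sample counts; it is not a resampler, the helper name is invented for this example, and a real resampler also adds some filter delay that is ignored here.<br />

```python
from fractions import Fraction

# Exact conversion ratio between the two rates.
ratio = Fraction(48_000, 44_100)


def resampled_length(n_in):
    """Number of whole 48 kHz samples produced from n_in samples at 44.1 kHz."""
    return n_in * ratio.numerator // ratio.denominator


print(ratio)                     # 160/147
print(resampled_length(44_100))  # 48000
```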
<br />
One advantage of supporting only one internal rate is that it makes it possible for Opus to support many features, including efficient speech compression (through SILK) and real-time applications. It also means all the quality tuning effort can be spent on a single configuration, which helps bring even better quality.<br />
<br />
=== How is the bitrate setting used in VBR mode? ===<br />
<br />
Variable bitrate (VBR) mode allows the bitrate to automatically vary over time based on the audio being encoded, in order to achieve a consistent quality.<br />
<br />
The bitrate setting controls the desired quality, on a scale that is calibrated to closely approximate the average bitrate that would be obtained over a large and diverse collection of audio. The actual bitrate of any particular audio stream may be higher or lower than this average.<br />
<br />
=== What frame size should I use? ===<br />
<br />
A '''20ms''' frame size works well for most applications. Smaller frame sizes may be used to achieve lower latency, but have lower quality at a given bitrate.<br />
<br />
Sizes greater than 20 ms increase latency and are generally beneficial only at fairly low bitrates, or when used to reduce external overhead (e.g. by reducing the number of packets that are sent). For file encoding, using a frame size larger than 20 ms will usually result in '''worse''' quality for the same bitrate because it constrains the encoder in the decisions it can make.<br />
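<br />
The external overhead mentioned above can be estimated with a little arithmetic. This sketch assumes 40 bytes of headers per packet (IPv4, UDP and RTP without options or extensions); that figure is an assumption for this example, and your transport may differ.<br />

```python
HEADER_BYTES = 20 + 8 + 12  # assumed IPv4 + UDP + RTP headers per packet


def overhead_bps(frame_ms, header_bytes=HEADER_BYTES):
    """Header overhead in bits per second when one frame is sent per packet."""
    packets_per_second = 1000 / frame_ms
    return header_bytes * 8 * packets_per_second


for frame_ms in (10, 20, 40, 60):
    print(f"{frame_ms} ms frames: {overhead_bps(frame_ms):.0f} b/s of headers")
```

With these assumptions, 20 ms frames cost 16 kb/s in headers alone, which is why longer frames can pay off at fairly low bitrates despite the extra latency.<br />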
<br />
=== Forward Error Correction (FEC) doesn't appear to do anything! HELP! ===<br />
<br />
The in-band FEC feature of Opus helps reduce the harm of packet loss by encoding some information about the prior packet.<br />
<br />
In order to make use of in-band FEC the decoder must delay its output by at least one frame so that it can call the decoder with the decode_fec argument on the ''next'' frame in order to reconstruct the missed frame. This works best if it's integrated with a jitter buffer.<br />
<br />
FEC is only used by the encoder under certain conditions:<br />
* the feature must be enabled via the '''OPUS_SET_INBAND_FEC''' CTL<br />
* the encoder must be told to expect loss via the '''OPUS_SET_PACKET_LOSS_PERC''' CTL<br />
* the codec must be operated in any of the '''Linear Prediction''' or '''Hybrid''' modes<br />
<br />
Frame durations shorter than 10ms and very high bitrates will use the MDCT modes, where FEC is not available.<br />
<br />
Even when FEC is not used, telling the encoder about the expected level of loss will help it make more intelligent decisions. By default, the implementation assumes there is no loss.<br />
<br />
=== I can't use malloc or much stack on my embedded platform. How do I make Opus work? ===<br />
<br />
A normal build of libopus only uses <tt>malloc/free</tt> in the <tt>_create()</tt> and <tt>_destroy()</tt> calls, making it safe for realtime use as long as the codec state is pre-created.<br />
<br />
To build Opus without the references to <tt>malloc/free</tt>, you must:<br />
<br />
* use <tt>init()</tt> calls rather than <tt>create()</tt> calls in your application<br />
* compile with <tt>CFLAGS="-DOVERRIDE_OPUS_ALLOC -DOVERRIDE_OPUS_FREE -D'opus_alloc(x)=NULL' -D'opus_free(x)=NULL' "</tt>.<br />
<br />
If libopus is built with <tt>-DNONTHREADSAFE_PSEUDOSTACK</tt> (instead of <tt>VAR_ARRAYS</tt>, or <tt>USE_ALLOCA</tt>), it will use a user-provided block of heap instead of stack for many things, resulting in much lower stack usage.<br><br />
This makes the resulting library '''non-threadsafe''' and is '''not recommended''' on anything except limited embedded platforms.<br />
<br />
=== How can I ensure that my software interoperates with other software implementing Opus? ===<br />
<br />
For applications using Ogg files, there are some [https://people.xiph.org/~greg/opus_testvectors/ Ogg Opus testvectors] to test decoders and you can test encoders with opusdec. For RTP applications, the opusrtp tool can be useful.<br />
<br />
In general, here's a list of specific issues to check:<br />
* Can your application handle all frame sizes, including changing the frame size from frame to frame?<br />
* Does your application react properly to lost packets, by calling the decoder with a NULL packet?<br />
<br />
=== What is the complexity of Opus? ===<br />
<br />
The complexity of Opus varies by a large amount based on the settings used.<br />
<br />
It depends on the mode, audio bandwidth, number of channels, and even a "complexity knob" that can trade complexity for quality. It will run easily on any recent PC or smartphone. <br />
<br />
For slower embedded CPUs/DSPs, the amount of CPU required will vary depending on the configuration and the exact CPU, so you will need to experiment. Do not expect Opus to run quickly on really slow devices like 8-bit micro-controllers.<br />
<br />
=== Opus is using too much CPU for my application. What can I do? ===<br />
<br />
First, don't panic, and don't start writing assembly just yet.<br />
<br />
It's possible that you're just not using the right set of options.<br />
<br />
If you're targeting an embedded/mobile platform, chances are the fixed-point build will be faster, so make sure you're using '''--enable-fixed-point''' or defining '''FIXED_POINT''' in the build system.<br />
<br />
Opus also has a complexity option that can trade quality for complexity. The default is highest quality and highest complexity. You can control this using '''OPUS_SET_COMPLEXITY()''' (see the '''[https://www.opus-codec.org/docs/ Documentation]''' for details).<br />
<br />
If all else fails and you need to optimize the Opus code, see the next question.<br />
<br />
=== I would like to optimize/improve/help with Opus. Where should I start? ===<br />
<br />
Please '''[https://www.opus-codec.org/contact/ contact us]''' before you start, or at least before you get too far.<br />
<br />
This will help coordinate the efforts made on Opus and reduce the probability of wasting your time on duplicated effort or going down the wrong path. More details in the '''[[OpusContributing|contributing page]]'''.<br />
<br />
=== Does Opus have an echo canceller like Speex does? ===<br />
<br />
Echo cancellation is completely independent from codecs.<br />
<br />
You can use any echo canceller (including the one from libspeexdsp) along with Opus.<br />
<br />
That being said, among the free acoustic echo cancellers (AEC) we're aware of, the best is probably the Google AEC from the [https://code.google.com/p/webrtc/ WebRTC codebase].<br />
<br />
=== How do I get the duration of a .opus file? ===<br />
<br />
Use '''[https://mf4.xiph.org/jenkins/view/opus/job/opusfile-autotools/ws/doc/html/group__stream__info.html op_pcm_total()]''' from '''[https://opus-codec.org/downloads/ libopusfile]'''.<br />
<br />
If you want to implement this yourself, you need to:<br />
* Read the BOS (Beginning Of Stream) pages to enumerate the serial numbers of all concurrently multiplexed streams, identify the Opus stream you want, and get its preskip value.<br />
* Read up through the first complete audio data page to compute the starting granule position (since the timestamps might not start at 0, e.g., if the file was captured from a live stream that was joined after the start).<br />
* Seek near the end of a file and look for a page with the same serial number as found in the headers (just under 64 kB from the end should be sufficient to ensure you find a page, assuming the Opus data is not multiplexed with another stream and there is no trailing garbage in the file).<br />
* If you find a page whose serial number was not included in the original set of BOS pages, you have a chained stream. You need to bisect the file to identify the end of the first chain and the start of the next, and repeat this process for each link in the chain.<br />
* If you don't find any pages at all, or find a page whose serial number was included in the original set of BOS pages, but was not the serial number of the Opus stream you want, back up and try again (being careful to avoid rescanning the same data, which can produce quadratic worst-case complexity).<br />
* If you find a page whose serial number matches the Opus stream you want, look at its final granule position, and compute the total duration (in seconds) as (final_granule_position - initial_granule_position - preskip)/48000.0.<br />
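<br />
The final step above can be sketched as follows. This covers only the arithmetic, not the Ogg page parsing; the function names and the example values are hypothetical and were chosen just to show the calculation.<br />

```python
def opus_duration_seconds(initial_granule_position, final_granule_position, preskip):
    """Duration in seconds; granule positions always count samples at 48 kHz."""
    return (final_granule_position - initial_granule_position - preskip) / 48000.0


def format_duration(seconds):
    """Format a duration as minutes and seconds, e.g. 58m55.000s."""
    minutes = int(seconds // 60)
    return f"{minutes}m{seconds - 60 * minutes:06.3f}s"


# Hypothetical values: timestamps start at 0 and the header declares a preskip of 312.
duration = opus_duration_seconds(0, 169_680_312, 312)
print(duration)                   # 3535.0
print(format_duration(duration))  # 58m55.000s
```

The divisor is always 48000, regardless of the sampling rate of the original input.<br />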
<br />
=== Why don't you store the duration in the header? Isn't all of that slow and complicated? ===<br />
<br />
Computing the duration directly from the file contents allows files to be written in a single pass, without any seeking, which is necessary for live streaming. Chaining also simplifies live streaming, as you can just pipe multiple files into the same network connection, with all associated metadata updates, etc., and the results are still valid .opus files (contrast with the '''[http://www.smackfu.com/stuff/programming/shoutcast.html hacks used to add metadata to MP3 streams]''').<br />
<br />
Opening a typical .opus file, which is not multiplexed and not chained, and computing the duration over the network requires just one extra HTTP request, which can proceed in parallel with the buffering in the main request. This is the behavior you will get from libopusfile's HTTP backend by default.<br />
<br />
Enumeration of chain boundaries can be expensive in files with many links, but in our testing libopusfile used nearly an order of magnitude fewer seeks to do this than some other media frameworks (at the time). Storing a duration in a header wouldn't solve this, since every link in a chain has its own, independent headers. If the cost of chain enumeration is a problem, the best way to avoid it is to store the links in separate files (i.e., don't use chaining).<br />
<br />
=== How do I seek in a .opus file? ===<br />
<br />
Use '''[https://mf4.xiph.org/jenkins/view/opus/job/opusfile-autotools/ws/doc/html/group__stream__seeking.html op_pcm_seek() or op_raw_seek()]''' from '''[https://opus-codec.org/downloads/ libopusfile]'''.<br />
<br />
If you want to implement seeking yourself, you need to:<br />
* Identify the link that contains the target (if you have a chained file).<br />
* Adjust the target by 80 ms to get enough pre-roll data (to ensure the decoder will have converged by the time you reach the target), as recommended by '''[https://tools.ietf.org/html/rfc7845 RFC 7845]'''.<br />
* Estimate the location of the last audio data page with a completed packet prior to the adjusted target, using the duration and size (in bytes) of the link.<br />
* Seek to that location and scan forward until you find an audio data page with a completed packet (that contains a valid granule position).<br />
* If you think you are sufficiently close to the adjusted target, scan forward until you find the next audio data page with a completed packet.<br />
* If the adjusted target lies between the first audio data page with a completed packet you found and the next one, stop. You can decode forward from here and start playing when you reach your (original, unadjusted) target.<br />
* Otherwise, go back and re-estimate the seek location using the granule positions and file offsets of the page(s) you just found.<br />
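<br />
The estimate-and-refine loop above can be sketched on an in-memory model of a single link. This is a simplified illustration, not libopusfile's implementation: <tt>pages</tt> is a hypothetical list of (byte offset, granule position) pairs for audio data pages with completed packets, seeking and scanning forward is simulated with a binary search over the offsets, and the fallbacks against pathological inputs are omitted.<br />

```python
import bisect

PREROLL_SAMPLES = 3840  # 80 ms of pre-roll at 48 kHz


def find_start_page(pages, target_gp):
    """Return (index of the last page ending at or before the adjusted target, guesses)."""
    adjusted = max(target_gp - PREROLL_SAMPLES, 0)
    offsets = [offset for offset, _ in pages]
    lo, hi = 0, len(pages) - 1
    if adjusted < pages[lo][1]:
        return lo, 0  # target falls in the first page: decode from the start
    if adjusted >= pages[hi][1]:
        return hi, 0  # target is at or past the end of the link
    guesses = 0
    # Invariant: pages[lo] ends at or before the adjusted target, pages[hi] ends after it.
    while hi - lo > 1:
        lo_off, lo_gp = pages[lo]
        hi_off, hi_gp = pages[hi]
        # Estimate the byte offset by interpolating between the two known pages.
        guess = lo_off + (adjusted - lo_gp) * (hi_off - lo_off) // (hi_gp - lo_gp)
        # Simulated "seek and scan forward": first page starting at or after the guess.
        i = bisect.bisect_left(offsets, guess)
        i = min(max(i, lo + 1), hi - 1)  # stay strictly inside the bracket
        guesses += 1
        if pages[i][1] <= adjusted:
            lo = i
        else:
            hi = i
    return lo, guesses


# Hypothetical link: 100 pages of 4000 bytes, each ending one second after the previous.
pages = [(i * 4000, (i + 1) * 48000) for i in range(100)]
index, guesses = find_start_page(pages, 2_424_000)
print(index, guesses)
```

Decoding would then start from the page after the returned one, discarding output until the original, unadjusted target is reached. Because every guess is forced strictly inside the current bracket, the loop always terminates, but without further fallbacks it can still take a linear number of guesses on adversarial input.<br />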
<br />
libopusfile includes fallbacks to prevent pathological worst-case behavior when its guesses are repeatedly wrong. Weighted bisection can degrade to a linear scan, but libopusfile's worst case is within a constant factor of naive bisection (i.e., logarithmic). We have only ever observed such pathological behavior in files we manually constructed to trigger it.<br />
<br />
libopusfile also takes shortcuts when the target location is near the current position, to make small seeks cheaper. In the best case it can loop forever over very short files whose data is contained in a single page (e.g., less than 1 second long with default encoder settings) without any seeking at all.<br />
<br />
You can find more information on seeking in files that contain Opus multiplexed with other streams (e.g., video) '''[[GranulePosAndSeeking|on this page]]'''.<br />
<br />
=== Wouldn't it be better to build an index? ===<br />
<br />
As with file durations, an index at the beginning of the file is incompatible with live streaming. It also means more data has to be fetched before a file can start playing over the network, because you must read past the index even when you don't intend to seek. The index could be stored at the end (which still allows encoding the file in a single pass), but this requires one (or more) extra seeks to read the index (especially if its exact location at the end is not known), either on file open or on first seek. Unlike the final timestamp, which is small and fixed in size, an index grows with the file duration, and can have unbounded size. It is also easy for an index to become out of sync with a file that has been edited or damaged, in which case seeking will simply fail. In contrast, you can seek in a truncated .opus download without issues.<br />
<br />
In practice, bisection seeking on VBR audio achieves performance that is very nearly as good as seeking with an index, without any of the drawbacks of an index. libopusfile provides a test program called seeking_example which can be used to benchmark the performance on your files.<br />
<br />
On a 96 kbps VBR file nearly one hour long (the second movement of Mahler's Symphony No. 8 "Symphony of a Thousand"):<br />
<br />
Testing exact PCM seeking to random places in 169680000 samples (58m55.000s)...<br />
Total seek operations: 1020 (1.020 per exact seek, 2 maximum).<br />
<br />
On a chained file formed by concatenating the eight test vectors for the currently supported channel layouts in mapping family 1:<br />
<br />
Opened file containing 8 links with 18 seeks (2.250 per link).<br />
Testing exact PCM seeking to random places in 2759064 samples (57.481s)...<br />
Total seek operations: 946 (0.946 per exact seek, 2 maximum).<br />
<br />
That is, the number of physical seeks required is almost always 1, every once in a while 2, and in short files, sometimes even 0.<br />
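<br />
The figures in the two reports above can be cross-checked with a few lines of arithmetic. The number of exact seeks performed is not printed in the output, so the sketch below infers it from the reported ratio; that inference, and the helper name, are assumptions of this example.<br />

```python
SAMPLE_RATE = 48_000  # .opus durations are always counted at 48 kHz


def inferred_exact_seeks(total_seek_ops, ops_per_exact_seek):
    """Infer how many exact seeks were tested from the reported total and average."""
    return round(total_seek_ops / ops_per_exact_seek)


# Long VBR file: 169680000 samples, 1020 seek operations, 1.020 per exact seek.
print(169_680_000 / SAMPLE_RATE)           # 3535.0 seconds, i.e. 58m55s
print(inferred_exact_seeks(1020, 1.020))   # 1000

# Chained file: 18 seeks to open 8 links, 946 seek operations, 0.946 per exact seek.
print(18 / 8)                              # 2.25
print(inferred_exact_seeks(946, 0.946))    # 1000
```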
<br />
[[Category:Opus]]</div>Derfhttps://wiki.xiph.org/index.php?title=OpusFAQ&diff=16604OpusFAQ2017-07-07T20:43:34Z<p>Derf: /* How do I seek in a .opus file? */</p>
<hr />
<div>[[Image:Opus logo trans.png]]<br />
<br />
If you are looking for info not covered in this FAQ, try the '''[https://opus-codec.org main Opus website]''' or the pages included in the '''[[:Category:Opus|Opus category]]''' of this wiki.<br />
<br />
== General Questions ==<br />
<br />
=== What is Opus? Who created it? ===<br />
<br />
Opus is a totally open, royalty-free, highly versatile audio codec.<br />
<br />
It is primarily designed for interactive speech and music transmission over the Internet, but is also applicable to storage and streaming applications. It incorporates technology from Skype's '''[https://en.wikipedia.org/wiki/SILK SILK]''' codec and Xiph.Org's '''[http://celt-codec.org/ CELT]''' codec. It has been standardized by the '''[https://www.ietf.org/ Internet Engineering Task Force]''' (IETF) as '''[https://tools.ietf.org/html/rfc6716 RFC 6716]'''. <br />
<br />
Opus has been in development since early 2007. Programmers associated with '''[https://xiph.org/ Xiph.Org]''', '''[https://www.skype.com/ Skype]''' and several other organizations have contributed to its development and to the standardization process as part of the '''[https://datatracker.ietf.org/wg/codec/charter/ IETF's Codec Working Group]'''.<br />
<br />
=== How does Opus compare to other codecs? ===<br />
<br />
Opus is distinguished from most high quality formats (e.g. [[Vorbis]], AAC, MP3) by having '''[https://tools.ietf.org/html/rfc6716#section-2 low delay]''' (5 to 66.5 ms) and from most low delay formats (e.g. [[Speex]], G.711, GSM) by supporting '''[https://tools.ietf.org/html/rfc6716#section-2.1.1 high audio quality]''' (from narrow-band all the way to full-band audio).<br />
<br />
It '''[https://opus-codec.org/comparison meets or exceeds existing codecs' quality]''' across a wide range of bitrates, and it operates at lower delay than virtually any existing compressed format.<br />
<br />
Most importantly, the Opus format and its reference implementation are both available under '''[https://opus-codec.org/license/ liberal, royalty-free licenses]'''.<br /><br />
This makes it:<br />
* easy to adopt<br />
* compatible with free software<br />
* suitable for use as part of the basic infrastructure of the Internet<br />
<br />
=== Does Opus make all those other lossy codecs obsolete? ===<br />
<br />
Yes.<br />
<br />
From a technical point of view (loss, delay, bitrates, ...) Opus renders '''[[Speex]]''' obsolete and should also replace '''[[Vorbis]]''' and the common proprietary codecs (e.g. AAC, MP3, ...).<br />
<br />
=== Will Opus replace Vorbis in video files? ===<br />
<br />
For '''[[Ogg]]''' video files (which use the '''[[Theora]]''' video codec), you ''can'' use Opus instead of Vorbis, but the overall size reduction will be minimal and it will break compatibility with existing players.<br />
<br />
For WebM video files, the convention is to use the '''[http://www.webmproject.org/vp9/ VP9 video codec]''' when using Opus as an audio codec.<br />
<br />
=== How do I use Opus? ===<br />
<br />
For now, the best way to '''encode''' audio into Opus files is to use the '''opusenc''' command-line tool from the '''[https://opus-codec.org/downloads/ opus-tools package]'''.<br />
<br />
If you want to encode many files at once (e.g. your music library), try the applications listed in the '''[[OpusSupport|Opus Support]]''' page.<br />
<br />
For rough guidelines on encoding settings, see the '''[[Opus Recommended Settings]]''' page.<br />
<br />
=== What programs support Opus? ===<br />
<br />
Opus decoding support is now included in '''[http://caniuse.com/opus some Internet browsers]''' and '''[[OpusSupport|many applications]]''', including '''[https://www.mozilla.org/firefox Firefox]''', '''[https://www.foobar2000.org/ foobar2000]''' and '''[https://www.videolan.org/vlc/ VLC]''', as well as in frameworks such as '''[https://gstreamer.freedesktop.org/ GStreamer]''' and '''[https://ffmpeg.org/ FFmpeg]'''.<br />
<br />
For real-time applications, Opus support is available in '''[https://www.webrtc.org/ Google's WebRTC codebase]'''.<br />
<br />
Opus is a relatively new codec: '''[[OpusSupport|many more applications]]''' will support it in the near future.<br />
<br />
=== Does Opus support higher sampling rates, such as 96 kHz or 192 kHz? ===<br />
<br />
Yes and no.<br />
<br />
Opus encoding tools like opusenc will happily encode input files that are sampled at 96 or 192 kHz.<br />
<br />
However, files at these rates are internally '''converted to 48 kHz''' and then only frequencies '''up to 20 kHz''' are encoded.<br />
<br />
The reason is simple: lossy codecs are designed to preserve audible details while discarding irrelevant information. Since the human ear can only hear up to 20 kHz at best (usually lower than that), frequency content above 20 kHz is the first thing to go.<br />
<br />
See Monty's '''[https://people.xiph.org/~xiphmont/demo/neil-young.html article]''' for more details.<br />
<br />
If you want a codec to handle higher sampling rates losslessly, use '''[[FLAC]]'''!<br />
<br />
=== What are the licensing requirements? ===<br />
<br />
The reference Opus source code is released under a three-clause BSD license, which is a very permissive Open Source license. Commercial use and distribution (including in proprietary software) are permitted, provided that some basic conditions specified in the license are met. <br />
<br />
Opus is also covered by some patents, for which royalty-free usage rights are granted, under conditions that the authors believe are compatible with (hopefully) all open source licenses, including the GPL (v2 and v3).<br />
<br />
See the '''[https://www.opus-codec.org/license/ Opus Licensing]''' page for details.<br />
<br />
=== Why make Opus free? ===<br />
<br />
On the Internet, protocol and codec standards are part of the common infrastructure everyone builds upon.<br />
<br />
Most of the value of a high-quality standard is the innovation and inter-operation provided by the systems built on top of it. When a few parties have monopoly rights to monetize a standard, that infrastructure stops being so common and everyone else has more reason to use their own solution instead, increasing cost and reducing efficiency.<br />
<br />
Imagine a road system where each type of car could only drive on its own manufacturer's pavement. We all benefit from living in a world where all the roads are connected.<br />
<br />
This is why Opus, unlike many codecs, is free.<br />
<br />
=== Is the SILK part of Opus compatible with the SILK implementation shipped in Skype? ===<br />
<br />
No.<br />
<br />
The SILK codec, as submitted by Skype to the IETF, was heavily modified as part of its integration within Opus. The modifications are significant enough that it is not possible to just write a "translator". Even sharing code between Opus and the "old SILK" would be highly complex.<br />
<br />
=== Why not keep the SILK and CELT codecs separate? ===<br />
Opus is more than just two independent codecs with a switch.<br />
<br />
In addition to a [https://en.wikipedia.org/wiki/Linear_predictive_coding Linear Prediction] '''SILK mode''' and an [https://en.wikipedia.org/wiki/Modified_discrete_cosine_transform MDCT] '''CELT mode''' it has a '''hybrid mode''', where speech frequencies up to 8 kHz are encoded with LP while those between 8 and 20 kHz are encoded with MDCT. This is what allows Opus to have such high speech quality around 32 kbps.<br />
<br />
Another advantage of the integration is the ability to switch between these 3 modes seamlessly, without any audible "glitches" and without any out-of-band signalling.<br />
<br />
=== Now that Opus is standardized, will its development stop or can it be further improved? ===<br />
Development will not stop: Opus '''can''' and '''should''' be improved. Unlike most '''[https://en.wikipedia.org/wiki/ITU-T#Key_standards_published_by_ITU ITU-T codecs]''', Opus is only defined in terms of its decoder.<br />
<br />
The encoder can keep evolving as long as the bitstream it produces can be decoded by the reference decoder. This is what made it possible for modern MP3 encoders (e.g. '''[https://en.wikipedia.org/wiki/LAME LAME]''') to improve far beyond the original '''[https://en.wikipedia.org/wiki/L3enc L3enc]''' and '''dist10''' reference implementations.<br />
<br />
Although it is unlikely that Opus encoders will see such a spectacular evolution, we certainly hope that future encoders will become much better than the reference encoder.<br />
<br />
In fact, the 1.1 libopus release significantly improves on the reference encoder's quality. See '''[https://people.xiph.org/~xiphmont/demo/opus/demo3.shtml Monty's demo]''' for more details.<br />
<br />
=== Will all future Opus releases comply with the [https://tools.ietf.org/html/rfc6716 Opus specification]? ===<br />
<br />
Yes.<br />
<br />
=== In what ways is Opus optimized for the Internet? ===<br />
<br />
Opus has good packet loss robustness and concealment, but its optimizations go further.<br />
<br />
One of the first things we were asked when designing Opus was to make the rate '''really''' adaptable, because we never know what kind of rates will be available. This meant not only having a wide range of bitrates, but also being able to vary the rate in small increments.<br />
<br />
This is why Opus scales from about '''6''' to '''512 kb/s''', in increments of '''0.4 kb/s''' (one byte with 20 ms frames). Opus can have '''more than 1200 possible bitrates''' while spending only '''11 bits''' signalling the bitrate because UDP already encodes the packet size.<br />
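<br />
The arithmetic behind the step size and the number of bitrates can be checked with a short sketch. It assumes 20 ms frames and treats the 6 and 512 kb/s endpoints as exact values; the function names are illustrative and are not part of any Opus API.<br />
<br />
```python
FRAME_MS = 20  # frame duration assumed in the text above


def bitrate_step_bps(frame_ms=FRAME_MS):
    """Bitrate change (in bit/s) caused by adding one byte to every frame."""
    frames_per_second = 1000 / frame_ms
    return 8 * frames_per_second


def count_bitrates(min_bps=6000, max_bps=512000, frame_ms=FRAME_MS):
    """Number of whole-byte packet sizes between the two bitrates, inclusive."""
    step = bitrate_step_bps(frame_ms)
    min_bytes = round(min_bps / step)   # 15 bytes per frame at 6 kb/s
    max_bytes = round(max_bps / step)   # 1280 bytes per frame at 512 kb/s
    return max_bytes - min_bytes + 1


print(bitrate_step_bps())  # 400.0 bit/s, i.e. 0.4 kb/s
print(count_bitrates())    # 1266, i.e. "more than 1200"
```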
<br />
One last aspect is that Opus is simple to transport over RTP, as can be seen from the [https://tools.ietf.org/html/rfc7587 Opus RTP payload format]. For example, it's possible to decode RTP packets without having even seen the SDP or any out-of-band signalling.<br />
<br />
=== What applications for Android can play Opus? ===<br />
<br />
Right now, there are just a few, but that list is growing fast. Please see [https://android.stackexchange.com/q/37970/7425 this question on android.stackexchange.com]. Feel free to suggest other applications.<br />
<br />
=== When will the next version be released? ===<br />
<br />
When it's done. Seriously, we do not know.<br />
<br />
Opus is not a large project with a fixed release schedule.<br />
<br />
That being said, our '''[https://www.opus-codec.org/downloads/ pre-releases]''' and even the git repositories ('''[https://git.xiph.org/?p=opus.git Xiph]''', '''[https://github.com/xiph/opus GitHub]''') are pretty stable and, given proper testing (which you should always do anyway), are safe to distribute.<br />
<br />
Just be aware that the API of new features (that have never been included in a stable release) could potentially still change.<br />
<br />
== Software Developers' Questions ==<br />
<br />
=== On what platforms does Opus run? ===<br />
<br />
The Opus code base is written in C89 and should run on the vast majority of recent (and not so recent) CPUs.<br />
<br />
Some of the platforms '''[https://mf4.xiph.org/jenkins/view/opus/ on which Opus has been tested]''' include x86, x86-64, ARM, Itanium, Blackfin, and SPARC.<br />
<br />
=== Is there a fixed-point implementation? ===<br />
<br />
Yes.<br />
<br />
The fixed-point and floating-point decoder and encoder implementations are part of the same code base.<br />
<br />
The code defaults to float, so you need to configure with '''--enable-fixed-point''' (or define '''FIXED_POINT''' if not using the configure script) to build the code for fixed-point.<br />
<br />
=== Which implementation should I use? ===<br />
<br />
While the implementation in RFC 6716 is what ''defines'' the standard, it is likely not the best and most up-to-date implementation.<br />
<br />
The [https://opus-codec.org/ Opus] website was set up for the purpose of continually improving the implementation (in terms of speed, encoding quality, device compatibility, etc.) while still conforming to the standard.<br />
<br />
All Opus implementations are compatible by definition.<br />
<br />
=== How is supporting Opus different from supporting Speex/G.711/MP3? ===<br />
<br />
Opus has variable frame durations which can change on the fly, so an Opus decoder needs to be ready to accept packets with durations that are '''any multiple of 2.5 ms''' up to a '''maximum of 120 ms'''. <br />
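<br />
As a rough illustration of what this means for buffer sizing, the sketch below converts a packet duration into a per-channel sample count. It assumes decoding at 48 kHz and follows the rule stated above (multiples of 2.5 ms, at most 120 ms); the helper name is hypothetical and not a libopus function.<br />
<br />
```python
SAMPLE_RATE_HZ = 48000  # decode rate assumed for this sketch


def samples_per_channel(duration_ms, sample_rate_hz=SAMPLE_RATE_HZ):
    """Samples per channel in a packet of the given duration (in ms)."""
    is_multiple_of_2_5 = (duration_ms * 10) % 25 == 0
    if not (0 < duration_ms <= 120 and is_multiple_of_2_5):
        raise ValueError("duration must be a multiple of 2.5 ms, at most 120 ms")
    return int(duration_ms * sample_rate_hz / 1000)


print(samples_per_channel(2.5))  # 120, the shortest packet
print(samples_per_channel(20))   # 960, a typical packet
print(samples_per_channel(120))  # 5760, the size a decode buffer should allow for
```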
<br />
The Opus encoder and decoder do not need to have matched sampling rates or channel counts. It is recommended to always just decode at the highest rate the hardware supports (e.g. 48 kHz stereo) so the user gets the full quality of whatever the far end is sending.<br />
<br />
=== My application doesn't work. Can anyone help me? ===<br />
<br />
It's possible to get help, but before doing so, there are a few basic things to try:<br />
<br />
* Implement your application with uncompressed audio instead of Opus. If it still doesn't work, then the problem isn't related to Opus.<br />
* Read the [https://www.opus-codec.org/docs/ Opus documentation].<br />
* Read the [https://git.xiph.org/?p=opus.git;a=blob;f=src/opus_demo.c opus_demo.c] source code to see how to use the encoder and decoder.<br />
<br />
If you still can't solve the problem, the best option is to ask for help on the [http://lists.xiph.org/mailman/listinfo/opus mailing list] or on the '''#opus''' IRC channel on '''irc.freenode.net'''.<br />
<br />
=== How do I report a bug? ===<br />
<br />
If you think you have found a bug in Opus (and not in your application), please [https://trac.xiph.org/newticket?component=Opus file a bug report].<br />
<br />
Please include a way for us to reproduce the problem. The best way to do this is to provide an input file, along with the opusenc/opusdec/opus_demo command line that causes the bug to occur.<br />
<br />
If the bug cannot be triggered by the command line tools, please provide a simple patch or C file that can help reproduce it. Please also provide any other relevant information, such as OS, CPU, build options, etc.<br />
<br />
Don't hesitate to also contact us on the [http://lists.xiph.org/mailman/listinfo/opus mailing list] or on [irc://irc.freenode.net/opus IRC].<br />
<br />
=== What is Opus Custom? ===<br />
<br />
Opus Custom is an '''optional''' part of the Opus standard that allows for sampling rates other than 8, 12, 16, 24, or 48 kHz and frame sizes other than multiples of 2.5 ms.<br />
<br />
Opus Custom requires additional out-of-band signalling that Opus does not normally require and disables many of Opus' coding modes. Also, because it is an optional part of the specification, using Opus Custom may lead to compatibility problems.<br />
<br />
For these reasons, '''its use is discouraged''' outside of very specific applications. <br />
<br />
You may want to use Opus Custom for:<br />
<br />
* ultra-low-delay applications, where synchronization with the soundcard buffer is important. <br />
* low-power embedded applications, where compatibility with others is not important.<br />
<br />
For almost all other types of applications, Opus Custom should not be used.<br />
<br />
=== How do I use 44.1 kHz or some other sampling rate not directly supported by Opus? ===<br />
<br />
Tools which read or write Opus should inter-operate with other sampling rates by transparently performing sample rate conversion behind the scenes whenever necessary. In particular, software developers should not use Opus Custom for 44.1 kHz support, except in the very specific circumstances outlined above.<br />
<br />
Note that it's generally preferable for a decoder to output at 48 kHz, even when you know the original input was 44.1 kHz. This is not only because you can skip resampling, but also because many cheaper audio interfaces have poor quality output for 44.1 kHz.<br />
<br />
The '''[https://opus-codec.org/downloads/ opus-tools]''' package source code contains a small, high quality, high performance, BSD licensed '''[https://github.com/xiph/opus-tools/blob/master/src/resample.c resampler]''' which can be used where resampling is required.<br />
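<br />
For reference, the 44.1 kHz to 48 kHz conversion is a fixed rational ratio, which is what a resampler works with. The sketch below only derives that ratio and the resulting sample counts; it is not a resampler, and the function names are illustrative.<br />
<br />
```python
from fractions import Fraction


def resample_ratio(in_hz, out_hz):
    """Exact output/input sample-rate ratio, reduced to lowest terms."""
    return Fraction(out_hz, in_hz)


def output_length(n_in, in_hz, out_hz):
    """Whole number of output samples produced from n_in input samples."""
    return int(n_in * resample_ratio(in_hz, out_hz))


print(resample_ratio(44100, 48000))        # 160/147
print(output_length(44100, 44100, 48000))  # 48000: one second in, one second out
```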
<br />
=== But won't the resampler hurt the quality? Isn't it better to use 44.1 kHz directly? ===<br />
<br />
Not really. The quality degradation caused by any reasonable resampler (SoX, libspeexdsp, libsamplerate, ...) is far less than the distortion caused by the best lossy codec at its highest bitrate. If you can't tolerate the quality degradation caused by a good 44.1 ↔ 48 kHz resampler, then you shouldn't be using a lossy codec in the first place. Similarly, the extra CPU spent in the resampler is small compared to the rest of the codec. Not only that, but many soundcards only support 48 kHz on playback, so players can send Opus output to the hardware directly, whereas audio decoded at 44.1 kHz (e.g. from an MP3) has to be resampled to 48 kHz first. So effectively, Opus is only shifting the burden of resampling from the decoder side to the encoder side.<br />
<br />
One advantage of supporting only one internal rate is that it makes it possible for Opus to support many features, including efficient speech compression (through SILK) and real-time applications. It also means all the quality tuning effort can be spent on a single configuration, which helps bring even better quality.<br />
<br />
=== How is the bitrate setting used in VBR mode? ===<br />
<br />
Variable bitrate (VBR) mode allows the bitrate to automatically vary over time based on the audio being encoded, in order to achieve a consistent quality.<br />
<br />
The bitrate setting controls the desired quality, on a scale that is calibrated to closely approximate the average bitrate that would be obtained over a large and diverse collection of audio. The actual bitrate of any particular audio stream may be higher or lower than this average.<br />
<br />
=== What frame size should I use? ===<br />
<br />
A '''20 ms''' frame size works well for most applications. Smaller frame sizes may be used to achieve lower latency, but have lower quality at a given bitrate.<br />
<br />
Sizes greater than 20 ms increase latency and are generally beneficial only at fairly low bitrates, or when used to reduce external overhead (e.g. by reducing the number of packets that are sent). For file encoding, using a frame size larger than 20 ms will usually result in '''worse''' quality for the same bitrate because it constrains the encoder in the decisions it can make.<br />
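<br />
To put a number on the external overhead mentioned above, the sketch below computes the header cost per second for different frame sizes. It assumes one frame per packet and plain IPv4, UDP and RTP headers (20 + 8 + 12 bytes, with no options or extensions); other transports will give different figures.<br />
<br />
```python
HEADER_BYTES = 20 + 8 + 12  # IPv4 + UDP + RTP, assumed for this sketch


def overhead_bps(frame_ms, header_bytes=HEADER_BYTES):
    """Header overhead in bit/s when each packet carries one frame."""
    packets_per_second = 1000 / frame_ms
    return header_bytes * 8 * packets_per_second


for frame_ms in (10, 20, 40, 60):
    print(frame_ms, "ms ->", round(overhead_bps(frame_ms)), "bit/s")
```

With these assumptions, 20 ms frames cost 16 kb/s in headers, which is significant next to a low-bitrate speech stream; longer frames reduce that cost at the price of extra latency.<br />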
<br />
=== Forward Error Correction (FEC) doesn't appear to do anything! HELP! ===<br />
<br />
The in-band FEC feature of Opus helps reduce the harm of packet loss by encoding some information about the prior packet.<br />
<br />
In order to make use of in-band FEC the decoder must delay its output by at least one frame so that it can call the decoder with the decode_fec argument on the ''next'' frame in order to reconstruct the missed frame. This works best if it's integrated with a jitter buffer.<br />
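<br />
The one-frame delay can be pictured with a small sketch of the decision a receiver makes for each frame. This is only a model of the control flow, with no real decoding: the function name and the action labels are hypothetical, and a real application would make the corresponding decoder calls from its jitter buffer.<br />
<br />
```python
def plan_decodes(received):
    """Choose a decode action for each frame.

    received[i] is True if packet i arrived. Because output is delayed by
    one frame, the receiver already knows whether packet i + 1 arrived
    when it has to produce frame i.
    """
    actions = []
    for i, arrived in enumerate(received):
        if arrived:
            actions.append("normal")  # decode packet i as usual
        elif i + 1 < len(received) and received[i + 1]:
            actions.append("fec")     # decode packet i + 1 with decode_fec set
        else:
            actions.append("plc")     # nothing usable: decode with no packet
    return actions


print(plan_decodes([True, False, True, False, False, True]))
# ['normal', 'fec', 'normal', 'plc', 'fec', 'normal']
```

After a "fec" step, the same packet is decoded again normally to produce the following frame. Whether that packet actually carries FEC data for the lost frame depends on the encoder conditions listed in this section.<br />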
<br />
FEC is only used by the encoder under certain conditions:<br />
* the feature must be enabled via the '''OPUS_SET_INBAND_FEC''' CTL<br />
* the encoder must be told to expect loss via the '''OPUS_SET_PACKET_LOSS_PERC''' CTL<br />
* the codec must be operated in any of the '''Linear Prediction''' or '''Hybrid''' modes<br />
<br />
Frame durations shorter than 10ms and very high bitrates will use the MDCT modes, where FEC is not available.<br />
<br />
Even when FEC is not used, telling the encoder about the expected level of loss will help it make more intelligent decisions. By default, the implementation assumes there is no loss.<br />
<br />
=== I can't use malloc or much stack on my embedded platform. How do I make Opus work? ===<br />
<br />
A normal build of libopus only uses <tt>malloc/free</tt> in the <tt>_create()</tt> and <tt>_destroy()</tt> calls, making it safe for realtime use as long as the codec state is pre-created.<br />
<br />
To build Opus without the references to <tt>malloc/free</tt>, you must:<br />
<br />
* use <tt>init()</tt> calls rather than <tt>create()</tt> calls in your application<br />
* compile with <tt>CFLAGS="-DOVERRIDE_OPUS_ALLOC -DOVERRIDE_OPUS_FREE -D'opus_alloc(x)=NULL' -D'opus_free(x)=NULL' "</tt>.<br />
<br />
If libopus is built with <tt>-DNONTHREADSAFE_PSEUDOSTACK</tt> (instead of <tt>VAR_ARRAYS</tt>, or <tt>USE_ALLOCA</tt>), it will use a user-provided block of heap instead of stack for many things, resulting in much lower stack usage.<br><br />
This makes the resulting library '''non-threadsafe''' and is '''not recommended''' on anything except limited embedded platforms.<br />
<br />
=== How can I ensure that my software interoperates with other software implementing Opus? ===<br />
<br />
For applications using Ogg files, there are some [https://people.xiph.org/~greg/opus_testvectors/ Ogg Opus test vectors] to test decoders, and you can test encoders with opusdec. For RTP applications, the opusrtp tool can be useful.<br />
<br />
In general, here's a list of specific issues to check:<br />
* Can your application handle all frame sizes, including changing the frame size from frame to frame?<br />
* Does your application react properly to lost packets, by calling the decoder with a NULL packet?<br />
<br />
=== What is the complexity of Opus? ===<br />
<br />
The complexity of Opus varies by a large amount based on the settings used.<br />
<br />
It depends on the mode, audio bandwidth, number of channels, and even a "complexity knob" that can trade complexity for quality. It will run easily on any recent PC or smartphone. <br />
<br />
For slower embedded CPUs/DSPs, the amount of CPU required will vary depending on the configuration and the exact CPU, so you will need to experiment. Do not expect Opus to run quickly on really slow devices like 8-bit micro-controllers.<br />
<br />
=== Opus is using too much CPU for my application. What can I do? ===<br />
<br />
First, don't panic, and don't start writing assembly just yet.<br />
<br />
It's possible that you're just not using the right set of options.<br />
<br />
If you're targeting an embedded/mobile platform, chances are the fixed-point build will be faster, so make sure you're using '''--enable-fixed-point''' or defining '''FIXED_POINT''' in the build system.<br />
<br />
Opus also has a complexity option that can trade quality for complexity. The default is highest quality and highest complexity. You can control this using '''OPUS_SET_COMPLEXITY()''' (see the '''[https://www.opus-codec.org/docs/ Documentation]''' for details).<br />
<br />
If all else fails and you need to optimize the Opus code, see the next question.<br />
<br />
=== I would like to optimize/improve/help with Opus. Where should I start? ===<br />
<br />
Please '''[https://www.opus-codec.org/contact/ contact us]''' before you start, or at least before you get too far.<br />
<br />
This will help coordinate the efforts made on Opus and reduce the probability of wasting your time on duplicated effort or going down the wrong path. More details in the '''[[OpusContributing|contributing page]]'''.<br />
<br />
=== Does Opus have an echo canceller like Speex does? ===<br />
<br />
Echo cancellation is completely independent from codecs.<br />
<br />
You can use any echo canceller (including the one from libspeexdsp) along with Opus.<br />
<br />
That being said, among the free acoustic echo cancellers (AEC) we're aware of, the best is probably the Google AEC from the [https://code.google.com/p/webrtc/ WebRTC codebase].<br />
<br />
=== How do I get the duration of a .opus file? ===<br />
<br />
Use '''[https://mf4.xiph.org/jenkins/view/opus/job/opusfile-autotools/ws/doc/html/group__stream__info.html op_pcm_total()]''' from '''[https://opus-codec.org/downloads/ libopusfile]'''.<br />
<br />
If you want to implement this yourself, you need to:<br />
* Read the BOS (Beginning Of Stream) pages to enumerate the serial numbers of all concurrently multiplexed streams, identify the Opus stream you want, and get its preskip value.<br />
* Read up through the first complete audio data page to compute the starting granule position (since the timestamps might not start at 0, e.g., if the file was captured from a live stream that was joined after the start).<br />
* Seek near the end of a file and look for a page with the same serial number as found in the headers (just under 64 kB from the end should be sufficient to ensure you find a page, assuming the Opus data is not multiplexed with another stream and there is no trailing garbage in the file).<br />
* If you find a page whose serial number was not included in the original set of BOS pages, you have a chained stream. You need to bisect the file to identify the end of the first chain and the start of the next, and repeat this process for each link in the chain.<br />
* If you don't find any pages at all, or find a page whose serial number was included in the original set of BOS pages, but was not the serial number of the Opus stream you want, back up and try again (being careful to avoid rescanning the same data, which can produce quadratic worst-case complexity).<br />
* If you find a page whose serial number matches the Opus stream you want, look at its final granule position, and compute the total duration (in seconds) as (final_granule_position - initial_granule_position - preskip)/48000.0.<br />
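<br />
The final step above is a single subtraction and division. The sketch below applies it to made-up granule positions; the pre-skip of 312 samples and the granule values are hypothetical example inputs, and the function name is illustrative rather than part of libopusfile.<br />
<br />
```python
def opus_duration_seconds(final_granule, initial_granule, preskip):
    """Duration of one link, using the formula from the last step above."""
    return (final_granule - initial_granule - preskip) / 48000.0


# Hypothetical values: the stream starts at granule position 0, has a
# pre-skip of 312 samples, and its last page ends at 169680312.
seconds = opus_duration_seconds(169680312, 0, 312)
minutes, rest = divmod(seconds, 60)
print(f"{seconds} s = {int(minutes)}m{rest:06.3f}s")  # 3535.0 s = 58m55.000s
```

For a chained file, the same computation is repeated for each link and the results are summed.<br />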
<br />
=== Why don't you store the duration in the header? Isn't all of that slow and complicated? ===<br />
<br />
Computing the duration directly from the file contents allows files to be written in a single pass, without any seeking, which is necessary for live streaming. Chaining also simplifies live streaming, as you can just pipe multiple files into the same network connection, with all associated metadata updates, etc., and the results are still valid .opus files (contrast with the '''[http://www.smackfu.com/stuff/programming/shoutcast.html hacks used to add metadata to MP3 streams]''').<br />
<br />
Opening a typical .opus file, which is not multiplexed and not chained, and computing the duration over the network requires just one extra HTTP request, which can proceed in parallel with the buffering in the main request. This is the behavior you will get from libopusfile's HTTP backend by default.<br />
<br />
Enumeration of chain boundaries can be expensive in files with many links, but in our testing libopusfile used nearly an order of magnitude fewer seeks to do this than some other media frameworks (at the time). Storing a duration in a header wouldn't solve this, since every link in a chain has its own, independent headers. If the cost of chain enumeration is a problem, the best way to avoid it is to store the links in separate files (i.e., don't use chaining).<br />
<br />
=== How do I seek in a .opus file? ===<br />
<br />
Use '''[https://mf4.xiph.org/jenkins/view/opus/job/opusfile-autotools/ws/doc/html/group__stream__seeking.html op_pcm_seek() or op_raw_seek()]''' from '''[https://opus-codec.org/downloads/ libopusfile]'''.<br />
<br />
If you want to implement seeking yourself, you need to:<br />
* Identify the link that contains the target (if you have a chained file).<br />
* Adjust the target by 80 ms to get enough pre-roll data (to ensure the decoder will have converged by the time you reach the target), as recommended by '''[https://tools.ietf.org/html/rfc7845 RFC 7845]'''.<br />
* Estimate the location of the last audio data page with a completed packet prior to the adjusted target, using the duration and size (in bytes) of the link.<br />
* Seek to that location and scan forward until you find an audio data page with a completed packet (that contains a valid granule position).<br />
* If you think you are sufficiently close to the adjusted target, scan forward until you find the next audio data page with a completed packet.<br />
* If the adjusted target lies between the first audio data page with a completed packet you found and the next one, stop. You can decode forward from here and start playing when you reach your (original, unadjusted) target.<br />
* Otherwise, go back and re-estimate the seek location using the granule positions and file offsets of the page(s) you just found.<br />
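<br />
The estimate-and-refine loop described above can be sketched as an interpolation search. This is a simplified model, not libopusfile's algorithm: it assumes the pages of one link are already available in memory as (byte offset, granule position) pairs, whereas a real implementation discovers them by reading the file, and it has none of the fallbacks discussed below. All names are hypothetical.<br />
<br />
```python
from bisect import bisect_left

PREROLL_SAMPLES = 80 * 48  # 80 ms of pre-roll at 48 kHz


def estimate_offset(target, lo, hi):
    """Linearly interpolate a byte offset between two (offset, granule) points."""
    (lo_off, lo_gp), (hi_off, hi_gp) = lo, hi
    if hi_gp <= lo_gp:
        return lo_off
    fraction = (target - lo_gp) / (hi_gp - lo_gp)
    return lo_off + int(fraction * (hi_off - lo_off))


def find_page(pages, target):
    """Return (index of the last page at or before the adjusted target, guesses made).

    pages is a list of (byte_offset, granule_position) pairs sorted by offset;
    target is a granule position inside the range the pages cover.
    """
    offsets = [offset for offset, _ in pages]
    adjusted = max(target - PREROLL_SAMPLES, pages[0][1])
    lo, hi = 0, len(pages) - 1
    guesses = 0
    while hi - lo > 1:
        guess = estimate_offset(adjusted, pages[lo], pages[hi])
        guesses += 1
        # "Scan forward" from the guessed offset to the next page, staying
        # strictly inside the bracket so that every guess makes progress.
        i = min(max(bisect_left(offsets, guess), lo + 1), hi - 1)
        if pages[i][1] <= adjusted:
            lo = i
        else:
            hi = i
    return lo, guesses


# Hypothetical constant-bitrate link: one page per second, 4000 bytes each.
pages = [(i * 4000, i * 48000) for i in range(101)]
index, guesses = find_page(pages, 50 * 48000 + 100)
print(index, guesses)
```

Decoding would then start from the page after the returned one and discard output until the original, unadjusted target is reached.<br />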
<br />
libopusfile includes fallbacks to prevent pathological worst-case behavior when its guesses are repeatedly wrong. Weighted bisection can degrade to a linear scan, but libopusfile's worst case is within a constant factor of naive bisection (i.e., logarithmic). We have only ever observed such pathological behavior in files we manually constructed to trigger it.<br />
<br />
libopusfile also takes shortcuts when the target location is near the current position, to make small seeks cheaper. In the best case it can loop forever over very short files whose data is contained in a single page (e.g., less than 1 second long with default encoder settings) without any seeking at all.<br />
<br />
You can find more information on seeking in multiplexed files when you want to play more than just a single Opus stream '''[[GranulePosAndSeeking|on this page]]'''.<br />
<br />
=== Wouldn't it be better to build an index? ===<br />
<br />
As with file durations, an index at the beginning of the file is incompatible with live streaming. It also means more data has to be fetched before a file can start playing over the network, because you must read past the index even when you don't intend to seek. The index could be stored at the end (which even still allows encoding the file in a single pass), but this requires one (or more) extra seeks to read the index (especially if its exact location at the end is not known), either on file open or on first seek. Unlike the final timestamp, which is small and fixed in size, an index grows with the file duration, and can have unbounded size. It is also easy for an index to become out of sync with a file that has been edited or damaged, in which case seeking will simply fail. In contrast, you can seek in a truncated .opus download without issues.<br />
<br />
In practice, bisection seeking on VBR audio achieves performance that is very nearly as good as seeking with an index, without any of the drawbacks of an index. libopusfile provides a test program called seeking_example which can be used to benchmark the performance on your files.<br />
<br />
On a 96 kbps VBR file nearly one hour long (the second movement of Mahler's Symphony No. 8 "Symphony of a Thousand"):<br />
<br />
Testing exact PCM seeking to random places in 169680000 samples (58m55.000s)...<br />
Total seek operations: 1020 (1.020 per exact seek, 2 maximum).<br />
<br />
On a chained file formed by concatenating the eight test vectors for the currently supported channel layouts in mapping family 1:<br />
<br />
Opened file containing 8 links with 18 seeks (2.250 per link).<br />
Testing exact PCM seeking to random places in 2759064 samples (57.481s)...<br />
Total seek operations: 946 (0.946 per exact seek, 2 maximum).<br />
<br />
That is, the number of physical seeks required is almost always 1, every once in a while 2, and in short files, sometimes even 0.<br />
<br />
[[Category:Opus]]</div>Derfhttps://wiki.xiph.org/index.php?title=OpusFAQ&diff=16603OpusFAQ2017-07-07T20:42:38Z<p>Derf: /* How do I seek in a .opus file? */</p>
<hr />
<div>[[Image:Opus logo trans.png]]<br />
<br />
If you are looking for info not covered in this FAQ, try the '''[https://opus-codec.org main Opus website]''' or the pages included in the '''[[:Category:Opus|Opus category]]''' of this wiki.<br />
<br />
== General Questions ==<br />
<br />
=== What is Opus? Who created it? ===<br />
<br />
Opus is a totally open, royalty-free, highly versatile audio codec.<br />
<br />
It is primarily designed for interactive speech and music transmission over the Internet, but is also applicable to storage and streaming applications. It incorporates technology from Skype's '''[https://en.wikipedia.org/wiki/SILK SILK]''' codec and Xiph.Org's '''[http://celt-codec.org/ CELT]''' codec. It has been standardized by the '''[https://www.ietf.org/ Internet Engineering Task Force]''' (IETF) as '''[https://tools.ietf.org/html/rfc6716 RFC 6716]'''. <br />
<br />
Opus has been in development since early 2007. Programmers associated with '''[https://xiph.org/ Xiph.Org]''', '''[https://www.skype.com/ Skype]''' and several other organizations have contributed to its development and to the standardization process as part of the '''[https://datatracker.ietf.org/wg/codec/charter/ IETF's Codec Working Group]'''.<br />
<br />
=== How does Opus compare to other codecs? ===<br />
<br />
Opus is distinguished from most high quality formats (e.g. [[Vorbis]], AAC, MP3) by having '''[https://tools.ietf.org/html/rfc6716#section-2 low delay]''' (5 to 66.5 ms) and from most low delay formats (e.g. [[Speex]], G.711, GSM) by supporting '''[https://tools.ietf.org/html/rfc6716#section-2.1.1 high audio quality]''' (from narrow-band all the way to full-band audio).<br />
<br />
It '''[https://opus-codec.org/comparison meets or exceeds existing codecs' quality]''' across a wide range of bitrates, and it operates at lower delay than virtually any existing compressed format.<br />
<br />
Most importantly, the Opus format and its reference implementation are both available under '''[https://opus-codec.org/license/ liberal, royalty-free licenses]'''.<br /><br />
This makes it:<br />
* easy to adopt<br />
* compatible with free software<br />
* suitable for use as part of the basic infrastructure of the Internet<br />
<br />
=== Does Opus make all those other lossy codecs obsolete? ===<br />
<br />
Yes.<br />
<br />
From a technical point of view (loss, delay, bitrates, ...) Opus renders '''[[Speex]]''' obsolete and should also replace '''[[Vorbis]]''' and the common proprietary codecs (e.g. AAC, MP3, ...).<br />
<br />
=== Will Opus replace Vorbis in video files? ===<br />
<br />
For '''[[Ogg]]''' video files (which use the '''[[Theora]]''' video codec), you ''can'' use Opus instead of Vorbis, but the overall size reduction will be minimal and it will break compatibility with existing players.<br />
<br />
For WebM video files, the convention is to use the '''[http://www.webmproject.org/vp9/ VP9 video codec]''' when using Opus as an audio codec.<br />
<br />
=== How do I use Opus? ===<br />
<br />
For now, the best way to '''encode''' audio into Opus files is to use the '''opusenc''' command-line tool from the '''[https://opus-codec.org/downloads/ opus-tools package]'''.<br />
<br />
If you want to encode many files at once (e.g. your music library), try the applications listed in the '''[[OpusSupport|Opus Support]]''' page.<br />
<br />
For rough guidelines on encoding settings, see the '''[[Opus Recommended Settings]]''' page.<br />
<br />
=== What programs support Opus? ===<br />
<br />
Opus decoding support is now included in '''[http://caniuse.com/opus some Internet browsers]''' and '''[[OpusSupport|many applications]]''', including '''[https://www.mozilla.org/firefox Firefox]''', '''[https://www.foobar2000.org/ foobar2000]''' and '''[https://www.videolan.org/vlc/ VLC]''', as well as in frameworks such as '''[https://gstreamer.freedesktop.org/ GStreamer]''' and '''[https://ffmpeg.org/ FFmpeg]'''.<br />
<br />
For real-time applications, Opus support is available in '''[https://www.webrtc.org/ Google's WebRTC codebase]'''.<br />
<br />
Opus is a relatively new codec: '''[[OpusSupport|many more applications]]''' will support it in the near future.<br />
<br />
=== Does Opus support higher sampling rates, such as 96 kHz or 192 kHz? ===<br />
<br />
Yes and no.<br />
<br />
Opus encoding tools like opusenc will happily encode input files that are sampled at 96 or 192 kHz.<br />
<br />
However, files at these rates are internally '''converted to 48 kHz''' and then only frequencies '''up to 20 kHz''' are encoded.<br />
<br />
The reason is simple: lossy codecs are designed to preserve audible details while discarding irrelevant information. Since the human ear can only hear up to 20 kHz at best (usually lower than that), frequency content above 20 kHz is the first thing to go.<br />
<br />
See Monty's '''[https://people.xiph.org/~xiphmont/demo/neil-young.html article]''' for more details.<br />
<br />
If you want a codec to handle higher sampling rates losslessly, use '''[[FLAC]]'''!<br />
<br />
=== What are the licensing requirements? ===<br />
<br />
The reference Opus source code is released under a three-clause BSD license, which is a very permissive Open Source license. Commercial use and distribution (including in proprietary software) is permitted, provided that some basic conditions specified in the license are met. <br />
<br />
Opus is also covered by some patents, for which royalty-free usage rights are granted, under conditions that the authors believe are compatible with (hopefully) all open source licenses, including the GPL (v2 and v3).<br />
<br />
See the '''[https://www.opus-codec.org/license/ Opus Licensing]''' page for details.<br />
<br />
=== Why make Opus free? ===<br />
<br />
On the Internet, protocol and codec standards are part of the common infrastructure everyone builds upon.<br />
<br />
Most of the value of a high-quality standard is the innovation and inter-operation provided by the systems built on top of it. When a few parties have monopoly rights to monetize a standard, that infrastructure stops being so common and everyone else has more reason to use their own solution instead, increasing cost and reducing efficiency.<br />
<br />
Imagine a road system where each type of car could only drive on its own manufacturer's pavement. We all benefit from living in a world where all the roads are connected.<br />
<br />
This is why Opus, unlike many codecs, is free.<br />
<br />
=== Is the SILK part of Opus compatible with the SILK implementation shipped in Skype? ===<br />
<br />
No.<br />
<br />
The SILK codec, as submitted by Skype to the IETF, was heavily modified as part of its integration within Opus. The modifications are significant enough that it is not possible to just write a "translator". Even sharing code between Opus and the "old SILK" would be highly complex.<br />
<br />
=== Why not keep the SILK and CELT codecs separate? ===<br />
Opus is more than just two independent codecs with a switch.<br />
<br />
In addition to a [https://en.wikipedia.org/wiki/Linear_predictive_coding Linear Prediction] '''SILK mode''' and an [https://en.wikipedia.org/wiki/Modified_discrete_cosine_transform MDCT] '''CELT mode''' it has a '''hybrid mode''', where speech frequencies up to 8 kHz are encoded with LP while those between 8 and 20 kHz are encoded with MDCT. This is what allows Opus to have such high speech quality around 32 kbps.<br />
<br />
Another advantage of the integration is the ability to switch between these 3 modes seamlessly, without any audible "glitches" and without any out-of-band signalling.<br />
<br />
=== Now that Opus is standardized, will its development stop or can it be further improved? ===<br />
Yes, Opus '''can''' and '''should''' be improved, because unlike most '''[https://en.wikipedia.org/wiki/ITU-T#Key_standards_published_by_ITU ITU-T codecs]''', Opus is only defined in terms of its decoder.<br />
<br />
The encoder can keep evolving as long as the bitstream it produces can be decoded by the reference decoder. This is what made it possible for modern MP3 encoders (e.g. '''[https://en.wikipedia.org/wiki/LAME LAME]''') to improve far beyond the original '''[https://en.wikipedia.org/wiki/L3enc L3enc]''' and '''dist10''' reference implementations.<br />
<br />
Although it is unlikely that Opus encoders will see such a spectacular evolution, we certainly hope that future encoders will become much better than the reference encoder.<br />
<br />
In fact, the 1.1 libopus release significantly improves on the reference encoder's quality. See '''[https://people.xiph.org/~xiphmont/demo/opus/demo3.shtml Monty's demo]''' for more details.<br />
<br />
=== Will all future Opus releases comply with the [https://tools.ietf.org/html/rfc6716 Opus specification]? ===<br />
<br />
Yes.<br />
<br />
=== In what ways is Opus optimized for the Internet? ===<br />
<br />
Opus has good packet loss robustness and concealment, but its optimisations go further.<br />
<br />
One of the first things we were asked when designing Opus was to make the rate '''really''' adaptable, because we never know what kind of rates will be available. This meant not only supporting a wide range of bitrates, but also being able to vary the bitrate in small increments.<br />
<br />
This is why Opus scales from about '''6''' to '''512 kb/s''', in increments of '''0.4 kb/s''' (one byte with 20 ms frames). Opus can have '''more than 1200 possible bitrates''' while spending only '''11 bits''' signalling the bitrate because UDP already encodes the packet size.<br />
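<br />
The 0.4 kb/s increment follows from adding one byte to every 20 ms frame. A minimal sketch of that arithmetic, taking the 6 and 512 kb/s endpoints from the text as round numbers; the function and constant names are illustrative and not part of any Opus API:<br />
<br />
```python
# Bitrate granularity when the packet size changes by one byte per frame.
FRAME_MS = 20        # frame duration used in the text
MIN_BPS = 6_000      # "about 6 kb/s" (rounded endpoint, an assumption)
MAX_BPS = 512_000    # "512 kb/s"


def bitrate_step_bps(frame_ms: float) -> float:
    """Bitrate change caused by adding one byte (8 bits) to every frame."""
    frames_per_second = 1000 / frame_ms
    return 8 * frames_per_second


step = bitrate_step_bps(FRAME_MS)
# Number of distinct rates between the two endpoints, inclusive.
distinct_rates = int((MAX_BPS - MIN_BPS) / step) + 1
print(step, distinct_rates)
```
<br />
With these endpoints the step is 400 b/s and the count comes out above 1200, consistent with the figures quoted above.<br />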
<br />
One last aspect is that Opus is simple to transport over RTP, as can be seen from the [https://tools.ietf.org/html/rfc7587 Opus RTP payload format]. For example, it's possible to decode RTP packets without having even seen the SDP or any out-of-band signalling.<br />
<br />
=== What applications for Android can play Opus? ===<br />
<br />
Right now there are just a few, but the list is growing fast. Please see [https://android.stackexchange.com/q/37970/7425 this question on android.stackexchange.com]. Feel free to suggest other applications.<br />
<br />
=== When will the next version be released? ===<br />
<br />
When it's done. Seriously, we do not know.<br />
<br />
Opus is not a large project with a fixed release schedule.<br />
<br />
That being said, our '''[https://www.opus-codec.org/downloads/ pre-releases]''' and even the git repositories ('''[https://git.xiph.org/?p=opus.git Xiph]''', '''[https://github.com/xiph/opus GitHub]''') are pretty stable and given proper testing (which you should always do anyway), are safe to distribute.<br />
<br />
Just be aware that the API of new features (that have never been included in a stable release) could potentially still change.<br />
<br />
== Software Developers' Questions ==<br />
<br />
=== On what platforms does Opus run? ===<br />
<br />
The Opus code base is written in C89 and should run on the vast majority of recent (and not so recent) CPUs.<br />
<br />
Some of the platforms '''[https://mf4.xiph.org/jenkins/view/opus/ on which Opus has been tested]''' include x86, x86-64, ARM, Itanium, Blackfin, and SPARC.<br />
<br />
=== Is there a fixed-point implementation? ===<br />
<br />
Yes.<br />
<br />
The fixed-point and floating-point decoder and encoder implementations are part of the same code base.<br />
<br />
The code defaults to float, so you need to configure with '''--enable-fixed-point''' (or define '''FIXED_POINT''' if not using the configure script) to build the code for fixed-point.<br />
<br />
=== Which implementation should I use? ===<br />
<br />
While the implementation in RFC 6716 is what ''defines'' the standard, it is likely not the best and most up-to-date implementation.<br />
<br />
The [https://opus-codec.org/ Opus] website was set up for the purpose of continually improving the implementation — in terms of speed, encoding quality, device compatibility, etc — while still conforming to the standard.<br />
<br />
All Opus implementations are compatible by definition.<br />
<br />
=== How is supporting Opus different from supporting Speex/G.711/MP3? ===<br />
<br />
Opus has variable frame durations which can change on the fly, so an Opus decoder needs to be ready to accept packets with durations that are '''any multiple of 2.5ms''' up to a '''maximum of 120ms'''. <br />
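<br />
As a rough illustration of the rule above, the check below tests whether a packet duration, expressed in samples at 48 kHz, is a multiple of 2.5 ms and no longer than 120 ms. The helper name is hypothetical; a real application would obtain the sample count from the decoder library rather than compute it by hand:<br />
<br />
```python
SAMPLE_RATE = 48_000
STEP_SAMPLES = SAMPLE_RATE * 25 // 10_000   # 2.5 ms -> 120 samples
MAX_SAMPLES = SAMPLE_RATE * 120 // 1_000    # 120 ms -> 5760 samples


def is_valid_packet_duration(samples: int) -> bool:
    """True if `samples` (at 48 kHz) is a multiple of 2.5 ms, up to 120 ms."""
    return 0 < samples <= MAX_SAMPLES and samples % STEP_SAMPLES == 0


for n in (120, 960, 5760, 5880, 100):
    print(n, is_valid_packet_duration(n))
```
<br />
Sizing the output buffer for 5760 samples per channel means a decoder written this way never has to reject a valid packet for being too long.<br />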
<br />
The Opus encoder and decoder do not need to use matching sampling rates or channel counts. It is recommended to always decode at the highest rate the hardware supports (e.g. 48 kHz stereo), so that the user gets the full quality of whatever the far end is sending.<br />
<br />
=== My application doesn't work. Can anyone help me? ===<br />
<br />
It's possible to get help, but before doing so, there are a few basic things to try:<br />
<br />
* Implement your application with uncompressed audio instead of Opus. If it still doesn't work, then the problem isn't related to Opus.<br />
* Read the [https://www.opus-codec.org/docs/ Opus documentation].<br />
* Read the [https://git.xiph.org/?p=opus.git;a=blob;f=src/opus_demo.c opus_demo.c] source code to see how to use the encoder and decoder.<br />
<br />
If you still can't solve the problem, the best option is to ask for help on the [http://lists.xiph.org/mailman/listinfo/opus mailing list] or on the '''#opus''' IRC channel on '''irc.freenode.net'''.<br />
<br />
=== How do I report a bug? ===<br />
<br />
If you think you have found a bug in Opus (and not in your application), please [https://trac.xiph.org/newticket?component=Opus file a bug report].<br />
<br />
Please include a way for us to reproduce the problem. The best way to do this is to provide an input file, along with the opusenc/opusdec/opus_demo command line that causes the bug to occur.<br />
<br />
If the bug cannot be triggered by the command line tools, please provide a simple patch or C file that can help reproduce it. Please also provide any other relevant information, such as OS, CPU, build options, etc.<br />
<br />
Don't hesitate to also contact us on the [http://lists.xiph.org/mailman/listinfo/opus mailing list] or on [irc://irc.freenode.net/opus IRC].<br />
<br />
=== What is Opus Custom? ===<br />
<br />
Opus Custom is an '''optional''' part of the Opus standard that allows for sampling rates other than 8, 12, 16, 24, or 48 kHz and frame sizes other than multiples of 2.5 ms.<br />
<br />
Opus Custom requires additional out-of-band signalling that Opus does not normally require and disables many of Opus' coding modes. Also, because it is an optional part of the specification, using Opus Custom may lead to compatibility problems.<br />
<br />
For these reasons, '''its use is discouraged''' outside of very specific applications. <br />
<br />
You may want to use Opus Custom for:<br />
<br />
* ultra-low-delay applications, where synchronization with the soundcard buffer is important. <br />
* low-power embedded applications, where compatibility with others is not important.<br />
<br />
For almost all other types of applications, Opus Custom should not be used.<br />
<br />
=== How do I use 44.1 kHz or some other sampling rate not directly supported by Opus? ===<br />
<br />
Tools which read or write Opus should inter-operate with other sampling rates by transparently performing sample rate conversion behind the scenes whenever necessary. In particular, software developers should not use Opus Custom for 44.1 kHz support, except in the very specific circumstances outlined above.<br />
<br />
Note that it's generally preferable for a decoder to output at 48kHz, even when you know the original input was 44.1kHz. This is not only because you can skip resampling, but also because many cheaper audio interfaces have poor quality output for 44.1kHz.<br />
<br />
The '''[https://opus-codec.org/downloads/ opus-tools]''' package source code contains a small, high quality, high performance, BSD licensed '''[https://github.com/xiph/opus-tools/blob/master/src/resample.c resampler]''' which can be used where resampling is required.<br />
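<br />
To make the rate conversion concrete, the sketch below decides whether an input rate needs resampling and computes the 44.1 to 48 kHz ratio. It is only arithmetic, not a resampler, and the function names are illustrative; the list of directly supported rates is the one given in the Opus Custom answer above:<br />
<br />
```python
from fractions import Fraction

# Sampling rates Opus handles without Opus Custom (from the text above).
SUPPORTED_RATES = (8_000, 12_000, 16_000, 24_000, 48_000)


def needs_resampling(rate_in: int) -> bool:
    """True if audio at `rate_in` must be resampled before encoding."""
    return rate_in not in SUPPORTED_RATES


def resample_ratio(rate_in: int, rate_out: int = 48_000) -> Fraction:
    """Exact output/input sample ratio, reduced to lowest terms."""
    return Fraction(rate_out, rate_in)


def output_length(n_in: int, rate_in: int, rate_out: int = 48_000) -> int:
    """Number of output samples covering n_in input samples (rounded up)."""
    return -(-n_in * rate_out // rate_in)


print(needs_resampling(44_100), resample_ratio(44_100))
print(output_length(44_100, 44_100))
```
<br />
The ratio reduces to 160/147, so every 147 input samples at 44.1 kHz correspond to 160 output samples at 48 kHz.<br />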
<br />
=== But won't the resampler hurt the quality? Isn't it better to use 44.1 kHz directly? ===<br />
<br />
Not really. The quality degradation caused by any reasonable resampler (SoX, libspeexdsp, libsamplerate, ...) is far less than the distortion caused by the best lossy codec at its highest bitrate. If you can't tolerate the quality degradation caused by a good 44.1 ↔ 48 kHz resampler, then you shouldn't be using a lossy codec in the first place. Similarly, the extra CPU spent in the resampler is small compared to the rest of the codec. Not only that, but many soundcards only support 48 kHz on playback, so players can directly play the output rather than resample it to 48 kHz (e.g. for a 44.1 kHz MP3). So effectively, Opus is only shifting the burden of resampling from the decoder side to the encoder side.<br />
<br />
One advantage of supporting only one internal rate is that it makes it possible for Opus to support many features, including efficient speech compression (through SILK) and real-time applications. It also means all the quality tuning effort can be spent on a single configuration, which helps bring even better quality.<br />
<br />
=== How is the bitrate setting used in VBR mode? ===<br />
<br />
Variable bitrate (VBR) mode allows the bitrate to automatically vary over time based on the audio being encoded, in order to achieve a consistent quality.<br />
<br />
The bitrate setting controls the desired quality, on a scale that is calibrated to closely approximate the average bitrate that would be obtained over a large and diverse collection of audio. The actual bitrate of any particular audio stream may be higher or lower than this average.<br />
<br />
=== What frame size should I use? ===<br />
<br />
A '''20ms''' frame size works well for most applications. Smaller frame sizes may be used to achieve lower latency, but have lower quality at a given bitrate.<br />
<br />
Sizes greater than 20 ms increase latency and are generally beneficial only at fairly low bitrates, or when used to reduce external overhead (e.g. by reducing the number of packets that are sent). For file encoding, using a frame size larger than 20 ms will usually result in '''worse''' quality for the same bitrate because it constrains the encoder in the decisions it can make.<br />
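<br />
The "external overhead" trade-off can be estimated with a little arithmetic. The sketch below assumes one Opus packet per RTP packet over UDP and IPv4 with no header options or extensions (20 + 8 + 12 = 40 bytes of headers per packet); that transport is an assumption made for illustration, and other transports will give different numbers:<br />
<br />
```python
# Header bytes per packet: IPv4 (20) + UDP (8) + RTP (12), assuming no
# IP options, no RTP header extensions and one Opus packet per RTP packet.
HEADER_BYTES = 20 + 8 + 12


def overhead_bps(frame_ms: float) -> float:
    """Bitrate spent on packet headers for a given frame duration."""
    packets_per_second = 1000 / frame_ms
    return HEADER_BYTES * 8 * packets_per_second


for ms in (10, 20, 40, 60):
    print(ms, round(overhead_bps(ms)))
```
<br />
Under these assumptions, 20 ms frames cost 16 kb/s in headers, while 60 ms frames cost about a third of that, which is why longer frames can pay off at very low bitrates despite the added latency.<br />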
<br />
=== Forward Error correction (FEC) doesn't appear to do anything! HELP! ===<br />
<br />
The in-band FEC feature of Opus helps reduce the harm of packet loss by encoding some information about the prior packet.<br />
<br />
To make use of in-band FEC, the receiver must delay its output by at least one frame, so that when a packet is lost it can call the decoder on the ''next'' packet with the decode_fec argument set and reconstruct the missing frame. This works best if it's integrated with a jitter buffer.<br />
<br />
FEC is only used by the encoder under certain conditions:<br />
* the feature must be enabled via the '''OPUS_SET_INBAND_FEC''' CTL<br />
* the encoder must be told to expect loss via the '''OPUS_SET_PACKET_LOSS_PERC''' CTL<br />
* the codec must be operated in any of the '''Linear Prediction''' or '''Hybrid''' modes<br />
<br />
Frame durations shorter than 10ms and very high bitrates will use the MDCT modes, where FEC is not available.<br />
<br />
Even when FEC is not used, telling the encoder about the expected level of loss will help it make more intelligent decisions. By default, the implementation assumes there is no loss.<br />
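<br />
The receiver-side decision can be sketched as follows. This is a simplified model rather than libopus code: it only decides, for each expected sequence number, whether to decode normally, to recover the frame from the FEC data carried in the next packet, or to fall back to packet loss concealment. The function and action names are hypothetical, and a real jitter buffer also has to deal with reordering and timing:<br />
<br />
```python
def plan_decode(received, first_seq, last_seq):
    """Choose a decode action for every expected packet.

    `received` is the set of sequence numbers that actually arrived.
    Looking at seq + 1 is what forces the one-frame output delay: the
    receiver has to wait for the next packet before giving up on seq.
    """
    plan = []
    for seq in range(first_seq, last_seq + 1):
        if seq in received:
            plan.append((seq, "normal"))          # decode the packet itself
        elif seq + 1 in received:
            plan.append((seq, "fec_from_next"))   # decode next packet with decode_fec set
        else:
            plan.append((seq, "conceal"))         # no data at all: conceal the loss
    return plan


for step in plan_decode({1, 2, 4, 7}, 1, 7):
    print(step)
```
<br />
In this example packets 3 and 6 can be recovered from the packets that follow them, while packet 5 cannot, because packet 6 was lost as well.<br />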
<br />
=== I can't use malloc or much stack on my embedded platform. How do I make Opus work? ===<br />
<br />
A normal build of libopus only uses <tt>malloc/free</tt> in the <tt>_create()</tt> and <tt>_destroy()</tt> calls, making it safe for realtime use as long as the codec state is pre-created.<br />
<br />
To build Opus without the references to <tt>malloc/free</tt>, you must:<br />
<br />
* use <tt>init()</tt> calls rather than <tt>create()</tt> calls in your application<br />
* compile with <tt>CFLAGS="-DOVERRIDE_OPUS_ALLOC -DOVERRIDE_OPUS_FREE -D'opus_alloc(x)=NULL' -D'opus_free(x)=NULL' "</tt>.<br />
<br />
If libopus is built with <tt>-DNONTHREADSAFE_PSEUDOSTACK</tt> (instead of <tt>VAR_ARRAYS</tt>, or <tt>USE_ALLOCA</tt>), it will use a user-provided block of heap instead of stack for many things, resulting in much lower stack usage.<br><br />
This makes the resulting library '''non-threadsafe''' and is '''not recommended''' on anything except limited embedded platforms.<br />
<br />
=== How can I ensure that my software interoperates with other software implementing Opus? ===<br />
<br />
For applications using Ogg files, there are some [https://people.xiph.org/~greg/opus_testvectors/ Ogg Opus test vectors] for testing decoders, and you can test encoders with opusdec. For RTP applications, the opusrtp tool can be useful.<br />
<br />
In general, here's a list of specific issues to check:<br />
* Can your application handle all frame sizes, including changing the frame size from frame to frame?<br />
* Does your application react properly to lost packets, by calling the decoder with a NULL packet?<br />
<br />
=== What is the complexity of Opus? ===<br />
<br />
The complexity of Opus varies by a large amount based on the settings used.<br />
<br />
It depends on the mode, audio bandwidth, number of channels, and even a "complexity knob" that can trade complexity for quality. It will run easily on any recent PC or smartphone. <br />
<br />
For slower embedded CPUs/DSPs, the amount of CPU required will vary depending on the configuration and the exact CPU, so you will need to experiment. Do not expect Opus to run quickly on really slow devices like 8-bit micro-controllers.<br />
<br />
=== Opus is using too much CPU for my application. What can I do? ===<br />
<br />
First, don't panic, and don't start writing assembly just yet.<br />
<br />
It's possible that you're just not using the right set of options.<br />
<br />
If you're targeting an embedded/mobile platform, chances are the fixed-point build will be faster, so make sure you're using '''--enable-fixed-point''' or defining '''FIXED_POINT''' in the build system.<br />
<br />
Opus also has a complexity option that can trade quality for complexity. The default is highest quality and highest complexity. You can control this using '''OPUS_SET_COMPLEXITY()''' (see the '''[https://www.opus-codec.org/docs/ Documentation]''' for details).<br />
<br />
If all else fails and you need to optimize the Opus code, see the next question.<br />
<br />
=== I would like to optimize/improve/help with Opus. Where should I start? ===<br />
<br />
Please '''[https://www.opus-codec.org/contact/ contact us]''' before you start, or at least before you get too far.<br />
<br />
This will help coordinate the efforts made on Opus and reduce the probability of wasting your time on duplicated effort or going down the wrong path. More details in the '''[[OpusContributing|contributing page]]'''.<br />
<br />
=== Does Opus have an echo canceller like Speex does? ===<br />
<br />
Echo cancellation is completely independent from codecs.<br />
<br />
You can use any echo canceller (including the one from libspeexdsp) along with Opus.<br />
<br />
That being said, among the free acoustic echo cancellers (AEC) we're aware of, the best is probably the Google AEC from the [https://code.google.com/p/webrtc/ WebRTC codebase].<br />
<br />
=== How do I get the duration of a .opus file? ===<br />
<br />
Use '''[https://mf4.xiph.org/jenkins/view/opus/job/opusfile-autotools/ws/doc/html/group__stream__info.html op_pcm_total()]''' from '''[https://opus-codec.org/downloads/ libopusfile]'''.<br />
<br />
If you want to implement this yourself, you need to<br />
* Read the BOS (Beginning Of Stream) pages to enumerate the serial numbers of all concurrently multiplexed streams, identify the Opus stream you want, and get its preskip value.<br />
* Read up through the first complete audio data page to compute the starting granule position (since the timestamps might not start at 0, e.g., if the file was captured from a live stream that was joined after the start).<br />
* Seek near the end of a file and look for a page with the same serial number as found in the headers (just under 64 kB from the end should be sufficient to ensure you find a page, assuming the Opus data is not multiplexed with another stream and there is no trailing garbage in the file).<br />
* If you find a page whose serial number was not included in the original set of BOS pages, you have a chained stream. You need to bisect the file to identify the end of the first chain and the start of the next, and repeat this process for each link in the chain.<br />
* If you don't find any pages at all, or find a page whose serial number was included in the original set of BOS pages, but was not the serial number of the Opus stream you want, back up and try again (being careful to avoid rescanning the same data, which can produce quadratic worst-case complexity).<br />
* If you find a page whose serial number matches the Opus stream you want, look at its final granule position, and compute the total duration (in seconds) as (final_granule_position - initial_granule_position - preskip)/48000.0.<br />
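<br />
The final step is plain arithmetic once the three values are known. A minimal sketch of the formula above; the granule positions and pre-skip in the example are made-up values, and parsing the Ogg pages to obtain them is not shown:<br />
<br />
```python
def opus_duration_seconds(final_gp: int, initial_gp: int, pre_skip: int) -> float:
    """Duration of one link, from its granule positions and pre-skip.

    Granule positions and pre-skip always count samples at 48 kHz,
    regardless of the original input sampling rate.
    """
    return (final_gp - initial_gp - pre_skip) / 48000.0


# Illustrative values only (not taken from a real file).
seconds = opus_duration_seconds(final_gp=169_680_312, initial_gp=0, pre_skip=312)
minutes, rest = divmod(seconds, 60)
print(f"{int(minutes)}m{rest:06.3f}s")
```
<br />
For a chained file, the same computation is repeated for each link and the results are added up.<br />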
<br />
=== Why don't you store the duration in the header? Isn't all of that slow and complicated? ===<br />
<br />
Computing the duration directly from the file contents allows files to be written in a single pass, without any seeking, which is necessary for live streaming. Chaining also simplifies live streaming, as you can just pipe multiple files into the same network connection, with all associated metadata updates, etc., and the results are still valid .opus files (contrast with the '''[http://www.smackfu.com/stuff/programming/shoutcast.html hacks used to add metadata to MP3 streams]''').<br />
<br />
Opening a typical .opus file, which is not multiplexed and not chained, and computing the duration over the network requires just one extra HTTP request, which can proceed in parallel with the buffering in the main request. This is the behavior you will get from libopusfile's HTTP backend by default.<br />
<br />
Enumeration of chain boundaries can be expensive in files with many links, but in our testing libopusfile used nearly an order of magnitude fewer seeks to do this than some other media frameworks (at the time). Storing a duration in a header wouldn't solve this, since every link in a chain has its own, independent headers. If the cost of chain enumeration is a problem, the best way to avoid it is to store the links in separate files (i.e., don't use chaining).<br />
<br />
=== How do I seek in a .opus file? ===<br />
<br />
Use '''[https://mf4.xiph.org/jenkins/view/opus/job/opusfile-autotools/ws/doc/html/group__stream__seeking.html op_pcm_seek() or op_raw_seek()]''' from '''[https://opus-codec.org/downloads/ libopusfile]'''.<br />
<br />
If you want to implement seeking yourself, you need to<br />
* Identify the link that contains the target (if you have a chained file).<br />
* Move the target back by 80 ms (3840 samples at 48 kHz) to get enough pre-roll data (to ensure the decoder will have converged by the time you reach the target), as recommended by '''[https://tools.ietf.org/html/rfc7845 RFC 7845]'''.<br />
* Estimate the location of the last audio data page with a completed packet prior to the adjusted target, using the duration and size (in bytes) of the link.<br />
* Seek to that location and scan forward until you find an audio data page with a completed packet (that contains a valid granule position).<br />
* If you think you are sufficiently close to the adjusted target, scan forward until you find the next audio data page with a completed packet.<br />
* If the adjusted target lies between the first audio data page with a completed packet you found and the next one, stop. You can decode forward from here and start playing when you reach your (original unadjusted) target.<br />
* Otherwise, go back and re-estimate the seek location using the granule positions and file offsets of the page(s) you just found.<br />
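<br />
Two of the steps above, the pre-roll adjustment and the estimate of where to seek, can be sketched as below. This is a simplified linear interpolation written for illustration, not the code libopusfile uses; the names are hypothetical, and pre-skip handling, page scanning and the fallbacks against bad guesses are left out:<br />
<br />
```python
PRE_ROLL_SAMPLES = 3840  # 80 ms at 48 kHz


def pre_roll_target(target_gp: int, link_start_gp: int) -> int:
    """Move the target back by the pre-roll, without leaving the link."""
    return max(link_start_gp, target_gp - PRE_ROLL_SAMPLES)


def estimate_offset(target_gp, lo_gp, lo_off, hi_gp, hi_off):
    """Guess a byte offset for target_gp by linear interpolation.

    (lo_gp, lo_off) and (hi_gp, hi_off) are granule position / file offset
    pairs known to bracket the target: at first the start and end of the
    link, later the pages found by earlier attempts.
    """
    if hi_gp <= lo_gp:
        return lo_off
    clamped = min(max(target_gp, lo_gp), hi_gp)
    return lo_off + (clamped - lo_gp) * (hi_off - lo_off) // (hi_gp - lo_gp)


target = pre_roll_target(48_000, 0)
print(target, estimate_offset(target, 0, 1_000, 96_000, 97_000))
```
<br />
Each failed guess replaces one of the two bracketing pairs with the page that was actually found, so the interval shrinks until the target lies between two adjacent pages.<br />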
<br />
libopusfile includes fallbacks to prevent pathological worst-case behavior when its guesses are repeatedly wrong. Weighted bisection can degrade to a linear scan, but libopusfile's worst case is within a constant factor of naive bisection (i.e., logarithmic). We have only ever observed such pathological behavior in files we manually constructed to trigger it.<br />
<br />
libopusfile also takes shortcuts when the target location is near the current position, to make small seeks cheaper. In the best case it can loop forever over very short files whose data is contained in a single page (e.g., less than 1 second long with default encoder settings) without any seeking at all.<br />
<br />
You can find more information on seeking in multiplexed files when you want to play more than just a single Opus stream '''[[GranulePosAndSeeking|on this page]]'''.<br />
<br />
=== Wouldn't it be better to build an index? ===<br />
<br />
As with file durations, an index at the beginning of the file is incompatible with live streaming. It also means more data has to be fetched before a file can start playing over the network, because you must read past the index even when you don't intend to seek. The index could be stored at the end (which even still allows encoding the file in a single pass), but this requires one (or more) extra seeks to read the index (especially if its exact location at the end is not known), either on file open or on first seek. Unlike the final timestamp, which is small and fixed in size, an index grows with the file duration, and can have unbounded size. It is also easy for an index to become out of sync with a file that has been edited or damaged, in which case seeking will simply fail. In contrast, you can seek in a truncated .opus download without issues.<br />
<br />
In practice, bisection seeking on VBR audio achieves performance that is very nearly as good as seeking with an index, without any of the drawbacks of an index. libopusfile provides a test program called seeking_example which can be used to benchmark the performance on your files.<br />
<br />
On a 96 kbps VBR file nearly one hour long (the second movement of Mahler's Symphony No. 8 "Symphony of a Thousand"):<br />
<br />
Testing exact PCM seeking to random places in 169680000 samples (58m55.000s)...<br />
Total seek operations: 1020 (1.020 per exact seek, 2 maximum).<br />
<br />
On a chained file formed by concatenating the eight test vectors for the currently supported channel layouts in mapping family 1:<br />
<br />
Opened file containing 8 links with 18 seeks (2.250 per link).<br />
Testing exact PCM seeking to random places in 2759064 samples (57.481s)...<br />
Total seek operations: 946 (0.946 per exact seek, 2 maximum).<br />
<br />
That is, the number of physical seeks required is almost always 1, every once in a while 2, and in short files, sometimes even 0.<br />
<br />
[[Category:Opus]]</div>Derfhttps://wiki.xiph.org/index.php?title=OpusFAQ&diff=16602OpusFAQ2017-07-07T20:39:49Z<p>Derf: /* How do I get the duration of a .opus file? */</p>
<hr />
<div>[[Image:Opus logo trans.png]]<br />
<br />
If you are looking for info not covered in this FAQ, try the '''[https://opus-codec.org main Opus website]''' or the pages included in the '''[[:Category:Opus|Opus category]]''' of this wiki.<br />
<br />
== General Questions ==<br />
<br />
=== What is Opus? Who created it? ===<br />
<br />
Opus is a totally open, royalty-free, highly versatile audio codec.<br />
<br />
It is primarily designed for interactive speech and music transmission over the Internet, but is also applicable to storage and streaming applications. It incorporates technology from Skype's '''[https://en.wikipedia.org/wiki/SILK SILK]''' codec and Xiph.Org's '''[http://celt-codec.org/ CELT]''' codec. It has been standardized by the '''[https://www.ietf.org/ Internet Engineering Task Force]''' (IETF) as '''[https://tools.ietf.org/html/rfc6716 RFC 6716]'''. <br />
<br />
Opus has been in development since early 2007. Programmers associated with '''[https://xiph.org/ Xiph.Org]''', '''[https://www.skype.com/ Skype]''' and several other organizations have contributed to its development and to the standardization process as part of the '''[https://datatracker.ietf.org/wg/codec/charter/ IETF's Codec Working Group]'''.<br />
<br />
=== How does Opus compare to other codecs? ===<br />
<br />
Opus is distinguished from most high-quality formats (e.g. [[Vorbis]], AAC, MP3) by its '''[https://tools.ietf.org/html/rfc6716#section-2 low delay]''' (5 to 66.5 ms), and from most low-delay formats (e.g. [[Speex]], G.711, GSM) by its '''[https://tools.ietf.org/html/rfc6716#section-2.1.1 high audio quality]''' (it supports everything from narrow-band to full-band audio).<br />
<br />
It '''[https://opus-codec.org/comparison meets or exceeds existing codecs' quality]''' across a wide range of bitrates, and it operates at lower delay than virtually any existing compressed format.<br />
<br />
Most importantly, the Opus format and its reference implementation are both available under '''[https://opus-codec.org/license/ liberal, royalty-free licenses]'''.<br /><br />
This makes it:<br />
* easy to adopt<br />
* compatible with free software<br />
* suitable for use as part of the basic infrastructure of the Internet<br />
<br />
=== Does Opus make all those other lossy codecs obsolete? ===<br />
<br />
Yes.<br />
<br />
From a technical point of view (loss, delay, bitrates, ...) Opus renders '''[[Speex]]''' obsolete and should also replace '''[[Vorbis]]''' and the common proprietary codecs (e.g. AAC, MP3, ...).<br />
<br />
=== Will Opus replace Vorbis in video files? ===<br />
<br />
For '''[[Ogg]]''' video files (which use the '''[[Theora]]''' video codec), you ''can'' use Opus instead of Vorbis, but the overall size reduction will be minimal and it will break compatibility with existing players.<br />
<br />
For WebM video files, the convention is to use the '''[http://www.webmproject.org/vp9/ VP9 video codec]''' when using Opus as an audio codec.<br />
<br />
=== How do I use Opus? ===<br />
<br />
For now, the best way to '''encode''' audio into Opus files is to use the '''opusenc''' command-line tool from the '''[https://opus-codec.org/downloads/ opus-tools package]'''.<br />
<br />
If you want to encode many files at once (e.g. your music library), try the applications listed in the '''[[OpusSupport|Opus Support]]''' page.<br />
<br />
For rough guidelines on encoding settings, see the '''[[Opus Recommended Settings]]''' page.<br />
<br />
=== What programs support Opus? ===<br />
<br />
Opus decoding support is now included in '''[http://caniuse.com/opus some Internet browsers]''' and '''[[OpusSupport|many applications]]''', including '''[https://www.mozilla.org/firefox Firefox]''', '''[https://www.foobar2000.org/ foobar2000]''' and '''[https://www.videolan.org/vlc/ VLC]''', as well as in frameworks such as '''[https://gstreamer.freedesktop.org/ GStreamer]''' and '''[https://ffmpeg.org/ FFmpeg]'''.<br />
<br />
For real-time applications, Opus support is available in '''[https://www.webrtc.org/ Google's WebRTC codebase]'''.<br />
<br />
Opus is a relatively new codec: '''[[OpusSupport|many more applications]]''' will support it in the near future.<br />
<br />
=== Does Opus support higher sampling rates, such as 96 kHz or 192 kHz? ===<br />
<br />
Yes and no.<br />
<br />
Opus encoding tools like opusenc will happily encode input files that are sampled at 96 or 192 kHz.<br />
<br />
However, files at these rates are internally '''converted to 48 kHz''' and then only frequencies '''up to 20 kHz''' are encoded.<br />
<br />
The reason is simple: lossy codecs are designed to preserve audible details while discarding irrelevant information. Since the human ear can only hear up to 20 kHz at best (usually lower than that), frequency content above 20 kHz is the first thing to go.<br />
<br />
See Monty's '''[https://people.xiph.org/~xiphmont/demo/neil-young.html article]''' for more details.<br />
<br />
If you want a codec to handle higher sampling rates losslessly, use '''[[FLAC]]'''!<br />
<br />
=== What are the licensing requirements? ===<br />
<br />
The reference Opus source code is released under a three-clause BSD license, which is a very permissive Open Source license. Commercial use and distribution (including in proprietary software) are permitted, provided that some basic conditions specified in the license are met. <br />
<br />
Opus is also covered by some patents, for which royalty-free usage rights are granted, under conditions that the authors believe are compatible with (hopefully) all open source licenses, including the GPL (v2 and v3).<br />
<br />
See the '''[https://www.opus-codec.org/license/ Opus Licensing]''' page for details.<br />
<br />
=== Why make Opus free? ===<br />
<br />
On the Internet, protocol and codec standards are part of the common infrastructure everyone builds upon.<br />
<br />
Most of the value of a high-quality standard is the innovation and inter-operation provided by the systems built on top of it. When a few parties have monopoly rights to monetize a standard, that infrastructure stops being so common and everyone else has more reason to use their own solution instead, increasing cost and reducing efficiency.<br />
<br />
Imagine a road system where each type of car could only drive on its own manufacturer's pavement. We all benefit from living in a world where all the roads are connected.<br />
<br />
This is why Opus, unlike many codecs, is free.<br />
<br />
=== Is the SILK part of Opus compatible with the SILK implementation shipped in Skype? ===<br />
<br />
No.<br />
<br />
The SILK codec, as submitted by Skype to the IETF, was heavily modified as part of its integration within Opus. The modifications are significant enough that it is not possible to just write a "translator". Even sharing code between Opus and the "old SILK" would be highly complex.<br />
<br />
=== Why not keep the SILK and CELT codecs separate? ===<br />
Opus is more than just two independent codecs with a switch.<br />
<br />
In addition to a [https://en.wikipedia.org/wiki/Linear_predictive_coding Linear Prediction] '''SILK mode''' and an [https://en.wikipedia.org/wiki/Modified_discrete_cosine_transform MDCT] '''CELT mode''' it has a '''hybrid mode''', where speech frequencies up to 8 kHz are encoded with LP while those between 8 and 20 kHz are encoded with MDCT. This is what allows Opus to have such high speech quality around 32 kbps.<br />
<br />
Another advantage of the integration is the ability to switch between these 3 modes seamlessly, without any audible "glitches" and without any out-of-band signalling.<br />
<br />
=== Now that Opus is standardized, will its development stop or can it be further improved? ===<br />
Yes, Opus '''can''' and '''should''' be improved, because unlike most '''[https://en.wikipedia.org/wiki/ITU-T#Key_standards_published_by_ITU ITU-T codecs]''', Opus is only defined in terms of its decoder.<br />
<br />
The encoder can keep evolving as long as the bitstream it produces can be decoded by the reference decoder. This is what made it possible for modern MP3 encoders (e.g. '''[https://en.wikipedia.org/wiki/LAME LAME]''') to improve far beyond the original '''[https://en.wikipedia.org/wiki/L3enc L3enc]''' and '''dist10''' reference implementations.<br />
<br />
Although it is unlikely that Opus encoders will see such a spectacular evolution, we certainly hope that future encoders will become much better than the reference encoder.<br />
<br />
In fact, the 1.1 libopus release significantly improves on the reference encoder's quality. See '''[https://people.xiph.org/~xiphmont/demo/opus/demo3.shtml Monty's demo]''' for more details.<br />
<br />
=== Will all future Opus releases comply with the [https://tools.ietf.org/html/rfc6716 Opus specification]? ===<br />
<br />
Yes.<br />
<br />
=== In what ways is Opus optimized for the Internet? ===<br />
<br />
Opus has good packet loss robustness and concealment, but its optimisations go further.<br />
<br />
One of the first things we were asked when designing Opus was to make the rate '''really''' adaptable, because we never know what kind of rates will be available. This meant not only having a wide range of bitrates, but also being able to vary the rate in small increments.<br />
<br />
This is why Opus scales from about '''6''' to '''512 kb/s''', in increments of '''0.4 kb/s''' (one byte with 20 ms frames). Opus can have '''more than 1200 possible bitrates''' while spending only '''11 bits''' signalling the bitrate because UDP already encodes the packet size.<br />
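<br />
As a rough check of the arithmetic above, the following Python sketch computes the bitrate step that corresponds to one byte per frame and counts the whole-byte packet sizes between the two quoted rates. It is only an illustration: the function names are made up for this example and are not part of any Opus tool or API.<br />
<br />
```python
import math


def bitrate_step_bps(frame_ms: float) -> float:
    """Bitrate change (in bit/s) caused by adding one byte to every frame."""
    frames_per_second = 1000.0 / frame_ms
    return 8 * frames_per_second


def count_bitrates(low_bps: int, high_bps: int, frame_ms: float) -> int:
    """Number of whole-byte packet sizes whose bitrate lies in [low_bps, high_bps]."""
    smallest_packet = math.ceil(low_bps * frame_ms / 8000.0)    # bytes per frame
    largest_packet = math.floor(high_bps * frame_ms / 8000.0)   # bytes per frame
    return largest_packet - smallest_packet + 1


if __name__ == "__main__":
    print("step at 20 ms frames:", bitrate_step_bps(20), "bit/s")
    print("rates from 6 to 512 kb/s:", count_bitrates(6000, 512000, 20))
```
<br />
With 20 ms frames there are 50 frames per second, so one extra byte per frame adds 400 bit/s, which is the 0.4 kb/s step quoted above.<br />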
<br />
One last aspect is that Opus is simple to transport over RTP, as can be seen from the [https://tools.ietf.org/html/rfc7587 Opus RTP payload format]. For example, it's possible to decode RTP packets without having even seen the SDP or any out-of-band signalling.<br />
<br />
=== What applications for Android can play Opus? ===<br />
<br />
Right now there are just a few, but the list is growing fast. See [https://android.stackexchange.com/q/37970/7425 this question on android.stackexchange.com]. Feel free to suggest other applications.<br />
<br />
=== When will the next version be released? ===<br />
<br />
When it's done. Seriously, we do not know.<br />
<br />
Opus is not a large project with a fixed release schedule.<br />
<br />
That being said, our '''[https://www.opus-codec.org/downloads/ pre-releases]''' and even the git repositories ('''[https://git.xiph.org/?p=opus.git Xiph]''', '''[https://github.com/xiph/opus GitHub]''') are pretty stable and, given proper testing (which you should always do anyway), are safe to distribute.<br />
<br />
Just be aware that the API of new features (that have never been included in a stable release) could potentially still change.<br />
<br />
== Software Developers' Questions ==<br />
<br />
=== On what platforms does Opus run? ===<br />
<br />
The Opus code base is written in C89 and should run on the vast majority of recent (and not so recent) CPUs.<br />
<br />
Some of the platforms '''[https://mf4.xiph.org/jenkins/view/opus/ on which Opus has been tested]''' include x86, x86-64, ARM, Itanium, Blackfin, and SPARC.<br />
<br />
=== Is there a fixed-point implementation? ===<br />
<br />
Yes.<br />
<br />
The fixed-point and floating-point decoder and encoder implementations are part of the same code base.<br />
<br />
The code defaults to float, so you need to configure with '''--enable-fixed-point''' (or define '''FIXED_POINT''' if not using the configure script) to build the code for fixed-point.<br />
<br />
=== Which implementation should I use? ===<br />
<br />
While the implementation in RFC 6716 is what ''defines'' the standard, it is likely not the best and most up-to-date implementation.<br />
<br />
The [https://opus-codec.org/ Opus] website was set up for the purpose of continually improving the implementation (in terms of speed, encoding quality, device compatibility, etc.) while still conforming to the standard.<br />
<br />
All Opus implementations are compatible by definition.<br />
<br />
=== How is supporting Opus different from supporting Speex/G.711/MP3? ===<br />
<br />
Opus has variable frame durations which can change on the fly, so an Opus decoder needs to be ready to accept packets with durations that are '''any multiple of 2.5 ms''' up to a '''maximum of 120 ms'''. <br />
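<br />
A minimal sketch of what this means for buffer sizing, assuming the decoder outputs at 48 kHz (so 2.5 ms is 120 samples and 120 ms is 5760 samples per channel). The helper names are hypothetical and only illustrate the arithmetic.<br />
<br />
```python
SAMPLE_RATE_HZ = 48000
SAMPLES_PER_2_5_MS = SAMPLE_RATE_HZ // 400       # 2.5 ms = 1/400 s -> 120 samples
MAX_PACKET_SAMPLES = 48 * SAMPLES_PER_2_5_MS     # 120 ms -> 5760 samples


def is_valid_packet_duration(samples: int) -> bool:
    """True if `samples` (per channel, at 48 kHz) is a multiple of 2.5 ms up to 120 ms."""
    return 0 < samples <= MAX_PACKET_SAMPLES and samples % SAMPLES_PER_2_5_MS == 0


def packet_duration_ms(samples: int) -> float:
    """Duration in milliseconds of a packet holding `samples` per channel at 48 kHz."""
    return samples * 1000.0 / SAMPLE_RATE_HZ


if __name__ == "__main__":
    for n in (120, 960, 2880, 5760, 5880):
        print(n, packet_duration_ms(n), is_valid_packet_duration(n))
```
<br />
In practice this suggests allocating an output buffer large enough for 5760 samples per channel, so that any packet the far end sends can be decoded.<br />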
<br />
The Opus encoder and decoder do not need to have matched sampling rates or channel counts. It is recommended to always just decode at the highest rate the hardware supports (e.g. 48 kHz stereo) so the user gets the full quality of whatever the far end is sending.<br />
<br />
=== My application doesn't work. Can anyone help me? ===<br />
<br />
It's possible to get help, but before doing so, there are a few basic things to try:<br />
<br />
* Implement your application with uncompressed audio instead of Opus. If it still doesn't work, then the problem isn't related to Opus.<br />
* Read the [https://www.opus-codec.org/docs/ Opus documentation].<br />
* Read the [https://git.xiph.org/?p=opus.git;a=blob;f=src/opus_demo.c opus_demo.c] source code to see how to use the encoder and decoder.<br />
<br />
If you still can't solve the problem, the best option is to ask for help on the [http://lists.xiph.org/mailman/listinfo/opus mailing list] or on the '''#opus''' IRC channel on '''irc.freenode.net'''.<br />
<br />
=== How do I report a bug? ===<br />
<br />
If you think you have found a bug in Opus (and not in your application), please [https://trac.xiph.org/newticket?component=Opus file a bug report].<br />
<br />
Please include a way for us to reproduce the problem. The best way to do this is to provide an input file, along with the opusenc/opusdec/opus_demo command line that causes the bug to occur.<br />
<br />
If the bug cannot be triggered by the command line tools, please provide a simple patch or C file that can help reproduce it. Please also provide any other relevant information, such as OS, CPU, build options, etc.<br />
<br />
Don't hesitate to also contact us on the [http://lists.xiph.org/mailman/listinfo/opus mailing list] or on [irc://irc.freenode.net/opus IRC].<br />
<br />
=== What is Opus Custom? ===<br />
<br />
Opus Custom is an '''optional''' part of the Opus standard that allows for sampling rates other than 8, 12, 16, 24, or 48 kHz and frame sizes other than multiples of 2.5 ms.<br />
<br />
Opus Custom requires additional out-of-band signalling that Opus does not normally require and disables many of Opus' coding modes. Also, because it is an optional part of the specification, using Opus Custom may lead to compatibility problems.<br />
<br />
For these reasons, '''its use is discouraged''' outside of very specific applications. <br />
<br />
You may want to use Opus Custom for:<br />
<br />
* ultra-low-delay applications, where synchronization with the soundcard buffer is important. <br />
* low-power embedded applications, where compatibility with others is not important.<br />
<br />
For almost all other types of applications, Opus Custom should not be used.<br />
<br />
=== How do I use 44.1 kHz or some other sampling rate not directly supported by Opus? ===<br />
<br />
Tools which read or write Opus should inter-operate with other sampling rates by transparently performing sample rate conversion behind the scenes whenever necessary. In particular, software developers should not use Opus Custom for 44.1 kHz support, except in the very specific circumstances outlined above.<br />
<br />
Note that it's generally preferable for a decoder to output at 48 kHz, even when you know the original input was 44.1 kHz. This is not only because you can skip resampling, but also because many cheaper audio interfaces have poor quality output for 44.1 kHz.<br />
<br />
The '''[https://opus-codec.org/downloads/ opus-tools]''' package source code contains a small, high quality, high performance, BSD licensed '''[https://github.com/xiph/opus-tools/blob/master/src/resample.c resampler]''' which can be used where resampling is required.<br />
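<br />
To give a feel for the conversion involved, this small sketch works out the rational ratio between 44.1 kHz and 48 kHz and the resulting output length. It does not perform any resampling and is unrelated to the resampler's actual code; the function names are invented for illustration.<br />
<br />
```python
from fractions import Fraction


def resample_ratio(in_rate: int, out_rate: int) -> Fraction:
    """Output samples produced per input sample, as an exact fraction."""
    return Fraction(out_rate, in_rate)


def output_length(n_in: int, in_rate: int, out_rate: int) -> int:
    """Number of output samples needed to cover n_in input samples (rounded up)."""
    return -(-n_in * out_rate // in_rate)  # integer ceiling division


if __name__ == "__main__":
    print(resample_ratio(44100, 48000))           # exact ratio for 44.1 kHz -> 48 kHz
    print(output_length(44100, 44100, 48000))     # one second of input
```
<br />
The ratio reduces to 160/147, so every 147 input samples at 44.1 kHz map to 160 output samples at 48 kHz.<br />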
<br />
=== But won't the resampler hurt the quality? Isn't it better to use 44.1 kHz directly? ===<br />
<br />
Not really. The quality degradation caused by any reasonable resampler (SoX, libspeexdsp, libsamplerate, ...) is far less than the distortion caused by the best lossy codec at its highest bitrate. If you can't tolerate the quality degradation caused by a good 44.1 ↔ 48 kHz resampler, then you shouldn't be using a lossy codec in the first place. Similarly, the extra CPU spent in the resampler is small compared to the rest of the codec. Not only that, but many soundcards only support 48 kHz on playback, so players can directly play the output rather than resample it to 48 kHz (e.g. for a 44.1 kHz MP3). So effectively, Opus is only shifting the burden of resampling from the decoder side to the encoder side.<br />
<br />
One advantage of supporting only one internal rate is that it makes it possible for Opus to support many features, including efficient speech compression (through SILK) and real-time applications. It also means all the quality tuning effort can be spent on a single configuration, which helps bring even better quality.<br />
<br />
=== How is the bitrate setting used in VBR mode? ===<br />
<br />
Variable bitrate (VBR) mode allows the bitrate to automatically vary over time based on the audio being encoded, in order to achieve a consistent quality.<br />
<br />
The bitrate setting controls the desired quality, on a scale that is calibrated to closely approximate the average bitrate that would be obtained over a large and diverse collection of audio. The actual bitrate of any particular audio stream may be higher or lower than this average.<br />
<br />
=== What frame size should I use? ===<br />
<br />
A '''20 ms''' frame size works well for most applications. Smaller frame sizes may be used to achieve lower latency, but have lower quality at a given bitrate.<br />
<br />
Sizes greater than 20 ms increase latency and are generally beneficial only at fairly low bitrates, or when used to reduce external overhead (e.g. by reducing the number of packets that are sent). For file encoding, using a frame size larger than 20 ms will usually result in '''worse''' quality for the same bitrate because it constrains the encoder in the decisions it can make.<br />
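<br />
The external overhead mentioned above can be estimated with a short calculation. This sketch assumes each packet carries one frame plus uncompressed IPv4, UDP and RTP headers (20 + 8 + 12 = 40 bytes); real deployments may differ (IPv6, header extensions, encryption), so treat the numbers as an example only.<br />
<br />
```python
# Assumed per-packet header cost: IPv4 (20) + UDP (8) + RTP (12) bytes.
HEADER_BYTES = 20 + 8 + 12


def overhead_bps(frame_ms: float, header_bytes: int = HEADER_BYTES) -> float:
    """Header overhead in bit/s when one frame of frame_ms is sent per packet."""
    packets_per_second = 1000.0 / frame_ms
    return packets_per_second * header_bytes * 8


if __name__ == "__main__":
    for frame_ms in (2.5, 5, 10, 20, 40, 60):
        print("%5.1f ms frames: %8.0f bit/s of header overhead"
              % (frame_ms, overhead_bps(frame_ms)))
```
<br />
Under these assumptions, 20 ms frames cost 16 kb/s in headers alone, which is why longer frames can help at very low bitrates even though they increase latency.<br />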
<br />
=== Forward Error Correction (FEC) doesn't appear to do anything! HELP! ===<br />
<br />
The in-band FEC feature of Opus helps reduce the harm of packet loss by encoding some information about the prior packet.<br />
<br />
To make use of in-band FEC, the receiver must delay its output by at least one frame, so that when a packet is lost it can call the decoder on the ''next'' packet with the decode_fec argument set in order to reconstruct the missing frame. This works best if it's integrated with a jitter buffer.<br />
<br />
FEC is only used by the encoder under certain conditions:<br />
* the feature must be enabled via the '''OPUS_SET_INBAND_FEC''' CTL<br />
* the encoder must be told to expect loss via the '''OPUS_SET_PACKET_LOSS_PERC''' CTL<br />
* the codec must be operated in either the '''Linear Prediction''' or the '''Hybrid''' mode<br />
<br />
Frame durations shorter than 10 ms and very high bitrates will use the MDCT modes, where FEC is not available.<br />
<br />
Even when FEC is not used, telling the encoder about the expected level of loss will help it make more intelligent decisions. By default, the implementation assumes there is no loss.<br />
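<br />
The one-frame delay on the receiving side can be sketched as a simple decision per playout slot. This is a simplified model, not libopus code: packets are identified by hypothetical sequence numbers, and the three action names stand for a normal decode, a decode of the following packet with decode_fec set, and a decode with a NULL packet (concealment).<br />
<br />
```python
def plan_decode(received, last_seq):
    """Decide how to produce each frame 0..last_seq with one packet of lookahead.

    `received` is the set of sequence numbers that arrived in time.
    Returns a list of (seq, action) tuples.
    """
    plan = []
    for seq in range(last_seq + 1):
        if seq in received:
            # The packet itself arrived: decode it normally.
            plan.append((seq, "decode"))
        elif seq + 1 in received:
            # Packet lost, but the next one is here: recover this frame
            # from the FEC data carried in packet seq + 1.
            plan.append((seq, "fec_from_next"))
        else:
            # Neither this packet nor the next is available: conceal the loss.
            plan.append((seq, "conceal"))
    return plan


if __name__ == "__main__":
    for seq, action in plan_decode({0, 1, 3, 6}, 6):
        print(seq, action)
```
<br />
Note that after recovering a frame from the next packet's FEC data, that next packet is still decoded normally for its own slot.<br />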
<br />
=== I can't use malloc or much stack on my embedded platform. How do I make Opus work? ===<br />
<br />
A normal build of libopus only uses <tt>malloc/free</tt> in the <tt>_create()</tt> and <tt>_destroy()</tt> calls, making it safe for realtime use as long as the codec state is pre-created.<br />
<br />
To build Opus without the references to <tt>malloc/free</tt>, you must:<br />
<br />
* use <tt>_init()</tt> calls rather than <tt>_create()</tt> calls in your application<br />
* compile with <tt>CFLAGS="-DOVERRIDE_OPUS_ALLOC -DOVERRIDE_OPUS_FREE -D'opus_alloc(x)=NULL' -D'opus_free(x)=NULL' "</tt>.<br />
<br />
If libopus is built with <tt>-DNONTHREADSAFE_PSEUDOSTACK</tt> (instead of <tt>VAR_ARRAYS</tt>, or <tt>USE_ALLOCA</tt>), it will use a user-provided block of heap instead of stack for many things, resulting in much lower stack usage.<br><br />
This makes the resulting library '''non-threadsafe''' and is '''not recommended''' on anything except limited embedded platforms.<br />
<br />
=== How can I ensure that my software interoperates with other software implementing Opus? ===<br />
<br />
For applications using Ogg files, there are some [https://people.xiph.org/~greg/opus_testvectors/ Ogg Opus testvectors] to test decoders and you can test encoders with opusdec. For RTP applications, the opusrtp tool can be useful.<br />
<br />
In general, here's a list of specific issues to check:<br />
* Can your application handle all frame sizes, including changing the frame size from frame to frame?<br />
* Does your application react properly to lost packets, by calling the decoder with a NULL packet?<br />
<br />
=== What is the complexity of Opus? ===<br />
<br />
The complexity of Opus varies by a large amount based on the settings used.<br />
<br />
It depends on the mode, audio bandwidth, number of channels, and even a "complexity knob" that can trade complexity for quality. It will run easily on any recent PC or smartphone. <br />
<br />
For slower embedded CPUs/DSPs, the amount of CPU required will vary depending on the configuration and the exact CPU, so you will need to experiment. Do not expect Opus to run quickly on really slow devices like 8-bit micro-controllers.<br />
<br />
=== Opus is using too much CPU for my application. What can I do? ===<br />
<br />
First, don't panic, and don't start writing assembly just yet.<br />
<br />
It's possible that you're just not using the right set of options.<br />
<br />
If you're targeting an embedded/mobile platform, chances are the fixed-point build will be faster, so make sure you're using '''--enable-fixed-point''' or defining '''FIXED_POINT''' in the build system.<br />
<br />
Opus also has a complexity option that can trade quality for complexity. The default is highest quality and highest complexity. You can control this using '''OPUS_SET_COMPLEXITY()''' (see the '''[https://www.opus-codec.org/docs/ Documentation]''' for details).<br />
<br />
If all else fails and you need to optimize the Opus code, see the next question.<br />
<br />
=== I would like to optimize/improve/help with Opus. Where should I start? ===<br />
<br />
Please '''[https://www.opus-codec.org/contact/ contact us]''' before you start, or at least before you get too far.<br />
<br />
This will help coordinate the efforts made on Opus and reduce the probability of wasting your time on duplicated effort or going down the wrong path. More details in the '''[[OpusContributing|contributing page]]'''.<br />
<br />
=== Does Opus have an echo canceller like Speex does? ===<br />
<br />
Echo cancellation is completely independent from codecs.<br />
<br />
You can use any echo canceller (including the one from libspeexdsp) along with Opus.<br />
<br />
That being said, among the free acoustic echo cancellers (AEC) we're aware of, the best is probably the Google AEC from the [https://code.google.com/p/webrtc/ WebRTC codebase].<br />
<br />
=== How do I get the duration of a .opus file? ===<br />
<br />
Use '''[https://mf4.xiph.org/jenkins/view/opus/job/opusfile-autotools/ws/doc/html/group__stream__info.html op_pcm_total()]''' from '''[https://opus-codec.org/downloads/ libopusfile]'''.<br />
<br />
If you want to implement this yourself, you need to<br />
* Read the BOS (Beginning Of Stream) pages to enumerate the serial numbers of all concurrently multiplexed streams, identify the Opus stream you want, and get its preskip value.<br />
* Read up through the first complete audio data page to compute the starting granule position (since the timestamps might not start at 0, e.g., if the file was captured from a live stream that was joined after the start).<br />
* Seek near the end of a file and look for a page with the same serial number as found in the headers (just under 64 kB from the end should be sufficient to ensure you find a page, assuming the Opus data is not multiplexed with another stream and there is no trailing garbage in the file).<br />
* If you find a page whose serial number was not included in the original set of BOS pages, you have a chained stream. You need to bisect the file to identify the end of the first chain and the start of the next, and repeat this process for each link in the chain.<br />
* If you don't find any pages at all, or find a page whose serial number was included in the original set of BOS pages, but was not the serial number of the Opus stream you want, back up and try again (being careful to avoid rescanning the same data, which can produce quadratic worst-case complexity).<br />
* If you find a page whose serial number matches the Opus stream you want, look at its final granule position, and compute the total duration (in seconds) as (final_granule_position - initial_granule_position - preskip)/48000.0.<br />
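<br />
The final step of the list above is plain arithmetic once the three values are known. The sketch below assumes they have already been extracted from the file (it does not parse Ogg pages); the function names and the pre-skip value of 312 used in the example are illustrative assumptions.<br />
<br />
```python
OPUS_GRANULE_RATE = 48000  # Ogg Opus granule positions count samples at 48 kHz


def opus_duration_seconds(initial_gp: int, final_gp: int, preskip: int) -> float:
    """Total duration in seconds from the granule positions and the pre-skip."""
    samples = final_gp - initial_gp - preskip
    if samples < 0:
        raise ValueError("granule positions and pre-skip are inconsistent")
    return samples / float(OPUS_GRANULE_RATE)


def format_hms(seconds: float) -> str:
    """Format a duration as minutes and seconds, e.g. for display."""
    minutes, secs = divmod(seconds, 60)
    return "%dm%06.3fs" % (minutes, secs)


if __name__ == "__main__":
    # Hypothetical stream: starts at granule position 0 with a pre-skip of 312.
    duration = opus_duration_seconds(0, 169680000 + 312, 312)
    print(duration, format_hms(duration))
```
<br />
Remember that this only covers a single link; for a chained file the durations of all links have to be added up.<br />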
<br />
=== Why don't you store the duration in the header? Isn't all of that slow and complicated? ===<br />
<br />
Computing the duration directly from the file contents allows files to be written in a single pass, without any seeking, which is necessary for live streaming. Chaining also simplifies live streaming, as you can just pipe multiple files into the same network connection, with all associated metadata updates, etc., and the results are still valid .opus files (contrast with the '''[http://www.smackfu.com/stuff/programming/shoutcast.html hacks used to add metadata to MP3 streams]''').<br />
<br />
Opening a typical .opus file, which is not multiplexed and not chained, and computing the duration over the network requires just one extra HTTP request, which can proceed in parallel with the buffering in the main request. This is the behavior you will get from libopusfile's HTTP backend by default.<br />
<br />
Enumeration of chain boundaries can be expensive in files with many links, but in our testing libopusfile used nearly an order of magnitude fewer seeks to do this than some other media frameworks (at the time). Storing a duration in a header wouldn't solve this, since every link in a chain has its own, independent headers. If the cost of chain enumeration is a problem, the best way to avoid it is to store the links in separate files (i.e., don't use chaining).<br />
<br />
=== How do I seek in a .opus file? ===<br />
<br />
Use '''[https://mf4.xiph.org/jenkins/view/opus/job/opusfile-autotools/ws/doc/html/group__stream__seeking.html op_pcm_seek() or op_raw_seek()]''' from '''[https://opus-codec.org/downloads/ libopusfile]'''.<br />
<br />
If you want to implement seeking yourself, you need to<br />
* Identify the link that contains the target (if you have a chained file).<br />
* Adjust the target by 80 ms to ensure you will get enough pre-roll data (to ensure the decoder will have converged by the time you reach the target), as recommended by '''[https://tools.ietf.org/html/rfc7845 RFC 7845]'''.<br />
* Estimate the location of the last audio data page with a completed packet prior to the adjusted target, using the duration and size (in bytes) of the link.<br />
* Seek to that location and scan forward until you find an audio data page with a completed packet (that contains a valid granule position).<br />
* If you think you are sufficiently close to the adjusted target, scan forward until you find the next audio data page with a completed packet.<br />
* If the adjusted target lies between the first audio data page with a completed packet you found and the next one, stop. You can decode forward from here and start playing when you reach your (original unadjusted) target.<br />
* Otherwise, go back and re-estimate the seek location using the granule positions and file offsets of the page(s) you just found.<br />
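<br />
The estimate-and-refine loop described in the steps above can be sketched on a simplified model. Here the link is represented as an in-memory list of (byte offset, granule position) pairs for its audio data pages, and "seek and scan forward" is modelled by a lookup of the first page at or after the guessed offset. This is not how libopusfile is implemented; it omits the fallbacks against pathological files, and all names are hypothetical.<br />
<br />
```python
from bisect import bisect_left

PREROLL_SAMPLES = 3840  # 80 ms of pre-roll at 48 kHz


def find_start_page(pages, target_gp):
    """Locate the page containing the pre-roll-adjusted target.

    `pages` is a list of (byte_offset, end_granule_position) tuples, strictly
    increasing in both fields. Returns (page_index, probes), where each probe
    stands for one physical seek followed by a forward scan.
    """
    offsets = [off for off, _ in pages]
    adjusted = max(target_gp - PREROLL_SAMPLES, 0)
    if adjusted < pages[0][1]:
        return 0, 0                      # target falls inside the first page
    if adjusted >= pages[-1][1]:
        return len(pages) - 1, 0         # target is at or past the end
    lo, hi = 0, len(pages) - 1           # invariant: gp[lo] <= adjusted < gp[hi]
    probes = 0
    while hi - lo > 1:
        lo_off, lo_gp = pages[lo]
        hi_off, hi_gp = pages[hi]
        # Estimate the byte offset by interpolating between the known pages.
        frac = (adjusted - lo_gp) / (hi_gp - lo_gp)
        guess = lo_off + int(frac * (hi_off - lo_off))
        i = bisect_left(offsets, guess)  # first page at or after the guess
        i = min(max(i, lo + 1), hi - 1)  # stay inside the bracket to guarantee progress
        probes += 1
        if pages[i][1] <= adjusted:
            lo = i
        else:
            hi = i
    return hi, probes


if __name__ == "__main__":
    pages = [(i * 4000, (i + 1) * 48000) for i in range(100)]
    print(find_start_page(pages, 50 * 48000 + 1000))
```
<br />
Decoding then proceeds forward from the returned page, and playback starts once the original (unadjusted) target is reached. Like any interpolation search, this sketch can degrade to a linear number of probes on adversarial data.<br />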
<br />
libopusfile includes fallbacks to prevent pathological worst-case behavior when its guesses are repeatedly wrong. Weighted bisection can degrade to a linear scan, but libopusfile's worst case is within a constant factor of naive bisection (i.e., logarithmic). We have only ever observed such pathological behavior in files we manually constructed to trigger it.<br />
<br />
libopusfile also takes shortcuts when the target location is near the current position, to make small seeks cheaper. In the best case it can loop forever over very short files whose data is contained in a single page (e.g., less than 1 second long with default encoder settings) without any seeking at all.<br />
<br />
You can find more information on seeking in multiplexed files when you want to play more than just a single Opus stream '''[[GranulePosAndSeeking|on this page]]'''.<br />
<br />
=== Wouldn't it be better to build an index? ===<br />
<br />
As with file durations, an index at the beginning of the file is incompatible with live streaming. It also means more data has to be fetched before a file can start playing over the network, because you must read past the index even when you don't intend to seek. The index could be stored at the end (which even still allows encoding the file in a single pass), but this requires one (or more) extra seeks to read the index (especially if its exact location at the end is not known), either on file open or on first seek. Unlike the final timestamp, which is small and fixed in size, an index grows with the file duration, and can have unbounded size. It is also easy for an index to become out of sync with a file that has been edited or damaged, in which case seeking will simply fail. In contrast, you can seek in a truncated .opus download without issues.<br />
<br />
In practice, bisection seeking on VBR audio achieves performance that is very nearly as good as seeking with an index, without any of the drawbacks of an index. libopusfile provides a test program called seeking_example which can be used to benchmark the performance on your files.<br />
<br />
On a 96 kbps VBR file nearly one hour long (the second movement of Mahler's Symphony No. 8 "Symphony of a Thousand"):<br />
<br />
Testing exact PCM seeking to random places in 169680000 samples (58m55.000s)...<br />
Total seek operations: 1020 (1.020 per exact seek, 2 maximum).<br />
<br />
On a chained file formed by concatenating the eight test vectors for the currently supported channel layouts in mapping family 1:<br />
<br />
Opened file containing 8 links with 18 seeks (2.250 per link).<br />
Testing exact PCM seeking to random places in 2759064 samples (57.481s)...<br />
Total seek operations: 946 (0.946 per exact seek, 2 maximum).<br />
<br />
That is, the number of physical seeks required is almost always 1, every once in a while 2, and in short files, sometimes even 0.<br />
<br />
[[Category:Opus]]</div>Derfhttps://wiki.xiph.org/index.php?title=OpusFAQ&diff=16601OpusFAQ2017-07-07T20:37:49Z<p>Derf: /* Wouldn't it be better to build an index? */</p>
<hr />
<div>[[Image:Opus logo trans.png]]<br />
<br />
If you are looking for info not covered in this FAQ, try the '''[https://opus-codec.org main Opus website]''' or the pages included in the '''[[:Category:Opus|Opus category]]''' of this wiki.<br />
<br />
== General Questions ==<br />
<br />
=== What is Opus? Who created it? ===<br />
<br />
Opus is a totally open, royalty-free, highly versatile audio codec.<br />
<br />
It is primarily designed for interactive speech and music transmission over the Internet, but is also applicable to storage and streaming applications. It incorporates technology from Skype's '''[https://en.wikipedia.org/wiki/SILK SILK]''' codec and Xiph.Org's '''[http://celt-codec.org/ CELT]''' codec. It has been standardized by the '''[https://www.ietf.org/ Internet Engineering Task Force]''' (IETF) as '''[https://tools.ietf.org/html/rfc6716 RFC 6716]'''. <br />
<br />
Opus has been in development since early 2007. Programmers associated with '''[https://xiph.org/ Xiph.Org]''', '''[https://www.skype.com/ Skype]''' and several other organizations have contributed to its development and to the standardization process as part of the '''[https://datatracker.ietf.org/wg/codec/charter/ IETF's Codec Working Group]'''.<br />
<br />
=== How does Opus compare to other codecs? ===<br />
<br />
Opus is distinguished from most high quality formats (e.g. [[Vorbis]], AAC, MP3) by having '''[https://tools.ietf.org/html/rfc6716#section-2 low delay]''' (5 to 66.5 ms) and from most low delay formats (e.g. [[Speex]], G.711, GSM) by supporting '''[https://tools.ietf.org/html/rfc6716#section-2.1.1 high audio quality]''' (from narrow-band all the way to full-band audio).<br />
<br />
It '''[https://opus-codec.org/comparison meets or exceeds existing codecs' quality]''' across a wide range of bitrates, and it operates at lower delay than virtually any existing compressed format.<br />
<br />
Most importantly, the Opus format and its reference implementation are both available under '''[https://opus-codec.org/license/ liberal, royalty-free licenses]'''.<br /><br />
This makes it:<br />
* easy to adopt<br />
* compatible with free software<br />
* suitable for use as part of the basic infrastructure of the Internet<br />
<br />
=== Does Opus make all those other lossy codecs obsolete? ===<br />
<br />
Yes.<br />
<br />
From a technical point of view (loss, delay, bitrates, ...) Opus renders '''[[Speex]]''' obsolete and should also replace '''[[Vorbis]]''' and the common proprietary codecs (e.g. AAC, MP3, ...).<br />
<br />
=== Will Opus replace Vorbis in video files? ===<br />
<br />
For '''[[Ogg]]''' video files (which use the '''[[Theora]]''' video codec), you ''can'' use Opus instead of Vorbis, but the overall size reduction will be minimal and it will break compatibility with existing players.<br />
<br />
For WebM video files, the convention is to use the '''[http://www.webmproject.org/vp9/ VP9 video codec]''' when using Opus as an audio codec.<br />
<br />
=== How do I use Opus? ===<br />
<br />
For now, the best way to '''encode''' audio into Opus files is to use the '''opusenc''' command-line tool from the '''[https://opus-codec.org/downloads/ opus-tools package]'''.<br />
<br />
If you want to encode many files at once (e.g. your music library), try the applications listed in the '''[[OpusSupport|Opus Support]]''' page.<br />
<br />
For rough guidelines on encoding settings, see the '''[[Opus Recommended Settings]]''' page.<br />
<br />
=== What programs support Opus? ===<br />
<br />
Opus decoding support is now included in '''[http://caniuse.com/opus some Internet browsers]''' and '''[[OpusSupport|many applications]]''', including '''[https://www.mozilla.org/firefox Firefox]''', '''[https://www.foobar2000.org/ foobar2000]''' and '''[https://www.videolan.org/vlc/ VLC]''', as well as in frameworks such as '''[https://gstreamer.freedesktop.org/ GStreamer]''' and '''[https://ffmpeg.org/ FFmpeg]'''.<br />
<br />
For real-time applications, Opus support is available in '''[https://www.webrtc.org/ Google's WebRTC codebase]'''.<br />
<br />
Opus is a relatively new codec: '''[[OpusSupport|many more applications]]''' will support it in the near future.<br />
<br />
=== Does Opus support higher sampling rates, such as 96 kHz or 192 kHz? ===<br />
<br />
Yes and no.<br />
<br />
Opus encoding tools like opusenc will happily encode input files that are sampled at 96 or 192 kHz.<br />
<br />
However, files at these rates are internally '''converted to 48 kHz''' and then only frequencies '''up to 20 kHz''' are encoded.<br />
<br />
The reason is simple: lossy codecs are designed to preserve audible details while discarding irrelevant information. Since the human ear can only hear up to 20 kHz at best (usually lower than that), frequency content above 20 kHz is the first thing to go.<br />
<br />
See Monty's '''[https://people.xiph.org/~xiphmont/demo/neil-young.html article]''' for more details.<br />
<br />
If you want a codec to handle higher sampling rates losslessly, use '''[[FLAC]]'''!<br />
<br />
=== What are the licensing requirements? ===<br />
<br />
The reference Opus source code is released under a three-clause BSD license, which is a very permissive Open Source license. Commercial use and distribution (including in proprietary software) are permitted, provided that some basic conditions specified in the license are met. <br />
<br />
Opus is also covered by some patents, for which royalty-free usage rights are granted, under conditions that the authors believe are compatible with (hopefully) all open source licenses, including the GPL (v2 and v3).<br />
<br />
See the '''[https://www.opus-codec.org/license/ Opus Licensing]''' page for details.<br />
<br />
=== Why make Opus free? ===<br />
<br />
On the Internet, protocol and codec standards are part of the common infrastructure everyone builds upon.<br />
<br />
Most of the value of a high-quality standard is the innovation and inter-operation provided by the systems built on top of it. When a few parties have monopoly rights to monetize a standard, that infrastructure stops being so common and everyone else has more reason to use their own solution instead, increasing cost and reducing efficiency.<br />
<br />
Imagine a road system where each type of car could only drive on its own manufacturer's pavement. We all benefit from living in a world where all the roads are connected.<br />
<br />
This is why Opus, unlike many codecs, is free.<br />
<br />
=== Is the SILK part of Opus compatible with the SILK implementation shipped in Skype? ===<br />
<br />
No.<br />
<br />
The SILK codec, as submitted by Skype to the IETF, was heavily modified as part of its integration within Opus. The modifications are significant enough that it is not possible to just write a "translator". Even sharing code between Opus and the "old SILK" would be highly complex.<br />
<br />
=== Why not keep the SILK and CELT codecs separate? ===<br />
Opus is more than just two independent codecs with a switch.<br />
<br />
In addition to a [https://en.wikipedia.org/wiki/Linear_predictive_coding Linear Prediction] '''SILK mode''' and an [https://en.wikipedia.org/wiki/Modified_discrete_cosine_transform MDCT] '''CELT mode''' it has a '''hybrid mode''', where speech frequencies up to 8 kHz are encoded with LP while those between 8 and 20 kHz are encoded with MDCT. This is what allows Opus to have such high speech quality around 32 kbps.<br />
<br />
Another advantage of the integration is the ability to switch between these 3 modes seamlessly, without any audible "glitches" and without any out-of-band signalling.<br />
<br />
=== Now that Opus is standardized, will its development stop or can it be further improved? ===<br />
Yes, Opus '''can''' and '''should''' be improved, because unlike most '''[https://en.wikipedia.org/wiki/ITU-T#Key_standards_published_by_ITU ITU-T codecs]''', Opus is only defined in terms of its decoder.<br />
<br />
The encoder can keep evolving as long as the bitstream it produces can be decoded by the reference decoder. This is what made it possible for modern MP3 encoders (e.g. '''[https://en.wikipedia.org/wiki/LAME LAME]''') to improve far beyond the original '''[https://en.wikipedia.org/wiki/L3enc L3enc]''' and '''dist10''' reference implementations.<br />
<br />
Although it is unlikely that Opus encoders will see such a spectacular evolution, we certainly hope that future encoders will become much better than the reference encoder.<br />
<br />
In fact, the 1.1 libopus release significantly improves on the reference encoder's quality. See '''[https://people.xiph.org/~xiphmont/demo/opus/demo3.shtml Monty's demo]''' for more details.<br />
<br />
=== Will all future Opus releases comply with the [https://tools.ietf.org/html/rfc6716 Opus specification]? ===<br />
<br />
Yes.<br />
<br />
=== In what ways is Opus optimized for the Internet? ===<br />
<br />
Opus has good packet loss robustness and concealment, but its optimisations go further.<br />
<br />
One of the first things we were asked when designing Opus was to make the rate '''really''' adaptable, because we never know what kind of rates will be available. This meant not only having a wide range of bitrates, but also being able to vary the rate in small increments.<br />
<br />
This is why Opus scales from about '''6''' to '''512 kb/s''', in increments of '''0.4 kb/s''' (one byte with 20 ms frames). Opus can have '''more than 1200 possible bitrates''' while spending '''zero bits''' signalling the bitrate, because UDP already encodes the packet size.<br />
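<br />
The arithmetic behind these figures can be checked with a short sketch. The function names below are illustrative only and are not part of any Opus API; the sketch simply assumes one packet per 20 ms frame and counts whole-byte packet sizes between the two quoted bitrates.<br />

```python
def packet_bytes(bitrate_bps: int, frame_ms: float) -> float:
    """Payload size in bytes of one frame at a constant bitrate."""
    return bitrate_bps * frame_ms / 1000 / 8


def bitrate_step_bps(frame_ms: float) -> float:
    """Bitrate change caused by adding one byte to every packet."""
    return 8 * 1000 / frame_ms


def count_packet_sizes(min_bps: int, max_bps: int, frame_ms: float) -> int:
    """Number of whole-byte packet sizes between two bitrates (inclusive)."""
    smallest = round(packet_bytes(min_bps, frame_ms))
    largest = round(packet_bytes(max_bps, frame_ms))
    return largest - smallest + 1


# One extra byte per 20 ms frame is 400 bit/s, i.e. 0.4 kb/s.
print(bitrate_step_bps(20))
# 6 kb/s is 15 bytes per frame and 512 kb/s is 1280 bytes per frame,
# which gives well over 1200 distinct packet sizes.
print(count_packet_sizes(6000, 512000, 20))
```

Since each packet size corresponds to one bitrate, the receiver can infer the rate from the packet length alone.<br />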
<br />
One last aspect is that Opus is simple to transport over RTP, as can be seen from the [https://tools.ietf.org/html/rfc7587 Opus RTP payload format]. For example, it's possible to decode RTP packets without having even seen the SDP or any out-of-band signalling.<br />
<br />
=== What applications for Android can play Opus? ===<br />
<br />
Right now there are only a few, but the list is growing fast. See [https://android.stackexchange.com/q/37970/7425 this question on android.stackexchange.com]. Feel free to suggest other applications.<br />
<br />
=== When will the next version be released? ===<br />
<br />
When it's done. Seriously, we do not know.<br />
<br />
Opus is not a large project with a fixed release schedule.<br />
<br />
That being said, our '''[https://www.opus-codec.org/downloads/ pre-releases]''' and even the git repositories ('''[https://git.xiph.org/?p=opus.git Xiph]''', '''[https://github.com/xiph/opus GitHub]''') are pretty stable and, given proper testing (which you should always do anyway), are safe to distribute.<br />
<br />
Just be aware that the API of new features (that have never been included in a stable release) could potentially still change.<br />
<br />
== Software Developers' Questions ==<br />
<br />
=== On what platforms does Opus run? ===<br />
<br />
The Opus code base is written in C89 and should run on the vast majority of recent (and not so recent) CPUs.<br />
<br />
Some of the platforms '''[https://mf4.xiph.org/jenkins/view/opus/ on which Opus has been tested]''' include x86, x86-64, ARM, Itanium, Blackfin, and SPARC.<br />
<br />
=== Is there a fixed-point implementation? ===<br />
<br />
Yes.<br />
<br />
The fixed-point and floating-point decoder and encoder implementations are part of the same code base.<br />
<br />
The code defaults to float, so you need to configure with '''--enable-fixed-point''' (or define '''FIXED_POINT''' if not using the configure script) to build the code for fixed-point.<br />
<br />
=== Which implementation should I use? ===<br />
<br />
While the implementation in RFC 6716 is what ''defines'' the standard, it is likely not the best and most up-to-date implementation.<br />
<br />
The [https://opus-codec.org/ Opus] website was set up for the purpose of continually improving the implementation (in terms of speed, encoding quality, device compatibility, etc.) while still conforming to the standard.<br />
<br />
All Opus implementations are compatible by definition.<br />
<br />
=== How is supporting Opus different from supporting Speex/G.711/MP3? ===<br />
<br />
Opus has variable frame durations which can change on the fly, so an Opus decoder needs to be ready to accept packets with durations that are '''any multiple of 2.5ms''' up to a '''maximum of 120ms'''. <br />
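<br />
As a rough illustration of that rule, the sketch below checks whether a packet duration, expressed in samples at 48 kHz, is a multiple of 2.5 ms no longer than 120 ms. The constant and function names are made up for this example and do not come from libopus.<br />

```python
SAMPLE_RATE_HZ = 48000
STEP_SAMPLES = SAMPLE_RATE_HZ * 25 // 10000  # 2.5 ms is 120 samples
MAX_SAMPLES = SAMPLE_RATE_HZ * 120 // 1000   # 120 ms is 5760 samples


def is_valid_packet_duration(samples: int) -> bool:
    """True if the duration is a positive multiple of 2.5 ms, at most 120 ms."""
    return 0 < samples <= MAX_SAMPLES and samples % STEP_SAMPLES == 0


def duration_ms(samples: int) -> float:
    """Convert a sample count at 48 kHz to milliseconds."""
    return samples * 1000 / SAMPLE_RATE_HZ


for n in (120, 960, 1000, 5760, 5880):
    print(n, duration_ms(n), is_valid_packet_duration(n))
```

A receiver that sizes its output buffer for 5760 samples per channel at 48 kHz can therefore hold any single packet.<br />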
<br />
The Opus encoder and decoder do not need to have matched sampling rates or channel counts. It is recommended to always just decode at the highest rate the hardware supports (e.g. 48kHz stereo) so the user gets the full quality of whatever the far end is sending.<br />
<br />
=== My application doesn't work. Can anyone help me? ===<br />
<br />
It's possible to get help, but before doing so, there are a few basic things to try:<br />
<br />
* Implement your application with uncompressed audio instead of Opus. If it still doesn't work, then the problem isn't related to Opus.<br />
* Read the [https://www.opus-codec.org/docs/ Opus documentation].<br />
* Read the [https://git.xiph.org/?p=opus.git;a=blob;f=src/opus_demo.c opus_demo.c] source code to see how to use the encoder and decoder.<br />
<br />
If you still can't solve the problem, the best option is to ask for help on the [http://lists.xiph.org/mailman/listinfo/opus mailing list] or on the '''#opus''' IRC channel on '''irc.freenode.net'''.<br />
<br />
=== How do I report a bug? ===<br />
<br />
If you think you have found a bug in Opus (and not in your application), please [https://trac.xiph.org/newticket?component=Opus file a bug report].<br />
<br />
Please include a way for us to reproduce the problem. The best way to do this is to provide an input file, along with the opusenc/opusdec/opus_demo command line that causes the bug to occur.<br />
<br />
If the bug cannot be triggered by the command line tools, please provide a simple patch or C file that can help reproduce it. Please also provide any other relevant information, such as OS, CPU, build options, etc.<br />
<br />
Don't hesitate to also contact us on the [http://lists.xiph.org/mailman/listinfo/opus mailing list] or on [irc://irc.freenode.net/opus IRC].<br />
<br />
=== What is Opus Custom? ===<br />
<br />
Opus Custom is an '''optional''' part of the Opus standard that allows for sampling rates other than 8, 12, 16, 24, or 48 kHz and frame sizes other than multiples of 2.5 ms.<br />
<br />
Opus Custom requires additional out-of-band signalling that Opus does not normally require and disables many of Opus' coding modes. Also, because it is an optional part of the specification, using Opus Custom may lead to compatibility problems.<br />
<br />
For these reasons, '''its use is discouraged''' outside of very specific applications. <br />
<br />
You may want to use Opus Custom for:<br />
<br />
* ultra-low-delay applications, where synchronization with the soundcard buffer is important. <br />
* low-power embedded applications, where compatibility with others is not important.<br />
<br />
For almost all other types of applications, Opus Custom should not be used.<br />
<br />
=== How do I use 44.1 kHz or some other sampling rate not directly supported by Opus? ===<br />
<br />
Tools which read or write Opus should inter-operate with other sampling rates by transparently performing sample rate conversion behind the scenes whenever necessary. In particular, software developers should not use Opus Custom for 44.1 kHz support, except in the very specific circumstances outlined above.<br />
<br />
Note that it's generally preferable for a decoder to output at 48kHz, even when you know the original input was 44.1kHz. This is not only because you can skip resampling, but also because many cheaper audio interfaces have poor quality output for 44.1kHz.<br />
<br />
The '''[https://opus-codec.org/downloads/ opus-tools]''' package source code contains a small, high quality, high performance, BSD licensed '''[https://github.com/xiph/opus-tools/blob/master/src/resample.c resampler]''' which can be used where resampling is required.<br />
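<br />
To give a feel for the conversion involved, the sketch below reduces the 44.1 kHz to 48 kHz ratio to lowest terms and estimates the output length for a given input length. It is plain arithmetic, not the opus-tools resampler, and the helper name is hypothetical; a real resampler may add or drop a few samples because of filter delay.<br />

```python
from fractions import Fraction

# 48000 / 44100 reduces to 160 / 147: every 147 input samples
# become 160 output samples.
RATIO = Fraction(48000, 44100)


def resampled_length(input_samples: int, src: int = 44100, dst: int = 48000) -> int:
    """Approximate output length after resampling, rounded up."""
    return -(-input_samples * dst // src)  # ceiling division on integers


print(RATIO)
print(resampled_length(44100))  # one second of input is one second of output
```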
<br />
=== But won't the resampler hurt the quality? Isn't it better to use 44.1 kHz directly? ===<br />
<br />
Not really. The quality degradation caused by any reasonable resampler (SoX, libspeexdsp, libsamplerate, ...) is far less than the distortion caused by the best lossy codec at its highest bitrate. If you can't tolerate the quality degradation caused by a good 44.1 ↔ 48 kHz resampler, then you shouldn't be using a lossy codec in the first place. Similarly, the extra CPU spent in the resampler is small compared to the rest of the codec. Not only that, but many soundcards only support 48 kHz on playback, so players can directly play the output rather than resample it to 48 kHz (e.g. for a 44.1 kHz MP3). So effectively, Opus is only shifting the burden of resampling from the decoder side to the encoder side.<br />
<br />
One advantage of supporting only one internal rate is that it makes it possible for Opus to support many features, including efficient speech compression (through SILK) and real-time applications. It also means all the quality tuning effort can be spent on a single configuration, which helps bring even better quality.<br />
<br />
=== How is the bitrate setting used in VBR mode? ===<br />
<br />
Variable bitrate (VBR) mode allows the bitrate to automatically vary over time based on the audio being encoded, in order to achieve a consistent quality.<br />
<br />
The bitrate setting controls the desired quality, on a scale that is calibrated to closely approximate the average bitrate that would be obtained over a large and diverse collection of audio. The actual bitrate of any particular audio stream may be higher or lower than this average.<br />
<br />
=== What frame size should I use? ===<br />
<br />
A '''20ms''' frame size works well for most applications. Smaller frame sizes may be used to achieve lower latency, but have lower quality at a given bitrate.<br />
<br />
Sizes greater than 20 ms increase latency and are generally beneficial only at fairly low bitrates, or when used to reduce external overhead (e.g. by reducing the number of packets that are sent). For file encoding, using a frame size larger than 20 ms will usually result in '''worse''' quality for the same bitrate because it constrains the encoder in the decisions it can make.<br />
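<br />
The external overhead mentioned above is easy to estimate. The sketch below assumes one packet per frame and 40 bytes of headers per packet (a 20-byte IPv4 header without options, an 8-byte UDP header, and a 12-byte RTP header without extensions); both the function name and the header size are assumptions made for illustration.<br />

```python
def header_overhead_bps(frame_ms: float, header_bytes: int = 40) -> float:
    """Bitrate spent on per-packet headers when sending one packet per frame."""
    packets_per_second = 1000 / frame_ms
    return packets_per_second * header_bytes * 8


for frame_ms in (2.5, 10, 20, 60):
    print(frame_ms, round(header_overhead_bps(frame_ms)))
```

Under these assumptions the headers cost 16 kb/s with 20 ms frames and about 5.3 kb/s with 60 ms frames, which is why longer frames mostly pay off when the audio bitrate itself is low.<br />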
<br />
=== Forward Error correction (FEC) doesn't appear to do anything! HELP! ===<br />
<br />
The in-band FEC feature of Opus helps reduce the harm of packet loss by encoding some information about the prior packet.<br />
<br />
In order to make use of in-band FEC, the receiver must delay its output by at least one frame, so that when a packet is missing it can call the decoder with the decode_fec argument set on the ''next'' packet in order to reconstruct the missed frame. This works best if it's integrated with a jitter buffer.<br />
<br />
FEC is only used by the encoder under certain conditions:<br />
* the feature must be enabled via the '''OPUS_SET_INBAND_FEC''' CTL<br />
* the encoder must be told to expect loss via the '''OPUS_SET_PACKET_LOSS_PERC''' CTL<br />
* the codec must be operated in any of the '''Linear Prediction''' or '''Hybrid''' modes<br />
<br />
Frame durations shorter than 10ms and very high bitrates will use the MDCT modes, where FEC is not available.<br />
<br />
Even when FEC is not used, telling the encoder about the expected level of loss will help it make more intelligent decisions. By default, the implementation assumes there is no loss.<br />
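<br />
The receiver-side logic can be sketched as a small decision procedure. This is a simplified model, not libopus code: <tt>plan_decode</tt> and the action names are hypothetical, and the comments describe which <tt>opus_decode()</tt> call each action would correspond to in a real application.<br />

```python
def plan_decode(received):
    """Decide how to produce each frame, given which packets arrived.

    received[i] is True if packet i arrived. Because output is delayed by
    one frame, the fate of packet i + 1 is known when frame i is produced.
    """
    actions = []
    for i, arrived in enumerate(received):
        if arrived:
            # Normal case: opus_decode() on packet i with decode_fec = 0.
            actions.append((i, "decode"))
        elif i + 1 < len(received) and received[i + 1]:
            # Packet i is lost but packet i + 1 arrived: opus_decode() on
            # packet i + 1 with decode_fec = 1 reconstructs frame i.
            actions.append((i, "fec_from_next"))
        else:
            # No usable data: opus_decode() with a NULL packet so the
            # decoder conceals the loss.
            actions.append((i, "conceal"))
    return actions


print(plan_decode([True, False, True, False, False, True]))
```

Note that after a <tt>fec_from_next</tt> step, packet i + 1 is still decoded normally to produce frame i + 1.<br />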
<br />
=== I can't use malloc or much stack on my embedded platform. How do I make Opus work? ===<br />
<br />
A normal build of libopus only uses <tt>malloc/free</tt> in the <tt>_create()</tt> and <tt>_destroy()</tt> calls, making it safe for realtime use as long as the codec state is pre-created.<br />
<br />
To build Opus without the references to <tt>malloc/free</tt>, you must:<br />
<br />
* use <tt>_init()</tt> calls rather than <tt>_create()</tt> calls in your application<br />
* compile with <tt>CFLAGS="-DOVERRIDE_OPUS_ALLOC -DOVERRIDE_OPUS_FREE -D'opus_alloc(x)=NULL' -D'opus_free(x)=NULL' "</tt>.<br />
<br />
If libopus is built with <tt>-DNONTHREADSAFE_PSEUDOSTACK</tt> (instead of <tt>VAR_ARRAYS</tt>, or <tt>USE_ALLOCA</tt>), it will use a user-provided block of heap instead of stack for many things, resulting in much lower stack usage.<br><br />
This makes the resulting library '''non-threadsafe''' and is '''not recommended''' on anything except limited embedded platforms.<br />
<br />
=== How can I ensure that my software interoperates with other software implementing Opus? ===<br />
<br />
For applications using Ogg files, there are some [https://people.xiph.org/~greg/opus_testvectors/ Ogg Opus testvectors] to test decoders and you can test encoders with opusdec. For RTP applications, the opusrtp tool can be useful.<br />
<br />
In general, here's a list of specific issues to check:<br />
* Can your application handle all frame sizes, including changing the frame size from frame to frame?<br />
* Does your application react properly to lost packets, by calling the decoder with a NULL packet?<br />
<br />
=== What is the complexity of Opus? ===<br />
<br />
The complexity of Opus varies by a large amount based on the settings used.<br />
<br />
It depends on the mode, audio bandwidth, number of channels, and even a "complexity knob" that can trade complexity for quality. It will run easily on any recent PC or smartphone. <br />
<br />
For slower embedded CPUs/DSPs, the amount of CPU required will vary depending on the configuration and the exact CPU, so you will need to experiment. Do not expect Opus to run quickly on really slow devices like 8-bit micro-controllers.<br />
<br />
=== Opus is using too much CPU for my application. What can I do? ===<br />
<br />
First, don't panic, and don't start writing assembly just yet.<br />
<br />
It's possible that you're just not using the right set of options.<br />
<br />
If you're targeting an embedded/mobile platform, chances are the fixed-point build will be faster, so make sure you're using '''--enable-fixed-point''' or defining '''FIXED_POINT''' in the build system.<br />
<br />
Opus also has a complexity option that can trade quality for complexity. The default is highest quality and highest complexity. You can control this using '''OPUS_SET_COMPLEXITY()''' (see the '''[https://www.opus-codec.org/docs/ Documentation]''' for details).<br />
<br />
If all else fails and you need to optimize the Opus code, see the next question.<br />
<br />
=== I would like to optimize/improve/help with Opus. Where should I start? ===<br />
<br />
Please '''[https://www.opus-codec.org/contact/ contact us]''' before you start, or at least before you get too far.<br />
<br />
This will help coordinate the efforts made on Opus and reduce the probability of wasting your time on duplicated effort or going down the wrong path. More details in the '''[[OpusContributing|contributing page]]'''.<br />
<br />
=== Does Opus have an echo canceller like Speex does? ===<br />
<br />
Echo cancellation is completely independent from codecs.<br />
<br />
You can use any echo canceller (including the one from libspeexdsp) along with Opus.<br />
<br />
That being said, among the free acoustic echo cancellers (AEC) we're aware of, the best is probably the Google AEC from the [https://code.google.com/p/webrtc/ WebRTC codebase].<br />
<br />
=== How do I get the duration of a .opus file? ===<br />
<br />
Use '''[https://mf4.xiph.org/jenkins/view/opus/job/opusfile-autotools/ws/doc/html/group__stream__info.html op_pcm_total()]''' from '''[https://opus-codec.org/downloads/ libopusfile]'''.<br />
<br />
If you want to implement this yourself, you need to<br />
* Read the BOS (Beginning Of Stream) pages to enumerate the serial numbers of all concurrently multiplexed streams, identify the Opus stream you want, and get its preskip value.<br />
* Read up through the first complete audio data page to compute the starting granule position (since the timestamps might not start at 0, e.g., if the file was captured from a live stream that was joined after the start).<br />
* Seek near the end of a file and look for a page with the same serial number as found in the headers (just under 64 kB from the end should be sufficient to ensure you find a page, assuming the Opus data is not multiplexed with another stream, or there is no trailing garbage in the file).<br />
* If you find a page whose serial number was not included in the original set of BOS pages, you have a chained stream. You need to bisect the file to identify the end of the first chain and the start of the next, and repeat this process for each link in the chain.<br />
* If you don't find any pages at all, or find a page whose serial number was included in the original set of BOS pages, but was not the serial number of the Opus stream you want, back up and try again (being careful to avoid rescanning the same data, which can produce quadratic worst-case complexity).<br />
* If you find a page whose serial number matches the Opus stream you want, look at its final granule position, and compute the total duration (in seconds) as (final_granule_position - initial_granule_position - preskip)/48000.0.<br />
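<br />
The final step of the list above is a one-line computation. The sketch below assumes the three values have already been extracted from the file; the function names and the sample values are illustrative, not taken from libopusfile.<br />

```python
def opus_duration_seconds(final_gp: int, initial_gp: int, preskip: int) -> float:
    """Duration in seconds; granule positions always count 48 kHz samples."""
    return (final_gp - initial_gp - preskip) / 48000.0


def format_duration(seconds: float) -> str:
    """Format a duration as minutes and seconds, e.g. 58m55.000s."""
    minutes, secs = divmod(seconds, 60)
    return f"{int(minutes)}m{secs:06.3f}s"


# Hypothetical stream: 169680000 playable samples plus a pre-skip of 312,
# with granule positions starting at 0.
duration = opus_duration_seconds(169680312, 0, 312)
print(duration, format_duration(duration))
```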
<br />
=== Why don't you store the duration in the header? Isn't all of that slow and complicated? ===<br />
<br />
Computing the duration directly from the file contents allows files to be written in a single pass, without any seeking, which is necessary for live streaming. Chaining also simplifies live streaming, as you can just pipe multiple files into the same network connection, with all associated metadata updates, etc., and the results are still valid .opus files (contrast with the '''[http://www.smackfu.com/stuff/programming/shoutcast.html hacks used to add metadata to MP3 streams]''').<br />
<br />
Opening a typical .opus file, which is not multiplexed and not chained, and computing the duration over the network requires just one extra HTTP request, which can proceed in parallel with the buffering in the main request. This is the behavior you will get from libopusfile's HTTP backend by default.<br />
<br />
Enumeration of chain boundaries can be expensive in files with many links, but in our testing libopusfile used nearly an order of magnitude fewer seeks to do this than some other media frameworks (at the time). Storing a duration in a header wouldn't solve this, since every link in a chain has its own, independent headers. If the cost of chain enumeration is a problem, the best way to avoid it is to store the links in separate files (i.e., don't use chaining).<br />
<br />
=== How do I seek in a .opus file? ===<br />
<br />
Use '''[https://mf4.xiph.org/jenkins/view/opus/job/opusfile-autotools/ws/doc/html/group__stream__seeking.html op_pcm_seek() or op_raw_seek()]''' from '''[https://opus-codec.org/downloads/ libopusfile]'''.<br />
<br />
If you want to implement seeking yourself, you need to<br />
* Identify the link that contains the target (if you have a chained file).<br />
* Adjust the target by 80 ms to ensure you will get enough pre-roll data (to ensure the decoder will have converged by the time you reach the target), as recommended by '''[https://tools.ietf.org/html/rfc7845 RFC 7845]'''.<br />
* Estimate the location of the last audio data page with a completed packet prior to the adjusted target, using the duration and size (in bytes) of the link.<br />
* Seek to that location and scan forward until you find an audio data page with a completed packet (that contains a valid granule position).<br />
* If you think you are sufficiently close to the adjusted target, scan forward until you find the next audio data page with a completed packet.<br />
* If the adjusted target lies between the first audio data page with a completed packet you found and the next one, stop. You can decode forward from here and start playing when you reach your (original unadjusted) target.<br />
* Otherwise, go back and re-estimate the seek location using the granule positions and file offsets of the page(s) you just found.<br />
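<br />
The estimation steps above can be sketched as linear interpolation between two known (file offset, granule position) pairs. This is a simplified illustration, not libopusfile's actual algorithm, and all names are hypothetical; it omits page scanning and the fallbacks needed when the guess is wrong.<br />

```python
PREROLL_SAMPLES = 80 * 48  # 80 ms at 48 kHz is 3840 samples


def adjusted_target(target_gp: int, link_start_gp: int) -> int:
    """Move the target back by the pre-roll, without leaving the link."""
    return max(link_start_gp, target_gp - PREROLL_SAMPLES)


def estimate_offset(target_gp: int, lo_offset: int, lo_gp: int,
                    hi_offset: int, hi_gp: int) -> int:
    """Guess the byte offset of target_gp between two known positions."""
    if hi_gp <= lo_gp:
        return lo_offset
    fraction = (target_gp - lo_gp) / (hi_gp - lo_gp)
    fraction = min(1.0, max(0.0, fraction))  # stay inside the interval
    return lo_offset + int(fraction * (hi_offset - lo_offset))


# Seek to the 1 second mark in a link spanning bytes 1000 to 9000
# and granule positions 0 to 96000.
target = adjusted_target(48000, 0)
print(target, estimate_offset(target, 1000, 0, 9000, 96000))
```

Each page found near the guess narrows the interval, so the next estimate uses the new (offset, granule position) pair as one of its end points.<br />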
<br />
libopusfile includes fallbacks to prevent pathological worst-case behavior when its guesses are repeatedly wrong. Weighted bisection can degrade to a linear scan, but libopusfile's worst case is within a constant factor of naive bisection (i.e., logarithmic). We have only ever observed such pathological behavior in files we manually constructed to trigger it.<br />
<br />
libopusfile also takes shortcuts when the target location is near the current position, to make small seeks cheaper. In the best case it can loop forever over very short files whose data is contained in a single page (e.g., less than 1 second long with default encoder settings) without any seeking at all.<br />
<br />
You can find more information on seeking in multiplexed files when you want to play more than just a single Opus stream '''[[GranulePosAndSeeking|on this page]]'''.<br />
<br />
=== Wouldn't it be better to build an index? ===<br />
<br />
As with file durations, an index at the beginning of the file is incompatible with live streaming. It also means more data has to be fetched before a file can start playing over the network, because you must read past the index even when you don't intend to seek. The index could be stored at the end (which even still allows encoding the file in a single pass), but this requires one (or more) extra seeks to read the index (especially if its exact location at the end is not known), either on file open or on first seek. Unlike the final timestamp, which is small and fixed in size, an index grows with the file duration, and can have unbounded size. It is also easy for an index to become out of sync with a file that has been edited or damaged, in which case seeking will simply fail. In contrast, you can seek in a truncated .opus download without issues.<br />
<br />
In practice, bisection seeking on VBR audio achieves performance that is very nearly as good as seeking with an index, without any of the drawbacks of an index. libopusfile provides a test program called seeking_example which can be used to benchmark the performance on your files.<br />
<br />
On a 96 kbps VBR file nearly one hour long (the second movement of Mahler's Symphony No. 8 "Symphony of a Thousand"):<br />
<br />
Testing exact PCM seeking to random places in 169680000 samples (58m55.000s)...<br />
Total seek operations: 1020 (1.020 per exact seek, 2 maximum).<br />
<br />
On a chained file formed by concatenating the eight test vectors for the currently supported channel layouts in mapping family 1:<br />
<br />
Opened file containing 8 links with 18 seeks (2.250 per link).<br />
Testing exact PCM seeking to random places in 2759064 samples (57.481s)...<br />
Total seek operations: 946 (0.946 per exact seek, 2 maximum).<br />
<br />
That is, the number of physical seeks required is almost always 1, every once in a while 2, and in short files, sometimes even 0.<br />
<br />
[[Category:Opus]]</div>Derfhttps://wiki.xiph.org/index.php?title=OpusFAQ&diff=16600OpusFAQ2017-07-07T20:36:34Z<p>Derf: /* Wouldn't it be better to build an index? */</p>
<hr />
<div>[[Image:Opus logo trans.png]]<br />
<br />
If you are looking for info not covered in this FAQ, try the '''[https://opus-codec.org main Opus website]''' or the pages included in the '''[[:Category:Opus|Opus category]]''' of this wiki.<br />
<br />
== General Questions ==<br />
<br />
=== What is Opus? Who created it? ===<br />
<br />
Opus is a totally open, royalty-free, highly versatile audio codec.<br />
<br />
It is primarily designed for interactive speech and music transmission over the Internet, but is also applicable to storage and streaming applications. It incorporates technology from Skype's '''[https://en.wikipedia.org/wiki/SILK SILK]''' codec and Xiph.Org's '''[http://celt-codec.org/ CELT]''' codec. It has been standardized by the '''[https://www.ietf.org/ Internet Engineering Task Force]''' (IETF) as '''[https://tools.ietf.org/html/rfc6716 RFC 6716]'''. <br />
<br />
Opus has been in development since early 2007. Programmers associated with '''[https://xiph.org/ Xiph.Org]''', '''[https://www.skype.com/ Skype]''' and several other organizations have contributed to its development and to the standardization process as part of the '''[https://datatracker.ietf.org/wg/codec/charter/ IETF's Codec Working Group]'''.<br />
<br />
=== How does Opus compare to other codecs? ===<br />
<br />
Opus is distinguished from most high quality formats (e.g. [[Vorbis]], AAC, MP3) by having '''[https://tools.ietf.org/html/rfc6716#section-2 low delay]''' (5 to 66.5 ms) and from most low delay formats (e.g. [[Speex]], G.711, GSM) by supporting '''[https://tools.ietf.org/html/rfc6716#section-2.1.1 high audio quality]''' (from narrow-band all the way to full-band audio).<br />
<br />
It '''[https://opus-codec.org/comparison meets or exceeds existing codecs' quality]''' across a wide range of bitrates, and it operates at lower delay than virtually any existing compressed format.<br />
<br />
Most importantly, the Opus format and its reference implementation are both available under '''[https://opus-codec.org/license/ liberal, royalty-free licenses]'''.<br /><br />
This makes it:<br />
* easy to adopt<br />
* compatible with free software<br />
* suitable for use as part of the basic infrastructure of the Internet<br />
<br />
=== Does Opus make all those other lossy codecs obsolete? ===<br />
<br />
Yes.<br />
<br />
From a technical point of view (loss, delay, bitrates, ...) Opus renders '''[[Speex]]''' obsolete and should also replace '''[[Vorbis]]''' and the common proprietary codecs (e.g. AAC, MP3, ...).<br />
<br />
=== Will Opus replace Vorbis in video files? ===<br />
<br />
For '''[[Ogg]]''' video files (which use the '''[[Theora]]''' video codec), you ''can'' use Opus instead of Vorbis, but the overall size reduction will be minimal and it will break compatibility with existing players.<br />
<br />
For WebM video files, the convention is to use the '''[http://www.webmproject.org/vp9/ VP9 video codec]''' when using Opus as an audio codec.<br />
<br />
=== How do I use Opus? ===<br />
<br />
For now, the best way to '''encode''' audio into Opus files is to use the '''opusenc''' command-line tool from the '''[https://opus-codec.org/downloads/ opus-tools package]'''.<br />
<br />
If you want to encode many files at once (e.g. your music library), try the applications listed in the '''[[OpusSupport|Opus Support]]''' page.<br />
<br />
For rough guidelines on encoding settings, see the '''[[Opus Recommended Settings]]''' page.<br />
<br />
=== What programs support Opus? ===<br />
<br />
Opus decoding support is now included in '''[http://caniuse.com/opus some Internet browsers]''' and '''[[OpusSupport|many applications]]''', including '''[https://www.mozilla.org/firefox Firefox]''', '''[https://www.foobar2000.org/ foobar2000]''' and '''[https://www.videolan.org/vlc/ VLC]''', as well as in frameworks such as '''[https://gstreamer.freedesktop.org/ GStreamer]''' and '''[https://ffmpeg.org/ FFmpeg]'''.<br />
<br />
For real-time applications, Opus support is available in '''[https://www.webrtc.org/ Google's WebRTC codebase]'''.<br />
<br />
Opus is a relatively new codec: '''[[OpusSupport|many more applications]]''' will support it in the near future.<br />
<br />
=== Does Opus support higher sampling rates, such as 96 kHz or 192 kHz? ===<br />
<br />
Yes and no.<br />
<br />
Opus encoding tools like opusenc will happily encode input files that are sampled at 96 or 192 kHz.<br />
<br />
However, files at these rates are internally '''converted to 48 kHz''' and then only frequencies '''up to 20 kHz''' are encoded.<br />
<br />
The reason is simple: lossy codecs are designed to preserve audible details while discarding irrelevant information. Since the human ear can only hear up to 20 kHz at best (usually lower than that), frequency content above 20 kHz is the first thing to go.<br />
<br />
See Monty's '''[https://people.xiph.org/~xiphmont/demo/neil-young.html article]''' for more details.<br />
<br />
If you want a codec to handle higher sampling rates losslessly, use '''[[FLAC]]'''!<br />
<br />
=== What are the licensing requirements? ===<br />
<br />
The reference Opus source code is released under a three-clause BSD license, which is a very permissive Open Source license. Commercial use and distribution (including in proprietary software) is permitted, provided that some basic conditions specified in the license are met. <br />
<br />
Opus is also covered by some patents, for which royalty-free usage rights are granted, under conditions that the authors believe are compatible with (hopefully) all open source licenses, including the GPL (v2 and v3).<br />
<br />
See the '''[https://www.opus-codec.org/license/ Opus Licensing]''' page for details.<br />
<br />
=== Why make Opus free? ===<br />
<br />
On the Internet, protocol and codec standards are part of the common infrastructure everyone builds upon.<br />
<br />
Most of the value of a high-quality standard is the innovation and inter-operation provided by the systems built on top of it. When a few parties have monopoly rights to monetize a standard, that infrastructure stops being so common and everyone else has more reason to use their own solution instead, increasing cost and reducing efficiency.<br />
<br />
Imagine a road system where each type of car could only drive on its own manufacturer's pavement. We all benefit from living in a world where all the roads are connected.<br />
<br />
This is why Opus, unlike many codecs, is free.<br />
<br />
=== Is the SILK part of Opus compatible with the SILK implementation shipped in Skype? ===<br />
<br />
No.<br />
<br />
The SILK codec, as submitted by Skype to the IETF, was heavily modified as part of its integration within Opus. The modifications are significant enough that it is not possible to just write a "translator". Even sharing code between Opus and the "old SILK" would be highly complex.<br />
<br />
=== Why not keep the SILK and CELT codecs separate? ===<br />
Opus is more than just two independent codecs with a switch.<br />
<br />
In addition to a [https://en.wikipedia.org/wiki/Linear_predictive_coding Linear Prediction] '''SILK mode''' and an [https://en.wikipedia.org/wiki/Modified_discrete_cosine_transform MDCT] '''CELT mode''', it has a '''hybrid mode''', where speech frequencies up to 8 kHz are encoded with LP while those between 8 and 20 kHz are encoded with MDCT. This is what allows Opus to have such high speech quality around 32 kbps.<br />
<br />
Another advantage of the integration is the ability to switch between these 3 modes seamlessly, without any audible "glitches" and without any out-of-band signalling.<br />
<br />
=== Now that Opus is standardized, will its development stop or can it be further improved? ===<br />
Development will not stop: Opus '''can''' and '''should''' be improved. Unlike most '''[https://en.wikipedia.org/wiki/ITU-T#Key_standards_published_by_ITU ITU-T codecs]''', Opus is only defined in terms of its decoder.<br />
<br />
The encoder can keep evolving as long as the bitstream it produces can be decoded by the reference decoder. This is what made it possible for modern MP3 encoders (e.g. '''[https://en.wikipedia.org/wiki/LAME LAME]''') to improve far beyond the original '''[https://en.wikipedia.org/wiki/L3enc L3enc]''' and '''dist10''' reference implementations.<br />
<br />
Although it is unlikely that Opus encoders will see such a spectacular evolution, we certainly hope that future encoders will become much better than the reference encoder.<br />
<br />
In fact, the 1.1 libopus release significantly improves on the reference encoder's quality. See '''[https://people.xiph.org/~xiphmont/demo/opus/demo3.shtml Monty's demo]''' for more details.<br />
<br />
=== Will all future Opus releases comply with the [https://tools.ietf.org/html/rfc6716 Opus specification]? ===<br />
<br />
Yes.<br />
<br />
=== In what ways is Opus optimized for the Internet? ===<br />
<br />
Opus has good packet loss robustness and concealment, but its optimisations go further.<br />
<br />
One of the first things we were asked when designing Opus was to make the rate '''really''' adaptable, because we never know what kind of rates will be available. This meant not only having a wide range of bitrates, but also being able to vary the rate in small increments.<br />
<br />
This is why Opus scales from about '''6''' to '''512 kb/s''', in increments of '''0.4 kb/s''' (one byte with 20 ms frames). Opus can have '''more than 1200 possible bitrates''' while spending only '''11 bits''' signalling the bitrate because UDP already encodes the packet size.<br />
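<br />
The arithmetic behind these figures can be checked with a short sketch. It assumes one 20 ms frame per packet and whole-byte packet sizes; the function names are illustrative and are not part of any Opus API.<br />

```python
# Sketch: bitrate granularity when every packet carries one 20 ms frame.
FRAME_MS = 20  # frame duration assumed throughout this example


def bitrate_kbps(packet_bytes, frame_ms=FRAME_MS):
    """Bitrate in kb/s of a stream sending packet_bytes every frame_ms ms."""
    return packet_bytes * 8 / frame_ms  # bits per millisecond equals kb/s


def packet_bytes_for(kbps, frame_ms=FRAME_MS):
    """Whole-byte packet size that yields kbps at the given frame duration."""
    return round(kbps * frame_ms / 8)


step = bitrate_kbps(1)            # effect of one extra byte per packet
smallest = packet_bytes_for(6)    # packet size at about 6 kb/s
largest = packet_bytes_for(512)   # packet size at 512 kb/s
count = largest - smallest + 1    # distinct packet sizes, hence bitrates

print(step, smallest, largest, count)
```

With these assumptions, one byte per packet corresponds to 0.4 kb/s, and the range from 6 to 512 kb/s spans more than 1200 distinct packet sizes.<br />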
<br />
One last aspect is that Opus is simple to transport over RTP, as can be seen from the [https://tools.ietf.org/html/rfc7587 Opus RTP payload format]. For example, it's possible to decode RTP packets without having even seen the SDP or any out-of-band signalling.<br />
<br />
=== What applications for Android can play Opus? ===<br />
<br />
Right now, there are just a few, but the list is growing quickly. Please refer to [https://android.stackexchange.com/q/37970/7425 this question on android.stackexchange.com]. Feel free to suggest other applications.<br />
<br />
=== When will the next version be released? ===<br />
<br />
When it's done. Seriously, we do not know.<br />
<br />
Opus is not a large project with a fixed release schedule.<br />
<br />
That being said, our '''[https://www.opus-codec.org/downloads/ pre-releases]''' and even the git repositories ('''[https://git.xiph.org/?p=opus.git Xiph]''', '''[https://github.com/xiph/opus GitHub]''') are pretty stable and, given proper testing (which you should always do anyway), are safe to distribute.<br />
<br />
Just be aware that the API of new features (that have never been included in a stable release) could potentially still change.<br />
<br />
== Software Developers' Questions ==<br />
<br />
=== On what platforms does Opus run? ===<br />
<br />
The Opus code base is written in C89 and should run on the vast majority of recent (and not so recent) CPUs.<br />
<br />
Some of the platforms '''[https://mf4.xiph.org/jenkins/view/opus/ on which Opus has been tested]''' include x86, x86-64, ARM, Itanium, Blackfin, and SPARC.<br />
<br />
=== Is there a fixed-point implementation? ===<br />
<br />
Yes.<br />
<br />
The fixed-point and floating-point decoder and encoder implementations are part of the same code base.<br />
<br />
The code defaults to float, so you need to configure with '''--enable-fixed-point''' (or define '''FIXED_POINT''' if not using the configure script) to build the code for fixed-point.<br />
<br />
=== Which implementation should I use? ===<br />
<br />
While the implementation in RFC 6716 is what ''defines'' the standard, it is likely not the best and most up-to-date implementation.<br />
<br />
The [https://opus-codec.org/ Opus] website was set up for the purpose of continually improving the implementation — in terms of speed, encoding quality, device compatibility, etc — while still conforming to the standard.<br />
<br />
All Opus implementations are compatible by definition.<br />
<br />
=== How is supporting Opus different from supporting Speex/G.711/MP3? ===<br />
<br />
Opus has variable frame durations which can change on the fly, so an Opus decoder needs to be ready to accept packets with durations that are '''any multiple of 2.5 ms''' up to a '''maximum of 120 ms'''. <br />
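<br />
As a rough illustration of that rule, the following sketch checks whether a packet duration, expressed in samples at 48 kHz, is one a decoder should be prepared for. The helper name is hypothetical and this is not how libopus itself validates packets.<br />

```python
# Sketch: which packet durations must a receiver be ready to handle?
SAMPLE_RATE = 48000                        # decode rate assumed here
UNIT_SAMPLES = SAMPLE_RATE * 25 // 10000   # 2.5 ms -> 120 samples
MAX_SAMPLES = SAMPLE_RATE * 120 // 1000    # 120 ms -> 5760 samples


def is_valid_packet_duration(samples_at_48k):
    """True if the duration is a multiple of 2.5 ms and at most 120 ms."""
    return (0 < samples_at_48k <= MAX_SAMPLES
            and samples_at_48k % UNIT_SAMPLES == 0)


for samples in (120, 960, 2880, 5760, 5880, 100):
    print(samples, is_valid_packet_duration(samples))
```

In practice this means sizing the output buffer for 120 ms of audio (5760 samples per channel at 48 kHz) rather than for the frame size the local encoder happens to use.<br />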
<br />
The Opus encoder and decoder do not need to have matched sampling rates or channel counts. It is recommended to always just decode at the highest rate the hardware supports (e.g. 48 kHz stereo) so the user gets the full quality of whatever the far end is sending.<br />
<br />
=== My application doesn't work. Can anyone help me? ===<br />
<br />
It's possible to get help, but before doing so, there are a few basic things to try:<br />
<br />
* Implement your application with uncompressed audio instead of Opus. If it still doesn't work, then the problem isn't related to Opus.<br />
* Read the [https://www.opus-codec.org/docs/ Opus documentation].<br />
* Read the [https://git.xiph.org/?p=opus.git;a=blob;f=src/opus_demo.c opus_demo.c] source code to see how to use the encoder and decoder.<br />
<br />
If you still can't solve the problem, the best option is to ask for help on the [http://lists.xiph.org/mailman/listinfo/opus mailing list] or on the '''#opus''' IRC channel on '''irc.freenode.net'''.<br />
<br />
=== How do I report a bug? ===<br />
<br />
If you think you have found a bug in Opus (and not in your application), please [https://trac.xiph.org/newticket?component=Opus file a bug report].<br />
<br />
Please include a way for us to reproduce the problem. The best way to do this is to provide an input file, along with the opusenc/opusdec/opus_demo command line that causes the bug to occur.<br />
<br />
If the bug cannot be triggered by the command line tools, please provide a simple patch or C file that can help reproduce it. Please also provide any other relevant information, such as OS, CPU, build options, etc.<br />
<br />
Don't hesitate to also contact us on the [http://lists.xiph.org/mailman/listinfo/opus mailing list] or on [irc://irc.freenode.net/opus IRC].<br />
<br />
=== What is Opus Custom? ===<br />
<br />
Opus Custom is an '''optional''' part of the Opus standard that allows for sampling rates other than 8, 12, 16, 24, or 48 kHz and frame sizes other than multiples of 2.5 ms.<br />
<br />
Opus Custom requires additional out-of-band signalling that Opus does not normally require and disables many of Opus' coding modes. Also, because it is an optional part of the specification, using Opus Custom may lead to compatibility problems.<br />
<br />
For these reasons, '''its use is discouraged''' outside of very specific applications. <br />
<br />
You may want to use Opus Custom for:<br />
<br />
* ultra-low-delay applications, where synchronization with the soundcard buffer is important. <br />
* low-power embedded applications, where compatibility with others is not important.<br />
<br />
For almost all other types of applications, Opus Custom should not be used.<br />
<br />
=== How do I use 44.1 kHz or some other sampling rate not directly supported by Opus? ===<br />
<br />
Tools which read or write Opus should inter-operate with other sampling rates by transparently performing sample rate conversion behind the scenes whenever necessary. In particular, software developers should not use Opus Custom for 44.1 kHz support, except in the very specific circumstances outlined above.<br />
<br />
Note that it's generally preferable for a decoder to output at 48 kHz, even when you know the original input was 44.1 kHz. This is not only because you can skip resampling, but also because many cheaper audio interfaces have poor quality output for 44.1 kHz.<br />
<br />
The '''[https://opus-codec.org/downloads/ opus-tools]''' package source code contains a small, high quality, high performance, BSD licensed '''[https://github.com/xiph/opus-tools/blob/master/src/resample.c resampler]''' which can be used where resampling is required.<br />
<br />
=== But won't the resampler hurt the quality? Isn't it better to use 44.1 kHz directly? ===<br />
<br />
Not really. The quality degradation caused by any reasonable resampler (SoX, libspeexdsp, libsamplerate, ...) is far less than the distortion caused by the best lossy codec at its highest bitrate. If you can't tolerate the quality degradation caused by a good 44.1 ↔ 48 kHz resampler, then you shouldn't be using a lossy codec in the first place. Similarly, the extra CPU spent in the resampler is small compared to the rest of the codec. In addition, many soundcards only support 48 kHz on playback, so a player can send decoded Opus audio directly to the device, whereas audio at another rate (e.g. a 44.1 kHz MP3) must first be resampled to 48 kHz. So effectively, Opus is only shifting the burden of resampling from the decoder side to the encoder side.<br />
<br />
One advantage of supporting only one internal rate is that it makes it possible for Opus to support many features, including efficient speech compression (through SILK) and real-time applications. It also means all the quality tuning effort can be spent on a single configuration, which helps bring even better quality.<br />
<br />
=== How is the bitrate setting used in VBR mode? ===<br />
<br />
Variable bitrate (VBR) mode allows the bitrate to automatically vary over time based on the audio being encoded, in order to achieve a consistent quality.<br />
<br />
The bitrate setting controls the desired quality, on a scale that is calibrated to closely approximate the average bitrate that would be obtained over a large and diverse collection of audio. The actual bitrate of any particular audio stream may be higher or lower than this average.<br />
<br />
=== What frame size should I use? ===<br />
<br />
A '''20 ms''' frame size works well for most applications. Smaller frame sizes may be used to achieve lower latency, but have lower quality at a given bitrate.<br />
<br />
Sizes greater than 20 ms increase latency and are generally beneficial only at fairly low bitrates, or when used to reduce external overhead (e.g. by reducing the number of packets that are sent). For file encoding, using a frame size larger than 20 ms will usually result in '''worse''' quality for the same bitrate because it constrains the encoder in the decisions it can make.<br />
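<br />
To give a feel for the external overhead mentioned above, this sketch estimates the bitrate consumed by packet headers alone at several frame sizes. The 40-byte figure is an assumption (IPv4 + UDP + RTP headers with no extensions); other transports will differ.<br />

```python
# Sketch: header overhead as a function of frame size.
HEADER_BYTES = 40  # assumed: IPv4 (20) + UDP (8) + RTP (12), no extensions


def overhead_kbps(frame_ms, header_bytes=HEADER_BYTES):
    """Bitrate in kb/s spent on headers when one packet is sent per frame."""
    packets_per_second = 1000 / frame_ms
    return packets_per_second * header_bytes * 8 / 1000


for frame_ms in (10, 20, 40, 60):
    print(frame_ms, "ms:", round(overhead_kbps(frame_ms), 2), "kb/s")
```

Under this assumption, headers cost 16 kb/s at 20 ms frames, which can exceed the audio payload itself at low bitrates; that is the case where longer frames pay off.<br />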
<br />
=== Forward Error correction (FEC) doesn't appear to do anything! HELP! ===<br />
<br />
The in-band FEC feature of Opus helps reduce the harm of packet loss by encoding some information about the prior packet.<br />
<br />
In order to make use of in-band FEC, the receiving application must delay its output by at least one frame, so that when a packet is lost it can pass the ''next'' packet to the decoder with the decode_fec argument set in order to reconstruct the missed frame. This works best if it's integrated with a jitter buffer.<br />
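<br />
The receive-side logic can be sketched as follows. This is only an illustration of the decision a jitter buffer has to make, not libopus code: <tt>plan_decode_calls</tt> and its tuple format are hypothetical, and the one-frame delay is modelled simply by being allowed to look at the packet after the current one.<br />

```python
# Sketch: deciding how to decode each frame when some packets are lost.
def plan_decode_calls(packets):
    """Return one (frame, packet_index, decode_fec) tuple per frame.

    packets[i] holds the payload for frame i, or None if it was lost.
    packet_index is the packet handed to the decoder (None means the
    decoder is called without data so that it conceals the loss).
    """
    calls = []
    for i, packet in enumerate(packets):
        if packet is not None:
            # Normal case: decode the packet that belongs to this frame.
            calls.append((i, i, False))
        elif i + 1 < len(packets) and packets[i + 1] is not None:
            # Frame i is lost but the next packet arrived: ask the decoder
            # for the FEC copy of frame i stored inside packet i + 1.
            calls.append((i, i + 1, True))
        else:
            # Nothing usable: fall back to packet loss concealment.
            calls.append((i, None, False))
    return calls


received = [b"p0", None, b"p2", None, None, b"p5"]
for call in plan_decode_calls(received):
    print(call)
```

Note that packet 2 is used twice in this example: once with decode_fec set to recover frame 1, and once normally to decode frame 2.<br />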
<br />
FEC is only used by the encoder under certain conditions:<br />
* the feature must be enabled via the '''OPUS_SET_INBAND_FEC''' CTL<br />
* the encoder must be told to expect loss via the '''OPUS_SET_PACKET_LOSS_PERC''' CTL<br />
* the codec must be operated in any of the '''Linear Prediction''' or '''Hybrid''' modes<br />
<br />
Frame durations shorter than 10 ms and very high bitrates will use the MDCT modes, where FEC is not available.<br />
<br />
Even when FEC is not used, telling the encoder about the expected level of loss will help it make more intelligent decisions. By default, the implementation assumes there is no loss.<br />
<br />
=== I can't use malloc or much stack on my embedded platform. How do I make Opus work? ===<br />
<br />
A normal build of libopus only uses <tt>malloc/free</tt> in the <tt>_create()</tt> and <tt>_destroy()</tt> calls, making it safe for realtime use as long as the codec state is pre-created.<br />
<br />
To build Opus without the references to <tt>malloc/free</tt>, you must:<br />
<br />
* use <tt>init()</tt> calls rather than <tt>create()</tt> calls in your application<br />
* compile with <tt>CFLAGS="-DOVERRIDE_OPUS_ALLOC -DOVERRIDE_OPUS_FREE -D'opus_alloc(x)=NULL' -D'opus_free(x)=NULL' "</tt>.<br />
<br />
If libopus is built with <tt>-DNONTHREADSAFE_PSEUDOSTACK</tt> (instead of <tt>VAR_ARRAYS</tt>, or <tt>USE_ALLOCA</tt>), it will use a user-provided block of heap instead of stack for many things, resulting in much lower stack usage.<br><br />
This makes the resulting library '''non-threadsafe''' and is '''not recommended''' on anything except limited embedded platforms.<br />
<br />
=== How can I ensure that my software interoperates with other software implementing Opus? ===<br />
<br />
For applications using Ogg files, there are some [https://people.xiph.org/~greg/opus_testvectors/ Ogg Opus testvectors] to test decoders and you can test encoders with opusdec. For RTP applications, the opusrtp tool can be useful.<br />
<br />
In general, here's a list of specific issues to check:<br />
* Can your application handle all frame sizes, including changing the frame size from frame to frame?<br />
* Does your application react properly to lost packets, by calling the decoder with a NULL packet?<br />
<br />
=== What is the complexity of Opus? ===<br />
<br />
The complexity of Opus varies by a large amount based on the settings used.<br />
<br />
It depends on the mode, audio bandwidth, number of channels, and even a "complexity knob" that can trade complexity for quality. It will run easily on any recent PC or smartphone. <br />
<br />
For slower embedded CPUs/DSPs, the amount of CPU required will vary depending on the configuration and the exact CPU, so you will need to experiment. Do not expect Opus to run quickly on really slow devices like 8-bit micro-controllers.<br />
<br />
=== Opus is using too much CPU for my application. What can I do? ===<br />
<br />
First, don't panic, and don't start writing assembly just yet.<br />
<br />
It's possible that you're just not using the right set of options.<br />
<br />
If you're targeting an embedded/mobile platform, chances are the fixed-point build will be faster, so make sure you're using '''--enable-fixed-point''' or defining '''FIXED_POINT''' in the build system.<br />
<br />
Opus also has a complexity option that can trade quality for complexity. The default is highest quality and highest complexity. You can control this using '''OPUS_SET_COMPLEXITY()''' (see the '''[https://www.opus-codec.org/docs/ Documentation]''' for details).<br />
<br />
If all else fails and you need to optimize the Opus code, see the next question.<br />
<br />
=== I would like to optimize/improve/help with Opus. Where should I start? ===<br />
<br />
Please '''[https://www.opus-codec.org/contact/ contact us]''' before you start, or at least before you get too far.<br />
<br />
This will help coordinate the efforts made on Opus and reduce the probability of wasting your time on duplicated effort or going down the wrong path. More details in the '''[[OpusContributing|contributing page]]'''.<br />
<br />
=== Does Opus have an echo canceller like Speex does? ===<br />
<br />
Echo cancellation is completely independent from codecs.<br />
<br />
You can use any echo canceller (including the one from libspeexdsp) along with Opus.<br />
<br />
That being said, among the free acoustic echo cancellers (AEC) we're aware of, the best is probably the Google AEC from the [https://code.google.com/p/webrtc/ WebRTC codebase].<br />
<br />
=== How do I get the duration of a .opus file? ===<br />
<br />
Use '''[https://mf4.xiph.org/jenkins/view/opus/job/opusfile-autotools/ws/doc/html/group__stream__info.html op_pcm_total()]''' from '''[https://opus-codec.org/downloads/ libopusfile]'''.<br />
<br />
If you want to implement this yourself, you need to:<br />
* Read the BOS (Beginning Of Stream) pages to enumerate the serial numbers of all concurrently multiplexed streams, identify the Opus stream you want, and get its preskip value.<br />
* Read up through the first complete audio data page to compute the starting granule position (since the timestamps might not start at 0, e.g., if the file was captured from a live stream that was joined after the start).<br />
* Seek near the end of a file and look for a page with the same serial number as found in the headers (just under 64 kB from the end should be sufficient to ensure you find a page, assuming the Opus data is not multiplexed with another stream, or there is no trailing garbage in the file).<br />
* If you find a page whose serial number was not included in the original set of BOS pages, you have a chained stream. You need to bisect the file to identify the end of the first chain and the start of the next, and repeat this process for each link in the chain.<br />
* If you don't find any pages at all, or find a page whose serial number was included in the original set of BOS pages, but was not the serial number of the Opus stream you want, back up and try again (being careful to avoid rescanning the same data, which can produce quadratic worst-case complexity).<br />
* If you find a page whose serial number matches the Opus stream you want, look at its final granule position, and compute the total duration (in seconds) as (final_granule_position - initial_granule_position - preskip)/48000.0.<br />
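<br />
The final step above reduces to a one-line calculation. The sketch below assumes the three values have already been extracted from the file; the example numbers are made up for illustration and the function names are not part of libopusfile.<br />

```python
# Sketch: duration of one link from its granule positions and pre-skip.
def opus_duration_seconds(final_granule, initial_granule, preskip):
    """Duration in seconds; granule positions always count 48 kHz samples."""
    return (final_granule - initial_granule - preskip) / 48000.0


def format_duration(seconds):
    """Render a duration as minutes and seconds, e.g. 58m55.000s."""
    minutes, secs = divmod(seconds, 60)
    return "%dm%06.3fs" % (minutes, secs)


# Hypothetical values: timestamps start at 0 and the pre-skip is 312 samples.
duration = opus_duration_seconds(169680312, 0, 312)
print(duration, format_duration(duration))
```

The division is always by 48000, regardless of the sampling rate of the original input.<br />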
<br />
=== Why don't you store the duration in the header? Isn't all of that slow and complicated? ===<br />
<br />
Computing the duration directly from the file contents allows files to be written in a single pass, without any seeking, which is necessary for live streaming. Chaining also simplifies live streaming, as you can just pipe multiple files into the same network connection, with all associated metadata updates, etc., and the results are still valid .opus files (contrast with the '''[http://www.smackfu.com/stuff/programming/shoutcast.html hacks used to add metadata to MP3 streams]''').<br />
<br />
Opening a typical .opus file, which is not multiplexed and not chained, and computing the duration over the network requires just one extra HTTP request, which can proceed in parallel with the buffering in the main request. This is the behavior you will get from libopusfile's HTTP backend by default.<br />
<br />
Enumeration of chain boundaries can be expensive in files with many links, but in our testing libopusfile used nearly an order of magnitude fewer seeks to do this than some other media frameworks (at the time). Storing a duration in a header wouldn't solve this, since every link in a chain has its own, independent headers. If the cost of chain enumeration is a problem, the best way to avoid it is to store the links in separate files (i.e., don't use chaining).<br />
<br />
=== How do I seek in a .opus file? ===<br />
<br />
Use '''[https://mf4.xiph.org/jenkins/view/opus/job/opusfile-autotools/ws/doc/html/group__stream__seeking.html op_pcm_seek() or op_raw_seek()]''' from '''[https://opus-codec.org/downloads/ libopusfile]'''.<br />
<br />
If you want to implement seeking yourself, you need to:<br />
* Identify the link that contains the target (if you have a chained file).<br />
* Move the target back by 80 ms to ensure you will get enough pre-roll data (to ensure the decoder will have converged by the time you reach the target), as recommended by '''[https://tools.ietf.org/html/rfc7845 RFC 7845]'''.<br />
* Estimate the location of the last audio data page with a completed packet prior to the adjusted target, using the duration and size (in bytes) of the link.<br />
* Seek to that location and scan forward until you find an audio data page with a completed packet (that contains a valid granule position).<br />
* If you think you are sufficiently close to the adjusted target, scan forward until you find the next audio data page with a completed packet.<br />
* If the adjusted target lies between the first audio data page with a completed packet you found and the next one, stop. You can decode forward from here and start playing when you reach your (original unadjusted) target.<br />
* Otherwise, go back and re-estimate the seek location using the granule positions and file offsets of the page(s) you just found.<br />
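<br />
Two of the steps above, the pre-roll adjustment and the estimate of where to seek, can be sketched as follows. This is a simplified linear interpolation under the assumption that granule position grows roughly in proportion to file offset; it is not libopusfile's actual algorithm, and both function names are hypothetical.<br />

```python
# Sketch: pre-roll adjustment and a linear guess at the seek offset.
PREROLL_SAMPLES = 3840  # 80 ms at 48 kHz


def adjusted_target(target_granule, first_granule):
    """Back the target up by the pre-roll without leaving the link."""
    return max(first_granule, target_granule - PREROLL_SAMPLES)


def estimate_offset(target, lo_offset, lo_granule, hi_offset, hi_granule):
    """Guess a byte offset for target between two known (offset, granule) points."""
    if hi_granule <= lo_granule:
        return lo_offset
    fraction = (target - lo_granule) / (hi_granule - lo_granule)
    fraction = min(1.0, max(0.0, fraction))  # clamp to the known interval
    return lo_offset + int(fraction * (hi_offset - lo_offset))


# Hypothetical link: 10 MB of data covering one hour of audio.
target = adjusted_target(30 * 60 * 48000, 0)
print(target, estimate_offset(target, 0, 0, 10_000_000, 3600 * 48000))
```

After each seek, the page actually found replaces one of the two known points, so the interval shrinks and the next estimate improves.<br />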
<br />
libopusfile includes fallbacks to prevent pathological worst-case behavior when its guesses are repeatedly wrong. Weighted bisection can degrade to a linear scan, but libopusfile's worst case is within a constant factor of naive bisection (i.e., logarithmic). We have only ever observed such pathological behavior in files we manually constructed to trigger it.<br />
<br />
libopusfile also takes shortcuts when the target location is near the current position, to make small seeks cheaper. In the best case it can loop forever over very short files whose data is contained in a single page (e.g., less than 1 second long with default encoder settings) without any seeking at all.<br />
<br />
You can find more information on seeking in multiplexed files when you want to play more than just a single Opus stream '''[[GranulePosAndSeeking|on this page]]'''.<br />
<br />
=== Wouldn't it be better to build an index? ===<br />
<br />
As with file durations, an index at the beginning of the file is incompatible with live streaming. It also means more data has to be fetched before a file can start playing over the network, because you must read past the index even when you don't intend to seek. The index could be stored at the end (which even still allows encoding the file in a single pass), but this requires one (or more) extra seeks to read the index (especially if its exact location at the end is not known), either on file open or on first seek. Unlike the final timestamp, which is small and fixed in size, an index grows with the file duration, and can have unbounded size. It is also easy for an index to become out of sync with a file that has been edited or damaged, in which case seeking will simply fail. You can seek in a truncated .opus download without issues.<br />
<br />
In practice, bisection seeking on VBR audio achieves performance that is very nearly as good as seeking with an index, without any of these drawbacks. libopusfile provides a test program called seeking_example which can be used to benchmark the performance on your files.<br />
<br />
On a 96 kbps VBR file nearly one hour long (the second movement of Mahler's Symphony No. 8 "Symphony of a Thousand"):<br />
<br />
Testing exact PCM seeking to random places in 169680000 samples (58m55.000s)...<br />
Total seek operations: 1020 (1.020 per exact seek, 2 maximum).<br />
<br />
On a chained file formed by concatenating the eight test vectors for the currently supported channel layouts in mapping family 1:<br />
<br />
Opened file containing 8 links with 18 seeks (2.250 per link).<br />
Testing exact PCM seeking to random places in 2759064 samples (57.481s)...<br />
Total seek operations: 946 (0.946 per exact seek, 2 maximum).<br />
<br />
That is, the number of physical seeks required is almost always 1, every once in a while 2, and in short files, sometimes even 0.<br />
<br />
[[Category:Opus]]</div>Derfhttps://wiki.xiph.org/index.php?title=OpusFAQ&diff=16599OpusFAQ2017-07-07T20:32:46Z<p>Derf: /* Wouldn't it be better to build an index? */</p>
<hr />
<div>[[Image:Opus logo trans.png]]<br />
<br />
If you are looking for info not covered in this FAQ, try the '''[https://opus-codec.org main Opus website]''' or the pages included in the '''[[:Category:Opus|Opus category]]''' of this wiki.<br />
<br />
== General Questions ==<br />
<br />
=== What is Opus? Who created it? ===<br />
<br />
Opus is a totally open, royalty-free, highly versatile audio codec.<br />
<br />
It is primarily designed for interactive speech and music transmission over the Internet, but is also applicable to storage and streaming applications. It incorporates technology from Skype's '''[https://en.wikipedia.org/wiki/SILK SILK]''' codec and Xiph.Org's '''[http://celt-codec.org/ CELT]''' codec. It has been standardized by the '''[https://www.ietf.org/ Internet Engineering Task Force]''' (IETF) as '''[https://tools.ietf.org/html/rfc6716 RFC 6716]'''. <br />
<br />
Opus has been in development since early 2007. Programmers associated with '''[https://xiph.org/ Xiph.Org]''', '''[https://www.skype.com/ Skype]''' and several other organizations have contributed to its development and to the standardization process as part of the '''[https://datatracker.ietf.org/wg/codec/charter/ IETF's Codec Working Group]'''.<br />
<br />
=== How does Opus compare to other codecs? ===<br />
<br />
Opus is distinguished from most high quality formats (eg: [[Vorbis]], AAC, MP3) by having '''[https://tools.ietf.org/html/rfc6716#section-2 low delay]''' (5 ~ 66.5 ms) and distinguished from most low delay formats (eg: [[Speex]], G.711, GSM) by supporting '''[https://tools.ietf.org/html/rfc6716#section-2.1.1 high audio quality]''' (supports narrow-band all the way to full-band audio).<br />
<br />
It '''[https://opus-codec.org/comparison meets or exceeds existing codecs' quality]''' across a wide range of bitrates, and it operates at lower delay than virtually any existing compressed format.<br />
<br />
Most importantly, the Opus format and its reference implementation are both available under '''[https://opus-codec.org/license/ liberal, royalty-free licenses]'''.<br /><br />
This makes it:<br />
* easy to adopt<br />
* compatible with free software<br />
* suitable for use as part of the basic infrastructure of the Internet<br />
<br />
=== Does Opus make all those other lossy codecs obsolete? ===<br />
<br />
Yes.<br />
<br />
From a technical point of view (loss, delay, bitrates, ...) Opus renders '''[[Speex]]''' obsolete and should also replace '''[[Vorbis]]''' and the common proprietary codecs too (e.g. AAC, MP3, ...).<br />
<br />
=== Will Opus replace Vorbis in video files? ===<br />
<br />
For '''[[Ogg]]''' video files (which use the '''[[Theora]]''' video codec), you ''can'' use Opus instead of Vorbis, but the overall size reduction will be minimal and it will break compatibility with existing players.<br />
<br />
For WebM video files, the convention is to use the '''[http://www.webmproject.org/vp9/ VP9 video codec]''' when using Opus as an audio codec.<br />
<br />
=== How do I use Opus? ===<br />
<br />
For now, the best way to '''encode''' audio into Opus files is to use the '''opusenc''' command-line tool from the '''[https://opus-codec.org/downloads/ opus-tools package]'''.<br />
<br />
If you want to encode many files at once (e.g. your music library), try the applications listed in the '''[[OpusSupport|Opus Support]]''' page.<br />
<br />
For rough guidelines on encoding settings, see the '''[[Opus Recommended Settings]]''' page.<br />
<br />
=== What programs support Opus? ===<br />
<br />
Opus decoding support is now included in '''[http://caniuse.com/opus some Internet browsers]''' and '''[[OpusSupport|many applications]]''', including '''[https://www.mozilla.org/firefox Firefox]''', '''[https://www.foobar2000.org/ foobar2000]''' and '''[https://www.videolan.org/vlc/ VLC]''', as well as in frameworks such as '''[https://gstreamer.freedesktop.org/ GStreamer]''' and '''[https://ffmpeg.org/ FFmpeg]'''.<br />
<br />
For real-time applications, Opus support is available in '''[https://www.webrtc.org/ Google's WebRTC codebase]'''.<br />
<br />
Opus is a relatively new codec: '''[[OpusSupport|many more applications]]''' will support it in the near future.<br />
<br />
=== Does Opus support higher sampling rates, such as 96 kHz or 192 kHz? ===<br />
<br />
Yes and no.<br />
<br />
Opus encoding tools like opusenc will happily encode input files that are sampled at 96 or 192 kHz.<br />
<br />
However, files at these rates are internally '''converted to 48 kHz''' and then only frequencies '''up to 20 kHz''' are encoded.<br />
<br />
The reason is simple: lossy codecs are designed to preserve audible details while discarding irrelevant information. Since the human ear can only hear up to 20 kHz at best (usually lower than that), frequency content above 20 kHz is the first thing to go.<br />
<br />
See Monty's '''[https://people.xiph.org/~xiphmont/demo/neil-young.html article]''' for more details.<br />
<br />
If you want a codec to handle higher sampling rates losslessly, use '''[[FLAC]]'''!<br />
<br />
=== What are the licensing requirements? ===<br />
<br />
The reference Opus source code is released under a three-clause BSD license, which is a very permissive Open Source license. Commercial use and distribution (including in proprietary software) are permitted, provided that some basic conditions specified in the license are met.<br />
<br />
Opus is also covered by some patents, for which royalty-free usage rights are granted, under conditions that the authors believe are compatible with (hopefully) all open source licenses, including the GPL (v2 and v3).<br />
<br />
See the '''[https://www.opus-codec.org/license/ Opus Licensing]''' page for details.<br />
<br />
=== Why make Opus free? ===<br />
<br />
On the Internet, protocol and codec standards are part of the common infrastructure everyone builds upon.<br />
<br />
Most of the value of a high-quality standard is the innovation and inter-operation provided by the systems built on top of it. When a few parties have monopoly rights to monetize a standard, that infrastructure stops being so common and everyone else has more reason to use their own solution instead, increasing cost and reducing efficiency.<br />
<br />
Imagine a road system where each type of car could only drive on its own manufacturer's pavement. We all benefit from living in a world where all the roads are connected.<br />
<br />
This is why Opus, unlike many codecs, is free.<br />
<br />
=== Is the SILK part of Opus compatible with the SILK implementation shipped in Skype? ===<br />
<br />
No.<br />
<br />
The SILK codec, as submitted by Skype to the IETF, was heavily modified as part of its integration within Opus. The modifications are significant enough that it is not possible to just write a "translator". Even sharing code between Opus and the "old SILK" would be highly complex.<br />
<br />
=== Why not keep the SILK and CELT codecs separate? ===<br />
Opus is more than just two independent codecs with a switch.<br />
<br />
In addition to a [https://en.wikipedia.org/wiki/Linear_predictive_coding Linear Prediction] '''SILK mode''' and an [https://en.wikipedia.org/wiki/Modified_discrete_cosine_transform MDCT] '''CELT mode''', it has a '''hybrid mode''', where speech frequencies up to 8 kHz are encoded with LP while those between 8 and 20 kHz are encoded with MDCT. This is what allows Opus to have such high speech quality around 32 kbps.<br />
<br />
Another advantage of the integration is the ability to switch between these 3 modes seamlessly, without any audible "glitches" and without any out-of-band signalling.<br />
<br />
=== Now that Opus is standardized, will its development stop or can it be further improved? ===<br />
Development will not stop: Opus '''can''' and '''should''' be improved, because unlike most '''[https://en.wikipedia.org/wiki/ITU-T#Key_standards_published_by_ITU ITU-T codecs]''', Opus is only defined in terms of its decoder.<br />
<br />
The encoder can keep evolving as long as the bitstream it produces can be decoded by the reference decoder. This is what made it possible for modern MP3 encoders (e.g. '''[https://en.wikipedia.org/wiki/LAME LAME]''') to improve far beyond the original '''[https://en.wikipedia.org/wiki/L3enc L3enc]''' and '''dist10''' reference implementations.<br />
<br />
Although it is unlikely that Opus encoders will see such a spectacular evolution, we certainly hope that future encoders will become much better than the reference encoder.<br />
<br />
In fact, the 1.1 libopus release significantly improves on the reference encoder's quality. See '''[https://people.xiph.org/~xiphmont/demo/opus/demo3.shtml Monty's demo]''' for more details.<br />
<br />
=== Will all future Opus releases comply with the [https://tools.ietf.org/html/rfc6716 Opus specification]? ===<br />
<br />
Yes.<br />
<br />
=== In what ways is Opus optimized for the Internet? ===<br />
<br />
Opus has good packet loss robustness and concealment, but its optimizations go further.<br />
<br />
One of the first things we were asked when designing Opus was to make the rate '''really''' adaptable, because we never know what kind of rates will be available. This meant not only having a wide range of bitrates, but also being able to vary the bitrate in small increments.<br />
<br />
This is why Opus scales from about '''6''' to '''512 kb/s''', in increments of '''0.4 kb/s''' (one byte with 20 ms frames). Opus can have '''more than 1200 possible bitrates''' while spending only '''11 bits''' signalling the bitrate because UDP already encodes the packet size.<br />
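<br />
The arithmetic behind the step size and the number of bitrates can be checked with a short sketch. It assumes one 20 ms frame per packet and the 1275-byte maximum frame size from RFC 6716; the function and constant names are illustrative only and are not part of any Opus API.<br />
<br />
```python
MAX_FRAME_BYTES = 1275  # largest single Opus frame per RFC 6716 (assumption noted above)

def bitrate_bps(packet_bytes, frame_ms=20.0):
    """Bitrate in bits per second when one packet of `packet_bytes` is sent every `frame_ms` ms."""
    return packet_bytes * 8 * 1000.0 / frame_ms

# Adding one byte to each 20 ms packet is the smallest possible bitrate step.
step = bitrate_bps(1)

# Packet sizes from about 6 kb/s up to the largest frame (integer arithmetic).
min_bytes = 6000 * 20 // (8 * 1000)
sizes = range(min_bytes, MAX_FRAME_BYTES + 1)

print(step, len(sizes), bitrate_bps(MAX_FRAME_BYTES))
```
<br />
Under these assumptions the step is 400 b/s (0.4 kb/s), there are 1261 distinct packet sizes, and the largest frame corresponds to about 510 kb/s.<br />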
<br />
One last aspect is that Opus is simple to transport over RTP, as can be seen from the [https://tools.ietf.org/html/rfc7587 Opus RTP payload format]. For example, it's possible to decode RTP packets without having even seen the SDP or any out-of-band signalling.<br />
<br />
=== What applications for Android can play Opus? ===<br />
<br />
Right now, there are just a few, but the list is growing fast. Please see [https://android.stackexchange.com/q/37970/7425 this question on android.stackexchange.com]. Feel free to suggest other applications.<br />
<br />
=== When will the next version be released? ===<br />
<br />
When it's done. Seriously, we do not know.<br />
<br />
Opus is not a large project with a fixed release schedule.<br />
<br />
That being said, our '''[https://www.opus-codec.org/downloads/ pre-releases]''' and even the git repositories ('''[https://git.xiph.org/?p=opus.git Xiph]''', '''[https://github.com/xiph/opus GitHub]''') are pretty stable and, given proper testing (which you should always do anyway), are safe to distribute.<br />
<br />
Just be aware that the API of new features (that have never been included in a stable release) could potentially still change.<br />
<br />
== Software Developers' Questions ==<br />
<br />
=== On what platforms does Opus run? ===<br />
<br />
The Opus code base is written in C89 and should run on the vast majority of recent (and not so recent) CPUs.<br />
<br />
Some of the platforms '''[https://mf4.xiph.org/jenkins/view/opus/ on which Opus has been tested]''' include x86, x86-64, ARM, Itanium, Blackfin, and SPARC.<br />
<br />
=== Is there a fixed-point implementation? ===<br />
<br />
Yes.<br />
<br />
The fixed-point and floating-point decoder and encoder implementations are part of the same code base.<br />
<br />
The code defaults to float, so you need to configure with '''--enable-fixed-point''' (or define '''FIXED_POINT''' if not using the configure script) to build the code for fixed-point.<br />
<br />
=== Which implementation should I use? ===<br />
<br />
While the implementation in RFC 6716 is what ''defines'' the standard, it is likely not the best and most up-to-date implementation.<br />
<br />
The [https://opus-codec.org/ Opus] website was set up for the purpose of continually improving the implementation — in terms of speed, encoding quality, device compatibility, etc — while still conforming to the standard.<br />
<br />
All Opus implementations are compatible by definition.<br />
<br />
=== How is supporting Opus different from supporting Speex/G.711/MP3? ===<br />
<br />
Opus has variable frame durations which can change on the fly, so an Opus decoder needs to be ready to accept packets with durations that are '''any multiple of 2.5 ms''' up to a '''maximum of 120 ms'''.<br />
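<br />
As a rough illustration of what this means for a receiver, the sketch below checks a packet duration expressed in samples at 48 kHz and sizes an output buffer for the longest possible packet. The function names are made up for this example.<br />
<br />
```python
SAMPLES_PER_2_5_MS = 120    # 2.5 ms at 48 kHz
MAX_PACKET_SAMPLES = 5760   # 120 ms at 48 kHz

def is_valid_packet_duration(samples_48k):
    """True if the duration is a non-zero multiple of 2.5 ms, up to 120 ms."""
    return (0 < samples_48k <= MAX_PACKET_SAMPLES
            and samples_48k % SAMPLES_PER_2_5_MS == 0)

def pcm_buffer_samples(channels):
    """Samples needed to hold the longest possible packet for `channels` channels."""
    return MAX_PACKET_SAMPLES * channels
```
<br />
For example, a 20 ms packet is 960 samples at 48 kHz and is valid, while a stereo receiver that wants to accept any packet needs room for 11520 samples.<br />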
<br />
The Opus encoder and decoder do not need to have matched sampling rates or channel counts. It is recommended to always just decode at the highest rate the hardware supports (e.g. 48 kHz stereo) so the user gets the full quality of whatever the far end is sending.<br />
<br />
=== My application doesn't work. Can anyone help me? ===<br />
<br />
It's possible to get help, but before doing so, there are a few basic things to try:<br />
<br />
* Implement your application with uncompressed audio instead of Opus. If it still doesn't work, then the problem isn't related to Opus.<br />
* Read the [https://www.opus-codec.org/docs/ Opus documentation].<br />
* Read the [https://git.xiph.org/?p=opus.git;a=blob;f=src/opus_demo.c opus_demo.c] source code to see how to use the encoder and decoder.<br />
<br />
If you still can't solve the problem, the best option is to ask for help on the [http://lists.xiph.org/mailman/listinfo/opus mailing list] or on the '''#opus''' IRC channel on '''irc.freenode.net'''.<br />
<br />
=== How do I report a bug? ===<br />
<br />
If you think you have found a bug in Opus (and not in your application), please [https://trac.xiph.org/newticket?component=Opus file a bug report].<br />
<br />
Please include a way for us to reproduce the problem. The best way to do this is to provide an input file, along with the opusenc/opusdec/opus_demo command line that causes the bug to occur.<br />
<br />
If the bug cannot be triggered by the command line tools, please provide a simple patch or C file that can help reproduce it. Please also provide any other relevant information, such as OS, CPU, build options, etc.<br />
<br />
Don't hesitate to also contact us on the [http://lists.xiph.org/mailman/listinfo/opus mailing list] or on [irc://irc.freenode.net/opus IRC].<br />
<br />
=== What is Opus Custom? ===<br />
<br />
Opus Custom is an '''optional''' part of the Opus standard that allows for sampling rates other than 8, 12, 16, 24, or 48 kHz and frame sizes other than multiples of 2.5 ms.<br />
<br />
Opus Custom requires additional out-of-band signalling that Opus does not normally require and disables many of Opus' coding modes. Also, because it is an optional part of the specification, using Opus Custom may lead to compatibility problems.<br />
<br />
For these reasons, '''its use is discouraged''' outside of very specific applications. <br />
<br />
You may want to use Opus Custom for:<br />
<br />
* ultra-low-delay applications, where synchronization with the soundcard buffer is important. <br />
* low-power embedded applications, where compatibility with others is not important.<br />
<br />
For almost all other types of applications, Opus Custom should not be used.<br />
<br />
=== How do I use 44.1 kHz or some other sampling rate not directly supported by Opus? ===<br />
<br />
Tools which read or write Opus should inter-operate with other sampling rates by transparently performing sample rate conversion behind the scenes whenever necessary. In particular, software developers should not use Opus Custom for 44.1 kHz support, except in the very specific circumstances outlined above.<br />
<br />
Note that it's generally preferable for a decoder to output at 48 kHz, even when you know the original input was 44.1 kHz. This is not only because you can skip resampling, but also because many cheaper audio interfaces have poor quality output for 44.1 kHz.<br />
<br />
The '''[https://opus-codec.org/downloads/ opus-tools]''' package source code contains a small, high quality, high performance, BSD licensed '''[https://github.com/xiph/opus-tools/blob/master/src/resample.c resampler]''' which can be used where resampling is required.<br />
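<br />
The sketch below shows one possible way for a tool to decide whether resampling is needed. The policy of picking the lowest supported rate at or above the input rate is an assumption made for illustration, not a description of what any particular tool does, and the function names are hypothetical.<br />
<br />
```python
from fractions import Fraction

# Sampling rates Opus supports directly, as listed in this FAQ.
OPUS_RATES = (8000, 12000, 16000, 24000, 48000)

def coding_rate(input_rate):
    """Lowest directly supported rate that is >= input_rate, capped at 48 kHz."""
    for rate in OPUS_RATES:
        if input_rate <= rate:
            return rate
    return OPUS_RATES[-1]

def resample_ratio(input_rate):
    """Exact output/input ratio a resampler would need (1 means no resampling)."""
    return Fraction(coding_rate(input_rate), input_rate)
```
<br />
With this policy, 44.1 kHz input is resampled to 48 kHz by a ratio of 160/147, while 16 kHz input needs no resampling at all.<br />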
<br />
=== But won't the resampler hurt the quality? Isn't it better to use 44.1 kHz directly? ===<br />
<br />
Not really. The quality degradation caused by any reasonable resampler (SoX, libspeexdsp, libsamplerate, ...) is far less than the distortion caused by the best lossy codec at its highest bitrate. If you can't tolerate the quality degradation caused by a good 44.1 ↔ 48 kHz resampler, then you shouldn't be using a lossy codec in the first place. Similarly, the extra CPU spent in the resampler is small compared to the rest of the codec. Not only that, but many soundcards only support 48 kHz on playback, so players can play decoded Opus audio directly, whereas audio in other formats (e.g. a 44.1 kHz MP3) has to be resampled to 48 kHz first. So effectively, Opus is only shifting the burden of resampling from the decoder side to the encoder side.<br />
<br />
One advantage of supporting only one internal rate is that it makes it possible for Opus to support many features, including efficient speech compression (through SILK) and real-time applications. It also means all the quality tuning effort can be spent on a single configuration, which helps bring even better quality.<br />
<br />
=== How is the bitrate setting used in VBR mode? ===<br />
<br />
Variable bitrate (VBR) mode allows the bitrate to automatically vary over time based on the audio being encoded, in order to achieve a consistent quality.<br />
<br />
The bitrate setting controls the desired quality, on a scale that is calibrated to closely approximate the average bitrate that would be obtained over a large and diverse collection of audio. The actual bitrate of any particular audio stream may be higher or lower than this average.<br />
<br />
=== What frame size should I use? ===<br />
<br />
A '''20 ms''' frame size works well for most applications. Smaller frame sizes may be used to achieve lower latency, but have lower quality at a given bitrate.<br />
<br />
Sizes greater than 20 ms increase latency and are generally beneficial only at fairly low bitrates, or when used to reduce external overhead (e.g. by reducing the number of packets that are sent). For file encoding, using a frame size larger than 20 ms will usually result in '''worse''' quality for the same bitrate because it constrains the encoder in the decisions it can make.<br />
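<br />
To get a feel for the external overhead mentioned above, the following sketch estimates the bitrate spent on packet headers for a given packet duration. It assumes one packet per frame and plain IPv4, UDP and RTP headers without options or extensions (40 bytes in total); your transport may differ, and the names are illustrative.<br />
<br />
```python
HEADER_BYTES = 20 + 8 + 12  # IPv4 + UDP + RTP, no options or extensions (assumption)

def packets_per_second(frame_ms):
    """Number of packets sent per second when each carries `frame_ms` ms of audio."""
    return 1000.0 / frame_ms

def overhead_bps(frame_ms, header_bytes=HEADER_BYTES):
    """Bits per second spent on packet headers rather than on audio."""
    return header_bytes * 8 * packets_per_second(frame_ms)
```
<br />
Under these assumptions, 10 ms packets cost 32 kb/s in headers, 20 ms packets cost 16 kb/s, and 40 ms packets cost 8 kb/s, which is why longer packets mainly pay off at low audio bitrates.<br />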
<br />
=== Forward Error correction (FEC) doesn't appear to do anything! HELP! ===<br />
<br />
The in-band FEC feature of Opus helps reduce the harm of packet loss by encoding some information about the prior packet.<br />
<br />
In order to make use of in-band FEC, the receiving application must delay its output by at least one frame, so that when a packet is lost it can call the decoder with the decode_fec argument set on the ''next'' packet in order to reconstruct the missing frame. This works best if it's integrated with a jitter buffer.<br />
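<br />
The receiver-side logic can be sketched as follows. This is only an outline of the control flow: <tt>decode</tt> is a hypothetical stand-in for the real decoder call, and a real application would pull packets from a jitter buffer rather than from a list.<br />
<br />
```python
def decode_with_fec(packets, decode):
    """Decode packets in playout order, using in-band FEC where possible.

    packets: list in playout order, with None marking a lost packet.
    decode(packet, fec): hypothetical stand-in for the real decoder call.
        fec=True asks for the frame *before* `packet`, rebuilt from its FEC data;
        packet=None asks for packet loss concealment.
    """
    frames = []
    for i, packet in enumerate(packets):
        if packet is not None:
            frames.append(decode(packet, False))
            continue
        # This packet is lost: peek at the next one, which is why the output
        # has to lag the network by at least one frame.
        following = packets[i + 1] if i + 1 < len(packets) else None
        if following is not None:
            frames.append(decode(following, True))   # recover from FEC data
        else:
            frames.append(decode(None, False))       # nothing to recover from
    return frames
```
<br />
In this sketch, the first special case corresponds to calling the decoder with the decode_fec argument set on the next packet, and the second to calling it with a NULL packet.<br />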
<br />
FEC is only used by the encoder under certain conditions:<br />
* the feature must be enabled via the '''OPUS_SET_INBAND_FEC''' CTL<br />
* the encoder must be told to expect loss via the '''OPUS_SET_PACKET_LOSS_PERC''' CTL<br />
* the codec must be operated in any of the '''Linear Prediction''' or '''Hybrid''' modes<br />
<br />
Frame durations shorter than 10 ms and very high bitrates will use the MDCT modes, where FEC is not available.<br />
<br />
Even when FEC is not used, telling the encoder about the expected level of loss will help it make more intelligent decisions. By default, the implementation assumes there is no loss.<br />
<br />
=== I can't use malloc or much stack on my embedded platform. How do I make Opus work? ===<br />
<br />
A normal build of libopus only uses <tt>malloc/free</tt> in the <tt>_create()</tt> and <tt>_destroy()</tt> calls, making it safe for realtime use as long as the codec state is pre-created.<br />
<br />
To build Opus without the references to <tt>malloc/free</tt>, you must:<br />
<br />
* use <tt>_init()</tt> calls rather than <tt>_create()</tt> calls in your application<br />
* compile with <tt>CFLAGS="-DOVERRIDE_OPUS_ALLOC -DOVERRIDE_OPUS_FREE -D'opus_alloc(x)=NULL' -D'opus_free(x)=NULL' "</tt>.<br />
<br />
If libopus is built with <tt>-DNONTHREADSAFE_PSEUDOSTACK</tt> (instead of <tt>VAR_ARRAYS</tt>, or <tt>USE_ALLOCA</tt>), it will use a user-provided block of heap instead of stack for many things, resulting in much lower stack usage.<br><br />
This makes the resulting library '''non-threadsafe''' and is '''not recommended''' on anything except limited embedded platforms.<br />
<br />
=== How can I ensure that my software interoperates with other software implementing Opus? ===<br />
<br />
For applications using Ogg files, there are some [https://people.xiph.org/~greg/opus_testvectors/ Ogg Opus test vectors] to test decoders, and you can test encoders with opusdec. For RTP applications, the opusrtp tool can be useful.<br />
<br />
In general, here's a list of specific issues to check:<br />
* Can your application handle all frame sizes, including changing the frame size from frame to frame?<br />
* Does your application react properly to lost packets, by calling the decoder with a NULL packet?<br />
<br />
=== What is the complexity of Opus? ===<br />
<br />
The complexity of Opus varies by a large amount based on the settings used.<br />
<br />
It depends on the mode, audio bandwidth, number of channels, and even a "complexity knob" that can trade complexity for quality. It will run easily on any recent PC or smartphone. <br />
<br />
For slower embedded CPUs/DSPs, the amount of CPU required will vary depending on the configuration and the exact CPU, so you will need to experiment. Do not expect Opus to run quickly on really slow devices like 8-bit micro-controllers.<br />
<br />
=== Opus is using too much CPU for my application. What can I do? ===<br />
<br />
First, don't panic, and don't start writing assembly just yet.<br />
<br />
It's possible that you're just not using the right set of options.<br />
<br />
If you're targeting an embedded/mobile platform, chances are the fixed-point build will be faster, so make sure you're using '''--enable-fixed-point''' or defining '''FIXED_POINT''' in the build system.<br />
<br />
Opus also has a complexity option that can trade quality for complexity. The default is highest quality and highest complexity. You can control this using '''OPUS_SET_COMPLEXITY()''' (see the '''[https://www.opus-codec.org/docs/ Documentation]''' for details).<br />
<br />
If all else fails and you need to optimize the Opus code, see the next question.<br />
<br />
=== I would like to optimize/improve/help with Opus. Where should I start? ===<br />
<br />
Please '''[https://www.opus-codec.org/contact/ contact us]''' before you start, or at least before you get too far.<br />
<br />
This will help coordinate the efforts made on Opus and reduce the probability of wasting your time on duplicated effort or going down the wrong path. More details in the '''[[OpusContributing|contributing page]]'''.<br />
<br />
=== Does Opus have an echo canceller like Speex does? ===<br />
<br />
Echo cancellation is completely independent from codecs.<br />
<br />
You can use any echo canceller (including the one from libspeexdsp) along with Opus.<br />
<br />
That being said, among the free acoustic echo cancellers (AEC) we're aware of, the best is probably the Google AEC from the [https://code.google.com/p/webrtc/ WebRTC codebase].<br />
<br />
=== How do I get the duration of a .opus file? ===<br />
<br />
Use '''[https://mf4.xiph.org/jenkins/view/opus/job/opusfile-autotools/ws/doc/html/group__stream__info.html op_pcm_total()]''' from '''[https://opus-codec.org/downloads/ libopusfile]'''.<br />
<br />
If you want to implement this yourself, you need to<br />
* Read the BOS (Beginning Of Stream) pages to enumerate the serial numbers of all concurrently multiplexed streams, identify the Opus stream you want, and get its preskip value.<br />
* Read up through the first complete audio data page to compute the starting granule position (since the timestamps might not start at 0, e.g., if the file was captured from a live stream that was joined after the start).<br />
* Seek near the end of a file and look for a page with the same serial number as found in the headers (just under 64 kB from the end should be sufficient to ensure you find a page, assuming the Opus data is not multiplexed with another stream, or there is no trailing garbage in the file).<br />
* If you find a page whose serial number was not included in the original set of BOS pages, you have a chained stream. You need to bisect the file to identify the end of the first chain and the start of the next, and repeat this process for each link in the chain.<br />
* If you don't find any pages at all, or find a page whose serial number was included in the original set of BOS pages, but was not the serial number of the Opus stream you want, back up and try again (being careful to avoid rescanning the same data, which can produce quadratic worst-case complexity).<br />
* If you find a page whose serial number matches the Opus stream you want, look at its final granule position, and compute the total duration (in seconds) as (final_granule_position - initial_granule_position - preskip)/48000.0.<br />
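<br />
The last two steps can be sketched in a few lines. The sketch assumes the fixed 27-byte Ogg page header layout from RFC 3533 and, unlike a robust implementation, does not verify page checksums or handle multiplexed or chained files; the function names are illustrative.<br />
<br />
```python
import struct

# Fixed part of an Ogg page header (RFC 3533): capture pattern, version,
# header type, granule position, serial number, page sequence number,
# checksum, number of segments. All fields are little-endian.
OGG_HEADER = struct.Struct("<4sBBqIIIB")

def last_granule_position(tail, serial):
    """Granule position of the last page in `tail` (bytes read from near the
    end of the file) that belongs to stream `serial`, or None if not found."""
    found = None
    pos = tail.find(b"OggS")
    while pos != -1 and pos + OGG_HEADER.size <= len(tail):
        _, version, _, granule, page_serial, _, _, _ = OGG_HEADER.unpack_from(tail, pos)
        # A granule position of -1 means no packet finishes on this page.
        if version == 0 and page_serial == serial and granule != -1:
            found = granule
        pos = tail.find(b"OggS", pos + 4)
    return found

def duration_seconds(initial_granule_position, final_granule_position, preskip):
    """Total duration, using the formula given in the last step above."""
    return (final_granule_position - initial_granule_position - preskip) / 48000.0
```
<br />
For example, a stream that starts at granule position 0 with a preskip of 312 samples and ends at granule position 169680312 lasts 3535 seconds (58m55s).<br />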
<br />
=== Why don't you store the duration in the header? Isn't all of that slow and complicated? ===<br />
<br />
Computing the duration directly from the file contents allows files to be written in a single pass, without any seeking, which is necessary for live streaming. Chaining also simplifies live streaming, as you can just pipe multiple files into the same network connection, with all associated metadata updates, etc., and the results are still valid .opus files (contrast with the '''[http://www.smackfu.com/stuff/programming/shoutcast.html hacks used to add metadata to MP3 streams]''').<br />
<br />
Opening a typical .opus file, which is not multiplexed and not chained, and computing the duration over the network requires just one extra HTTP request, which can proceed in parallel with the buffering in the main request. This is the behavior you will get from libopusfile's HTTP backend by default.<br />
<br />
Enumeration of chain boundaries can be expensive in files with many links, but in our testing libopusfile used nearly an order of magnitude fewer seeks to do this than some other media frameworks (at the time). Storing a duration in a header wouldn't solve this, since every link in a chain has its own, independent headers. If the cost of chain enumeration is a problem, the best way to avoid it is to store the links in separate files (i.e., don't use chaining).<br />
<br />
=== How do I seek in a .opus file? ===<br />
<br />
Use '''[https://mf4.xiph.org/jenkins/view/opus/job/opusfile-autotools/ws/doc/html/group__stream__seeking.html op_pcm_seek() or op_raw_seek()]''' from '''[https://opus-codec.org/downloads/ libopusfile]'''.<br />
<br />
If you want to implement seeking yourself, you need to<br />
* Identify the link that contains the target (if you have a chained file).<br />
* Adjust the target by 80 ms to ensure you will get enough pre-roll data (to ensure the decoder will have converged by the time you reach the target), as recommended by '''[https://tools.ietf.org/html/rfc7845 RFC 7845]'''.<br />
* Estimate the location of the last audio data page with a completed packet prior to the adjusted target, using the duration and size (in bytes) of the link.<br />
* Seek to that location and scan forward until you find an audio data page with a completed packet (that contains a valid granule position).<br />
* If you think you are sufficiently close to the adjusted target, scan forward until you find the next audio data page with a completed packet.<br />
* If the adjusted target lies between the first audio data page with a completed packet you found and the next one, stop. You can decode forward from here and start playing when you reach your (original unadjusted) target.<br />
* Otherwise, go back and re-estimate the seek location using the granule positions and file offsets of the page(s) you just found.<br />
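<br />
Two of the steps above, the pre-roll adjustment and the estimate of where to seek, come down to simple arithmetic. The sketch below shows one way to write them, assuming granule positions in 48 kHz samples and linear interpolation between two known points; it is not libopusfile's actual implementation, and the names are made up for this example.<br />
<br />
```python
PRE_ROLL_SAMPLES = 3840  # 80 ms at 48 kHz

def adjusted_target(target_gp, first_gp):
    """Move the target back by 80 ms, without going before the start of the link."""
    return max(first_gp, target_gp - PRE_ROLL_SAMPLES)

def estimate_offset(target_gp, low, high):
    """Guess a byte offset for `target_gp` by interpolating between two known
    (byte_offset, granule_position) points that bracket it."""
    (low_off, low_gp), (high_off, high_gp) = low, high
    if high_gp <= low_gp:
        return low_off
    fraction = (target_gp - low_gp) / (high_gp - low_gp)
    fraction = min(1.0, max(0.0, fraction))  # stay inside the bracket
    return low_off + int(fraction * (high_off - low_off))
```
<br />
The first estimate would use the start and end of the link as the two points. After each seek, the page that was found replaces whichever point lies on the same side of the target, and the estimate is recomputed over the narrower bracket.<br />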
<br />
libopusfile includes fallbacks to prevent pathological worst-case behavior when its guesses are repeatedly wrong. Weighted bisection can degrade to a linear scan, but libopusfile's worst case is within a constant factor of naive bisection (i.e., logarithmic). We have only ever observed such pathological behavior in files we manually constructed to trigger it.<br />
<br />
libopusfile also takes shortcuts when the target location is near the current position, to make small seeks cheaper. In the best case it can loop forever over very short files whose data is contained in a single page (e.g., less than 1 second long with default encoder settings) without any seeking at all.<br />
<br />
You can find more information on seeking in multiplexed files when you want to play more than just a single Opus stream '''[[GranulePosAndSeeking|on this page]]'''.<br />
<br />
=== Wouldn't it be better to build an index? ===<br />
<br />
As with file durations, an index at the beginning of the file is incompatible with live streaming. It also means more data has to be fetched before a file can start playing over the network, because you must read past the index even when you don't intend to seek. The index could be stored at the end (which even still allows encoding the file in a single pass), but this requires one (or more) extra seeks to read the index (especially if its exact location at the end is not known), either on file open or on first seek. Unlike the final timestamp, which is small and fixed in size, an index grows with the file duration, and can have unbounded size. It is also easy for an index to become out of sync with a file that has been edited, in which case seeking will simply fail.<br />
<br />
In practice, bisection seeking on VBR audio achieves performance that is very nearly as good as seeking with an index, without any of these drawbacks. libopusfile provides a test program called seeking_example which can be used to benchmark the performance on your files.<br />
<br />
On a 96 kbps VBR file nearly one hour long (the second movement of Mahler's Symphony No. 8 "Symphony of a Thousand"):<br />
<br />
Testing exact PCM seeking to random places in 169680000 samples (58m55.000s)...<br />
Total seek operations: 1020 (1.020 per exact seek, 2 maximum).<br />
<br />
On a chained file formed by concatenating the eight test vectors for the currently supported channel layouts in mapping family 1:<br />
<br />
Opened file containing 8 links with 18 seeks (2.250 per link).<br />
Testing exact PCM seeking to random places in 2759064 samples (57.481s)...<br />
Total seek operations: 946 (0.946 per exact seek, 2 maximum).<br />
<br />
That is, the number of physical seeks required is almost always 1, every once in a while 2, and in short files, sometimes even 0.<br />
<br />
[[Category:Opus]]</div>Derfhttps://wiki.xiph.org/index.php?title=OpusFAQ&diff=16598OpusFAQ2017-07-07T20:28:57Z<p>Derf: Add questions and answers on file duration and seeking</p>
<hr />
<div>[[Image:Opus logo trans.png]]<br />
<br />
If you are looking for info not covered in this FAQ, try the '''[https://opus-codec.org main Opus website]''' or the pages included in the '''[[:Category:Opus|Opus category]]''' of this wiki.<br />
<br />
== General Questions ==<br />
<br />
=== What is Opus? Who created it? ===<br />
<br />
Opus is a totally open, royalty-free, highly versatile audio codec.<br />
<br />
It is primarily designed for interactive speech and music transmission over the Internet, but is also applicable to storage and streaming applications. It incorporates technology from Skype's '''[https://en.wikipedia.org/wiki/SILK SILK]''' codec and Xiph.Org's '''[http://celt-codec.org/ CELT]''' codec. It has been standardized by the '''[https://www.ietf.org/ Internet Engineering Task Force]''' (IETF) as '''[https://tools.ietf.org/html/rfc6716 RFC 6716]'''. <br />
<br />
Opus has been in development since early 2007. Programmers associated with '''[https://xiph.org/ Xiph.Org]''', '''[https://www.skype.com/ Skype]''' and several other organizations have contributed to its development and to the standardization process as part of the '''[https://datatracker.ietf.org/wg/codec/charter/ IETF's Codec Working Group]'''.<br />
<br />
=== How does Opus compare to other codecs? ===<br />
<br />
Opus is distinguished from most high quality formats (eg: [[Vorbis]], AAC, MP3) by having '''[https://tools.ietf.org/html/rfc6716#section-2 low delay]''' (5 ~ 66.5 ms) and distinguished from most low delay formats (eg: [[Speex]], G.711, GSM) by supporting '''[https://tools.ietf.org/html/rfc6716#section-2.1.1 high audio quality]''' (supports narrow-band all the way to full-band audio).<br />
<br />
It '''[https://opus-codec.org/comparison meets or exceeds existing codecs' quality]''' across a wide range of bitrates, and it operates at lower delay than virtually any existing compressed format.<br />
<br />
Most importantly, the Opus format and its reference implementation are both available under '''[https://opus-codec.org/license/ liberal, royalty-free licenses]'''.<br /><br />
This makes it:<br />
* easy to adopt<br />
* compatible with free software<br />
* suitable for use as part of the basic infrastructure of the Internet<br />
<br />
=== Does Opus make all those other lossy codecs obsolete? ===<br />
<br />
Yes.<br />
<br />
From a technical point of view (loss, delay, bitrates, ...) Opus renders '''[[Speex]]''' obsolete and should also replace '''[[Vorbis]]''' and the common proprietary codecs too (e.g. AAC, MP3, ...).<br />
<br />
=== Will Opus replace Vorbis in video files? ===<br />
<br />
For '''[[Ogg]]''' video files (which use the '''[[Theora]]''' video codec), you ''can'' use Opus instead of Vorbis, but the overall size reduction will be minimal and it will break compatibility with existing players.<br />
<br />
For WebM video files, the convention is to use the '''[http://www.webmproject.org/vp9/ VP9 video codec]''' when using Opus as an audio codec.<br />
<br />
=== How do I use Opus? ===<br />
<br />
For now, the best way to '''encode''' audio into Opus files is to use the '''opusenc''' command-line tool from the '''[https://opus-codec.org/downloads/ opus-tools package]'''.<br />
<br />
If you want to encode many files at once (e.g. your music library), try the applications listed in the '''[[OpusSupport|Opus Support]]''' page.<br />
<br />
For rough guidelines on encoding settings, see the '''[[Opus Recommended Settings]]''' page.<br />
<br />
=== What programs support Opus? ===<br />
<br />
Opus decoding support is now included in '''[http://caniuse.com/opus some Internet browsers]''' and '''[[OpusSupport|many applications]]''', including '''[https://www.mozilla.org/firefox Firefox]''', '''[https://www.foobar2000.org/ foobar2000]''' and '''[https://www.videolan.org/vlc/ VLC]''', as well as in frameworks such as '''[https://gstreamer.freedesktop.org/ GStreamer]''' and '''[https://ffmpeg.org/ FFmpeg]'''.<br />
<br />
For real-time applications, Opus support is available in '''[https://www.webrtc.org/ Google's WebRTC codebase]'''.<br />
<br />
Opus is a relatively new codec: '''[[OpusSupport|many more applications]]''' will support it in the near future.<br />
<br />
=== Does Opus support higher sampling rates, such as 96 kHz or 192 kHz? ===<br />
<br />
Yes and no.<br />
<br />
Opus encoding tools like opusenc will happily encode input files that are sampled at 96 or 192 kHz.<br />
<br />
However, files at these rates are internally '''converted to 48 kHz''' and then only frequencies '''up to 20 kHz''' are encoded.<br />
<br />
The reason is simple: lossy codecs are designed to preserve audible details while discarding irrelevant information. Since the human ear can only hear up to 20 kHz at best (usually lower than that), frequency content above 20 kHz is the first thing to go.<br />
<br />
See Monty's '''[https://people.xiph.org/~xiphmont/demo/neil-young.html article]''' for more details.<br />
<br />
If you want a codec to handle higher sampling rates losslessly, use '''[[FLAC]]'''!<br />
<br />
=== What are the licensing requirements? ===<br />
<br />
The reference Opus source code is released under a three-clause BSD license, which is a very permissive Open Source license. Commercial use and distribution (including in proprietary software) are permitted, provided that some basic conditions specified in the license are met.<br />
<br />
Opus is also covered by some patents, for which royalty-free usage rights are granted, under conditions that the authors believe are compatible with (hopefully) all open source licenses, including the GPL (v2 and v3).<br />
<br />
See the '''[https://www.opus-codec.org/license/ Opus Licensing]''' page for details.<br />
<br />
=== Why make Opus free? ===<br />
<br />
On the Internet, protocol and codec standards are part of the common infrastructure everyone builds upon.<br />
<br />
Most of the value of a high-quality standard is the innovation and inter-operation provided by the systems built on top of it. When a few parties have monopoly rights to monetize a standard, that infrastructure stops being so common and everyone else has more reason to use their own solution instead, increasing cost and reducing efficiency.<br />
<br />
Imagine a road system where each type of car could only drive on its own manufacturer's pavement. We all benefit from living in a world where all the roads are connected.<br />
<br />
This is why Opus, unlike many codecs, is free.<br />
<br />
=== Is the SILK part of Opus compatible with the SILK implementation shipped in Skype? ===<br />
<br />
No.<br />
<br />
The SILK codec, as submitted by Skype to the IETF, was heavily modified as part of its integration within Opus. The modifications are significant enough that it is not possible to just write a "translator". Even sharing code between Opus and the "old SILK" would be highly complex.<br />
<br />
=== Why not keep the SILK and CELT codecs separate? ===<br />
Opus is more than just two independent codecs with a switch.<br />
<br />
In addition to a [https://en.wikipedia.org/wiki/Linear_predictive_coding Linear Prediction] '''SILK mode''' and an [https://en.wikipedia.org/wiki/Modified_discrete_cosine_transform MDCT] '''CELT mode''', it has a '''hybrid mode''', where speech frequencies up to 8 kHz are encoded with LP while those between 8 and 20 kHz are encoded with MDCT. This is what allows Opus to have such high speech quality around 32 kbps.<br />
<br />
Another advantage of the integration is the ability to switch between these 3 modes seamlessly, without any audible "glitches" and without any out-of-band signalling.<br />
<br />
=== Now that Opus is standardized, will its development stop or can it be further improved? ===<br />
Yes, Opus '''can''' and '''should''' be improved, because unlike most '''[https://en.wikipedia.org/wiki/ITU-T#Key_standards_published_by_ITU ITU-T codecs]''', Opus is only defined in terms of its decoder.<br />
<br />
The encoder can keep evolving as long as the bitstream it produces can be decoded by the reference decoder. This is what made it possible for modern MP3 encoders (e.g. '''[https://en.wikipedia.org/wiki/LAME LAME]''') to improve far beyond the original '''[https://en.wikipedia.org/wiki/L3enc L3enc]''' and '''dist10''' reference implementations.<br />
<br />
Although it is unlikely that Opus encoders will see such a spectacular evolution, we certainly hope that future encoders will become much better than the reference encoder.<br />
<br />
In fact, the 1.1 libopus release significantly improves on the reference encoder's quality. See '''[https://people.xiph.org/~xiphmont/demo/opus/demo3.shtml Monty's demo]''' for more details.<br />
<br />
=== Will all future Opus releases comply with the [https://tools.ietf.org/html/rfc6716 Opus specification]? ===<br />
<br />
Yes.<br />
<br />
=== In what ways is Opus optimized for the Internet? ===<br />
<br />
Opus has good packet loss robustness and concealment, but its optimisations go further.<br />
<br />
One of the first things we were asked when designing Opus was to make the rate '''really''' adaptable, because we never know what kind of rates will be available. This meant not only having a wide range of bitrates, but also being able to vary the rate in small increments.<br />
<br />
This is why Opus scales from about '''6''' to '''512 kb/s''', in increments of '''0.4 kb/s''' (one byte with 20 ms frames). Opus can have '''more than 1200 possible bitrates''' while spending '''zero bits''' signalling the bitrate, because UDP already encodes the packet size.<br />
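<br />
The increment and the number of possible bitrates follow from simple arithmetic. The sketch below reproduces it, assuming 20 ms frames and taking the 6 and 512 kb/s endpoints as exact round numbers (the text only says "about"); the variable names are illustrative only.<br />

```python
# Worked arithmetic for the bitrate granularity, assuming 20 ms frames.
FRAME_MS = 20
frames_per_second = 1000 // FRAME_MS   # 50 packets per second

# Growing every packet by one byte adds 8 bits per frame to the stream.
step_bps = 8 * frames_per_second       # bits per second per extra byte

# Endpoints are treated as exact here, which is an assumption of this sketch.
MIN_BPS, MAX_BPS = 6_000, 512_000
possible_rates = (MAX_BPS - MIN_BPS) // step_bps + 1

print(step_bps / 1000, "kb/s per extra byte per packet")
print(possible_rates, "distinct bitrates between the two endpoints")
```

With shorter frames the same one-byte step corresponds to a larger bitrate increment, since more packets are sent per second.<br />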
<br />
One last aspect is that Opus is simple to transport over RTP, as can be seen from the [https://tools.ietf.org/html/rfc7587 Opus RTP payload format]. For example, it's possible to decode RTP packets without having even seen the SDP or any out-of-band signalling.<br />
<br />
=== What applications for Android can play Opus? ===<br />
<br />
Right now, there are just a few, but the list is growing fast. Please refer to [https://android.stackexchange.com/q/37970/7425 this question on android.stackexchange.com]. Feel free to suggest other applications.<br />
<br />
=== When will the next version be released? ===<br />
<br />
When it's done. Seriously, we do not know.<br />
<br />
Opus is not a large project with a fixed release schedule.<br />
<br />
That being said, our '''[https://www.opus-codec.org/downloads/ pre-releases]''' and even the git repositories ('''[https://git.xiph.org/?p=opus.git Xiph]''', '''[https://github.com/xiph/opus GitHub]''') are pretty stable and given proper testing (which you should always do anyway), are safe to distribute.<br />
<br />
Just be aware that the API of new features (that have never been included in a stable release) could potentially still change.<br />
<br />
== Software Developers' Questions ==<br />
<br />
=== On what platforms does Opus run? ===<br />
<br />
The Opus code base is written in C89 and should run on the vast majority of recent (and not so recent) CPUs.<br />
<br />
Some of the platforms '''[https://mf4.xiph.org/jenkins/view/opus/ on which Opus has been tested]''' include x86, x86-64, ARM, Itanium, Blackfin, and SPARC.<br />
<br />
=== Is there a fixed-point implementation? ===<br />
<br />
Yes.<br />
<br />
The fixed-point and floating-point decoder and encoder implementations are part of the same code base.<br />
<br />
The code defaults to float, so you need to configure with '''--enable-fixed-point''' (or define '''FIXED_POINT''' if not using the configure script) to build the code for fixed-point.<br />
<br />
=== Which implementation should I use? ===<br />
<br />
While the implementation in RFC 6716 is what ''defines'' the standard, it is likely not the best and most up-to-date implementation.<br />
<br />
The [https://opus-codec.org/ Opus] website was set up for the purpose of continually improving the implementation — in terms of speed, encoding quality, device compatibility, etc — while still conforming to the standard.<br />
<br />
All Opus implementations are compatible by definition.<br />
<br />
=== How is supporting Opus different from supporting Speex/G.711/MP3? ===<br />
<br />
Opus has variable frame durations which can change on the fly, so an Opus decoder needs to be ready to accept packets with durations that are '''any multiple of 2.5ms''' up to a '''maximum of 120ms'''. <br />
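<br />
As a rough illustration of what this means for buffer sizing, the sketch below enumerates the packet durations a decoder has to accept and converts them to samples per channel. It assumes a 48 kHz output rate; the helper name <tt>packet_samples</tt> is hypothetical and not part of any Opus API.<br />

```python
SAMPLE_RATE = 48_000   # output rate assumed for this sketch
STEP_US = 2_500        # 2.5 ms, kept in microseconds to stay in integers
MAX_US = 120_000       # 120 ms

def packet_samples(duration_us: int) -> int:
    """Samples per channel for a packet of the given duration (hypothetical helper)."""
    if duration_us <= 0 or duration_us > MAX_US or duration_us % STEP_US:
        raise ValueError(f"not a valid packet duration: {duration_us} us")
    return SAMPLE_RATE * duration_us // 1_000_000

# Every duration the decoder must be ready for, and the worst-case buffer size.
valid_durations_us = list(range(STEP_US, MAX_US + 1, STEP_US))
max_buffer = packet_samples(MAX_US)

print(len(valid_durations_us), "possible packet durations")
print(max_buffer, "samples per channel for the longest packet")
```

Sizing the output buffer for the longest possible packet avoids having to reallocate when the far end changes its frame size mid-stream.<br />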
<br />
The Opus encoder and decoder do not need to have matched sampling rates or channel counts. It is recommended to always decode at the highest rate the hardware supports (e.g. 48kHz stereo) so the user gets the full quality of whatever the far end is sending.<br />
<br />
=== My application doesn't work. Can anyone help me? ===<br />
<br />
It's possible to get help, but before doing so, there are a few basic things to try:<br />
<br />
* Implement your application with uncompressed audio instead of Opus. If it still doesn't work, then the problem isn't related to Opus.<br />
* Read the [https://www.opus-codec.org/docs/ Opus documentation].<br />
* Read the [https://git.xiph.org/?p=opus.git;a=blob;f=src/opus_demo.c opus_demo.c] source code to see how to use the encoder and decoder.<br />
<br />
If you still can't solve the problem, the best option is to ask for help on the [http://lists.xiph.org/mailman/listinfo/opus mailing list] or on the '''#opus''' IRC channel on '''irc.freenode.net'''.<br />
<br />
=== How do I report a bug? ===<br />
<br />
If you think you have found a bug in Opus (and not in your application), please [https://trac.xiph.org/newticket?component=Opus file a bug report].<br />
<br />
Please include a way for us to reproduce the problem. The best way to do this is to provide an input file, along with the opusenc/opusdec/opus_demo command line that causes the bug to occur.<br />
<br />
If the bug cannot be triggered by the command line tools, please provide a simple patch or C file that can help reproduce it. Please also provide any other relevant information, such as OS, CPU, build options, etc.<br />
<br />
Don't hesitate to also contact us on the [http://lists.xiph.org/mailman/listinfo/opus mailing list] or on [irc://irc.freenode.net/opus IRC].<br />
<br />
=== What is Opus Custom? ===<br />
<br />
Opus Custom is an '''optional''' part of the Opus standard that allows for sampling rates other than 8, 12, 16, 24, or 48 kHz and frame sizes other than multiples of 2.5 ms.<br />
<br />
Opus Custom requires additional out-of-band signalling that Opus does not normally require and disables many of Opus' coding modes. Also, because it is an optional part of the specification, using Opus Custom may lead to compatibility problems.<br />
<br />
For these reasons, '''its use is discouraged''' outside of very specific applications. <br />
<br />
You may want to use Opus Custom for:<br />
<br />
* ultra-low-delay applications, where synchronization with the soundcard buffer is important. <br />
* low-power embedded applications, where compatibility with others is not important.<br />
<br />
For almost all other types of applications, Opus Custom should not be used.<br />
<br />
=== How do I use 44.1 kHz or some other sampling rate not directly supported by Opus? ===<br />
<br />
Tools which read or write Opus should inter-operate with other sampling rates by transparently performing sample rate conversion behind the scenes whenever necessary. In particular, software developers should not use Opus Custom for 44.1 kHz support, except in the very specific circumstances outlined above.<br />
<br />
Note that it's generally preferable for a decoder to output at 48kHz, even when you know the original input was 44.1kHz. This is not only because you can skip resampling, but also because many cheaper audio interfaces have poor quality output for 44.1kHz.<br />
<br />
The '''[https://opus-codec.org/downloads/ opus-tools]''' package source code contains a small, high quality, high performance, BSD licensed '''[https://github.com/xiph/opus-tools/blob/master/src/resample.c resampler]''' which can be used where resampling is required.<br />
<br />
=== But won't the resampler hurt the quality? Isn't it better to use 44.1 kHz directly? ===<br />
<br />
Not really. The quality degradation caused by any reasonable resampler (SoX, libspeexdsp, libsamplerate, ...) is far less than the distortion caused by the best lossy codec at its highest bitrate. If you can't tolerate the quality degradation caused by a good 44.1 ↔ 48 kHz resampler, then you shouldn't be using a lossy codec in the first place. Similarly, the extra CPU spent in the resampler is small compared to the rest of the codec. Not only that, but many soundcards only support 48 kHz on playback, so players can directly play the output rather than resample it to 48 kHz (e.g. for a 44.1 kHz MP3). So effectively, Opus is only shifting the burden of resampling from the decoder side to the encoder side.<br />
<br />
One advantage of supporting only one internal rate is that it makes it possible for Opus to support many features, including efficient speech compression (through SILK) and real-time applications. It also means all the quality tuning effort can be spent on a single configuration, which helps bring even better quality.<br />
<br />
=== How is the bitrate setting used in VBR mode? ===<br />
<br />
Variable bitrate (VBR) mode allows the bitrate to automatically vary over time based on the audio being encoded, in order to achieve a consistent quality.<br />
<br />
The bitrate setting controls the desired quality, on a scale that is calibrated to closely approximate the average bitrate that would be obtained over a large and diverse collection of audio. The actual bitrate of any particular audio stream may be higher or lower than this average.<br />
<br />
=== What frame size should I use? ===<br />
<br />
A '''20ms''' frame size works well for most applications. Smaller frame sizes may be used to achieve lower latency, but have lower quality at a given bitrate.<br />
<br />
Sizes greater than 20 ms increase latency and are generally beneficial only at fairly low bitrates, or when used to reduce external overhead (e.g. by reducing the number of packets that are sent). For file encoding, using a frame size larger than 20 ms will usually result in '''worse''' quality for the same bitrate because it constrains the encoder in the decisions it can make.<br />
<br />
=== Forward Error correction (FEC) doesn't appear to do anything! HELP! ===<br />
<br />
The in-band FEC feature of Opus helps reduce the harm of packet loss by encoding some information about the prior packet.<br />
<br />
In order to make use of in-band FEC the decoder must delay its output by at least one frame so that it can call the decoder with the decode_fec argument on the ''next'' frame in order to reconstruct the missed frame. This works best if it's integrated with a jitter buffer.<br />
<br />
FEC is only used by the encoder under certain conditions:<br />
* the feature must be enabled via the '''OPUS_SET_INBAND_FEC''' CTL<br />
* the encoder must be told to expect loss via the '''OPUS_SET_PACKET_LOSS_PERC''' CTL<br />
* the codec must be operated in any of the '''Linear Prediction''' or '''Hybrid''' modes<br />
<br />
Frame durations shorter than 10ms and very high bitrates will use the MDCT modes, where FEC is not available.<br />
<br />
Even when FEC is not used, telling the encoder about the expected level of loss will help it make more intelligent decisions. By default, the implementation assumes there is no loss.<br />
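<br />
The receiver-side logic can be sketched as follows. This is a simplified model, not libopus code: it only decides which kind of decode call to make for each frame slot, assuming a jitter buffer that already holds the next packet. The function name and the "normal"/"fec"/"plc" labels are made up for illustration. In libopus terms they would correspond to a regular decode, a decode of the ''next'' packet with the decode_fec argument set, and a decode with a NULL packet. The sketch does not check whether the next packet actually carries FEC data.<br />

```python
from typing import List, Optional, Tuple

def plan_decode_calls(packets: List[Optional[bytes]]) -> List[Tuple[str, Optional[int]]]:
    """For each frame slot, pick the kind of decode call a receiver would make.

    packets[i] is the payload for frame i, or None if it was lost.
    Returns (kind, source_index) pairs, where kind is:
      "normal" - decode packets[i] itself
      "fec"    - recover frame i from the redundancy inside packets[i + 1]
      "plc"    - nothing usable arrived, so fall back to loss concealment
    """
    plan = []
    for i, pkt in enumerate(packets):
        if pkt is not None:
            plan.append(("normal", i))
        elif i + 1 < len(packets) and packets[i + 1] is not None:
            # Only possible because output is delayed by one frame:
            # the following packet is already waiting in the jitter buffer.
            plan.append(("fec", i + 1))
        else:
            plan.append(("plc", None))
    return plan

# Frames 1, 3 and 4 are lost; 1 and 4 can be recovered from their successors.
received = [b"p0", None, b"p2", None, None, b"p5"]
for frame, (kind, source) in enumerate(plan_decode_calls(received)):
    print(frame, kind, source)
```

Note that frame 3 cannot use FEC here because the packet after it was lost as well, which is why in-band FEC helps most with isolated losses.<br />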
<br />
=== I can't use malloc or much stack on my embedded platform. How do I make Opus work? ===<br />
<br />
A normal build of libopus only uses <tt>malloc/free</tt> in the <tt>_create()</tt> and <tt>_destroy()</tt> calls, making it safe for realtime use as long as the codec state is pre-created.<br />
<br />
To build Opus without the references to <tt>malloc/free</tt>, you must:<br />
<br />
* use <tt>init()</tt> calls rather than <tt>create()</tt> calls in your application<br />
* compile with <tt>CFLAGS="-DOVERRIDE_OPUS_ALLOC -DOVERRIDE_OPUS_FREE -D'opus_alloc(x)=NULL' -D'opus_free(x)=NULL' "</tt>.<br />
<br />
If libopus is built with <tt>-DNONTHREADSAFE_PSEUDOSTACK</tt> (instead of <tt>VAR_ARRAYS</tt>, or <tt>USE_ALLOCA</tt>), it will use a user-provided block of heap instead of stack for many things, resulting in much lower stack usage.<br><br />
This makes the resulting library '''non-threadsafe''' and is '''not recommended''' on anything except limited embedded platforms.<br />
<br />
=== How can I ensure that my software interoperates with other software implementing Opus? ===<br />
<br />
For applications using Ogg files, there are some [https://people.xiph.org/~greg/opus_testvectors/ Ogg Opus testvectors] to test decoders and you can test encoders with opusdec. For RTP applications, the opusrtp tool can be useful.<br />
<br />
In general, here's a list of specific issues to check:<br />
* Can your application handle all frame sizes, including changing the frame size from frame to frame?<br />
* Does your application react properly to lost packets, by calling the decoder with a NULL packet?<br />
<br />
=== What is the complexity of Opus? ===<br />
<br />
The complexity of Opus varies by a large amount based on the settings used.<br />
<br />
It depends on the mode, audio bandwidth, number of channels, and even a "complexity knob" that can trade complexity for quality. It will run easily on any recent PC or smartphone. <br />
<br />
For slower embedded CPUs/DSPs, the amount of CPU required will vary depending on the configuration and the exact CPU, so you will need to experiment. Do not expect Opus to run quickly on really slow devices like 8-bit micro-controllers.<br />
<br />
=== Opus is using too much CPU for my application. What can I do? ===<br />
<br />
First, don't panic, and don't start writing assembly just yet.<br />
<br />
It's possible that you're just not using the right set of options.<br />
<br />
If you're targeting an embedded/mobile platform, chances are the fixed-point build will be faster, so make sure you're using '''--enable-fixed-point''' or defining '''FIXED_POINT''' in the build system.<br />
<br />
Opus also has a complexity option that can trade quality for complexity. The default is highest quality and highest complexity. You can control this using '''OPUS_SET_COMPLEXITY()''' (see the '''[https://www.opus-codec.org/docs/ Documentation]''' for details).<br />
<br />
If all else fails and you need to optimize the Opus code, see the next question.<br />
<br />
=== I would like to optimize/improve/help with Opus. Where should I start? ===<br />
<br />
Please '''[https://www.opus-codec.org/contact/ contact us]''' before you start, or at least before you get too far.<br />
<br />
This will help coordinate the efforts made on Opus and reduce the probability of wasting your time on duplicated effort or going down the wrong path. More details in the '''[[OpusContributing|contributing page]]'''.<br />
<br />
=== Does Opus have an echo canceller like Speex does? ===<br />
<br />
Echo cancellation is completely independent from codecs.<br />
<br />
You can use any echo canceller (including the one from libspeexdsp) along with Opus.<br />
<br />
That being said, among the free acoustic echo cancellers (AEC) we're aware of, the best is probably the Google AEC from the [https://code.google.com/p/webrtc/ WebRTC codebase].<br />
<br />
=== How do I get the duration of a .opus file? ===<br />
<br />
Use '''[https://mf4.xiph.org/jenkins/view/opus/job/opusfile-autotools/ws/doc/html/group__stream__info.html op_pcm_total()]''' from '''[https://opus-codec.org/downloads/ libopusfile]'''.<br />
<br />
If you want to implement this yourself, you need to<br />
* Read the BOS (Beginning Of Stream) pages to enumerate the serial numbers of all concurrently multiplexed streams, identify the Opus stream you want, and get its preskip value.<br />
* Read up through the first complete audio data page to compute the starting granule position (since the timestamps might not start at 0, e.g., if the file was captured from a live stream that was joined after the start).<br />
* Seek near the end of a file and look for a page with the same serial number as found in the headers (just under 64 kB from the end should be sufficient to ensure you find a page, assuming the Opus data is not multiplexed with another stream, or there is no trailing garbage in the file).<br />
* If you find a page whose serial number was not included in the original set of BOS pages, you have a chained stream. You need to bisect the file to identify the end of the first chain and the start of the next, and repeat this process for each link in the chain.<br />
* If you don't find any pages at all, or find a page whose serial number was included in the original set of BOS pages, but was not the serial number of the Opus stream you want, back up and try again (being careful to avoid rescanning the same data, which can produce quadratic worst-case complexity).<br />
* If you find a page whose serial number matches the Opus stream you want, look at its final granule position, and compute the total duration (in seconds) as (final_granule_position - initial_granule_position - preskip)/48000.0.<br />
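<br />
The final step is plain arithmetic once the three values are known. A minimal sketch, with a hypothetical function name and made-up input values (a stream starting at granule position 0 with a pre-skip of 312 samples):<br />

```python
OPUS_GRANULE_RATE = 48_000  # granule positions are counted in 48 kHz samples

def opus_duration_seconds(initial_granule_position: int,
                          final_granule_position: int,
                          preskip: int) -> float:
    """Duration of one link, using the formula from the last step above."""
    samples = final_granule_position - initial_granule_position - preskip
    return samples / OPUS_GRANULE_RATE

# Hypothetical values chosen for illustration, not read from a real file.
duration = opus_duration_seconds(0, 169_680_312, 312)
print(duration, "seconds")
```

For a chained file, the same computation is repeated for each link and the results are summed.<br />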
<br />
=== Why don't you store the duration in the header? Isn't all of that slow and complicated? ===<br />
<br />
Computing the duration directly from the file contents allows files to be written in a single pass, without any seeking, which is necessary for live streaming. Chaining also simplifies live streaming, as you can just pipe multiple files into the same network connection, with all associated metadata updates, etc., and the results are still valid .opus files (contrast with the '''[http://www.smackfu.com/stuff/programming/shoutcast.html hacks used to add metadata to MP3 streams]''').<br />
<br />
Opening a typical .opus file, which is not multiplexed and not chained, and computing the duration over the network requires just one extra HTTP request, which can proceed in parallel with the buffering in the main request. This is the behavior you will get from libopusfile's HTTP backend by default.<br />
<br />
Enumeration of chain boundaries can be expensive in files with many links, but in our testing libopusfile used nearly an order of magnitude fewer seeks to do this than some other media frameworks (at the time). Storing a duration in a header wouldn't solve this, since every link in a chain has its own, independent headers. If the cost of chain enumeration is a problem, the best way to avoid it is to store the links in separate files (i.e., don't use chaining).<br />
<br />
=== How do I seek in a .opus file? ===<br />
<br />
Use '''[https://mf4.xiph.org/jenkins/view/opus/job/opusfile-autotools/ws/doc/html/group__stream__seeking.html op_pcm_seek() or op_raw_seek()]''' from '''[https://opus-codec.org/downloads/ libopusfile]'''.<br />
<br />
If you want to implement seeking yourself, you need to<br />
* Identify the link that contains the target (if you have a chained file).<br />
* Adjust the target by 80 ms to ensure you will get enough pre-roll data (to ensure the decoder will have converged by the time you reach the target), as recommended by '''[https://tools.ietf.org/html/rfc7845 RFC 7845]'''.<br />
* Estimate the location of the last audio data page with a completed packet prior to the adjusted target, using the duration and size (in bytes) of the link.<br />
* Seek to that location and scan forward until you find an audio data page with a completed packet (that contains a valid granule position).<br />
* If you think you are sufficiently close to the adjusted target, scan forward until you find the next audio data page with a completed packet.<br />
* If the adjusted target lies between the first audio data page with a completed packet you found and the next one, stop. You can decode forward from here and start playing when you reach your (original unadjusted) target.<br />
* Otherwise, go back and re-estimate the seek location using the granule positions and file offsets of the page(s) you just found.<br />
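<br />
The estimate-and-refine loop above can be sketched as a weighted bisection over byte offsets. The code below is a toy model, not libopusfile's implementation: a sorted list of (byte offset, end granule position) pairs stands in for the pages of one link, "seek and scan forward" is simulated with a binary search, and the fallback to plain bisection after a poor guess is a simplification chosen for this example. All names are hypothetical.<br />

```python
import bisect

PRE_ROLL = 3840  # 80 ms of pre-roll, in 48 kHz samples

def find_seek_page(pages, link_size, target):
    """Find the page to start decoding from for a seek to `target`.

    pages: sorted list of (byte_offset, end_granule_position) for one link.
    Returns (index of the last page ending at or before the adjusted target,
    or -1 to start from the first audio page; number of simulated seeks).
    """
    offsets = [off for off, _ in pages]
    adjusted = max(target - PRE_ROLL, 0)
    lo, lo_gp = 0, 0                      # pages starting before `lo` end at or before the target
    hi, hi_gp = link_size, pages[-1][1]   # pages starting at or after `hi` end past it
    best, seeks, halve = -1, 0, False
    while lo < hi:
        width = hi - lo
        if halve or hi_gp <= lo_gp:
            guess = lo + width // 2       # fallback: plain bisection
        else:
            # Weighted estimate from the granule positions at both bounds.
            frac = (adjusted - lo_gp) / (hi_gp - lo_gp)
            guess = lo + int(width * min(frac, 1.0))
        guess = min(guess, hi - 1)
        seeks += 1                        # one seek, then scan forward for a page
        i = bisect.bisect_left(offsets, guess)
        in_range = i < len(pages) and offsets[i] < hi
        if in_range and pages[i][1] <= adjusted:
            best, lo, lo_gp = i, offsets[i] + 1, pages[i][1]
        else:
            if in_range:
                hi_gp = pages[i][1]
            hi = guess                    # every page from `guess` onward is too late
        halve = (hi - lo) > width // 2    # poor guess: bisect on the next round
    return best, seeks

# Synthetic link: 200 pages of one second each, with uneven (VBR-like) sizes.
pages, offset, granule = [], 0, 0
for n in range(200):
    granule += 48_000
    pages.append((offset, granule))
    offset += 8_000 + (n * 37) % 4_000
link_size = offset

index, seeks = find_seek_page(pages, link_size, target=100 * 48_000 + 123)
print("decode forward from the page after index", index, "after", seeks, "simulated seeks")
```

Decoding then proceeds forward from the page after the returned one, discarding output until the original, unadjusted target is reached.<br />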
<br />
libopusfile includes fallbacks to prevent pathological worst-case behavior when its guesses are repeatedly wrong. Weighted bisection can degrade to a linear scan, but libopusfile's worst case is within a constant factor of naive bisection (i.e., logarithmic). We have only ever observed such pathological behavior in files we manually constructed to trigger it.<br />
<br />
libopusfile also takes shortcuts when the target location is near the current position, to make small seeks cheaper. In the best case it can loop forever over very short files whose data is contained in a single page (e.g., less than 1 second long with default encoder settings) without any seeking at all.<br />
<br />
You can find more information on seeking in multiplexed files when you want to play more than just a single Opus stream '''[[GranulePosAndSeeking|on this page]]'''.<br />
<br />
=== Wouldn't it be better to build an index? ===<br />
<br />
As with file durations, an index at the beginning of the file is incompatible with live streaming. It also means more data has to be fetched before a file can start playing over the network, because you must read past the index even when you don't intend to seek. The index could be stored at the end (which even still allows encoding the file in a single pass), but this requires one (or more) extra seeks to read the index (especially if its exact location at the end is not known), either on file open or on first seek. It is also easy for an index to become out of sync with a file that has been edited, in which case seeking will simply fail.<br />
<br />
In practice, bisection seeking on VBR audio achieves performance that is very nearly as good as seeking with an index, without any of these drawbacks. libopusfile provides a test program called seeking_example which can be used to benchmark the performance on your files.<br />
<br />
On a 96 kbps VBR file nearly one hour long (the second movement of Mahler's Symphony No. 8 "Symphony of a Thousand"):<br />
<br />
Testing exact PCM seeking to random places in 169680000 samples (58m55.000s)...<br />
Total seek operations: 1020 (1.020 per exact seek, 2 maximum).<br />
<br />
On a chained file formed by concatenating the eight test vectors for the currently supported channel layouts in mapping family 1:<br />
<br />
Opened file containing 8 links with 18 seeks (2.250 per link).<br />
Testing exact PCM seeking to random places in 2759064 samples (57.481s)...<br />
Total seek operations: 946 (0.946 per exact seek, 2 maximum).<br />
<br />
That is, the number of physical seeks required is almost always 1, every once in a while 2, and in short files, sometimes even 0.<br />
<br />
[[Category:Opus]]</div>Derfhttps://wiki.xiph.org/index.php?title=MatroskaOpus&diff=16351MatroskaOpus2016-05-07T00:16:53Z<p>Derf: Add commentary about the OpusHead pre-skip value matching the Matroska CodecDelay element.</p>
<hr />
<div>{{draft}}<br />
This is an encapsulation spec for the [[Opus]] codec in [http://matroska.org/ Matroska]. There are a number of outstanding functional issues with muxing Opus in Matroska, and until those are resolved, use of this spec is NOT RECOMMENDED.<br />
<br />
* CodecID is A_OPUS<br />
* SamplingFrequency is 48000<br />
* Channels is number of output PCM channels<br />
* SeekPreRoll is set to 80000000<br />
* CodecPrivate consists of the 'OpusHead' packet, identical to the Ogg mapping.<br />
<br />
The 'OpusHead' format is defined by the [https://tools.ietf.org/html/rfc7845 Ogg Opus] mapping. In particular it includes pre-skip, gain, and the channel mapping table required for correct surround output.<br />
<br />
The second 'OpusTags' header packet from Ogg Opus is not used in the Matroska encapsulation. Matroska has its own system for tag metadata, and this avoids duplicating it and the need for sub-framing to index multiple packets within the CodecPrivate element.<br />
<br />
SeekPreRoll [56][BB] is a new unsigned integer element added to the TrackEntry element. The value is the number of nanoseconds that must be discarded, for that stream, after a seek until the decoded data is valid to render.<br />
<br />
CodecDelay [56][AA] is a new unsigned integer element added to the TrackEntry element. The value is the number of nanoseconds that must be discarded, for that stream, from the start of that stream. The value is also the number of nanoseconds that all encoded timestamps for that stream must be shifted to get the presentation timestamp. (This will fix Vorbis encoding as well.)<br />
<br />
DiscardPadding [75][A2] is a new signed integer element added to the BlockGroup element. DiscardPadding is the duration in nanoseconds of the silent data added to the Block (padding at the end of the block). The duration of DiscardPadding is not included in the duration of the Track, and the padding should be discarded during playback. (This will fix Vorbis encoding as well.)<br />
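<br />
Since these elements are expressed in nanoseconds while Opus counts samples at 48 kHz, muxers and players need to convert between the two. The sketch below shows one possible conversion with rounding to nearest. The helper names, the rounding rule, and the pre-skip value of 312 samples are assumptions made for illustration and are not mandated by this draft. Only non-negative values are handled.<br />

```python
NS_PER_SECOND = 1_000_000_000
OPUS_RATE = 48_000

def samples_to_ns(samples: int) -> int:
    """Convert a non-negative count of 48 kHz samples to nanoseconds (round to nearest)."""
    return (samples * NS_PER_SECOND + OPUS_RATE // 2) // OPUS_RATE

def ns_to_samples(ns: int) -> int:
    """Convert non-negative nanoseconds to 48 kHz samples (round to nearest)."""
    return (ns * OPUS_RATE + NS_PER_SECOND // 2) // NS_PER_SECOND

def pre_skip_matches(pre_skip_samples: int, codec_delay_ns: int) -> bool:
    """One possible matching rule: equal after converting CodecDelay back to samples."""
    return ns_to_samples(codec_delay_ns) == pre_skip_samples

seek_pre_roll_ns = samples_to_ns(3840)   # 80 ms of pre-roll
codec_delay_ns = samples_to_ns(312)      # hypothetical pre-skip of 312 samples

print(seek_pre_roll_ns, codec_delay_ns, pre_skip_matches(312, codec_delay_ns))
```

Because one sample lasts roughly 20833 ns, a round trip through nanoseconds with this rounding recovers the original sample count.<br />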
<br />
== Muxing Recommendations ==<br />
<br />
To spare players that want to start playback at exactly time T from parsing extra muxed content, we recommend that muxers start an additional Cluster at T-SeekPreRoll (within the span that Cluster N-1 would otherwise cover), where T is the start time of Cluster N. Muxers should then add CuePoints for all of the new T-SeekPreRoll Clusters, with a CueTrack of the audio stream. The CuePoints for the video stream do not change.<br />
<br />
For example, consider a muxed MKV file with the following characteristics:<br />
* 5 second interval between video keyframes<br />
* Each video keyframe begins a new Cluster<br />
* Cues will contain video keyframe CuePoints<br />
* For each video keyframe at time T there will be new Cluster at T-SeekPreRoll<br />
* Cues will contain audio CuePoints for T-SeekPreRoll Clusters<br />
* Audio and video are interleaved in monotonically increasing order<br />
<br />
Assume SeekPreRoll is 80 milliseconds, the first Cluster starts at 0 milliseconds with a video keyframe Block and has a duration of 4920 milliseconds. The second Cluster starts at 4920 milliseconds with an audio Block and has a duration of 80 milliseconds. Just to be clear, the second Cluster can contain Blocks from all streams. The third Cluster starts at 5000 milliseconds with a video keyframe Block and has a duration of 4920 milliseconds. The fourth Cluster starts at 9920 milliseconds with an audio Block and has a duration of 80 milliseconds.<br />
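<br />
The Cluster layout in this example can be generated mechanically. The sketch below is only an illustration of the recommendation, with a hypothetical function name; it lists Cluster start times in milliseconds for a fixed keyframe interval and derives each Cluster's duration from the gap to the next one.<br />

```python
def cluster_layout(keyframe_interval_ms: int, seek_pre_roll_ms: int, num_keyframes: int):
    """Return (start_ms, first_block_kind) for each Cluster, in file order."""
    layout = []
    for k in range(num_keyframes):
        t = k * keyframe_interval_ms
        if k > 0:
            # Extra Cluster so a seek to T can begin decoding audio at T-SeekPreRoll.
            layout.append((t - seek_pre_roll_ms, "audio"))
        layout.append((t, "video keyframe"))
    return layout

layout = cluster_layout(keyframe_interval_ms=5000, seek_pre_roll_ms=80, num_keyframes=3)
durations = [nxt[0] - cur[0] for cur, nxt in zip(layout, layout[1:])]

for (start, kind), duration in zip(layout, durations):
    print(f"Cluster at {start} ms starts with {kind} Block, lasts {duration} ms")
```

The first four Clusters printed this way match the start times and durations given in the paragraph above.<br />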
<br />
With this recommendation players that want audio and video to start playback at time T can seek to Cluster T-SeekPreRoll and start decoding the audio stream. This will work the same for both local and HTTP playback.<br />
<br />
== Open Questions ==<br />
<br />
* Should we say muxers MAY or SHOULD NOT produce simple streams without filling in CodecPrivate?<br />
** If the CodecPrivate is empty or not present and Channels is 1 or 2, players MAY treat it as a sane set of defaults, I guess. e.g. channel mapping family 0, no pre-skip or gain.<br />
** For Channels > 2 the track MUST be rejected, since there's no way to map the encoded substreams to channels.<br />
** We would also have to decide on a default value for OutputGain.<br />
** Version must be 1.<br />
* How can sample-accurate end-time trimming work in Matroska?<br />
** We defined a new element added to a BlockGroup, DiscardPadding (previously PostPadding), which is defined as the number of nanoseconds to discard from the Block.<br />
** Currently all software encapsulating Vorbis in Matroska is broken in this regard, and muxing a Vorbis file in Matroska causes it to get longer (i.e., produce more audio output than the original Ogg file). It would be unfortunate to repeat this disaster for Opus. This needs a new element specifying the number of samples to trim, perhaps a new BlockGroup child.<br />
*** This has been addressed with DiscardPadding for Opus. DiscardPadding was specified to fix Vorbis (as well as other codecs) too.<br />
* If new elements are required, can they be defined so as to enable correct seeking in rolling intra (a.k.a. intra refresh) video as well?<br />
** SeekPreRoll should work for rolling intra video.<br />
<br />
== Handling Pre-skip data ==<br />
<br />
* '''On [http://lists.matroska.org/cgi-bin/mailman/listinfo/matroska-devel Matroska-dev] we decided to implement proposal one ([http://lists.matroska.org/pipermail/matroska-devel/2013-June/004475.html ref]).'''<br />
* Use Cases: <br />
** UC1: Playback starts from the beginning of the stream. Source stream time starts at 0.<br />
** UC2: Playback starts from the beginning of the stream. Pre-skip data ends in middle of compressed packet.<br />
** UC3: Playback starts from the middle of the stream > SeekPreRoll time.<br />
** UC4: Playback starts from the middle of the stream < SeekPreRoll time.<br />
** UC5: Encode source stream to Opus, mux to Matroska, then decode the Opus stream. The output must have the same number of samples as the source stream.<br />
<br />
* one: Timeshift the timestamps by pre-skip data.<br />
** The Opus audio stream pre-skip data starts from time 0 and adds the pre-skip time to the normal audio time, like how Opus files are muxed into Ogg files. We would add a new element to the TrackEntry element, CodecDelay, and the player would adjust the timestamps of the decoded samples by subtracting CodecDelay. All use cases should be covered.<br />
** Cons:<br />
*** The timestamp of the Block does not match the timestamp of the playback position.<br />
*** Does not generalize known "decode, but not render" data.<br />
*** Forces the player, not the decoder, to handle the pre-skip samples.<br />
*** Because CodecPrivate already includes a full OpusHead packet, it contains a redundant pre-skip field. To avoid confusion, decoders should ensure the two fields match (if they do not, this indicates a bug, as in [https://trac.ffmpeg.org/ticket/5509]), but since one is specified in nanoseconds and the other in samples at 48 kHz, we need to define what's sufficient to be considered a match.<br />
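The last point above leaves open what counts as a match between CodecDelay (nanoseconds) and the OpusHead pre-skip (samples at 48 kHz). A minimal sketch of one possible rule follows: convert CodecDelay back to samples with round-to-nearest and compare. The function names and the rule itself are assumptions made for illustration; they are not taken from the Matroska specification or from any muxer.<br />

```python
OPUS_RATE_HZ = 48_000          # OpusHead pre-skip is counted in 48 kHz samples
NS_PER_SECOND = 1_000_000_000  # CodecDelay is expressed in nanoseconds


def pre_skip_to_ns(pre_skip_samples: int) -> int:
    """Convert a pre-skip in 48 kHz samples to nanoseconds, rounded to nearest."""
    return (pre_skip_samples * NS_PER_SECOND + OPUS_RATE_HZ // 2) // OPUS_RATE_HZ


def ns_to_samples(delay_ns: int) -> int:
    """Convert nanoseconds to 48 kHz samples, rounded to nearest."""
    return (delay_ns * OPUS_RATE_HZ + NS_PER_SECOND // 2) // NS_PER_SECOND


def codec_delay_matches(codec_delay_ns: int, pre_skip_samples: int) -> bool:
    """Hypothetical rule: the two fields match if they name the same sample."""
    return ns_to_samples(codec_delay_ns) == pre_skip_samples
```

One sample at 48 kHz lasts about 20833.3 ns, so a rounded nanosecond value always converts back to the sample count it came from; a stricter or looser tolerance would be an equally valid design choice.<br />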
<br />
* two: Use pre-skip data from CodecPrivate.<br />
** On every discontinuity the decoder would need to decode and throw away the pre-skip data.<br />
** Cons:<br />
*** UC2 will throw away valid data and the AV sync will be off.<br />
*** UC3 will redundantly decode the pre-skip data.<br />
<br />
* three: Add TimeToDiscard to Block.<br />
** Add an element to the Block element, TimeToDiscard in nanoseconds. A value of -1 would not render the whole Block, which would have the same effect as setting the invisible bit. How would this affect the Block timestamp? Maybe the new element should be SamplesToDiscard or DataToDiscard?<br />
** Cons:<br />
<br />
* four: Blocks that contain pre-skip data will set invisible flag.<br />
** Blocks that contain pre-skip data have timestamps from the beginning of the stream. Blocks that only contain normal data have timestamps from the playback position.<br />
** Cons:<br />
*** Forces the player, not the decoder, to handle the pre-skip samples.<br />
*** UC2 will throw away valid data and the AV sync will be off. Other use cases should be fine.<br />
<br />
* five: Force pre-skip packets to be prepended to the first normal packet in the first Block.<br />
** The first Block's timestamp will be set to the start time of the source playback position. We would add a new element to the TrackEntry element, CodecDelay. All use cases should be covered.<br />
** Cons:<br />
*** Does not generalize known "decode, but not render" data.<br />
*** Forces the player, not the decoder, to handle the pre-skip samples.<br />
<br />
* six: Create a new codec, OPUS_MKV.<br />
** Basically the codec will wrap Opus packets with data telling the decoder what type of Opus packet it contains. Essentially we would be creating a new codec to handle pre-skip data within the decoder.<br />
** Cons:<br />
*** There will be two types of Opus data streams!<br />
*** Does not generalize known "decode, but not render" data.<br />
<br />
* seven: Negative timestamps.<br />
** The SimpleBlock timestamp is signed 16 bits, so the format can signal about half of the pre-skip if playback timestamps are to start at zero.<br />
** One could set an incorrect timestamp on the skipped blocks, and rely on the decoder to drop them based on the OpusHead preskip value. As long as the initial blocks are timestamped <= start of output this shouldn't affect seeking.<br />
** Cons:<br />
*** Moritz suggests this won't work because the resolution of the timestamps is controlled by the muxer, so the SimpleBlock timestamp offset isn't sample accurate anyway ([http://lists.matroska.org/pipermail/matroska-devel/2012-September/004254.html ref]).<br />
<br />
* eight: <!-- Proposal 8 needs a title, here. --><br />
** The Ogg format uses granule positions which are converted to presentation timecodes using codec specific information on a per logical stream basis.<br />
** The Matroska format uses absolute timecodes with an arbitrary per-segment accuracy for all tracks in the segment.<br />
** It is the belief of this tikiman that using a timecode offset of any kind in MKV is unholy.<br />
** The pre-skip is communicated to the media software via the Opus header in the codec private data. At the beginning of the track, the track timecode is not increased until pre-skip samples are in track frames.<br />
** From then on audio is muxed as normal, however the audio should be muxed >= 3840 samples behind video frames.<br />
*** i.e. Cluster Timecode: 5.000 seconds<br />
*** Video Track Key Frame 5.000 seconds<br />
*** Opus Track Frame 4.920 seconds<br />
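The timecodes in the example above follow from simple arithmetic: 3840 samples at the 48 kHz Opus rate last 80 ms. A small sketch of that calculation, where the helper name <code>audio_timecode</code> is hypothetical and exact fractions are used only to avoid rounding noise:<br />

```python
from fractions import Fraction

OPUS_RATE_HZ = 48_000  # Opus timestamps are expressed at 48 kHz


def audio_timecode(video_timecode_s: Fraction, lag_samples: int = 3840) -> Fraction:
    """Timecode of the Opus frame muxed `lag_samples` behind a video frame."""
    return video_timecode_s - Fraction(lag_samples, OPUS_RATE_HZ)


lag_s = Fraction(3840, OPUS_RATE_HZ)   # 3840 samples at 48 kHz
opus_s = audio_timecode(Fraction(5))   # video key frame at 5.000 seconds
print(f"lag = {float(lag_s):.3f} s, Opus frame at {float(opus_s):.3f} s")
```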
<br />
[[Category:Opus]]</div>Derfhttps://wiki.xiph.org/index.php?title=OggOpus/testvectors&diff=16338OggOpus/testvectors2016-04-25T23:47:27Z<p>Derf: Comment data shorter than the packet is allowed (RFC 7845 Section 5.2)</p>
<hr />
<div>This page lists test vectors needed for OggOpus which are specific to the Ogg mapping (separate from the Opus bitstream test vectors, though they do some bitstream testing as a side effect).<br />
<br />
Greg is collecting a draft file set at https://people.xiph.org/~greg/opus_testvectors/<br />
<br />
<br />
* All test vectors should be chained files with at least two parts<br />
** Chained file where the second link has no pregap and starts with inter frames (to ensure that decoder state is reset)<br />
* Pre-skip (set large pre-skip with a chime "if you just heard a chime, your player is broken")<br />
* Multichannel<br />
** Multichannel stereo (e.g. mono+mono)<br />
** Multichannel w/pre-skip and random channel maps<br />
** Multichannel with silent channels<br />
*** Totally silent multichannel (Should this one be invalid?)<br />
** Multichannel with repeated channels (i.e. one stream used for multiple channels)<br />
** Multichannel with 256 channels<br />
** Mapping tests for the Vorbis mappings (e.g. name of the speaker spoken by each speaker)<br />
* Files with crazy input rate.<br />
* Header-gain set very high with a very quiet input (silent if you don't implement header gain).<br />
* Header-gain set very low with an input that will clip a decoder if the header gain is not done internally.<br />
* Header-gain set very low, and R128_TRACK_GAIN to normalize it<br />
** matching WAV outputs ... but matching to what?<br />
* Single packet per page<br />
* Utterly stuffed pages with constant continued pages<br />
* Pages whose contents are entirely and partially dropped frames (len=0) (maybe redundant with bitstream tests)<br />
* Files with chimes after the end (testing end length chopping)<br />
* File with all Opus modes and frame sizes<br />
* Stereo files using many mono frames at the beginning/end<br />
* OpusTags comment values containing very large nonsense comments, duplicate comment values etc.<br />
* Files with non-zero initial granulepos, pre-skip, trimmed last page to check duration calculation<br />
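For the last item, a reference value for the duration is needed to compare a player against. The sketch below is one reading of the granule position rules in RFC 7845, not a normative implementation: the stream is taken to start at the first data page's granulepos minus the samples decoded from that page, and the pre-skip is then removed from the total. The function name is hypothetical, the per-page sample count is assumed to be supplied by the caller, and the special case of a first page that also carries the end-of-stream flag is ignored.<br />

```python
OPUS_RATE_HZ = 48_000  # granule positions count samples at 48 kHz


def playable_samples(first_page_gp: int, first_page_samples: int,
                     final_gp: int, pre_skip: int) -> int:
    """Samples a player should output, under the assumptions stated above."""
    # Granule position just before the first audio packet.
    start_gp = first_page_gp - first_page_samples
    if start_gp < 0:
        raise ValueError("first data granulepos too small")
    total = final_gp - start_gp - pre_skip
    if total < 0:
        raise ValueError("pre-skip exceeds the decodable samples")
    return total
```

Dividing the result by <code>OPUS_RATE_HZ</code> gives the expected duration in seconds. The two error cases correspond to entries in the list of illegal test vectors.<br />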
<br />
=== Illegal test vectors that MUST fail ===<br />
* Zero streams (N=0)<br />
* Too many two-output streams<br />
** M>N<br />
** M<=N but M+N>255<br />
* Channels mapped to nonexistent stream indices (255 > index >= M+N)<br />
* Illegal OpusTags comments<br />
** Total length larger than the packet<br />
** Illegal field names<br />
** Illegal field contents<br />
** Illegal field (no "=")<br />
** Multiple R128_TRACK_GAIN comments (should this be required to fail?)<br />
** R128_TRACK_GAIN comments containing illegal values (should this be required to fail?)<br />
*** Non-ASCII encodings of correct-looking values<br />
* All GP==0<br />
* first data granulepos too small<br />
* preskip > final granulepos<br />
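The stream-count and channel-mapping conditions above can be checked mechanically. The following is a minimal sketch of such a check for ID headers that carry an explicit mapping table, with N the stream count and M the coupled (two-output) stream count. The function name and the error strings are made up for illustration; a real demuxer would reject the file rather than collect messages.<br />

```python
def mapping_errors(stream_count: int, coupled_count: int,
                   mapping: list[int]) -> list[str]:
    """Return reasons the channel mapping table is illegal (empty if none)."""
    n, m = stream_count, coupled_count
    errors = []
    if n == 0:
        errors.append("zero streams (N=0)")
    if m > n:
        errors.append("more coupled streams than streams (M>N)")
    if m + n > 255:
        errors.append("M+N>255")
    for channel, index in enumerate(mapping):
        # Index 255 marks a silent channel; any other index must name
        # one of the M+N decoded channels.
        if index != 255 and index >= m + n:
            errors.append(f"channel {channel} maps to nonexistent index {index}")
    return errors
```

An empty result means none of these particular conditions was hit; the OpusTags and granulepos cases in the list need separate checks.<br />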
<br />
[[Category:Ogg]]<br />
[[Category:Opus]]</div>Derfhttps://wiki.xiph.org/index.php?title=OpusExtensions&diff=16337OpusExtensions2016-04-19T21:49:51Z<p>Derf: Fix off-by-one in start of each range (thanks to Mark Harris for the report)</p>
<hr />
<div>Opus audio data packets begin with a "table of contents" (TOC) sequence which defines the frame duration, audio bandwidth and coding mode of the packet, as well as describing how individual frames are packed into the data packet ([https://tools.ietf.org/html/rfc6716#section-3.1 RFC 6716 Section 3.1]). Other types of data packets used with Opus in various containers are designed to start with a sequence which is not a valid TOC. This simplifies sorting such data for muxing implementations and ensures that it will be rejected by the decoder if it is accidentally passed as Opus audio data.<br />
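As an illustration of the TOC layout described in RFC 6716 Section 3.1, the first byte splits into a 5-bit configuration number, a stereo bit and a 2-bit frame packing code. The sketch below shows that split; the <code>Toc</code> type and <code>parse_toc</code> function are hypothetical names, not part of libopus.<br />

```python
from typing import NamedTuple


class Toc(NamedTuple):
    config: int   # 0..31, selects mode, bandwidth and frame duration
    stereo: bool  # the "s" bit
    code: int     # 0..3, how frames are packed into the packet


def parse_toc(first_byte: int) -> Toc:
    """Split the first byte of an Opus packet into its three TOC fields."""
    if not 0 <= first_byte <= 0xFF:
        raise ValueError("TOC must be a single byte")
    return Toc(config=first_byte >> 3,
               stereo=bool(first_byte & 0x04),
               code=first_byte & 0x03)
```

For example, <code>parse_toc(0x4F)</code> (the letter "O") yields configuration 9, stereo, code 3.<br />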
<br />
Below is a list of such alternate sequences, to avoid duplication.<br />
<br />
== List of reserved invalid Opus TOC sequences ==<br />
<br />
* `Op` is used as a prefix for metadata headers in .opus files. [https://tools.ietf.org/html/draft-ietf-codec-oggopus RFC 7845]<br />
* '0x3FF' in the first 11 bits marks an `opus_control_header` in MPEG-TS. [[OpusTS]]<br />
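Both reserved sequences are invalid because they look like code 3 packets whose frame count breaks requirement [R5] of RFC 6716 ("Code 3 packets contain at least one frame, but no more than 120 ms of audio total"). A minimal sketch of that test on the first two bytes follows; <code>is_invalid_prefix</code> is a hypothetical helper, and it deliberately ignores every rule that depends on the packet length.<br />

```python
# Frame duration per TOC config, in units of 2.5 ms (RFC 6716, Table 2).
FRAME_UNITS = [4, 8, 16, 24] * 3 + [4, 8] * 2 + [1, 2, 4, 8] * 4
MAX_UNITS = 48  # 120 ms expressed in 2.5 ms units


def is_invalid_prefix(prefix: int) -> bool:
    """True if a 16-bit packet prefix violates [R5] regardless of length."""
    toc, count_byte = prefix >> 8, prefix & 0xFF
    if (toc & 0x03) != 3:
        return False  # codes 0-2: validity depends on the packet length
    frames = count_byte & 0x3F  # M, the low six bits of the frame count byte
    return frames == 0 or frames * FRAME_UNITS[toc >> 3] > MAX_UNITS


op_prefix = int.from_bytes(b"Op", "big")  # 0x4F70
print(hex(op_prefix), is_invalid_prefix(op_prefix))
```

The same function reports every value from 0x7FE0 to 0x7FFF as invalid, which covers all prefixes whose first 11 bits are 0x3FF.<br />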
<br />
== Space of all invalid Opus TOC sequences ==<br />
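The ranges in this section can be regenerated rather than maintained by hand. The sketch below assumes they are exactly the 16-bit prefixes that violate requirement [R5] of RFC 6716 (a code 3 packet with zero frames or more than 120 ms of audio); the function name is hypothetical, and the annotated entries for "Op" and <code>opus_control_header</code> appear merged into their surrounding ranges.<br />

```python
# Frame duration per TOC config, in units of 2.5 ms (RFC 6716, Table 2).
FRAME_UNITS = [4, 8, 16, 24] * 3 + [4, 8] * 2 + [1, 2, 4, 8] * 4


def invalid_ranges() -> list[tuple[int, int]]:
    """Inclusive ranges of 16-bit prefixes that violate [R5]."""
    ranges = []
    for prefix in range(0x10000):
        toc, frames = prefix >> 8, prefix & 0x3F
        # 48 units of 2.5 ms make up the 120 ms limit.
        bad = (toc & 3) == 3 and (frames == 0 or frames * FRAME_UNITS[toc >> 3] > 48)
        if not bad:
            continue
        if ranges and ranges[-1][1] == prefix - 1:
            ranges[-1] = (ranges[-1][0], prefix)  # extend the current run
        else:
            ranges.append((prefix, prefix))       # start a new run
    return ranges


for low, high in invalid_ranges()[:5]:
    print(f"0x{low:04X}" if low == high else f"0x{low:04X}...0x{high:04X}")
```

The loop at the end prints the first five entries in the same notation as the list that follows.<br />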
<br />
* 0x0300<br />
* 0x030D...0x0340<br />
* 0x034D...0x0380<br />
* 0x038D...0x03C0<br />
* 0x03CD...0x03FF<br />
<br />
* 0x0700<br />
* 0x070D...0x0740<br />
* 0x074D...0x0780<br />
* 0x078D...0x07C0<br />
* 0x07CD...0x07FF<br />
<br />
* 0x0B00<br />
* 0x0B07...0x0B40<br />
* 0x0B47...0x0B80<br />
* 0x0B87...0x0BC0<br />
* 0x0BC7...0x0BFF<br />
<br />
* 0x0F00<br />
* 0x0F07...0x0F40<br />
* 0x0F47...0x0F80<br />
* 0x0F87...0x0FC0<br />
* 0x0FC7...0x0FFF<br />
<br />
* 0x1300<br />
* 0x1304...0x1340<br />
* 0x1344...0x1380<br />
* 0x1384...0x13C0<br />
* 0x13C4...0x13FF<br />
<br />
* 0x1700<br />
* 0x1704...0x1740<br />
* 0x1744...0x1780<br />
* 0x1784...0x17C0<br />
* 0x17C4...0x17FF<br />
<br />
* 0x1B00<br />
* 0x1B03...0x1B40<br />
* 0x1B43...0x1B80<br />
* 0x1B83...0x1BC0<br />
* 0x1BC3...0x1BFF<br />
<br />
* 0x1F00<br />
* 0x1F03...0x1F40<br />
* 0x1F43...0x1F80<br />
* 0x1F83...0x1FC0<br />
* 0x1FC3...0x1FFF<br />
<br />
* 0x2300<br />
* 0x230D...0x2340<br />
* 0x234D...0x2380<br />
* 0x238D...0x23C0<br />
* 0x23CD...0x23FF<br />
<br />
* 0x2700<br />
* 0x270D...0x2740<br />
* 0x274D...0x2780<br />
* 0x278D...0x27C0<br />
* 0x27CD...0x27FF<br />
<br />
* 0x2B00<br />
* 0x2B07...0x2B40<br />
* 0x2B47...0x2B80<br />
* 0x2B87...0x2BC0<br />
* 0x2BC7...0x2BFF<br />
<br />
* 0x2F00<br />
* 0x2F07...0x2F40<br />
* 0x2F47...0x2F80<br />
* 0x2F87...0x2FC0<br />
* 0x2FC7...0x2FFF<br />
<br />
* 0x3300<br />
* 0x3304...0x3340<br />
* 0x3344...0x3380<br />
* 0x3384...0x33C0<br />
* 0x33C4...0x33FF<br />
<br />
* 0x3700<br />
* 0x3704...0x3740<br />
* 0x3744...0x3780<br />
* 0x3784...0x37C0<br />
* 0x37C4...0x37FF<br />
<br />
* 0x3B00<br />
* 0x3B03...0x3B40<br />
* 0x3B43...0x3B80<br />
* 0x3B83...0x3BC0<br />
* 0x3BC3...0x3BFF<br />
<br />
* 0x3F00<br />
* 0x3F03...0x3F40<br />
* 0x3F43...0x3F80<br />
* 0x3F83...0x3FC0<br />
* 0x3FC3...0x3FFF<br />
<br />
* 0x4300<br />
* 0x430D...0x4340<br />
* 0x434D...0x4380<br />
* 0x438D...0x43C0<br />
* 0x43CD...0x43FF<br />
<br />
* 0x4700<br />
* 0x470D...0x4740<br />
* 0x474D...0x4780<br />
* 0x478D...0x47C0<br />
* 0x47CD...0x47FF<br />
<br />
* 0x4B00<br />
* 0x4B07...0x4B40<br />
* 0x4B47...0x4B80<br />
* 0x4B87...0x4BC0<br />
* 0x4BC7...0x4BFF<br />
<br />
* 0x4F00<br />
* 0x4F07...0x4F40<br />
* 0x4F47...0x4F6F<br />
* 0x4F70 ("Op"): ID and Comment headers in .opus files [https://tools.ietf.org/html/draft-ietf-codec-oggopus RFC 7845]<br />
* 0x4F71...0x4F80<br />
* 0x4F87...0x4FC0<br />
* 0x4FC7...0x4FFF<br />
<br />
* 0x5300<br />
* 0x5304...0x5340<br />
* 0x5344...0x5380<br />
* 0x5384...0x53C0<br />
* 0x53C4...0x53FF<br />
<br />
* 0x5700<br />
* 0x5704...0x5740<br />
* 0x5744...0x5780<br />
* 0x5784...0x57C0<br />
* 0x57C4...0x57FF<br />
<br />
* 0x5B00<br />
* 0x5B03...0x5B40<br />
* 0x5B43...0x5B80<br />
* 0x5B83...0x5BC0<br />
* 0x5BC3...0x5BFF<br />
<br />
* 0x5F00<br />
* 0x5F03...0x5F40<br />
* 0x5F43...0x5F80<br />
* 0x5F83...0x5FC0<br />
* 0x5FC3...0x5FFF<br />
<br />
* 0x6300<br />
* 0x630D...0x6340<br />
* 0x634D...0x6380<br />
* 0x638D...0x63C0<br />
* 0x63CD...0x63FF<br />
<br />
* 0x6700<br />
* 0x670D...0x6740<br />
* 0x674D...0x6780<br />
* 0x678D...0x67C0<br />
* 0x67CD...0x67FF<br />
<br />
* 0x6B00<br />
* 0x6B07...0x6B40<br />
* 0x6B47...0x6B80<br />
* 0x6B87...0x6BC0<br />
* 0x6BC7...0x6BFF<br />
<br />
* 0x6F00<br />
* 0x6F07...0x6F40<br />
* 0x6F47...0x6F80<br />
* 0x6F87...0x6FC0<br />
* 0x6FC7...0x6FFF<br />
<br />
* 0x7300<br />
* 0x730D...0x7340<br />
* 0x734D...0x7380<br />
* 0x738D...0x73C0<br />
* 0x73CD...0x73FF<br />
<br />
* 0x7700<br />
* 0x770D...0x7740<br />
* 0x774D...0x7780<br />
* 0x778D...0x77C0<br />
* 0x77CD...0x77FF<br />
<br />
* 0x7B00<br />
* 0x7B07...0x7B40<br />
* 0x7B47...0x7B80<br />
* 0x7B87...0x7BC0<br />
* 0x7BC7...0x7BFF<br />
<br />
* 0x7F00<br />
* 0x7F07...0x7F40<br />
* 0x7F47...0x7F80<br />
* 0x7F87...0x7FC0<br />
* 0x7FC7...0x7FDF<br />
* 0x7FE0...0x7FFF: opus_control_header in MPEG-TS [[OpusTS]]<br />
<br />
* 0x8300<br />
* 0x8331...0x8340<br />
* 0x8371...0x8380<br />
* 0x83B1...0x83C0<br />
* 0x83F1...0x83FF<br />
<br />
* 0x8700<br />
* 0x8731...0x8740<br />
* 0x8771...0x8780<br />
* 0x87B1...0x87C0<br />
* 0x87F1...0x87FF<br />
<br />
* 0x8B00<br />
* 0x8B19...0x8B40<br />
* 0x8B59...0x8B80<br />
* 0x8B99...0x8BC0<br />
* 0x8BD9...0x8BFF<br />
<br />
* 0x8F00<br />
* 0x8F19...0x8F40<br />
* 0x8F59...0x8F80<br />
* 0x8F99...0x8FC0<br />
* 0x8FD9...0x8FFF<br />
<br />
* 0x9300<br />
* 0x930D...0x9340<br />
* 0x934D...0x9380<br />
* 0x938D...0x93C0<br />
* 0x93CD...0x93FF<br />
<br />
* 0x9700<br />
* 0x970D...0x9740<br />
* 0x974D...0x9780<br />
* 0x978D...0x97C0<br />
* 0x97CD...0x97FF<br />
<br />
* 0x9B00<br />
* 0x9B07...0x9B40<br />
* 0x9B47...0x9B80<br />
* 0x9B87...0x9BC0<br />
* 0x9BC7...0x9BFF<br />
<br />
* 0x9F00<br />
* 0x9F07...0x9F40<br />
* 0x9F47...0x9F80<br />
* 0x9F87...0x9FC0<br />
* 0x9FC7...0x9FFF<br />
<br />
* 0xA300<br />
* 0xA331...0xA340<br />
* 0xA371...0xA380<br />
* 0xA3B1...0xA3C0<br />
* 0xA3F1...0xA3FF<br />
<br />
* 0xA700<br />
* 0xA731...0xA740<br />
* 0xA771...0xA780<br />
* 0xA7B1...0xA7C0<br />
* 0xA7F1...0xA7FF<br />
<br />
* 0xAB00<br />
* 0xAB19...0xAB40<br />
* 0xAB59...0xAB80<br />
* 0xAB99...0xABC0<br />
* 0xABD9...0xABFF<br />
<br />
* 0xAF00<br />
* 0xAF19...0xAF40<br />
* 0xAF59...0xAF80<br />
* 0xAF99...0xAFC0<br />
* 0xAFD9...0xAFFF<br />
<br />
* 0xB300<br />
* 0xB30D...0xB340<br />
* 0xB34D...0xB380<br />
* 0xB38D...0xB3C0<br />
* 0xB3CD...0xB3FF<br />
<br />
* 0xB700<br />
* 0xB70D...0xB740<br />
* 0xB74D...0xB780<br />
* 0xB78D...0xB7C0<br />
* 0xB7CD...0xB7FF<br />
<br />
* 0xBB00<br />
* 0xBB07...0xBB40<br />
* 0xBB47...0xBB80<br />
* 0xBB87...0xBBC0<br />
* 0xBBC7...0xBBFF<br />
<br />
* 0xBF00<br />
* 0xBF07...0xBF40<br />
* 0xBF47...0xBF80<br />
* 0xBF87...0xBFC0<br />
* 0xBFC7...0xBFFF<br />
<br />
* 0xC300<br />
* 0xC331...0xC340<br />
* 0xC371...0xC380<br />
* 0xC3B1...0xC3C0<br />
* 0xC3F1...0xC3FF<br />
<br />
* 0xC700<br />
* 0xC731...0xC740<br />
* 0xC771...0xC780<br />
* 0xC7B1...0xC7C0<br />
* 0xC7F1...0xC7FF<br />
<br />
* 0xCB00<br />
* 0xCB19...0xCB40<br />
* 0xCB59...0xCB80<br />
* 0xCB99...0xCBC0<br />
* 0xCBD9...0xCBFF<br />
<br />
* 0xCF00<br />
* 0xCF19...0xCF40<br />
* 0xCF59...0xCF80<br />
* 0xCF99...0xCFC0<br />
* 0xCFD9...0xCFFF<br />
<br />
* 0xD300<br />
* 0xD30D...0xD340<br />
* 0xD34D...0xD380<br />
* 0xD38D...0xD3C0<br />
* 0xD3CD...0xD3FF<br />
<br />
* 0xD700<br />
* 0xD70D...0xD740<br />
* 0xD74D...0xD780<br />
* 0xD78D...0xD7C0<br />
* 0xD7CD...0xD7FF<br />
<br />
* 0xDB00<br />
* 0xDB07...0xDB40<br />
* 0xDB47...0xDB80<br />
* 0xDB87...0xDBC0<br />
* 0xDBC7...0xDBFF<br />
<br />
* 0xDF00<br />
* 0xDF07...0xDF40<br />
* 0xDF47...0xDF80<br />
* 0xDF87...0xDFC0<br />
* 0xDFC7...0xDFFF<br />
<br />
* 0xE300<br />
* 0xE331...0xE340<br />
* 0xE371...0xE380<br />
* 0xE3B1...0xE3C0<br />
* 0xE3F1...0xE3FF<br />
<br />
* 0xE700<br />
* 0xE731...0xE740<br />
* 0xE771...0xE780<br />
* 0xE7B1...0xE7C0<br />
* 0xE7F1...0xE7FF<br />
<br />
* 0xEB00<br />
* 0xEB19...0xEB40<br />
* 0xEB59...0xEB80<br />
* 0xEB99...0xEBC0<br />
* 0xEBD9...0xEBFF<br />
<br />
* 0xEF00<br />
* 0xEF19...0xEF40<br />
* 0xEF59...0xEF80<br />
* 0xEF99...0xEFC0<br />
* 0xEFD9...0xEFFF<br />
<br />
* 0xF300<br />
* 0xF30D...0xF340<br />
* 0xF34D...0xF380<br />
* 0xF38D...0xF3C0<br />
* 0xF3CD...0xF3FF<br />
<br />
* 0xF700<br />
* 0xF70D...0xF740<br />
* 0xF74D...0xF780<br />
* 0xF78D...0xF7C0<br />
* 0xF7CD...0xF7FF<br />
<br />
* 0xFB00<br />
* 0xFB07...0xFB40<br />
* 0xFB47...0xFB80<br />
* 0xFB87...0xFBC0<br />
* 0xFBC7...0xFBFF<br />
<br />
* 0xFF00<br />
* 0xFF07...0xFF40<br />
* 0xFF47...0xFF80<br />
* 0xFF87...0xFFC0<br />
* 0xFFC7...0xFFFF</div>Derfhttps://wiki.xiph.org/index.php?title=OpusExtensions&diff=16335OpusExtensions2016-04-19T21:06:01Z<p>Derf: Add list of all invalid TOC sequences (both assigned and unassigned)</p>
<hr />
<div>Opus audio data packets begin with a "table of contents" (TOC) sequence which defines the frame duration, audio bandwidth and coding mode of the packet, as well as describing how individual frames are packed into the data packet ([https://tools.ietf.org/html/rfc6716#section-3.1 RFC 6716 Section 3.1]). Other types of data packets used with Opus in various containers are designed to start with a sequence which is not a valid TOC. This simplifies sorting such data for muxing implementations and ensures that it will be rejected by the decoder if it is accidentally passed as Opus audio data.<br />
<br />
Below is a list of such alternate sequences, to avoid duplication.<br />
<br />
== List of reserved invalid Opus TOC sequences ==<br />
<br />
* `Op` is used as a prefix for metadata headers in .opus files. [https://tools.ietf.org/html/draft-ietf-codec-oggopus RFC 7845]<br />
* '0x3FF' in the first 11 bits marks an `opus_control_header` in MPEG-TS. [[OpusTS]]<br />
<br />
== Constructing invalid TOC sequences ==<br />
<br />
The only restriction that does not depend on the number of bytes in the packet is [R5]: "Code 3 packets contain at least one frame, but no more than 120 ms of audio total."<br />
<br />
== Space of all invalid Opus TOC sequences ==<br />
<br />
* 0x0300<br />
* 0x030C...0x0340<br />
* 0x034C...0x0380<br />
* 0x038C...0x03C0<br />
* 0x03CC...0x03FF<br />
<br />
* 0x0700<br />
* 0x070C...0x0740<br />
* 0x074C...0x0780<br />
* 0x078C...0x07C0<br />
* 0x07CC...0x07FF<br />
<br />
* 0x0B00<br />
* 0x0B06...0x0B40<br />
* 0x0B46...0x0B80<br />
* 0x0B86...0x0BC0<br />
* 0x0BC6...0x0BFF<br />
<br />
* 0x0F00<br />
* 0x0F06...0x0F40<br />
* 0x0F46...0x0F80<br />
* 0x0F86...0x0FC0<br />
* 0x0FC6...0x0FFF<br />
<br />
* 0x1300<br />
* 0x1303...0x1340<br />
* 0x1343...0x1380<br />
* 0x1383...0x13C0<br />
* 0x13C3...0x13FF<br />
<br />
* 0x1700<br />
* 0x1703...0x1740<br />
* 0x1743...0x1780<br />
* 0x1783...0x17C0<br />
* 0x17C3...0x17FF<br />
<br />
* 0x1B00<br />
* 0x1B02...0x1B40<br />
* 0x1B42...0x1B80<br />
* 0x1B82...0x1BC0<br />
* 0x1BC2...0x1BFF<br />
<br />
* 0x1F00<br />
* 0x1F02...0x1F40<br />
* 0x1F42...0x1F80<br />
* 0x1F82...0x1FC0<br />
* 0x1FC2...0x1FFF<br />
<br />
* 0x2300<br />
* 0x230C...0x2340<br />
* 0x234C...0x2380<br />
* 0x238C...0x23C0<br />
* 0x23CC...0x23FF<br />
<br />
* 0x2700<br />
* 0x270C...0x2740<br />
* 0x274C...0x2780<br />
* 0x278C...0x27C0<br />
* 0x27CC...0x27FF<br />
<br />
* 0x2B00<br />
* 0x2B06...0x2B40<br />
* 0x2B46...0x2B80<br />
* 0x2B86...0x2BC0<br />
* 0x2BC6...0x2BFF<br />
<br />
* 0x2F00<br />
* 0x2F06...0x2F40<br />
* 0x2F46...0x2F80<br />
* 0x2F86...0x2FC0<br />
* 0x2FC6...0x2FFF<br />
<br />
* 0x3300<br />
* 0x3303...0x3340<br />
* 0x3343...0x3380<br />
* 0x3383...0x33C0<br />
* 0x33C3...0x33FF<br />
<br />
* 0x3700<br />
* 0x3703...0x3740<br />
* 0x3743...0x3780<br />
* 0x3783...0x37C0<br />
* 0x37C3...0x37FF<br />
<br />
* 0x3B00<br />
* 0x3B02...0x3B40<br />
* 0x3B42...0x3B80<br />
* 0x3B82...0x3BC0<br />
* 0x3BC2...0x3BFF<br />
<br />
* 0x3F00<br />
* 0x3F02...0x3F40<br />
* 0x3F42...0x3F80<br />
* 0x3F82...0x3FC0<br />
* 0x3FC2...0x3FFF<br />
<br />
* 0x4300<br />
* 0x430C...0x4340<br />
* 0x434C...0x4380<br />
* 0x438C...0x43C0<br />
* 0x43CC...0x43FF<br />
<br />
* 0x4700<br />
* 0x470C...0x4740<br />
* 0x474C...0x4780<br />
* 0x478C...0x47C0<br />
* 0x47CC...0x47FF<br />
<br />
* 0x4B00<br />
* 0x4B06...0x4B40<br />
* 0x4B46...0x4B80<br />
* 0x4B86...0x4BC0<br />
* 0x4BC6...0x4BFF<br />
<br />
* 0x4F00<br />
* 0x4F06...0x4F40<br />
* 0x4F46...0x4F6F<br />
* 0x4F70 ("Op"): ID and Comment headers in .opus files [https://tools.ietf.org/html/draft-ietf-codec-oggopus RFC 7845]<br />
* 0x4F71...0x4F80<br />
* 0x4F86...0x4FC0<br />
* 0x4FC6...0x4FFF<br />
<br />
* 0x5300<br />
* 0x5303...0x5340<br />
* 0x5343...0x5380<br />
* 0x5383...0x53C0<br />
* 0x53C3...0x53FF<br />
<br />
* 0x5700<br />
* 0x5703...0x5740<br />
* 0x5743...0x5780<br />
* 0x5783...0x57C0<br />
* 0x57C3...0x57FF<br />
<br />
* 0x5B00<br />
* 0x5B02...0x5B40<br />
* 0x5B42...0x5B80<br />
* 0x5B82...0x5BC0<br />
* 0x5BC2...0x5BFF<br />
<br />
* 0x5F00<br />
* 0x5F02...0x5F40<br />
* 0x5F42...0x5F80<br />
* 0x5F82...0x5FC0<br />
* 0x5FC2...0x5FFF<br />
<br />
* 0x6300<br />
* 0x630C...0x6340<br />
* 0x634C...0x6380<br />
* 0x638C...0x63C0<br />
* 0x63CC...0x63FF<br />
<br />
* 0x6700<br />
* 0x670C...0x6740<br />
* 0x674C...0x6780<br />
* 0x678C...0x67C0<br />
* 0x67CC...0x67FF<br />
<br />
* 0x6B00<br />
* 0x6B06...0x6B40<br />
* 0x6B46...0x6B80<br />
* 0x6B86...0x6BC0<br />
* 0x6BC6...0x6BFF<br />
<br />
* 0x6F00<br />
* 0x6F06...0x6F40<br />
* 0x6F46...0x6F80<br />
* 0x6F86...0x6FC0<br />
* 0x6FC6...0x6FFF<br />
<br />
* 0x7300<br />
* 0x730C...0x7340<br />
* 0x734C...0x7380<br />
* 0x738C...0x73C0<br />
* 0x73CC...0x73FF<br />
<br />
* 0x7700<br />
* 0x770C...0x7740<br />
* 0x774C...0x7780<br />
* 0x778C...0x77C0<br />
* 0x77CC...0x77FF<br />
<br />
* 0x7B00<br />
* 0x7B06...0x7B40<br />
* 0x7B46...0x7B80<br />
* 0x7B86...0x7BC0<br />
* 0x7BC6...0x7BFF<br />
<br />
* 0x7F00<br />
* 0x7F06...0x7F40<br />
* 0x7F46...0x7F80<br />
* 0x7F86...0x7FC0<br />
* 0x7FC6...0x7FDF<br />
* 0x7FE0...0x7FFF: opus_control_header in MPEG-TS [[OpusTS]]<br />
<br />
* 0x8300<br />
* 0x8330...0x8340<br />
* 0x8370...0x8380<br />
* 0x83B0...0x83C0<br />
* 0x83F0...0x83FF<br />
<br />
* 0x8700<br />
* 0x8730...0x8740<br />
* 0x8770...0x8780<br />
* 0x87B0...0x87C0<br />
* 0x87F0...0x87FF<br />
<br />
* 0x8B00<br />
* 0x8B18...0x8B40<br />
* 0x8B58...0x8B80<br />
* 0x8B98...0x8BC0<br />
* 0x8BD8...0x8BFF<br />
<br />
* 0x8F00<br />
* 0x8F18...0x8F40<br />
* 0x8F58...0x8F80<br />
* 0x8F98...0x8FC0<br />
* 0x8FD8...0x8FFF<br />
<br />
* 0x9300<br />
* 0x930C...0x9340<br />
* 0x934C...0x9380<br />
* 0x938C...0x93C0<br />
* 0x93CC...0x93FF<br />
<br />
* 0x9700<br />
* 0x970C...0x9740<br />
* 0x974C...0x9780<br />
* 0x978C...0x97C0<br />
* 0x97CC...0x97FF<br />
<br />
* 0x9B00<br />
* 0x9B06...0x9B40<br />
* 0x9B46...0x9B80<br />
* 0x9B86...0x9BC0<br />
* 0x9BC6...0x9BFF<br />
<br />
* 0x9F00<br />
* 0x9F06...0x9F40<br />
* 0x9F46...0x9F80<br />
* 0x9F86...0x9FC0<br />
* 0x9FC6...0x9FFF<br />
<br />
* 0xA300<br />
* 0xA330...0xA340<br />
* 0xA370...0xA380<br />
* 0xA3B0...0xA3C0<br />
* 0xA3F0...0xA3FF<br />
<br />
* 0xA700<br />
* 0xA730...0xA740<br />
* 0xA770...0xA780<br />
* 0xA7B0...0xA7C0<br />
* 0xA7F0...0xA7FF<br />
<br />
* 0xAB00<br />
* 0xAB18...0xAB40<br />
* 0xAB58...0xAB80<br />
* 0xAB98...0xABC0<br />
* 0xABD8...0xABFF<br />
<br />
* 0xAF00<br />
* 0xAF18...0xAF40<br />
* 0xAF58...0xAF80<br />
* 0xAF98...0xAFC0<br />
* 0xAFD8...0xAFFF<br />
<br />
* 0xB300<br />
* 0xB30C...0xB340<br />
* 0xB34C...0xB380<br />
* 0xB38C...0xB3C0<br />
* 0xB3CC...0xB3FF<br />
<br />
* 0xB700<br />
* 0xB70C...0xB740<br />
* 0xB74C...0xB780<br />
* 0xB78C...0xB7C0<br />
* 0xB7CC...0xB7FF<br />
<br />
* 0xBB00<br />
* 0xBB06...0xBB40<br />
* 0xBB46...0xBB80<br />
* 0xBB86...0xBBC0<br />
* 0xBBC6...0xBBFF<br />
<br />
* 0xBF00<br />
* 0xBF06...0xBF40<br />
* 0xBF46...0xBF80<br />
* 0xBF86...0xBFC0<br />
* 0xBFC6...0xBFFF<br />
<br />
* 0xC300<br />
* 0xC330...0xC340<br />
* 0xC370...0xC380<br />
* 0xC3B0...0xC3C0<br />
* 0xC3F0...0xC3FF<br />
<br />
* 0xC700<br />
* 0xC730...0xC740<br />
* 0xC770...0xC780<br />
* 0xC7B0...0xC7C0<br />
* 0xC7F0...0xC7FF<br />
<br />
* 0xCB00<br />
* 0xCB18...0xCB40<br />
* 0xCB58...0xCB80<br />
* 0xCB98...0xCBC0<br />
* 0xCBD8...0xCBFF<br />
<br />
* 0xCF00<br />
* 0xCF18...0xCF40<br />
* 0xCF58...0xCF80<br />
* 0xCF98...0xCFC0<br />
* 0xCFD8...0xCFFF<br />
<br />
* 0xD300<br />
* 0xD30C...0xD340<br />
* 0xD34C...0xD380<br />
* 0xD38C...0xD3C0<br />
* 0xD3CC...0xD3FF<br />
<br />
* 0xD700<br />
* 0xD70C...0xD740<br />
* 0xD74C...0xD780<br />
* 0xD78C...0xD7C0<br />
* 0xD7CC...0xD7FF<br />
<br />
* 0xDB00<br />
* 0xDB06...0xDB40<br />
* 0xDB46...0xDB80<br />
* 0xDB86...0xDBC0<br />
* 0xDBC6...0xDBFF<br />
<br />
* 0xDF00<br />
* 0xDF06...0xDF40<br />
* 0xDF46...0xDF80<br />
* 0xDF86...0xDFC0<br />
* 0xDFC6...0xDFFF<br />
<br />
* 0xE300<br />
* 0xE330...0xE340<br />
* 0xE370...0xE380<br />
* 0xE3B0...0xE3C0<br />
* 0xE3F0...0xE3FF<br />
<br />
* 0xE700<br />
* 0xE730...0xE740<br />
* 0xE770...0xE780<br />
* 0xE7B0...0xE7C0<br />
* 0xE7F0...0xE7FF<br />
<br />
* 0xEB00<br />
* 0xEB18...0xEB40<br />
* 0xEB58...0xEB80<br />
* 0xEB98...0xEBC0<br />
* 0xEBD8...0xEBFF<br />
<br />
* 0xEF00<br />
* 0xEF18...0xEF40<br />
* 0xEF58...0xEF80<br />
* 0xEF98...0xEFC0<br />
* 0xEFD8...0xEFFF<br />
<br />
* 0xF300<br />
* 0xF30C...0xF340<br />
* 0xF34C...0xF380<br />
* 0xF38C...0xF3C0<br />
* 0xF3CC...0xF3FF<br />
<br />
* 0xF700<br />
* 0xF70C...0xF740<br />
* 0xF74C...0xF780<br />
* 0xF78C...0xF7C0<br />
* 0xF7CC...0xF7FF<br />
<br />
* 0xFB00<br />
* 0xFB06...0xFB40<br />
* 0xFB46...0xFB80<br />
* 0xFB86...0xFBC0<br />
* 0xFBC6...0xFBFF<br />
<br />
* 0xFF00<br />
* 0xFF06...0xFF40<br />
* 0xFF46...0xFF80<br />
* 0xFF86...0xFFC0<br />
* 0xFFC6...0xFFFF</div>Derfhttps://wiki.xiph.org/index.php?title=OpusTS&diff=16329OpusTS2016-04-19T19:57:57Z<p>Derf: </p>
<hr />
<div>The latest draft mapping for Opus in MPEG-TS can be found here:<br />
<br />
https://people.xiph.org/~tterribe/opus/ETSI_TS_opus-v0.1.3-draft.doc<br />
<br />
[[Category:Opus]]</div>Derfhttps://wiki.xiph.org/index.php?title=Daala&diff=16125Daala2015-10-26T23:42:34Z<p>Derf: /* Presentations */</p>
<hr />
<div>Daala is the codename for a new video compression technology.<br />
<br />
The effort is a collaboration between the [https://www.mozilla.org/en-US/research/ Mozilla Foundation], the [https://www.xiph.org/ Xiph.Org Foundation] and any other contributors that wish to help.<br />
<br />
The goal of the project is to provide a video format that's free to implement, use and distribute, and a reference implementation with technical performance superior to [https://en.wikipedia.org/wiki/High_Efficiency_Video_Coding H.265]. <br />
<br />
Please see the links below or the [https://www.xiph.org/daala/ main page] for more information.<br />
<br />
== Wiki Pages ==<br />
* [[Daala Quickstart|Daala Quickstart (Linux/MacOS)]]<br />
* [[Daala Quickstart Windows|Daala Quickstart (Windows)]]<br />
* [[Daala MinGW64 Environment]]<br />
<br />
* [[Daala Weekly Meetings|Daala Weekly Meetings]]<br />
<br />
* [[AreWeCompressedYet]]<br />
* [[RD Curve Data Format]]<br />
<br />
* [[DaalaTodo|Daala To-do List]]<br />
* [[DaalaRoadmap|Daala Roadmap]]<br />
<br />
* [[Intra|Intra-prediction within Daala]]<br />
<br />
* [[Videos|Digital Primers]] - educational videos about audio/video technology<br />
<br />
== Communication ==<br />
You are '''encouraged''' to join the<br />
* [irc://irc.freenode.net/daala '''#daala''' IRC channel at freenode.net] - if you don't have an IRC client, you can use Freenode's '''[https://webchat.freenode.net/?channels=%23daala webchat]''' instead.<br />
* [http://lists.xiph.org/mailman/listinfo/daala Daala Email List]<br />
<br />
=== Weekly Meetings ===<br />
You are also welcome to attend the public [[Daala Weekly Meetings|weekly progress meetings]] by installing and using [http://wiki.mumble.info Mumble].<br /><br />
The address is '''mf4.xiph.org''' and the port is '''64738''' (you can run '''mumble://mf4.xiph.org:64738''' within your browser as a shortcut).<br /><br />
The meetings occur on '''Tuesdays''' at '''[http://www.timeanddate.com/worldclock/fixedtime.html?msg=Daala+Weekly+Meeting&iso=20150428T09&p1=1241 9AM Pacific Time]''' (5PM UTC/GMT).<br />
The meeting agenda used to be available at '''[https://daala.etherpad.mozilla.org/weekly-meeting this Etherpad]'''. The October 13, 2015 meeting is available on [https://docs.google.com/document/d/1JP_Ko3wPuyDWhooZcp_m9kndyfZ75xN5YOi5yIMCW0s/edit?pli=1 Google Docs]. Following the migration to Etherpad Lite, the meeting agenda and minutes are now available at [https://public.etherpad-mozilla.org/p/daala-weekly-meeting this Etherpad].<br />
<br />
=== Other ===<br />
* [http://forum.doom9.org/showthread.php?t=168004 Doom9 Forum discussion] - generic forum thread regarding Daala<br />
* <del>[https://daala.etherpad.mozilla.org/ep/padlist/all-pads Daala Etherpads] - you can [https://daala.etherpad.mozilla.org/ep/account/request-account request a free account] to view these. You should receive access within a few days.</del> Mozilla are transitioning to Etherpad Lite.<br />
* [http://benjamin.smedbergs.us/weekly-updates.fcgi/project/daala Daala Project Status Board] - what Daala bits the Mozilla people are working on<br />
<br />
== Coding ==<br />
You can get a copy of the latest Daala Source Code from [https://git.xiph.org/?p=daala.git;a=summary '''git.xiph.org'''] or [https://github.com/xiph/daala '''GitHub''']. Please stick to the '''[https://git.xiph.org/?p=daala.git;a=blob_plain;f=doc/coding_style.html Coding Style Guide]'''.<br />
<br />
* [https://review.xiph.org/all?limit=100 Xiph Code Reviews] - there is a proposal on the review process '''[[DaalaReview|here]]'''<br />
* [https://github.com/xiph/daala/issues Daala's issues] - Issue/bug tracker on Github<br />
* [https://mf4.xiph.org/jenkins/view/daala/ Continuous Integration Tests] - these run every time a new commit is made to the Daala git master, to make sure the new code hasn't broken existing functionality.<br />
<br />
== Demos ==<br />
* [https://people.xiph.org/~xiphmont/demo/daala/player-demo.shtml Daala Video Player] - an example implementation of a Daala decoder and player, ported to Javascript using [https://github.com/kripken/emscripten Emscripten].<br />
<br />
=== Codec Techniques ===<br />
* [https://people.xiph.org/~xiphmont/demo/ Demo Articles] - explanations on certain techniques used in Daala (and other Xiph.Org projects)<br />
* [http://exp.martres.me/edi/ Edge-Directed Interpolation] ([https://github.com/smarter/edi source code])<br />
* [https://people.xiph.org/~ds/edi/info.html More Edge-Directed Interpolation]<br />
* [https://people.xiph.org/~unlord/demo/intra.html Intra-prediction]<br />
* [https://people.xiph.org/~unlord/zigzags.html Macroblock Coefficient Zigzag Graph] - HTML page generated using [https://github.com/xiph/daala/blob/master/tools/draw_zigzags.c tools/draw_zigzags.c] from the Daala source code.<br />
<br />
== Documents ==<br />
* [https://people.xiph.org/~unlord/spie_cfl.pdf Chroma from Luma (CfL)]<br />
* [http://jmvalin.ca/papers/spie_pvq.pdf Perceptual Vector Quantisation (PVQ)]<br />
* [https://people.xiph.org/~tterribe/daala/vbsobmc.pdf Overlapped Block Motion Compensation (OBMC)]<br />
* [https://mf4.xiph.org/jenkins/job/daala-autotools/ws/doc/html/index.html C API Documentation]<br />
* [https://people.xiph.org/~yushin/tmp__/yushin_phd_thesis.pdf Image Coding Thesis] by Yushin Cho<br />
* [http://arxiv.org/pdf/1411.4290v1.pdf Maximising Coding Efficiency Through Block Rotation] and why it [http://lists.xiph.org/pipermail/daala/2015-January/000054.html won't work well within Daala]<br />
* [http://jmvalin.ca/video/theoretical_results.pdf JMSpeex' Journal of Dubious Theoretical Results] - "take with an entire shaker-full of salt"<br />
* [https://people.xiph.org/~unlord/pcs_daala.pdf Using Daala Intra Frames for Still Picture Coding]<br />
<br />
=== IETF Drafts ===<br />
* [https://tools.ietf.org/html/draft-egge-videocodec-tdlt Time-Domain Lapped Transforms (TDLT)] - documents the Lapped Transform pre- and post-filters used for block-edge decorrelation<br />
* [https://tools.ietf.org/html/draft-valin-videocodec-pvq Perceptual Vector Quantisation (PVQ)]<br />
* [https://tools.ietf.org/html/draft-terriberry-codingtools Coding Tools] - documents Entropy Coding, Integer Transforms and other techniques<br />
* [https://tools.ietf.org/html/draft-moffitt-netvc-requirements Internet Video Codec (NetVC) Requirements] - explains what requirements and use cases Daala is trying to cater for<br />
* [https://tools.ietf.org/html/draft-daede-netvc-testing Internet Video Codec (NetVC) Testing and Quality Measurement]<br />
* [https://tools.ietf.org/html/draft-terriberry-ipr-license Example IPR Licence Terms]<br />
<br />
Additional drafts can be found at the [https://datatracker.ietf.org/wg/netvc/documents/ IETF DataTracker].<br />
<br />
== Presentations ==<br />
* 2015-10-24 LinuxDay 24 (Turin) - [https://people.xiph.org/~tterribe/daala/linuxday24.pdf Slides]<br />
* 2015-10-21 MPEG 113 - Future Video Coding Workshop [https://people.xiph.org/~tterribe/daala/mpeg113.pdf Slides]<br />
* 2015-09-19 VideoLAN Dev Days - [https://www.youtube.com/playlist?list=PLQLpBN3oI7E44HIdTOovThc1MNHLchgHE YouTube Playlist] - [https://people.xiph.org/~tterribe/daala/vdd2015.pdf Daala Slides]<br />
* 2015-07-22 IETF 93 - [http://recordings.conf.meetecho.com/Playout/watch.jsp?recording=IETF93_NETVC_II&chapter=chapter_1 NetVC Session 2/2] - [https://datatracker.ietf.org/meeting/93/agenda/netvc/ Agenda] - [http://www.meetecho.com/ietf93/netvc_II Video and Chat] - Slides - [https://www.ietf.org/jabber/logs/netvc/2015-07-22.html Jabber Log]<br />
* 2015-07-20 IETF 93 - [http://recordings.conf.meetecho.com/Playout/watch.jsp?recording=IETF93_NETVC&chapter=chapter_1 NetVC Session 1/2] - [https://datatracker.ietf.org/meeting/93/agenda/netvc/ Agenda] - [http://www.meetecho.com/ietf93/netvc Video and Chat] - [https://www.ietf.org/proceedings/93/slides/slides-93-netvc-3.pdf Slides] - [https://www.ietf.org/jabber/logs/netvc/2015-07-20.html Jabber Log]<br />
* 2015-03-24 IETF 92 - [http://recordings.conf.meetecho.com/Playout/watch.jsp?recording=IETF92_NETVC&chapter=chapter_0 NetVC Session] - Audio as [https://people.xiph.org/~tdaede/audio/ietf92-venetian-20150324-0900-am1.opus Opus] (29MB) or [https://www.ietf.org/audio/ietf92/ietf92-venetian-20150324-0900-am1.mp3 MP3] (119MB, action starts at 14:50) - [https://www.ietf.org/proceedings/92/slides/slides-92-netvc-0.pdf Slides] - [https://www.ietf.org/mail-archive/web/video-codec/current/msg00235.html Notes] - [https://www.ietf.org/jabber/logs/netvc/2015-03-24.html Jabber Log]<br />
* 2015-02-11 SPIE talks:<br />
<!-- Originals of these 3 videos can be found at http://people.xiph.org/~unlord/ --><br />
<!-- Someday, do a better encode --><br />
** [http://people.xiph.org/~tdaede/video/SPIE_Nathan.webm Chroma from Luma (CfL)] - [https://people.xiph.org/~unlord/SPIE-2015-CfL.pdf Slides] - [https://people.xiph.org/~unlord/spie_cfl.pdf Paper]<br />
** [http://people.xiph.org/~tdaede/video/SPIE_PVQ.webm Perceptual Vector Quantisation (PVQ)] - [http://people.xiph.org/~tterribe/daala/spie_pvq_slides.pdf Slides] - [http://jmvalin.ca/papers/spie_pvq.pdf Paper]<br />
** [http://people.xiph.org/~tdaede/video/SPIE_Tim.webm Overlapped Block Motion Compensation (OBMC)] - [http://people.xiph.org/~tterribe/daala/spie_pvq_slides.pdf Slides] - [https://people.xiph.org/~tterribe/daala/vbsobmc.pdf Paper]<br />
* 2015-01-31 [http://ftp.osuosl.org/pub/fosdem/2015/devroom-open_media/daala.mp4 Daala Project Update at FOSDEM 2015] - [https://fosdem.org/2015/schedule/event/daala/ summary] - [https://fosdem.org/2015/schedule/event/daala/attachments/slides/569/export/events/attachments/daala/slides/569/Daala_FOSDEM_2015.pdf Slides]<br />
* 2015-01-14 [https://www.youtube.co.uk/watch?v=Dmho4gcRvQ4 Linux Conf 2015] - [http://lca2015.linux.org.au/schedule/30187/view_talk presentation summary] - [https://people.xiph.org/~tterribe/pubs/lca2015/daala.pdf Slides]<br />
-------<br />
* 2014-09-16 [https://air.mozilla.org/daala-are-we-compressed-yet/ Daala: Are We Compressed Yet?]<br />
* 2014-06-25 [https://air.mozilla.org/sparsity-induced-prediction-for-images-and-video/ Sparsity Induced Prediction for Images and Video]<br />
* 2014-06-06 VP9 Summit (no video available) - [https://people.xiph.org/~xiphmont/demo/daala/daala-vp9summit-20140606.pdf Slides]<br />
-------<br />
* 2013-10-23 [https://people.xiph.org/~xiphmont/video/Free_Codecs_Update_Opus_and_Daala.ogv Opus and Daala: State of the Art Royalty-free Codecs] - [https://people.xiph.org/~greg/gstreamer-daala-opus.pdf Slides]<br />
* 2013-09-30 [https://people.xiph.org/~tterribe/daala/coding_party2/?C=M;O=A Daala Coding Party 2] - [https://people.xiph.org/~unlord/Daala-Intra.pdf Slides]<br />
* 2013-05-02 [https://people.xiph.org/~xiphmont/tim-terriberry-presents-daala/ Tim Terriberry Presents Daala]<br />
-------<br />
* 2012-01-24 [https://media.basilgohar.com/derf-talks/?C=M;O=A Introduction to Video Coding] - [https://people.xiph.org/~tterribe/pubs/lca2012/auckland/intro_to_video1.pdf Slides] (no video for slides 1-50)<br />
<br />
== Other Websites ==<br />
* [https://www.zazzle.com/daala_tee_shirt-235139149596175944 Daala T-shirts] - if you'd like a free one, help out with the project and ask the Mozilla guys nicely for one :-)<br />
* [https://www.xiph.org/donate/ Donate to Xiph.Org] <br />
* [[Daala_on_Wheels|Historical Daala wiki page]]<br />
<br />
[[Category:Daala]]</div>Derfhttps://wiki.xiph.org/index.php?title=DaalaRoadmap&diff=15592DaalaRoadmap2015-03-31T00:52:20Z<p>Derf: /* Plans for March, 2014 to September, 2014 */</p>
<hr />
<div>== Daala Planning ==<br />
<br />
This is an overview of the Daala project roadmap.<br />
<br />
'''Information on this page is highly subject to update and change.'''<br />
<br />
Please reach out to us if you are interested in contributing to the project. We would love your help!<br />
<br />
=== Plans for 2014 ===<br />
<br />
See [https://daala.etherpad.mozilla.org/daala-plan-2014 this etherpad] for details. Most information has moved there. See also the [[Daala Weekly Meetings|weekly meeting minutes]] on [[Daala]] for current efforts.<br />
<br />
=== Plans for September, 2013 to March, 2014 ===<br />
<br />
==== Improve existing techniques ==== <br />
<br />
Examining and significantly modifying some of the basic coding tools within Daala to improve efficiency and quality:<br />
<br />
1) Lapped Transforms<br />
* [https://people.xiph.org/~xiphmont/demo/daala/demo1.shtml Monty's LT demo]<br />
* [[TDLT|Time-Domain Lapped Transforms]] wiki page<br />
* https://thanglong.ece.jhu.edu/Tran/Pub/prepost.pdf<br />
* https://research.microsoft.com/pubs/102075/malvar_elt_tsp1192.pdf<br />
<br />
2) Frequency Domain Intra-prediction <br />
* [https://people.xiph.org/~xiphmont/demo/daala/demo2.shtml Monty's Intra-prediction demo]<br />
* [[Intra|Intra-prediction]] wiki page<br />
<br />
3) Time/Frequency resolution switching <br />
* [https://people.xiph.org/~xiphmont/demo/daala/demo3.shtml Monty's TF switching demo]<br />
<br />
4) Chroma from Luma (CfL)<br />
* [https://people.xiph.org/~xiphmont/demo/daala/demo4.shtml Monty's CfL demo]<br />
<br />
5) Motion Compensation tools<br />
<br />
==== Research new techniques ====<br />
<br />
Investigate the following to see if they should be adopted into Daala:<br />
<br />
1) Edge-directed Interpolation <br />
* https://elynxsdk.free.fr/ext-docs/Demosaicing/more/news0/New%20Edge-Directed%20Interpolation.pdf<br />
* [https://exp.martres.me/edi/ smarter's EDI demo in Javascript]<br />
<br />
2) Multi-frame Motion Compensation<br />
<br />
==== Testing tools ====<br />
<br />
1) Experiment with command-line encode/decode/performance tools<br />
* Help people try the codec<br />
* Self-testing<br />
* Improvement metrics for casual contributors to verify changes<br />
<br />
2) Prototype RTP in GIPS/webrtc.org/browser code<br />
<br />
3) Prototype HTTP streaming.<br />
<br />
=== Plans for March, 2014 to September, 2014 ===<br />
<br />
<br />
From March, 2014 to June, 2014: Tuning the various coding tools and components of Daala (assumes investigation and major modifications of the basic coding tools are completed)<br />
<br />
By September, 2014: Be able to show significant quality improvements compared to Daala's performance today (September, 2013).<br />
<br />
=== Plans for 2015 ===<br />
<br />
[https://people.xiph.org/~tterribe/daala/daala-schedule-20150313/Daala%20Schedule.pdf Schedule by task]<br />
<br />
[https://people.xiph.org/~tterribe/daala/daala-schedule-20150313/Daala%20Schedule2.pdf Schedule by person]<br />
<br />
=== Progress and Planning Tools ===<br />
* Every week a Mumble meeting will occur to discuss current development.<br />
* Every 6-8 weeks the team will report on what it has accomplished.<br />
* Every month the team will create a detailed task list of what they plan to do for that month.<br />
<br />
=== Estimates of Technique Effectiveness ===<br />
<br />
These estimates are mostly guesswork, based on no actual data. Take them with an entire salt shaker full of salt.<br />
<br />
{| border="1" cellspacing="0" cellpadding="5"<br />
|-<br />
! Technique<br />
! % of bitrate est.<br />
! risk<br />
|-<br />
| intraprediction<br />
| 10% <br />
| <br />
|-<br />
| rate control<br />
| 10%<br />
| low<br />
|-<br />
| multiple reference frames<br />
| 10%<br />
|<br />
|-<br />
| alternate motion predictors<br />
| 5%<br />
|<br />
|-<br />
| multi-resolution blending<br />
| 2%<br />
|<br />
|-<br />
| edge-directed interpolation<br />
| 4-5%<br />
|<br />
|-<br />
| sub-pel search<br />
| 2%<br />
|<br />
|-<br />
| mixed prediction (intra+inter)<br />
| 12%<br />
| high<br />
|-<br />
| generic encoder replacement/optimization<br />
| 1%<br />
|<br />
|-<br />
| skip work<br />
| 5%<br />
|<br />
|-<br />
| deringing<br />
| 10%<br />
| medium<br />
|-<br />
| bi-prediction<br />
| 15%<br />
|<br />
|-<br />
| k-tokenizer<br />
| +/- 1%<br />
|<br />
|-<br />
| 32x32 motion<br />
| 10%<br />
| low<br />
|-<br />
| don't code outside frame<br />
| 1-2%<br />
|<br />
|-<br />
| adaptive motion compensation work<br />
adjacent blocks differing by more than 1 level<br />
| 10%<br />
| high<br />
|-<br />
| don't use SAD (Sum of Absolute Differences)<br />
|<br />
|<br />
|}<br />
<br />
<br />
[[Category:Daala]]</div>Derfhttps://wiki.xiph.org/index.php?title=DaalaRoadmap&diff=15510DaalaRoadmap2015-02-27T15:45:31Z<p>Derf: /* Estimates of Technique Effectiveness */</p>
<hr />
<div>== Daala Planning ==<br />
<br />
This is an overview of the Daala project roadmap.<br />
<br />
'''Information on this page is highly subject to update and change.'''<br />
<br />
Please reach out to us if you are interested in contributing to the project. We would love your help!<br />
<br />
=== Plans for 2014 ===<br />
<br />
See [https://daala.etherpad.mozilla.org/daala-plan-2014 this etherpad] for details. Most information has moved there. See also the [[Daala Weekly Meetings|weekly meeting minutes]] on [[Daala]] for current efforts.<br />
<br />
=== Plans for September, 2013 to March, 2014 ===<br />
<br />
==== Improve existing techniques ==== <br />
<br />
Examining and significantly modifying some of the basic coding tools within Daala to improve efficiency and quality:<br />
<br />
1) Lapped Transforms<br />
* [https://people.xiph.org/~xiphmont/demo/daala/demo1.shtml Monty's LT demo]<br />
* [[TDLT|Time-Domain Lapped Transforms]] wiki page<br />
* https://thanglong.ece.jhu.edu/Tran/Pub/prepost.pdf<br />
* https://research.microsoft.com/pubs/102075/malvar_elt_tsp1192.pdf<br />
<br />
2) Frequency Domain Intra-prediction <br />
* [https://people.xiph.org/~xiphmont/demo/daala/demo2.shtml Monty's Intra-prediction demo]<br />
* [[Intra|Intra-prediction]] wiki page<br />
<br />
3) Time/Frequency resolution switching <br />
* [https://people.xiph.org/~xiphmont/demo/daala/demo3.shtml Monty's TF switching demo]<br />
<br />
4) Chroma from Luma (CfL)<br />
* [https://people.xiph.org/~xiphmont/demo/daala/demo4.shtml Monty's CfL demo]<br />
<br />
5) Motion Compensation tools<br />
<br />
==== Research new techniques ====<br />
<br />
Investigate the following to see if they should be adopted into Daala:<br />
<br />
1) Edge-directed Interpolation <br />
* https://elynxsdk.free.fr/ext-docs/Demosaicing/more/news0/New%20Edge-Directed%20Interpolation.pdf<br />
* [https://exp.martres.me/edi/ smarter's EDI demo in Javascript]<br />
<br />
2) Multi-frame Motion Compensation<br />
<br />
==== Testing tools ====<br />
<br />
1) Experiment with command-line encode/decode/performance tools<br />
* Help people try the codec<br />
* Self-testing<br />
* Improvement metrics for casual contributors to verify changes<br />
<br />
2) Prototype RTP in GIPS/webrtc.org/browser code<br />
<br />
3) Prototype HTTP streaming.<br />
<br />
=== Plans for March, 2014 to September, 2014 ===<br />
<br />
<br />
From March, 2014 to June, 2014: Tuning the various coding tools and components of Daala (assumes investigation and major modifications of the basic coding tools are completed)<br />
<br />
By September, 2014: Be able to show significant quality improvements compared to Daala's performance today (September, 2013).<br />
<br />
=== Progress and Planning Tools ===<br />
* Every week a Mumble meeting will occur to discuss current development.<br />
* Every 6-8 weeks the team will report on what it has accomplished.<br />
* Every month the team will create a detailed task list of what they plan to do for that month.<br />
<br />
=== Estimates of Technique Effectiveness ===<br />
<br />
These estimates are mostly guesswork, based on no actual data. Take them with an entire salt shaker full of salt.<br />
<br />
{| border="1" cellspacing="0" cellpadding="5"<br />
|-<br />
! Technique<br />
! % of bitrate est.<br />
! risk<br />
|-<br />
| intraprediction<br />
| 10% <br />
| <br />
|-<br />
| rate control<br />
| 10%<br />
| low<br />
|-<br />
| multiple reference frames<br />
| 10%<br />
|<br />
|-<br />
| alternate motion predictors<br />
| 5%<br />
|<br />
|-<br />
| multi-resolution blending<br />
| 2%<br />
|<br />
|-<br />
| edge-directed interpolation<br />
| 4-5%<br />
|<br />
|-<br />
| sub-pel search<br />
| 2%<br />
|<br />
|-<br />
| mixed prediction (intra+inter)<br />
| 12%<br />
| high<br />
|-<br />
| generic encoder replacement/optimization<br />
| 1%<br />
|<br />
|-<br />
| skip work<br />
| 5%<br />
|<br />
|-<br />
| deringing<br />
| 10%<br />
| medium<br />
|-<br />
| bi-prediction<br />
| 15%<br />
|<br />
|-<br />
| k-tokenizer<br />
| +/- 1%<br />
|<br />
|-<br />
| 32x32 motion<br />
| 10%<br />
| low<br />
|-<br />
| don't code outside frame<br />
| 1-2%<br />
|<br />
|-<br />
| adaptive motion compensation work<br />
adjacent blocks differing by more than 1 level<br />
| 10%<br />
| high<br />
|-<br />
| don't use SAD<br />
|<br />
|<br />
|}<br />
<br />
<br />
[[Category:Daala]]</div>Derfhttps://wiki.xiph.org/index.php?title=Main_Page&diff=15083Main Page2014-11-26T20:49:35Z<p>Derf: /* Software */</p>
<hr />
<div>In an effort to bring open-source ideals to the world of multimedia the [[Xiph.Org Foundation]] develops a multitude of amazing products. This wiki describes our free and open protocols and software.<br />
<br />
----<br />
<br />
<br />
= Demonstrations of Xiph technologies =<br />
<br />
Want to hear or see Xiph in action? These projects are using our codecs, formats, or libraries.<br />
<br />
* [[VorbisStreams|Vorbis Streams]]: Stations streaming with the [[Vorbis]] codec<br />
* [[Games that use Vorbis]]: Games using the Vorbis codec for music or sound effects<br />
* [[VorbisHardware|Vorbis Hardware]]: Hardware players using the Vorbis codec<br />
* [[VorbisSoftwarePlayers|Vorbis Software Players]]: list of media players with out-of-box support for Vorbis<br />
* [[TheoraHardware|Theora Hardware]]: Hardware using the Theora video codec<br />
* [[TheoraSoftwarePlayers|Theora Software Players]]: list of media players with Theora support<br />
* [[List of Theora videos]]: Sources for video encoded with [[Theora]]<br />
<br />
= Projects/Formats =<br />
<br />
== Container Formats ==<br />
<br />
* [[Ogg]]: Media container. This is our native format and the recommended container for Xiph codecs.<br />
** [[Ogg Skeleton]]: Skeleton information on all logical content bitstreams in Ogg.<br />
** [[MIMETypesCodecs|Specification of MIME types and respective codecs parameter]]<br />
* [[SpeexRTP]]: RTP payload format for voice<br />
* [[VorbisRTP]]: RTP payload format for general audio<br />
* [[TheoraRTP]]: RTP payload format for video<br />
* [[XSPF]]: XML Sharable Playlist Format<br />
<br />
== Codecs ==<br />
<br />
* '''Compressed Audio/Video Codecs:'''<br />
** [[Vorbis]]: Audio codec with a [[Tremor|fixed point decoder]]<br />
** [[Theora]]: Video codec<br />
** [[FLAC]]: Free Lossless Audio Codec<br />
** [[Speex]]: Speech codec<br />
** [[OpusFAQ|Opus]]: Low-latency general-purpose audio codec<br />
* '''Uncompressed Audio/Video Codecs:'''<br />
** [[OggPCM]]: Audio codec<br />
* '''Timed Text/Metadata Codecs:'''<br />
** [[CMML]]: Continuous Media Markup Language, used for [http://www.annodex.net/ Annodex] and subtitles (xine, vlc, gstreamer, and DirectShow support)<br />
**[[OggKate|Kate]]: new format for lyrics and subtitles<br />
<br />
== Software ==<br />
<br />
* '''Software for distributing media'''<br />
** [[Icecast]]: Streaming server<br />
** [[IceS]]: Source client for Icecast servers<br />
<br />
* '''Libraries'''<br />
** [[OggPlay]]: library for synchronised Xiph media playback<br />
**[[XiphQT]]: Quicktime component to play the main Xiph formats<br />
** [[VorbisCommentEdit]]: Macintosh Framework making it easy to incorporate the editing of [[VorbisComment|Vorbis Comments]]<br />
<br />
* '''Other software'''<br />
** [[OggComponent/VorbisComponent]]: Wrappers to integrate Vorbis into Mac OS X (does not yet support encoding)<br />
** [http://xiph.org/paranoia/ cdparanoia]: CDDA extractor/ripper<br />
<br />
== Community ==<br />
<br />
*[[How to help]]<br />
*[[Spread Open Media]]: project to promote Xiph formats.<br />
**[[MailOgging]]: provides templates for anyone willing to contact a company requesting them to add support for Xiph formats.<br />
*[[People]]: Who's who in Xiph.<br />
<br />
== Work in Progress ==<br />
* [[Work In Progress]]: codecs and software still in the research and development stages.<br />
* [[Todo]]: To-do list for various Xiph projects.<br />
<br />
= Project management =<br />
<br />
* [[AdminProcesses]]: who's in charge of what project<br />
* [[MonthlyMeeting]]: page with information on Xiph's MonthlyMeeting<br />
* [[MailingLists]]: list of Xiph's mailing lists<br />
* [[Bounties]]: list of bounties that you can take to improve Xiph's projects<br />
<br />
= Resources for Video and Audio programmers =<br />
<br />
* [[Ambisonics]]: page with technical information on Ambisonics<br />
* [[Resources and papers on Audio, Music and Speech|Courses and papers on Audio, Music and Speech]]: page with links to MIT and other universities' content<br />
* [[Oggless]]: for ideas on how to use the different Xiph codecs outside Ogg<br />
<br />
= Wiki internal =<br />
<br />
* [[Translations]]: What about some translation work<br />
* [[Sandbox]]: Testbed for testing editing skills<br />
* [[XiphWiki:Copyrights]]: License used for all content on the XiphWiki</div>Derfhttps://wiki.xiph.org/index.php?title=Daala_Quickstart&diff=15063Daala Quickstart2014-11-10T01:12:16Z<p>Derf: /* Pre-requisites */</p>
<hr />
<div>= Getting Started =<br />
<br />
This is a simple guide to getting the code and encoding a simple video.<br />
<br />
== Installation ==<br />
<br />
=== Pre-requisites ===<br />
* Standard build tools (autoconf, automake v1.11 or later, libtool, pkg-config, and a C compiler)<br />
* git<br />
* libogg (v1.3 or later)<br />
* libpng<br />
* libjpeg<br />
* libcheck (v0.9.8 or later, can be skipped if you pass --disable-unit-tests to ./configure)<br />
* libsdl (can be skipped if you pass --disable-player to ./configure)<br />
<br />
Instructions for installing these packages are OS-specific (feel free to contribute some here, especially if you tried installing these somewhere and ran into difficulties; you will likely save other people some pain). If you have a package manager that has separate -dev versions with the public headers, make sure you install those in addition to the actual libraries.<br />
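<br />
Before running the build, it can help to confirm that the command-line tools from the list above are on your PATH. The following is a minimal sketch, not part of the Daala source tree: the helper name <code>missing_tools</code> is hypothetical, and it only checks for executables, not for the libraries or their headers.<br />

```shell
#!/bin/sh
# Hypothetical helper (not shipped with Daala): print the name of every
# tool given as an argument that cannot be found on the PATH.
missing_tools() {
  for tool in "$@"; do
    # "command -v" is the POSIX way to test whether a command exists.
    command -v "$tool" >/dev/null 2>&1 || echo "$tool"
  done
}
```

For example, <code>missing_tools autoconf automake libtool pkg-config git cc</code> prints nothing when all of those commands are available. On some systems the libtool package installs under a different command name, so a reported miss there may be a false alarm.<br />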
<br />
==== Mac OS X ====<br />
Install Apple's command line developer tools. E.g. install [https://developer.apple.com/xcode/ Xcode] from the App Store and select 'Command Line Tools' from the Preferences::Downloads panel, or download and install the pkg directly from [https://developer.apple.com/downloads/ developer.apple.com].<br />
<br />
Install [http://brew.sh/ Homebrew]<br />
<br />
Run the following command to install dependencies:<br />
brew install autoconf automake libtool libogg libpng libjpeg check sdl<br />
<br />
=== Installation Procedure ===<br />
<br />
Just run these commands:<br />
<br />
git clone https://git.xiph.org/daala.git<br />
cd daala<br />
./autogen.sh<br />
./configure<br />
make<br />
<br />
Note that the git clone can take several minutes to complete.<br />
<br />
And optionally<br />
<br />
make tools<br />
<br />
Make sure you run the git clone operation on the same machine where you intend to use the code. Checking out a copy on Windows and then trying to use it on Linux will not work, as executable permissions and line-endings will not be set properly.<br />
<br />
== Encoding a Video ==<br />
<br />
If you do not have one, get a sample video or two in .y4m format from [https://media.xiph.org/video/derf/ media.xiph.org]. These videos are relatively large and will take a long time to encode. There are also subsets of 1 second long videos for faster encoding:<br />
* [https://people.xiph.org/~tdaede/video-1-short/ video-1-short]<br />
<br />
We also maintain a set of still-image collections in .y4m format:<br />
* [https://people.xiph.org/~tterribe/daala/subset1-y4m.tar.gz Subset 1] (50 images, small training set)<br />
* [https://people.xiph.org/~tterribe/daala/subset2-y4m.tar.gz Subset 2] (50 images, small testing set)<br />
* [https://people.xiph.org/~tterribe/daala/subset3-y4m.tar.gz Subset 3] (1000 images, large training set)<br />
* [https://people.xiph.org/~tterribe/daala/subset4-y4m.tar.gz Subset 4] (1000 images, large testing set)<br />
<br />
Encode the video:<br />
<br />
./examples/encoder_example -v 30 video.y4m -o video.ogv<br />
<br />
where<br />
* video.y4m is the input video you want to encode,<br />
* video.ogv is the name of the encoded video file to output,<br />
* -v specifies the quality (currently from 0 to 511, where 0 is lossless)<br />
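<br />
To compare several quality settings on the same input, the encode command can be wrapped in a loop. This is only a sketch under stated assumptions: the function name <code>encode_at_qualities</code>, the <code>ENCODER</code> variable, and the output naming scheme are all illustrative; only the <code>-v</code> and <code>-o</code> options come from the command shown above.<br />

```shell
#!/bin/sh
# Path to the example encoder; ENCODER is an illustrative variable so the
# path can be overridden without editing the function.
ENCODER="${ENCODER:-./examples/encoder_example}"

# Hypothetical helper: encode one .y4m file once per quality value.
# Usage: encode_at_qualities INPUT.y4m QUALITY [QUALITY...]
encode_at_qualities() {
  input="$1"; shift
  base="${input%.y4m}"   # strip the .y4m suffix to build output names
  for q in "$@"; do
    # -v sets the quality (currently 0 to 511, where 0 is lossless).
    "$ENCODER" -v "$q" "$input" -o "${base}-v${q}.ogv" || return 1
  done
}
```

For example, <code>encode_at_qualities video.y4m 10 30 60</code> would write video-v10.ogv, video-v30.ogv and video-v60.ogv next to the input.<br />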
<br />
== Decoding/Playing a Video ==<br />
<br />
Play the video in a window:<br />
<br />
./examples/player_example video.ogv<br />
<br />
For information on the controls available while playing, run<br />
<br />
./examples/player_example --help<br />
<br />
If you want to use a different player, you can decode the video back to .y4m with<br />
<br />
./examples/dump_video video.ogv -o decoded_video.y4m<br />
<br />
Many other players can play back these .y4m files, and other tools can convert them to various other formats.<br />
<br />
== Using PNG Images ==<br />
<br />
To encode a series of images:<br />
<br />
make tools<br />
./tools/png2y4m video%05d.png -o video.y4m<br />
<br />
where %05d means your input images are named video00000.png, video00001.png, etc. You can leave out the %05d tag if you only want to convert a single image (which does not need to be numbered).<br />
<br />
To convert a y4m back to PNGs:<br />
<br />
./tools/y4m2png video.y4m -o video%05d.png<br />
<br />
If you are converting a .y4m file that only contains a single frame (e.g., from one of the still-image subsets linked above), you can leave out the %05d tag. Conversion from PNG to Y4M uses the Rec 709 matrix with video levels, a box filter for chroma subsampling, and a triangular dither. Conversion back from Y4M to PNG uses the same matrix, levels, and box filter, but does not dither.<br />
<br />
== Creating y4m from other formats ==<br />
<br />
You can use the ffmpeg tool to generate y4m from any of its supported video formats:<br />
<br />
ffmpeg -i video.webm -pix_fmt yuv420p video.y4m<br />
<br />
Note that ffmpeg is optimized for speed. You may not get repeatable results across machines.</div>Derfhttps://wiki.xiph.org/index.php?title=DaalaTodo&diff=15020DaalaTodo2014-10-07T20:17:19Z<p>Derf: </p>
<hr />
<div>= Simple Things =<br />
<br />
* Overflow checking for allocations in od_state_ref_imgs_init().<br />
<br />
<br />
= Tuning =<br />
* Quantization matrix<br />
* Beta<br />
* Inter band masking<br />
* PVQ RDO<br />
* Better MV cost estimates<br />
* Better MC distortion metrics (SATD, some MSE/SATD hybrid, no-ref-aware, maybe table driven Theora-style, etc.)<br />
* Better MV split flag rate estimates<br />
<br />
= Known broken/suboptimal =<br />
* Block size decision<br />
** Never tuned on inter<br />
** Not considering chroma<br />
* Code quantizers on a log scale<br />
* Re-order bitstream (e.g., don't code all MVs at the front, etc.)<br />
* Investigate bias introduced by coefficient scaling<br />
<br />
= New work =<br />
<br />
* VQ for inter frames, with codebook selected based on dim m<br />
* Motion vector "mask" based on previous frame<br />
* Use coeff magnitude correlation for modelling<br />
* Per SB/MB/block/something quantizer changes<br />
* Variable Framerate support<br />
* Dynamic frame size changes (without keyframes)<br />
<br />
= Entropy coding =<br />
<br />
* Make the Laplace vector encoder (aka k-tokenizer) faster<br />
* Add "skip-all remaining bands" flag<br />
* Better encoding of the CfL sign<br />
* Add SIMD to the decoder search</div>Derfhttps://wiki.xiph.org/index.php?title=Mp4Opus&diff=14996Mp4Opus2014-09-21T17:12:23Z<p>Derf: </p>
<hr />
<div>{{draft}}<br />
This is a draft encapsulation guide for [[Opus]] audio in the mp4 (ISO Base) media container.<br />
<br />
[http://wiki.multimedia.cx/index.php?title=MP4 MP4] already has support for declaring encoder delay and pre-roll.<br />
<br />
For pre-roll, I believe we can use 'AudioRollRecoveryEntry'.<br />
<br />
For delay, Daemon404 suggested "whatever l-smash maps encoder delay to."<br />
yusuke says, "ISO and Apple recommend the use of edit list for removing priming samples from the presentation." libavformat's demuxer supports *one* edit list.<br />
<br />
<br />
There's some work on codec-independent channel mapping, downmix and dynamic range control as part of [http://mpeg.chiariglione.org/standards/mpeg-4/iso-base-media-file-format/text-isoiec-14496-122012-pdam-4-enhanced-audio-support ISO 14496-12 Amd4]. We might be able to use some of that, but it doesn't support the Opus case of needing to indicate which streams are coupled pairs. We'll still need to define our own extension for this.<br />
<br />
Question: Better to reuse the channel mapping header entirely, or just report the coupled stream count and use the downmix table to do the mapping?<br />
<br />
Possible registration process, send an email to http://mp4ra.org/request.html. Possible registrar is 'Opus'.<br />
<br />
Internally, everything is resampled to 48000 Hz, and the decoder always outputs at that rate, as floating-point numbers. The original sample rate is stored so that the decoder can act upon it. The AudioSampleEntry atom will have 48 and the original rate will be stored in the codec's own atom.<br />
<br />
'''Global Header'''<br />
<br />
''AudioRollRecoveryEntry'' - shall have a value of 3840 (80ms * 48k)<br />
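<br />
The 3840 figure is simply the pre-roll duration converted to samples at the 48 kHz decoder rate. A quick sketch of that arithmetic (the function name <code>preroll_samples</code> is illustrative, not taken from any specification):<br />

```shell
#!/bin/sh
# Illustrative helper: convert a duration in milliseconds to a sample
# count at a given sample rate in Hz.
# samples = duration_ms * sample_rate_hz / 1000
preroll_samples() {
  echo $(( $1 * $2 / 1000 ))
}
```

Calling <code>preroll_samples 80 48000</code> gives 3840, matching the value above.<br />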
<br />
''AudioSampleEntry'' - hardcoded at 16fp<br />
<br />
''descriptor'' - same as ogg rather than as ts, to keep things simple<br />
<br />
''channel count'' - already included <br />
''pre skip'' - already included<br />
<br />
''Gain'' - volume atom? unused in practice - oggheader - not in ts (TODO?)<br />
when you decode samples you're supposed to multiply against this value, so that decoder can apply post volume<br />
Reusing the one in ogg.<br />
<br />
''mapping family'' (with vorbis mapping)<br />
- mono/stereo no channel config<br />
- specify # channel<br />
- map to the # output<br />
<br />
audio channel layout https://developer.apple.com/library/mac/documentation/musicaudio/reference/CoreAudioDataTypesRef/Reference/reference.html - too complex<br />
plug it from ogg and put it in our custom atom<br />
<br />
'''Things to put in custom atom'''<br />
- input sr<br />
- output gain<br />
- channel mapping<br />
- channel count (for backup)<br />
<br />
'''Opus''' is the name of the atom, like in TS<br />
<br />
what about album art? quicktime/mp4/mp3</div>Derfhttps://wiki.xiph.org/index.php?title=Mp4Opus&diff=14995Mp4Opus2014-09-21T17:11:54Z<p>Derf: </p>
<hr />
<div>{{draft}}<br />
This is a draft encapsulation guide for [[Opus]] audio in the mp4 (ISO Base) media container.<br />
<br />
[http://wiki.multimedia.cx/index.php?title=MP4 MP4] already has support for declaring encoder delay and pre-roll.<br />
<br />
For pre-roll, I believe we can use 'AudioRollRecoveryEntry'.<br />
<br />
For delay, Daemon404 suggested "whatever l-smash maps encoder delay to."<br />
yusuke says, "ISO and Apple recommend the use of edit list for removing priming samples from the presentation." libavformat's demuxer supports *one* edit list.<br />
<br />
<br />
There's some work on codec-independent channel mapping, downmix and dynamic range control as part of [http://mpeg.chiariglione.org/standards/mpeg-4/iso-base-media-file-format/text-isoiec-14496-122012-pdam-4-enhanced-audio-support ISO 14496-12 Amd4]. We might be able to use some of that, but it doesn't support the Opus case of needing to indicate which streams are coupled pairs. We'll still need to define our own extension for this.<br />
<br />
Question: Better to reuse the channel mapping header entirely, or just report the coupled stream count and use the downmix table to do the mapping?<br />
<br />
Possible registration process, send an email to http://mp4ra.org/request.html. Possible registrar is 'Opus'.<br />
<br />
Internally, everything is resampled to 48000 Hz, and the decoder always outputs at that rate, as floating-point numbers. The original sample rate is stored so that the decoder can act upon it. The AudioSampleEntry atom will have 48 and the original rate will be stored in the codec's own atom.<br />
<br />
'''Global Header'''<br />
<br />
''AudioRollRecoveryEntry'' - shall have a value of 3840 (80ms * 48k)<br />
''AudioSampleEntry'' - hardcoded at 16fp<br />
<br />
''descriptor'' - same as ogg rather than as ts, to keep things simple<br />
<br />
''channel count'' - already included <br />
''pre skip'' - already included<br />
<br />
''Gain'' - volume atom? unused in practice - oggheader - not in ts (TODO?)<br />
when you decode samples you're supposed to multiply against this value, so that decoder can apply post volume<br />
Reusing the one in ogg.<br />
<br />
''mapping family'' (with vorbis mapping)<br />
- mono/stereo no channel config<br />
- specify # channel<br />
- map to the # output<br />
<br />
audio channel layout https://developer.apple.com/library/mac/documentation/musicaudio/reference/CoreAudioDataTypesRef/Reference/reference.html - too complex<br />
plug it from ogg and put it in our custom atom<br />
<br />
'''Things to put in custom atom'''<br />
- input sr<br />
- output gain<br />
- channel mapping<br />
- channel count (for backup)<br />
<br />
'''Opus''' is the name of the atom, like in TS<br />
<br />
what about album art? quicktime/mp4/mp3</div>Derfhttps://wiki.xiph.org/index.php?title=Daala_Quickstart&diff=14903Daala Quickstart2014-08-18T01:41:43Z<p>Derf: /* Encoding a Video */</p>
<hr />
<div>= Getting Started =<br />
<br />
This is a simple guide to getting the code and encoding a simple video.<br />
<br />
== Installation ==<br />
<br />
=== Pre-requisites ===<br />
* Standard build tools (autoconf, automake v1.11 or later, libtool, and a C compiler)<br />
* git<br />
* libogg (v1.3 or later)<br />
* libpng<br />
* libjpeg<br />
* libcheck (v0.9.8 or later, can be skipped if you pass --disable-unit-tests to ./configure)<br />
* libsdl (can be skipped if you pass --disable-player to ./configure)<br />
<br />
Instructions for installing these packages are OS-specific (feel free to contribute some here, especially if you tried installing these somewhere and ran into difficulties; you will likely save other people some pain). If you have a package manager that has separate -dev versions with the public headers, make sure you install those in addition to the actual libraries.<br />
<br />
==== Mac OS X ====<br />
Install Apple's command line developer tools. E.g. install [https://developer.apple.com/xcode/ Xcode] from the App Store and select 'Command Line Tools' from the Preferences::Downloads panel, or download and install the pkg directly from [https://developer.apple.com/downloads/ developer.apple.com].<br />
<br />
Install [http://brew.sh/ Homebrew]<br />
<br />
Run the following command to install dependencies:<br />
brew install autoconf automake libtool libogg libpng libjpeg check sdl<br />
<br />
=== Installation Procedure ===<br />
<br />
Just run these commands:<br />
<br />
git clone https://git.xiph.org/daala.git<br />
cd daala<br />
./autogen.sh<br />
./configure<br />
make<br />
<br />
Note that the git clone can take several minutes to complete.<br />
<br />
And optionally<br />
<br />
make tools<br />
<br />
Make sure you run the git clone operation on the same machine where you intend to use the code. Checking out a copy on Windows and then trying to use it on Linux will not work, as executable permissions and line-endings will not be set properly.<br />
<br />
== Encoding a Video ==<br />
<br />
If you do not have one, get a sample video or two in .y4m format from [https://media.xiph.org/video/derf/ media.xiph.org]. These videos are relatively large and will take a long time to encode. There are also subsets of 1 second long videos for faster encoding:<br />
* [https://people.xiph.org/~tdaede/video-1-short/ video-1-short]<br />
<br />
We also maintain a set of still-image collections in .y4m format:<br />
* [https://people.xiph.org/~tterribe/daala/subset1-y4m.tar.gz Subset 1] (50 images, small training set)<br />
* [https://people.xiph.org/~tterribe/daala/subset2-y4m.tar.gz Subset 2] (50 images, small testing set)<br />
* [https://people.xiph.org/~tterribe/daala/subset3-y4m.tar.gz Subset 3] (1000 images, large training set)<br />
* [https://people.xiph.org/~tterribe/daala/subset4-y4m.tar.gz Subset 4] (1000 images, large testing set)<br />
<br />
Encode the video:<br />
<br />
./examples/encoder_example -v 30 video.y4m -o video.ogv<br />
<br />
where<br />
* video.y4m is the input video you want to encode,<br />
* video.ogv is the name of the encoded video file to output,<br />
* -v specifies the quality (currently from 0 to 511, where 0 is lossless)<br />
<br />
== Decoding/Playing a Video ==<br />
<br />
Play the video in a window:<br />
<br />
./examples/player_example video.ogv<br />
<br />
For information on the controls available while playing, run<br />
<br />
./examples/player_example --help<br />
<br />
If you want to use a different player, you can decode the video back to .y4m with<br />
<br />
./examples/dump_video video.ogv -o decoded_video.y4m<br />
<br />
Many other players can play back these .y4m files, and other tools can convert them to various other formats.<br />
<br />
== Using PNG Images ==<br />
<br />
To encode a series of images:<br />
<br />
make tools<br />
./tools/png2y4m video%05d.png -o video.y4m<br />
<br />
where %05d means your input images are named video00000.png, video00001.png, etc. You can leave out the %05d tag if you only want to convert a single image (which does not need to be numbered).<br />
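<br />
As a quick illustration of the naming scheme, the following Python sketch lists the first few file names that a pattern like the one above would match. It assumes the pattern is expanded printf-style starting from frame 0, as the example names suggest; the variable names are made up for illustration.<br />

```python
# Assumption: the tool expands the pattern printf-style, starting at frame 0.
pattern = "video%05d.png"

# %05d pads the frame number with zeros to a width of five digits.
names = [pattern % i for i in range(3)]
print(names)
```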
<br />
To convert a y4m back to PNGs:<br />
<br />
./tools/y4m2png video.y4m -o video%05d.png<br />
<br />
If you are converting a .y4m file that only contains a single frame (e.g., from one of the still-image subsets linked above), you can leave out the %05d tag. Conversion from PNG to Y4M uses the Rec 709 matrix with video levels, a box filter for chroma subsampling, and a triangular dither. Conversion back from Y4M to PNG uses the same matrix, levels, and box filter, but does not dither.<br />
<br />
== Creating y4m from other formats ==<br />
<br />
You can use the ffmpeg tool to generate y4m from any of its supported video formats:<br />
<br />
ffmpeg -i video.webm -pix_fmt yuv420p video.y4m<br />
<br />
Note that ffmpeg is optimized for speed. You may not get repeatable results across machines.</div>Derfhttps://wiki.xiph.org/index.php?title=Daala_Quickstart&diff=14717Daala Quickstart2014-06-02T23:02:16Z<p>Derf: /* Installation Procedure */</p>
<hr />
<div>= Getting Started =<br />
<br />
This is a simple guide to getting the code and encoding a simple video.<br />
<br />
== Installation ==<br />
<br />
=== Pre-requisites ===<br />
* Standard build tools (autoconf, automake v1.11 or later, libtool, and a C compiler)<br />
* git<br />
* libogg (v1.3 or later)<br />
* libpng<br />
* libjpeg<br />
* libcheck (v0.9.8 or later, can be skipped if you pass --disable-unit-tests to ./configure)<br />
* libsdl (can be skipped if you pass --disable-player to ./configure)<br />
<br />
Instructions for installing these packages are OS-specific (feel free to contribute some here, especially if you tried installing these somewhere and ran into difficulties; you will likely save other people some pain). If you have a package manager that has separate -dev versions with the public headers, make sure you install those in addition to the actual libraries.<br />
<br />
=== Installation Procedure ===<br />
<br />
Just run these commands:<br />
<br />
git clone git@git.xiph.org:daala.git<br />
cd daala<br />
./autogen.sh<br />
./configure<br />
make<br />
<br />
And optionally<br />
<br />
make tools<br />
<br />
Make sure you run the git clone operation on the same machine where you intend to use the code. Checking out a copy on Windows and then trying to use it on Linux will not work, as executable permissions and line-endings will not be set properly.<br />
<br />
== Encoding a Video ==<br />
<br />
If you do not have one, get a sample video or two in .y4m format from [https://media.xiph.org/video/derf/ media.xiph.org]. We also maintain a set of still-image collections in .y4m format:<br />
* [https://people.xiph.org/~tterribe/daala/subset1-y4m.tar.gz Subset 1] (50 images, small training set)<br />
* [https://people.xiph.org/~tterribe/daala/subset2-y4m.tar.gz Subset 2] (1000 images, large training set)<br />
* [https://people.xiph.org/~tterribe/daala/subset3-y4m.tar.gz Subset 3] (50 images, small testing set)<br />
* [https://people.xiph.org/~tterribe/daala/subset4-y4m.tar.gz Subset 4] (1000 images, large testing set)<br />
<br />
Encode the video:<br />
<br />
./examples/encoder_example -v 30 -k 256 video.y4m -o video.ogv<br />
<br />
where<br />
* video.y4m is the input video you want to encode,<br />
* video.ogv is the name of the encoded video file to output,<br />
* -v specifies the quality (currently from 0 to 511, where 0 is lossless), and<br />
* -k specifies the maximum keyframe interval (this is currently 1 by default, which makes every frame a keyframe).<br />
<br />
== Decoding a Video ==<br />
<br />
Play the video:<br />
<br />
./examples/player_example -p video.ogv<br />
<br />
The -p option starts the player paused. Run<br />
<br />
./examples/player_example --help<br />
<br />
for information on the controls available while playing.<br />
<br />
If you want to use a different player, you can decode the video back to .y4m with<br />
<br />
./examples/dump_video video.ogv -o decoded_video.y4m<br />
<br />
Many other players can play back these .y4m files, and other tools can convert them to various other formats.<br />
<br />
== Using PNG Images ==<br />
<br />
To encode a series of images:<br />
<br />
make tools<br />
./tools/png2y4m video%05d.png -o video.y4m<br />
<br />
where %05d means your input images are named video00000.png, video00001.png, etc. You can leave out the %05d tag if you only want to convert a single image (which does not need to be numbered).<br />
<br />
To convert a y4m back to PNGs:<br />
<br />
./tools/y4m2png video.y4m -o video%05d.png<br />
<br />
If you are converting a .y4m file that only contains a single frame (e.g., from one of the still-image subsets linked above), you can leave out the %05d tag. Conversion from PNG to Y4M uses the Rec 709 matrix with video levels, a box filter for chroma subsampling, and a triangular dither. Conversion back from Y4M to PNG uses the same matrix, levels, and box filter, but does not dither.</div>Derfhttps://wiki.xiph.org/index.php?title=Daala_Quickstart&diff=14694Daala Quickstart2014-05-21T21:45:47Z<p>Derf: Created page with "= Getting Started = This is a simple guide to getting the code and encoding a simple video. == Installation == === Pre-requisites === * Standard build tools (autoconf, automak..."</p>
<hr />
<div>= Getting Started =<br />
<br />
This is a simple guide to getting the code and encoding a simple video.<br />
<br />
== Installation ==<br />
<br />
=== Pre-requisites ===<br />
* Standard build tools (autoconf, automake v1.11 or later, libtool, and a C compiler)<br />
* git<br />
* libogg (v1.3 or later)<br />
* libpng<br />
* libjpeg<br />
* libcheck (v0.9.8 or later, can be skipped if you pass --disable-unit-tests to ./configure)<br />
* libsdl (can be skipped if you pass --disable-player to ./configure)<br />
<br />
Instructions for installing these packages are OS-specific (feel free to contribute some here, especially if you tried installing these somewhere and ran into difficulties; you will likely save other people some pain). If you have a package manager that has separate -dev versions with the public headers, make sure you install those in addition to the actual libraries.<br />
<br />
=== Installation Procedure ===<br />
<br />
Just run these commands:<br />
<br />
git clone git@git.xiph.org:/daala.git<br />
cd daala<br />
./autogen.sh<br />
./configure<br />
make<br />
<br />
And optionally<br />
<br />
make tools<br />
<br />
Make sure you run the git clone operation on the same machine where you intend to use the code. Checking out a copy on Windows and then trying to use it on Linux will not work, as executable permissions and line-endings will not be set properly.<br />
<br />
== Encoding a Video ==<br />
<br />
If you do not have one, get a sample video or two in .y4m format from [https://media.xiph.org/video/derf/ media.xiph.org]. We also maintain a set of still-image collections in .y4m format:<br />
* [https://people.xiph.org/~tterribe/daala/subset1-y4m.tar.gz Subset 1] (50 images, small training set)<br />
* [https://people.xiph.org/~tterribe/daala/subset2-y4m.tar.gz Subset 2] (1000 images, large training set)<br />
* [https://people.xiph.org/~tterribe/daala/subset3-y4m.tar.gz Subset 3] (50 images, small testing set)<br />
* [https://people.xiph.org/~tterribe/daala/subset4-y4m.tar.gz Subset 4] (1000 images, large testing set)<br />
<br />
Encode the video:<br />
<br />
./examples/encoder_example -v 30 -k 256 video.y4m -o video.ogv<br />
<br />
where<br />
* video.y4m is the input video you want to encode,<br />
* video.ogv is the name of the encoded video file to output,<br />
* -v specifies the quality (currently from 0 to 511, where 0 is lossless), and<br />
* -k specifies the maximum keyframe interval (this is currently 1 by default, which makes every frame a keyframe).<br />
<br />
== Decoding a Video ==<br />
<br />
Play the video:<br />
<br />
./examples/player_example -p video.ogv<br />
<br />
The -p option starts the player paused. Run<br />
<br />
./examples/player_example --help<br />
<br />
for information on the controls available while playing.<br />
<br />
If you want to use a different player, you can decode the video back to .y4m with<br />
<br />
./examples/dump_video video.ogv -o decoded_video.y4m<br />
<br />
Many other players can play back these .y4m files, and other tools can convert them to various other formats.<br />
<br />
== Using PNG Images ==<br />
<br />
To encode a series of images:<br />
<br />
make tools<br />
./tools/png2y4m video%05d.png -o video.y4m<br />
<br />
where %05d means your input images are named video00000.png, video00001.png, etc. You can leave out the %05d tag if you only want to convert a single image (which does not need to be numbered).<br />
<br />
To convert a y4m back to PNGs:<br />
<br />
./tools/y4m2png video.y4m -o video%05d.png<br />
<br />
If you are converting a .y4m file that only contains a single frame (e.g., from one of the still-image subsets linked above), you can leave out the %05d tag. Conversion from PNG to Y4M uses the Rec 709 matrix with video levels, a box filter for chroma subsampling, and a triangular dither. Conversion back from Y4M to PNG uses the same matrix, levels, and box filter, but does not dither.</div>Derfhttps://wiki.xiph.org/index.php?title=DaalaReview&diff=14204DaalaReview2013-06-19T18:21:41Z<p>Derf: </p>
<hr />
<div>This is a '''proposal''' for a more flexible review process. It is a set of guidelines for the most appropriate approach for different types of changes. Above all, use your judgement.<br />
<br />
== Definitions ==<br />
*A) Full review: Same as current process. Can be based on Rietveld, "git format-patch", or a pull request<br />
*B) Post-review: Code gets committed to repo first, committer nags reviewer weekly if necessary<br />
*C) Design review: Discuss with original author (and possibly others) for the principle of a change, can (and probably should) be done even before doing the work<br />
*D) No review: self-explanatory<br />
<br />
== Guidelines ==<br />
*0) Full-review on anything the committer has reasons to believe would be controversial, buggy, ...<br />
*1) Generally no review on comments/whitespace/style/documentation<br />
** Except potentially when changing Doxygen API comments<br />
*2) Post-review on code that is already badly broken, or on simple bug fixes<br />
*3) No review on initial commit of new stuff that isn't being called<br />
** Full-review required to hook any code not reviewed because of 3)<br />
** If code in 3) breaks the build, it gets fixed immediately or removed<br />
** No review on new tools until they get used by other people<br />
*4) Code that is "obviously correct" or that is trivially testable gets post-review<br />
*5) Code that may have subtle implications gets full review<br />
*6) Refactoring of someone else's code gets design review, followed by either post-review or full review (if requested during design review)<br />
*7) Disruptive work (e.g. block switching) gets both design review and full review.<br />
*8) Most build system changes get no review (unless 0) applies)<br />
*9) Post-review on test code<br />
<br />
== Examples ==<br />
1) [https://git.xiph.org/?p=daala.git;a=commitdiff;h=da726b32 da726b32], [https://git.xiph.org/?p=daala.git;a=commitdiff;h=53362b77 53362b77]<br />
2) [https://git.xiph.org/?p=daala.git;a=commitdiff;h=212d13b3 212d13b3], <br />
3) [https://git.xiph.org/?p=daala.git;a=commitdiff;h=76ee32c8 76ee32c8], [https://git.xiph.org/?p=daala.git;a=commitdiff;h=75b62b23 75b62b23]<br />
4) [https://git.xiph.org/?p=daala.git;a=commitdiff;h=7c102188 7c102188], [https://git.xiph.org/?p=daala.git;a=commitdiff;h=50732d66 50732d66]<br />
5) [https://git.xiph.org/?p=daala.git;a=commitdiff;h=8cd46564 8cd46564], [https://git.xiph.org/?p=daala.git;a=commitdiff;h=74dc3616 74dc3616]<br />
6) [https://git.xiph.org/?p=daala.git;a=commitdiff;h=208ea300 208ea300]<br />
7) [https://git.xiph.org/?p=daala.git;a=commitdiff;h=b4bfee65 b4bfee65]<br />
8) [https://git.xiph.org/?p=daala.git;a=commitdiff;h=7cc3cc5f 7cc3cc5f], [https://git.xiph.org/?p=daala.git;a=commitdiff;h=487fe063 487fe063]<br />
<br />
== Team reviews ==<br />
*1) Once in a while, we should pick a piece of code<br />
*2) Original author cleans up code, adds comments<br />
*3) Code discussed with entire team<br />
*4) Suggestions for improvement (with emphasis on algo)</div>Derfhttps://wiki.xiph.org/index.php?title=Talk:TransOgg_Seeking_Proposals&diff=14083Talk:TransOgg Seeking Proposals2013-04-09T22:12:30Z<p>Derf: Created page with "''> The DTS and backreference encoding was defined by-codec,<br/> ''> and thus decoding the timestamps required additional<br/> ''> infrastructure that most frameworks had to cod..."</p>
<hr />
<div>''> The DTS and backreference encoding was defined by-codec,<br/><br />
''> and thus decoding the timestamps required additional<br/><br />
''> infrastructure that most frameworks had to code from<br/><br />
''> scratch. Several frameworks never implemented precise<br/><br />
''> seeking for this reason alone.''<br />
<br />
In our defense, this allowed people to repeatedly mux things into Ogg that would have been impossible based on what we knew when Ogg was designed: the Theora keyframe backreference scheme, the Dirac generalized B-frame scheme, the VP8 "invisible" ALT refs, the Opus pre-skip and pre-roll, etc.<br />
<br />
We can argue that we now know all these things and that no more new ones will come along, but the MKV folks thought that, too, and the last two (alt-refs and pre-skip/pre-roll) proved them wrong.<br />
<br />
Of course, they also don't handle basic Vorbis end-time trimming correctly, so it's not like they even got the stuff that was known at the time correct. A good design here is hard work. That's not a reason not to try, and I know you know all of this, but I think we should be aware of the risks.<br />
<br />
''> Xiph never implemented its own all-encompassing framework<br/><br />
''> to provide an example of complete seeking that worked in<br/><br />
''> any Ogg file.''<br />
<br />
Arguably, even if we had, people who already have their own frameworks (gstreamer, FFmpeg, VLC, etc.) wouldn't have used it.<br />
<br />
''> Stream structure discovery was also based on performing<br/><br />
''> multiple bisection searches to find link boundaries.''<br />
<br />
The important point here is that because you have '''no''' information about where the link boundary is, nor what kind of streams are contained in the data after the boundary, you can't do much better than dumb bisection (I tried in libopusfile, with some success, but it was basically improving the constant out front... each bisection was still log(N) where N is the size of the link).<br />
<br />
But this can be solved by a simple back-pointer at the end of each link. That makes link enumeration 1 seek per link.<br />
<br />
''> Given the practically poor and unstable performance of<br/><br />
''> preceding bisection seeking implmentations, it will be<br/><br />
''> difficult to sell adopter opinion on an updated bisection<br/><br />
''> design, even one that farts unicorns.''<br />
<br />
This is actually a really important point. I think we should get buy-in from people who actually want to use this for something ''before'' we go off and try to make it. The guys making NUT spent just as much effort on design work as we did (their conversations actually sounded an awful lot like the ones we had designing Ogg), and yet... no one uses NUT. Our resources and attention are finite.<br />
<br />
''> Although it is still possible to mux an invalid file, it<br/><br />
''> is much harder to end up with a stream in which<br/><br />
''> second-order seeking 'helper' structures disagree with<br/><br />
''> the authoritative timestamps within the page data.''<br />
<br />
The seeking implementation still has to deal with a bunch of potentially invalid data to avoid going off the rails. For example, timestamps that aren't ordered properly, timestamps that don't lie in the range of the stream start and end times, timestamps whose difference overflows a 64-bit signed integer, etc. Large amounts of garbage data at the end of the file make most code used to find the stream length O(N^2) instead of O(N) (it's not hard to write O(N) code, but most people just don't realize the potential problem). The list goes on...<br />
<br />
''> WebM went one step further by mandating an index and<br/><br />
''> eliminating the Matroska bisection seek mechanism entirely.''<br />
<br />
Which they then walked back when they realized they wanted live streaming support, too. But that doesn't stop lots of things breaking on indexless files.<br />
<br />
''> The index can be grabbed from the stream tail<br/><br />
''> asynchronously.''<br />
<br />
The likelihood of this getting implemented is very low. Every single media framework wants to know two things '''before''' it starts playing: is the file seekable, and how long is it. This affects the controls presented, the playback time display, etc. Nobody is prepared for information about either of these points to change during the middle of playback.<br />
<br />
If you need an index to seek, you need to check if it's there before you say you can seek (because it's often missing or damaged, especially if it's at the end of a file). If the index is at the end, it just makes this more expensive. The only real advantage is that you can write the file in a single pass.</div>Derfhttps://wiki.xiph.org/index.php?title=Opus-1.0.2&diff=13794Opus-1.0.22012-12-06T00:30:34Z<p>Derf: /* Quality-impacting */</p>
<hr />
<div>Opus 1.0.2 fixes an out-of-bounds read that could be triggered by a malicious Opus packet causing an integer wrap-around in the padding code. Considering that the packet would have to be at least 16 MB in size and that no out-of-bounds write is possible, the severity is very low. This new release also has the following changes:<br />
<br />
== Quality-impacting ==<br />
* Changed the behaviour of the PLC to always fill the caller's buffer<br />
* Properly decode in-band FEC for packets with multiple Opus frames<br />
* Hybrid mode quality improvements and fixes<br />
* Fixed bugs in the CELT mode PLC<br />
* Redundant mode transition fixes<br />
<br />
== Other minor changes ==<br />
* Stack reduction<br />
* Doc fixes (many)<br />
* 16-bit fixes<br />
* Misc build fixes<br />
* New API calls: OPUS_GET_LAST_PACKET_DURATION ctl() and opus_packet_get_nb_samples()<br />
* Minor code cleanup</div>Derfhttps://wiki.xiph.org/index.php?title=OpusTodo&diff=13766OpusTodo2012-11-15T01:38:28Z<p>Derf: /* Future work */</p>
<hr />
<div>== Spec ==<br />
* Ogg mapping<br />
* Matroska mapping. See: [[MatroskaOpus]]<br />
* RTP payload format<br />
<br />
== Website ==<br />
* De-uglify webpage<br />
* Promotional material<br />
<br />
== Other ==<br />
<br />
* Oggz-validate (should also validate opus toc)<br />
<br />
== Opus-tools ==<br />
* A simple real time streaming example tool<br />
* Replaygain (half done— needs a gain tool)<br />
<br />
== Future work ==<br />
* Smart automatic mode decision<br />
* psymodel based VBR<br />
* Remove copy in inverse MDCT<br />
* Save some float<->int conversions<br />
* Improvements to LP mode CBR (greg has some code)</div>Derfhttps://wiki.xiph.org/index.php?title=MatroskaOpus&diff=13517MatroskaOpus2012-07-05T21:37:35Z<p>Derf: Document some more of the outstanding issues with the Matroska mapping</p>
<hr />
<div>== '''DRAFT''' ==<br />
<br />
This is an encapsulation spec for the [[Opus]] codec in [http://matroska.org/ Matroska]. There are a number of outstanding functional issues with muxing Opus in Matroska, and until those are resolved, use of this spec is NOT RECOMMENDED.<br />
<br />
Opus has few signaling requirements, so a simple mapping:<br />
<br />
- CodecID is A_OPUS<br />
- SampleFrequency is always 48000<br />
- Channels is 1 or 2 based on what the muxer knows about the input<br />
- CodecPrivate is void.<br />
<br />
However, this doesn't work for multistream. Supporting multistream requires signalling the number of Opus streams packed in each frame and the mapping from those to output channels through the container.<br />
<br />
To support multistream, we place the complete 'OpusHead' header packet from [[OggOpus]], as defined there, in the CodecPrivate element. This provides the number of streams and the channel mapping table, as well as related features like pre-skip and gain which improve the chances of lossless remuxing between the two encapsulations.<br />
<br />
- CodecID is A_OPUS<br />
- SampleFrequency is 48000<br />
- Channels is number of output PCM channels<br />
- CodecPrivate is the 'OpusHead' packet, identical to the Ogg mapping<br />
<br />
The second 'OpusTags' header packet from OggOpus is not used in the Matroska encapsulation. Matroska has its own system for tag metadata, and this avoids duplicating it and the need for sub-framing to index multiple packets within the CodecPrivate element.<br />
<br />
If the CodecPrivate is empty, players should treat it as the simpler mapping, I guess.<br />
<br />
== Questions ==<br />
<br />
Should we say muxers MAY or SHOULD NOT produce simple streams without filling in CodecPrivate?<br />
<br />
How does the OpusHead pre-skip field interact with the timestamps? The SimpleBlock timestamp is signed 16 bits, so the format can signal about half of the pre-skip if playback timestamps are to start at zero.<br />
<br />
One could set an incorrect timestamp on the skipped blocks, and rely on the decoder to drop them based on the OpusHead preskip value. As long as the initial blocks are timestamped <= start of output this shouldn't affect seeking.<br />
<br />
How important is it that timestamps start at zero in a Matroska file?<br />
<br />
The SimpleBlock structure also has an 'invisible' bit, which tells the player to decode, but not display, the contained frames. This lets the muxer signal the pre-skip semantics with frame accuracy, but not sample accuracy. If players implement this it will at least help with sync. Libav does not appear to support the invisible bit.<br />
<br />
Seeking in Opus files requires a pre-roll (recommended to be at least 80 ms). However, currently Matroska requires its index entries to point directly to the data whose timestamp matches the corresponding seek point, not some place arbitrarily before that timestamp. These two requirements are incompatible, and mean that seeking in Opus will be broken in all existing Matroska software. In particularly unlucky cases (e.g., around a transient), playing back audio decoded without any pre-roll can produce extremely loud (possibly equipment-damaging) results.<br />
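<br />
To make the pre-roll requirement concrete, here is a small Python sketch of where decoding would need to begin for a given seek target. It assumes the 80 ms recommendation above and output counted at 48000 samples per second; the function and constant names are hypothetical and not part of any Matroska or Opus API.<br />

```python
OPUS_RATE_HZ = 48000  # decoder output rate assumed by this mapping
PRE_ROLL_MS = 80      # recommended minimum pre-roll from the text above


def decode_start_sample(target_sample: int) -> int:
    """Return the sample position at which decoding should begin so that
    the decoder has converged by the time target_sample is reached."""
    pre_roll_samples = OPUS_RATE_HZ * PRE_ROLL_MS // 1000  # 3840 samples
    # Never seek before the start of the stream.
    return max(0, target_sample - pre_roll_samples)
```

An index entry that points exactly at the target, as Matroska currently requires, gives the player no data for the samples between these two positions.<br />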
<br />
How can sample-accurate end-time trimming work in Matroska? Currently all software encapsulating Vorbis in Matroska is broken in this regard, and muxing a Vorbis file in Matroska causes it to get longer (i.e., produce more audio output than the original Ogg file). It would be unfortunate to repeat this disaster for Opus.</div>Derfhttps://wiki.xiph.org/index.php?title=MIMETypesCodecs&diff=13509MIMETypesCodecs2012-07-03T16:19:46Z<p>Derf: Add Opus to the list of allowed codecs for Ogg mime types.</p>
<hr />
<div>== Specification of MIME types and respective codecs parameter ==<br />
<br />
Also includes a specification of the recommended file extensions to use with Ogg.<br />
<br />
=== MIME Types ===<br />
<br />
The following MIME types are now officially registered with IANA and specified with the IETF as [http://www.ietf.org/rfc/rfc5334.txt RFC 5334]:<br />
<br />
* video/ogg - for video (with audio) encapsulated in Ogg<br />
** recommends a Skeleton logical bitstream<br />
** .ogv file extension<br />
** Macintosh File Type Code: OggV<br />
<br />
* audio/ogg - for audio encapsulated in Ogg<br />
** recommends a Skeleton logical bitstream<br />
** .oga file extension, .ogg for Vorbis I, .spx for Speex<br />
** Macintosh File Type Code: OggA<br />
<br />
* application/ogg - for complex, multitrack, multiplexed files encapsulated in Ogg<br />
** requires a Skeleton logical bitstream<br />
** .ogx file extension<br />
** Macintosh File Type Code: OggX<br />
<br />
<br />
[[MIME_Types_and_File_Extensions|Other MIME types]] are still in the process.<br />
<br />
=== Codecs Parameter ===<br />
<br />
[http://www.rfc-editor.org/rfc/rfc4281.txt Typically], MIME types of media encapsulation formats use the optional "codecs" parameter to specify which codecs are being used in a particular file.<br />
<br />
Codecs encapsulated in Ogg require a text identifier at the beginning of the first header page to identify the encapsulated codecs. The following table contains the identifiers for existing Xiph codecs and the codecs parameter names used for */ogg MIME types (in alphabetical order):<br />
<br />
{| class="codecstable" border="1"<br />
|-<br />
! Codecs Parameter Name<br />
! Codec Type<br />
! Codec Identifier<br />
(decimal, hex, octal)<br />
! Version Field (if available)<br />
|-<br />
| [http://svn.annodex.net/liboggz/trunk/src/liboggz/oggz_auto.h celt]<br />
| audio<br />
| char[0,8]: <tt>'CELT\ \ \ \ '</tt><br />
hex: <tt>'0x43 0x45 0x4c 0x54 0x20 0x20 0x20 0x20'</tt><br />
<br />
oct: <tt>'0103 0105 0114 0124 0040 0040 0040 0040'</tt><br />
| char[28,4]: version id<br />
|-<br />
| [http://svn.annodex.net/liboggz/trunk/src/liboggz/oggz_auto.h cmml]<br />
| text<br />
| char[0,8]: <tt>'CMML\0\0\0\0'</tt><br />
hex: <tt>'0x43 0x4d 0x4d 0x4c 0x00 0x00 0x00 0x00'</tt><br />
<br />
oct: <tt>'0103 0115 0115 0114 0000 0000 0000 0000'</tt><br />
| char[8,2]: major version number,<br />
char[10,2]: minor version number<br />
|-<br />
| [http://wiki.xiph.org/index.php/OggDirac dirac]<br />
| video<br />
| char[0,5]: <tt>'BBCD\0'</tt><br />
hex: <tt>'0x42 0x42 0x43 0x44 0x00'</tt><br />
<br />
oct: <tt>'0102 0102 0103 0104 0000'</tt><br />
| ??<br />
|-<br />
| [http://flac.sourceforge.net/ogg_mapping.html flac]<br />
| audio<br />
| char[0,5]: <tt>'\177FLAC'</tt><br />
hex: <tt>'0x7F 0x46 0x4C 0x41 0x43'</tt><br />
<br />
oct: <tt>'0177 0106 0114 0101 0103'</tt><br />
| char[5,1]: binary major version number, <br />
char[6,1]: binary minor version number of mapping<br />
|-<br />
| [[OggMNG|jng]]<br />
| video<br />
| char[0,8]: <tt>'\213JNG\r\n\032\n'</tt><br />
hex: <tt>'0x8b 0x4a 0x4e 0x47 0x0D 0x0A 0x1A 0x0A'</tt><br />
<br />
oct: <tt>'0213 0112 0116 0107 0015 0012 0032 0012'</tt><br />
| ??<br />
|-<br />
| [[OggKate|kate]]<br />
| text<br />
| char[0,8]: <tt>'\x80kate\0\0\0'</tt><br />
hex: <tt>'0x80 0x6b 0x61 0x74 0x65 0x00 0x00 0x00'</tt><br />
<br />
oct: <tt>'0200 0153 0141 0164 0145 0000 0000 0000'</tt><br />
| char[9,1]: major version number,<br />
char[10,1]: minor version number<br />
|-<br />
| [http://lists.xiph.org/pipermail/vorbis-dev/2001-August/004501.html midi]<br />
| text<br />
| char[0,8]: <tt>'OggMIDI\0'</tt><br />
hex: <tt>'0x4f 0x67 0x67 0x4d 0x49 0x44 0x49 0x00'</tt><br />
<br />
oct: <tt>'0117 0147 0147 0115 0111 0104 0111 0000'</tt><br />
| char[8,1]: version field<br />
|-<br />
| [[OggMNG|mng]]<br />
| video<br />
| char[0,8]: <tt>'\212MNG\r\n\032\n'</tt><br />
hex: <tt>'0x8a 0x4d 0x4e 0x47 0x0D 0x0A 0x1A 0x0A'</tt><br />
<br />
oct: <tt>'0212 0115 0116 0107 0015 0012 0032 0012'</tt><br />
| ??<br />
|-<br />
| [[OggOpus|opus]]<br />
| audio<br />
| char[0,8]: <tt>'OpusHead'</tt><br />
hex: <tt>'0x4f 0x70 0x75 0x73 0x48 0x65 0x61 0x64'</tt><br />
<br />
oct: <tt>'0117 0160 0165 0163 0110 0145 0141 0144'</tt><br />
| char[8,1]: version field<br />
|-<br />
| [[OggPCM|pcm]]<br />
| audio<br />
| char[0,8]: <tt>'PCM\ \ \ \ \ '</tt><br />
hex: <tt>'0x50 0x43 0x4d 0x20 0x20 0x20 0x20 0x20'</tt><br />
<br />
oct: <tt>'0120 0103 0115 0040 0040 0040 0040 0040'</tt><br />
| char[8,2]: version major field,<br />
char[10,2]: version minor field<br />
|-<br />
| [[OggMNG|png]]<br />
| video<br />
| char[0,8]: <tt>'\211PNG\r\n\032\n'</tt><br />
hex: <tt>'0x89 0x50 0x4e 0x47 0x0D 0x0A 0x1A 0x0A'</tt><br />
<br />
oct: <tt>'0211 0120 0116 0107 0015 0012 0032 0012'</tt><br />
| ??<br />
|-<br />
| [http://svn.annodex.net/liboggz/trunk/src/liboggz/oggz_auto.h speex]<br />
| audio<br />
| char[0,8]: <tt>'Speex\ \ \ '</tt><br />
hex: <tt>'0x53 0x70 0x65 0x65 0x78 0x20 0x20 0x20'</tt><br />
<br />
oct: <tt>'0123 0160 0145 0145 0170 0040 0040 0040'</tt><br />
| char[28,4]: version id<br />
|-<br />
| [http://svn.annodex.net/liboggz/trunk/src/liboggz/oggz_auto.h theora]<br />
| video<br />
| char[0,7]: <tt>'\x80theora'</tt><br />
hex: <tt>'0x80 0x74 0x68 0x65 0x6f 0x72 0x61'</tt><br />
<br />
oct: <tt>'0200 0164 0150 0145 0157 0162 0141'</tt><br />
| char[7,1]: major version number,<br />
char[8,1]: minor version number,<br />
<br />
char[9,1]: version revision number<br />
|-<br />
| [http://svn.annodex.net/liboggz/trunk/src/liboggz/oggz_auto.h vorbis]<br />
| audio<br />
| char[0,7]: <tt>'\x01vorbis'</tt><br />
hex: <tt>'0x01 0x76 0x6f 0x72 0x62 0x69 0x73'</tt><br />
<br />
oct: <tt>'0001 0166 0157 0162 0142 0151 0163'</tt><br />
| char[7,4]: version field<br />
|-<br />
| [[OggYUV4MPEG|yuv4mpeg]]<br />
| video<br />
| char[0,8]: <tt>'YUV4MPEG'</tt><br />
hex: <tt>'0x59 0x55 0x56 0x34 0x4d 0x50 0x45 0x47'</tt><br />
<br />
oct: <tt>'0131 0125 0126 0064 0115 0120 0105 0107'</tt><br />
| char[8,1] = '2' (0x32) for yuv4mpeg format version 2<br />
|}<br />
<br />
The "char[x,y]" notation means: start at byte offset x (counting from 0) and read y bytes.<br />
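<br />
As an illustration of how the table above might be used, here is a minimal Python sketch that matches the start of the first packet of a logical stream against the magic values. The names <tt>MAGIC</tt>, <tt>identify</tt> and <tt>char_field</tt> are hypothetical and not part of any Xiph library, and only the rows shown in full above are included.<br />

```python
# Magic values copied from the table above: (magic bytes, codec name, content type).
MAGIC = [
    (b"\x7fFLAC", "flac", "audio"),
    (b"\x8bJNG\r\n\x1a\n", "jng", "video"),
    (b"\x80kate\x00\x00\x00", "kate", "text"),
    (b"OggMIDI\x00", "midi", "text"),
    (b"\x8aMNG\r\n\x1a\n", "mng", "video"),
    (b"OpusHead", "opus", "audio"),
    (b"PCM     ", "pcm", "audio"),
    (b"\x89PNG\r\n\x1a\n", "png", "video"),
    (b"Speex   ", "speex", "audio"),
    (b"\x80theora", "theora", "video"),
    (b"\x01vorbis", "vorbis", "audio"),
    (b"YUV4MPEG", "yuv4mpeg", "video"),
]


def char_field(data, x, y):
    """Return the char[x,y] field: y bytes starting at byte offset x."""
    return data[x:x + y]


def identify(first_packet):
    """Return (codec, content type) for a first packet, or None if unknown."""
    for magic, name, kind in MAGIC:
        if first_packet.startswith(magic):
            return name, kind
    return None
```

For example, <tt>identify(b"OpusHead\x01")</tt> would report an Opus audio stream, and <tt>char_field(packet, 8, 1)</tt> would then extract its version field.<br />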
<br />
[[Category:Ogg]]</div>Derfhttps://wiki.xiph.org/index.php?title=OpusFAQ&diff=13428OpusFAQ2012-05-25T03:28:52Z<p>Derf: </p>
<hr />
<div>[[Image:Opus logo trans.png|right]]<br />
<br />
== General Questions ==<br />
<br />
=== What is Opus? ===<br />
<br />
Opus is a format for compressing audio to efficiently transmit it across networks. Opus allows for music-grade high quality audio at low data rates while not delaying the signal much. Opus is distinguished from most formats for high quality audio (AAC, Vorbis, MP3) by having low delay and it is distinguished from most low delay formats (G.711, GSM, Speex) by supporting high audio quality.<br />
<br />
=== Who created Opus? ===<br />
<br />
Opus was created by combining Xiph.Org's CELT development codec and Skype's SILK codec as part of a cooperative effort in the IETF codec working group. Opus has been in development since early 2007, and <!-- as of ???? is an IETF proposed standard: '''RFC TBD''' --> has been recommended by the working group for promotion as a proposed standard.<br />
<br />
=== Is the SILK part of Opus compatible with the SILK implementation shipped in Skype? ===<br />
<br />
No. The SILK codec, as submitted by Skype to the IETF, was heavily modified as part of its integration within Opus. The modifications are significant enough that it is not possible to just write a "translator" and even sharing code between Opus and the "old SILK" would be highly non-trivial.<br />
<br />
=== How do I use Opus / What programs use Opus? ===<br />
<br />
=== What are the licensing requirements? ===<br />
<br />
The reference Opus source code is released under the BSD license, which is a very permissive Open Source license. Commercial use and distribution (including in proprietary software) is permitted, provided that some conditions specified in the license are met. <br />
<br />
Opus is also covered by some patents, for which royalty-free usage rights are granted, under conditions that the authors believe are compatible with most (all?) open source licenses, including the GPL (v2 and v3).<br />
<br />
=== How does the quality of Opus compare to other codecs? ===<br />
<br />
== Opus for Software developers ==<br />
<br />
=== What is the difference between supporting Opus and supporting Speex/G.711/MP3? ===<br />
<br />
Opus has variable frame durations which can change on the fly, so an Opus decoder needs to be ready to accept packets with durations that are any multiple of 2.5ms up to a maximum of 120ms. <br />
<br />
The Opus encoder and decoder do not need to have matched sampling rates, bandwidths, or channel counts. It's recommended to always decode at the highest rate the hardware supports (e.g. 48kHz stereo) so the user gets the full quality of whatever the far end is sending.<br />
<br />
=== Forward Error correction (FEC) doesn't appear to do anything! HELP! ===<br />
<br />
<br />
The inband FEC feature of Opus helps reduce the harm of packet loss by encoding some information about the prior packet.<br />
<br />
In order to make use of inband FEC the decoder must delay its output by at least one frame so that it can call the decoder with the decode_fec argument on the ''next'' frame in order to reconstruct the missed frame. This works best if it's integrated with a jitter buffer.<br />
<br />
FEC is only used by the encoder under certain conditions: the feature must be enabled via the OPUS_SET_INBAND_FEC CTL, the encoder must be told to expect loss via the OPUS_SET_PACKET_LOSS_PERC CTL, and the codec must be operated in any of the linear prediction or Hybrid modes. Frame durations of <10ms and very high bitrates will use the MDCT modes, where FEC is not available.<br />
<br />
Even when FEC is not used, telling the encoder about the level of loss will help it make more intelligent decisions. By default the implementation assumes there is no loss.<br />
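<br />
The decode ordering described above can be sketched as follows. This is only an illustrative model in Python and does not call libopus: <tt>plan_decode_calls</tt> is a hypothetical helper that lists, for each frame, which packet a receiver would hand to the decoder and in which mode. The "fec" mode stands for decoding the ''next'' packet with the decode_fec argument set, and "plc" stands for asking the decoder to conceal a loss when no later packet is available either.<br />

```python
def plan_decode_calls(packets):
    """Plan decoder calls for a sequence of frames.

    packets[i] holds the payload received for frame i, or None if it was lost.
    Returns a list of (frame index, index of the packet to decode from, mode).
    """
    calls = []
    for i, pkt in enumerate(packets):
        if pkt is not None:
            # Packet arrived: decode it normally.
            calls.append((i, i, "normal"))
        elif i + 1 < len(packets) and packets[i + 1] is not None:
            # Packet lost, but the next one is here: recover from its FEC data.
            calls.append((i, i + 1, "fec"))
        else:
            # Nothing to recover from: fall back to concealment.
            calls.append((i, None, "plc"))
    return calls
```

Note that the planner looks at <tt>packets[i + 1]</tt> before producing frame i, which is why the receiver has to buffer at least one frame ahead.<br />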
<br />
=== What is Opus Custom? ===<br />
<br />
Opus Custom is an '''optional''' part of the Opus standard that allows for sampling rates other than 8, 12, 16, 24, or 48 kHz and frame sizes other than multiples of 2.5 ms. Opus Custom requires additional out-of-band signalling that Opus does not normally require and disables many of Opus' coding modes. Also, because it is an optional part of the specification, using Opus Custom may lead to compatibility problems. For these reasons, its use is discouraged outside of very specific applications, e.g.:<br />
* ultra low delay applications where synchronization with the soundcard buffer is important. <br />
* low-power embedded applications where compatibility with others is not important.<br />
<br />
For almost all other types of applications, Opus Custom should not be used.<br />
<br />
=== How do I use 44.1 kHz or some other sampling rate not directly supported by Opus? ===<br />
<br />
Tools which read or write Opus should inter-work with other sampling rates by transparently performing sample rate conversion behind the scenes. It's generally preferable to run the output at 48kHz even when you know the original input was 44.1kHz, because many inexpensive audio interfaces have poor quality output at 44.1kHz.<br />
<br />
In particular, software developers should not use Opus Custom for 44.1 kHz support, except in the very specific circumstances outlined above.<br />
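<br />
As a small worked example of the arithmetic involved, the sketch below computes the resampling ratio and the resulting sample count when converting an input to 48 kHz. It only does the bookkeeping and is not a resampler; the function names are hypothetical.<br />

```python
from fractions import Fraction
import math

OPUS_RATE = 48000  # rate used on the Opus side in this sketch


def resample_ratio(input_rate):
    """Exact ratio of output samples to input samples."""
    return Fraction(OPUS_RATE, input_rate)


def output_length(input_samples, input_rate):
    """Number of 48 kHz samples needed to cover input_samples of input."""
    return math.ceil(input_samples * resample_ratio(input_rate))
```

For 44.1 kHz input the ratio reduces to 160/147, so one second of input (44100 samples) maps to exactly 48000 output samples.<br />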
<br />
=== Which implementation should I use? ===<br />
<br />
While the implementation in the latest IETF draft (and eventually RFC) of Opus is what ''defines'' the standard, it is likely not the best or most up-to-date implementation. The [http://opus-codec.org/ Opus] website was set up for the purpose of continually improving the implementation (in terms of speed, encoding quality, device compatibility, etc.) while still conforming to the standard. All Opus implementations are compatible by definition.</div>Derfhttps://wiki.xiph.org/index.php?title=OpusFAQ&diff=13423OpusFAQ2012-05-25T02:51:33Z<p>Derf: </p>
<hr />
<div>[[Image:Opus logo trans.png|right]]<br />
<br />
=== What is Opus? ===<br />
<br />
Opus is a format for compressing audio to efficiently transmit it across networks. Opus allows for music-grade high quality audio at low data rates while not delaying the signal much. Opus is distinguished from most formats for high quality audio (AAC, Vorbis, MP3) by having low delay and it is distinguished from most low delay formats (G.711, GSM, Speex) by supporting high audio quality.<br />
<br />
=== Who created Opus? ===<br />
<br />
Opus was created by combining Xiph.Org's CELT development codec and Skype's SILK codec as part of a public cooperation in the IETF codec working group. Opus has been in development since early 2007, and <!-- as of ???? is an IETF proposed standard: '''RFC TBD''' --> has been recommended by the working group for promotion as a proposed standard.<br />
<br />
=== Is the SILK part of Opus compatible with the SILK implementation shipped in Skype? ===<br />
<br />
No. The SILK codec, as submitted by Skype to the IETF, was heavily modified as part of its integration within Opus. The modifications are significant enough that it is not possible to just write a "translator" and even sharing code between Opus and the "old SILK" would be highly non-trivial.<br />
<br />
=== What are the licensing requirements? ===<br />
<br />
The reference Opus source code is released under the BSD license, which is a very permissive Open Source license. Commercial use and distribution (including in proprietary software) is permitted, provided that some conditions specified in the license are met. <br />
<br />
Opus is also covered by some patents, for which royalty-free usage rights are granted, under conditions that the authors believe are compatible with most (all?) open source licenses, including the GPL (v2 and v3).<br />
<br />
=== How does the quality of Opus compare to other codecs? ===<br />
<br />
=== What is Opus Custom? ===<br />
<br />
Opus Custom is an '''optional''' part of the Opus standard that allows for sampling rates other than 8, 12, 16, 24, or 48 kHz and frame sizes other than multiples of 2.5 ms. Opus Custom requires additional out-of-band signalling that Opus does not normally require and disables many of Opus' coding modes. Also, because it is an optional part of the specification, using Opus Custom may lead to compatibility problems. For these reasons, its use is discouraged outside of very specific applications, e.g.:<br />
* ultra low delay applications where synchronization with the soundcard buffer is important. <br />
* low-power embedded applications where compatibility with others is not important.<br />
<br />
For almost all other types of applications, Opus Custom should not be used.<br />
<br />
=== How do I use 44.1 kHz or some other sampling rate not directly supported by Opus? ===<br />
<br />
Tools which read or write Opus should inter-work with other sampling rates by transparently performing sample rate conversion behind the scenes.<br />
<br />
Software developers should not use Opus Custom for 44.1 kHz support, except in the very specific circumstances outlined above.<br />
<br />
=== Which implementation should I get? ===<br />
<br />
While the implementation in the latest IETF draft (and eventually RFC) of Opus is what defines the standard, it is likely not the best and most up-to-date implementation. The [http://opus-codec.org/ Opus] website was set up for the purpose of continually improving the implementation while still conforming to the standard. All Opus implementations are compatible but better Opus encoders can deliver higher quality audio and some implementations may be faster or slower or work better on different platforms.</div>Derfhttps://wiki.xiph.org/index.php?title=OpusFAQ&diff=13418OpusFAQ2012-05-24T20:58:35Z<p>Derf: </p>
<hr />
<div>=== Is the SILK part of Opus compatible with the SILK implementation shipped in Skype? ===<br />
<br />
No. The SILK codec, as submitted by Skype to the IETF, was heavily modified as part of its integration within Opus. The modifications are significant enough that it is not possible to just write a "translator" and even sharing code between Opus and the "old SILK" would be highly non-trivial.<br />
<br />
=== What is Opus Custom? ===<br />
<br />
Opus Custom is an '''optional''' part of the Opus standard that allows for sampling rates other than 8, 12, 16, 24, or 48 kHz and frame sizes other than multiples of 2.5 ms. Opus Custom requires additional out-of-band signalling that Opus does not normally require. Also, because it is an optional part of the specification, using Opus Custom may lead to compatibility problems. For these reasons, its use is discouraged outside of very specific applications, e.g.:<br />
* ultra low delay applications where synchronization with the soundcard buffer is important. <br />
* low-power embedded applications where compatibility with others is not important.<br />
<br />
For almost all other types of applications, Opus Custom should not be used.<br />
<br />
=== How do I use 44.1 kHz or some other sampling rate not supported by Opus? ===<br />
<br />
In the vast majority of cases, the best way to support other sampling rates is to perform sample rate conversion to 48 kHz. You should not use Opus Custom just for 44.1 kHz support, except in the very specific circumstances outlined above.<br />
<br />
=== What are the licensing requirements? ===<br />
<br />
The Opus source code is released under the BSD license, which is a very permissive Open Source license. Commercial use and distribution (including in proprietary software) is permitted, provided that some conditions specified in the license are met. <br />
<br />
Opus is also covered by some patents, for which royalty-free usage rights are granted, under conditions that the authors believe are compatible with most (all?) open source licenses, including the GPL (v2 and v3).<br />
<br />
=== How does the quality of Opus compare to other codecs? ===<br />
<br />
=== Which implementation should I get? ===<br />
<br />
While the implementation in the latest IETF draft (and eventually RFC) of Opus is what defines the standard, it is likely not the best and most up-to-date implementation. The [http://opus-codec.org/ Opus] website was set up for the purpose of continually improving the implementation while still conforming to the standard.</div>Derfhttps://wiki.xiph.org/index.php?title=TDLT&diff=13404TDLT2012-05-24T03:43:34Z<p>Derf: Add start of 16x32 results</p>
<hr />
<div>This page holds the results of Time Domain Lapped Transform (TDLT) optimization problems looking for integer transform coefficients that provide optimal coding gain. Wherever possible the assumptions are stated. Later we should include testing against actual image data to verify the results (see test data [http://people.xiph.org/~tterribe/tmp/subset1-y4m.tar.gz here]).<br />
<br />
The coding gain used as the objective is taken from slide 13 of Tim's presentation [http://people.xiph.org/~tterribe/pubs/lca2012/auckland/intro_to_video1.pdf An Introduction to Video Coding]<br />
<br />
<need figure with block matrix diagrams><br />
<br />
The free parameters are initially just the coefficients p_0,...,p_m,q_0,...,q_m where m=(n/2)-1. We limit these to being dyadic rationals, e.g., x/2^d with d=6, between [-1,1].<br />
<br />
Given the p's and q's, the assumption of a linear ramp constrains the s's.<br />
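<br />
To make the dyadic-rational restriction concrete, here is a minimal Python sketch that snaps a real-valued coefficient to the nearest value of the form x/2^d inside [-1,1]. The helper name <tt>nearest_dyadic</tt> is hypothetical, and plain rounding is only a starting point: the integer coefficients reported on this page come from optimization and need not equal the rounded real-valued optimum.<br />

```python
from fractions import Fraction


def nearest_dyadic(x, d=6):
    """Round x to the nearest multiple of 1/2^d, clamped to [-1, 1]."""
    scale = 1 << d
    n = round(x * scale)
    n = max(-scale, min(scale, n))  # keep the result inside [-1, 1]
    return Fraction(n, scale)
```

With d=6, a coefficient such as -0.18117338915051454 rounds to -12/64 and 0.6331818230771687 rounds to 41/64.<br />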
<br />
== 4x8 ==<br />
<br />
Optimal real-valued coefficients for V:<br />
<br />
p0 = -0.18117338915051454<br />
<br />
q0 = 0.6331818230771687<br />
<br />
CG = 8.60603<br />
<br />
{|<br />
!<br />
!p0<br />
!q0<br />
!s0<br />
!s1<br />
!CG<br />
|-<br />
|R=f<br />
| -11/64<br>-0.171875<br />
| 36/64<br>0.5625<br />
| 91/64<br>1.421875<br />
| 85/64<br>1.328125<br />
| &nbsp;<br>8.63473<br />
|-<br />
|R=t,D=f<br />
| -12/64<br>-0.1875<br />
| 41/64<br>0.640625<br />
| 92/64<br>1.4375<br />
| 1093/768<br>1.423177<br />
| &nbsp;<br>8.60486<br />
|-<br />
|R=t,D=t<br />
| -16/64<br>-0.25<br />
| 41/64<br>0.640625<br />
| 92/64<br>1.4375<br />
| 93/64<br>1.453125<br />
| &nbsp;<br>8.59886<br />
|}<br />
<br />
== 8x16 ==<br />
<br />
Optimal real-valued coefficients for V:<br />
<br />
p0 = -0.39460731547057293<br />
<br />
p1 = -0.33002212811740816<br />
<br />
p2 = -0.12391270981321137<br />
<br />
q0 = 0.822154737511288<br />
<br />
q1 = 0.632488694485779<br />
<br />
q2 = 0.40214668677553894<br />
<br />
CG = 9.56867<br />
<br />
{|<br />
!<br />
!p0<br />
!p1<br />
!p2<br />
!q0<br />
!q1<br />
!q2<br />
!s0<br />
!s1<br />
!s2<br />
!s3<br />
!CG<br />
|-<br />
|R=f<br />
|-<br />
|R=t,D=f<br />
| -26/64<br>-0.40625<br />
| -22/64<br>-0.34375<br />
| -8/64<br>-0.125<br />
| 53/64<br>0.828125<br />
| 41/64<br>0.640625<br />
| 26/64<br>0.40625<br />
| 11/8<br>1.375<br />
| 879/768<br>1.14453125<br />
| 1469/1280<br>1.14765625<br />
| 275/224<br>1.2276785714285714<br />
| &nbsp;<br>9.56627<br />
|-<br />
|R=t,D=t<br />
| -24/64<br>-0.375<br />
| -20/64<br>-0.3125<br />
| -4/64<br>-0.0625<br />
| 53/64<br>0.828125<br />
| 40/64<br>0.625<br />
| 24/64<br>0.375<br />
| 88/64<br>1.375<br />
| 75/64<br>1.171875<br />
| 76/64<br>1.1875<br />
| 76/64<br>1.1875<br />
| &nbsp;<br>9.56161<br />
|}<br />
<br />
== 16x32 ==<br />
<br />
Best-known real-valued coefficients for V (R=t):<br />
<br />
p0 = -0.42111473798940136<br />
<br />
p1 = -0.4121736499899753<br />
<br />
p2 = -0.3350240707669929<br />
<br />
p3 = -0.3224547931861314<br />
<br />
p4 = -0.25883387978005545<br />
<br />
p5 = -0.20951913473498104<br />
<br />
p6 = -0.0598657149803332<br />
<br />
q0 = 0.9107782439906195<br />
<br />
q1 = 0.8109855829278226<br />
<br />
q2 = 0.715846584586721<br />
<br />
q3 = 0.6135951570714172<br />
<br />
q4 = 0.49846644853347627<br />
<br />
q5 = 0.3945215834922529<br />
<br />
q6 = 0.21822275136248082<br />
<br />
CG = 9.81157<br />
<br />
{|<br />
!<br />
!p0<br />
!p1<br />
!p2<br />
!p3<br />
!p4<br />
!p5<br />
!p6<br />
!q0<br />
!q1<br />
!q2<br />
!q3<br />
!q4<br />
!q5<br />
!q6<br />
!s0<br />
!s1<br />
!s2<br />
!s3<br />
!s4<br />
!s5<br />
!s6<br />
!s7<br />
!CG<br />
|-<br />
|R=f<br />
|-<br />
|R=t,D=f<br />
| -26/64<br>-0.40625<br />
| -23/64<br>-0.359375<br />
| -20/64<br>-0.3125<br />
| -18/64<br>-0.28125<br />
| -14/64<br>-0.21875<br />
| -14/64<br>-0.21875<br />
| -2/64<br>-0.03125<br />
| 58/64<br>0.90625<br />
| 52/64<br>0.8125<br />
| 45/64<br>0.703125<br />
| 36/64<br>0.5625<br />
| 31/64<br>0.484375<br />
| 23/64<br>0.359375<br />
| 16/64<br>0.25<br />
| 3/2<br>1.5<br />
| 77/64<br>1.20313<br />
| 373/320<br>1.16563<br />
| 543/448<br>1.21205<br />
| 109/96<br>1.13542<br />
| 1543/1408<br>1.09588<br />
| 1823/1664<br>1.09555<br />
| 131/120<br>1.09167<br />
| &nbsp;<br>9.79008<br />
|-<br />
|R=t,D=t<br />
|}</div>Derfhttps://wiki.xiph.org/index.php?title=TDLT&diff=13380TDLT2012-05-12T22:00:08Z<p>Derf: </p>
<hr />
<div>This page holds the results of Time Domain Lapped Transform (TDLT) optimization problems looking for integer transform coefficients that provide optimal coding gain. Wherever possible the assumptions are stated. Later we should include testing against actual image data to verify the results (see test data [http://people.xiph.org/~tterribe/tmp/subset1-y4m.tar.gz here]).<br />
<br />
The coding gain used as the objective is taken from slide 13 of Tim's presentation [http://people.xiph.org/~tterribe/pubs/lca2012/auckland/intro_to_video1.pdf An Introduction to Video Coding]<br />
<br />
<need figure with block matrix diagrams><br />
<br />
The free parameters are initially just the coefficients p_0,...,p_m,q_0,...,q_m where m=(n/2)-1. We limit these to being dyadic rationals, e.g., x/2^d with d=6, between [-1,1].<br />
<br />
Given the p's and q's, the assumption of a linear ramp constrains the s's.<br />
<br />
== 4x8 ==<br />
<br />
s0 = 4*(1-q0)<br />
s1 = 4*(1-p0*(1-q0))/3<br />
<br />
Optimal real-valued coefficients for V:<br />
<br />
p0 = -0.18117338915051454<br />
<br />
q0 = 0.6331818230771687<br />
<br />
CG = 8.60603<br />
<br />
Optimal integer-valued coefficients (d=6) for V:<br />
<br />
p0 = -12/64 = -0.1875<br />
<br />
q0 = 41/64 = 0.640625<br />
<br />
CG = 8.60486<br />
<br />
Optimal integer-valued coefficients (d=6) where (1-p0*(1-q0)) is divisible by 3:<br />
<br />
p0 = -13/64 = -0.203125<br />
<br />
q0 = 41/64 = 0.640625<br />
<br />
CG = 8.60446<br />
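<br />
The two formulas for s0 and s1 at the top of this section can be evaluated exactly with rational arithmetic. The following Python sketch (with a hypothetical helper name, <tt>scales_4x8</tt>) does so for the integer coefficients listed above; it is only a check of the arithmetic, not part of the optimization itself.<br />

```python
from fractions import Fraction


def scales_4x8(p0, q0):
    """Evaluate s0 = 4*(1-q0) and s1 = 4*(1-p0*(1-q0))/3 exactly."""
    s0 = 4 * (1 - q0)
    s1 = 4 * (1 - p0 * (1 - q0)) / 3
    return s0, s1


# Unconstrained integer solution: s1 keeps a factor of 3 in its denominator.
s0_a, s1_a = scales_4x8(Fraction(-12, 64), Fraction(41, 64))

# Solution where (1-p0*(1-q0)) is divisible by 3: s1 becomes dyadic.
s0_b, s1_b = scales_4x8(Fraction(-13, 64), Fraction(41, 64))
```

With p0 = -12/64 this gives s0 = 23/16 and s1 = 1093/768, whereas p0 = -13/64 gives s1 = 1465/1024, whose denominator is a power of two.<br />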
<br />
== 8x16 ==<br />
<br />
Optimal real-valued coefficients for V:<br />
<br />
p0 = -0.39460731547057293<br />
<br />
p1 = -0.33002212811740816<br />
<br />
p2 = -0.12391270981321137<br />
<br />
q0 = 0.822154737511288<br />
<br />
q1 = 0.632488694485779<br />
<br />
q2 = 0.40214668677553894<br />
<br />
CG = 9.56867<br />
<br />
Optimal [maybe] integer-valued coefficients (d=6) for V:<br />
<br />
p0 = -26/64 = -0.40625<br />
<br />
p1 = -22/64 = -0.34375<br />
<br />
p2 = -8/64 = -0.125<br />
<br />
q0 = 53/64 = 0.828125<br />
<br />
q1 = 41/64 = 0.640625<br />
<br />
q2 = 26/64 = 0.40625<br />
<br />
CG = 9.56627</div>Derfhttps://wiki.xiph.org/index.php?title=TDLT&diff=13379TDLT2012-05-12T21:51:02Z<p>Derf: /* 8x16 */</p>
<hr />
<div>This page holds the results of Time Domain Lapped Transform (TDLT) optimization problems looking for integer transform coefficients that provide optimal coding gain. Wherever possible the assumptions are stated. Later we should include testing against actual image data to verify the results (see test data [http://people.xiph.org/~tterribe/tmp/subset1-y4m.tar.gz here]).<br />
<br />
The coding gain used as the objective is taken from slide 13 of Tim's presentation [http://people.xiph.org/~tterribe/pubs/lca2012/auckland/intro_to_video1.pdf An Introduction to Video Coding]<br />
<br />
<need figure with block matrix diagrams><br />
<br />
The free parameters are initially just the coefficients p_0,...,p_m,q_0,...,q_m where m=(n/2)-1. We limit these to being dyadic rationals, e.g., x/2^d with d=6, between [-1,1].<br />
<br />
Given the p's and q's, the assumption of a linear ramp constrains the s's.<br />
<br />
== 4x8 ==<br />
<br />
s0 = 4*(1-q0)<br />
s1 = 4*(1-p0*(1-q0))/3<br />
<br />
Optimal real-valued coefficients for V:<br />
<br />
p0 = -0.18117338915051454<br />
<br />
q0 = 0.6331818230771687<br />
<br />
CG = 8.60603<br />
<br />
Optimal integer-valued coefficients (d=6) for V:<br />
<br />
p0 = -12/64 = -0.1875<br />
<br />
q0 = 41/64 = 0.640625<br />
<br />
CG = 8.60486<br />
<br />
Optimal integer-valued coefficients (d=6) where (1-p0*(1-q0)) is divisible by 3:<br />
<br />
p0 = -13/64 = -0.203125<br />
<br />
q0 = 41/64 = 0.640625<br />
<br />
CG = 8.60446<br />
<br />
== 8x16 ==<br />
<br />
Optimal real-valued coefficients for V:<br />
<br />
p0 = -0.4045289182698497<br />
<br />
p1 = -0.33468137871117265<br />
<br />
p2 = -0.12545831989246597<br />
<br />
q0 = 0.8222196599456361<br />
<br />
q1 = 0.634500829289973<br />
<br />
q2 = 0.4034606713167927<br />
<br />
CG = 9.56856<br />
<br />
Optimal [maybe] integer-valued coefficients (d=6) for V:<br />
<br />
p0 = -26/64 = -0.40625<br />
<br />
p1 = -22/64 = -0.34375<br />
<br />
p2 = -8/64 = -0.125<br />
<br />
q0 = 53/64 = 0.828125<br />
<br />
q1 = 41/64 = 0.640625<br />
<br />
q2 = 26/64 = 0.40625<br />
<br />
CG = 9.56627</div>Derfhttps://wiki.xiph.org/index.php?title=OggOpus&diff=13374OggOpus2012-05-10T17:29:36Z<p>Derf: Upgrade pre-skip recommendation for cropping to RFC 2119 strength.</p>
<hr />
<div>== Ogg Mapping for Opus ==<br />
<br />
The IETF Opus codec is a low-latency audio codec optimized for both voice and general-purpose audio. See the [http://tools.ietf.org/html/draft-ietf-codec-opus Opus Specification] for technical details.<br />
<br />
Almost everything about Opus is either fixed or dynamically switchable, so most of the usual ID and setup header parameters in the header packets of an Ogg encapsulation aren't needed. In particular, bitrate, packet duration, mono/stereo flags, and coding modes are all dynamically switchable from packet to packet. The first one or two bytes in each data packet, the start of the 'TOC sequence' that defines the layout of the packet, specifies all of these parameters for that particular packet. See Section 3 of the Opus Specification for the exact format of the TOC sequence.<br />
<br />
The remaining parameters that must be signaled are<br />
<br />
* The magic number for stream identification,<br />
* The stream count and coupling for multichannel audio, and<br />
* Any metadata or tags.<br />
<br />
=== Packet Organization ===<br />
<br />
Opus is framed in a continuous logical [http://www.xiph.org/ogg/doc/framing.html Ogg stream]. <br />
<br />
There are two mandatory headers. The granule position of the pages containing these headers MUST be zero.<br />
<br />
The first packet in the logical Ogg stream MUST contain the identification header, which uniquely identifies a stream as Opus audio. It MUST begin with the 8 bytes "OpusHead". It MUST be placed alone in the first page of the logical Ogg stream. This page MUST have the 'beginning of stream' flag set.<br />
<br />
The second Opus packet MUST contain the comment header. It must begin with the 8 bytes "OpusTags". It MAY span one or more pages, beginning on the second page of the logical stream. However many pages it spans, the comment header packet MUST finish the page on which it ends.<br />
<br />
All subsequent pages are audio data pages and the packets they contain are audio data packets. The first audio page SHOULD NOT have the 'continued packet' flag set (which would indicate the first audio packet is continued from a previous page). Packets MUST be placed into Ogg pages in order until the end of stream. Audio packets MAY span page boundaries. A decoder MUST treat a zero-byte audio packet as if it were an Opus packet with an illegal TOC sequence. The last page SHOULD have the 'end of stream' flag set, but implementations should be prepared to deal with truncated streams which do not have a page marked 'end of stream'. The final packet SHOULD complete on the last page, i.e., the final lacing value should be less than 255. There MUST NOT be any more pages in an Opus logical stream after a page marked 'end of stream'.<br />
<br />
=== Granule Position ===<br />
<br />
The granule position of an audio page encodes the total number of PCM samples in the stream up to and including the last fully-decodable sample from the last packet ''completed'' on that page. A page that is entirely spanned by a single packet (that completes on a subsequent page) has no granule position, and the granule position field MUST be set to the special value '-1' in two's complement.<br />
<br />
The granule position of an audio page is in units of PCM audio samples at a fixed rate of 48 kHz (per channel; a stereo stream’s granule position does not increment at twice the speed of a mono stream). It is possible to run a decoder at other sampling rates, but the format and this specification always count samples assuming a 48 kHz decoding rate.<br />
<br />
The duration of an Opus packet may be any multiple of 2.5 ms, up to a maximum of 120 ms. This duration is encoded in the TOC sequence at the beginning of each packet. The number of samples returned by a decoder corresponds to this duration exactly, even for the first few packets. For example, a 20 ms packet fed to a decoder running at 48 kHz will always return 960 samples. A demuxer can parse these TOC sequences to work backwards or forwards from a packet with a known granule position (i.e., the last packet completed on some page) in order to assign granule positions to every packet, or even every individual sample. The one exception is the last page in the stream, as described below.<br />
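<br />
As a rough illustration of how a demuxer might derive a packet's duration from its TOC sequence, here is a Python sketch. The bit layout used (a 5-bit configuration number in the top bits of the first byte, a 2-bit frame count code in the bottom bits, and a frame count in the low 6 bits of the second byte for code 3) is taken from Section 3 of the Opus Specification as the author of this example understands it; the function name is hypothetical and the sketch does no further packet validation.<br />

```python
def packet_samples_48k(packet):
    """Return the duration of an Opus packet in samples at 48 kHz."""
    if len(packet) < 1:
        raise ValueError("empty packet has no TOC byte")
    toc = packet[0]
    config = toc >> 3   # coding mode, bandwidth and frame size
    code = toc & 0x03   # how many frames the packet holds

    # Frame duration in microseconds for each configuration range.
    if config < 12:      # linear prediction modes: 10, 20, 40, 60 ms
        frame_us = (10000, 20000, 40000, 60000)[config & 3]
    elif config < 16:    # hybrid modes: 10, 20 ms
        frame_us = (10000, 20000)[config & 1]
    else:                # MDCT modes: 2.5, 5, 10, 20 ms
        frame_us = (2500, 5000, 10000, 20000)[config & 3]

    if code == 0:
        frames = 1
    elif code in (1, 2):
        frames = 2
    else:
        if len(packet) < 2:
            raise ValueError("code 3 packet is missing its frame count byte")
        frames = packet[1] & 0x3F
        if frames == 0:
            raise ValueError("frame count of zero is not allowed")

    samples = frames * frame_us * 48 // 1000
    if samples > 5760:   # 120 ms at 48 kHz
        raise ValueError("packet duration exceeds 120 ms")
    return samples
```

A single 20 ms frame yields 960 samples, matching the example above, and summing these values across the packets completed on a page gives the expected granule position increment.<br />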
<br />
All other pages with completed packets after the first MUST have a granule position equal to the number of samples contained in packets that complete on that page plus the granule position of the most recent page with completed packets. This guarantees that a demuxer can assign individual packets the same granule position when working forwards as when working backwards. There must not be any gaps. In order to support capturing a stream that uses discontinuous transmission (DTX), an encoder SHOULD emit packets that explicitly request the use of Packet Loss Concealment (PLC) (i.e., with a frame length of 0, as defined in Section 3.2.1 of the Opus Specification) in place of the packets that were not transmitted.<br />
<br />
There is some amount of latency introduced during the decoding process, to allow for overlap in the MDCT modes, stereo mixing in the LP modes, and resampling, and the encoder will introduce even more latency (though the exact amount is not specified). Therefore the first few samples produced by the decoder do not correspond to any real, input audio, but are instead composed of padding inserted by the encoder to compensate for this latency. These samples must be stored and decoded, as Opus is an asymptotically convergent predictive codec, meaning the decoded contents of each frame depend on the recent history of decoder inputs. A 'pre-skip' field in the ID header signals the number of samples which should be skipped at the beginning of the stream. This provides sufficient history to the decoder so that it has already converged before the stream's output begins. It may also be used to perform sample-accurate cropping of existing encoded streams. This amount need not be a multiple of 2.5 ms, may be smaller than a single packet, or may span the contents of several packets.<br />
<br />
The PCM sample position is determined from the granule position using the formula<br />
<br />
'PCM sample position' = 'granule position' - 'pre-skip' .<br />
<br />
For example, if the granule position of the first page is 59971, and the pre-skip is 11971, then the PCM sample position of the last decoded sample from the first page is 48000. This may be converted into a playback time using the formula<br />
<br />
 'playback time' = 'PCM sample position' / 48000.0 .
<br />
The initial PCM sample position before any samples are played is normally '0'. In this case, the PCM sample position of the first audio sample to be played starts at '1', because it marks the time on the clock ''after'' that sample has been played, and a stream that is exactly one second long has a final PCM sample position of '48000', as in the example here.<br />
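<br />
The two formulas above translate directly into code. The following Python sketch uses hypothetical function names and reproduces the worked example with a granule position of 59971 and a pre-skip of 11971.<br />

```python
GRANULE_RATE = 48000.0  # granule positions always count 48 kHz samples


def pcm_sample_position(granule_position, pre_skip):
    """PCM sample position = granule position - pre-skip."""
    return granule_position - pre_skip


def playback_time(granule_position, pre_skip):
    """Playback time in seconds for a given granule position."""
    return pcm_sample_position(granule_position, pre_skip) / GRANULE_RATE


position = pcm_sample_position(59971, 11971)
seconds = playback_time(59971, 11971)
```

Here <tt>position</tt> is 48000 and <tt>seconds</tt> is 1.0, i.e. the first page ends exactly one second into playback.<br />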
<br />
Vorbis streams use a granule position smaller than the number of audio samples contained in the first page to indicate that some of those samples must be trimmed from the output. However, to do so it requires that the first page contains exactly two packets, in order to allow the decoder to perform PCM position adjustments before needing to return any PCM data. Opus uses the pre-skip mechanism for this purpose instead, since the encoder may introduce more than a single packet's worth of latency, and since very large packets in streams with a very large number of channels may not fit on a single page.<br />
<br />
The page with the 'end of stream' flag set MAY have a granule position that indicates the page contains less audio data than would normally be returned by decoding up through the final packet. This is used to end the stream somewhere other than an even frame boundary. The granule position of the most recent audio page with completed packets is used to make this determination, or '0' is used if there were no previous audio pages with a completed packet. The difference between these granule positions indicates how many samples to keep after decoding the packets that completed on the final page. The remaining samples are discarded. The number of discarded samples SHOULD be smaller than the number decoded from the last packet.<br />
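<br />
The end-of-stream trimming rule can be sketched as below. The function name and the choice to raise an error on inconsistent input are assumptions made for this example rather than behavior required by the text.<br />

```python
def end_trim(prev_granule, final_granule, decoded_samples):
    """Return (samples to keep, samples to discard) for the final page.

    prev_granule is the granule position of the most recent audio page with
    completed packets (0 if there was none), final_granule is that of the
    'end of stream' page, and decoded_samples is the number of samples
    decoded from the packets that complete on the final page.
    """
    keep = final_granule - prev_granule
    if keep < 0 or keep > decoded_samples:
        raise ValueError("final granule position is inconsistent")
    return keep, decoded_samples - keep
```

For instance, if the previous page ended at 96000, the final page carries one 20 ms packet (960 samples) and its granule position is 96500, then 500 samples are kept and 460 are discarded.<br />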
<br />
The granule position of the first audio page with a completed packet MAY be larger than the number of samples contained in packets that complete on that page, however it MUST NOT be smaller, unless that page has the 'end of stream' flag set. Allowing a granule position larger than the number of samples allows the beginning of a stream to be cropped without rewriting the granule position of all the remaining pages. This means that the PCM sample position just before the first sample to be played may be larger than '0', but the PCM sample position relative to '0' should still be used for the purposes of synchronization when multiplexing with other logical streams. This does not affect the behavior of pre-skip: exactly 'pre-skip' samples should be skipped from the beginning of the decoded output, even if the initial PCM sample position is greater than zero.<br />
<br />
On the other hand, a granule position that is smaller than the number of decoded samples prevents a demuxer from working backwards to assign each packet or each individual sample a valid granule position, since granule positions must be non-negative. A decoder MUST reject as invalid any stream where the granule position is smaller than the number of samples contained in packets that complete on the first page with a completed packet, unless that page has the 'end of stream' flag set. It MAY defer this action until it decodes the last packet completed on that page. If that page has the 'end of stream' flag set, a demuxer can work forwards from the granule position '0', but MUST reject as invalid any stream where the granule position is smaller than the 'pre-skip' amount. This would indicate that more samples should be skipped from the initial decoded output than exist in the stream.<br />
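<br />
A demuxer could implement the checks on the first audio page roughly as shown below. This is a sketch under the assumption that the caller already knows how many samples the packets completing on that page decode to; the function name is hypothetical.<br />
<br />
```python
def first_page_start_granule(granule, samples_on_page, pre_skip, is_eos):
    """Return the granule position just before the first decoded sample.

    Raises ValueError for streams that the text above says MUST be rejected.
    """
    if is_eos and granule < pre_skip:
        # More samples would be skipped than exist in the stream.
        raise ValueError("granule position smaller than pre-skip on final page")
    if granule >= samples_on_page:
        # Normal case (0) or a stream cropped at the beginning (> 0).
        return granule - samples_on_page
    if not is_eos:
        raise ValueError("granule position smaller than samples on first page")
    # Single-page stream trimmed at the end: work forwards from 0.
    return 0
```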
<br />
==== ID Header ====<br />
<br />
  0                   1                   2                   3<br />
  0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1<br />
 +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+<br />
 |      'O'      |      'p'      |      'u'      |      's'      |<br />
 +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+<br />
 |      'H'      |      'e'      |      'a'      |      'd'      |<br />
 +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+<br />
 |  version = 1  | channel count |           pre-skip            |<br />
 +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+<br />
 |               original input sample rate in Hz                |<br />
 +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+<br />
 |    output gain Q7.8 in dB     |  channel map  |               |<br />
 +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+               :<br />
 |                                                               |<br />
 :               optional channel mapping table...               :<br />
 |                                                               |<br />
 +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+<br />
<br />
Brief description of each field:<br />
<br />
- Magic signature: "OpusHead" (64 bits)<br />
- Version number (8 bits unsigned): 0x01 for this spec<br />
- Channel count 'c' (8 bits unsigned): MUST be > 0<br />
- Pre-skip (16 bits unsigned, little endian)<br />
- Input sample rate (32 bits unsigned, little endian): informational only<br />
- Output gain (16 bits, little endian, signed Q7.8 in dB) to apply when<br />
decoding<br />
- Channel mapping family (8 bits unsigned)<br />
-- 0 = one stream: mono or L,R stereo<br />
-- 1 = channels in Vorbis spec order: mono or L,R stereo or ... or FL,C,FR,RL,RR,LFE, ...<br />
-- 2..254 = reserved (treat as 255)<br />
-- 255 = no defined channel meaning<br />
If channel mapping family > 0<br />
- Stream count 'N' (8 bits unsigned): MUST be > 0<br />
- Two-channel stream count 'M' (8 bits unsigned): MUST satisfy M <= N, M+N <= 255<br />
- Channel mapping (8*c bits)<br />
-- one stream index (8 bits unsigned) per channel (255 means silent throughout the file)<br />
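<br />
The field layout above can be read with a few lines of Python. The following is a minimal sketch, not a reference parser: the function name and the keys of the returned dictionary are hypothetical, and only the constraints stated in the list above are checked.<br />
<br />
```python
import struct

def parse_opus_head(packet: bytes) -> dict:
    """Parse an OpusHead packet into a dictionary of its fields."""
    if len(packet) < 19 or packet[:8] != b"OpusHead":
        raise ValueError("not an OpusHead packet")
    # version, channel count, pre-skip, input rate, output gain, mapping family
    version, channels, pre_skip, rate, gain, family = struct.unpack_from(
        "<BBHIhB", packet, 8)
    if channels == 0:
        raise ValueError("channel count MUST be > 0")
    head = {"version": version, "channels": channels, "pre_skip": pre_skip,
            "input_sample_rate": rate, "output_gain": gain, "family": family}
    if family == 0:
        # Family 0: one stream, mono or stereo, no mapping table is coded.
        if channels > 2:
            raise ValueError("family 0 allows only 1 or 2 channels")
        head.update(streams=1, coupled=channels - 1,
                    mapping=list(range(channels)))
    else:
        if len(packet) < 21 + channels:
            raise ValueError("truncated channel mapping table")
        streams, coupled = struct.unpack_from("<BB", packet, 19)
        if streams == 0 or coupled > streams or streams + coupled > 255:
            raise ValueError("invalid stream counts")
        head.update(streams=streams, coupled=coupled,
                    mapping=list(packet[21:21 + channels]))
    return head
```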
<br />
<br />
Detailed definition of each field:<br />
<br />
* '''Magic signature'''<br />
The magic signature "OpusHead" allows codec identification and is human readable. Starting with 'Op' helps distinguish it from data packets, as this is an invalid TOC sequence.<br />
<br />
* '''Version'''<br />
The version number must always be '1' for this version of the encapsulation specification. This 8-bit field is partitioned into two sub-fields. The lower four bits constitute a "minor revision", while the upper four bits correspond to a "major revision". I.e., the current major revision is '0', and the current minor revision is '1'. Implementations SHOULD treat streams with an unknown minor revision as backwards-compatible as long as they recognize the major revision. When encountering a stream with an unknown major revision, implementations SHOULD assume it is not backwards compatible. The initial version '1' was chosen to keep implementations from relying on this byte as a null terminator for the OpusHead string.<br />
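<br />
The major/minor split could be checked as in this small sketch (the function name is hypothetical):<br />
<br />
```python
def split_version(version_byte, supported_major=0):
    """Return (major, minor, compatible) for the OpusHead version byte."""
    major = version_byte >> 4      # upper four bits
    minor = version_byte & 0x0F    # lower four bits
    # Unknown minor revisions are treated as backwards-compatible;
    # unknown major revisions are not.
    return major, minor, major == supported_major
```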
<br />
* '''Channel count''' 'c'<br />
The number of channels byte specifies the number of output channels (1...255) for this Ogg Opus stream.<br />
<br />
* '''Pre-skip'''<br />
This is the number of samples (at 48 kHz) to discard from the decoder output when starting playback, and also the number to subtract from a page's granule position to calculate its PCM sample position.<br />
<br />
When constructing cropped Ogg Opus streams, a pre-skip of at least 3840 samples (80 ms) is RECOMMENDED to ensure complete convergence.<br />
<br />
* '''Input sample rate'''<br />
This is ''not'' the sample rate to use for playback of the encoded data.<br />
<br />
Opus has a handful of coding modes, with internal audio bandwidths of 4, 6, 8, 12, and 20 kHz. Each packet in the stream may have a different audio bandwidth. Regardless of the audio bandwidth, the reference decoder supports decoding any stream at a sample rate of 8, 12, 16, 24, or 48 kHz. The original sample rate of the encoder input is not preserved by the lossy compression.<br />
<br />
An Ogg Opus player SHOULD select the playback sample rate according to the following procedure:<br />
* If the hardware supports 48 kHz playback, decode at 48 kHz,<br />
* else if the hardware's highest available sample rate is a supported rate, decode at this sample rate,<br />
* else if the hardware's highest available sample rate is less than 48 kHz, decode at the next higher supported rate and resample,<br />
* else decode at 48 kHz and resample.<br />
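<br />
The selection procedure above can be written down directly. In this sketch the list of hardware rates is assumed to come from the audio backend, and the function name is hypothetical:<br />
<br />
```python
SUPPORTED_RATES = (8000, 12000, 16000, 24000, 48000)

def choose_decode_rate(hardware_rates):
    """Return (decode_rate, needs_resampling) for the given hardware rates."""
    if 48000 in hardware_rates:
        return 48000, False
    highest = max(hardware_rates)
    if highest in SUPPORTED_RATES:
        return highest, False
    if highest < 48000:
        # Decode at the next higher supported rate, then resample down.
        return min(r for r in SUPPORTED_RATES if r > highest), True
    return 48000, True
```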
<br />
However, the 'input sample rate' field allows the encoder to pass the sample rate of the original input stream as metadata. This may be useful when the user requires the output sample rate to match the input sample rate. For example, a non-player decoder writing PCM format to disk might choose to resample the output audio back to the original input rate to reduce surprise to the user, who might reasonably expect to get back a file with the same sample rate as the one they fed to the encoder.<br />
<br />
A value of zero indicates 'unspecified'. Encoders SHOULD write the actual input rate or zero, but decoder implementations which do something with this field SHOULD take care to behave sanely if given crazy values (e.g. don't <br />
actually upsample the output to 10 MHz if requested).<br />
<br />
* '''Output gain'''<br />
This is a gain to be applied by the decoder. Virtually all players and media frameworks should apply it by default. If a player chooses to apply any volume adjustment or gain modification, such as the R128_TRACK_GAIN or a user-facing volume knob, the adjustment MUST be applied ''in addition'' to this output gain in order to achieve playback at the desired volume.<br />
<br />
An encoder SHOULD set the output gain to zero, and instead apply any gain prior to encoding, when this is possible and does not conflict with the user's wishes. The output gain should only be nonzero when the gain is adjusted after encoding, or when the user wishes to adjust the gain for playback while preserving the ability to recover the original signal amplitude.<br />
<br />
Although the output gain has enormous range (+/- 128 dB, enough to amplify inaudible sounds to the threshold of physical pain), most applications can only reasonably use a small portion of this range around zero. The large range serves in part to ensure that gain can always be losslessly transferred between OpusHead and R128_TRACK_GAIN (see below) without saturating.<br />
<br />
The gain is the 20 log<sub>10</sub> ratio of output to input sample values to be applied to the decoder output. E.g. <code>sample *= pow(10, header.gain/(20.*256))</code> where header.gain is the raw 16 bit Q7.8 value from the header.<br />
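<br />
The same conversion, written as a Python sketch with hypothetical function names and floating-point samples assumed:<br />
<br />
```python
def output_gain_to_linear(raw_gain: int) -> float:
    """Convert the raw signed Q7.8 'output gain' value to a linear factor."""
    if not -32768 <= raw_gain <= 32767:
        raise ValueError("output gain must fit in a signed 16-bit integer")
    # raw_gain / 256 is the gain in dB; dB / 20 is the base-10 exponent.
    return 10.0 ** (raw_gain / (20.0 * 256.0))

def apply_output_gain(samples, raw_gain):
    """Scale a sequence of decoded samples by the header gain."""
    scale = output_gain_to_linear(raw_gain)
    return [s * scale for s in samples]
```
<br />
A raw value of 5120 (+20 dB) corresponds to a factor of 10, and a raw value of 0 leaves the samples unchanged.<br />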
<br />
* '''Channel mapping family'''<br />
This byte indicates the order and semantic meaning of the various channels encoded in each Opus packet. <br />
<br />
Each possible value of this byte indicates a ''mapping family'', which defines a set of allowed numbers of channels, and the ordered set of channel names for each allowed number of channels. Currently there are three defined mapping families, although more may be added:<br />
<br />
* Family 0 (RTP mapping)<br />
** Allowed numbers of channels: 1 or 2<br />
** 1 channel: monophonic (mono)<br />
** 2 channels: stereo (left, right)<br />
** '''Special mapping''': this channel mapping value also indicates that the content consists of a single Opus stream that is stereo if and only if c==2, with stream index 0 mapped to channel 0, and (if stereo) stream index 1 mapped to channel 1. When the channel mapping byte has this value, no further fields are present in OpusHead.<br />
* Family 1 ([http://www.xiph.org/vorbis/doc/Vorbis_I_spec.html#x1-800004.3.9 Vorbis channel order])<br />
** Allowed numbers of channels: 1 ... 8<br />
** Channel meanings depend on the number of channels, see the Vorbis mapping for details.<br />
* Family 255 (no defined channel meaning)<br />
** Allowed numbers of channels: 1...255<br />
** Channels are unidentified. General-purpose players SHOULD NOT attempt to play these streams, and offline decoders MAY deinterleave the output into separate PCM files, one per channel. Decoders SHOULD NOT produce output for channels mapped to stream index 255 (pure silence) unless they have no other way to indicate the index of non-silent channels.<br />
<br />
The remaining channel mapping families (2...254) are reserved. A decoder encountering a reserved mapping byte should act as though the mapping byte is 255.<br />
<br />
An Ogg Opus player MUST play any Ogg Opus stream with a channel mapping family of 0 or 1, even if the number of channels does not match the physically connected audio hardware. Players SHOULD perform channel mixing to increase or reduce the number of channels as needed.<br />
<br />
* '''Stream count''' 'N'<br />
This field indicates the total number of streams so the decoder can correctly parse the packed Opus packets inside the Ogg packet.<br />
<br />
For channel mapping family 0, this value defaults to 1, and is not coded.<br />
<br />
A multi-channel Opus file is composed of one or more individual Opus streams, each of which produces one or two channels of decoded data. Each Ogg packet contains one Opus packet from each stream. The first N-1 Opus packets are packed using the self-delimiting framing from Appendix B of the Opus Specification. The remaining Opus packet is packed using the regular, undelimited framing from Section 3 of the Opus Specification. All the Opus packets in a single Ogg packet MUST be constrained to produce the same number of decoded samples. A decoder SHOULD treat any Opus packet whose duration is different from that of the first Opus packet in an Ogg packet as if it were an Opus packet with an illegal TOC sequence.<br />
<br />
* '''Two-channel stream count''' 'M'<br />
Describes the number of streams whose decoders should be configured to produce two channels. This must be no larger than the number of total streams.<br />
<br />
For channel mapping family 0, this value defaults to c-1 (i.e., 0 for mono and 1 for stereo), and is not coded.<br />
<br />
Each packet in an Opus stream has an internal channel count of 1 or 2, which can change from packet to packet. This is selected by the encoder depending on the bitrate and the contents being encoded. The original channel count of the encoder input is not preserved by the lossy compression.<br />
<br />
Regardless of the internal channel count, any Opus stream may be decoded as mono (a single channel) or stereo (two channels) by appropriate initialization of the decoder. The "two-channel stream count" field indicates that the first M Opus decoders should be initialized in stereo mode, and the remaining N-M decoders should be initialized in mono mode. The total number of decoded channels (M+N) MUST be no larger than 255, as there is no way to index more channels than that in the channel mapping.<br />
<br />
* '''Channel mapping'''<br />
Contains one index per output channel indicating which decoded channel should be used. If the index is less than 2*M, the output MUST be taken from decoding stream (index/2) as stereo and selecting the left channel if index is even, and the right channel if index is odd. If the index is 2*M or larger, the output MUST be taken from decoding stream (index-M) as mono. As a special case, an index of 255 means that the corresponding output channel MUST contain pure silence.<br />
<br />
For channel mapping family 0, the first index defaults to 0, and if c==2, the second index defaults to 1. Neither index is coded.<br />
<br />
The number of output channels (c) is not constrained to match the number of decoded channels (M+N). A single index MAY appear multiple times, i.e., the same decoded channel may be mapped to multiple output channels. Some decoded channels might not be assigned to any output channel, as well.<br />
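<br />
The index rule for the channel mapping can be illustrated with a short sketch. The function name is hypothetical, and the six-entry table in the usage note is only an example input:<br />
<br />
```python
def resolve_channel(index, streams, coupled):
    """Map a channel mapping index to (stream, channel_within_stream).

    streams is N, coupled is M. Returns None for index 255 (pure silence).
    """
    if index == 255:
        return None
    if index >= streams + coupled:
        raise ValueError("index exceeds the number of decoded channels (M+N)")
    if index < 2 * coupled:
        # Stereo stream: even index is left (0), odd index is right (1).
        return index // 2, index % 2
    # Mono stream.
    return index - coupled, 0
```
<br />
With N=4, M=2 and the mapping table 0, 4, 1, 2, 3, 5, the six output channels are taken from (stream 0, left), (stream 2), (stream 0, right), (stream 1, left), (stream 1, right) and (stream 3).<br />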
<br />
==== Comment Header ====<br />
<br />
- 8 byte 'OpusTags' magic signature (64 bits)<br />
- The remaining data follows the vorbis-comment header design used in OggVorbis (without the "framing-bit"), OggTheora, and Speex:<br />
* Vendor string (always present).<br />
** 4-byte little-endian length field, followed by length bytes of UTF-8 vendor string.<br />
* TAG=value metadata strings (zero or more).<br />
** 4-byte little-endian string count.<br />
** Count strings consisting of 4-byte little-endian length and length bytes of UTF-8 string in "tag=value" form.<br />
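<br />
Reading this layout takes only a little code. The following is a rough sketch with a hypothetical function name; it performs only basic length checks and returns the comments as undivided "tag=value" strings:<br />
<br />
```python
import struct

def parse_opus_tags(packet: bytes):
    """Return (vendor_string, list_of_comment_strings) from an OpusTags packet."""
    if packet[:8] != b"OpusTags":
        raise ValueError("not an OpusTags packet")
    pos = 8

    def read_string():
        nonlocal pos
        (length,) = struct.unpack_from("<I", packet, pos)  # 4-byte LE length
        pos += 4
        if pos + length > len(packet):
            raise ValueError("string runs past the end of the packet")
        text = packet[pos:pos + length].decode("utf-8")
        pos += length
        return text

    vendor = read_string()
    (count,) = struct.unpack_from("<I", packet, pos)       # 4-byte LE count
    pos += 4
    comments = [read_string() for _ in range(count)]
    return vendor, comments
```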
<br />
One new comment field is introduced for Ogg Opus:<br />
R128_TRACK_GAIN=-573 <br />
representing the volume shift needed to normalize the track's volume. The gain is a Q7.8 fixed point number in dB, as in the OpusHead "output gain" field. This field is similar to the [[VorbisComment#Replay_Gain|REPLAYGAIN_TRACK_GAIN field in Vorbis]], although the normal volume reference is the [http://tech.ebu.ch/loudness EBU-R128] standard.<br />
<br />
An Ogg Opus file MUST NOT have more than one such field, and if present its value MUST be an integer from -32768 to +32767 inclusive, represented in ASCII with no whitespace. If present, it MUST correctly represent the R128 normalization gain (relative to the OpusHead output gain). If a player chooses to make use of the TRACK_GAIN, it MUST be applied ''in addition'' to the OpusHead output gain. If an encoder populates the TRACK_GAIN field, and the output gain is not otherwise constrained or specified, the encoder SHOULD write the R128 gain into the OpusHead output gain and write "R128_TRACK_GAIN=0". If a tool modifies the OpusHead "output gain" field, it MUST also update or remove the R128_TRACK_GAIN comment field.<br />
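<br />
A player might validate and convert the field as in this sketch. The function name is hypothetical; the tag name is matched case-insensitively, as Vorbis comment field names are.<br />
<br />
```python
import re

def parse_r128_track_gain(comments):
    """Return the R128 track gain in dB, or None if the field is absent.

    comments is a list of "tag=value" strings from the OpusTags packet.
    """
    values = [c.split("=", 1)[1] for c in comments
              if c.upper().startswith("R128_TRACK_GAIN=")]
    if not values:
        return None
    if len(values) > 1:
        raise ValueError("more than one R128_TRACK_GAIN field")
    text = values[0]
    if not re.fullmatch(r"[+-]?[0-9]+", text):
        raise ValueError("R128_TRACK_GAIN is not an ASCII integer")
    raw = int(text)
    if not -32768 <= raw <= 32767:
        raise ValueError("R128_TRACK_GAIN out of range")
    return raw / 256.0  # Q7.8 fixed point to dB
```
<br />
The example value -573 corresponds to about -2.24 dB, which a player would apply in addition to the OpusHead output gain.<br />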
<br />
There is no comment field corresponding to Replaygain's ALBUM_GAIN; that information should instead be stored in the OpusHead 'output gain' field.<br />
<br />
To avoid confusion with multiple normalization schemes, an OpusTags packet SHOULD NOT contain any of the REPLAYGAIN_TRACK_GAIN, REPLAYGAIN_TRACK_PEAK, REPLAYGAIN_ALBUM_GAIN, or REPLAYGAIN_ALBUM_PEAK fields.<br />
<br />
== Other Implementation Notes ==<br />
<br />
When seeking within an Ogg Opus stream, the decoder should start decoding (and discarding the output) at least 3840 samples (80 ms) prior to the seek point in order to ensure that the output audio is correct at the seek point.<br />
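<br />
In granule position terms, the pre-roll could be computed as in this sketch (the names are hypothetical):<br />
<br />
```python
PRE_ROLL_SAMPLES = 3840  # 80 ms at 48 kHz

def decode_start_for_seek(target_granule, first_granule=0):
    """Return the granule position at which decoding should begin."""
    # Do not reach back before the start of the stream.
    return max(first_granule, target_granule - PRE_ROLL_SAMPLES)
```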
<br />
Technically valid Opus packets can be arbitrarily large due to the padding format, although the amount of non-padding data they can contain is bounded. These packets may be spread over a similarly enormous number of Ogg pages. Encoders SHOULD use no more padding than required to make a variable bitrate (VBR) stream constant bitrate (CBR). Decoders SHOULD avoid attempting to allocate excessive amounts of memory when presented with a very large packet. The presence of an extremely large packet in the stream could indicate a potential memory exhaustion attack or stream corruption. Decoders should reject a packet that is too large to process, and print a warning message.<br />
<br />
In an Ogg Opus stream, the largest possible valid packet that does not use padding has a size of (61,298*N - 2) bytes, or about 60 kB per Opus stream. With 255 streams, this is 15,630,988 bytes (14.9 MB) and can span up to 61,298 Ogg pages, all but one of which will have a granulepos of -1. This is of course a very extreme packet, consisting of 255 streams, each containing 120 ms of audio encoded as 2.5 ms frames, each frame using the maximum possible number of bytes (1275) and stored in the least efficient manner allowed (a VBR code 3 Opus packet). Even in such a packet, most of the data will be zeros, as 2.5 ms frames, which are required to run in the MDCT mode, cannot actually use all 1275 bytes. The largest packet consisting entirely of useful data is (15,326*N - 2) bytes, or about 15 kB per stream. This corresponds to 120 ms of audio encoded as 10 ms frames in either LP or Hybrid mode, but at a data rate of over 1 Mbps, which makes little sense for the quality achieved. A more reasonable limit is (7,664*N - 2) bytes, or about 7.5 kB per stream. This corresponds to 120 ms of audio encoded as 20 ms stereo MDCT-mode frames, with a total bitrate just under 511 kbps (not counting the Ogg encapsulation overhead). With N=8, the maximum useful number of streams for the channel meanings currently defined by mapping family 1, this gives a maximum packet size of 61,310 bytes, or just under 60 kB. This is still quite conservative, as it assumes each output channel is taken from one decoded channel of a stereo packet. An implementation could reasonably choose any of these numbers for its internal limits.<br />
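<br />
The figures quoted above can be reproduced with a few lines of arithmetic. This is only a sketch with hypothetical function names; the per-stream byte counts are taken from the paragraph above.<br />
<br />
```python
def max_packet_bytes(streams, bytes_per_stream):
    """Packet size limit for N streams, following the (X*N - 2) form above."""
    if not 1 <= streams <= 255:
        raise ValueError("stream count must be between 1 and 255")
    return bytes_per_stream * streams - 2

def stream_bitrate_bps(bytes_per_stream, duration_ms=120):
    """Bitrate of one stream carrying bytes_per_stream per packet."""
    return bytes_per_stream * 8 * 1000 / duration_ms
```
<br />
For example, <code>max_packet_bytes(255, 61298)</code> gives 15,630,988 and <code>max_packet_bytes(8, 7664)</code> gives 61,310, matching the values in the text.<br />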
<br />
== Test Vectors ==<br />
<br />
* [[OggOpus/testvectors|Planned test vectors for OggOpus]]<br />
* Opus test vectors</div>Derfhttps://wiki.xiph.org/index.php?title=OggOpus&diff=13373OggOpus2012-05-10T17:25:32Z<p>Derf: </p>
<hr />
<div>== Ogg Mapping for Opus ==<br />
<br />
The IETF Opus codec is a low-latency audio codec optimized for both voice and general-purpose audio. See the [http://tools.ietf.org/html/draft-ietf-codec-opus Opus Specification] for technical details.<br />
<br />
Almost everything about Opus is either fixed or dynamically switchable, so most of the usual ID and setup header parameters in the header packets of an Ogg encapsulation aren't needed. In particular, bitrate, packet duration, mono/stereo flags, and coding modes are all dynamically switchable from packet to packet. The first one or two bytes in each data packet, the start of the 'TOC sequence' that defines the layout of the packet, specifies all of these parameters for that particular packet. See Section 3 of the Opus Specification for the exact format of the TOC sequence.<br />
<br />
The remaining parameters that must be signaled are<br />
<br />
* The magic number for stream identification,<br />
* The stream count and coupling for multichannel audio, and<br />
* Any metadata or tags.<br />
<br />
=== Packet Organization ===<br />
<br />
Opus is framed in a continuous logical [http://www.xiph.org/ogg/doc/framing.html Ogg stream]. <br />
<br />
There are two mandatory headers. The granule position of the pages containing these headers MUST be zero.<br />
<br />
The first packet in the logical Ogg stream MUST contain the identification header, which uniquely identifies a stream as Opus audio. It MUST begin with the 8 bytes "OpusHead". It MUST be placed alone in the first page of the logical Ogg stream. This page MUST have the ’beginning of stream’ flag set.<br />
<br />
The second Opus packet MUST contain the comment header. It must begin with the 8 bytes "OpusTags". It MAY span one or more pages, beginning on the second page of the logical stream. However many pages it spans, the comment header packet MUST finish the page on which it ends.<br />
<br />
All subsequent pages are audio data pages and the packets they contain are audio data packets. The first audio page SHOULD NOT have the 'continued packet' flag set (which would indicate the first audio packet is continued from a previous page). Packets MUST be placed into Ogg pages in order until the end of stream. Audio packets MAY span page boundaries. A decoder MUST treat a zero-byte audio packet as if it were an Opus packet with an illegal TOC sequence. The last page SHOULD have the 'end of stream' flag set, but implementations should be prepared to deal with truncated streams which do not have a page marked 'end of stream'. The final packet SHOULD complete on the last page, i.e., the final lacing value should be less than 255. There MUST NOT be any more pages in an Opus logical stream after a page marked 'end of stream'.<br />
<br />
=== Granule Position ===<br />
<br />
The granule position of an audio page encodes the total number of PCM samples in the stream up to and including the last fully-decodable sample from the last packet ''completed'' on that page. A page that is entirely spanned by a single packet (that completes on a subsequent page) has no granule position, and the granule position field MUST be set to the special value ’-1’ in two's complement.<br />
<br />
The granule position of an audio page is in units of PCM audio samples at a fixed rate of 48 kHz (per channel; a stereo stream’s granule position does not increment at twice the speed of a mono stream). It is possible to run a decoder at other sampling rates, but the format and this specification always count samples assuming a 48 kHz decoding rate.<br />
<br />
The duration of an Opus packet may be any multiple of 2.5 ms, up to a maximum of 120 ms. This duration is encoded in the TOC sequence at the beginning of each packet. The number of samples returned by a decoder corresponds to this duration exactly, even for the first few packets. For example, a 20 ms packet fed to a decoder running at 48 kHz will always return 960 samples. A demuxer can parse these TOC sequences to work backwards or forwards from a packet with a known granule position (i.e., the last packet completed on some page) in order to assign granule positions to every packet, or even every individual sample. The one exception is the last page in the stream, as described below.<br />
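<br />
A demuxer could derive the packet duration from the TOC sequence roughly as follows. This sketch follows the TOC layout in Section 3 of the Opus Specification (configuration in the upper five bits, frame count code in the lower two); the function name is hypothetical, and no checks beyond the duration are made.<br />
<br />
```python
def packet_duration_samples(packet: bytes) -> int:
    """Return the duration of an Opus packet in samples at 48 kHz."""
    if not packet:
        raise ValueError("zero-byte packet has no TOC sequence")
    toc = packet[0]
    config = toc >> 3
    if config < 12:        # LP (SILK) modes: 10, 20, 40, 60 ms
        frame = (480, 960, 1920, 2880)[config & 3]
    elif config < 16:      # Hybrid modes: 10, 20 ms
        frame = (480, 960)[config & 1]
    else:                  # MDCT (CELT) modes: 2.5, 5, 10, 20 ms
        frame = (120, 240, 480, 960)[config & 3]
    code = toc & 3
    if code == 0:
        count = 1
    elif code in (1, 2):
        count = 2
    else:
        # Code 3: the frame count is in the low six bits of the second byte.
        if len(packet) < 2:
            raise ValueError("code 3 packet is missing its frame count byte")
        count = packet[1] & 0x3F
    total = frame * count
    if count == 0 or total > 5760:  # at most 120 ms
        raise ValueError("invalid frame count")
    return total
```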
<br />
All other pages with completed packets after the first MUST have a granule position equal to the number of samples contained in packets that complete on that page plus the granule position of the most recent page with completed packets. This guarantees that a demuxer can assign individual packets the same granule position when working forwards as when working backwards. There must not be any gaps. In order to support capturing a stream that uses discontinuous transmission (DTX), an encoder SHOULD emit packets that explicitly request the use of Packet Loss Concealment (PLC) (i.e., with a frame length of 0, as defined in Section 3.2.1 of the Opus Specification) in place of the packets that were not transmitted.<br />
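<br />
The continuity requirement might be verified as in this sketch. The input format is an assumption: one (granule position, samples completed) pair per audio page, in order, with the final 'end of stream' page left out by the caller because it may be trimmed.<br />
<br />
```python
def granules_are_continuous(pages):
    """Check that each granule position advances by the samples on its page."""
    prev = None
    for granule, samples in pages:
        if granule == -1:
            # No packet completes on this page, so it has no granule position.
            continue
        if prev is not None and granule != prev + samples:
            return False
        prev = granule
    return True
```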
<br />
There is some amount of latency introduced during the decoding process, to allow for overlap in the MDCT modes, stereo mixing in the LP modes, and resampling, and the encoder will introduce even more latency (though the exact amount is not specified). Therefore the first few samples produced by the decoder do not correspond to any real, input audio, but are instead composed of padding inserted by the encoder to compensate for this latency. These samples must be stored and decoded, as Opus is an asymptotically convergent predictive codec, meaning the decoded contents of each frame depend on the recent history of decoder inputs. A 'pre-skip' field in the ID header signals the number of samples which should be skipped at the beginning of the stream. This provides sufficient history to the decoder so that it has already converged before the stream's output begins. It may also be used to perform sample-accurate cropping of existing encoded streams. This amount need not be a multiple of 2.5 ms, may be smaller than a single packet, or may span the contents of several packets.<br />
<br />
The PCM sample position is determined from the granule position using the formula<br />
<br />
'PCM sample position' = 'granule position' - 'pre-skip' .<br />
<br />
For example, if the granule position of the first page is 59971, and the pre-skip is 11971, then the PCM sample position of the last decoded sample from the first page is 48000. This may be converted into a playback time using the formula<br />
<br />
'PCM sample position'<br />
'playback time' = --------------------- .<br />
48000.0<br />
<br />
The initial PCM sample position before any samples are played is normally '0'. In this case, the PCM sample position of the first audio sample to be played starts at '1', because it marks the time on the clock ''after'' that sample has been played, and a stream that is exactly one second long has a final PCM sample position of '48000', as in the example here.<br />
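<br />
The two formulas above, as a short sketch with hypothetical function names:<br />
<br />
```python
def pcm_sample_position(granule_position, pre_skip):
    """PCM sample position of the last sample completed on a page."""
    return granule_position - pre_skip

def playback_time_seconds(granule_position, pre_skip):
    """Playback time in seconds; granule positions always count at 48 kHz."""
    return pcm_sample_position(granule_position, pre_skip) / 48000.0
```
<br />
With the values from the example, a granule position of 59971 and a pre-skip of 11971 give a PCM sample position of 48000 and a playback time of 1.0 second.<br />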
<br />
Vorbis streams use a granule position smaller than the number of audio samples contained in the first page to indicate that some of those samples must be trimmed from the output. However, to do so it requires that the first page contains exactly two packets, in order to allow the decoder to perform PCM position adjustments before needing to return any PCM data. Opus uses the pre-skip mechanism for this purpose instead, since the encoder may introduce more than a single packet's worth of latency, and since very large packets in streams with a very large number of channels may not fit on a single page.<br />
<br />
The page with the 'end of stream' flag set MAY have a granule position that indicates the page contains less audio data than would normally be returned by decoding up through the final packet. This is used to end the stream somewhere other than an even frame boundary. The granule position of the most recent audio page with completed packets is used to make this determination, or '0' is used if there were no previous audio pages with a completed packet. The difference between these granule positions indicates how many samples to keep after decoding the packets that completed on the final page. The remaining samples are discarded. The number of discarded samples SHOULD be smaller than the number decoded from the last packet.<br />
<br />
The granule position of the first audio page with a completed packet MAY be larger than the number of samples contained in packets that complete on that page; however, it MUST NOT be smaller, unless that page has the 'end of stream' flag set. Allowing a granule position larger than the number of samples allows the beginning of a stream to be cropped without rewriting the granule position of all the remaining pages. This means that the PCM sample position just before the first sample to be played may be larger than '0', but the PCM sample position relative to '0' should still be used for the purposes of synchronization when multiplexing with other logical streams. This does not affect the behavior of pre-skip: exactly 'pre-skip' samples should be skipped from the beginning of the decoded output, even if the initial PCM sample position is greater than zero.<br />
<br />
On the other hand, a granule position that is smaller than the number of decoded samples prevents a demuxer from working backwards to assign each packet or each individual sample a valid granule position, since granule positions must be non-negative. A decoder MUST reject as invalid any stream where the granule position is smaller than the number of samples contained in packets that complete on the first page with a completed packet, unless that page has the 'end of stream' flag set. It MAY defer this action until it decodes the last packet completed on that page. If that page has the 'end of stream' flag set, a demuxer can work forwards from the granule position '0', but MUST reject as invalid any stream where the granule position is smaller than the 'pre-skip' amount. This would indicate that more samples should be skipped from the initial decoded output than exist in the stream.<br />
<br />
==== ID Header ====<br />
<br />
  0                   1                   2                   3<br />
  0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1<br />
 +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+<br />
 |      'O'      |      'p'      |      'u'      |      's'      |<br />
 +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+<br />
 |      'H'      |      'e'      |      'a'      |      'd'      |<br />
 +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+<br />
 |  version = 1  | channel count |           pre-skip            |<br />
 +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+<br />
 |               original input sample rate in Hz                |<br />
 +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+<br />
 |    output gain Q7.8 in dB     |  channel map  |               |<br />
 +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+               :<br />
 |                                                               |<br />
 :               optional channel mapping table...               :<br />
 |                                                               |<br />
 +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+<br />
<br />
Brief description of each field:<br />
<br />
- Magic signature: "OpusHead" (64 bits)<br />
- Version number (8 bits unsigned): 0x01 for this spec<br />
- Channel count 'c' (8 bits unsigned): MUST be > 0<br />
- Pre-skip (16 bits unsigned, little endian)<br />
- Input sample rate (32 bits unsigned, little endian): informational only<br />
- Output gain (16 bits, little endian, signed Q7.8 in dB) to apply when<br />
decoding<br />
- Channel mapping family (8 bits unsigned)<br />
-- 0 = one stream: mono or L,R stereo<br />
-- 1 = channels in Vorbis spec order: mono or L,R stereo or ... or FL,C,FR,RL,RR,LFE, ...<br />
-- 2..254 = reserved (treat as 255)<br />
-- 255 = no defined channel meaning<br />
If channel mapping family > 0<br />
- Stream count 'N' (8 bits unsigned): MUST be > 0<br />
- Two-channel stream count 'M' (8 bits unsigned): MUST satisfy M <= N, M+N <= 255<br />
- Channel mapping (8*c bits)<br />
-- one stream index (8 bits unsigned) per channel (255 means silent throughout the file)<br />
<br />
<br />
Detailed definition of each field:<br />
<br />
* '''Magic signature'''<br />
The magic signature "OpusHead" allows codec identification and is human readable. Starting with 'Op' helps distinguish it from data packets, as this is an invalid TOC sequence.<br />
<br />
* '''Version'''<br />
The version number must always be '1' for this version of the encapsulation specification. This 8-bit field is partitioned into two sub-fields. The lower four bits constitute a "minor revision", while the upper four bits correspond to a "major revision". I.e., the current major revision is '0', and the current minor revision is '1'. Implementations SHOULD treat streams with an unknown minor revision as backwards-compatible as long as they recognize the major revision. When encountering a stream with an unknown major revision, implementations SHOULD assume it is not backwards compatible. The initial version '1' was chosen to keep implementations from relying on this byte as a null terminator for the OpusHead string.<br />
<br />
* '''Channel count''' 'c'<br />
The number of channels byte specifies the number of output channels (1...255) for this Ogg Opus stream.<br />
<br />
* '''Pre-skip'''<br />
This is the number of samples (at 48 kHz) to discard from the decoder output when starting playback, and also the number to subtract from a page's granule position to calculate its PCM sample position.<br />
<br />
When constructing cropped Ogg Opus streams, we recommend a pre-skip of at least 3840 samples (80 ms) to ensure complete convergence.<br />
<br />
* '''Input sample rate'''<br />
This is ''not'' the sample rate to use for playback of the encoded data.<br />
<br />
Opus has a handful of coding modes, with internal audio bandwidths of 4, 6, 8, 12, and 20 kHz. Each packet in the stream may have a different audio bandwidth. Regardless of the audio bandwidth, the reference decoder supports decoding any stream at a sample rate of 8, 12, 16, 24, or 48 kHz. The original sample rate of the encoder input is not preserved by the lossy compression.<br />
<br />
An Ogg Opus player SHOULD select the playback sample rate according to the following procedure:<br />
* If the hardware supports 48 kHz playback, decode at 48 kHz,<br />
* else if the hardware's highest available sample rate is a supported rate, decode at this sample rate,<br />
* else if the hardware's highest available sample rate is less than 48 kHz, decode at the next higher supported rate and resample,<br />
* else decode at 48 kHz and resample.<br />
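<br />
The selection procedure above can be sketched in C as follows. This is only an illustration: the function name <code>choose_decode_rate</code> and the idea of passing the hardware's rates as an array are assumptions, not part of any Opus API. When the returned rate is not one the hardware supports, the caller is expected to resample.<br />
<br />
```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper: pick the rate (in Hz) to run the Opus decoder at,
 * given the list of sample rates the playback hardware supports. */
static int choose_decode_rate(const int *hw_rates, size_t n)
{
    /* Rates the reference decoder can produce, in ascending order. */
    static const int opus_rates[] = { 8000, 12000, 16000, 24000, 48000 };
    int highest = 0;

    assert(hw_rates != NULL && n > 0);
    for (size_t i = 0; i < n; i++) {
        if (hw_rates[i] == 48000)
            return 48000;               /* step 1: hardware does 48 kHz */
        if (hw_rates[i] > highest)
            highest = hw_rates[i];
    }
    for (size_t i = 0; i < sizeof opus_rates / sizeof opus_rates[0]; i++) {
        /* Steps 2 and 3: an exact match is used directly; otherwise the
         * next higher supported rate is chosen and the caller resamples. */
        if (opus_rates[i] >= highest)
            return opus_rates[i];
    }
    return 48000;                       /* step 4: decode at 48 kHz, resample */
}
```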
<br />
However, the 'input sample rate' field allows the encoder to pass the sample rate of the original input stream as metadata. This may be useful when the user requires the output sample rate to match the input sample rate. For example, a non-player decoder writing PCM format to disk might choose to resample the output audio back to the original input rate to reduce surprise to the user, who might reasonably expect to get back a file with the same sample rate as the one they fed to the encoder.<br />
<br />
A value of zero indicates 'unspecified'. Encoders SHOULD write the actual input rate or zero, but decoder implementations which do something with this field SHOULD take care to behave sanely if given crazy values (e.g. don't actually upsample the output to 10 MHz if requested).<br />
<br />
* '''Output gain'''<br />
This is a gain to be applied by the decoder. Virtually all players and media frameworks should apply it by default. If a player chooses to apply any volume adjustment or gain modification, such as the R128_TRACK_GAIN or a user-facing volume knob, the adjustment MUST be applied ''in addition'' to this output gain in order to achieve playback at the desired volume.<br />
<br />
An encoder SHOULD set the output gain to zero, and instead apply any gain prior to encoding, when this is possible and does not conflict with the user's wishes. The output gain should only be nonzero when the gain is adjusted after encoding, or when the user wishes to adjust the gain for playback while preserving the ability to recover the original signal amplitude.<br />
<br />
Although the output gain has enormous range (+/- 128 dB, enough to amplify inaudible sounds to the threshold of physical pain), most applications can only reasonably use a small portion of this range around zero. The large range serves in part to ensure that gain can always be losslessly transferred between OpusHead and R128_TRACK_GAIN (see below) without saturating.<br />
<br />
The gain is the 20 log<sub>10</sub> ratio of output to input sample values to be applied to the decoder output. E.g. <code>sample *= pow(10, header.gain/(20.*256))</code> where header.gain is the raw 16 bit Q7.8 value from the header.<br />
<br />
* '''Channel mapping family'''<br />
This byte indicates the order and semantic meaning of the various channels encoded in each Opus packet. <br />
<br />
Each possible value of this byte indicates a ''mapping family'', which defines a set of allowed numbers of channels, and the ordered set of channel names for each allowed number of channels. Currently there are three defined mapping families, although more may be added:<br />
<br />
* Family 0 (RTP mapping)<br />
** Allowed numbers of channels: 1 or 2<br />
** 1 channel: monophonic (mono)<br />
** 2 channels: stereo (left, right)<br />
** '''Special mapping''': this channel mapping value also indicates that the content consists of a single Opus stream that is stereo if and only if c==2, with stream index 0 mapped to channel 0, and (if stereo) stream index 1 mapped to channel 1. When the channel mapping byte has this value, no further fields are present in OpusHead.<br />
* Family 1 ([http://www.xiph.org/vorbis/doc/Vorbis_I_spec.html#x1-800004.3.9 Vorbis channel order])<br />
** Allowed numbers of channels: 1 ... 8<br />
** Channel meanings depend on the number of channels, see the Vorbis mapping for details.<br />
* Family 255 (no defined channel meaning)<br />
** Allowed numbers of channels: 1...255<br />
** Channels are unidentified. General-purpose players SHOULD NOT attempt to play these streams, and offline decoders MAY deinterleave the output into separate PCM files, one per channel. Decoders SHOULD NOT produce output for channels mapped to stream index 255 (pure silence) unless they have no other way to indicate the index of non-silent channels.<br />
<br />
The remaining channel mapping families (2...254) are reserved. A decoder encountering a reserved mapping byte should act as though the mapping byte is 255.<br />
<br />
An Ogg Opus player MUST play any Ogg Opus stream with a channel mapping family of 0 or 1, even if the number of channels does not match the physically connected audio hardware. Players SHOULD perform channel mixing to increase or reduce the number of channels as needed.<br />
<br />
* '''Stream count''' 'N'<br />
This field indicates the total number of streams so the decoder can correctly parse the packed Opus packets inside the Ogg packet.<br />
<br />
For channel mapping family 0, this value defaults to 1, and is not coded.<br />
<br />
A multi-channel Opus file is composed of one or more individual Opus streams, each of which produces one or two channels of decoded data. Each Ogg packet contains one Opus packet from each stream. The first N-1 Opus packets are packed using the self-delimiting framing from Appendix B of the Opus Specification. The remaining Opus packet is packed using the regular, undelimited framing from Section 3 of the Opus Specification. All the Opus packets in a single Ogg packet MUST be constrained to produce the same number of decoded samples. A decoder SHOULD treat any Opus packet whose duration is different from that of the first Opus packet in an Ogg packet as if it were an Opus packet with an illegal TOC sequence.<br />
<br />
* '''Two-channel stream count''' 'M'<br />
Describes the number of streams whose decoders should be configured to produce two channels. This must be no larger than the number of total streams.<br />
<br />
For channel mapping family 0, this value defaults to c-1 (i.e., 0 for mono and 1 for stereo), and is not coded.<br />
<br />
Each packet in an Opus stream has an internal channel count of 1 or 2, which can change from packet to packet. This is selected by the encoder depending on the bitrate and the contents being encoded. The original channel count of the encoder input is not preserved by the lossy compression.<br />
<br />
Regardless of the internal channel count, any Opus stream may be decoded as mono (a single channel) or stereo (two channels) by appropriate initialization of the decoder. The "two-channel stream count" field indicates that the first M Opus decoders should be initialized in stereo mode, and the remaining N-M decoders should be initialized in mono mode. The total number of decoded channels (M+N) MUST be no larger than 255, as there is no way to index more channels than that in the channel mapping.<br />
<br />
* '''Channel mapping'''<br />
Contains one index per output channel indicating which decoded channel should be used. If the index is less than 2*M, the output MUST be taken from decoding stream (index/2) as stereo and selecting the left channel if index is even, and the right channel if index is odd. If the index is 2*M or larger, the output MUST be taken from decoding stream (index-M) as mono. As a special case, an index of 255 means that the corresponding output channel MUST contain pure silence.<br />
<br />
For channel mapping family 0, the first index defaults to 0, and if c==2, the second index defaults to 1. Neither index is coded.<br />
<br />
The number of output channels (c) is not constrained to match the number of decoded channels (M+N). A single index MAY appear multiple times, i.e., the same decoded channel may be mapped to multiple output channels. Likewise, some decoded channels might not be assigned to any output channel.<br />
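<br />
As a rough sketch of the index rule described above, the following C function resolves one channel mapping entry to a stream and a channel within that stream. The type <code>decoded_source</code> and the function name are made up for this example.<br />
<br />
```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical result type: where one output channel takes its data from. */
typedef struct {
    int stream;   /* index of the Opus stream, or -1 for silence */
    int channel;  /* 0 = mono or left, 1 = right */
    int silent;   /* nonzero if the output channel is pure silence */
} decoded_source;

/* index: one entry of the channel mapping table.
 * coupled: the two-channel stream count 'M'. */
static decoded_source resolve_mapping_index(uint8_t index, int coupled)
{
    decoded_source src = { -1, 0, 0 };

    assert(coupled >= 0 && coupled <= 255);
    if (index == 255) {
        src.silent = 1;                 /* special case: pure silence */
    } else if (index < 2 * coupled) {
        src.stream = index / 2;         /* one of the first M stereo streams */
        src.channel = index & 1;        /* even = left, odd = right */
    } else {
        src.stream = index - coupled;   /* one of the N-M mono streams */
    }
    return src;
}
```
<br />
For instance, with M=2, index 3 selects the right channel of stream 1, while index 4 selects the mono stream 2.<br />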
<br />
==== Comment Header ====<br />
<br />
- 8 byte 'OpusTags' magic signature (64 bits)<br />
- The remaining data follows the vorbis-comment header design used in OggVorbis (without the "framing-bit"), OggTheora, and Speex:<br />
* Vendor string (always present).<br />
** 4-byte little-endian length field, followed by length bytes of UTF-8 vendor string.<br />
* TAG=value metadata strings (zero or more).<br />
** 4-byte little-endian string count.<br />
** Count strings consisting of 4-byte little-endian length and length bytes of UTF-8 string in "tag=value" form.<br />
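<br />
The layout above can be walked with a few bounds checks. The following C sketch only validates the structure and counts the comment strings; the names <code>read_le32</code> and <code>count_opus_tags</code> are assumptions for this example, and it does not check that the strings are valid UTF-8 or contain an '='.<br />
<br />
```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Read a 32-bit little-endian unsigned integer. */
static uint32_t read_le32(const unsigned char *p)
{
    return (uint32_t)p[0] | (uint32_t)p[1] << 8 |
           (uint32_t)p[2] << 16 | (uint32_t)p[3] << 24;
}

/* Hypothetical helper: returns the number of comment strings in an
 * OpusTags packet, or -1 if the packet is malformed or truncated. */
static long count_opus_tags(const unsigned char *pkt, size_t len)
{
    size_t pos = 8;
    uint32_t n, count, i;

    assert(pkt != NULL);
    /* Smallest packet: magic + empty vendor string + zero comments. */
    if (len < 16 || memcmp(pkt, "OpusTags", 8) != 0)
        return -1;
    n = read_le32(pkt + pos);           /* vendor string length */
    pos += 4;
    if (n > len - pos)
        return -1;
    pos += n;                           /* skip the vendor string */
    if (len - pos < 4)
        return -1;
    count = read_le32(pkt + pos);       /* number of comment strings */
    pos += 4;
    for (i = 0; i < count; i++) {
        if (len - pos < 4)
            return -1;
        n = read_le32(pkt + pos);       /* length of this "tag=value" */
        pos += 4;
        if (n > len - pos)
            return -1;
        pos += n;
    }
    return (long)count;
}
```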
<br />
One new comment field is introduced for Ogg Opus:<br />
R128_TRACK_GAIN=-573 <br />
representing the volume shift needed to normalize the track's volume. The gain is a Q7.8 fixed point number in dB, as in the OpusHead "output gain" field. This field is similar to the [[VorbisComment#Replay_Gain|REPLAYGAIN_TRACK_GAIN field in Vorbis]], although the normal volume reference is the [http://tech.ebu.ch/loudness EBU-R128] standard.<br />
<br />
An Ogg Opus file MUST NOT have more than one such field, and if present its value MUST be an integer from -32768 to +32767 inclusive, represented in ASCII with no whitespace. If present, it MUST correctly represent the R128 normalization gain (relative to the OpusHead output gain). If a player chooses to make use of the TRACK_GAIN, it MUST be applied ''in addition'' to the OpusHead output gain. If an encoder populates the TRACK_GAIN field, and the output gain is not otherwise constrained or specified, the encoder SHOULD write the R128 gain into the OpusHead output gain and write "R128_TRACK_GAIN=0". If a tool modifies the OpusHead "output gain" field, it MUST also update or remove the R128_TRACK_GAIN comment field.<br />
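<br />
A reader of this field needs to reject values that are out of range or contain whitespace. The sketch below shows one way to do that in C. The function name and the <code>R128_GAIN_INVALID</code> sentinel are invented for this example, and accepting a leading '+' is an assumption, since the text above does not say whether an explicit plus sign is allowed.<br />
<br />
```c
#include <assert.h>
#include <ctype.h>
#include <errno.h>
#include <limits.h>
#include <stdlib.h>

/* Hypothetical sentinel; lies outside the valid range when int is 32 bits. */
#define R128_GAIN_INVALID INT_MIN

/* value: the text after "R128_TRACK_GAIN=" as a NUL-terminated string.
 * Returns the Q7.8 gain, or R128_GAIN_INVALID if the value is malformed. */
static int parse_r128_track_gain(const char *value)
{
    char *end;
    long v;

    assert(value != NULL);
    /* strtol() skips leading whitespace, which the field forbids,
     * so check the first character ourselves. */
    if (value[0] != '-' && value[0] != '+' &&
        !isdigit((unsigned char)value[0]))
        return R128_GAIN_INVALID;
    errno = 0;
    v = strtol(value, &end, 10);
    /* Reject empty numbers, trailing characters, and overflow. */
    if (end == value || *end != '\0' || errno == ERANGE)
        return R128_GAIN_INVALID;
    if (v < -32768 || v > 32767)
        return R128_GAIN_INVALID;
    return (int)v;
}
```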
<br />
There is no comment field corresponding to Replaygain's ALBUM_GAIN; that information should instead be stored in the OpusHead 'output gain' field.<br />
<br />
To avoid confusion with multiple normalization schemes, an OpusTags packet SHOULD NOT contain any of the REPLAYGAIN_TRACK_GAIN, REPLAYGAIN_TRACK_PEAK, REPLAYGAIN_ALBUM_GAIN, or REPLAYGAIN_ALBUM_PEAK fields.<br />
<br />
== Other Implementation Notes ==<br />
<br />
When seeking within an Ogg Opus stream, the decoder should start decoding (and discarding the output) at least 3840 samples (80 ms) prior to the seek point in order to ensure that the output audio is correct at the seek point.<br />
<br />
Technically valid Opus packets can be arbitrarily large due to the padding format, although the amount of non-padding data they can contain is bounded. These packets may be spread over a similarly enormous number of Ogg pages. Encoders SHOULD use no more padding than required to make a variable bitrate (VBR) stream constant bitrate (CBR). Decoders SHOULD avoid attempting to allocate excessive amounts of memory when presented with a very large packet. The presence of an extremely large packet in the stream could indicate a potential memory exhaustion attack or stream corruption. Decoders should reject a packet that is too large to process, and print a warning message.<br />
<br />
In an Ogg Opus stream, the largest possible valid packet that does not use padding has a size of (61,298*N - 2) bytes, or about 60 kB per Opus stream. With 255 streams, this is 15,630,988 bytes (14.9 MB) and can span up to 61,298 Ogg pages, all but one of which will have a granulepos of -1. This is of course a very extreme packet, consisting of 255 streams, each containing 120 ms of audio encoded as 2.5 ms frames, each frame using the maximum possible number of bytes (1275) and stored in the least efficient manner allowed (a VBR code 3 Opus packet). Even in such a packet, most of the data will be zeros, as 2.5 ms frames, which are required to run in the MDCT mode, cannot actually use all 1275 bytes. The largest packet consisting entirely of useful data is (15,326*N - 2) bytes, or about 15 kB per stream. This corresponds to 120 ms of audio encoded as 10 ms frames in either LP or Hybrid mode, but at a data rate of over 1 Mbps, which makes little sense for the quality achieved. A more reasonable limit is (7,664*N - 2) bytes, or about 7.5 kB per stream. This corresponds to 120 ms of audio encoded as 20 ms stereo MDCT-mode frames, with a total bitrate just under 511 kbps (not counting the Ogg encapsulation overhead). With N=8, the maximum useful number of streams for the channel meanings currently defined by mapping family 1, this gives a maximum packet size of 61,310 bytes, or just under 60 kB. This is still quite conservative, as it assumes each output channel is taken from one decoded channel of a stereo packet. An implementation could reasonably choose any of these numbers for its internal limits.<br />
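<br />
The per-stream figures above translate directly into a size check. The following C sketch assumes an implementation picks one of the three per-stream limits as its internal bound; the enum and function names are hypothetical.<br />
<br />
```c
#include <assert.h>

/* Per-stream byte limits quoted above (names are made up for this sketch). */
enum {
    MAX_BYTES_PER_STREAM_ANY    = 61298, /* largest valid packet without padding */
    MAX_BYTES_PER_STREAM_USEFUL = 15326, /* largest packet of entirely useful data */
    MAX_BYTES_PER_STREAM_SANE   = 7664   /* 120 ms of 20 ms stereo MDCT frames */
};

/* Largest Ogg packet for N streams: (per_stream * N - 2) bytes. */
static long max_ogg_opus_packet_bytes(long per_stream, int streams)
{
    assert(streams >= 1 && streams <= 255);
    return per_stream * streams - 2;
}

/* Nonzero if a packet of packet_bytes is within the chosen limit. */
static int packet_size_ok(unsigned long packet_bytes, long per_stream,
                          int streams)
{
    return packet_bytes <=
           (unsigned long)max_ogg_opus_packet_bytes(per_stream, streams);
}
```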
<br />
== Test Vectors ==<br />
<br />
* [[OggOpus/testvectors|Planned test vectors for OggOpus]]<br />
* Opus test vectors</div>Derfhttps://wiki.xiph.org/index.php?title=OggOpus&diff=13370OggOpus2012-05-10T17:15:21Z<p>Derf: </p>
<hr />
<div>== Ogg Mapping for Opus ==<br />
<br />
The IETF Opus codec is a low-latency audio codec optimized for both voice and general-purpose audio. See the [http://tools.ietf.org/html/draft-ietf-codec-opus Opus Specification] for technical details.<br />
<br />
Almost everything about Opus is either fixed or dynamically switchable, so most of the usual ID and setup header parameters in the header packets of an Ogg encapsulation aren't needed. In particular, bitrate, packet duration, mono/stereo flags, and coding modes are all dynamically switchable from packet to packet. The first one or two bytes in each data packet, the start of the 'TOC sequence' that defines the layout of the packet, specify all of these parameters for that particular packet. See Section 3 of the Opus Specification for the exact format of the TOC sequence.<br />
<br />
The remaining parameters that must be signaled are<br />
<br />
* The magic number for stream identification,<br />
* The stream count and coupling for multichannel audio, and<br />
* Any metadata or tags.<br />
<br />
=== Packet Organization ===<br />
<br />
Opus is framed in a continuous logical [http://www.xiph.org/ogg/doc/framing.html Ogg stream]. <br />
<br />
There are two mandatory headers. The granule position of the pages containing these headers MUST be zero.<br />
<br />
The first packet in the logical Ogg stream MUST contain the identification header, which uniquely identifies a stream as Opus audio. It MUST begin with the 8 bytes "OpusHead". It MUST be placed alone in the first page of the logical Ogg stream. This page MUST have the ’beginning of stream’ flag set.<br />
<br />
The second Opus packet MUST contain the comment header. It MUST begin with the 8 bytes "OpusTags". It MAY span one or more pages, beginning on the second page of the logical stream. However many pages it spans, the comment header packet MUST finish the page on which it ends.<br />
<br />
All subsequent pages are audio data pages and the packets they contain are audio data packets. The first audio page SHOULD NOT have the 'continued packet' flag set (which would indicate the first packet is continued from a previous page). Packets MUST be placed into Ogg pages in order until the end of stream. Audio packets MAY span page boundaries. A decoder MUST treat a zero-byte audio packet as if it were an Opus packet with an illegal TOC sequence. The last page SHOULD have the 'end of stream' flag set, but implementations should be prepared to deal with truncated streams which do not have a page marked 'end of stream'. The final packet SHOULD complete on the last page, i.e., the final lacing value should be less than 255. There MUST NOT be any more pages in an Opus logical stream after a page marked 'end of stream'.<br />
<br />
=== Granule Position ===<br />
<br />
The granule position of an audio page encodes the total number of PCM samples in the stream up to and including the last fully-decodable sample from the last packet ''completed'' on that page. A page that is entirely spanned by a single packet (that completes on a subsequent page) has no granule position, and the granule position field MUST be set to the special value ’-1’ in two's complement.<br />
<br />
The granule position of an audio page is in units of PCM audio samples at a fixed rate of 48 kHz (per channel; a stereo stream’s granule position does not increment at twice the speed of a mono stream). It is possible to run a decoder at other sampling rates, but the format and this specification always count samples assuming a 48 kHz decoding rate.<br />
<br />
The duration of an Opus packet may be any multiple of 2.5 ms, up to a maximum of 120 ms. This duration is encoded in the TOC sequence at the beginning of each packet. The number of samples returned by a decoder corresponds to this duration exactly, even for the first few packets. For example, a 20 ms packet fed to a decoder running at 48 kHz will always return 960 samples. A demuxer can parse these TOC sequences to work backwards or forwards from a packet with a known granule position (i.e., the last packet completed on some page) in order to assign granule positions to every packet, or even every individual sample. The one exception is the last page in the stream, as described below.<br />
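<br />
A demuxer can obtain the packet duration without running a decoder. The C sketch below follows the TOC byte layout from Section 3.1 of the Opus Specification (the top five bits select the configuration, the low two bits the frame count code), which is not spelled out on this page. The function names are hypothetical, and the code only reads the first two bytes rather than validating the whole packet.<br />
<br />
```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Samples per frame at 48 kHz, derived from the TOC byte. */
static int toc_samples_per_frame(uint8_t toc)
{
    if (toc & 0x80) {
        /* MDCT-only configurations: 2.5, 5, 10 or 20 ms. */
        return (48000 << ((toc >> 3) & 0x3)) / 400;
    } else if ((toc & 0x60) == 0x60) {
        /* Hybrid configurations: 10 or 20 ms. */
        return (toc & 0x08) ? 960 : 480;
    } else {
        /* LP configurations: 10, 20, 40 or 60 ms. */
        int size = (toc >> 3) & 0x3;
        return size == 3 ? 2880 : (48000 << size) / 100;
    }
}

/* Total samples (at 48 kHz) in one Opus packet, or -1 if the packet is
 * empty, truncated, or longer than 120 ms. */
static int packet_samples(const uint8_t *pkt, size_t len)
{
    int frames, samples;

    assert(pkt != NULL || len == 0);
    if (len < 1)
        return -1;
    switch (pkt[0] & 0x3) {
    case 0:  frames = 1; break;         /* code 0: one frame */
    case 1:
    case 2:  frames = 2; break;         /* codes 1 and 2: two frames */
    default:                            /* code 3: count in second byte */
        if (len < 2)
            return -1;
        frames = pkt[1] & 0x3F;
        break;
    }
    samples = frames * toc_samples_per_frame(pkt[0]);
    if (frames == 0 || samples > 5760)  /* 5760 samples = 120 ms */
        return -1;
    return samples;
}
```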
<br />
All other pages with completed packets after the first MUST have a granule position equal to the number of samples contained in packets that complete on that page plus the granule position of the most recent page with completed packets. This guarantees that a demuxer can assign individual packets the same granule position when working forwards as when working backwards. There must not be any gaps. In order to support capturing a stream that uses discontinuous transmission (DTX), an encoder SHOULD emit packets that explicitly request the use of Packet Loss Concealment (PLC) (i.e., with a frame length of 0, as defined in Section 3.2.1 of the Opus Specification) in place of the packets that were not transmitted.<br />
<br />
There is some amount of latency introduced during the decoding process, to allow for overlap in the MDCT modes, stereo mixing in the LP modes, and resampling, and the encoder will introduce even more latency (though the exact amount is not specified). Therefore the first few samples produced by the decoder do not correspond to any real, input audio, but are instead composed of padding inserted by the encoder to compensate for this latency. These samples must be stored and decoded, as Opus is an asymptotically convergent predictive codec, meaning the decoded contents of each frame depend on the recent history of decoder inputs. A 'pre-skip' field in the ID header signals the number of samples which should be skipped at the beginning of the stream. This provides sufficient history to the decoder so that it has already converged before the stream's output begins. It may also be used to perform sample-accurate cropping of existing encoded streams. This amount need not be a multiple of 2.5 ms, may be smaller than a single packet, or may span the contents of several packets.<br />
<br />
The PCM sample position is determined from the granule position using the formula<br />
<br />
'PCM sample position' = 'granule position' - 'pre-skip' .<br />
<br />
For example, if the granule position of the first page is 59971, and the pre-skip is 11971, then the PCM sample position of the last decoded sample from the first page is 48000. This may be converted into a playback time using the formula<br />
<br />
                    'PCM sample position'<br />
 'playback time' = --------------------- .<br />
                           48000.0<br />
<br />
The initial PCM sample position before any samples are played is normally '0'. In this case, the PCM sample position of the first audio sample to be played starts at '1', because it marks the time on the clock ''after'' that sample has been played, and a stream that is exactly one second long has a final PCM sample position of '48000', as in the example here.<br />
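<br />
The two formulas above are simple enough to express directly in C. This is a minimal sketch with made-up function names; it returns -1 when the granule position is the special 'no position' value or is smaller than the pre-skip.<br />
<br />
```c
#include <assert.h>
#include <stdint.h>

/* 'PCM sample position' = 'granule position' - 'pre-skip'.
 * Returns -1 if the page has no usable granule position. */
static int64_t pcm_sample_position(int64_t granule_position, uint16_t pre_skip)
{
    if (granule_position < 0 || granule_position < pre_skip)
        return -1;
    return granule_position - pre_skip;
}

/* 'playback time' in seconds; granule positions always count 48 kHz samples. */
static double playback_time_seconds(int64_t pcm_position)
{
    assert(pcm_position >= 0);
    return pcm_position / 48000.0;
}
```
<br />
With the example values above, a granule position of 59971 and a pre-skip of 11971 give a PCM sample position of 48000, i.e. a playback time of one second.<br />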
<br />
Vorbis streams use a granule position smaller than the number of audio samples contained in the first page to indicate that some of those samples must be trimmed from the output. However, to do so it requires that the first page contains exactly two packets, in order to allow the decoder to perform PCM position adjustments before needing to return any PCM data. Opus uses the pre-skip mechanism for this purpose instead, since the encoder may introduce more than a single packet's worth of latency, and since very large packets in streams with a very large number of channels may not fit on a single page.<br />
<br />
The page with the 'end of stream' flag set MAY have a granule position that indicates the page contains less audio data than would normally be returned by decoding up through the final packet. This is used to end the stream somewhere other than an even frame boundary. The granule position of the most recent audio page with completed packets is used to make this determination, or '0' is used if there were no previous audio pages with a completed packet. The difference between these granule positions indicates how many samples to keep after decoding the packets that completed on the final page. The remaining samples are discarded. The number of discarded samples SHOULD be smaller than the number decoded from the last packet.<br />
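<br />
The end trimming described above amounts to a subtraction of granule positions. The following C sketch uses a hypothetical function name; clamping the result to the number of decoded samples is an assumption made here for robustness, not something this page requires.<br />
<br />
```c
#include <assert.h>
#include <stdint.h>

/* prev_granule: granule position of the most recent audio page with a
 *               completed packet, or 0 if there was none.
 * final_granule: granule position of the 'end of stream' page.
 * decoded_samples: samples decoded from packets completing on that page.
 * Returns how many of those samples to keep, or -1 if the granule
 * position goes backwards. */
static int64_t final_page_samples_to_keep(int64_t prev_granule,
                                          int64_t final_granule,
                                          int64_t decoded_samples)
{
    int64_t keep;

    assert(prev_granule >= 0 && decoded_samples >= 0);
    keep = final_granule - prev_granule;
    if (keep < 0)
        return -1;
    /* Assumption: never keep more samples than were actually decoded. */
    return keep < decoded_samples ? keep : decoded_samples;
}
```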
<br />
The granule position of the first audio page with a completed packet MAY be larger than the number of samples contained in packets that complete on that page; however, it MUST NOT be smaller, unless that page has the 'end of stream' flag set. Allowing a granule position larger than the number of samples allows the beginning of a stream to be cropped without rewriting the granule position of all the remaining pages. This means that the PCM sample position just before the first sample to be played may be larger than '0', but the PCM sample position relative to '0' should still be used for the purposes of synchronization when multiplexing with other logical streams. This does not affect the behavior of pre-skip: exactly 'pre-skip' samples should be skipped from the beginning of the decoded output, even if the initial PCM sample position is greater than zero.<br />
<br />
On the other hand, a granule position that is smaller than the number of decoded samples prevents a demuxer from working backwards to assign each packet or each individual sample a valid granule position, since granule positions must be non-negative. A decoder MUST reject as invalid any stream where the granule position is smaller than the number of samples contained in packets that complete on the first page with a completed packet, unless that page has the 'end of stream' flag set. It MAY defer this action until it decodes the last packet completed on that page. If that page has the 'end of stream' flag set, a demuxer can work forwards from the granule position '0', but MUST reject as invalid any stream where the granule position is smaller than the 'pre-skip' amount. This would indicate that more samples should be skipped from the initial decoded output than exist in the stream.<br />
<br />
==== ID Header ====<br />
<br />
  0                   1                   2                   3<br />
  0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1<br />
 +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+<br />
 |      'O'      |      'p'      |      'u'      |      's'      |<br />
 +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+<br />
 |      'H'      |      'e'      |      'a'      |      'd'      |<br />
 +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+<br />
 |  version = 1  | channel count |           pre-skip            |<br />
 +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+<br />
 |               original input sample rate in Hz                |<br />
 +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+<br />
 |    output gain Q7.8 in dB     |  channel map  |               |<br />
 +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+               :<br />
 |                                                               |<br />
 :               optional channel mapping table...               :<br />
 |                                                               |<br />
 +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+<br />
<br />
<br />
- Magic signature: "OpusHead" (64 bits)<br />
- Version number (8 bits unsigned): '1' for this spec<br />
- Channel count 'c' (8 bits unsigned): MUST be > 0<br />
- Pre-skip (16 bits unsigned, little endian)<br />
- Input sample rate (32 bits unsigned, little endian): informational only<br />
- Output gain (16 bits, little endian, signed Q7.8 in dB) to apply when decoding<br />
- Channel mapping family (8 bits unsigned)<br />
-- 0 = one stream: mono or L,R stereo<br />
-- 1 = channels in vorbis spec order: mono or L,R stereo or ... or FL,C,FR,RL,RR,LFE, ...<br />
-- 2..254 = reserved (treat as 255)<br />
-- 255 = no defined channel meaning<br />
If channel mapping family > 0<br />
- Stream count 'N' (8 bits unsigned): MUST be > 0<br />
- Two-channel stream count 'M' (8 bits unsigned): MUST satisfy M <= N, M+N <= 255<br />
- Channel mapping (8*c bits)<br />
-- one stream index (8 bits unsigned) per channel (255 means silent throughout the file)<br />
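<br />
The field list above maps onto a small parser. The following C sketch is one possible reading of it, not a reference implementation: the struct <code>opus_head</code> and the function <code>parse_opus_head</code> are names invented for this example, the version byte is stored but not validated, and the channel-count limits of mapping family 1 are not enforced.<br />
<br />
```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical in-memory form of the ID header. */
typedef struct {
    uint8_t  version;
    uint8_t  channels;      /* 'c' */
    uint16_t pre_skip;      /* samples at 48 kHz */
    uint32_t input_rate;    /* informational only, 0 = unspecified */
    int16_t  gain_q8;       /* output gain, Q7.8 in dB */
    uint8_t  family;        /* channel mapping family */
    uint8_t  streams;       /* 'N' */
    uint8_t  coupled;       /* 'M' */
    uint8_t  mapping[255];  /* one stream index per output channel */
} opus_head;

/* Returns 0 on success, -1 if the packet is not a usable OpusHead. */
static int parse_opus_head(const unsigned char *p, size_t len, opus_head *h)
{
    assert(p != NULL && h != NULL);
    if (len < 19 || memcmp(p, "OpusHead", 8) != 0)
        return -1;
    h->version = p[8];
    h->channels = p[9];
    if (h->channels == 0)
        return -1;                          /* c MUST be > 0 */
    h->pre_skip = (uint16_t)(p[10] | p[11] << 8);
    h->input_rate = (uint32_t)p[12] | (uint32_t)p[13] << 8 |
                    (uint32_t)p[14] << 16 | (uint32_t)p[15] << 24;
    /* Assumes two's complement conversion for the signed gain. */
    h->gain_q8 = (int16_t)(uint16_t)(p[16] | p[17] << 8);
    h->family = p[18];
    if (h->family == 0) {
        /* Family 0: N, M and the mapping are implied, not coded. */
        if (h->channels > 2)
            return -1;
        h->streams = 1;
        h->coupled = (uint8_t)(h->channels - 1);
        h->mapping[0] = 0;
        h->mapping[1] = 1;
        return 0;
    }
    if (len < 21 + (size_t)h->channels)
        return -1;
    h->streams = p[19];
    h->coupled = p[20];
    if (h->streams == 0 || h->coupled > h->streams ||
        h->streams + h->coupled > 255)
        return -1;                          /* N > 0, M <= N, M+N <= 255 */
    memcpy(h->mapping, p + 21, h->channels);
    return 0;
}
```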
<br />
<br />
Some discussion is in order.<br />
<br />
* '''Magic signature'''<br />
The magic signature "OpusHead" allows codec identification and is human readable. Starting with 'Op' helps distinguish it from data packets, as this is an invalid TOC sequence.<br />
<br />
* '''Version'''<br />
The version number must always be '1' for this version of the encapsulation specification. This 8-bit field is partitioned into two sub-fields. The lower four bits constitute a "minor revision", while the upper four bits correspond to a "major revision". Implementations SHOULD treat streams with an unknown minor revision as backwards-compatible as long as they recognize the major revision. When encountering a stream with an unknown major revision, implementations SHOULD assume it is not backwards compatible. The initial version '1' was chosen to keep implementations from relying on this byte as a null terminator for the OpusHead string.<br />
<br />
* '''Channel count''' 'c'<br />
This byte specifies the number of output channels (1...255) for this Ogg Opus stream.<br />
<br />
* '''Pre-skip'''<br />
This is the number of samples (at 48 kHz) to discard from the decoder output when starting playback, and also the number to subtract from a page's granule position to calculate its PCM sample position.<br />
<br />
When constructing cropped Ogg Opus streams, we recommend a pre-skip of at least 3840 samples (80 ms) to ensure complete convergence.<br />
<br />
* '''Input sample rate'''<br />
This is ''not'' the sample rate to use for playback of the encoded data.<br />
<br />
Opus has a handful of coding modes, with internal audio bandwidths of 4, 6, 8, 12, and 20 kHz. Each packet in the stream may have a different audio bandwidth. Regardless of the audio bandwidth, the reference decoder supports decoding any stream at a sample rate of 8, 12, 16, 24, or 48 kHz. The original sample rate of the encoder input is not preserved by the lossy compression.<br />
<br />
An Ogg Opus player SHOULD select the playback sample rate according to the following procedure:<br />
* If the hardware supports 48 kHz playback, decode at 48 kHz,<br />
* else if the hardware's highest available sample rate is a supported rate, decode at this sample rate,<br />
* else if the hardware's highest available sample rate is less than 48 kHz, decode at the next higher supported rate and resample,<br />
* else decode at 48 kHz and resample.<br />
<br />
However, the 'input sample rate' field allows the encoder to pass the sample rate of the original input stream as metadata. This may be useful when the user requires the output sample rate to match the input sample rate. For example, a non-player decoder writing PCM format to disk might choose to resample the output audio back to the original input rate to reduce surprise to the user, who might reasonably expect to get back a file with the same sample rate as the one they fed to the encoder.<br />
<br />
A value of zero indicates 'unspecified'. Encoders SHOULD write the actual input rate or zero, but decoder implementations which do something with this field SHOULD take care to behave sanely if given crazy values (e.g. don't actually upsample the output to 10 MHz if requested).<br />
<br />
* '''Output gain'''<br />
This is a gain to be applied by the decoder. Virtually all players and media frameworks should apply it by default. If a player chooses to apply any volume adjustment or gain modification, such as the R128_TRACK_GAIN or a user-facing volume knob, the adjustment MUST be applied ''in addition'' to this output gain in order to achieve playback at the desired volume.<br />
<br />
An encoder SHOULD set the output gain to zero, and instead apply any gain prior to encoding, when this is possible and does not conflict with the user's wishes. The output gain should only be nonzero when the gain is adjusted after encoding, or when the user wishes to adjust the gain for playback while preserving the ability to recover the original signal amplitude.<br />
<br />
Although the output gain has enormous range (+/- 128 dB, enough to amplify inaudible sounds to the threshold of physical pain), most applications can only reasonably use a small portion of this range around zero. The large range serves in part to ensure that gain can always be losslessly transferred between OpusHead and R128_TRACK_GAIN (see below) without saturating.<br />
<br />
The gain is the 20 log<sub>10</sub> ratio of output to input sample values to be applied to the decoder output. E.g. <code>sample *= pow(10, header.gain/(20.*256))</code> where header.gain is the raw 16 bit Q7.8 value from the header.<br />
<br />
* '''Channel mapping family'''<br />
This byte indicates the order and semantic meaning of the various channels encoded in each Opus packet. <br />
<br />
Each possible value of this byte indicates a ''mapping family'', which defines a set of allowed numbers of channels, and the ordered set of channel names for each allowed number of channels. Currently there are three defined mapping families, although more may be added:<br />
<br />
* Family 0 (RTP mapping)<br />
** Allowed numbers of channels: 1 or 2<br />
** 1 channel: monophonic (mono)<br />
** 2 channels: stereo (left, right)<br />
** '''Special mapping''': this channel mapping value also indicates that the content consists of a single Opus stream that is stereo if and only if c==2, with stream index 0 mapped to channel 0, and (if stereo) stream index 1 mapped to channel 1. When the channel mapping byte has this value, no further fields are present in OpusHead.<br />
* Family 1 ([http://www.xiph.org/vorbis/doc/Vorbis_I_spec.html#x1-800004.3.9 Vorbis channel order])<br />
** Allowed numbers of channels: 1 ... 8<br />
** Channel meanings depend on the number of channels, see the Vorbis mapping for details.<br />
* Family 255 (no defined channel meaning)<br />
** Allowed numbers of channels: 1...255<br />
** Channels are unidentified. General-purpose players SHOULD NOT attempt to play these streams, and offline decoders MAY deinterleave the output into separate PCM files, one per channel. Decoders SHOULD NOT produce output for channels mapped to stream index 255 (pure silence) unless they have no other way to indicate the index of non-silent channels.<br />
<br />
The remaining channel mapping families (2...254) are reserved. A decoder encountering a reserved mapping byte should act as though the mapping byte is 255.<br />
<br />
An Ogg Opus player MUST play any Ogg Opus stream with a channel mapping family of 0 or 1, even if the number of channels does not match the physically connected audio hardware. Players SHOULD perform channel mixing to increase or reduce the number of channels as needed.<br />
<br />
* '''Stream count''' 'N'<br />
This field indicates the total number of streams so the decoder can correctly parse the packed Opus packets inside the Ogg packet.<br />
<br />
For channel mapping family 0, this value defaults to 1, and is not coded.<br />
<br />
A multi-channel Opus file is composed of one or more individual Opus streams, each of which produces one or two channels of decoded data. Each Ogg packet contains one Opus packet from each stream. The first N-1 Opus packets are packed using the self-delimiting framing from Appendix B of the Opus Specification. The remaining Opus packet is packed using the regular, undelimited framing from Section 3 of the Opus Specification. All the Opus packets in a single Ogg packet MUST be constrained to produce the same number of decoded samples. A decoder SHOULD treat any Opus packet whose duration is different from that of the first Opus packet in an Ogg packet as if it were an Opus packet with an illegal TOC sequence.<br />
<br />
* '''Two-channel stream count''' 'M'<br />
Describes the number of streams whose decoders should be configured to produce two channels. This must be no larger than the number of total streams.<br />
<br />
For channel mapping family 0, this value defaults to c-1 (i.e., 0 for mono and 1 for stereo), and is not coded.<br />
<br />
Each packet in an Opus stream has an internal channel count of 1 or 2, which can change from packet to packet. This is selected by the encoder depending on the bitrate and the contents being encoded. The original channel count of the encoder input is not preserved by the lossy compression.<br />
<br />
Regardless of the internal channel count, any Opus stream may be decoded as mono (a single channel) or stereo (two channels) by appropriate initialization of the decoder. The "two-channel stream count" field indicates that the first M Opus decoders should be initialized in stereo mode, and the remaining N-M decoders should be initialized in mono mode. The total number of decoded channels (M+N) MUST be no larger than 255, as there is no way to index more channels than that in the channel mapping.<br />
<br />
* '''Channel mapping'''<br />
Contains one index per output channel indicating which decoded channel should be used. If the index is less than 2*M, the output MUST be taken from decoding stream (index/2) as stereo and selecting the left channel if index is even, and the right channel if index is odd. If the index is 2*M or larger, the output MUST be taken from decoding stream (index-M) as mono. As a special case, an index of 255 means that the corresponding output channel MUST contain pure silence.<br />
<br />
For channel mapping family 0, the first index defaults to 0, and if c==2, the second index defaults to 1. Neither index is coded.<br />
<br />
The number of output channels (c) is not constrained to match the number of decoded channels (M+N). A single index MAY appear multiple times, i.e., the same decoded channel may be mapped to multiple output channels. Some decoded channels might not be assigned to any output channel, as well.<br />
<br />
==== Comment Header ====<br />
<br />
- 8-byte 'OpusTags' magic signature (64 bits)<br />
- The remaining data follows the vorbis-comment header design used in OggVorbis (without the "framing-bit"), OggTheora, and Speex:<br />
* Vendor string (always present).<br />
** 4-byte little-endian length field, followed by length bytes of UTF-8 vendor string.<br />
* TAG=value metadata strings (zero or more).<br />
** 4-byte little-endian string count.<br />
** Count strings consisting of 4-byte little-endian length and length bytes of UTF-8 string in "tag=value" form.<br />
<br />
One new comment field is introduced for Ogg Opus:<br />
R128_TRACK_GAIN=-573 <br />
representing the volume shift needed to normalize the track's volume. The gain is a Q7.8 fixed point number in dB, as in the OpusHead "output gain" field. This field is similar to the [[VorbisComment#Replay_Gain|REPLAYGAIN_TRACK_GAIN field in Vorbis]], although the normal volume reference is the [http://tech.ebu.ch/loudness EBU-R128] standard.<br />
<br />
An Ogg Opus file MUST NOT have more than one such field, and if present its value MUST be an integer from -32768 to +32767 inclusive, represented in ASCII with no whitespace. If present, it MUST correctly represent the R128 normalization gain (relative to the OpusHead output gain). If a player chooses to make use of the TRACK_GAIN, it MUST be applied ''in addition'' to the OpusHead output gain. If an encoder populates the TRACK_GAIN field, and the output gain is not otherwise constrained or specified, the encoder SHOULD write the R128 gain into the OpusHead output gain and write "R128_TRACK_GAIN=0". If a tool modifies the OpusHead "output gain" field, it MUST also update or remove the R128_TRACK_GAIN comment field.<br />
<br />
There is no comment field corresponding to Replaygain's ALBUM_GAIN; that information should instead be stored in the OpusHead 'output gain' field.<br />
<br />
To avoid confusion with multiple normalization schemes, an OpusTags packet SHOULD NOT contain any of the REPLAYGAIN_TRACK_GAIN, REPLAYGAIN_TRACK_PEAK, REPLAYGAIN_ALBUM_GAIN, or REPLAYGAIN_ALBUM_PEAK fields.<br />
<br />
== Other Implementation Notes ==<br />
<br />
When seeking within an Ogg Opus stream, the decoder should start decoding (and discarding the output) at least 3840 samples (80 ms) prior to the seek point in order to ensure that the output audio is correct at the seek point.<br />
<br />
Technically valid Opus packets can be arbitrarily large due to the padding format, although the amount of non-padding data they can contain is bounded. These packets may be spread over a similarly enormous number of Ogg pages. Encoders SHOULD use no more padding than required to make a variable bitrate (VBR) stream constant bitrate (CBR). Decoders SHOULD avoid attempting to allocate excessive amounts of memory when presented with a very large packet. The presence of an extremely large packet in the stream could indicate a potential memory exhaustion attack or stream corruption. Decoders should reject a packet that is too large to process, and print a warning message.<br />
<br />
In an Ogg Opus stream, the largest possible valid packet that does not use padding has a size of (61,298*N - 2) bytes, or about 60 kB per Opus stream. With 255 streams, this is 15,630,988 bytes (14.9 MB) and can span up to 61,298 Ogg pages, all but one of which will have a granulepos of -1. This is of course a very extreme packet, consisting of 255 streams, each containing 120 ms of audio encoded as 2.5 ms frames, each frame using the maximum possible number of bytes (1275) and stored in the least efficient manner allowed (a VBR code 3 Opus packet). Even in such a packet, most of the data will be zeros, as 2.5 ms frames, which are required to run in the MDCT mode, cannot actually use all 1275 bytes. The largest packet consisting entirely of useful data is (15,326*N - 2) bytes, or about 15 kB per stream. This corresponds to 120 ms of audio encoded as 10 ms frames in either LP or Hybrid mode, but at a data rate of over 1 Mbps, which makes little sense for the quality achieved. A more reasonable limit is (7,664*N - 2) bytes, or about 7.5 kB per stream. This corresponds to 120 ms of audio encoded as 20 ms stereo MDCT-mode frames, with a total bitrate just under 511 kbps (not counting the Ogg encapsulation overhead). With N=8, the maximum useful number of streams for the channel meanings currently defined by mapping family 1, this gives a maximum packet size of 61,310 bytes, or just under 60 kB. This is still quite conservative, as it assumes each output channel is taken from one decoded channel of a stereo packet. An implementation could reasonably choose any of these numbers for its internal limits.<br />
<br />
== Test Vectors ==<br />
<br />
* [[OggOpus/testvectors|Planned test vectors for OggOpus]]<br />
* Opus test vectors</div>Derfhttps://wiki.xiph.org/index.php?title=OggOpus&diff=13368OggOpus2012-05-10T17:12:15Z<p>Derf: Recommend against excessive padding.</p>
<hr />
<div>== Ogg Mapping for Opus ==<br />
<br />
The IETF Opus codec is a low-latency audio codec optimized for both voice and general-purpose audio. See the [http://tools.ietf.org/html/draft-ietf-codec-opus Opus Specification] for technical details.<br />
<br />
Almost everything about Opus is either fixed or dynamically switchable, so most of the usual ID and setup header parameters in the header packets of an Ogg encapsulation aren't needed. In particular, bitrate, packet duration, mono/stereo flags, and coding modes are all dynamically switchable from packet to packet. The first one or two bytes in each data packet, the start of the 'TOC sequence' that defines the layout of the packet, specify all of these parameters for that particular packet. See Section 3 of the Opus Specification for the exact format of the TOC sequence.<br />
<br />
The remaining parameters that must be signaled are<br />
<br />
* The magic number for stream identification,<br />
* The stream count and coupling for multichannel audio, and<br />
* Any metadata or tags.<br />
<br />
=== Packet Organization ===<br />
<br />
Opus is framed in a continuous logical [http://www.xiph.org/ogg/doc/framing.html Ogg stream]. <br />
<br />
There are two mandatory headers. The granule position of the pages containing these headers MUST be zero.<br />
<br />
The first packet in the logical Ogg stream MUST contain the identification header, which uniquely identifies a stream as Opus audio. It MUST begin with the 8 bytes "OpusHead". It MUST be placed alone in the first page of the logical Ogg stream. This page MUST have the 'beginning of stream' flag set.<br />
<br />
The second Opus packet MUST contain the comment header. It MUST begin with the 8 bytes "OpusTags". It MAY span one or more pages, beginning on the second page of the logical stream. However many pages it spans, the comment header packet MUST finish the page on which it ends.<br />
<br />
All subsequent pages are audio data pages and the packets they contain are audio data packets. The first audio page SHOULD NOT have the 'continued packet' flag set (which would indicate the first packet is continued from a previous page). Packets MUST be placed into Ogg pages in order until the end of stream. Audio packets MAY span page boundaries. A decoder MUST treat a zero-byte audio packet as if it were an Opus packet with an illegal TOC sequence. The last page SHOULD have the 'end of stream' flag set, but implementations should be prepared to deal with truncated streams which do not have a page marked 'end of stream'. The final packet SHOULD complete on the last page, i.e., the final lacing value should be less than 255. There MUST NOT be any more pages in an Opus logical stream after a page marked 'end of stream'.<br />
<br />
=== Granule Position ===<br />
<br />
The granule position of an audio page encodes the total number of PCM samples in the stream up to and including the last fully-decodable sample from the last packet ''completed'' on that page. A page that is entirely spanned by a single packet (that completes on a subsequent page) has no granule position, and the granule position field MUST be set to the special value '-1' in two's complement.<br />
<br />
The granule position of an audio page is in units of PCM audio samples at a fixed rate of 48 kHz (per channel; a stereo stream’s granule position does not increment at twice the speed of a mono stream). It is possible to run a decoder at other sampling rates, but the format and this specification always count samples assuming a 48 kHz decoding rate.<br />
<br />
The duration of an Opus packet may be any multiple of 2.5 ms, up to a maximum of 120 ms. This duration is encoded in the TOC sequence at the beginning of each packet. The number of samples returned by a decoder corresponds to this duration exactly, even for the first few packets. For example, a 20 ms packet fed to a decoder running at 48 kHz will always return 960 samples. A demuxer can parse these TOC sequences to work backwards or forwards from a packet with a known granule position (i.e., the last packet completed on some page) in order to assign granule positions to every packet, or even every individual sample. The one exception is the last page in the stream, as described below.<br />
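<br />
As a rough illustration of how a demuxer might recover a packet's duration from its TOC sequence, the sketch below computes the number of 48 kHz samples in one Opus packet. The function names <code>toc_samples_per_frame</code> and <code>packet_samples</code> are hypothetical, and the bit layout is our reading of Section 3 of the Opus Specification, so treat it as an assumption to be checked against that document rather than as normative text.<br />

```c
#include <stddef.h>

/* Hypothetical helper: samples per frame at 48 kHz, taken from the
   configuration bits in the top five bits of the TOC byte (assumed
   layout, per our reading of Section 3 of the Opus Specification). */
int toc_samples_per_frame(unsigned char toc)
{
    int shift = (toc >> 3) & 0x3;
    if (toc & 0x80)                  /* MDCT-only: 2.5, 5, 10 or 20 ms */
        return (48000 << shift) / 400;
    if ((toc & 0x60) == 0x60)        /* Hybrid: 10 or 20 ms */
        return (toc & 0x08) ? 960 : 480;
    if (shift == 3)                  /* LP-only: 60 ms */
        return 2880;
    return (48000 << shift) / 100;   /* LP-only: 10, 20 or 40 ms */
}

/* Hypothetical helper: total samples in one Opus packet at 48 kHz,
   or -1 if the TOC sequence cannot describe a valid packet. */
int packet_samples(const unsigned char *data, size_t len)
{
    int frames;
    int samples;
    if (len < 1)
        return -1;                   /* zero-byte packet: illegal TOC */
    switch (data[0] & 0x3) {
    case 0:
        frames = 1;                  /* code 0: one frame */
        break;
    case 1:
    case 2:
        frames = 2;                  /* codes 1 and 2: two frames */
        break;
    default:
        if (len < 2)
            return -1;
        frames = data[1] & 0x3F;     /* code 3: explicit frame count */
        break;
    }
    samples = frames * toc_samples_per_frame(data[0]);
    if (frames == 0 || samples > 5760)
        return -1;                   /* empty, or longer than 120 ms */
    return samples;
}
```

A single-frame 20 ms MDCT packet therefore yields 960 samples, matching the example in the text.<br />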
<br />
All other pages with completed packets after the first MUST have a granule position equal to the number of samples contained in packets that complete on that page plus the granule position of the most recent page with completed packets. This guarantees that a demuxer can assign individual packets the same granule position when working forwards as when working backwards. There must not be any gaps. In order to support capturing a stream that uses discontinuous transmission (DTX), an encoder SHOULD emit packets that explicitly request the use of Packet Loss Concealment (PLC) (i.e., with a frame length of 0, as defined in Section 3.2.1 of the Opus Specification) in place of the packets that were not transmitted.<br />
<br />
There is some amount of latency introduced during the decoding process, to allow for overlap in the MDCT modes, stereo mixing in the LP modes, and resampling, and the encoder will introduce even more latency (though the exact amount is not specified). Therefore the first few samples produced by the decoder do not correspond to any real, input audio, but are instead composed of padding inserted by the encoder to compensate for this latency. These samples must be stored and decoded, as Opus is an asymptotically convergent predictive codec, meaning the decoded contents of each frame depend on the recent history of decoder inputs. A 'pre-skip' field in the ID header signals the number of samples which should be skipped at the beginning of the stream. This provides sufficient history to the decoder so that it has already converged before the stream's output begins. It may also be used to perform sample-accurate cropping of existing encoded streams. This amount need not be a multiple of 2.5 ms, may be smaller than a single packet, or may span the contents of several packets.<br />
<br />
The PCM sample position is determined from the granule position using the formula<br />
<br />
'PCM sample position' = 'granule position' - 'pre-skip' .<br />
<br />
For example, if the granule position of the first page is 59971, and the pre-skip is 11971, then the PCM sample position of the last decoded sample from the first page is 48000. This may be converted into a playback time using the formula<br />
<br />
'playback time' = 'PCM sample position' / 48000.0 .<br />
<br />
The initial PCM sample position before any samples are played is normally '0'. In this case, the PCM sample position of the first audio sample to be played starts at '1', because it marks the time on the clock ''after'' that sample has been played, and a stream that is exactly one second long has a final PCM sample position of '48000', as in the example here.<br />
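<br />
The two formulas above can be sketched in C as follows. The function names are illustrative only and do not come from any existing library.<br />

```c
#include <stdint.h>

/* Illustrative helper: 'PCM sample position' = 'granule position' - 'pre-skip',
   both counted in samples at 48 kHz. */
int64_t pcm_sample_position(int64_t granule_position, uint16_t pre_skip)
{
    return granule_position - (int64_t)pre_skip;
}

/* Illustrative helper: playback time in seconds. Granule positions always
   count 48 kHz samples, whatever rate the decoder actually runs at. */
double playback_time(int64_t pcm_position)
{
    return (double)pcm_position / 48000.0;
}
```

With the numbers from the example, a granule position of 59971 and a pre-skip of 11971 give a PCM sample position of 48000, i.e. a playback time of one second.<br />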
<br />
Vorbis streams use a granule position smaller than the number of audio samples contained in the first page to indicate that some of those samples must be trimmed from the output. However, doing so requires that the first page contain exactly two packets, in order to allow the decoder to perform PCM position adjustments before needing to return any PCM data. Opus uses the pre-skip mechanism for this purpose instead, since the encoder may introduce more than a single packet's worth of latency, and since very large packets in streams with a very large number of channels may not fit on a single page.<br />
<br />
The page with the 'end of stream' flag set MAY have a granule position that indicates the page contains less audio data than would normally be returned by decoding up through the final packet. This is used to end the stream somewhere other than an even frame boundary. The granule position of the most recent audio page with completed packets is used to make this determination, or '0' is used if there were no previous audio pages with a completed packet. The difference between these granule positions indicates how many samples to keep after decoding the packets that completed on the final page. The remaining samples are discarded. The number of discarded samples SHOULD be smaller than the number decoded from the last packet.<br />
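<br />
A minimal sketch of the end-trimming computation described above. The function name and the choice to clamp to the number of decoded samples are assumptions made for illustration.<br />

```c
#include <stdint.h>

/* Illustrative helper: how many decoded samples to keep from the packets
   that complete on the page marked 'end of stream'.
   prev_granule is the granule position of the most recent audio page with
   a completed packet, or 0 if there is none.
   Returns -1 if the final granule position moves backwards. */
int64_t samples_to_keep(int64_t final_granule, int64_t prev_granule,
                        int64_t decoded_on_final_page)
{
    int64_t keep = final_granule - prev_granule;
    if (keep < 0)
        return -1;
    /* Assumption: never report more samples than were actually decoded. */
    return keep < decoded_on_final_page ? keep : decoded_on_final_page;
}
```

For instance, if the previous page ended at 48000, the final page carries one 960-sample packet, and its granule position is 48500, then 500 samples are kept and 460 are discarded.<br />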
<br />
The granule position of the first audio page with a completed packet MAY be larger than the number of samples contained in packets that complete on that page; however, it MUST NOT be smaller, unless that page has the 'end of stream' flag set. Allowing a granule position larger than the number of samples allows the beginning of a stream to be cropped without rewriting the granule position of all the remaining pages. This means that the PCM sample position just before the first sample to be played may be larger than '0', but the PCM sample position relative to '0' should still be used for the purposes of synchronization when multiplexing with other logical streams. This does not affect the behavior of pre-skip: exactly 'pre-skip' samples should be skipped from the beginning of the decoded output, even if the initial PCM sample position is greater than zero.<br />
<br />
On the other hand, a granule position that is smaller than the number of decoded samples prevents a demuxer from working backwards to assign each packet or each individual sample a valid granule position, since granule positions must be non-negative. A decoder MUST reject as invalid any stream where the granule position is smaller than the number of samples contained in packets that complete on the first page with a completed packet, unless that page has the 'end of stream' flag set. It MAY defer this action until it decodes the last packet completed on that page. If that page has the 'end of stream' flag set, a demuxer can work forwards from the granule position '0', but MUST reject as invalid any stream where the granule position is smaller than the 'pre-skip' amount. This would indicate that more samples should be skipped from the initial decoded output than exist in the stream.<br />
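<br />
The two rejection rules for the first audio page with a completed packet might be checked as in the sketch below. The function name and argument layout are hypothetical.<br />

```c
#include <stdint.h>

/* Illustrative check for the first audio page with a completed packet.
   samples_on_page is the number of 48 kHz samples in the packets that
   complete on that page. Returns 1 if the page is acceptable and 0 if
   the stream must be rejected as invalid. */
int first_page_valid(int64_t granule, int64_t samples_on_page,
                     int end_of_stream, uint16_t pre_skip)
{
    if (!end_of_stream)
        return granule >= samples_on_page;  /* must not be smaller */
    return granule >= (int64_t)pre_skip;    /* cannot skip more than exists */
}
```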
<br />
==== ID Header ====<br />
<br />
  0                   1                   2                   3<br />
  0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1<br />
 +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+<br />
 |      'O'      |      'p'      |      'u'      |      's'      |<br />
 +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+<br />
 |      'H'      |      'e'      |      'a'      |      'd'      |<br />
 +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+<br />
 |  version = 1  | channel count |           pre-skip            |<br />
 +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+<br />
 |               original input sample rate in Hz                |<br />
 +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+<br />
 |    output gain Q7.8 in dB     |  channel map  |               |<br />
 +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+               :<br />
 |                                                               |<br />
 :               optional channel mapping table...               :<br />
 |                                                               |<br />
 +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+<br />
<br />
<br />
- Magic signature: "OpusHead" (64 bits)<br />
- Version number (8 bits unsigned): 1 for this spec<br />
- Channel count 'c' (8 bits unsigned): MUST be > 0<br />
- Pre-skip (16 bits unsigned, little endian)<br />
- Input sample rate (32 bits unsigned, little endian): informational only<br />
- Output gain (16 bits, little endian, signed Q7.8 in dB) to apply when<br />
decoding<br />
- Channel mapping family (8 bits unsigned)<br />
-- 0 = one stream: mono or L,R stereo<br />
-- 1 = channels in vorbis spec order: mono or L,R stereo or ... or FL,C,FR,RL,RR,LFE, ...<br />
-- 2..254 = reserved (treat as 255)<br />
-- 255 = no defined channel meaning<br />
If channel mapping family > 0<br />
- Stream count 'N' (8 bits unsigned): MUST be > 0<br />
- Two-channel stream count 'M' (8 bits unsigned): MUST satisfy M <= N, M+N <= 255<br />
- Channel mapping (8*c bits)<br />
-- one stream index (8 bits unsigned) per channel (255 means silent throughout the file)<br />
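<br />
To make the field layout concrete, here is a minimal parsing sketch for the ID header. The struct <code>OpusHeadInfo</code> and the function <code>parse_opus_head</code> are hypothetical names, not part of any existing library. The check that each mapping index is either 255 or below M+N is an assumption inferred from the number of decoded channels, and the major-revision test simply rejects any nonzero upper nibble in the version byte.<br />

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical container for the parsed ID header fields. */
typedef struct {
    uint8_t  version;
    uint8_t  channels;        /* c */
    uint16_t pre_skip;        /* samples at 48 kHz */
    uint32_t input_rate;      /* informational only, 0 = unspecified */
    int      output_gain;     /* signed Q7.8, in dB */
    uint8_t  mapping_family;
    uint8_t  streams;         /* N */
    uint8_t  coupled;         /* M */
    uint8_t  mapping[255];
} OpusHeadInfo;

/* Returns 0 on success, -1 if the packet is not a usable ID header. */
int parse_opus_head(const unsigned char *p, size_t len, OpusHeadInfo *h)
{
    size_t i;
    int gain;
    if (len < 19 || memcmp(p, "OpusHead", 8) != 0)
        return -1;
    if (p[8] >> 4)
        return -1;                        /* unknown major revision */
    h->version = p[8];
    h->channels = p[9];
    if (h->channels == 0)
        return -1;
    h->pre_skip = (uint16_t)(p[10] | (p[11] << 8));
    h->input_rate = (uint32_t)p[12] | ((uint32_t)p[13] << 8)
                  | ((uint32_t)p[14] << 16) | ((uint32_t)p[15] << 24);
    gain = p[16] | (p[17] << 8);
    h->output_gain = gain >= 32768 ? gain - 65536 : gain;
    h->mapping_family = p[18];
    if (h->mapping_family == 0) {         /* implicit single stream */
        if (h->channels > 2)
            return -1;
        h->streams = 1;
        h->coupled = (uint8_t)(h->channels - 1);
        h->mapping[0] = 0;
        h->mapping[1] = 1;
        return 0;
    }
    if (h->mapping_family == 1 && h->channels > 8)
        return -1;
    if (len < 21 + (size_t)h->channels)
        return -1;
    h->streams = p[19];
    h->coupled = p[20];
    if (h->streams == 0 || h->coupled > h->streams
        || h->streams + h->coupled > 255)
        return -1;
    for (i = 0; i < h->channels; i++) {
        h->mapping[i] = p[21 + i];
        /* Assumption: an index must be 255 (silence) or below M+N. */
        if (h->mapping[i] != 255 && h->mapping[i] >= h->streams + h->coupled)
            return -1;
    }
    return 0;
}
```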
<br />
<br />
Some discussion is in order.<br />
<br />
* '''Magic signature'''<br />
The magic signature "OpusHead" allows codec identification and is human readable. Starting with 'Op' helps distinguish it from data packets, as this is an invalid TOC sequence.<br />
<br />
* '''Version'''<br />
The version number must always be '1' for this version of the encapsulation specification. This 8-bit field is partitioned into two sub-fields. The lower four bits constitute a "minor revision", while the upper four bits correspond to a "major revision". Implementations SHOULD treat streams with an unknown minor revision as backwards-compatible as long as they recognize the major revision. When encountering a stream with an unknown major revision, implementations SHOULD assume it is not backwards compatible. The initial version '1' was chosen to keep implementations from relying on this byte as a null terminator for the OpusHead string.<br />
<br />
* '''Channel count''' 'c'<br />
This byte specifies the number of output channels (1...255) for this Ogg Opus stream.<br />
<br />
* '''Pre-skip'''<br />
This is the number of samples (at 48 kHz) to discard from the decoder output when starting playback, and also the number to subtract from a page's granule position to calculate its PCM sample position.<br />
<br />
When constructing cropped Ogg Opus streams, we recommend a pre-skip of at least 3840 samples (80 ms) to ensure complete convergence.<br />
<br />
* '''Input sample rate'''<br />
This is ''not'' the sample rate to use for playback of the encoded data.<br />
<br />
Opus has a handful of coding modes, with internal audio bandwidths of 4, 6, 8, 12, and 20 kHz. Each packet in the stream may have a different audio bandwidth. Regardless of the audio bandwidth, the reference decoder supports decoding any stream at a sample rate of 8, 12, 16, 24, or 48 kHz. The original sample rate of the encoder input is not preserved by the lossy compression.<br />
<br />
An Ogg Opus player SHOULD select the playback sample rate according to the following procedure:<br />
* If the hardware supports 48 kHz playback, decode at 48 kHz,<br />
* else if the hardware's highest available sample rate is a supported rate, decode at this sample rate,<br />
* else if the hardware's highest available sample rate is less than 48 kHz, decode at the next higher supported rate and resample,<br />
* else decode at 48 kHz and resample.<br />
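<br />
The selection procedure above could be sketched as follows. The function name and the way the hardware capabilities are passed in (a flag for 48 kHz support plus the highest available rate) are assumptions made for illustration.<br />

```c
/* Illustrative helper: returns the rate, in Hz, at which to run the Opus
   decoder. *resample is set to 1 when the decoder output must then be
   resampled to hw_max_rate, and to 0 otherwise. */
int choose_decode_rate(int hw_supports_48k, int hw_max_rate, int *resample)
{
    static const int rates[] = { 8000, 12000, 16000, 24000, 48000 };
    int i;
    *resample = 0;
    if (hw_supports_48k)
        return 48000;                /* preferred: decode at 48 kHz */
    for (i = 0; i < 5; i++)
        if (rates[i] == hw_max_rate)
            return hw_max_rate;      /* highest hardware rate is supported */
    *resample = 1;
    for (i = 0; i < 5; i++)
        if (rates[i] > hw_max_rate)
            return rates[i];         /* next higher supported rate */
    return 48000;                    /* hardware rate is above 48 kHz */
}
```

For example, hardware limited to 22.05 kHz would decode at 24 kHz and resample down.<br />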
<br />
However, the 'input sample rate' field allows the encoder to pass the sample rate of the original input stream as metadata. This may be useful when the user requires the output sample rate to match the input sample rate. For example, a non-player decoder writing PCM format to disk might choose to resample the output audio back to the original input rate to reduce surprise to the user, who might reasonably expect to get back a file with the same sample rate as the one they fed to the encoder.<br />
<br />
A value of zero indicates 'unspecified'. Encoders SHOULD write the actual input rate or zero, but decoder implementations which do something with this field SHOULD take care to behave sanely if given crazy values (e.g. don't actually upsample the output to 10 MHz if requested).<br />
<br />
* '''Output gain'''<br />
This is a gain to be applied by the decoder. Virtually all players and media frameworks should apply it by default. If a player chooses to apply any volume adjustment or gain modification, such as the R128_TRACK_GAIN or a user-facing volume knob, the adjustment MUST be applied ''in addition'' to this output gain in order to achieve playback at the desired volume.<br />
<br />
An encoder SHOULD set the output gain to zero, and instead apply any gain prior to encoding, when this is possible and does not conflict with the user's wishes. The output gain should only be nonzero when the gain is adjusted after encoding, or when the user wishes to adjust the gain for playback while preserving the ability to recover the original signal amplitude.<br />
<br />
Although the output gain has enormous range (+/- 128 dB, enough to amplify inaudible sounds to the threshold of physical pain), most applications can only reasonably use a small portion of this range around zero. The large range serves in part to ensure that gain can always be losslessly transferred between OpusHead and R128_TRACK_GAIN (see below) without saturating.<br />
<br />
The gain is the 20 log<sub>10</sub> ratio of output to input sample values to be applied to the decoder output. E.g. <code>sample *= pow(10, header.gain/(20.*256))</code> where header.gain is the raw 16 bit Q7.8 value from the header.<br />
<br />
* '''Channel mapping family'''<br />
This byte indicates the order and semantic meaning of the various channels encoded in each Opus packet. <br />
<br />
Each possible value of this byte indicates a ''mapping family'', which defines a set of allowed numbers of channels, and the ordered set of channel names for each allowed number of channels. Currently there are three defined mapping families, although more may be added:<br />
<br />
* Family 0 (RTP mapping)<br />
** Allowed numbers of channels: 1 or 2<br />
** 1 channel: monophonic (mono)<br />
** 2 channels: stereo (left, right)<br />
** '''Special mapping''': this channel mapping value also indicates that the content consists of a single Opus stream that is stereo if and only if c==2, with stream index 0 mapped to channel 0, and (if stereo) stream index 1 mapped to channel 1. When the channel mapping byte has this value, no further fields are present in OpusHead.<br />
* Family 1 ([http://www.xiph.org/vorbis/doc/Vorbis_I_spec.html#x1-800004.3.9 Vorbis channel order])<br />
** Allowed numbers of channels: 1 ... 8<br />
** Channel meanings depend on the number of channels, see the Vorbis mapping for details.<br />
* Family 255 (no defined channel meaning)<br />
** Allowed numbers of channels: 1...255<br />
** Channels are unidentified. General-purpose players SHOULD NOT attempt to play these streams, and offline decoders MAY deinterleave the output into separate PCM files, one per channel. Decoders SHOULD NOT produce output for channels mapped to stream index 255 (pure silence) unless they have no other way to indicate the index of non-silent channels.<br />
<br />
The remaining channel mapping families (2...254) are reserved. A decoder encountering a reserved mapping byte should act as though the mapping byte is 255.<br />
<br />
An Ogg Opus player MUST play any Ogg Opus stream with a channel mapping family of 0 or 1, even if the number of channels does not match the physically connected audio hardware. Players SHOULD perform channel mixing to increase or reduce the number of channels as needed.<br />
<br />
* '''Stream count''' 'N'<br />
This field indicates the total number of streams so the decoder can correctly parse the packed Opus packets inside the Ogg packet.<br />
<br />
For channel mapping family 0, this value defaults to 1, and is not coded.<br />
<br />
A multi-channel Opus file is composed of one or more individual Opus streams, each of which produces one or two channels of decoded data. Each Ogg packet contains one Opus packet from each stream. The first N-1 Opus packets are packed using the self-delimiting framing from Appendix B of the Opus Specification. The remaining Opus packet is packed using the regular, undelimited framing from Section 3 of the Opus Specification. All the Opus packets in a single Ogg packet MUST be constrained to produce the same number of decoded samples. A decoder SHOULD treat any Opus packet whose duration is different from that of the first Opus packet in an Ogg packet as if it were an Opus packet with an illegal TOC sequence.<br />
<br />
* '''Two-channel stream count''' 'M'<br />
Describes the number of streams whose decoders should be configured to produce two channels. This must be no larger than the number of total streams.<br />
<br />
For channel mapping family 0, this value defaults to c-1 (i.e., 0 for mono and 1 for stereo), and is not coded.<br />
<br />
Each packet in an Opus stream has an internal channel count of 1 or 2, which can change from packet to packet. This is selected by the encoder depending on the bitrate and the contents being encoded. The original channel count of the encoder input is not preserved by the lossy compression.<br />
<br />
Regardless of the internal channel count, any Opus stream may be decoded as mono (a single channel) or stereo (two channels) by appropriate initialization of the decoder. The "two-channel stream count" field indicates that the first M Opus decoders should be initialized in stereo mode, and the remaining N-M decoders should be initialized in mono mode. The total number of decoded channels (M+N) MUST be no larger than 255, as there is no way to index more channels than that in the channel mapping.<br />
<br />
* '''Channel mapping'''<br />
Contains one index per output channel indicating which decoded channel should be used. If the index is less than 2*M, the output MUST be taken from decoding stream (index/2) as stereo and selecting the left channel if index is even, and the right channel if index is odd. If the index is 2*M or larger, the output MUST be taken from decoding stream (index-M) as mono. As a special case, an index of 255 means that the corresponding output channel MUST contain pure silence.<br />
<br />
For channel mapping family 0, the first index defaults to 0, and if c==2, the second index defaults to 1. Neither index is coded.<br />
<br />
The number of output channels (c) is not constrained to match the number of decoded channels (M+N). A single index MAY appear multiple times, i.e., the same decoded channel may be mapped to multiple output channels. Some decoded channels might not be assigned to any output channel, as well.<br />
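<br />
The index rule for the channel mapping can be written down directly. In the sketch below the function name is hypothetical, and the rejection of indices at or above M+N (other than 255) is an assumption inferred from the number of decoded channels.<br />

```c
/* Illustrative helper: resolve one channel mapping index.
   On success, *stream is the Opus stream to take the output from and
   *channel is 0 (left, or the only channel of a mono stream) or 1 (right).
   Returns 0 on success, 1 if the output channel is pure silence
   (index 255), and -1 if the index is out of range. */
int resolve_mapping(int index, int n_streams, int m_coupled,
                    int *stream, int *channel)
{
    if (index == 255)
        return 1;
    if (index < 0 || index >= n_streams + m_coupled)
        return -1;
    if (index < 2 * m_coupled) {
        *stream = index / 2;         /* one of the first M stereo streams */
        *channel = index & 1;        /* even = left, odd = right */
    } else {
        *stream = index - m_coupled; /* one of the remaining mono streams */
        *channel = 0;
    }
    return 0;
}
```

With N=4 and M=2, index 3 selects the right channel of stream 1, while index 4 selects mono stream 2.<br />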
<br />
==== Comment Header ====<br />
<br />
- 8-byte 'OpusTags' magic signature (64 bits)<br />
- The remaining data follows the vorbis-comment header design used in OggVorbis (without the "framing-bit"), OggTheora, and Speex:<br />
* Vendor string (always present).<br />
** 4-byte little-endian length field, followed by length bytes of UTF-8 vendor string.<br />
* TAG=value metadata strings (zero or more).<br />
** 4-byte little-endian string count.<br />
** Count strings consisting of 4-byte little-endian length and length bytes of UTF-8 string in "tag=value" form.<br />
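<br />
A minimal sketch of walking this layout, with bounds checks so that a corrupt length field cannot read past the end of the packet. The function names are hypothetical, and the sketch only validates and counts the comment strings without interpreting them.<br />

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Illustrative helper: read a 4-byte little-endian unsigned integer. */
uint32_t read_le32(const unsigned char *p)
{
    return (uint32_t)p[0] | ((uint32_t)p[1] << 8)
         | ((uint32_t)p[2] << 16) | ((uint32_t)p[3] << 24);
}

/* Illustrative helper: returns the number of "tag=value" strings in a
   comment header packet, or -1 if the packet is malformed or truncated. */
long count_opus_tags(const unsigned char *p, size_t len)
{
    size_t pos = 8;
    uint32_t n, count, i;
    if (len < 16 || memcmp(p, "OpusTags", 8) != 0)
        return -1;
    n = read_le32(p + pos);          /* vendor string length */
    pos += 4;
    if (n > len - pos)
        return -1;
    pos += n;
    if (len - pos < 4)
        return -1;
    count = read_le32(p + pos);      /* number of comment strings */
    pos += 4;
    for (i = 0; i < count; i++) {
        if (len - pos < 4)
            return -1;
        n = read_le32(p + pos);      /* length of this comment string */
        pos += 4;
        if (n > len - pos)
            return -1;
        pos += n;
    }
    return (long)count;
}
```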
<br />
One new comment field is introduced for Ogg Opus:<br />
R128_TRACK_GAIN=-573 <br />
representing the volume shift needed to normalize the track's volume. The gain is a Q7.8 fixed point number in dB, as in the OpusHead "output gain" field. This field is similar to the [[VorbisComment#Replay_Gain|REPLAYGAIN_TRACK_GAIN field in Vorbis]], although the normal volume reference is the [http://tech.ebu.ch/loudness EBU-R128] standard.<br />
<br />
An Ogg Opus file MUST NOT have more than one such field, and if present its value MUST be an integer from -32768 to +32767 inclusive, represented in ASCII with no whitespace. If present, it MUST correctly represent the R128 normalization gain (relative to the OpusHead output gain). If a player chooses to make use of the TRACK_GAIN, it MUST be applied ''in addition'' to the OpusHead output gain. If an encoder populates the TRACK_GAIN field, and the output gain is not otherwise constrained or specified, the encoder SHOULD write the R128 gain into the OpusHead output gain and write "R128_TRACK_GAIN=0". If a tool modifies the OpusHead "output gain" field, it MUST also update or remove the R128_TRACK_GAIN comment field.<br />
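<br />
The value constraints above (an ASCII integer from -32768 to +32767 with no whitespace) could be enforced with a small parser like the following sketch. The function name is hypothetical, and accepting an explicit leading '+' is an assumption, since the text does not say whether the sign may be written for positive values.<br />

```c
/* Illustrative helper: parse the value part of an R128_TRACK_GAIN comment.
   On success stores the Q7.8 gain (in 1/256 dB steps) in *gain_q8 and
   returns 0. Returns -1 for whitespace, non-digits, an empty value, or a
   number outside -32768..32767. */
int parse_r128_gain(const char *value, int *gain_q8)
{
    const char *s = value;
    long v = 0;
    int negative = 0;
    if (*s == '-' || *s == '+') {    /* assumption: '+' is tolerated */
        negative = (*s == '-');
        s++;
    }
    if (*s == '\0')
        return -1;                   /* no digits at all */
    for (; *s != '\0'; s++) {
        if (*s < '0' || *s > '9')
            return -1;               /* whitespace or other garbage */
        v = v * 10 + (*s - '0');
        if (v > 32768)
            return -1;               /* out of range, stop before overflow */
    }
    if (negative)
        v = -v;
    if (v > 32767)
        return -1;
    *gain_q8 = (int)v;
    return 0;
}
```

The example value -573 corresponds to a gain of -573/256, or roughly -2.24 dB.<br />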
<br />
There is no comment field corresponding to Replaygain's ALBUM_GAIN; that information should instead be stored in the OpusHead 'output gain' field.<br />
<br />
To avoid confusion with multiple normalization schemes, an OpusTags packet SHOULD NOT contain any of the REPLAYGAIN_TRACK_GAIN, REPLAYGAIN_TRACK_PEAK, REPLAYGAIN_ALBUM_GAIN, or REPLAYGAIN_ALBUM_PEAK fields.<br />
<br />
== Other Implementation Notes ==<br />
<br />
When seeking within an Ogg Opus stream, the decoder should start decoding (and discarding the output) at least 3840 samples (80 ms) prior to the seek point in order to ensure that the output audio is correct at the seek point.<br />
<br />
Technically valid Opus packets can be arbitrarily large due to the padding format, although the amount of non-padding data they can contain is bounded. These packets may be spread over a similarly enormous number of Ogg pages. Encoders SHOULD use no more padding than required to make a variable bitrate (VBR) stream constant bitrate (CBR). Decoders SHOULD avoid attempting to allocate excessive amounts of memory when presented with a very large packet. The presence of an extremely large packet in the stream could indicate a potential memory exhaustion attack or stream corruption. Decoders should reject a packet that is too large to process, and print a warning message.<br />
<br />
In an Ogg Opus stream, the largest possible valid packet that does not use padding has a size of (61,298*N - 2) bytes, or about 60 kB per Opus stream. With 255 streams, this is 15,630,988 bytes (14.9 MB) and can span up to 61,298 Ogg pages, all but one of which will have a granulepos of -1. This is of course a very extreme packet, consisting of 255 streams, each containing 120 ms of audio encoded as 2.5 ms frames, each frame using the maximum possible number of bytes (1275) and stored in the least efficient manner allowed (a VBR code 3 Opus packet). Even in such a packet, most of the data will be zeros, as 2.5 ms frames, which are required to run in the MDCT mode, cannot actually use all 1275 bytes. The largest packet consisting entirely of useful data is (15,326*N - 2) bytes, or about 15 kB per stream. This corresponds to 120 ms of audio encoded as 10 ms frames in either LP or Hybrid mode, but at a data rate of over 1 Mbps, which makes little sense for the quality achieved. A more reasonable limit is (7,664*N - 2) bytes, or about 7.5 kB per stream. This corresponds to 120 ms of audio encoded as 20 ms stereo MDCT-mode frames, with a total bitrate just under 511 kbps (not counting the Ogg encapsulation overhead). With N=8, the maximum useful number of streams for the channel meanings currently defined by mapping family 1, this gives a maximum packet size of 61,310 bytes, or just under 60 kB. This is still quite conservative, as it assumes each output channel is taken from one decoded channel of a stereo packet. An implementation could reasonably choose any of these numbers for its internal limits.<br />
<br />
== Test Vectors ==<br />
<br />
* [[OggOpus/testvectors|Planned test vectors for OggOpus]]<br />
* Opus test vectors</div>Derfhttps://wiki.xiph.org/index.php?title=OggOpus&diff=13367OggOpus2012-05-10T17:09:43Z<p>Derf: </p>
<hr />
<div>== Ogg Mapping for Opus ==<br />
<br />
The IETF Opus codec is a low-latency audio codec optimized for both voice and general-purpose audio. See the [http://tools.ietf.org/html/draft-ietf-codec-opus Opus Specification] for technical details.<br />
<br />
Almost everything about Opus is either fixed or dynamically switchable, so most of the usual ID and setup header parameters in the header packets of an Ogg encapsulation aren't needed. In particular, bitrate, packet duration, mono/stereo flags, and coding modes are all dynamically switchable from packet to packet. The first one or two bytes in each data packet, the start of the 'TOC sequence' that defines the layout of the packet, specifies all of these parameters for that particular packet. See Section 3 of the Opus Specification for the exact format of the TOC sequence.<br />
<br />
The remaining parameters that must be signaled are<br />
<br />
* The magic number for stream identification,<br />
* The stream count and coupling for multichannel audio, and<br />
* Any metadata or tags.<br />
<br />
=== Packet Organization ===<br />
<br />
Opus is framed in a continuous logical [http://www.xiph.org/ogg/doc/framing.html Ogg stream]. <br />
<br />
There are two mandatory headers. The granule position of the pages containing these headers MUST be zero.<br />
<br />
The first packet in the logical Ogg stream MUST contain the identification header, which uniquely identifies a stream as Opus audio. It MUST begin with the 8 bytes "OpusHead". It MUST be placed alone in the first page of the logical Ogg stream. This page MUST have the ’beginning of stream’ flag set.<br />
<br />
The second Opus packet MUST contain the comment header. It MUST begin with the 8 bytes "OpusTags". It MAY span one or more pages, beginning on the second page of the logical stream. However many pages it spans, the comment header packet MUST finish the page on which it ends.<br />
<br />
All subsequent pages are audio data pages and the packets they contain are audio data packets. The first audio page SHOULD NOT have the 'continued packet' flag set (which would indicate the first packet is continued from a previous page). Packets MUST be placed into Ogg pages in order until the end of stream. Audio packets MAY span page boundaries. A decoder MUST treat a zero-byte audio packet as if it were an Opus packet with an illegal TOC sequence. The last page SHOULD have the 'end of stream' flag set, but implementations should be prepared to deal with truncated streams which do not have a page marked 'end of stream'. The final packet SHOULD complete on the last page, i.e., the final lacing value should be less than 255. There MUST NOT be any more pages in an Opus logical stream after a page marked 'end of stream'.<br />
<br />
=== Granule Position ===<br />
<br />
The granule position of an audio page encodes the total number of PCM samples in the stream up to and including the last fully-decodable sample from the last packet ''completed'' on that page. A page that is entirely spanned by a single packet (that completes on a subsequent page) has no granule position, and the granule position field MUST be set to the special value ’-1’ in two's complement.<br />
<br />
The granule position of an audio page is in units of PCM audio samples at a fixed rate of 48 kHz (per channel; a stereo stream’s granule position does not increment at twice the speed of a mono stream). It is possible to run a decoder at other sampling rates, but the format and this specification always count samples assuming a 48 kHz decoding rate.<br />
<br />
The duration of an Opus packet may be any multiple of 2.5 ms, up to a maximum of 120 ms. This duration is encoded in the TOC sequence at the beginning of each packet. The number of samples returned by a decoder corresponds to this duration exactly, even for the first few packets. For example, a 20 ms packet fed to a decoder running at 48 kHz will always return 960 samples. A demuxer can parse these TOC sequences to work backwards or forwards from a packet with a known granule position (i.e., the last packet completed on some page) in order to assign granule positions to every packet, or even every individual sample. The one exception is the last page in the stream, as described below.<br />
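<br />
As a rough illustration of how a demuxer might derive a packet's duration from its TOC sequence, the sketch below follows the TOC layout in Section 3 of the Opus Specification (configuration number in the top five bits, frame count code in the bottom two). The function names are hypothetical, and rejecting packets longer than 120 ms with an exception is an assumption of this sketch, not a requirement stated here.<br />

```python
def samples_per_frame(toc: int) -> int:
    """Samples per frame at 48 kHz for the configuration in a TOC byte."""
    config = toc >> 3
    if config >= 16:
        # MDCT-only configurations: 2.5, 5, 10 or 20 ms frames.
        return 120 << (config & 3)
    if config >= 12:
        # Hybrid configurations: 10 or 20 ms frames.
        return 480 << (config & 1)
    # LP configurations: 10, 20, 40 or 60 ms frames.
    return (480, 960, 1920, 2880)[config & 3]


def packet_samples(packet: bytes) -> int:
    """Total samples at 48 kHz produced by decoding one Opus packet."""
    if not packet:
        # A zero-byte packet is treated like an illegal TOC sequence.
        raise ValueError("zero-byte packet")
    toc = packet[0]
    code = toc & 3
    if code == 0:
        frames = 1
    elif code in (1, 2):
        frames = 2
    else:
        # Code 3: the frame count lives in the low six bits of byte 1.
        if len(packet) < 2:
            raise ValueError("code 3 packet is missing its frame count byte")
        frames = packet[1] & 0x3F
    total = frames * samples_per_frame(toc)
    if frames == 0 or total > 5760:
        # 5760 samples is the 120 ms maximum packet duration at 48 kHz.
        raise ValueError("invalid packet duration")
    return total
```

Summing <code>packet_samples()</code> over the packets that complete on a page gives the amount by which that page's granule position advances.<br />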
<br />
All other pages with completed packets after the first MUST have a granule position equal to the number of samples contained in packets that complete on that page plus the granule position of the most recent page with completed packets. This guarantees that a demuxer can assign individual packets the same granule position when working forwards as when working backwards. There must not be any gaps. In order to support capturing a stream that uses discontinuous transmission (DTX), an encoder SHOULD emit packets that explicitly request the use of Packet Loss Concealment (PLC) (i.e., with a frame length of 0, as defined in Section 3.2.1 of the Opus Specification) in place of the packets that were not transmitted.<br />
<br />
There is some amount of latency introduced during the decoding process, to allow for overlap in the MDCT modes, stereo mixing in the LP modes, and resampling, and the encoder will introduce even more latency (though the exact amount is not specified). Therefore the first few samples produced by the decoder do not correspond to any real, input audio, but are instead composed of padding inserted by the encoder to compensate for this latency. These samples must be stored and decoded, as Opus is an asymptotically convergent predictive codec, meaning the decoded contents of each frame depend on the recent history of decoder inputs. A 'pre-skip' field in the ID header signals the number of samples which should be skipped at the beginning of the stream. This provides sufficient history to the decoder so that it has already converged before the stream's output begins. It may also be used to perform sample-accurate cropping of existing encoded streams. This amount need not be a multiple of 2.5 ms, may be smaller than a single packet, or may span the contents of several packets.<br />
<br />
The PCM sample position is determined from the granule position using the formula<br />
<br />
'PCM sample position' = 'granule position' - 'pre-skip' .<br />
<br />
For example, if the granule position of the first page is 59971, and the pre-skip is 11971, then the PCM sample position of the last decoded sample from the first page is 48000. This may be converted into a playback time using the formula<br />
<br />
'PCM sample position'<br />
'playback time' = --------------------- .<br />
48000.0<br />
<br />
The initial PCM sample position before any samples are played is normally '0'. In this case, the PCM sample position of the first audio sample to be played starts at '1', because it marks the time on the clock ''after'' that sample has been played, and a stream that is exactly one second long has a final PCM sample position of '48000', as in the example here.<br />
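<br />
The two formulas above can be written out as a minimal sketch. The function names are illustrative only; the numbers in the usage note are the ones from the example in this section.<br />

```python
def pcm_sample_position(granule_position: int, pre_skip: int) -> int:
    """PCM sample position of the last sample completed on a page."""
    return granule_position - pre_skip


def playback_time(granule_position: int, pre_skip: int) -> float:
    """Playback time in seconds; granule positions always count at 48 kHz."""
    return pcm_sample_position(granule_position, pre_skip) / 48000.0
```

With a granule position of 59971 and a pre-skip of 11971, <code>pcm_sample_position()</code> gives 48000 and <code>playback_time()</code> gives exactly one second.<br />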
<br />
Vorbis streams use a granule position smaller than the number of audio samples contained in the first page to indicate that some of those samples must be trimmed from the output. However, to do so it requires that the first page contains exactly two packets, in order to allow the decoder to perform PCM position adjustments before needing to return any PCM data. Opus uses the pre-skip mechanism for this purpose instead, since the encoder may introduce more than a single packet's worth of latency, and since very large packets in streams with a very large number of channels may not fit on a single page.<br />
<br />
The page with the 'end of stream' flag set MAY have a granule position that indicates the page contains less audio data than would normally be returned by decoding up through the final packet. This is used to end the stream somewhere other than an even frame boundary. The granule position of the most recent audio page with completed packets is used to make this determination, or '0' is used if there were no previous audio pages with a completed packet. The difference between these granule positions indicates how many samples to keep after decoding the packets that completed on the final page. The remaining samples are discarded. The number of discarded samples SHOULD be smaller than the number decoded from the last packet.<br />
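<br />
A minimal sketch of the end-trimming arithmetic described above. The function name is hypothetical, and clamping to the number of decoded samples (rather than rejecting the stream) when the final granule position claims more audio than was decoded is an assumption of this sketch.<br />

```python
def samples_to_keep(final_granule: int, previous_granule: int,
                    decoded_samples: int) -> int:
    """Samples to keep from the packets that complete on the final page.

    previous_granule is the granule position of the most recent audio page
    with a completed packet, or 0 if there was no such page.
    decoded_samples is the number of samples decoded from the packets that
    complete on the final page.
    """
    keep = final_granule - previous_granule
    if keep < 0:
        raise ValueError("final granule position moves backwards")
    # Assumption: never report more samples than were actually decoded.
    return min(keep, decoded_samples)
```

For example, if the previous page ended at granule position 48000, the final page carries one 960-sample packet, and its granule position is 48500, then 500 samples are kept and the remaining 460 are discarded.<br />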
<br />
The granule position of the first audio page with a completed packet MAY be larger than the number of samples contained in packets that complete on that page; however, it MUST NOT be smaller unless that page has the 'end of stream' flag set. Allowing a granule position larger than the number of samples allows the beginning of a stream to be cropped without rewriting the granule position of all the remaining pages. This means that the PCM sample position just before the first sample to be played may be larger than '0', but the PCM sample position relative to '0' should still be used for the purposes of synchronization when multiplexing with other logical streams. This does not affect the behavior of pre-skip: exactly 'pre-skip' samples should be skipped from the beginning of the decoded output, even if the initial PCM sample position is greater than zero.<br />
<br />
On the other hand, a granule position that is smaller than the number of decoded samples prevents a demuxer from working backwards to assign each packet or each individual sample a valid granule position, since granule positions must be non-negative. A decoder MUST reject as invalid any stream where the granule position is smaller than the number of samples contained in packets that complete on the first page with a completed packet, unless that page has the 'end of stream' flag set. It MAY defer this action until it decodes the last packet completed on that page. If that page has the 'end of stream' flag set, a demuxer can work forwards from the granule position '0', but MUST reject as invalid any stream where the granule position is smaller than the 'pre-skip' amount. This would indicate that more samples should be skipped from the initial decoded output than exist in the stream.<br />
<br />
==== ID Header ====<br />
<br />
0 1 2 3<br />
0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1<br />
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+<br />
| 'O' | 'p' | 'u' | 's' |<br />
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+<br />
| 'H' | 'e' | 'a' | 'd' |<br />
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+<br />
| version = 1 | channel count | pre-skip |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+<br />
| original input sample rate in Hz |<br />
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+<br />
| output gain Q7.8 in dB | channel map | |<br />
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ :<br />
| |<br />
: optional channel mapping table... :<br />
| |<br />
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+<br />
<br />
<br />
- Magic signature: "OpusHead" (64 bits)<br />
- Version number (8 bits unsigned): 1 for this spec
- Channel count 'c' (8 bits unsigned): MUST be > 0<br />
- Pre-skip (16 bits unsigned, little endian)<br />
- Input sample rate (32 bits unsigned, little endian): informational only<br />
- Output gain (16 bits, little endian, signed Q7.8 in dB) to apply when<br />
decoding<br />
- Channel mapping family (8 bits unsigned)<br />
-- 0 = one stream: mono or L,R stereo<br />
-- 1 = channels in vorbis spec order: mono or L,R stereo or ... or FL,C,FR,RL,RR,LFE, ...<br />
-- 2..254 = reserved (treat as 255)<br />
-- 255 = no defined channel meaning<br />
If channel mapping family > 0<br />
- Stream count 'N' (8 bits unsigned): MUST be > 0<br />
- Two-channel stream count 'M' (8 bits unsigned): MUST satisfy M <= N, M+N <= 255<br />
- Channel mapping (8*c bits)<br />
-- one stream index (8 bits unsigned) per channel (255 means silent throughout the file)<br />
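<br />
The field list above can be read with a short parser. The following is only a sketch: the <code>OpusHead</code> tuple and <code>parse_opus_head()</code> are hypothetical names, the version byte is returned without being interpreted, and the family 0 defaults (one stream, c-1 two-channel streams, identity mapping) are the ones given in the discussion of the individual fields.<br />

```python
import struct
from typing import NamedTuple, Tuple


class OpusHead(NamedTuple):
    version: int
    channels: int
    pre_skip: int
    input_sample_rate: int
    output_gain: int            # raw signed Q7.8 value, in 1/256 dB steps
    mapping_family: int
    stream_count: int           # N
    coupled_count: int          # M
    channel_mapping: Tuple[int, ...]


def parse_opus_head(data: bytes) -> OpusHead:
    """Parse an identification header packet (illustrative sketch)."""
    if len(data) < 19 or data[:8] != b"OpusHead":
        raise ValueError("not an OpusHead packet")
    # Fixed part after the magic: u8, u8, u16, u32, s16, u8 (little endian).
    version, channels, pre_skip, rate, gain, family = struct.unpack_from(
        "<BBHIhB", data, 8)
    if channels == 0:
        raise ValueError("channel count must be > 0")
    if family == 0:
        if channels > 2:
            raise ValueError("mapping family 0 allows only 1 or 2 channels")
        # Nothing else is coded; use the documented defaults.
        return OpusHead(version, channels, pre_skip, rate, gain, family,
                        1, channels - 1, tuple(range(channels)))
    if len(data) < 21 + channels:
        raise ValueError("truncated channel mapping table")
    streams, coupled = data[19], data[20]
    if streams == 0 or coupled > streams or streams + coupled > 255:
        raise ValueError("invalid stream counts")
    mapping = tuple(data[21:21 + channels])
    return OpusHead(version, channels, pre_skip, rate, gain, family,
                    streams, coupled, mapping)
```

A real demuxer would additionally check the version byte and validate each mapping index against the stream counts.<br />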
<br />
<br />
Some discussion is in order.<br />
<br />
* '''Magic signature'''<br />
The magic signature "OpusHead" allows codec identification and is human readable. Starting with 'Op' helps distinguish it from data packets, as this is an invalid TOC sequence.<br />
<br />
* '''Version'''<br />
The version number must always be '1' for this version of the encapsulation specification. This 8-bit field is partitioned into two sub-fields. The lower four bits constitute a "minor revision", while the upper four bits correspond to a "major revision". Implementations SHOULD treat streams with an unknown minor revision as backwards-compatible as long as they recognize the major revision. When encountering a stream with an unknown major revision, implementations SHOULD assume it is not backwards compatible. The initial version '1' was chosen to keep implementations from relying on this byte as a null terminator for the OpusHead string.<br />
<br />
* '''Channel count''' 'c'<br />
This byte specifies the number of output channels (1...255) for this Ogg Opus stream.<br />
<br />
* '''Pre-skip'''<br />
This is the number of samples (at 48 kHz) to discard from the decoder output when starting playback, and also the number to subtract from a page's granule position to calculate its PCM sample position.<br />
<br />
When constructing cropped Ogg Opus streams, we recommend a pre-skip of at least 3840 samples (80 ms) to ensure complete convergence.<br />
<br />
* '''Input sample rate'''<br />
This is ''not'' the sample rate to use for playback of the encoded data.<br />
<br />
Opus has a handful of coding modes, with internal audio bandwidths of 4, 6, 8, 12, and 20 kHz. Each packet in the stream may have a different audio bandwidth. Regardless of the audio bandwidth, the reference decoder supports decoding any stream at a sample rate of 8, 12, 16, 24, or 48 kHz. The original sample rate of the encoder input is not preserved by the lossy compression.<br />
<br />
An Ogg Opus player SHOULD select the playback sample rate according to the following procedure:<br />
* If the hardware supports 48 kHz playback, decode at 48 kHz,<br />
* else if the hardware's highest available sample rate is a supported rate, decode at this sample rate,<br />
* else if the hardware's highest available sample rate is less than 48 kHz, decode at the next higher supported rate and resample,<br />
* else decode at 48 kHz and resample.<br />
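<br />
The selection procedure above can be sketched as follows. The function name and the idea of passing the hardware's available rates as a list are assumptions made for illustration.<br />

```python
# Decoding rates supported by the reference decoder, in Hz.
SUPPORTED_RATES = (8000, 12000, 16000, 24000, 48000)


def choose_decode_rate(hardware_rates):
    """Return (decode_rate, needs_resampling) for the given hardware rates."""
    rates = set(hardware_rates)
    if not rates:
        raise ValueError("no hardware sample rates given")
    if 48000 in rates:
        return 48000, False
    highest = max(rates)
    if highest in SUPPORTED_RATES:
        return highest, False
    if highest < 48000:
        # Decode at the next higher supported rate, then resample down.
        return min(r for r in SUPPORTED_RATES if r > highest), True
    return 48000, True
```

For hardware limited to 22050 Hz this selects a 24 kHz decode followed by resampling; for hardware that only offers 96 kHz it selects a 48 kHz decode followed by resampling.<br />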
<br />
However, the 'input sample rate' field allows the encoder to pass the sample rate of the original input stream as metadata. This may be useful when the user requires the output sample rate to match the input sample rate. For example, a non-player decoder writing PCM format to disk might choose to resample the output audio back to the original input rate to reduce surprise to the user, who might reasonably expect to get back a file with the same sample rate as the one they fed to the encoder.<br />
<br />
A value of zero indicates 'unspecified'. Encoders SHOULD write the actual input rate or zero, but decoder implementations which do something with this field SHOULD take care to behave sanely if given crazy values (e.g. don't actually upsample the output to 10 MHz if requested).<br />
<br />
* '''Output gain'''<br />
This is a gain to be applied by the decoder. Virtually all players and media frameworks should apply it by default. If a player chooses to apply any volume adjustment or gain modification, such as the R128_TRACK_GAIN or a user-facing volume knob, the adjustment MUST be applied ''in addition'' to this output gain in order to achieve playback at the desired volume.<br />
<br />
An encoder SHOULD set the output gain to zero, and instead apply any gain prior to encoding, when this is possible and does not conflict with the user's wishes. The output gain should only be nonzero when the gain is adjusted after encoding, or when the user wishes to adjust the gain for playback while preserving the ability to recover the original signal amplitude.<br />
<br />
Although the output gain has enormous range (+/- 128 dB, enough to amplify inaudible sounds to the threshold of physical pain), most applications can only reasonably use a small portion of this range around zero. The large range serves in part to ensure that gain can always be losslessly transferred between OpusHead and R128_TRACK_GAIN (see below) without saturating.<br />
<br />
The gain is the 20 log<sub>10</sub> ratio of output to input sample values to be applied to the decoder output. E.g. <code>sample *= pow(10, header.gain/(20.*256))</code> where header.gain is the raw 16 bit Q7.8 value from the header.<br />
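<br />
The same computation as a small sketch operating on floating-point samples. The function names are hypothetical, and no clipping or saturation is applied here, which a real player would need to handle.<br />

```python
def gain_scale(raw_gain: int) -> float:
    """Linear scale factor for a raw signed Q7.8 output gain (1/256 dB)."""
    return 10.0 ** (raw_gain / (20.0 * 256.0))


def apply_output_gain(samples, raw_gain: int):
    """Return a new list with the output gain applied to each sample."""
    scale = gain_scale(raw_gain)
    return [s * scale for s in samples]
```

A raw value of 5120 is 20 dB and multiplies every sample by 10; a raw value of 0 leaves the samples unchanged.<br />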
<br />
* '''Channel mapping family'''<br />
This byte indicates the order and semantic meaning of the various channels encoded in each Opus packet. <br />
<br />
Each possible value of this byte indicates a ''mapping family'', which defines a set of allowed numbers of channels, and the ordered set of channel names for each allowed number of channels. Currently there are three defined mapping families, although more may be added:<br />
<br />
* Family 0 (RTP mapping)<br />
** Allowed numbers of channels: 1 or 2<br />
** 1 channel: monophonic (mono)<br />
** 2 channels: stereo (left, right)<br />
** '''Special mapping''': this channel mapping value also indicates that the content consists of a single Opus stream that is stereo if and only if c==2, with stream index 0 mapped to channel 0, and (if stereo) stream index 1 mapped to channel 1. When the channel mapping byte has this value, no further fields are present in OpusHead.<br />
* Family 1 ([http://www.xiph.org/vorbis/doc/Vorbis_I_spec.html#x1-800004.3.9 Vorbis channel order])<br />
** Allowed numbers of channels: 1 ... 8<br />
** Channel meanings depend on the number of channels, see the Vorbis mapping for details.<br />
* Family 255 (no defined channel meaning)<br />
** Allowed numbers of channels: 1...255<br />
** Channels are unidentified. General-purpose players SHOULD NOT attempt to play these streams, and offline decoders MAY deinterleave the output into separate PCM files, one per channel. Decoders SHOULD NOT produce output for channels mapped to stream index 255 (pure silence) unless they have no other way to indicate the index of non-silent channels.<br />
<br />
The remaining channel mapping families (2...254) are reserved. A decoder encountering a reserved mapping byte should act as though the mapping byte is 255.<br />
<br />
An Ogg Opus player MUST play any Ogg Opus stream with a channel mapping family of 0 or 1, even if the number of channels does not match the physically connected audio hardware. Players SHOULD perform channel mixing to increase or reduce the number of channels as needed.<br />
<br />
* '''Stream count''' 'N'<br />
This field indicates the total number of streams so the decoder can correctly parse the packed Opus packets inside the Ogg packet.<br />
<br />
For channel mapping family 0, this value defaults to 1, and is not coded.<br />
<br />
A multi-channel Opus file is composed of one or more individual Opus streams, each of which produces one or two channels of decoded data. Each Ogg packet contains one Opus packet from each stream. The first N-1 Opus packets are packed using the self-delimiting framing from Appendix B of the Opus Specification. The remaining Opus packet is packed using the regular, undelimited framing from Section 3 of the Opus Specification. All the Opus packets in a single Ogg packet MUST be constrained to produce the same number of decoded samples. A decoder SHOULD treat any Opus packet whose duration is different from that of the first Opus packet in an Ogg packet as if it were an Opus packet with an illegal TOC sequence.<br />
<br />
* '''Two-channel stream count''' 'M'<br />
Describes the number of streams whose decoders should be configured to produce two channels. This must be no larger than the total number of streams.<br />
<br />
For channel mapping family 0, this value defaults to c-1 (i.e., 0 for mono and 1 for stereo), and is not coded.<br />
<br />
Each packet in an Opus stream has an internal channel count of 1 or 2, which can change from packet to packet. This is selected by the encoder depending on the bitrate and the contents being encoded. The original channel count of the encoder input is not preserved by the lossy compression.<br />
<br />
Regardless of the internal channel count, any Opus stream may be decoded as mono (a single channel) or stereo (two channels) by appropriate initialization of the decoder. The "two-channel stream count" field indicates that the first M Opus decoders should be initialized in stereo mode, and the remaining N-M decoders should be initialized in mono mode. The total number of decoded channels (M+N) MUST be no larger than 255, as there is no way to index more channels than that in the channel mapping.<br />
<br />
* '''Channel mapping'''<br />
Contains one index per output channel indicating which decoded channel should be used. If the index is less than 2*M, the output MUST be taken from decoding stream (index/2) as stereo and selecting the left channel if index is even, and the right channel if index is odd. If the index is 2*M or larger, the output MUST be taken from decoding stream (index-M) as mono. As a special case, an index of 255 means that the corresponding output channel MUST contain pure silence.<br />
<br />
For channel mapping family 0, the first index defaults to 0, and if c==2, the second index defaults to 1. Neither index is coded.<br />
<br />
The number of output channels (c) is not constrained to match the number of decoded channels (M+N). A single index MAY appear multiple times, i.e., the same decoded channel may be mapped to multiple output channels. Some decoded channels might not be assigned to any output channel, as well.<br />
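<br />
The index rules above can be sketched as a lookup from a channel mapping index to a decoded stream and a channel within that stream. The function name is hypothetical, and rejecting indices at or beyond M+N with an exception is an assumption of this sketch.<br />

```python
def resolve_channel(index: int, streams: int, coupled: int):
    """Map a channel mapping index to (stream, channel_within_stream).

    streams is N and coupled is M from the ID header. Returns None for
    index 255, meaning the output channel is pure silence.
    """
    if index == 255:
        return None
    if index >= streams + coupled:
        raise ValueError("index refers to a decoded channel that does not exist")
    if index < 2 * coupled:
        # Stereo streams come first: even index is left, odd index is right.
        return index // 2, index % 2
    # Mono streams follow the stereo ones.
    return index - coupled, 0
```

With N=4 and M=2 (illustrative values), index 3 selects the right channel of stream 1, and index 5 selects the mono stream 3.<br />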
<br />
==== Comment Header ====<br />
<br />
- 8 byte 'OpusTags' magic signature (64 bits)<br />
- The remaining data follows the vorbis-comment header design used in OggVorbis (without the "framing-bit"), OggTheora, and Speex:<br />
* Vendor string (always present).<br />
** 4-byte little-endian length field, followed by length bytes of UTF-8 vendor string.<br />
* TAG=value metadata strings (zero or more).<br />
** 4-byte little-endian string count.<br />
** Count strings consisting of 4-byte little-endian length and length bytes of UTF-8 string in "tag=value" form.<br />
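<br />
A minimal sketch of reading the layout just described. <code>parse_opus_tags()</code> is a hypothetical name, and the bounds checks reflect one reasonable way to avoid trusting the length fields; they are not mandated by the text above.<br />

```python
import struct


def parse_opus_tags(data: bytes):
    """Return (vendor, comments) from a comment header packet."""
    if data[:8] != b"OpusTags":
        raise ValueError("not an OpusTags packet")
    pos = 8

    def read_u32() -> int:
        nonlocal pos
        if pos + 4 > len(data):
            raise ValueError("truncated length field")
        (value,) = struct.unpack_from("<I", data, pos)
        pos += 4
        return value

    def read_string() -> str:
        nonlocal pos
        length = read_u32()
        # Check the length against the packet before slicing.
        if pos + length > len(data):
            raise ValueError("string runs past the end of the packet")
        text = data[pos:pos + length].decode("utf-8")
        pos += length
        return text

    vendor = read_string()
    count = read_u32()
    comments = [read_string() for _ in range(count)]
    return vendor, comments
```

Because every string is bounds-checked as it is read, an oversized string count fails on the first missing string instead of triggering a large allocation.<br />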
<br />
One new comment field is introduced for Ogg Opus:<br />
R128_TRACK_GAIN=-573 <br />
representing the volume shift needed to normalize the track's volume. The gain is a Q7.8 fixed point number in dB, as in the OpusHead "output gain" field. This field is similar to the [[VorbisComment#Replay_Gain|REPLAYGAIN_TRACK_GAIN field in Vorbis]], although the normal volume reference is the [http://tech.ebu.ch/loudness EBU-R128] standard.<br />
<br />
An Ogg Opus file MUST NOT have more than one such field, and if present its value MUST be an integer from -32768 to +32767 inclusive, represented in ASCII with no whitespace. If present, it MUST correctly represent the R128 normalization gain (relative to the OpusHead output gain). If a player chooses to make use of the TRACK_GAIN, it MUST be applied ''in addition'' to the OpusHead output gain. If an encoder populates the TRACK_GAIN field, and the output gain is not otherwise constrained or specified, the encoder SHOULD write the R128 gain into the OpusHead output gain and write "R128_TRACK_GAIN=0". If a tool modifies the OpusHead "output gain" field, it MUST also update or remove the R128_TRACK_GAIN comment field.<br />
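<br />
The constraints on this field could be checked roughly as follows. This is a sketch only: the function name is hypothetical, the comments are assumed to be already-decoded "tag=value" strings, the tag name is matched case-insensitively as is usual for this comment format, and accepting an explicit leading '+' is an assumption.<br />

```python
import re


def r128_track_gain(comments):
    """Return the raw Q7.8 R128_TRACK_GAIN value, or None if absent."""
    prefix = "R128_TRACK_GAIN="
    values = [c[len(prefix):] for c in comments
              if c[:len(prefix)].upper() == prefix]
    if not values:
        return None
    if len(values) > 1:
        raise ValueError("more than one R128_TRACK_GAIN field")
    # ASCII digits only, optional sign, no whitespace.
    if not re.fullmatch(r"[+-]?[0-9]+", values[0]):
        raise ValueError("R128_TRACK_GAIN is not a plain ASCII integer")
    gain = int(values[0])
    if not -32768 <= gain <= 32767:
        raise ValueError("R128_TRACK_GAIN out of range")
    return gain
```

The returned value is in the same 1/256 dB units as the OpusHead output gain, so the example value -573 is roughly -2.24 dB.<br />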
<br />
There is no comment field corresponding to Replaygain's ALBUM_GAIN; that information should instead be stored in the OpusHead 'output gain' field.<br />
<br />
To avoid confusion with multiple normalization schemes, an OpusTags packet SHOULD NOT contain any of the REPLAYGAIN_TRACK_GAIN, REPLAYGAIN_TRACK_PEAK, REPLAYGAIN_ALBUM_GAIN, or REPLAYGAIN_ALBUM_PEAK fields.<br />
<br />
== Other Implementation Notes ==<br />
<br />
When seeking within an Ogg Opus stream, the decoder should start decoding (and discarding the output) at least 3840 samples (80 ms) prior to the seek point in order to ensure that the output audio is correct at the seek point.<br />
<br />
Technically valid Opus packets can be arbitrarily large due to the padding format, although the amount of non-padding data they can contain is bounded. These packets may be spread over a similarly enormous number of Ogg pages. Decoders SHOULD avoid attempting to allocate excessive amounts of memory when presented with a very large packet. The presence of an extremely large packet in the stream could indicate a potential memory exhaustion attack or stream corruption. Decoders should reject a packet that is too large to process, and print a warning message.<br />
<br />
In an Ogg Opus stream, the largest possible valid packet that does not use padding has a size of (61,298*N - 2) bytes, or about 60 kB per Opus stream. With 255 streams, this is 15,630,988 bytes (14.9 MB) and can span up to 61,298 Ogg pages, all but one of which will have a granulepos of -1. This is of course a very extreme packet, consisting of 255 streams, each containing 120 ms of audio encoded as 2.5 ms frames, each frame using the maximum possible number of bytes (1275) and stored in the least efficient manner allowed (a VBR code 3 Opus packet). Even in such a packet, most of the data will be zeros, as 2.5 ms frames, which are required to run in the MDCT mode, cannot actually use all 1275 bytes. The largest packet consisting entirely of useful data is (15,326*N - 2) bytes, or about 15 kB per stream. This corresponds to 120 ms of audio encoded as 10 ms frames in either LP or Hybrid mode, but at a data rate of over 1 Mbps, which makes little sense for the quality achieved. A more reasonable limit is (7,664*N - 2) bytes, or about 7.5 kB per stream. This corresponds to 120 ms of audio encoded as 20 ms stereo MDCT-mode frames, with a total bitrate just under 511 kbps (not counting the Ogg encapsulation overhead). With N=8, the maximum useful number of streams for the channel meanings currently defined by mapping family 1, this gives a maximum packet size of 61,310 bytes, or just under 60 kB. This is still quite conservative, as it assumes each output channel is taken from one decoded channel of a stereo packet. An implementation could reasonably choose any of these numbers for its internal limits.<br />
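<br />
The limits quoted above all have the form (per-stream size * N - 2) bytes, so an implementation choosing one of them could compute its internal limit as in the sketch below. The constant and function names are illustrative.<br />

```python
# Per-stream byte counts taken from the paragraph above.
MAX_VALID_PER_STREAM = 61298     # largest valid packet without padding
MAX_USEFUL_PER_STREAM = 15326    # largest packet made entirely of useful data
REASONABLE_PER_STREAM = 7664     # 120 ms of 20 ms stereo MDCT-mode frames


def max_packet_bytes(streams: int, per_stream: int) -> int:
    """Packet size limit in bytes for an Ogg packet carrying N Opus streams."""
    if not 1 <= streams <= 255:
        raise ValueError("stream count must be between 1 and 255")
    return per_stream * streams - 2
```

With N=255 and the first constant this reproduces the 15,630,988-byte figure, and with N=8 and the last constant it gives 61,310 bytes.<br />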
<br />
== Test Vectors ==<br />
<br />
* [[OggOpus/testvectors|Planned test vectors for OggOpus]]<br />
* Opus test vectors</div>Derfhttps://wiki.xiph.org/index.php?title=OggOpus&diff=13366OggOpus2012-05-10T16:59:49Z<p>Derf: Add more reasonable implementation guidance for packet size limits.</p>
<hr />
<div>== Ogg Mapping for Opus ==<br />
<br />
The IETF Opus codec is a low-latency audio codec optimized for both voice and general-purpose audio. See the [http://tools.ietf.org/html/draft-ietf-codec-opus Opus Specification] for technical details.<br />
<br />
Almost everything about Opus is either fixed or dynamically switchable, so most of the usual ID and setup header parameters in the header packets of an Ogg encapsulation aren't needed. In particular, bitrate, packet duration, mono/stereo flags, and coding modes are all dynamically switchable from packet to packet. The first one or two bytes in each data packet, the start of the 'TOC sequence' that defines the layout of the packet, specifies all of these parameters for that particular packet. See Section 3 of the Opus Specification for the exact format of the TOC sequence.<br />
<br />
The remaining parameters that must be signaled are<br />
<br />
* The magic number for stream identification,<br />
* The stream count and coupling for multichannel audio, and<br />
* Any metadata or tags.<br />
<br />
=== Packet Organization ===<br />
<br />
Opus is framed in a continuous logical [http://www.xiph.org/ogg/doc/framing.html Ogg stream]. <br />
<br />
There are two mandatory headers. The granule position of the pages containing these headers MUST be zero.<br />
<br />
The first packet in the logical Ogg stream MUST contain the identification header, which uniquely identifies a stream as Opus audio. It MUST begin with the 8 bytes "OpusHead". It MUST be placed alone in the first page of the logical Ogg stream. This page MUST have the ’beginning of stream’ flag set.<br />
<br />
The second Opus packet MUST contain the comment header. It MUST begin with the 8 bytes "OpusTags". It MAY span one or more pages, beginning on the second page of the logical stream. However many pages it spans, the comment header packet MUST finish the page on which it ends.<br />
<br />
All subsequent pages are audio data pages and the packets they contain are audio data packets. The first audio page SHOULD NOT have the 'continued packet' flag set (which would indicate the first packet is continued from a previous page). Packets MUST be placed into Ogg pages in order until the end of stream. Audio packets MAY span page boundaries. A decoder MUST treat a zero-byte audio packet as if it were an Opus packet with an illegal TOC sequence. The last page SHOULD have the 'end of stream' flag set, but implementations should be prepared to deal with truncated streams which do not have a page marked 'end of stream'. The final packet SHOULD complete on the last page, i.e., the final lacing value should be less than 255. There MUST NOT be any more pages in an Opus logical stream after a page marked 'end of stream'.<br />
<br />
=== Granule Position ===<br />
<br />
The granule position of an audio page encodes the total number of PCM samples in the stream up to and including the last fully-decodable sample from the last packet ''completed'' on that page. A page that is entirely spanned by a single packet (that completes on a subsequent page) has no granule position, and the granule position field MUST be set to the special value ’-1’ in two's complement.<br />
<br />
The granule position of an audio page is in units of PCM audio samples at a fixed rate of 48 kHz (per channel; a stereo stream’s granule position does not increment at twice the speed of a mono stream). It is possible to run a decoder at other sampling rates, but the format and this specification always count samples assuming a 48 kHz decoding rate.<br />
<br />
The duration of an Opus packet may be any multiple of 2.5 ms, up to a maximum of 120 ms. This duration is encoded in the TOC sequence at the beginning of each packet. The number of samples returned by a decoder corresponds to this duration exactly, even for the first few packets. For example, a 20 ms packet fed to a decoder running at 48 kHz will always return 960 samples. A demuxer can parse these TOC sequences to work backwards or forwards from a packet with a known granule position (i.e., the last packet completed on some page) in order to assign granule positions to every packet, or even every individual sample. The one exception is the last page in the stream, as described below.<br />
<br />
All other pages with completed packets after the first MUST have a granule position equal to the number of samples contained in packets that complete on that page plus the granule position of the most recent page with completed packets. This guarantees that a demuxer can assign individual packets the same granule position when working forwards as when working backwards. There must not be any gaps. In order to support capturing a stream that uses discontinuous transmission (DTX), an encoder SHOULD emit packets that explicitly request the use of Packet Loss Concealment (PLC) (i.e., with a frame length of 0, as defined in Section 3.2.1 of the Opus Specification) in place of the packets that were not transmitted.<br />
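<br />
The duration rules above can be sketched in C as follows. This is only an illustration: the function names are hypothetical, and the bit layout of the TOC byte (a 5-bit configuration number, a stereo bit, and a 2-bit frame count code) is taken from Section 3 of the Opus Specification rather than from this page.<br />

```c
#include <stddef.h>
#include <stdint.h>

/* Samples (at 48 kHz) in ONE frame, derived from the 5-bit config field
   in the top bits of the TOC byte. */
int toc_frame_samples_48k(unsigned char toc)
{
    int config = toc >> 3;
    if (config >= 16) {              /* MDCT modes: 2.5, 5, 10, 20 ms */
        return 120 << (config & 3);
    } else if (config >= 12) {       /* Hybrid modes: 10 or 20 ms */
        return (config & 1) ? 960 : 480;
    } else {                         /* LP modes: 10, 20, 40, 60 ms */
        int idx = config & 3;
        return (idx == 3) ? 2880 : (480 << idx);
    }
}

/* Total samples (at 48 kHz) in a packet, or -1 if the packet is malformed. */
int packet_samples_48k(const unsigned char *data, size_t len)
{
    int frames, total;
    if (len < 1) return -1;          /* zero-byte packet: illegal TOC */
    switch (data[0] & 3) {           /* frame count code */
    case 0:  frames = 1; break;
    case 1:
    case 2:  frames = 2; break;
    default:                         /* code 3: count is in the second byte */
        if (len < 2) return -1;
        frames = data[1] & 0x3F;
        break;
    }
    total = frames * toc_frame_samples_48k(data[0]);
    if (frames == 0 || total > 5760) return -1;   /* more than 120 ms */
    return total;
}

/* Granule position required on a page, given the previous page that
   completed packets and the samples completed on the current page. */
int64_t next_page_granule(int64_t prev_granule, int64_t samples_completed)
{
    return prev_granule + samples_completed;
}
```

A demuxer would sum <code>packet_samples_48k()</code> over the packets completed on a page and compare the result of <code>next_page_granule()</code> with the granule position actually stored on that page.<br />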
<br />
There is some amount of latency introduced during the decoding process, to allow for overlap in the MDCT modes, stereo mixing in the LP modes, and resampling, and the encoder will introduce even more latency (though the exact amount is not specified). Therefore the first few samples produced by the decoder do not correspond to any real, input audio, but are instead composed of padding inserted by the encoder to compensate for this latency. These samples must be stored and decoded, as Opus is an asymptotically convergent predictive codec, meaning the decoded contents of each frame depend on the recent history of decoder inputs. A 'pre-skip' field in the ID header signals the number of samples which should be skipped at the beginning of the stream. This provides sufficient history to the decoder so that it has already converged before the stream's output begins. It may also be used to perform sample-accurate cropping of existing encoded streams. This amount need not be a multiple of 2.5 ms, may be smaller than a single packet, or may span the contents of several packets.<br />
<br />
The PCM sample position is determined from the granule position using the formula<br />
<br />
'PCM sample position' = 'granule position' - 'pre-skip' .<br />
<br />
For example, if the granule position of the first page is 59971, and the pre-skip is 11971, then the PCM sample position of the last decoded sample from the first page is 48000. This may be converted into a playback time using the formula<br />
<br />
                   'PCM sample position'<br />
 'playback time' = --------------------- .<br />
                          48000.0<br />
<br />
The initial PCM sample position before any samples are played is normally '0'. In this case, the PCM sample position of the first audio sample to be played starts at '1', because it marks the time on the clock ''after'' that sample has been played, and a stream that is exactly one second long has a final PCM sample position of '48000', as in the example here.<br />
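<br />
As a minimal sketch, the two formulas above translate directly into C. The function names are hypothetical and chosen only for this illustration.<br />

```c
#include <stdint.h>

/* 'PCM sample position' = 'granule position' - 'pre-skip' */
int64_t pcm_sample_position(int64_t granule_position, uint16_t pre_skip)
{
    return granule_position - (int64_t)pre_skip;
}

/* 'playback time' in seconds; the format always counts at 48 kHz. */
double playback_time_seconds(int64_t pcm_position)
{
    return (double)pcm_position / 48000.0;
}
```

With the numbers from the example, <code>pcm_sample_position(59971, 11971)</code> yields 48000, which corresponds to a playback time of one second.<br />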
<br />
Vorbis streams use a granule position smaller than the number of audio samples contained in the first page to indicate that some of those samples must be trimmed from the output. However, to do so, Vorbis requires that the first page contains exactly two packets, in order to allow the decoder to perform PCM position adjustments before needing to return any PCM data. Opus uses the pre-skip mechanism for this purpose instead, since the encoder may introduce more than a single packet's worth of latency, and since very large packets in streams with a very large number of channels may not fit on a single page.<br />
<br />
The page with the 'end of stream' flag set MAY have a granule position that indicates the page contains less audio data than would normally be returned by decoding up through the final packet. This is used to end the stream somewhere other than an even frame boundary. The granule position of the most recent audio page with completed packets is used to make this determination, or '0' is used if there were no previous audio pages with a completed packet. The difference between these granule positions indicates how many samples to keep after decoding the packets that completed on the final page. The remaining samples are discarded. The number of discarded samples SHOULD be smaller than the number decoded from the last packet.<br />
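<br />
The end trimming rule can be sketched as below. The function name is hypothetical, and returning -1 when the difference is negative or exceeds the decoded sample count is a choice made for this sketch, not something the text above specifies.<br />

```c
#include <stdint.h>

/* Number of samples to keep from the packets completed on the final page.
   prev_granule is the granule position of the most recent audio page with
   completed packets, or 0 if there was none. decoded is the number of
   samples produced by decoding the packets completed on the final page. */
int64_t eos_samples_to_keep(int64_t eos_granule, int64_t prev_granule,
                            int64_t decoded)
{
    int64_t keep = eos_granule - prev_granule;
    if (keep < 0 || keep > decoded)
        return -1;               /* inconsistent; handling is an assumption */
    return keep;                 /* the remaining decoded - keep are discarded */
}
```

For instance, if the previous page ended at granule position 48000, the final page completes one 960-sample packet, and its granule position is 48500, then 500 samples are kept and 460 are discarded.<br />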
<br />
The granule position of the first audio page with a completed packet MAY be larger than the number of samples contained in packets that complete on that page, however it MUST NOT be smaller, unless that page has the 'end of stream' flag set. Allowing a granule position larger than the number of samples allows the beginning of a stream to be cropped without rewriting the granule position of all the remaining pages. This means that the PCM sample position just before the first sample to be played may be larger than '0', but the PCM sample position relative to '0' should still be used for the purposes of synchronization when multiplexing with other logical streams. This does not affect the behavior of pre-skip: exactly 'pre-skip' samples should be skipped from the beginning of the decoded output, even if the initial PCM sample position is greater than zero.<br />
<br />
On the other hand, a granule position that is smaller than the number of decoded samples prevents a demuxer from working backwards to assign each packet or each individual sample a valid granule position, since granule positions must be non-negative. A decoder MUST reject as invalid any stream where the granule position is smaller than the number of samples contained in packets that complete on the first page with a completed packet, unless that page has the 'end of stream' flag set. It MAY defer this action until it decodes the last packet completed on that page. If that page has the 'end of stream' flag set, a demuxer can work forwards from the granule position '0', but MUST reject as invalid any stream where the granule position is smaller than the 'pre-skip' amount. This would indicate that more samples should be skipped from the initial decoded output than exist in the stream.<br />
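<br />
The checks on the first audio page with a completed packet might look like the following sketch. The function name and the 0/-1 return convention are assumptions made for illustration.<br />

```c
#include <stdint.h>

/* Returns 0 if the first audio page with a completed packet is acceptable,
   or -1 if the stream must be rejected as invalid.
   samples_completed: samples in the packets that complete on that page.
   eos: nonzero if the page has the 'end of stream' flag set. */
int check_first_audio_page(int64_t granule, int64_t samples_completed,
                           uint16_t pre_skip, int eos)
{
    if (!eos) {
        /* A larger granule position is allowed (cropped stream start),
           a smaller one is not. */
        return (granule < samples_completed) ? -1 : 0;
    }
    /* Single-page stream: end trimming may make the granule position
       smaller than the decoded sample count, but never smaller than
       the pre-skip. */
    return (granule < (int64_t)pre_skip) ? -1 : 0;
}
```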
<br />
==== ID Header ====<br />
<br />
  0                   1                   2                   3<br />
  0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1<br />
 +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+<br />
 |      'O'      |      'p'      |      'u'      |      's'      |<br />
 +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+<br />
 |      'H'      |      'e'      |      'a'      |      'd'      |<br />
 +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+<br />
 |  version = 1  | channel count |           pre-skip            |<br />
 +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+<br />
 |               original input sample rate in Hz                |<br />
 +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+<br />
 |    output gain Q7.8 in dB     |  channel map  |               |<br />
 +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+               :<br />
 |                                                               |<br />
 :               optional channel mapping table...               :<br />
 |                                                               |<br />
 +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+<br />
<br />
<br />
- Magic signature: "OpusHead" (64 bits)<br />
- Version number (8 bits unsigned): 1 for this spec<br />
- Channel count 'c' (8 bits unsigned): MUST be > 0<br />
- Pre-skip (16 bits unsigned, little endian)<br />
- Input sample rate (32 bits unsigned, little endian): informational only<br />
- Output gain (16 bits, little endian, signed Q7.8 in dB) to apply when decoding<br />
- Channel mapping family (8 bits unsigned)<br />
-- 0 = one stream: mono or L,R stereo<br />
-- 1 = channels in vorbis spec order: mono or L,R stereo or ... or FL,C,FR,RL,RR,LFE, ...<br />
-- 2..254 = reserved (treat as 255)<br />
-- 255 = no defined channel meaning<br />
If channel mapping family > 0<br />
- Stream count 'N' (8 bits unsigned): MUST be > 0<br />
- Two-channel stream count 'M' (8 bits unsigned): MUST satisfy M <= N, M+N <= 255<br />
- Channel mapping (8*c bits)<br />
-- one stream index (8 bits unsigned) per channel (255 means silent throughout the file)<br />
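<br />
As a rough illustration of the layout above, a parser for the ID header could be sketched as follows. The struct and function names are hypothetical, and accepting any version whose upper four bits are zero is this sketch's reading of the major/minor revision rule discussed under '''Version'''.<br />

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

typedef struct {
    uint8_t  version;
    uint8_t  channels;      /* c */
    uint16_t pre_skip;      /* samples at 48 kHz */
    uint32_t input_rate;    /* informational only, 0 = unspecified */
    int      output_gain;   /* raw signed Q7.8 value in dB */
    uint8_t  family;        /* channel mapping family */
    uint8_t  streams;       /* N */
    uint8_t  coupled;       /* M */
    uint8_t  mapping[255];  /* one stream index per output channel */
} OpusHeadSketch;

/* Returns 0 on success, -1 if the packet is not an acceptable ID header. */
int parse_opus_head(const uint8_t *p, size_t len, OpusHeadSketch *h)
{
    unsigned raw_gain;
    if (len < 19 || memcmp(p, "OpusHead", 8) != 0) return -1;
    h->version  = p[8];
    h->channels = p[9];
    if ((h->version >> 4) != 0) return -1;   /* unknown major revision */
    if (h->channels == 0) return -1;
    h->pre_skip   = (uint16_t)(p[10] | (p[11] << 8));
    h->input_rate = (uint32_t)p[12] | ((uint32_t)p[13] << 8) |
                    ((uint32_t)p[14] << 16) | ((uint32_t)p[15] << 24);
    raw_gain = (unsigned)p[16] | ((unsigned)p[17] << 8);
    h->output_gain = (int)(raw_gain ^ 0x8000u) - 0x8000;  /* sign-extend */
    h->family = p[18];
    if (h->family == 0) {
        /* No further fields are coded; defaults apply. */
        if (h->channels > 2) return -1;
        h->streams = 1;
        h->coupled = (uint8_t)(h->channels - 1);
        h->mapping[0] = 0;
        h->mapping[1] = 1;   /* only meaningful when c == 2 */
        return 0;
    }
    if (len < (size_t)21 + h->channels) return -1;
    h->streams = p[19];
    h->coupled = p[20];
    if (h->streams == 0 || h->coupled > h->streams ||
        h->streams + h->coupled > 255) return -1;
    memcpy(h->mapping, p + 21, h->channels);
    return 0;
}
```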
<br />
<br />
Some discussion is in order.<br />
<br />
* '''Magic signature'''<br />
The magic signature "OpusHead" allows codec identification and is human readable. Starting with 'Op' helps distinguish it from data packets, as this is an invalid TOC sequence.<br />
<br />
* '''Version'''<br />
The version number must always be '1' for this version of the encapsulation specification. This 8-bit field is partitioned into two sub-fields. The lower four bits constitute a "minor revision", while the upper four bits correspond to a "major revision". Implementations SHOULD treat streams with an unknown minor revision as backwards-compatible as long as they recognize the major revision. When encountering a stream with an unknown major revision, implementations SHOULD assume it is not backwards compatible. The initial version '1' was chosen to keep implementations from relying on this byte as a null terminator for the OpusHead string.<br />
<br />
* '''Channel count''' 'c'<br />
The number of channels byte specifies the number of output channels (1...255) for this Ogg Opus stream.<br />
<br />
* '''Pre-skip'''<br />
This is the number of samples (at 48 kHz) to discard from the decoder output when starting playback, and also the number to subtract from a page's granule position to calculate its PCM sample position.<br />
<br />
When constructing cropped Ogg Opus streams, we recommend a pre-skip of at least 3840 samples (80 ms) to ensure complete convergence.<br />
<br />
* '''Input sample rate'''<br />
This is ''not'' the sample rate to use for playback of the encoded data.<br />
<br />
Opus has a handful of coding modes, with internal audio bandwidths of 4, 6, 8, 12, and 20 kHz. Each packet in the stream may have a different audio bandwidth. Regardless of the audio bandwidth, the reference decoder supports decoding any stream at a sample rate of 8, 12, 16, 24, or 48 kHz. The original sample rate of the encoder input is not preserved by the lossy compression.<br />
<br />
An Ogg Opus player SHOULD select the playback sample rate according to the following procedure:<br />
* If the hardware supports 48 kHz playback, decode at 48 kHz,<br />
* else if the hardware's highest available sample rate is a supported rate, decode at this sample rate,<br />
* else if the hardware's highest available sample rate is less than 48 kHz, decode at the next higher supported rate and resample,<br />
* else decode at 48 kHz and resample.<br />
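<br />
The selection procedure above might be sketched as follows. The function name and the two parameters describing the hardware are assumptions for illustration; a real player would query its audio API for them.<br />

```c
/* hw_max_rate: highest sample rate the hardware offers, in Hz.
   hw_has_48k:  nonzero if the hardware supports 48 kHz playback.
   Returns the rate at which to run the decoder; if it differs from the
   hardware rate, the caller has to resample. */
int choose_decode_rate(int hw_max_rate, int hw_has_48k)
{
    static const int supported[] = { 8000, 12000, 16000, 24000, 48000 };
    int i;
    if (hw_has_48k)
        return 48000;
    for (i = 0; i < 5; i++)
        if (supported[i] == hw_max_rate)
            return hw_max_rate;              /* decode directly at this rate */
    if (hw_max_rate < 48000)
        for (i = 0; i < 5; i++)
            if (supported[i] > hw_max_rate)
                return supported[i];         /* next higher rate, resample */
    return 48000;                            /* decode at 48 kHz, resample */
}
```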
<br />
However, the 'input sample rate' field allows the encoder to pass the sample rate of the original input stream as metadata. This may be useful when the user requires the output sample rate to match the input sample rate. For example, a non-player decoder writing PCM format to disk might choose to resample the output audio back to the original input rate to reduce surprise to the user, who might reasonably expect to get back a file with the same sample rate as the one they fed to the encoder.<br />
<br />
A value of zero indicates 'unspecified'. Encoders SHOULD write the actual input rate or zero, but decoder implementations which do something with this field SHOULD take care to behave sanely if given crazy values (e.g. don't actually upsample the output to 10 MHz if requested).<br />
<br />
* '''Output gain'''<br />
This is a gain to be applied by the decoder. Virtually all players and media frameworks should apply it by default. If a player chooses to apply any volume adjustment or gain modification, such as the R128_TRACK_GAIN or a user-facing volume knob, the adjustment MUST be applied ''in addition'' to this output gain in order to achieve playback at the desired volume.<br />
<br />
An encoder SHOULD set the output gain to zero, and instead apply any gain prior to encoding, when this is possible and does not conflict with the user's wishes. The output gain should only be nonzero when the gain is adjusted after encoding, or when the user wishes to adjust the gain for playback while preserving the ability to recover the original signal amplitude.<br />
<br />
Although the output gain has enormous range (+/- 128 dB, enough to amplify inaudible sounds to the threshold of physical pain), most applications can only reasonably use a small portion of this range around zero. The large range serves in part to ensure that gain can always be losslessly transferred between OpusHead and R128_TRACK_GAIN (see below) without saturating.<br />
<br />
The gain is the 20 log<sub>10</sub> ratio of output to input sample values to be applied to the decoder output. E.g. <code>sample *= pow(10, header.gain/(20.*256))</code> where header.gain is the raw 16 bit Q7.8 value from the header.<br />
<br />
* '''Channel mapping family'''<br />
This byte indicates the order and semantic meaning of the various channels encoded in each Opus packet. <br />
<br />
Each possible value of this byte indicates a ''mapping family'', which defines a set of allowed numbers of channels, and the ordered set of channel names for each allowed number of channels. Currently there are three defined mapping families, although more may be added:<br />
<br />
* Family 0 (RTP mapping)<br />
** Allowed numbers of channels: 1 or 2<br />
** 1 channel: monophonic (mono)<br />
** 2 channels: stereo (left, right)<br />
** '''Special mapping''': this channel mapping value also indicates that the contents consist of a single Opus stream that is stereo if and only if c==2, with stream index 0 mapped to channel 0, and (if stereo) stream index 1 mapped to channel 1. When the channel mapping byte has this value, no further fields are present in OpusHead.<br />
* Family 1 ([http://www.xiph.org/vorbis/doc/Vorbis_I_spec.html#x1-800004.3.9 Vorbis channel order])<br />
** Allowed numbers of channels: 1 ... 8<br />
** Channel meanings depend on the number of channels, see the Vorbis mapping for details.<br />
* Family 255 (no defined channel meaning)<br />
** Allowed numbers of channels: 1...255<br />
** Channels are unidentified. General-purpose players SHOULD NOT attempt to play these streams, and offline decoders MAY deinterleave the output into separate PCM files, one per channel. Decoders SHOULD NOT produce output for channels mapped to stream index 255 (pure silence) unless they have no other way to indicate the index of non-silent channels.<br />
<br />
The remaining channel mapping families (2...254) are reserved. A decoder encountering a reserved mapping byte should act as though the mapping byte is 255.<br />
<br />
An Ogg Opus player MUST play any Ogg Opus stream with a channel mapping family of 0 or 1, even if the number of channels does not match the physically connected audio hardware. Players SHOULD perform channel mixing to increase or reduce the number of channels as needed.<br />
<br />
* '''Stream count''' 'N'<br />
This field indicates the total number of streams so the decoder can correctly parse the packed Opus packets inside the Ogg packet.<br />
<br />
For channel mapping family 0, this value defaults to 1, and is not coded.<br />
<br />
A multi-channel Opus file is composed of one or more individual Opus streams, each of which produces one or two channels of decoded data. Each Ogg packet contains one Opus packet from each stream. The first N-1 Opus packets are packed using the self-delimiting framing from Appendix B of the Opus Specification. The remaining Opus packet is packed using the regular, undelimited framing from Section 3 of the Opus Specification. All the Opus packets in a single Ogg packet MUST be constrained to produce the same number of decoded samples. A decoder SHOULD treat any Opus packet whose duration is different from that of the first Opus packet in an Ogg packet as if it were an Opus packet with an illegal TOC sequence.<br />
<br />
* '''Two-channel stream count''' 'M'<br />
Describes the number of streams whose decoders should be configured to produce two channels. This must be no larger than the number of total streams.<br />
<br />
For channel mapping family 0, this value defaults to c-1 (i.e., 0 for mono and 1 for stereo), and is not coded.<br />
<br />
Each packet in an Opus stream has an internal channel count of 1 or 2, which can change from packet to packet. This is selected by the encoder depending on the bitrate and the contents being encoded. The original channel count of the encoder input is not preserved by the lossy compression.<br />
<br />
Regardless of the internal channel count, any Opus stream may be decoded as mono (a single channel) or stereo (two channels) by appropriate initialization of the decoder. The "two-channel stream count" field indicates that the first M Opus decoders should be initialized in stereo mode, and the remaining N-M decoders should be initialized in mono mode. The total number of decoded channels (M+N) MUST be no larger than 255, as there is no way to index more channels than that in the channel mapping.<br />
<br />
* '''Channel mapping'''<br />
Contains one index per output channel indicating which decoded channel should be used. If the index is less than 2*M, the output MUST be taken from decoding stream (index/2) as stereo and selecting the left channel if index is even, and the right channel if index is odd. If the index is 2*M or larger, the output MUST be taken from decoding stream (index-M) as mono. As a special case, an index of 255 means that the corresponding output channel MUST contain pure silence.<br />
<br />
For channel mapping family 0, the first index defaults to 0, and if c==2, the second index defaults to 1. Neither index is coded.<br />
<br />
The number of output channels (c) is not constrained to match the number of decoded channels (M+N). A single index MAY appear multiple times, i.e., the same decoded channel may be mapped to multiple output channels. Some decoded channels might not be assigned to any output channel, as well.<br />
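<br />
The index rule for the channel mapping table can be sketched as below. The function name and return convention are hypothetical.<br />

```c
/* Resolves one channel mapping index for a header with M two-channel streams.
   Returns 1 if the output channel is pure silence (index 255). Otherwise
   returns 0 and stores the stream to decode and which of its channels to
   take (0 = left or mono, 1 = right). */
int resolve_mapping(unsigned index, unsigned m,
                    unsigned *stream, unsigned *channel)
{
    if (index == 255)
        return 1;
    if (index < 2 * m) {         /* one of the first M streams, decoded as stereo */
        *stream  = index / 2;
        *channel = index % 2;
    } else {                     /* one of the remaining N-M streams, decoded as mono */
        *stream  = index - m;
        *channel = 0;
    }
    return 0;
}
```

With M = 2, for example, index 3 selects the right channel of stream 1, while index 4 selects stream 2 decoded as mono.<br />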
<br />
==== Comment Header ====<br />
<br />
- 8 byte 'OpusTags' magic signature (64 bits)<br />
- The remaining data follows the vorbis-comment header design used in OggVorbis (without the "framing-bit"), OggTheora, and Speex:<br />
* Vendor string (always present).<br />
** 4-byte little-endian length field, followed by length bytes of UTF-8 vendor string.<br />
* TAG=value metadata strings (zero or more).<br />
** 4-byte little-endian string count.<br />
** Count strings consisting of 4-byte little-endian length and length bytes of UTF-8 string in "tag=value" form.<br />
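<br />
A bounds-checked walk over this layout could be sketched as follows. The function names are hypothetical, and the sketch only counts the comment strings without interpreting them.<br />

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

static uint32_t read_le32(const uint8_t *p)
{
    return (uint32_t)p[0] | ((uint32_t)p[1] << 8) |
           ((uint32_t)p[2] << 16) | ((uint32_t)p[3] << 24);
}

/* Returns the number of "tag=value" strings in an OpusTags packet,
   or -1 if the packet is malformed or truncated. */
int64_t count_opus_tags(const uint8_t *p, size_t len)
{
    size_t pos = 8;
    uint32_t n, i, count;
    if (len < 16 || memcmp(p, "OpusTags", 8) != 0) return -1;
    n = read_le32(p + pos); pos += 4;        /* vendor string length */
    if (n > len - pos) return -1;
    pos += n;                                /* skip the vendor string */
    if (len - pos < 4) return -1;
    count = read_le32(p + pos); pos += 4;    /* number of comment strings */
    for (i = 0; i < count; i++) {
        if (len - pos < 4) return -1;
        n = read_le32(p + pos); pos += 4;    /* length of this string */
        if (n > len - pos) return -1;
        pos += n;                            /* "tag=value" bytes */
    }
    return (int64_t)count;
}
```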
<br />
One new comment field is introduced for Ogg Opus:<br />
R128_TRACK_GAIN=-573 <br />
representing the volume shift needed to normalize the track's volume. The gain is a Q7.8 fixed point number in dB, as in the OpusHead "output gain" field. This field is similar to the [[VorbisComment#Replay_Gain|REPLAYGAIN_TRACK_GAIN field in Vorbis]], although the normal volume reference is the [http://tech.ebu.ch/loudness EBU-R128] standard.<br />
<br />
An Ogg Opus file MUST NOT have more than one such field, and if present its value MUST be an integer from -32768 to +32767 inclusive, represented in ASCII with no whitespace. If present, it MUST correctly represent the R128 normalization gain (relative to the OpusHead output gain). If a player chooses to make use of the TRACK_GAIN, it MUST be applied ''in addition'' to the OpusHead output gain. If an encoder populates the TRACK_GAIN field, and the output gain is not otherwise constrained or specified, the encoder SHOULD write the R128 gain into the OpusHead output gain and write "R128_TRACK_GAIN=0". If a tool modifies the OpusHead "output gain" field, it MUST also update or remove the R128_TRACK_GAIN comment field.<br />
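<br />
Validating the value of this field might look like the sketch below. The function names are hypothetical, and accepting a leading '+' or '-' sign is an assumption of this sketch; the text above only requires an ASCII integer in range with no whitespace.<br />

```c
/* Parses the value part of "R128_TRACK_GAIN=value".
   Returns 0 and stores the raw Q7.8 gain, or -1 if the value is invalid. */
int parse_r128_track_gain(const char *value, int *gain_q8)
{
    long v = 0;
    int neg = 0, digits = 0;
    if (*value == '+' || *value == '-')
        neg = (*value++ == '-');
    for (; *value >= '0' && *value <= '9'; value++, digits++) {
        v = v * 10 + (*value - '0');
        if (v > 32768) return -1;    /* out of range; also avoids overflow */
    }
    if (digits == 0 || *value != '\0')
        return -1;                   /* empty, whitespace, or trailing junk */
    if (neg) v = -v;
    if (v > 32767) return -1;
    *gain_q8 = (int)v;
    return 0;
}

/* Converts a raw Q7.8 gain to decibels. */
double q78_to_db(int gain_q8)
{
    return gain_q8 / 256.0;
}
```

The example value -573 corresponds to a gain of about -2.24 dB.<br />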
<br />
There is no comment field corresponding to ReplayGain's ALBUM_GAIN; that information should instead be stored in the OpusHead 'output gain' field.<br />
<br />
To avoid confusion with multiple normalization schemes, an OpusTags packet SHOULD NOT contain any of the REPLAYGAIN_TRACK_GAIN, REPLAYGAIN_TRACK_PEAK, REPLAYGAIN_ALBUM_GAIN, or REPLAYGAIN_ALBUM_PEAK fields.<br />
<br />
== Other Implementation Notes ==<br />
<br />
When seeking within an Ogg Opus stream, the decoder should start decoding (and discarding the output) at least 3840 samples (80 ms) prior to the seek point in order to ensure that the output audio is correct at the seek point.<br />
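<br />
A minimal sketch of the pre-roll computation, with a hypothetical function name and positions expressed as PCM sample positions at 48 kHz:<br />

```c
#include <stdint.h>

#define SEEK_PREROLL_SAMPLES 3840   /* 80 ms at 48 kHz */

/* Position from which to start decoding (and discarding output)
   so that the audio is correct at the seek target. */
int64_t seek_decode_start(int64_t target_pcm_position)
{
    if (target_pcm_position > SEEK_PREROLL_SAMPLES)
        return target_pcm_position - SEEK_PREROLL_SAMPLES;
    return 0;                       /* near the start: decode from the beginning */
}
```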
<br />
Technically valid Opus packets can be arbitrarily large due to the padding format, although the amount of non-padding data they can contain is bounded. These packets may be spread over a similarly enormous number of Ogg pages. Decoders SHOULD avoid attempting to allocate excessive amounts of memory when presented with a very large packet. The presence of an extremely large packet in the stream could indicate a potential memory exhaustion attack or stream corruption. Decoders should reject a packet that is too large to process, and print a warning message.<br />
<br />
In an Ogg Opus stream, the largest possible valid packet that does not use padding has a size of (61,298*N - 2) bytes, or about 60 kB per Opus stream. With 255 streams, this is 15,630,988 bytes (14.9 MB) and can span up to 61,298 Ogg pages, all but one of which will have a granulepos of -1. This is of course a very extreme packet, consisting of 255 streams, each containing 120 ms of audio encoded as 2.5 ms frames, each frame using the maximum possible number of bytes (1275) and stored in the least efficient manner allowed (a VBR code 3 Opus packet). Even in such a packet, most of the data will be zeros, as 2.5 ms frames, which are required to run in the MDCT mode, cannot actually use all 1275 bytes. The largest packet consisting entirely of useful data is (15,326*N - 2) bytes, or about 15 kB per stream. This corresponds to 120 ms of audio encoded as 10 ms frames in either LP or Hybrid mode, but at a data rate of over 1 Mbps, which makes little sense for the quality achieved. A more reasonable limit is (7,664*N - 2) bytes, or about 7.5 kB per stream. This corresponds to 120 ms of audio encoded as 20 ms stereo MDCT-mode frames, with a total bitrate just under 511 kbps (not counting the Ogg encapsulation overhead). With N=8, the maximum useful number of streams for the channel assignments currently defined by mapping 1, this gives a maximum packet size of 61,310 bytes, or just under 60 kB. This is still quite conservative, as it assumes each output channel is taken from one decoded channel of a stereo packet. An implementation could reasonably choose any of these numbers for its internal limits.<br />
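<br />
The limits quoted above all follow the same pattern and can be reproduced with a small helper. The function names are hypothetical, and which per-stream figure to use as a limit remains the implementation's choice.<br />

```c
#include <stddef.h>

/* Packet size bound of the form (per_stream_bytes * N - 2). */
long max_packet_bytes(long per_stream_bytes, int n_streams)
{
    return per_stream_bytes * n_streams - 2;
}

/* Nonzero if a packet exceeds the chosen limit and should be rejected. */
int packet_too_large(size_t packet_len, long per_stream_bytes, int n_streams)
{
    return packet_len > (size_t)max_packet_bytes(per_stream_bytes, n_streams);
}
```

Here <code>max_packet_bytes(61298, 255)</code> gives 15,630,988 and <code>max_packet_bytes(7664, 8)</code> gives 61,310, matching the figures in the text.<br />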
<br />
== Test Vectors ==<br />
<br />
* [[OggOpus/testvectors|Planned test vectors for OggOpus]]<br />
* Opus test vectors</div>Derfhttps://wiki.xiph.org/index.php?title=OggOpus&diff=13365OggOpus2012-05-10T15:34:58Z<p>Derf: </p>
<hr />
<div>== Ogg Mapping for Opus ==<br />
<br />
The IETF Opus codec is a low-latency audio codec optimized for both voice and general-purpose audio. See the [http://tools.ietf.org/html/draft-ietf-codec-opus Opus Specification] for technical details.<br />
<br />
Almost everything about Opus is either fixed or dynamically switchable, so most of the usual ID and setup header parameters in the header packets of an Ogg encapsulation aren't needed. In particular, bitrate, packet duration, mono/stereo flags, and coding modes are all dynamically switchable from packet to packet. The first one or two bytes in each data packet, the start of the 'TOC sequence' that defines the layout of the packet, specifies all of these parameters for that particular packet. See Section 3 of the Opus Specification for the exact format of the TOC sequence.<br />
<br />
The remaining parameters that must be signaled are<br />
<br />
* The magic number for stream identification,<br />
* The stream count and coupling for multichannel audio, and<br />
* Any metadata or tags.<br />
<br />
=== Packet Organization ===<br />
<br />
Opus is framed in a continuous logical [http://www.xiph.org/ogg/doc/framing.html Ogg stream]. <br />
<br />
There are two mandatory headers. The granule position of the pages containing these headers MUST be zero.<br />
<br />
The first packet in the logical Ogg stream MUST contain the identification header, which uniquely identifies a stream as Opus audio. It MUST begin with the 8 bytes "OpusHead". It MUST be placed alone in the first page of the logical Ogg stream. This page MUST have the ’beginning of stream’ flag set.<br />
<br />
The second Opus packet MUST contain the comment header. It MUST begin with the 8 bytes "OpusTags". It MAY span one or more pages, beginning on the second page of the logical stream. However many pages it spans, the comment header packet MUST finish the page on which it ends.<br />
<br />
All subsequent pages are audio data pages and the packets they contain are audio data packets. The first audio page SHOULD NOT have the 'continued packet' flag set (which would indicate the first packet is continued from a previous page). Packets MUST be placed into Ogg pages in order until the end of stream. Audio packets MAY span page boundaries. A decoder MUST treat a zero-byte audio packet as if it were an Opus packet with an illegal TOC sequence. The last page SHOULD have the 'end of stream' flag set, but implementations should be prepared to deal with truncated streams which do not have a page marked 'end of stream'. The final packet SHOULD complete on the last page, i.e., the final lacing value should be less than 255. There MUST NOT be any more pages in an Opus logical stream after a page marked 'end of stream'.<br />
<br />
=== Granule Position ===<br />
<br />
The granule position of an audio page encodes the total number of PCM samples in the stream up to and including the last fully-decodable sample from the last packet ''completed'' on that page. A page that is entirely spanned by a single packet (that completes on a subsequent page) has no granule position, and the granule position field MUST be set to the special value ’-1’ in two's complement.<br />
<br />
The granule position of an audio page is in units of PCM audio samples at a fixed rate of 48 kHz (per channel; a stereo stream’s granule position does not increment at twice the speed of a mono stream). It is possible to run a decoder at other sampling rates, but the format and this specification always count samples assuming a 48 kHz decoding rate.<br />
<br />
The duration of an Opus packet may be any multiple of 2.5 ms, up to a maximum of 120 ms. This duration is encoded in the TOC sequence at the beginning of each packet. The number of samples returned by a decoder corresponds to this duration exactly, even for the first few packets. For example, a 20 ms packet fed to a decoder running at 48 kHz will always return 960 samples. A demuxer can parse these TOC sequences to work backwards or forwards from a packet with a known granule position (i.e., the last packet completed on some page) in order to assign granule positions to every packet, or even every individual sample. The one exception is the last page in the stream, as described below. Every other page with completed packets after the first MUST have a granule position equal to the number of samples contained in packets that complete on that page plus the granule position of the most recent page with completed packets. This guarantees that a demuxer will assign individual packets the same granule position when working forwards as when working backwards. There must not be any gaps. In order to support capturing a stream that uses discontinuous transmission (DTX), an encoder SHOULD emit packets that explicitly request the use of Packet Loss Concealment (PLC) (i.e., with a frame length of 0, as defined in Section 3.2.1 of the Opus Specification) in place of the packets that were not transmitted.<br />
<br />
There is some amount of latency introduced during the decoding process, to allow for overlap in the MDCT modes, stereo mixing in the LP modes, and resampling, and the encoder will introduce even more latency (though the exact amount is not specified). Therefore the first few samples produced by the decoder do not correspond to any real, input audio, but are instead composed of padding inserted by the encoder to compensate for this latency. These samples must be stored and decoded, as Opus is an asymptotically convergent predictive codec, meaning the decoded contents of each frame depend on the recent history of decoder inputs. A 'pre-skip' field in the ID header signals the number of samples which should be skipped at the beginning of the stream. This provides sufficient history to the decoder so that it has already converged before the stream's output begins. It may also be used to perform sample-accurate cropping of existing encoded streams. This amount need not be a multiple of 2.5 ms, may be smaller than a single packet, or may span the contents of several packets.<br />
<br />
The PCM sample position is determined from the granule position using the formula<br />
<br />
'PCM sample position' = 'granule position' - 'pre-skip' .<br />
<br />
For example, if the granule position of the first page is 59971, and the pre-skip is 11971, then the PCM sample position of the last decoded sample from the first page is 48000. This may be converted into a playback time using the formula<br />
<br />
                   'PCM sample position'<br />
 'playback time' = --------------------- .<br />
                          48000.0<br />
<br />
The initial PCM sample position before any samples are played is normally '0'. In this case, the PCM sample position of the first audio sample to be played starts at '1', because it marks the time on the clock ''after'' that sample has been played, and a stream that is exactly one second long has a final PCM sample position of '48000', as in the example here.<br />
<br />
Vorbis streams use a granule position smaller than the number of audio samples contained in the first page to indicate that some of those samples must be trimmed from the output. However, to do so, Vorbis requires that the first page contains exactly two packets, in order to allow the decoder to perform PCM position adjustments before needing to return any PCM data. Opus uses the pre-skip mechanism for this purpose instead, since the encoder may introduce more than a single packet's worth of latency, and since very large packets in streams with a very large number of channels may not fit on a single page.<br />
<br />
The page with the 'end of stream' flag set MAY have a granule position that indicates the page contains less audio data than would normally be returned by decoding up through the final packet. This is used to end the stream somewhere other than an even frame boundary. The granule position of the most recent audio page with completed packets is used to make this determination, or '0' is used if there were no previous audio pages with a completed packet. The difference between these granule positions indicates how many samples to keep after decoding the packets that completed on the final page. The remaining samples are discarded. The number of discarded samples SHOULD be smaller than the number decoded from the last packet.<br />
<br />
The granule position of the first audio page with a completed packet MAY be larger than the number of samples contained in packets that complete on that page, however it MUST NOT be smaller, unless that page has the 'end of stream' flag set. Allowing a granule position larger than the number of samples allows the beginning of a stream to be cropped without rewriting the granule position of all the remaining pages. This means that the PCM sample position just before the first sample to be played may be larger than '0', but the PCM sample position relative to '0' should still be used for the purposes of synchronization when multiplexing with other logical streams. This does not affect the behavior of pre-skip: exactly 'pre-skip' samples should be skipped from the beginning of the decoded output, even if the initial PCM sample position is greater than zero.<br />
<br />
On the other hand, a granule position that is smaller than the number of decoded samples prevents a demuxer from working backwards to assign each packet or each individual sample a valid granule position, since granule positions must be non-negative. A decoder MUST reject as invalid any stream where the granule position is smaller than the number of samples contained in packets that complete on the first page with a completed packet, unless that page has the 'end of stream' flag set. It MAY defer this action until it decodes the last packet completed on that page. If that page has the 'end of stream' flag set, a demuxer can work forwards from the granule position '0', but MUST reject as invalid any stream where the granule position is smaller than the 'pre-skip' amount. This would indicate that more samples should be skipped from the initial decoded output than exist in the stream.<br />
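<br />
The first-page checks from the two paragraphs above can be sketched as follows. This is only an illustration: the function name and its arguments are hypothetical, and it assumes a demuxer has already counted the samples in the packets that complete on the page.<br />
<br />
```python
def check_first_audio_page(granule_pos, samples_on_page, pre_skip, is_eos):
    """Validate the first audio page that has a completed packet.

    Returns the granule position just before the first decoded sample,
    or raises ValueError if the stream must be rejected.
    All counts are in samples at 48 kHz.
    """
    if is_eos:
        # Single-page stream: work forwards from granule position 0.
        # The stream must hold at least 'pre-skip' samples.
        if granule_pos < pre_skip:
            raise ValueError("granule position smaller than pre-skip")
        return 0
    if granule_pos < samples_on_page:
        raise ValueError("granule position smaller than samples on first page")
    # A larger value means the beginning of the stream was cropped.
    return granule_pos - samples_on_page
```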
<br />
==== ID Header ====<br />
<br />
  0                   1                   2                   3<br />
  0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1<br />
 +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+<br />
 |      'O'      |      'p'      |      'u'      |      's'      |<br />
 +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+<br />
 |      'H'      |      'e'      |      'a'      |      'd'      |<br />
 +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+<br />
 |  version = 1  | channel count |           pre-skip            |<br />
 +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+<br />
 |               original input sample rate in Hz                |<br />
 +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+<br />
 |    output gain Q7.8 in dB     |  channel map  |               |<br />
 +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+               :<br />
 |                                                               |<br />
 :               optional channel mapping table...               :<br />
 |                                                               |<br />
 +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+<br />
<br />
<br />
- Magic signature: "OpusHead" (64 bits)<br />
- Version number (8 bits unsigned): 1 for this spec<br />
- Channel count 'c' (8 bits unsigned): MUST be > 0<br />
- Pre-skip (16 bits unsigned, little endian)<br />
- Input sample rate (32 bits unsigned, little endian): informational only<br />
- Output gain (16 bits, little endian, signed Q7.8 in dB) to apply when decoding<br />
- Channel mapping family (8 bits unsigned)<br />
-- 0 = one stream: mono or L,R stereo<br />
-- 1 = channels in vorbis spec order: mono or L,R stereo or ... or FL,C,FR,RL,RR,LFE, ...<br />
-- 2..254 = reserved (treat as 255)<br />
-- 255 = no defined channel meaning<br />
If channel mapping family > 0<br />
- Stream count 'N' (8 bits unsigned): MUST be > 0<br />
- Two-channel stream count 'M' (8 bits unsigned): MUST satisfy M <= N, M+N <= 255<br />
- Channel mapping (8*c bits)<br />
-- one stream index (8 bits unsigned) per channel (255 means silent throughout the file)<br />
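<br />
As an illustration of the layout above, a minimal parser might look like the following sketch. The function name and the dictionary keys are hypothetical, and only the constraints stated in the field list are checked.<br />
<br />
```python
import struct

def parse_opus_head(packet: bytes) -> dict:
    """Parse an OpusHead packet into a dict (illustrative sketch only)."""
    if len(packet) < 19 or packet[:8] != b"OpusHead":
        raise ValueError("not an OpusHead packet")
    # version, channel count, pre-skip, input rate, output gain, mapping family
    version, channels, pre_skip, input_rate, gain_q8, family = struct.unpack_from(
        "<BBHIhB", packet, 8)
    if channels == 0:
        raise ValueError("channel count MUST be > 0")
    head = dict(version=version, channels=channels, pre_skip=pre_skip,
                input_rate=input_rate, gain_q8=gain_q8, family=family)
    if family == 0:
        # Family 0: no further fields are coded; defaults apply.
        if channels > 2:
            raise ValueError("family 0 allows only 1 or 2 channels")
        head.update(streams=1, coupled=channels - 1,
                    mapping=list(range(channels)))
    else:
        if len(packet) < 21 + channels:
            raise ValueError("truncated channel mapping table")
        streams, coupled = packet[19], packet[20]
        if streams == 0 or coupled > streams or streams + coupled > 255:
            raise ValueError("invalid stream counts")
        head.update(streams=streams, coupled=coupled,
                    mapping=list(packet[21:21 + channels]))
    return head
```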
<br />
<br />
Some discussion is in order.<br />
<br />
* '''Magic signature'''<br />
The magic signature "OpusHead" allows codec identification and is human readable. Starting with 'Op' helps distinguish it from data packets, as this is an invalid TOC sequence.<br />
<br />
* '''Version'''<br />
The version number must always be '1' for this version of the encapsulation specification. This 8-bit field is partitioned into two sub-fields. The lower four bits constitute a "minor revision", while the upper four bits correspond to a "major revision". Implementations SHOULD treat streams with an unknown minor revision as backwards-compatible as long as they recognize the major revision. When encountering a stream with an unknown major revision, implementations SHOULD assume it is not backwards compatible. The initial version '1' was chosen to keep implementations from relying on this byte as a null terminator for the OpusHead string.<br />
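<br />
A version check along these lines could be written as below. The constant and function names are hypothetical; the only assumption taken from the text is that version '1' has major revision 0 in its upper four bits.<br />
<br />
```python
SUPPORTED_MAJOR = 0  # upper four bits of the version byte '1' (0x01)

def version_is_compatible(version_byte: int) -> bool:
    """Return True if a stream with this version byte should be decodable."""
    major = version_byte >> 4
    # The minor revision (version_byte & 0x0F) is ignored on purpose:
    # unknown minor revisions are treated as backwards-compatible.
    return major == SUPPORTED_MAJOR
```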
<br />
* '''Channel count''' 'c'<br />
The channel count byte specifies the number of output channels (1...255) for this Ogg Opus stream.<br />
<br />
* '''Pre-skip'''<br />
This is the number of samples (at 48 kHz) to discard from the decoder output when starting playback, and also the number to subtract from a page's granule position to calculate its PCM sample position.<br />
<br />
When constructing cropped Ogg Opus streams, we recommend a pre-skip of at least 3840 samples (80 ms) to ensure complete convergence.<br />
<br />
* '''Input sample rate'''<br />
This is ''not'' the sample rate to use for playback of the encoded data.<br />
<br />
Opus has a handful of coding modes, with internal audio bandwidths of 4, 6, 8, 12, and 20 kHz. Each packet in the stream may have a different audio bandwidth. Regardless of the audio bandwidth, the reference decoder supports decoding any stream at a sample rate of 8, 12, 16, 24, or 48 kHz. The original sample rate of the encoder input is not preserved by the lossy compression.<br />
<br />
An Ogg Opus player SHOULD select the playback sample rate according to the following procedure:<br />
* If the hardware supports 48 kHz playback, decode at 48 kHz,<br />
* else if the hardware's highest available sample rate is a supported rate, decode at this sample rate,<br />
* else if the hardware's highest available sample rate is less than 48 kHz, decode at the next higher supported rate and resample,<br />
* else decode at 48 kHz and resample.<br />
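<br />
The selection procedure above can be sketched as a small function. The name <code>choose_decode_rate</code> and the idea of passing the hardware rates as a collection of integers are assumptions made for this example.<br />
<br />
```python
SUPPORTED_RATES = (8000, 12000, 16000, 24000, 48000)

def choose_decode_rate(hardware_rates):
    """Return (decode_rate_hz, needs_resample) for the given hardware rates."""
    if 48000 in hardware_rates:
        return 48000, False
    highest = max(hardware_rates)
    if highest in SUPPORTED_RATES:
        return highest, False
    if highest < 48000:
        # Decode at the next higher supported rate, then resample down.
        return min(r for r in SUPPORTED_RATES if r > highest), True
    return 48000, True
```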
<br />
However, the 'input sample rate' field allows the encoder to pass the sample rate of the original input stream as metadata. This may be useful when the user requires the output sample rate to match the input sample rate. For example, a non-player decoder writing PCM format to disk might choose to resample the output audio back to the original input rate to reduce surprise to the user, who might reasonably expect to get back a file with the same sample rate as the one they fed to the encoder.<br />
<br />
A value of zero indicates 'unspecified'. Encoders SHOULD write the actual input rate or zero, but decoder implementations which do something with this field SHOULD take care to behave sanely if given crazy values (e.g. don't actually upsample the output to 10 MHz if requested).<br />
<br />
* '''Output gain'''<br />
This is a gain to be applied by the decoder. Virtually all players and media frameworks should apply it by default. If a player chooses to apply any volume adjustment or gain modification, such as the R128_TRACK_GAIN or a user-facing volume knob, the adjustment MUST be applied ''in addition'' to this output gain in order to achieve playback at the desired volume.<br />
<br />
An encoder SHOULD set the output gain to zero, and instead apply any gain prior to encoding, when this is possible and does not conflict with the user's wishes. The output gain should only be nonzero when the gain is adjusted after encoding, or when the user wishes to adjust the gain for playback while preserving the ability to recover the original signal amplitude.<br />
<br />
Although the output gain has enormous range (+/- 128 dB, enough to amplify inaudible sounds to the threshold of physical pain), most applications can only reasonably use a small portion of this range around zero. The large range serves in part to ensure that gain can always be losslessly transferred between OpusHead and R128_TRACK_GAIN (see below) without saturating.<br />
<br />
The gain is the 20 log<sub>10</sub> ratio of output to input sample values to be applied to the decoder output. E.g. <code>sample *= pow(10, header.gain/(20.*256))</code> where header.gain is the raw 16 bit Q7.8 value from the header.<br />
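<br />
The same conversion, written out as a runnable sketch. The helper names are hypothetical; the raw field is read as a signed 16-bit little-endian integer, as described in the field list.<br />
<br />
```python
def read_output_gain(field: bytes) -> int:
    """Decode the raw 2-byte output gain field (signed, little endian, Q7.8)."""
    return int.from_bytes(field, "little", signed=True)

def output_gain_scale(gain_q8: int) -> float:
    """Linear factor to multiply each decoded sample by."""
    return 10.0 ** (gain_q8 / (20.0 * 256.0))
```
<br />
For example, a raw value of 5120 is 20 dB, which corresponds to a scale factor of 10.<br />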
<br />
* '''Channel mapping family'''<br />
This byte indicates the order and semantic meaning of the various channels encoded in each Opus packet. <br />
<br />
Each possible value of this byte indicates a ''mapping family'', which defines a set of allowed numbers of channels, and the ordered set of channel names for each allowed number of channels. Currently there are three defined mapping families, although more may be added:<br />
<br />
* Family 0 (RTP mapping)<br />
** Allowed numbers of channels: 1 or 2<br />
** 1 channel: monophonic (mono)<br />
** 2 channels: stereo (left, right)<br />
** '''Special mapping''': this channel mapping value also indicates that the content consists of a single Opus stream that is stereo if and only if c==2, with stream index 0 mapped to channel 0, and (if stereo) stream index 1 mapped to channel 1. When the channel mapping byte has this value, no further fields are present in OpusHead.<br />
* Family 1 ([http://www.xiph.org/vorbis/doc/Vorbis_I_spec.html#x1-800004.3.9 Vorbis channel order])<br />
** Allowed numbers of channels: 1 ... 8<br />
** Channel meanings depend on the number of channels, see the Vorbis mapping for details.<br />
* Family 255 (no defined channel meaning)<br />
** Allowed numbers of channels: 1...255<br />
** Channels are unidentified. General-purpose players SHOULD NOT attempt to play these streams, and offline decoders MAY deinterleave the output into separate PCM files, one per channel. Decoders SHOULD NOT produce output for channels mapped to stream index 255 (pure silence) unless they have no other way to indicate the index of non-silent channels.<br />
<br />
The remaining channel mapping families (2...254) are reserved. A decoder encountering a reserved mapping byte should act as though the mapping byte is 255.<br />
<br />
An Ogg Opus player MUST play any Ogg Opus stream with a channel mapping family of 0 or 1, even if the number of channels does not match the physically connected audio hardware. Players SHOULD perform channel mixing to increase or reduce the number of channels as needed.<br />
<br />
* '''Stream count''' 'N'<br />
This field indicates the total number of streams so the decoder can correctly parse the packed Opus packets inside the Ogg packet.<br />
<br />
For channel mapping family 0, this value defaults to 1, and is not coded.<br />
<br />
A multi-channel Opus file is composed of one or more individual Opus streams, each of which produces one or two channels of decoded data. Each Ogg packet contains one Opus packet from each stream. The first N-1 Opus packets are packed using the self-delimiting framing from Appendix B of the Opus Specification. The remaining Opus packet is packed using the regular, undelimited framing from Section 3 of the Opus Specification. All the Opus packets in a single Ogg packet MUST be constrained to produce the same number of decoded samples. A decoder SHOULD treat any Opus packet whose duration is different from that of the first Opus packet in an Ogg packet as if it were an Opus packet with an illegal TOC sequence.<br />
<br />
* '''Two-channel stream count''' 'M'<br />
Describes the number of streams whose decoders should be configured to produce two channels. This must be no larger than the number of total streams.<br />
<br />
For channel mapping family 0, this value defaults to c-1 (i.e., 0 for mono and 1 for stereo), and is not coded.<br />
<br />
Each packet in an Opus stream has an internal channel count of 1 or 2, which can change from packet to packet. This is selected by the encoder depending on the bitrate and the contents being encoded. The original channel count of the encoder input is not preserved by the lossy compression.<br />
<br />
Regardless of the internal channel count, any Opus stream may be decoded as mono (a single channel) or stereo (two channels) by appropriate initialization of the decoder. The "two-channel stream count" field indicates that the first M Opus decoders should be initialized in stereo mode, and the remaining N-M decoders should be initialized in mono mode. The total number of decoded channels (M+N) MUST be no larger than 255, as there is no way to index more channels than that in the channel mapping.<br />
<br />
* '''Channel mapping'''<br />
Contains one index per output channel indicating which decoded channel should be used. If the index is less than 2*M, the output MUST be taken from decoding stream (index/2) as stereo and selecting the left channel if index is even, and the right channel if index is odd. If the index is 2*M or larger, the output MUST be taken from decoding stream (index-M) as mono. As a special case, an index of 255 means that the corresponding output channel MUST contain pure silence.<br />
<br />
For channel mapping family 0, the first index defaults to 0, and if c==2, the second index defaults to 1. Neither index is coded.<br />
<br />
The number of output channels (c) is not constrained to match the number of decoded channels (M+N). A single index MAY appear multiple times, i.e., the same decoded channel may be mapped to multiple output channels. Some decoded channels might not be assigned to any output channel, as well.<br />
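<br />
The index rules above can be sketched as follows. The function name and the 'L'/'R'/'mono' labels are hypothetical; <code>coupled</code> stands for the two-channel stream count M.<br />
<br />
```python
def resolve_channel(index: int, coupled: int):
    """Map a channel mapping index to (stream, side), or None for silence.

    side is 'L' or 'R' for a stream decoded as stereo, 'mono' otherwise.
    """
    if index == 255:
        return None  # this output channel is pure silence
    if index < 2 * coupled:
        return index // 2, "L" if index % 2 == 0 else "R"
    return index - coupled, "mono"
```
<br />
With N=4, M=2 and an illustrative mapping of [0, 4, 1, 2, 3, 5], the six output channels come from stream 0 left, stream 2 (mono), stream 0 right, stream 1 left, stream 1 right, and stream 3 (mono).<br />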
<br />
==== Comment Header ====<br />
<br />
- 8 byte 'OpusTags' magic signature (64 bits)<br />
- The remaining data follows the vorbis-comment header design used in OggVorbis (without the "framing-bit"), OggTheora, and Speex:<br />
* Vendor string (always present).<br />
** 4-byte little-endian length field, followed by length bytes of UTF-8 vendor string.<br />
* TAG=value metadata strings (zero or more).<br />
** 4-byte little-endian string count.<br />
** Count strings consisting of 4-byte little-endian length and length bytes of UTF-8 string in "tag=value" form.<br />
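<br />
A minimal reader for this layout might look like the sketch below. The function names are hypothetical, and anything following the last comment string is simply ignored.<br />
<br />
```python
import struct

def _read_string(packet: bytes, pos: int):
    """Read a 4-byte little-endian length and that many UTF-8 bytes."""
    if pos + 4 > len(packet):
        raise ValueError("truncated length field")
    (length,) = struct.unpack_from("<I", packet, pos)
    pos += 4
    if pos + length > len(packet):
        raise ValueError("truncated string")
    return packet[pos:pos + length].decode("utf-8"), pos + length

def parse_opus_tags(packet: bytes):
    """Return (vendor, [comment strings]) from an OpusTags packet."""
    if packet[:8] != b"OpusTags":
        raise ValueError("missing OpusTags signature")
    vendor, pos = _read_string(packet, 8)
    if pos + 4 > len(packet):
        raise ValueError("truncated comment count")
    (count,) = struct.unpack_from("<I", packet, pos)
    pos += 4
    comments = []
    for _ in range(count):
        text, pos = _read_string(packet, pos)
        comments.append(text)
    return vendor, comments
```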
<br />
One new comment field is introduced for Ogg Opus:<br />
R128_TRACK_GAIN=-573 <br />
representing the volume shift needed to normalize the track's volume. The gain is a Q7.8 fixed point number in dB, as in the OpusHead "output gain" field. This field is similar to the [[VorbisComment#Replay_Gain|REPLAYGAIN_TRACK_GAIN field in Vorbis]], although the normal volume reference is the [http://tech.ebu.ch/loudness EBU-R128] standard.<br />
<br />
An Ogg Opus file MUST NOT have more than one such field, and if present its value MUST be an integer from -32768 to +32767 inclusive, represented in ASCII with no whitespace. If present, it MUST correctly represent the R128 normalization gain (relative to the OpusHead output gain). If a player chooses to make use of the TRACK_GAIN, it MUST be applied ''in addition'' to the OpusHead output gain. If an encoder populates the TRACK_GAIN field, and the output gain is not otherwise constrained or specified, the encoder SHOULD write the R128 gain into the OpusHead output gain and write "R128_TRACK_GAIN=0". If a tool modifies the OpusHead "output gain" field, it MUST also update or remove the R128_TRACK_GAIN comment field.<br />
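<br />
The constraints on this field could be checked as in the following sketch. The function names are hypothetical, and matching the tag name case-insensitively is an assumption carried over from the usual vorbis-comment convention.<br />
<br />
```python
def parse_r128_track_gain(comments):
    """Return the Q7.8 track gain from a list of 'tag=value' strings, or None."""
    values = [c.split("=", 1)[1] for c in comments
              if c.upper().startswith("R128_TRACK_GAIN=")]
    if not values:
        return None
    if len(values) > 1:
        raise ValueError("more than one R128_TRACK_GAIN field")
    text = values[0]
    # Optional sign followed by ASCII digits only, no whitespace.
    digits = text[1:] if text[:1] in ("+", "-") else text
    if not (digits.isascii() and digits.isdigit()):
        raise ValueError("R128_TRACK_GAIN is not an ASCII integer")
    gain = int(text)
    if not -32768 <= gain <= 32767:
        raise ValueError("R128_TRACK_GAIN out of range")
    return gain

def total_gain_db(output_gain_q8: int, track_gain_q8: int) -> float:
    """Track gain is applied in addition to the OpusHead output gain."""
    return (output_gain_q8 + track_gain_q8) / 256.0
```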
<br />
There is no comment field corresponding to Replaygain's ALBUM_GAIN; that information should instead be stored in the OpusHead 'output gain' field.<br />
<br />
To avoid confusion with multiple normalization schemes, an OpusTags packet SHOULD NOT contain any of the REPLAYGAIN_TRACK_GAIN, REPLAYGAIN_TRACK_PEAK, REPLAYGAIN_ALBUM_GAIN, or REPLAYGAIN_ALBUM_PEAK fields.<br />
<br />
== Other Implementation Notes ==<br />
<br />
When seeking within an Ogg Opus stream, the decoder should start decoding (and discarding the output) at least 3840 samples (80 ms) prior to the seek point in order to ensure that the output audio is correct at the seek point.<br />
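<br />
In terms of sample counts, the pre-roll can be computed as in this sketch. The names are hypothetical, and positions are indexes into the decoded output at 48 kHz.<br />
<br />
```python
SEEK_PRE_ROLL = 3840  # 80 ms at 48 kHz

def decode_start_for_seek(target: int):
    """Return (start, discard): where to begin decoding and how many
    decoded samples to throw away before reaching the seek target."""
    start = max(0, target - SEEK_PRE_ROLL)
    return start, target - start
```
<br />
In practice decoding begins at the packet containing <code>start</code> or an earlier one, so the number of discarded samples may be somewhat larger.<br />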
<br />
Technically valid Opus packets can be arbitrarily large due to the padding format, although the amount of non-padding data they can contain is bounded. These packets may be spread over a similarly enormous number of Ogg pages. Decoders SHOULD avoid attempting to allocate excessive amounts of memory when presented with a very large packet. The presence of an extremely large packet in the stream could indicate a potential memory exhaustion attack or stream corruption. Decoders should reject a packet that is too large to process, and print a warning message. In an Ogg Opus stream, the largest possible valid packet that does not use padding has a size of 15630988 bytes (14.9 MiB) and can span up to 61298 Ogg Pages, all but one of which will have a granulepos of -1. This is of course a very extreme packet, consisting of 255 channels, each containing 120ms of audio encoded as 2.5ms frames, each frame using the maximum possible number of bytes and stored in the least efficient manner allowed.<br />
<br />
== Test Vectors ==<br />
<br />
* [[OggOpus/testvectors|Planned test vectors for OggOpus]]<br />
* Opus test vectors</div>Derfhttps://wiki.xiph.org/index.php?title=OggOpus&diff=13364OggOpus2012-05-10T15:32:50Z<p>Derf: Add details for the split version field. Bump the minor revision to '1' (files with '0' are still backwards compatible).</p>
<hr />
<div>== Ogg Mapping for Opus ==<br />
<br />
The IETF Opus codec is a low-latency audio codec optimized for both voice and general-purpose audio. See the [http://tools.ietf.org/html/draft-ietf-codec-opus Opus Specification] for technical details.<br />
<br />
Almost everything about Opus is either fixed or dynamically switchable, so most of the usual ID and setup header parameters in the header packets of an Ogg encapsulation aren't needed. In particular, bitrate, packet duration, mono/stereo flags, and coding modes are all dynamically switchable from packet to packet. The first one or two bytes in each data packet, the start of the 'TOC sequence' that defines the layout of the packet, specify all of these parameters for that particular packet. See Section 3 of the Opus Specification for the exact format of the TOC sequence.<br />
<br />
The remaining parameters that must be signaled are<br />
<br />
* The magic number for stream identification,<br />
* The stream count and coupling for multichannel audio, and<br />
* Any metadata or tags.<br />
<br />
=== Packet Organization ===<br />
<br />
Opus is framed in a continuous logical [http://www.xiph.org/ogg/doc/framing.html Ogg stream]. <br />
<br />
There are two mandatory headers. The granule position of the pages containing these headers MUST be zero.<br />
<br />
The first packet in the logical Ogg stream MUST contain the identification header, which uniquely identifies a stream as Opus audio. It MUST begin with the 8 bytes "OpusHead". It MUST be placed alone in the first page of the logical Ogg stream. This page MUST have the ’beginning of stream’ flag set.<br />
<br />
The second Opus packet MUST contain the comment header. It must begin with the 8 bytes "OpusTags". It MAY span one or more pages, beginning on the second page of the logical stream. However many pages it spans, the comment header packet MUST finish the page on which it ends.<br />
<br />
All subsequent pages are audio data pages and the packets they contain are audio data packets. The first audio page SHOULD NOT have the 'continued packet' flag set (which would indicate the first packet is continued from a previous page). Packets MUST be placed into Ogg pages in order until the end of stream. Audio packets MAY span page boundaries. A decoder MUST treat a zero-byte audio packet as if it were an Opus packet with an illegal TOC sequence. The last page SHOULD have the 'end of stream' flag set, but implementations should be prepared to deal with truncated streams which do not have a page marked 'end of stream'. The final packet SHOULD complete on the last page, i.e., the final lacing value should be less than 255. There MUST NOT be any more pages in an Opus logical stream after a page marked 'end of stream'.<br />
<br />
=== Granule Position ===<br />
<br />
The granule position of an audio page encodes the total number of PCM samples in the stream up to and including the last fully-decodable sample from the last packet ''completed'' on that page. A page that is entirely spanned by a single packet (that completes on a subsequent page) has no granule position, and the granule position field MUST be set to the special value ’-1’ in two's complement.<br />
<br />
The granule position of an audio page is in units of PCM audio samples at a fixed rate of 48 kHz (per channel; a stereo stream’s granule position does not increment at twice the speed of a mono stream). It is possible to run a decoder at other sampling rates, but the format and this specification always count samples assuming a 48 kHz decoding rate.<br />
<br />
The duration of an Opus packet may be any multiple of 2.5 ms, up to a maximum of 120 ms. This duration is encoded in the TOC sequence at the beginning of each packet. The number of samples returned by a decoder corresponds to this duration exactly, even for the first few packets. For example, a 20 ms packet fed to a decoder running at 48 kHz will always return 960 samples. A demuxer can parse these TOC sequences to work backwards or forwards from a packet with a known granule position (i.e., the last packet completed on some page) in order to assign granule positions to every packet, or even every individual sample. The one exception is the last page in the stream, as described below. Every other page with completed packets after the first MUST have a granule position equal to the number of samples contained in packets that complete on that page plus the granule position of the most recent page with completed packets. This guarantees that a demuxer will assign individual packets the same granule position when working forwards as when working backwards. There must not be any gaps. In order to support capturing a stream that uses discontinuous transmission (DTX), an encoder SHOULD emit packets that explicitly request the use of Packet Loss Concealment (PLC) (i.e., with a frame length of 0, as defined in Section 3.2.1 of the Opus Specification) in place of the packets that were not transmitted.<br />
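<br />
As a sketch of how a demuxer might read the duration from the TOC sequence: the frame-size table below follows our reading of Section 3.1 of the Opus Specification and is not defined on this page, and the function name is hypothetical.<br />
<br />
```python
def packet_samples_48k(packet: bytes) -> int:
    """Return the number of 48 kHz samples an Opus packet decodes to."""
    if not packet:
        raise ValueError("zero-byte packet")
    toc = packet[0]
    if toc & 0x80:                      # CELT-only: 2.5, 5, 10 or 20 ms
        frame = 120 << ((toc >> 3) & 0x3)
    elif (toc & 0x60) == 0x60:          # Hybrid: 10 or 20 ms
        frame = 960 if toc & 0x08 else 480
    else:                               # SILK-only: 10, 20, 40 or 60 ms
        size = (toc >> 3) & 0x3
        frame = 2880 if size == 3 else 480 << size
    code = toc & 0x3
    if code == 0:
        count = 1
    elif code in (1, 2):
        count = 2
    else:
        # Code 3: the frame count is in the low 6 bits of the next byte.
        if len(packet) < 2:
            raise ValueError("missing frame count byte")
        count = packet[1] & 0x3F
    samples = frame * count
    if count == 0 or samples > 5760:
        raise ValueError("invalid frame count or more than 120 ms")
    return samples
```
<br />
A single 20 ms frame yields 960 samples, matching the example in the paragraph above.<br />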
<br />
There is some amount of latency introduced during the decoding process, to allow for overlap in the MDCT modes, stereo mixing in the LP modes, and resampling, and the encoder will introduce even more latency (though the exact amount is not specified). Therefore the first few samples produced by the decoder do not correspond to any real, input audio, but are instead composed of padding inserted by the encoder to compensate for this latency. These samples must be stored and decoded, as Opus is an asymptotically convergent predictive codec, meaning the decoded contents of each frame depend on the recent history of decoder inputs. A 'pre-skip' field in the ID header signals the number of samples which should be skipped at the beginning of the stream. This provides sufficient history to the decoder so that it has already converged before the stream's output begins. It may also be used to perform sample-accurate cropping of existing encoded streams. This amount need not be a multiple of 2.5 ms, may be smaller than a single packet, or may span the contents of several packets.<br />
<br />
The PCM sample position is determined from the granule position using the formula<br />
<br />
'PCM sample position' = 'granule position' - 'pre-skip' .<br />
<br />
For example, if the granule position of the first page is 59971, and the pre-skip is 11971, then the PCM sample position of the last decoded sample from the first page is 48000. This may be converted into a playback time using the formula<br />
<br />
 'playback time' = 'PCM sample position' / 48000.0 .<br />
<br />
The initial PCM sample position before any samples are played is normally '0'. In this case, the PCM sample position of the first audio sample to be played starts at '1', because it marks the time on the clock ''after'' that sample has been played, and a stream that is exactly one second long has a final PCM sample position of '48000', as in the example here.<br />
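<br />
The two formulas and the worked example above, as a short sketch with hypothetical function names:<br />
<br />
```python
def pcm_sample_position(granule_position: int, pre_skip: int) -> int:
    """PCM sample position of the last decoded sample on a page."""
    return granule_position - pre_skip

def playback_time(granule_position: int, pre_skip: int) -> float:
    """Playback time in seconds; granule positions always count at 48 kHz."""
    return pcm_sample_position(granule_position, pre_skip) / 48000.0
```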
<br />
Vorbis streams use a granule position smaller than the number of audio samples contained in the first page to indicate that some of those samples must be trimmed from the output. However, to do so it requires that the first page contains exactly two packets, in order to allow the decoder to perform PCM position adjustments before needing to return any PCM data. Opus uses the pre-skip mechanism for this purpose instead, since the encoder may introduce more than a single packet's worth of latency, and since very large packets in streams with a very large number of channels may not fit on a single page.<br />
<br />
The page with the 'end of stream' flag set MAY have a granule position that indicates the page contains less audio data than would normally be returned by decoding up through the final packet. This is used to end the stream somewhere other than an even frame boundary. The granule position of the most recent audio page with completed packets is used to make this determination, or '0' is used if there were no previous audio pages with a completed packet. The difference between these granule positions indicates how many samples to keep after decoding the packets that completed on the final page. The remaining samples are discarded. The number of discarded samples SHOULD be smaller than the number decoded from the last packet.<br />
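<br />
The end trimming described above can be sketched like this. The function name is hypothetical, and clamping to the number of decoded samples when the granule position asks for more than was decoded is an assumption, not something this page specifies.<br />
<br />
```python
def trim_last_page(final_granule: int, previous_granule: int, decoded: int):
    """Return (keep, discard) for the samples decoded from the packets
    that complete on the 'end of stream' page.

    previous_granule is 0 if no earlier audio page completed a packet.
    """
    keep = final_granule - previous_granule
    if keep < 0:
        raise ValueError("granule position goes backwards")
    keep = min(keep, decoded)  # assumption: never keep more than was decoded
    return keep, decoded - keep
```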
<br />
The granule position of the first audio page with a completed packet MAY be larger than the number of samples contained in packets that complete on that page, however it MUST NOT be smaller, unless that page has the 'end of stream' flag set. Allowing a granule position larger than the number of samples allows the beginning of a stream to be cropped without rewriting the granule position of all the remaining pages. This means that the PCM sample position just before the first sample to be played may be larger than '0', but the PCM sample position relative to '0' should still be used for the purposes of synchronization when multiplexing with other logical streams. This does not affect the behavior of pre-skip: exactly 'pre-skip' samples should be skipped from the beginning of the decoded output, even if the initial PCM sample position is greater than zero.<br />
<br />
On the other hand, a granule position that is smaller than the number of decoded samples prevents a demuxer from working backwards to assign each packet or each individual sample a valid granule position, since granule positions must be non-negative. A decoder MUST reject as invalid any stream where the granule position is smaller than the number of samples contained in packets that complete on the first page with a completed packet, unless that page has the 'end of stream' flag set. It MAY defer this action until it decodes the last packet completed on that page. If that page has the 'end of stream' flag set, a demuxer can work forwards from the granule position '0', but MUST reject as invalid any stream where the granule position is smaller than the 'pre-skip' amount. This would indicate that more samples should be skipped from the initial decoded output than exist in the stream.<br />
<br />
==== ID Header ====<br />
<br />
  0                   1                   2                   3<br />
  0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1<br />
 +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+<br />
 |      'O'      |      'p'      |      'u'      |      's'      |<br />
 +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+<br />
 |      'H'      |      'e'      |      'a'      |      'd'      |<br />
 +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+<br />
 |  version = 1  | channel count |           pre-skip            |<br />
 +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+<br />
 |               original input sample rate in Hz                |<br />
 +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+<br />
 |    output gain Q7.8 in dB     |  channel map  |               |<br />
 +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+               :<br />
 |                                                               |<br />
 :               optional channel mapping table...               :<br />
 |                                                               |<br />
 +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+<br />
<br />
<br />
- Magic signature: "OpusHead" (64 bits)<br />
- Version number (8 bits unsigned): 1 for this spec<br />
- Channel count 'c' (8 bits unsigned): MUST be > 0<br />
- Pre-skip (16 bits unsigned, little endian)<br />
- Input sample rate (32 bits unsigned, little endian): informational only<br />
- Output gain (16 bits, little endian, signed Q7.8 in dB) to apply when decoding<br />
- Channel mapping family (8 bits unsigned)<br />
-- 0 = one stream: mono or L,R stereo<br />
-- 1 = channels in vorbis spec order: mono or L,R stereo or ... or FL,C,FR,RL,RR,LFE, ...<br />
-- 2..254 = reserved (treat as 255)<br />
-- 255 = no defined channel meaning<br />
If channel mapping family > 0<br />
- Stream count 'N' (8 bits unsigned): MUST be > 0<br />
- Two-channel stream count 'M' (8 bits unsigned): MUST satisfy M <= N, M+N <= 255<br />
- Channel mapping (8*c bits)<br />
-- one stream index (8 bits unsigned) per channel (255 means silent throughout the file)<br />
<br />
<br />
Some discussion is in order.<br />
<br />
* '''Magic signature'''<br />
The magic signature "OpusHead" allows codec identification and is human readable. Starting with 'Op' helps distinguish it from data packets, as this is an invalid TOC sequence.<br />
<br />
* '''Version'''<br />
The version number must always be '1' for this version of the encapsulation specification. This 8-bit field is partitioned into two sub-fields. The lower four bits constitute a "minor revision", while the upper four bits correspond to a "major revision". Implementations should treat streams with an unknown minor revision as backwards-compatible as long as they recognize the major revision. When encountering a stream with an unknown major revision, implementations SHOULD assume it is not backwards compatible. The initial version '1' was chosen to keep implementations from relying on this byte as a null terminator for the OpusHead string.<br />
<br />
* '''Channel count''' 'c'<br />
The channel count byte specifies the number of output channels (1...255) for this Ogg Opus stream.<br />
<br />
* '''Pre-skip'''<br />
This is the number of samples (at 48 kHz) to discard from the decoder output when starting playback, and also the number to subtract from a page's granule position to calculate its PCM sample position.<br />
<br />
When constructing cropped Ogg Opus streams, we recommend a pre-skip of at least 3840 samples (80 ms) to ensure complete convergence.<br />
<br />
* '''Input sample rate'''<br />
This is ''not'' the sample rate to use for playback of the encoded data.<br />
<br />
Opus has a handful of coding modes, with internal audio bandwidths of 4, 6, 8, 12, and 20 kHz. Each packet in the stream may have a different audio bandwidth. Regardless of the audio bandwidth, the reference decoder supports decoding any stream at a sample rate of 8, 12, 16, 24, or 48 kHz. The original sample rate of the encoder input is not preserved by the lossy compression.<br />
<br />
An Ogg Opus player SHOULD select the playback sample rate according to the following procedure:<br />
* If the hardware supports 48 kHz playback, decode at 48 kHz,<br />
* else if the hardware's highest available sample rate is a supported rate, decode at this sample rate,<br />
* else if the hardware's highest available sample rate is less than 48 kHz, decode at the next higher supported rate and resample,<br />
* else decode at 48 kHz and resample.<br />
<br />
However, the 'input sample rate' field allows the encoder to pass the sample rate of the original input stream as metadata. This may be useful when the user requires the output sample rate to match the input sample rate. For example, a non-player decoder writing PCM format to disk might choose to resample the output audio back to the original input rate to reduce surprise to the user, who might reasonably expect to get back a file with the same sample rate as the one they fed to the encoder.<br />
<br />
A value of zero indicates 'unspecified'. Encoders SHOULD write the actual input rate or zero, but decoder implementations which do something with this field SHOULD take care to behave sanely if given crazy values (e.g. don't actually upsample the output to 10 MHz if requested).<br />
<br />
* '''Output gain'''<br />
This is a gain to be applied by the decoder. Virtually all players and media frameworks should apply it by default. If a player chooses to apply any volume adjustment or gain modification, such as the R128_TRACK_GAIN or a user-facing volume knob, the adjustment MUST be applied ''in addition'' to this output gain in order to achieve playback at the desired volume.<br />
<br />
An encoder SHOULD set the output gain to zero, and instead apply any gain prior to encoding, when this is possible and does not conflict with the user's wishes. The output gain should only be nonzero when the gain is adjusted after encoding, or when the user wishes to adjust the gain for playback while preserving the ability to recover the original signal amplitude.<br />
<br />
Although the output gain has enormous range (+/- 128 dB, enough to amplify inaudible sounds to the threshold of physical pain), most applications can only reasonably use a small portion of this range around zero. The large range serves in part to ensure that gain can always be losslessly transferred between OpusHead and R128_TRACK_GAIN (see below) without saturating.<br />
<br />
The gain is the 20 log<sub>10</sub> ratio of output to input sample values to be applied to the decoder output. E.g. <code>sample *= pow(10, header.gain/(20.*256))</code> where header.gain is the raw 16 bit Q7.8 value from the header.<br />
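<br />
As a minimal sketch of the formula above, the following reads the raw field as a signed 16-bit little-endian integer and converts it to a linear scale factor. The function names are hypothetical; only the arithmetic comes from the text.<br />
<br />
```python
import struct


def read_output_gain(two_bytes):
    """Decode the raw Q7.8 output gain (signed 16 bit, little endian)."""
    (raw,) = struct.unpack("<h", two_bytes)
    return raw


def gain_scale(raw_gain_q8):
    """Linear factor to multiply each decoded sample by.

    The raw value is in units of 1/256 dB, and amplitude gain in dB is
    20*log10(ratio), hence the division by 20*256.
    """
    return 10.0 ** (raw_gain_q8 / (20.0 * 256.0))
```

A raw value of 0 leaves the samples unchanged, 5120 (+20 dB) scales them by 10, and -5120 (-20 dB) scales them by 0.1.<br />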
<br />
* '''Channel mapping family'''<br />
This byte indicates the order and semantic meaning of the various channels encoded in each Opus packet. <br />
<br />
Each possible value of this byte indicates a ''mapping family'', which defines a set of allowed numbers of channels, and the ordered set of channel names for each allowed number of channels. Currently there are three defined mapping families, although more may be added:<br />
<br />
* Family 0 (RTP mapping)<br />
** Allowed numbers of channels: 1 or 2<br />
** 1 channel: monophonic (mono)<br />
** 2 channels: stereo (left, right)<br />
** '''Special mapping''': this channel mapping value also indicates that the content consists of a single Opus stream that is stereo if and only if c==2, with stream index 0 mapped to channel 0, and (if stereo) stream index 1 mapped to channel 1. When the channel mapping byte has this value, no further fields are present in OpusHead.<br />
* Family 1 ([http://www.xiph.org/vorbis/doc/Vorbis_I_spec.html#x1-800004.3.9 Vorbis channel order])<br />
** Allowed numbers of channels: 1 ... 8<br />
** Channel meanings depend on the number of channels, see the Vorbis mapping for details.<br />
* Family 255 (no defined channel meaning)<br />
** Allowed numbers of channels: 1...255<br />
** Channels are unidentified. General-purpose players SHOULD NOT attempt to play these streams, and offline decoders MAY deinterleave the output into separate PCM files, one per channel. Decoders SHOULD NOT produce output for channels mapped to stream index 255 (pure silence) unless they have no other way to indicate the index of non-silent channels.<br />
<br />
The remaining channel mapping families (2...254) are reserved. A decoder encountering a reserved mapping byte should act as though the mapping byte is 255.<br />
<br />
An Ogg Opus player MUST play any Ogg Opus stream with a channel mapping family of 0 or 1, even if the number of channels does not match the physically connected audio hardware. Players SHOULD perform channel mixing to increase or reduce the number of channels as needed.<br />
<br />
* '''Stream count''' 'N'<br />
This field indicates the total number of streams so the decoder can correctly parse the packed Opus packets inside the Ogg packet.<br />
<br />
For channel mapping family 0, this value defaults to 1, and is not coded.<br />
<br />
A multi-channel Opus file is composed of one or more individual Opus streams, each of which produces one or two channels of decoded data. Each Ogg packet contains one Opus packet from each stream. The first N-1 Opus packets are packed using the self-delimiting framing from Appendix B of the Opus Specification. The remaining Opus packet is packed using the regular, undelimited framing from Section 3 of the Opus Specification. All the Opus packets in a single Ogg packet MUST be constrained to produce the same number of decoded samples. A decoder SHOULD treat any Opus packet whose duration is different from that of the first Opus packet in an Ogg packet as if it were an Opus packet with an illegal TOC sequence.<br />
<br />
* '''Two-channel stream count''' 'M'<br />
Describes the number of streams whose decoders should be configured to produce two channels. This must be no larger than the number of total streams.<br />
<br />
For channel mapping family 0, this value defaults to c-1 (i.e., 0 for mono and 1 for stereo), and is not coded.<br />
<br />
Each packet in an Opus stream has an internal channel count of 1 or 2, which can change from packet to packet. This is selected by the encoder depending on the bitrate and the contents being encoded. The original channel count of the encoder input is not preserved by the lossy compression.<br />
<br />
Regardless of the internal channel count, any Opus stream may be decoded as mono (a single channel) or stereo (two channels) by appropriate initialization of the decoder. The "two-channel stream count" field indicates that the first M Opus decoders should be initialized in stereo mode, and the remaining N-M decoders should be initialized in mono mode. The total number of decoded channels (M+N) MUST be no larger than 255, as there is no way to index more channels than that in the channel mapping.<br />
<br />
* '''Channel mapping'''<br />
Contains one index per output channel indicating which decoded channel should be used. If the index is less than 2*M, the output MUST be taken from decoding stream (index/2) as stereo and selecting the left channel if index is even, and the right channel if index is odd. If the index is 2*M or larger, the output MUST be taken from decoding stream (index-M) as mono. As a special case, an index of 255 means that the corresponding output channel MUST contain pure silence.<br />
<br />
For channel mapping family 0, the first index defaults to 0, and if c==2, the second index defaults to 1. Neither index is coded.<br />
<br />
The number of output channels (c) is not constrained to match the number of decoded channels (M+N). A single index MAY appear multiple times, i.e., the same decoded channel may be mapped to multiple output channels. Likewise, some decoded channels might not be assigned to any output channel.<br />
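<br />
The index rules for the channel mapping can be illustrated with a short helper. The name <code>resolve_mapping_index</code> and its return convention are assumptions made for this sketch; the arithmetic follows the description above, with (index/2) taken as integer division.<br />
<br />
```python
def resolve_mapping_index(index, stream_count, stereo_stream_count):
    """Map one channel-mapping index to (stream number, channel name).

    stream_count is N and stereo_stream_count is M from the ID header.
    Returns None when the output channel must be pure silence.
    """
    if index == 255:
        return None
    if index >= stream_count + stereo_stream_count:
        raise ValueError("index refers to a decoded channel that does not exist")
    if index < 2 * stereo_stream_count:
        # The first M streams are decoded as stereo: even = left, odd = right.
        return index // 2, "left" if index % 2 == 0 else "right"
    # The remaining N-M streams are decoded as mono.
    return index - stereo_stream_count, "mono"
```

With N = 4 and M = 2, for instance, indices 0 to 3 select the left and right channels of streams 0 and 1, while indices 4 and 5 select the mono streams 2 and 3.<br />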
<br />
==== Comment Header ====<br />
<br />
- 8 byte 'OpusTags' magic signature (64 bits)<br />
- The remaining data follows the vorbis-comment header design used in OggVorbis (without the "framing-bit"), OggTheora, and Speex:<br />
* Vendor string (always present).<br />
** 4-byte little-endian length field, followed by length bytes of UTF-8 vendor string.<br />
* TAG=value metadata strings (zero or more).<br />
** 4-byte little-endian string count.<br />
** Count strings consisting of 4-byte little-endian length and length bytes of UTF-8 string in "tag=value" form.<br />
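<br />
A minimal sketch of a parser for this layout follows. The function name and the choice to return the tags as a list of (tag, value) pairs are assumptions; the byte layout is the one listed above.<br />
<br />
```python
import struct


def parse_opus_tags(packet):
    """Return (vendor, [(tag, value), ...]) from a complete OpusTags packet."""
    if packet[:8] != b"OpusTags":
        raise ValueError("missing OpusTags magic signature")

    def read_u32(pos):
        if pos + 4 > len(packet):
            raise ValueError("truncated 32-bit field")
        (value,) = struct.unpack_from("<I", packet, pos)
        return value, pos + 4

    def read_string(pos):
        length, pos = read_u32(pos)
        if pos + length > len(packet):
            raise ValueError("string runs past the end of the packet")
        return packet[pos:pos + length].decode("utf-8"), pos + length

    vendor, pos = read_string(8)
    count, pos = read_u32(pos)
    comments = []
    for _ in range(count):
        text, pos = read_string(pos)
        # Split on the first '=' only; the value may itself contain '='.
        tag, _sep, value = text.partition("=")
        comments.append((tag, value))
    return vendor, comments
```

Because every length is checked against the packet size before it is used, a corrupt string count fails quickly instead of triggering a large allocation.<br />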
<br />
One new comment field is introduced for Ogg Opus:<br />
R128_TRACK_GAIN=-573 <br />
representing the volume shift needed to normalize the track's volume. The gain is a Q7.8 fixed point number in dB, as in the OpusHead "output gain" field. This field is similar to the [[VorbisComment#Replay_Gain|REPLAYGAIN_TRACK_GAIN field in Vorbis]], although the normal volume reference is the [http://tech.ebu.ch/loudness EBU-R128] standard.<br />
<br />
An Ogg Opus file MUST NOT have more than one such field, and if present its value MUST be an integer from -32768 to +32767 inclusive, represented in ASCII with no whitespace. If present, it MUST correctly represent the R128 normalization gain (relative to the OpusHead output gain). If a player chooses to make use of the TRACK_GAIN, it MUST be applied ''in addition'' to the OpusHead output gain. If an encoder populates the TRACK_GAIN field, and the output gain is not otherwise constrained or specified, the encoder SHOULD write the R128 gain into the OpusHead output gain and write "R128_TRACK_GAIN=0". If a tool modifies the OpusHead "output gain" field, it MUST also update or remove the R128_TRACK_GAIN comment field.<br />
<br />
There is no comment field corresponding to Replaygain's ALBUM_GAIN; that information should instead be stored in the OpusHead 'output gain' field.<br />
<br />
To avoid confusion with multiple normalization schemes, an OpusTags packet SHOULD NOT contain any of the REPLAYGAIN_TRACK_GAIN, REPLAYGAIN_TRACK_PEAK, REPLAYGAIN_ALBUM_GAIN, or REPLAYGAIN_ALBUM_PEAK fields.<br />
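<br />
The sketch below shows one way a player might validate the R128_TRACK_GAIN value and combine it with the OpusHead output gain. The function names are hypothetical, and accepting a leading '+' sign is an assumption; the text only requires an ASCII integer in range with no whitespace.<br />
<br />
```python
import re


def parse_r128_track_gain(value):
    """Parse the comment value as a Q7.8 gain, rejecting malformed input."""
    # [0-9] rather than \d, so that only ASCII digits are accepted.
    if not re.fullmatch(r"[+-]?[0-9]+", value):
        raise ValueError("R128_TRACK_GAIN must be a plain ASCII integer")
    gain = int(value)
    if not -32768 <= gain <= 32767:
        raise ValueError("R128_TRACK_GAIN out of range")
    return gain


def playback_scale(output_gain_q8, track_gain_q8=0):
    """Linear scale factor with the track gain applied in addition to the output gain."""
    total_q8 = output_gain_q8 + track_gain_q8  # gains in dB add
    return 10.0 ** (total_q8 / (20.0 * 256.0))
```

A player that ignores the track gain simply calls <code>playback_scale</code> with the output gain alone.<br />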
<br />
== Other Implementation Notes ==<br />
<br />
When seeking within an Ogg Opus stream, the decoder should start decoding (and discarding the output) at least 3840 samples (80 ms) prior to the seek point in order to ensure that the output audio is correct at the seek point.<br />
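<br />
As a small worked example of the pre-roll rule, the helper below computes where decoding should begin and how many decoded samples to discard before output starts. The name and the <code>first_granule</code> parameter (the granule position at the start of the stream) are assumptions of this sketch.<br />
<br />
```python
PREROLL_SAMPLES = 3840  # 80 ms at 48 kHz


def decode_start_for_seek(target_granule, first_granule=0):
    """Return (start_granule, samples_to_discard) for a seek to target_granule."""
    # Never start before the beginning of the stream.
    start = max(first_granule, target_granule - PREROLL_SAMPLES)
    return start, target_granule - start
```

In practice the decoder starts at the packet boundary at or before <code>start_granule</code>, so the number of discarded samples may be somewhat larger than the value returned here.<br />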
<br />
Technically valid Opus packets can be arbitrarily large due to the padding format, although the amount of non-padding data they can contain is bounded. These packets may be spread over a similarly enormous number of Ogg pages. Decoders SHOULD avoid attempting to allocate excessive amounts of memory when presented with a very large packet. The presence of an extremely large packet in the stream could indicate a potential memory exhaustion attack or stream corruption. Decoders should reject a packet that is too large to process, and print a warning message. In an Ogg Opus stream, the largest possible valid packet that does not use padding has a size of 15630988 bytes (14.9 MiB) and can span up to 61298 Ogg pages, all but one of which will have a granulepos of -1. This is of course a very extreme packet, consisting of 255 channels, each containing 120 ms of audio encoded as 2.5 ms frames, each frame using the maximum possible number of bytes and stored in the least efficient manner allowed.<br />
<br />
== Test Vectors ==<br />
<br />
* [[OggOpus/testvectors|Planned test vectors for OggOpus]]<br />
* Opus test vectors</div>Derfhttps://wiki.xiph.org/index.php?title=OggOpus&diff=13363OggOpus2012-05-10T15:19:32Z<p>Derf: </p>
<hr />
<div>== Ogg Mapping for Opus ==<br />
<br />
The IETF Opus codec is a low-latency audio codec optimized for both voice and general-purpose audio. See the [http://tools.ietf.org/html/draft-ietf-codec-opus Opus Specification] for technical details.<br />
<br />
Almost everything about Opus is either fixed or dynamically switchable, so most of the usual ID and setup header parameters in the header packets of an Ogg encapsulation aren't needed. In particular, bitrate, packet duration, mono/stereo flags, and coding modes are all dynamically switchable from packet to packet. The first one or two bytes in each data packet, the start of the 'TOC sequence' that defines the layout of the packet, specifies all of these parameters for that particular packet. See Section 3 of the Opus Specification for the exact format of the TOC sequence.<br />
<br />
The remaining parameters that must be signaled are<br />
<br />
* The magic number for stream identification,<br />
* The stream count and coupling for multichannel audio, and<br />
* Any metadata or tags.<br />
<br />
=== Packet Organization ===<br />
<br />
Opus is framed in a continuous logical [http://www.xiph.org/ogg/doc/framing.html Ogg stream]. <br />
<br />
There are two mandatory headers. The granule position of the pages containing these headers MUST be zero.<br />
<br />
The first packet in the logical Ogg stream MUST contain the identification header, which uniquely identifies a stream as Opus audio. It MUST begin with the 8 bytes "OpusHead". It MUST be placed alone in the first page of the logical Ogg stream. This page MUST have the ’beginning of stream’ flag set.<br />
<br />
The second Opus packet MUST contain the comment header. It must begin with the 8 bytes "OpusTags". It MAY span one or more pages, beginning on the second page of the logical stream. However many pages it spans, the comment header packet MUST finish the page on which it ends.<br />
<br />
All subsequent pages are audio data pages and the packets they contain are audio data packets. The first audio page SHOULD NOT have the 'continued packet' flag set (which would indicate the first packet is continued from a previous page). Packets MUST be placed into Ogg pages in order until the end of stream. Audio packets MAY span page boundaries. A decoder MUST treat a zero-byte audio packet as if it were an Opus packet with an illegal TOC sequence. The last page SHOULD have the 'end of stream' flag set, but implementations should be prepared to deal with truncated streams which do not have a page marked 'end of stream'. The final packet SHOULD complete on the last page, i.e., the final lacing value should be less than 255. There MUST NOT be any more pages in an Opus logical stream after a page marked 'end of stream'.<br />
<br />
=== Granule Position ===<br />
<br />
The granule position of an audio page encodes the total number of PCM samples in the stream up to and including the last fully-decodable sample from the last packet ''completed'' on that page. A page that is entirely spanned by a single packet (that completes on a subsequent page) has no granule position, and the granule position field MUST be set to the special value ’-1’ in two's complement.<br />
<br />
The granule position of an audio page is in units of PCM audio samples at a fixed rate of 48 kHz (per channel; a stereo stream’s granule position does not increment at twice the speed of a mono stream). It is possible to run a decoder at other sampling rates, but the format and this specification always count samples assuming a 48 kHz decoding rate.<br />
<br />
The duration of an Opus packet may be any multiple of 2.5 ms, up to a maximum of 120 ms. This duration is encoded in the TOC sequence at the beginning of each packet. The number of samples returned by a decoder corresponds to this duration exactly, even for the first few packets. For example, a 20 ms packet fed to a decoder running at 48 kHz will always return 960 samples. A demuxer can parse these TOC sequences to work backwards or forwards from a packet with a known granule position (i.e., the last packet completed on some page) in order to assign granule positions to every packet, or even every individual sample. The one exception is the last page in the stream, as described below. Every other page with completed packets after the first MUST have a granule position equal to the number of samples contained in packets that complete on that page plus the granule position of the most recent page with completed packets. This guarantees that a demuxer will assign individual packets the same granule position when working forwards as when working backwards. There must not be any gaps. In order to support capturing a stream that uses discontinuous transmission (DTX), an encoder SHOULD emit packets that explicitly request the use of Packet Loss Concealment (PLC) (i.e., with a frame length of 0, as defined in Section 3.2.1 of the Opus Specification) in place of the packets that were not transmitted.<br />
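<br />
To illustrate how a demuxer can obtain a packet's duration without decoding it, the sketch below reads the TOC byte as laid out in Section 3 of the Opus Specification (a 5-bit configuration number, a stereo flag, and a 2-bit frame count code). The function name is hypothetical, and the sketch covers a single Opus packet only, not the multi-stream packing of several Opus packets in one Ogg packet.<br />
<br />
```python
def packet_samples_48k(packet):
    """Return the duration of one Opus packet in samples at 48 kHz."""
    if len(packet) < 1:
        raise ValueError("zero-byte packet (treated as an illegal TOC sequence)")
    toc = packet[0]
    config = toc >> 3
    if config < 12:
        # SILK-only modes: 10, 20, 40 or 60 ms frames.
        frame = (480, 960, 1920, 2880)[config & 3]
    elif config < 16:
        # Hybrid modes: 10 or 20 ms frames.
        frame = (480, 960)[config & 1]
    else:
        # CELT-only modes: 2.5, 5, 10 or 20 ms frames.
        frame = (120, 240, 480, 960)[config & 3]
    code = toc & 3
    if code == 0:
        count = 1
    elif code in (1, 2):
        count = 2
    else:
        # Code 3: the frame count is in the low 6 bits of the second byte.
        if len(packet) < 2:
            raise ValueError("code 3 packet is missing its frame count byte")
        count = packet[1] & 0x3F
    total = frame * count
    if count == 0 or total > 5760:  # 5760 samples = 120 ms
        raise ValueError("invalid frame count")
    return total
```

A single 20 ms frame, for example TOC byte 0xF8, yields 960 samples, matching the figure given above.<br />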
<br />
There is some amount of latency introduced during the decoding process, to allow for overlap in the MDCT modes, stereo mixing in the LP modes, and resampling, and the encoder will introduce even more latency (though the exact amount is not specified). Therefore the first few samples produced by the decoder do not correspond to any real, input audio, but are instead composed of padding inserted by the encoder to compensate for this latency. These samples must be stored and decoded, as Opus is an asymptotically convergent predictive codec, meaning the decoded contents of each frame depend on the recent history of decoder inputs. A 'pre-skip' field in the ID header signals the number of samples which should be skipped at the beginning of the stream. This provides sufficient history to the decoder so that it has already converged before the stream's output begins. It may also be used to perform sample-accurate cropping of existing encoded streams. This amount need not be a multiple of 2.5 ms, may be smaller than a single packet, or may span the contents of several packets.<br />
<br />
The PCM sample position is determined from the granule position using the formula<br />
<br />
'PCM sample position' = 'granule position' - 'pre-skip' .<br />
<br />
For example, if the granule position of the first page is 59971, and the pre-skip is 11971, then the PCM sample position of the last decoded sample from the first page is 48000. This may be converted into a playback time using the formula<br />
<br />
 'playback time' = 'PCM sample position' / 48000.0 .<br />
<br />
The initial PCM sample position before any samples are played is normally '0'. In this case, the PCM sample position of the first audio sample to be played starts at '1', because it marks the time on the clock ''after'' that sample has been played, and a stream that is exactly one second long has a final PCM sample position of '48000', as in the example here.<br />
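<br />
The two formulas can be written directly as code. The function names are chosen for this sketch only; the numbers in the usage note are the ones from the example in the text.<br />
<br />
```python
def pcm_sample_position(granule_position, pre_skip):
    """PCM sample position of the last sample completed on a page."""
    return granule_position - pre_skip


def playback_time_seconds(granule_position, pre_skip):
    """Playback time in seconds; granule positions always count 48 kHz samples."""
    return pcm_sample_position(granule_position, pre_skip) / 48000.0
```

With a granule position of 59971 and a pre-skip of 11971, the PCM sample position is 48000 and the playback time is exactly one second.<br />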
<br />
Vorbis streams use a granule position smaller than the number of audio samples contained in the first page to indicate that some of those samples must be trimmed from the output. However, to do so it requires that the first page contains exactly two packets, in order to allow the decoder to perform PCM position adjustments before needing to return any PCM data. Opus uses the pre-skip mechanism for this purpose instead, since the encoder may introduce more than a single packet's worth of latency, and since very large packets in streams with a very large number of channels may not fit on a single page.<br />
<br />
The page with the 'end of stream' flag set MAY have a granule position that indicates the page contains less audio data than would normally be returned by decoding up through the final packet. This is used to end the stream somewhere other than an even frame boundary. The granule position of the most recent audio page with completed packets is used to make this determination, or '0' is used if there were no previous audio pages with a completed packet. The difference between these granule positions indicates how many samples to keep after decoding the packets that completed on the final page. The remaining samples are discarded. The number of discarded samples SHOULD be smaller than the number decoded from the last packet.<br />
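<br />
End trimming can be sketched as follows. The function name and arguments are hypothetical: <code>decoded_samples</code> stands for the total number of samples decoded from the packets that complete on the final page, and <code>previous_granule</code> is 0 when no earlier audio page completed a packet. Treating a granule position that asks for more samples than were decoded as "keep everything" is an assumption of this sketch.<br />
<br />
```python
def trim_final_page(final_granule, previous_granule, decoded_samples):
    """Return (samples_to_keep, samples_to_discard) for the last page."""
    keep = final_granule - previous_granule
    if keep < 0:
        raise ValueError("granule position decreases on the final page")
    # Assumption: never keep more samples than were actually decoded.
    keep = min(keep, decoded_samples)
    return keep, decoded_samples - keep
```

For instance, if the previous page ended at granule position 48000, the final page carries one 960-sample packet, and its granule position is 48500, then 500 samples are kept and 460 are discarded.<br />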
<br />
The granule position of the first audio page with a completed packet MAY be larger than the number of samples contained in packets that complete on that page, however it MUST NOT be smaller, unless that page has the 'end of stream' flag set. Allowing a granule position larger than the number of samples allows the beginning of a stream to be cropped without rewriting the granule position of all the remaining pages. This means that the PCM sample position just before the first sample to be played may be larger than '0', but the PCM sample position relative to '0' should still be used for the purposes of synchronization when multiplexing with other logical streams. This does not affect the behavior of pre-skip: exactly 'pre-skip' samples should be skipped from the beginning of the decoded output, even if the initial PCM sample position is greater than zero.<br />
<br />
On the other hand, a granule position that is smaller than the number of decoded samples prevents a demuxer from working backwards to assign each packet or each individual sample a valid granule position, since granule positions must be non-negative. A decoder MUST reject as invalid any stream where the granule position is smaller than the number of samples contained in packets that complete on the first page with a completed packet, unless that page has the 'end of stream' flag set. It MAY defer this action until it decodes the last packet completed on that page. If that page has the 'end of stream' flag set, a demuxer can work forwards from the granule position '0', but MUST reject as invalid any stream where the granule position is smaller than the 'pre-skip' amount. This would indicate that more samples should be skipped from the initial decoded output than exist in the stream.<br />
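<br />
The validity rules for the first audio page with a completed packet might be checked as in this sketch. The function name, its arguments, and the idea of returning the granule position at the start of the stream are assumptions; <code>samples_on_page</code> is the number of samples in the packets that complete on that page.<br />
<br />
```python
def initial_granule_position(granule, samples_on_page, pre_skip, end_of_stream):
    """Validate the first audio page and return the stream's starting granule position."""
    if granule < samples_on_page:
        if not end_of_stream:
            raise ValueError("granule position smaller than the samples on the first page")
        if granule < pre_skip:
            raise ValueError("more samples to skip than exist in the stream")
        # Short single-page stream: work forwards from granule position 0.
        return 0
    # A larger value means the beginning of the stream was cropped.
    return granule - samples_on_page
```

A return value greater than zero corresponds to the cropped-stream case described above, in which the PCM sample position before the first played sample is larger than 0.<br />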
<br />
==== ID Header ====<br />
<br />
  0                   1                   2                   3<br />
  0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1<br />
 +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+<br />
 |      'O'      |      'p'      |      'u'      |      's'      |<br />
 +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+<br />
 |      'H'      |      'e'      |      'a'      |      'd'      |<br />
 +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+<br />
 |  version = 0  | channel count |           pre-skip            |<br />
 +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+<br />
 |               original input sample rate in Hz                |<br />
 +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+<br />
 |    output gain Q7.8 in dB     |  channel map  |               |<br />
 +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+               :<br />
 |                                                               |<br />
 :               optional channel mapping table...               :<br />
 |                                                               |<br />
 +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+<br />
<br />
<br />
- Magic signature: "OpusHead" (64 bits)<br />
- Version number (8 bits unsigned): zero for this spec<br />
- Channel count 'c' (8 bits unsigned): MUST be > 0<br />
- Pre-skip (16 bits unsigned, little endian)<br />
- Input sample rate (32 bits unsigned, little endian): informational only<br />
- Output gain (16 bits, little endian, signed Q7.8 in dB) to apply when decoding<br />
- Channel mapping family (8 bits unsigned)<br />
-- 0 = one stream: mono or L,R stereo<br />
-- 1 = channels in vorbis spec order: mono or L,R stereo or ... or FL,C,FR,RL,RR,LFE, ...<br />
-- 2..254 = reserved (treat as 255)<br />
-- 255 = no defined channel meaning<br />
If channel mapping family > 0<br />
- Stream count 'N' (8 bits unsigned): MUST be > 0<br />
- Two-channel stream count 'M' (8 bits unsigned): MUST satisfy M <= N, M+N <= 255<br />
- Channel mapping (8*c bits)<br />
-- one stream index (8 bits unsigned) per channel (255 means silent throughout the file)<br />
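<br />
The field list above translates into a short parser. This is a sketch under stated assumptions: the function name and the dictionary keys are invented for illustration, and the version field is returned rather than enforced.<br />
<br />
```python
import struct

# Fixed part: magic, version, channel count, pre-skip, input rate, gain, family.
_FIXED = struct.Struct("<8sBBHIhB")  # 19 bytes, little endian, no padding


def parse_opus_head(packet):
    """Parse an OpusHead packet into a dictionary of its fields."""
    if len(packet) < _FIXED.size:
        raise ValueError("OpusHead packet too short")
    magic, version, channels, pre_skip, rate, gain, family = _FIXED.unpack_from(packet, 0)
    if magic != b"OpusHead":
        raise ValueError("missing OpusHead magic signature")
    if channels == 0:
        raise ValueError("channel count MUST be > 0")
    if family == 0:
        # No further fields are coded; N, M and the mapping take default values.
        if channels > 2:
            raise ValueError("mapping family 0 allows only 1 or 2 channels")
        streams, coupled = 1, channels - 1
        mapping = list(range(channels))
    else:
        if len(packet) < _FIXED.size + 2 + channels:
            raise ValueError("channel mapping table truncated")
        streams, coupled = packet[19], packet[20]
        if streams == 0 or coupled > streams or streams + coupled > 255:
            raise ValueError("invalid stream counts")
        mapping = list(packet[21:21 + channels])
    return {
        "version": version, "channels": channels, "pre_skip": pre_skip,
        "input_sample_rate": rate, "output_gain_q8": gain,
        "mapping_family": family, "stream_count": streams,
        "two_channel_stream_count": coupled, "channel_mapping": mapping,
    }
```

For mapping family 0 the returned stream counts and mapping are the defaults described in the discussion of those fields, since they are not present in the packet.<br />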
<br />
<br />
Some discussion is in order.<br />
<br />
* '''Magic signature'''<br />
The magic signature "OpusHead" allows codec identification and is human readable. Starting with 'Op' helps distinguish it from data packets, as this is an invalid TOC sequence.<br />
<br />
* '''Version'''<br />
The version number must always be zero for this version of the encapsulation spec. We do not plan to revise the spec, but this also acts as a null terminator for the signature bytes and helps align the rest of the fields.<br />
<br />
* '''Channel count''' 'c'<br />
This byte specifies the number of output channels (1...255) for this Ogg Opus stream.<br />
<br />
* '''Pre-skip'''<br />
This is the number of samples (at 48 kHz) to discard from the decoder output when starting playback, and also the number to subtract from a page's granule position to calculate its PCM sample position.<br />
<br />
When constructing cropped Ogg Opus streams, we recommend a pre-skip of at least 3840 samples (80 ms) to ensure complete convergence.<br />
<br />
* '''Input sample rate'''<br />
This is ''not'' the sample rate to use for playback of the encoded data.<br />
<br />
Opus has a handful of coding modes, with internal audio bandwidths of 4, 6, 8, 12, and 20 kHz. Each packet in the stream may have a different audio bandwidth. Regardless of the audio bandwidth, the reference decoder supports decoding any stream at a sample rate of 8, 12, 16, 24, or 48 kHz. The original sample rate of the encoder input is not preserved by the lossy compression.<br />
<br />
An Ogg Opus player SHOULD select the playback sample rate according to the following procedure:<br />
* If the hardware supports 48 kHz playback, decode at 48 kHz,<br />
* else if the hardware's highest available sample rate is a supported rate, decode at this sample rate,<br />
* else if the hardware's highest available sample rate is less than 48 kHz, decode at the next higher supported rate and resample,<br />
* else decode at 48 kHz and resample.<br />
<br />
However, the 'input sample rate' field allows the encoder to pass the sample rate of the original input stream as metadata. This may be useful when the user requires the output sample rate to match the input sample rate. For example, a non-player decoder writing PCM format to disk might choose to resample the output audio back to the original input rate to reduce surprise to the user, who might reasonably expect to get back a file with the same sample rate as the one they fed to the encoder.<br />
<br />
A value of zero indicates 'unspecified'. Encoders SHOULD write the actual input rate or zero, but decoder implementations which do something with this field SHOULD take care to behave sanely if given crazy values (e.g. don't actually upsample the output to 10 MHz if requested).<br />
<br />
* '''Output gain'''<br />
This is a gain to be applied by the decoder. Virtually all players and media frameworks should apply it by default. If a player chooses to apply any volume adjustment or gain modification, such as the R128_TRACK_GAIN or a user-facing volume knob, the adjustment MUST be applied ''in addition'' to this output gain in order to achieve playback at the desired volume.<br />
<br />
An encoder SHOULD set the output gain to zero, and instead apply any gain prior to encoding, when this is possible and does not conflict with the user's wishes. The output gain should only be nonzero when the gain is adjusted after encoding, or when the user wishes to adjust the gain for playback while preserving the ability to recover the original signal amplitude.<br />
<br />
Although the output gain has enormous range (+/- 128 dB, enough to amplify inaudible sounds to the threshold of physical pain), most applications can only reasonably use a small portion of this range around zero. The large range serves in part to ensure that gain can always be losslessly transferred between OpusHead and R128_TRACK_GAIN (see below) without saturating.<br />
<br />
The gain is the 20 log<sub>10</sub> ratio of output to input sample values to be applied to the decoder output. E.g. <code>sample *= pow(10, header.gain/(20.*256))</code> where header.gain is the raw 16 bit Q7.8 value from the header.<br />
<br />
* '''Channel mapping family'''<br />
This byte indicates the order and semantic meaning of the various channels encoded in each Opus packet. <br />
<br />
Each possible value of this byte indicates a ''mapping family'', which defines a set of allowed numbers of channels, and the ordered set of channel names for each allowed number of channels. Currently there are three defined mapping families, although more may be added:<br />
<br />
* Family 0 (RTP mapping)<br />
** Allowed numbers of channels: 1 or 2<br />
** 1 channel: monophonic (mono)<br />
** 2 channels: stereo (left, right)<br />
** '''Special mapping''': this channel mapping value also indicates that the content consists of a single Opus stream that is stereo if and only if c==2, with stream index 0 mapped to channel 0, and (if stereo) stream index 1 mapped to channel 1. When the channel mapping byte has this value, no further fields are present in OpusHead.<br />
* Family 1 ([http://www.xiph.org/vorbis/doc/Vorbis_I_spec.html#x1-800004.3.9 Vorbis channel order])<br />
** Allowed numbers of channels: 1 ... 8<br />
** Channel meanings depend on the number of channels, see the Vorbis mapping for details.<br />
* Family 255 (no defined channel meaning)<br />
** Allowed numbers of channels: 1...255<br />
** Channels are unidentified. General-purpose players SHOULD NOT attempt to play these streams, and offline decoders MAY deinterleave the output into separate PCM files, one per channel. Decoders SHOULD NOT produce output for channels mapped to stream index 255 (pure silence) unless they have no other way to indicate the index of non-silent channels.<br />
<br />
The remaining channel mapping families (2...254) are reserved. A decoder encountering a reserved mapping byte should act as though the mapping byte is 255.<br />
<br />
An Ogg Opus player MUST play any Ogg Opus stream with a channel mapping family of 0 or 1, even if the number of channels does not match the physically connected audio hardware. Players SHOULD perform channel mixing to increase or reduce the number of channels as needed.<br />
<br />
* '''Stream count''' 'N'<br />
This field indicates the total number of streams so the decoder can correctly parse the packed Opus packets inside the Ogg packet.<br />
<br />
For channel mapping family 0, this value defaults to 1, and is not coded.<br />
<br />
A multi-channel Opus file is composed of one or more individual Opus streams, each of which produces one or two channels of decoded data. Each Ogg packet contains one Opus packet from each stream. The first N-1 Opus packets are packed using the self-delimiting framing from Appendix B of the Opus Specification. The remaining Opus packet is packed using the regular, undelimited framing from Section 3 of the Opus Specification. All the Opus packets in a single Ogg packet MUST be constrained to produce the same number of decoded samples. A decoder SHOULD treat any Opus packet whose duration is different from that of the first Opus packet in an Ogg packet as if it were an Opus packet with an illegal TOC sequence.<br />
<br />
* '''Two-channel stream count''' 'M'<br />
Describes the number of streams whose decoders should be configured to produce two channels. This must be no larger than the number of total streams.<br />
<br />
For channel mapping family 0, this value defaults to c-1 (i.e., 0 for mono and 1 for stereo), and is not coded.<br />
<br />
Each packet in an Opus stream has an internal channel count of 1 or 2, which can change from packet to packet. This is selected by the encoder depending on the bitrate and the contents being encoded. The original channel count of the encoder input is not preserved by the lossy compression.<br />
<br />
Regardless of the internal channel count, any Opus stream may be decoded as mono (a single channel) or stereo (two channels) by appropriate initialization of the decoder. The "two-channel stream count" field indicates that the first M Opus decoders should be initialized in stereo mode, and the remaining N-M decoders should be initialized in mono mode. The total number of decoded channels (M+N) MUST be no larger than 255, as there is no way to index more channels than that in the channel mapping.<br />
<br />
* '''Channel mapping'''<br />
Contains one index per output channel indicating which decoded channel should be used. If the index is less than 2*M, the output MUST be taken from decoding stream (index/2) as stereo and selecting the left channel if index is even, and the right channel if index is odd. If the index is 2*M or larger, the output MUST be taken from decoding stream (index-M) as mono. As a special case, an index of 255 means that the corresponding output channel MUST contain pure silence.<br />
<br />
For channel mapping family 0, the first index defaults to 0, and if c==2, the second index defaults to 1. Neither index is coded.<br />
<br />
The number of output channels (c) is not constrained to match the number of decoded channels (M+N). A single index MAY appear multiple times, i.e., the same decoded channel may be mapped to multiple output channels. Likewise, some decoded channels might not be assigned to any output channel.<br />
<br />
==== Comment Header ====<br />
<br />
- 8 byte 'OpusTags' magic signature (64 bits)<br />
- The remaining data follows the vorbis-comment header design used in OggVorbis (without the "framing-bit"), OggTheora, and Speex:<br />
* Vendor string (always present).<br />
** 4-byte little-endian length field, followed by length bytes of UTF-8 vendor string.<br />
* TAG=value metadata strings (zero or more).<br />
** 4-byte little-endian string count.<br />
** Count strings consisting of 4-byte little-endian length and length bytes of UTF-8 string in "tag=value" form.<br />
<br />
One new comment field is introduced for Ogg Opus:<br />
R128_TRACK_GAIN=-573 <br />
representing the volume shift needed to normalize the track's volume. The gain is a Q7.8 fixed point number in dB, as in the OpusHead "output gain" field. This field is similar to the [[VorbisComment#Replay_Gain|REPLAYGAIN_TRACK_GAIN field in Vorbis]], although the normal volume reference is the [http://tech.ebu.ch/loudness EBU-R128] standard.<br />
<br />
An Ogg Opus file MUST NOT have more than one such field, and if present its value MUST be an integer from -32768 to +32767 inclusive, represented in ASCII with no whitespace. If present, it MUST correctly represent the R128 normalization gain (relative to the OpusHead output gain). If a player chooses to make use of the TRACK_GAIN, it MUST be applied ''in addition'' to the OpusHead output gain. If an encoder populates the TRACK_GAIN field, and the output gain is not otherwise constrained or specified, the encoder SHOULD write the R128 gain into the OpusHead output gain and write "R128_TRACK_GAIN=0". If a tool modifies the OpusHead "output gain" field, it MUST also update or remove the R128_TRACK_GAIN comment field.<br />
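<br />
As an illustration of these rules, the sketch below extracts the field from a list of "tag=value" strings and combines it with the OpusHead output gain. The function names are hypothetical. Accepting an optional leading "+" and matching the tag name case-insensitively (as Vorbis comments do) are assumptions of this sketch rather than requirements stated above.<br />
<br />
```python
import re


def parse_r128_track_gain(comments):
    """Return the raw Q7.8 R128_TRACK_GAIN value, or None if the field is absent."""
    values = [c.split("=", 1)[1] for c in comments
              if c.upper().startswith("R128_TRACK_GAIN=")]
    if not values:
        return None
    if len(values) > 1:
        raise ValueError("more than one R128_TRACK_GAIN field")
    text = values[0]
    # ASCII digits only, no whitespace; int() alone would be too permissive.
    if not re.fullmatch(r"[+-]?[0-9]+", text):
        raise ValueError("R128_TRACK_GAIN is not a plain ASCII integer")
    gain = int(text)
    if not -32768 <= gain <= 32767:
        raise ValueError("R128_TRACK_GAIN out of signed 16-bit range")
    return gain


def total_gain_db(output_gain_q78, track_gain_q78):
    """Track gain is applied in addition to the OpusHead output gain."""
    return (output_gain_q78 + track_gain_q78) / 256.0
```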
<br />
There is no comment field corresponding to ReplayGain's ALBUM_GAIN; that information should instead be stored in the OpusHead "output gain" field.<br />
<br />
To avoid confusion with multiple normalization schemes, an OpusTags packet SHOULD NOT contain any of the REPLAYGAIN_TRACK_GAIN, REPLAYGAIN_TRACK_PEAK, REPLAYGAIN_ALBUM_GAIN, or REPLAYGAIN_ALBUM_PEAK fields.<br />
<br />
== Other Implementation Notes ==<br />
<br />
When seeking within an Ogg Opus stream, the decoder should start decoding (and discarding the output) at least 3840 samples (80 ms) prior to the seek point in order to ensure that the output audio is correct at the seek point.<br />
<br />
Technically valid Opus packets can be arbitrarily large due to the padding format, although the amount of non-padding data they can contain is bounded. These packets may be spread over a similarly enormous number of Ogg pages. Decoders SHOULD avoid attempting to allocate excessive amounts of memory when presented with a very large packet. The presence of an extremely large packet in the stream could indicate a potential memory exhaustion attack or stream corruption. Decoders should reject a packet that is too large to process, and print a warning message. In an Ogg Opus stream, the largest possible valid packet that does not use padding has a size of 15630988 bytes (14.9 MiB) and can span up to 61298 Ogg pages, all but one of which will have a granule position of -1. This is of course a very extreme packet, consisting of 255 channels, each containing 120 ms of audio encoded as 2.5 ms frames, each frame using the maximum possible number of bytes and stored in the least efficient manner allowed.<br />
<br />
== Test Vectors ==<br />
<br />
* [[OggOpus/testvectors|Planned test vectors for OggOpus]]<br />
* Opus test vectors</div>Derfhttps://wiki.xiph.org/index.php?title=OggOpus&diff=13362OggOpus2012-05-10T00:31:13Z<p>Derf: /* Granule Position */</p>
<hr />
<div>== Ogg Mapping for Opus ==<br />
<br />
The IETF Opus codec is a low-latency audio codec optimized for both voice and general-purpose audio. See the [http://tools.ietf.org/html/draft-ietf-codec-opus Opus Specification] for technical details.<br />
<br />
Almost everything about Opus is either fixed or dynamically switchable, so most of the usual ID and setup header parameters in the header packets of an Ogg encapsulation aren't needed. In particular, bitrate, packet duration, mono/stereo flags, and coding modes are all dynamically switchable from packet to packet. The first one or two bytes in each data packet, the start of the 'TOC sequence' that defines the layout of the packet, specifies all of these parameters for that particular packet. See Section 3 of the Opus Specification for the exact format of the TOC sequence.<br />
<br />
The remaining parameters that must be signaled are<br />
<br />
* The magic number for stream identification,<br />
* The stream count and coupling for multichannel audio, and<br />
* Any metadata or tags.<br />
<br />
=== Packet Organization ===<br />
<br />
Opus is framed in a continuous logical [http://www.xiph.org/ogg/doc/framing.html Ogg stream]. <br />
<br />
There are two mandatory headers. The granule position of the pages containing these headers MUST be zero.<br />
<br />
The first packet in the logical Ogg stream MUST contain the identification header, which uniquely identifies a stream as Opus audio. It MUST begin with the 8 bytes "OpusHead". It MUST be placed alone in the first page of the logical Ogg stream. This page MUST have the ’beginning of stream’ flag set.<br />
<br />
The second Opus packet MUST contain the comment header. It MUST begin with the 8 bytes "OpusTags". It MAY span one or more pages, beginning on the second page of the logical stream. However many pages it spans, the comment header packet MUST finish the page on which it ends.<br />
<br />
All subsequent pages are audio data pages and the packets they contain are audio data packets. The first audio page SHOULD NOT have the 'continued packet' flag set (which would indicate the first packet is continued from a previous page). Packets MUST be placed into Ogg pages in order until the end of stream. Audio packets MAY span page boundaries. A decoder MUST treat a zero-byte audio packet as if it were an Opus packet with an illegal TOC sequence. The last page SHOULD have the 'end of stream' flag set, but implementations should be prepared to deal with truncated streams which do not have a page marked 'end of stream'. The final packet SHOULD complete on the last page, i.e., the final lacing value should be less than 255. There MUST NOT be any more pages in an Opus logical stream after a page marked 'end of stream'.<br />
<br />
=== Granule Position ===<br />
<br />
The granule position of an audio page encodes the total number of PCM samples in the stream up to and including the last fully-decodable sample from the last packet ''completed'' on that page. A page that is entirely spanned by a single packet (that completes on a subsequent page) has no granule position, and the granule position field MUST be set to the special value ’-1’ in two's complement.<br />
<br />
The granule position of an audio page is in units of PCM audio samples at a fixed rate of 48 kHz (per channel; a stereo stream’s granule position does not increment at twice the speed of a mono stream). It is possible to run a decoder at other sampling rates, but the format and this specification always count samples assuming a 48 kHz decoding rate.<br />
<br />
The duration of an Opus packet may be any multiple of 2.5 ms, up to a maximum of 120 ms. This duration is encoded in the TOC sequence at the beginning of each packet. The number of samples returned by a decoder corresponds to this duration exactly, even for the first few packets. For example, a 20 ms packet fed to a decoder running at 48 kHz will always return 960 samples. A demuxer can parse these TOC sequences to work backwards or forwards from a packet with a known granule position (i.e., the last packet completed on some page) in order to assign granule positions to every packet, or even every individual sample. The one exception is the last page in the stream, as described below. Every other page with completed packets after the first MUST have a granule position equal to the number of samples contained in packets that complete on that page plus the granule position of the most recent page with completed packets. This guarantees that a demuxer will assign individual packets the same granule position when working forwards as when working backwards. There must not be any gaps. In order to support capturing a stream that uses discontinuous transmission (DTX), an encoder SHOULD emit packets that explicitly request the use of Packet Loss Concealment (PLC) (i.e., with a frame length of 0, as defined in Section 3.2.1 of the Opus Specification) in place of the packets that were not transmitted.<br />
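<br />
To make the TOC-based duration calculation concrete, here is a minimal sketch that returns the number of 48 kHz samples in one Opus packet. The bit layout it relies on (a 5-bit configuration, a stereo bit, and a 2-bit frame count code, with the frame count in the low 6 bits of the second byte for code 3) is my reading of Section 3 of the Opus Specification, not something this page defines. The function names are hypothetical.<br />
<br />
```python
MAX_PACKET_SAMPLES = 5760  # 120 ms at 48 kHz


def samples_per_frame(toc: int) -> int:
    """Frame size in 48 kHz samples, derived from the TOC byte."""
    if toc & 0x80:
        # CELT-only configurations: 2.5, 5, 10 or 20 ms.
        return (48000 << ((toc >> 3) & 0x3)) // 400
    if (toc & 0x60) == 0x60:
        # Hybrid configurations: 10 or 20 ms.
        return 960 if toc & 0x08 else 480
    # SILK-only configurations: 10, 20, 40 or 60 ms.
    size = (toc >> 3) & 0x3
    return 2880 if size == 3 else (48000 << size) // 100


def frame_count(packet: bytes) -> int:
    """Number of frames signalled by the frame count code."""
    if not packet:
        raise ValueError("zero-byte packet has no TOC sequence")
    code = packet[0] & 0x3
    if code == 0:
        return 1
    if code in (1, 2):
        return 2
    if len(packet) < 2:
        raise ValueError("code 3 packet is missing its frame count byte")
    return packet[1] & 0x3F


def packet_samples(packet: bytes) -> int:
    """Total decoded samples (at 48 kHz) for one Opus packet."""
    frames = frame_count(packet)
    total = frames * samples_per_frame(packet[0])
    if frames == 0 or total > MAX_PACKET_SAMPLES:
        raise ValueError("invalid packet duration")
    return total
```
<br />
A demuxer could sum <code>packet_samples</code> over the packets completed on a page and compare the result with the difference between consecutive page granule positions.<br />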
<br />
There is some amount of latency introduced during the decoding process, to allow for overlap in the MDCT modes, stereo mixing in the LP modes, and resampling, and the encoder will introduce even more latency (though the exact amount is not specified). Therefore the first few samples produced by the decoder do not correspond to any real, input audio, but are instead composed of padding inserted by the encoder to compensate for this latency. These samples must be stored and decoded, as Opus is an asymptotically convergent predictive codec, meaning the decoded contents of each frame depend on the recent history of decoder inputs. A 'pre-skip' field in the ID header signals the number of samples which should be skipped at the beginning of the stream. This provides sufficient history to the decoder so that it has already converged before the stream's output begins. It may also be used to perform sample-accurate cropping of existing encoded streams. This amount need not be a multiple of 2.5 ms, may be smaller than a single packet, or may span the contents of several packets.<br />
<br />
The PCM sample position is determined from the granule position using the formula<br />
<br />
'PCM sample position' = 'granule position' - 'pre-skip' .<br />
<br />
For example, if the granule position of the first page is 59971, and the pre-skip is 11971, then the PCM sample position of the last decoded sample from the first page is 48000. This may be converted into a playback time using the formula<br />
<br />
'playback time' = 'PCM sample position' / 48000.0 .<br />
<br />
The initial PCM sample position before any samples are played is normally '0'. In this case, the PCM sample position of the first audio sample to be played starts at '1', because it marks the time on the clock ''after'' that sample has been played, and a stream that is exactly one second long has a final PCM sample position of '48000', as in the example here.<br />
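<br />
The two formulas above amount to a subtraction and a division. A small sketch, with hypothetical function names, reproducing the worked example:<br />
<br />
```python
def pcm_sample_position(granule_position: int, pre_skip: int) -> int:
    """PCM sample position = granule position - pre-skip."""
    return granule_position - pre_skip


def playback_time_seconds(granule_position: int, pre_skip: int) -> float:
    """Playback time in seconds; granule positions always count at 48 kHz."""
    return pcm_sample_position(granule_position, pre_skip) / 48000.0


# Worked example from the text: granule position 59971, pre-skip 11971.
position = pcm_sample_position(59971, 11971)
seconds = playback_time_seconds(59971, 11971)
```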
<br />
Vorbis streams use a granule position smaller than the number of audio samples contained in the first page to indicate that some of those samples must be trimmed from the output. However, to do so, Vorbis requires that the first page contain exactly two packets, in order to allow the decoder to perform PCM position adjustments before needing to return any PCM data. Opus uses the pre-skip mechanism for this purpose instead, since the encoder may introduce more than a single packet's worth of latency, and since very large packets in streams with a very large number of channels may not fit on a single page.<br />
<br />
The page with the 'end of stream' flag set MAY have a granule position that indicates the page contains less audio data than would normally be returned by decoding up through the final packet. This is used to end the stream somewhere other than an even frame boundary. The granule position of the most recent audio page with completed packets is used to make this determination, or '0' is used if there were no previous audio pages with a completed packet. The difference between these granule positions indicates how many samples to keep after decoding the packets that completed on the final page. The remaining samples are discarded. The number of discarded samples SHOULD be smaller than the number decoded from the last packet.<br />
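<br />
The end-of-stream trimming rule can be sketched as follows. The function name and the decision to raise an error when the final granule position implies more samples than were decoded (which would be a gap) are assumptions of this sketch.<br />
<br />
```python
def final_page_trim(final_granule: int, previous_granule: int, decoded_samples: int):
    """Return (samples_to_keep, samples_to_discard) for the last page.

    previous_granule is the granule position of the most recent audio page
    with a completed packet, or 0 if there was none. decoded_samples is the
    number of 48 kHz samples decoded from packets completing on the final page.
    """
    keep = final_granule - previous_granule
    if keep < 0:
        raise ValueError("granule position moves backwards")
    if keep > decoded_samples:
        raise ValueError("granule position implies a gap in the stream")
    return keep, decoded_samples - keep
```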
<br />
The granule position of the first audio page with a completed packet MAY be larger than the number of samples contained in packets that complete on that page, however it MUST NOT be smaller, unless that page has the 'end of stream' flag set. Allowing a granule position larger than the number of samples allows the beginning of a stream to be cropped without rewriting the granule position of all the remaining pages. This means that the PCM sample position just before the first sample to be played may be larger than '0', but the PCM sample position relative to '0' should still be used for the purposes of synchronization when multiplexing with other logical streams. This does not affect the behavior of pre-skip: exactly 'pre-skip' samples should be skipped from the beginning of the decoded output, even if the initial PCM sample position is greater than zero.<br />
<br />
On the other hand, a granule position that is smaller than the number of decoded samples prevents a demuxer from working backwards to assign each packet or each individual sample a valid granule position, since granule positions must be non-negative. A decoder MUST reject as invalid any stream where the granule position is smaller than the number of samples contained in packets that complete on the first page with a completed packet, unless that page has the 'end of stream' flag set. It MAY defer this action until it decodes the last packet completed on that page. If that page has the 'end of stream' flag set, a demuxer can work forwards from the granule position '0', but MUST reject as invalid any stream where the granule position is smaller than the 'pre-skip' amount. This would indicate that more samples should be skipped from the initial decoded output than exist in the stream.<br />
<br />
==== ID Header ====<br />
<br />
  0                   1                   2                   3<br />
  0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1<br />
 +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+<br />
 |      'O'      |      'p'      |      'u'      |      's'      |<br />
 +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+<br />
 |      'H'      |      'e'      |      'a'      |      'd'      |<br />
 +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+<br />
 |  version = 0  | channel count |           pre-skip            |<br />
 +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+<br />
 |               original input sample rate in Hz                |<br />
 +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+<br />
 |    output gain Q7.8 in dB     |  channel map  |               |<br />
 +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+               :<br />
 |                                                               |<br />
 :               optional channel mapping table...               :<br />
 |                                                               |<br />
 +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+<br />
<br />
<br />
- Magic signature: "OpusHead" (64 bits)<br />
- Version number (8 bits unsigned): zero for this spec<br />
- Channel count 'c' (8 bits unsigned): MUST be > 0<br />
- Pre-skip (16 bits unsigned, little endian)<br />
- Input sample rate (32 bits unsigned, little endian): informational only<br />
- Output gain (16 bits, little endian, signed Q7.8 in dB) to apply when decoding<br />
- Channel mapping family (8 bits unsigned)<br />
-- 0 = one stream: mono or L,R stereo<br />
-- 1 = channels in vorbis spec order: mono or L,R stereo or ... or FL,C,FR,RL,RR,LFE, ...<br />
-- 2..254 = reserved (treat as 255)<br />
-- 255 = no defined channel meaning<br />
If channel mapping family > 0<br />
- Stream count 'N' (8 bits unsigned): MUST be > 0<br />
- Two-channel stream count 'M' (8 bits unsigned): MUST satisfy M <= N, M+N <= 255<br />
- Channel mapping (8*c bits)<br />
-- one stream index (8 bits unsigned) per channel (255 means silent throughout the file)<br />
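<br />
The field list above maps directly onto a fixed 19-byte prefix followed by the optional mapping table. The sketch below is one possible reading of it: the function name and the dictionary keys are hypothetical, and it returns the version byte without enforcing a particular value.<br />
<br />
```python
import struct


def parse_opus_head(packet: bytes) -> dict:
    """Parse an OpusHead packet as laid out in the field list above."""
    if len(packet) < 19:
        raise ValueError("OpusHead packet is too short")
    # Magic (8), version (1), channel count (1), pre-skip (2, LE),
    # input sample rate (4, LE), output gain (2, LE, signed), family (1).
    magic, version, channels, pre_skip, input_rate, gain, family = \
        struct.unpack_from("<8sBBHIhB", packet, 0)
    if magic != b"OpusHead":
        raise ValueError("missing 'OpusHead' magic signature")
    if channels == 0:
        raise ValueError("channel count MUST be > 0")
    if family == 0:
        # Family 0 carries no further fields; the defaults are implied.
        if channels > 2:
            raise ValueError("family 0 allows only 1 or 2 channels")
        streams, coupled = 1, channels - 1
        mapping = list(range(channels))
    else:
        if len(packet) < 21 + channels:
            raise ValueError("channel mapping table is truncated")
        streams, coupled = packet[19], packet[20]
        if streams == 0 or coupled > streams or streams + coupled > 255:
            raise ValueError("invalid stream counts")
        mapping = list(packet[21:21 + channels])
    return {
        "version": version,
        "channels": channels,
        "pre_skip": pre_skip,
        "input_sample_rate": input_rate,
        "output_gain": gain,
        "mapping_family": family,
        "stream_count": streams,
        "coupled_count": coupled,
        "channel_mapping": mapping,
    }
```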
<br />
<br />
Some discussion is in order.<br />
<br />
* '''Magic signature'''<br />
The magic signature "OpusHead" allows codec identification and is human readable. Starting with 'Op' helps distinguish it from data packets, as this is an invalid TOC sequence.<br />
<br />
* '''Version'''<br />
The version number must always be zero for this version of the encapsulation spec. We do not plan to revise the spec, but this also acts as a null terminator for the signature bytes and helps align the rest of the fields.<br />
<br />
* '''Channel count''' 'c'<br />
This byte specifies the number of output channels (1...255) for this Ogg Opus stream.<br />
<br />
* '''Pre-skip'''<br />
This is the number of samples (at 48 kHz) to discard from the decoder output when starting playback, and also the number to subtract from a page's granule position to calculate its PCM sample position.<br />
<br />
When constructing cropped Ogg Opus streams, we recommend a pre-skip of at least 3840 samples (80 ms) to ensure complete convergence.<br />
<br />
* '''Input sample rate'''<br />
This is ''not'' the sample rate to use for playback of the encoded data.<br />
<br />
Opus has a handful of coding modes, with internal audio bandwidths of 4, 6, 8, 12, and 20 kHz. Each packet in the stream may have a different audio bandwidth. Regardless of the audio bandwidth, the reference decoder supports decoding any stream at a sample rate of 8, 12, 16, 24, or 48 kHz. The original sample rate of the encoder input is not preserved by the lossy compression.<br />
<br />
An Ogg Opus player SHOULD select the playback sample rate according to the following procedure:<br />
* If the hardware supports 48 kHz playback, decode at 48 kHz,<br />
* else if the hardware's highest available sample rate is a supported rate, decode at this sample rate,<br />
* else if the hardware's highest available sample rate is less than 48 kHz, decode at the next higher supported rate and resample,<br />
* else decode at 48 kHz and resample.<br />
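<br />
The selection procedure above can be written as a short function. This is a sketch under the assumption that the player can enumerate the sample rates its hardware accepts; the function name and the returned "needs resampling" flag are illustrative.<br />
<br />
```python
SUPPORTED_DECODE_RATES = (8000, 12000, 16000, 24000, 48000)


def choose_decode_rate(hardware_rates):
    """Return (decode_rate_hz, needs_resampling) for the given hardware rates."""
    rates = set(hardware_rates)
    if not rates:
        raise ValueError("no hardware sample rates given")
    if 48000 in rates:
        return 48000, False
    highest = max(rates)
    if highest in SUPPORTED_DECODE_RATES:
        return highest, False
    if highest < 48000:
        # Decode at the next higher supported rate, then resample down.
        return min(r for r in SUPPORTED_DECODE_RATES if r > highest), True
    return 48000, True
```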
<br />
However, the 'input sample rate' field allows the encoder to pass the sample rate of the original input stream as metadata. This may be useful when the user requires the output sample rate to match the input sample rate. For example, a non-player decoder writing PCM format to disk might choose to resample the output audio back to the original input rate to reduce surprise to the user, who might reasonably expect to get back a file with the same sample rate as the one they fed to the encoder.<br />
<br />
A value of zero indicates 'unspecified'. Encoders SHOULD write the actual input rate or zero, but decoder implementations which do something with this field SHOULD take care to behave sanely if given crazy values (e.g. don't actually upsample the output to 10 MHz if requested).<br />
<br />
* '''Output gain'''<br />
This is a gain to be applied by the decoder. Virtually all players and media frameworks should apply it by default. If a player chooses to apply any volume adjustment or gain modification, such as the R128_TRACK_GAIN or a user-facing volume knob, the adjustment MUST be applied ''in addition'' to this output gain in order to achieve playback at the desired volume.<br />
<br />
An encoder SHOULD set the output gain to zero, and instead apply any gain prior to encoding, when this is possible and does not conflict with the user's wishes. The output gain should only be nonzero when the gain is adjusted after encoding, or when the user wishes to adjust the gain for playback while preserving the ability to recover the original signal amplitude.<br />
<br />
Although the output gain has enormous range (+/- 128 dB, enough to amplify inaudible sounds to the threshold of physical pain), most applications can only reasonably use a small portion of this range around zero. The large range serves in part to ensure that gain can always be losslessly transferred between OpusHead and R128_TRACK_GAIN (see below) without saturating.<br />
<br />
The gain is the 20 log<sub>10</sub> ratio of output to input sample values to be applied to the decoder output. E.g. <code>sample *= pow(10, header.gain/(20.*256))</code> where header.gain is the raw 16 bit Q7.8 value from the header.<br />
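<br />
The same conversion, written out as a small sketch with hypothetical function names. A raw value of 5120 is 20 dB in Q7.8 (5120/256), which corresponds to a tenfold increase in sample amplitude.<br />
<br />
```python
def q78_gain_to_scale(raw_gain: int) -> float:
    """Convert a raw signed Q7.8 dB value into a linear sample multiplier."""
    if not -32768 <= raw_gain <= 32767:
        raise ValueError("gain must fit in a signed 16-bit integer")
    return 10.0 ** (raw_gain / (20.0 * 256.0))


def apply_output_gain(samples, raw_gain: int):
    """Scale decoded samples (floats) by the header's output gain."""
    scale = q78_gain_to_scale(raw_gain)
    return [s * scale for s in samples]
```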
<br />
* '''Channel mapping family'''<br />
This byte indicates the order and semantic meaning of the various channels encoded in each Opus packet. <br />
<br />
Each possible value of this byte indicates a ''mapping family'', which defines a set of allowed numbers of channels, and the ordered set of channel names for each allowed number of channels. Currently there are three defined mapping families, although more may be added:<br />
<br />
* Family 0 (RTP mapping)<br />
** Allowed numbers of channels: 1 or 2<br />
** 1 channel: monophonic (mono)<br />
** 2 channels: stereo (left, right)<br />
** '''Special mapping''': this channel mapping value also indicates that the content consists of a single Opus stream that is stereo if and only if c==2, with stream index 0 mapped to channel 0, and (if stereo) stream index 1 mapped to channel 1. When the channel mapping byte has this value, no further fields are present in OpusHead.<br />
* Family 1 ([http://www.xiph.org/vorbis/doc/Vorbis_I_spec.html#x1-800004.3.9 Vorbis channel order])<br />
** Allowed numbers of channels: 1 ... 8<br />
** Channel meanings depend on the number of channels, see the Vorbis mapping for details.<br />
* Family 255 (no defined channel meaning)<br />
** Allowed numbers of channels: 1...255<br />
** Channels are unidentified. General-purpose players SHOULD NOT attempt to play these streams, and offline decoders MAY deinterleave the output into separate PCM files, one per channel. Decoders SHOULD NOT produce output for channels mapped to stream index 255 (pure silence) unless they have no other way to indicate the index of non-silent channels.<br />
<br />
The remaining channel mapping families (2...254) are reserved. A decoder encountering a reserved mapping byte should act as though the mapping byte is 255.<br />
<br />
An Ogg Opus player MUST play any Ogg Opus stream with a channel mapping family of 0 or 1, even if the number of channels does not match the physically connected audio hardware. Players SHOULD perform channel mixing to increase or reduce the number of channels as needed.<br />
<br />
* '''Stream count''' 'N'<br />
This field indicates the total number of streams so the decoder can correctly parse the packed Opus packets inside the Ogg packet.<br />
<br />
For channel mapping family 0, this value defaults to 1, and is not coded.<br />
<br />
A multi-channel Opus file is composed of one or more individual Opus streams, each of which produces one or two channels of decoded data. Each Ogg packet contains one Opus packet from each stream. The first N-1 Opus packets are packed using the self-delimiting framing from Appendix B of the Opus Specification. The remaining Opus packet is packed using the regular, undelimited framing from Section 3 of the Opus Specification. All the Opus packets in a single Ogg packet MUST be constrained to produce the same number of decoded samples. A decoder SHOULD treat any Opus packet whose duration is different from that of the first Opus packet in an Ogg packet as if it were an Opus packet with an illegal TOC sequence.<br />
<br />
* '''Two-channel stream count''' 'M'<br />
Describes the number of streams whose decoders should be configured to produce two channels. This must be no larger than the total number of streams.<br />
<br />
For channel mapping family 0, this value defaults to c-1 (i.e., 0 for mono and 1 for stereo), and is not coded.<br />
<br />
Each packet in an Opus stream has an internal channel count of 1 or 2, which can change from packet to packet. This is selected by the encoder depending on the bitrate and the contents being encoded. The original channel count of the encoder input is not preserved by the lossy compression.<br />
<br />
Regardless of the internal channel count, any Opus stream may be decoded as mono (a single channel) or stereo (two channels) by appropriate initialization of the decoder. The "two-channel stream count" field indicates that the first M Opus decoders should be initialized in stereo mode, and the remaining N-M decoders should be initialized in mono mode. The total number of decoded channels (M+N) MUST be no larger than 255, as there is no way to index more channels than that in the channel mapping.<br />
<br />
* '''Channel mapping'''<br />
Contains one index per output channel indicating which decoded channel should be used. If the index is less than 2*M, the output MUST be taken from decoding stream (index/2) as stereo and selecting the left channel if index is even, and the right channel if index is odd. If the index is 2*M or larger, the output MUST be taken from decoding stream (index-M) as mono. As a special case, an index of 255 means that the corresponding output channel MUST contain pure silence.<br />
<br />
For channel mapping family 0, the first index defaults to 0, and if c==2, the second index defaults to 1. Neither index is coded.<br />
<br />
The number of output channels (c) is not constrained to match the number of decoded channels (M+N). A single index MAY appear multiple times, i.e., the same decoded channel may be mapped to multiple output channels. Likewise, some decoded channels might not be assigned to any output channel.<br />
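<br />
The index rule can be illustrated with a small lookup. In this sketch the function name is hypothetical, and the return value is a (stream, channel) pair in which channel 0 means left (or the only channel of a mono stream) and 1 means right; <code>None</code> stands for pure silence.<br />
<br />
```python
def resolve_channel(index: int, stream_count: int, coupled_count: int):
    """Map one channel mapping index to (stream, channel), or None for silence.

    stream_count is N and coupled_count is M from the ID header.
    """
    if index == 255:
        return None  # this output channel is pure silence
    if index >= stream_count + coupled_count:
        raise ValueError("index exceeds the number of decoded channels")
    if index < 2 * coupled_count:
        # Stereo stream: even index selects left, odd selects right.
        return index // 2, index % 2
    # Mono stream.
    return index - coupled_count, 0
```
<br />
For example, with N=2 and M=1 the mapping [0, 2, 1] takes the first output channel from the left channel of stream 0, the second from mono stream 1, and the third from the right channel of stream 0.<br />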
<br />
==== Comment Header ====<br />
<br />
- 8 byte 'OpusTags' magic signature (64 bits)<br />
- The remaining data follows the vorbis-comment header design used in OggVorbis (without the "framing-bit"), OggTheora, and Speex:<br />
* Vendor string (always present).<br />
** 4-byte little-endian length field, followed by length bytes of UTF-8 vendor string.<br />
* TAG=value metadata strings (zero or more).<br />
** 4-byte little-endian string count.<br />
** Count strings consisting of 4-byte little-endian length and length bytes of UTF-8 string in "tag=value" form.<br />
<br />
One new comment field is introduced for Ogg Opus:<br />
R128_TRACK_GAIN=-573 <br />
representing the volume shift needed to normalize the track's volume. The gain is a Q7.8 fixed point number in dB, as in the OpusHead "output gain" field. This field is similar to the [[VorbisComment#Replay_Gain|REPLAYGAIN_TRACK_GAIN field in Vorbis]], although the normal volume reference is the [http://tech.ebu.ch/loudness EBU-R128] standard.<br />
<br />
An Ogg Opus file MUST NOT have more than one such field, and if present its value MUST be an integer from -32768 to +32767 inclusive, represented in ASCII with no whitespace. If present, it MUST correctly represent the R128 normalization gain (relative to the OpusHead output gain). If a player chooses to make use of the TRACK_GAIN, it MUST be applied ''in addition'' to the OpusHead output gain. If an encoder populates the TRACK_GAIN field, and the output gain is not otherwise constrained or specified, the encoder SHOULD write the R128 gain into the OpusHead output gain and write "R128_TRACK_GAIN=0". If a tool modifies the OpusHead "output gain" field, it MUST also update or remove the R128_TRACK_GAIN comment field.<br />
<br />
There is no comment field corresponding to ReplayGain's ALBUM_GAIN; that information should instead be stored in the OpusHead "output gain" field.<br />
<br />
To avoid confusion with multiple normalization schemes, an OpusTags packet SHOULD NOT contain any of the REPLAYGAIN_TRACK_GAIN, REPLAYGAIN_TRACK_PEAK, REPLAYGAIN_ALBUM_GAIN, or REPLAYGAIN_ALBUM_PEAK fields.<br />
<br />
== Other Implementation Notes ==<br />
<br />
When seeking within an Ogg Opus stream, the decoder should start decoding (and discarding the output) at least 3840 samples prior to the seek point in order to ensure that the output audio is correct at the seek point.<br />
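<br />
A minimal sketch of the pre-roll arithmetic, working purely in granule positions (48 kHz samples). The function name is hypothetical, and the sketch ignores the mapping from granule positions to pages and the interaction with pre-skip at the very start of the stream.<br />
<br />
```python
SEEK_PREROLL_SAMPLES = 3840  # 80 ms at 48 kHz


def plan_seek(target_granule: int):
    """Return (decode_start_granule, samples_to_discard) for a seek target."""
    if target_granule < 0:
        raise ValueError("granule positions must be non-negative")
    # Start early so the decoder has converged by the time it reaches the target.
    start = max(0, target_granule - SEEK_PREROLL_SAMPLES)
    return start, target_granule - start
```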
<br />
Technically valid Opus packets can be arbitrarily large due to the padding format, although the amount of non-padding data they can contain is bounded. These packets may be spread over a similarly enormous number of Ogg pages. Decoders SHOULD avoid attempting to allocate excessive amounts of memory when presented with a very large packet. The presence of an extremely large packet in the stream could indicate a potential memory exhaustion attack or stream corruption. Decoders should reject a packet that is too large to process, and print a warning message. In an Ogg Opus stream, the largest possible valid packet that does not use padding has a size of 15630988 bytes (14.9 MiB) and can span up to 61298 Ogg pages, all but one of which will have a granule position of -1. This is of course a very extreme packet, consisting of 255 channels, each containing 120 ms of audio encoded as 2.5 ms frames, each frame using the maximum possible number of bytes and stored in the least efficient manner allowed.<br />
<br />
== Test Vectors ==<br />
<br />
* [[OggOpus/testvectors|Planned test vectors for OggOpus]]<br />
* Opus test vectors</div>Derfhttps://wiki.xiph.org/index.php?title=OggOpus&diff=13361OggOpus2012-05-10T00:27:34Z<p>Derf: /* Packet Organization */</p>
<hr />
<div>== Ogg Mapping for Opus ==<br />
<br />
The IETF Opus codec is a low-latency audio codec optimized for both voice and general-purpose audio. See the [http://tools.ietf.org/html/draft-ietf-codec-opus Opus Specification] for technical details.<br />
<br />
Almost everything about Opus is either fixed or dynamically switchable, so most of the usual ID and setup header parameters in the header packets of an Ogg encapsulation aren't needed. In particular, bitrate, packet duration, mono/stereo flags, and coding modes are all dynamically switchable from packet to packet. The first one or two bytes in each data packet, the start of the 'TOC sequence' that defines the layout of the packet, specifies all of these parameters for that particular packet. See Section 3 of the Opus Specification for the exact format of the TOC sequence.<br />
<br />
The remaining parameters that must be signaled are<br />
<br />
* The magic number for stream identification,<br />
* The stream count and coupling for multichannel audio, and<br />
* Any metadata or tags.<br />
<br />
=== Packet Organization ===<br />
<br />
Opus is framed in a continuous logical [http://www.xiph.org/ogg/doc/framing.html Ogg stream]. <br />
<br />
There are two mandatory headers. The granule position of the pages containing these headers MUST be zero.<br />
<br />
The first packet in the logical Ogg stream MUST contain the identification header, which uniquely identifies a stream as Opus audio. It MUST begin with the 8 bytes "OpusHead". It MUST be placed alone in the first page of the logical Ogg stream. This page MUST have the ’beginning of stream’ flag set.<br />
<br />
The second Opus packet MUST contain the comment header. It MUST begin with the 8 bytes "OpusTags". It MAY span one or more pages, beginning on the second page of the logical stream. However many pages it spans, the comment header packet MUST finish the page on which it ends.<br />
<br />
All subsequent pages are audio data pages and the packets they contain are audio data packets. The first audio page SHOULD NOT have the 'continued packet' flag set (which would indicate the first packet is continued from a previous page). Packets MUST be placed into Ogg pages in order until the end of stream. Audio packets MAY span page boundaries. A decoder MUST treat a zero-byte audio packet as if it were an Opus packet with an illegal TOC sequence. The last page SHOULD have the 'end of stream' flag set, but implementations should be prepared to deal with truncated streams which do not have a page marked 'end of stream'. The final packet SHOULD complete on the last page, i.e., the final lacing value should be less than 255. There MUST NOT be any more pages in an Opus logical stream after a page marked 'end of stream'.<br />
<br />
=== Granule Position ===<br />
<br />
The granule position of an audio page encodes the total number of PCM samples in the stream up to and including the last fully-decodable sample from the last packet ''completed'' on that page. A page that is entirely spanned by a single packet (that completes on a subsequent page) has no granule position, and the granule position field SHOULD be set to the special value ’-1’ in two's complement.<br />
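<br />
For illustration, the sketch below reads the granule position from a raw Ogg page header and reports the special value -1 as "no granule position". The header layout it assumes (the "OggS" capture pattern, a version byte, a header type byte, then a 64-bit little-endian granule position at byte offset 6 of a 27-byte fixed header) comes from the Ogg framing specification rather than from this page. The function name is hypothetical.<br />
<br />
```python
import struct


def read_granule_position(page: bytes):
    """Return the page's granule position, or None if no packet completes on it."""
    if len(page) < 27 or page[:4] != b"OggS":
        raise ValueError("not an Ogg page header")
    # Signed 64-bit little-endian field; all bits set decodes as -1.
    (granule,) = struct.unpack_from("<q", page, 6)
    return None if granule == -1 else granule
```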
<br />
The granule position of an audio page is in units of PCM audio samples at a fixed rate of 48 kHz (per channel; a stereo stream’s granule position does not increment at twice the speed of a mono stream). It is possible to run a decoder at other sampling rates, but the format and this specification always count samples assuming a 48 kHz decoding rate.<br />
<br />
The duration of an Opus packet may be any multiple of 2.5 ms, up to a maximum of 120 ms. This duration is encoded in the TOC sequence at the beginning of each packet. The number of samples returned by a decoder corresponds to this duration exactly, even for the first few packets. For example, a 20 ms packet fed to a decoder running at 48 kHz will always return 960 samples. A demuxer can parse these TOC sequences to work backwards or forwards from a packet with a known granule position (i.e., the last packet completed on some page) in order to assign granule positions to every packet, or even every individual sample. The one exception is the last page in the stream, as described below. Every other page with completed packets after the first MUST have a granule position equal to the number of samples contained in packets that complete on that page plus the granule position of the most recent page with completed packets. This guarantees that a demuxer will assign individual packets the same granule position when working forwards as when working backwards. There must not be any gaps. In order to support capturing a stream that uses discontinuous transmission (DTX), an encoder SHOULD emit packets that explicitly request the use of Packet Loss Concealment (PLC) (i.e., with a frame length of 0, as defined in Section 3.2.1 of the Opus Specification) in place of the packets that were not transmitted.<br />
<br />
There is some amount of latency introduced during the decoding process, to allow for overlap in the MDCT modes, stereo mixing in the LP modes, and resampling, and the encoder will introduce even more latency (though the exact amount is not specified). Therefore the first few samples produced by the decoder do not correspond to any real, input audio, but are instead composed of padding inserted by the encoder to compensate for this latency. These samples must be stored and decoded, as Opus is an asymptotically convergent predictive codec, meaning the decoded contents of each frame depend on the recent history of decoder inputs. A 'pre-skip' field in the ID header signals the number of samples which should be skipped at the beginning of the stream. This provides sufficient history to the decoder so that it has already converged before the stream's output begins. It may also be used to perform sample-accurate cropping of existing encoded streams. This amount need not be a multiple of 2.5 ms, may be smaller than a single packet, or may span the contents of several packets.<br />
<br />
The PCM sample position is determined from the granule position using the formula<br />
<br />
'PCM sample position' = 'granule position' - 'pre-skip' .<br />
<br />
For example, if the granule position of the first page is 59971, and the pre-skip is 11971, then the PCM sample position of the last decoded sample from the first page is 48000. This may be converted into a playback time using the formula<br />
<br />
'playback time' = 'PCM sample position' / 48000.0 .<br />
<br />
The initial PCM sample position before any samples are played is normally '0'. In this case, the PCM sample position of the first audio sample to be played starts at '1', because it marks the time on the clock ''after'' that sample has been played, and a stream that is exactly one second long has a final PCM sample position of '48000', as in the example here.<br />
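<br />
The two formulas can be checked with a few lines of Python. This is only a sketch; the function names are hypothetical and the numbers are the ones from the example in the text.<br />
<br />
```python
SAMPLE_RATE = 48000  # granule positions always count 48 kHz samples


def pcm_sample_position(granule_position: int, pre_skip: int) -> int:
    """'PCM sample position' = 'granule position' - 'pre-skip'."""
    return granule_position - pre_skip


def playback_time(granule_position: int, pre_skip: int) -> float:
    """Playback time in seconds for a given granule position."""
    return pcm_sample_position(granule_position, pre_skip) / SAMPLE_RATE


print(pcm_sample_position(59971, 11971))  # 48000
print(playback_time(59971, 11971))        # 1.0
```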
<br />
Vorbis streams use a granule position smaller than the number of audio samples contained in the first page to indicate that some of those samples must be trimmed from the output. However, to do so it requires that the first page contains exactly two packets, in order to allow the decoder to perform PCM position adjustments before needing to return any PCM data. Opus uses the pre-skip mechanism for this purpose instead, since the encoder may introduce more than a single packet's worth of latency, and since very large packets in streams with a very large number of channels may not fit on a single page.<br />
<br />
The page with the 'end of stream' flag set MAY have a granule position that indicates the page contains less audio data than would normally be returned by decoding up through the final packet. This is used to end the stream somewhere other than an even frame boundary. The granule position of the most recent audio page with completed packets is used to make this determination, or '0' is used if there were no previous audio pages with a completed packet. The difference between these granule positions indicates how many samples to keep after decoding the packets that completed on the final page. The remaining samples are discarded. The number of discarded samples SHOULD be smaller than the number decoded from the last packet.<br />
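<br />
A minimal sketch of the end trimming arithmetic, in Python. The function name is hypothetical, and clamping an oversized granule position (rather than rejecting the stream) is an assumption made only for this illustration.<br />
<br />
```python
def final_page_trim(final_granule, previous_granule, decoded_samples):
    """Return (keep, discard) sample counts for the packets that complete
    on the page flagged 'end of stream'.

    previous_granule is the granule position of the most recent audio page
    with completed packets, or 0 if there was no such page.
    """
    keep = final_granule - previous_granule
    if keep < 0:
        raise ValueError("final granule position precedes the previous page")
    # Assumption: a granule position that claims more audio than was
    # actually decoded is clamped instead of being treated as an error.
    keep = min(keep, decoded_samples)
    return keep, decoded_samples - keep


# A final page whose packets decode to 960 samples but whose granule
# position advances by only 500: keep 500 samples, discard 460.
print(final_page_trim(48500, 48000, 960))
```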
<br />
The granule position of the first audio page with a completed packet MAY be larger than the number of samples contained in packets that complete on that page; however, it MUST NOT be smaller unless that page has the 'end of stream' flag set. Permitting a larger granule position allows the beginning of a stream to be cropped without rewriting the granule position of all the remaining pages. This means that the PCM sample position just before the first sample to be played may be larger than '0', but the PCM sample position relative to '0' should still be used for the purposes of synchronization when multiplexing with other logical streams. This does not affect the behavior of pre-skip: exactly 'pre-skip' samples should be skipped from the beginning of the decoded output, even if the initial PCM sample position is greater than zero.<br />
<br />
On the other hand, a granule position that is smaller than the number of decoded samples prevents a demuxer from working backwards to assign each packet or each individual sample a valid granule position, since granule positions must be non-negative. A decoder MUST reject as invalid any stream where the granule position is smaller than the number of samples contained in packets that complete on the first page with a completed packet, unless that page has the 'end of stream' flag set. It MAY defer this action until it decodes the last packet completed on that page. If that page has the 'end of stream' flag set, a demuxer can work forwards from the granule position '0', but MUST reject as invalid any stream where the granule position is smaller than the 'pre-skip' amount. This would indicate that more samples should be skipped from the initial decoded output than exist in the stream.<br />
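<br />
The checks on the first audio page with a completed packet might be sketched as follows. The function name and return value are illustrative assumptions; the function returns the granule position from which a demuxer would start counting forwards.<br />
<br />
```python
def first_page_start_granule(granule_position, samples_on_page,
                             pre_skip, end_of_stream):
    """Validate the first audio page that completes a packet.

    samples_on_page is the number of 48 kHz samples in the packets that
    complete on that page. Returns the granule position that precedes the
    first decoded sample, or raises ValueError for an invalid stream.
    """
    if end_of_stream:
        # The page is also the last one: work forwards from 0, but the
        # stream must hold at least 'pre-skip' samples.
        if granule_position < pre_skip:
            raise ValueError("granule position smaller than pre-skip")
        return 0
    if granule_position < samples_on_page:
        raise ValueError("granule position smaller than decoded samples")
    return granule_position - samples_on_page
```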
<br />
==== ID Header ====<br />
<br />
  0                   1                   2                   3<br />
  0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1<br />
 +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+<br />
 |      'O'      |      'p'      |      'u'      |      's'      |<br />
 +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+<br />
 |      'H'      |      'e'      |      'a'      |      'd'      |<br />
 +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+<br />
 |  version = 0  | channel count |           pre-skip            |<br />
 +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+<br />
 |               original input sample rate in Hz                |<br />
 +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+<br />
 |    output gain Q7.8 in dB     |  channel map  |               |<br />
 +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+               :<br />
 |                                                               |<br />
 :               optional channel mapping table...               :<br />
 |                                                               |<br />
 +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+<br />
<br />
<br />
- Magic signature: "OpusHead" (64 bits)<br />
- Version number (8 bits unsigned): zero for this spec<br />
- Channel count 'c' (8 bits unsigned): MUST be > 0<br />
- Pre-skip (16 bits unsigned, little endian)<br />
- Input sample rate (32 bits unsigned, little endian): informational only<br />
- Output gain (16 bits, little endian, signed Q7.8 in dB) to apply when decoding<br />
- Channel mapping family (8 bits unsigned)<br />
-- 0 = one stream: mono or L,R stereo<br />
-- 1 = channels in vorbis spec order: mono or L,R stereo or ... or FL,C,FR,RL,RR,LFE, ...<br />
-- 2..254 = reserved (treat as 255)<br />
-- 255 = no defined channel meaning<br />
If channel mapping family > 0<br />
- Stream count 'N' (8 bits unsigned): MUST be > 0<br />
- Two-channel stream count 'M' (8 bits unsigned): MUST satisfy M <= N, M+N <= 255<br />
- Channel mapping (8*c bits)<br />
-- one stream index (8 bits unsigned) per channel (255 means silent throughout the file)<br />
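<br />
A minimal sketch of a parser for the layout listed above, using Python's <code>struct</code> module. The <code>OpusHead</code> tuple and <code>parse_opus_head</code> are hypothetical names chosen for this example; rejecting a nonzero version outright is a simplifying assumption.<br />
<br />
```python
import struct
from typing import NamedTuple, Tuple


class OpusHead(NamedTuple):
    channels: int             # c
    pre_skip: int
    input_sample_rate: int    # informational only, 0 = unspecified
    output_gain: int          # raw signed Q7.8 value (1/256 dB steps)
    mapping_family: int
    stream_count: int         # N
    coupled_count: int        # M
    mapping: Tuple[int, ...]  # one stream index per output channel


def parse_opus_head(data: bytes) -> OpusHead:
    if len(data) < 19 or data[:8] != b"OpusHead":
        raise ValueError("missing OpusHead signature")
    # Fixed part after the signature: version, channel count, pre-skip,
    # input sample rate, output gain, mapping family (little endian).
    version, channels, pre_skip, rate, gain, family = struct.unpack_from(
        "<BBHIhB", data, 8)
    if version != 0:
        raise ValueError("unsupported version")
    if channels == 0:
        raise ValueError("channel count must be > 0")
    if family == 0:
        # No further fields are coded; N, M and the mapping are implied.
        if channels > 2:
            raise ValueError("family 0 allows only 1 or 2 channels")
        return OpusHead(channels, pre_skip, rate, gain, family,
                        1, channels - 1, tuple(range(channels)))
    if len(data) < 21 + channels:
        raise ValueError("truncated channel mapping table")
    streams, coupled = data[19], data[20]
    if streams == 0 or coupled > streams or streams + coupled > 255:
        raise ValueError("invalid stream counts")
    return OpusHead(channels, pre_skip, rate, gain, family,
                    streams, coupled, tuple(data[21:21 + channels]))
```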
<br />
<br />
Some discussion is in order.<br />
<br />
* '''Magic signature'''<br />
The magic signature "OpusHead" allows codec identification and is human readable. Starting with 'Op' helps distinguish it from data packets, as this is an invalid TOC sequence.<br />
<br />
* '''Version'''<br />
The version number must always be zero for this version of the encapsulation spec. We do not plan to revise the spec, but this also acts as a null terminator for the signature bytes and helps align the rest of the fields.<br />
<br />
* '''Channel count''' 'c'<br />
This byte specifies the number of output channels (1...255) for this Ogg Opus stream.<br />
<br />
* '''Pre-skip'''<br />
This is the number of samples (at 48 kHz) to discard from the decoder output when starting playback, and also the number to subtract from a page's granule position to calculate its PCM sample position.<br />
<br />
When constructing cropped Ogg Opus streams, we recommend a pre-skip of at least 3840 samples (80 ms) to ensure complete convergence.<br />
<br />
* '''Input sample rate'''<br />
This is ''not'' the sample rate to use for playback of the encoded data.<br />
<br />
Opus has a handful of coding modes, with internal audio bandwidths of 4, 6, 8, 12, and 20 kHz. Each packet in the stream may have a different audio bandwidth. Regardless of the audio bandwidth, the reference decoder supports decoding any stream at a sample rate of 8, 12, 16, 24, or 48 kHz. The original sample rate of the encoder input is not preserved by the lossy compression.<br />
<br />
An Ogg Opus player SHOULD select the playback sample rate according to the following procedure:<br />
* If the hardware supports 48 kHz playback, decode at 48 kHz,<br />
* else if the hardware's highest available sample rate is a supported rate, decode at this sample rate,<br />
* else if the hardware's highest available sample rate is less than 48 kHz, decode at the next higher supported rate and resample,<br />
* else decode at 48 kHz and resample.<br />
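<br />
The selection procedure above can be sketched in Python as follows. The function name and the idea of passing the hardware's rates as a list are assumptions made for illustration.<br />
<br />
```python
# Decoding rates supported by the reference decoder, in Hz.
SUPPORTED_RATES = (8000, 12000, 16000, 24000, 48000)


def choose_decode_rate(hardware_rates):
    """Return (decode_rate, needs_resampling) for the given hardware rates."""
    if 48000 in hardware_rates:
        return 48000, False
    highest = max(hardware_rates)
    if highest in SUPPORTED_RATES:
        return highest, False
    if highest < 48000:
        # Decode at the next higher supported rate, then resample down.
        return min(r for r in SUPPORTED_RATES if r > highest), True
    return 48000, True


print(choose_decode_rate([44100, 48000]))  # (48000, False)
print(choose_decode_rate([22050]))         # (24000, True)
```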
<br />
However, the 'input sample rate' field allows the encoder to pass the sample rate of the original input stream as metadata. This may be useful when the user requires the output sample rate to match the input sample rate. For example, a non-player decoder writing PCM format to disk might choose to resample the output audio back to the original input rate to reduce surprise to the user, who might reasonably expect to get back a file with the same sample rate as the one they fed to the encoder.<br />
<br />
A value of zero indicates 'unspecified'. Encoders SHOULD write the actual input rate or zero, but decoder implementations which do something with this field SHOULD take care to behave sanely if given crazy values (e.g. don't actually upsample the output to 10 MHz if requested).<br />
<br />
* '''Output gain'''<br />
This is a gain to be applied by the decoder. Virtually all players and media frameworks should apply it by default. If a player chooses to apply any volume adjustment or gain modification, such as the R128_TRACK_GAIN or a user-facing volume knob, the adjustment MUST be applied ''in addition'' to this output gain in order to achieve playback at the desired volume.<br />
<br />
An encoder SHOULD set the output gain to zero, and instead apply any gain prior to encoding, when this is possible and does not conflict with the user's wishes. The output gain should only be nonzero when the gain is adjusted after encoding, or when the user wishes to adjust the gain for playback while preserving the ability to recover the original signal amplitude.<br />
<br />
Although the output gain has enormous range (+/- 128 dB, enough to amplify inaudible sounds to the threshold of physical pain), most applications can only reasonably use a small portion of this range around zero. The large range serves in part to ensure that gain can always be losslessly transferred between OpusHead and R128_TRACK_GAIN (see below) without saturating.<br />
<br />
The gain is the 20 log<sub>10</sub> ratio of output to input sample values to be applied to the decoder output. E.g. <code>sample *= pow(10, header.gain/(20.*256))</code> where header.gain is the raw 16 bit Q7.8 value from the header.<br />
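<br />
The same conversion, written out as a small Python sketch. The function names are hypothetical; <code>raw_gain</code> stands for the signed 16 bit Q7.8 value read from the header.<br />
<br />
```python
def gain_to_scale(raw_gain: int) -> float:
    """Convert a raw Q7.8 dB gain into a linear sample multiplier."""
    return 10.0 ** (raw_gain / (20.0 * 256.0))


def apply_gain(samples, raw_gain):
    """Scale a sequence of decoded samples by the header gain."""
    scale = gain_to_scale(raw_gain)
    return [s * scale for s in samples]


# A raw value of 5120 is 5120 / 256 = 20 dB, i.e. a factor of 10.
print(gain_to_scale(5120))
```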
<br />
* '''Channel mapping family'''<br />
This byte indicates the order and semantic meaning of the various channels encoded in each Opus packet. <br />
<br />
Each possible value of this byte indicates a ''mapping family'', which defines a set of allowed numbers of channels, and the ordered set of channel names for each allowed number of channels. Currently there are three defined mapping families, although more may be added:<br />
<br />
* Family 0 (RTP mapping)<br />
** Allowed numbers of channels: 1 or 2<br />
** 1 channel: monophonic (mono)<br />
** 2 channels: stereo (left, right)<br />
** '''Special mapping''': this channel mapping value also indicates that the content consists of a single Opus stream that is stereo if and only if c==2, with stream index 0 mapped to channel 0, and (if stereo) stream index 1 mapped to channel 1. When the channel mapping byte has this value, no further fields are present in OpusHead.<br />
* Family 1 ([http://www.xiph.org/vorbis/doc/Vorbis_I_spec.html#x1-800004.3.9 Vorbis channel order])<br />
** Allowed numbers of channels: 1 ... 8<br />
** Channel meanings depend on the number of channels, see the Vorbis mapping for details.<br />
* Family 255 (no defined channel meaning)<br />
** Allowed numbers of channels: 1...255<br />
** Channels are unidentified. General-purpose players SHOULD NOT attempt to play these streams, and offline decoders MAY deinterleave the output into separate PCM files, one per channel. Decoders SHOULD NOT produce output for channels mapped to stream index 255 (pure silence) unless they have no other way to indicate the index of non-silent channels.<br />
<br />
The remaining channel mapping families (2...254) are reserved. A decoder encountering a reserved mapping byte should act as though the mapping byte is 255.<br />
<br />
An Ogg Opus player MUST play any Ogg Opus stream with a channel mapping family of 0 or 1, even if the number of channels does not match the physically connected audio hardware. Players SHOULD perform channel mixing to increase or reduce the number of channels as needed.<br />
<br />
* '''Stream count''' 'N'<br />
This field indicates the total number of streams so the decoder can correctly parse the packed Opus packets inside the Ogg packet.<br />
<br />
For channel mapping family 0, this value defaults to 1, and is not coded.<br />
<br />
A multi-channel Opus file is composed of one or more individual Opus streams, each of which produces one or two channels of decoded data. Each Ogg packet contains one Opus packet from each stream. The first N-1 Opus packets are packed using the self-delimiting framing from Appendix B of the Opus Specification. The remaining Opus packet is packed using the regular, undelimited framing from Section 3 of the Opus Specification. All the Opus packets in a single Ogg packet MUST be constrained to produce the same number of decoded samples. A decoder SHOULD treat any Opus packet whose duration is different from that of the first Opus packet in an Ogg packet as if it were an Opus packet with an illegal TOC sequence.<br />
<br />
* '''Two-channel stream count''' 'M'<br />
Describes the number of streams whose decoders should be configured to produce two channels. This must be no larger than the number of total streams.<br />
<br />
For channel mapping family 0, this value defaults to c-1 (i.e., 0 for mono and 1 for stereo), and is not coded.<br />
<br />
Each packet in an Opus stream has an internal channel count of 1 or 2, which can change from packet to packet. This is selected by the encoder depending on the bitrate and the contents being encoded. The original channel count of the encoder input is not preserved by the lossy compression.<br />
<br />
Regardless of the internal channel count, any Opus stream may be decoded as mono (a single channel) or stereo (two channels) by appropriate initialization of the decoder. The "two-channel stream count" field indicates that the first M Opus decoders should be initialized in stereo mode, and the remaining N-M decoders should be initialized in mono mode. The total number of decoded channels (M+N) MUST be no larger than 255, as there is no way to index more channels than that in the channel mapping.<br />
<br />
* '''Channel mapping'''<br />
Contains one index per output channel indicating which decoded channel should be used. If the index is less than 2*M, the output MUST be taken from decoding stream (index/2) as stereo and selecting the left channel if index is even, and the right channel if index is odd. If the index is 2*M or larger, the output MUST be taken from decoding stream (index-M) as mono. As a special case, an index of 255 means that the corresponding output channel MUST contain pure silence.<br />
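<br />
The index rule can be sketched as a small lookup function. The name <code>resolve_channel</code> and the tuple it returns are assumptions for this example; M is the two-channel stream count and N the total stream count.<br />
<br />
```python
def resolve_channel(index: int, coupled_count: int, stream_count: int):
    """Map one channel mapping index to (stream, channel_within_stream).

    Returns None for index 255 (pure silence). Within a stereo stream,
    channel 0 is left and channel 1 is right.
    """
    if index == 255:
        return None
    if index < 2 * coupled_count:
        return index // 2, index % 2
    stream = index - coupled_count
    if stream >= stream_count:
        raise ValueError("index refers to a stream that does not exist")
    return stream, 0


# With N = 4 streams, M = 2 of them stereo, and the mapping 0,4,1,2,3,5:
for idx in (0, 4, 1, 2, 3, 5):
    print(idx, resolve_channel(idx, 2, 4))
```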
<br />
For channel mapping family 0, the first index defaults to 0, and if c==2, the second index defaults to 1. Neither index is coded.<br />
<br />
The number of output channels (c) is not constrained to match the number of decoded channels (M+N). A single index MAY appear multiple times, i.e., the same decoded channel may be mapped to multiple output channels. Some decoded channels might not be assigned to any output channel, as well.<br />
<br />
==== Comment Header ====<br />
<br />
- 8 byte 'OpusTags' magic signature (64 bits)<br />
- The remaining data follows the vorbis-comment header design used in OggVorbis (without the "framing-bit"), OggTheora, and Speex:<br />
* Vendor string (always present).<br />
** 4-byte little-endian length field, followed by length bytes of UTF-8 vendor string.<br />
* TAG=value metadata strings (zero or more).<br />
** 4-byte little-endian string count.<br />
** Count strings consisting of 4-byte little-endian length and length bytes of UTF-8 string in "tag=value" form.<br />
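<br />
A minimal sketch of reading this layout in Python. <code>parse_opus_tags</code> is a hypothetical helper, and returning the comments as plain "tag=value" strings is a choice made only for brevity.<br />
<br />
```python
import struct


def parse_opus_tags(data: bytes):
    """Return (vendor, comments) from an OpusTags packet."""
    if data[:8] != b"OpusTags":
        raise ValueError("missing OpusTags signature")
    pos = 8

    def read_u32() -> int:
        # 4-byte little-endian unsigned integer at the current position.
        nonlocal pos
        if pos + 4 > len(data):
            raise ValueError("truncated length field")
        (value,) = struct.unpack_from("<I", data, pos)
        pos += 4
        return value

    def read_string() -> str:
        # Length-prefixed UTF-8 string.
        nonlocal pos
        length = read_u32()
        if pos + length > len(data):
            raise ValueError("string runs past the end of the packet")
        text = data[pos:pos + length].decode("utf-8")
        pos += length
        return text

    vendor = read_string()
    count = read_u32()
    comments = [read_string() for _ in range(count)]
    return vendor, comments
```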
<br />
One new comment field is introduced for Ogg Opus:<br />
R128_TRACK_GAIN=-573 <br />
representing the volume shift needed to normalize the track's volume. The gain is a Q7.8 fixed point number in dB, as in the OpusHead "output gain" field. This field is similar to the [[VorbisComment#Replay_Gain|REPLAYGAIN_TRACK_GAIN field in Vorbis]], although the normal volume reference is the [http://tech.ebu.ch/loudness EBU-R128] standard.<br />
<br />
An Ogg Opus file MUST NOT have more than one such field, and if present its value MUST be an integer from -32768 to +32767 inclusive, represented in ASCII with no whitespace. If present, it MUST correctly represent the R128 normalization gain (relative to the OpusHead output gain). If a player chooses to make use of the TRACK_GAIN, it MUST be applied ''in addition'' to the OpusHead output gain. If an encoder populates the TRACK_GAIN field, and the output gain is not otherwise constrained or specified, the encoder SHOULD write the R128 gain into the OpusHead output gain and write "R128_TRACK_GAIN=0". If a tool modifies the OpusHead "output gain" field, it MUST also update or remove the R128_TRACK_GAIN comment field.<br />
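<br />
One way a player might read the field and combine it with the header gain is sketched below. The function names are hypothetical; accepting a leading '+' sign and matching the tag name case-insensitively (as is usual for Vorbis comments) are assumptions of this example.<br />
<br />
```python
import re

# ASCII integer with an optional sign and no whitespace.
_INTEGER = re.compile(r"[+-]?[0-9]+")


def r128_track_gain(comments):
    """Return the raw Q7.8 track gain, or None if the field is absent."""
    values = [c.split("=", 1)[1] for c in comments
              if c.upper().startswith("R128_TRACK_GAIN=")]
    if not values:
        return None
    if len(values) > 1:
        raise ValueError("more than one R128_TRACK_GAIN field")
    if not _INTEGER.fullmatch(values[0]):
        raise ValueError("R128_TRACK_GAIN is not a plain ASCII integer")
    gain = int(values[0])
    if not -32768 <= gain <= 32767:
        raise ValueError("R128_TRACK_GAIN out of range")
    return gain


def playback_gain_db(output_gain: int, track_gain) -> float:
    """Total gain in dB: the track gain is applied in addition to the
    OpusHead output gain (both are raw Q7.8 values)."""
    return (output_gain + (track_gain or 0)) / 256.0


print(playback_gain_db(0, r128_track_gain(["R128_TRACK_GAIN=-573"])))
```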
<br />
There is no comment field corresponding to Replaygain's ALBUM_GAIN; that information should instead be stored in the OpusHead 'output gain' field.<br />
<br />
To avoid confusion with multiple normalization schemes, an OpusTags packet SHOULD NOT contain any of the REPLAYGAIN_TRACK_GAIN, REPLAYGAIN_TRACK_PEAK, REPLAYGAIN_ALBUM_GAIN, or REPLAYGAIN_ALBUM_PEAK fields.<br />
<br />
== Other Implementation Notes ==<br />
<br />
When seeking within an Ogg Opus stream, the decoder should start decoding (and discarding the output) at least 3840 samples prior to the seek point in order to ensure that the output audio is correct at the seek point.<br />
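<br />
The pre-roll arithmetic might look like the following sketch. The function name is hypothetical, and positions are plain 48 kHz sample counts; a real demuxer would additionally have to round the start down to a packet boundary.<br />
<br />
```python
PRE_ROLL = 3840  # 80 ms at 48 kHz


def seek_plan(target_sample: int):
    """Return (decode_start, samples_to_discard) for a seek target."""
    start = max(0, target_sample - PRE_ROLL)
    return start, target_sample - start


print(seek_plan(48000))  # (44160, 3840)
print(seek_plan(1000))   # (0, 1000)
```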
<br />
Technically valid Opus packets can be arbitrarily large due to the padding format, although the amount of non-padding data they can contain is bounded. These packets may be spread over a similarly enormous number of Ogg pages. Decoders SHOULD avoid attempting to allocate excessive amounts of memory when presented with a very large packet. The presence of an extremely large packet in the stream could indicate a potential memory exhaustion attack or stream corruption. Decoders should reject a packet that is too large to process, and print a warning message. In an Ogg Opus stream, the largest possible valid packet that does not use padding has a size of 15630988 bytes (14.9 MiB) and can span up to 61298 Ogg Pages, all but one of which will have a granulepos of -1. This is of course a very extreme packet, consisting of 255 channels, each containing 120ms of audio encoded as 2.5ms frames, each frame using the maximum possible number of bytes and stored in the least efficient manner allowed.<br />
<br />
== Test Vectors ==<br />
<br />
* [[OggOpus/testvectors|Planned test vectors for OggOpus]]<br />
* Opus test vectors</div>Derfhttps://wiki.xiph.org/index.php?title=OggOpus&diff=13360OggOpus2012-05-10T00:23:00Z<p>Derf: </p>
<hr />
<div>== Ogg Mapping for Opus ==<br />
<br />
The IETF Opus codec is a low-latency audio codec optimized for both voice and general-purpose audio. See the [http://tools.ietf.org/html/draft-ietf-codec-opus Opus Specification] for technical details.<br />
<br />
Almost everything about Opus is either fixed or dynamically switchable, so most of the usual ID and setup header parameters in the header packets of an Ogg encapsulation aren't needed. In particular, bitrate, packet duration, mono/stereo flags, and coding modes are all dynamically switchable from packet to packet. The first one or two bytes in each data packet, the start of the 'TOC sequence' that defines the layout of the packet, specify all of these parameters for that particular packet. See Section 3 of the Opus Specification for the exact format of the TOC sequence.<br />
<br />
The remaining parameters that must be signaled are<br />
<br />
* The magic number for stream identification,<br />
* The stream count and coupling for multichannel audio, and<br />
* Any metadata or tags.<br />
<br />
=== Packet Organization ===<br />
<br />
Opus is framed in a continuous logical [http://www.xiph.org/ogg/doc/framing.html Ogg stream]. <br />
<br />
There are two mandatory headers. The granule position of the pages containing these headers MUST be zero.<br />
<br />
The first packet in the logical Ogg stream MUST contain the identification header, which uniquely identifies a stream as Opus audio. It MUST begin with the 8 bytes "OpusHead". It MUST be placed alone in the first page of the logical Ogg stream. This page MUST have the ’beginning of stream’ flag set.<br />
<br />
The second Opus packet MUST contain the comment header. It must begin with the 8 bytes "OpusTags". It MAY span one or more pages, beginning on the second page of the logical stream. However many pages it spans, the comment header packet MUST finish the page on which it ends.<br />
<br />
All subsequent pages are audio data pages and the packets they contain are audio data packets. The first audio page SHOULD NOT have the 'continued packet' flag set. Packets MUST be placed into Ogg pages in order until the end of stream. Audio packets MAY span page boundaries. A decoder MUST treat a zero-byte audio packet as if it were an Opus packet with an illegal TOC sequence. The last page SHOULD have the 'end of stream' flag set, but implementations should be prepared to deal with truncated streams which do not have a page marked 'end of stream'. The final packet SHOULD complete on the last page, i.e., the final lacing value should be less than 255. There MUST NOT be any more pages in an Opus logical stream after a page marked 'end of stream'.<br />
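<br />
To see why a final lacing value below 255 marks a completed packet, recall that Ogg framing splits each packet into 255-byte segments and ends it with one segment shorter than 255 bytes (possibly empty). The hypothetical helper below computes the lacing values for a packet of a given length.<br />
<br />
```python
def lacing_values(packet_length: int):
    """Return the Ogg lacing values for one packet of the given size."""
    full_segments, remainder = divmod(packet_length, 255)
    # The terminating value is always < 255, even when it is 0.
    return [255] * full_segments + [remainder]


print(lacing_values(100))  # [100]
print(lacing_values(255))  # [255, 0]
print(lacing_values(600))  # [255, 255, 90]
```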
<br />
=== Granule Position ===<br />
<br />
The granule position of an audio page encodes the total number of PCM samples in the stream up to and including the last fully-decodable sample from the last packet ''completed'' on that page. A page that is entirely spanned by a single packet (that completes on a subsequent page) has no granule position, and the granule position field SHOULD be set to the special value ’-1’ in two's complement.<br />
<br />
The granule position of an audio page is in units of PCM audio samples at a fixed rate of 48 kHz (per channel; a stereo stream’s granule position does not increment at twice the speed of a mono stream). It is possible to run a decoder at other sampling rates, but the format and this specification always count samples assuming a 48 kHz decoding rate.<br />
<br />
The duration of an Opus packet may be any multiple of 2.5 ms, up to a maximum of 120 ms. This duration is encoded in the TOC sequence at the beginning of each packet. The number of samples returned by a decoder corresponds to this duration exactly, even for the first few packets. For example, a 20 ms packet fed to a decoder running at 48 kHz will always return 960 samples. A demuxer can parse these TOC sequences to work backwards or forwards from a packet with a known granule position (i.e., the last packet completed on some page) in order to assign granule positions to every packet, or even every individual sample. The one exception is the last page in the stream, as described below. Every other page with completed packets after the first MUST have a granule position equal to the number of samples contained in packets that complete on that page plus the granule position of the most recent page with completed packets. This guarantees that a demuxer will assign individual packets the same granule position when working forwards as when working backwards. There must not be any gaps. In order to support capturing a stream that uses discontinuous transmission (DTX), an encoder SHOULD emit packets that explicitly request the use of Packet Loss Concealment (PLC) (i.e., with a frame length of 0, as defined in Section 3.2.1 of the Opus Specification) in place of the packets that were not transmitted.<br />
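<br />
The continuity rule can be expressed as a short check over the audio pages that complete at least one packet. This is a sketch only: the tuple layout and the function name are assumptions, and the bounds applied to the final page are one interpretation of the exception mentioned above.<br />
<br />
```python
def granules_consistent(pages) -> bool:
    """pages: sequence of (granule_position, completed_samples,
    end_of_stream) tuples, one per audio page that completes a packet,
    in stream order."""
    previous = None
    for granule, samples, end_of_stream in pages:
        if previous is not None:
            expected = previous + samples
            if end_of_stream:
                # Assumption: the last page may claim fewer samples than
                # were decoded, but never more, and never a negative count.
                if not previous <= granule <= expected:
                    return False
            elif granule != expected:
                return False
        previous = granule
    return True


print(granules_consistent([(1000, 960, False), (1960, 960, False),
                           (2500, 960, True)]))
```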
<br />
There is some amount of latency introduced during the decoding process, to allow for overlap in the MDCT modes, stereo mixing in the LP modes, and resampling, and the encoder will introduce even more latency (though the exact amount is not specified). Therefore the first few samples produced by the decoder do not correspond to any real, input audio, but are instead composed of padding inserted by the encoder to compensate for this latency. These samples must be stored and decoded, as Opus is an asymptotically convergent predictive codec, meaning the decoded contents of each frame depend on the recent history of decoder inputs. A 'pre-skip' field in the ID header signals the number of samples which should be skipped at the beginning of the stream. This provides sufficient history to the decoder so that it has already converged before the stream's output begins. It may also be used to perform sample-accurate cropping of existing encoded streams. This amount need not be a multiple of 2.5 ms, may be smaller than a single packet, or may span the contents of several packets.<br />
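<br />
Because the pre-skip amount may be smaller than one packet or span several, a player has to track how much is still left to drop. A minimal sketch, assuming each decoded packet is available as a list of samples (the generator name is hypothetical):<br />
<br />
```python
def skip_pre_skip(decoded_packets, pre_skip: int):
    """Yield decoded sample lists with the first pre_skip samples removed."""
    remaining = pre_skip
    for samples in decoded_packets:
        if remaining >= len(samples):
            # The whole packet falls inside the skipped region; it was
            # still decoded, so the decoder state has converged.
            remaining -= len(samples)
            continue
        yield samples[remaining:]
        remaining = 0


print(list(skip_pre_skip([[1, 2, 3], [4, 5, 6], [7, 8]], 4)))
```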
<br />
The PCM sample position is determined from the granule position using the formula<br />
<br />
'PCM sample position' = 'granule position' - 'pre-skip' .<br />
<br />
For example, if the granule position of the first page is 59971, and the pre-skip is 11971, then the PCM sample position of the last decoded sample from the first page is 48000. This may be converted into a playback time using the formula<br />
<br />
'playback time' = 'PCM sample position' / 48000.0 .<br />
<br />
The initial PCM sample position before any samples are played is normally '0'. In this case, the PCM sample position of the first audio sample to be played starts at '1', because it marks the time on the clock ''after'' that sample has been played, and a stream that is exactly one second long has a final PCM sample position of '48000', as in the example here.<br />
<br />
Vorbis streams use a granule position smaller than the number of audio samples contained in the first page to indicate that some of those samples must be trimmed from the output. However, to do so it requires that the first page contains exactly two packets, in order to allow the decoder to perform PCM position adjustments before needing to return any PCM data. Opus uses the pre-skip mechanism for this purpose instead, since the encoder may introduce more than a single packet's worth of latency, and since very large packets in streams with a very large number of channels may not fit on a single page.<br />
<br />
The page with the 'end of stream' flag set MAY have a granule position that indicates the page contains less audio data than would normally be returned by decoding up through the final packet. This is used to end the stream somewhere other than an even frame boundary. The granule position of the most recent audio page with completed packets is used to make this determination, or '0' is used if there were no previous audio pages with a completed packet. The difference between these granule positions indicates how many samples to keep after decoding the packets that completed on the final page. The remaining samples are discarded. The number of discarded samples SHOULD be smaller than the number decoded from the last packet.<br />
<br />
The granule position of the first audio page with a completed packet MAY be larger than the number of samples contained in packets that complete on that page; however, it MUST NOT be smaller unless that page has the 'end of stream' flag set. Permitting a larger granule position allows the beginning of a stream to be cropped without rewriting the granule position of all the remaining pages. This means that the PCM sample position just before the first sample to be played may be larger than '0', but the PCM sample position relative to '0' should still be used for the purposes of synchronization when multiplexing with other logical streams. This does not affect the behavior of pre-skip: exactly 'pre-skip' samples should be skipped from the beginning of the decoded output, even if the initial PCM sample position is greater than zero.<br />
<br />
On the other hand, a granule position that is smaller than the number of decoded samples prevents a demuxer from working backwards to assign each packet or each individual sample a valid granule position, since granule positions must be non-negative. A decoder MUST reject as invalid any stream where the granule position is smaller than the number of samples contained in packets that complete on the first page with a completed packet, unless that page has the 'end of stream' flag set. It MAY defer this action until it decodes the last packet completed on that page. If that page has the 'end of stream' flag set, a demuxer can work forwards from the granule position '0', but MUST reject as invalid any stream where the granule position is smaller than the 'pre-skip' amount. This would indicate that more samples should be skipped from the initial decoded output than exist in the stream.<br />
<br />
==== ID Header ====<br />
<br />
  0                   1                   2                   3<br />
  0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1<br />
 +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+<br />
 |      'O'      |      'p'      |      'u'      |      's'      |<br />
 +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+<br />
 |      'H'      |      'e'      |      'a'      |      'd'      |<br />
 +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+<br />
 |  version = 0  | channel count |           pre-skip            |<br />
 +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+<br />
 |               original input sample rate in Hz                |<br />
 +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+<br />
 |    output gain Q7.8 in dB     |  channel map  |               |<br />
 +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+               :<br />
 |                                                               |<br />
 :               optional channel mapping table...               :<br />
 |                                                               |<br />
 +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+<br />
<br />
<br />
- Magic signature: "OpusHead" (64 bits)<br />
- Version number (8 bits unsigned): zero for this spec<br />
- Channel count 'c' (8 bits unsigned): MUST be > 0<br />
- Pre-skip (16 bits unsigned, little endian)<br />
- Input sample rate (32 bits unsigned, little endian): informational only<br />
- Output gain (16 bits, little endian, signed Q7.8 in dB) to apply when decoding<br />
- Channel mapping family (8 bits unsigned)<br />
-- 0 = one stream: mono or L,R stereo<br />
-- 1 = channels in vorbis spec order: mono or L,R stereo or ... or FL,C,FR,RL,RR,LFE, ...<br />
-- 2..254 = reserved (treat as 255)<br />
-- 255 = no defined channel meaning<br />
If channel mapping family > 0<br />
- Stream count 'N' (8 bits unsigned): MUST be > 0<br />
- Two-channel stream count 'M' (8 bits unsigned): MUST satisfy M <= N, M+N <= 255<br />
- Channel mapping (8*c bits)<br />
-- one stream index (8 bits unsigned) per channel (255 means silent throughout the file)<br />
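<br />
The field list above can be read with a few bounds-checked byte accesses. The following is a minimal sketch rather than a reference implementation: the <code>opus_head</code> struct and <code>parse_opus_head()</code> are hypothetical names invented for illustration, and rejecting mapping indices that point past the M+N decoded channels is an assumption drawn from the constraints discussed below.<br />

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical container for the OpusHead fields (names are illustrative). */
typedef struct {
    uint8_t  version;
    uint8_t  channels;        /* 'c' */
    uint16_t pre_skip;        /* samples at 48 kHz */
    uint32_t input_rate;      /* informational only, 0 = unspecified */
    int16_t  output_gain;     /* raw Q7.8 value in dB */
    uint8_t  mapping_family;
    uint8_t  stream_count;    /* 'N' */
    uint8_t  coupled_count;   /* 'M' */
    uint8_t  mapping[255];    /* one stream index per output channel */
} opus_head;

/* Returns 0 on success, -1 if the packet is not an acceptable ID header. */
static int parse_opus_head(const uint8_t *p, size_t len, opus_head *h)
{
    int gain, i;

    if (len < 19 || memcmp(p, "OpusHead", 8) != 0) return -1;
    if (p[8] != 0 || p[9] == 0) return -1;   /* version 0, c > 0 */
    h->version = p[8];
    h->channels = p[9];
    h->pre_skip = (uint16_t)(p[10] | (p[11] << 8));
    h->input_rate = (uint32_t)p[12] | ((uint32_t)p[13] << 8) |
                    ((uint32_t)p[14] << 16) | ((uint32_t)p[15] << 24);
    gain = p[16] | (p[17] << 8);             /* little endian, signed */
    h->output_gain = (int16_t)(gain >= 32768 ? gain - 65536 : gain);
    h->mapping_family = p[18];

    if (h->mapping_family == 0) {
        /* Nothing else is coded: one stream, stereo iff c == 2. */
        if (h->channels > 2) return -1;
        h->stream_count = 1;
        h->coupled_count = (uint8_t)(h->channels - 1);
        h->mapping[0] = 0;
        h->mapping[1] = 1;
        return 0;
    }
    if (h->mapping_family == 1 && h->channels > 8) return -1;
    if (len < 21u + h->channels) return -1;
    h->stream_count = p[19];
    h->coupled_count = p[20];
    if (h->stream_count == 0 || h->coupled_count > h->stream_count ||
        h->stream_count + h->coupled_count > 255) return -1;
    for (i = 0; i < h->channels; i++) {
        uint8_t idx = p[21 + i];
        /* Assumption: an index with no decoded channel behind it is invalid. */
        if (idx != 255 && idx >= h->stream_count + h->coupled_count) return -1;
        h->mapping[i] = idx;
    }
    return 0;
}
```

The fixed part of the header is 19 bytes; for mapping families other than 0, the stream counts and the mapping table add 2+c bytes.<br />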
<br />
<br />
Some discussion is in order.<br />
<br />
* '''Magic signature'''<br />
The magic signature "OpusHead" allows codec identification and is human readable. Starting with 'Op' helps distinguish it from data packets, as this is an invalid TOC sequence.<br />
<br />
* '''Version'''<br />
The version number must always be zero for this version of the encapsulation spec. We do not plan to revise the spec, but this also acts as a null terminator for the signature bytes and helps align the rest of the fields.<br />
<br />
* '''Channel count''' 'c'<br />
This byte specifies the number of output channels (1...255) for this Ogg Opus stream.<br />
<br />
* '''Pre-skip'''<br />
This is the number of samples (at 48 kHz) to discard from the decoder output when starting playback, and also the number to subtract from a page's granule position to calculate its PCM sample position.<br />
<br />
When constructing cropped Ogg Opus streams, we recommend a pre-skip of at least 3840 samples (80 ms) to ensure complete convergence.<br />
<br />
* '''Input sample rate'''<br />
This is ''not'' the sample rate to use for playback of the encoded data.<br />
<br />
Opus has a handful of coding modes, with internal audio bandwidths of 4, 6, 8, 12, and 20 kHz. Each packet in the stream may have a different audio bandwidth. Regardless of the audio bandwidth, the reference decoder supports decoding any stream at a sample rate of 8, 12, 16, 24, or 48 kHz. The original sample rate of the encoder input is not preserved by the lossy compression.<br />
<br />
An Ogg Opus player SHOULD select the playback sample rate according to the following procedure:<br />
* If the hardware supports 48 kHz playback, decode at 48 kHz,<br />
* else if the hardware's highest available sample rate is a supported rate, decode at this sample rate,<br />
* else if the hardware's highest available sample rate is less than 48 kHz, decode at the next higher supported rate and resample,<br />
* else decode at 48 kHz and resample.<br />
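<br />
The selection procedure above can be sketched as a small function. This is only an illustration: <code>choose_decode_rate()</code> and its two parameters are hypothetical, and how a player discovers what its hardware supports is outside the scope of this page.<br />

```c
#include <stddef.h>
#include <stdint.h>

/* hw_supports_48k: nonzero if the output device can play 48 kHz directly.
   hw_max_rate:     highest sample rate the device offers, in Hz.
   Returns the rate to run the Opus decoder at; the caller resamples if the
   returned rate differs from what the hardware will actually play. */
static int32_t choose_decode_rate(int hw_supports_48k, int32_t hw_max_rate)
{
    static const int32_t supported[] = {8000, 12000, 16000, 24000, 48000};
    const size_t n = sizeof supported / sizeof supported[0];
    size_t i;

    if (hw_supports_48k) return 48000;
    for (i = 0; i < n; i++)               /* highest rate is a supported rate */
        if (supported[i] == hw_max_rate) return hw_max_rate;
    if (hw_max_rate < 48000)              /* next higher supported rate */
        for (i = 0; i < n; i++)
            if (supported[i] > hw_max_rate) return supported[i];
    return 48000;                         /* decode at 48 kHz and resample */
}
```

For example, a device limited to 22050 Hz would decode at 24000 Hz and resample down, while a device limited to 44100 Hz would decode at 48000 Hz and resample.<br />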
<br />
However, the 'input sample rate' field allows the encoder to pass the sample rate of the original input stream as metadata. This may be useful when the user requires the output sample rate to match the input sample rate. For example, a non-player decoder writing PCM format to disk might choose to resample the output audio back to the original input rate to reduce surprise to the user, who might reasonably expect to get back a file with the same sample rate as the one they fed to the encoder.<br />
<br />
A value of zero indicates 'unspecified'. Encoders SHOULD write the actual input rate or zero, but decoder implementations which do something with this field SHOULD take care to behave sanely if given crazy values (e.g. don't actually upsample the output to 10 MHz if requested).<br />
<br />
* '''Output gain'''<br />
This is a gain to be applied by the decoder. Virtually all players and media frameworks should apply it by default. If a player chooses to apply any volume adjustment or gain modification, such as the R128_TRACK_GAIN or a user-facing volume knob, the adjustment MUST be applied ''in addition'' to this output gain in order to achieve playback at the desired volume.<br />
<br />
An encoder SHOULD set the output gain to zero, and instead apply any gain prior to encoding, when this is possible and does not conflict with the user's wishes. The output gain should only be nonzero when the gain is adjusted after encoding, or when the user wishes to adjust the gain for playback while preserving the ability to recover the original signal amplitude.<br />
<br />
Although the output gain has enormous range (+/- 128 dB, enough to amplify inaudible sounds to the threshold of physical pain), most applications can only reasonably use a small portion of this range around zero. The large range serves in part to ensure that gain can always be losslessly transferred between OpusHead and R128_TRACK_GAIN (see below) without saturating.<br />
<br />
The gain is the 20 log<sub>10</sub> ratio of output to input sample values to be applied to the decoder output. E.g. <code>sample *= pow(10, header.gain/(20.*256))</code> where header.gain is the raw 16 bit Q7.8 value from the header.<br />
<br />
* '''Channel mapping family'''<br />
This byte indicates the order and semantic meaning of the various channels encoded in each Opus packet. <br />
<br />
Each possible value of this byte indicates a ''mapping family'', which defines a set of allowed numbers of channels, and the ordered set of channel names for each allowed number of channels. Currently there are three defined mapping families, although more may be added:<br />
<br />
* Family 0 (RTP mapping)<br />
** Allowed numbers of channels: 1 or 2<br />
** 1 channel: monophonic (mono)<br />
** 2 channels: stereo (left, right)<br />
** '''Special mapping''': this channel mapping value also indicates that the content consists of a single Opus stream that is stereo if and only if c==2, with stream index 0 mapped to channel 0, and (if stereo) stream index 1 mapped to channel 1. When the channel mapping byte has this value, no further fields are present in OpusHead.<br />
* Family 1 ([http://www.xiph.org/vorbis/doc/Vorbis_I_spec.html#x1-800004.3.9 Vorbis channel order])<br />
** Allowed numbers of channels: 1 ... 8<br />
** Channel meanings depend on the number of channels, see the Vorbis mapping for details.<br />
* Family 255 (no defined channel meaning)<br />
** Allowed numbers of channels: 1...255<br />
** Channels are unidentified. General-purpose players SHOULD NOT attempt to play these streams, and offline decoders MAY deinterleave the output into separate PCM files, one per channel. Decoders SHOULD NOT produce output for channels mapped to stream index 255 (pure silence) unless they have no other way to indicate the index of non-silent channels.<br />
<br />
The remaining channel mapping families (2...254) are reserved. A decoder encountering a reserved mapping byte should act as though the mapping byte is 255.<br />
<br />
An Ogg Opus player MUST play any Ogg Opus stream with a channel mapping family of 0 or 1, even if the number of channels does not match the physically connected audio hardware. Players SHOULD perform channel mixing to increase or reduce the number of channels as needed.<br />
<br />
* '''Stream count''' 'N'<br />
This field indicates the total number of streams so the decoder can correctly parse the packed Opus packets inside the Ogg packet.<br />
<br />
For channel mapping family 0, this value defaults to 1, and is not coded.<br />
<br />
A multi-channel Opus file is composed of one or more individual Opus streams, each of which produces one or two channels of decoded data. Each Ogg packet contains one Opus packet from each stream. The first N-1 Opus packets are packed using the self-delimiting framing from Appendix B of the Opus Specification. The remaining Opus packet is packed using the regular, undelimited framing from Section 3 of the Opus Specification. All the Opus packets in a single Ogg packet MUST be constrained to produce the same number of decoded samples. A decoder SHOULD treat any Opus packet whose duration is different from that of the first Opus packet in an Ogg packet as if it were an Opus packet with an illegal TOC sequence.<br />
<br />
* '''Two-channel stream count''' 'M'<br />
Describes the number of streams whose decoders should be configured to produce two channels. This must be no larger than the number of total streams.<br />
<br />
For channel mapping family 0, this value defaults to c-1 (i.e., 0 for mono and 1 for stereo), and is not coded.<br />
<br />
Each packet in an Opus stream has an internal channel count of 1 or 2, which can change from packet to packet. This is selected by the encoder depending on the bitrate and the contents being encoded. The original channel count of the encoder input is not preserved by the lossy compression.<br />
<br />
Regardless of the internal channel count, any Opus stream may be decoded as mono (a single channel) or stereo (two channels) by appropriate initialization of the decoder. The "two-channel stream count" field indicates that the first M Opus decoders should be initialized in stereo mode, and the remaining N-M decoders should be initialized in mono mode. The total number of decoded channels (M+N) MUST be no larger than 255, as there is no way to index more channels than that in the channel mapping.<br />
<br />
* '''Channel mapping'''<br />
Contains one index per output channel indicating which decoded channel should be used. If the index is less than 2*M, the output MUST be taken from decoding stream (index/2) as stereo and selecting the left channel if index is even, and the right channel if index is odd. If the index is 2*M or larger, the output MUST be taken from decoding stream (index-M) as mono. As a special case, an index of 255 means that the corresponding output channel MUST contain pure silence.<br />
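<br />
The index rule above can be illustrated with a short helper. This is a sketch under stated assumptions: <code>decoded_source</code> and <code>resolve_mapping()</code> are hypothetical names, and treating an index at or beyond M+N (other than 255) as an error is an assumption based on the fact that no decoded channel exists for it.<br />

```c
#include <stdint.h>

/* Where one output channel takes its samples from (illustrative type). */
typedef struct {
    int silent;   /* 1 if the output channel is pure silence */
    int stream;   /* index of the Opus stream to decode, or -1 */
    int channel;  /* 0 = left or mono, 1 = right, or -1 */
} decoded_source;

/* index:   one entry of the channel mapping table
   coupled: two-channel stream count 'M'
   streams: total stream count 'N'
   Returns 0 on success, -1 if the index refers to no decoded channel. */
static int resolve_mapping(uint8_t index, int coupled, int streams,
                           decoded_source *out)
{
    if (index == 255) {                 /* special case: silence */
        out->silent = 1;
        out->stream = -1;
        out->channel = -1;
        return 0;
    }
    if (index >= coupled + streams) return -1;
    out->silent = 0;
    if (index < 2 * coupled) {          /* one of the first M (stereo) streams */
        out->stream = index / 2;
        out->channel = index & 1;       /* even = left, odd = right */
    } else {                            /* one of the N-M mono streams */
        out->stream = index - coupled;
        out->channel = 0;
    }
    return 0;
}
```

With N=4 and M=2, for instance, index 3 selects the right channel of stream 1, and index 4 selects mono stream 2.<br />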
<br />
For channel mapping family 0, the first index defaults to 0, and if c==2, the second index defaults to 1. Neither index is coded.<br />
<br />
The number of output channels (c) is not constrained to match the number of decoded channels (M+N). A single index MAY appear multiple times, i.e., the same decoded channel may be mapped to multiple output channels. Some decoded channels might not be assigned to any output channel, as well.<br />
<br />
==== Comment Header ====<br />
<br />
- 8 byte 'OpusTags' magic signature (64 bits)<br />
- The remaining data follows the vorbis-comment header design used in OggVorbis (without the "framing-bit"), OggTheora, and Speex:<br />
* Vendor string (always present).<br />
** 4-byte little-endian length field, followed by length bytes of UTF-8 vendor string.<br />
* TAG=value metadata strings (zero or more).<br />
** 4-byte little-endian string count.<br />
** Count strings consisting of 4-byte little-endian length and length bytes of UTF-8 string in "tag=value" form.<br />
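<br />
A reader for this layout mostly consists of length checks. The sketch below walks the packet and hands each "tag=value" string to a callback; <code>walk_opus_tags()</code>, <code>comment_cb</code>, and <code>rd_le32()</code> are hypothetical names, and the code does not validate UTF-8.<br />

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Reads a 4-byte little-endian unsigned integer. */
static uint32_t rd_le32(const uint8_t *p)
{
    return (uint32_t)p[0] | ((uint32_t)p[1] << 8) |
           ((uint32_t)p[2] << 16) | ((uint32_t)p[3] << 24);
}

/* Called once per comment; s is NOT NUL-terminated. */
typedef void (*comment_cb)(const uint8_t *s, uint32_t len, void *user);

/* Walks an OpusTags packet. Stores the number of comments in *count and
   returns 0, or returns -1 if the packet is truncated or mislabeled. */
static int walk_opus_tags(const uint8_t *p, size_t len, comment_cb cb,
                          void *user, uint32_t *count)
{
    size_t pos = 8;
    uint32_t n, total, i;

    /* Smallest packet: magic + empty vendor string + zero comments. */
    if (len < 16 || memcmp(p, "OpusTags", 8) != 0) return -1;
    n = rd_le32(p + pos);               /* vendor string length */
    pos += 4;
    if (n > len - pos) return -1;
    pos += n;                           /* skip the vendor string */
    if (len - pos < 4) return -1;
    total = rd_le32(p + pos);           /* number of comment strings */
    pos += 4;
    for (i = 0; i < total; i++) {
        if (len - pos < 4) return -1;
        n = rd_le32(p + pos);
        pos += 4;
        if (n > len - pos) return -1;   /* length runs past the packet */
        if (cb) cb(p + pos, n, user);
        pos += n;
    }
    *count = total;
    return 0;
}
```

Every length is compared against the bytes remaining before it is used, so a corrupt length field cannot cause a read past the end of the packet.<br />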
<br />
One new comment field is introduced for Ogg Opus:<br />
R128_TRACK_GAIN=-573 <br />
representing the volume shift needed to normalize the track's volume. The gain is a Q7.8 fixed point number in dB, as in the OpusHead "output gain" field. This field is similar to the [[VorbisComment#Replay_Gain|REPLAYGAIN_TRACK_GAIN field in Vorbis]], although the normal volume reference is the [http://tech.ebu.ch/loudness EBU-R128] standard.<br />
<br />
An Ogg Opus file MUST NOT have more than one such field, and if present its value MUST be an integer from -32768 to +32767 inclusive, represented in ASCII with no whitespace. If present, it MUST correctly represent the R128 normalization gain (relative to the OpusHead output gain). If a player chooses to make use of the TRACK_GAIN, it MUST be applied ''in addition'' to the OpusHead output gain. If an encoder populates the TRACK_GAIN field, and the output gain is not otherwise constrained or specified, the encoder SHOULD write the R128 gain into the OpusHead output gain and write "R128_TRACK_GAIN=0". If a tool modifies the OpusHead "output gain" field, it MUST also update or remove the R128_TRACK_GAIN comment field.<br />
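<br />
As an illustration of the constraints above, the following sketch parses the comment value and combines it with the header gain. The function names are hypothetical; matching the tag name case-sensitively and accepting a leading '+' are both assumptions, not requirements stated on this page.<br />

```c
#include <stddef.h>
#include <string.h>

/* Parses a comment of the form "R128_TRACK_GAIN=<integer>".
   s need not be NUL-terminated; len is its length in bytes.
   Returns 0 and stores the Q7.8 value in *gain_q78, or returns -1 if the
   comment is a different tag or the value is malformed or out of range. */
static int parse_r128_track_gain(const char *s, size_t len, int *gain_q78)
{
    static const char tag[] = "R128_TRACK_GAIN=";
    size_t i = sizeof tag - 1;
    long v = 0;
    int neg = 0;

    if (len <= i || memcmp(s, tag, i) != 0) return -1;
    if (s[i] == '-' || s[i] == '+') {   /* assumption: '+' is tolerated */
        neg = (s[i] == '-');
        i++;
    }
    if (i == len) return -1;            /* sign with no digits */
    for (; i < len; i++) {
        if (s[i] < '0' || s[i] > '9') return -1;   /* no whitespace allowed */
        v = v * 10 + (s[i] - '0');
        if (v > 32768) return -1;       /* stop before overflow */
    }
    if (neg) v = -v;
    if (v > 32767) return -1;
    *gain_q78 = (int)v;
    return 0;
}

/* The track gain is applied in addition to the OpusHead output gain;
   both are Q7.8 values in dB, so they simply add. */
static int total_gain_q78(int header_gain_q78, int track_gain_q78)
{
    return header_gain_q78 + track_gain_q78;
}
```

With the example value above, -573 corresponds to -573/256, or roughly -2.24 dB.<br />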
<br />
There is no comment field corresponding to ReplayGain's ALBUM_GAIN; that information should instead be stored in the OpusHead 'output gain' field.<br />
<br />
To avoid confusion with multiple normalization schemes, an OpusTags packet SHOULD NOT contain any of the REPLAYGAIN_TRACK_GAIN, REPLAYGAIN_TRACK_PEAK, REPLAYGAIN_ALBUM_GAIN, or REPLAYGAIN_ALBUM_PEAK fields.<br />
<br />
== Other Implementation Notes ==<br />
<br />
When seeking within an Ogg Opus stream, the decoder should start decoding (and discarding the output) at least 3840 samples prior to the seek point in order to ensure that the output audio is correct at the seek point.<br />
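<br />
The pre-roll arithmetic is simple but easy to get wrong at the start of a stream. A minimal sketch, with hypothetical function names and all positions expressed as 48 kHz granule positions:<br />

```c
#include <stdint.h>

#define SEEK_PREROLL_SAMPLES 3840   /* 80 ms at 48 kHz, as recommended above */

/* Latest position at which decoding may begin for a given seek target.
   Clamped to zero because granule positions are non-negative. */
static int64_t preroll_start(int64_t target_granule)
{
    return target_granule > SEEK_PREROLL_SAMPLES
         ? target_granule - SEEK_PREROLL_SAMPLES : 0;
}

/* Number of decoded samples to throw away once decoding has started at
   decode_start_granule, which is at or before preroll_start(). */
static int64_t samples_to_discard(int64_t decode_start_granule,
                                  int64_t target_granule)
{
    return target_granule - decode_start_granule;
}
```

In practice decoding starts at the nearest packet boundary at or before the value returned by <code>preroll_start()</code>, so the number of discarded samples is usually somewhat larger than 3840.<br />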
<br />
Technically valid Opus packets can be arbitrarily large due to the padding format, although the amount of non-padding data they can contain is bounded. These packets may be spread over a similarly enormous number of Ogg pages. Decoders SHOULD avoid attempting to allocate excessive amounts of memory when presented with a very large packet. The presence of an extremely large packet in the stream could indicate a potential memory exhaustion attack or stream corruption. Decoders should reject a packet that is too large to process, and print a warning message. In an Ogg Opus stream, the largest possible valid packet that does not use padding has a size of 15630988 bytes (14.9 MiB) and can span up to 61298 Ogg pages, all but one of which will have a granulepos of -1. This is of course a very extreme packet, consisting of 255 channels, each containing 120 ms of audio encoded as 2.5 ms frames, each frame using the maximum possible number of bytes and stored in the least efficient manner allowed.<br />
<br />
== Test Vectors ==<br />
<br />
* [[OggOpus/testvectors|Planned test vectors for OggOpus]]<br />
* Opus test vectors</div>Derfhttps://wiki.xiph.org/index.php?title=OggOpus&diff=13359OggOpus2012-05-10T00:18:05Z<p>Derf: /* Packet Organization */</p>
<hr />
<div>== Ogg Mapping for Opus ==<br />
<br />
The IETF Opus codec is a low-latency audio codec optimized for both voice and general-purpose audio. See the [http://tools.ietf.org/html/draft-ietf-codec-opus Opus Specification] for technical details.<br />
<br />
Almost everything about Opus is either fixed or dynamically switchable, so most of the usual ID and setup header parameters in the header packets of an Ogg encapsulation aren't needed. In particular, bitrate, packet duration, mono/stereo flags, and coding modes are all dynamically switchable from packet to packet. The first one or two bytes in each data packet, the start of the 'TOC sequence' that defines the layout of the packet, specifies all of these parameters for that particular packet. See Section 3 of the Opus Specification for the exact format of the TOC sequence.<br />
<br />
The remaining parameters that must be signaled are<br />
<br />
* The magic number for stream identification,<br />
* The stream count and coupling for multichannel audio, and<br />
* Any metadata or tags.<br />
<br />
=== Packet Organization ===<br />
<br />
Opus is framed in a continuous logical [http://www.xiph.org/ogg/doc/framing.html Ogg stream]. <br />
<br />
There are two mandatory headers. The granule position of the pages containing these headers MUST be zero.<br />
<br />
The first packet in the logical Ogg stream MUST contain the identification header, which uniquely identifies a stream as Opus audio. It MUST begin with the 8 bytes "OpusHead". It MUST be placed alone in the first page of the logical Ogg stream. This page MUST have the 'beginning of stream' flag set.<br />
<br />
The second Opus packet MUST contain the comment header. It MUST begin with the 8 bytes "OpusTags". It MAY span one or more pages, beginning on the second page of the logical stream. However many pages it spans, the comment header packet MUST finish the page on which it ends.<br />
<br />
All subsequent pages are audio data pages and the packets they contain are audio data packets. The first audio page SHOULD NOT have the 'continued packet' flag set. Packets MUST be placed into Ogg pages in order until the end of stream. Audio packets MAY span page boundaries. A decoder MUST treat a zero-byte audio packet as if it were an Opus packet with an illegal TOC sequence. The last page SHOULD have the 'end of stream' flag set, but implementations should be prepared to deal with truncated streams which do not have a page marked 'end of stream'. The final packet SHOULD complete on the last page, i.e., the final lacing value should be less than 255. There MUST NOT be any more pages in an Opus logical stream after a page marked 'end of stream'.<br />
<br />
=== Granule Position ===<br />
<br />
The granule position of an audio page encodes the total number of PCM samples in the stream up to and including the last fully-decodable sample from the last packet ''completed'' on that page. A page that is entirely spanned by a single packet (that completes on a subsequent page) has no granule position, and the granule position field SHOULD be set to the special value '-1' in two's complement.<br />
<br />
The granule position of an audio page is in units of PCM audio samples at a fixed rate of 48 kHz (per channel; a stereo stream’s granule position does not increment at twice the speed of a mono stream). It is possible to run a decoder at other sampling rates, but the format and this specification always count samples assuming a 48 kHz decoding rate.<br />
<br />
The duration of an Opus packet may be any multiple of 2.5 ms, up to a maximum of 120 ms. This duration is encoded in the TOC sequence at the beginning of each packet. The number of samples returned by a decoder corresponds to this duration exactly, even for the first few packets. For example, a 20 ms packet fed to a decoder running at 48 kHz will always return 960 samples. A demuxer can parse these TOC sequences to work backwards or forwards from a packet with a known granule position (i.e., the last packet completed on some page) in order to assign granule positions to every packet, or even every individual sample. The one exception is the last page in the stream, as described below. Every other page with completed packets after the first MUST have a granule position equal to the number of samples contained in packets that complete on that page plus the granule position of the most recent page with completed packets. This guarantees that a demuxer will assign individual packets the same granule position when working forwards as when working backwards. There must not be any gaps. In order to support capturing a stream that uses discontinuous transmission (DTX), an encoder SHOULD emit packets that explicitly request the use of Packet Loss Concealment (PLC) (i.e., with a frame length of 0, as defined in Section 3.2.1 of the Opus Specification) in place of the packets that were not transmitted.<br />
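<br />
To make the TOC-based duration concrete, the sketch below computes the number of 48 kHz samples in a single Opus packet from its first one or two bytes, following the configuration table in Section 3 of the Opus Specification. The function names are hypothetical, and the code inspects only the TOC byte and the frame count byte; it does not validate the rest of the packet.<br />

```c
#include <stddef.h>
#include <stdint.h>

/* Samples per frame at 48 kHz for a given TOC byte. */
static int toc_samples_per_frame(uint8_t toc)
{
    int config = toc >> 3;                              /* top five bits */
    if (config >= 16) return 120 << (config & 3);       /* CELT: 2.5, 5, 10, 20 ms */
    if (config >= 12) return (config & 1) ? 960 : 480;  /* hybrid: 10, 20 ms */
    if ((config & 3) == 3) return 2880;                 /* SILK: 60 ms */
    return 480 << (config & 3);                         /* SILK: 10, 20, 40 ms */
}

/* Total samples at 48 kHz in one Opus packet, or -1 if the packet is empty,
   truncated, or longer than the 120 ms maximum. */
static int packet_samples(const uint8_t *pkt, size_t len)
{
    int frames, total;

    if (len < 1) return -1;             /* zero-byte packets are illegal */
    switch (pkt[0] & 3) {               /* frame count code */
    case 0:
        frames = 1;
        break;
    case 1:
    case 2:
        frames = 2;
        break;
    default:                            /* code 3: count is in the next byte */
        if (len < 2) return -1;
        frames = pkt[1] & 0x3F;
        if (frames == 0) return -1;
        break;
    }
    total = frames * toc_samples_per_frame(pkt[0]);
    return total > 5760 ? -1 : total;   /* 120 ms = 5760 samples */
}
```

A demuxer can sum these values over the packets completed on a page and compare the result with the difference between consecutive page granule positions.<br />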
<br />
There is some amount of latency introduced during the decoding process, to allow for overlap in the MDCT modes, stereo mixing in the LP modes, and resampling, and the encoder will introduce even more latency (though the exact amount is not specified). Therefore the first few samples produced by the decoder do not correspond to any real, input audio, but are instead composed of padding inserted by the encoder to compensate for this latency. These samples must be stored and decoded, as Opus is an asymptotically convergent predictive codec, meaning the decoded contents of each frame depend on the recent history of decoder inputs. A 'pre-skip' field in the ID header signals the number of samples which should be skipped at the beginning of the stream. This provides sufficient history to the decoder so that it has already converged before the stream's output begins. It may also be used to perform sample-accurate cropping of existing encoded streams. This amount need not be a multiple of 2.5 ms, may be smaller than a single packet, or may span the contents of several packets.<br />
<br />
The PCM sample position is determined from the granule position using the formula<br />
<br />
'PCM sample position' = 'granule position' - 'pre-skip' .<br />
<br />
For example, if the granule position of the first page is 59971, and the pre-skip is 11971, then the PCM sample position of the last decoded sample from the first page is 48000. This may be converted into a playback time using the formula<br />
<br />
'playback time' = 'PCM sample position' / 48000.0 .<br />
<br />
The initial PCM sample position before any samples are played is normally '0'. In this case, the PCM sample position of the first audio sample to be played starts at '1', because it marks the time on the clock ''after'' that sample has been played, and a stream that is exactly one second long has a final PCM sample position of '48000', as in the example here.<br />
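<br />
The two formulas above translate directly into code. A minimal sketch with hypothetical function names:<br />

```c
#include <stdint.h>

/* 'PCM sample position' = 'granule position' - 'pre-skip' */
static int64_t pcm_sample_position(int64_t granule_position, uint16_t pre_skip)
{
    return granule_position - pre_skip;
}

/* 'playback time' in seconds; granule positions always count 48 kHz
   samples, whatever rate the decoder is actually run at. */
static double playback_time_seconds(int64_t granule_position, uint16_t pre_skip)
{
    return (double)pcm_sample_position(granule_position, pre_skip) / 48000.0;
}
```

For the example above, a granule position of 59971 with a pre-skip of 11971 gives a PCM sample position of 48000 and a playback time of exactly one second.<br />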
<br />
Vorbis streams use a granule position smaller than the number of audio samples contained in the first page to indicate that some of those samples must be trimmed from the output. However, to do so it requires that the first page contains exactly two packets, in order to allow the decoder to perform PCM position adjustments before needing to return any PCM data. Opus uses the pre-skip mechanism for this purpose instead, since the encoder may introduce more than a single packet's worth of latency, and since very large packets in streams with a very large number of channels may not fit on a single page.<br />
<br />
The page with the 'end of stream' flag set MAY have a granule position that indicates the page contains less audio data than would normally be returned by decoding up through the final packet. This is used to end the stream somewhere other than an even frame boundary. The granule position of the most recent audio page with completed packets is used to make this determination, or '0' is used if there were no previous audio pages with a completed packet. The difference between these granule positions indicates how many samples to keep after decoding the packets that completed on the final page. The remaining samples are discarded. The number of discarded samples SHOULD be smaller than the number decoded from the last packet.<br />
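<br />
The end-trimming rule can be sketched as follows. The function name and parameters are hypothetical, and rejecting a final granule position that claims more samples than were decoded is an assumption based on the requirement that there be no gaps.<br />

```c
#include <stdint.h>

/* prev_granule:    granule position of the most recent earlier audio page
                    with a completed packet, or 0 if there was none
   final_granule:   granule position of the 'end of stream' page
   decoded_on_page: samples decoded from packets completing on that page
   Returns the number of those samples to keep, or -1 if the granule
   positions are inconsistent. */
static int64_t final_page_samples_to_keep(int64_t prev_granule,
                                          int64_t final_granule,
                                          int64_t decoded_on_page)
{
    int64_t keep = final_granule - prev_granule;
    if (keep < 0 || keep > decoded_on_page) return -1;
    return keep;   /* the remaining decoded_on_page - keep samples are dropped */
}
```

For example, if the previous page ended at 48000, the final page completes one 960-sample packet, and its granule position is 48500, then 500 samples are kept and 460 are discarded.<br />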
<br />
The granule position of the first audio page with a completed packet MAY be larger than the number of samples contained in packets that complete on that page; however, it MUST NOT be smaller, unless that page has the 'end of stream' flag set. Allowing a granule position larger than the number of samples allows the beginning of a stream to be cropped without rewriting the granule position of all the remaining pages. This means that the PCM sample position just before the first sample to be played may be larger than '0', but the PCM sample position relative to '0' should still be used for the purposes of synchronization when multiplexing with other logical streams. This does not affect the behavior of pre-skip: exactly 'pre-skip' samples should be skipped from the beginning of the decoded output, even if the initial PCM sample position is greater than zero.<br />
<br />
On the other hand, a granule position that is smaller than the number of decoded samples prevents a demuxer from working backwards to assign each packet or each individual sample a valid granule position, since granule positions must be non-negative. A decoder MUST reject as invalid any stream where the granule position is smaller than the number of samples contained in packets that complete on the first page with a completed packet, unless that page has the 'end of stream' flag set. It MAY defer this action until it decodes the last packet completed on that page. If that page has the 'end of stream' flag set, a demuxer can work forwards from the granule position '0', but MUST reject as invalid any stream where the granule position is smaller than the 'pre-skip' amount. This would indicate that more samples should be skipped from the initial decoded output than exist in the stream.<br />
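<br />
The two rejection rules in this paragraph can be written as a single check on the first audio page that completes a packet. This is a sketch with a hypothetical function name, not code from any existing demuxer.<br />

```c
#include <stdint.h>

/* granule:         granule position of the first page with a completed packet
   samples_on_page: samples in the packets that complete on that page
   eos:             nonzero if the page has the 'end of stream' flag set
   pre_skip:        'pre-skip' value from the ID header
   Returns 0 if the stream may be accepted, -1 if it must be rejected. */
static int check_first_audio_page(int64_t granule, int64_t samples_on_page,
                                  int eos, uint16_t pre_skip)
{
    if (!eos)   /* working backwards must not produce negative positions */
        return granule < samples_on_page ? -1 : 0;
    /* Single-page stream: end trimming is allowed, but the stream cannot
       hold fewer samples than pre-skip asks the decoder to drop. */
    return granule < pre_skip ? -1 : 0;
}
```

Because the number of samples on the page is only known after parsing the TOC sequences of its packets, a decoder may run this check once the last packet completed on that page has been decoded.<br />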
<br />
==== ID Header ====<br />
<br />
  0                   1                   2                   3<br />
  0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1<br />
 +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+<br />
 |      'O'      |      'p'      |      'u'      |      's'      |<br />
 +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+<br />
 |      'H'      |      'e'      |      'a'      |      'd'      |<br />
 +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+<br />
 |  version = 0  | channel count |           pre-skip            |<br />
 +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+<br />
 |               original input sample rate in Hz                |<br />
 +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+<br />
 |    output gain Q7.8 in dB     |  channel map  |               |<br />
 +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+               :<br />
 |                                                               |<br />
 :               optional channel mapping table...               :<br />
 |                                                               |<br />
 +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+<br />
<br />
<br />
- Magic signature: "OpusHead" (64 bits)<br />
- Version number (8 bits unsigned): zero for this spec<br />
- Channel count 'c' (8 bits unsigned): MUST be > 0<br />
- Pre-skip (16 bits unsigned, little endian)<br />
- Input sample rate (32 bits unsigned, little endian): informational only<br />
- Output gain (16 bits, little endian, signed Q7.8 in dB) to apply when decoding<br />
- Channel mapping family (8 bits unsigned)<br />
-- 0 = one stream: mono or L,R stereo<br />
-- 1 = channels in vorbis spec order: mono or L,R stereo or ... or FL,C,FR,RL,RR,LFE, ...<br />
-- 2..254 = reserved (treat as 255)<br />
-- 255 = no defined channel meaning<br />
If channel mapping family > 0<br />
- Stream count 'N' (8 bits unsigned): MUST be > 0<br />
- Two-channel stream count 'M' (8 bits unsigned): MUST satisfy M <= N, M+N <= 255<br />
- Channel mapping (8*c bits)<br />
-- one stream index (8 bits unsigned) per channel (255 means silent throughout the file)<br />
<br />
<br />
Some discussion is in order.<br />
<br />
* '''Magic signature'''<br />
The magic signature "OpusHead" allows codec identification and is human readable. Starting with 'Op' helps distinguish it from data packets, as this is an invalid TOC sequence.<br />
<br />
* '''Version'''<br />
The version number must always be zero for this version of the encapsulation spec. We do not plan to revise the spec, but this also acts as a null terminator for the signature bytes and helps align the rest of the fields.<br />
<br />
* '''Channel count''' 'c'<br />
This byte specifies the number of output channels (1...255) for this Ogg Opus stream.<br />
<br />
* '''Pre-skip'''<br />
This is the number of samples (at 48 kHz) to discard from the decoder output when starting playback, and also the number to subtract from a page's granule position to calculate its PCM sample position.<br />
<br />
When constructing cropped Ogg Opus streams, we recommend a pre-skip of at least 3840 samples (80 ms) to ensure complete convergence.<br />
<br />
* '''Input sample rate'''<br />
This is ''not'' the sample rate to use for playback of the encoded data.<br />
<br />
Opus has a handful of coding modes, with internal audio bandwidths of 4, 6, 8, 12, and 20 kHz. Each packet in the stream may have a different audio bandwidth. Regardless of the audio bandwidth, the reference decoder supports decoding any stream at a sample rate of 8, 12, 16, 24, or 48 kHz. The original sample rate of the encoder input is not preserved by the lossy compression.<br />
<br />
An Ogg Opus player SHOULD select the playback sample rate according to the following procedure:<br />
* If the hardware supports 48 kHz playback, decode at 48 kHz,<br />
* else if the hardware's highest available sample rate is a supported rate, decode at this sample rate,<br />
* else if the hardware's highest available sample rate is less than 48 kHz, decode at the next higher supported rate and resample,<br />
* else decode at 48 kHz and resample.<br />
<br />
However, the 'input sample rate' field allows the encoder to pass the sample rate of the original input stream as metadata. This may be useful when the user requires the output sample rate to match the input sample rate. For example, a non-player decoder writing PCM format to disk might choose to resample the output audio back to the original input rate to reduce surprise to the user, who might reasonably expect to get back a file with the same sample rate as the one they fed to the encoder.<br />
<br />
A value of zero indicates 'unspecified'. Encoders SHOULD write the actual input rate or zero, but decoder implementations which do something with this field SHOULD take care to behave sanely if given crazy values (e.g. don't actually upsample the output to 10 MHz if requested).<br />
<br />
* '''Output gain'''<br />
This is a gain to be applied by the decoder. Virtually all players and media frameworks should apply it by default. If a player chooses to apply any volume adjustment or gain modification, such as the R128_TRACK_GAIN or a user-facing volume knob, the adjustment MUST be applied ''in addition'' to this output gain in order to achieve playback at the desired volume.<br />
<br />
An encoder SHOULD set the output gain to zero, and instead apply any gain prior to encoding, when this is possible and does not conflict with the user's wishes. The output gain should only be nonzero when the gain is adjusted after encoding, or when the user wishes to adjust the gain for playback while preserving the ability to recover the original signal amplitude.<br />
<br />
Although the output gain has enormous range (+/- 128 dB, enough to amplify inaudible sounds to the threshold of physical pain), most applications can only reasonably use a small portion of this range around zero. The large range serves in part to ensure that gain can always be losslessly transferred between OpusHead and R128_TRACK_GAIN (see below) without saturating.<br />
<br />
The gain is the 20 log<sub>10</sub> ratio of output to input sample values to be applied to the decoder output. E.g. <code>sample *= pow(10, header.gain/(20.*256))</code> where header.gain is the raw 16 bit Q7.8 value from the header.<br />
<br />
* '''Channel mapping family'''<br />
This byte indicates the order and semantic meaning of the various channels encoded in each Opus packet. <br />
<br />
Each possible value of this byte indicates a ''mapping family'', which defines a set of allowed numbers of channels, and the ordered set of channel names for each allowed number of channels. Currently there are three defined mapping families, although more may be added:<br />
<br />
* Family 0 (RTP mapping)<br />
** Allowed numbers of channels: 1 or 2<br />
** 1 channel: monophonic (mono)<br />
** 2 channels: stereo (left, right)<br />
** '''Special mapping''': this channel mapping value also indicates that the content consists of a single Opus stream that is stereo if and only if c==2, with stream index 0 mapped to channel 0, and (if stereo) stream index 1 mapped to channel 1. When the channel mapping byte has this value, no further fields are present in OpusHead.<br />
* Family 1 ([http://www.xiph.org/vorbis/doc/Vorbis_I_spec.html#x1-800004.3.9 Vorbis channel order])<br />
** Allowed numbers of channels: 1 ... 8<br />
** Channel meanings depend on the number of channels, see the Vorbis mapping for details.<br />
* Family 255 (no defined channel meaning)<br />
** Allowed numbers of channels: 1...255<br />
** Channels are unidentified. General-purpose players SHOULD NOT attempt to play these streams, and offline decoders MAY deinterleave the output into separate PCM files, one per channel. Decoders SHOULD NOT produce output for channels mapped to stream index 255 (pure silence) unless they have no other way to indicate the index of non-silent channels.<br />
<br />
The remaining channel mapping families (2...254) are reserved. A decoder encountering a reserved mapping byte should act as though the mapping byte is 255.<br />
<br />
An Ogg Opus player MUST play any Ogg Opus stream with a channel mapping family of 0 or 1, even if the number of channels does not match the physically connected audio hardware. Players SHOULD perform channel mixing to increase or reduce the number of channels as needed.<br />
<br />
* '''Stream count''' 'N'<br />
This field indicates the total number of streams so the decoder can correctly parse the packed Opus packets inside the Ogg packet.<br />
<br />
For channel mapping family 0, this value defaults to 1, and is not coded.<br />
<br />
A multi-channel Opus file is composed of one or more individual Opus streams, each of which produces one or two channels of decoded data. Each Ogg packet contains one Opus packet from each stream. The first N-1 Opus packets are packed using the self-delimiting framing from Appendix B of the Opus Specification. The remaining Opus packet is packed using the regular, undelimited framing from Section 3 of the Opus Specification. All the Opus packets in a single Ogg packet MUST be constrained to produce the same number of decoded samples. A decoder SHOULD treat any Opus packet whose duration is different from that of the first Opus packet in an Ogg packet as if it were an Opus packet with an illegal TOC sequence.<br />
<br />
* '''Two-channel stream count''' 'M'<br />
Describes the number of streams whose decoders should be configured to produce two channels. This must be no larger than the total number of streams.<br />
<br />
For channel mapping family 0, this value defaults to c-1 (i.e., 0 for mono and 1 for stereo), and is not coded.<br />
<br />
Each packet in an Opus stream has an internal channel count of 1 or 2, which can change from packet to packet. This is selected by the encoder depending on the bitrate and the contents being encoded. The original channel count of the encoder input is not preserved by the lossy compression.<br />
<br />
Regardless of the internal channel count, any Opus stream may be decoded as mono (a single channel) or stereo (two channels) by appropriate initialization of the decoder. The "two-channel stream count" field indicates that the first M Opus decoders should be initialized in stereo mode, and the remaining N-M decoders should be initialized in mono mode. The total number of decoded channels (M+N) MUST be no larger than 255, as there is no way to index more channels than that in the channel mapping.<br />
<br />
* '''Channel mapping'''<br />
Contains one index per output channel indicating which decoded channel should be used. If the index is less than 2*M, the output MUST be taken from decoding stream (index/2) as stereo and selecting the left channel if index is even, and the right channel if index is odd. If the index is 2*M or larger, the output MUST be taken from decoding stream (index-M) as mono. As a special case, an index of 255 means that the corresponding output channel MUST contain pure silence.<br />
<br />
For channel mapping family 0, the first index defaults to 0, and if c==2, the second index defaults to 1. Neither index is coded.<br />
<br />
The number of output channels (c) is not constrained to match the number of decoded channels (M+N). A single index MAY appear multiple times, i.e., the same decoded channel may be mapped to multiple output channels. Some decoded channels might not be assigned to any output channel, as well.<br />
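<br />
The index rule above can be sketched in a few lines of Python. This is only an illustration, not part of the specification: the function name <code>resolve_output_channel</code> and its return shape are assumptions made for the example, and the 5.1 mapping values in the usage note are illustrative.<br />

```python
def resolve_output_channel(index, stream_count, coupled_count):
    """Resolve one channel-mapping index.

    Returns None for silence, otherwise a tuple
    (stream, decoder_channels, channel_within_stream).
    stream_count is N and coupled_count is M from the ID header.
    """
    n, m = stream_count, coupled_count
    if m > n:
        raise ValueError("two-channel stream count exceeds stream count")
    if n + m > 255:
        raise ValueError("more than 255 decoded channels")
    if index == 255:
        return None  # special case: this output channel is pure silence
    if index >= n + m:
        raise ValueError("index refers to a decoded channel that does not exist")
    if index < 2 * m:
        # The first M streams are decoded as stereo: even = left, odd = right.
        return (index // 2, 2, index % 2)
    # The remaining N-M streams are decoded as mono.
    return (index - m, 1, 0)


def resolve_mapping(mapping, stream_count, coupled_count):
    """Resolve a whole channel mapping table, one entry per output channel."""
    return [resolve_output_channel(i, stream_count, coupled_count) for i in mapping]
```

For example, with N=4, M=2 and the illustrative mapping <code>[0, 4, 1, 2, 3, 5]</code>, output channel 0 comes from the left channel of stream 0, output channel 1 from mono stream 2, and output channel 5 from mono stream 3.<br />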
<br />
==== Comment header ====<br />
<br />
- 8-byte 'OpusTags' magic signature (64 bits)<br />
- The remaining data follows the vorbis-comment header design used in OggVorbis (without the "framing-bit"), OggTheora, and Speex:<br />
* Vendor string (always present).<br />
** 4-byte little-endian length field, followed by length bytes of UTF-8 vendor string.<br />
* TAG=value metadata strings (zero or more).<br />
** 4-byte little-endian string count.<br />
** Count strings consisting of 4-byte little-endian length and length bytes of UTF-8 string in "tag=value" form.<br />
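<br />
As a rough sketch of the layout listed above, the following Python parses a complete comment packet held in memory. The function name <code>parse_opus_tags</code> is an assumption for this example. The sketch ignores any data after the last comment string and does nothing to limit memory use for very large packets.<br />

```python
import struct


def parse_opus_tags(packet: bytes):
    """Parse an OpusTags packet into (vendor, [comment strings])."""
    if packet[:8] != b"OpusTags":
        raise ValueError("missing 'OpusTags' magic signature")
    pos = 8

    def read_u32():
        # 4-byte little-endian unsigned integer.
        nonlocal pos
        if pos + 4 > len(packet):
            raise ValueError("truncated length or count field")
        (value,) = struct.unpack_from("<I", packet, pos)
        pos += 4
        return value

    def read_string():
        # Length-prefixed UTF-8 string.
        nonlocal pos
        length = read_u32()
        if pos + length > len(packet):
            raise ValueError("string runs past the end of the packet")
        text = packet[pos:pos + length].decode("utf-8")
        pos += length
        return text

    vendor = read_string()
    count = read_u32()
    comments = [read_string() for _ in range(count)]
    return vendor, comments
```

Each returned comment is still in "tag=value" form. A caller would typically split it on the first "=" and compare tag names case-insensitively, as is usual for Vorbis comments.<br />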
<br />
One new comment field is introduced for Ogg Opus:<br />
R128_TRACK_GAIN=-573 <br />
representing the volume shift needed to normalize the track's volume. The gain is a Q7.8 fixed point number in dB, as in the OpusHead "output gain" field. This field is similar to the [[VorbisComment#Replay_Gain|REPLAYGAIN_TRACK_GAIN field in Vorbis]], although the normal volume reference is the [http://tech.ebu.ch/loudness EBU-R128] standard.<br />
<br />
An Ogg Opus file MUST NOT have more than one such field, and if present its value MUST be an integer from -32768 to +32767 inclusive, represented in ASCII with no whitespace. If present, it MUST correctly represent the R128 normalization gain (relative to the OpusHead output gain). If a player chooses to make use of the TRACK_GAIN, it MUST be applied ''in addition'' to the OpusHead output gain. If an encoder populates the TRACK_GAIN field, and the output gain is not otherwise constrained or specified, the encoder SHOULD write the R128 gain into the OpusHead output gain and write "R128_TRACK_GAIN=0". If a tool modifies the OpusHead "output gain" field, it MUST also update or remove the R128_TRACK_GAIN comment field.<br />
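<br />
A minimal sketch of how a player might validate the field and combine it with the OpusHead output gain is shown below. The function names are assumptions for this example, and accepting a leading "+" or "-" sign is also an assumption, based on the "+32767" bound above. Both gains are Q7.8 values, so dividing by 256 gives dB.<br />

```python
import re


def parse_r128_track_gain(text: str) -> int:
    """Validate an R128_TRACK_GAIN value and return it as a Q7.8 integer."""
    # ASCII digits only, optional sign, no whitespace.
    if not re.fullmatch(r"[+-]?[0-9]+", text):
        raise ValueError("gain must be an ASCII integer with no whitespace")
    value = int(text)
    if not -32768 <= value <= 32767:
        raise ValueError("gain out of range")
    return value


def total_gain_db(output_gain_q8: int, track_gain_q8: int) -> float:
    """Track gain is applied in addition to the OpusHead output gain."""
    return (output_gain_q8 + track_gain_q8) / 256.0


def db_to_linear(gain_db: float) -> float:
    """Convert a gain in dB to a linear amplitude scale factor."""
    return 10.0 ** (gain_db / 20.0)
```

With the example value above and an output gain of 0, <code>total_gain_db(0, parse_r128_track_gain("-573"))</code> is -573/256, or about -2.24 dB.<br />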
<br />
There is no comment field corresponding to ReplayGain's ALBUM_GAIN; that information should instead be stored in the OpusHead "output gain" field.<br />
<br />
To avoid confusion with multiple normalization schemes, an OpusTags packet SHOULD NOT contain any of the REPLAYGAIN_TRACK_GAIN, REPLAYGAIN_TRACK_PEAK, REPLAYGAIN_ALBUM_GAIN, or REPLAYGAIN_ALBUM_PEAK fields.<br />
<br />
== Other Implementation Notes ==<br />
<br />
When seeking within an Ogg Opus stream, the decoder should start decoding (and discarding the output) at least 3840 samples prior to the seek point in order to ensure that the output audio is correct at the seek point.<br />
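<br />
The pre-roll rule can be expressed as simple sample arithmetic. The sketch below assumes positions are counted in 48 kHz samples from the start of the stream, where 3840 samples correspond to 80 ms. The function names are hypothetical, and locating the page that contains the returned position is left out.<br />

```python
PREROLL_SAMPLES = 3840  # 80 ms at 48 kHz


def decode_start_for_seek(target_sample: int) -> int:
    """Earliest sample position at which decoding should begin."""
    return max(0, target_sample - PREROLL_SAMPLES)


def samples_to_discard(target_sample: int, actual_start_sample: int) -> int:
    """Decoded samples to throw away before output reaches the seek point."""
    if actual_start_sample > target_sample:
        raise ValueError("decoding started after the seek point")
    return target_sample - actual_start_sample
```

Because a decoder can only start at a packet boundary, the actual start position will usually be somewhat earlier than the value returned by <code>decode_start_for_seek</code>, and everything decoded before the seek point is discarded.<br />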
<br />
Technically valid Opus packets can be arbitrarily large due to the padding format, although the amount of non-padding data they can contain is bounded. These packets may be spread over a similarly enormous number of Ogg pages. Decoders SHOULD avoid attempting to allocate excessive amounts of memory when presented with a very large packet. The presence of an extremely large packet in the stream could indicate a potential memory exhaustion attack or stream corruption. Decoders should reject a packet that is too large to process and print a warning message. In an Ogg Opus stream, the largest possible valid packet that does not use padding has a size of 15630988 bytes (14.9 MiB) and can span up to 61298 Ogg pages, all but one of which will have a granulepos of -1. This is of course a very extreme packet, consisting of 255 channels, each containing 120 ms of audio encoded as 2.5 ms frames, each frame using the maximum possible number of bytes and stored in the least efficient manner allowed.<br />
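<br />
The figures above can be reproduced with a short calculation. The per-stream breakdown in this sketch is one reading of the framing rules, not a quotation from the specification. It assumes 255 mono streams, each carrying 48 frames of 1275 bytes in a code 3 VBR packet. Every coded frame length takes 2 bytes, and the self-delimiting framing adds one more length for the final frame. The page count assumes each page carries a single 255-byte segment.<br />

```python
MAX_FRAME_BYTES = 1275   # largest possible Opus frame
FRAMES_PER_PACKET = 48   # 120 ms of audio as 2.5 ms frames
LENGTH_FIELD_BYTES = 2   # a frame length of 1275 needs the two-byte form


def max_unpadded_stream_bytes(self_delimited: bool) -> int:
    """Largest unpadded Opus packet for a single stream."""
    toc_byte = 1
    frame_count_byte = 1
    # VBR code 3 codes a length for every frame except the last;
    # self-delimiting framing also codes the length of the last frame.
    coded_lengths = FRAMES_PER_PACKET if self_delimited else FRAMES_PER_PACKET - 1
    return (toc_byte + frame_count_byte
            + coded_lengths * LENGTH_FIELD_BYTES
            + FRAMES_PER_PACKET * MAX_FRAME_BYTES)


def max_unpadded_ogg_packet_bytes(stream_count: int) -> int:
    """The first N-1 streams are self-delimited, the last one is not."""
    return ((stream_count - 1) * max_unpadded_stream_bytes(True)
            + max_unpadded_stream_bytes(False))


def lacing_segments(packet_bytes: int) -> int:
    """Number of Ogg lacing values (segments) needed for one packet."""
    return packet_bytes // 255 + 1
```

Under these assumptions, <code>max_unpadded_ogg_packet_bytes(255)</code> gives 15630988 and <code>lacing_segments(15630988)</code> gives 61298, matching the numbers quoted above.<br />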
<br />
== Test Vectors ==<br />
<br />
* [[OggOpus/testvectors|Planned test vectors for OggOpus]]<br />
* Opus test vectors</div>Derf