XiphWiki - User contributions [en]

Opus Recommended Settings

2024-11-25T15:10:24Z

MarkH: fix broken demo3 link

= Recommended Bitrates =
Depending on the kind of audio you want to encode with Opus, you may want to use different bitrate (quality) settings.

The settings in the table below are meant to '''start you off''' with a decent tradeoff between '''good quality''' and '''small file size''' (or '''bitrate usage''', if you're streaming).

You should test the suggested bitrate by actually '''listening''' to your encoded audio and then:
* tweaking the bitrate '''down''' if you think the quality is good, but the file size (or bitrate) is too big,
* tweaking the bitrate '''up''' if you think the quality is bad, and you can afford having bigger files (or a larger streaming bitrate).

{| class="wikitable" style="text-align:center"
|-
!Use Case
!Channels
!Bitrate (Kb/s)
!Notes
|-
|Low bandwidth HF/VHF digital radio
|1 (mono)
|Use '''[http://www.rowetel.com/?page_id=452 Codec 2]'''
|Opus only supports bitrates '''down to 6 Kb/s'''. 
Codec 2 handles ultra low bitrate speech at '''0.7 - 3.2 Kb/s'''.
|-
|VoIP
|1
|10 - 24
|10 Kb/s will deliver narrowband most of the time, 24 Kb/s should give fullband. 
More details in '''[[Opus_Recommended_Settings#Bandwidth_Transition_Thresholds|the relevant table]]''' further down this page.
|-
|rowspan="2"|Audiobooks / Podcasts
|1
|24
|Bitrates from here on up tend to deliver fullband audio.
|-
|2 (stereo)
|32
|
|-
|Music Streaming / Radio
|2
|64 - 96
|Opus has better quality than MP3, AAC and [[Vorbis]] at these rates. 
(listening test results: '''[http://listening-tests.hydrogenaud.io/igorc/results.html 64 Kb/s]''', '''[http://listening-test.coresv.net/results.htm 96 Kb/s]''')
|-
|rowspan="3"|Music Storage
|2
|96 - 128
|Opus at 128 Kb/s (VBR) is pretty much '''[https://en.wikipedia.org/wiki/Transparency_(data_compression) transparent]'''.
|-
|6 (5.1 surround)
|128 - 256
|rowspan="2"|For surround sound, Opus uses '''[https://people.xiph.org/~xiphmont/demo/opus/demo3.shtml surround-sound bitrate allocation]'''.
|-
|8 (7.1 surround)
|256 - 450
|-
|Music Archiving
|1 - 8
|Use '''[[FLAC]]'''
|If you are archiving audio, use a '''[https://en.wikipedia.org/wiki/Audio_file_format#Lossless_compressed_audio_format lossless audio format]''' to prevent '''[https://en.wikipedia.org/wiki/Generation_loss generation loss]'''.
|}

= Technical Details =
For the more technical Opus users, here are some details to help you fine-tune your decision on which bitrate best fits your needs.

== Mono or Stereo ==
Opus tends to start '''downmixing stereo inputs to mono''' from roughly '''19 Kb/s and lower'''.
You can check the details in the '''[https://github.com/xiph/opus/blob/master/src/opus_encoder.c#L149 opus_encoder.c]''' source file.

You can force downmixing at any bitrate by using the following command-line parameters:

<code>--downmix-mono</code> - downmixes all input channels to mono

<code>--downmix-stereo</code> - downmixes all input channels to stereo (if there are more than 2 input channels, e.g. surround sound)
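
As a rough illustration, the bitrate and downmix options can be assembled into an encoder command line. This is only a sketch: it assumes the <code>opusenc</code> tool from opus-tools, and the helper name <code>opusenc_args</code> and the file names are made up for the example.

```python
def opusenc_args(src, dst, bitrate_kbps, downmix=None):
    """Build an opusenc argument list (hypothetical helper, not part of opus-tools).

    downmix may be None (keep channels), "mono" or "stereo".
    """
    args = ["opusenc", "--bitrate", str(bitrate_kbps)]
    if downmix == "mono":
        args.append("--downmix-mono")
    elif downmix == "stereo":
        args.append("--downmix-stereo")
    elif downmix is not None:
        raise ValueError("downmix must be None, 'mono' or 'stereo'")
    return args + [src, dst]
```

The resulting list can be passed to <code>subprocess.run(..., check=True)</code> if <code>opusenc</code> is installed and on the PATH.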

== Bandwidth Transition Thresholds ==
The following table shows rough bitrates that you might want to use to encode audio that has '''[https://tools.ietf.org/html/rfc6716#section-2 limited frequency bandwidths]'''.
This could be useful if your audio has already been bandpassed, or should go through a bandpass filter (e.g. VoIP speech).

{| class="wikitable" style="text-align:center"
|-
!rowspan="3"|Bandpass Range (Hz)
!colspan="4"|Rough Bitrate Required (Kb/s)
|-
!colspan="2"|Mono
!colspan="2"|Stereo
|-
!Voice
!Music
!Voice
!Music
|-
|style="text-align:right;"|NarrowBand (3 - 4000)
|12
|15
|?
|?
|-
|style="text-align:right;"|MediumBand (3 - 6000)
|15
|18-22
|?
|?
|-
|style="text-align:right;"|WideBand (3 - 8000)
|16-20
|22-28
|?
|?
|-
|style="text-align:right;"|SuperWideBand (3-12000)
|24-28
|28-32
|?
|?
|-
|style="text-align:right;"|FullBand (3-20000)
|28-40
|32-64
|32-64
|64-128
|}

The details of Opus' bandpass thresholds can be found in the '''[https://github.com/xiph/opus/blob/master/src/opus_encoder.c#L121 opus_encoder.c]''' source file.

The '''[http://wiki.hydrogenaud.io/index.php?title=Opus HydrogenAudio]''' wiki also has some great information on Opus and its usage.

== Framesize Tweaking ==
Opus can encode frames of '''2.5''', '''5''', '''10''', '''20''', '''40''', or '''60 ms'''. It can also combine multiple frames into packets of '''up to 120 ms'''.

Opus uses a '''20 ms''' frame size '''[https://tools.ietf.org/html/rfc6716#section-2.1.4 by default]''', as it gives a decent mix of low latency and good quality.

For real-time applications, sending fewer packets per second reduces the overall bitrate, since it reduces the overhead from '''[https://en.wikipedia.org/wiki/IPv6_packet#Fixed_header IP]''', '''[https://en.wikipedia.org/wiki/User_Datagram_Protocol#Packet_structure UDP]''', and '''[https://en.wikipedia.org/wiki/Real-time_Transport_Protocol#Packet_header RTP headers]'''.
However, it increases latency and sensitivity to packet losses, as losing one packet constitutes a loss of a bigger chunk of audio.
Unless operating at very low bitrates over RTP, there is no reason to use frame sizes above 20 ms, as those will have slightly lower quality for music encoding.

For these reasons, the default 20 ms frames are a good choice for most applications.
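
To get a feel for the header overhead mentioned above, here is a small worked calculation. It is a sketch under stated assumptions: one Opus packet per RTP packet, a fixed 40-byte IPv6 header with no extension headers, an 8-byte UDP header, and a 12-byte RTP header with no CSRC entries or extensions. The function name is invented for the example.

```python
IPV6_HEADER_BYTES = 40  # fixed IPv6 header, no extension headers (assumption)
UDP_HEADER_BYTES = 8
RTP_HEADER_BYTES = 12   # fixed RTP header, no CSRC list or extensions (assumption)


def header_overhead_kbps(packet_ms):
    """Bitrate spent on IP/UDP/RTP headers when one packet is sent every packet_ms."""
    header_bits = 8 * (IPV6_HEADER_BYTES + UDP_HEADER_BYTES + RTP_HEADER_BYTES)
    # bits per millisecond is numerically the same as kilobits per second
    return header_bits / packet_ms
```

Under these assumptions the headers cost 24 Kb/s with 20 ms packets and 8 Kb/s with 60 ms packets, which is why longer packets only pay off when the audio bitrate itself is very low.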

== Trading Coding Efficiency with CPU Time ==
The Opus encoder uses its maximum algorithmic '''complexity''' setting of '''10''' '''[https://tools.ietf.org/html/rfc6716#section-2.1.5 by default]'''. This means that it does not hesitate to use CPU to give you the best quality encoding at a given bitrate.

If the CPU usage is too high for the system you are using Opus on, you can try a lower complexity setting. The allowed values span from '''10''' (highest CPU usage and quality) down to '''0''' (lowest CPU usage and quality).

[[Category:Opus]]

OggKate

2023-02-21T16:20:46Z

MarkH: fix broken links

== Disclaimer ==
This is not a Xiph codec, though it may be embedded in Ogg alongside other Xiph
codecs, such as Vorbis and Theora. As such, please do not assume that Xiph has
anything to do with this, much less responsibility.

== What is Kate? ==

Kate is an overlay codec, originally designed for karaoke and text, that can be
multiplexed in Ogg.

Text and images can be carried and animated by a Kate stream.
Most of the time, they will (optionally) be multiplexed with audio/video to carry subtitles,
song lyrics (with or without karaoke data), etc.

Series of curves (splines, segments, etc) may be attached to various properties
(text position, font size, etc) to create animated overlays. This allows scrolling
or fading text to be defined. This can even be used to draw arbitrary shapes, so
hand drawing can also be represented by a Kate stream.

Example uses of Kate streams are movie subtitles for Theora videos, either text based,
as may be created by [http://www.v2v.cc/~j/ffmpeg2theora ffmpeg2theora], or image
based, such as created by [http://thoggen.net Thoggen] (patching needed), and lyrics,
as created by oggenc, from vorbis-tools.

== Why a new codec? ==

As I was adding support for Theora, Speex and FLAC to some software of mine, I found myself
wanting to have song lyrics accompanying Vorbis audio. Since Vorbis comments are limited to
the headers, one can't add them in the stream as they are sung, so another multiplexed stream
would be needed to carry them.

The three possible bases usable for such a codec I found were Writ, CMML, and OGM/SRT.

*[[OggWrit|Writ]] is an unmaintained start at an implementation of a very basic design. I did find an encoder/decoder in py-ogg2 later on, but it would have been quicker to write Kate from scratch anyway.
*[[CMML]] is more geared towards encapsulating metadata about an accompanying stream than towards being a data stream itself, and seemed complex for a simple use, though I have now revised my view on this. It also seems designed for Annodex (which I haven't had a look at), though it does seem relatively generic for use outwith Annodex, and it is now being "repurposed" as timed text, bringing it closer to what I'm doing.
*OGM/SRT, which I only found when I added Kate support to MPlayer, is shoehorning various data formats into an Ogg stream, and just dumps the SRT subtitle format as is, AFAICS (though I haven't looked at this one in detail, since I already had a working Kate implementation by that time).

I then decided to roll my own, not least because it's a fun thing to do.

I found other formats, such as USF (designed for inclusion in Matroska) and various subtitle formats,
but none were designed for embedding inside an Ogg container.

== Overview of the Kate bitstream format ==

I've taken much inspiration from Vorbis and Theora here.
Headers and packets (as well as the API design) follow the design of these two codecs.

A rough overview (see [[#Format specification|Format specification]] for more details) is:

Header packets:
*ID header [BOS]: magic, version, granule fraction, encoding, language, etc
*Comment header: Vorbis comments, as per Vorbis/Theora streams
*Region definitions header: a list of predefined regions to be referred to by data packets
*Style definitions header: a list of predefined styles to be referred to by data packets
*Curves definitions header: a list of predefined curves to be referred to by data packets
*Motion definitions header: a list of predefined motions to be referred to by data packets
*Palette definitions header: a list of predefined palettes to be referred to by data packets
*Bitmap definitions header: a list of predefined bitmaps to be referred to by data packets
*Font mapping definitions header: a list of predefined font mappings to be referred to by data packets

Other header packets are ignored, and left for future expansion.

Data packets:
*text data: text/image and optional motions, accompanied by optional overrides for style, region, language, etc
*keepalive: can be emitted at any time to help a demuxer know where we're at, but those packets are optional
*repeats: a verbatim repeat of a text packet's payload, in order to bound any backward seeking needed when starting to play a stream partway through. These are also optional.
*end data [EOS]: marks the end of the stream, it doesn't have any useful payload

Other data packets are ignored, and left for future expansion.

The intent of the "keepalive" packet is to be sent at regular
intervals when no other packet has been emitted for a while. This helps seeking code
find a Kate page more easily.

Things of note:
*Kate is a discontinuous codec, as defined in [http://www.xiph.org/ogg/doc/ogg-multiplex.html ogg-multiplex.html] in the Ogg documentation, which means it's timed by start granule, not by end granule (as Theora and Vorbis are).
* All data packets are on their own page, for two reasons:
**Ogg keeps track of granules at the page level, not the packet level
**if no text event happens for a while after a particular text event, we don't want to delay it so a larger page can be issued

See also [[#Seeking and memory|Problems to solve: Seeking and memory]].

*The granule encoding is not a direct time/granule correspondence; see [[#Granule encoding|Granule encoding]].
*The EOS packet should have a granule position greater than or equal to the end time of all events.
*User code doesn't have to know the number of headers to expect; this is handled inside the library code (unlike Vorbis and Theora).
*The format contains hooks so that additional information may be added in future revisions while keeping backward compatibility (old decoders will parse such streams correctly, but ignore the new information).

== Format specification ==

The Kate bitstream format consists of a number of sequential packets.
Packets can be either header packets or data packets. All header packets
must appear before any data packet.

Header packets must appear in order. Decoding of a data packet is not
possible until all header packets have been decoded.

Each Kate packet starts with a one byte type. A type with the MSB set
(ie, between 0x80 and 0xff) indicates a header packet, while a type with
the MSB cleared (ie, between 0x00 and 0x7f) indicates a data packet.
All header packets then have the Kate magic, from byte offset 1 to byte
offset 7 ("kate\0\0\0"). Note that this applies only to header packets:
data packets do not contain the Kate signature.

Since the ID header must appear first, a Kate stream can be recognized
by comparing the first eight bytes of the first packet with the signature
string "\200kate\0\0\0".
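
The signature check described above can be sketched in a few lines. The constant and function names are invented for this example and are not part of libkate; <code>\200</code> in the signature string is octal for 0x80.

```python
# First eight bytes of the ID header: packet type 0x80 followed by "kate\0\0\0".
KATE_BOS_SIGNATURE = b"\x80kate\x00\x00\x00"


def is_kate_bos_packet(packet: bytes) -> bool:
    """Return True if this first packet of a logical stream looks like a Kate ID header."""
    return packet[:8] == KATE_BOS_SIGNATURE
```

A demuxer would apply this to the first packet of each logical Ogg stream; shorter packets simply fail the comparison.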

When embedded in Ogg, the first packet in a Kate stream (always packet type 0x80,
the ID header packet) must be placed on a separate page. The corresponding Ogg
packet must be marked as beginning of stream (BOS). All subsequent header packets
must be on one or more pages. Subsequently, each data packet must be on a separate
page.

The last data packet must be the end of stream packet (packet type 0x7f).

When embedded in Ogg, the corresponding Ogg packet must be marked as end of stream (EOS).

As per the Ogg specification, granule positions must be non decreasing
within the stream. Header packets have granule position 0.

Currently existing packet types are:
:headers:
::0x80 ID header (BOS)
::0x81 Vorbis comment header
::0x82 regions list header
::0x83 styles list header
::0x84 curves list header
::0x85 motions list header
::0x86 palettes list header
::0x87 bitmaps list header
::0x88 font ranges and mappings header
:data:
::0x00 text data (including optional motions and overrides)
::0x01 keepalive
::0x02 repeat
::0x7f end packet (EOS)
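
As an illustration of the type byte, the following sketch classifies a packet using the MSB rule and the table above. The dictionary and function names are invented for the example; unknown types are reported rather than rejected, since the format leaves them for future expansion.

```python
# Packet types for bitstream 0.x, copied from the list above.
PACKET_TYPES = {
    0x80: "ID header",
    0x81: "Vorbis comment header",
    0x82: "regions list header",
    0x83: "styles list header",
    0x84: "curves list header",
    0x85: "motions list header",
    0x86: "palettes list header",
    0x87: "bitmaps list header",
    0x88: "font ranges and mappings header",
    0x00: "text data",
    0x01: "keepalive",
    0x02: "repeat",
    0x7F: "end packet",
}


def describe_packet(packet: bytes) -> str:
    """Classify a Kate packet by its first byte (header if the MSB is set)."""
    if not packet:
        raise ValueError("empty packet")
    packet_type = packet[0]
    kind = "header" if packet_type & 0x80 else "data"
    name = PACKET_TYPES.get(packet_type, "unknown, to be ignored")
    return f"{kind}: {name}"
```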

The format described here is for bitstream version 0.x.
As of 19 December 2008, the latest bitstream version is 0.4.

For more detailed information, refer to the format documentation
in libkate (see URL below in the [[#Downloading|Downloading]] section).

Following is the definition of the ID header (packet type 0x80).
This works out to a 64 byte ID header. This is the header that should be
used to detect a Kate stream within an Ogg stream.

  0                   1                   2                   3
  0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1| Byte
 +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
 | packtype      | Identifier char[7]: 'kate\0\0\0'              | 0-3
 +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
 | kate magic continued                                          | 4-7
 +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
 | reserved - 0  | version major | version minor | num headers   | 8-11
 +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
 | text encoding | directionality| reserved - 0  | granule shift | 12-15
 +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
 | cw sh | canvas width          | ch sh | canvas height         | 16-19
 +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
 | reserved - 0                                                  | 20-23
 +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
 | granule rate numerator                                        | 24-27
 +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
 | granule rate denominator                                      | 28-31
 +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
 | language (NUL terminated)                                     | 32-35
 +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
 | language (continued)                                          | 36-39
 +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
 | language (continued)                                          | 40-43
 +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
 | language (continued)                                          | 44-47
 +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
 | category (NUL terminated)                                     | 48-51
 +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
 | category (continued)                                          | 52-55
 +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
 | category (continued)                                          | 56-59
 +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
 | category (continued)                                          | 60-63
 +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+

The fields cw sh, canvas width, ch sh, and canvas height were introduced
in bitstream 0.3. Earlier bitstreams will have 0 in these fields.

language and category are NUL-terminated ASCII strings.
Language follows RFC 3066, though obviously will not accommodate language tags
with lots of subtags.

Category is currently loosely defined, and I haven't yet found a nice way to
present it in a generic way, but it is meant for automatic classification of
various multiplexed Kate streams (eg, to recognize that some streams are
subtitles (in a set of languages), and some others are commentary (in a
possibly different set of languages), etc).

== API overview ==

libkate offers an API very similar to that of libvorbis and libtheora, as well as
an extra higher level decoding API.

Here's an overview of the three main modules:

=== Decoding ===

Decoding is done in a way similar to libvorbis. First, initialize a kate_info and a
kate_comment structure. Then, read headers by calling kate_decode_headerin. Once
all headers have been read, a kate_state is initialized for decoding using kate_decode_init,
and kate_decode_packetin is called repeatedly with data packets. Events (eg, text) can be
retrieved via kate_decode_eventout.

=== Encoding ===

Encoding is also done in a way similar to libvorbis. First initialize a kate_info
and a kate_comment structure, and fill them out as needed. kate_encode_headers will
create ogg packets from those. Then, kate_encode_text is called repeatedly for all
the text events to add. When done, calling kate_encode_finish will create an end of
stream packet.

=== High level decoding API ===

There are only 3 calls here:

*kate_high_decode_init
*kate_high_decode_packetin
*kate_high_decode_clear

Here, all Ogg packets are sent to kate_high_decode_packetin, which does the right
thing (header/data classification, decoding, and event retrieval). Note that you
do not get access to the comments directly using this, but you do get access to the
kate_info via events.

The libkate distribution includes commented examples for each of those.

Additionally, libkate includes a layer (liboggkate) to make it easier to use when
embedded in Ogg. While the normal API uses kate_packet structures, liboggkate uses
ogg_packet structures.

The High level decoding API does not have an Ogg specific layer, but functions exist
to wrap a kate_packet around a memory buffer (such as the one ogg_packet uses, for instance).

== Support ==

Among the software with Kate support:
*VLC
*ffmpeg2theora
*liboggz
*liboggplay
*Cortado (wikimedia version)
*vorbis-tools

I have patches for the following with Kate support:
*MPlayer
*xine
*GStreamer
*Thoggen
*Audacious
*and more...

These may be found in the libkate source distribution (see [[#Downloading|Downloading]]
for links).

In addition, libtiger is a rendering library for Kate streams using Pango and Cairo.
It is not quite API stable yet, though no major changes are expected.

== Granule encoding ==

=== Ogg ===

Ogg leaves the encoding of granules up to a particular codec, only
mandating that granules be non decreasing with time.

The Kate bitstream format uses a linear mapping between time and
granule, described here.

A Kate granule position is composed of two different parts:
*a base granule, in the high bits
*a granule offset, in the low bits

 +----------------+----------------+
 |      base      |     offset     |
 +----------------+----------------+

The number of bits these parts occupy is variable, and each stream
may choose how many bits to dedicate to each. The kate_info structure
for a stream holds that information in the granule_shift field,
so each part may be reconstructed from a granulepos.

The timestamp T of a given Kate packet is split into a base B and
offset O, and these are stored in the granulepos of that packet.
The split is done such that the B is the time of the earliest event
still active at the time, and the O is the time elapsed between B
and T. Thus, T = B + O. This mimics the way Theora stores its own
timestamps in granulepos, where the base acts as a keyframe, and
an offset acts as the position of an inter frame from the previous
keyframe. Since Kate allows time overlapping events, however, the
choice of the base to use is slightly more complex, as it may not
be the starting time of the previous event, if the stream contains
time overlapping events.

The kate_info structure for a stream holds a rational fraction
representing the time span of granule units for both the base and
the offset parts.

The granule rate is defined by the two fields:

 kate_info::gps_numerator
 kate_info::gps_denominator

The number of bits reserved for the offset is defined by the field:

 kate_info::granule_shift
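
The split described above can be sketched as follows. This is an illustration rather than libkate code: the function names are invented, and it assumes that the granule rate is gps_numerator / gps_denominator granules per second, by analogy with a frame rate.

```python
from fractions import Fraction


def split_granulepos(granulepos: int, granule_shift: int):
    """Return (base, offset): base from the high bits, offset from the low bits."""
    base = granulepos >> granule_shift
    offset = granulepos & ((1 << granule_shift) - 1)
    return base, offset


def granulepos_to_seconds(granulepos, granule_shift, gps_numerator, gps_denominator):
    """Timestamp T = B + O, converted to seconds with the (assumed) granule rate."""
    base, offset = split_granulepos(granulepos, granule_shift)
    return Fraction((base + offset) * gps_denominator, gps_numerator)
```

For example, with a granule shift of 32 and a rate of 1000/1, a base of 5000 and an offset of 250 give a timestamp of 5.25 seconds, while the earliest still active event started at 5 seconds.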

=== Generic timing ===

Kate data packets (data packet type 0) include timing information (start time,
end time, and time of the earliest event still active). All these are stored as
64-bit values at the rate defined by the granule rate, so they do not suffer from the
granule_shift space limitation.

This also allows for Kate streams to be stored in other containers.

== Motion ==

The Kate bitstream format includes motion definition, originally for karaoke purposes, but
which can be used for more general purposes, such as line based drawing, or animation of
the text (position, color, etc).

Motions are defined by means of a series of curves (static points, segments, splines (catmull-rom, bezier, and b-splines)).
A 2D point can be obtained from a motion for any timestamp during the lifetime of a text.
This can be used for moving a marker in 2D above the text for karaoke, or to use the x
coordinate to color text when the motion position passes each letter or word, etc.
Motions have attached semantics so the client code knows how to use a particular motion.
Predefined semantics include text color, text position, etc.

Since a motion can be composed of an arbitrary number of curves, each of which may have
an arbitrary number of control points, complex motions can be achieved. If the motion is
the main object of an event, it is even possible to have an empty text, and use the motion
as a virtual pencil to draw arbitrary shapes. Even on-the-fly handwriting subtitles could
be done this way, though this would require a lot of control points, and would not be able
to be used with text-to-speech.

As a proof of concept, I also have a "draw chat" program where two people can draw, and
the shapes are turned to b-splines and sent as a kate motion to be displayed on the other
person's window.

It is also possible for motions to be discontinuous - simply insert a curve of 'none' type.
While the timestamp lies within such a curve, no 2D point will be generated. This can be
used to temporarily hide a marker, for instance.

It is worth mentioning that pauses in the motion can be trivially included by inserting
at the right time and for the right duration a simple linear interpolation curve with only
two equal points, equal to the position the motion is supposed to pause at.
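
To make the 'none' curve and the pause trick concrete, here is a toy sampler. It is purely illustrative: the tuple layout <code>(kind, duration, points)</code>, the curve kind names, and the function name are all hypothetical and do not reflect libkate's data structures, and only two-point linear curves are handled.

```python
def sample_motion(curves, t):
    """Return the (x, y) point of a motion at time t, or None if no point is generated.

    curves is a list of (kind, duration, points) tuples played back to back;
    kind is "linear" (exactly two points) or "none" (no point while it is active).
    """
    start = 0.0
    for kind, duration, points in curves:
        if start <= t < start + duration:
            if kind == "none":
                return None  # discontinuity: e.g. the marker is hidden
            (x0, y0), (x1, y1) = points
            u = (t - start) / duration  # 0 at the curve start, 1 at its end
            return (x0 + u * (x1 - x0), y0 + u * (y1 - y0))
        start += duration
    return None  # t lies outside the motion
```

A pause is then just a linear curve whose two points are equal, for instance <code>("linear", 1.0, [(1.0, 0.0), (1.0, 0.0)])</code>, which yields the same point for its whole duration.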

Kate defines a set of predefined mappings so that each decoder user interprets a motion in
the same way. A mapping is coded on 8 bits in the bitstream, and the first 128 are reserved
for Kate, leaving 128 for application specific mappings, to avoid constraining creative uses
of that feature. Predefined mappings include frame (eg, 0-1 points are mapped to the size of
the current video frame), or region, to scale 0-1 to the current region. This allows curves
to be defined without knowing in advance the pixel size of the area it should cover.

For uses which require more than two coordinates (eg, text color, where 4 (RGBA) values are
needed), Kate predefines the semantics text_color_rg and text_color_ba, so a 4D point can be
obtained using two different motions.

There are higher level constructs, such as morphing between two styles, or predefined
karaoke effects. More are planned to be added in the future.

See also [[#Trackers|Trackers]].

== Trackers ==

Since attaching motions to text position, etc, makes it hard for the client to keep track of
everything, doing interpolation, etc, the library supplies a tracker object, which handles the
interpolation of the relevant properties.
Once initialized with a text and a set of motions, the client code can give the tracker a new
timestamp, and get back the current text position, text color, etc.

Using a tracker is not necessary, if one wants to use the motions directly, or just ignore them,
but it makes life easier, especially when considering that the order in which motions are applied
does matter (to be defined formally, but the current source code is informative at this point).

== The Kate file format ==

Though this is not a feature of the bitstream format, I have created a text file format to
describe a series of events to be turned into a Kate bitstream.
At its minimum, the following is a valid input to the encoder:

: kate {
:: event { 00:00:05 --> 00:00:10 "This is a text" }
: }

This will create a simple stream with "This is a text" emitted at an offset of 5 seconds into
the track, lasting 5 seconds to an end time at 10 seconds.
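
Such a minimal description is easy to generate from a script. The sketch below only reproduces the form of the example above; the helper names are invented, whole seconds are assumed, and since quoting rules for the text are not covered here, text containing double quotes is rejected rather than escaped.

```python
def format_timestamp(seconds: int) -> str:
    """Format whole seconds as HH:MM:SS, as in the example above."""
    hours, remainder = divmod(seconds, 3600)
    minutes, secs = divmod(remainder, 60)
    return f"{hours:02d}:{minutes:02d}:{secs:02d}"


def minimal_kate_script(start: int, end: int, text: str) -> str:
    """Build a one-event Kate description file as a string."""
    if '"' in text:
        raise ValueError("escaping of double quotes is not handled in this sketch")
    event = f'event {{ {format_timestamp(start)} --> {format_timestamp(end)} "{text}" }}'
    return "kate {\n  " + event + "\n}\n"
```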

Motions, regions, styles can be declared in a definitions block to be reused by events, or can
be defined inline. Defining those in the definitions block places them in a header so they can
be reused later, saving space. However, they can also be defined in each event, so they will be
sent with the event. This allows them to be generated on the fly (eg, if the bitstream is being
streamed from a realtime input).

For convenience, the Kate file format also allows C style macros, though without parameters.

Please note that the Kate file format is fully separate from the Kate bitstream format. The
difference between the two is similar to the difference between a C source file and the resulting
object file, when compiled.

Note that the format is not based on XML for a very parochial reason: I very much dislike
editing XML by hand, as it's really hard to read. XML is really meant for machines to generically
parse text data in a shared syntax but with possibly unknown semantics, and I need those
text representations to be easily editable.

This also implies that there could be an XML representation of a Kate stream, which would be
useful if one were to make an editor that worked on a higher level than the current all-text
representation, and it is something that might very well happen in the future, in parallel with
the current format.

== Karaoke ==

Karaoke effects rely on motions, and there will be predefined higher level ways of specifying
timings and effects, two of which are already done.

As an example, this is a valid Karaoke script:

:kate {
:: simple_timed_glyph_style_morph {
::: from style "start_style" to style "end_style"
::: "Let " at 1.0
::: "us " at 1.2
::: "sing " at 1.4
::: "to" at 2.0
::: "ge" at 2.5
::: "ther" at 3.0
:: }
:}

The syllables will change from one style to another as time passes. The definition of the start_style
and end_style styles is omitted for brevity.

== Problems to solve ==

There are a few things to solve before the Kate bitstream format can be considered good
enough to be frozen:

Note: the following is mostly solved, and the bitstream is now stable, and has been
backward and forward compatible since the first released version. This will be updated
when I get some time.

=== Seeking and memory ===

When seeking to a particular time in a movie with subtitles, we may end up at a place where a subtitle has been started, but has not been removed yet. Pure streaming doesn't have this problem as it remembers the subtitle being issued (as opposed to, say, Vorbis, for which all data valid now is decoded from the last packet). With Kate, a text string valid now may have been issued long ago.

I see three possible ways to solve this:
*each data packet includes the granule of the earliest still active packet (if none, this will be the granule of this very packet)
**this means seeks are two phased: first seek, find the next Kate packet, and seek again if the granule of the earliest still active packet is less than the granule originally sought. This implies support code on players to do the double seek.

*use "reference frames", a bit like Theora does, where the granule position is split in several fields: the higher bits represent a position for the reference frame, and the lowest bits a delta time to the current position. When seeking to a granule position, the lower bits are cleared off, yielding the granule position of the previous reference frame, so the seek ends up at the reference frame. The reference frame is a sync point where any active strings are issued again. This is a variant of the method described in the Writ wiki page, but the granule splitting avoids any "downtime".
**this requires reissuing packets, and it doesn't feel right (and wastes space).
**it also requires "dummy" decoding of Kate data from the reference frame to the actual seek point to fully refresh the state "memory".

*A variant of the two-granules-in-one system used by libcmml, where the "back link" points to the earliest still active string, rather than the previous one (this allows a two phase seek, rather than a multiphase seek, hopping back from event to event, with no real way to know if there is or not a previous event which is still active - I suppose CMML has no need to know this, if their "clips" do not overlap - mine can do).
**Such a system considerably shortens the usable granule space, though it can do a one phase seek, if I understand the system correctly, of which I am not certain.
*** Well, it seems it can't do a one phase seek anyway.

*Additionally, it could be possible to emit simple "keepalive" packets at regular intervals to help a seek algorithm to sync up to the stream without needing too much data reading - this helps for discontinuous streams where there could be no pages for a while if no data is needed at that time.

=== Text encoding ===

A header field declares the text encoding used in the stream. At the moment, only UTF-8 is
supported, for simplicity. There are no plans to support other encodings, such as UTF-16,
at the moment.

Note that strings included in the header (language, category) are not affected by that
language encoding (rather obviously for language itself). These are ASCII.

The actual text in events may include simple HTML-like markup (at the moment, allowed markup
is the same as the one Pango uses, but more markup types may be defined in the future).
It is also possible to ask libkate to remove this markup if the client prefers to receive
plain text without the markup.

=== Language encoding ===

A header field defines the language (if any) used in the stream (this can be overridden in a
data packet, but this is not relevant to this point). At the moment, my test code uses
ISO 639-1 two letter codes, but I originally thought to use RFC 3066 tags. However, matching
a language to a user selection may be simpler for user code if the language encoding is kept
simple. At the moment, I tend to favor allowing both two letter tags (eg, "en") and secondary
tags (like "en_EN"), as RFC 3066 tags can be quite complex, but I welcome comments on this.

If a stream contains more than one language, there usually is a predominant language, which
can be set as the default language for the stream. Each event can then have a language
override. If there is no predominant language, and it is not possible to split the stream
into multiple substreams, each with its own language, then it is possible to use the "mul"
language tag, as a last resort.

=== Bitstream format for floating point values ===

Floating point values are turned into a 16.16 fixed point format, then stored in a bitpacked
format, storing the number of zero bits at the head and tail of the floating point values once
per stream, and the remainder bits for all values in the stream. This seems to yield good results
(typically a 50% reduction over 32 bits raw writes, and 70% over the snprintf based storage), and
has the big advantage of being portable (eg, independent of any IEEE format).
However, this means reduced precision due to the quantization to 16.16. I may add support for
variable precision (eg, 8.24 fixed point formats) to alleviate this. This would however mean less
space savings, though these are likely to be insignificant when Kate streams are interleaved with
a video.
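
The idea can be sketched in Python as follows. This is only an illustration of the scheme described above, not libkate's actual bitstream layout: the function names are made up, the sign is assumed to be stored as one extra bit per value, and the payload is built as a string of '0'/'1' characters for readability.

```python
def to_fixed_16_16(x):
    """Quantize a float to signed 16.16 fixed point (value * 65536, rounded)."""
    v = round(x * 65536)
    if not -(1 << 31) <= v < (1 << 31):
        raise ValueError("value out of 16.16 range")
    return v


def pack_values(values):
    """Return (head, tail, payload) for a list of floats.

    head and tail are the numbers of zero bits shared by all magnitudes at
    the top and bottom of the 32 bit word; they are stored once. Each value
    then costs 1 sign bit (an assumption of this sketch) plus
    32 - head - tail magnitude bits.
    """
    fixed = [to_fixed_16_16(v) for v in values]
    merged = 0
    for f in fixed:
        merged |= abs(f)
    if merged == 0:
        head, tail = 32, 0
    else:
        head = 32 - merged.bit_length()
        tail = (merged & -merged).bit_length() - 1  # index of lowest set bit
    width = 32 - head - tail
    chunks = []
    for f in fixed:
        sign = "1" if f < 0 else "0"
        magnitude = abs(f) >> tail
        chunks.append(sign + (format(magnitude, "0{}b".format(width)) if width else ""))
    return head, tail, "".join(chunks)


def unpack_values(head, tail, payload, count):
    """Inverse of pack_values; returns the quantized floats."""
    width = 32 - head - tail
    step = 1 + width
    out = []
    for i in range(count):
        chunk = payload[i * step:(i + 1) * step]
        magnitude = (int(chunk[1:], 2) << tail) if width else 0
        out.append((-magnitude if chunk[0] == "1" else magnitude) / 65536)
    return out
```

With the values 0.5, 1.25 and -3.0, all magnitudes share 14 zero bits at each end, so each value takes 5 bits in this sketch instead of 32. A value such as 0.1 does not survive the round trip exactly, which is the precision loss mentioned above.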

*Though this is not a Kate issue per se, the motion feature is very difficult to use without a curve editor. While tools may be coded to create a Kate bitstream for various existing subtitle formats, it is not certain it will be easy to find a good authoring tool for a series of curves. That said, it's not exactly difficult to do if you know a widget set.

=== Higher dimensional curves/motions ===

It is quite annoying to have to create two motions to control a color change, due to curves
being restricted to two dimensions. I may add support for arbitrary dimensions. It would also
help for 1D motions, like changing the time flow, where one coordinate is simply ignored at
the moment.
Alternatively, changes could be made to the Kate file format to hide the two dimensionality and
allow simpler specification of non-2 dimensional motions, but still map them to 2D in the kate
bitstream format.

=== Category definition ===

The category field in the BOS packet is a 16 byte text field (15 really, as it is zero terminated
in the bitstream itself). Its goal is to provide the reader with a short description of what kind
of information the stream contains, eg subtitles, lyrics, etc. This would be displayed to the user,
possibly to let them choose which streams to turn on and off.

Since categories are meant primarily for a machine to parse, they will be kept to ASCII. When
a player recognizes a category, it is free to replace its name with one in the user's language if
it prefers. Even in English, the "lyrics" category could be displayed by a player as "Lyrics".

Since this is a free text field rather than an enumeration, it would be good to have a list of
common predefined category names that Kate streams can use.

This is a list of proposed predefined categories, feedback/additions welcome:

* subtitles - the usual movie subtitles, as text
* spu-subtitles - movie subtitles in DVD style paletted images
* lyrics - song lyrics

Please remember the 15 character limit if proposing other categories.

Note that the list of categories is subject to change, and will likely
be replaced by new, more "identifier like" ones. The three ones above,
however, would be kept for backward compatibility as they're already used.
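
A tool that writes or checks category names could enforce the constraints above along these lines. This is a sketch only: the helper names are hypothetical, and padding the unused bytes with zeros is an assumption (the text above only says the field is zero terminated).

```python
CATEGORY_FIELD_SIZE = 16  # bytes in the BOS packet, terminating zero included


def encode_category(name):
    """Return the 16 byte field for an ASCII category name of at most 15 characters."""
    raw = name.encode("ascii")  # raises UnicodeEncodeError if not ASCII
    if len(raw) > CATEGORY_FIELD_SIZE - 1:
        raise ValueError("category names are limited to 15 characters")
    # Assumption: unused bytes are filled with zeros.
    return raw.ljust(CATEGORY_FIELD_SIZE, b"\x00")


def decode_category(field):
    """Return the category name stored in a 16 byte field."""
    if len(field) != CATEGORY_FIELD_SIZE:
        raise ValueError("the category field is exactly 16 bytes")
    return field.split(b"\x00", 1)[0].decode("ascii")
```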

== Text to speech ==

One of the goals of the Kate bitstream format is that text data can be easily parsed
by the user of the decoder, so any additional information, such as style, placement,
karaoke data, etc, should be able to be stripped to leave only the bare text. This is
in view of allowing text-to-speech software to use Kate bitstreams as a bandwidth-cheap
way of conveying speech data, and could also allow things like e-books which can be
either read or listened to from the same bitstream.

I have seen no reference to this being used anywhere, but I see no reason why the granule
progression should be temporal rather than user controlled, such as by using a "next" button
which would bump the granule position by a preset amount, simulating turning a page. This
would be close to necessary for text-to-speech, as the wall time duration of the spoken
speech is not known in advance to the Kate encoder, and can't be mapped to a time based
granule progression. All text strings triggered consecutively between the two granule
positions would then be read in order.

== Possible additions ==

=== Embedded binary data ===

Images and font mappings can be included within a Kate stream.

==== Images ====

Though this could be misused to interfere with the ability to render as text-to-speech, Kate
can use images as well as text. The same caveat as for fonts applies with regard to data
duplication.

Complex images might however be best left to a multiplexed OggSpots or OggMNG stream, unless the
images mesh with the text (eg, graphical exclamation points, custom fonts, (see next
paragraph), etc).

There is support for simple paletted bitmap images, with a variable length palette of up
to 256 colors (in fact, sized in powers of 2 up to 256) and matching pixel data in as
many bits per pixel as can address the palette. Palettes and images are stored separately,
so can be used with one another with no fixed assignment.
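
The relation between palette size and pixel storage can be written down as a small calculation. The function names are hypothetical, and the sketch assumes palette sizes from 2 to 256 colors and ignores any header or padding bytes in the real bitstream.

```python
def bits_per_pixel(ncolors):
    """Bits needed to address a palette whose size is a power of 2 (2..256)."""
    if ncolors < 2 or ncolors > 256 or ncolors & (ncolors - 1):
        raise ValueError("palette size must be a power of 2 between 2 and 256")
    return ncolors.bit_length() - 1


def pixel_data_bits(width, height, ncolors):
    """Size in bits of the raw pixel data for a width x height bitmap."""
    return width * height * bits_per_pixel(ncolors)
```

For example, a 32x32 bitmap using a 16 color palette needs 4 bits per pixel, or 4096 bits (512 bytes) of pixel data.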

Palettes and bitmaps are put in two separate headers for later use by reference, but can
also be placed in data packets, as with motions, etc, if they are not going to be reused.

PNG bitmaps can also be embedded in a Kate stream. These do not have associated palettes
(but the PNGs themselves may or may not be paletted). There is no support for decoding PNG
images in libkate itself, so a program will have to use libpng (or similar code) to decode
the PNG image. For instance, the libtiger rendering library uses Cairo to decode and render
PNG images in Kate streams.

This can be used to have custom fonts, so that raw text is still available if the stream
creator wants a custom look.

I expect that the need for more than 256 colors in a bitmap, or non palette bitmap data,
would be best handled by another codec, eg OggMNG or OggSpots. The goal of images in a
Kate stream is to mesh the images with the text, not to have large images by themselves.

On the other hand, interesting Karaoke effects could be achieved by having MNG images
instead of simple paletted bitmaps in a Kate stream. Comments would be most welcome on
whether this is going too far, however.

I am also investigating SVG images. These allow for very small footprint images for simple
vector drawings, and could be very useful for things like background gradients below text.

A possible solution to the duplication issue is to have another stream in the container
stream, which would hold the shared data (eg, fonts), which the user program could load,
and which could then be used by any Kate (and other) stream. Typically, this type of stream
would be a degenerate stream with only header packets (so it is fully processed before any
other stream presents data packets that might make use of that shared data), and all payload
such as fonts being contained within the headers. Thinking about it, it has parallels with
the way Vorbis stores its codebooks within a header packet, or even the way Kate stores the
list of styles within a header packet.

==== Fonts ====

Custom fonts are merely a set of ranges mapping Unicode code points to bitmaps. As this implies,
fonts are bitmap fonts, not vector fonts, so scaling, if supported by the rendering client,
may not look as good as with a vector font.

A style may also refer to a font name to use (eg, "Tahoma"). These fonts may or may not be
available on the playing system, however, since the font data is not included in the stream,
just referenced by name. For this reason, it is best to keep to widely known fonts.

== Reference encoder/decoder ==

An encoder (kateenc) and a decoder (katedec) are included in the tools directory.
The encoder supports input from several different formats:
* a custom text based file format (see [[#The Kate file format|The Kate file format]]), which is by no means meant to be part of the Kate bitstream specification itself
* SubRip (.srt), the most common subtitle format I found
* LRC lyrics format.

As an example for the widely used SRT subtitles format, the following command line
creates a Kate subtitles stream from an SRT file:

kateenc -l en -c subtitles -t srt -o subtitles.ogg subtitles.srt

The reverse is possible, to recover an SRT file from a Kate stream, with katedec.

Note that the subtitles.ogg file should then be multiplexed into the A/V stream,
using either ogg-tools or oggz-tools.

The Kate bitstreams encoded and decoded by those tools are (supposed to be) correct for this
specification, provided their input is correct.

== Next steps ==

=== Continuations ===

Continuations are a way to add to existing events, and are mostly meant for motions. When streaming
in real time, what motions may be applied to events may not be known in advance (for instance, for a
draw chat program where two programs exchange Kate streams, the drawing motions are only known as
they are drawn). Continuations will allow an event to be extended in time, and motions to be appended
to it. This is only useful for streaming, as when stored in a file, everything is already known in
advance.

=== A rendering library ===

This will allow easier integration in other packages (movie players, etc).
I have started working on an implementation using Cairo and Pango, though I'm still at the early stages.
I might add support for embedding vector fonts in a Kate stream if I was going that way. Still need to think about this.
Another point of note is that when this library is available, it would make it easier to add
capabilities such as rotation, scaling, etc, to the bitstream, since this would not cause too
much work for playing programs using the rendering library. It is expected that these additions
would stay backward compatible (eg, an old player would ignore this information but still correctly
decode the information they can work with from a newly encoded stream).

=== An XML representation ===

While I purposefully did not write Kate description files in XML due to me finding editing XML such
a chore, it would be nice to be able to losslessly convert between the more user friendly representation
and an XML document, so one can do what one does with XML documents, like transformations.

And after all, some people might prefer editing the XML version.

=== Packaging ===

It would be really nice to have packages for libkate/libtiger for many distros.

If you're a packager for a distro which doesn't yet have packages for libkate
or libtiger, please consider helping :)

In particular, packages for Debian would be grand.

== Matroska mapping ==

The codec ID is "S_KATE".

As for Theora and Vorbis, Kate headers are stored in the private data as xiph-laced packets:

Byte 0: number of packets present, minus 1 (there must be at least one packet) - let this number be NP
Bytes 1..n: lengths of the first NP packets, coded in xiph style lacing
Bytes n+1..end: the data packets themselves concatenated one after the other

Note that the length of the last packet isn't encoded, it is deduced from the sizes of the other
packets and the total size of the private data.

This mapping is similar to the Vorbis and Theora mappings, with the caveat that one should not
expect a set number of headers.
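
A reader of the private data could split it back into header packets roughly as follows. This is a sketch with hypothetical function names; it relies on Xiph style lacing coding each length as a run of 255 bytes followed by one byte below 255, the length being their sum.

```python
def build_xiph_laced(packets):
    """Pack header packets into Matroska private data using Xiph lacing."""
    if not 1 <= len(packets) <= 256:
        raise ValueError("between 1 and 256 packets can be stored")
    out = bytearray([len(packets) - 1])  # byte 0: number of packets, minus 1
    for packet in packets[:-1]:          # the last length is not stored
        size = len(packet)
        out += b"\xff" * (size // 255)
        out.append(size % 255)
    for packet in packets:
        out += packet
    return bytes(out)


def split_xiph_laced(private):
    """Split Matroska private data into the header packets it contains."""
    if not private:
        raise ValueError("empty private data")
    count_minus_one = private[0]
    pos = 1
    sizes = []
    for _ in range(count_minus_one):
        size = 0
        while True:
            if pos >= len(private):
                raise ValueError("truncated lacing")
            byte = private[pos]
            pos += 1
            size += byte
            if byte != 255:
                break
        sizes.append(size)
    # The last packet takes whatever is left after the other packets.
    last = len(private) - pos - sum(sizes)
    if last < 0:
        raise ValueError("packet sizes exceed the private data size")
    sizes.append(last)
    packets = []
    for size in sizes:
        packets.append(private[pos:pos + size])
        pos += size
    return packets
```

Since the number of Kate headers is not fixed, the caller should use the length of the returned list rather than assume three packets as with Vorbis or Theora.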

== Downloading ==

libkate encodes and decodes Kate streams, and is API and ABI stable.

The libkate source distribution is available at [https://code.google.com/archive/p/libkate/ https://code.google.com/archive/p/libkate/].

A public git repository is available at [https://gitlab.xiph.org/xiph/kate https://gitlab.xiph.org/xiph/kate].

libtiger renders Kate streams using Pango and Cairo, and is alpha, with API changes still possible.

The libtiger source distribution is available at [https://code.google.com/archive/p/libtiger/ https://code.google.com/archive/p/libtiger/].

== HOWTOs ==

These paragraphs describe a few ways to use Kate streams:

=== Text movie subtitles ===

Kate streams can carry Unicode text (that is, text that can represent
pretty much any existing language/script). If several Kate streams are
multiplexed along with a video, subtitles in various languages can be
made for that movie.

An easy way to create such subtitles is to use ffmpeg2theora, which
can create Kate streams from SubRip (.srt) format files, a simple but
common text subtitles format. ffmpeg2theora 0.21 or later is needed.

At its simplest:

ffmpeg2theora -o video-with-subtitles.ogg --subtitles subtitles.srt \
video-without-subtitles.avi

Several languages may be created and tagged with their language code
for easy selection in a media player:

ffmpeg2theora -o video-with-subtitles.ogg video-without-subtitles.avi \
--subtitles japanese-subtitles.srt --subtitles-language ja \
--subtitles welsh-subtitles.srt --subtitles-language cy \
--subtitles english-subtitles.srt --subtitles-language en_GB

Alternatively, kateenc (which comes with the libkate distribution) can
create Kate streams from SubRip files as well. These can then be merged
with a video with oggz-tools:

kateenc -t srt -c SUB -l it -o subtitles.ogg italian-subtitles.srt
oggz merge -o movie-with-subtitles.ogg movie-without-subtitles.ogg subtitles.ogg

This second method can also be used to add subtitles to a video which
is already encoded to Theora, as it will not transcode the video again.

=== DVD subtitles ===

DVD subtitles are not text, but images. Thoggen, a DVD ripper program,
can convert these subtitles to Kate streams (at the time of writing,
Thoggen and GStreamer have not applied the necessary patches for this
to be possible out of the box, so patching them will be required).

When configuring how to rip DVD tracks, any subtitles will be detected
by Thoggen, and selecting them in the GUI will cause them to be saved as
Kate tracks along with the movie.

=== Song lyrics ===

Kate streams carrying song lyrics can be embedded in an Ogg file. The
oggenc Vorbis encoding tool from the Xiph.Org Vorbis tools allows lyrics
to be loaded from an LRC or SRT text file and converted to a Kate stream
multiplexed with the resulting Vorbis audio. At the time of writing,
the patch to oggenc was not applied yet, so it will have to be patched
manually with the patch found in the diffs directory.

oggenc -o song-with-lyrics.ogg --lyrics lyrics.lrc --lyrics-language en_US song.wav

So called 'enhanced LRC' files (containing extra karaoke timing information)
are supported, and a simple karaoke color change scheme will be saved
out for these files. For more complex karaoke effects (such as more
complex style changes, or sprite animation), kateenc should be used with
a Kate description file to create a separate Kate stream, which can then
be merged with a Vorbis only song with oggz-tools:

oggenc -o song.ogg song.wav
kateenc -t kate -c LRC -l en_US -o lyrics.ogg lyrics-with-karaoke.kate
oggz merge -o song-with-karaoke.ogg lyrics.ogg song.ogg

This latter method may also be used if you already have an encoded Vorbis song
with no lyrics, and just want to add the lyrics without reencoding.

=== Metadata ===

Metadata can be attached to events, or to styles, bitmaps, regions, etc.
Metadata are free form tag/value pairs, and can be used to enrich their
attached data with extra information. However, how this information is
interpreted is up to the application layer.

It is worth noting that an event may not have attached text, so it is
possible to create an empty timed event with attached metadata.

For instance, let's say we have a documentary, with footage from various
places, as well as short interviews, and we want two things:
* tag footage with metadata about the location and date that footage was shot
* subtitle the interviews and tag those subtitles with information about the speaker

You can then create an empty Kate event for each footage part, synchronized
with the footage, and attach a new metadata item called GEO_LOCATION, filled
with latitude and longitude of the place the footage was shot at.
Similarly, for each subtitle event, a metadata item called SPEAKER can be
attached.

An empty event to tag 4:20 of footage shot in Tokyo on 2011/08/12, and
inserted at 18:30 in the documentary, could look like:

event {
00:18:30,000 --> 00:22:50,000
meta "GEO_LOCATION" = "35.42; 139.42"
meta "DATE" = "2011-08-12"
}

Here's an example for a line spoken by Dr Joe Bloggs at 18:30 into the documentary:

event {
00:18:30,000 --> 00:18:32,000
"Notice how the subtitles for my words have metadata attached to them"
meta "SPEAKER" = "Dr Joe Bloggs"
meta "URL" = "http://www.example.com/biography?name=Joe+Bloggs"
}

Notice how another metadata item, URL, is also present. The application
will have to be aware of those metadata items in order to do something with
them, though. Since they are free form, it is up to you to decide what
metadata you want, and how to make use of it.

Note that metadata may be attached to other objects, such as regions.
This way, you can for example create a region tagged with a name, and
track a person's movements with that region. Or you can tag a bitmap
with a copyright and a URL to a larger version of the image.
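
If such events are generated by a script rather than written by hand, a small helper can take care of the timing arithmetic. The sketch below only reproduces the event layout shown in the examples above; the function names are hypothetical, and it refuses values containing double quotes since escaping is not covered here.

```python
def format_timestamp(seconds):
    """Format a time in seconds as HH:MM:SS,mmm."""
    total_ms = round(seconds * 1000)
    hours, rem = divmod(total_ms, 3600000)
    minutes, rem = divmod(rem, 60000)
    secs, ms = divmod(rem, 1000)
    return "{:02d}:{:02d}:{:02d},{:03d}".format(hours, minutes, secs, ms)


def quote(value):
    """Wrap a string in double quotes (no escaping in this sketch)."""
    if '"' in value or "\n" in value:
        raise ValueError("escaping is not covered by this sketch")
    return '"' + value + '"'


def make_event(start_s, duration_s, text=None, meta=None):
    """Return an event block with optional text and metadata items."""
    lines = ["event {"]
    lines.append("{} --> {}".format(format_timestamp(start_s),
                                    format_timestamp(start_s + duration_s)))
    if text is not None:
        lines.append(quote(text))
    for tag, value in (meta or {}).items():
        lines.append("meta {} = {}".format(quote(tag), quote(value)))
    lines.append("}")
    return "\n".join(lines)
```

Calling <code>make_event(18 * 60 + 30, 4 * 60 + 20, meta={"GEO_LOCATION": "35.42; 139.42", "DATE": "2011-08-12"})</code> yields the empty Tokyo footage event shown earlier.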

=== Changing a Kate stream embedded in an Ogg stream ===

If you need to change a Kate stream already embedded in an Ogg stream (eg, you have a movie with subtitles, and you want to fix a spelling mistake, or want to bring one of the subtitles forward in time, etc), you can do this easily with KateDJ, a tool that will extract Kate streams, decode them to a temporary location, and rebuild the original stream after you've made whatever changes you want.

KateDJ (included with the libkate distribution) is a GUI program using wxPython, a Python module for the wxWidgets GUI library, and the oggz tools (both need to be installed separately if they are not already present).

The procedure consists of:

* Run KateDJ
* Click 'Load Ogg stream' and select the file to load
* Click 'Demux file' to decode Kate streams in a temporary location
* Edit the Kate streams (a message box tells you where they are placed)
* When done, click 'Remux file from parts'
* If any errors are reported, continue editing until the remux step succeeds

== Frequently Asked Questions ==

=== Does libkate work on other platforms than Linux ? ===

Yes, libkate is not Linux specific in any way. It optionally relies on libogg
and libpng, two libraries widely ported to various platforms.
It has been reported to work on Windows and MacOS X as well as UNIX platforms.

However, libtiger, a rendering library for Kate streams, relies on [http://www.pango.org/ Pango] and [https://www.cairographics.org/ Cairo],
which are not easy to build on Windows, though they can be.
The Tiger renderer is however completely separate from libkate, and is not needed
for full encoding and decoding of Kate streams.

=== Where can I find some example files ? ===

The libkate distribution can generate various examples, but already built files
can be found here:
[http://people.xiph.org/~oggk/elephants_dream/elephantsdream-with-subtitles.ogg]
[http://stallman.org/fry/Stephen_Fry-Happy_Birthday_GNU-nq_600px_425kbit.ogv]

These files use raw text only.

[[Category:Ogg Mappings]]

OpusContributing

2018-09-26T03:41:30Z

MarkH: /* How to report bugs */ trac issues have been migrated to gitlab.xiph.org

== Community ==

Development discussions and questions take place on the Xiph.Org Opus mailing list ([mailto:opus@xiph.org opus@xiph.org]).

Discussions related to the IETF process happen on the IETF codec working group mailing list ([mailto:codec@ietf.org codec@ietf.org]).

For archives of recent discussions, try:

* Xiph.Org [http://lists.xiph.org/pipermail/opus/ Opus mailing list archives]
* IETF [https://www.ietf.org/mail-archive/web/codec/current/maillist.html codec mailing list archives]

Informal development chat and support happens in #opus on irc.freenode.net. You can join the chat room through a '''[https://webchat.freenode.net/ web interface]''' if you don't have an IRC client.

== How To Contribute ==

There are many ways to contribute to Opus development:

* Reporting and fixing bugs
* Improving tools
* Improving the testing framework - see the '''[https://github.com/xiph/opus/tree/master/tests test code]''' and the '''[[Opus_testvectors|Opus Test Vectors]]''' page
* Optimizations (assembly/intrinsics)
* Encoding quality improvements - see the '''[[Opus tuning]]''' page for suggestions
* Mapping to new containers - see Opus within '''[[MatroskaOpus|Matroska]]''', '''[[Mp4Opus|MP4]]''', '''[[OpusTS|Mpeg-TS]]''' and '''[[OggOpus|Ogg]]''' containers (Ogg spec is on Standards Track as '''[https://tools.ietf.org/html/rfc7845 RFC 7845]''').

It is generally advisable to contact the developers on the mailing list or the IRC channel before taking on new work on Opus,
to avoid duplicating work and to make sure you're doing things the right way from the start.

== How to report bugs ==

The '''[https://gitlab.xiph.org/ Xiph.Org GitLab]''' may be used to report bugs in [https://gitlab.xiph.org/xiph/opus/issues opus], [https://gitlab.xiph.org/xiph/opusfile/issues opusfile], [https://gitlab.xiph.org/xiph/libopusenc/issues libopusenc], or [https://gitlab.xiph.org/xiph/opus-tools/issues opus-tools]. Please also notify developers on the [mailto:opus@xiph.org mailing list].

For sensitive (security-related) bugs, please '''[https://www.opus-codec.org/contact/ contact the developers]''' directly.

== How to submit a patch ==

If you have a patch you would like to contribute, just send it to the mailing list.
We can also take [https://github.com/xiph/opus/ GitHub] pull requests,
but please send a note to the mailing list since the GitHub Opus repository is only a mirror.

== Coding style ==

Opus is the result of merging three different codebases and therefore does not
have a consistent coding style. For example, the SILK code uses 4-space
indentation, while the rest of the code is mostly 3-space, except the entropy
coder, which is 2-space.

The general rule is that you should follow the style of the code you're modifying.
* '''Do not''' reformat the code in your "functional change" patches, as it mostly makes it harder to review changes.
* '''Do''' send separate "format-fixing" patches, making sure they don't change Opus' functionality at all.

== Testing ==

For any new feature, it is strongly suggested to also include tests for the new code to make sure it works and keeps working in the future.

== Language ==

Opus only requires a C89 compiler, so any use of C99 and later constructs has
to be optional (e.g. OPUS_INLINE). This is also why we do not use C++-style
// comments.

To reduce the risk of exploitable memory errors, we do not use any function
pointers in the code unless they are declared as static const. We also
have "flat" objects, which can be copied using a "shallow copy", so do not add
pointers to non-static data in the data structures.

Opus Recommended Settings

2016-11-08T15:08:25Z

MarkH: /* Recommended Bitrates */ FLAC supports only 1-8 channels

= Recommended Bitrates =
Depending on the kind of audio you want to encode with Opus, you may want to use different bitrate (quality) settings.

The settings in the table below are meant to '''start you off''' with a decent tradeoff between '''good quality''' and '''small file size''' (or '''bitrate usage''', if you're streaming).

You should test the suggested bitrate by actually '''listening''' to your encoded audio and then tweaking the bitrate:
* '''down''' if you think the quality is good, but the file size (or bitrate) is too big
* '''up''' if you think the quality is bad, and you can afford having bigger files (or a larger streaming bitrate)

{| class="wikitable" style="text-align:center"
|-
!Use Case
!Channels
!Bitrate (Kb/s)
!Notes
|-
|Low bandwidth HF/VHF digital radio
|1 (mono)
|use '''[http://www.rowetel.com/?page_id=452 Codec 2]'''
|Opus only supports bitrates down to 6 Kb/s. Codec 2 handles ultra low bitrate speech from 0.7 to 3.2 Kb/s.
|-
|VoIP
|1
|10-24
|10 Kb/s will deliver narrowband most of the time, 24 Kb/s should give fullband. More details in '''[[Opus_Recommended_Settings#Bandwidth_Transition_Thresholds|the relevant table]]''' further down this page.
|-
|rowspan="2"|Audiobooks / Podcasts
|1
|24
|bitrates from here on up tend to deliver fullband audio.
|-
|2 (stereo)
|32
|
|-
|Music Streaming / Radio
|2
|64-96
|Opus has better quality than MP3, AAC and [[Vorbis]] at these rates. (test results '''[http://listening-tests.hydrogenaud.io/igorc/results.html here]''' and '''[http://listening-test.coresv.net/results.htm here]''')
|-
|rowspan="3"|Music Storage
|2
|96-128
|Opus at 128 Kb/s (VBR) is pretty much '''[https://en.wikipedia.org/wiki/Transparency_(data_compression) transparent]'''
|-
|6 (5.1 surround)
|128-256
|rowspan="2"|for surround sound, Opus uses '''[https://xiph.org/~xiphmont/demo/opus/demo3.shtml surround-sound bitrate allocation]'''
|-
|8 (7.1 surround)
|256-450
|-
|Music Archiving
|1-8
|use '''[[FLAC]]'''
|if you are archiving audio, use a '''[https://en.wikipedia.org/wiki/Audio_file_format#Lossless_compressed_audio_format lossless audio format]''' to prevent '''[https://en.wikipedia.org/wiki/Generation_loss generation loss]'''
|}
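
To get a feel for what a given bitrate means in terms of file size, a rough estimate is bitrate times duration. The helper below is a sketch with a made-up name; it assumes that Kb/s means 1000 bits per second, treats a VBR target as an average, and ignores container overhead.

```python
def estimated_size_bytes(bitrate_kbps, duration_s):
    """Rough encoded size in bytes for an average bitrate and a duration."""
    return bitrate_kbps * 1000 * duration_s / 8
```

For example, one hour of music streaming at 64 Kb/s comes to about 28.8 MB, and a 4 minute song at 96 Kb/s to about 2.9 MB.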

= Technical Details =
For the more technical Opus users, here are some details to help you fine-tune your decision on which bitrate best fits your needs.

== Mono or Stereo ==
Opus tends to start '''downmixing stereo inputs to mono''' from roughly '''24 Kb/s and lower'''.
You can check the details in the '''[https://github.com/xiph/opus/blob/master/src/opus_encoder.c#L148 opus_encoder.c]''' source file.

You can force downmixing at any bitrate by using the following command-line parameters:

<code>--downmix-mono</code> - downmixes all input channels to mono

<code>--downmix-stereo</code> - downmixes all input channels to stereo (if there are more than 2 input channels, e.g. surround sound)

== Bandwidth Transition Thresholds ==
The following table shows rough bitrates that you might want to use to encode audio that has '''[https://tools.ietf.org/html/rfc6716#section-2 limited frequency bandwidths]'''.
This could be useful if your audio has already been bandpassed, or should go through a bandpass filter (e.g. VoIP speech).

{| class="wikitable" style="text-align:center"
|-
|rowspan="2"|(bitrates in Kb/s)
!colspan="2"|Mono
!colspan="2"|Stereo
|-
!Voice
!Music
!Voice
!Music
|-
!<abbr title="(3-4000 Hz)">NarrowBand</abbr>
|12
|15
|?
|?
|-
!<abbr title="(3-6000 Hz)">MediumBand</abbr>
|15
|18-22
|?
|?
|-
!<abbr title="(3-8000 Hz)">WideBand</abbr>
|16-20
|22-28
|?
|?
|-
!<abbr title="(3-12000 Hz)">SuperWideBand</abbr>
|24-28
|28-32
|?
|?
|-
!<abbr title="(3-20000 Hz)">FullBand</abbr>
|28-40
|32-64
|32-64
|64-128
|}

The details of Opus' bandpass thresholds can be found in the '''[https://github.com/xiph/opus/blob/master/src/opus_encoder.c#L121 opus_encoder.c]''' source file.

The '''[http://wiki.hydrogenaud.io/index.php?title=Opus HydrogenAudio]''' wiki also has some great information on Opus and its usage.

== Framesize Tweaking ==
Opus can encode frames of '''2.5''', '''5''', '''10''', '''20''', '''40''', or '''60 ms'''. It can also combine multiple frames into packets of '''up to 120 ms'''.

Opus uses a '''20 ms''' frame size '''[https://tools.ietf.org/html/rfc6716#section-2.1.4 by default]''', as it gives a decent mix of low latency and good quality.

For real-time applications, sending fewer packets per second reduces the overall bitrate, since it reduces the overhead from '''[https://en.wikipedia.org/wiki/IPv6_packet#Fixed_header IP]''', '''[https://en.wikipedia.org/wiki/User_Datagram_Protocol#Packet_structure UDP]''', and '''[https://en.wikipedia.org/wiki/Real-time_Transport_Protocol#Packet_header RTP headers]'''.
However, it increases latency and sensitivity to packet losses, as losing one packet constitutes a loss of a bigger chunk of audio.
Increasing the frame duration also slightly improves coding efficiency, but the gain becomes small for frame sizes above 20 ms.

For these reasons, the default 20 ms frames are a good choice for most applications.
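
The header overhead can be estimated with a short calculation. This is a sketch that assumes one packet per frame duration and minimal headers: 40 bytes for the fixed IPv6 header (20 for IPv4 without options), 8 bytes for UDP and 12 bytes for RTP without CSRC entries or extensions. The function name is made up for the example.

```python
IPV6_HEADER_BYTES = 40  # fixed IPv6 header
IPV4_HEADER_BYTES = 20  # IPv4 header without options
UDP_HEADER_BYTES = 8
RTP_HEADER_BYTES = 12   # no CSRC entries, no header extension


def header_overhead_kbps(packet_ms, ip_header_bytes=IPV6_HEADER_BYTES):
    """Bitrate in Kb/s spent on IP, UDP and RTP headers for a packet duration."""
    if not 0 < packet_ms <= 120:
        raise ValueError("Opus packets hold at most 120 ms of audio")
    header_bits = (ip_header_bytes + UDP_HEADER_BYTES + RTP_HEADER_BYTES) * 8
    return header_bits / packet_ms  # bits per millisecond equals Kb/s
```

With these assumptions, headers cost 24 Kb/s at the default 20 ms over IPv6, 8 Kb/s at 60 ms and 192 Kb/s at 2.5 ms, which is why very short frames only make sense when latency matters more than bandwidth.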

== Trading Coding Efficiency with CPU Time ==
The Opus encoder uses its maximum algorithmic '''complexity''' setting of '''10''' '''[https://tools.ietf.org/html/rfc6716#section-2.1.5 by default]'''. This means that it does not hesitate to use CPU to give you the best quality encoding at a given bitrate.

If the CPU usage is too high for the system you are using Opus on, you can try a lower complexity setting. The allowed values span from '''10''' (highest CPU usage and quality) down to '''0''' (lowest CPU usage and quality).
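
When scripting encodes, the settings discussed in this section can be gathered into one opusenc command line. The helper below is a sketch with a hypothetical name: <code>--downmix-mono</code> and <code>--downmix-stereo</code> are the flags mentioned above, while <code>--bitrate</code>, <code>--comp</code> and <code>--framesize</code> are assumed from the opus-tools documentation, so check <code>opusenc --help</code> for your version.

```python
def opusenc_args(infile, outfile, bitrate_kbps, complexity=10,
                 framesize_ms=20, downmix=None):
    """Build an argument list for opusenc (does not run anything)."""
    if complexity not in range(0, 11):
        raise ValueError("complexity must be an integer from 0 to 10")
    if framesize_ms not in (2.5, 5, 10, 20, 40, 60):
        raise ValueError("frame size must be 2.5, 5, 10, 20, 40 or 60 ms")
    if downmix not in (None, "mono", "stereo"):
        raise ValueError("downmix must be None, 'mono' or 'stereo'")
    args = ["opusenc",
            "--bitrate", str(bitrate_kbps),
            "--comp", str(complexity),
            "--framesize", format(framesize_ms, "g")]
    if downmix:
        args.append("--downmix-" + downmix)
    args += [infile, outfile]
    return args
```

The resulting list can be passed to <code>subprocess.run</code>, for instance to encode a podcast at 24 Kb/s in mono with a lower complexity on a slow machine.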

[[Category:Opus]]

Opus Recommended Settings

2016-11-08T15:03:37Z

MarkH: /* Recommended Bitrates */ fix Codec2 URL, and clarify: not all ham radio uses want ultra low bitrate

= Recommended Bitrates =
Depending on the kind of audio you want to encode with Opus, you may want to use different bitrate (quality) settings.

The settings in the table below are meant to '''start you off''' with a decent tradeoff between '''good quality''' and '''small file size''' (or '''bitrate usage''', if you're streaming).

You should test the suggested bitrate by actually '''listening''' to your encoded audio and then tweaking the bitrate:
* '''down''' if you think the quality is good, but the file size (or bitrate) is too big
* '''up''' if you think the quality is bad, and you can afford having bigger files (or a larger streaming bitrate)

{| class="wikitable" style="text-align:center"
|-
!Use Case
!Channels
!Bitrate (Kb/s)
!Notes
|-
|Low bandwidth HF/VHF digital radio
|1 (mono)
|use '''[http://www.rowetel.com/?page_id=452 Codec 2]'''
|Opus only supports bitrates down to 6 Kb/s. Codec 2 handles ultra low bitrate speech from 0.7 to 3.2 Kb/s.
|-
|VoIP
|1
|10-24
|10 Kb/s will deliver narrowband most of the time, 24 Kb/s should give fullband. More details in '''[[Opus_Recommended_Settings#Bandwidth_Transition_Thresholds|the relevant table]]''' further down this page.
|-
|rowspan="2"|Audiobooks / Podcasts
|1
|24
|bitrates from here on up tend to deliver fullband audio.
|-
|2 (stereo)
|32
|
|-
|Music Streaming / Radio
|2
|64-96
|Opus has better quality than MP3, AAC and [[Vorbis]] at these rates. (test results '''[http://listening-tests.hydrogenaud.io/igorc/results.html here]''' and '''[http://listening-test.coresv.net/results.htm here]''')
|-
|rowspan="3"|Music Storage
|2
|96-128
|Opus at 128 Kb/s (VBR) is pretty much '''[https://en.wikipedia.org/wiki/Transparency_(data_compression) transparent]'''
|-
|6 (5.1 surround)
|128-256
|rowspan="2"|for surround sound, Opus uses '''[https://xiph.org/~xiphmont/demo/opus/demo3.shtml surround-sound bitrate allocation]'''
|-
|8 (7.1 surround)
|256-450
|-
|Music Archiving
|any
|use '''[[FLAC]]'''
|if you are archiving audio, use a '''[https://en.wikipedia.org/wiki/Audio_file_format#Lossless_compressed_audio_format lossless audio format]''' to prevent '''[https://en.wikipedia.org/wiki/Generation_loss generation loss]'''
|}

= Technical Details =
For the more technical Opus users, here are some details to help you fine-tune your decision on which bitrate best fits your needs.

== Mono or Stereo ==
Opus tends to start '''downmixing stereo inputs to mono''' from roughly '''24 Kb/s and lower'''.
You can check the details in the '''[https://github.com/xiph/opus/blob/master/src/opus_encoder.c#L148 opus_encoder.c]''' source file.

You can force downmixing at any bitrate by using the following command-line parameters:

<code>--downmix-mono</code> - downmixes all input channels to mono

<code>--downmix-stereo</code> - downmixes all input channels to stereo (if there are more than 2 input channels, e.g. surround sound)

== Bandwidth Transition Thresholds ==
The following table shows rough bitrates that you might want to use to encode audio that has '''[https://tools.ietf.org/html/rfc6716#section-2 limited frequency bandwidths]'''.
This could be useful if your audio has already been bandpassed, or should go through a bandpass filter (e.g. VoIP speech).

{| class="wikitable" style="text-align:center"
|-
|rowspan="2"|(bitrates in Kb/s)
!colspan="2"|Mono
!colspan="2"|Stereo
|-
!Voice
!Music
!Voice
!Music
|-
!<abbr title="(3-4000 Hz)">NarrowBand</abbr>
|12
|15
|?
|?
|-
!<abbr title="(3-6000 Hz)">MediumBand</abbr>
|15
|18-22
|?
|?
|-
!<abbr title="(3-8000 Hz)">WideBand</abbr>
|16-20
|22-28
|?
|?
|-
!<abbr title="(3-12000 Hz)">SuperWideBand</abbr>
|24-28
|28-32
|?
|?
|-
!<abbr title="(3-20000 Hz)">FullBand</abbr>
|28-40
|32-64
|32-64
|64-128
|}

The details of Opus' bandpass thresholds can be found in the '''[https://github.com/xiph/opus/blob/master/src/opus_encoder.c#L121 opus_encoder.c]''' source file.

The '''[http://wiki.hydrogenaud.io/index.php?title=Opus HydrogenAudio]''' wiki also has some great information on Opus and its usage.

== Framesize Tweaking ==
Opus can encode frames of '''2.5''', '''5''', '''10''', '''20''', '''40''', or '''60 ms'''. It can also combine multiple frames into packets of '''up to 120 ms'''.

Opus uses a '''20 ms''' frame size '''[https://tools.ietf.org/html/rfc6716#section-2.1.4 by default]''', as it gives a decent mix of low latency and good quality.

For real-time applications, sending fewer packets per second reduces the overall bitrate, since it reduces the overhead from '''[https://en.wikipedia.org/wiki/IPv6_packet#Fixed_header IP]''', '''[https://en.wikipedia.org/wiki/User_Datagram_Protocol#Packet_structure UDP]''', and '''[https://en.wikipedia.org/wiki/Real-time_Transport_Protocol#Packet_header RTP headers]'''.
However, it increases latency and sensitivity to packet losses, as losing one packet constitutes a loss of a bigger chunk of audio.
Increasing the frame duration also slightly improves coding efficiency, but the gain becomes small for frame sizes above 20 ms.

For these reasons, the default 20 ms frames are a good choice for most applications.
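
To get a feel for the header overhead mentioned above, here is a rough calculation. It assumes one Opus frame per packet, an IPv4 header without options (20 bytes) or the IPv6 fixed header (40 bytes), an 8-byte UDP header and a 12-byte RTP header without extensions; lower-layer framing (e.g. Ethernet) is ignored.

```python
IPV4_HEADER = 20  # bytes, no options
IPV6_HEADER = 40  # bytes, fixed header only
UDP_HEADER = 8    # bytes
RTP_HEADER = 12   # bytes, no CSRC entries or header extensions


def overhead_kbps(frame_ms, ip_header=IPV4_HEADER):
    """Header overhead in kb/s when each packet carries one frame of frame_ms."""
    packets_per_second = 1000 / frame_ms
    header_bytes = ip_header + UDP_HEADER + RTP_HEADER
    return header_bytes * 8 * packets_per_second / 1000


for frame_ms in (2.5, 5, 10, 20, 40, 60):
    print(f"{frame_ms:>4} ms frames: {overhead_kbps(frame_ms):6.1f} kb/s of IPv4/UDP/RTP headers")
```

Under these assumptions the headers alone cost 16 kb/s at 20 ms, which is more than a low-bitrate VoIP stream itself, and going to 60 ms cuts that to about 5.3 kb/s at the price of the extra latency described above.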

== Trading Coding Efficiency with CPU Time ==
The Opus encoder uses its maximum algorithmic '''complexity''' setting of '''10''' '''[https://tools.ietf.org/html/rfc6716#section-2.1.5 by default]'''. This means that it does not hesitate to use CPU to give you the best quality encoding at a given bitrate.

If the CPU usage is too high for the system you are using Opus on, you can try a lower complexity setting. The allowed values span from '''10''' (highest CPU usage and quality) down to '''0''' (lowest CPU usage and quality).

[[Category:Opus]]

Opus Recommended Settings

2016-11-08T14:44:19Z

MarkH: /* Trading Coding Efficiency with CPU Time */ not really lots, try video encoding for that!

= Recommended Bitrates =
Depending on what kinds of sounds you want to encode with Opus, you should use different bitrate (quality) settings.

The settings in the table below are meant to '''start you off''' with a decent tradeoff between '''good quality''' and '''small filesize''' (or '''bitrate usage''', if you're streaming).

You should test the suggested bitrate by actually '''listening''' to your encoded audio and then tweaking the bitrate:
* '''down''' if you think the quality is good, but the filesize (or bitrate) is too big
* '''up''' if you think the quality is bad, and you can afford having bigger files (or a larger streaming bitrate)

{| class="wikitable" style="text-align:center"
|-
!Use Case
!Channels
!Bitrate (Kb/s)
!Notes
|-
|Ham radio
|1 (mono)
|use '''[http://www.rowetel.com/blog/?page_id=452 Codec 2]'''
|Opus only supports bitrates down to 6 Kb/s. Codec 2 handles speech from 0.7 to 3.2 Kb/s.
|-
|VoIP
|1
|10-24
|10 Kb/s will deliver narrowband most of the time, 24 Kb/s should give fullband. More details in '''[[Opus_Recommended_Settings#Bandwidth_Transition_Thresholds|the relevant table]]''' further down this page.
|-
|rowspan="2"|Audiobooks / Podcasts
|1
|24
|bitrates from here on up tend to deliver fullband audio.
|-
|2 (stereo)
|32
|
|-
|Music Streaming / Radio
|2
|64-96
|Opus has better quality than MP3, AAC and [[Vorbis]] at these rates. (test results '''[http://listening-tests.hydrogenaud.io/igorc/results.html here]''' and '''[http://listening-test.coresv.net/results.htm here]''')
|-
|rowspan="3"|Music Storage
|2
|96-128
|Opus at 128 Kb/s (VBR) is pretty much '''[https://en.wikipedia.org/wiki/Transparency_(data_compression) transparent]'''
|-
|6 (5.1 surround)
|128-256
|rowspan="2"|for surround sound, Opus uses '''[https://xiph.org/~xiphmont/demo/opus/demo3.shtml surround-sound bitrate allocation]'''
|-
|8 (7.1 surround)
|256-450
|-
|Music Archiving
|any
|use '''[[FLAC]]'''
|if you are archiving audio, use a '''[https://en.wikipedia.org/wiki/Audio_file_format#Lossless_compressed_audio_format lossless audio format]''' to prevent '''[https://en.wikipedia.org/wiki/Generation_loss generation loss]'''
|}

= Technical Details =
For the more technical Opus users, here are some details to help you fine-tune your decision on which bitrate best fits your needs.

== Mono or Stereo ==
Opus tends to start '''downmixing stereo inputs to mono''' from roughly '''24 Kb/s and lower'''.
You can check the details in the '''[https://github.com/xiph/opus/blob/master/src/opus_encoder.c#L148 opus_encoder.c]''' source file.

You can force downmixing at any bitrate by using the following command-line parameters:

<code>--downmix-mono</code> - downmixes all input channels to mono

<code>--downmix-stereo</code> - downmixes all input channels to stereo (if there are more than 2 input channels, e.g. surround sound)

== Bandwidth Transition Thresholds ==
The following table shows rough bitrates that you might want to use to encode audio that has '''[https://tools.ietf.org/html/rfc6716#section-2 limited frequency bandwidths]'''.
This could be useful if your audio has already been bandpassed, or should go through a bandpass filter (e.g. VoIP speech).

{| class="wikitable" style="text-align:center"
|-
|rowspan="2"|(bitrates in Kb/s)
!colspan="2"|Mono
!colspan="2"|Stereo
|-
!Voice
!Music
!Voice
!Music
|-
!<abbr title="(3-4000 Hz)">NarrowBand</abbr>
|12
|15
|?
|?
|-
!<abbr title="(3-6000 Hz)">MediumBand</abbr>
|15
|18-22
|?
|?
|-
!<abbr title="(3-8000 Hz)">WideBand</abbr>
|16-20
|22-28
|?
|?
|-
!<abbr title="(3-12000 Hz)">SuperWideBand</abbr>
|24-28
|28-32
|?
|?
|-
!<abbr title="(3-20000 Hz)">FullBand</abbr>
|28-40
|32-64
|32-64
|64-128
|}

The details of Opus' bandpass thresholds can be found in the '''[https://github.com/xiph/opus/blob/master/src/opus_encoder.c#L121 opus_encoder.c]''' source file.

The '''[http://wiki.hydrogenaud.io/index.php?title=Opus HydrogenAudio]''' wiki also has some great information on Opus and its usage.

== Framesize Tweaking ==
Opus can encode frames of '''2.5''', '''5''', '''10''', '''20''', '''40''', or '''60 ms'''. It can also combine multiple frames into packets of '''up to 120 ms'''.

Opus uses a '''20 ms''' frame size '''[https://tools.ietf.org/html/rfc6716#section-2.1.4 by default]''', as it gives a decent mix of low latency and good quality.

For real-time applications, sending fewer packets per second reduces the overall bitrate, since it reduces the overhead from '''[https://en.wikipedia.org/wiki/IPv6_packet#Fixed_header IP]''', '''[https://en.wikipedia.org/wiki/User_Datagram_Protocol#Packet_structure UDP]''', and '''[https://en.wikipedia.org/wiki/Real-time_Transport_Protocol#Packet_header RTP headers]'''.
However, it increases latency and sensitivity to packet losses, as losing one packet constitutes a loss of a bigger chunk of audio.
Increasing the frame duration also slightly improves coding efficiency, but the gain becomes small for frame sizes above 20 ms.

For these reasons, the default 20 ms frames are a good choice for most applications.

== Trading Coding Efficiency with CPU Time ==
The Opus encoder uses its maximum algorithmic '''complexity''' setting of '''10''' '''[https://tools.ietf.org/html/rfc6716#section-2.1.5 by default]'''. This means that it does not hesitate to use CPU to give you the best quality encoding at a given bitrate.

If the CPU usage is too high for the system you are using Opus on, you can try a lower complexity setting. The allowed values span from '''10''' (highest CPU usage and quality) down to '''0''' (lowest CPU usage and quality).

[[Category:Opus]]

OpusTodo

2016-07-20T02:59:33Z

MarkH: update for 1.1.3

== For 1.2 ==
* Low bitrate quality improvements
* AVX optimizations
* Fix compilation as a single module for gecko

== Spec ==
* Matroska mapping. See [[MatroskaOpus]] and the Firefox/FFmpeg implementations.
* RTP payload format. Mono/stereo mapping is complete ([https://tools.ietf.org/html/rfc7587 RFC 7587]), no multichannel mapping yet.
* MP4 mapping. See the [https://opus-codec.org/docs/opus_in_isobmff.html ISO Base Media File Format draft].

== Website ==
* De-uglify webpage - some suggestions:
** write about codecs obsoleted by Opus (Speex, CELT, Vorbis(?) and the proprietary ones)
** write about implementations (libopus encoder/decoder, libavcodec decoder, any others?)
** [https://en.wikipedia.org/wiki/Comparison_of_audio_coding_formats audio codec comparison table] (Opus, Vorbis, Speex, ..., MP5) of features (channels, freq, bits per sample, license, language (C89), integer impl. (Vorbis decoder only, Opus YES, ...))
** future use in video files (Theora? Dirac? WebM? other future codecs...)
** audio files for storage (like Vorbis, no raw Opus defined, only inside Ogg), ...
* Promotional material (some nice free/public-domain sounds/radio stations in Opus format)

== Other ==

* Oggz-validate (should also validate opus toc)

== Opus-tools ==
* Port opusdec to libopusfile/libopusurl.
* A simple real time streaming example tool
** Start with opusrtp.c in [https://git.xiph.org/?p=opus-tools.git opus-tools]
** Make <code>opusrtp rtp://example.com:5431/</code> listen to that host and port and mux packets from there. Generalize the pcap-based --sniff implementation
** Make sending similarly generic. Maybe just <code>opusrtp source.opus -o rtp://example.com:5431/</code> to send source.opus out to the destination?
** Make --sniff save one file per
** Implement DTLS-SRTP. See webrtc.
** audio capture/encode, decode/playback?
** Parse and act on sdp for convenience and testing.

* EBU R128/ReplayGain (half done; needs a gain tool)

== Surround work ==

* Apply spreading to energy masking
* More conservative energy masking (not just mean difference) and dynalloc
* Allow SILK/hybrid on center channel for voice?

== Psychoacoustic stuff ==

* Adaptive width narrowing and forced intensity stereo bands

== Optimisations ==

* Vectorising comb_filter()
* Use 16-bit mul plus shift in denormalise_bands()
* Optimise MDCT somehow

== Third-Party tool enhancements ==
* mutagen: [https://bitbucket.org/lazka/mutagen/issue/202/oggopus-support-in-place-rewrites-for support padding in comments header], [https://bitbucket.org/lazka/mutagen/issue/203/oggopus-allow-updating-the-output_gain allow updating output gain in ID header]

== Future work ==
* psymodel based VBR
* Remove copy in inverse MDCT
* Save some float<->int conversions
* Improvements to LP mode CBR (greg has some code)
* Unconstrained SILK VBR
* Better handling for the case where FEC has a different bandwidth than the current mode
* PLC transitions on unprotected SILK-SILK bandwidth changes?
* Figure out how to use speech/music detection optimally
** find optimal switching time (low energy/tonality)
* Improve variable frame size

[[Category:Opus]]

OpusFAQ

2016-05-01T13:06:50Z

MarkH: opus-codec.com→opus-codec.org, http→https

If you are looking for info not covered in this FAQ, try the [https://opus-codec.org main Opus website] or the pages included in the [[:Category:Opus|Opus category]] of this wiki.

[[Image:Opus logo trans.png|right]]

== General Questions ==

=== What is Opus? Who created it? ===

Opus is a totally open, royalty-free, highly versatile audio codec.

It is primarily designed for interactive speech and music transmission over the Internet, but is also applicable to storage and streaming applications. It incorporates technology from Skype's '''[https://en.wikipedia.org/wiki/SILK SILK]''' codec and Xiph.Org's '''[http://celt-codec.org/ CELT]''' codec. It has been standardized by the '''[https://www.ietf.org/ Internet Engineering Task Force]''' (IETF) as '''[https://tools.ietf.org/html/rfc6716 RFC 6716]'''.

Opus has been in development since early 2007. Programmers associated with '''[https://xiph.org/ Xiph.Org]''', '''[https://www.skype.com/ Skype]''' and several other organizations have contributed to its development and to the standardization process as part of the '''[https://datatracker.ietf.org/wg/codec/charter/ IETF's Codec Working Group]'''.

=== How does Opus compare to other codecs? ===

Opus is distinguished from most high quality formats (e.g. AAC, [[Vorbis]], MP3) by having '''low delay''' (5 to 66.5 ms) and from most low delay formats (e.g. G.711, GSM, Speex) by supporting '''high audio quality''' ([https://tools.ietf.org/html/rfc6716#section-2.1.1 details here]).

It meets or exceeds existing codecs' quality across a wide range of bitrates, and it operates at lower delay than virtually any existing compressed format.

Most importantly, the Opus format and its reference implementation are both available under '''[https://opus-codec.org/license/ liberal, royalty-free licenses]'''. 
This makes it:
* easy to adopt
* compatible with free software
* suitable for use as part of the basic infrastructure of the Internet

See the Opus '''[https://opus-codec.org/comparison comparison page]''' for more details.

=== Does Opus make all those other lossy codecs obsolete? ===

Theoretically, yes.

From a technical point of view (loss, delay, bitrates, ...) it should replace both [[Vorbis]] and [[Speex]], and the common proprietary codecs too.

=== Will Opus replace Vorbis in video files? ===

For Ogg [[Theora]] video files it can, but the overall size reduction will be minimal and it will break compatibility with existing players.

For WebM video files, the convention is to use the VP9 video codec when using Opus as an audio codec.

=== How do I use Opus? What programs support Opus? ===

Opus decoding support is now included in '''[http://caniuse.com/opus some Internet browsers]''' and '''[[OpusSupport|many applications]]''', including '''[https://www.mozilla.org/firefox Firefox]''', '''[https://www.foobar2000.org/ foobar2000]''' and '''[https://www.videolan.org/vlc/ VLC]''', as well as in frameworks such as '''[https://gstreamer.freedesktop.org/ GStreamer]''' and '''[https://ffmpeg.org/ FFmpeg]'''.

For now, the best way to '''encode''' Opus files is to use the '''opusenc''' command-line tool from the '''[https://opus-codec.org/downloads/ opus-tools package]'''.
If you want to encode many files at once (e.g. your music library), you can also try the '''[http://lamexp.sourceforge.net/ LameXP]''' converter.

For real-time applications, Opus support is available in '''[https://www.webrtc.org/ Google's WebRTC codebase]'''.

Opus is a relatively new codec: many more applications will support it in the near future.

=== Does Opus support higher sampling rates, such as 96 kHz or 192 kHz? ===

Yes and no.

Opus encoding tools like opusenc will happily encode input files that are sampled at 96 or 192 kHz.

However, files at these rates are internally '''converted to 48 kHz''' and then only frequencies '''up to 20 kHz''' are encoded.

The reason is simple: lossy codecs are designed to preserve audible details while discarding irrelevant information. Since the human ear can only hear up to 20 kHz at best (usually lower than that), frequency content above 20 kHz is the first thing to go.

See Monty's '''[https://people.xiph.org/~xiphmont/demo/neil-young.html article]''' for more details.

If you want a codec to handle higher sampling rates losslessly, use '''[[FLAC]]'''!

=== What are the licensing requirements? ===

The reference Opus source code is released under a three-clause BSD license, which is a very permissive Open Source license. Commercial use and distribution (including in proprietary software) is permitted, provided that some basic conditions specified in the license are met.

Opus is also covered by some patents, for which royalty-free usage rights are granted, under conditions that the authors believe are compatible with (hopefully) all open source licenses, including the GPL (v2 and v3).

See the '''[https://www.opus-codec.org/license/ Opus Licensing]''' page for details.

=== Why make Opus free? ===

On the Internet, protocol and codec standards are part of the common infrastructure everyone builds upon.

Most of the value of a high-quality standard is the innovation and inter-operation provided by the systems built on top of it. When a few parties have monopoly rights to monetize a standard, that infrastructure stops being so common and everyone else has more reason to use their own solution instead, increasing cost and reducing efficiency.

Imagine a road system where each type of car could only drive on its own manufacturer's pavement. We all benefit from living in a world where all the roads are connected.

This is why Opus, unlike many codecs, is free.

=== Is the SILK part of Opus compatible with the SILK implementation shipped in Skype? ===

No.

The SILK codec, as submitted by Skype to the IETF, was heavily modified as part of its integration within Opus. The modifications are significant enough that it is not possible to just write a "translator". Even sharing code between Opus and the "old SILK" would be highly complex.

=== Why not keep the SILK and CELT codecs separate? ===
Opus is more than just two independent codecs with a switch.

In addition to a [https://en.wikipedia.org/wiki/Linear_predictive_coding Linear Prediction] '''SILK mode''' and an [https://en.wikipedia.org/wiki/Modified_discrete_cosine_transform MDCT] '''CELT mode''' it has a '''hybrid mode''', where speech frequencies up to 8 kHz are encoded with LP while those between 8 and 20 kHz are encoded with MDCT. This is what allows Opus to have such high speech quality around 32 kbps.

Another advantage of the integration is the ability to switch between these 3 modes seamlessly, without any audible "glitches" and without any out-of-band signalling.

=== Now that Opus is standardized, will its development stop, or can it be further improved? ===
Yes, Opus '''can''' and '''should''' be improved, because unlike most ITU-T codecs, Opus is only defined in terms of its decoder.

The encoder can keep evolving as long as the bitstream it produces can be decoded by the reference decoder. This is what made it possible for modern MP3 encoders (e.g. '''[https://en.wikipedia.org/wiki/LAME LAME]''') to improve far beyond the original '''[https://en.wikipedia.org/wiki/L3enc L3enc]''' and '''dist10''' reference implementations.

Although it is unlikely that Opus encoders will see such a spectacular evolution, we certainly hope that future encoders will become much better than the reference encoder.

In fact, the 1.1 libopus release significantly improves on the reference encoder's quality. See '''[https://people.xiph.org/~xiphmont/demo/opus/demo3.shtml Monty's demo]''' for more details.

=== Will all future Opus releases comply with the [https://tools.ietf.org/html/rfc6716 Opus specification]? ===

Yes.

=== In what ways is Opus optimized for the Internet? ===

Opus has good packet loss robustness and concealment, but its optimisations go further.

One of the first things we've been asked when designing Opus was to make the rate '''really''' adaptable because we never know what kind of rates will be available. This not only meant having a wide range of bitrates, but also being able to vary in small increments.

This is why Opus scales from about '''6''' to '''512 kb/s''', in increments of '''0.4 kb/s''' (one byte with 20 ms frames). Opus can have '''more than 1200 possible bitrates''' while spending only '''11 bits''' signalling the bitrate because UDP already encodes the packet size.
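
The numbers above can be checked with a little arithmetic. This sketch takes the rounded endpoints quoted above (6 and 512 kb/s) at face value and assumes 20 ms frames throughout; the helper name is made up for illustration.

```python
FRAME_MS = 20
FRAMES_PER_SECOND = 1000 // FRAME_MS  # 50 frames per second


def bytes_per_frame(kbps):
    """Whole bytes available per 20 ms frame at the given bitrate."""
    return kbps * 1000 // (8 * FRAMES_PER_SECOND)


# Adding one byte to every frame adds 8 bits * 50 frames/s = 400 bit/s.
step_bps = 8 * FRAMES_PER_SECOND

smallest = bytes_per_frame(6)    # frame size at about 6 kb/s
largest = bytes_per_frame(512)   # frame size at about 512 kb/s
possible_rates = largest - smallest + 1

print(f"step: {step_bps / 1000} kb/s")
print(f"frame sizes: {smallest} to {largest} bytes, {possible_rates} distinct bitrates")
```

Each distinct packet size in bytes corresponds to one bitrate, which is why the count comes out above 1200.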

One last aspect is that Opus is simple to transport over RTP, as can be seen from the [https://tools.ietf.org/html/rfc7587 Opus RTP payload format]. For example, it's possible to decode RTP packets without having even seen the SDP or any out-of-band signalling.

=== What applications for Android can play Opus? ===

Right now there are just a few, but the list is growing fast. Please refer to [https://android.stackexchange.com/q/37970/7425 this question on android.stackexchange.com]. Feel free to suggest other applications.

=== When will the next version be released? ===

When it's done. Seriously, we do not know.

Opus is not a large project with a fixed release schedule.

That being said, our [https://www.opus-codec.org/downloads/ pre-releases] and even the [https://git.xiph.org/?p=opus.git git repository] are generally pretty stable and, given proper testing (which you should always do anyway), are safe to distribute.

Just be aware that the API of new features (that have never been included in a stable release) could potentially still change.

== Software Developers' Questions ==

=== On what platforms does Opus run? ===

The Opus code base is written in C89 and should run on the vast majority of recent (and not so recent) CPUs.

Some of the platforms '''[https://mf4.xiph.org/jenkins/view/opus/ on which Opus has been tested]''' include x86, x86-64, ARM, Itanium, Blackfin, and SPARC.

=== Is there a fixed-point implementation? ===

Yes.

The fixed-point and floating-point decoder and encoder implementations are part of the same code base.

The code defaults to float, so you need to configure with '''--enable-fixed-point''' (or define '''FIXED_POINT''' if not using the configure script) to build the code for fixed-point.

=== Which implementation should I use? ===

While the implementation in RFC 6716 is what ''defines'' the standard, it is likely not the best and most up-to-date implementation.

The [https://opus-codec.org/ Opus] website was set up for the purpose of continually improving the implementation (in terms of speed, encoding quality, device compatibility, etc.) while still conforming to the standard.

All Opus implementations are compatible by definition.

=== How is supporting Opus different from supporting Speex/G.711/MP3? ===

Opus has variable frame durations which can change on the fly, so an Opus decoder needs to be ready to accept packets with durations that are '''any multiple of 2.5 ms''' up to a '''maximum of 120 ms'''.

The Opus encoder and decoder do not need to have matched sampling rates or channel counts. It is recommended to always decode at the highest rate the hardware supports (e.g. 48 kHz stereo) so the user gets the full quality of whatever the far end is sending.
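
In practice this means the output buffer has to be sized for the longest possible packet. The sketch below works out the sample counts, assuming the decoder runs at 48 kHz as recommended above; the function name is made up for illustration and is not part of the libopus API.

```python
SAMPLE_RATE_HZ = 48000
SAMPLES_PER_UNIT = SAMPLE_RATE_HZ // 400  # samples in 2.5 ms = 120 at 48 kHz
MAX_UNITS = 48                            # 48 * 2.5 ms = 120 ms


def samples_per_channel(duration_ms):
    """Samples per channel for a packet of the given duration at 48 kHz."""
    units = duration_ms / 2.5
    if units != int(units) or not 1 <= units <= MAX_UNITS:
        raise ValueError("duration must be a multiple of 2.5 ms, at most 120 ms")
    return int(units) * SAMPLES_PER_UNIT


for ms in (2.5, 20, 60, 120):
    print(f"{ms:>5} ms -> {samples_per_channel(ms)} samples per channel")

# Worst case for an interleaved stereo buffer:
print("stereo buffer:", samples_per_channel(120) * 2, "samples")
```

Allocating for the 120 ms case once is simpler than resizing the buffer whenever the far end changes its frame size.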

=== My application doesn't work. Can anyone help me? ===

It's possible to get help, but before doing so, there are a few basic things to try:

* Implement your application with uncompressed audio instead of Opus. If it still doesn't work, then the problem isn't related to Opus.
* Read the [https://www.opus-codec.org/docs/ Opus documentation].
* Read the [https://git.xiph.org/?p=opus.git;a=blob;f=src/opus_demo.c opus_demo.c] source code to see how to use the encoder and decoder.

If you still can't solve the problem, the best option is to ask for help on the [http://lists.xiph.org/mailman/listinfo/opus mailing list] or on the '''#opus''' IRC channel on '''irc.freenode.net'''.

=== How do I report a bug? ===

If you think you have found a bug in Opus (and not in your application), please [https://trac.xiph.org/newticket?component=Opus file a bug report].

Please include a way for us to reproduce the problem. The best way to do this is to provide an input file, along with the opusenc/opusdec/opus_demo command line that causes the bug to occur.

If the bug cannot be triggered by the command line tools, please provide a simple patch or C file that can help reproduce it. Please also provide any other relevant information, such as OS, CPU, build options, etc.

Don't hesitate to also contact us on the [http://lists.xiph.org/mailman/listinfo/opus mailing list] or on [irc://irc.freenode.net/opus IRC].

=== What is Opus Custom? ===

Opus Custom is an '''optional''' part of the Opus standard that allows for sampling rates other than 8, 12, 16, 24, or 48 kHz and frame sizes other than multiples of 2.5 ms.

Opus Custom requires additional out-of-band signalling that Opus does not normally require and disables many of Opus' coding modes. Also, because it is an optional part of the specification, using Opus Custom may lead to compatibility problems.

For these reasons, its use is discouraged outside of very specific applications, for example:
* ultra low delay applications where synchronization with the soundcard buffer is important.
* low-power embedded applications where compatibility with others is not important.

For almost all other types of applications, Opus Custom should not be used.

=== How do I use 44.1 kHz or some other sampling rate not directly supported by Opus? ===

Tools which read or write Opus should inter-operate with other sampling rates by transparently performing sample rate conversion behind the scenes whenever necessary. In particular, software developers should not use Opus Custom for 44.1 kHz support, except in the very specific circumstances outlined above.

Note that it's generally preferable for a decoder to output at 48 kHz, even when you know the original input was 44.1 kHz. This is not only because you can skip resampling, but also because many cheaper audio interfaces have poor quality output at 44.1 kHz.

The '''[https://opus-codec.org/downloads/ opus-tools]''' package source code contains a small, high quality, high performance, BSD licensed resampler which can be used where resampling is required.
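
For planning buffer sizes around such a resampler, the conversion ratio between 44.1 kHz and 48 kHz is worth knowing. This is only an arithmetic sketch: the helper name is made up, and a real resampler may produce a few samples more or fewer because of its filter delay.

```python
from fractions import Fraction

# 48000 / 44100 reduces to 160 / 147
ratio = Fraction(48000, 44100)
print("ratio:", ratio)


def resampled_length(n_samples, src_rate=44100, dst_rate=48000):
    """Number of output samples, rounded up, for n_samples of input."""
    # Negating before and after floor division gives a ceiling division.
    return -(-n_samples * dst_rate // src_rate)


print(resampled_length(44100))  # one second of 44.1 kHz input
print(resampled_length(147))    # smallest block that converts exactly
```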

=== How is the bitrate setting used in VBR mode? ===

Variable bitrate (VBR) mode allows the bitrate to automatically vary over time based on the audio being encoded, in order to achieve a consistent quality.

The bitrate setting controls the desired quality, on a scale that is calibrated to closely approximate the average bitrate that would be obtained over a large and diverse collection of audio. The actual bitrate of any particular audio stream may be higher or lower than this average.

=== What frame size should I use? ===

A '''20 ms''' frame size works well for most applications.

Smaller frame sizes may be used to achieve lower latency, but have lower quality at a given bitrate.

Sizes greater than 20 ms increase latency and are generally beneficial only at fairly low bitrates, or when used to reduce external overhead (e.g. by reducing the number of packets that are sent).

=== Forward Error correction (FEC) doesn't appear to do anything! HELP! ===

The in-band FEC feature of Opus helps reduce the harm of packet loss by encoding some information about the prior packet.

In order to make use of in-band FEC the decoder must delay its output by at least one frame so that it can call the decoder with the decode_fec argument on the ''next'' frame in order to reconstruct the missed frame. This works best if it's integrated with a jitter buffer.

FEC is only used by the encoder under certain conditions:
* the feature must be enabled via the OPUS_SET_INBAND_FEC CTL
* the encoder must be told to expect loss via the OPUS_SET_PACKET_LOSS_PERC CTL
* the codec must be operated in any of the linear prediction or Hybrid modes

Frame durations shorter than 10 ms and very high bitrates will use the MDCT modes, where FEC is not available.

Even when FEC is not used, telling the encoder about the expected level of loss will help it make more intelligent decisions. By default the implementation assumes there is no loss.

=== I can't use malloc or much stack on my embedded platform. How do I make Opus work? ===

A normal build of libopus only uses <tt>malloc/free</tt> in the <tt>_create()</tt> and <tt>_destroy()</tt> calls, making it safe for realtime use as long as the codec state is pre-created.

To build Opus without the references to <tt>malloc/free</tt>, you must:

* use <tt>init()</tt> calls rather than <tt>create()</tt> calls in your application
* compile with <tt>CFLAGS="-DOVERRIDE_OPUS_ALLOC -DOVERRIDE_OPUS_FREE -D'opus_alloc(x)=NULL' -D'opus_free(x)=NULL' "</tt>.

If libopus is built with <tt>-DNONTHREADSAFE_PSEUDOSTACK</tt> (instead of <tt>VAR_ARRAYS</tt>, or <tt>USE_ALLOCA</tt>), it will use a user-provided block of heap instead of stack for many things, resulting in much lower stack usage. 
This makes the resulting library '''non-threadsafe''' and is '''not recommended''' on anything except limited embedded platforms.

=== How can I ensure that my software interoperates with other software implementing Opus? ===

For applications using Ogg files, there are some [https://people.xiph.org/~greg/opus_testvectors/ Ogg Opus testvectors] to test decoders and you can test encoders with opusdec. For RTP applications, the opusrtp tool can be useful.

In general, here's a list of specific issues to check:
* Can your application handle all frame sizes, including changing the frame size from frame to frame?
* Does your application properly react to lost packets by calling the decoder with a NULL packet?

=== What is the complexity of Opus? ===

The complexity of Opus varies by a large amount based on the settings used.

It depends on the mode, audio bandwidth, number of channels, and even a "complexity knob" that can trade complexity for quality. It will run easily on any recent PC or smartphone.

For slower embedded CPUs/DSPs, the amount of CPU required will vary depending on the configuration and the exact CPU, so you will need to experiment. Do not expect Opus to run quickly on really slow devices like 8-bit micro-controllers.

=== Opus is using too much CPU for my application. What can I do? ===

First, don't panic, and don't start writing assembly just yet.

It's possible that you're just not using the right set of options.

If you're targeting an embedded/mobile platform, chances are the fixed-point build will be faster, so make sure you're using '''--enable-fixed-point''' or defining '''FIXED_POINT''' in the build system.

Opus also has a complexity option that can trade quality for complexity. The default is highest quality and highest complexity. You can control this using '''OPUS_SET_COMPLEXITY()''' (see the '''[https://www.opus-codec.org/docs/ Documentation]''' for details).

If all else fails and you need to optimize the Opus code, see the next question.

=== I would like to optimize/improve/help with Opus. Where should I start? ===

Please '''[https://www.opus-codec.org/contact/ contact us]''' before you start, or at least before you get too far.

This will help coordinate the efforts made on Opus and reduce the probability of wasting your time on duplicated effort or going down the wrong path.

=== Does Opus have an echo canceller like Speex does? ===

Echo cancellation is completely independent from codecs.

You can use any echo canceller (including the one from libspeexdsp) along with Opus.

That being said, among the free acoustic echo cancellers (AEC) we're aware of, the best is probably the Google AEC from the [https://code.google.com/p/webrtc/ WebRTC codebase].

[[Category:Opus]]

OpusFAQ

2016-05-01T04:37:22Z

MarkH: http→https for sites where it functions correctly

If you are looking for info not covered in this FAQ, try the [http://opus-codec.org main Opus website] or the pages included in the [[:Category:Opus|Opus category]] of this wiki.

[[Image:Opus logo trans.png|right]]

== General Questions ==

=== What is Opus? Who created it? ===

Opus is a totally open, royalty-free, highly versatile audio codec.

It is primarily designed for interactive speech and music transmission over the Internet, but is also applicable to storage and streaming applications. It incorporates technology from Skype's '''[https://en.wikipedia.org/wiki/SILK SILK]''' codec and Xiph.Org's '''[http://celt-codec.org/ CELT]''' codec. It has been standardized by the '''[https://www.ietf.org/ Internet Engineering Task Force]''' (IETF) as '''[https://tools.ietf.org/html/rfc6716 RFC 6716]'''.

Opus has been in development since early 2007. Programmers associated with '''[https://xiph.org/ Xiph.Org]''', '''[https://www.skype.com/ Skype]''' and several other organizations have contributed to its development and to the standardization process as part of the '''[https://datatracker.ietf.org/wg/codec/charter/ IETF's Codec Working Group]'''.

=== How does Opus compare to other codecs? ===

Opus is distinguished from most high quality formats (eg: AAC, [[Vorbis]], MP3) by having '''low delay''' (5 ~ 66.5 ms) and distinguished from most low delay formats (eg: G.711, GSM, Speex) by supporting '''high audio quality''' ([https://tools.ietf.org/html/rfc6716#section-2.1.1 details here]).

It meets or exceeds existing codecs' quality across a wide range of bitrates, and it operates at lower delay than virtually any existing compressed format.

Most importantly, the Opus format and its reference implementation are both available under '''[http://opus-codec.com/license/ liberal, royalty-free licenses]'''. 
This makes it:
* easy to adopt
* compatible with free software
* suitable for use as part of the basic infrastructure of the Internet

See the Opus '''[http://opus-codec.org/comparison comparison page]''' for more details.

=== Does Opus make all those other lossy codecs obsolete? ===

Theoretically, yes.

From a technical point of view (loss, delay, bitrates, ...) it should replace both [[Vorbis]] and [[Speex]], and the common proprietary codecs too.

=== Will Opus replace Vorbis in video files? ===

For Ogg [[Theora]] video files, it can, but the overall size reduction will be minimal and it will break compatibility with existing players.

For WebM video files, the convention is to use the VP9 video codec when using Opus as an audio codec.

=== How do I use Opus? What programs support Opus? ===

Opus decoding support is now included in '''[http://caniuse.com/opus some Internet browsers]''' and '''[[OpusSupport|many applications]]''', including '''[https://www.mozilla.org/firefox Firefox]''', '''[https://www.foobar2000.org/ foobar2000]''' and '''[https://www.videolan.org/vlc/ VLC]''', as well as in frameworks such as '''[https://gstreamer.freedesktop.org/ GStreamer]''' and '''[https://ffmpeg.org/ FFmpeg]'''.

For now, the best way to '''encode''' Opus files is to use the '''opusenc''' command-line tool from the '''[http://opus-codec.com/downloads/ opus-tools package]'''.
If you want to encode many files at once (e.g. your music library), you can also try the '''[http://lamexp.sourceforge.net/ LameXP]''' converter.

For real-time applications, Opus support is available in '''[https://www.webrtc.org/ Google's WebRTC codebase]'''.

Opus is a relatively new codec: many more applications will support it in the near future.

=== Does Opus support higher sampling rates, such as 96 kHz or 192 kHz? ===

Yes and no.

Opus encoding tools like opusenc will happily encode input files that are sampled at 96 or 192 kHz.

However, files at these rates are internally '''converted to 48 kHz''' and then only frequencies '''up to 20 kHz''' are encoded.

The reason is simple: lossy codecs are designed to preserve audible details while discarding irrelevant information. Since the human ear can only hear up to 20 kHz at best (usually lower than that), frequency content above 20 kHz is the first thing to go.

See Monty's '''[https://people.xiph.org/~xiphmont/demo/neil-young.html article]''' for more details.

If you want a codec to handle higher sampling rates losslessly, use '''[[FLAC]]'''!

=== What are the licensing requirements? ===

The reference Opus source code is released under a three-clause BSD license, which is a very permissive Open Source license. Commercial use and distribution (including in proprietary software) is permitted, provided that some basic conditions specified in the license are met.

Opus is also covered by some patents, for which royalty-free usage rights are granted, under conditions that the authors believe are compatible with (hopefully) all open source licenses, including the GPL (v2 and v3).

See the '''[http://www.opus-codec.org/license/ Opus Licensing]''' page for details.

=== Why make Opus free? ===

On the Internet, protocol and codec standards are part of the common infrastructure everyone builds upon.

Most of the value of a high-quality standard is the innovation and inter-operation provided by the systems built on top of it. When a few parties have monopoly rights to monetize a standard, that infrastructure stops being so common and everyone else has more reason to use their own solution instead, increasing cost and reducing efficiency.

Imagine a road system where each type of car could only drive on its own manufacturer's pavement. We all benefit from living in a world where all the roads are connected.

This is why Opus, unlike many codecs, is free.

=== Is the SILK part of Opus compatible with the SILK implementation shipped in Skype? ===

No.

The SILK codec, as submitted by Skype to the IETF, was heavily modified as part of its integration within Opus. The modifications are significant enough that it is not possible to just write a "translator". Even sharing code between Opus and the "old SILK" would be highly complex.

=== Why not keep the SILK and CELT codecs separate? ===
Opus is more than just two independent codecs with a switch.

In addition to a [https://en.wikipedia.org/wiki/Linear_predictive_coding Linear Prediction] '''SILK mode''' and an [https://en.wikipedia.org/wiki/Modified_discrete_cosine_transform MDCT] '''CELT mode''' it has a '''hybrid mode''', where speech frequencies up to 8 kHz are encoded with LP while those between 8 and 20 kHz are encoded with MDCT. This is what allows Opus to have such high speech quality around 32 kbps.

Another advantage of the integration is the ability to switch between these 3 modes seamlessly, without any audible "glitches" and without any out-of-band signalling.

=== Now that Opus is standardized, will its development stop, or can it be further improved? ===
Yes, Opus '''can''' and '''should''' be improved, because unlike most ITU-T codecs, Opus is only defined in terms of its decoder.

The encoder can keep evolving as long as the bitstream it produces can be decoded by the reference decoder. This is what made it possible for modern MP3 encoders (e.g. '''[https://en.wikipedia.org/wiki/LAME LAME]''') to improve far beyond the original '''[https://en.wikipedia.org/wiki/L3enc L3enc]''' and '''dist10''' reference implementations.

Although it is unlikely that Opus encoders will see such a spectacular evolution, we certainly hope that future encoders will become much better than the reference encoder.

In fact, the 1.1 libopus release significantly improves on the reference encoder's quality. See '''[https://people.xiph.org/~xiphmont/demo/opus/demo3.shtml Monty's demo]''' for more details.

=== Will all future Opus releases comply with the [https://tools.ietf.org/html/rfc6716 Opus specification]? ===

Yes.

=== In what ways is Opus optimized for the Internet? ===

Opus has good packet loss robustness and concealment, but its optimisations go further.

One of the first things we were asked when designing Opus was to make the rate '''really''' adaptable, because we never know what kind of rates will be available. This meant not only having a wide range of bitrates, but also being able to vary the rate in small increments.

This is why Opus scales from about '''6''' to '''512 kb/s''', in increments of '''0.4 kb/s''' (one byte with 20 ms frames). Opus can have '''more than 1200 possible bitrates''' while spending only '''11 bits''' signalling the bitrate because UDP already encodes the packet size.
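
As a rough check of these figures, the following Python sketch computes the bitrate step for one extra byte per 20 ms frame and counts the whole-byte packet sizes between the approximate limits quoted above. The function names are illustrative and not part of any Opus API, and the limits are simply the rounded 6 and 512 kb/s values from this answer.

```python
FRAME_MS = 20  # frame duration used in the text


def bitrate_kbps(size_bytes, frame_ms=FRAME_MS):
    """Bitrate of a stream sending one packet of this size per frame."""
    return size_bytes * 8 / frame_ms  # bits per millisecond equals kb/s


def packet_bytes(kbps, frame_ms=FRAME_MS):
    """Packet size in whole bytes for a given bitrate."""
    return kbps * frame_ms // 8


step = bitrate_kbps(1)          # effect of one extra byte per frame
smallest = packet_bytes(6)      # packet size at about 6 kb/s
largest = packet_bytes(512)     # packet size at about 512 kb/s
possible_rates = largest - smallest + 1

print(step, smallest, largest, possible_rates)
```

With these assumptions the step is 0.4 kb/s and there are 1266 distinct packet sizes, which is consistent with the "more than 1200" figure.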

One last aspect is that Opus is simple to transport over RTP, as can be seen from the [https://tools.ietf.org/html/rfc7587 Opus RTP payload format]. For example, it's possible to decode RTP packets without having even seen the SDP or any out-of-band signalling.

=== What applications for Android can play Opus? ===

Right now there are just a few, but the list is growing fast. See [https://android.stackexchange.com/q/37970/7425 this question on android.stackexchange.com]. Feel free to suggest other applications.

=== When will the next version be released? ===

When it's done. Seriously, we do not know.

Opus is not a large project with a fixed release schedule.

That being said, our [http://www.opus-codec.org/downloads/ pre-releases] and even the [https://git.xiph.org/?p=opus.git git repository] are generally pretty stable and, given proper testing (which you should always do anyway), are safe to distribute.

Just be aware that the API of new features (that have never been included in a stable release) could potentially still change.

== Software Developers' Questions ==

=== On what platforms does Opus run? ===

The Opus code base is written in C89 and should run on the vast majority of recent (and not so recent) CPUs.

Some of the platforms '''[https://mf4.xiph.org/jenkins/view/opus/ on which Opus has been tested]''' include x86, x86-64, ARM, Itanium, Blackfin, and SPARC.

=== Is there a fixed-point implementation? ===

Yes.

The fixed-point and floating-point decoder and encoder implementations are part of the same code base.

The code defaults to float, so you need to configure with '''--enable-fixed-point''' (or define '''FIXED_POINT''' if not using the configure script) to build the code for fixed-point.

=== Which implementation should I use? ===

While the implementation in RFC 6716 is what ''defines'' the standard, it is likely not the best and most up-to-date implementation.

The [http://opus-codec.org/ Opus] website was set up for the purpose of continually improving the implementation — in terms of speed, encoding quality, device compatibility, etc — while still conforming to the standard.

All Opus implementations are compatible by definition.

=== How is supporting Opus different from supporting Speex/G.711/MP3? ===

Opus has variable frame durations which can change on the fly, so an Opus decoder needs to be ready to accept packets with durations that are '''any multiple of 2.5ms''' up to a '''maximum of 120ms'''.
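
The duration rule above translates directly into a worst-case output buffer size. The sketch below is only an illustration of that arithmetic; the helper names are made up for this example and do not come from libopus.

```python
MAX_PACKET_MS = 120   # longest packet a decoder must accept
STEP_US = 2500        # 2.5 ms expressed in microseconds to keep the math exact


def max_samples_per_channel(sample_rate_hz):
    """Samples per channel needed to hold the longest possible packet."""
    return sample_rate_hz * MAX_PACKET_MS // 1000


def is_valid_packet_duration(samples, sample_rate_hz=48000):
    """True if this many samples is a multiple of 2.5 ms, up to 120 ms."""
    total_us, remainder = divmod(samples * 1_000_000, sample_rate_hz)
    if remainder != 0:
        return False  # not a whole number of microseconds
    return 0 < total_us <= MAX_PACKET_MS * 1000 and total_us % STEP_US == 0


print(max_samples_per_channel(48000))   # buffer size when decoding at 48 kHz
print(is_valid_packet_duration(960))    # 20 ms at 48 kHz
print(is_valid_packet_duration(5880))   # 122.5 ms, too long
```

A decoder that always allocates <tt>max_samples_per_channel()</tt> samples per channel can accept any packet duration the far end chooses.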

The Opus encoder and decoder do not need to have matched sampling rates or channel counts. It is recommended to always decode at the highest rate the hardware supports (e.g. 48 kHz stereo) so the user gets the full quality of whatever the far end is sending.

=== My application doesn't work. Can anyone help me? ===

It's possible to get help, but before doing so, there are a few basic things to try:

* Implement your application with uncompressed audio instead of Opus. If it still doesn't work, then the problem isn't related to Opus.
* Read the [http://www.opus-codec.org/docs/ Opus documentation].
* Read the [https://git.xiph.org/?p=opus.git;a=blob;f=src/opus_demo.c opus_demo.c] source code to see how to use the encoder and decoder.

If you still can't solve the problem, the best option is to ask for help on the [http://lists.xiph.org/mailman/listinfo/opus mailing list] or on the '''#opus''' IRC channel on '''irc.freenode.net'''.

=== How do I report a bug? ===

If you think you have found a bug in Opus (and not in your application), please [https://trac.xiph.org/newticket?component=Opus file a bug report].

Please include a way for us to reproduce the problem. The best way to do this is to provide an input file, along with the opusenc/opusdec/opus_demo command line that causes the bug to occur.

If the bug cannot be triggered by the command line tools, please provide a simple patch or C file that can help reproduce it. Please also provide any other relevant information, such as OS, CPU, build options, etc.

Don't hesitate to also contact us on the [http://lists.xiph.org/mailman/listinfo/opus mailing list] or on [irc://irc.freenode.net/opus IRC].

=== What is Opus Custom? ===

Opus Custom is an '''optional''' part of the Opus standard that allows for sampling rates other than 8, 12, 16, 24, or 48 kHz and frame sizes other than multiples of 2.5 ms.

Opus Custom requires additional out-of-band signalling that Opus does not normally require and disables many of Opus' coding modes. Also, because it is an optional part of the specification, using Opus Custom may lead to compatibility problems.

For these reasons, its use is discouraged outside of very specific applications, for example:
* ultra low delay applications where synchronization with the soundcard buffer is important.
* low-power embedded applications where compatibility with others is not important.

For almost all other types of applications, Opus Custom should not be used.

=== How do I use 44.1 kHz or some other sampling rate not directly supported by Opus? ===

Tools which read or write Opus should inter-operate with other sampling rates by transparently performing sample rate conversion behind the scenes whenever necessary. In particular, software developers should not use Opus Custom for 44.1 kHz support, except in the very specific circumstances outlined above.

Note that it's generally preferable for a decoder to output at 48kHz, even when you know the original input was 44.1kHz. This is not only because you can skip resampling, but also because many cheaper audio interfaces have poor quality output for 44.1kHz.

The '''[https://opus-codec.org/downloads/ opus-tools]''' package source code contains a small, high quality, high performance, BSD licensed resampler which can be used where resampling is required.
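
To give a feel for the conversion involved, this small Python sketch works out the resampling ratio between an arbitrary input rate and 48 kHz. It only does the bookkeeping; it is not a resampler, and the function names are invented for this example.

```python
from fractions import Fraction

OPUS_RATE = 48000  # rate Opus tools typically convert to before encoding


def resample_ratio(input_rate_hz):
    """Exact output/input sample ratio when converting to 48 kHz."""
    return Fraction(OPUS_RATE, input_rate_hz)


def output_samples(input_samples, input_rate_hz):
    """Number of whole 48 kHz samples produced from the input samples."""
    return int(input_samples * resample_ratio(input_rate_hz))


ratio = resample_ratio(44100)
print(ratio)                          # reduced fraction for 44.1 kHz input
print(output_samples(44100, 44100))   # one second of 44.1 kHz audio
```

For 44.1 kHz the ratio reduces to 160/147, so one second of input becomes exactly 48000 output samples.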

=== How is the bitrate setting used in VBR mode? ===

Variable bitrate (VBR) mode allows the bitrate to automatically vary over time based on the audio being encoded, in order to achieve a consistent quality.

The bitrate setting controls the desired quality, on a scale that is calibrated to closely approximate the average bitrate that would be obtained over a large and diverse collection of audio. The actual bitrate of any particular audio stream may be higher or lower than this average.

=== What frame size should I use? ===

A '''20ms''' frame size works well for most applications.

Smaller frame sizes may be used to achieve lower latency, but have lower quality at a given bitrate.

Sizes greater than 20 ms increase latency and are generally beneficial only at fairly low bitrates, or when used to reduce external overhead (e.g. by reducing the number of packets that are sent).
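
The "external overhead" mentioned above can be estimated with simple arithmetic. The sketch below assumes one Opus packet per network packet and a 40-byte header (IPv4 + UDP + RTP, with no options or extensions); both are assumptions for illustration, and real deployments may differ.

```python
HEADER_BYTES = 20 + 8 + 12  # assumed IPv4 + UDP + RTP header sizes


def packets_per_second(frame_ms):
    """Packets sent each second when every packet carries one frame."""
    return 1000 / frame_ms


def header_overhead_kbps(frame_ms, header_bytes=HEADER_BYTES):
    """Bitrate consumed by packet headers alone, in kb/s."""
    return header_bytes * 8 * packets_per_second(frame_ms) / 1000


for frame_ms in (10, 20, 40, 60):
    print(frame_ms, packets_per_second(frame_ms), header_overhead_kbps(frame_ms))
```

Under these assumptions, headers cost 32 kb/s at 10 ms, 16 kb/s at 20 ms and 8 kb/s at 40 ms, which is why longer frames help most when the audio bitrate itself is low.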

=== Forward Error Correction (FEC) doesn't appear to do anything! HELP! ===

The in-band FEC feature of Opus helps reduce the harm of packet loss by encoding some information about the prior packet.

In order to make use of in-band FEC the decoder must delay its output by at least one frame so that it can call the decoder with the decode_fec argument on the ''next'' frame in order to reconstruct the missed frame. This works best if it's integrated with a jitter buffer.
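
The one-frame look-ahead described above can be sketched as a small decision function. This is an illustrative Python model of the receiver logic only, not libopus code: <tt>plan_decodes</tt> is a made-up name, and the comments indicate which <tt>opus_decode()</tt> call a C application would typically make for each action.

```python
def plan_decodes(received):
    """Choose a decode action for each frame slot, looking one packet ahead.

    received[i] is True if the packet for slot i arrived in time.
    """
    actions = []
    for i, arrived in enumerate(received):
        next_arrived = i + 1 < len(received) and received[i + 1]
        if arrived:
            actions.append("normal")  # decode packet i with decode_fec = 0
        elif next_arrived:
            actions.append("fec")     # decode packet i+1 with decode_fec = 1
        else:
            actions.append("plc")     # pass a NULL packet: loss concealment
    return actions


print(plan_decodes([True, False, True, False, False, True]))
```

Note that a lost packet can only be recovered when the packet right after it arrives, which is why the output has to lag the network by at least one frame. If the following packet carries no FEC data, the decoder treats the frame as lost.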

FEC is only used by the encoder under certain conditions:
* the feature must be enabled via the OPUS_SET_INBAND_FEC CTL
* the encoder must be told to expect loss via the OPUS_SET_PACKET_LOSS_PERC CTL
* the codec must be operated in any of the linear prediction or Hybrid modes

Frame durations shorter than 10ms and very high bitrates will use the MDCT modes, where FEC is not available.

Even when FEC is not used, telling the encoder about the expected level of loss will help it make more intelligent decisions. By default the implementation assumes there is no loss.

=== I can't use malloc or much stack on my embedded platform. How do I make Opus work? ===

A normal build of libopus only uses <tt>malloc/free</tt> in the <tt>_create()</tt> and <tt>_destroy()</tt> calls, making it safe for realtime use as long as the codec state is pre-created.

To build Opus without the references to <tt>malloc/free</tt>, you must:

* use <tt>init()</tt> calls rather than <tt>create()</tt> calls in your application
* compile with <tt>CFLAGS="-DOVERRIDE_OPUS_ALLOC -DOVERRIDE_OPUS_FREE -D'opus_alloc(x)=NULL' -D'opus_free(x)=NULL' "</tt>.

If libopus is built with <tt>-DNONTHREADSAFE_PSEUDOSTACK</tt> (instead of <tt>VAR_ARRAYS</tt>, or <tt>USE_ALLOCA</tt>), it will use a user-provided block of heap instead of stack for many things, resulting in much lower stack usage. 
This makes the resulting library '''non-threadsafe''' and is '''not recommended''' on anything except limited embedded platforms.

=== How can I ensure that my software interoperates with other software implementing Opus? ===

For applications using Ogg files, there are some [https://people.xiph.org/~greg/opus_testvectors/ Ogg Opus testvectors] to test decoders and you can test encoders with opusdec. For RTP applications, the opusrtp tool can be useful.

In general, here's a list of specific issues to check:
* Can your application handle all frame sizes, including changing the frame size from frame to frame?
* Does your application properly react to lost packets by calling the decoder with a NULL packet?

=== What is the complexity of Opus? ===

The complexity of Opus varies by a large amount based on the settings used.

It depends on the mode, audio bandwidth, number of channels, and even a "complexity knob" that can trade complexity for quality. It will run easily on any recent PC or smartphone.

For slower embedded CPUs/DSPs, the amount of CPU required will vary depending on the configuration and the exact CPU, so you will need to experiment. Do not expect Opus to run quickly on really slow devices like 8-bit micro-controllers.

=== Opus is using too much CPU for my application. What can I do? ===

First, don't panic, and don't start writing assembly just yet.

It's possible that you're just not using the right set of options.

If you're targeting an embedded/mobile platform, chances are the fixed-point build will be faster, so make sure you're using '''--enable-fixed-point''' or defining '''FIXED_POINT''' in the build system.

Opus also has a complexity option that can trade quality for complexity. The default is highest quality and highest complexity. You can control this using '''OPUS_SET_COMPLEXITY()''' (see the '''[http://www.opus-codec.org/docs/ Documentation]''' for details).

If all else fails and you need to optimize the Opus code, see the next question.

=== I would like to optimize/improve/help with Opus. Where should I start? ===

Please '''[http://www.opus-codec.org/contact/ contact us]''' before you start, or at least before you get too far.

This will help coordinate the efforts made on Opus and reduce the probability of wasting your time on duplicated effort or going down the wrong path.

=== Does Opus have an echo canceller like Speex does? ===

Echo cancellation is completely independent from codecs.

You can use any echo canceller (including the one from libspeexdsp) along with Opus.

That being said, among the free acoustic echo cancelers (AEC) we're aware of, the best is probably the Google AEC from the [https://code.google.com/p/webrtc/ WebRTC codebase].

[[Category:Opus]]

OpusExtensions

2016-05-01T04:12:43Z

MarkH: RFC 7845

Opus audio data packets begin with a "table of contents" (TOC) sequence which defines the frame duration, audio bandwidth and coding mode of the packet, as well as describing how individual frames are packed into the data packet (see [https://tools.ietf.org/html/rfc6716#section-3.1 RFC 6716 Section 3.1]). Other types of data packets used with Opus in various containers are designed to start with a sequence which is not a valid TOC. This simplifies sorting such data for muxing implementations and ensures they will be rejected by the decoder if they are accidentally passed as Opus audio data.

Below is a list of such alternate sequences, to avoid duplication.

== List of reserved invalid Opus TOC sequences ==

* <tt>Op</tt> is used as a prefix for metadata headers in .opus files. [https://tools.ietf.org/html/rfc7845 RFC 7845]
* <tt>0x3FF</tt> in the first 11 bits marks an <tt>opus_control_header</tt> in MPEG-TS. [[OpusTS]]
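
To see why a prefix such as "Op" cannot be mistaken for audio, it helps to decode it as if it were a TOC. The Python sketch below follows the TOC layout in RFC 6716 Section 3.1 (5-bit configuration, stereo bit, 2-bit frame count code) and, for code 3 packets, reads the frame count from the low 6 bits of the second byte. The function name and the returned dictionary are choices made for this example.

```python
SILK_MS = (10, 20, 40, 60)
HYBRID_MS = (10, 20)
CELT_MS = (2.5, 5, 10, 20)


def parse_toc(toc):
    """Split a TOC byte into its fields (RFC 6716 Section 3.1)."""
    config = toc >> 3            # top 5 bits: mode, bandwidth, frame size
    stereo = bool(toc & 0x04)    # next bit: mono or stereo
    code = toc & 0x03            # low 2 bits: frame count code
    if config < 12:
        mode, bandwidth = "SILK", ("NB", "MB", "WB")[config // 4]
        frame_ms = SILK_MS[config % 4]
    elif config < 16:
        mode, bandwidth = "Hybrid", ("SWB", "FB")[(config - 12) // 2]
        frame_ms = HYBRID_MS[config % 2]
    else:
        mode, bandwidth = "CELT", ("NB", "WB", "SWB", "FB")[(config - 16) // 4]
        frame_ms = CELT_MS[config % 4]
    return {"config": config, "mode": mode, "bandwidth": bandwidth,
            "frame_ms": frame_ms, "stereo": stereo, "code": code}


toc = parse_toc(ord("O"))           # first byte of "Op"
frames = ord("p") & 0x3F            # code 3: frame count in the second byte
total_ms = frames * toc["frame_ms"]
print(toc, frames, total_ms)
```

Read this way, "Op" would claim 48 SILK wideband frames of 20 ms each, or 960 ms of audio, far beyond the 120 ms a single packet may carry, so a conforming decoder rejects it.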

== Space of all invalid Opus TOC sequences ==

* 0x0300
* 0x030D...0x0340
* 0x034D...0x0380
* 0x038D...0x03C0
* 0x03CD...0x03FF

* 0x0700
* 0x070D...0x0740
* 0x074D...0x0780
* 0x078D...0x07C0
* 0x07CD...0x07FF

* 0x0B00
* 0x0B07...0x0B40
* 0x0B47...0x0B80
* 0x0B87...0x0BC0
* 0x0BC7...0x0BFF

* 0x0F00
* 0x0F07...0x0F40
* 0x0F47...0x0F80
* 0x0F87...0x0FC0
* 0x0FC7...0x0FFF

* 0x1300
* 0x1304...0x1340
* 0x1344...0x1380
* 0x1384...0x13C0
* 0x13C4...0x13FF

* 0x1700
* 0x1704...0x1740
* 0x1744...0x1780
* 0x1784...0x17C0
* 0x17C4...0x17FF

* 0x1B00
* 0x1B03...0x1B40
* 0x1B43...0x1B80
* 0x1B83...0x1BC0
* 0x1BC3...0x1BFF

* 0x1F00
* 0x1F03...0x1F40
* 0x1F43...0x1F80
* 0x1F83...0x1FC0
* 0x1FC3...0x1FFF

* 0x2300
* 0x230D...0x2340
* 0x234D...0x2380
* 0x238D...0x23C0
* 0x23CD...0x23FF

* 0x2700
* 0x270D...0x2740
* 0x274D...0x2780
* 0x278D...0x27C0
* 0x27CD...0x27FF

* 0x2B00
* 0x2B07...0x2B40
* 0x2B47...0x2B80
* 0x2B87...0x2BC0
* 0x2BC7...0x2BFF

* 0x2F00
* 0x2F07...0x2F40
* 0x2F47...0x2F80
* 0x2F87...0x2FC0
* 0x2FC7...0x2FFF

* 0x3300
* 0x3304...0x3340
* 0x3344...0x3380
* 0x3384...0x33C0
* 0x33C4...0x33FF

* 0x3700
* 0x3704...0x3740
* 0x3744...0x3780
* 0x3784...0x37C0
* 0x37C4...0x37FF

* 0x3B00
* 0x3B03...0x3B40
* 0x3B43...0x3B80
* 0x3B83...0x3BC0
* 0x3BC3...0x3BFF

* 0x3F00
* 0x3F03...0x3F40
* 0x3F43...0x3F80
* 0x3F83...0x3FC0
* 0x3FC3...0x3FFF

* 0x4300
* 0x430D...0x4340
* 0x434D...0x4380
* 0x438D...0x43C0
* 0x43CD...0x43FF

* 0x4700
* 0x470D...0x4740
* 0x474D...0x4780
* 0x478D...0x47C0
* 0x47CD...0x47FF

* 0x4B00
* 0x4B07...0x4B40
* 0x4B47...0x4B80
* 0x4B87...0x4BC0
* 0x4BC7...0x4BFF

* 0x4F00
* 0x4F07...0x4F40
* 0x4F47...0x4F6F
* 0x4F70 ("Op"): ID and Comment headers in .opus files [https://tools.ietf.org/html/rfc7845 RFC 7845]
* 0x4F71...0x4F80
* 0x4F87...0x4FC0
* 0x4FC7...0x4FFF

* 0x5300
* 0x5304...0x5340
* 0x5344...0x5380
* 0x5384...0x53C0
* 0x53C4...0x53FF

* 0x5700
* 0x5704...0x5740
* 0x5744...0x5780
* 0x5784...0x57C0
* 0x57C4...0x57FF

* 0x5B00
* 0x5B03...0x5B40
* 0x5B43...0x5B80
* 0x5B83...0x5BC0
* 0x5BC3...0x5BFF

* 0x5F00
* 0x5F03...0x5F40
* 0x5F43...0x5F80
* 0x5F83...0x5FC0
* 0x5FC3...0x5FFF

* 0x6300
* 0x630D...0x6340
* 0x634D...0x6380
* 0x638D...0x63C0
* 0x63CD...0x63FF

* 0x6700
* 0x670D...0x6740
* 0x674D...0x6780
* 0x678D...0x67C0
* 0x67CD...0x67FF

* 0x6B00
* 0x6B07...0x6B40
* 0x6B47...0x6B80
* 0x6B87...0x6BC0
* 0x6BC7...0x6BFF

* 0x6F00
* 0x6F07...0x6F40
* 0x6F47...0x6F80
* 0x6F87...0x6FC0
* 0x6FC7...0x6FFF

* 0x7300
* 0x730D...0x7340
* 0x734D...0x7380
* 0x738D...0x73C0
* 0x73CD...0x73FF

* 0x7700
* 0x770D...0x7740
* 0x774D...0x7780
* 0x778D...0x77C0
* 0x77CD...0x77FF

* 0x7B00
* 0x7B07...0x7B40
* 0x7B47...0x7B80
* 0x7B87...0x7BC0
* 0x7BC7...0x7BFF

* 0x7F00
* 0x7F07...0x7F40
* 0x7F47...0x7F80
* 0x7F87...0x7FC0
* 0x7FC7...0x7FDF
* 0x7FE0...0x7FFF: opus_control_header in MPEG-TS [[OpusTS]]

* 0x8300
* 0x8331...0x8340
* 0x8371...0x8380
* 0x83B1...0x83C0
* 0x83F1...0x83FF

* 0x8700
* 0x8731...0x8740
* 0x8771...0x8780
* 0x87B1...0x87C0
* 0x87F1...0x87FF

* 0x8B00
* 0x8B19...0x8B40
* 0x8B59...0x8B80
* 0x8B99...0x8BC0
* 0x8BD9...0x8BFF

* 0x8F00
* 0x8F19...0x8F40
* 0x8F59...0x8F80
* 0x8F99...0x8FC0
* 0x8FD9...0x8FFF

* 0x9300
* 0x930D...0x9340
* 0x934D...0x9380
* 0x938D...0x93C0
* 0x93CD...0x93FF

* 0x9700
* 0x970D...0x9740
* 0x974D...0x9780
* 0x978D...0x97C0
* 0x97CD...0x97FF

* 0x9B00
* 0x9B07...0x9B40
* 0x9B47...0x9B80
* 0x9B87...0x9BC0
* 0x9BC7...0x9BFF

* 0x9F00
* 0x9F07...0x9F40
* 0x9F47...0x9F80
* 0x9F87...0x9FC0
* 0x9FC7...0x9FFF

* 0xA300
* 0xA331...0xA340
* 0xA371...0xA380
* 0xA3B1...0xA3C0
* 0xA3F1...0xA3FF

* 0xA700
* 0xA731...0xA740
* 0xA771...0xA780
* 0xA7B1...0xA7C0
* 0xA7F1...0xA7FF

* 0xAB00
* 0xAB19...0xAB40
* 0xAB59...0xAB80
* 0xAB99...0xABC0
* 0xABD9...0xABFF

* 0xAF00
* 0xAF19...0xAF40
* 0xAF59...0xAF80
* 0xAF99...0xAFC0
* 0xAFD9...0xAFFF

* 0xB300
* 0xB30D...0xB340
* 0xB34D...0xB380
* 0xB38D...0xB3C0
* 0xB3CD...0xB3FF

* 0xB700
* 0xB70D...0xB740
* 0xB74D...0xB780
* 0xB78D...0xB7C0
* 0xB7CD...0xB7FF

* 0xBB00
* 0xBB07...0xBB40
* 0xBB47...0xBB80
* 0xBB87...0xBBC0
* 0xBBC7...0xBBFF

* 0xBF00
* 0xBF07...0xBF40
* 0xBF47...0xBF80
* 0xBF87...0xBFC0
* 0xBFC7...0xBFFF

* 0xC300
* 0xC331...0xC340
* 0xC371...0xC380
* 0xC3B1...0xC3C0
* 0xC3F1...0xC3FF

* 0xC700
* 0xC731...0xC740
* 0xC771...0xC780
* 0xC7B1...0xC7C0
* 0xC7F1...0xC7FF

* 0xCB00
* 0xCB19...0xCB40
* 0xCB59...0xCB80
* 0xCB99...0xCBC0
* 0xCBD9...0xCBFF

* 0xCF00
* 0xCF19...0xCF40
* 0xCF59...0xCF80
* 0xCF99...0xCFC0
* 0xCFD9...0xCFFF

* 0xD300
* 0xD30D...0xD340
* 0xD34D...0xD380
* 0xD38D...0xD3C0
* 0xD3CD...0xD3FF

* 0xD700
* 0xD70D...0xD740
* 0xD74D...0xD780
* 0xD78D...0xD7C0
* 0xD7CD...0xD7FF

* 0xDB00
* 0xDB07...0xDB40
* 0xDB47...0xDB80
* 0xDB87...0xDBC0
* 0xDBC7...0xDBFF

* 0xDF00
* 0xDF07...0xDF40
* 0xDF47...0xDF80
* 0xDF87...0xDFC0
* 0xDFC7...0xDFFF

* 0xE300
* 0xE331...0xE340
* 0xE371...0xE380
* 0xE3B1...0xE3C0
* 0xE3F1...0xE3FF

* 0xE700
* 0xE731...0xE740
* 0xE771...0xE780
* 0xE7B1...0xE7C0
* 0xE7F1...0xE7FF

* 0xEB00
* 0xEB19...0xEB40
* 0xEB59...0xEB80
* 0xEB99...0xEBC0
* 0xEBD9...0xEBFF

* 0xEF00
* 0xEF19...0xEF40
* 0xEF59...0xEF80
* 0xEF99...0xEFC0
* 0xEFD9...0xEFFF

* 0xF300
* 0xF30D...0xF340
* 0xF34D...0xF380
* 0xF38D...0xF3C0
* 0xF3CD...0xF3FF

* 0xF700
* 0xF70D...0xF740
* 0xF74D...0xF780
* 0xF78D...0xF7C0
* 0xF7CD...0xF7FF

* 0xFB00
* 0xFB07...0xFB40
* 0xFB47...0xFB80
* 0xFB87...0xFBC0
* 0xFBC7...0xFBFF

* 0xFF00
* 0xFF07...0xFF40
* 0xFF47...0xFF80
* 0xFF87...0xFFC0
* 0xFFC7...0xFFFF
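
The ranges listed above appear to follow a single rule: for a TOC byte with frame count code 3, the low 6 bits of the second byte give the number of frames, which must be at least 1 and must not push the packet past 120 ms. The Python sketch below regenerates the ranges for one TOC byte under that assumption. It is an illustration rather than a validator, and it ignores other ways a packet can be invalid (for example inconsistent lengths).

```python
def frame_units(toc):
    """Frame duration for this TOC byte, in units of 2.5 ms."""
    config = toc >> 3
    if config < 12:
        return (4, 8, 16, 24)[config % 4]   # SILK: 10, 20, 40, 60 ms
    if config < 16:
        return (4, 8)[config % 2]           # Hybrid: 10, 20 ms
    return (1, 2, 4, 8)[config % 4]         # CELT: 2.5, 5, 10, 20 ms


def invalid_ranges(toc):
    """Inclusive ranges of invalid 16-bit prefixes for a code 3 TOC byte."""
    if toc & 0x03 != 3:
        raise ValueError("only frame count code 3 is handled here")
    max_frames = 48 // frame_units(toc)     # 48 units of 2.5 ms = 120 ms
    ranges = []
    for second in range(256):
        if 1 <= (second & 0x3F) <= max_frames:
            continue                         # valid frame count
        value = (toc << 8) | second
        if ranges and ranges[-1][1] == value - 1:
            ranges[-1][1] = value            # extend the current run
        else:
            ranges.append([value, value])    # start a new run
    return [tuple(r) for r in ranges]


for low, high in invalid_ranges(0x03):
    print(f"0x{low:04X}" if low == high else f"0x{low:04X}...0x{high:04X}")
```

For the TOC byte 0x03 this prints the same five entries as the first group in the list, and the other groups can be reproduced by changing the argument.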

MatroskaOpus

2016-05-01T04:10:35Z

MarkH: RFC 7845

{{draft}}
This is an encapsulation spec for the [[Opus]] codec in [http://matroska.org/ Matroska]. There are a number of outstanding functional issues with muxing Opus in Matroska, and until those are resolved, use of this spec is NOT RECOMMENDED.

* CodecID is A_OPUS
* SamplingFrequency is 48000
* Channels is number of output PCM channels
* SeekPreRoll is set to 80000000
* CodecPrivate consists of the 'OpusHead' packet, identical to the Ogg mapping.

The 'OpusHead' format is defined by the [https://tools.ietf.org/html/rfc7845 Ogg Opus] mapping. In particular it includes pre-skip, gain, and the channel mapping table required for correct surround output.

The second 'OpusTags' header packet from Ogg Opus is not used in the Matroska encapsulation. Matroska has its own system for tag metadata, and this avoids duplicating it and the need for sub-framing to index multiple packets within the CodecPrivate element.

SeekPreRoll [56][BB] is a new unsigned integer element added to the TrackEntry element. The value is the number of nanoseconds that must be discarded, for that stream, after a seek until the decoded data is valid to render.

CodecDelay [56][AA] is a new unsigned integer element added to the TrackEntry element. The value is the number of nanoseconds that must be discarded, for that stream, from the start of that stream. The value is also the number of nanoseconds that all encoded timestamps for that stream must be shifted to get the presentation timestamp. (This will fix Vorbis encoding as well.)

DiscardPadding [75][A2] is a new signed integer element added to the BlockGroup element. DiscardPadding is the duration in nanoseconds of the silent data added to the Block (padding at the end of the block). The duration of DiscardPadding is not calculated in the duration of the Track and should be discarded during playback. (This will fix Vorbis encoding as well.)
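
Since these elements are expressed in nanoseconds while Opus counts samples at 48 kHz, a muxer or player has to convert between the two. The Python sketch below shows one way to do that. The helper names are invented for this example, the pre-skip of 312 samples is only an illustrative value, and deriving CodecDelay from the OpusHead pre-skip is an assumption based on the proposal discussed further down this page.

```python
SAMPLE_RATE = 48000
NS_PER_SECOND = 1_000_000_000


def samples_to_ns(samples):
    """Convert 48 kHz samples to nanoseconds (truncating)."""
    return samples * NS_PER_SECOND // SAMPLE_RATE


def ns_to_samples(ns):
    """Convert nanoseconds to 48 kHz samples (truncating)."""
    return ns * SAMPLE_RATE // NS_PER_SECOND


def presentation_ns(block_timestamp_ns, codec_delay_ns):
    """Shift an encoded timestamp by CodecDelay to get presentation time."""
    return block_timestamp_ns - codec_delay_ns


seek_pre_roll_ns = 80_000_000        # SeekPreRoll value given in this spec
codec_delay_ns = samples_to_ns(312)  # illustrative pre-skip of 312 samples

print(ns_to_samples(seek_pre_roll_ns))                  # samples to discard after a seek
print(codec_delay_ns)                                   # CodecDelay in nanoseconds
print(presentation_ns(codec_delay_ns, codec_delay_ns))  # first audible sample
```

With these values the 80 ms SeekPreRoll corresponds to 3840 samples, and the first audible sample lands at presentation time zero.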

== Muxing Recommendations ==

To spare players that want to start playback at exactly time T from parsing extra muxed content, we recommend that muxers start an additional Cluster at T-SeekPreRoll, where T is the start time of Cluster N (in effect splitting off the end of Cluster N-1). Muxers should then add CuePoints for all the new T-SeekPreRoll Clusters with a CueTrack of the audio stream. The CuePoints for the video stream will not change.

For example, consider a muxed MKV file with the following characteristics:
* 5 second interval between video keyframes
* Each video keyframe begins a new Cluster
* Cues will contain video keyframe CuePoints
* For each video keyframe at time T there will be a new Cluster at T-SeekPreRoll
* Cues will contain audio CuePoints for T-SeekPreRoll Clusters
* Audio and video are interleaved in monotonically increasing order

Assume SeekPreRoll is 80 milliseconds. The first Cluster starts at 0 milliseconds with a video keyframe Block and has a duration of 4920 milliseconds. The second Cluster starts at 4920 milliseconds with an audio Block and has a duration of 80 milliseconds. Just to be clear, the second Cluster can contain Blocks from all streams. The third Cluster starts at 5000 milliseconds with a video keyframe Block and has a duration of 4920 milliseconds. The fourth Cluster starts at 9920 milliseconds with an audio Block and has a duration of 80 milliseconds.
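
The layout in this example can be generated mechanically. The following Python sketch reproduces the Cluster start times and durations from the keyframe interval and SeekPreRoll; the function name and the tuple layout are choices made for this illustration, not part of any muxer API.

```python
def cluster_layout(keyframes, interval_ms=5000, pre_roll_ms=80):
    """Return (start_ms, duration_ms, first_block) for each Cluster."""
    clusters = []
    for k in range(keyframes):
        start = k * interval_ms
        # Cluster opened by the video keyframe, cut short by the pre-roll.
        clusters.append((start, interval_ms - pre_roll_ms, "video keyframe"))
        # Extra Cluster at T-SeekPreRoll, ahead of the next keyframe at T.
        clusters.append((start + interval_ms - pre_roll_ms, pre_roll_ms, "audio"))
    return clusters


for start_ms, duration_ms, first_block in cluster_layout(2):
    print(start_ms, duration_ms, first_block)
```

A player that wants audio and video to start at T = 5000 ms would seek to the audio CuePoint for the Cluster starting at 4920 ms and begin decoding audio there.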

With this recommendation players that want audio and video to start playback at time T can seek to Cluster T-SeekPreRoll and start decoding the audio stream. This will work the same for both local and HTTP playback.

== Open Questions ==

* Should we say muxers MAY or SHOULD NOT produce simple streams without filling in CodecPrivate?
** If the CodecPrivate is empty or not present and Channels is 1 or 2, players MAY treat it as a sane set of defaults, I guess. e.g. channel mapping family 0, no pre-skip or gain.
** For Channels > 2 the track MUST be rejected, since there's no way to map the encoded substreams to channels.
** We would also have to decide on a default value for OutputGain.
** Version must be 1.
* How can sample-accurate end-time trimming work in Matroska?
** We defined a new element added to a BlockGroup, DiscardPadding (previously PostPadding), which is defined as the number of nanoseconds to discard from the Block.
** Currently all software encapsulating Vorbis in Matroska is broken in this regard, and muxing a Vorbis file in Matroska causes it to get longer (i.e., produce more audio output than the original Ogg file). It would be unfortunate to repeat this disaster for Opus. This needs a new element specifying the number of samples to trim, perhaps a new BlockGroup child.
*** This has been addressed with DiscardPadding for Opus. DiscardPadding was speced to fix Vorbis (as well as other codecs) too.
* If new elements are required, can they be defined so as to enable correct seeking in rolling intra (a.k.a intra refresh) video as well?
** SeekPreRoll should work for rolling intra video.

== Handling Pre-skip data ==

* '''On [http://lists.matroska.org/cgi-bin/mailman/listinfo/matroska-devel Matroska-dev] we decided to implement proposal one ([http://lists.matroska.org/pipermail/matroska-devel/2013-June/004475.html ref]).'''
* Use Cases:
** UC1: Playback starts from the beginning of the stream. Source stream time starts at 0.
** UC2: Playback starts from the beginning of the stream. Pre-skip data ends in middle of compressed packet.
** UC3: Playback starts from the middle of the stream > SeekPreRoll time.
** UC4: Playback starts from the middle of the stream < SeekPreRoll time.
** UC5: Encode source stream to Opus, mux to Matroska, then decode the Opus stream. The decoded output must have the same number of samples as the source stream.

* one: Timeshift the timestamps by pre-skip data.
** The Opus audio stream pre-skip data starts from time 0 and adds the pre-skip time to the normal audio time, like how Opus files are muxed into Ogg files. We would add a new element to the TrackEntry element, CodecDelay, and the player would adjust the timestamps of the decoded samples by subtracting CodecDelay. All use cases should be covered.
** Cons:
*** The timestamp of the Block does not match the timestamp of the playback position.
*** Does not generalize known "decode, but not render" data.
*** Forces the player (i.e. not the decoder) to handle the pre-skip samples.

* two: Add pre-skip data to CodecPrivate.
** On every discontinuity the decoder would need to decode and throw away the pre-skip data.
** Cons:
*** UC2 will throw away valid data and the AV sync will be off.
*** UC3 will redundantly decode the pre-skip data.

* three: Add TimeToDiscard to Block.
** Add an element to the Block element, TimeToDiscard in nanoseconds. A value of -1 would not render the whole Block, which would have the same effect as setting the invisible bit. How would this affect the Block timestamp? Maybe the new element should be SamplesToDiscard or DataToDiscard?
** Cons:

* four: Blocks that contain pre-skip data will set invisible flag.
** Blocks that contain pre-skip data have timestamps from the beginning of the stream. Blocks that only contain normal data have timestamps from the playback position.
** Cons:
*** Forces the player to handle the pre-skip samples. I.e. not the decoder.
*** UC2 will throw away valid data and the AV sync will be off. Other use cases should be fine.

* five: Force pre-skip packets to be prepended to the first normal packet in the first Block.
** The first Block's timestamp will be set to the start time of the source playback position. We would add a new element to the TrackEntry element, CodecDelay. All use cases should be covered.
** Cons:
*** Does not generalize known "decode, but not render" data.
*** Forces the player to handle the pre-skip samples. I.e. not the decoder.

* six: Create a new codec, OPUS_MKV.
** Basically the codec will wrap Opus packets with data telling the decoder what type of Opus packet it contains. Essentially we would be creating a new codec to handle pre-skip data within the decoder.
** Cons:
*** There will be two types of Opus data streams!
*** Does not generalize known "decode, but not render" data.

* seven: Negative timestamps.
** The SimpleBlock timestamp is signed 16 bits, so the format can signal about half of the pre-skip if playback timestamps are to start at zero.
** One could set an incorrect timestamp on the skipped blocks, and rely on the decoder to drop them based on the OpusHead preskip value. As long as the initial blocks are timestamped <= start of output this shouldn't affect seeking.
** Cons:
*** Moritz suggests this won't work because the resolution of the timestamps is controlled by the muxer, so the SimpleBlock timestamp offset isn't sample accurate anyway ([http://lists.matroska.org/pipermail/matroska-devel/2012-September/004254.html ref]).

* eight: 
** The Ogg format uses granule positions which are converted to presentation timecodes using codec specific information on a per logical stream basis.
** The Matroska format uses absolute timecodes with an arbitrary per segment accuracy for all tracks in the segment.
** It is the belief of this tikiman that using a timecode offset of any kind in MKV is unholy.
** The preskip is communicated to the media software via the Opus header in the codec private data. At the beginning of the track, the track timecode is not increased until the preskip samples are in track frames.
** From then on audio is muxed as normal, however the audio should be muxed >= 3840 samples behind video frames.
*** i.e. Cluster Timecode: 5.000 seconds
*** Video Track Key Frame 5.000 seconds
*** Opus Track Frame 4.920 seconds
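
The numbers in this last proposal follow from the 48 kHz timing that Opus always assumes: 3840 samples are 80 ms, so an audio frame muxed against a 5.000 second video key frame carries a timecode of at most 4.920 seconds. A minimal Python sketch of that arithmetic, working in whole milliseconds (the function name is hypothetical, not part of any muxer):

```python
SAMPLE_RATE_HZ = 48000   # Opus timing always assumes 48 kHz
MIN_LAG_SAMPLES = 3840   # minimum audio lag quoted in the proposal

def latest_audio_timecode_ms(video_timecode_ms, lag_samples=MIN_LAG_SAMPLES):
    """Latest audio timecode that is still lag_samples behind the video frame."""
    lag_ms = lag_samples * 1000 // SAMPLE_RATE_HZ   # 3840 samples -> 80 ms
    return video_timecode_ms - lag_ms

print(latest_audio_timecode_ms(5000))   # 4920
```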

[[Category:Opus]]

MIME Types and File Extensions

2016-05-01T04:09:10Z

MarkH: RFC 7845

STATUS: [http://www.ietf.org/rfc/rfc5334.txt RFC 5334] encapsulates the below listed policies. More details are [http://wiki.xiph.org/index.php/MIMETypesCodecs here], which also include a specification of the codecs parameter of the MIME types. Use the correct file extensions straight away.

IMPLEMENTATION recommendations and patches: see [[MIME-Migration]].

== .ogg - audio/ogg ==

* Ogg Vorbis I Profile
* .ogg applies now for Vorbis I files only
* .ogg has more recently also been used for Ogg FLAC and for Theora, too — these uses are deprecated now in favor of .oga and .ogv respectively
* has been defined in RFC 3534 for application/ogg, so RFC 3534 will be re-defined

RATIONALE: .ogg has traditionally been used for Vorbis I files, in particular in HW players, hence it is kept for backwards-compatibility

== .ogv - video/ogg ==

* Ogg Video Profile (a/v in Ogg container)
* apps supporting .oga, .ogv SHOULD support decoding from muxed Ogg streams
* covers e.g. [[Theora]], Theora + Vorbis, Theora + Speex, Theora + FLAC, [[Dirac]] + Vorbis, [[OggMNG|MNG]] + FLAC, [[OggUVS]] inside Ogg
* This list is not exhaustive (for example, [[Dirac]] + FLAC is acceptable too)
* SHOULD contain a Skeleton track and/or MAY contain a CMML logical bitstream.

== .opus - audio/ogg ==

* Ogg Opus profile
* Defined by https://tools.ietf.org/html/rfc7845

== .oga - audio/ogg ==

* Ogg Audio Profile (audio in Ogg container)
* Applications supporting .oga, .ogv SHOULD support decoding from muxed Ogg streams
* Covers Ogg [[FLAC]], [[Ghost]], and [[OggPCM]]
* Although they share the same MIME type, Vorbis, Opus and Speex use different file extensions.
* SHOULD contain a Skeleton logical bitstream.
* Vorbis and Speex may use .oga, but it is not the preferred method of distributing these files because of backwards-compatibility issues.

== .ogx - application/ogg ==

* Ogg Multiplex Profile (anything in [[Ogg]])
* can contain any logical bitstreams multiplexed together in an ogg container
* will replace the .ogg extension from RFC 3534
* random multitrack files MUST contain a [[Skeleton]] track to identify all containing logical bitstreams
* apps that identify a logical bitstream which they cannot decode SHOULD ignore it but MAY still decode the ones they can
* thus, e.g. an annodex file can gracefully degrade to .ogx if an app cannot decode [[CMML]] and/or [[Skeleton]]
* USE: application/ogg has been registered, so can be used immediately

== .spx - audio/ogg ==

* Ogg Speex Profile
* .spx has traditionally been used for Speex files within Ogg and should be considered for backwards-compatibility

== .flac - audio/flac ==

* FLAC in native encapsulation format

== .anx - application/annodex ==

* THIS FILE FORMAT IS DEPRECATED.
* Profile for multiplexed Ogg that includes a skeleton track and at least one CMML logical bitstream
* apps that identify a logical bitstream which they cannot decode SHOULD ignore it but MAY still decode the ones they can
* apps that come across an annodex file and cannot decode CMML and/or Skeleton, but can deal with the others SHOULD gracefully degrade by ignoring these

== .axa - audio/annodex ==

* THIS FILE FORMAT IS DEPRECATED.
* Profile for audio in Annodex
* covers e.g. [[Vorbis]], [[Speex]], [[FLAC]], [[Opus]], [[Ghost]], [[OggPCM]] inside Ogg with Skeleton and CMML

== .axv - video/annodex ==

* THIS FILE FORMAT IS DEPRECATED.
* Profile for video in Annodex
* covers e.g. [[Theora]], Theora + Vorbis, Theora + Speex, Theora + FLAC, [[Dirac]] + Vorbis, [[OggMNG|MNG]] + FLAC, [[OggUVS]] inside Ogg with Skeleton and CMML

== .xspf - application/xspf+xml ==

* Profile for XSPF
* Covers [[XSPF]], while being used through XML
* Does not cover [[JSPF]], which is XSPF but on JSON

== Ogg Kate files - application/kate ==

* Binary representation of Kate encapsulated in Ogg
* may have a skeleton
* can be used to identify the mime type of the track itself (e.g. in skeleton)
* uses .ogx extension when in a file by itself
* is subdued by the dominant mime type if in an audio or video file to become audio/ogg or video/ogg

== Codec MIME types ==

Codecs need their own MIME types for streaming in RTP and to be used in multitrack ogg files using skeleton:

* audio/vorbis for Vorbis without container
* video/theora for Theora without container
* audio/speex for Speex without container
* audio/flac for FLAC without container and in its native container
* audio/opus for Opus without container
* text/cmml for CMML without container
* application/kate for the textual representation of Kate (.kate files)
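
For implementers, the extension sections earlier on this page can be collected into a lookup table. The following Python sketch copies the extension and MIME type pairs from the section headings; the function name and the <code>application/octet-stream</code> fallback are assumptions made for the example, not something this page specifies.

```python
import os

# Extension -> container MIME type, copied from the section headings on this page.
OGG_MIME_TYPES = {
    ".ogg": "audio/ogg",
    ".ogv": "video/ogg",
    ".opus": "audio/ogg",
    ".oga": "audio/ogg",
    ".ogx": "application/ogg",
    ".spx": "audio/ogg",
    ".flac": "audio/flac",
    ".anx": "application/annodex",   # deprecated format
    ".axa": "audio/annodex",         # deprecated format
    ".axv": "video/annodex",         # deprecated format
    ".xspf": "application/xspf+xml",
}

def guess_mime_type(filename, default="application/octet-stream"):
    """Look up the MIME type by file extension, ignoring case."""
    ext = os.path.splitext(filename)[1].lower()
    return OGG_MIME_TYPES.get(ext, default)

print(guess_mime_type("song.opus"))   # audio/ogg
```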

OggOpus

2016-05-01T04:05:38Z

MarkH: draft is now RFC 7845

'''Superseded by [https://tools.ietf.org/html/rfc7845 RFC 7845].'''

== Ogg Mapping for Opus ==

The IETF Opus codec is a low-latency audio codec optimized for both voice and general-purpose audio. See the [https://tools.ietf.org/html/rfc6716 Opus Specification] for technical details.

Almost everything about Opus is either fixed or dynamically switchable, so most of the usual ID and setup header parameters in the header packets of an Ogg encapsulation aren't needed. In particular, bitrate, packet duration, mono/stereo flags, and coding modes are all dynamically switchable from packet to packet. The first one or two bytes in each data packet, the start of the 'TOC sequence' that defines the layout of the packet, specify all of these parameters for that particular packet. See Section 3 of the Opus Specification for the exact format of the TOC sequence.

The remaining parameters that must be signaled are

* The magic number for stream identification,
* The stream count and coupling for multichannel audio, and
* Any metadata or tags.

=== Content Type ===

The recommended mime-type for Ogg Opus files is '''audio/ogg''', defined in [https://www.ietf.org/rfc/rfc5334.txt RFC 5334].

If more specificity is desired, one can distinguish Opus files as 'audio/ogg; codecs=opus'.

The recommended filename extension for Ogg Opus files is '''.opus'''.

=== Packet Organization ===

Opus is framed in a continuous logical [https://www.xiph.org/ogg/doc/framing.html Ogg stream].

There are two mandatory headers. The granule position of the pages containing these headers MUST be zero.

The first packet in the logical Ogg stream MUST contain the identification header, which uniquely identifies a stream as Opus audio. It MUST begin with the 8 bytes "OpusHead". It MUST be placed alone in the first page of the logical Ogg stream. This page MUST have the ’beginning of stream’ flag set.

The second Opus packet MUST contain the comment header. It must begin with the 8 bytes "OpusTags". It MAY span one or more pages, beginning on the second page of the logical stream. However many pages it spans, the comment header packet MUST finish the page on which it ends.

All subsequent pages are audio data pages and the packets they contain are audio data packets. The first audio page SHOULD NOT have the 'continued packet' flag set (which would indicate the first audio packet is continued from a previous page). Packets MUST be placed into Ogg pages in order until the end of stream. Audio packets MAY span page boundaries. A decoder MUST treat a zero-byte audio packet as if it were an Opus packet with an illegal TOC sequence. The last page SHOULD have the 'end of stream' flag set, but implementations should be prepared to deal with truncated streams which do not have a page marked 'end of stream'. The final packet SHOULD complete on the last page, i.e., the final lacing value should be less than 255. There MUST NOT be any more pages in an Opus logical stream after a page marked 'end of stream'.

=== Granule Position ===

The granule position of an audio page encodes the total number of PCM samples in the stream up to and including the last fully-decodable sample from the last packet ''completed'' on that page. A page that is entirely spanned by a single packet (that completes on a subsequent page) has no granule position, and the granule position field MUST be set to the special value ’-1’ in two's complement.

The granule position of an audio page is in units of PCM audio samples at a fixed rate of 48 kHz (per channel; a stereo stream’s granule position does not increment at twice the speed of a mono stream). It is possible to run a decoder at other sampling rates, but the format and this specification always count samples assuming a 48 kHz decoding rate.

The duration of an Opus packet may be any multiple of 2.5 ms, up to a maximum of 120 ms. This duration is encoded in the TOC sequence at the beginning of each packet. The number of samples returned by a decoder corresponds to this duration exactly, even for the first few packets. For example, a 20 ms packet fed to a decoder running at 48 kHz will always return 960 samples. A demuxer can parse these TOC sequences to work backwards or forwards from a packet with a known granule position (i.e., the last packet completed on some page) in order to assign granule positions to every packet, or even every individual sample. The one exception is the last page in the stream, as described below.
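
As an illustration of how a demuxer might read a packet's duration from its TOC sequence, here is a minimal Python sketch. The configuration table comes from Section 3.1 of the Opus Specification rather than from this page, and the function name <code>packet_samples_48k</code> is made up for the example.

```python
def packet_samples_48k(packet):
    """Number of 48 kHz samples an Opus packet decodes to, read from its TOC sequence."""
    if not packet:
        raise ValueError("zero-byte packet: treat as an illegal TOC sequence")
    toc = packet[0]
    config = toc >> 3                  # top five bits: mode, bandwidth, frame size
    if config >= 16:                   # MDCT (CELT) mode: 2.5, 5, 10 or 20 ms frames
        frame = 120 << (config & 3)
    elif config >= 12:                 # Hybrid mode: 10 or 20 ms frames
        frame = 480 << (config & 1)
    else:                              # LP (SILK) mode: 10, 20, 40 or 60 ms frames
        frame = (480, 960, 1920, 2880)[config & 3]
    code = toc & 3                     # bottom two bits: frame-count code
    if code == 0:
        frames = 1
    elif code in (1, 2):
        frames = 2
    else:                              # code 3: frame count is in the second byte
        if len(packet) < 2:
            raise ValueError("code 3 packet is missing its frame-count byte")
        frames = packet[1] & 0x3F
    total = frames * frame
    if frames == 0 or total > 5760:    # 5760 samples = 120 ms, the maximum duration
        raise ValueError("invalid packet duration")
    return total

print(packet_samples_48k(bytes([0xFC])))   # 960, one 20 ms frame
```

Summing these values forwards or backwards from a page with a known granule position gives the granule position of every packet.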

All other pages with completed packets after the first MUST have a granule position equal to the number of samples contained in packets that complete on that page plus the granule position of the most recent page with completed packets. This guarantees that a demuxer can assign individual packets the same granule position when working forwards as when working backwards. There must not be any gaps. In order to support capturing a stream that uses discontinuous transmission (DTX), an encoder SHOULD emit packets that explicitly request the use of Packet Loss Concealment (PLC) (i.e., with a frame length of 0, as defined in Section 3.2.1 of the Opus Specification) in place of the packets that were not transmitted.

There is some amount of latency introduced during the decoding process, to allow for overlap in the MDCT modes, stereo mixing in the LP modes, and resampling, and the encoder will introduce even more latency (though the exact amount is not specified). Therefore the first few samples produced by the decoder do not correspond to any real, input audio, but are instead composed of padding inserted by the encoder to compensate for this latency. These samples must be stored and decoded, as Opus is an asymptotically convergent predictive codec, meaning the decoded contents of each frame depend on the recent history of decoder inputs. A 'pre-skip' field in the ID header signals the number of samples which should be skipped at the beginning of the stream. This provides sufficient history to the decoder so that it has already converged before the stream's output begins. It may also be used to perform sample-accurate cropping of existing encoded streams. This amount need not be a multiple of 2.5 ms, may be smaller than a single packet, or may span the contents of several packets.

The PCM sample position is determined from the granule position using the formula

'PCM sample position' = 'granule position' - 'pre-skip' .

For example, if the granule position of the first page is 59971, and the pre-skip is 11971, then the PCM sample position of the last decoded sample from the first page is 48000. This may be converted into a playback time using the formula

'playback time' = 'PCM sample position' / 48000.0 .

The initial PCM sample position before any samples are played is normally '0'. In this case, the PCM sample position of the first audio sample to be played starts at '1', because it marks the time on the clock ''after'' that sample has been played, and a stream that is exactly one second long has a final PCM sample position of '48000', as in the example here.
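
The two formulas can be checked against the example values with a short Python sketch (the function names are illustrative only):

```python
def pcm_sample_position(granule_position, pre_skip):
    """PCM sample position = granule position - pre-skip."""
    return granule_position - pre_skip

def playback_time_seconds(granule_position, pre_skip):
    """Playback time in seconds; granule positions always count 48 kHz samples."""
    return pcm_sample_position(granule_position, pre_skip) / 48000.0

print(pcm_sample_position(59971, 11971))     # 48000
print(playback_time_seconds(59971, 11971))   # 1.0
```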

Vorbis streams use a granule position smaller than the number of audio samples contained in the first page to indicate that some of those samples must be trimmed from the output. However, to do so it requires that the first page contains exactly two packets, in order to allow the decoder to perform PCM position adjustments before needing to return any PCM data. Opus uses the pre-skip mechanism for this purpose instead, since the encoder may introduce more than a single packet's worth of latency, and since very large packets in streams with a very large number of channels may not fit on a single page.

The page with the 'end of stream' flag set MAY have a granule position that indicates the page contains less audio data than would normally be returned by decoding up through the final packet. This is used to end the stream somewhere other than an even frame boundary. The granule position of the most recent audio page with completed packets is used to make this determination, or '0' is used if there were no previous audio pages with a completed packet. The difference between these granule positions indicates how many samples to keep after decoding the packets that completed on the final page. The remaining samples are discarded. The number of discarded samples SHOULD be smaller than the number decoded from the last packet.
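
A rough Python sketch of this end trimming, assuming the decoder output for the final page is available as a sequence with one entry per sample position (the function name and the example numbers are invented for illustration):

```python
def trim_final_page(decoded, final_granule, previous_granule=0):
    """Keep only the samples the 'end of stream' granule position accounts for.

    decoded holds the output of the packets completed on the final page;
    previous_granule is 0 when no earlier audio page completed a packet.
    """
    keep = final_granule - previous_granule
    if keep < 0:
        raise ValueError("final granule position is behind the previous page")
    return decoded[:keep]   # the remaining samples are discarded

# Two 20 ms packets (1920 samples) complete on the last page, but the
# granule position only advances by 1500, so 420 samples are dropped.
kept = trim_final_page(list(range(1920)), 97500, 96000)
print(len(kept))   # 1500
```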

The granule position of the first audio page with a completed packet MAY be larger than the number of samples contained in packets that complete on that page, however it MUST NOT be smaller, unless that page has the 'end of stream' flag set. Allowing a granule position larger than the number of samples allows the beginning of a stream to be cropped without rewriting the granule position of all the remaining pages. This means that the PCM sample position just before the first sample to be played may be larger than '0', but the PCM sample position relative to '0' should still be used for the purposes of synchronization when multiplexing with other logical streams. This does not affect the behavior of pre-skip: exactly 'pre-skip' samples should be skipped from the beginning of the decoded output, even if the initial PCM sample position is greater than zero.

On the other hand, a granule position that is smaller than the number of decoded samples prevents a demuxer from working backwards to assign each packet or each individual sample a valid granule position, since granule positions must be non-negative. A decoder MUST reject as invalid any stream where the granule position is smaller than the number of samples contained in packets that complete on the first page with a completed packet, unless that page has the 'end of stream' flag set. It MAY defer this action until it decodes the last packet completed on that page. If that page has the 'end of stream' flag set, a demuxer can work forwards from the granule position '0', but MUST reject as invalid any stream where the granule position is smaller than the 'pre-skip' amount. This would indicate that more samples should be skipped from the initial decoded output than exist in the stream.

==== ID Header ====

  0                   1                   2                   3
  0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
 +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
 |      'O'      |      'p'      |      'u'      |      's'      |
 +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
 |      'H'      |      'e'      |      'a'      |      'd'      |
 +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
 |  version = 1  | channel count |           pre-skip            |
 +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
 |                original input sample rate in Hz               |
 +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
 |    output gain Q7.8 in dB     |  channel map  |               |
 +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+               :
 |                                                               |
 :               optional channel mapping table...               :
 |                                                               |
 +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+

Brief description of each field:

* Magic signature: "OpusHead" (64 bits)
* Version number (8 bits unsigned): 0x01 for this spec
* Channel count 'c' (8 bits unsigned): MUST be > 0
* Pre-skip (16 bits unsigned, little endian)
* Input sample rate (32 bits unsigned, little endian): informational only
* Output gain (16 bits, little endian, signed Q7.8 in dB) to apply when decoding
* Channel mapping family (8 bits unsigned)
** 0 = one stream: mono or L,R stereo
** 1 = channels in vorbis spec order: mono or L,R stereo or ... or FL,C,FR,RL,RR,LFE, ...
** 2..254 = reserved (treat as 255)
** 255 = no defined channel meaning
* If channel mapping family > 0:
** Stream count 'N' (8 bits unsigned): MUST be > 0
** Two-channel stream count 'M' (8 bits unsigned): MUST satisfy M <= N, M+N <= 255
** Channel mapping (8*c bits)
*** one stream index (8 bits unsigned) per channel (255 means silent throughout the file)
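
The field list above maps directly onto a fixed 19-byte prefix followed by the optional mapping table. The Python sketch below shows one way to parse it with the standard <code>struct</code> module; the <code>OpusHead</code> tuple and <code>parse_opus_head</code> are names invented for this example, and the checks cover only the constraints stated on this page.

```python
import struct
from typing import NamedTuple, Tuple

class OpusHead(NamedTuple):
    version: int
    channels: int
    pre_skip: int
    input_sample_rate: int
    output_gain_q78: int
    mapping_family: int
    stream_count: int
    coupled_count: int
    channel_mapping: Tuple[int, ...]

def parse_opus_head(packet):
    """Parse an ID header packet; all multi-byte fields are little endian."""
    if len(packet) < 19 or packet[:8] != b"OpusHead":
        raise ValueError("not an OpusHead packet")
    version, channels, pre_skip, rate, gain, family = struct.unpack_from(
        "<BBHIhB", packet, 8)
    if version > 15:                      # upper four bits set: incompatible
        raise ValueError("incompatible version %d" % version)
    if channels == 0:
        raise ValueError("channel count MUST be > 0")
    if family == 0:                       # implicit: one stream, mono or stereo
        if channels > 2:
            raise ValueError("family 0 allows only 1 or 2 channels")
        return OpusHead(version, channels, pre_skip, rate, gain, family,
                        1, channels - 1, tuple(range(channels)))
    if len(packet) < 21 + channels:
        raise ValueError("truncated channel mapping table")
    streams, coupled = packet[19], packet[20]
    if streams == 0 or coupled > streams or streams + coupled > 255:
        raise ValueError("invalid stream counts")
    mapping = tuple(packet[21:21 + channels])
    return OpusHead(version, channels, pre_skip, rate, gain, family,
                    streams, coupled, mapping)
```

For a stereo family 0 header the function fills in the implied values, a stream count of 1, a two-channel stream count of 1 and the mapping (0, 1), so callers can treat both layouts uniformly.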

Detailed definition of each field:

* '''Magic signature'''
The magic signature "OpusHead" allows codec identification and is human readable. Starting with 'Op' helps distinguish it from data packets, as this is an invalid TOC sequence.

* '''Version'''
The version number MUST always be '1' for this version of the encapsulation specification.

Implementations SHOULD treat streams where the upper four bits of the version number match a recognized specification as backwards-compatible with that specification. That is, the version number can be considered split into "major" and "minor" version sub-fields, with changes to the "minor" sub-field in the lower four bits signaling compatible changes. For example, a decoder implementing this specification SHOULD accept any stream with a version number 15 or less, and SHOULD assume any stream with a version number 16 or greater is incompatible. The initial version '1' was chosen to keep implementations from relying on this byte as a null terminator for the OpusHead string.

* '''Channel count''' 'c'
The number of channels byte specifies the number of output channels (1...255) for this Ogg Opus stream.

* '''Pre-skip'''
This is the number of samples (at 48 kHz) to discard from the decoder output when starting playback, and also the number to subtract from a page's granule position to calculate its PCM sample position.

When constructing cropped Ogg Opus streams, a pre-skip of at least 3840 samples (80 ms) is RECOMMENDED to ensure complete convergence.

* '''Input sample rate'''
This is ''not'' the sample rate to use for playback of the encoded data.

Opus has a handful of coding modes, with internal audio bandwidths of 4, 6, 8, 12, and 20 kHz. Each packet in the stream may have a different audio bandwidth. Regardless of the audio bandwidth, the reference decoder supports decoding any stream at a sample rate of 8, 12, 16, 24, or 48 kHz. The original sample rate of the encoder input is not preserved by the lossy compression.

An Ogg Opus player SHOULD select the playback sample rate according to the following procedure:
* If the hardware supports 48 kHz playback, decode at 48 kHz,
* else if the hardware's highest available sample rate is a supported rate, decode at this sample rate,
* else if the hardware's highest available sample rate is less than 48 kHz, decode at the next higher supported rate and resample,
* else decode at 48 kHz and resample.
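
The selection procedure can be written as a small function. This is a sketch in Python, assuming the hardware exposes a set of playback rates in Hz; the function name and the returned (rate, needs resampling) pair are choices made for the example.

```python
SUPPORTED_RATES = (8000, 12000, 16000, 24000, 48000)   # reference decoder rates in Hz

def choose_decode_rate(hardware_rates):
    """Return (decode_rate, needs_resampling) following the procedure above."""
    rates = set(hardware_rates)
    if not rates:
        raise ValueError("no hardware sample rates given")
    if 48000 in rates:
        return 48000, False
    highest = max(rates)
    if highest in SUPPORTED_RATES:
        return highest, False
    if highest < 48000:
        # decode at the next higher supported rate, then resample down
        return min(r for r in SUPPORTED_RATES if r > highest), True
    return 48000, True

print(choose_decode_rate({44100}))   # (48000, True)
print(choose_decode_rate({22050}))   # (24000, True)
```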

However, the 'input sample rate' field allows the encoder to pass the sample rate of the original input stream as metadata. This may be useful when the user requires the output sample rate to match the input sample rate. For example, a non-player decoder writing PCM format to disk might choose to resample the output audio back to the original input rate to reduce surprise to the user, who might reasonably expect to get back a file with the same sample rate as the one they fed to the encoder.

A value of zero indicates 'unspecified'. Encoders SHOULD write the actual input rate or zero, but decoder implementations which do something with this field SHOULD take care to behave sanely if given crazy values (e.g. don't actually upsample the output to 10 MHz if requested).

* '''Output gain'''
This is a gain to be applied by the decoder. Virtually all players and media frameworks should apply it by default. If a player chooses to apply any volume adjustment or gain modification, such as the R128_TRACK_GAIN or a user-facing volume knob, the adjustment MUST be applied ''in addition'' to this output gain in order to achieve playback at the desired volume.

An encoder SHOULD set the output gain to zero, and instead apply any gain prior to encoding, when this is possible and does not conflict with the user's wishes. The output gain should only be nonzero when the gain is adjusted after encoding, or when the user wishes to adjust the gain for playback while preserving the ability to recover the original signal amplitude.

Although the output gain has enormous range (+/- 128 dB, enough to amplify inaudible sounds to the threshold of physical pain), most applications can only reasonably use a small portion of this range around zero. The large range serves in part to ensure that gain can always be losslessly transferred between OpusHead and R128_TRACK_GAIN (see below) without saturating.

The gain is the 20 log10 ratio of output to input sample values to be applied to the decoder output. E.g. <code>sample *= pow(10, header.gain/(20.*256))</code> where header.gain is the raw 16 bit Q7.8 value from the header.
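
The same conversion as a small Python sketch, with the raw Q7.8 value first turned into dB and then into a linear scale factor (the helper names are illustrative):

```python
def gain_q78_to_db(raw):
    """Raw signed 16-bit Q7.8 header value -> gain in dB."""
    return raw / 256.0

def gain_q78_to_linear(raw):
    """Raw Q7.8 value -> factor to multiply each decoded sample by."""
    return pow(10.0, raw / (20.0 * 256.0))

def apply_output_gain(samples, raw):
    scale = gain_q78_to_linear(raw)
    return [s * scale for s in samples]

print(gain_q78_to_db(-1536))                 # -6.0
print(round(gain_q78_to_linear(-1536), 4))   # 0.5012
```

A raw value of 0 leaves the samples unchanged, and every 256 steps add 1 dB.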

* '''Channel mapping family'''
This byte indicates the order and semantic meaning of the various channels encoded in each Opus packet.

Each possible value of this byte indicates a ''mapping family'', which defines a set of allowed numbers of channels, and the ordered set of channel names for each allowed number of channels. Currently there are three defined mapping families, although more may be added:

* Family 0 (RTP mapping)
** Allowed numbers of channels: 1 or 2
** 1 channel: monophonic (mono)
** 2 channels: stereo (left, right)
** '''Special mapping''': this channel mapping value also indicates that the contents consist of a single Opus stream that is stereo if and only if c==2, with stream index 0 mapped to channel 0, and (if stereo) stream index 1 mapped to channel 1. When the channel mapping byte has this value, no further fields are present in OpusHead.
* Family 1 ([https://www.xiph.org/vorbis/doc/Vorbis_I_spec.html#x1-810004.3.9 Vorbis channel order])
** Allowed numbers of channels: 1 ... 8
** Channel meanings depend on the number of channels, see the Vorbis mapping for details.
* Family 255 (no defined channel meaning)
** Allowed numbers of channels: 1...255
** Channels are unidentified. General-purpose players SHOULD NOT attempt to play these streams, and offline decoders MAY deinterleave the output into separate PCM files, one per channel. Decoders SHOULD NOT produce output for channels mapped to stream index 255 (pure silence) unless they have no other way to indicate the index of non-silent channels.

The remaining channel mapping families (2...254) are reserved. A decoder encountering a reserved mapping byte should act as though the mapping byte is 255.

An Ogg Opus player MUST play any Ogg Opus stream with a channel mapping family of 0 or 1, even if the number of channels does not match the physically connected audio hardware. Players SHOULD perform channel mixing to increase or reduce the number of channels as needed.

* '''Stream count''' 'N'
This field indicates the total number of streams so the decoder can correctly parse the packed Opus packets inside the Ogg packet.

For channel mapping family 0, this value defaults to 1, and is not coded.

A multi-channel Opus file is composed of one or more individual Opus streams, each of which produces one or two channels of decoded data. Each Ogg packet contains one Opus packet from each stream. The first N-1 Opus packets are packed using the self-delimiting framing from Appendix B of the Opus Specification. The remaining Opus packet is packed using the regular, undelimited framing from Section 3 of the Opus Specification. All the Opus packets in a single Ogg packet MUST be constrained to produce the same number of decoded samples. A decoder SHOULD treat any Opus packet whose duration is different from that of the first Opus packet in an Ogg packet as if it were an Opus packet with an illegal TOC sequence.

* '''Two-channel stream count''' 'M'
Describes the number of streams whose decoders should be configured to produce two channels. This must be no larger than the number of total streams.

For channel mapping family 0, this value defaults to c-1 (i.e., 0 for mono and 1 for stereo), and is not coded.

Each packet in an Opus stream has an internal channel count of 1 or 2, which can change from packet to packet. This is selected by the encoder depending on the bitrate and the contents being encoded. The original channel count of the encoder input is not preserved by the lossy compression.

Regardless of the internal channel count, any Opus stream may be decoded as mono (a single channel) or stereo (two channels) by appropriate initialization of the decoder. The "two-channel stream count" field indicates that the first M Opus decoders should be initialized in stereo mode, and the remaining N-M decoders should be initialized in mono mode. The total number of decoded channels (M+N) MUST be no larger than 255, as there is no way to index more channels than that in the channel mapping.

* '''Channel mapping'''
Contains one index per output channel indicating which decoded channel should be used. If the index is less than 2*M, the output MUST be taken from decoding stream (index/2) as stereo and selecting the left channel if index is even, and the right channel if index is odd. If the index is 2*M or larger, the output MUST be taken from decoding stream (index-M) as mono. As a special case, an index of 255 means that the corresponding output channel MUST contain pure silence.

For channel mapping family 0, the first index defaults to 0, and if c==2, the second index defaults to 1. Neither index is coded.

The number of output channels (c) is not constrained to match the number of decoded channels (M+N). A single index MAY appear multiple times, i.e., the same decoded channel may be mapped to multiple output channels. Some decoded channels might not be assigned to any output channel, as well.
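
The index rules for the channel mapping can be sketched in a few lines of Python. The function name is hypothetical, and the six-channel table in the usage lines is only an illustrative example with N=4 streams and M=2 two-channel streams.

```python
def resolve_channel(index, coupled_streams):
    """Map one channel-mapping index to (stream, side), or None for silence.

    coupled_streams is M, the two-channel stream count from the ID header.
    side is 'L' or 'R' for a stream decoded as stereo, 'mono' otherwise.
    """
    if index == 255:
        return None                              # pure silence
    if index < 2 * coupled_streams:
        return index // 2, ("L" if index % 2 == 0 else "R")
    return index - coupled_streams, "mono"

mapping = [0, 4, 1, 2, 3, 5]                     # one index per output channel
print([resolve_channel(i, 2) for i in mapping])
# [(0, 'L'), (2, 'mono'), (0, 'R'), (1, 'L'), (1, 'R'), (3, 'mono')]
```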

==== Comment Header ====

- 8 byte 'OpusTags' magic signature (64 bits)
- The remaining data follows the vorbis-comment header design used in OggVorbis (without the "framing-bit"), OggTheora, and Speex:
* Vendor string (always present).
** 4-byte little-endian length field, followed by length bytes of UTF-8 vendor string.
* TAG=value metadata strings (zero or more).
** 4-byte little-endian string count.
** Count strings consisting of 4-byte little-endian length and length bytes of UTF-8 string in "tag=value" form.

One new comment field is introduced for Ogg Opus:
R128_TRACK_GAIN=-573
representing the volume shift needed to normalize the track's volume. The gain is a Q7.8 fixed point number in dB, as in the OpusHead "output gain" field. This field is similar to the [[VorbisComment#Replay_Gain|REPLAYGAIN_TRACK_GAIN field in Vorbis]], although the normal volume reference is the [https://tech.ebu.ch/loudness EBU-R128] standard.

An Ogg Opus file MUST NOT have more than one such field, and if present its value MUST be an integer from -32768 to +32767 inclusive, represented in ASCII with no whitespace. If present, it MUST correctly represent the R128 normalization gain (relative to the OpusHead output gain). If a player chooses to make use of the TRACK_GAIN, it MUST be applied ''in addition'' to the OpusHead output gain. If an encoder populates the TRACK_GAIN field, and the output gain is not otherwise constrained or specified, the encoder SHOULD write the R128 gain into the OpusHead output gain and write "R128_TRACK_GAIN=0". If a tool modifies the OpusHead "output gain" field, it MUST also update or remove the R128_TRACK_GAIN comment field.

There is no comment field corresponding to Replaygain's ALBUM_GAIN; that information should instead be stored in the OpusHead 'output gain' field.

To avoid confusion with multiple normalization schemes, an OpusTags packet SHOULD NOT contain any of the REPLAYGAIN_TRACK_GAIN, REPLAYGAIN_TRACK_PEAK, REPLAYGAIN_ALBUM_GAIN, or REPLAYGAIN_ALBUM_PEAK fields.
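
A minimal Python sketch of reading the comment header and extracting the track gain, assuming the whole OpusTags packet is already in memory. The function names are invented for the example. Matching the field name without regard to case follows the usual vorbis-comment convention, and <code>int()</code> is more lenient than the "no whitespace" rule, so a strict validator would need an extra check.

```python
import struct

def parse_opus_tags(packet):
    """Return (vendor, comments) from an OpusTags packet."""
    if packet[:8] != b"OpusTags":
        raise ValueError("not an OpusTags packet")
    pos = 8

    def read_u32():
        nonlocal pos
        if pos + 4 > len(packet):
            raise ValueError("truncated comment header")
        (value,) = struct.unpack_from("<I", packet, pos)   # 4-byte little endian
        pos += 4
        return value

    def read_string():
        nonlocal pos
        length = read_u32()
        if pos + length > len(packet):
            raise ValueError("truncated comment header")
        text = packet[pos:pos + length].decode("utf-8")
        pos += length
        return text

    vendor = read_string()
    comments = [read_string() for _ in range(read_u32())]
    return vendor, comments

def r128_track_gain_db(comments):
    """R128_TRACK_GAIN in dB, or None if absent; it is a Q7.8 value like the output gain."""
    values = [c.split("=", 1)[1] for c in comments
              if c.upper().startswith("R128_TRACK_GAIN=")]
    if not values:
        return None
    if len(values) > 1:
        raise ValueError("more than one R128_TRACK_GAIN field")
    raw = int(values[0])
    if not -32768 <= raw <= 32767:
        raise ValueError("R128_TRACK_GAIN out of range")
    return raw / 256.0
```

With the example field <code>R128_TRACK_GAIN=-573</code> this yields about -2.24 dB, to be applied in addition to the OpusHead output gain.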

== Other Implementation Notes ==

When seeking within an Ogg Opus stream, the decoder should start decoding (and discarding the output) at least 3840 samples (80 ms) prior to the seek point in order to ensure that the output audio is correct at the seek point.

Technically valid Opus packets can be arbitrarily large due to the padding format, although the amount of non-padding data they can contain is bounded. These packets may be spread over a similarly enormous number of Ogg pages. Encoders SHOULD use no more padding than required to make a variable bitrate (VBR) stream constant bitrate (CBR). Decoders SHOULD avoid attempting to allocate excessive amounts of memory when presented with a very large packet. The presence of an extremely large packet in the stream could indicate a potential memory exhaustion attack or stream corruption. Decoders should reject a packet that is too large to process, and print a warning message.

In an Ogg Opus stream, the largest possible valid packet that does not use padding has a size of (61,298*N - 2) bytes, or about 60 kB per Opus stream. With 255 streams, this is 15,630,988 bytes (14.9 MB) and can span up to 61,298 Ogg pages, all but one of which will have a granulepos of -1. This is of course a very extreme packet, consisting of 255 streams, each containing 120 ms of audio encoded as 2.5 ms frames, each frame using the maximum possible number of bytes (1275) and stored in the least efficient manner allowed (a VBR code 3 Opus packet). Even in such a packet, most of the data will be zeros, as 2.5 ms frames, which are required to run in the MDCT mode, cannot actually use all 1275 bytes. The largest packet consisting entirely of useful data is (15,326*N - 2) bytes, or about 15 kB per stream. This corresponds to 120 ms of audio encoded as 10 ms frames in either LP or Hybrid mode, but at a data rate of over 1 Mbps, which makes little sense for the quality achieved. A more reasonable limit is (7,664*N - 2) bytes, or about 7.5 kB per stream. This corresponds to 120 ms of audio encoded as 20 ms stereo MDCT-mode frames, with a total bitrate just under 511 kbps (not counting the Ogg encapsulation overhead). With N=8, the maximum useful number of streams for the channel meanings currently defined by mapping family 1, this gives a maximum packet size of 61,310 bytes, or just under 60 kB. This is still quite conservative, as it assumes each output channel is taken from one decoded channel of a stereo packet. An implementation could reasonably choose any of these numbers for its internal limits.
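
The figures in this paragraph are easy to re-derive, which may help when choosing an internal limit. A short Python sketch of the arithmetic follows; the function name is made up for the example.

```python
def max_packet_bytes(per_stream_bytes, streams):
    """Size limit of the form (per_stream_bytes*N - 2) used in the text."""
    return per_stream_bytes * streams - 2

extreme = max_packet_bytes(61298, 255)
print(extreme)                        # 15630988
print(-(-extreme // 255))             # 61298 pages if each page carries only 255 bytes
print(max_packet_bytes(7664, 8))      # 61310

# 7,664 bytes carrying 120 ms of audio, in bits per second
print(round(7664 * 8 / 0.120))        # 510933, just under 511 kbps
```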

== Test Vectors ==

* [[OggOpus/testvectors|Planned test vectors for OggOpus]]
* Opus test vectors

[[Category:Ogg]]
[[Category:Opus]]

OpusTodo

2016-04-30T04:15:54Z

MarkH: Ogg mapping now published as RFC 7845

== For 1.1.3 ==
* aarch64 and AVX optimizations

== For 1.2 ==
* Low bitrate quality improvements
* Fix compilation as a single module for gecko

== Spec ==
* Matroska mapping. See [[MatroskaOpus]] and the Firefox/FFmpeg implementations.
* RTP payload format. Mono/stereo mapping is complete ([https://tools.ietf.org/html/rfc7587 RFC 7587]); no multichannel mapping yet.
* MP4 mapping. See the [https://opus-codec.org/docs/opus_in_isobmff.html ISO Base Media File Format draft].

== Website ==
* De-uglify webpage - some suggestions:
** write about codecs obsoleted by Opus (Speex, CELT, Vorbis(?) and the proprietary ones)
** write about implementations (libopus encoder/decoder, libavcodec decoder, any others?)
** [https://en.wikipedia.org/wiki/Comparison_of_audio_coding_formats audio codec comparison table] (Opus, Vorbis, Speex, ..., MP5) of features (channels, freq, bits per sample, license, language (C89), integer impl. (Vorbis decoder only, Opus YES, ...)
** future use in video files (Theora? Dirac? WebM? other future codecs...)
** audio files for storage (like Vorbis, no raw Opus defined, only inside Ogg), ...
* Promotional material (some nice free/public-domain sounds/radio stations in Opus format)

== Other ==

* Oggz-validate (should also validate opus toc)

== Opus-tools ==
* Port opusdec to libopusfile/libopusurl.
* A simple real time streaming example tool
** Start with opusrtp.c in [https://git.xiph.org/?p=opus-tools.git opus-tools]
** Make <code>opusrtp rtp://example.com:5431/</code> listen to that host and port and mux packets from there. Generalize the pcap-based --sniff implementation.
** Make sending similarly generic. Maybe just <code>opusrtp source.opus -o rtp://example.com:5431/</code> to send source.opus out to the destination?
** Make --sniff save one file per
** Implement DTLS-SRTP. See webrtc.
** audio capture/encode, decode/playback?
** Parse and act on sdp for convenience and testing.

* EBU R128/Replaygain (half done— needs a gain tool)

== Surround work ==

* Apply spreading to energy masking
* More conservative energy masking (not just mean difference) and dynalloc
* Allow SILK/hybrid on center channel for voice?

== Psychoacoustic stuff ==

* Adaptive width narrowing and forced intensity stereo bands

== Optimisations ==

* Vectorising comb_filter()
* Use 16-bit mul plus shift in denormalise_bands()
* Optimise MDCT somehow

== Third-Party tool enhancements ==
* mutagen: [https://bitbucket.org/lazka/mutagen/issue/202/oggopus-support-in-place-rewrites-for support padding in comments header], [https://bitbucket.org/lazka/mutagen/issue/203/oggopus-allow-updating-the-output_gain allow updating output gain in ID header]

== Future work ==
* psymodel based VBR
* Remove copy in inverse MDCT
* Save some float<->int conversions
* Improvements to LP mode CBR (greg has some code)
* Unconstrained SILK VBR
* Better handling for the case where FEC has a different bandwidth than the current mode
* PLC transitions on unprotected SILK-SILK bandwidth changes?
* Figure out how to use speech/music detection optimally
** find optimal switching time (low energy/tonality)
* Improve variable frame size

[[Category:Opus]]

OggOpusImplementation

2016-04-30T04:09:32Z

MarkH: Ogg Opus draft is now RFC 7845

== Implementation Status ==

Implementation status of [https://tools.ietf.org/html/rfc7845 RFC 7845]. This Internet Standards Track document describes encapsulation of Opus audio in the Ogg container to make <tt>.opus</tt> files and streams.

What follows is a brief summary of major implementations of the RFC, and their status.
This is intended to help understand the status of each portion of the RFC, per [https://tools.ietf.org/html/rfc6982 RFC 6982].

=== opus-tools ===

The initial development implementation of this RFC was in the opusenc, opusdec, and opusinfo command-line utilities, part of the opus-tools package and repository.
While still in 'development' status (pre-1.0), these utilities are in active public use, and have shipped with Linux distributions as well as homebrew and MacPorts for OS X.
Together they implement basic read, write and playback support of Ogg Opus files including metadata, multichannel, start and end trimming, the gain field, live streams, and chained files, but currently do not support seeking.

This implementation is open source.

* https://git.xiph.org/?p=opus-tools.git
* http://www.opus-codec.org/downloads/

=== opusfile ===

The opusfile library is a separate implementation of this RFC as a helper library for demuxing and decoding.
Like opus-tools, it supports metadata, multichannel, start and end trimming, the gain field, live streams, and chained files.
Its primary focus is efficient seeking, including over HTTP(S) and in chained streams.
It currently does not create Ogg Opus files.
This library is in early development and is not widely deployed, though several projects are currently using it, including xmms2, taglib, and cmus, and it is shipped in some Linux distributions and in homebrew.

This implementation is open source.

* https://git.xiph.org/?p=opusfile.git
* http://www.opus-codec.org/downloads/

=== Firefox ===

The Firefox web browser is a widely deployed implementation of this RFC.
Basic playback support with the HTML5 <audio> element, including start and end trimming, the gain field, live streams, multiplexing with other streams (for, e.g., the <video> tag), and seeking, was added in Firefox 15, in production release starting August 28, 2012.
Multichannel support was added in Firefox 17, in production release starting November 20, 2012.
Metadata support was added in Firefox 18, in production release starting January 8, 2013.
Chained file support (as streams only, with seeking disabled) was added in Firefox 20, in production release starting April 2, 2013.
Encoding support was added in Firefox 26, in production release starting December 10, 2013.

This implementation is open source.

* https://mozilla.org/firefox/
* https://hacks.mozilla.org/2012/08/opus-support-for-webrtc/
* https://bugzilla.mozilla.org/show_bug.cgi?id=674225
* https://bugzilla.mozilla.org/show_bug.cgi?id=748144
* https://bugzilla.mozilla.org/show_bug.cgi?id=778050
* https://bugzilla.mozilla.org/show_bug.cgi?id=455165
* https://bugzilla.mozilla.org/show_bug.cgi?id=842243

=== Chrome ===

Google Chrome is a widely distributed implementation of this RFC. It added support with the HTML5 <audio> element in M25 and enabled it by default in M33 released in February 2014.
This implementation currently does not support chained files.
Prior to M33 support required passing --enable-opus-playback on the command line when invoking the executable.

This implementation is based on open source code in Chromium, Blink, and FFmpeg.

* https://www.google.com/intl/en/chrome/browser/
* https://www.google.com/intl/en/chrome/browser/canary.html
* https://code.google.com/p/chromium/issues/detail?id=104241

=== GStreamer ===

The GStreamer media framework includes an implementation of this RFC.
It supports metadata, multichannel, start and end trimming, the gain field, live streams, chained files, multiplexing with other streams (e.g., video), and seeking.
Support was first added in early 2011, and is part of the 0.11 and 1.0.x releases.
The code implementing this RFC is in the gst-plugins-bad collection, which generally indicates unsupported and/or experimental code, despite its release status.

This implementation is open source.

* http://gstreamer.net/
* http://cgit.freedesktop.org/gstreamer/gst-plugins-bad/

=== FFmpeg ===

The popular media framework and conversion tool FFmpeg implements this RFC. It supports encoding and decoding, multiplexing and demultiplexing with other streams, metadata, multichannel, start and end trimming, the gain field, live streams, and seeking.

This implementation is open source.

* https://ffmpeg.org/

=== libav ===

The development repository for libav implements this RFC, similar to FFmpeg.

This implementation is open source.

* https://libav.org/

=== VLC ===

VLC is another widely deployed implementation of demuxing, decoding, and playback support for this RFC.
It supports metadata, multichannel, start and end trimming, the gain field, live streams, seeking, chained files (though seeking does not work correctly with chained files), and multiplexing with other streams (e.g., video).
Opus support was added in version 2.0.4, released on October 18, 2012.

This implementation is open source.

* https://www.videolan.org/vlc/
* https://git.videolan.org/?p=vlc.git
* https://trac.videolan.org/vlc/ticket/7185

=== foobar2000 ===

A popular Windows application, foobar2000 implements read, write, and playback support for this RFC.
It supports metadata, multichannel, start and end trimming, the gain field, live streams, chained files, and seeking.
Opus support was added in version 1.1.14, released on August 17, 2012.
Encoding support is implemented using opusenc from opus-tools.

This implementation is closed source.

* http://www.foobar2000.org/

=== Rockbox ===

Rockbox is an established alternative firmware for portable music players (typically small, embedded devices) that implements demuxing, decoding, and playback support for this RFC starting with version 3.13 released March 5, 2013.
It supports metadata, start and end trimming, the gain field, and seeking.
It does not currently support multichannel or chained files.

This implementation is open source.

* http://www.rockbox.org/
* http://git.rockbox.org/?p=rockbox.git
* http://gerrit.rockbox.org/r/#/c/300/

=== Youki3 ===

Youki3 is a media player for the Android mobile operating system. It reads Opus metadata via TagLib ((c) Scott Wheeler; ported to Android) and plays Opus audio via LibVLC ((C) VideoLAN developers; see the VLC section above).

* https://play.google.com/store/apps/details?id=net.mderezynski.youki3

The app itself is currently closed source, but the LibVLC code it uses for playback is open source.

=== Mutagen ===

Mutagen is a Python module to handle audio metadata, supporting Ogg Opus among many other media formats. It has support for editing the vorbis-style comment fields in Ogg Opus since version 1.21 (2013-01). In 2014-11 (unreleased at time of writing) it added support for preserving marked comment padding as specified in the RFC. It is used by the MusicBrainz Picard tagger, Beets music library manager, Ex Falso and Quod Libet tagger and player, among many other applications.

This implementation is open source, licensed under the GPL-2.

* https://mutagen.readthedocs.org/
* https://bitbucket.org/lazka/mutagen

[[Category:Ogg]]
[[Category:Opus]]

OpusFAQ

2016-04-28T23:07:20Z

MarkH: update Opus RTP reference

If you are looking for info not covered in this FAQ, try the [http://opus-codec.org main Opus website] or the pages included in the [[:Category:Opus|Opus category]] of this wiki.

[[Image:Opus logo trans.png|right]]

== General Questions ==

=== What is Opus? Who created it? ===

Opus is a totally open, royalty-free, highly versatile audio codec.

It is primarily designed for interactive speech and music transmission over the Internet, but is also applicable to storage and streaming applications. It incorporates technology from Skype's '''[https://en.wikipedia.org/wiki/SILK SILK]''' codec and Xiph.Org's '''[http://celt-codec.org/ CELT]''' codec. It has been standardized by the Internet Engineering Task Force (IETF) as '''[http://tools.ietf.org/html/rfc6716 RFC 6716]'''.

Opus has been in development since early 2007. Programmers associated with '''[http://xiph.org/ Xiph.Org]''', '''[https://www.skype.com/ Skype]''' and several other organizations have contributed to its development and to the standardization process as part of the '''[https://datatracker.ietf.org/wg/codec/charter/ IETF's Codec Working Group]'''.

=== How does Opus compare to other codecs? ===

Opus is distinguished from most high quality formats (eg: AAC, [[Vorbis]], MP3) by having '''low delay''' (5 ~ 66.5 ms) and distinguished from most low delay formats (eg: G.711, GSM, Speex) by supporting '''high audio quality''' ([https://tools.ietf.org/html/rfc6716#section-2.1.1 details here]).

It meets or exceeds existing codecs' quality across a wide range of bitrates, and it operates at lower delay than virtually any existing compressed format.

Most importantly, the Opus format and its reference implementation are both available under '''[http://opus-codec.org/license/ liberal, royalty-free licenses]'''.
This makes it:
* easy to adopt
* compatible with free software
* suitable for use as part of the basic infrastructure of the Internet

See the Opus '''[http://opus-codec.org/comparison comparison page]''' for more details.

=== Does Opus make all those other lossy codecs obsolete? ===

Theoretically, yes.

From a technical point of view (loss, delay, bitrates, ...) it should replace both [[Vorbis]] and [[Speex]], and the common proprietary codecs too.

=== Will Opus replace Vorbis in video files? ===

For Ogg [[Theora]] video files it can, but the overall size reduction will be minimal and compatibility with existing players will break.

For WebM video files, the convention is to use the VP9 video codec when using Opus as an audio codec.

=== How do I use Opus? What programs support Opus? ===

Opus decoding support is now included in '''[http://caniuse.com/opus some Internet browsers]''' and many applications, including '''[https://www.mozilla.org/firefox Firefox]''', '''[https://www.foobar2000.org/ foobar2000]''' and '''[https://www.videolan.org/vlc/ VLC]''', as well as in frameworks such as '''[http://gstreamer.freedesktop.org/ GStreamer]''' and '''[https://ffmpeg.org/ FFmpeg]'''.

For now, the best way to '''encode''' Opus files is to use the '''opusenc''' command-line tool from the '''[http://opus-codec.org/downloads/ opus-tools package]'''.
If you want to encode many files at once (e.g. your music library), you can also try the '''[http://lamexp.sourceforge.net/ LameXP]''' converter.

For real-time applications, Opus support is available in '''[http://www.webrtc.org/ Google's WebRTC codebase]'''.

Opus is a relatively new codec: many more applications will support it in the near future.

=== Does Opus support higher sampling rates, such as 96 kHz or 192 kHz? ===

Yes and no.

Opus encoding tools like opusenc will happily encode input files that are sampled at 96 or 192 kHz.

However, files at these rates are internally '''converted to 48 kHz''' and then only frequencies '''up to 20 kHz''' are encoded.

The reason is simple: lossy codecs are designed to preserve audible details while discarding irrelevant information. Since the human ear can only hear up to 20 kHz at best (usually lower than that), frequency content above 20 kHz is the first thing to go.

See Monty's '''[http://people.xiph.org/~xiphmont/demo/neil-young.html article]''' for more details.

If you want a codec to handle higher sampling rates losslessly, use '''[[FLAC]]'''!

=== What are the licensing requirements? ===

The reference Opus source code is released under a three-clause BSD license, which is a very permissive Open Source license. Commercial use and distribution (including in proprietary software) is permitted, provided that some basic conditions specified in the license are met.

Opus is also covered by some patents, for which royalty-free usage rights are granted, under conditions that the authors believe are compatible with (hopefully) all open source licenses, including the GPL (v2 and v3).

See the '''[http://www.opus-codec.org/license/ Opus Licensing]''' page for details.

=== Why make Opus free? ===

On the Internet, protocol and codec standards are part of the common infrastructure everyone builds upon.

Most of the value of a high-quality standard is the innovation and inter-operation provided by the systems built on top of it. When a few parties have monopoly rights to monetize a standard, that infrastructure stops being so common and everyone else has more reason to use their own solution instead, increasing cost and reducing efficiency.

Imagine a road system where each type of car could only drive on its own manufacturer's pavement. We all benefit from living in a world where all the roads are connected.

This is why Opus, unlike many codecs, is free.

=== Is the SILK part of Opus compatible with the SILK implementation shipped in Skype? ===

No.

The SILK codec, as submitted by Skype to the IETF, was heavily modified as part of its integration within Opus. The modifications are significant enough that it is not possible to just write a "translator". Even sharing code between Opus and the "old SILK" would be highly complex.

=== Why not keep the SILK and CELT codecs separate? ===
Opus is more than just two independent codecs with a switch.

In addition to a [https://en.wikipedia.org/wiki/Linear_predictive_coding Linear Prediction] '''SILK mode''' and an [https://en.wikipedia.org/wiki/Modified_discrete_cosine_transform MDCT] '''CELT mode''' it has a '''hybrid mode''', where speech frequencies up to 8 kHz are encoded with LP while those between 8 and 20 kHz are encoded with MDCT. This is what allows Opus to have such high speech quality around 32 kbps.

Another advantage of the integration is the ability to switch between these 3 modes seamlessly, without any audible "glitches" and without any out-of-band signalling.

=== Now that Opus is standardized, will its development stop, or can it be further improved? ===
Yes, Opus '''can''' and '''should''' be improved, because unlike most ITU-T codecs, Opus is only defined in terms of its decoder.

The encoder can keep evolving as long as the bitstream it produces can be decoded by the reference decoder. This is what made it possible for modern MP3 encoders (e.g. '''[https://en.wikipedia.org/wiki/LAME LAME]''') to improve far beyond the original '''[https://en.wikipedia.org/wiki/L3enc L3enc]''' and '''dist10''' reference implementations.

Although it is unlikely that Opus encoders will see such a spectacular evolution, we certainly hope that future encoders will become much better than the reference encoder.

In fact, the 1.1 libopus release significantly improves on the reference encoder's quality. See '''[https://people.xiph.org/~xiphmont/demo/opus/demo3.shtml Monty's demo]''' for more details.

=== Will all future Opus releases comply with the [http://tools.ietf.org/html/rfc6716 Opus specification]? ===

Yes.

=== In what ways is Opus optimized for the Internet? ===

Opus has good packet loss robustness and concealment, but its optimisations go further.

One of the first things we were asked when designing Opus was to make the rate '''really''' adaptable, because we never know what kind of rates will be available. This meant not only having a wide range of bitrates, but also being able to vary the rate in small increments.

This is why Opus scales from about '''6''' to '''512 kb/s''', in increments of '''0.4 kb/s''' (one byte with 20 ms frames). Opus can have '''more than 1200 possible bitrates''' while spending '''zero bits''' signalling the bitrate, because UDP already encodes the packet size.
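
The 0.4 kb/s step and the count of possible bitrates can be checked with a short calculation. This is only a sketch: it assumes one 20 ms frame per packet, takes 1275 bytes as the largest Opus frame, and treats 15 bytes (6 kb/s) as the smallest payload of interest. The helper name is hypothetical.

```python
def bitrate_kbps(payload_bytes, frame_ms=20):
    """Bitrate in kbit/s for one frame of payload_bytes every frame_ms."""
    return payload_bytes * 8 / frame_ms  # bits per millisecond == kbit/s


MIN_BYTES = 15     # assumption: 6 kb/s at 20 ms
MAX_BYTES = 1275   # largest Opus frame

print(bitrate_kbps(1))                 # 0.4   (one extra byte per frame)
print(bitrate_kbps(MIN_BYTES))         # 6.0
print(bitrate_kbps(MAX_BYTES))         # 510.0
print(MAX_BYTES - MIN_BYTES + 1)       # 1261 distinct payload sizes
```

Every payload size in that range is a distinct bitrate, and the receiver learns which one was used from the length of the datagram itself.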

One last aspect is that Opus is simple to transport over RTP, as can be seen from the [http://tools.ietf.org/html/rfc7587 Opus RTP payload format]. For example, it's possible to decode RTP packets without having even seen the SDP or any out-of-band signalling.

=== What applications for Android can play Opus? ===

Right now there are just a few, but the list is growing fast. See [http://android.stackexchange.com/q/37970/7425 this question on android.stackexchange.com]. Feel free to suggest other applications.

=== When will the next version be released? ===

When it's done. Seriously, we do not know.

Opus is not a large project with a fixed release schedule.

That being said, our [http://www.opus-codec.org/downloads/ pre-releases] and even the [https://git.xiph.org/?p=opus.git git repository] are generally pretty stable and given proper testing (which you should always do anyway), are safe to distribute.

Just be aware that the API of new features (that have never been included in a stable release) could potentially still change.

== Software Developers' Questions ==

=== On what platforms does Opus run? ===

The Opus code base is written in C89 and should run on the vast majority of recent (and not so recent) CPUs.

Some of the platforms '''[https://mf4.xiph.org/jenkins/view/opus/ on which Opus has been tested]''' include x86, x86-64, ARM, Itanium, Blackfin, and SPARC.

=== Is there a fixed-point implementation? ===

Yes.

The fixed-point and floating-point decoder and encoder implementations are part of the same code base.

The code defaults to float, so you need to configure with '''--enable-fixed-point''' (or define '''FIXED_POINT''' if not using the configure script) to build the code for fixed-point.

=== Which implementation should I use? ===

While the implementation in RFC 6716 is what ''defines'' the standard, it is likely not the best and most up-to-date implementation.

The [http://opus-codec.org/ Opus] website was set up for the purpose of continually improving the implementation (in terms of speed, encoding quality, device compatibility, etc.) while still conforming to the standard.

All Opus implementations are compatible by definition.

=== How is supporting Opus different from supporting Speex/G.711/MP3? ===

Opus has variable frame durations which can change on the fly, so an Opus decoder needs to be ready to accept packets with durations that are '''any multiple of 2.5ms''' up to a '''maximum of 120ms'''.
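
As a rough illustration of what this means for buffer sizing, the sketch below expresses packet durations in samples at 48 kHz and checks whether a given count is one a decoder should expect. The constant and function names are invented for this example.

```python
SAMPLE_RATE = 48000                            # decode rate assumed here
STEP_SAMPLES = SAMPLE_RATE * 5 // 2000         # 2.5 ms  -> 120 samples
MAX_PACKET_SAMPLES = SAMPLE_RATE * 120 // 1000  # 120 ms -> 5760 samples


def is_valid_packet_samples(n):
    """True if n samples per channel is a multiple of 2.5 ms, at most 120 ms."""
    return 0 < n <= MAX_PACKET_SAMPLES and n % STEP_SAMPLES == 0


print(MAX_PACKET_SAMPLES)              # 5760
print(is_valid_packet_samples(960))    # True  (20 ms)
print(is_valid_packet_samples(5880))   # False (122.5 ms is too long)
```

An output buffer of 5760 samples per channel is therefore enough for any single packet when decoding at 48 kHz.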

The Opus encoder and decoder do not need to have matched sampling rates or channel counts. It is recommended to always decode at the highest rate the hardware supports (e.g. 48 kHz stereo) so the user gets the full quality of whatever the far end is sending.

=== My application doesn't work. Can anyone help me? ===

It's possible to get help, but before doing so, there are a few basic things to try:

* Implement your application with uncompressed audio instead of Opus. If it still doesn't work, then the problem isn't related to Opus.
* Read the [http://www.opus-codec.org/docs/ Opus documentation].
* Read the [https://git.xiph.org/?p=opus.git;a=blob;f=src/opus_demo.c opus_demo.c] source code to see how to use the encoder and decoder.

If you still can't solve the problem, the best option is to ask for help on the [http://lists.xiph.org/mailman/listinfo/opus mailing list] or on the '''#opus''' IRC channel on '''irc.freenode.net'''.

=== How do I report a bug? ===

If you think you have found a bug in Opus (and not in your application), please [https://trac.xiph.org/newticket?component=Opus file a bug report].

Please include a way for us to reproduce the problem. The best way to do this is to provide an input file, along with the opusenc/opusdec/opus_demo command line that causes the bug to occur.

If the bug cannot be triggered by the command line tools, please provide a simple patch or C file that can help reproduce it. Please also provide any other relevant information, such as OS, CPU, build options, etc.

Don't hesitate to also contact us on the [http://lists.xiph.org/mailman/listinfo/opus mailing list] or on [irc://irc.freenode.net/opus IRC].

=== What is Opus Custom? ===

Opus Custom is an '''optional''' part of the Opus standard that allows for sampling rates other than 8, 12, 16, 24, or 48 kHz and frame sizes other than multiples of 2.5 ms.

Opus Custom requires additional out-of-band signalling that Opus does not normally require and disables many of Opus' coding modes. Also, because it is an optional part of the specification, using Opus Custom may lead to compatibility problems.

For these reasons, its use is discouraged outside of very specific applications, for example:
* ultra low delay applications where synchronization with the soundcard buffer is important.
* low-power embedded applications where compatibility with others is not important.

For almost all other types of applications, Opus Custom should not be used.

=== How do I use 44.1 kHz or some other sampling rate not directly supported by Opus? ===

Tools which read or write Opus should inter-operate with other sampling rates by transparently performing sample rate conversion behind the scenes whenever necessary. In particular, software developers should not use Opus Custom for 44.1 kHz support, except in the very specific circumstances outlined above.

Note that it's generally preferable for a decoder to output at 48kHz, even when you know the original input was 44.1kHz. This is not only because you can skip resampling, but also because many cheaper audio interfaces have poor quality output for 44.1kHz.

The '''[https://opus-codec.org/downloads/ opus-tools]''' package source code contains a small, high quality, high performance, BSD licensed resampler which can be used where resampling is required.
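
For planning buffer sizes around such a resampler, the conversion ratio is easy to work out. The snippet below is a small sketch with hypothetical names; it only computes sample counts and does not perform any resampling.

```python
from fractions import Fraction

# 44.1 kHz -> 48 kHz reduces to a ratio of 160/147
RATIO = Fraction(48000, 44100)


def resampled_length(n_in, in_rate=44100, out_rate=48000):
    """Approximate number of output samples for n_in input samples."""
    return n_in * out_rate // in_rate


print(RATIO)                    # 160/147
print(resampled_length(441))    # 480 (10 ms of audio at either rate)
```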

=== How is the bitrate setting used in VBR mode? ===

Variable bitrate (VBR) mode allows the bitrate to automatically vary over time based on the audio being encoded, in order to achieve a consistent quality.

The bitrate setting controls the desired quality, on a scale that is calibrated to closely approximate the average bitrate that would be obtained over a large and diverse collection of audio. The actual bitrate of any particular audio stream may be higher or lower than this average.

=== What frame size should I use? ===

A '''20ms''' frame size works well for most applications.

Smaller frame sizes may be used to achieve lower latency, but have lower quality at a given bitrate.

Sizes greater than 20 ms increase latency and are generally beneficial only at fairly low bitrates, or when used to reduce external overhead (e.g. by reducing the number of packets that are sent).
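
To see how much the packet rate matters, the sketch below estimates the bitrate spent on headers alone for several frame sizes. It assumes one frame per packet and 40 bytes of headers (IPv4 without options, UDP, and RTP without extensions); other transports will have different overheads, and the names are illustrative only.

```python
HEADER_BYTES = 20 + 8 + 12  # assumption: IPv4 + UDP + RTP headers


def overhead_kbps(frame_ms, header_bytes=HEADER_BYTES):
    """Header bitrate in kbit/s when one packet is sent every frame_ms."""
    return header_bytes * 8 / frame_ms


for frame_ms in (2.5, 10, 20, 60):
    print(frame_ms, "ms ->", round(overhead_kbps(frame_ms), 1), "kb/s of headers")
```

Under these assumptions a 20 ms frame size costs 16 kb/s in headers, which is comparable to the payload of a low-bitrate voice stream; that is why larger frames can pay off at low bitrates.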

=== Forward Error correction (FEC) doesn't appear to do anything! HELP! ===

The in-band FEC feature of Opus helps reduce the harm of packet loss by encoding some information about the prior packet.

In order to make use of in-band FEC, the receiver must delay its output by at least one frame so that, when a packet is lost, it can call the decoder with the decode_fec argument on the ''next'' packet in order to reconstruct the missed frame. This works best if it's integrated with a jitter buffer.

FEC is only used by the encoder under certain conditions:
* the feature must be enabled via the OPUS_SET_INBAND_FEC CTL
* the encoder must be told to expect loss via the OPUS_SET_PACKET_LOSS_PERC CTL
* the codec must be operated in any of the linear prediction or Hybrid modes

Frame durations shorter than 10ms and very high bitrates will use the MDCT modes, where FEC is not available.

Even when FEC is not used, telling the encoder about the expected level of loss will help it make more intelligent decisions. By default the implementation assumes there is no loss.
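
The receive-side decision described above can be sketched as follows. This is a simplified model, not libopus code: <tt>plan_decodes</tt> is a hypothetical helper that looks one packet ahead (the one-frame delay) and reports which kind of decoder call would be made for each frame. With libopus, the three cases map to <tt>opus_decode()</tt> with the packet and <tt>decode_fec=0</tt>, with the next packet and <tt>decode_fec=1</tt>, or with a NULL packet for concealment.

```python
def plan_decodes(received):
    """received[i] is True if packet i arrived; returns one action per frame."""
    actions = []
    for i, present in enumerate(received):
        if present:
            actions.append("decode")          # normal decode of packet i
        elif i + 1 < len(received) and received[i + 1]:
            actions.append("fec_from_next")   # rebuild frame i from packet i+1
        else:
            actions.append("plc")             # nothing usable: conceal the loss
    return actions


print(plan_decodes([True, False, True, False, False, True]))
```

A single isolated loss is recovered from the following packet, while the first frame of a two-packet burst can only be concealed.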

=== I can't use malloc or much stack on my embedded platform. How do I make Opus work? ===

A normal build of libopus only uses <tt>malloc/free</tt> in the <tt>_create()</tt> and <tt>_destroy()</tt> calls, making it safe for realtime use as long as the codec state is pre-created.

To build Opus without the references to <tt>malloc/free</tt>, you must:

* use the <tt>_init()</tt> calls rather than the <tt>_create()</tt> calls in your application
* compile with <tt>CFLAGS="-DOVERRIDE_OPUS_ALLOC -DOVERRIDE_OPUS_FREE -D'opus_alloc(x)=NULL' -D'opus_free(x)=NULL' "</tt>.

If libopus is built with <tt>-DNONTHREADSAFE_PSEUDOSTACK</tt> (instead of <tt>VAR_ARRAYS</tt>, or <tt>USE_ALLOCA</tt>), it will use a user-provided block of heap instead of stack for many things, resulting in much lower stack usage. 
This makes the resulting library '''non-threadsafe''' and is '''not recommended''' on anything except limited embedded platforms.

=== How can I ensure that my software interoperates with other software implementing Opus? ===

For applications using Ogg files, there are some [http://people.xiph.org/~greg/opus_testvectors/ Ogg Opus testvectors] to test decoders and you can test encoders with opusdec. For RTP applications, the opusrtp tool can be useful.

In general, here's a list of specific issues to check:
* Can your application handle all frame sizes, including changing the frame size from frame to frame?
* Does your application properly react to lost packets by calling the decoder with a NULL packet?

=== What is the complexity of Opus? ===

The complexity of Opus varies by a large amount based on the settings used.

It depends on the mode, audio bandwidth, number of channels, and even a "complexity knob" that can trade complexity for quality. It will run easily on any recent PC or smartphone.

For slower embedded CPUs/DSPs, the amount of CPU required will vary depending on the configuration and the exact CPU, so you will need to experiment. Do not expect Opus to run quickly on really slow devices like 8-bit micro-controllers.

=== Opus is using too much CPU for my application. What can I do? ===

First, don't panic, and don't start writing assembly just yet.

It's possible that you're just not using the right set of options.

If you're targeting an embedded/mobile platform, chances are the fixed-point build will be faster, so make sure you're using '''--enable-fixed-point''' or defining '''FIXED_POINT''' in the build system.

Opus also has a complexity option that can trade quality for complexity. The default is highest quality and highest complexity. You can control this using '''OPUS_SET_COMPLEXITY()''' (see the '''[http://www.opus-codec.org/docs/ Documentation]''' for details).

If all else fails and you need to optimize the Opus code, see the next question.

=== I would like to optimize/improve/help with Opus. Where should I start? ===

Please '''[http://www.opus-codec.org/contact/ contact us]''' before you start, or at least before you get too far.

This will help coordinate the efforts made on Opus and reduce the probability of wasting your time on duplicated effort or going down the wrong path.

=== Does Opus have an echo canceller like Speex does? ===

Echo cancellation is completely independent from codecs.

You can use any echo canceller (including the one from libspeexdsp) along with Opus.

That being said, among the free acoustic echo cancellers (AEC) we're aware of, the best is probably the Google AEC from the [https://code.google.com/p/webrtc/ WebRTC codebase].

[[Category:Opus]]

OpusTodo

2016-01-12T18:42:26Z

MarkH: update for 1.1.2

== For 1.1.3 ==
* aarch64 and AVX optimizations

== For 1.2 ==
* Low bitrate quality improvements

== Spec ==
* Ogg mapping. See the [https://tools.ietf.org/html/draft-ietf-codec-oggopus IETF draft].
* Matroska mapping. See [[MatroskaOpus]] and the Firefox/FFmpeg implementations.
* RTP payload format. Mono/stereo mapping is complete ([https://tools.ietf.org/html/rfc7587 RFC 7587]); no multichannel mapping yet.

== Website ==
* De-uglify webpage - some suggestions:
** write about codecs obsoleted by Opus (Speex, CELT, Vorbis(?), and the proprietary ones)
** write about implementations (libopus encoder/decoder, libavcodec decoder, any others?)
** comparison table (Opus, Vorbis, Speex, ..., MP5) of features (channels, freq, bits per sample, license, language (C89), integer impl. (Vorbis decoder only, Opus YES, ...)
** future use in video files (Theora? Dirac? WebM? other future codecs...)
** audio files for storage (like Vorbis, no raw Opus defined, only inside Ogg), ...
* Promotional material (some nice free/public-domain sounds/radio stations in Opus format)

== Other ==

* Oggz-validate (should also validate opus toc)

== Opus-tools ==
* Port opusdec to libopusfile/libopusurl.
* A simple real time streaming example tool
** Start with opusrtp.c in [https://git.xiph.org/?p=opus-tools.git opus-tools]
** Make <code>opusrtp rtp://example.com:5431/</code> listen to that host and port and mux packets from there. Generalize the pcap-based --sniff implementation.
** Make sending similarly generic. Maybe just <code>opusrtp source.opus -o rtp://example.com:5431/</code> to send source.opus out to the destination?
** Make --sniff save one file per
** Implement DTLS-SRTP. See webrtc.
** audio capture/encode, decode/playback?
** Parse and act on sdp for convenience and testing.

* EBU R128/Replaygain (half done; needs a gain tool)

== Surround work ==

* Apply spreading to energy masking
* More conservative energy masking (not just mean difference) and dynalloc
* Allow SILK/hybrid on center channel for voice?

== Psychoacoustic stuff ==

* Adaptive width narrowing and forced intensity stereo bands

== Optimisations ==

* Vectorising comb_filter()
* Use 16-bit mul plus shift in denormalise_bands()
* Optimise MDCT somehow

== Third-Party tool enhancements ==
* mutagen: [https://bitbucket.org/lazka/mutagen/issue/202/oggopus-support-in-place-rewrites-for support padding in comments header], [https://bitbucket.org/lazka/mutagen/issue/203/oggopus-allow-updating-the-output_gain allow updating output gain in ID header]

== Future work ==
* psymodel based VBR
* Remove copy in inverse MDCT
* Save some float<->int conversions
* Improvements to LP mode CBR (greg has some code)
* Unconstrained SILK VBR
* Better handling for the case where FEC has a different bandwidth than the current mode
* PLC transitions on unprotected SILK-SILK bandwidth changes?
* Figure out how to use speech/music detection optimally
** find optimal switching time (low energy/tonality)
* Improve variable frame size

[[Category:Opus]]

MatroskaOpus

2015-12-31T04:56:23Z

MarkH: update URL

{{draft}}
This is an encapsulation spec for the [[Opus]] codec in [http://matroska.org/ Matroska]. There are a number of outstanding functional issues with muxing Opus in Matroska, and until those are resolved, use of this spec is NOT RECOMMENDED.

* CodecID is A_OPUS
* SamplingFrequency is 48000
* Channels is number of output PCM channels
* SeekPreRoll is set to 80000000
* CodecPrivate consists of the 'OpusHead' packet, identical to the Ogg mapping.

The 'OpusHead' format is defined by the [https://tools.ietf.org/html/draft-ietf-codec-oggopus Ogg Opus] mapping. In particular it includes pre-skip, gain, and the channel mapping table required for correct surround output.

The second 'OpusTags' header packet from Ogg Opus is not used in the Matroska encapsulation. Matroska has its own system for tag metadata, and this avoids duplicating it and the need for sub-framing to index multiple packets within the CodecPrivate element.

SeekPreRoll [56][BB] is a new unsigned integer element added to the TrackEntry element. The value is the number of nanoseconds that must be discarded, for that stream, after a seek until the decoded data is valid to render.

CodecDelay [56][AA] is a new unsigned integer element added to the TrackEntry element. The value is the number of nanoseconds that must be discarded, for that stream, from the start of that stream. The value is also the number of nanoseconds that all encoded timestamps for that stream must be shifted to get the presentation timestamp. (This will fix Vorbis encoding as well.)

DiscardPadding [75][A2] is a new signed integer element added to the BlockGroup element. DiscardPadding is the duration in nanoseconds of the silent data added to the Block (padding at the end of the block). The duration of DiscardPadding is not calculated in the duration of the Track and should be discarded during playback. (This will fix Vorbis encoding as well.)
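
The nanosecond values used by these elements follow from sample counts at the fixed 48 kHz Opus clock. The sketch below shows the conversion; the function names are hypothetical, and the pre-skip of 312 samples is only an illustrative value, not something this spec requires.

```python
OPUS_RATE_HZ = 48000
NS_PER_SECOND = 1_000_000_000


def samples_to_ns(samples: int) -> int:
    """Convert a 48 kHz sample count to nanoseconds, rounding to nearest."""
    return (samples * NS_PER_SECOND + OPUS_RATE_HZ // 2) // OPUS_RATE_HZ


def ns_to_samples(ns: int) -> int:
    """Convert nanoseconds back to 48 kHz samples, rounding to nearest."""
    return (ns * OPUS_RATE_HZ + NS_PER_SECOND // 2) // NS_PER_SECOND


# 3840 samples is the 80 ms convergence window, i.e. the SeekPreRoll value above.
seek_pre_roll_ns = samples_to_ns(3840)

# CodecDelay derived from an assumed OpusHead pre-skip of 312 samples.
codec_delay_ns = samples_to_ns(312)
```

Rounding to the nearest nanosecond keeps the round trip back to samples exact for any realistic sample count.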

== Muxing Recommendations ==

In order to prevent extraneous parsing of muxed content for the players that want to start playback at exactly time T, we will recommend muxers create files with another Cluster within N-1 at T-SeekPreRoll, where T is the start time of Cluster N. Then add CuePoints for all the new T-SeekPreRoll Clusters with a CueTrack of the audio stream. The CuePoints for the video stream will not change.

For example, a file is a muxed MKV with the following characteristics:
* 5 second interval between video keyframes
* Each video keyframe begins a new Cluster
* Cues will contain video keyframe CuePoints
* For each video keyframe at time T there will be a new Cluster at T-SeekPreRoll
* Cues will contain audio CuePoints for T-SeekPreRoll Clusters
* Audio and video are interleaved in monotonically increasing order

Assume SeekPreRoll is 80 milliseconds. The first Cluster starts at 0 milliseconds with a video keyframe Block and has a duration of 4920 milliseconds. The second Cluster starts at 4920 milliseconds with an audio Block and has a duration of 80 milliseconds. Just to be clear, the second Cluster can contain Blocks from all streams. The third Cluster starts at 5000 milliseconds with a video keyframe Block and has a duration of 4920 milliseconds. The fourth Cluster starts at 9920 milliseconds with an audio Block and has a duration of 80 milliseconds.
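
The Cluster start times in this example can be derived mechanically from the keyframe times. This is a minimal sketch with hypothetical helper names; it only computes where Clusters would begin and does not write any Matroska data.

```python
def cluster_layout(keyframe_times_ms, seek_pre_roll_ms):
    """Return (start_ms, first_block_kind) for each Cluster.

    Every video keyframe at time T starts a Cluster, and an extra Cluster
    is started at T - SeekPreRoll when that falls after the previous one.
    """
    starts = []
    for t in sorted(keyframe_times_ms):
        pre_roll_start = t - seek_pre_roll_ms
        if starts and pre_roll_start > starts[-1][0]:
            starts.append((pre_roll_start, "audio"))
        starts.append((t, "video keyframe"))
    return starts


def cluster_durations(starts):
    """Duration of each Cluster except the last, from consecutive starts."""
    times = [t for t, _ in starts]
    return [b - a for a, b in zip(times, times[1:])]


layout = cluster_layout([0, 5000, 10000], 80)
durations = cluster_durations(layout)
```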

With this recommendation players that want audio and video to start playback at time T can seek to Cluster T-SeekPreRoll and start decoding the audio stream. This will work the same for both local and HTTP playback.

== Open Questions ==

* Should we say muxers MAY or SHOULD NOT produce simple streams without filling in CodecPrivate?
** If the CodecPrivate is empty or not present and Channels is 1 or 2, players MAY treat it as a sane set of defaults, I guess. e.g. channel mapping family 0, no pre-skip or gain.
** For Channels > 2 the track MUST be rejected, since there's no way to map the encoded substreams to channels.
** We would also have to decide on a default value for OutputGain.
** Version must be 1.
* How can sample-accurate end-time trimming work in Matroska?
** We defined a new element added to a BlockGroup, DiscardPadding (previously PostPadding), which is defined as the number of nanoseconds to discard from the Block.
** Currently all software encapsulating Vorbis in Matroska is broken in this regard, and muxing a Vorbis file in Matroska causes it to get longer (i.e., produce more audio output than the original Ogg file). It would be unfortunate to repeat this disaster for Opus. This needs a new element specifying the number of samples to trim, perhaps a new BlockGroup child.
*** This has been addressed with DiscardPadding for Opus. DiscardPadding was specified to fix Vorbis (as well as other codecs) too.
* If new elements are required, can they be defined so as to enable correct seeking in rolling intra (a.k.a intra refresh) video as well?
** SeekPreRoll should work for rolling intra video.

== Handling Pre-skip data ==

* '''On [http://lists.matroska.org/cgi-bin/mailman/listinfo/matroska-devel Matroska-dev] we decided to implement proposal one ([http://lists.matroska.org/pipermail/matroska-devel/2013-June/004475.html ref]).'''
* Use Cases:
** UC1: Playback starts from the beginning of the stream. Source stream time starts at 0.
** UC2: Playback starts from the beginning of the stream. Pre-skip data ends in middle of compressed packet.
** UC3: Playback starts from the middle of the stream > SeekPreRoll time.
** UC4: Playback starts from the middle of the stream < SeekPreRoll time.
** UC5: Encode source stream to Opus, mux to Matroska, then decode Opus stream, must have same number of samples as source stream.

* one: Timeshift the timestamps by pre-skip data.
** The Opus audio stream pre-skip data starts from time 0 and adds the pre-skip time to the normal audio time, like how Opus files are muxed into ogg files. We would add a new element to the TrackEntry element, CodecDelay, and the player would adjust the timestamps of the decoded samples by subtracting CodecDelay. All use cases should be covered.
** Cons:
*** The timestamp of the Block does not match the timestamp of the playback position.
*** Does not generalize known "decode, but not render" data.
*** Forces the player to handle the pre-skip samples. I.e. not the decoder.

* two: Add pre-skip data to CodecPrivate.
** On every discontinuity the decoder would need to decode and throw away the pre-skip data.
** Cons:
*** UC2 will throw away valid data and the AV sync will be off.
*** UC3 will redundantly decode the pre-skip data.

* three: Add TimeToDiscard to Block.
** Add an element to the Block element, TimeToDiscard in nanoseconds. A value of -1 would not render the whole Block, which would have the same effect as setting the invisible bit. How would this affect the Block timestamp? Maybe the new element should be SamplesToDiscard or DataToDiscard?
** Cons:

* four: Blocks that contain pre-skip data will set invisible flag.
** Blocks that contain pre-skip data have timestamps from the beginning of the stream. Blocks that only contain normal data have timestamps from the playback position.
** Cons:
*** Forces the player to handle the pre-skip samples. I.e. not the decoder.
*** UC2 will throw away valid data and the AV sync will be off. Other use cases should be fine.

* five: Force pre-skip packets to be prepended to the first normal packet in the first Block.
** The first Block's timestamp will be set to the start time of the source playback position. We would add a new element to the TrackEntry element, CodecDelay. All use cases should be covered.
** Cons:
*** Does not generalize known "decode, but not render" data.
*** Forces the player to handle the pre-skip samples. I.e. not the decoder.

* six: Create a new codec, OPUS_MKV.
** Basically the codec will wrap Opus packets with data telling the decoder what type of Opus packet it contains. Essentially we would be creating a new codec to handle pre-skip data within the decoder.
** Cons:
*** There will be two types of Opus data streams!
*** Does not generalize known "decode, but not render" data.

* seven: Negative timestamps.
** The SimpleBlock timestamp is signed 16 bits, so the format can signal about half of the pre-skip if playback timestamps are to start at zero.
** One could set an incorrect timestamp on the skipped blocks, and rely on the decoder to drop them based on the OpusHead preskip value. As long as the initial blocks are timestamped <= start of output this shouldn't affect seeking.
** Cons:
*** Moritz suggests this won't work because the resolution of the timestamps is controlled by the muxer, so the SimpleBlock timestamp offset isn't sample accurate anyway ([http://lists.matroska.org/pipermail/matroska-devel/2012-September/004254.html ref]).

* eight: 
** The Ogg format uses granule positions which are converted to presentation timecodes using codec specific information on a per logical stream basis.
** The Matroska format uses absolute timecodes with an arbitrary per segment accuracy for all tracks in the segment.
** It is the belief of this tikiman that using a timecode offset of any kind in MKV is unholy.
** The pre-skip is communicated to the media software via the Opus header in the codec private data. At the beginning of the track, the track timecode is not increased until pre-skip samples are in track frames.
** From then on audio is muxed as normal, however the audio should be muxed >= 3840 samples behind video frames.
*** i.e. Cluster Timecode: 5.000 seconds
*** Video Track Key Frame 5.000 seconds
*** Opus Track Frame 4.920 seconds

[[Category:Opus]]

OggOpus

2015-12-31T04:55:06Z

MarkH: spelling, update URLs

'''Superseded by [https://tools.ietf.org/html/draft-ietf-codec-oggopus the IETF draft].'''

== Ogg Mapping for Opus ==

The IETF Opus codec is a low-latency audio codec optimized for both voice and general-purpose audio. See the [https://tools.ietf.org/html/rfc6716 Opus Specification] for technical details.

Almost everything about Opus is either fixed or dynamically switchable, so most of the usual ID and setup header parameters in the header packets of an Ogg encapsulation aren't needed. In particular, bitrate, packet duration, mono/stereo flags, and coding modes are all dynamically switchable from packet to packet. The first one or two bytes in each data packet, the start of the 'TOC sequence' that defines the layout of the packet, specifies all of these parameters for that particular packet. See Section 3 of the Opus Specification for the exact format of the TOC sequence.

The remaining parameters that must be signaled are

* The magic number for stream identification,
* The stream count and coupling for multichannel audio, and
* Any metadata or tags.

=== Content Type ===

The recommended mime-type for Ogg Opus files is '''audio/ogg''', defined in [https://www.ietf.org/rfc/rfc5334.txt RFC 5334].

If more specificity is desired, one can distinguish Opus files as 'audio/ogg; codecs=opus'.

The recommended filename extension for Ogg Opus files is '''.opus'''.

=== Packet Organization ===

Opus is framed in a continuous logical [https://www.xiph.org/ogg/doc/framing.html Ogg stream].

There are two mandatory headers. The granule position of the pages containing these headers MUST be zero.

The first packet in the logical Ogg stream MUST contain the identification header, which uniquely identifies a stream as Opus audio. It MUST begin with the 8 bytes "OpusHead". It MUST be placed alone in the first page of the logical Ogg stream. This page MUST have the ’beginning of stream’ flag set.

The second Opus packet MUST contain the comment header. It must begin with the 8 bytes "OpusTags". It MAY span one or more pages, beginning on the second page of the logical stream. However many pages it spans, the comment header packet MUST finish the page on which it ends.

All subsequent pages are audio data pages and the packets they contain are audio data packets. The first audio page SHOULD NOT have the 'continued packet' flag set (which would indicate the first audio packet is continued from a previous page). Packets MUST be placed into Ogg pages in order until the end of stream. Audio packets MAY span page boundaries. A decoder MUST treat a zero-byte audio packet as if it were an Opus packet with an illegal TOC sequence. The last page SHOULD have the 'end of stream' flag set, but implementations should be prepared to deal with truncated streams which do not have a page marked 'end of stream'. The final packet SHOULD complete on the last page, i.e., the final lacing value should be less than 255. There MUST NOT be any more pages in an Opus logical stream after a page marked 'end of stream'.

=== Granule Position ===

The granule position of an audio page encodes the total number of PCM samples in the stream up to and including the last fully-decodable sample from the last packet ''completed'' on that page. A page that is entirely spanned by a single packet (that completes on a subsequent page) has no granule position, and the granule position field MUST be set to the special value ’-1’ in two's complement.

The granule position of an audio page is in units of PCM audio samples at a fixed rate of 48 kHz (per channel; a stereo stream’s granule position does not increment at twice the speed of a mono stream). It is possible to run a decoder at other sampling rates, but the format and this specification always count samples assuming a 48 kHz decoding rate.

The duration of an Opus packet may be any multiple of 2.5 ms, up to a maximum of 120 ms. This duration is encoded in the TOC sequence at the beginning of each packet. The number of samples returned by a decoder corresponds to this duration exactly, even for the first few packets. For example, a 20 ms packet fed to a decoder running at 48 kHz will always return 960 samples. A demuxer can parse these TOC sequences to work backwards or forwards from a packet with a known granule position (i.e., the last packet completed on some page) in order to assign granule positions to every packet, or even every individual sample. The one exception is the last page in the stream, as described below.
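
As an illustration of how a demuxer might recover a packet's duration from the TOC sequence, here is a sketch based on a reading of Section 3 of the Opus Specification (RFC 6716). The function name is hypothetical, and only the duration logic is covered; it does not validate the rest of the packet.

```python
def opus_packet_samples(packet: bytes, rate_hz: int = 48000) -> int:
    """Number of samples a packet decodes to, derived from its TOC sequence."""
    if not packet:
        raise ValueError("zero-byte packet has no TOC byte")
    toc = packet[0]
    config = toc >> 3                      # top five bits select mode and frame size
    if config >= 16:                       # CELT-only: 2.5, 5, 10 or 20 ms
        frame = (rate_hz << (config & 3)) // 400
    elif config >= 12:                     # hybrid: 10 or 20 ms
        frame = rate_hz // 50 if config & 1 else rate_hz // 100
    elif config & 3 == 3:                  # SILK-only: 60 ms
        frame = rate_hz * 60 // 1000
    else:                                  # SILK-only: 10, 20 or 40 ms
        frame = (rate_hz << (config & 3)) // 100

    code = toc & 3                         # bottom two bits give the frame count code
    if code == 0:
        count = 1
    elif code in (1, 2):
        count = 2
    else:
        if len(packet) < 2:
            raise ValueError("code 3 packet is missing its frame count byte")
        count = packet[1] & 0x3F
        if count == 0:
            raise ValueError("code 3 packet with zero frames")

    samples = frame * count
    if samples > rate_hz * 120 // 1000:
        raise ValueError("packet duration exceeds 120 ms")
    return samples
```

For example, a TOC byte of <code>0xF8</code> (CELT-only fullband, 20 ms, one frame) yields 960 samples at 48 kHz, matching the figure quoted above.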

All other pages with completed packets after the first MUST have a granule position equal to the number of samples contained in packets that complete on that page plus the granule position of the most recent page with completed packets. This guarantees that a demuxer can assign individual packets the same granule position when working forwards as when working backwards. There must not be any gaps. In order to support capturing a stream that uses discontinuous transmission (DTX), an encoder SHOULD emit packets that explicitly request the use of Packet Loss Concealment (PLC) (i.e., with a frame length of 0, as defined in Section 3.2.1 of the Opus Specification) in place of the packets that were not transmitted.

There is some amount of latency introduced during the decoding process, to allow for overlap in the MDCT modes, stereo mixing in the LP modes, and resampling, and the encoder will introduce even more latency (though the exact amount is not specified). Therefore the first few samples produced by the decoder do not correspond to any real, input audio, but are instead composed of padding inserted by the encoder to compensate for this latency. These samples must be stored and decoded, as Opus is an asymptotically convergent predictive codec, meaning the decoded contents of each frame depend on the recent history of decoder inputs. A 'pre-skip' field in the ID header signals the number of samples which should be skipped at the beginning of the stream. This provides sufficient history to the decoder so that it has already converged before the stream's output begins. It may also be used to perform sample-accurate cropping of existing encoded streams. This amount need not be a multiple of 2.5 ms, may be smaller than a single packet, or may span the contents of several packets.

The PCM sample position is determined from the granule position using the formula

'PCM sample position' = 'granule position' - 'pre-skip' .

For example, if the granule position of the first page is 59971, and the pre-skip is 11971, then the PCM sample position of the last decoded sample from the first page is 48000. This may be converted into a playback time using the formula

'playback time' = 'PCM sample position' / 48000.0 .

The initial PCM sample position before any samples are played is normally '0'. In this case, the PCM sample position of the first audio sample to be played starts at '1', because it marks the time on the clock ''after'' that sample has been played, and a stream that is exactly one second long has a final PCM sample position of '48000', as in the example here.
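
The two formulas above, applied to the worked example, can be sketched as follows (function names are illustrative only):

```python
def pcm_sample_position(granule_position: int, pre_skip: int) -> int:
    """PCM sample position of the last decoded sample on a page."""
    return granule_position - pre_skip


def playback_time_seconds(granule_position: int, pre_skip: int) -> float:
    """Playback time in seconds; granule positions always count at 48 kHz."""
    return pcm_sample_position(granule_position, pre_skip) / 48000.0


position = pcm_sample_position(59971, 11971)
seconds = playback_time_seconds(59971, 11971)
```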

Vorbis streams use a granule position smaller than the number of audio samples contained in the first page to indicate that some of those samples must be trimmed from the output. However, to do so it requires that the first page contains exactly two packets, in order to allow the decoder to perform PCM position adjustments before needing to return any PCM data. Opus uses the pre-skip mechanism for this purpose instead, since the encoder may introduce more than a single packet's worth of latency, and since very large packets in streams with a very large number of channels may not fit on a single page.

The page with the 'end of stream' flag set MAY have a granule position that indicates the page contains less audio data than would normally be returned by decoding up through the final packet. This is used to end the stream somewhere other than an even frame boundary. The granule position of the most recent audio page with completed packets is used to make this determination, or '0' is used if there were no previous audio pages with a completed packet. The difference between these granule positions indicates how many samples to keep after decoding the packets that completed on the final page. The remaining samples are discarded. The number of discarded samples SHOULD be smaller than the number decoded from the last packet.
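
A sketch of the end-trimming arithmetic described above, with hypothetical names. It assumes the caller has already summed the samples decoded from the packets that complete on the final page, and it treats a granule position beyond the decoded audio as "nothing to trim", which is an assumption rather than something this text specifies.

```python
def trim_final_page(final_granule: int, previous_granule: int, decoded_samples: int):
    """Return (samples_to_keep, samples_to_discard) for the final page.

    previous_granule is the granule position of the most recent audio page
    with a completed packet, or 0 if there was none.
    """
    keep = final_granule - previous_granule
    if keep < 0:
        raise ValueError("final granule position goes backwards")
    keep = min(keep, decoded_samples)
    return keep, decoded_samples - keep


keep, discard = trim_final_page(48500, 48000, 960)
```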

The granule position of the first audio page with a completed packet MAY be larger than the number of samples contained in packets that complete on that page, however it MUST NOT be smaller, unless that page has the 'end of stream' flag set. Allowing a granule position larger than the number of samples allows the beginning of a stream to be cropped without rewriting the granule position of all the remaining pages. This means that the PCM sample position just before the first sample to be played may be larger than '0', but the PCM sample position relative to '0' should still be used for the purposes of synchronization when multiplexing with other logical streams. This does not affect the behavior of pre-skip: exactly 'pre-skip' samples should be skipped from the beginning of the decoded output, even if the initial PCM sample position is greater than zero.

On the other hand, a granule position that is smaller than the number of decoded samples prevents a demuxer from working backwards to assign each packet or each individual sample a valid granule position, since granule positions must be non-negative. A decoder MUST reject as invalid any stream where the granule position is smaller than the number of samples contained in packets that complete on the first page with a completed packet, unless that page has the 'end of stream' flag set. It MAY defer this action until it decodes the last packet completed on that page. If that page has the 'end of stream' flag set, a demuxer can work forwards from the granule position '0', but MUST reject as invalid any stream where the granule position is smaller than the 'pre-skip' amount. This would indicate that more samples should be skipped from the initial decoded output than exist in the stream.
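
The validity rules for the first audio page with a completed packet can be sketched like this. The function name and return convention are assumptions made for illustration: it returns the granule position in effect before the first decoded sample, or raises if the stream must be rejected.

```python
def first_page_start_granule(granule: int, samples_on_page: int,
                             pre_skip: int, end_of_stream: bool) -> int:
    """Validate the first audio page and return the stream's starting granule."""
    if end_of_stream and granule < pre_skip:
        # More samples would be skipped than the stream contains.
        raise ValueError("granule position smaller than pre-skip")
    if granule >= samples_on_page:
        # A larger granule position means the start of the stream was cropped.
        return granule - samples_on_page
    if not end_of_stream:
        raise ValueError("granule position smaller than samples on first page")
    # Single-page stream with end trimming: work forwards from zero.
    return 0
```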

==== ID Header ====

   0                   1                   2                   3
   0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
  +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
  |      'O'      |      'p'      |      'u'      |      's'      |
  +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
  |      'H'      |      'e'      |      'a'      |      'd'      |
  +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
  |  version = 1  | channel count |           pre-skip            |
  +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
  |                original input sample rate in Hz               |
  +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
  |    output gain Q7.8 in dB     |  channel map  |               |
  +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+               :
  |                                                               |
  :          optional channel mapping table...                    :
  |                                                               |
  +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+

Brief description of each field:

- Magic signature: "OpusHead" (64 bits)
- Version number (8 bits unsigned): 0x01 for this spec
- Channel count 'c' (8 bits unsigned): MUST be > 0
- Pre-skip (16 bits unsigned, little endian)
- Input sample rate (32 bits unsigned, little endian): informational only
- Output gain (16 bits, little endian, signed Q7.8 in dB) to apply when decoding
- Channel mapping family (8 bits unsigned)
-- 0 = one stream: mono or L,R stereo
-- 1 = channels in vorbis spec order: mono or L,R stereo or ... or FL,C,FR,RL,RR,LFE, ...
-- 2..254 = reserved (treat as 255)
-- 255 = no defined channel meaning
If channel mapping family > 0
- Stream count 'N' (8 bits unsigned): MUST be > 0
- Two-channel stream count 'M' (8 bits unsigned): MUST satisfy M <= N, M+N <= 255
- Channel mapping (8*c bits)
-- one stream index (8 bits unsigned) per channel (255 means silent throughout the file)
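
The field list above maps directly onto a fixed 19-byte prefix plus an optional mapping table. A minimal parsing sketch follows; the function name and the dictionary keys are hypothetical, and the checks cover only the constraints stated in the list.

```python
import struct


def parse_opus_head(data: bytes) -> dict:
    """Parse an OpusHead packet into a dict of its fields."""
    if len(data) < 19 or data[:8] != b"OpusHead":
        raise ValueError("not an OpusHead packet")
    # version, channel count, pre-skip, input rate, output gain, mapping family
    version, channels, pre_skip, rate, gain, family = struct.unpack_from("<BBHIhB", data, 8)
    if channels == 0:
        raise ValueError("channel count must be > 0")
    if family == 0:
        if channels > 2:
            raise ValueError("mapping family 0 allows only 1 or 2 channels")
        # Defaults for family 0; none of these fields are coded.
        streams, coupled, mapping = 1, channels - 1, list(range(channels))
    else:
        if len(data) < 21 + channels:
            raise ValueError("truncated channel mapping table")
        streams, coupled = data[19], data[20]
        if streams == 0 or coupled > streams or streams + coupled > 255:
            raise ValueError("invalid stream counts")
        mapping = list(data[21:21 + channels])
    return {"version": version, "channels": channels, "pre_skip": pre_skip,
            "input_sample_rate": rate, "output_gain": gain,
            "mapping_family": family, "streams": streams,
            "coupled_streams": coupled, "channel_mapping": mapping}


# A hand-built stereo header (values chosen only for illustration).
stereo = parse_opus_head(b"OpusHead" + struct.pack("<BBHIhB", 1, 2, 312, 44100, 0, 0))
```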

Detailed definition of each field:

* '''Magic signature'''
The magic signature "OpusHead" allows codec identification and is human readable. Starting with 'Op' helps distinguish it from data packets, as this is an invalid TOC sequence.

* '''Version'''
The version number MUST always be '1' for this version of the encapsulation specification.

Implementations SHOULD treat streams where the upper four bits of the version number match a recognized specification as backwards-compatible with that specification. That is, the version number can be considered split into "major" and "minor" version sub-fields, with changes to the "minor" sub-field in the lower four bits signaling compatible changes. For example, a decoder implementing this specification SHOULD accept any stream with a version number 15 or less, and SHOULD assume any stream with a version number 16 or greater is incompatible. The initial version '1' was chosen to keep implementations from relying on this byte as a null terminator for the OpusHead string.
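
The major/minor split described here amounts to comparing the upper four bits of the version byte. A one-function sketch, with a hypothetical name:

```python
def is_compatible_version(version: int) -> bool:
    """True if the upper four bits match this specification's major version (0)."""
    if not 0 <= version <= 255:
        raise ValueError("version is a single unsigned byte")
    return (version >> 4) == 0
```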

* '''Channel count''' 'c'
The number of channels byte specifies the number of output channels (1...255) for this Ogg Opus stream.

* '''Pre-skip'''
This is the number of samples (at 48 kHz) to discard from the decoder output when starting playback, and also the number to subtract from a page's granule position to calculate its PCM sample position.

When constructing cropped Ogg Opus streams, a pre-skip of at least 3840 samples (80 ms) is RECOMMENDED to ensure complete convergence.

* '''Input sample rate'''
This is ''not'' the sample rate to use for playback of the encoded data.

Opus has a handful of coding modes, with internal audio bandwidths of 4, 6, 8, 12, and 20 kHz. Each packet in the stream may have a different audio bandwidth. Regardless of the audio bandwidth, the reference decoder supports decoding any stream at a sample rate of 8, 12, 16, 24, or 48 kHz. The original sample rate of the encoder input is not preserved by the lossy compression.

An Ogg Opus player SHOULD select the playback sample rate according to the following procedure:
* If the hardware supports 48 kHz playback, decode at 48 kHz,
* else if the hardware's highest available sample rate is a supported rate, decode at this sample rate,
* else if the hardware's highest available sample rate is less than 48 kHz, decode at the next higher supported rate and resample,
* else decode at 48 kHz and resample.
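
The selection procedure above can be sketched as a small function. The name and the <code>(decode_rate, needs_resample)</code> return shape are assumptions for illustration; <code>hardware_rates</code> stands for whatever set of rates the audio output reports.

```python
SUPPORTED_DECODE_RATES = (8000, 12000, 16000, 24000, 48000)


def choose_decode_rate(hardware_rates):
    """Return (decode_rate_hz, needs_resample) following the steps above."""
    if 48000 in hardware_rates:
        return 48000, False
    highest = max(hardware_rates)
    if highest in SUPPORTED_DECODE_RATES:
        return highest, False
    if highest < 48000:
        # Decode at the next higher supported rate, then resample down.
        return min(r for r in SUPPORTED_DECODE_RATES if r > highest), True
    return 48000, True
```

For instance, hardware limited to 44.1 kHz ends up decoding at 48 kHz and resampling, while hardware limited to 16 kHz can decode at 16 kHz directly.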

However, the 'input sample rate' field allows the encoder to pass the sample rate of the original input stream as metadata. This may be useful when the user requires the output sample rate to match the input sample rate. For example, a non-player decoder writing PCM format to disk might choose to resample the output audio back to the original input rate to reduce surprise to the user, who might reasonably expect to get back a file with the same sample rate as the one they fed to the encoder.

A value of zero indicates 'unspecified'. Encoders SHOULD write the actual input rate or zero, but decoder implementations which do something with this field SHOULD take care to behave sanely if given crazy values (e.g. don't actually upsample the output to 10 MHz if requested).

* '''Output gain'''
This is a gain to be applied by the decoder. Virtually all players and media frameworks should apply it by default. If a player chooses to apply any volume adjustment or gain modification, such as the R128_TRACK_GAIN or a user-facing volume knob, the adjustment MUST be applied ''in addition'' to this output gain in order to achieve playback at the desired volume.

An encoder SHOULD set the output gain to zero, and instead apply any gain prior to encoding, when this is possible and does not conflict with the user's wishes. The output gain should only be nonzero when the gain is adjusted after encoding, or when the user wishes to adjust the gain for playback while preserving the ability to recover the original signal amplitude.

Although the output gain has enormous range (+/- 128 dB, enough to amplify inaudible sounds to the threshold of physical pain), most applications can only reasonably use a small portion of this range around zero. The large range serves in part to ensure that gain can always be losslessly transferred between OpusHead and R128_TRACK_GAIN (see below) without saturating.

The gain is the 20 log10 ratio of output to input sample values to be applied to the decoder output. E.g. <code>sample *= pow(10, header.gain/(20.*256))</code> where header.gain is the raw 16 bit Q7.8 value from the header.
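
The same computation in Python, starting from the two raw header bytes, might look like the sketch below (function names are illustrative):

```python
import struct


def read_output_gain(raw: bytes) -> int:
    """Decode the 16-bit little-endian signed Q7.8 output gain field."""
    return struct.unpack("<h", raw)[0]


def gain_to_scale(gain_q78: int) -> float:
    """Linear factor to multiply decoded samples by for a Q7.8 dB gain."""
    return 10.0 ** (gain_q78 / (20.0 * 256.0))


# 0xFF00 little endian is -256, i.e. -1 dB.
scale = gain_to_scale(read_output_gain(b"\x00\xff"))
```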

* '''Channel mapping family'''
This byte indicates the order and semantic meaning of the various channels encoded in each Opus packet.

Each possible value of this byte indicates a ''mapping family'', which defines a set of allowed numbers of channels, and the ordered set of channel names for each allowed number of channels. Currently there are three defined mapping families, although more may be added:

* Family 0 (RTP mapping)
** Allowed numbers of channels: 1 or 2
** 1 channel: monophonic (mono)
** 2 channels: stereo (left, right)
** '''Special mapping''': this channel mapping value also indicates that the contents consist of a single Opus stream that is stereo if and only if c==2, with stream index 0 mapped to channel 0, and (if stereo) stream index 1 mapped to channel 1. When the channel mapping byte has this value, no further fields are present in OpusHead.
* Family 1 ([https://www.xiph.org/vorbis/doc/Vorbis_I_spec.html#x1-810004.3.9 Vorbis channel order])
** Allowed numbers of channels: 1 ... 8
** Channel meanings depend on the number of channels, see the Vorbis mapping for details.
* Family 255 (no defined channel meaning)
** Allowed numbers of channels: 1...255
** Channels are unidentified. General-purpose players SHOULD NOT attempt to play these streams, and offline decoders MAY deinterleave the output into separate PCM files, one per channel. Decoders SHOULD NOT produce output for channels mapped to stream index 255 (pure silence) unless they have no other way to indicate the index of non-silent channels.

The remaining channel mapping families (2...254) are reserved. A decoder encountering a reserved mapping byte should act as though the mapping byte is 255.

An Ogg Opus player MUST play any Ogg Opus stream with a channel mapping family of 0 or 1, even if the number of channels does not match the physically connected audio hardware. Players SHOULD perform channel mixing to increase or reduce the number of channels as needed.

* '''Stream count''' 'N'
This field indicates the total number of streams so the decoder can correctly parse the packed Opus packets inside the Ogg packet.

For channel mapping family 0, this value defaults to 1, and is not coded.

A multi-channel Opus file is composed of one or more individual Opus streams, each of which produces one or two channels of decoded data. Each Ogg packet contains one Opus packet from each stream. The first N-1 Opus packets are packed using the self-delimiting framing from Appendix B of the Opus Specification. The remaining Opus packet is packed using the regular, undelimited framing from Section 3 of the Opus Specification. All the Opus packets in a single Ogg packet MUST be constrained to produce the same number of decoded samples. A decoder SHOULD treat any Opus packet whose duration is different from that of the first Opus packet in an Ogg packet as if it were an Opus packet with an illegal TOC sequence.

* '''Two-channel stream count''' 'M'
Describes the number of streams whose decoders should be configured to produce two channels. This must be no larger than the number of total streams.

For channel mapping family 0, this value defaults to c-1 (i.e., 0 for mono and 1 for stereo), and is not coded.

Each packet in an Opus stream has an internal channel count of 1 or 2, which can change from packet to packet. This is selected by the encoder depending on the bitrate and the contents being encoded. The original channel count of the encoder input is not preserved by the lossy compression.

Regardless of the internal channel count, any Opus stream may be decoded as mono (a single channel) or stereo (two channels) by appropriate initialization of the decoder. The "two-channel stream count" field indicates that the first M Opus decoders should be initialized in stereo mode, and the remaining N-M decoders should be initialized in mono mode. The total number of decoded channels (M+N) MUST be no larger than 255, as there is no way to index more channels than that in the channel mapping.

* '''Channel mapping'''
Contains one index per output channel indicating which decoded channel should be used. If the index is less than 2*M, the output MUST be taken from decoding stream (index/2) as stereo and selecting the left channel if index is even, and the right channel if index is odd. If the index is 2*M or larger, the output MUST be taken from decoding stream (index-M) as mono. As a special case, an index of 255 means that the corresponding output channel MUST contain pure silence.

For channel mapping family 0, the first index defaults to 0, and if c==2, the second index defaults to 1. Neither index is coded.

The number of output channels (c) is not constrained to match the number of decoded channels (M+N). A single index MAY appear multiple times, i.e., the same decoded channel may be mapped to multiple output channels. Some decoded channels might not be assigned to any output channel, as well.
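
The index rules above can be sketched in a few lines of Python. This is only an illustration: the function name, the tuple layout, and the six-channel example mapping are assumptions made for this sketch and are not taken from any specification or library.

```python
def resolve_output_channel(index, stream_count, coupled_count):
    """Map one channel-mapping index to (stream, mode, side).

    stream_count is N and coupled_count is M (the two-channel stream count).
    Returns None for index 255, which denotes a silent output channel.
    """
    if index == 255:
        return None
    if index >= stream_count + coupled_count:
        raise ValueError("index exceeds the M+N decoded channels")
    if index < 2 * coupled_count:
        # Taken from a stream decoded as stereo: even = left, odd = right.
        side = "left" if index % 2 == 0 else "right"
        return (index // 2, "stereo", side)
    # Taken from a stream decoded as mono.
    return (index - coupled_count, "mono", None)


# Hypothetical layout: N=4 streams, of which the first M=2 are stereo.
mapping = [0, 4, 1, 2, 3, 5]
layout = [resolve_output_channel(i, 4, 2) for i in mapping]
```

Here index 4 equals 2*M, so it resolves to stream 4-M = 2 decoded as mono, while indices 0 to 3 select the left and right channels of streams 0 and 1.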

==== Comment Header ====

* 8 byte 'OpusTags' magic signature (64 bits)
* The remaining data follows the vorbis-comment header design used in OggVorbis (without the "framing-bit"), OggTheora, and Speex:
** Vendor string (always present).
*** 4-byte little-endian length field, followed by length bytes of UTF-8 vendor string.
** TAG=value metadata strings (zero or more).
*** 4-byte little-endian string count.
*** Count strings consisting of 4-byte little-endian length and length bytes of UTF-8 string in "tag=value" form.
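
As a rough sketch of this layout, the following Python builds and parses an OpusTags packet using only the standard library. The function names and the sample vendor string are made up for the example, and the code handles a bare packet only, not the Ogg page structure around it.

```python
import struct

MAGIC = b"OpusTags"


def build_opus_tags(vendor, comments):
    """Serialize a vendor string and a list of 'TAG=value' strings."""
    def field(text):
        data = text.encode("utf-8")
        return struct.pack("<I", len(data)) + data

    body = field(vendor) + struct.pack("<I", len(comments))
    return MAGIC + body + b"".join(field(c) for c in comments)


def parse_opus_tags(packet):
    """Return (vendor, comments) from an OpusTags packet."""
    if packet[:8] != MAGIC:
        raise ValueError("missing OpusTags magic signature")
    pos = 8

    def read_u32():
        nonlocal pos
        if pos + 4 > len(packet):
            raise ValueError("truncated packet")
        (value,) = struct.unpack_from("<I", packet, pos)
        pos += 4
        return value

    def read_string():
        nonlocal pos
        length = read_u32()
        # Refuse lengths that point past the end of the packet.
        if pos + length > len(packet):
            raise ValueError("string runs past end of packet")
        text = packet[pos:pos + length].decode("utf-8")
        pos += length
        return text

    vendor = read_string()
    comments = [read_string() for _ in range(read_u32())]
    return vendor, comments


# Hypothetical sample data for a round trip.
sample = build_opus_tags("example encoder", ["TITLE=Test", "R128_TRACK_GAIN=-573"])
vendor, comments = parse_opus_tags(sample)
```

The bounds checks matter in practice because the length fields are untrusted input; a parser that allocates or slices based on them without checking can be driven to read far past the packet.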

One new comment field is introduced for Ogg Opus:
R128_TRACK_GAIN=-573
representing the volume shift needed to normalize the track's volume. The gain is a Q7.8 fixed point number in dB, as in the OpusHead "output gain" field. This field is similar to the [[VorbisComment#Replay_Gain|REPLAYGAIN_TRACK_GAIN field in Vorbis]], although the normal volume reference is the [https://tech.ebu.ch/loudness EBU-R128] standard.

An Ogg Opus file MUST NOT have more than one such field, and if present its value MUST be an integer from -32768 to +32767 inclusive, represented in ASCII with no whitespace. If present, it MUST correctly represent the R128 normalization gain (relative to the OpusHead output gain). If a player chooses to make use of the TRACK_GAIN, it MUST be applied ''in addition'' to the OpusHead output gain. If an encoder populates the TRACK_GAIN field, and the output gain is not otherwise constrained or specified, the encoder SHOULD write the R128 gain into the OpusHead output gain and write "R128_TRACK_GAIN=0". If a tool modifies the OpusHead "output gain" field, it MUST also update or remove the R128_TRACK_GAIN comment field.
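
A minimal sketch of how a player might interpret these values, assuming the Q7.8 reading described above (the integer divided by 256 gives dB). The function names are hypothetical, and accepting an optional leading "+" is an assumption of this sketch rather than something the text above states.

```python
import re


def parse_r128_gain(value):
    """Return the gain in dB for an R128_TRACK_GAIN value string (Q7.8)."""
    # ASCII digits only, optional sign, no whitespace.
    if not re.fullmatch(r"[+-]?[0-9]+", value):
        raise ValueError("gain must be an ASCII integer with no whitespace")
    raw = int(value)
    if not -32768 <= raw <= 32767:
        raise ValueError("gain outside -32768..32767")
    return raw / 256.0


def playback_scale(output_gain_q8, track_gain_q8=0):
    """Linear sample multiplier for the header gain plus the track gain."""
    total_db = (output_gain_q8 + track_gain_q8) / 256.0
    return 10 ** (total_db / 20.0)


track_db = parse_r128_gain("-573")   # -573 / 256 = -2.23828125 dB
scale = playback_scale(0, -573)      # roughly 0.77
```

The two gains are summed before conversion because the track gain is applied in addition to the OpusHead output gain.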

There is no comment field corresponding to Replaygain's ALBUM_GAIN; that information should instead be stored in the OpusHead 'output gain' field.

To avoid confusion with multiple normalization schemes, an OpusTags packet SHOULD NOT contain any of the REPLAYGAIN_TRACK_GAIN, REPLAYGAIN_TRACK_PEAK, REPLAYGAIN_ALBUM_GAIN, or REPLAYGAIN_ALBUM_PEAK fields.

== Other Implementation Notes ==

When seeking within an Ogg Opus stream, the decoder should start decoding (and discarding the output) at least 3840 samples (80 ms) prior to the seek point in order to ensure that the output audio is correct at the seek point.
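
The pre-roll rule can be expressed as a small helper. This sketch works in decoded sample positions at 48 kHz and ignores packet boundaries and pre-skip, which a real player would also have to account for; the function name is an assumption.

```python
PREROLL_SAMPLES = 3840  # 80 ms at 48 kHz


def seek_plan(target_sample):
    """Return (decode_start, discard_count) for a seek to target_sample."""
    decode_start = max(0, target_sample - PREROLL_SAMPLES)
    return decode_start, target_sample - decode_start


plan_mid = seek_plan(48000)   # seek to the 1 second mark
plan_early = seek_plan(1000)  # seek target closer to the start than the pre-roll
```

In practice decoding has to begin at a packet boundary at or before <code>decode_start</code>, so the number of discarded samples may be somewhat larger than the value returned here.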

Technically valid Opus packets can be arbitrarily large due to the padding format, although the amount of non-padding data they can contain is bounded. These packets may be spread over a similarly enormous number of Ogg pages. Encoders SHOULD use no more padding than required to make a variable bitrate (VBR) stream constant bitrate (CBR). Decoders SHOULD avoid attempting to allocate excessive amounts of memory when presented with a very large packet. The presence of an extremely large packet in the stream could indicate a potential memory exhaustion attack or stream corruption. Decoders should reject a packet that is too large to process, and print a warning message.

In an Ogg Opus stream, the largest possible valid packet that does not use padding has a size of (61,298*N - 2) bytes, or about 60 kB per Opus stream. With 255 streams, this is 15,630,988 bytes (14.9 MB) and can span up to 61,298 Ogg pages, all but one of which will have a granulepos of -1. This is of course a very extreme packet, consisting of 255 streams, each containing 120 ms of audio encoded as 2.5 ms frames, each frame using the maximum possible number of bytes (1275) and stored in the least efficient manner allowed (a VBR code 3 Opus packet). Even in such a packet, most of the data will be zeros, as 2.5 ms frames, which are required to run in the MDCT mode, cannot actually use all 1275 bytes. The largest packet consisting entirely of useful data is (15,326*N - 2) bytes, or about 15 kB per stream. This corresponds to 120 ms of audio encoded as 10 ms frames in either LP or Hybrid mode, but at a data rate of over 1 Mbps, which makes little sense for the quality achieved. A more reasonable limit is (7,664*N - 2) bytes, or about 7.5 kB per stream. This corresponds to 120 ms of audio encoded as 20 ms stereo MDCT-mode frames, with a total bitrate just under 511 kbps (not counting the Ogg encapsulation overhead). With N=8, the maximum useful number of streams for the channel meanings currently defined by mapping family 1, this gives a maximum packet size of 61,310 bytes, or just under 60 kB. This is still quite conservative, as it assumes each output channel is taken from one decoded channel of a stereo packet. An implementation could reasonably choose any of these numbers for its internal limits.
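
The packet-size figures above can be reproduced with a short calculation. The breakdown used here (a 2-byte header per Opus packet, a 2-byte length per 1275-byte frame, and the last stream omitting its final frame length) is a reconstruction that happens to match the quoted numbers, not a quotation from the specification.

```python
MAX_FRAME_BYTES = 1275   # largest Opus frame
FRAME_LENGTH_BYTES = 2   # a length of 1275 needs the two-byte encoding
HEADER_BYTES = 2         # TOC byte plus the code 3 frame-count byte


def max_ogg_packet_bytes(frames_per_packet, streams):
    """Largest unpadded Ogg packet for the given frame count and stream count."""
    self_delimited = HEADER_BYTES + frames_per_packet * (
        FRAME_LENGTH_BYTES + MAX_FRAME_BYTES
    )
    # The final stream uses undelimited framing and drops one frame length.
    return self_delimited * streams - FRAME_LENGTH_BYTES


extreme = max_ogg_packet_bytes(48, 255)    # 120 ms as 2.5 ms frames
useful = max_ogg_packet_bytes(12, 1) + 2   # per-stream term for 10 ms frames
reasonable = max_ogg_packet_bytes(6, 8)    # 120 ms as 20 ms frames, N=8
bitrate_bps = 7664 * 8 * 1000 / 120        # per-stream rate of the 20 ms case
```

With these assumptions the per-stream terms come out as 61,298, 15,326, and 7,664 bytes for 48, 12, and 6 frames per packet respectively.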

== Test Vectors ==

* [[OggOpus/testvectors|Planned test vectors for OggOpus]]
* Opus test vectors

[[Category:Ogg]]
[[Category:Opus]]

OpusTodo

2015-11-26T02:59:34Z

MarkH: update to reflect 1.1.1 release

== For 1.1.2 ==
* aarch64 and AVX optimizations

== For 1.2 ==
* Quality improvements

== Spec ==
* Ogg mapping. See [https://tools.ietf.org/html/draft-ietf-codec-oggopus IETF draft]
* Matroska mapping. See [[MatroskaOpus]] and the Firefox/FFmpeg implementations
* RTP payload format. Mono/stereo mapping is complete ([https://tools.ietf.org/html/rfc7587 RFC 7587]), no multichannel mapping yet.

== Website ==
* De-uglify webpage - some suggestions:
** write about codecs obsoleted by Opus (Speex, CELT, Vorbis(?), and the proprietary ones)
** write about implementations (libopus encoder/decoder, libavcodec decoder, any others?)
** comparison table (Opus, Vorbis, Speex, ..., MP5) of features (channels, freq, bits per sample, license, language (C89), integer impl. (Vorbis decoder only, Opus YES, ...))
** future use in video files (Theora? Dirac? WebM? other future codecs...)
** audio files for storage (like Vorbis, no raw Opus defined, only inside Ogg), ...
* Promotional material (some nice free/public-domain sounds/radio stations in Opus format)

== Other ==

* Oggz-validate (should also validate opus toc)

== Opus-tools ==
* Port opusdec to libopusfile/libopusurl.
* A simple real time streaming example tool
** Start with opusrtp.c in [https://git.xiph.org/?p=opus-tools.git opus-tools]
** Make <code>opusrtp rtp://example.com:5431/</code> listen to that host and port and mux packets from there. Generalize the pcap-based --sniff implementation
** Make sending similarly generic. Maybe just <code>opusrtp source.opus -o rtp://example.com:5431/</code> to send source.opus out to the destination?
** Make --sniff save one file per
** Implement DTLS-SRTP. See webrtc.
** audio capture/encode, decode/playback?
** Parse and act on sdp for convenience and testing.

* EBU R128/Replaygain (half done; needs a gain tool)

== Surround work ==

* Apply spreading to energy masking
* More conservative energy masking (not just mean difference) and dynalloc
* Allow SILK/hybrid on center channel for voice?

== Psychoacoustic stuff ==

* Adaptive width narrowing and forced intensity stereo bands

== Optimisations ==

* Vectorising comb_filter()
* Use 16-bit mul plus shift in denormalise_bands()
* Optimise MDCT somehow

== Third-Party tool enhancements ==
* mutagen: [https://bitbucket.org/lazka/mutagen/issue/202/oggopus-support-in-place-rewrites-for support padding in comments header], [https://bitbucket.org/lazka/mutagen/issue/203/oggopus-allow-updating-the-output_gain allow updating output gain in ID header]

== Future work ==
* psymodel based VBR
* Remove copy in inverse MDCT
* Save some float<->int conversions
* Improvements to LP mode CBR (greg has some code)
* Unconstrained SILK VBR
* Better handling for the case where FEC has a different bandwidth than the current mode
* PLC transitions on unprotected SILK-SILK bandwidth changes?
* Figure out how to use speech/music detection optimally
** find optimal switching time (low energy/tonality)
* Improve variable frame size

[[Category:Opus]]

XiphInfra:List of services

2015-10-14T19:12:04Z

MarkH:

{| class="wikitable"
|-
! Service
! URL
! VM
! Host
! Maintainer
|-
| [[XiphWiki:Features|Wiki]]
| wiki.xiph.org
| wiki
| mf4.xiph.org
| ePirat
|-
| Rietveld
| review.xiph.org
| jenkins
| mf4.xiph.org
| unlord
|-
| Git
| git.xiph.org
| -
| mf4.xiph.org
| rillian
|-
| Subversion
| svn.xiph.org
| -
| mf4.xiph.org
| rillian
|-
| [[AreWeCompressedYet]]
| arewecompressedyet.com
| awcy
| catfish.xiph.org
| TD-Linux
|-
| Opus boodler streams
| opus-codec.org
| -
| mf4.xiph.org
| gmaxwell
|-
| Home pages
| people.xiph.org
| -
| mf4.xiph.org
|
|-
| Media
| media.xiph.org
| -
| media.xiph.org
|
|-
| Mail
| xiph.org
| -
| mf4.xiph.org
|
|-
| Trac
| trac.xiph.org
| -
| mf4.xiph.org
| tbr
|-
| Jenkins
| jenkins.xiph.org
| jenkins
| mf4.xiph.org
| TD-Linux
|-
| XiphBot-ng
| XiphWiki on freenode
| -
| mf4.xiph.org
| TD-Linux
|-
| Xiph-mirror
| github.com/xiph
| ?
| ?
| rillian
|-
| Icecast directory
| dir.xiph.org
| -
| dir.xiph.org
| tbr
|-
| Icecast directory beta
| dir-test.xiph.org
| ?
| mailfish.xiph.org
| ePirat, tbr
|}

XiphInfra:List of services

2015-10-14T18:36:47Z

MarkH: add xiph-mirror, icecast directory

{| class="wikitable"
|-
! Service
! URL
! VM
! Host
! Maintainer
|-
| Wiki
| wiki.xiph.org
| wiki
| mf4.xiph.org
| ePirat
|-
| Rietveld
| review.xiph.org
| jenkins
| mf4.xiph.org
| unlord
|-
| Git
| git.xiph.org
| -
| mf4.xiph.org
| rillian
|-
| Subversion
| svn.xiph.org
| -
| mf4.xiph.org
| rillian
|-
| [[AreWeCompressedYet]]
| arewecompressedyet.com
| awcy
| catfish.xiph.org
| TD-Linux
|-
| Opus boodler streams
| opus-codec.org
| -
| mf4.xiph.org
| gmaxwell
|-
| Home pages
| people.xiph.org
| -
| mf4.xiph.org
|
|-
| Media
| media.xiph.org
| -
| media.xiph.org
|
|-
| Mail
| xiph.org
| -
| mf4.xiph.org
|
|-
| Trac
| trac.xiph.org
| -
| mf4.xiph.org
|
|-
| Jenkins
| jenkins.xiph.org
| jenkins
| mf4.xiph.org
| TD-Linux
|-
| XiphBot-ng
| XiphWiki on freenode
| -
| mf4.xiph.org
| TD-Linux
|-
| Xiph-mirror
| github.com/xiph
| ?
| ?
| rillian
|-
| Icecast directory
| dir.xiph.org
| -
| dir.xiph.org
|
|}

OpusFAQ

2015-01-15T19:28:11Z

MarkH: /* Does Opus support higher sampling rates, such as 96 kHz or 192 kHz? */ revert: this is a single 2-sentence item

[[Image:Opus logo trans.png|right]]

== General Questions ==

=== What is Opus? Who created it? ===

Opus is a totally open, royalty-free, highly versatile audio codec.

It is primarily designed for interactive speech and music transmission over the Internet, but is also applicable to storage and streaming applications. It incorporates technology from Skype's SILK codec and Xiph.Org's CELT codec. It has been standardized by the Internet Engineering Task Force (IETF) as '''[http://tools.ietf.org/html/rfc6716 RFC 6716]'''.

Opus has been in development since early 2007. Programmers associated with Xiph.Org, Skype, and several other organizations have contributed to its development and to the standardization process as part of the IETF's codec working group.

=== How does Opus compare to other codecs? ===

Opus is distinguished from most formats for high quality audio (AAC, Vorbis, MP3) by having low delay and it is distinguished from most low delay formats (G.711, GSM, Speex) by supporting high audio quality. It meets or exceeds existing codecs' quality across a wide range of bitrates, and it operates at lower delay than virtually any existing compressed format.

Further, the Opus format itself and the reference implementation are available under liberal royalty-free licenses, making it easy to adopt, compatible with free software, and suitable for usage as part of the basic infrastructure of the Internet.

See the Opus [http://opus-codec.org/comparison comparison page] for more details.

=== Does Opus make all those other lossy codecs obsolete? ===

Theoretically, yes.

From a technical point of view (loss, delay, bitrates, ...) it should replace both [[Vorbis]] and [[Speex]], and the common proprietary codecs too.

=== Will Opus replace Vorbis in video files? ===

For Ogg [[Theora]] video files it can, but the overall size reduction will be minimal and it will break compatibility with existing players.

For WebM video files, the convention is to use the VP9 video codec when using Opus as an audio codec.

=== How do I use Opus? What programs support Opus? ===

Opus decoding support is now included in many applications, including Firefox, foobar2000, and VLC, as well as in frameworks such as GStreamer and FFmpeg.

For now, the best way to '''encode''' Opus files is to use the opusenc command-line tool from the opus-tools package.

For real-time applications, Opus support is available in Google's WebRTC codebase.

Opus is still a relatively new codec: many more applications will support it in the near future.

=== Does Opus support higher sampling rates, such as 96 kHz or 192 kHz? ===

Yes and no.

Opus encoding tools like opusenc will happily encode files that are sampled at 96 or 192 kHz.

However, input files at these rates are internally '''converted to 48 kHz''' and then only frequencies '''up to 20 kHz''' are encoded.

The reason is simple: lossy codecs are designed to preserve audible details while discarding irrelevant information. Since the human ear can only hear up to 20 kHz at best (usually lower than that), frequency content above 20 kHz is the first thing to go.

See Monty's [http://people.xiph.org/~xiphmont/demo/neil-young.html 24/192 Music Downloads ...and why they make no sense] for more details.

=== What are the licensing requirements? ===

The reference Opus source code is released under a three-clause BSD license, which is a very permissive Open Source license. Commercial use and distribution (including in proprietary software) is permitted, provided that some basic conditions specified in the license are met.

Opus is also covered by some patents, for which royalty-free usage rights are granted, under conditions that the authors believe are compatible with (hopefully) all open source licenses, including the GPL (v2 and v3).

See the [http://www.opus-codec.org/license/ licensing page] for details.

=== Why make Opus free? ===

On the Internet, protocol and codec standards are part of the common infrastructure everyone builds upon.

Most of the value of a high-quality standard is the innovation and inter-operation provided by the systems built on top of it. When a few parties have monopoly rights to monetize a standard, that infrastructure stops being so common and everyone else has more reason to use their own solution instead, increasing cost and reducing efficiency.

Imagine a road system where each type of car could only drive on its own manufacturer's pavement. We all benefit from living in a world where all the roads are connected.

This is why Opus, unlike many codecs, is free.

=== Is the SILK part of Opus compatible with the SILK implementation shipped in Skype? ===

No.

The SILK codec, as submitted by Skype to the IETF, was heavily modified as part of its integration within Opus. The modifications are significant enough that it is not possible to just write a "translator". Even sharing code between Opus and the "old SILK" would be highly complex.

=== Why not keep the SILK and CELT codecs separate? ===

Opus is more than just two independent codecs with a switch.

In addition to a linear prediction "SILK mode" and an MDCT "CELT mode" it has a "hybrid mode," where speech frequencies up to 8 kHz are encoded with LP while those above 8 kHz are encoded with MDCT. This is what allows Opus to have such high speech quality around 32 kb/s.

Another advantage of the integration is the ability to switch between these modes seamlessly, without any "glitch" and without any out-of-band signalling.

=== Now that Opus is standardized, will its development stop, or can it be further improved? ===

Unlike most ITU-T codecs, Opus is only defined in terms of its decoder.

The encoder can keep evolving as long as the bitstream it produces can be decoded by the reference decoder. This is what made it possible for MP3 encoders to improve far beyond the original l3enc and the dist10 reference implementation.

Although it is unlikely that Opus encoders will see such spectacular evolution, we certainly hope that future encoders will become much better than the reference encoder. In fact, the 1.1 libopus release significantly improves on the reference encoder's quality.

=== Will all future Opus releases comply with the [http://tools.ietf.org/html/rfc6716 Opus specification]? ===

Yes.

=== In what ways is Opus optimized for the Internet? ===

Opus has good packet loss robustness and concealment, but its optimisations go further.

One of the first things we were asked when designing Opus was to make the rate '''really''' adaptable, because we never know what kind of rates will be available. This meant not only having a wide range of bitrates, but also being able to vary the rate in small increments.

This is why Opus scales from about '''6 kb/s''' to '''512 kb/s''', in increments of 0.4 kb/s (one byte with 20 ms frames). Opus can offer more than 1200 possible bitrates without spending 11 bits signalling the bitrate, because UDP already encodes the packet size.
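
The arithmetic behind these figures can be checked with a few lines of Python. The 1275-byte maximum frame size is an assumption of this sketch; under it, a 20 ms frame tops out at 510 kb/s, which is close to the "about 512 kb/s" quoted above.

```python
import math

FRAME_MS = 20
MAX_FRAME_BYTES = 1275  # assumed largest frame size


def bitrate_bps(frame_bytes, frame_ms=FRAME_MS):
    """Bitrate of a stream sending frame_bytes every frame_ms milliseconds."""
    return frame_bytes * 8 * 1000 / frame_ms


step_bps = bitrate_bps(1)                # one extra byte per frame
top_bps = bitrate_bps(MAX_FRAME_BYTES)   # largest frame every 20 ms
# Bits an explicit size field would need to distinguish every frame size.
size_field_bits = math.ceil(math.log2(MAX_FRAME_BYTES))
```

Each additional byte per 20 ms frame adds 400 b/s, and an in-band field able to express every size from 1 to 1275 bytes would need 11 bits per packet.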

One last aspect is that Opus is simple to transport over RTP, as can be seen from [http://tools.ietf.org/html/draft-spittka-payload-rtp-opus Opus RTP payload format]. For example, it's possible to decode RTP packets without having even seen the SDP or any out-of-band signalling.

=== What applications for Android can play Opus? ===

Right now there are just a few, but the list is growing fast. See [http://android.stackexchange.com/q/37970/7425 this question on android.stackexchange.com]. Feel free to suggest other applications.

=== When will the next version be released? ===

When it's done. Seriously, we do not know.

Opus is not a large project with a fixed release schedule.

That being said, our [http://www.opus-codec.org/downloads/ pre-releases] and even the [https://git.xiph.org/?p=opus.git git repository] are generally pretty stable and given proper testing (which you should always do anyway), are safe to distribute.

Just be aware that the API of new features (that have never been included in a stable release) could potentially still change.

== Software Developers' Questions ==

=== On what platforms does Opus run? ===

The Opus code base is written in C89 and should run on the vast majority of recent (and not so recent) CPUs.

A few of the platforms on which Opus has been tested and is known to run include x86, x86-64, ARM, Itanium, Blackfin, and SPARC.

=== Is there a fixed-point implementation? ===

Yes.

The fixed-point and floating-point decoder and encoder implementations are part of the same code base. The code defaults to float, so you need to configure with '''--enable-fixed-point''' (or define FIXED_POINT if not using the configure script) to build the code for fixed-point.

=== Which implementation should I use? ===

While the implementation in RFC 6716 is what ''defines'' the standard, it is likely not the best and most up-to-date implementation.

The [http://opus-codec.org/ Opus] website was set up for the purpose of continually improving the implementation — in terms of speed, encoding quality, device compatibility, etc — while still conforming to the standard.

All Opus implementations are compatible by definition.

=== How is supporting Opus different from supporting Speex/G.711/MP3? ===

Opus has variable frame durations which can change on the fly, so an Opus decoder needs to be ready to accept packets with durations that are '''any multiple of 2.5ms''' up to a '''maximum of 120ms'''.

The Opus encoder and decoder do not need to have matched sampling rates or channel counts. It is recommended to always decode at the highest rate the hardware supports (e.g. 48 kHz stereo) so the user gets the full quality of whatever the far end is sending.
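
One practical consequence is that the receive buffer has to be sized for the longest possible packet rather than for the frame size the sender happens to use. A small sketch, with hypothetical helper names:

```python
MAX_PACKET_MS = 120
STEP_US = 2500  # 2.5 ms expressed in microseconds to stay in integers


def decode_buffer_samples(sample_rate_hz, channels):
    """Interleaved samples needed to hold one maximum-length packet."""
    return sample_rate_hz * MAX_PACKET_MS // 1000 * channels


def is_valid_packet_duration(duration_us):
    """True for multiples of 2.5 ms up to and including 120 ms."""
    return 0 < duration_us <= MAX_PACKET_MS * 1000 and duration_us % STEP_US == 0


buffer_48k_stereo = decode_buffer_samples(48000, 2)
```

At 48 kHz stereo this gives 5760 samples per channel, or 11520 interleaved samples.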

=== My application doesn't work. Can anyone help me? ===

It's possible to get help, but before doing so, there are a few basic things to try:

* Implement the application with uncompressed audio instead of Opus. If it still doesn't work, then the problem isn't related to Opus.
* Read the [http://www.opus-codec.org/docs/ documentation].
* Read the [https://git.xiph.org/?p=opus.git;a=blob;f=src/opus_demo.c opus_demo.c] source code to see how to use the encoder and decoder.

If you still can't solve the problem, the best option is to ask for help on the [http://lists.xiph.org/mailman/listinfo/opus mailing list].

=== How do I report a bug? ===

If you think you have found a bug in Opus (and not in your application), please [https://trac.xiph.org/newticket?component=Opus file a bug report].

Please include a way for us to reproduce the problem. The best way to do this is to provide an input file, along with the opusenc/opusdec/opus_demo command line that causes the bug to occur.

If the bug cannot be triggered by the command line tools, please provide a simple patch or C file that can help reproduce it. Please also provide any other relevant information, such as OS, CPU, build options, etc.

Don't hesitate to also contact us on the [http://lists.xiph.org/mailman/listinfo/opus mailing list] or on [irc://irc.freenode.net/opus IRC].

=== What is Opus Custom? ===

Opus Custom is an '''optional''' part of the Opus standard that allows for sampling rates other than 8, 12, 16, 24, or 48 kHz and frame sizes other than multiples of 2.5 ms.

Opus Custom requires additional out-of-band signalling that Opus does not normally require and disables many of Opus' coding modes. Also, because it is an optional part of the specification, using Opus Custom may lead to compatibility problems.

For these reasons, its use is discouraged outside of very specific applications, for example:
* ultra low delay applications where synchronization with the soundcard buffer is important.
* low-power embedded applications where compatibility with others is not important.

For almost all other types of applications, Opus Custom should not be used.

=== How do I use 44.1 kHz or some other sampling rate not directly supported by Opus? ===

Tools which read or write Opus should inter-operate with other sampling rates by transparently performing sample rate conversion behind the scenes whenever necessary. In particular, software developers should not use Opus Custom for 44.1 kHz support, except in the very specific circumstances outlined above.

Note that it's generally preferable for a decoder to output at 48kHz even when you know the original input was 44.1kHz, not only because you can skip resampling but also because many inexpensive audio interfaces have poor quality output for 44.1k.

The opus-tools package source code contains a small, high quality, high performance, BSD licensed resampler which can be used where resampling is required.

=== How is the bitrate setting used in VBR mode? ===

Variable bitrate (VBR) mode allows the bitrate to automatically vary over time based on the audio being encoded, in order to achieve a consistent quality.

The bitrate setting controls the desired quality, on a scale that is calibrated to closely approximate the average bitrate that would be obtained over a large and diverse collection of audio. The actual bitrate of any particular audio stream may be higher or lower than this average.

=== What frame size should I use? ===

A '''20ms''' frame size works well for most applications.

Smaller frame sizes may be used to achieve lower latency, but have lower quality at a given bitrate.

Sizes greater than 20 ms increase latency and are generally beneficial only at fairly low bitrates, or when used to reduce external overhead (e.g. by reducing the number of packets that are sent).

=== Forward Error correction (FEC) doesn't appear to do anything! HELP! ===

The in-band FEC feature of Opus helps reduce the harm of packet loss by encoding some information about the prior packet.

In order to make use of in-band FEC, the receiver must delay its output by at least one frame so that, when a packet is lost, it can call the decoder with the decode_fec argument on the ''next'' packet in order to reconstruct the missed frame. This works best if it's integrated with a jitter buffer.

FEC is only used by the encoder under certain conditions:
* the feature must be enabled via the OPUS_SET_INBAND_FEC CTL
* the encoder must be told to expect loss via the OPUS_SET_PACKET_LOSS_PERC CTL
* the codec must be operated in any of the linear prediction or Hybrid modes

Frame durations shorter than 10ms and very high bitrates will use the MDCT modes, where FEC is not available.

Even when FEC is not used, telling the encoder about the expected level of loss will help it make more intelligent decisions. By default the implementation assumes there is no loss.
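
The receive-side logic described in this answer can be sketched without any codec library. The function below only decides what to do for each frame; the action names are made up for the example, and the actual decoder calls are left out.

```python
def plan_decodes(packets):
    """Return (frame_index, action) pairs; None in packets marks a loss.

    Frame i can only be decided once packet i+1 is known, which is why
    the receiver has to run one frame behind.
    """
    actions = []
    for i, packet in enumerate(packets):
        if packet is not None:
            actions.append((i, "normal"))
        elif i + 1 < len(packets) and packets[i + 1] is not None:
            # Recover frame i from the FEC data carried by packet i+1.
            actions.append((i, "fec_from_next"))
        else:
            # Nothing to recover from: fall back to loss concealment.
            actions.append((i, "conceal"))
    return actions


received = [b"p0", None, b"p2", None, None, b"p5"]
plan = plan_decodes(received)
```

In libopus terms, "fec_from_next" would correspond to decoding the following packet with the decode_fec argument set, and "conceal" to calling the decoder with a NULL packet. Whether the following packet actually carries FEC data depends on the encoder conditions listed above.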

=== I can't use malloc or much stack on my embedded platform. How do I make Opus work? ===

A normal build of libopus only uses malloc/free in the _create() and _destroy() calls, so Opus is safe for realtime use so long as the codec state is pre-created.

To build Opus without any reference to malloc/free at all, use the _init() calls rather than the _create() calls in your application and compile with <tt>CFLAGS="-DOVERRIDE_OPUS_ALLOC -DOVERRIDE_OPUS_FREE -D'opus_alloc(x)=NULL' -D'opus_free(x)=NULL' "</tt>. The resulting build does not use malloc/free.

If libopus is built with -DNONTHREADSAFE_PSEUDOSTACK (instead of VAR_ARRAYS or USE_ALLOCA), it will use a user-provided block of heap instead of the stack for many things, resulting in much lower stack usage. However, this makes the resulting library non-threadsafe and is not recommended on anything except limited embedded platforms.

=== How can I ensure that my software interoperates with other software implementing Opus? ===

For applications using Ogg files, there are some [http://people.xiph.org/~greg/opus_testvectors/ Ogg Opus testvectors] to test decoders and you can test encoders with opusdec. For RTP applications, the opusrtp tool can be useful.

In general, here's a list of specific issues to check:
* Can your application handle all frame sizes, including changing the frame size from frame to frame?
* Does your application properly react to lost packets by calling the decoder with a NULL packet?

=== What is the complexity of Opus? ===

The complexity of Opus varies by a large amount based on the settings used.

It depends on the mode, audio bandwidth, number of channels, and even a "complexity knob" that can trade complexity for quality. It will run easily on any recent PC or smartphone.

For slower embedded CPUs/DSPs, the amount of CPU required will vary depending on the configuration and the exact CPU, so you will need to experiment. Do not expect Opus to run quickly on really slow devices like 8-bit micro-controllers.

=== Opus is using too much CPU for my application. What can I do? ===

First, don't panic, and don't start writing assembly just yet.

It's possible that you're just not using the right set of options.

If you're targeting an embedded/mobile platform, chances are the fixed-point build will be faster, so make sure you're using '''--enable-fixed-point''' or defining FIXED_POINT in the build system.

Opus also has a complexity option that can trade quality for complexity. The default is highest quality and highest complexity. You can control this using OPUS_SET_COMPLEXITY() (see doc for details). If all else fails and you need to optimize the Opus code, see the next question.

=== I would like to optimize Opus. Where should I start? ===

Please '''[http://www.opus-codec.org/contact/ contact us]''' before you start, or at least before you get too far.

This will help coordinate the optimization effort and reduce the risk of wasting your time on duplicated work or going down the wrong path.

=== Does Opus have an echo canceller like Speex does? ===

Echo cancellation is completely independent from codecs.

You can use any echo canceller (including the one from libspeexdsp) along with Opus.

That being said, among the free acoustic echo cancelers (AEC) we're aware of, the best is probably the Google AEC from the [https://code.google.com/p/webrtc/ WebRTC codebase].

OpusFAQ

2014-11-26T20:48:10Z

MarkH: /* How do I use Opus? What programs support Opus? */ Opus now in webrtc

[[Image:Opus logo trans.png|right]]

== General Questions ==

=== What is Opus? Who created it? ===

Opus is a totally open, royalty-free, highly versatile audio codec. It is primarily designed for interactive speech and music transmission over the Internet, but is also applicable to storage and streaming applications. It incorporates technology from Skype's SILK codec and Xiph.Org's CELT codec. It has been standardized by the Internet Engineering Task Force (IETF) as '''[http://tools.ietf.org/html/rfc6716 RFC 6716]'''.

Opus has been in development since early 2007. Programmers associated with Xiph.Org, Skype, and several other organizations have contributed to its development and to the standardization process as part of the IETF's codec working group.

=== How does Opus compare to other codecs? ===

Opus is distinguished from most formats for high quality audio (AAC, Vorbis, MP3) by having low delay and it is distinguished from most low delay formats (G.711, GSM, Speex) by supporting high audio quality. It meets or exceeds existing codecs' quality across a wide range of bitrates, and it operates at lower delay than virtually any existing compressed format. Further, the Opus format itself and the reference implementation are available under liberal royalty-free licenses, making it easy to adopt, compatible with free software, and suitable for usage as part of the basic infrastructure of the Internet. See the Opus [http://opus-codec.org/comparison comparison page] for more details.

=== Does Opus make all those other lossy codecs obsolete? ===

Theoretically, yes. From a technical point of view (loss, delay, bitrates, ...) it can replace both Vorbis and Speex, and the common proprietary codecs too.

=== Will Opus replace Vorbis in video files? ===

For Ogg Theora video files it can, but the overall size reduction will be minimal, and it will break compatibility with existing players.

For WebM video files, the convention is to use the VP9 video codec when using Opus as an audio codec.

=== How do I use Opus? What programs support Opus? ===

Opus decoding support is now included in many applications, including Firefox, foobar2000, and VLC, as well as in frameworks such as GStreamer and FFmpeg. For now, the best way to '''encode''' Opus files is to use the opusenc command-line tool from the opus-tools package. For real-time applications, Opus support is available in the Google WebRTC codebase. Opus is still a new codec; expect many more applications to support it in the near future.

=== Does Opus support higher sampling rates, such as 96 kHz or 192 kHz? ===

Yes and no. Opus encoding tools like opusenc will happily encode files that are sampled at 96 or 192 kHz. However, input files at these rates are internally converted to 48 kHz, and then only frequencies up to 20 kHz are encoded. The reason is simple: lossy codecs are designed to preserve audible details while discarding irrelevant information. Since the human ear can only hear up to 20 kHz at best (usually lower than that), frequency content above 20 kHz is the first thing to go. See Monty's [http://people.xiph.org/~xiphmont/demo/neil-young.html 24/192 Music Downloads ...and why they make no sense] for more details.

=== What are the licensing requirements? ===

The reference Opus source code is released under a three-clause BSD license, which is a very permissive Open Source license. Commercial use and distribution (including in proprietary software) is permitted, provided that some basic conditions specified in the license are met.

Opus is also covered by some patents, for which royalty-free usage rights are granted, under conditions that the authors believe are compatible with most (all?) open source licenses, including the GPL (v2 and v3). See the [http://www.opus-codec.org/license/ licensing page] for details.

=== Why make Opus free? ===

On the Internet, protocol and codec standards are part of the common infrastructure everyone builds upon. Most of the value of a high quality standard is the innovation and interoperation provided by the systems built on top of it. When a few parties have monopoly rights to monetize a standard, that infrastructure stops being so common—everyone else has more reason to use their own solution instead, increasing cost and reducing efficiency. Imagine a road system where each type of car could only drive on its own manufacturer's pavement. We all benefit from living in a world where all the roads are connected. This is why Opus, unlike many codecs, is free.

=== Is the SILK part of Opus compatible with the SILK implementation shipped in Skype? ===

No. The SILK codec, as submitted by Skype to the IETF, was heavily modified as part of its integration within Opus. The modifications are significant enough that it is not possible to just write a "translator" and even sharing code between Opus and the "old SILK" would be highly non-trivial.

=== Why not keep the SILK and CELT codecs separate? ===

Opus is more than just two independent codecs with a switch. In addition to a linear prediction "SILK mode" and a MDCT "CELT mode" it has a "hybrid mode," where speech frequencies up to 8 kHz are encoded with LP while those above 8 kHz are encoded with MDCT. This is what allows Opus to have such high speech quality around 32 kb/s. Another advantage of the integration is the ability to switch between these modes seamlessly, without any "glitch" and without any out-of-band signalling.

=== Now that Opus is standardized, will its development stop, or can it be further improved? ===

Unlike most ITU-T codecs, Opus is only defined in terms of its decoder. The encoder can keep evolving as long as the bitstream it produces can be decoded by the reference decoder. This is what made it possible for MP3 encoders to improve far beyond the original l3enc and the dist10 reference implementation. Although it is unlikely that Opus encoders will see such spectacular evolution, we certainly hope that future encoders will become much better than the reference encoder. In fact, the 1.1 libopus release significantly improves on the reference encoder's quality.

=== Will all future Opus releases comply with the [http://tools.ietf.org/html/rfc6716 Opus specification]? ===

Yes.

=== In what ways is Opus optimized for the Internet? ===

Opus being optimized for the Internet obviously means that it has good packet loss robustness and concealment, but it goes further. One of the first things we were asked when designing Opus was to make the rate '''really''' adaptable, because we never know what kind of rates will be available. This meant not only having a wide range of bitrates, but also being able to vary the rate in small increments. This is why Opus scales from about 6 kb/s to 512 kb/s, in increments of 0.4 kb/s (one byte with 20 ms frames). Opus can offer more than 1200 possible bitrates without spending 11 bits on signalling the bitrate because UDP already encodes the packet size. One last aspect is that Opus is simple to transport over RTP, as can be seen from the [http://tools.ietf.org/html/draft-spittka-payload-rtp-opus Opus RTP payload format]. For example, it's possible to decode RTP packets without having even seen the SDP or any out-of-band signalling.
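The figures in this answer can be checked with a little arithmetic. The following is a minimal sketch in C; the function names are purely illustrative and are not part of libopus, and the 6 kb/s and 512 kb/s endpoints are the approximate values quoted above.

```c
#include <assert.h>

/* Bitrate change, in bits per second, caused by growing every packet by one
 * byte. frame_us is the frame duration in microseconds (20 ms = 20000). */
long bitrate_step_bps(long frame_us)
{
    assert(frame_us > 0);
    return 8L * 1000000L / frame_us;
}

/* Number of distinct bitrates from min_bps to max_bps inclusive. */
long bitrate_count(long min_bps, long max_bps, long step_bps)
{
    assert(step_bps > 0 && max_bps >= min_bps);
    return (max_bps - min_bps) / step_bps + 1;
}

/* Bits an explicit header field would need to signal one of n values. */
int bits_to_signal(long n)
{
    int bits = 0;
    assert(n > 0);
    while ((1L << bits) < n) {
        bits++;
    }
    return bits;
}
```

With 20 ms frames, <tt>bitrate_step_bps(20000)</tt> gives 400 b/s, <tt>bitrate_count(6000, 512000, 400)</tt> gives 1266 rates, and <tt>bits_to_signal(1266)</tt> gives 11, which matches the numbers in the text.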

=== What applications for Android can play Opus? ===

Right now, there are just a few, but that list is growing fast. Please see [http://android.stackexchange.com/q/37970/7425 this question on android.stackexchange.com]. Feel free to suggest other applications.

=== When will the next version be released? ===

When it's done. Seriously. We do not know. Opus is not a large project with a fixed release schedule. That being said, our pre-releases and even the git repository are generally pretty stable and given proper testing (which you should always do anyway), are safe to distribute. Just be aware that the API of new features (that have never been included in a stable release) could potentially still change.

== Opus for Software developers ==

=== On what platforms does Opus run? ===

The Opus code base is written in C89 and should run on the vast majority of recent (and not so recent) CPUs. A few of the platforms on which Opus has been tested and is known to run include x86, x86-64, ARM, Itanium, Blackfin, and SPARC.

=== Is there a fixed-point implementation? ===

Yes; the fixed-point and floating-point decoder and encoder implementations are part of the same code base. The code defaults to float, so you need to configure with --enable-fixed-point (or define FIXED_POINT if not using the configure script) to build the code for fixed-point.

=== Which implementation should I use? ===

While the implementation in RFC 6716 is what ''defines'' the standard, it is likely not the best and most up-to-date implementation. The [http://opus-codec.org/ Opus] website was set up for the purpose of continually improving the implementation (in terms of speed, encoding quality, device compatibility, etc.) while still conforming to the standard. All Opus implementations are compatible by definition.

=== How is supporting Opus different from supporting Speex/G.711/MP3? ===

Opus has variable frame durations which can change on the fly, so an Opus decoder needs to be ready to accept packets with durations that are any multiple of 2.5ms up to a maximum of 120ms.

The Opus encoder and decoder do not need to have matched sampling rates or channel counts. It is recommended to always just decode at the highest rate the hardware supports (e.g. 48 kHz stereo) so the user gets the full quality of whatever the far end is sending.
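The duration rule above translates directly into sample counts at the decoder's output rate. Here is a minimal sketch in C; the helper names are hypothetical and not part of libopus, and the sampling rates are the five that Opus supports directly.

```c
#include <assert.h>

/* Samples per channel in 2.5 ms at decoder sampling rate fs (in Hz).
 * fs must be one of 8000, 12000, 16000, 24000 or 48000. */
int samples_per_2_5ms(int fs)
{
    assert(fs == 8000 || fs == 12000 || fs == 16000 ||
           fs == 24000 || fs == 48000);
    return fs / 400;
}

/* Longest packet duration (120 ms = 48 units of 2.5 ms), in samples
 * per channel. Useful for sizing the decoder output buffer. */
int max_packet_samples(int fs)
{
    return 48 * samples_per_2_5ms(fs);
}

/* Nonzero if `samples` (per channel) is a legal packet duration:
 * a positive multiple of 2.5 ms that does not exceed 120 ms. */
int is_valid_packet_duration(int samples, int fs)
{
    int unit = samples_per_2_5ms(fs);
    return samples > 0 && samples % unit == 0 && samples <= 48 * unit;
}
```

When decoding at 48 kHz, <tt>max_packet_samples(48000)</tt> is 5760, so an output buffer of 5760 samples per channel is large enough for any packet.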

=== My application doesn't work. Can anyone help me? ===

It's possible to get help, but before doing so, there are a few basic things to try:

* Implement the application with uncompressed audio instead of Opus. If it still doesn't work, then the problem isn't related to Opus.
* Read the [http://www.opus-codec.org/docs/ documentation].
* Read the opus_demo.c source code to see how to use the encoder and decoder.

If you still can't solve the problem, the best option is to ask for help on the [http://lists.xiph.org/mailman/listinfo/opus mailing list].

=== How do I report a bug? ===

If you think you have found a bug in Opus (and not in your application), please [https://trac.xiph.org/newticket?component=Opus file a bug report]. Please include a way for us to reproduce the problem. The best way to do this is to provide an input file, along with the opusenc/opusdec/opus_demo command line that causes the bug to occur. If the bug cannot be triggered by the command line tools, please provide a simple patch or C file that can help reproduce it. Please also provide any other relevant information, such as OS, CPU, build options, etc. Don't hesitate to also contact us on the [http://lists.xiph.org/mailman/listinfo/opus mailing list] or on [irc://irc.freenode.net/opus IRC].

=== What is Opus Custom? ===

Opus Custom is an '''optional''' part of the Opus standard that allows for sampling rates other than 8, 12, 16, 24, or 48 kHz and frame sizes other than multiples of 2.5 ms. Opus Custom requires additional out-of-band signalling that Opus does not normally require and disables many of Opus' coding modes. Also, because it is an optional part of the specification, using Opus Custom may lead to compatibility problems. For these reasons, its use is discouraged outside of very specific applications, e.g.:
* ultra low delay applications where synchronization with the soundcard buffer is important.
* low-power embedded applications where compatibility with others is not important.

For almost all other types of applications, Opus Custom should not be used.

=== How do I use 44.1 kHz or some other sampling rate not directly supported by Opus? ===

Tools which read or write Opus should interoperate with other sampling rates by transparently performing sample rate conversion behind the scenes whenever necessary. In particular, software developers should not use Opus Custom for 44.1 kHz support, except in the very specific circumstances outlined above.

Note that it's generally preferable for a decoder to output at 48 kHz even when you know the original input was 44.1 kHz, not only because you can skip resampling but also because many inexpensive audio interfaces have poor quality output at 44.1 kHz.

The opus-tools package source code contains a small, high quality, high performance, BSD licensed resampler which can be used where resampling is required.

=== How is the bitrate setting used in VBR mode? ===

Variable bitrate (VBR) mode allows the bitrate to automatically vary over time based on the audio being encoded, in order to achieve a consistent quality. The bitrate setting controls the desired quality, on a scale that is calibrated to closely approximate the average bitrate that would be obtained over a large and diverse collection of audio. The actual bitrate of any particular audio stream may be higher or lower than this average.

=== What frame size should I use? ===

A 20 ms frame size works well for most applications. Smaller frame sizes may be used to achieve lower latency, but have lower quality at a given bitrate. Sizes greater than 20 ms increase latency and are generally beneficial only at fairly low bitrates, or when used to reduce external overhead (e.g. by reducing the number of packets that are sent).
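To get a feel for the "external overhead" mentioned above, the per-packet header cost can be expressed as a bitrate. The sketch below is illustrative only: it assumes one frame per packet and 40 bytes of headers per packet (IPv4 20 + UDP 8 + RTP 12, with no options or extensions), which is an assumption and not something Opus itself defines.

```c
#include <assert.h>

/* Header overhead in bits per second when each packet carries one frame.
 * header_bytes: per-packet header size (assumed, e.g. 40 for IPv4/UDP/RTP).
 * frame_us:     frame duration in microseconds (20 ms = 20000).
 * header_bytes is capped so the product fits in a 32-bit long. */
long header_overhead_bps(long header_bytes, long frame_us)
{
    assert(header_bytes >= 0 && header_bytes <= 256 && frame_us > 0);
    return header_bytes * 8L * 1000000L / frame_us;
}

/* Total rate on the wire: codec bitrate plus header overhead. */
long total_bps(long codec_bps, long header_bytes, long frame_us)
{
    return codec_bps + header_overhead_bps(header_bytes, frame_us);
}
```

Under these assumptions, a 16 kb/s stream costs 32 kb/s on the wire with 20 ms frames but only about 21.3 kb/s with 60 ms frames, which is why longer frames mostly pay off at low bitrates.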

=== Forward Error correction (FEC) doesn't appear to do anything! HELP! ===

The inband FEC feature of Opus helps reduce the harm of packet loss by encoding some information about the prior packet.

In order to make use of inband FEC the decoder must delay its output by at least one frame so that it can call the decoder with the decode_fec argument on the ''next'' frame in order to reconstruct the missed frame. This works best if it's integrated with a jitter buffer.

FEC is only used by the encoder under certain conditions: the feature must be enabled via the OPUS_SET_INBAND_FEC CTL, the encoder must be told to expect loss via the OPUS_SET_PACKET_LOSS_PERC CTL, and the codec must be operated in any of the linear prediction or Hybrid modes. Frame durations of <10ms and very high bitrates will use the MDCT modes, where FEC is not available.

Even when FEC is not used, telling the encoder about the expected level of loss will help it make more intelligent decisions. By default the implementation assumes there is no loss.
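The one-frame delay described above boils down to a simple decision for each frame the jitter buffer has to play out. Here is a minimal sketch in C of that decision only. The names <tt>decode_action</tt> and <tt>choose_action</tt> are hypothetical, and the arrival pattern string is just a convenient stand-in for a real jitter buffer. The comments show the corresponding <tt>opus_decode()</tt> call from libopus, where <tt>frame_size</tt> for the lost frame should match the duration of the missing audio.

```c
#include <assert.h>
#include <string.h>

typedef enum {
    ACTION_DECODE,      /* opus_decode(dec, pkt, len, pcm, frame_size, 0)       */
    ACTION_DECODE_FEC,  /* opus_decode(dec, next_pkt, next_len, pcm, frame_size, 1) */
    ACTION_CONCEAL      /* opus_decode(dec, NULL, 0, pcm, frame_size, 0)        */
} decode_action;

/* arrival is a pattern such as "1011": character i is '1' if packet i is
 * in the jitter buffer and '0' if it is missing. Returns how frame n
 * should be produced. */
decode_action choose_action(const char *arrival, size_t n)
{
    assert(arrival != NULL && n < strlen(arrival));
    if (arrival[n] == '1') {
        return ACTION_DECODE;          /* packet n arrived */
    }
    /* arrival[n + 1] is at worst the terminating '\0', so this is in bounds. */
    if (arrival[n + 1] == '1') {
        return ACTION_DECODE_FEC;      /* recover frame n from packet n+1 */
    }
    return ACTION_CONCEAL;             /* nothing to recover from */
}
```

After an <tt>ACTION_DECODE_FEC</tt> step, packet n+1 is still decoded normally for its own frame on the next iteration.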

=== I can't use malloc or much stack on my embedded platform. How do I make Opus work? ===

A normal build of libopus only uses malloc/free in the _create() and _destroy() calls, so Opus is safe for realtime use so long as the codec state is pre-created.

To build Opus without any reference to malloc/free at all, use the _init() calls rather than the _create() calls in your application and compile with <tt>CFLAGS="-DOVERRIDE_OPUS_ALLOC -DOVERRIDE_OPUS_FREE -D'opus_alloc(x)=NULL' -D'opus_free(x)=NULL' "</tt>. The resulting build does not use malloc/free.

If libopus is built with -DNONTHREADSAFE_PSEUDOSTACK (instead of VAR_ARRAYS or USE_ALLOCA), it will use a user-provided block of heap instead of the stack for many things, resulting in much lower stack usage. However, this makes the resulting library non-threadsafe and is not recommended on anything except limited embedded platforms.

=== How can I ensure that my software interoperates with other software implementing Opus? ===

For applications using Ogg files, there are some [http://people.xiph.org/~greg/opus_testvectors/ Ogg Opus testvectors] to test decoders and you can test encoders with opusdec. For RTP applications, the opusrtp tool can be useful. In general, here's a list of specific issues to check:
* Can your application handle all frame sizes, including changing the frame size from frame to frame?
* Does your application properly react to lost packets by calling the decoder with a NULL packet?

=== What is the complexity of Opus? ===

The complexity of Opus varies by a large amount based on the settings used. It depends on the mode, audio bandwidth, number of channels, and even a "complexity knob" that can trade complexity for quality. It will run easily on any recent PC or smartphone. For slower embedded CPUs/DSPs, the amount of CPU required will vary depending on the configuration and the exact CPU, so you will need to experiment. Do not expect Opus to run on really slow devices like 8-bit micro-controllers.

=== Opus is using too much CPU for my application. What can I do? ===

First, don't panic, and don't start writing assembly just yet. It's possible that you're just not using the right set of options. If you're targeting an embedded/mobile platform, chances are the fixed-point build will be faster, so make sure you're using --enable-fixed-point or defining FIXED_POINT in the build system. Opus also has a complexity option that can trade quality for complexity. The default is highest quality and highest complexity. You can control this using OPUS_SET_COMPLEXITY() (see the documentation for details). If all else fails and you need to optimize the Opus code, see the next question.

=== I would like to optimize Opus. Where should I start? ===

Please [http://www.opus-codec.org/contact/ contact us] before you start, or at least before you get too far. This will help coordinate the optimization effort and reduce the probability of wasting your time on duplicated effort or going down the wrong path.

=== Does Opus have an echo canceller like Speex does? ===

Echo cancellation is completely independent from codecs. You can use any echo canceller (including the one from libspeexdsp) along with Opus. That being said, among the free acoustic echo cancellers (AEC) we're aware of, the best is probably the Google AEC from the [https://code.google.com/p/webrtc/ WebRTC codebase].

OpusFAQ

2014-11-26T20:45:55Z

MarkH: /* Opus for Software developers */ add a couple of common questions

[[Image:Opus logo trans.png|right]]

== General Questions ==

=== What is Opus? Who created it? ===

Opus is a totally open, royalty-free, highly versatile audio codec. It is primarily designed for interactive speech and music transmission over the Internet, but is also applicable to storage and streaming applications. It incorporates technology from Skype's SILK codec and Xiph.Org's CELT codec. It has been standardized by the Internet Engineering Task Force (IETF) as '''[http://tools.ietf.org/html/rfc6716 RFC 6716]'''.

Opus has been in development since early 2007. Programmers associated with Xiph.Org, Skype, and several other organizations have contributed to its development and to the standardization process as part of the IETF's codec working group.

=== How does Opus compare to other codecs? ===

Opus is distinguished from most formats for high quality audio (AAC, Vorbis, MP3) by having low delay and it is distinguished from most low delay formats (G.711, GSM, Speex) by supporting high audio quality. It meets or exceeds existing codecs' quality across a wide range of bitrates, and it operates at lower delay than virtually any existing compressed format. Further, the Opus format itself and the reference implementation are available under liberal royalty-free licenses, making it easy to adopt, compatible with free software, and suitable for usage as part of the basic infrastructure of the Internet. See the Opus [http://opus-codec.org/comparison comparison page] for more details.

=== Does Opus make all those other lossy codecs obsolete? ===

Theoretically, yes. From a technical point of view (loss, delay, bitrates, ...), it can replace both Vorbis and Speex, as well as the common proprietary codecs.

=== Will Opus replace Vorbis in video files? ===

For Ogg Theora video files, it can, but the overall size reduction will be minimal, and it will break compatibility with existing players.

For WebM video files, the convention is to use the VP9 video codec when using Opus as an audio codec.

=== How do I use Opus? What programs support Opus? ===

Opus decoding support is now included in many applications, including Firefox, foobar2000, and VLC, as well as in frameworks such as GStreamer and FFmpeg. For now, the best way to '''encode''' Opus files is to use the opusenc command-line tool from the opus-tools package. For real-time applications, Opus support should soon be available in the Google WebRTC codebase. Opus is still a new codec, so expect many more applications to support it in the near future.

=== Does Opus support higher sampling rates, such as 96 kHz or 192 kHz? ===

Yes and no. Opus encoding tools like opusenc will happily encode files that are sampled at 96 or 192 kHz. However, input files at these rates are internally converted to 48 kHz, and then only frequencies up to 20 kHz are encoded. The reason is simple: lossy codecs are designed to preserve audible details while discarding irrelevant information. Since the human ear can only hear up to 20 kHz at best (usually lower than that), frequency content above 20 kHz is the first thing to go. See Monty's [http://people.xiph.org/~xiphmont/demo/neil-young.html 24/192 Music Downloads ...and why they make no sense] for more details.

=== What are the licensing requirements? ===

The reference Opus source code is released under a three-clause BSD license, which is a very permissive Open Source license. Commercial use and distribution (including in proprietary software) is permitted, provided that some basic conditions specified in the license are met.

Opus is also covered by some patents, for which royalty-free usage rights are granted, under conditions that the authors believe are compatible with most (all?) open source licenses, including the GPL (v2 and v3). See the [http://www.opus-codec.org/license/ licensing page] for details.

=== Why make Opus free? ===

On the Internet, protocol and codec standards are part of the common infrastructure everyone builds upon. Most of the value of a high quality standard is the innovation and interoperation provided by the systems built on top of it. When a few parties have monopoly rights to monetize a standard, that infrastructure stops being so common—everyone else has more reason to use their own solution instead, increasing cost and reducing efficiency. Imagine a road system where each type of car could only drive on its own manufacturer's pavement. We all benefit from living in a world where all the roads are connected. This is why Opus, unlike many codecs, is free.

=== Is the SILK part of Opus compatible with the SILK implementation shipped in Skype? ===

No. The SILK codec, as submitted by Skype to the IETF, was heavily modified as part of its integration within Opus. The modifications are significant enough that it is not possible to just write a "translator" and even sharing code between Opus and the "old SILK" would be highly non-trivial.

=== Why not keep the SILK and CELT codecs separate? ===

Opus is more than just two independent codecs with a switch. In addition to a linear prediction "SILK mode" and a MDCT "CELT mode" it has a "hybrid mode," where speech frequencies up to 8 kHz are encoded with LP while those above 8 kHz are encoded with MDCT. This is what allows Opus to have such high speech quality around 32 kb/s. Another advantage of the integration is the ability to switch between these modes seamlessly, without any "glitch" and without any out-of-band signalling.

=== Now that Opus is standardized, will its development stop, or can it be further improved? ===

Unlike most ITU-T codecs, Opus is only defined in terms of its decoder. The encoder can keep evolving as long as the bitstream it produces can be decoded by the reference decoder. This is what made it possible for MP3 encoders to improve far beyond the original l3enc and the dist10 reference implementation. Although it is unlikely that Opus encoders will see such spectacular evolution, we certainly hope that future encoders will become much better than the reference encoder. In fact, the 1.1 libopus release significantly improves on the reference encoder's quality.

=== Will all future Opus releases comply with the [http://tools.ietf.org/html/rfc6716 Opus specification]? ===

Yes.

=== In what ways is Opus optimized for the Internet? ===

Opus being optimized for the Internet obviously means that it has good packet loss robustness and concealment, but it goes further. One of the first things we were asked when designing Opus was to make the rate '''really''' adaptable, because we never know what kind of rates will be available. This meant not only having a wide range of bitrates, but also being able to vary the rate in small increments. This is why Opus scales from about 6 kb/s to 512 kb/s, in increments of 0.4 kb/s (one byte with 20 ms frames). Opus can offer more than 1200 possible bitrates without spending 11 bits on signalling the bitrate because UDP already encodes the packet size. One last aspect is that Opus is simple to transport over RTP, as can be seen from the [http://tools.ietf.org/html/draft-spittka-payload-rtp-opus Opus RTP payload format]. For example, it's possible to decode RTP packets without having even seen the SDP or any out-of-band signalling.

=== What applications for Android can play Opus? ===

Right now, there are just a few, but that list is growing fast. Please see [http://android.stackexchange.com/q/37970/7425 this question on android.stackexchange.com]. Feel free to suggest other applications.

=== When will the next version be released? ===

When it's done. Seriously. We do not know. Opus is not a large project with a fixed release schedule. That being said, our pre-releases and even the git repository are generally pretty stable and given proper testing (which you should always do anyway), are safe to distribute. Just be aware that the API of new features (that have never been included in a stable release) could potentially still change.

== Opus for Software developers ==

=== On what platforms does Opus run? ===

The Opus code base is written in C89 and should run on the vast majority of recent (and not so recent) CPUs. A few of the platforms on which Opus has been tested and is known to run include x86, x86-64, ARM, Itanium, Blackfin, and SPARC.

=== Is there a fixed-point implementation? ===

Yes; the fixed-point and floating-point decoder and encoder implementations are part of the same code base. The code defaults to float, so you need to configure with --enable-fixed-point (or define FIXED_POINT if not using the configure script) to build the code for fixed-point.

=== Which implementation should I use? ===

While the implementation in RFC 6716 is what ''defines'' the standard, it is likely not the best and most up-to-date implementation. The [http://opus-codec.org/ Opus] website was set up for the purpose of continually improving the implementation (in terms of speed, encoding quality, device compatibility, etc.) while still conforming to the standard. All Opus implementations are compatible by definition.

=== How is supporting Opus different from supporting Speex/G.711/MP3? ===

Opus has variable frame durations which can change on the fly, so an Opus decoder needs to be ready to accept packets with durations that are any multiple of 2.5ms up to a maximum of 120ms.

The Opus encoder and decoder do not need to have matched sampling rates or channel counts. It is recommended to always just decode at the highest rate the hardware supports (e.g. 48 kHz stereo) so the user gets the full quality of whatever the far end is sending.

=== My application doesn't work. Can anyone help me? ===

It's possible to get help, but before doing so, there are a few basic things to try:

* Implement the application with uncompressed audio instead of Opus. If it still doesn't work, then the problem isn't related to Opus.
* Read the [http://www.opus-codec.org/docs/ documentation].
* Read the opus_demo.c source code to see how to use the encoder and decoder.

If you still can't solve the problem, the best option is to ask for help on the [http://lists.xiph.org/mailman/listinfo/opus mailing list].

=== How do I report a bug? ===

If you think you have found a bug in Opus (and not in your application), please [https://trac.xiph.org/newticket?component=Opus file a bug report]. Please include a way for us to reproduce the problem. The best way to do this is to provide an input file, along with the opusenc/opusdec/opus_demo command line that causes the bug to occur. If the bug cannot be triggered by the command line tools, please provide a simple patch or C file that can help reproduce it. Please also provide any other relevant information, such as OS, CPU, build options, etc. Don't hesitate to also contact us on the [http://lists.xiph.org/mailman/listinfo/opus mailing list] or on [irc://irc.freenode.net/opus IRC].

=== What is Opus Custom? ===

Opus Custom is an '''optional''' part of the Opus standard that allows for sampling rates other than 8, 12, 16, 24, or 48 kHz and frame sizes other than multiples of 2.5 ms. Opus Custom requires additional out-of-band signalling that Opus does not normally require and disables many of Opus' coding modes. Also, because it is an optional part of the specification, using Opus Custom may lead to compatibility problems. For these reasons, its use is discouraged outside of very specific applications, e.g.:
* ultra low delay applications where synchronization with the soundcard buffer is important.
* low-power embedded applications where compatibility with others is not important.

For almost all other types of applications, Opus Custom should not be used.

=== How do I use 44.1 kHz or some other sampling rate not directly supported by Opus? ===

Tools which read or write Opus should interoperate with other sampling rates by transparently performing sample rate conversion behind the scenes whenever necessary. In particular, software developers should not use Opus Custom for 44.1 kHz support, except in the very specific circumstances outlined above.

Note that it's generally preferable for a decoder to output at 48 kHz even when you know the original input was 44.1 kHz, not only because you can skip resampling but also because many inexpensive audio interfaces have poor quality output at 44.1 kHz.

The opus-tools package source code contains a small, high quality, high performance, BSD licensed resampler which can be used where resampling is required.

=== How is the bitrate setting used in VBR mode? ===

Variable bitrate (VBR) mode allows the bitrate to automatically vary over time based on the audio being encoded, in order to achieve a consistent quality. The bitrate setting controls the desired quality, on a scale that is calibrated to closely approximate the average bitrate that would be obtained over a large and diverse collection of audio. The actual bitrate of any particular audio stream may be higher or lower than this average.

=== What frame size should I use? ===

A 20 ms frame size works well for most applications. Smaller frame sizes may be used to achieve lower latency, but have lower quality at a given bitrate. Sizes greater than 20 ms increase latency and are generally beneficial only at fairly low bitrates, or when used to reduce external overhead (e.g. by reducing the number of packets that are sent).

=== Forward Error correction (FEC) doesn't appear to do anything! HELP! ===

The inband FEC feature of Opus helps reduce the harm of packet loss by encoding some information about the prior packet.

In order to make use of inband FEC the decoder must delay its output by at least one frame so that it can call the decoder with the decode_fec argument on the ''next'' frame in order to reconstruct the missed frame. This works best if it's integrated with a jitter buffer.

FEC is only used by the encoder under certain conditions: the feature must be enabled via the OPUS_SET_INBAND_FEC CTL, the encoder must be told to expect loss via the OPUS_SET_PACKET_LOSS_PERC CTL, and the codec must be operated in any of the linear prediction or Hybrid modes. Frame durations of <10ms and very high bitrates will use the MDCT modes, where FEC is not available.

Even when FEC is not used, telling the encoder about the expected level of loss will help it make more intelligent decisions. By default the implementation assumes there is no loss.

=== I can't use malloc or much stack on my embedded platform. How do I make Opus work? ===

A normal build of libopus only uses malloc/free in the _create() and _destroy() calls, so Opus is safe for realtime use so long as the codec state is pre-created.

To build Opus without any reference to malloc/free at all, use the _init() calls rather than the _create() calls in your application and compile with <tt>CFLAGS="-DOVERRIDE_OPUS_ALLOC -DOVERRIDE_OPUS_FREE -D'opus_alloc(x)=NULL' -D'opus_free(x)=NULL' "</tt>. The resulting build does not use malloc/free.

If libopus is built with -DNONTHREADSAFE_PSEUDOSTACK (instead of VAR_ARRAYS or USE_ALLOCA), it will use a user-provided block of heap instead of the stack for many things, resulting in much lower stack usage. However, this makes the resulting library non-threadsafe and is not recommended on anything except limited embedded platforms.

=== How can I ensure that my software interoperates with other software implementing Opus? ===

For applications using Ogg files, there are some [http://people.xiph.org/~greg/opus_testvectors/ Ogg Opus testvectors] to test decoders and you can test encoders with opusdec. For RTP applications, the opusrtp tool can be useful. In general, here's a list of specific issues to check:
* Can your application handle all frame sizes, including changing the frame size from frame to frame?
* Does your application properly react to lost packets by calling the decoder with a NULL packet?

=== What is the complexity of Opus? ===

The complexity of Opus varies by a large amount based on the settings used. It depends on the mode, audio bandwidth, number of channels, and even a "complexity knob" that can trade complexity for quality. It will run easily on any recent PC or smartphone. For slower embedded CPUs/DSPs, the amount of CPU required will vary depending on the configuration and the exact CPU, so you will need to experiment. Do not expect Opus to run on really slow devices like 8-bit micro-controllers.

=== Opus is using too much CPU for my application. What can I do? ===

First, don't panic, and don't start writing assembly just yet. It's possible that you're just not using the right set of options. If you're targeting an embedded/mobile platform, chances are the fixed-point build will be faster, so make sure you're using --enable-fixed-point or defining FIXED_POINT in the build system. Opus also has a complexity option that can trade quality for complexity. The default is highest quality and highest complexity. You can control this using OPUS_SET_COMPLEXITY() (see the documentation for details). If all else fails and you need to optimize the Opus code, see the next question.

=== I would like to optimize Opus. Where should I start? ===

Please [http://www.opus-codec.org/contact/ contact us] before you start, or at least before you get too far. This will help coordinate the optimization effort and reduce the probability of wasting your time on duplicated effort or going down the wrong path.

=== Does Opus have an echo canceller like Speex does? ===

Echo cancellation is completely independent from codecs. You can use any echo canceller (including the one from libspeexdsp) along with Opus. That being said, among the free acoustic echo cancellers (AEC) we're aware of, the best is probably the Google AEC from the [https://code.google.com/p/webrtc/ WebRTC codebase].

OggOpusImplementation

2014-01-24T09:25:25Z

MarkH: MacPorts; fix OS name; Chrome no longer uses WebKit; update FFmpeg support

== Implementation Status ==

Implementation status of the [https://tools.ietf.org/html/draft-ietf-codec-oggopus Ogg Opus draft]. This draft describes encapsulation of Opus audio in the Ogg container to make <tt>.opus</tt> files and streams.

What follows is a brief summary of major implementations of the draft, and their status.
This is intended to help understand the status of each portion of the draft, per [https://tools.ietf.org/html/rfc6982 RFC 6982].

=== opus-tools ===

The initial development implementation of this draft was in the opusenc, opusdec, and opusinfo command-line utilities, part of the opus-tools package and repository.
While still in 'development' status (pre-1.0), these utilities are in active public use, and have shipped with Linux distributions as well as Homebrew and MacPorts for OS X.
Together they implement basic read, write and playback support of Ogg Opus files including metadata, multichannel, start and end trimming, the gain field, live streams, and chained files, but currently do not support seeking.
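
To make "start trimming" and "the gain field" more concrete, here is a minimal sketch of reading the relevant fields from the Ogg Opus identification header (the <tt>OpusHead</tt> packet) as laid out in the draft. It is not code from opus-tools; the struct and function names are hypothetical, and only the fixed 19-byte part of the header is handled.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Subset of the OpusHead fields used in this sketch. */
typedef struct {
    uint8_t  version;
    uint8_t  channels;
    uint16_t pre_skip;     /* samples (at 48 kHz) to discard at the start */
    uint32_t input_rate;   /* original input sample rate, informational */
    int16_t  output_gain;  /* Q7.8 fixed point, in dB */
} OpusHeadInfo;

/* Parse the fixed part of an OpusHead packet (all fields little-endian).
   Returns 0 on success, -1 if the packet is too short or has no signature. */
int parse_opus_head(const uint8_t *p, size_t len, OpusHeadInfo *out)
{
    assert(p != NULL && out != NULL);
    if (len < 19 || memcmp(p, "OpusHead", 8) != 0)
        return -1;
    out->version     = p[8];
    out->channels    = p[9];
    out->pre_skip    = (uint16_t)(p[10] | (p[11] << 8));
    out->input_rate  = (uint32_t)p[12] | ((uint32_t)p[13] << 8) |
                       ((uint32_t)p[14] << 16) | ((uint32_t)p[15] << 24);
    out->output_gain = (int16_t)(uint16_t)(p[16] | (p[17] << 8));
    return 0;
}

/* The gain field stores dB * 256, so divide to get decibels. */
double output_gain_db(const OpusHeadInfo *h)
{
    return h->output_gain / 256.0;
}

/* Start trimming: the PCM position is the granule position minus pre-skip. */
int64_t pcm_position(int64_t granulepos, const OpusHeadInfo *h)
{
    return granulepos - (int64_t)h->pre_skip;
}
```

End trimming follows from the same arithmetic: applying <tt>pcm_position()</tt> to the granule position of the final page gives the number of samples a player should output, which may be fewer than the number decoded.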

This implementation is open source.

* https://git.xiph.org/?p=opus-tools.git
* http://www.opus-codec.org/downloads/

=== opusfile ===

The opusfile library is a separate implementation of this draft as a helper library for demuxing and decoding.
Like opus-tools, it supports metadata, multichannel, start and end trimming, the gain field, live streams, and chained files.
Its primary focus is efficient seeking, including over HTTP(S) and in chained streams.
It currently does not create Ogg Opus files.
This library is in early development and is not widely deployed. However, several projects are currently using it, including xmms2, taglib, and cmus, and it is shipped in some Linux distributions and in Homebrew.
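
Seeking and chained-stream handling both depend on information in the Ogg page header: the granule position gives a page's place on the timeline, and the beginning-of-stream flag marks the start of each link in a chain. The sketch below reads those fields from the 27-byte fixed page header. It is an illustration under those assumptions, not opusfile's actual code, and the names are hypothetical.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Header-type flags from the Ogg page header. */
#define OGG_FLAG_CONTINUED 0x01  /* page continues a packet */
#define OGG_FLAG_BOS       0x02  /* first page of a logical stream */
#define OGG_FLAG_EOS       0x04  /* last page of a logical stream */

typedef struct {
    uint8_t  flags;
    int64_t  granulepos;  /* end position of the page, in 48 kHz samples */
    uint32_t serial;      /* identifies the logical stream */
    uint32_t sequence;    /* page counter, useful for spotting holes */
} OggPageInfo;

static uint32_t read_le32(const uint8_t *p)
{
    return (uint32_t)p[0] | ((uint32_t)p[1] << 8) |
           ((uint32_t)p[2] << 16) | ((uint32_t)p[3] << 24);
}

/* Parse the fixed part of an Ogg page header.
   Returns 0 on success, -1 if the capture pattern or version is wrong. */
int parse_ogg_page_header(const uint8_t *p, size_t len, OggPageInfo *out)
{
    assert(p != NULL && out != NULL);
    if (len < 27 || memcmp(p, "OggS", 4) != 0 || p[4] != 0)
        return -1;
    out->flags = p[5];
    uint64_t g = 0;
    for (int i = 7; i >= 0; i--)      /* 64-bit little-endian value */
        g = (g << 8) | p[6 + i];
    out->granulepos = (int64_t)g;
    out->serial   = read_le32(p + 14);
    out->sequence = read_le32(p + 18);
    return 0;
}
```

A seeking implementation could bisect the file by parsing page headers at candidate offsets and comparing <tt>granulepos</tt> with the target, while a new <tt>OGG_FLAG_BOS</tt> page after an end-of-stream page signals the next link of a chained file.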

This implementation is open source.

* https://git.xiph.org/?p=opusfile.git
* http://www.opus-codec.org/downloads/

=== Firefox ===

The Firefox web browser is a widely deployed implementation of this draft.
Basic playback support with the HTML5 <audio> element, including start and end trimming, the gain field, live streams, multiplexing with other streams (for, e.g., the <video> tag), and seeking, was added in Firefox 15, in production release starting August 28, 2012.
Multichannel support was added in Firefox 17, in production release starting November 20, 2012.
Metadata support was added in Firefox 18, in production release starting January 8, 2013.
Chained file support (as streams only, with seeking disabled) was added in Firefox 20, in production release starting April 2, 2013.
Encoding support was added in Firefox 26, in production release starting December 10, 2013.

This implementation is open source.

* https://mozilla.org/firefox/
* https://hacks.mozilla.org/2012/08/opus-support-for-webrtc/
* https://bugzilla.mozilla.org/show_bug.cgi?id=674225
* https://bugzilla.mozilla.org/show_bug.cgi?id=748144
* https://bugzilla.mozilla.org/show_bug.cgi?id=778050
* https://bugzilla.mozilla.org/show_bug.cgi?id=455165
* https://bugzilla.mozilla.org/show_bug.cgi?id=842243

=== Chrome ===

Google's Chrome web browser added support for this draft with the HTML5 <audio> element in M25 and enabled it by default in M33 for stable release in early 2014.
This implementation currently does not support end trimming, the gain field, or chained files.
Prior to M33, support required passing <tt>--enable-opus-playback</tt> on the command line when invoking the executable.

This implementation is based on open source code in Chromium, Blink, and FFmpeg.

* https://www.google.com/intl/en/chrome/browser/
* https://www.google.com/intl/en/chrome/browser/canary.html
* https://code.google.com/p/chromium/issues/detail?id=104241

=== GStreamer ===

The GStreamer media framework includes an implementation of this draft.
It supports metadata, multichannel, start and end trimming, the gain field, live streams, chained files, multiplexing with other streams (e.g., video), and seeking.
Support was first added in early 2011, and is part of the 0.11 and 1.0.x releases.
The code implementing this draft is in the gst-plugins-bad collection, which generally indicates unsupported and/or experimental code, despite its release status.

This implementation is open source.

* http://gstreamer.net/
* http://cgit.freedesktop.org/gstreamer/gst-plugins-bad/

=== FFmpeg ===

The popular media framework and conversion tool FFmpeg implements this draft.
It supports encoding and decoding, multiplexing and demultiplexing with other streams,
metadata, multichannel, start and end trimming, the gain field, live streams, and seeking.

This implementation is open source.

* https://ffmpeg.org/

=== libav ===

The development repository for libav implements this draft, with support similar to FFmpeg's.

This implementation is open source.

* https://libav.org/

=== VLC ===

VLC is another widely deployed implementation of demuxing, decoding, and playback support for this draft.
It supports metadata, multichannel, start and end trimming, the gain field, live streams, seeking, chained files (though seeking does not work correctly with chained files), and multiplexing with other streams (e.g., video).
Opus support was added in version 2.0.4, released on October 18, 2012.

This implementation is open source.

* https://www.videolan.org/vlc/
* https://git.videolan.org/?p=vlc.git
* https://trac.videolan.org/vlc/ticket/7185

=== foobar2000 ===

A popular Windows application, foobar2000 implements read, write, and playback support for this draft.
It supports metadata, multichannel, start and end trimming, the gain field, live streams, chained files, and seeking.
Opus support was added in version 1.1.14, released on August 17, 2012.
Encoding support is implemented using opusenc from opus-tools.

This implementation is closed source.

* http://www.foobar2000.org/

=== Rockbox ===

Rockbox is an established alternative firmware for portable music players (typically small, embedded devices). It implements demuxing, decoding, and playback support for this draft starting with version 3.13, released on March 5, 2013.
It supports metadata, start and end trimming, the gain field, and seeking.
It does not currently support multichannel or chained files.

This implementation is open source.

* http://www.rockbox.org/
* http://git.rockbox.org/?p=rockbox.git
* http://gerrit.rockbox.org/r/#/c/300/