XiphWiki - User contributions [en]

FLAC

2025-06-25T09:51:37Z

MrZeus: /* OggFLAC */

[[File:FLAC Logo.svg|frameless|right]]
'''FLAC''' stands for '''Free Lossless Audio Codec'''. FLAC is an [[wikipedia:audio compression|audio compression]] [[wikipedia:codec|codec]] that is [[wikipedia:lossless data compression|lossless]]. Unlike [[wikipedia:lossy data compression|lossy]] codecs such as [[Opus]], [[Vorbis]] and [[wikipedia:MP3|MP3]], it does not remove any information from the audio stream.

On January 29, 2003, the [[Xiph.Org Foundation]] announced the incorporation of FLAC under its banner, alongside Vorbis, [[Theora]], and [[Speex]].

== The Project ==

The FLAC project consists of:
* the stream format
* libFLAC, a library of reference encoders and decoders, and a metadata interface
* libFLAC++, an object wrapper around libFLAC
* flac, a command-line wrapper around libFLAC to encode and decode .flac files
* metaflac, a command-line metadata editor for .flac files
* input plugins for various music players ([[wikipedia:Winamp|Winamp]], [[wikipedia:XMMS|XMMS]], [[wikipedia:Foobar2000|foobar2000]], and more in the works)

"Free" means that the specification of the stream format is in the [[wikipedia:public domain|public domain]] (the FLAC project reserves the right to set the FLAC specification and certify compliance), and that neither the FLAC format nor any of the implemented encoding/decoding methods is covered by any patent. It also means that the sources for libFLAC and libFLAC++ are available under the New BSD license, and that the sources for the flac and metaflac applications and for the plugins are available under the [[wikipedia:GPL|GPL]].

== OggFLAC ==

The FLAC codec comes with its own transport system, termed Native FLAC.

A FLAC stream can also be encapsulated in an [[Ogg]] container, the result being termed OggFLAC.
The details of how to do this are called [https://xiph.org/flac/ogg_mapping.html Ogg mapping].
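
As a rough illustration of the difference between the two transports, the sketch below guesses the container from the first bytes of a file. It assumes that a native FLAC stream starts with the marker <code>fLaC</code>, that an Ogg page starts with the capture pattern <code>OggS</code> followed by a 27-byte header and a segment table, and that the first OggFLAC packet begins with the byte 0x7F followed by <code>FLAC</code>, as described in the Ogg mapping. The function name <code>guess_flac_container</code> is made up for this example.

```python
def guess_flac_container(head: bytes) -> str:
    """Guess the transport from the first bytes of a file."""
    if head.startswith(b"fLaC"):
        return "native FLAC"
    if head.startswith(b"OggS"):
        # An Ogg page header is 27 bytes plus one lacing byte per segment;
        # the first packet of the stream starts right after it.
        if len(head) < 27:
            return "Ogg (unknown codec)"
        n_segments = head[26]
        packet = head[27 + n_segments:]
        if packet.startswith(b"\x7fFLAC"):
            return "OggFLAC"
        return "Ogg (other codec)"
    return "unknown"
```

Passing the first 64 bytes or so of a file is enough for this check; it only inspects the first page and does not validate the stream.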

== Comparisons ==

FLAC is distinguished from general lossless algorithms such as ZIP and gzip in that it is specifically designed for the efficient packing of audio data; while ZIP may shrink a CD-quality audio file by 20–40%, FLAC achieves reductions of 30–70%.

While lossy codecs can achieve ratios of 80–90+%, they do this at the expense of discarding data from the original stream. Though FLAC uses a similar technique in its encoding process, it also adds "residual" data to allow the decoder to restore the original waveform flawlessly.
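
The idea of predicting the signal and keeping a residual can be sketched in a few lines. The example below is a toy, not the FLAC bitstream: it assumes a fixed second-order predictor (each sample is predicted as twice the previous sample minus the one before it) and stores the integer prediction error. The function names are invented for illustration.

```python
def encode(samples):
    """Return (warmup, residual) for a fixed second-order predictor."""
    warmup = list(samples[:2])           # first two samples are stored verbatim
    residual = []
    for i in range(2, len(samples)):
        prediction = 2 * samples[i - 1] - samples[i - 2]
        residual.append(samples[i] - prediction)
    return warmup, residual


def decode(warmup, residual):
    """Rebuild the original samples from the warm-up samples and residual."""
    out = list(warmup)
    for r in residual:
        prediction = 2 * out[-1] - out[-2]
        out.append(prediction + r)
    return out


pcm = [0, 3, 7, 12, 16, 19, 20, 19]
warmup, residual = encode(pcm)
restored = decode(warmup, residual)
```

For this input the residual values stay between -2 and 1, which is what makes them cheaper to store than the samples themselves, and <code>restored</code> equals <code>pcm</code> because the prediction error is kept exactly.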

FLAC has become the preferred lossless format for trading live music online. Its files are smaller than Shorten's, and unlike MP3 it is lossless, which ensures the highest fidelity to the source material, something live music traders value. More recently, it has also become a favorite format among traders of non-live lossless audio.

There are other lossless audio codecs: [http://www.wavpack.com/ WavPack] (marginally better compression, slower), [http://wiki.hydrogenaud.io/index.php?title=TAK Tom's lossless Audio Kompressor], [http://www.monkeysaudio.com/ Monkey's Audio] and [https://en.wikipedia.org/wiki/Category:Lossless_audio_codecs some others].

FLAC compiles on many platforms: most Unices (including Linux, *BSD, Solaris, and Mac OS X), DOS, Windows, BeOS, and OS/2. There are build systems for autoconf/automake, MSVC, Watcom C, and Project Builder.

== More information ==

* [[FLACDecoders]]: List of decoders
* [[FLACEncoders]]: List of encoders

== Non-PC playback support ==

FLAC is supported by a wide range of devices.

The [[PortablePlayers#Portable Vorbis Native Support Table|portable players Vorbis support matrix]] also contains information about FLAC support.

Other examples of FLAC supporting devices are:

* [[PortablePlayers/Flash#Cowon.2FiAudio_D2.2C_F2.2C_T2.2C_U3.2C_U2.2C_G3.2C_5.2C_G2.2C_U5.2C_7|iAudio]]: http://www.iaudio.com
* Kenwood Music Keg
* Naim HDX: http://www.naim-audio.com/products/hdx.html
* PhatNoise Home Media Player
* PhatNoise Phatbox
* [[PortablePlayers/Harddisk#Rio Karma|Rio Karma]]: http://www.digitalnetworksna.com/rioaudio/
* [[StaticPlayers#Slim_Devices_Squeezebox.2C_Squeezebox2.2C_Squeezebox3.2C_Transporter|SlimDevices Squeezebox]]: http://www.slimdevices.com

FLAC is supported by the following chips and/or chipsets:

* VLSI Solution OY's [http://www.vlsi.fi/en/products/vs1053.html VS1053b] decodes FLAC

== External Links ==

* [https://xiph.org/flac/ Project homepage]
* [http://www.danrules.com/macflac/ MacFLAC] [[GUI]] frontend to encode/decode FLAC on [[Mac OS X]]
* [[Wikipedia: FLAC]]
* [http://losslessaudio.blogspot.co.uk/ The Lossless Audio Blog]

* [http://www.audiograaf.nl/downloads.html Lossless Codec comparison]

[[Category:FLAC]]

Opus Recommended Settings

2018-12-14T13:15:23Z

MrZeus: /* Mono or Stereo */ tweak mono/stereo threshhold according to current source code

= Recommended Bitrates =
Depending on the kind of audio you want to encode with Opus, you may want to use different bitrate (quality) settings.

The settings in the table below are meant to '''start you off''' with a decent tradeoff between '''good quality''' and '''small file size''' (or '''bitrate usage''', if you're streaming).

You should test the suggested bitrate by actually '''listening''' to your encoded audio and then:
* tweaking the bitrate '''down''' if you think the quality is good, but the file size (or bitrate) is too big,
* tweaking the bitrate '''up''' if you think the quality is bad, and you can afford having bigger files (or a larger streaming bitrate).

{| class="wikitable" style="text-align:center"
|-
!Use Case
!Channels
!Bitrate (Kb/s)
!Notes
|-
|Low bandwidth HF/VHF digital radio
|1 (mono)
|Use '''[http://www.rowetel.com/?page_id=452 Codec 2]'''
|Opus only supports bitrates '''down to 6 Kb/s'''. 
Codec 2 handles ultra low bitrate speech at '''0.7 - 3.2 Kb/s'''.
|-
|VoIP
|1
|10 - 24
|10 Kb/s will deliver narrowband most of the time, 24 Kb/s should give fullband. 
More details in '''[[Opus_Recommended_Settings#Bandwidth_Transition_Thresholds|the relevant table]]''' further down this page.
|-
|rowspan="2"|Audiobooks / Podcasts
|1
|24
|Bitrates from here on up tend to deliver fullband audio.
|-
|2 (stereo)
|32
|
|-
|Music Streaming / Radio
|2
|64 - 96
|Opus has better quality than MP3, AAC and [[Vorbis]] at these rates. 
(listening test results: '''[http://listening-tests.hydrogenaud.io/igorc/results.html 64 Kb/s]''', '''[http://listening-test.coresv.net/results.htm 96 Kb/s]''')
|-
|rowspan="3"|Music Storage
|2
|96 - 128
|Opus at 128 Kb/s (VBR) is pretty much '''[https://en.wikipedia.org/wiki/Transparency_(data_compression) transparent]'''.
|-
|6 (5.1 surround)
|128 - 256
|rowspan="2"|For surround sound, Opus uses '''[https://xiph.org/~xiphmont/demo/opus/demo3.shtml surround-sound bitrate allocation]'''.
|-
|8 (7.1 surround)
|256 - 450
|-
|Music Archiving
|1 - 8
|Use '''[[FLAC]]'''
|If you are archiving audio, use a '''[https://en.wikipedia.org/wiki/Audio_file_format#Lossless_compressed_audio_format lossless audio format]''' to prevent '''[https://en.wikipedia.org/wiki/Generation_loss generation loss]'''.
|}
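
For scripted encoding, the table can be turned into a small lookup. The sketch below copies the ranges from the table; the key names, the function name <code>starting_bitrate</code> and the choice of the midpoint as a first value to try are assumptions of this example, not recommendations from the Opus project.

```python
# (use case, channels) -> (low, high) bitrate in Kb/s, copied from the table.
RECOMMENDED_KBPS = {
    ("voip", 1): (10, 24),
    ("audiobook", 1): (24, 24),
    ("audiobook", 2): (32, 32),
    ("music streaming", 2): (64, 96),
    ("music storage", 2): (96, 128),
    ("music storage", 6): (128, 256),
    ("music storage", 8): (256, 450),
}


def starting_bitrate(use_case: str, channels: int) -> int:
    """Return the midpoint of the suggested range as a first value to try."""
    low, high = RECOMMENDED_KBPS[(use_case.lower(), channels)]
    return (low + high) // 2
```

The value returned is only a starting point; the listening test described above still decides whether to move it up or down.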

= Technical Details =
For the more technical Opus users, here are some details to help you fine-tune your decision on which bitrate best fits your needs.

== Mono or Stereo ==
Opus tends to start '''downmixing stereo inputs to mono''' from roughly '''19 Kb/s and lower'''.
You can check the details in the '''[https://github.com/xiph/opus/blob/master/src/opus_encoder.c#L149 opus_encoder.c]''' source file.

You can force downmixing at any bitrate by using the following command-line parameters:

<code>--downmix-mono</code> - downmixes all input channels to mono

<code>--downmix-stereo</code> - downmixes all input channels to stereo (if there are more than 2 input channels, e.g. surround sound)
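
A minimal sketch of how these parameters might be passed from a script. It assumes the encoder is the <code>opusenc</code> program from opus-tools, which this page does not name explicitly, and that its <code>--bitrate</code> option takes a value in Kb/s; the helper name <code>opusenc_args</code> is hypothetical.

```python
def opusenc_args(src, dst, bitrate_kbps, downmix=None):
    """Build an argument list for opusenc (assumed tool, see the text above)."""
    if downmix not in (None, "mono", "stereo"):
        raise ValueError("downmix must be None, 'mono' or 'stereo'")
    args = ["opusenc", "--bitrate", str(bitrate_kbps)]
    if downmix is not None:
        # Forces the downmix regardless of the bitrate.
        args.append(f"--downmix-{downmix}")
    args += [src, dst]
    return args
```

The resulting list can be handed to <code>subprocess.run(..., check=True)</code>; building a list rather than a shell string avoids quoting problems with file names.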

== Bandwidth Transition Thresholds ==
The following table shows rough bitrates that you might want to use to encode audio that has '''[https://tools.ietf.org/html/rfc6716#section-2 limited frequency bandwidths]'''.
This could be useful if your audio has already been bandpassed, or should go through a bandpass filter (e.g. VoIP speech).

{| class="wikitable" style="text-align:center"
|-
!rowspan="3"|Bandpass Range (Hz)
!colspan="4"|Rough Bitrate Required (Kb/s)
|-
!colspan="2"|Mono
!colspan="2"|Stereo
|-
!Voice
!Music
!Voice
!Music
|-
|style="text-align:right;"|NarrowBand (3 - 4000)
|12
|15
|?
|?
|-
|style="text-align:right;"|MediumBand (3 - 6000)
|15
|18-22
|?
|?
|-
|style="text-align:right;"|WideBand (3 - 8000)
|16-20
|22-28
|?
|?
|-
|style="text-align:right;"|SuperWideBand (3-12000)
|24-28
|28-32
|?
|?
|-
|style="text-align:right;"|FullBand (3-20000)
|28-40
|32-64
|32-64
|64-128
|}

The details of Opus' bandpass thresholds can be found in the '''[https://github.com/xiph/opus/blob/master/src/opus_encoder.c#L121 opus_encoder.c]''' source file.

The '''[http://wiki.hydrogenaud.io/index.php?title=Opus HydrogenAudio]''' wiki also has some great information on Opus and its usage.

== Framesize Tweaking ==
Opus can encode frames of '''2.5''', '''5''', '''10''', '''20''', '''40''', or '''60 ms'''. It can also combine multiple frames into packets of '''up to 120 ms'''.

Opus uses a '''20 ms''' frame size '''[https://tools.ietf.org/html/rfc6716#section-2.1.4 by default]''', as it gives a decent mix of low latency and good quality.

For real-time applications, sending fewer packets per second reduces the overall bitrate, since it reduces the overhead from '''[https://en.wikipedia.org/wiki/IPv6_packet#Fixed_header IP]''', '''[https://en.wikipedia.org/wiki/User_Datagram_Protocol#Packet_structure UDP]''', and '''[https://en.wikipedia.org/wiki/Real-time_Transport_Protocol#Packet_header RTP headers]'''.
However, it increases latency and sensitivity to packet losses, as losing one packet constitutes a loss of a bigger chunk of audio.
Unless operating at very low bitrates over RTP, there is no reason to use frame sizes above 20 ms, as those will have slightly lower quality for music encoding.

For these reasons, the default 20 ms frames are a good choice for most applications.
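
The header overhead mentioned above can be put into numbers. The sketch below assumes the fixed 40-byte IPv6 header, the 8-byte UDP header and a minimal 12-byte RTP header (no CSRC entries or extensions), with one Opus packet per datagram; real deployments may differ, for example when IPv4 or header compression is used.

```python
IPV6_HEADER = 40   # bytes, fixed IPv6 header
UDP_HEADER = 8     # bytes
RTP_HEADER = 12    # bytes, minimal header without CSRC entries or extensions


def header_overhead_kbps(frame_ms):
    """Header bitrate in Kb/s when each packet carries frame_ms of audio."""
    packets_per_second = 1000 / frame_ms
    header_bits = (IPV6_HEADER + UDP_HEADER + RTP_HEADER) * 8
    return header_bits * packets_per_second / 1000


for frame_ms in (10, 20, 60):
    print(f"{frame_ms} ms: {header_overhead_kbps(frame_ms):.1f} Kb/s of headers")
```

Under these assumptions the headers alone cost 48 Kb/s at 10 ms, 24 Kb/s at 20 ms and 8 Kb/s at 60 ms, which shows why longer packets only pay off when the audio bitrate itself is very low.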

== Trading Coding Efficiency with CPU Time ==
The Opus encoder uses its maximum algorithmic '''complexity''' setting of '''10''' '''[https://tools.ietf.org/html/rfc6716#section-2.1.5 by default]'''. This means that it does not hesitate to use CPU to give you the best quality encoding at a given bitrate.

If the CPU usage is too high for the system you are using Opus on, you can try a lower complexity setting. The allowed values span from '''10''' (highest CPU usage and quality) down to '''0''' (lowest CPU usage and quality).

[[Category:Opus]]

Speex

2018-05-21T09:18:06Z

MrZeus: /* Usefulness, Speex DSP */

{{historical}}

= The Speex codec is deprecated! Xiph recommends you use the superior '''[[Opus]]''' codec instead. =

== Website ==
The [http://www.speex.org/ Speex homepage] has all the project info.

There is also a '''[[Speex FAQ]]'''.

== Hardware ==
See [[Speex hardware]] for a partial list of supported hardware

== Usefulness, Speex DSP ==
In 2009, the source code of Speex was split into '''Speex Codec''' and '''Speex DSP'''.

While the Codec part has been deprecated since 2013, the '''[https://git.xiph.org/?p=speexdsp.git;a=summary Speex DSP]''' part is still useful and still under development.

It contains, among other things, resampling code (originally written in 2007 for the Speex codec) that is used by several other audio projects, including '''[https://git.xiph.org/?p=opus-tools.git Opus Tools]''' (Opus Tools 0.1.9 from 2014 actually contains an outdated 2009 version of the resampling code with minimal modifications).

== Tasks ==
These are some improvements that could be made to Speex.

[mailto:speex-dev@xiph.org Let us know] if you'd like to work on one of them.

* Speech/signal processing (DSP design)
** Improve noise suppression (get rid of musical noise) and residual echo suppression
** Improve packet-loss concealment (PLC)
** Re-write the built-in voice activity detector (VAD)
** Improve the 2.15 kbps vocoder mode (there are even 4 unused bits left to use)
** Algorithmic optimizations (see if some searches can be simplified/approximated)

* Complete fixed-point (DSP development)
** Wideband
** VBR
** Rest of the narrowband modes
** Preprocessor (noise suppression, AGC)
** Jitter buffer
** Arch-specific optimization
** More...

* Tune (playing with parameters)
** Noise weighting filter
** Perceptual enhancement

* Features (plain C programming)
** Implement maximum VBR bit-rate
** Implement peeling (write functions to strip some of the bits)
*** Peel high-band (wideband -> narrowband)
*** Transform 24.6 kbps mode to 15 kbps mode

* Documentation
** Use questions from the mailing list to create a [[Speex_FAQ|better FAQ]] on this wiki
** Update the Speex manual based on recent papers
** Improve libspeex documentation
** Write good example code
** Split off the documentation of SpeexDSP: the latest release, "speexdsp-1.2rc3.tar.gz" from 2015, still contains "The Speex Codec Manual Version 1.2 Beta 3" from 2007, which documents the codec part, not the DSP part. In addition, [http://www.speex.org/downloads] contains some dead links and does not prominently state that only the codec is deprecated, while the DSP part is not.

== External links ==
* [[Applications that use Speex]]
* [[Games that use Speex]]
* [[Wikipedia: Speex]]

[[Category:Speex]]

Games that use Vorbis

2018-03-03T16:20:19Z

MrZeus: add Press X To Not Die (fun game!)

The following games use [[Vorbis]], most frequently for their in-game music or sound effects:

* All Games By [http://www.reflexive.com/index.php?CAT=Search&SEARCH=dev%3AReflexive+Entertainment&PAGE=GameList Reflexive Entertainment].

* [http://www.mobygames.com/game/windows/007-nightfire 007: Nightfire]: Uses Ogg Vorbis for background soundtrack.

* [http://www.asciisector.net/ Ascii Sector]: Space combat/exploration/trading game. Uses Ogg Vorbis for music.

* [http://www.ageofconan.com/ Age of Conan — Hyborian Adventures]: Uses Ogg Vorbis for all audio.

* [http://www.americasarmy.com/ America’s Army]: Uses Ogg Vorbis for main theme.

* [http://www.amnesiagame.com/ Amnesia: The Dark Descent]: Uses Ogg Vorbis for all audio.

* [http://assault.cubers.net/ AssaultCube]: A free, fast-paced first-person shooter with low hardware requirements for Windows, Linux and OS X. Uses Ogg Vorbis for all game sounds and music.

* [http://www.lionhead.com/bw2/ Black & White 2]: Uses Ogg Vorbis for music.

* [http://www.pyrogon.com/games/candycruncher/ Candy Cruncher]: This cute puzzle game from Brian Hook’s company, Pyrogon, uses Vorbis for the addictive music you hear while you race the clock.

* [http://www.callofcthulhu.com/ Call of Cthulhu] is a first-person horror game that combines intense action and adventure elements. It uses Ogg Vorbis for music and speech.

* [http://www.mobygames.com/game/windows/catechumen Catechumen] is a Christian-themed FPS that uses Ogg Vorbis.

* [http://www.civilization5.com/ Civilization V] is a turn-based strategy game that uses Ogg Vorbis for music.

* [http://www.atari.com/crashday/ Crashday]: Stunt racing game, developed by independent German studio Moon Byte. Uses Ogg Vorbis for music.

* [http://buenavistagames.go.com/product/chickenLittlePC.html Chicken Little]: Adventure game for children inspired by the motion picture. The PC edition uses Vorbis for dialog and music (and possibly for sound effects too).

* [http://www.cossacks2.de/ Cossacks 2]: “Cossacks II: Napoleonic Wars” is a sequel of “Cossacks: European Wars”. Ogg Vorbis 1.0 files are in \data\music\

* [http://www.darwinia.co.uk/ Darwinia]: The second title from indie developer Introversion Software. Darwinia is stylized retro: Tron meets Cannon Fodder. It uses Vorbis for all in-game sound effects and music.

* [http://www.introversion.co.uk/defcon/ DEFCON]: The third title from Introversion Software. Uses Vorbis for music, effects, everything, like Darwinia.

* [http://devilmaycry.com/ Devil May Cry 4] (for the PC, at least): Uses (occasionally multichannel) Ogg Vorbis for ingame and cutscene music.

* [http://www.eidos.co.uk/gss/dxiw/ Deus Ex: Invisible War] by Ion Storm/Eidos: Uses Ogg Vorbis for music and voice (and possibly for sound fx too).

* [http://diablo3.com Diablo III] uses Vorbis for audio.

* [http://www.idsoftware.com/games/doom/doom3/ DOOM 3]: The latest version of this famous first person shooter game from id software uses Vorbis for the theme music as well as their ambient and game sounds.

* [http://mobygames.com/game/sheet/p,3/gameId,6505/ Duke Nukem: Manhattan Project]: This game from 3D Realms was released in 2002 and used Vorbis for their music. (Official website is down, using Mobygames link)

* [http://www.popcap.com/games/free/dynomite Dynomite]: Puzzle Bobble/Bust A Move clone for Windows by PopCap Games, with mouse control. Uses Ogg Vorbis for nearly all sound effects.

* [http://en.wikipedia.org/wiki/Eschalon:_Book_I Eschalon]: A classic-style roleplaying game, for Windows, Mac, and Linux. Music is in ''Ogg Vorbis'' format.

* [http://www.mobygames.com/game/enclave/ Enclave] by Starbreeze/Black Label Games: Uses Ogg Vorbis for music (and possibly for sound fx and voice too).

* [http://www.eve-online.com EVE Online] by CCP Games, the Icelandic-homed space-based single-shard persistent world game uses Ogg Vorbis for its music.

* [http://www.lionhead.com/fabletlc/ Fable: The Lost Chapters]: Uses Ogg Vorbis for music and cutscenes (Ancient libVorbis version, 1.0 RC2).

* [http://farcry.ubi.com/ FarCry] by Crytek: uses Ogg Vorbis for music and effects.

* [http://www.freedom-fighters.co.uk/ Freedom Fighters] by IO Interactive: String search reveals “libVorbis I 20011217” in freedom.exe.

* [http://www.siriusgames.dk/index.php?pageid=67 Gangland] by MediaMobsters: Uses Ogg Vorbis for music and cutscenes (Data\streams\). Encoded with Xiph.Org libVorbis I 20020717. Decoder library: FMOD 3.71.

* [http://www.rockstargames.com/vicecity/ Grand Theft Auto: Vice City] by Rockstar Games/Rockstar North uses Ogg Vorbis to store music, radio, ambient sounds, police messages and cutscene audio. Players can also store their custom tracks (accessible in-game via the “User Track Player” radio station) in Ogg Vorbis.

* [http://www.rockstargames.com/sanandreas/ Grand Theft Auto: San Andreas] by Rockstar Games/Rockstar North uses Ogg Vorbis to store music, radio, ambient sounds, police messages and cutscene audio. Players can also store their custom tracks (accessible in-game via the “User Track Player” radio station) in Ogg Vorbis.

* [http://www.gothic3.com/ Gothic 3] by Piranha Bytes: Vorbis is used in the Ogg container for everything (music, speech, effects) except the intro video. For example: Music @ 256 kb/s, Speech @ 86 kb/s. About 18 hours of speech compressed to 700 MB.

* [http://www.guiltygearx2reload.com/ Guilty Gear XX]: The PC version, at least, uses Ogg Vorbis for all the music.

* [http://www.guitarherogame.com/gh2/ Guitar Hero II] by Red Octane (Activision), XBox360 platform only (multichannel Vorbis with 5 or 6 channels per song)

* [http://halo.bungie.org/ Halo]: Mac and PC versions of Halo use Ogg Vorbis for all audio, it seems. The Xiph license and dynamically linked libraries of Ogg and Vorbis are included in the Halo directory. XBox version does not use Ogg Vorbis.

* [http://harrypotter.ea.com/cofs/index.html Harry Potter II (Chamber of Secrets)]: This is unsubstantiated, it was reported on one of the vorbis mailing lists, but there is little evidence either way on this title. EA has been supportive of Vorbis though, so it’s not entirely impossible. If anyone can give us a yay or nay on this, please do.

* [http://www.mightandmagicgame.com/HeroesV/ Heroes of Might and Magic V]: Uses Vorbis for audio and Theora for video.

* [http://www.eidosinteractive.com/games/info.html?gmid=118 Hitman 2]: uses Vorbis. (PC only or consoles too?)

* [http://www.codemasters.com/igi2/front.htm IGI2: Covert Strike]: Not a Norwegian first-person shooter.

* [http://www.inthegroove.com In The Groove]: The premier dance game created by [http://www.roxorgames.com Roxor Games, Inc.] Uses Vorbis for all of the in-game music.

* [http://www.agdinteractive.com/games/kq1/ King's Quest I]: King's Quest I: Quest for the Crown (Enhanced) is a fan remake of the original Sierra classic. Uses Ogg Vorbis for sound and Ogg Theora for cutscene movies.

* [http://www.p3int.com/KULT/ KULT Heretic Kingdoms] by 3D People/Project 3 Interactive: Uses Vorbis (1.0) for music, voice and sound effects.

* Recent Legacy of Kain Games: On the PC, both '''Soul Reaver 2''' and '''Blood Omen 2''' by Crystal Dynamics/Eidos use Ogg Vorbis for music and sound effects. (Source: [http://www.thelostworlds.net/FAQ.HTML#ogg])

* [http://www.ncsoft.net/eng/ncgames/lineage2_intro.asp Lineage II]: NCSoft Corporation’s 3D MMORPG Lineage II uses Ogg Vorbis for its music. They use 1.0beta3, though.

* [http://www.liveforspeed.net/ Live for Speed]: Online racing simulator uses Ogg for all audio and sound effects.

* [http://www.mobygames.com/game/lock-on-modern-air-combat Lock On: Modern Air Combat]: Published by Ubisoft; CD-ROM contains over 1800 Ogg Vorbis files for speech.

* [http://www.mafia-game.com/ Mafia: The City Of Lost Heaven]: Not sure about any console version, but PC version is reported to use Ogg Vorbis.

* [http://www.popcap.com/games/magicmatch Magic Match]: A very elaborate "Match 3" casual game that uses Ogg Vorbis for its audio.

* [http://www.capcom.co.jp/rockmanx8/ Mega Man X8]: The PC version of Mega Man X8 makes use of Vorbis for music and dialogue during cutscenes.

* [http://www.mobygames.com/game/gamecube/metal-gear-solid-the-twin-snakes Metal Gear Solid: The Twin Snakes]: Uses Ogg Vorbis for all speech in the game.

* [http://minecraft.net Minecraft]: Uses Ogg Vorbis for music and sound effects.

* MotoGP: This motorcycle racing sim uses Vorbis for the music and allows players to drop their own .ogg files into the music dir to listen to them in-game.

* [http://www.mystrevelation.com/ Myst IV: Revelation]: Fourth game in the Myst series. Uses Ogg Vorbis for all music, speech and sound effects.

* [http://www.mystvgame.com/ Myst V: End of Ages]: Fifth and final game in the Myst series. Uses Ogg Vorbis for all music, speech and sound effects.

* Nascar Racing Games from Papyrus: They had this to say about their decision and experience:
<blockquote style="background-color: #eeeeee">
"We’re using a lot of spoken audio in this title (a first for us) and your codec has allowed us to reduce more than 350MB of audio data to about 40MB, a huge savings of memory and disk space! We are very impressed." — Tom Faiano, Producer
</blockquote>
<blockquote style="background-color: #eeeeee">
"Incorporating Ogg Vorbis into our codebase was quite painless, and in the end, even refreshing. No fuss no muss. Thank you for your efforts!" — Bill Farquhar, Soundguy du jour
</blockquote>

* [http://www.nexuiz.com/ Nexuiz], a fast-paced FPS with roots in Quake I, uses Vorbis for background music. The minstagib mod uses Vorbis for all of its sound.

* [http://www.codemasters.com/flashpoint/ Operation Flashpoint]: This highly successful military simulation/action game from Codemasters uses Vorbis for the in-game music.

* [http://www.orunner.com/ Ostrich Runner] by Geleos: This funny Russian cartoon-style game for kids and not only kids uses Ogg Vorbis for sound, speech and music.

* [http://www.ysagoon.com/glob2/ Globulation 2]: State of the art GPL-ed strategy game!

* [http://www.penumbragame.com Penumbra: Black Plague]: Uses Ogg Vorbis for all audio.

* [http://www.psobb.com/index.php Phantasy Star Online: Blue Burst]: Uses Ogg Vorbis for music, stored in data/ogg.

* [http://www.gopostal.com/ Postal 2]: Probably not the game we want to use to showcase Vorbis, but it’s being used in this Unreal-engine-powered ultra-violent game.

* [http://www.praetoriansgame.com/ Praetorians]: This very successful game from Pyro Studios uses Vorbis for its music.

* [http://www.pressxtonotdie.com/ Press X To Not Die]: This interactive action-comedy film uses Vorbis for its audio and Theora for its video.

* [http://www.psychonauts.com/ Psychonauts]: Has vorbis.dll and vorbisfile.dll.

* [http://www.quake4game.com/ Quake 4]: Quake 4 is the fourth title in the series of Quake FPS computer games. All game music, speech and sound effects make use of Vorbis.

* [http://www.restricted-area.net/ Restricted Area] by Master Creating: Uses Ogg Vorbis for music and VP3 for videos.

* Ricochet: An addictive version of Breakout.

* [http://www.rockband.com/ Rock Band]: XBox360 version uses the same type of multichannel Vorbis files as Guitar Hero II, but with more channels to handle the drums and vocals separately.

* [http://www.rockmanager.net/ Rock Manager]: Vorbis is used in this “new rock’n roll management sim for PC from Pan Vision and Monsterland”.

* [http://www.sacred2.com/ Sacred 2] by Studio II: uses multichannel(!) Ogg Vorbis for music, speech and sound effects.

* [http://www.s2games.com/savage/ Savage]: This S2 Games “RTSS” hybrid genre game uses Vorbis for all the in-game music.

* [http://www.serioussam.com/se/ Serious Sam: The Second Encounter]: uses Vorbis for the music, although it is slightly obfuscated so as not to be easily playable by standard Ogg Vorbis players.

* [http://www.serioussam2.com/ Serious Sam 2]: not only uses Vorbis for the music but even Theora for the videos

* [http://www.totalwar.com/community/warlord.htm Shogun: Total War]: Shogun uses Vorbis, but only to distribute — everything is decompressed to wav during the install.

* [http://www.singles2.com/englisch/index.html Singles 2]: Uses Ogg Vorbis for sound

* [http://www.lart.pl/en/portfolioItem.php?id=91 Ski Jumping 2004]: A commercial game that accurately models the activity of ski jumping. The game also contains over 700 Ogg Vorbis files.

* [http://mobygames.com/game/sheet/p,3/gameId,3453/ Star Trek: Away Team]: Vorbis is used for all sound in the game — music, voiceover and SFX. This squad-based strategy game is set in the Star Trek Next Generation universe. (Official website is down, using Mobygames link)

* [http://starcraft2.com/ StarCraft II]: Uses Vorbis for audio

* StoneLoops! Of Jurassica ([http://itunes.apple.com/WebObjects/MZStore.woa/wa/viewSoftware?id=315210057&mt=8 Apple iTunes App Store link]): Colorful puzzle game for the iPhone/iPod Touch that uses Ogg Vorbis for audio.

* [http://supertux.lethargik.org/ Super Tux]: Uses Vorbis for music.

* [http://www.splintercell3.com/ Tom Clancy’s Splinter Cell Chaos Theory]: .LS0 files are in fact Ogg Vorbis files.

* [http://www.lucasarts.com/games/swrepubliccommando/ Star Wars Republic Commando]: Vorbis is used in the ambient and game music in this latest action game from LucasArts.

* [http://www.reflexive.net/index.php?PAGE=game_detail&AID=30 Swarm]: A fun little arcade shooter.

* [http://www.swat4.com/ SWAT 4]: SWAT 4 uses Ogg Vorbis for audio files.

* [http://www.croteam.com/talosprinciple/ The Talos Principle] is a first-person puzzle game that uses Ogg Vorbis for music.

* [http://www.there.com/ There]: uses both Ogg Vorbis for the sound effects and Ogg Speex for realtime group voice chat, a first for an immersive consumer-oriented world.
<blockquote style="background-color: #eeeeee">
"Voice has become a very popular part of our product!" — David Weekly, a There developer
</blockquote>

* [http://www.wesnoth.org The Battle for Wesnoth]: uses Ogg Vorbis for its music and for most of its sounds.

* [http://www.riddickgame.com/ The Chronicles of Riddick: Escape From Butcher’s Bay (Director’s Cut)]: Uses Vorbis for all audio and Theora for cutscenes.

* [https://thimbleweedpark.com/ Thimbleweed Park]: Retro-looking point-and-click adventure, [https://blog.thimbleweedpark.com/tracking_talkies using Ogg Vorbis for its music, character voices and sound effects].
<blockquote style="background-color: #eeeeee">
"[The characters' dialog is] around 6GB of .wav files and we needed to compress them for inclusion in the game. We used .ogg files due to it being free of the patent and licensing issues that .mp3 has, although either would have worked." — Ron Gilbert
</blockquote>

* [http://www.thethinggames.com/ The Thing]: Uses Vorbis
<blockquote style="background-color: #eeeeee">
"The original multilanguage distro took three CDs, and went down to only one after I converted all wavs to oggs. Nifty :) Sadly enough, marketing decided to not have one language per CD anyway (probably to annoy people who migrate) :/ Thanks for a very cool (and easy to use) lib/format!" — Vincent Penquerc’h
</blockquote>

* [http://www.asahi-net.or.jp/~cs8k-cyu/windows/tt_e.html Torus Trooper]: Frantic 3D shootemup, using Vorbis for the music. (see also the [http://www.emhsoft.net/ttrooper/ Linux port] and [http://www.apple.com/downloads/macosx/games/action_adventure/torustrooper.html MacOS version])

* [http://www.trackmania.com/ TrackMania] uses Vorbis for music in menus and tracks (music in self-made tracks also needs to be in Vorbis).

* [http://www.mikeoldfield.com/ Tr3s Lunas] (aka Music VR episode 1): This game, featuring the music of Mike Oldfield, uses Vorbis for the music.

* [http://www.tribesvengeance.com Tribes: Vengeance] by Irrational Games/Sierra uses Ogg Vorbis for music.

* [http://www.mobygames.com/game/gamecube/true-crime-new-york-city True Crime: New York City]: GameCube version contains over 11,500 Ogg Vorbis files. It is likely that other platform ports also use the same files (note that the [http://www.mobygames.com/game/xbox/true-crime-new-york-city Xbox version] uses Windows Media Audio files in place of Ogg Vorbis files)

* [http://tuxtype.sourceforge.net/ Tuxtyping 2]: Educational typing tutor for kids of all ages!

* [http://www.ufo-aftershock.com/ UFO: Aftershock]: Uses Vorbis for music.

* [http://www.ufo-afterlight.com/ UFO: Afterlight]: Uses Vorbis for music.

* [http://www.atari.com/us/games/unreal2/pc Unreal 2]: PC version uses Vorbis, usage on consoles not confirmed.
<blockquote style="background-color: #eeeeee">
"We went with Ogg Vorbis due to its excellent playback and compression, and we used it not only for music but also all of the in-game voice. Without it, we never would have been able to fit on two CDs." — [http://www.4unrealers.com/entrevistas/263/ 4unrealers.com]
</blockquote>

* [http://www.unrealtournament.com/ut2003/ Unreal Tournament 2003]: This overwhelmingly-popular multiplayer first person shooter PC title uses Vorbis for its music.

* [http://www.unrealtournament.com/ut2004/ Unreal Tournament 2004]: Yet another Unreal game which uses Vorbis for the music (What about effects and voice? Does anyone know?). The readme file of the demo even mentions Speex!

* [http://sc2.sourceforge.net/ The Ur-Quan Masters]: Port of Star Control 2 to modern computers. Toys for Bob released the source of this amazing game under the GPL in 2002. Ogg Vorbis is used for the dialogue and the background music.

* [http://uru.ubi.com/ Uru: Ages Beyond Myst]: Spinoff from the Myst series. Uses Ogg Vorbis for all music, speech and sound effects.

* [http://mobygames.com/game/sheet/p,3/gameId,8635/ Lionheart: Legacy of the Crusader]: A 3/4 view RPG from Black Isle. Uses Vorbis for all audio. Thanks to all the guys that made Vorbis great. (I even donated money myself; someday maybe I can convince the company to kick in some bucks as well.) Official site is down, using Mobygames link.

* [http://www.global-gaming.com/Dominion/ Urban Dominion] (beta): First Person Massively Multiplayer Online Role-Playing Game by Global-Gaming. Uses Ogg Vorbis for the sound system.

* [http://www.vietcong-game.com/ Vietcong]: Vietnam War first-person shooter by Pterodon. Believed to use Ogg Vorbis for the background music.

* [http://vegastrike.sourceforge.net/ Vega Strike]: It is a free spacesim. Ogg Vorbis files are stored in \music\ .

* [http://www.gathering.com/wingsofwar/ Wings Of War]: It is an arcade shooter in times of WWI. Game has ogg.dll, vorbis.dll and vorbisfile.dll — but *.ogg files are not accessible.

* [http://jonof.edgenetwork.org/winbuild/ WinBuild]: Winbuild is a port of Ken Silverman’s [http://www.advsys.net/ken/buildsrc/default.htm original Build engine demo] (for DOS) to Windows. It uses Vorbis compression for the music.

* [http://www.worldofwarcraft.com/ World of Warcraft]: Popular massively multiplayer online role-playing game from Blizzard Entertainment. Uses Vorbis for speech and sound effects.

* [http://www.zax-game.com/ Zax — The Alien Hunter]: A large 3/4 view action adventure game.

[[Category:Vorbis]]

GranulePosAndSeeking

2018-03-02T12:04:15Z

MrZeus: /* References */ fix formatting

== Granulepos encoding and How seeking really works ==

This describes how to seek on a multiplexed Ogg stream containing logical bitstreams with granuleshift, such as [[Theora]], [[Kate]], [[CMML]] or [[OggText]].
The purpose is to locate the earliest page that is required for rendering a given time offset.
Due to the fact that two time-seeking operations are required, this procedure is commonly referred to as a "'''double seek'''".

=== Definitions ===
Let's define '''time''' to mean '''the time represented by a GranulePos value'''. Hence the "time" of a page is the "time represented by the page GranulePos" header field.

Define '''seek''' to mean: for each '''logical''' bitstream, locate the '''bytewise-latest page''' in the bitstream with a '''time before the target time''', then choose the '''bytewise-earliest''' among these pages. If two or more pages have the same time (aka. GranulePos value), seeking must locate the bytewise-earlier page.

==== Granules and Granuleshift ====

We use the term '''granule''' to refer to time measured in the units of the codec. For audio codecs this is ''usually'' samples, and for video codecs it is ''usually'' frames or fields.

In some formats, pages have a dependency on the data of an earlier page; for example in [[Theora]], interframes have a dependency on an earlier keyframe -- the keyframe data is required to decode the interframe. We encode both the time of the page and the time of the page it depends on into the granulepos. In order to do this we treat the granulepos as a bitfield as follows:

+---------------------+-------------+
| prev_granule | offset |
+---------------------+-------------+

Then if a page has time in units of codec granules <tt>curr_granule</tt>, and the page it depends on has time
<tt>prev_granule</tt>, we define <tt>offset</tt> as the difference between these:

offset = curr_granule - prev_granule

We refer to the number of bits used to encode the offset as the "granuleshift". This is fixed for all pages in
that track (logical bitstream). So we encode the later page's granulepos as:

granulepos = (prev_granule << granuleshift) | offset

When decoding, we can extract the current_granule from a granulepos by simply adding these fields:

curr_granule = prev_granule + offset

Which expands to this expression of the page granulepos:

curr_granule = (granulepos >> granuleshift) + (granulepos & ((1 << granuleshift) - 1))

Keyframes, and other data with no dependency on earlier packets, are encoded with:

prev_granule = curr_granule, offset = 0
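
The packing and unpacking rules above can be sketched in a few lines of Python. This is only an illustration of the bitfield arithmetic; the function names and the granuleshift value of 6 used in the example are assumptions made for this sketch, not part of any codec's API.

```python
def pack_granulepos(prev_granule, curr_granule, granuleshift):
    """Encode a page's granulepos from its own time and the time it depends on."""
    offset = curr_granule - prev_granule
    # The offset must fit in the low 'granuleshift' bits.
    if offset < 0 or offset >= (1 << granuleshift):
        raise ValueError("offset does not fit in granuleshift bits")
    return (prev_granule << granuleshift) | offset


def unpack_granulepos(granulepos, granuleshift):
    """Split a granulepos into (prev_granule, offset)."""
    prev_granule = granulepos >> granuleshift
    offset = granulepos & ((1 << granuleshift) - 1)
    return prev_granule, offset


def curr_granule_of(granulepos, granuleshift):
    """Recover curr_granule by adding the two fields."""
    prev_granule, offset = unpack_granulepos(granulepos, granuleshift)
    return prev_granule + offset


# Hypothetical example: an interframe at granule 100 depending on a keyframe at 96.
SHIFT = 6
interframe = pack_granulepos(96, 100, SHIFT)
keyframe = pack_granulepos(96, 96, SHIFT)  # prev_granule = curr_granule, offset = 0
```

With these assumed values, the interframe unpacks to a prev_granule of 96 and an offset of 4, and the keyframe has all of its low bits clear.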

=== Seeking within Single-Track files ===

To locate the earliest page in a track (a logical bitstream) required for rendering a given time offset:

# seek to the desired time
# read the prev_granule out of the granulepos
# seek to the time represented by the prev_granule
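
The three steps above can be sketched as follows. This is a simplified model, not a real Ogg reader: the track is an in-memory list of granulepos values in byte order, one frame per page, the granuleshift of 6 is a hypothetical value, and "seek" is read as "the earliest page among those with the latest time at or before the target". All names are made up for this sketch.

```python
import bisect

GRANULESHIFT = 6  # hypothetical value for this sketch


def curr_granule(granulepos, shift=GRANULESHIFT):
    """Time of a page in codec granules: prev_granule + offset."""
    return (granulepos >> shift) + (granulepos & ((1 << shift) - 1))


def seek(pages, target):
    """Index of the page with the latest time <= target (earliest page on ties)."""
    times = [curr_granule(gp) for gp in pages]
    i = bisect.bisect_right(times, target) - 1
    if i < 0:
        return 0  # target lies before the first page
    return bisect.bisect_left(times, times[i])


def double_seek(pages, target):
    first = seek(pages, target)                    # 1. seek to the desired time
    prev_granule = pages[first] >> GRANULESHIFT    # 2. read prev_granule
    return seek(pages, prev_granule)               # 3. seek to prev_granule


# Ten frames, keyframes at granules 0 and 5, each followed by four interframes.
PAGES = [(k << GRANULESHIFT) | (n - k) for k in (0, 5) for n in range(k, k + 5)]
```

In this model, `double_seek(PAGES, 8)` lands on index 5, the keyframe that frame 8 depends on, and `double_seek(PAGES, 3)` lands on index 0.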

=== Seeking within Multitrack files ===

To locate the earliest page in a multitrack file (a physical bitstream) required for rendering '''all''' tracks from a given time offset:

# seek to the desired time
# scan forward until a page has been seen from all of the tracks that use granuleshift; while doing so, record the prev_granule of the bytewise-earliest page encountered from each track
# seek to the minimum of the prev_granules of those pages

It is useful to put a bound on the forward scan; the distance scanned
depends only on the way the stream is constructed, so it can be large
if pages in a particular logical bitstream are sparse.
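
A minimal sketch of the multitrack procedure, including a bound on the forward scan, might look like this. It assumes a toy model: pages are (serial, granulepos) tuples held in memory in byte order, each track has a hypothetical granuleshift and granule rate, times are compared in milliseconds, and "seek" uses "at or before the target" with a linear pass (a real application would use a bisection search instead). Every name and value here is an assumption made for illustration.

```python
# serial -> (granuleshift, granules per second); hypothetical tracks.
TRACKS = {1: (6, 10),    # video-like track using granuleshift
          2: (0, 100)}   # audio-like track, plain counter


def page_time_ms(page, tracks=TRACKS):
    serial, gp = page
    shift, rate = tracks[serial]
    granule = (gp >> shift) + (gp & ((1 << shift) - 1))
    return granule * 1000 // rate


def seek(pages, target_ms, tracks=TRACKS):
    """Per track, find the latest page at or before target; return the earliest of those."""
    latest = {}  # serial -> (time, index); strict '>' keeps the earlier page on ties
    for i, page in enumerate(pages):
        t = page_time_ms(page, tracks)
        if t <= target_ms and (page[0] not in latest or t > latest[page[0]][0]):
            latest[page[0]] = (t, i)
    return min((i for _, i in latest.values()), default=0)


def multitrack_seek(pages, target_ms, tracks=TRACKS, max_scan=64):
    start = seek(pages, target_ms, tracks)                      # step 1
    shifted = {s for s, (shift, _) in tracks.items() if shift > 0}
    prev_ms = {}
    for serial, gp in pages[start:start + max_scan]:            # step 2, bounded
        if serial in shifted and serial not in prev_ms:
            shift, rate = tracks[serial]
            prev_ms[serial] = (gp >> shift) * 1000 // rate
        if len(prev_ms) == len(shifted):
            break
    if not prev_ms:
        return start
    return seek(pages, min(prev_ms.values()), tracks)           # step 3


def video(n, key):
    return (1, (key << 6) | (n - key))


def audio(samples):
    return (2, samples)


PAGES = [video(0, 0), video(1, 0), audio(20), video(2, 0), video(3, 0),
         audio(40), video(4, 0), video(5, 5), audio(60), video(6, 5),
         video(7, 5), audio(80), video(8, 5), video(9, 5), audio(100)]
```

Seeking to 800 ms in this toy stream first lands on index 11, then returns index 5: the audio page that precedes the keyframe (index 7) needed to render the video at 800 ms.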

=== But how do I "seek to the desired time"?===
The above assumes that you already know how to seek to a particular GranulePos within the stream efficiently.

This isn't as simple as it sounds, because the Ogg format does not include an index. The lack of an index is a '''feature''' rather than a deficiency and it is one of the primary reasons to use Ogg over some other formats.

Because Ogg doesn't have an index:
* infinite streams and partial streams are automatically supported by correctly written applications
* there is no risk of truncation or minor corruption making a stream unseekable
* no memory is required to store an index
* no bandwidth is wasted to transmit it
* seeking granularity is not limited to the precision of the index

On the other hand, non-indexed formats require a bit more intelligence from the application using them, so many applications have gotten it wrong (although some intelligence is also needed in a well written application for indexed formats, so that it can seek with a corrupted index or below the index granularity).

====Do NOT build your own index====
If you are thinking about seeking within an Ogg file by building your own complete index: '''STOP! This is not a good procedure.'''

Building an index may seem simple, but it requires a costly read of the '''entire''' stream (which may be gigabytes in size, or even infinite).

There is a better way.

====Bisection Search====
The correct way to seek to a particular granule value in Ogg is by using a [http://en.wikipedia.org/wiki/Bisection_method bisection search]:

# Seek to the middle of the stream
# obtain sync
# compare your target granule position with the current position.
# If the target is less than the current position, repeat these steps on the left side.
# If it's greater, repeat it on the right side.

By applying this recursive algorithm, you are guaranteed to find your target location much faster than building an index for the whole stream.
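
The bisection can be sketched against a simulated stream. In this sketch the "stream" is a list of (byte_offset, granule) pairs, and `sync_at` stands in for "seek to a byte position and obtain sync" by returning the first page that starts at or after that position. It assumes granule positions never decrease with byte offset and ignores continued pages, false sync and chaining. All names and sizes are hypothetical.

```python
import bisect


def sync_at(offsets, byte_pos):
    """Simulate obtaining sync: index of the first page starting at or after byte_pos."""
    i = bisect.bisect_left(offsets, byte_pos)
    return i if i < len(offsets) else None


def bisect_seek(pages, stream_len, target):
    """Return (index of the last page with granule <= target, number of reads)."""
    offsets = [off for off, _ in pages]
    lo, hi = 0, stream_len      # byte range that may still contain the answer
    best, reads = 0, 0
    while lo < hi:
        mid = (lo + hi) // 2
        i = sync_at(offsets, mid)
        reads += 1
        if i is None or pages[i][1] > target:
            hi = mid                    # target is on the left side
        else:
            best = i
            lo = pages[i][0] + 1        # target is at or right of this page
    return best, reads


# Hypothetical stream: 1000 pages of 4096 bytes, each ending 1024 granules later.
PAGE_SIZE = 4096
PAGES = [(n * PAGE_SIZE, (n + 1) * 1024) for n in range(1000)]
index, reads = bisect_seek(PAGES, 1000 * PAGE_SIZE, 500000)
```

Because the byte range is at least halved on every iteration, the number of reads grows with the logarithm of the stream length rather than with the stream length itself.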

To correctly support chaining, you should first use this kind of search to locate the stream endpoints. Then, the above approach can be applied within the streams, to seek to any location within a chained file.

Doing this correctly is somewhat more complicated than it seems, due to the existence of '''continued pages''' and the risk of a small valid page being contained within a packet. Both of these challenges can be addressed, but the solution is left as an exercise for the reader. (Hint: The maximum Ogg page size is always '''smaller than 64 KBytes''')

This Bisection Search is very good compared to the alternatives (a linear scan of the whole file), often taking just a couple of reads to locate the correct location in a file gigabytes in size, but the truly obsessive can out-perform the bisection on average, by using the local bitrate to pick a better target than the half way point used in a bisection search ([http://en.wikipedia.org/wiki/Secant_method Secant method]).

Be careful about the worst case becoming linear (see [http://en.wikipedia.org/wiki/Brent%27s_method Brent's method]). The improvement possible from better-than-bisection approaches is probably only relevant for seeking across a high latency network. In typical low-latency applications, the added complexity may not be worth the cost.

== References ==

From an Email by Monty, [http://web.archive.org/web/20031201054855/http://www.xiph.org/archives/theora-dev/200209/0040.html 13th Sept 2002]

'''Note that this document is obsolete, and incorrect with respect to seeking in multiplexed streams.''' It does accurately describe the rationale behind the two-part granulepos scheme (option 3 below) now used in Theora, Dirac, CMML and other codecs in Ogg.

===Monty's email===
Folks have noticed that the documentation is semi-silent about how to properly encode the granule position and interleave synchronization of keyframe-based video. The primary reasons for this:

* we at Xiph hadn't had to do it yet
* there are several easy possibilities, and the longer we had to think about it before mandating One True Spec, the better that spec would likely be.

The lack of a painfully explicit spec has led to the theory that it's not possible; that's not true, there are a few ways to do it. Several require no extension to Ogg stream v 0. A last way requires an extra field (a point against it), but does not actually break any stream that currently exists.

The time has come to lay down the spec as we're currently building the real abstraction layers in a concrete Ogg framework now where the Ogg engine, the codecs, and the overarching Ogg control layers are neatly put into boxes connected in formalized ways.

Below I go into detail about each scheme in a 'thinking aloud' sort of way. This is not because I haven't already given the matter sufficient thought, it is because I wish to give the reader sufficient background information to understand why one way is better than the others.

This is not a call for input so much as an educational effort (and a public sanity check of my thinking; please do pipe up if it appears I missed a salient point).

==== Starting Assumptions ====

# '''Ogg is not a non-linear format.''' It is not a replacement for the scripting system of a DVD player. It is a media transport format designed to do nothing more than deliver content, in a stream, and have all the pieces arrive on time and in sync. It is not designed to *prevent* more complex use of content, it merely does not implement anything beyond a linear representation of the data contained within. If you want to build a real non-linear format, build it *from* Ogg, not *into* Ogg. This has been the intent from day 1.
# '''The Ogg layer does not know specifics of the codec data it's multiplexing into a stream.''' It knows nothing beyond 'Oooo, packets!', that the packets belong to different buckets, that the packets go in order, and that packets have position markers. Ogg does not even have a concept of 'time'; it only knows about the sequentially increasing, unitless position markers. It is up to higher layers which have access to the codec APIs to assign and convert units of framing or time.
# '''Given pre-cached decode headers, a player may seek into a stream at any point and begin decode.''' It may be the case that audio may start after video by a fraction of a second, or video might be blank until the stream hits the next keyframe, but this simplest case must just work, and there will be sufficient information to maintain perfect cross-media sync.
# (This departs from current reality, but it will be the reality very soon; vorbisfile currently blurs the careful abstraction I'm about to describe) '''Seeking at an arbitrary level of precision is a distributed abstraction in the larger Ogg picture.''' At the lowest-level Ogg stream abstraction, seeking is one operation: "find me the page from logical stream 'n' with granule position 'x'". All more complex seeking operations are a function of a higher-level layer (with knowledge of the media types and codec in use) making intelligent use of this lowest Ogg abstraction. The Ogg stream abstraction need deal with nothing more complex than 'find this page'.

The various granulepos strategies for keyframes concern this last point.

The complication with video is that frames often depend on previous and possibly future frames. This happens in a larger, general category of codecs whose streams may not begin decode from just any packet as well as packets that may not represent an entire frame, or even a fixed-time sampling algorithm.

It is a mistake to design a seeking system tied to an exact set of very specific cases. While one could implement an explicit keyframe mechanism at the Ogg level, this mechanism would not cover any of the other interesting seeking cases while, as I'll show below, the mechanism would not actually be necessary.

There will be a few complaints that Ogg is being unnecessarily subtle and shifts a great deal of complexity into software which a few extra page header fields could eliminate.

Consider the following:

# Ogg was designed to impose roughly 0.5-1% overhead over the raw packet data across a wide range of packet usage patterns. 'A few extra fields' begins inflating that figure for specific special cases that only apply to a few stream types. Right now there is no header field that is not general to every stream. There is no fat in the page headers.
# The Ogg-level seeking algorithm is exceptionally simple and can be described in a single sentence: "Find the earliest page with a granulepos less than but closest to 'x'". This shifts the onus of assembling more complex seeking operation requiring knowledge of a specific media type into a higher layer that has knowledge of that media type. The higher layer becomes responsible for determining for what 'x' Ogg should search. The division of labor is clear and sensible.
# Complex, precise seeking operations are still contained entirely within the framework, just at a higher layer than Ogg-stream. At no time is an application developer required to deal with seeking mechanisms within an Ogg stream or to manually maintain stream synchronization.

==== High level handwaving: How seeking really works ====

The granulepos is intended to mean, roughly, "If I stop decode at the end of this page, I will get data from my decoder up to position 'granulepos'". The granulepos simultaneously provides seeking information and a 'length-of-stream' indicator. Depending on the codec, it can also usually be used to indicate a timebase, but that isn't our problem right now.

By inference, the granulepos is also used to construct a value 'y' such that "if I begin decode *from* point 'y', I will get data beginning at position 'granulepos'". Although in some codecs, y == granulepos, that is not necessarily the case when decode can't begin at any arbitrary packet. The granulepos encoding method candidates I will now describe affect exactly the 'granulepos' to 'y' conversion process. Note also that none of these affect Ogg, only the higher decision-making layers... Different circumstances necessitated by different codecs can lead to different valid choices, all of which work as far as Ogg is concerned. However, for our I-/P-/B-frame video case, there is a pretty clear winner.

===== Strategy 1: Straight Granulepos, Keyframes Are Not Our Problem. =====

In this scheme, the granulepos is a simple frame counter. The seeking decision-maker in the codec's framework plugin is responsible for determining if a frame is a keyframe or not, and if it can't begin decode from a given frame, it must request another earlier frame until it finds a keyframe. If the codec so desires, it can store 'what is my keyframe?' information in the stream packets.

This case means that each seek to a *specific* frame in a video stream will generally result in two Ogg seeks: a first seek to the requested frame, then a second seek backwards to find that frame's keyframe.

A larger concern is the semantic accuracy of the granulepos; it's intended to reflect position accurately when decoding forward. In this scheme, it's fine for a P-frame to update the counter (as it can be decoded going strictly forward), but B frames will also advance the counter; they can't be decoded without subsequent P or I frames. Thus, the semantic value of granulepos no longer strictly represents 'we can decode up to 'granulepos' at the end of this frame'.

===== Strategy 2: Granulepos Represents Keyframes Only =====

In this scheme, only keyframes update the granulepos (monotonically or non-monotonically). It simplifies the seeking process to a keyframe as an Ogg-level seek to page 'x' will always yield a page with a keyframe. In addition, granulepos will also always mean 'we can decode up to *at least* this point in the stream'. If the stream is truncated at P or B frames past granulepos, the extra frames can be discarded. (A special case would need to be defined to terminate a stream that doesn't end on an I frame.)

The difficulty with this scheme is that it presents slightly more for the software level decoder to track; a proper frame number could not be determined internally without tracking from an I frame. Also, the granulepos of an Ogg page would not necessarily map to the last packet on the page, or even any packet on that page; multiple sequential pages could have the same granulepos. It is conceptually slightly messy, although the 'messiness' does not make it at all impractical.

===== Strategy 3: Granulepos Encodes Some State =====

In some ways, this strategy is the most semantically 'over clever', but also the easiest to implement and the one that gives the most correct, up to date sync information. Pending comments, it is the I/P/B video strategy I currently favor.

The granulepos is 64 bits, a size that is absolutely necessary if, for example, it represents the PCM sample count in an audio codec. When being used to encode video frame number, however, it is comparatively absurdly large*.

* note that although granulepos is not permitted to wrap around, we can simply begin a new logical stream segment with a new serial number should a 30fps video stream ever hit the ten-billion year mark.
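
As a rough check of the ten-billion-year figure, the arithmetic can be written out. This sketch assumes the granulepos is treated as a signed 64-bit value (so 63 bits are available for a plain frame counter) and uses a 365.25-day year; both are assumptions made here, and the function name is hypothetical.

```python
SECONDS_PER_YEAR = 365.25 * 24 * 60 * 60


def years_until_wrap(fps, usable_bits=63):
    """Years of continuous video before a plain frame counter runs out of bits."""
    return (2 ** usable_bits) / fps / SECONDS_PER_YEAR


years_30fps = years_until_wrap(30)
```

Under these assumptions a 30fps stream takes a little under ten billion years to exhaust the counter.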

Thus we clearly have room to skim a few bits off the bottom of granulepos to represent I, P or B frame. These bits are not used as flags, but rather, frame representation becomes a counting problem; we do this such that the count is still always strictly increasing.

For example, if we know that I frames will never be more than 256 frames apart and P frames no more than 31 B frames apart, the granulepos of an I frame can be defined to always satisfy granulepos & 0xff == 0. If we can have up to seven intervening P frames, they could be numbered granulepos-of-iframe + 0x20, 0x40, 0x60... 0xe0. B frames between the I and P frames would use the remaining five bits and be numbered as sub-I and sub-P frames 1 through 31. Thus, starting from zero, the frames/packets in the pattern IPBBPBBI would be numbered 0x000, 0x020, 0x021, 0x022, 0x040, 0x041, 0x042, 0x100.

If we wish to preserve the ability to represent a timebase, the granulepos number for I frames need not be increased monotonically and shifted; it can be used to represent the frame number. The above example becomes 0x000, 0x020, 0x021, 0x022, 0x040, 0x041, 0x042, 0x700. To get real frame number (from an I frame), we just shift granulepos >> 8. This scheme can be taken further or modified to get frame number from any video frame.

In this way, we can always seek, first time, to a desired key frame page (by seeking to Ogg page 'x' where x & 0xff == 0). In addition, each frame still has a unique frame number and also a clear 'group' number, potentially useful information to the decoder. Lastly, granulepos is still semantically correct, although it is now, in a sense, representing a whole.fractional frame number for buffering purposes.
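
The counting scheme of this strategy can be reproduced with a short sketch. It follows the bit layout of the example (low five bits count B frames, the next three bits count P frames, everything above identifies the I frame); the function names and the `frame_number_in_upper_bits` switch are hypothetical and only serve to illustrate the two numbering variants described above.

```python
def number_frames(pattern, frame_number_in_upper_bits=False):
    """Assign granulepos values to a frame pattern such as 'IPBBPBBI'."""
    values = []
    gp = None
    for index, kind in enumerate(pattern):
        if kind == "I":
            if frame_number_in_upper_bits:
                gp = index << 8                      # real frame number, shifted
            else:
                gp = 0 if gp is None else (gp & ~0xFF) + 0x100
        elif gp is None:
            raise ValueError("pattern must start with an I frame")
        elif kind == "P":
            gp = (gp & ~0x1F) + 0x20                 # next P slot, B counter reset
            if (gp & 0xFF) == 0:
                raise ValueError("more than seven P frames after an I frame")
        elif kind == "B":
            if (gp & 0x1F) == 0x1F:
                raise ValueError("more than 31 B frames in a row")
            gp += 1
        else:
            raise ValueError("unknown frame type: " + kind)
        values.append(gp)
    return values


def is_keyframe(granulepos):
    """An I frame has all of the low eight bits clear."""
    return (granulepos & 0xFF) == 0


counted = number_frames("IPBBPBBI")
with_frame_numbers = number_frames("IPBBPBBI", frame_number_in_upper_bits=True)
```

In both variants the values are strictly increasing, and only the first and last frames of the pattern pass the `is_keyframe` test.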

===== Strategy 4: Extra 'Seekpos' Field / Straw Man =====

Another possibility requires extension of the current Ogg page format. Although older players would reject any such extended pages as invalid, we do have versioning and typing fields, so there's not actually any compatibility problems with current Ogg pages... in the future.

The idea in this scheme is to keep the current granulepos as a frame number field (ala scheme 1), but also add a new field 'seekpos' that is used, rather than granulepos, in seeking. The seekpos would represent the number of the last keyframe that passed by.

Advantages:

# The net effect of this strategy is to modify scheme 1 to only require one bisection seek rather than two.
# Some amount of code simplification (over scheme 1) at the decision-making level.

Disadvantages:

# The Ogg format will need to be revved. No current (ala 1.0) Ogg code will understand the new pages.
# The header becomes larger, from a minimum size of 27 bytes to a minimum size of 35.
# This strategy only enhances keyframes; it is of no use in other odd seeking cases.
# Gives no more information than scheme 3, but is still more complicated, both in code and API (Ogg would have to understand keyframes).

Thus, there's no substantial reason to prefer extending the format over a scheme that's possible within the existing framework. Note that schemes 1-3 can all be implemented within the Ogg stream today.

Monty

[[Category:Ogg]]

GranulePosAndSeeking

2018-03-01T22:24:15Z

MrZeus: /* But how do I "seek to the desired time"? */

== Granulepos encoding and How seeking really works ==

This describes how to seek on a multiplexed Ogg stream containing logical bitstreams with granuleshift, such as [[Theora]], [[Kate]], [[CMML]] or [[OggText]].
The purpose is to locate the earliest page that is required for rendering a given time offset.
Due to the fact that two time-seeking operations are required, this procedure is commonly referred to as a "'''double seek'''".

=== Definitions ===
Let's define '''time''' to mean '''the time represented by a GranulePos value'''. Hence the "time" of a page is the "time represented by the page GranulePos" header field.

Define '''seek''' to mean: for each '''logical''' bitstream, locate the '''bytewise-latest page''' in the bitstream with a '''time before the target time''', then choose the '''bytewise-earliest''' among these pages. If two or more pages have the same time (aka. GranulePos value), seeking must locate the bytewise-earlier page.

==== Granules and Granuleshift ====

We use the term '''granule''' to refer to time measured in the units of the codec. For audio codecs this is ''usually'' samples, and for video codecs it is ''usually'' frames or fields.

In some formats, pages have a dependency on the data of an earlier page; for example in [[Theora]], interframes have a dependency on an earlier keyframe -- the keyframe data is required to decode the interframe. We encode both the time of the page and the time of the page it depends on into the granulepos. In order to do this we treat the granulepos as a bitfield as follows:

+---------------------+-------------+
| prev_granule | offset |
+---------------------+-------------+

Then if a page has time in units of codec granules <tt>curr_granule</tt>, and the page it depends on has time
<tt>prev_granule</tt>, we define <tt>offset</tt> as the difference between these:

offset = curr_granule - prev_granule

We refer to the number of bits used to encode the offset as the "granuleshift". This is fixed for all pages in
that track (logical bitstream). So we encode the later page's granulepos as:

granulepos = (prev_granule << granuleshift) | offset

When decoding, we can extract the current_granule from a granulepos by simply adding these fields:

curr_granule = prev_granule + offset

Which expands to this expression of the page granulepos:

curr_granule = (granulepos >> granuleshift) + (granulepos & ((1 << granuleshift) - 1))

Keyframes, and other data with no dependency on earlier packets, are encoded with:

prev_granule = curr_granule, offset = 0

=== Seeking within Single-Track files ===

To locate the earliest page in a track (a logical bitstream) required for rendering a given time offset:

# seek to the desired time
# read the prev_granule out of the granulepos
# seek to the time represented by the prev_granule

=== Seeking within Multitrack files ===

To locate the earliest page in a multitrack file (a physical bitstream) required for rendering '''all''' tracks from a given time offset:

# seek to the desired time
# scan forward until a page has been seen from all of the tracks that use granuleshift; while doing so, record the prev_granule of the bytewise-earliest page encountered from each track
# seek to the minimum of the prev_granules of those pages

It is useful to put a bound on the forward scan; the distance scanned
depends only on the way the stream is constructed, so it can be large
if pages in a particular logical bitstream are sparse.

=== But how do I "seek to the desired time"?===
The above assumes that you already know how to seek to a particular GranulePos within the stream efficiently.

This isn't as simple as it sounds, because the Ogg format does not include an index. The lack of an index is a '''feature''' rather than a deficiency and it is one of the primary reasons to use Ogg over some other formats.

Because Ogg doesn't have an index:
* infinite streams and partial streams are automatically supported by correctly written applications
* there is no risk of truncation or minor corruption making a stream unseekable
* no memory is required to store an index
* no bandwidth is wasted to transmit it
* seeking granularity is not limited to the precision of the index

On the other hand, non-indexed formats require a bit more intelligence from the application using them, so many applications have gotten it wrong (although some intelligence is also needed in a well written application for indexed formats, so that it can seek with a corrupted index or below the index granularity).

====Do NOT build your own index====
If you are thinking about seeking within an Ogg file by building your own complete index: '''STOP! This is not a good procedure.'''

Building an index may seem simple, but it requires a costly read of the '''entire''' stream (which may be gigabytes in size, or even infinite).

There is a better way.

====Bisection Search====
The correct way to seek to a particular granule value in Ogg is by using a [http://en.wikipedia.org/wiki/Bisection_method bisection search]:

# Seek to the middle of the stream
# obtain sync
# compare your target granule position with the current position.
# If the target is less than the current position, repeat these steps on the left side.
# If it's greater, repeat it on the right side.

By applying this recursive algorithm, you are guaranteed to find your target location much faster than building an index for the whole stream.

To correctly support chaining, you should first use this kind of search to locate the stream endpoints. Then, the above approach can be applied within the streams, to seek to any location within a chained file.

Doing this correctly is somewhat more complicated than it seems, due to the existence of '''continued pages''' and the risk of a small valid page being contained within a packet. Both of these challenges can be addressed, but the solution is left as an exercise for the reader. (Hint: The maximum Ogg page size is always '''smaller than 64 KBytes''')

This Bisection Search is very good compared to the alternatives (a linear scan of the whole file), often taking just a couple of reads to locate the correct location in a file gigabytes in size, but the truly obsessive can out-perform the bisection on average, by using the local bitrate to pick a better target than the half way point used in a bisection search ([http://en.wikipedia.org/wiki/Secant_method Secant method]).

Be careful about the worst case becoming linear (see [http://en.wikipedia.org/wiki/Brent%27s_method Brent's method]). The improvement possible from better-than-bisection approaches is probably only relevant for seeking across a high latency network. In typical low-latency applications, the added complexity may not be worth the cost.

== References ==

From an Email by Monty, [http://web.archive.org/web/20031201054855/http://www.xiph.org/archives/theora-dev/200209/0040.html 13th Sept 2002]

'''Note that this document is obsolete, and incorrect with respect to seeking in multiplexed streams.''' It does accurately describe the rationale behind the two-part granulepos scheme (option 3 below) now used in Theora, Dirac, CMML and other codecs in Ogg.

Folks have noticed that the documentation is semi-silent about how to properly encode the granule position and interleave synchronization of keyframe-based video. The primary reasons for this:

* we at Xiph hadn't had to do it yet

* there are several easy possibilities, and the longer we had to think about it before mandating One True Spec, the better that spec would likely be.

The lack of a painfully explicit spec has led to the theory that it's not possible; that's not true, there are a few ways to do it. Several require no extension to Ogg stream v 0. A last way requires an extra field (a point against it), but does not actually break any stream that currently exists.

The time has come to lay down the spec as we're currently building the real abstraction layers in a concrete Ogg framework now where the Ogg engine, the codecs, and the overarching Ogg control layers are neatly put into boxes connected in formalized ways. Below I go into detail about each scheme in a 'thinking aloud' sort of way. This is not because I haven't already given the matter sufficient thought, it is because I wish to give the reader sufficient background information to understand why one way is better than the others. This is not a call for input so much as an educational effort (and a public sanity check of my thinking; please do pipe up if it appears I missed a salient point).

==== Starting Assumptions: ====

1) Ogg is not a non-linear format. It is not a replacement for the scripting system of a DVD player. It is a media transport format designed to do nothing more than deliver content, in a stream, and have all the pieces arrive on time and in sync. It is not designed to *prevent* more complex use of content, it merely does not implement anything beyond a linear representation of the data contained within. If you want to build a real non-linear format, build it *from* Ogg, not *into* Ogg. This has been the intent from day 1.

2) The Ogg layer does not know specifics of the codec data it's multiplexing into a stream. It knows nothing beyond 'Oooo, packets!', that the packets belong to different buckets, that the packets go in order, and that packets have position markers. Ogg does not even have a concept of 'time'; it only knows about the sequentially increasing, unitless position markers. It is up to higher layers which have access to the codec APIs to assign and convert units of framing or time.

3) Given pre-cached decode headers, a player may seek into a stream at any point and begin decode. It may be the case that audio may start after video by a fraction of a second, or video might be blank until the stream hits the next keyframe, but this simplest case must just work, and there will be sufficient information to maintain perfect cross-media sync.

4) (This departs from current reality, but it will be the reality very soon; vorbisfile currently blurs the careful abstraction I'm about to describe) Seeking at an arbitrary level of precision is a distributed abstraction in the larger Ogg picture. At the lowest-level Ogg stream abstraction, seeking is one operation: "find me the page from logical stream 'n' with granule position 'x'". All more complex seeking operations are a function of a higher-level layer (with knowledge of the media types and codec in use) making intelligent use of this lowest Ogg abstraction. The Ogg stream abstraction need deal with nothing more complex than 'find this page'.

The various granulepos strategies for keyframes concern this last point.

The basic issue with video from which complexity arises is that frames often depend on previous and possibly future frames. This happens in a larger, general category of codecs whose streams may not begin decode from just any packet as well as packets that may not represent an entire frame, or even a fixed-time sampling algorithm. It is a mistake to design a seeking system tied to an exact set of very specific cases. While one could implement an explicit keyframe mechanism at the Ogg level, this mechanism would not cover any of the other interesting seeking cases while, as I'll show below, the mechanism would not actually be necessary.

There will be a few complaints that Ogg is being unnecessarily subtle and shifts a great deal of complexity into software which a few extra page header fields could eliminate. Consider the following:

1) Ogg was designed to impose roughly 0.5-1% overhead on the raw packet data across a wide range of packet usage patterns. 'A few extra fields' begins inflating that figure for specific special cases that only apply to a few stream types. Right now there is no header field that is not general to every stream. There is no fat in the page headers.

2) The Ogg-level seeking algorithm is exceptionally simple and can be described in a single sentence: "Find the earliest page with a granulepos less than but closest to 'x'". This shifts the onus of assembling more complex seeking operations requiring knowledge of a specific media type into a higher layer that has knowledge of that media type. The higher layer becomes responsible for determining for what 'x' Ogg should search. The division of labor is clear and sensible.

3) Complex, precise seeking operations are still contained entirely within the framework, just at a higher layer than Ogg-stream. At no time is an application developer required to deal with seeking mechanisms within an Ogg stream or to manually maintain stream
synchronization.

==== High level handwaving- How seeking really works ====

The granulepos is intended to mean, roughly, "If I stop decode at the end of this page, I will get data from my decoder up to position 'granulepos'". The granulepos simultaneously provides seeking information and a 'length-of-stream' indicator. Depending on the codec, it can also usually be used to indicate a timebase, but that isn't our problem right now.

By inference, the granulepos is also used to construct a value 'y' such that "if I begin decode *from* point 'y', I will get data beginning at position 'granulepos'". Although in some codecs, y == granulepos, that is not necessarily the case when decode can't begin at any arbitrary packet. The granulepos encoding method candidates I will now describe affect exactly the 'granulepos' to 'y' conversion process. Note also that none of these affect Ogg, only the higher decision-making layers... Different circumstances necessitated by different codecs can lead to different valid choices, all of which work as far as Ogg is concerned. However, for our I-/P-/B-frame video case, there is a pretty clear winner.

===== Strategy 1: Straight Granulepos, Keyframes Are Not Our Problem. =====

In this scheme, the granulepos is a simple frame counter. The seeking decision-maker in the codec's framework plugin is responsible for determining if a frame is a keyframe or not, and if it can't begin decode from a given frame, it must request another earlier frame until it finds a keyframe. If the codec so desires, it can store 'what is my keyframe?' information in the stream packets.

This case means that each seek to a *specific* frame in a video stream will generally result in two Ogg seeks; a first seek to the requested frame, then a second seek backwards to find that frame's keyframe.

A larger concern is the semantic accuracy of the granulepos; it's intended to reflect position accurately when decoding forward. In this scheme, it's fine for a P-frame to update the counter (as it can be decoded going strictly forward), but B frames will also advance the counter; they can't be decoded without subsequent P or I frames. Thus, the semantic value of granulepos no longer strictly represents 'we can decode up to 'granulepos' at the end of this frame'.

===== Strategy 2: Granulepos Represents Keyframes Only =====

In this scheme, only keyframes update the granulepos (monotonically or non-monotonically). It simplifies the seeking process to a keyframe as an Ogg-level seek to page 'x' will always yield a page with a keyframe. In addition, granulepos will also always mean 'we can decode up to *at least* this point in the stream'. If the stream is truncated at P or B frames past granulepos, the extra frames can be discarded. (A special case would need to be defined to terminate a stream that doesn't end on an I frame).

The difficulty with this scheme is that it presents slightly more for the software level decoder to track; a proper frame number could not be determined internally without tracking from an I frame. Also, the granulepos of an Ogg page would not necessarily map to the last packet on the page, or even any packet on that page; multiple sequential pages could have the same granulepos. It is conceptually slightly messy, although the 'messiness' does not make it at all impractical.

===== Strategy 3: Granulepos Encodes Some State =====

In some ways, this strategy is the most semantically 'over clever', but also the easiest to implement and the one that gives the most correct, up to date sync information. Pending comments, it is the I/P/B video strategy I currently favor.

The granulepos is 64 bits, a size that is absolutely necessary if, for example, it represents the PCM sample count in an audio codec. When being used to encode video frame number, however, it is comparatively absurdly large*.

* note that although granulepos is not permitted to wrap around, we can simply begin a new logical stream segment with a new serial number should a 30fps video stream ever hit the ten-billion year mark.

Thus we clearly have room to skim a few bits off the bottom of granulepos to represent an I, P or B frame. These bits are not used as flags, but rather, frame representation becomes a counting problem; we do this such that the count is still always strictly increasing.

For example, if we know that I frames will never be more than 256 frames apart and P frames no more than 31 B frames apart, the granulepos of an I frame can be defined to always satisfy granulepos & 0xff == 0. If we can have up to seven intervening P frames, they could be numbered granulepos-of-iframe + 0x20, 0x40, 0x60... 0xe0. B frames between the I and P frames would use the remaining five bits and be numbered as sub-I and sub-P frames 1 through 31. Thus, starting from zero, the frames/packets in the pattern IPBBPBBI would be numbered 0x000, 0x020, 0x021, 0x022, 0x040, 0x041, 0x042, 0x100.

If we wish to preserve the ability to represent a timebase, the granulepos number for I frames need not be increased monotonically and shifted; it can be used to represent the frame number. The above example becomes 0x000, 0x020, 0x021, 0x022, 0x040, 0x041, 0x042, 0x700. To get real frame number (from an I frame), we just shift granulepos >> 8. This scheme can be taken further or modified to get frame number from any video frame.
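
The two numberings above can be reproduced with a short sketch. The function name <tt>number_frames</tt> and its <tt>timebase</tt> switch are made up for illustration; the bit layout (three bits counting P frames, five bits counting B frames) is the one assumed in the example, not a mandated format.

```python
def number_frames(pattern, timebase=False):
    """Assign a granulepos to each frame of a pattern such as "IPBBPBBI".

    Low 8 bits: 3 bits count P frames since the last I frame (steps of 0x20),
    5 bits count B frames since the last I or P frame.
    High bits: a group counter, or the real frame number when timebase=True.
    """
    numbers = []
    group = -1        # index of the current I-frame group
    base = 0          # granulepos of the most recent I frame
    anchor = 0        # granulepos of the most recent I or P frame
    p_count = 0
    b_count = 0
    for frame_no, kind in enumerate(pattern):
        if kind == "I":
            group += 1
            base = (frame_no if timebase else group) << 8
            anchor = base
            p_count = b_count = 0
            numbers.append(base)
        elif kind == "P":
            p_count += 1
            if p_count > 7:
                raise ValueError("more than seven P frames in a group")
            anchor = base + (p_count << 5)
            b_count = 0
            numbers.append(anchor)
        elif kind == "B":
            b_count += 1
            if b_count > 31:
                raise ValueError("more than 31 B frames after one anchor")
            numbers.append(anchor + b_count)
        else:
            raise ValueError("unknown frame type: " + kind)
    return numbers


counted = number_frames("IPBBPBBI")
framed = number_frames("IPBBPBBI", timebase=True)
print([hex(n) for n in counted])
print([hex(n) for n in framed])
```

In both variants the values are strictly increasing, which is the property the counting approach is meant to preserve.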

In this way, we can always seek, first time, to a desired key frame page (by seeking to Ogg page 'x' where x & 0xff == 0). In addition, each frame still has a unique frame number and also a clear 'group' number, potentially useful information to the decoder. Lastly, granulepos is still semantically correct, although it is now, in a sense, representing a whole.fractional frame number for buffering purposes.

===== Scheme Four: Extra 'Seekpos' Field / Straw Man =====

Another possibility requires extension of the current Ogg page format. Although older players would reject any such extended pages as invalid, we do have versioning and typing fields, so there aren't actually any compatibility problems with current Ogg pages... in the future.

The idea in this scheme is to keep the current granulepos as a frame number field (ala scheme 1), but also add a new field 'seekpos' that is used, rather than granulepos, in seeking. The seekpos would represent the number of the last keyframe that passed by.

advantages:

1) The net effect of this strategy is to modify scheme 1 to only require one bisection seek rather than two. Some amount of code simplification (over scheme 1) at the decision-making level.

disadvantages:

1) The Ogg format will need to be revved. No current (ala 1.0) Ogg code will understand the new pages.

2) The header becomes larger, from a minimum size of 27 bytes to a minimum size of 35.

3) This strategy only enhances keyframes; it is of no use in other odd seeking cases.

4) Gives no more information than scheme 3, but is still more complicated, both in code and API (Ogg would have to understand keyframes).
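
The 27 and 35 byte figures can be checked against the fixed part of the page header. This is only a sketch: the field order follows the published Ogg page layout, but where an extra 64-bit 'seekpos' would sit is an assumption, since the proposal does not say.

```python
import struct

# Fixed part of an Ogg page header (little-endian, no padding):
# capture pattern "OggS", stream structure version, header type flags,
# 64-bit granule position, bitstream serial number, page sequence number,
# CRC checksum, number of segments. The segment table follows it.
OGG_HEADER = struct.Struct("<4sBBqIIIB")

# Hypothetical extended header: the same fields plus a 64-bit 'seekpos'
# placed directly after the granule position (placement is an assumption).
EXTENDED_HEADER = struct.Struct("<4sBBqqIIIB")

print(OGG_HEADER.size, EXTENDED_HEADER.size)
```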

Thus, there's no substantial reason to prefer extending the format over a scheme that's possible within the existing framework. Note that schemes 1-3 can all be implemented within the Ogg stream today.

Monty

[[Category:Ogg]]

GranulePosAndSeeking

2018-03-01T16:30:13Z

MrZeus: /* Definitions */ fix weird formatting due to open bracket

== Granulepos encoding and How seeking really works ==

This describes how to seek on a multiplexed Ogg stream containing logical bitstreams with granuleshift, such as [[Theora]], [[Kate]], [[CMML]] or [[OggText]].
The purpose is to locate the earliest page that is required for rendering a given time offset.
Due to the fact that two time-seeking operations are required, this procedure is commonly referred to as a "'''double seek'''".

=== Definitions ===
Let's define '''time''' to mean '''the time represented by a GranulePos value'''. Hence the "time" of a page is the "time represented by the page GranulePos" header field.

Define '''seek''' to mean: for each '''logical''' bitstream, locate the '''bytewise-latest page''' in the bitstream with a '''time before the target time''', then choose the '''bytewise-earliest''' among these pages. If two or more pages have the same time (aka. GranulePos value), seeking must locate the bytewise-earlier page.

==== Granules and Granuleshift ====

We use the term '''granule''' to refer to time measured in the units of the codec. For audio codecs this is ''usually'' samples, and for video codecs it is ''usually'' frames or fields.

In some formats, pages have a dependency on the data of an earlier page; for example in [[Theora]], interframes have a dependency on an earlier keyframe -- the keyframe data is required to decode the interframe. We encode both the time of the page and the time of the page it depends on into the granulepos. In order to do this we treat the granulepos as a bitfield as follows:

+---------------------+-------------+
|    prev_granule     |   offset    |
+---------------------+-------------+

Then if a page has time in units of codec granules <tt>curr_granule</tt>, and the page it depends on has time
<tt>prev_granule</tt>, we define <tt>offset</tt> as the difference between these:

offset = curr_granule - prev_granule

We refer to the number of bits used to encode the offset as the "granuleshift". This is fixed for all pages in
that track (logical bitstream). So we encode the later page's granulepos as:

granulepos = (prev_granule << granuleshift) | offset

When decoding, we can extract the curr_granule from a granulepos by simply adding these fields:

curr_granule = prev_granule + offset

Which expands to this expression of the page granulepos:

curr_granule = (granulepos >> granuleshift) + (granulepos & ((1 << granuleshift) - 1))

Keyframes, and other data with no dependency on earlier packets, are encoded with:

prev_granule = curr_granule, offset = 0
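
As a minimal sketch, the packing and unpacking rules above can be written as two small functions. The names <tt>pack_granulepos</tt> and <tt>unpack_granulepos</tt> and the sample values (a granuleshift of 6, granules 96 and 100) are chosen only for illustration.

```python
from typing import Tuple


def pack_granulepos(prev_granule: int, curr_granule: int, granuleshift: int) -> int:
    """Build a granulepos from the granule depended on and the current granule."""
    offset = curr_granule - prev_granule
    if not 0 <= offset < (1 << granuleshift):
        raise ValueError("offset does not fit in granuleshift bits")
    return (prev_granule << granuleshift) | offset


def unpack_granulepos(granulepos: int, granuleshift: int) -> Tuple[int, int]:
    """Return (prev_granule, curr_granule) recovered from a granulepos."""
    prev_granule = granulepos >> granuleshift
    offset = granulepos & ((1 << granuleshift) - 1)
    return prev_granule, prev_granule + offset


# A keyframe at granule 96 and an interframe at granule 100 that depends on it.
key = pack_granulepos(96, 96, 6)      # offset is 0 for a keyframe
inter = pack_granulepos(96, 100, 6)   # offset is 4
print(key, inter, unpack_granulepos(inter, 6))
```

The range check matters in practice: an offset that overflows the low field would silently corrupt <tt>prev_granule</tt>, so an encoder has to emit a new keyframe (or equivalent) before that happens.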

=== Seeking within Single-Track files ===

To locate the earliest page in a track (a logical bitstream) required for rendering a given time offset:

# seek to the desired time
# read the prev_granule out of the granulepos
# seek to the time represented by the prev_granule
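
The three steps can be sketched against an in-memory list of pages. Everything here is a stand-in: <tt>Page</tt>, <tt>seek</tt> and <tt>double_seek</tt> are hypothetical names, the linear scan replaces a real seek over the file, and falling back to the first page when nothing precedes the target is an assumption.

```python
from typing import List, NamedTuple


class Page(NamedTuple):
    offset: int       # byte position of the page
    granulepos: int   # packed as (prev_granule << shift) | offset


def page_time(page: Page, shift: int) -> int:
    """Time of a page in codec granules (prev_granule + offset)."""
    return (page.granulepos >> shift) + (page.granulepos & ((1 << shift) - 1))


def seek(pages: List[Page], target: int, shift: int) -> Page:
    """Latest page with a time before target; the earlier page wins on ties."""
    best = None
    for page in pages:                      # pages are listed in byte order
        if page_time(page, shift) >= target:
            break
        if best is None or page_time(page, shift) > page_time(best, shift):
            best = page
    return best if best is not None else pages[0]


def double_seek(pages: List[Page], target: int, shift: int) -> Page:
    first = seek(pages, target, shift)          # 1. seek to the desired time
    prev_granule = first.granulepos >> shift    # 2. read prev_granule
    return seek(pages, prev_granule, shift)     # 3. seek to that time


# One frame per page, a keyframe every 4 frames, granuleshift of 6.
SHIFT = 6
pages = [Page(g * 1000, ((g - g % 4) << SHIFT) | (g % 4)) for g in range(11)]
landing = double_seek(pages, 7, SHIFT)
print(landing)
```

With this data the second seek lands on the page that ends just before the keyframe at granule 4, so decoding forward from there picks up the keyframe and every frame that depends on it.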

=== Seeking within Multitrack files ===

To locate the earliest page in a multitrack file (a physical bitstream) required for rendering '''all''' tracks from a given time offset:

# seek to the desired time
# scan forward until a page has been seen from all of the tracks that use granuleshift; while doing so, record the prev_granule of the bytewise-earliest page encountered from each track
# seek to the minimum of the prev_granules of those pages

It is useful to put a bound on the forward scan; the distance scanned
only depends on the way the stream is constructed, so it can be large
if pages in a particular logical bitstream are sparse.
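
A rough sketch of the multitrack procedure, including a bound on the forward scan, might look like the following. The <tt>Track</tt> record, the per-track <tt>rate</tt> used to convert granules to seconds, and the <tt>max_scan</tt> limit are assumptions made for the example; the scan over a Python list stands in for real file access.

```python
from typing import Dict, List, NamedTuple, Tuple


class Page(NamedTuple):
    offset: int      # byte position in the physical bitstream
    serial: int      # logical bitstream (track) the page belongs to
    granulepos: int


class Track(NamedTuple):
    shift: int       # granuleshift; 0 for tracks that do not use it
    rate: float      # granules per second, assumed known from codec headers


def times(page: Page, tracks: Dict[int, Track]) -> Tuple[float, float]:
    """Return (time of prev_granule, time of the page) in seconds."""
    track = tracks[page.serial]
    prev = page.granulepos >> track.shift
    offset = page.granulepos & ((1 << track.shift) - 1)
    return prev / track.rate, (prev + offset) / track.rate


def seek(pages: List[Page], tracks: Dict[int, Track], target: float) -> int:
    """Index of the bytewise-earliest of the per-track latest pages before target."""
    latest = {}  # serial -> (time, index)
    for index, page in enumerate(pages):
        _, when = times(page, tracks)
        if when < target and (page.serial not in latest or when > latest[page.serial][0]):
            latest[page.serial] = (when, index)
    if len(latest) < len(tracks):
        return 0     # some track has nothing before target: start at the top
    return min(index for _, index in latest.values())


def multitrack_seek(pages, tracks, target, max_scan=64):
    start = seek(pages, tracks, target)                       # step 1
    needed = {s for s, t in tracks.items() if t.shift > 0}
    prev_times = {}
    for page in pages[start:start + max_scan]:                # step 2, bounded
        if page.serial in needed and page.serial not in prev_times:
            prev_times[page.serial] = times(page, tracks)[0]
        if len(prev_times) == len(needed):
            break
    if not prev_times:
        return start
    return seek(pages, tracks, min(prev_times.values()))      # step 3


# Two granuleshift tracks (keyframes every 4 and every 3 granules) and one
# plain audio-like track, interleaved one page per track per second.
tracks = {1: Track(4, 1.0), 2: Track(0, 10.0), 3: Track(4, 1.0)}
pages = []
for s in range(10):
    pages.append(Page(len(pages) * 100, 1, ((s - s % 4) << 4) | (s % 4)))
    pages.append(Page(len(pages) * 100, 2, s * 10))
    pages.append(Page(len(pages) * 100, 3, ((s - s % 3) << 4) | (s % 3)))
start = multitrack_seek(pages, tracks, 8.0)
print(start, pages[start])
```

Here track 1 needs its keyframe at 4 seconds and track 3 its keyframe at 6 seconds, so the final seek uses the smaller of the two.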

=== But how do I "seek to the desired time"? ===
The above assumes that you already know how to seek to a particular GranulePos within the stream efficiently.

This isn't as simple as it sounds, because the Ogg format does not include an index. The lack of an index is a feature rather than a deficiency and it is one of the primary reasons to use Ogg over some other formats.

Because Ogg doesn't have an index, infinite streams and partial streams are automatically supported by correctly written applications. There is no risk of truncation or minor corruption making a stream unseekable. No memory is required to store an index, no bandwidth is wasted to transmit it, and seeking granularity is not limited to the precision of the index.

On the other hand, non-indexed formats require a bit more intelligence from the application using them, so many applications have gotten it wrong (although some intelligence is also needed in a well written application for indexed formats, so that it can seek with a corrupted index or below the index granularity).

If you are thinking about seeking within an Ogg file by building your own complete index: '''STOP! This is not a good procedure.'''

Building an index may seem simple, but it requires a costly read of the entire stream (which may be gigabytes in size, or even infinite). There is a better way.

The correct way to seek to a particular granule value in Ogg is by using a [http://en.wikipedia.org/wiki/Bisection_method bisection search]:

# Seek to the middle of the stream
# obtain sync
# compare your target granule position with the current position.
# If the target is less than the current position, repeat these steps on the left side.
# If it's greater, repeat it on the right side.

By applying this recursive algorithm, you are guaranteed to find your target location much faster than building an index for the whole stream.
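
The steps above can be sketched over a simulated stream. This is an illustration, not a reference implementation: <tt>sync</tt> is a stand-in for scanning forward to the next page boundary, the loop is written iteratively rather than recursively, granule positions are assumed to increase strictly from page to page, and continued pages and false sync patterns are ignored.

```python
from typing import List, Optional, Tuple

Page = Tuple[int, int]  # (byte offset of the page, granule position)


def sync(pages: List[Page], byte_pos: int) -> Optional[Page]:
    """Stand-in for 'obtain sync': the first page starting at or after byte_pos."""
    return next((page for page in pages if page[0] >= byte_pos), None)


def bisect_seek(pages: List[Page], file_size: int, target: int):
    """Find the last page whose granule position is below target.

    Returns (page or None, number of sync operations performed).
    """
    lo, hi = 0, file_size
    best = None
    reads = 0
    while lo < hi:
        mid = (lo + hi) // 2
        page = sync(pages, mid)
        reads += 1
        if page is None or page[0] >= hi or page[1] >= target:
            hi = mid               # nothing useful at or after mid: go left
        else:
            best = page            # this page is before the target: go right
            lo = page[0] + 1
    return best, reads


# 1000 pages of 4096 bytes each, every page advancing the position by 1024.
pages = [(i * 4096, (i + 1) * 1024) for i in range(1000)]
found, reads = bisect_seek(pages, 1000 * 4096, 500000)
print(found, reads)
```

Each pass at least halves the byte range still under consideration, so a file of about 4 MB needs at most 22 sync operations in this sketch.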

To correctly support chaining, you should first use this kind of search to locate the stream endpoints. Then, the above approach can be applied within the streams, to seek to any location within a chained file.

Doing this correctly is somewhat more complicated than it seems, due to the existence of '''continued pages''' and the risk of a small valid page being contained within a packet. Both of these challenges can be addressed, but the solution is left as an exercise for the reader. (Hint: The maximum Ogg page size is < 64 KBytes)

This Bisection Search is very good compared to the alternatives (a linear scan of the whole file), often taking just a couple of reads to locate the correct location in a file gigabytes in size, but the truly obsessive can out-perform the bisection on average, by using the local bitrate to pick a better target than the half way point used in a bisection search ([http://en.wikipedia.org/wiki/Secant_method Secant method]).

Be careful about the worst case becoming linear (see [http://en.wikipedia.org/wiki/Brent%27s_method Brent's method]). The improvement possible from better-than-bisection approaches is probably only relevant for seeking across a high latency network. In typical low-latency applications, the added complexity may not be worth the cost.

== References ==

From an Email by Monty, [http://web.archive.org/web/20031201054855/http://www.xiph.org/archives/theora-dev/200209/0040.html 13th Sept 2002]

'''Note that this document is obsolete, and incorrect with respect to seeking in multiplexed streams.''' It does accurately describe the rationale behind the two-part granulepos scheme (option 3 below) now used in Theora, Dirac, CMML and other codecs in Ogg.

Folks have noticed that the documentation is semi-silent about how to properly encode the granule position and interleave synchronization of keyframe-based video. The primary reasons for this:

* we at Xiph hadn't had to do it yet

* there are several easy possibilities, and the longer we had to think about it before mandating One True Spec, the better that spec would likely be.

The lack of a painfully explicit spec has led to the theory that it's not possible; that's not true, there are a few ways to do it. Several require no extension to Ogg stream v 0. A last way requires an extra field (a point against it), but does not actually break any stream that currently exists.

The time has come to lay down the spec as we're currently building the real abstraction layers in a concrete Ogg framework now where the Ogg engine, the codecs, and the overarching Ogg control layers are neatly put into boxes connected in formalized ways. Below I go into detail about each scheme in a 'thinking aloud' sort of way. This is not because I haven't already given the matter sufficient thought, it is because I wish to give the reader sufficient background information to understand why one way is better than the others. This is not a call for input so much as an educational effort (and a public sanity check of my thinking; please do pipe up if it appears I missed a salient point).

==== Starting Assumptions: ====

1) Ogg is not a non-linear format. It is not a replacement for the scripting system of a DVD player. It is a media transport format
designed to do nothing more than deliver content, in a stream, and have all the pieces arrive on time and in sync. It is not designed to *prevent* more complex use of content, it merely does not implement anything beyond a linear representation of the data contained within. If you want to build a real non-linear format, build it *from* Ogg, not *into* Ogg. This has been the intent from day 1.

2) The Ogg layer does not know specifics of the codec data it's multiplexing into a stream. It knows nothing beyond 'Oooo, packets!', that the packets belong to different buckets, that the packets go in order, and that packets have position markers. Ogg does not even have a concept of 'time'; it only knows about the sequentially increasing, unitless position markers. It is up to higher layers which have access to the codec APIs to assign and convert units of framing or time.

3) Given pre-cached decode headers, a player may seek into a stream at any point and begin decode. It may be the case that audio may start after video by a fraction of a second, or video might be blank until the stream hits the next keyframe, but this simplest case must just work, and there will be sufficient information to maintain perfect cross-media sync.

4) (This departs from current reality, but it will be the reality very soon; vorbisfile currently blurs the careful abstraction I'm about to describe) Seeking at an arbitrary level of precision is a distributed abstraction in the larger Ogg picture. At the lowest-level Ogg stream abstraction, seeking is one operation: "find me the page from logical stream 'n' with granule position 'x'". All more complex seeking operations are a function of a higher-level layer (with knowledge of the media types and codec in use) making intelligent use of this lowest Ogg abstraction. The Ogg stream abstraction need deal with nothing more complex than 'find this page'.

The various granulepos strategies for keyframes concern this last point.

The basic issue with video from which complexity arises is that frames often depend on previous and possibly future frames. This happens in a larger, general category of codecs whose streams may not begin decode from just any packet as well as packets that may not represent an entire frame, or even a fixed-time sampling algorithm. It is a mistake to design a seeking system tied to an exact set of very specific cases. While one could implement an explicit keyframe mechanism at the Ogg level, this mechanism would not cover any of the other interesting seeking cases while, as I'll show below, the mechanism would not actually be necessary.

There will be a few complaints that Ogg is being unnecessarily subtle and shifts a great deal of complexity into software which a few extra page header fields could eliminate. Consider the following:

1) Ogg was designed to impose roughly 0.5-1% overhead on the raw packet data across a wide range of packet usage patterns. 'A few extra fields' begins inflating that figure for specific special cases that only apply to a few stream types. Right now there is no header field that is not general to every stream. There is no fat in the page headers.

2) The Ogg-level seeking algorithm is exceptionally simple and can be described in a single sentence: "Find the earliest page with a granulepos less than but closest to 'x'". This shifts the onus of assembling more complex seeking operations requiring knowledge of a specific media type into a higher layer that has knowledge of that media type. The higher layer becomes responsible for determining for what 'x' Ogg should search. The division of labor is clear and sensible.

3) Complex, precise seeking operations are still contained entirely within the framework, just at a higher layer than Ogg-stream. At no time is an application developer required to deal with seeking mechanisms within an Ogg stream or to manually maintain stream
synchronization.

==== High level handwaving- How seeking really works ====

The granulepos is intended to mean, roughly, "If I stop decode at the end of this page, I will get data from my decoder up to position 'granulepos'". The granulepos simultaneously provides seeking information and a 'length-of-stream' indicator. Depending on the codec, it can also usually be used to indicate a timebase, but that isn't our problem right now.

By inference, the granulepos is also used to construct a value 'y' such that "if I begin decode *from* point 'y', I will get data beginning at position 'granulepos'". Although in some codecs, y == granulepos, that is not necessarily the case when decode can't begin at any arbitrary packet. The granulepos encoding method candidates I will now describe affect exactly the 'granulepos' to 'y' conversion process. Note also that none of these affect Ogg, only the higher decision-making layers... Different circumstances necessitated by different codecs can lead to different valid choices, all of which work as far as Ogg is concerned. However, for our I-/P-/B-frame video case, there is a pretty clear winner.

===== Strategy 1: Straight Granulepos, Keyframes Are Not Our Problem. =====

In this scheme, the granulepos is a simple frame counter. The seeking decision-maker in the codec's framework plugin is responsible for determining if a frame is a keyframe or not, and if it can't begin decode from a given frame, it must request another earlier frame until it finds a keyframe. If the codec so desires, it can store 'what is my keyframe?' information in the stream packets.

This case means that each seek to a *specific* frame in a video stream will generally result in two Ogg seeks; a first seek to the requested frame, then a second seek backwards to find that frame's keyframe.

A larger concern is the semantic accuracy of the granulepos; it's intended to reflect position accurately when decoding forward. In this scheme, it's fine for a P-frame to update the counter (as it can be decoded going strictly forward), but B frames will also advance the counter; they can't be decoded without subsequent P or I frames. Thus, the semantic value of granulepos no longer strictly represents 'we can decode up to 'granulepos' at the end of this frame'.

===== Strategy 2: Granulepos Represents Keyframes Only =====

In this scheme, only keyframes update the granulepos (monotonically or non-monotonically). It simplifies the seeking process to a keyframe as an Ogg-level seek to page 'x' will always yield a page with a keyframe. In addition, granulepos will also always mean 'we can decode up to *at least* this point in the stream'. If the stream is truncated at P or B frames past granulepos, the extra frames can be discarded. (A special case would need to be defined to terminate a stream that doesn't end on an I frame).

The difficulty with this scheme is that it presents slightly more for the software level decoder to track; a proper frame number could not be determined internally without tracking from an I frame. Also, the granulepos of an Ogg page would not necessarily map to the last packet on the page, or even any packet on that page; multiple sequential pages could have the same granulepos. It is conceptually slightly messy, although the 'messiness' does not make it at all impractical.

===== Strategy 3: Granulepos Encodes Some State =====

In some ways, this strategy is the most semantically 'over clever', but also the easiest to implement and the one that gives the most correct, up to date sync information. Pending comments, it is the I/P/B video strategy I currently favor.

The granulepos is 64 bits, a size that is absolutely necessary if, for example, it represents the PCM sample count in an audio codec. When being used to encode video frame number, however, it is comparatively absurdly large*.

* note that although granulepos is not permitted to wrap around, we can simply begin a new logical stream segment with a new serial number should a 30fps video stream ever hit the ten-billion year mark.

Thus we clearly have room to skim a few bits off the bottom of granulepos to represent an I, P or B frame. These bits are not used as flags, but rather, frame representation becomes a counting problem; we do this such that the count is still always strictly increasing.
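
A quick back-of-the-envelope check of how much room there is, assuming the granulepos is treated as a signed 64-bit value and a year is 365.25 days (both assumptions of this sketch):

```python
FPS = 30
SECONDS_PER_YEAR = 365.25 * 24 * 60 * 60

# Largest frame count when granulepos is a plain frame counter.
max_frames = 2**63 - 1
years_plain = max_frames / FPS / SECONDS_PER_YEAR

# Largest frame count when the 8 low bits are given up for other uses.
years_shifted = (max_frames >> 8) / FPS / SECONDS_PER_YEAR

print(f"{years_plain:.3g} years, {years_shifted:.3g} years")
```

A plain frame counter lasts roughly 9.7 billion years at 30fps; even with eight low bits taken away, about 38 million years of video remain addressable.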

For example, if we know that I frames will never be more than 256 frames apart and P frames no more than 31 B frames apart, the granulepos of an I frame can be defined to always satisfy granulepos & 0xff == 0. If we can have up to seven intervening P frames, they could be numbered granulepos-of-iframe + 0x20, 0x40, 0x60... 0xe0. B frames between the I and P frames would use the remaining five bits and be numbered as sub-I and sub-P frames 1 through 31. Thus, starting from zero, the frames/packets in the pattern IPBBPBBI would be numbered 0x000, 0x020, 0x021, 0x022, 0x040, 0x041, 0x042, 0x100.

If we wish to preserve the ability to represent a timebase, the granulepos number for I frames need not be increased monotonically and shifted; it can be used to represent the frame number. The above example becomes 0x000, 0x020, 0x021, 0x022, 0x040, 0x041, 0x042, 0x700. To get real frame number (from an I frame), we just shift granulepos >> 8. This scheme can be taken further or modified to get frame number from any video frame.

In this way, we can always seek, first time, to a desired key frame page (by seeking to Ogg page 'x' where x & 0xff == 0). In addition, each frame still has a unique frame number and also a clear 'group' number, potentially useful information to the decoder. Lastly, granulepos is still semantically correct, although it is now, in a sense, representing a whole.fractional frame number for buffering purposes.
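
On the decoding side, the keyframe test and the 'group' lookup reduce to a mask and a shift. A small sketch, using the 8-bit layout from the example; the helper names are invented here:

```python
LOW_BITS = 0xFF  # the eight low bits used for P/B counting in the example


def is_keyframe(granulepos: int) -> bool:
    """An I frame has all eight low bits clear."""
    return (granulepos & LOW_BITS) == 0


def keyframe_of(granulepos: int) -> int:
    """Granulepos of the I frame that starts this frame's group."""
    return granulepos & ~LOW_BITS


def keyframe_number(granulepos: int) -> int:
    """Real frame number of the group's I frame (timebase variant)."""
    return granulepos >> 8


sequence = [0x000, 0x020, 0x021, 0x022, 0x040, 0x041, 0x042, 0x700]
keyframes = [g for g in sequence if is_keyframe(g)]
print([hex(g) for g in keyframes], hex(keyframe_of(0x742)), keyframe_number(0x742))
```

The explicit parentheses in <tt>is_keyframe</tt> are deliberate: in C, <tt>x & 0xff == 0</tt> parses as <tt>x & (0xff == 0)</tt>, so a literal transcription of the prose would be wrong there.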

===== Scheme Four: Extra 'Seekpos' Field / Straw Man =====

Another possibility requires extension of the current Ogg page format. Although older players would reject any such extended pages as invalid, we do have versioning and typing fields, so there aren't actually any compatibility problems with current Ogg pages... in the future.

The idea in this scheme is to keep the current granulepos as a frame number field (ala scheme 1), but also add a new field 'seekpos' that is used, rather than granulepos, in seeking. The seekpos would represent the number of the last keyframe that passed by.

advantages:

1) The net effect of this strategy is to modify scheme 1 to only require one bisection seek rather than two. Some amount of code simplification (over scheme 1) at the decision-making level.

disadvantages:

1) The Ogg format will need to be revved. No current (ala 1.0) Ogg code will understand the new pages.

2) The header becomes larger, from a minimum size of 27 bytes to a minimum size of 35.

3) This strategy only enhances keyframes; it is of no use in other odd seeking cases.

4) Gives no more information than scheme 3, but is still more complicated, both in code and API (Ogg would have to understand keyframes).

Thus, there's no substantial reason to prefer extending the format over a scheme that's possible within the existing framework. Note that schemes 1-3 can all be implemented within the Ogg stream today.

Monty

[[Category:Ogg]]

Bounties

2017-11-28T10:30:11Z

MrZeus: split paras in header

These are proposed bounty projects, similar to http://gnome.org/bounties/
or the [http://ghostscript.com/article/58.html Ghostscript bug bounty] program.

We don't have the same level of funding but could start a pot with $10-$100 and
let people contribute to specific bounties through PayPal.

=== Xiph Quicktime Plugin ===
[http://www.xiph.org/quicktime/ QuickTime Components] is now a project hosted on xiph.org.

You have to write a Quicktime Plugin for the Ogg container and the Xiph Codec Family.
[http://qtcomponents.sf.net qtcomponents] provides support for Ogg Vorbis and MNG. This could be used as a start.
Xiph Quicktime Plugin has to support encoding/decoding for:
* Ogg Media container
**[http://qtcomponents.sf.net qtcomponents] ''has an operational pluggable API for import, it needs some work to be long term supportable. It does not have a pluggable API for exporting at this time.''
* Support for Chained Ogg Streams
**[http://qtcomponents.sf.net qtcomponents] ''imports chained files as multiple tracks in QuickTime. It does not create chained files during export.''
* Support for Icecast Streams (sending is optional)
**[http://qtcomponents.sf.net qtcomponents] ''implements nothing towards this item. First up is a reverse-engineering effort, as the specifications for a streaming media handler have not been published.''
* Support for Xiph Codec Family: Vorbis, Theora, FLAC, Speex, Writ
**[http://qtcomponents.sf.net qtcomponents] ''has code for Vorbis and Speex (not working at the moment) and there is code at [http://damien.drix.free.fr/qtflac/ Damien Drix's site] for FLAC (decode only).''
It must also be possible to use the Xiph codecs in .mov files in combination with other quicktime codecs.
*[http://qtcomponents.sf.net qtcomponents] ''supports embedding media encoded with Xiph codecs into .mov files.''
The plugin should work with at least QuickTime 6.x and 7.x on Mac OS X and Windows. (Mac OS 9 would be nice but probably isn't as important.)

All work must be released under the GPL.

Proposed bounty: 100€

=== Aggressive low-bitrate libvorbis encoding improvements for Vorbis I ===
libvorbis has a lot of room for improvement in all quality/bitrate departments, particularly at the lower quality levels / bitrates. There are many directions from which to approach this problem.

To claim this bounty, the following criteria would have to be met:
* A 25%-or-better reduction in bitrate for quality levels -1, 0, 1 on a reasonable testsuite while maintaining qualitative equivalence (or improvement) in community testing.
* No overall qualitative/bitrate regressions in quality levels 2 upwards
* Output ogg files compatible with Vorbis I spec
* Changes under suitable license for re-integration with Xiph.Org libvorbis

Proposed bounty: 200€

=== iPod playback support ===
The [http://ipodlinux.sourceforge.net/ Linux on iPod] project has vorbis decode working (with alternate firmware) at a good fraction of realtime. It should be a small matter of optimization to get it working
for useful playback.

Proposed bounty: 100€

=== Ogg Vorbis Bitrate Peeling ===
:Note: a bounty for this project has been posted on [https://launchpad.net/ launchpad.net]: [https://launchpad.net/bounties/ogg-vorbis-bitrate-peeling Add bitrate peeling to the standard libvorbis encoding library].
Ogg Vorbis bitrate peeling has been a topic brought up time and again to combat MP3 enthusiasts. But this feature does not actually exist; only the possibility does. This bounty is set to change that.
The peeler must meet the following criteria:
* Any Vorbis stream can be converted (not transcoded) to a lower quality setting
* Resulting streams would be identical or nearly identical to a stream generated by encoding the original source to the selected quality
* This process is reasonably fast (that is, significantly faster than re-encoding from source)
The following must also be accomplished to claim this bounty:
* The encoding libraries must be updated to create peelable Vorbis streams natively
* Old Vorbis streams must be peelable already, or convertible with a utility in order to be made peelable
* If older streams are not natively peelable, old unpeelable Vorbis streams must be identifiable and discernible from peelable streams in such a way as to facilitate transcoding streams from the old format
* All work submitted must be licensed under a BSD-style licence (excepting circumstances where other licences may conflict)

Proposed bounty: 100€

Main Page

2017-11-23T20:53:38Z

MrZeus: /* Timed Text/Metadata */ copied/pasted Martin's suggestion

In an effort to bring open-source ideals to the world of multimedia, the [[Xiph.Org Foundation]] develops a multitude of amazing products. This wiki describes our free and open protocols and software.

== Demonstrations of Xiph technologies ==

Want to hear or see Xiph in action? These projects are using our codecs, formats, or libraries.

=== Audio ===

* [[OpusSupport]]: List of software and services supporting the [[Opus]] codec
* [[VorbisStreams|Vorbis Streams]]: Stations streaming with the [[Vorbis]] codec
* [[Games that use Vorbis]]: Games using the Vorbis codec for music or sound effects
* [[VorbisHardware|Vorbis Hardware]]: Hardware players using the Vorbis codec
* [[VorbisSoftwarePlayers|Vorbis Software Players]]: list of media players with out-of-box support for Vorbis

=== Video ===

* [[TheoraHardware|Theora Hardware]]: Hardware using the [[Theora]] video codec
* [[TheoraSoftwarePlayers|Theora Software Players]]: list of media players with Theora support
* [[List of Theora videos]]: Sources for video encoded with Theora

== Projects/Formats ==

=== Container Formats ===

* [[Ogg]]: Media container. This is our native format and the recommended container for Xiph codecs.
** [[Ogg Skeleton]]: Skeleton information on all logical content bitstreams in Ogg.
** [[MIMETypesCodecs|Specification of MIME types and respective codecs parameter]]
* [[SpeexRTP]]: RTP payload format for voice
* [[VorbisRTP]]: RTP payload format for general audio
* [[TheoraRTP]]: RTP payload format for video
* [[XSPF]]: XML Sharable Playlist Format

=== Codecs ===
====Compressed Audio/Video====
* [[OpusFAQ|Opus]]: Lossy, low-latency, general-purpose audio codec
* [[Vorbis]]: Lossy audio codec with a [[Tremor|fixed point decoder]]
* [[FLAC]]: Free Lossless Audio Codec
* [[Theora]]: Lossy video codec
* [[Speex]]: Speech codec (obsoleted by [[OpusFAQ|Opus]])
====Uncompressed Audio/Video====
* [[OggPCM]]: Audio codec
====Timed Text/Metadata====
* [[OggKate|Kate]]: Format for lyrics and subtitles
* [[CMML]]: Continuous Media Markup Language, used for [http://www.annodex.net/ Annodex] and subtitles (xine, vlc, gstreamer, and DirectShow support; obsoleted by [[OggKate|Kate]])

=== Software ===

* '''Software for distributing media'''
** [[Icecast Server|Icecast]]: Streaming server
** [[IceS]]: Source client for Icecast servers

* '''Libraries'''
** [[OggPlay]]: library for synchronised Xiph media playback
** [[XiphQT]]: QuickTime component to play the main Xiph formats
** [[VorbisCommentEdit]]: Macintosh Framework making it easy to incorporate the editing of [[VorbisComment|Vorbis Comments]]

* '''Other software'''
** [[OggComponent/VorbisComponent]]: Wrappers to integrate Vorbis into Mac OS X (does not yet support encoding)
** [http://xiph.org/paranoia/ cdparanoia]: CDDA extractor/ripper

=== Community ===

*[[How to help]]
*[[Spread Open Media]]: project to promote Xiph formats.
**[[MailOgging]]: provides templates for anyone willing to contact a company requesting them to add support for Xiph formats.
*[[People]]: Who's who in Xiph.

=== Work in Progress ===
* [[Work In Progress]]: codecs and software still in the research and development stages.
* [[Todo]]: To-do list for various Xiph projects.

== Project management ==

* [[AdminProcesses]]: who's in charge of what project
* [[MonthlyMeeting]]: page with information on Xiph's MonthlyMeeting
* [[MailingLists]]: list of Xiph's mailing lists
* [[Bounties]]: list of bounties that you can take to improve Xiph's projects

== Resources for Video and Audio programmers ==

* [[Ambisonics]]: page with technical information on Ambisonics
* [[Videos|Educational Videos]] about audio/video technology.
* [[Resources and papers on Audio, Music and Speech|Courses and papers on Audio, Music and Speech]]: page with links to MIT and other universities' content
* [[Oggless]]: for ideas on how to use the different Xiph codecs outside Ogg

== Wiki internal ==

* [[XiphInfra:List of services]]: List of services the Xiph.Org Foundation uses.
* [[Translations]]: Do you feel like helping us with some translation work?
* [[XiphWiki:Sandbox]]: Test page for testing your Wiki-editing skills.
* [[XiphWiki:Copyrights]]: License used for all content posted on the XiphWiki.
* [[Logos]]: Logos of the various Xiph projects.

Main Page

2017-11-23T16:00:59Z

MrZeus: /* Demonstrations of Xiph technologies */ add list of software supporting Opus

In an effort to bring open-source ideals to the world of multimedia, the [[Xiph.Org Foundation]] develops a multitude of amazing products. This wiki describes our free and open protocols and software.

== Demonstrations of Xiph technologies ==

Want to hear or see Xiph in action? These projects are using our codecs, formats, or libraries.

=== Audio ===

* [[OpusSupport]]: List of software and services supporting the [[Opus]] codec
* [[VorbisStreams|Vorbis Streams]]: Stations streaming with the [[Vorbis]] codec
* [[Games that use Vorbis]]: Games using the Vorbis codec for music or sound effects
* [[VorbisHardware|Vorbis Hardware]]: Hardware players using the Vorbis codec
* [[VorbisSoftwarePlayers|Vorbis Software Players]]: list of media players with out-of-box support for Vorbis

=== Video ===

* [[TheoraHardware|Theora Hardware]]: Hardware using the [[Theora]] video codec
* [[TheoraSoftwarePlayers|Theora Software Players]]: list of media players with Theora support
* [[List of Theora videos]]: Sources for video encoded with Theora

== Projects/Formats ==

=== Container Formats ===

* [[Ogg]]: Media container. This is our native format and the recommended container for Xiph codecs.
** [[Ogg Skeleton]]: Skeleton information on all logical content bitstreams in Ogg.
** [[MIMETypesCodecs|Specification of MIME types and respective codecs parameter]]
* [[SpeexRTP]]: RTP payload format for voice
* [[VorbisRTP]]: RTP payload format for general audio
* [[TheoraRTP]]: RTP payload format for video
* [[XSPF]]: XML Sharable Playlist Format

=== Codecs ===
====Compressed Audio/Video====
* [[OpusFAQ|Opus]]: Lossy, low-latency, general-purpose audio codec
* [[Vorbis]]: Lossy audio codec with a [[Tremor|fixed point decoder]]
* [[FLAC]]: Free Lossless Audio Codec
* [[Theora]]: Lossy video codec
* [[Speex]]: Speech codec (obsoleted by [[OpusFAQ|Opus]])
====Uncompressed Audio/Video====
* [[OggPCM]]: Audio codec
====Timed Text/Metadata====
* [[CMML]]: Continuous Media Markup Language, used for [http://www.annodex.net/ Annodex] and subtitles (xine, vlc, gstreamer, and DirectShow support)
* [[OggKate|Kate]]: new format for lyrics and subtitles

=== Software ===

* '''Software for distributing media'''
** [[Icecast Server|Icecast]]: Streaming server
** [[IceS]]: Source client for Icecast servers

* '''Libraries'''
** [[OggPlay]]: library for synchronised Xiph media playback
** [[XiphQT]]: QuickTime component to play the main Xiph formats
** [[VorbisCommentEdit]]: Macintosh Framework making it easy to incorporate the editing of [[VorbisComment|Vorbis Comments]]

* '''Other software'''
** [[OggComponent/VorbisComponent]]: Wrappers to integrate Vorbis into Mac OS X (does not yet support encoding)
** [http://xiph.org/paranoia/ cdparanoia]: CDDA extractor/ripper

=== Community ===

*[[How to help]]
*[[Spread Open Media]]: project to promote Xiph formats.
**[[MailOgging]]: provides templates for anyone willing to contact a company requesting them to add support for Xiph formats.
*[[People]]: Who's who in Xiph.

=== Work in Progress ===
* [[Work In Progress]]: codecs and software still in the research and development stages.
* [[Todo]]: To-do list for various Xiph projects.

== Project management ==

* [[AdminProcesses]]: who's in charge of what project
* [[MonthlyMeeting]]: page with information on Xiph's MonthlyMeeting
* [[MailingLists]]: list of Xiph's mailing lists
* [[Bounties]]: list of bounties that you can take to improve Xiph's projects

== Resources for Video and Audio programmers ==

* [[Ambisonics]]: page with technical information on Ambisonics
* [[Videos|Educational Videos]] about audio/video technology.
* [[Resources and papers on Audio, Music and Speech|Courses and papers on Audio, Music and Speech]]: page with links to MIT and other universities' content
* [[Oggless]]: for ideas on how to use the different Xiph codecs outside Ogg

== Wiki internal ==

* [[XiphInfra:List of services]]: List of services the Xiph.Org Foundation uses.
* [[Translations]]: Do you feel like helping us with some translation work?
* [[XiphWiki:Sandbox]]: Test page for testing your Wiki-editing skills.
* [[XiphWiki:Copyrights]]: License used for all content posted on the XiphWiki.
* [[Logos]]: Logos of the various Xiph projects.

Icecast Server/Installing latest version (official Xiph repositories)

2017-11-23T11:53:25Z

MrZeus: /* Step 2: Import the Multimedia signing key */

Xiph.org provides the latest version of Icecast packaged for [https://build.opensuse.org/package/repositories/multimedia:xiph/icecast various distributions]. The packages are built centrally from [https://build.opensuse.org/package/show/multimedia:xiph/icecast one set of sources] on the [https://build.opensuse.org/ openSUSE OpenBuildService instance] in the [https://build.opensuse.org/project/show/multimedia:xiph Multimedia/Xiph.org project].

Packages are usually available on release day. Packaging closely follows the original distro packaging so that the packages remain a seamless drop-in replacement. They are meant for users who need the latest version of Icecast or HTTPS support when their distribution doesn't provide it.

== Debian and Ubuntu (in all its flavors) ==
It takes four simple commands to install the latest Icecast version on a deb-based distro.

==== Step 1: Add the repository ====
This expects that you have '''sudo''' installed. If not, open a root shell and run the '''echo''' command directly.

Copy and paste the command for your distribution release and make sure that it's executed as '''one''' line!

{| class="wikitable"
!Distribution Release
!Command
!Repository
!Comments
|-
|Debian 8.0 (jessie)
|<syntaxhighlight lang="bash">sudo sh -c "echo deb http://download.opensuse.org/repositories/multimedia:/xiph/Debian_8.0/ ./ >>/etc/apt/sources.list.d/icecast.list"</syntaxhighlight>
|[http://download.opensuse.org/repositories/multimedia:/xiph/Debian_8.0/ repository]
|
|-
|Debian 7.0 (wheezy)
|<syntaxhighlight lang="bash">sudo sh -c "echo deb http://download.opensuse.org/repositories/multimedia:/xiph/Debian_7.0/ ./ >>/etc/apt/sources.list.d/icecast.list"</syntaxhighlight>
|[http://download.opensuse.org/repositories/multimedia:/xiph/Debian_7.0/ repository]
|
|-
|Debian 6.0 (squeeze)
|<syntaxhighlight lang="bash">sudo sh -c "echo deb http://download.opensuse.org/repositories/multimedia:/xiph/Debian_6.0/ ./ >>/etc/apt/sources.list.d/icecast.list"</syntaxhighlight>
|[http://download.opensuse.org/repositories/multimedia:/xiph/Debian_6.0/ repository]
|
|-
|Ubuntu 14.04 (trusty)
|<syntaxhighlight lang="bash">sudo sh -c "echo deb http://download.opensuse.org/repositories/multimedia:/xiph/xUbuntu_14.04/ ./ >>/etc/apt/sources.list.d/icecast.list"</syntaxhighlight>
|[http://download.opensuse.org/repositories/multimedia:/xiph/xUbuntu_14.04/ repository]
|
|-
|Ubuntu 15.04 (vivid)
|<syntaxhighlight lang="bash">sudo sh -c "echo deb http://download.opensuse.org/repositories/multimedia:/xiph/xUbuntu_15.04/ ./ >>/etc/apt/sources.list.d/icecast.list"</syntaxhighlight>
|[http://download.opensuse.org/repositories/multimedia:/xiph/xUbuntu_15.04/ repository]
|''not'' LTS<ref name="ubuntu-lts">Ubuntu releases that are not Long-Term-Support have a [https://wiki.ubuntu.com/Releases short life cycle]. Xiph.org will stop offering updated packages for those some time after Canonical/the Ubuntu Project end their support.</ref>!
|-
|Ubuntu 15.10 (wily)
|<syntaxhighlight lang="bash">sudo sh -c "echo deb http://download.opensuse.org/repositories/multimedia:/xiph/xUbuntu_15.10/ ./ >>/etc/apt/sources.list.d/icecast.list"</syntaxhighlight>
|[http://download.opensuse.org/repositories/multimedia:/xiph/xUbuntu_15.10/ repository]
|''not'' LTS<ref name="ubuntu-lts"/>!
|-
|Ubuntu 16.04 (xenial)
|<syntaxhighlight lang="bash">sudo sh -c "echo deb http://download.opensuse.org/repositories/multimedia:/xiph/xUbuntu_16.04/ ./ >>/etc/apt/sources.list.d/icecast.list"</syntaxhighlight>
|[http://download.opensuse.org/repositories/multimedia:/xiph/xUbuntu_16.04/ repository]
|LTS<ref name="ubuntu-lts"/>
|-
|Ubuntu 17.04 (zesty)
|<syntaxhighlight lang="bash">sudo sh -c "echo deb http://download.opensuse.org/repositories/multimedia:/xiph/xUbuntu_17.04/ ./ >>/etc/apt/sources.list.d/icecast.list"</syntaxhighlight>
|[http://download.opensuse.org/repositories/multimedia:/xiph/xUbuntu_17.04/ repository]
|''not'' LTS<ref name="ubuntu-lts"/>!
|-
|Ubuntu 17.10 (artful)
|<syntaxhighlight lang="bash">sudo sh -c "echo deb http://download.opensuse.org/repositories/multimedia:/xiph/xUbuntu_17.10/ ./ >>/etc/apt/sources.list.d/icecast.list"</syntaxhighlight>
|[http://download.opensuse.org/repositories/multimedia:/xiph/xUbuntu_17.10/ repository]
|''not'' LTS<ref name="ubuntu-lts"/>!
|-
|source package, any deb distro
|<syntaxhighlight lang="bash">sudo sh -c "echo deb-src http://download.opensuse.org/repositories/multimedia:/xiph/Debian_8.0/ ./ >>/etc/apt/sources.list.d/icecast.list"</syntaxhighlight>
|[http://download.opensuse.org/repositories/multimedia:/xiph/Debian_8.0/ repository]
| actually available for ''all'' repository URLs
|-
|Linux Mint:
|Use the information listed above for the [https://en.wikipedia.org/wiki/List_of_Linux_Mint_releases#Release_history corresponding Ubuntu release].
|}
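
Because the commands in the table append with <code>>></code>, running one of them twice leaves a duplicate entry in <code>icecast.list</code>. The following is a minimal sketch of one way to avoid that. The function names <code>xiph_repo_line</code> and <code>add_line_once</code> are hypothetical helpers, not part of any Xiph or Debian tooling; the URL pattern is the one used in the table above.

```shell
#!/bin/sh
# xiph_repo_line RELEASE_DIR
# Prints the apt source line for one release directory from the table,
# for example Debian_8.0 or xUbuntu_16.04.
xiph_repo_line() {
    printf 'deb http://download.opensuse.org/repositories/multimedia:/xiph/%s/ ./\n' "$1"
}

# add_line_once FILE LINE
# Appends LINE to FILE unless an identical line is already present,
# so repeating the step does not create duplicate entries.
add_line_once() {
    if [ -f "$1" ] && grep -qxF -- "$2" "$1"; then
        return 0
    fi
    printf '%s\n' "$2" >> "$1"
}
```

In a root shell this could be used as <code>add_line_once /etc/apt/sources.list.d/icecast.list "$(xiph_repo_line Debian_8.0)"</code>, with the release directory replaced by the one matching your distribution.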

==== Step 2: Import the ''Multimedia'' signing key ====
You need to add the [http://icecast.org/multimedia-obs.key openSUSE OBS '''Multimedia''' signing key] as a Trusted Key to your system.

There are many ways to verify this key, e.g. by a simple web search.

After downloading the key, you can run this command to verify your copy:

<syntaxhighlight lang="bash">gpg multimedia-obs.key</syntaxhighlight>

It should yield:

pub rsa2048 2017-11-21 [SC] [expires: 2020-01-30]
<nowiki> 0E313DB7936B4E76E720065B77EC2301F23C6AA3</nowiki>
uid multimedia OBS Project <multimedia@build.opensuse.org>

Then you can add the key to your system's Trusted Keys using:

<syntaxhighlight lang="bash">sudo apt-key add multimedia-obs.key</syntaxhighlight>

The simplest way (but not very secure, since you're not checking the key) is to add the key with one command line:

<syntaxhighlight lang="bash">wget -qO - http://icecast.org/multimedia-obs.key | sudo apt-key add -</syntaxhighlight>
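
The manual fingerprint comparison described in this step can also be scripted. The following is a minimal sketch under stated assumptions: the function names <code>normalize_fpr</code> and <code>fpr_matches</code> are hypothetical, and the expected value is the fingerprint shown above. It only compares strings; obtaining the fingerprint from the downloaded file is still up to you.

```shell
#!/bin/sh
# Fingerprint of the Multimedia signing key as listed on this page.
EXPECTED_FPR=0E313DB7936B4E76E720065B77EC2301F23C6AA3

# normalize_fpr STRING
# Removes all whitespace and upper-cases the result, so fingerprints
# printed in groups of four or in lower case compare equal.
normalize_fpr() {
    printf '%s' "$1" | tr -d '[:space:]' | tr '[:lower:]' '[:upper:]'
}

# fpr_matches CANDIDATE
# Succeeds only if CANDIDATE, once normalized, equals EXPECTED_FPR.
fpr_matches() {
    [ "$(normalize_fpr "$1")" = "$EXPECTED_FPR" ]
}
```

The candidate can be copied from the <code>gpg multimedia-obs.key</code> output. A check such as <code>fpr_matches "$candidate" && sudo apt-key add multimedia-obs.key</code> then adds the key only when the fingerprints agree. This guards against a corrupted or swapped download, but it is only as trustworthy as the source from which you took the expected fingerprint.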

==== Step 3: Update your repository index ====
<syntaxhighlight lang="bash">sudo apt-get update</syntaxhighlight>
==== Step 4: Install Icecast ====
<syntaxhighlight lang="bash">sudo apt-get install icecast2</syntaxhighlight>

== RedHat and its derivatives ==
{| class="wikitable"
!Distribution Release
!Command
!Repository
!Comments
|-
|EPEL5 (RHEL 5/CentOS 5)
|<syntaxhighlight lang="bash"></syntaxhighlight>
|[http://download.opensuse.org/repositories/multimedia:/xiph/CentOS_5 repository]
|
|-
|EPEL6 (RHEL 6/CentOS 6)
|<syntaxhighlight lang="bash"></syntaxhighlight>
|[http://download.opensuse.org/repositories/multimedia:/xiph/CentOS_6 repository]
|
|-
|EPEL7 (RHEL 7/CentOS 7)
|<syntaxhighlight lang="bash"></syntaxhighlight>
|[http://download.opensuse.org/repositories/multimedia:/xiph/CentOS_7 repository]
|
|-
|Fedora 22
|<syntaxhighlight lang="bash"></syntaxhighlight>
|[http://download.opensuse.org/repositories/multimedia:/xiph/Fedora_22 repository]
|
|-
|Fedora 23
|<syntaxhighlight lang="bash"></syntaxhighlight>
|[http://download.opensuse.org/repositories/multimedia:/xiph/Fedora_23 repository]
|
|-

|}

== openSUSE ==
{| class="wikitable"
!Distribution Release
!Command
!Repository
!Comments
|-
|SUSE Linux Enterprise 11.3
|<syntaxhighlight lang="bash"></syntaxhighlight>
|[http://download.opensuse.org/repositories/multimedia:/xiph/SLE_11_SP3 repository]
|
|-
|SUSE Linux Enterprise 11.4
|<syntaxhighlight lang="bash"></syntaxhighlight>
|[http://download.opensuse.org/repositories/multimedia:/xiph/SLE_11_SP4 repository]
|
|-
|SUSE Linux Enterprise 12
|<syntaxhighlight lang="bash"></syntaxhighlight>
|[http://download.opensuse.org/repositories/multimedia:/xiph/SLE_12 repository]
|
|-
|openSUSE 13.1
|<syntaxhighlight lang="bash"></syntaxhighlight>
|[http://download.opensuse.org/repositories/multimedia:/xiph/openSUSE_13.1 repository]
|
|-
|openSUSE 13.2
|<syntaxhighlight lang="bash"></syntaxhighlight>
|[http://download.opensuse.org/repositories/multimedia:/xiph/openSUSE_13.2 repository]
|
|-
|openSUSE Leap 42.1
|<syntaxhighlight lang="bash"></syntaxhighlight>
|[http://download.opensuse.org/repositories/multimedia:/xiph/openSUSE_Leap_42.1 repository]
|
|-
|openSUSE Tumbleweed
|<syntaxhighlight lang="bash"></syntaxhighlight>
|[http://download.opensuse.org/repositories/multimedia:/xiph/openSUSE_Tumbleweed repository]
|
|-

|}

== Footnotes ==
<references/>
[[Category: Icecast]]

Icecast Server/Installing latest version (official Xiph repositories)

2017-11-23T11:31:42Z

MrZeus: /* Debian and Ubuntu (in all its flavors) */

Xiph.org provides the latest version of Icecast packaged for [https://build.opensuse.org/package/repositories/multimedia:xiph/icecast various distributions]. The packages are built centrally from [https://build.opensuse.org/package/show/multimedia:xiph/icecast one set of sources] on the [https://build.opensuse.org/ openSUSE OpenBuildService instance] in the [https://build.opensuse.org/project/show/multimedia:xiph Multimedia/Xiph.org project].

Packages are usually available on release day. Packaging closely follows the original distro packaging so that the packages remain a seamless drop-in replacement. They are meant for users who need the latest version of Icecast or HTTPS support when their distribution doesn't provide it.

== Debian and Ubuntu (in all its flavors) ==
It takes four simple commands to install the latest Icecast version on a deb-based distro.

==== Step 1: Add the repository ====
This expects that you have '''sudo''' installed. If not, open a root shell and run the '''echo''' command directly.

Copy and paste the command for your distribution release and make sure that it's executed as '''one''' line!

{| class="wikitable"
!Distribution Release
!Command
!Repository
!Comments
|-
|Debian 8.0 (jessie)
|<syntaxhighlight lang="bash">sudo sh -c "echo deb http://download.opensuse.org/repositories/multimedia:/xiph/Debian_8.0/ ./ >>/etc/apt/sources.list.d/icecast.list"</syntaxhighlight>
|[http://download.opensuse.org/repositories/multimedia:/xiph/Debian_8.0/ repository]
|
|-
|Debian 7.0 (wheezy)
|<syntaxhighlight lang="bash">sudo sh -c "echo deb http://download.opensuse.org/repositories/multimedia:/xiph/Debian_7.0/ ./ >>/etc/apt/sources.list.d/icecast.list"</syntaxhighlight>
|[http://download.opensuse.org/repositories/multimedia:/xiph/Debian_7.0/ repository]
|
|-
|Debian 6.0 (squeeze)
|<syntaxhighlight lang="bash">sudo sh -c "echo deb http://download.opensuse.org/repositories/multimedia:/xiph/Debian_6.0/ ./ >>/etc/apt/sources.list.d/icecast.list"</syntaxhighlight>
|[http://download.opensuse.org/repositories/multimedia:/xiph/Debian_6.0/ repository]
|
|-
|Ubuntu 14.04 (trusty)
|<syntaxhighlight lang="bash">sudo sh -c "echo deb http://download.opensuse.org/repositories/multimedia:/xiph/xUbuntu_14.04/ ./ >>/etc/apt/sources.list.d/icecast.list"</syntaxhighlight>
|[http://download.opensuse.org/repositories/multimedia:/xiph/xUbuntu_14.04/ repository]
|
|-
|Ubuntu 15.04 (vivid)
|<syntaxhighlight lang="bash">sudo sh -c "echo deb http://download.opensuse.org/repositories/multimedia:/xiph/xUbuntu_15.04/ ./ >>/etc/apt/sources.list.d/icecast.list"</syntaxhighlight>
|[http://download.opensuse.org/repositories/multimedia:/xiph/xUbuntu_15.04/ repository]
|''not'' LTS<ref name="ubuntu-lts">Ubuntu releases that are not Long-Term-Support have a [https://wiki.ubuntu.com/Releases short life cycle]. Xiph.org will stop offering updated packages for those some time after Canonical/the Ubuntu Project end their support.</ref>!
|-
|Ubuntu 15.10 (wily)
|<syntaxhighlight lang="bash">sudo sh -c "echo deb http://download.opensuse.org/repositories/multimedia:/xiph/xUbuntu_15.10/ ./ >>/etc/apt/sources.list.d/icecast.list"</syntaxhighlight>
|[http://download.opensuse.org/repositories/multimedia:/xiph/xUbuntu_15.10/ repository]
|''not'' LTS<ref name="ubuntu-lts"/>!
|-
|Ubuntu 16.04 (xenial)
|<syntaxhighlight lang="bash">sudo sh -c "echo deb http://download.opensuse.org/repositories/multimedia:/xiph/xUbuntu_16.04/ ./ >>/etc/apt/sources.list.d/icecast.list"</syntaxhighlight>
|[http://download.opensuse.org/repositories/multimedia:/xiph/xUbuntu_16.04/ repository]
|LTS<ref name="ubuntu-lts"/>
|-
|Ubuntu 17.04 (zesty)
|<syntaxhighlight lang="bash">sudo sh -c "echo deb http://download.opensuse.org/repositories/multimedia:/xiph/xUbuntu_17.04/ ./ >>/etc/apt/sources.list.d/icecast.list"</syntaxhighlight>
|[http://download.opensuse.org/repositories/multimedia:/xiph/xUbuntu_17.04/ repository]
|''not'' LTS<ref name="ubuntu-lts"/>!
|-
|Ubuntu 17.10 (artful)
|<syntaxhighlight lang="bash">sudo sh -c "echo deb http://download.opensuse.org/repositories/multimedia:/xiph/xUbuntu_17.10/ ./ >>/etc/apt/sources.list.d/icecast.list"</syntaxhighlight>
|[http://download.opensuse.org/repositories/multimedia:/xiph/xUbuntu_17.10/ repository]
|''not'' LTS<ref name="ubuntu-lts"/>!
|-
|source package, any deb distro
|<syntaxhighlight lang="bash">sudo sh -c "echo deb-src http://download.opensuse.org/repositories/multimedia:/xiph/Debian_8.0/ ./ >>/etc/apt/sources.list.d/icecast.list"</syntaxhighlight>
|[http://download.opensuse.org/repositories/multimedia:/xiph/Debian_8.0/ repository]
| actually available for ''all'' repository URLs
|-
|Linux Mint:
|Use the information listed above for the [https://en.wikipedia.org/wiki/List_of_Linux_Mint_releases#Release_history corresponding Ubuntu release].
|}

==== Step 2: Import the ''Multimedia'' signing key ====
You need to add the [http://icecast.org/multimedia-obs.key openSUSE OBS '''Multimedia''' signing key] as a trusted key to your system.

There are many ways to verify this key, e.g. by a simple web search.

After downloading the key, you can run this command to verify your copy:

<code>gpg multimedia-obs.key</code>

It should yield:

pub rsa2048 2017-11-21 [SC] [expires: 2020-01-30]
<nowiki> 0E313DB7936B4E76E720065B77EC2301F23C6AA3</nowiki>
uid multimedia OBS Project <multimedia@build.opensuse.org>

Then you can add the key to your system's Trusted Keys using:

<syntaxhighlight lang="bash">sudo apt-key add multimedia-obs.key</syntaxhighlight>

The simplest way (but not very secure, since you're not checking the key) is to add the key with one command line:

<syntaxhighlight lang="bash">wget -qO - http://icecast.org/multimedia-obs.key | sudo apt-key add -</syntaxhighlight>

==== Step 3: Update your repository index ====
<syntaxhighlight lang="bash">sudo apt-get update</syntaxhighlight>
==== Step 4: Install Icecast ====
<syntaxhighlight lang="bash">sudo apt-get install icecast2</syntaxhighlight>

== RedHat and its derivatives ==
{| class="wikitable"
!Distribution Release
!Command
!Repository
!Comments
|-
|EPEL5 (RHEL 5/CentOS 5)
|<syntaxhighlight lang="bash"></syntaxhighlight>
|[http://download.opensuse.org/repositories/multimedia:/xiph/CentOS_5 repository]
|
|-
|EPEL6 (RHEL 6/CentOS 6)
|<syntaxhighlight lang="bash"></syntaxhighlight>
|[http://download.opensuse.org/repositories/multimedia:/xiph/CentOS_6 repository]
|
|-
|EPEL7 (RHEL 7/CentOS 7)
|<syntaxhighlight lang="bash"></syntaxhighlight>
|[http://download.opensuse.org/repositories/multimedia:/xiph/CentOS_7 repository]
|
|-
|Fedora 22
|<syntaxhighlight lang="bash"></syntaxhighlight>
|[http://download.opensuse.org/repositories/multimedia:/xiph/Fedora_22 repository]
|
|-
|Fedora 23
|<syntaxhighlight lang="bash"></syntaxhighlight>
|[http://download.opensuse.org/repositories/multimedia:/xiph/Fedora_23 repository]
|
|-

|}

== openSUSE ==
{| class="wikitable"
!Distribution Release
!Command
!Repository
!Comments
|-
|SUSE Linux Enterprise 11.3
|<syntaxhighlight lang="bash"></syntaxhighlight>
|[http://download.opensuse.org/repositories/multimedia:/xiph/SLE_11_SP3 repository]
|
|-
|SUSE Linux Enterprise 11.4
|<syntaxhighlight lang="bash"></syntaxhighlight>
|[http://download.opensuse.org/repositories/multimedia:/xiph/SLE_11_SP4 repository]
|
|-
|SUSE Linux Enterprise 12
|<syntaxhighlight lang="bash"></syntaxhighlight>
|[http://download.opensuse.org/repositories/multimedia:/xiph/SLE_12 repository]
|
|-
|openSUSE 13.1
|<syntaxhighlight lang="bash"></syntaxhighlight>
|[http://download.opensuse.org/repositories/multimedia:/xiph/openSUSE_13.1 repository]
|
|-
|openSUSE 13.2
|<syntaxhighlight lang="bash"></syntaxhighlight>
|[http://download.opensuse.org/repositories/multimedia:/xiph/openSUSE_13.2 repository]
|
|-
|openSUSE Leap 42.1
|<syntaxhighlight lang="bash"></syntaxhighlight>
|[http://download.opensuse.org/repositories/multimedia:/xiph/openSUSE_Leap_42.1 repository]
|
|-
|openSUSE Tumbleweed
|<syntaxhighlight lang="bash"></syntaxhighlight>
|[http://download.opensuse.org/repositories/multimedia:/xiph/openSUSE_Tumbleweed repository]
|
|-

|}

== Footnotes ==
<references/>
[[Category: Icecast]]

Icecast Server/Installing latest version (official Xiph repositories)

2017-11-22T17:35:57Z

MrZeus: /* Import the Multimedia signing key */

Xiph.org provides the latest version of Icecast packaged for [https://build.opensuse.org/package/repositories/multimedia:xiph/icecast various distributions]. The packages are built centrally from [https://build.opensuse.org/package/show/multimedia:xiph/icecast one set of sources] on the [https://build.opensuse.org/ openSUSE OpenBuildService instance] in the [https://build.opensuse.org/project/show/multimedia:xiph Multimedia/Xiph.org project].

Packages are usually available on release day. Packaging closely follows the original distro packaging so that the packages remain a seamless drop-in replacement. They are meant for users who need the latest version of Icecast or HTTPS support when their distribution doesn't provide it.

== Debian and Ubuntu (in all its flavors) ==
It takes four simple commands to install the latest Icecast version on a deb-based distro.

==== Add the repository ====
This expects that you have ''sudo'' installed. If not, open a root shell and run the ''echo'' command directly.

Copy and paste the command for your distribution release and make sure that it's executed as '''one''' line!
{| class="wikitable"
!Distribution Release
!Command
!Repository
!Comments
|-
|Debian 8.0 (jessie)
|<syntaxhighlight lang="bash">sudo sh -c "echo deb http://download.opensuse.org/repositories/multimedia:/xiph/Debian_8.0/ ./ >>/etc/apt/sources.list.d/icecast.list"</syntaxhighlight>
|[http://download.opensuse.org/repositories/multimedia:/xiph/Debian_8.0/ repository]
|
|-
|Debian 7.0 (wheezy)
|<syntaxhighlight lang="bash">sudo sh -c "echo deb http://download.opensuse.org/repositories/multimedia:/xiph/Debian_7.0/ ./ >>/etc/apt/sources.list.d/icecast.list"</syntaxhighlight>
|[http://download.opensuse.org/repositories/multimedia:/xiph/Debian_7.0/ repository]
|
|-
|Debian 6.0 (squeeze)
|<syntaxhighlight lang="bash">sudo sh -c "echo deb http://download.opensuse.org/repositories/multimedia:/xiph/Debian_6.0/ ./ >>/etc/apt/sources.list.d/icecast.list"</syntaxhighlight>
|[http://download.opensuse.org/repositories/multimedia:/xiph/Debian_6.0/ repository]
|
|-
|Ubuntu 14.04 (trusty)
|<syntaxhighlight lang="bash">sudo sh -c "echo deb http://download.opensuse.org/repositories/multimedia:/xiph/xUbuntu_14.04/ ./ >>/etc/apt/sources.list.d/icecast.list"</syntaxhighlight>
|[http://download.opensuse.org/repositories/multimedia:/xiph/xUbuntu_14.04/ repository]
|
|-
|Ubuntu 15.04 (vivid)
|<syntaxhighlight lang="bash">sudo sh -c "echo deb http://download.opensuse.org/repositories/multimedia:/xiph/xUbuntu_15.04/ ./ >>/etc/apt/sources.list.d/icecast.list"</syntaxhighlight>
|[http://download.opensuse.org/repositories/multimedia:/xiph/xUbuntu_15.04/ repository]
|''not'' LTS<ref name="ubuntu-lts">Ubuntu releases that are not Long-Term-Support have a [https://wiki.ubuntu.com/Releases short life cycle]. Xiph.org will stop offering updated packages for those some time after Canonical/the Ubuntu Project end their support.</ref>!
|-
|Ubuntu 15.10 (wily)
|<syntaxhighlight lang="bash">sudo sh -c "echo deb http://download.opensuse.org/repositories/multimedia:/xiph/xUbuntu_15.10/ ./ >>/etc/apt/sources.list.d/icecast.list"</syntaxhighlight>
|[http://download.opensuse.org/repositories/multimedia:/xiph/xUbuntu_15.10/ repository]
|''not'' LTS<ref name="ubuntu-lts"/>!
|-
|Ubuntu 16.04 (xenial)
|<syntaxhighlight lang="bash">sudo sh -c "echo deb http://download.opensuse.org/repositories/multimedia:/xiph/xUbuntu_16.04/ ./ >>/etc/apt/sources.list.d/icecast.list"</syntaxhighlight>
|[http://download.opensuse.org/repositories/multimedia:/xiph/xUbuntu_16.04/ repository]
|LTS<ref name="ubuntu-lts"/>
|-
|Ubuntu 17.04 (zesty)
|<syntaxhighlight lang="bash">sudo sh -c "echo deb http://download.opensuse.org/repositories/multimedia:/xiph/xUbuntu_17.04/ ./ >>/etc/apt/sources.list.d/icecast.list"</syntaxhighlight>
|[http://download.opensuse.org/repositories/multimedia:/xiph/xUbuntu_17.04/ repository]
|''not'' LTS<ref name="ubuntu-lts"/>!
|-
|Ubuntu 17.10 (artful)
|<syntaxhighlight lang="bash">sudo sh -c "echo deb http://download.opensuse.org/repositories/multimedia:/xiph/xUbuntu_17.10/ ./ >>/etc/apt/sources.list.d/icecast.list"</syntaxhighlight>
|[http://download.opensuse.org/repositories/multimedia:/xiph/xUbuntu_17.10/ repository]
|''not'' LTS<ref name="ubuntu-lts"/>!
|-
|source package, any deb distro
|<syntaxhighlight lang="bash">sudo sh -c "echo deb-src http://download.opensuse.org/repositories/multimedia:/xiph/Debian_8.0/ ./ >>/etc/apt/sources.list.d/icecast.list"</syntaxhighlight>
|[http://download.opensuse.org/repositories/multimedia:/xiph/Debian_8.0/ repository]
| actually available for ''all'' repository URLs
|-
|Linux Mint:
|Use the information listed above for the [https://en.wikipedia.org/wiki/List_of_Linux_Mint_releases#Release_history corresponding Ubuntu release].
|}

==== Import the ''Multimedia'' signing key ====
You need to add the [http://icecast.org/multimedia-obs.key openSUSE OBS '''Multimedia''' signing key] as a trusted key to your system.

There are many ways to verify this key, e.g. by a simple web search.

After downloading the key, you can run this command to verify your copy:

<code>gpg multimedia-obs.key</code>

It should yield:

pub rsa2048 2017-11-21 [SC] [expires: 2020-01-30]
<nowiki> 0E313DB7936B4E76E720065B77EC2301F23C6AA3</nowiki>
uid multimedia OBS Project <multimedia@build.opensuse.org>

Then you can add the key to your system's Trusted Keys, using:

<code>sudo apt-key add multimedia-obs.key</code>

The simplest way (but not very secure, since you're not checking the key) is to add the key with a single command line:

<code>wget -qO - http://icecast.org/multimedia-obs.key | sudo apt-key add -</code>

==== Update repository index ====
<code><nowiki>sudo apt-get update</nowiki></code>
==== Install Icecast ====
<code><nowiki>sudo apt-get install icecast2</nowiki></code>

== RedHat and its derivatives ==
{| class="wikitable"
!Distribution Release
!Command
!Repository
!Comments
|-
|EPEL5 (RHEL 5/CentOS 5)
|<syntaxhighlight lang="bash"></syntaxhighlight>
|[http://download.opensuse.org/repositories/multimedia:/xiph/CentOS_5 repository]
|
|-
|EPEL6 (RHEL 6/CentOS 6)
|<syntaxhighlight lang="bash"></syntaxhighlight>
|[http://download.opensuse.org/repositories/multimedia:/xiph/CentOS_6 repository]
|
|-
|EPEL7 (RHEL 7/CentOS 7)
|<syntaxhighlight lang="bash"></syntaxhighlight>
|[http://download.opensuse.org/repositories/multimedia:/xiph/CentOS_7 repository]
|
|-
|Fedora 22
|<syntaxhighlight lang="bash"></syntaxhighlight>
|[http://download.opensuse.org/repositories/multimedia:/xiph/Fedora_22 repository]
|
|-
|Fedora 23
|<syntaxhighlight lang="bash"></syntaxhighlight>
|[http://download.opensuse.org/repositories/multimedia:/xiph/Fedora_23 repository]
|
|-

|}

== openSUSE ==
{| class="wikitable"
!Distribution Release
!Command
!Repository
!Comments
|-
|SUSE Linux Enterprise 11.3
|<syntaxhighlight lang="bash"></syntaxhighlight>
|[http://download.opensuse.org/repositories/multimedia:/xiph/SLE_11_SP3 repository]
|
|-
|SUSE Linux Enterprise 11.4
|<syntaxhighlight lang="bash"></syntaxhighlight>
|[http://download.opensuse.org/repositories/multimedia:/xiph/SLE_11_SP4 repository]
|
|-
|SUSE Linux Enterprise 12
|<syntaxhighlight lang="bash"></syntaxhighlight>
|[http://download.opensuse.org/repositories/multimedia:/xiph/SLE_12 repository]
|
|-
|openSUSE 13.1
|<syntaxhighlight lang="bash"></syntaxhighlight>
|[http://download.opensuse.org/repositories/multimedia:/xiph/openSUSE_13.1 repository]
|
|-
|openSUSE 13.2
|<syntaxhighlight lang="bash"></syntaxhighlight>
|[http://download.opensuse.org/repositories/multimedia:/xiph/openSUSE_13.2 repository]
|
|-
|openSUSE Leap 42.1
|<syntaxhighlight lang="bash"></syntaxhighlight>
|[http://download.opensuse.org/repositories/multimedia:/xiph/openSUSE_Leap_42.1 repository]
|
|-
|openSUSE Tumbleweed
|<syntaxhighlight lang="bash"></syntaxhighlight>
|[http://download.opensuse.org/repositories/multimedia:/xiph/openSUSE_Tumbleweed repository]
|
|-

|}

== Footnotes ==
<references/>
[[Category: Icecast]]

OggKate

2017-11-21T09:32:05Z

MrZeus: /* Frequently Asked Questions */ linkify Pango and Cairo

== Disclaimer ==
This is not a Xiph codec, though it may be embedded in Ogg alongside other Xiph
codecs, such as Vorbis and Theora. As such, please do not assume that Xiph has
anything to do with this, much less responsibility.

== What is Kate? ==

Kate is an overlay codec, originally designed for karaoke and text, that can be
multiplexed in Ogg.

Text and images can be carried and animated by a Kate stream.
Most of the time, they will (optionally) be multiplexed with audio/video to carry subtitles,
song lyrics (with or without karaoke data), etc.

Series of curves (splines, segments, etc) may be attached to various properties
(text position, font size, etc) to create animated overlays. This allows scrolling
or fading text to be defined. This can even be used to draw arbitrary shapes, so
hand drawing can also be represented by a Kate stream.

Example uses of Kate streams are movie subtitles for Theora videos, either text based,
as may be created by [http://www.v2v.cc/~j/ffmpeg2theora ffmpeg2theora], or image
based, such as created by [http://thoggen.net Thoggen] (patching needed), and lyrics,
as created by oggenc, from vorbis-tools.

== Why a new codec? ==

As I was adding support for Theora, Speex and FLAC to some software of mine, I found myself
wanting to have song lyrics accompanying Vorbis audio. Since Vorbis comments are limited to
the headers, one can't add them in the stream as they are sung, so another multiplexed stream
would be needed to carry them.

The three possible bases usable for such a codec I found were Writ, CMML, and OGM/SRT.

*[[OggWrit|Writ]] is an unmaintained start at an implementation of a very basic design, though I did find an encoder/decoder in py-ogg2 later on - I'd have been quicker writing Kate from scratch anyway.
*[[CMML]] is more geared towards encapsulating metadata about an accompanying stream, rather than being a data stream itself, and seemed complex for a simple use, though I have now revised my view on this. Besides, it seems designed for Annodex (which I haven't had a look at), though it does seem relatively generic for use outwith Annodex, and it is being "repurposed" as timed text now, bringing it closer to what I'm doing.
*OGM/SRT, which I only found when I added Kate support to MPlayer, is shoehorning various data formats into an Ogg stream, and just dumps the SRT subtitle format as is, AFAICS (though I haven't looked at this one in detail, since I'd already had a working Kate implementation by that time).

I then decided to roll my own, not least because it's a fun thing to do.

I found other formats, such as USF (designed for inclusion in Matroska) and various subtitle formats,
but none were designed for embedding inside an Ogg container.

== Overview of the Kate bitstream format ==

I've taken much inspiration from Vorbis and Theora here.
Headers and packets (as well as the API design) follow the design of these two codecs.

A rough overview (see [[#Format specification|Format specification]] for more details) is:

Header packets:
*ID header [BOS]: magic, version, granule fraction, encoding, language, etc
*Comment header: Vorbis comments, as per Vorbis/Theora streams
*Region definitions header: a list of predefined regions to be referred to by data packets
*Style definitions header: a list of predefined styles to be referred to by data packets
*Curves definitions header: a list of predefined curves to be referred to by data packets
*Motion definitions header: a list of predefined motions to be referred to by data packets
*Palette definitions header: a list of predefined palettes to be referred to by data packets
*Bitmap definitions header: a list of predefined bitmaps to be referred to by data packets
*Font mapping definitions header: a list of predefined font mappings to be referred to by data packets

Other header packets are ignored, and left for future expansion.

Data packets:
*text data: text/image and optional motions, accompanied by optional overrides for style, region, language, etc
*keepalive: can be emitted at any time to help a demuxer know where we're at, but those packets are optional
*repeats: a verbatim repeat of a text packet's payload, in order to bound any backward seeking needed when starting to play a stream partway through. These are also optional.
*end data [EOS]: marks the end of the stream, it doesn't have any useful payload

Other data packets are ignored, and left for future expansion.

The intent of the "keepalive" packet is to be sent at regular
intervals when no other packet has been emitted for a while. This would be to help seeking code
find a kate page more easily.

Things of note:
*Kate is a discontinuous codec, as defined in [http://www.xiph.org/ogg/doc/ogg-multiplex.html ogg-multiplex.html] in the Ogg documentation, which means it's timed by start granule, not by end granule as Theora and Vorbis are.
* All data packets are on their own page, for two reasons:
**Ogg keeps track of granules at the page level, not the packet level
**if no text event happens for a while after a particular text event, we don't want to delay it so a larger page can be issued

See also [[#Seeking and memory|Problems to solve: Seeking and memory]].

*The granule encoding is not a direct time/granule correspondence, see the [[#Granule encoding|Granule encoding]] section.
*The EOS packet should have a granule position greater than or equal to the end time of all events.
*User code doesn't have to know the number of headers to expect; this is handled inside the library code (as opposed to Vorbis and Theora).
*The format contains hooks so that additional information may be added in future revisions while keeping backward compatibility (old decoders will still parse such streams correctly, ignoring the new information).

== Format specification ==

The Kate bitstream format consists of a number of sequential packets.
Packets can be either header packets or data packets. All header packets
must appear before any data packet.

Header packets must appear in order. Decoding of a data packet is not
possible until all header packets have been decoded.

Each Kate packet starts with a one byte type. A type with the MSB set
(ie, between 0x80 and 0xff) indicates a header packet, while a type with
the MSB cleared (ie, between 0x00 and 0x7f) indicates a data packet.
All header packets then have the Kate magic, from byte offset 1 to byte
offset 7 ("kate\0\0\0"). Note that this applies only to header packets:
data packets do not contain the Kate signature.

Since the ID header must appear first, a Kate stream can be recognized
by comparing the first eight bytes of the first packet with the signature
string "\200kate\0\0\0".
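The two rules above (MSB of the type byte, and the eight byte signature) can be illustrated with a minimal Python sketch. The function names are made up for this example and are not part of libkate; the sketch only looks at raw packet bytes and assumes the packets have already been extracted from the Ogg pages.

```python
# "\200kate\0\0\0": packet type 0x80 followed by the 7 byte Kate magic.
KATE_SIGNATURE = b"\x80kate\x00\x00\x00"


def is_header_packet(packet: bytes) -> bool:
    """A packet whose type byte has the MSB set is a header packet."""
    return len(packet) > 0 and (packet[0] & 0x80) != 0


def is_kate_id_header(packet: bytes) -> bool:
    """The first packet of a Kate stream starts with the 8 byte signature."""
    return packet[:8] == KATE_SIGNATURE


def has_kate_magic(packet: bytes) -> bool:
    """Every header packet carries the magic at byte offsets 1 to 7."""
    return is_header_packet(packet) and packet[1:8] == KATE_SIGNATURE[1:]
```

A demuxer would typically apply the signature test to the first packet of each logical stream, and the MSB test to every following packet to tell headers from data.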

When embedded in Ogg, the first packet in a Kate stream (always packet type 0x80,
the ID header packet) must be placed on a separate page. The corresponding Ogg
packet must be marked as beginning of stream (BOS). All subsequent header packets
must be on one or more pages. Subsequently, each data packet must be on a separate
page.

The last data packet must be the end of stream packet (packet type 0x7f).

When embedded in Ogg, the corresponding Ogg packet must be marked as end of stream (EOS).

As per the Ogg specification, granule positions must be non decreasing
within the stream. Header packets have granule position 0.

Currently existing packet types are:
:headers:
::0x80 ID header (BOS)
::0x81 Vorbis comment header
::0x82 regions list header
::0x83 styles list header
::0x84 curves list header
::0x85 motions list header
::0x86 palettes list header
::0x87 bitmaps list header
::0x88 font ranges and mappings header
:data:
::0x00 text data (including optional motions and overrides)
::0x01 keepalive
::0x02 repeat
::0x7f end packet (EOS)

The format described here is for bitstream version 0.x.
As of 19 December 2008, the latest bitstream version is 0.4.

For more detailed information, refer to the format documentation
in libkate (see URL below in the [[#Downloading|Downloading]] section).

Following is the definition of the ID header (packet type 0x80).
This works out to a 64 byte ID header. This is the header that should be
used to detect a Kate stream within an Ogg stream.

  0                   1                   2                   3   |
  0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1| Byte
 +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
 |   packtype    | Identifier char[7]: 'kate\0\0\0'              | 0-3
 +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
 | kate magic continued                                          | 4-7
 +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
 | reserved - 0  | version major | version minor |  num headers  | 8-11
 +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
 | text encoding | directionality| reserved - 0  | granule shift | 12-15
 +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
 | cw sh |     canvas width      | ch sh |     canvas height     | 16-19
 +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
 | reserved - 0                                                  | 20-23
 +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
 | granule rate numerator                                        | 24-27
 +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
 | granule rate denominator                                      | 28-31
 +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
 | language (NUL terminated)                                     | 32-35
 +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
 | language (continued)                                          | 36-39
 +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
 | language (continued)                                          | 40-43
 +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
 | language (continued)                                          | 44-47
 +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
 | category (NUL terminated)                                     | 48-51
 +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
 | category (continued)                                          | 52-55
 +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
 | category (continued)                                          | 56-59
 +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
 | category (continued)                                          | 60-63
 +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+

The fields cw sh, canvas width, ch sh, and canvas height were introduced
in bitstream 0.3. Earlier bitstreams will have 0 in these fields.

Language and category are NUL terminated ASCII strings.
Language follows RFC 3066, though obviously will not accommodate language tags
with lots of subtags.

Category is currently loosely defined, and I haven't yet found a nice way to
present it in a generic way, but it is meant for automatic classification of
various multiplexed Kate streams (eg, to recognize that some streams are
subtitles (in a set of languages), and some others are commentary (in a
possibly different set of languages), etc).
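As an illustration of the ID header layout, here is a minimal Python sketch that pulls the fields out of a 64 byte packet. It is not libkate code: the names are invented for this example, the 32 bit granule rate fields are assumed to be little-endian (check the libkate format documentation before relying on this), and the four canvas bytes are returned undecoded because the diagram alone does not say how shift and size are packed.

```python
import struct
from typing import NamedTuple

KATE_SIGNATURE = b"\x80kate\x00\x00\x00"


class KateIdHeader(NamedTuple):
    version_major: int
    version_minor: int
    num_headers: int
    text_encoding: int
    directionality: int
    granule_shift: int
    canvas_raw: bytes      # bytes 16-19, left undecoded in this sketch
    gps_numerator: int
    gps_denominator: int
    language: str
    category: str


def _c_string(field: bytes) -> str:
    """Decode a NUL terminated ASCII field."""
    return field.split(b"\x00", 1)[0].decode("ascii")


def parse_id_header(packet: bytes) -> KateIdHeader:
    if len(packet) < 64:
        raise ValueError("ID header must be at least 64 bytes")
    if packet[:8] != KATE_SIGNATURE:
        raise ValueError("not a Kate ID header")
    # Bytes 8-11: reserved, version major, version minor, num headers.
    _, major, minor, num_headers = struct.unpack_from("4B", packet, 8)
    # Bytes 12-15: text encoding, directionality, reserved, granule shift.
    encoding, direction, _, shift = struct.unpack_from("4B", packet, 12)
    # Bytes 24-31: granule rate; little-endian is an assumption here.
    gps_num, gps_den = struct.unpack_from("<2I", packet, 24)
    return KateIdHeader(
        version_major=major,
        version_minor=minor,
        num_headers=num_headers,
        text_encoding=encoding,
        directionality=direction,
        granule_shift=shift,
        canvas_raw=packet[16:20],
        gps_numerator=gps_num,
        gps_denominator=gps_den,
        language=_c_string(packet[32:48]),
        category=_c_string(packet[48:64]),
    )
```

Real applications should use kate_decode_headerin from libkate instead; the sketch is only meant to make the byte offsets in the diagram concrete.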

== API overview ==

libkate offers an API very similar to that of libvorbis and libtheora, as well as
an extra higher level decoding API.

Here's an overview of the three main modules:

=== Decoding ===

Decoding is done in a way similar to libvorbis. First, initialize a kate_info and a
kate_comment structure. Then, read headers by calling kate_decode_headerin. Once
all headers have been read, a kate_state is initialized for decoding using kate_decode_init,
and kate_decode_packetin is called repeatedly with data packets. Events (eg, text) can be
retrieved via kate_decode_eventout.

=== Encoding ===

Encoding is also done in a way similar to libvorbis. First initialize a kate_info
and a kate_comment structure, and fill them out as needed. kate_encode_headers will
create ogg packets from those. Then, kate_encode_text is called repeatedly for all
the text events to add. When done, calling kate_encode_finish will create an end of
stream packet.

=== High level decoding API ===

There are only 3 calls here:

kate_high_decode_init
kate_high_decode_packetin
kate_high_decode_clear

Here, all Ogg packets are sent to kate_high_decode_packetin, which does the right
thing (header/data classification, decoding, and event retrieval). Note that you
do not get access to the comments directly using this, but you do get access to the
kate_info via events.

The libkate distribution includes commented examples for each of those.

Additionally, libkate includes a layer (liboggkate) to make it easier to use when
embedded in Ogg. While the normal API uses kate_packet structures, liboggkate uses
ogg_packet structures.

The High level decoding API does not have an Ogg specific layer, but functions exist
to wrap a kate_packet around a memory buffer (such as the one ogg_packet uses, for instance).

== Support ==

Among the software with Kate support:
*VLC
*ffmpeg2theora
*liboggz
*liboggplay
*Cortado (wikimedia version)
*vorbis-tools

I have patches for the following with Kate support:
*MPlayer
*xine
*GStreamer
*Thoggen
*Audacious
*and more...

These may be found in the libkate source distribution (see [[#Downloading|Downloading]]
for links).

In addition, libtiger is a rendering library for Kate streams using Pango and Cairo,
though it is not quite yet API stable (though no major changes are expected).

== Granule encoding ==

=== Ogg ===

Ogg leaves the encoding of granules up to a particular codec, only
mandating that granules be non decreasing with time.

The Kate bitstream format uses a linear mapping between time and
granule, described here.

A Kate granule position is composed of two different parts:
- a base granule, in the high bits
- a granule offset, in the low bits

+----------------+----------------+
| base | offset |
+----------------+----------------+

The number of bits these parts occupy is variable, and each stream
may choose how many bits to dedicate to each. The kate_info structure
for a stream holds that information in the granule_shift field,
so each part may be reconstructed from a granulepos.

The timestamp T of a given Kate packet is split into a base B and
offset O, and these are stored in the granulepos of that packet.
The split is done such that the B is the time of the earliest event
still active at the time, and the O is the time elapsed between B
and T. Thus, T = B + O. This mimics the way Theora stores its own
timestamps in granulepos, where the base acts as a keyframe, and
an offset acts as the position of an inter frame from the previous
keyframe. Since Kate allows time overlapping events, however, the
choice of the base to use is slightly more complex, as it may not
be the starting time of the previous event, if the stream contains
time overlapping events.

The kate_info structure for a stream holds a rational fraction
representing the time span of granule units for both the base and
the offset parts.

The granule rate is defined by the two fields:

kate_info::gps_numerator
kate_info::gps_denominator

The number of bits reserved for the offset is defined by the field:

kate_info::granule_shift
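The base/offset split and the conversion to a time can be sketched in a few lines of Python. The function names are invented for this example; only the three kate_info fields named above come from the format. The granule rate is taken to be gps_numerator / gps_denominator granules per second, so a granule count is multiplied by the denominator and divided by the numerator to get seconds.

```python
from fractions import Fraction


def make_granulepos(base: int, offset: int, granule_shift: int) -> int:
    """Pack a base (high bits) and an offset (low bits) into a granulepos."""
    if offset < 0 or offset >> granule_shift:
        raise ValueError("offset does not fit in granule_shift bits")
    return (base << granule_shift) | offset


def split_granulepos(granulepos: int, granule_shift: int) -> tuple:
    """Recover (base, offset) from a granulepos."""
    base = granulepos >> granule_shift
    offset = granulepos & ((1 << granule_shift) - 1)
    return base, offset


def granule_time(granulepos: int, granule_shift: int,
                 gps_numerator: int, gps_denominator: int) -> Fraction:
    """Timestamp T = B + O, converted from granule units to seconds."""
    base, offset = split_granulepos(granulepos, granule_shift)
    return Fraction((base + offset) * gps_denominator, gps_numerator)
```

For instance, with a granule rate of 1000/1 and a shift of 32, a packet whose earliest still active event started at granule 5000 and which itself starts 250 granules later has T = 5.25 seconds.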

=== Generic timing ===

Kate data packets (data packet type 0) include timing information (start time,
end time, and time of the earliest event still active). All these are stored as
64 bit at the rate defined by the granule rate, so they do not suffer from the
granule_shift space limitation.

This also allows for Kate streams to be stored in other containers.

== Motion ==

The Kate bitstream format includes motion definitions, originally for karaoke purposes, but
which can be used for more general purposes, such as line based drawing, or animation of
the text (position, color, etc).

Motions are defined by the means of a series of curves (static points, segments, splines (catmull-rom, bezier, and b-splines)).
A 2D point can be obtained from a motion for any timestamp during the lifetime of a text.
This can be used for moving a marker in 2D above the text for karaoke, or to use the x
coordinate to color text when the motion position passes each letter or word, etc.
Motions have attached semantics so the client code knows how to use a particular motion.
Predefined semantics include text color, text position, etc.

Since a motion can be composed of an arbitrary number of curves, each of which may have
an arbitrary number of control points, complex motions can be achieved. If the motion is
the main object of an event, it is even possible to have an empty text, and use the motion
as a virtual pencil to draw arbitrary shapes. Even on-the-fly handwriting subtitles could
be done this way, though this would require a lot of control points, and would not be able
to be used with text-to-speech.

As a proof of concept, I also have a "draw chat" program where two people can draw, and
the shapes are turned to b-splines and sent as a kate motion to be displayed on the other
person's window.

It is also possible for motions to be discontinuous - simply insert a curve of 'none' type.
While the timestamp lies within such a curve, no 2D point will be generated. This can be
used to temporarily hide a marker, for instance.

It is worth mentioning that pauses in the motion can be trivially included by inserting
at the right time and for the right duration a simple linear interpolation curve with only
two equal points, equal to the position the motion is supposed to pause at.

Kate defines a set of predefined mappings so that each decoder user interprets a motion in
the same way. A mapping is coded on 8 bits in the bitstream, and the first 128 are reserved
for Kate, leaving 128 for application specific mappings, to avoid constraining creative uses
of that feature. Predefined mappings include frame (eg, 0-1 points are mapped to the size of
the current video frame), or region, to scale 0-1 to the current region. This allows curves
to be defined without knowing in advance the pixel size of the area it should cover.
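To make the idea of a mapping concrete, here is a small Python sketch that scales a 0-1 point to pixels. The function name and the (origin, size) representation are assumptions made for this example, as is the idea that a region mapping is offset by the region's origin; libkate's own tracker code is the reference for the exact behaviour.

```python
def map_point(point, origin, size):
    """Scale a normalized (0-1) 2D point to an area given by origin and size."""
    x, y = point
    origin_x, origin_y = origin
    width, height = size
    return (origin_x + x * width, origin_y + y * height)


# "frame" style mapping: the area is the whole video frame.
frame_point = map_point((0.5, 0.25), origin=(0, 0), size=(640, 480))

# "region" style mapping: the area is the current region only.
region_point = map_point((0.5, 0.5), origin=(100, 400), size=(440, 60))
```

The same curve therefore lands at different pixel positions depending on the mapping, which is what lets it be authored without knowing the final frame or region size.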

For uses which require more than two coordinates (eg, text color, where 4 (RGBA) values are
needed), Kate predefines the semantics text_color_rg and text_color_ba, so a 4D point can be
obtained using two different motions.

There are higher level constructs, such as morphing between two styles, or predefined
karaoke effects. More are planned to be added in the future.

See also [[#Trackers|Trackers]].

== Trackers ==

Since attaching motions to text position, etc, makes it hard for the client to keep track of
everything, doing interpolation, etc, the library supplies a tracker object, which handles the
interpolation of the relevant properties.
Once initialized with a text and a set of motions, the client code can give the tracker a new
timestamp, and get back the current text position, text color, etc.

Using a tracker is not necessary, if one wants to use the motions directly, or just ignore them,
but it makes life easier, especially when considering that the order in which motions are applied
does matter (to be defined formally, but the current source code is informative at this point).

== The Kate file format ==

Though this is not a feature of the bitstream format, I have created a text file format to
describe a series of events to be turned into a Kate bitstream.
At its minimum, the following is a valid input to the encoder:

: kate {
:: event { 00:00:05 --> 00:00:10 "This is a text" }
: }

This will create a simple stream with "This is a text" emitted at an offset of 5 seconds into
the track, lasting 5 seconds to an end time at 10 seconds.
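Since the minimal form is so regular, such a description can be generated from a list of timed strings. The following Python sketch only emits the exact syntax shown in the example above (whole-second HH:MM:SS timestamps, one text per event); the helper names are invented, and escaping rules for quotes inside the text are not covered, so the sketch refuses such input rather than guess.

```python
def format_timestamp(seconds: int) -> str:
    """Format a whole number of seconds as HH:MM:SS."""
    hours, remainder = divmod(seconds, 3600)
    minutes, secs = divmod(remainder, 60)
    return f"{hours:02d}:{minutes:02d}:{secs:02d}"


def kate_description(events) -> str:
    """Build a minimal Kate description from (start, end, text) tuples."""
    lines = ["kate {"]
    for start, end, text in events:
        if '"' in text:
            raise ValueError("quoting rules are not covered by this sketch")
        lines.append(
            f'  event {{ {format_timestamp(start)} --> '
            f'{format_timestamp(end)} "{text}" }}'
        )
    lines.append("}")
    return "\n".join(lines)


description = kate_description([(5, 10, "This is a text")])
```

The resulting string can be written to a file and passed to the encoder in the same way as a hand-written description.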

Motions, regions, styles can be declared in a definitions block to be reused by events, or can
be defined inline. Defining those in the definitions block places them in a header so they can
be reused later, saving space. However, they can also be defined in each event, so they will be
sent with the event. This allows them to be generated on the fly (eg, if the bitstream is being
streamed from a realtime input).

For convenience, the Kate file format also allows C style macros, though without parameters.

Please note that the Kate file format is fully separate from the Kate bitstream format. The
difference between the two is similar to the difference between a C source file and the resulting
object file, when compiled.

Note that the format is not based on XML for a very parochial reason: I tend to dislike very
much editing XML by hand, as it's really hard to read. XML is really meant for machines to parse
generically text data in a shared syntax but with possibly unknown semantics, and I need those
text representations to be editable easily.

This also implies that there could be an XML representation of a Kate stream, which would be
useful if one were to make an editor that worked on a higher level than the current all-text
representation, and it is something that might very well happen in the future, in parallel with
the current format.

== Karaoke ==

Karaoke effects rely on motions, and there will be predefined higher level ways of specifying
timings and effects, two of which are already done.

As an example, this is a valid Karaoke script:

:kate {
:: simple_timed_glyph_style_morph {
::: from style "start_style" to style "end_style"
::: "Let " at 1.0
::: "us " at 1.2
::: "sing " at 1.4
::: "to" at 2.0
::: "ge" at 2.5
::: "ther" at 3.0
:: }
:}

The syllables will change from a style to another as time passes. The definition of the start_style
and end_style styles is omitted for brevity.

== Problems to solve ==

There are a few things to solve before the Kate bitstream format can be considered good
enough to be frozen:

Note: the following is mostly solved, and the bitstream is now stable, and has been
backward and forward compatible since the first released version. This will be updated
when I get some time.

=== Seeking and memory ===

When seeking to a particular time in a movie with subtitles, we may end up at a point where a subtitle has been started, but has not been removed yet. Pure streaming doesn't have this problem as it remembers the subtitle being issued (as opposed to, say, Vorbis, for which all data valid now is decoded from the last packet). With Kate, a text string valid now may have been issued long ago.

I see three possible ways to solve this:
*each data packet includes the granule of the earliest still active packet (if none, this will be the granule of this very packet)
**this means seeks are two phased: first seek, find the next Kate packet, and seek again if the granule of the earlier still active packet is less than the original seeked granule. This implies support code on players to do the double seek.

*use "reference frames", a bit like Theora does, where the granule position is split in several fields: the higher bits represent a position for the reference frame, and the lowest bits a delta time to the current position. When seeking to a granule position, the lower bits are cleared off, yielding the granule position of the previous reference frame, so the seek ends up at the reference frame. The reference frame is a sync point where any active strings are issued again. This is a variant of the method described in the Writ wiki page, but the granule splitting avoids any "downtime".
**this requires reissuing packets, and it doesn't feel right (and wastes space).
**it also requires "dummy" decoding of Kate data from the reference frame to the actual seek point to fully refresh the state "memory".

*A variant of the two-granules-in-one system used by libcmml, where the "back link" points to the earliest still active string, rather than the previous one (this allows a two phase seek, rather than a multiphase seek, hopping back from event to event, with no real way to know if there is or not a previous event which is still active - I suppose CMML has no need to know this, if their "clips" do not overlap - mine can do).
**Such a system considerably shortens the usable granule space, though it can do a one phase seek, if I understand the system correctly, of which I am not certain.
*** Well, it seems it can't do a one phase seek anyway.

*Additionally, it could be possible to emit simple "keepalive" packets at regular intervals to help a seek algorithm to sync up to the stream without needing too much data reading - this helps for discontinuous streams where there could be no pages for a while if no data is needed at that time.

=== Text encoding ===

A header field declares the text encoding used in the stream. At the moment, only UTF-8 is
supported, for simplicity. There are no plans to support other encodings, such as UTF-16,
at the moment.

Note that strings included in the header (language, category) are not affected by that
text encoding (rather obviously for language itself). These are ASCII.

The actual text in events may include simple HTML-like markup (at the moment, allowed markup
is the same as the one Pango uses, but more markup types may be defined in the future).
It is also possible to ask libkate to remove this markup if the client prefers to receive
plain text without the markup.

=== Language encoding ===

A header field defines the language (if any) used in the stream (this can be overridden in a
data packet, but this is not relevant to this point). At the moment, my test code uses
ISO 639-1 two letter codes, but I originally thought to use RFC 3066 tags. However, matching
a language to a user selection may be simpler for user code if the language encoding is kept
simple. At the moment, I tend to favor allowing both two letter tags (eg, "en") and secondary
tags (like "en_EN"), as RFC 3066 tags can be quite complex, but I welcome comments on this.

If a stream contains more than one language, there usually is a predominant language, which
can be set as the default language for the stream. Each event can then have a language
override. If there is no predominant language, and it is not possible to split the stream
into multiple substreams, each with its own language, then it is possible to use the "mul"
language tag, as a last resort.

=== Bitstream format for floating point values ===

Floating point values are turned to a 16.16 fixed point format, then stored in a bitpacked
format, storing the number of zero bits at the head and tail of the floating point values once
per stream, and the remainder bits for all values in the stream. This seems to yield good results
(typically a 50% reduction over 32 bits raw writes, and 70% over the snprintf based storage), and
has the big advantage of being portable (eg, independent of any IEEE format).
However, this means reduced precision due to the quantization to 16.16. I may add support for
variable precision (eg, 8.24 fixed point formats) to alleviate this. This would however mean less
space savings, though these are likely to be insignificant when Kate streams are interleaved with
a video.
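The idea can be sketched in Python as follows. This is only an illustration of the principle (quantize to 16.16, then find how many leading and trailing zero bits all values share); it does not reproduce the exact bit layout written by libkate, and both the function names and the choice to work on magnitudes, with signs assumed to be stored separately, are assumptions of this example.

```python
def to_fixed_16_16(value: float) -> int:
    """Quantize a float to 16.16 fixed point (16 integer, 16 fractional bits)."""
    fixed = round(value * 65536)
    if not -(1 << 31) <= fixed < (1 << 31):
        raise ValueError("value out of range for 16.16 fixed point")
    return fixed


def shared_zero_bits(fixed_values) -> tuple:
    """Return (head, tail): zero bits common to all 32 bit magnitudes."""
    merged = 0
    for value in fixed_values:
        merged |= abs(value)
    if merged == 0:
        return 32, 0
    head = 32 - merged.bit_length()
    tail = (merged & -merged).bit_length() - 1
    return head, tail


def payload_bits(fixed_values) -> int:
    """Bits left to store per value once shared zero bits are dropped."""
    head, tail = shared_zero_bits(fixed_values)
    return 32 - head - tail


fixed = [to_fixed_16_16(v) for v in (0.5, 1.0, 2.5)]
```

In this example the three values share 14 leading and 15 trailing zero bits, so only 3 bits per value remain, which shows where the space savings come from when values are small or have short fractional parts.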

*Though this is not a Kate issue per se, the motion feature is very difficult to use without a curve editor. While tools may be coded to create a Kate bitstream for various existing subtitle formats, it is not certain it will be easy to find a good authoring tool for a series of curves. That said, it's not exactly difficult to do if you know a widget set.

=== Higher dimensional curves/motions ===

It is quite annoying to have to create two motions to control a color change, due to curves
being restricted to two dimensions. I may add support for arbitrary dimensions. It would also
help for 1D motions, like changing the time flow, where one coordinate is simply ignored at
the moment.
Alternatively, changes could be made to the Kate file format to hide the two dimensionality and
allow simpler specification of non-2 dimensional motions, but still map them to 2D in the kate
bitstream format.

=== Category definition ===

The category field in the BOS packet is a 16 byte text field (15 really, as it is zero terminated
in the bitstream itself). Its goal is to provide the reader with a short description of what kind
of information the stream contains, eg subtitles, lyrics, etc. This would be displayed to the user,
possibly to allow to choose to turn some streams on and off.

Since this category is meant primarily for a machine to parse, they will be kept to ASCII. When
a player recognizes a category, it is free to replace its name with one in the user's language if
it prefers. Even in English, the "lyrics" category could be displayed by a player as "Lyrics".

Since this is a free text field rather than an enumeration, it would be good to have a list of
common predefined category names that Kate streams can use.

This is a list of proposed predefined categories, feedback/additions welcome:

* subtitles - the usual movie subtitles, as text
* spu-subtitles - movie subtitles in DVD style paletted images
* lyrics - song lyrics

Please remember the 15 character limit if proposing other categories.

Note that the list of categories is subject to change, and will likely
be replaced by new, more "identifier like" ones. The three ones above,
however, would be kept for backward compatibility as they're already used.

== Text to speech ==

One of the goals of the Kate bitstream format is that text data can be easily parsed
by the user of the decoder, so any additional information, such as style, placement,
karaoke data, etc, should be able to be stripped to leave only the bare text. This is
in view of allowing text-to-speech software to use Kate bitstreams as a bandwidth-cheap
way of conveying speech data, and could also allow things like e-books which can be
either read or listened to from the same bitstream (I have seen no reference to this
being used anywhere, but I see no reason why the granule progression should be temporal,
and not user controlled, such as by using a "next" button which would bump a granule
position by a preset amount, simulating turning a page (this would be close to necessary
for text-to-speech, as the wall time duration of the spoken speech is not known in
advance to the Kate encoder, and can't be mapped to a time based granule progression)).
All text strings triggered consecutively between the two granule positions would then
be read in order.

== Possible additions ==

=== Embedded binary data ===

Images and font mappings can be included within a Kate stream.

==== Images ====

Though this could be misused to interfere with the ability to render as text-to-speech, Kate
can use images as well as text. The same caveat as for fonts applies with regard to data
duplication.

Complex images might however be best left to a multiplexed OggSpots or OggMNG stream, unless the
images mesh with the text (eg, graphical exclamation points, custom fonts, (see next
paragraph), etc).

There is support for simple paletted bitmap images, with a variable length palette of up
to 256 colors (in fact, sized in powers of 2 up to 256) and matching pixel data in as
many bits per pixel as can address the palette. Palettes and images are stored separately,
so can be used with one another with no fixed assignment.
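As a small illustration of the relation between palette size and pixel depth described above, here is a hedged Python sketch. The function name is hypothetical, and rejecting palettes with fewer than 2 colors is an assumption, since the text does not state a lower bound.

```python
def bits_per_pixel(palette_size: int) -> int:
    """Bits needed to address every entry of a power of 2 sized palette."""
    # Assumption: the smallest useful palette has 2 colors.
    if not 2 <= palette_size <= 256 or palette_size & (palette_size - 1):
        raise ValueError("palette size must be a power of 2 between 2 and 256")
    # For a power of 2, bit_length() - 1 is the base 2 logarithm.
    return palette_size.bit_length() - 1
```

A 16 color palette thus needs 4 bits per pixel, and a full 256 color palette needs 8.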

Palettes and bitmaps are put in two separate headers for later use by reference, but can
also be placed in data packets, as with motions, etc, if they are not going to be reused.

PNG bitmaps can also be embedded in a Kate stream. These do not have associated palettes
(but the PNGs themselves may or may not be paletted). There is no support for decoding PNG
images in libkate itself, so a program will have to use libpng (or similar code) to decode
the PNG image. For instance, the libtiger rendering library uses Cairo to decode and render
PNG images in Kate streams.

This can be used to have custom fonts, so that raw text is still available if the stream
creator wants a custom look.

I expect that the need for more than 256 colors in a bitmap, or non palette bitmap data,
would be best handled by another codec, eg OggMNG or OggSpots. The goal of images in a
Kate stream is to mesh the images with the text, not to have large images by themselves.

On the other hand, interesting Karaoke effects could be achieved by having MNG images
instead of simple paletted bitmaps in a Kate stream. Comments would be most welcome on
whether this is going too far, however.

I am also investigating SVG images. These allow for very small footprint images for simple
vector drawings, and could be very useful for things like background gradients below text.

A possible solution to the duplication issue is to have another stream in the container
stream, which would hold the shared data (eg, fonts), which the user program could load,
and which could then be used by any Kate (and other) stream. Typically, this type of stream
would be a degenerate stream with only header packets (so it is fully processed before any
other stream presents data packets that might make use of that shared data), and all payload
such as fonts being contained within the headers. Thinking about it, it has parallels with
the way Vorbis stores its codebooks within a header packet, or even the way Kate stores the
list of styles within a header packet.

==== Fonts ====

Custom fonts are merely a set of ranges mapping unicode code points to bitmaps. As this implies,
fonts are bitmap fonts, not vector fonts, so scaling, if supported by the rendering client,
may not look as good as with a vector font.

A style may also refer to a font name to use (eg, "Tahoma"). These fonts may or may not be
available on the playing system, however, since the font data is not included in the stream,
just referenced by name. For this reason, it is best to keep to widely known fonts.

== Reference encoder/decoder ==

An encoder (kateenc) and a decoder (katedec) are included in the tools directory.
The encoder supports input from several different formats:
* a custom text based file format (see [[#The Kate file format|The Kate file format]]), which is by no means meant to be part of the Kate bitstream specification itself
* SubRip (.srt), the most common subtitle format I found
* LRC lyrics format.

As an example for the widely used SRT subtitles format, the following command line
creates a Kate subtitles stream from an SRT file:

kateenc -l en -c subtitles -t srt -o subtitles.ogg subtitles.srt

The reverse is possible, to recover an SRT file from a Kate stream, with katedec.

Note that the subtitles.ogg file should then be multiplexed into the A/V stream,
using either ogg-tools or oggz-tools.

The Kate bitstreams encoded and decoded by those tools are (supposed to be) correct for this
specification, provided their input is correct.

== Next steps ==

=== Continuations ===

Continuations are a way to add to existing events, and are mostly meant for motions. When streaming
in real time, what motions may be applied to events may not be known in advance (for instance, in a
draw chat program where two programs exchange Kate streams, the drawing motions are only known as
they are drawn). Continuations will allow an event to be extended in time, and motions to be appended
to it. This is only useful for streaming, as when stored in a file, everything is already known in
advance.

=== A rendering library ===

This will allow easier integration in other packages (movie players, etc).
I have started working on an implementation using Cairo and Pango, though I'm still at the early stages.
I might add support for embedding vector fonts in a Kate stream if I was going that way. Still need to think about this.
Another point of note is that when this library is available, it would make it easier to add
capabilities such as rotation, scaling, etc, to the bitstream, since this would not cause too
much work for playing programs using the rendering library. It is expected that these additions
would stay backward compatible (eg, an old player would ignore this information but still correctly
decode the information they can work with from a newly encoded stream).

=== An XML representation ===

While I purposefully did not write Kate description files in XML due to me finding editing XML such
a chore, it would be nice to be able to losslessly convert between the more user friendly representation
and an XML document, so one can do what one does with XML documents, like transformations.

And after all, some people might prefer editing the XML version.

=== Packaging ===

It would be really nice to have packages for libkate/libtiger for many distros.

If you're a packager for a distro which doesn't yet have packages for libkate
or libtiger, please consider helping :)

In particular, packages for Debian would be grand.

== Matroska mapping ==

The codec ID is "S_KATE".

As for Theora and Vorbis, Kate headers are stored in the private data as xiph-laced packets:

Byte 0: number of packets present, minus 1 (there must be at least one packet) - let this number be NP
Bytes 1..n: lengths of the first NP packets, coded in xiph style lacing
Bytes n+1..end: the data packets themselves concatenated one after the other

Note that the length of the last packet isn't encoded, it is deduced from the sizes of the other
packets and the total size of the private data.

This mapping is similar to the Vorbis and Theora mappings, with the caveat that one should not
expect a set number of headers.
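The private data layout above can be unpacked as in the following Python sketch. It is an illustration rather than code from libkate or any Matroska library, and the function name is made up. It relies on the usual Xiph lacing rule, where a length is the sum of a run of bytes that ends with the first byte below 255.

```python
from typing import List


def split_xiph_laced(private: bytes) -> List[bytes]:
    """Split Matroska private data into the header packets it carries."""
    if not private:
        raise ValueError("empty private data")
    num_packets = private[0] + 1  # byte 0 stores the packet count minus 1
    pos = 1
    sizes = []
    # Only the first NP packets have an explicit, laced length.
    for _ in range(num_packets - 1):
        size = 0
        while True:
            if pos >= len(private):
                raise ValueError("truncated lacing data")
            byte = private[pos]
            pos += 1
            size += byte
            if byte != 255:  # a byte below 255 ends this length
                break
        sizes.append(size)
    # The last packet takes whatever is left.
    last = len(private) - pos - sum(sizes)
    if last < 0:
        raise ValueError("laced sizes exceed the private data size")
    sizes.append(last)
    packets = []
    for size in sizes:
        packets.append(private[pos:pos + size])
        pos += size
    return packets
```

Since the number of Kate headers is not fixed, a caller should take the packet count from the returned list rather than expect three packets as with Vorbis.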

== Downloading ==

libkate encodes and decodes Kate streams, and is API and ABI stable.

The libkate source distribution is available at [http://libkate.googlecode.com/ http://libkate.googlecode.com/].

A public git repository is available at [http://git.xiph.org/?p=users/oggk/kate.git;a=summary http://git.xiph.org/?p=users/oggk/kate.git;a=summary].

libtiger renders Kate streams using Pango and Cairo, and is alpha, with API changes still possible.

The libtiger source distribution is available at [http://libtiger.googlecode.com/ http://libtiger.googlecode.com/].

A public git repository is available at [http://git.xiph.org/?p=users/oggk/tiger.git;a=summary http://git.xiph.org/?p=users/oggk/tiger.git;a=summary].

== HOWTOs ==

These paragraphs describe a few ways to use Kate streams:

=== Text movie subtitles ===

Kate streams can carry Unicode text (that is, text that can represent
pretty much any existing language/script). If several Kate streams are
multiplexed along with a video, subtitles in various languages can be
made for that movie.

An easy way to create such subtitles is to use ffmpeg2theora, which
can create Kate streams from SubRip (.srt) format files, a simple but
common text subtitles format. ffmpeg2theora 0.21 or later is needed.

At its simplest:

ffmpeg2theora -o video-with-subtitles.ogg --subtitles subtitles.srt \
video-without-subtitles.avi

Several languages may be created and tagged with their language code
for easy selection in a media player:

ffmpeg2theora -o video-with-subtitles.ogg video-without-subtitles.avi \
--subtitles japanese-subtitles.srt --subtitles-language ja \
--subtitles welsh-subtitles.srt --subtitles-language cy \
--subtitles english-subtitles.srt --subtitles-language en_GB

Alternatively, kateenc (which comes with the libkate distribution) can
create Kate streams from SubRip files as well. These can then be merged
with a video with oggz-tools:

kateenc -t srt -c SUB -l it -o subtitles.ogg italian-subtitles.srt
oggz merge -o movie-with-subtitles.ogg movie-without-subtitles.ogg subtitles.ogg

This second method can also be used to add subtitles to a video which
is already encoded to Theora, as it will not transcode the video again.

=== DVD subtitles ===

DVD subtitles are not text, but images. Thoggen, a DVD ripper program,
can convert these subtitles to Kate streams (at the time of writing,
Thoggen and GStreamer have not applied the necessary patches for this
to be possible out of the box, so patching them will be required).

When configuring how to rip DVD tracks, any subtitles will be detected
by Thoggen, and selecting them in the GUI will cause them to be saved as
Kate tracks along with the movie.

=== Song lyrics ===

Kate streams carrying song lyrics can be embedded in an Ogg file. The
oggenc Vorbis encoding tool from the Xiph.Org Vorbis tools allows lyrics
to be loaded from a LRC or SRT text file and converted to a Kate stream
multiplexed with the resulting Vorbis audio. At the time of writing,
the patch to oggenc was not applied yet, so it will have to be patched
manually with the patch found in the diffs directory.

oggenc -o song-with-lyrics.ogg --lyrics lyrics.lrc --lyrics-language en_US song.wav

So called 'enhanced LRC' files (containing extra karaoke timing information)
are supported, and a simple karaoke color change scheme will be saved
out for these files. For more complex karaoke effects (such as more
complex style changes, or sprite animation), kateenc should be used with
a Kate description file to create a separate Kate stream, which can then
be merged with a Vorbis only song with oggz-tools:

oggenc -o song.ogg song.wav
kateenc -t kate -c LRC -l en_US -o lyrics.ogg lyrics-with-karaoke.kate
oggz merge -o song-with-karaoke.ogg lyrics.ogg song.ogg

This latter method may also be used if you already have an encoded Vorbis song
with no lyrics, and just want to add the lyrics without reencoding.

=== Metadata ===

Metadata can be attached to events, or to styles, bitmaps, regions, etc.
Metadata are free form tag/value pairs, and can be used to enrich their
attached data with extra information. However, how this information is
interpreted is up to the application layer.

It is worth noting that an event may not have attached text, so it is
possible to create an empty timed event with attached metadata.

For instance, let's say we have a documentary, with footage from various
places, as well as short interviews, and we want two things:
* tag footage with metadata about the location and date that footage was shot
* subtitle the interviews and tag those subtitles with information about the speaker

You can then create an empty Kate event for each footage part, synchronized
with the footage, and attach a new metadata item called GEO_LOCATION, filled
with latitude and longitude of the place the footage was shot at.
Similarly, for each subtitle event, a metadata item called SPEAKER can be
attached.

An empty event to tag a 4:20 long piece of footage shot in Tokyo on 2011/08/12, and
inserted at 18:30 in the documentary could look like:

event {
00:18:30,000 --> 00:22:50,000
meta "GEO_LOCATION" = "35.42; 139.42"
meta "DATE" = "2011-08-12"
}

Here's an example for a line spoken by Dr Joe Bloggs at 18:30 into the documentary:

event {
00:18:30,000 --> 00:18:32,000
"Notice how the subtitles for my words have metadata attached to them"
meta "SPEAKER" = "Dr Joe Bloggs"
meta "URL" = "http://www.example.com/biography?name=Joe+Bloggs"
}

Notice how another metadata item, URL, is also present. The application
will have to be aware of those metadata items in order to do something with
them though. Since they are free form, it is up to you to think of what
metadata you want, and make use of it.
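Event blocks like the two shown above are easy to generate from a script. The following Python sketch only reproduces the syntax visible in those examples; the helper names are hypothetical, and since this page does not show how to escape a double quote inside a string, the sketch simply rejects such values.

```python
def format_timestamp(ms: int) -> str:
    """Format milliseconds as HH:MM:SS,mmm, as used in the examples above."""
    hours, rest = divmod(ms, 3600000)
    minutes, rest = divmod(rest, 60000)
    seconds, millis = divmod(rest, 1000)
    return "%02d:%02d:%02d,%03d" % (hours, minutes, seconds, millis)


def metadata_event(start_ms, duration_ms, metadata, text=None):
    """Build an event block; metadata is a list of (tag, value) pairs."""
    end_ms = start_ms + duration_ms
    lines = ["event {",
             "%s --> %s" % (format_timestamp(start_ms), format_timestamp(end_ms))]
    strings = [] if text is None else [text]
    for tag, value in metadata:
        strings.extend((tag, value))
    # Assumption: escaping rules are unknown here, so refuse embedded quotes.
    if any('"' in s for s in strings):
        raise ValueError("double quotes are not handled by this sketch")
    if text is not None:
        lines.append('"%s"' % text)
    for tag, value in metadata:
        lines.append('meta "%s" = "%s"' % (tag, value))
    lines.append("}")
    return "\n".join(lines)
```

For the Tokyo footage, the start is 18:30 (1110000 ms) and the duration is 4:20 (260000 ms), which gives the end time of 00:22:50,000 used in the example.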

Note that metadata may be attached to other objects, such as regions.
This way, you can for example create a region tagged with a name, and
track a person's movements with that region. Or you can tag a bitmap
with a copyright and a URL to a larger version of the image.

=== Changing a Kate stream embedded in an Ogg stream ===

If you need to change a Kate stream already embedded in an Ogg stream (eg, you have a movie with subtitles, and you want to fix a spelling mistake, or want to bring one of the subtitles forward in time, etc), you can do this easily with KateDJ, a tool that will extract Kate streams, decode them to a temporary location, and rebuild the original stream after you've made whatever changes you want.

KateDJ (included with the libkate distribution) is a GUI program using wxPython, a Python module for the wxWidgets GUI library, and the oggz tools (both need to be installed separately if they are not already present).

The procedure consists of:

* Run KateDJ
* Click 'Load Ogg stream' and select the file to load
* Click 'Demux file' to decode Kate streams in a temporary location
* Edit the Kate streams (a message box tells you where they are placed)
* When done, click 'Remux file from parts'
* If any errors are reported, continue editing until the remux step succeeds

== Frequently Asked Questions ==

=== Does libkate work on other platforms than Linux ? ===

Yes, libkate is not Linux specific in any way. It optionally relies on libogg
and libpng, two libraries widely ported to various platforms.
It has been reported to work on Windows and MacOS X as well as UNIX platforms.

However, libtiger, a rendering library for Kate streams, relies on [http://www.pango.org/ Pango] and [https://www.cairographics.org/ Cairo],
which are not easy to build on Windows, though they can be.
The Tiger renderer is however completely separate from libkate, and is not needed
for full encoding and decoding of Kate streams.

=== Where can I find some example files ? ===

The libkate distribution can generate various examples, but already built files
can be found here:
[http://people.xiph.org/~oggk/elephants_dream/elephantsdream-with-subtitles.ogg]
[http://stallman.org/fry/Stephen_Fry-Happy_Birthday_GNU-nq_600px_425kbit.ogv]

These files use raw text only.

[[Category:Ogg Mappings]]

OggKate

2017-11-21T09:15:06Z

MrZeus: trim trrailing spaces and surplus blank lines

== Disclaimer ==
This is not a Xiph codec, though it may be embedded in Ogg alongside other Xiph
codecs, such as Vorbis and Theora. As such, please do not assume that Xiph has
anything to do with this, much less responsibility.

== What is Kate? ==

Kate is an overlay codec, originally designed for karaoke and text, that can be
multiplexed in Ogg.

Text and images can be carried and animated by a Kate stream.
Most of the time, they will (optionally) be multiplexed with audio/video to carry subtitles,
song lyrics (with or without karaoke data), etc.

Series of curves (splines, segments, etc) may be attached to various properties
(text position, font size, etc) to create animated overlays. This allows scrolling
or fading text to be defined. This can even be used to draw arbitrary shapes, so
hand drawing can also be represented by a Kate stream.

Example uses of Kate streams are movie subtitles for Theora videos, either text based,
as may be created by [http://www.v2v.cc/~j/ffmpeg2theora ffmpeg2theora], or image
based, such as created by [http://thoggen.net Thoggen] (patching needed), and lyrics,
as created by oggenc, from vorbis-tools.

== Why a new codec? ==

As I was adding support for Theora, Speex and FLAC to some software of mine, I found myself
wanting to have song lyrics accompanying Vorbis audio. Since Vorbis comments are limited to
the headers, one can't add them in the stream as they are sung, so another multiplexed stream
would be needed to carry them.

The three possible bases usable for such a codec I found were Writ, CMML, and OGM/SRT.

*[[OggWrit|Writ]] is an unmaintained start at an implementation of a very basic design, though I did find an encoder/decoder in py-ogg2 later on - I'd have been quicker writing Kate from scratch anyway.
*[[CMML]] is more geared towards encapsulating metadata about an accompanying stream, rather than being a data stream itself, and seemed complex for a simple use, though I have now revised my view on this - besides, it seems designed for Annodex (which I haven't had a look at), though it does seem relatively generic for use outwith Annodex - though it is being "repurposed" as timed text now, bringing it closer to what I'm doing
*OGM/SRT, which I only found when I added Kate support to MPlayer, is shoehorning various data formats into an Ogg stream, and just dumps the SRT subtitle format as is, AFAICS (though I haven't looked at this one in detail, since I'd already had a working Kate implementation by that time)

I then decided to roll my own, not least because it's a fun thing to do.

I found other formats, such as USF (designed for inclusion in Matroska) and various subtitle formats,
but none were designed for embedding inside an Ogg container.

== Overview of the Kate bitstream format ==

I've taken much inspiration from Vorbis and Theora here.
Headers and packets (as well as the API design) follow the design of these two codecs.

A rough overview (see [[#Format specification|Format specification]] for more details) is:

Header packets:
*ID header [BOS]: magic, version, granule fraction, encoding, language, etc
*Comment header: Vorbis comments, as per Vorbis/Theora streams
*Region definitions header: a list of predefined regions to be referred to by data packets
*Style definitions header: a list of predefined styles to be referred to by data packets
*Curves definitions header: a list of predefined curves to be referred to by data packets
*Motion definitions header: a list of predefined motions to be referred to by data packets
*Palette definitions header: a list of predefined palettes to be referred to by data packets
*Bitmap definitions header: a list of predefined bitmaps to be referred to by data packets
*Font mapping definitions header: a list of predefined font mappings to be referred to by data packets

Other header packets are ignored, and left for future expansion.

Data packets:
*text data: text/image and optional motions, accompanied by optional overrides for style, region, language, etc
*keepalive: can be emitted at any time to help a demuxer know where we're at, but those packets are optional
*repeats: a verbatim repeat of a text packet's payload, in order to bound any backward seeking needed when starting to play a stream partway through. These are also optional.
*end data [EOS]: marks the end of the stream, it doesn't have any useful payload

Other data packets are ignored, and left for future expansion.

The intent of the "keepalive" packet is to be sent at regular
intervals when no other packet has been emitted for a while. This would be to help seeking code
find a Kate page more easily.

Things of note:
*Kate is a discontinuous codec, as defined in [http://www.xiph.org/ogg/doc/ogg-multiplex.html ogg-multiplex.html] in the Ogg documentation, which means it's timed by start granule, not end granule (as Theora and Vorbis are).
* All data packets are on their own page, for two reasons:
**Ogg keeps track of granules at the page level, not the packet level
**if no text event happens for a while after a particular text event, we don't want to delay it so a larger page can be issued

See also [[#Seeking and memory|Problems to solve: Seeking and memory]].

*The granule encoding is not a direct time/granule correspondence, see the granule encoding section.
*The EOS packet should have a granule pos higher or equal to the end time of all events.
*User code doesn't have to know the number of headers to expect, this is moved inside the library code (as opposed to Vorbis and Theora).
*The format contains hooks so that additional information may be added in future revisions while keeping backward compatibility (though old decoders will correctly parse, but ignore the new information).

== Format specification ==

The Kate bitstream format consists of a number of sequential packets.
Packets can be either header packets or data packets. All header packets
must appear before any data packet.

Header packets must appear in order. Decoding of a data packet is not
possible until all header packets have been decoded.

Each Kate packet starts with a one byte type. A type with the MSB set
(ie, between 0x80 and 0xff) indicates a header packet, while a type with
the MSB cleared (ie, between 0x00 and 0x7f) indicates a data packet.
All header packets then have the Kate magic, from byte offset 1 to byte
offset 7 ("kate\0\0\0"). Note that this applies only to header packets:
data packets do not contain the Kate signature.

Since the ID header must appear first, a Kate stream can be recognized
by comparing the first eight bytes of the first packet with the signature
string "\200kate\0\0\0".
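The two checks described above (MSB of the type byte, and the eight byte signature) can be sketched in Python as follows. The function names are chosen for this illustration only and do not come from libkate.

```python
KATE_MAGIC = b"kate\0\0\0"   # bytes 1 to 7 of every header packet
ID_HEADER_TYPE = 0x80        # packet type of the ID header


def is_header_packet(packet: bytes) -> bool:
    """A packet whose type byte has the MSB set is a header packet."""
    return len(packet) > 0 and (packet[0] & 0x80) != 0


def has_kate_magic(packet: bytes) -> bool:
    """Only header packets carry the magic; data packets do not."""
    return is_header_packet(packet) and packet[1:8] == KATE_MAGIC


def is_kate_id_header(packet: bytes) -> bool:
    """Compare the first eight bytes with the signature "\\200kate\\0\\0\\0"."""
    return packet[:8] == bytes([ID_HEADER_TYPE]) + KATE_MAGIC
```

A demuxer would typically run <code>is_kate_id_header</code> on the first packet of each logical stream to decide whether that stream is a Kate stream.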

When embedded in Ogg, the first packet in a Kate stream (always packet type 0x80,
the ID header packet) must be placed on a separate page. The corresponding Ogg
packet must be marked as beginning of stream (BOS). All subsequent header packets
must be on one or more pages. Subsequently, each data packet must be on a separate
page.

The last data packet must be the end of stream packet (packet type 0x7f).

When embedded in Ogg, the corresponding Ogg packet must be marked as end of stream (EOS).

As per the Ogg specification, granule positions must be non decreasing
within the stream. Header packets have granule position 0.

Currently existing packet types are:
:headers:
::0x80 ID header (BOS)
::0x81 Vorbis comment header
::0x82 regions list header
::0x83 styles list header
::0x84 curves list header
::0x85 motions list header
::0x86 palettes list header
::0x87 bitmaps list header
::0x88 font ranges and mappings header
:data:
::0x00 text data (including optional motions and overrides)
::0x01 keepalive
::0x02 repeat
::0x7f end packet (EOS)

The format described here is for bitstream version 0.x.
As of 19 December 2008, the latest bitstream version is 0.4.

For more detailed information, refer to the format documentation
in libkate (see URL below in the [[#Downloading|Downloading]] section).

Following is the definition of the ID header (packet type 0x80).
This works out to a 64 byte ID header. This is the header that should be
used to detect a Kate stream within an Ogg stream.

0 1 2 3 |
0 1 2 3
0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1| Byte
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
| packtype | Identifier char[7]: 'kate\0\0\0' | 0-3
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
| kate magic continued | 4-7
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
| reserved - 0 | version major | version minor | num headers | 8-11
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
| text encoding | directionality| reserved - 0 | granule shift | 12-15
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
| cw sh | canvas width | ch sh | canvas height | 16-19
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
| reserved - 0 | 20-23
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
| granule rate numerator | 24-27
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
| granule rate denominator | 28-31
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
| language (NUL terminated) | 32-35
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
| language (continued) | 36-39
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
| language (continued) | 40-43
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
| language (continued) | 44-47
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
| category (NUL terminated) | 48-51
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
| category (continued) | 52-55
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
| category (continued) | 56-59
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
| category (continued) | 60-63
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+

The fields cw sh, canvas width, ch sh, and canvas height were introduced
in bitstream 0.3. Earlier bitstreams will have 0 in these fields.

Language and category are NUL terminated ASCII strings.
Language follows RFC 3066, though obviously will not accommodate language tags
with lots of subtags.

Category is currently loosely defined, and I haven't yet found a nice way to
present it in a generic way, but it is meant for automatic classifying of
various multiplexed Kate streams (eg, to recognize that some streams are
subtitles (in a set of languages), and some others are commentary (in a
possibly different set of languages), etc).
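To make the byte offsets in the ID header layout concrete, here is a hedged Python sketch that unpacks the 64 byte packet with the standard <code>struct</code> module. It is not libkate code and the names are made up. Two assumptions are worth stressing: the two 32 bit granule rate fields are read as little endian, which this page does not state, and the four canvas bytes are returned raw, because the bit layout of the shift and size subfields is not described here.

```python
import struct

# packtype, magic, 8 single byte fields, canvas, reserved, rate num/den, language, category
ID_HEADER = struct.Struct("<B7s8B4s4sII16s16s")  # "<" = little endian (assumption)


def _cstring(raw: bytes) -> str:
    """Decode a NUL terminated ASCII field."""
    return raw.split(b"\0", 1)[0].decode("ascii")


def parse_id_header(packet: bytes) -> dict:
    """Unpack the fixed size fields of a Kate ID header (packet type 0x80)."""
    if len(packet) < ID_HEADER.size:
        raise ValueError("ID header is shorter than 64 bytes")
    (packtype, magic, _reserved0, version_major, version_minor, num_headers,
     text_encoding, directionality, _reserved1, granule_shift,
     canvas_raw, _reserved2, gps_numerator, gps_denominator,
     language, category) = ID_HEADER.unpack_from(packet)
    if packtype != 0x80 or magic != b"kate\0\0\0":
        raise ValueError("not a Kate ID header")
    return {
        "version": (version_major, version_minor),
        "num_headers": num_headers,
        "text_encoding": text_encoding,
        "directionality": directionality,
        "granule_shift": granule_shift,
        "canvas_raw": canvas_raw,  # left undecoded, see the note above
        "gps_numerator": gps_numerator,
        "gps_denominator": gps_denominator,
        "language": _cstring(language),
        "category": _cstring(category),
    }
```

The <code>num_headers</code> value is what lets library code, rather than user code, know how many header packets to expect.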

== API overview ==

libkate offers an API very similar to that of libvorbis and libtheora, as well as
an extra higher level decoding API.

Here's an overview of the three main modules:

=== Decoding ===

Decoding is done in a way similar to libvorbis. First, initialize a kate_info and a
kate_comment structure. Then, read headers by calling kate_decode_headerin. Once
all headers have been read, a kate_state is initialized for decoding using kate_decode_init,
and kate_decode_packetin is called repeatedly with data packets. Events (eg, text) can be
retrieved via kate_decode_eventout.

=== Encoding ===

Encoding is also done in a way similar to libvorbis. First initialize a kate_info
and a kate_comment structure, and fill them out as needed. kate_encode_headers will
create ogg packets from those. Then, kate_encode_text is called repeatedly for all
the text events to add. When done, calling kate_encode_finish will create an end of
stream packet.

=== High level decoding API ===

There are only 3 calls here:

kate_high_decode_init
kate_high_decode_packetin
kate_high_decode_clear

Here, all Ogg packets are sent to kate_high_decode_packetin, which does the right
thing (header/data classification, decoding, and event retrieval). Note that you
do not get access to the comments directly using this, but you do get access to the
kate_info via events.

The libkate distribution includes commented examples for each of those.

Additionally, libkate includes a layer (liboggkate) to make it easier to use when
embedded in Ogg. While the normal API uses kate_packet structures, liboggkate uses
ogg_packet structures.

The High level decoding API does not have an Ogg specific layer, but functions exist
to wrap a kate_packet around a memory buffer (such as the one ogg_packet uses, for instance).

== Support ==

Among the software with Kate support:
*VLC
*ffmpeg2theora
*liboggz
*liboggplay
*Cortado (wikimedia version)
*vorbis-tools

I have patches for the following with Kate support:
*MPlayer
*xine
*GStreamer
*Thoggen
*Audacious
*and more...

These may be found in the libkate source distribution (see [[#Downloading|Downloading]]
for links).

In addition, libtiger is a rendering library for Kate streams using Pango and Cairo,
though it is not quite yet API stable (though no major changes are expected).

== Granule encoding ==

=== Ogg ===

Ogg leaves the encoding of granules up to a particular codec, only
mandating that granules be non decreasing with time.

The Kate bitstream format uses a linear mapping between time and
granule, described here.

A Kate granule position is composed of two different parts:
- a base granule, in the high bits
- a granule offset, in the low bits

+----------------+----------------+
| base | offset |
+----------------+----------------+

The number of bits these parts occupy is variable, and each stream
may choose how many bits to dedicate to each. The kate_info structure
for a stream holds that information in the granule_shift field,
so each part may be reconstructed from a granulepos.

The timestamp T of a given Kate packet is split into a base B and
offset O, and these are stored in the granulepos of that packet.
The split is done such that the B is the time of the earliest event
still active at the time, and the O is the time elapsed between B
and T. Thus, T = B + O. This mimics the way Theora stores its own
timestamps in granulepos, where the base acts as a keyframe, and
an offset acts as the position of an intra frame from the previous
keyframe. Since Kate allows time overlapping events, however, the
choice of the base to use is slightly more complex, as it may not
be the starting time of the previous event, if the stream contains
time overlapping events.

The kate_info structure for a stream holds a rational fraction
representing the time span of granule units for both the base and
the offset parts.

The granule rate is defined by the two fields:

kate_info::gps_numerator
kate_info::gps_denominator

The number of bits reserved for the offset is defined by the field:

kate_info::granule_shift
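The base/offset split and the relation T = B + O can be sketched in Python as follows. This is an illustration rather than libkate code, the function names are made up, and it assumes the granule rate is expressed as granules per second (numerator over denominator), which is how the field names read.

```python
from fractions import Fraction


def make_granulepos(base: int, offset: int, granule_shift: int) -> int:
    """Pack a base (high bits) and an offset (low bits) into a granulepos."""
    if offset < 0 or offset >> granule_shift:
        raise ValueError("offset does not fit in granule_shift bits")
    return (base << granule_shift) | offset


def split_granulepos(granulepos: int, granule_shift: int):
    """Recover (base, offset) from a granulepos."""
    base = granulepos >> granule_shift
    offset = granulepos & ((1 << granule_shift) - 1)
    return base, offset


def granulepos_to_seconds(granulepos, granule_shift, gps_numerator, gps_denominator):
    """Timestamp T = B + O, converted from granule units to seconds."""
    base, offset = split_granulepos(granulepos, granule_shift)
    # Assumption: gps_numerator / gps_denominator is granules per second.
    return Fraction((base + offset) * gps_denominator, gps_numerator)
```

For instance, with a granule rate of 1000/1 and a shift of 32, an event starting at 20 s while an earlier event from 18.5 s is still active would be stored with base 18500 and offset 1500.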

=== Generic timing ===

Kate data packets (data packet type 0) include timing information (start time,
end time, and time of the earliest event still active). All these are stored as
64 bit values at the rate defined by the granule rate, so they do not suffer from the
granule_shift space limitation.

This also allows for Kate streams to be stored in other containers.

== Motion ==

The Kate bitstream format includes motion definition, originally for karaoke purposes, but
which can be used for more general purposes, such as line based drawing, or animation of
the text (position, color, etc).

Motions are defined by the means of a series of curves (static points, segments, splines (catmull-rom, bezier, and b-splines)).
A 2D point can be obtained from a motion for any timestamp during the lifetime of a text.
This can be used for moving a marker in 2D above the text for karaoke, or to use the x
coordinate to color text when the motion position passes each letter or word, etc.
Motions have attached semantics so the client code knows how to use a particular motion.
Predefined semantics include text color, text position, etc.

Since a motion can be composed of an arbitrary number of curves, each of which may have
an arbitrary number of control points, complex motions can be achieved. If the motion is
the main object of an event, it is even possible to have an empty text, and use the motion
as a virtual pencil to draw arbitrary shapes. Even on-the-fly handwriting subtitles could
be done this way, though this would require a lot of control points, and would not be able
to be used with text-to-speech.

As a proof of concept, I also have a "draw chat" program where two people can draw, and
the shapes are turned to b-splines and sent as a kate motion to be displayed on the other
person's window.

It is also possible for motions to be discontinuous - simply insert a curve of 'none' type.
While the timestamp lies within such a curve, no 2D point will be generated. This can be
used to temporarily hide a marker, for instance.

It is worth mentioning that pauses in the motion can be trivially included by inserting
at the right time and for the right duration a simple linear interpolation curve with only
two equal points, equal to the position the motion is supposed to pause at.
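
The discontinuity and pause techniques above can be sketched as follows. This is an illustration only: the (kind, duration, points) tuples and the function names are hypothetical, only 'none' and linear curves are handled, and each curve is assumed to carry its own duration.

```python
def evaluate_motion(curves, t):
    """curves: list of (kind, duration, points). Return (x, y), or None if no point is generated."""
    start = 0.0
    for kind, duration, points in curves:
        if start <= t < start + duration:
            if kind == "none":
                return None  # discontinuity: no 2D point while inside this curve
            if kind == "linear":
                return _lerp_polyline(points, (t - start) / duration)
            raise ValueError("curve type not handled in this sketch: " + kind)
        start += duration
    return None  # outside the lifetime of the motion

def _lerp_polyline(points, u):
    """Linear interpolation along equally weighted segments, for u in [0, 1)."""
    segments = len(points) - 1
    pos = u * segments
    i = min(int(pos), segments - 1)
    f = pos - i
    (x0, y0), (x1, y1) = points[i], points[i + 1]
    return (x0 + (x1 - x0) * f, y0 + (y1 - y0) * f)

motion = [
    ("linear", 2.0, [(0.0, 0.0), (1.0, 0.0)]),  # move right for 2 seconds
    ("linear", 1.0, [(1.0, 0.0), (1.0, 0.0)]),  # pause: two equal points
    ("none", 1.0, []),                          # marker hidden for 1 second
    ("linear", 2.0, [(1.0, 0.0), (1.0, 1.0)]),  # move down for 2 seconds
]
```

Evaluating this motion at 2.5 seconds yields the paused position (1.0, 0.0), and at 3.5 seconds yields no point at all.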

Kate defines a set of predefined mappings so that each decoder user interprets a motion in
the same way. A mapping is coded on 8 bits in the bitstream, and the first 128 are reserved
for Kate, leaving 128 for application specific mappings, to avoid constraining creative uses
of that feature. Predefined mappings include frame (eg, 0-1 points are mapped to the size of
the current video frame), or region, to scale 0-1 to the current region. This allows curves
to be defined without knowing in advance the pixel size of the area they should cover.
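
A minimal sketch of how a client might apply the frame and region mappings to a normalized point. The function name, the string mapping identifiers and the (x, y, width, height) region tuple are hypothetical, and offsetting by the region origin is an assumption rather than something stated here.

```python
def map_point(point, mapping, frame_size, region):
    """Scale a normalized (0-1) point to pixels.

    mapping: "none", "frame" or "region" (hypothetical names for this sketch)
    frame_size: (width, height) of the video frame
    region: (x, y, width, height) of the current region, in pixels
    """
    x, y = point
    if mapping == "none":
        return (x, y)
    if mapping == "frame":
        width, height = frame_size
        return (x * width, y * height)
    if mapping == "region":
        rx, ry, rwidth, rheight = region
        # Assumption: the result is offset by the region origin.
        return (rx + x * rwidth, ry + y * rheight)
    raise ValueError("unknown mapping: " + mapping)
```

The same curve can then be reused unchanged for a 640x480 video and a 1920x1080 one, since only the mapping step depends on the pixel size.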

For uses which require more than two coordinates (eg, text color, where 4 (RGBA) values are
needed), Kate predefines the semantics text_color_rg and text_color_ba, so a 4D point can be
obtained using two different motions.

There are higher level constructs, such as morphing between two styles, or predefined
karaoke effects. More are planned to be added in the future.

See also [[#Trackers|Trackers]].

== Trackers ==

Since attaching motions to text position, etc, makes it hard for the client to keep track of
everything, doing interpolation, etc, the library supplies a tracker object, which handles the
interpolation of the relevant properties.
Once initialized with a text and a set of motions, the client code can give the tracker a new
timestamp, and get back the current text position, text color, etc.

Using a tracker is not necessary if one wants to use the motions directly, or just ignore them,
but it makes life easier, especially when considering that the order in which motions are applied
does matter (to be defined formally, but the current source code is informative at this point).

== The Kate file format ==

Though this is not a feature of the bitstream format, I have created a text file format to
describe a series of events to be turned into a Kate bitstream.
At its minimum, the following is a valid input to the encoder:

: kate {
:: event { 00:00:05 --> 00:00:10 "This is a text" }
: }

This will create a simple stream with "This is a text" emitted at an offset of 5 seconds into
the track, lasting 5 seconds to an end time at 10 seconds.

Motions, regions, styles can be declared in a definitions block to be reused by events, or can
be defined inline. Defining those in the definitions block places them in a header so they can
be reused later, saving space. However, they can also be defined in each event, so they will be
sent with the event. This allows them to be generated on the fly (eg, if the bitstream is being
streamed from a realtime input).

For convenience, the Kate file format also allows C style macros, though without parameters.

Please note that the Kate file format is fully separate from the Kate bitstream format. The
difference between the two is similar to the difference between a C source file and the resulting
object file, when compiled.

Note that the format is not based on XML for a very parochial reason: I very much dislike
editing XML by hand, as it's really hard to read. XML is really meant for machines to
generically parse text data in a shared syntax but with possibly unknown semantics, and I need those
text representations to be easily editable.

This also implies that there could be an XML representation of a Kate stream, which would be
useful if one were to make an editor that worked on a higher level than the current all-text
representation, and it is something that might very well happen in the future, in parallel with
the current format.

== Karaoke ==

Karaoke effects rely on motions, and there will be predefined higher level ways of specifying
timings and effects, two of which are already done.

As an example, this is a valid Karaoke script:

:kate {
:: simple_timed_glyph_style_morph {
::: from style "start_style" to style "end_style"
::: "Let " at 1.0
::: "us " at 1.2
::: "sing " at 1.4
::: "to" at 2.0
::: "ge" at 2.5
::: "ther" at 3.0
:: }
:}

The syllables will change from one style to another as time passes. The definition of the start_style
and end_style styles is omitted for brevity.

== Problems to solve ==

There are a few things to solve before the Kate bitstream format can be considered good
enough to be frozen:

Note: the following is mostly solved, and the bitstream is now stable, and has been
backward and forward compatible since the first released version. This will be updated
when I get some time.

=== Seeking and memory ===

When seeking to a particular time in a movie with subtitles, we may end up at a place where a subtitle has been started, but has not been removed yet. Pure streaming doesn't have this problem as it remembers the subtitle being issued (as opposed to, say, Vorbis, for which all data valid now is decoded from the last packet). With Kate, a text string valid now may have been issued long ago.

I see three possible ways to solve this:
*each data packet includes the granule of the earliest still active packet (if none, this will be the granule of this very packet)
**this means seeks are two phased: first seek, find the next Kate packet, and seek again if the granule of the earliest still active packet is less than the granule originally sought. This implies support code in players to do the double seek.

*use "reference frames", a bit like Theora does, where the granule position is split in several fields: the higher bits represent a position for the reference frame, and the lowest bits a delta time to the current position. When seeking to a granule position, the lower bits are cleared off, yielding the granule position of the previous reference frame, so the seek ends up at the reference frame. The reference frame is a sync point where any active strings are issued again. This is a variant of the method described in the Writ wiki page, but the granule splitting avoids any "downtime".
**this requires reissuing packets, and it doesn't feel right (and wastes space).
**it also requires "dummy" decoding of Kate data from the reference frame to the actual seek point to fully refresh the state "memory".

*A variant of the two-granules-in-one system used by libcmml, where the "back link" points to the earliest still active string, rather than the previous one (this allows a two phase seek, rather than a multiphase seek, hopping back from event to event, with no real way to know if there is or not a previous event which is still active - I suppose CMML has no need to know this, if their "clips" do not overlap - mine can do).
**Such a system considerably shortens the usable granule space, though it can do a one phase seek, if I understand the system correctly, of which I am not certain.
*** Well, it seems it can't do a one phase seek anyway.

*Additionally, it could be possible to emit simple "keepalive" packets at regular intervals to help a seek algorithm to sync up to the stream without needing too much data reading - this helps for discontinuous streams where there could be no pages for a while if no data is needed at that time.
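
The two phase seek from the first option can be sketched as below. This models a stream as an in-memory list of (granule, earliest still active granule) pairs, which is an assumption made for illustration; a real player would seek within Ogg pages instead, and the function name is hypothetical.

```python
import bisect

def two_phase_seek(packets, target):
    """packets: list of (granule, earliest_active_granule), sorted by granule.

    Return the index of the packet from which decoding should start.
    """
    granules = [granule for granule, _ in packets]
    # Phase 1: find the next Kate packet at or after the target.
    i = bisect.bisect_left(granules, target)
    if i == len(packets):
        return i  # past the last packet: nothing left to decode
    _, earliest_active = packets[i]
    # Phase 2: seek again if something issued before the target is still active.
    if earliest_active < target:
        return bisect.bisect_left(granules, earliest_active)
    return i

packets = [(0, 0), (10, 0), (20, 20), (30, 20), (50, 50)]
```

Seeking to granule 25 in this example first lands on the packet at granule 30, whose back link points to granule 20, so decoding starts from the packet at granule 20.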

=== Text encoding ===

A header field declares the text encoding used in the stream. At the moment, only UTF-8 is
supported, for simplicity. There are no plans to support other encodings, such as UTF-16,
at the moment.

Note that strings included in the header (language, category) are not affected by that
text encoding (rather obviously for language itself). These are ASCII.

The actual text in events may include simple HTML-like markup (at the moment, allowed markup
is the same as the one Pango uses, but more markup types may be defined in the future).
It is also possible to ask libkate to remove this markup if the client prefers to receive
plain text without the markup.

=== Language encoding ===

A header field defines the language (if any) used in the stream (this can be overridden in a
data packet, but this is not relevant to this point). At the moment, my test code uses
ISO 639-1 two letter codes, but I originally thought to use RFC 3066 tags. However, matching
a language to a user selection may be simpler for user code if the language encoding is kept
simple. At the moment, I tend to favor allowing both two letter tags (eg, "en") and secondary
tags (like "en_EN"), as RFC 3066 tags can be quite complex, but I welcome comments on this.

If a stream contains more than one language, there usually is a predominant language, which
can be set as the default language for the stream. Each event can then have a language
override. If there is no predominant language, and it is not possible to split the stream
into multiple substreams, each with its own language, then it is possible to use the "mul"
language tag, as a last resort.

=== Bitstream format for floating point values ===

Floating point values are turned to a 16.16 fixed point format, then stored in a bitpacked
format, storing the number of zero bits at the head and tail of the floating point values once
per stream, and the remainder bits for all values in the stream. This seems to yield good results
(typically a 50% reduction over 32 bits raw writes, and 70% over the snprintf based storage), and
has the big advantage of being portable (eg, independent of any IEEE format).
However, this means reduced precision due to the quantization to 16.16. I may add support for
variable precision (eg, 8.24 fixed point formats) to alleviate this. This would however mean less
space savings, though these are likely to be insignificant when Kate streams are interleaved with
a video.
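
The following sketch illustrates one possible reading of this scheme: quantize to 16.16, find the zero bits common to the head and tail of every value, and keep only the bits in between. It is not the libkate implementation; the function names are hypothetical, and negative values are simply stored as two's complement here, which leaves no zero bits at the head.

```python
def to_fixed_16_16(value):
    """Quantize a float to 16.16 fixed point, as an unsigned 32-bit two's complement word."""
    return int(round(value * 65536)) & 0xFFFFFFFF

def from_fixed_16_16(word):
    """Convert a 32-bit two's complement 16.16 word back to a float."""
    if word & 0x80000000:
        word -= 1 << 32
    return word / 65536

def pack(values):
    """Return (head, tail, payload): common zero bit counts, and the remaining bits per value."""
    words = [to_fixed_16_16(v) for v in values]
    merged = 0
    for word in words:
        merged |= word  # a bit is zero in 'merged' only if it is zero in every value
    if merged == 0:
        return 32, 0, [0] * len(words)
    head = 32 - merged.bit_length()             # zero bits shared at the top
    tail = (merged & -merged).bit_length() - 1  # zero bits shared at the bottom
    return head, tail, [word >> tail for word in words]

def unpack(head, tail, payload):
    """Rebuild the floats; each payload entry needs 32 - head - tail bits of storage."""
    return [from_fixed_16_16(p << tail) for p in payload]
```

For the values 0.5, 1.25 and 3.0, this gives 14 zero bits at the head and 14 at the tail, so each value needs only 4 bits instead of 32.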

*Though this is not a Kate issue per se, the motion feature is very difficult to use without a curve editor. While tools may be coded to create a Kate bitstream for various existing subtitle formats, it is not certain it will be easy to find a good authoring tool for a series of curves. That said, it's not exactly difficult to do if you know a widget set.

=== Higher dimensional curves/motions ===

It is quite annoying to have to create two motions to control a color change, due to curves
being restricted to two dimensions. I may add support for arbitrary dimensions. It would also
help for 1D motions, like changing the time flow, where one coordinate is simply ignored at
the moment.
Alternatively, changes could be made to the Kate file format to hide the two dimensionality and
allow simpler specification of non-2 dimensional motions, but still map them to 2D in the kate
bitstream format.

=== Category definition ===

The category field in the BOS packet is a 16 byte text field (15 really, as it is zero terminated
in the bitstream itself). Its goal is to provide the reader with a short description of what kind
of information the stream contains, eg subtitles, lyrics, etc. This would be displayed to the user,
possibly to allow to choose to turn some streams on and off.

Since categories are meant primarily for a machine to parse, they will be kept to ASCII. When
a player recognizes a category, it is free to replace its name with one in the user's language if
it prefers. Even in English, the "lyrics" category could be displayed by a player as "Lyrics".

Since this is a free text field rather than an enumeration, it would be good to have a list of
common predefined category names that Kate streams can use.

This is a list of proposed predefined categories, feedback/additions welcome:

* subtitles - the usual movie subtitles, as text
* spu-subtitles - movie subtitles in DVD style paletted images
* lyrics - song lyrics

Please remember the 15 character limit if proposing other categories.

Note that the list of categories is subject to change, and will likely
be replaced by new, more "identifier like" ones. The three above,
however, would be kept for backward compatibility as they're already used.
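
As a small illustration of the constraints above (ASCII only, at most 15 characters plus a terminating zero in a 16 byte field), a proposed category name could be checked like this. The helper names are hypothetical.

```python
def is_valid_category(name):
    """True if the name is ASCII and fits in 15 bytes, leaving room for the terminating zero."""
    try:
        raw = name.encode("ascii")
    except UnicodeEncodeError:
        return False
    return len(raw) <= 15 and b"\0" not in raw

def category_field(name):
    """Return the 16 byte, zero padded field as it might be laid out in the BOS packet."""
    if not is_valid_category(name):
        raise ValueError("category must be ASCII and at most 15 characters")
    return name.encode("ascii").ljust(16, b"\0")
```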

== Text to speech ==

One of the goals of the Kate bitstream format is that text data can be easily parsed
by the user of the decoder, so any additional information, such as style, placement,
karaoke data, etc, should be able to be stripped to leave only the bare text. This is
in view of allowing text-to-speech software to use Kate bitstreams as a bandwidth-cheap
way of conveying speech data, and could also allow things like e-books which can be
either read or listened to from the same bitstream (I have seen no reference to this
being used anywhere, but I see no reason why the granule progression should be temporal,
and not user controlled, such as by using a "next" button which would bump a granule
position by a preset amount, simulating turning a page (this would be close to necessary
for text-to-speech, as the wall time duration of the spoken speech is not known in
advance to the Kate encoder, and can't be mapped to a time based granule progression)).
All text strings triggered consecutively between the two granule positions would then
be read in order.

== Possible additions ==

=== Embedded binary data ===

Images and font mappings can be included within a Kate stream.

==== Images ====

Though this could be misused to interfere with the ability to render as text-to-speech, Kate
can use images as well as text. The same caveat as for fonts applies with regard to data
duplication.

Complex images might however be best left to a multiplexed OggSpots or OggMNG stream, unless the
images mesh with the text (eg, graphical exclamation points, custom fonts, (see next
paragraph), etc).

There is support for simple paletted bitmap images, with a variable length palette of up
to 256 colors (in fact, sized in powers of 2 up to 256) and matching pixel data in as
many bits per pixel as can address the palette. Palettes and images are stored separately,
so can be used with one another with no fixed assignment.
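
The relation between palette size and pixel depth can be sketched as follows. The function names are hypothetical, and the assumption that pixels are packed back to back with no per-row padding is mine, not something stated here.

```python
def bits_per_pixel(ncolors):
    """Smallest number of bits per pixel able to address a palette of ncolors entries."""
    if not 1 <= ncolors <= 256:
        raise ValueError("palettes hold between 1 and 256 colors")
    return max(1, (ncolors - 1).bit_length())

def pixel_data_bits(width, height, ncolors):
    """Total bits of pixel data, assuming pixels are packed with no padding."""
    return width * height * bits_per_pixel(ncolors)
```

A 16 color palette therefore needs 4 bits per pixel, and a full 256 color palette needs 8.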

Palettes and bitmaps are put in two separate headers for later use by reference, but can
also be placed in data packets, as with motions, etc, if they are not going to be reused.

PNG bitmaps can also be embedded in a Kate stream. These do not have associated palettes
(but the PNGs themselves may or may not be paletted). There is no support for decoding PNG
images in libkate itself, so a program will have to use libpng (or similar code) to decode
the PNG image. For instance, the libtiger rendering library uses Cairo to decode and render
PNG images in Kate streams.

This can be used to have custom fonts, so that raw text is still available if the stream
creator wants a custom look.

I expect that the need for more than 256 colors in a bitmap, or non palette bitmap data,
would be best handled by another codec, eg OggMNG or OggSpots. The goal of images in a
Kate stream is to mesh the images with the text, not to have large images by themselves.

On the other hand, interesting Karaoke effects could be achieved by having MNG images
instead of simple paletted bitmaps in a Kate stream. Comments would be most welcome on
whether this is going too far, however.

I am also investigating SVG images. These allow for very small footprint images for simple
vector drawings, and could be very useful for things like background gradients below text.

A possible solution to the duplication issue is to have another stream in the container
stream, which would hold the shared data (eg, fonts), which the user program could load,
and which could then be used by any Kate (and other) stream. Typically, this type of stream
would be a degenerate stream with only header packets (so it is fully processed before any
other stream presents data packets that might make use of that shared data), and all payload
such as fonts being contained within the headers. Thinking about it, it has parallels with
the way Vorbis stores its codebooks within a header packet, or even the way Kate stores the
list of styles within a header packet.

==== Fonts ====

Custom fonts are merely a set of ranges mapping Unicode code points to bitmaps. As this implies,
fonts are bitmap fonts, not vector fonts, so scaling, if supported by the rendering client,
may not look as good as with a vector font.

A style may also refer to a font name to use (eg, "Tahoma"). These fonts may or may not be
available on the playing system, however, since the font data is not included in the stream,
just referenced by name. For this reason, it is best to keep to widely known fonts.

== Reference encoder/decoder ==

An encoder (kateenc) and a decoder (katedec) are included in the tools directory.
The encoder supports input from several different formats:
* a custom text based file format (see [[#The Kate file format|The Kate file format]]), which is by no means meant to be part of the Kate bitstream specification itself
* SubRip (.srt), the most common subtitle format I found
* LRC lyrics format.

As an example for the widely used SRT subtitles format, the following command line
creates a Kate subtitles stream from an SRT file:

kateenc -l en -c subtitles -t srt -o subtitles.ogg subtitles.srt

The reverse is possible, to recover an SRT file from a Kate stream, with katedec.

Note that the subtitles.ogg file should then be multiplexed into the A/V stream,
using either ogg-tools or oggz-tools.

The Kate bitstreams encoded and decoded by those tools are (supposed to be) correct for this
specification, provided their input is correct.

== Next steps ==

=== Continuations ===

Continuations are a way to add to existing events, and are mostly meant for motions. When streaming
in real time, what motions may be applied to events may not be known in advance (for instance, for a
draw chat program where two programs exchange Kate streams, the drawing motions are only known as
they are drawn). Continuations will allow an event to be extended in time, and motions to be appended
to it. This is only useful for streaming, as when stored in a file, everything is already known in
advance.

=== A rendering library ===

This will allow easier integration in other packages (movie players, etc).
I have started working on an implementation using Cairo and Pango, though I'm still at the early stages.
I might add support for embedding vector fonts in a Kate stream if I was going that way. Still need to think about this.
Another point of note is that when this library is available, it would make it easier to add
capabilities such as rotation, scaling, etc, to the bitstream, since this would not cause too
much work for playing programs using the rendering library. It is expected that these additions
would stay backward compatible (eg, an old player would ignore this information but still correctly
decode the information they can work with from a newly encoded stream).

=== An XML representation ===

While I purposefully did not write Kate description files in XML due to me finding editing XML such
a chore, it would be nice to be able to losslessly convert between the more user friendly representation
and an XML document, so one can do what one does with XML documents, like transformations.

And after all, some people might prefer editing the XML version.

=== Packaging ===

It would be really nice to have packages for libkate/libtiger for many distros.

If you're a packager for a distro which doesn't yet have packages for libkate
or libtiger, please consider helping :)

In particular, packages for Debian would be grand.

== Matroska mapping ==

The codec ID is "S_KATE".

As for Theora and Vorbis, Kate headers are stored in the private data as xiph-laced packets:

Byte 0: number of packets present, minus 1 (there must be at least one packet) - let this number be NP
Bytes 1..n: lengths of the first NP packets, coded in xiph style lacing
Bytes n+1..end: the packets themselves, concatenated one after the other

Note that the length of the last packet isn't encoded, it is deduced from the sizes of the other
packets and the total size of the private data.
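
A minimal sketch of a reader for the layout above, in Python. The function name is hypothetical; the lacing rule used (each length is the sum of a run of bytes, ending with the first byte below 255) is the usual Xiph lacing.

```python
def split_xiph_laced(private_data):
    """Split Matroska private data into the header packets it contains."""
    if not private_data:
        raise ValueError("empty private data")
    count = private_data[0] + 1  # byte 0 holds the number of packets, minus 1
    pos = 1
    sizes = []
    for _ in range(count - 1):   # the length of the last packet is not encoded
        size = 0
        while True:
            if pos >= len(private_data):
                raise ValueError("truncated lacing")
            byte = private_data[pos]
            pos += 1
            size += byte
            if byte != 255:      # a byte below 255 ends this length
                break
        sizes.append(size)
    last = len(private_data) - pos - sum(sizes)
    if last < 0:
        raise ValueError("packet lengths exceed the private data size")
    sizes.append(last)           # deduced from the total size
    packets = []
    for size in sizes:
        packets.append(bytes(private_data[pos:pos + size]))
        pos += size
    return packets
```

Since the number of Kate headers is not fixed, a reader should rely on the count in byte 0 rather than expecting three packets as with Vorbis or Theora.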

This mapping is similar to the Vorbis and Theora mappings, with the caveat that one should not
expect a set number of headers.

== Downloading ==

libkate encodes and decodes Kate streams, and is API and ABI stable.

The libkate source distribution is available at [http://libkate.googlecode.com/ http://libkate.googlecode.com/].

A public git repository is available at [http://git.xiph.org/?p=users/oggk/kate.git;a=summary http://git.xiph.org/?p=users/oggk/kate.git;a=summary].

libtiger renders Kate streams using Pango and Cairo, and is alpha, with API changes still possible.

The libtiger source distribution is available at [http://libtiger.googlecode.com/ http://libtiger.googlecode.com/].

A public git repository is available at [http://git.xiph.org/?p=users/oggk/tiger.git;a=summary http://git.xiph.org/?p=users/oggk/tiger.git;a=summary].

== HOWTOs ==

These paragraphs describe a few ways to use Kate streams:

=== Text movie subtitles ===

Kate streams can carry Unicode text (that is, text that can represent
pretty much any existing language/script). If several Kate streams are
multiplexed along with a video, subtitles in various languages can be
made for that movie.

An easy way to create such subtitles is to use ffmpeg2theora, which
can create Kate streams from SubRip (.srt) format files, a simple but
common text subtitles format. ffmpeg2theora 0.21 or later is needed.

At its simplest:

ffmpeg2theora -o video-with-subtitles.ogg --subtitles subtitles.srt
video-without-subtitles.avi

Several languages may be created and tagged with their language code
for easy selection in a media player:

ffmpeg2theora -o video-with-subtitles.ogg video-without-subtitles.avi
--subtitles japanese-subtitles.srt --subtitles-language ja
--subtitles welsh-subtitles.srt --subtitles-language cy
--subtitles english-subtitles.srt --subtitles-language en_GB

Alternatively, kateenc (which comes with the libkate distribution) can
create Kate streams from SubRip files as well. These can then be merged
with a video with oggz-tools:

kateenc -t srt -c SUB -l it -o subtitles.ogg italian-subtitles.srt
oggz merge -o movie-with-subtitles.ogg movie-without-subtitles.ogg subtitles.ogg

This second method can also be used to add subtitles to a video which
is already encoded to Theora, as it will not transcode the video again.

=== DVD subtitles ===

DVD subtitles are not text, but images. Thoggen, a DVD ripper program,
can convert these subtitles to Kate streams (at the time of writing,
Thoggen and GStreamer have not applied the necessary patches for this
to be possible out of the box, so patching them will be required).

When configuring how to rip DVD tracks, any subtitles will be detected
by Thoggen, and selecting them in the GUI will cause them to be saved as
Kate tracks along with the movie.

=== Song lyrics ===

Kate streams carrying song lyrics can be embedded in an Ogg file. The
oggenc Vorbis encoding tool from the Xiph.Org Vorbis tools allows lyrics
to be loaded from a LRC or SRT text file and converted to a Kate stream
multiplexed with the resulting Vorbis audio. At the time of writing,
the patch to oggenc was not applied yet, so it will have to be patched
manually with the patch found in the diffs directory.

oggenc -o song-with-lyrics.ogg --lyrics lyrics.lrc --lyrics-language en_US song.wav

So called 'enhanced LRC' files (containing extra karaoke timing information)
are supported, and a simple karaoke color change scheme will be saved
out for these files. For more complex karaoke effects (such as more
complex style changes, or sprite animation), kateenc should be used with
a Kate description file to create a separate Kate stream, which can then
be merged with a Vorbis only song with oggz-tools:

oggenc -o song.ogg song.wav
kateenc -t kate -c LRC -l en_US -o lyrics.ogg lyrics-with-karaoke.kate
oggz merge -o song-with-karaoke.ogg lyrics.ogg song.ogg

This latter method may also be used if you already have an encoded Vorbis song
with no lyrics, and just want to add the lyrics without reencoding.

=== Metadata ===

Metadata can be attached to events, or to styles, bitmaps, regions, etc.
Metadata are free form tag/value pairs, and can be used to enrich their
attached data with extra information. However, how this information is
interpreted is up to the application layer.

It is worth noting that an event may not have attached text, so it is
possible to create an empty timed event with attached metadata.

For instance, let's say we have a documentary, with footage from various
places, as well as short interviews, and we want two things:
* tag footage with metadata about the location and date that footage was shot
* subtitle the interviews and tag those subtitles with information about the speaker

You can then create an empty Kate event for each footage part, synchronized
with the footage, and attach a new metadata item called GEO_LOCATION, filled
with latitude and longitude of the place the footage was shot at.
Similarly, for each subtitle event, a metadata item called SPEAKER can be
attached.

An empty event to tag a 4:20 long piece of footage shot in Tokyo on 2011/08/12, and
inserted at 18:30 in the documentary could look like:

event {
00:18:30,000 --> 00:22:50,000
meta "GEO_LOCATION" = "35.42; 139.42"
meta "DATE" = "2011-08-12"
}

Here's an example for a line spoken by Dr Joe Bloggs at 18:30 into the documentary:

event {
00:18:30,000 --> 00:18:32,000
"Notice how the subtitles for my words have metadata attached to them"
meta "SPEAKER" = "Dr Joe Bloggs"
meta "URL" = "http://www.example.com/biography?name=Joe+Bloggs"
}

Notice how another metadata item, URL, is also present. The application
will have to be aware of these metadata items in order to do something with
them, though. Since they are free form, it is up to you to think of what
metadata you want, and make use of it.

Note that metadata may be attached to other objects, such as regions.
This way, you can for example create a region tagged with a name, and
track a person's movements with that region. Or you can tag a bitmap
with a copyright and a URL to a larger version of the image.
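
When many such events need tagging, the description file text can be generated rather than typed. The sketch below emits events in the same syntax as the examples above; the helper names are hypothetical, and it does not escape quotes inside text or metadata values.

```python
def timestamp(seconds):
    """Format a time in seconds as HH:MM:SS,mmm."""
    ms = round(seconds * 1000)
    hours, rest = divmod(ms, 3600000)
    minutes, rest = divmod(rest, 60000)
    secs, ms = divmod(rest, 1000)
    return "%02d:%02d:%02d,%03d" % (hours, minutes, secs, ms)

def event(start, duration, text=None, meta=()):
    """Build an event block; text may be None for an empty, metadata-only event."""
    lines = ["event {", "%s --> %s" % (timestamp(start), timestamp(start + duration))]
    if text is not None:
        lines.append('"%s"' % text)
    for tag, value in meta:
        lines.append('meta "%s" = "%s"' % (tag, value))
    lines.append("}")
    return "\n".join(lines)

# The Tokyo footage: starts at 18:30 and lasts 4:20, so it ends at 22:50.
tokyo = event(18 * 60 + 30, 4 * 60 + 20,
              meta=[("GEO_LOCATION", "35.42; 139.42"), ("DATE", "2011-08-12")])
```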

=== Changing a Kate stream embedded in an Ogg stream ===

If you need to change a Kate stream already embedded in an Ogg stream (eg, you have a movie with subtitles, and you want to fix a spelling mistake, or want to bring one of the subtitles forward in time, etc), you can do this easily with KateDJ, a tool that will extract Kate streams, decode them to a temporary location, and rebuild the original stream after you've made whatever changes you want.

KateDJ (included with the libkate distribution) is a GUI program using wxPython, a Python module for the wxWidgets GUI library, and the oggz tools (both need to be installed separately if they are not already present).

The procedure consists of:

* Run KateDJ
* Click 'Load Ogg stream' and select the file to load
* Click 'Demux file' to decode Kate streams in a temporary location
* Edit the Kate streams (a message box tells you where they are placed)
* When done, click 'Remux file from parts'
* If any errors are reported, continue editing until the remux step succeeds

== Frequently Asked Questions ==

=== Does libkate work on platforms other than Linux? ===

Yes, libkate is not Linux specific in any way. It optionally relies on libogg
and libpng, two libraries widely ported to various platforms.
It has been reported to work on Windows and MacOS X as well as UNIX platforms.

However, libtiger, a rendering library for Kate streams, relies on Pango and Cairo,
which are not easy to build on Windows, though they can be.
The Tiger renderer is however completely separate from libkate, and is not needed
for full encoding and decoding of Kate streams.

=== Where can I find some example files ? ===

The libkate distribution can generate various examples, but already built files
can be found here:
[http://people.xiph.org/~oggk/elephants_dream/elephantsdream-with-subtitles.ogg]
[http://stallman.org/fry/Stephen_Fry-Happy_Birthday_GNU-nq_600px_425kbit.ogv]

These files use raw text only.

[[Category:Ogg Mappings]]

OggKate

2017-11-21T09:11:25Z

MrZeus: /* Karaoke */

== Disclaimer ==
This is not a Xiph codec, though it may be embedded in Ogg alongside other Xiph
codecs, such as Vorbis and Theora. As such, please do not assume that Xiph has
anything to do with this, much less responsibility.

== What is Kate? ==

Kate is an overlay codec, originally designed for karaoke and text, that can be
multiplexed in Ogg.

Text and images can be carried and animated by a Kate stream.
Most of the time, they will (optionally) be multiplexed with audio/video to carry subtitles,
song lyrics (with or without karaoke data), etc.

Series of curves (splines, segments, etc) may be attached to various properties
(text position, font size, etc) to create animated overlays. This allows scrolling
or fading text to be defined. This can even be used to draw arbitrary shapes, so
hand drawing can also be represented by a Kate stream.

Example uses of Kate streams are movie subtitles for Theora videos, either text based,
as may be created by [http://www.v2v.cc/~j/ffmpeg2theora ffmpeg2theora], or image
based, such as created by [http://thoggen.net Thoggen] (patching needed), and lyrics,
as created by oggenc, from vorbis-tools.

== Why a new codec? ==

As I was adding support for Theora, Speex and FLAC to some software of mine, I found myself
wanting to have song lyrics accompanying Vorbis audio. Since Vorbis comments are limited to
the headers, one can't add them in the stream as they are sung, so another multiplexed stream
would be needed to carry them.

The three possible bases usable for such a codec I found were Writ, CMML, and OGM/SRT.

*[[OggWrit|Writ]] is an unmaintained start at an implementation of a very basic design, though I did find an encoder/decoder in py-ogg2 later on - I'd have been quicker writing Kate from scratch anyway.
*[[CMML]] is more geared towards encapsulating metadata about an accompanying stream, rather than being a data stream itself, and seemed complex for a simple use, though I have now revised my view on this - besides, it seems designed for Annodex (which I haven't had a look at), though it does seem relatively generic for use outwith Annodex - though it is being "repurposed" as timed text now, bringing it closer to what I'm doing
*OGM/SRT, which I only found when I added Kate support to MPlayer, is shoehorning various data formats into an Ogg stream, and just dumps the SRT subtitle format as is, AFAICS (though I haven't looked at this one in detail, since I'd already had a working Kate implementation by that time)

I then decided to roll my own, not least because it's a fun thing to do.

I found other formats, such as USF (designed for inclusion in Matroska) and various subtitle formats,
but none were designed for embedding inside an Ogg container.

== Overview of the Kate bitstream format ==

I've taken much inspiration from Vorbis and Theora here.
Headers and packets (as well as the API design) follow the design of these two codecs.

A rough overview (see [[#Format specification|Format specification]] for more details) is:

Header packets:
*ID header [BOS]: magic, version, granule fraction, encoding, language, etc
*Comment header: Vorbis comments, as per Vorbis/Theora streams
*Region definitions header: a list of predefined regions to be referred to by data packets
*Style definitions header: a list of predefined styles to be referred to by data packets
*Curves definitions header: a list of predefined curves to be referred to by data packets
*Motion definitions header: a list of predefined motions to be referred to by data packets
*Palette definitions header: a list of predefined palettes to be referred to by data packets
*Bitmap definitions header: a list of predefined bitmaps to be referred to by data packets
*Font mapping definitions header: a list of predefined font mappings to be referred to by data packets

Other header packets are ignored, and left for future expansion.

Data packets:
*text data: text/image and optional motions, accompanied by optional overrides for style, region, language, etc
*keepalive: can be emitted at any time to help a demuxer know where we're at, but those packets are optional
*repeats: a verbatim repeat of a text packet's payload, in order to bound any backward seeking needed when starting to play a stream partway through. These are also optional.
*end data [EOS]: marks the end of the stream, it doesn't have any useful payload

Other data packets are ignored, and left for future expansion.

The intent of the "keepalive" packet is to be sent at regular
intervals when no other packet has been emitted for a while. This would be to help seeking code
find a kate page more easily.

Things of note:
*Kate is a discontinuous codec, as defined in [http://www.xiph.org/ogg/doc/ogg-multiplex.html ogg-multiplex.html] in the Ogg documentation, which means it's timed by start granule, not by end granule as Theora and Vorbis are.
* All data packets are on their own page, for two reasons:
**Ogg keeps track of granules at the page level, not the packet level
**if no text event happens for a while after a particular text event, we don't want to delay it so a larger page can be issued

See also [[#Seeking and memory|Problems to solve: Seeking and memory]].

*The granule encoding is not a direct time/granule correspondence; see [[#Granule encoding|Granule encoding]].
*The EOS packet should have a granule position greater than or equal to the end time of all events.
*User code doesn't have to know the number of headers to expect; this is handled inside the library code (unlike Vorbis and Theora).
*The format contains hooks so that additional information may be added in future revisions while keeping backward compatibility: old decoders will still parse the stream correctly, but ignore the new information.

== Format specification ==

The Kate bitstream format consists of a number of sequential packets.
Packets can be either header packets or data packets. All header packets
must appear before any data packet.

Header packets must appear in order. Decoding of a data packet is not
possible until all header packets have been decoded.

Each Kate packet starts with a one byte type. A type with the MSB set
(ie, between 0x80 and 0xff) indicates a header packet, while a type with
the MSB cleared (ie, between 0x00 and 0x7f) indicates a data packet.
All header packets then have the Kate magic, from byte offset 1 to byte
offset 7 ("kate\0\0\0"). Note that this applies only to header packets:
data packets do not contain the Kate signature.

Since the ID header must appear first, a Kate stream can be recognized
by comparing the first eight bytes of the first packet with the signature
string "\200kate\0\0\0".
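The two rules above (MSB of the type byte, and the eight byte signature) can be sketched in a few lines of Python. This is only an illustration working on raw packet bytes, not part of libkate; the function names are made up for the example.

```python
KATE_MAGIC = b"kate\0\0\0"   # bytes 1..7 of every header packet
ID_HEADER_TYPE = 0x80        # packet type of the ID header

def is_header_packet(packet: bytes) -> bool:
    """A packet is a header packet when the MSB of its type byte is set."""
    return len(packet) > 0 and (packet[0] & 0x80) != 0

def has_kate_magic(packet: bytes) -> bool:
    """Only header packets carry the Kate magic after the type byte."""
    return is_header_packet(packet) and packet[1:8] == KATE_MAGIC

def is_kate_stream(first_packet: bytes) -> bool:
    """Compare the first eight bytes with the signature "\\200kate\\0\\0\\0"."""
    return first_packet[:8] == bytes([ID_HEADER_TYPE]) + KATE_MAGIC
```

A demuxer would typically call something like is_kate_stream on the first packet of each logical Ogg stream, and is_header_packet on every later packet to decide where headers end.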

When embedded in Ogg, the first packet in a Kate stream (always packet type 0x80,
the ID header packet) must be placed on a separate page. The corresponding Ogg
packet must be marked as beginning of stream (BOS). All subsequent header packets
must be on one or more pages. Subsequently, each data packet must be on a separate
page.

The last data packet must be the end of stream packet (packet type 0x7f).

When embedded in Ogg, the corresponding Ogg packet must be marked as end of stream (EOS).

As per the Ogg specification, granule positions must be non-decreasing
within the stream. Header packets have granule position 0.

Currently existing packet types are:
:headers:
::0x80 ID header (BOS)
::0x81 Vorbis comment header
::0x82 regions list header
::0x83 styles list header
::0x84 curves list header
::0x85 motions list header
::0x86 palettes list header
::0x87 bitmaps list header
::0x88 font ranges and mappings header
:data:
::0x00 text data (including optional motions and overrides)
::0x01 keepalive
::0x02 repeat
::0x7f end packet (EOS)

The format described here is for bitstream version 0.x.
As of 19 December 2008, the latest bitstream version is 0.4.

For more detailed information, refer to the format documentation
in libkate (see URL below in the [[#Downloading|Downloading]] section).

Following is the definition of the ID header (packet type 0x80).
This works out to a 64 byte ID header. This is the header that should be
used to detect a Kate stream within an Ogg stream.

0 1 2 3 |
0 1 2 3
0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1| Byte
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
| packtype | Identifier char[7]: 'kate\0\0\0' | 0-3
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
| kate magic continued | 4-7
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
| reserved - 0 | version major | version minor | num headers | 8-11
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
| text encoding | directionality| reserved - 0 | granule shift | 12-15
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
| cw sh | canvas width | ch sh | canvas height | 16-19
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
| reserved - 0 | 20-23
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
| granule rate numerator | 24-27
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
| granule rate denominator | 28-31
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
| language (NUL terminated) | 32-35
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
| language (continued) | 36-39
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
| language (continued) | 40-43
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
| language (continued) | 44-47
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
| category (NUL terminated) | 48-51
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
| category (continued) | 52-55
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
| category (continued) | 56-59
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
| category (continued) | 60-63
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+

The fields cw sh, canvas width, ch sh, and canvas height were introduced
in bitstream 0.3. Earlier bitstreams will have 0 in these fields.

Language and category are NUL terminated ASCII strings.
Language follows RFC 3066, though the 16 byte field obviously cannot accommodate
language tags with lots of subtags.

Category is currently loosely defined, and I haven't yet found a nice way to
present it in a generic way, but it is meant for automatic classification of
various multiplexed Kate streams (eg, to recognize that some streams are
subtitles (in a set of languages), and some others are commentary (in a
possibly different set of languages), etc).
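As an illustration of the layout above, here is a minimal Python sketch that pulls the byte-aligned fields out of a 64 byte ID header. It is not libkate code. Two things are assumptions not stated on this page: the 32 bit granule rate fields are read as little-endian, and the canvas size bytes are returned raw because their exact bit packing is not spelled out here. The dictionary keys are chosen for the example.

```python
import struct

def parse_id_header(packet: bytes) -> dict:
    """Extract the byte-aligned fields of a Kate ID header (packet type 0x80)."""
    if len(packet) < 64 or packet[:8] != b"\x80kate\0\0\0":
        raise ValueError("not a Kate ID header")

    def cstring(raw: bytes) -> str:
        # Fields are NUL terminated ASCII strings in a fixed 16 byte slot.
        return raw.split(b"\0", 1)[0].decode("ascii")

    # ASSUMPTION: 32 bit integers are stored little-endian.
    gps_numerator, gps_denominator = struct.unpack_from("<II", packet, 24)
    return {
        "version": (packet[9], packet[10]),   # (major, minor)
        "num_headers": packet[11],
        "text_encoding": packet[12],
        "directionality": packet[13],
        "granule_shift": packet[15],
        "canvas_raw": packet[16:20],          # left undecoded (see lead-in)
        "gps_numerator": gps_numerator,
        "gps_denominator": gps_denominator,
        "language": cstring(packet[32:48]),
        "category": cstring(packet[48:64]),
    }
```

Reading num_headers from the ID header is what lets library code, rather than user code, know how many header packets to expect.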

== API overview ==

libkate offers an API very similar to that of libvorbis and libtheora, as well as
an extra higher level decoding API.

Here's an overview of the three main modules:

=== Decoding ===

Decoding is done in a way similar to libvorbis. First, initialize a kate_info and a
kate_comment structure. Then, read headers by calling kate_decode_headerin. Once
all headers have been read, a kate_state is initialized for decoding using kate_decode_init,
and kate_decode_packetin is called repeatedly with data packets. Events (eg, text) can be
retrieved via kate_decode_eventout.

=== Encoding ===

Encoding is also done in a way similar to libvorbis. First initialize a kate_info
and a kate_comment structure, and fill them out as needed. kate_encode_headers will
create ogg packets from those. Then, kate_encode_text is called repeatedly for all
the text events to add. When done, calling kate_encode_finish will create an end of
stream packet.

=== High level decoding API ===

There are only 3 calls here:

kate_high_decode_init
kate_high_decode_packetin
kate_high_decode_clear

Here, all Ogg packets are sent to kate_high_decode_packetin, which does the right
thing (header/data classification, decoding, and event retrieval). Note that you
do not get access to the comments directly using this, but you do get access to the
kate_info via events.

The libkate distribution includes commented examples for each of those.

Additionally, libkate includes a layer (liboggkate) to make it easier to use when
embedded in Ogg. While the normal API uses kate_packet structures, liboggkate uses
ogg_packet structures.

The High level decoding API does not have an Ogg specific layer, but functions exist
to wrap a kate_packet around a memory buffer (such as the one ogg_packet uses, for instance).

== Support ==

Among the software with Kate support:
*VLC
*ffmpeg2theora
*liboggz
*liboggplay
*Cortado (wikimedia version)
*vorbis-tools

I have patches for the following with Kate support:
*MPlayer
*xine
*GStreamer
*Thoggen
*Audacious
*and more...

These may be found in the libkate source distribution (see [[#Downloading|Downloading]]
for links).

In addition, libtiger is a rendering library for Kate streams using Pango and Cairo.
It is not quite API stable yet, though no major changes are expected.

== Granule encoding ==

=== Ogg ===

Ogg leaves the encoding of granules up to a particular codec, only
mandating that granules be non-decreasing with time.

The Kate bitstream format uses a linear mapping between time and
granule, described here.

A Kate granule position is composed of two different parts:
* a base granule, in the high bits
* a granule offset, in the low bits

 +----------------+----------------+
 |      base      |     offset     |
 +----------------+----------------+

The number of bits these parts occupy is variable, and each stream
may choose how many bits to dedicate to each. The kate_info structure
for a stream holds that information in the granule_shift field,
so each part may be reconstructed from a granulepos.

The timestamp T of a given Kate packet is split into a base B and
an offset O, and these are stored in the granulepos of that packet.
The split is done such that B is the time of the earliest event
still active at time T, and O is the time elapsed between B
and T. Thus, T = B + O. This mimics the way Theora stores its own
timestamps in granulepos, where the base acts as a keyframe, and
the offset acts as the position of an inter frame relative to the previous
keyframe. Since Kate allows time overlapping events, however, the
choice of the base is slightly more complex: it may not
be the starting time of the previous event.

The kate_info structure for a stream holds a rational fraction
representing the time span of granule units for both the base and
the offset parts.

The granule rate is defined by the two fields:

kate_info::gps_numerator
kate_info::gps_denominator

The number of bits reserved for the offset is defined by the field:

kate_info::granule_shift
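The split described above can be sketched as follows. This is an illustrative Python model, not the libkate implementation; it assumes that gps_numerator/gps_denominator expresses granules per second, and the function names are invented for the example.

```python
from fractions import Fraction

def split_granulepos(granulepos, granule_shift):
    """Return (base, offset): base in the high bits, offset in the low bits."""
    base = granulepos >> granule_shift
    offset = granulepos & ((1 << granule_shift) - 1)
    return base, offset

def make_granulepos(base, offset, granule_shift):
    """Pack a base and an offset, checking that the offset fits its bits."""
    if offset >= (1 << granule_shift):
        raise ValueError("offset does not fit in granule_shift bits")
    return (base << granule_shift) | offset

def granule_time(granulepos, granule_shift, gps_numerator, gps_denominator):
    """Timestamp T = B + O, converted to seconds.

    ASSUMPTION: the granule rate is in granules per second.
    """
    base, offset = split_granulepos(granulepos, granule_shift)
    return Fraction((base + offset) * gps_denominator, gps_numerator)
```

For example, with a granule shift of 32 and a rate of 1000/1, a base of 5000 and an offset of 2500 describe a packet at 7.5 seconds whose earliest still active event started at 5 seconds.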

=== Generic timing ===

Kate data packets (data packet type 0) include timing information (start time,
end time, and time of the earliest event still active). All these are stored as
64 bit values at the rate defined by the granule rate, so they do not suffer from the
granule_shift space limitation.

This also allows for Kate streams to be stored in other containers.

== Motion ==

The Kate bitstream format includes motion definition, originally for karaoke purposes, but
which can be used for more general purposes, such as line based drawing, or animation of
the text (position, color, etc).

Motions are defined by the means of a series of curves (static points, segments, splines (catmull-rom, bezier, and b-splines)).
A 2D point can be obtained from a motion for any timestamp during the lifetime of a text.
This can be used for moving a marker in 2D above the text for karaoke, or to use the x
coordinate to color text when the motion position passes each letter or word, etc.
Motions have attached semantics so the client code knows how to use a particular motion.
Predefined semantics include text color, text position, etc.

Since a motion can be composed of an arbitrary number of curves, each of which may have
an arbitrary number of control points, complex motions can be achieved. If the motion is
the main object of an event, it is even possible to have an empty text, and use the motion
as a virtual pencil to draw arbitrary shapes. Even on-the-fly handwriting subtitles could
be done this way, though this would require a lot of control points, and would not be able
to be used with text-to-speech.

As a proof of concept, I also have a "draw chat" program where two people can draw, and
the shapes are turned to b-splines and sent as a kate motion to be displayed on the other
person's window.

It is also possible for motions to be discontinuous - simply insert a curve of 'none' type.
While the timestamp lies within such a curve, no 2D point will be generated. This can be
used to temporarily hide a marker, for instance.

It is worth mentioning that pauses in the motion can be trivially included by inserting,
at the right time and for the right duration, a simple linear interpolation curve with only
two equal points, both at the position where the motion is supposed to pause.
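To make the pause and discontinuity tricks concrete, here is a small Python model of a motion as a sequence of timed curves. The tuple layout and the "linear" and "none" labels are hypothetical and chosen for the example; splines are left out, and this is not how libkate represents motions internally.

```python
def sample_motion(curves, t):
    """Return the 2D point of a motion at time t, or None if there is none.

    curves is a list of (kind, duration, points) tuples played back to back;
    kind is "linear" (two end points) or "none" (no point is generated).
    """
    start = 0.0
    for kind, duration, points in curves:
        if start <= t < start + duration:
            if kind == "none":
                return None               # discontinuity: marker is hidden
            (x0, y0), (x1, y1) = points
            u = (t - start) / duration    # 0..1 progress along this curve
            return (x0 + (x1 - x0) * u, y0 + (y1 - y0) * u)
        start += duration
    return None                           # outside the motion's lifetime

motion = [
    ("linear", 2.0, [(0.0, 0.0), (1.0, 0.0)]),  # move to the right
    ("linear", 1.0, [(1.0, 0.0), (1.0, 0.0)]),  # pause: two equal points
    ("none",   0.5, []),                        # temporarily hidden
    ("linear", 1.0, [(1.0, 0.0), (1.0, 1.0)]),  # move down
]
```

Sampling this motion at 2.5 seconds returns the paused position (1.0, 0.0), and sampling at 3.25 seconds returns None.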

Kate defines a set of predefined mappings so that each decoder user interprets a motion in
the same way. A mapping is coded on 8 bits in the bitstream, and the first 128 are reserved
for Kate, leaving 128 for application specific mappings, to avoid constraining creative uses
of that feature. Predefined mappings include frame (eg, 0-1 points are mapped to the size of
the current video frame), or region, to scale 0-1 to the current region. This allows curves
to be defined without knowing in advance the pixel size of the area it should cover.
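A rough Python sketch of how a client might apply such a mapping to a point in the 0-1 range is shown below. The string labels "frame" and "region" stand in for the 8 bit mapping codes, whose actual values are not listed here, and offsetting by the region origin is an assumption of this example.

```python
def is_application_mapping(code):
    """The first 128 codes are reserved for Kate, the rest are for applications."""
    if not 0 <= code <= 255:
        raise ValueError("a mapping is coded on 8 bits")
    return code >= 128

def map_point(point, mapping, frame_size, region):
    """Scale a 0-1 point to pixels.

    frame_size is (width, height); region is (x, y, width, height).
    ASSUMPTION: region mapping is relative to the region's top left corner.
    """
    x, y = point
    if mapping == "frame":
        width, height = frame_size
        return (x * width, y * height)
    if mapping == "region":
        rx, ry, rw, rh = region
        return (rx + x * rw, ry + y * rh)
    return (x, y)   # no mapping: use the coordinates as they are
```

The same curve can thus be reused for a 640x480 or a 1920x1080 video without being redefined.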

For uses which require more than two coordinates (eg, text color, where 4 (RGBA) values are
needed), Kate predefines the semantics text_color_rg and text_color_ba, so a 4D point can be
obtained using two different motions.

There are higher level constructs, such as morphing between two styles, or predefined
karaoke effects. More are planned to be added in the future.

See also [[#Trackers|Trackers]].

== Trackers ==

Since attaching motions to text position, etc, makes it hard for the client to keep track of
everything, doing interpolation, etc, the library supplies a tracker object, which handles the
interpolation of the relevant properties.
Once initialized with a text and a set of motions, the client code can give the tracker a new
timestamp, and get back the current text position, text color, etc.

Using a tracker is not necessary if one wants to use the motions directly, or just ignore them,
but it makes life easier, especially when considering that the order in which motions are applied
does matter (to be defined formally, but the current source code is informative at this point).

== The Kate file format ==

Though this is not a feature of the bitstream format, I have created a text file format to
describe a series of events to be turned into a Kate bitstream.
At its minimum, the following is a valid input to the encoder:

: kate {
:: event { 00:00:05 --> 00:00:10 "This is a text" }
: }

This will create a simple stream with "This is a text" emitted at an offset of 5 seconds into
the track, lasting 5 seconds to an end time at 10 seconds.
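Since the minimal form is so regular, it is easy to generate from other data. The following Python sketch writes such a description from a list of (start, end, text) tuples. It only covers what the example above shows: whole second timestamps and text without double quotes (how quotes would be escaped is not covered here, so the sketch rejects them). The indentation it emits is purely cosmetic.

```python
def format_timestamp(seconds):
    """Format a whole number of seconds as hh:mm:ss."""
    hours, rest = divmod(seconds, 3600)
    minutes, secs = divmod(rest, 60)
    return f"{hours:02d}:{minutes:02d}:{secs:02d}"

def kate_description(events):
    """Build a minimal Kate description from (start, end, text) tuples."""
    lines = ["kate {"]
    for start, end, text in events:
        if '"' in text:
            # ASSUMPTION: quoting rules are unknown here, so refuse quotes.
            raise ValueError("double quotes in text are not handled")
        lines.append(
            f'  event {{ {format_timestamp(start)} --> '
            f'{format_timestamp(end)} "{text}" }}'
        )
    lines.append("}")
    return "\n".join(lines) + "\n"
```

Calling kate_description([(5, 10, "This is a text")]) produces the same single event as the example above.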

Motions, regions, styles can be declared in a definitions block to be reused by events, or can
be defined inline. Defining those in the definitions block places them in a header so they can
be reused later, saving space. However, they can also be defined in each event, so they will be
sent with the event. This allows them to be generated on the fly (eg, if the bitstream is being
streamed from a realtime input).

For convenience, the Kate file format also allows C style macros, though without parameters.

Please note that the Kate file format is fully separate from the Kate bitstream format. The
difference between the two is similar to the difference between a C source file and the resulting
object file, when compiled.

Note that the format is not based on XML for a very parochial reason: I very much dislike
editing XML by hand, as it's really hard to read. XML is really meant for machines to parse
text data generically, in a shared syntax but with possibly unknown semantics, whereas I need
those text representations to be easily editable by hand.

This also implies that there could be an XML representation of a Kate stream, which would be
useful if one were to make an editor that worked on a higher level than the current all-text
representation, and it is something that might very well happen in the future, in parallel with
the current format.

== Karaoke ==

Karaoke effects rely on motions, and there will be predefined higher level ways of specifying
timings and effects, two of which are already done.

As an example, this is a valid Karaoke script:

:kate {
:: simple_timed_glyph_style_morph {
::: from style "start_style" to style "end_style"
::: "Let " at 1.0
::: "us " at 1.2
::: "sing " at 1.4
::: "to" at 2.0
::: "ge" at 2.5
::: "ther" at 3.0
:: }
:}

The syllables will change from one style to another as time passes. The definition of the start_style
and end_style styles is omitted for brevity.

== Problems to solve ==

There are a few things to solve before the Kate bitstream format can be considered good
enough to be frozen:

Note: the following is mostly solved, and the bitstream is now stable, and has been
backward and forward compatible since the first released version. This will be updated
when I get some time.

=== Seeking and memory ===

When seeking to a particular time in a movie with subtitles, we may end up at a place where a subtitle has been started, but is not removed yet. Pure streaming doesn't have this problem as it remembers the subtitle being issued (as opposed to, say, Vorbis, for which all data valid now is decoded from the last packet). With Kate, a text string valid now may have been issued long ago.

I see three possible ways to solve this:
*each data packet includes the granule of the earliest still active packet (if none, this will be the granule of this very packet)
**this means seeks are two phased: first seek, find the next Kate packet, and seek again if the granule of the earliest still active packet is less than the granule originally sought. This implies support code on players to do the double seek.

*use "reference frames", a bit like Theora does, where the granule position is split in several fields: the higher bits represent a position for the reference frame, and the lowest bits a delta time to the current position. When seeking to a granule position, the lower bits are cleared off, yielding the granule position of the previous reference frame, so the seek ends up at the reference frame. The reference frame is a sync point where any active strings are issued again. This is a variant of the method described in the Writ wiki page, but the granule splitting avoids any "downtime".
**this requires reissuing packets, and it doesn't feel right (and wastes space).
**it also requires "dummy" decoding of Kate data from the reference frame to the actual seek point to fully refresh the state "memory".

*A variant of the two-granules-in-one system used by libcmml, where the "back link" points to the earliest still active string, rather than the previous one. This allows a two phase seek, rather than a multiphase seek hopping back from event to event with no real way to know whether there is a previous event which is still active. I suppose CMML has no need to know this if their "clips" do not overlap, but mine can.
**Such a system considerably shortens the usable granule space, though it can do a one phase seek, if I understand the system correctly, of which I am not certain.
*** Well, it seems it can't do a one phase seek anyway.

*Additionally, it could be possible to emit simple "keepalive" packets at regular intervals to help a seek algorithm to sync up to the stream without needing too much data reading - this helps for discontinuous streams where there could be no pages for a while if no data is needed at that time.
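The first option (each packet carries the time of the earliest still active event) can be modelled in a few lines of Python. This is a toy model over an in-memory list, not player code. It assumes the first seek lands on the last packet at or before the target time, which is one way of reading "find the next Kate packet", and the dictionary keys are invented for the example.

```python
from bisect import bisect_right

def make_packets(events):
    """events: (start, end, text) tuples sorted by start time.

    Each packet also records the start time of the earliest event that is
    still active when it is emitted (its own start time if there is none).
    """
    packets = []
    for i, (start, end, text) in enumerate(events):
        still_active = [s for s, e, _ in events[:i] if e > start]
        packets.append({"start": start, "end": end, "text": text,
                        "earliest_active": min(still_active + [start])})
    return packets

def active_after_seek(packets, target):
    """Two phase seek: return the texts that are active at time target."""
    starts = [p["start"] for p in packets]
    # Phase 1: land on the last packet at or before the target (ASSUMPTION).
    landed = bisect_right(starts, target) - 1
    if landed < 0:
        return []
    # Phase 2: seek back to the earliest still active event, decode forward.
    first = starts.index(packets[landed]["earliest_active"])
    return [p["text"] for p in packets[first:landed + 1]
            if p["start"] <= target < p["end"]]
```

With events at 0-30, 5-8 and 20-25 seconds, a seek to 22 seconds lands on the third packet, follows its back link to time 0, and recovers both the long running first event and the third one.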

=== Text encoding ===

A header field declares the text encoding used in the stream. At the moment, only UTF-8 is
supported, for simplicity. There are no plans to support other encodings, such as UTF-16,
at the moment.

Note that strings included in the header (language, category) are not affected by that
text encoding (rather obviously for language itself). These are ASCII.

The actual text in events may include simple HTML-like markup (at the moment, allowed markup
is the same as the one Pango uses, but more markup types may be defined in the future).
It is also possible to ask libkate to remove this markup if the client prefers to receive
plain text without the markup.

=== Language encoding ===

A header field defines the language (if any) used in the stream (this can be overridden in a
data packet, but this is not relevant to this point). At the moment, my test code uses
ISO 639-1 two letter codes, but I originally thought to use RFC 3066 tags. However, matching
a language to a user selection may be simpler for user code if the language encoding is kept
simple. At the moment, I tend to favor allowing both two letter tags (eg, "en") and secondary
tags (like "en_EN"), as RFC 3066 tags can be quite complex, but I welcome comments on this.

If a stream contains more than one language, there usually is a predominant language, which
can be set as the default language for the stream. Each event can then have a language
override. If there is no predominant language, and it is not possible to split the stream
into multiple substreams, each with its own language, then it is possible to use the "mul"
language tag, as a last resort.

=== Bitstream format for floating point values ===

Floating point values are turned to a 16.16 fixed point format, then stored in a bitpacked
format, storing the number of zero bits at the head and tail of the floating point values once
per stream, and the remainder bits for all values in the stream. This seems to yield good results
(typically a 50% reduction over 32 bits raw writes, and 70% over the snprintf based storage), and
has the big advantage of being portable (eg, independent of any IEEE format).
However, this means reduced precision due to the quantization to 16.16. I may add support for
variable precision (eg, 8.24 fixed point formats) to alleviate this. This would however mean less
space savings, though these are likely to be insignificant when Kate streams are interleaved with
a video.
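The idea can be illustrated with the Python sketch below: quantize to 16.16, find the zero bits shared by every value at the head and at the tail, and keep only the bits in between. This is a simplified model under stated assumptions, not the libkate bit layout: values are treated as 32 bit two's complement patterns, and how the head and tail counts are themselves written to the bitstream is left out.

```python
def to_fixed_16_16(value):
    """Quantize a float to 16.16 fixed point, as an unsigned 32 bit pattern."""
    return int(round(value * 65536)) & 0xFFFFFFFF

def from_fixed_16_16(bits):
    """Inverse of to_fixed_16_16 (up to the quantization error)."""
    if bits & 0x80000000:
        bits -= 1 << 32          # restore the sign of negative values
    return bits / 65536

def pack_series(values):
    """Find shared head/tail zero bits and keep only the remainder bits."""
    fixed = [to_fixed_16_16(v) for v in values]
    merged = 0
    for f in fixed:
        merged |= f              # a bit is zero here only if zero everywhere
    if merged == 0:
        return {"head": 32, "tail": 0, "bits": 0, "payload": [0] * len(fixed)}
    head = 32 - merged.bit_length()
    tail = (merged & -merged).bit_length() - 1
    return {"head": head, "tail": tail, "bits": 32 - head - tail,
            "payload": [f >> tail for f in fixed]}

def unpack_series(packed):
    """Rebuild the floats from the stored remainder bits."""
    return [from_fixed_16_16(p << packed["tail"]) for p in packed["payload"]]
```

For the values 0.5, 1.0 and 2.25, the shared head and tail are 14 bits each, so each value needs only 4 bits instead of 32.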

*Though this is not a Kate issue per se, the motion feature is very difficult to use without a curve editor. While tools may be coded to create a Kate bitstream for various existing subtitle formats, it is not certain it will be easy to find a good authoring tool for a series of curves. That said, it's not exactly difficult to do if you know a widget set.

=== Higher dimensional curves/motions ===

It is quite annoying to have to create two motions to control a color change, due to curves
being restricted to two dimensions. I may add support for arbitrary dimensions. It would also
help for 1D motions, like changing the time flow, where one coordinate is simply ignored at
the moment.
Alternatively, changes could be made to the Kate file format to hide the two dimensionality and
allow simpler specification of non-2 dimensional motions, but still map them to 2D in the kate
bitstream format.

=== Category definition ===

The category field in the BOS packet is a 16 byte text field (15 really, as it is zero terminated
in the bitstream itself). Its goal is to provide the reader with a short description of what kind
of information the stream contains, eg subtitles, lyrics, etc. This would be displayed to the user,
possibly to allow to choose to turn some streams on and off.

Since this category is meant primarily for a machine to parse, it will be kept to ASCII. When
a player recognizes a category, it is free to replace its name with one in the user's language if
it prefers. Even in English, the "lyrics" category could be displayed by a player as "Lyrics".

Since this is a free text field rather than an enumeration, it would be good to have a list of
common predefined category names that Kate streams can use.

This is a list of proposed predefined categories, feedback/additions welcome:

* subtitles - the usual movie subtitles, as text
* spu-subtitles - movie subtitles in DVD style paletted images
* lyrics - song lyrics

Please remember the 15 character limit if proposing other categories.

Note that the list of categories is subject to change, and will likely
be replaced by new, more "identifier like" ones. The three ones above,
however, would be kept for backward compatibility as they're already used.

== Text to speech ==

One of the goals of the Kate bitstream format is that text data can be easily parsed
by the user of the decoder, so any additional information, such as style, placement,
karaoke data, etc, should be able to be stripped to leave only the bare text. This is
in view of allowing text-to-speech software to use Kate bitstreams as a bandwidth-cheap
way of conveying speech data, and could also allow things like e-books which can be
either read or listened to from the same bitstream (I have seen no reference to this
being used anywhere, but I see no reason why the granule progression should be temporal,
and not user controlled, such as by using a "next" button which would bump a granule
position by a preset amount, simulating turning a page (this would be close to necessary
for text-to-speech, as the wall time duration of the spoken speech is not known in
advance to the Kate encoder, and can't be mapped to a time based granule progression)).
All text strings triggered consecutively between the two granule positions would then
be read in order.

== Possible additions ==

=== Embedded binary data ===

Images and font mappings can be included within a Kate stream.

==== Images ====

Though this could be misused to interfere with the ability to render as text-to-speech, Kate
can use images as well as text. The same caveat as for fonts applies with regard to data
duplication.

Complex images might however be best left to a multiplexed OggSpots or OggMNG stream, unless the
images mesh with the text (eg, graphical exclamation points, custom fonts, (see next
paragraph), etc).

There is support for simple paletted bitmap images, with a variable length palette of up
to 256 colors (in fact, sized in powers of 2 up to 256) and matching pixel data in as
many bits per pixel as can address the palette. Palettes and images are stored separately,
so can be used with one another with no fixed assignment.
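As a small worked example of "as many bits per pixel as can address the palette", the Python sketch below computes the pixel depth for a given palette size and the resulting amount of pixel data. It assumes palette sizes are powers of two from 2 to 256 and ignores any row padding or headers, which are not described here.

```python
def bits_per_pixel(ncolors):
    """Bits needed to address a palette of ncolors entries."""
    # ASSUMPTION: palette sizes are powers of two from 2 to 256.
    if ncolors < 2 or ncolors > 256 or ncolors & (ncolors - 1):
        raise ValueError("palette size must be a power of two in 2..256")
    return ncolors.bit_length() - 1

def pixel_data_bits(width, height, ncolors):
    """Size in bits of the raw pixel indices for a width x height bitmap."""
    return width * height * bits_per_pixel(ncolors)
```

A 16 color palette therefore needs 4 bits per pixel, so a 32x16 bitmap takes 2048 bits (256 bytes) of pixel data.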

Palettes and bitmaps are put in two separate headers for later use by reference, but can
also be placed in data packets, as with motions, etc, if they are not going to be reused.

PNG bitmaps can also be embedded in a Kate stream. These do not have associated palettes
(but the PNGs themselves may or may not be paletted). There is no support for decoding PNG
images in libkate itself, so a program will have to use libpng (or similar code) to decode
the PNG image. For instance, the libtiger rendering library uses Cairo to decode and render
PNG images in Kate streams.

This can be used to have custom fonts, so that raw text is still available if the stream
creator wants a custom look.

I expect that the need for more than 256 colors in a bitmap, or non palette bitmap data,
would be best handled by another codec, eg OggMNG or OggSpots. The goal of images in a
Kate stream is to mesh the images with the text, not to have large images by themselves.

On the other hand, interesting Karaoke effects could be achieved by having MNG images
instead of simple paletted bitmaps in a Kate stream. Comments would be most welcome on
whether this is going too far, however.

I am also investigating SVG images. These allow for very small footprint images for simple
vector drawings, and could be very useful for things like background gradients below text.

A possible solution to the duplication issue is to have another stream in the container
stream, which would hold the shared data (eg, fonts), which the user program could load,
and which could then be used by any Kate (and other) stream. Typically, this type of stream
would be a degenerate stream with only header packets (so it is fully processed before any
other stream presents data packets that might make use of that shared data), and all payload
such as fonts being contained within the headers. Thinking about it, it has parallels with
the way Vorbis stores its codebooks within a header packet, or even the way Kate stores the
list of styles within a header packet.

==== Fonts ====

Custom fonts are merely a set of ranges mapping unicode code points to bitmaps. As this implies,
fonts are bitmap fonts, not vector fonts, so scaling, if supported by the rendering client,
may not look as good as with a vector font.

A style may also refer to a font name to use (eg, "Tahoma"). These fonts may or may not be
available on the playing system, however, since the font data is not included in the stream,
just referenced by name. For this reason, it is best to keep to widely known fonts.

== Reference encoder/decoder ==

An encoder (kateenc) and a decoder (katedec) are included in the tools directory.
The encoder supports input from several different formats:
* a custom text based file format (see [[#The Kate file format|The Kate file format]]), which is by no means meant to be part of the Kate bitstream specification itself
* SubRip (.srt), the most common subtitle format I found
* LRC lyrics format.

As an example for the widely used SRT subtitles format, the following command line
creates a Kate subtitles stream from an SRT file:

kateenc -l en -c subtitles -t srt -o subtitles.ogg subtitles.srt

The reverse is possible, to recover an SRT file from a Kate stream, with katedec.

Note that the subtitles.ogg file should then be multiplexed into the A/V stream,
using either ogg-tools or oggz-tools.

The Kate bitstreams encoded and decoded by those tools are (supposed to be) correct for this
specification, provided their input is correct.

== Next steps ==

=== Continuations ===

Continuations are a way to add to existing events, and are mostly meant for motions. When streaming
in real time, what motions may be applied to events may not be known in advance (for instance, for a
draw chat program where two programs exchange Kate streams, the drawing motions are only known as
they are drawn). Continuations will allow an event to be extended in time, and motions to be appended
to it. This is only useful for streaming, as when stored in a file, everything is already known in
advance.

=== A rendering library ===

This will allow easier integration in other packages (movie players, etc).
I have started working on an implementation using Cairo and Pango, though I'm still at the early stages.
I might add support for embedding vector fonts in a Kate stream if I was going that way. Still need to think about this.
Another point of note is that when this library is available, it would make it easier to add
capabilities such as rotation, scaling, etc, to the bitstream, since this would not cause too
much work for playing programs using the rendering library. It is expected that these additions
would stay backward compatible (eg, an old player would ignore this information but still correctly
decode the information they can work with from a newly encoded stream).

=== An XML representation ===

While I purposefully did not write Kate description files in XML due to me finding editing XML such
a chore, it would be nice to be able to losslessly convert between the more user friendly representation
and an XML document, so one can do what one does with XML documents, like transformations.

And after all, some people might prefer editing the XML version.

=== Packaging ===

It would be really nice to have packages for libkate/libtiger for many distros.

If you're a packager for a distro which does not yet have packages for libkate
or libtiger, please consider helping :)

In particular, packages for Debian would be grand.

== Matroska mapping ==

The codec ID is "S_KATE".

As for Theora and Vorbis, Kate headers are stored in the private data as xiph-laced packets:

Byte 0: number of packets present, minus 1 (there must be at least one packet) - let this number be NP
Bytes 1..n: lengths of the first NP packets, coded in xiph style lacing
Bytes n+1..end: the data packets themselves concatenated one after the other

Note that the length of the last packet isn't encoded, it is deduced from the sizes of the other
packets and the total size of the private data.

This mapping is similar to the Vorbis and Theora mappings, with the caveat that one should not
expect a set number of headers.
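
The layout above can be illustrated with a minimal Python sketch of how a demuxer might split the private data back into header packets. The function name <tt>split_xiph_laced</tt> is hypothetical (it is not part of libkate or of any Matroska library), and the sketch assumes the usual Xiph lacing rule, where each length is the sum of a run of bytes ending at the first byte below 255.

```python
def split_xiph_laced(private_data):
    """Split Xiph-laced private data into its packets (illustrative only)."""
    if not private_data:
        raise ValueError("empty private data")
    num_lengths = private_data[0]   # NP: number of packets, minus one
    pos = 1
    lengths = []
    for _ in range(num_lengths):
        size = 0
        while True:
            if pos >= len(private_data):
                raise ValueError("truncated lacing data")
            byte = private_data[pos]
            pos += 1
            size += byte
            if byte != 255:         # a byte below 255 ends this length
                break
        lengths.append(size)
    # The last packet's length is not stored: it is whatever remains.
    remaining = len(private_data) - pos - sum(lengths)
    if remaining < 0:
        raise ValueError("packet lengths exceed the private data size")
    lengths.append(remaining)
    packets = []
    for size in lengths:
        packets.append(private_data[pos:pos + size])
        pos += size
    return packets
```

Since the number of Kate headers is not fixed, a caller would use the length of the returned list rather than assume three packets as for Vorbis or Theora.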

== Downloading ==

libkate encodes and decodes Kate streams, and is API and ABI stable.

The libkate source distribution is available at [http://libkate.googlecode.com/ http://libkate.googlecode.com/].

A public git repository is available at [http://git.xiph.org/?p=users/oggk/kate.git;a=summary http://git.xiph.org/?p=users/oggk/kate.git;a=summary].

libtiger renders Kate streams using Pango and Cairo, and is alpha, with API changes still possible.

The libtiger source distribution is available at [http://libtiger.googlecode.com/ http://libtiger.googlecode.com/].

A public git repository is available at [http://git.xiph.org/?p=users/oggk/tiger.git;a=summary http://git.xiph.org/?p=users/oggk/tiger.git;a=summary].

== HOWTOs ==

These paragraphs describe a few ways to use Kate streams:

=== Text movie subtitles ===

Kate streams can carry Unicode text (that is, text that can represent
pretty much any existing language/script). If several Kate streams are
multiplexed along with a video, subtitles in various languages can be
made for that movie.

An easy way to create such subtitles is to use ffmpeg2theora, which
can create Kate streams from SubRip (.srt) format files, a simple but
common text subtitles format. ffmpeg2theora 0.21 or later is needed.

At its simplest:

ffmpeg2theora -o video-with-subtitles.ogg --subtitles subtitles.srt
video-without-subtitles.avi

Several languages may be created and tagged with their language code
for easy selection in a media player:

ffmpeg2theora -o video-with-subtitles.ogg video-without-subtitles.avi
--subtitles japanese-subtitles.srt --subtitles-language ja
--subtitles welsh-subtitles.srt --subtitles-language cy
--subtitles english-subtitles.srt --subtitles-language en_GB

Alternatively, kateenc (which comes with the libkate distribution) can
create Kate streams from SubRip files as well. These can then be merged
with a video with oggz-tools:

kateenc -t srt -c SUB -l it -o subtitles.ogg italian-subtitles.srt
oggz merge -o movie-with-subtitles.ogg movie-without-subtitles.ogg subtitles.ogg

This second method can also be used to add subtitles to a video which
is already encoded to Theora, as it will not transcode the video again.

=== DVD subtitles ===

DVD subtitles are not text, but images. Thoggen, a DVD ripper program,
can convert these subtitles to Kate streams (at the time of writing,
Thoggen and GStreamer have not applied the necessary patches for this
to be possible out of the box, so patching them will be required).

When configuring how to rip DVD tracks, any subtitles will be detected
by Thoggen, and selecting them in the GUI will cause them to be saved as
Kate tracks along with the movie.

=== Song lyrics ===

Kate streams carrying song lyrics can be embedded in an Ogg file. The
oggenc Vorbis encoding tool from the Xiph.Org Vorbis tools allows lyrics
to be loaded from a LRC or SRT text file and converted to a Kate stream
multiplexed with the resulting Vorbis audio. At the time of writing,
the patch to oggenc was not applied yet, so it will have to be patched
manually with the patch found in the diffs directory.

oggenc -o song-with-lyrics.ogg --lyrics lyrics.lrc --lyrics-language en_US song.wav

So called 'enhanced LRC' files (containing extra karaoke timing information)
are supported, and a simple karaoke color change scheme will be saved
out for these files. For more complex karaoke effects (such as more
complex style changes, or sprite animation), kateenc should be used with
a Kate description file to create a separate Kate stream, which can then
be merged with a Vorbis only song with oggz-tools:

oggenc -o song.ogg song.wav
kateenc -t kate -c LRC -l en_US -o lyrics.ogg lyrics-with-karaoke.kate
oggz merge -o song-with-karaoke.ogg lyrics.ogg song.ogg

This latter method may also be used if you already have an encoded Vorbis song
with no lyrics, and just want to add the lyrics without reencoding.

=== Metadata ===

Metadata can be attached to events, or to styles, bitmaps, regions, etc.
Metadata are free form tag/value pairs, and can be used to enrich their
attached data with extra information. However, how this information is
interpreted is up to the application layer.

It is worth noting that an event may not have attached text, so it is
possible to create an empty timed event with attached metadata.

For instance, let's say we have a documentary, with footage from various
places, as well as short interviews, and we want two things:
* tag footage with metadata about the location and date that footage was shot
* subtitle the interviews and tag those subtitles with information about the speaker

You can then create an empty Kate event for each footage part, synchronized
with the footage, and attach a new metadata item called GEO_LOCATION, filled
with latitude and longitude of the place the footage was shot at.
Similarly, for each subtitle event, a metadata item called SPEAKER can be
attached.

An empty event tagging a 4:20 long piece of footage shot in Tokyo on 2011/08/12, and
inserted at 18:30 in the documentary, could look like:

event {
00:18:30,000 --> 00:22:50,000
meta "GEO_LOCATION" = "35.42; 139.42"
meta "DATE" = "2011-08-12"
}

Here's an example for a line spoken by Dr Joe Bloggs at 18:30 into the documentary:

event {
00:18:30,000 --> 00:18:32,000
"Notice how the subtitles for my words have metadata attached to them"
meta "SPEAKER" = "Dr Joe Bloggs"
meta "URL" = "http://www.example.com/biography?name=Joe+Bloggs"
}

Notice how another metadata item, URL, is also present. The application
must be aware of these metadata items in order to act on them, though.
Since they are free form, it is up to you to decide which metadata you
want, and how to make use of them.

Note that metadata may be attached to other objects, such as regions.
This way, you can for example create a region tagged with a name, and
track a person's movements with that region. Or you can tag a bitmap
with a copyright and a URL to a larger version of the image.

=== Changing a Kate stream embedded in an Ogg stream ===

If you need to change a Kate stream already embedded in an Ogg stream (eg, you have a movie with subtitles, and you want to fix a spelling mistake, or want to bring one of the subtitles forward in time, etc), you can do this easily with KateDJ, a tool that will extract Kate streams, decode them to a temporary location, and rebuild the original stream after you've made whatever changes you want.

KateDJ (included with the libkate distribution) is a GUI program using wxPython, a Python module for the wxWidgets GUI library, and the oggz tools (both need to be installed separately if they are not already present).

The procedure consists of:

* Run KateDJ
* Click 'Load Ogg stream' and select the file to load
* Click 'Demux file' to decode Kate streams in a temporary location
* Edit the Kate streams (a message box tells you where they are placed)
* When done, click 'Remux file from parts'
* If any errors are reported, continue editing until the remux step succeeds

== Frequently Asked Questions ==

=== Does libkate work on platforms other than Linux? ===

Yes, libkate is not Linux specific in any way. It optionally relies on libogg
and libpng, two libraries widely ported to various platforms.
It has been reported to work on Windows and MacOS X as well as UNIX platforms.

However, libtiger, a rendering library for Kate streams, relies on Pango and Cairo,
which are not easy to build on Windows, though they can be.
The Tiger renderer is however completely separate from libkate, and is not needed
for full encoding and decoding of Kate streams.

=== Where can I find some example files ? ===

The libkate distribution can generate various examples, but prebuilt files
can be found here:
[http://people.xiph.org/~oggk/elephants_dream/elephantsdream-with-subtitles.ogg]
[http://stallman.org/fry/Stephen_Fry-Happy_Birthday_GNU-nq_600px_425kbit.ogv]

These files use raw text only.

[[Category:Ogg Mappings]]

GranulePosAndSeeking

2017-11-20T17:14:50Z

MrZeus: /* But how do I "seek to the desired time"? */

== Granulepos encoding and How seeking really works ==

This describes how to seek on a multiplexed Ogg stream containing logical bitstreams with granuleshift, such as [[Theora]], [[Kate]], [[CMML]] or [[OggText]].
The purpose is to locate the earliest page that is required for rendering a given time offset.
Due to the fact that two time-seeking operations are required, this procedure is commonly referred to as a "'''double seek'''".

=== Definitions ===

Overload '''time''' to mean '''the time represented by a GranulePos value'''. Hence the "time" of a page is the "time represented by the page GranulePos" header field.

Define '''seek''' to mean: for each '''logical''' bitstream, locate the '''bytewise-latest page''' in the bitstream with a '''time < the
target time''', then choose the '''bytewise-earliest''' among these pages. If two or more pages have the same time (aka. GranulePos value), seeking must locate the bytewise-earlier page.

==== Granules and Granuleshift ====

We use the term '''granule''' to refer to time measured in the units of the codec. For audio codecs this is ''usually'' samples, and for video codecs it is ''usually'' frames or fields.

In some formats, pages have a dependency on the data of an earlier page; for example in [[Theora]], interframes have a dependency on an earlier keyframe -- the keyframe data is required to decode the interframe. We encode both the time of the page and the time of the page it depends on into the granulepos. In order to do this we treat the granulepos as a bitfield as follows:

+---------------------+-------------+
| prev_granule | offset |
+---------------------+-------------+

Then if a page has time in units of codec granules <tt>curr_granule</tt>, and the page it depends on has time
<tt>prev_granule</tt>, we define <tt>offset</tt> as the difference between these:

offset = curr_granule - prev_granule

We refer to the number of bits used to encode the offset as the "granuleshift". This is fixed for all pages in
that track (logical bitstream). So we encode the later page's granulepos as:

granulepos = (prev_granule << granuleshift) | offset

When decoding, we can extract the curr_granule from a granulepos by simply adding these fields:

curr_granule = prev_granule + offset

Which expands to this expression of the page granulepos:

curr_granule = (granulepos >> granuleshift) + (granulepos & ((1 << granuleshift) - 1))

Keyframes, and other data with no dependency on earlier packets, are encoded with:

prev_granule = curr_granule, offset = 0
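
The expressions above can be tried out with a short Python sketch. The function names are made up for illustration and do not come from any Xiph library; the granuleshift value used in the usage note is arbitrary.

```python
def pack_granulepos(curr_granule, prev_granule, granuleshift):
    """Encode a page's time and the time of the page it depends on."""
    offset = curr_granule - prev_granule
    if not 0 <= offset < (1 << granuleshift):
        raise ValueError("offset does not fit in granuleshift bits")
    return (prev_granule << granuleshift) | offset


def unpack_granulepos(granulepos, granuleshift):
    """Return (prev_granule, offset) extracted from a granulepos."""
    prev_granule = granulepos >> granuleshift
    offset = granulepos & ((1 << granuleshift) - 1)
    return prev_granule, offset


def curr_granule_of(granulepos, granuleshift):
    """Return the time of the page itself, in codec granules."""
    prev_granule, offset = unpack_granulepos(granulepos, granuleshift)
    return prev_granule + offset
```

With a granuleshift of 6, a page at granule 105 that depends on a keyframe at granule 100 gets granulepos (100 << 6) | 5 = 6405, and the keyframe itself gets 100 << 6 = 6400.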

=== Seeking within Single-Track files ===

To locate the earliest page in a track (a logical bitstream) required for rendering a given time offset:

# seek to the desired time
# read the prev_granule out of the granulepos
# seek to the time represented by the prev_granule

=== Seeking within Multitrack files ===

To locate the earliest page in a multitrack file (a physical bitstream) required for rendering '''all''' tracks from a given time offset:

# seek to the desired time
# scan forward until a page has been seen from all of the tracks that use granuleshift; while doing so, record the prev_granule of the bytewise-earliest page encountered from each track
# seek to the minimum of the prev_granules of those pages

It is useful to put a bound on the forward scan; the distance scanned
only depends on the way the stream is constructed, so it can be large
if pages in a particular logical bitstream are sparse.
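
As a rough illustration of the double seek, here is a Python sketch that works on an in-memory list of pages in byte order instead of a real file. Everything in it is an assumption made for the example: the <tt>Page</tt> and <tt>Track</tt> tuples, the per-track <tt>rate</tt> (granules per second) used to turn granules into time, and the linear scan inside <tt>seek</tt>, which stands in for an efficient seek on a real stream.

```python
from collections import namedtuple

Page = namedtuple("Page", "serial granulepos")
Track = namedtuple("Track", "granuleshift rate")  # rate: granules per second


def page_time(page, tracks, use_prev=False):
    """Time of a page, or of the page it depends on if use_prev is set."""
    track = tracks[page.serial]
    prev = page.granulepos >> track.granuleshift
    offset = page.granulepos & ((1 << track.granuleshift) - 1)
    return (prev if use_prev else prev + offset) / track.rate


def seek(pages, tracks, target_time):
    """Index of the bytewise-earliest, over all tracks, of each track's
    bytewise-latest page with time < target_time (ties go to the earlier page)."""
    best = {}  # serial -> (time, index)
    for index, page in enumerate(pages):
        t = page_time(page, tracks)
        if t < target_time and (page.serial not in best or t > best[page.serial][0]):
            best[page.serial] = (t, index)
    if not best:
        return 0
    return min(index for _, index in best.values())


def double_seek(pages, tracks, target_time, max_scan=64):
    """Index of the earliest page needed to render all tracks at target_time."""
    first = seek(pages, tracks, target_time)
    wanted = {s for s, t in tracks.items() if t.granuleshift > 0}
    prev_times = {}
    # Bounded forward scan: note the prev_granule time of the first page
    # seen from each track that uses granuleshift.
    for page in pages[first:first + max_scan]:
        if page.serial in wanted and page.serial not in prev_times:
            prev_times[page.serial] = page_time(page, tracks, use_prev=True)
        if len(prev_times) == len(wanted):
            break
    if not prev_times:
        return first
    return seek(pages, tracks, min(prev_times.values()))
```

The <tt>max_scan</tt> parameter is the bound on the forward scan mentioned above; its default value here is arbitrary.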

=== But how do I "seek to the desired time"? ===
The above assumes that you already know how to seek to a particular GranulePos within the stream efficiently.

This isn't as simple as it sounds, because the Ogg format does not include an index. The lack of an index is a feature rather than a deficiency and it is one of the primary reasons to use Ogg over some other formats.

Because Ogg doesn't have an index, infinite streams and partial streams are automatically supported by correctly written applications. There is no risk of truncation or minor corruption making a stream unseekable. No memory is required to store an index, no bandwidth is wasted to transmit it, and seeking granularity is not limited to the precision of the index.

On the other hand, non-indexed formats require a bit more intelligence from the application using them, so many applications have gotten it wrong (although some intelligence is also needed in a well written application for indexed formats, so that it can seek with a corrupted index or below the index granularity).

If you are thinking about seeking within an Ogg file by building your own complete index: '''STOP! This is not a good procedure.'''

Building an index may seem simple, but it requires a costly read of the entire stream (which may be gigabytes in size, or even infinite). There is a better way.

The correct way to seek to a particular granule value in Ogg is by using a [http://en.wikipedia.org/wiki/Bisection_method bisection search]:

# Seek to the middle of the stream
# obtain sync
# compare your target granule position with the current position.
# If the target is less than the current position, repeat these steps on the left side.
# If it's greater, repeat it on the right side.

By applying this recursive algorithm, you are guaranteed to find your target location much faster than building an index for the whole stream.
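
The bisection steps can be sketched in Python as follows. This is a toy under several loudly stated assumptions, not a usable demuxer: it works on a byte string held in memory, assumes a single logical stream whose pages all carry an increasing granulepos, "obtains sync" by simply looking for the next <tt>OggS</tt> capture pattern without verifying the page checksum, and ignores continued pages. The names <tt>next_page</tt>, <tt>bisect_seek</tt> and <tt>make_toy_page</tt> are hypothetical.

```python
import struct


def next_page(data, offset):
    """Return (page_offset, granulepos, page_length) for the first page
    starting at or after offset, or None. The checksum is NOT verified."""
    pos = data.find(b"OggS", offset)
    if pos < 0 or pos + 27 > len(data):
        return None
    nsegs = data[pos + 26]
    table_end = pos + 27 + nsegs
    if table_end > len(data):
        return None
    granulepos = struct.unpack_from("<q", data, pos + 6)[0]
    length = 27 + nsegs + sum(data[pos + 27:table_end])
    return pos, granulepos, length


def bisect_seek(data, target):
    """Byte offset of the latest page with granulepos < target, or None."""
    lo, hi = 0, len(data)
    best = None
    while lo < hi:
        mid = (lo + hi) // 2
        page = next_page(data, mid)
        if page is None or page[1] >= target:
            hi = mid                  # the target lies in the left half
        else:
            best = page[0]
            lo = page[0] + page[2]    # continue to the right of this page
    return best


def make_toy_page(granulepos, body):
    """Build a page-shaped blob for experiments (checksum left at zero)."""
    assert len(body) < 255
    header = (b"OggS" + bytes([0, 0]) + struct.pack("<q", granulepos)
              + struct.pack("<III", 1, 0, 0) + bytes([1, len(body)]))
    return header + body
```

For example, with five toy pages of 128 bytes each carrying granulepos 10, 20, 30, 40 and 50, <tt>bisect_seek(data, 35)</tt> returns 256, the offset of the page with granulepos 30. A real implementation would also have to reject false capture patterns found inside packet data, which is where the checksum comes in.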

To correctly support chaining, you should first use this kind of search to locate the stream endpoints. Then, the above approach can be applied within the streams, to seek to any location within a chained file.

Doing this correctly is somewhat more complicated than it seems, due to the existence of '''continued pages''' and the risk of a small valid page being contained within a packet. Both of these challenges can be addressed, but the solution is left as an exercise for the reader. (Hint: The maximum Ogg page size is < 64 KBytes)

This Bisection Search is very good compared to the alternatives (a linear scan of the whole file), often taking just a couple of reads to locate the correct location in a file gigabytes in size, but the truly obsessive can out-perform the bisection on average, by using the local bitrate to pick a better target than the half way point used in a bisection search ([http://en.wikipedia.org/wiki/Secant_method Secant method]).

Be careful about the worst case becoming linear (see [http://en.wikipedia.org/wiki/Brent%27s_method Brent's method]). The improvement possible from better-than-bisection approaches is probably only relevant for seeking across a high latency network. In typical low-latency applications, the added complexity may not be worth the cost.

== References ==

From an Email by Monty, [http://web.archive.org/web/20031201054855/http://www.xiph.org/archives/theora-dev/200209/0040.html 13th Sept 2002]

'''Note that this document is obsolete, and incorrect with respect to seeking in multiplexed streams.''' It does accurately describe the rationale behind the two-part granulepos scheme (option 3 below) now used in Theora, Dirac, CMML and other codecs in Ogg.

Folks have noticed that the documentation is semi-silent about how to properly encode the granule position and interleave synchronization of keyframe-based video. The primary reasons for this:

* we at Xiph hadn't had to do it yet

* there are several easy possibilities, and the longer we had to think about it before mandating One True Spec, the better that spec would likely be.

The lack of a painfully explicit spec has led to the theory that it's not possible; that's not true, there are a few ways to do it. Several require no extension to Ogg stream v 0. A last way requires an extra field (a point against it), but does not actually break any stream that currently exists.

The time has come to lay down the spec as we're currently building the real abstraction layers in a concrete Ogg framework now where the Ogg engine, the codecs, and the overarching Ogg control layers are neatly put into boxes connected in formalized ways. Below I go into detail about each scheme in a 'thinking aloud' sort of way. This is not because I haven't already given the matter sufficient thought, it is because I wish to give the reader sufficient background information to understand why one way is better than the others. This is not a call for input so much as an educational effort (and a public sanity check of my thinking; please do pipe up if it appears I missed a salient point).

==== Starting Assumptions: ====

1) Ogg is not a non-linear format. It is not a replacement for the scripting system of a DVD player. It is a media transport format
designed to do nothing more than deliver content, in a stream, and have all the pieces arrive on time and in sync. It is not designed to *prevent* more complex use of content, it merely does not implement anything beyond a linear representation of the data contained within. If you want to build a real non-linear format, build it *from* Ogg, not *into* Ogg. This has been the intent from day 1.

2) The Ogg layer does not know specifics of the codec data it's multiplexing into a stream. It knows nothing beyond 'Oooo, packets!', that the packets belong to different buckets, that the packets go in order, and that packets have position markers. Ogg does not even have a concept of 'time'; it only knows about the sequentially increasing, unitless position markers. It is up to higher layers which have access to the codec APIs to assign and convert units of framing or time.

3) Given pre-cached decode headers, a player may seek into a stream at any point and begin decode. It may be the case that audio may start after video by a fraction of a second, or video might be blank until the stream hits the next keyframe, but this simplest case must just work, and there will be sufficient information to maintain perfect cross-media sync.

4) (This departs from current reality, but it will be the reality very soon; vorbisfile currently blurs the careful abstraction I'm about to describe) Seeking at an arbitrary level of precision is a distributed abstraction in the larger Ogg picture. At the lowest-level Ogg stream abstraction, seeking is one operation: "find me the page from logical stream 'n' with granule position 'x'". All more complex seeking operations are a function of a higher-level layer (with knowledge of the media types and codec in use) making intelligent use of this lowest Ogg abstraction. The Ogg stream abstraction need deal with nothing more complex than 'find this page'.

The various granulepos strategies for keyframes concern this last point.

The basic issue with video from which complexity arises is that frames often depend on previous and possibly future frames. This happens in a larger, general category of codecs whose streams may not begin decode from just any packet as well as packets that may not represent an entire frame, or even a fixed-time sampling algorithm. It is a mistake to design a seeking system tied to an exact set of very specific cases. While one could implement an explicit keyframe mechanism at the Ogg level, this mechanism would not cover any of the other interesting seeking cases while, as I'll show below, the mechanism would not actually be necessary.

There will be a few complaints that Ogg is being unnecessarily subtle and shifts a great deal of complexity into software which a few extra page header fields could eliminate. Consider the following:

1) Ogg was designed to impose roughly 0.5-1% overhead on the raw packet data over a wide range of packet usage patterns. 'A few extra fields' begins inflating that figure for specific special cases that only apply to a few stream types. Right now there is no header field that is not general to every stream. There is no fat in the page headers.

2) The Ogg-level seeking algorithm is exceptionally simple and can be described in a single sentence: "Find the earliest page with a granulepos less than but closest to 'x'". This shifts the onus of assembling more complex seeking operation requiring knowledge of a specific media type into a higher layer that has knowledge of that media type. The higher layer becomes responsible for determining for what 'x' Ogg should search. The division of labor is clear and
sensible.

3) Complex, precise seeking operations are still contained entirely within the framework, just at a higher layer than Ogg-stream. At no time is an application developer required to deal with seeking mechanisms within an Ogg stream or to manually maintain stream
synchronization.

==== High level handwaving- How seeking really works ====

The granulepos is intended to mean, roughly, "If I stop decode at the end of this page, I will get data from my decoder up to position 'granulepos'". The granulepos simultaneously provides seeking information and a 'length-of-stream' indicator. Depending on the codec, it can also usually be used to indicate a timebase, but that isn't our problem right now.

By inference, the granulepos is also used to construct a value 'y' such that "if I begin decode *from* point 'y', I will get data
beginning at position 'granulepos'". Although in some codecs, y == granulepos, that is not necessarily the case when decode can't begin at any arbitrary packet. The granulepos encoding method candidates I will now describe affect exactly the 'granulepos' to 'y' conversion process. Note also that none of these affect Ogg, only the higher decision-making layers... Different circumstances necessitated by different codecs can lead to different valid choices, all of which work as far as Ogg is concerned. However, for our I-/P-/B-frame video case, there is a pretty clear winner.

===== Strategy 1: Straight Granulepos, Keyframes Are Not Our Problem. =====

In this scheme, the granulepos is a simple frame counter. The seeking decision-maker in the codec's framework plugin is responsible for determining if a frame is a keyframe or not, and if it can't begin decode from a given frame, it must request another earlier frame until it finds a keyframe. If the codec so desires, it can store 'what is my keyframe?' information in the stream packets.

This case means that each seek to a *specific* frame in a video stream will generally result in two Ogg seeks; a first seek to the requested frame, then a second seek backwards to find that frame's keyframe.

A larger concern is the semantic accuracy of the granulepos; it's intended to reflect position accurately when decoding forward. In this scheme, it's fine for a P-frame to update the counter (as it can be decoded going strictly forward), but B frames will also advance the counter; they can't be decoded without subsequent P or I frames. Thus, the semantic value of granulepos no longer strictly represents 'we can decode up to 'granulepos' at the end of this frame'.

===== Strategy 2: Granulepos Represents Keyframes Only =====

In this scheme, only keyframes update the granulepos (monotonically or non-monotonically). It simplifies the seeking process to a keyframe as an Ogg-level seek to page 'x' will always yield a page with a keyframe. In addition, granulepos will also always mean 'we can decode up to *at least* this point in the stream'. If the stream is truncated at P or B frames past granulepos, the extra frames can be discarded. (A special case would need to be defined to terminate a stream that doesn't end on an I frame.)

The difficulty with this scheme is that it presents slightly more for the software level decoder to track; a proper frame number could not be determined internally without tracking from an I frame. Also, the granulepos of an Ogg page would not necessarily map to the last packet on the page, or even any packet on that page; multiple sequential pages could have the same granulepos. It is conceptually slightly messy, although the 'messiness' does not make it at all impractical.

===== Strategy 3: Granulepos Encodes Some State =====

In some ways, this strategy is the most semantically 'over clever', but also the easiest to implement and the one that gives the most correct, up to date sync information. Pending comments, it is the I/P/B video strategy I currently favor.

The granulepos is 64 bits, a size that is absolutely necessary if, for example, it represents the PCM sample count in an audio codec. When being used to encode video frame number, however, it is comparatively absurdly large*.

* note that although granulepos is not permitted to wrap around, we can simply begin a new logical stream segment with a new serial number should a 30fps video stream ever hit the ten-billion year mark.

Thus we clearly have room to skim a few bits off the bottom of granulepos to represent I, P or B frame. These bits are not used as flags, but rather, frame representation becomes a counting problem; we do this such that the count is still always strictly increasing.

For example, if we know that I frames will never be more than 256 frames apart and P frames no more than 31 B frames apart, the granulepos of an I frame can be defined to always satisfy (granulepos & 0xff) == 0. If we can have up to seven intervening P frames, they could be numbered granulepos-of-iframe + 0x20, 0x40, 0x60... 0xe0. B frames between the I and P frames would use the remaining five bits and be numbered as sub-I and sub-P frames 1 through 31. Thus, starting from zero, the frames/packets in the pattern IPBBPBBI would be numbered 0x000, 0x020, 0x021, 0x022, 0x040, 0x041, 0x042, 0x100.

If we wish to preserve the ability to represent a timebase, the granulepos number for I frames need not be increased monotonically and shifted; it can be used to represent the frame number. The above example becomes 0x000, 0x020, 0x021, 0x022, 0x040, 0x041, 0x042, 0x700. To get real frame number (from an I frame), we just shift granulepos >> 8. This scheme can be taken further or modified to get frame number from any video frame.
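
The numbering in this strategy can be reproduced with a small Python sketch. It is only an illustration of the 3-bit P counter and 5-bit B counter described in this section, not code from any Ogg or Theora implementation; the function name <tt>number_frames</tt> and its <tt>timebase</tt> switch are made up for the example.

```python
def number_frames(pattern, timebase=False):
    """Assign a granulepos to each frame of a pattern such as "IPBBPBBI".

    With timebase=False, I frames count up in steps of 0x100; with
    timebase=True, an I frame's upper bits hold its real frame number.
    """
    if not pattern or pattern[0] != "I":
        raise ValueError("the pattern must start with an I frame")
    result = []
    group_base = 0   # granulepos of the most recent I frame
    i_count = 0      # number of I frames seen so far
    p_index = 0      # P frames since the last I frame (3 bits)
    b_index = 0      # B frames since the last I or P frame (5 bits)
    for frame_number, kind in enumerate(pattern):
        if kind == "I":
            group_base = (frame_number if timebase else i_count) << 8
            i_count += 1
            p_index = b_index = 0
        elif kind == "P":
            p_index += 1
            b_index = 0
            if p_index > 7:
                raise ValueError("more than seven P frames after an I frame")
        elif kind == "B":
            b_index += 1
            if b_index > 31:
                raise ValueError("more than 31 B frames in a row")
        else:
            raise ValueError("unknown frame type: " + kind)
        result.append(group_base + (p_index << 5) + b_index)
    return result
```

Calling <tt>number_frames("IPBBPBBI")</tt> gives the first sequence quoted above, ending in 0x100, and <tt>number_frames("IPBBPBBI", timebase=True)</tt> gives the second, ending in 0x700.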

In this way, we can always seek, first time, to a desired key frame page (by seeking to Ogg page 'x' where (x & 0xff) == 0). In addition, each frame still has a unique frame number and also a clear 'group' number, potentially useful information to the decoder. Lastly, granulepos is still semantically correct, although it is now, in a sense, representing a whole.fractional frame number for buffering purposes.

===== Scheme Four: Extra 'Seekpos' Field / Straw Man =====

Another possibility requires extension of the current Ogg page format. Although older players would reject any such extended pages as invalid, we do have versioning and typing fields, so there's not actually any compatibility problems with current Ogg pages... in the future.

The idea in this scheme is to keep the current granulepos as a frame number field (ala scheme 1), but also add a new field 'seekpos' that is used, rather than granulepos, in seeking. The seekpos would represent the number of the last keyframe that passed by.

advantages:

1) The net effect of this strategy is to modify scheme 1 to only require one bisection seek rather than two. Some amount of code simplification (over scheme 1) at the decision-making level.

disadvantages:

1) The Ogg format will need to be revved. No current (ala 1.0) Ogg code will understand the new pages.

2) The header becomes larger, from a minimum size of 27 bytes to a minimum size of 35.

3) This strategy only enhances keyframes; it is of no use in other odd seeking cases.

4) Gives no more information than scheme 3, but is still more complicated, both in code and API (Ogg would have to understand keyframes).

Thus, there's no substantial reason to prefer extending the format over a scheme that's possible within the existing framework. Note that schemes 1-3 can all be implemented within the Ogg stream today.

Monty

[[Category:Ogg]]

GranulePosAndSeeking

2017-11-20T17:03:42Z

MrZeus: /* Granulepos encoding and How seeking really works */ fix grammar and formatting

== Granulepos encoding and How seeking really works ==

This describes how to seek on a multiplexed Ogg stream containing logical bitstreams with granuleshift, such as [[Theora]], [[Kate]], [[CMML]] or [[OggText]].
The purpose is to locate the earliest page that is required for rendering a given time offset.
Due to the fact that two time-seeking operations are required, this procedure is commonly referred to as a "'''double seek'''".

=== Definitions ===

Overload '''time''' to mean '''the time represented by a GranulePos value'''. Hence the "time" of a page is the "time represented by the page GranulePos" header field.

Define '''seek''' to mean: for each '''logical''' bitstream, locate the '''bytewise-latest page''' in the bitstream with a '''time < the
target time''', then choose the '''bytewise-earliest''' among these pages. If two or more pages have the same time (aka. GranulePos value), seeking must locate the bytewise-earlier page.

==== Granules and Granuleshift ====

We use the term '''granule''' to refer to time measured in the units of the codec. For audio codecs this is ''usually'' samples, and for video codecs it is ''usually'' frames or fields.

In some formats, pages have a dependency on the data of an earlier page; for example in [[Theora]], interframes have a dependency on an earlier keyframe -- the keyframe data is required to decode the interframe. We encode both the time of the page and the time of the page it depends on into the granulepos. In order to do this we treat the granulepos as a bitfield as follows:

+---------------------+-------------+
| prev_granule | offset |
+---------------------+-------------+

Then if a page has time in units of codec granules <tt>curr_granule</tt>, and the page it depends on has time
<tt>prev_granule</tt>, we define <tt>offset</tt> as the difference between these:

offset = curr_granule - prev_granule

We refer to the number of bits used to encode the offset as the "granuleshift". This is fixed for all pages in
that track (logical bitstream). So we encode the later page's granulepos as:

granulepos = (prev_granule << granuleshift) | offset

When decoding, we can extract the curr_granule from a granulepos by simply adding these fields:

curr_granule = prev_granule + offset

Which expands to this expression of the page granulepos:

curr_granule = (granulepos >> granuleshift) + (granulepos & ((1 << granuleshift) - 1))

Keyframes, and other data with no dependency on earlier packets, are encoded with:

prev_granule = curr_granule, offset = 0

=== Seeking within Single-Track files ===

To locate the earliest page in a track (a logical bitstream) required for rendering a given time offset:

# seek to the desired time
# read the prev_granule out of the granulepos
# seek to the time represented by the prev_granule
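
The three steps can be sketched for a single track whose pages are represented only by their granulepos values, in byte order. Everything here is an assumption made for illustration: the granuleshift of 6, the function names, and the use of a sorted in-memory list in place of real file seeking.

```python
import bisect

GRANULESHIFT = 6  # assumed value for this example


def curr_granule(granulepos):
    """Time of a page in codec granules: prev_granule + offset."""
    return (granulepos >> GRANULESHIFT) + (granulepos & ((1 << GRANULESHIFT) - 1))


def seek(granuleposes, target):
    """Index of the page with the latest time < target (earliest on ties), or None."""
    times = [curr_granule(g) for g in granuleposes]
    i = bisect.bisect_left(times, target)            # pages before i have time < target
    if i == 0:
        return None
    return bisect.bisect_left(times, times[i - 1])   # earliest page carrying that time


def earliest_page_for(granuleposes, target):
    first = seek(granuleposes, target)                # 1. seek to the desired time
    if first is None:
        return 0                                      # nothing earlier: start of track
    prev_granule = granuleposes[first] >> GRANULESHIFT  # 2. read the prev_granule
    second = seek(granuleposes, prev_granule)         # 3. seek to that time
    return 0 if second is None else second
```

With one frame per page and a keyframe every 8 frames, a request for granule 13 lands on the page with time 7, which ends just before the keyframe at granule 8, so reading forward from there reaches the keyframe first.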

=== Seeking within Multitrack files ===

To locate the earliest page in a multitrack file (a physical bitstream) required for rendering '''all''' tracks from a given time offset:

# seek to the desired time
# scan forward until a page has been seen from all of the tracks that use granuleshift; while doing so, record the prev_granule of the bytewise-earliest page encountered from each track
# seek to the minimum of the prev_granules of those pages

It is useful to put a bound on the forward scan; the distance scanned depends only on the way the stream is constructed, so it can be large if pages in a particular logical bitstream are sparse.
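
A minimal sketch of the multitrack procedure follows. It makes several simplifying assumptions that are not part of the text above: pages are held in a list in byte order, all tracks count granules in the same time units, a granuleshift of 0 marks a track that does not use granuleshift, and <tt>max_scan</tt> is an arbitrary bound on the forward scan. All names are illustrative.

```python
from typing import NamedTuple


class Page(NamedTuple):
    serial: int       # logical bitstream the page belongs to
    granulepos: int


def page_time(page, shifts):
    shift = shifts[page.serial]
    return (page.granulepos >> shift) + (page.granulepos & ((1 << shift) - 1))


def seek(pages, shifts, target):
    """Index chosen by the 'seek' definition (0 if no page precedes target)."""
    best = {}  # serial -> (time, index)
    for i, page in enumerate(pages):
        t = page_time(page, shifts)
        if t < target and (page.serial not in best or t > best[page.serial][0]):
            best[page.serial] = (t, i)
    return min((i for _, i in best.values()), default=0)


def multitrack_seek(pages, shifts, target, max_scan=64):
    start = seek(pages, shifts, target)                   # 1. seek to the desired time
    shifted = {s for s, shift in shifts.items() if shift > 0}
    prevs = {}
    for page in pages[start:start + max_scan]:            # 2. bounded forward scan
        if page.serial in shifted and page.serial not in prevs:
            prevs[page.serial] = page.granulepos >> shifts[page.serial]
        if len(prevs) == len(shifted):
            break
    if not prevs:
        return start
    return seek(pages, shifts, min(prevs.values()))       # 3. seek to the minimum
```

The linear passes in <tt>seek</tt> stand in for whatever time-seeking primitive the application already has.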

=== But how do I "seek to the desired time"?===
The above assumes that you already know how to seek to a particular GranulePos within the stream efficiently.

This isn't as simple as it sounds, because the Ogg format does not include an index. The lack of an index is a feature rather than a deficiency and it is one of the primary reasons to use Ogg over some other formats.

Because Ogg doesn't have an index, infinite streams and partial streams are automatically supported by correctly written applications. There is no risk of truncation or minor corruption making a stream unseekable. No memory is required to store an index, no bandwidth is wasted to transmit it, and seeking granularity is not limited to the precision of the index.

On the other hand, non-indexed formats require a bit more intelligence from the application using them, so many applications have gotten it wrong (although some intelligence is also needed in a well written application for indexed formats, so that it can seek with a corrupted index or below the index granularity).

If you are thinking about seeking within an Ogg file by building your own complete index: '''STOP! This is not a good procedure.'''

Building an index may seem simple, but it requires a costly read of the entire stream (which may be gigabytes in size, or even infinite). There is a better way.

The correct way to seek to a particular granule value in Ogg is by using a [http://en.wikipedia.org/wiki/Bisection_method bisection search]:

# Seek to the middle of the stream.
# Obtain sync, that is, find the next page boundary.
# Compare your target granule position with the granule position of the page you found.
# If the target is less than that position, repeat these steps on the first half.
# If it is greater, repeat them on the second half.

By applying this recursive algorithm, you are guaranteed to find your target location much faster than building an index for the whole stream.
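
The following is a minimal sketch of such a bisection over a single logical bitstream held in memory. It is not how libogg or liboggz implement seeking. It assumes that granule positions never decrease, it does not verify page checksums, and it ignores continued pages and false capture patterns inside packet data. <tt>make_page</tt> only builds simplified test pages with a zero CRC field.

```python
import struct


def make_page(granulepos, serial=1, payload=b"x" * 100):
    """Build a simplified Ogg page for testing (the CRC field is left as zero)."""
    header = struct.pack("<4sBBqIIIB", b"OggS", 0, 0, granulepos, serial, 0, 0, 1)
    return header + bytes([len(payload)]) + payload


def next_page(data, pos):
    """Obtain sync: (offset, granulepos) of the first page at or after pos."""
    off = data.find(b"OggS", pos)
    if off < 0 or off + 14 > len(data):
        return None
    (granulepos,) = struct.unpack_from("<q", data, off + 6)
    return off, granulepos


def bisect_seek(data, target):
    """Byte offset of the first page whose granulepos is >= target, or None."""
    lo, hi = 0, len(data)
    while lo < hi:
        mid = (lo + hi) // 2
        found = next_page(data, mid)
        if found is None or found[1] >= target:
            hi = mid                 # target lies in the first half
        else:
            lo = found[0] + 1        # target lies after the page just seen
    found = next_page(data, lo)
    return None if found is None else found[0]


stream = b"".join(make_page(g) for g in range(0, 1000, 10))
print(bisect_seek(stream, 500))
```

On a file, <tt>data.find</tt> would be replaced by a seek followed by a bounded read, and each loop iteration would cost one such read.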

To correctly support chaining, you should first use this kind of search to locate the stream endpoints. Then, the above approach can be applied within the streams, to seek to any location within a chained file.

Doing this correctly is somewhat more complicated than it seems, due to the existence of '''continued pages''' and the risk of a small valid page being contained within a packet. Both of these challenges can be addressed, but the solution is left as an exercise for the reader. (Hint: The maximum Ogg page size is < 64 KBytes)
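
The hint can be checked against the page layout: a fixed 27-byte header, a segment table of up to 255 one-byte lacing values, and up to 255 bytes of payload per lacing value. The constant names below are chosen for this example only.

```python
HEADER_FIXED = 27        # bytes before the segment table
MAX_SEGMENTS = 255       # page_segments is a single byte
MAX_SEGMENT_SIZE = 255   # each lacing value is a single byte

max_page_size = HEADER_FIXED + MAX_SEGMENTS + MAX_SEGMENTS * MAX_SEGMENT_SIZE
print(max_page_size)               # 65307
print(max_page_size < 64 * 1024)   # True
```

In an undamaged stream, a forward scan of that many bytes from any position should therefore reach the start of a page.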

This bisection search is very good compared to the alternative (a linear scan of the whole file), often taking just a handful of reads to find the correct location in a file gigabytes in size. The truly obsessive can outperform bisection on average by using the local bitrate to pick a better target than the halfway point used in a bisection search (compare the [http://en.wikipedia.org/wiki/Secant_method secant method]).

Be careful about the worst case becoming linear (see [http://en.wikipedia.org/wiki/Brent%27s_method Brent's method]). The improvement possible from better-than-bisection approaches is probably only relevant for seeking across a high latency network. In typical applications the added complexity may not be worth the cost.
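
As a rough illustration of picking a better probe point than the midpoint, the next byte offset can be interpolated linearly from the granule positions known at the two ends of the current interval. The function name and parameters are hypothetical, and a real implementation would still need a fallback to plain bisection to bound the worst case.

```python
def interpolated_guess(lo_off, lo_gran, hi_off, hi_gran, target):
    """Guess a byte offset for target, assuming a roughly constant bitrate."""
    if hi_gran <= lo_gran:
        return (lo_off + hi_off) // 2          # no usable slope: plain bisection
    frac = (target - lo_gran) / (hi_gran - lo_gran)
    frac = min(max(frac, 0.0), 1.0)            # stay inside the interval
    return lo_off + int(frac * (hi_off - lo_off))


print(interpolated_guess(0, 0, 1000, 100, 25))
```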

== References ==

From an Email by Monty, [http://web.archive.org/web/20031201054855/http://www.xiph.org/archives/theora-dev/200209/0040.html 13th Sept 2002]

'''Note that this document is obsolete, and incorrect with respect to seeking in multiplexed streams.''' It does accurately describe the rationale behind the two-part granulepos scheme (option 3 below) now used in Theora, Dirac, CMML and other codecs in Ogg.

Folks have noticed that the documentation is semi-silent about how to properly encode the granule position and interleave synchronization of keyframe-based video. The primary reasons for this:

* we at Xiph hadn't had to do it yet

* there are several easy possibilities, and the longer we had to think about it before mandating One True Spec, the better that spec would likely be.

The lack of a painfully explicit spec has led to the theory that it's not possible; that's not true, there are a few ways to do it. Several require no extension to Ogg stream v 0. A last way requires an extra field (a point against it), but does not actually break any stream that currently exists.

The time has come to lay down the spec as we're currently building the real abstraction layers in a concrete Ogg framework now where the Ogg engine, the codecs, and the overarching Ogg control layers are neatly put into boxes connected in formalized ways. Below I go into detail about each scheme in a 'thinking aloud' sort of way. This is not because I haven't already given the matter sufficient thought, it is because I wish to give the reader sufficient background information to understand why one way is better than the others. This is not a call for input so much as an educational effort (and a public sanity check of my thinking; please do pipe up if it appears I missed a salient point).

==== Starting Assumptions: ====

1) Ogg is not a non-linear format. It is not a replacement for the scripting system of a DVD player. It is a media transport format designed to do nothing more than deliver content, in a stream, and have all the pieces arrive on time and in sync. It is not designed to *prevent* more complex use of content, it merely does not implement anything beyond a linear representation of the data contained within. If you want to build a real non-linear format, build it *from* Ogg, not *into* Ogg. This has been the intent from day 1.

2) The Ogg layer does not know specifics of the codec data it's multiplexing into a stream. It knows nothing beyond 'Oooo, packets!', that the packets belong to different buckets, that the packets go in order, and that packets have position markers. Ogg does not even have a concept of 'time'; it only knows about the sequentially increasing, unitless position markers. It is up to higher layers which have access to the codec APIs to assign and convert units of framing or time.

3) Given pre-cached decode headers, a player may seek into a stream at any point and begin decode. It may be the case that audio may start after video by a fraction of a second, or video might be blank until the stream hits the next keyframe, but this simplest case must just work, and there will be sufficient information to maintain perfect cross-media sync.

4) (This departs from current reality, but it will be the reality very soon; vorbisfile currently blurs the careful abstraction I'm about to describe) Seeking at an arbitrary level of precision is a distributed abstraction in the larger Ogg picture. At the lowest-level Ogg stream abstraction, seeking is one operation: "find me the page from logical stream 'n' with granule position 'x'". All more complex seeking operations are a function of a higher-level layer (with knowledge of the media types and codec in use) making intelligent use of this lowest Ogg abstraction. The Ogg stream abstraction need deal with nothing more complex than 'find this page'.

The various granulepos strategies for keyframes concern this last point.

The basic issue with video from which complexity arises is that frames often depend on previous and possibly future frames. This happens in a larger, general category of codecs whose streams may not begin decode from just any packet as well as packets that may not represent an entire frame, or even a fixed-time sampling algorithm. It is a mistake to design a seeking system tied to an exact set of very specific cases. While one could implement an explicit keyframe mechanism at the Ogg level, this mechanism would not cover any of the other interesting seeking cases while, as I'll show below, the mechanism would not actually be necessary.

There will be a few complaints that Ogg is being unnecessarily subtle and shifts a great deal of complexity into software which a few extra page header fields could eliminate. Consider the following:

1) Ogg was designed to impose roughly 0.5-1% overhead over the raw packet data across a wide range of packet usage patterns. 'A few extra fields' begins inflating that figure for specific special cases that only apply to a few stream types. Right now there is no header field that is not general to every stream. There is no fat in the page headers.

2) The Ogg-level seeking algorithm is exceptionally simple and can be described in a single sentence: "Find the earliest page with a granulepos less than but closest to 'x'". This shifts the onus of assembling more complex seeking operations, which require knowledge of a specific media type, into a higher layer that has knowledge of that media type. The higher layer becomes responsible for determining for what 'x' Ogg should search. The division of labor is clear and sensible.

3) Complex, precise seeking operations are still contained entirely within the framework, just at a higher layer than Ogg-stream. At no time is an application developer required to deal with seeking mechanisms within an Ogg stream or to manually maintain stream synchronization.

==== High level handwaving- How seeking really works ====

The granulepos is intended to mean, roughly, "If I stop decode at the end of this page, I will get data from my decoder up to position 'granulepos'". The granulepos simultaneously provides seeking information and a 'length-of-stream' indicator. Depending on the codec, it can also usually be used to indicate a timebase, but that isn't our problem right now.

By inference, the granulepos is also used to construct a value 'y' such that "if I begin decode *from* point 'y', I will get data beginning at position 'granulepos'". Although in some codecs y == granulepos, that is not necessarily the case when decode can't begin at any arbitrary packet. The granulepos encoding method candidates I will now describe affect exactly the 'granulepos' to 'y' conversion process. Note also that none of these affect Ogg, only the higher decision-making layers... Different circumstances necessitated by different codecs can lead to different valid choices, all of which work as far as Ogg is concerned. However, for our I-/P-/B-frame video case, there is a pretty clear winner.

===== Strategy 1: Straight Granulepos, Keyframes Are Not Our Problem. =====

In this scheme, the granulepos is a simple frame counter. The seeking decision-maker in the codec's framework plugin is responsible for determining if a frame is a keyframe or not, and if it can't begin decode from a given frame, it must request another earlier frame until it finds a keyframe. If the codec so desires, it can store 'what is my keyframe?' information in the stream packets.

This case means that each seek to a *specific* frame in a video stream will generally result in two Ogg seeks: a first seek to the requested frame, then a second seek backwards to find that frame's keyframe.

A larger concern is the semantic accuracy of the granulepos; it's intended to reflect position accurately when decoding forward. In this scheme, it's fine for a P-frame to update the counter (as it can be decoded going strictly forward), but B frames will also advance the counter; they can't be decoded without subsequent P or I frames. Thus, the semantic value of granulepos no longer strictly represents 'we can decode up to 'granulepos' at the end of this frame'.

===== Strategy 2: Granulepos Represents Keyframes Only =====

In this scheme, only keyframes update the granulepos (monotonically or non-monotonically). It simplifies the seeking process to a keyframe, as an Ogg-level seek to page 'x' will always yield a page with a keyframe. In addition, granulepos will also always mean 'we can decode up to *at least* this point in the stream'. If the stream is truncated at P or B frames past granulepos, the extra frames can be discarded. (A special case would need to be defined to terminate a stream that doesn't end on an I frame.)

The difficulty with this scheme is that it presents slightly more for the software level decoder to track; a proper frame number could not be determined internally without tracking from an I frame. Also, the granulepos of an Ogg page would not necessarily map to the last packet on the page, or even any packet on that page; multiple sequential pages could have the same granulepos. It is conceptually slightly messy, although the 'messiness' does not make it at all impractical.

===== Strategy 3: Granulepos Encodes Some State =====

In some ways, this strategy is the most semantically 'over clever', but also the easiest to implement and the one that gives the most correct, up to date sync information. Pending comments, it is the I/P/B video strategy I currently favor.

The granulepos is 64 bits, a size that is absolutely necessary if, for example, it represents the PCM sample count in an audio codec. When being used to encode video frame number, however, it is comparatively absurdly large*.

* note that although granulepos is not permitted to wrap around, we can simply begin a new logical stream segment with a new serial number should a 30fps video stream ever hit the ten-billion year mark.

Thus we clearly have room to skim a few bits off the bottom of granulepos to represent I, P or B frame. These bits are not used as flags, but rather, frame representation becomes a counting problem; we do this such that the count is still always strictly increasing.

For example, if we know that I frames will never be more than 256 frames apart and P frames no more than 31 B frames apart, the granulepos of an I frame can be defined to always satisfy (granulepos & 0xff) == 0. If we can have up to seven intervening P frames, they could be numbered granulepos-of-iframe + 0x20, 0x40, 0x60 ... 0xe0. B frames between the I and P frames would use the remaining five bits and be numbered as sub-I and sub-P frames 1 through 31. Thus, starting from zero, the frames/packets in the pattern IPBBPBBI would be numbered 0x000, 0x020, 0x021, 0x022, 0x040, 0x041, 0x042, 0x100.

If we wish to preserve the ability to represent a timebase, the granulepos number for I frames need not be increased monotonically and shifted; it can be used to represent the frame number. The above example becomes 0x000, 0x020, 0x021, 0x022, 0x040, 0x041, 0x042, 0x700. To get real frame number (from an I frame), we just shift granulepos >> 8. This scheme can be taken further or modified to get frame number from any video frame.
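
The numbering in the two examples above can be reproduced with a short sketch. The function name and the <tt>timebase</tt> flag are invented for illustration; the bit layout (3 bits for P frames, 5 bits for B frames, the rest for the I frame count or frame number) is the one described in this strategy.

```python
def number_frames(pattern, timebase=False):
    """Assign a granulepos to each frame in a string of 'I', 'P' and 'B'."""
    if not pattern.startswith("I"):
        raise ValueError("pattern must start with an I frame")
    numbers = []
    group = -1       # I frame count, or I frame number when timebase is True
    p_count = 0      # P frames since the last I frame (at most 7)
    b_count = 0      # B frames since the last I or P frame (at most 31)
    for frame_no, kind in enumerate(pattern):
        if kind == "I":
            group = frame_no if timebase else group + 1
            p_count = b_count = 0
        elif kind == "P":
            p_count, b_count = p_count + 1, 0
        elif kind == "B":
            b_count += 1
        else:
            raise ValueError("unknown frame type: " + kind)
        if p_count > 7 or b_count > 31:
            raise ValueError("frame counts exceed the available bits")
        numbers.append((group << 8) | (p_count << 5) | b_count)
    return numbers


print([hex(n) for n in number_frames("IPBBPBBI")])
print([hex(n) for n in number_frames("IPBBPBBI", timebase=True)])
```

A keyframe can then be recognized by testing <tt>(granulepos & 0xff) == 0</tt>; the parentheses matter in Python and C, where <tt>==</tt> binds more tightly than <tt>&</tt>.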

In this way, we can always seek, first time, to a desired key frame page (by seeking to Ogg page 'x' where (x & 0xff) == 0). In addition, each frame still has a unique frame number and also a clear 'group' number, potentially useful information to the decoder. Lastly, granulepos is still semantically correct, although it is now, in a sense, representing a whole.fractional frame number for buffering purposes.

===== Scheme Four: Extra 'Seekpos' Field / Straw Man =====

Another possibility requires extension of the current Ogg page format. Although older players would reject any such extended pages as invalid, we do have versioning and typing fields, so there's not actually any compatibility problems with current Ogg pages... in the future.

The idea in this scheme is to keep the current granulepos as a frame number field (ala scheme 1), but also add a new field 'seekpos' that is used, rather than granulepos, in seeking. The seekpos would represent the number of the last keyframe that passed by.

advantages:

1) The net effect of this strategy is to modify scheme 1 to only require one bisection seek rather than two. Some amount of code simplification (over scheme 1) at the decision-making level.

disadvantages:

1) The Ogg format will need to be revved. No current (ala 1.0) Ogg code will understand the new pages.

2) The header becomes larger, from a minimum size of 27 bytes to a minimum size of 35.

3) This strategy only enhances keyframes; it is of no use in other odd seeking cases.

4) Gives no more information than scheme 3, but is still more complicated, both in code and API (Ogg would have to understand keyframes).

Thus, there's no substantial reason to prefer extending the format over a scheme that's possible within the existing framework. Note that schemes 1-3 can all be implemented within the Ogg stream today.

Monty

[[Category:Ogg]]

Ogg

2017-11-20T14:41:11Z

MrZeus: /* Detecting Ogg files and extracting information */

The '''Ogg''' transport bitstream is designed to provide framing, error protection and seeking structure for higher-level codec streams that consist of raw, unencapsulated data packets, such as the [[Opus]], [[Vorbis]] and [[FLAC]] audio codecs or the [[Theora]] and [[Dirac]] video codecs.

== Name ==

Ogg derives from "ogging", jargon from the [https://wikipedia.org/wiki/Netrek computer game Netrek]. Ogg is not an acronym and should not be mentioned as "OGG".

== Design constraints for Ogg bitstreams ==

* True streaming; we must not need to seek to build a 100% complete bitstream.
* Use no more than approximately 1-2% of bitstream bandwidth for packet boundary marking, high-level framing, sync and seeking.
* Specification of absolute position within the original sample stream.
* Simple mechanism to ease limited editing, such as a simplified concatenation mechanism.
* Detection of corruption, recapture after error and direct, random access to data at arbitrary positions in the bitstream.

== Specification / standard==

The Ogg transport bitstream and file format is defined in RFC 3533 approved 2003-May. As RFC documents are invariable once approved, there will never be newer versions of RFC 3533, but an [[RFC_3533_Errata]] exists instead.

Existing flaws are discussed at [[OggIssues]], ideas for the future at [[TransOgg]].

== Detecting Ogg files and extracting information ==

Ogg files begin with a signature "OggS". This signature also repeats many times inside the file, at the beginning of every page.
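
A minimal check for the signature might look like the sketch below. The function names are illustrative. Finding the signature is only a hint: counting occurrences gives an upper bound on the number of pages, because packet data may contain the same four bytes by chance.

```python
def starts_with_ogg_signature(data: bytes) -> bool:
    """True if the data begins with the Ogg capture pattern."""
    return data[:4] == b"OggS"


def count_signatures(data: bytes) -> int:
    """Number of times the capture pattern occurs (an upper bound on pages)."""
    return data.count(b"OggS")


print(starts_with_ogg_signature(b"OggS\x00\x02"))   # True
print(starts_with_ogg_signature(b"fLaC\x00\x00"))   # False
```

For a file, it is enough to read the first four bytes in binary mode and pass them to <tt>starts_with_ogg_signature</tt>.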

There are several tools to get information about Ogg files:
* Ogginfo - part of Vorbis-Tools, supports Vorbis codec only (historical Ogg-vs-Vorbis issue), other codecs cause it to report garbage
* Opusinfo - part of Opus-Tools, supports only Opus codec well, only minimal Vorbis support
* Oggz ???
* [http://sourceforge.net/projects/mediainfo/ MediaInfo] - provides information about media (and some other) files, supports many types, also Ogg with various codecs, generic audio and video information only, no Ogg-specific details.

== Projects using Ogg ==

=== Codecs ===

* [[CMML]]
* [[FLAC]] ([http://xiph.org/flac/ogg_mapping.html Ogg mapping])
* [[OggKate|Kate]]
* [http://opus-codec.org/ Opus] ([[OggOpus|Ogg mapping]])
* [[OggPCM|PCM]]
* [[Ogg Skeleton|Skeleton]]
* [[Speex]] ([[OggSpeex|Ogg mapping]])
* [[Theora]] ([[OggTheora|Ogg mapping]])
* [[Vorbis]] ([[OggVorbis|Ogg mapping]])
* [[OggWrit|Writ]]

=== Servers ===

* [[Icecast]]
* [http://www.metavid.org/ Metavid]

== Developer info ==

* [[GranulePosAndSeeking]] - a discussion of the interpretation of granulepos, and the algorithm for seeking on Ogg files
* [[FishFaq]] - also discusses Granule Position

=== Ogg page format ===

The LSb (least significant bit) comes first in the Bytes. Fields
with more than one byte length are encoded LSB (least significant
byte) first.

  0                   1                   2                   3
  0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1| Byte
 +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
 | capture_pattern: Magic number for page start "OggS"           | 0-3
 +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
 | version       | header_type   | granule_position              | 4-7
 +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
 |                                                               | 8-11
 +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
 |                               | bitstream_serial_number       | 12-15
 +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
 |                               | page_sequence_number          | 16-19
 +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
 |                               | CRC_checksum                  | 20-23
 +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
 |                               | page_segments | segment_table | 24-27
 +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
 | ...                                                           | 28-
 +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
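
The layout can be read with Python's <tt>struct</tt> module, as in the sketch below. The <tt>PageHeader</tt> tuple and <tt>parse_page_header</tt> are names chosen for this example. Treating the granule position as a signed 64-bit value is an assumption made here, and the checksum is returned without being verified.

```python
import struct
from typing import NamedTuple


class PageHeader(NamedTuple):
    version: int
    header_type: int
    granule_position: int
    bitstream_serial_number: int
    page_sequence_number: int
    crc_checksum: int
    segment_table: bytes


def parse_page_header(data: bytes, offset: int = 0) -> PageHeader:
    """Parse the 27 fixed header bytes and the segment table at offset."""
    # "<" selects little-endian with no padding: 4+1+1+8+4+4+4+1 = 27 bytes.
    (capture, version, header_type, granule, serial, sequence, crc,
     n_segments) = struct.unpack_from("<4sBBqIIIB", data, offset)
    if capture != b"OggS":
        raise ValueError("no capture pattern at this offset")
    table = data[offset + 27:offset + 27 + n_segments]
    if len(table) != n_segments:
        raise ValueError("truncated segment table")
    return PageHeader(version, header_type, granule, serial, sequence, crc, table)
```

The payload length of the page is the sum of the lacing values, <tt>sum(header.segment_table)</tt>, and the payload starts immediately after the segment table.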

== Implementations ==

The Ogg encapsulation format can be handled with the following libraries:

* libogg: [http://svn.xiph.org/trunk/ogg/ libogg svn] (C, cross-platform) Low-level Ogg parsing and writing.
* liboggz: [http://git.xiph.org/?p=liboggz.git liboggz git] (C, cross-platform) liboggz wraps libogg and provides features such as seeking.
* the Ogg Directshow filters: see [http://www.illiminable.com/ogg/ illiminable] (C++, Win32)
* [http://www.kfish.org/software/hogg HOgg] (pure Haskell)
* [http://www.jcraft.com/jorbis/ JOrbis] (pure Java) contains com.jcraft.jogg
* [http://www.sacredchao.net/quodlibet/wiki/Development/Mutagen Mutagen] (pure Python)

== See also ==

* [[Flash]]
* [[Oggless]]
* [[MIME Types and File Extensions]]
* [[RFC_3533_Errata]] - errors and flaws in the specification
* [[Nut_Container]]

== External links ==

* [http://www.xiph.org/ogg/doc/ Ogg documentation]
* [http://www.ietf.org/rfc/rfc3533.txt Ogg RFC]
* [http://en.wikipedia.org/wiki/Ogg Ogg at Wikipedia]
* [http://wiki.multimedia.cx/index.php?title=Ogg Ogg at Multimedia Wiki]

[[Category:Ogg]]

Ogg

2017-11-20T14:36:49Z

MrZeus: /* Specification / standard */

The '''Ogg''' transport bitstream is designed to provide framing, error protection and seeking structure for higher-level codec streams that consist of raw, unencapsulated data packets, such as the [[Opus]], [[Vorbis]] and [[FLAC]] audio codecs or the [[Theora]] and [[Dirac]] video codecs.

== Name ==

Ogg derives from "ogging", jargon from the [https://wikipedia.org/wiki/Netrek computer game Netrek]. Ogg is not an acronym and should not be mentioned as "OGG".

== Design constraints for Ogg bitstreams ==

* True streaming; we must not need to seek to build a 100% complete bitstream.
* Use no more than approximately 1-2% of bitstream bandwidth for packet boundary marking, high-level framing, sync and seeking.
* Specification of absolute position within the original sample stream.
* Simple mechanism to ease limited editing, such as a simplified concatenation mechanism.
* Detection of corruption, recapture after error and direct, random access to data at arbitrary positions in the bitstream.

== Specification / standard==

The Ogg transport bitstream and file format is defined in RFC 3533 approved 2003-May. As RFC documents are invariable once approved, there will never be newer versions of RFC 3533, but an [[RFC_3533_Errata]] exists instead.

Existing flaws are discussed at [[OggIssues]], ideas for the future at [[TransOgg]].

== Detecting Ogg files and extracting information ==

Ogg files begin with a signature "OggS". This signature also repeats many times inside the file, at the beginning of every page. There are several tools to get information about Ogg files:
* Ogginfo - part of Vorbis-Tools, supports Vorbis codec only (historical Ogg-vs-Vorbis issue), other codecs cause it to report garbage
* Opusinfo - part of Opus-Tools, supports only Opus codec well, only minimal Vorbis support
* Oggz ???
* MediaInfo [http://sourceforge.net/projects/mediainfo/ sf.net/projects/mediainfo] - provides information about media (and some other) files, supports many types, also Ogg with various codecs, generic audio and video information only, no Ogg-specific details

== Projects using Ogg ==

=== Codecs ===

* [[CMML]]
* [[FLAC]] ([http://xiph.org/flac/ogg_mapping.html Ogg mapping])
* [[OggKate|Kate]]
* [http://opus-codec.org/ Opus] ([[OggOpus|Ogg mapping]])
* [[OggPCM|PCM]]
* [[Ogg Skeleton|Skeleton]]
* [[Speex]] ([[OggSpeex|Ogg mapping]])
* [[Theora]] ([[OggTheora|Ogg mapping]])
* [[Vorbis]] ([[OggVorbis|Ogg mapping]])
* [[OggWrit|Writ]]

=== Servers ===

* [[Icecast]]
* [http://www.metavid.org/ Metavid]

== Developer info ==

* [[GranulePosAndSeeking]] - a discussion of the interpretation of granulepos, and the algorithm for seeking on Ogg files
* [[FishFaq]] - also discusses Granule Position

=== Ogg page format ===

The LSb (least significant bit) comes first in the Bytes. Fields
with more than one byte length are encoded LSB (least significant
byte) first.

  0                   1                   2                   3
  0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1| Byte
 +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
 | capture_pattern: Magic number for page start "OggS"           | 0-3
 +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
 | version       | header_type   | granule_position              | 4-7
 +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
 |                                                               | 8-11
 +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
 |                               | bitstream_serial_number       | 12-15
 +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
 |                               | page_sequence_number          | 16-19
 +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
 |                               | CRC_checksum                  | 20-23
 +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
 |                               | page_segments | segment_table | 24-27
 +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
 | ...                                                           | 28-
 +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+

== Implementations ==

The Ogg encapsulation format can be handled with the following libraries:

* libogg: [http://svn.xiph.org/trunk/ogg/ libogg svn] (C, cross-platform) Low-level Ogg parsing and writing.
* liboggz: [http://git.xiph.org/?p=liboggz.git liboggz git] (C, cross-platform) liboggz wraps libogg and provides features such as seeking.
* the Ogg Directshow filters: see [http://www.illiminable.com/ogg/ illiminable] (C++, Win32)
* [http://www.kfish.org/software/hogg HOgg] (pure Haskell)
* [http://www.jcraft.com/jorbis/ JOrbis] (pure Java) contains com.jcraft.jogg
* [http://www.sacredchao.net/quodlibet/wiki/Development/Mutagen Mutagen] (pure Python)

== See also ==

* [[Flash]]
* [[Oggless]]
* [[MIME Types and File Extensions]]
* [[RFC_3533_Errata]] - errors and flaws in the specification
* [[Nut_Container]]

== External links ==

* [http://www.xiph.org/ogg/doc/ Ogg documentation]
* [http://www.ietf.org/rfc/rfc3533.txt Ogg RFC]
* [http://en.wikipedia.org/wiki/Ogg Ogg at Wikipedia]
* [http://wiki.multimedia.cx/index.php?title=Ogg Ogg at Multimedia Wiki]

[[Category:Ogg]]

Ogg

2017-11-19T21:07:51Z

MrZeus: /* Codecs */ remove duplicate Opus listing

The '''Ogg''' transport bitstream is designed to provide framing, error protection and seeking structure for higher-level codec streams that consist of raw, unencapsulated data packets, such as the [[Opus]], [[Vorbis]] and [[FLAC]] audio codecs or the [[Theora]] and [[Dirac]] video codecs.

== Name ==

Ogg derives from "ogging", jargon from the [https://wikipedia.org/wiki/Netrek computer game Netrek]. Ogg is not an acronym and should not be mentioned as "OGG".

== Design constraints for Ogg bitstreams ==

* True streaming; we must not need to seek to build a 100% complete bitstream.
* Use no more than approximately 1-2% of bitstream bandwidth for packet boundary marking, high-level framing, sync and seeking.
* Specification of absolute position within the original sample stream.
* Simple mechanism to ease limited editing, such as a simplified concatenation mechanism.
* Detection of corruption, recapture after error and direct, random access to data at arbitrary positions in the bitstream.

== Specification / standard==

The Ogg transport bitstream and file format is defined in RFC 3533 approved 2003-May. As RFC documents are invariable once approved, there will never be newer versions of RFC 3533, but an [[RFC_3533_Errata]] exists instead. Existing flaws are discussed at [[OggIssues]], ideas for the future at [[TransOgg]].

== Detecting Ogg files and extracting information ==

Ogg files begin with a signature "OggS". This signature also repeats many times inside the file, at the beginning of every page. There are several tools to get information about Ogg files:
* Ogginfo - part of Vorbis-Tools, supports Vorbis codec only (historical Ogg-vs-Vorbis issue), other codecs cause it to report garbage
* Opusinfo - part of Opus-Tools, supports only Opus codec well, only minimal Vorbis support
* Oggz ???
* MediaInfo [http://sourceforge.net/projects/mediainfo/ sf.net/projects/mediainfo] - provides information about media (and some other) files, supports many types, also Ogg with various codecs, generic audio and video information only, no Ogg-specific details

== Projects using Ogg ==

=== Codecs ===

* [[CMML]]
* [[FLAC]] ([http://xiph.org/flac/ogg_mapping.html Ogg mapping])
* [[OggKate|Kate]]
* [http://opus-codec.org/ Opus] ([[OggOpus|Ogg mapping]])
* [[OggPCM|PCM]]
* [[Ogg Skeleton|Skeleton]]
* [[Speex]] ([[OggSpeex|Ogg mapping]])
* [[Theora]] ([[OggTheora|Ogg mapping]])
* [[Vorbis]] ([[OggVorbis|Ogg mapping]])
* [[OggWrit|Writ]]

=== Servers ===

* [[Icecast]]
* [http://www.metavid.org/ Metavid]

== Developer info ==

* [[GranulePosAndSeeking]] - a discussion of the interpretation of granulepos, and the algorithm for seeking on Ogg files
* [[FishFaq]] - also discusses Granule Position

=== Ogg page format ===

The LSb (least significant bit) comes first in the Bytes. Fields
with more than one byte length are encoded LSB (least significant
byte) first.

  0                   1                   2                   3
  0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1| Byte
 +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
 | capture_pattern: Magic number for page start "OggS"           | 0-3
 +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
 | version       | header_type   | granule_position              | 4-7
 +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
 |                                                               | 8-11
 +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
 |                               | bitstream_serial_number       | 12-15
 +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
 |                               | page_sequence_number          | 16-19
 +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
 |                               | CRC_checksum                  | 20-23
 +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
 |                               | page_segments | segment_table | 24-27
 +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
 | ...                                                           | 28-
 +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+

== Implementations ==

The Ogg encapsulation format can be handled with the following libraries:

* libogg: [http://svn.xiph.org/trunk/ogg/ libogg svn] (C, cross-platform) Low-level Ogg parsing and writing.
* liboggz: [http://git.xiph.org/?p=liboggz.git liboggz git] (C, cross-platform) liboggz wraps libogg and provides features such as seeking.
* the Ogg Directshow filters: see [http://www.illiminable.com/ogg/ illiminable] (C++, Win32)
* [http://www.kfish.org/software/hogg HOgg] (pure Haskell)
* [http://www.jcraft.com/jorbis/ JOrbis] (pure Java) contains com.jcraft.jogg
* [http://www.sacredchao.net/quodlibet/wiki/Development/Mutagen Mutagen] (pure Python)

== See also ==

* [[Flash]]
* [[Oggless]]
* [[MIME Types and File Extensions]]
* [[RFC_3533_Errata]] - errors and flaws in the specification
* [[Nut_Container]]

== External links ==

* [http://www.xiph.org/ogg/doc/ Ogg documentation]
* [http://www.ietf.org/rfc/rfc3533.txt Ogg RFC]
* [http://en.wikipedia.org/wiki/Ogg Ogg at Wikipedia]
* [http://wiki.multimedia.cx/index.php?title=Ogg Ogg at Multimedia Wiki]

[[Category:Ogg]]

Ogg

2017-11-19T21:05:09Z

MrZeus: /* Name */ linkify Netrek

The '''Ogg''' transport bitstream is designed to provide framing, error protection and seeking structure for higher-level codec streams that consist of raw, unencapsulated data packets, such as the [[Opus]], [[Vorbis]] and [[FLAC]] audio codecs or the [[Theora]] and [[Dirac]] video codecs.

== Name ==

Ogg derives from "ogging", jargon from the [https://wikipedia.org/wiki/Netrek computer game Netrek]. Ogg is not an acronym and should not be written as "OGG".

== Design constraints for Ogg bitstreams ==

* True streaming; we must not need to seek to build a 100% complete bitstream.
* Use no more than approximately 1-2% of bitstream bandwidth for packet boundary marking, high-level framing, sync and seeking.
* Specification of absolute position within the original sample stream.
* Simple mechanism to ease limited editing, such as a simplified concatenation mechanism.
* Detection of corruption, recapture after error and direct, random access to data at arbitrary positions in the bitstream.

== Specification / standard ==

The Ogg transport bitstream and file format are defined in RFC 3533, approved in May 2003. Because an RFC cannot be changed once approved, there will never be a newer version of RFC 3533; known errors are collected in [[RFC_3533_Errata]] instead. Existing flaws are discussed at [[OggIssues]], and ideas for the future at [[TransOgg]].

== Detecting Ogg files and extracting information ==

Ogg files begin with the signature "OggS". This signature also recurs throughout the file, at the beginning of every page. There are several tools to get information about Ogg files:
* Ogginfo - part of Vorbis-Tools; supports the Vorbis codec only (historical Ogg-vs-Vorbis issue) and reports garbage for other codecs
* Opusinfo - part of Opus-Tools; supports only the Opus codec well, with minimal Vorbis support
* Oggz ???
* MediaInfo [http://sourceforge.net/projects/mediainfo/ sf.net/projects/mediainfo] - provides information about media (and some other) files. It supports many file types, including Ogg with various codecs, but reports generic audio and video information only, with no Ogg-specific details
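
Because every page starts with "OggS" and carries its own segment table, a file can be checked and walked page by page without any codec knowledge. The Python sketch below is a minimal illustration, not how the tools above work internally: <code>iter_pages</code> and <code>fake_page</code> are hypothetical names, the byte offsets follow the page format in RFC 3533, and no CRC check or resynchronisation after corruption is attempted.

```python
import io
import struct


def iter_pages(stream):
    """Yield (serial_number, header_type, body_size) for each page in turn."""
    while True:
        fixed = stream.read(27)               # fixed part of the page header
        if not fixed:
            return                            # clean end of file
        if len(fixed) < 27 or fixed[:4] != b"OggS":
            raise ValueError("not positioned at an Ogg page")
        header_type = fixed[5]
        serial = struct.unpack_from("<I", fixed, 14)[0]
        segment_table = stream.read(fixed[26])  # fixed[26] is page_segments
        if len(segment_table) < fixed[26]:
            raise ValueError("truncated segment table")
        body_size = sum(segment_table)
        stream.seek(body_size, io.SEEK_CUR)   # skip the payload
        yield serial, header_type, body_size


def fake_page(serial, header_type, body):
    # Hypothetical helper for the demo only: one segment, body shorter
    # than 255 bytes, granule position, sequence number and CRC left at zero.
    return (b"OggS" + bytes([0, header_type])
            + struct.pack("<qIII", 0, serial, 0, 0)
            + bytes([1, len(body)]) + body)


data = fake_page(7, 0x02, b"a" * 10) + fake_page(7, 0x04, b"b" * 3)
pages = list(iter_pages(io.BytesIO(data)))
print(pages)
```

Collecting the distinct serial numbers yielded by such a walk gives the number of logical streams multiplexed in the file.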

== Projects using Ogg ==

=== Codecs ===

* [[CMML]]
* [[FLAC]] ([http://xiph.org/flac/ogg_mapping.html Ogg mapping])
* [[OggKate|Kate]]
* [[Opus]] ([[OggOpus|Ogg mapping]], [http://opus-codec.org/ project site])
* [[OggPCM|PCM]]
* [[Ogg Skeleton|Skeleton]]
* [[Speex]] ([[OggSpeex|Ogg mapping]])
* [[Theora]] ([[OggTheora|Ogg mapping]])
* [[Vorbis]] ([[OggVorbis|Ogg mapping]])
* [[OggWrit|Writ]]

=== Servers ===

* [[Icecast]]
* [http://www.metavid.org/ Metavid]

== Developer info ==

* [[GranulePosAndSeeking]] - a discussion of the interpretation of granulepos, and the algorithm for seeking on Ogg files
* [[FishFaq]] - also discusses Granule Position

=== Ogg page format ===

Within each byte, the least significant bit (LSb) comes first. Fields
longer than one byte are encoded least significant byte (LSB) first,
i.e. in little-endian order.

  0                   1                   2                   3
  0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1| Byte
 +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
 | capture_pattern: Magic number for page start "OggS"           | 0-3
 +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
 | version       | header_type   | granule_position              | 4-7
 +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
 |                                                               | 8-11
 +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
 |                               | bitstream_serial_number       | 12-15
 +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
 |                               | page_sequence_number          | 16-19
 +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
 |                               | CRC_checksum                  | 20-23
 +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
 |                               | page_segments | segment_table | 24-27
 +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
 | ...                                                           | 28-
 +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
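
The segment_table at the end of the header holds one "lacing value" per segment, and packet boundaries are recovered from it. The rule comes from RFC 3533, not from this page: a value of 255 means the packet continues in the next segment, and any value below 255 (including 0) ends the packet. The Python sketch below illustrates that rule; <code>packet_sizes</code> is a hypothetical name.

```python
def packet_sizes(segment_table):
    """Split a page's segment table into packet lengths.

    Returns (sizes, continues). `sizes` lists the packets (or packet
    parts) on the page in order; `continues` is True when the last one
    is unfinished and carries on in the next page.
    """
    sizes = []
    current = 0
    open_packet = False
    for lacing in segment_table:
        current += lacing
        open_packet = True
        if lacing < 255:              # a value below 255 terminates the packet
            sizes.append(current)
            current = 0
            open_packet = False
    if open_packet:                   # table ended on 255: packet is unfinished
        sizes.append(current)
    return sizes, open_packet


# A 600-byte packet, a packet of exactly 255 bytes (which needs a closing 0),
# a 17-byte packet, and the first 255 bytes of a packet that spills over.
example = packet_sizes([255, 255, 90, 255, 0, 17, 255])
print(example)
```

Since a page has at most 255 segments of at most 255 bytes each, the body of a single page never exceeds 65025 bytes.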

== Implementations ==

The Ogg encapsulation format can be handled with the following libraries:

* libogg: [http://svn.xiph.org/trunk/ogg/ libogg svn] (C, cross-platform) Low-level Ogg parsing and writing.
* liboggz: [http://git.xiph.org/?p=liboggz.git liboggz git] (C, cross-platform) liboggz wraps libogg and provides features such as seeking.
* the Ogg Directshow filters: see [http://www.illiminable.com/ogg/ illiminable] (C++, Win32)
* [http://www.kfish.org/software/hogg HOgg] (pure Haskell)
* [http://www.jcraft.com/jorbis/ JOrbis] (pure Java) contains com.jcraft.jogg
* [http://www.sacredchao.net/quodlibet/wiki/Development/Mutagen Mutagen] (pure Python)

== See also ==

* [[Flash]]
* [[Oggless]]
* [[MIME Types and File Extensions]]
* [[RFC_3533_Errata]] - errors and flaws in the specification
* [[Nut_Container]]

== External links ==

* [http://www.xiph.org/ogg/doc/ Ogg documentation]
* [http://www.ietf.org/rfc/rfc3533.txt Ogg RFC]
* [http://en.wikipedia.org/wiki/Ogg Ogg at Wikipedia]
* [http://wiki.multimedia.cx/index.php?title=Ogg Ogg at Multimedia Wiki]

[[Category:Ogg]]

Daala

2017-11-17T22:29:43Z

MrZeus: /* Presentations */

Daala is the codename for a new video compression technology.

The effort is a collaboration between the [https://www.mozilla.org/en-US/research/ Mozilla Foundation], the [https://www.xiph.org/ Xiph.Org Foundation] and any other contributors that wish to help.

The goal of the project is to provide a video format that's free to implement, use and distribute, and a reference implementation with technical performance superior to [https://en.wikipedia.org/wiki/High_Efficiency_Video_Coding H.265].

Please see the links below or the [https://www.xiph.org/daala/ main page] for more information.

== Wiki Pages ==
* [[Daala Quickstart|Daala Quickstart (Linux/MacOS)]]
* [[Daala Quickstart Windows|Daala Quickstart (Windows)]]
* [[Daala MinGW64 Environment]]

* [[Daala Weekly Meetings|Daala Weekly Meetings]]

* [[AreWeCompressedYet]]
* [[RD Curve Data Format]]

* [[DaalaTodo|Daala To-do List]]
* [[DaalaRoadmap|Daala Roadmap]]

* [[Intra|Intra-prediction within Daala]]

* [[Videos|Digital Primers]] - educational videos about audio/video technology

== Communication ==
You are '''encouraged''' to join the
* [irc://irc.freenode.net/daala '''#daala''' IRC channel at freenode.net] - if you don't have an IRC client, you can use Freenode's '''[https://webchat.freenode.net/?channels=%23daala webchat]''' instead.
* [http://lists.xiph.org/mailman/listinfo/daala Daala Email List]

=== Weekly Meetings ===
You are also welcome to attend the public [[Daala Weekly Meetings|weekly progress meetings]] by installing and using [http://wiki.mumble.info Mumble]. 
The address is '''mf4.xiph.org''' and the port is '''64738''' (you can run '''mumble://mf4.xiph.org:64738''' within your browser as a shortcut). 
The meetings occur on '''Tuesdays''' at '''[http://www.timeanddate.com/worldclock/fixedtime.html?msg=Daala+Weekly+Meeting&iso=20150428T09&p1=1241 9AM Pacific Time]''' (5PM UTC/GMT).
The meeting agenda used to be available at '''[https://daala.etherpad.mozilla.org/weekly-meeting this Etherpad]'''. The October 13, 2015 meeting is available on [https://docs.google.com/document/d/1JP_Ko3wPuyDWhooZcp_m9kndyfZ75xN5YOi5yIMCW0s/edit?pli=1 Google Docs]. Following the migration to Etherpad Lite, the meeting agenda and minutes are now available at [https://public.etherpad-mozilla.org/p/daala-weekly-meeting this Etherpad].

=== Other ===
* [http://forum.doom9.org/showthread.php?t=168004 Doom9 Forum discussion] - generic forum thread regarding Daala
* <del>[https://daala.etherpad.mozilla.org/ep/padlist/all-pads Daala Etherpads] - you can [https://daala.etherpad.mozilla.org/ep/account/request-account request a free account] to view these. You should receive access within a few days.</del> Mozilla are transitioning to Etherpad Lite.
* [http://benjamin.smedbergs.us/weekly-updates.fcgi/project/daala Daala Project Status Board] - what Daala bits the Mozilla people are working on

== Coding ==
You can get a copy of the latest Daala Source Code from [https://git.xiph.org/?p=daala.git;a=summary '''git.xiph.org'''] or [https://github.com/xiph/daala '''GitHub''']. Please stick to the '''[https://git.xiph.org/?p=daala.git;a=blob_plain;f=doc/coding_style.html Coding Style Guide]'''.

* [https://review.xiph.org/all?limit=100 Xiph Code Reviews] - there is a proposal on the review process '''[[DaalaReview|here]]'''
* [https://github.com/xiph/daala/issues Daala's issues] - Issue/bug tracker on GitHub
* [https://mf4.xiph.org/jenkins/view/daala/ Continuous Integration Tests] - these run every time a new commit is made to the Daala git master, to make sure the new code hasn't broken existing functionality.
* [[OggDaala]] - definitions for embedding Daala video within an [[Ogg]] container.

== Demos ==
* [https://people.xiph.org/~xiphmont/demo/daala/player-demo.shtml Daala Video Player] - an example implementation of a Daala decoder and player, ported to JavaScript using [https://github.com/kripken/emscripten Emscripten].

=== Codec Techniques ===
* [https://people.xiph.org/~xiphmont/demo/ Demo Articles] - explanations on certain techniques used in Daala (and other Xiph.Org projects)
* [http://exp.martres.me/edi/ Edge-Directed Interpolation] ([https://github.com/smarter/edi source code])
* [https://people.xiph.org/~ds/edi/info.html More Edge-Directed Interpolation]
* [https://people.xiph.org/~unlord/demo/intra.html Intra-prediction]
* [https://people.xiph.org/~unlord/zigzags.html Macroblock Coefficient Zigzag Graph] - HTML page generated using [https://github.com/xiph/daala/blob/master/tools/draw_zigzags.c tools/draw_zigzags.c] from the Daala source code.
* [https://jmvalin.ca/video/haar_example/ Still Image Screenshots] - comparison between Daala's Lapped Transform and Haar methods, and JPEG/x264/x265.

== Documents ==
* [https://people.xiph.org/~unlord/spie_cfl.pdf Chroma from Luma (CfL)]
* [http://jmvalin.ca/papers/spie_pvq.pdf Perceptual Vector Quantisation (PVQ)] - see also [https://people.xiph.org/~yushin/ietf/draft-cho-netvc-applypvq.html Applying PVQ Outside Daala]
* [https://people.xiph.org/~tterribe/daala/vbsobmc.pdf Overlapped Block Motion Compensation (OBMC)]
* [https://mf4.xiph.org/jenkins/job/daala-autotools/ws/doc/html/index.html C API Documentation]
* [https://people.xiph.org/~yushin/tmp__/yushin_phd_thesis.pdf Image Coding Thesis] by Yushin Cho
* [http://arxiv.org/pdf/1411.4290v1.pdf Maximising Coding Efficiency Through Block Rotation] and why it [http://lists.xiph.org/pipermail/daala/2015-January/000054.html won't work well within Daala]
* [http://jmvalin.ca/video/theoretical_results.pdf JMSpeex' Journal of Dubious Theoretical Results] - "take with an entire shaker-full of salt"
* [https://people.xiph.org/~unlord/pcs_daala.pdf Using Daala Intra Frames for Still Picture Coding]
* [http://arxiv.org/abs/1602.05975 The Daala Directional Deringing Filter]
* [https://people.xiph.org/~unlord/icip2016.pdf Daala: A Perceptually-Driven Still Image Codec] (draft) - submitted to [http://2016.ieeeicip.org/ ICIP 2016].
* [http://people.xiph.org/~tterribe/daala/neon_tutorial.pdf SIMD Assembly Tutorial: ARM NEON]
* [https://jmvalin.ca/video/mmsp2016_poster.pdf Daala Technologies Poster]

=== IETF Drafts ===
* [https://tools.ietf.org/html/draft-egge-videocodec-tdlt Time-Domain Lapped Transforms (TDLT)] - documents the Lapped Transform pre- and post-filters used for block-edge decorrelation
* [https://tools.ietf.org/html/draft-valin-videocodec-pvq Perceptual Vector Quantisation (PVQ)]
* [https://tools.ietf.org/html/draft-terriberry-codingtools Coding Tools] - documents Entropy Coding, Integer Transforms and other techniques
* [https://tools.ietf.org/html/draft-moffitt-netvc-requirements Internet Video Codec (NetVC) Requirements] - explains what requirements and use cases Daala is trying to cater for
* [https://tools.ietf.org/html/draft-daede-netvc-testing Internet Video Codec (NetVC) Testing and Quality Measurement]
* [https://tools.ietf.org/html/draft-terriberry-ipr-license Example IPR Licence Terms]

Additional drafts can be found at the [https://datatracker.ietf.org/wg/netvc/documents/ IETF DataTracker].

== Presentations ==
For a more in-depth look at the IETF's NetVC Meetings, use the [https://datatracker.ietf.org/wg/netvc/meetings/ IETF DataTracker].

* 2017-11-15 - IETF 100 - [https://www.youtube.com/watch?v=_wRLR8ypCg0&t=4682s Video] - [https://docs.google.com/presentation/d/12BsMrcGo27bgf7OMpOmlIZUEBblMmBXESCeG3_FZ5cI/edit#slide=id.p Slides]
* 2017-05-25 - IETF 98 - [https://datatracker.ietf.org/meeting/98/materials/slides-98-hackathon-netvc/ NetVC Hackathon Slides], [https://jmvalin.ca/video/cdef_slides.pdf CDEF Slides]
* 2017-02-05 - FOSDEM 2017 - [https://video.fosdem.org/2017/K.3.401/om_av1.vp8.webm Video] - [https://fosdem.org/2017/schedule/event/om_av1/attachments/slides/1795/export/events/attachments/om_av1/slides/1795/av1_update.pdf Slides]
* 2017-01-17 - Linux Conf AU - [https://www.youtube.com/watch?v=lzPaldsmJbk Video] - [https://people.xiph.org/~tterribe/pubs/lca2017/aom.pdf Slides]
-------
* 2016-11-15 - IETF 97 - [https://datatracker.ietf.org/meeting/97/agenda/netvc/ Agenda] - [http://recs.conf.meetecho.com/Playout/watch.jsp?recording=IETF97_NETVC&chapter=chapter_1 Video and Chat] - [https://datatracker.ietf.org/meeting/97/session/netvc/ Slides]
* 2016-09-04 - VideoLAN Dev Days 2016 - [https://www.youtube.co.uk/watch?v=AOssZFJ0EdI Video] - [http://people.xiph.org/~tterribe/daala/vdd2016.pdf Slides]
* 2016-08-31 - SPIE Royalty-free Video - [http://spie.org/OPO/conferencedetails/digital-image-processing#session-7 Schedule and Abstracts] - [https://www.youtube.co.uk/watch?v=wi1BefrfTos&t=41m40s Video] - [http://people.xiph.org/~tterribe/daala/daala-spie-adip2016-slides.pdf Slides]
* 2016-07-18 - IETF 96 - [https://datatracker.ietf.org/meeting/96/agenda/netvc/ Agenda] - [http://recs.conf.meetecho.com/Playout/watch.jsp?recording=IETF96_NETVC&chapter=chapter_1 Video and Chat] - [https://datatracker.ietf.org/meeting/96/session/netvc/ Slides]
* 2016-06-27 - Daala's Entropy Coder - [http://people.xiph.org/~tterribe/daala/daala_ec.pdf Slides]
* 2016-04-07 - IETF 95 - [https://datatracker.ietf.org/meeting/95/agenda/netvc/ Agenda] - [http://recs.conf.meetecho.com/Playout/watch.jsp?recording=IETF95_NETVC&chapter=chapter_1 Video and Chat] - [https://www.ietf.org/proceedings/95/slides/slides-95-netvc-2.pdf Slides] - [https://datatracker.ietf.org/meeting/95/session/netvc/ Other Materials]
* 2016-04-01 - DCC 2016 - [https://people.xiph.org/~unlord/Daala-DCC2016.pdf Slides] - [https://people.xiph.org/~unlord/1853a466.pdf Paper]
* 2016-01-30 - FOSDEM 2016 - [https://fosdem.org/2016/schedule/event/daala/ Summary] - [https://video.fosdem.org/2016/h2214/implementing-a-native-daala-decoder-in-ffmpeg.mp4 Video (MP4, 57MB)] - [https://people.xiph.org/~unlord/Xmi.pdf Slides]
-------
* 2015-11-02 - IETF 94 - [https://datatracker.ietf.org/meeting/94/agenda/netvc/ Agenda] - [http://www.meetecho.com/ietf94/netvc Video and Chat] - [https://datatracker.ietf.org/meeting/94/session/netvc/ Other Materials]
* 2015-10-24 - LinuxDay 24 (Turin) - [https://people.xiph.org/~tterribe/daala/linuxday24.pdf Slides]
* 2015-10-21 - MPEG 113 - Future Video Coding Workshop - [https://people.xiph.org/~tterribe/daala/mpeg113.pdf Slides]
* 2015-09-19 - VideoLAN Dev Days - [https://www.youtube.com/playlist?list=PLQLpBN3oI7E44HIdTOovThc1MNHLchgHE YouTube Playlist] - [https://people.xiph.org/~tterribe/daala/vdd2015.pdf Daala Slides]
* 2015-07-22 - IETF 93 - [http://recordings.conf.meetecho.com/Playout/watch.jsp?recording=IETF93_NETVC_II&chapter=chapter_1 NetVC Session 2/2] - [https://datatracker.ietf.org/meeting/93/agenda/netvc/ Agenda] - [http://www.meetecho.com/ietf93/netvc_II Video and Chat] - [https://www.ietf.org/proceedings/93/slides/slides-93-netvc-4.pdf Slides] - [https://www.ietf.org/jabber/logs/netvc/2015-07-22.html Jabber Log] - [https://datatracker.ietf.org/meeting/93/session/netvc/ Other Materials]
* 2015-07-20 - IETF 93 - [http://recordings.conf.meetecho.com/Playout/watch.jsp?recording=IETF93_NETVC&chapter=chapter_1 NetVC Session 1/2] - [https://datatracker.ietf.org/meeting/93/agenda/netvc/ Agenda] - [http://www.meetecho.com/ietf93/netvc Video and Chat] - [https://www.ietf.org/proceedings/93/slides/slides-93-netvc-3.pdf Slides] - [https://www.ietf.org/jabber/logs/netvc/2015-07-20.html Jabber Log]
* 2015-03-24 - IETF 92 - [http://recordings.conf.meetecho.com/Playout/watch.jsp?recording=IETF92_NETVC&chapter=chapter_0 NetVC Session] - Audio as [https://people.xiph.org/~tdaede/audio/ietf92-venetian-20150324-0900-am1.opus Opus] (29MB) or [https://www.ietf.org/audio/ietf92/ietf92-venetian-20150324-0900-am1.mp3 MP3] (119MB, action starts at 14:50) - [https://www.ietf.org/proceedings/92/slides/slides-92-netvc-0.pdf Slides] - [https://www.ietf.org/mail-archive/web/video-codec/current/msg00235.html Notes] - [https://www.ietf.org/jabber/logs/netvc/2015-03-24.html Jabber Log]
* 2015-02-11 - SPIE talks:
** [http://people.xiph.org/~tdaede/video/SPIE_Nathan.webm Chroma from Luma (CfL)] - [https://people.xiph.org/~unlord/SPIE-2015-CfL.pdf Slides] - [https://people.xiph.org/~unlord/spie_cfl.pdf Paper]
** [http://people.xiph.org/~tdaede/video/SPIE_PVQ.webm Perceptual Vector Quantisation (PVQ)] - [http://people.xiph.org/~tterribe/daala/spie_pvq_slides.pdf Slides] - [http://jmvalin.ca/papers/spie_pvq.pdf Paper]
** [http://people.xiph.org/~tdaede/video/SPIE_Tim.webm Adaptive Motion Compensation Without Blocking Artifacts] - [http://people.xiph.org/~tterribe/daala/spie_obmc_slides.pdf Slides] - [https://people.xiph.org/~tterribe/daala/vbsobmc.pdf Paper]
* 2015-01-31 - [http://ftp.osuosl.org/pub/fosdem/2015/devroom-open_media/daala.mp4 Daala Project Update at FOSDEM 2015] - [https://fosdem.org/2015/schedule/event/daala/ summary] - [https://fosdem.org/2015/schedule/event/daala/attachments/slides/569/export/events/attachments/daala/slides/569/Daala_FOSDEM_2015.pdf Slides]
* 2015-01-14 - [https://www.youtube.co.uk/watch?v=Dmho4gcRvQ4 Linux Conf 2015] - [http://lca2015.linux.org.au/schedule/30187/view_talk presentation summary] - [https://people.xiph.org/~tterribe/pubs/lca2015/daala.pdf Slides]
-------
* 2014-09-16 - [https://air.mozilla.org/daala-are-we-compressed-yet/ Daala: Are We Compressed Yet?]
* 2014-06-25 - [https://air.mozilla.org/sparsity-induced-prediction-for-images-and-video/ Sparsity Induced Prediction for Images and Video]
* 2014-06-06 - VP9 Summit (no video available) - [https://people.xiph.org/~xiphmont/demo/daala/daala-vp9summit-20140606.pdf Slides]
-------
* 2013-10-23 - [https://people.xiph.org/~xiphmont/video/Free_Codecs_Update_Opus_and_Daala.ogv Opus and Daala: State of the Art Royalty-free Codecs] - [https://people.xiph.org/~greg/gstreamer-daala-opus.pdf Slides]
* 2013-09-30 - [https://people.xiph.org/~tterribe/daala/coding_party2/?C=M;O=A Daala Coding Party 2] - [https://people.xiph.org/~unlord/Daala-Intra.pdf Slides]
* 2013-05-02 - [https://people.xiph.org/~xiphmont/tim-terriberry-presents-daala/ Tim Terriberry Presents Daala]
-------
* 2012-01-24 - [https://media.basilgohar.com/derf-talks/?C=M;O=A Introduction to Video Coding] - [https://people.xiph.org/~tterribe/pubs/lca2012/auckland/intro_to_video1.pdf Slides] (no video for slides 1-50)

== Other Websites ==
* [https://www.youtube.com/playlist?list=PLEeMksZoEQ1xQEuLF50w0RwDwLgDGwSG- Daala Presentations on YouTube]
* [https://www.youtube.com/playlist?list=PLOU2XLYxmsIJGErt5rrCqaSGTMyyqNt2H Google's Compressor Head videos] - a beginner's introduction to the world of data compression
* [https://www.zazzle.com/daala_tee_shirt-235139149596175944 Daala T-shirts] - if you'd like a free one, help out with the project and ask the Mozilla guys nicely for one :-)
* [https://www.xiph.org/donate/ Donate to Xiph.Org]
* [[Daala_on_Wheels|Historical Daala wiki page]]

[[Category:Daala]]

Daala

2017-11-17T22:25:41Z

MrZeus: /* Presentations */

Daala is the codename for a new video compression technology.

The effort is a collaboration between the [https://www.mozilla.org/en-US/research/ Mozilla Foundation], the [https://www.xiph.org/ Xiph.Org Foundation] and any other contributors that wish to help.

The goal of the project is to provide a video format that's free to implement, use and distribute, and a reference implementation with technical performance superior to [https://en.wikipedia.org/wiki/High_Efficiency_Video_Coding H.265].

Please see the links below or the [https://www.xiph.org/daala/ main page] for more information.

== Wiki Pages ==
* [[Daala Quickstart|Daala Quickstart (Linux/MacOS)]]
* [[Daala Quickstart Windows|Daala Quickstart (Windows)]]
* [[Daala MinGW64 Environment]]

* [[Daala Weekly Meetings|Daala Weekly Meetings]]

* [[AreWeCompressedYet]]
* [[RD Curve Data Format]]

* [[DaalaTodo|Daala To-do List]]
* [[DaalaRoadmap|Daala Roadmap]]

* [[Intra|Intra-prediction within Daala]]

* [[Videos|Digital Primers]] - educational videos about audio/video technology

== Communication ==
You are '''encouraged''' to join the
* [irc://irc.freenode.net/daala '''#daala''' IRC channel at freenode.net] - if you don't have an IRC client, you can use Freenode's '''[https://webchat.freenode.net/?channels=%23daala webchat]''' instead.
* [http://lists.xiph.org/mailman/listinfo/daala Daala Email List]

=== Weekly Meetings ===
You are also welcome to attend the public [[Daala Weekly Meetings|weekly progress meetings]] by installing and using [http://wiki.mumble.info Mumble]. 
The address is '''mf4.xiph.org''' and the port is '''64738''' (you can run '''mumble://mf4.xiph.org:64738''' within your browser as a shortcut). 
The meetings occur on '''Tuesdays''' at '''[http://www.timeanddate.com/worldclock/fixedtime.html?msg=Daala+Weekly+Meeting&iso=20150428T09&p1=1241 9AM Pacific Time]''' (5PM UTC/GMT).
The meeting agenda used to be available at '''[https://daala.etherpad.mozilla.org/weekly-meeting this Etherpad]'''. The October 13, 2015 meeting is available on [https://docs.google.com/document/d/1JP_Ko3wPuyDWhooZcp_m9kndyfZ75xN5YOi5yIMCW0s/edit?pli=1 Google Docs]. Following the migration to Etherpad Lite, the meeting agenda and minutes are now available at [https://public.etherpad-mozilla.org/p/daala-weekly-meeting this Etherpad].

=== Other ===
* [http://forum.doom9.org/showthread.php?t=168004 Doom9 Forum discussion] - generic forum thread regarding Daala
* <del>[https://daala.etherpad.mozilla.org/ep/padlist/all-pads Daala Etherpads] - you can [https://daala.etherpad.mozilla.org/ep/account/request-account request a free account] to view these. You should receive access within a few days.</del> Mozilla are transitioning to Etherpad Lite.
* [http://benjamin.smedbergs.us/weekly-updates.fcgi/project/daala Daala Project Status Board] - what Daala bits the Mozilla people are working on

== Coding ==
You can get a copy of the latest Daala Source Code from [https://git.xiph.org/?p=daala.git;a=summary '''git.xiph.org'''] or [https://github.com/xiph/daala '''GitHub''']. Please stick to the '''[https://git.xiph.org/?p=daala.git;a=blob_plain;f=doc/coding_style.html Coding Style Guide]'''.

* [https://review.xiph.org/all?limit=100 Xiph Code Reviews] - there is a proposal on the review process '''[[DaalaReview|here]]'''
* [https://github.com/xiph/daala/issues Daala's issues] - Issue/bug tracker on GitHub
* [https://mf4.xiph.org/jenkins/view/daala/ Continuous Integration Tests] - these run every time a new commit is made to the Daala git master, to make sure the new code hasn't broken existing functionality.
* [[OggDaala]] - definitions for embedding Daala video within an [[Ogg]] container.

== Demos ==
* [https://people.xiph.org/~xiphmont/demo/daala/player-demo.shtml Daala Video Player] - an example implementation of a Daala decoder and player, ported to JavaScript using [https://github.com/kripken/emscripten Emscripten].

=== Codec Techniques ===
* [https://people.xiph.org/~xiphmont/demo/ Demo Articles] - explanations on certain techniques used in Daala (and other Xiph.Org projects)
* [http://exp.martres.me/edi/ Edge-Directed Interpolation] ([https://github.com/smarter/edi source code])
* [https://people.xiph.org/~ds/edi/info.html More Edge-Directed Interpolation]
* [https://people.xiph.org/~unlord/demo/intra.html Intra-prediction]
* [https://people.xiph.org/~unlord/zigzags.html Macroblock Coefficient Zigzag Graph] - HTML page generated using [https://github.com/xiph/daala/blob/master/tools/draw_zigzags.c tools/draw_zigzags.c] from the Daala source code.
* [https://jmvalin.ca/video/haar_example/ Still Image Screenshots] - comparison between Daala's Lapped Transform and Haar methods, and JPEG/x264/x265.

== Documents ==
* [https://people.xiph.org/~unlord/spie_cfl.pdf Chroma from Luma (CfL)]
* [http://jmvalin.ca/papers/spie_pvq.pdf Perceptual Vector Quantisation (PVQ)] - see also [https://people.xiph.org/~yushin/ietf/draft-cho-netvc-applypvq.html Applying PVQ Outside Daala]
* [https://people.xiph.org/~tterribe/daala/vbsobmc.pdf Overlapped Block Motion Compensation (OBMC)]
* [https://mf4.xiph.org/jenkins/job/daala-autotools/ws/doc/html/index.html C API Documentation]
* [https://people.xiph.org/~yushin/tmp__/yushin_phd_thesis.pdf Image Coding Thesis] by Yushin Cho
* [http://arxiv.org/pdf/1411.4290v1.pdf Maximising Coding Efficiency Through Block Rotation] and why it [http://lists.xiph.org/pipermail/daala/2015-January/000054.html won't work well within Daala]
* [http://jmvalin.ca/video/theoretical_results.pdf JMSpeex' Journal of Dubious Theoretical Results] - "take with an entire shaker-full of salt"
* [https://people.xiph.org/~unlord/pcs_daala.pdf Using Daala Intra Frames for Still Picture Coding]
* [http://arxiv.org/abs/1602.05975 The Daala Directional Deringing Filter]
* [https://people.xiph.org/~unlord/icip2016.pdf Daala: A Perceptually-Driven Still Image Codec] (draft) - submitted to [http://2016.ieeeicip.org/ ICIP 2016].
* [http://people.xiph.org/~tterribe/daala/neon_tutorial.pdf SIMD Assembly Tutorial: ARM NEON]
* [https://jmvalin.ca/video/mmsp2016_poster.pdf Daala Technologies Poster]

=== IETF Drafts ===
* [https://tools.ietf.org/html/draft-egge-videocodec-tdlt Time-Domain Lapped Transforms (TDLT)] - documents the Lapped Transform pre- and post-filters used for block-edge decorrelation
* [https://tools.ietf.org/html/draft-valin-videocodec-pvq Perceptual Vector Quantisation (PVQ)]
* [https://tools.ietf.org/html/draft-terriberry-codingtools Coding Tools] - documents Entropy Coding, Integer Transforms and other techniques
* [https://tools.ietf.org/html/draft-moffitt-netvc-requirements Internet Video Codec (NetVC) Requirements] - explains what requirements and use cases Daala is trying to cater for
* [https://tools.ietf.org/html/draft-daede-netvc-testing Internet Video Codec (NetVC) Testing and Quality Measurement]
* [https://tools.ietf.org/html/draft-terriberry-ipr-license Example IPR Licence Terms]

Additional drafts can be found at the [https://datatracker.ietf.org/wg/netvc/documents/ IETF DataTracker].

== Presentations ==
For a more in-depth look at the IETF's NetVC Meetings, use the [https://datatracker.ietf.org/wg/netvc/meetings/ IETF DataTracker].

* 2017-11-15 - IETF 100 - [https://www.youtube.com/watch?v=_wRLR8ypCg0&t=4682s Video] - Slides?
* 2017-05-25 - IETF 98 - [https://datatracker.ietf.org/meeting/98/materials/slides-98-hackathon-netvc/ NetVC Hackathon Slides], [https://jmvalin.ca/video/cdef_slides.pdf CDEF Slides]
* 2017-02-05 - FOSDEM 2017 - [https://video.fosdem.org/2017/K.3.401/om_av1.vp8.webm Video] - [https://fosdem.org/2017/schedule/event/om_av1/attachments/slides/1795/export/events/attachments/om_av1/slides/1795/av1_update.pdf Slides]
* 2017-01-17 - Linux Conf AU - [https://www.youtube.com/watch?v=lzPaldsmJbk Video] - [https://people.xiph.org/~tterribe/pubs/lca2017/aom.pdf Slides]
-------
* 2016-11-15 - IETF 97 - [https://datatracker.ietf.org/meeting/97/agenda/netvc/ Agenda] - [http://recs.conf.meetecho.com/Playout/watch.jsp?recording=IETF97_NETVC&chapter=chapter_1 Video and Chat] - [https://datatracker.ietf.org/meeting/97/session/netvc/ Slides]
* 2016-09-04 - VideoLAN Dev Days 2016 - [https://www.youtube.co.uk/watch?v=AOssZFJ0EdI Video] - [http://people.xiph.org/~tterribe/daala/vdd2016.pdf Slides]
* 2016-08-31 - SPIE Royalty-free Video - [http://spie.org/OPO/conferencedetails/digital-image-processing#session-7 Schedule and Abstracts] - [https://www.youtube.co.uk/watch?v=wi1BefrfTos&t=41m40s Video] - [http://people.xiph.org/~tterribe/daala/daala-spie-adip2016-slides.pdf Slides]
* 2016-07-18 - IETF 96 - [https://datatracker.ietf.org/meeting/96/agenda/netvc/ Agenda] - [http://recs.conf.meetecho.com/Playout/watch.jsp?recording=IETF96_NETVC&chapter=chapter_1 Video and Chat] - [https://datatracker.ietf.org/meeting/96/session/netvc/ Slides]
* 2016-06-27 - Daala's Entropy Coder - [http://people.xiph.org/~tterribe/daala/daala_ec.pdf Slides]
* 2016-04-07 - IETF 95 - [https://datatracker.ietf.org/meeting/95/agenda/netvc/ Agenda] - [http://recs.conf.meetecho.com/Playout/watch.jsp?recording=IETF95_NETVC&chapter=chapter_1 Video and Chat] - [https://www.ietf.org/proceedings/95/slides/slides-95-netvc-2.pdf Slides] - [https://datatracker.ietf.org/meeting/95/session/netvc/ Other Materials]
* 2016-04-01 - DCC 2016 - [https://people.xiph.org/~unlord/Daala-DCC2016.pdf Slides] - [https://people.xiph.org/~unlord/1853a466.pdf Paper]
* 2016-01-30 - FOSDEM 2016 - [https://fosdem.org/2016/schedule/event/daala/ Summary] - [https://video.fosdem.org/2016/h2214/implementing-a-native-daala-decoder-in-ffmpeg.mp4 Video (MP4, 57MB)] - [https://people.xiph.org/~unlord/Xmi.pdf Slides]
-------
* 2015-11-02 - IETF 94 - [https://datatracker.ietf.org/meeting/94/agenda/netvc/ Agenda] - [http://www.meetecho.com/ietf94/netvc Video and Chat] - [https://datatracker.ietf.org/meeting/94/session/netvc/ Other Materials]
* 2015-10-24 - LinuxDay 24 (Turin) - [https://people.xiph.org/~tterribe/daala/linuxday24.pdf Slides]
* 2015-10-21 - MPEG 113 - Future Video Coding Workshop - [https://people.xiph.org/~tterribe/daala/mpeg113.pdf Slides]
* 2015-09-19 - VideoLAN Dev Days - [https://www.youtube.com/playlist?list=PLQLpBN3oI7E44HIdTOovThc1MNHLchgHE YouTube Playlist] - [https://people.xiph.org/~tterribe/daala/vdd2015.pdf Daala Slides]
* 2015-07-22 - IETF 93 - [http://recordings.conf.meetecho.com/Playout/watch.jsp?recording=IETF93_NETVC_II&chapter=chapter_1 NetVC Session 2/2] - [https://datatracker.ietf.org/meeting/93/agenda/netvc/ Agenda] - [http://www.meetecho.com/ietf93/netvc_II Video and Chat] - [https://www.ietf.org/proceedings/93/slides/slides-93-netvc-4.pdf Slides] - [https://www.ietf.org/jabber/logs/netvc/2015-07-22.html Jabber Log] - [https://datatracker.ietf.org/meeting/93/session/netvc/ Other Materials]
* 2015-07-20 - IETF 93 - [http://recordings.conf.meetecho.com/Playout/watch.jsp?recording=IETF93_NETVC&chapter=chapter_1 NetVC Session 1/2] - [https://datatracker.ietf.org/meeting/93/agenda/netvc/ Agenda] - [http://www.meetecho.com/ietf93/netvc Video and Chat] - [https://www.ietf.org/proceedings/93/slides/slides-93-netvc-3.pdf Slides] - [https://www.ietf.org/jabber/logs/netvc/2015-07-20.html Jabber Log]
* 2015-03-24 - IETF 92 - [http://recordings.conf.meetecho.com/Playout/watch.jsp?recording=IETF92_NETVC&chapter=chapter_0 NetVC Session] - Audio as [https://people.xiph.org/~tdaede/audio/ietf92-venetian-20150324-0900-am1.opus Opus] (29MB) or [https://www.ietf.org/audio/ietf92/ietf92-venetian-20150324-0900-am1.mp3 MP3] (119MB, action starts at 14:50) - [https://www.ietf.org/proceedings/92/slides/slides-92-netvc-0.pdf Slides] - [https://www.ietf.org/mail-archive/web/video-codec/current/msg00235.html Notes] - [https://www.ietf.org/jabber/logs/netvc/2015-03-24.html Jabber Log]
* 2015-02-11 - SPIE talks:
** [http://people.xiph.org/~tdaede/video/SPIE_Nathan.webm Chroma from Luma (CfL)] - [https://people.xiph.org/~unlord/SPIE-2015-CfL.pdf Slides] - [https://people.xiph.org/~unlord/spie_cfl.pdf Paper]
** [http://people.xiph.org/~tdaede/video/SPIE_PVQ.webm Perceptual Vector Quantisation (PVQ)] - [http://people.xiph.org/~tterribe/daala/spie_pvq_slides.pdf Slides] - [http://jmvalin.ca/papers/spie_pvq.pdf Paper]
** [http://people.xiph.org/~tdaede/video/SPIE_Tim.webm Adaptive Motion Compensation Without Blocking Artifacts] - [http://people.xiph.org/~tterribe/daala/spie_obmc_slides.pdf Slides] - [https://people.xiph.org/~tterribe/daala/vbsobmc.pdf Paper]
* 2015-01-31 - [http://ftp.osuosl.org/pub/fosdem/2015/devroom-open_media/daala.mp4 Daala Project Update at FOSDEM 2015] - [https://fosdem.org/2015/schedule/event/daala/ summary] - [https://fosdem.org/2015/schedule/event/daala/attachments/slides/569/export/events/attachments/daala/slides/569/Daala_FOSDEM_2015.pdf Slides]
* 2015-01-14 - [https://www.youtube.co.uk/watch?v=Dmho4gcRvQ4 Linux Conf 2015] - [http://lca2015.linux.org.au/schedule/30187/view_talk presentation summary] - [https://people.xiph.org/~tterribe/pubs/lca2015/daala.pdf Slides]
-------
* 2014-09-16 - [https://air.mozilla.org/daala-are-we-compressed-yet/ Daala: Are We Compressed Yet?]
* 2014-06-25 - [https://air.mozilla.org/sparsity-induced-prediction-for-images-and-video/ Sparsity Induced Prediction for Images and Video]
* 2014-06-06 - VP9 Summit (no video available) - [https://people.xiph.org/~xiphmont/demo/daala/daala-vp9summit-20140606.pdf Slides]
-------
* 2013-10-23 - [https://people.xiph.org/~xiphmont/video/Free_Codecs_Update_Opus_and_Daala.ogv Opus and Daala: State of the Art Royalty-free Codecs] - [https://people.xiph.org/~greg/gstreamer-daala-opus.pdf Slides]
* 2013-09-30 - [https://people.xiph.org/~tterribe/daala/coding_party2/?C=M;O=A Daala Coding Party 2] - [https://people.xiph.org/~unlord/Daala-Intra.pdf Slides]
* 2013-05-02 - [https://people.xiph.org/~xiphmont/tim-terriberry-presents-daala/ Tim Terriberry Presents Daala]
-------
* 2012-01-24 - [https://media.basilgohar.com/derf-talks/?C=M;O=A Introduction to Video Coding] - [https://people.xiph.org/~tterribe/pubs/lca2012/auckland/intro_to_video1.pdf Slides] (no video for slides 1-50)

== Other Websites ==
* [https://www.youtube.com/playlist?list=PLEeMksZoEQ1xQEuLF50w0RwDwLgDGwSG- Daala Presentations on YouTube]
* [https://www.youtube.com/playlist?list=PLOU2XLYxmsIJGErt5rrCqaSGTMyyqNt2H Google's Compressor Head videos] - a beginner's introduction to the world of data compression
* [https://www.zazzle.com/daala_tee_shirt-235139149596175944 Daala T-shirts] - if you'd like a free one, help out with the project and ask the Mozilla guys nicely for one :-)
* [https://www.xiph.org/donate/ Donate to Xiph.Org]
* [[Daala_on_Wheels|Historical Daala wiki page]]

[[Category:Daala]]

Daala

2017-11-17T22:18:31Z

MrZeus: /* Presentations */ add IETF 98 slides

Daala is the codename for a new video compression technology.

The effort is a collaboration between the [https://www.mozilla.org/en-US/research/ Mozilla Foundation], the [https://www.xiph.org/ Xiph.Org Foundation] and any other contributors that wish to help.

The goal of the project is to provide a video format that's free to implement, use and distribute, and a reference implementation with technical performance superior to [https://en.wikipedia.org/wiki/High_Efficiency_Video_Coding H.265].

Please see the links below or the [https://www.xiph.org/daala/ main page] for more information.

== Wiki Pages ==
* [[Daala Quickstart|Daala Quickstart (Linux/MacOS)]]
* [[Daala Quickstart Windows|Daala Quickstart (Windows)]]
* [[Daala MinGW64 Environment]]

* [[Daala Weekly Meetings|Daala Weekly Meetings]]

* [[AreWeCompressedYet]]
* [[RD Curve Data Format]]

* [[DaalaTodo|Daala To-do List]]
* [[DaalaRoadmap|Daala Roadmap]]

* [[Intra|Intra-prediction within Daala]]

* [[Videos|Digital Primers]] - educational videos about audio/video technology

== Communication ==
You are '''encouraged''' to join the
* [irc://irc.freenode.net/daala '''#daala''' IRC channel at freenode.net] - if you don't have an IRC client, you can use Freenode's '''[https://webchat.freenode.net/?channels=%23daala webchat]''' instead.
* [http://lists.xiph.org/mailman/listinfo/daala Daala Email List]

=== Weekly Meetings ===
You are also welcome to attend the public [[Daala Weekly Meetings|weekly progress meetings]] by installing and using [http://wiki.mumble.info Mumble]. 
The address is '''mf4.xiph.org''' and the port is '''64738''' (you can run '''mumble://mf4.xiph.org:64738''' within your browser as a shortcut). 
The meetings occur on '''Tuesdays''' at '''[http://www.timeanddate.com/worldclock/fixedtime.html?msg=Daala+Weekly+Meeting&iso=20150428T09&p1=1241 9AM Pacific Time]''' (5PM UTC/GMT while Pacific Standard Time is in effect, 4PM UTC/GMT during Pacific Daylight Time).
The meeting agenda used to be available at '''[https://daala.etherpad.mozilla.org/weekly-meeting this Etherpad]'''; the October 13, 2015 meeting is available on [https://docs.google.com/document/d/1JP_Ko3wPuyDWhooZcp_m9kndyfZ75xN5YOi5yIMCW0s/edit?pli=1 Google Docs]. Following the migration to Etherpad Lite, the meeting agenda and minutes are now available at [https://public.etherpad-mozilla.org/p/daala-weekly-meeting this Etherpad].

=== Other ===
* [http://forum.doom9.org/showthread.php?t=168004 Doom9 Forum discussion] - generic forum thread regarding Daala
* <del>[https://daala.etherpad.mozilla.org/ep/padlist/all-pads Daala Etherpads] - you can [https://daala.etherpad.mozilla.org/ep/account/request-account request a free account] to view these. You should receive access within a few days.</del> Mozilla are transitioning to Etherpad Lite.
* [http://benjamin.smedbergs.us/weekly-updates.fcgi/project/daala Daala Project Status Board] - what Daala bits the Mozilla people are working on

== Coding ==
You can get a copy of the latest Daala Source Code from [https://git.xiph.org/?p=daala.git;a=summary '''git.xiph.org'''] or [https://github.com/xiph/daala '''GitHub''']. Please stick to the '''[https://git.xiph.org/?p=daala.git;a=blob_plain;f=doc/coding_style.html Coding Style Guide]'''.

* [https://review.xiph.org/all?limit=100 Xiph Code Reviews] - there is a proposal on the review process '''[[DaalaReview|here]]'''
* [https://github.com/xiph/daala/issues Daala's issues] - issue/bug tracker on GitHub
* [https://mf4.xiph.org/jenkins/view/daala/ Continuous Integration Tests] - these run every time a new commit is made to the Daala git master, to make sure the new code hasn't broken existing functionality.
* [[OggDaala]] - definitions for embedding Daala video within an [[Ogg]] container.

== Demos ==
* [https://people.xiph.org/~xiphmont/demo/daala/player-demo.shtml Daala Video Player] - an example implementation of a Daala decoder and player, ported to JavaScript using [https://github.com/kripken/emscripten Emscripten].

=== Codec Techniques ===
* [https://people.xiph.org/~xiphmont/demo/ Demo Articles] - explanations on certain techniques used in Daala (and other Xiph.Org projects)
* [http://exp.martres.me/edi/ Edge-Directed Interpolation] ([https://github.com/smarter/edi source code])
* [https://people.xiph.org/~ds/edi/info.html More Edge-Directed Interpolation]
* [https://people.xiph.org/~unlord/demo/intra.html Intra-prediction]
* [https://people.xiph.org/~unlord/zigzags.html Macroblock Coefficient Zigzag Graph] - HTML page generated using [https://github.com/xiph/daala/blob/master/tools/draw_zigzags.c tools/draw_zigzags.c] from the Daala source code.
* [https://jmvalin.ca/video/haar_example/ Still Image Screenshots] - comparison between Daala's Lapped Transform and Haar methods, and JPEG/x264/x265.

== Documents ==
* [https://people.xiph.org/~unlord/spie_cfl.pdf Chroma from Luma (CfL)]
* [http://jmvalin.ca/papers/spie_pvq.pdf Perceptual Vector Quantisation (PVQ)] - see also [https://people.xiph.org/~yushin/ietf/draft-cho-netvc-applypvq.html Applying PVQ Outside Daala]
* [https://people.xiph.org/~tterribe/daala/vbsobmc.pdf Overlapped Block Motion Compensation (OBMC)]
* [https://mf4.xiph.org/jenkins/job/daala-autotools/ws/doc/html/index.html C API Documentation]
* [https://people.xiph.org/~yushin/tmp__/yushin_phd_thesis.pdf Image Coding Thesis] by Yushin Cho
* [http://arxiv.org/pdf/1411.4290v1.pdf Maximising Coding Efficiency Through Block Rotation] and why it [http://lists.xiph.org/pipermail/daala/2015-January/000054.html won't work well within Daala]
* [http://jmvalin.ca/video/theoretical_results.pdf JMSpeex' Journal of Dubious Theoretical Results] - "take with an entire shaker-full of salt"
* [https://people.xiph.org/~unlord/pcs_daala.pdf Using Daala Intra Frames for Still Picture Coding]
* [http://arxiv.org/abs/1602.05975 The Daala Directional Deringing Filter]
* [https://people.xiph.org/~unlord/icip2016.pdf Daala: A Perceptually-Driven Still Image Codec] (draft) - submitted to [http://2016.ieeeicip.org/ ICIP 2016].
* [http://people.xiph.org/~tterribe/daala/neon_tutorial.pdf SIMD Assembly Tutorial: ARM NEON]
* [https://jmvalin.ca/video/mmsp2016_poster.pdf Daala Technologies Poster]

=== IETF Drafts ===
* [https://tools.ietf.org/html/draft-egge-videocodec-tdlt Time-Domain Lapped Transforms (TDLT)] - documents the Lapped Transform pre- and post-filters used for block-edge decorrelation
* [https://tools.ietf.org/html/draft-valin-videocodec-pvq Perceptual Vector Quantisation (PVQ)]
* [https://tools.ietf.org/html/draft-terriberry-codingtools Coding Tools] - documents Entropy Coding, Integer Transforms and other techniques
* [https://tools.ietf.org/html/draft-moffitt-netvc-requirements Internet Video Codec (NetVC) Requirements] - explains what requirements and use cases Daala is trying to cater for
* [https://tools.ietf.org/html/draft-daede-netvc-testing Internet Video Codec (NetVC) Testing and Quality Measurement]
* [https://tools.ietf.org/html/draft-terriberry-ipr-license Example IPR Licence Terms]

Additional drafts can be found at the [https://datatracker.ietf.org/wg/netvc/documents/ IETF DataTracker].

== Presentations ==
For a more in-depth look at the IETF's NetVC Meetings, use the [https://datatracker.ietf.org/wg/netvc/meetings/ IETF DataTracker].

* 2017-11-15 - IETF 100 - [https://www.youtube.com/watch?v=_wRLR8ypCg0&t=4682s Video] - Slides?
* 2017-05-25 - IETF 98 - [https://datatracker.ietf.org/meeting/98/materials/slides-98-hackathon-netvc/ NetVC Hackathon Slides]
* 2017-02-05 - FOSDEM 2017 - [https://video.fosdem.org/2017/K.3.401/om_av1.vp8.webm Video] - [https://fosdem.org/2017/schedule/event/om_av1/attachments/slides/1795/export/events/attachments/om_av1/slides/1795/av1_update.pdf Slides]
* 2017-01-17 - Linux Conf AU - [https://www.youtube.com/watch?v=lzPaldsmJbk Video] - [https://people.xiph.org/~tterribe/pubs/lca2017/aom.pdf Slides]
-------
* 2016-11-15 - IETF 97 - [https://datatracker.ietf.org/meeting/97/agenda/netvc/ Agenda] - [http://recs.conf.meetecho.com/Playout/watch.jsp?recording=IETF97_NETVC&chapter=chapter_1 Video and Chat] - [https://datatracker.ietf.org/meeting/97/session/netvc/ Slides]
* 2016-09-04 - VideoLAN Dev Days 2016 - [https://www.youtube.co.uk/watch?v=AOssZFJ0EdI Video] - [http://people.xiph.org/~tterribe/daala/vdd2016.pdf Slides]
* 2016-08-31 - SPIE Royalty-free Video - [http://spie.org/OPO/conferencedetails/digital-image-processing#session-7 Schedule and Abstracts] - [https://www.youtube.co.uk/watch?v=wi1BefrfTos&t=41m40s Video] - [http://people.xiph.org/~tterribe/daala/daala-spie-adip2016-slides.pdf Slides]
* 2016-07-18 - IETF 96 - [https://datatracker.ietf.org/meeting/96/agenda/netvc/ Agenda] - [http://recs.conf.meetecho.com/Playout/watch.jsp?recording=IETF96_NETVC&chapter=chapter_1 Video and Chat] - [https://datatracker.ietf.org/meeting/96/session/netvc/ Slides]
* 2016-06-27 - Daala's Entropy Coder - [http://people.xiph.org/~tterribe/daala/daala_ec.pdf Slides]
* 2016-04-07 - IETF 95 - [https://datatracker.ietf.org/meeting/95/agenda/netvc/ Agenda] - [http://recs.conf.meetecho.com/Playout/watch.jsp?recording=IETF95_NETVC&chapter=chapter_1 Video and Chat] - [https://www.ietf.org/proceedings/95/slides/slides-95-netvc-2.pdf Slides] - [https://datatracker.ietf.org/meeting/95/session/netvc/ Other Materials]
* 2016-04-01 - DCC 2016 - [https://people.xiph.org/~unlord/Daala-DCC2016.pdf Slides] - [https://people.xiph.org/~unlord/1853a466.pdf Paper]
* 2016-01-30 - FOSDEM 2016 - [https://fosdem.org/2016/schedule/event/daala/ Summary] - [https://video.fosdem.org/2016/h2214/implementing-a-native-daala-decoder-in-ffmpeg.mp4 Video (MP4, 57MB)] - [https://people.xiph.org/~unlord/Xmi.pdf Slides]
-------
* 2015-11-02 - IETF 94 - [https://datatracker.ietf.org/meeting/94/agenda/netvc/ Agenda] - [http://www.meetecho.com/ietf94/netvc Video and Chat] - [https://datatracker.ietf.org/meeting/94/session/netvc/ Other Materials]
* 2015-10-24 - LinuxDay 24 (Turin) - [https://people.xiph.org/~tterribe/daala/linuxday24.pdf Slides]
* 2015-10-21 - MPEG 113 - Future Video Coding Workshop - [https://people.xiph.org/~tterribe/daala/mpeg113.pdf Slides]
* 2015-09-19 - VideoLAN Dev Days - [https://www.youtube.com/playlist?list=PLQLpBN3oI7E44HIdTOovThc1MNHLchgHE YouTube Playlist] - [https://people.xiph.org/~tterribe/daala/vdd2015.pdf Daala Slides]
* 2015-07-22 - IETF 93 - [http://recordings.conf.meetecho.com/Playout/watch.jsp?recording=IETF93_NETVC_II&chapter=chapter_1 NetVC Session 2/2] - [https://datatracker.ietf.org/meeting/93/agenda/netvc/ Agenda] - [http://www.meetecho.com/ietf93/netvc_II Video and Chat] - [https://www.ietf.org/proceedings/93/slides/slides-93-netvc-4.pdf Slides] - [https://www.ietf.org/jabber/logs/netvc/2015-07-22.html Jabber Log] - [https://datatracker.ietf.org/meeting/93/session/netvc/ Other Materials]
* 2015-07-20 - IETF 93 - [http://recordings.conf.meetecho.com/Playout/watch.jsp?recording=IETF93_NETVC&chapter=chapter_1 NetVC Session 1/2] - [https://datatracker.ietf.org/meeting/93/agenda/netvc/ Agenda] - [http://www.meetecho.com/ietf93/netvc Video and Chat] - [https://www.ietf.org/proceedings/93/slides/slides-93-netvc-3.pdf Slides] - [https://www.ietf.org/jabber/logs/netvc/2015-07-20.html Jabber Log]
* 2015-03-24 - IETF 92 - [http://recordings.conf.meetecho.com/Playout/watch.jsp?recording=IETF92_NETVC&chapter=chapter_0 NetVC Session] - Audio as [https://people.xiph.org/~tdaede/audio/ietf92-venetian-20150324-0900-am1.opus Opus] (29MB) or [https://www.ietf.org/audio/ietf92/ietf92-venetian-20150324-0900-am1.mp3 MP3] (119MB, action starts at 14:50) - [https://www.ietf.org/proceedings/92/slides/slides-92-netvc-0.pdf Slides] - [https://www.ietf.org/mail-archive/web/video-codec/current/msg00235.html Notes] - [https://www.ietf.org/jabber/logs/netvc/2015-03-24.html Jabber Log]
* 2015-02-11 - SPIE talks:
** [http://people.xiph.org/~tdaede/video/SPIE_Nathan.webm Chroma from Luma (CfL)] - [https://people.xiph.org/~unlord/SPIE-2015-CfL.pdf Slides] - [https://people.xiph.org/~unlord/spie_cfl.pdf Paper]
** [http://people.xiph.org/~tdaede/video/SPIE_PVQ.webm Perceptual Vector Quantisation (PVQ)] - [http://people.xiph.org/~tterribe/daala/spie_pvq_slides.pdf Slides] - [http://jmvalin.ca/papers/spie_pvq.pdf Paper]
** [http://people.xiph.org/~tdaede/video/SPIE_Tim.webm Adaptive Motion Compensation Without Blocking Artifacts] - [http://people.xiph.org/~tterribe/daala/spie_obmc_slides.pdf Slides] - [https://people.xiph.org/~tterribe/daala/vbsobmc.pdf Paper]
* 2015-01-31 - [http://ftp.osuosl.org/pub/fosdem/2015/devroom-open_media/daala.mp4 Daala Project Update at FOSDEM 2015] - [https://fosdem.org/2015/schedule/event/daala/ summary] - [https://fosdem.org/2015/schedule/event/daala/attachments/slides/569/export/events/attachments/daala/slides/569/Daala_FOSDEM_2015.pdf Slides]
* 2015-01-14 - [https://www.youtube.co.uk/watch?v=Dmho4gcRvQ4 Linux Conf 2015] - [http://lca2015.linux.org.au/schedule/30187/view_talk presentation summary] - [https://people.xiph.org/~tterribe/pubs/lca2015/daala.pdf Slides]
-------
* 2014-09-16 - [https://air.mozilla.org/daala-are-we-compressed-yet/ Daala: Are We Compressed Yet?]
* 2014-06-25 - [https://air.mozilla.org/sparsity-induced-prediction-for-images-and-video/ Sparsity Induced Prediction for Images and Video]
* 2014-06-06 - VP9 Summit (no video available) - [https://people.xiph.org/~xiphmont/demo/daala/daala-vp9summit-20140606.pdf Slides]
-------
* 2013-10-23 - [https://people.xiph.org/~xiphmont/video/Free_Codecs_Update_Opus_and_Daala.ogv Opus and Daala: State of the Art Royalty-free Codecs] - [https://people.xiph.org/~greg/gstreamer-daala-opus.pdf Slides]
* 2013-09-30 - [https://people.xiph.org/~tterribe/daala/coding_party2/?C=M;O=A Daala Coding Party 2] - [https://people.xiph.org/~unlord/Daala-Intra.pdf Slides]
* 2013-05-02 - [https://people.xiph.org/~xiphmont/tim-terriberry-presents-daala/ Tim Terriberry Presents Daala]
-------
* 2012-01-24 - [https://media.basilgohar.com/derf-talks/?C=M;O=A Introduction to Video Coding] - [https://people.xiph.org/~tterribe/pubs/lca2012/auckland/intro_to_video1.pdf Slides] (no video for slides 1-50)

== Other Websites ==
* [https://www.youtube.com/playlist?list=PLEeMksZoEQ1xQEuLF50w0RwDwLgDGwSG- Daala Presentations on YouTube]
* [https://www.youtube.com/playlist?list=PLOU2XLYxmsIJGErt5rrCqaSGTMyyqNt2H Google's Compressor Head videos] - a beginner's introduction to the world of data compression
* [https://www.zazzle.com/daala_tee_shirt-235139149596175944 Daala T-shirts] - if you'd like a free one, help out with the project and ask the Mozilla guys nicely for one :-)
* [https://www.xiph.org/donate/ Donate to Xiph.Org]
* [[Daala_on_Wheels|Historical Daala wiki page]]

[[Category:Daala]]

Daala

2017-11-17T22:01:28Z

MrZeus: /* Presentations */ add IETF 100 CfL video

Daala is the codename for a new video compression technology.

The effort is a collaboration between the [https://www.mozilla.org/en-US/research/ Mozilla Foundation], the [https://www.xiph.org/ Xiph.Org Foundation] and any other contributors that wish to help.

The goal of the project is to provide a video format that's free to implement, use and distribute, and a reference implementation with technical performance superior to [https://en.wikipedia.org/wiki/High_Efficiency_Video_Coding H.265].

Please see the links below or the [https://www.xiph.org/daala/ main page] for more information.

== Wiki Pages ==
* [[Daala Quickstart|Daala Quickstart (Linux/MacOS)]]
* [[Daala Quickstart Windows|Daala Quickstart (Windows)]]
* [[Daala MinGW64 Environment]]

* [[Daala Weekly Meetings|Daala Weekly Meetings]]

* [[AreWeCompressedYet]]
* [[RD Curve Data Format]]

* [[DaalaTodo|Daala To-do List]]
* [[DaalaRoadmap|Daala Roadmap]]

* [[Intra|Intra-prediction within Daala]]

* [[Videos|Digital Primers]] - educational videos about audio/video technology

== Communication ==
You are '''encouraged''' to join the
* [irc://irc.freenode.net/daala '''#daala''' IRC channel at freenode.net] - if you don't have an IRC client, you can use Freenode's '''[https://webchat.freenode.net/?channels=%23daala webchat]''' instead.
* [http://lists.xiph.org/mailman/listinfo/daala Daala Email List]

=== Weekly Meetings ===
You are also welcome to attend the public [[Daala Weekly Meetings|weekly progress meetings]] by installing and using [http://wiki.mumble.info Mumble]. 
The address is '''mf4.xiph.org''' and the port is '''64738''' (you can run '''mumble://mf4.xiph.org:64738''' within your browser as a shortcut). 
The meetings occur on '''Tuesdays''' at '''[http://www.timeanddate.com/worldclock/fixedtime.html?msg=Daala+Weekly+Meeting&iso=20150428T09&p1=1241 9AM Pacific Time]''' (5PM UTC/GMT while Pacific Standard Time is in effect, 4PM UTC/GMT during Pacific Daylight Time).
The meeting agenda used to be available at '''[https://daala.etherpad.mozilla.org/weekly-meeting this Etherpad]'''; the October 13, 2015 meeting is available on [https://docs.google.com/document/d/1JP_Ko3wPuyDWhooZcp_m9kndyfZ75xN5YOi5yIMCW0s/edit?pli=1 Google Docs]. Following the migration to Etherpad Lite, the meeting agenda and minutes are now available at [https://public.etherpad-mozilla.org/p/daala-weekly-meeting this Etherpad].

=== Other ===
* [http://forum.doom9.org/showthread.php?t=168004 Doom9 Forum discussion] - generic forum thread regarding Daala
* <del>[https://daala.etherpad.mozilla.org/ep/padlist/all-pads Daala Etherpads] - you can [https://daala.etherpad.mozilla.org/ep/account/request-account request a free account] to view these. You should receive access within a few days.</del> Mozilla are transitioning to Etherpad Lite.
* [http://benjamin.smedbergs.us/weekly-updates.fcgi/project/daala Daala Project Status Board] - what Daala bits the Mozilla people are working on

== Coding ==
You can get a copy of the latest Daala Source Code from [https://git.xiph.org/?p=daala.git;a=summary '''git.xiph.org'''] or [https://github.com/xiph/daala '''GitHub''']. Please stick to the '''[https://git.xiph.org/?p=daala.git;a=blob_plain;f=doc/coding_style.html Coding Style Guide]'''.

* [https://review.xiph.org/all?limit=100 Xiph Code Reviews] - there is a proposal on the review process '''[[DaalaReview|here]]'''
* [https://github.com/xiph/daala/issues Daala's issues] - issue/bug tracker on GitHub
* [https://mf4.xiph.org/jenkins/view/daala/ Continuous Integration Tests] - these run every time a new commit is made to the Daala git master, to make sure the new code hasn't broken existing functionality.
* [[OggDaala]] - definitions for embedding Daala video within an [[Ogg]] container.

== Demos ==
* [https://people.xiph.org/~xiphmont/demo/daala/player-demo.shtml Daala Video Player] - an example implementation of a Daala decoder and player, ported to JavaScript using [https://github.com/kripken/emscripten Emscripten].

=== Codec Techniques ===
* [https://people.xiph.org/~xiphmont/demo/ Demo Articles] - explanations on certain techniques used in Daala (and other Xiph.Org projects)
* [http://exp.martres.me/edi/ Edge-Directed Interpolation] ([https://github.com/smarter/edi source code])
* [https://people.xiph.org/~ds/edi/info.html More Edge-Directed Interpolation]
* [https://people.xiph.org/~unlord/demo/intra.html Intra-prediction]
* [https://people.xiph.org/~unlord/zigzags.html Macroblock Coefficient Zigzag Graph] - HTML page generated using [https://github.com/xiph/daala/blob/master/tools/draw_zigzags.c tools/draw_zigzags.c] from the Daala source code.
* [https://jmvalin.ca/video/haar_example/ Still Image Screenshots] - comparison between Daala's Lapped Transform and Haar methods, and JPEG/x264/x265.

== Documents ==
* [https://people.xiph.org/~unlord/spie_cfl.pdf Chroma from Luma (CfL)]
* [http://jmvalin.ca/papers/spie_pvq.pdf Perceptual Vector Quantisation (PVQ)] - see also [https://people.xiph.org/~yushin/ietf/draft-cho-netvc-applypvq.html Applying PVQ Outside Daala]
* [https://people.xiph.org/~tterribe/daala/vbsobmc.pdf Overlapped Block Motion Compensation (OBMC)]
* [https://mf4.xiph.org/jenkins/job/daala-autotools/ws/doc/html/index.html C API Documentation]
* [https://people.xiph.org/~yushin/tmp__/yushin_phd_thesis.pdf Image Coding Thesis] by Yushin Cho
* [http://arxiv.org/pdf/1411.4290v1.pdf Maximising Coding Efficiency Through Block Rotation] and why it [http://lists.xiph.org/pipermail/daala/2015-January/000054.html won't work well within Daala]
* [http://jmvalin.ca/video/theoretical_results.pdf JMSpeex' Journal of Dubious Theoretical Results] - "take with an entire shaker-full of salt"
* [https://people.xiph.org/~unlord/pcs_daala.pdf Using Daala Intra Frames for Still Picture Coding]
* [http://arxiv.org/abs/1602.05975 The Daala Directional Deringing Filter]
* [https://people.xiph.org/~unlord/icip2016.pdf Daala: A Perceptually-Driven Still Image Codec] (draft) - submitted to [http://2016.ieeeicip.org/ ICIP 2016].
* [http://people.xiph.org/~tterribe/daala/neon_tutorial.pdf SIMD Assembly Tutorial: ARM NEON]
* [https://jmvalin.ca/video/mmsp2016_poster.pdf Daala Technologies Poster]

=== IETF Drafts ===
* [https://tools.ietf.org/html/draft-egge-videocodec-tdlt Time-Domain Lapped Transforms (TDLT)] - documents the Lapped Transform pre- and post-filters used for block-edge decorrelation
* [https://tools.ietf.org/html/draft-valin-videocodec-pvq Perceptual Vector Quantisation (PVQ)]
* [https://tools.ietf.org/html/draft-terriberry-codingtools Coding Tools] - documents Entropy Coding, Integer Transforms and other techniques
* [https://tools.ietf.org/html/draft-moffitt-netvc-requirements Internet Video Codec (NetVC) Requirements] - explains what requirements and use cases Daala is trying to cater for
* [https://tools.ietf.org/html/draft-daede-netvc-testing Internet Video Codec (NetVC) Testing and Quality Measurement]
* [https://tools.ietf.org/html/draft-terriberry-ipr-license Example IPR Licence Terms]

Additional drafts can be found at the [https://datatracker.ietf.org/wg/netvc/documents/ IETF DataTracker].

== Presentations ==
For a more in-depth look at the IETF's NetVC Meetings, use the [https://datatracker.ietf.org/wg/netvc/meetings/ IETF DataTracker].

* 2017-11-15 - IETF 100 - [https://www.youtube.com/watch?v=_wRLR8ypCg0&t=4682s Video] - Slides?
* 2017-02-05 - FOSDEM 2017 - [https://video.fosdem.org/2017/K.3.401/om_av1.vp8.webm Video] - [https://fosdem.org/2017/schedule/event/om_av1/attachments/slides/1795/export/events/attachments/om_av1/slides/1795/av1_update.pdf Slides]
* 2017-01-17 - Linux Conf AU - [https://www.youtube.com/watch?v=lzPaldsmJbk Video] - [https://people.xiph.org/~tterribe/pubs/lca2017/aom.pdf Slides]
-------
* 2016-11-15 - IETF 97 - [https://datatracker.ietf.org/meeting/97/agenda/netvc/ Agenda] - [http://recs.conf.meetecho.com/Playout/watch.jsp?recording=IETF97_NETVC&chapter=chapter_1 Video and Chat] - [https://datatracker.ietf.org/meeting/97/session/netvc/ Slides]
* 2016-09-04 - VideoLAN Dev Days 2016 - [https://www.youtube.co.uk/watch?v=AOssZFJ0EdI Video] - [http://people.xiph.org/~tterribe/daala/vdd2016.pdf Slides]
* 2016-08-31 - SPIE Royalty-free Video - [http://spie.org/OPO/conferencedetails/digital-image-processing#session-7 Schedule and Abstracts] - [https://www.youtube.co.uk/watch?v=wi1BefrfTos&t=41m40s Video] - [http://people.xiph.org/~tterribe/daala/daala-spie-adip2016-slides.pdf Slides]
* 2016-07-18 - IETF 96 - [https://datatracker.ietf.org/meeting/96/agenda/netvc/ Agenda] - [http://recs.conf.meetecho.com/Playout/watch.jsp?recording=IETF96_NETVC&chapter=chapter_1 Video and Chat] - [https://datatracker.ietf.org/meeting/96/session/netvc/ Slides]
* 2016-06-27 - Daala's Entropy Coder - [http://people.xiph.org/~tterribe/daala/daala_ec.pdf Slides]
* 2016-04-07 - IETF 95 - [https://datatracker.ietf.org/meeting/95/agenda/netvc/ Agenda] - [http://recs.conf.meetecho.com/Playout/watch.jsp?recording=IETF95_NETVC&chapter=chapter_1 Video and Chat] - [https://www.ietf.org/proceedings/95/slides/slides-95-netvc-2.pdf Slides] - [https://datatracker.ietf.org/meeting/95/session/netvc/ Other Materials]
* 2016-04-01 - DCC 2016 - [https://people.xiph.org/~unlord/Daala-DCC2016.pdf Slides] - [https://people.xiph.org/~unlord/1853a466.pdf Paper]
* 2016-01-30 - FOSDEM 2016 - [https://fosdem.org/2016/schedule/event/daala/ Summary] - [https://video.fosdem.org/2016/h2214/implementing-a-native-daala-decoder-in-ffmpeg.mp4 Video (MP4, 57MB)] - [https://people.xiph.org/~unlord/Xmi.pdf Slides]
-------
* 2015-11-02 - IETF 94 - [https://datatracker.ietf.org/meeting/94/agenda/netvc/ Agenda] - [http://www.meetecho.com/ietf94/netvc Video and Chat] - [https://datatracker.ietf.org/meeting/94/session/netvc/ Other Materials]
* 2015-10-24 - LinuxDay 24 (Turin) - [https://people.xiph.org/~tterribe/daala/linuxday24.pdf Slides]
* 2015-10-21 - MPEG 113 - Future Video Coding Workshop - [https://people.xiph.org/~tterribe/daala/mpeg113.pdf Slides]
* 2015-09-19 - VideoLAN Dev Days - [https://www.youtube.com/playlist?list=PLQLpBN3oI7E44HIdTOovThc1MNHLchgHE YouTube Playlist] - [https://people.xiph.org/~tterribe/daala/vdd2015.pdf Daala Slides]
* 2015-07-22 - IETF 93 - [http://recordings.conf.meetecho.com/Playout/watch.jsp?recording=IETF93_NETVC_II&chapter=chapter_1 NetVC Session 2/2] - [https://datatracker.ietf.org/meeting/93/agenda/netvc/ Agenda] - [http://www.meetecho.com/ietf93/netvc_II Video and Chat] - [https://www.ietf.org/proceedings/93/slides/slides-93-netvc-4.pdf Slides] - [https://www.ietf.org/jabber/logs/netvc/2015-07-22.html Jabber Log] - [https://datatracker.ietf.org/meeting/93/session/netvc/ Other Materials]
* 2015-07-20 - IETF 93 - [http://recordings.conf.meetecho.com/Playout/watch.jsp?recording=IETF93_NETVC&chapter=chapter_1 NetVC Session 1/2] - [https://datatracker.ietf.org/meeting/93/agenda/netvc/ Agenda] - [http://www.meetecho.com/ietf93/netvc Video and Chat] - [https://www.ietf.org/proceedings/93/slides/slides-93-netvc-3.pdf Slides] - [https://www.ietf.org/jabber/logs/netvc/2015-07-20.html Jabber Log]
* 2015-03-24 - IETF 92 - [http://recordings.conf.meetecho.com/Playout/watch.jsp?recording=IETF92_NETVC&chapter=chapter_0 NetVC Session] - Audio as [https://people.xiph.org/~tdaede/audio/ietf92-venetian-20150324-0900-am1.opus Opus] (29MB) or [https://www.ietf.org/audio/ietf92/ietf92-venetian-20150324-0900-am1.mp3 MP3] (119MB, action starts at 14:50) - [https://www.ietf.org/proceedings/92/slides/slides-92-netvc-0.pdf Slides] - [https://www.ietf.org/mail-archive/web/video-codec/current/msg00235.html Notes] - [https://www.ietf.org/jabber/logs/netvc/2015-03-24.html Jabber Log]
* 2015-02-11 - SPIE talks:
** [http://people.xiph.org/~tdaede/video/SPIE_Nathan.webm Chroma from Luma (CfL)] - [https://people.xiph.org/~unlord/SPIE-2015-CfL.pdf Slides] - [https://people.xiph.org/~unlord/spie_cfl.pdf Paper]
** [http://people.xiph.org/~tdaede/video/SPIE_PVQ.webm Perceptual Vector Quantisation (PVQ)] - [http://people.xiph.org/~tterribe/daala/spie_pvq_slides.pdf Slides] - [http://jmvalin.ca/papers/spie_pvq.pdf Paper]
** [http://people.xiph.org/~tdaede/video/SPIE_Tim.webm Adaptive Motion Compensation Without Blocking Artifacts] - [http://people.xiph.org/~tterribe/daala/spie_obmc_slides.pdf Slides] - [https://people.xiph.org/~tterribe/daala/vbsobmc.pdf Paper]
* 2015-01-31 - [http://ftp.osuosl.org/pub/fosdem/2015/devroom-open_media/daala.mp4 Daala Project Update at FOSDEM 2015] - [https://fosdem.org/2015/schedule/event/daala/ summary] - [https://fosdem.org/2015/schedule/event/daala/attachments/slides/569/export/events/attachments/daala/slides/569/Daala_FOSDEM_2015.pdf Slides]
* 2015-01-14 - [https://www.youtube.co.uk/watch?v=Dmho4gcRvQ4 Linux Conf 2015] - [http://lca2015.linux.org.au/schedule/30187/view_talk presentation summary] - [https://people.xiph.org/~tterribe/pubs/lca2015/daala.pdf Slides]
-------
* 2014-09-16 - [https://air.mozilla.org/daala-are-we-compressed-yet/ Daala: Are We Compressed Yet?]
* 2014-06-25 - [https://air.mozilla.org/sparsity-induced-prediction-for-images-and-video/ Sparsity Induced Prediction for Images and Video]
* 2014-06-06 - VP9 Summit (no video available) - [https://people.xiph.org/~xiphmont/demo/daala/daala-vp9summit-20140606.pdf Slides]
-------
* 2013-10-23 - [https://people.xiph.org/~xiphmont/video/Free_Codecs_Update_Opus_and_Daala.ogv Opus and Daala: State of the Art Royalty-free Codecs] - [https://people.xiph.org/~greg/gstreamer-daala-opus.pdf Slides]
* 2013-09-30 - [https://people.xiph.org/~tterribe/daala/coding_party2/?C=M;O=A Daala Coding Party 2] - [https://people.xiph.org/~unlord/Daala-Intra.pdf Slides]
* 2013-05-02 - [https://people.xiph.org/~xiphmont/tim-terriberry-presents-daala/ Tim Terriberry Presents Daala]
-------
* 2012-01-24 - [https://media.basilgohar.com/derf-talks/?C=M;O=A Introduction to Video Coding] - [https://people.xiph.org/~tterribe/pubs/lca2012/auckland/intro_to_video1.pdf Slides] (no video for slides 1-50)

== Other Websites ==
* [https://www.youtube.com/playlist?list=PLEeMksZoEQ1xQEuLF50w0RwDwLgDGwSG- Daala Presentations on YouTube]
* [https://www.youtube.com/playlist?list=PLOU2XLYxmsIJGErt5rrCqaSGTMyyqNt2H Google's Compressor Head videos] - a beginner's introduction to the world of data compression
* [https://www.zazzle.com/daala_tee_shirt-235139149596175944 Daala T-shirts] - if you'd like a free one, help out with the project and ask the Mozilla guys nicely for one :-)
* [https://www.xiph.org/donate/ Donate to Xiph.Org]
* [[Daala_on_Wheels|Historical Daala wiki page]]

[[Category:Daala]]

Speex FAQ

2017-11-09T15:02:27Z

MrZeus: /* Can Speex run on fixed-point processors or DSPs? */

== General ==

=== Why do we need [[Speex]]? [[Vorbis]] is open source and patent-free. ===

[[Vorbis]] is a great project but its goals are not the same as those of Speex. Vorbis is mostly aimed at compressing music and audio in general, while Speex targets speech only. For that reason, Speex can achieve much better results than Vorbis on speech (typically 2-4 times higher compression at equal quality).

=== Isn't there an open source implementation of the [https://en.wikipedia.org/wiki/Full_Rate GSM-FR] codec? Why is Speex necessary? ===

First of all, it's not clear whether [https://en.wikipedia.org/wiki/Full_Rate GSM-FR] is covered by a Philips patent (see http://kbs.cs.tu-berlin.de/~jutta/toast.html). Also, GSM-FR offers mediocre quality at a relatively high bit-rate, while Speex can offer equivalent quality at almost half the bit-rate. Last but not least, Speex offers a wide range of bit-rates and sampling rates, while GSM-FR is limited to 8 kHz speech at 13 kbps.

=== Ogg, Vorbis, Speex, what's the difference? ===

[[Ogg]] is a '''container''' format for holding multimedia data (audio, video, subtitles, etc.).

[[Vorbis]] is an '''audio codec''' that uses Ogg to store its bit-streams as files, hence the name '''Ogg Vorbis'''.

[[Speex]] is a '''speech codec''', that also uses the Ogg format to store its bit-streams as files,
so technically they would be '''Ogg Speex''' files. However, most people refer to them just as Speex files.

One difference between Speex and Vorbis is that Speex is less tied to Ogg.
Actually, if you want to use Speex for Voice over IP (VoIP), you don't need to use an Ogg container at all.

=== What's the extension for Speex files? ===

Speex files tend to have the '''.spx''' extension. Note that the Speex tools (speexenc, speexdec) do not rely on the file extension at all, so any extension will work.

=== Can I use Speex for compressing music? ===

You can, but you'll be better off compressing with Vorbis when it comes to music.
Just as Vorbis is not well suited to speech, Speex is not well suited to music.

=== I converted some MP3s to Speex and the quality is bad. What's wrong? ===

This is called '''[https://en.wikipedia.org/wiki/Transcoding transcoding]''' (converting from one lossy format to another) and it will always result in much poorer quality than the original MP3.

Avoid transcoding speech, unless you have a ''really'' good reason to do so (e.g. you need a smaller file size).
You should also avoid '''tandeming''' (self-transcoding): if you decode a Speex file and re-encode it at the same bit-rate, you ''will'' lose quality.

=== How does Speex compare to other proprietary codecs? ===

It's hard to give precise figures since no formal listening tests have been performed yet. All I can say is that in terms of quality, Speex competes on the same ground as proprietary codecs (not necessarily the best, but not the worst either).

Speex also has many features that are not present in most other codecs. These include variable bit-rate (VBR), integration of narrowband and wideband, as well as stereo support. Of course, another area where Speex is really hard to beat is the quality/price ratio. Unlike many very expensive codecs, Speex is free and anyone may distribute or modify it at will.

=== Can Speex pass [https://en.wikipedia.org/wiki/Dual-tone_multi-frequency_signaling DTMF] signals correctly? ===

I guess it all depends on the bit-rate used. Though no formal testing has yet been performed, I'd say it works correctly at 8 kbps and above. Make sure you don't use the lowest complexity (see SPEEX_SET_COMPLEXITY or --comp option), as it causes significant noise.

=== Can Speex pass [https://en.wikipedia.org/wiki/List_of_ITU-T_V-series_recommendations#Simultaneous_transmission_of_data_and_other_signals V.90] / [https://en.wikipedia.org/wiki/V.92 V.92] modem signals correctly? ===
No, it cannot, as that would break fundamental laws of information theory. If it '''could''' do that, I'd be '''very''' rich by now :-)

=== Does Speex have anything to do with the University of Sherbrooke? ===

No.

I wrote Speex while pursuing my Ph.D. at the University of Sherbrooke (2002-2005) in mobile robotics. Although I did my master's with the Sherbrooke speech coding group (in speech enhancement, not coding), I was no longer associated with them when developing Speex. Furthermore, Speex does not make use of any code or proprietary technology developed in the Sherbrooke speech coding group.

=== When will the next version of Speex be released? ===

Speex has been superseded by the [[Opus]] codec, so probably no more versions of Speex will be released by Xiph. However, since the code is open-source, you are welcome to make your own additions to Speex as you please.

=== How can I help if I don't know about speech processing? ===

There's always "semi-technical" work to do. The documentation can be improved, and so can this FAQ.

== License ==

=== Under what license is Speex released? ===

As of version 1.0 beta 1, Speex is released under the revised (3-clause) BSD license. This license is one of the most permissive open source licenses.

=== Am I allowed to use Speex in commercial software? ===

Yes, as long as you comply with the license. This basically means you have to keep the copyright notice and you can't use our name to promote your product without authorization.

=== Do I have to release source code if I use Speex in a proprietary application? ===

No. The BSD license does '''not''' require you to release any source code. It is however '''appreciated''' (but not required) if you contribute back useful changes you make to Speex. This is generally also in your interest because it means you get maintenance of that code for free (i.e. no need to merge again in newer versions).

== Using libspeex ==

=== Does Speex run on Windows? ===

Compilation on Windows has been supported since version 0.8.0. There are also several front-ends available from the website.

=== Why is encoding so slow compared to decoding? ===

For most kinds of compression, encoding is inherently slower than decoding.

In the case of Speex, encoding consists of finding, for each vector of 5 to 10 samples, the entry that matches the best within a codebook consisting of 16 to 256 entries.
When decoding, all that needs to be done is look up the right entry in the codebook using the encoded index. Since a lookup is much faster than a search, the decoder works much faster than the encoder.
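
The asymmetry described above can be sketched in a few lines. This is only an illustration, not the libspeex implementation: it assumes a plain squared-error match (the real encoder uses a more elaborate search), and the names <code>encode_vector</code>, <code>decode_vector</code> and the randomly filled <code>CODEBOOK</code> are hypothetical.

```python
import random


def encode_vector(codebook, target):
    """Exhaustive search: return the index of the entry closest to target."""
    best_index, best_error = 0, float("inf")
    for index, entry in enumerate(codebook):
        # Squared error between this candidate entry and the target vector.
        error = sum((e - t) ** 2 for e, t in zip(entry, target))
        if error < best_error:
            best_index, best_error = index, error
    return best_index


def decode_vector(codebook, index):
    """Decoding is a single table lookup."""
    return codebook[index]


# Hypothetical codebook: 256 entries of 8 samples each (random values,
# for illustration only; a real codebook is trained or designed).
_rng = random.Random(0)
CODEBOOK = [[_rng.uniform(-1.0, 1.0) for _ in range(8)] for _ in range(256)]
```

With 256 entries of 8 samples, the search evaluates 256 × 8 squared differences per vector, while the decoder performs one index operation.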

=== Why is Speex so slow on my iPaq (or insert any platform without an FPU)? ===

You probably didn't build Speex with the '''--enable-fixed-point''' option.

Even if you did, not all modes have been ported to use fixed-point arithmetic, so the code may be slowed down by any float operations left (e.g. in the Wideband mode).

=== I'm getting unusual background noise (hiss) when using libspeex in my application. How do I fix that? ===

One possible cause is the scaling of the input speech. Speex expects signals to have a +-32767 (signed short) dynamic range. If the dynamic range of your signals is too small (e.g. +-1.0), you will suffer significant quantization noise. A good target is a dynamic range of around +-8000, which is large enough to keep quantization noise low, but small enough to make sure there's no clipping when converting back to signed short.
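
As a rough sketch of the rescaling step, assuming the input arrives as floating-point samples in the +-1.0 range: the helper name <code>scale_to_speex_range</code> and its <code>target_peak</code> parameter are hypothetical and not part of libspeex.

```python
def scale_to_speex_range(samples, target_peak=8000.0):
    """Scale samples so the largest magnitude equals target_peak.

    Returns a list of integers suitable for a signed 16-bit buffer.
    """
    peak = max((abs(s) for s in samples), default=0.0)
    if peak == 0.0:
        # Silence (or an empty buffer): nothing to scale.
        return [0 for _ in samples]
    gain = target_peak / peak
    return [int(round(s * gain)) for s in samples]
```

In a streaming application the gain would normally be chosen once and kept fixed, rather than recomputed for every buffer as this sketch does, so that the level does not jump between frames.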

=== I get very distorted speech when using libspeex in my application. What's wrong? ===

There are many possible causes. One is an error in the way the bits are manipulated. Another is the use of the same encoder or decoder state for more than one audio stream (channel), which produces strange effects with the filter memories. Finally, if the input speech has an amplitude close to +-32767, the decoded amplitude may be slightly higher than that, causing clipping when saving as 16-bit PCM.
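
For the clipping case, one option is to clamp decoded samples explicitly before writing them out, so that an overshoot saturates instead of wrapping around. The sketch below assumes the decoder hands back numeric samples; <code>to_pcm16</code> is a hypothetical helper, not a libspeex function.

```python
import struct

INT16_MIN, INT16_MAX = -32768, 32767


def to_pcm16(samples):
    """Clamp samples to the signed 16-bit range and pack as little-endian PCM."""
    clamped = [max(INT16_MIN, min(INT16_MAX, int(round(s)))) for s in samples]
    return struct.pack("<%dh" % len(clamped), *clamped)
```

Clamping only hides the overshoot; lowering the input level is what actually avoids the distortion.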

=== Can Speex run on fixed-point processors or DSPs? ===

Yes. You can compile Speex for fixed-point CPUs by passing the '''--enable-fixed-point''' option to the configure script or defining '''FIXED_POINT'''.

=== What architectures are supported? ===

Speex is designed to run on pretty much any CPU that can do 16x16 multiplications. That includes:
* x86
* PowerPC
* ARM
* Blackfin
* TI C5x and C6x
* dsPIC (unsupported, unofficial port)
* Cell (in progress)

...and many others

=== Can I use Speex for 22.05 kHz, 44.1 kHz or 48 kHz signals? ===

Yes, but you must resample the signal to a supported sample rate before encoding.

Speex is optimised for '''8 kHz''' and '''16 kHz''', and it can also encode 32 kHz files (your mileage may vary). Anything else is ''unsupported'' and tends to be ''heavily sub-optimal''. You might as well use Vorbis instead.

Note that Speex includes a resampler module as of version '''1.2beta2'''. Refer to the section called Resampler in the Speex documentation.

=== Should I use version 1.0.x or 1.2? ===
While currently 1.2 is marked as beta, the author believes 1.2 is vastly superior and more stable than the 1.0.x series regarding compression and API performance. As such, it is recommended to use version 1.2, even though it is marked as beta.

== Technical ==

=== What's the difference between CELP and ACELP? ===

CELP stands for "Code Excited Linear Prediction", while ACELP stands for "Algebraic Code Excited Linear Prediction".

That means ACELP is a CELP technique that uses an algebraic codebook represented as a sum of unit pulses, thus making the codebook search much more efficient. This technique was invented at the University of Sherbrooke and is now one of the most widely used forms of CELP.

Unfortunately, since ACELP is patented, it could not be used in Speex.
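
To illustrate why a codebook made of unit pulses is cheap to search, here is a minimal sketch. The pulse layout and the function names are hypothetical and do not follow any particular ACELP standard; the point is only that correlating a target with such a code vector touches just the pulse positions.

```python
def algebraic_code_vector(length, pulses):
    """Build a code vector from (position, sign) unit pulses."""
    vector = [0] * length
    for position, sign in pulses:
        if sign not in (1, -1):
            raise ValueError("sign must be +1 or -1")
        vector[position] += sign
    return vector


def correlation(target, pulses):
    """Correlate target with the code vector without building the vector.

    Only the pulse positions contribute, so the cost is one addition
    per pulse instead of one multiplication per sample.
    """
    return sum(sign * target[position] for position, sign in pulses)
```

For example, three pulses in an 8-sample vector need three additions, where a stored codebook entry would need eight multiply-accumulate steps.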

=== Where can I get information about how Speex works? ===

There is not (yet) a complete description of the algorithm (except for the source code itself).

However, several aspects are documented either in the [http://speex.org/docs/ manual] or in the paper [http://people.xiph.org/~jm/papers/aes120_speex_vorbis.pdf Improved Noise Weighting in CELP Coding of Speech — Applying the Vorbis Psychoacoustic Model To Speex].

[[Category:Speex]]

Speex FAQ

2017-11-09T15:00:49Z

MrZeus: /* Why is Speex so slow on my iPaq (or insert any platform without an FPU)? */

== General ==

=== Why do we need [[Speex]]? [[Vorbis]] is open source and patent-free. ===

[[Vorbis]] is a great project but its goals are not the same as those of Speex. Vorbis is mostly aimed at compressing music and audio in general, while Speex targets speech only. For that reason, Speex can achieve much better results than Vorbis on speech (typically 2-4 times higher compression at equal quality).

=== Isn't there an open source implementation of the [https://en.wikipedia.org/wiki/Full_Rate GSM-FR] codec? Why is Speex necessary? ===

First of all, it's not clear whether [https://en.wikipedia.org/wiki/Full_Rate GSM-FR] is covered by a Philips patent (see http://kbs.cs.tu-berlin.de/~jutta/toast.html). Also, GSM-FR offers mediocre quality at a relatively high bit-rate, while Speex can offer equivalent quality at almost half the bit-rate. Last but not least, Speex offers a wide range of bit-rates and sampling rates, while GSM-FR is limited to 8 kHz speech at 13 kbps.

=== Ogg, Vorbis, Speex, what's the difference? ===

[[Ogg]] is a '''container''' format for holding multimedia data (audio, video, subtitles, etc.).

[[Vorbis]] is an '''audio codec''' that uses Ogg to store its bit-streams as files, hence the name '''Ogg Vorbis'''.

[[Speex]] is a '''speech codec''', that also uses the Ogg format to store its bit-streams as files,
so technically they would be '''Ogg Speex''' files. However, most people refer to them just as Speex files.

One difference between Speex and Vorbis is that Speex is less tied to Ogg.
Actually, if you want to use Speex for Voice over IP (VoIP), you don't need to use an Ogg container at all.

=== What's the extension for Speex files? ===

Speex files tend to have the '''.spx''' extension. Note that the Speex tools (speexenc, speexdec) do not rely on the file extension at all, so any extension will work.

=== Can I use Speex for compressing music? ===

You can, but you'll be better off compressing with Vorbis when it comes to music.
Just as Vorbis is not well suited to speech, Speex is not well suited to music.

=== I converted some MP3s to Speex and the quality is bad. What's wrong? ===

This is called '''[https://en.wikipedia.org/wiki/Transcoding transcoding]''' (converting from one lossy format to another) and it will always result in much poorer quality than the original MP3.

Avoid transcoding speech, unless you have a ''really'' good reason to do so (e.g. you need a smaller file size).
You should also avoid '''tandeming''' (self-transcoding): if you decode a Speex file and re-encode it at the same bit-rate, you ''will'' lose quality.

=== How does Speex compare to other proprietary codecs? ===

It's hard to give precise figures since no formal listening tests have been performed yet. All I can say is that in terms of quality, Speex competes on the same ground as proprietary codecs (not necessarily the best, but not the worst either).

Speex also has many features that are not present in most other codecs. These include variable bit-rate (VBR), integration of narrowband and wideband, as well as stereo support. Of course, another area where Speex is really hard to beat is the quality/price ratio. Unlike many very expensive codecs, Speex is free and anyone may distribute or modify it at will.

=== Can Speex pass [https://en.wikipedia.org/wiki/Dual-tone_multi-frequency_signaling DTMF] signals correctly? ===

I guess it all depends on the bit-rate used. Though no formal testing has yet been performed, I'd say it works correctly at 8 kbps and above. Make sure you don't use the lowest complexity (see SPEEX_SET_COMPLEXITY or --comp option), as it causes significant noise.

=== Can Speex pass [https://en.wikipedia.org/wiki/List_of_ITU-T_V-series_recommendations#Simultaneous_transmission_of_data_and_other_signals V.90] / [https://en.wikipedia.org/wiki/V.92 V.92] modem signals correctly? ===
No, it cannot, as that would break fundamental laws of information theory. If it '''could''' do that, I'd be '''very''' rich by now :-)

=== Does Speex have anything to do with the University of Sherbrooke? ===

No.

I wrote Speex while pursuing my Ph.D. at the University of Sherbrooke (2002-2005) in mobile robotics. Although I did my master's with the Sherbrooke speech coding group (in speech enhancement, not coding), I was no longer associated with them when developing Speex. Furthermore, Speex does not make use of any code or proprietary technology developed in the Sherbrooke speech coding group.

=== When will the next version of Speex be released? ===

Speex has been superseded by the [[Opus]] codec, so probably no more versions of Speex will be released by Xiph. However, since the code is open-source, you are welcome to make your own additions to Speex as you please.

=== How can I help if I don't know about speech processing? ===

There's always "semi-technical" work to do. The documentation can be improved, and so can this FAQ.

== License ==

=== Under what license is Speex released? ===

As of version 1.0 beta 1, Speex is released under the revised (3-clause) BSD license. This license is one of the most permissive open source licenses.

=== Am I allowed to use Speex in commercial software? ===

Yes, as long as you comply with the license. This basically means you have to keep the copyright notice and you can't use our name to promote your product without authorization.

=== Do I have to release source code if I use Speex in a proprietary application? ===

No. The BSD license does '''not''' require you to release any source code. It is however '''appreciated''' (but not required) if you contribute back useful changes you make to Speex. This is generally also in your interest because it means you get maintenance of that code for free (i.e. no need to merge again in newer versions).

== Using libspeex ==

=== Does Speex run on Windows? ===

Compilation on Windows has been supported since version 0.8.0. There are also several front-ends available from the website.

=== Why is encoding so slow compared to decoding? ===

For most kinds of compression, encoding is inherently slower than decoding.

In the case of Speex, encoding consists of finding, for each vector of 5 to 10 samples, the entry that matches the best within a codebook consisting of 16 to 256 entries.
When decoding, all that needs to be done is look up the right entry in the codebook using the encoded index. Since a lookup is much faster than a search, the decoder works much faster than the encoder.

=== Why is Speex so slow on my iPaq (or insert any platform without an FPU)? ===

You probably didn't build Speex with the '''--enable-fixed-point''' option.

Even if you did, not all modes have been ported to use fixed-point arithmetic, so the code may be slowed down by any float operations left (e.g. in the Wideband mode).

=== I'm getting unusual background noise (hiss) when using libspeex in my application. How do I fix that? ===

One possible cause is the scaling of the input speech. Speex expects signals to have a +-32767 (signed short) dynamic range. If the dynamic range of your signals is too small (e.g. +-1.0), you will suffer significant quantization noise. A good target is a dynamic range of around +-8000, which is large enough to keep quantization noise low, but small enough to make sure there's no clipping when converting back to signed short.

=== I get very distorted speech when using libspeex in my application. What's wrong? ===

There are many possible causes. One is an error in the way the bits are manipulated. Another is the use of the same encoder or decoder state for more than one audio stream (channel), which produces strange effects with the filter memories. Finally, if the input speech has an amplitude close to +-32767, the decoded amplitude may be slightly higher than that, causing clipping when saving as 16-bit PCM.

=== Can Speex run on fixed-point processors or DSPs? ===

Yes. You can compile Speex for fixed-point CPUs by passing the --enable-fixed-point option to the configure script or defining FIXED_POINT.

=== What architectures are supported? ===

Speex is designed to run on pretty much any CPU that can do 16x16 multiplications. That includes:
* x86
* PowerPC
* ARM
* Blackfin
* TI C5x and C6x
* dsPIC (unsupported, unofficial port)
* Cell (in progress)

...and many others

=== Can I use Speex for 22.05 kHz, 44.1 kHz or 48 kHz signals? ===

Yes, but you must resample the signal to a supported sample rate before encoding.

Speex is optimised for '''8 kHz''' and '''16 kHz''', and it can also encode 32 kHz files (your mileage may vary). Anything else is ''unsupported'' and tends to be ''heavily sub-optimal''. You might as well use Vorbis instead.

Note that Speex includes a resampler module as of version '''1.2beta2'''. Refer to the section called Resampler in the Speex documentation.

=== Should I use version 1.0.x or 1.2? ===
While currently 1.2 is marked as beta, the author believes 1.2 is vastly superior and more stable than the 1.0.x series regarding compression and API performance. As such, it is recommended to use version 1.2, even though it is marked as beta.

== Technical ==

=== What's the difference between CELP and ACELP? ===

CELP stands for "Code Excited Linear Prediction", while ACELP stands for "Algebraic Code Excited Linear Prediction".

That means ACELP is a CELP technique that uses an algebraic codebook represented as a sum of unit pulses, thus making the codebook search much more efficient. This technique was invented at the University of Sherbrooke and is now one of the most widely used forms of CELP.

Unfortunately, since ACELP is patented, it could not be used in Speex.

=== Where can I get information about how Speex works? ===

There is not (yet) a complete description of the algorithm (except for the source code itself).

However, several aspects are documented either in the [http://speex.org/docs/ manual] or in the paper [http://people.xiph.org/~jm/papers/aes120_speex_vorbis.pdf Improved Noise Weighting in CELP Coding of Speech — Applying the Vorbis Psychoacoustic Model To Speex].

[[Category:Speex]]

Speex FAQ

2017-11-08T12:57:05Z

MrZeus: /* Can Speex pass V.90/V.92 modem signals correctly? */ fix wiki link

== General ==

=== Why do we need [[Speex]]? [[Vorbis]] is open source and patent-free. ===

[[Vorbis]] is a great project but its goals are not the same as those of Speex. Vorbis is mostly aimed at compressing music and audio in general, while Speex targets speech only. For that reason, Speex can achieve much better results than Vorbis on speech (typically 2-4 times higher compression at equal quality).

=== Isn't there an open source implementation of the [https://en.wikipedia.org/wiki/Full_Rate GSM-FR] codec? Why is Speex necessary? ===

First of all, it's not clear whether [https://en.wikipedia.org/wiki/Full_Rate GSM-FR] is covered by a Philips patent (see http://kbs.cs.tu-berlin.de/~jutta/toast.html). Also, GSM-FR offers mediocre quality at a relatively high bit-rate, while Speex can offer equivalent quality at almost half the bit-rate. Last but not least, Speex offers a wide range of bit-rates and sampling rates, while GSM-FR is limited to 8 kHz speech at 13 kbps.

=== Ogg, Vorbis, Speex, what's the difference? ===

[[Ogg]] is a '''container''' format for holding multimedia data (audio, video, subtitles, etc.).

[[Vorbis]] is an '''audio codec''' that uses Ogg to store its bit-streams as files, hence the name '''Ogg Vorbis'''.

[[Speex]] is a '''speech codec''', that also uses the Ogg format to store its bit-streams as files,
so technically they would be '''Ogg Speex''' files. However, most people refer to them just as Speex files.

One difference between Speex and Vorbis is that Speex is less tied to Ogg.
Actually, if you want to use Speex for Voice over IP (VoIP), you don't need to use an Ogg container at all.

=== What's the extension for Speex files? ===

Speex files tend to have the '''.spx''' extension. Note that the Speex tools (speexenc, speexdec) do not rely on the file extension at all, so any extension will work.

=== Can I use Speex for compressing music? ===

You can, but you'll be better off compressing with Vorbis when it comes to music.
Just as Vorbis is not well suited to speech, Speex is not well suited to music.

=== I converted some MP3s to Speex and the quality is bad. What's wrong? ===

This is called '''[https://en.wikipedia.org/wiki/Transcoding transcoding]''' (converting from one lossy format to another) and it will always result in much poorer quality than the original MP3.

Avoid transcoding speech, unless you have a ''really'' good reason to do so (e.g. you need a smaller file size).
You should also avoid '''tandeming''' (self-transcoding): if you decode a Speex file and re-encode it at the same bit-rate, you ''will'' lose quality.

=== How does Speex compare to other proprietary codecs? ===

It's hard to give precise figures since no formal listening tests have been performed yet. All I can say is that in terms of quality, Speex competes on the same ground as proprietary codecs (not necessarily the best, but not the worst either).

Speex also has many features that are not present in most other codecs. These include variable bit-rate (VBR), integration of narrowband and wideband, as well as stereo support. Of course, another area where Speex is really hard to beat is the quality/price ratio. Unlike many very expensive codecs, Speex is free and anyone may distribute or modify it at will.

=== Can Speex pass [https://en.wikipedia.org/wiki/Dual-tone_multi-frequency_signaling DTMF] signals correctly? ===

I guess it all depends on the bit-rate used. Though no formal testing has yet been performed, I'd say it works correctly at 8 kbps and above. Make sure you don't use the lowest complexity (see SPEEX_SET_COMPLEXITY or --comp option), as it causes significant noise.

=== Can Speex pass [https://en.wikipedia.org/wiki/List_of_ITU-T_V-series_recommendations#Simultaneous_transmission_of_data_and_other_signals V.90] / [https://en.wikipedia.org/wiki/V.92 V.92] modem signals correctly? ===
No, it cannot, as that would break fundamental laws of information theory. If it '''could''' do that, I'd be '''very''' rich by now :-)

=== Does Speex have anything to do with the University of Sherbrooke? ===

No.

I wrote Speex while pursuing my Ph.D. at the University of Sherbrooke (2002-2005) in mobile robotics. Although I did my master's with the Sherbrooke speech coding group (in speech enhancement, not coding), I was no longer associated with them when developing Speex. Furthermore, Speex does not make use of any code or proprietary technology developed in the Sherbrooke speech coding group.

=== When will the next version of Speex be released? ===

Speex has been superseded by the [[Opus]] codec, so probably no more versions of Speex will be released by Xiph. However, since the code is open-source, you are welcome to make your own additions to Speex as you please.

=== How can I help if I don't know about speech processing? ===

There's always "semi-technical" work to do. The documentation can be improved, and so can this FAQ.

== License ==

=== Under what license is Speex released? ===

As of version 1.0 beta 1, Speex is released under the revised (3-clause) BSD license. This license is one of the most permissive open source licenses.

=== Am I allowed to use Speex in commercial software? ===

Yes, as long as you comply with the license. This basically means you have to keep the copyright notice and you can't use our name to promote your product without authorization.

=== Do I have to release source code if I use Speex in a proprietary application? ===

No. The BSD license does '''not''' require you to release any source code. It is however '''appreciated''' (but not required) if you contribute back useful changes you make to Speex. This is generally also in your interest because it means you get maintenance of that code for free (i.e. no need to merge again in newer versions).

== Using libspeex ==

=== Does Speex run on Windows? ===

Compilation on Windows has been supported since version 0.8.0. There are also several front-ends available from the website.

=== Why is encoding so slow compared to decoding? ===

For most kinds of compression, encoding is inherently slower than decoding.

In the case of Speex, encoding consists of finding, for each vector of 5 to 10 samples, the entry that matches the best within a codebook consisting of 16 to 256 entries.
When decoding, all that needs to be done is look up the right entry in the codebook using the encoded index. Since a lookup is much faster than a search, the decoder works much faster than the encoder.

=== Why is Speex so slow on my iPaq (or insert any platform without an FPU)? ===

You probably didn't build Speex with the fixed-point option (--enable-fixed-point). Even if you did, not all modes have been ported to use fixed-point arithmetic, so the code may be slowed down by a few float operations left (e.g. in the wideband mode).

=== I'm getting unusual background noise (hiss) when using libspeex in my application. How do I fix that? ===

One possible cause is the scaling of the input speech. Speex expects signals to have a +-32767 (signed short) dynamic range. If the dynamic range of your signals is too small (e.g. +-1.0), you will suffer significant quantization noise. A good target is a dynamic range of around +-8000, which is large enough to keep quantization noise low, but small enough to make sure there's no clipping when converting back to signed short.

=== I get very distorted speech when using libspeex in my application. What's wrong? ===

There are many possible causes. One is an error in the way the bits are manipulated. Another is the use of the same encoder or decoder state for more than one audio stream (channel), which produces strange effects with the filter memories. Finally, if the input speech has an amplitude close to +-32767, the decoded amplitude may be slightly higher than that, causing clipping when saving as 16-bit PCM.

=== Can Speex run on fixed-point processors or DSPs? ===

Yes. You can compile Speex for fixed-point CPUs by passing the --enable-fixed-point option to the configure script or defining FIXED_POINT.

=== What architectures are supported? ===

Speex is designed to run on pretty much any CPU that can do 16x16 multiplications. That includes:
* x86
* PowerPC
* ARM
* Blackfin
* TI C5x and C6x
* dsPIC (unsupported, unofficial port)
* Cell (in progress)

...and many others

=== Can I use Speex for 22.05 kHz, 44.1 kHz or 48 kHz signals? ===

Yes, but you must resample the signal to a supported sample rate before encoding.

Speex is optimised for '''8 kHz''' and '''16 kHz''', and it can also encode 32 kHz files (your mileage may vary). Anything else is ''unsupported'' and tends to be ''heavily sub-optimal''. You might as well use Vorbis instead.

Note that Speex includes a resampler module as of version '''1.2beta2'''. Refer to the section called Resampler in the Speex documentation.

=== Should I use version 1.0.x or 1.2? ===
While currently 1.2 is marked as beta, the author believes 1.2 is vastly superior and more stable than the 1.0.x series regarding compression and API performance. As such, it is recommended to use version 1.2, even though it is marked as beta.

== Technical ==

=== What's the difference between CELP and ACELP? ===

CELP stands for "Code Excited Linear Prediction", while ACELP stands for "Algebraic Code Excited Linear Prediction".

That means ACELP is a CELP technique that uses an algebraic codebook represented as a sum of unit pulses, thus making the codebook search much more efficient. This technique was invented at the University of Sherbrooke and is now one of the most widely used forms of CELP.

Unfortunately, since ACELP is patented, it could not be used in Speex.

=== Where can I get information about how Speex works? ===

There is not (yet) a complete description of the algorithm (except for the source code itself).

However, several aspects are documented either in the [http://speex.org/docs/ manual] or in the paper [http://people.xiph.org/~jm/papers/aes120_speex_vorbis.pdf Improved Noise Weighting in CELP Coding of Speech — Applying the Vorbis Psychoacoustic Model To Speex].

[[Category:Speex]]

OpusFAQ

2017-11-07T17:03:55Z

MrZeus: /* Does Opus support higher sampling rates, such as 96 kHz or 192 kHz? */

[[Image:Opus logo trans.png]]

If you are looking for info not covered in this FAQ, try the '''[https://opus-codec.org main Opus website]''' or the pages included in the '''[[:Category:Opus|Opus category]]''' of this wiki.

== General Questions ==

=== What is Opus? Who created it? ===

Opus is a totally open, royalty-free, highly versatile audio codec.

It is primarily designed for interactive speech and music transmission over the Internet, but is also applicable to storage and streaming applications. It incorporates technology from Skype's '''[https://en.wikipedia.org/wiki/SILK SILK]''' codec and Xiph.Org's '''[http://celt-codec.org/ CELT]''' codec. It has been standardized by the '''[https://www.ietf.org/ Internet Engineering Task Force]''' (IETF) as '''[https://tools.ietf.org/html/rfc6716 RFC 6716]'''.

Opus has been in development since early 2007. Programmers associated with '''[https://xiph.org/ Xiph.Org]''', '''[https://www.skype.com/ Skype]''' and several other organizations have contributed to its development and to the standardization process as part of the '''[https://datatracker.ietf.org/wg/codec/charter/ IETF's Codec Working Group]'''.

=== How does Opus compare to other codecs? ===

Opus is distinguished from most high quality formats (e.g. [[Vorbis]], AAC, MP3) by having '''[https://tools.ietf.org/html/rfc6716#section-2 low delay]''' (5 to 66.5 ms) and from most low delay formats (e.g. [[Speex]], G.711, GSM) by supporting '''[https://tools.ietf.org/html/rfc6716#section-2.1.1 high audio quality]''' (from narrow-band all the way to full-band audio).

It '''[https://opus-codec.org/comparison meets or exceeds existing codecs' quality]''' across a wide range of bitrates, and it operates at lower delay than virtually any existing compressed format.

Most importantly, the Opus format and its reference implementation are both available under '''[https://opus-codec.org/license/ liberal, royalty-free licenses]'''. 
This makes it:
* easy to adopt
* compatible with free software
* suitable for use as part of the basic infrastructure of the Internet

=== Does Opus make all those other lossy codecs obsolete? ===

Yes.

From a technical point of view (loss, delay, bitrates, ...) Opus renders '''[[Speex]]''' obsolete and should also replace '''[[Vorbis]]''' and the common proprietary codecs (e.g. AAC, MP3, ...).

=== Will Opus replace Vorbis in video files? ===

For '''[[Ogg]]''' video files (which use the '''[[Theora]]''' video codec), you ''can'' use Opus instead of Vorbis, but the overall size reduction will be minimal and it will break compatibility with existing players.

For WebM video files, the convention is to use the '''[http://www.webmproject.org/vp9/ VP9 video codec]''' when using Opus as an audio codec.

=== How do I use Opus? ===

For now, the best way to '''encode''' audio into Opus files is to use the '''opusenc''' command-line tool from the '''[https://opus-codec.org/downloads/ opus-tools package]'''.

If you want to encode many files at once (e.g. your music library), try the applications listed in the '''[[OpusSupport|Opus Support]]''' page.

For rough guidelines on encoding settings, see the '''[[Opus Recommended Settings]]''' page.

=== What programs support Opus? ===

Opus decoding support is now included in '''[http://caniuse.com/opus some Internet browsers]''' and '''[[OpusSupport|many applications]]''', including '''[https://www.mozilla.org/firefox Firefox]''', '''[https://www.foobar2000.org/ foobar2000]''' and '''[https://www.videolan.org/vlc/ VLC]''', as well as in frameworks such as '''[https://gstreamer.freedesktop.org/ GStreamer]''' and '''[https://ffmpeg.org/ FFmpeg]'''.

For real-time applications, Opus support is available in '''[https://www.webrtc.org/ Google's WebRTC codebase]'''.

Opus is a relatively new codec (standardized in September 2012), but '''[[OpusSupport|many more applications]]''' will support it in the near future.

=== Does Opus support higher sampling rates, such as 96 kHz or 192 kHz? ===

Yes and no.

Opus encoding tools like opusenc will happily encode input files that are sampled at 96 or 192 kHz.

However, files at these rates are internally '''converted to 48 kHz''' and then only frequencies '''up to 20 kHz''' are encoded.

The reason is simple: lossy codecs are designed to preserve audible details while discarding irrelevant information. Since the human ear can only hear up to 20 kHz at best (usually lower than that), frequency content above 20 kHz is the first thing to go.

See Monty's '''[https://people.xiph.org/~xiphmont/demo/neil-young.html article]''' for more details.

If you want a codec to handle high sampling rates losslessly, use '''[[FLAC]]'''!

=== What are the licensing requirements? ===

The reference Opus source code is released under a three-clause BSD license, which is a very permissive Open Source license. Commercial use and distribution (including in proprietary software) is permitted, provided that some basic conditions specified in the license are met.

Opus is also covered by some patents, for which royalty-free usage rights are granted, under conditions that the authors believe are compatible with (hopefully) all open source licenses, including the GPL (v2 and v3).

See the '''[https://www.opus-codec.org/license/ Opus Licensing]''' page for details.

=== Why make Opus free? ===

On the Internet, protocol and codec standards are part of the common infrastructure everyone builds upon.

Most of the value of a high-quality standard is the innovation and inter-operation provided by the systems built on top of it. When a few parties have monopoly rights to monetize a standard, that infrastructure stops being so common and everyone else has more reason to use their own solution instead, increasing cost and reducing efficiency.

Imagine a road system where each type of car could only drive on its own manufacturer's pavement. We all benefit from living in a world where all the roads are connected.

This is why Opus, unlike many codecs, is free.

=== Is the SILK part of Opus compatible with the SILK implementation shipped in Skype? ===

No.

The SILK codec, as submitted by Skype to the IETF, was heavily modified as part of its integration within Opus. The modifications are significant enough that it is not possible to just write a "translator". Even sharing code between Opus and the "old SILK" would be highly complex.

=== Why not keep the SILK and CELT codecs separate? ===
Opus is more than just two independent codecs with a switch.

In addition to a [https://en.wikipedia.org/wiki/Linear_predictive_coding Linear Prediction] '''SILK mode''' and an [https://en.wikipedia.org/wiki/Modified_discrete_cosine_transform MDCT] '''CELT mode''' it has a '''hybrid mode''', where speech frequencies up to 8 kHz are encoded with LP while those between 8 and 20 kHz are encoded with MDCT. This is what allows Opus to have such high speech quality around 32 kbps.

Another advantage of the integration is the ability to switch between these 3 modes seamlessly, without any audible "glitches" and without any out-of-band signalling.

=== Now that Opus is standardized, will its development stop or can it be further improved? ===
Yes, Opus '''can''' and '''should''' be improved, because unlike most '''[https://en.wikipedia.org/wiki/ITU-T#Key_standards_published_by_ITU ITU-T codecs]''', Opus is only defined in terms of its decoder.

The encoder can keep evolving as long as the bitstream it produces can be decoded by the reference decoder. This is what made it possible for modern MP3 encoders (e.g. '''[https://en.wikipedia.org/wiki/LAME LAME]''') to improve far beyond the original '''[https://en.wikipedia.org/wiki/L3enc L3enc]''' and '''dist10''' reference implementations.

Although it is unlikely that Opus encoders will see such a spectacular evolution, we certainly hope that future encoders will become much better than the reference encoder.

In fact, the 1.1 libopus release significantly improves on the reference encoder's quality. See '''[https://people.xiph.org/~xiphmont/demo/opus/demo3.shtml Monty's demo]''' for more details.

=== Will all future Opus releases comply with the [https://tools.ietf.org/html/rfc6716 Opus specification]? ===

Yes.

=== In what ways is Opus optimized for the Internet? ===

Opus has good packet loss robustness and concealment, but its optimisations go further.

One of the first things we were asked when designing Opus was to make the rate '''really''' adaptable, because we never know what kind of rates will be available. This meant not only having a wide range of bitrates, but also being able to vary the bitrate in small increments.

This is why Opus scales from about '''6''' to '''512 kb/s''', in increments of '''0.4 kb/s''' (one byte with 20 ms frames). Opus can have '''more than 1200 possible bitrates''' while spending '''zero bits''' signalling the bitrate, because UDP already encodes the packet size.
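The increment and the number of possible rates follow from simple arithmetic, sketched below. This is only an illustration of the figures quoted above (about 6 to 512 kb/s, 20 ms frames); the function names are made up for this example and are not part of any Opus API.

```python
# Worked arithmetic for the bitrate granularity quoted above.
# Assumption: frame_ms is a whole number of milliseconds (20 ms here).

def bitrate_step_bps(frame_ms):
    """Bitrate change (bit/s) from adding one byte (8 bits) to every frame."""
    return 8 * 1000 // frame_ms


def payload_bytes(bitrate_bps, frame_ms):
    """Bytes available per frame at a given bitrate."""
    return bitrate_bps * frame_ms // 8000


def count_bitrates(min_bps, max_bps, frame_ms):
    """Number of distinct per-frame payload sizes between two bitrates."""
    return payload_bytes(max_bps, frame_ms) - payload_bytes(min_bps, frame_ms) + 1


if __name__ == "__main__":
    print(bitrate_step_bps(20))                # 400 bit/s, i.e. 0.4 kb/s
    print(count_bitrates(6000, 512000, 20))    # 1266, i.e. "more than 1200"
```

Each distinct payload size in bytes corresponds to one bitrate, so the receiver can infer the rate from the size of the packet it was handed.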

One last aspect is that Opus is simple to transport over RTP, as can be seen from the [https://tools.ietf.org/html/rfc7587 Opus RTP payload format]. For example, it's possible to decode RTP packets without having even seen the SDP or any out-of-band signalling.

=== What applications for Android can play Opus? ===

Right now there are just a few, but the list is growing fast. Please refer to [https://android.stackexchange.com/q/37970/7425 this question on android.stackexchange.com]. Feel free to suggest other applications.

=== When will the next version be released? ===

When it's done. Seriously, we do not know.

Opus is not a large project with a fixed release schedule.

That being said, our '''[https://www.opus-codec.org/downloads/ pre-releases]''' and even the git repositories ('''[https://git.xiph.org/?p=opus.git Xiph]''', '''[https://github.com/xiph/opus GitHub]''') are pretty stable and, given proper testing (which you should always do anyway), are safe to distribute.

Just be aware that the API of new features (that have never been included in a stable release) could potentially still change.

== Software Developers' Questions ==

=== On what platforms does Opus run? ===

The Opus code base is written in C89 and should run on the vast majority of recent (and not so recent) CPUs.

Some of the platforms '''[https://mf4.xiph.org/jenkins/view/opus/ on which Opus has been tested]''' include x86, x86-64, ARM, Itanium, Blackfin, and SPARC.

=== Is there a fixed-point implementation? ===

Yes.

The fixed-point and floating-point decoder and encoder implementations are part of the same code base.

The code defaults to float, so you need to configure with '''--enable-fixed-point''' (or define '''FIXED_POINT''' if not using the configure script) to build the code for fixed-point.

=== Which implementation should I use? ===

While the implementation in RFC 6716 is what ''defines'' the standard, it is likely not the best and most up-to-date implementation.

The [https://opus-codec.org/ Opus] website was set up for the purpose of continually improving the implementation — in terms of speed, encoding quality, device compatibility, etc — while still conforming to the standard.

All Opus implementations are compatible by definition.

=== How is supporting Opus different from supporting Speex/G.711/MP3? ===

Opus has variable frame durations which can change on the fly, so an Opus decoder needs to be ready to accept packets with durations that are '''any multiple of 2.5ms''' up to a '''maximum of 120ms'''.

The Opus encoder and decoder do not need to have matched sampling rates or channel counts. It is recommended to always decode at the highest rate the hardware supports (e.g. 48 kHz stereo) so the user gets the full quality of whatever the far end is sending.
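As a rough illustration of what "any multiple of 2.5 ms up to 120 ms" means for buffer sizing, the sketch below lists the possible packet durations and converts them to sample counts per channel at a chosen decoder output rate. The helper names are hypothetical; only the 2.5 ms step, the 120 ms limit and the 48 kHz rate come from the text above.

```python
# Packet durations a decoder must be ready for, in microseconds to keep
# the arithmetic exact (2.5 ms is not a whole number of milliseconds).
STEP_US = 2_500      # 2.5 ms
MAX_US = 120_000     # 120 ms


def valid_packet_durations_us():
    """All durations that are a multiple of 2.5 ms, up to 120 ms."""
    return list(range(STEP_US, MAX_US + 1, STEP_US))


def samples_per_packet(duration_us, decode_rate_hz=48000):
    """Samples per channel produced when decoding a packet of this duration."""
    if duration_us <= 0 or duration_us > MAX_US or duration_us % STEP_US:
        raise ValueError("duration must be a multiple of 2.5 ms, at most 120 ms")
    return duration_us * decode_rate_hz // 1_000_000


if __name__ == "__main__":
    print(len(valid_packet_durations_us()))   # number of distinct durations
    print(samples_per_packet(20_000))         # a 20 ms packet at 48 kHz
    print(samples_per_packet(MAX_US))         # worst case to size buffers for
```

Sizing the output buffer for the 120 ms case at the chosen output rate means a change of frame size in mid-stream never overflows it.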

=== My application doesn't work. Can anyone help me? ===

It's possible to get help, but before doing so, there are a few basic things to try:

* Implement your application with uncompressed audio instead of Opus. If it still doesn't work, then the problem isn't related to Opus.
* Read the [https://www.opus-codec.org/docs/ Opus documentation].
* Read the [https://git.xiph.org/?p=opus.git;a=blob;f=src/opus_demo.c opus_demo.c] source code to see how to use the encoder and decoder.

If you still can't solve the problem, the best option is to ask for help on the [http://lists.xiph.org/mailman/listinfo/opus mailing list] or on the '''#opus''' IRC channel on '''irc.freenode.net'''.

=== How do I report a bug? ===

If you think you have found a bug in Opus (and not in your application), please [https://trac.xiph.org/newticket?component=Opus file a bug report].

Please include a way for us to reproduce the problem. The best way to do this is to provide an input file, along with the opusenc/opusdec/opus_demo command line that causes the bug to occur.

If the bug cannot be triggered by the command line tools, please provide a simple patch or C file that can help reproduce it. Please also provide any other relevant information, such as OS, CPU, build options, etc.

Don't hesitate to also contact us on the [http://lists.xiph.org/mailman/listinfo/opus mailing list] or on [irc://irc.freenode.net/opus IRC].

=== What is Opus Custom? ===

Opus Custom is an '''optional''' part of the Opus standard that allows for sampling rates other than 8, 12, 16, 24, or 48 kHz and frame sizes other than multiples of 2.5 ms.

Opus Custom requires additional out-of-band signalling that Opus does not normally require and disables many of Opus' coding modes. Also, because it is an optional part of the specification, using Opus Custom may lead to compatibility problems.

For these reasons, '''its use is discouraged''' outside of very specific applications.

You may want to use Opus Custom for:

* ultra-low-delay applications, where synchronization with the soundcard buffer is important.
* low-power embedded applications, where compatibility with others is not important.

For almost all other types of applications, Opus Custom should not be used.

=== How do I use 44.1 kHz or some other sampling rate not directly supported by Opus? ===

Tools which read or write Opus should inter-operate with other sampling rates by transparently performing sample rate conversion behind the scenes whenever necessary. In particular, software developers should not use Opus Custom for 44.1 kHz support, except in the very specific circumstances outlined above.

Note that it's generally preferable for a decoder to output at 48kHz, even when you know the original input was 44.1kHz. This is not only because you can skip resampling, but also because many cheaper audio interfaces have poor quality output for 44.1kHz.

The '''[https://opus-codec.org/downloads/ opus-tools]''' package source code contains a small, high quality, high performance, BSD licensed '''[https://github.com/xiph/opus-tools/blob/master/src/resample.c resampler]''' which can be used where resampling is required.
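For orientation, the arithmetic behind a 44.1 kHz to 48 kHz conversion can be sketched as follows. This only computes the rate ratio and the resulting sample counts; it is not a resampler, and the function names are invented for this example.

```python
from fractions import Fraction


def resample_ratio(in_rate, out_rate):
    """Exact output/input rate ratio in lowest terms."""
    return Fraction(out_rate, in_rate)


def output_length(n_in, in_rate, out_rate):
    """Number of output samples needed to cover n_in input samples (rounded up)."""
    return -(-n_in * out_rate // in_rate)


if __name__ == "__main__":
    print(resample_ratio(44100, 48000))          # 160/147
    print(output_length(44100, 44100, 48000))    # one second in, one second out
```

The 160/147 ratio means every 147 input samples become 160 output samples, which is the kind of fixed rational ratio a polyphase resampler handles.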

=== But won't the resampler hurt the quality? Isn't it better to use 44.1 kHz directly? ===

Not really. The quality degradation caused by any reasonable resampler (SoX, libspeexdsp, libsamplerate, ...) is far less than the distortion caused by the best lossy codec at its highest bitrate. If you can't tolerate the quality degradation caused by a good 44.1 ↔ 48 kHz resampler, then you shouldn't be using a lossy codec in the first place. Similarly, the extra CPU spent in the resampler is small compared to the rest of the codec. Not only that, but many soundcards only support 48 kHz on playback, so players can directly play the output rather than resample it to 48 kHz (e.g. for a 44.1 kHz MP3). So effectively, Opus is only shifting the burden of resampling from the decoder side to the encoder side.

One advantage of supporting only one internal rate is that it makes it possible for Opus to support many features, including efficient speech compression (through SILK) and real-time applications. It also means all the quality tuning effort can be spent on a single configuration, which helps bring even better quality.

=== How is the bitrate setting used in VBR mode? ===

Variable bitrate (VBR) mode allows the bitrate to automatically vary over time based on the audio being encoded, in order to achieve a consistent quality.

The bitrate setting controls the desired quality, on a scale that is calibrated to closely approximate the average bitrate that would be obtained over a large and diverse collection of audio. The actual bitrate of any particular audio stream may be higher or lower than this average.

=== What frame size should I use? ===

A '''20ms''' frame size works well for most applications. Smaller frame sizes may be used to achieve lower latency, but have lower quality at a given bitrate.

Sizes greater than 20 ms increase latency and are generally beneficial only at fairly low bitrates, or when used to reduce external overhead (e.g. by reducing the number of packets that are sent). For file encoding, using a frame size larger than 20 ms will usually result in '''worse''' quality for the same bitrate because it constrains the encoder in the decisions it can make.

=== Forward Error correction (FEC) doesn't appear to do anything! HELP! ===

The in-band FEC feature of Opus helps reduce the harm of packet loss by encoding some information about the prior packet.

In order to make use of in-band FEC, the receiver must delay its output by at least one frame so that, when a packet is lost, it can call the decoder with the <tt>decode_fec</tt> argument on the ''next'' packet in order to reconstruct the missed frame. This works best if it's integrated with a jitter buffer.
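The receiver-side decision can be sketched as below. This is a simplified model, not libopus code: it assumes one frame per packet and a one-packet look-ahead, and <tt>plan_decode</tt> and the action names are hypothetical. The comments indicate which <tt>opus_decode()</tt> call each action would correspond to.

```python
def plan_decode(received):
    """Decide how to produce each frame, given which packets arrived.

    received: list of bools in sequence order, True if that packet arrived.
    Returns a list of (slot, action) pairs.
    """
    plan = []
    for slot, arrived in enumerate(received):
        if arrived:
            # opus_decode(dec, packet[slot], len, pcm, frame_size, 0)
            plan.append((slot, "normal"))
        elif slot + 1 < len(received) and received[slot + 1]:
            # Lost, but the next packet is already in the jitter buffer:
            # opus_decode(dec, packet[slot + 1], len, pcm, frame_size, 1)
            plan.append((slot, "fec_from_next"))
        else:
            # Lost and nothing to recover from: packet loss concealment,
            # opus_decode(dec, NULL, 0, pcm, frame_size, 0)
            plan.append((slot, "plc"))
    return plan


if __name__ == "__main__":
    for slot, action in plan_decode([True, False, True, False, False, True]):
        print(slot, action)
```

Note that after a frame is recovered with <tt>decode_fec</tt> set, the same packet is decoded again normally to produce its own frame, which is why the output has to lag the network by at least one frame.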

FEC is only used by the encoder under certain conditions:
* the feature must be enabled via the '''OPUS_SET_INBAND_FEC''' CTL
* the encoder must be told to expect loss via the '''OPUS_SET_PACKET_LOSS_PERC''' CTL
* the codec must be operated in any of the '''Linear Prediction''' or '''Hybrid''' modes

Frame durations shorter than 10ms and very high bitrates will use the MDCT modes, where FEC is not available.

Even when FEC is not used, telling the encoder about the expected level of loss will help it make more intelligent decisions. By default, the implementation assumes there is no loss.

=== I can't use malloc or much stack on my embedded platform. How do I make Opus work? ===

A normal build of libopus only uses <tt>malloc/free</tt> in the <tt>_create()</tt> and <tt>_destroy()</tt> calls, making it safe for realtime use as long as the codec state is pre-created.

To build Opus without the references to <tt>malloc/free</tt>, you must:

* use <tt>init()</tt> calls rather than <tt>create()</tt> calls in your application
* compile with <tt>CFLAGS="-DOVERRIDE_OPUS_ALLOC -DOVERRIDE_OPUS_FREE -D'opus_alloc(x)=NULL' -D'opus_free(x)=NULL' "</tt>.

If libopus is built with <tt>-DNONTHREADSAFE_PSEUDOSTACK</tt> (instead of <tt>VAR_ARRAYS</tt>, or <tt>USE_ALLOCA</tt>), it will use a user-provided block of heap instead of stack for many things, resulting in much lower stack usage. 
This makes the resulting library '''non-threadsafe''' and is '''not recommended''' on anything except limited embedded platforms.

=== How can I ensure that my software interoperates with other software implementing Opus? ===

For applications using Ogg files, there are some [https://people.xiph.org/~greg/opus_testvectors/ Ogg Opus testvectors] to test decoders and you can test encoders with opusdec. For RTP applications, the opusrtp tool can be useful.

In general, here's a list of specific issues to check:
* Can your application handle all frame sizes, including changing the frame size from frame to frame?
* Does your application react properly to lost packets, by calling the decoder with a NULL packet?

=== What is the complexity of Opus? ===

The complexity of Opus varies by a large amount based on the settings used.

It depends on the mode, audio bandwidth, number of channels, and even a "complexity knob" that can trade complexity for quality. It will run easily on any recent PC or smartphone.

For slower embedded CPUs/DSPs, the amount of CPU required will vary depending on the configuration and the exact CPU, so you will need to experiment. Do not expect Opus to run quickly on really slow devices like 8-bit micro-controllers.

=== Opus is using too much CPU for my application. What can I do? ===

First, don't panic, and don't start writing assembly just yet.

It's possible that you're just not using the right set of options.

If you're targeting an embedded/mobile platform, chances are the fixed-point build will be faster, so make sure you're using '''--enable-fixed-point''' or defining '''FIXED_POINT''' in the build system.

Opus also has a complexity option that can trade quality for complexity. The default is highest quality and highest complexity. You can control this using '''OPUS_SET_COMPLEXITY()''' (see the '''[https://www.opus-codec.org/docs/ Documentation]''' for details).

If all else fails and you need to optimize the Opus code, see the next question.

=== I would like to optimize/improve/help with Opus. Where should I start? ===

Please '''[https://www.opus-codec.org/contact/ contact us]''' before you start, or at least before you get too far.

This will help coordinate the efforts made on Opus and reduce the probability of wasting your time on duplicated effort or going down the wrong path. More details in the '''[[OpusContributing|contributing page]]'''.

=== Does Opus have an echo canceller like Speex does? ===

Echo cancellation is completely independent from codecs.

You can use any echo canceller (including the one from libspeexdsp) along with Opus.

That being said, among the free acoustic echo cancellers (AEC) we're aware of, the best is probably the Google AEC from the [https://code.google.com/p/webrtc/ WebRTC codebase].

=== How do I get the duration of a .opus file? ===

Use '''[https://mf4.xiph.org/jenkins/view/opus/job/opusfile-autotools/ws/doc/html/group__stream__info.html op_pcm_total()]''' from '''[https://opus-codec.org/downloads/ libopusfile]'''.

If you want to implement this yourself, you need to
* Read the BOS (Beginning Of Stream) pages to enumerate the serial numbers of all concurrently multiplexed streams, identify the Opus stream you want, and get its preskip value.
* Read up through the first complete audio data page to compute the starting granule position (since the timestamps might not start at 0, e.g., if the file was captured from a live stream that was joined after the start).
* Seek near the end of a file and look for a page with the same serial number as found in the headers (just under 64 kB from the end should be sufficient to ensure you find a page, assuming the Opus data is not multiplexed with another stream and there is no trailing garbage in the file).
* If you find a page whose serial number was not included in the original set of BOS pages, you have a chained stream. You need to bisect the file to identify the end of the first chain and the start of the next, and repeat this process for each link in the chain.
* If you don't find any pages at all, or find a page whose serial number was included in the original set of BOS pages, but was not the serial number of the Opus stream you want, back up and try again (being careful to avoid rescanning the same data, which can produce quadratic worst-case complexity).
* If you find a page whose serial number matches the Opus stream you want, look at its final granule position, and compute the total duration (in seconds) as (final_granule_position - initial_granule_position - preskip)/48000.0.
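The final step of the list above is plain arithmetic and can be sketched as follows. It assumes the granule positions and pre-skip have already been read from the file; the function name and the example values are hypothetical.

```python
OPUS_GRANULE_RATE = 48000.0  # granule positions count 48 kHz samples


def opus_duration_seconds(final_granule_position, initial_granule_position, preskip):
    """Total duration in seconds, following the formula given above."""
    samples = final_granule_position - initial_granule_position - preskip
    if samples < 0:
        raise ValueError("inconsistent granule positions or pre-skip")
    return samples / OPUS_GRANULE_RATE


if __name__ == "__main__":
    # Illustrative values only: a stream starting at 0 with a 312-sample pre-skip.
    print(opus_duration_seconds(169680312, 0, 312))   # 3535.0 seconds
```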

=== Why don't you store the duration in the header? Isn't all of that slow and complicated? ===

Computing the duration directly from the file contents allows files to be written in a single pass, without any seeking, which is necessary for live streaming. Chaining also simplifies live streaming, as you can just pipe multiple files into the same network connection, with all associated metadata updates, etc., and the results are still valid .opus files (contrast with the '''[http://www.smackfu.com/stuff/programming/shoutcast.html hacks used to add metadata to MP3 streams]''').

Opening a typical .opus file, which is not multiplexed and not chained, and computing the duration over the network requires just one extra HTTP request, which can proceed in parallel with the buffering in the main request. This is the behavior you will get from libopusfile's HTTP backend by default.

Enumeration of chain boundaries can be expensive in files with many links, but in our testing libopusfile used nearly an order of magnitude fewer seeks to do this than some other media frameworks (at the time). Storing a duration in a header wouldn't solve this, since every link in a chain has its own, independent headers. If the cost of chain enumeration is a problem, the best way to avoid it is to store the links in separate files (i.e., don't use chaining).

=== How do I seek in a .opus file? ===

Use '''[https://mf4.xiph.org/jenkins/view/opus/job/opusfile-autotools/ws/doc/html/group__stream__seeking.html op_pcm_seek() or op_raw_seek()]''' from '''[https://opus-codec.org/downloads/ libopusfile]'''.

If you want to implement seeking yourself, you need to
* Identify the link that contains the target (if you have a chained file).
* Move the target back by 80 ms to get enough pre-roll data (to ensure the decoder will have converged by the time you reach the target), as recommended by '''[https://tools.ietf.org/html/rfc7845 RFC 7845]'''.
* Estimate the location of the last audio data page with a completed packet prior to the adjusted target, using the duration and size (in bytes) of the link.
* Seek to that location and scan forward until you find an audio data page with a completed packet (that contains a valid granule position).
* If you think you are sufficiently close to the adjusted target, scan forward until you find the next audio data page with a completed packet.
* If the adjusted target lies between the first audio data page with a completed packet you found and the next one, stop. You can decode forward from here and start playing when you reach your (original, unadjusted) target.
* Otherwise, go back and re-estimate the seek location using the granule positions and file offsets of the page(s) you just found.
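Two of the steps above, the pre-roll adjustment and the estimate of where to seek, can be sketched as follows. The estimate here uses plain linear interpolation between two known (granule position, byte offset) points, which is an assumption made for illustration and not necessarily what libopusfile does; all names are hypothetical.

```python
PREROLL_SAMPLES = 80 * 48  # 80 ms of pre-roll at 48 kHz


def adjusted_target(target_gp, link_start_gp):
    """Move the target back by the pre-roll, without leaving the link."""
    return max(link_start_gp, target_gp - PREROLL_SAMPLES)


def estimate_offset(target_gp, low, high):
    """Guess a byte offset for target_gp by linear interpolation.

    low and high are (granule_position, byte_offset) pairs known to
    bracket the target, e.g. the start and end of the link at first and
    the pages found by earlier guesses afterwards.
    """
    (gp_low, off_low), (gp_high, off_high) = low, high
    if gp_high <= gp_low:
        return off_low
    clamped = min(max(target_gp, gp_low), gp_high)
    return off_low + (clamped - gp_low) * (off_high - off_low) // (gp_high - gp_low)


if __name__ == "__main__":
    target = adjusted_target(48000 * 60, 0)   # one minute in, minus pre-roll
    print(target)
    print(estimate_offset(target, (0, 0), (48000 * 600, 7_200_000)))
```

After each guess, the page actually found replaces <tt>low</tt> or <tt>high</tt>, so the bracket shrinks until the adjusted target lies between two adjacent pages.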

libopusfile includes fallbacks to prevent pathological worst-case behavior when its guesses are repeatedly wrong. Weighted bisection can degrade to a linear scan, but libopusfile's worst case is within a constant factor of naive bisection (i.e., logarithmic). We have only ever observed such pathological behavior in files we manually constructed to trigger it.

libopusfile also takes shortcuts when the target location is near the current position, to make small seeks cheaper. In the best case it can loop forever over very short files whose data is contained in a single page (e.g., less than 1 second long with default encoder settings) without any seeking at all.

You can find more information on seeking in files that contain Opus multiplexed with other streams (e.g., video) '''[[GranulePosAndSeeking|on this page]]'''.

=== Wouldn't it be better to build an index? ===

As with file durations, an index at the beginning of the file is incompatible with live streaming. It also means more data has to be fetched before a file can start playing over the network, because you must read past the index even when you don't intend to seek. The index could be stored at the end (which even still allows encoding the file in a single pass), but this requires one (or more) extra seeks to read the index (especially if its exact location at the end is not known), either on file open or on first seek. Unlike the final timestamp, which is small and fixed in size, an index grows with the file duration, and can have unbounded size. It is also easy for an index to become out of sync with a file that has been edited or damaged, in which case seeking will simply fail. By contrast, you can seek in a truncated .opus download without issues.

In practice, bisection seeking on VBR audio achieves performance that is very nearly as good as seeking with an index, without any of the drawbacks of an index. libopusfile provides a test program called seeking_example which can be used to benchmark the performance on your files.

On a 96 kbps VBR file nearly one hour long (the second movement of Mahler's Symphony No. 8 "Symphony of a Thousand"):

 Testing exact PCM seeking to random places in 169680000 samples (58m55.000s)...
 Total seek operations: 1020 (1.020 per exact seek, 2 maximum).

On a chained file formed by concatenating the eight test vectors for the currently supported channel layouts in mapping family 1:

 Opened file containing 8 links with 18 seeks (2.250 per link).
 Testing exact PCM seeking to random places in 2759064 samples (57.481s)...
 Total seek operations: 946 (0.946 per exact seek, 2 maximum).

That is, the number of physical seeks required is almost always 1, every once in a while 2, and in short files, sometimes even 0.

[[Category:Opus]]

OpusFAQ

2017-11-07T16:40:23Z

MrZeus: /* What programs support Opus? */ add context to "relatively new codec"

[[Image:Opus logo trans.png]]

If you are looking for info not covered in this FAQ, try the '''[https://opus-codec.org main Opus website]''' or the pages included in the '''[[:Category:Opus|Opus category]]''' of this wiki.

== General Questions ==

=== What is Opus? Who created it? ===

Opus is a totally open, royalty-free, highly versatile audio codec.

It is primarily designed for interactive speech and music transmission over the Internet, but is also applicable to storage and streaming applications. It incorporates technology from Skype's '''[https://en.wikipedia.org/wiki/SILK SILK]''' codec and Xiph.Org's '''[http://celt-codec.org/ CELT]''' codec. It has been standardized by the '''[https://www.ietf.org/ Internet Engineering Task Force]''' (IETF) as '''[https://tools.ietf.org/html/rfc6716 RFC 6716]'''.

Opus has been in development since early 2007. Programmers associated with '''[https://xiph.org/ Xiph.Org]''', '''[https://www.skype.com/ Skype]''' and several other organizations have contributed to its development and to the standardization process as part of the '''[https://datatracker.ietf.org/wg/codec/charter/ IETF's Codec Working Group]'''.

=== How does Opus compare to other codecs? ===

Opus is distinguished from most high quality formats (e.g. [[Vorbis]], AAC, MP3) by having '''[https://tools.ietf.org/html/rfc6716#section-2 low delay]''' (5 to 66.5 ms) and from most low delay formats (e.g. [[Speex]], G.711, GSM) by supporting '''[https://tools.ietf.org/html/rfc6716#section-2.1.1 high audio quality]''' (from narrow-band all the way to full-band audio).

It '''[https://opus-codec.org/comparison meets or exceeds existing codecs' quality]''' across a wide range of bitrates, and it operates at lower delay than virtually any existing compressed format.

Most importantly, the Opus format and its reference implementation are both available under '''[https://opus-codec.org/license/ liberal, royalty-free licenses]'''. 
This makes it:
* easy to adopt
* compatible with free software
* suitable for use as part of the basic infrastructure of the Internet

=== Does Opus make all those other lossy codecs obsolete? ===

Yes.

From a technical point of view (loss, delay, bitrates, ...) Opus renders '''[[Speex]]''' obsolete and should also replace '''[[Vorbis]]''' and the common proprietary codecs (e.g. AAC, MP3, ...).

=== Will Opus replace Vorbis in video files? ===

For '''[[Ogg]]''' video files (which use the '''[[Theora]]''' video codec), you ''can'' use Opus instead of Vorbis, but the overall size reduction will be minimal and it will break compatibility with existing players.

For WebM video files, the convention is to use the '''[http://www.webmproject.org/vp9/ VP9 video codec]''' when using Opus as an audio codec.

=== How do I use Opus? ===

For now, the best way to '''encode''' audio into Opus files is to use the '''opusenc''' command-line tool from the '''[https://opus-codec.org/downloads/ opus-tools package]'''.

If you want to encode many files at once (e.g. your music library), try the applications listed in the '''[[OpusSupport|Opus Support]]''' page.

For rough guidelines on encoding settings, see the '''[[Opus Recommended Settings]]''' page.

=== What programs support Opus? ===

Opus decoding support is now included in '''[http://caniuse.com/opus some Internet browsers]''' and '''[[OpusSupport|many applications]]''', including '''[https://www.mozilla.org/firefox Firefox]''', '''[https://www.foobar2000.org/ foobar2000]''' and '''[https://www.videolan.org/vlc/ VLC]''', as well as in frameworks such as '''[https://gstreamer.freedesktop.org/ GStreamer]''' and '''[https://ffmpeg.org/ FFmpeg]'''.

For real-time applications, Opus support is available in '''[https://www.webrtc.org/ Google's WebRTC codebase]'''.

Opus is a relatively new codec (standardized in September 2012), but '''[[OpusSupport|many more applications]]''' will support it in the near future.

=== Does Opus support higher sampling rates, such as 96 kHz or 192 kHz? ===

Yes and no.

Opus encoding tools like opusenc will happily encode input files that are sampled at 96 or 192 kHz.

However, files at these rates are internally '''converted to 48 kHz''' and then only frequencies '''up to 20 kHz''' are encoded.

The reason is simple: lossy codecs are designed to preserve audible details while discarding irrelevant information. Since the human ear can only hear up to 20 kHz at best (usually lower than that), frequency content above 20 kHz is the first thing to go.

See Monty's '''[https://people.xiph.org/~xiphmont/demo/neil-young.html article]''' for more details.

If you want a codec to handle higher sampling rates losslessly, use '''[[FLAC]]'''!

=== What are the licensing requirements? ===

The reference Opus source code is released under a three-clause BSD license, which is a very permissive Open Source license. Commercial use and distribution (including in proprietary software) is permitted, provided that some basic conditions specified in the license are met.

Opus is also covered by some patents, for which royalty-free usage rights are granted, under conditions that the authors believe are compatible with (hopefully) all open source licenses, including the GPL (v2 and v3).

See the '''[https://www.opus-codec.org/license/ Opus Licensing]''' page for details.

=== Why make Opus free? ===

On the Internet, protocol and codec standards are part of the common infrastructure everyone builds upon.

Most of the value of a high-quality standard is the innovation and inter-operation provided by the systems built on top of it. When a few parties have monopoly rights to monetize a standard, that infrastructure stops being so common and everyone else has more reason to use their own solution instead, increasing cost and reducing efficiency.

Imagine a road system where each type of car could only drive on its own manufacturer's pavement. We all benefit from living in a world where all the roads are connected.

This is why Opus, unlike many codecs, is free.

=== Is the SILK part of Opus compatible with the SILK implementation shipped in Skype? ===

No.

The SILK codec, as submitted by Skype to the IETF, was heavily modified as part of its integration within Opus. The modifications are significant enough that it is not possible to just write a "translator". Even sharing code between Opus and the "old SILK" would be highly complex.

=== Why not keep the SILK and CELT codecs separate? ===
Opus is more than just two independent codecs with a switch.

In addition to a [https://en.wikipedia.org/wiki/Linear_predictive_coding Linear Prediction] '''SILK mode''' and an [https://en.wikipedia.org/wiki/Modified_discrete_cosine_transform MDCT] '''CELT mode''' it has a '''hybrid mode''', where speech frequencies up to 8 kHz are encoded with LP while those between 8 and 20 kHz are encoded with MDCT. This is what allows Opus to have such high speech quality around 32 kbps.

Another advantage of the integration is the ability to switch between these 3 modes seamlessly, without any audible "glitches" and without any out-of-band signalling.

=== Now that Opus is standardized, will its development stop or can it be further improved? ===
Yes, Opus '''can''' and '''should''' be improved, because unlike most '''[https://en.wikipedia.org/wiki/ITU-T#Key_standards_published_by_ITU ITU-T codecs]''', Opus is only defined in terms of its decoder.

The encoder can keep evolving as long as the bitstream it produces can be decoded by the reference decoder. This is what made it possible for modern MP3 encoders (e.g. '''[https://en.wikipedia.org/wiki/LAME LAME]''') to improve far beyond the original '''[https://en.wikipedia.org/wiki/L3enc L3enc]''' and '''dist10''' reference implementations.

Although it is unlikely that Opus encoders will see such a spectacular evolution, we certainly hope that future encoders will become much better than the reference encoder.

In fact, the 1.1 libopus release significantly improves on the reference encoder's quality. See '''[https://people.xiph.org/~xiphmont/demo/opus/demo3.shtml Monty's demo]''' for more details.

=== Will all future Opus releases comply with the [https://tools.ietf.org/html/rfc6716 Opus specification]? ===

Yes.

=== In what ways is Opus optimized for the Internet? ===

Opus has good packet loss robustness and concealment, but its optimisations go further.

One of the first things we were asked when designing Opus was to make the rate '''really''' adaptable, because we never know what kind of rates will be available. This meant not only having a wide range of bitrates, but also being able to vary the rate in small increments.

This is why Opus scales from about '''6''' to '''512 kb/s''', in increments of '''0.4 kb/s''' (one byte with 20 ms frames). Opus can have '''more than 1200 possible bitrates''' without spending any bits on signalling the bitrate, because UDP already encodes the packet size.

One last aspect is that Opus is simple to transport over RTP, as can be seen from the [https://tools.ietf.org/html/rfc7587 Opus RTP payload format]. For example, it's possible to decode RTP packets without having even seen the SDP or any out-of-band signalling.
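
The granularity quoted above follows from simple arithmetic: adding one byte to every packet changes the bitrate by 8 bits per frame duration. The sketch below is only an illustration; the function name is made up for this example, and the 6 and 512 kb/s bounds are the approximate figures given in this answer.

```python
def bitrate_bps(packet_bytes, frame_ms=20):
    """Bitrate implied by sending one packet of this size every frame_ms."""
    return packet_bytes * 8 * 1000 / frame_ms


# One extra byte per 20 ms frame is the smallest possible rate change.
step = bitrate_bps(1)  # 400.0 bit/s, i.e. 0.4 kb/s

# Number of distinct rates between roughly 6 and 512 kb/s at that step.
count = int((512_000 - 6_000) / step) + 1

print(f"step: {step / 1000} kb/s, distinct bitrates: {count}")
```

Because the receiver can recover the rate from the packet size and the frame duration, the same calculation also works in reverse on the decoding side.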

=== What applications for Android can play Opus? ===

Right now there are just a few, but the list is growing fast. Please refer to [https://android.stackexchange.com/q/37970/7425 this question on android.stackexchange.com]. Feel free to suggest other applications.

=== When will the next version be released? ===

When it's done. Seriously, we do not know.

Opus is not a large project with a fixed release schedule.

That being said, our '''[https://www.opus-codec.org/downloads/ pre-releases]''' and even the git repositories ('''[https://git.xiph.org/?p=opus.git Xiph]''', '''[https://github.com/xiph/opus GitHub]''') are pretty stable and given proper testing (which you should always do anyway), are safe to distribute.

Just be aware that the API of new features (that have never been included in a stable release) could potentially still change.

== Software Developers' Questions ==

=== On what platforms does Opus run? ===

The Opus code base is written in C89 and should run on the vast majority of recent (and not so recent) CPUs.

Some of the platforms '''[https://mf4.xiph.org/jenkins/view/opus/ on which Opus has been tested]''' include x86, x86-64, ARM, Itanium, Blackfin, and SPARC.

=== Is there a fixed-point implementation? ===

Yes.

The fixed-point and floating-point decoder and encoder implementations are part of the same code base.

The code defaults to float, so you need to configure with '''--enable-fixed-point''' (or define '''FIXED_POINT''' if not using the configure script) to build the code for fixed-point.

=== Which implementation should I use? ===

While the implementation in RFC 6716 is what ''defines'' the standard, it is likely not the best and most up-to-date implementation.

The [https://opus-codec.org/ Opus] website was set up for the purpose of continually improving the implementation — in terms of speed, encoding quality, device compatibility, etc — while still conforming to the standard.

All Opus implementations are compatible by definition.

=== How is supporting Opus different from supporting Speex/G.711/MP3? ===

Opus has variable frame durations which can change on the fly, so an Opus decoder needs to be ready to accept packets with durations that are '''any multiple of 2.5ms''' up to a '''maximum of 120ms'''.
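
As a rough illustration of the duration rule above, a receiver can sanity-check the duration reported for each packet and size its output buffer for the worst case. This is a sketch with made-up helper names, assuming durations are counted in samples at 48 kHz; it does not call libopus.

```python
SAMPLES_PER_2_5_MS = 120    # 2.5 ms at 48 kHz
MAX_PACKET_SAMPLES = 5760   # 120 ms at 48 kHz


def is_valid_packet_duration(samples_at_48k):
    """True if the duration is a multiple of 2.5 ms and at most 120 ms."""
    return (0 < samples_at_48k <= MAX_PACKET_SAMPLES
            and samples_at_48k % SAMPLES_PER_2_5_MS == 0)


def max_output_samples(channels):
    """Interleaved samples needed to hold the longest possible packet."""
    return MAX_PACKET_SAMPLES * channels
```

For example, a 20 ms packet is 960 samples per channel at 48 kHz, and a stereo decoder needs room for 11520 interleaved samples to accept any packet.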

The Opus encoder and decoder do not need to have matched sampling rates or channel counts. It is recommended to always decode at the highest rate the hardware supports (e.g. 48 kHz stereo) so the user gets the full quality of whatever the far end is sending.

=== My application doesn't work. Can anyone help me? ===

It's possible to get help, but before doing so, there are a few basic things to try:

* Implement your application with uncompressed audio instead of Opus. If it still doesn't work, then the problem isn't related to Opus.
* Read the [https://www.opus-codec.org/docs/ Opus documentation].
* Read the [https://git.xiph.org/?p=opus.git;a=blob;f=src/opus_demo.c opus_demo.c] source code to see how to use the encoder and decoder.

If you still can't solve the problem, the best option is to ask for help on the [http://lists.xiph.org/mailman/listinfo/opus mailing list] or on the '''#opus''' IRC channel on '''irc.freenode.net'''.

=== How do I report a bug? ===

If you think you have found a bug in Opus (and not in your application), please [https://trac.xiph.org/newticket?component=Opus file a bug report].

Please include a way for us to reproduce the problem. The best way to do this is to provide an input file, along with the opusenc/opusdec/opus_demo command line that causes the bug to occur.

If the bug cannot be triggered by the command line tools, please provide a simple patch or C file that can help reproduce it. Please also provide any other relevant information, such as OS, CPU, build options, etc.

Don't hesitate to also contact us on the [http://lists.xiph.org/mailman/listinfo/opus mailing list] or on [irc://irc.freenode.net/opus IRC].

=== What is Opus Custom? ===

Opus Custom is an '''optional''' part of the Opus standard that allows for sampling rates other than 8, 12, 16, 24, or 48 kHz and frame sizes other than multiples of 2.5 ms.

Opus Custom requires additional out-of-band signalling that Opus does not normally require and disables many of Opus' coding modes. Also, because it is an optional part of the specification, using Opus Custom may lead to compatibility problems.

For these reasons, '''its use is discouraged''' outside of very specific applications.

You may want to use Opus Custom for:

* ultra-low-delay applications, where synchronization with the soundcard buffer is important.
* low-power embedded applications, where compatibility with others is not important.

For almost all other types of applications, Opus Custom should not be used.

=== How do I use 44.1 kHz or some other sampling rate not directly supported by Opus? ===

Tools which read or write Opus should inter-operate with other sampling rates by transparently performing sample rate conversion behind the scenes whenever necessary. In particular, software developers should not use Opus Custom for 44.1 kHz support, except in the very specific circumstances outlined above.

Note that it's generally preferable for a decoder to output at 48kHz, even when you know the original input was 44.1kHz. This is not only because you can skip resampling, but also because many cheaper audio interfaces have poor quality output for 44.1kHz.

The '''[https://opus-codec.org/downloads/ opus-tools]''' package source code contains a small, high quality, high performance, BSD licensed '''[https://github.com/xiph/opus-tools/blob/master/src/resample.c resampler]''' which can be used where resampling is required.
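
To give an idea of what the conversion involves, the ratio between 48 kHz and 44.1 kHz reduces to a small rational number. The sketch below only computes that ratio and the resulting sample count; it is not a resampler, and the helper names are hypothetical.

```python
from fractions import Fraction

OPUS_RATE = 48000


def resample_ratio(input_rate):
    """Exact output/input sample ratio when converting to 48 kHz."""
    return Fraction(OPUS_RATE, input_rate)


def resampled_length(num_samples, input_rate):
    """Output sample count, rounded up (ignores any filter delay)."""
    ratio = resample_ratio(input_rate)
    return -(-num_samples * ratio.numerator // ratio.denominator)


print(resample_ratio(44100))            # 160/147
print(resampled_length(441000, 44100))  # 480000 (10 seconds of audio)
```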

=== But won't the resampler hurt the quality? Isn't it better to use 44.1 kHz directly? ===

Not really. The quality degradation caused by any reasonable resampler (SoX, libspeexdsp, libsamplerate, ...) is far less than the distortion caused by the best lossy codec at its highest bitrate. If you can't tolerate the quality degradation caused by a good 44.1 ↔ 48 kHz resampler, then you shouldn't be using a lossy codec in the first place. Similarly, the extra CPU spent in the resampler is small compared to the rest of the codec. Not only that, but many soundcards only support 48 kHz on playback, so players can directly play the output rather than resample it to 48 kHz (e.g. for a 44.1 kHz MP3). So effectively, Opus is only shifting the burden of resampling from the decoder side to the encoder side.

One advantage of supporting only one internal rate is that it makes it possible for Opus to support many features, including efficient speech compression (through SILK) and real-time applications. It also means all the quality tuning effort can be spent on a single configuration, which helps bring even better quality.

=== How is the bitrate setting used in VBR mode? ===

Variable bitrate (VBR) mode allows the bitrate to automatically vary over time based on the audio being encoded, in order to achieve a consistent quality.

The bitrate setting controls the desired quality, on a scale that is calibrated to closely approximate the average bitrate that would be obtained over a large and diverse collection of audio. The actual bitrate of any particular audio stream may be higher or lower than this average.

=== What frame size should I use? ===

A '''20ms''' frame size works well for most applications. Smaller frame sizes may be used to achieve lower latency, but have lower quality at a given bitrate.

Sizes greater than 20 ms increase latency and are generally beneficial only at fairly low bitrates, or when used to reduce external overhead (e.g. by reducing the number of packets that are sent). For file encoding, using a frame size larger than 20 ms will usually result in '''worse''' quality for the same bitrate because it constrains the encoder in the decisions it can make.
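
To put the "external overhead" in perspective, the sketch below estimates the bitrate consumed by packet headers alone. It assumes plain IPv4 + UDP + RTP headers (20 + 8 + 12 bytes, no options or extensions) and one frame per packet; other transports will give different numbers.

```python
HEADER_BYTES = 20 + 8 + 12  # IPv4 + UDP + RTP, without options or extensions


def overhead_bps(frame_ms):
    """Header bitrate when one packet is sent every frame_ms milliseconds."""
    packets_per_second = 1000 / frame_ms
    return HEADER_BYTES * 8 * packets_per_second


for frame_ms in (10, 20, 40, 60):
    kbps = overhead_bps(frame_ms) / 1000
    print(f"{frame_ms:>3} ms frames: {kbps:.1f} kb/s of headers")
```

Under these assumptions, 20 ms frames cost 16 kb/s in headers, which is why longer frames can pay off when the audio bitrate itself is very low.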

=== Forward Error correction (FEC) doesn't appear to do anything! HELP! ===

The in-band FEC feature of Opus helps reduce the harm of packet loss by encoding some information about the prior packet.

In order to make use of in-band FEC, the receiver must delay its output by at least one frame, so that when a packet is lost it can call the decoder on the ''next'' packet with the decode_fec argument set and reconstruct the missed frame. This works best if it's integrated with a jitter buffer.

FEC is only used by the encoder under certain conditions:
* the feature must be enabled via the '''OPUS_SET_INBAND_FEC''' CTL
* the encoder must be told to expect loss via the '''OPUS_SET_PACKET_LOSS_PERC''' CTL
* the codec must be operated in any of the '''Linear Prediction''' or '''Hybrid''' modes

Frame durations shorter than 10ms and very high bitrates will use the MDCT modes, where FEC is not available.

Even when FEC is not used, telling the encoder about the expected level of loss will help it make more intelligent decisions. By default, the implementation assumes there is no loss.
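
The receiver-side logic described above can be sketched as follows. This is a simplified model, not libopus code: it only decides which kind of decoder call to make for each frame, assuming one frame per packet and a list where lost packets are represented as <tt>None</tt>. The function name and the action labels are invented for this example.

```python
def plan_decode_calls(packets):
    """Decide how to produce each frame, looking one packet ahead.

    packets[i] is the payload for frame i, or None if it never arrived.
    Returns a list of (frame_index, action, source_packet_index).
    """
    plan = []
    for i, payload in enumerate(packets):
        nxt = packets[i + 1] if i + 1 < len(packets) else None
        if payload is not None:
            plan.append((i, "decode", i))          # normal decode
        elif nxt is not None:
            plan.append((i, "decode_fec", i + 1))  # decode_fec set, next packet
        else:
            plan.append((i, "conceal", None))      # NULL packet: concealment
    return plan


for step in plan_decode_calls([b"a", None, b"c", None, None, b"f"]):
    print(step)
```

Note that the decision for frame <tt>i</tt> depends on whether packet <tt>i + 1</tt> has arrived, which is exactly why the output has to be delayed by one frame.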

=== I can't use malloc or much stack on my embedded platform. How do I make Opus work? ===

A normal build of libopus only uses <tt>malloc/free</tt> in the <tt>_create()</tt> and <tt>_destroy()</tt> calls, making it safe for realtime use as long as the codec state is pre-created.

To build Opus without the references to <tt>malloc/free</tt>, you must:

* use <tt>init()</tt> calls rather than <tt>create()</tt> calls in your application
* compile with <tt>CFLAGS="-DOVERRIDE_OPUS_ALLOC -DOVERRIDE_OPUS_FREE -D'opus_alloc(x)=NULL' -D'opus_free(x)=NULL' "</tt>.

If libopus is built with <tt>-DNONTHREADSAFE_PSEUDOSTACK</tt> (instead of <tt>VAR_ARRAYS</tt>, or <tt>USE_ALLOCA</tt>), it will use a user-provided block of heap instead of stack for many things, resulting in much lower stack usage. 
This makes the resulting library '''non-threadsafe''' and is '''not recommended''' on anything except limited embedded platforms.

=== How can I ensure that my software interoperates with other software implementing Opus? ===

For applications using Ogg files, there are some [https://people.xiph.org/~greg/opus_testvectors/ Ogg Opus testvectors] to test decoders and you can test encoders with opusdec. For RTP applications, the opusrtp tool can be useful.

In general, here's a list of specific issues to check:
* Can your application handle all frame sizes, including changing the frame size from frame to frame?
* Does your application react properly to lost packets, by calling the decoder with a NULL packet?

=== What is the complexity of Opus? ===

The complexity of Opus varies by a large amount based on the settings used.

It depends on the mode, audio bandwidth, number of channels, and even a "complexity knob" that can trade complexity for quality. It will run easily on any recent PC or smartphone.

For slower embedded CPUs/DSPs, the amount of CPU required will vary depending on the configuration and the exact CPU, so you will need to experiment. Do not expect Opus to run quickly on really slow devices like 8-bit micro-controllers.

=== Opus is using too much CPU for my application. What can I do? ===

First, don't panic, and don't start writing assembly just yet.

It's possible that you're just not using the right set of options.

If you're targeting an embedded/mobile platform, chances are the fixed-point build will be faster, so make sure you're using '''--enable-fixed-point''' or defining '''FIXED_POINT''' in the build system.

Opus also has a complexity option that can trade quality for complexity. The default is highest quality and highest complexity. You can control this using '''OPUS_SET_COMPLEXITY()''' (see the '''[https://www.opus-codec.org/docs/ Documentation]''' for details).

If all else fails and you need to optimize the Opus code, see the next question.

=== I would like to optimize/improve/help with Opus. Where should I start? ===

Please '''[https://www.opus-codec.org/contact/ contact us]''' before you start, or at least before you get too far.

This will help coordinate the efforts made on Opus and reduce the probability of wasting your time on duplicated effort or going down the wrong path. More details in the '''[[OpusContributing|contributing page]]'''.

=== Does Opus have an echo canceller like Speex does? ===

Echo cancellation is completely independent from codecs.

You can use any echo canceller (including the one from libspeexdsp) along with Opus.

That being said, among the free acoustic echo cancellers (AEC) we're aware of, the best is probably the Google AEC from the [https://code.google.com/p/webrtc/ WebRTC codebase].

=== How do I get the duration of a .opus file? ===

Use '''[https://mf4.xiph.org/jenkins/view/opus/job/opusfile-autotools/ws/doc/html/group__stream__info.html op_pcm_total()]''' from '''[https://opus-codec.org/downloads/ libopusfile]'''.

If you want to implement this yourself, you need to
* Read the BOS (Beginning Of Stream) pages to enumerate the serial numbers of all concurrently multiplexed streams, identify the Opus stream you want, and get its preskip value.
* Read up through the first complete audio data page to compute the starting granule position (since the timestamps might not start at 0, e.g., if the file was captured from a live stream that was joined after the start).
* Seek near the end of a file and look for a page with the same serial number as found in the headers (just under 64 kB from the end should be sufficient to ensure you find a page, assuming the Opus data is not multiplexed with another stream and there is no trailing garbage in the file).
* If you find a page whose serial number was not included in the original set of BOS pages, you have a chained stream. You need to bisect the file to identify the end of the first chain and the start of the next, and repeat this process for each link in the chain.
* If you don't find any pages at all, or find a page whose serial number was included in the original set of BOS pages, but was not the serial number of the Opus stream you want, back up and try again (being careful to avoid rescanning the same data, which can produce quadratic worst-case complexity).
* If you find a page whose serial number matches the Opus stream you want, look at its final granule position, and compute the total duration (in seconds) as (final_granule_position - initial_granule_position - preskip)/48000.0.
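
The last two steps can be sketched as below for the simple case of an unchained, unmultiplexed file. This is a minimal illustration, not what libopusfile does: it scans a chunk read from the end of the file for Ogg page headers without verifying page CRCs (so a stray <tt>OggS</tt> byte sequence could fool it), and the function names are made up for this example.

```python
import struct

# Fixed 27-byte part of an Ogg page header: capture pattern, version,
# flags, granule position, serial number, sequence number, CRC, segments.
OGG_HEADER = struct.Struct("<4sBBqIIIB")


def last_granule_position(tail, serial):
    """Granule position of the last page for `serial` in `tail`, or None."""
    result = None
    pos = tail.find(b"OggS")
    while pos != -1 and pos + OGG_HEADER.size <= len(tail):
        _, version, _, granule, page_serial, _, _, _ = \
            OGG_HEADER.unpack_from(tail, pos)
        # A granule position of -1 means no packet finishes on that page.
        if version == 0 and page_serial == serial and granule != -1:
            result = granule
        pos = tail.find(b"OggS", pos + 4)
    return result


def duration_seconds(final_gp, initial_gp, preskip):
    """Total duration, using the formula from the last step above."""
    return (final_gp - initial_gp - preskip) / 48000.0
```

With illustrative values, <tt>duration_seconds(169680312, 0, 312)</tt> gives 3535.0 seconds, i.e. 58m55s.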

=== Why don't you store the duration in the header? Isn't all of that slow and complicated? ===

Computing the duration directly from the file contents allows files to be written in a single pass, without any seeking, which is necessary for live streaming. Chaining also simplifies live streaming, as you can just pipe multiple files into the same network connection, with all associated metadata updates, etc., and the results are still valid .opus files (contrast with the '''[http://www.smackfu.com/stuff/programming/shoutcast.html hacks used to add metadata to MP3 streams]''').

Opening a typical .opus file, which is not multiplexed and not chained, and computing the duration over the network requires just one extra HTTP request, which can proceed in parallel with the buffering in the main request. This is the behavior you will get from libopusfile's HTTP backend by default.

Enumeration of chain boundaries can be expensive in files with many links, but in our testing libopusfile used nearly an order of magnitude fewer seeks to do this than some other media frameworks (at the time). Storing a duration in a header wouldn't solve this, since every link in a chain has its own, independent headers. If the cost of chain enumeration is a problem, the best way to avoid it is to store the links in separate files (i.e., don't use chaining).

=== How do I seek in a .opus file? ===

Use '''[https://mf4.xiph.org/jenkins/view/opus/job/opusfile-autotools/ws/doc/html/group__stream__seeking.html op_pcm_seek() or op_raw_seek()]''' from '''[https://opus-codec.org/downloads/ libopusfile]'''.

If you want to implement seeking yourself, you need to
* Identify the link that contains the target (if you have a chained file).
* Adjust the target by 80 ms to get enough pre-roll data (to ensure the decoder will have converged by the time you reach the target), as recommended by '''[https://tools.ietf.org/html/rfc7845 RFC 7845]'''.
* Estimate the location of the last audio data page with a completed packet prior to the adjusted target, using the duration and size (in bytes) of the link.
* Seek to that location and scan forward until you find an audio data page with a completed packet (that contains a valid granule position).
* If you think you are sufficiently close to the adjusted target, scan forward until you find the next audio data page with a completed packet.
* If the adjusted target lies between the first audio data page with a completed packet you found and the next one, stop. You can decode forward from here and start playing when you reach your (original, unadjusted) target.
* Otherwise, go back and re-estimate the seek location using the granule positions and file offsets of the page(s) you just found.
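
The loop described in these steps can be modelled as a weighted bisection over byte offsets. The sketch below is a simplified simulation, not libopusfile's implementation: the "file" is an in-memory list of <tt>(byte_offset, granule_position)</tt> pairs for pages with a completed packet, "seek and scan forward" is a lookup in that list, and the fallback to plain bisection when the interval fails to halve is just one possible way to avoid degenerate behaviour. All names are hypothetical.

```python
import bisect

PRE_ROLL = 3840  # 80 ms at 48 kHz, the pre-roll recommended by RFC 7845


def find_seek_page(pages, target, end_offset):
    """Return (page, seeks): the last page ending at or before the
    pre-roll-adjusted target, and the number of simulated seeks."""
    offsets = [off for off, _ in pages]
    adjusted = target - PRE_ROLL
    if pages[0][1] > adjusted:
        return pages[0], 0              # target is near the start of the link
    lo_off, lo_gp = pages[0]            # known page at or before the target
    hi_off, hi_gp = end_offset, pages[-1][1]  # everything from here is too late
    seeks = 0
    use_midpoint = False
    while hi_off - lo_off > 1:
        span = hi_off - lo_off
        if use_midpoint or hi_gp <= lo_gp:
            guess = lo_off + span // 2  # plain bisection fallback
        else:                           # weighted (interpolated) estimate
            guess = lo_off + int(span * (adjusted - lo_gp) / (hi_gp - lo_gp))
        guess = min(max(guess, lo_off + 1), hi_off - 1)
        seeks += 1
        i = bisect.bisect_left(offsets, guess)  # "seek, then scan forward"
        if i < len(pages) and pages[i][1] <= adjusted:
            if i + 1 == len(pages) or pages[i + 1][1] > adjusted:
                return pages[i], seeks  # target lies between page i and i+1
            lo_off, lo_gp = pages[i + 1]
        else:
            if i < len(pages):
                hi_gp = pages[i][1]
            hi_off = guess
        use_midpoint = hi_off - lo_off > span // 2
    return (lo_off, lo_gp), seeks
```

For instance, with five pages 4000 bytes apart that each end one second (48000 samples) after the previous one, seeking to sample 150000 returns the page at offset 8000 whose granule position is 144000; decoding forward from there covers the pre-roll before the target is reached.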

libopusfile includes fallbacks to prevent pathological worst-case behavior when its guesses are repeatedly wrong. Weighted bisection can degrade to a linear scan, but libopusfile's worst case is within a constant factor of naive bisection (i.e., logarithmic). We have only ever observed such pathological behavior in files we manually constructed to trigger it.

libopusfile also takes shortcuts when the target location is near the current position, to make small seeks cheaper. In the best case it can loop forever over very short files whose data is contained in a single page (e.g., less than 1 second long with default encoder settings) without any seeking at all.

You can find more information on seeking in files that contain Opus multiplexed with other streams (e.g., video) '''[[GranulePosAndSeeking|on this page]]'''.

=== Wouldn't it be better to build an index? ===

As with file durations, an index at the beginning of the file is incompatible with live streaming. It also means more data has to be fetched before a file can start playing over the network, because you must read past the index even when you don't intend to seek. The index could be stored at the end (which even still allows encoding the file in a single pass), but this requires one (or more) extra seeks to read the index (especially if its exact location at the end is not known), either on file open or on first seek. Unlike the final timestamp, which is small and fixed in size, an index grows with the file duration, and can have unbounded size. It is also easy for an index to become out of sync with a file that has been edited or damaged, in which case seeking will simply fail. By contrast, you can seek in a truncated .opus download without issues.

In practice, bisection seeking on VBR audio achieves performance that is very nearly as good as seeking with an index, without any of the drawbacks of an index. libopusfile provides a test program called seeking_example which can be used to benchmark the performance on your files.

On a 96 kbps VBR file nearly one hour long (the second movement of Mahler's Symphony No. 8 "Symphony of a Thousand"):

 Testing exact PCM seeking to random places in 169680000 samples (58m55.000s)...
 Total seek operations: 1020 (1.020 per exact seek, 2 maximum).

On a chained file formed by concatenating the eight test vectors for the currently supported channel layouts in mapping family 1:

 Opened file containing 8 links with 18 seeks (2.250 per link).
 Testing exact PCM seeking to random places in 2759064 samples (57.481s)...
 Total seek operations: 946 (0.946 per exact seek, 2 maximum).

That is, the number of physical seeks required is almost always 1, every once in a while 2, and in short files, sometimes even 0.

[[Category:Opus]]

Icecast Server/known https restrictions

2017-11-03T16:40:43Z

MrZeus: /* TLS Mode compatibility charts */ whitespace, grammar, emboldening

This page lists known problems with the latest Icecast release when operating with TLS enabled.

* 'listenurl' in the internal XML status representation is not protocol aware and will always use 'http' + global hostname (default: "localhost") and port (default: first listen-socket).
* Virtual playlist files don't work
* Authentication helper doesn't work (needs verification)
* Certificate reload is not implemented in 2.4.x. Icecast2 2.4.x needs to be restarted to reload the certificate. (supported in branch ph3-update-TLS.)
* …

== TLS Mode compatibility charts ==
The following tables list Icecast configuration settings (horizontal) versus client settings (vertical).

Note: While '''auto''' mode may connect using TLS, it will not establish a secure connection. '''auto_no_plain''' will ensure a secure connection.

=== Icecast2 2.4.x ===
{| class="wikitable"
! rowspan="2" |
! colspan="2" | libshout
|-
! 0 !! 1
|-
! disabled
| Yes || No
|-
! auto
| Yes || Yes
|-
! auto_no_plain
| No || Yes
|-
! [https://tools.ietf.org/html/rfc2817 RFC2817]
| No || No
|-
! [https://tools.ietf.org/html/rfc2818 RFC2818]
| No || Yes
|}

=== Icecast2 2.5.x (branch "master") ===
Note: for truth values the following keywords can be used in the configuration: 0, false, no, off, 1, true, yes, on

{| class="wikitable"
! rowspan="3" |
! colspan="7" | libshout
|-
! colspan="2" | TLS not configured
! colspan="5" | TLS configured
|-
! disabled !! auto, false
! disabled !! auto, false !! auto_no_plain !! rfc2817 !! rfc2818, true
|-
! disabled
| Yes || Yes || Yes || Yes || No || No || No
|-
! auto
| Yes || Yes || Yes || Yes || Yes || Yes || Yes
|-
! auto_no_plain
| No || No || No || Yes || Yes || Yes || Yes
|-
! [https://tools.ietf.org/html/rfc2817 RFC2817]
| No || No || No || Yes || Yes || Yes || No
|-
! [https://tools.ietf.org/html/rfc2818 RFC2818]
| No || No || No || Yes || Yes || No || Yes
|}
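
For scripted checks, the 2.5.x chart can be transcribed into a small lookup table. The sketch below simply copies the table above, following the page's statement that columns are Icecast settings and rows are client settings; the names are made up for this example and nothing here queries Icecast or libshout.

```python
# Columns of the chart, left to right: (TLS state, Icecast setting).
SERVER_COLUMNS = [
    ("not configured", "disabled"),
    ("not configured", "auto, false"),
    ("configured", "disabled"),
    ("configured", "auto, false"),
    ("configured", "auto_no_plain"),
    ("configured", "rfc2817"),
    ("configured", "rfc2818, true"),
]

# Rows of the chart: client mode -> one Y/N per column.
CLIENT_ROWS = {
    "disabled":      "YYYYNNN",
    "auto":          "YYYYYYY",
    "auto_no_plain": "NNNYYYY",
    "rfc2817":       "NNNYYYN",
    "rfc2818":       "NNNYYNY",
}


def works(client_mode, tls_state, server_setting):
    """True if the chart lists this combination as working."""
    column = SERVER_COLUMNS.index((tls_state, server_setting))
    return CLIENT_ROWS[client_mode][column] == "Y"


print(works("auto_no_plain", "configured", "auto, false"))  # True
print(works("rfc2818", "configured", "rfc2817"))            # False
```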

[[Category:Icecast]]

Icecast Server/known https restrictions

2017-11-03T16:38:21Z

MrZeus: /* TLS Mode compatibility charts */ linkify RFC mentions

This page lists known problems with the latest Icecast release when operating with TLS enabled.

* 'listenurl' in the internal XML status representation is not protocol aware and will always use 'http' + global hostname (default: "localhost") and port (default: first listen-socket).
* Virtual playlist files don't work
* Authentication helper doesn't work (needs verification)
* Certificate reload is not implemented in 2.4.x. Icecast2 2.4.x needs to be restarted to reload the certificate. (supported in branch ph3-update-TLS.)
* …

== TLS Mode compatibility charts ==
The following tables list Icecast configuration settings (horizontal) versus client settings (vertical).
Note: While auto mode may connect using TLS, it will not establish a secure connection. auto_no_plain will ensure a secure connection.

=== Icecast2 2.4.x ===
{| class="wikitable"
! rowspan="2" |
! colspan="2" | libshout
|-
! 0 !! 1
|-
! disabled
| Yes || No
|-
! auto
| Yes || Yes
|-
! auto_no_plain
| No || Yes
|-
! [https://tools.ietf.org/html/rfc2817 RFC2817]
| No || No
|-
! [https://tools.ietf.org/html/rfc2818 RFC2818]
| No || Yes
|}

=== Icecast2 2.5.x (branch "master") ===
Note: for truth values the following keywords can be used in the configuration: 0, false, no, off, 1, true, yes, on

{| class="wikitable"
! rowspan="3" |
! colspan="7" | libshout
|-
! colspan="2" | TLS not configured
! colspan="5" | TLS configured
|-
! disabled !! auto, false
! disabled !! auto, false !! auto_no_plain !! rfc2817 !! rfc2818, true
|-
! disabled
| Yes || Yes || Yes || Yes || No || No || No
|-
! auto
| Yes || Yes || Yes || Yes || Yes || Yes || Yes
|-
! auto_no_plain
| No || No || No || Yes || Yes || Yes || Yes
|-
! [https://tools.ietf.org/html/rfc2817 RFC2817]
| No || No || No || Yes || Yes || Yes || No
|-
! [https://tools.ietf.org/html/rfc2818 RFC2818]
| No || No || No || Yes || Yes || No || Yes
|}

[[Category:Icecast]]

Icecast Server/Git workflow

2017-11-03T16:05:33Z

MrZeus: /* Repositories */ headerize headers.

The Icecast project recently migrated from Subversion to Git. This page outlines how to get started with it!

== Repositories ==

The repositories are at [https://git.xiph.org git.xiph.org] and are mirrored to [https://github.com/xiph GitHub]. All repository names start with "icecast-" for clarity.
{| class="wikitable"
!Name
!Anonymous access URL
!SSH URL (only project members)
!Comments
|-
|[https://git.xiph.org/?p=icecast-server.git Icecast server]
|<code><nowiki>https://git.xiph.org/icecast-server.git</nowiki></code>
|<code>ssh://git@git.xiph.org/icecast-server.git</code>
|
|-
|[https://git.xiph.org/?p=icecast-ices.git IceS]
|<code><nowiki>https://git.xiph.org/icecast-ices.git</nowiki></code>
|<code>ssh://git@git.xiph.org/icecast-ices.git</code>
|
|-
|[https://git.xiph.org/?p=icecast-libshout.git libshout]
|<code><nowiki>https://git.xiph.org/icecast-libshout.git</nowiki></code>
|<code>ssh://git@git.xiph.org/icecast-libshout.git</code>
|
|-
|[https://git.xiph.org/?p=icecast-directory.git Icecast directory]
|<code><nowiki>https://git.xiph.org/icecast-directory.git</nowiki></code>
|<code>ssh://git@git.xiph.org/icecast-directory.git</code>
|As seen running on http://dir.xiph.org - soon™
|-
|[https://git.xiph.org/?p=icecast-common.git Icecast shared code]
|<code><nowiki>https://git.xiph.org/icecast-common.git</nowiki></code>
|<code>ssh://git@git.xiph.org/icecast-common.git</code>
|No need to check out separately, see below.
|-
|[https://git.xiph.org/?p=icecast-m4.git Icecast shared autofoo]
|<code><nowiki>https://git.xiph.org/icecast-m4.git</nowiki></code>
|<code>ssh://git@git.xiph.org/icecast-m4.git</code>
|No need to check out separately, see below.
|}

The repositories were migrated with their full history, but for reference the old Subversion repository structure remains [https://trac.xiph.org/browser/icecast/ browseable], and all subprojects can be checked out from <nowiki>http://svn.xiph.org/icecast/trunk/</nowiki><projectname>. This may be useful for branches that were not migrated (not all were) and for unmaintained projects that were never moved to Git.

== Cloning the Repo ==
First you need to clone the Git repository. Because we use submodules, these should be cloned as well. To do this, run:

<nowiki>git clone --recursive https://git.xiph.org/icecast-server.git</nowiki>

If your Git version (<code>git --version</code>) is lower than 1.6.5, do:

 <nowiki>git clone https://git.xiph.org/icecast-server.git</nowiki>
 cd icecast-server
 git submodule update --init

== Initializing the Submodules ==
The steps we did above, for cloning, initialized the Submodules, but if you want to do any changes to them
and push them back to the remote repository, we need to set them to a specific branch, in this case, master.

First of all, check out the master branch. Depending on your git version, your submodules may have been initialized in a detached HEAD state.

git submodule foreach git checkout master

(If your git version does not support this, <code>cd</code> into each submodule and run <code>git checkout master</code>)

== Pushing changes to a remote Server ==
When you are done with some super cool new feature, or even while working on it, you may want to push your current state to the remote repository, so others can test it and give you some feedback!
For this example let's assume you've built an ACL, and therefore changed something in httpp.c, which is in the common submodule, and changed a lot of stuff in the parent repository.

First you need to commit the changes you made in the common submodule, so <code>cd</code> into it and run

git status

This lists the changes you made. Each change you want in the commit needs to be added; let's assume (which is the most common case) you want to commit all changes.
You could either do <code>git add .</code> followed by <code>git commit</code>, or even shorter:

git commit -a

The <code>-a</code> or <code>--all</code> option will add all changed or deleted files, but not add any untracked files.

Now enter a meaningful commit message, the first line should be a rough summary, followed by two newlines and a more verbose description. Less is not more in this case, that’s what the summary is for.

Now it's time to push the changes to the remote server. If this is the first time you do this, you might need to set the origin URL: it defaults to an http(s) one, so that people without ssh access can clone the repository and its submodules too, but for pushing you want to use ssh. Let’s set the remote origin like this:

<nowiki>git remote set-url origin ssh://git@git.xiph.org/icecast-common.git</nowiki>

Now push the changes to the remote location:

git push origin master

This tells git to push your copy of the master branch to the remote location origin (that we’ve just set to the right url).

Now that we have taken care of the submodule, let's <code>cd</code> back into the parent repository and commit the changes we made there:

git commit -a

Now enter a meaningful commit message. (Yes, I sound like a broken record, but this is important)

Push the stuff to the remote:

git push origin master

(If you are on a different branch than master, you probably want to replace master with the branch you are on, obviously, or just do <code>git push</code>)

NOTE: Even if you had changed nothing in the parent repository, only in the submodule repository, you would still need to commit the new submodule version to the parent repository. If you had only updated httpp.c, you would still need to run <code>git commit -a -m "Update commons to recent version for latest httpp changes"</code> and push it, so that the parent repository points to the right submodule version.

== Updating the local repository ==
Let's say someone else committed something and pushed it, and you want to update your local copy to match the remote. Let's assume you have no local changes, so you are just a bit behind in history; then it is as simple as:

git pull

and

git submodule update

to make sure submodules are up to date too.

If you do have local changes, git needs to reconcile them with the remote changes. We prefer to avoid merge commits, unless a larger feature was developed on its own branch.

This can be accomplished either by always running:

git pull --rebase

Or by setting up git to do this automatically for you. The following passage is taken verbatim from [https://coderwall.com/p/tnoiug/rebase-by-default-when-doing-git-pull Marcin Kulik's blog post on the same topic]

''In git >= 1.7.9:''

git config --global pull.rebase true

''In git < 1.7.9:''

git config --global branch.autosetuprebase always

''The latter has the effect of automatically adding branch.<name>.rebase true for each checked out local branch that is tracking an upstream branch to the repository config file.''

''Note that if you have both options set (not really recommended) then branch.<name>.rebase true that is automatically added for each branch takes precedence over global pull.rebase true.''

[[Category: Icecast]]

Ambisonics

2017-11-03T12:32:59Z

MrZeus: /* Malham notation */ remove extra table row

''This page is part of the Xiph Wiki, and is aimed at people developing file formats and associated software for Ambisonics. For a general introduction to Ambisonics, please go to the ''[[Wikipedia:Ambisonics|Wikipedia page on Ambisonics]]''.''

'''Ambisonics''' is a surround sound system first developed in the 1970s. Its main difference from other surround techniques is that it separates transmission channels from speaker feeds, the speaker feeds being derived using a decoder situated in the living room. Decoders can be implemented in either hardware or software. Typically more speakers are used than transmission channels, and the more speakers used then the more stable the resulting soundfield. Speakers can be arranged in a number of configurations, regular polygons being the most popular.

Ambisonic files can come in a number of different formats. The main one is called B-Format, the other formats being derived from this. UHJ format is mono- and stereo-compatible. G-Format is a set of speaker feeds, so can be enjoyed in surround sound without the need for a decoder in the living room.

== Ambisonics and 5.1 ==

Ambisonics and conventional 5.1 surround sound are very different. 5.1 is a set of speaker feeds, the signal only being fully defined for sounds coming from a speaker. Phantom images between speakers can be created, but the technique to do so is left unspecified. Many 5.1 releases use pair-wise mixing to create phantom images. This is understandable as almost all stereo recordings are mixed using pair-wise mixing.

Pair-wise mixing is also called "pan-potting", "amplitude mixing" and "intensity stereophony". It mixes signals into the feeds for a pair of speakers to create the illusion that a sound is coming from a point somewhere between the speakers. During mixing, the apparent location of each sound is determined only by the relative amplitude of that sound in the two speakers.

Unfortunately, pair-wise mixing works poorly when the speakers are to the rear of the listener and not-at-all when they are to one side. You can demonstrate this for yourself by performing [http://members.tripod.com/martin_leese/Ambisonic/experiment.html a very simple experiment]. Pair-wise mixing did not work in the quadraphonic era and it will not work now. Such an absolute statement can be made because the way that humans localize sound has not changed.

Ambisonics is fundamentally different from 5.1. What is encoded in Ambisonics is not speaker feeds, but ''direction''. When mixing in Ambisonics, the positions of the speakers are unknown ''and are of no interest''. Further, when Ambisonics is decoded to speaker feeds, all of the speakers cooperate to localize a sound in its correct position so, for example, when the speakers on the left push, those on the right pull. The speakers all contribute to the creation of a single coherent soundfield.

=== Ambisonics to 5.1 ===
Converting Ambisonics to 5.1 is straightforward, and is discussed below (see [[#G-Format|G-Format]]).

=== 5.1 to Ambisonics ===
Converting 5.1 to Ambisonics is more difficult. It is easy to make the five speaker feeds phantom images, called "virtual speakers". (The ".1" channel can be folded into W.) The problem with this is that even if the Ambisonic rendering is perfect, the result will only be as good as the original 5.1 played through ''real'' speakers. It will not be an improvement. Nobody has yet come up with a way for Ambisonics to improve 5.1; 5.1 is simply too broken.

== B-Format ==

B-Format is a single coherent soundfield composed of a set of related channels. The number of channels used depends on whether the soundfield is horizontal-only or full-sphere, and on the order. These B-Format channels are transmission channels, not speaker feeds. Listening to B-Format requires a decoder in your living room. Some numbers of channels are tabulated below.

=== Channel correlation ===
Compression techniques typically make use of channel correlation to remove redundancy from the audio data, and so improve the compression ratio.

The correlation between B-Format channels depends on the content. Four-channel B-Format consists of an omni-directional component, called W, and three figure-of-eight components pointing forward, left and up, called X, Y, Z. ([http://members.tripod.com/martin_leese/Ambisonic/Harmonic.html Pictures are available].) Three-channel, horizontal-only B-Format simply omits the Z channel. This means that anything in X also appears in W. Same for Y and Z. (W is omni-directional; everything appears in W.) Also, if content comes from Front-Left then it appears equally in X and Y. Same for content from Front-Right, Back-Left, Back-Right; only the relative polarities change. So there can be a lot of correlation between B-Format channels, but it is content dependent.

One problem with B-Format is that it is big on low-frequency phase. The phase relationships between the different B-Format channels are important if the resulting soundfield is to correctly "gel". This may be a problem when B-Format channels are compressed using lossy compression.

There is a file specification in use for downloadable B-Format files called the [http://members.tripod.com/martin_leese/Ambisonic/B-Format_file_format.html ".amb" specification].

=== Limitations of the ".amb" specification ===
The [http://members.tripod.com/martin_leese/Ambisonic/B-Format_file_format.html ".amb" specification] for downloadable B-Format files is based on the WAVE-EX format. There are currently over 200 pieces available in this format [http://www.ambisonia.com for free download]. Most of these are first-order full-sphere soundfields. (A [https://en.wikipedia.org/wiki/List_of_Ambisonic_Software list of Ambisonic software decoders] is available on Wikipedia.) Some of the limitations of the specification are:

#It is limited to 4 GByte files (2 GBytes if somebody screwed up).
#It is limited to third-order soundfields and below. While third-order looks like a lot (16 channels), there are already research rigs that reproduce fourth-order (25 channels).
#No compression (particularly lossless).

The reason that the ".amb" file specification is limited to third-order and below is because it uses the number of channels to uniquely define the soundfield order. Unfortunately this simple and elegant scheme does not work above third-order as ambiguities creep in. (One ambiguity is illustrated in the table below.)

A more general file format will have to use something else, such as ''Malham notation'', or storing both the horizontal-order and height-order. There is a one-to-one correspondence between Malham notation and the pair of orders, and either can generate the number of channels.

==== Malham notation ====
Malham notation specifies the order of a B-Format soundfield using a string of characters, each character being either '''f''' (for full-sphere) or '''h''' (for horizontal). The first character in the string specifies the type of the first-order components, the second character the type of the second-order components, etc.

{| class="wikitable" style="text-align:center"
|-
!Horizontal order
!Height order
!Soundfield type
!Malham notation
!Number of channels
!Channels
|-
| 1|| 0||horizontal||'''h'''|| 3||WXY
|-
| 1|| 1||full-sphere||'''f'''|| 4||WXYZ
|-
| 2|| 0||horizontal||'''hh'''|| 5||WXYUV
|-
| 2|| 1||mixed-order||'''fh'''|| 6||WXYZUV
|-
| 2|| 2||full-sphere||'''ff'''|| 9||WXYZRSTUV
|-
| 3|| 0||horizontal||'''hhh'''|| 7||WXYUVPQ
|-
| 3|| 1||mixed-order||'''fhh'''|| 8||WXYZUVPQ
|-
| 3|| 2||mixed-order||'''ffh'''|| 11||WXYZRSTUVPQ
|-
| 3|| 3||full-sphere||'''fff'''|| 16||WXYZRSTUVKLMNOPQ
|-
| 4|| 0||horizontal||'''hhhh'''|| 9||extra channels unlabeled
|}
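
The correspondence between Malham notation, the pair of orders and the channel count can be sketched in a few lines. This is a minimal illustration, not part of any specification: the function name <code>malham_info</code> is hypothetical, and it assumes that a full-sphere order ''n'' contributes 2''n''+1 channels while a horizontal-only order contributes 2, which reproduces every row of the table above. It also assumes that an '''f''' never follows an '''h''', as in all rows of the table.

```python
def malham_info(notation):
    """Return (horizontal_order, height_order, channel_count) for a Malham string."""
    if not notation or set(notation) - {"f", "h"}:
        raise ValueError("Malham notation uses only the characters 'f' and 'h'")
    if "hf" in notation:
        # Assumption: the height order never exceeds the horizontal order,
        # so all 'f' characters come before any 'h'.
        raise ValueError("'f' must not follow 'h'")

    horizontal_order = len(notation)
    height_order = notation.count("f")

    channels = 1  # the omni-directional W channel
    for order, kind in enumerate(notation, start=1):
        # Assumed per-order counts: 2n+1 for full-sphere, 2 for horizontal-only.
        channels += (2 * order + 1) if kind == "f" else 2
    return horizontal_order, height_order, channels
```

For example, <code>malham_info("ff")</code> and <code>malham_info("hhhh")</code> both report 9 channels, which is the ambiguity that stops the channel count from identifying the soundfield above third-order.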

=== Default channel conversions from B-Format ===
Converting a B-Format file to a mono file is straightforward. Use Mono = W*sqrt(2).

Converting a B-Format file to a stereo file is more difficult. The "proper" way to do this is to convert the W,X,Y channels to two-channel UHJ. Unfortunately this requires the use of wide-band 90-degree phase shifters. In the digital domain these are usually implemented as convolution filters.

Assuming 90-degree phase shifters are unavailable then the problem is one of choice. Starting from B-Format, it is possible to synthesize ''any'' mic response pointing in ''any'' direction. Hence, it is possible to synthesize ''all'' coincident stereo mic techniques. Two popular stereo techniques are ''Blumlein Mid-Side'' and ''Blumlein Crossed Pair''.

==== Blumlein Mid-Side ====
<pre>
Mid = (W*sqrt(2)) + X /*This is a cardioid response pointing forward*/
Left = Mid + Y
Right = Mid - Y
</pre>

==== Blumlein Crossed Pair ====
<pre>
Left = (X + Y)/sqrt(2) /* (Left, Right) are just the (Y, X) */
Right = (X - Y)/sqrt(2) /* responses rotated by -45 degrees */
</pre>

Which conversion to stereo is better depends on the material and how it was recorded. A good suggestion is to not specify a ''particular'' default channel conversion; instead, simply specify that there must be one. If one has to be specified then Blumlein Crossed Pair is the simpler.
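
The mono, Mid-Side and Crossed Pair conversions above translate directly into per-sample arithmetic. The following is a minimal sketch with hypothetical function names, operating on one sample of each channel at a time; a real converter would apply the same arithmetic to whole buffers.

```python
import math

SQRT2 = math.sqrt(2.0)


def to_mono(w):
    """Mono = W*sqrt(2)."""
    return w * SQRT2


def mid_side(w, x, y):
    """Blumlein Mid-Side: return (left, right) for one sample."""
    mid = w * SQRT2 + x  # cardioid response pointing forward
    return mid + y, mid - y


def crossed_pair(x, y):
    """Blumlein Crossed Pair: return (left, right) for one sample."""
    return (x + y) / SQRT2, (x - y) / SQRT2


def convert(w_samples, x_samples, y_samples):
    """Convert whole channels to Crossed Pair stereo (W is unused here)."""
    return [crossed_pair(x, y) for x, y in zip(x_samples, y_samples)]
```

Content from Front-Left appears equally in X and Y, so <code>crossed_pair(1.0, 1.0)</code> places it entirely in the Left channel.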

== UHJ format ==

B-Format is the main format for Ambisonic files. However, B-Format is not mono- or stereo-compatible. This is why the UHJ hierarchical system was developed. Depending on the number of channels available, the UHJ system can carry more or less information, but at all times it is fully mono- and stereo-compatible. Up to four channels (Left, Right, T, Q) may be used. The T-channel can also be band-limited but, as this "2½-channel UHJ" was only ever used for FM radio transmission, it will not be discussed further.

To listen to UHJ files in surround requires a decoder in your living room. Also, UHJ is restricted to first-order soundfields, either horizontal (two- and three-channel UHJ) or full-sphere (four-channel UHJ).

Converting B-Format channels to UHJ channels, and vice versa, requires the use of wide-band 90-degree phase shifters. In the digital domain these are usually implemented as convolution filters. Conversion between four-channel B-Format (W, X, Y, Z) and four-channel UHJ (Left, Right, T, Q) can be accomplished without loss of information. The same with three-channel to three-channel (W, X, Y) <=> (Left, Right, T). It is possible to recover three-channel B-Format (W, X, Y) from two-channel UHJ (Left, Right), but not without loss. It is also important for the Ambisonic decoder to be aware that the B-Format channels were recovered from two-channel UHJ (because of the need to apply different shelf filters).

Several hundred [http://surrounddiscography.com/ two-channel UHJ LPs and CDs] have been released. Three- and four-channel UHJ recordings have never been commercially released.

=== UHJ encoding and decoding equations ===

==== Encoding ====
<pre>
S = 0.9396926*W + 0.1855740*X
D = j(-0.3420201*W + 0.5098604*X) + 0.6554516*Y

Left = (S + D)/2.0
Right = (S - D)/2.0
T = j(-0.1432*W + 0.6512*X) - 0.7071*Y
Q = 0.9772*Z

where j is a +90 degree phase shift
</pre>
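
The encoding equations can be checked numerically without building a wide-band phase shifter. The sketch below is an illustration only, and it makes one simplifying assumption: each channel is represented by a complex phasor at a single frequency, so that the +90 degree phase shift ''j'' becomes multiplication by the imaginary unit. A real encoder working on audio samples would need the convolution filters mentioned above. The function name <code>uhj_encode</code> is hypothetical.

```python
def uhj_encode(w, x, y, z=0.0):
    """Encode B-Format phasors (W, X, Y, Z) to UHJ phasors (Left, Right, T, Q).

    Assumption: inputs are complex phasors for one frequency, so a
    +90 degree phase shift is multiplication by 1j.
    """
    j = 1j
    s = 0.9396926 * w + 0.1855740 * x
    d = j * (-0.3420201 * w + 0.5098604 * x) + 0.6554516 * y

    left = (s + d) / 2.0
    right = (s - d) / 2.0
    t = j * (-0.1432 * w + 0.6512 * x) - 0.7071 * y
    q = 0.9772 * z
    return left, right, t, q
```

With only W present, Left and Right come out with equal magnitude and differ only in phase; with only Y present, they have equal magnitude and opposite polarity.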

==== Decoding ====
For two-channel UHJ:
<pre>
S = (Left + Right)/2.0
D = (Left - Right)/2.0

W = 0.982*S + j*0.164*D
X = 0.419*S - j*0.828*D
Y = 0.763*D + j*0.385*S

where j is a +90 degree phase shift
</pre>
Note that two-channel UHJ requires the player to use different shelf filters than for B-Format (or for three- and four-channel UHJ).

For three- and four-channel UHJ:
<pre>
S = (Left + Right)/2.0
D = (Left - Right)/2.0

W = 0.982*S + j*0.197(0.828*D + 0.768*T)
X = 0.419*S - j(0.828*D + 0.768*T)
Y = 0.796*D - 0.676*T + j*0.187*S
Z = 1.023*Q

where j is a +90 degree phase shift
</pre>

There is a file specification for downloadable two-channel UHJ files
called the [http://members.tripod.com/martin_leese/Ambisonic/UHJ_file_format.html ".uhj" specification], but it is not currently in use.

=== Limitations of the ".uhj" specification ===
The [http://members.tripod.com/martin_leese/Ambisonic/UHJ_file_format.html ".uhj" specification] for downloadable two-channel UHJ files is based on the WAVE or WAVE-EX format. A UHJ chunk is added to the file to indicate it is UHJ. As unrecognized chunks are always skipped, use of this chunk maintains stereo compatibility. Some of the limitations of the specification are:

#It is limited to 4 GByte files (2 GBytes if somebody screwed up).
#It is limited to two-channel UHJ files. Three- and four-channel UHJ are not accommodated.
#No compression.

The ".uhj" specification is only defined for two-channel UHJ to maintain stereo compatibility. While it would be possible to add the UHJ chunk to three- and four-channel WAVE-EX files, the recommendation from Microsoft for playing such files is that the audio device should render the extra channels to output ports not in use. This can happen even when the extra channels are masked off. (Put simply, in WAVE-EX files the channel mask does ''not'' mask channels.) Because of this, three- and four-channel WAVE-EX files cannot be made stereo compatible.

In the Xiph world, it should be possible to use default channel conversions to ensure that three- and four-channel UHJ files remain stereo compatible.

=== Default channel conversions from UHJ ===
Converting a UHJ file to a mono file is straightforward. Use Mono =
(Left + Right) / sqrt(2).

Converting a UHJ file to a stereo file is even easier. Use Left = Left, Right = Right, and discard T and Q if present.

== G-Format ==

A G-Format file is any common multi-channel surround file containing an Ambisonic soundfield pre-decoded to its speaker feeds. This allows listeners who do not own an Ambisonic decoder to enjoy Ambisonics.

The sound engineer creates a set of speaker feeds for a particular number and arrangement of speakers. This is typically four speakers arranged in a square. Other speaker arrangements are also possible.

In Ambisonics, all speakers cooperate to localize sounds in any particular direction; there are no "surround speakers" as such. Because of this, best results when playing G-Format recordings (and Ambisonics in general) are obtained when the speakers are matched. The easiest way to accomplish this is to use identical speakers. Unfortunately, many home theater systems include a center-front speaker which is different from the other speakers.

An easy way to cope with this is adopted on G-Format recordings commercially released on DVD-A by [http://www.wyastone.co.uk/all-labels/nimbus/dvd-audio.html Nimbus Records]. They use four speakers in a square, the center-front speaker being unused.

=== Recovering B-Format from G-Format ===
It is sometimes possible to recover the original B-Format channels from the G-Format speaker feeds. The recovered B-Format channels can then be fed to a decoder in the listener's living room, and so accommodate a speaker arrangement different from the one used when the G-Format file was produced. Each B-Format channel is recovered using a weighted combination of the speaker feeds in the G-Format file. The conversion coefficients required for the B-Format recovery depend on the particular speaker arrangement chosen by the sound engineer. (Obviously, if a B-Format version of the file also exists then it can be fed to the decoder directly without the need for G-Format.)

File formats for G-Format include all multi-channel formats that contain speaker feeds. However, these will not contain information to allow the B-Format channels to be automatically recovered. A [http://members.tripod.com/martin_leese/Ambisonic/G-Format_chunk.html ".amg" file format] (based on WAVE-EX) for downloadable G-Format files, which will allow the B-Format channels to be automatically recovered, has been proposed. Such file formats have the advantage of storing the conversion coefficients at the time the G-Format file is created. This is the only time the required information is readily available.

=== Default channel conversions from G-Format ===
Converting a G-Format file to a mono or stereo file is straightforward. First, recover the B-Format channels using the conversion coefficients contained in the file. Second, follow the advice given above for [[#Default channel conversions from B-Format|Default channel conversions from B-Format]].

An alternative approach is to encode directly into the file the coefficients for producing the stereo mix. An appropriate [http://members.tripod.com/martin_leese/Audio/StereoMix_chunk.html chunk for WAVE-EX files] has been proposed. This could be [http://members.tripod.com/martin_leese/Audio/stereo_mix_proposal.html extended to other multi-channel file formats].

== Resources on Ambisonics ==

*There is a set of [http://en.wikipedia.org/wiki/Ambisonics Wikipedia articles on Ambisonics].
*Of particular relevance is the [http://members.tripod.com/martin_leese/Ambisonic/B-Format_file_format.html ".amb" specification] in use for downloadable B-Format files. However the ".amb" spec has some limitations which it would be useful to overcome.
*There is also the [http://members.tripod.com/martin_leese/Ambisonic/UHJ_file_format.html ".uhj" specification] for downloadable two-channel UHJ files, but it is not currently in use. The ".uhj" spec also has some limitations which it would be useful to overcome.
*[http://members.tripod.com/martin_leese/Ambisonic/ This website] has many pages on Ambisonics (including at the bottom links to other Ambisonic websites).
*[http://www.ambisonic.net/ Ambisonic.Net website] includes a detailed series of descriptive and practical articles on current and past Ambisonic techniques with links to tools, other sites and additional material.
*[http://ambisonic.info/info/ricardo.html Richard Lee's page on Ambisonics] contains articles on shelf filters and the design of Ambisonic decoders.

[[Category:Developers stuff]]

Icecast Server/known https restrictions

2017-11-03T12:11:35Z

MrZeus: /* Icecast2 2.4.x */ and again

This page lists known problems of the latest released version of Icecast when operating with TLS enabled.

* 'listenurl' in the internal XML status representation is not protocol aware and will always use 'http' + global hostname (default: "localhost") and port (default: first listen-socket).
* Virtual playlist files don't work
* Authentication helper doesn't work (needs verification)
* Certificate reload is not implemented in 2.4.x. Icecast2 2.4.x needs to be restarted to reload the certificate. (supported in branch ph3-update-TLS.)
* …

== TLS Mode compatibility charts ==
The following tables list Icecast configuration settings (horizontal) versus client settings (vertical).
Note: While auto mode may connect using TLS it will not establish a secure connection. auto_no_plain will ensure a secure connection.

=== Icecast2 2.4.x ===
{| class="wikitable"
! rowspan="2" |
! colspan="2" | libshout
|-
! 0 !! 1
|-
! disabled
| Yes || No
|-
! auto
| Yes || Yes
|-
! auto_no_plain
| No || Yes
|-
! RFC2817
| No || No
|-
! RFC2818
| No || Yes
|}
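
For scripting purposes the 2.4.x chart can be transcribed into a lookup table. This is only a sketch: the names <code>COMPAT_2_4</code>, <code>can_connect</code> and <code>usable_client_modes</code> are hypothetical, and it assumes, following the note above the charts, that the rows are the client modes and the columns 0 and 1 are the server-side setting.

```python
# Transcription of the 2.4.x chart above.
# Assumption: outer keys are client TLS modes (table rows),
# inner keys are the server-side setting (table columns 0 and 1).
COMPAT_2_4 = {
    "disabled":      {0: True,  1: False},
    "auto":          {0: True,  1: True},
    "auto_no_plain": {0: False, 1: True},
    "RFC2817":       {0: False, 1: False},
    "RFC2818":       {0: False, 1: True},
}


def can_connect(client_mode, server_setting):
    """Look up whether the chart marks this combination as working."""
    return COMPAT_2_4[client_mode][server_setting]


def usable_client_modes(server_setting):
    """List the client modes the chart marks as working for a server setting."""
    return [mode for mode, row in COMPAT_2_4.items() if row[server_setting]]
```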

=== Icecast2 2.5.x (branch "master") ===
Note: for truth values the following keywords can be used in the configuration: 0, false, no, off, 1, true, yes, on

{| class="wikitable"
! rowspan="3" |
! colspan="7" | libshout
|-
! colspan="2" | TLS not configured
! colspan="5" | TLS configured
|-
! disabled !! auto, false
! disabled !! auto, false !! auto_no_plain !! rfc2817 !! rfc2818, true
|-
! disabled
| Yes || Yes || Yes || Yes || No || No || No
|-
! auto
| Yes || Yes || Yes || Yes || Yes || Yes || Yes
|-
! auto_no_plain
| No || No || No || Yes || Yes || Yes || Yes
|-
! RFC2817
| No || No || No || Yes || Yes || Yes || No
|-
! RFC2818
| No || No || No || Yes || Yes || No || Yes
|}

[[Category:Icecast]]

Icecast Server/known https restrictions

2017-11-03T12:09:46Z

MrZeus: /* Icecast2 2.5.x (branch "master") */ prettify table

This page lists known problems of the latest released version of Icecast when operating with TLS enabled.

* 'listenurl' in the internal XML status representation is not protocol aware and will always use 'http' + global hostname (default: "localhost") and port (default: first listen-socket).
* Virtual playlist files don't work
* Authentication helper doesn't work (needs verification)
* Certificate reload is not implemented in 2.4.x. Icecast2 2.4.x needs to be restarted to reload the certificate. (supported in branch ph3-update-TLS.)
* …

== TLS Mode compatibility charts ==
The following tables list Icecast configuration settings (horizontal) versus client settings (vertical).
Note: While auto mode may connect using TLS it will not establish a secure connection. auto_no_plain will ensure a secure connection.

=== Icecast2 2.4.x ===
{| class="wikitable"
! !! 0 !! 1
|-
! colspan="3" | libshout
|-
! disabled
| Yes || No
|-
! auto
| Yes || Yes
|-
! auto_no_plain
| No || Yes
|-
! RFC2817
| No || No
|-
! RFC2818
| No || Yes
|}

=== Icecast2 2.5.x (branch "master") ===
Note: for truth values the following keywords can be used in the configuration: 0, false, no, off, 1, true, yes, on

{| class="wikitable"
! rowspan="3" |
! colspan="7" | libshout
|-
! colspan="2" | TLS not configured
! colspan="5" | TLS configured
|-
! disabled !! auto, false
! disabled !! auto, false !! auto_no_plain !! rfc2817 !! rfc2818, true
|-
! disabled
| Yes || Yes || Yes || Yes || No || No || No
|-
! auto
| Yes || Yes || Yes || Yes || Yes || Yes || Yes
|-
! auto_no_plain
| No || No || No || Yes || Yes || Yes || Yes
|-
! RFC2817
| No || No || No || Yes || Yes || Yes || No
|-
! RFC2818
| No || No || No || Yes || Yes || No || Yes
|}

[[Category:Icecast]]

Games that use Vorbis

2017-10-06T14:07:04Z

MrZeus: lighten block quote bg colour, to make them more readable

The following games use [[Vorbis]], most frequently for their in-game music or sound effects:

* All Games By [http://www.reflexive.com/index.php?CAT=Search&SEARCH=dev%3AReflexive+Entertainment&PAGE=GameList Reflexive Entertainment].

* [http://www.mobygames.com/game/windows/007-nightfire 007: Nightfire]: Uses Ogg Vorbis for background soundtrack.

* [http://www.asciisector.net/ Ascii Sector]: Space combat/exploration/trading game. Uses Ogg Vorbis for music.

* [http://www.ageofconan.com/ Age of Conan — Hyborian Adventures]: Uses Ogg Vorbis for all audio.

* [http://www.americasarmy.com/ America’s Army]: Uses Ogg Vorbis for main theme.

* [http://www.amnesiagame.com/ Amnesia: The Dark Descent]: Uses Ogg Vorbis for all audio.

* [http://assault.cubers.net/ AssaultCube]: A free, fast-paced first-person shooter with low hardware requirements for Windows, Linux and OS X. Uses Ogg Vorbis for all game sounds and music.

* [http://www.lionhead.com/bw2/ Black & White 2]: Uses Ogg Vorbis for music.

* [http://www.pyrogon.com/games/candycruncher/ Candy Cruncher]: This cute puzzle game from Brian Hook’s company, Pyrogon, uses Vorbis for the addictive music you hear while you race the clock.

* [http://www.callofcthulhu.com/ Call of Cthulhu] is a first-person horror game that combines intense action and adventure elements. It uses Ogg Vorbis for music and speech.

* [http://www.mobygames.com/game/windows/catechumen Catechumen] is a Christian-themed FPS that uses Ogg Vorbis.

* [http://www.civilization5.com/ Civilization V] is a turn-based strategy game that uses Ogg Vorbis for music.

* [http://www.atari.com/crashday/ Crashday]: Stunt racing game, developed by independent German studio Moon Byte. Uses Ogg Vorbis for music.

* [http://buenavistagames.go.com/product/chickenLittlePC.html Chicken Little]: Adventure game for children inspired by the motion picture. The PC edition uses Vorbis for dialogue and music (it is unclear whether sound effects use it too).

* [http://www.cossacks2.de/ Cossacks 2]: “Cossacks II: Napoleonic Wars” is a sequel of “Cossacks: European Wars”. Ogg Vorbis 1.0 files are in \data\music\

* [http://www.darwinia.co.uk/ Darwinia]: The second title from Indy developer Introversion Software. Darwinia is a stylised retro — Tron meets Cannon Fodder. It uses Vorbis for all in game sound effects and music.

* [http://www.introversion.co.uk/defcon/ DEFCON]: The third title from Introversion Software. Uses Vorbis for music, effects, everything, like Darwinia.

* [http://devilmaycry.com/ Devil May Cry 4] (for the PC, at least): Uses (occasionally multichannel) Ogg Vorbis for ingame and cutscene music.

* [http://www.eidos.co.uk/gss/dxiw/ Deus Ex: Invisible War] by Ion Storm/Eidos: Uses Ogg Vorbis for music and voice (and possibly for sound fx too).

* [http://diablo3.com Diablo III] uses Vorbis for audio.

* [http://www.idsoftware.com/games/doom/doom3/ DOOM 3]: The latest version of this famous first person shooter game from id software uses Vorbis for the theme music as well as their ambient and game sounds.

* [http://mobygames.com/game/sheet/p,3/gameId,6505/ Duke Nukem: Manhattan Project]: This game from 3D Realms was released in 2002 and used Vorbis for their music. (Official website is down, using Mobygames link)

* [http://www.popcap.com/games/free/dynomite Dynomite]: Puzzle Bobble/Bust A Move clone for Windows by PopCap Games, with mouse control. Uses Ogg Vorbis for nearly all sound effects.

* [http://en.wikipedia.org/wiki/Eschalon:_Book_I Eschalon]: A classic-style roleplaying game, for Windows, Mac, and Linux. Music is in ''Ogg Vorbis'' format.

* [http://www.mobygames.com/game/enclave/ Enclave] by Starbreeze/Black Label Games: Uses Ogg Vorbis for music (and possibly for sound fx and voice too).

* [http://www.eve-online.com EVE Online] by CCP Games, the Icelandic-homed space-based single-shard persistent world game uses Ogg Vorbis for its music.

* [http://www.lionhead.com/fabletlc/ Fable: The Lost Chapters]: Uses Ogg Vorbis for music and cutscenes (Ancient libVorbis version, 1.0 RC2).

* [http://farcry.ubi.com/ FarCry] by Crytek: uses Ogg Vorbis for music and effects.

* [http://www.freedom-fighters.co.uk/ Freedom Fighters] by IO Interactive: String search reveals “libVorbis I 20011217” in freedom.exe.

* [http://www.siriusgames.dk/index.php?pageid=67 Gangland] by MediaMobsters: Uses Ogg Vorbis for music and cutscenes (Data\streams\). Encoded with Xiph.Org libVorbis I 20020717. Decoder library: FMOD 3.71.

* [http://www.rockstargames.com/vicecity/ Grand Theft Auto: Vice City] by Rockstar Games/Rockstar North uses Ogg Vorbis to store music, radio, ambient sounds, police messages and cutscene audio. Players can also store their custom tracks (accessible in-game via the “User Track Player” radio station) in Ogg Vorbis.

* [http://www.rockstargames.com/sanandreas/ Grand Theft Auto: San Andreas] by Rockstar Games/Rockstar North uses Ogg Vorbis to store music, radio, ambient sounds, police messages and cutscene audio. Players can also store their custom tracks (accessible in-game via the “User Track Player” radio station) in Ogg Vorbis.

* [http://www.gothic3.com/ Gothic 3] by Piranha Bytes: Vorbis is used in the Ogg container for everything (music, speech, effects) except the intro video. For example: Music @ 256 kb/s, Speech @ 86 kb/s. About 18 hours of speech compressed to 700 MB.

* [http://www.guiltygearx2reload.com/ Guilty Gear XX]: The PC version, at least, uses Ogg Vorbis for all the music.

* [http://www.guitarherogame.com/gh2/ Guitar Hero II] by Red Octane (Activision), XBox360 platform only (multichannel Vorbis with 5 or 6 channels per song)

* [http://halo.bungie.org/ Halo]: Mac and PC versions of Halo use Ogg Vorbis for all audio, it seems. The Xiph license and dynamically linked libraries of Ogg and Vorbis are included in the Halo directory. XBox version does not use Ogg Vorbis.

* [http://harrypotter.ea.com/cofs/index.html Harry Potter II (Chamber of Secrets)]: This is unsubstantiated, it was reported on one of the vorbis mailing lists, but there is little evidence either way on this title. EA has been supportive of Vorbis though, so it’s not entirely impossible. If anyone can give us a yay or nay on this, please do.

* [http://www.mightandmagicgame.com/HeroesV/ Heroes of Might and Magic V]: Uses Vorbis for audio and Theora for video.

* [http://www.eidosinteractive.com/games/info.html?gmid=118 Hitman 2]: uses Vorbis. (PC only or consoles too?)

* [http://www.codemasters.com/igi2/front.htm IGI2: Covert Strike]: Not a Norwegian first-person shooter.

* [http://www.inthegroove.com In The Groove]: The premier dance game created by [http://www.roxorgames.com Roxor Games, Inc.] Uses Vorbis for all of the in-game music.

* [http://www.agdinteractive.com/games/kq1/ King's Quest I]: King's Quest I: Quest for the Crown (Enhanced) is a fan remake of the original Sierra classic. Uses Ogg Vorbis for sound and Ogg Theora for cutscene movies.

* [http://www.p3int.com/KULT/ KULT Heretic Kingdoms] by 3D People/Project 3 Interactive: Uses Vorbis (1.0) for music, voice and sound effects.

* Recent Legacy of Kain Games: On the PC, both '''Soul Reaver 2''' and '''Blood Omen 2''' by Crystal Dynamics/Eidos use Ogg Vorbis for music and sound effects. (Source: [http://www.thelostworlds.net/FAQ.HTML#ogg])

* [http://www.ncsoft.net/eng/ncgames/lineage2_intro.asp Lineage II]: NCSoft Corporation’s 3D MMORPG Lineage II uses Ogg Vorbis for its music. They use 1.0beta3, though.

* [http://www.liveforspeed.net/ Live for Speed]: Online racing simulator uses Ogg for all audio and sound effects.

* [http://www.mobygames.com/game/lock-on-modern-air-combat Lock On: Modern Air Combat]: Published by Ubisoft; CD-ROM contains over 1800 Ogg Vorbis files for speech.

* [http://www.mafia-game.com/ Mafia: The City Of Lost Heaven]: Not sure about any console version, but PC version is reported to use Ogg Vorbis.

* [http://www.popcap.com/games/magicmatch Magic Match]: A very elaborate "Match 3" casual game that uses Ogg Vorbis for its audio.

* [http://www.capcom.co.jp/rockmanx8/ Mega Man X8]: The PC version of Mega Man X8 makes use of Vorbis for music and dialogue during cutscenes.

* [http://www.mobygames.com/game/gamecube/metal-gear-solid-the-twin-snakes Metal Gear Solid: The Twin Snakes]: Uses Ogg Vorbis for all speech in the game.

* [http://minecraft.net Minecraft]: Uses Ogg Vorbis for music and sound effects.

* MotoGP: This motorcycle racing sim uses Vorbis for the music and allows players to drop their own .ogg files into the music dir to listen to them in-game.

* [http://www.mystrevelation.com/ Myst IV: Revelation]: Fourth game in the Myst series. Uses Ogg Vorbis for all music, speech and sound effects.

* [http://www.mystvgame.com/ Myst V: End of Ages]: Fifth and final game in the Myst series. Uses Ogg Vorbis for all music, speech and sound effects.

* Nascar Racing Games from Papyrus: They had this to say about their decision and experience:
<blockquote style="background-color: #eeeeee">
"We’re using a lot of spoken audio in this title (a first for us) and your codec has allowed us to reduce more than 350MB of audio data to about 40MB, a huge savings of memory and disk space! We are very impressed." — Tom Faiano, Producer
</blockquote>
<blockquote style="background-color: #eeeeee">
"Incorporating Ogg Vorbis into our codebase was quite painless, and in the end, even refreshing. No fuss no muss. Thank you for your efforts!" — Bill Farquhar, Soundguy du jour
</blockquote>

* [http://www.nexuiz.com/ Nexuiz], a fast-paced FPS with roots in Quake I, uses Vorbis for background music. The minstagib mod uses Vorbis for all of its sound.

* [http://www.codemasters.com/flashpoint/ Operation Flashpoint]: This highly successful military simulation/action game from Codemasters uses Vorbis for the in-game music.

* [http://www.orunner.com/ Ostrich Runner] by Geleos: This funny Russian cartoon-style game for kids and not only kids uses Ogg Vorbis for sound, speech and music.

* [http://www.ysagoon.com/glob2/ Globulation 2]: State of the art GPL-ed strategy game!

* [http://www.penumbragame.com Penumbra: Black Plague]: Uses Ogg Vorbis for all audio.

* [http://www.psobb.com/index.php Phantasy Star Online: Blue Burst]: Uses Ogg Vorbis for music, stored in data/ogg.

* [http://www.gopostal.com/ Postal 2]: Probably not the game we want to use to showcase Vorbis, but it’s being used in this Unreal-engine-powered ultra-violent game.

* [http://www.praetoriansgame.com/ Praetorians]: This very successful game from Pyro Studios uses Vorbis for its music.

* [http://www.psychonauts.com/ Psychonauts]: Has vorbis.dll and vorbisfile.dll.

* [http://www.quake4game.com/ Quake 4]: Quake 4 is the fourth title in the series of Quake FPS computer games. All game music, speech and sound effects make use of Vorbis.

* [http://www.restricted-area.net/ Restricted Area]: by Master Creating uses Ogg Vorbis for music and VP3 for videos.

* Ricochet: An addictive version of Breakout.

* [http://www.rockband.com/ Rock Band]: XBox360 version uses the same type of multichannel Vorbis files as Guitar Hero II, but with more channels to handle the drums and vocals separately.

* [http://www.rockmanager.net/ Rock Manager]: Vorbis is used in this “new rock ’n roll management sim for PC from Pan Vision and Monsterland”.

* [http://www.sacred2.com/ Sacred 2] by Studio II: uses multichannel(!) Ogg Vorbis for music, speech and sound effects.

* [http://www.s2games.com/savage/ Savage]: This S2 Games “RTSS” hybrid genre game uses Vorbis for all the in-game music.

* [http://www.serioussam.com/se/ Serious Sam: The Second Encounter]: uses Vorbis for the music, although it is slightly obfuscated so as not to be easily playable by standard Ogg Vorbis players.

* [http://www.serioussam2.com/ Serious Sam 2]: not only uses Vorbis for the music but even Theora for the videos

* [http://www.totalwar.com/community/warlord.htm Shogun: Total War]: Shogun uses Vorbis, but only to distribute — everything is decompressed to wav during the install.

* [http://www.singles2.com/englisch/index.html Singles 2]: Uses Ogg Vorbis for sound.

* [http://www.lart.pl/en/portfolioItem.php?id=91 Ski Jumping 2004]: A commercial game that accurately models the activity of ski jumping. The game also contains over 700 Ogg Vorbis files.

* [http://mobygames.com/game/sheet/p,3/gameId,3453/ Star Trek: Away Team]: Vorbis is used for all sound in the game — music, voiceover and SFX. This squad-based strategy game is set in the Star Trek Next Generation universe. (Official website is down, using Mobygames link)

* [http://starcraft2.com/ StarCraft II]: Uses Vorbis for audio

* StoneLoops! Of Jurassica ([http://itunes.apple.com/WebObjects/MZStore.woa/wa/viewSoftware?id=315210057&mt=8 Apple iTunes App Store link]): Colorful puzzle game for the iPhone/iPod Touch that uses Ogg Vorbis for audio.

* [http://supertux.lethargik.org/ Super Tux]: Uses Vorbis for music.

* [http://www.splintercell3.com/ Tom Clancy’s Splinter Cell Chaos Theory]: .LS0 files are in fact Ogg Vorbis files.

* [http://www.lucasarts.com/games/swrepubliccommando/ Star Wars Republic Commando]: Vorbis is used in the ambient and game music in this latest action game from LucasArts.

* [http://www.reflexive.net/index.php?PAGE=game_detail&AID=30 Swarm]: A fun little arcade shooter.

* [http://www.swat4.com/ SWAT 4]: SWAT 4 uses Ogg Vorbis for audio files.

* [http://www.croteam.com/talosprinciple/ The Talos Principle] is a first-person puzzle game that uses Ogg Vorbis for music.

* [http://www.there.com/ There]: uses both Ogg Vorbis for the sound effects and Ogg Speex for realtime group voice chat, a first for an immersive consumer-oriented world.
<blockquote style="background-color: #eeeeee">
"Voice has become a very popular part of our product!" — David Weekly, a There developer
</blockquote>

* [http://www.wesnoth.org The Battle for Wesnoth]: uses Ogg Vorbis for its music and for most of its sounds.

* [http://www.riddickgame.com/ The Chronicles of Riddick: Escape From Butcher’s Bay (Director’s Cut)]: Uses Vorbis for all audio and Theora for cutscenes.

* [https://thimbleweedpark.com/ Thimbleweed Park]: Retro-looking point-and-click adventure, [https://blog.thimbleweedpark.com/tracking_talkies using Ogg Vorbis for its music, character voices and sound effects].
<blockquote style="background-color: #eeeeee">
"[The characters' dialog is] around 6GB of .wav files and we needed to compress them for inclusion in the game. We used .ogg files due to it being free of the patent and licensing issues that .mp3 has, although either would have worked." — Ron Gilbert
</blockquote>

* [http://www.thethinggames.com/ The Thing]: Uses Vorbis
<blockquote style="background-color: #eeeeee">
"The original multilanguage distro took three CDs, and went down to only one after I converted all wavs to oggs. Nifty :) Sadly enough, marketing decided to not have one language per CD anyway (probably to annoy people who migrate) :/ Thanks for a very cool (and easy to use) lib/format!" — Vincent Penquerc’h
</blockquote>

* [http://www.asahi-net.or.jp/~cs8k-cyu/windows/tt_e.html Torus Trooper]: Frantic 3D shootemup, using Vorbis for the music. (see also the [http://www.emhsoft.net/ttrooper/ Linux port] and [http://www.apple.com/downloads/macosx/games/action_adventure/torustrooper.html MacOS version])

* [http://www.trackmania.com/ TrackMania] uses Vorbis for music in menu and tracks. [music in self-made tracks also needs to be in Vorbis]

* [http://www.mikeoldfield.com/ Tr3s Lunas] (aka Music VR episode 1): This game, featuring the music of Mike Oldfield, uses Vorbis for the music.

* [http://www.tribesvengeance.com Tribes: Vengeance] by Irrational Games/Sierra uses Ogg Vorbis for music.

* [http://www.mobygames.com/game/gamecube/true-crime-new-york-city True Crime: New York City]: GameCube version contains over 11,500 Ogg Vorbis files. It is likely that other platform ports also use the same files (note that the [http://www.mobygames.com/game/xbox/true-crime-new-york-city Xbox version] uses Windows Media Audio files in place of Ogg Vorbis files)

* [http://tuxtype.sourceforge.net/ Tuxtyping 2]: Educational typing tutor for kids of all ages!

* [http://www.ufo-aftershock.com/ UFO: Aftershock]: Uses Vorbis for music.

* [http://www.ufo-afterlight.com/ UFO: Afterlight]: Uses Vorbis for music.

* [http://www.atari.com/us/games/unreal2/pc Unreal 2]: PC version uses Vorbis, usage on consoles not confirmed.
<blockquote style="background-color: #eeeeee">
"We went with Ogg Vorbis due to its excellent playback and compression, and we used it not only for music but also all of the in-game voice. Without it, we never would have been able to fit on two CDs." — [http://www.4unrealers.com/entrevistas/263/ 4unrealers.com]
</blockquote>

* [http://www.unrealtournament.com/ut2003/ Unreal Tournament 2003]: This overwhelmingly-popular multiplayer first person shooter PC title uses Vorbis for its music.

* [http://www.unrealtournament.com/ut2004/ Unreal Tournament 2004]: Yet another Unreal game which uses Vorbis for the music (What about effects and voice? Does anyone know?). The readme file of the demo even mentions Speex!

* [http://sc2.sourceforge.net/ The Ur-Quan Masters]: Port of Star Control 2 to modern computers. Toys for Bob released the source of this amazing game under the GPL in 2002. Ogg Vorbis is used for the dialogue and the background music.

* [http://uru.ubi.com/ Uru: Ages Beyond Myst]: Spinoff from the Myst series. Uses Ogg Vorbis for all music, speech and sound effects.

* [http://mobygames.com/game/sheet/p,3/gameId,8635/ Lionheart — Legacy of the Crusader]: A 3/4 view RPG from Black Isle. Uses Vorbis for all audio. Thanks to all the guys that made Vorbis great. (I even donated money myself; someday maybe I can convince the company to kick in some bucks as well.) Official site is down, using Mobygames link.

* [http://www.global-gaming.com/Dominion/ Urban Dominion] (beta): First Person Massively Multiplayer Online Role-Playing Game by Global-Gaming. Uses Ogg Vorbis for the sound system.

* [http://www.vietcong-game.com/ Vietcong]: Vietnam War First Person Shooter by Pterodon. Believed to use Ogg Vorbis for the background music.

* [http://vegastrike.sourceforge.net/ Vega Strike]: It is a free spacesim. Ogg Vorbis files are stored in \music\ .

* [http://www.gathering.com/wingsofwar/ Wings Of War]: An arcade shooter set in WWI. The game has ogg.dll, vorbis.dll and vorbisfile.dll, but the *.ogg files are not accessible.

* [http://jonof.edgenetwork.org/winbuild/ WinBuild]: Winbuild is a port of Ken Silverman’s [http://www.advsys.net/ken/buildsrc/default.htm original Build engine demo] (for DOS) to Windows. It uses Vorbis compression for the music.

* [http://www.worldofwarcraft.com/ World of Warcraft]: This popular massively multiplayer online role-playing game from Blizzard Entertainment uses Vorbis for speech and sound effects.

* [http://www.zax-game.com/ Zax — The Alien Hunter]: A large 3/4 view action adventure game.

[[Category:Vorbis]]

Games that use Vorbis

2017-10-04T11:32:40Z

MrZeus:

The following games use [[Vorbis]], most frequently for their in-game music or sound effects:

* All Games By [http://www.reflexive.com/index.php?CAT=Search&SEARCH=dev%3AReflexive+Entertainment&PAGE=GameList Reflexive Entertainment].

* [http://www.mobygames.com/game/windows/007-nightfire 007: Nightfire]: Uses Ogg Vorbis for background soundtrack.

* [http://www.asciisector.net/ Ascii Sector]: Space combat/exploration/trading game. Uses Ogg Vorbis for music.

* [http://www.ageofconan.com/ Age of Conan — Hyborian Adventures]: Uses Ogg Vorbis for all audio.

* [http://www.americasarmy.com/ America’s Army]: Uses Ogg Vorbis for main theme.

* [http://www.amnesiagame.com/ Amnesia: The Dark Descent]: Uses Ogg Vorbis for all audio.

* [http://assault.cubers.net/ AssaultCube]: A free, fast-paced first-person shooter with low hardware requirements for Windows, Linux and OS X. Uses Ogg Vorbis for all game sounds and music.

* [http://www.lionhead.com/bw2/ Black & White 2]: Uses Ogg Vorbis for music.

* [http://www.pyrogon.com/games/candycruncher/ Candy Cruncher]: This cute puzzle game from Brian Hook’s company, Pyrogon, uses Vorbis for the addictive music you hear while you race the clock.

* [http://www.callofcthulhu.com/ Call of Cthulhu] is a first-person horror game that combines intense action and adventure elements. It uses Ogg Vorbis for music and speech.

* [http://www.mobygames.com/game/windows/catechumen Catechumen] is a Christian-themed FPS that uses Ogg Vorbis.

* [http://www.civilization5.com/ Civilization V] is a turn-based strategy game that uses Ogg Vorbis for music.

* [http://www.atari.com/crashday/ Crashday]: Stunt racing game, developed by independent German studio Moon Byte. Uses Ogg Vorbis for music.

* [http://buenavistagames.go.com/product/chickenLittlePC.html Chicken Little]: Adventure game for children inspired by the motion picture. The PC edition uses Vorbis for dialog and music (it is unclear whether sound effects use it too).

* [http://www.cossacks2.de/ Cossacks 2]: “Cossacks II: Napoleonic Wars” is a sequel of “Cossacks: European Wars”. Ogg Vorbis 1.0 files are in \data\music\

* [http://www.darwinia.co.uk/ Darwinia]: The second title from indie developer Introversion Software. Darwinia is a stylised retro game: Tron meets Cannon Fodder. It uses Vorbis for all in-game sound effects and music.

* [http://www.introversion.co.uk/defcon/ DEFCON]: The third title from Introversion Software. Uses Vorbis for music, effects, everything, like Darwinia.

* [http://devilmaycry.com/ Devil May Cry 4] (for the PC, at least): Uses (occasionally multichannel) Ogg Vorbis for ingame and cutscene music.

* [http://www.eidos.co.uk/gss/dxiw/ Deus Ex: Invisible War] by Ion Storm/Eidos: Uses Ogg Vorbis for music and voice (and possibly for sound fx too).

* [http://diablo3.com Diablo III] uses Vorbis for audio.

* [http://www.idsoftware.com/games/doom/doom3/ DOOM 3]: The latest version of this famous first-person shooter game from id Software uses Vorbis for the theme music as well as its ambient and game sounds.

* [http://mobygames.com/game/sheet/p,3/gameId,6505/ Duke Nukem: Manhattan Project]: This game from 3D Realms was released in 2002 and used Vorbis for their music. (Official website is down, using Mobygames link)

* [http://www.popcap.com/games/free/dynomite Dynomite]: Puzzle Bobble/Bust A Move clone for Windows by PopCap Games, with mouse control. Uses Ogg Vorbis for nearly all sound effects.

* [http://en.wikipedia.org/wiki/Eschalon:_Book_I Eschalon]: A classic-style roleplaying game, for Windows, Mac, and Linux. Music is in ''Ogg Vorbis'' format.

* [http://www.mobygames.com/game/enclave/ Enclave] by Starbreeze/Black Label Games: Uses Ogg Vorbis for music (and possibly for sound fx and voice too).

* [http://www.eve-online.com EVE Online] by CCP Games, the Icelandic-homed space-based single-shard persistent world game uses Ogg Vorbis for its music.

* [http://www.lionhead.com/fabletlc/ Fable: The Lost Chapters]: Uses Ogg Vorbis for music and cutscenes (Ancient libVorbis version, 1.0 RC2).

* [http://farcry.ubi.com/ FarCry] by Crytek: uses Ogg Vorbis for music and effects.

* [http://www.freedom-fighters.co.uk/ Freedom Fighters] by IO Interactive: String search reveals “libVorbis I 20011217” in freedom.exe.

* [http://www.siriusgames.dk/index.php?pageid=67 Gangland] by MediaMobsters: Uses Ogg Vorbis for music and cutscenes (Data\streams\). Encoded with Xiph.Org libVorbis I 20020717. Decoder library: FMOD 3.71.

* [http://www.rockstargames.com/vicecity/ Grand Theft Auto: Vice City] by Rockstar Games/Rockstar North uses Ogg Vorbis to store music, radio, ambient sounds, police messages and cutscene audio. Players can also store their custom tracks (accessible in-game via the “User Track Player” radio station) in Ogg Vorbis.

* [http://www.rockstargames.com/sanandreas/ Grand Theft Auto: San Andreas] by Rockstar Games/Rockstar North uses Ogg Vorbis to store music, radio, ambient sounds, police messages and cutscene audio. Players can also store their custom tracks (accessible in-game via the “User Track Player” radio station) in Ogg Vorbis.

* [http://www.gothic3.com/ Gothic 3] by Piranha Bytes: Vorbis is used in the Ogg container for everything (music, speech, effects) except the intro video. For example: Music @ 256 kb/s, Speech @ 86 kb/s. About 18 hours of speech compressed to 700 MB.

* [http://www.guiltygearx2reload.com/ Guilty Gear XX]: The PC version, at least, uses Ogg Vorbis for all the music.

* [http://www.guitarherogame.com/gh2/ Guitar Hero II] by Red Octane (Activision), XBox360 platform only (multichannel Vorbis with 5 or 6 channels per song)

* [http://halo.bungie.org/ Halo]: Mac and PC versions of Halo use Ogg Vorbis for all audio, it seems. The Xiph license and dynamically linked libraries of Ogg and Vorbis are included in the Halo directory. XBox version does not use Ogg Vorbis.

* [http://harrypotter.ea.com/cofs/index.html Harry Potter II (Chamber of Secrets)]: This is unsubstantiated. It was reported on one of the Vorbis mailing lists, but there is little evidence either way on this title. EA has been supportive of Vorbis though, so it’s not entirely impossible. If anyone can give us a yea or nay on this, please do.

* [http://www.mightandmagicgame.com/HeroesV/ Heroes of Might and Magic V]: Uses Vorbis for audio and Theora for video.

* [http://www.eidosinteractive.com/games/info.html?gmid=118 Hitman 2]: uses Vorbis. (PC only or consoles too?)

* [http://www.codemasters.com/igi2/front.htm IGI2: Covert Strike]: Not a Norwegian first-person shooter.

* [http://www.inthegroove.com In The Groove]: The premier dance game created by [http://www.roxorgames.com Roxor Games, Inc.] Uses Vorbis for all of the in-game music.

* [http://www.agdinteractive.com/games/kq1/ King's Quest I]: King's Quest I: Quest for the Crown (Enhanced) is a fan remake of the original Sierra classic. Uses Ogg Vorbis for sound and Ogg Theora for cutscene movies.

* [http://www.p3int.com/KULT/ KULT Heretic Kingdoms] by 3D People/Project 3 Interactive: Uses Vorbis (1.0) for music, voice and sound effects.

* Recent Legacy of Kain Games: On the PC, both Soul Reaver 2 and Blood Omen 2 by Crystal Dynamics/Eidos use Ogg Vorbis for music and sound effects. (Source: [http://www.thelostworlds.net/FAQ.HTML#ogg])

* [http://www.ncsoft.net/eng/ncgames/lineage2_intro.asp Lineage II]: NCSoft Corporation’s 3D MMORPG Lineage II uses Ogg Vorbis for its music. They use 1.0beta3, though.

* [http://www.liveforspeed.net/ Live for Speed]: Online racing simulator uses Ogg for all audio and sound effects.

* [http://www.mobygames.com/game/lock-on-modern-air-combat Lock On: Modern Air Combat]: Published by Ubisoft; CD-ROM contains over 1800 Ogg Vorbis files for speech.

* [http://www.mafia-game.com/ Mafia: The City Of Lost Heaven]: Not sure about any console version, but PC version is reported to use Ogg Vorbis.

* [http://www.popcap.com/games/magicmatch Magic Match]: A very elaborate "Match 3" casual game that uses Ogg Vorbis for its audio.

* [http://www.capcom.co.jp/rockmanx8/ Mega Man X8]: The PC version of Mega Man X8 makes use of Vorbis for music and dialogue during cutscenes.

* [http://www.mobygames.com/game/gamecube/metal-gear-solid-the-twin-snakes Metal Gear Solid: The Twin Snakes]: Uses Ogg Vorbis for all speech in the game.

* [http://minecraft.net Minecraft]: Uses Ogg Vorbis for music and sound effects.

* MotoGP: This motorcycle racing sim uses Vorbis for the music and allows players to drop their own .ogg files into the music dir to listen to them in-game.

* [http://www.mystrevelation.com/ Myst IV: Revelation]: Fourth game in the Myst series. Uses Ogg Vorbis for all music, speech and sound effects.

* [http://www.mystvgame.com/ Myst V: End of Ages]: Fifth and final game in the Myst series. Uses Ogg Vorbis for all music, speech and sound effects.

* Nascar Racing Games from Papyrus: They had this to say about their decision and experience:

“We’re using a lot of spoken audio in this title (a first for us) and
your codec has allowed us to reduce more than 350MB of audio data to
about 40MB, a huge savings of memory and disk space! We are very
impressed.” — Tom Faiano, Producer

“Incorporating Ogg Vorbis into our codebase was quite painless, and in the
end, even refreshing. No fuss no muss. Thank you for your efforts!”
— Bill Farquhar, Soundguy du jour

* [http://www.nexuiz.com/ Nexuiz], a fast-paced FPS with roots in Quake I, uses Vorbis for background music. The minstagib mod uses Vorbis for all of its sound.

* [http://www.codemasters.com/flashpoint/ Operation Flashpoint]: This highly successful military simulation/action game from Codemasters uses Vorbis for the in-game music.

* [http://www.orunner.com/ Ostrich Runner] by Geleos: This funny Russian cartoon-style game for kids and not only kids uses Ogg Vorbis for sound, speech and music.

* [http://www.ysagoon.com/glob2/ Globulation 2]: State of the art GPL-ed strategy game!

* [http://www.penumbragame.com Penumbra: Black Plague]: Uses Ogg Vorbis for all audio.

* [http://www.psobb.com/index.php Phantasy Star Online: Blue Burst]: Uses Ogg Vorbis for music, stored in data/ogg.

* [http://www.gopostal.com/ Postal 2]: Probably not the game we want to use to showcase Vorbis, but it’s being used in this Unreal-engine-powered ultra-violent game.

* [http://www.praetoriansgame.com/ Praetorians]: This very successful game from Pyro Studios uses Vorbis for its music.

* [http://www.psychonauts.com/ Psychonauts]: Has vorbis.dll and vorbisfile.dll.

* [http://www.quake4game.com/ Quake 4]: Quake 4 is the fourth title in the series of Quake FPS computer games. All game music, speech and sound effects make use of Vorbis.

* [http://www.restricted-area.net/ Restricted Area]: by Master Creating uses Ogg Vorbis for music and VP3 for videos.

* Ricochet: An addictive version of Breakout.

* [http://www.rockband.com/ Rock Band]: XBox360 version uses the same type of multichannel Vorbis files as Guitar Hero II, but with more channels to handle the drums and vocals separately.

* [http://www.rockmanager.net/ Rock Manager]: Vorbis is used in this “new rock ’n roll management sim for PC from Pan Vision and Monsterland”.

* [http://www.sacred2.com/ Sacred 2] by Studio II: uses multichannel(!) Ogg Vorbis for music, speech and sound effects.

* [http://www.s2games.com/savage/ Savage]: This S2 Games “RTSS” hybrid genre game uses Vorbis for all the in-game music.

* [http://www.serioussam.com/se/ Serious Sam: The Second Encounter]: uses Vorbis for the music, although it is slightly obfuscated so as not to be easily playable by standard Ogg Vorbis players.

* [http://www.serioussam2.com/ Serious Sam 2]: not only uses Vorbis for the music but even Theora for the videos

* [http://www.totalwar.com/community/warlord.htm Shogun: Total War]: Shogun uses Vorbis, but only to distribute — everything is decompressed to wav during the install.

* [http://www.singles2.com/englisch/index.html Singles 2]: Uses Ogg Vorbis for sound.

* [http://www.lart.pl/en/portfolioItem.php?id=91 Ski Jumping 2004]: A commercial game that accurately models the activity of ski jumping. The game also contains over 700 Ogg Vorbis files.

* [http://mobygames.com/game/sheet/p,3/gameId,3453/ Star Trek: Away Team]: Vorbis is used for all sound in the game — music, voiceover and SFX. This squad-based strategy game is set in the Star Trek Next Generation universe. (Official website is down, using Mobygames link)

* [http://starcraft2.com/ StarCraft II]: Uses Vorbis for audio

* StoneLoops! Of Jurassica ([http://itunes.apple.com/WebObjects/MZStore.woa/wa/viewSoftware?id=315210057&mt=8 Apple iTunes App Store link]): Colorful puzzle game for the iPhone/iPod Touch that uses Ogg Vorbis for audio.

* [http://supertux.lethargik.org/ Super Tux]: Uses Vorbis for music.

* [http://www.splintercell3.com/ Tom Clancy’s Splinter Cell Chaos Theory]: .LS0 files are in fact Ogg Vorbis files.

* [http://www.lucasarts.com/games/swrepubliccommando/ Star Wars Republic Commando]: Vorbis is used in the ambient and game music in this latest action game from LucasArts.

* [http://www.reflexive.net/index.php?PAGE=game_detail&AID=30 Swarm]: A fun little arcade shooter.

* [http://www.swat4.com/ SWAT 4]: SWAT 4 uses Ogg Vorbis for audio files.

* [http://www.croteam.com/talosprinciple/ The Talos Principle] is a first-person puzzle game that uses Ogg Vorbis for music.

* [http://www.there.com/ There]: uses both Ogg Vorbis for the sound effects and Ogg Speex for realtime group voice chat, a first for an immersive consumer-oriented world. "Voice has become a very popular part of our product!" (posted by [http://david.weekly.org David Weekly], a There developer)

* [http://www.wesnoth.org The Battle for Wesnoth]: uses Ogg Vorbis for its music and for most of its sounds.

* [http://www.riddickgame.com/ The Chronicles of Riddick: Escape From Butcher’s Bay (Director’s Cut)]: Uses Vorbis for all audio and Theora for cutscenes.

* [https://thimbleweedpark.com/ Thimbleweed Park]: Retro-looking point-and-click adventure, [https://blog.thimbleweedpark.com/tracking_talkies using Ogg Vorbis for its music, character voices and sound effects].

"[The characters' dialog is] around 6GB of .wav files
and we needed to compress them for inclusion in the game.
We used .ogg files due to it being free of the patent
and licensing issues that .mp3 has, although either would have worked."
— Ron Gilbert

* [http://www.thethinggames.com/ The Thing]: Uses Vorbis

“The original multilanguage distro took three CDs, and went down to
only one after I converted all wavs to oggs. Nifty :) Sadly enough,
marketing decided to not have one language per CD anyway (probably to
annoy people who migrate) :/ Thanks for a very cool (and easy to use)
lib/format!” — Vincent Penquerc’h

* [http://www.asahi-net.or.jp/~cs8k-cyu/windows/tt_e.html Torus Trooper]: Frantic 3D shootemup, using Vorbis for the music. (see also the [http://www.emhsoft.net/ttrooper/ Linux port] and [http://www.apple.com/downloads/macosx/games/action_adventure/torustrooper.html MacOS version])

* [http://www.trackmania.com/ TrackMania] uses Vorbis for music in menu and tracks. [music in self-made tracks also needs to be in Vorbis]

* [http://www.mikeoldfield.com/ Tr3s Lunas] (aka Music VR episode 1): This game, featuring the music of Mike Oldfield, uses Vorbis for the music.

* [http://www.tribesvengeance.com Tribes: Vengeance] by Irrational Games/Sierra uses Ogg Vorbis for music.

* [http://www.mobygames.com/game/gamecube/true-crime-new-york-city True Crime: New York City]: GameCube version contains over 11,500 Ogg Vorbis files. It is likely that other platform ports also use the same files (note that the [http://www.mobygames.com/game/xbox/true-crime-new-york-city Xbox version] uses Windows Media Audio files in place of Ogg Vorbis files)

* [http://tuxtype.sourceforge.net/ Tuxtyping 2]: Educational typing tutor for kids of all ages!

* [http://www.ufo-aftershock.com/ UFO: Aftershock]: Uses Vorbis for music.

* [http://www.ufo-afterlight.com/ UFO: Afterlight]: Uses Vorbis for music.

* [http://www.atari.com/us/games/unreal2/pc Unreal 2]: PC version uses Vorbis, usage on consoles not confirmed.

“We went with Ogg Vorbis due to its excellent playback and compression,
and we used it not only for music but also all of the in-game voice.
Without it, we never would have been able to fit on two CDs.”
— [http://www.4unrealers.com/entrevistas/263/ 4unrealers.com]

* [http://www.unrealtournament.com/ut2003/ Unreal Tournament 2003]: This overwhelmingly-popular multiplayer first person shooter PC title uses Vorbis for its music.

* [http://www.unrealtournament.com/ut2004/ Unreal Tournament 2004]: Yet another Unreal game which uses Vorbis for the music (What about effects and voice? Does anyone know?). The readme file of the demo even mentions Speex!

* [http://sc2.sourceforge.net/ The Ur-Quan Masters]: Port of Star Control 2 to modern computers. Toys for Bob released the source of this amazing game under the GPL in 2002. Ogg Vorbis is used for the dialogue and the background music.

* [http://uru.ubi.com/ Uru: Ages Beyond Myst]: Spinoff from the Myst series. Uses Ogg Vorbis for all music, speech and sound effects.

* [http://mobygames.com/game/sheet/p,3/gameId,8635/ Lionheart — Legacy of the Crusader]: A 3/4 view RPG from Black Isle. Uses Vorbis for all audio. Thanks to all the guys that made Vorbis great. (I even donated money myself; someday maybe I can convince the company to kick in some bucks as well.) Official site is down, using Mobygames link.

* [http://www.global-gaming.com/Dominion/ Urban Dominion] (beta): First Person Massively Multiplayer Online Role-Playing Game by Global-Gaming. Uses Ogg Vorbis for the sound system.

* [http://www.vietcong-game.com/ Vietcong]: Vietnam War First Person Shooter by Pterodon. Believed to use Ogg Vorbis for the background music.

* [http://vegastrike.sourceforge.net/ Vega Strike]: It is a free spacesim. Ogg Vorbis files are stored in \music\ .

* [http://www.gathering.com/wingsofwar/ Wings Of War]: An arcade shooter set in WWI. The game has ogg.dll, vorbis.dll and vorbisfile.dll, but the *.ogg files are not accessible.

* [http://jonof.edgenetwork.org/winbuild/ WinBuild]: Winbuild is a port of Ken Silverman’s [http://www.advsys.net/ken/buildsrc/default.htm original Build engine demo] (for DOS) to Windows. It uses Vorbis compression for the music.

* [http://www.worldofwarcraft.com/ World of Warcraft]: This popular massively multiplayer online role-playing game from Blizzard Entertainment uses Vorbis for speech and sound effects.

* [http://www.zax-game.com/ Zax — The Alien Hunter]: A large 3/4 view action adventure game.

[[Category:Vorbis]]

Games that use Vorbis

2017-10-04T09:20:33Z

MrZeus: fix alphabetic ordering...

The following games use [[Vorbis]], most frequently for their in-game music or sound effects:

* All Games By [http://www.reflexive.com/index.php?CAT=Search&SEARCH=dev%3AReflexive+Entertainment&PAGE=GameList Reflexive Entertainment].

* [http://www.mobygames.com/game/windows/007-nightfire 007: Nightfire]: Uses Ogg Vorbis for background soundtrack.

* [http://www.asciisector.net/ Ascii Sector]: Space combat/exploration/trading game. Uses Ogg Vorbis for music.

* [http://www.ageofconan.com/ Age of Conan — Hyborian Adventures]: Uses Ogg Vorbis for all audio.

* [http://www.americasarmy.com/ America’s Army]: Uses Ogg Vorbis for main theme.

* [http://www.amnesiagame.com/ Amnesia: The Dark Descent]: Uses Ogg Vorbis for all audio.

* [http://assault.cubers.net/ AssaultCube]: A free fast paced first-person shooter with little hardware requirements for Windows, Linux and OS X. Uses Ogg Vorbis for all game sounds and music.

* [http://www.lionhead.com/bw2/ Black & White 2]: Uses Ogg Vorbis for music.

* [http://www.pyrogon.com/games/candycruncher/ Candy Cruncher]: This cute puzzle game from Brian Hook’s company, Pyrogon, uses Vorbis for the addictive music you hear while you race the clock.

* [http://www.callofcthulhu.com/ Call of Cthulhu] is a first-person horror game that combines intense action and adventure elements. It uses Ogg Vorbis for music and speech.

* [http://www.mobygames.com/game/windows/catechumen Catechumen] is a Christian-themed FPS that uses Ogg Vorbis.

* [http://www.civilization5.com/ Civilization V] is a turn-based strategy game that uses Ogg Vorbis for music.

* [http://www.atari.com/crashday/ Crashday]: Stunt racing game, developed by independent German studio Moon Byte. Uses Ogg Vorbis for music.

* [http://buenavistagames.go.com/product/chickenLittlePC.html Chicken Little]: Adventure game for children inspired by the motion picture. The PC edition uses Vorbis for dialogue and music (not sure if sound effects too).

* [http://www.cossacks2.de/ Cossacks 2]: “Cossacks II: Napoleonic Wars” is a sequel of “Cossacks: European Wars”. Ogg Vorbis 1.0 files are in \data\music\

* [http://www.darwinia.co.uk/ Darwinia]: The second title from indie developer Introversion Software. Darwinia is a stylised retro game: Tron meets Cannon Fodder. It uses Vorbis for all in-game sound effects and music.

* [http://www.introversion.co.uk/defcon/ DEFCON]: The third title from Introversion Software. Uses Vorbis for music, effects, everything, like Darwinia.

* [http://devilmaycry.com/ Devil May Cry 4] (for the PC, at least): Uses (occasionally multichannel) Ogg Vorbis for ingame and cutscene music.

* [http://www.eidos.co.uk/gss/dxiw/ Deus Ex: Invisible War] by Ion Storm/Eidos: Uses Ogg Vorbis for music and voice (and possibly for sound fx too).

* [http://diablo3.com Diablo III] uses Vorbis for audio.

* [http://www.idsoftware.com/games/doom/doom3/ DOOM 3]: The latest version of this famous first person shooter game from id software uses Vorbis for the theme music as well as their ambient and game sounds.

* [http://mobygames.com/game/sheet/p,3/gameId,6505/ Duke Nukem: Manhattan Project]: This game from 3D Realms was released in 2002 and used Vorbis for their music. (Official website is down, using Mobygames link)

* [http://www.popcap.com/games/free/dynomite Dynomite]: Puzzle Bobble/Bust A Move clone for Windows by PopCap Games, with mouse control. Uses Ogg Vorbis for nearly all sound effects.

* [http://en.wikipedia.org/wiki/Eschalon:_Book_I Eschalon]: A classic-style roleplaying game, for Windows, Mac, and Linux. Music is in ''Ogg Vorbis'' format.

* [http://www.mobygames.com/game/enclave/ Enclave] by Starbreeze/Black Label Games: Uses Ogg Vorbis for music (and possibly for sound fx and voice too).

* [http://www.eve-online.com EVE Online] by CCP Games, the Icelandic-homed space-based single-shard persistent world game uses Ogg Vorbis for its music.

* [http://www.lionhead.com/fabletlc/ Fable: The Lost Chapters]: Uses Ogg Vorbis for music and cutscenes (Ancient libVorbis version, 1.0 RC2).

* [http://farcry.ubi.com/ FarCry] by Crytek: uses Ogg Vorbis for music and effects.

* [http://www.freedom-fighters.co.uk/ Freedom Fighters] by IO Interactive: String search reveals “libVorbis I 20011217” in freedom.exe.

* [http://www.siriusgames.dk/index.php?pageid=67 Gangland] by MediaMobsters: Uses Ogg Vorbis for music and cutscenes (Data\streams\). Encoded with Xiph.Org libVorbis I 20020717. Decoder library: FMOD 3.71.

* [http://www.rockstargames.com/vicecity/ Grand Theft Auto: Vice City] by Rockstar Games/Rockstar North uses Ogg Vorbis to store music, radio, ambient sounds, police messages and cutscene audio. Players can also store their custom tracks (accessible in-game via the “User Track Player” radio station) in Ogg Vorbis.

* [http://www.rockstargames.com/sanandreas/ Grand Theft Auto: San Andreas] by Rockstar Games/Rockstar North uses Ogg Vorbis to store music, radio, ambient sounds, police messages and cutscene audio. Players can also store their custom tracks (accessible in-game via the “User Track Player” radio station) in Ogg Vorbis.

* [http://www.gothic3.com/ Gothic 3] by Piranha Bytes: Vorbis is used in the Ogg container for everything (music, speech, effects) except the intro video. For example: Music @ 256 kb/s, Speech @ 86 kb/s. About 18 hours of speech compressed to 700 MB.

* [http://www.guiltygearx2reload.com/ Guilty Gear XX]: The PC version, at least, uses Ogg Vorbis for all the music.

* [http://www.guitarherogame.com/gh2/ Guitar Hero II] by Red Octane (Activision), XBox360 platform only (multichannel Vorbis with 5 or 6 channels per song)

* [http://halo.bungie.org/ Halo]: Mac and PC versions of Halo use Ogg Vorbis for all audio, it seems. The Xiph license and dynamically linked libraries of Ogg and Vorbis are included in the Halo directory. XBox version does not use Ogg Vorbis.

* [http://harrypotter.ea.com/cofs/index.html Harry Potter II (Chamber of Secrets)]: This is unsubstantiated, it was reported on one of the vorbis mailing lists, but there is little evidence either way on this title. EA has been supportive of Vorbis though, so it’s not entirely impossible. If anyone can give us a yay or nay on this, please do.

* [http://www.mightandmagicgame.com/HeroesV/ Heroes of Might and Magic V]: Uses Vorbis for audio and Theora for video.

* [http://www.eidosinteractive.com/games/info.html?gmid=118 Hitman 2]: uses Vorbis. (PC only or consoles too?)

* [http://www.codemasters.com/igi2/front.htm IGI2: Covert Strike]: Not a Norwegian first-person shooter.

* [http://www.inthegroove.com In The Groove]: The premier dance game created by [http://www.roxorgames.com Roxor Games, Inc.] Uses Vorbis for all of the in-game music.

* [http://www.agdinteractive.com/games/kq1/ King's Quest I]: King's Quest I: Quest for the Crown (Enhanced) is a fan remake of the original Sierra classic. Uses Ogg Vorbis for sound and Ogg Theora for cutscene movies.

* [http://www.p3int.com/KULT/ KULT Heretic Kingdoms] by 3D People/Project 3 Interactive: Uses Vorbis (1.0) for music, voice and sound effects.

* Recent Legacy of Kain Games: On the PC, both Soul Reaver 2 and Blood Omen 2 by Crystal Dynamics/Eidos use Ogg Vorbis for music and sound effects. (Source: [http://www.thelostworlds.net/FAQ.HTML#ogg])

* [http://www.ncsoft.net/eng/ncgames/lineage2_intro.asp Lineage II]: NCSoft Corporation’s 3D MMORPG Lineage II uses Ogg Vorbis for its music. They use 1.0beta3, though.

* [http://www.liveforspeed.net/ Live for Speed]: Online racing simulator uses Ogg for all audio and sound effects.

* [http://www.mobygames.com/game/lock-on-modern-air-combat Lock On: Modern Air Combat]: Published by Ubisoft; CD-ROM contains over 1800 Ogg Vorbis files for speech.

* [http://www.mafia-game.com/ Mafia: The City Of Lost Heaven]: Not sure about any console version, but PC version is reported to use Ogg Vorbis.

* [http://www.popcap.com/games/magicmatch Magic Match]: A very elaborate "Match 3" casual game that uses Ogg Vorbis for its audio.

* [http://www.capcom.co.jp/rockmanx8/ Mega Man X8]: The PC version of Mega Man X8 makes use of Vorbis for music and dialogue during cutscenes.

* [http://www.mobygames.com/game/gamecube/metal-gear-solid-the-twin-snakes Metal Gear Solid: The Twin Snakes]: Uses Ogg Vorbis for all speech in the game.

* [http://minecraft.net Minecraft]: Uses Ogg Vorbis for music and sound effects.

* MotoGP: This motorcycle racing sim uses Vorbis for the music and allows players to drop their own .ogg files into the music dir to listen to them in-game.

* [http://www.mystrevelation.com/ Myst IV: Revelation]: Fourth game in the Myst series. Uses Ogg Vorbis for all music, speech and sound effects.

* [http://www.mystvgame.com/ Myst V: End of Ages]: Fifth and final game in the Myst series. Uses Ogg Vorbis for all music, speech and sound effects.

* Nascar Racing Games from Papyrus: They had this to say about their decision and experience:

“We’re using a lot of spoken audio in this title (a first for us) and
your codec has allowed us to reduce more than 350MB of audio data to
about 40MB, a huge savings of memory and disk space! We are very
impressed.” —Tom Faiano, Producer

“Incorporating Ogg Vorbis into our codebase was quite painless, and in the
end, even refreshing. No fuss no muss. Thank you for your efforts!”
—Bill Farquhar, Soundguy du jour

* [http://www.nexuiz.com/ Nexuiz], a fast-paced FPS with roots in Quake I, uses Vorbis for background music. The minstagib mod uses Vorbis for all of its sound.

* [http://www.codemasters.com/flashpoint/ Operation Flashpoint]: This highly successful military simulation/action game from Codemasters uses Vorbis for the in-game music.

* [http://www.orunner.com/ Ostrich Runner] by Geleos: This funny Russian cartoon-style game for kids and not only kids uses Ogg Vorbis for sound, speech and music.

* [http://www.ysagoon.com/glob2/ Globulation 2]: State of the art GPL-ed strategy game!

* [http://www.penumbragame.com Penumbra: Black Plague]: Uses Ogg Vorbis for all audio.

* [http://www.psobb.com/index.php Phantasy Star Online: Blue Burst]: Uses Ogg Vorbis for music, stored in data/ogg.

* [http://www.gopostal.com/ Postal 2]: Probably not the game we want to use to showcase Vorbis, but it’s being used in this Unreal-engine-powered ultra-violent game.

* [http://www.praetoriansgame.com/ Praetorians]: This very successful game from Pyro Studios uses Vorbis for its music.

* [http://www.psychonauts.com/ Psychonauts]: Has vorbis.dll and vorbisfile.dll.

* [http://www.quake4game.com/ Quake 4]: Quake 4 is the fourth title in the series of Quake FPS computer games. All game music, speech and sound effects make use of Vorbis.

* [http://www.restricted-area.net/ Restricted Area]: by Master Creating uses Ogg Vorbis for music and VP3 for videos.

* Ricochet: An addictive version of Break out.

* [http://www.rockband.com/ Rock Band]: XBox360 version uses the same type of multichannel Vorbis files as Guitar Hero II, but with more channels to handle the drums and vocals separately.

* [http://www.rockmanager.net/ Rock Manager]: Vorbis is used in this “new rock ’n roll management sim for PC from Pan Vision and Monsterland”.

* [http://www.sacred2.com/ Sacred 2] by Studio II: uses multichannel(!) Ogg Vorbis for music, speech and sound effects.

* [http://www.s2games.com/savage/ Savage]: This S2 Games “RTSS” hybrid genre game uses Vorbis for all the in-game music.

* [http://www.serioussam.com/se/ Serious Sam: The Second Encounter]: uses Vorbis for the music, although it is slightly obfuscated so as not to be easily playable by standard Ogg Vorbis players.

* [http://www.serioussam2.com/ Serious Sam 2]: not only uses Vorbis for the music but even Theora for the videos

* [http://www.totalwar.com/community/warlord.htm Shogun: Total War]: Shogun uses Vorbis, but only to distribute — everything is decompressed to wav during the install.

* [http://www.singles2.com/englisch/index.html Singles 2]: Uses Ogg Vorbis for sound.

* [http://www.lart.pl/en/portfolioItem.php?id=91 Ski Jumping 2004]: A commercial game that accurately models the activity of ski jumping. The game also contains over 700 Ogg Vorbis files.

* [http://mobygames.com/game/sheet/p,3/gameId,3453/ Star Trek: Away Team]: Vorbis is used for all sound in the game — music, voiceover and SFX. This squad-based strategy game is set in the Star Trek Next Generation universe. (Official website is down, using Mobygames link)

* [http://starcraft2.com/ StarCraft II]: Uses Vorbis for audio

* StoneLoops! Of Jurassica ([http://itunes.apple.com/WebObjects/MZStore.woa/wa/viewSoftware?id=315210057&mt=8 Apple iTunes App Store link]): Colorful puzzle game for the iPhone/iPod Touch that uses Ogg Vorbis for audio.

* [http://supertux.lethargik.org/ Super Tux]: Uses Vorbis for music.

* [http://www.splintercell3.com/ Tom Clancy’s Splinter Cell Chaos Theory]: .LS0 files are in fact Ogg Vorbis files.

* [http://www.lucasarts.com/games/swrepubliccommando/ Star Wars Republic Commando]: Vorbis is used in the ambient and game music in this latest action game from LucasArts.

* [http://www.reflexive.net/index.php?PAGE=game_detail&AID=30 Swarm]: A fun little arcade shooter.

* [http://www.swat4.com/ SWAT 4]: SWAT 4 uses Ogg Vorbis for audio files.

* [http://www.croteam.com/talosprinciple/ The Talos Principle] is a first-person puzzle game that uses Ogg Vorbis for music.

* [http://www.there.com/ There]: uses both Ogg Vorbis for the sound effects and Ogg Speex for realtime group voice chat, a first for an immersive consumer-oriented world. Voice has become a very popular part of our product! ** posted by [http://david.weekly.org David Weekly], a There developer.

* [http://www.wesnoth.org The Battle for Wesnoth]: Uses Ogg Vorbis for its music and for most of its sounds.

* [http://www.riddickgame.com/ The Chronicles of Riddick: Escape From Butcher’s Bay (Director’s Cut)]: Uses Vorbis for all audio and Theora for cutscenes.

* [https://thimbleweedpark.com/ Thimbleweed Park]: Retro-looking point-and-click adventure, [https://blog.thimbleweedpark.com/tracking_talkies using Ogg Vorbis for its music, character voices and sound effects].

"[The characters' dialog is] around 6GB of .wav files
and we needed to compress them for inclusion in the game.
We used .ogg files due to it being free of the patent
and licensing issues that .mp3 has, although either would have worked."

—Ron Gilbert

* [http://www.thethinggames.com/ The Thing]: Uses Vorbis

“The original multilanguage distro took three CDs, and went down to
only one after I converted all wavs to oggs. Nifty :) Sadly enough,
marketing decided to not have one language per CD anyway (probably to
annoy people who migrate) :/ Thanks for a very cool (and easy to use)
lib/format!”

—Vincent Penquerc’h

* [http://www.asahi-net.or.jp/~cs8k-cyu/windows/tt_e.html Torus Trooper]: Frantic 3D shootemup, using Vorbis for the music. (see also the [http://www.emhsoft.net/ttrooper/ Linux port] and [http://www.apple.com/downloads/macosx/games/action_adventure/torustrooper.html MacOS version])

* [http://www.trackmania.com/ TrackMania] uses Vorbis for music in menus and tracks. (Music in self-made tracks also needs to be in Vorbis.)

* [http://www.mikeoldfield.com/ Tr3s Lunas] (aka Music VR episode 1): This game, featuring the music of Mike Oldfield, uses Vorbis for the music.

* [http://www.tribesvengeance.com Tribes: Vengeance] by Irrational Games/Sierra uses Ogg Vorbis for music.

* [http://www.mobygames.com/game/gamecube/true-crime-new-york-city True Crime: New York City]: GameCube version contains over 11,500 Ogg Vorbis files. It is likely that other platform ports also use the same files (note that the [http://www.mobygames.com/game/xbox/true-crime-new-york-city Xbox version] uses Windows Media Audio files in place of Ogg Vorbis files)

* [http://tuxtype.sourceforge.net/ Tuxtyping 2]: Educational typing tutor for kids of all ages!

* [http://www.ufo-aftershock.com/ UFO: Aftershock]: Uses Vorbis for music.

* [http://www.ufo-afterlight.com/ UFO: Afterlight]: Uses Vorbis for music.

* [http://www.atari.com/us/games/unreal2/pc Unreal 2]: PC version uses Vorbis, usage on consoles not confirmed.

“We went with Ogg Vorbis due to its excellent playback and compression,
and we used it not only for music but also all of the in-game voice.
Without it, we never would have been able to fit on two CDs.”

— http://www.4unrealers.com/entrevistas/263/

* [http://www.unrealtournament.com/ut2003/ Unreal Tournament 2003]: This overwhelmingly-popular multiplayer first person shooter PC title uses Vorbis for its music.

* [http://www.unrealtournament.com/ut2004/ Unreal Tournament 2004]: Yet another Unreal game which uses Vorbis for the music (What about effects and voice? Does anyone know?). The readme file of the demo even mentions Speex!

* [http://sc2.sourceforge.net/ The Ur-Quan Masters]: Port of Star Control 2 to modern computers. Toys for Bob released the source of this amazing game under the GPL in 2002. Ogg Vorbis is used for the dialogue and the background music.

* [http://uru.ubi.com/ Uru: Ages Beyond Myst]: Spinoff from the Myst series. Uses Ogg Vorbis for all music, speech and sound effects.

* [http://mobygames.com/game/sheet/p,3/gameId,8635/ Lionheart — Legacy of the Crusader]: A 3/4 view RPG from Black Isle. Uses Vorbis for all audio. Thanks to all the guys that made Vorbis great. (I even donated money myself; someday maybe I can convince the company to kick in some bucks as well.) Official site is down, using MobyGames link.

* [http://www.global-gaming.com/Dominion/ Urban Dominion] (beta): First Person Massively Multiplayer Online Role-Playing Game by Global-Gaming. Uses Ogg Vorbis for the sound system.

* [http://www.vietcong-game.com/ Vietcong]: Vietnam War first-person shooter by Pterodon. Uses Ogg Vorbis, I believe, for the background music.

* [http://vegastrike.sourceforge.net/ Vega Strike]: It is a free spacesim. Ogg Vorbis files are stored in \music\ .

* [http://www.gathering.com/wingsofwar/ Wings Of War]: An arcade shooter set in WWI. The game has ogg.dll, vorbis.dll and vorbisfile.dll, but the *.ogg files are not accessible.

* [http://jonof.edgenetwork.org/winbuild/ WinBuild]: Winbuild is a port of Ken Silverman’s [http://www.advsys.net/ken/buildsrc/default.htm original Build engine demo] (for DOS) to Windows. It uses Vorbis compression for the music.

* [http://www.worldofwarcraft.com/ World of Warcraft]: Popular massively multiplayer online role-playing game from Blizzard Entertainment. Uses Vorbis for speech and sound effects.

* [http://www.zax-game.com/ Zax — The Alien Hunter]: A large 3/4 view action adventure game.

[[Category:Vorbis]]

Games that use Vorbis

2017-10-04T09:19:00Z

MrZeus: add Thimbleweed Park!

The following games use [[Vorbis]], most frequently for their in-game music or sound effects:

* All Games By [http://www.reflexive.com/index.php?CAT=Search&SEARCH=dev%3AReflexive+Entertainment&PAGE=GameList Reflexive Entertainment].

* [http://www.mobygames.com/game/windows/007-nightfire 007: Nightfire]: Uses Ogg Vorbis for background soundtrack.

* [http://www.asciisector.net/ Ascii Sector]: Space combat/exploration/trading game. Uses Ogg Vorbis for music.

* [http://www.ageofconan.com/ Age of Conan — Hyborian Adventures]: Uses Ogg Vorbis for all audio.

* [http://www.americasarmy.com/ America’s Army]: Uses Ogg Vorbis for main theme.

* [http://www.amnesiagame.com/ Amnesia: The Dark Descent]: Uses Ogg Vorbis for all audio.

* [http://assault.cubers.net/ AssaultCube]: A free fast paced first-person shooter with little hardware requirements for Windows, Linux and OS X. Uses Ogg Vorbis for all game sounds and music.

* [http://www.lionhead.com/bw2/ Black & White 2]: Uses Ogg Vorbis for music.

* [http://www.pyrogon.com/games/candycruncher/ Candy Cruncher]: This cute puzzle game from Brian Hook’s company, Pyrogon, uses Vorbis for the addictive music you hear while you race the clock.

* [http://www.callofcthulhu.com/ Call of Cthulhu] is a first-person horror game that combines intense action and adventure elements. It uses Ogg Vorbis for music and speech.

* [http://www.mobygames.com/game/windows/catechumen Catechumen] is a Christian-themed FPS that uses Ogg Vorbis.

* [http://www.civilization5.com/ Civilization V] is a turn-based strategy game that uses Ogg Vorbis for music.

* [http://www.atari.com/crashday/ Crashday]: Stunt racing game, developed by independent German studio Moon Byte. Uses Ogg Vorbis for music.

* [http://buenavistagames.go.com/product/chickenLittlePC.html Chicken Little]: Adventure game for children inspired by the motion picture. The PC edition uses Vorbis for dialogue and music (not sure if sound effects too).

* [http://www.cossacks2.de/ Cossacks 2]: “Cossacks II: Napoleonic Wars” is a sequel of “Cossacks: European Wars”. Ogg Vorbis 1.0 files are in \data\music\

* [http://www.darwinia.co.uk/ Darwinia]: The second title from indie developer Introversion Software. Darwinia is a stylised retro game: Tron meets Cannon Fodder. It uses Vorbis for all in-game sound effects and music.

* [http://www.introversion.co.uk/defcon/ DEFCON]: The third title from Introversion Software. Uses Vorbis for music, effects, everything, like Darwinia.

* [http://devilmaycry.com/ Devil May Cry 4] (for the PC, at least): Uses (occasionally multichannel) Ogg Vorbis for ingame and cutscene music.

* [http://www.eidos.co.uk/gss/dxiw/ Deus Ex: Invisible War] by Ion Storm/Eidos: Uses Ogg Vorbis for music and voice (and possibly for sound fx too).

* [http://diablo3.com Diablo III] uses Vorbis for audio.

* [http://www.idsoftware.com/games/doom/doom3/ DOOM 3]: The latest version of this famous first person shooter game from id software uses Vorbis for the theme music as well as their ambient and game sounds.

* [http://mobygames.com/game/sheet/p,3/gameId,6505/ Duke Nukem: Manhattan Project]: This game from 3D Realms was released in 2002 and used Vorbis for their music. (Official website is down, using Mobygames link)

* [http://www.popcap.com/games/free/dynomite Dynomite]: Puzzle Bobble/Bust A Move clone for Windows by PopCap Games, with mouse control. Uses Ogg Vorbis for nearly all sound effects.

* [http://en.wikipedia.org/wiki/Eschalon:_Book_I Eschalon]: A classic-style roleplaying game, for Windows, Mac, and Linux. Music is in ''Ogg Vorbis'' format.

* [http://www.mobygames.com/game/enclave/ Enclave] by Starbreeze/Black Label Games: Uses Ogg Vorbis for music (and possibly for sound fx and voice too).

* [http://www.eve-online.com EVE Online] by CCP Games, the Icelandic-homed space-based single-shard persistent world game uses Ogg Vorbis for its music.

* [http://www.lionhead.com/fabletlc/ Fable: The Lost Chapters]: Uses Ogg Vorbis for music and cutscenes (Ancient libVorbis version, 1.0 RC2).

* [http://farcry.ubi.com/ FarCry] by Crytek: uses Ogg Vorbis for music and effects.

* [http://www.freedom-fighters.co.uk/ Freedom Fighters] by IO Interactive: String search reveals “libVorbis I 20011217” in freedom.exe.

* [http://www.siriusgames.dk/index.php?pageid=67 Gangland] by MediaMobsters: Uses Ogg Vorbis for music and cutscenes (Data\streams\). Encoded with Xiph.Org libVorbis I 20020717. Decoder library: FMOD 3.71.

* [http://www.rockstargames.com/vicecity/ Grand Theft Auto: Vice City] by Rockstar Games/Rockstar North uses Ogg Vorbis to store music, radio, ambient sounds, police messages and cutscene audio. Players can also store their custom tracks (accessible in-game via the “User Track Player” radio station) in Ogg Vorbis.

* [http://www.rockstargames.com/sanandreas/ Grand Theft Auto: San Andreas] by Rockstar Games/Rockstar North uses Ogg Vorbis to store music, radio, ambient sounds, police messages and cutscene audio. Players can also store their custom tracks (accessible in-game via the “User Track Player” radio station) in Ogg Vorbis.

* [http://www.gothic3.com/ Gothic 3] by Piranha Bytes: Vorbis is used in the Ogg container for everything (music, speech, effects) except the intro video. For example: Music @ 256 kb/s, Speech @ 86 kb/s. About 18 hours of speech compressed to 700 MB.

* [http://www.guiltygearx2reload.com/ Guilty Gear XX]: The PC version, at least, uses Ogg Vorbis for all the music.

* [http://www.guitarherogame.com/gh2/ Guitar Hero II] by Red Octane (Activision), XBox360 platform only (multichannel Vorbis with 5 or 6 channels per song)

* [http://halo.bungie.org/ Halo]: Mac and PC versions of Halo use Ogg Vorbis for all audio, it seems. The Xiph license and dynamically linked libraries of Ogg and Vorbis are included in the Halo directory. XBox version does not use Ogg Vorbis.

* [http://harrypotter.ea.com/cofs/index.html Harry Potter II (Chamber of Secrets)]: This is unsubstantiated, it was reported on one of the vorbis mailing lists, but there is little evidence either way on this title. EA has been supportive of Vorbis though, so it’s not entirely impossible. If anyone can give us a yay or nay on this, please do.

* [http://www.mightandmagicgame.com/HeroesV/ Heroes of Might and Magic V]: Uses Vorbis for audio and Theora for video.

* [http://www.eidosinteractive.com/games/info.html?gmid=118 Hitman 2]: uses Vorbis. (PC only or consoles too?)

* [http://www.codemasters.com/igi2/front.htm IGI2: Covert Strike]: Not a Norwegian first-person shooter.

* [http://www.inthegroove.com In The Groove]: The premier dance game created by [http://www.roxorgames.com Roxor Games, Inc.] Uses Vorbis for all of the in-game music.

* [http://www.agdinteractive.com/games/kq1/ King's Quest I]: King's Quest I: Quest for the Crown (Enhanced) is a fan remake of the original Sierra classic. Uses Ogg Vorbis for sound and Ogg Theora for cutscene movies.

* [http://www.p3int.com/KULT/ KULT Heretic Kingdoms] by 3D People/Project 3 Interactive: Uses Vorbis (1.0) for music, voice and sound effects.

* Recent Legacy of Kain Games: On the PC, both Soul Reaver 2 and Blood Omen 2 by Crystal Dynamics/Eidos use Ogg Vorbis for music and sound effects. (Source: [http://www.thelostworlds.net/FAQ.HTML#ogg])

* [http://www.ncsoft.net/eng/ncgames/lineage2_intro.asp Lineage II]: NCSoft Corporation’s 3D MMORPG Lineage II uses Ogg Vorbis for its music. They use 1.0beta3, though.

* [http://www.liveforspeed.net/ Live for Speed]: Online racing simulator uses Ogg for all audio and sound effects.

* [http://www.mobygames.com/game/lock-on-modern-air-combat Lock On: Modern Air Combat]: Published by Ubisoft; CD-ROM contains over 1800 Ogg Vorbis files for speech.

* [http://www.mafia-game.com/ Mafia: The City Of Lost Heaven]: Not sure about any console version, but PC version is reported to use Ogg Vorbis.

* [http://www.popcap.com/games/magicmatch Magic Match]: A very elaborate "Match 3" casual game that uses Ogg Vorbis for its audio.

* [http://www.capcom.co.jp/rockmanx8/ Mega Man X8]: The PC version of Mega Man X8 makes use of Vorbis for music and dialogue during cutscenes.

* [http://www.mobygames.com/game/gamecube/metal-gear-solid-the-twin-snakes Metal Gear Solid: The Twin Snakes]: Uses Ogg Vorbis for all speech in the game.

* [http://minecraft.net Minecraft]: Uses Ogg Vorbis for music and sound effects.

* MotoGP: This motorcycle racing sim uses Vorbis for the music and allows players to drop their own .ogg files into the music dir to listen to them in-game.

* [http://www.mystrevelation.com/ Myst IV: Revelation]: Fourth game in the Myst series. Uses Ogg Vorbis for all music, speech and sound effects.

* [http://www.mystvgame.com/ Myst V: End of Ages]: Fifth and final game in the Myst series. Uses Ogg Vorbis for all music, speech and sound effects.

* Nascar Racing Games from Papyrus: They had this to say about their decision and experience:

“We’re using a lot of spoken audio in this title (a first for us) and
your codec has allowed us to reduce more than 350MB of audio data to
about 40MB, a huge savings of memory and disk space! We are very
impressed.” —Tom Faiano, Producer

“Incorporating Ogg Vorbis into our codebase was quite painless, and in the
end, even refreshing. No fuss no muss. Thank you for your efforts!”
—Bill Farquhar, Soundguy du jour

* [http://www.nexuiz.com/ Nexuiz], a fast-paced FPS with roots in Quake I, uses Vorbis for background music. The minstagib mod uses Vorbis for all of its sound.

* [http://www.codemasters.com/flashpoint/ Operation Flashpoint]: This highly successful military simulation/action game from Codemasters uses Vorbis for the in-game music.

* [http://www.orunner.com/ Ostrich Runner] by Geleos: This funny Russian cartoon-style game for kids and not only kids uses Ogg Vorbis for sound, speech and music.

* [http://www.ysagoon.com/glob2/ Globulation 2]: State of the art GPL-ed strategy game!

* [http://www.penumbragame.com Penumbra: Black Plague]: Uses Ogg Vorbis for all audio.

* [http://www.psobb.com/index.php Phantasy Star Online: Blue Burst]: Uses Ogg Vorbis for music, stored in data/ogg.

* [http://www.gopostal.com/ Postal 2]: Probably not the game we want to use to showcase Vorbis, but it’s being used in this Unreal-engine-powered ultra-violent game.

* [http://www.praetoriansgame.com/ Praetorians]: This very successful game from Pyro Studios uses Vorbis for its music.

* [http://www.psychonauts.com/ Psychonauts]: Has vorbis.dll and vorbisfile.dll.

* [http://www.quake4game.com/ Quake 4]: Quake 4 is the fourth title in the series of Quake FPS computer games. All game music, speech and sound effects make use of Vorbis.

* [http://www.restricted-area.net/ Restricted Area]: by Master Creating uses Ogg Vorbis for music and VP3 for videos.

* Ricochet: An addictive version of Break out.

* [http://www.rockband.com/ Rock Band]: XBox360 version uses the same type of multichannel Vorbis files as Guitar Hero II, but with more channels to handle the drums and vocals separately.

* [http://www.rockmanager.net/ Rock Manager]: Vorbis is used in this “new rock ’n roll management sim for PC from Pan Vision and Monsterland”.

* [http://www.sacred2.com/ Sacred 2] by Studio II: uses multichannel(!) Ogg Vorbis for music, speech and sound effects.

* [http://www.s2games.com/savage/ Savage]: This S2 Games “RTSS” hybrid genre game uses Vorbis for all the in-game music.

* [http://www.serioussam.com/se/ Serious Sam: The Second Encounter]: uses Vorbis for the music, although it is slightly obfuscated so as not to be easily playable by standard Ogg Vorbis players.

* [http://www.serioussam2.com/ Serious Sam 2]: not only uses Vorbis for the music but even Theora for the videos

* [http://www.totalwar.com/community/warlord.htm Shogun: Total War]: Shogun uses Vorbis, but only to distribute — everything is decompressed to wav during the install.

* [http://www.singles2.com/englisch/index.html Singles 2]: Uses Ogg Vorbis for sound.

* [http://www.lart.pl/en/portfolioItem.php?id=91 Ski Jumping 2004]: A commercial game that accurately models the activity of ski jumping. The game also contains over 700 Ogg Vorbis files.

* [http://mobygames.com/game/sheet/p,3/gameId,3453/ Star Trek: Away Team]: Vorbis is used for all sound in the game — music, voiceover and SFX. This squad-based strategy game is set in the Star Trek Next Generation universe. (Official website is down, using Mobygames link)

* [http://starcraft2.com/ StarCraft II]: Uses Vorbis for audio

* StoneLoops! Of Jurassica ([http://itunes.apple.com/WebObjects/MZStore.woa/wa/viewSoftware?id=315210057&mt=8 Apple iTunes App Store link]): Colorful puzzle game for the iPhone/iPod Touch that uses Ogg Vorbis for audio.

* [http://supertux.lethargik.org/ Super Tux]: Uses Vorbis for music.

* [http://www.splintercell3.com/ Tom Clancy’s Splinter Cell Chaos Theory]: .LS0 files are in fact Ogg Vorbis files.

* [http://www.lucasarts.com/games/swrepubliccommando/ Star Wars Republic Commando]: Vorbis is used in the ambient and game music in this latest action game from LucasArts.

* [http://www.reflexive.net/index.php?PAGE=game_detail&AID=30 Swarm]: A fun little arcade shooter.

* [http://www.swat4.com/ SWAT 4]: SWAT 4 uses Ogg Vorbis for audio files.

* [http://www.croteam.com/talosprinciple/ The Talos Principle] is a first-person puzzle game that uses Ogg Vorbis for music.

* [http://www.there.com/ There]: uses both Ogg Vorbis for the sound effects and Ogg Speex for realtime group voice chat, a first for an immersive consumer-oriented world. Voice has become a very popular part of our product! ** posted by [http://david.weekly.org David Weekly], a There developer.

* [http://www.wesnoth.org The Battle for Wesnoth]: uses Ogg Vorbis for its music and for most of its sounds.

* [http://www.riddickgame.com/ The Chronicles of Riddick: Escape From Butcher’s Bay (Director’s Cut)]: Uses Vorbis for all audio and Theora for cutscenes.

* [http://www.thethinggames.com/ The Thing]: Uses Vorbis

“The original multilanguage distro took three CDs, and went down to
only one after I converted all wavs to oggs. Nifty :) Sadly enough,
marketing decided to not have one language per CD anyway (probably to
annoy people who migrate) :/ Thanks for a very cool (and easy to use)
lib/format!”

—Vincent Penquerc’h

* [https://thimbleweedpark.com/ Thimbleweed Park]: Retro-looking point-and-click adventure, [https://blog.thimbleweedpark.com/tracking_talkies using Ogg Vorbis for its music, character voices and sound effects].

"[The characters' dialog is] around 6GB of .wav files
and we needed to compress them for inclusion in the game.
We used .ogg files due to it being free of the patent
and licensing issues that .mp3 has, although either would have worked."

—Ron Gilbert

* [http://www.asahi-net.or.jp/~cs8k-cyu/windows/tt_e.html Torus Trooper]: Frantic 3D shootemup, using Vorbis for the music. (see also the [http://www.emhsoft.net/ttrooper/ Linux port] and [http://www.apple.com/downloads/macosx/games/action_adventure/torustrooper.html MacOS version])

* [http://www.trackmania.com/ TrackMania] uses Vorbis for music in menus and tracks. [Music in self-made tracks also needs to be in Vorbis.]

* [http://www.mikeoldfield.com/ Tr3s Lunas] (aka Music VR episode 1): This game, featuring the music of Mike Oldfield, uses Vorbis for the music.

* [http://www.tribesvengeance.com Tribes: Vengeance] by Irrational Games/Sierra uses Ogg Vorbis for music.

* [http://www.mobygames.com/game/gamecube/true-crime-new-york-city True Crime: New York City]: GameCube version contains over 11,500 Ogg Vorbis files. It is likely that other platform ports also use the same files (note that the [http://www.mobygames.com/game/xbox/true-crime-new-york-city Xbox version] uses Windows Media Audio files in place of Ogg Vorbis files)

* [http://tuxtype.sourceforge.net/ Tuxtyping 2]: Educational typing tutor for kids of all ages!

* [http://www.ufo-aftershock.com/ UFO: Aftershock]: Uses Vorbis for music.

* [http://www.ufo-afterlight.com/ UFO: Afterlight]: Uses Vorbis for music.

* [http://www.atari.com/us/games/unreal2/pc Unreal 2]: PC version uses Vorbis, usage on consoles not confirmed.

“We went with Ogg Vorbis due to its excellent playback and compression,
and we used it not only for music but also all of the in-game voice.
Without it, we never would have been able to fit on two CDs.”

— http://www.4unrealers.com/entrevistas/263/

* [http://www.unrealtournament.com/ut2003/ Unreal Tournament 2003]: This overwhelmingly-popular multiplayer first person shooter PC title uses Vorbis for its music.

* [http://www.unrealtournament.com/ut2004/ Unreal Tournament 2004]: Yet another Unreal game which uses Vorbis for the music (What about effects and voice? Does anyone know?). The readme file of the demo even mentions Speex!

* [http://sc2.sourceforge.net/ The Ur-Quan Masters]: Port of Star Control 2 to modern computers. Toys for Bob released the source of this amazing game under the GPL in 2002. Ogg Vorbis is used for the dialogue and the background music.

* [http://uru.ubi.com/ Uru: Ages Beyond Myst]: Spinoff from the Myst series. Uses Ogg Vorbis for all music, speech and sound effects.

* [http://mobygames.com/game/sheet/p,3/gameId,8635/ Lionheart: Legacy of the Crusader]: A 3/4-view RPG from Black Isle. Uses Vorbis for all audio. Thanks to all the guys that made Vorbis great. (I even donated money myself; someday maybe I can convince the company to kick in some bucks as well.) Official site is down, using MobyGames link.

* [http://www.global-gaming.com/Dominion/ Urban Dominion] (beta): First Person Massively Multiplayer Online Role-Playing Game by Global-Gaming. Uses Ogg Vorbis for the sound system.

* [http://www.vietcong-game.com/ Vietcong]: Vietnam War First Person Shooter by Pterodon. Uses Ogg Vorbis I believe for the background music.

* [http://vegastrike.sourceforge.net/ Vega Strike]: It is a free spacesim. Ogg Vorbis files are stored in \music\ .

* [http://www.gathering.com/wingsofwar/ Wings Of War]: An arcade shooter set in WWI. The game ships ogg.dll, vorbis.dll and vorbisfile.dll, but the *.ogg files are not accessible.

* [http://jonof.edgenetwork.org/winbuild/ WinBuild]: Winbuild is a port of Ken Silverman’s [http://www.advsys.net/ken/buildsrc/default.htm original Build engine demo] (for DOS) to Windows. It uses Vorbis compression for the music.

* [http://www.worldofwarcraft.com/ World of Warcraft]: This popular massively multiplayer online role-playing game from Blizzard Entertainment uses Vorbis for speech and sound effects.

* [http://www.zax-game.com/ Zax — The Alien Hunter]: A large 3/4 view action adventure game.

[[Category:Vorbis]]

Ogg

2017-08-01T10:26:53Z

MrZeus: /* Ogg page format */

The '''Ogg''' transport bitstream is designed to provide framing, error protection and seeking structure for higher-level codec streams that consist of raw, unencapsulated data packets, such as the [[Opus]], [[Vorbis]] and [[FLAC]] audio codecs or the [[Theora]] and [[Dirac]] video codecs.

== Name ==

Ogg derives from "ogging", jargon from the computer game Netrek. Ogg is not an acronym and should not be mentioned as "OGG".

== Design constraints for Ogg bitstreams ==

* True streaming; we must not need to seek to build a 100% complete bitstream.
* Use no more than approximately 1-2% of bitstream bandwidth for packet boundary marking, high-level framing, sync and seeking.
* Specification of absolute position within the original sample stream.
* Simple mechanism to ease limited editing, such as a simplified concatenation mechanism.
* Detection of corruption, recapture after error and direct, random access to data at arbitrary positions in the bitstream.

== Specification / standard==

The Ogg transport bitstream and file format is defined in RFC 3533, approved in May 2003. As RFC documents cannot be changed once approved, there will never be a newer version of RFC 3533; corrections are collected in [[RFC_3533_Errata]] instead. Existing flaws are discussed at [[OggIssues]], ideas for the future at [[TransOgg]].

== Detecting Ogg files and extracting information ==

Ogg files begin with a signature "OggS". This signature also repeats many times inside the file, at the beginning of every page. There are several tools to get information about Ogg files:
* Ogginfo - part of Vorbis-Tools, supports Vorbis codec only (historical Ogg-vs-Vorbis issue), other codecs cause it to report garbage
* Opusinfo - part of Opus-Tools, supports only Opus codec well, only minimal Vorbis support
* Oggz ???
* MediaInfo [http://sourceforge.net/projects/mediainfo/ sf.net/projects/mediainfo] - provides information about media (and some other) files, supports many types, also Ogg with various codecs, generic audio and video information only, no Ogg-specific details
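The signature check described above can be sketched in a few lines of Python. This is only an illustration, not a replacement for the tools listed: the function names <code>looks_like_ogg</code> and <code>count_page_markers</code> are made up for this example, and counting "OggS" occurrences only gives an upper bound on the number of pages, because the same four bytes can occur by chance inside packet data.

```python
def looks_like_ogg(data: bytes) -> bool:
    """Return True if the buffer starts with the Ogg capture pattern."""
    return data[:4] == b"OggS"


def count_page_markers(data: bytes) -> int:
    """Count occurrences of the capture pattern in the buffer.

    This is an upper bound on the number of pages: the pattern may also
    appear by coincidence inside the payload of a page.
    """
    return data.count(b"OggS")


if __name__ == "__main__":
    sample = b"OggS" + bytes(23) + b"OggS" + bytes(23)
    print(looks_like_ogg(sample), count_page_markers(sample))
```

A robust detector would additionally parse each page header and verify its CRC before trusting a match.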

== Projects using Ogg ==

=== Codecs ===

* [[CMML]]
* [[FLAC]] ([http://xiph.org/flac/ogg_mapping.html Ogg mapping])
* [[OggKate|Kate]]
* [http://opus-codec.org/ Opus] ([[OggOpus|Ogg mapping]])
* [[OggPCM|PCM]]
* [[Ogg Skeleton|Skeleton]]
* [[Speex]] ([[OggSpeex|Ogg mapping]])
* [[Theora]] ([[OggTheora|Ogg mapping]])
* [[Vorbis]] ([[OggVorbis|Ogg mapping]])
* [[OggWrit|Writ]]

=== Servers ===

* [[Icecast]]
* [http://www.metavid.org/ Metavid]

== Developer info ==

* [[GranulePosAndSeeking]] - a discussion of the interpretation of granulepos, and the algorithm for seeking on Ogg files
* [[FishFaq]] - also discusses Granule Position

=== Ogg page format ===

Within each byte, the LSb (least significant bit) comes first. Fields
that are more than one byte long are encoded LSB (least significant
byte) first.

  0                   1                   2                   3
  0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1| Byte
 +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
 | capture_pattern: Magic number for page start "OggS"           | 0-3
 +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
 | version       | header_type   | granule_position              | 4-7
 +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
 |                                                               | 8-11
 +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
 |                               | bitstream_serial_number       | 12-15
 +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
 |                               | page_sequence_number          | 16-19
 +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
 |                               | CRC_checksum                  | 20-23
 +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
 |                               | page_segments | segment_table | 24-27
 +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
 | ...                                                           | 28-
 +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
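To make the layout above concrete, here is a minimal Python sketch that unpacks the fixed 27-byte part of a page header and the segment table that follows it. It is not taken from libogg; the function name <code>parse_page_header</code> and the dictionary keys are assumptions made for this example, and the CRC is read but not verified.

```python
import struct

# Fixed part of the page header, little-endian, 27 bytes in total:
# capture pattern (4), version (1), header_type (1), granule_position (8),
# bitstream_serial_number (4), page_sequence_number (4), CRC_checksum (4),
# page_segments (1).
_HEADER = struct.Struct("<4sBBqIIIB")


def parse_page_header(data: bytes) -> dict:
    """Parse one Ogg page header from the start of ``data``."""
    if len(data) < _HEADER.size:
        raise ValueError("need at least 27 bytes for a page header")
    (capture, version, header_type, granule,
     serial, sequence, crc, nsegs) = _HEADER.unpack_from(data)
    if capture != b"OggS":
        raise ValueError("capture pattern not found")
    segment_table = data[_HEADER.size:_HEADER.size + nsegs]
    if len(segment_table) != nsegs:
        raise ValueError("truncated segment table")
    return {
        "version": version,
        "header_type": header_type,   # flags: 0x01 continued, 0x02 BOS, 0x04 EOS
        "granule_position": granule,  # -1 means no packet ends on this page
        "serial": serial,
        "sequence": sequence,
        "crc": crc,                   # not checked in this sketch
        "header_size": _HEADER.size + nsegs,
        "body_size": sum(segment_table),  # each entry is one lacing value
    }
```

The next page, if any, starts <code>header_size + body_size</code> bytes after the start of the current one, which is how a reader can walk a file page by page.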

== Implementations ==

The Ogg encapsulation format can be handled with the following libraries:

* libogg: [http://svn.xiph.org/trunk/ogg/ libogg svn] (C, cross-platform) Low-level Ogg parsing and writing.
* liboggz: [http://git.xiph.org/?p=liboggz.git liboggz git] (C, cross-platform) liboggz wraps libogg and provides features such as seeking.
* the Ogg Directshow filters: see [http://www.illiminable.com/ogg/ illiminable] (C++, Win32)
* [http://www.kfish.org/software/hogg HOgg] (pure Haskell)
* [http://www.jcraft.com/jorbis/ JOrbis] (pure Java) contains com.jcraft.jogg
* [http://www.sacredchao.net/quodlibet/wiki/Development/Mutagen Mutagen] (pure Python)

== See also ==

* [[Flash]]
* [[Oggless]]
* [[MIME Types and File Extensions]]
* [[RFC_3533_Errata]] - errors and flaws in the specification
* [[Nut_Container]]

== External links ==

* [http://www.xiph.org/ogg/doc/ Ogg documentation]
* [http://www.ietf.org/rfc/rfc3533.txt Ogg RFC]
* [http://en.wikipedia.org/wiki/Ogg Ogg at Wikipedia]
* [http://wiki.multimedia.cx/index.php?title=Ogg Ogg at Multimedia Wiki]

[[Category:Ogg]]

XiphInfra:List of services

2017-06-22T12:07:45Z

MrZeus: there is working HTTPS on these servers for now

{|class="wikitable sortable"
! Service
! URL
! VM
! Host
! Maintainer(s)
|-
| [[AreWeCompressedYet]]
| style="text-align:right;"| https://arewecompressedyet.com
| awcy
| style="text-align:right;"| https://catfish.xiph.org
| TD-Linux
|-
| Git Repos
| style="text-align:right;"| https://git.xiph.org
|
| style="text-align:right;"| https://mf4.xiph.org
| rillian
|-
| Home Pages
| style="text-align:right;"| https://people.xiph.org
|
| style="text-align:right;"| https://mf4.xiph.org
|
|-
| [[Icecast]] Streams
| style="text-align:right;"| http://dir.xiph.org
|
| style="text-align:right;"| https://mf4.xiph.org
| tbr
|-
| [[Icecast]] Streams (Beta)
| style="text-align:right;"| http://dir-test.xiph.org
|
| style="text-align:right;"| https://catfish.xiph.org
| ePirat, tbr
|-
| Jenkins
| style="text-align:right;"| https://mf4.xiph.org/jenkins/
| jenkins
| style="text-align:right;"| https://mf4.xiph.org
| TD-Linux
|-
| Mail
| style="text-align:right;"| xiph.org
| mailfish
| style="text-align:right;"| https://catfish.xiph.org
| ePirat, tbr
|-
| MailMan
| style="text-align:right;"| http://lists.xiph.org
| mailfish
| style="text-align:right;"| https://catfish.xiph.org
| ePirat, tbr
|-
| Media
| style="text-align:right;"| https://media.xiph.org
|
| style="text-align:right;"| https://media.xiph.org
| TD-Linux
|-
| Opus Boodler Streams
| style="text-align:right;"| https://opus-codec.org
|
| style="text-align:right;"| https://mf4.xiph.org
| gmaxwell
|-
| Rietveld
| style="text-align:right;"| https://review.xiph.org
| jenkins
| style="text-align:right;"| https://mf4.xiph.org
| TD-Linux
|-
| Subversion Repos
| style="text-align:right;"| https://svn.xiph.org
|
| style="text-align:right;"| https://mf4.xiph.org
| rillian
|-
| Trac Bug Tracker
| style="text-align:right;"| https://trac.xiph.org
|
| style="text-align:right;"| https://mf4.xiph.org
| tbr
|-
| [[XiphWiki:Features|Wiki]]
| style="text-align:right;"| https://wiki.xiph.org
| [[XiphInfra:Wiki VM|wiki]]
| style="text-align:right;"| https://mf4.xiph.org
| ePirat
|-
| Xiph Mirror Repos
| style="text-align:right;"| https://github.com/xiph
|
|
| rillian
|-
| XiphBot-ng
| style="text-align:right;"| XiphWiki on freenode.net
|
| style="text-align:right;"| https://mf4.xiph.org
| TD-Linux
|}

<noinclude>See the [[XiphInfra:Overview|Overview]] page for more information.</noinclude>

Speex FAQ

2017-06-08T16:00:22Z

MrZeus: /* Why is encoding so slow compared to decoding? */

== General ==

=== Why do we need [[Speex]]? [[Vorbis]] is open source and patent-free. ===

[[Vorbis]] is a great project but its goals are not the same as Speex. Vorbis is mostly aimed at compressing music and audio in general, while Speex targets speech only. For that reason, Speex can achieve much better results than Vorbis on speech (typically 2-4 times higher compression at equal quality).

=== Isn't there an open source implementation of the [https://en.wikipedia.org/wiki/Full_Rate GSM-FR] codec? Why is Speex necessary? ===

First of all, it's not clear whether [https://en.wikipedia.org/wiki/Full_Rate GSM-FR] is covered by a Philips patent (see http://kbs.cs.tu-berlin.de/~jutta/toast.html). Also, GSM-FR offers mediocre quality at a relatively high bit-rate, while Speex can offer equivalent quality at almost half the bit-rate. Last but not least, Speex offers a wide range of bit-rates and sampling rates, while GSM-FR is limited to 8 kHz speech at 13 kbps.

=== Ogg, Vorbis, Speex, what's the difference? ===

[[Ogg]] is a '''container''' format for holding multimedia data (audio, video, subtitles, etc.).

[[Vorbis]] is an '''audio codec''' that uses Ogg to store its bit-streams as files, hence the name '''Ogg Vorbis'''.

[[Speex]] is a '''speech codec''', that also uses the Ogg format to store its bit-streams as files,
so technically they would be '''Ogg Speex''' files. However, most people refer to them just as Speex files.

One difference Speex has with Vorbis, is that Speex is less tied to Ogg.
Actually, if you want to use Speex for Voice over IP (VoIP), you don't need to use an Ogg container at all.

=== What's the extension for Speex files? ===

Speex files tend to have the '''.spx''' extension. Note that the Speex tools (speexenc, speexdec) do not rely on the file extension at all, so any extension will work.

=== Can I use Speex for compressing music? ===

You can, but you'll be better off compressing with Vorbis when it comes to music.
Just like Vorbis is not really adapted to speech, Speex is really not adapted for music.

=== I converted some MP3s to Speex and the quality is bad. What's wrong? ===

This is called '''[https://en.wikipedia.org/wiki/Transcoding transcoding]''' (converting from one lossy format to another) and it will always result in much poorer quality than the original MP3.

Avoid transcoding speech, unless you have a ''really'' good reason to do so (e.g. you need a smaller file size).
You should also avoid '''tandeming''' (self-transcoding). I.e. if you decode a Speex file and re-encode it again at the same bit-rate, you ''will'' lose quality.

=== How does Speex compare to other proprietary codecs? ===

It's hard to give precise figures since no formal listening tests have been performed yet. All I can say is that in terms of quality, Speex competes on the same ground as other proprietary codecs (not necessarily the best, but not the worst either).

Speex also has many features that are not present in most other codecs. These include variable bit-rate (VBR), integration of narrowband and wideband, as well as stereo support. Of course, another area where Speex is really hard to beat is the quality/price ratio. Unlike many very expensive codecs, Speex is free and anyone may distribute or modify it at will.

=== Can Speex pass [https://en.wikipedia.org/wiki/Dual-tone_multi-frequency_signaling DTMF] signals correctly? ===

I guess it all depends on the bit-rate used. Though no formal testing has yet been performed, I'd say it works correctly at 8 kbps and above. Make sure you don't use the lowest complexity (see SPEEX_SET_COMPLEXITY or --comp option), as it causes significant noise.

=== Can Speex pass [https://en.wikipedia.org/wiki/V.90_(recommendation) V.90]/[https://en.wikipedia.org/wiki/V.92 V.92] modem signals correctly? ===

If it could do that I'd be very rich by now :-) No it cannot, as that would break fundamental laws of information theory.

=== Does Speex have anything to do with the University of Sherbrooke? ===

No.

I wrote Speex while pursuing my Ph.D. at the University of Sherbrooke (2002-2005) in mobile robotics. Although I did my master's with the Sherbrooke speech coding group (in speech enhancement, not coding), I was no longer associated with them when developing Speex. Furthermore, Speex does not make use of any code or proprietary technology developed in the Sherbrooke speech coding group.

=== When will the next version of Speex be released? ===

Speex has been superseded by the [[Opus]] codec, so probably no more versions of Speex will be released by Xiph. However, since the code is open-source, you are welcome to make your own additions to Speex as you please.

=== How can I help if I don't know about speech processing? ===

There's always "semi-technical" work to do. The documentation can be improved, and so can this FAQ.

== License ==

=== Under what license is Speex released? ===

As of version 1.0 beta 1, Speex is released under the revised (3-clause) BSD license. This license is one of the most permissive open source licenses.

=== Am I allowed to use Speex in commercial software? ===

Yes. As long as you comply with the license. This basically means you have to keep the copyright notice and you can't use our name to promote your product without authorization.

=== Do I have to release source code if I use Speex in a proprietary application? ===

No. The BSD license does '''not''' require you to release any source code. It is however '''appreciated''' (but not required) if you contribute back useful changes you make to Speex. This is generally also in your interest because it means you get maintenance of that code for free (i.e. no need to merge again in newer versions).

== Using libspeex ==

=== Does Speex run on Windows? ===

Compilation on Windows has been supported since version 0.8.0. There are also several front-ends available from the website.

=== Why is encoding so slow compared to decoding? ===

For most kinds of compression, encoding is inherently slower than decoding.

In the case of Speex, encoding consists of finding, for each vector of 5 to 10 samples, the entry that matches the best within a codebook consisting of 16 to 256 entries.
When decoding, all that needs to be done is look up the right entry in the codebook using the encoded index. Since a lookup is much faster than a search, the decoder works much faster than the encoder.
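The asymmetry can be illustrated with a toy vector quantizer in Python. This is only a sketch: the codebook values are invented, the names <code>encode_vector</code> and <code>decode_index</code> are hypothetical, and plain squared error is used as the matching criterion, which is a simplification and not necessarily what libspeex does internally.

```python
def encode_vector(vector, codebook):
    """Search the whole codebook for the entry closest to ``vector``.

    Cost grows with the codebook size times the vector length.
    """
    best_index, best_error = 0, float("inf")
    for index, entry in enumerate(codebook):
        error = sum((v - e) ** 2 for v, e in zip(vector, entry))
        if error < best_error:
            best_index, best_error = index, error
    return best_index


def decode_index(index, codebook):
    """Decoding is a single table lookup."""
    return codebook[index]


# Invented 4-entry codebook of 5-sample vectors (real codebooks are larger).
CODEBOOK = [
    [0, 0, 0, 0, 0],
    [1, 1, 1, 1, 1],
    [1, -1, 1, -1, 1],
    [-1, -1, -1, -1, -1],
]

if __name__ == "__main__":
    index = encode_vector([0.9, 1.1, 1.0, 1.0, 0.8], CODEBOOK)
    print(index, decode_index(index, CODEBOOK))
```

The encoder loop touches every entry for every input vector, while the decoder does one indexing operation per vector.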

=== Why is Speex so slow on my iPaq (or insert any platform without an FPU)? ===

You probably didn't build Speex with the fixed-point option (--enable-fixed-point). Even if you did, not all modes have been ported to fixed-point arithmetic, so a few remaining floating-point operations (e.g. in the wideband mode) may still slow the code down.

=== I'm getting unusual background noise (hiss) when using libspeex in my application. How do I fix that? ===

One of the causes could be the scaling of the input speech. Speex expects signals to have a +-32767 (signed short) dynamic range. If the dynamic range of your signals is too small (e.g. +-1.0), you will suffer significant quantization noise. A good target is a dynamic range of around +-8000, which is large enough to avoid that noise but small enough to make sure there's no clipping when converting back to signed short.
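As a rough illustration of the scaling advice, the Python sketch below converts floating-point samples with a nominal +-1.0 range into 16-bit integer samples. The function name <code>float_to_pcm16</code> and the default gain of 8000 are assumptions made for this example; nothing here calls libspeex.

```python
def float_to_pcm16(samples, gain=8000.0):
    """Scale samples in a nominal +-1.0 range to signed 16-bit integers.

    A fixed gain is applied so that the level stays the same from one
    frame to the next; values are clamped as a safety net.
    """
    pcm = []
    for sample in samples:
        value = int(round(sample * gain))
        pcm.append(max(-32768, min(32767, value)))
    return pcm


if __name__ == "__main__":
    print(float_to_pcm16([0.0, 0.5, -1.0]))
```

Feeding the unscaled +-1.0 values straight into a 16-bit encoder would leave only a handful of distinct integer levels, which is where the hiss comes from.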

=== I get very distorted speech when using libspeex in my application. What's wrong? ===

There are many possible causes for that. One of them is errors in the way the bits are manipulated. Another possible cause is the use of the same encoder or decoder state for more than one audio stream (channel), which produces strange effects with the filter memories. If the input speech has an amplitude close to +-32767, the amplitude after decoding may be slightly higher than that, causing clipping when saving as 16-bit PCM.
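The shared-state problem can be demonstrated without libspeex. In the Python sketch below, a one-pole low-pass filter stands in for a codec state; the class <code>OnePoleLowpass</code> is purely hypothetical and far simpler than a real encoder or decoder, but it shows how memory carried over from one channel leaks into the other.

```python
class OnePoleLowpass:
    """Toy stateful filter standing in for a codec state (not libspeex)."""

    def __init__(self, alpha=0.5):
        self.alpha = alpha
        self.memory = 0.0  # carried from one sample to the next

    def process(self, frame):
        out = []
        for x in frame:
            self.memory = self.alpha * x + (1.0 - self.alpha) * self.memory
            out.append(self.memory)
        return out


left_frame = [1.0, 1.0]
right_frame = [-1.0, -1.0]

# Correct: one state object per channel.
left_state, right_state = OnePoleLowpass(), OnePoleLowpass()
left_ok = left_state.process(left_frame)
right_ok = right_state.process(right_frame)

# Incorrect: both channels pushed through the same state object.
shared = OnePoleLowpass()
left_shared = shared.process(left_frame)
right_shared = shared.process(right_frame)  # polluted by the left channel
```

With separate states the right channel comes out as <code>[-0.5, -0.75]</code>; with the shared state it starts from the memory left behind by the left channel and comes out as <code>[-0.125, -0.5625]</code>.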

=== Can Speex run on fixed-point processors or DSPs? ===

Yes. You can compile Speex for fixed-point CPUs by passing the --enable-fixed-point option to the configure script or defining FIXED_POINT.

=== What architectures are supported? ===

Speex is designed to run on pretty much any CPU that can do 16x16 multiplications. That includes:
* x86
* PowerPC
* ARM
* Blackfin
* TI C5x and C6x
* dsPIC (unsupported, unofficial port)
* Cell (in progress)

...and many others

=== Can I use Speex for 22.05 kHz, 44.1 kHz or 48 kHz signals? ===

Yes, but you must resample the signal to a supported sample rate before encoding.

Speex is optimised for '''8 kHz''' and '''16 kHz''', and it can also encode 32 kHz files (your mileage may vary). Anything else is ''unsupported'' and tends to be ''heavily sub-optimal''. You might as well use Vorbis instead.

Note that Speex includes a resampler module as of version '''1.2beta2'''. Refer to the section called Resampler in the Speex documentation.

=== Should I use version 1.0.x or 1.2? ===
While currently 1.2 is marked as beta, the author believes 1.2 is vastly superior and more stable than the 1.0.x series regarding compression and API performance. As such, it is recommended to use version 1.2, even though it is marked as beta.

== Technical ==

=== What's the difference between CELP and ACELP? ===

CELP stands for "Code Excited Linear Prediction", while ACELP stands for "Algebraic Code Excited Linear Prediction".

That means ACELP is a CELP technique that uses an algebraic codebook represented as a sum of unit pulses, thus making the codebook search much more efficient. This technique was invented at the University of Sherbrooke and is now one of the most widely used forms of CELP.

Unfortunately, since ACELP is patented, it could not be used in Speex.
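For readers unfamiliar with the term, the "sum of unit pulses" idea can be sketched as follows. This is a generic illustration and not code from Speex or from any ACELP codec: the function name <code>algebraic_excitation</code> is made up, and real codecs restrict where each pulse may be placed, which is not modelled here.

```python
def algebraic_excitation(length, pulses):
    """Build an excitation vector from (position, sign) unit pulses.

    Each pulse contributes +1 or -1 at its position; every other sample
    stays at zero, so the vector is fully described by a few small integers.
    """
    vector = [0] * length
    for position, sign in pulses:
        if sign not in (1, -1):
            raise ValueError("sign must be +1 or -1")
        vector[position] += sign
    return vector


if __name__ == "__main__":
    print(algebraic_excitation(8, [(1, 1), (5, -1)]))
```

Because the codebook entries are implied by pulse positions and signs, no table of vectors has to be stored, and the search can exploit the fact that most samples are zero.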

=== Where can I get information about how Speex works? ===

There is not (yet) a complete description of the algorithm (except for the source code itself).

However, several aspects are documented either in the [http://speex.org/docs/ manual] or in the paper [http://people.xiph.org/~jm/papers/aes120_speex_vorbis.pdf Improved Noise Weighting in CELP Coding of Speech — Applying the Vorbis Psychoacoustic Model To Speex].

[[Category:Speex]]

Speex FAQ

2017-06-08T15:08:18Z

MrZeus: /* Where can I get information about how Speex works? */

== General ==

=== Why do we need [[Speex]]? [[Vorbis]] is open source and patent-free. ===

[[Vorbis]] is a great project but its goals are not the same as Speex. Vorbis is mostly aimed at compressing music and audio in general, while Speex targets speech only. For that reason, Speex can achieve much better results than Vorbis on speech (typically 2-4 times higher compression at equal quality).

=== Isn't there an open source implementation of the [https://en.wikipedia.org/wiki/Full_Rate GSM-FR] codec? Why is Speex necessary? ===

First of all, it's not clear whether [https://en.wikipedia.org/wiki/Full_Rate GSM-FR] is covered by a Philips patent (see http://kbs.cs.tu-berlin.de/~jutta/toast.html). Also, GSM-FR offers mediocre quality at a relatively high bit-rate, while Speex can offer equivalent quality at almost half the bit-rate. Last but not least, Speex offers a wide range of bit-rates and sampling rates, while GSM-FR is limited to 8 kHz speech at 13 kbps.

=== Ogg, Vorbis, Speex, what's the difference? ===

[[Ogg]] is a '''container''' format for holding multimedia data (audio, video, subtitles, etc.).

[[Vorbis]] is an '''audio codec''' that uses Ogg to store its bit-streams as files, hence the name '''Ogg Vorbis'''.

[[Speex]] is a '''speech codec''', that also uses the Ogg format to store its bit-streams as files,
so technically they would be '''Ogg Speex''' files. However, most people refer to them just as Speex files.

One difference Speex has with Vorbis, is that Speex is less tied to Ogg.
Actually, if you want to use Speex for Voice over IP (VoIP), you don't need to use an Ogg container at all.

=== What's the extension for Speex files? ===

Speex files tend to have the '''.spx''' extension. Note that the Speex tools (speexenc, speexdec) do not rely on the file extension at all, so any extension will work.

=== Can I use Speex for compressing music? ===

You can, but you'll be better off compressing with Vorbis when it comes to music.
Just like Vorbis is not really adapted to speech, Speex is really not adapted for music.

=== I converted some MP3s to Speex and the quality is bad. What's wrong? ===

This is called '''[https://en.wikipedia.org/wiki/Transcoding transcoding]''' (converting from one lossy format to another) and it will always result in much poorer quality than the original MP3.

Avoid transcoding speech, unless you have a ''really'' good reason to do so (e.g. you need a smaller file size).
You should also avoid '''tandeming''' (self-transcoding). I.e. if you decode a Speex file and re-encode it again at the same bit-rate, you ''will'' lose quality.

=== How does Speex compare to other proprietary codecs? ===

It's hard to give precise figures since no formal listening tests have been performed yet. All I can say is that in terms of quality, Speex competes on the same ground as other proprietary codecs (not necessarily the best, but not the worst either).

Speex also has many features that are not present in most other codecs. These include variable bit-rate (VBR), integration of narrowband and wideband, as well as stereo support. Of course, another area where Speex is really hard to beat is the quality/price ratio. Unlike many very expensive codecs, Speex is free and anyone may distribute or modify it at will.

=== Can Speex pass [https://en.wikipedia.org/wiki/Dual-tone_multi-frequency_signaling DTMF] signals correctly? ===

I guess it all depends on the bit-rate used. Though no formal testing has yet been performed, I'd say it works correctly at 8 kbps and above. Make sure you don't use the lowest complexity (see SPEEX_SET_COMPLEXITY or --comp option), as it causes significant noise.

=== Can Speex pass [https://en.wikipedia.org/wiki/V.90_(recommendation) V.90]/[https://en.wikipedia.org/wiki/V.92 V.92] modem signals correctly? ===

If it could do that I'd be very rich by now :-) No it cannot, as that would break fundamental laws of information theory.

=== Does Speex have anything to do with the University of Sherbrooke? ===

No.

I wrote Speex while pursuing my Ph.D. at the University of Sherbrooke (2002-2005) in mobile robotics. Although I did my master's with the Sherbrooke speech coding group (in speech enhancement, not coding), I was no longer associated with them when developing Speex. Furthermore, Speex does not make use of any code or proprietary technology developed in the Sherbrooke speech coding group.

=== When will the next version of Speex be released? ===

Speex has been superseded by the [[Opus]] codec, so probably no more versions of Speex will be released by Xiph. However, since the code is open-source, you are welcome to make your own additions to Speex as you please.

=== How can I help if I don't know about speech processing? ===

There's always "semi-technical" work to do. The documentation can be improved, and so can this FAQ.

== License ==

=== Under what license is Speex released? ===

As of version 1.0 beta 1, Speex is released under the revised (3-clause) BSD license. This license is one of the most permissive open source licenses.

=== Am I allowed to use Speex in commercial software? ===

Yes. As long as you comply with the license. This basically means you have to keep the copyright notice and you can't use our name to promote your product without authorization.

=== Do I have to release source code if I use Speex in a proprietary application? ===

No. The BSD license does '''not''' require you to release any source code. It is however '''appreciated''' (but not required) if you contribute back useful changes you make to Speex. This is generally also in your interest because it means you get maintenance of that code for free (i.e. no need to merge again in newer versions).

== Using libspeex ==

=== Does Speex run on Windows? ===

Compilation on Windows has been supported since version 0.8.0. There are also several front-ends available from the website.

=== Why is encoding so slow compared to decoding? ===

For most kinds of compression, encoding is inherently slower than decoding.

In the case of Speex, encoding consists of finding, for each vector of 5 to 10 samples, the entry that matches the best within a codebook consisting of 16 to 256 entries.
When decoding, all that needs to be done is look up the right entry in the codebook using the encoded index. Since a lookup is much faster than a search, the decoder works much faster than the encoder.

This behaviour ties in with the assumption that in most use cases, people will be encoding a file once, but then decode it multiple times.

=== Why is Speex so slow on my iPaq (or insert any platform without an FPU)? ===

You probably didn't build Speex with the fixed-point option (--enable-fixed-point). Even if you did, not all modes have been ported to fixed-point arithmetic, so a few remaining floating-point operations (e.g. in the wideband mode) may still slow the code down.

=== I'm getting unusual background noise (hiss) when using libspeex in my application. How do I fix that? ===

One of the causes could be the scaling of the input speech. Speex expects signals to have a +-32767 (signed short) dynamic range. If the dynamic range of your signals is too small (e.g. +-1.0), you will suffer significant quantization noise. A good target is a dynamic range of around +-8000, which is large enough to avoid that noise but small enough to make sure there's no clipping when converting back to signed short.

=== I get very distorted speech when using libspeex in my application. What's wrong? ===

There are many possible causes for that. One of them is errors in the way the bits are manipulated. Another possible cause is the use of the same encoder or decoder state for more than one audio stream (channel), which produces strange effects with the filter memories. If the input speech has an amplitude close to +-32767, the amplitude after decoding may be slightly higher than that, causing clipping when saving as 16-bit PCM.

=== Can Speex run on fixed-point processors or DSPs? ===

Yes. You can compile Speex for fixed-point CPUs by passing the --enable-fixed-point option to the configure script or defining FIXED_POINT.

=== What architectures are supported? ===

Speex is designed to run on pretty much any CPU that can do 16x16 multiplications. That includes:
* x86
* PowerPC
* ARM
* Blackfin
* TI C5x and C6x
* dsPIC (unsupported, unofficial port)
* Cell (in progress)

...and many others

=== Can I use Speex for 22.05 kHz, 44.1 kHz or 48 kHz signals? ===

Yes, but you must resample the signal to a supported sample rate before encoding.

Speex is optimised for '''8 kHz''' and '''16 kHz''', and it can also encode 32 kHz files (your mileage may vary). Anything else is ''unsupported'' and tends to be ''heavily sub-optimal''. You might as well use Vorbis instead.

Note that Speex includes a resampler module as of version '''1.2beta2'''. Refer to the section called Resampler in the Speex documentation.

=== Should I use version 1.0.x or 1.2? ===
While currently 1.2 is marked as beta, the author believes 1.2 is vastly superior and more stable than the 1.0.x series regarding compression and API performance. As such, it is recommended to use version 1.2, even though it is marked as beta.

== Technical ==

=== What's the difference between CELP and ACELP? ===

CELP stands for "Code Excited Linear Prediction", while ACELP stands for "Algebraic Code Excited Linear Prediction".

That means ACELP is a CELP technique that uses an algebraic codebook represented as a sum of unit pulses, thus making the codebook search much more efficient. This technique was invented at the University of Sherbrooke and is now one of the most widely used forms of CELP.

Unfortunately, since ACELP is patented, it could not be used in Speex.

=== Where can I get information about how Speex works? ===

There is not (yet) a complete description of the algorithm (except for the source code itself).

However, several aspects are documented either in the [http://speex.org/docs/ manual] or in the paper [http://people.xiph.org/~jm/papers/aes120_speex_vorbis.pdf Improved Noise Weighting in CELP Coding of Speech — Applying the Vorbis Psychoacoustic Model To Speex].

[[Category:Speex]]

Speex FAQ

2017-06-08T15:07:24Z

MrZeus: /* CELP, ACELP, what's the difference? */

== General ==

=== Why do we need [[Speex]]? [[Vorbis]] is open source and patent-free. ===

[[Vorbis]] is a great project but its goals are not the same as Speex. Vorbis is mostly aimed at compressing music and audio in general, while Speex targets speech only. For that reason, Speex can achieve much better results than Vorbis on speech (typically 2-4 times higher compression at equal quality).

=== Isn't there an open source implementation of the [https://en.wikipedia.org/wiki/Full_Rate GSM-FR] codec? Why is Speex necessary? ===

First of all, it's not clear whether [https://en.wikipedia.org/wiki/Full_Rate GSM-FR] is covered by a Philips patent (see http://kbs.cs.tu-berlin.de/~jutta/toast.html). Also, GSM-FR offers mediocre quality at a relatively high bit-rate, while Speex can offer equivalent quality at almost half the bit-rate. Last but not least, Speex offers a wide range of bit-rates and sampling rates, while GSM-FR is limited to 8 kHz speech at 13 kbps.

=== Ogg, Vorbis, Speex, what's the difference? ===

[[Ogg]] is a '''container''' format for holding multimedia data (audio, video, subtitles, etc.).

[[Vorbis]] is an '''audio codec''' that uses Ogg to store its bit-streams as files, hence the name '''Ogg Vorbis'''.

[[Speex]] is a '''speech codec''', that also uses the Ogg format to store its bit-streams as files,
so technically they would be '''Ogg Speex''' files. However, most people refer to them just as Speex files.

One difference Speex has with Vorbis, is that Speex is less tied to Ogg.
Actually, if you want to use Speex for Voice over IP (VoIP), you don't need to use an Ogg container at all.

=== What's the extension for Speex files? ===

Speex files tend to have the '''.spx''' extension. Note that the Speex tools (speexenc, speexdec) do not rely on the file extension at all, so any extension will work.

=== Can I use Speex for compressing music? ===

You can, but you'll be better off compressing with Vorbis when it comes to music.
Just like Vorbis is not really adapted to speech, Speex is really not adapted for music.

=== I converted some MP3s to Speex and the quality is bad. What's wrong? ===

This is called '''[https://en.wikipedia.org/wiki/Transcoding transcoding]''' (converting from one lossy format to another) and it will always result in much poorer quality than the original MP3.

Avoid transcoding speech, unless you have a ''really'' good reason to do so (e.g. you need a smaller file size).
You should also avoid '''tandeming''' (self-transcoding). I.e. if you decode a Speex file and re-encode it again at the same bit-rate, you ''will'' lose quality.

=== How does Speex compare to other proprietary codecs? ===

It's hard to give precise figures since no formal listening tests have been performed yet. All I can say is that in terms of quality, Speex competes on the same ground as other proprietary codecs (not necessarily the best, but not the worst either).

Speex also has many features that are not present in most other codecs. These include variable bit-rate (VBR), integration of narrowband and wideband, as well as stereo support. Of course, another area where Speex is really hard to beat is the quality/price ratio. Unlike many very expensive codecs, Speex is free and anyone may distribute or modify it at will.

=== Can Speex pass [https://en.wikipedia.org/wiki/Dual-tone_multi-frequency_signaling DTMF] signals correctly? ===

I guess it all depends on the bit-rate used. Though no formal testing has yet been performed, I'd say it works correctly at 8 kbps and above. Make sure you don't use the lowest complexity (see SPEEX_SET_COMPLEXITY or --comp option), as it causes significant noise.

=== Can Speex pass [https://en.wikipedia.org/wiki/V.90_(recommendation) V.90]/[https://en.wikipedia.org/wiki/V.92 V.92] modem signals correctly? ===

If it could do that I'd be very rich by now :-) No it cannot, as that would break fundamental laws of information theory.

=== Does Speex have anything to do with the University of Sherbrooke? ===

No.

I wrote Speex while pursuing my Ph.D. at the University of Sherbrooke (2002-2005) in mobile robotics. Although I did my master's degree with the Sherbrooke speech coding group (in speech enhancement, not coding), I was no longer associated with them when developing Speex. Furthermore, Speex does not make use of any code or proprietary technology developed in the Sherbrooke speech coding group.

=== When will the next version of Speex be released? ===

Speex has been superseded by the [[Opus]] codec, so probably no more versions of Speex will be released by Xiph. However, since the code is open-source, you are welcome to make your own additions to Speex as you please.

=== How can I help if I don't know about speech processing? ===

There's always "semi-technical" work to do. The documentation can be improved, and so can this FAQ.

== License ==

=== Under what license is Speex released? ===

As of version 1.0 beta 1, Speex is released under the revised (3-clause) BSD license. This license is one of the most permissive open source licenses.

=== Am I allowed to use Speex in commercial software? ===

Yes, as long as you comply with the license. This basically means you have to keep the copyright notice and you can't use our name to promote your product without authorization.

=== Do I have to release source code if I use Speex in a proprietary application? ===

No. The BSD license does '''not''' require you to release any source code. It is however '''appreciated''' (but not required) if you contribute back useful changes you make to Speex. This is generally also in your interest because it means you get maintenance of that code for free (i.e. no need to merge again in newer versions).

== Using libspeex ==

=== Does Speex run on Windows? ===

Compilation on Windows has been supported since version 0.8.0. There are also several front-ends available from the website.

=== Why is encoding so slow compared to decoding? ===

For most kinds of compression, encoding is inherently slower than decoding.

In the case of Speex, encoding consists of finding, for each vector of 5 to 10 samples, the entry that matches the best within a codebook consisting of 16 to 256 entries.
When decoding, all that needs to be done is look up the right entry in the codebook using the encoded index. Since a lookup is much faster than a search, the decoder works much faster than the encoder.

This behaviour ties in with the assumption that in most use cases, people will be encoding a file once, but then decode it multiple times.
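
The search-versus-lookup asymmetry described above can be sketched in a few lines of Python. This is only an illustration under simplifying assumptions: the function names and the toy codebook are hypothetical, and the error measure is a plain squared error, whereas the real Speex encoder is considerably more elaborate.

```python
def encode_vector(vector, codebook):
    """Encoder side: compare the vector against every entry (a search)."""
    best_index, best_error = 0, float("inf")
    for index, entry in enumerate(codebook):
        # Squared error between the input vector and this codebook entry.
        error = sum((v - e) ** 2 for v, e in zip(vector, entry))
        if error < best_error:
            best_index, best_error = index, error
    return best_index


def decode_index(index, codebook):
    """Decoder side: a single table lookup."""
    return codebook[index]


# Toy codebook (hypothetical): 16 entries of 5 samples each.
codebook = [[i + j for j in range(5)] for i in range(16)]

index = encode_vector([9.2, 9.9, 11.1, 12.0, 12.8], codebook)
print(index, decode_index(index, codebook))
```

The encoder touches all 16 entries for each vector, while the decoder touches exactly one, which is why the gap widens as the codebook grows.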

=== Why is Speex so slow on my iPaq (or insert any platform without an FPU)? ===

You probably didn't build Speex with the fixed-point option (--enable-fixed-point). Even if you did, not all modes have been ported to use fixed-point arithmetic, so the code may be slowed down by a few float operations left (e.g. in the wideband mode).

=== I'm getting unusual background noise (hiss) when using libspeex in my application. How do I fix that? ===

One possible cause is the scaling of the input speech. Speex expects signals to have a +-32767 (signed short) dynamic range. If the dynamic range of your signals is too small (e.g. +-1.0), you will suffer significant quantization noise. A good target is a dynamic range of around +-8000, which is large enough to avoid that noise but small enough to make sure there's no clipping when converting back to signed short.
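
As a rough illustration of the scaling advice above, the following Python sketch rescales floating-point samples in the +-1.0 range so that their peak lands near 8000 before they are handed to the encoder as 16-bit values. The function name and the peak-normalisation strategy are assumptions made for this example; they are not part of libspeex, and a fixed gain may suit a live stream better.

```python
def scale_to_int16(samples, target_peak=8000):
    """Scale samples so the largest magnitude lands at target_peak."""
    peak = max((abs(s) for s in samples), default=0.0)
    if peak == 0.0:
        # Silence (or empty input): nothing to scale.
        return [0 for _ in samples]
    gain = target_peak / peak
    return [int(round(s * gain)) for s in samples]


floats = [0.0, 0.25, -0.5, 1.0]   # typical +-1.0 float audio
print(scale_to_int16(floats))     # prints [0, 2000, -4000, 8000]
```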

=== I get very distorted speech when using libspeex in my application. What's wrong? ===

There are many possible causes for that. One of them is errors in the way the bits are manipulated. Another possible cause is the use of the same encoder or decoder state for more than one audio stream (channel), which produces strange effects with the filter memories. Finally, if the input speech has an amplitude close to +-32767, the decoded amplitude may be slightly higher than that, causing clipping when saving as 16-bit PCM.
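
To check whether clipping is the culprit, one option is to count how many decoded samples fall outside the 16-bit range before saving them. The Python sketch below assumes the decoded samples are available as floats; the helper names are hypothetical and not part of libspeex.

```python
INT16_MIN, INT16_MAX = -32768, 32767


def saturate_to_int16(samples):
    """Round each sample and clamp it to the signed 16-bit range."""
    return [max(INT16_MIN, min(INT16_MAX, int(round(s)))) for s in samples]


def count_clipped(samples):
    """Number of samples that do not fit in 16-bit PCM without clamping."""
    return sum(1 for s in samples if round(s) < INT16_MIN or round(s) > INT16_MAX)


decoded = [32000.0, 33100.0, -33000.0]  # input peaked close to +-32767
print(saturate_to_int16(decoded))       # prints [32000, 32767, -32768]
print(count_clipped(decoded))           # prints 2
```

If the count is non-zero, lowering the input level before encoding leaves headroom for the decoder's overshoot.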

=== Can Speex run on fixed-point processors or DSPs? ===

Yes. You can compile Speex for fixed-point CPUs by passing the --enable-fixed-point option to the configure script or defining FIXED_POINT.

=== What architectures are supported? ===

Speex is designed to run on pretty much any CPU that can do 16x16 multiplications. That includes:
* x86
* PowerPC
* ARM
* Blackfin
* TI C5x and C6x
* dsPIC (unsupported, unofficial port)
* Cell (in progress)

...and many others

=== Can I use Speex for 22.05 kHz, 44.1 kHz or 48 kHz signals? ===

Yes, but you must resample the signal to a supported sample rate before encoding.

Speex is optimised for '''8 kHz''' and '''16 kHz''', and it can also encode 32 kHz files (your mileage may vary). Anything else is ''unsupported'' and tends to be ''heavily sub-optimal''. You might as well use Vorbis instead.

Note that Speex includes a resampler module as of version '''1.2beta2'''. Refer to the section called Resampler in the Speex documentation.

=== Should I use version 1.0.x or 1.2? ===
While currently 1.2 is marked as beta, the author believes 1.2 is vastly superior and more stable than the 1.0.x series regarding compression and API performance. As such, it is recommended to use version 1.2, even though it is marked as beta.

== Technical ==

=== What's the difference between CELP and ACELP? ===

CELP stands for "Code Excited Linear Prediction", while ACELP stands for "Algebraic Code Excited Linear Prediction".

That means ACELP is a CELP technique that uses an algebraic codebook represented as a sum of unit pulses, thus making the codebook search much more efficient. This technique was invented at the University of Sherbrooke and is now one of the most widely used forms of CELP.

Unfortunately, since ACELP is patented, it could not be used in Speex.

=== Where can I get information about how Speex works? ===

There is not (yet) a complete description of the algorithm (except for the source code itself). However, several aspects are documented either in the [http://speex.org/docs/ manual] or in the paper [http://people.xiph.org/~jm/papers/aes120_speex_vorbis.pdf Improved Noise Weighting in CELP Coding of Speech — Applying the Vorbis Psychoacoustic Model To Speex].

[[Category:Speex]]

Speex FAQ

2017-06-08T14:59:55Z

MrZeus: /* Can I use Speex for 22.05 kHz, 44.1 kHz or 48 kHz signals? */

== General ==

=== Why do we need [[Speex]]? [[Vorbis]] is open source and patent-free. ===

[[Vorbis]] is a great project but its goals are not the same as Speex. Vorbis is mostly aimed at compressing music and audio in general, while Speex targets speech only. For that reason, Speex can achieve much better results than Vorbis on speech (typically 2-4 times higher compression at equal quality).

=== Isn't there an open source implementation of the [https://en.wikipedia.org/wiki/Full_Rate GSM-FR] codec? Why is Speex necessary? ===

First of all, it's not clear whether [https://en.wikipedia.org/wiki/Full_Rate GSM-FR] is covered by a Philips patent (see http://kbs.cs.tu-berlin.de/~jutta/toast.html). Also, GSM-FR offers mediocre quality at a relatively high bit-rate, while Speex can offer equivalent quality at almost half the bit-rate. Last but not least, Speex offers a wide range of bit-rates and sampling rates, while GSM-FR is limited to 8 kHz speech at 13 kbps.

=== Ogg, Vorbis, Speex, what's the difference? ===

[[Ogg]] is a '''container''' format for holding multimedia data (audio, video, subtitles, etc.).

[[Vorbis]] is an '''audio codec''' that uses Ogg to store its bit-streams as files, hence the name '''Ogg Vorbis'''.

[[Speex]] is a '''speech codec''', that also uses the Ogg format to store its bit-streams as files,
so technically they would be '''Ogg Speex''' files. However, most people refer to them just as Speex files.

One difference Speex has with Vorbis, is that Speex is less tied to Ogg.
Actually, if you want to use Speex for Voice over IP (VoIP), you don't need to use an Ogg container at all.

=== What's the extension for Speex files? ===

Speex files tend to have the '''.spx''' extension. Note that the Speex tools (speexenc, speexdec) do not rely on the file extension at all, so any extension will work.

=== Can I use Speex for compressing music? ===

You can, but you'll be better off compressing with Vorbis when it comes to music.
Just like Vorbis is not really adapted to speech, Speex is really not adapted for music.

=== I converted some MP3s to Speex and the quality is bad. What's wrong? ===

This is called '''[https://en.wikipedia.org/wiki/Transcoding transcoding]''' (converting from one lossy format to another) and it will always result in much poorer quality than the original MP3.

Avoid transcoding speech, unless you have a ''really'' good reason to do so (e.g. you need a smaller file size).
You should also avoid '''tandeming''' (self-transcoding). I.e. if you decode a Speex file and re-encode it again at the same bit-rate, you ''will'' lose quality.

=== How does Speex compare to other proprietary codecs? ===

It's hard to give precise figures since no formal listening tests have been performed yet. All I can say is that in terms of quality, Speex competes on the same ground as other proprietary codecs (not necessarily the best, but not the worst either).

Speex also has many features that are not present in most other codecs. These include variable bit-rate (VBR), integration of narrowband and wideband, as well as stereo support. Of course, another area where Speex is really hard to beat is the quality/price ratio. Unlike many very expensive codecs, Speex is free and anyone may distribute or modify it at will.

=== Can Speex pass [https://en.wikipedia.org/wiki/Dual-tone_multi-frequency_signaling DTMF] signals correctly? ===

I guess it all depends on the bit-rate used. Though no formal testing has yet been performed, I'd say it works correctly at 8 kbps and above. Make sure you don't use the lowest complexity (see SPEEX_SET_COMPLEXITY or --comp option), as it causes significant noise.

=== Can Speex pass [https://en.wikipedia.org/wiki/V.90_(recommendation) V.90]/[https://en.wikipedia.org/wiki/V.92 V.92] modem signals correctly? ===

If it could do that I'd be very rich by now :-) No it cannot, as that would break fundamental laws of information theory.

=== Does Speex have anything to do with the University of Sherbrooke? ===

No.

I wrote Speex while pursuing my Ph.D. at the University of Sherbrooke (2002-2005) in mobile robotics. Although I did my master's degree with the Sherbrooke speech coding group (in speech enhancement, not coding), I was no longer associated with them when developing Speex. Furthermore, Speex does not make use of any code or proprietary technology developed in the Sherbrooke speech coding group.

=== When will the next version of Speex be released? ===

Speex has been superseded by the [[Opus]] codec, so probably no more versions of Speex will be released by Xiph. However, since the code is open-source, you are welcome to make your own additions to Speex as you please.

=== How can I help if I don't know about speech processing? ===

There's always "semi-technical" work to do. The documentation can be improved, and so can this FAQ.

== License ==

=== Under what license is Speex released? ===

As of version 1.0 beta 1, Speex is released under the revised (3-clause) BSD license. This license is one of the most permissive open source licenses.

=== Am I allowed to use Speex in commercial software? ===

Yes, as long as you comply with the license. This basically means you have to keep the copyright notice and you can't use our name to promote your product without authorization.

=== Do I have to release source code if I use Speex in a proprietary application? ===

No. The BSD license does '''not''' require you to release any source code. It is however '''appreciated''' (but not required) if you contribute back useful changes you make to Speex. This is generally also in your interest because it means you get maintenance of that code for free (i.e. no need to merge again in newer versions).

== Using libspeex ==

=== Does Speex run on Windows? ===

Compilation on Windows has been supported since version 0.8.0. There are also several front-ends available from the website.

=== Why is encoding so slow compared to decoding? ===

For most kinds of compression, encoding is inherently slower than decoding.

In the case of Speex, encoding consists of finding, for each vector of 5 to 10 samples, the entry that matches the best within a codebook consisting of 16 to 256 entries.
When decoding, all that needs to be done is look up the right entry in the codebook using the encoded index. Since a lookup is much faster than a search, the decoder works much faster than the encoder.

This behaviour ties in with the assumption that in most use cases, people will be encoding a file once, but then decode it multiple times.

=== Why is Speex so slow on my iPaq (or insert any platform without an FPU)? ===

You probably didn't build Speex with the fixed-point option (--enable-fixed-point). Even if you did, not all modes have been ported to use fixed-point arithmetic, so the code may be slowed down by a few float operations left (e.g. in the wideband mode).

=== I'm getting unusual background noise (hiss) when using libspeex in my application. How do I fix that? ===

One possible cause is the scaling of the input speech. Speex expects signals to have a +-32767 (signed short) dynamic range. If the dynamic range of your signals is too small (e.g. +-1.0), you will suffer significant quantization noise. A good target is a dynamic range of around +-8000, which is large enough to avoid that noise but small enough to make sure there's no clipping when converting back to signed short.

=== I get very distorted speech when using libspeex in my application. What's wrong? ===

There are many possible causes for that. One of them is errors in the way the bits are manipulated. Another possible cause is the use of the same encoder or decoder state for more than one audio stream (channel), which produces strange effects with the filter memories. Finally, if the input speech has an amplitude close to +-32767, the decoded amplitude may be slightly higher than that, causing clipping when saving as 16-bit PCM.

=== Can Speex run on fixed-point processors or DSPs? ===

Yes. You can compile Speex for fixed-point CPUs by passing the --enable-fixed-point option to the configure script or defining FIXED_POINT.

=== What architectures are supported? ===

Speex is designed to run on pretty much any CPU that can do 16x16 multiplications. That includes:
* x86
* PowerPC
* ARM
* Blackfin
* TI C5x and C6x
* dsPIC (unsupported, unofficial port)
* Cell (in progress)

...and many others

=== Can I use Speex for 22.05 kHz, 44.1 kHz or 48 kHz signals? ===

Yes, but you must resample the signal to a supported sample rate before encoding.

Speex is optimised for '''8 kHz''' and '''16 kHz''', and it can also encode 32 kHz files (your mileage may vary). Anything else is ''unsupported'' and tends to be ''heavily sub-optimal''. You might as well use Vorbis instead.

Note that Speex includes a resampler module as of version '''1.2beta2'''. Refer to the section called Resampler in the Speex documentation.

=== Should I use version 1.0.x or 1.2? ===
While currently 1.2 is marked as beta, the author believes 1.2 is vastly superior and more stable than the 1.0.x series regarding compression and API performance. As such, it is recommended to use version 1.2, even though it is marked as beta.

== Technical ==

=== CELP, ACELP, what's the difference? ===

CELP stands for "Code Excited Linear Prediction", while ACELP stands for "Algebraic Code Excited Linear Prediction". That means ACELP is a CELP technique that uses an algebraic codebook represented as a sum of unit pulses, thus making the codebook search much more efficient. This technique was invented at the University of Sherbrooke and is now one of the most widely used forms of CELP. Unfortunately, since ACELP is patented, it could not be used in Speex.

=== Where can I get information about how Speex works? ===

There is not (yet) a complete description of the algorithm (except for the source code itself). However, several aspects are documented either in the [http://speex.org/docs/ manual] or in the paper [http://people.xiph.org/~jm/papers/aes120_speex_vorbis.pdf Improved Noise Weighting in CELP Coding of Speech — Applying the Vorbis Psychoacoustic Model To Speex].

[[Category:Speex]]

Speex FAQ

2017-06-08T14:56:05Z

MrZeus: /* Why is encoding so slow compared to decoding? */

== General ==

=== Why do we need [[Speex]]? [[Vorbis]] is open source and patent-free. ===

[[Vorbis]] is a great project but its goals are not the same as Speex. Vorbis is mostly aimed at compressing music and audio in general, while Speex targets speech only. For that reason, Speex can achieve much better results than Vorbis on speech (typically 2-4 times higher compression at equal quality).

=== Isn't there an open source implementation of the [https://en.wikipedia.org/wiki/Full_Rate GSM-FR] codec? Why is Speex necessary? ===

First of all, it's not clear whether [https://en.wikipedia.org/wiki/Full_Rate GSM-FR] is covered by a Philips patent (see http://kbs.cs.tu-berlin.de/~jutta/toast.html). Also, GSM-FR offers mediocre quality at a relatively high bit-rate, while Speex can offer equivalent quality at almost half the bit-rate. Last but not least, Speex offers a wide range of bit-rates and sampling rates, while GSM-FR is limited to 8 kHz speech at 13 kbps.

=== Ogg, Vorbis, Speex, what's the difference? ===

[[Ogg]] is a '''container''' format for holding multimedia data (audio, video, subtitles, etc.).

[[Vorbis]] is an '''audio codec''' that uses Ogg to store its bit-streams as files, hence the name '''Ogg Vorbis'''.

[[Speex]] is a '''speech codec''', that also uses the Ogg format to store its bit-streams as files,
so technically they would be '''Ogg Speex''' files. However, most people refer to them just as Speex files.

One difference Speex has with Vorbis, is that Speex is less tied to Ogg.
Actually, if you want to use Speex for Voice over IP (VoIP), you don't need to use an Ogg container at all.

=== What's the extension for Speex files? ===

Speex files tend to have the '''.spx''' extension. Note that the Speex tools (speexenc, speexdec) do not rely on the file extension at all, so any extension will work.

=== Can I use Speex for compressing music? ===

You can, but you'll be better off compressing with Vorbis when it comes to music.
Just like Vorbis is not really adapted to speech, Speex is really not adapted for music.

=== I converted some MP3s to Speex and the quality is bad. What's wrong? ===

This is called '''[https://en.wikipedia.org/wiki/Transcoding transcoding]''' (converting from one lossy format to another) and it will always result in much poorer quality than the original MP3.

Avoid transcoding speech, unless you have a ''really'' good reason to do so (e.g. you need a smaller file size).
You should also avoid '''tandeming''' (self-transcoding). I.e. if you decode a Speex file and re-encode it again at the same bit-rate, you ''will'' lose quality.

=== How does Speex compare to other proprietary codecs? ===

It's hard to give precise figures since no formal listening tests have been performed yet. All I can say is that in terms of quality, Speex competes on the same ground as other proprietary codecs (not necessarily the best, but not the worst either).

Speex also has many features that are not present in most other codecs. These include variable bit-rate (VBR), integration of narrowband and wideband, as well as stereo support. Of course, another area where Speex is really hard to beat is the quality/price ratio. Unlike many very expensive codecs, Speex is free and anyone may distribute or modify it at will.

=== Can Speex pass [https://en.wikipedia.org/wiki/Dual-tone_multi-frequency_signaling DTMF] signals correctly? ===

I guess it all depends on the bit-rate used. Though no formal testing has yet been performed, I'd say it works correctly at 8 kbps and above. Make sure you don't use the lowest complexity (see SPEEX_SET_COMPLEXITY or --comp option), as it causes significant noise.

=== Can Speex pass [https://en.wikipedia.org/wiki/V.90_(recommendation) V.90]/[https://en.wikipedia.org/wiki/V.92 V.92] modem signals correctly? ===

If it could do that I'd be very rich by now :-) No it cannot, as that would break fundamental laws of information theory.

=== Does Speex have anything to do with the University of Sherbrooke? ===

No.

I wrote Speex while pursuing my Ph.D. at the University of Sherbrooke (2002-2005) in mobile robotics. Although I did my master's degree with the Sherbrooke speech coding group (in speech enhancement, not coding), I was no longer associated with them when developing Speex. Furthermore, Speex does not make use of any code or proprietary technology developed in the Sherbrooke speech coding group.

=== When will the next version of Speex be released? ===

Speex has been superseded by the [[Opus]] codec, so probably no more versions of Speex will be released by Xiph. However, since the code is open-source, you are welcome to make your own additions to Speex as you please.

=== How can I help if I don't know about speech processing? ===

There's always "semi-technical" work to do. The documentation can be improved, and so can this FAQ.

== License ==

=== Under what license is Speex released? ===

As of version 1.0 beta 1, Speex is released under the revised (3-clause) BSD license. This license is one of the most permissive open source licenses.

=== Am I allowed to use Speex in commercial software? ===

Yes, as long as you comply with the license. This basically means you have to keep the copyright notice and you can't use our name to promote your product without authorization.

=== Do I have to release source code if I use Speex in a proprietary application? ===

No. The BSD license does '''not''' require you to release any source code. It is however '''appreciated''' (but not required) if you contribute back useful changes you make to Speex. This is generally also in your interest because it means you get maintenance of that code for free (i.e. no need to merge again in newer versions).

== Using libspeex ==

=== Does Speex run on Windows? ===

Compilation on Windows has been supported since version 0.8.0. There are also several front-ends available from the website.

=== Why is encoding so slow compared to decoding? ===

For most kinds of compression, encoding is inherently slower than decoding.

In the case of Speex, encoding consists of finding, for each vector of 5 to 10 samples, the entry that matches the best within a codebook consisting of 16 to 256 entries.
When decoding, all that needs to be done is look up the right entry in the codebook using the encoded index. Since a lookup is much faster than a search, the decoder works much faster than the encoder.

This behaviour ties in with the assumption that in most use cases, people will be encoding a file once, but then decode it multiple times.

=== Why is Speex so slow on my iPaq (or insert any platform without an FPU)? ===

You probably didn't build Speex with the fixed-point option (--enable-fixed-point). Even if you did, not all modes have been ported to use fixed-point arithmetic, so the code may be slowed down by a few float operations left (e.g. in the wideband mode).

=== I'm getting unusual background noise (hiss) when using libspeex in my application. How do I fix that? ===

One possible cause is the scaling of the input speech. Speex expects signals to have a +-32767 (signed short) dynamic range. If the dynamic range of your signals is too small (e.g. +-1.0), you will suffer significant quantization noise. A good target is a dynamic range of around +-8000, which is large enough to avoid that noise but small enough to make sure there's no clipping when converting back to signed short.

=== I get very distorted speech when using libspeex in my application. What's wrong? ===

There are many possible causes for that. One of them is errors in the way the bits are manipulated. Another possible cause is the use of the same encoder or decoder state for more than one audio stream (channel), which produces strange effects with the filter memories. Finally, if the input speech has an amplitude close to +-32767, the decoded amplitude may be slightly higher than that, causing clipping when saving as 16-bit PCM.

=== Can Speex run on fixed-point processors or DSPs? ===

Yes. You can compile Speex for fixed-point CPUs by passing the --enable-fixed-point option to the configure script or defining FIXED_POINT.

=== What architectures are supported? ===

Speex is designed to run on pretty much any CPU that can do 16x16 multiplications. That includes:
* x86
* PowerPC
* ARM
* Blackfin
* TI C5x and C6x
* dsPIC (unsupported, unofficial port)
* Cell (in progress)

...and many others

=== Can I use Speex for 22.05 kHz, 44.1 kHz or 48 kHz signals? ===

Yes, but you must resample the signal to a supported sample rate before encoding. Speex is optimised for 8 kHz and 16 kHz and it can (your mileage may vary) encode 32 kHz files as well. Anything else is unsupported and tends to be heavily sub-optimal. You might as well use Vorbis instead. Note that Speex includes a resampler module as of version 1.2beta2. Refer to the section called Resampler in the Speex documentation.

=== Should I use version 1.0.x or 1.2? ===
While currently 1.2 is marked as beta, the author believes 1.2 is vastly superior and more stable than the 1.0.x series regarding compression and API performance. As such, it is recommended to use version 1.2, even though it is marked as beta.

== Technical ==

=== CELP, ACELP, what's the difference? ===

CELP stands for "Code Excited Linear Prediction", while ACELP stands for "Algebraic Code Excited Linear Prediction". That means ACELP is a CELP technique that uses an algebraic codebook represented as a sum of unit pulses, thus making the codebook search much more efficient. This technique was invented at the University of Sherbrooke and is now one of the most widely used forms of CELP. Unfortunately, since ACELP is patented, it could not be used in Speex.

=== Where can I get information about how Speex works? ===

There is not (yet) a complete description of the algorithm (except for the source code itself). However, several aspects are documented either in the [http://speex.org/docs/ manual] or in the paper [http://people.xiph.org/~jm/papers/aes120_speex_vorbis.pdf Improved Noise Weighting in CELP Coding of Speech — Applying the Vorbis Psychoacoustic Model To Speex].

[[Category:Speex]]

Speex FAQ

2017-06-08T14:34:48Z

MrZeus: /* Can Speex pass V.9x modem signals correctly? */

== General ==

=== Why do we need [[Speex]]? [[Vorbis]] is open source and patent-free. ===

[[Vorbis]] is a great project but its goals are not the same as those of Speex. Vorbis is mostly aimed at compressing music and audio in general, while Speex targets speech only. For that reason, Speex can achieve much better results than Vorbis on speech (typically 2-4 times higher compression at equal quality).

=== Isn't there an open source implementation of the [https://en.wikipedia.org/wiki/Full_Rate GSM-FR] codec? Why is Speex necessary? ===

First of all, it's not clear whether [https://en.wikipedia.org/wiki/Full_Rate GSM-FR] is covered by a Philips patent (see http://kbs.cs.tu-berlin.de/~jutta/toast.html). Also, GSM-FR offers mediocre quality at a relatively high bit-rate, while Speex can offer equivalent quality at almost half the bit-rate. Last but not least, Speex offers a wide range of bit-rates and sampling rates, while GSM-FR is limited to 8 kHz speech at 13 kbps.

=== Ogg, Vorbis, Speex, what's the difference? ===

[[Ogg]] is a '''container''' format for holding multimedia data (audio, video, subtitles, etc.).

[[Vorbis]] is an '''audio codec''' that uses Ogg to store its bit-streams as files, hence the name '''Ogg Vorbis'''.

[[Speex]] is a '''speech codec''', that also uses the Ogg format to store its bit-streams as files,
so technically they would be '''Ogg Speex''' files. However, most people refer to them just as Speex files.

One difference between Speex and Vorbis is that Speex is less tied to Ogg.
Actually, if you want to use Speex for Voice over IP (VoIP), you don't need to use an Ogg container at all.

=== What's the extension for Speex files? ===

Speex files tend to have the '''.spx''' extension. Note that the Speex tools (speexenc, speexdec) do not rely on the file extension at all, so any extension will work.

=== Can I use Speex for compressing music? ===

You can, but you'll be better off compressing with Vorbis when it comes to music.
Just like Vorbis is not really adapted to speech, Speex is really not adapted for music.

=== I converted some MP3s to Speex and the quality is bad. What's wrong? ===

This is called '''[https://en.wikipedia.org/wiki/Transcoding transcoding]''' (converting from one lossy format to another) and it will always result in much poorer quality than the original MP3.

Avoid transcoding speech, unless you have a ''really'' good reason to do so (e.g. you need a smaller file size).
You should also avoid '''tandeming''' (self-transcoding): if you decode a Speex file and re-encode it at the same bit-rate, you ''will'' lose quality.

=== How does Speex compare to other proprietary codecs? ===

It's hard to give precise figures since no formal listening tests have been performed yet. All I can say is that in terms of quality, Speex competes on the same ground as other proprietary codecs (not necessarily the best, but not the worst either).

Speex also has many features that are not present in most other codecs. These include variable bit-rate (VBR), integration of narrowband and wideband, as well as stereo support. Of course, another area where Speex is really hard to beat is the quality/price ratio. Unlike many very expensive codecs, Speex is free and anyone may distribute or modify it at will.

=== Can Speex pass [https://en.wikipedia.org/wiki/Dual-tone_multi-frequency_signaling DTMF] signals correctly? ===

I guess it all depends on the bit-rate used. Though no formal testing has yet been performed, I'd say it works correctly at 8 kbps and above. Make sure you don't use the lowest complexity (see SPEEX_SET_COMPLEXITY or --comp option), as it causes significant noise.

=== Can Speex pass [https://en.wikipedia.org/wiki/V.90_(recommendation) V.90]/[https://en.wikipedia.org/wiki/V.92 V.92] modem signals correctly? ===

If it could do that I'd be very rich by now :-) No it cannot, as that would break fundamental laws of information theory.

=== Does Speex have anything to do with the University of Sherbrooke? ===

No.

I wrote Speex while pursuing my Ph.D. at the University of Sherbrooke (2002-2005) in mobile robotics. Although I did my master's with the Sherbrooke speech coding group (in speech enhancement, not coding), I was no longer associated with them when developing Speex. Furthermore, Speex does not make use of any code or proprietary technology developed in the Sherbrooke speech coding group.

=== When will the next version of Speex be released? ===

Speex has been superseded by the [[Opus]] codec, so probably no more versions of Speex will be released by Xiph. However, since the code is open-source, you are welcome to make your own additions to Speex as you please.

=== How can I help if I don't know about speech processing? ===

There's always "semi-technical" work to do. The documentation can be improved, and so can this FAQ.

== License ==

=== Under what license is Speex released? ===

As of version 1.0 beta 1, Speex is released under the revised (3-clause) BSD license. This license is one of the most permissive open source licenses.

=== Am I allowed to use Speex in commercial software? ===

Yes, as long as you comply with the license. This basically means you have to keep the copyright notice and you can't use our name to promote your product without authorization.

=== Do I have to release source code if I use Speex in a proprietary application? ===

No. The BSD license does '''not''' require you to release any source code. It is however '''appreciated''' (but not required) if you contribute back useful changes you make to Speex. This is generally also in your interest because it means you get maintenance of that code for free (i.e. no need to merge again in newer versions).

== Using libspeex ==

=== Does Speex run on Windows? ===

Compilation on Windows has been supported since version 0.8.0. There are also several front-ends available from the website.

=== Why is encoding so slow compared to decoding? ===

For most kinds of compression, encoding is inherently slower than decoding. In the case of Speex, encoding consists of finding, for each vector of 5 to 10 samples, the entry that matches the best within a codebook consisting of 16 to 256 entries. On the other hand, at decoding all that needs to be done is look up the right entry in the codebook using the encoded index. Since a lookup is much faster than a search, the decoder works much faster than the encoder.
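
The asymmetry described above can be illustrated with a minimal sketch. It is not the Speex implementation (which searches in a perceptually weighted domain); the codebook contents, the plain squared-error criterion, and the function names are assumptions made for illustration only.

```python
import random

def encode_vector(target, codebook):
    """Exhaustive search: return the index of the closest codebook entry."""
    best_index, best_error = 0, float("inf")
    for index, entry in enumerate(codebook):
        # Squared error between the target vector and this candidate entry.
        error = sum((t - e) ** 2 for t, e in zip(target, entry))
        if error < best_error:
            best_index, best_error = index, error
    return best_index

def decode_vector(index, codebook):
    """Decoding is a single table lookup."""
    return codebook[index]

rng = random.Random(0)
# Hypothetical codebook: 256 entries of 8 samples each
# (sizes chosen within the ranges quoted above).
codebook = [[rng.uniform(-1.0, 1.0) for _ in range(8)] for _ in range(256)]
target = [rng.uniform(-1.0, 1.0) for _ in range(8)]

index = encode_vector(target, codebook)   # 256 distance computations
decoded = decode_vector(index, codebook)  # one lookup
```

The encoder visits every entry for every vector, while the decoder touches exactly one entry, which is why the two sides differ so much in cost.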

=== Why is Speex so slow on my iPaq (or insert any platform without an FPU)? ===

You probably didn't build Speex with the fixed-point option (--enable-fixed-point). Even if you did, not all modes have been ported to fixed-point arithmetic, so the code may still be slowed down by the few remaining floating-point operations (e.g. in the wideband mode).

=== I'm getting unusual background noise (hiss) when using libspeex in my application. How do I fix that? ===

One of the causes could be the scaling of the input speech. Speex expects signals to have a +-32767 (signed short) dynamic range. If the dynamic range of your signals is too small (e.g. +-1.0), you will suffer significant quantization noise. A good target is a dynamic range of around +-8000, which is large enough to avoid that noise but small enough to ensure there is no clipping when converting back to signed short.
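
As a rough sketch of the scaling step, the following rescales samples in the +-1.0 range so that their peak sits near 8000. The function name and the peak-normalisation approach are assumptions for illustration and are not part of libspeex.

```python
def scale_to_speex_range(samples, target_peak=8000.0):
    """Scale floating-point samples so that their peak is near target_peak."""
    peak = max((abs(s) for s in samples), default=0.0)
    if peak == 0.0:
        return [0.0 for _ in samples]  # silence: nothing to scale
    gain = target_peak / peak
    return [s * gain for s in samples]

normalized = [0.0, 0.25, -0.5, 1.0, -1.0]  # input in the +-1.0 range
scaled = scale_to_speex_range(normalized)
```

In a streaming application a fixed gain is usually preferable, since renormalising each buffer separately would change the level from one buffer to the next.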

=== I get very distorted speech when using libspeex in my application. What's wrong? ===

There are many possible causes. One is an error in the way the bits are manipulated. Another is the use of the same encoder or decoder state for more than one audio stream (channel), which produces strange effects with the filter memories. Finally, if the input speech has an amplitude close to +-32767, the decoded amplitude may be slightly higher than that, causing clipping when saving as 16-bit PCM.
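
For the clipping case, one defensive measure is to saturate decoded samples explicitly when converting them to 16-bit PCM. The helper below is a hypothetical sketch, not a libspeex function, and it only limits the damage: lowering the input level is what actually avoids the clipping.

```python
def to_int16(samples):
    """Round decoded samples and saturate them to the signed 16-bit range."""
    out = []
    for s in samples:
        value = int(round(s))
        # Saturate instead of letting an out-of-range value through.
        out.append(max(-32768, min(32767, value)))
    return out

decoded = [32760.4, 33010.0, -32900.7, 120.0]
pcm = to_int16(decoded)
```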

=== Can Speex run on fixed-point processors or DSPs? ===

Yes. You can compile Speex for fixed-point CPUs by passing the --enable-fixed-point option to the configure script or defining FIXED_POINT.

=== What architectures are supported? ===

Speex is designed to run on pretty much any CPU that can do 16x16 multiplications. That includes:
* x86
* PowerPC
* ARM
* Blackfin
* TI C5x and C6x
* dsPIC (unsupported, unofficial port)
* Cell (in progress)

...and many others

=== Can I use Speex for 22.05 kHz, 44.1 kHz or 48 kHz signals? ===

Yes, but you must resample the signal to a supported sample rate before encoding. Speex is optimised for 8 kHz and 16 kHz and it can (your mileage may vary) encode 32 kHz files as well. Anything else is unsupported and tends to be heavily sub-optimal. You might as well use Vorbis instead. Note that Speex includes a resampler module as of version 1.2beta2. Refer to the section called Resampler in the Speex documentation.
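
To give an idea of what resampling to 16 kHz involves, this small sketch computes the rate ratio for the sample rates mentioned in the question. It does not use the Speex resampler API; the function name is an assumption for illustration.

```python
from fractions import Fraction

def resampling_ratio(input_rate, output_rate):
    """Return the output/input rate ratio as a reduced fraction."""
    return Fraction(output_rate, input_rate)

for rate in (22050, 44100, 48000):
    ratio = resampling_ratio(rate, 16000)
    print(f"{rate} Hz -> 16000 Hz: {ratio.numerator}/{ratio.denominator}")
```

Going from 48 kHz to 16 kHz is a simple 1/3 ratio, whereas 44.1 kHz to 16 kHz gives 160/441, which is why a proper fractional resampler is needed rather than dropping samples.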

=== Should I use version 1.0.x or 1.2? ===
While currently 1.2 is marked as beta, the author believes 1.2 is vastly superior and more stable than the 1.0.x series regarding compression and API performance. As such, it is recommended to use version 1.2, even though it is marked as beta.

== Technical ==

=== CELP, ACELP, what's the difference? ===

CELP stands for "Code Excited Linear Prediction", while ACELP stands for "Algebraic Code Excited Linear Prediction". ACELP is a CELP technique that uses an algebraic codebook represented as a sum of unit pulses, which makes the codebook search much more efficient. This technique was invented at the University of Sherbrooke and is now one of the most widely used forms of CELP. Unfortunately, since it is patented, it could not be used in Speex.
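
The idea of a codebook "represented as a sum of unit pulses" can be sketched as follows. The subframe length, the number of pulses, and the function name are illustrative assumptions and are not taken from any particular ACELP standard.

```python
def algebraic_codevector(length, pulses):
    """Build a codevector from (position, sign) pairs, where sign is +1 or -1."""
    vector = [0] * length
    for position, sign in pulses:
        vector[position] += sign
    return vector

# Hypothetical 4-pulse excitation for a 40-sample subframe.
pulses = [(3, +1), (12, -1), (25, +1), (38, -1)]
codevector = algebraic_codevector(40, pulses)
```

No table of codevectors has to be stored: each entry is fully described by its pulse positions and signs, and that structure is what makes a fast search possible.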

=== Where can I get information about how Speex works? ===

There is not (yet) a complete description of the algorithm (except for the source code itself). However, several aspects are documented either in the [http://speex.org/docs/ manual] or in the paper [http://people.xiph.org/~jm/papers/aes120_speex_vorbis.pdf Improved Noise Weighting in CELP Coding of Speech — Applying the Vorbis Psychoacoustic Model To Speex].

[[Category:Speex]]

Speex FAQ

2017-06-08T14:08:55Z

MrZeus: /* I converted some MP3s to Speex and the quality is bad. What's wrong? */

== General ==

=== Why do we need [[Speex]]? [[Vorbis]] is open source and patent-free. ===

[[Vorbis]] is a great project but its goals are not the same as those of Speex. Vorbis is mostly aimed at compressing music and audio in general, while Speex targets speech only. For that reason, Speex can achieve much better results than Vorbis on speech (typically 2-4 times higher compression at equal quality).

=== Isn't there an open source implementation of the [https://en.wikipedia.org/wiki/Full_Rate GSM-FR] codec? Why is Speex necessary? ===

First of all, it's not clear whether [https://en.wikipedia.org/wiki/Full_Rate GSM-FR] is covered by a Philips patent (see http://kbs.cs.tu-berlin.de/~jutta/toast.html). Also, GSM-FR offers mediocre quality at a relatively high bit-rate, while Speex can offer equivalent quality at almost half the bit-rate. Last but not least, Speex offers a wide range of bit-rates and sampling rates, while GSM-FR is limited to 8 kHz speech at 13 kbps.

=== Ogg, Vorbis, Speex, what's the difference? ===

[[Ogg]] is a '''container''' format for holding multimedia data (audio, video, subtitles, etc.).

[[Vorbis]] is an '''audio codec''' that uses Ogg to store its bit-streams as files, hence the name '''Ogg Vorbis'''.

[[Speex]] is a '''speech codec''', that also uses the Ogg format to store its bit-streams as files,
so technically they would be '''Ogg Speex''' files. However, most people refer to them just as Speex files.

One difference between Speex and Vorbis is that Speex is less tied to Ogg.
Actually, if you want to use Speex for Voice over IP (VoIP), you don't need to use an Ogg container at all.

=== What's the extension for Speex files? ===

Speex files tend to have the '''.spx''' extension. Note that the Speex tools (speexenc, speexdec) do not rely on the file extension at all, so any extension will work.

=== Can I use Speex for compressing music? ===

You can, but you'll be better off compressing with Vorbis when it comes to music.
Just like Vorbis is not really adapted to speech, Speex is really not adapted for music.

=== I converted some MP3s to Speex and the quality is bad. What's wrong? ===

This is called '''[https://en.wikipedia.org/wiki/Transcoding transcoding]''' (converting from one lossy format to another) and it will always result in much poorer quality than the original MP3.

Avoid transcoding speech, unless you have a ''really'' good reason to do so (e.g. you need a smaller file size).
You should also avoid '''tandeming''' (self-transcoding): if you decode a Speex file and re-encode it at the same bit-rate, you ''will'' lose quality.

=== How does Speex compare to other proprietary codecs? ===

It's hard to give precise figures since no formal listening tests have been performed yet. All I can say is that in terms of quality, Speex competes on the same ground as other proprietary codecs (not necessarily the best, but not the worst either).

Speex also has many features that are not present in most other codecs. These include variable bit-rate (VBR), integration of narrowband and wideband, as well as stereo support. Of course, another area where Speex is really hard to beat is the quality/price ratio. Unlike many very expensive codecs, Speex is free and anyone may distribute or modify it at will.

=== Can Speex pass [https://en.wikipedia.org/wiki/Dual-tone_multi-frequency_signaling DTMF] signals correctly? ===

I guess it all depends on the bit-rate used. Though no formal testing has yet been performed, I'd say it works correctly at 8 kbps and above. Make sure you don't use the lowest complexity (see SPEEX_SET_COMPLEXITY or --comp option), as it causes significant noise.

=== Can Speex pass V.9x modem signals correctly? ===

If I could do that I'd be very rich by now :-) No it cannot, as that would break fundamental laws of information theory.

=== Does Speex have anything to do with the University of Sherbrooke? ===

No.

I wrote Speex while pursuing my Ph.D. at the University of Sherbrooke (2002-2005) in mobile robotics. Although I did my master's with the Sherbrooke speech coding group (in speech enhancement, not coding), I was no longer associated with them when developing Speex. Furthermore, Speex does not make use of any code or proprietary technology developed in the Sherbrooke speech coding group.

=== When will the next version of Speex be released? ===

Speex has been superseded by the [[Opus]] codec, so probably no more versions of Speex will be released by Xiph. However, since the code is open-source, you are welcome to make your own additions to Speex as you please.

=== How can I help if I don't know about speech processing? ===

There's always "semi-technical" work to do. The documentation can be improved, and so can this FAQ.

== License ==

=== Under what license is Speex released? ===

As of version 1.0 beta 1, Speex is released under the revised (3-clause) BSD license. This license is one of the most permissive open source licenses.

=== Am I allowed to use Speex in commercial software? ===

Yes, as long as you comply with the license. This basically means you have to keep the copyright notice and you can't use our name to promote your product without authorization.

=== Do I have to release source code if I use Speex in a proprietary application? ===

No. The BSD license does '''not''' require you to release any source code. It is however '''appreciated''' (but not required) if you contribute back useful changes you make to Speex. This is generally also in your interest because it means you get maintenance of that code for free (i.e. no need to merge again in newer versions).

== Using libspeex ==

=== Does Speex run on Windows? ===

Compilation on Windows has been supported since version 0.8.0. There are also several front-ends available from the website.

=== Why is encoding so slow compared to decoding? ===

For most kinds of compression, encoding is inherently slower than decoding. In the case of Speex, encoding consists of finding, for each vector of 5 to 10 samples, the entry that matches the best within a codebook consisting of 16 to 256 entries. On the other hand, at decoding all that needs to be done is look up the right entry in the codebook using the encoded index. Since a lookup is much faster than a search, the decoder works much faster than the encoder.

=== Why is Speex so slow on my iPaq (or insert any platform without an FPU)? ===

You probably didn't build Speex with the fixed-point option (--enable-fixed-point). Even if you did, not all modes have been ported to fixed-point arithmetic, so the code may still be slowed down by the few remaining floating-point operations (e.g. in the wideband mode).

=== I'm getting unusual background noise (hiss) when using libspeex in my application. How do I fix that? ===

One of the causes could be the scaling of the input speech. Speex expects signals to have a +-32767 (signed short) dynamic range. If the dynamic range of your signals is too small (e.g. +-1.0), you will suffer significant quantization noise. A good target is a dynamic range of around +-8000, which is large enough to avoid that noise but small enough to ensure there is no clipping when converting back to signed short.

=== I get very distorted speech when using libspeex in my application. What's wrong? ===

There are many possible causes. One is an error in the way the bits are manipulated. Another is the use of the same encoder or decoder state for more than one audio stream (channel), which produces strange effects with the filter memories. Finally, if the input speech has an amplitude close to +-32767, the decoded amplitude may be slightly higher than that, causing clipping when saving as 16-bit PCM.

=== Can Speex run on fixed-point processors or DSPs? ===

Yes. You can compile Speex for fixed-point CPUs by passing the --enable-fixed-point option to the configure script or defining FIXED_POINT.

=== What architectures are supported? ===

Speex is designed to run on pretty much any CPU that can do 16x16 multiplications. That includes:
* x86
* PowerPC
* ARM
* Blackfin
* TI C5x and C6x
* dsPIC (unsupported, unofficial port)
* Cell (in progress)

...and many others

=== Can I use Speex for 22.05 kHz, 44.1 kHz or 48 kHz signals? ===

Yes, but you must resample the signal to a supported sample rate before encoding. Speex is optimised for 8 kHz and 16 kHz and it can (your mileage may vary) encode 32 kHz files as well. Anything else is unsupported and tends to be heavily sub-optimal. You might as well use Vorbis instead. Note that Speex includes a resampler module as of version 1.2beta2. Refer to the section called Resampler in the Speex documentation.

=== Should I use version 1.0.x or 1.2? ===
While currently 1.2 is marked as beta, the author believes 1.2 is vastly superior and more stable than the 1.0.x series regarding compression and API performance. As such, it is recommended to use version 1.2, even though it is marked as beta.

== Technical ==

=== CELP, ACELP, what's the difference? ===

CELP stands for "Code Excited Linear Prediction", while ACELP stands for "Algebraic Code Excited Linear Prediction". ACELP is a CELP technique that uses an algebraic codebook represented as a sum of unit pulses, which makes the codebook search much more efficient. This technique was invented at the University of Sherbrooke and is now one of the most widely used forms of CELP. Unfortunately, since it is patented, it could not be used in Speex.

=== Where can I get information about how Speex works? ===

There is not (yet) a complete description of the algorithm (except for the source code itself). However, several aspects are documented either in the [http://speex.org/docs/ manual] or in the paper [http://people.xiph.org/~jm/papers/aes120_speex_vorbis.pdf Improved Noise Weighting in CELP Coding of Speech — Applying the Vorbis Psychoacoustic Model To Speex].

[[Category:Speex]]