Speex FAQ
Latest revision as of 07:02, 9 November 2017
General
Why do we need Speex? Vorbis is open source and patent-free.
Vorbis is a great project but its goals are not the same as Speex. Vorbis is mostly aimed at compressing music and audio in general, while Speex targets speech only. For that reason, Speex can achieve much better results than Vorbis on speech (typically 2-4 times higher compression at equal quality).
Isn't there an open source implementation of the GSM-FR codec? Why is Speex necessary?
First of all, it's not clear whether GSM-FR is covered by a Philips patent (see http://kbs.cs.tu-berlin.de/~jutta/toast.html). Also, GSM-FR offers mediocre quality at a relatively high bit-rate, while Speex can offer equivalent quality at almost half the bit-rate. Last but not least, Speex offers a wide range of bit-rates and sampling rates, while GSM-FR is limited to 8 kHz speech at 13 kbps.
Ogg, Vorbis, Speex, what's the difference?
Ogg is a container format for holding multimedia data (audio, video, subtitles, etc.).
Vorbis is an audio codec that uses Ogg to store its bit-streams as files, hence the name Ogg Vorbis.
Speex is a speech codec that also uses the Ogg format to store its bit-streams as files, so technically they would be Ogg Speex files. However, most people refer to them just as Speex files.
One difference between Speex and Vorbis is that Speex is less tied to Ogg. In fact, if you want to use Speex for Voice over IP (VoIP), you don't need an Ogg container at all.
What's the extension for Speex files?
Speex files tend to have the .spx extension. Note that the Speex tools (speexenc, speexdec) do not rely on the file extension at all, so any extension will work.
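Because the tools look at the file contents rather than the name, your own code can do the same. The sketch below is a rough check, not how speexenc/speexdec are implemented: it relies on the fact that an Ogg stream begins with the capture pattern "OggS" and that the Speex identification header starts with "Speex" followed by three spaces, and it assumes that header lies within the bytes you pass in. The function names are made up for this example; a robust reader would parse the Ogg page structure (e.g. with libogg) instead of scanning bytes.

```c
#include <stddef.h>
#include <string.h>

/* Ogg pages always start with the 4-byte capture pattern "OggS". */
static int is_ogg(const unsigned char *buf, size_t len)
{
    return len >= 4 && memcmp(buf, "OggS", 4) == 0;
}

/* Scan for the 8-byte Speex identification string: "Speex" + 3 spaces.
 * In an Ogg Speex file it sits in the payload of the first page. */
static int has_speex_id(const unsigned char *buf, size_t len)
{
    static const char id[8] = { 'S', 'p', 'e', 'e', 'x', ' ', ' ', ' ' };
    for (size_t i = 0; i + sizeof id <= len; i++)
        if (memcmp(buf + i, id, sizeof id) == 0)
            return 1;
    return 0;
}

/* Content-based check (hypothetical helper); the file name plays no part. */
static int looks_like_ogg_speex(const unsigned char *buf, size_t len)
{
    return is_ogg(buf, len) && has_speex_id(buf, len);
}
```

Feeding it the first few hundred bytes of a typical Ogg Speex file is usually enough, and it also tells a Speex file apart from other Ogg files such as Vorbis.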
Can I use Speex for compressing music?
You can, but you'll be better off using Vorbis for music. Just as Vorbis is not really suited to speech, Speex is not really suited to music.
I converted some MP3s to Speex and the quality is bad. What's wrong?
This is called transcoding (converting from one lossy format to another) and it will always result in much poorer quality than the original MP3.
Avoid transcoding speech unless you have a really good reason to do so (e.g. you need a smaller file size). You should also avoid tandeming (self-transcoding): if you decode a Speex file and re-encode it at the same bit-rate, you will lose quality.
How does Speex compare to proprietary codecs?
It's hard to give precise figures since no formal listening tests have been performed yet. All I can say is that in terms of quality, Speex competes on the same ground as proprietary codecs (not necessarily the best, but not the worst either).
Speex also has many features that are not present in most other codecs. These include variable bit-rate (VBR), integration of narrowband and wideband, as well as stereo support. Of course, another area where Speex is really hard to beat is the quality/price ratio. Unlike many very expensive codecs, Speex is free and anyone may distribute or modify it at will.
Can Speex pass DTMF signals correctly?
I guess it all depends on the bit-rate used. Though no formal testing has been performed yet, I'd say it works correctly at 8 kbps and above. Make sure you don't use the lowest complexity (see SPEEX_SET_COMPLEXITY or the --comp option), as it causes significant noise.
Can Speex pass V.90 / V.92 modem signals correctly?
No, it cannot, as that would break fundamental laws of information theory. If it could do that, I'd be very rich by now :-)
Does Speex have anything to do with the University of Sherbrooke?
No.
I wrote Speex while pursuing my Ph.D. in mobile robotics at the University of Sherbrooke (2002-2005). Although I did my master's with the Sherbrooke speech coding group (in speech enhancement, not coding), I was no longer associated with them when developing Speex. Furthermore, Speex does not use any code or proprietary technology developed by the Sherbrooke speech coding group.
When will the next version of Speex be released?
Speex has been superseded by the Opus codec, so probably no more versions of Speex will be released by Xiph. However, since the code is open-source, you are welcome to make your own additions to Speex as you please.
How can I help if I don't know about speech processing?
There's always "semi-technical" work to do. The documentation can be improved, and so can this FAQ.
License
Under what license is Speex released?
As of version 1.0 beta 1, Speex is released under the revised (3-clause) BSD license. This license is one of the most permissive open source licenses.
Am I allowed to use Speex in commercial software?
Yes, as long as you comply with the license. This basically means you have to keep the copyright notice and you can't use our name to promote your product without authorization.
Do I have to release source code if I use Speex in a proprietary application?
No. The BSD license does not require you to release any source code. However, it is appreciated (though not required) if you contribute back any useful changes you make to Speex. This is generally in your own interest too, because it means you get maintenance of that code for free (i.e. no need to re-merge your changes into every newer version).
Using libspeex
Does Speex run on Windows?
Compilation on Windows has been supported since version 0.8.0. There are also several front-ends available from the website.
Why is encoding so slow compared to decoding?
For most kinds of compression, encoding is inherently slower than decoding.
In the case of Speex, encoding consists of finding, for each vector of 5 to 10 samples, the best-matching entry in a codebook of 16 to 256 entries. When decoding, all that needs to be done is to look up the right entry in the codebook using the encoded index. Since a lookup is much faster than a search, the decoder runs much faster than the encoder.
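A minimal sketch of that asymmetry, assuming a plain squared-error distance and hypothetical function names: the encoder compares the target against every codebook entry, while the decoder just indexes the table. Speex's real search is more involved (in CELP coders the error is measured on a filtered, perceptually weighted signal), so this only illustrates the cost difference.

```c
#include <float.h>
#include <stddef.h>

/* Encoder side: exhaustive search for the entry closest (squared error)
 * to the target. Cost: entries * dim multiply-adds per vector. */
static int codebook_search(const float *target, const float *codebook,
                           int entries, int dim)
{
    int best = 0;
    float best_err = FLT_MAX;
    for (int e = 0; e < entries; e++) {
        const float *cw = codebook + (size_t)e * dim;
        float err = 0.0f;
        for (int i = 0; i < dim; i++) {
            float d = target[i] - cw[i];
            err += d * d;
        }
        if (err < best_err) {
            best_err = err;
            best = e;
        }
    }
    return best;
}

/* Decoder side: a single table lookup using the transmitted index. */
static const float *codebook_lookup(const float *codebook, int index, int dim)
{
    return codebook + (size_t)index * dim;
}
```

With 256 entries of 10 samples, the search costs 2560 multiply-adds per vector, while the lookup is one pointer computation.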
Why is Speex so slow on my iPaq (or insert any platform without an FPU)?
You probably didn't build Speex with the --enable-fixed-point option.
Even if you did, not all modes have been ported to fixed-point arithmetic, so the code may still be slowed down by the remaining floating-point operations (e.g. in wideband mode).
I'm getting unusual background noise (hiss) when using libspeex in my application. How do I fix that?
One possible cause is the scaling of the input speech. Speex expects signals to have a +-32767 (signed short) dynamic range. If the dynamic range of your signals is too small (e.g. +-1.0), you will suffer significant quantization noise. A good target is a dynamic range of around +-8000, which is large enough to keep quantization noise low, but small enough to make sure there's no clipping when converting back to signed short.
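A minimal sketch of the scaling step, assuming your application holds audio as floats in [-1.0, 1.0]. The function name and the rounding choice are illustrative; the resulting shorts are what you would pass to speex_encode_int().

```c
#include <stddef.h>

/* Convert float samples in [-1.0, 1.0] to 16-bit integers with the
 * given peak amplitude (e.g. 8000), clamping anything out of range. */
static void float_to_speex_input(const float *in, short *out, size_t n, float peak)
{
    for (size_t i = 0; i < n; i++) {
        float v = in[i] * peak;
        if (v > 32767.0f)
            v = 32767.0f;
        else if (v < -32768.0f)
            v = -32768.0f;
        /* Round to nearest instead of truncating toward zero. */
        out[i] = (short)(v >= 0.0f ? v + 0.5f : v - 0.5f);
    }
}
```

Calling it with peak = 8000.0f follows the +-8000 target above; out-of-range input is clamped rather than wrapped around.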
I get very distorted speech when using libspeex in my application. What's wrong?
There are many possible causes. One of them is errors in the way the bits are manipulated. Another is using the same encoder or decoder state for more than one audio stream (channel), which produces strange effects with the filter memories. Also, if the input speech has an amplitude close to +-32767, the decoded amplitude may be slightly higher than that, causing clipping when saving as 16-bit PCM.
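The filter-memory problem can be seen without libspeex at all. The sketch below uses a hypothetical toy_state (not a Speex type) holding the memory of a one-pole filter, standing in for the filter memories inside a real encoder or decoder state. When two channels share one state, the tail of channel A leaks into channel B; with one state per channel, it does not. In libspeex terms, that means calling speex_encoder_init() or speex_decoder_init() once per channel.

```c
#include <stddef.h>

/* Hypothetical stand-in for a codec state: the memory of a one-pole
 * low-pass filter, carried over from one frame to the next. */
typedef struct {
    float mem;
} toy_state;

/* Filter one frame; the state's memory persists across calls. */
static void toy_process(toy_state *st, const float *in, float *out, size_t n)
{
    for (size_t i = 0; i < n; i++) {
        st->mem = 0.5f * st->mem + 0.5f * in[i];
        out[i] = st->mem;
    }
}
```

Processing a frame of channel A and then a silent frame of channel B through the same toy_state produces non-zero output for B; giving each channel its own state keeps B silent.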
Can Speex run on fixed-point processors or DSPs?
Yes. You can compile Speex for fixed-point CPUs by passing the --enable-fixed-point option to the configure script or defining FIXED_POINT.
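Why 16x16 multiplications matter: in fixed-point builds, samples and coefficients are typically stored as 16-bit integers in Q15 format, and products are computed in 32 bits and shifted back. The sketch below shows that pattern together with a compile-time FIXED_POINT switch. libspeex has its own internal macros for this; the names sample_t, MULT and mult16_16_q15 are made up for this example.

```c
#include <stdint.h>

/* Q15: a 16-bit integer q represents the value q / 32768.0. */
static int16_t mult16_16_q15(int16_t a, int16_t b)
{
    /* 16x16 -> 32-bit product, then drop the 15 fractional bits.
     * Right-shifting a negative value is implementation-defined in C
     * (arithmetic on common compilers), and -32768 * -32768 does not
     * fit back into Q15, so real code must handle that case. */
    return (int16_t)(((int32_t)a * b) >> 15);
}

/* Hypothetical compile-time switch in the spirit of FIXED_POINT. */
#ifdef FIXED_POINT
typedef int16_t sample_t;
#define MULT(a, b) mult16_16_q15((a), (b))
#else
typedef float sample_t;
#define MULT(a, b) ((a) * (b))
#endif
```

For example, 0.5 is 16384 in Q15, and mult16_16_q15(16384, 16384) gives 8192, i.e. 0.25.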
What architectures are supported?
Speex is designed to run on pretty much any CPU that can do 16x16 multiplications. That includes:
- x86
- PowerPC
- ARM
- Blackfin
- TI C5x and C6x
- dsPIC (unsupported, unofficial port)
- Cell (in progress)
...and many others
Can I use Speex for 22.05 kHz, 44.1 kHz or 48 kHz signals?
Yes, but you must resample the signal to a supported sample rate before encoding.
Speex is optimised for 8 kHz and 16 kHz, and it can also encode 32 kHz audio (your mileage may vary). Anything else is unsupported and tends to be heavily sub-optimal; you might as well use Vorbis instead.
Note that Speex includes a resampler module as of version 1.2beta2. Refer to the section called Resampler in the Speex documentation.
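For 48 kHz input, the ratio to 16 kHz is exactly 3, so a crude illustration fits in a few lines: low-pass filter, then keep one sample in three. The sketch below (hypothetical function name) uses a 3-sample average as its filter, which lets a fair amount of aliasing through. 44.1 kHz and 22.05 kHz are not integer multiples of 8 or 16 kHz and need a fractional resampler anyway, so for real use the Speex resampler is the better choice.

```c
#include <stddef.h>

/* Crude 3:1 decimator, e.g. 48 kHz -> 16 kHz. Each output sample is
 * the mean of three input samples (a weak anti-aliasing filter).
 * Leftover input samples (n_in % 3) are ignored.
 * Returns the number of output samples written. */
static size_t decimate_by_3(const short *in, size_t n_in, short *out)
{
    size_t n_out = n_in / 3;
    for (size_t i = 0; i < n_out; i++) {
        int sum = in[3 * i] + in[3 * i + 1] + in[3 * i + 2];
        out[i] = (short)(sum / 3);
    }
    return n_out;
}
```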
Should I use version 1.0.x or 1.2?
Although 1.2 is currently marked as beta, the author believes it is vastly superior to, and more stable than, the 1.0.x series in terms of compression and API performance. It is therefore recommended to use version 1.2 despite the beta label.
Technical
What's the difference between CELP and ACELP?
CELP stands for "Code Excited Linear Prediction", while ACELP stands for "Algebraic Code Excited Linear Prediction".
That means ACELP is a CELP technique that uses an algebraic codebook represented as a sum of unit pulses, thus making the codebook search much more efficient. This technique was invented at the University of Sherbrooke and is now one of the most widely used forms of CELP.
Unfortunately, since ACELP is patented, it could not be used in Speex.
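To make "a sum of unit pulses" concrete, here is a sketch of how such a codeword can be described by a few pulse positions and signs, and why that speeds up the search: correlating the target with a pulse codeword only touches the pulse positions. This illustrates the general ACELP idea only (function names are hypothetical); as noted above, Speex itself does not use ACELP.

```c
#include <stddef.h>

/* Build a codeword as a sum of signed unit pulses: only the positions
 * and signs need to be stored or transmitted, not every sample. */
static void build_pulse_vector(float *vec, size_t len, const int *pos,
                               const int *sign, size_t npulses)
{
    for (size_t i = 0; i < len; i++)
        vec[i] = 0.0f;
    for (size_t k = 0; k < npulses; k++)
        vec[pos[k]] += (float)sign[k];   /* sign[k] is +1 or -1 */
}

/* Correlating a target with such a codeword needs one addition per
 * pulse, instead of len multiply-adds for a dense codeword. */
static float pulse_correlation(const float *target, const int *pos,
                               const int *sign, size_t npulses)
{
    float c = 0.0f;
    for (size_t k = 0; k < npulses; k++)
        c += sign[k] > 0 ? target[pos[k]] : -target[pos[k]];
    return c;
}
```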
Where can I get information about how Speex works?
There is not (yet) a complete description of the algorithm (except for the source code itself).
However, several aspects are documented either in the manual or in the paper Improved Noise Weighting in CELP Coding of Speech — Applying the Vorbis Psychoacoustic Model To Speex.