OggCELT

From XiphWiki


Latest revision as of 23:21, 29 August 2009

The following is a draft. It is at best incomplete and at worst completely broken. In any case, it is not an "official" Xiph spec/codec, so use with care.


CELT is an experimental audio codec for use in low-delay communication.

The name stands for "Code-Excited Lapped Transform". It applies some CELP principles but works entirely in the frequency domain, which removes some of CELP's limitations.

The CELT codec is meant to close the gap between Vorbis and Speex for applications where both high quality audio and low delay are desired. It is a

Current features include:

  • Ultra-low latency (typically from 3 to 9 ms)
  • Full audio bandwidth (44.1 kHz and 48 kHz)
  • Stereo support
  • Packet loss concealment
  • Constant bit-rates from 32 kbps to 128 kbps and above
  • A fixed-point version of the encoder and decoder

The CELT homepage has additional information as well as audio samples.

CELT Tuning

CELT currently provides three classes of tuning knobs.

Bit allocation for the spectrum details

The frequency-dependent allocation of the bits that represent the 'details' of each frame's spectrum is by far the most important set of tunables in CELT.

There is a matrix in modes.c called "band_allocation". It has BITALLOC_SIZE lines and BARK_BANDS columns. Each line determines how many bits will be allocated to each critical band.

Even in CBR mode (the only mode currently supported), the number of available bits changes from frame to frame because the space consumed by the compressed spectral envelope is not constant. However, at a given constant bit-rate the number of available bits does not vary much between frames, so most of the values in the matrix have little or no effect at any particular bit-rate setting.

The bit-rate at which a line is used is not specified explicitly; instead it is inferred from the total number of bits that line allocates.

When the number of available bits falls between the totals of two configured lines, the encoder interpolates between those two nearest lines to achieve the required bit-rate.

Minimum width of bands

At low frequencies, critical bands are too narrow to be useful, so CELT imposes a minimum width for the bands.

The minimum width is defined in MDCT frequency bins and is controlled by MIN_BINS in modes.c.

If MIN_BINS is set to a small value, there will be more bands and therefore more energy data to encode, which hurts the encoder's efficiency. If MIN_BINS is set to a large value, there may not be enough frequency resolution to encode the spectral shape in a way that conforms well to human perception.

Band energy resolution

The last set of parameters is the resolution of the quantizer used to encode the 'shape' of each frame's spectrum. Ideally the shape of the spectrum would be encoded as accurately as possible, but every bit spent representing the shape more accurately is unavailable for the spectrum details. The sensitivity of the human auditory system to the exact amount of energy in a critical band is limited, especially at higher frequencies.

The resolution is defined by the "frac" array in quant_bands.c. Each value is the number of subdivisions of 6 dB, i.e. for band i the quantizer step is 6/frac[i] dB. The higher the value, the more accurate the encoding and the more bits it takes.

Design decisions

  • How do we allocate the fine energy bits?
  • Do we have a fixed static bit allocation or do we transmit it in a header?
  • Dynamic bit allocation?

Ogg mapping (experimental)

Default field type: LITTLE ENDIAN unsigned integer

  0                   1                   2                   3
  0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1| Byte
 +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
 | codec_id: Identifier char[8]: 'CELT    '                      | 0-3
 +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
 |                                                               | 4-7
 +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
 | codec_version: char[20]                                       | 8-11
 +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
 |                                                               | 12-15
 +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
 |                                                               | 16-19
 +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
 |                                                               | 20-23
 +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
 |                                                               | 24-27
 +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
 | version_id                                                    | 28-31
 +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
 | header_size                                                   | 32-35
 +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
 | sample_rate                                                   | 36-39
 +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
 | nb_channels                                                   | 40-43
 +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
 | frame_size                                                    | 44-47
 +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
 | overlap                                                       | 48-51
 +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
 | bytes_per_packet                                              | 52-55
 +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
 | extra_headers                                                 | 56-59
 +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+