Ambisonics: Difference between revisions

m (→‎UHJ format: Fixed broken link)
m (→‎Malham notation: remove extra table row)
 
(11 intermediate revisions by one other user not shown)
Line 1: Line 1:
<i>
''This page is part of the Xiph Wiki, and is aimed at people developing file formats and associated software for Ambisonics. For a general introduction to Ambisonics, please go to the ''[[Wikipedia:Ambisonics|Wikipedia page on Ambisonics]]''.''
This page is part of the XiphWiki, and is aimed at people developing file  
formats and associated software for Ambisonics. For a general introduction
to Ambisonics, please go to the  
</i>
[[Wikipedia:Ambisonics|Wikipedia page on Ambisonics]].


'''Ambisonics''' is a surround sound system first developed in the 1970s.  
'''Ambisonics''' is a surround sound system first developed in the 1970s. Its main difference from other surround techniques is that it separates transmission channels from speaker feeds, the speaker feeds being derived using a decoder situated in the living room.  Decoders can be implemented in either hardware or software. Typically more speakers are used than transmission channels, and the more speakers used, the more stable the resulting soundfield. Speakers can be arranged in a number of configurations, regular polygons being the most popular.
Its main difference from other surround techniques is that it separates  
transmission channels from speaker feeds, the speaker feeds being derived  
using a decoder situated in the living room.  Decoders can be implemented  
in either hardware or software. Typically more speakers are used than  
transmission channels, and the more speakers used, the more stable the
resulting soundfield. Speakers can be arranged in a number of configurations,  
regular polygons being the most popular.


Ambisonic files can come in a number of different formats. The main one is  
Ambisonic files can come in a number of different formats. The main one is called B-Format, the other formats being derived from this. UHJ format is mono- and stereo-compatible. G-Format is a set of speaker feeds, so it can be enjoyed in surround sound without the need for a decoder in the living room.
called B-Format, the other formats being derived from this. UHJ format is  
mono- and stereo-compatible. G-Format is a set of speaker feeds, so it can be
enjoyed in surround sound without the need for a decoder in the living room.


== Ambisonics and 5.1 ==
== Ambisonics and 5.1 ==


Ambisonics and conventional 5.1 surround sound are very different. 5.1 is a  
Ambisonics and conventional 5.1 surround sound are very different. 5.1 is a set of speaker feeds, the signal only being fully defined for sounds coming from a speaker. Phantom images between speakers can be created, but the technique to do so is left unspecified. Many 5.1 releases use pair-wise mixing to create phantom images. This is understandable as almost all stereo recordings are mixed using pair-wise mixing.
set of speaker feeds, the signal only being fully defined for sounds coming
from a speaker. Phantom images between speakers can be created, but the  
technique to do so is left unspecified. Many 5.1 releases use pair-wise  
mixing to create phantom images. This is understandable as almost all  
stereo recordings are mixed using pair-wise mixing.


Pair-wise mixing is also called "pan-potting", "amplitude mixing" and  
Pair-wise mixing is also called "pan-potting", "amplitude mixing" and "intensity stereophony". It mixes signals into the feeds for a pair of speakers to create the illusion that a sound is coming from a point somewhere between the speakers. During mixing, the apparent location of each sound is determined only by the relative amplitude of that sound in the two speakers.
"intensity stereophony". It mixes signals into the feeds for a pair of  
speakers to create the illusion that a sound is coming from a point  
somewhere between the speakers. During mixing, the apparent location of  
each sound is determined only by the relative amplitude of that sound in  
the two speakers.


Unfortunately, pair-wise mixing works poorly when the speakers are to the  
Unfortunately, pair-wise mixing works poorly when the speakers are to the rear of the listener and not at all when they are to one side. You can demonstrate this for yourself by performing [http://members.tripod.com/martin_leese/Ambisonic/experiment.html a very simple experiment]. Pair-wise mixing did not work in the quadraphonic era and it will not work now. Such an absolute statement can be made because the way that humans localize sound has not changed.
rear of the listener and not at all when they are to one side. You can
demonstrate this for yourself by performing  
[http://members.tripod.com/martin_leese/Ambisonic/experiment.html a very simple experiment].
Pair-wise mixing did not work in the quadraphonic era and it will not work  
now. Such an absolute statement can be made because the way that humans  
localise sound has not changed.


Ambisonics is fundamentally different from 5.1. What is encoded in  
Ambisonics is fundamentally different from 5.1. What is encoded in Ambisonics is not speaker feeds, but ''direction''. When mixing in Ambisonics, the positions of the speakers are unknown ''and are of no interest''. Further, when Ambisonics is decoded to speaker feeds all of the speakers cooperate to localize a sound in its correct position so, for example, when the speakers on the left push, those on the right pull. The speakers all contribute to the creation of a single coherent soundfield.
Ambisonics is not speaker feeds, but ''direction''. When mixing in  
Ambisonics, the positions of the speakers are unknown  
''and are of no interest''. Further, when Ambisonics is decoded to speaker  
feeds all of the speakers cooperate to localise a sound in its correct  
position so, for example, when the speakers on the left push, those on the
right pull. The speakers all contribute to the creation of a single  
coherent soundfield.


=== Ambisonics to 5.1 ===
=== Ambisonics to 5.1 ===
Converting Ambisonics to 5.1 is straightforward, and is discussed below  
Converting Ambisonics to 5.1 is straightforward, and is discussed below (see [[#G-Format|G-Format]]).
(see [[#G-Format|G-Format]]).


=== 5.1 to Ambisonics ===
=== 5.1 to Ambisonics ===
Converting 5.1 to Ambisonics is more difficult. It is easy to make the  
Converting 5.1 to Ambisonics is more difficult. It is easy to make the five speaker feeds phantom images, called "virtual speakers". (The ".1" channel can be folded into W.)  The problem with this is that even if the Ambisonic rendering is perfect, the result will only be as good as the original 5.1 played through ''real'' speakers. It will not be an improvement. Nobody has yet come up with a way for Ambisonics to improve 5.1; 5.1 is simply too broken.
five speaker feeds phantom images, called "virtual speakers". (The ".1"  
channel can be folded into W.)  The problem with this is that even if the  
Ambisonic rendering is perfect, the result will only be as good as the  
original 5.1 played through ''real'' speakers. It will not be an  
improvement. Nobody has yet come up with a way for Ambisonics to improve  
5.1; 5.1 is simply too broken.
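
The "virtual speakers" approach described above can be sketched in a few lines. This is only an illustration under stated assumptions: the five main speakers are placed at the commonly used azimuths of 0, ±30 and ±110 degrees, W carries the conventional 1/sqrt(2) gain, and the ".1" channel is folded into W at unity gain. The function name, the dictionary keys and the LFE gain are hypothetical and are not taken from any specification.

```python
import math

# Assumed speaker azimuths in degrees, measured anticlockwise from the
# front (so positive angles are to the left, matching Y pointing left).
SPEAKER_AZIMUTHS = {"L": 30.0, "R": -30.0, "C": 0.0, "Ls": 110.0, "Rs": -110.0}


def five_one_to_b_format(frame):
    """Encode one 5.1 sample frame (a dict of feeds) to horizontal B-Format.

    Each speaker feed is treated as a point source at its speaker
    position, that is, as a "virtual speaker".
    """
    w = x = y = 0.0
    for name, azimuth in SPEAKER_AZIMUTHS.items():
        feed = frame[name]
        angle = math.radians(azimuth)
        w += feed / math.sqrt(2.0)      # omni-directional component
        x += feed * math.cos(angle)     # front-back figure-of-eight
        y += feed * math.sin(angle)     # left-right figure-of-eight
    # Assumption: the ".1" channel is folded into W at unity gain.
    w += frame.get("LFE", 0.0)
    return w, x, y
```

A frame containing only the centre feed ends up entirely in W and X, as expected for a source directly in front.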


== B-Format ==
== B-Format ==


B-Format is a single coherent soundfield composed of a set of related  
B-Format is a single coherent soundfield composed of a set of related channels.  The number of channels used depends on whether the soundfield is horizontal-only or full-sphere, and on the order. These B-Format channels are transmission channels, not speaker feeds. Listening to B-Format requires a decoder in your living room. Some numbers of channels are tabulated below.
channels.  The number of channels used depends on whether the soundfield
is horizontal-only or full-sphere, and on the order. These B-Format  
channels are transmission channels, not speaker feeds. Listening to  
B-Format requires a decoder in your living room. Some numbers of  
channels are tabulated below.


=== Channel correlation ===
=== Channel correlation ===
Compression techniques typically make use of channel correlation to  
Compression techniques typically make use of channel correlation to remove redundancy from the audio data, and so improve the compression ratio.
remove redundancy from the audio data, and so improve the compression  
ratio.


The correlation between B-Format channels depends on the content.
The correlation between B-Format channels depends on the content. Four-channel B-Format consists of an omni-directional component, called W, and three figure-of-eight components pointing forward, left and up, called X, Y, Z. ([http://members.tripod.com/martin_leese/Ambisonic/Harmonic.html Pictures are available].) Three-channel, horizontal-only B-Format simply omits the Z channel. This means that anything in X also appears in W. Same for Y and Z. (W is omni-directional; everything appears in W.) Also, if content comes from Front-Left then it appears equally in X and Y. Same for content from Front-Right, Back-Left, Back-Right; only the relative polarities change. So there can be a lot of correlation between B-Format channels, but it is content dependent.
Four-channel B-Format consists of an  
omni-directional component, called W, and three figure-of-eight  
components pointing forward, left and up, called X, Y, Z.  
([http://members.tripod.com/martin_leese/Ambisonic/Harmonic.html Pictures are available].)  
Three-channel, horizontal-only B-Format simply omits the Z channel. This  
means that anything in X also appears in W. Same for Y and Z. (W is  
omni-directional; everything appears in W.) Also, if content comes from  
Front-Left then it appears equally in X and Y. Same for content from  
Front-Right, Back-Left, Back-Right; only the relative polarities change.  
So there can be a lot of correlation between B-Format channels, but it is  
content dependent.
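
The content-dependent correlation described above can be illustrated with the usual first-order encoding of a single point source. This is a sketch, not a normative definition: it assumes W carries a 1/sqrt(2) gain and that azimuth is measured anticlockwise from the front; the function name is hypothetical.

```python
import math


def encode_b_format(sample, azimuth_deg, elevation_deg=0.0):
    """Encode one sample of a point source into (W, X, Y, Z).

    Assumes W is scaled by 1/sqrt(2) and that X, Y, Z point forward,
    left and up respectively.
    """
    azimuth = math.radians(azimuth_deg)
    elevation = math.radians(elevation_deg)
    w = sample / math.sqrt(2.0)
    x = sample * math.cos(azimuth) * math.cos(elevation)
    y = sample * math.sin(azimuth) * math.cos(elevation)
    z = sample * math.sin(elevation)
    return w, x, y, z


front_left = encode_b_format(1.0, 45.0)    # X and Y equal, same polarity
front_right = encode_b_format(1.0, -45.0)  # X and Y equal, opposite polarity
```

For a source at Front-Left the X and Y values are identical, while at Front-Right they differ only in sign, which is the correlation a compressor could exploit.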


One problem with B-Format is that it relies heavily on low-frequency phase. The
One problem with B-Format is that it relies heavily on low-frequency phase. The phase relationships between the different B-Format channels are important if the resulting soundfield is to correctly "gel". This may be a problem when B-Format channels are compressed using lossy compression.
phase relationships between the different B-Format channels are important  
if the resulting soundfield is to correctly "gel". This may be a problem  
when B-Format channels are compressed using lossy compression.


There is a file specification in use for downloadable B-Format files  
There is a file specification in use for downloadable B-Format files called the [http://members.tripod.com/martin_leese/Ambisonic/B-Format_file_format.html ".amb" specification].
called the  
[http://members.tripod.com/martin_leese/Ambisonic/B-Format_file_format.html ".amb" specification].


=== Limitations of the ".amb" specification ===
=== Limitations of the ".amb" specification ===
The [http://members.tripod.com/martin_leese/Ambisonic/B-Format_file_format.html ".amb" specification]  
The [http://members.tripod.com/martin_leese/Ambisonic/B-Format_file_format.html ".amb" specification] for downloadable B-Format files is based on the WAVE-EX format.  There are currently over 200 pieces available in this format [http://www.ambisonia.com for free download]. Most of these are first-order full-sphere soundfields. (A [https://en.wikipedia.org/wiki/List_of_Ambisonic_Software list of Ambisonic software decoders] is available on Wikipedia.) Some of the limitations of the specification are:
for downloadable B-Format files is based on the WAVE-EX format.  There are  
currently over 200 pieces available in this format  
[http://www.ambisonia.com for free download]. Most of these are  
first-order full-sphere soundfields. (The same website also has details of
[http://www.ambisonia.com/wiki/index.php/Playback_Software ad hoc software decoders].)  
Some of the limitations of the specification are:   


#It is limited to 4 GByte files (2 GBytes if somebody screwed up).
#It is limited to 4 GByte files (2 GBytes if somebody screwed up).
#It is limited to third-order soundfields and below. While third-order looks like a lot (16 channels), there already exists a prototype mic that can record up to fourth-order (25 channels).
#It is limited to third-order soundfields and below. While third-order looks like a lot (16 channels), there are already research rigs that reproduce fourth-order (25 channels).
#No compression (particularly lossless).
#No compression (particularly lossless).


The reason that the ".amb" file specification is limited to third-order  
The reason that the ".amb" file specification is limited to third-order and below is because it uses the number of channels to uniquely define the soundfield order. Unfortunately this simple and elegant scheme does not work above third-order as ambiguities creep in. (One ambiguity is illustrated in the table below.)
and below is because it uses the number of channels to uniquely define the  
soundfield order. Unfortunately this simple and elegant scheme does not  
work above third-order as ambiguities creep in. (One ambiguity is  
illustrated in the table below.)


A more general file format will have to use something else, such as  
A more general file format will have to use something else, such as ''Malham notation'', or storing both the horizontal-order and height-order. There is a one-to-one correspondence between Malham notation and the pair of orders, and either can generate the number of channels.
''Malham notation'', or storing both the horizontal-order and  
height-order. There is a one-to-one correspondence between Malham notation  
and the pair of orders, and either can generate the number of channels.


==== Malham notation ====
==== Malham notation ====
Malham notation specifies the order of a B-Format soundfield using a  
Malham notation specifies the order of a B-Format soundfield using a string of characters, each character being either '''f''' (for full-sphere) or '''h''' (for horizontal). The first character in the string specifies the type of the first-order components, the second character the type of the second-order components, etc.
string of characters, each character being either '''f''' (for full-sphere)  
or '''h''' (for horizontal). The first character in the string specifies  
the type of the first-order components, the second character the type of  
the second-order components, etc.
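
As a rough illustration of how a Malham notation string can generate the number of channels, the sketch below counts one channel for W, 2m+1 channels for a full-sphere order m, and two channels for a horizontal-only order. The function names are hypothetical, and the second function assumes that all '''f''' characters precede any '''h''' characters, so that the string maps onto a single (horizontal-order, height-order) pair.

```python
def channels_from_malham(notation):
    """Return the channel count implied by a Malham notation string."""
    channels = 1  # the zero-order component, W
    for order, kind in enumerate(notation, start=1):
        if kind == "f":
            channels += 2 * order + 1  # all components of this order
        elif kind == "h":
            channels += 2              # only the two horizontal components
        else:
            raise ValueError("expected 'f' or 'h', got %r" % kind)
    return channels


def orders_from_malham(notation):
    """Return (horizontal_order, height_order) for a notation string.

    Assumes every 'f' comes before any 'h'.
    """
    if "hf" in notation:
        raise ValueError("'f' may not follow 'h' in this sketch")
    return len(notation), notation.count("f")
```

Under this counting, '''f''' gives 4 channels, '''fff''' gives 16 and '''ffff''' gives 25. Both '''ff''' and '''hhhh''' come to nine channels, which shows why a channel count alone stops being unambiguous above third-order.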


{| class="wikitable" style="text-align:center"
{| class="wikitable" style="text-align:center"
|
|-
|-
!<span style="font-size:80%">Horizontal<br>order</span>
!<span style="font-size:80%">Horizontal<br>order</span>
Line 165: Line 79:


=== Default channel conversions from B-Format ===
=== Default channel conversions from B-Format ===
Converting a B-Format file to a mono file is straightforward.  Use Mono =  
Converting a B-Format file to a mono file is straightforward.  Use Mono = W*sqrt(2).
W*sqrt(2).
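
A minimal sketch of this conversion, assuming the W channel is available as a sequence of floating-point samples (the function name is hypothetical):

```python
import math


def b_format_to_mono(w_samples):
    """Return mono samples computed as W * sqrt(2)."""
    gain = math.sqrt(2.0)
    return [w * gain for w in w_samples]
```

The X, Y and Z channels are simply ignored; the sqrt(2) factor brings W back up to full level.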


Converting a B-Format file to a stereo file is more difficult.  The "proper"  
Converting a B-Format file to a stereo file is more difficult.  The "proper" way to do this is to convert the W,X,Y channels to two-channel UHJ. Unfortunately this requires the use of wide-band 90-degree phase shifters. In the digital domain these are usually implemented as convolution filters.
way to do this is to convert the W,X,Y channels to two-channel UHJ.
Unfortunately this requires the use of wide-band 90-degree phase shifters.
In the digital domain these are usually implemented as convolution filters.


Assuming 90-degree phase shifters are unavailable then the problem is one of  
Assuming 90-degree phase shifters are unavailable then the problem is one of choice. Starting from B-Format, it is possible to synthesize ''any'' mic response pointing in ''any'' direction.  Hence, it is possible to synthesize ''all'' coincident stereo mic techniques.  Two popular stereo techniques are ''Blumlein Mid-Side'' and ''Blumlein Crossed Pair''.
choice. Starting from B-Format, it is possible to synthesize ''any'' mic  
response pointing in ''any'' direction.  Hence, it is possible to synthesize  
''all'' coincident stereo mic techniques.  Two popular stereo techniques are  
''Blumlein Mid-Side'' and ''Blumlein Crossed Pair''.
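
The idea of synthesizing a mic response from B-Format can be sketched with the usual first-order "virtual microphone" formula. This is an illustration under assumptions rather than a recommended conversion: it is horizontal-only, it assumes W carries a 1/sqrt(2) gain, and the function names and the pattern parameter are hypothetical. The second function shows one possible reading of a crossed pair of figure-of-eight mics at ±45 degrees.

```python
import math


def virtual_mic(w, x, y, azimuth_deg, pattern):
    """Return one sample of a virtual mic aimed at azimuth_deg.

    pattern: 1.0 is omni, 0.5 is cardioid, 0.0 is figure-of-eight.
    """
    angle = math.radians(azimuth_deg)
    omni_part = math.sqrt(2.0) * w
    directional_part = x * math.cos(angle) + y * math.sin(angle)
    return pattern * omni_part + (1.0 - pattern) * directional_part


def crossed_figure_eights(w, x, y, half_angle_deg=45.0):
    """Return (left, right) from two figure-of-eights at +/- half_angle_deg."""
    left = virtual_mic(w, x, y, +half_angle_deg, 0.0)
    right = virtual_mic(w, x, y, -half_angle_deg, 0.0)
    return left, right
```

With these assumptions, a source at Front-Left (45 degrees) lands entirely in the left output and is absent from the right, because it sits on the null of the right-hand figure-of-eight.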
    
    
==== Blumlein Mid-Side ====
==== Blumlein Mid-Side ====
Line 192: Line 98:
</pre>
</pre>


Which conversion to stereo is better depends on the material and how it was  
Which conversion to stereo is better depends on the material and how it was recorded.  A good suggestion is to not specify a ''particular'' default channel conversion; instead, simply specify that there must be one.  If one has to be specified then Blumlein Crossed Pair is the simpler.
recorded.  A good suggestion is to not specify a ''particular'' default  
channel conversion; instead, simply specify that there must be one.  If one  
has to be specified then Blumlein Crossed Pair is the simpler.


== UHJ format ==
== UHJ format ==


B-Format is the main format for Ambisonic files. However, B-Format is  
B-Format is the main format for Ambisonic files. However, B-Format is not mono- or stereo-compatible. This is why the UHJ hierarchical system was developed. Depending on the number of channels available, the UHJ system can carry more or less information, but at all times it is fully mono- and stereo-compatible. Up to four channels (Left, Right, T, Q) may be used. The T-channel can also be band-limited but, as this "2&frac12;-channel UHJ" was only ever used for FM radio transmission, it will not be discussed further.
not mono- or stereo-compatible. This is why the UHJ hierarchical system  
was developed. Depending on the number of channels available, the UHJ  
system can carry more or less information, but at all times it is fully  
mono- and stereo-compatible. Up to four channels (Left, Right, T, Q) may  
be used. The T-channel can also be band-limited but, as this  
"2&frac12;-channel UHJ" was only ever used for FM radio transmission, it  
will not be discussed further.


Listening to UHJ files in surround requires a decoder in your living room.
Listening to UHJ files in surround requires a decoder in your living room. Also, UHJ is restricted to first-order soundfields, either horizontal (two- and three-channel UHJ) or full-sphere (four-channel UHJ).
Also, UHJ is restricted to first-order soundfields, either horizontal (two-  
and three-channel UHJ) or full-sphere (four-channel UHJ).


Converting B-Format channels to UHJ channels, and vice versa, requires the  
Converting B-Format channels to UHJ channels, and vice versa, requires the use of wide-band 90-degree phase shifters. In the digital domain these are usually implemented as convolution filters. Conversion between four-channel B-Format (W, X, Y, Z) and four-channel UHJ (Left, Right, T, Q) can be accomplished without loss of information. The same with three-channel to three-channel (W, X, Y) <=> (Left, Right, T). It is possible to recover three-channel B-Format (W, X, Y) from two-channel UHJ (Left, Right), but not without loss. It is also important for the Ambisonic decoder to be aware that the B-Format channels were recovered from two-channel UHJ (because of the need to apply different shelf filters).
use of wide-band 90-degree phase shifters. In the digital domain these  
are usually implemented as convolution filters. Conversion between  
four-channel B-Format (W, X, Y, Z) and four-channel UHJ (Left, Right, T,  
Q) can be accomplished without loss of information. The same with  
three-channel to three-channel (W, X, Y) <=> (Left, Right, T). It is  
possible to recover three-channel B-Format (W, X, Y) from two-channel UHJ  
(Left, Right), but not without loss. It is also important for the Ambisonic  
decoder to be aware that the B-Format channels were recovered from  
two-channel UHJ (because of the need to apply different shelf filters).


Several hundred  
Several hundred [http://surrounddiscography.com/ two-channel UHJ LPs and CDs] have been released. Three- and four-channel UHJ recordings have never been commercially released.
[http://members.tripod.com/martin_leese/Ambisonic/uhjhtm.txt two-channel UHJ LPs and CDs]  
have been released. Three- and four-channel UHJ recordings have never been  
commercially released.


=== UHJ encoding and decoding equations ===
=== UHJ encoding and decoding equations ===
Line 255: Line 137:
where j is a +90 degree phase shift
where j is a +90 degree phase shift
</pre>
</pre>
Note that two-channel UHJ requires the player to use different shelf filters than for B-Format.
Note that two-channel UHJ requires the player to use different shelf filters than for B-Format (or for three- and four-channel UHJ).


For three- and four-channel UHJ:
For three- and four-channel UHJ:
Line 271: Line 153:


There is a file specification for downloadable two-channel UHJ files  
There is a file specification for downloadable two-channel UHJ files  
called the  
called the [http://members.tripod.com/martin_leese/Ambisonic/UHJ_file_format.html ".uhj" specification], but it is not currently in use.
[http://members.tripod.com/martin_leese/Ambisonic/UHJ_file_format.html ".uhj" specification], but it is not currently in use.


=== Limitations of the ".uhj" specification ===
=== Limitations of the ".uhj" specification ===
The [http://members.tripod.com/martin_leese/Ambisonic/UHJ_file_format.html ".uhj" specification]  
The [http://members.tripod.com/martin_leese/Ambisonic/UHJ_file_format.html ".uhj" specification] for downloadable two-channel UHJ files is based on the WAVE or WAVE-EX format. A UHJ chunk is added to the file to indicate it is UHJ. As unrecognized chunks are always skipped, use of this chunk maintains stereo compatibility. Some of the limitations of the specification are:   
for downloadable two-channel UHJ files is based on the WAVE or WAVE-EX  
format. A UHJ chunk is added to the file to indicate it is UHJ. As  
unrecognized chunks are always skipped, use of this chunk maintains stereo  
compatibility. Some of the limitations of the specification are:   


#It is limited to 4 GByte files (2 GBytes if somebody screwed up).
#It is limited to 4 GByte files (2 GBytes if somebody screwed up).
Line 285: Line 162:
#No compression.
#No compression.


The ".uhj" specification is only defined for two-channel UHJ to maintain
The ".uhj" specification is only defined for two-channel UHJ to maintain stereo compatibility. While it would be possible to add the UHJ chunk to three- and four-channel WAVE-EX files, the recommendation from Microsoft for playing such files is that the audio device should render the extra channels to output ports not in use. This can happen even when the extra channels are masked off. (Put simply, in WAVE-EX files the channel mask does ''not'' mask channels.) Because of this, three- and four-channel WAVE-EX files cannot be made stereo compatible.
stereo compatibility. While it would be possible to add the UHJ chunk to  
three- and four-channel WAVE-EX files, the recommendation from Microsoft
for playing such files is that the audio device should render the extra  
channels to output ports not in use. This can happen even when the extra  
channels are masked off. (Put simply, in WAVE-EX files the channel mask  
does ''not'' mask channels.) Because of this, three- and four-channel  
WAVE-EX files cannot be made stereo compatible.


In the Xiph world, it should be possible to use default channel conversions  
In the Xiph world, it should be possible to use default channel conversions to ensure that three- and four-channel UHJ files remain stereo compatible.
to ensure that three- and four-channel UHJ files remain stereo compatible.


=== Default channel conversions from UHJ ===
=== Default channel conversions from UHJ ===
Converting a UHJ file to a mono file is straightforward. Use Mono =  
Converting a UHJ file to a mono file is straightforward. Use Mono =  
(Left + Right) / sqrt(2).
(Left + Right) / sqrt(2).


Converting a UHJ file to a stereo file is even easier. Use Left = Left, Right = Right, and discard T and Q if present.
Converting a UHJ file to a stereo file is even easier. Use Left = Left, Right = Right, and discard T and Q if present.
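
Both UHJ conversions are simple enough to sketch directly. The function names are hypothetical, and the channel order Left, Right, T, Q is an assumption about how the decoded channels are presented to the code.

```python
import math


def uhj_to_mono(left, right):
    """Return mono samples computed as (Left + Right) / sqrt(2)."""
    scale = 1.0 / math.sqrt(2.0)
    return [(l + r) * scale for l, r in zip(left, right)]


def uhj_to_stereo(channels):
    """Return (Left, Right), discarding T and Q if present.

    channels: list of per-channel sample lists, assumed to be ordered
    Left, Right, T, Q.
    """
    return channels[0], channels[1]
```

The same two functions serve two-, three- and four-channel UHJ, since T and Q never contribute to the mono or stereo result.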


== G-Format ==
== G-Format ==


A G-Format file is any common multi-channel surround file containing an  
A G-Format file is any common multi-channel surround file containing an Ambisonic soundfield pre-decoded to its speaker feeds. This allows listeners who do not own an Ambisonic decoder to enjoy Ambisonics.
Ambisonic soundfield pre-decoded to its speaker feeds. This allows  
listeners who do not own an Ambisonic decoder to enjoy Ambisonics.


The sound engineer creates a set of speaker feeds for a particular number  
The sound engineer creates a set of speaker feeds for a particular number and arrangement of speakers. This is typically four speakers arranged in a square. Other speaker arrangements are also possible.
and arrangement of speakers. This is typically four speakers arranged in  
a square. Other speaker arrangements are also possible.


In Ambisonics, all speakers cooperate to localise sounds in any particular  
In Ambisonics, all speakers cooperate to localize sounds in any particular direction; there are no "surround speakers" as such. Because of this, best results when playing G-Format recordings (and Ambisonics in general) are obtained when the speakers are matched. The easiest way to accomplish this is to use identical speakers. Unfortunately, many home theater systems include a center-front speaker which is different from the other speakers.
direction; there are no "surround speakers" as such. Because of this, best  
results when playing G-Format recordings (and Ambisonics in general) are  
obtained when the speakers are matched. The easiest way to accomplish this  
is to use identical speakers. Unfortunately, many home theatre systems  
include a centre-front speaker which is different from the other speakers.


An easy way to cope with this is adopted on G-Format recordings commercially  
An easy way to cope with this is adopted on G-Format recordings commercially released on DVD-A by [http://www.wyastone.co.uk/all-labels/nimbus/dvd-audio.html Nimbus Records]. They use four speakers in a square, the center-front speaker being unused.  
released on DVD-A by [http://www.wyastone.co.uk/nrl/dvd.html Nimbus Records].  
They use four speakers in a square, the centre-front speaker being unused.  


=== Recovering B-Format from G-Format ===
=== Recovering B-Format from G-Format ===
It is sometimes possible to recover the original B-Format channels from  
It is sometimes possible to recover the original B-Format channels from the G-Format speaker feeds. The recovered B-Format channels can then be fed to a decoder in the listener's living room, and so accommodate a speaker arrangement different from the one used when the G-Format file was produced. Each B-Format channel is recovered using a weighted combination of the speaker feeds in the G-Format file. The conversion coefficients required for the B-Format recovery depend on the particular speaker arrangement chosen by the sound engineer. (Obviously, if a B-Format version of the file also exists then it can be fed to the decoder directly without the need for G-Format.)
the G-Format speaker feeds. The recovered B-Format channels can then be  
fed to a decoder in the listener's living room, and so accommodate a  
speaker arrangement different from the one used when the G-Format file  
was produced. Each B-Format channel is recovered using a weighted  
combination of the speaker feeds in the G-Format file. The conversion  
coefficients required for the B-Format recovery depend on the particular  
speaker arrangement chosen by the sound engineer. (Obviously, if a
B-Format version of the file also exists then it can be fed to the  
decoder directly without the need for G-Format.)
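
The "weighted combination" idea can be illustrated with a toy example. Everything specific here is an assumption made for the sketch: the speakers form a square at ±45 and ±135 degrees, the G-Format feeds were produced by the simple frequency-independent decode shown in the first function, and W carries a 1/sqrt(2) gain. A real G-Format release may use a different decode, in which case the coefficients would differ.

```python
import math

# Assumed square layout: left-front, left-back, right-back, right-front.
SQUARE_AZIMUTHS = [45.0, 135.0, -135.0, -45.0]


def decode_to_square(w, x, y):
    """Toy decode: each feed is a virtual mic aimed at its speaker."""
    feeds = []
    for azimuth in SQUARE_AZIMUTHS:
        angle = math.radians(azimuth)
        feed = 0.5 * (math.sqrt(2.0) * w
                      + x * math.cos(angle)
                      + y * math.sin(angle))
        feeds.append(feed)
    return feeds


def recover_b_format(feeds):
    """Recover (W, X, Y) as weighted sums of the four speaker feeds.

    The weights below invert decode_to_square for the assumed layout.
    """
    w = sum(feeds) / (2.0 * math.sqrt(2.0))
    x = sum(f * math.cos(math.radians(a))
            for f, a in zip(feeds, SQUARE_AZIMUTHS))
    y = sum(f * math.sin(math.radians(a))
            for f, a in zip(feeds, SQUARE_AZIMUTHS))
    return w, x, y
```

The recovery weights depend entirely on the decode and layout chosen when the feeds were made, which is why storing them in the file at creation time is attractive.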


File formats for G-Format include all multi-channel formats that contain  
File formats for G-Format include all multi-channel formats that contain speaker feeds. However, these will not contain information to allow the B-Format channels to be automatically recovered. A [http://members.tripod.com/martin_leese/Ambisonic/G-Format_chunk.html ".amg" file format] (based on WAVE-EX) for downloadable G-Format files, which will allow the B-Format channels to be automatically recovered, has been proposed. Such file formats have the advantage of storing the conversion coefficients at the time the G-Format file is created. This is the only time the required information is readily available.
speaker feeds. However, these will not contain information to allow the  
B-Format channels to be automatically recovered. A [http://members.tripod.com/martin_leese/Ambisonic/G-Format_chunk.html ".amg" file format]  
(based on WAVE-EX) for downloadable G-Format files, which will allow  
the B-Format channels to be automatically recovered, has been proposed.  
Such file formats have the advantage of storing the conversion  
coefficients at the time the G-Format file is created. This is the only  
time the required information is readily available.


=== Default channel conversions from G-Format ===
=== Default channel conversions from G-Format ===
Converting a G-Format file to a mono or stereo file is straightforward.  
Converting a G-Format file to a mono or stereo file is straightforward. First, recover the B-Format channels using the conversion coefficients contained in the file. Second, follow the advice given above for [[#Default channel conversions from B-Format|Default channel conversions from B-Format]].
First, recover the B-Format channels using the conversion coefficients  
 
contained in the file. Second, follow the advice given above for  
An alternative approach is to encode directly into the file the coefficients for producing the stereo mix. An appropriate [http://members.tripod.com/martin_leese/Audio/StereoMix_chunk.html chunk for WAVE-EX files] has been proposed. This could be [http://members.tripod.com/martin_leese/Audio/stereo_mix_proposal.html extended to other multi-channel file formats].
[[#Default channel conversions from B-Format|Default channel conversions from B-Format]].


== Resources on Ambisonics ==
== Resources on Ambisonics ==

Latest revision as of 04:32, 3 November 2017

This page is part of the Xiph Wiki, and is aimed at people developing file formats and associated software for Ambisonics. For a general introduction to Ambisonics, please go to the Wikipedia page on Ambisonics.

Ambisonics is a surround sound system first developed in the 1970s. Its main difference from other surround techniques is that it separates transmission channels from speaker feeds, the speaker feeds being derived using a decoder situated in the living room. Decoders can be implemented in either hardware or software. Typically more speakers are used than transmission channels, and the more speakers used, the more stable the resulting soundfield. Speakers can be arranged in a number of configurations, regular polygons being the most popular.

Ambisonic files can come in a number of different formats. The main one is called B-Format, the other formats being derived from this. UHJ format is mono- and stereo-compatible. G-Format is a set of speaker feeds, so it can be enjoyed in surround sound without the need for a decoder in the living room.

Ambisonics and 5.1

Ambisonics and conventional 5.1 surround sound are very different. 5.1 is a set of speaker feeds, the signal only being fully defined for sounds coming from a speaker. Phantom images between speakers can be created, but the technique to do so is left unspecified. Many 5.1 releases use pair-wise mixing to create phantom images. This is understandable as almost all stereo recordings are mixed using pair-wise mixing.

Pair-wise mixing is also called "pan-potting", "amplitude mixing" and "intensity stereophony". It mixes signals into the feeds for a pair of speakers to create the illusion that a sound is coming from a point somewhere between the speakers. During mixing, the apparent location of each sound is determined only by the relative amplitude of that sound in the two speakers.

Unfortunately, pair-wise mixing works poorly when the speakers are to the rear of the listener and not at all when they are to one side. You can demonstrate this for yourself by performing a very simple experiment. Pair-wise mixing did not work in the quadraphonic era and it will not work now. Such an absolute statement can be made because the way that humans localize sound has not changed.

Ambisonics is fundamentally different from 5.1. What is encoded in Ambisonics is not speaker feeds, but direction. When mixing in Ambisonics, the positions of the speakers are unknown and are of no interest. Further, when Ambisonics is decoded to speaker feeds all of the speakers cooperate to localize a sound in its correct position so, for example, when the speakers on the left push those on the right pull. The speakers all contribute to the creation of a single coherent soundfield.

Ambisonics to 5.1

Converting Ambisonics to 5.1 is straightforward, and is discussed below (see G-Format).

5.1 to Ambisonics

Converting 5.1 to Ambisonics is more difficult. It is easy to encode the five speaker feeds as phantom images, called "virtual speakers". (The ".1" channel can be folded into W.) The problem with this is that even if the Ambisonic rendering is perfect, the result will only be as good as the original 5.1 played through real speakers. It will not be an improvement. Nobody has yet come up with a way for Ambisonics to improve 5.1; 5.1 is simply too broken.

B-Format

B-Format is a single coherent soundfield composed of a set of related channels. The number of channels used depends on whether the soundfield is horizontal-only or full-sphere, and on the order. These B-Format channels are transmission channels, not speaker feeds. Listening to B-Format requires a decoder in your living room. Some numbers of channels are tabulated below.

Channel correlation

Compression techniques typically make use of channel correlation to remove redundancy from the audio data, and so improve the compression ratio.

The correlation between B-Format channels depends on the content. Four-channel B-Format consists of an omni-directional component, called W, and three figure-of-eight components pointing forward, left and up, called X, Y, Z. (Pictures are available.) Three-channel, horizontal-only B-Format simply omits the Z channel. This means that anything in X also appears in W. Same for Y and Z. (W is omni-directional; everything appears in W.) Also, if content comes from Front-Left then it appears equally in X and Y. Same for content from Front-Right, Back-Left, Back-Right; only the relative polarities change. So there can be a lot of correlation between B-Format channels, but it is content dependent.
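The correlation described above can be seen by encoding a single source. The following is a minimal sketch, assuming the conventional first-order encoding equations in which W carries the source attenuated by 1/sqrt(2) (consistent with Mono = W*sqrt(2) used elsewhere on this page), azimuth is measured counter-clockwise from front, and elevation is measured upwards. The function name and angle conventions are illustrative, not part of any file specification.

```python
import math

def encode_b_format(sample, azimuth_deg, elevation_deg=0.0):
    """Encode one mono sample arriving from a direction into (W, X, Y, Z)."""
    az = math.radians(azimuth_deg)    # 0 = front, positive = towards the left
    el = math.radians(elevation_deg)  # 0 = horizontal, positive = up
    w = sample / math.sqrt(2.0)                # omni-directional component
    x = sample * math.cos(az) * math.cos(el)   # figure-of-eight pointing forward
    y = sample * math.sin(az) * math.cos(el)   # figure-of-eight pointing left
    z = sample * math.sin(el)                  # figure-of-eight pointing up
    return w, x, y, z

# A source at Front-Left appears equally in X and Y; at Back-Right only
# the polarities change.
front_left = encode_b_format(1.0, 45.0)
back_right = encode_b_format(1.0, -135.0)
```

Every source contributes to W regardless of direction, which is why W is correlated with all of the other channels.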

One problem with B-Format is that it depends heavily on low-frequency phase. The phase relationships between the different B-Format channels are important if the resulting soundfield is to correctly "gel". This may be a problem when B-Format channels are compressed using lossy compression.

There is a file specification in use for downloadable B-Format files called the ".amb" specification.

Limitations of the ".amb" specification

The ".amb" specification for downloadable B-Format files is based on the WAVE-EX format. There are currently over 200 pieces available in this format for free download. Most of these are first-order full-sphere soundfields. (A list of Ambisonic software decoders is available on Wikipedia.) Some of the limitations of the specification are:

  1. It is limited to 4 GByte files (2 GBytes if software treats the 32-bit size field as signed).
  2. It is limited to third-order soundfields and below. While third-order looks like a lot (16 channels), there are already research rigs that reproduce fourth-order (25 channels).
  3. No compression (particularly lossless).
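To get a feel for the size limit, the sketch below estimates how much uncompressed audio fits before the 32-bit size field of a RIFF-based file runs out. It ignores header overhead, and the function name and parameters are illustrative only.

```python
RIFF_SIZE_LIMIT = 2 ** 32  # bytes addressable by an unsigned 32-bit size field

def max_duration_seconds(channels, bits_per_sample, sample_rate,
                         limit=RIFF_SIZE_LIMIT):
    """Approximate longest recording that fits within the size limit."""
    bytes_per_second = channels * (bits_per_sample // 8) * sample_rate
    return limit / bytes_per_second

# First-order full-sphere (4 channels), 16-bit, 44.1 kHz
first_order = max_duration_seconds(4, 16, 44100)
# Third-order full-sphere (16 channels), 24-bit, 48 kHz
third_order = max_duration_seconds(16, 24, 48000)
# The same third-order file if the size field is read as a signed value
third_order_signed = max_duration_seconds(16, 24, 48000, limit=2 ** 31)
```

The higher the order, the sooner the limit is reached, because the channel count grows with order while the size field does not.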

The ".amb" file specification is limited to third-order and below because it uses the number of channels to uniquely define the soundfield order. Unfortunately this simple and elegant scheme does not work above third-order as ambiguities creep in. (One ambiguity is illustrated in the table below.)

A more general file format will have to use something else, such as Malham notation, or storing both the horizontal-order and height-order. There is a one-to-one correspondence between Malham notation and the pair of orders, and either can generate the number of channels.

Malham notation

Malham notation specifies the order of a B-Format soundfield using a string of characters, each character being either f (for full-sphere) or h (for horizontal). The first character in the string specifies the type of the first-order components, the second character the type of the second-order components, etc.

Horizontal  Height  Soundfield   Malham    Number of  Channels
order       order   type         notation  channels
1           0       horizontal   h         3          WXY
1           1       full-sphere  f         4          WXYZ
2           0       horizontal   hh        5          WXYUV
2           1       mixed-order  fh        6          WXYZUV
2           2       full-sphere  ff        9          WXYZRSTUV
3           0       horizontal   hhh       7          WXYUVPQ
3           1       mixed-order  fhh       8          WXYZUVPQ
3           2       mixed-order  ffh       11         WXYZRSTUVPQ
3           3       full-sphere  fff       16         WXYZRSTUVKLMNOPQ
4           0       horizontal   hhhh      9          (extra channels unlabeled)
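The relationship between Malham notation, the pair of orders, and the channel count can be sketched as follows. The function names are hypothetical, and the check that every f precedes every h is an assumption drawn from the pattern in the table rather than from a written rule.

```python
def malham_to_orders(notation):
    """Return (horizontal_order, height_order) for a string such as 'ffh'."""
    if not notation or set(notation) - {"f", "h"}:
        raise ValueError("Malham notation uses only 'f' and 'h'")
    if "hf" in notation:
        # Assumption: full-sphere orders always come before horizontal-only ones.
        raise ValueError("'f' characters must precede 'h' characters")
    return len(notation), notation.count("f")

def orders_to_malham(horizontal_order, height_order):
    """Inverse mapping: build the Malham string from the pair of orders."""
    return "f" * height_order + "h" * (horizontal_order - height_order)

def channel_count(notation):
    """W, plus 2m+1 channels per full-sphere order m, plus 2 per horizontal order."""
    horizontal_order, height_order = malham_to_orders(notation)
    full_sphere = sum(2 * m + 1 for m in range(1, height_order + 1))
    horizontal_only = 2 * (horizontal_order - height_order)
    return 1 + full_sphere + horizontal_only

# The ambiguity in the table: two different soundfields with the same count.
ambiguous = channel_count("ff") == channel_count("hhhh")
```

Because "ff" and "hhhh" both give nine channels, the channel count alone cannot identify the soundfield, whereas the notation or the pair of orders can.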

Default channel conversions from B-Format

Converting a B-Format file to a mono file is straightforward. Use Mono = W*sqrt(2).

Converting a B-Format file to a stereo file is more difficult. The "proper" way to do this is to convert the W,X,Y channels to two-channel UHJ. Unfortunately this requires the use of wide-band 90-degree phase shifters. In the digital domain these are usually implemented as convolution filters.

Assuming 90-degree phase shifters are unavailable then the problem is one of choice. Starting from B-Format, it is possible to synthesize any mic response pointing in any direction. Hence, it is possible to synthesize all coincident stereo mic techniques. Two popular stereo techniques are Blumlein Mid-Side and Blumlein Crossed Pair.

Blumlein Mid-Side

Mid = (W*sqrt(2)) + X  /*This is a cardioid response pointing forward*/
Left = Mid + Y
Right = Mid - Y

Blumlein Crossed Pair

Left = (X + Y)/sqrt(2)   /* (Left, Right) are just the (Y, X) */
Right = (X - Y)/sqrt(2)  /* responses rotated by -45 degrees  */

Which conversion to stereo is better depends on the material and how it was recorded. A good suggestion is to not specify a particular default channel conversion; instead, simply specify that there must be one. If one has to be specified then Blumlein Crossed Pair is the simpler.
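The mono and stereo conversions above translate directly into code. This is a minimal per-sample sketch with illustrative function names; a real converter would apply the same arithmetic to whole buffers and would probably add gain normalization to avoid clipping, which is not addressed here.

```python
import math

SQRT2 = math.sqrt(2.0)

def to_mono(w):
    """Mono = W*sqrt(2)."""
    return w * SQRT2

def to_mid_side_stereo(w, x, y):
    """Blumlein Mid-Side: forward cardioid as Mid, Y as Side."""
    mid = w * SQRT2 + x
    return mid + y, mid - y          # (Left, Right)

def to_crossed_pair_stereo(x, y):
    """Blumlein Crossed Pair: figure-of-eights at +45 and -45 degrees."""
    return (x + y) / SQRT2, (x - y) / SQRT2   # (Left, Right)

# Example input: a unit source at Front-Left (45 degrees), which has
# W = 1/sqrt(2) and X = Y = cos(45 degrees) under the usual encoding.
w, x, y = 1.0 / SQRT2, math.cos(math.radians(45.0)), math.sin(math.radians(45.0))
mid_side = to_mid_side_stereo(w, x, y)
crossed = to_crossed_pair_stereo(x, y)
```

For this input the crossed pair places the source entirely in the Left output, because the Right figure-of-eight has its null pointing at Front-Left.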

UHJ format

B-Format is the main format for Ambisonic files. However, B-Format is not mono- or stereo-compatible. This is why the UHJ hierarchical system was developed. Depending on the number of channels available, the UHJ system can carry more or less information, but at all times it is fully mono- and stereo-compatible. Up to four channels (Left, Right, T, Q) may be used. The T-channel can also be band-limited but, as this "2½-channel UHJ" was only ever used for FM radio transmission, it will not be discussed further.

To listen to UHJ files in surround requires a decoder in your living room. Also, UHJ is restricted to first-order soundfields, either horizontal (two- and three-channel UHJ) or full-sphere (four-channel UHJ).

Converting B-Format channels to UHJ channels, and vice versa, requires the use of wide-band 90-degree phase shifters. In the digital domain these are usually implemented as convolution filters. Conversion between four-channel B-Format (W, X, Y, Z) and four-channel UHJ (Left, Right, T, Q) can be accomplished without loss of information. The same with three-channel to three-channel (W, X, Y) <=> (Left, Right, T). It is possible to recover three-channel B-Format (W, X, Y) from two-channel UHJ (Left, Right), but not without loss. It is also important for the Ambisonic decoder to be aware that the B-Format channels were recovered from two-channel UHJ (because of the need to apply different shelf filters).

Several hundred two-channel UHJ LPs and CDs have been released. Three- and four-channel UHJ recordings have never been commercially released.

UHJ encoding and decoding equations

Encoding

S = 0.9396926*W + 0.1855740*X
D = j(-0.3420201*W + 0.5098604*X) + 0.6554516*Y

Left = (S + D)/2.0
Right = (S - D)/2.0
T = j(-0.1432*W + 0.6512*X) - 0.7071*Y
Q = 0.9772*Z

where j is a +90 degree phase shift
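To make the meaning of j concrete, here is a small sketch of a wide-band +90 degree phase shift applied to a short real sequence. It uses a naive discrete Fourier transform so that it needs no external libraries; this is an illustration only, not the convolution-filter implementation a real encoder would use, and the function name is hypothetical.

```python
import cmath
import math

def phase_shift_90(signal):
    """Advance every frequency component of a real sequence by 90 degrees."""
    n = len(signal)
    # Forward DFT (O(n^2), acceptable for a short demonstration).
    spectrum = [
        sum(signal[t] * cmath.exp(-2j * math.pi * k * t / n) for t in range(n))
        for k in range(n)
    ]
    shifted = []
    for k, value in enumerate(spectrum):
        if k == 0 or 2 * k == n:
            shifted.append(0j)           # DC and Nyquist cannot be phase shifted
        elif 2 * k < n:
            shifted.append(1j * value)   # positive frequencies: +90 degrees
        else:
            shifted.append(-1j * value)  # negative frequencies: mirror image
    # Inverse DFT; the imaginary part is rounding error only.
    return [
        (sum(shifted[k] * cmath.exp(2j * math.pi * k * t / n)
             for k in range(n)) / n).real
        for t in range(n)
    ]

# A cosine advanced by 90 degrees becomes a negated sine.
N = 32
cosine = [math.cos(2.0 * math.pi * 3 * t / N) for t in range(N)]
shifted_cosine = phase_shift_90(cosine)
```

Applying the function to X and W and then forming the weighted sums gives the j(...) terms in the equations for D and T.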

Decoding

For two-channel UHJ:

S = Left + Right
D = Left - Right

W = 0.982*S + j*0.164*D
X = 0.419*S - j*0.828*D
Y = 0.763*D + j*0.385*S

where j is a +90 degree phase shift

Note that two-channel UHJ requires the player to use different shelf filters than for B-Format (or for three- and four-channel UHJ).

For three- and four-channel UHJ:

S = Left + Right
D = Left - Right

W = 0.982*S + j*0.197*(0.828*D + 0.768*T)
X = 0.419*S - j(0.828*D + 0.768*T)
Y = 0.796*D - 0.676*T + j*0.187*S
Z = 1.023*Q

where j is a +90 degree phase shift
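The lossless round trip between B-Format and three- or four-channel UHJ can be checked with a short sketch. It represents each channel as a complex phasor, that is, the amplitude and phase of one steady-state sinusoid, so that the +90 degree phase shift j becomes multiplication by 1j; real wide-band audio still needs phase-shift filters. The sketch takes S = Left + Right and D = Left - Right, the algebraic inverse of the encoder's Left = (S + D)/2 and Right = (S - D)/2. The function names are illustrative.

```python
def uhj_encode(w, x, y, z):
    """B-Format phasors (W, X, Y, Z) to UHJ phasors (Left, Right, T, Q)."""
    s = 0.9396926 * w + 0.1855740 * x
    d = 1j * (-0.3420201 * w + 0.5098604 * x) + 0.6554516 * y
    left = (s + d) / 2.0
    right = (s - d) / 2.0
    t = 1j * (-0.1432 * w + 0.6512 * x) - 0.7071 * y
    q = 0.9772 * z
    return left, right, t, q

def uhj_decode(left, right, t, q):
    """UHJ phasors (Left, Right, T, Q) back to B-Format phasors."""
    s = left + right   # assumption: undoes Left = (S + D)/2, Right = (S - D)/2
    d = left - right
    w = 0.982 * s + 1j * 0.197 * (0.828 * d + 0.768 * t)
    x = 0.419 * s - 1j * (0.828 * d + 0.768 * t)
    y = 0.796 * d - 0.676 * t + 1j * 0.187 * s
    z = 1.023 * q
    return w, x, y, z

original = (0.7, 0.5, -0.3, 0.2)
recovered = uhj_decode(*uhj_encode(*original))
worst_error = max(abs(r - o) for r, o in zip(recovered, original))
```

With this scaling the recovered channels agree with the originals to within the rounding of the published coefficients, roughly three decimal places.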

There is a file specification for downloadable two-channel UHJ files called the ".uhj" specification, but it is not currently in use.

Limitations of the ".uhj" specification

The ".uhj" specification for downloadable two-channel UHJ files is based on the WAVE or WAVE-EX format. A UHJ chunk is added to the file to indicate it is UHJ. As unrecognized chunks are always skipped, use of this chunk maintains stereo compatibility. Some of the limitations of the specification are:

  1. It is limited to 4 GByte files (2 GBytes if software treats the 32-bit size field as signed).
  2. It is limited to two-channel UHJ files. Three- and four-channel UHJ are not accommodated.
  3. No compression.

The ".uhj" specification is only defined for two-channel UHJ to maintain stereo compatibility. While it would be possible to add the UHJ chunk to three- and four-channel WAVE-EX files, the recommendation from Microsoft for playing such files is that the audio device should render the extra channels to output ports not in use. This can happen even when the extra channels are masked off. (Put simply, in WAVE-EX files the channel mask does not mask channels.) Because of this, three- and four-channel WAVE-EX files cannot be made stereo compatible.

In the Xiph world, it should be possible to use default channel conversions to ensure that three- and four-channel UHJ files remain stereo compatible.

Default channel conversions from UHJ

Converting a UHJ file to a mono file is straightforward. Use Mono = (Left + Right) / sqrt(2).

Converting a UHJ file to a stereo file is even easier. Use Left = Left, Right = Right, and discard T and Q if present.

G-Format

A G-Format file is any common multi-channel surround file containing an Ambisonic soundfield pre-decoded to its speaker feeds. This allows listeners who do not own an Ambisonic decoder to enjoy Ambisonics.

The sound engineer creates a set of speaker feeds for a particular number and arrangement of speakers. This is typically four speakers arranged in a square. Other speaker arrangements are also possible.

In Ambisonics, all speakers cooperate to localize sounds in any particular direction; there are no "surround speakers" as such. Because of this, best results when playing G-Format recordings (and Ambisonics in general) are obtained when the speakers are matched. The easiest way to accomplish this is to use identical speakers. Unfortunately, many home theater systems include a center-front speaker which is different from the other speakers.

An easy way to cope with this is adopted on G-Format recordings commercially released on DVD-A by Nimbus Records. They use four speakers in a square, the center-front speaker being unused.

Recovering B-Format from G-Format

It is sometimes possible to recover the original B-Format channels from the G-Format speaker feeds. The recovered B-Format channels can then be fed to a decoder in the listener's living room, and so accommodate a speaker arrangement different from the one used when the G-Format file was produced. Each B-Format channel is recovered using a weighted combination of the speaker feeds in the G-Format file. The conversion coefficients required for the B-Format recovery depend on the particular speaker arrangement chosen by the sound engineer. (Obviously, if a B-Format version of the file also exists then it can be fed to the decoder directly without the need for G-Format.)
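As an illustration of recovery by weighted combination, the sketch below decodes horizontal B-Format to four speakers in a square and then recovers W, X and Y from the feeds. The decoder is a hypothetical basic one (no shelf filters, an arbitrary overall gain), chosen only so that the recovery weights can be derived by hand; a real G-Format release may use different decoding coefficients, which is why the coefficients need to travel with the file.

```python
import math

# Square layout: Front-Left, Back-Left, Back-Right, Front-Right.
SPEAKER_AZIMUTHS_DEG = (45.0, 135.0, -135.0, -45.0)
GAIN = 0.5  # assumed overall decoder gain

def decode_to_square(w, x, y):
    """Hypothetical basic decoder producing one feed per speaker."""
    feeds = []
    for az_deg in SPEAKER_AZIMUTHS_DEG:
        az = math.radians(az_deg)
        feeds.append(GAIN * (math.sqrt(2.0) * w
                             + math.cos(az) * x
                             + math.sin(az) * y))
    return feeds

def recover_b_format(feeds):
    """Recover (W, X, Y) as weighted sums of the four speaker feeds."""
    angles = [math.radians(a) for a in SPEAKER_AZIMUTHS_DEG]
    # The cosines and sines sum to zero around the square, so only W survives.
    w = sum(feeds) / (4.0 * GAIN * math.sqrt(2.0))
    # Sum of cos^2 (and of sin^2) over the four speakers is 2.
    x = sum(f * math.cos(a) for f, a in zip(feeds, angles)) / (2.0 * GAIN)
    y = sum(f * math.sin(a) for f, a in zip(feeds, angles)) / (2.0 * GAIN)
    return w, x, y

feeds = decode_to_square(0.6, 0.4, -0.2)
recovered = recover_b_format(feeds)
```

A horizontal square carries no height information, so Z cannot be recovered from feeds like these.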

File formats for G-Format include all multi-channel formats that contain speaker feeds. However, these will not contain information to allow the B-Format channels to be automatically recovered. A ".amg" file format (based on WAVE-EX) for downloadable G-Format files, which will allow the B-Format channels to be automatically recovered, has been proposed. Such file formats have the advantage of storing the conversion coefficients at the time the G-Format file is created. This is the only time the required information is readily available.

Default channel conversions from G-Format

Converting a G-Format file to a mono or stereo file is straightforward. First, recover the B-Format channels using the conversion coefficients contained in the file. Second, follow the advice given above for Default channel conversions from B-Format.

An alternative approach is to encode directly into the file the coefficients for producing the stereo mix. An appropriate chunk for WAVE-EX files has been proposed. This could be extended to other multi-channel file formats.

Resources on Ambisonics

  • There is a set of Wikipedia articles on Ambisonics.
  • Of particular relevance is the ".amb" specification in use for downloadable B-Format files. However, the ".amb" spec has some limitations which it would be useful to overcome.
  • There is also the ".uhj" specification for downloadable two-channel UHJ files, but it is not currently in use. The ".uhj" spec also has some limitations which it would be useful to overcome.
  • This website has many pages on Ambisonics (including at the bottom links to other Ambisonic websites).
  • The Ambisonic.Net website includes a detailed series of descriptive and practical articles on current and past Ambisonic techniques with links to tools, other sites and additional material.
  • Richard Lee's page on Ambisonics contains articles on shelf filters and the design of Ambisonic decoders.