Changes


OggPCM

4,552 bytes removed, 18:06, 6 September 2009
comments have been solicited, without effective feedback. removing notice of draft status.
== OggPCM ==
 
The following is a draft format for OggPCM. This is a work in progress and not a final proposal. In particular, there is no agreement yet on the channel mapping extra headers.
OggPCM is an encapsulation of PCM audio data into an Ogg logical bitstream. An OggPCM bitstream may be concurrently multiplexed with other Ogg logical bitstreams such as [[OggUVS]] video or [[CMML]] metadata.
Unless otherwise noted, all multi-byte fields use network byte order (big endian).

:''Portable players are usually ARM, which is usually little-endian. The Macintosh is now little-endian. Obviously the PC is little-endian. Clearly there is a winner. It's long past time to stop putting the bytes in an order that makes both programmers and computers do extra work for no good reason. Don't try to hold back the tide.''

The first packet in a stream MUST be the main header packet. The second packet MUST be the comment packet. Some extra header packets MAY be included after the comment header, provided this is identified in the main header. The packets that follow MUST all be data packets.
=== Main Header Packet ===
16 0x00 Version Minor (backwards compatible, i.e. more supported format IDs)
32 [uint] PCM format
32 [uint] Sampling rate [Hz] -- this should be a rational with at least a 22-bit numerator and 10-bit denominator
8 [uint] Number of significant bits
8 [uint] Number of Channels (< 256)
A PCM "frame" is composed of samples for all channels at a given time.
 
An integer sampling rate is trouble. Audio does not always come that way. For example, audio is sometimes tied to the NTSC frame rate of 30000/1001. That 1001 can show up in the sample rate, and thus needs 10 bits. Rates with a 3 in the denominator are common too. Super Audio CD needs 22 bits to represent 2.8224 MHz. So 22 bits and 10 bits will do the job. Better would be 32 bits for both numerator and denominator of course. A float will never be quite right, though it sure beats an integer and will in fact hold exact values into the MHz. One can't express 1/3 or 1/10 as a float, so 12345.6 and 12345.6666... are undoable that way. BTW, allowing for subsonic recording would be nice.
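
To make the rational-rate suggestion concrete, here is a minimal sketch of packing a rate into one 32-bit word. The bit layout (22-bit numerator in the high bits, 10-bit denominator in the low bits) and the function names are assumptions for illustration only; nothing above fixes them.

```python
from fractions import Fraction

NUM_BITS = 22  # numerator width suggested in the comment above
DEN_BITS = 10  # denominator width suggested in the comment above


def pack_rate(rate: Fraction) -> int:
    """Pack a positive rational rate into 32 bits (assumed layout: numerator high)."""
    num, den = rate.numerator, rate.denominator
    if not 0 < num < (1 << NUM_BITS):
        raise ValueError(f"numerator {num} does not fit in {NUM_BITS} bits")
    if den >= (1 << DEN_BITS):
        raise ValueError(f"denominator {den} does not fit in {DEN_BITS} bits")
    return (num << DEN_BITS) | den


def unpack_rate(word: int) -> Fraction:
    """Inverse of pack_rate; rejects a zero denominator."""
    num = word >> DEN_BITS
    den = word & ((1 << DEN_BITS) - 1)
    if den == 0:
        raise ValueError("denominator must not be zero")
    return Fraction(num, den)


sacd = unpack_rate(pack_rate(Fraction(2822400)))      # Super Audio CD rate
ntsc = unpack_rate(pack_rate(Fraction(30000, 1001)))  # NTSC frame rate
third = unpack_rate(pack_rate(Fraction(1, 3)))        # a subsonic rate
```

Under this assumed layout a rate such as 48000000/1001 would be rejected, because its numerator needs more than 22 bits; that is one case where the wider 32-bit numerator and denominator mentioned above would help.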
The "Codec identifier" is 64 bits long since most other Ogg codecs specify their identifier within the first 64 bits rather than the first 32 bits, so this allows applications to match on all 64 bits consistently.
The first optional headers to be defined handle mappings from physically stored channels to logical channels, such as speaker feeds and Ambisonic signals.
==== Channel Mapping, option 1 Headers ====

===== Channel Mapping Header =====

The channel mapping header is defined as:

 32 0x00000000 Header ID
 16 [uint] Major version
 16 [uint] Minor version
 32 [uint] Channel type
 32x2N [uint] Channel map (channel-target pairs)

All channel_types less than 0x80000000 are reserved for use by Xiph; 0x80000000 and above are allowed for application specific extensions. This scheme allows for 2^31 -1 Xiph defined channel map types and 2^32 distinct channel names.

Example values for channel types might be:

 OGG_CHANNEL_MAP_MONO = 0
 OGG_CHANNEL_MAP_STEREO = 1
 OGG_CHANNEL_MAP_MS_WAVE = 2
 OGG_CHANNEL_MAP_QUADRAPHONIC = 3

and defined channels might be:

 OGG_CHANNEL_FRONT_CENTER = 0
 OGG_CHANNEL_FRONT_LEFT = 1
 OGG_CHANNEL_FRONT_RIGHT = 2
 OGG_CHANNEL_SURROUND_LEFT = 3
 OGG_CHANNEL_SURROUND_RIGHT = 4
 OGG_CHANNEL_SURROUND_REAR = 5
 OGG_CHANNEL_REAR_LEFT = 6
 OGG_CHANNEL_REAR_RIGHT = 7
 OGG_CHANNEL_LFE_CENTER = 8
 OGG_CHANNEL_LFE_LEFT = 9
 OGG_CHANNEL_LFE_RIGHT = 10

A stereo file could thus be defined as:

 channel_type = OGG_CHANNEL_MAP_STEREO
 channel_map [0] = OGG_CHANNEL_FRONT_LEFT
 channel_map [1] = OGG_CHANNEL_FRONT_RIGHT

The channel map in this case is: "0 1 1 2".

===== Channel Mapping Defaults =====

(ideas by JMV, not yet approved by anyone else. Should be merged in respective header definition above if approved)

In order to simplify implementations when it comes to channel mappings, several defaults are defined when no extra header is present.

* Files containing one channel are assumed to be plain mono files with:
 channel_type = OGG_CHANNEL_MAP_MONO
 channel_map [0] = OGG_CHANNEL_FRONT_CENTER
* Files containing two channels are assumed to be stereo files with:
 channel_type = OGG_CHANNEL_MAP_STEREO
 channel_map [0] = OGG_CHANNEL_FRONT_LEFT
 channel_map [1] = OGG_CHANNEL_FRONT_RIGHT
* Files containing three channels are assumed to be B-format Ambisonic files with:
 channel_type = OGG_CHANNEL_MAP_B_FORMAT
 channel_map [0] = OGG_CHANNEL_W
 channel_map [1] = OGG_CHANNEL_X
 channel_map [2] = OGG_CHANNEL_Y
* Files containing four channels are assumed to be B-format Ambisonic files with:
 channel_type = OGG_CHANNEL_MAP_B_FORMAT
 channel_map [0] = OGG_CHANNEL_W
 channel_map [1] = OGG_CHANNEL_X
 channel_map [2] = OGG_CHANNEL_Y
 channel_map [3] = OGG_CHANNEL_Z

===== Channel Conversion Header =====

Any number of channel conversion headers can be specified. This header specifies how to down-mix the data to another format.

 32 0x00000001 Remixing Header Id
 16 [uint] Major version
 16 [uint] Minor version
 32 [uint] Target Channel type
 32xMxN [sint] Target Channel (M) x Src Channel (N) Gain array

The ordering of the mixing matrix is such that source channel gains are consecutive. The gain (note: *signed* integer) has the 16 MSBs for the integer part (including sign) and 16 bits for the fractional part of the gain. Note: the gain can be negative.

===== Channel Conversion Defaults =====

* Stereo files SHOULD be converted to a mono file by averaging the left channel and the right channel.
* Ambisonic files SHOULD be converted to a mono file using Mono = W*sqrt(2).
* Ambisonic files SHOULD be converted to stereo files by dematrixing W, X and Y.

==== Channel Mapping, proposed option 2 ====

This proposed version of Channel Mapping has not yet gained the support of the Xiph.Org Foundation. However, it is likely the more mature of the two proposals. It still needs a bit more polish, though.
Channel mappings are used to convey the meaning of the PCM signals stored in an OggPCM stream. They have been designed so that commonly used transmission formats like stereo, 5.1 and Ambisonics can be accurately tagged and distinguished from each other. Rudimentary downmixing from multichannel formats to stereo and mono and interoperability with compatibility formats like Dolby Surround and Ambisonics UHJ are also supported.
// front center/mono
OGG_CHANNEL_SCREEN_CENTER = 256 = 0x00000100 (ear level, straight ahead, at screen distance)
OGG_CHANNEL_MS_MID = 257 = 0x00000101 (cardioid response, straight ahead)
OGG_CHANNEL_FRONT_CENTER = 258 = 0x00000102 (ear level, straight ahead)
// lfe
Unless otherwise indicated, the logical channels are assumed to be speaker feeds, with the speaker lying in the indicated direction. The direction is referenced to either the front center or, where indicated, the back center speaker. By default all of the speakers SHOULD be at the same distance from the listening position, the so-called "sweet spot", so that temporally coincident signals give rise to temporally coincident sound there. Where the channel_type indicates an interpretation other than a speaker feed, temporal coincidence SHOULD still hold.
Some of the base standards used to derive the channel mappings are sensitive to speaker distance in addition to any possible time delay, and some are not. In any case interoperability between the different standards calls for setting the distance. The base standards used to derive the channel map rarely take a stance on that, so for the purposes of this specification the speaker distance, the listening area, and the Ambisonics coding radius are all idealized as being infinite, unless otherwise noted. Hence, the field produced by any speaker feed SHOULD by default approximate a planar wave at the sweet spot.
Unless otherwise indicated, each channel should give rise to the same sound pressure level at the listener. The channel mapping metadata does not impose an absolute reference level for the channel data. The relative levels for ambisonic channels are given by the Furse-Malham convention.
===== Defaulting and Standard Mappings =====
OggPCM streams were originally defined without channel maps, so for compatibility purposes, the simplest cases are defaulted based on the number of physical channels present. The precise Channel Mapping Headers and Channel Conversion Headers that are implied are specified below. Further INFORMATIVE mappings for various channel layouts can be found ''[[channel mapping examples|in a companion document]]''.
* Files containing precisely one channel and no explicit channel map are assumed to contain plain mono.
16 0x0000 Version Minor 0
32 0x00000000 Channel 0
32 0x00000100 Channel_type OGG_CHANNEL_SCREEN_CENTER
32 0x00000001 Channel Conversion Header
16 0x0000 Version Major 0
16 0x0000 Version Minor 0
32 0x00000000 Channel 0, containing OGG_CHANNEL_SCREEN_CENTER
32 0x00000000 Channel_type OGG_CHANNEL_STEREO_LEFT
32 0x0000B504 Mixing coefficient 1/sqrt(2)
32 0x00000000 Channel 0, containing OGG_CHANNEL_SCREEN_CENTER
32 0x00000001 Channel_type OGG_CHANNEL_STEREO_RIGHT
32 0x0000B504 Mixing coefficient 1/sqrt(2)
16 0x0000 Version Minor 0
32 0x00000000 Channel 0, containing OGG_CHANNEL_STEREO_LEFT
32 0x00000100 Channel_type OGG_CHANNEL_SCREEN_CENTER
32 0x0000B504 Mixing coefficient 1/sqrt(2)
32 0x00000001 Channel 1, containing OGG_CHANNEL_STEREO_RIGHT
32 0x00000100 Channel_type OGG_CHANNEL_SCREEN_CENTER
32 0x0000B504 Mixing coefficient 1/sqrt(2)
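
The stereo-to-mono rows above can be read as (source channel, target channel_type, mixing coefficient) triples. The sketch below applies such triples to one PCM frame; the function name <code>downmix_frame</code>, the use of floating-point samples, and the triple interpretation itself are assumptions made for illustration.

```python
FIXED_ONE = 1 << 16  # 16.16 fixed point: 0x00010000 represents a gain of 1

# (source channel index, target channel_type, mixing coefficient),
# copied from the stereo-to-mono rows above.
STEREO_TO_MONO = [
    (0, 0x00000100, 0x0000B504),  # left channel, gain of about 1/sqrt(2)
    (1, 0x00000100, 0x0000B504),  # right channel, gain of about 1/sqrt(2)
]


def downmix_frame(frame, entries):
    """Return {target channel_type: mixed sample} for one PCM frame.

    Coefficients are treated as non-negative here; a negative gain would
    first need two's-complement decoding of the 32-bit word.
    """
    out = {}
    for src, target, coeff in entries:
        out[target] = out.get(target, 0.0) + frame[src] * (coeff / FIXED_ONE)
    return out


# Equal left and right samples of 0.5 mix to about 0.7071 in the mono target.
mono = downmix_frame([0.5, 0.5], STEREO_TO_MONO)
```

Accumulating per target channel_type means the same loop would also handle conversions with several targets, such as the multichannel-to-stereo cases.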
16 0x0000 Version Minor 0
32 0x00000000 Channel 0, containing OGG_CHANNEL_AMBISONICS_W
32 0x00000100 Channel_type OGG_CHANNEL_SCREEN_CENTER
32 0x00016A09 Mixing coefficient sqrt(2)
16 0x0000 Version Minor 0
32 0x00000000 Channel 0, containing OGG_CHANNEL_AMBISONICS_W
32 0x00000100 Channel_type OGG_CHANNEL_SCREEN_CENTER
32 0x00016A09 Mixing coefficient sqrt(2)
32 0x00000001 Channel_type OGG_CHANNEL_STEREO_RIGHT
32 0x00000002 Channel 2
32 0x00000100 Channel_type OGG_CHANNEL_SCREEN_CENTER
32 0x00000003 Channel 3
32 0x00000200 Channel_type OGG_CHANNEL_LFE
32 0x00000001 Channel_type OGG_CHANNEL_STEREO_RIGHT
32 0x00010000 Mixing coefficient 1
32 0x00000002 Channel 2, containing OGG_CHANNEL_SCREEN_CENTER
32 0x00000000 Channel_type OGG_CHANNEL_STEREO_LEFT
32 0x0000B504 Mixing coefficient 1/sqrt(2)
32 0x00000002 Channel 2, containing OGG_CHANNEL_SCREEN_CENTER
32 0x00000001 Channel_type OGG_CHANNEL_STEREO_RIGHT
32 0x0000B504 Mixing coefficient 1/sqrt(2)
16 0x0000 Version Minor 0
32 0x00000000 Channel 0, containing OGG_CHANNEL_STEREO_LEFT
32 0x00000100 Channel_type OGG_CHANNEL_SCREEN_CENTER
32 0x0000B504 Mixing coefficient 1/sqrt(2)
32 0x00000001 Channel 1, containing OGG_CHANNEL_STEREO_RIGHT
32 0x00000100 Channel_type OGG_CHANNEL_SCREEN_CENTER
32 0x0000B504 Mixing coefficient 1/sqrt(2)
32 0x00000002 Channel 2, containing OGG_CHANNEL_SCREEN_CENTER
32 0x00000100 Channel_type OGG_CHANNEL_SCREEN_CENTER
32 0x00010000 Mixing coefficient 1
32 0x00000003 Channel 3, containing OGG_CHANNEL_LFE
32 0x00000100 Channel_type OGG_CHANNEL_SCREEN_CENTER
32 0x000A0000 Mixing coefficient 10
32 0x00000004 Channel 4, containing OGG_CHANNEL_ITU_BACK_LEFT
32 0x00000100 Channel_type OGG_CHANNEL_SCREEN_CENTER
32 0x0000B504 Mixing coefficient 1/sqrt(2)
32 0x00000005 Channel 5, containing OGG_CHANNEL_ITU_BACK_RIGHT
32 0x00000100 Channel_type OGG_CHANNEL_SCREEN_CENTER
32 0x0000B504 Mixing coefficient 1/sqrt(2)
32 0x00000001 Channel_type OGG_CHANNEL_STEREO_RIGHT
32 0x00000002 Channel 2
32 0x00000100 Channel_type OGG_CHANNEL_SCREEN_CENTER
32 0x00000003 Channel 3
32 0x00000200 Channel_type OGG_CHANNEL_LFE
32 0x00000001 Channel_type OGG_CHANNEL_STEREO_RIGHT
32 0x00010000 Mixing coefficient 1
32 0x00000002 Channel 2, containing OGG_CHANNEL_SCREEN_CENTER
32 0x00000000 Channel_type OGG_CHANNEL_STEREO_LEFT
32 0x0000B504 Mixing coefficient 1/sqrt(2)
32 0x00000002 Channel 2, containing OGG_CHANNEL_SCREEN_CENTER
32 0x00000001 Channel_type OGG_CHANNEL_STEREO_RIGHT
32 0x0000B504 Mixing coefficient 1/sqrt(2)
16 0x0000 Version Minor 0
32 0x00000000 Channel 0, containing OGG_CHANNEL_STEREO_LEFT
32 0x00000100 Channel_type OGG_CHANNEL_SCREEN_CENTER
32 0x0000B504 Mixing coefficient 1/sqrt(2)
32 0x00000001 Channel 1, containing OGG_CHANNEL_STEREO_RIGHT
32 0x00000100 Channel_type OGG_CHANNEL_SCREEN_CENTER
32 0x0000B504 Mixing coefficient 1/sqrt(2)
32 0x00000002 Channel 2, containing OGG_CHANNEL_SCREEN_CENTER
32 0x00000100 Channel_type OGG_CHANNEL_SCREEN_CENTER
32 0x00010000 Mixing coefficient 1
32 0x00000003 Channel 3, containing OGG_CHANNEL_LFE
32 0x00000100 Channel_type OGG_CHANNEL_SCREEN_CENTER
32 0x000A0000 Mixing coefficient 10
32 0x00000004 Channel 4, containing OGG_CHANNEL_ITU_BACK_LEFT
32 0x00000100 Channel_type OGG_CHANNEL_SCREEN_CENTER
32 0x00008000 Mixing coefficient 1/2
32 0x00000005 Channel 5, containing OGG_CHANNEL_ITU_BACK_RIGHT
32 0x00000100 Channel_type OGG_CHANNEL_SCREEN_CENTER
32 0x00008000 Mixing coefficient 1/2
32 0x00000006 Channel 6, containing OGG_CHANNEL_BACK_CENTER
32 0x00000100 Channel_type OGG_CHANNEL_SCREEN_CENTER
32 0x0000B504 Mixing coefficient 1/sqrt(2)
32 0x00000001 Channel_type OGG_CHANNEL_STEREO_RIGHT
32 0x00000002 Channel 2
32 0x00000100 Channel_type OGG_CHANNEL_SCREEN_CENTER
32 0x00000003 Channel 3
32 0x00000200 Channel_type OGG_CHANNEL_LFE
32 0x00000001 Channel_type OGG_CHANNEL_STEREO_RIGHT
32 0x00010000 Mixing coefficient 1
32 0x00000002 Channel 2, containing OGG_CHANNEL_SCREEN_CENTER
32 0x00000000 Channel_type OGG_CHANNEL_STEREO_LEFT
32 0x0000B504 Mixing coefficient 1/sqrt(2)
32 0x00000002 Channel 2, containing OGG_CHANNEL_SCREEN_CENTER
32 0x00000001 Channel_type OGG_CHANNEL_STEREO_RIGHT
32 0x0000B504 Mixing coefficient 1/sqrt(2)
16 0x0000 Version Minor 0
32 0x00000000 Channel 0, containing OGG_CHANNEL_STEREO_LEFT
32 0x00000100 Channel_type OGG_CHANNEL_SCREEN_CENTER
32 0x0000B504 Mixing coefficient 1/sqrt(2)
32 0x00000001 Channel 1, containing OGG_CHANNEL_STEREO_RIGHT
32 0x00000100 Channel_type OGG_CHANNEL_SCREEN_CENTER
32 0x0000B504 Mixing coefficient 1/sqrt(2)
32 0x00000002 Channel 2, containing OGG_CHANNEL_SCREEN_CENTER
32 0x00000100 Channel_type OGG_CHANNEL_SCREEN_CENTER
32 0x00010000 Mixing coefficient 1
32 0x00000003 Channel 3, containing OGG_CHANNEL_LFE
32 0x00000100 Channel_type OGG_CHANNEL_SCREEN_CENTER
32 0x000A0000 Mixing coefficient 10
32 0x00000004 Channel 4, containing OGG_CHANNEL_BACK_STEREO_LEFT
32 0x00000100 Channel_type OGG_CHANNEL_SCREEN_CENTER
32 0x00008000 Mixing coefficient 1/2
32 0x00000005 Channel 5, containing OGG_CHANNEL_BACK_STEREO_RIGHT
32 0x00000100 Channel_type OGG_CHANNEL_SCREEN_CENTER
32 0x00008000 Mixing coefficient 1/2
32 0x00000006 Channel 6, containing OGG_CHANNEL_SIDE_LEFT
32 0x00000100 Channel_type OGG_CHANNEL_SCREEN_CENTER
32 0x0000B504 Mixing coefficient 1/sqrt(2)
32 0x00000007 Channel 7, containing OGG_CHANNEL_SIDE_RIGHT
32 0x00000100 Channel_type OGG_CHANNEL_SCREEN_CENTER
32 0x0000B504 Mixing coefficient 1/sqrt(2)
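
The mixing coefficients in the tables above are consistent with a signed 16.16 fixed-point encoding (16 integer bits including sign, 16 fractional bits). The following sketch converts between gains and 32-bit words; the helper names are hypothetical, and truncation toward zero is an assumption chosen because it reproduces the tabulated values (rounding to nearest would give 0x0000B505 for 1/sqrt(2)).

```python
import math


def gain_to_fixed(gain: float) -> int:
    """Encode a gain as a 32-bit two's-complement 16.16 word (truncating)."""
    raw = int(gain * 65536)  # int() truncates toward zero
    if not -(1 << 31) <= raw < (1 << 31):
        raise ValueError("gain out of range for 16.16 fixed point")
    return raw & 0xFFFFFFFF


def fixed_to_gain(word: int) -> float:
    """Decode a 32-bit two's-complement 16.16 word into a float gain."""
    if word & 0x80000000:  # sign bit set: negative gain
        word -= 1 << 32
    return word / 65536


inv_sqrt2 = gain_to_fixed(1 / math.sqrt(2))  # 0x0000B504
sqrt2 = gain_to_fixed(math.sqrt(2))          # 0x00016A09
ten = gain_to_fixed(10)                      # 0x000A0000
half = gain_to_fixed(0.5)                    # 0x00008000
minus_one = gain_to_fixed(-1)                # 0xFFFF0000
```

Decoding <code>0xFFFF0000</code> with <code>fixed_to_gain</code> gives -1.0, which shows how a negative gain would be carried in the same 32-bit field.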
*[http://developer.apple.com/documentation/MusicAudio/Reference/CAFSpec/CAF_intro/chapter_1_section_1.html#//apple_ref/doc/uid/TP40001862-CH203-DontLinkElementID_60 Apple Core Audio Format 1.0 specification]
*[http://www.acoustics.hut.fi/research/cat/vbap/ Vector Base Amplitude Panning]
 
[[Category:Drafts]]
[[Category:Ogg Mappings]]