OggPCM: Difference between revisions
Revision as of 17:18, 21 February 2006
This page was created as an alternative to the original OggPCM_Draft1. After a heated debate most developers are now working on this version of the spec.
Draft format 2 for OggPCM
The following is a draft format for OggPCM. This is a work in progress and not a final proposal. In particular, there is no agreement yet on the channel mapping extra headers.
OggPCM is an encapsulation of PCM audio data into an Ogg logical bitstream. An OggPCM bitstream may be concurrently multiplexed with other Ogg logical bitstreams such as OggUVS video or CMML metadata.
Note that unless otherwise noted, all multi-byte fields use the network byte order (big endian). The first packet in a stream MUST be the main header packet. The second packet MUST be the comment packet. Some extra header packets MAY be included after the comment header, provided this is identified in the main header. The packets that follow MUST all be data packets.
Main Header Packet
Multibyte fields in the header packets are packed in big endian order, to be consistent with network byte order. A header packet contains the following fields:
Bits  Value   Field
64    "PCM "  Codec identifier
16    0x00    Version Major (breaks backwards compatibility to increment)
16    0x00    Version Minor (backwards compatible, ie, more supported format id's)
32    [uint]  PCM format
32    [uint]  Sampling rate [Hz]
8     [uint]  Number of significant bits
8     [uint]  Number of Channels (< 256)
16    [uint]  Maximum number of frames per packet
32    [uint]  Number of extra header packets
A PCM "frame" is composed of samples for all channels at a given time.
The "Codec identifier" is 64 bits long because most other Ogg codecs specify their identifier within the first 64 bits rather than the first 32 bits. This allows applications to match on all 64 bits consistently.
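The field layout above maps directly onto a fixed-size big-endian record. The following is a minimal parsing sketch, not a reference implementation. The names `MainHeader` and `parse_main_header` are hypothetical, and the check assumes the identifier is "PCM" padded with spaces to fill 8 bytes, which this draft does not spell out.

```python
import struct
from typing import NamedTuple

# Layout taken from the field table: 64-bit identifier, two 16-bit version
# fields, 32-bit format, 32-bit rate, 8-bit significant bits, 8-bit channel
# count, 16-bit max frames per packet, 32-bit extra header count.
# ">" selects big endian with no padding between fields.
MAIN_HEADER = struct.Struct(">8sHHIIBBHI")


class MainHeader(NamedTuple):
    codec_id: bytes
    version_major: int
    version_minor: int
    pcm_format: int
    sample_rate: int
    significant_bits: int
    channels: int
    max_frames_per_packet: int
    extra_headers: int


def parse_main_header(packet: bytes) -> MainHeader:
    """Decode the first packet of a stream into its header fields."""
    if len(packet) < MAIN_HEADER.size:
        raise ValueError("main header packet is too short")
    header = MainHeader(*MAIN_HEADER.unpack_from(packet))
    # Assumption: the identifier is "PCM" followed by space padding.
    if header.codec_id.rstrip(b" ") != b"PCM":
        raise ValueError("not an OggPCM stream")
    # A different major version is not backwards compatible.
    if header.version_major != 0:
        raise ValueError("unsupported major version")
    return header


# Hypothetical header: 16-bit little-endian stereo at 44.1 kHz.
example = MAIN_HEADER.pack(b"PCM     ", 0, 0, 0x00000002, 44100, 16, 2, 1024, 0)
parsed = parse_main_header(example)
```

The fixed part of the header is 28 bytes, which is the sum of the bit widths in the table (224 bits).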
The "Maximum number of frames per packet" field is meant to notify an application reading the file that no data packet will contain more than a certain number of frames. This not only makes implementation easier, but also provides information on how much needs to be buffered when streaming PCM files. A value of 0 means a maximum of 65536 frames. Implementations SHOULD make this field such that packets do not get split into multiple pages.
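As an illustration of how a reader might use this field to size a buffer, here is a small sketch. The helper names are hypothetical, and the byte calculation assumes a format whose samples occupy a whole number of bytes.

```python
def max_frames_per_packet(field: int) -> int:
    """Interpret the 16-bit header field; 0 stands for 65536 frames."""
    if not 0 <= field <= 0xFFFF:
        raise ValueError("field must fit in 16 bits")
    return 65536 if field == 0 else field


def max_packet_bytes(field: int, channels: int, bytes_per_sample: int) -> int:
    """Upper bound on the payload size of any data packet in the stream."""
    return max_frames_per_packet(field) * channels * bytes_per_sample
```

For example, a field value of 1024 with two channels of 16-bit samples gives an upper bound of 4096 bytes per packet.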
The "Number of significant bits" field specifies how many bits are actually used. The other bits MUST be zero. This can be used to support audio with any resolution. For example, 12-bit PCM can be supported as "16 bit PCM" for the format and 12 for the number of significant bits.
For streams where the number of significant bits is the same as the bit width specified by the format, the significant bits field may be set to zero.
For streams where the number of significant bits is less than that specified by the bit width, the data shall be justified to fill the most significant bits. For 12 bit PCM in a 16 bit format, the 12 valid bits will occupy the 12 most significant bits of the 16 bit word and the least significant 4 bits shall be zero.
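The justification rule amounts to a left shift by the number of unused bits. The sketch below shows one way to apply and check it; the function names are hypothetical and samples are treated as plain signed integers rather than packed bytes.

```python
def effective_significant_bits(field: int, container_bits: int) -> int:
    """Resolve the header field; 0 means 'same as the format's bit width'."""
    if field == 0:
        return container_bits
    if field > container_bits:
        raise ValueError("more significant bits than the format can hold")
    return field


def justify_sample(value: int, significant_bits: int, container_bits: int) -> int:
    """Move a sample into the most significant bits of the container word."""
    return value << (container_bits - significant_bits)


def unjustify_sample(word: int, significant_bits: int, container_bits: int) -> int:
    """Recover the original sample, checking that the unused bits are zero."""
    shift = container_bits - significant_bits
    if word & ((1 << shift) - 1):
        raise ValueError("unused low bits must be zero")
    return word >> shift
```

With 12 significant bits in a 16-bit format, the largest 12-bit sample 2047 is stored as 32752 (0x7FF0) and the smallest, -2048, as -32768.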
Since the main header packet and the comment packet are mandatory, the "extra header packets" field counts any additional header packets (aside from these two) that can be provided before the start of the data packets.
Supported PCM Formats
Format ID    Short Name            Description
-- Integer coding
0x00000000   OGGPCM_FMT_S8         Signed integer 8 bit
0x00000001   OGGPCM_FMT_U8         Unsigned integer 8 bit
0x00000002   OGGPCM_FMT_S16_LE     Signed integer 16 bit little endian
0x00000003   OGGPCM_FMT_S16_BE     Signed integer 16 bit big endian
0x00000004   OGGPCM_FMT_S24_LE     Signed integer 24 bit little endian
0x00000005   OGGPCM_FMT_S24_BE     Signed integer 24 bit big endian
0x00000006   OGGPCM_FMT_S32_LE     Signed integer 32 bit little endian
0x00000007   OGGPCM_FMT_S32_BE     Signed integer 32 bit big endian
-- Compressed PCM
0x00000010   OGGPCM_FMT_ULAW       G.711 u-law encoding (8 bit)
0x00000011   OGGPCM_FMT_ALAW       G.711 A-law encoding (8 bit)
-- IEEE Floating point coding
0x00000020   OGGPCM_FMT_FLT32_LE   IEEE Float [-1,1] 32 bit little endian
0x00000021   OGGPCM_FMT_FLT32_BE   IEEE Float [-1,1] 32 bit big endian
0x00000022   OGGPCM_FMT_FLT64_LE   IEEE Float [-1,1] 64 bit little endian
0x00000023   OGGPCM_FMT_FLT64_BE   IEEE Float [-1,1] 64 bit big endian
Format IDs below 0x80000000 are reserved for use by Xiph; 0x80000000 and above are allowed for application-specific formats.
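A decoder would typically turn the format table into a lookup. The sketch below pairs each listed ID with its short name and sample width in bits; the `classify_format` helper is hypothetical, and treating 0x80000000 itself as application-specific is an assumption based on the wording used for channel types.

```python
# Format ID -> (short name, bits per sample), copied from the table above.
FORMATS = {
    0x00000000: ("OGGPCM_FMT_S8", 8),
    0x00000001: ("OGGPCM_FMT_U8", 8),
    0x00000002: ("OGGPCM_FMT_S16_LE", 16),
    0x00000003: ("OGGPCM_FMT_S16_BE", 16),
    0x00000004: ("OGGPCM_FMT_S24_LE", 24),
    0x00000005: ("OGGPCM_FMT_S24_BE", 24),
    0x00000006: ("OGGPCM_FMT_S32_LE", 32),
    0x00000007: ("OGGPCM_FMT_S32_BE", 32),
    0x00000010: ("OGGPCM_FMT_ULAW", 8),
    0x00000011: ("OGGPCM_FMT_ALAW", 8),
    0x00000020: ("OGGPCM_FMT_FLT32_LE", 32),
    0x00000021: ("OGGPCM_FMT_FLT32_BE", 32),
    0x00000022: ("OGGPCM_FMT_FLT64_LE", 64),
    0x00000023: ("OGGPCM_FMT_FLT64_BE", 64),
}


def classify_format(format_id: int) -> str:
    """Return the short name, or which reserved range an unknown ID falls in."""
    if not 0 <= format_id <= 0xFFFFFFFF:
        raise ValueError("format ID must fit in 32 bits")
    if format_id in FORMATS:
        return FORMATS[format_id][0]
    # Assumption: the boundary value 0x80000000 is application-specific.
    if format_id >= 0x80000000:
        return "application-specific"
    return "reserved"
```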
Comment packet
The codec header is followed by a "vorbis comment" packet and by optional extra headers, if any. The format used is the same as for Vorbis with the exception that there is no packet identifier (so the packet is exactly like it is for Speex).
Data Packets
Data packets contain the raw PCM audio in interleaved format (complete frames are encoded sequentially) with the following definitions/restrictions:
- A PCM "frame" is composed of samples for all channels at a given time.
- Any OggPCM packet MUST contain only complete frames (i.e. samples for all channels at a given sampling instant). Partial frames are forbidden. It is RECOMMENDED that a decoder that encounters an invalid packet containing a partial frame drop the partial frame (at the end) and issue an error.
- There is no padding allowed in a frame except when some bits (<8) are needed to complete a byte. This means that packet size has a direct relationship to the number of frames in the packet (for purposes of seeking).
- Recommended packet size is smaller than 4k since interleaving and seeking in Ogg bitstreams is done on the resolution of packets and thus larger packet sizes create suboptimal bitstreams.
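Because frames are unpadded and interleaved, the frame count follows from the packet length alone. The sketch below decodes one data packet for OGGPCM_FMT_S16_LE only; the function name is hypothetical, and reporting a partial frame through a boolean (rather than raising) is just one possible way to "issue an error".

```python
import struct


def decode_s16_le(packet: bytes, channels: int):
    """Split an interleaved 16-bit little-endian packet into per-channel lists.

    Returns (per_channel_samples, had_partial_frame). Trailing bytes that do
    not form a complete frame are dropped, as the text recommends.
    """
    frame_size = channels * 2  # two bytes per sample
    frame_count, leftover = divmod(len(packet), frame_size)
    usable = packet[: frame_count * frame_size]
    samples = struct.unpack("<%dh" % (frame_count * channels), usable)
    # Sample i of channel c sits at index i * channels + c.
    per_channel = [list(samples[c::channels]) for c in range(channels)]
    return per_channel, leftover != 0


# Two stereo frames followed by one stray byte (an invalid partial frame).
packet = struct.pack("<4h", 1, -1, 2, -2) + b"\x00"
decoded, had_partial = decode_s16_le(packet, channels=2)
```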
Extra Headers (optional)
Extra header packets contain additional information about the OggPCM stream, and must come after the Comment Packet and before the first Data Packet. Each extra header is defined as:
Bits  Value   Field
32    [uint]  Header ID
...           Header data
Two examples for such optional extra header packets are the channel mapping and the channel conversion header packets.
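Since every extra header starts with a 32-bit ID, a reader can dispatch on it before interpreting the rest of the packet. This is a minimal sketch: `parse_extra_header` is a hypothetical name, and the two IDs are the ones given for the channel mapping and channel conversion headers in this draft.

```python
import struct

# Header IDs as given in the header definitions of this draft.
KNOWN_EXTRA_HEADERS = {
    0x00000000: "channel mapping",
    0x00000001: "channel conversion",
}


def parse_extra_header(packet: bytes):
    """Return (header_id, header_data) for one extra header packet."""
    if len(packet) < 4:
        raise ValueError("extra header packet is too short")
    (header_id,) = struct.unpack_from(">I", packet)  # big endian
    return header_id, packet[4:]


header_id, data = parse_extra_header(b"\x00\x00\x00\x01" + b"payload")
kind = KNOWN_EXTRA_HEADERS.get(header_id, "unknown")
```

A reader that does not recognise an ID could skip the packet and continue, since the main header states how many extra header packets precede the data packets.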
Channel Mapping Header
The channel mapping header is defined as:
Bits   Value       Field
32     0x00000000  Header ID
16     [uint]      Major version
16     [uint]      Minor version
32     [uint]      Channel type
32x2N  [uint]      Channel map (channel-target pairs)
All channel_types less than 0x80000000 are reserved for use by Xiph; 0x80000000 and above are allowed for application specific extensions.
This scheme allows for 2^31 Xiph-defined channel map types (0x00000000 through 0x7FFFFFFF) and 2^32 distinct channel names.
Example values for channel types might be:
OGG_CHANNEL_MAP_MONO = 0
OGG_CHANNEL_MAP_STEREO = 1
OGG_CHANNEL_MAP_MS_WAVE = 2
OGG_CHANNEL_MAP_QUADROPHONIC = 3
and defined channels might be:
OGG_CHANNEL_FRONT_CENTER = 0
OGG_CHANNEL_FRONT_LEFT = 1
OGG_CHANNEL_FRONT_RIGHT = 2
OGG_CHANNEL_SURROUND_LEFT = 3
OGG_CHANNEL_SURROUND_RIGHT = 4
OGG_CHANNEL_SURROUND_REAR = 5
OGG_CHANNEL_REAR_LEFT = 6
OGG_CHANNEL_REAR_RIGHT = 7
OGG_CHANNEL_LFE_CENTER = 8
OGG_CHANNEL_LFE_LEFT = 9
OGG_CHANNEL_LFE_RIGHT = 10
A stereo file could thus be defined as:
channel_type = OGG_CHANNEL_MAP_STEREO
channel_map [0] = OGG_CHANNEL_FRONT_LEFT
channel_map [1] = OGG_CHANNEL_FRONT_RIGHT
The channel map in this case is: "0 1 1 2".
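To make the pair encoding concrete, the sketch below serialises the stereo example into a channel mapping header packet. The constant values are the example values listed above (which the draft itself marks as tentative), `pack_channel_mapping` is a hypothetical helper, and the version numbers default to 0 as an assumption because the draft does not fix them.

```python
import struct

# Example values from this draft; not final assignments.
OGG_CHANNEL_MAP_STEREO = 1
OGG_CHANNEL_FRONT_LEFT = 1
OGG_CHANNEL_FRONT_RIGHT = 2

CHANNEL_MAPPING_HEADER_ID = 0x00000000


def pack_channel_mapping(channel_type, targets, major=0, minor=0):
    """Build a channel mapping header; targets[i] is the target of channel i."""
    pairs = []
    for channel, target in enumerate(targets):
        pairs += [channel, target]  # one (channel, target) pair per channel
    return struct.pack(
        ">IHHI%dI" % len(pairs),
        CHANNEL_MAPPING_HEADER_ID, major, minor, channel_type, *pairs
    )


stereo = pack_channel_mapping(
    OGG_CHANNEL_MAP_STEREO,
    [OGG_CHANNEL_FRONT_LEFT, OGG_CHANNEL_FRONT_RIGHT],
)
# The trailing four 32-bit words are the channel map itself.
channel_map = struct.unpack(">4I", stereo[12:])
```

Here `channel_map` holds the four values 0, 1, 1, 2, matching the map quoted in the text.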
Channel Mapping Defaults
(ideas by JMV, not yet approved by anyone else. Should be merged in respective header definition above if approved)
In order to simplify implementations when it comes to channel mappings, several defaults are defined when no extra header is present.
- Files containing one channel are assumed to be plain mono files with:
channel_type = OGG_CHANNEL_MAP_MONO
channel_map [0] = OGG_CHANNEL_FRONT_CENTER
- Files containing two channels are assumed to be stereo files with:
channel_type = OGG_CHANNEL_MAP_STEREO
channel_map [0] = OGG_CHANNEL_FRONT_LEFT
channel_map [1] = OGG_CHANNEL_FRONT_RIGHT
- Files containing four channels are assumed to be B-format ambisonic files with:
channel_type = OGG_CHANNEL_MAP_B_FORMAT
channel_map [0] = OGG_CHANNEL_W
channel_map [1] = OGG_CHANNEL_X
channel_map [2] = OGG_CHANNEL_Y
channel_map [3] = OGG_CHANNEL_Z
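If these defaults were adopted, a decoder could fall back to a small lookup keyed on the channel count when no channel mapping header is present. The sketch uses the symbolic names as strings because this draft assigns no numeric values to the B-format constants; `default_channel_map` is a hypothetical helper.

```python
# Channel count -> (channel type, channel map), following the proposed defaults.
DEFAULT_CHANNEL_MAPS = {
    1: ("OGG_CHANNEL_MAP_MONO", ["OGG_CHANNEL_FRONT_CENTER"]),
    2: ("OGG_CHANNEL_MAP_STEREO",
        ["OGG_CHANNEL_FRONT_LEFT", "OGG_CHANNEL_FRONT_RIGHT"]),
    4: ("OGG_CHANNEL_MAP_B_FORMAT",
        ["OGG_CHANNEL_W", "OGG_CHANNEL_X", "OGG_CHANNEL_Y", "OGG_CHANNEL_Z"]),
}


def default_channel_map(channels: int):
    """Return the proposed default for this channel count, or None."""
    return DEFAULT_CHANNEL_MAPS.get(channels)
```

Channel counts without a proposed default (three channels, for example) return `None`, leaving the layout undefined unless a channel mapping header is supplied.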
Channel Conversion Header
Any number of channel conversion headers can be specified. This header specifies how to down-mix the data to another format.
Bits    Value       Field
32      0x00000001  Remixing Header Id
16      [uint]      Major version
16      [uint]      Minor version
32      [uint]      Target Channel type
32xMxN  [sint]      Target Channel (M) x Src Channel (N) Gain array
The mixing matrix is ordered so that the gains of the source channels for a given target channel are consecutive. Each gain is a *signed* 32-bit fixed-point integer: the 16 most significant bits hold the integer part (including the sign) and the 16 least significant bits hold the fractional part. The gain can therefore be negative.
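The gain encoding is a signed 16.16 fixed-point number, so converting to and from floating point is a scale by 65536. The sketch below shows that conversion, the serialisation order of the gain array, and how a matrix would be applied to one frame. All function names are hypothetical, and the matrix is indexed as `matrix[target][source]` on the reading that the N source gains for each target are stored consecutively.

```python
import struct


def gain_to_fixed(gain: float) -> int:
    """Encode a gain as signed 16.16 fixed point."""
    fixed = round(gain * 65536)
    if not -(1 << 31) <= fixed < (1 << 31):
        raise ValueError("gain does not fit in 16.16 fixed point")
    return fixed


def fixed_to_gain(fixed: int) -> float:
    """Decode a signed 16.16 fixed-point gain."""
    return fixed / 65536


def pack_gains(matrix) -> bytes:
    """Serialise matrix[target][source] with source gains consecutive."""
    flat = [gain_to_fixed(g) for row in matrix for g in row]
    return struct.pack(">%di" % len(flat), *flat)  # big-endian signed 32-bit


def mix_frame(frame, matrix):
    """Apply the matrix to one frame (one sample per source channel)."""
    return [sum(g * s for g, s in zip(row, frame)) for row in matrix]


# Stereo to mono by averaging: one target channel, two source channels.
stereo_to_mono = [[0.5, 0.5]]
gain_bytes = pack_gains(stereo_to_mono)
mono = mix_frame([100, 300], stereo_to_mono)
```

A gain of 0.5 is stored as 0x00008000 and a gain of -1.0 as 0xFFFF0000.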
Channel Conversion Defaults
- Stereo files SHOULD be converted to mono files by averaging the left channel and the right channel.
- Ambisonic files SHOULD be converted to stereo files by basic ambisonic dematrixing of W and X.
Related Links
Short info about AC-3: http://www.mediatwins.com/en/support/kb_topic_11.html
AC-3 spec: http://www.atsc.org/standards/a_52a.pdf
Note: the encoding of the channel mapping appears to be described around page 34 of 140.
.wav extended headers for multi channel: http://www.microsoft.com/whdc/device/audio/multichaud.mspx
General surround info: http://www.surroundassociates.com/fqmain.html