OggPCM Draft1
Revision as of 22:53, 10 November 2005
What is it
OggPCM is a pulse-code modulation (PCM) audio codec for Ogg. Similar to Microsoft's .wav or Apple's .aiff formats, it's a simple way to store and transfer uncompressed audio within an Ogg container.
Why is it
The intention for this format is as an interchange format, for example for use with OggStream. It is also useful for storing time-synced decoded audio/video, as opposed to using RIFF/WAV (.wav) and YUV4MPEG (.yuv) in separate files as was done during Theora development.
It is also less complex than either .wav (RIFF) or .aiff (AIFF), both of these formats being designed for generic multimedia (audio, video, etc). Full compatibility with these formats includes support for non-PCM data.
Using raw PCM data, on the other hand, doesn't give that all-important header which carries information about the number of channels, sample width, and sample frequency. So what is needed is a header followed by raw PCM data - nothing more complicated.
Format
Packets are processed according to the value of their first byte. Packets of unknown ID should be silently ignored, providing a convenient way to add future expandability which does not break the data format. Multibyte fields in the header packet are packed in big endian order. Other fields are stored MSB first. Multibyte fields in the data packet are packed in little endian order.
The granule position of a page is the total number of samples encoded up to and including all samples on that page. Samples must not be split across pages. The rationale here is that the position specified in the header of the last page tells how long the data coded by the bitstream is. A truncated stream will still return the proper number of samples that can be decoded fully.
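As a rough illustration of the granule position rule above, the following Python sketch computes the running sample total for each page and shows what a truncated stream would report. The function names and the list of per-page sample counts are hypothetical, chosen only for this example; they are not part of the draft.

```python
from itertools import accumulate


def page_granule_positions(samples_per_page):
    """Return the granule position of each page.

    Each position is the total number of samples encoded up to and
    including the samples carried on that page.
    """
    return list(accumulate(samples_per_page))


def decodable_samples(positions, pages_received):
    """Number of fully decodable samples if only the first
    `pages_received` pages of the stream arrived."""
    if pages_received <= 0:
        return 0
    return positions[min(pages_received, len(positions)) - 1]


positions = page_granule_positions([1024, 1024, 512])
print(positions)                        # running totals, one per page
print(decodable_samples(positions, 2))  # stream truncated after page 2
```

Because the total is carried on every page, a reader that loses the tail of the stream can still take the granule position of the last complete page as the stream length.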
An example of how such expandability can be useful is the proposed ReplayGain extension to the .wav format: http://replaygain.hydrogenaudio.org/file_format_wav.html
Note that no such extension is planned, nor is the need for a future format foreseen, but history has shown that even the most basic formats eventually become obsolete.
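The rule that packets are dispatched on their first byte, with unknown IDs silently ignored, might be sketched in Python as follows. The `dispatch` function and the handler table are assumptions made for illustration; the draft only defines the packet ID values (0x00, 0x03 and 0xFF in the packet tables).

```python
def dispatch(packet, handlers):
    """Call the handler registered for the packet's first byte.

    Packets whose ID has no registered handler are silently ignored
    (None is returned), so that future packet types do not break
    older readers.
    """
    if not packet:
        return None
    handler = handlers.get(packet[0])
    if handler is None:
        return None  # unknown packet ID: skip without raising an error
    return handler(packet)


# Hypothetical handlers keyed by the packet IDs from the draft.
handlers = {
    0x00: lambda p: "stream header",
    0x03: lambda p: "comment header",
    0xFF: lambda p: "data",
}

print(dispatch(b"\x00PCM", handlers))
print(dispatch(b"\x7fPCM", handlers))  # unknown ID, ignored
```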
Packet 0, BOS, 12 bytes
 8  0x00   Stream Header Packet ID
24  "PCM"  Codec identifier
-
 8  0x01   Version Major (breaks backwards compatibility to increment)
 8  0x00   Version Minor (backwards compatible, i.e., via extended header)
 8  [int]  Number of Channels (1-256)
 1  [flg]  False = MSB, True = LSB
 3  [int]  PCM Data Type (see table below)
 4  [nil]  Padding to byte, may be used in later minor version
-
32  [int]  Samplerate (samples/second)

Comment Header Packet
 8  0x03   Comment Header Packet ID
24  "PCM"  Codec Identifier
--  Continues as [Comment Header]

Data Packet
 8  0xFF    Data Packet ID
24  "PCM"   Codec identifier, pads data to 32-bits
..  [data]  variable length pcm data

PCM Data Type
=============
ID#  Bits  Type
0    8     signed (char)
1    8     unsigned (char)
2    16    signed (short int)
3    24    signed (int + 8bit padding)
4    32    signed (int)
5    32    float (float)
6    64    float (double)
7    ?     Extend - unsupported by 1.0-only software
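To make the 12-byte stream header layout concrete, here is a minimal Python sketch of a parser. It rests on several assumptions that the draft does not spell out: the 1-bit flag is taken to be the most significant bit of the eighth byte, followed by the 3-bit data type and 4 padding bits (one reading of "stored MSB first"), and the channel count is returned as the raw byte value, since the draft does not say how 256 would be encoded in 8 bits. The function and dictionary names are hypothetical.

```python
import struct

# ID -> (bits, type), copied from the PCM Data Type table.
# ID 7 ("Extend") is deliberately left out: it is unsupported here.
PCM_DATA_TYPES = {
    0: (8, "signed"),
    1: (8, "unsigned"),
    2: (16, "signed"),
    3: (24, "signed"),
    4: (32, "signed"),
    5: (32, "float"),
    6: (64, "float"),
}


def parse_stream_header(packet):
    """Parse the 12-byte stream header packet (assumed bit layout)."""
    if len(packet) != 12:
        raise ValueError("stream header packet must be 12 bytes")
    # Header multibyte fields are big endian, hence the '>' prefix.
    packet_id, codec, major, minor, channels, flags, rate = struct.unpack(
        ">B3sBBBBI", packet
    )
    if packet_id != 0x00 or codec != b"PCM":
        raise ValueError("not an OggPCM stream header")
    type_id = (flags >> 4) & 0x07  # assumption: 3 bits below the flag bit
    if type_id not in PCM_DATA_TYPES:
        raise ValueError("unsupported PCM data type %d" % type_id)
    bits, kind = PCM_DATA_TYPES[type_id]
    return {
        "version": (major, minor),
        "channels": channels,          # raw byte value, see lead-in
        "lsb_flag": bool(flags >> 7),  # assumption: top bit of the byte
        "bits": bits,
        "kind": kind,
        "sample_rate": rate,
    }


example = (
    bytes([0x00]) + b"PCM" + bytes([1, 0, 2, 0b10100000])
    + (44100).to_bytes(4, "big")
)
print(parse_stream_header(example))
```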
Alternative Format
The primary difference between this format and the one above is that it is intended to support channels from the same source having different sampling parameters.
Packet structure
Packet 0, BOS, tbd bytes
 8  0x00        Header Packet ID
24  "PCM"       Codec identifier
-
 8  0x01        Version Major (breaks backwards compatibility to increment)
 8  0x00        Version Minor (backwards compatible, i.e., via extended header)
 8  [uint]      Source ID (Unique amongst all OggPCM streams in the physical stream)
 8  [uint]      Channel Block
-
16  [bitfield]  Indicates which of the 16 channels in this channel block are present in this logical OggPCM stream.
 8  [enum]      Sample format (OGGPCM_FMT_U8, OGGPCM_FMT_LE_S16, OGGPCM_FMT_BE_S16, etc)
24  [uint]      Sample rate ** this field crosses a 32bit-word barrier **

Data Packet
 8  0xFF    Data Packet ID
24  "PCM"   Codec identifier, pads data to 32-bits
..  [data]  variable length pcm data, packing defined by Sample Format field in header
Sample Format
OGG_PCM_S8      = 0x1  /* Signed 8 bit. */
OGG_PCM_S16     = 0x2
OGG_PCM_S24     = 0x3
OGG_PCM_S32     = 0x4
OGG_PCM_U8      = 0x5  /* Unsigned 8 bit */
OGG_PCM_FLOAT32 = 0x6
OGG_PCM_FLOAT64 = 0x7
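The alternative header could be read along the following lines. This Python sketch is built on assumptions: the header size is listed as "tbd", so the 14 bytes used here are simply the sum of the listed fields; multibyte fields are assumed to be big endian as in the first format; and the enum values are the OGG_PCM_* values listed above, even though the packet table mentions OGGPCM_FMT_* names with explicit endianness. The function and constant names are hypothetical.

```python
import struct

# Enum values as listed in the Sample Format section.
SAMPLE_FORMATS = {
    0x1: "OGG_PCM_S8",
    0x2: "OGG_PCM_S16",
    0x3: "OGG_PCM_S24",
    0x4: "OGG_PCM_S32",
    0x5: "OGG_PCM_U8",
    0x6: "OGG_PCM_FLOAT32",
    0x7: "OGG_PCM_FLOAT64",
}

ALT_HEADER_SIZE = 14  # assumption: sum of the listed field widths


def parse_alt_header(packet):
    """Parse the alternative header packet (assumed 14-byte layout)."""
    if len(packet) < ALT_HEADER_SIZE:
        raise ValueError("alternative header packet is too short")
    (packet_id, codec, major, minor,
     source_id, channel_block, channel_bits, fmt) = struct.unpack(
        ">B3sBBBBHB", packet[:11]
    )
    if packet_id != 0x00 or codec != b"PCM":
        raise ValueError("not an OggPCM header packet")
    if fmt not in SAMPLE_FORMATS:
        raise ValueError("unknown sample format 0x%x" % fmt)
    # The 24-bit sample rate crosses a 32-bit word boundary, so it is
    # read byte-wise rather than with a struct format code.
    sample_rate = int.from_bytes(packet[11:14], "big")
    return {
        "version": (major, minor),
        "source_id": source_id,
        "channel_block": channel_block,
        "channel_bits": channel_bits,
        "sample_format": SAMPLE_FORMATS[fmt],
        "sample_rate": sample_rate,
    }


example = (
    bytes([0x00]) + b"PCM" + bytes([1, 0, 0x00, 0x00])
    + (0b11).to_bytes(2, "big") + bytes([0x3])
    + (96000).to_bytes(3, "big")
)
print(parse_alt_header(example))
```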
Encapsulation in Ogg
Ogg provides encapsulation of data in packets, which may be marked with a granulepos. The granulepos of an Ogg packet indicates the presentation time of the last presentable element in the packet; for audio, this corresponds to the timestamp of the last audio frame.
Following standard terminology for uncompressed audio, an audio frame is the collection of samples for all channels for a single sampling period. For example, an audio frame for a stereo signal is a pair of sample values for the left and right channels.
An OggPCM packet MUST NOT be constructed with a partial frame; i.e., an audio frame must not span two Ogg packets.
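The whole-frame rule can be illustrated with a short Python sketch that cuts interleaved PCM bytes into data packets on frame boundaries. The helper names and the `max_frames_per_packet` parameter are hypothetical, and the sketch assumes each sample occupies exactly bits/8 bytes with no padding; the 4-byte prefix follows the Data Packet layout (0xFF followed by "PCM").

```python
DATA_PACKET_PREFIX = b"\xffPCM"  # Data Packet ID + codec identifier


def frame_size(channels, bits_per_sample):
    """Bytes in one audio frame: one sample for every channel."""
    return channels * (bits_per_sample // 8)


def build_data_packets(pcm, channels, bits_per_sample, max_frames_per_packet):
    """Split interleaved PCM bytes into data packets of whole frames."""
    if max_frames_per_packet < 1:
        raise ValueError("max_frames_per_packet must be at least 1")
    size = frame_size(channels, bits_per_sample)
    if len(pcm) % size:
        raise ValueError("PCM data ends with a partial frame")
    chunk = size * max_frames_per_packet
    # Every chunk is a multiple of the frame size, so no frame can
    # span two packets.
    return [
        DATA_PACKET_PREFIX + pcm[i:i + chunk]
        for i in range(0, len(pcm), chunk)
    ]


# Five stereo 16-bit frames (20 bytes), at most two frames per packet.
packets = build_data_packets(bytes(range(20)), 2, 16, 2)
print([len(p) - len(DATA_PACKET_PREFIX) for p in packets])
```

In this example the payload sizes are all multiples of the 4-byte frame size, with the last packet carrying the single remaining frame.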
Constraints
This format can support only specified sample formats. Each logical stream can support up to 16 channels sharing a fixed sample rate. Logical streams from the same source may be multiplexed to provide up to 4096 channels per source, each logical stream with its own sample rate. Up to 256 sources may be multiplexed within a physical Ogg stream, unless an application takes other measures to logically partition the stream.
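The limits above follow from the field widths in the alternative header: an 8-bit Channel Block selects one of 256 blocks of 16 channels, giving 4096 channels per source, and an 8-bit Source ID allows 256 sources. The Python sketch below expands a channel block and bitfield into channel numbers. The absolute numbering (block times 16 plus bit index, with the least significant bit as the first channel in the block) is an assumption made for illustration, not something the draft defines.

```python
CHANNELS_PER_BLOCK = 16  # width of the channel bitfield
CHANNEL_BLOCKS = 256     # 8-bit Channel Block field
MAX_SOURCES = 256        # 8-bit Source ID field

MAX_CHANNELS_PER_SOURCE = CHANNELS_PER_BLOCK * CHANNEL_BLOCKS


def channels_in_stream(channel_block, bitfield):
    """List the channel numbers present in one logical stream.

    Assumes bit 0 of the bitfield is the first channel of the block.
    """
    if not 0 <= channel_block < CHANNEL_BLOCKS:
        raise ValueError("channel block out of range")
    if not 0 <= bitfield <= 0xFFFF:
        raise ValueError("bitfield must fit in 16 bits")
    base = channel_block * CHANNELS_PER_BLOCK
    return [
        base + bit
        for bit in range(CHANNELS_PER_BLOCK)
        if bitfield & (1 << bit)
    ]


print(MAX_CHANNELS_PER_SOURCE)
print(channels_in_stream(0, 0b0000000000000011))
print(channels_in_stream(255, 0x8000))
```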
Discussion
This seems to make it easy to support the simple/normal cases and possible to support the pathological cases, for instance:
Source ID | Channel Bitfield | Sample Rate | Sample Format | Comment |
0x00 | 0000 0000 0000 0011 | 96000 | OGGPCM_FMT_LE_S24 | Front Stereo Pair |
0x00 | 0000 0000 0011 1100 | 44100 | OGGPCM_FMT_LE_S16 | Center And Surrounds |
0x00 | 0000 0000 0010 0000 | 8000 | OGGPCM_FMT_LE_S16 | LFE Channel |
0x01 | 0000 0000 0000 0001 | 8000 | OGGPCM_FMT_U8 | PC Speaker |
0x02 | 0000 0000 0000 0001 | 8000 | OGGPCM_FMT_U8 | Microphone |
0x03 | 0000 0000 0000 0011 | 8000 | OGGPCM_FMT_LE_S16 | Voice Chat |
Each entry in the table is a logical Ogg stream. Arc is not convinced that the source id and channel block are necessary, but figured he'd throw it out there.