OggPCM Draft1

Revision as of 08:37, 12 November 2005

What is it

OggPCM is a pulse-code modulation (PCM) audio codec for Ogg. Similar to Microsoft's .wav or Apple's .aiff formats, it's a simple way to store and transfer uncompressed audio within an Ogg container. For the purposes of this document, the term PCM describes a digital representation of an audio signal in which amplitude samples are taken at uniform intervals and then quantized into a digital (usually binary) code. A more complete definition of PCM and related terminology can be found at Wikipedia.

Why is it

This format is intended as an interchange format, for example for use with OggStream. It is also useful for storing time-synced decoded audio and video, rather than using RIFF/WAV (.wav) and YUV4MPEG (.yuv) in separate files as was done during Theora development. It is intended to be less complex to use than either RIFF or AIFF.

The degenerate stream is a single header packet followed by the raw data packets. While this degenerate stream is not especially useful for long-term storage or as a general-purpose container, it is useful for applications where other data describing the stream is available out of band, for instance among cooperating applications in an inter-process communication scheme. Streams that also carry the defined comment packets are intended for long-term storage and for communication among diverse applications.

Format

This is the current working draft, a compromise between the different proposed elements.

Packets are processed according to the value of their first byte. Packets with an unknown ID should be silently ignored, which provides a convenient way to add future extensions without breaking the data format. Multibyte fields in the header packets are packed in little-endian order. Multibyte fields in the data packets are packed according to the endian flag in the stream header packet.
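The dispatch rule above can be sketched in C as a small classifier on the first byte. This is only an illustration: the enum and function names are hypothetical, and treating a packet that is shorter than 4 bytes or lacks the "PCM" identifier as ignorable is an assumption, not something the draft specifies.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical names for the packet kinds defined by the draft. */
typedef enum {
    OGGPCM_PKT_HEADER,   /* first byte 0x00 */
    OGGPCM_PKT_COMMENT,  /* first byte 0x03 */
    OGGPCM_PKT_DATA,     /* first byte 0xFF */
    OGGPCM_PKT_IGNORE    /* unknown ID: silently skipped */
} oggpcm_packet_kind;

/* Classify a packet by its first byte.
 * ASSUMPTION: packets too short for the 4-byte prefix, or without the
 * "PCM" codec identifier, are also treated as ignorable. */
static oggpcm_packet_kind oggpcm_classify(const unsigned char *pkt, size_t len)
{
    if (len < 4 || memcmp(pkt + 1, "PCM", 3) != 0)
        return OGGPCM_PKT_IGNORE;
    switch (pkt[0]) {
    case 0x00: return OGGPCM_PKT_HEADER;
    case 0x03: return OGGPCM_PKT_COMMENT;
    case 0xFF: return OGGPCM_PKT_DATA;
    default:   return OGGPCM_PKT_IGNORE;   /* future extension packet */
    }
}
```

Because unknown IDs fall through to the ignore case, a 1.0 decoder keeps working when a later minor version adds new packet types.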

An audio frame consists of one sample from each audio channel, encoded in sequence. The granule position of a page is the total number of audio frames in the stream up to and including the last complete packet in that page. Audio frames must not be split across packets. The rationale is that the granule position in the header of the last page gives the length of the stream in audio frames, and also gives seeking routines the current stream position. A truncated stream will still report the correct number of audio frames that can be fully decoded.
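A minimal sketch of how an encoder might track the granule position, assuming the caller already knows the channel count and the storage size of one sample, and passes only the payload length (excluding the 4-byte ID and "PCM" prefix). The struct and function names are hypothetical.

```c
#include <stddef.h>
#include <stdint.h>

/* Whole audio frames in a data packet payload; one frame holds one
 * sample for every channel, so frame_bytes = channels * bytes_per_sample. */
static uint64_t oggpcm_frames_in_payload(size_t payload_len, unsigned channels,
                                         unsigned bytes_per_sample)
{
    size_t frame_bytes = (size_t)channels * bytes_per_sample;
    return frame_bytes ? payload_len / frame_bytes : 0;
}

/* Running total of audio frames. After adding the last complete packet
 * of a page, the value is that page's granule position. */
typedef struct {
    uint64_t granulepos;
} oggpcm_granule_counter;

static uint64_t oggpcm_granule_add(oggpcm_granule_counter *gc, size_t payload_len,
                                   unsigned channels, unsigned bytes_per_sample)
{
    gc->granulepos += oggpcm_frames_in_payload(payload_len, channels,
                                               bytes_per_sample);
    return gc->granulepos;
}
```

For 16-bit stereo (4 bytes per frame), a 4096-byte payload followed by a 2048-byte payload yields granule positions 1024 and then 1536; if both packets end on the same page, that page's granule position is 1536.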

An example of the kind of extension that ignoring unknown packet IDs makes possible is the proposed ReplayGain extension to the .wav format: http://replaygain.hydrogenaudio.org/file_format_wav.html

Note that no such extension is planned, and no need for a future format is foreseen, but history has shown that even the most basic formats eventually become obsolete.

Packet 0, BOS, 12 bytes
 8  0x00   Stream Header Packet ID
24  "PCM"  Codec identifier 
 -
 8  0x01   Version Major (incremented only for changes that break backwards compatibility)
 8  0x00   Version Minor (backwards compatible, i.e., via extended header)
 8  [int]  Number of Channels (1-256)
 1  [flg]  Data byte order: False = MSB first (big endian), True = LSB first (little endian)
 3  [int]  PCM Data Type (see table below)
 4  [nil]  Padding to byte, may be used in later minor version
 -
32  [int]  Samplerate (samples/second)
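The 12-byte stream header can be decoded with plain byte arithmetic. This sketch uses hypothetical struct and function names, and it ASSUMES the sub-byte fields in byte 7 are packed from the most significant bit down (endian flag in bit 7, data type in bits 6..4), since the draft does not state the bit order. The sample rate is read as little endian, as the draft requires for header fields.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

typedef struct {
    uint8_t  version_major;
    uint8_t  version_minor;
    uint8_t  channels;       /* raw byte as stored in the header */
    int      little_endian;  /* 1 = LSB first, 0 = MSB first */
    uint8_t  data_type;      /* 0..7, see the PCM Data Type table */
    uint32_t sample_rate;    /* samples per second */
} oggpcm_header;

/* Returns 0 on success, -1 if this is not a 12-byte OggPCM stream header. */
static int oggpcm_parse_header(const unsigned char *p, size_t len, oggpcm_header *h)
{
    if (len < 12 || p[0] != 0x00 || memcmp(p + 1, "PCM", 3) != 0)
        return -1;
    h->version_major = p[4];
    h->version_minor = p[5];
    h->channels      = p[6];
    /* ASSUMPTION: flag = bit 7, data type = bits 6..4, bits 3..0 padding. */
    h->little_endian = (p[7] >> 7) & 0x1;
    h->data_type     = (p[7] >> 4) & 0x7;
    /* Header multibyte fields are little endian. */
    h->sample_rate = (uint32_t)p[8] | ((uint32_t)p[9] << 8) |
                     ((uint32_t)p[10] << 16) | ((uint32_t)p[11] << 24);
    return 0;
}
```

Under these assumptions, a little-endian, 16-bit signed, stereo, 44100 Hz stream has byte 7 equal to 0xA0 and sample-rate bytes 0x44 0xAC 0x00 0x00.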

Comment Header Packet
 8  0x03   Comment Header Packet ID
24  "PCM"  Codec Identifier
-- Continues as [Comment Header]

Data Packet
 8  0xFF   Data Packet ID
24  "PCM"  Codec identifier, pads data to 32 bits
..  [data] variable length PCM data

PCM Data Type
=============
ID#  Bits  Type
 0   8     signed   (char)
 1   8     unsigned (char)
 2   16    signed   (short int)
 3   24    signed   (int + 8bit padding)
 4   32    signed   (int)
 5   32    float    (float)
 6   64    float    (double)
 7   ?     Extended (not supported by version 1.0 software)
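A decoder needs the storage size of one sample to locate frames in a data packet. This hypothetical helper ASSUMES that type 3 (24-bit) occupies 4 bytes, following the "int + 8bit padding" note; a packed 3-byte layout would change that entry. Type 7 maps to 0 so callers can refuse it.

```c
/* Bytes used to store one sample, indexed by PCM Data Type ID.
 * Returns 0 for type 7 (Extended) and any out-of-range ID. */
static unsigned oggpcm_bytes_per_sample(unsigned data_type)
{
    /* IDs 0..6: s8, u8, s16, s24 in a 32-bit container (assumed),
     * s32, float32, float64 */
    static const unsigned sizes[7] = { 1, 1, 2, 4, 4, 4, 8 };
    return data_type < 7 ? sizes[data_type] : 0;
}
```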

Encapsulation in Ogg

The granulepos of an Ogg page indicates the presentation time of the last presentable element in the last complete packet within that page; for OggPCM, a granule is an audio frame.

Following standard terminology for uncompressed audio, an audio frame is the collection of samples for all channels for a single sampling period. For example, an audio frame for a stereo signal is a pair of sample values for the left and right channels.
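Because the samples of a frame are stored one after another, sample `ch` of frame `f` sits at index `f * channels + ch`. The sketch below reads a signed 16-bit sample (type 2) from a data packet payload, honoring the endian flag from the stream header; the function name is hypothetical.

```c
#include <stddef.h>
#include <stdint.h>

/* Read channel `ch` of audio frame `frame` from an interleaved signed
 * 16-bit payload. `little_endian` is the flag from the stream header. */
static int16_t oggpcm_read_s16(const unsigned char *payload, unsigned channels,
                               size_t frame, unsigned ch, int little_endian)
{
    const unsigned char *s = payload + (frame * channels + ch) * 2;
    unsigned u = little_endian ? (unsigned)(s[0] | (s[1] << 8))
                               : (unsigned)((s[0] << 8) | s[1]);
    /* Interpret as two's complement without relying on
     * implementation-defined unsigned-to-signed conversion. */
    return (int16_t)(u >= 0x8000u ? (long)u - 0x10000L : (long)u);
}
```

In a little-endian stereo payload, frame 1 starts at byte 4; its right-channel sample is the two bytes at offsets 6 and 7.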

Constraints

Version 1.0 codec software MUST NOT attempt to decode when the Extended (7) Data Type is specified.

An OggPCM packet MUST NOT be constructed with a partial frame; i.e., an audio frame must not span two Ogg packets.
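Both constraints are easy to check before decoding. In this sketch the function names and error codes are hypothetical, and rejecting any major version other than 1 is an assumption based on the Version Major field's role of signaling incompatible changes.

```c
#include <stddef.h>

/* Header check: 0 if a 1.0 decoder may proceed, negative otherwise. */
static int oggpcm_check_header(unsigned version_major, unsigned data_type)
{
    if (version_major != 1)
        return -1;   /* ASSUMPTION: other major versions are incompatible */
    if (data_type == 7)
        return -2;   /* Extended: 1.0 software MUST NOT decode */
    return 0;
}

/* Data packet check: the payload must contain whole audio frames only. */
static int oggpcm_check_payload(size_t payload_len, size_t frame_bytes)
{
    return (frame_bytes != 0 && payload_len % frame_bytes == 0) ? 0 : -3;
}
```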

Alternative Format

This format was proposed by Jkoleszar and has since been merged with other ideas into the primary format above.

It is intended to support channels from the same source having different sampling parameters.

Packet structure

Packet 0, BOS, tbd bytes
 8  0x00       Header Packet ID
24  "PCM"      Codec identifier 
 -
 8  0x01       Version Major (incremented only for changes that break backwards compatibility)
 8  0x00       Version Minor (backwards compatible, i.e., via extended header)
 8  [uint]     Source ID (Unique amongst all OggPCM streams in the physical stream)
 8  [uint]     Channel Block
 -
16  [bitfield] Indicates which of the 16 channels in this channel block 
               are present in this logical OggPCM stream.
 8  [enum]     Sample format (OGGPCM_FMT_U8, OGGPCM_FMT_LE_S16, OGGPCM_FMT_BE_S16, etc) 
24  [uint]     Sample rate ** this field crosses a 32-bit word boundary **
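Adding up the field widths above gives 112 bits (14 bytes), with the 24-bit sample rate in bytes 11 to 13, across the word boundary at byte 12. A parser can simply assemble that field byte by byte. The proposal does not state a byte order, so this sketch ASSUMES little endian, as in the primary format; all names are hypothetical.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

typedef struct {
    uint8_t  source_id;
    uint8_t  channel_block;
    uint16_t channel_mask;   /* bit n set: channel n of the block is present */
    uint8_t  sample_format;
    uint32_t sample_rate;    /* only 24 bits are stored */
} oggpcm_alt_header;

/* Returns 0 on success, -1 if the packet is too short or not OggPCM.
 * ASSUMPTION: little-endian multibyte fields. */
static int oggpcm_parse_alt_header(const unsigned char *p, size_t len,
                                   oggpcm_alt_header *h)
{
    if (len < 14 || p[0] != 0x00 || memcmp(p + 1, "PCM", 3) != 0)
        return -1;
    /* p[4], p[5]: version major/minor (not stored here) */
    h->source_id     = p[6];
    h->channel_block = p[7];
    h->channel_mask  = (uint16_t)(p[8] | (p[9] << 8));
    h->sample_format = p[10];
    /* Bytes 11..13 straddle the 32-bit boundary at byte 12. */
    h->sample_rate   = (uint32_t)p[11] | ((uint32_t)p[12] << 8) |
                       ((uint32_t)p[13] << 16);
    return 0;
}
```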

Data Packet
 8  0xFF       Data Packet ID
24  "PCM"      Codec identifier, pads data to 32 bits
..  [data]     variable length PCM data, packing defined by Sample Format field in header

Sample Format

OGG_PCM_S8      = 0x1       /* Signed 8 bit. */
OGG_PCM_S16     = 0x2
OGG_PCM_S24     = 0x3
OGG_PCM_S32     = 0x4
OGG_PCM_U8      = 0x5        /* Unsigned 8 bit */
OGG_PCM_FLOAT32 = 0x6
OGG_PCM_FLOAT64 = 0x7

Discussion

This seems to make it easy to support the simple/normal cases and possible to support the pathological cases, for instance:

Source ID  Channel Bitfield     Sample Rate (Hz)  Sample Format      Comment
0x00       0000 0000 0000 0011  96000             OGGPCM_FMT_LE_S24  Front Stereo Pair
0x00       0000 0000 0001 1100  44100             OGGPCM_FMT_LE_S16  Center And Surrounds
0x00       0000 0000 0010 0000  8000              OGGPCM_FMT_LE_S16  LFE Channel
0x01       0000 0000 0000 0001  8000              OGGPCM_FMT_U8      PC Speaker
0x02       0000 0000 0000 0001  8000              OGGPCM_FMT_U8      Microphone
0x03       0000 0000 0000 0011  8000              OGGPCM_FMT_LE_S16  Voice Chat

Each entry in the table is a logical Ogg stream. Jkoleszar is not convinced that the source id and channel block are necessary, but figured he'd throw it out there.
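The channel bitfield can be interpreted with two small helpers: the number of channels a stream carries is the number of set bits, and streams sharing a Source ID and channel block can be checked for claiming the same channel. The no-overlap rule is an ASSUMPTION (the proposal does not state it), and the function names are hypothetical.

```c
#include <stdint.h>

/* Channels carried by a stream = number of set bits in its mask. */
static unsigned oggpcm_mask_channels(uint16_t mask)
{
    unsigned n = 0;
    while (mask) {
        mask &= (uint16_t)(mask - 1);   /* clear the lowest set bit */
        n++;
    }
    return n;
}

/* ASSUMPTION: two streams with the same Source ID and channel block
 * should not both claim a channel; nonzero means they do. */
static int oggpcm_masks_overlap(uint16_t a, uint16_t b)
{
    return (a & b) != 0;
}
```

For example, the Front Stereo Pair mask 0000 0000 0000 0011 (0x0003) yields 2 channels.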
