OggPCM Draft1

4,999 bytes added, 12:22, 10 November 2007
{{historical}}
{{draft}}
'''This is the original OggPCM draft. After a [http://lists.xiph.org/pipermail/ogg-dev/2005-November/thread.html heated debate], most developers have now moved to [[OggPCM2]]'''
 
== What is it ==
'''OggPCM''' is a pulse-code modulation (PCM) audio codec for Ogg. Similar to Microsoft's .wav or Apple's .aiff formats, it's a simple way to store and transfer uncompressed audio within an Ogg container. For the purposes of this document, the term PCM is used to describe a digital representation of an audio signal, where volume samples are taken at regular uniform intervals and then quantized into a digital (usually binary) code. A more complete definition of PCM and related terminology can be found at [[Wikipedia:Pulse-code_modulation|Wikipedia]].
== Why is it ==
The intention for this format is as an interchange format, for example for use with [[OggStream]]. It is also useful for storing time-synced decoded audio/video, as opposed to using RIFF/WAV (.wav) and YUV4MPEG (.yuv) in separate files as was done during [[Theora]] development. It is also intended to be less complex to use than either .wav (RIFF) or .aiff (AIFF), both of these formats being designed for generic multimedia (audio, video, etc.). Full compatibility with these formats includes support for non-PCM data. Using raw PCM data, on the other hand, doesn't give that all-important header which carries information about the number of channels, sample width, and sample frequency. So what is needed is a header followed by raw PCM data, and nothing more complicated.
== Stream Description ==
A stream is composed of a header packet, zero or more comment packets, and one or more data packets. Data packets may be of variable length, including zero. The only valid use of a zero length data packet is to mark the end of stream. Data packets must contain samples for all channels. That is to say, the length of a data packet must be a multiple of the number of channels times the storage size of a single sample. For instance, for a stream containing 6 channels at 2 bytes per sample, the length of the data packet must be a multiple of 12 bytes.
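The length rule for data packets can be checked with a few lines of Python. This is only a sketch: the function name and its parameters are illustrative and are not defined by this draft.

```python
def is_valid_data_packet_length(length, channels, bytes_per_sample):
    """Return True if a data packet of `length` bytes holds only whole frames.

    A frame is one sample for every channel, so a valid length is a multiple
    of channels * bytes_per_sample. A zero length packet (end of stream
    marker) passes this check as well.
    """
    frame_size = channels * bytes_per_sample
    return length % frame_size == 0


# 6 channels at 2 bytes per sample: lengths must be multiples of 12 bytes.
print(is_valid_data_packet_length(24, 6, 2))
print(is_valid_data_packet_length(20, 6, 2))
```

The first call accepts a 24 byte packet (two whole frames), the second rejects a 20 byte packet because it would end in the middle of a frame.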
The degenerate stream is a single header packet followed by raw data packets. While this degenerate stream is not incredibly useful for long term storage or as a general purpose container, it is useful for applications where other data describing the stream is available out of band, for instance amongst cooperating applications in an inter-process communication scheme. Streams providing extra defined comment packets are intended to be useful for long term storage and communication amongst diverse applications.
== Packet Format ==
Header and comment packets are processed as per the value of their first byte. Packets of unknown ID should be silently ignored, providing a convenient way to add future expandability which does not break the data format. An example of how this can be useful is the proposed ReplayGain extension to .wav format: http://replaygain.hydrogenaudio.org/file_format_wav.html

The header packet contains a field indicating the number of comment packets preceding the raw data. Applications must either parse or skip exactly this many packets, in addition to the header packet, before treating the stream as raw data.
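The skip rule can be sketched as follows. The sketch assumes the packets of one logical stream have already been extracted from their Ogg pages into a Python list of `bytes` objects, and that the count of extra header packets has already been read from the header packet; `split_stream` is a hypothetical helper name, not part of any Ogg library.

```python
def split_stream(packets, extra_header_count):
    """Split demuxed packets into (header, extra_headers, data_packets).

    `extra_header_count` is the value of the header field that counts the
    packets between the header packet and the raw data. Exactly that many
    packets are set aside without being interpreted as audio.
    """
    if not packets:
        raise ValueError("stream has no header packet")
    first_data = 1 + extra_header_count
    if len(packets) < first_data:
        raise ValueError("stream ends before all header packets were seen")
    return packets[0], packets[1:first_data], packets[first_data:]


header, extra, data = split_stream([b"hdr", b"comment", b"\x00\x01", b"\x02\x03"], 1)
print(len(extra), len(data))
```

An application that does not understand a skipped packet can simply drop it; only the count from the header packet decides where the raw data starts.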
Note that no such extension is planned for OggPCM, nor is the need for a future format foreseen, but history has shown that even the most basic formats eventually become obsolete.

=== Header Packet ===
Multibyte fields in the header packets are packed in big endian order, to be consistent with network byte order. A header packet contains the following fields:

 Packet 0, BOS, 16 bytes
 8     0x00    Stream Header Packet ID
 24    "PCM"   Codec identifier
 -
 8     0x01    Version Major (breaks backwards compatibility to increment)
 8     0x00    Version Minor (backwards compatible, ie, more supported format IDs)
 8     [uint]  Number of header packets preceding data
 8     [uint]  Number of Channels, 0 = 256
 -
 16    [flag]  Flags (see below)
 16    [enum]  PCM Format ID
 -
 32    [uint]  Sample Rate

The flags field is defined as follows:

 Bit       Description
 15 (MSB)  Interleaved/Chunked - If set, data in the packets is "chunked" by channel. In a data packet containing 3 channels and 2 samples/channel, the chunked storage order would be 001122. For the interleaved storage format (default), the order would be 012012.
 others    Reserved
Applications conforming to version 1.0 of this spec MUST:<ul>
<li>set all reserved flags to false (zero) when creating these streams.</li>
<li>preserve all values of all reserved flags when reading or modifying these streams, unless the application sets the minor version field to zero, in which case the reserved flags must be set to false as well.</li>
</ul>
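A minimal sketch of reading the header packet with Python's `struct` module follows. It assumes the 16 byte layout of this revision: packet ID, "PCM", version major, version minor, count of header packets preceding data, channel count, flags, PCM Format ID and sample rate, in that order. The `OggPcmHeader` tuple and `parse_header` function are illustrative names only.

```python
import struct
from collections import namedtuple

# Hypothetical container for the parsed fields; the names are illustrative.
OggPcmHeader = namedtuple(
    "OggPcmHeader",
    "version_major version_minor extra_headers channels chunked format_id sample_rate",
)

# Big endian, no padding:
# 1 + 3 + 1 + 1 + 1 + 1 + 2 + 2 + 4 = 16 bytes.
HEADER_LAYOUT = struct.Struct(">B3sBBBBHHI")


def parse_header(packet):
    """Parse the first 16 bytes of an OggPCM header packet."""
    if len(packet) < HEADER_LAYOUT.size:
        raise ValueError("header packet is shorter than 16 bytes")
    (packet_id, codec, major, minor, extra_headers,
     channels, flags, format_id, sample_rate) = HEADER_LAYOUT.unpack(
        packet[:HEADER_LAYOUT.size])
    if packet_id != 0x00 or codec != b"PCM":
        raise ValueError("not an OggPCM header packet")
    if channels == 0:
        channels = 256  # the 8 bit field stores 256 channels as 0
    chunked = bool(flags & 0x8000)  # bit 15 (MSB) selects chunked storage
    return OggPcmHeader(major, minor, extra_headers, channels,
                        chunked, format_id, sample_rate)


example = (bytes([0x00]) + b"PCM" + bytes([1, 0, 1, 2])
           + struct.pack(">HHI", 0x0000, 0x0009, 44100))
print(parse_header(example))
```

The sketch ignores the reserved flag bits when reading. An application that rewrites a stream would have to carry the original flags value through unchanged to satisfy the rule on reserved flags.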
 
=== Comment Packets ===
At this time, there is only one defined comment packet.
 Comment Header Packet
 8     0x01    Comment Header Packet ID
 24    "PCM"   Codec Identifier
 --            Continues as [http://www.xiph.org/vorbis/doc/Vorbis_I_spec.html#vorbis-spec-comment Vorbis's Comment Header]
=== Data Packets ===
Data packets have no header word. This is done to preserve the alignment of the payload. The contents of the data packets are specified by a combination of the 'PCM Format ID' field and the 'Flags' field. The length of the data packet must be a multiple of the number of channels specified in the header times the storage size of a sample, as specified by the 'PCM Format ID' field.
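The two storage orders selected by the Interleaved/Chunked flag (012012 for interleaved, 001122 for chunked, in the 3 channel example from the flags table) can be illustrated with a small sketch. The function name and parameters are hypothetical; samples are returned as raw byte strings and are not decoded.

```python
def split_channels(packet, channels, bytes_per_sample, chunked=False):
    """Return one list of raw samples (bytes) per channel.

    Interleaved packets store frame after frame (012012 for 3 channels),
    chunked packets store channel after channel (001122).
    """
    frame_size = channels * bytes_per_sample
    if len(packet) % frame_size != 0:
        raise ValueError("data packet does not hold whole frames")
    frames = len(packet) // frame_size
    result = []
    for ch in range(channels):
        samples = []
        for i in range(frames):
            index = ch * frames + i if chunked else i * channels + ch
            start = index * bytes_per_sample
            samples.append(packet[start:start + bytes_per_sample])
        result.append(samples)
    return result


# Both calls describe the same audio: 3 channels, 2 one byte samples each.
print(split_channels(b"012012", 3, 1, chunked=False))
print(split_channels(b"001122", 3, 1, chunked=True))
```

Both calls yield the same per-channel lists, which is the point of the flag: it changes the layout inside the packet, not the audio it carries.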
=== Supported PCM Formats ===
Formats are identified within a header packet by a 16 bit "format type" field. While most applications will treat this as an opaque type, it is possible to discern some information about the format from the value of this field itself. Specifically, the format's storage size, in bytes, and its byte ordering, can be discerned by parsing the lower 6 bits of the value. These values are exposed so that it is possible to extract individual samples without necessarily understanding the coding scheme involved. While for practical purposes, due to performance concerns, most applications will choose to operate on a buffer directly, it is nonetheless possible to work a sample at a time.
 Binary Value  Meaning
 ..xxxx00      N/A, or data not accurately described by this scheme.
 ..xxxx01      Least significant byte first. Bytes are MS bit first.
 ..xxxx10      Most significant byte first. Bytes are MS bit first.
 ..xxxx11      Data is machine endian
 ..0000xx      Data can not be described by this packing scheme.
 ..0001xx      Samples are stored using one byte per sample
 ..0010xx      Samples are stored using two bytes per sample
 ..0011xx      Samples are stored using three bytes per sample
 ..0100xx      Samples are stored using four bytes per sample
 ..1000xx      Samples are stored using eight bytes per sample
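Reading the lower 6 bits of a format ID can be sketched as below. The helper name and the string labels are illustrative; the bit positions follow the table above (two bits of byte order, then four bits of storage size in bytes).

```python
# Byte order codes from the two lowest bits of the format ID.
BYTE_ORDER = {0b00: None, 0b01: "little", 0b10: "big", 0b11: "machine"}


def describe_storage(format_id):
    """Return (bytes_per_sample, byte_order) from the lower 6 bits.

    Either element is None when the format ID marks it as not described
    by this scheme (size bits 0000 or byte order bits 00).
    """
    byte_order = BYTE_ORDER[format_id & 0b11]
    size = (format_id >> 2) & 0b1111
    return (size or None), byte_order


print(describe_storage(0x0009))  # lower 6 bits are 001001
print(describe_storage(0x00A2))  # lower 6 bits are 100010
```

For 0x0009 this gives two bytes per sample, least significant byte first; for 0x00A2 it gives eight bytes per sample, most significant byte first. Nothing here inspects the upper 10 bits, so the sketch works without knowing the coding scheme.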
The remaining 10 bits describe the coding scheme used to convert the digital value to an audio signal. The following formats are defined for version 1.0 of this format. For purposes of attribution, it should be noted that these formats are the PCM formats supported by the Advanced Linux Sound Architecture (ALSA) project, and should be fairly comprehensive.
 Format ID  Short Name             Description
 --
 -- Signed integer coding (0)
 0x0004     OGGPCM_FMT_S8          Signed integer 8 bit
 0x0009     OGGPCM_FMT_S16_LE      Signed integer 16 bit little endian
 0x000A     OGGPCM_FMT_S16_BE      Signed integer 16 bit big endian
 0x000B     OGGPCM_FMT_S16         Signed integer 16 bit machine endian
 0x000D     OGGPCM_FMT_S24_3LE     Signed integer 24 bit little endian
 0x000E     OGGPCM_FMT_S24_3BE     Signed integer 24 bit big endian
 0x0011     OGGPCM_FMT_S32_LE      Signed integer 32 bit little endian
 0x0012     OGGPCM_FMT_S32_BE      Signed integer 32 bit big endian
 0x0013     OGGPCM_FMT_S32         Signed integer 32 bit machine endian
 --
 -- Unsigned integer coding (1)
 0x0044     OGGPCM_FMT_U8          Unsigned integer 8 bit
 0x0049     OGGPCM_FMT_U16_LE      Unsigned integer 16 bit little endian
 0x004A     OGGPCM_FMT_U16_BE      Unsigned integer 16 bit big endian
 0x004B     OGGPCM_FMT_U16         Unsigned integer 16 bit machine endian
 0x004D     OGGPCM_FMT_U24_3LE     Unsigned integer 24 bit little endian
 0x004E     OGGPCM_FMT_U24_3BE     Unsigned integer 24 bit big endian
 0x0051     OGGPCM_FMT_U32_LE      Unsigned integer 32 bit little endian
 0x0052     OGGPCM_FMT_U32_BE      Unsigned integer 32 bit big endian
 0x0053     OGGPCM_FMT_U32         Unsigned integer 32 bit machine endian
 --
 -- IEEE Floating point coding (2)
 0x0091     OGGPCM_FMT_FLT_LE      IEEE Float (-1,1) 32 bit little endian
 0x0092     OGGPCM_FMT_FLT_BE      IEEE Float (-1,1) 32 bit big endian
 0x0093     OGGPCM_FMT_FLT         IEEE Float (-1,1) 32 bit machine endian
 0x00A1     OGGPCM_FMT_FLT64_LE    IEEE Float (-1,1) 64 bit little endian
 0x00A2     OGGPCM_FMT_FLT64_BE    IEEE Float (-1,1) 64 bit big endian
 0x00A3     OGGPCM_FMT_FLT64       IEEE Float (-1,1) 64 bit machine endian
 --
 -- IEC958 coding (3)
 0x00CD     OGGPCM_FMT_IEC958_3LE  IEC958 Subframe, 24 bit little endian
 0x00CE     OGGPCM_FMT_IEC958_3BE  IEC958 Subframe, 24 bit big endian
 0x00D1     OGGPCM_FMT_IEC958_LE   IEC958 Subframe, 32 bit little endian
 0x00D2     OGGPCM_FMT_IEC958_BE   IEC958 Subframe, 32 bit big endian
 0x00D3     OGGPCM_FMT_IEC958      IEC958 Subframe, 32 bit machine endian
 --
 -- Mu-Law coding (4)
 0x0104     OGGPCM_FMT_MU_LAW      Mu-Law
 --
 -- A-Law coding (5)
 0x0144     OGGPCM_FMT_A_LAW       A-Law
 --
 -- ADPCM coding (6)
 0x0180     OGGPCM_FMT_ADPCM       Ima-ADPCM
 --
 -- GSM coding (7)
 0x01C0     OGGPCM_FMT_GSM         GSM
 --
 -- 24 bit signed integer in 32 bit storage (8)
 0x0211     OGGPCM_FMT_S24_LE      Signed integer 24 bit little endian
 0x0212     OGGPCM_FMT_S24_BE      Signed integer 24 bit big endian
 0x0213     OGGPCM_FMT_S24         Signed integer 24 bit machine endian
 --
 -- 24 bit unsigned integer in 32 bit storage (9)
 0x0251     OGGPCM_FMT_U24_LE      Unsigned integer 24 bit little endian
 0x0252     OGGPCM_FMT_U24_BE      Unsigned integer 24 bit big endian
 0x0253     OGGPCM_FMT_U24         Unsigned integer 24 bit machine endian
 --
 -- 20 bit signed integer in 24 bit storage (10)
 0x028D     OGGPCM_FMT_S20_3LE     Signed integer 20 bit little endian
 0x028E     OGGPCM_FMT_S20_3BE     Signed integer 20 bit big endian
 --
 -- 20 bit unsigned integer in 24 bit storage (11)
 0x02CD     OGGPCM_FMT_U20_3LE     Unsigned integer 20 bit little endian
 0x02CE     OGGPCM_FMT_U20_3BE     Unsigned integer 20 bit big endian
 --
 -- 18 bit signed integer in 24 bit storage (12)
 0x030D     OGGPCM_FMT_S18_3LE     Signed integer 18 bit little endian
 0x030E     OGGPCM_FMT_S18_3BE     Signed integer 18 bit big endian
 --
 -- 18 bit unsigned integer in 24 bit storage (13)
 0x034D     OGGPCM_FMT_U18_3LE     Unsigned integer 18 bit little endian
 0x034E     OGGPCM_FMT_U18_3BE     Unsigned integer 18 bit big endian
 --
 Other coding schemes supported by ALSA but not specified here:
 MPEG

TODO: ADPCM and GSM need further specification (or elimination) since these aren't really byte packed like the other formats here are.

== Encapsulation in Ogg ==
Following standard terminology for uncompressed audio, an audio frame is the collection of samples for all channels for a single sampling period. For example, an audio frame for a stereo signal is a pair of sample values for the left and right channels.

The granulepos of an Ogg page indicates the presentation time of the last presentable element in the last complete packet within that page; for '''OggPCM''', a granule is an audio frame. The granule position specified is the total number of audio frames in the stream up to and including the last complete packet in the page. Audio frames must not be split across packets. The rationale here is that the position specified in the header of the last page tells how long the data coded by the bitstream is, in audio frames, and also provides the current stream position to seeking routines. A truncated stream will still return the proper number of audio frames that can be decoded fully.
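As a rough illustration of the granule position rule, the sketch below counts audio frames from packet lengths and converts a granule position to seconds. Both function names are hypothetical, and the code assumes every listed packet is a complete data packet that ends on the page in question.

```python
def page_granulepos(previous_granulepos, packet_lengths, channels, bytes_per_sample):
    """Granule position of a page: running total of audio frames.

    `packet_lengths` lists the byte lengths of the data packets that are
    completed on this page. Each packet holds whole frames only.
    """
    frame_size = channels * bytes_per_sample
    frames = sum(length // frame_size for length in packet_lengths)
    return previous_granulepos + frames


def granulepos_to_seconds(granulepos, sample_rate):
    """Presentation time, in seconds, of the end of the last counted frame."""
    return granulepos / sample_rate


# Stereo, 2 bytes per sample: two 4096 byte packets hold 2048 frames.
pos = page_granulepos(0, [4096, 4096], 2, 2)
print(pos, granulepos_to_seconds(pos, 44100))
```

Because the count is in frames rather than bytes, the same granule position means the same point in time regardless of channel count or sample width.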
