Talk:OggPCM Draft1: Difference between revisions

Latest revision as of 06:09, 7 March 2006

Needs

As primarily an audio interchange codec, OggPCM should support all the capabilities of current Ogg audio codecs and any feature we'll conceivably need in the near future. These should be supported in a way which is easy to implement.

Not all features need to be supported by all software; e.g., support for more than two channels or for 8-bit audio is not required.

Current issues should be moved to the top.

Separate fields or unified table?

This has been the most contested issue to date, and one which I believe has been solved in a mutually acceptable way, since the two are not mutually exclusive. A table whose values are assigned non-linearly, such that individual bits of a value can be tested within a simple flow chart, can be used to discover the format. Meanwhile, a table can be implemented as desired and remain fully compatible, since the flow chart only permits valid choices.

--Arc 00:48, 13 Nov 2005 (PST)

Are samples padded to some round number of bits?

I don't know of any PCM formats for non-octet based samples, but if you want to specify something, I'd say pack them into the MSBs of the next larger byte boundary, round toward zero, on a per channel basis. This should allow software that knows how to handle 16 bit audio but not 10 bit to operate on the data.

--Jkoleszar 11:48, 9 Nov 2005 (PST)

The occurrence of N bit PCM where N is not a multiple of 8 bits is so rare that it should probably be ignored. In addition, there really isn't any reason to treat 10 bit data packed into the 10 most significant bits of a 16 bit int any differently from a real 16 bit value. So why make any distinction?

--Erikd

10-bit values have a range of -512 to +511. When you shift them up the range is -32768 to 32704, so they need scaling if you want them to have their proper range in a normalised system.

Precisions that aren't a multiple of 8 bit aren't at all rare, but they're normally rounded off to a multiple for compatibility.

--Gumboot 02:02, 10 Nov 2005 (PST)
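
To make the range arithmetic above concrete, here is a minimal C sketch. The helper names `pack10_msb` and `unpack10_msb` are hypothetical and are not part of any draft; the sketch only assumes the MSB-packing scheme described earlier in this thread.

```c
#include <stdint.h>

/* Hypothetical helper: place a signed 10-bit sample (-512..511) in the
 * 10 most significant bits of a 16-bit word, leaving the low 6 bits zero.
 * Multiplying by 64 is used instead of a left shift because shifting a
 * negative value left is undefined behaviour in C. */
static int16_t pack10_msb(int s10)
{
    return (int16_t)(s10 * 64);
}

/* Hypothetical helper: recover the 10-bit value. C integer division
 * truncates toward zero, so any low-order bits are discarded that way. */
static int unpack10_msb(int16_t s16)
{
    return s16 / 64;
}
```

The packed positive peak is 511 * 64 = 32704, which is 63 short of the 16-bit maximum of 32767, while the negative peak reaches -32768 exactly. Software that treats the word as genuine 16-bit data therefore never sees a full-scale positive sample.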

Support for non-octet based sample sizes has been removed with the introduction of a data type table. We no longer need to worry about this topic.

--Arc 23:48, 10 Nov 2005 (PST)

A new table is being worked on which allows 10bit to 32bit signed int data, with or without padding to an octet boundary. Padding may be up to 8 bits, which allows (e.g.) 24-bit values to be padded to 32-bit words.

--Arc 00:48, 13 Nov 2005 (PST)

Do we want/need the 32-bit data packet header?

The issue of whether this is necessary was raised on the ogg-dev mailing list. With only a single header packet, it could be considered an unneeded complication; however, additional header packets (current or future) will make this a requirement.

--Arc

I can definitely see people wanting to use comment pages, so I'd say leave the header on the data pages as well. On the other hand, if ogg provides guarantees about the alignment of packet data from packetout, I could see getting rid of it since there are benefits to working on buffers aligned to larger boundaries on some architectures. As far as I can tell, either no guarantees are made, or you'll get a buffer aligned to a word boundary, in which case having the header has no penalty.

--Jkoleszar 11:48, 9 Nov 2005 (PST)

I believe that 64-bit platforms still use 32-bit memory space (I may be wrong!). Yes, libogg2 buffers should always begin on a 32-bit word boundary, so the beginning of the data should also be on a boundary. This was done intentionally, as was the choice to use a three letter codec identifier for raw codecs (since the packet ID + codec ID = 32bits this way), after an extended IRC discussion on the subject. If ending on a 64-bit boundary is something we're really worried about, we could always add 4 bytes, but I really don't think it should be necessary.

--Arc 13:11, 9 Nov 2005 (PST)

On UltraSparc and Alpha CPUs (both 64 bit) accessing a 64 bit double at an address that is not 8 byte aligned causes a segmentation fault. On x86 (i.e. 32 bit), by contrast, accessing unaligned doubles works but is slower than accessing aligned doubles. You might want to consider this.

--Erikd
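
One common way for a reader to sidestep the alignment problem, whatever the header size ends up being, is to copy each sample out of the packet buffer instead of dereferencing a cast pointer. The following is only a sketch: `read_f64` is a hypothetical helper, and it assumes the stream's byte order already matches the host's.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: fetch the index-th 64-bit float from a packet
 * payload that may start at any byte offset. memcpy has no alignment
 * requirement, so this is safe on CPUs that fault on unaligned loads.
 * Assumption: stream byte order equals host byte order. */
static double read_f64(const unsigned char *payload, size_t index)
{
    double value;
    memcpy(&value, payload + index * sizeof value, sizeof value);
    return value;
}
```

On platforms that tolerate unaligned loads, compilers typically reduce the `memcpy` to a single load, so the portable form usually costs little.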

I cannot see why that data header is necessary. No other uncompressed audio format requires extra framing information, so I cannot see why future additional header fields would need to be added. It should be clear from the bos page how many samples go into a packet and thus this field is just complicating decoding with an extra parsing step IMHO.

--Silvia

This header is unnecessary. Ogg already provides packet framing, and the existing headers (BOS, comments) can be determined by sequence order. The BOS header already contains forwards compatibility versioning for extra header fields. Even if new headers were to be created, they could be indicated by an 'extra_headers' field in the BOS header, as is done in Speex.

--Conrad

This issue remains one of the few left contested; however, I believe that for uniformity with Vorbis and Theora, this is the correct method to identify packet types within the current version of the Ogg container.

--Arc 23:51, 10 Nov 2005 (PST)

As to whether we need it, we need a way to distinguish header packets from data packets, as we need a comment header to carry comments from decoded Vorbis/FLAC/etc (or to be encoded to Vorbis/FLAC/etc). Erik's comment re: 64-bit floats is one I'd like to highlight yellow. Pending a check on libogg2 to see what 64-bit alignment is available on which platforms, we could:
- Extend the data packet header to 64-bits, perhaps only with 64-bit data
- Have a packet0 field which specifies how many header packets there are, as Conrad suggested
- Have the last header packet of ID \xFF which marks the end of the headers

--Arc 01:10, 13 Nov 2005 (PST)

Signed/Unsigned data flag?

Not really. The data can be losslessly changed to signed as the default. Unsigned 8-bit data (where 128 is the midpoint) is easily changed to signed, and changed back if being saved as RIFF/WAV (which supports 8-bit samples only as unsigned). However, it wouldn't hurt to support it. Applications can be built to support one or multiple formats, thus requesting conversion if not supported by the codec.

--Arc

I don't agree with that. It just puts more conditional code into packages that would normally have only one native format and it gives them more opportunity to fail to support variants of the format. If it's fixed then a few packages will always have to modify the data, and most will never get it wrong. If it's variable then every package will have to do something sometimes, or fail occasionally.

--Gumboot 01:28, 8 Nov 2005 (PST)

I see no reason to support any unsigned PCM format other than 8 bit. For instance, I know of no container format which supports unsigned 16 bit.

--Erikd

This issue has been resolved in the most recent Format draft; unsigned support is provided for 8bit samples only.

--Arc 23:39, 10 Nov 2005 (PST)
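
For reference, the lossless conversion between unsigned and signed 8-bit samples discussed in this section amounts to moving the midpoint between 128 and 0. A minimal sketch in C, with hypothetical helper names:

```c
#include <stdint.h>

/* Hypothetical helper: unsigned 8-bit PCM (midpoint 128) to signed
 * 8-bit PCM (midpoint 0). The subtraction is done in int, and the
 * result always fits in -128..127. */
static int8_t u8_to_s8(uint8_t u)
{
    return (int8_t)(u - 128);
}

/* Hypothetical helper: the inverse mapping, e.g. for writing RIFF/WAV. */
static uint8_t s8_to_u8(int8_t s)
{
    return (uint8_t)(s + 128);
}
```

Every one of the 256 input values maps to a distinct output, so a round trip through either function returns the original sample.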

Int/Float data flag?

Some codecs (Vorbis) use floating point samples natively. Others only support int. Support for int/float data flag is thus important.

--Arc

Please don't make determination of the data format depend on multiple fields. Instead use an enumeration so that something like little endian 16 bit PCM can be specified as OGG_PCM_LE_PCM_16 and big endian 64 bit doubles can be specified as OGG_PCM_BE_FLOAT_64. This scheme is far more transparent and self documenting. If the format field is 8 bits, this scheme supports 256 formats; if it's 16 bits it will support 65536 formats.

I also suggest leaving the format associated with a value of zero as an invalid format. --Erikd
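
A sketch of what such an enumeration might look like in C follows. Only `OGG_PCM_LE_PCM_16` and `OGG_PCM_BE_FLOAT_64` are named in the comment above; the other identifiers, all numeric values, and the helper function are assumptions made purely for illustration and do not come from any draft.

```c
/* Illustrative format enumeration. Zero is reserved as invalid, as
 * suggested above. All numeric values are hypothetical. */
enum ogg_pcm_format {
    OGG_PCM_INVALID     = 0,  /* hypothetical name for the reserved value */
    OGG_PCM_LE_PCM_16   = 1,
    OGG_PCM_BE_PCM_16   = 2,  /* hypothetical */
    OGG_PCM_LE_FLOAT_64 = 3,  /* hypothetical */
    OGG_PCM_BE_FLOAT_64 = 4
};

/* Hypothetical helper: bytes per sample for a format code, or 0 when
 * the code is invalid or not known to this implementation. */
static int ogg_pcm_bytes_per_sample(unsigned format)
{
    switch (format) {
    case OGG_PCM_LE_PCM_16:
    case OGG_PCM_BE_PCM_16:
        return 2;
    case OGG_PCM_LE_FLOAT_64:
    case OGG_PCM_BE_FLOAT_64:
        return 8;
    default:
        return 0;
    }
}
```

Returning 0 for any unrecognised code lets a reader reject a stream cleanly when it meets a format defined after the implementation was written.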

It would not support 256 formats. It would support the small set of formats that somebody bothered to define early on, and it would not be able to expand because many implementations would fail to follow the changing specification thereby forcing everybody to limit themselves to the initial set.

--Gumboot 02:08, 10 Nov 2005 (PST)

This issue has been resolved in the most recent Format draft; float support is provided for 32bit and 64bit samples only.

--Arc 23:40, 10 Nov 2005 (PST)

Endian data flag? If not, which is used?

LSB/MSB can be changed losslessly, so one should probably be settled on for the data and stuck with. It's a fairly low-CPU process to change the endianness on the application side in any event, and if the application uses the bitpacker, this isn't even an issue. Supporting both is possible, too, but adds complexity to a format intended to be simple.

--Arc
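
As an illustration of how cheap the application-side conversion is, here is a sketch for 16-bit samples in C. Both function names are hypothetical; neither is taken from libogg or from the draft.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: swap the byte order of nsamples 16-bit samples
 * in place. Applying it twice restores the original data. */
static void swap16_buffer(unsigned char *data, size_t nsamples)
{
    for (size_t i = 0; i < nsamples; i++) {
        unsigned char tmp = data[2 * i];
        data[2 * i] = data[2 * i + 1];
        data[2 * i + 1] = tmp;
    }
}

/* Hypothetical helper: decode one little endian signed 16-bit sample
 * byte by byte, so the result does not depend on host byte order. */
static int16_t read_le16(const unsigned char *p)
{
    int v = p[0] | (p[1] << 8);               /* 0..65535 */
    return (int16_t)(v >= 0x8000 ? v - 0x10000 : v);
}
```

Reading samples byte by byte, as `read_le16` does, avoids the swap entirely at the cost of a few shifts per sample.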

We should just standardize on little endian ordering for the data. It's commonly used and well supported in hardware and software. Any cross architecture application that can deal with WAVs will already know how to support it.

--Jkoleszar 11:48, 9 Nov 2005 (PST)

I agree that we should use little endian as standard; however, I'm questioning whether big endian should be supported as well... after all, it'd be trivial for a plugin to convert from one to the other.

--Arc 13:11, 9 Nov 2005 (PST)

Big and little endian data formats should both be supported with equal status. There should not even be a default; the endian-ness should be explicit.

--Erikd

This issue has been resolved in the most recent Format draft; an endian flag is provided separate from the data format, though it will not affect 8bit sample types.

--Arc 23:42, 10 Nov 2005 (PST)

Vorbiscomment-style header?

It'd be useful to be able to carry information like what was decoded, or CDDB IDs, or replaygain information. Besides, if you don't put it in then five other people will do it five different ways.

--Arc

Agree

--Conrad

A comment header, identical to Vorbis's comment header, has been added to the most recent draft format.

--Arc 23:44, 10 Nov 2005 (PST)

How does one interpret a file where the Bits per Sample is neither 32 nor 64 and the Data Type is float?

One doesn't. Standardize on IEEE floats and be done with it. Simple, remember? :)

--Jkoleszar 11:48, 9 Nov 2005 (PST)

I'm uncertain exactly what this question is. Hopefully the submitter can clarify?

--Arc 13:11, 9 Nov 2005 (PST)

Many file formats (WAV, AIFF, AU and others) support 64 bit float data. WAV stores floats as little endian data and AIFF stores it as big endian data. OggPCM should support both 32 and 64 bit floats of both endian-nesses (is that a word?). I don't know of any other floating point format that needs consideration.

--Erikd

With the introduction of a data type lookup table for the most recent format, float types of sizes other than 32bit or 64bit are no longer available. If other sizes of float are needed they may be added in a future minor revision with an extended type.

--Arc 23:46, 10 Nov 2005 (PST)

@@ Line 1: / Line 1: @@
-'''Do we need signed/unsigned data flag?'''
+== Needs ==
+As primarily an audio interchange codec, '''OggPCM''' should support all the capabilities of curret Ogg audio codecs and any feature we'll conceivably need in the near future.  These should be supported in a way which is easy to implement.
-Not really.  The data can be easily changed to signed as default losslessly.  Unsigned 8-bit data (where 128 is the median) is easily changed to signed, and changed back if being saved as RIFF/WAV (which only supports unsigned 8-bit).
+Not all features need to be supported by all software, ie, support for more than two channels or 8-bit audio is not needed.
-However, it wouldn't hurt to support it.  Applications can be built to support one or multiple formats, thus requesting conversion if not supported by the codec.
+Current issues should be moved to the top.
-* I don't agree with that.  It just puts more conditional code into packages that would normally have only one native format and it gives them more opportunity to fail to support variants of the format. If it's fixed then a few packages will always have to modify the data, and most will never get it wrong. If it's variable then every package will have to do something sometimes, or fail occasionally. --[[User:Gumboot|Gumboot]] 01:28, 8 Nov 2005 (PST)
-'''Do we need to record int/float data flag?'''
-Some codecs (Vorbis) use floating point samples nativly.  Others only support int.  Support for int/float data flag is thus important.
+=== Seperate fields or unified table? ===
+* This has been the most contested issue to date, and one which I believe has been solved in a mutually acceptable way, since the two are not mutually exclusive.  A table, designed with values non-linearly, such as the value of bits within that table can be tested within the context of a simple flow chart, can be used to discover the format.  Meanwhile, a table can be implemented as desired and be in full compatability since the flow chart only permits valid choices.
+--[[User:Arc|Arc]] 00:48, 13 Nov 2005 (PST)
-'''Do we need to offer endian data flag?'''
+=== Are samples padded to some round number of bits? ===
+* I don't know of any PCM formats for non-octet based samples, but if you want to specify something, I'd say pack them into the MSB's of the next larger byte boundary, round toward zero, on a per channel basis. This should allow software that knows how to handle 16 bit audio but not 10 bit to operate on the data.
+--[[User:Jkoleszar|Jkoleszar]] 11:48, 9 Nov 2005 (PST)
-LSB/MSB can be changed losslessly, one should probobally be settled on for the data and stick with it.  It's a fainly low-CPU process to change the endian on the application side in any event, and if the application uses the bitpacker, this isn't even an issue.
+* The occurrence of N bit PCM where N is not a multiple of 8 bits is so rare that it should probably be ignored. In addition, there really isn't any reason to treat 10 bit data packed into the 10 most significant bits of a 16 bit int any different from a real 16 bit value. So why make any distinction?
+--[[Erikd|Erikd]]
-Supporting both is possible, too, but adds complexity to a format intended to be ''simple''.
+* 10-bit values have a range of -512 to +511.  When you shift them up the range is -32768 to 32704, so they need scaling if you want them to have their proper range in a normalised system.
+* Precisions that aren't a multiple of 8 bit aren't at all rare, but they're normally rounded off to a multiple for compatibility.
+--[[User:Gumboot|Gumboot]] 02:02, 10 Nov 2005 (PST)
+* Support for non-octet based sample sizes has been removed with the introduction of a data type table.  We no longer need to worry about this topic.
+--[[User:Arc|Arc]] 23:48, 10 Nov 2005 (PST)
-'''Which endian is it?'''
+* A new table is being worked on which allows 10bit to 32bit signed int data along with or without padding to octet.  Padding may be up to 8 bits, which allows (ie) 24-bit values to be padded to 32-bit words.
+--[[User:Arc|Arc]] 00:48, 13 Nov 2005 (PST)
-'''Is it worth supporting a vorbiscomment header?'''
-It'd be useful to be able to carry information like what was decoded, or CDDB IDs, or replaygain information.  Besides, if you don't put it in then five other people will do it five different ways.
+=== Do we want/need the 32-bit data packet header? ===
+* The issue was raised on the ogg-dev mailing list of wether this is necessary.  With only a single header packet, it could be considered an unneeded complication, however, additional header packets (current or future) will make this a requirement.
+--[[User:Arc|Arc]]
-'''How does one interpret a file where the Bits per Sample is neither 32 nor 64 and the Data Type is float?'''
+* I can definitely see people wanting to use comment pages, so I'd say leave the header on the data pages as well. On the other hand, if ogg provides guarantees about the alignment of packet data from packetout, I could see getting rid of it since there are benefits to working on buffers aligned to larger boundaries on some architectures. As far as I can tell, either no guarantees are made, or you'll get a buffer aligned to a word boundary, in which case having the header has no penalty.
+--[[User:Jkoleszar|Jkoleszar]] 11:48, 9 Nov 2005 (PST)
-'''Are samples padded to some round number of bits?'''
+* I believe that 64-bit platforms still use 32-bit memory space (I may be wrong!).  Yes, libogg2 buffers should always begin on a 32-bit word boundary, so the beginning of the data should also be on a boundary.  This was done intentionally, as was the choice to use a three letter codec identifier for raw codecs (since the packet ID + codec ID = 32bits this way), after an extended IRC discussion on the subject.  If ending on a 64-bit boundary is something we're really worried about, we could always add 4 bytes, but I really don't think it should be necessary.
+--[[User:Arc|Arc]] 13:11, 9 Nov 2005 (PST)
+* On UltraSparc and Alpha CPUs (both 64 bit) accessing a 64 bit double at an address that is not 8 byte aligned causes a segmentation fault. However, accessing unaligned doubles on x86 (ie 32 bit) is slower than accessing aligned doubles. You might want to consider this.
+--[[Erikd|Erikd]]
+* I cannot see why that data header is necessary. No other uncompressed audio format requires extra framing information, so I cannot see how future additional header fields would require to be added. It should be clear from the bos page how many samples go into a packet and thus this field is just complicating decoding with an extra parsing step IMHO.
+--[[User:Silvia|Silvia]]
+* This header is unnecessary. Ogg already provides packet framing, and the existing headers (BOS, comments) can be determined by sequence order. The BOS header already contains forwards compatability versioning for extra header fields. Even if new headers were to be created, they could be indicated by an 'extra_headers' field in the BOS header, as is done in Speex.
+--[[User:Conrad|Conrad]]
+* This issue remains one of few left contested, however, I believe that for uniformity with Vorbis and Theora, this is the correct method to identify packet types within the current version of the Ogg container.
+--[[User:Arc|Arc]] 23:51, 10 Nov 2005 (PST)
+* As to wether we need it, we need a way to mark header packets from data packets, as we need a comment header to carry comments from decoded Vorbis/FLAC/etc (or to be encoded to Vorbis/FLAC/etc).  Erik's comment re: 64-bit floats is one I'd like to highlight yellow.  Pending a check on libogg2 to see what 64-bit alignment is available on which platforms, we could:
+** Extend the data packet header to 64-bits, prehaps only with 64-bit data
+** Have a packet0 field which specifies how many header packets there are, as [[User:Conrad|Conrad]] suggested
+** Have the last header packet of ID \xFF which marks the end of the headers
+--[[User:Arc|Arc]] 01:10, 13 Nov 2005 (PST)
+=== Signed/Unsigned data flag? ===
+* Not really.  The data can be easily changed to signed as default losslessly.  Unsigned 8-bit data (where 128 is the median) is easily changed to signed, and changed back if being saved as RIFF/WAV (which only supports unsigned 8-bit).  However, it wouldn't hurt to support it.  Applications can be built to support one or multiple formats, thus requesting conversion if not supported by the codec.
+--[[User:Arc|Arc]]
+* I don't agree with that.  It just puts more conditional code into packages that would normally have only one native format and it gives them more opportunity to fail to support variants of the format. If it's fixed then a few packages will always have to modify the data, and most will never get it wrong. If it's variable then every package will have to do something sometimes, or fail occasionally.
+--[[User:Gumboot|Gumboot]] 01:28, 8 Nov 2005 (PST)
+* I see no reason to support any unsigned PCM format other than 8 bit. For instance, I know of no container format which supports unsigned 16 bit.
+--[[User:Erikd|Erikd]]
+* This issue has been resolved in the most recent [[OggPCM_Draft1#Format|Format]] draft; unsigned support is provided for 8bit samples only.
+--[[User:Arc|Arc]] 23:39, 10 Nov 2005 (PST)
+=== Int/Float data flag? ===
+* Some codecs (Vorbis) use floating point samples natively.  Others only support int.  Support for int/float data flag is thus important.
+--[[User:Arc|Arc]]
+* Please don't make determination of the data format depend on multiple fields. Instead use an enumeration so that something like little endian 16 bit PCM can be specifed as OGG_PCM_LE_PCM_16 and big endian 16 bit doubles can be specified as OGG_PCM_BE_FLOAT_64. This scheme is far more transparent and self documenting. If the format field is 8 bits, this scheme supports 256 formats; if its 16 bit it will support 65536 formats.
+I also suggest leaving the format associated with a value of zero as an invalid format.
+--[[Erikd|Erikd]]
+* It would ''not'' support 256 formats.  It would support the small set of formats that somebody bothered to define early on, and it would not be able to expand because many implementations would fail to follow the changing specification thereby forcing everybody to limit themselves to the initial set.
+--[[User:Gumboot|Gumboot]] 02:08, 10 Nov 2005 (PST)
+* This issue has been resolved in the most recent [[OggPCM_Draft1#Format|Format]] draft; float support is provided for 32bit and 64bit samples only.
+--[[User:Arc|Arc]] 23:40, 10 Nov 2005 (PST)
+=== Endian data flag?  If not, which is used? ===
+* LSB/MSB can be changed losslessly, one should probobally be settled on for the data and stick with it.  It's a fairly low-CPU process to change the endian on the application side in any event, and if the application uses the bitpacker, this isn't even an issue. Supporting both is possible, too, but adds complexity to a format intended to be ''simple''.
+--[[User:Arc|Arc]]
+* We should just standardize on little endian ordering for the data. It's commonly used and well supported in hardware and software. Any cross architecture application that can deal WAV's will already know how to support it.
+--[[User:Jkoleszar|Jkoleszar]] 11:48, 9 Nov 2005 (PST)
+* I agree that we should use little endian as standard, however, I'm questioning if big endian should be supported as well... after all, it'd be trivial for a plugin to convert from one to another.
+--[[User:Arc|Arc]] 13:11, 9 Nov 2005 (PST)
+* Big and little endian data formats should both be supported with equal status. There should not even be a default; the endian-ness should be explicit.
+--[[User:Erikd|Erikd]]
+* This issue has been resolved in the most recent [[OggPCM_Draft1#Format|Format]] draft; an endian flag is provided seperate from the data format, though it will not effect 8bit sample types.
+--[[User:Arc|Arc]] 23:42, 10 Nov 2005 (PST)
+=== Vorbiscomment-style header? ===
+* It'd be useful to be able to carry information like what was decoded, or CDDB IDs, or replaygain information.  Besides, if you don't put it in then five other people will do it five different ways.
+--[[User:Arc|Arc]]
+* Agree
+--[[User:Conrad|Conrad]]
+* A comment header, identical to vorbis's comment header, has been added to the most recent draft [[OggPCM_Draft1#Format|format]]
+--[[User:Arc|Arc]] 23:44, 10 Nov 2005 (PST)
+=== How does one interpret a file where the Bits per Sample is neither 32 nor 64 and the Data Type is float? ===
+* One doesn't. Standardize on IEEE floats and be done with it. Simple, remember? :)
+--[[User:Jkoleszar|Jkoleszar]] 11:48, 9 Nov 2005 (PST)
+* I'm uncertain exactly what this question is.  Hopefully the submitter can clarify?
+--[[User:Arc|Arc]] 13:11, 9 Nov 2005 (PST)
+* Many file formats (WAV, AIFF, AU and others) support 64 bit float data. WAV stores floats as little endian data and AIFF stores if as big endian data. OggPCM should support both 32 and 64 bit floats of both endian-nesses (is that a word?). I don't know of any other floating point format that needs consideration.
+--[[Erikd|Erikd]]
+* With the introduction of a data type lookup table for the most recent [[OggPCM_Draft1#Format|format]], float types of neither 32bit or 64bit size is no longer available.  If other sizes of float are needed they may be added in a future minor revision with an extended type.
+--[[User:Arc|Arc]] 23:46, 10 Nov 2005 (PST)