OggSpots: Difference between revisions
Revision as of 20:33, 18 February 2006
PNG is an open image compression format (http://www.w3.org/TR/PNG/) that this specification uses as the basis for a "timed image" codec. Recordings of seminars, lectures and presentations generally consist of slides, so a very efficient representation of such a recording is a stream of images with timing information, plus the recorded audio underneath.
This specification defines a format for describing a timed image track, including the presentation parameters, the input images and their timing. It then defines a logical bitstream format for encapsulating the images inside Ogg. When such a track is multiplexed with one of the Xiph audio codecs, such as Speex, Vorbis, FLAC or OggPCM2 (e.g. using OggSkeleton), the result is a video format that consists of timed images and audio.
Timed Image Specification Format
An authoring format for specifying "timed images" has to be defined.
One option is a plain text format. Something along the lines of:
    Display-Width: 320
    Display-Height: 240
    [newline]
    npt:00:00:00.000 /my_slides/image_01.png
    npt:00:02:10.000 /my_slides/image_02.png
    npt:00:05:02.000 /my_slides/image_03.png
    npt:00:06:50.000 /my_slides/image_04.png
could be a simple solution.
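To get a feel for how little machinery the plain text option needs, here is a minimal parsing sketch in Python. It is not part of the draft: it assumes that "[newline]" stands for an empty line separating the parameters from the clip list, that timestamps have the form npt:hh:mm:ss.mmm, and the names Clip, parse_npt and parse_timed_images are made up for this illustration.

```python
import re
from typing import NamedTuple


class Clip(NamedTuple):
    start_seconds: float  # presentation time of the image, in seconds
    src: str              # path of the image file


# Assumed timestamp layout: npt:hh:mm:ss with optional fractional seconds.
NPT_RE = re.compile(r"^npt:(\d+):(\d{2}):(\d{2}(?:\.\d+)?)$")


def parse_npt(value):
    """Convert an 'npt:hh:mm:ss.mmm' string into seconds."""
    match = NPT_RE.match(value)
    if match is None:
        raise ValueError(f"not an npt timestamp: {value!r}")
    hours, minutes, seconds = match.groups()
    return int(hours) * 3600 + int(minutes) * 60 + float(seconds)


def parse_timed_images(text):
    """Return (params, clips) for the assumed plain text layout."""
    # Assumption: one empty line separates the parameters from the clips.
    header, _, body = text.partition("\n\n")
    params = {}
    for line in header.splitlines():
        if line.strip():
            name, _, value = line.partition(":")
            params[name.strip()] = value.strip()
    clips = []
    for line in body.splitlines():
        if line.strip():
            start, src = line.split(None, 1)
            clips.append(Clip(parse_npt(start), src.strip()))
    return params, clips
```

Calling parse_timed_images on the example above (with the "[newline]" line replaced by an empty line) would yield the two display parameters and four clips with their start times in seconds.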
A different option is an XML-based format, something like a "timed image codec":
    <tic>
      <head>
        <param name="Display-Width" value="320"/>
        <param name="Display-Height" value="240"/>
        <param name="Image-Format" value="image/png"/>
      </head>
      <clip start="npt:00:00:00" src="/my_slides/image_01.png"/>
      <clip start="npt:02:10:00" src="/my_slides/image_02.png"/>
      <clip start="npt:05:02:00" src="/my_slides/image_03.png"/>
      <clip start="npt:06:50:00" src="/my_slides/image_04.png"/>
    </tic>
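For comparison, the XML variant could be read with a standard XML parser. The following Python sketch uses xml.etree.ElementTree from the standard library; the function name parse_tic is an assumption made for this illustration, and the start values are kept as plain strings because the draft does not yet pin down how the npt values of this variant are to be interpreted.

```python
import xml.etree.ElementTree as ET


def parse_tic(xml_text):
    """Return (params, clips) from a <tic> document as sketched above."""
    root = ET.fromstring(xml_text)
    if root.tag != "tic":
        raise ValueError(f"unexpected root element: {root.tag!r}")
    # <param name="..." value="..."/> elements inside <head>
    params = {p.get("name"): p.get("value")
              for p in root.findall("./head/param")}
    # <clip start="..." src="..."/> elements, kept in document order
    clips = [(c.get("start"), c.get("src"))
             for c in root.findall("./clip")]
    return params, clips
```

The XML form costs a parser dependency but makes it easy to add further parameters later; the plain text form is easier to write by hand.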
The advantages and disadvantages of these options need to be discussed.
Timed Images Mapping into Ogg
The first step towards encapsulating the data into Ogg is the definition of packets:
- There is an OggPNG ident header, which is encapsulated in the bos page.
- There is a secondary header packet containing the setup parameters, which is encapsulated in a separate page.
- Each image is mapped into its own data packet, which is inserted at the appropriate time.
- The eos page is empty.
OggPNG ident header
The timed PNG logical bitstream starts with an ident header which is mapped into the OggPNG bos page. The ident header contains all information required to identify the timed PNG bitstream and to set up a timed PNG decoder. It has the following format:
     0                   1                   2                   3
     0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1| Byte
    +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
    | Identifier 'PNGT\0\0\0\0'                                     | 0-3
    +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
    |                                                               | 4-7
    +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
    | Version major                 | Version minor                 | 8-11
    +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
    | Granulerate numerator                                         | 12-15
    +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
    |                                                               | 16-19
    +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
    | Granulerate denominator                                       | 20-23
    +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
    |                                                               | 24-27
    +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
    | Granuleshift  | 28
    +-+-+-+-+-+-+-+-+
    | ...
The OggPNG version as described here is major=0 minor=1.
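As a rough illustration of the layout, the first 29 bytes of the ident header could be written and read as in the Python sketch below. Several points are assumptions rather than part of this draft: the fields are taken to be little-endian and unsigned, the version fields are read as 16 bits each and the granulerate fields as 64 bits each (as the byte offsets suggest), and the names pack_ident, parse_ident and IdentHeader are invented here. Whatever follows byte 28 is not specified yet and is ignored.

```python
import struct
from fractions import Fraction
from typing import NamedTuple

# Assumed layout: 8-byte identifier, two 16-bit version fields,
# two 64-bit granulerate fields, one 8-bit granuleshift.
# "<" = little-endian without padding (an assumption, not stated in the draft).
IDENT_FORMAT = "<8sHHQQB"
IDENT_SIZE = struct.calcsize(IDENT_FORMAT)  # bytes 0-28 of the header
MAGIC = b"PNGT\0\0\0\0"


class IdentHeader(NamedTuple):
    version_major: int
    version_minor: int
    granulerate: Fraction
    granuleshift: int


def pack_ident(major=0, minor=1, rate_num=30, rate_den=1, shift=32):
    """Build the fixed leading part of an ident header packet."""
    return struct.pack(IDENT_FORMAT, MAGIC, major, minor,
                       rate_num, rate_den, shift)


def parse_ident(packet):
    """Parse the fixed leading part of an ident header packet."""
    if len(packet) < IDENT_SIZE:
        raise ValueError("packet too short for an OggPNG ident header")
    magic, major, minor, num, den, shift = struct.unpack_from(
        IDENT_FORMAT, packet)
    if magic != MAGIC:
        raise ValueError("not an OggPNG ident header")
    if den == 0:
        raise ValueError("granulerate denominator must not be zero")
    return IdentHeader(major, minor, Fraction(num, den), shift)
```

A decoder would typically run such a check on the first packet of each logical bitstream to decide whether the stream is a timed PNG track at all.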
The granulerate represents the temporal resolution of the logical bitstream in Hz given as a rational number in the same way as the OggSkeleton fisbone secondary header specifies granulerate. It enables a mapping of granule position of the data pages to time by calculating "granulepos / granulerate".
The default granulerate for OggPNG is 30/1 (numerator 30, denominator 1), which gives a resolution of 30 frames per second; each granule then lasts 1/30 of a second.
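The "granulepos / granulerate" mapping can be sketched in a few lines of Python. The function name is made up for this example, the granule position passed in is assumed to be the plain time-continuous value (without any lower-bit partition), and the example values assume a granulerate of 30 granules per second, matching the 30 frames per second resolution mentioned above. Exact rational arithmetic avoids rounding drift on long recordings.

```python
from fractions import Fraction


def granulepos_to_seconds(granulepos, rate_numerator, rate_denominator):
    """Map a granule position to time: granulepos / granulerate."""
    if rate_numerator <= 0 or rate_denominator <= 0:
        raise ValueError("granulerate must be a positive rational number")
    granulerate = Fraction(rate_numerator, rate_denominator)  # in Hz
    return Fraction(granulepos) / granulerate


# Granule 3900 at 30 granules per second lies 130 seconds into the stream.
seconds = granulepos_to_seconds(3900, 30, 1)
```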
The granuleshift is a 1-byte integer that states whether the granule_position of the OggPNG logical bitstream is partitioned into two parts and, if so, how many of the lower bits are used for the second part. The upper bits still signify a time-continuous granule position for a directly decodable and presentable data granule. The lower bits allow the granule position of a previous OggPNG data packet (i.e. image) to be specified, which helps to identify how far back a seek must go to reach the last, still active image. The granuleshift is therefore the base-2 logarithm of the maximum possible image spacing.
The default granuleshift is 32, which halves the 64-bit granule position: the upper 32 bits hold the position itself and the lower 32 bits hold the backwards pointer.
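The bit partitioning can be illustrated with a small Python sketch. It only shows the mechanical split of a 64-bit granule position at the granuleshift; how exactly the lower bits encode the previous image is left open by this draft, so the value names "upper" and "lower" and both function names are placeholders chosen for this example.

```python
GRANULEPOS_BITS = 64  # Ogg granule positions are 64-bit values


def split_granulepos(granulepos, granuleshift=32):
    """Split a granule position into (upper bits, lower bits)."""
    if not 0 <= granuleshift < GRANULEPOS_BITS:
        raise ValueError("granuleshift out of range")
    mask = (1 << granuleshift) - 1
    return granulepos >> granuleshift, granulepos & mask


def join_granulepos(upper, lower, granuleshift=32):
    """Combine upper and lower parts into one granule position."""
    if not 0 <= granuleshift < GRANULEPOS_BITS:
        raise ValueError("granuleshift out of range")
    if not 0 <= lower < (1 << granuleshift):
        raise ValueError("lower part does not fit into granuleshift bits")
    if not 0 <= upper < (1 << (GRANULEPOS_BITS - granuleshift)):
        raise ValueError("upper part does not fit into the remaining bits")
    return (upper << granuleshift) | lower
```

With a granuleshift of 0 the lower part is always 0 and the whole granule position is the time-continuous value, which corresponds to not partitioning at all.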