OggSpots: Difference between revisions

From XiphWiki
m (OggPNG moved to OggSpots)
(move from PNG to generic images)
Line 3: Line 3:
== Purpose ==
== Purpose ==


'''PNG''' is an open [http://www.w3.org/TR/PNG/ image compression format] that is used as the basis for a "timed image" codec in this specification. Recordings of seminars, lectures and presentations generally consist of slides and a very efficient representation of such a recording is as a stream of images with timing information plus the recorded audio underneath.
Recordings of seminars, lectures and presentations generally consist of slides plus an audio recording of the presentation. Slides are usually (if not animated) just a sequence of images. A very efficient representation of such a recording is then as a stream of images with timing information plus the recorded audio underneath.


This specification defines a format to describe a timed image track, including the presentation parameters, the input images and their timing. It then defines a logical bitstream format for encapsulating the images inside Ogg. When multiplexed together with one of the Xiph audio codecs such as [[Speex]], [[Vorbis]], [[FLAC]], and [[OggPCM2]], e.g. using [[OggSkeleton]], you end up with a video format that consists of timed images and audio.
This specification defines a format to describe a timed image track, including presentation parameters, input image formats and their timing. We encourage in particular the use of <b>PNG</b> as an open [http://www.w3.org/TR/PNG/ image compression format] or of <b>JPG</b>.
 
We define a logical bitstream format for encapsulating the images inside Ogg. When multiplexed together with one of the Xiph audio codecs such as [[Speex]], [[Vorbis]], [[FLAC]], and [[OggPCM2]], e.g. using [[OggSkeleton]], you end up with a video format that consists of timed images and audio.




== Timed Image Specification Format ==
== Timed Image Specification Format ==


The bitstream format this "codec" should be very simple. It should essentially consist only of a sequence of PNG images preceeded by a header with a simple set o fields to set up the decoding. If at all possible, we're trying to avoid giving parameters to the individual images since this will create an additional intermediate decoding step.
The bitstream format for this "codec" should be very simple. It should essentially consist only of a sequence of images preceeded by a header with a simple set of fields to set up the decoding. If at all possible, we will avoid encoding parameters for individual images since this will create an additional intermediate decoding step.


The following fields are under discussion:
The following fields are under discussion:
Line 16: Line 18:
* Image-Format
* Image-Format


This format can be generalised to cover all kinds of image formats, e.g. JPEG, GIF. We want to avoid stumbling over image formats that cannot be decoded by a client and therefore all images in such a stream need to be of the same format, which is specified here.
This will specify the image format, e.g. JPEG, GIF. We want to avoid players stumbling over image formats that they do not understand and therefore all images in such a stream need to be of the same format, given in this field.


* Display-Width, Display-Height
* Display-Width, Display-Height


While it is expected that most of the images in the data packets are of the same size (dimensions, geometry, resolution), variations may occur. A decoder should be given a resolution at which the images are to be presented. Aspect ratio must be kept when images are re-scaled.
While it is expected that most of the images in the data packets are of the same size (dimensions, geometry, resolution), variations may occur. A decoder should be given a resolution at which the images are to be presented. Aspect ratio must be kept when images are re-scaled.
NOTE: It may be interesting to keep smaller images at their original size and just put them in the screen centre?


* Rescaling
* Rescaling


While images that are larger than the display are defined by Display-Width and Display-Height must be scaled down to this size, this may not necessarily be desirable for smaller images. This paramter decides whether smaller images should be scaled up to meet the display size, or just be displayed inside it.
While images that are larger than the display area defined by Display-Width and Display-Height must be scaled down to this size, this may not necessarily be desirable for smaller images. This paramter decides whether smaller images should be scaled up to meet the display size, or just be displayed inside it.


* Align-Horizontal, Align-Vertical
* Align-Horizontal, Align-Vertical
Line 39: Line 39:




A rough format specification could be:
A rough way to author such information could be:


  Display-Width: 320
  Display-Width: 320
  Display-Height: 240
  Display-Height: 240
  Rescaling: False
  Rescaling: True
  Background-Colour: Grey
  Background-Colour: Grey
  npt:00:00:00.000 /my_slides/image_01.png
  npt:00:00:00.000 /my_slides/image_01.png
Line 55: Line 55:
The first step towards encapsulating the data into ogg is the definition of packets:
The first step towards encapsulating the data into ogg is the definition of packets:


* There is a OggPNG ident header, which is encapsulated in the bos page.
* There is a OggSpots ident header, which is encapsulated in the bos page.
* There is a secondary header packet containing the setup parameters, which is encapsulated in a separate page.
* There is a secondary header packet containing the setup parameters, which is encapsulated in a separate page.
* Each image is mapped into a data packet, which are each encoded in their own packet and inserted at the accurate time.
* Each image is mapped into a data packet, which are each encoded in their own packet and inserted at the accurate time.
Line 61: Line 61:




=== OggPNG ident header ===
=== OggSpots ident header ===


The timed PNG logical bitstream starts with an ident header which is mapped into the OggPNG bos page. The ident header contains all information required to identify the timed PNG bitstream and to set up a timed PNG decoder. It has the following format:
The timed Spots logical bitstream starts with an ident header which is mapped into the OggSpots bos page. The ident header contains all information required to identify the timed Spots bitstream and to set up a timed Spots decoder. It has the following format:


   0                  1                  2                  3
   0                  1                  2                  3
   0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1| Byte
   0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1| Byte
  +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
  +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
  | Identifier 'PNGT\0\0\0\0'                                    | 0-3
  | Identifier 'SPOTS\0\0\0'                                    | 0-3
  +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
  +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
  |                                                              | 4-7
  |                                                              | 4-7
Line 86: Line 86:
  | ...
  | ...


The OggPNG <i>version</i>  as described here is major=0 minor=1.
The OggSpots <i>version</i>  as described here is major=0 minor=1.


The <i>granulerate</i>  represents the temporal resolution of the logical bitstream in Hz given as a rational number in the same way as the [[OggSkeleton]] fisbone secondary header specifies granulerate. It enables a mapping of granule position of the data pages to time by calculating "granulepos / granulerate".
The <i>granulerate</i>  represents the temporal resolution of the logical bitstream in Hz given as a rational number in the same way as the [[OggSkeleton]] fisbone secondary header specifies granulerate. It enables a mapping of granule position of the data pages to time by calculating "granulepos / granulerate".


The default granule rate for OggPNG is: 1/30 (30 frames per second resolution).
The default granule rate for OggSpots is: 1/30 (30 frames per second resolution).


The <i>granuleshift</i>  is a 1 Byte integer number describing whether to partition the granule_position into two for the OggPNG logical bitstream, and how many of the lower bits to use for the partitioning.  The upper bits then still signify a time-continuous granule position for a directly decodable and presentable data granule.  The lower bits allow for specification of the granule position of a previous OggPNG data packet (i.e. image), which helps to identify how much backwards seeking is necessary to get to the last and still active image. The granuleshift is therefore the log of the maximum possible image spacing.
The <i>granuleshift</i>  is a 1 Byte integer number describing whether to partition the granule_position into two for the OggSpots logical bitstream, and how many of the lower bits to use for the partitioning.  The upper bits then still signify a time-continuous granule position for a directly decodable and presentable data granule.  The lower bits allow for specification of the granule position of a previous OggSpots data packet (i.e. image), which helps to identify how much backwards seeking is necessary to get to the last and still active image. The granuleshift is therefore the log of the maximum possible image spacing.


The default granule shift used is 32, which halfs the granule position to allow for the backwards pointer.
The default granule shift used is 32, which halfs the granule position to allow for the backwards pointer.




=== OggPNG secondary header ===
=== OggSpots secondary header ===


This header contains all the setup information for the decoder.  
This header contains all the setup information for the decoder.  
Line 104: Line 104:
   0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1| Byte
   0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1| Byte
  +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
  +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
  | Display width                | Display height                | 0-3
| Image-Format                                                  | 0-3
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|                                                              | 4-7
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
  | Display width                | Display height                | 8-11
  +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
  +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
  | ...                                                          |
  | BG-Color      | Rescaling    | Align-Horiz  | Align-Vert    | 12-15
  +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
  +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+




=== OggPNG data ===
=== OggSpots data ===


The data packets are simple. Each data packet simply contains a PNG image.
The data packets are simple. Each data packet simply contains a complete image.


The insertion time (and therefore the granule_pos) is given through the specified time.
The insertion time (and therefore the granule_pos) is given through the specified time.

Revision as of 22:34, 20 February 2006


Purpose

Recordings of seminars, lectures and presentations generally consist of slides plus an audio recording of the presentation. Slides are usually (if not animated) just a sequence of images. A very efficient representation of such a recording is then as a stream of images with timing information plus the recorded audio underneath.

This specification defines a format to describe a timed image track, including presentation parameters, input image formats and their timing. In particular, we encourage the use of PNG, an open image compression format, or of JPG.

We define a logical bitstream format for encapsulating the images inside Ogg. When multiplexed together with one of the Xiph audio codecs such as Speex, Vorbis, FLAC or OggPCM2 (e.g. using OggSkeleton), the result is a video format that consists of timed images and audio.


Timed Image Specification Format

The bitstream format for this "codec" should be very simple. It should essentially consist only of a sequence of images preceded by a header with a simple set of fields to set up the decoding. If at all possible, we will avoid encoding parameters for individual images, since this would create an additional intermediate decoding step.

The following fields are under discussion:

  • Image-Format

This field specifies the image format, e.g. JPEG or GIF. We want to avoid players stumbling over image formats that they do not understand; therefore all images in a stream need to be of the same format, which is given in this field.

  • Display-Width, Display-Height

While it is expected that most of the images in the data packets are of the same size (dimensions, geometry, resolution), variations may occur. A decoder should be given a resolution at which the images are to be presented. Aspect ratio must be kept when images are re-scaled.

  • Rescaling

While images that are larger than the display area defined by Display-Width and Display-Height must be scaled down to this size, this may not necessarily be desirable for smaller images. This parameter decides whether smaller images should be scaled up to meet the display size, or just be displayed inside it.

  • Align-Horizontal, Align-Vertical

Smaller images that are not rescaled to display size may be aligned at different positions inside the larger display area:

    • Align-Horizontal: centre/left/right
    • Align-Vertical: centre/top/bottom

  • Background-Colour

For transparent images and for smaller, non-rescaled images, the background colour of the display area has to be defined. This may be black, white, grey or any other colour.
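
The interplay of Display-Width, Display-Height, Rescaling and the two alignment fields can be sketched as a small placement routine. This is only an illustration of the rules described above, not part of the specification: the function name `place_image`, the string values for the alignment arguments and the choice to also align up-scaled images whose aspect ratio differs from the display are assumptions.

```python
def place_image(img_w, img_h, disp_w, disp_h, rescaling=True,
                align_h="centre", align_v="centre"):
    """Return (x, y, width, height) of the image inside the display area.

    Hypothetical helper: the names and the alignment strings are
    illustrative only and are not defined by the draft.
    """
    # Largest scale factor that fits the image into the display area
    # while keeping its aspect ratio.
    scale = min(disp_w / img_w, disp_h / img_h)
    # Larger images are always scaled down; smaller images are only
    # scaled up when the Rescaling field is set.
    if scale > 1 and not rescaling:
        scale = 1.0
    w = round(img_w * scale)
    h = round(img_h * scale)
    # Any leftover space is distributed according to the alignment fields;
    # it would be filled with the background colour.
    x = {"left": 0, "centre": (disp_w - w) // 2, "right": disp_w - w}[align_h]
    y = {"top": 0, "centre": (disp_h - h) // 2, "bottom": disp_h - h}[align_v]
    return x, y, w, h
```

For a 320x240 display, a 160x120 image with rescaling switched off and centred alignment would be placed at offset (80, 60) at its original size.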


A rough way to author such information could be:

Display-Width: 320
Display-Height: 240
Rescaling: True
Background-Colour: Grey
npt:00:00:00.000 /my_slides/image_01.png
npt:00:02:10.000 /my_slides/image_02.png
npt:00:05:02.000 /my_slides/image_03.png
npt:00:06:50.000 /my_slides/image_04.png
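
A description in this rough authoring form could be read with a few lines of Python. The sketch below is an assumption about how such a file might be parsed, since the draft does not fix a grammar: the function name `parse_spots_description` is hypothetical, parameter values are kept as plain strings, and `npt:` times are assumed to be in `hours:minutes:seconds` form.

```python
import re

# Matches lines such as "npt:00:02:10.000 /my_slides/image_02.png".
NPT_LINE = re.compile(r"^npt:(\d+):(\d{2}):(\d{2}(?:\.\d+)?)\s+(\S.*)$")

def parse_spots_description(text):
    """Split an authoring description into parameters and timed images.

    Returns (params, images): params maps field names to string values,
    images is a list of (time_in_seconds, path) tuples in file order.
    """
    params = {}
    images = []
    for raw in text.splitlines():
        line = raw.strip()
        if not line:
            continue
        match = NPT_LINE.match(line)
        if match:
            hours, minutes, seconds, path = match.groups()
            start = int(hours) * 3600 + int(minutes) * 60 + float(seconds)
            images.append((start, path))
        else:
            # Everything else is treated as a "Name: value" parameter line.
            key, _, value = line.partition(":")
            params[key.strip()] = value.strip()
    return params, images
```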


Timed Images Mapping into Ogg

The first step towards encapsulating the data into ogg is the definition of packets:

  • There is an OggSpots ident header, which is encapsulated in the bos page.
  • There is a secondary header packet containing the setup parameters, which is encapsulated in a separate page.
  • Each image is mapped into its own data packet, which is inserted at the appropriate time.
  • The eos page is empty.


OggSpots ident header

The timed Spots logical bitstream starts with an ident header which is mapped into the OggSpots bos page. The ident header contains all information required to identify the timed Spots bitstream and to set up a timed Spots decoder. It has the following format:

 0                   1                   2                   3
 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1| Byte
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
| Identifier 'SPOTS\0\0\0'                                     | 0-3
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|                                                               | 4-7
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
| Version major                 | Version minor                 | 8-11
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
| Granulerate numerator                                         | 12-15
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|                                                               | 16-19
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
| Granulerate denominator                                       | 20-23
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|                                                               | 24-27
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
| Granuleshift  |                                                 28
+-+-+-+-+-+-+-+-+
| ...

The OggSpots version as described here is major=0 minor=1.
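
The first 29 bytes of the ident header shown in the diagram can be packed and parsed with Python's `struct` module. This is a sketch under stated assumptions: the draft does not specify a byte order or signedness, so little-endian fields and signed 64-bit granulerate values are assumed here, and the function names are hypothetical. Anything following byte 28 (the "..." in the diagram) is ignored.

```python
import struct

# Assumed layout: 8-byte identifier, 16-bit version major and minor,
# 64-bit granulerate numerator and denominator, 1-byte granuleshift.
# Little-endian byte order ("<") is an assumption, not stated in the draft.
IDENT_FORMAT = "<8sHHqqB"
IDENT_MAGIC = b"SPOTS\0\0\0"

def pack_ident_header(granulerate_num, granulerate_den, granuleshift,
                      version_major=0, version_minor=1):
    """Build the fixed 29-byte part of an OggSpots ident header."""
    return struct.pack(IDENT_FORMAT, IDENT_MAGIC, version_major,
                       version_minor, granulerate_num, granulerate_den,
                       granuleshift)

def parse_ident_header(packet):
    """Parse the fixed part of an ident header into a dictionary."""
    size = struct.calcsize(IDENT_FORMAT)
    magic, major, minor, num, den, shift = struct.unpack(
        IDENT_FORMAT, packet[:size])
    if magic != IDENT_MAGIC:
        raise ValueError("not an OggSpots ident header")
    return {"version": (major, minor),
            "granulerate": (num, den),
            "granuleshift": shift}
```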

The granulerate represents the temporal resolution of the logical bitstream in Hz given as a rational number in the same way as the OggSkeleton fisbone secondary header specifies granulerate. It enables a mapping of granule position of the data pages to time by calculating "granulepos / granulerate".

The default granule rate for OggSpots is 30 Hz, i.e. one granule per 1/30 second (30 frames per second resolution).
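
The mapping "granulepos / granulerate" can be written out with exact rational arithmetic. The sketch below assumes the granulerate is stored as numerator and denominator of a rate in Hz, so that a resolution of 30 frames per second corresponds to numerator 30 and denominator 1; the function names are hypothetical.

```python
from fractions import Fraction

def granule_to_seconds(granule, rate_num, rate_den):
    """Time in seconds of a granule position: granule / granulerate."""
    return Fraction(granule) / Fraction(rate_num, rate_den)

def seconds_to_granule(seconds, rate_num, rate_den):
    """Granule position at or just before the given time in seconds."""
    return int(Fraction(seconds) * Fraction(rate_num, rate_den))
```

Under these assumptions, an image scheduled at 00:02:10 (130 seconds) would sit at granule 3900.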

The granuleshift is a 1-byte integer that describes whether the granule_position of the OggSpots logical bitstream is partitioned into two parts, and how many of the lower bits are used for the lower part. The upper bits still signify a time-continuous granule position for a directly decodable and presentable data granule. The lower bits allow the granule position of a previous OggSpots data packet (i.e. image) to be specified, which helps to identify how much backwards seeking is necessary to get to the last, still active image. The granuleshift is therefore the base-2 logarithm of the maximum possible image spacing.

The default granule shift is 32, which splits the 64-bit granule position into two 32-bit halves to make room for the backwards pointer.
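
The partitioning can be illustrated with plain bit operations. The draft leaves open exactly what the lower bits contain, so the sketch below simply treats them as an opaque "back" value that must fit into granuleshift bits; the function names are hypothetical.

```python
def join_granulepos(current, back, granuleshift):
    """Combine the upper (current) and lower (back) parts into one value."""
    if back < 0 or back >= (1 << granuleshift):
        raise ValueError("back value does not fit into granuleshift bits")
    return (current << granuleshift) | back

def split_granulepos(granulepos, granuleshift):
    """Return (upper, lower) parts of a partitioned granule position."""
    upper = granulepos >> granuleshift
    lower = granulepos & ((1 << granuleshift) - 1)
    return upper, lower
```

With a granuleshift of 0 the lower part is always empty and the granule position is used unpartitioned.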


OggSpots secondary header

This header contains all the setup information for the decoder.

 0                   1                   2                   3
 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1| Byte
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
| Image-Format                                                  | 0-3
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|                                                               | 4-7
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
| Display width                 | Display height                | 8-11
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
| BG-Color      | Rescaling     | Align-Horiz   | Align-Vert    | 12-15
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
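
The 16-byte layout in the diagram could be serialised as follows. Much of this is assumed rather than specified: little-endian byte order, the Image-Format field as an ASCII name padded with NUL bytes to 8 bytes, and the four single-byte fields passed through as raw integer codes, since the draft does not yet define their values. The function names are hypothetical.

```python
import struct

# Assumed layout: 8-byte Image-Format, 16-bit display width and height,
# then one byte each for BG-Color, Rescaling, Align-Horiz and Align-Vert.
SECONDARY_FORMAT = "<8sHHBBBB"

def pack_secondary_header(image_format, width, height,
                          bg_colour, rescaling, align_h, align_v):
    """Build a 16-byte secondary header from raw field values."""
    name = image_format.encode("ascii")
    if len(name) > 8:
        raise ValueError("image format name longer than 8 bytes")
    # struct pads the 8s field with NUL bytes on the right.
    return struct.pack(SECONDARY_FORMAT, name, width, height,
                       bg_colour, rescaling, align_h, align_v)

def parse_secondary_header(packet):
    """Parse a secondary header back into a dictionary of raw values."""
    size = struct.calcsize(SECONDARY_FORMAT)
    name, width, height, bg, rescaling, align_h, align_v = struct.unpack(
        SECONDARY_FORMAT, packet[:size])
    return {"image_format": name.rstrip(b"\0").decode("ascii"),
            "display": (width, height),
            "bg_colour": bg,
            "rescaling": rescaling,
            "align": (align_h, align_v)}
```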


OggSpots data

The data packets are simple: each data packet contains exactly one complete image.

The insertion time of each data packet (and therefore its granule_pos) is determined by the time specified for that image.