OggSpots: Difference between revisions

From XiphWiki
(added ident header)
No edit summary
 
(9 intermediate revisions by 2 users not shown)
Line 1: Line 1:
{{draft}}
{{draft}}


'''PNG''' is an open [http://www.w3.org/TR/PNG/ image compression format] that is used as the basis for a "timed image" codec in this specification. Recordings of seminars, lectures and presentations generally consist of slides and a very efficient representation of such a recording is as a stream of images with timing information plus the recorded audio underneath.
== Purpose ==


This specification defines a format to describe a timed image track, including the presentation parameters, the input images and their timing. It then defines a logical bitstream format for encapsulating the images inside Ogg. When multiplexed together with one of the Xiph audio codecs such as [[Speex]], [[Vorbis]], [[FLAC]], and [[OggPCM2]], e.g. using [[OggSkeleton]], you end up with a video format that consists of timed images and audio.
Recordings of seminars, lectures and presentations generally consist of slides plus an audio recording of the presentation. Slides are usually (if not animated) just a sequence of images. A very efficient representation of such a recording is as a stream of images with timing information plus the recorded audio underneath.


This specification defines a format to describe a timed image track, including presentation parameters, input image formats and their timing. We encourage in particular the use of the open compression formats [http://www.w3.org/TR/PNG/ PNG] and [http://www.jpeg.org/ JPEG] for lossless and lossy image compression respectively.
We define a logical bitstream format for encapsulating the images inside Ogg. When multiplexed together with one of the Xiph audio codecs such as [[Speex]], [[Vorbis]], [[FLAC]], or [[OggPCM]], e.g. using [[Ogg Skeleton]], you end up with a video format that consists of timed images and audio.


== Timed Image Specification Format ==
== Timed Image Specification Format ==


An authoring format for specifying "timed images" has to be defined.
The bitstream format for this "codec" should be very simple. It should essentially consist only of a sequence of images preceded by a header with a simple set of fields to set up the decoding.
 
One option is a plain text format. Something along the lines of:
 
Display-Width: 320
Display-Height: 240
[newline]
npt:00:00:00.000 /my_slides/image_01.png
npt:00:02:10.000 /my_slides/image_02.png
npt:00:05:02.000 /my_slides/image_03.png
npt:00:06:50.000 /my_slides/image_04.png
 
could be a simple solution.


A different option is to use a XML based format, something like a "timed image codec":
<tic>
  <head>
    <param name="Display-Width"  value="320"/>
    <param name="Display-Height" value="240"/>
    <param name="Image-Format" value="image/png"/>
  </head>
  <clip start="npt:00:00:00" src="/my_slides/image_01.png"/>
  <clip start="npt:00:02:10" src="/my_slides/image_02.png"/>
  <clip start="npt:00:05:02" src="/my_slides/image_03.png"/>
  <clip start="npt:00:06:50" src="/my_slides/image_04.png"/>
</tic>
The advantages and disadvantages of these options need to be discussed.




Line 43: Line 19:
The first step towards encapsulating the data into ogg is the definition of packets:
The first step towards encapsulating the data into ogg is the definition of packets:


* There is an OggPNG ident header, which is encapsulated in the bos page.
* There is an OggSpots ident header with setup parameters, which is encapsulated in the bos page.
* There is a secondary header packet containing the setup parameters, which is encapsulated in a separate page.
* Each image is mapped into its own data packet, which is inserted at the correct time.
* Each image is mapped into its own data packet, which is inserted at the correct time.
* The eos page is empty.
* The eos page is empty.




=== OggPNG ident header ===
=== OggSpots ident header ===


The timed PNG logical bitstream starts with an ident header which is mapped into the OggPNG bos page. The ident header contains all information required to identify the timed PNG bitstream and to set up a timed PNG decoder. It has the following format:
The timed Spots logical bitstream starts with an ident header which is mapped into the OggSpots bos page. The ident header contains all information required to identify the timed Spots bitstream and to set up a timed Spots decoder. It has the following format:


   0                  1                  2                  3
   0                  1                  2                  3
   0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1| Byte
   0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1| Byte
  +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
  +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
  | Identifier 'PNGT\0\0\0\0'                                     | 0-3
  | Identifier 'SPOTS\0\0\0'                                     | 0-3
  +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
  +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
  |                                                              | 4-7
  |                                                              | 4-7
Line 70: Line 45:
  |                                                              | 24-27
  |                                                              | 24-27
  +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
  +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
  | Granuleshift  |                                                 28
  | Granuleshift  | RESERVED FOR LATER USE                        | 28-31
  +-+-+-+-+-+-+-+-+
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
  | ...
| Image-Format                                                  | 32-35
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|                                                              | 36-39
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
| Display width                | Display height                | 40-43
  +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
  | BG-Color                                                      | 44-47
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
| Align-Horiz  | Align-Vert    | Options                      | 48-51
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+


The OggPNG <i>version</i>  as described here is major=0 minor=1.


The <i>granulerate</i> represents the temporal resolution of the logical bitstream in Hz given as a rational number in the same way as the [[OggSkeleton]] fisbone secondary header specifies granulerate. It enables a mapping of granule position of the data pages to time by calculating "granulepos / granulerate".
* Version:
The OggSpots version as described here is major=0 minor=1.


The default granule rate for OggPNG is: 1/30 (30 frames per second resolution).
* Granulerate & Granuleshift:
The granulerate  represents the temporal resolution of the logical bitstream in Hz given as a rational number in the same way as the [[OggSkeleton]] fisbone secondary header specifies granulerate. It enables a mapping of granule position of the data pages to time by calculating "granulepos / granulerate".


The <i>granuleshift</i>  is a 1 Byte integer number describing whether to partition the granule_position into two for the OggPNG logical bitstream, and how many of the lower bits to use for the partitioning.  The upper bits then still signify a time-continuous granule position for a directly decodable and presentable data granule.  The lower bits allow for specification of the granule position of a previous OggPNG data packet (i.e. image), which helps to identify how much backwards seeking is necessary to get to the last and still active image. The granuleshift is therefore the log of the maximum possible image spacing.
The default granule rate for OggSpots is: 1/30 (30 frames per second resolution).
 
The granuleshift is a 1 Byte integer number describing whether to partition the granule_position into two for the OggSpots logical bitstream, and how many of the lower bits to use for the partitioning.  The upper bits then still signify a time-continuous granule position for a directly decodable and presentable data granule.  The lower bits allow for specification of the granule position of a previous OggSpots data packet (i.e. image), which helps to identify how much backwards seeking is necessary to get to the last and still active image. The granuleshift is therefore the log of the maximum possible image spacing.


The default granule shift used is 32, which halves the granule position to allow for the backwards pointer.
The default granule shift used is 32, which halves the granule position to allow for the backwards pointer.
* Image-Format:
The image format specifies e.g. JPEG, GIF. We want to avoid players stumbling over image formats that they do not understand and therefore all image formats used in an OggSpots logical bitstream need to be provided in the bos page.
* Display-Width, Display-Height:
While it is expected that most of the images in the data packets are of the same size (dimensions, geometry, resolution), variations may occur. These fields provide a decoder with a resolution at which the images are to be presented. If images need to be re-scaled, aspect ratio must be kept.
* Background-Colour:
For transparent images and for smaller, non-rescaled images, the background colour of the images has to be defined. This may be black, white, gray or whatever. This is a default setting which may be overruled by the specific image.
* Align-Horizontal, Align-Vertical:
Smaller images that are not rescaled to display size may be aligned at several different areas inside the larger display image:
* Align-Horizontal: centre/left/right
* Align-Vertical: centre/top/bottom
This is a default setting which may be ruled over by the specific image through its own parameters.
* Options:
* Upscaling: This parameter decides whether smaller images should be scaled up to meet the display size, or just be displayed inside it. This is a default setting and can be overridden by the specific image through its own parameters.
* Downscaling: This parameter decides whether larger images should be scaled down to meet the display size, or just be truncated. This is a default setting and can be overridden by the specific image through its own parameters.
=== OggSpots data ===
Each data packet contains a byte offset to where the complete image is stored, a set of parameters to describe what to do with the image, and the image itself.
The insertion time is encoded in the granule_pos of the Ogg Page that the image ends on.
  0                  1                  2                  3
  0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1| Byte
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
| Byte Offset                                                  | 0-3
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
| Image-Format                                                  | 4-7
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|                                                              | 8-11
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
| Display width                | Display height                | 12-15
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
| BG-Color      | Rescaling    | Align-Horiz  | Align-Vert    | 16-19
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
== Authoring Considerations ==
While it is possible to author an OggSpots video through command-line parameters by handing over a set of images and setting parameters, there should also be a means to author it through files.
A rough way to author such information could be:
Display-Width: 320
Display-Height: 240
Rescaling: True
Background-Colour: Grey
...further parameters...
npt:00:00:00.000 /my_slides/image_01.png
npt:00:02:10.000 /my_slides/image_02.png
npt:00:05:02.000 /my_slides/image_03.png
npt:00:06:50.000 /my_slides/image_04.png
[[Category:Ogg Mappings]]

Latest revision as of 15:27, 15 February 2008


Purpose

Recordings of seminars, lectures and presentations generally consist of slides plus an audio recording of the presentation. Slides are usually (if not animated) just a sequence of images. A very efficient representation of such a recording is as a stream of images with timing information plus the recorded audio underneath.

This specification defines a format to describe a timed image track, including presentation parameters, input image formats and their timing. We encourage in particular the use of the open compression formats PNG and JPEG for lossless and lossy image compression respectively.

We define a logical bitstream format for encapsulating the images inside Ogg. When multiplexed together with one of the Xiph audio codecs such as Speex, Vorbis, FLAC, or OggPCM, e.g. using Ogg Skeleton, you end up with a video format that consists of timed images and audio.

Timed Image Specification Format

The bitstream format for this "codec" should be very simple. It should essentially consist only of a sequence of images preceded by a header with a simple set of fields to set up the decoding.


Timed Images Mapping into Ogg

The first step towards encapsulating the data into Ogg is the definition of packets:

  • There is an OggSpots ident header with setup parameters, which is encapsulated in the bos page.
  • Each image is mapped into its own data packet, which is inserted at the correct time.
  • The eos page is empty.


OggSpots ident header

The timed Spots logical bitstream starts with an ident header which is mapped into the OggSpots bos page. The ident header contains all information required to identify the timed Spots bitstream and to set up a timed Spots decoder. It has the following format:

 0                   1                   2                   3
 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1| Byte
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
| Identifier 'SPOTS\0\0\0'                                      | 0-3
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|                                                               | 4-7
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
| Version major                 | Version minor                 | 8-11
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
| Granulerate numerator                                         | 12-15
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|                                                               | 16-19
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
| Granulerate denominator                                       | 20-23
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|                                                               | 24-27
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
| Granuleshift  | RESERVED FOR LATER USE                        | 28-31
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
| Image-Format                                                  | 32-35
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|                                                               | 36-39
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
| Display width                 | Display height                | 40-43
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
| BG-Color                                                      | 44-47
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
| Align-Horiz   | Align-Vert    | Options                       | 48-51
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
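
The layout above can be sketched in code to check that the fields add up to 52 bytes. This is a minimal illustration, not a reference implementation: the draft does not state a byte order or the encoding of individual fields, so little-endian integers, signed 64-bit granulerate fields (as in Ogg Skeleton), a NUL-padded 8-byte image-format tag and plain integer codes for colour, alignment and options are all assumptions. The names `IdentHeader`, `pack_ident` and `parse_ident` are hypothetical.

```python
import struct
from typing import NamedTuple

# ASSUMPTIONS (not fixed by the draft): little-endian byte order, signed
# 64-bit granulerate numerator/denominator, 8-byte NUL-padded format tag,
# opaque integer codes for BG-Color, alignment and options.
IDENT_STRUCT = struct.Struct("<8sHHqqB3x8sHHIBBH")
MAGIC = b"SPOTS\0\0\0"


class IdentHeader(NamedTuple):
    version_major: int
    version_minor: int
    granulerate_num: int
    granulerate_den: int
    granuleshift: int
    image_format: bytes
    display_width: int
    display_height: int
    bg_color: int
    align_horiz: int
    align_vert: int
    options: int


def pack_ident(header: IdentHeader) -> bytes:
    """Serialise the header; the 3 reserved bytes are written as zero."""
    return IDENT_STRUCT.pack(MAGIC, *header)


def parse_ident(packet: bytes) -> IdentHeader:
    """Parse the first 52 bytes of a bos packet into an IdentHeader."""
    fields = IDENT_STRUCT.unpack_from(packet)
    if fields[0] != MAGIC:
        raise ValueError("not an OggSpots ident header")
    header = IdentHeader(*fields[1:])
    # Drop the NUL padding of the format tag so round trips compare equal.
    return header._replace(image_format=header.image_format.rstrip(b"\0"))
```

With these assumptions, `IDENT_STRUCT.size` is 52, matching the last byte offset (51) in the diagram.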


  • Version:

The OggSpots version as described here is major=0 minor=1.

  • Granulerate & Granuleshift:

The granulerate represents the temporal resolution of the logical bitstream in Hz given as a rational number in the same way as the OggSkeleton fisbone secondary header specifies granulerate. It enables a mapping of granule position of the data pages to time by calculating "granulepos / granulerate".

The default granule rate for OggSpots gives a resolution of 30 frames per second, i.e. one granule per 1/30 s.
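
As a worked example of the "granulepos / granulerate" mapping, the following sketch converts a granule position to seconds using exact rational arithmetic. It assumes a granule position that has not been partitioned by a granuleshift, and it reads the 30 frames per second default as a rate of 30/1 Hz; the function name `granule_to_seconds` is hypothetical.

```python
from fractions import Fraction


def granule_to_seconds(granulepos: int, rate_num: int, rate_den: int) -> Fraction:
    """Return granulepos / granulerate, with granulerate = rate_num / rate_den Hz."""
    if rate_num <= 0 or rate_den <= 0:
        raise ValueError("granulerate numerator and denominator must be positive")
    return Fraction(granulepos) / Fraction(rate_num, rate_den)


# At 30 granules per second, granule 3900 falls at 130 s, i.e. 00:02:10.
print(granule_to_seconds(3900, 30, 1))
```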

The granuleshift is a 1-byte integer describing whether to partition the granule_position into two for the OggSpots logical bitstream, and how many of the lower bits to use for the partitioning. The upper bits then still signify a time-continuous granule position for a directly decodable and presentable data granule. The lower bits allow for specification of the granule position of a previous OggSpots data packet (i.e. image), which helps to identify how much backwards seeking is necessary to get to the last and still active image. The granuleshift is therefore the base-2 logarithm of the maximum possible image spacing, measured in granules.

The default granule shift used is 32, which halves the granule position to allow for the backwards pointer.
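
The partitioning can be illustrated with a few bit operations. This is a sketch under assumptions: the draft does not say whether the lower bits hold the previous packet's position as an absolute value or as an offset, so the code simply stores and returns the lower field unchanged. The names `split_granulepos` and `join_granulepos` are hypothetical.

```python
from typing import Tuple


def split_granulepos(granulepos: int, granuleshift: int) -> Tuple[int, int]:
    """Split a granule position into (upper bits, lower bits)."""
    if not 0 <= granuleshift < 64:
        raise ValueError("granuleshift must be in the range 0..63")
    mask = (1 << granuleshift) - 1
    return granulepos >> granuleshift, granulepos & mask


def join_granulepos(current: int, previous: int, granuleshift: int) -> int:
    """Combine the current position (upper) and the back-pointer (lower)."""
    if not 0 <= granuleshift < 64:
        raise ValueError("granuleshift must be in the range 0..63")
    mask = (1 << granuleshift) - 1
    if not 0 <= previous <= mask:
        raise ValueError("back-pointer does not fit into the lower bits")
    return (current << granuleshift) | previous


# With the default shift of 32, an image at granule 8760 can point back
# to the previous image at granule 3900.
packed = join_granulepos(8760, 3900, 32)
print(split_granulepos(packed, 32))
```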

  • Image-Format:

The image format field identifies the image encoding, e.g. JPEG or GIF. To prevent players from stumbling over image formats that they do not understand, all image formats used in an OggSpots logical bitstream need to be declared in the bos page.

  • Display-Width, Display-Height:

While it is expected that most of the images in the data packets are of the same size (dimensions, geometry, resolution), variations may occur. These fields provide a decoder with a resolution at which the images are to be presented. If images need to be rescaled, the aspect ratio must be preserved.

  • Background-Colour:

For transparent images and for smaller, non-rescaled images, the background colour of the images has to be defined, for example black, white or gray. This is a default setting which may be overridden by the specific image.

  • Align-Horizontal, Align-Vertical:

Smaller images that are not rescaled to display size may be aligned at several different areas inside the larger display image:

* Align-Horizontal: centre/left/right
* Align-Vertical: centre/top/bottom

This is a default setting which may be overridden by the specific image through its own parameters.

  • Options:
* Upscaling: This parameter decides whether smaller images should be scaled up to meet the display size, or just be displayed inside it. This is a default setting and can be overridden by the specific image through its own parameters.
* Downscaling: This parameter decides whether larger images should be scaled down to meet the display size, or just be truncated. This is a default setting and can be overridden by the specific image through its own parameters.
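
How the display size, alignment and scaling options might interact in a decoder can be sketched as follows. This is only an illustration: the draft does not define the numeric codes for these fields, so string values are used here; rounding by truncation and the use of negative offsets to express cropping of oversized images are assumptions. The function name `place_image` is hypothetical.

```python
from fractions import Fraction
from typing import Tuple


def place_image(img: Tuple[int, int], disp: Tuple[int, int],
                align_h: str = "centre", align_v: str = "centre",
                upscale: bool = False, downscale: bool = True
                ) -> Tuple[int, int, int, int]:
    """Return (x, y, width, height) of the image inside the display area."""
    img_w, img_h = img
    disp_w, disp_h = disp
    # Largest uniform scale that fits the display, so the aspect ratio is kept.
    scale = min(Fraction(disp_w, img_w), Fraction(disp_h, img_h))
    if (scale > 1 and not upscale) or (scale < 1 and not downscale):
        scale = Fraction(1)  # leave the image at its native size
    w, h = int(img_w * scale), int(img_h * scale)
    # ASSUMPTION: a negative offset means the image is cropped on that side.
    x = {"left": 0, "centre": (disp_w - w) // 2, "right": disp_w - w}[align_h]
    y = {"top": 0, "centre": (disp_h - h) // 2, "bottom": disp_h - h}[align_v]
    return x, y, w, h


# A 160x120 image on a 320x240 display, not upscaled, centred:
print(place_image((160, 120), (320, 240)))
```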

OggSpots data

Each data packet contains a byte offset to where the complete image is stored, a set of parameters to describe what to do with the image, and the image itself.

The insertion time is encoded in the granule_pos of the Ogg Page that the image ends on.

 0                   1                   2                   3
 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1| Byte
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
| Byte Offset                                                   | 0-3
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
| Image-Format                                                  | 4-7
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|                                                               | 8-11
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
| Display width                 | Display height                | 12-15
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
| BG-Color      | Rescaling     | Align-Horiz   | Align-Vert    | 16-19
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
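
A data packet with this 20-byte parameter block could be read roughly as shown below. Several points are assumptions rather than part of the draft: little-endian byte order, a NUL-padded 8-byte format tag, and in particular the reading of "Byte Offset" as the offset from the start of the packet to the first byte of the image. The name `parse_data_packet` is hypothetical.

```python
import struct
from typing import Dict, Tuple, Union

# ASSUMPTION: little-endian fields; 4 + 8 + 2 + 2 + 4 x 1 = 20 bytes.
DATA_STRUCT = struct.Struct("<I8sHHBBBB")


def parse_data_packet(packet: bytes) -> Tuple[Dict[str, Union[int, bytes]], bytes]:
    """Return (per-image parameters, raw image bytes) for one data packet."""
    (offset, fmt, width, height,
     bg_color, rescaling, align_h, align_v) = DATA_STRUCT.unpack_from(packet)
    # ASSUMPTION: the offset counts from the start of the packet.
    if offset < DATA_STRUCT.size or offset > len(packet):
        raise ValueError("byte offset points outside the image data")
    params = {
        "image_format": fmt.rstrip(b"\0"),
        "display_width": width,
        "display_height": height,
        "bg_color": bg_color,
        "rescaling": rescaling,
        "align_horiz": align_h,
        "align_vert": align_v,
    }
    return params, packet[offset:]
```

Keeping the offset explicit would let later versions grow the parameter block without breaking older decoders, which may be the intent of the field.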



Authoring Considerations

While it is possible to author an OggSpots video through command-line parameters by handing over a set of images and setting parameters, there should also be a means to author it through files.

A rough way to author such information could be:

Display-Width: 320
Display-Height: 240
Rescaling: True
Background-Colour: Grey
...further parameters...
npt:00:00:00.000 /my_slides/image_01.png
npt:00:02:10.000 /my_slides/image_02.png
npt:00:05:02.000 /my_slides/image_03.png
npt:00:06:50.000 /my_slides/image_04.png
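
A reader for such a file could look roughly like this. It is a sketch only: the draft calls the format "rough", so the exact grammar (one "Name: value" parameter per line, then "npt:HH:MM:SS.mmm path" entries, blank lines ignored) is an assumption, and the name `parse_authoring` is hypothetical.

```python
import re
from typing import Dict, List, Tuple

# ASSUMED grammar, derived from the example above.
CLIP_RE = re.compile(r"^npt:(\d+):(\d{2}):(\d{2}(?:\.\d+)?)\s+(\S.*)$")
PARAM_RE = re.compile(r"^([A-Za-z][A-Za-z-]*):\s*(.*)$")


def parse_authoring(text: str) -> Tuple[Dict[str, str], List[Tuple[float, str]]]:
    """Return (parameters, [(start time in seconds, image path), ...])."""
    params: Dict[str, str] = {}
    clips: List[Tuple[float, str]] = []
    for raw in text.splitlines():
        line = raw.strip()
        if not line:
            continue
        # Timed entries are checked first, since "npt:" also looks like a key.
        clip = CLIP_RE.match(line)
        if clip:
            hours, minutes, seconds, path = clip.groups()
            start = int(hours) * 3600 + int(minutes) * 60 + float(seconds)
            clips.append((start, path))
            continue
        param = PARAM_RE.match(line)
        if param:
            params[param.group(1)] = param.group(2)
            continue
        raise ValueError(f"unrecognised line: {line!r}")
    return params, clips
```

The parameter values are kept as strings; mapping them onto the ident header fields is left open, as the draft does not define the corresponding codes.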