Ogg Index

From XiphWiki

Revision as of 22:28, 10 January 2010

The following is a draft. It is at best incomplete and at worst completely broken. In any case, it is not an "official" Xiph spec/codec, so use with care.


Ogg Skeleton 3.1 with Keyframe Index

DRAFT, last updated 11 January 2010

This specification is still a work in progress, and does not yet constitute an official Ogg track format.

Overview

Seeking in an Ogg file is typically implemented as a bisection search over the pages in the file. The Ogg physical bitstream is bisected and the next Ogg page's end-time is extracted. The bisection continues until it reaches an Ogg page with an end-time close enough to the seek target time. However, in media containing streams which have keyframes and interframes, such as Theora streams, the bisection search won't necessarily terminate at a keyframe. If you begin decoding where the first bisection terminates, you're likely to get only incomplete frames, with "visual artifacts", until you decode up to the next keyframe. To eliminate these artifacts, after the first bisection terminates you must extract the keyframe's timestamp from the last Theora page's granulepos, seek again back to the start of that keyframe, and decode forward until you reach the frame at the seek target.
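As an illustration of the "extract the keyframe's timestamp from the granulepos" step, here is a minimal Python sketch. It assumes the usual Theora granulepos layout, in which the high bits hold the index of the last keyframe and the low granule_shift bits hold the number of frames since that keyframe. The function name and parameters are hypothetical, the granule shift and frame rate are taken to come from the Theora header, and the exact frame-counting conventions defined by the Theora specification are not handled here.

```python
from fractions import Fraction


def theora_keyframe_time(granulepos, granule_shift, fps_num, fps_den):
    """Split a Theora granulepos into (keyframe_time, frame_time) in seconds.

    Assumption: the high bits are the keyframe's frame index and the low
    `granule_shift` bits count the frames decoded since that keyframe.
    """
    keyframe_index = granulepos >> granule_shift
    frames_since_keyframe = granulepos & ((1 << granule_shift) - 1)
    frame_duration = Fraction(fps_den, fps_num)  # seconds per frame
    keyframe_time = keyframe_index * frame_duration
    frame_time = (keyframe_index + frames_since_keyframe) * frame_duration
    return keyframe_time, frame_time


# Hypothetical page: keyframe at frame 100, 5 interframes later, 25 fps.
kf_time, fr_time = theora_keyframe_time((100 << 6) | 5, 6, 25, 1)
```

The second bisection would then target kf_time rather than the original seek time.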

This is further complicated by the fact that packets often span multiple Ogg pages, and that Ogg pages from different streams can be interleaved between spanning packets.

The bisection method above works fine for seeking in local files, but for seeking in files served over the Internet via HTTP, each bisection or non-sequential read can trigger a new HTTP request, which can have very high latency, making seeking very slow.

Seeking with an index

The Skeleton 3.1 bitstream attempts to alleviate this problem by providing an index of periodic keyframes for every content stream in an Ogg segment. Note that the Skeleton 3.1 track only holds data for the segment in which it resides. So if two Ogg files are concatenated together ("chained"), the keyframe indexes in the first Ogg segment's Skeleton 3.1 track (the first Ogg in the "chain") do not contain information about the keyframes in the second Ogg segment (the second Ogg in the "chain").

Each content track has a separate index, which is stored in its own packet in the Skeleton 3.1 track. The index for streams without the concept of a keyframe, such as Vorbis streams, can instead record the time position at periodic intervals, which achieves the same result. When this document refers to keyframes, it also implicitly refers to these independent periodic samples from keyframe-less streams.

All the Skeleton 3.1 track's pages appear in the header pages of the Ogg segment. This means that, when playing the media over a network connection, all the keyframe indexes are available as soon as the header packets have been read.

For every content stream in an Ogg segment, the Skeleton 3.1 track provides seek algorithms with an ordered table of "key points". A key point is intrinsically associated with exactly one stream, and refers to a page in that stream. Each key point k has an 8 byte offset o, a presentation time t stored as a fraction with an 8 byte numerator and an 8 byte denominator (the denominator is stored once per index packet and shared by all of its key points), and a 4 byte checksum c. This specifies that in order to render the stream at presentation time t, the last page which lies before all information required to render the keyframe at presentation time t begins at byte offset o, measured from the beginning of the Ogg segment. The checksum c is the checksum of the page which begins at offset o. This enables you to verify that you're seeking to the intended page, and that the segment has not been modified since the index was constructed. The time t is the keyframe's presentation time corresponding to the granulepos, and is represented as a fraction in seconds. Note that if a stream requires any preroll, this will be accounted for in the time stored in the key point.

The Skeleton 3.1 track contains one index for each content stream in the file. To seek in an Ogg file which contains keyframe indexes, first construct the set which contains, for every active stream, that stream's last key point with time less than or equal to the seek target time. Then from that set of key points, select the key point with the smallest byte offset. You then verify that the page found at the selected key point's byte offset has the same checksum as the selected key point's checksum, and if so, you can begin decoding at that page and decode forward up to the seek target time. You are guaranteed to pass keyframes on all streams with time less than or equal to your seek target time while decoding up to the seek target.
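The selection step described above can be sketched in Python as follows. This is a minimal illustration, not a reference implementation: the function name, the dictionary layout and the (offset, checksum, time) tuples are assumptions made for the example, and returning None when a stream has no key point at or before the target (so that the caller can fall back to bisection) is a design choice of this sketch rather than something the draft specifies.

```python
from bisect import bisect_right
from fractions import Fraction


def select_seek_keypoint(indexes, target):
    """Pick the key point at which to start decoding for a seek to `target`.

    `indexes` maps each active stream's serialno to its key points, a list
    of (offset, checksum, time) tuples sorted by offset (and thus by time).
    Returns the chosen tuple, or None if the index cannot answer the seek.
    """
    candidates = []
    for keypoints in indexes.values():
        times = [time for (_offset, _checksum, time) in keypoints]
        i = bisect_right(times, target)  # count of key points with time <= target
        if i == 0:
            return None  # assumption: no usable key point, caller falls back
        candidates.append(keypoints[i - 1])
    if not candidates:
        return None
    # The smallest byte offset guarantees every stream's keyframe is passed.
    return min(candidates, key=lambda kp: kp[0])


# Hypothetical indexes for a video stream (serialno 1) and an audio stream (2).
example_indexes = {
    1: [(1000, 0xAAAA, Fraction(0)), (50000, 0xBBBB, Fraction(5)),
        (90000, 0xCCCC, Fraction(10))],
    2: [(1200, 0x1111, Fraction(0)), (30000, 0x2222, Fraction(3)),
        (60000, 0x3333, Fraction(6)), (95000, 0x4444, Fraction(9))],
}
chosen = select_seek_keypoint(example_indexes, Fraction(7))
```

For a target of 7 seconds the video stream contributes its key point at 5 seconds and the audio stream its key point at 6 seconds; decoding starts at the smaller of the two byte offsets.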

Be aware that you cannot assume that any or all Ogg files will contain keyframe indexes, and so when implementing Ogg seeking, you must fall back gracefully to a bisection search or other seek algorithm when the index is not present.

When using the index to seek, you must verify that the index is still correct: always check that the key point's checksum matches the checksum of the page found at exactly the key point's offset. If it does not match, the file has changed since it was indexed, and you cannot rely on the index. You should then fall back to seeking with a bisection search. You should also always check the Skeleton version header field to ensure your decoder knows how to parse the Skeleton track.
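A minimal Python sketch of this check is shown below. It assumes the standard Ogg page header layout (the "OggS" capture pattern at byte 0 and the 4 byte checksum field at byte 22 of a 27 byte fixed header) and compares the checksum stored in the page header with the key point's checksum, without recomputing the checksum over the page. The function name and parameters are hypothetical.

```python
import io
import struct

OGG_FIXED_HEADER_SIZE = 27   # fixed part of an Ogg page header
OGG_CHECKSUM_OFFSET = 22     # position of the 4 byte checksum field


def page_checksum_matches(f, segment_start, keypoint_offset, keypoint_checksum):
    """Return True if an Ogg page with the expected checksum starts at the
    key point's offset (relative to the start of the segment)."""
    f.seek(segment_start + keypoint_offset)
    header = f.read(OGG_FIXED_HEADER_SIZE)
    if len(header) < OGG_FIXED_HEADER_SIZE or header[:4] != b"OggS":
        return False  # not a page boundary: the file changed, do not trust the index
    (stored,) = struct.unpack_from("<I", header, OGG_CHECKSUM_OFFSET)
    return stored == keypoint_checksum


# Hypothetical data: 10 bytes of other content, then a page header whose
# checksum field holds 0xDEADBEEF.
fake_page = b"OggS" + bytes(18) + struct.pack("<I", 0xDEADBEEF) + b"\x00"
fake_file = io.BytesIO(b"x" * 10 + fake_page)
ok = page_checksum_matches(fake_file, 0, 10, 0xDEADBEEF)
```

If the function returns False, the seek should fall back to a bisection search.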

The Skeleton 3.1 header packet also stores metadata about the segment in which it resides. It stores the timestamps of the first and last samples in the segment. This allows you to determine the duration of the indexed Ogg media without having to decode the start and end of the Ogg segment to calculate the difference (which is the duration). The header packet also contains the length of the indexed segment in bytes. This is so that if the seek target is outside of the indexed range, you can immediately move to the next/previous segment and either seek using that segment's index, or narrow the bisection window if that segment has no index.

Format Specification

Unless otherwise specified, all integers and fields in the bitstream are encoded with the least significant bit coming first in each byte. Integers and fields comprising more than one byte are encoded least significant byte first (i.e. little endian byte order).

The Skeleton 3.1 track is intended to be backwards compatible with the Skeleton 3.0 specification, available at http://www.xiph.org/ogg/doc/skeleton.html . Unless specified differently here, it is safe to assume that anything specified for a Skeleton 3.0 track holds for a Skeleton 3.1 track.

As per the Skeleton 3.0 track, a segment containing a Skeleton 3.1 track must begin with a Skeleton 3.1 fishead BOS packet on a page by itself, with the following format:

  1. Identifier: 8 bytes, "fishead\0".
  2. Version major: 2 byte unsigned integer denoting the major version (3)
  3. Version minor: 2 byte unsigned integer denoting the minor version (1)
  4. Presentationtime numerator: 8 byte signed integer
  5. Presentationtime denominator: 8 byte signed integer
  6. Basetime numerator: 8 byte signed integer
  7. Basetime denominator: 8 byte signed integer
  8. UTC [ISO8601]: a 20 byte string containing a UTC time
  9. [NEW] First-sample-time numerator: 8 byte signed integer representing the numerator for the presentation time of the first sample in the media. Note that samples between the first-sample-time and the Presentationtime are supposed to be skipped during playback.
  10. [NEW] First-sample-time denominator: 8 byte signed integer, with value 0 if the timestamp is unknown. Decoders should always ensure that the denominator is not 0 before using it as a divisor!
  11. [NEW] Last-sample-time numerator: 8 byte signed integer representing the end time of the last sample in the segment.
  12. [NEW] Last-sample-time denominator: 8 byte signed integer, with value 0 if the timestamp is unknown. Decoders should always ensure that the denominator is not 0 before using it as a divisor!
  13. [NEW] The length of the segment, in bytes: 8 byte signed integer, -1 if unknown.

The first-sample-time and last-sample-time are rational numbers, in units of seconds. If the denominator is 0 for the first-sample-time or the last-sample-time, then that value was unable to be determined at indexing time, and is unknown. The duration of the Ogg segment can be calculated by subtracting the first-sample-time from the last-sample-time.
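The fishead layout and the duration calculation above can be sketched in Python as follows. This is an illustration under stated assumptions: the field order and sizes are taken directly from the list above (104 bytes in total), the function and dictionary key names are hypothetical, and the sketch accepts only version 3.1 rather than handling other Skeleton versions.

```python
import struct
from fractions import Fraction

# ident, major, minor, 4 x int64, 20 byte UTC string, 5 x int64 (little endian)
FISHEAD_31 = struct.Struct("<8sHHqqqq20sqqqqq")


def rational(num, den):
    """Return num/den as a Fraction, or None if the denominator is 0 (unknown)."""
    return Fraction(num, den) if den != 0 else None


def parse_fishead(packet):
    if len(packet) < FISHEAD_31.size:
        raise ValueError("packet too short for a Skeleton 3.1 fishead")
    (ident, major, minor, _pres_num, _pres_den, _base_num, _base_den, _utc,
     first_num, first_den, last_num, last_den, seg_len) = FISHEAD_31.unpack_from(packet)
    if ident != b"fishead\0":
        raise ValueError("not a fishead packet")
    if (major, minor) != (3, 1):
        raise ValueError("unsupported Skeleton version %d.%d" % (major, minor))
    first = rational(first_num, first_den)
    last = rational(last_num, last_den)
    # The duration is only known when both timestamps are known.
    duration = last - first if first is not None and last is not None else None
    return {
        "first_sample_time": first,
        "last_sample_time": last,
        "duration": duration,
        "segment_length": seg_len if seg_len != -1 else None,
    }


# Hypothetical header: first sample at 0.5 s, last sample ends at 10.5 s.
example = FISHEAD_31.pack(b"fishead\0", 3, 1, 0, 1000, 0, 1000, bytes(20),
                          500, 1000, 10500, 1000, 123456)
head = parse_fishead(example)
```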

In Skeleton 3.1 the "fisbone" packets remain unchanged from Skeleton 3.0, and will still follow after the other streams' BOS pages and secondary header pages.

Before the Skeleton EOS page in the segment header pages come the Skeleton 3.1 keyframe index packets. There is one index packet for each content stream in the Ogg segment. Each index packet contains the following:

  1. Identifier: 6 bytes, "index\0".
  2. The serialno of the stream this index applies to, as a 4 byte field.
  3. The number of keypoints in this index packet, 'n', as a 4 byte unsigned integer. This can be 0.
  4. The keypoint presentation time denominator, as an 8 byte signed integer.
  5. 'n' key points, each of which contains, in the following order:
    1. a page start's byte offset as an 8 byte unsigned integer, followed by
    2. the checksum of the page found at the offset, as a 4 byte field, followed by
    3. the presentation time numerator of the first key frame which starts on the page at the keypoint's offset, as an 8 byte integer. Divide this by the timestamp denominator to determine the presentation time of the keyframe in seconds.

Note that a keypoint always represents the first key frame on a page. If an Ogg page contains two or more keyframes, the index's key point *must* refer to the first keyframe on that page, not the second.

The key points are stored in increasing order by offset (and thus by presentation time as well). Note that an index packet may be larger than (6 + 4 + 4 + 8 + (n * (8 + 4 + 8))) bytes, that is 22 + 20n bytes, as it may have been preallocated during encoding, but not completely filled. Do not make assumptions about an index packet's size. Always check the index packet's 'bytes' field (the packet length reported by the Ogg layer, as in libogg's ogg_packet) to determine its size, and always use its 'n' field to determine the number of keypoints contained in the index packet.
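A minimal Python sketch of parsing one index packet with this layout is shown below. The function name is hypothetical, and two details are assumptions of the sketch rather than statements of the draft: the serialno is read as a little endian unsigned integer, and the key point time numerator is read as a signed integer. Bytes beyond the n key points are treated as preallocated padding and ignored.

```python
import struct
from fractions import Fraction

INDEX_HEADER = struct.Struct("<6sIIq")  # "index\0", serialno, n, time denominator
KEYPOINT = struct.Struct("<QIq")        # offset, checksum, time numerator


def parse_index_packet(packet):
    """Return (serialno, [(offset, checksum, time_in_seconds), ...])."""
    if len(packet) < INDEX_HEADER.size:
        raise ValueError("packet too short for an index header")
    ident, serialno, n, time_den = INDEX_HEADER.unpack_from(packet)
    if ident != b"index\0":
        raise ValueError("not an index packet")
    if len(packet) < INDEX_HEADER.size + n * KEYPOINT.size:
        raise ValueError("packet too short for %d key points" % n)
    if n > 0 and time_den == 0:
        raise ValueError("time denominator is 0")  # never divide by zero
    keypoints = []
    for i in range(n):
        offset, checksum, time_num = KEYPOINT.unpack_from(
            packet, INDEX_HEADER.size + i * KEYPOINT.size)
        keypoints.append((offset, checksum, Fraction(time_num, time_den)))
    return serialno, keypoints


# Hypothetical packet: two key points plus 40 bytes of preallocated padding.
example_packet = (INDEX_HEADER.pack(b"index\0", 1234, 2, 1000)
                  + KEYPOINT.pack(4096, 0xAABBCCDD, 0)
                  + KEYPOINT.pack(65536, 0x11223344, 2500)
                  + bytes(40))
serial, points = parse_index_packet(example_packet)
```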

The byte offsets stored in keypoints are relative to the start of the Ogg bitstream segment. So if you have a physical Ogg bitstream made up of two chained Oggs, the offsets in the second Ogg segment's bitstream's index are relative to the beginning of the second Ogg in the chain, not the first. Also note that if a physical Ogg bitstream is made up of chained Oggs, the presence of an index in one segment does not imply that there will be an index in any other segment.

Software Prototype

For a prototype indexer, see OggIndex (http://github.com/cpearce/OggIndex). Also included there is a program OggIndexValid, which can verify that Theora and Vorbis indexes are valid. If you're implementing your own indexer, or going to be modifying existing indexes, always verify with OggIndexValid that your indexes are valid!

Recent ffmpeg2theora nightlies (http://firefogg.org/nightly/) will also include a keyframe index in the Skeleton 3.1 track if you specify the command line option --seek-index.

To see how indexes improve network seeking performance, you can download a development version of Firefox which can take advantage of indexes here:

http://pearce.org.nz/video/firefox-indexed-seek-linux.tar.bz2

http://pearce.org.nz/video/firefox-indexed-seek-macosx.dmg

http://pearce.org.nz/video/firefox-indexed-seek-win32.zip

If you already have a Firefox instance running, you'll need to either close your running Firefox instance before starting the index-capable Firefox, or start the index-capable Firefox with the --no-remote command line parameter.

To compare the network performance of indexed versus non-indexed seeking, point the index-capable Firefox here:

http://pearce.org.nz/video/indexed-seek-demo.html
