Ogg Index: Difference between revisions

From XiphWiki
Jump to navigation Jump to search
(add {{draft}} template)
No edit summary
 
(4 intermediate revisions by 2 users not shown)
Line 1: Line 1:
{{draft}}
{{draft}}


= Ogg Index Track Specification Version 1 =
= Ogg Skeleton 3.3 with Keyframe Index =


'''DRAFT, last updated 24 September 2009'''
'''This specification is obsolete and has been superseded by [[Ogg Skeleton 4]]. Use that instead!'''


'''This specification is still a work in progress, and does not yet constitute an official Ogg track format.'''
== Overview ==
== Overview ==
   
   
Seeking in an Ogg file is typically implemented as a bisection search
Seeking in an Ogg file is typically implemented as a bisection search  
over the pages in the file. The Ogg physical bitstream is bisected and
over the pages in the file. The Ogg physical bitstream is bisected and  
the next Ogg page's end-time is extracted. The bisection continues until
the next Ogg page's end-time is extracted. The bisection continues until  
it reaches an Ogg page with an end-time close enough to the seek target
it reaches an Ogg page with an end-time close enough to the seek target  
time. However in media containing streams which have key frames and
time. However in media containing streams which have keyframes and  
interframes, such as Theora streams, your bisection search won't
interframes, such as Theora streams, your bisection search won't  
necessarily stop at a keyframe, thus you can't simply resume playback
necessarily terminate at a keyframe. Thus if you begin decoding after your
from that point. First you need to construct the keyframe's timestamp
first bisection terminates, you're likely to only get partial incomplete
from the last page's granulepos, and seek again to the start of the
frames, with "visual artifacts", until you decode up to the next keyframe.
keyframe and decode forward until you reach the frame at the seek
So to eliminate these visual artifacts, after the first bisection
target.
terminates, you must extract the keyframe's timestamp from the last Theora
page's granulepos, and seek again back to the start of the keyframe and
This is further complicated by the fact that packets often span multiple
decode forward until you reach the frame at the seek target.  
Ogg pages, and that Ogg pages from different streams can be interleaved
 
between spanning packets.
This is further complicated by the fact that packets often span multiple  
Ogg pages, and that Ogg pages from different streams can be interleaved  
The bisection method above works fine for seeking in local files, but
between spanning packets.  
for seeking in files served over the Internet via HTTP, each bisection
 
or non sequential read can trigger a new HTTP request, which can have
The bisection method above works fine for seeking in local files, but  
very high latency, making seeking very slow.
for seeking in files served over the Internet via HTTP, each bisection  
or non sequential read can trigger a new HTTP request, which can have  
very high latency, making seeking very slow.  
 
== Seeking with an index ==
== Seeking with an index ==
 
The Ogg index bitstream attempts to alleviate this problem, by providing
The Skeleton 3.3 bitstream attempts to alleviate this problem, by  
an index of periodic keyframes in an Ogg file. The index is contained in
providing an index of periodic keyframes for every content stream in an  
a separate track which is embedded in the Ogg file, so that players
Ogg segment. Note that the Skeleton 3.3 track only holds data for the
which don't understand the index track can just ignore it. In streams
segment or "link" in which it resides. So if two Ogg files are concatenated
without the concept of a keyframe, such as Vorbis streams where each
together ("chained"), the Skeleton 3.3's keyframe indexes in the first Ogg
sample is independent, the index can instead record the time position at
segment (the first "link" in the "chain") do not contain information
periodic intervals, which achieves the same result. When this document
about the keyframes in the second Ogg segment (the second link in the chain).
refers to keyframes, it also refers to these independent periodic
 
samples from keyframe-less streams.
Each content track has a separate index, which is stored in its own
packet in the Skeleton 3.3 track. The index for streams without the  
The Ogg index bitstream provides seek algorithms with an ordered table
concept of a keyframe, such as Vorbis streams, can instead record the  
of the Ogg page start-offsets and end-times of key points in the indexed
time position at periodic intervals, which achieves the same result.  
streams in an Ogg segment.
When this document refers to keyframes, it also implicitly refers to these
independent periodic samples from keyframe-less streams.  
A key point k is defined as follows. Each key point has an 8 byte offset
 
o, an 8 byte time t, and a 4 byte checksum c. This specifies that in
All the Skeleton 3.3 track's pages appear in the header pages of the Ogg
order to render the media at presentation time t milliseconds, the last
segment. This means the all the keyframe indexes are immediately
page which lies before the start of all the packets containing all the
available once the header packets have been read when playing the media
key frames required to render at time t begins at offset o. The checksum
over a network connection.
c is the checksum of the page which begins at offset o, which enables
 
you to verify that you're seeking to the intended page.
For every content stream in an Ogg segment, the Ogg index bitstream  
provides seek algorithms with an ordered table of "key points". A key  
To seek in an Ogg bitstream which contains an index, you find the last
point is intrinsically associated with exactly one stream, and stores the
key point in the index with time less than or equal to the target time.
offset of the page on which it starts, o, as well as the presentation time
You then seek to the key point's offset, check that the page found there
of the keyframe t, as a fraction of seconds. This specifies that in order
has checksum c, and then decode forward until you encounter the sample
to render the stream at presentation time t, the last page which lies before
which corresponds to your seek target time. You are guaranteed to pass
all information required to render the keyframe at presentation time t begins
keyframes on all indexed streams with time less than or equal to your
exactly at byte offset o, as offset from the beginning of the Ogg segment.
seek target time while decoding up to the seek target.
The offset is exactly the first byte of the page, so if you seek to a
keypoint's offset and don't find the beginning of a page there, you can
Be aware that you cannot assume that any or all Ogg files will contain
assume that the Ogg segment has been modified since the index was constructed,
an index, and so when implementing Ogg seeking, you must gracefully
and that the index is now invalid and should not be used. The time t is the
keyframe's presentation time corresponding to the granulepos, and is
represented as a fraction in seconds. Note that if a stream requires any
preroll, this will be accounted for in the time stored in the keypoint.  
 
The Skeleton 3.3 track contains one index for each content stream in the
file. To seek in an Ogg file which contains keyframe indexes, first
construct the set which contains every active streams' last keypoint which
has time less than or equal to the seek target time. Then from that set
of key points, select the key point with the smallest byte offset. You then
verify that there's a page found at exactly that offset, and if so, you can
begin decoding. If the first keyframe you encounter has a time equal to
that stored in the keypoint, you have made the optimal seek, and can safely
continue to decode up to the seek target time. You are guaranteed to pass
keyframes on all streams with time less than or equal to your seek target
time while decoding up to the seek target. However if the first keyframe
you encounter after decoding does not have the same presentation time as
is stored in the keypoint, you then the index is invalid (possibly the file
has been changed without updating the index) and you must either fallback
to a bisection search, or keep decoding if you've landed "close enough"
to the seek target.
 
Be aware that you cannot assume that any or all Ogg files will contain  
keyframe indexes, so when implementing Ogg seeking, you must gracefully
fall-back to a bisection search or other seek algorithm when the index
fall-back to a bisection search or other seek algorithm when the index
is not present.
is not present, or when it is invalid.
 
The index also only holds data for the segment in which it resides, i.e.
The Skeleton 3.3 BOS packet also stores meta data about the segment in  
if two Ogg files are concatenated together ("chained"), the index track
which it resides. It stores the timestamps of the first and last samples
in one Ogg segment does not contain information about the keyframes in
in the segment. This also allows you to determine the duration of the
the other Ogg segment.
indexed Ogg media without having to decode the start and end of the
Ogg segment to calculate the difference (which is the duration).
The index also stores meta data about the segment in which it resides.
 
It stores the start time and the end time, and also the length of the
The Skeleton 3.3 BOS packet also contains the length of the indexed segment
segment in bytes. This is so that if the seek target is outside of the
in bytes. This is so that if the seek target is outside of the indexed range,
indexed range, you can immediately move to the next/previous segment and
you can immediately move to the next/previous segment and either seek using
either seek using that segment's index, or narrow the bisection window
that segment's index, or narrow the bisection window if that segment has no
if that segment has no index.
index. You can also use the segement length to verify if the index is valid.
If the contents of the segment have changed, it's highly likely that the
length of the segment has changed as well. When you load the segment's
header pages, you should check the length of the physical segment, and if it
doesn't match that stored in the Skeleton header packet, you know the index
is out of date and not safe to use.
 
The Skeleton 3.3 BOS packet also contains the offset of the first non header
page in the Ogg segment. This means that if you wish to delay loading of an
index for whatever reason, you can skip forward to that offset, and start
decoding from that offset forwards.
 
When using the index to seek, you must verify that the index is still
correct. You can consider the index invalid if any of the following are true:
 
# The segment length stored in the Skeleton BOS packet doesn't match the length of the physical segment, or
# after a seek to a keypoint's offset, you don't land exactly on a page boundary, or
# the first keyframe decoded after seeking to a keypoint's offset doesn't have the same presentation time as stored in the index.
 
You should also always check the Skeleton version header field
to ensure your decoder correctly knows how to parse the Skeleton track.
 
Be aware that a keyframe index may not index all keyframes in the Ogg segment,
it may only index periodic keyframes instead.
 
== Format Specification ==
== Format Specification ==
   
   
Unless otherwise specified, all integers and fields in the bitstream are
Unless otherwise specified, all integers and fields in the bitstream are  
encoded with the least significant bit coming first in each byte.
encoded with the least significant bit coming first in each byte.  
Integers and fields comprising of more than one byte are encoded least
Integers and fields comprising of more than one byte are encoded least  
significant byte first (i.e. little endian byte order).
significant byte first (i.e. little endian byte order).  
 
An Ogg index track starts with an identifier header packet which
The Skeleton 3.3 track is intended to be backwards compatible with the
contains the following data, in the following order:
Skeleton 3.0 specification, available at
http://www.xiph.org/ogg/doc/skeleton.html . Unless specified
* The identifier "index\0".
differently here, it is safe to assume that anything specified for a
* The index version format number, as a 1 byte unsigned integer. This specification describes version 1, so this field should have the value 0x01.
Skeleton 3.0 track holds for a Skeleton 3.3 track.
* The playback start time, in milliseconds, as an 8 byte unsigned integer, this is the presentation time of the first frame.
 
* The playback end time, in milliseconds, as an 8 byte unsigned integer, this is the end time of the last frame.
As per the Skeleton 3.0 track, a segment containing a Skeleton 3.3 track  
* The length of the indexed segment, in bytes, as an 8 byte unsigned integer.
must begin with a '''Skeleton 3.3 fishead BOS packet''' on a page by itself, with the
* The number of key points in the index, 'n', as a 4 byte unsigned integer.
following format:
 
# Identifier: 8 bytes, "fishead\0".
# Version major: 2 Byte unsigned integer denoting the major version (3)
# Version minor: 2 Byte unsigned integer denoting the minor version (1)
# Presentationtime numerator: 8 Byte signed integer
# Presentationtime denominator: 8 Byte signed integer
# Basetime numerator: 8 Byte signed integer
# Basetime denominator: 8 Byte signed integer
# UTC [ISO8601]: a 20 Byte string containing a UTC time
# '''[NEW]''' First-sample-time numerator: 8 byte signed integer representing the numerator for the presentation time of the first sample in the media. Note that samples between the first-sample-time and the Presentationtime are supposed to be skipped during playback.
# '''[NEW]''' First-sample-time denominator: 8 byte signed integer, with value 0 if the timestamp is unknown. Decoders should always ensure that the denominator is not 0 before using it as a divisor!
# '''[NEW]''' Last-sample-time numerator: 8 byte signed integer representing the end time of the last sample in the segment.
# '''[NEW]''' Last-sample-time denominator: 8 byte signed integer, with value 0 if the timestamp is unknown. Decoders should always ensure that the denominator is not 0 before using it as a divisor!
# '''[NEW]''' The length of the segment, in bytes: 8 byte unsigned integer, 0 if unknown.
# '''[NEW]''' The offset of the first non-header page, in bytes: 8 byte unsigned integer, 0 if unknown.
 
The first-sample-time and last-sample-time are rational numbers, in units
of seconds. If the denominator is 0 for the first-sample-time or the
last-sample-time, then that value was unable to be determined at indexing
time, and is unknown. The duration of the Ogg segment can be calculated by
subtracting the first-sample-time from the last-sample-time.
 
In '''Skeleton 3.3 the "fisbone" packets remain unchanged from Skeleton
3.0''', and will still follow after the other streams' BOS pages and
secondary header pages.
 
Before the Skeleton EOS page in the segment header pages come the
Skeleton 3.3 keyframe index packets. There should be one index packet for
each content stream in the Ogg segment, but index packets are not required
for a Skeleton 3.3 track to be considered valid. Each keypoint in the index
is stored in a "keypoint", which in turn stores an offset, checksum, and
timestamp. In order to save space, the offsets and timestamps are stored as
deltas, and then variable byte-encoded. The offset and timestamp deltas
store the difference between the keypoint's offset and timestamp from the
previous keypoint's offset and timestamp. So to calculate the page offset
of a keypoint you must sum the offset deltas of up to and including the
keypoint in the index.
 
The variable byte encoded integers are encoded using 7 bits per byte to
store the integer's bits, and the high bit is set in the last byte used
to encode the integer. The bits and bytes are in little endian byte order.
For example, the integer 7843, or <tt>0001 1110 1010 0011</tt> in binary, would be
stored as two bytes: <tt>0xBD 0x23</tt>, or <tt>1011 1101 0010 0011</tt> in binary.
 
Each '''Skeleton 3.3 keyframe index packet''' contains the following:  
 
# Identifier 6 bytes: "index\0"
# The serialno of the stream this index applies to, as a 4 byte field.
# The number of keypoints in this index packet, 'n' as a 8 byte unsigned integer. This can be 0.
# The keypoint presentation time denominator, as an 8 byte signed integer.
# 'n' key points, each of which contain, in the following order:
## the keyframe's page's byte offset delta, as a variable byte encoded integer. This is the number of bytes that this keypoint is after the preceeding keypoint's offset, or from the start of the segment if this is the first keypoint. The keypoint's page start is therefore the sum of the byte-offset-deltas of all the keypoints which come before it.
## the presentation time numerator delta, of the first key frame which starts on the page at the keypoint's offset, as a variable byte encoded integer. This is the difference from the previous keypoint's timestamp numerator. The keypoint's timestamp numerator is therefore the sum of all the timestamp numerator deltas up to and including the keypoint's. Divide the timestamp numerator sum by the timestamp denominator stored earlier in the index packet to determine the presentation time of the keyframe in seconds.
 
Note that a keypoint always represents the first key frame on a page. If an
Ogg page contains two or more keyframes, the index's key point *must* refer
to the first keyframe on that page, not any subsequent keyframes on that page.
 
The key points are stored in increasing order by offset (and thus by
presentation time as well).
 
The byte offsets stored in keypoints are relative to the start of the Ogg
bitstream segment. So if you have a physical Ogg bitstream made up of two
chained Oggs, the offsets in the second Ogg segment's bitstream's index
are relative to the beginning of the second Ogg in the chain, not the first.
Also note that if a physical Ogg bitstream is made up of chained Oggs, the
presence of an index in one segment does not imply that there will be an
index in any other segment.  
 
The exact number of keyframes used to construct key points in the index  
is up to the indexer, but to limit the index size, we recommend
including at most one key point per every 64KB of data, or every 2000ms,  
whichever is least frequent.  


As per the Skeleton 3.0 track, '''the last packet in the Skeleton 3.3 track
The track then contains one secondary header packet, which contains the
is an empty EOS packet'''.
actual index. This is the "index packet", and it must begin on a new
page, but it may span multiple pages. The index packet contains the
following:
* 'n' key points, each of which contain, in the following order:
** the page offset as an 8 byte unsigned integer, followed by
** the checksum of the page found at the offset, as a 4 byte field,followed by
** the presentation times in milliseconds of the key point, as an 8 byte unsigned integer.
The size of the data in the index packet is (n * (8 + 4 + 8)) bytes. The
key points are stored in increasing order by offset.
The track then contains one empty EOS packet, which must start on a new
page. The track therefore contains exactly three packets, on three or
more pages.
The offsets stored in the keypoints is relative to the start of the Ogg
bitstream segment. So if you have a physical Ogg bitstream made up of
two chained Oggs, the offsets in the second Ogg segment's bitstream's
index are relative to the beginning of the second Ogg in the chain, not
the first. Also note that if a physical Ogg bitstream is made up of
chained Oggs, the presence of an index in one segment does not imply
that there will be an index in any other segment.
The exact number of keyframes used to construct key points in the index
is up to the indexer, but to limit the index size, we recommend
including at most one key point per every 64KB of data, or every 500ms.
There can be only one index track per Ogg bitstream segment. The index
packet must occur before all non-metadata streams' content packets. In
practice this means that the index packet will occur along with other
secondary header pages, before the skeleton EOS page.
All pages in the index bitstream have their granulepos set as 0.


== Software Prototype ==
== Software Prototype ==


For a prototype indexer, see:
For a prototype indexer, see [http://github.com/cpearce/OggIndex OggIndex]. Also included there is a program OggIndexValid, which can verify that Theora and Vorbis indexes are valid. If you're implementing your own indexer, or going to be modifying existing indexes, always verify that your modified indexes are valid as per OggIndexValid!


http://github.com/cpearce/OggIndex
Recent [http://firefogg.org/nightly/ ffmpeg2theora nightlies] will also include a keyframe index in the Skeleton
3.3 track if you specify the command line option <tt>--seek-index</tt>.


To see how indexes improves network seeking performance, you can download a development
To see how indexes improves network seeking performance, you can download a development
version of Firefox which can take advantage of indexes here:
version of Firefox which can take advantage of indexes here:


http://pearce.org.nz/video/firefox-indexed-ogg-seek.linux.tar.bz2
http://pearce.org.nz/video/firefox-indexed-seek-linux.tar.bz2
 
http://pearce.org.nz/video/firefox-indexed-seek-macosx.dmg


http://pearce.org.nz/video/firefox-indexed-ogg-seek.macosx.dmg
http://pearce.org.nz/video/firefox-indexed-seek-win32.zip


http://pearce.org.nz/video/firefox-indexed-ogg-seek.win32.zip
If you already have a Firefox instance running, you'll need to either close your running
Firefox instance before starting the index-capable Firefox, or start the index-capable
Firefox with the <tt>--no-remote</tt> command line parameter.


Then point that browser here:
To compare the network performance of indexed versus non-indexed seeking, point the
index-capable Firefox here:


http://pearce.org.nz/video/indexed-seek-demo.html
http://pearce.org.nz/video/indexed-seek-demo.html

Latest revision as of 16:22, 22 November 2010


Ogg Skeleton 3.3 with Keyframe Index

This specification is obsolete and has been superseded by Ogg Skeleton 4. Use that instead!

Overview

Seeking in an Ogg file is typically implemented as a bisection search over the pages in the file. The Ogg physical bitstream is bisected and the next Ogg page's end-time is extracted. The bisection continues until it reaches an Ogg page with an end-time close enough to the seek target time. However in media containing streams which have keyframes and interframes, such as Theora streams, your bisection search won't necessarily terminate at a keyframe. Thus if you begin decoding after your first bisection terminates, you're likely to only get partial incomplete frames, with "visual artifacts", until you decode up to the next keyframe. So to eliminate these visual artifacts, after the first bisection terminates, you must extract the keyframe's timestamp from the last Theora page's granulepos, and seek again back to the start of the keyframe and decode forward until you reach the frame at the seek target.

This is further complicated by the fact that packets often span multiple Ogg pages, and that Ogg pages from different streams can be interleaved between spanning packets.

The bisection method above works fine for seeking in local files, but for seeking in files served over the Internet via HTTP, each bisection or non sequential read can trigger a new HTTP request, which can have very high latency, making seeking very slow.

Seeking with an index

The Skeleton 3.3 bitstream attempts to alleviate this problem, by providing an index of periodic keyframes for every content stream in an Ogg segment. Note that the Skeleton 3.3 track only holds data for the segment or "link" in which it resides. So if two Ogg files are concatenated together ("chained"), the Skeleton 3.3's keyframe indexes in the first Ogg segment (the first "link" in the "chain") do not contain information about the keyframes in the second Ogg segment (the second link in the chain).

Each content track has a separate index, which is stored in its own packet in the Skeleton 3.3 track. The index for streams without the concept of a keyframe, such as Vorbis streams, can instead record the time position at periodic intervals, which achieves the same result. When this document refers to keyframes, it also implicitly refers to these independent periodic samples from keyframe-less streams.

All the Skeleton 3.3 track's pages appear in the header pages of the Ogg segment. This means the all the keyframe indexes are immediately available once the header packets have been read when playing the media over a network connection.

For every content stream in an Ogg segment, the Ogg index bitstream provides seek algorithms with an ordered table of "key points". A key point is intrinsically associated with exactly one stream, and stores the offset of the page on which it starts, o, as well as the presentation time of the keyframe t, as a fraction of seconds. This specifies that in order to render the stream at presentation time t, the last page which lies before all information required to render the keyframe at presentation time t begins exactly at byte offset o, as offset from the beginning of the Ogg segment. The offset is exactly the first byte of the page, so if you seek to a keypoint's offset and don't find the beginning of a page there, you can assume that the Ogg segment has been modified since the index was constructed, and that the index is now invalid and should not be used. The time t is the keyframe's presentation time corresponding to the granulepos, and is represented as a fraction in seconds. Note that if a stream requires any preroll, this will be accounted for in the time stored in the keypoint.

The Skeleton 3.3 track contains one index for each content stream in the file. To seek in an Ogg file which contains keyframe indexes, first construct the set which contains every active streams' last keypoint which has time less than or equal to the seek target time. Then from that set of key points, select the key point with the smallest byte offset. You then verify that there's a page found at exactly that offset, and if so, you can begin decoding. If the first keyframe you encounter has a time equal to that stored in the keypoint, you have made the optimal seek, and can safely continue to decode up to the seek target time. You are guaranteed to pass keyframes on all streams with time less than or equal to your seek target time while decoding up to the seek target. However if the first keyframe you encounter after decoding does not have the same presentation time as is stored in the keypoint, you then the index is invalid (possibly the file has been changed without updating the index) and you must either fallback to a bisection search, or keep decoding if you've landed "close enough" to the seek target.

Be aware that you cannot assume that any or all Ogg files will contain keyframe indexes, so when implementing Ogg seeking, you must gracefully fall-back to a bisection search or other seek algorithm when the index is not present, or when it is invalid.

The Skeleton 3.3 BOS packet also stores meta data about the segment in which it resides. It stores the timestamps of the first and last samples in the segment. This also allows you to determine the duration of the indexed Ogg media without having to decode the start and end of the Ogg segment to calculate the difference (which is the duration).

The Skeleton 3.3 BOS packet also contains the length of the indexed segment in bytes. This is so that if the seek target is outside of the indexed range, you can immediately move to the next/previous segment and either seek using that segment's index, or narrow the bisection window if that segment has no index. You can also use the segement length to verify if the index is valid. If the contents of the segment have changed, it's highly likely that the length of the segment has changed as well. When you load the segment's header pages, you should check the length of the physical segment, and if it doesn't match that stored in the Skeleton header packet, you know the index is out of date and not safe to use.

The Skeleton 3.3 BOS packet also contains the offset of the first non header page in the Ogg segment. This means that if you wish to delay loading of an index for whatever reason, you can skip forward to that offset, and start decoding from that offset forwards.

When using the index to seek, you must verify that the index is still correct. You can consider the index invalid if any of the following are true:

  1. The segment length stored in the Skeleton BOS packet doesn't match the length of the physical segment, or
  2. after a seek to a keypoint's offset, you don't land exactly on a page boundary, or
  3. the first keyframe decoded after seeking to a keypoint's offset doesn't have the same presentation time as stored in the index.

You should also always check the Skeleton version header field to ensure your decoder correctly knows how to parse the Skeleton track.

Be aware that a keyframe index may not index all keyframes in the Ogg segment, it may only index periodic keyframes instead.

Format Specification

Unless otherwise specified, all integers and fields in the bitstream are encoded with the least significant bit coming first in each byte. Integers and fields comprising of more than one byte are encoded least significant byte first (i.e. little endian byte order).

The Skeleton 3.3 track is intended to be backwards compatible with the Skeleton 3.0 specification, available at http://www.xiph.org/ogg/doc/skeleton.html . Unless specified differently here, it is safe to assume that anything specified for a Skeleton 3.0 track holds for a Skeleton 3.3 track.

As per the Skeleton 3.0 track, a segment containing a Skeleton 3.3 track must begin with a Skeleton 3.3 fishead BOS packet on a page by itself, with the following format:

  1. Identifier: 8 bytes, "fishead\0".
  2. Version major: 2 Byte unsigned integer denoting the major version (3)
  3. Version minor: 2 Byte unsigned integer denoting the minor version (1)
  4. Presentationtime numerator: 8 Byte signed integer
  5. Presentationtime denominator: 8 Byte signed integer
  6. Basetime numerator: 8 Byte signed integer
  7. Basetime denominator: 8 Byte signed integer
  8. UTC [ISO8601]: a 20 Byte string containing a UTC time
  9. [NEW] First-sample-time numerator: 8 byte signed integer representing the numerator for the presentation time of the first sample in the media. Note that samples between the first-sample-time and the Presentationtime are supposed to be skipped during playback.
  10. [NEW] First-sample-time denominator: 8 byte signed integer, with value 0 if the timestamp is unknown. Decoders should always ensure that the denominator is not 0 before using it as a divisor!
  11. [NEW] Last-sample-time numerator: 8 byte signed integer representing the end time of the last sample in the segment.
  12. [NEW] Last-sample-time denominator: 8 byte signed integer, with value 0 if the timestamp is unknown. Decoders should always ensure that the denominator is not 0 before using it as a divisor!
  13. [NEW] The length of the segment, in bytes: 8 byte unsigned integer, 0 if unknown.
  14. [NEW] The offset of the first non-header page, in bytes: 8 byte unsigned integer, 0 if unknown.

The first-sample-time and last-sample-time are rational numbers, in units of seconds. If the denominator is 0 for the first-sample-time or the last-sample-time, then that value was unable to be determined at indexing time, and is unknown. The duration of the Ogg segment can be calculated by subtracting the first-sample-time from the last-sample-time.

In Skeleton 3.3 the "fisbone" packets remain unchanged from Skeleton 3.0, and will still follow after the other streams' BOS pages and secondary header pages.

Before the Skeleton EOS page in the segment header pages come the Skeleton 3.3 keyframe index packets. There should be one index packet for each content stream in the Ogg segment, but index packets are not required for a Skeleton 3.3 track to be considered valid. Each keypoint in the index is stored in a "keypoint", which in turn stores an offset, checksum, and timestamp. In order to save space, the offsets and timestamps are stored as deltas, and then variable byte-encoded. The offset and timestamp deltas store the difference between the keypoint's offset and timestamp from the previous keypoint's offset and timestamp. So to calculate the page offset of a keypoint you must sum the offset deltas of up to and including the keypoint in the index.

The variable byte encoded integers are encoded using 7 bits per byte to store the integer's bits, and the high bit is set in the last byte used to encode the integer. The bits and bytes are in little endian byte order. For example, the integer 7843, or 0001 1110 1010 0011 in binary, would be stored as two bytes: 0xBD 0x23, or 1011 1101 0010 0011 in binary.

Each Skeleton 3.3 keyframe index packet contains the following:

  1. Identifier 6 bytes: "index\0"
  2. The serialno of the stream this index applies to, as a 4 byte field.
  3. The number of keypoints in this index packet, 'n' as a 8 byte unsigned integer. This can be 0.
  4. The keypoint presentation time denominator, as an 8 byte signed integer.
  5. 'n' key points, each of which contain, in the following order:
    1. the keyframe's page's byte offset delta, as a variable byte encoded integer. This is the number of bytes that this keypoint is after the preceeding keypoint's offset, or from the start of the segment if this is the first keypoint. The keypoint's page start is therefore the sum of the byte-offset-deltas of all the keypoints which come before it.
    2. the presentation time numerator delta, of the first key frame which starts on the page at the keypoint's offset, as a variable byte encoded integer. This is the difference from the previous keypoint's timestamp numerator. The keypoint's timestamp numerator is therefore the sum of all the timestamp numerator deltas up to and including the keypoint's. Divide the timestamp numerator sum by the timestamp denominator stored earlier in the index packet to determine the presentation time of the keyframe in seconds.

Note that a keypoint always represents the first key frame on a page. If an Ogg page contains two or more keyframes, the index's key point *must* refer to the first keyframe on that page, not any subsequent keyframes on that page.

The key points are stored in increasing order by offset (and thus by presentation time as well).

The byte offsets stored in keypoints are relative to the start of the Ogg bitstream segment. So if you have a physical Ogg bitstream made up of two chained Oggs, the offsets in the second Ogg segment's bitstream's index are relative to the beginning of the second Ogg in the chain, not the first. Also note that if a physical Ogg bitstream is made up of chained Oggs, the presence of an index in one segment does not imply that there will be an index in any other segment.

The exact number of keyframes used to construct key points in the index is up to the indexer, but to limit the index size, we recommend including at most one key point per every 64KB of data, or every 2000ms, whichever is least frequent.

As per the Skeleton 3.0 track, the last packet in the Skeleton 3.3 track is an empty EOS packet.

Software Prototype

For a prototype indexer, see OggIndex. Also included there is a program OggIndexValid, which can verify that Theora and Vorbis indexes are valid. If you're implementing your own indexer, or going to be modifying existing indexes, always verify that your modified indexes are valid as per OggIndexValid!

Recent ffmpeg2theora nightlies will also include a keyframe index in the Skeleton 3.3 track if you specify the command line option --seek-index.

To see how indexes improves network seeking performance, you can download a development version of Firefox which can take advantage of indexes here:

http://pearce.org.nz/video/firefox-indexed-seek-linux.tar.bz2

http://pearce.org.nz/video/firefox-indexed-seek-macosx.dmg

http://pearce.org.nz/video/firefox-indexed-seek-win32.zip

If you already have a Firefox instance running, you'll need to either close your running Firefox instance before starting the index-capable Firefox, or start the index-capable Firefox with the --no-remote command line parameter.

To compare the network performance of indexed versus non-indexed seeking, point the index-capable Firefox here:

http://pearce.org.nz/video/indexed-seek-demo.html