Schemes such as M3F require an embedding in physical Ogg streams. This page is for development of a specification for embedding XML streams in Ogg. The final version will probably look like the CMML mapping.
(taken from mailing list discussions)
Should we have a magic number for this? By convention, the beginning-of-stream (bos) packet for codecs in Ogg identifies the stream type. Strictly speaking we will have a magic number regardless, but should it be:
- '<?xml', the opening of the XML declaration. In this instance the demuxer would pass this upwards to an XML parser, which would derive from the rest of the bos packet what to do with it.
- Some other sequence before the XML starts, identifying the particular stream type, probably with a version number, which will imply that the contents are some form of XML. CMML does this. This may avoid the almost inevitable problem of some implementor assuming '<?xml' is always metadata, and it may also reduce the difficulty of writing a demuxer. Demuxers could still peek ahead and look for an XML namespace they recognize.
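The two options can be compared from a demuxer's point of view with a minimal sketch. The magic `OggXML\0\0`, the two 16-bit version fields, and the function name are assumptions made up for illustration; no such header has been agreed.

```python
import struct

# Hypothetical 8-byte magic for an XML-in-Ogg mapping; not part of any spec.
HYPOTHETICAL_MAGIC = b"OggXML\x00\x00"
XML_DECL = b"<?xml"


def classify_bos_packet(packet: bytes):
    """Return (kind, version, xml_offset) for a bos packet payload."""
    if packet.startswith(HYPOTHETICAL_MAGIC):
        # Second option: a dedicated magic followed by major/minor version
        # (two little-endian 16-bit fields, an assumed layout).
        header_len = len(HYPOTHETICAL_MAGIC) + 4
        if len(packet) < header_len:
            raise ValueError("bos packet too short for version fields")
        major, minor = struct.unpack_from("<HH", packet, len(HYPOTHETICAL_MAGIC))
        return ("magic", (major, minor), header_len)
    if packet.startswith(XML_DECL):
        # First option: the XML declaration itself acts as the magic number.
        return ("xml-declaration", None, 0)
    return ("unknown", None, None)
```

With a dedicated magic the demuxer can decide from a fixed-size prefix and hand only the bytes from `xml_offset` onwards to an XML parser; with '<?xml' alone it has to parse further to learn what kind of XML it is holding.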
Division into packets
The raw XML should ideally be broken into packets in a way that the loss of some packets, while destroying information, does not result in an invalid stream. Generally this means:
- The bos packet should consist of any initial processing directives, namespace declarations, and the root tag if there is one.
- Subsequent packets should be valid XML stanzas, similar to the XMPP definition, which when concatenated are also valid XML.
- The eos packet can be empty. If there was a root tag in the bos packet, it should be closed here.
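The packet division described in the list above can be sketched as follows. This is only an illustration under simplifying assumptions: the document has a single root element, uses no namespaces, and has no text directly inside the root. The function name is hypothetical.

```python
import xml.etree.ElementTree as ET
from xml.sax.saxutils import quoteattr


def split_into_packets(xml_text: str):
    """Split a small XML document into bos, stanza, and eos packets."""
    root = ET.fromstring(xml_text)
    attrs = "".join(f" {k}={quoteattr(v)}" for k, v in root.attrib.items())
    # bos packet: XML declaration plus the opening root tag.
    bos = f'<?xml version="1.0" encoding="UTF-8"?><{root.tag}{attrs}>'
    # One packet per child element; each is a self-contained stanza.
    stanzas = [ET.tostring(child, encoding="unicode") for child in root]
    # eos packet: closes the root tag opened in the bos packet.
    eos = f"</{root.tag}>"
    return [bos] + stanzas + [eos]
```

Because every middle packet is a complete element, dropping any one of them still leaves a well-formed document once the remaining packets are concatenated.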
Parsers should close all open tags on encountering eos, to handle truncated streams. Encountering eos here means one of: having processed a packet marked with the eos flag, having finished on an Ogg page with the eos flag set, or a virtual eos implied by encountering bos flags for a new chain segment.
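One way a parser might work out which tags to close is to keep a stack of open elements while packets arrive. The sketch below uses Python's `xml.parsers.expat` for this; the class and method names are hypothetical, and it assumes each packet ends on a token boundary.

```python
import xml.parsers.expat


class OpenTagTracker:
    """Track open elements across packets so they can be closed at eos."""

    def __init__(self):
        self._stack = []
        self._parser = xml.parsers.expat.ParserCreate()
        self._parser.StartElementHandler = self._start
        self._parser.EndElementHandler = self._end

    def _start(self, name, attrs):
        self._stack.append(name)

    def _end(self, name):
        self._stack.pop()

    def feed(self, packet: bytes):
        # Second argument False: this is not the final chunk of input.
        self._parser.Parse(packet, False)

    def closing_tags(self) -> str:
        """Closing tags needed to make the data seen so far well-formed."""
        return "".join(f"</{name}>" for name in reversed(self._stack))
```

On a real or virtual eos, the caller would append the result of `closing_tags()` to whatever has been received, so that a truncated stream still yields a well-formed document.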
c.f. Silvia Pfeiffer on ogg-dev:
I suggest using the solution that CMML has come to use. The XML file is essentially the same as an unencapsulated physical bitstream. Then there is a mapping into a logical bitstream, where some of the default information - in particular the XML header - are split off and put into the bos packet - nothing really needs to go into the eos packet. There's also a magic number and a version number. Also, use the granulepos scheme that we defined for CMML pages- you're going to make your lives easier.
That is, use a split granulepos scheme like keyframe codecs to indicate the offset to the previous packet. Whether this is appropriate for a given XML stream will depend on its application. Metadata that applies to the whole stream should just be included at the beginning, like OggSkeleton, and the granulepos can just all be zero. Time-based data, like a slideshow or 'currently playing' for radio streams, should be muxed throughout the stream.
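A minimal sketch of such a split granulepos: the upper bits carry the position of the previous packet and the lower bits carry the offset from it to the current packet. The shift of 32 bits and the function names are assumptions for illustration; a real mapping would declare its own shift in the bos packet.

```python
GRANULESHIFT = 32  # hypothetical value, not taken from any spec


def pack_granulepos(prev_pos: int, cur_pos: int, shift: int = GRANULESHIFT) -> int:
    """Combine the previous packet's position and the offset to the current one."""
    offset = cur_pos - prev_pos
    if offset < 0 or offset >= (1 << shift):
        raise ValueError("offset does not fit in the low bits")
    # granulepos is a signed 64-bit field, so 63 bits are usable in total.
    if prev_pos < 0 or prev_pos >= (1 << (63 - shift)):
        raise ValueError("previous position does not fit in the high bits")
    return (prev_pos << shift) | offset


def unpack_granulepos(granulepos: int, shift: int = GRANULESHIFT):
    """Return (previous packet position, current packet position)."""
    prev_pos = granulepos >> shift
    offset = granulepos & ((1 << shift) - 1)
    return prev_pos, prev_pos + offset
```

For whole-stream metadata placed at the start, both positions are zero and the packed value is simply zero, which matches the all-zero case described above.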
Given that the CMML mapping would be sensible, should we simply hijack CMML for this use (i.e. just put the metadata XML in a CMML stream)? Arguments against this: it means stream parsing is needed to find out whether the CMML stream contains metadata too, and it possibly complicates CMML handling. If not hijacking CMML, it might be worth having a flag indicating whether the stream is continuous or secondary header only.
It seems reasonable to use the fishbone message fields in Ogg Skeleton to supply an ID to be associated with each logical stream (via an "id:" message header field). The other side of this problem is how these should be addressed. The physical bitstream itself shouldn't need one, but do we worry about chained/concatenated streams?
The Skeleton section of the Annodex bitstream format specifies that mandatory header fields MUST be US-ASCII encoded, but allows UTF-8 for other message fields. This does not appear to be a problem for an ID field. RFC 2822 limits each message header line to 998 bytes (excluding CRLF), and spaces are not normally permitted in IDs, so after the 4-byte "id: " prefix an ID would be limited to 994 bytes.
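The length arithmetic can be checked with a short sketch that builds the header line and enforces the limit. The function name is hypothetical, and the exact "id: " prefix (name, colon, one space) is an assumption consistent with the 994-byte figure.

```python
MAX_FIELD_BYTES = 998  # RFC 2822 line limit, excluding CRLF
ID_PREFIX = b"id: "    # assumed field name and separator


def make_id_field(stream_id: str) -> bytes:
    """Build an 'id:' message header line, or raise ValueError if invalid."""
    if not stream_id or any(ch.isspace() for ch in stream_id):
        raise ValueError("ID must be non-empty and contain no whitespace")
    line = ID_PREFIX + stream_id.encode("utf-8")
    if len(line) > MAX_FIELD_BYTES:
        raise ValueError("header line exceeds 998 bytes")
    return line + b"\r\n"
```

Note that the limit is counted in bytes after UTF-8 encoding, so an ID containing non-ASCII characters runs out of room sooner than its character count suggests.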
A MIME type is needed for use in Skeleton. MIME Types and File Extensions gives 'text/cmml' for 'CMML without container'; if that can be used by Skeleton to describe packetized CMML in Ogg then there's no issue here, and 'text/xml' or whatever is appropriate could be used.
It is a barrier to the widespread introduction of any metadata format that the Vorbis I spec only requires players to support an unaccompanied Vorbis stream; many Ogg Vorbis players will refuse to play augmented streams, especially if the content is not recognised (although many recent players do succeed). As a prelude to development of an Ogg metadata format it will be necessary to encourage developers to introduce more flexible Ogg filters.
One of the intentions of the current work on MIME Types and File Extensions is to supersede the Vorbis I spec, allowing all types of metadata to be included and encouraging program writers to ignore unrecognised content in these files. The Ogg files with embedded metadata below are therefore '.oga' files served as audio/x-ogg. You may want to use your browser's "Save As" function if you have trouble opening the links.
To help with testing, the following files are available, based on a speculative (and very basic) metadata format. In each case the derivative files are under the same license as the original. Two sets are provided to allow chained stream testing. On some players the seek tests produce an annoying clicking; if you like the music, get the originals. Please note that filenames are mixed case, and add a note in discussion if you find a broken link.
- The Ogg-Vorbis-XML version
- The XML/RDF description as a separate document
- With the XML page repeated after every fifth Vorbis page. (This is not a suggested way to add metadata, just a way of testing how players handle seeking in the presence of an unknown stream.)
- With the XML page repeated after every fifth Vorbis page and the stream ending on a metadata page. (This breaks simpler track-length strategies; again, it is not a suggested format for metadata.)