The MNG animation format grew out of the more successful PNG project as a way to provide animation support to compete with animated gif on the web. It basically allows encoding of both png and jpeg images and then construction of frames for display through alpha composition of those images and previously constructed frame data, including offsets and subimages. There is some crude scaling support, mostly as a way to implement gradients, but no general scale and rotation support. There is a delta-png mode for encoding minor frame-to-frame changes.
There are also some simpler profiles that are just a sequence of full-frame images at a constant framerate.
We are interested in an Ogg encapsulation of MNG for a number of reasons. The simplest is that MNG is an excellent format for 'traditional' cel-style animation, especially with fixed background plates, but within itself provides no audio support. Multiplexing such an animation with a Vorbis audio track in Ogg makes an obviously complete format. One could also put vorbis data into the MNG format as well of course "VHDR, VDAT, ..., VEND" but we have other purposes in mind.
Although most efficient in compressing animation, MNG's support for both high-quality PNG lossless compression and subsumption of mjpeg make it a good source format for working with video in the Ogg framework. Combined with FLAC audio we have a good mastering format, and something that can be piped to encoders in a single stream.
Finally, we want to support 'slideshow' tracks and graphic overlays over video in Ogg Theora. So one can do DVD-Video style graphic subtitles if you want precise control over the appearance (only with alpha blending so they look much better) or even complicated graphical annotations of video, like the VH1 "Pop Up Video" series. On the other side, one can do a series of slides with music or commentary, for example, a webcast of a conference presentation can include both video of the speaker and full-resolution images of the slides as they are presented, a much better solution than the current practice of cutting periodically to an illegible version in the video feed itself.
Native MNG is a chunk-based file format. Each coded element (header, compressed images data, control information, etc.) is wrapped in a 'chunk' structure consisting of a 4-byte data length, a 4-byte type field, the actual data, and a 4-byte CRC. These chunks are simply contatentated to form the MNG bytestream. There are some ordering rules for the chunks, and an initial 8 byte magic sequence for recognition.
The most straightforward encapsulation is to make an Ogg packet of each chunk, and then apply some familiar conventions for pagination to assist with streaming and seekability. We complicate this by including only the chunk type and data fields, and *not* the length and checksum which are redudant in Ogg. This increases complexity a bit, but the bitrate savings can be valuable in some cases. libmng (v1.0.8 and later) has special support for this. Granulepos would be the presentation time of the mng frame (in "ticks" as defined in the MHDR chunk) in variable framerate schemes, or the frame number in fixed-framerate streams, analogous to the treatment in MNG itself.
The beginning-of-stream packet thus consists on the 8 byte MNG file magic followed by the MHDR chunk type and data. This must appear on a page by itself for codec id as usual. Other packets consist of individual chunk type+body as described above. The end-of-stream packet will be the MEND chunk (which has no body and so consists of only the 4 byte type) and need not be on a page by itself, nor must that page appear at the end of a grouped Ogg segment.
For the sake of streaming, complex MNG streams should divide their referenced data into a section at the beginning of the logical stream, or associated within some trivial time with the actual display, so seeking works without having to search for referenced objects beyond the beginning of stream, as is already required by vorbis and theora. For simple sequence-of-frames data, things are more straightforward of course. Likewise, if there is a tEXt (or zTXt or iTXt) chunk with metadata describing the whole stream, it should appear at the beginning of the stream after the MHDR but before any image data.
The obvious extension could be made to including still images as PNG or JNG (png-style file format encapsulating jpeg image data with an optional alpha mask) e.g for album art. In that case the same stripped chunk to packet mapping works, with the beginning-of-stream packet being the 8 byte PNG or JNG signature followed by the IHDR or JHDR chunk type and body, respectively.
There are a couple of undecided issues that most likely need feedback from implementation.
MNG can do variable frame rate streams, but each frame is marked with the delay until the next frame, which does not work well with Ogg's stream-oriented design. There have been two proposals to deal with this. A 'SHOW' chunk (empty) can be periodically repeated to refresh the previous frame. This is tedious but fairly low overhead, and results in a MNG stream that would play back identically outside the Ogg wrapper. Alternatively, the Ogg page granulepos could be used to indicate presentation time, overriding the internal MNG timing. This is not without precedent since MNG defines a framing rate of 'zero' as a special value where advance is triggered by some external event (3d volume slices, manual slideshow advance, and so on). Possibly both could be implemented with the Ogg granulepos overriding only when the MHDR framing rate is 0.
We have two uses of MNG in Ogg that need to be distinguished. The MNG stream can be a separate 'video' stream all on its own, e.g. presentation slides that should be displayed in parallel with the other streams, and as an overlay on another video stream. This could be done with heuristics (an overlay must be transparent, and match the video in frame dimensions), conventions (a tEXt chuck could include a "ROLE:Overlay" definition, or through a separate metastream header.
Conrad suggested a design where processing instructions could in included in the Fishbone metadata packet, but Ralph thinks this needs some refinement.
We recommend the heuristic approach for now.
For the presentation slides use case, the streaming server will want to cache the current frame and send it on connect to new listeners. This is a new requirement over just caching the initial headers that are required by the Vorbis and Theora codecs.