OggMNG

From XiphWiki

Moving-image Network Graphics (MNG) is a PNG-like image format supporting multiple images, animation and transparent JPEG. OggMNG is the specification for placing an MNG image inside an Ogg container.

Motivation

The MNG animation format grew out of the more successful PNG project as a way to provide animation support to compete with animated GIF on the web. It allows encoding of both PNG and JPEG images, and then construction of frames for display through alpha composition of those images and previously constructed frame data, including offsets and subimages. There is some crude scaling support, mostly as a way to implement gradients, but no general scale and rotation support. There is a delta-PNG mode for encoding minor frame-to-frame changes.

There are also some simpler profiles that are just a sequence of full-frame images at a constant framerate.

We are interested in an Ogg encapsulation of MNG for a number of reasons. The simplest is that MNG is an excellent format for 'traditional' cel-style animation, especially with fixed background plates, but in itself provides no audio support. Multiplexing such an animation with a Vorbis audio track in Ogg makes an obviously complete format. One could of course also put Vorbis data into the MNG format itself ("VHDR, VDAT, ..., VEND"), but we have other purposes in mind.

Although MNG is most efficient at compressing animation, its support for both high-quality lossless PNG compression and Motion JPEG makes it a good source format for working with video in the Ogg framework. Combined with FLAC audio we have a good mastering format, and something that can be piped to encoders in a single stream.

Finally, we want to support 'slideshow' tracks and graphic overlays over video in Ogg Theora. Overlays allow DVD-Video style graphic subtitles when precise control over the appearance is wanted (with alpha blending, so they look much better), or even complicated graphical annotations of video, like the VH1 "Pop Up Video" series. Slideshow tracks allow a series of slides with music or commentary. For example, a webcast of a conference presentation can include both video of the speaker and full-resolution images of the slides as they are presented, a much better solution than the current practice of cutting periodically to an illegible version in the video feed itself.

Specification

Native MNG is a chunk-based file format. Each coded element (header, compressed image data, control information, etc.) is wrapped in a 'chunk' structure consisting of a 4-byte data length, a 4-byte type field, the actual data, and a 4-byte CRC. These chunks are simply concatenated to form the MNG bytestream. There are some ordering rules for the chunks, and an initial 8-byte magic sequence for recognition.
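The chunk layout described above can be sketched in a few lines of Python. This is an illustrative reader, not part of any reference implementation: the function names are hypothetical, and the details that the integers are big-endian and that the CRC is a standard CRC-32 over the type and data fields are taken from the PNG chunk convention that MNG inherits rather than from this page.

```python
import struct
import zlib

MNG_SIGNATURE = b"\x8aMNG\r\n\x1a\n"  # 8-byte magic at the start of a native MNG file


def make_chunk(chunk_type: bytes, data: bytes = b"") -> bytes:
    """Serialize one native chunk: length, type, data, CRC (big-endian)."""
    crc = zlib.crc32(chunk_type + data) & 0xFFFFFFFF  # CRC covers type + data
    return struct.pack(">I", len(data)) + chunk_type + data + struct.pack(">I", crc)


def iter_chunks(stream: bytes):
    """Yield (type, data) for every chunk in a native MNG bytestream."""
    if stream[:8] != MNG_SIGNATURE:
        raise ValueError("not an MNG bytestream")
    pos = 8
    while pos < len(stream):
        if pos + 8 > len(stream):
            raise ValueError("truncated chunk header")
        length, chunk_type = struct.unpack_from(">I4s", stream, pos)
        end = pos + 8 + length
        if end + 4 > len(stream):
            raise ValueError("truncated chunk body")
        data = stream[pos + 8:end]
        (crc,) = struct.unpack_from(">I", stream, end)
        if crc != zlib.crc32(chunk_type + data) & 0xFFFFFFFF:
            raise ValueError("CRC mismatch in chunk %r" % chunk_type)
        yield chunk_type, data
        pos = end + 4
```

Feeding it the signature followed by an MHDR chunk and an MEND chunk yields those two (type, data) pairs in order.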

The most straightforward encapsulation is to make an Ogg packet of each chunk, and then apply some familiar conventions for pagination to assist with streaming and seekability. We complicate this by including only the chunk type and data fields, and *not* the length and checksum, which are redundant in Ogg. This increases complexity, but the bitrate savings are valuable in some cases. libmng (v1.0.8 and later) has special support for this. The granulepos would be the presentation time of the MNG frame (in "ticks" as defined in the MHDR chunk) in variable-framerate schemes, or the frame count in fixed-framerate streams, analogous to the treatment in MNG itself.
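A minimal sketch of the stripped chunk-to-packet mapping, and of its inverse for handing data back to a decoder that expects native chunks. The function names are hypothetical and this does not use the libmng API. The CRC calculation (CRC-32 over type plus data) is assumed from the PNG chunk convention.

```python
import struct
import zlib


def chunk_to_packet(chunk: bytes) -> bytes:
    """Strip the length and CRC from one native chunk, leaving type + data."""
    (length,) = struct.unpack_from(">I", chunk, 0)
    if len(chunk) != 12 + length:
        raise ValueError("chunk size does not match its length field")
    return chunk[4:8 + length]


def packet_to_chunk(packet: bytes) -> bytes:
    """Rebuild a native chunk from an Ogg packet body (type + data)."""
    if len(packet) < 4:
        raise ValueError("packet too short to hold a chunk type")
    chunk_type, data = packet[:4], packet[4:]
    crc = zlib.crc32(chunk_type + data) & 0xFFFFFFFF
    return struct.pack(">I", len(data)) + packet + struct.pack(">I", crc)
```

The mapping saves 8 bytes per chunk. The length is recoverable because Ogg already records the packet size, which is why the round trip is lossless.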

The beginning-of-stream packet thus consists of the 8-byte MNG file magic followed by the MHDR chunk type and data. This satisfies the design requirements of having the initial packet provide both codec identification and relevant information about the stream, such as the granulepos scheme. This must appear on a page by itself as usual. All other packets consist of individual chunk type+body as described above. The end-of-stream packet will be the MEND chunk (which has no body and so consists of only the 4-byte type) and need not be on a page by itself, nor must that page appear at the end of a grouped Ogg segment.
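As an illustration, the beginning-of-stream packet could be built and parsed as follows. The helper names are hypothetical. The MHDR body layout (seven 4-byte big-endian unsigned integers, 28 bytes in total) and the field names follow the MNG specification, not this page, so treat them as assumptions of the sketch.

```python
import struct

MNG_SIGNATURE = b"\x8aMNG\r\n\x1a\n"

# MHDR field order as given in the MNG specification (assumed here).
MHDR_FIELDS = ("frame_width", "frame_height", "ticks_per_second",
               "nominal_layer_count", "nominal_frame_count",
               "nominal_play_time", "simplicity_profile")

EOS_PACKET = b"MEND"  # end-of-stream packet: chunk type only, no body


def build_bos_packet(**fields: int) -> bytes:
    """Signature + 'MHDR' + 28-byte body; unspecified fields default to 0."""
    values = [fields.get(name, 0) for name in MHDR_FIELDS]
    return MNG_SIGNATURE + b"MHDR" + struct.pack(">7I", *values)


def parse_bos_packet(packet: bytes) -> dict:
    """Check codec identification and return the MHDR fields as a dict."""
    if packet[:8] != MNG_SIGNATURE or packet[8:12] != b"MHDR":
        raise ValueError("not an OggMNG beginning-of-stream packet")
    body = packet[12:]
    if len(body) != 28:
        raise ValueError("MHDR body must be 28 bytes")
    return dict(zip(MHDR_FIELDS, struct.unpack(">7I", body)))
```

The resulting packet is 40 bytes long (8 + 4 + 28), and a demuxer only needs the first 8 bytes to recognise the codec.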

For the sake of streaming, complex MNG streams should place their referenced data either in a section at the beginning of the logical stream or within a short time of where it is actually displayed. Seeking then works without having to search anywhere other than the beginning of the stream for referenced objects, and reading the beginning of the stream is already required by Vorbis and Theora. For simple sequence-of-frames data, things are of course more straightforward. Likewise, if there is a tEXt (or zTXt or iTXt) chunk with metadata describing the whole stream, it should appear at the beginning of the stream, after the MHDR but before any image data.

Since the still image formats PNG and JNG (a PNG-style file format encapsulating JPEG image data with an optional alpha mask) are also by specification valid MNG files, we make the same extension to allow their encapsulation in Ogg, e.g. for album art. In that case the same stripped chunk-to-packet mapping is used, with the beginning-of-stream packet being the 8-byte PNG or JNG signature followed by the IHDR or JHDR chunk type and body, respectively. Encoders should set the granulepos of any pages containing still image data to 0. Decoders should ignore the granulepos and display the still image in whatever association with the other data they deem appropriate.
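A demuxer can tell the three cases apart from the first 12 bytes of the beginning-of-stream packet. The following is a sketch with hypothetical function names; the signature bytes are the standard PNG, JNG and MNG file magics.

```python
# Map each 8-byte file signature to (format name, expected header chunk type).
SIGNATURES = {
    b"\x8aMNG\r\n\x1a\n": ("MNG", b"MHDR"),
    b"\x89PNG\r\n\x1a\n": ("PNG", b"IHDR"),
    b"\x8bJNG\r\n\x1a\n": ("JNG", b"JHDR"),
}


def identify_stream(bos_packet: bytes):
    """Return 'MNG', 'PNG' or 'JNG', or None if the packet is not recognised."""
    entry = SIGNATURES.get(bos_packet[:8])
    if entry is None:
        return None
    name, header_type = entry
    if bos_packet[8:12] != header_type:
        return None  # signature and header chunk type disagree
    return name


def is_still_image(bos_packet: bytes) -> bool:
    """Still images carry granulepos 0, which decoders should ignore."""
    return identify_stream(bos_packet) in ("PNG", "JNG")
```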

Speculation

There are a couple of undecided issues that most likely need feedback from implementation.

MNG can do variable frame rate streams, but each frame is marked with the delay until the next frame, which does not work well with Ogg's stream-oriented design. There have been two proposals to deal with this. A 'SHOW' chunk (empty) can be periodically repeated to refresh the previous frame. This is tedious but fairly low overhead, and results in an MNG stream that would play back identically outside the Ogg wrapper. Alternatively, the Ogg page granulepos could be used to indicate presentation time, overriding the internal MNG timing. This is not without precedent, since MNG defines a framing rate of 'zero' as a special value where advance is triggered by some external event (3D volume slices, manual slideshow advance, and so on). Possibly both could be implemented, with the Ogg granulepos overriding only when the MHDR framing rate is 0.
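Neither proposal is settled, so the following is only a sketch of how the two could look in practice. All names are hypothetical, times are in MHDR ticks, and reading the "framing rate" as the MHDR ticks-per-second field is an assumption of this example.

```python
def refresh_schedule(frame_ticks, end_tick, max_gap):
    """First proposal: list (tick, kind) events, inserting an empty 'SHOW'
    refresh whenever more than max_gap ticks pass without a new frame."""
    if max_gap <= 0:
        raise ValueError("max_gap must be positive")
    events = []
    boundaries = list(frame_ticks) + [end_tick]
    for start, nxt in zip(boundaries, boundaries[1:]):
        events.append((start, "frame"))
        tick = start + max_gap
        while tick < nxt:
            events.append((tick, "SHOW"))
            tick += max_gap
    return events


def presentation_time(ticks_per_second, internal_ticks, page_granulepos):
    """Combined proposal: the Ogg granulepos overrides the internal MNG
    timing only when the MHDR framing rate is 0 (externally triggered)."""
    if ticks_per_second == 0:
        return page_granulepos
    return internal_ticks
```

For frames at ticks 0 and 100 in a stream ending at tick 130, a maximum gap of 30 ticks adds SHOW refreshes at ticks 30, 60 and 90.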

We have two uses of MNG in Ogg that need to be distinguished. The MNG stream can be a separate 'video' stream all on its own, e.g. presentation slides that should be displayed in parallel with the other streams, or it can be an overlay on another video stream. The distinction could be made with heuristics (an overlay must be transparent, and match the video in frame dimensions), with conventions (a tEXt chunk could include a "ROLE:Overlay" definition), or through a separate metastream header.

Since this is a general requirement for overlay streams, the plan is to define standard entries in the Skeleton message headers (placed in the fishbone packet corresponding to the MNG stream) to make these determinations. The heuristics described above can be used as fallbacks in the absence of Skeleton information.

Contra

The MNG format has seen little adoption. The niche it was targeted at (animated web graphics) has been filled by Flash, with SVG being the competing free format. The full profile is very complex while missing obvious features like general scale and rotation for sprite animation. Wouldn't something simpler or more popular be a better choice?

See also OggSpots and APNG.

Implementation Notes

For the presentation slides use case, the streaming server will want to cache the current frame and send it on connect to new listeners. This is a new requirement over just caching the initial headers that are required by the Vorbis and Theora codecs.