Videos/A Digital Media Primer For Geeks/making

From XiphWiki

This page documents some of the background information behind the production of Digital Media Primer For Geeks. To see the video or its wiki-edition visit the main video page.

The making of…



Canon HV40 HDV camera w/ wide-angle lens operating on a tripod. At the time I was looking for MyFirstVideoCamera, the six people I asked who did video work all recommended this same camera, and two said not to get it without the wide angle lens. I took their advice and have been happy with it. Among other nifty features, the camera offers true progressive scan modes, live firewire output, and the ability to act as a digitizer for external video input. With the patches I made in my Git repo, Cinelerra natively handles the Canon HDV progressive modes.

The wide angle lens gives the camera a nice close macro mode and approximately triples the amount of light reaching the sensor for a given zoom/aperture. Useful for shooting indoors at night (e.g., this entire video).

No additional lighting kit was used.


Two Crown PCC160 boundary microphones placed on a table approximately 4-8 feet in front of the speaker, run through a cheap Behringer portable mixer and into the camera's microphone input.

No additional audio kit was used.


Whiteboard markers by 'Bic'

Drawing aids by Staedtler, McMaster Carr, and 'Generic'.

Video shooting sequence

Scenes were pre-scripted and memorized, usually with lots of on-the-fly revision. In the future... I'm getting a teleprompter. OTOH, I can totally rattle off the entire video script from beginning to end as a party trick, thus ensuring I'll not be invited to many parties.

Diagrams were drawn by hand on a physical whiteboard with whiteboard markers and magnetic T-squares, triangles, and yardsticks. Despite looking a lot like greenscreen work, there is no image compositing in use (actually-- there are two small composites where an error in a whiteboard diagram was corrected by subtracting part of the original image and then adding a corrected version of the diagram).

Camera operated in 24F shutter priority mode (Tv set to "24") with exposure and white balance both calibrated to the white board (or a white piece of paper) and locked. Microphone attenuation setting was active, with gain locked such that room noise peaked at -40dB (all the rooms in the shooting sequences were noisy due to the building's ventilation system, or active equipment). Lighting in the whiteboard rooms tended to be odd, with little relative light cast on a presenter standing just in front of the whiteboard; a presenter is practically standing in the room's only shadow. Most of the room light is focused on the table and walls. Additional fill lighting kit would have been useful, but for the first vid, I didn't want 'perfect' to be the enemy of 'good'.

Autofocus used for whiteboard scenes, manual focus used for several workshop scenes as the autofocus tended to hunt continuously in very low light.

Continuous capture to a Thinkpad with firewire input via a simple gstreamer script.
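A capture pipeline along these lines can be sketched with gst-launch (a minimal sketch, assuming GStreamer's hdv1394src element from gst-plugins-bad; this is not the actual script used). HDV arrives over FireWire as an MPEG-2 transport stream, so it can be written straight to disk without re-encoding:

```shell
# Hypothetical HDV FireWire capture sketch (hdv1394src is assumed to be
# available from gst-plugins-bad; the actual capture script may differ).
# The camera's HDV output is already an MPEG-2 transport stream, so the
# pipeline just buffers it and writes it to disk unmodified.
gst-launch-0.10 hdv1394src ! queue ! filesink location=capture.m2t
```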

Production sequence

All hail Cinelerra. You better hail, or Cinelerra will get pissy about it.

Most of the production sequence hinged on making Cinelerra happy; it is a hulking rusty cast-iron WWI tank of a program that can seem like it's composed entirely of compressed bugs. That said, it was neither particularly crashy nor did it ever accidentally corrupt or lose work. It was also the only FOSS editor with a working 2D compositor. It got the job done once I found a workflow it would cope with (and fixed a number of bugs; these fixes are available from my Cinelerra Git repo).

Choosing takes

Each shooting session yielded four to six hours of raw video. The first step was to load the raw video into the Cinelerra timeline, label each complete take, compare the takes and choose the one to use, then render the chosen take out as a YUV4MPEG raw video file and a WAV raw audio file. Be careful that Settings->Align Cursor On Frames is set, else the audio and video renders won't start on the same boundary.
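One quick way to catch a misaligned render is to compare the durations mplayer reports for the two files (a sketch, not part of the original workflow; take01.y4m and take01.wav are placeholder names):

```shell
# Compare reported durations of the rendered video and audio; a mismatch
# suggests the renders didn't start on the same frame boundary.
vlen=$(mplayer -identify -frames 0 -vo null -ao null take01.y4m 2>/dev/null \
       | sed -n 's/^ID_LENGTH=//p')
alen=$(mplayer -identify -frames 0 -vo null -ao null take01.wav 2>/dev/null \
       | sed -n 's/^ID_LENGTH=//p')
echo "video: ${vlen}s  audio: ${alen}s"
```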


At this point, the raw video clips were adjusted for gamma, contrast, and saturation in gstreamer and mplayer. In the earlier shoots the camera was underexposing due to pilot error, which required quite a bit of gamma and saturation inflation to 'correct' (there is no real correction as the low-end data is gone, but it's possible to make it look better). Later shoots used saner settings and the adjustments were mostly to keep different shooting sessions more uniform. The whiteboard tends not to look white because it's mildly reflective, and picked up the color of the cyan and orange audio baffles in the room like a big diffuse mirror.
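A pass like this can be done with mplayer's eq2 filter, which takes gamma, contrast, brightness, and saturation in that order (the values below are illustrative, not the ones used for the actual footage):

```shell
# eq2=gamma:contrast:brightness:saturation -- example values only.
# Reads the raw clip and writes an adjusted YUV4MPEG file.
mplayer take01.y4m -nosound -noconsolecontrols \
        -vf eq2=1.3:1.1:0.0:1.2 \
        -vo yuv4mpeg:file=take01-adjusted.y4m
```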

The audio was both noisy (due to the building's ventilation system which either sounded like a low loud rumble or a jet-engine taking off) and reverberant (the rooms were glass on two sides and plaster on the other two). Early takes used no additional sound absorbing material in the rooms, and the Postfish filtering and deverb was used heavily. It gives the early audio in the vid a slightly odd, processed feel (I had almost decided the original audio was simply unusable). Later takes used some big fleece 'soft flats' in the room to absorb some additional reverb, and the later takes are less heavily filtered.

The postfish filtering chain used declip (for the occasional overrange oops), deverb (remove room reverberation), multicompand (noise gating), single compand (for volume levelling) and EQ (the Crown mics are nice, but are very midrange heavy).

Special Effects

Audio special effects were one-offs, mostly done using SoX. The processed demo sections of audio were then spliced back into the original audio takes using Audacity.
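As an illustration of the kind of one-off SoX processing involved (a hypothetical example, not a command from the actual production), bandlimiting a demo clip to telephone bandwidth is a single effect invocation:

```shell
# Band-pass the clip to roughly telephone bandwidth (300 Hz - 3.4 kHz)
# using SoX's sinc filter; input/output names are placeholders.
sox demo-in.wav demo-out.wav sinc 300-3400
```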

Video special effects (e.g., removing a color channel) were done by writing quick, one-off filters in C for y4oi. A few effects were done by dumping a take as a directory full of PNGs, batch-processing the PNGs using a one-off C program, then reassembling with mplayer. Video effects were then stitched back into the original video takes in Cinelerra.
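The PNG round trip can be sketched as follows (the paths, the fps value, and the filter step are placeholders; the actual per-frame filters were one-off C programs):

```shell
mkdir -p frames
# Dump every frame of the take as a numbered PNG.
mplayer take01.y4m -nosound -vo png:outdir=frames
# ... run the one-off filter program over frames/*.png here ...
# Reassemble the processed frames into a YUV4MPEG clip.
mplayer "mf://frames/*.png" -mf fps=24 -nosound \
        -vo yuv4mpeg:file=take01-fx.y4m
```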


All editing was done in Cinelerra. This primarily consisted of stitching the individual takes back together with crossfades. All input and rendering output were done with raw YUV4MPEG and WAV files. Note that making this work well and correctly required several patches to the YUV4MPEG handler and colorspace conversion code.


I encoded by hand external to Cinelerra, using mplayer for final postprocessing, the example_encoder included with the Ptalarbvorm Theora source distribution, and ivfenc for WebM. I synced subtitles to the video by hand with Audacity (I already had the script) in SRT format (for easy editing/translation and syncing with the video in HTML5), and transcoded to Ogg Kate using kateenc. The Kate subs were then muxed with the Ogg video encoding using oggz-merge, and finally an index was added to the Ogg with OggIndex.

(For the record, scaling with mplayer below is done mostly for convenience. It should not be used for professional work, as the libswscale resampler mplayer uses tends to shift the colors in the result.)

Sample Ogg command lines for producing 360p, 128-ish (a4) audio and 500-ish (v50) video with subtitles and index:

  • perform a little denoising, scale, and deband the raw render:

mplayer -vf hqdn3d,scale=640:360,gradfun=1.5,unsharp=l3x3:.1 complete.y4m -fast -noconsolecontrols -vo yuv4mpeg:file=filtered.y4m

  • encode the basic Ogg Vorbis/Theora file:

encoder_example -a 4 -v 50 -k 240 complete.wav filtered.y4m -o basic.ogv

  • produce Kate subs from the SRT input file:

kateenc -t srt -l en_US -c SUB -o subs.kate

  • add the subs to the Ogg video file:

oggz-merge basic.ogv subs.kate -o subbed.ogv

  • add index for faster seeking on the Web:

OggIndex subbed.ogv -o A_Digital_Media_Primer_For_Geeks-360p.ogv

Important Update as of 2013-02-23

A few things have changed slightly with the WebM tools since episode I was first released. It also turns out that mkvmerge/clean don't generate strictly correct WebM files. The mkvmerge/clean problem can cause playback in Chromium to freeze due to out-of-order timestamps. There's more information here: [1]

As a result, I've updated the examples below to use the current tool names and options, as well as use ffmpeg to perform WebM muxing rather than mkvtoolnix.

Sample WebM command lines for producing 360p, 128-ish (a4) audio and 500kbps video with index:

  • Might as well reuse the Vorbis encoding already done for the Ogg file:

oggz-rip -c vorbis A_Digital_Media_Primer_For_Geeks-360p.ogv -o vorbis.ogg

  • Produce the VP8 encoding from the same y4m file used for Theora:

vpxenc -p 2 -t 4 --best --target-bitrate=1500 --end-usage=0 --auto-alt-ref=1 -v --minsection-pct=5 --maxsection-pct=800 --lag-in-frames=16 --kf-min-dist=0 --kf-max-dist=120 --static-thresh=0 --drop-frame=0 --min-q=0 --max-q=60 -o vp8.webm filtered.y4m

  • Mux the audio and video into the final WebM file:

ffmpeg -i vorbis.ogg -i vp8.webm -c:v copy -c:a copy A_Digital_Media_Primer_For_Geeks-360p.webm

Web Presentation

HTML5 is new, so I found (to my unpleasant surprise) that I got to script all my website controls from scratch. Virtually everything preexisting was either very large, inscrutable, and inflexible (a 'complete web video solution!'), offering features I did not want while missing features I did, or was a proof of concept that was obviously unfinished, unpolished, and not well tested.

Playback Controls

I wanted more than the standard set of controls, but I did *not* want to fall into the usual web geek trap: a UI with 50 buttons in a big heap with no thought to usability, and extra points for using at least twelve colors. I wanted new controls to be unobtrusive but obvious when you wanted them, and to blend into the preexisting controls.

Clearly the best way to do this would be to put a transparent canvas layer over the video window and implement completely fresh controls. This would probably be the most bug-proof and future-proof approach, and would definitely give the most consistent look and feel across browsers. I also estimate it would take several weeks of full-time scripting to make it work as expected (remember, HTML5 is new and still a draft, so there are endless inconsistencies and implementation bugs to deal with; writing a script is easy and fast, but making it work consistently is time-consuming and frustrating).

Adding a fade-in bar that approximately matched the existing controls in most players would be finicky, shorter-lived, and not as pretty, but it could be practical and working far sooner than the overkill solution of reimplementing everything. As HTML5 is still a draft and I'll probably have to revisit any site scripting regularly, option two seemed the sensible way to go.

The nice thing about HTML and JavaScript both is that they're inherently Open Source; anyone can inspect the code I wrote (and point and laugh).


Subtitles

Although I think external subtitles aren't the best overall direction, it's all HTML5 currently offers. The Ogg files include Kate format subtitles, but HTML5 offers no API for accessing them. What HTML5 does give is a high-resolution playback timer, and the ability to load and parse subtitle files.

subtitles.js loads and parses SRT format subs on demand from any URL, and places the text of each subtitle into a <div> element in synchronization with the video playback timer. A little additional CSS is all that's necessary to put a translucent background behind it, and display it over the video frame.

Resolution / stream switching

This was considerably less elegant due to some apparent inadequacies in the HTML5 draft spec. There seem to be two basic ways of changing the currently playing video in the current draft.

The first way to change streams is to create a new video element via JavaScript, wait for it to load, then replace the current video with the new one. Unfortunately, HTML5 gives no way to prevent the original video, even when stopped, from using all available bandwidth to keep buffering as fast as it can. This starves the replacement video of network access, causing a lengthy delay when loading. It looks very nice and seamless when it finally works, but can easily result in switching video streams taking 15-30 seconds or more.

The second option is to switch the preexisting video element to a new stream. This is much faster as the original stream stops sinking bandwidth immediately, but upon loading it always starts from the beginning and in current browsers also displays the first frame, even if playback isn't started. After the load completes, then it's possible to seek forward to where the original stream started. It doesn't look as good, but it's much faster in practice.

I use the second, faster option, so there's a brief flash back to the beginning of the video upon resolution switch.

Chapter Navigation

Nothing special here; it's just a <select> dropdown with an onchange handler that sets a new 'video.currentTime'.

Control pop/unpop

Oddly enough this was the hardest part, not because it's hard to do, but because it's hard to make consistent across browsers. Every browser fires radically different UI events for the same mouse/keyboard actions.


...In retrospect, not as gratuitous as it seemed when I first wrote it. Many aspects of how video is made and presented assume viewing in a relatively dim environment, where the video being watched is the brightest thing in sight (or close to it). Xiph's web styling uses white backgrounds, which I found actively distracting and out of place, but altering the style of the site for just the video pages also seemed clearly wrong. So I added an animated dim/undim on playback/pause (an instantaneous dim/undim was jarring). I'm now convinced it was a good call, assuming it actually works everywhere as intended (it won't work in browsers using the Cortado fallback).