<?xml version="1.0"?>
<feed xmlns="http://www.w3.org/2005/Atom" xml:lang="en">
	<id>https://wiki.xiph.org/api.php?action=feedcontributions&amp;feedformat=atom&amp;user=Mindspillage</id>
	<title>XiphWiki - User contributions [en]</title>
	<link rel="self" type="application/atom+xml" href="https://wiki.xiph.org/api.php?action=feedcontributions&amp;feedformat=atom&amp;user=Mindspillage"/>
	<link rel="alternate" type="text/html" href="https://wiki.xiph.org/Special:Contributions/Mindspillage"/>
	<updated>2026-05-14T11:29:59Z</updated>
	<subtitle>User contributions</subtitle>
	<generator>MediaWiki 1.45.1</generator>
	<entry>
		<id>https://wiki.xiph.org/index.php?title=Videos/Digital_Show_and_Tell&amp;diff=13961</id>
		<title>Videos/Digital Show and Tell</title>
		<link rel="alternate" type="text/html" href="https://wiki.xiph.org/index.php?title=Videos/Digital_Show_and_Tell&amp;diff=13961"/>
		<updated>2013-02-26T08:25:41Z</updated>

		<summary type="html">&lt;p&gt;Mindspillage: /* Epilogue */ +Credits section from video&lt;/p&gt;
&lt;hr /&gt;
&lt;div&gt;&amp;lt;small&amp;gt;&#039;&#039;Wiki edition&#039;&#039;&amp;lt;/small&amp;gt;&lt;br /&gt;
[[Image:dsat_001.jpg|400px|right]]&lt;br /&gt;
&lt;br /&gt;
Continuing in the &amp;quot;firehose&amp;quot; tradition of [[Videos/A_Digital_Media_Primer_For_Geeks|Episode 01]], Xiph.Org&#039;s second video on digital media explores multiple facets of digital audio signals and how they &#039;&#039;really&#039;&#039; behave in the real world.&lt;br /&gt;
&lt;br /&gt;
Demonstrations of sampling, quantization, bit-depth, and dither explore digital audio behavior on real audio equipment using both modern digital analysis and vintage analog bench equipment, just in case we can&#039;t trust those newfangled digital gizmos. You can download the source code for each demo and try it all for yourself!&lt;br /&gt;
&amp;lt;br/&amp;gt;&amp;lt;br/&amp;gt;&amp;lt;br/&amp;gt;&lt;br /&gt;
&amp;lt;center&amp;gt;&amp;lt;font size=&amp;quot;+2&amp;quot;&amp;gt;[http://www.xiph.org/video/vid2.shtml Download or Watch online]&amp;lt;/font&amp;gt;&amp;lt;/center&amp;gt;&lt;br /&gt;
&amp;lt;br style=&amp;quot;clear:both;&amp;quot;/&amp;gt;&lt;br /&gt;
Supported players: [http://www.videolan.org/vlc/ VLC 1.1+], [https://www.mozilla.com/en-US/firefox/ Firefox], [http://www.chromium.org/Home Chrome], [http://www.opera.com/ Opera]. Or see [http://www.webmproject.org/users/ other WebM] or [[TheoraSoftwarePlayers|other Theora]] players.&lt;br /&gt;
&lt;br /&gt;
If you&#039;re having trouble with playback in a modern browser or player, please visit our [[Playback_Troubleshooting|playback troubleshooting and discussion]] page.&lt;br /&gt;
&amp;lt;br/&amp;gt;&lt;br /&gt;
&amp;lt;hr/&amp;gt;&lt;br /&gt;
&amp;lt;br/&amp;gt;&amp;lt;br/&amp;gt;&amp;lt;br/&amp;gt;&lt;br /&gt;
[[Image:Xiph_ep02_test.png|400px|right]]&lt;br /&gt;
&lt;br /&gt;
Hi, I&#039;m Monty Montgomery from [http://www.redhat.com/ Red Hat] and [http://xiph.org/ Xiph.Org].&lt;br /&gt;
&lt;br /&gt;
A few months ago, I wrote&lt;br /&gt;
[http://people.xiph.org/~xiphmont/demo/neil-young.html an article on digital audio and why 24bit/192kHz music downloads don&#039;t make sense].&lt;br /&gt;
In the article, I&lt;br /&gt;
mentioned--almost in passing--that a digital waveform is&lt;br /&gt;
[http://people.xiph.org/~xiphmont/demo/neil-young.html#toc_sfam not a stairstep],&lt;br /&gt;
and you certainly don&#039;t get a stairstep when you&lt;br /&gt;
[[WikiPedia:Digital-to-analog_converter|convert from digital back to analog]].&lt;br /&gt;
&lt;br /&gt;
Of everything in the entire article, &#039;&#039;&#039;that&#039;&#039;&#039; was the number one thing&lt;br /&gt;
people wrote about. In fact, more than half the mail I got was questions and&lt;br /&gt;
comments about basic digital signal behavior.  Since there&#039;s interest, let&#039;s&lt;br /&gt;
take a little time to play with some &#039;&#039;simple&#039;&#039; digital signals.&lt;br /&gt;
&lt;br /&gt;
==Veritas ex machina==&lt;br /&gt;
[[Image:Dsat_002.jpg|200px|right]]&lt;br /&gt;
[[Image:Dsat_003.jpg|200px|right]]&lt;br /&gt;
[[Image:Dsat_004.jpg|200px|right]]&lt;br /&gt;
[[Image:Dsat_005.jpg|200px|right]]&lt;br /&gt;
&lt;br /&gt;
Pretend for a moment that we have no idea how digital signals really&lt;br /&gt;
behave. In that case it doesn&#039;t make sense for us to use digital test&lt;br /&gt;
equipment either.  Fortunately for this exercise, there&#039;s still plenty&lt;br /&gt;
of working analog lab equipment out there.&lt;br /&gt;
&lt;br /&gt;
First up, we need a [[WikiPedia:Function_generator|signal generator]] to provide us with analog input&lt;br /&gt;
signals--in this case, an&lt;br /&gt;
[http://www.home.agilent.com/en/pd-3325A%3Aepsg%3Apro-pn-3325A/synthesizer-function-generator?pm=PL&amp;amp;nid=-536900197.536896863&amp;amp;cc=SE&amp;amp;lc=swe HP3325]&lt;br /&gt;
from 1978.  It&#039;s still a pretty good&lt;br /&gt;
generator, so if you don&#039;t mind the size, the weight, the power&lt;br /&gt;
consumption, and the noisy fan, you can find them on eBay... occasionally&lt;br /&gt;
for only slightly more than you&#039;ll pay for shipping.&lt;br /&gt;
&lt;br /&gt;
Next, we&#039;ll observe our analog waveforms on [[WikiPedia:Oscilloscope_types#Cathode-ray_oscilloscope_.28CRO.29|analog oscilloscopes]],&lt;br /&gt;
like this Tektronix 2246 from the mid-90s, one of the last and very best analog scopes ever made. Every home lab should have one.&lt;br /&gt;
&lt;br /&gt;
...and finally inspect the [[WikiPedia:Spectral_density#Electrical_engineering|frequency spectrum]] of our signals using an&lt;br /&gt;
[[WikiPedia:Spectrum_analyzer#Swept-tuned|analog spectrum analyzer]], this&lt;br /&gt;
[http://www.home.agilent.com/en/pd-3585A%3Aepsg%3Apro-pn-3585A/spectrum-analyzer-high-perf-20hz-40mhz?pm=PL&amp;amp;nid=-536900197.536897319&amp;amp;cc=SE&amp;amp;lc=swe HP3585]&lt;br /&gt;
from the same product line as&lt;br /&gt;
the signal generator.  Like the other equipment here it has&lt;br /&gt;
[http://www.hp9845.net/9845/hardware/processors/ a rudimentary and hilariously large microcontroller],&lt;br /&gt;
but the signal path&lt;br /&gt;
from input to what you see on the screen is completely analog.&lt;br /&gt;
&lt;br /&gt;
All of this equipment is vintage, but aside from its raw tonnage, the specs are still quite good.&lt;br /&gt;
&lt;br /&gt;
At the moment, we have our signal generator set to output a nice 1 [[WikiPedia:Hertz#SI_multiples|kHz]]&lt;br /&gt;
sine wave at one [[WikiPedia:Volt|Volt]] [[WikiPedia:Amplitude#Root_mean_square_amplitude|RMS]].&lt;br /&gt;
We see the sine wave on the oscilloscope, can verify that it is indeed&lt;br /&gt;
1 kHz at 1 Volt RMS, which is 2.8 Volts&lt;br /&gt;
[[WikiPedia:Amplitude#Peak-to-peak_amplitude|peak-to-peak]],&lt;br /&gt;
and that matches the&lt;br /&gt;
measurement on the spectrum analyzer as well.&lt;br /&gt;
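&lt;br /&gt;
(That 2.8 Volts is just arithmetic: for a sine wave the peak amplitude is the RMS value times the square root of two, about 1.4 Volts here, and peak-to-peak is twice the peak, so 2 × 1.414 × 1 V ≈ 2.83 V.)&lt;br /&gt;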
&lt;br /&gt;
The analyzer also shows some low-level [[WikiPedia:White_noise|white noise]]&lt;br /&gt;
and just a bit of [[WikiPedia:Harmonic_distortion#Harmonic_distortion|harmonic distortion]],&lt;br /&gt;
with the highest peak 70[[WikiPedia:Decibel|dB]] or so below&lt;br /&gt;
[[WikiPedia:Fundamental_frequency|the fundamental]].&lt;br /&gt;
Now, this doesn&#039;t matter at all in our demos, but I&lt;br /&gt;
wanted to point it out now just in case you didn&#039;t notice it until&lt;br /&gt;
later.&lt;br /&gt;
&lt;br /&gt;
Now, we drop digital sampling in the middle.&lt;br /&gt;
&lt;br /&gt;
For the conversion, we&#039;ll use a boring, consumer-grade, eMagic USB1&lt;br /&gt;
audio device.  It&#039;s also more than ten years old at this point, and it&#039;s&lt;br /&gt;
getting obsolete.&lt;br /&gt;
&lt;br /&gt;
A recent converter can easily have an order of magnitude better specs.&lt;br /&gt;
[[WikiPedia:Reconstruction_filter#Sampled_data_reconstruction_filters|Flatness]],&lt;br /&gt;
[[WikiPedia:Analog-to-digital_converter#Non-linearity|linearity]],&lt;br /&gt;
[[WikiPedia:Jitter#Sampling_jitter|jitter]],&lt;br /&gt;
[[WikiPedia:Noise_floor|noise behavior]],&lt;br /&gt;
[[WikiPedia:Digital-to-analog_converter#DAC_performance|everything]]...&lt;br /&gt;
you may not&lt;br /&gt;
have noticed.  Just because we can measure an improvement doesn&#039;t&lt;br /&gt;
mean we can hear it, and even these old consumer boxes were already at&lt;br /&gt;
the edge of ideal transparency.&lt;br /&gt;
&lt;br /&gt;
The eMagic connects to my ThinkPad, which displays a digital&lt;br /&gt;
waveform and spectrum for comparison, then the ThinkPad&lt;br /&gt;
sends the digital signal right back out to the eMagic for&lt;br /&gt;
re-conversion to analog and observation on the output scopes.&lt;br /&gt;
&lt;br /&gt;
Input to output, left to right.&lt;br /&gt;
&amp;lt;br style=&amp;quot;clear:both;&amp;quot;/&amp;gt;&lt;br /&gt;
&lt;br /&gt;
==Stairsteps==&lt;br /&gt;
[[Image:Dsat 006.jpg|360px|right]]&lt;br /&gt;
[[Image:Dsat 007.png|360px|right]]&lt;br /&gt;
OK, it&#039;s go time. We begin by converting an analog signal to digital and&lt;br /&gt;
then right back to analog again with no other steps.&lt;br /&gt;
&lt;br /&gt;
The signal generator is set to produce a 1kHz sine wave just like&lt;br /&gt;
before.&lt;br /&gt;
&lt;br /&gt;
We can see our analog sine wave on our input-side oscilloscope.&lt;br /&gt;
&lt;br /&gt;
We digitize our signal to&lt;br /&gt;
[[Videos/A_Digital_Media_Primer_For_Geeks#Raw_.28digital_audio.29_meat|16 bit PCM at 44.1kHz]],&lt;br /&gt;
same as on a CD.&lt;br /&gt;
The spectrum of the digitized signal matches what we saw earlier&lt;br /&gt;
&lt;br /&gt;
and what we see now on the analog spectrum analyzer, aside from its &lt;br /&gt;
[[WikiPedia:High_impedance|high-impedance input]] being just a smidge noisier.&lt;br /&gt;
&lt;br /&gt;
For now, the waveform display shows our digitized sine wave as a&lt;br /&gt;
stairstep pattern, one step for each sample.&lt;br /&gt;
&lt;br /&gt;
And when we look at the output signal that&#039;s been converted&lt;br /&gt;
from digital back to analog, we see...&lt;br /&gt;
&lt;br /&gt;
It&#039;s exactly like the original sine wave.  No stairsteps.&lt;br /&gt;
&lt;br /&gt;
OK, 1 kHz is still a fairly low frequency, maybe the stairsteps are just&lt;br /&gt;
hard to see or they&#039;re being smoothed away.  Fair enough. Let&#039;s choose&lt;br /&gt;
a higher frequency, something close to [[WikiPedia:Nyquist_frequency|Nyquist]], say 15kHz.&lt;br /&gt;
&lt;br /&gt;
Now the sine wave is represented by less than three samples per cycle, and...&lt;br /&gt;
&lt;br /&gt;
the digital waveform looks pretty awful.  Well, looks&lt;br /&gt;
can be deceiving. The analog output...&lt;br /&gt;
&lt;br /&gt;
is still a perfect sine wave, exactly like the original.&lt;br /&gt;
&lt;br /&gt;
Let&#039;s keep going up.&lt;br /&gt;
&lt;br /&gt;
Let&#039;s see if I can do this without blocking any cameras.&lt;br /&gt;
&lt;br /&gt;
16kHz.... 17kHz... 18kHz... 19kHz... &lt;br /&gt;
&lt;br /&gt;
20kHz.  Welcome to the upper limits of human hearing. The output&lt;br /&gt;
waveform is still perfect. No jagged edges, no dropoff, no stairsteps.&lt;br /&gt;
&lt;br /&gt;
So where&#039;d the stairsteps go? Don&#039;t answer, it&#039;s a trick question.&lt;br /&gt;
They were never there.&lt;br /&gt;
&lt;br /&gt;
Drawing a digital waveform as a stairstep... was wrong to begin with.&lt;br /&gt;
&lt;br /&gt;
Why? A stairstep is a continuous-time function.  It&#039;s jagged, and it&#039;s&lt;br /&gt;
piecewise, but it has a defined value at every point in time.&lt;br /&gt;
&lt;br /&gt;
A sampled signal is entirely different. It&#039;s discrete-time; it&#039;s only&lt;br /&gt;
got a value right at each instantaneous sample point and it&#039;s&lt;br /&gt;
undefined, there is no value at all, everywhere between.  A&lt;br /&gt;
discrete-time signal is properly drawn as a lollipop graph.&lt;br /&gt;
&lt;br /&gt;
The continuous, analog counterpart of a digital signal passes&lt;br /&gt;
smoothly through each sample point, and that&#039;s just as true for high&lt;br /&gt;
frequencies as it is for low.&lt;br /&gt;
&lt;br /&gt;
Now, the interesting and not at all obvious bit is: [[WikiPedia:Nyquist%E2%80%93Shannon_sampling_theorem|there&#039;s only one&lt;br /&gt;
bandlimited signal that passes exactly through each sample point]]. It&#039;s&lt;br /&gt;
a unique solution. So if you sample a bandlimited signal and then&lt;br /&gt;
convert it back, the original input is also the only possible output.&lt;br /&gt;
&lt;br /&gt;
[[Image:Dsat 008.png|360px|right]]&lt;br /&gt;
&lt;br /&gt;
And before you say, &amp;quot;oh, I can draw a different signal that passes&lt;br /&gt;
through those points&amp;quot;, well, yes you can, but if it differs even&lt;br /&gt;
minutely from the original, it includes frequency content at or beyond&lt;br /&gt;
Nyquist, breaks the bandlimiting requirement and isn&#039;t a valid&lt;br /&gt;
solution.&lt;br /&gt;
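&lt;br /&gt;
If you want to check that uniqueness claim numerically, here&#039;s a minimal sketch (in Python with NumPy; this is &#039;&#039;not&#039;&#039; the video&#039;s demo code) that rebuilds a sampled sine between the sample points using Whittaker-Shannon (sinc) interpolation:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
# Sketch: the samples of a bandlimited sine pin down the whole waveform.&lt;br /&gt;
import numpy as np&lt;br /&gt;
&lt;br /&gt;
fs = 44100.0&lt;br /&gt;
f = 15000.0                          # well below Nyquist (22050 Hz)&lt;br /&gt;
n = np.arange(512)&lt;br /&gt;
x = np.sin(2 * np.pi * f * n / fs)   # the sampled signal&lt;br /&gt;
&lt;br /&gt;
# Whittaker-Shannon: x(t) = sum over k of x[k] * sinc(fs*t - k)&lt;br /&gt;
t = np.linspace(200 / fs, 300 / fs, 1000)   # mid-window, between samples&lt;br /&gt;
recon = np.array([np.sum(x * np.sinc(fs * ti - n)) for ti in t])&lt;br /&gt;
&lt;br /&gt;
print(np.max(np.abs(recon - np.sin(2 * np.pi * f * t))))&lt;br /&gt;
# small (about 1e-3 here), and it shrinks as the sample window grows&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;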
&lt;br /&gt;
So how did everyone get confused and start thinking of digital signals&lt;br /&gt;
as stairsteps? I can think of two good reasons.&lt;br /&gt;
&lt;br /&gt;
First: it&#039;s easy enough to convert a sampled signal to a true stairstep. Just&lt;br /&gt;
extend each sample value forward until the next sample period.  This is&lt;br /&gt;
called a [[WikiPedia:Zero-order hold|zero-order hold]], and it&#039;s an important part of how some&lt;br /&gt;
digital-to-analog converters work, especially the simplest ones.&lt;br /&gt;
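&lt;br /&gt;
In code, a zero-order hold is nothing more than repeating each sample value; a quick hypothetical sketch:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
# Sketch: zero-order hold = hold each sample for one full sample period.&lt;br /&gt;
import numpy as np&lt;br /&gt;
&lt;br /&gt;
x = np.array([0.0, 0.7, 1.0, 0.7, 0.0, -0.7, -1.0, -0.7])&lt;br /&gt;
stairstep = np.repeat(x, 8)    # 8 output points per input sample&lt;br /&gt;
&lt;br /&gt;
# A DAC built around a ZOH follows this with an analog reconstruction&lt;br /&gt;
# filter that removes the images above Nyquist; the stairstep is an&lt;br /&gt;
# intermediate stage, not the signal that comes out.&lt;br /&gt;
print(stairstep[:16])&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;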
&lt;br /&gt;
So, anyone who looks up [[WikiPedia:Digital-to-analog_converter#Practical_operation|digital-to-analog converter or&lt;br /&gt;
digital-to-analog conversion]] is probably going to see a diagram of a&lt;br /&gt;
stairstep waveform somewhere, but that&#039;s not a finished conversion,&lt;br /&gt;
and it&#039;s not the signal that comes out.&lt;br /&gt;
&lt;br /&gt;
Second, and this is probably the more likely reason, engineers who&lt;br /&gt;
supposedly know better, like me, draw stairsteps even though they&#039;re&lt;br /&gt;
technically wrong. It&#039;s sort of like a one-dimensional version of&lt;br /&gt;
[[WikiPedia:MacPaint#Development|fat bits in an image editor]].&lt;br /&gt;
&lt;br /&gt;
Pixels aren&#039;t squares either, they&#039;re samples of a 2-dimensional&lt;br /&gt;
function space and so they&#039;re also, conceptually, infinitely small&lt;br /&gt;
points. Practically, it&#039;s a real pain in the ass to see or manipulate&lt;br /&gt;
infinitely small anything, so big squares it is.  Digital stairstep&lt;br /&gt;
drawings are exactly the same thing.&lt;br /&gt;
&lt;br /&gt;
It&#039;s just a convenient drawing. The stairsteps aren&#039;t really there.&lt;br /&gt;
&lt;br /&gt;
==Bit-depth==&lt;br /&gt;
[[Image:Dsat_009.jpg|360px|right]]&lt;br /&gt;
[[Image:Dsat_010.jpg|260px|right]]&lt;br /&gt;
&lt;br /&gt;
When we convert a digital signal back to analog, the result is&lt;br /&gt;
&#039;&#039;also&#039;&#039; smooth regardless of the [[WikiPedia:Audio_bit_depth|bit depth]].  24 bits or 16 bits...&lt;br /&gt;
or 8 bits...  it doesn&#039;t matter.&lt;br /&gt;
&lt;br /&gt;
So does that mean that the digital bit depth makes no difference at&lt;br /&gt;
all? Of course not.&lt;br /&gt;
&lt;br /&gt;
Channel 2 here is the same sine wave input, but we quantize with&lt;br /&gt;
[[WikiPedia:Dither|dither]] down to 8 bits.&lt;br /&gt;
&lt;br /&gt;
On the scope, we still see a nice&lt;br /&gt;
smooth sine wave on channel 2. Look very close, and you&#039;ll also see a&lt;br /&gt;
bit more noise.  That&#039;s a clue.&lt;br /&gt;
&lt;br /&gt;
If we look at the spectrum of the signal... aha!  Our sine wave is&lt;br /&gt;
still there unaffected, but the noise level of the 8-bit signal on&lt;br /&gt;
the second channel is much higher!&lt;br /&gt;
&lt;br /&gt;
And that&#039;s the difference the number of bits makes.  That&#039;s it!&lt;br /&gt;
&lt;br /&gt;
When we digitize a signal, first we sample it. The&lt;br /&gt;
sampling step is perfect; it loses nothing. But then we [[WikiPedia:Quantization_(sound_processing)|quantize]] it,&lt;br /&gt;
and [[WikiPedia:Quantization_error|quantization adds noise]].&lt;br /&gt;
&lt;br /&gt;
The number of bits determines how much noise and so the level of the&lt;br /&gt;
noise floor.&lt;br /&gt;
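&lt;br /&gt;
The rule of thumb is about 6 dB of signal-to-noise ratio per bit. A sketch you can run to confirm it (NumPy, hypothetical, using plain undithered rounding):&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
# Sketch: measure quantization noise at a few bit depths.&lt;br /&gt;
import numpy as np&lt;br /&gt;
&lt;br /&gt;
fs = 44100&lt;br /&gt;
t = np.arange(fs) / fs&lt;br /&gt;
x = np.sin(2 * np.pi * 1000 * t)        # full-scale 1 kHz sine&lt;br /&gt;
&lt;br /&gt;
for bits in (8, 16, 24):&lt;br /&gt;
    scale = 2.0 ** (bits - 1)&lt;br /&gt;
    q = np.round(x * scale) / scale     # quantize to the given depth&lt;br /&gt;
    noise = q - x&lt;br /&gt;
    snr = 10 * np.log10(np.mean(x ** 2) / np.mean(noise ** 2))&lt;br /&gt;
    print(bits, snr)                    # close to 6.02*bits + 1.76 dB&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;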
&lt;br /&gt;
What does this dithered quantization noise sound like?  Let&#039;s listen&lt;br /&gt;
to our 8-bit sine wave.&lt;br /&gt;
&lt;br /&gt;
It may have been hard to hear anything but the tone.  Let&#039;s listen&lt;br /&gt;
to just the noise after we notch out the sine wave and then bring the&lt;br /&gt;
gain up a bit because the noise is quiet.&lt;br /&gt;
&lt;br /&gt;
Those of you who have used analog recording equipment may have just&lt;br /&gt;
thought to yourselves, &amp;quot;My goodness! That sounds like tape hiss!&amp;quot;&lt;br /&gt;
Well, it doesn&#039;t just sound like tape hiss, it acts like it too, and&lt;br /&gt;
if we use a [[WikiPedia:Dither#Different_types|gaussian dither]] then it&#039;s&lt;br /&gt;
[[WikiPedia:Central_limit_theorem|mathematically equivalent]] in every way. It &#039;&#039;is&#039;&#039; tape hiss.&lt;br /&gt;
&lt;br /&gt;
Intuitively, that means that we can measure tape hiss and thus the noise floor&lt;br /&gt;
of [[WikiPedia:Magnetic_tape_sound_recording|magnetic audio tape]]&lt;br /&gt;
in [[WikiPedia:Shannon–Hartley_theorem#Examples|bits instead of decibels]], in order to put things in a&lt;br /&gt;
digital perspective.  [[WikiPedia:Compact cassettes|Compact cassettes]] (for those of you who are old enough to remember them) could reach as&lt;br /&gt;
deep as 9 bits in perfect conditions, though 5 to 6 bits was&lt;br /&gt;
more typical, especially if it was a recording made on a&lt;br /&gt;
[[WikiPedia:Cassette_deck|tape deck]]. That&#039;s right... your mix tapes were only about 6 bits&lt;br /&gt;
deep... if you were lucky!&lt;br /&gt;
&lt;br /&gt;
The very best professional [[WikiPedia:Reel-to-reel_audio_tape_recording|open reel tape]] used in studios could barely&lt;br /&gt;
hit...  any guesses? 13 bits &#039;&#039;with&#039;&#039; [[WikiPedia:Reel-to-reel_audio_tape_recording#Noise_reduction|advanced noise reduction]].  And&lt;br /&gt;
that&#039;s why seeing &#039;[[WikiPedia:SPARS_code|D D D]]&#039; on a [[WikiPedia:Compact_disk|Compact Disc]] used to be such a big,&lt;br /&gt;
high-end deal.&lt;br /&gt;
&lt;br /&gt;
==Dither==&lt;br /&gt;
[[Image:Dsat_011.png|360px|right]]&lt;br /&gt;
&lt;br /&gt;
I keep saying that I&#039;m quantizing with [[Wikipedia:dither|dither]], so what is dither&lt;br /&gt;
exactly and, more importantly, what does it do?&lt;br /&gt;
&lt;br /&gt;
The simple way to quantize a signal is to choose the digital&lt;br /&gt;
amplitude value closest to the original analog amplitude.  [[WikiPedia:Rounding|Obvious]],&lt;br /&gt;
right?  Unfortunately, the exact noise you get from this simple&lt;br /&gt;
quantization scheme depends somewhat on the input signal,&lt;br /&gt;
&lt;br /&gt;
so we may get noise that&#039;s inconsistent, or causes distortion, or is&lt;br /&gt;
undesirable in some other way.&lt;br /&gt;
&amp;lt;br style=&amp;quot;clear:both;&amp;quot;/&amp;gt;&lt;br /&gt;
&amp;lt;center&amp;gt;&amp;lt;div style=&amp;quot;background-color:#DDDDFF;border-color:#CCCCDD;border-style:solid;width:80%;padding:0 1em 1em 1em;text-align:left;&amp;quot;&amp;gt;&lt;br /&gt;
&#039;&#039;&#039;Going deeper…&#039;&#039;&#039;&lt;br /&gt;
*Cameron Nicklaus Christou&#039;s thesis [http://uwspace.uwaterloo.ca/bitstream/10012/3867/1/thesis.pdf Optimal Dither and Noise Shaping in Image Processing] provides an &#039;&#039;excellent&#039;&#039; explanation of dither and noise shaping.&lt;br /&gt;
&amp;lt;/div&amp;gt;&amp;lt;/center&amp;gt;&lt;br /&gt;
&lt;br /&gt;
Dither is specially-constructed noise that substitutes for the noise&lt;br /&gt;
produced by simple quantization. Dither doesn&#039;t [[WikiPedia:Sound_masking|drown out or mask]]&lt;br /&gt;
quantization noise, it actually replaces it with noise characteristics&lt;br /&gt;
of our choosing that aren&#039;t influenced by the input.&lt;br /&gt;
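&lt;br /&gt;
Here&#039;s the idea in miniature (a hypothetical NumPy sketch; it uses TPDF dither, a common choice alongside the gaussian dither mentioned earlier):&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
# Sketch: undithered vs. TPDF-dithered 8-bit quantization.&lt;br /&gt;
import numpy as np&lt;br /&gt;
&lt;br /&gt;
rng = np.random.default_rng(0)&lt;br /&gt;
t = np.arange(44100) / 44100.0&lt;br /&gt;
x = 0.01 * np.sin(2 * np.pi * 1000 * t)   # quiet 1 kHz sine&lt;br /&gt;
scale = 2.0 ** 7                          # 8-bit steps&lt;br /&gt;
&lt;br /&gt;
plain = np.round(x * scale) / scale       # error correlates with the input&lt;br /&gt;
&lt;br /&gt;
# TPDF dither: the difference of two uniform values, added before&lt;br /&gt;
# rounding. The resulting error no longer depends on the input.&lt;br /&gt;
d = rng.random(t.size) - rng.random(t.size)&lt;br /&gt;
dithered = np.round(x * scale + d) / scale&lt;br /&gt;
&lt;br /&gt;
# Spectra of (plain - x) show harmonic spikes; (dithered - x) is flat.&lt;br /&gt;
print(np.std(plain - x), np.std(dithered - x))&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;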
&lt;br /&gt;
Let&#039;s &#039;&#039;watch&#039;&#039; what dither does.  The signal generator has too much noise for this test so we&#039;ll produce a mathematically perfect sine wave with the ThinkPad and quantize it to 8 bits with dithering.&lt;br /&gt;
&lt;br /&gt;
We see a nice sine wave on the waveform display and output scope and, once the analog spectrum analyzer catches up...&lt;br /&gt;
a clean frequency peak with a uniform noise floor on both spectral displays&lt;br /&gt;
just like before. Again, this is with dither.&lt;br /&gt;
&lt;br /&gt;
Now I turn dithering off.&lt;br /&gt;
&lt;br /&gt;
The quantization noise that dither had spread out into a nice, flat noise&lt;br /&gt;
floor now piles up into harmonic distortion peaks.  The noise floor is&lt;br /&gt;
lower, but the level of distortion becomes nonzero, and the distortion&lt;br /&gt;
peaks sit higher than the dithering noise did.&lt;br /&gt;
&lt;br /&gt;
At 8 bits this effect is exaggerated. At 16 bits,&lt;br /&gt;
even without dither, harmonic distortion is going to be so low as to&lt;br /&gt;
be completely inaudible.&lt;br /&gt;
&lt;br /&gt;
Still, we can use dither to eliminate it completely if we so choose.&lt;br /&gt;
&lt;br /&gt;
Turning the dither off again for a moment, you&#039;ll notice that the&lt;br /&gt;
absolute level of distortion from undithered quantization stays&lt;br /&gt;
approximately constant regardless of the input amplitude.&lt;br /&gt;
But when the signal level drops below half a bit, everything&lt;br /&gt;
quantizes to zero.&lt;br /&gt;
&lt;br /&gt;
In a sense, everything quantizing to zero is just 100% distortion!&lt;br /&gt;
Dither eliminates this distortion too. We reenable dither&lt;br /&gt;
and ... there&#039;s our signal back at 1/4 bit, with our nice flat noise floor.&lt;br /&gt;
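&lt;br /&gt;
That rescue is easy to reproduce; a hypothetical NumPy sketch, working in units of one LSB:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
# Sketch: a sine at 1/4 LSB vanishes without dither, survives with it.&lt;br /&gt;
import numpy as np&lt;br /&gt;
&lt;br /&gt;
rng = np.random.default_rng(1)&lt;br /&gt;
n = np.arange(44100)&lt;br /&gt;
s = np.sin(2 * np.pi * 1000 * n / 44100.0)&lt;br /&gt;
x = 0.25 * s                              # amplitude: 1/4 of one LSB&lt;br /&gt;
&lt;br /&gt;
undithered = np.round(x)                  # every sample rounds to zero&lt;br /&gt;
d = rng.random(n.size) - rng.random(n.size)&lt;br /&gt;
dithered = np.round(x + d)                # sine plus a flat noise floor&lt;br /&gt;
&lt;br /&gt;
print(np.max(np.abs(undithered)))         # 0.0: the signal is gone&lt;br /&gt;
print(2 * np.mean(dithered * s))          # about 0.25: still in there&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;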
&lt;br /&gt;
The noise floor doesn&#039;t have to be flat.  Dither is noise of our&lt;br /&gt;
choosing, so let&#039;s choose a noise as [http://www.acoustics.salford.ac.uk/res/cox/sound_quality/?content=subjective inoffensive] and&lt;br /&gt;
[[WikiPedia:Absolute_threshold_of_hearing|difficult to notice]]&lt;br /&gt;
as possible.&lt;br /&gt;
&lt;br /&gt;
Our hearing is most sensitive in the midrange from 2kHz to 4kHz,&lt;br /&gt;
so that&#039;s where background noise is going to be the most obvious.&lt;br /&gt;
We can [[WikiPedia:Noise_shaping|shape dithering noise]] away from sensitive frequencies to where&lt;br /&gt;
hearing is less sensitive, usually the highest frequencies.&lt;br /&gt;
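&lt;br /&gt;
The simplest version of that idea is a first-order error-feedback loop; a sketch (NumPy, hypothetical, much cruder than the shaped dither used in the demo):&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
# Sketch: first-order noise shaping. Each quantization error is fed&lt;br /&gt;
# back into the next decision, pushing the noise toward high frequencies.&lt;br /&gt;
import numpy as np&lt;br /&gt;
&lt;br /&gt;
rng = np.random.default_rng(2)&lt;br /&gt;
x = 0.5 * np.sin(2 * np.pi * 1000 * np.arange(44100) / 44100.0)&lt;br /&gt;
scale = 2.0 ** 7&lt;br /&gt;
out = np.empty_like(x)&lt;br /&gt;
err = 0.0&lt;br /&gt;
for i in range(x.size):&lt;br /&gt;
    v = x[i] * scale - err                # subtract the previous error&lt;br /&gt;
    d = rng.random() - rng.random()       # TPDF dither, in LSBs&lt;br /&gt;
    q = np.round(v + d)&lt;br /&gt;
    err = q - v                           # error of this decision&lt;br /&gt;
    out[i] = q / scale&lt;br /&gt;
# Total noise power is a little higher than unshaped dither, but its&lt;br /&gt;
# spectrum rises with frequency, away from the sensitive 2-4 kHz band.&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;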
&lt;br /&gt;
16-bit dithering noise is normally much too quiet to hear at all, but&lt;br /&gt;
let&#039;s listen to our noise shaping example, again with the gain&lt;br /&gt;
brought way up...&lt;br /&gt;
&lt;br /&gt;
Lastly, dithered quantization noise &#039;&#039;is&#039;&#039; higher [[WikiPedia:Sound_power|power]] overall&lt;br /&gt;
than undithered quantization noise even when it sounds quieter, and&lt;br /&gt;
you can see that on a [[WikiPedia:VU_meter|VU meter]] during passages of near-silence.  But&lt;br /&gt;
dither isn&#039;t only an on or off choice. We can reduce the dither&#039;s&lt;br /&gt;
power to balance less noise against a bit of distortion to minimize&lt;br /&gt;
the overall effect.&lt;br /&gt;
&lt;br /&gt;
We&#039;ll also [[WikiPedia:Amplitude_modulation|modulate the input signal]] like this to show how a varying input affects the quantization noise.  At&lt;br /&gt;
full dithering power, the noise is uniform, constant, and featureless&lt;br /&gt;
just like we expect:&lt;br /&gt;
&lt;br /&gt;
As we reduce the dither&#039;s power, the input increasingly&lt;br /&gt;
affects the amplitude and the character of the quantization noise.&lt;br /&gt;
Shaped dither behaves similarly, but noise shaping lends one more nice&lt;br /&gt;
advantage.  To make a long story short, it can use a somewhat lower&lt;br /&gt;
dither power before the input has as much effect on the output.&lt;br /&gt;
&lt;br /&gt;
Despite all the time I just spent on dither, we&#039;re talking about&lt;br /&gt;
differences that start 100 decibels and more below [[WikiPedia:Full_scale|full scale]].  Maybe&lt;br /&gt;
if the CD had been&lt;br /&gt;
[http://www.research.philips.com/technologies/projects/cd/index.html 14 bits as originally designed],&lt;br /&gt;
dither &#039;&#039;might&#039;&#039; be&lt;br /&gt;
more important.  Maybe.  At 16 bits, really, it&#039;s mostly a wash.  You&lt;br /&gt;
can think of dither as an insurance policy that gives several extra&lt;br /&gt;
decibels of dynamic range just in case. The simple fact is, though, no&lt;br /&gt;
one ever ruined a great recording by not dithering the final master.&lt;br /&gt;
&lt;br /&gt;
==Bandlimitation and timing==&lt;br /&gt;
[[image:Dsat_013.jpg|360px|right]]&lt;br /&gt;
[[image:Dsat_014.gif|360px|right]]&lt;br /&gt;
&lt;br /&gt;
We&#039;ve been using [[WikiPedia:Sine_wave|sine waves]]. They&#039;re the obvious choice when what we&lt;br /&gt;
want to see is a system&#039;s behavior at a given isolated frequency.  Now&lt;br /&gt;
let&#039;s look at something a bit more complex.  What should we expect to&lt;br /&gt;
happen when I change the input to a [[WikiPedia:Square_wave|square wave]]...&lt;br /&gt;
&lt;br /&gt;
The input scope confirms our 1kHz square wave.  The output scope shows...&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
Exactly what it should.&lt;br /&gt;
...&lt;br /&gt;
What is a square wave really?  &lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
Well, we can say it&#039;s a waveform that&#039;s&lt;br /&gt;
some positive value for half a cycle and then transitions&lt;br /&gt;
instantaneously to a negative value for the other half. But that doesn&#039;t&lt;br /&gt;
really tell us anything useful about how this input&lt;br /&gt;
becomes this output.&lt;br /&gt;
&lt;br /&gt;
Then we remember that any waveform is also [[WikiPedia:Fourier_series|the sum of discrete frequencies]],&lt;br /&gt;
and a square wave is a particularly simple sum: a fundamental and an&lt;br /&gt;
infinite series of [[WikiPedia:Even_and_odd_functions#Harmonics|odd harmonics]].  Sum them all up, you get a&lt;br /&gt;
square wave.&lt;br /&gt;
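&lt;br /&gt;
That sum is short enough to write out; a hypothetical NumPy sketch of the 1 kHz bandlimited square used here:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
# Sketch: a bandlimited square wave is a finite sum of odd harmonics,&lt;br /&gt;
# (4/pi) * sum over odd k of sin(2*pi*k*f0*t)/k, for k*f0 below cutoff.&lt;br /&gt;
import numpy as np&lt;br /&gt;
&lt;br /&gt;
fs = 44100.0&lt;br /&gt;
f0 = 1000.0                        # fundamental, as in the demo&lt;br /&gt;
t = np.arange(2048) / fs&lt;br /&gt;
&lt;br /&gt;
x = np.zeros_like(t)&lt;br /&gt;
for k in range(1, 20, 2):          # odd harmonics 1 kHz ... 19 kHz,&lt;br /&gt;
    x += np.sin(2 * np.pi * k * f0 * t) / k   # all below 20 kHz&lt;br /&gt;
x *= 4 / np.pi&lt;br /&gt;
# Plot x and you get the flat top plus the ripples near each edge&lt;br /&gt;
# described next: that is what the output scope is showing.&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;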
&lt;br /&gt;
At first glance, that doesn&#039;t seem very useful either. You have to sum&lt;br /&gt;
up an infinite number of harmonics to get the answer.  Ah, but we don&#039;t&lt;br /&gt;
have an infinite number of harmonics.&lt;br /&gt;
&lt;br /&gt;
We&#039;re using a quite sharp [[WikiPedia:Low-pass_filter|anti-aliasing filter]] that cuts off right&lt;br /&gt;
above 20kHz, so our signal is [[WikiPedia:Bandlimiting|bandlimited]], which means we get this:&lt;br /&gt;
&lt;br /&gt;
...and that&#039;s exactly what we see on the output scope.&lt;br /&gt;
&lt;br /&gt;
The rippling you see around sharp edges in a bandlimited signal is&lt;br /&gt;
called the [[WikiPedia:Gibbs_phenomenon|Gibbs effect]]. It happens whenever you slice off part of the&lt;br /&gt;
frequency domain in the middle of nonzero energy.&lt;br /&gt;
&lt;br /&gt;
The usual rule of thumb you&#039;ll hear is &amp;quot;the sharper the cutoff, the&lt;br /&gt;
stronger the rippling&amp;quot;, which is approximately true, but we have to be&lt;br /&gt;
careful how we think about it.&lt;br /&gt;
For example... what would you expect our quite sharp anti-aliasing filter&lt;br /&gt;
to do if I run our signal through it a second time?&lt;br /&gt;
&lt;br /&gt;
Aside from adding a few fractional cycles of delay, the answer is...&lt;br /&gt;
nothing at all.  The signal is already bandlimited. Bandlimiting it&lt;br /&gt;
again doesn&#039;t do anything.  A second pass can&#039;t remove frequencies&lt;br /&gt;
that we already removed.&lt;br /&gt;
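&lt;br /&gt;
You can verify that numerically. An FFT brick-wall filter makes an easy stand-in for the anti-aliasing filter in a quick sketch (hypothetical; unlike the real filter it adds no delay):&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
# Sketch: bandlimiting an already-bandlimited signal is a no-op.&lt;br /&gt;
import numpy as np&lt;br /&gt;
&lt;br /&gt;
rng = np.random.default_rng(3)&lt;br /&gt;
fs = 44100&lt;br /&gt;
x = rng.standard_normal(4096)            # wideband test signal&lt;br /&gt;
&lt;br /&gt;
def brickwall(sig):&lt;br /&gt;
    S = np.fft.rfft(sig)&lt;br /&gt;
    S[np.fft.rfftfreq(sig.size, 1.0 / fs) &amp;gt; 20000] = 0&lt;br /&gt;
    return np.fft.irfft(S, sig.size)&lt;br /&gt;
&lt;br /&gt;
once = brickwall(x)&lt;br /&gt;
twice = brickwall(once)&lt;br /&gt;
print(np.max(np.abs(twice - once)))      # ~1e-16: nothing left to remove&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;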
&lt;br /&gt;
And that&#039;s important.  People tend to think of the ripples as&lt;br /&gt;
a kind of [[WikiPedia:Sonic_artifact|artifact]] that&#039;s added by anti-aliasing and [[WikiPedia:Reconstruction_filter|anti-imaging]]&lt;br /&gt;
filters, implying that the ripples get worse each time the signal&lt;br /&gt;
passes through.  We can see that in this case that didn&#039;t happen. So&lt;br /&gt;
was it really the filter that added the ripples the first time&lt;br /&gt;
through?  No, not really. It&#039;s a subtle distinction, but Gibbs effect&lt;br /&gt;
ripples aren&#039;t added by filters, they&#039;re just part of what a&lt;br /&gt;
bandlimited signal &#039;&#039;is&#039;&#039;.&lt;br /&gt;
&lt;br /&gt;
Even if we synthetically construct what looks like a perfect digital&lt;br /&gt;
square wave,&lt;br /&gt;
&lt;br /&gt;
it&#039;s still limited to the channel bandwidth.  Remember,&lt;br /&gt;
the stairstep representation is misleading.&lt;br /&gt;
&lt;br /&gt;
What we really have here are instantaneous sample points,&lt;br /&gt;
&lt;br /&gt;
and only one bandlimited signal fits those points.  All we did when we&lt;br /&gt;
drew our apparently perfect square wave was line up the sample points&lt;br /&gt;
just right so it appeared that there were no ripples if we played&lt;br /&gt;
[[WikiPedia:Interpolation|connect-the-dots]].&lt;br /&gt;
&lt;br /&gt;
But the original bandlimited signal, complete with ripples, was&lt;br /&gt;
still there.&lt;br /&gt;
&lt;br /&gt;
And that leads us to one more important point.  You&#039;ve probably heard&lt;br /&gt;
that the timing precision of a digital signal is limited by its sample&lt;br /&gt;
rate; put another way,&lt;br /&gt;
&lt;br /&gt;
that digital signals can&#039;t represent anything that falls between the&lt;br /&gt;
samples... implying that [[WikiPedia:Dirac_delta_function|impulses]] or&lt;br /&gt;
[[WikiPedia:Synthesizer#ADSR_envelope|fast attacks]] have to align exactly&lt;br /&gt;
with a sample, or the timing gets mangled... or they just disappear.&lt;br /&gt;
&lt;br /&gt;
At this point, we can easily see why that&#039;s wrong.&lt;br /&gt;
&lt;br /&gt;
Again, our input signals are bandlimited. And digital signals are&lt;br /&gt;
samples, not stairsteps, not &#039;connect-the-dots&#039;.  We most certainly&lt;br /&gt;
can, for example, put the rising edge of our bandlimited square wave&lt;br /&gt;
anywhere we want between samples.&lt;br /&gt;
&lt;br /&gt;
It&#039;s represented perfectly and it&#039;s reconstructed perfectly.&lt;br /&gt;
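&lt;br /&gt;
A sketch of that last claim (NumPy, hypothetical): place the rising edge of a bandlimited square 0.3 samples past a sample instant, keep only the samples, and recover the timing by sinc interpolation:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
# Sketch: a bandlimited edge between samples keeps its exact timing.&lt;br /&gt;
import numpy as np&lt;br /&gt;
&lt;br /&gt;
fs = 44100.0&lt;br /&gt;
f0 = 1000.0&lt;br /&gt;
edge = 1024.3 / fs                   # rising edge 0.3 samples past 1024&lt;br /&gt;
n = np.arange(2048)&lt;br /&gt;
&lt;br /&gt;
def blsquare(t):                     # bandlimited square, 20 kHz cutoff&lt;br /&gt;
    x = np.zeros_like(t)&lt;br /&gt;
    for k in range(1, 20, 2):&lt;br /&gt;
        x += np.sin(2 * np.pi * k * f0 * (t - edge)) / k&lt;br /&gt;
    return x * 4 / np.pi&lt;br /&gt;
&lt;br /&gt;
samples = blsquare(n / fs)           # all we keep are the samples&lt;br /&gt;
fine = np.linspace(1023 / fs, 1026 / fs, 3001)&lt;br /&gt;
rec = np.array([np.sum(samples * np.sinc(fs * ti - n)) for ti in fine])&lt;br /&gt;
print(fine[np.argmin(np.abs(rec))] * fs)   # ~1024.3: timing preserved&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;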
&lt;br /&gt;
==Epilogue==&lt;br /&gt;
&lt;br /&gt;
[[Image:Moffey.jpg|360px|right]]&lt;br /&gt;
&lt;br /&gt;
Just like in [[Videos/A_Digital_Media_Primer_For_Geeks|the previous episode]], we&#039;ve covered a broad range of&lt;br /&gt;
topics, and yet barely scratched the surface of each one.  If anything, my&lt;br /&gt;
sins of omission are greater this time around... but this is a good&lt;br /&gt;
stopping point.&lt;br /&gt;
&lt;br /&gt;
Or maybe, a good starting point.  Dig deeper.  Experiment.  I chose my&lt;br /&gt;
demos very carefully to be simple and give clear results. You can&lt;br /&gt;
reproduce every one of them on your own if you like.  But let&#039;s face&lt;br /&gt;
it, sometimes we learn the most about a spiffy toy by breaking it open&lt;br /&gt;
and studying all the pieces that fall out.  And that&#039;s OK, we&#039;re&lt;br /&gt;
engineers.  Play with the demo parameters, hack up the code, set up&lt;br /&gt;
alternate experiments.  The source code for everything, including the&lt;br /&gt;
little pushbutton demo application, is up at xiph.org.&lt;br /&gt;
&lt;br /&gt;
In the course of experimentation, you&#039;re likely to run into something&lt;br /&gt;
that you didn&#039;t expect and can&#039;t explain.  Don&#039;t worry!  My earlier&lt;br /&gt;
snark aside, Wikipedia is fantastic for exactly this kind of casual&lt;br /&gt;
research. And, if you&#039;re really serious about understanding signals,&lt;br /&gt;
several universities have advanced materials online, such as the&lt;br /&gt;
[http://ocw.mit.edu/courses/electrical-engineering-and-computer-science/6-003-signals-and-systems-spring-2010/index.htm 6.003]&lt;br /&gt;
and&lt;br /&gt;
[http://ocw.mit.edu/courses/electrical-engineering-and-computer-science/6-007-electromagnetic-energy-from-motors-to-lasers-spring-2011 6.007]&lt;br /&gt;
Signals and Systems modules at MIT OpenCourseWare. And of&lt;br /&gt;
course, there&#039;s always the [http://webchat.freenode.net/?channels=xiph community here at Xiph.Org].&lt;br /&gt;
&lt;br /&gt;
Digging deeper or not, I am out of coffee, so, until next time, happy&lt;br /&gt;
hacking!&lt;br /&gt;
&lt;br /&gt;
==Credits==&lt;br /&gt;
&lt;br /&gt;
Written by: Christopher (Monty) Montgomery and the Xiph.Org Community&lt;br /&gt;
&lt;br /&gt;
Special thanks to:&lt;br /&gt;
*Heidi Baumgartner, for the second Tektronix oscilloscope&lt;br /&gt;
*Gregory Maxwell and Dr. Timothy Terriberry, for additional technical review&lt;br /&gt;
&lt;br /&gt;
Intro, title and credits music:&amp;lt;br&amp;gt;&lt;br /&gt;
&amp;quot;Andy Warhol Is Gone&amp;quot;, by Lousy Robot&amp;lt;br&amp;gt;&lt;br /&gt;
Used by permission of Lousy Robot.&amp;lt;br&amp;gt;&lt;br /&gt;
Original source track All Rights Reserved.&amp;lt;br&amp;gt;&lt;br /&gt;
[http://www.lousyrobot.com www.lousyrobot.com]&lt;br /&gt;
&lt;br /&gt;
This Video Was Produced Entirely With Free and Open Source Software:&amp;lt;br&amp;gt;&lt;br /&gt;
&lt;br /&gt;
*[http://www.gnu.org/ GNU]&amp;lt;br&amp;gt;&lt;br /&gt;
*[http://www.linux.org/ Linux]&amp;lt;br&amp;gt;&lt;br /&gt;
*[http://fedoraproject.org/ Fedora]&amp;lt;br&amp;gt;&lt;br /&gt;
*[http://cinelerra.org/ Cinelerra]&amp;lt;br&amp;gt;&lt;br /&gt;
*[http://www.gimp.org/ The Gimp]&amp;lt;br&amp;gt;&lt;br /&gt;
*[http://audacity.sourceforge.net/ Audacity]&amp;lt;br&amp;gt;&lt;br /&gt;
*[http://svn.xiph.org/trunk/postfish/README Postfish]&amp;lt;br&amp;gt;&lt;br /&gt;
*[http://gstreamer.freedesktop.org/ Gstreamer]&amp;lt;br&amp;gt;&lt;br /&gt;
&lt;br /&gt;
All trademarks are the property of their respective owners. &lt;br /&gt;
&lt;br /&gt;
*&#039;&#039;Complete video&#039;&#039; [http://creativecommons.org/licenses/by-sa/3.0/legalcode CC-BY-SA]&amp;lt;br&amp;gt;&lt;br /&gt;
*&#039;&#039;Text transcript and Wiki edition&#039;&#039; [http://creativecommons.org/licenses/by-sa/3.0/legalcode CC-BY-SA]&amp;lt;br&amp;gt;&lt;br /&gt;
&lt;br /&gt;
A Co-Production of Xiph.Org and Red Hat, Inc.&amp;lt;br&amp;gt;&lt;br /&gt;
(C) 2012-2013, Some Rights Reserved&amp;lt;br&amp;gt;&lt;br /&gt;
&lt;br /&gt;
== Use The Source Luke ==&lt;br /&gt;
&lt;br /&gt;
As stated in the Epilogue, everything that appears in the video demos is driven by open source software, which means the source is both available for inspection and freely usable by the community.  The ThinkPad that appears in the video was running Fedora 17 and GNOME Shell (GNOME 3).  The demonstration software does not require Fedora specifically, but it does require GNU/Linux to run in its current form.&lt;br /&gt;
&lt;br /&gt;
=== The Spectrum and Waveform Viewer ===&lt;br /&gt;
&lt;br /&gt;
The realtime software spectrum analyzer application that appears in the video was a preexisting application that was dusted off and updated for use in the video.  The waveform viewer (effectively a simple software oscilloscope) was written from scratch making use of some of the internals from the spectrum analyzer application.  Both are available from Xiph.Org svn:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;center&amp;gt;&amp;lt;div style=&amp;quot;background-color:#DDDDFF;border-color:#CCCCDD;border-style:solid;width:80%;padding:0 1em 1em 1em;text-align:left;&amp;quot;&amp;gt;&lt;br /&gt;
*Source for the Spectrum and Waveform applications is found at:&lt;br /&gt;
https://svn.xiph.org/trunk/spectrum/&lt;br /&gt;
*The source can be checked out of svn using the following command line:&lt;br /&gt;
svn co https://svn.xiph.org/trunk/spectrum&lt;br /&gt;
*Trac is a convenient way to browse the source without checking out a copy:&lt;br /&gt;
https://trac.xiph.org/browser/trunk/spectrum&lt;br /&gt;
&amp;lt;/div&amp;gt;&amp;lt;/center&amp;gt;&lt;br /&gt;
&lt;br /&gt;
Spectrum and Waveform both expect an input stream on the command line, either as raw data or as a WAV file.&lt;br /&gt;
&lt;br /&gt;
=== GTK-Bounce ===&lt;br /&gt;
&lt;br /&gt;
The touch-controlled application used in the video is named &#039;gtk-bounce&#039; and was custom-written solely for the in-video demonstrations.  It is so named because, for the most part, all it does is read the input from an audio device, and then immediately write the same data back out for playback.  It also forwards a copy of this data to up to two external monitoring applications, and in several demos, applies simple filters or generates simple waveforms. It includes several demo panels that do not appear in the video.&lt;br /&gt;
&lt;br /&gt;
&amp;lt;center&amp;gt;&amp;lt;div style=&amp;quot;background-color:#DDDDFF;border-color:#CCCCDD;border-style:solid;width:80%;padding:0 1em 1em 1em;text-align:left;&amp;quot;&amp;gt;&lt;br /&gt;
*Source for gtk-bounce is found at:&lt;br /&gt;
https://svn.xiph.org/trunk/Xiph-episode-II/bounce/&lt;br /&gt;
*The source can be checked out of svn using the following command line:&lt;br /&gt;
svn co https://svn.xiph.org/trunk/Xiph-episode-II/bounce/&lt;br /&gt;
*Trac is a convenient way to browse the source without checking out a copy:&lt;br /&gt;
https://trac.xiph.org/browser/Xiph-episode-II/bounce/&lt;br /&gt;
&amp;lt;/div&amp;gt;&amp;lt;/center&amp;gt;&lt;br /&gt;
&lt;br /&gt;
The application is somewhat hardwired for specific demo purposes, but most of the hardwired settings can be found at the top of each source file.  As found in SVN, the application expects an ALSA hardware audio device at hw:1, and if none is found, it will wait for one to appear.  Once a sound device is successfully initialized, it expects to find and open two pipes named pipe0 and pipe1 for output in the current directory.  In the video, the waveform and spectrum applications are started to take input from pipe0 and pipe1 respectively.  In most of the demos, the output sent to the two pipes is identical, and matches the output data sent to the hardware device for conversion to analog.  The only exception is the tenth demo panel (which does not appear in the video), where gtk-bounce can be set to send the hardware inputs, rather than the outputs, to the pipes.&lt;br /&gt;
&lt;br /&gt;
&amp;lt;center&amp;gt;&amp;lt;div style=&amp;quot;background-color:#DDDDFF;border-color:#CCCCDD;border-style:solid;width:80%;padding:0 1em 1em 1em;text-align:left;&amp;quot;&amp;gt;&lt;br /&gt;
&#039;&#039;&#039;Assuming gtk-bounce, spectrum and waveform have been checked out and built, the configuration seen in the video can be started using the following commands:&#039;&#039;&#039;&lt;br /&gt;
# make the pipe fifos for the applications to communicate (only needs to be done once)&lt;br /&gt;
mkfifo pipe0; mkfifo pipe1&lt;br /&gt;
# start all three applications&lt;br /&gt;
waveform pipe0 &amp;amp; spectrum pipe1 &amp;amp; gtk-bounce &amp;amp;&lt;br /&gt;
&amp;lt;/div&amp;gt;&amp;lt;/center&amp;gt;&lt;/div&gt;</summary>
		<author><name>Mindspillage</name></author>
	</entry>
	<entry>
		<id>https://wiki.xiph.org/index.php?title=Videos/Digital_Show_and_Tell&amp;diff=13875</id>
		<title>Videos/Digital Show and Tell</title>
		<link rel="alternate" type="text/html" href="https://wiki.xiph.org/index.php?title=Videos/Digital_Show_and_Tell&amp;diff=13875"/>
		<updated>2013-02-25T03:23:14Z</updated>

		<summary type="html">&lt;p&gt;Mindspillage: minor proofreading fixes&lt;/p&gt;
&lt;hr /&gt;
&lt;div&gt;&amp;lt;&amp;lt;&amp;lt; Intro &amp;gt;&amp;gt;&amp;gt;&amp;gt;&lt;br /&gt;
&lt;br /&gt;
Hi, I&#039;m Monty Montgomery from Red Hat and Xiph.Org.&lt;br /&gt;
&lt;br /&gt;
A few months ago, I wrote an article on digital audio and why&lt;br /&gt;
24bit/192kHz music downloads don&#039;t make sense. In the article, I&lt;br /&gt;
mentioned--almost in passing--that a digital waveform is not a&lt;br /&gt;
stairstep, and you certainly don&#039;t get a stairstep when you convert&lt;br /&gt;
from digital back to analog.&lt;br /&gt;
&lt;br /&gt;
Of everything in the entire article, *that* was the number one thing&lt;br /&gt;
people wrote about. In fact, more than half the mail I got was questions and&lt;br /&gt;
comments about basic digital signal behavior.  Since there&#039;s interest,&lt;br /&gt;
let&#039;s take a little time to play with some _simple_ digital signals.&lt;br /&gt;
&lt;br /&gt;
&amp;lt;&amp;lt;&amp;lt;&amp;lt; veritas ex machina &amp;gt;&amp;gt;&amp;gt;&lt;br /&gt;
&lt;br /&gt;
Pretend for a moment that we have no idea how digital signals really&lt;br /&gt;
behave. In that case it doesn&#039;t make sense for us to use digital test&lt;br /&gt;
equipment either.  Fortunately for this exercise, there&#039;s still plenty&lt;br /&gt;
of working analog lab equipment out there.&lt;br /&gt;
&lt;br /&gt;
[[ close on 3325 ]&lt;br /&gt;
&lt;br /&gt;
First up, we need a signal generator to provide us with analog input&lt;br /&gt;
signals--in this case, an HP3325 from 1978.  It&#039;s still a pretty good&lt;br /&gt;
generator, so if you don&#039;t mind the size, the weight, the power&lt;br /&gt;
consumption, and the noisy fan, you can find them on eBay... occasionally&lt;br /&gt;
for only slightly more than you&#039;ll pay for shipping.&lt;br /&gt;
&lt;br /&gt;
[[ close on 2246 ]]&lt;br /&gt;
&lt;br /&gt;
Next, we&#039;ll observe our analog waveforms on analog oscilloscopes, like this&lt;br /&gt;
Tektronix 2246 from the mid-90s, one of the last and very best analog&lt;br /&gt;
scopes ever made. Every home lab should have one.&lt;br /&gt;
&lt;br /&gt;
[[ close on 3585]]&lt;br /&gt;
&lt;br /&gt;
...and finally inspect the frequency spectrum of our signals using an&lt;br /&gt;
analog spectrum analyzer, this HP3585 from the same product line as&lt;br /&gt;
the signal generator.  Like the other equipment here it has a&lt;br /&gt;
rudimentary and hilariously large microcontroller, but the signal path&lt;br /&gt;
from input to what you see on the screen is completely analog.&lt;br /&gt;
&lt;br /&gt;
All of this equipment is vintage, but aside from its raw tonnage, the&lt;br /&gt;
specs are still quite good.&lt;br /&gt;
&lt;br /&gt;
[[ out ]]&lt;br /&gt;
&lt;br /&gt;
At the moment, we have our signal generator set to output a nice 1kHz&lt;br /&gt;
sine wave at one volt RMS,&lt;br /&gt;
&lt;br /&gt;
we see the sine wave on the oscilloscope, can verify that it is indeed&lt;br /&gt;
1kHz at one volt RMS, which is 2.8V peak-to-peak, and that matches the&lt;br /&gt;
measurement on the spectrum analyzer as well.&lt;br /&gt;
&lt;br /&gt;
The analyzer also shows some low-level white noise and just a bit of&lt;br /&gt;
harmonic distortion, with the highest peak about 70dB or so below the&lt;br /&gt;
fundamental. Now, this doesn&#039;t matter at all in our demos, but I&lt;br /&gt;
wanted to point it out now just in case you didn&#039;t notice it until&lt;br /&gt;
later.&lt;br /&gt;
&lt;br /&gt;
[[ cut to complete setup ]]&lt;br /&gt;
&lt;br /&gt;
Now, we drop digital sampling in the middle.&lt;br /&gt;
&lt;br /&gt;
For the conversion, we&#039;ll use a boring, consumer-grade, eMagic USB1&lt;br /&gt;
audio device.  It&#039;s also more than ten years old at this point, and it&#039;s&lt;br /&gt;
getting obsolete.&lt;br /&gt;
&lt;br /&gt;
A recent converter can easily have an order of magnitude better specs.&lt;br /&gt;
Flatness, linearity, jitter, noise behavior, everything... you may not&lt;br /&gt;
have noticed.  Just because we can measure an improvement doesn&#039;t&lt;br /&gt;
mean we can hear it, and even these old consumer boxes were already at&lt;br /&gt;
the edge of ideal transparency.&lt;br /&gt;
&lt;br /&gt;
[[out to see emagic initialize and digital waveform appear on TP ]]&lt;br /&gt;
&lt;br /&gt;
The eMagic connects to my ThinkPad, which displays a digital&lt;br /&gt;
waveform and spectrum for comparison, then the ThinkPad&lt;br /&gt;
sends the digital signal right back out to the eMagic for&lt;br /&gt;
re-conversion to analog and observation on the output scopes.&lt;br /&gt;
&lt;br /&gt;
Input to output, left to right.&lt;br /&gt;
&lt;br /&gt;
&amp;lt;&amp;lt;&amp;lt;&amp;lt; stairsteps &amp;gt;&amp;gt;&amp;gt;&amp;gt;&lt;br /&gt;
&lt;br /&gt;
OK, it&#039;s go time. We begin by converting an analog signal to digital and&lt;br /&gt;
then right back to analog again with no other steps.&lt;br /&gt;
&lt;br /&gt;
[[close to 3325]] &lt;br /&gt;
The signal generator is set to produce a 1kHz sine wave just like&lt;br /&gt;
before.&lt;br /&gt;
&lt;br /&gt;
[[close to input scope]]&lt;br /&gt;
We can see our analog sine wave on our input-side oscilloscope.&lt;br /&gt;
&lt;br /&gt;
[[close to TP: spectrum]]&lt;br /&gt;
We digitize our signal to 16 bit PCM at 44.1kHz, same as on a CD.&lt;br /&gt;
The spectrum of the digitized signal matches what we saw earlier&lt;br /&gt;
&lt;br /&gt;
[[close to SA]]&lt;br /&gt;
and what we see now on the analog spectrum analyzer, aside from its &lt;br /&gt;
high-impedance input being just a smidge noisier.&lt;br /&gt;
&lt;br /&gt;
[[close to TP ; overview/waveform ]]&lt;br /&gt;
For now, the waveform display shows our digitized sine wave as a&lt;br /&gt;
stairstep pattern, one step for each sample.&lt;br /&gt;
&lt;br /&gt;
[[ out ]]&lt;br /&gt;
And when we look at the output signal that&#039;s been converted&lt;br /&gt;
from digital back to analog, we see...&lt;br /&gt;
&lt;br /&gt;
[[close to output scope: press CH1 button to show waveform]]&lt;br /&gt;
It&#039;s exactly like the original sine wave.  No stairsteps.&lt;br /&gt;
&lt;br /&gt;
[[ out ]]&lt;br /&gt;
OK, 1kHz is still a fairly low frequency, maybe the stairsteps are just&lt;br /&gt;
hard to see or they&#039;re being smoothed away.  Fair enough. Let&#039;s choose&lt;br /&gt;
a higher frequency, something close to Nyquist, say 15kHz.&lt;br /&gt;
&lt;br /&gt;
[[set 3325 to 15kHz ]]&lt;br /&gt;
Now the sine wave is represented by less than three samples per cycle, and...&lt;br /&gt;
&lt;br /&gt;
[[close to TP]]&lt;br /&gt;
the digital waveform looks pretty awful.  Well, looks&lt;br /&gt;
can be deceiving. The analog output...&lt;br /&gt;
&lt;br /&gt;
[[close to output scope]]&lt;br /&gt;
is still a perfect sine wave, exactly like the original.&lt;br /&gt;
&lt;br /&gt;
Let&#039;s keep going up.&lt;br /&gt;
&lt;br /&gt;
Let&#039;s see if I can do this without blocking any cameras.&lt;br /&gt;
&lt;br /&gt;
16kHz.... 17kHz... 18kHz... 19kHz... &lt;br /&gt;
&lt;br /&gt;
20kHz.  Welcome to the upper limits of human hearing. The output&lt;br /&gt;
waveform is still perfect. No jagged edges, no dropoff, no stairsteps.&lt;br /&gt;
&lt;br /&gt;
So where&#039;d the stairsteps go? Don&#039;t answer, it&#039;s a trick question.&lt;br /&gt;
They were never there.&lt;br /&gt;
&lt;br /&gt;
Drawing a digital waveform as a stairstep... was wrong to begin with.&lt;br /&gt;
&lt;br /&gt;
Why? A stairstep is a continuous-time function.  It&#039;s jagged, and it&#039;s&lt;br /&gt;
piecewise, but it has a defined value at every point in time.&lt;br /&gt;
&lt;br /&gt;
A sampled signal is entirely different. It&#039;s discrete-time; it&#039;s only&lt;br /&gt;
got a value right at each instantaneous sample point and it&#039;s&lt;br /&gt;
undefined, there is no value at all, everywhere between.  A&lt;br /&gt;
discrete-time signal is properly drawn as a lollipop graph.&lt;br /&gt;
&lt;br /&gt;
The continuous, analog counterpart of a digital signal passes&lt;br /&gt;
smoothly through each sample point, and that&#039;s just as true for high&lt;br /&gt;
frequencies as it is for low.&lt;br /&gt;
&lt;br /&gt;
Now, the interesting and not at all obvious bit is: there&#039;s only one&lt;br /&gt;
bandlimited signal that passes exactly through each sample point. It&#039;s&lt;br /&gt;
a unique solution. So if you sample a bandlimited signal and then&lt;br /&gt;
convert it back, the original input is also the only possible output.&lt;br /&gt;
&lt;br /&gt;
And before you say, &amp;quot;oh, I can draw a different signal that passes&lt;br /&gt;
through those points&amp;quot;, well, yes you can, but if it differs even&lt;br /&gt;
minutely from the original, it includes frequency content at or beyond&lt;br /&gt;
Nyquist, breaks the bandlimiting requirement and isn&#039;t a valid&lt;br /&gt;
solution.&lt;br /&gt;
&lt;br /&gt;
[[ out ]]&lt;br /&gt;
So how did everyone get confused and start thinking of digital signals&lt;br /&gt;
as stairsteps? I can think of two good reasons.&lt;br /&gt;
&lt;br /&gt;
[[close to TP; freeze display; draw in zero-order]]&lt;br /&gt;
First: it&#039;s easy enough to convert a sampled signal to a true stairstep. Just&lt;br /&gt;
extend each sample value forward until the next sample period.  This is&lt;br /&gt;
called a zero-order hold, and it&#039;s an important part of how some&lt;br /&gt;
digital-to-analog converters work, especially the simplest ones.&lt;br /&gt;
&lt;br /&gt;
[[ Wikipedia DAC lookup + scroll down to hold image]]&lt;br /&gt;
So, anyone who looks up digital-to-analog converter or&lt;br /&gt;
digital-to-analog conversion is probably going to see a diagram of a&lt;br /&gt;
stairstep waveform somewere, but that&#039;s not a finished conversion,&lt;br /&gt;
and it&#039;s not the signal that comes out.&lt;br /&gt;
&lt;br /&gt;
[[ out ]]&lt;br /&gt;
Second, and this is probably the more likely reason, engineers who&lt;br /&gt;
supposedly know better, like me, draw stairsteps even though they&#039;re&lt;br /&gt;
technically wrong. It&#039;s a sort of like a one-dimensional version of&lt;br /&gt;
fat bits in an image editor.&lt;br /&gt;
&lt;br /&gt;
[[gimp RMD animation]]]&lt;br /&gt;
Pixels aren&#039;t squares either, they&#039;re samples of a 2-dimensional&lt;br /&gt;
function space and so they&#039;re also, conceptually, infinitely small&lt;br /&gt;
points. Practically, it&#039;s a real pain in the ass to see or manipulate&lt;br /&gt;
infinitely small anything, so big squares it is.  Digital stairstep&lt;br /&gt;
drawings are exactly the same thing.&lt;br /&gt;
&lt;br /&gt;
[[ out ]]&lt;br /&gt;
It&#039;s just a convenient drawing. The stairsteps aren&#039;t really there.&lt;br /&gt;
&lt;br /&gt;
&amp;lt;&amp;lt;&amp;lt; bit-depth &amp;gt;&amp;gt;&amp;gt;&lt;br /&gt;
&lt;br /&gt;
When we convert a digital signal back to analog, the result is&lt;br /&gt;
_also_ smooth regardless of the bit depth.  24 bits or 16 bits...&lt;br /&gt;
or 8 bits...  it doesn&#039;t matter.&lt;br /&gt;
&lt;br /&gt;
So does that mean that the digital bit depth makes no difference at&lt;br /&gt;
all? Of course not.&lt;br /&gt;
&lt;br /&gt;
Channel 2 here is the same sine wave input, but we quantize with&lt;br /&gt;
dither down to eight bits.&lt;br /&gt;
&lt;br /&gt;
On the scope, we still see a nice&lt;br /&gt;
smooth sine wave on channel 2. Look very close, and you&#039;ll also see a&lt;br /&gt;
bit more noise.  That&#039;s a clue.&lt;br /&gt;
&lt;br /&gt;
If we look at the spectrum of the signal... aha!  Our sine wave is&lt;br /&gt;
still there unaffected, but the noise level of the eight-bit signal on&lt;br /&gt;
the second channel is much higher!&lt;br /&gt;
&lt;br /&gt;
And that&#039;s the difference the number of bits makes.  That&#039;s it!&lt;br /&gt;
&lt;br /&gt;
When we digitize a signal, first we sample it. The&lt;br /&gt;
sampling step is perfect; it loses nothing. But then we quantize it,&lt;br /&gt;
and quantization adds noise.&lt;br /&gt;
&lt;br /&gt;
[[panel 2; demonstrate changing bit depth on tablet ]]&lt;br /&gt;
&lt;br /&gt;
The number of bits determines how much noise and so the level of the&lt;br /&gt;
noise floor. [[demonstrate changing bit depth on tablet]].&lt;br /&gt;
&lt;br /&gt;
[[ out ]]&lt;br /&gt;
&lt;br /&gt;
What does this dithered quantization noise sound like?  Let&#039;s listen&lt;br /&gt;
to our eight-bit sine wave.&lt;br /&gt;
&lt;br /&gt;
[[audio: eight bit sine]]&lt;br /&gt;
&lt;br /&gt;
That may have been hard to hear anything but the tone.  Let&#039;s listen&lt;br /&gt;
to just the noise after we notch out the sine wave and then bring the&lt;br /&gt;
gain up a bit because the noise is quiet.&lt;br /&gt;
&lt;br /&gt;
[[audio: hit notch + gain button]]&lt;br /&gt;
&lt;br /&gt;
Those of you who have used analog recording equipment may have just&lt;br /&gt;
thought to yourselves, &amp;quot;My goodness! That sounds like tape hiss!&amp;quot;&lt;br /&gt;
Well, it doesn&#039;t just sound like tape hiss, it acts like it too, and&lt;br /&gt;
if we use a gaussian dither then it&#039;s mathematically&lt;br /&gt;
equivalent in every way. It _is_ tape hiss.&lt;br /&gt;
&lt;br /&gt;
Intuitively, that means that we can measure tape hiss and thus the noise floor&lt;br /&gt;
of magnetic audio tape in bits instead of decibels, in order to put things in a&lt;br /&gt;
digital perspective.  Compact cassettes...&lt;br /&gt;
&lt;br /&gt;
[[ reveal cassette ]]&lt;br /&gt;
&lt;br /&gt;
for those of you who are old enough to remember them, could reach as&lt;br /&gt;
deep as nine bits in perfect conditions, though five to six bits was&lt;br /&gt;
more typical, especially if it was a recording made on a tape&lt;br /&gt;
deck. That&#039;s right... your mix tapes were only about six bits&lt;br /&gt;
deep... if you were lucky!&lt;br /&gt;
&lt;br /&gt;
The very best professional open reel tape used in studios could barely&lt;br /&gt;
hit...  any guesses? 13 bits _with_ advanced noise reduction.  And&lt;br /&gt;
that&#039;s why seeing &#039;D D D&#039; on a Compact Disc used to be such a big,&lt;br /&gt;
high-end deal.&lt;br /&gt;
&lt;br /&gt;
&amp;lt;&amp;lt;&amp;lt; dither &amp;gt;&amp;gt;&amp;gt;&lt;br /&gt;
&lt;br /&gt;
I keep saying that I&#039;m quantizing with dither, so what is dither&lt;br /&gt;
exactly and, more importantly, what does it do?&lt;br /&gt;
&lt;br /&gt;
[[ Illustration: quantization ]]&lt;br /&gt;
&lt;br /&gt;
The simple way to quantize a signal is to choose the digital&lt;br /&gt;
amplitude value closest to the original analog amplitude.  Obvious,&lt;br /&gt;
right?  Unfortunately, the exact noise you get from this simple&lt;br /&gt;
quantization scheme depends somewhat on the input signal,&lt;br /&gt;
&lt;br /&gt;
[[Illustration: correlated quantization noise ]]&lt;br /&gt;
&lt;br /&gt;
so we may get noise that&#039;s inconsistent, or causes distortion, or is&lt;br /&gt;
undesirable in some other way.&lt;br /&gt;
&lt;br /&gt;
[show/attribute the dither paper]&lt;br /&gt;
Dither is specially-constructed noise that substitutes for the noise&lt;br /&gt;
produced by simple quantization. Dither doesn&#039;t drown out or mask&lt;br /&gt;
quantization noise, it actually replaces it with noise characteristics&lt;br /&gt;
of our choosing that aren&#039;t influenced by the input.&lt;br /&gt;
&lt;br /&gt;
[[out]]&lt;br /&gt;
&lt;br /&gt;
Let&#039;s _watch_ what dither does.  The signal generator has too much&lt;br /&gt;
noise for this test so&lt;br /&gt;
&lt;br /&gt;
[[close: panel 3]]&lt;br /&gt;
&lt;br /&gt;
we&#039;ll produce a mathematically perfect sine wave with the ThinkPad&lt;br /&gt;
[[press]]&lt;br /&gt;
&lt;br /&gt;
and quantize it to eight bits [[press]]&lt;br /&gt;
&lt;br /&gt;
with dithering. [[press]]&lt;br /&gt;
&lt;br /&gt;
We see a nice sine wave on the waveform display&lt;br /&gt;
&lt;br /&gt;
[[ show outscope ]]  and output scope&lt;br /&gt;
&lt;br /&gt;
[[ show analyzer]]  and, once the analog spectrum analyzer catches up...&lt;br /&gt;
&lt;br /&gt;
[[time accel sweep]] a clean frequency peak with a uniform noise floor&lt;br /&gt;
on both spectral displays&lt;br /&gt;
&lt;br /&gt;
[[ overview: spectrum ]]  just like before. Again, this is with dither.&lt;br /&gt;
&lt;br /&gt;
Now I turn dithering off. [[ deactivate dither ]]&lt;br /&gt;
&lt;br /&gt;
The quantization noise, that dither had spread out into a nice, flat noise&lt;br /&gt;
floor, piles up into harmonic distortion peaks.  The noise floor is&lt;br /&gt;
lower, but the level of distortion becomes nonzero, and the distortion&lt;br /&gt;
peaks sit higher than the dithering noise did.&lt;br /&gt;
&lt;br /&gt;
At eight bits this effect is exaggerated. At sixteen bits, [[click 16]]&lt;br /&gt;
&lt;br /&gt;
even without dither, harmonic distortion is going to be so low as to&lt;br /&gt;
be completely inaudible.&lt;br /&gt;
&lt;br /&gt;
[[draw line across -100]]&lt;br /&gt;
&lt;br /&gt;
Still, we can use dither to eliminate it completely if we so choose.&lt;br /&gt;
&lt;br /&gt;
Turning the dither off again for a moment, you&#039;ll notice that the&lt;br /&gt;
absolute level of distortion from undithered quantization stays&lt;br /&gt;
approximately constant regardless of the input amplitude.&lt;br /&gt;
&lt;br /&gt;
[[ overview: waveform ]]&lt;br /&gt;
&lt;br /&gt;
But when the signal level drops below a half a bit, everything&lt;br /&gt;
quantizes to zero.&lt;br /&gt;
&lt;br /&gt;
[[ overview: spectrum ]]&lt;br /&gt;
&lt;br /&gt;
In a sense, everything quantizing to zero is just 100% distortion!&lt;br /&gt;
Dither eliminates this distortion too. We reenable dither&lt;br /&gt;
and...&lt;br /&gt;
&lt;br /&gt;
[[dither on]]&lt;br /&gt;
&lt;br /&gt;
there&#039;s our signal back at 1/4 bit, with our nice flat noise floor.&lt;br /&gt;
&lt;br /&gt;
[[ out ]]&lt;br /&gt;
&lt;br /&gt;
The noise floor doesn&#039;t have to be flat.  Dither is noise of our&lt;br /&gt;
choosing, so let&#039;s choose a noise as inoffensive and difficult to&lt;br /&gt;
notice as possible.&lt;br /&gt;
&lt;br /&gt;
[[panel 5]]&lt;br /&gt;
&lt;br /&gt;
Our hearing is most sensitive in the midrange from 2kHz to 4kHz,&lt;br /&gt;
so that&#039;s where background noise is going to be the most obvious.&lt;br /&gt;
&lt;br /&gt;
[[annotate: underline 2-4kHz]]&lt;br /&gt;
&lt;br /&gt;
[[click shaped]]&lt;br /&gt;
&lt;br /&gt;
We can shape dithering noise away from sensitive frequencies to where&lt;br /&gt;
hearing is less sensitive, usually the highest frequencies.&lt;br /&gt;
&lt;br /&gt;
[[annotate: arrow to HF]]&lt;br /&gt;
&lt;br /&gt;
[[out]]&lt;br /&gt;
16-bit dithering noise is normally much too quiet to hear at all, but&lt;br /&gt;
let&#039;s listen to our noise shaping example, again with the gain&lt;br /&gt;
brought way up...&lt;br /&gt;
&lt;br /&gt;
[[close]]&lt;br /&gt;
[[unshaped white to shaped]]&lt;br /&gt;
 &lt;br /&gt;
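Before moving on, here&#039;s a sketch of the simplest possible noise&lt;br /&gt;
shaper, a first-order error-feedback loop; real shapers use&lt;br /&gt;
higher-order filters carefully tuned to the ear, but the principle is&lt;br /&gt;
the same.&lt;br /&gt;
&lt;br /&gt;
 import numpy as np&lt;br /&gt;
 &lt;br /&gt;
 def shaped_quantize(x, bits):&lt;br /&gt;
     # first-order error feedback: each sample has the previous&lt;br /&gt;
     # quantization error subtracted first, which pushes the noise&lt;br /&gt;
     # energy toward the high frequencies&lt;br /&gt;
     steps = 2.0 ** (bits - 1)&lt;br /&gt;
     out = np.empty(len(x))&lt;br /&gt;
     err = 0.0&lt;br /&gt;
     for i in range(len(x)):&lt;br /&gt;
         d = np.random.random() - np.random.random()  # TPDF dither&lt;br /&gt;
         v = x[i] - err&lt;br /&gt;
         out[i] = np.round(v * steps + d) / steps&lt;br /&gt;
         err = out[i] - v&lt;br /&gt;
     return out&lt;br /&gt;
&lt;br /&gt;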
[[out]] Lastly, dithered quantization noise _is_ higher power overall&lt;br /&gt;
than undithered quantization noise even when it sounds quieter, and&lt;br /&gt;
you can see that on a VU meter during passages of near-silence.  But&lt;br /&gt;
dither isn&#039;t only an on or off choice. We can reduce the dither&#039;s&lt;br /&gt;
power to balance less noise against a bit of distortion to minimize&lt;br /&gt;
the overall effect.&lt;br /&gt;
&lt;br /&gt;
  [[ panel 6 audio :: flat, unmodulated ]]&lt;br /&gt;
We&#039;ll also modulate the input signal like this:&lt;br /&gt;
  [[ panel 6 audio :: flat, modulated ]]&lt;br /&gt;
...to show how a varying input affects the quantization noise.  At&lt;br /&gt;
full dithering power, the noise is uniform, constant, and featureless&lt;br /&gt;
just like we expect:&lt;br /&gt;
  [[ panel 6 audio :: flat, modulated, notch ]]&lt;br /&gt;
As we reduce the dither&#039;s power, the input increasingly&lt;br /&gt;
affects the amplitude and the character of the quantization noise:&lt;br /&gt;
  [[ panel 6 audio :: flat, modulated, notch ]]&lt;br /&gt;
Shaped dither behaves similarly, but noise shaping lends one more nice&lt;br /&gt;
advantage.  To make a long story short, it can use a somewhat lower&lt;br /&gt;
dither power before the input has as much effect on the output.&lt;br /&gt;
  [[ panel 6 audio :: shaped, modulated, notch ]]&lt;br /&gt;
  [[ reset panel :: shaped, unmodulated, no notch ]]&lt;br /&gt;
&lt;br /&gt;
[[out]]&lt;br /&gt;
&lt;br /&gt;
Despite all the time I just spent on dither, we&#039;re talking about&lt;br /&gt;
differences that start 100 decibels and more below full scale.  Maybe&lt;br /&gt;
if the CD had been 14 bits as originally designed, dither _might_ be&lt;br /&gt;
more important.  Maybe.  At 16 bits, really, it&#039;s mostly a wash.  You&lt;br /&gt;
can think of dither as an insurance policy that gives several extra&lt;br /&gt;
decibels of dynamic range just in case. The simple fact is, though, no&lt;br /&gt;
one ever ruined a great recording by not dithering the final master.&lt;br /&gt;
&lt;br /&gt;
&amp;lt;&amp;lt;&amp;lt;&amp;lt; bandlimitation and timing &amp;gt;&amp;gt;&amp;gt;&amp;gt;&lt;br /&gt;
&lt;br /&gt;
We&#039;ve been using sine waves. They&#039;re the obvious choice when what we&lt;br /&gt;
want to see is a system&#039;s behavior at a given isolated frequency.  Now&lt;br /&gt;
let&#039;s look at something a bit more complex.  What should we expect to&lt;br /&gt;
happen when I change the input to a square wave...&lt;br /&gt;
&lt;br /&gt;
[[close to sig analyzer-- press the button]]&lt;br /&gt;
&lt;br /&gt;
[[close to input scope]]&lt;br /&gt;
The input scope confirms our 1kHz square wave.  The output scope shows...&lt;br /&gt;
&lt;br /&gt;
[[close to output scope]]&lt;br /&gt;
Exactly what it should.&lt;br /&gt;
 ...&lt;br /&gt;
What is a square wave really?  &lt;br /&gt;
[[illustrate]]&lt;br /&gt;
&lt;br /&gt;
Well, we can say it&#039;s a waveform that&#039;s&lt;br /&gt;
some positive value for half a cycle and then transitions&lt;br /&gt;
instantaneously to a negative value for the other half. But that doesn&#039;t&lt;br /&gt;
really tell us anything useful about how this input [[close/point]]&lt;br /&gt;
becomes this output [[close/point]].&lt;br /&gt;
&lt;br /&gt;
[[animated diagram]]&lt;br /&gt;
Then we remember that any waveform is also the sum of discrete frequencies,&lt;br /&gt;
and a square wave is a particularly simple sum: a fundamental and an&lt;br /&gt;
infinite series of odd harmonics.  Sum them all up, and you get a&lt;br /&gt;
square wave.&lt;br /&gt;
&lt;br /&gt;
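In code, summing that series (each harmonic at 1/n the amplitude of&lt;br /&gt;
the fundamental) up to a 20kHz cutoff looks something like this&lt;br /&gt;
sketch:&lt;br /&gt;
&lt;br /&gt;
 import numpy as np&lt;br /&gt;
 &lt;br /&gt;
 # build a 1kHz square wave from its Fourier series, keeping only&lt;br /&gt;
 # the odd harmonics below 20kHz: 1, 3, 5, ... 19 kHz&lt;br /&gt;
 t = np.arange(48000) / 48000.0&lt;br /&gt;
 square = np.zeros(len(t))&lt;br /&gt;
 for n in range(1, 20, 2):&lt;br /&gt;
     square += np.sin(2 * np.pi * n * 1000 * t) / n&lt;br /&gt;
 square *= 4 / np.pi&lt;br /&gt;
 # the result shows the Gibbs ripple around every edge&lt;br /&gt;
&lt;br /&gt;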
[[out]]&lt;br /&gt;
At first glance, that doesn&#039;t seem very useful either. You have to sum&lt;br /&gt;
up an infinite number of harmonics to get the answer.  Ah, but we don&#039;t&lt;br /&gt;
have an infinite number of harmonics.&lt;br /&gt;
&lt;br /&gt;
[[close to panel, annotate circling cutoff, and line at 20kHz on spectrum]]&lt;br /&gt;
&lt;br /&gt;
We&#039;re using a quite sharp anti-aliasing filter that cuts off right&lt;br /&gt;
above 20kHz, so our signal is bandlimited, which means we get this:&lt;br /&gt;
&lt;br /&gt;
[[diagram]]&lt;br /&gt;
&lt;br /&gt;
...and that&#039;s exactly what we see on the output scope.&lt;br /&gt;
[[pan/fade to scope display showing they line up perfectly]]&lt;br /&gt;
&lt;br /&gt;
The rippling you see around sharp edges in a bandlimited signal is&lt;br /&gt;
called the Gibbs effect. It happens whenever you slice off part of the&lt;br /&gt;
frequency domain in the middle of nonzero energy.&lt;br /&gt;
&lt;br /&gt;
[[out]]&lt;br /&gt;
The usual rule of thumb you&#039;ll hear is &amp;quot;the sharper the cutoff, the&lt;br /&gt;
stronger the rippling&amp;quot;, which is approximately true, but we have to be&lt;br /&gt;
careful how we think about it.&lt;br /&gt;
&lt;br /&gt;
For example... what would you expect our quite sharp anti-aliasing filter&lt;br /&gt;
to do if I run our signal through it a second time?&lt;br /&gt;
&lt;br /&gt;
[[ plug plug go]]&lt;br /&gt;
[[outscope]]&lt;br /&gt;
&lt;br /&gt;
Aside from adding a few fractional cycles of delay, the answer is...&lt;br /&gt;
nothing at all.  The signal is already bandlimited. Bandlimiting it&lt;br /&gt;
again doesn&#039;t do anything.  A second pass can&#039;t remove frequencies&lt;br /&gt;
that we already removed.&lt;br /&gt;
&lt;br /&gt;
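A real anti-aliasing filter isn&#039;t built this way, but an idealized&lt;br /&gt;
zero-phase brickwall makes the point easy to verify numerically:&lt;br /&gt;
&lt;br /&gt;
 import numpy as np&lt;br /&gt;
 &lt;br /&gt;
 def brickwall(x, rate, cutoff):&lt;br /&gt;
     # zero every FFT bin above the cutoff (a real filter also adds&lt;br /&gt;
     # delay; this idealized one does not)&lt;br /&gt;
     X = np.fft.rfft(x)&lt;br /&gt;
     f = np.fft.rfftfreq(len(x), 1.0 / rate)&lt;br /&gt;
     X[np.greater(f, cutoff)] = 0&lt;br /&gt;
     return np.fft.irfft(X, len(x))&lt;br /&gt;
 &lt;br /&gt;
 t = np.arange(48000) / 48000.0&lt;br /&gt;
 naive = np.sign(np.sin(2 * np.pi * 1000 * t))&lt;br /&gt;
 once = brickwall(naive, 48000, 20000)&lt;br /&gt;
 twice = brickwall(once, 48000, 20000)&lt;br /&gt;
 print(np.max(np.abs(twice - once)))  # on the order of 1e-16&lt;br /&gt;
&lt;br /&gt;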
[[out]] And that&#039;s important.  People tend to think of the ripples as&lt;br /&gt;
a kind of artifact that&#039;s added by anti-aliasing and anti-imaging&lt;br /&gt;
filters, implying that the ripples get worse each time the signal&lt;br /&gt;
passes through.  We can see that in this case that didn&#039;t happen. So&lt;br /&gt;
was it really the filter that added the ripples the first time&lt;br /&gt;
through?  No, not really. It&#039;s a subtle distinction, but Gibbs effect&lt;br /&gt;
ripples aren&#039;t added by filters, they&#039;re just part of what a&lt;br /&gt;
bandlimited signal _is_.&lt;br /&gt;
&lt;br /&gt;
[[close: panel 8]]&lt;br /&gt;
&lt;br /&gt;
Even if we synthetically construct what looks like a perfect digital&lt;br /&gt;
square wave,&lt;br /&gt;
&lt;br /&gt;
[[ turn on digital &#039;squarewave&#039; ]]&lt;br /&gt;
&lt;br /&gt;
it&#039;s still limited to the channel bandwidth.  Remember,&lt;br /&gt;
the stairstep representation is misleading.&lt;br /&gt;
&lt;br /&gt;
[[go to lollipop]]&lt;br /&gt;
&lt;br /&gt;
What we really have here are instantaneous sample points,&lt;br /&gt;
&lt;br /&gt;
[[to diagram, trace original ]]&lt;br /&gt;
&lt;br /&gt;
and only one bandlimited signal fits those points.  All we did when we&lt;br /&gt;
drew our apparently perfect square wave was line up the sample points&lt;br /&gt;
just right so it appeared that there were no ripples if we played&lt;br /&gt;
connect-the-dots.&lt;br /&gt;
&lt;br /&gt;
[[ diagram: shift samples forward and back; fade to waveform display&lt;br /&gt;
showing same ]]&lt;br /&gt;
&lt;br /&gt;
But the original bandlimited signal, complete with ripples, was&lt;br /&gt;
still there.&lt;br /&gt;
&lt;br /&gt;
[[ show output scope ]]&lt;br /&gt;
[[ out ]]&lt;br /&gt;
&lt;br /&gt;
And that leads us to one more important point.  You&#039;ve probably heard&lt;br /&gt;
that the timing precision of a digital signal is limited by its sample&lt;br /&gt;
rate; put another way,&lt;br /&gt;
&lt;br /&gt;
[[diagram]]&lt;br /&gt;
&lt;br /&gt;
that digital signals can&#039;t represent anything that falls between the&lt;br /&gt;
samples... implying that impulses or fast attacks have to align exactly&lt;br /&gt;
with a sample, or the timing gets mangled... or they just disappear.&lt;br /&gt;
&lt;br /&gt;
[[ scribble it out ]]&lt;br /&gt;
&lt;br /&gt;
At this point, we can easily see why that&#039;s wrong.&lt;br /&gt;
&lt;br /&gt;
[[ diagram: both an edge and an impulse ]]&lt;br /&gt;
&lt;br /&gt;
Again, our input signals are bandlimited. And digital signals are&lt;br /&gt;
samples, not stairsteps, not &#039;connect-the-dots&#039;.  We most certainly&lt;br /&gt;
can, for example, put the rising edge of our bandlimited square wave&lt;br /&gt;
anywhere we want between samples.&lt;br /&gt;
&lt;br /&gt;
It&#039;s represented perfectly [[show on the waveform display, move slider]]&lt;br /&gt;
and it&#039;s reconstructed perfectly [[show on output scope with moving slider]].&lt;br /&gt;
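&lt;br /&gt;
One way to see this numerically is to shift a bandlimited pulse by a&lt;br /&gt;
fraction of a sample with a linear phase ramp (an illustrative&lt;br /&gt;
sketch, not the demo code; the shift is circular, which is fine for a&lt;br /&gt;
pulse in the middle of the buffer):&lt;br /&gt;
&lt;br /&gt;
 import numpy as np&lt;br /&gt;
 &lt;br /&gt;
 def fractional_delay(x, samples):&lt;br /&gt;
     # delay by any number of samples, whole or fractional, by&lt;br /&gt;
     # applying a linear phase ramp to the spectrum&lt;br /&gt;
     X = np.fft.rfft(x)&lt;br /&gt;
     f = np.fft.rfftfreq(len(x))  # cycles per sample&lt;br /&gt;
     return np.fft.irfft(X * np.exp(-2j * np.pi * f * samples), len(x))&lt;br /&gt;
 &lt;br /&gt;
 t = np.arange(512)&lt;br /&gt;
 pulse = np.sinc((t - 256) / 4.0)        # a bandlimited impulse&lt;br /&gt;
 between = fractional_delay(pulse, 0.3)  # lands between samples&lt;br /&gt;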
&lt;br /&gt;
&amp;lt;&amp;lt;&amp;lt; epilogue &amp;gt;&amp;gt;&amp;gt;&lt;br /&gt;
&lt;br /&gt;
[[ back in :20 sign ]]&lt;br /&gt;
&lt;br /&gt;
Just like in the previous episode, we&#039;ve covered a broad range of&lt;br /&gt;
topics, and yet barely scratched the surface of each one.  If anything, my&lt;br /&gt;
sins of omission are greater this time around... but this is a good&lt;br /&gt;
stopping point.&lt;br /&gt;
&lt;br /&gt;
Or maybe, a good starting point.  Dig deeper.  Experiment.  I chose my&lt;br /&gt;
demos very carefully to be simple and give clear results. You can&lt;br /&gt;
reproduce every one of them on your own if you like.  But let&#039;s face&lt;br /&gt;
it, sometimes we learn the most about a spiffy toy by breaking it open&lt;br /&gt;
and studying all the pieces that fall out.  And that&#039;s OK, we&#039;re&lt;br /&gt;
engineers.  Play with the demo parameters, hack up the code, set up&lt;br /&gt;
alternate experiments.  The source code for everything, including the&lt;br /&gt;
little pushbutton demo application, is up at xiph.org.&lt;br /&gt;
&lt;br /&gt;
In the course of experimentation, you&#039;re likely to run into something&lt;br /&gt;
that you didn&#039;t expect and can&#039;t explain.  Don&#039;t worry!  My earlier&lt;br /&gt;
snark aside, Wikipedia is fantastic for exactly this kind of casual&lt;br /&gt;
research. And, if you&#039;re really serious about understanding signals,&lt;br /&gt;
several universities have advanced materials online, such as the 6.003&lt;br /&gt;
and 6.007 Signals and Systems modules at MIT OpenCourseWare. And of&lt;br /&gt;
course, there&#039;s always the community here at Xiph.Org.&lt;br /&gt;
&lt;br /&gt;
Digging deeper or not, I am out of coffee, so, until next time, happy&lt;br /&gt;
hacking!&lt;/div&gt;</summary>
		<author><name>Mindspillage</name></author>
	</entry>
	<entry>
		<id>https://wiki.xiph.org/index.php?title=Videos/A_Digital_Media_Primer_For_Geeks&amp;diff=12468</id>
		<title>Videos/A Digital Media Primer For Geeks</title>
		<link rel="alternate" type="text/html" href="https://wiki.xiph.org/index.php?title=Videos/A_Digital_Media_Primer_For_Geeks&amp;diff=12468"/>
		<updated>2010-09-22T06:47:31Z</updated>

		<summary type="html">&lt;p&gt;Mindspillage: /* Introduction */ trailing quote&lt;/p&gt;
&lt;hr /&gt;
&lt;div&gt;&amp;lt;small&amp;gt;&#039;&#039;Wiki edition&#039;&#039;&amp;lt;/small&amp;gt;&lt;br /&gt;
[[Image:Dmpfg_001.jpg|360px|right]]&lt;br /&gt;
&lt;br /&gt;
This first video from Xiph.Org presents the technical foundations of modern digital media via a half-hour firehose of information. One community member called it &amp;quot;a Uni lecture I never got but really wanted.&amp;quot;&lt;br /&gt;
&lt;br /&gt;
The program offers a brief history of digital media, a quick summary of the sampling theorem, and myriad details of low level audio and video characterization and formatting. It&#039;s intended for budding geeks looking to get into video coding, as well as the technically curious who want to know more about the media they wrangle for work or play.&lt;br /&gt;
&amp;lt;br/&amp;gt;&amp;lt;br/&amp;gt;&amp;lt;br/&amp;gt;&lt;br /&gt;
&amp;lt;center&amp;gt;&amp;lt;font size=&amp;quot;+2&amp;quot;&amp;gt;[http://www.xiph.org/video/vid1.shtml Download or Watch online]&amp;lt;/font&amp;gt;&amp;lt;/center&amp;gt;&lt;br /&gt;
&amp;lt;br style=&amp;quot;clear:both;&amp;quot;/&amp;gt;&lt;br /&gt;
Players supporting WEBM: [http://www.videolan.org/vlc/ VLC 1.1+], [https://www.mozilla.com/en-US/firefox/all-beta.html Firefox 4 (beta)], [http://www.chromium.org/getting-involved/dev-channel Chrome (development versions)], [http://www.opera.com/ Opera], [http://www.webmproject.org/users/ more…]&lt;br /&gt;
&lt;br /&gt;
Players supporting Ogg/Theora: [http://www.videolan.org/vlc/ VLC], [http://www.firefox.com/ Firefox], [http://www.opera.com/ Opera], [[TheoraSoftwarePlayers|more…]]&lt;br /&gt;
&amp;lt;br/&amp;gt;&lt;br /&gt;
&amp;lt;hr/&amp;gt;&lt;br /&gt;
&lt;br /&gt;
==Introduction==&lt;br /&gt;
[[Image:Dmpfg_000.jpg|360px|right]]&lt;br /&gt;
[[Image:Dmpfg_002.jpg|360px|right]]&lt;br /&gt;
&amp;lt;small&amp;gt;[[Talk:A_Digital_Media_Primer_For_Geeks_(episode_1)#Introduction|Discuss this section]]&amp;lt;/small&amp;gt;&lt;br /&gt;
&lt;br /&gt;
Workstations and high-end personal computers have been able to&lt;br /&gt;
manipulate digital audio pretty easily for about fifteen years now.&lt;br /&gt;
It&#039;s only been about five years that a decent workstation&#039;s been able&lt;br /&gt;
to handle raw video without a lot of expensive special purpose&lt;br /&gt;
hardware.&lt;br /&gt;
&lt;br /&gt;
But today even most cheap home PCs have the processor power and&lt;br /&gt;
storage necessary to really toss raw video around, at least without&lt;br /&gt;
too much of a struggle. So now that everyone has all of this cheap media-capable hardware, &lt;br /&gt;
more people, not surprisingly, want to do interesting&lt;br /&gt;
things with digital media, especially streaming. YouTube was the first huge&lt;br /&gt;
success, and now everybody wants in.&lt;br /&gt;
&lt;br /&gt;
Well good!  Because this stuff is a lot of fun!&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
It&#039;s no problem finding consumers for digital media.  But here I&#039;d&lt;br /&gt;
like to address the engineers, the mathematicians, the hackers, the&lt;br /&gt;
people who are interested in discovering and making things and&lt;br /&gt;
building the technology itself. The people after my own heart.&lt;br /&gt;
&lt;br /&gt;
Digital media, compression especially, is perceived to be super-elite,&lt;br /&gt;
somehow incredibly more difficult than anything else in computer&lt;br /&gt;
science. The big industry players in the field don&#039;t mind this&lt;br /&gt;
perception at all; it helps justify the staggering number of very&lt;br /&gt;
basic patents they hold.  They like the image that their media&lt;br /&gt;
researchers &amp;quot;are the best of the best, so much smarter than anyone&lt;br /&gt;
else that their brilliant ideas can&#039;t even be understood by mere&lt;br /&gt;
mortals.&amp;quot; This is bunk.  &lt;br /&gt;
&lt;br /&gt;
Digital audio and video and streaming and compression offer endless&lt;br /&gt;
deep and stimulating mental challenges, just like any other&lt;br /&gt;
discipline. It seems elite because so few people have been&lt;br /&gt;
involved.  So few people have been involved perhaps because so few&lt;br /&gt;
people could afford the expensive, special-purpose equipment it&lt;br /&gt;
required. But today, just about anyone watching this video has a&lt;br /&gt;
cheap, general-purpose computer powerful enough to play with the big&lt;br /&gt;
boys. There are battles going on today around HTML5 and browsers and&lt;br /&gt;
video and open vs. closed.  So now is a pretty good time to get&lt;br /&gt;
involved.  The easiest place to start is probably understanding the&lt;br /&gt;
technology we have right now.&lt;br /&gt;
&lt;br /&gt;
This is an introduction. Since it&#039;s an introduction, it glosses over a&lt;br /&gt;
ton of details so that the big picture&#039;s a little easier to see.&lt;br /&gt;
Quite a few people watching are going to be way past anything that I&#039;m&lt;br /&gt;
talking about, at least for now.  On the other hand, I&#039;m probably&lt;br /&gt;
going to go too fast for folks who really are brand new to all of&lt;br /&gt;
this, so if this is all new, relax. The important thing is to pick out&lt;br /&gt;
any ideas that really grab your imagination. Especially pay attention&lt;br /&gt;
to the terminology surrounding those ideas, because with those, and&lt;br /&gt;
Google, and Wikipedia, you can dig as deep as interests you.&lt;br /&gt;
&lt;br /&gt;
So, without any further ado, welcome to one hell of a new hobby.&lt;br /&gt;
&lt;br /&gt;
&amp;lt;center&amp;gt;&amp;lt;div style=&amp;quot;background-color:#DDDDFF;border-color:#CCCCDD;border-style:solid;width:80%;padding:0 1em 1em 1em;text-align:left;&amp;quot;&amp;gt;&lt;br /&gt;
&#039;&#039;&#039;Going deeper…&#039;&#039;&#039;&lt;br /&gt;
*[http://www.xiph.org/about/ About Xiph.Org]: Why you should care about open media&lt;br /&gt;
*[http://www.0xdeadbeef.com/weblog/2010/01/html5-video-and-h-264-what-history-tells-us-and-why-were-standing-with-the-web/ HTML5 Video and H.264: what history tells us and why we&#039;re standing with the web]: Chris Blizzard of Mozilla on free formats and the open web&lt;br /&gt;
*[http://diveintohtml5.org/video.html Dive into HTML5]: tutorial on HTML5 web video&lt;br /&gt;
*[http://webchat.freenode.net/?channels=vp8 Chat with the creators of the video] via freenode IRC in #vp8.&lt;br /&gt;
&amp;lt;/div&amp;gt;&amp;lt;/center&amp;gt;&lt;br /&gt;
&lt;br /&gt;
&amp;lt;br style=&amp;quot;clear:both;&amp;quot;/&amp;gt;&lt;br /&gt;
&lt;br /&gt;
==Analog vs Digital==&lt;br /&gt;
[[Image:Dmpfg_004.jpg|360px|right]]&lt;br /&gt;
[[Image:Dmpfg_006.jpg|360px|right]]&lt;br /&gt;
[[Image:Dmpfg_007.jpg|360px|right]]&lt;br /&gt;
&amp;lt;small&amp;gt;[[Talk:A_Digital_Media_Primer_For_Geeks_(episode_1)#Analog_vs_Digital|Discuss this section]]&amp;lt;/small&amp;gt;&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
[[WikiPedia:Sound|Sound]] is the propagation of pressure waves through air, spreading out&lt;br /&gt;
from a source like ripples spread from a stone tossed into a pond.  A&lt;br /&gt;
microphone, or the human ear for that matter, transforms these passing&lt;br /&gt;
ripples of pressure into an electric signal.  Right, this is&lt;br /&gt;
middle school science class, everyone remembers this.  Moving on.&lt;br /&gt;
&lt;br /&gt;
That audio signal is a one-dimensional function, a single value&lt;br /&gt;
varying over time.  If we slow the [[WikiPedia:Oscilloscope|&#039;scope]] down a bit... that should be&lt;br /&gt;
a little easier to see. A few other aspects of the signal are&lt;br /&gt;
important. It&#039;s [[WikiPedia:Continuous_function|continuous]] in both value and time; that is, at any&lt;br /&gt;
given time it can have any real value, and there&#039;s a smoothly varying&lt;br /&gt;
value at every point in time.  No matter how much we zoom in, there&lt;br /&gt;
are no discontinuities, no singularities, no instantaneous steps or&lt;br /&gt;
points where the signal ceases to exist. It&#039;s defined&lt;br /&gt;
everywhere. Classic continuous math works very well on these signals.&lt;br /&gt;
&lt;br /&gt;
A digital signal on the other hand is [[WikiPedia:Discrete_math|discrete]] in both value and time.&lt;br /&gt;
In the simplest and most common system, called [[WikiPedia:Pulse code modulation|Pulse Code Modulation]],&lt;br /&gt;
one of a fixed number of possible values directly represents the&lt;br /&gt;
instantaneous signal amplitude at points in time spaced a fixed&lt;br /&gt;
distance apart.  The end result is a stream of digits.&lt;br /&gt;
&lt;br /&gt;
Now this looks an awful lot like this.  It seems intuitive that we&lt;br /&gt;
should somehow be able to rigorously transform one into the other, and&lt;br /&gt;
good news, the [[WikiPedia:Nyquist-Shannon sampling theorem|Sampling Theorem]] says we can and tells us&lt;br /&gt;
how. Published in its most recognizable form by [[WikiPedia:Claude Shannon|Claude Shannon]] in 1949&lt;br /&gt;
and built on the work of [[WikiPedia:Harry Nyquist|Nyquist]], and [[WikiPedia:Ralph Hartley|Hartley]], and tons of others, the&lt;br /&gt;
sampling theorem not only states that we can go back and&lt;br /&gt;
forth between analog and digital, but also lays&lt;br /&gt;
down a set of conditions under which conversion is lossless and the two&lt;br /&gt;
representations become equivalent and interchangeable.  When the&lt;br /&gt;
lossless conditions aren&#039;t met, the sampling theorem tells us how and&lt;br /&gt;
how much information is lost or corrupted.&lt;br /&gt;
&lt;br /&gt;
Up until very recently, analog technology was the basis for&lt;br /&gt;
practically everything done with audio, and that&#039;s not because most&lt;br /&gt;
audio comes from an originally analog source.  You may also think that&lt;br /&gt;
since computers are fairly recent, analog signal technology must have&lt;br /&gt;
come first.  Nope. Digital is actually older.  The [[WikiPedia:Telegraph|telegraph]] predates&lt;br /&gt;
the telephone by half a century and was already fully mechanically&lt;br /&gt;
automated by the 1860s, sending coded, multiplexed digital signals&lt;br /&gt;
long distances. You know... [[WikiPedia:Tickertape|tickertape]]. Harry Nyquist of [[WikiPedia:Bell_labs|Bell Labs]] was&lt;br /&gt;
researching telegraph pulse transmission when he published his&lt;br /&gt;
description of what later became known as the [[WikiPedia:Nyquist_frequency|Nyquist frequency]], the&lt;br /&gt;
core concept of the sampling theorem.  Now, it&#039;s true the telegraph&lt;br /&gt;
was transmitting symbolic information, text, not a digitized analog&lt;br /&gt;
signal, but with the advent of the telephone and radio, analog and&lt;br /&gt;
digital signal technology progressed rapidly and side-by-side.&lt;br /&gt;
&lt;br /&gt;
Audio had always been manipulated as an analog signal because... well,&lt;br /&gt;
gee, it&#039;s so much easier.  A [[WikiPedia:Low-pass_filter#Continuous-time_low-pass_filters|second-order low-pass filter]], for example,&lt;br /&gt;
requires two passive components.  An all-analog [[WikiPedia:Short-time_Fourier_transform|short-time Fourier&lt;br /&gt;
transform]], a few hundred.  Well, maybe a thousand if you want to build&lt;br /&gt;
something really fancy (bang on the [http://www.testequipmentdepot.com/usedequipment/hewlettpackard/spectrumanalyzers/3585a.htm 3585]).  Processing signals&lt;br /&gt;
digitally requires millions to billions of transistors running at&lt;br /&gt;
microwave frequencies, support hardware at very least to digitize and&lt;br /&gt;
reconstruct the analog signals, a complete software ecosystem for&lt;br /&gt;
programming and controlling that billion-transistor juggernaut,&lt;br /&gt;
digital storage just in case you want to keep any of those bits for&lt;br /&gt;
later...&lt;br /&gt;
&lt;br /&gt;
So we come to the conclusion that analog is the only practical way to&lt;br /&gt;
do much with audio... well, unless you happen to have a billion&lt;br /&gt;
transistors and all the other things just lying around. And [[WikiPedia:File:Transistor_Count_and_Moore&#039;s_Law_-_2008.svg|since we&lt;br /&gt;
do]], digital signal processing becomes very attractive.&lt;br /&gt;
&lt;br /&gt;
For one thing, analog componentry just doesn&#039;t have the flexibility of&lt;br /&gt;
a general purpose computer.  Adding a new function to this&lt;br /&gt;
beast... yeah, it&#039;s probably not going to happen.  On a digital&lt;br /&gt;
processor though, just write a new program.  Software isn&#039;t trivial,&lt;br /&gt;
but it is a lot easier.&lt;br /&gt;
&lt;br /&gt;
Perhaps more importantly, though, every analog component is an&lt;br /&gt;
approximation. There&#039;s no such thing as a perfect transistor, or a&lt;br /&gt;
perfect inductor, or a perfect capacitor.  In analog, every component&lt;br /&gt;
adds [[WikiPedia:Johnson–Nyquist_noise|noise]] and [[WikiPedia:Distortion#Electronic_signals|distortion]], usually not very much, but it adds up. Just&lt;br /&gt;
transmitting an analog signal, especially over long distances,&lt;br /&gt;
progressively, measurably, irretrievably corrupts it.  Besides, all of&lt;br /&gt;
those single-purpose analog components take up a lot of space.  Two&lt;br /&gt;
lines of code on the billion transistors back here can implement a&lt;br /&gt;
filter that would require an [[WikiPedia:Inductor|inductor]] the size of a refrigerator.&lt;br /&gt;
&lt;br /&gt;
Digital systems don&#039;t have these drawbacks.  Digital signals can be&lt;br /&gt;
stored, copied, manipulated, and transmitted without adding any noise&lt;br /&gt;
or distortion. We do use [[WikiPedia:Lossy_compression|lossy]] algorithms from time to time, but the&lt;br /&gt;
only unavoidably non-ideal steps are digitization and reconstruction,&lt;br /&gt;
where digital has to interface with all of that messy analog.  Messy&lt;br /&gt;
or not, modern [[WikiPedia:Digital-to-analog_converter|conversion stages]] are very, very good.  By the&lt;br /&gt;
standards of our ears, we can consider them practically lossless as&lt;br /&gt;
well.&lt;br /&gt;
&lt;br /&gt;
With a little extra hardware, then, most of which is now small and&lt;br /&gt;
inexpensive due to our modern industrial infrastructure, digital audio&lt;br /&gt;
is the clear winner over analog.  So let us then go about storing it,&lt;br /&gt;
copying it, manipulating it, and transmitting it.&lt;br /&gt;
&lt;br /&gt;
&amp;lt;center&amp;gt;&amp;lt;div style=&amp;quot;background-color:#DDDDFF;border-color:#CCCCDD;border-style:solid;width:80%;padding:0 1em 1em 1em;text-align:left;&amp;quot;&amp;gt;&lt;br /&gt;
&#039;&#039;&#039;Going deeper…&#039;&#039;&#039;&lt;br /&gt;
*Wikipedia: [[WikiPedia:Nyquist–Shannon_sampling_theorem|Nyquist–Shannon sampling theorem]]&lt;br /&gt;
*MIT OpenCourseWare [http://ocw.mit.edu/courses/electrical-engineering-and-computer-science/6-003-signals-and-systems-spring-2010/lecture-notes/ Lecture notes from 6.003 signals and systems.]&lt;br /&gt;
*Wikipedia: [[WikiPedia:Passive_analogue_filter_development|The history of analog filters]] such as the [[WikiPedia:RC circuit|RC low-pass]] shown connected to the spectrum analyzer in the video.&lt;br /&gt;
&amp;lt;/div&amp;gt;&amp;lt;/center&amp;gt;&lt;br /&gt;
&lt;br /&gt;
&amp;lt;br style=&amp;quot;clear:both;&amp;quot;/&amp;gt;&lt;br /&gt;
&lt;br /&gt;
==Raw (digital audio) meat==&lt;br /&gt;
&amp;lt;small&amp;gt;[[Talk:A_Digital_Media_Primer_For_Geeks_(episode_1)#Raw_.28digital_audio.29_meat|Discuss this section]]&amp;lt;/small&amp;gt;&lt;br /&gt;
&lt;br /&gt;
Pulse Code Modulation is the most common representation for &lt;br /&gt;
raw audio.  Other practical representations do exist: for example, the&lt;br /&gt;
[[WikiPedia:Delta-sigma_modulation|Sigma-Delta coding]] used by the [[WikiPedia:Super_Audio_CD|SACD]], which is a form of [[wikipedia:Pulse-density_modulation|Pulse Density&lt;br /&gt;
Modulation]].  That said, Pulse Code Modulation is far&lt;br /&gt;
and away dominant, mainly because it&#039;s so mathematically&lt;br /&gt;
convenient.  An audio engineer can spend an entire career without&lt;br /&gt;
running into anything else.&lt;br /&gt;
&lt;br /&gt;
PCM encoding can be characterized in three parameters, making it easy&lt;br /&gt;
to account for every possible PCM variant with mercifully little&lt;br /&gt;
hassle.&lt;br /&gt;
&amp;lt;br style=&amp;quot;clear:both;&amp;quot;/&amp;gt;&lt;br /&gt;
===sample rate===&lt;br /&gt;
[[Image:Dmpfg_009.jpg|360px|right]]&lt;br /&gt;
[[Image:Dmpfg_008.jpg|360px|right]]&lt;br /&gt;
The first parameter is the [[wikipedia:Sampling_rate|sampling rate]].  The highest frequency an&lt;br /&gt;
encoding can represent is called the Nyquist Frequency.  The Nyquist&lt;br /&gt;
frequency of PCM happens to be exactly half the sampling rate.&lt;br /&gt;
Therefore, the sampling rate directly determines the highest possible&lt;br /&gt;
frequency in the digitized signal.&lt;br /&gt;
&lt;br /&gt;
Analog telephone systems traditionally [[wikipedia:Bandlimiting|band-limited]] voice channels to&lt;br /&gt;
just under 4kHz, so digital telephony and most classic voice&lt;br /&gt;
applications use an 8kHz sampling rate: the minimum sampling rate&lt;br /&gt;
necessary to capture the entire bandwidth of a 4kHz channel.  This is&lt;br /&gt;
what an 8kHz sampling rate sounds like&amp;amp;mdash;a bit muffled but perfectly&lt;br /&gt;
intelligible for voice.  This is the lowest sampling rate that&#039;s ever&lt;br /&gt;
been used widely in practice.&lt;br /&gt;
&lt;br /&gt;
From there, as power, and memory, and storage increased, consumer&lt;br /&gt;
computer hardware went to offering 11, and then 16, and then 22, and&lt;br /&gt;
then 32kHz sampling.  With each increase in the sampling rate and the&lt;br /&gt;
Nyquist frequency, it&#039;s obvious that the high end becomes a little&lt;br /&gt;
clearer and the sound more natural.&lt;br /&gt;
&lt;br /&gt;
The Compact Disc uses a 44.1kHz sampling rate, which is again slightly&lt;br /&gt;
better than 32kHz, but the gains are becoming less distinct.  44.1kHz&lt;br /&gt;
is a bit of an oddball choice, especially given that it hadn&#039;t been&lt;br /&gt;
used for anything prior to the compact disc, but the huge success of&lt;br /&gt;
the CD has made it a common rate.&lt;br /&gt;
&lt;br /&gt;
The most common hi-fidelity sampling rate aside from the CD is 48kHz.&lt;br /&gt;
There&#039;s virtually no audible difference between the two.  This video,&lt;br /&gt;
or at least the original version of it, was shot and produced with&lt;br /&gt;
48kHz audio, which happens to be the original standard for&lt;br /&gt;
high-fidelity audio with video.&lt;br /&gt;
&lt;br /&gt;
Super-hi-fidelity sampling rates of 88, and 96, and 192kHz have also&lt;br /&gt;
appeared. The reason for the sampling rates beyond 48kHz isn&#039;t to&lt;br /&gt;
extend the audible high frequencies further. It&#039;s for a different&lt;br /&gt;
reason.&lt;br /&gt;
&lt;br /&gt;
Stepping back for just a second, the French mathematician [[wikipedia:Joseph_Fourier|Jean&lt;br /&gt;
Baptiste Joseph Fourier]] showed that we can also think of signals like&lt;br /&gt;
audio as a set of component frequencies.  This [[wikipedia:Frequency_domain|frequency-domain]]&lt;br /&gt;
representation is equivalent to the time representation; the signal is&lt;br /&gt;
exactly the same, we&#039;re just looking at it [[wikipedia:Basis_(linear_algebra)|a different way]].  Here we see the&lt;br /&gt;
frequency-domain representation of a hypothetical analog signal we&lt;br /&gt;
intend to digitally sample.&lt;br /&gt;
&lt;br /&gt;
The sampling theorem tells us two essential things about the sampling&lt;br /&gt;
process. First, that a digital signal can&#039;t represent any&lt;br /&gt;
frequencies above the Nyquist frequency. Second, and this is the new&lt;br /&gt;
part, if we don&#039;t remove those frequencies with a low-pass filter&lt;br /&gt;
before sampling, the sampling process will fold them down into the&lt;br /&gt;
representable frequency range as [[wikipedia:Aliasing|aliasing distortion]].&lt;br /&gt;
&lt;br /&gt;
Aliasing, in a nutshell, sounds freakin&#039; awful, so it&#039;s essential to&lt;br /&gt;
remove any beyond-Nyquist frequencies before sampling and after&lt;br /&gt;
reconstruction.&lt;br /&gt;
&lt;br /&gt;
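The fold-down is easy to compute; a little helper (ours, for&lt;br /&gt;
illustration) gives the frequency a tone lands on after sampling:&lt;br /&gt;
&lt;br /&gt;
 import numpy as np&lt;br /&gt;
 &lt;br /&gt;
 def alias(f, rate):&lt;br /&gt;
     # where a tone at frequency f lands after sampling: the spectrum&lt;br /&gt;
     # repeats every rate Hz and folds about the Nyquist frequency&lt;br /&gt;
     f = f % rate&lt;br /&gt;
     return np.minimum(f, rate - f)&lt;br /&gt;
 &lt;br /&gt;
 print(alias(26000.0, 48000.0))  # 22000.0: 26kHz folds down to 22kHz&lt;br /&gt;
&lt;br /&gt;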
Human frequency perception is considered to extend to about 20kHz. In&lt;br /&gt;
44.1 or 48kHz sampling, the low pass before the sampling stage has to&lt;br /&gt;
be extremely sharp to avoid cutting any audible frequencies below&lt;br /&gt;
[[wikipedia:Hearing_range|20kHz]] but still not allow frequencies above the Nyquist to leak&lt;br /&gt;
forward into the sampling process.  This is a difficult filter to&lt;br /&gt;
build, and no practical filter succeeds completely. If the sampling&lt;br /&gt;
rate is 96kHz or 192kHz on the other hand, the low pass has an extra&lt;br /&gt;
[[wikipedia:Octave_(electronics)|octave]] or two for its transition band. This is a much easier filter to&lt;br /&gt;
build.  Sampling rates beyond 48kHz are actually one of those messy&lt;br /&gt;
analog stage compromises.&lt;br /&gt;
&amp;lt;br style=&amp;quot;clear:both;&amp;quot;/&amp;gt;&lt;br /&gt;
===sample format===&lt;br /&gt;
[[Image:Dmpfg_anim.gif|right]]&lt;br /&gt;
&lt;br /&gt;
The second fundamental PCM parameter is the sample format; that is,&lt;br /&gt;
the format of each digital number.  A number is a number, but a number&lt;br /&gt;
can be represented in bits a number of different ways.&lt;br /&gt;
&lt;br /&gt;
Early PCM was [[wikipedia:Quantization_(sound_processing)#Audio_quantization|eight-bit]] [[wikipedia:Linear_pulse_code_modulation|linear]], encoded as an [[wikipedia:Signedness|unsigned]] [[wikipedia:Integer_(computer_science)#Bytes_and_octets|byte]].  The&lt;br /&gt;
[[wikipedia:Dynamic_range#Audio|dynamic range]] is limited to about [[wikipedia:Decibel|50dB]]  and the [[wikipedia:Quantization_error|quantization noise]], as&lt;br /&gt;
you can hear, is pretty severe.  Eight-bit audio is vanishingly rare&lt;br /&gt;
today.&lt;br /&gt;
&lt;br /&gt;
Digital telephony typically uses one of two related non-linear eight&lt;br /&gt;
bit encodings called [[wikipedia:A-law_algorithm|A-law]] and [[wikipedia:Μ-law_algorithm|μ-law]]. These formats encode a roughly&lt;br /&gt;
[[wikipedia:Audio_bit_depth#Dynamic_range|14 bit dynamic range]] into eight bits by spacing the higher amplitude&lt;br /&gt;
values farther apart. A-law and mu-law obviously improve quantization&lt;br /&gt;
noise compared to linear 8-bit, and voice harmonics especially hide&lt;br /&gt;
the remaining quantization noise well. All three eight-bit encodings,&lt;br /&gt;
linear, A-law, and mu-law, are typically paired with an 8kHz sampling&lt;br /&gt;
rate, though I&#039;m demonstrating them here at 48kHz.&lt;br /&gt;
&lt;br /&gt;
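The continuous mu-law curve is simple to write down; G.711 actually&lt;br /&gt;
uses a segmented approximation of it, but this sketch shows the idea:&lt;br /&gt;
&lt;br /&gt;
 import numpy as np&lt;br /&gt;
 &lt;br /&gt;
 MU = 255.0&lt;br /&gt;
 &lt;br /&gt;
 def mulaw_encode(x):&lt;br /&gt;
     # compress a -1..1 signal; small values get most of the resolution&lt;br /&gt;
     return np.sign(x) * np.log1p(MU * np.abs(x)) / np.log1p(MU)&lt;br /&gt;
 &lt;br /&gt;
 def mulaw_decode(y):&lt;br /&gt;
     return np.sign(y) * np.expm1(np.abs(y) * np.log1p(MU)) / MU&lt;br /&gt;
 &lt;br /&gt;
 x = np.linspace(-1.0, 1.0, 5)&lt;br /&gt;
 codes = np.round(mulaw_encode(x) * 127) / 127  # quantize to 8 bits&lt;br /&gt;
 back = mulaw_decode(codes)&lt;br /&gt;
&lt;br /&gt;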
Most modern PCM uses 16- or 24-bit [[wikipedia:Two&#039;s_complement|two&#039;s-complement]] signed integers to&lt;br /&gt;
encode the range from negative infinity to zero decibels in 16 or 24&lt;br /&gt;
bits of precision. The maximum absolute value corresponds to zero decibels.&lt;br /&gt;
As with all the sample formats so far, signals beyond zero decibels, and thus&lt;br /&gt;
beyond the maximum representable range, are [[wikipedia:Clipping_(audio)|clipped]].&lt;br /&gt;
&lt;br /&gt;
In mixing and mastering, it&#039;s not unusual to use [[wikipedia:Floating_point|floating-point]]&lt;br /&gt;
numbers for PCM instead of [[wikipedia:Integer_(computer_science)|integers]].  A 32 bit [[wikipedia:IEEE_754-2008|IEEE754]] float, that&#039;s&lt;br /&gt;
the normal kind of floating point you see on current computers, has 24&lt;br /&gt;
bits of resolution, but an eight bit floating-point exponent increases&lt;br /&gt;
the representable range.  Floating point usually represents zero&lt;br /&gt;
decibels as +/-1.0, and because floats can obviously represent&lt;br /&gt;
considerably beyond that, temporarily exceeding zero decibels during&lt;br /&gt;
the mixing process doesn&#039;t cause clipping.  Floating-point PCM takes&lt;br /&gt;
up more space, so it tends to be used only as an intermediate&lt;br /&gt;
production format.&lt;br /&gt;
&lt;br /&gt;
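The conversion from float back to integer PCM is where that clipping&lt;br /&gt;
happens; a minimal sketch, assuming the common 32767 scaling&lt;br /&gt;
convention:&lt;br /&gt;
&lt;br /&gt;
 import numpy as np&lt;br /&gt;
 &lt;br /&gt;
 def float_to_int16(x):&lt;br /&gt;
     # 0dB is +/- 1.0 in float; anything beyond that range clips&lt;br /&gt;
     # here, exactly as it would in the integer formats&lt;br /&gt;
     return np.round(np.clip(x, -1.0, 1.0) * 32767.0).astype(np.int16)&lt;br /&gt;
&lt;br /&gt;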
Lastly, most general purpose computers still read and&lt;br /&gt;
write data in octet bytes, so it&#039;s important to remember that samples&lt;br /&gt;
bigger than eight bits can be in [[wikipedia:Endianness|big- or little-endian order]], and both&lt;br /&gt;
endiannesses are common.  For example, Microsoft [[wikipedia:WAV|WAV]] files are little-endian,&lt;br /&gt;
and Apple [[wikipedia:AIFC|AIFC]] files tend to be big-endian.  Be aware of it.&lt;br /&gt;
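&lt;br /&gt;
A quick numpy demonstration of the same samples in both byte orders:&lt;br /&gt;
&lt;br /&gt;
 import numpy as np&lt;br /&gt;
 &lt;br /&gt;
 samples = np.array([1, 256], dtype=np.int16)&lt;br /&gt;
 little = samples.tobytes()  # native order on most current machines&lt;br /&gt;
 big = samples.astype(np.dtype(np.int16).newbyteorder()).tobytes()&lt;br /&gt;
 print(little.hex())  # 01000001 on a little-endian machine&lt;br /&gt;
 print(big.hex())     # 00010100: same samples, opposite byte order&lt;br /&gt;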
&amp;lt;br style=&amp;quot;clear:both;&amp;quot;/&amp;gt;&lt;br /&gt;
===channels===&lt;br /&gt;
&lt;br /&gt;
The third PCM parameter is the number of [[wikipedia:Multichannel_audio|channels]].  The convention in&lt;br /&gt;
raw PCM is to encode multiple channels by interleaving the samples of&lt;br /&gt;
each channel together into a single stream.  Straightforward and extensible.&lt;br /&gt;
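&lt;br /&gt;
Interleaving a stereo pair, sketched in numpy:&lt;br /&gt;
&lt;br /&gt;
 import numpy as np&lt;br /&gt;
 &lt;br /&gt;
 left = np.arange(4)         # stand-in samples, left channel&lt;br /&gt;
 right = np.arange(4) + 100  # and right&lt;br /&gt;
 stream = np.empty(8, dtype=left.dtype)&lt;br /&gt;
 stream[0::2] = left         # L R L R L R L R&lt;br /&gt;
 stream[1::2] = right&lt;br /&gt;
 print(stream)               # [  0 100   1 101   2 102   3 103]&lt;br /&gt;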
&amp;lt;br style=&amp;quot;clear:both;&amp;quot;/&amp;gt;&lt;br /&gt;
===done!===&lt;br /&gt;
&lt;br /&gt;
And that&#039;s it!  That describes every PCM representation ever.  Done.&lt;br /&gt;
Digital audio is &#039;&#039;so easy&#039;&#039;!  There&#039;s more to do of course, but at this&lt;br /&gt;
point we&#039;ve got a nice useful chunk of audio data, so let&#039;s get some&lt;br /&gt;
video too.&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
&amp;lt;center&amp;gt;&amp;lt;div style=&amp;quot;background-color:#DDDDFF;border-color:#CCCCDD;border-style:solid;width:80%;padding:0 1em 1em 1em;text-align:left;&amp;quot;&amp;gt;&lt;br /&gt;
&#039;&#039;&#039;Going deeper…&#039;&#039;&#039;&lt;br /&gt;
* [[wikipedia:Roll-off|Wikipedia&#039;s article on filter roll-off]], to learn why it&#039;s hard to build analog filters with a very narrow [[wikipedia:Transition_band|transition band]] between the [[wikipedia:Passband|passband]] and the [[wikipedia:Stopband|stopband]].  Filters that achieve such hard edges often do so at the expense of increased [[wikipedia:Ripple_(filters)#Frequency-domain_ripple|ripple]] and [http://www.ocf.berkeley.edu/~ashon/audio/phase/phaseaud2.htm phase distortion].&lt;br /&gt;
* [http://wiki.multimedia.cx/index.php?title=PCM Some more minutiae] about PCM in practice.&lt;br /&gt;
* [[wikipedia:DPCM|DPCM]] and [[wikipedia:ADPCM|ADPCM]], simple audio codecs loosely inspired by PCM.&lt;br /&gt;
&amp;lt;/div&amp;gt;&amp;lt;/center&amp;gt;&lt;br /&gt;
&lt;br /&gt;
&amp;lt;br style=&amp;quot;clear:both;&amp;quot;/&amp;gt;&lt;br /&gt;
&lt;br /&gt;
==Video vegetables (they&#039;re good for you!)==&lt;br /&gt;
[[Image:Dmpfg_010.jpg|360px|right]]&lt;br /&gt;
&amp;lt;small&amp;gt;[[Talk:A_Digital_Media_Primer_For_Geeks_(episode_1)#Video_vegetables_.28they.27re_good_for_you.21.29|Discuss this section]]&amp;lt;/small&amp;gt;&lt;br /&gt;
&lt;br /&gt;
One could think of video as being like audio but with two additional&lt;br /&gt;
spatial dimensions, X and Y, in addition to the dimension of time.&lt;br /&gt;
This is mathematically sound. The Sampling Theorem applies to all&lt;br /&gt;
three video dimensions just as it does to the single time dimension of&lt;br /&gt;
audio.&lt;br /&gt;
&lt;br /&gt;
Audio and video are obviously quite different in practice. For one,&lt;br /&gt;
compared to audio, video is huge. [[wikipedia:Red_Book_(audio_Compact_Disc_standard)#Technical_details|Raw CD audio]] is about 1.4 megabits&lt;br /&gt;
per second. Raw [[wikipedia:1080i|1080i]] HD video is over 700 megabits per second. That&#039;s&lt;br /&gt;
more than 500 times more data to capture, process, and store per&lt;br /&gt;
second.  By [[wikipedia:Moore&#039;s_law|Moore&#039;s law]]... that&#039;s... let&#039;s see... roughly nine&lt;br /&gt;
doublings at eighteen months or so each, so yeah, computers requiring about an extra&lt;br /&gt;
fifteen years to handle raw video after getting raw audio down pat was&lt;br /&gt;
about right.&lt;br /&gt;
&lt;br /&gt;
Basic raw video is also just more complex than basic raw audio. The&lt;br /&gt;
sheer volume of data currently necessitates a representation more&lt;br /&gt;
efficient than the linear PCM used for audio.  In addition, electronic&lt;br /&gt;
video comes almost entirely from broadcast television alone, and the&lt;br /&gt;
standards committees that govern broadcast video have always been very&lt;br /&gt;
concerned with backward compatibility.  Up until just last year in the&lt;br /&gt;
US, a sixty-year-old black and white television could still show a&lt;br /&gt;
normal [[wikipedia:NTSC|analog television broadcast]].  That&#039;s actually a really neat&lt;br /&gt;
trick.&lt;br /&gt;
&lt;br /&gt;
The downside to backward compatibility is that once a detail makes it&lt;br /&gt;
into a standard, you can&#039;t ever really throw it out again. Electronic&lt;br /&gt;
video has never started over from scratch the way audio has multiple&lt;br /&gt;
times.  Sixty years worth of clever but obsolete hacks necessitated by&lt;br /&gt;
the passing technology of a given era have built up into quite a pile,&lt;br /&gt;
and because digital standards also come from broadcast television, all&lt;br /&gt;
these eldritch hacks have been brought forward into the digital&lt;br /&gt;
standards as well.&lt;br /&gt;
&lt;br /&gt;
In short, there are a whole lot more details involved in digital video&lt;br /&gt;
than there were with audio. There&#039;s no hope of covering them&lt;br /&gt;
all completely here, so we&#039;ll cover the broad fundamentals.&lt;br /&gt;
&amp;lt;br style=&amp;quot;clear:both;&amp;quot;/&amp;gt;&lt;br /&gt;
===resolution and aspect===&lt;br /&gt;
[[Image:Dmpfg_011.jpg|360px|right]]&lt;br /&gt;
&lt;br /&gt;
The most obvious raw video parameters are the width and height of the&lt;br /&gt;
picture in pixels. As simple as that may sound, the pixel dimensions&lt;br /&gt;
alone don&#039;t actually specify the absolute width and height of the&lt;br /&gt;
picture, as most broadcast-derived video doesn&#039;t use square pixels.&lt;br /&gt;
The number of [[wikipedia:Scan_line|scanlines]] in a broadcast image was fixed, but the&lt;br /&gt;
effective number of horizontal pixels was a function of channel&lt;br /&gt;
[[wikipedia:Bandwidth_(signal_processing)|bandwidth]]. Effective horizontal resolution could result in pixels that&lt;br /&gt;
were either narrower or wider than the spacing between scanlines.&lt;br /&gt;
&lt;br /&gt;
Standards have generally specified that digitally sampled video should&lt;br /&gt;
reflect the real resolution of the original analog source, so a large&lt;br /&gt;
amount of digital video also uses non-square pixels. For example, a&lt;br /&gt;
normal 4:3 aspect NTSC DVD is typically encoded with a display&lt;br /&gt;
resolution of [[wikipedia:DVD-Video#Frame_size_and_frame_rate|704 by 480]], a ratio wider than 4:3.  In this case, the&lt;br /&gt;
pixels themselves are assigned an aspect ratio of [[wikipedia:Standard-definition_television#Resolution|10:11]], making them&lt;br /&gt;
taller than they are wide and narrowing the image horizontally to the&lt;br /&gt;
correct aspect.  Such an image has to be resampled to show properly on&lt;br /&gt;
a digital display with square pixels.&lt;br /&gt;
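&lt;br /&gt;
The arithmetic is worth seeing once:&lt;br /&gt;
&lt;br /&gt;
 # 704 luma columns of 10:11-aspect pixels span the same width as&lt;br /&gt;
 # 704 * 10/11 = 640 square pixels, and 640 x 480 is exactly 4:3&lt;br /&gt;
 display_width = 704 * 10 / 11&lt;br /&gt;
 print(display_width, display_width / 480)  # 640.0 1.333...&lt;br /&gt;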
&amp;lt;br style=&amp;quot;clear:both;&amp;quot;/&amp;gt;&lt;br /&gt;
===frame rate and interlacing===&lt;br /&gt;
[[Image:Dmpfg_012.jpg|360px|right]]&lt;br /&gt;
&lt;br /&gt;
The second obvious video parameter is the [[wikipedia:Frame_rate|frame rate]], the number of&lt;br /&gt;
full frames per second.  Several standard frame rates are in active&lt;br /&gt;
use. Digital video, in one form or another, can use all of them.  Or,&lt;br /&gt;
any other frame rate.  Or even variable rates where the frame rate&lt;br /&gt;
changes adaptively over the course of the video. The higher the frame&lt;br /&gt;
rate, the smoother the motion, and that brings us, unfortunately, to&lt;br /&gt;
[[wikipedia:Interlace|interlacing]].&lt;br /&gt;
&lt;br /&gt;
In the very earliest days of broadcast video, engineers sought the&lt;br /&gt;
fastest practical frame rate to smooth motion and to minimize [[wikipedia:Flicker_(screen)|flicker]]&lt;br /&gt;
on phosphor-based [[wikipedia:Cathode_ray_tube|CRTs]].  They were also under pressure to use the&lt;br /&gt;
least possible bandwidth for the highest resolution and fastest frame&lt;br /&gt;
rate.  Their solution was to interlace the video where the even lines&lt;br /&gt;
are sent in one pass and the odd lines in the next.  Each pass is&lt;br /&gt;
called a field and two fields sort of produce one complete frame.&lt;br /&gt;
&amp;quot;Sort of&amp;quot;, because the even and odd fields aren&#039;t actually from the&lt;br /&gt;
same source frame.  In a 60 field per second picture, the source frame&lt;br /&gt;
rate is actually 60 full frames per second, and half of each frame,&lt;br /&gt;
every other line, is simply discarded.  This is why we can&#039;t&lt;br /&gt;
[[wikipedia:Deinterlacing|deinterlace]] a video simply by combining two fields into one frame;&lt;br /&gt;
they&#039;re not actually from one frame to begin with.&lt;br /&gt;
&amp;lt;br style=&amp;quot;clear:both;&amp;quot;/&amp;gt;&lt;br /&gt;
&lt;br /&gt;
===gamma===&lt;br /&gt;
[[Image:Dmpfg_013.jpg|360px|right]]&lt;br /&gt;
&lt;br /&gt;
The cathode ray tube was the only available display technology for&lt;br /&gt;
most of the history of electronic video. A CRT&#039;s output brightness is&lt;br /&gt;
nonlinear, approximately equal to the input controlling voltage raised&lt;br /&gt;
to the 2.5th power. This exponent, 2.5, is designated gamma, and so&lt;br /&gt;
it&#039;s often referred to as the gamma of a display.  Cameras, though,&lt;br /&gt;
are linear, and if you feed a CRT a linear input signal, it looks a&lt;br /&gt;
bit like this.&lt;br /&gt;
&lt;br /&gt;
As there were originally to be very few cameras, which were&lt;br /&gt;
fantastically expensive anyway, and hopefully many, many television&lt;br /&gt;
sets which had best be as inexpensive as possible, engineers decided to&lt;br /&gt;
add the necessary [[wikipedia:Gamma_correction|gamma correction]] circuitry to the cameras rather&lt;br /&gt;
than the sets. Video transmitted over the airwaves would thus have a&lt;br /&gt;
nonlinear intensity using the inverse of the set&#039;s gamma exponent, so that&lt;br /&gt;
once a camera&#039;s signal was finally displayed on the CRT, the overall&lt;br /&gt;
response of the system from camera to set was back to linear again.&lt;br /&gt;
&lt;br /&gt;
Almost.&lt;br /&gt;
&lt;br /&gt;
There were also two other tweaks. A television camera actually uses a&lt;br /&gt;
gamma exponent that&#039;s the inverse of 2.2, not 2.5.  That&#039;s just a&lt;br /&gt;
correction for viewing in a dim environment. Also, the exponential&lt;br /&gt;
curve transitions to a linear ramp near black.  That&#039;s just an old&lt;br /&gt;
hack for suppressing sensor noise in the camera.&lt;br /&gt;
&lt;br /&gt;
Gamma correction also had a lucky benefit. It just so happens that the&lt;br /&gt;
human eye has a perceptual gamma of about 3.  This is relatively close&lt;br /&gt;
to the CRT&#039;s gamma of 2.5. An image using gamma correction devotes&lt;br /&gt;
more resolution to lower intensities, where the eye happens to have&lt;br /&gt;
its finest intensity discrimination, and therefore uses the available&lt;br /&gt;
scale resolution more efficiently.  Although CRTs are currently&lt;br /&gt;
vanishing, a standard [[wikipedia:sRGB|sRGB]] computer display still uses a nonlinear&lt;br /&gt;
intensity curve similar to television, with a linear ramp near black,&lt;br /&gt;
followed by an exponential curve with a gamma exponent of 2.4. This&lt;br /&gt;
encodes a sixteen bit linear range down into eight bits.&lt;br /&gt;
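&lt;br /&gt;
The sRGB encoding curve is compact enough to quote in full, here as a&lt;br /&gt;
numpy sketch:&lt;br /&gt;
&lt;br /&gt;
 import numpy as np&lt;br /&gt;
 &lt;br /&gt;
 def srgb_encode(c):&lt;br /&gt;
     # linear light in 0..1 to sRGB: a linear ramp near black, then&lt;br /&gt;
     # a power curve with exponent 1/2.4&lt;br /&gt;
     return np.where(np.less_equal(c, 0.0031308),&lt;br /&gt;
                     12.92 * c,&lt;br /&gt;
                     1.055 * c ** (1.0 / 2.4) - 0.055)&lt;br /&gt;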
&amp;lt;br style=&amp;quot;clear:both;&amp;quot;/&amp;gt;&lt;br /&gt;
&lt;br /&gt;
===color and colorspace===&lt;br /&gt;
[[Image:Dmpfg_014.jpg|360px|right]]&lt;br /&gt;
&lt;br /&gt;
The human eye has three apparent color channels, red, green, and blue,&lt;br /&gt;
and most displays use these three colors as [[wikipedia:Additive_color|additive primaries]] to&lt;br /&gt;
produce a full range of color output.  The primary pigments in&lt;br /&gt;
printing are [[wikipedia:CMYK|Cyan, Magenta, and Yellow]] for the same reason; pigments&lt;br /&gt;
are [[wikipedia:Subtractive_color|subtractive]], and each of these pigments subtracts one pure color&lt;br /&gt;
from reflected light.  Cyan subtracts red, magenta subtracts green, and&lt;br /&gt;
yellow subtracts blue.&lt;br /&gt;
&lt;br /&gt;
Video can be, and sometimes is, represented with red, green, and blue&lt;br /&gt;
color channels, but RGB video is atypical. The human eye is far more&lt;br /&gt;
sensitive to [[wikipedia:Luminance_(relative)|luminosity]] than it is to color, and RGB tends to spread&lt;br /&gt;
the energy of an image across all three color channels.  That is, the&lt;br /&gt;
red plane looks like a red version of the original picture, the green&lt;br /&gt;
plane looks like a green version of the original picture, and the blue&lt;br /&gt;
plane looks like a blue version of the original picture.  Black and&lt;br /&gt;
white times three.  Not efficient.&lt;br /&gt;
&lt;br /&gt;
For those reasons and because, oh hey, television just happened to&lt;br /&gt;
start out as black and white anyway, video usually is represented as a&lt;br /&gt;
high resolution [[wikipedia:Luma_(video)|luma channel]]&amp;amp;mdash;the black &amp;amp; white&amp;amp;mdash;along with&lt;br /&gt;
additional, often lower resolution [[wikipedia:Chrominance|chroma channels]], the color. The&lt;br /&gt;
luma channel, Y, is produced by weighting and then adding the separate&lt;br /&gt;
red, green and blue signals.  The chroma channels U and V are then&lt;br /&gt;
produced by subtracting the luma signal from blue and the luma signal&lt;br /&gt;
from red.&lt;br /&gt;
&lt;br /&gt;
When YUV is scaled, offset, and quantized for digital video, it&#039;s&lt;br /&gt;
usually more correctly called [[wikipedia:Y&#039;CbCr|Y&#039;CbCr]], but the more generic term YUV is&lt;br /&gt;
widely used to describe all the analog and digital variants of this&lt;br /&gt;
color model.&lt;br /&gt;
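&lt;br /&gt;
As a sketch, using the Rec. 601 luma weights and one common scaling&lt;br /&gt;
of the difference signals:&lt;br /&gt;
&lt;br /&gt;
 import numpy as np&lt;br /&gt;
 &lt;br /&gt;
 def rgb_to_yuv(r, g, b):&lt;br /&gt;
     # Rec. 601 luma weights; U and V are scaled B minus Y and&lt;br /&gt;
     # R minus Y differences&lt;br /&gt;
     y = 0.299 * r + 0.587 * g + 0.114 * b&lt;br /&gt;
     u = 0.492 * (b - y)&lt;br /&gt;
     v = 0.877 * (r - y)&lt;br /&gt;
     return y, u, v&lt;br /&gt;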
&amp;lt;br style=&amp;quot;clear:both;&amp;quot;/&amp;gt;&lt;br /&gt;
&lt;br /&gt;
===chroma subsampling===&lt;br /&gt;
[[Image:Dmpfg_015.jpg|360px|right]]&lt;br /&gt;
&lt;br /&gt;
The U and V chroma channels can have the same resolution as the Y&lt;br /&gt;
channel, but because the human eye has far less spatial color&lt;br /&gt;
resolution than spatial luminosity resolution, chroma resolution is&lt;br /&gt;
usually [[wikipedia:Chroma_subsampling|halved or even quartered]] in the horizontal direction, the&lt;br /&gt;
vertical direction, or both, usually without any significant impact on the&lt;br /&gt;
apparent raw image quality.  Practically every possible subsampling&lt;br /&gt;
variant has been used at one time or another, but the common choices&lt;br /&gt;
today are [[wikipedia:Chroma_subsampling#4:4:4_Y.27CbCr|4:4:4]] video, which isn&#039;t actually subsampled at all, [[wikipedia:Chroma_subsampling#4:2:2|4:2:2]] video in&lt;br /&gt;
which the horizontal resolution of the U and V channels is halved, and&lt;br /&gt;
most common of all, [[wikipedia:Chroma_subsampling#4:2:0|4:2:0]] video in which both the horizontal and vertical&lt;br /&gt;
resolutions of the chroma channels are halved, resulting in U and V&lt;br /&gt;
planes that are each one quarter the size of Y.&lt;br /&gt;
&lt;br /&gt;
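A minimal 4:2:0 downsample by 2x2 averaging, one choice among several&lt;br /&gt;
(see the siting discussion that follows):&lt;br /&gt;
&lt;br /&gt;
 import numpy as np&lt;br /&gt;
 &lt;br /&gt;
 def subsample_420(chroma):&lt;br /&gt;
     # halve a chroma plane in both directions by averaging 2x2&lt;br /&gt;
     # blocks; this simple average sites each chroma sample between&lt;br /&gt;
     # luma pixels in both directions, JPEG-style&lt;br /&gt;
     h, w = chroma.shape&lt;br /&gt;
     return chroma.reshape(h // 2, 2, w // 2, 2).mean(axis=(1, 3))&lt;br /&gt;
&lt;br /&gt;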
The terms 4:2:2, 4:2:0, [[wikipedia:Chroma_subsampling#4:1:1|4:1:1]], and so on and so forth, aren&#039;t complete&lt;br /&gt;
descriptions of a chroma subsampling. There are multiple possible ways&lt;br /&gt;
to position the chroma pixels relative to luma, and again, several&lt;br /&gt;
variants are in active use for each subsampling.  For example, [[wikipedia:Motion_Jpeg|motion&lt;br /&gt;
JPEG]], [[wikipedia:MPEG-1#Part_2:_Video|MPEG-1 video]], [[wikipedia:MPEG-2#Video_coding_.28simplified.29|MPEG-2 video]], [[wikipedia:DV#DV_Compression|DV]], [[wikipedia:Theora|Theora]], and [[wikipedia:WebM|WebM]] all use or can&lt;br /&gt;
use 4:2:0 subsampling, but they site the chroma pixels [http://www.mir.com/DMG/chroma.html three different ways].&lt;br /&gt;
&lt;br /&gt;
Motion JPEG, MPEG-1 video, Theora and WebM all site chroma pixels&lt;br /&gt;
between luma pixels both horizontally and vertically.&lt;br /&gt;
&lt;br /&gt;
MPEG-2 sites chroma pixels between lines, but horizontally aligned with&lt;br /&gt;
every other luma pixel. Interlaced modes complicate things somewhat,&lt;br /&gt;
resulting in a siting arrangement that&#039;s a tad bizarre.&lt;br /&gt;
&lt;br /&gt;
And finally PAL-DV, which is always interlaced, places the chroma&lt;br /&gt;
pixels in the same position as every other luma pixel in the&lt;br /&gt;
horizontal direction, and vertically alternates chroma channel on&lt;br /&gt;
each line.&lt;br /&gt;
&lt;br /&gt;
That&#039;s just 4:2:0 video. I&#039;ll leave the other subsamplings as homework for the&lt;br /&gt;
viewer.  Got the basic idea, moving on.&lt;br /&gt;
&amp;lt;br style=&amp;quot;clear:both;&amp;quot;/&amp;gt;&lt;br /&gt;
&lt;br /&gt;
===pixel formats===&lt;br /&gt;
[[Image:Dmpfg_016.jpg|360px|right]]&lt;br /&gt;
&lt;br /&gt;
In audio, we always represent multiple channels in a PCM stream by&lt;br /&gt;
interleaving the samples from each channel in order. Video uses both&lt;br /&gt;
packed formats that interleave the color channels, as well as planar&lt;br /&gt;
formats that keep the pixels from each channel together in separate&lt;br /&gt;
planes stacked in order in the frame. There are at least [http://www.fourcc.org/yuv.php 50 different formats] in&lt;br /&gt;
these two broad categories with possibly ten or fifteen in common use.&lt;br /&gt;
&lt;br /&gt;
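A toy example of the two layouts, 4:4:4 for simplicity:&lt;br /&gt;
&lt;br /&gt;
 import numpy as np&lt;br /&gt;
 &lt;br /&gt;
 # planar stores each channel whole; packed interleaves the channels&lt;br /&gt;
 # pixel by pixel&lt;br /&gt;
 y = np.full((2, 4), 1, dtype=np.uint8)&lt;br /&gt;
 u = np.full((2, 4), 2, dtype=np.uint8)&lt;br /&gt;
 v = np.full((2, 4), 3, dtype=np.uint8)&lt;br /&gt;
 planar = np.concatenate([y.ravel(), u.ravel(), v.ravel()])  # YY..UU..VV..&lt;br /&gt;
 packed = np.dstack([y, u, v]).ravel()  # YUVYUVYUV...&lt;br /&gt;
&lt;br /&gt;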
Each chroma subsampling and different bit-depth requires a different&lt;br /&gt;
packing arrangement, and so a different pixel format.  For a given&lt;br /&gt;
unique subsampling, there are usually also several equivalent formats&lt;br /&gt;
that consist of trivial channel order rearrangements or repackings, due either to&lt;br /&gt;
convenience once-upon-a-time on some particular piece of hardware, or&lt;br /&gt;
sometimes just good old-fashioned spite.&lt;br /&gt;
&lt;br /&gt;
Pixel formats are described by a unique name or [[wikipedia:FourCC|fourcc]] code.  There&lt;br /&gt;
are quite a few of these and there&#039;s no sense going over each one now.&lt;br /&gt;
Google is your friend.  Be aware that fourcc codes for raw video&lt;br /&gt;
specify the pixel arrangement and chroma subsampling, but generally&lt;br /&gt;
don&#039;t imply anything certain about chroma siting or color space.  [http://www.fourcc.org/yuv.php#UYVY YV12]&lt;br /&gt;
video, to pick one, can use JPEG, MPEG-2, or DV chroma siting, and any&lt;br /&gt;
one of [[wikipedia:YUV#BT.709_and_BT.601|several YUV colorspace definitions]].&lt;br /&gt;
&amp;lt;br style=&amp;quot;clear:both;&amp;quot;/&amp;gt;&lt;br /&gt;
&lt;br /&gt;
===done!===&lt;br /&gt;
&lt;br /&gt;
That wraps up our not-so-quick and yet very incomplete tour of raw&lt;br /&gt;
video. The good news is we can already get quite a lot of real work&lt;br /&gt;
done using that overview. In plenty of situations, a frame of video&lt;br /&gt;
data is a frame of video data.  The details matter, greatly, when it&lt;br /&gt;
comes time to write software, but for now I am satisfied that the&lt;br /&gt;
esteemed viewer is broadly aware of the relevant issues.&lt;br /&gt;
&lt;br /&gt;
&amp;lt;center&amp;gt;&amp;lt;div style=&amp;quot;background-color:#DDDDFF;border-color:#CCCCDD;border-style:solid;width:80%;padding:0 1em 1em 1em;text-align:left;&amp;quot;&amp;gt;&lt;br /&gt;
&#039;&#039;&#039;Going deeper…&#039;&#039;&#039;&lt;br /&gt;
* [http://wiki.multimedia.cx/index.php?title=YUV4MPEG2 The y4m format] is the most common simple container for raw YUV video.  People occasionally use [[OggYUV]] to store it in Ogg instead.&lt;br /&gt;
* Learn about [[wikipedia:High_dynamic_range_imaging|high dynamic range imaging]], which achieves better representation of the full range of brightnesses in the real world by using more than 8 bits per channel.&lt;br /&gt;
* Learn about how [[wikipedia:Trichromatic_vision|trichromatic color vision]] works in humans, and how human color perception is encoded in the [[wikipedia:CIE 1931 color space|CIE 1931 XYZ color space]].&lt;br /&gt;
** Compare with the [[wikipedia:Lab_color_space|Lab color space]], mathematically equivalent but structured to account for &amp;quot;perceptual uniformity&amp;quot;.&lt;br /&gt;
** If we were all [[wikipedia:Dichromacy|dichromats]] then video would only need two color channels.  Some humans might be [[wikipedia:Tetrachromacy#Possibility_of_human_tetrachromats|tetrachromats]], in which case they would need an additional color channel for video to fully represent their vision.&lt;br /&gt;
** [http://www.xritephoto.com/ph_toolframe.aspx?action=coloriq Test your color vision] (or at least your monitor).&lt;br /&gt;
* YCbCr is defined in terms of RGB by the ITU in two incompatible standards: [[wikipedia:Rec. 601|Rec. 601]] and [[wikipedia:Rec. 709|Rec. 709]].  Both conversion standards are lossy, which has prompted some to adopt a lossless alternative called [http://wiki.multimedia.cx/index.php?title=YCoCg YCoCg].&lt;br /&gt;
&amp;lt;/div&amp;gt;&amp;lt;/center&amp;gt;&lt;br /&gt;
&lt;br /&gt;
&amp;lt;br style=&amp;quot;clear:both;&amp;quot;/&amp;gt;&lt;br /&gt;
&lt;br /&gt;
==Containers==&lt;br /&gt;
[[Image:Dmpfg_017.jpg|360px|right]]&lt;br /&gt;
&amp;lt;small&amp;gt;[[Talk:A_Digital_Media_Primer_For_Geeks_(episode_1)#Containers|Discuss this section]]&amp;lt;/small&amp;gt;&lt;br /&gt;
&lt;br /&gt;
So. We have audio data. We have video data. What remains is the more&lt;br /&gt;
familiar non-signal data and straight-up engineering that software&lt;br /&gt;
developers are used to, and plenty of it.&lt;br /&gt;
&lt;br /&gt;
Chunks of raw audio and video data have no externally-visible&lt;br /&gt;
structure, but they&#039;re often uniformly sized.  We could just string&lt;br /&gt;
them together in a rigid predetermined ordering for streaming and&lt;br /&gt;
storage, and some simple systems do approximately that. Compressed&lt;br /&gt;
frames, though, aren&#039;t necessarily a predictable size, and we usually want&lt;br /&gt;
some flexibility in using a range of different data types in streams.&lt;br /&gt;
If we string random formless data together, we lose the boundaries&lt;br /&gt;
that separate frames and don&#039;t necessarily know what data belongs to&lt;br /&gt;
which streams.  A stream needs some generalized structure to be&lt;br /&gt;
generally useful.&lt;br /&gt;
&lt;br /&gt;
In addition to our signal data, we also have our PCM and video&lt;br /&gt;
parameters.  There&#039;s probably plenty of other [[wikipedia:Metadata#Video|metadata]] we also want to&lt;br /&gt;
deal with, like audio tags and video chapters and subtitles, all&lt;br /&gt;
essential components of rich media.  It makes sense to place this&lt;br /&gt;
metadata&amp;amp;mdash;that is, data about the data&amp;amp;mdash;within the media itself.&lt;br /&gt;
&lt;br /&gt;
Storing and structuring formless data and disparate metadata is the&lt;br /&gt;
job of a [[wikipedia:Container_format_(digital)|container]].  Containers provide framing for the data blobs,&lt;br /&gt;
interleave and identify multiple data streams, provide timing&lt;br /&gt;
information, and store the metadata necessary to parse, navigate,&lt;br /&gt;
manipulate, and present the media.  In general, any container can hold&lt;br /&gt;
any kind of data.  And data can be put into any container.&lt;br /&gt;
&amp;lt;br style=&amp;quot;clear:both;&amp;quot;/&amp;gt;&lt;br /&gt;
&lt;br /&gt;
==Credits==&lt;br /&gt;
[[Image:Dmpfg_018.jpg|360px|right]]&lt;br /&gt;
[[Image:Dmpfg_019.png|360px|right]]&lt;br /&gt;
&lt;br /&gt;
In the past thirty minutes, we&#039;ve covered digital audio, video, some&lt;br /&gt;
history, some math and a little engineering. We&#039;ve barely scratched the&lt;br /&gt;
surface, but it&#039;s time for a well-earned break.&lt;br /&gt;
&lt;br /&gt;
There&#039;s so much more to talk about, so I hope you&#039;ll join me again in&lt;br /&gt;
our next episode.  Until then&amp;amp;mdash;Cheers!&lt;br /&gt;
&lt;br /&gt;
Written by:&lt;br /&gt;
Christopher (Monty) Montgomery&lt;br /&gt;
and the Xiph.Org Community&lt;br /&gt;
&lt;br /&gt;
Intro, title and credits music:&amp;lt;br&amp;gt;&lt;br /&gt;
&amp;quot;Boo Boo Coming&amp;quot;, by Joel Forrester&amp;lt;br&amp;gt;&lt;br /&gt;
Performed by the [http://microscopicseptet.com/ Microscopic Septet]&amp;lt;br&amp;gt;&lt;br /&gt;
Used by permission of Cuneiform Records.&amp;lt;br&amp;gt;&lt;br /&gt;
Original source track All Rights Reserved.&amp;lt;br&amp;gt;&lt;br /&gt;
[http://www.cuneiformrecords.com www.cuneiformrecords.com]&lt;br /&gt;
&lt;br /&gt;
This Video Was Produced Entirely With Free and Open Source Software:&amp;lt;br&amp;gt;&lt;br /&gt;
&lt;br /&gt;
[http://www.gnu.org/ GNU]&amp;lt;br&amp;gt;&lt;br /&gt;
[http://www.linux.org/ Linux]&amp;lt;br&amp;gt;&lt;br /&gt;
[http://fedoraproject.org/ Fedora]&amp;lt;br&amp;gt;&lt;br /&gt;
[http://cinelerra.org/ Cinelerra]&amp;lt;br&amp;gt;&lt;br /&gt;
[http://www.gimp.org/ The Gimp]&amp;lt;br&amp;gt;&lt;br /&gt;
[http://audacity.sourceforge.net/ Audacity]&amp;lt;br&amp;gt;&lt;br /&gt;
[http://svn.xiph.org/trunk/postfish/README Postfish]&amp;lt;br&amp;gt;&lt;br /&gt;
[http://gstreamer.freedesktop.org/ Gstreamer]&amp;lt;br&amp;gt;&lt;br /&gt;
&lt;br /&gt;
All trademarks are the property of their respective owners. &lt;br /&gt;
&lt;br /&gt;
&#039;&#039;Complete video&#039;&#039; [http://creativecommons.org/licenses/by-nc-sa/3.0/legalcode CC-BY-NC-SA]&amp;lt;br&amp;gt;&lt;br /&gt;
&#039;&#039;Text transcript and Wiki edition&#039;&#039; [http://creativecommons.org/licenses/by-sa/3.0/legalcode CC-BY-SA]&amp;lt;br&amp;gt;&lt;br /&gt;
&lt;br /&gt;
A Co-Production of Xiph.Org and Red Hat Inc.&amp;lt;br&amp;gt;&lt;br /&gt;
(C) 2010, Some Rights Reserved&amp;lt;br&amp;gt;&lt;br /&gt;
&lt;br /&gt;
&amp;lt;br style=&amp;quot;clear:both;&amp;quot;/&amp;gt;&amp;lt;hr/&amp;gt;&lt;br /&gt;
&amp;lt;center&amp;gt;&amp;lt;font size=&amp;quot;+1&amp;quot;&amp;gt;&#039;&#039;[[A Digital Media Primer For Geeks (episode 1)/making|Learn more about the making of this video…]]&#039;&#039;&amp;lt;/font&amp;gt;&amp;lt;/center&amp;gt;&lt;/div&gt;</summary>
		<author><name>Mindspillage</name></author>
	</entry>
	<entry>
		<id>https://wiki.xiph.org/index.php?title=Videos/A_Digital_Media_Primer_For_Geeks&amp;diff=12448</id>
		<title>Videos/A Digital Media Primer For Geeks</title>
		<link rel="alternate" type="text/html" href="https://wiki.xiph.org/index.php?title=Videos/A_Digital_Media_Primer_For_Geeks&amp;diff=12448"/>
		<updated>2010-09-22T04:38:36Z</updated>

		<summary type="html">&lt;p&gt;Mindspillage: /* Credits */ microscopic septet link, license link&lt;/p&gt;
&lt;hr /&gt;
&lt;div&gt;[[Image:Dmpfg_001.jpg|360px|right]]&lt;br /&gt;
&amp;lt;small&amp;gt;&#039;&#039;Wiki edition&#039;&#039;&amp;lt;/small&amp;gt;&lt;br /&gt;
&lt;br /&gt;
This first video from Xiph.Org presents the technical foundations of modern digital media via a half-hour firehose of information. One community member called it &amp;quot;a Uni lecture I never got but really wanted.&amp;quot;&lt;br /&gt;
&lt;br /&gt;
The program offers a brief history of digital media, a quick summary of the sampling theorem, and myriad details of low level audio and video characterization and formatting. It&#039;s intended for budding geeks looking to get into video coding, as well as the technically curious who want to know more about the media they wrangle for work or play.&lt;br /&gt;
&amp;lt;br/&amp;gt;&amp;lt;br/&amp;gt;&amp;lt;br/&amp;gt;&lt;br /&gt;
&amp;lt;center&amp;gt;&amp;lt;font size=&amp;quot;+2&amp;quot;&amp;gt;[http://www.xiph.org/video/vid1.shtml Download or Watch online]&amp;lt;/font&amp;gt;&amp;lt;/center&amp;gt;&lt;br /&gt;
&amp;lt;br style=&amp;quot;clear:both;&amp;quot;/&amp;gt;&lt;br /&gt;
&amp;lt;hr/&amp;gt;&lt;br /&gt;
&lt;br /&gt;
==Introduction==&lt;br /&gt;
[[Image:Dmpfg_000.jpg|360px|right]]&lt;br /&gt;
[[Image:Dmpfg_002.jpg|360px|right]]&lt;br /&gt;
&amp;lt;small&amp;gt;[[Talk:A_Digital_Media_Primer_For_Geeks_(episode_1)#Introduction|Discuss this section]]&amp;lt;/small&amp;gt;&lt;br /&gt;
&lt;br /&gt;
Workstations and high-end personal computers have been able to&lt;br /&gt;
manipulate digital audio pretty easily for about fifteen years now.&lt;br /&gt;
It&#039;s only been about five years that a decent workstation&#039;s been able&lt;br /&gt;
to handle raw video without a lot of expensive special purpose&lt;br /&gt;
hardware.&lt;br /&gt;
&lt;br /&gt;
But today even most cheap home PCs have the processor power and&lt;br /&gt;
storage necessary to really toss raw video around, at least without&lt;br /&gt;
too much of a struggle. So now that everyone has all of this cheap media-capable hardware, &lt;br /&gt;
more people, not surprisingly, want to do interesting&lt;br /&gt;
things with digital media, especially streaming. YouTube was the first huge&lt;br /&gt;
success, and now everybody wants in.&lt;br /&gt;
&lt;br /&gt;
Well good!  Because this stuff is a lot of fun!&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
It&#039;s no problem finding consumers for digital media.  But here I&#039;d&lt;br /&gt;
like to address the engineers, the mathematicians, the hackers, the&lt;br /&gt;
people who are interested in discovering and making things and&lt;br /&gt;
building the technology itself. The people after my own heart.&lt;br /&gt;
&lt;br /&gt;
Digital media, compression especially, is perceived to be super-elite,&lt;br /&gt;
somehow incredibly more difficult than anything else in computer&lt;br /&gt;
science. The big industry players in the field don&#039;t mind this&lt;br /&gt;
perception at all; it helps justify the staggering number of very&lt;br /&gt;
basic patents they hold.  They like the image that their media&lt;br /&gt;
researchers &amp;quot;are the best of the best, so much smarter than anyone&lt;br /&gt;
else that their brilliant ideas can&#039;t even be understood by mere&lt;br /&gt;
mortals.&amp;quot; This is bunk.&lt;br /&gt;
&lt;br /&gt;
Digital audio and video and streaming and compression offer endless&lt;br /&gt;
deep and stimulating mental challenges, just like any other&lt;br /&gt;
discipline. It seems elite because so few people have been&lt;br /&gt;
involved.  So few people have been involved perhaps because so few&lt;br /&gt;
people could afford the expensive, special-purpose equipment it&lt;br /&gt;
required. But today, just about anyone watching this video has a&lt;br /&gt;
cheap, general-purpose computer powerful enough to play with the big&lt;br /&gt;
boys. There are battles going on today around HTML5 and browsers and&lt;br /&gt;
video and open vs. closed.  So now is a pretty good time to get&lt;br /&gt;
involved.  The easiest place to start is probably understanding the&lt;br /&gt;
technology we have right now.&lt;br /&gt;
&lt;br /&gt;
This is an introduction. Since it&#039;s an introduction, it glosses over a&lt;br /&gt;
ton of details so that the big picture&#039;s a little easier to see.&lt;br /&gt;
Quite a few people watching are going to be way past anything that I&#039;m&lt;br /&gt;
talking about, at least for now.  On the other hand, I&#039;m probably&lt;br /&gt;
going to go too fast for folks who really are brand new to all of&lt;br /&gt;
this, so if this is all new, relax. The important thing is to pick out&lt;br /&gt;
any ideas that really grab your imagination. Especially pay attention&lt;br /&gt;
to the terminology surrounding those ideas, because with those, and&lt;br /&gt;
Google, and Wikipedia, you can dig as deep as interests you.&lt;br /&gt;
&lt;br /&gt;
So, without any further ado, welcome to one hell of a new hobby.&lt;br /&gt;
&lt;br /&gt;
&amp;lt;center&amp;gt;&amp;lt;div style=&amp;quot;background-color:#DDDDFF;border-color:#CCCCDD;border-style:solid;width:80%;padding:0 1em 1em 1em;text-align:left;&amp;quot;&amp;gt;&lt;br /&gt;
&#039;&#039;&#039;Going deeper…&#039;&#039;&#039;&lt;br /&gt;
*[http://www.0xdeadbeef.com/weblog/2010/01/html5-video-and-h-264-what-history-tells-us-and-why-were-standing-with-the-web/ HTML5 Video and H.264: what history tells us and why we&#039;re standing with the web]: Chris Blizzard of Mozilla on free formats and the open web&lt;br /&gt;
*[http://diveintohtml5.org/video.html Dive into HTML5] web video tutorial&lt;br /&gt;
&amp;lt;/div&amp;gt;&amp;lt;/center&amp;gt;&lt;br /&gt;
&lt;br /&gt;
&amp;lt;br style=&amp;quot;clear:both;&amp;quot;/&amp;gt;&lt;br /&gt;
&lt;br /&gt;
==Analog vs Digital==&lt;br /&gt;
[[Image:Dmpfg_004.jpg|360px|right]]&lt;br /&gt;
[[Image:Dmpfg_006.jpg|360px|right]]&lt;br /&gt;
[[Image:Dmpfg_007.jpg|360px|right]]&lt;br /&gt;
&amp;lt;small&amp;gt;[[Talk:A_Digital_Media_Primer_For_Geeks_(episode_1)#Analog_vs_Digital|Discuss this section]]&amp;lt;/small&amp;gt;&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
[[WikiPedia:Sound|Sound]] is the propagation of pressure waves through air, spreading out&lt;br /&gt;
from a source like ripples spread from a stone tossed into a pond.  A&lt;br /&gt;
microphone, or the human ear for that matter, transforms these passing&lt;br /&gt;
ripples of pressure into an electric signal.  Right, this is&lt;br /&gt;
middle school science class, everyone remembers this.  Moving on.&lt;br /&gt;
&lt;br /&gt;
That audio signal is a one-dimensional function, a single value&lt;br /&gt;
varying over time.  If we slow the [[WikiPedia:Oscilloscope|&#039;scope]] down a bit... that should be&lt;br /&gt;
a little easier to see. A few other aspects of the signal are&lt;br /&gt;
important. It&#039;s [[WikiPedia:Continuous_function|continuous]] in both value and time; that is, at any&lt;br /&gt;
given time it can have any real value, and there&#039;s a smoothly varying&lt;br /&gt;
value at every point in time.  No matter how much we zoom in, there&lt;br /&gt;
are no discontinuities, no singularities, no instantaneous steps or&lt;br /&gt;
points where the signal ceases to exist. It&#039;s defined&lt;br /&gt;
everywhere. Classic continuous math works very well on these signals.&lt;br /&gt;
&lt;br /&gt;
A digital signal on the other hand is [[WikiPedia:Discrete_math|discrete]] in both value and time.&lt;br /&gt;
In the simplest and most common system, called [[WikiPedia:Pulse code modulation|Pulse Code Modulation]],&lt;br /&gt;
one of a fixed number of possible values directly represents the&lt;br /&gt;
instantaneous signal amplitude at points in time spaced a fixed&lt;br /&gt;
distance apart.  The end result is a stream of digits.&lt;br /&gt;
&lt;br /&gt;
Now this looks an awful lot like this.  It seems intuitive that we&lt;br /&gt;
should somehow be able to rigorously transform one into the other, and&lt;br /&gt;
good news, the [[WikiPedia:Nyquist-Shannon sampling theorem|Sampling Theorem]] says we can and tells us&lt;br /&gt;
how. Published in its most recognizable form by [[WikiPedia:Claude Shannon|Claude Shannon]] in 1949&lt;br /&gt;
and built on the work of [[WikiPedia:Harry Nyquist|Nyquist]], and [[WikiPedia:Ralph Hartley|Hartley]], and tons of others, the&lt;br /&gt;
sampling theorem not only states that we can go back and&lt;br /&gt;
forth between analog and digital, but also lays&lt;br /&gt;
down a set of conditions under which conversion is lossless and the two&lt;br /&gt;
representations become equivalent and interchangeable.  When the&lt;br /&gt;
lossless conditions aren&#039;t met, the sampling theorem tells us how and&lt;br /&gt;
how much information is lost or corrupted.&lt;br /&gt;
&lt;br /&gt;
Up until very recently, analog technology was the basis for&lt;br /&gt;
practically everything done with audio, and that&#039;s not because most&lt;br /&gt;
audio comes from an originally analog source.  You may also think that&lt;br /&gt;
since computers are fairly recent, analog signal technology must have&lt;br /&gt;
come first.  Nope. Digital is actually older.  The [[WikiPedia:Telegraph|telegraph]] predates&lt;br /&gt;
the telephone by half a century and was already fully mechanically&lt;br /&gt;
automated by the 1860s, sending coded, multiplexed digital signals&lt;br /&gt;
long distances. You know... [[WikiPedia:Tickertape|tickertape]]. Harry Nyquist of [[WikiPedia:Bell_labs|Bell Labs]] was&lt;br /&gt;
researching telegraph pulse transmission when he published his&lt;br /&gt;
description of what later became known as the [[WikiPedia:Nyquist_frequency|Nyquist frequency]], the&lt;br /&gt;
core concept of the sampling theorem.  Now, it&#039;s true the telegraph&lt;br /&gt;
was transmitting symbolic information, text, not a digitized analog&lt;br /&gt;
signal, but with the advent of the telephone and radio, analog and&lt;br /&gt;
digital signal technology progressed rapidly and side-by-side.&lt;br /&gt;
&lt;br /&gt;
Audio had always been manipulated as an analog signal because... well,&lt;br /&gt;
gee, it&#039;s so much easier.  A [[WikiPedia:Low-pass_filter#Continuous-time_low-pass_filters|second-order low-pass filter]], for example,&lt;br /&gt;
requires two passive components.  An all-analog [[WikiPedia:Short-time_Fourier_transform|short-time Fourier&lt;br /&gt;
transform]], a few hundred.  Well, maybe a thousand if you want to build&lt;br /&gt;
something really fancy (bang on the [http://www.testequipmentdepot.com/usedequipment/hewlettpackard/spectrumanalyzers/3585a.htm 3585]).  Processing signals&lt;br /&gt;
digitally requires millions to billions of transistors running at&lt;br /&gt;
microwave frequencies, support hardware at very least to digitize and&lt;br /&gt;
reconstruct the analog signals, a complete software ecosystem for&lt;br /&gt;
programming and controlling that billion-transistor juggernaut,&lt;br /&gt;
digital storage just in case you want to keep any of those bits for&lt;br /&gt;
later...&lt;br /&gt;
&lt;br /&gt;
So we come to the conclusion that analog is the only practical way to&lt;br /&gt;
do much with audio... well, unless you happen to have a billion&lt;br /&gt;
transistors and all the other things just lying around. And [[WikiPedia:File:Transistor_Count_and_Moore&#039;s_Law_-_2008.svg|since we&lt;br /&gt;
do]], digital signal processing becomes very attractive.&lt;br /&gt;
&lt;br /&gt;
For one thing, analog componentry just doesn&#039;t have the flexibility of&lt;br /&gt;
a general purpose computer.  Adding a new function to this&lt;br /&gt;
beast... yeah, it&#039;s probably not going to happen.  On a digital&lt;br /&gt;
processor though, just write a new program.  Software isn&#039;t trivial,&lt;br /&gt;
but it is a lot easier.&lt;br /&gt;
&lt;br /&gt;
Perhaps more importantly, though, every analog component is an&lt;br /&gt;
approximation. There&#039;s no such thing as a perfect transistor, or a&lt;br /&gt;
perfect inductor, or a perfect capacitor.  In analog, every component&lt;br /&gt;
adds [[WikiPedia:Johnson–Nyquist_noise|noise]] and [[WikiPedia:Distortion#Electronic_signals|distortion]], usually not very much, but it adds up. Just&lt;br /&gt;
transmitting an analog signal, especially over long distances,&lt;br /&gt;
progressively, measurably, irretrievably corrupts it.  Besides, all of&lt;br /&gt;
those single-purpose analog components take up a lot of space.  Two&lt;br /&gt;
lines of code on the billion transistors back here can implement a&lt;br /&gt;
filter that would require an [[WikiPedia:Inductor|inductor]] the size of a refrigerator.&lt;br /&gt;
&lt;br /&gt;
Digital systems don&#039;t have these drawbacks.  Digital signals can be&lt;br /&gt;
stored, copied, manipulated, and transmitted without adding any noise&lt;br /&gt;
or distortion. We do use [[WikiPedia:Lossy_compression|lossy]] algorithms from time to time, but the&lt;br /&gt;
only unavoidably non-ideal steps are digitization and reconstruction,&lt;br /&gt;
where digital has to interface with all of that messy analog.  Messy&lt;br /&gt;
or not, modern [[WikiPedia:Digital-to-analog_converter|conversion stages]] are very, very good.  By the&lt;br /&gt;
standards of our ears, we can consider them practically lossless as&lt;br /&gt;
well.&lt;br /&gt;
&lt;br /&gt;
With a little extra hardware, then, most of which is now small and&lt;br /&gt;
inexpensive due to our modern industrial infrastructure, digital audio&lt;br /&gt;
is the clear winner over analog.  So let us then go about storing it,&lt;br /&gt;
copying it, manipulating it, and transmitting it.&lt;br /&gt;
&lt;br /&gt;
&amp;lt;center&amp;gt;&amp;lt;div style=&amp;quot;background-color:#DDDDFF;border-color:#CCCCDD;border-style:solid;width:80%;padding:0 1em 1em 1em;text-align:left;&amp;quot;&amp;gt;&lt;br /&gt;
&#039;&#039;&#039;Going deeper…&#039;&#039;&#039;&lt;br /&gt;
*Wikipedia: [[wikipedia:Nyquist–Shannon_sampling_theorem|Nyquist–Shannon sampling theorem]]&lt;br /&gt;
*MIT OpenCourseWare [http://ocw.mit.edu/courses/electrical-engineering-and-computer-science/6-003-signals-and-systems-spring-2010/lecture-notes/ Lecture notes from 6.003 signals and systems.]&lt;br /&gt;
&amp;lt;/div&amp;gt;&amp;lt;/center&amp;gt;&lt;br /&gt;
&lt;br /&gt;
&amp;lt;br style=&amp;quot;clear:both;&amp;quot;/&amp;gt;&lt;br /&gt;
&lt;br /&gt;
==Raw (digital audio) meat==&lt;br /&gt;
&amp;lt;small&amp;gt;[[Talk:A_Digital_Media_Primer_For_Geeks_(episode_1)#Raw_.28digital_audio.29_meat|Discuss this section]]&amp;lt;/small&amp;gt;&lt;br /&gt;
&lt;br /&gt;
Pulse Code Modulation is the most common representation for &lt;br /&gt;
raw audio.  Other practical representations do exist: for example, the&lt;br /&gt;
[[WikiPedia:Delta-sigma_modulation|Sigma-Delta coding]] used by the [[WikiPedia:Super_Audio_CD|SACD]], which is a form of [[wikipedia:Pulse-density_modulation|Pulse Density&lt;br /&gt;
Modulation]].  That said, Pulse Code Modulation is far&lt;br /&gt;
and away dominant, mainly because it&#039;s so mathematically&lt;br /&gt;
convenient.  An audio engineer can spend an entire career without&lt;br /&gt;
running into anything else.&lt;br /&gt;
&lt;br /&gt;
PCM encoding can be characterized by three parameters, making it easy&lt;br /&gt;
to account for every possible PCM variant with mercifully little&lt;br /&gt;
hassle.&lt;br /&gt;
&amp;lt;br style=&amp;quot;clear:both;&amp;quot;/&amp;gt;&lt;br /&gt;
===sample rate===&lt;br /&gt;
[[Image:Dmpfg_009.jpg|360px|right]]&lt;br /&gt;
[[Image:Dmpfg_008.jpg|360px|right]]&lt;br /&gt;
The first parameter is the [[wikipedia:Sampling_rate|sampling rate]].  The highest frequency an&lt;br /&gt;
encoding can represent is called the Nyquist frequency.  The Nyquist&lt;br /&gt;
frequency of PCM happens to be exactly half the sampling rate.&lt;br /&gt;
Therefore, the sampling rate directly determines the highest possible&lt;br /&gt;
frequency in the digitized signal.&lt;br /&gt;
&lt;br /&gt;
Analog telephone systems traditionally [[wikipedia:Bandlimiting|band-limited]] voice channels to&lt;br /&gt;
just under 4kHz, so digital telephony and most classic voice&lt;br /&gt;
applications use an 8kHz sampling rate: the minimum sampling rate&lt;br /&gt;
necessary to capture the entire bandwidth of a 4kHz channel.  This is&lt;br /&gt;
what an 8kHz sampling rate sounds like&amp;amp;mdash;a bit muffled but perfectly&lt;br /&gt;
intelligible for voice.  This is the lowest sampling rate that&#039;s ever&lt;br /&gt;
been used widely in practice.&lt;br /&gt;
&lt;br /&gt;
From there, as power, and memory, and storage increased, consumer&lt;br /&gt;
computer hardware went to offering 11, and then 16, and then 22, and&lt;br /&gt;
then 32kHz sampling.  With each increase in the sampling rate and the&lt;br /&gt;
Nyquist frequency, it&#039;s obvious that the high end becomes a little&lt;br /&gt;
clearer and the sound more natural.&lt;br /&gt;
&lt;br /&gt;
The Compact Disc uses a 44.1kHz sampling rate, which is again slightly&lt;br /&gt;
better than 32kHz, but the gains are becoming less distinct.  44.1kHz&lt;br /&gt;
is a bit of an oddball choice, especially given that it hadn&#039;t been&lt;br /&gt;
used for anything prior to the compact disc, but the huge success of&lt;br /&gt;
the CD has made it a common rate.&lt;br /&gt;
&lt;br /&gt;
The most common hi-fidelity sampling rate aside from the CD is 48kHz.&lt;br /&gt;
There&#039;s virtually no audible difference between the two.  This video,&lt;br /&gt;
or at least the original version of it, was shot and produced with&lt;br /&gt;
48kHz audio, which happens to be the original standard for&lt;br /&gt;
high-fidelity audio with video.&lt;br /&gt;
&lt;br /&gt;
Super-hi-fidelity sampling rates of 88, and 96, and 192kHz have also&lt;br /&gt;
appeared. The reason for the sampling rates beyond 48kHz isn&#039;t to&lt;br /&gt;
extend the audible high frequencies further. It&#039;s for a different&lt;br /&gt;
reason.&lt;br /&gt;
&lt;br /&gt;
Stepping back for just a second, the French mathematician [[wikipedia:Joseph_Fourier|Jean&lt;br /&gt;
Baptiste Joseph Fourier]] showed that we can also think of signals like&lt;br /&gt;
audio as a set of component frequencies.  This [[wikipedia:Frequency_domain|frequency-domain]]&lt;br /&gt;
representation is equivalent to the time representation; the signal is&lt;br /&gt;
exactly the same, we&#039;re just looking at it [[wikipedia:Basis_(linear_algebra)|a different way]].  Here we see the&lt;br /&gt;
frequency-domain representation of a hypothetical analog signal we&lt;br /&gt;
intend to digitally sample.&lt;br /&gt;
&lt;br /&gt;
The sampling theorem tells us two essential things about the sampling&lt;br /&gt;
process. First, that a digital signal can&#039;t represent any&lt;br /&gt;
frequencies above the Nyquist frequency. Second, and this is the new&lt;br /&gt;
part, if we don&#039;t remove those frequencies with a low-pass filter&lt;br /&gt;
before sampling, the sampling process will fold them down into the&lt;br /&gt;
representable frequency range as [[wikipedia:Aliasing|aliasing distortion]].&lt;br /&gt;
&lt;br /&gt;
Aliasing, in a nutshell, sounds freakin&#039; awful, so it&#039;s essential to&lt;br /&gt;
remove any beyond-Nyquist frequencies before sampling and after&lt;br /&gt;
reconstruction.&lt;br /&gt;
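&lt;br /&gt;
To make the fold-down concrete, here&#039;s a tiny plain-Python sketch&lt;br /&gt;
(the rates are just example values): a tone above the Nyquist&lt;br /&gt;
frequency produces the same samples, up to sign, as its mirror image&lt;br /&gt;
below Nyquist.&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
# Sketch: a tone above Nyquist aliases to a tone below it. For&lt;br /&gt;
# sampling rate fs, sines at f and at fs - f produce identical&lt;br /&gt;
# samples up to sign, so 26 kHz sampled at 48 kHz is&lt;br /&gt;
# indistinguishable from 22 kHz.&lt;br /&gt;
import math&lt;br /&gt;
&lt;br /&gt;
fs = 48000.0           # sampling rate; Nyquist is fs / 2 = 24 kHz&lt;br /&gt;
f_in = 26000.0         # input tone, 2 kHz above Nyquist&lt;br /&gt;
f_alias = fs - f_in    # folds down to 22 kHz&lt;br /&gt;
&lt;br /&gt;
for n in range(4):     # compare the first few samples&lt;br /&gt;
    t = n / fs&lt;br /&gt;
    print(math.sin(2 * math.pi * f_in * t),&lt;br /&gt;
          math.sin(2 * math.pi * f_alias * t))&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;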
&lt;br /&gt;
Human frequency perception is considered to extend to about 20kHz. In&lt;br /&gt;
44.1 or 48kHz sampling, the low pass before the sampling stage has to&lt;br /&gt;
be extremely sharp to avoid cutting any audible frequencies below&lt;br /&gt;
[[wikipedia:Hearing_range|20kHz]] but still not allow frequencies above the Nyquist to leak&lt;br /&gt;
forward into the sampling process.  This is a difficult filter to&lt;br /&gt;
build, and no practical filter succeeds completely. If the sampling&lt;br /&gt;
rate is 96kHz or 192kHz on the other hand, the low pass has an extra&lt;br /&gt;
[[wikipedia:Octave_(electronics)|octave]] or two for its transition band. This is a much easier filter to&lt;br /&gt;
build.  Sampling rates beyond 48kHz are actually one of those messy&lt;br /&gt;
analog stage compromises.&lt;br /&gt;
&amp;lt;br style=&amp;quot;clear:both;&amp;quot;/&amp;gt;&lt;br /&gt;
===sample format===&lt;br /&gt;
[[Image:Dmpfg_anim.gif|right]]&lt;br /&gt;
&lt;br /&gt;
The second fundamental PCM parameter is the sample format; that is,&lt;br /&gt;
the format of each digital number.  A number is a number, but a number&lt;br /&gt;
can be represented in bits a number of different ways.&lt;br /&gt;
&lt;br /&gt;
Early PCM was [[wikipedia:Quantization_(sound_processing)#Audio_quantization|eight-bit]] [[wikipedia:Linear_pulse_code_modulation|linear]], encoded as an [[wikipedia:Signedness|unsigned]] [[wikipedia:Integer_(computer_science)#Bytes_and_octets|byte]].  The&lt;br /&gt;
[[wikipedia:Dynamic_range#Audio|dynamic range]] is limited to about [[wikipedia:Decibel|50dB]]  and the [[wikipedia:Quantization_error|quantization noise]], as&lt;br /&gt;
you can hear, is pretty severe.  Eight-bit audio is vanishingly rare&lt;br /&gt;
today.&lt;br /&gt;
&lt;br /&gt;
Digital telephony typically uses one of two related non-linear eight&lt;br /&gt;
bit encodings called [[wikipedia:A-law_algorithm|A-law]] and [[wikipedia:Μ-law_algorithm|μ-law]]. These formats encode a roughly&lt;br /&gt;
[[wikipedia:Audio_bit_depth#Dynamic_range|14 bit dynamic range]] into eight bits by spacing the higher amplitude&lt;br /&gt;
values farther apart. A-law and mu-law obviously improve quantization&lt;br /&gt;
noise compared to linear 8-bit, and voice harmonics especially hide&lt;br /&gt;
the remaining quantization noise well. All three eight-bit encodings,&lt;br /&gt;
linear, A-law, and mu-law, are typically paired with an 8kHz sampling&lt;br /&gt;
rate, though I&#039;m demonstrating them here at 48kHz.&lt;br /&gt;
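&lt;br /&gt;
To make the companding idea concrete, here is a minimal sketch of the&lt;br /&gt;
continuous mu-law curve with mu = 255. Real G.711 codecs use a&lt;br /&gt;
segmented table approximation of this curve, so treat it as an&lt;br /&gt;
illustration rather than the codec itself.&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
# Sketch of mu-law companding: squeeze a roughly 14-bit dynamic&lt;br /&gt;
# range into 8 bits by spacing larger amplitudes farther apart.&lt;br /&gt;
import math&lt;br /&gt;
&lt;br /&gt;
MU = 255.0&lt;br /&gt;
&lt;br /&gt;
def mulaw_compress(x):   # x in [-1.0, 1.0]&lt;br /&gt;
    return math.copysign(math.log1p(MU * abs(x)) / math.log1p(MU), x)&lt;br /&gt;
&lt;br /&gt;
def mulaw_expand(y):     # exact inverse of the curve above&lt;br /&gt;
    return math.copysign(math.expm1(abs(y) * math.log1p(MU)) / MU, y)&lt;br /&gt;
&lt;br /&gt;
x = 0.01                                     # a quiet sample&lt;br /&gt;
y = round(mulaw_compress(x) * 127) / 127.0   # quantize to 8 bits&lt;br /&gt;
print(x, mulaw_expand(y))      # error stays small at low levels&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;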
&lt;br /&gt;
Most modern PCM uses 16- or 24-bit [[wikipedia:Two&#039;s_complement|two&#039;s-complement]] signed integers to&lt;br /&gt;
encode the range from negative infinity to zero decibels in 16 or 24&lt;br /&gt;
bits of precision. The maximum absolute value corresponds to zero decibels.&lt;br /&gt;
As with all the sample formats so far, signals beyond zero decibels, and thus&lt;br /&gt;
beyond the maximum representable range, are [[wikipedia:Clipping_(audio)|clipped]].&lt;br /&gt;
&lt;br /&gt;
In mixing and mastering, it&#039;s not unusual to use [[wikipedia:Floating_point|floating-point]]&lt;br /&gt;
numbers for PCM instead of [[wikipedia:Integer_(computer_science)|integers]].  A 32 bit [[wikipedia:IEEE_754-2008|IEEE754]] float, that&#039;s&lt;br /&gt;
the normal kind of floating point you see on current computers, has 24&lt;br /&gt;
bits of resolution, but an eight-bit floating-point exponent increases&lt;br /&gt;
the representable range.  Floating point usually represents zero&lt;br /&gt;
decibels as +/-1.0, and because floats can obviously represent&lt;br /&gt;
considerably beyond that, temporarily exceeding zero decibels during&lt;br /&gt;
the mixing process doesn&#039;t cause clipping.  Floating-point PCM takes&lt;br /&gt;
up more space, so it tends to be used only as an intermediate&lt;br /&gt;
production format.&lt;br /&gt;
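&lt;br /&gt;
A small sketch of the difference that matters here, using made-up&lt;br /&gt;
sample values: a mix that momentarily exceeds zero decibels survives&lt;br /&gt;
a floating-point intermediate stage, but is hard-clipped the moment&lt;br /&gt;
it is stored as 16-bit integers.&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
# Sketch: 16-bit integer PCM clips at 0 dB; floats have headroom.&lt;br /&gt;
&lt;br /&gt;
def to_int16(x):         # x is a float sample where 0 dB = 1.0&lt;br /&gt;
    v = int(round(x * 32767.0))&lt;br /&gt;
    return max(-32768, min(32767, v))   # clip to representable range&lt;br /&gt;
&lt;br /&gt;
hot = 1.41                   # about +3 dB, briefly over full scale&lt;br /&gt;
print(to_int16(hot))         # 32767: hard-clipped&lt;br /&gt;
print(to_int16(hot * 0.5))   # fine once the gain comes back down&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;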
&lt;br /&gt;
Lastly, most general purpose computers still read and&lt;br /&gt;
write data in octet bytes, so it&#039;s important to remember that samples&lt;br /&gt;
bigger than eight bits can be in [[wikipedia:Endianness|big- or little-endian order]], and both&lt;br /&gt;
endiannesses are common.  For example, Microsoft [[wikipedia:WAV|WAV]] files are little-endian,&lt;br /&gt;
and Apple [[wikipedia:AIFC|AIFC]] files tend to be big-endian.  Be aware of it.&lt;br /&gt;
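&lt;br /&gt;
For instance, here is the same 16-bit sample serialized both ways&lt;br /&gt;
with Python&#039;s struct module. This illustrates byte order only, not&lt;br /&gt;
any particular file format&#039;s framing.&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
# Sketch: one 16-bit sample, both byte orders.&lt;br /&gt;
import struct&lt;br /&gt;
&lt;br /&gt;
sample = -12345&lt;br /&gt;
le = struct.pack(&amp;quot;&amp;lt;h&amp;quot;, sample)   # little-endian, WAV-style&lt;br /&gt;
be = struct.pack(&amp;quot;&amp;gt;h&amp;quot;, sample)   # big-endian, AIFF-C-style&lt;br /&gt;
print(le.hex(), be.hex())            # c7cf cfc7: bytes reversed&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;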
&amp;lt;br style=&amp;quot;clear:both;&amp;quot;/&amp;gt;&lt;br /&gt;
===channels===&lt;br /&gt;
&lt;br /&gt;
The third PCM parameter is the number of [[wikipedia:Multichannel_audio|channels]].  The convention in&lt;br /&gt;
raw PCM is to encode multiple channels by interleaving the samples of&lt;br /&gt;
each channel together into a single stream.  Straightforward and extensible.&lt;br /&gt;
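&lt;br /&gt;
A sketch of that convention with toy sample values:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
# Sketch: interleave per-channel PCM into one stream and back.&lt;br /&gt;
left  = [11, 12, 13]&lt;br /&gt;
right = [21, 22, 23]&lt;br /&gt;
&lt;br /&gt;
interleaved = [s for pair in zip(left, right) for s in pair]&lt;br /&gt;
print(interleaved)   # [11, 21, 12, 22, 13, 23]&lt;br /&gt;
&lt;br /&gt;
# De-interleave with strided slices; extends to N channels.&lt;br /&gt;
print(interleaved[0::2], interleaved[1::2])&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;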
&amp;lt;br style=&amp;quot;clear:both;&amp;quot;/&amp;gt;&lt;br /&gt;
===done!===&lt;br /&gt;
&lt;br /&gt;
And that&#039;s it!  That describes every PCM representation ever.  Done.&lt;br /&gt;
Digital audio is &#039;&#039;so easy&#039;&#039;!  There&#039;s more to do of course, but at this&lt;br /&gt;
point we&#039;ve got a nice useful chunk of audio data, so let&#039;s get some&lt;br /&gt;
video too.&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
&amp;lt;center&amp;gt;&amp;lt;div style=&amp;quot;background-color:#DDDDFF;border-color:#CCCCDD;border-style:solid;width:80%;padding:0 1em 1em 1em;text-align:left;&amp;quot;&amp;gt;&lt;br /&gt;
&#039;&#039;&#039;Going deeper…&#039;&#039;&#039;&lt;br /&gt;
* [[wikipedia:Roll-off|Wikipedia&#039;s article on filter roll-off]], to learn why it&#039;s hard to build analog filters with a very narrow [[wikipedia:Transition_band|transition band]] between the [[wikipedia:Passband|passband]] and the [[wikipedia:Stopband|stopband]].&lt;br /&gt;
* [http://wiki.multimedia.cx/index.php?title=PCM Some more minutiae] about PCM in practice.&lt;br /&gt;
* [[wikipedia:DPCM|DPCM]] and [[wikipedia:ADPCM|ADPCM]], simple audio codecs loosely inspired by PCM.&lt;br /&gt;
&amp;lt;/div&amp;gt;&amp;lt;/center&amp;gt;&lt;br /&gt;
&lt;br /&gt;
&amp;lt;br style=&amp;quot;clear:both;&amp;quot;/&amp;gt;&lt;br /&gt;
&lt;br /&gt;
==Video vegetables (they&#039;re good for you!)==&lt;br /&gt;
[[Image:Dmpfg_010.jpg|360px|right]]&lt;br /&gt;
&amp;lt;small&amp;gt;[[Talk:A_Digital_Media_Primer_For_Geeks_(episode_1)#Video_vegetables_.28they.27re_good_for_you.21.29|Discuss this section]]&amp;lt;/small&amp;gt;&lt;br /&gt;
&lt;br /&gt;
One could think of video as being like audio but with two additional&lt;br /&gt;
spatial dimensions, X and Y, in addition to the dimension of time.&lt;br /&gt;
This is mathematically sound. The Sampling Theorem applies to all&lt;br /&gt;
three video dimensions just as it does to the single time dimension of&lt;br /&gt;
audio.&lt;br /&gt;
&lt;br /&gt;
Audio and video are obviously quite different in practice. For one,&lt;br /&gt;
compared to audio, video is huge. [[wikipedia:Red_Book_(audio_Compact_Disc_standard)#Technical_details|Raw CD audio]] is about 1.4 megabits&lt;br /&gt;
per second. Raw [[wikipedia:1080i|1080i]] HD video is over 700 megabits per second. That&#039;s&lt;br /&gt;
more than 500 times more data to capture, process, and store per&lt;br /&gt;
second.  By [[wikipedia:Moore&#039;s_law|Moore&#039;s law]]... that&#039;s... let&#039;s see... roughly eight&lt;br /&gt;
doublings times two years, so yeah, computers requiring about an extra&lt;br /&gt;
fifteen years to handle raw video after getting raw audio down pat was&lt;br /&gt;
about right.&lt;br /&gt;
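&lt;br /&gt;
The back-of-the-envelope numbers, assuming 8-bit 4:2:0 video (12 bits&lt;br /&gt;
per pixel, a format discussed below) and roughly 30 full frames worth&lt;br /&gt;
of pixels per second:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
# Rough arithmetic behind the comparison above.&lt;br /&gt;
cd_bps = 44100 * 16 * 2          # stereo CD audio: about 1.41 Mbps&lt;br /&gt;
hd_bps = 1920 * 1080 * 12 * 30   # raw 8-bit 4:2:0 HD: about 746 Mbps&lt;br /&gt;
print(hd_bps / cd_bps)           # about 530 times more data&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;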
&lt;br /&gt;
Basic raw video is also just more complex than basic raw audio. The&lt;br /&gt;
sheer volume of data currently necessitates a representation more&lt;br /&gt;
efficient than the linear PCM used for audio.  In addition, electronic&lt;br /&gt;
video comes almost entirely from broadcast television alone, and the&lt;br /&gt;
standards committees that govern broadcast video have always been very&lt;br /&gt;
concerned with backward compatibility.  Up until just last year in the&lt;br /&gt;
US, a sixty-year-old black and white television could still show a&lt;br /&gt;
normal [[wikipedia:NTSC|analog television broadcast]].  That&#039;s actually a really neat&lt;br /&gt;
trick.&lt;br /&gt;
&lt;br /&gt;
The downside to backward compatibility is that once a detail makes it&lt;br /&gt;
into a standard, you can&#039;t ever really throw it out again. Electronic&lt;br /&gt;
video has never started over from scratch the way audio has multiple&lt;br /&gt;
times.  Sixty years worth of clever but obsolete hacks necessitated by&lt;br /&gt;
the passing technology of a given era have built up into quite a pile,&lt;br /&gt;
and because digital standards also come from broadcast television, all&lt;br /&gt;
these eldritch hacks have been brought forward into the digital&lt;br /&gt;
standards as well.&lt;br /&gt;
&lt;br /&gt;
In short, there are a whole lot more details involved in digital video&lt;br /&gt;
than there were with audio. There&#039;s no hope of covering them&lt;br /&gt;
all completely here, so we&#039;ll cover the broad fundamentals.&lt;br /&gt;
&amp;lt;br style=&amp;quot;clear:both;&amp;quot;/&amp;gt;&lt;br /&gt;
===resolution and aspect===&lt;br /&gt;
[[Image:Dmpfg_011.jpg|360px|right]]&lt;br /&gt;
&lt;br /&gt;
The most obvious raw video parameters are the width and height of the&lt;br /&gt;
picture in pixels. As simple as that may sound, the pixel dimensions&lt;br /&gt;
alone don&#039;t actually specify the absolute width and height of the&lt;br /&gt;
picture, as most broadcast-derived video doesn&#039;t use square pixels.&lt;br /&gt;
The number of [[wikipedia:Scan_line|scanlines]] in a broadcast image was fixed, but the&lt;br /&gt;
effective number of horizontal pixels was a function of channel&lt;br /&gt;
[[wikipedia:Bandwidth_(signal_processing)|bandwidth]]. Effective horizontal resolution could result in pixels that&lt;br /&gt;
were either narrower or wider than the spacing between scanlines.&lt;br /&gt;
&lt;br /&gt;
Standards have generally specified that digitally sampled video should&lt;br /&gt;
reflect the real resolution of the original analog source, so a large&lt;br /&gt;
amount of digital video also uses non-square pixels. For example, a&lt;br /&gt;
normal 4:3 aspect NTSC DVD is typically encoded with a display&lt;br /&gt;
resolution of [[wikipedia:DVD-Video#Frame_size_and_frame_rate|704 by 480]], a ratio wider than 4:3.  In this case, the&lt;br /&gt;
pixels themselves are assigned an aspect ratio of [[wikipedia:Standard-definition_television#Resolution|10:11]], making them&lt;br /&gt;
taller than they are wide and narrowing the image horizontally to the&lt;br /&gt;
correct aspect.  Such an image has to be resampled to show properly on&lt;br /&gt;
a digital display with square pixels.&lt;br /&gt;
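&lt;br /&gt;
The arithmetic for that example, as a quick sketch:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
# Sketch: coded size plus pixel aspect ratio gives display size.&lt;br /&gt;
coded_w, coded_h = 704, 480&lt;br /&gt;
par_w, par_h = 10, 11                 # NTSC DVD pixel aspect, 10:11&lt;br /&gt;
&lt;br /&gt;
display_w = coded_w * par_w / par_h   # 640.0 square pixels wide&lt;br /&gt;
print(display_w / coded_h)            # 1.333..., i.e. a 4:3 picture&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;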
&amp;lt;br style=&amp;quot;clear:both;&amp;quot;/&amp;gt;&lt;br /&gt;
===frame rate and interlacing===&lt;br /&gt;
[[Image:Dmpfg_012.jpg|360px|right]]&lt;br /&gt;
&lt;br /&gt;
The second obvious video parameter is the [[wikipedia:Frame_rate|frame rate]], the number of&lt;br /&gt;
full frames per second.  Several standard frame rates are in active&lt;br /&gt;
use. Digital video, in one form or another, can use all of them.  Or,&lt;br /&gt;
any other frame rate.  Or even variable rates where the frame rate&lt;br /&gt;
changes adaptively over the course of the video. The higher the frame&lt;br /&gt;
rate, the smoother the motion, and that brings us, unfortunately, to&lt;br /&gt;
[[wikipedia:Interlace|interlacing]].&lt;br /&gt;
&lt;br /&gt;
In the very earliest days of broadcast video, engineers sought the&lt;br /&gt;
fastest practical frame rate to smooth motion and to minimize [[wikipedia:Flicker_(screen)|flicker]]&lt;br /&gt;
on phosphor-based [[wikipedia:Cathode_ray_tube|CRTs]].  They were also under pressure to use the&lt;br /&gt;
least possible bandwidth for the highest resolution and fastest frame&lt;br /&gt;
rate.  Their solution was to interlace the video where the even lines&lt;br /&gt;
are sent in one pass and the odd lines in the next.  Each pass is&lt;br /&gt;
called a field and two fields sort of produce one complete frame.&lt;br /&gt;
&amp;quot;Sort of&amp;quot;, because the even and odd fields aren&#039;t actually from the&lt;br /&gt;
same source frame.  In a 60 field per second picture, the source frame&lt;br /&gt;
rate is actually 60 full frames per second, and half of each frame,&lt;br /&gt;
every other line, is simply discarded.  This is why we can&#039;t&lt;br /&gt;
[[wikipedia:Deinterlacing|deinterlace]] a video simply by combining two fields into one frame;&lt;br /&gt;
they&#039;re not actually from one frame to begin with.&lt;br /&gt;
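&lt;br /&gt;
In code terms a field is just every other line of a frame; the catch&lt;br /&gt;
described above is that consecutive fields come from different source&lt;br /&gt;
frames, so re-weaving two fields does not reconstruct an original&lt;br /&gt;
frame. A toy sketch of the split:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
# Sketch: split a frame into its two fields by line parity.&lt;br /&gt;
frame = [[0], [1], [2], [3]]   # toy frame, one value per line&lt;br /&gt;
even_field = frame[0::2]       # lines 0, 2, ...&lt;br /&gt;
odd_field = frame[1::2]        # lines 1, 3, ...&lt;br /&gt;
print(even_field, odd_field)&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;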
&amp;lt;br style=&amp;quot;clear:both;&amp;quot;/&amp;gt;&lt;br /&gt;
&lt;br /&gt;
===gamma===&lt;br /&gt;
[[Image:Dmpfg_013.jpg|360px|right]]&lt;br /&gt;
&lt;br /&gt;
The cathode ray tube was the only available display technology for&lt;br /&gt;
most of the history of electronic video. A CRT&#039;s output brightness is&lt;br /&gt;
nonlinear, approximately equal to the input controlling voltage raised&lt;br /&gt;
to the 2.5th power. This exponent, 2.5, is designated gamma, and so&lt;br /&gt;
it&#039;s often referred to as the gamma of a display.  Cameras, though,&lt;br /&gt;
are linear, and if you feed a CRT a linear input signal, it looks a&lt;br /&gt;
bit like this.&lt;br /&gt;
&lt;br /&gt;
As there were originally to be very few cameras, which were&lt;br /&gt;
fantastically expensive anyway, and hopefully many, many television&lt;br /&gt;
sets, which had best be as inexpensive as possible, engineers decided to&lt;br /&gt;
add the necessary [[wikipedia:Gamma_correction|gamma correction]] circuitry to the cameras rather&lt;br /&gt;
than the sets. Video transmitted over the airwaves would thus have a&lt;br /&gt;
nonlinear intensity using the inverse of the set&#039;s gamma exponent, so that&lt;br /&gt;
once a camera&#039;s signal was finally displayed on the CRT, the overall&lt;br /&gt;
response of the system from camera to set was back to linear again.&lt;br /&gt;
&lt;br /&gt;
Almost.&lt;br /&gt;
&lt;br /&gt;
There were also two other tweaks. A television camera actually uses a&lt;br /&gt;
gamma exponent that&#039;s the inverse of 2.2, not 2.5.  That&#039;s just a&lt;br /&gt;
correction for viewing in a dim environment. Also, the exponential&lt;br /&gt;
curve transitions to a linear ramp near black.  That&#039;s just an old&lt;br /&gt;
hack for suppressing sensor noise in the camera.&lt;br /&gt;
&lt;br /&gt;
Gamma correction also had a lucky benefit. It just so happens that the&lt;br /&gt;
human eye has a perceptual gamma of about 3.  This is relatively close&lt;br /&gt;
to the CRT&#039;s gamma of 2.5. An image using gamma correction devotes&lt;br /&gt;
more resolution to lower intensities, where the eye happens to have&lt;br /&gt;
its finest intensity discrimination, and therefore uses the available&lt;br /&gt;
scale resolution more efficiently.  Although CRTs are currently&lt;br /&gt;
vanishing, a standard [[wikipedia:sRGB|sRGB]] computer display still uses a nonlinear&lt;br /&gt;
intensity curve similar to television, with a linear ramp near black,&lt;br /&gt;
followed by an exponential curve with a gamma exponent of 2.4. This&lt;br /&gt;
encodes a sixteen bit linear range down into eight bits.&lt;br /&gt;
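&lt;br /&gt;
Here is the sRGB encoding curve just described, sketched in Python&lt;br /&gt;
with the constants from the sRGB specification:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
# Sketch: the sRGB transfer function; a linear ramp near black,&lt;br /&gt;
# then a power curve with exponent 1/2.4.&lt;br /&gt;
def srgb_encode(linear):       # linear intensity in [0.0, 1.0]&lt;br /&gt;
    if linear &amp;lt;= 0.0031308:&lt;br /&gt;
        return 12.92 * linear&lt;br /&gt;
    return 1.055 * linear ** (1.0 / 2.4) - 0.055&lt;br /&gt;
&lt;br /&gt;
# Half of the linear range maps to about 73% of the output codes,&lt;br /&gt;
# giving the dark end its extra resolution.&lt;br /&gt;
print(srgb_encode(0.5))        # about 0.735&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;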
&amp;lt;br style=&amp;quot;clear:both;&amp;quot;/&amp;gt;&lt;br /&gt;
&lt;br /&gt;
===color and colorspace===&lt;br /&gt;
[[Image:Dmpfg_014.jpg|360px|right]]&lt;br /&gt;
&lt;br /&gt;
The human eye has three apparent color channels, red, green, and blue,&lt;br /&gt;
and most displays use these three colors as [[wikipedia:Additive_color|additive primaries]] to&lt;br /&gt;
produce a full range of color output.  The primary pigments in&lt;br /&gt;
printing are [[wikipedia:CMYK|Cyan, Magenta, and Yellow]] for the same reason; pigments&lt;br /&gt;
are [[wikipedia:Subtractive_color|subtractive]], and each of these pigments subtracts one pure color&lt;br /&gt;
from reflected light.  Cyan subtracts red, magenta subtracts green, and&lt;br /&gt;
yellow subtracts blue.&lt;br /&gt;
&lt;br /&gt;
Video can be, and sometimes is, represented with red, green, and blue&lt;br /&gt;
color channels, but RGB video is atypical. The human eye is far more&lt;br /&gt;
sensitive to [[wikipedia:Luminance_(relative)|luminosity]] than it is to color, and RGB tends to spread&lt;br /&gt;
the energy of an image across all three color channels.  That is, the&lt;br /&gt;
red plane looks like a red version of the original picture, the green&lt;br /&gt;
plane looks like a green version of the original picture, and the blue&lt;br /&gt;
plane looks like a blue version of the original picture.  Black and&lt;br /&gt;
white times three.  Not efficient.&lt;br /&gt;
&lt;br /&gt;
For those reasons and because, oh hey, television just happened to&lt;br /&gt;
start out as black and white anyway, video usually is represented as a&lt;br /&gt;
high resolution [[wikipedia:Luma_(video)|luma channel]]&amp;amp;mdash;the black &amp;amp; white&amp;amp;mdash;along with&lt;br /&gt;
additional, often lower resolution [[wikipedia:Chrominance|chroma channels]], the color. The&lt;br /&gt;
luma channel, Y, is produced by weighting and then adding the separate&lt;br /&gt;
red, green and blue signals.  The chroma channels U and V are then&lt;br /&gt;
produced by subtracting the luma signal from blue and the luma signal&lt;br /&gt;
from red.&lt;br /&gt;
&lt;br /&gt;
When YUV is scaled, offset, and quantized for digital video, it&#039;s&lt;br /&gt;
usually more correctly called [[wikipedia:Y&#039;CbCr|Y&#039;CbCr]], but the more generic term YUV is&lt;br /&gt;
widely used to describe all the analog and digital variants of this&lt;br /&gt;
color model.&lt;br /&gt;
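&lt;br /&gt;
As a sketch of that construction, using the Rec. 601 luma weights and&lt;br /&gt;
the classic analog U and V scale factors:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
# Sketch: luma as weighted RGB, chroma as scaled color differences.&lt;br /&gt;
def rgb_to_yuv(r, g, b):                    # inputs in [0.0, 1.0]&lt;br /&gt;
    y = 0.299 * r + 0.587 * g + 0.114 * b   # Rec. 601 luma weights&lt;br /&gt;
    u = 0.492 * (b - y)                     # scaled blue minus luma&lt;br /&gt;
    v = 0.877 * (r - y)                     # scaled red minus luma&lt;br /&gt;
    return y, u, v&lt;br /&gt;
&lt;br /&gt;
print(rgb_to_yuv(1.0, 1.0, 1.0))   # white: luma 1.0, zero chroma&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;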
&amp;lt;br style=&amp;quot;clear:both;&amp;quot;/&amp;gt;&lt;br /&gt;
&lt;br /&gt;
===chroma subsampling===&lt;br /&gt;
[[Image:Dmpfg_015.jpg|360px|right]]&lt;br /&gt;
&lt;br /&gt;
The U and V chroma channels can have the same resolution as the Y&lt;br /&gt;
channel, but because the human eye has far less spatial color&lt;br /&gt;
resolution than spatial luminosity resolution, chroma resolution is&lt;br /&gt;
usually [[wikipedia:Chroma_subsampling|halved or even quartered]] in the horizontal direction, the&lt;br /&gt;
vertical direction, or both, usually without any significant impact on the&lt;br /&gt;
apparent raw image quality.  Practically every possible subsampling&lt;br /&gt;
variant has been used at one time or another, but the common choices&lt;br /&gt;
today are [[wikipedia:Chroma_subsampling#4:4:4_Y.27CbCr|4:4:4]] video, which isn&#039;t actually subsampled at all, [[wikipedia:Chroma_subsampling#4:2:2|4:2:2]] video in&lt;br /&gt;
which the horizontal resolution of the U and V channels is halved, and&lt;br /&gt;
most common of all, [[wikipedia:Chroma_subsampling#4:2:0|4:2:0]] video in which both the horizontal and vertical&lt;br /&gt;
resolutions of the chroma channels are halved, resulting in U and V&lt;br /&gt;
planes that are each one quarter the size of Y.&lt;br /&gt;
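&lt;br /&gt;
Here&#039;s a sketch of 4:2:0 decimation by box-averaging each 2x2 chroma&lt;br /&gt;
block. Real resamplers filter more carefully, and the siting question&lt;br /&gt;
discussed next still applies; this only shows where the factor of&lt;br /&gt;
four comes from.&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
# Sketch: halve a chroma plane in both directions by averaging&lt;br /&gt;
# each 2x2 block, leaving one quarter the original samples.&lt;br /&gt;
def subsample_420(plane):                 # plane: a list of rows&lt;br /&gt;
    out = []&lt;br /&gt;
    for y in range(0, len(plane) - 1, 2):&lt;br /&gt;
        row = []&lt;br /&gt;
        for x in range(0, len(plane[y]) - 1, 2):&lt;br /&gt;
            s = (plane[y][x] + plane[y][x + 1] +&lt;br /&gt;
                 plane[y + 1][x] + plane[y + 1][x + 1])&lt;br /&gt;
            row.append(s / 4.0)&lt;br /&gt;
        out.append(row)&lt;br /&gt;
    return out&lt;br /&gt;
&lt;br /&gt;
u = [[8, 8, 0, 0], [8, 8, 0, 0]]   # a 4x2 toy chroma plane&lt;br /&gt;
print(subsample_420(u))            # [[8.0, 0.0]]&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;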
&lt;br /&gt;
The terms 4:2:2, 4:2:0, [[wikipedia:Chroma_subsampling#4:1:1|4:1:1]], and so on and so forth, aren&#039;t complete&lt;br /&gt;
descriptions of a chroma subsampling. There are multiple possible ways&lt;br /&gt;
to position the chroma pixels relative to luma, and again, several&lt;br /&gt;
variants are in active use for each subsampling.  For example, [[wikipedia:Motion_Jpeg|motion&lt;br /&gt;
JPEG]], [[wikipedia:MPEG-1#Part_2:_Video|MPEG-1 video]], [[wikipedia:MPEG-2#Video_coding_.28simplified.29|MPEG-2 video]], [[wikipedia:DV#DV_Compression|DV]], [[wikipedia:Theora|Theora]], and [[wikipedia:WebM|WebM]] all use or can&lt;br /&gt;
use 4:2:0 subsampling, but they site the chroma pixels [http://www.mir.com/DMG/chroma.html three different ways].&lt;br /&gt;
&lt;br /&gt;
Motion JPEG, MPEG-1 video, Theora and WebM all site chroma pixels&lt;br /&gt;
between luma pixels both horizontally and vertically.&lt;br /&gt;
&lt;br /&gt;
MPEG-2 sites chroma pixels between lines, but horizontally aligned with&lt;br /&gt;
every other luma pixel. Interlaced modes complicate things somewhat,&lt;br /&gt;
resulting in a siting arrangement that&#039;s a tad bizarre.&lt;br /&gt;
&lt;br /&gt;
And finally PAL-DV, which is always interlaced, places the chroma&lt;br /&gt;
pixels in the same position as every other luma pixel in the&lt;br /&gt;
horizontal direction, and vertically alternates chroma channel on&lt;br /&gt;
each line.&lt;br /&gt;
&lt;br /&gt;
That&#039;s just 4:2:0 video. I&#039;ll leave the other subsamplings as homework for the&lt;br /&gt;
viewer.  Got the basic idea, moving on.&lt;br /&gt;
&amp;lt;br style=&amp;quot;clear:both;&amp;quot;/&amp;gt;&lt;br /&gt;
&lt;br /&gt;
===pixel formats===&lt;br /&gt;
[[Image:Dmpfg_016.jpg|360px|right]]&lt;br /&gt;
&lt;br /&gt;
In audio, we always represent multiple channels in a PCM stream by&lt;br /&gt;
interleaving the samples from each channel in order. Video uses both&lt;br /&gt;
packed formats that interleave the color channels, as well as planar&lt;br /&gt;
formats that keep the pixels from each channel together in separate&lt;br /&gt;
planes stacked in order in the frame. There are at least [http://www.fourcc.org/yuv.php 50 different formats] in&lt;br /&gt;
these two broad categories with possibly ten or fifteen in common use.&lt;br /&gt;
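&lt;br /&gt;
As an illustration of the planar idea, here is the byte layout of one&lt;br /&gt;
I420-style frame: a full-resolution Y plane followed by&lt;br /&gt;
quarter-resolution U and V planes. A packed format such as YUYV would&lt;br /&gt;
instead interleave Y0 U0 Y1 V0 along each row.&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
# Sketch: plane offsets for one 8-bit 4:2:0 planar frame.&lt;br /&gt;
w, h = 640, 480&lt;br /&gt;
y_size = w * h                       # 307200 bytes of luma&lt;br /&gt;
uv_size = (w // 2) * (h // 2)        # 76800 bytes per chroma plane&lt;br /&gt;
&lt;br /&gt;
frame_bytes = y_size + 2 * uv_size   # 460800 bytes per frame&lt;br /&gt;
y_off, u_off = 0, y_size             # planes simply stacked in order&lt;br /&gt;
v_off = y_size + uv_size&lt;br /&gt;
print(frame_bytes, y_off, u_off, v_off)&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;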
&lt;br /&gt;
Each chroma subsampling and different bit-depth requires a different&lt;br /&gt;
packing arrangement, and so a different pixel format.  For a given&lt;br /&gt;
unique subsampling, there are usually also several equivalent formats&lt;br /&gt;
that consist of trivial channel order rearrangements or repackings, due either to&lt;br /&gt;
convenience once-upon-a-time on some particular piece of hardware, or&lt;br /&gt;
sometimes just good old-fashioned spite.&lt;br /&gt;
&lt;br /&gt;
Pixel formats are described by a unique name or [[wikipedia:FourCC|fourcc]] code.  There&lt;br /&gt;
are quite a few of these and there&#039;s no sense going over each one now.&lt;br /&gt;
Google is your friend.  Be aware that fourcc codes for raw video&lt;br /&gt;
specify the pixel arrangement and chroma subsampling, but generally&lt;br /&gt;
don&#039;t imply anything certain about chroma siting or color space.  [http://www.fourcc.org/yuv.php#UYVY YV12]&lt;br /&gt;
video, to pick one, can use JPEG, MPEG-2, or DV chroma siting, and any&lt;br /&gt;
one of [[wikipedia:YUV#BT.709_and_BT.601|several YUV colorspace definitions]].&lt;br /&gt;
&amp;lt;br style=&amp;quot;clear:both;&amp;quot;/&amp;gt;&lt;br /&gt;
&lt;br /&gt;
===done!===&lt;br /&gt;
&lt;br /&gt;
That wraps up our not-so-quick and yet very incomplete tour of raw&lt;br /&gt;
video. The good news is we can already get quite a lot of real work&lt;br /&gt;
done using that overview. In plenty of situations, a frame of video&lt;br /&gt;
data is a frame of video data.  The details matter, greatly, when it&lt;br /&gt;
comes time to write software, but for now I am satisfied that the&lt;br /&gt;
esteemed viewer is broadly aware of the relevant issues.&lt;br /&gt;
&lt;br /&gt;
&amp;lt;center&amp;gt;&amp;lt;div style=&amp;quot;background-color:#DDDDFF;border-color:#CCCCDD;border-style:solid;width:80%;padding:0 1em 1em 1em;text-align:left;&amp;quot;&amp;gt;&lt;br /&gt;
&#039;&#039;&#039;Going deeper…&#039;&#039;&#039;&lt;br /&gt;
* [http://wiki.multimedia.cx/index.php?title=YUV4MPEG2 The y4m format] is the most common simple container for raw YUV video.  People occasionally use [[OggYUV]] to store it in Ogg instead.&lt;br /&gt;
* Learn about [[wikipedia:High_dynamic_range_imaging|high dynamic range imaging]], which achieves better representation of the full range of brightnesses in the real world by using more than 8 bits per channel.&lt;br /&gt;
* Learn about how [[wikipedia:Trichromatic_vision|trichromatic color vision]] works in humans, and how human color perception is encoded in the [[wikipedia:CIE 1931 color space|CIE 1931 XYZ color space]].&lt;br /&gt;
** Compare with the [[wikipedia:Lab_color_space|Lab color space]], mathematically equivalent but structured to account for &amp;quot;perceptual uniformity&amp;quot;.&lt;br /&gt;
** If we were all [[wikipedia:Dichromacy|dichromats]] then video would only need two color channels.  Some humans might be [[wikipedia:Tetrachromacy#Possibility_of_human_tetrachromats|tetrachromats]], in which case they would need an additional color channel for video to fully represent their vision.&lt;br /&gt;
** [http://www.xritephoto.com/ph_toolframe.aspx?action=coloriq Test your color vision] (or at least your monitor).&lt;br /&gt;
* YCbCr is defined in terms of RGB by the ITU in two incompatible standards: [[wikipedia:Rec. 601|Rec. 601]] and [[wikipedia:Rec. 709|Rec. 709]].  Both conversion standards are lossy, which has prompted some to adopt a lossless alternative called [http://wiki.multimedia.cx/index.php?title=YCoCg YCoCg].&lt;br /&gt;
&amp;lt;/div&amp;gt;&amp;lt;/center&amp;gt;&lt;br /&gt;
&lt;br /&gt;
&amp;lt;br style=&amp;quot;clear:both;&amp;quot;/&amp;gt;&lt;br /&gt;
&lt;br /&gt;
==Containers==&lt;br /&gt;
[[Image:Dmpfg_017.jpg|360px|right]]&lt;br /&gt;
&amp;lt;small&amp;gt;[[Talk:A_Digital_Media_Primer_For_Geeks_(episode_1)#Containers|Discuss this section]]&amp;lt;/small&amp;gt;&lt;br /&gt;
&lt;br /&gt;
So. We have audio data. We have video data. What remains is the more&lt;br /&gt;
familiar non-signal data and straight-up engineering that software&lt;br /&gt;
developers are used to, and plenty of it.&lt;br /&gt;
&lt;br /&gt;
Chunks of raw audio and video data have no externally-visible&lt;br /&gt;
structure, but they&#039;re often uniformly sized.  We could just string&lt;br /&gt;
them together in a rigid predetermined ordering for streaming and&lt;br /&gt;
storage, and some simple systems do approximately that. Compressed&lt;br /&gt;
frames, though, aren&#039;t necessarily a predictable size, and we usually want&lt;br /&gt;
some flexibility in using a range of different data types in streams.&lt;br /&gt;
If we string random formless data together, we lose the boundaries&lt;br /&gt;
that separate frames and don&#039;t necessarily know what data belongs to&lt;br /&gt;
which streams.  A stream needs some generalized structure to be&lt;br /&gt;
generally useful.&lt;br /&gt;
&lt;br /&gt;
In addition to our signal data, we also have our PCM and video&lt;br /&gt;
parameters.  There&#039;s probably plenty of other [[wikipedia:Metadata#Video|metadata]] we also want to&lt;br /&gt;
deal with, like audio tags and video chapters and subtitles, all&lt;br /&gt;
essential components of rich media.  It makes sense to place this&lt;br /&gt;
metadata&amp;amp;mdash;that is, data about the data&amp;amp;mdash;within the media itself.&lt;br /&gt;
&lt;br /&gt;
Storing and structuring formless data and disparate metadata is the&lt;br /&gt;
job of a [[wikipedia:Container_format_(digital)|container]].  Containers provide framing for the data blobs,&lt;br /&gt;
interleave and identify multiple data streams, provide timing&lt;br /&gt;
information, and store the metadata necessary to parse, navigate,&lt;br /&gt;
manipulate, and present the media.  In general, any container can hold&lt;br /&gt;
any kind of data.  And data can be put into any container.&lt;br /&gt;
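&lt;br /&gt;
To make the framing job concrete, here is a toy sketch: each blob is&lt;br /&gt;
prefixed with a stream id, a length, and a timestamp. Real containers&lt;br /&gt;
such as Ogg, Matroska, or MP4 are enormously richer (seeking,&lt;br /&gt;
checksums, codec setup data), but this is the core of it.&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
# Toy container sketch: length-prefixed packets tagged with a&lt;br /&gt;
# stream id and a timestamp. Illustration only.&lt;br /&gt;
import struct&lt;br /&gt;
&lt;br /&gt;
HDR = &amp;quot;&amp;lt;BIQ&amp;quot;   # 1-byte stream id, 4-byte length, 8-byte timestamp&lt;br /&gt;
&lt;br /&gt;
def write_packet(f, stream_id, timestamp_ms, payload):&lt;br /&gt;
    f.write(struct.pack(HDR, stream_id, len(payload), timestamp_ms))&lt;br /&gt;
    f.write(payload)&lt;br /&gt;
&lt;br /&gt;
def read_packet(f):&lt;br /&gt;
    header = f.read(struct.calcsize(HDR))    # 13 bytes&lt;br /&gt;
    if len(header) &amp;lt; struct.calcsize(HDR):&lt;br /&gt;
        return None                          # end of stream&lt;br /&gt;
    stream_id, length, timestamp_ms = struct.unpack(HDR, header)&lt;br /&gt;
    return stream_id, timestamp_ms, f.read(length)&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;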
&amp;lt;br style=&amp;quot;clear:both;&amp;quot;/&amp;gt;&lt;br /&gt;
&lt;br /&gt;
==Credits==&lt;br /&gt;
[[Image:Dmpfg_018.jpg|360px|right]]&lt;br /&gt;
&lt;br /&gt;
In the past thirty minutes, we&#039;ve covered digital audio, video, some&lt;br /&gt;
history, some math and a little engineering. We&#039;ve barely scratched the&lt;br /&gt;
surface, but it&#039;s time for a well-earned break.&lt;br /&gt;
&lt;br /&gt;
There&#039;s so much more to talk about, so I hope you&#039;ll join me again in&lt;br /&gt;
our next episode.  Until then&amp;amp;mdash;Cheers!&lt;br /&gt;
&lt;br /&gt;
Written by:&lt;br /&gt;
Christopher (Monty) Montgomery&lt;br /&gt;
and the Xiph.Org Community&lt;br /&gt;
&lt;br /&gt;
Intro, title and credits music:&amp;lt;br&amp;gt;&lt;br /&gt;
&amp;quot;Boo Boo Coming&amp;quot;, by Joel Forrester&amp;lt;br&amp;gt;&lt;br /&gt;
Performed by the [http://microscopicseptet.com/ Microscopic Septet]&amp;lt;br&amp;gt;&lt;br /&gt;
Used by permission of Cuneiform Records.&amp;lt;br&amp;gt;&lt;br /&gt;
Original source track All Rights Reserved.&amp;lt;br&amp;gt;&lt;br /&gt;
[http://www.cuneiformrecords.com www.cuneiformrecords.com]&lt;br /&gt;
&lt;br /&gt;
This Video Was Produced Entirely With Free and Open Source Software:&amp;lt;br&amp;gt;&lt;br /&gt;
&lt;br /&gt;
[http://www.gnu.org/ GNU]&amp;lt;br&amp;gt;&lt;br /&gt;
[http://www.linux.org/ Linux]&amp;lt;br&amp;gt;&lt;br /&gt;
[http://fedoraproject.org/ Fedora]&amp;lt;br&amp;gt;&lt;br /&gt;
[http://cinelerra.org/ Cinelerra]&amp;lt;br&amp;gt;&lt;br /&gt;
[http://www.gimp.org/ The Gimp]&amp;lt;br&amp;gt;&lt;br /&gt;
[http://audacity.sourceforge.net/ Audacity]&amp;lt;br&amp;gt;&lt;br /&gt;
[http://svn.xiph.org/trunk/postfish/README Postfish]&amp;lt;br&amp;gt;&lt;br /&gt;
[http://gstreamer.freedesktop.org/ Gstreamer]&amp;lt;br&amp;gt;&lt;br /&gt;
&lt;br /&gt;
[http://creativecommons.org/licenses/by-nc-sa/3.0/legalcode CC-BY-NC-SA]&amp;lt;br&amp;gt;&lt;br /&gt;
A Co-Production of Xiph.Org and Red Hat Inc.&amp;lt;br&amp;gt;&lt;br /&gt;
(C) 2010, Some Rights Reserved&amp;lt;br&amp;gt;&lt;br /&gt;
&lt;br /&gt;
&amp;lt;br style=&amp;quot;clear:both;&amp;quot;/&amp;gt;&lt;br /&gt;
&lt;br /&gt;
==The making of…==&lt;br /&gt;
&amp;lt;small&amp;gt;[[Talk:A_Digital_Media_Primer_For_Geeks_(episode_1)#The_making_of.E2.80.A6|Discuss this section]]&amp;lt;/small&amp;gt;&lt;br /&gt;
===Equipment===&lt;br /&gt;
====Camera====&lt;br /&gt;
Canon HV40 HDV camera w/ wide-angle lens operating on a tripod.&lt;br /&gt;
&lt;br /&gt;
The wide angle lens gives the camera a nice close macro mode, and approximately triples the amount of light coming into the sensor for a given zoom/aperture.  Useful for shooting indoors at night.&lt;br /&gt;
&lt;br /&gt;
No additional lighting kit was used.&lt;br /&gt;
&lt;br /&gt;
====Audio====&lt;br /&gt;
&lt;br /&gt;
Two Crown PCC160 boundary microphones placed on a table approximately 4-8 feet in front of the speaker, run through a cheap Behringer portable mixer and into the camera&#039;s microphone input.  &lt;br /&gt;
&lt;br /&gt;
No additional audio kit was used.&lt;br /&gt;
&lt;br /&gt;
====Sundries====&lt;br /&gt;
&lt;br /&gt;
Whiteboard markers by &#039;Bic&#039;&lt;br /&gt;
&lt;br /&gt;
Drawing aids by Staedtler, McMaster Carr, and &#039;Generic&#039;.&lt;br /&gt;
&lt;br /&gt;
===Video shooting sequence===&lt;br /&gt;
&lt;br /&gt;
Scenes were pre-scripted and memorized, usually with lots of on-the-fly revision.  (In the future... I&#039;m totally getting a teleprompter.  Wh000.  OTOH, I can totally rattle off the entire video script from beginning to end as a party trick, thus ensuring I&#039;ll not be invited to many parties --Monty).&lt;br /&gt;
&lt;br /&gt;
Diagrams were drawn by hand on a physical whiteboard with whiteboard markers and magnetic T-squares, triangles, and yardsticks.  Despite looking a lot like greenscreen work, there is no image compositing in use (actually-- there are two small composites where an error in a whiteboard diagram was corrected by subtracting part of the original image and then adding a corrected version of the diagram).&lt;br /&gt;
&lt;br /&gt;
Camera operated in 24F shutter priority mode (Tv set to &amp;quot;24&amp;quot;) with exposure and white balance both calibrated to the white board (or a white piece of paper) and locked.  Microphone attenuation setting active, with gain locked such that room noise peaked at -40dB (all the rooms in the shooting sequences were noisy due to the building&#039;s ventilation system, or active equipment). Lighting in the whiteboard rooms tended to be odd, with little relative light cast on a presenter standing just in front of the whiteboard; a presenter is practically standing in the room&#039;s only shadow.  Most of the room light is focused on the table and walls.  Additional fill lighting kit would have been useful, but for the first vid, I didn&#039;t want &#039;perfect&#039; to be the enemy of &#039;good&#039;.&lt;br /&gt;
&lt;br /&gt;
Autofocus used for whiteboard scenes, manual focus used for several workshop scenes as the autofocus tended to hunt continuously in very low light.&lt;br /&gt;
&lt;br /&gt;
Continuous capture to a Thinkpad with firewire input via a simple gstreamer script.&lt;br /&gt;
&lt;br /&gt;
===Production sequence===&lt;br /&gt;
====All hail Cinelerra.  You better hail, or Cinelerra will get pissy about it.====&lt;br /&gt;
&lt;br /&gt;
Most of the production sequence hinged on making Cinelerra happy; it is a hulking rusty cast iron WWI tank of a program that can seem like it&#039;s composed entirely of compressed bugs.  That said, it was neither particularly crashy nor did it ever accidentally corrupt or lose work.  It was also the only FOSS editor with a working 2D compositor.  It got the job done once I found a workflow it would cope with (and fixed a number of bugs; these fixes are available from my cinelerra Git repo at http://git.xiph.org/?p=users/xiphmont/cinelerraCV.git;a=summary)&lt;br /&gt;
&lt;br /&gt;
====Choosing takes====&lt;br /&gt;
&lt;br /&gt;
Each shooting session yielded four to six hours of raw video.  The first step was to load the raw video into the Cinelerra timeline, label each complete take, compare and choose the take to use, then render the chosen take out to a raw clip as a YUV4MPEG raw video file and a WAV raw audio file.  Be careful that Settings-&amp;gt;Align Cursor On Frames is set, else the audio and video renders won&#039;t start on the same boundary.&lt;br /&gt;
&lt;br /&gt;
====Postprocessing====&lt;br /&gt;
&lt;br /&gt;
At this point, the raw video clips were adjusted for gamma, contrast, and saturation in gstreamer and mplayer.  In the earlier shoots the camera was underexposing due to pilot error, which required quite a bit of gamma and saturation inflation to &#039;correct&#039; (there is no real correction as the low-end data is gone, but it&#039;s possible to make it look better).  Later shoots used saner settings and the adjustments were mostly to keep different shooting sessions more uniform.  The whiteboard tends not to look white because it&#039;s mildly reflective, and picked up the color of the cyan and orange audio baffles in the room like a big diffuse mirror.&lt;br /&gt;
&lt;br /&gt;
The audio was both noisy (due to the building&#039;s ventilation system which either sounded like a low loud rumble or a jet-engine taking off) and reverberant (the rooms were glass on two sides and plaster on the other two).  Early takes used no additional sound absorbing material in the rooms, and the Postfish filtering and deverb were used heavily.  It gives the early audio in the vid a slightly odd, processed feel (I had almost decided the original audio was simply unusable).  Later takes used some big fleece &#039;soft flats&#039; in the room to absorb some additional reverb, and the later takes are less heavily filtered.&lt;br /&gt;
&lt;br /&gt;
The Postfish filtering chain used declip (for the occasional overrange oops), deverb (remove room reverberation), multicompand (noise gating), single compand (for volume levelling) and EQ (the Crown mics are nice, but are very midrange heavy).&lt;br /&gt;
&lt;br /&gt;
====Special Effects====&lt;br /&gt;
&lt;br /&gt;
Audio special effects were one-offs, mostly done using SoX.  The processed demo sections of audio were then spliced back into the original audio takes using Audacity.&lt;br /&gt;
&lt;br /&gt;
Video special effects (eg, removing a color channel, etc) were done by writing quick, one-off filters in C for y4oi.  A few effects were done by dumping a take as a directory full of PNGs and then batch-processing the PNGs again using a one-off C program, then reassembling with mplayer.  Video effects were then stitched back into the original video takes in Cinelerra.&lt;br /&gt;
&lt;br /&gt;
====Editing====&lt;br /&gt;
&lt;br /&gt;
All editing was done in Cinelerra.  This primarily consisted of stitching the individual takes back together with crossfades.  All input and rendering output were done with raw YUV4MPEG and WAV files.  Note that making this work well and correctly required several patches to the YUV4MPEG handler and colorspace conversion code.&lt;br /&gt;
&lt;br /&gt;
====Encoding====&lt;br /&gt;
&lt;br /&gt;
Encoding was done by hand external to Cinelerra using mplayer for final postprocessing, the encoder_example included with the Ptalarbvorm Theora source distribution, and ivfenc for WebM.&lt;br /&gt;
&lt;br /&gt;
Sample Theora encode command line (note this is using an mplayer patched for y4o support; it could be done just as easily with a yuv4mpeg pipe):&lt;br /&gt;
&lt;br /&gt;
# 360p, 128-ish (a4) audio + 500-ish (v50) video&lt;br /&gt;
mplayer -vf hqdn3d,scale=640:360,gradfun=1.5,unsharp=l3x3:.1,yuv4ogg complete2.m2v -fast -noconsolecontrols -vo null &amp;gt; /dev/null &amp;amp; ~/MotherfishSVN/theora-ptalarbvorm/examples/encoder_example -a 4 -v 50 -k 240 complete2.wav output.y4o -o A_Digital_Media_Primer_For_Geeks-360p-a4+v50.ogv&lt;/div&gt;</summary>
		<author><name>Mindspillage</name></author>
	</entry>
	<entry>
		<id>https://wiki.xiph.org/index.php?title=Videos/A_Digital_Media_Primer_For_Geeks&amp;diff=12447</id>
		<title>Videos/A Digital Media Primer For Geeks</title>
		<link rel="alternate" type="text/html" href="https://wiki.xiph.org/index.php?title=Videos/A_Digital_Media_Primer_For_Geeks&amp;diff=12447"/>
		<updated>2010-09-22T04:36:52Z</updated>

		<summary type="html">&lt;p&gt;Mindspillage: /* Credits */ project links&lt;/p&gt;
&lt;hr /&gt;
&lt;div&gt;[[Image:Dmpfg_001.jpg|360px|right]]&lt;br /&gt;
&amp;lt;small&amp;gt;&#039;&#039;Wiki edition&#039;&#039;&amp;lt;/small&amp;gt;&lt;br /&gt;
&lt;br /&gt;
This first video from Xiph.Org presents the technical foundations of modern digital media via a half-hour firehose of information. One community member called it &amp;quot;a Uni lecture I never got but really wanted.&amp;quot;&lt;br /&gt;
&lt;br /&gt;
The program offers a brief history of digital media, a quick summary of the sampling theorem, and myriad details of low level audio and video characterization and formatting. It&#039;s intended for budding geeks looking to get into video coding, as well as the technically curious who want to know more about the media they wrangle for work or play.&lt;br /&gt;
&amp;lt;br/&amp;gt;&amp;lt;br/&amp;gt;&amp;lt;br/&amp;gt;&lt;br /&gt;
&amp;lt;center&amp;gt;&amp;lt;font size=&amp;quot;+2&amp;quot;&amp;gt;[http://www.xiph.org/video/vid1.shtml Download or Watch online]&amp;lt;/font&amp;gt;&amp;lt;/center&amp;gt;&lt;br /&gt;
&amp;lt;br style=&amp;quot;clear:both;&amp;quot;/&amp;gt;&lt;br /&gt;
&amp;lt;hr/&amp;gt;&lt;br /&gt;
&lt;br /&gt;
==Introduction==&lt;br /&gt;
[[Image:Dmpfg_000.jpg|360px|right]]&lt;br /&gt;
[[Image:Dmpfg_002.jpg|360px|right]]&lt;br /&gt;
&amp;lt;small&amp;gt;[[Talk:A_Digital_Media_Primer_For_Geeks_(episode_1)#Introduction|Discuss this section]]&amp;lt;/small&amp;gt;&lt;br /&gt;
&lt;br /&gt;
Workstations and high-end personal computers have been able to&lt;br /&gt;
manipulate digital audio pretty easily for about fifteen years now.&lt;br /&gt;
It&#039;s only been about five years that a decent workstation&#039;s been able&lt;br /&gt;
to handle raw video without a lot of expensive special purpose&lt;br /&gt;
hardware.&lt;br /&gt;
&lt;br /&gt;
But today even most cheap home PCs have the processor power and&lt;br /&gt;
storage necessary to really toss raw video around, at least without&lt;br /&gt;
too much of a struggle. So now that everyone has all of this cheap media-capable hardware, &lt;br /&gt;
more people, not surprisingly, want to do interesting&lt;br /&gt;
things with digital media, especially streaming. YouTube was the first huge&lt;br /&gt;
success, and now everybody wants in.&lt;br /&gt;
&lt;br /&gt;
Well good!  Because this stuff is a lot of fun!&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
It&#039;s no problem finding consumers for digital media.  But here I&#039;d&lt;br /&gt;
like to address the engineers, the mathematicians, the hackers, the&lt;br /&gt;
people who are interested in discovering and making things and&lt;br /&gt;
building the technology itself. The people after my own heart.&lt;br /&gt;
&lt;br /&gt;
Digital media, compression especially, is perceived to be super-elite,&lt;br /&gt;
somehow incredibly more difficult than anything else in computer&lt;br /&gt;
science. The big industry players in the field don&#039;t mind this&lt;br /&gt;
perception at all; it helps justify the staggering number of very&lt;br /&gt;
basic patents they hold.  They like the image that their media&lt;br /&gt;
researchers &amp;quot;are the best of the best, so much smarter than anyone&lt;br /&gt;
else that their brilliant ideas can&#039;t even be understood by mere&lt;br /&gt;
mortals.&amp;quot; This is bunk.&lt;br /&gt;
&lt;br /&gt;
Digital audio and video and streaming and compression offer endless&lt;br /&gt;
deep and stimulating mental challenges, just like any other&lt;br /&gt;
discipline. It seems elite because so few people have been&lt;br /&gt;
involved.  So few people have been involved perhaps because so few&lt;br /&gt;
people could afford the expensive, special-purpose equipment it&lt;br /&gt;
required. But today, just about anyone watching this video has a&lt;br /&gt;
cheap, general-purpose computer powerful enough to play with the big&lt;br /&gt;
boys. There are battles going on today around HTML5 and browsers and&lt;br /&gt;
video and open vs. closed.  So now is a pretty good time to get&lt;br /&gt;
involved.  The easiest place to start is probably understanding the&lt;br /&gt;
technology we have right now.&lt;br /&gt;
&lt;br /&gt;
This is an introduction. Since it&#039;s an introduction, it glosses over a&lt;br /&gt;
ton of details so that the big picture&#039;s a little easier to see.&lt;br /&gt;
Quite a few people watching are going to be way past anything that I&#039;m&lt;br /&gt;
talking about, at least for now.  On the other hand, I&#039;m probably&lt;br /&gt;
going to go too fast for folks who really are brand new to all of&lt;br /&gt;
this, so if this is all new, relax. The important thing is to pick out&lt;br /&gt;
any ideas that really grab your imagination. Especially pay attention&lt;br /&gt;
to the terminology surrounding those ideas, because with those, and&lt;br /&gt;
Google, and Wikipedia, you can dig as deep as interests you.&lt;br /&gt;
&lt;br /&gt;
So, without any further ado, welcome to one hell of a new hobby.&lt;br /&gt;
&lt;br /&gt;
&amp;lt;center&amp;gt;&amp;lt;div style=&amp;quot;background-color:#DDDDFF;border-color:#CCCCDD;border-style:solid;width:80%;padding:0 1em 1em 1em;text-align:left;&amp;quot;&amp;gt;&lt;br /&gt;
&#039;&#039;&#039;Going deeper…&#039;&#039;&#039;&lt;br /&gt;
*[http://www.0xdeadbeef.com/weblog/2010/01/html5-video-and-h-264-what-history-tells-us-and-why-were-standing-with-the-web/ HTML5 Video and H.264: what history tells us and why we&#039;re standing with the web]: Chris Blizzard of Mozilla on free formats and the open web&lt;br /&gt;
*[http://diveintohtml5.org/video.html Dive into HTML5] web video tutorial&lt;br /&gt;
&amp;lt;/div&amp;gt;&amp;lt;/center&amp;gt;&lt;br /&gt;
&lt;br /&gt;
&amp;lt;br style=&amp;quot;clear:both;&amp;quot;/&amp;gt;&lt;br /&gt;
&lt;br /&gt;
==Analog vs Digital==&lt;br /&gt;
[[Image:Dmpfg_004.jpg|360px|right]]&lt;br /&gt;
[[Image:Dmpfg_006.jpg|360px|right]]&lt;br /&gt;
[[Image:Dmpfg_007.jpg|360px|right]]&lt;br /&gt;
&amp;lt;small&amp;gt;[[Talk:A_Digital_Media_Primer_For_Geeks_(episode_1)#Analog_vs_Digital|Discuss this section]]&amp;lt;/small&amp;gt;&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
[[WikiPedia:Sound|Sound]] is the propagation of pressure waves through air, spreading out&lt;br /&gt;
from a source like ripples spread from a stone tossed into a pond.  A&lt;br /&gt;
microphone, or the human ear for that matter, transforms these passing&lt;br /&gt;
ripples of pressure into an electric signal.  Right, this is&lt;br /&gt;
middle school science class, everyone remembers this.  Moving on.&lt;br /&gt;
&lt;br /&gt;
That audio signal is a one-dimensional function, a single value&lt;br /&gt;
varying over time.  If we slow the [[WikiPedia:Oscilloscope|&#039;scope]] down a bit... that should be&lt;br /&gt;
a little easier to see. A few other aspects of the signal are&lt;br /&gt;
important. It&#039;s [[WikiPedia:Continuous_function|continuous]] in both value and time; that is, at any&lt;br /&gt;
given time it can have any real value, and there&#039;s a smoothly varying&lt;br /&gt;
value at every point in time.  No matter how much we zoom in, there&lt;br /&gt;
are no discontinuities, no singularities, no instantaneous steps or&lt;br /&gt;
points where the signal ceases to exist. It&#039;s defined&lt;br /&gt;
everywhere. Classic continuous math works very well on these signals.&lt;br /&gt;
&lt;br /&gt;
A digital signal on the other hand is [[WikiPedia:Discrete_math|discrete]] in both value and time.&lt;br /&gt;
In the simplest and most common system, called [[WikiPedia:Pulse code modulation|Pulse Code Modulation]],&lt;br /&gt;
one of a fixed number of possible values directly represents the&lt;br /&gt;
instantaneous signal amplitude at points in time spaced a fixed&lt;br /&gt;
distance apart.  The end result is a stream of digits.&lt;br /&gt;
&lt;br /&gt;
Now this looks an awful lot like this.  It seems intuitive that we&lt;br /&gt;
should somehow be able to rigorously transform one into the other, and&lt;br /&gt;
good news, the [[WikiPedia:Nyquist-Shannon sampling theorem|Sampling Theorem]] says we can and tells us&lt;br /&gt;
how. Published in its most recognizable form by [[WikiPedia:Claude Shannon|Claude Shannon]] in 1949&lt;br /&gt;
and built on the work of [[WikiPedia:Harry Nyquist|Nyquist]], and [[WikiPedia:Ralph Hartley|Hartley]], and tons of others, the&lt;br /&gt;
sampling theorem not only states that we can go back and&lt;br /&gt;
forth between analog and digital, but also lays&lt;br /&gt;
down a set of conditions under which conversion is lossless and the two&lt;br /&gt;
representations become equivalent and interchangeable.  When the&lt;br /&gt;
lossless conditions aren&#039;t met, the sampling theorem tells us how and&lt;br /&gt;
how much information is lost or corrupted.&lt;br /&gt;
&lt;br /&gt;
Up until very recently, analog technology was the basis for&lt;br /&gt;
practically everything done with audio, and that&#039;s not because most&lt;br /&gt;
audio comes from an originally analog source.  You may also think that&lt;br /&gt;
since computers are fairly recent, analog signal technology must have&lt;br /&gt;
come first.  Nope. Digital is actually older.  The [[WikiPedia:Telegraph|telegraph]] predates&lt;br /&gt;
the telephone by half a century and was already fully mechanically&lt;br /&gt;
automated by the 1860s, sending coded, multiplexed digital signals&lt;br /&gt;
long distances. You know... [[WikiPedia:Tickertape|tickertape]]. Harry Nyquist of [[WikiPedia:Bell_labs|Bell Labs]] was&lt;br /&gt;
researching telegraph pulse transmission when he published his&lt;br /&gt;
description of what later became known as the [[WikiPedia:Nyquist_frequency|Nyquist frequency]], the&lt;br /&gt;
core concept of the sampling theorem.  Now, it&#039;s true the telegraph&lt;br /&gt;
was transmitting symbolic information, text, not a digitized analog&lt;br /&gt;
signal, but with the advent of the telephone and radio, analog and&lt;br /&gt;
digital signal technology progressed rapidly and side-by-side.&lt;br /&gt;
&lt;br /&gt;
Audio had always been manipulated as an analog signal because... well,&lt;br /&gt;
gee, it&#039;s so much easier.  A [[WikiPedia:Low-pass_filter#Continuous-time_low-pass_filters|second-order low-pass filter]], for example,&lt;br /&gt;
requires two passive components.  An all-analog [[WikiPedia:Short-time_Fourier_transform|short-time Fourier&lt;br /&gt;
transform]], a few hundred.  Well, maybe a thousand if you want to build&lt;br /&gt;
something really fancy (bang on the [http://www.testequipmentdepot.com/usedequipment/hewlettpackard/spectrumanalyzers/3585a.htm 3585]).  Processing signals&lt;br /&gt;
digitally requires millions to billions of transistors running at&lt;br /&gt;
microwave frequencies, support hardware at very least to digitize and&lt;br /&gt;
reconstruct the analog signals, a complete software ecosystem for&lt;br /&gt;
programming and controlling that billion-transistor juggernaut,&lt;br /&gt;
digital storage just in case you want to keep any of those bits for&lt;br /&gt;
later...&lt;br /&gt;
&lt;br /&gt;
So we come to the conclusion that analog is the only practical way to&lt;br /&gt;
do much with audio... well, unless you happen to have a billion&lt;br /&gt;
transistors and all the other things just lying around. And [[WikiPedia:File:Transistor_Count_and_Moore&#039;s_Law_-_2008.svg|since we&lt;br /&gt;
do]], digital signal processing becomes very attractive.&lt;br /&gt;
&lt;br /&gt;
For one thing, analog componentry just doesn&#039;t have the flexibility of&lt;br /&gt;
a general purpose computer.  Adding a new function to this&lt;br /&gt;
beast... yeah, it&#039;s probably not going to happen.  On a digital&lt;br /&gt;
processor though, just write a new program.  Software isn&#039;t trivial,&lt;br /&gt;
but it is a lot easier.&lt;br /&gt;
&lt;br /&gt;
Perhaps more importantly, though, every analog component is an&lt;br /&gt;
approximation. There&#039;s no such thing as a perfect transistor, or a&lt;br /&gt;
perfect inductor, or a perfect capacitor.  In analog, every component&lt;br /&gt;
adds [[WikiPedia:Johnson–Nyquist_noise|noise]] and [[WikiPedia:Distortion#Electronic_signals|distortion]], usually not very much, but it adds up. Just&lt;br /&gt;
transmitting an analog signal, especially over long distances,&lt;br /&gt;
progressively, measurably, irretrievably corrupts it.  Besides, all of&lt;br /&gt;
those single-purpose analog components take up a lot of space.  Two&lt;br /&gt;
lines of code on the billion transistors back here can implement a&lt;br /&gt;
filter that would require an [[WikiPedia:Inductor|inductor]] the size of a refrigerator.&lt;br /&gt;
&lt;br /&gt;
Digital systems don&#039;t have these drawbacks.  Digital signals can be&lt;br /&gt;
stored, copied, manipulated, and transmitted without adding any noise&lt;br /&gt;
or distortion. We do use [[WikiPedia:Lossy_compression|lossy]] algorithms from time to time, but the&lt;br /&gt;
only unavoidably non-ideal steps are digitization and reconstruction,&lt;br /&gt;
where digital has to interface with all of that messy analog.  Messy&lt;br /&gt;
or not, modern [[WikiPedia:Digital-to-analog_converter|conversion stages]] are very, very good.  By the&lt;br /&gt;
standards of our ears, we can consider them practically lossless as&lt;br /&gt;
well.&lt;br /&gt;
&lt;br /&gt;
With a little extra hardware, then, most of which is now small and&lt;br /&gt;
inexpensive due to our modern industrial infrastructure, digital audio&lt;br /&gt;
is the clear winner over analog.  So let us then go about storing it,&lt;br /&gt;
copying it, manipulating it, and transmitting it.&lt;br /&gt;
&lt;br /&gt;
&amp;lt;center&amp;gt;&amp;lt;div style=&amp;quot;background-color:#DDDDFF;border-color:#CCCCDD;border-style:solid;width:80%;padding:0 1em 1em 1em;text-align:left;&amp;quot;&amp;gt;&lt;br /&gt;
&#039;&#039;&#039;Going deeper…&#039;&#039;&#039;&lt;br /&gt;
*Wikipedia: [[wiki:Nyquist–Shannon_sampling_theorem|Nyquist–Shannon sampling theorem]]&lt;br /&gt;
*MIT OpenCourseWare [http://ocw.mit.edu/courses/electrical-engineering-and-computer-science/6-003-signals-and-systems-spring-2010/lecture-notes/ Lecture notes from 6.003 signals and systems.]&lt;br /&gt;
&amp;lt;/div&amp;gt;&amp;lt;/center&amp;gt;&lt;br /&gt;
&lt;br /&gt;
&amp;lt;br style=&amp;quot;clear:both;&amp;quot;/&amp;gt;&lt;br /&gt;
&lt;br /&gt;
==Raw (digital audio) meat==&lt;br /&gt;
&amp;lt;small&amp;gt;[[Talk:A_Digital_Media_Primer_For_Geeks_(episode_1)#Raw_.28digital_audio.29_meat|Discuss this section]]&amp;lt;/small&amp;gt;&lt;br /&gt;
&lt;br /&gt;
Pulse Code Modulation is the most common representation for &lt;br /&gt;
raw audio.  Other practical representations do exist: for example, the&lt;br /&gt;
[[WikiPedia:Delta-sigma_modulation|Sigma-Delta coding]] used by the [[WikiPedia:Super_Audio_CD|SACD]], which is a form of [[wikipedia:Pulse-density_modulation|Pulse Density&lt;br /&gt;
Modulation]].  That said, Pulse Code Modulation is far&lt;br /&gt;
and away dominant, mainly because it&#039;s so mathematically&lt;br /&gt;
convenient.  An audio engineer can spend an entire career without&lt;br /&gt;
running into anything else.&lt;br /&gt;
&lt;br /&gt;
PCM encoding can be characterized in three parameters, making it easy&lt;br /&gt;
to account for every possible PCM variant with mercifully little&lt;br /&gt;
hassle.&lt;br /&gt;
&amp;lt;br style=&amp;quot;clear:both;&amp;quot;/&amp;gt;&lt;br /&gt;
===sample rate===&lt;br /&gt;
[[Image:Dmpfg_009.jpg|360px|right]]&lt;br /&gt;
[[Image:Dmpfg_008.jpg|360px|right]]&lt;br /&gt;
The first parameter is the [[wikipedia:Sampling_rate|sampling rate]].  The highest frequency an&lt;br /&gt;
encoding can represent is called the Nyquist Frequency.  The Nyquist&lt;br /&gt;
frequency of PCM happens to be exactly half the sampling rate.&lt;br /&gt;
Therefore, the sampling rate directly determines the highest possible&lt;br /&gt;
frequency in the digitized signal.&lt;br /&gt;
&lt;br /&gt;
Analog telephone systems traditionally [[wikipedia:Bandlimiting|band-limited]] voice channels to&lt;br /&gt;
just under 4kHz, so digital telephony and most classic voice&lt;br /&gt;
applications use an 8kHz sampling rate: the minimum sampling rate&lt;br /&gt;
necessary to capture the entire bandwidth of a 4kHz channel.  This is&lt;br /&gt;
what an 8kHz sampling rate sounds like&amp;amp;mdash;a bit muffled but perfectly&lt;br /&gt;
intelligible for voice.  This is the lowest sampling rate that&#039;s ever&lt;br /&gt;
been used widely in practice.&lt;br /&gt;
&lt;br /&gt;
From there, as power, and memory, and storage increased, consumer&lt;br /&gt;
computer hardware went to offering 11, and then 16, and then 22, and&lt;br /&gt;
then 32kHz sampling.  With each increase in the sampling rate and the&lt;br /&gt;
Nyquist frequency, it&#039;s obvious that the high end becomes a little&lt;br /&gt;
clearer and the sound more natural.&lt;br /&gt;
&lt;br /&gt;
The Compact Disc uses a 44.1kHz sampling rate, which is again slightly&lt;br /&gt;
better than 32kHz, but the gains are becoming less distinct.  44.1kHz&lt;br /&gt;
is a bit of an oddball choice, especially given that it hadn&#039;t been&lt;br /&gt;
used for anything prior to the compact disc, but the huge success of&lt;br /&gt;
the CD has made it a common rate.&lt;br /&gt;
&lt;br /&gt;
The most common hi-fidelity sampling rate aside from the CD is 48kHz.&lt;br /&gt;
There&#039;s virtually no audible difference between the two.  This video,&lt;br /&gt;
or at least the original version of it, was shot and produced with&lt;br /&gt;
48kHz audio, which happens to be the original standard for&lt;br /&gt;
high-fidelity audio with video.&lt;br /&gt;
&lt;br /&gt;
Super-hi-fidelity sampling rates of 88, and 96, and 192kHz have also&lt;br /&gt;
appeared. The reason for the sampling rates beyond 48kHz isn&#039;t to&lt;br /&gt;
extend the audible high frequencies further. It&#039;s for a different&lt;br /&gt;
reason.&lt;br /&gt;
&lt;br /&gt;
Stepping back for just a second, the French mathematician [[wikipedia:Joseph_Fourier|Jean&lt;br /&gt;
Baptiste Joseph Fourier]] showed that we can also think of signals like&lt;br /&gt;
audio as a set of component frequencies.  This [[wikipedia:Frequency_domain|frequency-domain]]&lt;br /&gt;
representation is equivalent to the time representation; the signal is&lt;br /&gt;
exactly the same, we&#039;re just looking at it [[wikipedia:Basis_(linear_algebra)|a different way]].  Here we see the&lt;br /&gt;
frequency-domain representation of a hypothetical analog signal we&lt;br /&gt;
intend to digitally sample.&lt;br /&gt;
&lt;br /&gt;
The sampling theorem tells us two essential things about the sampling&lt;br /&gt;
process. First, that a digital signal can&#039;t represent any&lt;br /&gt;
frequencies above the Nyquist frequency. Second, and this is the new&lt;br /&gt;
part, if we don&#039;t remove those frequencies with a low-pass filter&lt;br /&gt;
before sampling, the sampling process will fold them down into the&lt;br /&gt;
representable frequency range as [[wikipedia:Aliasing|aliasing distortion]].&lt;br /&gt;
&lt;br /&gt;
Aliasing, in a nutshell, sounds freakin&#039; awful, so it&#039;s essential to&lt;br /&gt;
remove any beyond-Nyquist frequencies before sampling and after&lt;br /&gt;
reconstruction.&lt;br /&gt;
&lt;br /&gt;
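To make the fold-down concrete, here&#039;s a minimal C sketch that computes where an input tone of frequency f lands after sampling at rate fs:&lt;br /&gt;
&lt;br /&gt;
 #include &amp;lt;math.h&amp;gt;&lt;br /&gt;
 /* Frequency at which a tone of frequency f appears after sampling at&lt;br /&gt;
    rate fs.  Tones below the Nyquist frequency (fs/2) pass through&lt;br /&gt;
    unchanged; tones above it fold back into the representable range. */&lt;br /&gt;
 double aliased_frequency(double f, double fs) {&lt;br /&gt;
     double r = fmod(f, fs);                /* reduce into [0, fs) */&lt;br /&gt;
     return (r &amp;lt;= fs / 2.0) ? r : fs - r; /* reflect around Nyquist */&lt;br /&gt;
 }&lt;br /&gt;
&lt;br /&gt;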
Human frequency perception is considered to extend to about 20kHz. In&lt;br /&gt;
44.1 or 48kHz sampling, the low pass before the sampling stage has to&lt;br /&gt;
be extremely sharp to avoid cutting any audible frequencies below&lt;br /&gt;
[[wikipedia:Hearing_range|20kHz]] but still not allow frequencies above the Nyquist to leak&lt;br /&gt;
forward into the sampling process.  This is a difficult filter to&lt;br /&gt;
build, and no practical filter succeeds completely. If the sampling&lt;br /&gt;
rate is 96kHz or 192kHz on the other hand, the low pass has an extra&lt;br /&gt;
[[wikipedia:Octave_(electronics)|octave]] or two for its transition band. This is a much easier filter to&lt;br /&gt;
build.  Sampling rates beyond 48kHz are actually one of those messy&lt;br /&gt;
analog stage compromises.&lt;br /&gt;
&amp;lt;br style=&amp;quot;clear:both;&amp;quot;/&amp;gt;&lt;br /&gt;
===sample format===&lt;br /&gt;
[[Image:Dmpfg_anim.gif|right]]&lt;br /&gt;
&lt;br /&gt;
The second fundamental PCM parameter is the sample format; that is,&lt;br /&gt;
the format of each digital number.  A number is a number, but a number&lt;br /&gt;
can be represented in bits in a number of different ways.&lt;br /&gt;
&lt;br /&gt;
Early PCM was [[wikipedia:Quantization_(sound_processing)#Audio_quantization|eight-bit]] [[wikipedia:Linear_pulse_code_modulation|linear]], encoded as an [[wikipedia:Signedness|unsigned]] [[wikipedia:Integer_(computer_science)#Bytes_and_octets|byte]].  The&lt;br /&gt;
[[wikipedia:Dynamic_range#Audio|dynamic range]] is limited to about [[wikipedia:Decibel|50dB]]  and the [[wikipedia:Quantization_error|quantization noise]], as&lt;br /&gt;
you can hear, is pretty severe.  Eight-bit audio is vanishingly rare&lt;br /&gt;
today.&lt;br /&gt;
&lt;br /&gt;
Digital telephony typically uses one of two related non-linear eight&lt;br /&gt;
bit encodings called [[wikipedia:A-law_algorithm|A-law]] and [[wikipedia:Μ-law_algorithm|μ-law]]. These formats encode a roughly&lt;br /&gt;
[[wikipedia:Audio_bit_depth#Dynamic_range|14 bit dynamic range]] into eight bits by spacing the higher amplitude&lt;br /&gt;
values farther apart. A-law and mu-law obviously improve quantization&lt;br /&gt;
noise compared to linear 8-bit, and voice harmonics especially hide&lt;br /&gt;
the remaining quantization noise well. All three eight-bit encodings,&lt;br /&gt;
linear, A-law, and mu-law, are typically paired with an 8kHz sampling&lt;br /&gt;
rate, though I&#039;m demonstrating them here at 48kHz.&lt;br /&gt;
&lt;br /&gt;
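For the curious, here&#039;s a C sketch of a G.711-style mu-law encoder for 16-bit linear input; consult the actual standard for the authoritative definition, this just shows the shape of the idea:&lt;br /&gt;
&lt;br /&gt;
 #include &amp;lt;stdint.h&amp;gt;&lt;br /&gt;
 /* Encode a 16-bit linear sample as 8-bit mu-law.  Larger amplitudes&lt;br /&gt;
    get coarser steps, packing roughly 14 bits of dynamic range into&lt;br /&gt;
    one byte. */&lt;br /&gt;
 uint8_t linear_to_ulaw(int16_t pcm) {&lt;br /&gt;
     const int BIAS = 0x84, CLIP = 32635;&lt;br /&gt;
     int sign = (pcm &amp;lt; 0) ? 0x80 : 0x00;&lt;br /&gt;
     int mag = (pcm &amp;lt; 0) ? -pcm : pcm;&lt;br /&gt;
     if (mag &amp;gt; CLIP) mag = CLIP;&lt;br /&gt;
     mag += BIAS;&lt;br /&gt;
     int exponent = 7;  /* find the highest set bit among bits 14..7 */&lt;br /&gt;
     for (int mask = 0x4000; !(mag &amp;amp; mask) &amp;amp;&amp;amp; exponent &amp;gt; 0; mask &amp;gt;&amp;gt;= 1)&lt;br /&gt;
         exponent--;&lt;br /&gt;
     int mantissa = (mag &amp;gt;&amp;gt; (exponent + 3)) &amp;amp; 0x0F;&lt;br /&gt;
     return (uint8_t)~(sign | (exponent &amp;lt;&amp;lt; 4) | mantissa);&lt;br /&gt;
 }&lt;br /&gt;
&lt;br /&gt;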
Most modern PCM uses 16- or 24-bit [[wikipedia:Two&#039;s_complement|two&#039;s-complement]] signed integers to&lt;br /&gt;
encode the range from negative infinity to zero decibels in 16 or 24&lt;br /&gt;
bits of precision. The maximum absolute value corresponds to zero decibels.&lt;br /&gt;
As with all the sample formats so far, signals beyond zero decibels, and thus&lt;br /&gt;
beyond the maximum representable range, are [[wikipedia:Clipping_(audio)|clipped]].&lt;br /&gt;
&lt;br /&gt;
In mixing and mastering, it&#039;s not unusual to use [[wikipedia:Floating_point|floating-point]]&lt;br /&gt;
numbers for PCM instead of [[wikipedia:Integer_(computer_science)|integers]].  A 32 bit [[wikipedia:IEEE_754-2008|IEEE754]] float, that&#039;s&lt;br /&gt;
the normal kind of floating point you see on current computers, has 24&lt;br /&gt;
bits of resolution, but an eight-bit floating-point exponent increases&lt;br /&gt;
the representable range.  Floating point usually represents zero&lt;br /&gt;
decibels as +/-1.0, and because floats can obviously represent&lt;br /&gt;
considerably beyond that, temporarily exceeding zero decibels during&lt;br /&gt;
the mixing process doesn&#039;t cause clipping.  Floating-point PCM takes&lt;br /&gt;
up more space, so it tends to be used only as an intermediate&lt;br /&gt;
production format.&lt;br /&gt;
&lt;br /&gt;
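As an illustration of where that clipping finally happens, a hypothetical conversion from float PCM (full scale at +/-1.0) back down to 16-bit integers:&lt;br /&gt;
&lt;br /&gt;
 #include &amp;lt;math.h&amp;gt;&lt;br /&gt;
 #include &amp;lt;stdint.h&amp;gt;&lt;br /&gt;
 /* Convert one float sample (0 dB == +/-1.0) to 16-bit signed PCM,&lt;br /&gt;
    clipping anything beyond full scale. */&lt;br /&gt;
 int16_t float_to_s16(float x) {&lt;br /&gt;
     if (x &amp;gt; 1.0f) x = 1.0f;    /* beyond 0 dB: clip */&lt;br /&gt;
     if (x &amp;lt; -1.0f) x = -1.0f;&lt;br /&gt;
     return (int16_t)lrintf(x * 32767.0f);&lt;br /&gt;
 }&lt;br /&gt;
&lt;br /&gt;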
Lastly, most general purpose computers still read and&lt;br /&gt;
write data in octet bytes, so it&#039;s important to remember that samples&lt;br /&gt;
bigger than eight bits can be in [[wikipedia:Endianness|big- or little-endian order]], and both&lt;br /&gt;
endiannesses are common.  For example, Microsoft [[wikipedia:WAV|WAV]] files are little-endian,&lt;br /&gt;
and Apple [[wikipedia:AIFC|AIFC]] files tend to be big-endian.  Be aware of it.&lt;br /&gt;
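&lt;br /&gt;
Reading samples byte-by-byte sidesteps the host machine&#039;s own endianness entirely; a little sketch, assuming little-endian data as in a WAV file:&lt;br /&gt;
&lt;br /&gt;
 #include &amp;lt;stdint.h&amp;gt;&lt;br /&gt;
 /* Assemble a signed 16-bit sample from two bytes stored&lt;br /&gt;
    little-endian (low byte first), as in a WAV file. */&lt;br /&gt;
 int16_t read_le16(const uint8_t *p) {&lt;br /&gt;
     return (int16_t)(p[0] | (p[1] &amp;lt;&amp;lt; 8));&lt;br /&gt;
 }&lt;br /&gt;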
&amp;lt;br style=&amp;quot;clear:both;&amp;quot;/&amp;gt;&lt;br /&gt;
===channels===&lt;br /&gt;
&lt;br /&gt;
The third PCM parameter is the number of [[wikipedia:Multichannel_audio|channels]].  The convention in&lt;br /&gt;
raw PCM is to encode multiple channels by interleaving the samples of&lt;br /&gt;
each channel together into a single stream.  Straightforward and extensible.&lt;br /&gt;
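&lt;br /&gt;
In code, interleaving just means the samples belonging to one sampling instant sit next to each other; a tiny sketch:&lt;br /&gt;
&lt;br /&gt;
 #include &amp;lt;stdint.h&amp;gt;&lt;br /&gt;
 /* Sample number s of channel c in an interleaved PCM stream: frames&lt;br /&gt;
    follow one another, with the channels interleaved within each frame. */&lt;br /&gt;
 int16_t get_sample(const int16_t *pcm, int channels, int s, int c) {&lt;br /&gt;
     return pcm[s * channels + c];&lt;br /&gt;
 }&lt;br /&gt;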
&amp;lt;br style=&amp;quot;clear:both;&amp;quot;/&amp;gt;&lt;br /&gt;
===done!===&lt;br /&gt;
&lt;br /&gt;
And that&#039;s it!  That describes every PCM representation ever.  Done.&lt;br /&gt;
Digital audio is &#039;&#039;so easy&#039;&#039;!  There&#039;s more to do of course, but at this&lt;br /&gt;
point we&#039;ve got a nice useful chunk of audio data, so let&#039;s get some&lt;br /&gt;
video too.&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
&amp;lt;center&amp;gt;&amp;lt;div style=&amp;quot;background-color:#DDDDFF;border-color:#CCCCDD;border-style:solid;width:80%;padding:0 1em 1em 1em;text-align:left;&amp;quot;&amp;gt;&lt;br /&gt;
&#039;&#039;&#039;Going deeper…&#039;&#039;&#039;&lt;br /&gt;
* [[wikipedia:Roll-off|Wikipedia&#039;s article on filter roll-off]], to learn why it&#039;s hard to build analog filters with a very narrow [[wikipedia:Transition_band|transition band]] between the [[wikipedia:Passband|passband]] and the [[wikipedia:Stopband|stopband]].&lt;br /&gt;
* [http://wiki.multimedia.cx/index.php?title=PCM Some more minutiae] about PCM in practice.&lt;br /&gt;
* [[wikipedia:DPCM|DPCM]] and [[wikipedia:ADPCM|ADPCM]], simple audio codecs loosely inspired by PCM.&lt;br /&gt;
&amp;lt;/div&amp;gt;&amp;lt;/center&amp;gt;&lt;br /&gt;
&lt;br /&gt;
&amp;lt;br style=&amp;quot;clear:both;&amp;quot;/&amp;gt;&lt;br /&gt;
&lt;br /&gt;
==Video vegetables (they&#039;re good for you!)==&lt;br /&gt;
[[Image:Dmpfg_010.jpg|360px|right]]&lt;br /&gt;
&amp;lt;small&amp;gt;[[Talk:A_Digital_Media_Primer_For_Geeks_(episode_1)#Video_vegetables_.28they.27re_good_for_you.21.29|Discuss this section]]&amp;lt;/small&amp;gt;&lt;br /&gt;
&lt;br /&gt;
One could think of video as being like audio but with two additional&lt;br /&gt;
spatial dimensions, X and Y, in addition to the dimension of time.&lt;br /&gt;
This is mathematically sound. The Sampling Theorem applies to all&lt;br /&gt;
three video dimensions just as it does the single time dimension of&lt;br /&gt;
audio.&lt;br /&gt;
&lt;br /&gt;
Audio and video are obviously quite different in practice. For one,&lt;br /&gt;
compared to audio, video is huge. [[wikipedia:Red_Book_(audio_Compact_Disc_standard)#Technical_details|Raw CD audio]] is about 1.4 megabits&lt;br /&gt;
per second. Raw [[wikipedia:1080i|1080i]] HD video is over 700 megabits per second. That&#039;s&lt;br /&gt;
more than 500 times more data to capture, process, and store per&lt;br /&gt;
second.  By [[wikipedia:Moore&#039;s_law|Moore&#039;s law]]... that&#039;s... let&#039;s see... roughly eight&lt;br /&gt;
doublings times two years, so yeah, computers requiring about an extra&lt;br /&gt;
fifteen years to handle raw video after getting raw audio down pat was&lt;br /&gt;
about right.&lt;br /&gt;
&lt;br /&gt;
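(As a sanity check, assuming 8-bit 4:2:0 video at 30 frames per second: 1920x1080 luma samples plus two quarter-size chroma planes come to about 3.1 megabytes per frame, and 3.1 megabytes x 8 bits x 30 frames per second is roughly 750 megabits per second.)&lt;br /&gt;
&lt;br /&gt;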
Basic raw video is also just more complex than basic raw audio. The&lt;br /&gt;
sheer volume of data currently necessitates a representation more&lt;br /&gt;
efficient than the linear PCM used for audio.  In addition, electronic&lt;br /&gt;
video comes almost entirely from broadcast television alone, and the&lt;br /&gt;
standards committees that govern broadcast video have always been very&lt;br /&gt;
concerned with backward compatibility.  Up until just last year in the&lt;br /&gt;
US, a sixty-year-old black and white television could still show a&lt;br /&gt;
normal [[wikipedia:NTSC|analog television broadcast]].  That&#039;s actually a really neat&lt;br /&gt;
trick.&lt;br /&gt;
&lt;br /&gt;
The downside to backward compatibility is that once a detail makes it&lt;br /&gt;
into a standard, you can&#039;t ever really throw it out again. Electronic&lt;br /&gt;
video has never started over from scratch the way audio has multiple&lt;br /&gt;
times.  Sixty years worth of clever but obsolete hacks necessitated by&lt;br /&gt;
the passing technology of a given era have built up into quite a pile,&lt;br /&gt;
and because digital standards also come from broadcast television, all&lt;br /&gt;
these eldritch hacks have been brought forward into the digital&lt;br /&gt;
standards as well.&lt;br /&gt;
&lt;br /&gt;
In short, there are a whole lot more details involved in digital video&lt;br /&gt;
than there were with audio. There&#039;s no hope of covering them&lt;br /&gt;
all completely here, so we&#039;ll cover the broad fundamentals.&lt;br /&gt;
&amp;lt;br style=&amp;quot;clear:both;&amp;quot;/&amp;gt;&lt;br /&gt;
===resolution and aspect===&lt;br /&gt;
[[Image:Dmpfg_011.jpg|360px|right]]&lt;br /&gt;
&lt;br /&gt;
The most obvious raw video parameters are the width and height of the&lt;br /&gt;
picture in pixels. As simple as that may sound, the pixel dimensions&lt;br /&gt;
alone don&#039;t actually specify the absolute width and height of the&lt;br /&gt;
picture, as most broadcast-derived video doesn&#039;t use square pixels.&lt;br /&gt;
The number of [[wikipedia:Scan_line|scanlines]] in a broadcast image was fixed, but the&lt;br /&gt;
effective number of horizontal pixels was a function of channel&lt;br /&gt;
[[wikipedia:Bandwidth_(signal_processing)|bandwidth]]. Effective horizontal resolution could result in pixels that&lt;br /&gt;
were either narrower or wider than the spacing between scanlines.&lt;br /&gt;
&lt;br /&gt;
Standards have generally specified that digitally sampled video should&lt;br /&gt;
reflect the real resolution of the original analog source, so a large&lt;br /&gt;
amount of digital video also uses non-square pixels. For example, a&lt;br /&gt;
normal 4:3 aspect NTSC DVD is typically encoded with a display&lt;br /&gt;
resolution of [[wikipedia:DVD-Video#Frame_size_and_frame_rate|704 by 480]], a ratio wider than 4:3.  In this case, the&lt;br /&gt;
pixels themselves are assigned an aspect ratio of [[wikipedia:Standard-definition_television#Resolution|10:11]], making them&lt;br /&gt;
taller than they are wide and narrowing the image horizontally to the&lt;br /&gt;
correct aspect.  Such an image has to be resampled to show properly on&lt;br /&gt;
a digital display with square pixels.&lt;br /&gt;
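&lt;br /&gt;
The arithmetic is straightforward; a hypothetical helper:&lt;br /&gt;
&lt;br /&gt;
 /* Display width of a frame after correcting for the pixel aspect&lt;br /&gt;
    ratio (PAR).  For the NTSC DVD example: 704 * 10 / 11 == 640,&lt;br /&gt;
    and 640:480 is the expected 4:3. */&lt;br /&gt;
 int display_width(int stored_width, int par_num, int par_den) {&lt;br /&gt;
     return stored_width * par_num / par_den;&lt;br /&gt;
 }&lt;br /&gt;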
&amp;lt;br style=&amp;quot;clear:both;&amp;quot;/&amp;gt;&lt;br /&gt;
===frame rate and interlacing===&lt;br /&gt;
[[Image:Dmpfg_012.jpg|360px|right]]&lt;br /&gt;
&lt;br /&gt;
The second obvious video parameter is the [[wikipedia:Frame_rate|frame rate]], the number of&lt;br /&gt;
full frames per second.  Several standard frame rates are in active&lt;br /&gt;
use. Digital video, in one form or another, can use all of them.  Or,&lt;br /&gt;
any other frame rate.  Or even variable rates where the frame rate&lt;br /&gt;
changes adaptively over the course of the video. The higher the frame&lt;br /&gt;
rate, the smoother the motion, and that brings us, unfortunately, to&lt;br /&gt;
[[wikipedia:Interlace|interlacing]].&lt;br /&gt;
&lt;br /&gt;
In the very earliest days of broadcast video, engineers sought the&lt;br /&gt;
fastest practical frame rate to smooth motion and to minimize [[wikipedia:Flicker_(screen)|flicker]]&lt;br /&gt;
on phosphor-based [[wikipedia:Cathode_ray_tube|CRTs]].  They were also under pressure to use the&lt;br /&gt;
least possible bandwidth for the highest resolution and fastest frame&lt;br /&gt;
rate.  Their solution was to interlace the video where the even lines&lt;br /&gt;
are sent in one pass and the odd lines in the next.  Each pass is&lt;br /&gt;
called a field and two fields sort of produce one complete frame.&lt;br /&gt;
&amp;quot;Sort of&amp;quot;, because the even and odd fields aren&#039;t actually from the&lt;br /&gt;
same source frame.  In a 60 field per second picture, the source frame&lt;br /&gt;
rate is actually 60 full frames per second, and half of each frame,&lt;br /&gt;
every other line, is simply discarded.  This is why we can&#039;t&lt;br /&gt;
[[wikipedia:Deinterlacing|deinterlace]] a video simply by combining two fields into one frame;&lt;br /&gt;
they&#039;re not actually from one frame to begin with.&lt;br /&gt;
&amp;lt;br style=&amp;quot;clear:both;&amp;quot;/&amp;gt;&lt;br /&gt;
&lt;br /&gt;
===gamma===&lt;br /&gt;
[[Image:Dmpfg_013.jpg|360px|right]]&lt;br /&gt;
&lt;br /&gt;
The cathode ray tube was the only available display technology for&lt;br /&gt;
most of the history of electronic video. A CRT&#039;s output brightness is&lt;br /&gt;
nonlinear, approximately equal to the input controlling voltage raised&lt;br /&gt;
to the 2.5th power. This exponent, 2.5, is designated gamma, and so&lt;br /&gt;
it&#039;s often referred to as the gamma of a display.  Cameras, though,&lt;br /&gt;
are linear, and if you feed a CRT a linear input signal, it looks a&lt;br /&gt;
bit like this.&lt;br /&gt;
&lt;br /&gt;
As there were originally to be very few cameras, which were&lt;br /&gt;
fantastically expensive anyway, and hopefully many, many television&lt;br /&gt;
sets which had best be as inexpensive as possible, engineers decided to&lt;br /&gt;
add the necessary [[wikipedia:Gamma_correction|gamma correction]] circuitry to the cameras rather&lt;br /&gt;
than the sets. Video transmitted over the airwaves would thus have a&lt;br /&gt;
nonlinear intensity using the inverse of the set&#039;s gamma exponent, so that&lt;br /&gt;
once a camera&#039;s signal was finally displayed on the CRT, the overall&lt;br /&gt;
response of the system from camera to set was back to linear again.&lt;br /&gt;
&lt;br /&gt;
Almost.&lt;br /&gt;
&lt;br /&gt;
There were also two other tweaks. A television camera actually uses a&lt;br /&gt;
gamma exponent that&#039;s the inverse of 2.2, not 2.5.  That&#039;s just a&lt;br /&gt;
correction for viewing in a dim environment. Also, the exponential&lt;br /&gt;
curve transitions to a linear ramp near black.  That&#039;s just an old&lt;br /&gt;
hack for suppressing sensor noise in the camera.&lt;br /&gt;
&lt;br /&gt;
Gamma correction also had a lucky benefit. It just so happens that the&lt;br /&gt;
human eye has a perceptual gamma of about 3.  This is relatively close&lt;br /&gt;
to the CRT&#039;s gamma of 2.5. An image using gamma correction devotes&lt;br /&gt;
more resolution to lower intensities, where the eye happens to have&lt;br /&gt;
its finest intensity discrimination, and therefore uses the available&lt;br /&gt;
scale resolution more efficiently.  Although CRTs are currently&lt;br /&gt;
vanishing, a standard [[wikipedia:sRGB|sRGB]] computer display still uses a nonlinear&lt;br /&gt;
intensity curve similar to television, with a linear ramp near black,&lt;br /&gt;
followed by an exponential curve with a gamma exponent of 2.4. This&lt;br /&gt;
encodes a sixteen bit linear range down into eight bits.&lt;br /&gt;
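&lt;br /&gt;
Here is that sRGB encoding curve sketched in C, with values normalized to [0,1]; the constants come from the sRGB specification:&lt;br /&gt;
&lt;br /&gt;
 #include &amp;lt;math.h&amp;gt;&lt;br /&gt;
 /* sRGB transfer function: a linear ramp near black followed by an&lt;br /&gt;
    exponential segment with a gamma exponent of 2.4.  Input is linear&lt;br /&gt;
    intensity in [0,1]; output is the encoded value. */&lt;br /&gt;
 double srgb_encode(double linear) {&lt;br /&gt;
     if (linear &amp;lt;= 0.0031308)&lt;br /&gt;
         return 12.92 * linear;                     /* linear ramp near black */&lt;br /&gt;
     return 1.055 * pow(linear, 1.0 / 2.4) - 0.055; /* gamma segment */&lt;br /&gt;
 }&lt;br /&gt;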
&amp;lt;br style=&amp;quot;clear:both;&amp;quot;/&amp;gt;&lt;br /&gt;
&lt;br /&gt;
===color and colorspace===&lt;br /&gt;
[[Image:Dmpfg_014.jpg|360px|right]]&lt;br /&gt;
&lt;br /&gt;
The human eye has three apparent color channels, red, green, and blue,&lt;br /&gt;
and most displays use these three colors as [[wikipedia:Additive_color|additive primaries]] to&lt;br /&gt;
produce a full range of color output.  The primary pigments in&lt;br /&gt;
printing are [[wikipedia:CMYK|Cyan, Magenta, and Yellow]] for the same reason; pigments&lt;br /&gt;
are [[wikipedia:Subtractive_color|subtractive]], and each of these pigments subtracts one pure color&lt;br /&gt;
from reflected light.  Cyan subtracts red, magenta subtracts green, and&lt;br /&gt;
yellow subtracts blue.&lt;br /&gt;
&lt;br /&gt;
Video can be, and sometimes is, represented with red, green, and blue&lt;br /&gt;
color channels, but RGB video is atypical. The human eye is far more&lt;br /&gt;
sensitive to [[wikipedia:Luminance_(relative)|luminosity]] than it is to color, and RGB tends to spread&lt;br /&gt;
the energy of an image across all three color channels.  That is, the&lt;br /&gt;
red plane looks like a red version of the original picture, the green&lt;br /&gt;
plane looks like a green version of the original picture, and the blue&lt;br /&gt;
plane looks like a blue version of the original picture.  Black and&lt;br /&gt;
white times three.  Not efficient.&lt;br /&gt;
&lt;br /&gt;
For those reasons and because, oh hey, television just happened to&lt;br /&gt;
start out as black and white anyway, video usually is represented as a&lt;br /&gt;
high resolution [[wikipedia:Luma_(video)|luma channel]]&amp;amp;mdash;the black &amp;amp; white&amp;amp;mdash;along with&lt;br /&gt;
additional, often lower resolution [[wikipedia:Chrominance|chroma channels]], the color. The&lt;br /&gt;
luma channel, Y, is produced by weighting and then adding the separate&lt;br /&gt;
red, green and blue signals.  The chroma channels U and V are then&lt;br /&gt;
produced by subtracting the luma signal from blue and the luma signal&lt;br /&gt;
from red.&lt;br /&gt;
&lt;br /&gt;
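As a sketch, using the Rec. 601 luma weights (one of several standards in use) and the classic analog U/V scale factors:&lt;br /&gt;
&lt;br /&gt;
 /* Y is a weighted sum of R, G, and B; U and V are scaled B-Y and R-Y.&lt;br /&gt;
    The weights here are Rec. 601; Rec. 709 uses different ones. */&lt;br /&gt;
 void rgb_to_yuv(double r, double g, double b,&lt;br /&gt;
                 double *y, double *u, double *v) {&lt;br /&gt;
     *y = 0.299 * r + 0.587 * g + 0.114 * b;&lt;br /&gt;
     *u = 0.492 * (b - *y); /* blue minus luma, scaled */&lt;br /&gt;
     *v = 0.877 * (r - *y); /* red minus luma, scaled */&lt;br /&gt;
 }&lt;br /&gt;
&lt;br /&gt;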
When YUV is scaled, offset, and quantized for digital video, it&#039;s&lt;br /&gt;
usually more correctly called [[wikipedia:Y&#039;CbCr|Y&#039;CbCr]], but the more generic term YUV is&lt;br /&gt;
widely used to describe all the analog and digital variants of this&lt;br /&gt;
color model.&lt;br /&gt;
&amp;lt;br style=&amp;quot;clear:both;&amp;quot;/&amp;gt;&lt;br /&gt;
&lt;br /&gt;
===chroma subsampling===&lt;br /&gt;
[[Image:Dmpfg_015.jpg|360px|right]]&lt;br /&gt;
&lt;br /&gt;
The U and V chroma channels can have the same resolution as the Y&lt;br /&gt;
channel, but because the human eye has far less spatial color&lt;br /&gt;
resolution than spatial luminosity resolution, chroma resolution is&lt;br /&gt;
usually [[wikipedia:Chroma_subsampling|halved or even quartered]] in the horizontal direction, the&lt;br /&gt;
vertical direction, or both, usually without any significant impact on the&lt;br /&gt;
apparent raw image quality.  Practically every possible subsampling&lt;br /&gt;
variant has been used at one time or another, but the common choices&lt;br /&gt;
today are [[wikipedia:Chroma_subsampling#4:4:4_Y.27CbCr|4:4:4]] video, which isn&#039;t actually subsampled at all, [[wikipedia:Chroma_subsampling#4:2:2|4:2:2]] video in&lt;br /&gt;
which the horizontal resolution of the U and V channels is halved, and&lt;br /&gt;
most common of all, [[wikipedia:Chroma_subsampling#4:2:0|4:2:0]] video in which both the horizontal and vertical&lt;br /&gt;
resolutions of the chroma channels are halved, resulting in U and V&lt;br /&gt;
planes that are each one quarter the size of Y.&lt;br /&gt;
&lt;br /&gt;
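In terms of raw storage, a quick sketch of the plane sizes for one 8-bit 4:2:0 frame (dimensions assumed even):&lt;br /&gt;
&lt;br /&gt;
 #include &amp;lt;stddef.h&amp;gt;&lt;br /&gt;
 /* Full-size luma plus two half-by-half chroma planes:&lt;br /&gt;
    1.5 bytes per pixel overall. */&lt;br /&gt;
 size_t frame_size_420(size_t width, size_t height) {&lt;br /&gt;
     size_t y_size = width * height;&lt;br /&gt;
     size_t c_size = (width / 2) * (height / 2); /* each chroma plane */&lt;br /&gt;
     return y_size + 2 * c_size;&lt;br /&gt;
 }&lt;br /&gt;
&lt;br /&gt;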
The terms 4:2:2, 4:2:0, [[wikipedia:Chroma_subsampling#4:1:1|4:1:1]], and so on and so forth, aren&#039;t complete&lt;br /&gt;
descriptions of a chroma subsampling. There are multiple possible ways&lt;br /&gt;
to position the chroma pixels relative to luma, and again, several&lt;br /&gt;
variants are in active use for each subsampling.  For example, [[wikipedia:Motion_Jpeg|motion&lt;br /&gt;
JPEG]], [[wikipedia:MPEG-1#Part_2:_Video|MPEG-1 video]], [[wikipedia:MPEG-2#Video_coding_.28simplified.29|MPEG-2 video]], [[wikipedia:DV#DV_Compression|DV]], [[wikipedia:Theora|Theora]], and [[wikipedia:WebM|WebM]] all use or can&lt;br /&gt;
use 4:2:0 subsampling, but they site the chroma pixels [http://www.mir.com/DMG/chroma.html three different ways].&lt;br /&gt;
&lt;br /&gt;
Motion JPEG, MPEG-1 video, Theora and WebM all site chroma pixels&lt;br /&gt;
between luma pixels both horizontally and vertically.&lt;br /&gt;
&lt;br /&gt;
MPEG-2 sites chroma pixels between lines, but horizontally aligned with&lt;br /&gt;
every other luma pixel. Interlaced modes complicate things somewhat,&lt;br /&gt;
resulting in a siting arrangement that&#039;s a tad bizarre.&lt;br /&gt;
&lt;br /&gt;
And finally PAL-DV, which is always interlaced, places the chroma&lt;br /&gt;
pixels in the same position as every other luma pixel in the&lt;br /&gt;
horizontal direction, and vertically alternates chroma channel on&lt;br /&gt;
each line.&lt;br /&gt;
&lt;br /&gt;
That&#039;s just 4:2:0 video. I&#039;ll leave the other subsamplings as homework for the&lt;br /&gt;
viewer.  Got the basic idea, moving on.&lt;br /&gt;
&amp;lt;br style=&amp;quot;clear:both;&amp;quot;/&amp;gt;&lt;br /&gt;
&lt;br /&gt;
===pixel formats===&lt;br /&gt;
[[Image:Dmpfg_016.jpg|360px|right]]&lt;br /&gt;
&lt;br /&gt;
In audio, we always represent multiple channels in a PCM stream by&lt;br /&gt;
interleaving the samples from each channel in order. Video uses both&lt;br /&gt;
packed formats that interleave the color channels, as well as planar&lt;br /&gt;
formats that keep the pixels from each channel together in separate&lt;br /&gt;
planes stacked in order in the frame. There are at least [http://www.fourcc.org/yuv.php 50 different formats] in&lt;br /&gt;
these two broad categories with possibly ten or fifteen in common use.&lt;br /&gt;
&lt;br /&gt;
Each chroma subsampling and different bit-depth requires a different&lt;br /&gt;
packing arrangement, and so a different pixel format.  For a given&lt;br /&gt;
unique subsampling, there are usually also several equivalent formats&lt;br /&gt;
that consist of trivial channel order rearrangements or repackings, due either to&lt;br /&gt;
convenience once-upon-a-time on some particular piece of hardware, or&lt;br /&gt;
sometimes just good old-fashioned spite.&lt;br /&gt;
&lt;br /&gt;
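For instance, a planar 4:2:0 buffer with the planes stacked Y, then U, then V (the I420 arrangement; YV12 swaps the two chroma planes) breaks down like this:&lt;br /&gt;
&lt;br /&gt;
 #include &amp;lt;stddef.h&amp;gt;&lt;br /&gt;
 #include &amp;lt;stdint.h&amp;gt;&lt;br /&gt;
 /* Plane pointers within one contiguous planar 4:2:0 frame buffer&lt;br /&gt;
    laid out Y, then U, then V (dimensions assumed even). */&lt;br /&gt;
 void plane_pointers_i420(uint8_t *buf, int width, int height,&lt;br /&gt;
                          uint8_t **y, uint8_t **u, uint8_t **v) {&lt;br /&gt;
     *y = buf;&lt;br /&gt;
     *u = buf + (size_t)width * height;&lt;br /&gt;
     *v = *u + (size_t)(width / 2) * (height / 2);&lt;br /&gt;
 }&lt;br /&gt;
&lt;br /&gt;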
Pixel formats are described by a unique name or [[wikipedia:FourCC|fourcc]] code.  There&lt;br /&gt;
are quite a few of these and there&#039;s no sense going over each one now.&lt;br /&gt;
Google is your friend.  Be aware that fourcc codes for raw video&lt;br /&gt;
specify the pixel arrangement and chroma subsampling, but generally&lt;br /&gt;
don&#039;t imply anything certain about chroma siting or color space.  [http://www.fourcc.org/yuv.php#UYVY YV12]&lt;br /&gt;
video, to pick one, can use JPEG, MPEG-2 or DV chroma siting, and any&lt;br /&gt;
one of [[wikipedia:YUV#BT.709_and_BT.601|several YUV colorspace definitions]].&lt;br /&gt;
&amp;lt;br style=&amp;quot;clear:both;&amp;quot;/&amp;gt;&lt;br /&gt;
&lt;br /&gt;
===done!===&lt;br /&gt;
&lt;br /&gt;
That wraps up our not-so-quick and yet very incomplete tour of raw&lt;br /&gt;
video. The good news is we can already get quite a lot of real work&lt;br /&gt;
done using that overview. In plenty of situations, a frame of video&lt;br /&gt;
data is a frame of video data.  The details matter, greatly, when it&lt;br /&gt;
comes time to write software, but for now I am satisfied that the&lt;br /&gt;
esteemed viewer is broadly aware of the relevant issues.&lt;br /&gt;
&lt;br /&gt;
&amp;lt;center&amp;gt;&amp;lt;div style=&amp;quot;background-color:#DDDDFF;border-color:#CCCCDD;border-style:solid;width:80%;padding:0 1em 1em 1em;text-align:left;&amp;quot;&amp;gt;&lt;br /&gt;
&#039;&#039;&#039;Going deeper…&#039;&#039;&#039;&lt;br /&gt;
* [http://wiki.multimedia.cx/index.php?title=YUV4MPEG2 The y4m format] is the most common simple container for raw YUV video.  People occasionally use [[OggYUV]] to store it in Ogg instead.&lt;br /&gt;
* Learn about [[wikipedia:High_dynamic_range_imaging|high dynamic range imaging]], which achieves better representation of the full range of brightnesses in the real world by using more than 8 bits per channel.&lt;br /&gt;
* Learn about how [[wikipedia:Trichromatic_vision|trichromatic color vision]] works in humans, and how human color perception is encoded in the [[wikipedia:CIE 1931 color space|CIE 1931 XYZ color space]].&lt;br /&gt;
** Compare with the [[wikipedia:Lab_color_space|Lab color space]], mathematically equivalent but structured to account for &amp;quot;perceptual uniformity&amp;quot;.&lt;br /&gt;
** If we were all [[wikipedia:Dichromacy|dichromats]] then video would only need two color channels.  Some humans might be [[wikipedia:Tetrachromacy#Possibility_of_human_tetrachromats|tetrachromats]], in which case they would need an additional color channel for video to fully represent their vision.&lt;br /&gt;
** [http://www.xritephoto.com/ph_toolframe.aspx?action=coloriq Test your color vision] (or at least your monitor).&lt;br /&gt;
* YCbCr is defined in terms of RGB by the ITU in two incompatible standards: [[wikipedia:Rec. 601|Rec. 601]] and [[wikipedia:Rec. 709|Rec. 709]].  Both conversion standards are lossy, which has prompted some to adopt a lossless alternative called [http://wiki.multimedia.cx/index.php?title=YCoCg YCoCg].&lt;br /&gt;
&amp;lt;/div&amp;gt;&amp;lt;/center&amp;gt;&lt;br /&gt;
&lt;br /&gt;
&amp;lt;br style=&amp;quot;clear:both;&amp;quot;/&amp;gt;&lt;br /&gt;
&lt;br /&gt;
==Containers==&lt;br /&gt;
[[Image:Dmpfg_017.jpg|360px|right]]&lt;br /&gt;
&amp;lt;small&amp;gt;[[Talk:A_Digital_Media_Primer_For_Geeks_(episode_1)#Containers|Discuss this section]]&amp;lt;/small&amp;gt;&lt;br /&gt;
&lt;br /&gt;
So. We have audio data. We have video data. What remains is the more&lt;br /&gt;
familiar non-signal data and straight-up engineering that software&lt;br /&gt;
developers are used to, and plenty of it.&lt;br /&gt;
&lt;br /&gt;
Chunks of raw audio and video data have no externally-visible&lt;br /&gt;
structure, but they&#039;re often uniformly sized.  We could just string&lt;br /&gt;
them together in a rigid predetermined ordering for streaming and&lt;br /&gt;
storage, and some simple systems do approximately that. Compressed&lt;br /&gt;
frames, though, aren&#039;t necessarily a predictable size, and we usually want&lt;br /&gt;
some flexibility in using a range of different data types in streams.&lt;br /&gt;
If we string random formless data together, we lose the boundaries&lt;br /&gt;
that separate frames and don&#039;t necessarily know what data belongs to&lt;br /&gt;
which streams.  A stream needs some generalized structure to be&lt;br /&gt;
generally useful.&lt;br /&gt;
&lt;br /&gt;
In addition to our signal data, we also have our PCM and video&lt;br /&gt;
parameters.  There&#039;s probably plenty of other [[wikipedia:Metadata#Video|metadata]] we also want to&lt;br /&gt;
deal with, like audio tags and video chapters and subtitles, all&lt;br /&gt;
essential components of rich media.  It makes sense to place this&lt;br /&gt;
metadata&amp;amp;mdash;that is, data about the data&amp;amp;mdash;within the media itself.&lt;br /&gt;
&lt;br /&gt;
Storing and structuring formless data and disparate metadata is the&lt;br /&gt;
job of a [[wikipedia:Container_format_(digital)|container]].  Containers provide framing for the data blobs,&lt;br /&gt;
interleave and identify multiple data streams, provide timing&lt;br /&gt;
information, and store the metadata necessary to parse, navigate,&lt;br /&gt;
manipulate, and present the media.  In general, any container can hold&lt;br /&gt;
any kind of data.  And data can be put into any container.&lt;br /&gt;
&amp;lt;br style=&amp;quot;clear:both;&amp;quot;/&amp;gt;&lt;br /&gt;
&lt;br /&gt;
==Credits==&lt;br /&gt;
[[Image:Dmpfg_018.jpg|360px|right]]&lt;br /&gt;
&lt;br /&gt;
In the past thirty minutes, we&#039;ve covered digital audio, video, some&lt;br /&gt;
history, some math and a little engineering. We&#039;ve barely scratched the&lt;br /&gt;
surface, but it&#039;s time for a well-earned break.&lt;br /&gt;
&lt;br /&gt;
There&#039;s so much more to talk about, so I hope you&#039;ll join me again in&lt;br /&gt;
our next episode.  Until then&amp;amp;mdash;Cheers!&lt;br /&gt;
&lt;br /&gt;
Written by:&lt;br /&gt;
Christopher (Monty) Montgomery&lt;br /&gt;
and the Xiph.Org Community&lt;br /&gt;
&lt;br /&gt;
Intro, title and credits music:&amp;lt;br&amp;gt;&lt;br /&gt;
&amp;quot;Boo Boo Coming&amp;quot;, by Joel Forrester&amp;lt;br&amp;gt;&lt;br /&gt;
Performed by the Microscopic Septet&amp;lt;br&amp;gt;&lt;br /&gt;
Used by permission of Cuneiform Records.&amp;lt;br&amp;gt;&lt;br /&gt;
Original source track All Rights Reserved.&amp;lt;br&amp;gt;&lt;br /&gt;
[http://www.cuneiformrecords.com www.cuneiformrecords.com]&lt;br /&gt;
&lt;br /&gt;
This Video Was Produced Entirely With Free and Open Source Software:&amp;lt;br&amp;gt;&lt;br /&gt;
&lt;br /&gt;
[http://www.gnu.org/ GNU]&amp;lt;br&amp;gt;&lt;br /&gt;
[http://www.linux.org/ Linux]&amp;lt;br&amp;gt;&lt;br /&gt;
[http://fedoraproject.org/ Fedora]&amp;lt;br&amp;gt;&lt;br /&gt;
[http://cinelerra.org/ Cinelerra]&amp;lt;br&amp;gt;&lt;br /&gt;
[http://www.gimp.org/ The Gimp]&amp;lt;br&amp;gt;&lt;br /&gt;
[http://audacity.sourceforge.net/ Audacity]&amp;lt;br&amp;gt;&lt;br /&gt;
[http://svn.xiph.org/trunk/postfish/README Postfish]&amp;lt;br&amp;gt;&lt;br /&gt;
[http://gstreamer.freedesktop.org/ Gstreamer]&amp;lt;br&amp;gt;&lt;br /&gt;
&lt;br /&gt;
CC BY-NC-SA&amp;lt;br&amp;gt;&lt;br /&gt;
A Co-Production of Xiph.Org and Red Hat Inc.&amp;lt;br&amp;gt;&lt;br /&gt;
(C) 2010, Some Rights Reserved&amp;lt;br&amp;gt;&lt;br /&gt;
&lt;br /&gt;
&amp;lt;br style=&amp;quot;clear:both;&amp;quot;/&amp;gt;&lt;br /&gt;
&lt;br /&gt;
==The making of…==&lt;br /&gt;
&amp;lt;small&amp;gt;[[Talk:A_Digital_Media_Primer_For_Geeks_(episode_1)#The_making_of.E2.80.A6|Discuss this section]]&amp;lt;/small&amp;gt;&lt;br /&gt;
===Equipment===&lt;br /&gt;
====Camera====&lt;br /&gt;
Canon HV40 HDV camera w/ wide-angle lens operating on a tripod.&lt;br /&gt;
&lt;br /&gt;
The wide angle lens gives the camera a nice close macro mode, and approximately triples the amount of light coming into the sensor for a given zoom/aperture.  Useful for shooting indoors at night.&lt;br /&gt;
&lt;br /&gt;
No additional lighting kit was used.&lt;br /&gt;
&lt;br /&gt;
====Audio====&lt;br /&gt;
&lt;br /&gt;
Two Crown PCC160 boundary microphones placed on a table approximately 4-8 feet in front of the speaker, run through a cheap Behringer portable mixer and into the camera&#039;s microphone input.  &lt;br /&gt;
&lt;br /&gt;
No additional audio kit was used.&lt;br /&gt;
&lt;br /&gt;
====Sundries====&lt;br /&gt;
&lt;br /&gt;
Whiteboard markers by &#039;Bic&#039;&lt;br /&gt;
&lt;br /&gt;
Drawing aids by Staedtler, McMaster Carr, and &#039;Generic&#039;.&lt;br /&gt;
&lt;br /&gt;
===Video shooting sequence===&lt;br /&gt;
&lt;br /&gt;
Scenes were pre-scripted and memorized, usually with lots of on-the-fly revision.  (In the future... I&#039;m totally getting a teleprompter.  Wh000.  OTOH, I can totally rattle off the entire video script from beginning to end as a party trick, thus ensuring I&#039;ll not be invited to many parties --Monty).&lt;br /&gt;
&lt;br /&gt;
Diagrams were drawn by hand on a physical whiteboard with whiteboard markers and magnetic T-squares, triangles, and yardsticks.  Despite looking a lot like greenscreen work, there is no image compositing in use (actually-- there are two small composites where an error in a whiteboard diagram was corrected by subtracting part of the original image and then adding a corrected version of the diagram).&lt;br /&gt;
&lt;br /&gt;
Camera operated in 24F shutter priority mode (Tv set to &amp;quot;24&amp;quot;) with exposure and white balance both calibrated to the white board (or a white piece of paper) and locked.  Microphone attenuation setting active, with gain locked such that room noise peaked at -40dB (all the rooms in the shooting sequences were noisy due to the building&#039;s ventilation system, or active equipment). Lighting in the whiteboard rooms tended to be odd, with little relative light cast on a presenter standing just in front of the whiteboard; a presenter is practically standing in the room&#039;s only shadow.  Most of the room light is focused on the table and walls.  Additional fill lighting kit would have been useful, but for the first vid, I didn&#039;t want &#039;perfect&#039; to be the enemy of &#039;good&#039;.&lt;br /&gt;
&lt;br /&gt;
Autofocus used for whiteboard scenes, manual focus used for several workshop scenes as the autofocus tended to hunt continuously in very low light.&lt;br /&gt;
&lt;br /&gt;
Continuous capture to a Thinkpad with firewire input via a simple gstreamer script.&lt;br /&gt;
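&lt;br /&gt;
For the curious, the capture pipeline needn&#039;t be anything fancy; the camera delivers HDV as an MPEG-2 transport stream over firewire, so a one-liner along these lines would do the job (an untested sketch; the element and file names are illustrative, not the actual script):&lt;br /&gt;
&lt;br /&gt;
gst-launch hdv1394src ! filesink location=capture.m2t&lt;br /&gt;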
&lt;br /&gt;
===Production sequence===&lt;br /&gt;
====All hail Cinelerra.  You better hail, or Cinelerra will get pissy about it.====&lt;br /&gt;
&lt;br /&gt;
Most of the production sequence hinged on making Cinelerra happy; it is a hulking rusty cast iron WWI tank of a program that can seem like it&#039;s composed entirely of compressed bugs.  That said, it was neither particularly crashy nor did it ever accidentally corrupt or lose work.  It was also the only FOSS editor with a working 2D compositor.  It got the job done once I found a workflow it would cope with (and fixed a number of bugs; these fixes are available from my cinelerra Git repo at http://git.xiph.org/?p=users/xiphmont/cinelerraCV.git;a=summary)&lt;br /&gt;
&lt;br /&gt;
====Choosing takes====&lt;br /&gt;
&lt;br /&gt;
Each shooting session yielded four to six hours of raw video.  The first step was to load the raw video into the cinelerra timeline, label each complete take, compare and choose the take to use, then render the chosen take out to a raw clip as a YUV4MPEG raw video file and a WAV raw audio file.  Be careful that Settings-&amp;gt;Align Cursor On Frames is set, else the audio and video renders won&#039;t start on the same boundary.&lt;br /&gt;
&lt;br /&gt;
====Postprocessing====&lt;br /&gt;
&lt;br /&gt;
At this point, the raw video clips were adjusted for gamma, contrast, and saturation in gstreamer and mplayer.  In the earlier shoots the camera was underexposing due to pilot error, which required quite a bit of gamma and saturation inflation to &#039;correct&#039; (there is no real correction as the low-end data is gone, but it&#039;s possible to make it look better).  Later shoots used saner settings and the adjustments were mostly to keep different shooting sessions more uniform.  The whiteboard tends not to look white because it&#039;s mildly reflective, and picked up the color of the cyan and orange audio baffles in the room like a big diffuse mirror.&lt;br /&gt;
&lt;br /&gt;
The audio was both noisy (due to the building&#039;s ventilation system which either sounded like a low loud rumble or a jet-engine taking off) and reverberant (the rooms were glass on two sides and plaster on the other two).  Early takes used no additional sound absorbing material in the rooms, and the Postfish filtering and deverb was used heavily.  It gives the early audio in the vid a slightly odd, processed feel (I had almost decided the original audio was simply unusable).  Later takes used some big fleece &#039;soft flats&#039; in the room to absorb some additional reverb, and the later takes are less heavily filtered.&lt;br /&gt;
&lt;br /&gt;
The postfish filtering chain used declip (for the occasional overrange oops), deverb (remove room reverberation), multicompand (noise gating), single compand (for volume levelling) and EQ (the Crown mics are nice, but are very midrange heavy).&lt;br /&gt;
&lt;br /&gt;
====Special Effects====&lt;br /&gt;
&lt;br /&gt;
Audio special effects were one-offs, mostly done using SoX.  The processed demo sections of audio were then spliced back into the original audio takes using Audacity.&lt;br /&gt;
&lt;br /&gt;
Video special effects (eg, removing a color channel, etc) were done by writing quick, one-off filters in C for y4oi.  A few effects were done by dumping a take as a directory full of PNGs and then batch-processing the PNGs again using a one-off C program, then reassembling with mplayer.  Video effects were then stitched back into the original video takes in Cinelerra.&lt;br /&gt;
&lt;br /&gt;
====Editing====&lt;br /&gt;
&lt;br /&gt;
All editing was done in Cinelerra.  This primarily consisted of stitching the individual takes back together with crossfades.  All input and rendering output were done with raw YUV4MPEG and WAV files.  Note that making this work well and correctly required several patches to the YUV4MPEG handler and colorspace conversion code.&lt;br /&gt;
&lt;br /&gt;
====Encoding====&lt;br /&gt;
&lt;br /&gt;
Encoding was done by hand external to Cinelerra using mplayer for final postprocessing, the encoder_example included with the Ptalarbvorm Theora source distribution, and ivfenc for WebM.&lt;br /&gt;
&lt;br /&gt;
Sample Theora encode command line (note this is using an mplayer patched for y4o support; it could be done just as easily with a yuv4mpeg pipe):&lt;br /&gt;
&lt;br /&gt;
# 360p, 128-ish (a4) audio + 500-ish (v50) video&lt;br /&gt;
mplayer -vf hqdn3d,scale=640:360,gradfun=1.5,unsharp=l3x3:.1,yuv4ogg complete2.m2v -fast -noconsolecontrols -vo null &amp;gt; /dev/null &amp;amp; ~/MotherfishSVN/theora-ptalarbvorm/examples/encoder_example -a 4 -v 50 -k 240 complete2.wav output.y4o -o A_Digital_Media_Primer_For_Geeks-360p-a4+v50.ogv&lt;br /&gt;
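&lt;br /&gt;
For reference, the yuv4mpeg-pipe equivalent with a stock (unpatched) mplayer might look like the following untested sketch, using a named pipe in place of the y4o output filter:&lt;br /&gt;
&lt;br /&gt;
# hypothetical stock-mplayer equivalent; same filters, same bitrates&lt;br /&gt;
mkfifo stream.y4m&lt;br /&gt;
mplayer -vf hqdn3d,scale=640:360,gradfun=1.5,unsharp=l3x3:.1 -fast -nosound -noconsolecontrols -vo yuv4mpeg:file=stream.y4m complete2.m2v &amp;amp;&lt;br /&gt;
~/MotherfishSVN/theora-ptalarbvorm/examples/encoder_example -a 4 -v 50 -k 240 complete2.wav stream.y4m -o A_Digital_Media_Primer_For_Geeks-360p-a4+v50.ogv&lt;/div&gt;</summary>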
		<author><name>Mindspillage</name></author>
	</entry>
	<entry>
		<id>https://wiki.xiph.org/index.php?title=Videos/A_Digital_Media_Primer_For_Geeks&amp;diff=12446</id>
		<title>Videos/A Digital Media Primer For Geeks</title>
		<link rel="alternate" type="text/html" href="https://wiki.xiph.org/index.php?title=Videos/A_Digital_Media_Primer_For_Geeks&amp;diff=12446"/>
		<updated>2010-09-22T04:35:24Z</updated>

		<summary type="html">&lt;p&gt;Mindspillage: /* Credits */ project links&lt;/p&gt;
&lt;hr /&gt;
&lt;div&gt;[[Image:Dmpfg_001.jpg|360px|right]]&lt;br /&gt;
&amp;lt;small&amp;gt;&#039;&#039;Wiki edition&#039;&#039;&amp;lt;/small&amp;gt;&lt;br /&gt;
&lt;br /&gt;
This first video from Xiph.Org presents the technical foundations of modern digital media via a half-hour firehose of information. One community member called it &amp;quot;a Uni lecture I never got but really wanted.&amp;quot;&lt;br /&gt;
&lt;br /&gt;
The program offers a brief history of digital media, a quick summary of the sampling theorem, and myriad details of low level audio and video characterization and formatting. It&#039;s intended for budding geeks looking to get into video coding, as well as the technically curious who want to know more about the media they wrangle for work or play.&lt;br /&gt;
&amp;lt;br/&amp;gt;&amp;lt;br/&amp;gt;&amp;lt;br/&amp;gt;&lt;br /&gt;
&amp;lt;center&amp;gt;&amp;lt;font size=&amp;quot;+2&amp;quot;&amp;gt;[http://www.xiph.org/video/vid1.shtml Download or Watch online]&amp;lt;/font&amp;gt;&amp;lt;/center&amp;gt;&lt;br /&gt;
&amp;lt;br style=&amp;quot;clear:both;&amp;quot;/&amp;gt;&lt;br /&gt;
&amp;lt;hr/&amp;gt;&lt;br /&gt;
&lt;br /&gt;
==Introduction==&lt;br /&gt;
[[Image:Dmpfg_000.jpg|360px|right]]&lt;br /&gt;
[[Image:Dmpfg_002.jpg|360px|right]]&lt;br /&gt;
&amp;lt;small&amp;gt;[[Talk:A_Digital_Media_Primer_For_Geeks_(episode_1)#Introduction|Discuss this section]]&amp;lt;/small&amp;gt;&lt;br /&gt;
&lt;br /&gt;
Workstations and high-end personal computers have been able to&lt;br /&gt;
manipulate digital audio pretty easily for about fifteen years now.&lt;br /&gt;
It&#039;s only been about five years that a decent workstation&#039;s been able&lt;br /&gt;
to handle raw video without a lot of expensive special purpose&lt;br /&gt;
hardware.&lt;br /&gt;
&lt;br /&gt;
But today even most cheap home PCs have the processor power and&lt;br /&gt;
storage necessary to really toss raw video around, at least without&lt;br /&gt;
too much of a struggle. So now that everyone has all of this cheap media-capable hardware, &lt;br /&gt;
more people, not surprisingly, want to do interesting&lt;br /&gt;
things with digital media, especially streaming. YouTube was the first huge&lt;br /&gt;
success, and now everybody wants in.&lt;br /&gt;
&lt;br /&gt;
Well good!  Because this stuff is a lot of fun!&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
It&#039;s no problem finding consumers for digital media.  But here I&#039;d&lt;br /&gt;
like to address the engineers, the mathematicians, the hackers, the&lt;br /&gt;
people who are interested in discovering and making things and&lt;br /&gt;
building the technology itself. The people after my own heart.&lt;br /&gt;
&lt;br /&gt;
Digital media, compression especially, is perceived to be super-elite,&lt;br /&gt;
somehow incredibly more difficult than anything else in computer&lt;br /&gt;
science. The big industry players in the field don&#039;t mind this&lt;br /&gt;
perception at all; it helps justify the staggering number of very&lt;br /&gt;
basic patents they hold.  They like the image that their media&lt;br /&gt;
researchers &amp;quot;are the best of the best, so much smarter than anyone&lt;br /&gt;
else that their brilliant ideas can&#039;t even be understood by mere&lt;br /&gt;
mortals.&amp;quot; This is bunk.&lt;br /&gt;
&lt;br /&gt;
Digital audio and video and streaming and compression offer endless&lt;br /&gt;
deep and stimulating mental challenges, just like any other&lt;br /&gt;
discipline. It seems elite because so few people have been&lt;br /&gt;
involved.  So few people have been involved perhaps because so few&lt;br /&gt;
people could afford the expensive, special-purpose equipment it&lt;br /&gt;
required. But today, just about anyone watching this video has a&lt;br /&gt;
cheap, general-purpose computer powerful enough to play with the big&lt;br /&gt;
boys. There are battles going on today around HTML5 and browsers and&lt;br /&gt;
video and open vs. closed.  So now is a pretty good time to get&lt;br /&gt;
involved.  The easiest place to start is probably understanding the&lt;br /&gt;
technology we have right now.&lt;br /&gt;
&lt;br /&gt;
This is an introduction. Since it&#039;s an introduction, it glosses over a&lt;br /&gt;
ton of details so that the big picture&#039;s a little easier to see.&lt;br /&gt;
Quite a few people watching are going to be way past anything that I&#039;m&lt;br /&gt;
talking about, at least for now.  On the other hand, I&#039;m probably&lt;br /&gt;
going to go too fast for folks who really are brand new to all of&lt;br /&gt;
this, so if this is all new, relax. The important thing is to pick out&lt;br /&gt;
any ideas that really grab your imagination. Especially pay attention&lt;br /&gt;
to the terminology surrounding those ideas, because with those, and&lt;br /&gt;
Google, and Wikipedia, you can dig as deep as interests you.&lt;br /&gt;
&lt;br /&gt;
So, without any further ado, welcome to one hell of a new hobby.&lt;br /&gt;
&lt;br /&gt;
&amp;lt;center&amp;gt;&amp;lt;div style=&amp;quot;background-color:#DDDDFF;border-color:#CCCCDD;border-style:solid;width:80%;padding:0 1em 1em 1em;text-align:left;&amp;quot;&amp;gt;&lt;br /&gt;
&#039;&#039;&#039;Going deeper…&#039;&#039;&#039;&lt;br /&gt;
*[http://www.0xdeadbeef.com/weblog/2010/01/html5-video-and-h-264-what-history-tells-us-and-why-were-standing-with-the-web/ HTML5 Video and H.264: what history tells us and why we&#039;re standing with the web]: Chris Blizzard of Mozilla on free formats and the open web&lt;br /&gt;
*[http://diveintohtml5.org/video.html Dive into HTML5] web video tutorial&lt;br /&gt;
&amp;lt;/div&amp;gt;&amp;lt;/center&amp;gt;&lt;br /&gt;
&lt;br /&gt;
&amp;lt;br style=&amp;quot;clear:both;&amp;quot;/&amp;gt;&lt;br /&gt;
&lt;br /&gt;
==Analog vs Digital==&lt;br /&gt;
[[Image:Dmpfg_004.jpg|360px|right]]&lt;br /&gt;
[[Image:Dmpfg_006.jpg|360px|right]]&lt;br /&gt;
[[Image:Dmpfg_007.jpg|360px|right]]&lt;br /&gt;
&amp;lt;small&amp;gt;[[Talk:A_Digital_Media_Primer_For_Geeks_(episode_1)#Analog_vs_Digital|Discuss this section]]&amp;lt;/small&amp;gt;&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
[[WikiPedia:Sound|Sound]] is the propagation of pressure waves through air, spreading out&lt;br /&gt;
from a source like ripples spread from a stone tossed into a pond.  A&lt;br /&gt;
microphone, or the human ear for that matter, transforms these passing&lt;br /&gt;
ripples of pressure into an electric signal.  Right, this is&lt;br /&gt;
middle school science class, everyone remembers this.  Moving on.&lt;br /&gt;
&lt;br /&gt;
That audio signal is a one-dimensional function, a single value&lt;br /&gt;
varying over time.  If we slow the [[WikiPedia:Oscilloscope|&#039;scope]] down a bit... that should be&lt;br /&gt;
a little easier to see. A few other aspects of the signal are&lt;br /&gt;
important. It&#039;s [[WikiPedia:Continuous_function|continuous]] in both value and time; that is, at any&lt;br /&gt;
given time it can have any real value, and there&#039;s a smoothly varying&lt;br /&gt;
value at every point in time.  No matter how much we zoom in, there&lt;br /&gt;
are no discontinuities, no singularities, no instantaneous steps or&lt;br /&gt;
points where the signal ceases to exist. It&#039;s defined&lt;br /&gt;
everywhere. Classic continuous math works very well on these signals.&lt;br /&gt;
&lt;br /&gt;
A digital signal on the other hand is [[WikiPedia:Discrete_math|discrete]] in both value and time.&lt;br /&gt;
In the simplest and most common system, called [[WikiPedia:Pulse code modulation|Pulse Code Modulation]],&lt;br /&gt;
one of a fixed number of possible values directly represents the&lt;br /&gt;
instantaneous signal amplitude at points in time spaced a fixed&lt;br /&gt;
distance apart.  The end result is a stream of digits.&lt;br /&gt;
&lt;br /&gt;
Now this looks an awful lot like this.  It seems intuitive that we&lt;br /&gt;
should somehow be able to rigorously transform one into the other, and&lt;br /&gt;
good news, the [[WikiPedia:Nyquist-Shannon sampling theorem|Sampling Theorem]] says we can and tells us&lt;br /&gt;
how. Published in its most recognizable form by [[WikiPedia:Claude Shannon|Claude Shannon]] in 1949&lt;br /&gt;
and built on the work of [[WikiPedia:Harry Nyquist|Nyquist]], and [[WikiPedia:Ralph Hartley|Hartley]], and tons of others, the&lt;br /&gt;
sampling theorem not only states that we can go back and&lt;br /&gt;
forth between analog and digital, but also lays&lt;br /&gt;
down a set of conditions for which conversion is lossless and the two&lt;br /&gt;
representations become equivalent and interchangeable.  When the&lt;br /&gt;
lossless conditions aren&#039;t met, the sampling theorem tells us how and&lt;br /&gt;
how much information is lost or corrupted.&lt;br /&gt;
&lt;br /&gt;
Up until very recently, analog technology was the basis for&lt;br /&gt;
practically everything done with audio, and that&#039;s not because most&lt;br /&gt;
audio comes from an originally analog source.  You may also think that&lt;br /&gt;
since computers are fairly recent, analog signal technology must have&lt;br /&gt;
come first.  Nope. Digital is actually older.  The [[WikiPedia:Telegraph|telegraph]] predates&lt;br /&gt;
the telephone by half a century and was already fully mechanically&lt;br /&gt;
automated by the 1860s, sending coded, multiplexed digital signals&lt;br /&gt;
long distances. You know... [[WikiPedia:Tickertape|tickertape]]. Harry Nyquist of [[WikiPedia:Bell_labs|Bell Labs]] was&lt;br /&gt;
researching telegraph pulse transmission when he published his&lt;br /&gt;
description of what later became known as the [[WikiPedia:Nyquist_frequency|Nyquist frequency]], the&lt;br /&gt;
core concept of the sampling theorem.  Now, it&#039;s true the telegraph&lt;br /&gt;
was transmitting symbolic information, text, not a digitized analog&lt;br /&gt;
signal, but with the advent of the telephone and radio, analog and&lt;br /&gt;
digital signal technology progressed rapidly and side-by-side.&lt;br /&gt;
&lt;br /&gt;
Audio had always been manipulated as an analog signal because... well,&lt;br /&gt;
gee, it&#039;s so much easier.  A [[WikiPedia:Low-pass_filter#Continuous-time_low-pass_filters|second-order low-pass filter]], for example,&lt;br /&gt;
requires two passive components.  An all-analog [[WikiPedia:Short-time_Fourier_transform|short-time Fourier&lt;br /&gt;
transform]], a few hundred.  Well, maybe a thousand if you want to build&lt;br /&gt;
something really fancy (bang on the [http://www.testequipmentdepot.com/usedequipment/hewlettpackard/spectrumanalyzers/3585a.htm 3585]).  Processing signals&lt;br /&gt;
digitally requires millions to billions of transistors running at&lt;br /&gt;
microwave frequencies, support hardware at very least to digitize and&lt;br /&gt;
reconstruct the analog signals, a complete software ecosystem for&lt;br /&gt;
programming and controlling that billion-transistor juggernaut,&lt;br /&gt;
digital storage just in case you want to keep any of those bits for&lt;br /&gt;
later...&lt;br /&gt;
&lt;br /&gt;
So we come to the conclusion that analog is the only practical way to&lt;br /&gt;
do much with audio... well, unless you happen to have a billion&lt;br /&gt;
transistors and all the other things just lying around. And [[WikiPedia:File:Transistor_Count_and_Moore&#039;s_Law_-_2008.svg|since we&lt;br /&gt;
do]], digital signal processing becomes very attractive.&lt;br /&gt;
&lt;br /&gt;
For one thing, analog componentry just doesn&#039;t have the flexibility of&lt;br /&gt;
a general purpose computer.  Adding a new function to this&lt;br /&gt;
beast... yeah, it&#039;s probably not going to happen.  On a digital&lt;br /&gt;
processor though, just write a new program.  Software isn&#039;t trivial,&lt;br /&gt;
but it is a lot easier.&lt;br /&gt;
&lt;br /&gt;
Perhaps more importantly, though, every analog component is an&lt;br /&gt;
approximation. There&#039;s no such thing as a perfect transistor, or a&lt;br /&gt;
perfect inductor, or a perfect capacitor.  In analog, every component&lt;br /&gt;
adds [[WikiPedia:Johnson–Nyquist_noise|noise]] and [[WikiPedia:Distortion#Electronic_signals|distortion]], usually not very much, but it adds up. Just&lt;br /&gt;
transmitting an analog signal, especially over long distances,&lt;br /&gt;
progressively, measurably, irretrievably corrupts it.  Besides, all of&lt;br /&gt;
those single-purpose analog components take up a lot of space.  Two&lt;br /&gt;
lines of code on the billion transistors back here can implement a&lt;br /&gt;
filter that would require an [[WikiPedia:Inductor|inductor]] the size of a refrigerator.&lt;br /&gt;
&lt;br /&gt;
Digital systems don&#039;t have these drawbacks.  Digital signals can be&lt;br /&gt;
stored, copied, manipulated, and transmitted without adding any noise&lt;br /&gt;
or distortion. We do use [[WikiPedia:Lossy_compression|lossy]] algorithms from time to time, but the&lt;br /&gt;
only unavoidably non-ideal steps are digitization and reconstruction,&lt;br /&gt;
where digital has to interface with all of that messy analog.  Messy&lt;br /&gt;
or not, modern [[WikiPedia:Digital-to-analog_converter|conversion stages]] are very, very good.  By the&lt;br /&gt;
standards of our ears, we can consider them practically lossless as&lt;br /&gt;
well.&lt;br /&gt;
&lt;br /&gt;
With a little extra hardware, then, most of which is now small and&lt;br /&gt;
inexpensive due to our modern industrial infrastructure, digital audio&lt;br /&gt;
is the clear winner over analog.  So let us then go about storing it,&lt;br /&gt;
copying it, manipulating it, and transmitting it.&lt;br /&gt;
&lt;br /&gt;
&amp;lt;center&amp;gt;&amp;lt;div style=&amp;quot;background-color:#DDDDFF;border-color:#CCCCDD;border-style:solid;width:80%;padding:0 1em 1em 1em;text-align:left;&amp;quot;&amp;gt;&lt;br /&gt;
&#039;&#039;&#039;Going deeper…&#039;&#039;&#039;&lt;br /&gt;
*Wikipedia: [[wiki:Nyquist–Shannon_sampling_theorem|Nyquist–Shannon sampling theorem]]&lt;br /&gt;
*MIT OpenCourseWare [http://ocw.mit.edu/courses/electrical-engineering-and-computer-science/6-003-signals-and-systems-spring-2010/lecture-notes/ Lecture notes from 6.003 signals and systems.]&lt;br /&gt;
&amp;lt;/div&amp;gt;&amp;lt;/center&amp;gt;&lt;br /&gt;
&lt;br /&gt;
&amp;lt;br style=&amp;quot;clear:both;&amp;quot;/&amp;gt;&lt;br /&gt;
&lt;br /&gt;
==Raw (digital audio) meat==&lt;br /&gt;
&amp;lt;small&amp;gt;[[Talk:A_Digital_Media_Primer_For_Geeks_(episode_1)#Raw_.28digital_audio.29_meat|Discuss this section]]&amp;lt;/small&amp;gt;&lt;br /&gt;
&lt;br /&gt;
Pulse Code Modulation is the most common representation for &lt;br /&gt;
raw audio.  Other practical representations do exist: for example, the&lt;br /&gt;
[[WikiPedia:Delta-sigma_modulation|Sigma-Delta coding]] used by the [[WikiPedia:Super_Audio_CD|SACD]], which is a form of [[wikipedia:Pulse-density_modulation|Pulse Density&lt;br /&gt;
Modulation]].  That said, Pulse Code Modulation is far&lt;br /&gt;
and away dominant, mainly because it&#039;s so mathematically&lt;br /&gt;
convenient.  An audio engineer can spend an entire career without&lt;br /&gt;
running into anything else.&lt;br /&gt;
&lt;br /&gt;
PCM encoding can be characterized in three parameters, making it easy&lt;br /&gt;
to account for every possible PCM variant with mercifully little&lt;br /&gt;
hassle.&lt;br /&gt;
&amp;lt;br style=&amp;quot;clear:both;&amp;quot;/&amp;gt;&lt;br /&gt;
===sample rate===&lt;br /&gt;
[[Image:Dmpfg_009.jpg|360px|right]]&lt;br /&gt;
[[Image:Dmpfg_008.jpg|360px|right]]&lt;br /&gt;
The first parameter is the [[wikipedia:Sampling_rate|sampling rate]].  The highest frequency an&lt;br /&gt;
encoding can represent is called the Nyquist Frequency.  The Nyquist&lt;br /&gt;
frequency of PCM happens to be exactly half the sampling rate.&lt;br /&gt;
Therefore, the sampling rate directly determines the highest possible&lt;br /&gt;
frequency in the digitized signal.&lt;br /&gt;
&lt;br /&gt;
Analog telephone systems traditionally [[wikipedia:Bandlimiting|band-limited]] voice channels to&lt;br /&gt;
just under 4kHz, so digital telephony and most classic voice&lt;br /&gt;
applications use an 8kHz sampling rate: the minimum sampling rate&lt;br /&gt;
necessary to capture the entire bandwidth of a 4kHz channel.  This is&lt;br /&gt;
what an 8kHz sampling rate sounds like&amp;amp;mdash;a bit muffled but perfectly&lt;br /&gt;
intelligible for voice.  This is the lowest sampling rate that&#039;s ever&lt;br /&gt;
been used widely in practice.&lt;br /&gt;
&lt;br /&gt;
From there, as power, and memory, and storage increased, consumer&lt;br /&gt;
computer hardware went to offering 11, and then 16, and then 22, and&lt;br /&gt;
then 32kHz sampling.  With each increase in the sampling rate and the&lt;br /&gt;
Nyquist frequency, it&#039;s obvious that the high end becomes a little&lt;br /&gt;
clearer and the sound more natural.&lt;br /&gt;
&lt;br /&gt;
The Compact Disc uses a 44.1kHz sampling rate, which is again slightly&lt;br /&gt;
better than 32kHz, but the gains are becoming less distinct.  44.1kHz&lt;br /&gt;
is a bit of an oddball choice, especially given that it hadn&#039;t been&lt;br /&gt;
used for anything prior to the compact disc, but the huge success of&lt;br /&gt;
the CD has made it a common rate.&lt;br /&gt;
&lt;br /&gt;
The most common hi-fidelity sampling rate aside from the CD is 48kHz.&lt;br /&gt;
There&#039;s virtually no audible difference between the two.  This video,&lt;br /&gt;
or at least the original version of it, was shot and produced with&lt;br /&gt;
48kHz audio, which happens to be the original standard for&lt;br /&gt;
high-fidelity audio with video.&lt;br /&gt;
&lt;br /&gt;
Super-hi-fidelity sampling rates of 88, and 96, and 192kHz have also&lt;br /&gt;
appeared. The reason for the sampling rates beyond 48kHz isn&#039;t to&lt;br /&gt;
extend the audible high frequencies further. It&#039;s for a different&lt;br /&gt;
reason.&lt;br /&gt;
&lt;br /&gt;
Stepping back for just a second, the French mathematician [[wikipedia:Joseph_Fourier|Jean&lt;br /&gt;
Baptiste Joseph Fourier]] showed that we can also think of signals like&lt;br /&gt;
audio as a set of component frequencies.  This [[wikipedia:Frequency_domain|frequency-domain]]&lt;br /&gt;
representation is equivalent to the time representation; the signal is&lt;br /&gt;
exactly the same, we&#039;re just looking at it [[wikipedia:Basis_(linear_algebra)|a different way]].  Here we see the&lt;br /&gt;
frequency-domain representation of a hypothetical analog signal we&lt;br /&gt;
intend to digitally sample.&lt;br /&gt;
&lt;br /&gt;
The sampling theorem tells us two essential things about the sampling&lt;br /&gt;
process. First, that a digital signal can&#039;t represent any&lt;br /&gt;
frequencies above the Nyquist frequency. Second, and this is the new&lt;br /&gt;
part, if we don&#039;t remove those frequencies with a low-pass filter&lt;br /&gt;
before sampling, the sampling process will fold them down into the&lt;br /&gt;
representable frequency range as [[wikipedia:Aliasing|aliasing distortion]].&lt;br /&gt;
&lt;br /&gt;
Aliasing, in a nutshell, sounds freakin&#039; awful, so it&#039;s essential to&lt;br /&gt;
remove any beyond-Nyquist frequencies before sampling and after&lt;br /&gt;
reconstruction.&lt;br /&gt;
&lt;br /&gt;
Human frequency perception is considered to extend to about 20kHz. In&lt;br /&gt;
44.1 or 48kHz sampling, the low pass before the sampling stage has to&lt;br /&gt;
be extremely sharp to avoid cutting any audible frequencies below&lt;br /&gt;
[[wikipedia:Hearing_range|20kHz]] but still not allow frequencies above the Nyquist to leak&lt;br /&gt;
forward into the sampling process.  This is a difficult filter to&lt;br /&gt;
build, and no practical filter succeeds completely. If the sampling&lt;br /&gt;
rate is 96kHz or 192kHz on the other hand, the low pass has an extra&lt;br /&gt;
[[wikipedia:Octave_(electronics)|octave]] or two for its transition band. This is a much easier filter to&lt;br /&gt;
build.  Sampling rates beyond 48kHz are actually one of those messy&lt;br /&gt;
analog stage compromises.&lt;br /&gt;
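&lt;br /&gt;
As a quick worked example of the fold-down: sample a 26kHz tone at 48kHz without a low-pass and, because 26kHz lies above the 24kHz Nyquist frequency, it comes back as an alias at 48 - 26 = 22kHz, squarely in the representable band.  In general, a component at a frequency f between half the sampling rate and the sampling rate fs aliases down to fs - f.&lt;br /&gt;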
&amp;lt;br style=&amp;quot;clear:both;&amp;quot;/&amp;gt;&lt;br /&gt;
===sample format===&lt;br /&gt;
[[Image:Dmpfg_anim.gif|right]]&lt;br /&gt;
&lt;br /&gt;
The second fundamental PCM parameter is the sample format; that is,&lt;br /&gt;
the format of each digital number.  A number is a number, but a number&lt;br /&gt;
can be represented in bits a number of different ways.&lt;br /&gt;
&lt;br /&gt;
Early PCM was [[wikipedia:Quantization_(sound_processing)#Audio_quantization|eight-bit]] [[wikipedia:Linear_pulse_code_modulation|linear]], encoded as an [[wikipedia:Signedness|unsigned]] [[wikipedia:Integer_(computer_science)#Bytes_and_octets|byte]].  The&lt;br /&gt;
[[wikipedia:Dynamic_range#Audio|dynamic range]] is limited to about [[wikipedia:Decibel|50dB]]  and the [[wikipedia:Quantization_error|quantization noise]], as&lt;br /&gt;
you can hear, is pretty severe.  Eight-bit audio is vanishingly rare&lt;br /&gt;
today.&lt;br /&gt;
&lt;br /&gt;
Digital telephony typically uses one of two related non-linear eight&lt;br /&gt;
bit encodings called [[wikipedia:A-law_algorithm|A-law]] and [[wikipedia:Μ-law_algorithm|μ-law]]. These formats encode a roughly&lt;br /&gt;
[[wikipedia:Audio_bit_depth#Dynamic_range|14 bit dynamic range]] into eight bits by spacing the higher amplitude&lt;br /&gt;
values farther apart. A-law and mu-law obviously improve quantization&lt;br /&gt;
noise compared to linear 8-bit, and voice harmonics especially hide&lt;br /&gt;
the remaining quantization noise well. All three eight-bit encodings,&lt;br /&gt;
linear, A-law, and mu-law, are typically paired with an 8kHz sampling&lt;br /&gt;
rate, though I&#039;m demonstrating them here at 48kHz.&lt;br /&gt;
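&lt;br /&gt;
As a sketch of the idea (the real G.711 codecs use a piecewise segment approximation of this curve, not floating point), the continuous mu-law companding function looks like this in C:&lt;br /&gt;
&lt;br /&gt;
#include &amp;lt;math.h&amp;gt;&lt;br /&gt;
&lt;br /&gt;
/* Continuous mu-law curve with mu = 255: maps x in [-1,1] to [-1,1],&lt;br /&gt;
   spending most of the output resolution near zero amplitude. */&lt;br /&gt;
double mulaw_compress(double x) {&lt;br /&gt;
    const double mu = 255.0;&lt;br /&gt;
    double m = log(1.0 + mu * fabs(x)) / log(1.0 + mu);&lt;br /&gt;
    return x &amp;lt; 0 ? -m : m;&lt;br /&gt;
}&lt;br /&gt;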
&lt;br /&gt;
Most modern PCM uses 16- or 24-bit [[wikipedia:Two&#039;s_complement|two&#039;s-complement]] signed integers to&lt;br /&gt;
encode the range from negative infinity to zero decibels in 16 or 24&lt;br /&gt;
bits of precision. The maximum absolute value corresponds to zero decibels.&lt;br /&gt;
As with all the sample formats so far, signals beyond zero decibels, and thus&lt;br /&gt;
beyond the maximum representable range, are [[wikipedia:Clipping_(audio)|clipped]].&lt;br /&gt;
&lt;br /&gt;
In mixing and mastering, it&#039;s not unusual to use [[wikipedia:Floating_point|floating-point]]&lt;br /&gt;
numbers for PCM instead of [[wikipedia:Integer_(computer_science)|integers]].  A 32 bit [[wikipedia:IEEE_754-2008|IEEE754]] float, that&#039;s&lt;br /&gt;
the normal kind of floating point you see on current computers, has 24&lt;br /&gt;
bits of resolution, but an eight-bit floating-point exponent increases&lt;br /&gt;
the representable range.  Floating point usually represents zero&lt;br /&gt;
decibels as +/-1.0, and because floats can obviously represent&lt;br /&gt;
considerably beyond that, temporarily exceeding zero decibels during&lt;br /&gt;
the mixing process doesn&#039;t cause clipping.  Floating-point PCM takes&lt;br /&gt;
up more space, so it tends to be used only as an intermediate&lt;br /&gt;
production format.&lt;br /&gt;
&lt;br /&gt;
Lastly, most general purpose computers still read and&lt;br /&gt;
write data in octet bytes, so it&#039;s important to remember that samples&lt;br /&gt;
bigger than eight bits can be in [[wikipedia:Endianness|big- or little-endian order]], and both&lt;br /&gt;
endiannesses are common.  For example, Microsoft [[wikipedia:WAV|WAV]] files are little-endian,&lt;br /&gt;
and Apple [[wikipedia:AIFC|AIFC]] files tend to be big-endian.  Be aware of it.&lt;br /&gt;
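&lt;br /&gt;
Tying the last two points together, here&#039;s an untested C sketch that assembles a 16-bit little-endian signed sample (as found in a WAV file) byte by byte, so it works on either kind of host, and scales it to floating point with zero decibels at 1.0:&lt;br /&gt;
&lt;br /&gt;
#include &amp;lt;stdint.h&amp;gt;&lt;br /&gt;
&lt;br /&gt;
/* One 16-bit little-endian signed PCM sample to a float in [-1.0, 1.0) */&lt;br /&gt;
float pcm16le_to_float(const uint8_t *b) {&lt;br /&gt;
    int16_t v = (int16_t)(b[0] | (b[1] &amp;lt;&amp;lt; 8));&lt;br /&gt;
    return v / 32768.0f;&lt;br /&gt;
}&lt;br /&gt;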
&amp;lt;br style=&amp;quot;clear:both;&amp;quot;/&amp;gt;&lt;br /&gt;
===channels===&lt;br /&gt;
&lt;br /&gt;
The third PCM parameter is the number of [[wikipedia:Multichannel_audio|channels]].  The convention in&lt;br /&gt;
raw PCM is to encode multiple channels by interleaving the samples of&lt;br /&gt;
each channel together into a single stream.  Straightforward and extensible.&lt;br /&gt;
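&lt;br /&gt;
In code terms (a trivial sketch, assuming float samples): sample number s of channel c in an n-channel interleaved buffer lives at index s * n + c, so stereo comes out as left, right, left, right, and so on:&lt;br /&gt;
&lt;br /&gt;
/* Fetch sample s of channel c from an n-channel interleaved stream. */&lt;br /&gt;
float interleaved_sample(const float *buf, int s, int c, int n) {&lt;br /&gt;
    return buf[s * n + c];&lt;br /&gt;
}&lt;br /&gt;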
&amp;lt;br style=&amp;quot;clear:both;&amp;quot;/&amp;gt;&lt;br /&gt;
===done!===&lt;br /&gt;
&lt;br /&gt;
And that&#039;s it!  That describes every PCM representation ever.  Done.&lt;br /&gt;
Digital audio is &#039;&#039;so easy&#039;&#039;!  There&#039;s more to do of course, but at this&lt;br /&gt;
point we&#039;ve got a nice useful chunk of audio data, so let&#039;s get some&lt;br /&gt;
video too.&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
&amp;lt;center&amp;gt;&amp;lt;div style=&amp;quot;background-color:#DDDDFF;border-color:#CCCCDD;border-style:solid;width:80%;padding:0 1em 1em 1em;text-align:left;&amp;quot;&amp;gt;&lt;br /&gt;
&#039;&#039;&#039;Going deeper…&#039;&#039;&#039;&lt;br /&gt;
* [[wikipedia:Roll-off|Wikipedia&#039;s article on filter roll-off]], to learn why it&#039;s hard to build analog filters with a very narrow [[wikipedia:Transition_band|transition band]] between the [[wikipedia:Passband|passband]] and the [[wikipedia:Stopband|stopband]].&lt;br /&gt;
* [http://wiki.multimedia.cx/index.php?title=PCM Some more minutiae] about PCM in practice.&lt;br /&gt;
* [[wikipedia:DPCM|DPCM]] and [[wikipedia:ADPCM|ADPCM]], simple audio codecs loosely inspired by PCM.&lt;br /&gt;
&amp;lt;/div&amp;gt;&amp;lt;/center&amp;gt;&lt;br /&gt;
&lt;br /&gt;
&amp;lt;br style=&amp;quot;clear:both;&amp;quot;/&amp;gt;&lt;br /&gt;
&lt;br /&gt;
==Video vegetables (they&#039;re good for you!)==&lt;br /&gt;
[[Image:Dmpfg_010.jpg|360px|right]]&lt;br /&gt;
&amp;lt;small&amp;gt;[[Talk:A_Digital_Media_Primer_For_Geeks_(episode_1)#Video_vegetables_.28they.27re_good_for_you.21.29|Discuss this section]]&amp;lt;/small&amp;gt;&lt;br /&gt;
&lt;br /&gt;
One could think of video as being like audio but with two additional&lt;br /&gt;
spatial dimensions, X and Y, in addition to the dimension of time.&lt;br /&gt;
This is mathematically sound. The Sampling Theorem applies to all&lt;br /&gt;
three video dimensions just as it does the single time dimension of&lt;br /&gt;
audio.&lt;br /&gt;
&lt;br /&gt;
Audio and video are obviously quite different in practice. For one,&lt;br /&gt;
compared to audio, video is huge. [[wikipedia:Red_Book_(audio_Compact_Disc_standard)#Technical_details|Raw CD audio]] is about 1.4 megabits&lt;br /&gt;
per second. Raw [[wikipedia:1080i|1080i]] HD video is over 700 megabits per second. That&#039;s&lt;br /&gt;
more than 500 times more data to capture, process, and store per&lt;br /&gt;
second.  By [[wikipedia:Moore&#039;s_law|Moore&#039;s law]]... that&#039;s... let&#039;s see... roughly eight&lt;br /&gt;
doublings times two years, so yeah, computers requiring about an extra&lt;br /&gt;
fifteen years to handle raw video after getting raw audio down pat was&lt;br /&gt;
about right.&lt;br /&gt;
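&lt;br /&gt;
To put numbers on that: CD audio is 44100 samples per second x 16 bits x 2 channels, or about 1.4 megabits per second.  Raw 1080i, assuming an 8-bit 4:2:0 representation at 30 full frames per second, is 1920 x 1080 pixels x 1.5 bytes per pixel x 30 frames, roughly 750 megabits per second; other accountings of chroma and bit depth give somewhat different totals, but they all land in the same range.&lt;br /&gt;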
&lt;br /&gt;
Basic raw video is also just more complex than basic raw audio. The&lt;br /&gt;
sheer volume of data currently necessitates a representation more&lt;br /&gt;
efficient than the linear PCM used for audio.  In addition, electronic&lt;br /&gt;
video comes almost entirely from broadcast television, and the&lt;br /&gt;
standards committees that govern broadcast video have always been very&lt;br /&gt;
concerned with backward compatibility.  Up until just last year in the&lt;br /&gt;
US, a sixty-year-old black and white television could still show a&lt;br /&gt;
normal [[wikipedia:NTSC|analog television broadcast]].  That&#039;s actually a really neat&lt;br /&gt;
trick.&lt;br /&gt;
&lt;br /&gt;
The downside to backward compatibility is that once a detail makes it&lt;br /&gt;
into a standard, you can&#039;t ever really throw it out again. Electronic&lt;br /&gt;
video has never started over from scratch the way audio has multiple&lt;br /&gt;
times.  Sixty years worth of clever but obsolete hacks necessitated by&lt;br /&gt;
the passing technology of a given era have built up into quite a pile,&lt;br /&gt;
and because digital standards also come from broadcast television, all&lt;br /&gt;
these eldritch hacks have been brought forward into the digital&lt;br /&gt;
standards as well.&lt;br /&gt;
&lt;br /&gt;
In short, there are a whole lot more details involved in digital video&lt;br /&gt;
than there were with audio. There&#039;s no hope of covering them&lt;br /&gt;
all completely here, so we&#039;ll cover the broad fundamentals.&lt;br /&gt;
&amp;lt;br style=&amp;quot;clear:both;&amp;quot;/&amp;gt;&lt;br /&gt;
===resolution and aspect===&lt;br /&gt;
[[Image:Dmpfg_011.jpg|360px|right]]&lt;br /&gt;
&lt;br /&gt;
The most obvious raw video parameters are the width and height of the&lt;br /&gt;
picture in pixels. As simple as that may sound, the pixel dimensions&lt;br /&gt;
alone don&#039;t actually specify the absolute width and height of the&lt;br /&gt;
picture, as most broadcast-derived video doesn&#039;t use square pixels.&lt;br /&gt;
The number of [[wikipedia:Scan_line|scanlines]] in a broadcast image was fixed, but the&lt;br /&gt;
effective number of horizontal pixels was a function of channel&lt;br /&gt;
[[wikipedia:Bandwidth_(signal_processing)|bandwidth]]. Effective horizontal resolution could result in pixels that&lt;br /&gt;
were either narrower or wider than the spacing between scanlines.&lt;br /&gt;
&lt;br /&gt;
Standards have generally specified that digitally sampled video should&lt;br /&gt;
reflect the real resolution of the original analog source, so a large&lt;br /&gt;
amount of digital video also uses non-square pixels. For example, a&lt;br /&gt;
normal 4:3 aspect NTSC DVD is typically encoded with a display&lt;br /&gt;
resolution of [[wikipedia:DVD-Video#Frame_size_and_frame_rate|704 by 480]], a ratio wider than 4:3.  In this case, the&lt;br /&gt;
pixels themselves are assigned an aspect ratio of [[wikipedia:Standard-definition_television#Resolution|10:11]], making them&lt;br /&gt;
taller than they are wide and narrowing the image horizontally to the&lt;br /&gt;
correct aspect.  Such an image has to be resampled to show properly on&lt;br /&gt;
a digital display with square pixels.&lt;br /&gt;
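&lt;br /&gt;
The arithmetic for that example: 704 coded pixels, each 10/11 as wide as it is tall, span the same width as 704 x 10/11 = 640 square pixels, and 640:480 is exactly 4:3.&lt;br /&gt;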
&amp;lt;br style=&amp;quot;clear:both;&amp;quot;/&amp;gt;&lt;br /&gt;
===frame rate and interlacing===&lt;br /&gt;
[[Image:Dmpfg_012.jpg|360px|right]]&lt;br /&gt;
&lt;br /&gt;
The second obvious video parameter is the [[wikipedia:Frame_rate|frame rate]], the number of&lt;br /&gt;
full frames per second.  Several standard frame rates are in active&lt;br /&gt;
use. Digital video, in one form or another, can use all of them.  Or,&lt;br /&gt;
any other frame rate.  Or even variable rates where the frame rate&lt;br /&gt;
changes adaptively over the course of the video. The higher the frame&lt;br /&gt;
rate, the smoother the motion, and that brings us, unfortunately, to&lt;br /&gt;
[[wikipedia:Interlace|interlacing]].&lt;br /&gt;
&lt;br /&gt;
In the very earliest days of broadcast video, engineers sought the&lt;br /&gt;
fastest practical frame rate to smooth motion and to minimize [[wikipedia:Flicker_(screen)|flicker]]&lt;br /&gt;
on phosphor-based [[wikipedia:Cathode_ray_tube|CRTs]].  They were also under pressure to use the&lt;br /&gt;
least possible bandwidth for the highest resolution and fastest frame&lt;br /&gt;
rate.  Their solution was to interlace the video where the even lines&lt;br /&gt;
are sent in one pass and the odd lines in the next.  Each pass is&lt;br /&gt;
called a field and two fields sort of produce one complete frame.&lt;br /&gt;
&amp;quot;Sort of&amp;quot;, because the even and odd fields aren&#039;t actually from the&lt;br /&gt;
same source frame.  In a 60 field per second picture, the source frame&lt;br /&gt;
rate is actually 60 full frames per second, and half of each frame,&lt;br /&gt;
every other line, is simply discarded.  This is why we can&#039;t&lt;br /&gt;
[[wikipedia:Deinterlacing|deinterlace]] a video simply by combining two fields into one frame;&lt;br /&gt;
they&#039;re not actually from one frame to begin with.&lt;br /&gt;
&amp;lt;br style=&amp;quot;clear:both;&amp;quot;/&amp;gt;&lt;br /&gt;
&lt;br /&gt;
===gamma===&lt;br /&gt;
[[Image:Dmpfg_013.jpg|360px|right]]&lt;br /&gt;
&lt;br /&gt;
The cathode ray tube was the only available display technology for&lt;br /&gt;
most of the history of electronic video. A CRT&#039;s output brightness is&lt;br /&gt;
nonlinear, approximately equal to the input controlling voltage raised&lt;br /&gt;
to the 2.5th power. This exponent, 2.5, is designated gamma, and so&lt;br /&gt;
it&#039;s often referred to as the gamma of a display.  Cameras, though,&lt;br /&gt;
are linear, and if you feed a CRT a linear input signal, it looks a&lt;br /&gt;
bit like this.&lt;br /&gt;
&lt;br /&gt;
As there were originally to be very few cameras, which were&lt;br /&gt;
fantastically expensive anyway, and hopefully many, many television&lt;br /&gt;
sets, which had best be as inexpensive as possible, engineers decided to&lt;br /&gt;
add the necessary [[wikipedia:Gamma_correction|gamma correction]] circuitry to the cameras rather&lt;br /&gt;
than the sets. Video transmitted over the airwaves would thus have a&lt;br /&gt;
nonlinear intensity using the inverse of the set&#039;s gamma exponent, so that&lt;br /&gt;
once a camera&#039;s signal was finally displayed on the CRT, the overall&lt;br /&gt;
response of the system from camera to set was back to linear again.&lt;br /&gt;
&lt;br /&gt;
Almost.&lt;br /&gt;
&lt;br /&gt;
There were also two other tweaks. A television camera actually uses a&lt;br /&gt;
gamma exponent that&#039;s the inverse of 2.2, not 2.5.  That&#039;s just a&lt;br /&gt;
correction for viewing in a dim environment. Also, the exponential&lt;br /&gt;
curve transitions to a linear ramp near black.  That&#039;s just an old&lt;br /&gt;
hack for suppressing sensor noise in the camera.&lt;br /&gt;
&lt;br /&gt;
Gamma correction also had a lucky benefit. It just so happens that the&lt;br /&gt;
human eye has a perceptual gamma of about 3.  This is relatively close&lt;br /&gt;
to the CRT&#039;s gamma of 2.5. An image using gamma correction devotes&lt;br /&gt;
more resolution to lower intensities, where the eye happens to have&lt;br /&gt;
its finest intensity discrimination, and therefore uses the available&lt;br /&gt;
scale resolution more efficiently.  Although CRTs are currently&lt;br /&gt;
vanishing, a standard [[wikipedia:sRGB|sRGB]] computer display still uses a nonlinear&lt;br /&gt;
intensity curve similar to television, with a linear ramp near black,&lt;br /&gt;
followed by an exponential curve with a gamma exponent of 2.4. This&lt;br /&gt;
encodes a sixteen bit linear range down into eight bits.&lt;br /&gt;
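&lt;br /&gt;
That sRGB curve is simple enough to state in a few lines of C (a sketch of the encoding direction, with values normalized to [0,1]):&lt;br /&gt;
&lt;br /&gt;
#include &amp;lt;math.h&amp;gt;&lt;br /&gt;
&lt;br /&gt;
/* Linear intensity to sRGB: a linear ramp near black, then a&lt;br /&gt;
   power curve with a gamma exponent of 2.4. */&lt;br /&gt;
double srgb_encode(double linear) {&lt;br /&gt;
    if (linear &amp;lt;= 0.0031308)&lt;br /&gt;
        return 12.92 * linear;&lt;br /&gt;
    return 1.055 * pow(linear, 1.0 / 2.4) - 0.055;&lt;br /&gt;
}&lt;br /&gt;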
&amp;lt;br style=&amp;quot;clear:both;&amp;quot;/&amp;gt;&lt;br /&gt;
&lt;br /&gt;
===color and colorspace===&lt;br /&gt;
[[Image:Dmpfg_014.jpg|360px|right]]&lt;br /&gt;
&lt;br /&gt;
The human eye has three apparent color channels, red, green, and blue,&lt;br /&gt;
and most displays use these three colors as [[wikipedia:Additive_color|additive primaries]] to&lt;br /&gt;
produce a full range of color output.  The primary pigments in&lt;br /&gt;
printing are [[wikipedia:CMYK|Cyan, Magenta, and Yellow]] for the same reason; pigments&lt;br /&gt;
are [[wikipedia:Subtractive_color|subtractive]], and each of these pigments subtracts one pure color&lt;br /&gt;
from reflected light.  Cyan subtracts red, magenta subtracts green, and&lt;br /&gt;
yellow subtracts blue.&lt;br /&gt;
&lt;br /&gt;
Video can be, and sometimes is, represented with red, green, and blue&lt;br /&gt;
color channels, but RGB video is atypical. The human eye is far more&lt;br /&gt;
sensitive to [[wikipedia:Luminance_(relative)|luminosity]] than it is to color, and RGB tends to spread&lt;br /&gt;
the energy of an image across all three color channels.  That is, the&lt;br /&gt;
red plane looks like a red version of the original picture, the green&lt;br /&gt;
plane looks like a green version of the original picture, and the blue&lt;br /&gt;
plane looks like a blue version of the original picture.  Black and&lt;br /&gt;
white times three.  Not efficient.&lt;br /&gt;
&lt;br /&gt;
For those reasons and because, oh hey, television just happened to&lt;br /&gt;
start out as black and white anyway, video usually is represented as a&lt;br /&gt;
high resolution [[wikipedia:Luma_(video)|luma channel]]&amp;amp;mdash;the black &amp;amp; white&amp;amp;mdash;along with&lt;br /&gt;
additional, often lower resolution [[wikipedia:Chrominance|chroma channels]], the color. The&lt;br /&gt;
luma channel, Y, is produced by weighting and then adding the separate&lt;br /&gt;
red, green and blue signals.  The chroma channels U and V are then&lt;br /&gt;
produced by subtracting the luma signal from blue and the luma signal&lt;br /&gt;
from red.&lt;br /&gt;
&lt;br /&gt;
When YUV is scaled, offset, and quantized for digital video, it&#039;s&lt;br /&gt;
usually more correctly called [[wikipedia:Y&#039;CbCr|Y&#039;CbCr]], but the more generic term YUV is&lt;br /&gt;
widely used to describe all the analog and digital variants of this&lt;br /&gt;
color model.&lt;br /&gt;
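&lt;br /&gt;
As a concrete sketch of the construction just described, here&#039;s the conversion using the Rec. 601 weights, one of several standard weightings, with inputs normalized to [0,1]:&lt;br /&gt;
&lt;br /&gt;
struct yuv { double y, u, v; };&lt;br /&gt;
&lt;br /&gt;
/* Luma as a weighted sum of R, G, B; chroma as scaled B-Y and R-Y. */&lt;br /&gt;
struct yuv rgb_to_yuv601(double r, double g, double b) {&lt;br /&gt;
    struct yuv s;&lt;br /&gt;
    s.y = 0.299 * r + 0.587 * g + 0.114 * b;&lt;br /&gt;
    s.u = 0.492 * (b - s.y);  /* U: blue minus luma, scaled */&lt;br /&gt;
    s.v = 0.877 * (r - s.y);  /* V: red minus luma, scaled */&lt;br /&gt;
    return s;&lt;br /&gt;
}&lt;br /&gt;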
&amp;lt;br style=&amp;quot;clear:both;&amp;quot;/&amp;gt;&lt;br /&gt;
&lt;br /&gt;
===chroma subsampling===&lt;br /&gt;
[[Image:Dmpfg_015.jpg|360px|right]]&lt;br /&gt;
&lt;br /&gt;
The U and V chroma channels can have the same resolution as the Y&lt;br /&gt;
channel, but because the human eye has far less spatial color&lt;br /&gt;
resolution than spatial luminosity resolution, chroma resolution is&lt;br /&gt;
usually [[wikipedia:Chroma_subsampling|halved or even quartered]] in the horizontal direction, the&lt;br /&gt;
vertical direction, or both, usually without any significant impact on the&lt;br /&gt;
apparent raw image quality.  Practically every possible subsampling&lt;br /&gt;
variant has been used at one time or another, but the common choices&lt;br /&gt;
today are [[wikipedia:Chroma_subsampling#4:4:4_Y.27CbCr|4:4:4]] video, which isn&#039;t actually subsampled at all, [[wikipedia:Chroma_subsampling#4:2:2|4:2:2]] video in&lt;br /&gt;
which the horizontal resolution of the U and V channels is halved, and&lt;br /&gt;
most common of all, [[wikipedia:Chroma_subsampling#4:2:0|4:2:0]] video in which both the horizontal and vertical&lt;br /&gt;
resolutions of the chroma channels are halved, resulting in U and V&lt;br /&gt;
planes that are each one quarter the size of Y.&lt;br /&gt;
&lt;br /&gt;
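A quick size check in C for the 4:2:0 case (an untested sketch, 8 bits per sample):&lt;br /&gt;
&lt;br /&gt;
#include &amp;lt;stddef.h&amp;gt;&lt;br /&gt;
&lt;br /&gt;
/* Bytes in an 8-bit 4:2:0 frame: a full-size Y plane plus two&lt;br /&gt;
   half-by-half chroma planes, i.e. 1.5 bytes per pixel.&lt;br /&gt;
   640x480 gives 307200 + 2 x 76800 = 460800 bytes. */&lt;br /&gt;
size_t frame_size_420(size_t w, size_t h) {&lt;br /&gt;
    return w * h + 2 * ((w / 2) * (h / 2));&lt;br /&gt;
}&lt;br /&gt;
&lt;br /&gt;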
The terms 4:2:2, 4:2:0, [[wikipedia:Chroma_subsampling#4:1:1|4:1:1]], and so on and so forth, aren&#039;t complete&lt;br /&gt;
descriptions of a chroma subsampling. There are multiple possible ways&lt;br /&gt;
to position the chroma pixels relative to luma, and again, several&lt;br /&gt;
variants are in active use for each subsampling.  For example, [[wikipedia:Motion_Jpeg|motion&lt;br /&gt;
JPEG]], [[wikipedia:MPEG-1#Part_2:_Video|MPEG-1 video]], [[wikipedia:MPEG-2#Video_coding_.28simplified.29|MPEG-2 video]], [[wikipedia:DV#DV_Compression|DV]], [[wikipedia:Theora|Theora]], and [[wikipedia:WebM|WebM]] all use or can&lt;br /&gt;
use 4:2:0 subsampling, but they site the chroma pixels [http://www.mir.com/DMG/chroma.html three different ways].&lt;br /&gt;
&lt;br /&gt;
Motion JPEG, MPEG-1 video, Theora and WebM all site chroma pixels&lt;br /&gt;
between luma pixels both horizontally and vertically.&lt;br /&gt;
&lt;br /&gt;
MPEG-2 sites chroma pixels between lines, but horizontally aligned with&lt;br /&gt;
every other luma pixel. Interlaced modes complicate things somewhat,&lt;br /&gt;
resulting in a siting arrangement that&#039;s a tad bizarre.&lt;br /&gt;
&lt;br /&gt;
And finally PAL-DV, which is always interlaced, places the chroma&lt;br /&gt;
pixels in the same position as every other luma pixel in the&lt;br /&gt;
horizontal direction, and vertically alternates chroma channel on&lt;br /&gt;
each line.&lt;br /&gt;
&lt;br /&gt;
That&#039;s just 4:2:0 video. I&#039;ll leave the other subsamplings as homework for the&lt;br /&gt;
viewer.  Got the basic idea, moving on.&lt;br /&gt;
&amp;lt;br style=&amp;quot;clear:both;&amp;quot;/&amp;gt;&lt;br /&gt;
&lt;br /&gt;
===pixel formats===&lt;br /&gt;
[[Image:Dmpfg_016.jpg|360px|right]]&lt;br /&gt;
&lt;br /&gt;
In audio, we always represent multiple channels in a PCM stream by&lt;br /&gt;
interleaving the samples from each channel in order. Video uses both&lt;br /&gt;
packed formats that interleave the color channels, as well as planar&lt;br /&gt;
formats that keep the pixels from each channel together in separate&lt;br /&gt;
planes stacked in order in the frame. There are at least [http://www.fourcc.org/yuv.php 50 different formats] in&lt;br /&gt;
these two broad categories with possibly ten or fifteen in common use.&lt;br /&gt;
&lt;br /&gt;
Each chroma subsampling and different bit-depth requires a different&lt;br /&gt;
packing arrangement, and so a different pixel format.  For a given&lt;br /&gt;
unique subsampling, there are usually also several equivalent formats&lt;br /&gt;
that consist of trivial channel order rearrangements or repackings, due either to&lt;br /&gt;
convenience once-upon-a-time on some particular piece of hardware, or&lt;br /&gt;
sometimes just good old-fashioned spite.&lt;br /&gt;
&lt;br /&gt;
Pixel formats are described by a unique name or [[wikipedia:FourCC|fourcc]] code.  There&lt;br /&gt;
are quite a few of these and there&#039;s no sense going over each one now.&lt;br /&gt;
Google is your friend.  Be aware that fourcc codes for raw video&lt;br /&gt;
specify the pixel arrangement and chroma subsampling, but generally&lt;br /&gt;
don&#039;t imply anything certain about chroma siting or color space.  [http://www.fourcc.org/yuv.php#UYVY YV12]&lt;br /&gt;
video, to pick one, can use JPEG, MPEG-2 or DV chroma siting, and any&lt;br /&gt;
one of [[wikipedia:YUV#BT.709_and_BT.601|several YUV colorspace definitions]].&lt;br /&gt;
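&lt;br /&gt;
To make the packed/planar distinction concrete, here&#039;s an untested sketch of where the luma of pixel (x, y) lives in two common 8-bit layouts:&lt;br /&gt;
&lt;br /&gt;
#include &amp;lt;stdint.h&amp;gt;&lt;br /&gt;
&lt;br /&gt;
/* Planar: the Y plane is one contiguous w-by-h block. */&lt;br /&gt;
uint8_t luma_planar(const uint8_t *yplane, int w, int x, int y) {&lt;br /&gt;
    return yplane[y * w + x];&lt;br /&gt;
}&lt;br /&gt;
&lt;br /&gt;
/* Packed YUY2 (a 4:2:2 fourcc): bytes interleave as Y0 U Y1 V,&lt;br /&gt;
   so luma occupies every other byte of the frame. */&lt;br /&gt;
uint8_t luma_yuy2(const uint8_t *frame, int w, int x, int y) {&lt;br /&gt;
    return frame[(y * w + x) * 2];&lt;br /&gt;
}&lt;br /&gt;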
&amp;lt;br style=&amp;quot;clear:both;&amp;quot;/&amp;gt;&lt;br /&gt;
&lt;br /&gt;
===done!===&lt;br /&gt;
&lt;br /&gt;
That wraps up our not-so-quick and yet very incomplete tour of raw&lt;br /&gt;
video. The good news is we can already get quite a lot of real work&lt;br /&gt;
done using that overview. In plenty of situations, a frame of video&lt;br /&gt;
data is a frame of video data.  The details matter, greatly, when it&lt;br /&gt;
comes time to write software, but for now I am satisfied that the&lt;br /&gt;
esteemed viewer is broadly aware of the relevant issues.&lt;br /&gt;
&lt;br /&gt;
&amp;lt;center&amp;gt;&amp;lt;div style=&amp;quot;background-color:#DDDDFF;border-color:#CCCCDD;border-style:solid;width:80%;padding:0 1em 1em 1em;text-align:left;&amp;quot;&amp;gt;&lt;br /&gt;
&#039;&#039;&#039;Going deeper…&#039;&#039;&#039;&lt;br /&gt;
* [http://wiki.multimedia.cx/index.php?title=YUV4MPEG2 The y4m format] is the most common simple container for raw YUV video.  People occasionally use [[OggYUV]] to store it in Ogg instead.&lt;br /&gt;
* Learn about [[wikipedia:High_dynamic_range_imaging|high dynamic range imaging]], which achieves better representation of the full range of brightnesses in the real world by using more than 8 bits per channel.&lt;br /&gt;
* Learn about how [[wikipedia:Trichromatic_vision|trichromatic color vision]] works in humans, and how human color perception is encoded in the [[wikipedia:CIE 1931 color space|CIE 1931 XYZ color space]].&lt;br /&gt;
** Compare with the [[wikipedia:Lab_color_space|Lab color space]], mathematically equivalent but structured to account for &amp;quot;perceptual uniformity&amp;quot;.&lt;br /&gt;
** If we were all [[wikipedia:Dichromacy|dichromats]] then video would only need two color channels.  Some humans might be [[wikipedia:Tetrachromacy#Possibility_of_human_tetrachromats|tetrachromats]], in which case they would need an additional color channel for video to fully represent their vision.&lt;br /&gt;
** [http://www.xritephoto.com/ph_toolframe.aspx?action=coloriq Test your color vision] (or at least your monitor).&lt;br /&gt;
* YCbCr is defined in terms of RGB by the ITU in two incompatible standards: [[wikipedia:Rec. 601|Rec. 601]] and [[wikipedia:Rec. 709|Rec. 709]].  Both conversion standards are lossy, which has prompted some to adopt a lossless alternative called [http://wiki.multimedia.cx/index.php?title=YCoCg YCoCg].&lt;br /&gt;
&amp;lt;/div&amp;gt;&amp;lt;/center&amp;gt;&lt;br /&gt;
&lt;br /&gt;
&amp;lt;br style=&amp;quot;clear:both;&amp;quot;/&amp;gt;&lt;br /&gt;
&lt;br /&gt;
==Containers==&lt;br /&gt;
[[Image:Dmpfg_017.jpg|360px|right]]&lt;br /&gt;
&amp;lt;small&amp;gt;[[Talk:A_Digital_Media_Primer_For_Geeks_(episode_1)#Containers|Discuss this section]]&amp;lt;/small&amp;gt;&lt;br /&gt;
&lt;br /&gt;
So. We have audio data. We have video data. What remains is the more&lt;br /&gt;
familiar non-signal data and straight-up engineering that software&lt;br /&gt;
developers are used to, and plenty of it.&lt;br /&gt;
&lt;br /&gt;
Chunks of raw audio and video data have no externally-visible&lt;br /&gt;
structure, but they&#039;re often uniformly sized.  We could just string&lt;br /&gt;
them together in a rigid predetermined ordering for streaming and&lt;br /&gt;
storage, and some simple systems do approximately that. Compressed&lt;br /&gt;
frames, though, aren&#039;t necessarily a predictable size, and we usually want&lt;br /&gt;
some flexibility in using a range of different data types in streams.&lt;br /&gt;
If we string random formless data together, we lose the boundaries&lt;br /&gt;
that separate frames and don&#039;t necessarily know what data belongs to&lt;br /&gt;
which streams.  A stream needs some generalized structure to be&lt;br /&gt;
generally useful.&lt;br /&gt;
&lt;br /&gt;
In addition to our signal data, we also have our PCM and video&lt;br /&gt;
parameters.  There&#039;s probably plenty of other [[wikipedia:Metadata#Video|metadata]] we also want to&lt;br /&gt;
deal with, like audio tags and video chapters and subtitles, all&lt;br /&gt;
essential components of rich media.  It makes sense to place this&lt;br /&gt;
metadata&amp;amp;mdash;that is, data about the data&amp;amp;mdash;within the media itself.&lt;br /&gt;
&lt;br /&gt;
Storing and structuring formless data and disparate metadata is the&lt;br /&gt;
job of a [[wikipedia:Container_format_(digital)|container]].  Containers provide framing for the data blobs,&lt;br /&gt;
interleave and identify multiple data streams, provide timing&lt;br /&gt;
information, and store the metadata necessary to parse, navigate,&lt;br /&gt;
manipulate, and present the media.  In general, any container can hold&lt;br /&gt;
any kind of data.  And data can be put into any container.&lt;br /&gt;
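&lt;br /&gt;
A toy illustration, and emphatically not any real container format: the minimum a container must record for each blob of data is something like&lt;br /&gt;
&lt;br /&gt;
#include &amp;lt;stdint.h&amp;gt;&lt;br /&gt;
&lt;br /&gt;
/* Per-blob bookkeeping in a hypothetical toy container. */&lt;br /&gt;
struct toy_packet_header {&lt;br /&gt;
    uint32_t stream_id;  /* which interleaved stream the blob belongs to */&lt;br /&gt;
    uint64_t timestamp;  /* where the blob falls in presentation time */&lt;br /&gt;
    uint32_t length;     /* framing: how many payload bytes follow */&lt;br /&gt;
};&lt;br /&gt;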
&amp;lt;br style=&amp;quot;clear:both;&amp;quot;/&amp;gt;&lt;br /&gt;
&lt;br /&gt;
==Credits==&lt;br /&gt;
[[Image:Dmpfg_018.jpg|360px|right]]&lt;br /&gt;
&lt;br /&gt;
In the past thirty minutes, we&#039;ve covered digital audio, video, some&lt;br /&gt;
history, some math and a little engineering. We&#039;ve barely scratched the&lt;br /&gt;
surface, but it&#039;s time for a well-earned break.&lt;br /&gt;
&lt;br /&gt;
There&#039;s so much more to talk about, so I hope you&#039;ll join me again in&lt;br /&gt;
our next episode.  Until then&amp;amp;mdash;Cheers!&lt;br /&gt;
&lt;br /&gt;
Written by:&lt;br /&gt;
Christopher (Monty) Montgomery&lt;br /&gt;
and the Xiph.Org Community&lt;br /&gt;
&lt;br /&gt;
Intro, title and credits music:&amp;lt;br&amp;gt;&lt;br /&gt;
&amp;quot;Boo Boo Coming&amp;quot;, by Joel Forrester&amp;lt;br&amp;gt;&lt;br /&gt;
Performed by the Microscopic Septet&amp;lt;br&amp;gt;&lt;br /&gt;
Used by permission of Cuneiform Records.&amp;lt;br&amp;gt;&lt;br /&gt;
Original source track All Rights Reserved.&amp;lt;br&amp;gt;&lt;br /&gt;
[http://www.cuneiformrecords.com www.cuneiformrecords.com]&lt;br /&gt;
&lt;br /&gt;
This Video Was Produced Entirely With Free and Open Source Software:&amp;lt;br&amp;gt;&lt;br /&gt;
&lt;br /&gt;
[http://www.gnu.org/ GNU]&amp;lt;br&amp;gt;&lt;br /&gt;
[http://www.linux.org/ Linux]&amp;lt;br&amp;gt;&lt;br /&gt;
[http://fedoraproject.org/ Fedora]&amp;lt;br&amp;gt;&lt;br /&gt;
[http://cinelerra.org/ Cinelerra]&amp;lt;br&amp;gt;&lt;br /&gt;
[http://www.gimp.org/ The Gimp]&amp;lt;br&amp;gt;&lt;br /&gt;
[http://audacity.sourceforge.net/ Audacity]&amp;lt;br&amp;gt;&lt;br /&gt;
Postfish&amp;lt;br&amp;gt;&lt;br /&gt;
[http://gstreamer.freedesktop.org/ Gstreamer]&amp;lt;br&amp;gt;&lt;br /&gt;
&lt;br /&gt;
CC BY-NC-SA&amp;lt;br&amp;gt;&lt;br /&gt;
A Co-Production of Xiph.Org and Red Hat Inc.&amp;lt;br&amp;gt;&lt;br /&gt;
(C) 2010, Some Rights Reserved&amp;lt;br&amp;gt;&lt;br /&gt;
&lt;br /&gt;
&amp;lt;br style=&amp;quot;clear:both;&amp;quot;/&amp;gt;&lt;br /&gt;
&lt;br /&gt;
==The making of…==&lt;br /&gt;
&amp;lt;small&amp;gt;[[Talk:A_Digital_Media_Primer_For_Geeks_(episode_1)#The_making_of.E2.80.A6|Discuss this section]]&amp;lt;/small&amp;gt;&lt;br /&gt;
===Equipment===&lt;br /&gt;
====Camera====&lt;br /&gt;
Canon HV40 HDV camera w/ wide-angle lens operating on a tripod.&lt;br /&gt;
&lt;br /&gt;
The wide angle lens gives the camera a nice close macro mode, and approximately triples the amount of light coming into the sensor for a given zoom/aperture.  Useful for shooting indoors at night.&lt;br /&gt;
&lt;br /&gt;
No additional lighting kit was used.&lt;br /&gt;
&lt;br /&gt;
====Audio====&lt;br /&gt;
&lt;br /&gt;
Two Crown PCC160 boundary microphones placed on a table approximately 4-8 feet in front of the speaker, run through a cheap Behringer portable mixer and into the camera&#039;s microphone input.  &lt;br /&gt;
&lt;br /&gt;
No additional audio kit was used.&lt;br /&gt;
&lt;br /&gt;
====Sundries====&lt;br /&gt;
&lt;br /&gt;
Whiteboard markers by &#039;Bic&#039;&lt;br /&gt;
&lt;br /&gt;
Drawing aids by Staedtler, McMaster Carr, and &#039;Generic&#039;.&lt;br /&gt;
&lt;br /&gt;
===Video shooting sequence===&lt;br /&gt;
&lt;br /&gt;
Scenes were pre-scripted and memorized, usually with lots of on-the-fly revision.  (In the future... I&#039;m totally getting a teleprompter.  Wh000.  OTOH, I can totally rattle off the entire video script from beginning to end as a party trick, thus ensuring I&#039;ll not be invited to many parties --Monty).&lt;br /&gt;
&lt;br /&gt;
Diagrams were drawn by hand on a physical whiteboard with whiteboard markers and magnetic T-squares, triangles, and yardsticks.  Despite looking a lot like greenscreen work, there is no image compositing in use (actually-- there are two small composites where an error in a whiteboard diagram was corrected by subtracting part of the original image and then adding a corrected version of the diagram).&lt;br /&gt;
&lt;br /&gt;
Camera operated in 24F shutter priority mode (Tv set to &amp;quot;24&amp;quot;) with exposure and white balance both calibrated to the white board (or a white piece of paper) and locked.  Microphone attenuation setting active, with gain locked such that room noise peaked at -40dB (all the rooms in the shooting sequences were noisy due to the building&#039;s ventilation system, or active equipment). Lighting in the whiteboard rooms tended to be odd, with little relative light cast on a presenter standing just in front of the whiteboard; a presenter is practically standing in the room&#039;s only shadow.  Most of the room light is focused on the table and walls.  Additional fill lighting kit would have been useful, but for the first vid, I didn&#039;t want &#039;perfect&#039; to be the enemy of &#039;good&#039;.&lt;br /&gt;
&lt;br /&gt;
Autofocus used for whiteboard scenes, manual focus used for several workshop scenes as the autofocus tended to hunt continuously in very low light.&lt;br /&gt;
&lt;br /&gt;
Continuous capture to a Thinkpad with firewire input via a simple gstreamer script.&lt;br /&gt;
&lt;br /&gt;
===Production sequence===&lt;br /&gt;
====All hail Cinelerra.  You better hail, or Cinelerra will get pissy about it.====&lt;br /&gt;
&lt;br /&gt;
Most of the production sequence hinged on making Cinelerra happy; it is a hulking rusty cast iron WWI tank of a program that can seem like it&#039;s composed entirely of compressed bugs.  That said, it was neither particularly crashy nor did it ever accidentally corrupt or lose work.  It was also the only FOSS editor with a working 2D compositor.  It got the job done once I found a workflow it would cope with (and fixed a number of bugs; these fixes are available from my cinelerra Git repo at http://git.xiph.org/?p=users/xiphmont/cinelerraCV.git;a=summary)&lt;br /&gt;
&lt;br /&gt;
====Choosing takes====&lt;br /&gt;
&lt;br /&gt;
Each shooting session yielded four to six hours of raw video.  The first step was to load the raw video into the cinelerra timeline, label each complete take, compare and choose the take to use, then render the chosen take out to a raw clip as a YUV4MPEG raw video file and a WAV raw audio file.  Be careful that Settings-&amp;gt;Align Cursor On Frames is set, else the audio and video renders won&#039;t start on the same boundary.&lt;br /&gt;
&lt;br /&gt;
====Postprocessing====&lt;br /&gt;
&lt;br /&gt;
At this point, the raw video clips were adjusted for gamma, contrast, and saturation in gstreamer and mplayer.  In the earlier shoots the camera was underexposing due to pilot error, which required quite a bit of gamma and saturation inflation to &#039;correct&#039; (there is no real correction as the low-end data is gone, but it&#039;s possible to make it look better).  Later shoots used saner settings and the adjustments were mostly to keep different shooting sessions more uniform.  The whiteboard tends not to look white because it&#039;s mildly reflective, and picked up the color of the cyan and orange audio baffles in the room like a big diffuse mirror.&lt;br /&gt;
&lt;br /&gt;
The audio was both noisy (due to the building&#039;s ventilation system which either sounded like a low loud rumble or a jet-engine taking off) and reverberant (the rooms were glass on two sides and plaster on the other two).  Early takes used no additional sound absorbing material in the rooms, and the Postfish filtering and deverb was used heavily.  It gives the early audio in the vid a slightly odd, processed feel (I had almost decided the original audio was simply unusable).  Later takes used some big fleece &#039;soft flats&#039; in the room to absorb some additional reverb, and the later takes are less heavily filtered.&lt;br /&gt;
&lt;br /&gt;
The postfish filtering chain used declip (for the occasional overrange oops), deverb (remove room reverberation), multicompand (noise gating), single compand (for volume levelling) and EQ (the Crown mics are nice, but are very midrange heavy).&lt;br /&gt;
&lt;br /&gt;
====Special Effects====&lt;br /&gt;
&lt;br /&gt;
Audio special effects were one-offs, mostly done using SoX.  The processed demo sections of audio were then spliced back into the original audio takes using Audacity.&lt;br /&gt;
&lt;br /&gt;
Video special effects (eg, removing a color channel, etc) were done by writing quick, one-off filters in C for y4oi.  A few effects were done by dumping a take as a directory full of PNGs and then batch-processing the PNGs again using a one-off C program, then reassembling with mplayer.  Video effects were then stitched back into the original video takes in Cinelerra.&lt;br /&gt;
&lt;br /&gt;
====Editing====&lt;br /&gt;
&lt;br /&gt;
All editing was done in Cinelerra.  This primarily consisted of stitching the individual takes back together with crossfades.  All input and rendering output were done with raw YUV4MPEG and WAV files.  Note that making this work well and correctly required several patches to the YUV4MPEG handler and colorspace conversion code.&lt;br /&gt;
&lt;br /&gt;
====Encoding====&lt;br /&gt;
&lt;br /&gt;
Encoding was done by hand external to Cinelerra using mplayer for final postprocessing, the encoder_example tool included with the [Ptalarbvorm] Theora source distribution, and ivfenc for WebM.&lt;br /&gt;
&lt;br /&gt;
Sample Theora encode command line (note this is using an mplayer patched for y4o support; it could be done just as easily with a yuv4mpeg pipe):&lt;br /&gt;
&lt;br /&gt;
# 360p, 128-ish (a4) audio + 500-ish (v50) video&lt;br /&gt;
mplayer -vf hqdn3d,scale=640:360,gradfun=1.5,unsharp=l3x3:.1,yuv4ogg complete2.m2v -fast -noconsolecontrols -vo null &amp;gt; /dev/null &amp;amp; ~/MotherfishSVN/theora-ptalarbvorm/examples/encoder_example -a 4 -v 50 -k 240 complete2.wav output.y4o -o A_Digital_Media_Primer_For_Geeks-360p-a4+v50.ogv&lt;/div&gt;</summary>
		<author><name>Mindspillage</name></author>
	</entry>
	<entry>
		<id>https://wiki.xiph.org/index.php?title=Videos/A_Digital_Media_Primer_For_Geeks&amp;diff=12442</id>
		<title>Videos/A Digital Media Primer For Geeks</title>
		<link rel="alternate" type="text/html" href="https://wiki.xiph.org/index.php?title=Videos/A_Digital_Media_Primer_For_Geeks&amp;diff=12442"/>
		<updated>2010-09-22T04:29:07Z</updated>

		<summary type="html">&lt;p&gt;Mindspillage: /* Credits */ line breaks&lt;/p&gt;
&lt;hr /&gt;
&lt;div&gt;[[Image:Dmpfg_001.jpg|360px|right]]&lt;br /&gt;
&amp;lt;small&amp;gt;&#039;&#039;Wiki edition&#039;&#039;&amp;lt;/small&amp;gt;&lt;br /&gt;
&lt;br /&gt;
This first video from Xiph.Org presents the technical foundations of modern digital media via a half-hour firehose of information. One community member called it &amp;quot;a Uni lecture I never got but really wanted.&amp;quot;&lt;br /&gt;
&lt;br /&gt;
The program offers a brief history of digital media, a quick summary of the sampling theorem, and myriad details of low level audio and video characterization and formatting. It&#039;s intended for budding geeks looking to get into video coding, as well as the technically curious who want to know more about the media they wrangle for work or play.&lt;br /&gt;
&amp;lt;br/&amp;gt;&amp;lt;br/&amp;gt;&amp;lt;br/&amp;gt;&lt;br /&gt;
&amp;lt;center&amp;gt;&amp;lt;font size=&amp;quot;+2&amp;quot;&amp;gt;[http://www.xiph.org/video/vid1.shtml Download or Watch online]&amp;lt;/font&amp;gt;&amp;lt;/center&amp;gt;&lt;br /&gt;
&amp;lt;br style=&amp;quot;clear:both;&amp;quot;/&amp;gt;&lt;br /&gt;
&amp;lt;hr/&amp;gt;&lt;br /&gt;
&lt;br /&gt;
==Introduction==&lt;br /&gt;
[[Image:Dmpfg_000.jpg|360px|right]]&lt;br /&gt;
[[Image:Dmpfg_002.jpg|360px|right]]&lt;br /&gt;
&amp;lt;small&amp;gt;[[Talk:A_Digital_Media_Primer_For_Geeks_(episode_1)#Introduction|Discuss this section]]&amp;lt;/small&amp;gt;&lt;br /&gt;
&lt;br /&gt;
Workstations and high-end personal computers have been able to&lt;br /&gt;
manipulate digital audio pretty easily for about fifteen years now.&lt;br /&gt;
It&#039;s only been about five years that a decent workstation&#039;s been able&lt;br /&gt;
to handle raw video without a lot of expensive special purpose&lt;br /&gt;
hardware.&lt;br /&gt;
&lt;br /&gt;
But today even most cheap home PCs have the processor power and&lt;br /&gt;
storage necessary to really toss raw video around, at least without&lt;br /&gt;
too much of a struggle. So now that everyone has all of this cheap media-capable hardware, &lt;br /&gt;
more people, not surprisingly, want to do interesting&lt;br /&gt;
things with digital media, especially streaming. YouTube was the first huge&lt;br /&gt;
success, and now everybody wants in.&lt;br /&gt;
&lt;br /&gt;
Well good!  Because this stuff is a lot of fun!&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
It&#039;s no problem finding consumers for digital media.  But here I&#039;d&lt;br /&gt;
like to address the engineers, the mathematicians, the hackers, the&lt;br /&gt;
people who are interested in discovering and making things and&lt;br /&gt;
building the technology itself. The people after my own heart.&lt;br /&gt;
&lt;br /&gt;
Digital media, compression especially, is perceived to be super-elite,&lt;br /&gt;
somehow incredibly more difficult than anything else in computer&lt;br /&gt;
science. The big industry players in the field don&#039;t mind this&lt;br /&gt;
perception at all; it helps justify the staggering number of very&lt;br /&gt;
basic patents they hold.  They like the image that their media&lt;br /&gt;
researchers &amp;quot;are the best of the best, so much smarter than anyone&lt;br /&gt;
else that their brilliant ideas can&#039;t even be understood by mere&lt;br /&gt;
mortals.&amp;quot; This is bunk.&lt;br /&gt;
&lt;br /&gt;
Digital audio and video and streaming and compression offer endless&lt;br /&gt;
deep and stimulating mental challenges, just like any other&lt;br /&gt;
discipline. It seems elite because so few people have been&lt;br /&gt;
involved.  So few people have been involved perhaps because so few&lt;br /&gt;
people could afford the expensive, special-purpose equipment it&lt;br /&gt;
required. But today, just about anyone watching this video has a&lt;br /&gt;
cheap, general-purpose computer powerful enough to play with the big&lt;br /&gt;
boys. There are battles going on today around HTML5 and browsers and&lt;br /&gt;
video and open vs. closed.  So now is a pretty good time to get&lt;br /&gt;
involved.  The easiest place to start is probably understanding the&lt;br /&gt;
technology we have right now.&lt;br /&gt;
&lt;br /&gt;
This is an introduction. Since it&#039;s an introduction, it glosses over a&lt;br /&gt;
ton of details so that the big picture&#039;s a little easier to see.&lt;br /&gt;
Quite a few people watching are going to be way past anything that I&#039;m&lt;br /&gt;
talking about, at least for now.  On the other hand, I&#039;m probably&lt;br /&gt;
going to go too fast for folks who really are brand new to all of&lt;br /&gt;
this, so if this is all new, relax. The important thing is to pick out&lt;br /&gt;
any ideas that really grab your imagination. Especially pay attention&lt;br /&gt;
to the terminology surrounding those ideas, because with those, and&lt;br /&gt;
Google, and Wikipedia, you can dig as deep as interests you.&lt;br /&gt;
&lt;br /&gt;
So, without any further ado, welcome to one hell of a new hobby.&lt;br /&gt;
&lt;br /&gt;
&amp;lt;center&amp;gt;&amp;lt;div style=&amp;quot;background-color:#DDDDFF;border-color:#CCCCDD;border-style:solid;width:80%;padding:0 1em 1em 1em;text-align:left;&amp;quot;&amp;gt;&lt;br /&gt;
&#039;&#039;&#039;Going deeper…&#039;&#039;&#039;&lt;br /&gt;
*[http://www.0xdeadbeef.com/weblog/2010/01/html5-video-and-h-264-what-history-tells-us-and-why-were-standing-with-the-web/ HTML5 Video and H.264: what history tells us and why we&#039;re standing with the web]: Chris Blizzard of Mozilla on free formats and the open web&lt;br /&gt;
*[http://diveintohtml5.org/video.html Dive into HTML5] web video tutorial&lt;br /&gt;
&amp;lt;/div&amp;gt;&amp;lt;/center&amp;gt;&lt;br /&gt;
&lt;br /&gt;
&amp;lt;br style=&amp;quot;clear:both;&amp;quot;/&amp;gt;&lt;br /&gt;
&lt;br /&gt;
==Analog vs Digital==&lt;br /&gt;
[[Image:Dmpfg_004.jpg|360px|right]]&lt;br /&gt;
[[Image:Dmpfg_006.jpg|360px|right]]&lt;br /&gt;
[[Image:Dmpfg_007.jpg|360px|right]]&lt;br /&gt;
&amp;lt;small&amp;gt;[[Talk:A_Digital_Media_Primer_For_Geeks_(episode_1)#Analog_vs_Digital|Discuss this section]]&amp;lt;/small&amp;gt;&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
[[WikiPedia:Sound|Sound]] is the propagation of pressure waves through air, spreading out&lt;br /&gt;
from a source like ripples spread from a stone tossed into a pond.  A&lt;br /&gt;
microphone, or the human ear for that matter, transforms these passing&lt;br /&gt;
ripples of pressure into an electric signal.  Right, this is&lt;br /&gt;
middle school science class, everyone remembers this.  Moving on.&lt;br /&gt;
&lt;br /&gt;
That audio signal is a one-dimensional function, a single value&lt;br /&gt;
varying over time.  If we slow the [[WikiPedia:Oscilloscope|&#039;scope]] down a bit... that should be&lt;br /&gt;
a little easier to see. A few other aspects of the signal are&lt;br /&gt;
important. It&#039;s [[WikiPedia:Continuous_function|continuous]] in both value and time; that is, at any&lt;br /&gt;
given time it can have any real value, and there&#039;s a smoothly varying&lt;br /&gt;
value at every point in time.  No matter how much we zoom in, there&lt;br /&gt;
are no discontinuities, no singularities, no instantaneous steps or&lt;br /&gt;
points where the signal ceases to exist. It&#039;s defined&lt;br /&gt;
everywhere. Classic continuous math works very well on these signals.&lt;br /&gt;
&lt;br /&gt;
A digital signal on the other hand is [[WikiPedia:Discrete_math|discrete]] in both value and time.&lt;br /&gt;
In the simplest and most common system, called [[WikiPedia:Pulse code modulation|Pulse Code Modulation]],&lt;br /&gt;
one of a fixed number of possible values directly represents the&lt;br /&gt;
instantaneous signal amplitude at points in time spaced a fixed&lt;br /&gt;
distance apart.  The end result is a stream of digits.&lt;br /&gt;
&lt;br /&gt;
Now this looks an awful lot like this.  It seems intuitive that we&lt;br /&gt;
should somehow be able to rigorously transform one into the other, and&lt;br /&gt;
good news, the [[WikiPedia:Nyquist-Shannon sampling theorem|Sampling Theorem]] says we can and tells us&lt;br /&gt;
how. Published in its most recognizable form by [[WikiPedia:Claude Shannon|Claude Shannon]] in 1949&lt;br /&gt;
and built on the work of [[WikiPedia:Harry Nyquist|Nyquist]], and [[WikiPedia:Ralph Hartley|Hartley]], and tons of others, the&lt;br /&gt;
sampling theorem not only states that we can go back and&lt;br /&gt;
forth between analog and digital, but also lays&lt;br /&gt;
down a set of conditions under which conversion is lossless and the two&lt;br /&gt;
representations become equivalent and interchangeable.  When the&lt;br /&gt;
lossless conditions aren&#039;t met, the sampling theorem tells us how and&lt;br /&gt;
how much information is lost or corrupted.&lt;br /&gt;
&lt;br /&gt;
Up until very recently, analog technology was the basis for&lt;br /&gt;
practically everything done with audio, and that&#039;s not because most&lt;br /&gt;
audio comes from an originally analog source.  You may also think that&lt;br /&gt;
since computers are fairly recent, analog signal technology must have&lt;br /&gt;
come first.  Nope. Digital is actually older.  The [[WikiPedia:Telegraph|telegraph]] predates&lt;br /&gt;
the telephone by half a century and was already fully mechanically&lt;br /&gt;
automated by the 1860s, sending coded, multiplexed digital signals&lt;br /&gt;
long distances. You know... [[WikiPedia:Tickertape|tickertape]]. Harry Nyquist of [[WikiPedia:Bell_labs|Bell Labs]] was&lt;br /&gt;
researching telegraph pulse transmission when he published his&lt;br /&gt;
description of what later became known as the [[WikiPedia:Nyquist_frequency|Nyquist frequency]], the&lt;br /&gt;
core concept of the sampling theorem.  Now, it&#039;s true the telegraph&lt;br /&gt;
was transmitting symbolic information, text, not a digitized analog&lt;br /&gt;
signal, but with the advent of the telephone and radio, analog and&lt;br /&gt;
digital signal technology progressed rapidly and side-by-side.&lt;br /&gt;
&lt;br /&gt;
Audio had always been manipulated as an analog signal because... well,&lt;br /&gt;
gee, it&#039;s so much easier.  A [[WikiPedia:Low-pass_filter#Continuous-time_low-pass_filters|second-order low-pass filter]], for example,&lt;br /&gt;
requires two passive components.  An all-analog [[WikiPedia:Short-time_Fourier_transform|short-time Fourier&lt;br /&gt;
transform]], a few hundred.  Well, maybe a thousand if you want to build&lt;br /&gt;
something really fancy (bang on the [http://www.testequipmentdepot.com/usedequipment/hewlettpackard/spectrumanalyzers/3585a.htm 3585]).  Processing signals&lt;br /&gt;
digitally requires millions to billions of transistors running at&lt;br /&gt;
microwave frequencies, support hardware at very least to digitize and&lt;br /&gt;
reconstruct the analog signals, a complete software ecosystem for&lt;br /&gt;
programming and controlling that billion-transistor juggernaut,&lt;br /&gt;
digital storage just in case you want to keep any of those bits for&lt;br /&gt;
later...&lt;br /&gt;
&lt;br /&gt;
So we come to the conclusion that analog is the only practical way to&lt;br /&gt;
do much with audio... well, unless you happen to have a billion&lt;br /&gt;
transistors and all the other things just lying around. And [[WikiPedia:File:Transistor_Count_and_Moore&#039;s_Law_-_2008.svg|since we&lt;br /&gt;
do]], digital signal processing becomes very attractive.&lt;br /&gt;
&lt;br /&gt;
For one thing, analog componentry just doesn&#039;t have the flexibility of&lt;br /&gt;
a general purpose computer.  Adding a new function to this&lt;br /&gt;
beast... yeah, it&#039;s probably not going to happen.  On a digital&lt;br /&gt;
processor though, just write a new program.  Software isn&#039;t trivial,&lt;br /&gt;
but it is a lot easier.&lt;br /&gt;
&lt;br /&gt;
Perhaps more importantly, though, every analog component is an&lt;br /&gt;
approximation. There&#039;s no such thing as a perfect transistor, or a&lt;br /&gt;
perfect inductor, or a perfect capacitor.  In analog, every component&lt;br /&gt;
adds [[WikiPedia:Johnson–Nyquist_noise|noise]] and [[WikiPedia:Distortion#Electronic_signals|distortion]], usually not very much, but it adds up. Just&lt;br /&gt;
transmitting an analog signal, especially over long distances,&lt;br /&gt;
progressively, measurably, irretrievably corrupts it.  Besides, all of&lt;br /&gt;
those single-purpose analog components take up a lot of space.  Two&lt;br /&gt;
lines of code on the billion transistors back here can implement a&lt;br /&gt;
filter that would require an [[WikiPedia:Inductor|inductor]] the size of a refrigerator.&lt;br /&gt;
&lt;br /&gt;
Digital systems don&#039;t have these drawbacks.  Digital signals can be&lt;br /&gt;
stored, copied, manipulated, and transmitted without adding any noise&lt;br /&gt;
or distortion. We do use [[WikiPedia:Lossy_compression|lossy]] algorithms from time to time, but the&lt;br /&gt;
only unavoidably non-ideal steps are digitization and reconstruction,&lt;br /&gt;
where digital has to interface with all of that messy analog.  Messy&lt;br /&gt;
or not, modern [[WikiPedia:Digital-to-analog_converter|conversion stages]] are very, very good.  By the&lt;br /&gt;
standards of our ears, we can consider them practically lossless as&lt;br /&gt;
well.&lt;br /&gt;
&lt;br /&gt;
With a little extra hardware, then, most of which is now small and&lt;br /&gt;
inexpensive due to our modern industrial infrastructure, digital audio&lt;br /&gt;
is the clear winner over analog.  So let us then go about storing it,&lt;br /&gt;
copying it, manipulating it, and transmitting it.&lt;br /&gt;
&lt;br /&gt;
&amp;lt;center&amp;gt;&amp;lt;div style=&amp;quot;background-color:#DDDDFF;border-color:#CCCCDD;border-style:solid;width:80%;padding:0 1em 1em 1em;text-align:left;&amp;quot;&amp;gt;&lt;br /&gt;
&#039;&#039;&#039;Going deeper…&#039;&#039;&#039;&lt;br /&gt;
*Wikipedia: [[wiki:Nyquist–Shannon_sampling_theorem|Nyquist–Shannon sampling theorem]]&lt;br /&gt;
*MIT OpenCourseWare [http://ocw.mit.edu/courses/electrical-engineering-and-computer-science/6-003-signals-and-systems-spring-2010/lecture-notes/ Lecture notes from 6.003 signals and systems.]&lt;br /&gt;
&amp;lt;/div&amp;gt;&amp;lt;/center&amp;gt;&lt;br /&gt;
&lt;br /&gt;
&amp;lt;br style=&amp;quot;clear:both;&amp;quot;/&amp;gt;&lt;br /&gt;
&lt;br /&gt;
==Raw (digital audio) meat==&lt;br /&gt;
&amp;lt;small&amp;gt;[[Talk:A_Digital_Media_Primer_For_Geeks_(episode_1)#Raw_.28digital_audio.29_meat|Discuss this section]]&amp;lt;/small&amp;gt;&lt;br /&gt;
&lt;br /&gt;
Pulse Code Modulation is the most common representation for &lt;br /&gt;
raw audio.  Other practical representations do exist: for example, the&lt;br /&gt;
[[WikiPedia:Delta-sigma_modulation|Sigma-Delta coding]] used by the [[WikiPedia:Super_Audio_CD|SACD]], which is a form of [[wikipedia:Pulse-density_modulation|Pulse Density&lt;br /&gt;
Modulation]].  That said, Pulse Code Modulation is far&lt;br /&gt;
and away dominant, mainly because it&#039;s so mathematically&lt;br /&gt;
convenient.  An audio engineer can spend an entire career without&lt;br /&gt;
running into anything else.&lt;br /&gt;
&lt;br /&gt;
PCM encoding can be characterized in three parameters, making it easy&lt;br /&gt;
to account for every possible PCM variant with mercifully little&lt;br /&gt;
hassle.&lt;br /&gt;
&amp;lt;br style=&amp;quot;clear:both;&amp;quot;/&amp;gt;&lt;br /&gt;
===sample rate===&lt;br /&gt;
[[Image:Dmpfg_009.jpg|360px|right]]&lt;br /&gt;
[[Image:Dmpfg_008.jpg|360px|right]]&lt;br /&gt;
The first parameter is the [[wikipedia:Sampling_rate|sampling rate]].  The highest frequency an&lt;br /&gt;
encoding can represent is called the Nyquist Frequency.  The Nyquist&lt;br /&gt;
frequency of PCM happens to be exactly half the sampling rate.&lt;br /&gt;
Therefore, the sampling rate directly determines the highest possible&lt;br /&gt;
frequency in the digitized signal.&lt;br /&gt;
&lt;br /&gt;
Analog telephone systems traditionally [[wikipedia:Bandlimiting|band-limited]] voice channels to&lt;br /&gt;
just under 4kHz, so digital telephony and most classic voice&lt;br /&gt;
applications use an 8kHz sampling rate: the minimum sampling rate&lt;br /&gt;
necessary to capture the entire bandwidth of a 4kHz channel.  This is&lt;br /&gt;
what an 8kHz sampling rate sounds like&amp;amp;mdash;a bit muffled but perfectly&lt;br /&gt;
intelligible for voice.  This is the lowest sampling rate that&#039;s ever&lt;br /&gt;
been used widely in practice.&lt;br /&gt;
&lt;br /&gt;
From there, as power, and memory, and storage increased, consumer&lt;br /&gt;
computer hardware went to offering 11, and then 16, and then 22, and&lt;br /&gt;
then 32kHz sampling.  With each increase in the sampling rate and the&lt;br /&gt;
Nyquist frequency, it&#039;s obvious that the high end becomes a little&lt;br /&gt;
clearer and the sound more natural.&lt;br /&gt;
&lt;br /&gt;
The Compact Disc uses a 44.1kHz sampling rate, which is again slightly&lt;br /&gt;
better than 32kHz, but the gains are becoming less distinct.  44.1kHz&lt;br /&gt;
is a bit of an oddball choice, especially given that it hadn&#039;t been&lt;br /&gt;
used for anything prior to the compact disc, but the huge success of&lt;br /&gt;
the CD has made it a common rate.&lt;br /&gt;
&lt;br /&gt;
The most common hi-fidelity sampling rate aside from the CD is 48kHz.&lt;br /&gt;
There&#039;s virtually no audible difference between the two.  This video,&lt;br /&gt;
or at least the original version of it, was shot and produced with&lt;br /&gt;
48kHz audio, which happens to be the original standard for&lt;br /&gt;
high-fidelity audio with video.&lt;br /&gt;
&lt;br /&gt;
Super-hi-fidelity sampling rates of 88, and 96, and 192kHz have also&lt;br /&gt;
appeared. The reason for the sampling rates beyond 48kHz isn&#039;t to&lt;br /&gt;
extend the audible high frequencies further. It&#039;s for a different&lt;br /&gt;
reason.&lt;br /&gt;
&lt;br /&gt;
Stepping back for just a second, the French mathematician [[wikipedia:Joseph_Fourier|Jean&lt;br /&gt;
Baptiste Joseph Fourier]] showed that we can also think of signals like&lt;br /&gt;
audio as a set of component frequencies.  This [[wikipedia:Frequency_domain|frequency-domain]]&lt;br /&gt;
representation is equivalent to the time representation; the signal is&lt;br /&gt;
exactly the same, we&#039;re just looking at it [[wikipedia:Basis_(linear_algebra)|a different way]].  Here we see the&lt;br /&gt;
frequency-domain representation of a hypothetical analog signal we&lt;br /&gt;
intend to digitally sample.&lt;br /&gt;
&lt;br /&gt;
The sampling theorem tells us two essential things about the sampling&lt;br /&gt;
process. First, that a digital signal can&#039;t represent any&lt;br /&gt;
frequencies above the Nyquist frequency. Second, and this is the new&lt;br /&gt;
part, if we don&#039;t remove those frequencies with a low-pass filter&lt;br /&gt;
before sampling, the sampling process will fold them down into the&lt;br /&gt;
representable frequency range as [[wikipedia:Aliasing|aliasing distortion]].&lt;br /&gt;
&lt;br /&gt;
Aliasing, in a nutshell, sounds freakin&#039; awful, so it&#039;s essential to&lt;br /&gt;
remove any beyond-Nyquist frequencies before sampling and after&lt;br /&gt;
reconstruction.&lt;br /&gt;
&lt;br /&gt;
Human frequency perception is considered to extend to about 20kHz. In&lt;br /&gt;
44.1 or 48kHz sampling, the low pass before the sampling stage has to&lt;br /&gt;
be extremely sharp to avoid cutting any audible frequencies below&lt;br /&gt;
[[wikipedia:Hearing_range|20kHz]] but still not allow frequencies above the Nyquist to leak&lt;br /&gt;
forward into the sampling process.  This is a difficult filter to&lt;br /&gt;
build, and no practical filter succeeds completely. If the sampling&lt;br /&gt;
rate is 96kHz or 192kHz on the other hand, the low pass has an extra&lt;br /&gt;
[[wikipedia:Octave_(electronics)|octave]] or two for its transition band. This is a much easier filter to&lt;br /&gt;
build.  Sampling rates beyond 48kHz are actually one of those messy&lt;br /&gt;
analog stage compromises.&lt;br /&gt;
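&lt;br /&gt;
As a back-of-the-envelope illustration, here is a small C sketch (not the demo code from the video) of where an input tone lands after sampling: frequencies at or below Nyquist pass through unchanged, and frequencies above it fold back down.&lt;br /&gt;
&lt;br /&gt;
 /* Sketch: where does a tone of frequency f land after sampling&lt;br /&gt;
    at the given rate?  Frequencies above Nyquist fold back into&lt;br /&gt;
    the representable range as aliases. */&lt;br /&gt;
 #include &amp;lt;math.h&amp;gt;&lt;br /&gt;
 #include &amp;lt;stdio.h&amp;gt;&lt;br /&gt;
 &lt;br /&gt;
 double alias(double f, double rate){&lt;br /&gt;
   double nyquist = rate * .5;&lt;br /&gt;
   f = fmod(f, rate);        /* sampling cannot tell f from f + rate */&lt;br /&gt;
   if (f &amp;lt; 0) f += rate;&lt;br /&gt;
   return (f &amp;lt;= nyquist) ? f : rate - f;   /* fold around Nyquist */&lt;br /&gt;
 }&lt;br /&gt;
 &lt;br /&gt;
 int main(void){&lt;br /&gt;
   /* a 30kHz tone sampled at 48kHz aliases down to 18kHz */&lt;br /&gt;
   printf(&amp;quot;%.0f Hz\n&amp;quot;, alias(30000., 48000.));&lt;br /&gt;
   return 0;&lt;br /&gt;
 }&lt;br /&gt;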
&amp;lt;br style=&amp;quot;clear:both;&amp;quot;/&amp;gt;&lt;br /&gt;
===sample format===&lt;br /&gt;
[[Image:Dmpfg_anim.gif|right]]&lt;br /&gt;
&lt;br /&gt;
The second fundamental PCM parameter is the sample format; that is,&lt;br /&gt;
the format of each digital number.  A number is a number, but a number&lt;br /&gt;
can be represented in bits a number of different ways.&lt;br /&gt;
&lt;br /&gt;
Early PCM was [[wikipedia:Quantization_(sound_processing)#Audio_quantization|eight-bit]] [[wikipedia:Linear_pulse_code_modulation|linear]], encoded as an [[wikipedia:Signedness|unsigned]] [[wikipedia:Integer_(computer_science)#Bytes_and_octets|byte]].  The&lt;br /&gt;
[[wikipedia:Dynamic_range#Audio|dynamic range]] is limited to about [[wikipedia:Decibel|50dB]]  and the [[wikipedia:Quantization_error|quantization noise]], as&lt;br /&gt;
you can hear, is pretty severe.  Eight-bit audio is vanishingly rare&lt;br /&gt;
today.&lt;br /&gt;
&lt;br /&gt;
Digital telephony typically uses one of two related non-linear eight&lt;br /&gt;
bit encodings called [[wikipedia:A-law_algorithm|A-law]] and [[wikipedia:Μ-law_algorithm|μ-law]]. These formats encode a roughly&lt;br /&gt;
[[wikipedia:Audio_bit_depth#Dynamic_range|14 bit dynamic range]] into eight bits by spacing the higher amplitude&lt;br /&gt;
values farther apart. A-law and mu-law obviously reduce quantization&lt;br /&gt;
noise compared to linear 8-bit, and voice harmonics especially hide&lt;br /&gt;
the remaining quantization noise well. All three eight-bit encodings,&lt;br /&gt;
linear, A-law, and mu-law, are typically paired with an 8kHz sampling&lt;br /&gt;
rate, though I&#039;m demonstrating them here at 48kHz.&lt;br /&gt;
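&lt;br /&gt;
For the curious, here is roughly how a mu-law byte expands back to 16-bit linear PCM. This C sketch follows the classic G.711 segment-and-mantissa layout; it is not code from the video, and a tested implementation should be used for real work.&lt;br /&gt;
&lt;br /&gt;
 /* Sketch of G.711 mu-law expansion to 16-bit linear PCM. */&lt;br /&gt;
 short ulaw_decode(unsigned char u){&lt;br /&gt;
   u = ~u;                            /* codewords are stored complemented */&lt;br /&gt;
   int t = ((u &amp;amp; 0x0f) &amp;lt;&amp;lt; 3) + 0x84;  /* mantissa, shifted, plus bias */&lt;br /&gt;
   t &amp;lt;&amp;lt;= (u &amp;gt;&amp;gt; 4) &amp;amp; 0x07;             /* segment bits act as an exponent */&lt;br /&gt;
   return (u &amp;amp; 0x80) ? (0x84 - t) : (t - 0x84);&lt;br /&gt;
 }&lt;br /&gt;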
&lt;br /&gt;
Most modern PCM uses 16- or 24-bit [[wikipedia:Two&#039;s_complement|two&#039;s-complement]] signed integers to&lt;br /&gt;
encode the range from negative infinity to zero decibels in 16 or 24&lt;br /&gt;
bits of precision. The maximum absolute value corresponds to zero decibels.&lt;br /&gt;
As with all the sample formats so far, signals beyond zero decibels, and thus&lt;br /&gt;
beyond the maximum representable range, are [[wikipedia:Clipping_(audio)|clipped]].&lt;br /&gt;
&lt;br /&gt;
In mixing and mastering, it&#039;s not unusual to use [[wikipedia:Floating_point|floating-point]]&lt;br /&gt;
numbers for PCM instead of [[wikipedia:Integer_(computer_science)|integers]].  A 32 bit [[wikipedia:IEEE_754-2008|IEEE754]] float, that&#039;s&lt;br /&gt;
the normal kind of floating point you see on current computers, has 24&lt;br /&gt;
bits of resolution, but an eight-bit floating-point exponent increases&lt;br /&gt;
the representable range.  Floating point usually represents zero&lt;br /&gt;
decibels as +/-1.0, and because floats can obviously represent&lt;br /&gt;
considerably beyond that, temporarily exceeding zero decibels during&lt;br /&gt;
the mixing process doesn&#039;t cause clipping.  Floating-point PCM takes&lt;br /&gt;
up more space, so it tends to be used only as an intermediate&lt;br /&gt;
production format.&lt;br /&gt;
&lt;br /&gt;
Lastly, most general purpose computers still read and&lt;br /&gt;
write data in octet bytes, so it&#039;s important to remember that samples&lt;br /&gt;
bigger than eight bits can be in [[wikipedia:Endianness|big- or little-endian order]], and both&lt;br /&gt;
endiannesses are common.  For example, Microsoft [[wikipedia:WAV|WAV]] files are little-endian,&lt;br /&gt;
and Apple [[wikipedia:AIFC|AIFC]] files tend to be big-endian.  Be aware of it.&lt;br /&gt;
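&lt;br /&gt;
Byte order bites everyone eventually. A hedged C sketch of the usual defensive move: assemble each 16-bit little-endian sample from individual bytes, which works on any host, then scale to float.&lt;br /&gt;
&lt;br /&gt;
 /* Sketch: decode one 16-bit little-endian PCM sample from raw&lt;br /&gt;
    bytes and scale it to a float with 0dB at +/-1.0. */&lt;br /&gt;
 #include &amp;lt;stdint.h&amp;gt;&lt;br /&gt;
 &lt;br /&gt;
 float pcm16le_to_float(const unsigned char *b){&lt;br /&gt;
   uint16_t raw = (uint16_t)(b[0] | (b[1] &amp;lt;&amp;lt; 8)); /* low byte first */&lt;br /&gt;
   int16_t  s   = (int16_t)raw;                   /* reinterpret as signed */&lt;br /&gt;
   return s / 32768.f;&lt;br /&gt;
 }&lt;br /&gt;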
&amp;lt;br style=&amp;quot;clear:both;&amp;quot;/&amp;gt;&lt;br /&gt;
===channels===&lt;br /&gt;
&lt;br /&gt;
The third PCM parameter is the number of [[wikipedia:Multichannel_audio|channels]].  The convention in&lt;br /&gt;
raw PCM is to encode multiple channels by interleaving the samples of&lt;br /&gt;
each channel together into a single stream.  Straightforward and extensible.&lt;br /&gt;
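&lt;br /&gt;
In code, interleaving is nothing more than a stride, as in this minimal sketch:&lt;br /&gt;
&lt;br /&gt;
 /* Sketch: interleave planar channel buffers into one PCM stream,&lt;br /&gt;
    so that in[ch][i] lands at out[i*channels + ch]. */&lt;br /&gt;
 void interleave(float *out, float **in, int channels, int frames){&lt;br /&gt;
   for (int i = 0; i &amp;lt; frames; i++)&lt;br /&gt;
     for (int ch = 0; ch &amp;lt; channels; ch++)&lt;br /&gt;
       out[i * channels + ch] = in[ch][i];&lt;br /&gt;
 }&lt;br /&gt;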
&amp;lt;br style=&amp;quot;clear:both;&amp;quot;/&amp;gt;&lt;br /&gt;
===done!===&lt;br /&gt;
&lt;br /&gt;
And that&#039;s it!  That describes every PCM representation ever.  Done.&lt;br /&gt;
Digital audio is &#039;&#039;so easy&#039;&#039;!  There&#039;s more to do of course, but at this&lt;br /&gt;
point we&#039;ve got a nice useful chunk of audio data, so let&#039;s get some&lt;br /&gt;
video too.&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
&amp;lt;center&amp;gt;&amp;lt;div style=&amp;quot;background-color:#DDDDFF;border-color:#CCCCDD;border-style:solid;width:80%;padding:0 1em 1em 1em;text-align:left;&amp;quot;&amp;gt;&lt;br /&gt;
&#039;&#039;&#039;Going deeper…&#039;&#039;&#039;&lt;br /&gt;
* [[wikipedia:Roll-off|Wikipedia&#039;s article on filter roll-off]], to learn why it&#039;s hard to build analog filters with a very narrow [[wikipedia:Transition_band|transition band]] between the [[wikipedia:Passband|passband]] and the [[wikipedia:Stopband|stopband]].&lt;br /&gt;
* [http://wiki.multimedia.cx/index.php?title=PCM Some more minutiae] about PCM in practice.&lt;br /&gt;
* [[wikipedia:DPCM|DPCM]] and [[wikipedia:ADPCM|ADPCM]], simple audio codecs loosely inspired by PCM.&lt;br /&gt;
&amp;lt;/div&amp;gt;&amp;lt;/center&amp;gt;&lt;br /&gt;
&lt;br /&gt;
&amp;lt;br style=&amp;quot;clear:both;&amp;quot;/&amp;gt;&lt;br /&gt;
&lt;br /&gt;
==Video vegetables (they&#039;re good for you!)==&lt;br /&gt;
[[Image:Dmpfg_010.jpg|360px|right]]&lt;br /&gt;
&amp;lt;small&amp;gt;[[Talk:A_Digital_Media_Primer_For_Geeks_(episode_1)#Video_vegetables_.28they.27re_good_for_you.21.29|Discuss this section]]&amp;lt;/small&amp;gt;&lt;br /&gt;
&lt;br /&gt;
One could think of video as being like audio but with two additional&lt;br /&gt;
spatial dimensions, X and Y, in addition to the dimension of time.&lt;br /&gt;
This is mathematically sound. The Sampling Theorem applies to all&lt;br /&gt;
three video dimensions just as it does the single time dimension of&lt;br /&gt;
audio.&lt;br /&gt;
&lt;br /&gt;
Audio and video are obviously quite different in practice. For one,&lt;br /&gt;
compared to audio, video is huge. [[wikipedia:Red_Book_(audio_Compact_Disc_standard)#Technical_details|Raw CD audio]] is about 1.4 megabits&lt;br /&gt;
per second. Raw [[wikipedia:1080i|1080i]] HD video is over 700 megabits per second. That&#039;s&lt;br /&gt;
more than 500 times more data to capture, process, and store per&lt;br /&gt;
second.  By [[wikipedia:Moore&#039;s_law|Moore&#039;s law]]... that&#039;s... let&#039;s see... roughly eight&lt;br /&gt;
doublings times two years, so yeah, computers requiring about an extra&lt;br /&gt;
fifteen years to handle raw video after getting raw audio down pat was&lt;br /&gt;
about right.&lt;br /&gt;
&lt;br /&gt;
Basic raw video is also just more complex than basic raw audio. The&lt;br /&gt;
sheer volume of data currently necessitates a representation more&lt;br /&gt;
efficient than the linear PCM used for audio.  In addition, electronic&lt;br /&gt;
video comes almost entirely from broadcast television alone, and the&lt;br /&gt;
standards committees that govern broadcast video have always been very&lt;br /&gt;
concerned with backward compatibility.  Up until just last year in the&lt;br /&gt;
US, a sixty-year-old black and white television could still show a&lt;br /&gt;
normal [[wikipedia:NTSC|analog television broadcast]].  That&#039;s actually a really neat&lt;br /&gt;
trick.&lt;br /&gt;
&lt;br /&gt;
The downside to backward compatibility is that once a detail makes it&lt;br /&gt;
into a standard, you can&#039;t ever really throw it out again. Electronic&lt;br /&gt;
video has never started over from scratch the way audio has multiple&lt;br /&gt;
times.  Sixty years worth of clever but obsolete hacks necessitated by&lt;br /&gt;
the passing technology of a given era have built up into quite a pile,&lt;br /&gt;
and because digital standards also come from broadcast television, all&lt;br /&gt;
these eldritch hacks have been brought forward into the digital&lt;br /&gt;
standards as well.&lt;br /&gt;
&lt;br /&gt;
In short, there are a whole lot more details involved in digital video&lt;br /&gt;
than there were with audio. There&#039;s no hope of covering them&lt;br /&gt;
all completely here, so we&#039;ll cover the broad fundamentals.&lt;br /&gt;
&amp;lt;br style=&amp;quot;clear:both;&amp;quot;/&amp;gt;&lt;br /&gt;
===resolution and aspect===&lt;br /&gt;
[[Image:Dmpfg_011.jpg|360px|right]]&lt;br /&gt;
&lt;br /&gt;
The most obvious raw video parameters are the width and height of the&lt;br /&gt;
picture in pixels. As simple as that may sound, the pixel dimensions&lt;br /&gt;
alone don&#039;t actually specify the absolute width and height of the&lt;br /&gt;
picture, as most broadcast-derived video doesn&#039;t use square pixels.&lt;br /&gt;
The number of [[wikipedia:Scan_line|scanlines]] in a broadcast image was fixed, but the&lt;br /&gt;
effective number of horizontal pixels was a function of channel&lt;br /&gt;
[[wikipedia:Bandwidth_(signal_processing)|bandwidth]]. Effective horizontal resolution could result in pixels that&lt;br /&gt;
were either narrower or wider than the spacing between scanlines.&lt;br /&gt;
&lt;br /&gt;
Standards have generally specified that digitally sampled video should&lt;br /&gt;
reflect the real resolution of the original analog source, so a large&lt;br /&gt;
amount of digital video also uses non-square pixels. For example, a&lt;br /&gt;
normal 4:3 aspect NTSC DVD is typically encoded with a display&lt;br /&gt;
resolution of [[wikipedia:DVD-Video#Frame_size_and_frame_rate|704 by 480]], a ratio wider than 4:3.  In this case, the&lt;br /&gt;
pixels themselves are assigned an aspect ratio of [[wikipedia:Standard-definition_television#Resolution|10:11]], making them&lt;br /&gt;
taller than they are wide and narrowing the image horizontally to the&lt;br /&gt;
correct aspect.  Such an image has to be resampled to show properly on&lt;br /&gt;
a digital display with square pixels.&lt;br /&gt;
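&lt;br /&gt;
The arithmetic is simple: multiply the stored width by the pixel aspect ratio to find the square-pixel display width. For the DVD example, 704 * 10/11 = 640, and 640:480 is exactly 4:3. A one-function sketch:&lt;br /&gt;
&lt;br /&gt;
 /* Sketch: stored width times pixel aspect ratio gives the&lt;br /&gt;
    square-pixel display width, rounded to nearest.&lt;br /&gt;
    display_width(704, 10, 11) == 640, and 640:480 is 4:3. */&lt;br /&gt;
 int display_width(int stored_w, int par_num, int par_den){&lt;br /&gt;
   return (stored_w * par_num + par_den / 2) / par_den;&lt;br /&gt;
 }&lt;br /&gt;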
&amp;lt;br style=&amp;quot;clear:both;&amp;quot;/&amp;gt;&lt;br /&gt;
===frame rate and interlacing===&lt;br /&gt;
[[Image:Dmpfg_012.jpg|360px|right]]&lt;br /&gt;
&lt;br /&gt;
The second obvious video parameter is the [[wikipedia:Frame_rate|frame rate]], the number of&lt;br /&gt;
full frames per second.  Several standard frame rates are in active&lt;br /&gt;
use. Digital video, in one form or another, can use all of them.  Or,&lt;br /&gt;
any other frame rate.  Or even variable rates where the frame rate&lt;br /&gt;
changes adaptively over the course of the video. The higher the frame&lt;br /&gt;
rate, the smoother the motion and that brings us, unfortunately, to&lt;br /&gt;
[[wikipedia:Interlace|interlacing]].&lt;br /&gt;
&lt;br /&gt;
In the very earliest days of broadcast video, engineers sought the&lt;br /&gt;
fastest practical frame rate to smooth motion and to minimize [[wikipedia:Flicker_(screen)|flicker]]&lt;br /&gt;
on phosphor-based [[wikipedia:Cathode_ray_tube|CRTs]].  They were also under pressure to use the&lt;br /&gt;
least possible bandwidth for the highest resolution and fastest frame&lt;br /&gt;
rate.  Their solution was to interlace the video where the even lines&lt;br /&gt;
are sent in one pass and the odd lines in the next.  Each pass is&lt;br /&gt;
called a field and two fields sort of produce one complete frame.&lt;br /&gt;
&amp;quot;Sort of&amp;quot;, because the even and odd fields aren&#039;t actually from the&lt;br /&gt;
same source frame.  In a 60 field per second picture, the source frame&lt;br /&gt;
rate is actually 60 full frames per second, and half of each frame,&lt;br /&gt;
every other line, is simply discarded.  This is why we can&#039;t&lt;br /&gt;
[[wikipedia:Deinterlacing|deinterlace]] a video simply by combining two fields into one frame;&lt;br /&gt;
they&#039;re not actually from one frame to begin with.&lt;br /&gt;
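&lt;br /&gt;
In memory terms, a field is simply every other line of a frame; a hedged C sketch:&lt;br /&gt;
&lt;br /&gt;
 /* Sketch: split one frame of 8-bit luma into its two fields.&lt;br /&gt;
    Even lines go to the top field, odd lines to the bottom; each&lt;br /&gt;
    field holds half the lines of the full frame. */&lt;br /&gt;
 #include &amp;lt;string.h&amp;gt;&lt;br /&gt;
 &lt;br /&gt;
 void split_fields(const unsigned char *frame, int w, int h,&lt;br /&gt;
                   unsigned char *top, unsigned char *bottom){&lt;br /&gt;
   for (int y = 0; y &amp;lt; h; y++){&lt;br /&gt;
     unsigned char *dst = (y &amp;amp; 1) ? bottom : top;&lt;br /&gt;
     memcpy(dst + (y / 2) * w, frame + y * w, w);&lt;br /&gt;
   }&lt;br /&gt;
 }&lt;br /&gt;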
&amp;lt;br style=&amp;quot;clear:both;&amp;quot;/&amp;gt;&lt;br /&gt;
&lt;br /&gt;
===gamma===&lt;br /&gt;
[[Image:Dmpfg_013.jpg|360px|right]]&lt;br /&gt;
&lt;br /&gt;
The cathode ray tube was the only available display technology for&lt;br /&gt;
most of the history of electronic video. A CRT&#039;s output brightness is&lt;br /&gt;
nonlinear, approximately equal to the input controlling voltage raised&lt;br /&gt;
to the 2.5th power. This exponent, 2.5, is designated gamma, and so&lt;br /&gt;
it&#039;s often referred to as the gamma of a display.  Cameras, though,&lt;br /&gt;
are linear, and if you feed a CRT a linear input signal, it looks a&lt;br /&gt;
bit like this.&lt;br /&gt;
&lt;br /&gt;
As there were originally to be very few cameras, which were&lt;br /&gt;
fantastically expensive anyway, and hopefully many, many television&lt;br /&gt;
sets, which had best be as inexpensive as possible, engineers decided to&lt;br /&gt;
add the necessary [[wikipedia:Gamma_correction|gamma correction]] circuitry to the cameras rather&lt;br /&gt;
than the sets. Video transmitted over the airwaves would thus have a&lt;br /&gt;
nonlinear intensity using the inverse of the set&#039;s gamma exponent, so that&lt;br /&gt;
once a camera&#039;s signal was finally displayed on the CRT, the overall&lt;br /&gt;
response of the system from camera to set was back to linear again.&lt;br /&gt;
&lt;br /&gt;
Almost.&lt;br /&gt;
&lt;br /&gt;
There were also two other tweaks. A television camera actually uses a&lt;br /&gt;
gamma exponent that&#039;s the inverse of 2.2, not 2.5.  That&#039;s just a&lt;br /&gt;
correction for viewing in a dim environment. Also, the exponential&lt;br /&gt;
curve transitions to a linear ramp near black.  That&#039;s just an old&lt;br /&gt;
hack for suppressing sensor noise in the camera.&lt;br /&gt;
&lt;br /&gt;
Gamma correction also had a lucky benefit. It just so happens that the&lt;br /&gt;
human eye has a perceptual gamma of about 3.  This is relatively close&lt;br /&gt;
to the CRT&#039;s gamma of 2.5. An image using gamma correction devotes&lt;br /&gt;
more resolution to lower intensities, where the eye happens to have&lt;br /&gt;
its finest intensity discrimination, and therefore uses the available&lt;br /&gt;
scale resolution more efficiently.  Although CRTs are currently&lt;br /&gt;
vanishing, a standard [[wikipedia:sRGB|sRGB]] computer display still uses a nonlinear&lt;br /&gt;
intensity curve similar to television, with a linear ramp near black,&lt;br /&gt;
followed by an exponential curve with a gamma exponent of 2.4. This&lt;br /&gt;
encodes a sixteen bit linear range down into eight bits.&lt;br /&gt;
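&lt;br /&gt;
As a sketch, the sRGB encoding curve just described looks like this in C, using the standard published constants:&lt;br /&gt;
&lt;br /&gt;
 /* Sketch of the sRGB transfer function: a linear ramp near&lt;br /&gt;
    black, then a power curve with exponent 1/2.4. */&lt;br /&gt;
 #include &amp;lt;math.h&amp;gt;&lt;br /&gt;
 &lt;br /&gt;
 double srgb_encode(double linear){   /* linear intensity in [0,1] */&lt;br /&gt;
   if (linear &amp;lt;= 0.0031308)&lt;br /&gt;
     return 12.92 * linear;           /* linear ramp near black */&lt;br /&gt;
   return 1.055 * pow(linear, 1.0 / 2.4) - 0.055;&lt;br /&gt;
 }&lt;br /&gt;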
&amp;lt;br style=&amp;quot;clear:both;&amp;quot;/&amp;gt;&lt;br /&gt;
&lt;br /&gt;
===color and colorspace===&lt;br /&gt;
[[Image:Dmpfg_014.jpg|360px|right]]&lt;br /&gt;
&lt;br /&gt;
The human eye has three apparent color channels, red, green, and blue,&lt;br /&gt;
and most displays use these three colors as [[wikipedia:Additive_color|additive primaries]] to&lt;br /&gt;
produce a full range of color output.  The primary pigments in&lt;br /&gt;
printing are [[wikipedia:CMYK|Cyan, Magenta, and Yellow]] for the same reason; pigments&lt;br /&gt;
are [[wikipedia:Subtractive_color|subtractive]], and each of these pigments subtracts one pure color&lt;br /&gt;
from reflected light.  Cyan subtracts red, magenta subtracts green, and&lt;br /&gt;
yellow subtracts blue.&lt;br /&gt;
&lt;br /&gt;
Video can be, and sometimes is, represented with red, green, and blue&lt;br /&gt;
color channels, but RGB video is atypical. The human eye is far more&lt;br /&gt;
sensitive to [[wikipedia:Luminance_(relative)|luminosity]] than it is to color, and RGB tends to spread&lt;br /&gt;
the energy of an image across all three color channels.  That is, the&lt;br /&gt;
red plane looks like a red version of the original picture, the green&lt;br /&gt;
plane looks like a green version of the original picture, and the blue&lt;br /&gt;
plane looks like a blue version of the original picture.  Black and&lt;br /&gt;
white times three.  Not efficient.&lt;br /&gt;
&lt;br /&gt;
For those reasons and because, oh hey, television just happened to&lt;br /&gt;
start out as black and white anyway, video usually is represented as a&lt;br /&gt;
high resolution [[wikipedia:Luma_(video)|luma channel]]&amp;amp;mdash;the black &amp;amp; white&amp;amp;mdash;along with&lt;br /&gt;
additional, often lower resolution [[wikipedia:Chrominance|chroma channels]], the color. The&lt;br /&gt;
luma channel, Y, is produced by weighting and then adding the separate&lt;br /&gt;
red, green and blue signals.  The chroma channels U and V are then&lt;br /&gt;
produced by subtracting the luma signal from blue and the luma signal&lt;br /&gt;
from red.&lt;br /&gt;
&lt;br /&gt;
When YUV is scaled, offset, and quantized for digital video, it&#039;s&lt;br /&gt;
usually more correctly called [[wikipedia:Y&#039;CbCr|Y&#039;CbCr]], but the more generic term YUV is&lt;br /&gt;
widely used to describe all the analog and digital variants of this&lt;br /&gt;
color model.&lt;br /&gt;
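&lt;br /&gt;
A sketch of the weighting just described, using the Rec. 601 luma coefficients; the scaling and offsets that produce quantized Y&#039;CbCr are omitted:&lt;br /&gt;
&lt;br /&gt;
 /* Sketch: Rec. 601 luma plus B-Y and R-Y color differences,&lt;br /&gt;
    computed on gamma-corrected RGB values in [0,1]. */&lt;br /&gt;
 void rgb_to_yuv(double r, double g, double b,&lt;br /&gt;
                 double *y, double *u, double *v){&lt;br /&gt;
   *y = 0.299 * r + 0.587 * g + 0.114 * b;  /* weighted luma sum */&lt;br /&gt;
   *u = b - *y;                             /* U: blue minus luma */&lt;br /&gt;
   *v = r - *y;                             /* V: red minus luma */&lt;br /&gt;
 }&lt;br /&gt;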
&amp;lt;br style=&amp;quot;clear:both;&amp;quot;/&amp;gt;&lt;br /&gt;
&lt;br /&gt;
===chroma subsampling===&lt;br /&gt;
[[Image:Dmpfg_015.jpg|360px|right]]&lt;br /&gt;
&lt;br /&gt;
The U and V chroma channels can have the same resolution as the Y&lt;br /&gt;
channel, but because the human eye has far less spatial color&lt;br /&gt;
resolution than spatial luminosity resolution, chroma resolution is&lt;br /&gt;
usually [[wikipedia:Chroma_subsampling|halved or even quartered]] in the horizontal direction, the&lt;br /&gt;
vertical direction, or both, usually without any significant impact on the&lt;br /&gt;
apparent raw image quality.  Practically every possible subsampling&lt;br /&gt;
variant has been used at one time or another, but the common choices&lt;br /&gt;
today are [[wikipedia:Chroma_subsampling#4:4:4_Y.27CbCr|4:4:4]] video, which isn&#039;t actually subsampled at all, [[wikipedia:Chroma_subsampling#4:2:2|4:2:2]] video in&lt;br /&gt;
which the horizontal resolution of the U and V channels is halved, and&lt;br /&gt;
most common of all, [[wikipedia:Chroma_subsampling#4:2:0|4:2:0]] video in which both the horizontal and vertical&lt;br /&gt;
resolutions of the chroma channels are halved, resulting in U and V&lt;br /&gt;
planes that are each one quarter the size of Y.&lt;br /&gt;
&lt;br /&gt;
The terms 4:2:2, 4:2:0, [[wikipedia:Chroma_subsampling#4:1:1|4:1:1]], and so on and so forth, aren&#039;t complete&lt;br /&gt;
descriptions of a chroma subsampling. There are multiple possible ways&lt;br /&gt;
to position the chroma pixels relative to luma, and again, several&lt;br /&gt;
variants are in active use for each subsampling.  For example, [[wikipedia:Motion_Jpeg|motion&lt;br /&gt;
JPEG]], [[wikipedia:MPEG-1#Part_2:_Video|MPEG-1 video]], [[wikipedia:MPEG-2#Video_coding_.28simplified.29|MPEG-2 video]], [[wikipedia:DV#DV_Compression|DV]], [[wikipedia:Theora|Theora]], and [[wikipedia:WebM|WebM]] all use or can&lt;br /&gt;
use 4:2:0 subsampling, but they site the chroma pixels [http://www.mir.com/DMG/chroma.html three different ways].&lt;br /&gt;
&lt;br /&gt;
Motion JPEG, MPEG-1 video, Theora and WebM all site chroma pixels&lt;br /&gt;
between luma pixels both horizontally and vertically.&lt;br /&gt;
&lt;br /&gt;
MPEG-2 sites chroma pixels between lines, but horizontally aligned with&lt;br /&gt;
every other luma pixel. Interlaced modes complicate things somewhat,&lt;br /&gt;
resulting in a siting arrangement that&#039;s a tad bizarre.&lt;br /&gt;
&lt;br /&gt;
And finally PAL-DV, which is always interlaced, places the chroma&lt;br /&gt;
pixels in the same position as every other luma pixel in the&lt;br /&gt;
horizontal direction, and vertically alternates chroma channel on&lt;br /&gt;
each line.&lt;br /&gt;
&lt;br /&gt;
That&#039;s just 4:2:0 video. I&#039;ll leave the other subsamplings as homework for the&lt;br /&gt;
viewer.  Got the basic idea, moving on.&lt;br /&gt;
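&lt;br /&gt;
For the flavor of it, here is the crudest possible 4:2:0 chroma downsample as a C sketch: average each 2x2 block of a chroma plane. Real converters choose their filters, and mind siting, far more carefully.&lt;br /&gt;
&lt;br /&gt;
 /* Sketch: naive 4:2:0 chroma downsample averaging 2x2 blocks.&lt;br /&gt;
    Ignores siting entirely; w and h are assumed even. */&lt;br /&gt;
 void subsample_420(const unsigned char *c, int w, int h,&lt;br /&gt;
                    unsigned char *out){&lt;br /&gt;
   for (int y = 0; y &amp;lt; h; y += 2)&lt;br /&gt;
     for (int x = 0; x &amp;lt; w; x += 2)&lt;br /&gt;
       out[(y / 2) * (w / 2) + (x / 2)] =&lt;br /&gt;
         (c[y * w + x] + c[y * w + x + 1] +&lt;br /&gt;
          c[(y + 1) * w + x] + c[(y + 1) * w + x + 1] + 2) &amp;gt;&amp;gt; 2;&lt;br /&gt;
 }&lt;br /&gt;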
&amp;lt;br style=&amp;quot;clear:both;&amp;quot;/&amp;gt;&lt;br /&gt;
&lt;br /&gt;
===pixel formats===&lt;br /&gt;
[[Image:Dmpfg_016.jpg|360px|right]]&lt;br /&gt;
&lt;br /&gt;
In audio, we always represent multiple channels in a PCM stream by&lt;br /&gt;
interleaving the samples from each channel in order. Video uses both&lt;br /&gt;
packed formats that interleave the color channels, as well as planar&lt;br /&gt;
formats that keep the pixels from each channel together in separate&lt;br /&gt;
planes stacked in order in the frame. There are at least [http://www.fourcc.org/yuv.php 50 different formats] in&lt;br /&gt;
these two broad categories with possibly ten or fifteen in common use.&lt;br /&gt;
&lt;br /&gt;
Each chroma subsampling and different bit-depth requires a different&lt;br /&gt;
packing arrangement, and so a different pixel format.  For a given&lt;br /&gt;
unique subsampling, there are usually also several equivalent formats&lt;br /&gt;
that consist of trivial channel order rearrangements or repackings, due either to&lt;br /&gt;
convenience once-upon-a-time on some particular piece of hardware, or&lt;br /&gt;
sometimes just good old-fashioned spite.&lt;br /&gt;
&lt;br /&gt;
Pixel formats are described by a unique name or [[wikipedia:FourCC|fourcc]] code.  There&lt;br /&gt;
are quite a few of these and there&#039;s no sense going over each one now.&lt;br /&gt;
Google is your friend.  Be aware that fourcc codes for raw video&lt;br /&gt;
specify the pixel arrangement and chroma subsampling, but generally&lt;br /&gt;
don&#039;t imply anything certain about chroma siting or color space.  [http://www.fourcc.org/yuv.php#UYVY YV12]&lt;br /&gt;
video, to pick one, can use JPEG, MPEG-2, or DV chroma siting, and any&lt;br /&gt;
one of [[wikipedia:YUV#BT.709_and_BT.601|several YUV colorspace definitions]].&lt;br /&gt;
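&lt;br /&gt;
To pin the idea down, here is a sketch of the plane layout of one common planar format, YV12: 4:2:0, with the V plane stored before the U plane.&lt;br /&gt;
&lt;br /&gt;
 /* Sketch: plane layout of a YV12 frame buffer (planar 4:2:0,&lt;br /&gt;
    V plane stored before U). */&lt;br /&gt;
 typedef struct {&lt;br /&gt;
   unsigned char *y, *v, *u;&lt;br /&gt;
 } yv12_planes;&lt;br /&gt;
 &lt;br /&gt;
 yv12_planes yv12_layout(unsigned char *buf, int w, int h){&lt;br /&gt;
   yv12_planes p;&lt;br /&gt;
   p.y = buf;                      /* w   x h   luma plane   */&lt;br /&gt;
   p.v = buf + w * h;              /* w/2 x h/2 chroma plane */&lt;br /&gt;
   p.u = p.v + (w / 2) * (h / 2);  /* w/2 x h/2 chroma plane */&lt;br /&gt;
   return p;&lt;br /&gt;
 }&lt;br /&gt;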
&amp;lt;br style=&amp;quot;clear:both;&amp;quot;/&amp;gt;&lt;br /&gt;
&lt;br /&gt;
===done!===&lt;br /&gt;
&lt;br /&gt;
That wraps up our not-so-quick and yet very incomplete tour of raw&lt;br /&gt;
video. The good news is we can already get quite a lot of real work&lt;br /&gt;
done using that overview. In plenty of situations, a frame of video&lt;br /&gt;
data is a frame of video data.  The details matter, greatly, when it&lt;br /&gt;
comes time to write software, but for now I am satisfied that the&lt;br /&gt;
esteemed viewer is broadly aware of the relevant issues.&lt;br /&gt;
&lt;br /&gt;
&amp;lt;center&amp;gt;&amp;lt;div style=&amp;quot;background-color:#DDDDFF;border-color:#CCCCDD;border-style:solid;width:80%;padding:0 1em 1em 1em;text-align:left;&amp;quot;&amp;gt;&lt;br /&gt;
&#039;&#039;&#039;Going deeper…&#039;&#039;&#039;&lt;br /&gt;
* [http://wiki.multimedia.cx/index.php?title=YUV4MPEG2 The y4m format] is the most common simple container for raw YUV video.  People occasionally use [[OggYUV]] to store it in Ogg instead.&lt;br /&gt;
* Learn about [[wikipedia:High_dynamic_range_imaging|high dynamic range imaging]], which achieves better representation of the full range of brightnesses in the real world by using more than 8 bits per channel.&lt;br /&gt;
* Learn about how [[wikipedia:Trichromatic_vision|trichromatic color vision]] works in humans, and how human color perception is encoded in the [[wikipedia:CIE 1931 color space|CIE 1931 XYZ color space]].&lt;br /&gt;
** Compare with the [[wikipedia:Lab_color_space|Lab color space]], mathematically equivalent but structured to account for &amp;quot;perceptual uniformity&amp;quot;.&lt;br /&gt;
** If we were all [[wikipedia:Dichromacy|dichromats]] then video would only need two color channels.  Some humans might be [[wikipedia:Tetrachromacy#Possibility_of_human_tetrachromats|tetrachromats]], in which case they would need an additional color channel for video to fully represent their vision.&lt;br /&gt;
** [http://www.xritephoto.com/ph_toolframe.aspx?action=coloriq Test your color vision] (or at least your monitor).&lt;br /&gt;
* YCbCr is defined in terms of RGB by the ITU in two incompatible standards: [[wikipedia:Rec. 601|Rec. 601]] and [[wikipedia:Rec. 709|Rec. 709]].  Both conversion standards are lossy, which has prompted some to adopt a lossless alternative called [http://wiki.multimedia.cx/index.php?title=YCoCg YCoCg].&lt;br /&gt;
&amp;lt;/div&amp;gt;&amp;lt;/center&amp;gt;&lt;br /&gt;
&lt;br /&gt;
&amp;lt;br style=&amp;quot;clear:both;&amp;quot;/&amp;gt;&lt;br /&gt;
&lt;br /&gt;
==Containers==&lt;br /&gt;
[[Image:Dmpfg_017.jpg|360px|right]]&lt;br /&gt;
&amp;lt;small&amp;gt;[[Talk:A_Digital_Media_Primer_For_Geeks_(episode_1)#Containers|Discuss this section]]&amp;lt;/small&amp;gt;&lt;br /&gt;
&lt;br /&gt;
So. We have audio data. We have video data. What remains is the more&lt;br /&gt;
familiar non-signal data and straight-up engineering that software&lt;br /&gt;
developers are used to, and plenty of it.&lt;br /&gt;
&lt;br /&gt;
Chunks of raw audio and video data have no externally-visible&lt;br /&gt;
structure, but they&#039;re often uniformly sized.  We could just string&lt;br /&gt;
them together in a rigid predetermined ordering for streaming and&lt;br /&gt;
storage, and some simple systems do approximately that. Compressed&lt;br /&gt;
frames, though, aren&#039;t necessarily a predictable size, and we usually want&lt;br /&gt;
some flexibility in using a range of different data types in streams.&lt;br /&gt;
If we string random formless data together, we lose the boundaries&lt;br /&gt;
that separate frames and don&#039;t necessarily know what data belongs to&lt;br /&gt;
which streams.  A stream needs some generalized structure to be&lt;br /&gt;
generally useful.&lt;br /&gt;
&lt;br /&gt;
In addition to our signal data, we also have our PCM and video&lt;br /&gt;
parameters.  There&#039;s probably plenty of other [[wikipedia:Metadata#Video|metadata]] we also want to&lt;br /&gt;
deal with, like audio tags and video chapters and subtitles, all&lt;br /&gt;
essential components of rich media.  It makes sense to place this&lt;br /&gt;
metadata&amp;amp;mdash;that is, data about the data&amp;amp;mdash;within the media itself.&lt;br /&gt;
&lt;br /&gt;
Storing and structuring formless data and disparate metadata is the&lt;br /&gt;
job of a [[wikipedia:Container_format_(digital)|container]].  Containers provide framing for the data blobs,&lt;br /&gt;
interleave and identify multiple data streams, provide timing&lt;br /&gt;
information, and store the metadata necessary to parse, navigate,&lt;br /&gt;
manipulate, and present the media.  In general, any container can hold&lt;br /&gt;
any kind of data.  And data can be put into any container.&lt;br /&gt;
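&lt;br /&gt;
To make &amp;quot;framing&amp;quot; concrete, here is a purely hypothetical C sketch (deliberately not the layout of any real container) of the minimum each framed blob has to carry:&lt;br /&gt;
&lt;br /&gt;
 /* Hypothetical sketch, not any real container format: the bare&lt;br /&gt;
    minimum a container must record for each framed data blob. */&lt;br /&gt;
 #include &amp;lt;stdint.h&amp;gt;&lt;br /&gt;
 &lt;br /&gt;
 typedef struct {&lt;br /&gt;
   uint32_t stream_id;  /* which stream this blob belongs to */&lt;br /&gt;
   uint64_t timestamp;  /* presentation time in a declared timebase */&lt;br /&gt;
   uint32_t length;     /* payload size in bytes; payload follows */&lt;br /&gt;
 } packet_header;&lt;br /&gt;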
&amp;lt;br style=&amp;quot;clear:both;&amp;quot;/&amp;gt;&lt;br /&gt;
&lt;br /&gt;
==Credits==&lt;br /&gt;
&lt;br /&gt;
In the past thirty minutes, we&#039;ve covered digital audio, video, some&lt;br /&gt;
history, some math and a little engineering. We&#039;ve barely scratched the&lt;br /&gt;
surface, but it&#039;s time for a well-earned break.&lt;br /&gt;
&lt;br /&gt;
There&#039;s so much more to talk about, so I hope you&#039;ll join me again in&lt;br /&gt;
our next episode.  Until then&amp;amp;mdash;Cheers!&lt;br /&gt;
&lt;br /&gt;
Written by:&lt;br /&gt;
Christopher (Monty) Montgomery&lt;br /&gt;
and the Xiph.Org Community&lt;br /&gt;
&lt;br /&gt;
Intro, title and credits music:&amp;lt;br&amp;gt;&lt;br /&gt;
&amp;quot;Boo Boo Coming&amp;quot;, by Joel Forrester&amp;lt;br&amp;gt;&lt;br /&gt;
Performed by the Microscopic Septet&amp;lt;br&amp;gt;&lt;br /&gt;
Used by permission of Cuneiform Records.&amp;lt;br&amp;gt;&lt;br /&gt;
Original source track All Rights Reserved.&amp;lt;br&amp;gt;&lt;br /&gt;
[http://www.cuneiformrecords.com www.cuneiformrecords.com]&lt;br /&gt;
&lt;br /&gt;
This Video Was Produced Entirely With Free and Open Source Software:&amp;lt;br&amp;gt;&lt;br /&gt;
&lt;br /&gt;
GNU&amp;lt;br&amp;gt;&lt;br /&gt;
Linux&amp;lt;br&amp;gt;&lt;br /&gt;
Fedora&amp;lt;br&amp;gt;&lt;br /&gt;
Cinelerra&amp;lt;br&amp;gt;&lt;br /&gt;
The Gimp&amp;lt;br&amp;gt;&lt;br /&gt;
Audacity&amp;lt;br&amp;gt;&lt;br /&gt;
Postfish&amp;lt;br&amp;gt;&lt;br /&gt;
Gstreamer&amp;lt;br&amp;gt;&lt;br /&gt;
&lt;br /&gt;
CC BY-NC-SA&amp;lt;br&amp;gt;&lt;br /&gt;
A Co-Production of Xiph.Org and Red Hat Inc.&amp;lt;br&amp;gt;&lt;br /&gt;
(C) 2010, Some Rights Reserved&amp;lt;br&amp;gt;&lt;br /&gt;
&lt;br /&gt;
&amp;lt;br style=&amp;quot;clear:both;&amp;quot;/&amp;gt;&lt;br /&gt;
&lt;br /&gt;
==The making of…==&lt;br /&gt;
&amp;lt;small&amp;gt;[[Talk:A_Digital_Media_Primer_For_Geeks_(episode_1)#The_making_of.E2.80.A6|Discuss this section]]&amp;lt;/small&amp;gt;&lt;br /&gt;
===Equipment===&lt;br /&gt;
====Camera====&lt;br /&gt;
Canon HV40 HDV camera w/ wide-angle lens operating on a tripod.&lt;br /&gt;
&lt;br /&gt;
The wide angle lens gives the camera a nice close macro mode, and approximately triples the amount of light coming into the sensor for a given zoom/aperture.  Useful for shooting indoors at night.&lt;br /&gt;
&lt;br /&gt;
No additional lighting kit was used.&lt;br /&gt;
&lt;br /&gt;
====Audio====&lt;br /&gt;
&lt;br /&gt;
Two Crown PCC160 boundary microphones placed on a table approximately 4-8 feet in front of the speaker, run through a cheap Behringer portable mixer and into the camera&#039;s microphone input.  &lt;br /&gt;
&lt;br /&gt;
No additional audio kit was used.&lt;br /&gt;
&lt;br /&gt;
====Sundries====&lt;br /&gt;
&lt;br /&gt;
Whiteboard markers by &#039;Bic&#039;&lt;br /&gt;
&lt;br /&gt;
Drawing aids by Staedtler, McMaster Carr, and &#039;Generic&#039;.&lt;br /&gt;
&lt;br /&gt;
===Video shooting sequence===&lt;br /&gt;
&lt;br /&gt;
Scenes were pre-scripted and memorized, usually with lots of on-the-fly revision.  (In the future... I&#039;m totally getting a teleprompter.  Wh000.  OTOH, I can totally rattle off the entire video script from beginning to end as a party trick, thus ensuring I&#039;ll not be invited to many parties --Monty).&lt;br /&gt;
&lt;br /&gt;
Diagrams were drawn by hand on a physical whiteboard with whiteboard markers and magnetic T-squares, triangles, and yardsticks.  Despite looking a lot like greenscreen work, there is no image compositing in use (actually-- there are two small composites where an error in a whiteboard diagram was corrected by subtracting part of the original image and then adding a corrected version of the diagram).&lt;br /&gt;
&lt;br /&gt;
Camera operated in 24F shutter priority mode (Tv set to &amp;quot;24&amp;quot;) with exposure and white balance both calibrated to the white board (or a white piece of paper) and locked.  Microphone attenuation setting active, with gain locked such that room noise peaked at -40dB (all the rooms in the shooting sequences were noisy due to the building&#039;s ventilation system, or active equipment). Lighting in the whiteboard rooms tended to be odd, with little relative light cast on a presenter standing just in front of the whiteboard; a presenter is practically standing in the room&#039;s only shadow.  Most of the room light is focused on the table and walls.  Additional fill lighting kit would have been useful, but for the first vid, I didn&#039;t want &#039;perfect&#039; to be the enemy of &#039;good&#039;.&lt;br /&gt;
&lt;br /&gt;
Autofocus used for whiteboard scenes, manual focus used for several workshop scenes as the autofocus tended to hunt continuously in very low light.&lt;br /&gt;
&lt;br /&gt;
Continuous capture to a Thinkpad with firewire input via a simple gstreamer script.&lt;br /&gt;
&lt;br /&gt;
===Production sequence===&lt;br /&gt;
====All hail Cinelerra.  You better hail, or Cinelerra will get pissy about it.====&lt;br /&gt;
&lt;br /&gt;
Most of the production sequence hinged on making Cinelerra happy; it is a hulking rusty cast iron WWI tank of a program that can seem like it&#039;s composed entirely of compressed bugs.  That said, it was neither particularly crashy nor did it ever accidentally corrupt or lose work.  It was also the only FOSS editor with a working 2D compositor.  It got the job done once I found a workflow it would cope with (and fixed a number of bugs; these fixes are available from my cinelerra Git repo at http://git.xiph.org/?p=users/xiphmont/cinelerraCV.git;a=summary)&lt;br /&gt;
&lt;br /&gt;
====Choosing takes====&lt;br /&gt;
&lt;br /&gt;
Each shooting session yielded four to six hours of raw video.  The first step was to load the raw video into the cinelerra timeline, label each complete take, compare and choose the take to use, then render the chosen take out to a raw clip as a YUV4MPEG raw video file and a WAV raw audio file.  Be careful that Settings-&amp;gt;Align Cursor On Frames is set, else the audio and video renders won&#039;t start on the same boundary.&lt;br /&gt;
&lt;br /&gt;
====Postprocessing====&lt;br /&gt;
&lt;br /&gt;
At this point, the raw video clips were adjusted for gamma, contrast, and saturation in gstreamer and mplayer.  In the earlier shoots the camera was underexposing due to pilot error, which required quite a bit of gamma and saturation inflation to &#039;correct&#039; (there is no real correction as the low-end data is gone, but it&#039;s possible to make it look better).  Later shoots used saner settings and the adjustments were mostly to keep different shooting sessions more uniform.  The whiteboard tends not to look white because it&#039;s mildly reflective, and picked up the color of the cyan and orange audio baffles in the room like a big diffuse mirror.&lt;br /&gt;
&lt;br /&gt;
The audio was both noisy (due to the building&#039;s ventilation system which either sounded like a low loud rumble or a jet-engine taking off) and reverberant (the rooms were glass on two sides and plaster on the other two).  Early takes used no additional sound absorbing material in the rooms, and the Postfish filtering and deverb was used heavily.  It gives the early audio in the vid a slightly odd, processed feel (I had almost decided the original audio was simply unusable).  Later takes used some big fleece &#039;soft flats&#039; in the room to absorb some additional reverb, and the later takes are less heavily filtered.&lt;br /&gt;
&lt;br /&gt;
The postfish filtering chain used declip (for the occasional overrange oops), deverb (remove room reverberation), multicompand (noise gating), single compand (for volume levelling) and EQ (the Crown mics are nice, but are very midrange heavy).&lt;br /&gt;
&lt;br /&gt;
====Special Effects====&lt;br /&gt;
&lt;br /&gt;
Audio special effects were one-offs, mostly done using Sox.  The processed demo sections of audio were then spliced back into the original audio takes using Audacity.&lt;br /&gt;
&lt;br /&gt;
Video special effects (eg, removing a color channel, etc) were done by writing quick, one-off filters in C for y4oi.  A few effects were done by dumping a take as a directory full of PNGs and then batch-processing the PNGs again using a one-off C program, then reassembling with mplayer.  Video effects were then stitched back into the original video takes in Cinelerra.&lt;br /&gt;
&lt;br /&gt;
====Editing====&lt;br /&gt;
&lt;br /&gt;
All editing was done in Cinelerra.  This primarily consisted of stitching the individual takes back together with crossfades.  All input and rendering output were done with raw YUV4MPEG and WAV files.  Note that making this work well and correctly required several patches to the YUV4MPEG handler and colorspace conversion code.&lt;br /&gt;
&lt;br /&gt;
====Encoding====&lt;br /&gt;
&lt;br /&gt;
Encoding was done by hand external to Cinelerra using mplayer for final postprocessing, the encoder_example tool included with the Ptalarbvorm Theora source distribution, and ivfenc for WebM.&lt;br /&gt;
&lt;br /&gt;
Sample Theora encode command line (note this is using an mplayer patched for y4o support; it could be done just as easily with a yuv4mpeg pipe):&lt;br /&gt;
&lt;br /&gt;
# 360p, 128-ish (a4) audio + 500-ish (v50) video&lt;br /&gt;
mplayer -vf hqdn3d,scale=640:360,gradfun=1.5,unsharp=l3x3:.1,yuv4ogg complete2.m2v -fast -noconsolecontrols -vo null &amp;gt; /dev/null &amp;amp; ~/MotherfishSVN/theora-ptalarbvorm/examples/encoder_example -a 4 -v 50 -k 240 complete2.wav output.y4o -o A_Digital_Media_Primer_For_Geeks-360p-a4+v50.ogv&lt;/div&gt;</summary>
		<author><name>Mindspillage</name></author>
	</entry>
	<entry>
		<id>https://wiki.xiph.org/index.php?title=Videos/A_Digital_Media_Primer_For_Geeks&amp;diff=12441</id>
		<title>Videos/A Digital Media Primer For Geeks</title>
		<link rel="alternate" type="text/html" href="https://wiki.xiph.org/index.php?title=Videos/A_Digital_Media_Primer_For_Geeks&amp;diff=12441"/>
		<updated>2010-09-22T04:28:18Z</updated>

		<summary type="html">&lt;p&gt;Mindspillage: /* Credits */ link cuneiform&lt;/p&gt;
&lt;hr /&gt;
&lt;div&gt;[[Image:Dmpfg_001.jpg|360px|right]]&lt;br /&gt;
&amp;lt;small&amp;gt;&#039;&#039;Wiki edition&#039;&#039;&amp;lt;/small&amp;gt;&lt;br /&gt;
&lt;br /&gt;
This first video from Xiph.Org presents the technical foundations of modern digital media via a half-hour firehose of information. One community member called it &amp;quot;a Uni lecture I never got but really wanted.&amp;quot;&lt;br /&gt;
&lt;br /&gt;
The program offers a brief history of digital media, a quick summary of the sampling theorem, and myriad details of low level audio and video characterization and formatting. It&#039;s intended for budding geeks looking to get into video coding, as well as the technically curious who want to know more about the media they wrangle for work or play.&lt;br /&gt;
&amp;lt;br/&amp;gt;&amp;lt;br/&amp;gt;&amp;lt;br/&amp;gt;&lt;br /&gt;
&amp;lt;center&amp;gt;&amp;lt;font size=&amp;quot;+2&amp;quot;&amp;gt;[http://www.xiph.org/video/vid1.shtml Download or Watch online]&amp;lt;/font&amp;gt;&amp;lt;/center&amp;gt;&lt;br /&gt;
&amp;lt;br style=&amp;quot;clear:both;&amp;quot;/&amp;gt;&lt;br /&gt;
&amp;lt;hr/&amp;gt;&lt;br /&gt;
&lt;br /&gt;
==Introduction==&lt;br /&gt;
[[Image:Dmpfg_000.jpg|360px|right]]&lt;br /&gt;
[[Image:Dmpfg_002.jpg|360px|right]]&lt;br /&gt;
&amp;lt;small&amp;gt;[[Talk:A_Digital_Media_Primer_For_Geeks_(episode_1)#Introduction|Discuss this section]]&amp;lt;/small&amp;gt;&lt;br /&gt;
&lt;br /&gt;
Workstations and high-end personal computers have been able to&lt;br /&gt;
manipulate digital audio pretty easily for about fifteen years now.&lt;br /&gt;
It&#039;s only been about five years that a decent workstation&#039;s been able&lt;br /&gt;
to handle raw video without a lot of expensive special purpose&lt;br /&gt;
hardware.&lt;br /&gt;
&lt;br /&gt;
But today even most cheap home PCs have the processor power and&lt;br /&gt;
storage necessary to really toss raw video around, at least without&lt;br /&gt;
too much of a struggle. So now that everyone has all of this cheap media-capable hardware, &lt;br /&gt;
more people, not surprisingly, want to do interesting&lt;br /&gt;
things with digital media, especially streaming. YouTube was the first huge&lt;br /&gt;
success, and now everybody wants in.&lt;br /&gt;
&lt;br /&gt;
Well good!  Because this stuff is a lot of fun!&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
It&#039;s no problem finding consumers for digital media.  But here I&#039;d&lt;br /&gt;
like to address the engineers, the mathematicians, the hackers, the&lt;br /&gt;
people who are interested in discovering and making things and&lt;br /&gt;
building the technology itself. The people after my own heart.&lt;br /&gt;
&lt;br /&gt;
Digital media, compression especially, is perceived to be super-elite,&lt;br /&gt;
somehow incredibly more difficult than anything else in computer&lt;br /&gt;
science. The big industry players in the field don&#039;t mind this&lt;br /&gt;
perception at all; it helps justify the staggering number of very&lt;br /&gt;
basic patents they hold.  They like the image that their media&lt;br /&gt;
researchers &amp;quot;are the best of the best, so much smarter than anyone&lt;br /&gt;
else that their brilliant ideas can&#039;t even be understood by mere&lt;br /&gt;
mortals.&amp;quot; This is bunk.&lt;br /&gt;
&lt;br /&gt;
Digital audio and video and streaming and compression offer endless&lt;br /&gt;
deep and stimulating mental challenges, just like any other&lt;br /&gt;
discipline. It seems elite because so few people have been&lt;br /&gt;
involved.  So few people have been involved perhaps because so few&lt;br /&gt;
people could afford the expensive, special-purpose equipment it&lt;br /&gt;
required. But today, just about anyone watching this video has a&lt;br /&gt;
cheap, general-purpose computer powerful enough to play with the big&lt;br /&gt;
boys. There are battles going on today around HTML5 and browsers and&lt;br /&gt;
video and open vs. closed.  So now is a pretty good time to get&lt;br /&gt;
involved.  The easiest place to start is probably understanding the&lt;br /&gt;
technology we have right now.&lt;br /&gt;
&lt;br /&gt;
This is an introduction. Since it&#039;s an introduction, it glosses over a&lt;br /&gt;
ton of details so that the big picture&#039;s a little easier to see.&lt;br /&gt;
Quite a few people watching are going to be way past anything that I&#039;m&lt;br /&gt;
talking about, at least for now.  On the other hand, I&#039;m probably&lt;br /&gt;
going to go too fast for folks who really are brand new to all of&lt;br /&gt;
this, so if this is all new, relax. The important thing is to pick out&lt;br /&gt;
any ideas that really grab your imagination. Especially pay attention&lt;br /&gt;
to the terminology surrounding those ideas, because with those, and&lt;br /&gt;
Google, and Wikipedia, you can dig as deep as interests you.&lt;br /&gt;
&lt;br /&gt;
So, without any further ado, welcome to one hell of a new hobby.&lt;br /&gt;
&lt;br /&gt;
&amp;lt;center&amp;gt;&amp;lt;div style=&amp;quot;background-color:#DDDDFF;border-color:#CCCCDD;border-style:solid;width:80%;padding:0 1em 1em 1em;text-align:left;&amp;quot;&amp;gt;&lt;br /&gt;
&#039;&#039;&#039;Going deeper…&#039;&#039;&#039;&lt;br /&gt;
*[http://www.0xdeadbeef.com/weblog/2010/01/html5-video-and-h-264-what-history-tells-us-and-why-were-standing-with-the-web/ HTML5 Video and H.264: what history tells us and why we&#039;re standing with the web]: Chris Blizzard of Mozilla on free formats and the open web&lt;br /&gt;
*[http://diveintohtml5.org/video.html Dive into HTML5] web video tutorial&lt;br /&gt;
&amp;lt;/div&amp;gt;&amp;lt;/center&amp;gt;&lt;br /&gt;
&lt;br /&gt;
&amp;lt;br style=&amp;quot;clear:both;&amp;quot;/&amp;gt;&lt;br /&gt;
&lt;br /&gt;
==Analog vs Digital==&lt;br /&gt;
[[Image:Dmpfg_004.jpg|360px|right]]&lt;br /&gt;
[[Image:Dmpfg_006.jpg|360px|right]]&lt;br /&gt;
[[Image:Dmpfg_007.jpg|360px|right]]&lt;br /&gt;
&amp;lt;small&amp;gt;[[Talk:A_Digital_Media_Primer_For_Geeks_(episode_1)#Analog_vs_Digital|Discuss this section]]&amp;lt;/small&amp;gt;&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
[[WikiPedia:Sound|Sound]] is the propagation of pressure waves through air, spreading out&lt;br /&gt;
from a source like ripples spread from a stone tossed into a pond.  A&lt;br /&gt;
microphone, or the human ear for that matter, transforms these passing&lt;br /&gt;
ripples of pressure into an electric signal.  Right, this is&lt;br /&gt;
middle school science class, everyone remembers this.  Moving on.&lt;br /&gt;
&lt;br /&gt;
That audio signal is a one-dimensional function, a single value&lt;br /&gt;
varying over time.  If we slow the [[WikiPedia:Oscilloscope|&#039;scope]] down a bit... that should be&lt;br /&gt;
a little easier to see. A few other aspects of the signal are&lt;br /&gt;
important. It&#039;s [[WikiPedia:Continuous_function|continuous]] in both value and time; that is, at any&lt;br /&gt;
given time it can have any real value, and there&#039;s a smoothly varying&lt;br /&gt;
value at every point in time.  No matter how much we zoom in, there&lt;br /&gt;
are no discontinuities, no singularities, no instantaneous steps or&lt;br /&gt;
points where the signal ceases to exist. It&#039;s defined&lt;br /&gt;
everywhere. Classic continuous math works very well on these signals.&lt;br /&gt;
&lt;br /&gt;
A digital signal on the other hand is [[WikiPedia:Discrete_math|discrete]] in both value and time.&lt;br /&gt;
In the simplest and most common system, called [[WikiPedia:Pulse code modulation|Pulse Code Modulation]],&lt;br /&gt;
one of a fixed number of possible values directly represents the&lt;br /&gt;
instantaneous signal amplitude at points in time spaced a fixed&lt;br /&gt;
distance apart.  The end result is a stream of digits.&lt;br /&gt;
&lt;br /&gt;
Now this looks an awful lot like this.  It seems intuitive that we&lt;br /&gt;
should somehow be able to rigorously transform one into the other, and&lt;br /&gt;
good news, the [[WikiPedia:Nyquist-Shannon sampling theorem|Sampling Theorem]] says we can and tells us&lt;br /&gt;
how. Published in its most recognizable form by [[WikiPedia:Claude Shannon|Claude Shannon]] in 1949&lt;br /&gt;
and built on the work of [[WikiPedia:Harry Nyquist|Nyquist]], and [[WikiPedia:Ralph Hartley|Hartley]], and tons of others, the&lt;br /&gt;
sampling theorem not only states that we can go back and&lt;br /&gt;
forth between analog and digital, but also lays&lt;br /&gt;
down a set of conditions for which conversion is lossless and the two&lt;br /&gt;
representations become equivalent and interchangeable.  When the&lt;br /&gt;
lossless conditions aren&#039;t met, the sampling theorem tells us how and&lt;br /&gt;
how much information is lost or corrupted.&lt;br /&gt;
&lt;br /&gt;
Up until very recently, analog technology was the basis for&lt;br /&gt;
practically everything done with audio, and that&#039;s not because most&lt;br /&gt;
audio comes from an originally analog source.  You may also think that&lt;br /&gt;
since computers are fairly recent, analog signal technology must have&lt;br /&gt;
come first.  Nope. Digital is actually older.  The [[WikiPedia:Telegraph|telegraph]] predates&lt;br /&gt;
the telephone by half a century and was already fully mechanically&lt;br /&gt;
automated by the 1860s, sending coded, multiplexed digital signals&lt;br /&gt;
long distances. You know... [[WikiPedia:Tickertape|tickertape]]. Harry Nyquist of [[WikiPedia:Bell_labs|Bell Labs]] was&lt;br /&gt;
researching telegraph pulse transmission when he published his&lt;br /&gt;
description of what later became known as the [[WikiPedia:Nyquist_frequency|Nyquist frequency]], the&lt;br /&gt;
core concept of the sampling theorem.  Now, it&#039;s true the telegraph&lt;br /&gt;
was transmitting symbolic information, text, not a digitized analog&lt;br /&gt;
signal, but with the advent of the telephone and radio, analog and&lt;br /&gt;
digital signal technology progressed rapidly and side-by-side.&lt;br /&gt;
&lt;br /&gt;
Audio had always been manipulated as an analog signal because... well,&lt;br /&gt;
gee, it&#039;s so much easier.  A [[WikiPedia:Low-pass_filter#Continuous-time_low-pass_filters|second-order low-pass filter]], for example,&lt;br /&gt;
requires two passive components.  An all-analog [[WikiPedia:Short-time_Fourier_transform|short-time Fourier&lt;br /&gt;
transform]], a few hundred.  Well, maybe a thousand if you want to build&lt;br /&gt;
something really fancy (bang on the [http://www.testequipmentdepot.com/usedequipment/hewlettpackard/spectrumanalyzers/3585a.htm 3585]).  Processing signals&lt;br /&gt;
digitally requires millions to billions of transistors running at&lt;br /&gt;
microwave frequencies, support hardware at very least to digitize and&lt;br /&gt;
reconstruct the analog signals, a complete software ecosystem for&lt;br /&gt;
programming and controlling that billion-transistor juggernaut,&lt;br /&gt;
digital storage just in case you want to keep any of those bits for&lt;br /&gt;
later...&lt;br /&gt;
&lt;br /&gt;
So we come to the conclusion that analog is the only practical way to&lt;br /&gt;
do much with audio... well, unless you happen to have a billion&lt;br /&gt;
transistors and all the other things just lying around. And [[WikiPedia:File:Transistor_Count_and_Moore&#039;s_Law_-_2008.svg|since we&lt;br /&gt;
do]], digital signal processing becomes very attractive.&lt;br /&gt;
&lt;br /&gt;
For one thing, analog componentry just doesn&#039;t have the flexibility of&lt;br /&gt;
a general purpose computer.  Adding a new function to this&lt;br /&gt;
beast... yeah, it&#039;s probably not going to happen.  On a digital&lt;br /&gt;
processor though, just write a new program.  Software isn&#039;t trivial,&lt;br /&gt;
but it is a lot easier.&lt;br /&gt;
&lt;br /&gt;
Perhaps more importantly though every analog component is an&lt;br /&gt;
approximation. There&#039;s no such thing as a perfect transistor, or a&lt;br /&gt;
perfect inductor, or a perfect capacitor.  In analog, every component&lt;br /&gt;
adds [[WikiPedia:Johnson–Nyquist_noise|noise]] and [[WikiPedia:Distortion#Electronic_signals|distortion]], usually not very much, but it adds up. Just&lt;br /&gt;
transmitting an analog signal, especially over long distances,&lt;br /&gt;
progressively, measurably, irretrievably corrupts it.  Besides, all of&lt;br /&gt;
those single-purpose analog components take up a lot of space.  Two&lt;br /&gt;
lines of code on the billion transistors back here can implement a&lt;br /&gt;
filter that would require an [[WikiPedia:Inductor|inductor]] the size of a refrigerator.&lt;br /&gt;
&lt;br /&gt;
Digital systems don&#039;t have these drawbacks.  Digital signals can be&lt;br /&gt;
stored, copied, manipulated, and transmitted without adding any noise&lt;br /&gt;
or distortion. We do use [[WikiPedia:Lossy_compression|lossy]] algorithms from time to time, but the&lt;br /&gt;
only unavoidably non-ideal steps are digitization and reconstruction,&lt;br /&gt;
where digital has to interface with all of that messy analog.  Messy&lt;br /&gt;
or not, modern [[WikiPedia:Digital-to-analog_converter|conversion stages]] are very, very good.  By the&lt;br /&gt;
standards of our ears, we can consider them practically lossless as&lt;br /&gt;
well.&lt;br /&gt;
&lt;br /&gt;
With a little extra hardware, then, most of which is now small and&lt;br /&gt;
inexpensive due to our modern industrial infrastructure, digital audio&lt;br /&gt;
is the clear winner over analog.  So let us then go about storing it,&lt;br /&gt;
copying it, manipulating it, and transmitting it.&lt;br /&gt;
&lt;br /&gt;
&amp;lt;center&amp;gt;&amp;lt;div style=&amp;quot;background-color:#DDDDFF;border-color:#CCCCDD;border-style:solid;width:80%;padding:0 1em 1em 1em;text-align:left;&amp;quot;&amp;gt;&lt;br /&gt;
&#039;&#039;&#039;Going deeper…&#039;&#039;&#039;&lt;br /&gt;
*Wikipedia: [[wiki:Nyquist–Shannon_sampling_theorem|Nyquist–Shannon sampling theorem]]&lt;br /&gt;
*MIT OpenCourseWare [http://ocw.mit.edu/courses/electrical-engineering-and-computer-science/6-003-signals-and-systems-spring-2010/lecture-notes/ Lecture notes from 6.003 signals and systems.]&lt;br /&gt;
&amp;lt;/div&amp;gt;&amp;lt;/center&amp;gt;&lt;br /&gt;
&lt;br /&gt;
&amp;lt;br style=&amp;quot;clear:both;&amp;quot;/&amp;gt;&lt;br /&gt;
&lt;br /&gt;
==Raw (digital audio) meat==&lt;br /&gt;
&amp;lt;small&amp;gt;[[Talk:A_Digital_Media_Primer_For_Geeks_(episode_1)#Raw_.28digital_audio.29_meat|Discuss this section]]&amp;lt;/small&amp;gt;&lt;br /&gt;
&lt;br /&gt;
Pulse Code Modulation is the most common representation for &lt;br /&gt;
raw audio.  Other practical representations do exist: for example, the&lt;br /&gt;
[[WikiPedia:Delta-sigma_modulation|Sigma-Delta coding]] used by the [[WikiPedia:Super_Audio_CD|SACD]], which is a form of [[wikipedia:Pulse-density_modulation|Pulse Density&lt;br /&gt;
Modulation]].  That said, Pulse Code Modulation is far&lt;br /&gt;
and away dominant, mainly because it&#039;s so mathematically&lt;br /&gt;
convenient.  An audio engineer can spend an entire career without&lt;br /&gt;
running into anything else.&lt;br /&gt;
&lt;br /&gt;
PCM encoding can be characterized in three parameters, making it easy&lt;br /&gt;
to account for every possible PCM variant with mercifully little&lt;br /&gt;
hassle.&lt;br /&gt;
&amp;lt;br style=&amp;quot;clear:both;&amp;quot;/&amp;gt;&lt;br /&gt;
===sample rate===&lt;br /&gt;
[[Image:Dmpfg_009.jpg|360px|right]]&lt;br /&gt;
[[Image:Dmpfg_008.jpg|360px|right]]&lt;br /&gt;
The first parameter is the [[wikipedia:Sampling_rate|sampling rate]].  The highest frequency an&lt;br /&gt;
encoding can represent is called the Nyquist Frequency.  The Nyquist&lt;br /&gt;
frequency of PCM happens to be exactly half the sampling rate.&lt;br /&gt;
Therefore, the sampling rate directly determines the highest possible&lt;br /&gt;
frequency in the digitized signal.&lt;br /&gt;
&lt;br /&gt;
Analog telephone systems traditionally [[wikipedia:Bandlimiting|band-limited]] voice channels to&lt;br /&gt;
just under 4kHz, so digital telephony and most classic voice&lt;br /&gt;
applications use an 8kHz sampling rate: the minimum sampling rate&lt;br /&gt;
necessary to capture the entire bandwidth of a 4kHz channel.  This is&lt;br /&gt;
what an 8kHz sampling rate sounds like&amp;amp;mdash;a bit muffled but perfectly&lt;br /&gt;
intelligible for voice.  This is the lowest sampling rate that&#039;s ever&lt;br /&gt;
been used widely in practice.&lt;br /&gt;
&lt;br /&gt;
From there, as power, and memory, and storage increased, consumer&lt;br /&gt;
computer hardware went to offering 11, and then 16, and then 22, and&lt;br /&gt;
then 32kHz sampling.  With each increase in the sampling rate and the&lt;br /&gt;
Nyquist frequency, it&#039;s obvious that the high end becomes a little&lt;br /&gt;
clearer and the sound more natural.&lt;br /&gt;
&lt;br /&gt;
The Compact Disc uses a 44.1kHz sampling rate, which is again slightly&lt;br /&gt;
better than 32kHz, but the gains are becoming less distinct.  44.1kHz&lt;br /&gt;
is a bit of an oddball choice, especially given that it hadn&#039;t been&lt;br /&gt;
used for anything prior to the compact disc, but the huge success of&lt;br /&gt;
the CD has made it a common rate.&lt;br /&gt;
&lt;br /&gt;
The most common hi-fidelity sampling rate aside from the CD is 48kHz.&lt;br /&gt;
There&#039;s virtually no audible difference between the two.  This video,&lt;br /&gt;
or at least the original version of it, was shot and produced with&lt;br /&gt;
48kHz audio, which happens to be the original standard for&lt;br /&gt;
high-fidelity audio with video.&lt;br /&gt;
&lt;br /&gt;
Super-hi-fidelity sampling rates of 88, and 96, and 192kHz have also&lt;br /&gt;
appeared. The reason for the sampling rates beyond 48kHz isn&#039;t to&lt;br /&gt;
extend the audible high frequencies further. It&#039;s for a different&lt;br /&gt;
reason.&lt;br /&gt;
&lt;br /&gt;
Stepping back for just a second, the French mathematician [[wikipedia:Joseph_Fourier|Jean&lt;br /&gt;
Baptiste Joseph Fourier]] showed that we can also think of signals like&lt;br /&gt;
audio as a set of component frequencies.  This [[wikipedia:Frequency_domain|frequency-domain]]&lt;br /&gt;
representation is equivalent to the time representation; the signal is&lt;br /&gt;
exactly the same, we&#039;re just looking at it [[wikipedia:Basis_(linear_algebra)|a different way]].  Here we see the&lt;br /&gt;
frequency-domain representation of a hypothetical analog signal we&lt;br /&gt;
intend to digitally sample.&lt;br /&gt;
&lt;br /&gt;
The sampling theorem tells us two essential things about the sampling&lt;br /&gt;
process. First, that a digital signal can&#039;t represent any&lt;br /&gt;
frequencies above the Nyquist frequency. Second, and this is the new&lt;br /&gt;
part, if we don&#039;t remove those frequencies with a low-pass filter&lt;br /&gt;
before sampling, the sampling process will fold them down into the&lt;br /&gt;
representable frequency range as [[wikipedia:Aliasing|aliasing distortion]].&lt;br /&gt;
&lt;br /&gt;
Aliasing, in a nutshell, sounds freakin&#039; awful, so it&#039;s essential to&lt;br /&gt;
remove any beyond-Nyquist frequencies before sampling and after&lt;br /&gt;
reconstruction.&lt;br /&gt;
&lt;br /&gt;
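As a quick illustration (a sketch, not a demo from the video): a tone above Nyquist produces exactly the same sample values as a tone reflected back below Nyquist.  Assuming a 48kHz sampling rate, a 30kHz tone folds down to 48-30 = 18kHz:&lt;br /&gt;
&lt;br /&gt;
/* Aliasing sketch: a 30kHz tone sampled at 48kHz yields exactly&lt;br /&gt;
   the same sample values as an inverted 18kHz tone. */&lt;br /&gt;
#include &amp;lt;math.h&amp;gt;&lt;br /&gt;
#include &amp;lt;stdio.h&amp;gt;&lt;br /&gt;
&lt;br /&gt;
int main(void) {&lt;br /&gt;
  const double PI = 3.14159265358979323846;&lt;br /&gt;
  const double fs = 48000.0;&lt;br /&gt;
  for (int n = 0; n &amp;lt; 8; n++) {&lt;br /&gt;
    double above = sin(2 * PI * 30000.0 * n / fs);&lt;br /&gt;
    double below = -sin(2 * PI * 18000.0 * n / fs);&lt;br /&gt;
    printf(&amp;quot;n=%d  30kHz: % .5f  inverted 18kHz: % .5f\n&amp;quot;, n, above, below);&lt;br /&gt;
  }&lt;br /&gt;
  return 0;&lt;br /&gt;
}&lt;br /&gt;
&lt;br /&gt;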
Human frequency perception is considered to extend to about 20kHz. In&lt;br /&gt;
44.1 or 48kHz sampling, the low pass before the sampling stage has to&lt;br /&gt;
be extremely sharp to avoid cutting any audible frequencies below&lt;br /&gt;
[[wikipedia:Hearing_range|20kHz]] but still not allow frequencies above the Nyquist to leak&lt;br /&gt;
forward into the sampling process.  This is a difficult filter to&lt;br /&gt;
build, and no practical filter succeeds completely. If the sampling&lt;br /&gt;
rate is 96kHz or 192kHz on the other hand, the low pass has an extra&lt;br /&gt;
[[wikipedia:Octave_(electronics)|octave]] or two for its transition band. This is a much easier filter to&lt;br /&gt;
build.  Sampling rates beyond 48kHz are actually one of those messy&lt;br /&gt;
analog stage compromises.&lt;br /&gt;
&amp;lt;br style=&amp;quot;clear:both;&amp;quot;/&amp;gt;&lt;br /&gt;
===sample format===&lt;br /&gt;
[[Image:Dmpfg_anim.gif|right]]&lt;br /&gt;
&lt;br /&gt;
The second fundamental PCM parameter is the sample format; that is,&lt;br /&gt;
the format of each digital number.  A number is a number, but a number&lt;br /&gt;
can be represented in bits a number of different ways.&lt;br /&gt;
&lt;br /&gt;
Early PCM was [[wikipedia:Quantization_(sound_processing)#Audio_quantization|eight-bit]] [[wikipedia:Linear_pulse_code_modulation|linear]], encoded as an [[wikipedia:Signedness|unsigned]] [[wikipedia:Integer_(computer_science)#Bytes_and_octets|byte]].  The&lt;br /&gt;
[[wikipedia:Dynamic_range#Audio|dynamic range]] is limited to about [[wikipedia:Decibel|50dB]]  and the [[wikipedia:Quantization_error|quantization noise]], as&lt;br /&gt;
you can hear, is pretty severe.  Eight-bit audio is vanishingly rare&lt;br /&gt;
today.&lt;br /&gt;
&lt;br /&gt;
Digital telephony typically uses one of two related non-linear eight&lt;br /&gt;
bit encodings called [[wikipedia:A-law_algorithm|A-law]] and [[wikipedia:Μ-law_algorithm|μ-law]]. These formats encode a roughly&lt;br /&gt;
[[wikipedia:Audio_bit_depth#Dynamic_range|14 bit dynamic range]] into eight bits by spacing the higher amplitude&lt;br /&gt;
values farther apart. A-law and mu-law obviously improve quantization&lt;br /&gt;
noise compared to linear 8-bit, and voice harmonics especially hide&lt;br /&gt;
the remaining quantization noise well. All three eight-bit encodings,&lt;br /&gt;
linear, A-law, and mu-law, are typically paired with an 8kHz sampling&lt;br /&gt;
rate, though I&#039;m demonstrating them here at 48kHz.&lt;br /&gt;
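&lt;br /&gt;
For the curious, the continuous μ-law companding curve is easy to write down.  A minimal sketch (real [[wikipedia:G.711|G.711]] approximates this curve with piecewise-linear segments, so treat it as illustration only):&lt;br /&gt;
&lt;br /&gt;
/* Continuous mu-law companding curve, mu = 255. Input x in [-1, 1];&lt;br /&gt;
   G.711 proper uses a piecewise-linear approximation of this. */&lt;br /&gt;
#include &amp;lt;math.h&amp;gt;&lt;br /&gt;
&lt;br /&gt;
double mulaw_compress(double x) {&lt;br /&gt;
  const double mu = 255.0;&lt;br /&gt;
  double sign = (x &amp;lt; 0) ? -1.0 : 1.0;&lt;br /&gt;
  return sign * log(1.0 + mu * fabs(x)) / log(1.0 + mu);&lt;br /&gt;
}&lt;br /&gt;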
&lt;br /&gt;
Most modern PCM uses 16- or 24-bit [[wikipedia:Two&#039;s_complement|two&#039;s-complement]] signed integers to&lt;br /&gt;
encode the range from negative infinity to zero decibels in 16 or 24&lt;br /&gt;
bits of precision. The maximum absolute value corresponds to zero decibels.&lt;br /&gt;
As with all the sample formats so far, signals beyond zero decibels, and thus&lt;br /&gt;
beyond the maximum representable range, are [[wikipedia:Clipping_(audio)|clipped]].&lt;br /&gt;
&lt;br /&gt;
In mixing and mastering, it&#039;s not unusual to use [[wikipedia:Floating_point|floating-point]]&lt;br /&gt;
numbers for PCM instead of [[wikipedia:Integer_(computer_science)|integers]].  A 32 bit [[wikipedia:IEEE_754-2008|IEEE754]] float, that&#039;s&lt;br /&gt;
the normal kind of floating point you see on current computers, has 24&lt;br /&gt;
bits of resolution, but an eight-bit floating-point exponent increases&lt;br /&gt;
the representable range.  Floating point usually represents zero&lt;br /&gt;
decibels as +/-1.0, and because floats can obviously represent&lt;br /&gt;
considerably beyond that, temporarily exceeding zero decibels during&lt;br /&gt;
the mixing process doesn&#039;t cause clipping.  Floating-point PCM takes&lt;br /&gt;
up more space, so it tends to be used only as an intermediate&lt;br /&gt;
production format.&lt;br /&gt;
&lt;br /&gt;
Lastly, most general purpose computers still read and&lt;br /&gt;
write data in octet bytes, so it&#039;s important to remember that samples&lt;br /&gt;
bigger than eight bits can be in [[wikipedia:Endianness|big- or little-endian order]], and both&lt;br /&gt;
endiannesses are common.  For example, Microsoft [[wikipedia:WAV|WAV]] files are little-endian,&lt;br /&gt;
and Apple [[wikipedia:AIFC|AIFC]] files tend to be big-endian.  Be aware of it.&lt;br /&gt;
&amp;lt;br style=&amp;quot;clear:both;&amp;quot;/&amp;gt;&lt;br /&gt;
===channels===&lt;br /&gt;
&lt;br /&gt;
The third PCM parameter is the number of [[wikipedia:Multichannel_audio|channels]].  The convention in&lt;br /&gt;
raw PCM is to encode multiple channels by interleaving the samples of&lt;br /&gt;
each channel together into a single stream.  Straightforward and extensible.&lt;br /&gt;
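&lt;br /&gt;
With all three parameters in hand, here&#039;s a minimal sketch (not from the video) that ties them together by emitting one second of raw PCM: 48kHz sampling rate, 16-bit signed little-endian samples, two interleaved channels:&lt;br /&gt;
&lt;br /&gt;
/* One second of raw PCM on stdout: 48kHz, 16-bit signed&lt;br /&gt;
   little-endian, stereo, samples interleaved L,R,L,R... */&lt;br /&gt;
#include &amp;lt;math.h&amp;gt;&lt;br /&gt;
#include &amp;lt;stdio.h&amp;gt;&lt;br /&gt;
&lt;br /&gt;
int main(void) {&lt;br /&gt;
  const double PI = 3.14159265358979323846;&lt;br /&gt;
  const int rate = 48000;&lt;br /&gt;
  for (int n = 0; n &amp;lt; rate; n++) {&lt;br /&gt;
    /* 1kHz tone on the left, 440Hz on the right, both at -6dB */&lt;br /&gt;
    double ch[2] = { 0.5 * sin(2 * PI * 1000.0 * n / rate),&lt;br /&gt;
                     0.5 * sin(2 * PI * 440.0 * n / rate) };&lt;br /&gt;
    for (int c = 0; c &amp;lt; 2; c++) {        /* interleave channels */&lt;br /&gt;
      long v = lrint(ch[c] * 32767.0);    /* quantize */&lt;br /&gt;
      if (v &amp;gt; 32767) v = 32767;           /* clip at zero decibels */&lt;br /&gt;
      if (v &amp;lt; -32768) v = -32768;&lt;br /&gt;
      putchar(v &amp;amp; 0xFF);                  /* low byte first: */&lt;br /&gt;
      putchar((v &amp;gt;&amp;gt; 8) &amp;amp; 0xFF);           /* little-endian   */&lt;br /&gt;
    }&lt;br /&gt;
  }&lt;br /&gt;
  return 0;&lt;br /&gt;
}&lt;br /&gt;
&lt;br /&gt;
On a typical Linux box the output can be piped straight to ALSA&#039;s aplay with matching parameters: aplay -f S16_LE -r 48000 -c 2.&lt;br /&gt;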
&amp;lt;br style=&amp;quot;clear:both;&amp;quot;/&amp;gt;&lt;br /&gt;
===done!===&lt;br /&gt;
&lt;br /&gt;
And that&#039;s it!  That describes every PCM representation ever.  Done.&lt;br /&gt;
Digital audio is &#039;&#039;so easy&#039;&#039;!  There&#039;s more to do of course, but at this&lt;br /&gt;
point we&#039;ve got a nice useful chunk of audio data, so let&#039;s get some&lt;br /&gt;
video too.&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
&amp;lt;center&amp;gt;&amp;lt;div style=&amp;quot;background-color:#DDDDFF;border-color:#CCCCDD;border-style:solid;width:80%;padding:0 1em 1em 1em;text-align:left;&amp;quot;&amp;gt;&lt;br /&gt;
&#039;&#039;&#039;Going deeper…&#039;&#039;&#039;&lt;br /&gt;
* [[wikipedia:Roll-off|Wikipedia&#039;s article on filter roll-off]], to learn why it&#039;s hard to build analog filters with a very narrow [[wikipedia:Transition_band|transition band]] between the [[wikipedia:Passband|passband]] and the [[wikipedia:Stopband|stopband]].&lt;br /&gt;
* [http://wiki.multimedia.cx/index.php?title=PCM Some more minutiae] about PCM in practice.&lt;br /&gt;
* [[wikipedia:DPCM|DPCM]] and [[wikipedia:ADPCM|ADPCM]], simple audio codecs loosely inspired by PCM.&lt;br /&gt;
&amp;lt;/div&amp;gt;&amp;lt;/center&amp;gt;&lt;br /&gt;
&lt;br /&gt;
&amp;lt;br style=&amp;quot;clear:both;&amp;quot;/&amp;gt;&lt;br /&gt;
&lt;br /&gt;
==Video vegetables (they&#039;re good for you!)==&lt;br /&gt;
[[Image:Dmpfg_010.jpg|360px|right]]&lt;br /&gt;
&amp;lt;small&amp;gt;[[Talk:A_Digital_Media_Primer_For_Geeks_(episode_1)#Video_vegetables_.28they.27re_good_for_you.21.29|Discuss this section]]&amp;lt;/small&amp;gt;&lt;br /&gt;
&lt;br /&gt;
One could think of video as being like audio but with two additional&lt;br /&gt;
spatial dimensions, X and Y, in addition to the dimension of time.&lt;br /&gt;
This is mathematically sound. The Sampling Theorem applies to all&lt;br /&gt;
three video dimensions just as it does the single time dimension of&lt;br /&gt;
audio.&lt;br /&gt;
&lt;br /&gt;
Audio and video are obviously quite different in practice. For one,&lt;br /&gt;
compared to audio, video is huge. [[wikipedia:Red_Book_(audio_Compact_Disc_standard)#Technical_details|Raw CD audio]] is about 1.4 megabits&lt;br /&gt;
per second. Raw [[wikipedia:1080i|1080i]] HD video is over 700 megabits per second. That&#039;s&lt;br /&gt;
more than 500 times more data to capture, process, and store per&lt;br /&gt;
second.  By [[wikipedia:Moore&#039;s_law|Moore&#039;s law]]... that&#039;s... let&#039;s see... roughly eight&lt;br /&gt;
doublings times two years, so yeah, computers requiring about an extra&lt;br /&gt;
fifteen years to handle raw video after getting raw audio down pat was&lt;br /&gt;
about right.&lt;br /&gt;
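&lt;br /&gt;
To spell that arithmetic out under one common accounting (8-bit 4:2:0 sampling at 30 frames per second): 1920 × 1080 pixels × 12 bits per pixel × 30 frames per second ≈ 746 megabits per second, versus CD audio&#039;s 44100 samples per second × 16 bits × 2 channels ≈ 1.4 megabits per second.&lt;br /&gt;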
&lt;br /&gt;
Basic raw video is also just more complex than basic raw audio. The&lt;br /&gt;
sheer volume of data currently necessitates a representation more&lt;br /&gt;
efficient than the linear PCM used for audio.  In addition, electronic&lt;br /&gt;
video comes almost entirely from broadcast television alone, and the&lt;br /&gt;
standards committees that govern broadcast video have always been very&lt;br /&gt;
concerned with backward compatibility.  Up until just last year in the&lt;br /&gt;
US, a sixty-year-old black and white television could still show a&lt;br /&gt;
normal [[wikipedia:NTSC|analog television broadcast]].  That&#039;s actually a really neat&lt;br /&gt;
trick.&lt;br /&gt;
&lt;br /&gt;
The downside to backward compatibility is that once a detail makes it&lt;br /&gt;
into a standard, you can&#039;t ever really throw it out again. Electronic&lt;br /&gt;
video has never started over from scratch the way audio has multiple&lt;br /&gt;
times.  Sixty years worth of clever but obsolete hacks necessitated by&lt;br /&gt;
the passing technology of a given era have built up into quite a pile,&lt;br /&gt;
and because digital standards also come from broadcast television, all&lt;br /&gt;
these eldritch hacks have been brought forward into the digital&lt;br /&gt;
standards as well.&lt;br /&gt;
&lt;br /&gt;
In short, there are a whole lot more details involved in digital video&lt;br /&gt;
than there were with audio. There&#039;s no hope of covering them&lt;br /&gt;
all completely here, so we&#039;ll cover the broad fundamentals.&lt;br /&gt;
&amp;lt;br style=&amp;quot;clear:both;&amp;quot;/&amp;gt;&lt;br /&gt;
===resolution and aspect===&lt;br /&gt;
[[Image:Dmpfg_011.jpg|360px|right]]&lt;br /&gt;
&lt;br /&gt;
The most obvious raw video parameters are the width and height of the&lt;br /&gt;
picture in pixels. As simple as that may sound, the pixel dimensions&lt;br /&gt;
alone don&#039;t actually specify the absolute width and height of the&lt;br /&gt;
picture, as most broadcast-derived video doesn&#039;t use square pixels.&lt;br /&gt;
The number of [[wikipedia:Scan_line|scanlines]] in a broadcast image was fixed, but the&lt;br /&gt;
effective number of horizontal pixels was a function of channel&lt;br /&gt;
[[wikipedia:Bandwidth_(signal_processing)|bandwidth]]. Effective horizontal resolution could result in pixels that&lt;br /&gt;
were either narrower or wider than the spacing between scanlines.&lt;br /&gt;
&lt;br /&gt;
Standards have generally specified that digitally sampled video should&lt;br /&gt;
reflect the real resolution of the original analog source, so a large&lt;br /&gt;
amount of digital video also uses non-square pixels. For example, a&lt;br /&gt;
normal 4:3 aspect NTSC DVD is typically encoded with a display&lt;br /&gt;
resolution of [[wikipedia:DVD-Video#Frame_size_and_frame_rate|704 by 480]], a ratio wider than 4:3.  In this case, the&lt;br /&gt;
pixels themselves are assigned an aspect ratio of [[wikipedia:Standard-definition_television#Resolution|10:11]], making them&lt;br /&gt;
taller than they are wide and narrowing the image horizontally to the&lt;br /&gt;
correct aspect.  Such an image has to be resampled to show properly on&lt;br /&gt;
a digital display with square pixels.&lt;br /&gt;
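&lt;br /&gt;
The arithmetic works out neatly: 704 pixels × 10/11 = 640 square-pixel widths, and 640:480 is exactly 4:3.&lt;br /&gt;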
&amp;lt;br style=&amp;quot;clear:both;&amp;quot;/&amp;gt;&lt;br /&gt;
===frame rate and interlacing===&lt;br /&gt;
[[Image:Dmpfg_012.jpg|360px|right]]&lt;br /&gt;
&lt;br /&gt;
The second obvious video parameter is the [[wikipedia:Frame_rate|frame rate]], the number of&lt;br /&gt;
full frames per second.  Several standard frame rates are in active&lt;br /&gt;
use. Digital video, in one form or another, can use all of them.  Or,&lt;br /&gt;
any other frame rate.  Or even variable rates where the frame rate&lt;br /&gt;
changes adaptively over the course of the video. The higher the frame&lt;br /&gt;
rate, the smoother the motion, and that brings us, unfortunately, to&lt;br /&gt;
[[wikipedia:Interlace|interlacing]].&lt;br /&gt;
&lt;br /&gt;
In the very earliest days of broadcast video, engineers sought the&lt;br /&gt;
fastest practical frame rate to smooth motion and to minimize [[wikipedia:Flicker_(screen)|flicker]]&lt;br /&gt;
on phosphor-based [[wikipedia:Cathode_ray_tube|CRTs]].  They were also under pressure to use the&lt;br /&gt;
least possible bandwidth for the highest resolution and fastest frame&lt;br /&gt;
rate.  Their solution was to interlace the video where the even lines&lt;br /&gt;
are sent in one pass and the odd lines in the next.  Each pass is&lt;br /&gt;
called a field and two fields sort of produce one complete frame.&lt;br /&gt;
&amp;quot;Sort of&amp;quot;, because the even and odd fields aren&#039;t actually from the&lt;br /&gt;
same source frame.  In a 60 field per second picture, the source frame&lt;br /&gt;
rate is actually 60 full frames per second, and half of each frame,&lt;br /&gt;
every other line, is simply discarded.  This is why we can&#039;t&lt;br /&gt;
[[wikipedia:Deinterlacing|deinterlace]] a video simply by combining two fields into one frame;&lt;br /&gt;
they&#039;re not actually from one frame to begin with.&lt;br /&gt;
&amp;lt;br style=&amp;quot;clear:both;&amp;quot;/&amp;gt;&lt;br /&gt;
&lt;br /&gt;
===gamma===&lt;br /&gt;
[[Image:Dmpfg_013.jpg|360px|right]]&lt;br /&gt;
&lt;br /&gt;
The cathode ray tube was the only available display technology for&lt;br /&gt;
most of the history of electronic video. A CRT&#039;s output brightness is&lt;br /&gt;
nonlinear, approximately equal to the input controlling voltage raised&lt;br /&gt;
to the 2.5th power. This exponent, 2.5, is designated gamma, and so&lt;br /&gt;
it&#039;s often referred to as the gamma of a display.  Cameras, though,&lt;br /&gt;
are linear, and if you feed a CRT a linear input signal, it looks a&lt;br /&gt;
bit like this.&lt;br /&gt;
&lt;br /&gt;
As there were originally to be very few cameras, which were&lt;br /&gt;
fantastically expensive anyway, and hopefully many, many television&lt;br /&gt;
sets, which had best be as inexpensive as possible, engineers decided to&lt;br /&gt;
add the necessary [[wikipedia:Gamma_correction|gamma correction]] circuitry to the cameras rather&lt;br /&gt;
than the sets. Video transmitted over the airwaves would thus have a&lt;br /&gt;
nonlinear intensity using the inverse of the set&#039;s gamma exponent, so that&lt;br /&gt;
once a camera&#039;s signal was finally displayed on the CRT, the overall&lt;br /&gt;
response of the system from camera to set was back to linear again.&lt;br /&gt;
&lt;br /&gt;
Almost.&lt;br /&gt;
&lt;br /&gt;
There were also two other tweaks. A television camera actually uses a&lt;br /&gt;
gamma exponent that&#039;s the inverse of 2.2, not 2.5.  That&#039;s just a&lt;br /&gt;
correction for viewing in a dim environment. Also, the exponential&lt;br /&gt;
curve transitions to a linear ramp near black.  That&#039;s just an old&lt;br /&gt;
hack for suppressing sensor noise in the camera.&lt;br /&gt;
&lt;br /&gt;
Gamma correction also had a lucky benefit. It just so happens that the&lt;br /&gt;
human eye has a perceptual gamma of about 3.  This is relatively close&lt;br /&gt;
to the CRT&#039;s gamma of 2.5. An image using gamma correction devotes&lt;br /&gt;
more resolution to lower intensities, where the eye happens to have&lt;br /&gt;
its finest intensity discrimination, and therefore uses the available&lt;br /&gt;
scale resolution more efficiently.  Although CRTs are currently&lt;br /&gt;
vanishing, a standard [[wikipedia:sRGB|sRGB]] computer display still uses a nonlinear&lt;br /&gt;
intensity curve similar to television, with a linear ramp near black,&lt;br /&gt;
followed by an exponential curve with a gamma exponent of 2.4. This&lt;br /&gt;
encodes a sixteen bit linear range down into eight bits.&lt;br /&gt;
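&lt;br /&gt;
That sRGB curve is compact enough to state exactly; a minimal sketch of the linear-to-encoded transfer function:&lt;br /&gt;
&lt;br /&gt;
/* sRGB transfer function: linear light in [0,1] to the encoded&lt;br /&gt;
   value in [0,1]. A linear ramp near black, then a 2.4 exponent. */&lt;br /&gt;
#include &amp;lt;math.h&amp;gt;&lt;br /&gt;
&lt;br /&gt;
double srgb_encode(double lin) {&lt;br /&gt;
  if (lin &amp;lt;= 0.0031308)&lt;br /&gt;
    return 12.92 * lin;&lt;br /&gt;
  return 1.055 * pow(lin, 1.0 / 2.4) - 0.055;&lt;br /&gt;
}&lt;br /&gt;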
&amp;lt;br style=&amp;quot;clear:both;&amp;quot;/&amp;gt;&lt;br /&gt;
&lt;br /&gt;
===color and colorspace===&lt;br /&gt;
[[Image:Dmpfg_014.jpg|360px|right]]&lt;br /&gt;
&lt;br /&gt;
The human eye has three apparent color channels, red, green, and blue,&lt;br /&gt;
and most displays use these three colors as [[wikipedia:Additive_color|additive primaries]] to&lt;br /&gt;
produce a full range of color output.  The primary pigments in&lt;br /&gt;
printing are [[wikipedia:CMYK|Cyan, Magenta, and Yellow]] for the same reason; pigments&lt;br /&gt;
are [[wikipedia:Subtractive_color|subtractive]], and each of these pigments subtracts one pure color&lt;br /&gt;
from reflected light.  Cyan subtracts red, magenta subtracts green, and&lt;br /&gt;
yellow subtracts blue.&lt;br /&gt;
&lt;br /&gt;
Video can be, and sometimes is, represented with red, green, and blue&lt;br /&gt;
color channels, but RGB video is atypical. The human eye is far more&lt;br /&gt;
sensitive to [[wikipedia:Luminance_(relative)|luminosity]] than it is to color, and RGB tends to spread&lt;br /&gt;
the energy of an image across all three color channels.  That is, the&lt;br /&gt;
red plane looks like a red version of the original picture, the green&lt;br /&gt;
plane looks like a green version of the original picture, and the blue&lt;br /&gt;
plane looks like a blue version of the original picture.  Black and&lt;br /&gt;
white times three.  Not efficient.&lt;br /&gt;
&lt;br /&gt;
For those reasons and because, oh hey, television just happened to&lt;br /&gt;
start out as black and white anyway, video usually is represented as a&lt;br /&gt;
high resolution [[wikipedia:Luma_(video)|luma channel]]&amp;amp;mdash;the black &amp;amp; white&amp;amp;mdash;along with&lt;br /&gt;
additional, often lower resolution [[wikipedia:Chrominance|chroma channels]], the color. The&lt;br /&gt;
luma channel, Y, is produced by weighting and then adding the separate&lt;br /&gt;
red, green and blue signals.  The chroma channels U and V are then&lt;br /&gt;
produced by subtracting the luma signal from blue and the luma signal&lt;br /&gt;
from red.&lt;br /&gt;
&lt;br /&gt;
When YUV is scaled, offset, and quantized for digital video, it&#039;s&lt;br /&gt;
usually more correctly called [[wikipedia:Y&#039;CbCr|Y&#039;CbCr]], but the more generic term YUV is&lt;br /&gt;
widely used to describe all the analog and digital variants of this&lt;br /&gt;
color model.&lt;br /&gt;
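&lt;br /&gt;
As a sketch of that weighting, using the [[wikipedia:Rec. 601|Rec. 601]] luma coefficients (real Y&#039;CbCr then scales and offsets these values, differently per standard, so this is just the core idea):&lt;br /&gt;
&lt;br /&gt;
/* Luma plus two color-difference channels from gamma-corrected&lt;br /&gt;
   R&#039;G&#039;B&#039; in [0,1], with the Rec. 601 luma weights. */&lt;br /&gt;
void rgb_to_yuv(double r, double g, double b,&lt;br /&gt;
                double *y, double *u, double *v) {&lt;br /&gt;
  *y = 0.299 * r + 0.587 * g + 0.114 * b;  /* weighted sum    */&lt;br /&gt;
  *u = b - *y;                             /* blue minus luma */&lt;br /&gt;
  *v = r - *y;                             /* red minus luma  */&lt;br /&gt;
}&lt;br /&gt;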
&amp;lt;br style=&amp;quot;clear:both;&amp;quot;/&amp;gt;&lt;br /&gt;
&lt;br /&gt;
===chroma subsampling===&lt;br /&gt;
[[Image:Dmpfg_015.jpg|360px|right]]&lt;br /&gt;
&lt;br /&gt;
The U and V chroma channels can have the same resolution as the Y&lt;br /&gt;
channel, but because the human eye has far less spatial color&lt;br /&gt;
resolution than spatial luminosity resolution, chroma resolution is&lt;br /&gt;
usually [[wikipedia:Chroma_subsampling|halved or even quartered]] in the horizontal direction, the&lt;br /&gt;
vertical direction, or both, usually without any significant impact on the&lt;br /&gt;
apparent raw image quality.  Practically every possible subsampling&lt;br /&gt;
variant has been used at one time or another, but the common choices&lt;br /&gt;
today are [[wikipedia:Chroma_subsampling#4:4:4_Y.27CbCr|4:4:4]] video, which isn&#039;t actually subsampled at all, [[wikipedia:Chroma_subsampling#4:2:2|4:2:2]] video in&lt;br /&gt;
which the horizontal resolution of the U and V channels is halved, and&lt;br /&gt;
most common of all, [[wikipedia:Chroma_subsampling#4:2:0|4:2:0]] video in which both the horizontal and vertical&lt;br /&gt;
resolutions of the chroma channels are halved, resulting in U and V&lt;br /&gt;
planes that are each one quarter the size of Y.&lt;br /&gt;
&lt;br /&gt;
The terms 4:2:2, 4:2:0, [[wikipedia:Chroma_subsampling#4:1:1|4:1:1]], and so on and so forth, aren&#039;t complete&lt;br /&gt;
descriptions of a chroma subsampling. There are multiple possible ways&lt;br /&gt;
to position the chroma pixels relative to luma, and again, several&lt;br /&gt;
variants are in active use for each subsampling.  For example, [[wikipedia:Motion_Jpeg|motion&lt;br /&gt;
JPEG]], [[wikipedia:MPEG-1#Part_2:_Video|MPEG-1 video]], [[wikipedia:MPEG-2#Video_coding_.28simplified.29|MPEG-2 video]], [[wikipedia:DV#DV_Compression|DV]], [[wikipedia:Theora|Theora]], and [[wikipedia:WebM|WebM]] all use or can&lt;br /&gt;
use 4:2:0 subsampling, but they site the chroma pixels [http://www.mir.com/DMG/chroma.html three different ways].&lt;br /&gt;
&lt;br /&gt;
Motion JPEG, MPEG-1 video, Theora and WebM all site chroma pixels&lt;br /&gt;
between luma pixels both horizontally and vertically.&lt;br /&gt;
&lt;br /&gt;
MPEG-2 sites chroma pixels between lines, but horizontally aligned with&lt;br /&gt;
every other luma pixel. Interlaced modes complicate things somewhat,&lt;br /&gt;
resulting in a siting arrangement that&#039;s a tad bizarre.&lt;br /&gt;
&lt;br /&gt;
And finally PAL-DV, which is always interlaced, places the chroma&lt;br /&gt;
pixels in the same position as every other luma pixel in the&lt;br /&gt;
horizontal direction, and vertically alternates chroma channel on&lt;br /&gt;
each line.&lt;br /&gt;
&lt;br /&gt;
That&#039;s just 4:2:0 video. I&#039;ll leave the other subsamplings as homework for the&lt;br /&gt;
viewer.  Got the basic idea, moving on.&lt;br /&gt;
&amp;lt;br style=&amp;quot;clear:both;&amp;quot;/&amp;gt;&lt;br /&gt;
&lt;br /&gt;
===pixel formats===&lt;br /&gt;
[[Image:Dmpfg_016.jpg|360px|right]]&lt;br /&gt;
&lt;br /&gt;
In audio, we always represent multiple channels in a PCM stream by&lt;br /&gt;
interleaving the samples from each channel in order. Video uses both&lt;br /&gt;
packed formats that interleave the color channels, as well as planar&lt;br /&gt;
formats that keep the pixels from each channel together in separate&lt;br /&gt;
planes stacked in order in the frame. There are at least [http://www.fourcc.org/yuv.php 50 different formats] in&lt;br /&gt;
these two broad categories with possibly ten or fifteen in common use.&lt;br /&gt;
&lt;br /&gt;
Each chroma subsampling and different bit-depth requires a different&lt;br /&gt;
packing arrangement, and so a different pixel format.  For a given&lt;br /&gt;
unique subsampling, there are usually also several equivalent formats&lt;br /&gt;
that consist of trivial channel order rearrangements or repackings, due either to&lt;br /&gt;
convenience once-upon-a-time on some particular piece of hardware, or&lt;br /&gt;
sometimes just good old-fashioned spite.&lt;br /&gt;
&lt;br /&gt;
Pixel formats are described by a unique name or [[wikipedia:FourCC|fourcc]] code.  There&lt;br /&gt;
are quite a few of these and there&#039;s no sense going over each one now.&lt;br /&gt;
Google is your friend.  Be aware that fourcc codes for raw video&lt;br /&gt;
specify the pixel arrangement and chroma subsampling, but generally&lt;br /&gt;
don&#039;t imply anything certain about chroma siting or color space.  [http://www.fourcc.org/yuv.php#UYVY YV12]&lt;br /&gt;
video, to pick one, can use JPEG, MPEG-2 or DV chroma siting, and any&lt;br /&gt;
one of [[wikipedia:YUV#BT.709_and_BT.601|several YUV colorspace definitions]].&lt;br /&gt;
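&lt;br /&gt;
Sticking with that example, here&#039;s a sketch of how a planar 4:2:0 buffer like YV12 is laid out, assuming even dimensions (note that YV12 stores the V plane before U; the otherwise-identical I420 format swaps them):&lt;br /&gt;
&lt;br /&gt;
/* Plane pointers inside a YV12 (planar 4:2:0) frame buffer:&lt;br /&gt;
   full-size Y, then quarter-size V, then quarter-size U. */&lt;br /&gt;
#include &amp;lt;stddef.h&amp;gt;&lt;br /&gt;
&lt;br /&gt;
void yv12_planes(unsigned char *buf, int w, int h,&lt;br /&gt;
                 unsigned char **y, unsigned char **v,&lt;br /&gt;
                 unsigned char **u) {&lt;br /&gt;
  size_t ysize = (size_t)w * h;&lt;br /&gt;
  size_t csize = (size_t)(w / 2) * (h / 2);&lt;br /&gt;
  *y = buf;                  /* w   by h   luma plane   */&lt;br /&gt;
  *v = buf + ysize;          /* w/2 by h/2 chroma plane */&lt;br /&gt;
  *u = buf + ysize + csize;  /* w/2 by h/2 chroma plane */&lt;br /&gt;
  /* whole frame: ysize + 2*csize = w*h*3/2 bytes */&lt;br /&gt;
}&lt;br /&gt;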
&amp;lt;br style=&amp;quot;clear:both;&amp;quot;/&amp;gt;&lt;br /&gt;
&lt;br /&gt;
===done!===&lt;br /&gt;
&lt;br /&gt;
That wraps up our not-so-quick and yet very incomplete tour of raw&lt;br /&gt;
video. The good news is we can already get quite a lot of real work&lt;br /&gt;
done using that overview. In plenty of situations, a frame of video&lt;br /&gt;
data is a frame of video data.  The details matter, greatly, when it&lt;br /&gt;
comes time to write software, but for now I am satisfied that the&lt;br /&gt;
esteemed viewer is broadly aware of the relevant issues.&lt;br /&gt;
&lt;br /&gt;
&amp;lt;center&amp;gt;&amp;lt;div style=&amp;quot;background-color:#DDDDFF;border-color:#CCCCDD;border-style:solid;width:80%;padding:0 1em 1em 1em;text-align:left;&amp;quot;&amp;gt;&lt;br /&gt;
&#039;&#039;&#039;Going deeper…&#039;&#039;&#039;&lt;br /&gt;
* [http://wiki.multimedia.cx/index.php?title=YUV4MPEG2 The y4m format] is the most common simple container for raw YUV video.  People occasionally use [[OggYUV]] to store it in Ogg instead.&lt;br /&gt;
* Learn about [[wikipedia:High_dynamic_range_imaging|high dynamic range imaging]], which achieves better representation of the full range of brightnesses in the real world by using more than 8 bits per channel.&lt;br /&gt;
* Learn about how [[wikipedia:Trichromatic_vision|trichromatic color vision]] works in humans, and how human color perception is encoded in the [[wikipedia:CIE 1931 color space|CIE 1931 XYZ color space]].&lt;br /&gt;
** Compare with the [[wikipedia:Lab_color_space|Lab color space]], mathematically equivalent but structured to account for &amp;quot;perceptual uniformity&amp;quot;.&lt;br /&gt;
** If we were all [[wikipedia:Dichromacy|dichromats]] then video would only need two color channels.  Some humans might be [[wikipedia:Tetrachromacy#Possibility_of_human_tetrachromats|tetrachromats]], in which case they would need an additional color channel for video to fully represent their vision.&lt;br /&gt;
** [http://www.xritephoto.com/ph_toolframe.aspx?action=coloriq Test your color vision] (or at least your monitor).&lt;br /&gt;
* YCbCr is defined in terms of RGB by the ITU in two incompatible standards: [[wikipedia:Rec. 601|Rec. 601]] and [[wikipedia:Rec. 709|Rec. 709]].  Both conversion standards are lossy, which has prompted some to adopt a lossless alternative called [http://wiki.multimedia.cx/index.php?title=YCoCg YCoCg].&lt;br /&gt;
&amp;lt;/div&amp;gt;&amp;lt;/center&amp;gt;&lt;br /&gt;
&lt;br /&gt;
&amp;lt;br style=&amp;quot;clear:both;&amp;quot;/&amp;gt;&lt;br /&gt;
&lt;br /&gt;
==Containers==&lt;br /&gt;
[[Image:Dmpfg_017.jpg|360px|right]]&lt;br /&gt;
&amp;lt;small&amp;gt;[[Talk:A_Digital_Media_Primer_For_Geeks_(episode_1)#Containers|Discuss this section]]&amp;lt;/small&amp;gt;&lt;br /&gt;
&lt;br /&gt;
So. We have audio data. We have video data. What remains is the more&lt;br /&gt;
familiar non-signal data and straight-up engineering that software&lt;br /&gt;
developers are used to, and plenty of it.&lt;br /&gt;
&lt;br /&gt;
Chunks of raw audio and video data have no externally-visible&lt;br /&gt;
structure, but they&#039;re often uniformly sized.  We could just string&lt;br /&gt;
them together in a rigid predetermined ordering for streaming and&lt;br /&gt;
storage, and some simple systems do approximately that. Compressed&lt;br /&gt;
frames, though, aren&#039;t necessarily a predictable size, and we usually want&lt;br /&gt;
some flexibility in using a range of different data types in streams.&lt;br /&gt;
If we string random formless data together, we lose the boundaries&lt;br /&gt;
that separate frames and don&#039;t necessarily know what data belongs to&lt;br /&gt;
which streams.  A stream needs some generalized structure to be&lt;br /&gt;
generally useful.&lt;br /&gt;
&lt;br /&gt;
In addition to our signal data, we also have our PCM and video&lt;br /&gt;
parameters.  There&#039;s probably plenty of other [[wikipedia:Metadata#Video|metadata]] we also want to&lt;br /&gt;
deal with, like audio tags and video chapters and subtitles, all&lt;br /&gt;
essential components of rich media.  It makes sense to place this&lt;br /&gt;
metadata&amp;amp;mdash;that is, data about the data&amp;amp;mdash;within the media itself.&lt;br /&gt;
&lt;br /&gt;
Storing and structuring formless data and disparate metadata is the&lt;br /&gt;
job of a [[wikipedia:Container_format_(digital)|container]].  Containers provide framing for the data blobs,&lt;br /&gt;
interleave and identify multiple data streams, provide timing&lt;br /&gt;
information, and store the metadata necessary to parse, navigate,&lt;br /&gt;
manipulate, and present the media.  In general, any container can hold&lt;br /&gt;
any kind of data.  And data can be put into any container.&lt;br /&gt;
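&lt;br /&gt;
To make the framing job concrete, here&#039;s a purely hypothetical record of the kind a container keeps for each data blob.  Every field here is invented for illustration; real containers (Ogg, Matroska, MP4...) each define their own layouts:&lt;br /&gt;
&lt;br /&gt;
/* Hypothetical per-blob framing record, for illustration only. */&lt;br /&gt;
#include &amp;lt;stdint.h&amp;gt;&lt;br /&gt;
&lt;br /&gt;
struct toy_packet {&lt;br /&gt;
  uint32_t stream_id;  /* which interleaved stream this belongs to */&lt;br /&gt;
  uint64_t timestamp;  /* presentation time in some fixed timebase */&lt;br /&gt;
  uint32_t length;     /* payload size in bytes, restoring framing */&lt;br /&gt;
  /* ...followed by that many bytes of codec data */&lt;br /&gt;
};&lt;br /&gt;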
&amp;lt;br style=&amp;quot;clear:both;&amp;quot;/&amp;gt;&lt;br /&gt;
&lt;br /&gt;
==Credits==&lt;br /&gt;
&lt;br /&gt;
In the past thirty minutes, we&#039;ve covered digital audio, video, some&lt;br /&gt;
history, some math and a little engineering. We&#039;ve barely scratched the&lt;br /&gt;
surface, but it&#039;s time for a well-earned break.&lt;br /&gt;
&lt;br /&gt;
There&#039;s so much more to talk about, so I hope you&#039;ll join me again in&lt;br /&gt;
our next episode.  Until then&amp;amp;mdash;Cheers!&lt;br /&gt;
&lt;br /&gt;
Written by:&lt;br /&gt;
Christopher (Monty) Montgomery&lt;br /&gt;
and the Xiph.Org Community&lt;br /&gt;
&lt;br /&gt;
Intro, title and credits music:&amp;lt;br&amp;gt;&lt;br /&gt;
&amp;quot;Boo Boo Coming&amp;quot;, by Joel Forrester&amp;lt;br&amp;gt;&lt;br /&gt;
Performed by the Microscopic Septet&amp;lt;br&amp;gt;&lt;br /&gt;
Used by permission of Cuneiform Records.&amp;lt;br&amp;gt;&lt;br /&gt;
Original source track All Rights Reserved.&amp;lt;br&amp;gt;&lt;br /&gt;
[http://www.cuneiformrecords.com www.cuneiformrecords.com]&lt;br /&gt;
&lt;br /&gt;
This Video Was Produced Entirely With Free and Open Source Software&lt;br /&gt;
&lt;br /&gt;
GNU&lt;br /&gt;
Linux&lt;br /&gt;
Fedora&lt;br /&gt;
Cinelerra&lt;br /&gt;
The Gimp&lt;br /&gt;
Audacity&lt;br /&gt;
Postfish&lt;br /&gt;
Gstreamer&lt;br /&gt;
&lt;br /&gt;
CC BY-NC-SA&lt;br /&gt;
A Co-Production of Xiph.Org and Red Hat Inc.&lt;br /&gt;
(C) 2010, Some Rights Reserved&lt;br /&gt;
&lt;br /&gt;
&amp;lt;br style=&amp;quot;clear:both;&amp;quot;/&amp;gt;&lt;br /&gt;
&lt;br /&gt;
==The making of…==&lt;br /&gt;
&amp;lt;small&amp;gt;[[Talk:A_Digital_Media_Primer_For_Geeks_(episode_1)#The_making_of.E2.80.A6|Discuss this section]]&amp;lt;/small&amp;gt;&lt;br /&gt;
===Equipment===&lt;br /&gt;
====Camera====&lt;br /&gt;
Canon HV40 HDV camera w/ wide-angle lens operating on a tripod.&lt;br /&gt;
&lt;br /&gt;
The wide angle lens gives the camera a nice close macro mode, and approximately triples the amount of light coming into the sensor for a given zoom/aperture.  Useful for shooting indoors at night.&lt;br /&gt;
&lt;br /&gt;
No additional lighting kit was used.&lt;br /&gt;
&lt;br /&gt;
====Audio====&lt;br /&gt;
&lt;br /&gt;
Two Crown PCC160 boundary microphones placed on a table approximately 4-8 feet in front of the speaker, run through a cheap Behringer portable mixer and into the camera&#039;s microphone input.  &lt;br /&gt;
&lt;br /&gt;
No additional audio kit was used.&lt;br /&gt;
&lt;br /&gt;
====Sundries====&lt;br /&gt;
&lt;br /&gt;
Whiteboard markers by &#039;Bic&#039;&lt;br /&gt;
&lt;br /&gt;
Drawing aids by Staedtler, McMaster-Carr, and &#039;Generic&#039;.&lt;br /&gt;
&lt;br /&gt;
===Video shooting sequence===&lt;br /&gt;
&lt;br /&gt;
Scenes were pre-scripted and memorized, usually with lots of on-the-fly revision.  (In the future... I&#039;m totally getting a teleprompter.  Wh000.  OTOH, I can totally rattle off the entire video script from beginning to end as a party trick, thus ensuring I&#039;ll not be invited to many parties --Monty).&lt;br /&gt;
&lt;br /&gt;
Diagrams were drawn by hand on a physical whiteboard with whiteboard markers and magnetic T-squares, triangles, and yardsticks.  Despite looking a lot like greenscreen work, there is no image compositing in use (actually-- there are two small composites where an error in a whiteboard diagram was corrected by subtracting part of the original image and then adding a corrected version of the diagram).&lt;br /&gt;
&lt;br /&gt;
Camera operated in 24F shutter priority mode (Tv set to &amp;quot;24&amp;quot;) with exposure and white balance both calibrated to the white board (or a white piece of paper) and locked.  Microphone attenuation setting active, with gain locked such that room noise peaked at -40dB (all the rooms in the shooting sequences were noisy due to the building&#039;s ventilation system, or active equipment). Lighting in the whiteboard rooms tended to be odd, with little relative light cast on a presenter standing just in front of the whiteboard; a presenter is practically standing in the room&#039;s only shadow.  Most of the room light is focused on the table and walls.  Additional fill lighting kit would have been useful, but for the first vid, I didn&#039;t want &#039;perfect&#039; to be the enemy of &#039;good&#039;.&lt;br /&gt;
&lt;br /&gt;
Autofocus used for whiteboard scenes, manual focus used for several workshop scenes as the autofocus tended to hunt continuously in very low light.&lt;br /&gt;
&lt;br /&gt;
Continuous capture to a Thinkpad with firewire input via a simple gstreamer script.&lt;br /&gt;
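&lt;br /&gt;
The script itself isn&#039;t reproduced here; as a minimal stand-in (an assumption, not the original), GStreamer 0.10&#039;s hdv1394src element from gst-plugins-bad can dump the MPEG-2 transport stream an HDV camera sends over firewire:&lt;br /&gt;
&lt;br /&gt;
gst-launch-0.10 hdv1394src ! filesink location=capture.m2t&lt;br /&gt;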
&lt;br /&gt;
===Production sequence===&lt;br /&gt;
====All hail Cinelerra.  You better hail, or Cinelerra will get pissy about it.====&lt;br /&gt;
&lt;br /&gt;
Most of the production sequence hinged on making Cinelerra happy; it is a hulking rusty cast iron WWI tank of a program that can seem like it&#039;s composed entirely of compressed bugs.  That said, it was neither particularly crashy nor did it ever accidentally corrupt or lose work.  It was also the only FOSS editor with a working 2D compositor.  It got the job done once I found a workflow it would cope with (and fixed a number of bugs; these fixes are available from my cinelerra Git repo at http://git.xiph.org/?p=users/xiphmont/cinelerraCV.git;a=summary)&lt;br /&gt;
&lt;br /&gt;
====Choosing takes====&lt;br /&gt;
&lt;br /&gt;
Each shooting session yielded four to six hours of raw video.  The first step was to load the raw video into the cinelerra timeline, label each complete take, compare and choose the take to use, then render the chosen take out to a raw clip as a YUV4MPEG raw video file and a WAV raw audio file.  Be careful that Settings-&amp;gt;Align Cursor On Frames is set, else the audio and video renders won&#039;t start on the same boundary.&lt;br /&gt;
&lt;br /&gt;
====Postprocessing====&lt;br /&gt;
&lt;br /&gt;
At this point, the raw video clips were adjusted for gamma, contrast, and saturation in gstreamer and mplayer.  In the earlier shoots the camera was underexposing due to pilot error, which required quite a bit of gamma and saturation inflation to &#039;correct&#039; (there is no real correction as the low-end data is gone, but it&#039;s possible to make it look better).  Later shoots used saner settings and the adjustments were mostly to keep different shooting sessions more uniform.  The whiteboard tends not to look white because it&#039;s mildly reflective, and picked up the color of the cyan and orange audio baffles in the room like a big diffuse mirror.&lt;br /&gt;
&lt;br /&gt;
The audio was both noisy (due to the building&#039;s ventilation system, which sounded like either a low, loud rumble or a jet engine taking off) and reverberant (the rooms were glass on two sides and plaster on the other two).  Early takes used no additional sound-absorbing material in the rooms, and the Postfish filtering and deverb were used heavily.  This gives the early audio in the vid a slightly odd, processed feel (I had almost decided the original audio was simply unusable).  Later takes used some big fleece &#039;soft flats&#039; in the room to absorb some additional reverb, and the later takes are less heavily filtered.&lt;br /&gt;
&lt;br /&gt;
The Postfish filtering chain used declip (for the occasional overrange oops), deverb (to remove room reverberation), multicompand (noise gating), single compand (for volume levelling), and EQ (the Crown mics are nice, but are very midrange heavy).&lt;br /&gt;
&lt;br /&gt;
====Special Effects====&lt;br /&gt;
&lt;br /&gt;
Audio special effects were one-offs, mostly done using SoX.  The processed demo sections of audio were then spliced back into the original audio takes using Audacity.&lt;br /&gt;
&lt;br /&gt;
Video special effects (eg, removing a color channel, etc) were done by writing quick, one-off filters in C for y4oi.  A few effects were done by dumping a take as a directory full of PNGs and then batch-processing the PNGs again using a one-off C program, then reassembling with mplayer.  Video effects were then stitched back into the original video takes in Cinelerra.&lt;br /&gt;
&lt;br /&gt;
====Editing====&lt;br /&gt;
&lt;br /&gt;
All editing was done in Cinelerra.  This primarily consisted of stitching the individual takes back together with crossfades.  All input and rendering output were done with raw YUV4MPEG and WAV files.  Note that making this work well and correctly required several patches to the YUV4MPEG handler and colorspace conversion code.&lt;br /&gt;
&lt;br /&gt;
====Encoding====&lt;br /&gt;
&lt;br /&gt;
Encoding was done by hand, external to Cinelerra, using mplayer for final postprocessing, the encoder_example tool included with the [[Ptalarbvorm]] Theora source distribution, and ivfenc for WebM.&lt;br /&gt;
&lt;br /&gt;
Sample Theora encode command line (note this is using an mplayer patched for y4o support; it could be done just as easily with a yuv4mpeg pipe):&lt;br /&gt;
&lt;br /&gt;
# 360p, 128-ish (a4) audio + 500-ish (v50) video&lt;br /&gt;
mplayer -vf hqdn3d,scale=640:360,gradfun=1.5,unsharp=l3x3:.1,yuv4ogg complete2.m2v -fast -noconsolecontrols -vo null &amp;gt; /dev/null &amp;amp; ~/MotherfishSVN/theora-ptalarbvorm/examples/encoder_example -a 4 -v 50 -k 240 complete2.wav output.y4o -o A_Digital_Media_Primer_For_Geeks-360p-a4+v50.ogv&lt;/div&gt;</summary>
		<author><name>Mindspillage</name></author>
	</entry>
	<entry>
		<id>https://wiki.xiph.org/index.php?title=Videos/A_Digital_Media_Primer_For_Geeks&amp;diff=12440</id>
		<title>Videos/A Digital Media Primer For Geeks</title>
		<link rel="alternate" type="text/html" href="https://wiki.xiph.org/index.php?title=Videos/A_Digital_Media_Primer_For_Geeks&amp;diff=12440"/>
		<updated>2010-09-22T04:24:12Z</updated>

		<summary type="html">&lt;p&gt;Mindspillage: /* Introduction */ typo&lt;/p&gt;
&lt;hr /&gt;
&lt;div&gt;[[Image:Dmpfg_001.jpg|360px|right]]&lt;br /&gt;
&amp;lt;small&amp;gt;&#039;&#039;Wiki edition&#039;&#039;&amp;lt;/small&amp;gt;&lt;br /&gt;
&lt;br /&gt;
This first video from Xiph.Org presents the technical foundations of modern digital media via a half-hour firehose of information. One community member called it &amp;quot;a Uni lecture I never got but really wanted.&amp;quot;&lt;br /&gt;
&lt;br /&gt;
The program offers a brief history of digital media, a quick summary of the sampling theorem, and myriad details of low level audio and video characterization and formatting. It&#039;s intended for budding geeks looking to get into video coding, as well as the technically curious who want to know more about the media they wrangle for work or play.&lt;br /&gt;
&amp;lt;br/&amp;gt;&amp;lt;br/&amp;gt;&amp;lt;br/&amp;gt;&lt;br /&gt;
&amp;lt;center&amp;gt;&amp;lt;font size=&amp;quot;+2&amp;quot;&amp;gt;[http://www.xiph.org/video/vid1.shtml Download or Watch online]&amp;lt;/font&amp;gt;&amp;lt;/center&amp;gt;&lt;br /&gt;
&amp;lt;br style=&amp;quot;clear:both;&amp;quot;/&amp;gt;&lt;br /&gt;
&amp;lt;hr/&amp;gt;&lt;br /&gt;
&lt;br /&gt;
==Introduction==&lt;br /&gt;
[[Image:Dmpfg_000.jpg|360px|right]]&lt;br /&gt;
[[Image:Dmpfg_002.jpg|360px|right]]&lt;br /&gt;
&amp;lt;small&amp;gt;[[Talk:A_Digital_Media_Primer_For_Geeks_(episode_1)#Introduction|Discuss this section]]&amp;lt;/small&amp;gt;&lt;br /&gt;
&lt;br /&gt;
Workstations and high-end personal computers have been able to&lt;br /&gt;
manipulate digital audio pretty easily for about fifteen years now.&lt;br /&gt;
It&#039;s only been about five years that a decent workstation&#039;s been able&lt;br /&gt;
to handle raw video without a lot of expensive special purpose&lt;br /&gt;
hardware.&lt;br /&gt;
&lt;br /&gt;
But today even most cheap home PCs have the processor power and&lt;br /&gt;
storage necessary to really toss raw video around, at least without&lt;br /&gt;
too much of a struggle. So now that everyone has all of this cheap media-capable hardware, &lt;br /&gt;
more people, not surprisingly, want to do interesting&lt;br /&gt;
things with digital media, especially streaming. YouTube was the first huge&lt;br /&gt;
success, and now everybody wants in.&lt;br /&gt;
&lt;br /&gt;
Well good!  Because this stuff is a lot of fun!&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
It&#039;s no problem finding consumers for digital media.  But here I&#039;d&lt;br /&gt;
like to address the engineers, the mathematicians, the hackers, the&lt;br /&gt;
people who are interested in discovering and making things and&lt;br /&gt;
building the technology itself. The people after my own heart.&lt;br /&gt;
&lt;br /&gt;
Digital media, compression especially, is perceived to be super-elite,&lt;br /&gt;
somehow incredibly more difficult than anything else in computer&lt;br /&gt;
science. The big industry players in the field don&#039;t mind this&lt;br /&gt;
perception at all; it helps justify the staggering number of very&lt;br /&gt;
basic patents they hold.  They like the image that their media&lt;br /&gt;
researchers &amp;quot;are the best of the best, so much smarter than anyone&lt;br /&gt;
else that their brilliant ideas can&#039;t even be understood by mere&lt;br /&gt;
mortals.&amp;quot; This is bunk.&lt;br /&gt;
&lt;br /&gt;
Digital audio and video and streaming and compression offer endless&lt;br /&gt;
deep and stimulating mental challenges, just like any other&lt;br /&gt;
discipline. It seems elite because so few people have been&lt;br /&gt;
involved.  So few people have been involved perhaps because so few&lt;br /&gt;
people could afford the expensive, special-purpose equipment it&lt;br /&gt;
required. But today, just about anyone watching this video has a&lt;br /&gt;
cheap, general-purpose computer powerful enough to play with the big&lt;br /&gt;
boys. There are battles going on today around HTML5 and browsers and&lt;br /&gt;
video and open vs. closed.  So now is a pretty good time to get&lt;br /&gt;
involved.  The easiest place to start is probably understanding the&lt;br /&gt;
technology we have right now.&lt;br /&gt;
&lt;br /&gt;
This is an introduction. Since it&#039;s an introduction, it glosses over a&lt;br /&gt;
ton of details so that the big picture&#039;s a little easier to see.&lt;br /&gt;
Quite a few people watching are going to be way past anything that I&#039;m&lt;br /&gt;
talking about, at least for now.  On the other hand, I&#039;m probably&lt;br /&gt;
going to go too fast for folks who really are brand new to all of&lt;br /&gt;
this, so if this is all new, relax. The important thing is to pick out&lt;br /&gt;
any ideas that really grab your imagination. Especially pay attention&lt;br /&gt;
to the terminology surrounding those ideas, because with those, and&lt;br /&gt;
Google, and Wikipedia, you can dig as deep as interests you.&lt;br /&gt;
&lt;br /&gt;
So, without any further ado, welcome to one hell of a new hobby.&lt;br /&gt;
&lt;br /&gt;
&amp;lt;center&amp;gt;&amp;lt;div style=&amp;quot;background-color:#DDDDFF;border-color:#CCCCDD;border-style:solid;width:80%;padding:0 1em 1em 1em;text-align:left;&amp;quot;&amp;gt;&lt;br /&gt;
&#039;&#039;&#039;Going deeper…&#039;&#039;&#039;&lt;br /&gt;
*[http://www.0xdeadbeef.com/weblog/2010/01/html5-video-and-h-264-what-history-tells-us-and-why-were-standing-with-the-web/ HTML5 Video and H.264: what history tells us and why we&#039;re standing with the web]: Chris Blizzard of Mozilla on free formats and the open web&lt;br /&gt;
*[http://diveintohtml5.org/video.html Dive into HTML5] web video tutorial&lt;br /&gt;
&amp;lt;/div&amp;gt;&amp;lt;/center&amp;gt;&lt;br /&gt;
&lt;br /&gt;
&amp;lt;br style=&amp;quot;clear:both;&amp;quot;/&amp;gt;&lt;br /&gt;
&lt;br /&gt;
==Analog vs Digital==&lt;br /&gt;
[[Image:Dmpfg_004.jpg|360px|right]]&lt;br /&gt;
[[Image:Dmpfg_006.jpg|360px|right]]&lt;br /&gt;
[[Image:Dmpfg_007.jpg|360px|right]]&lt;br /&gt;
&amp;lt;small&amp;gt;[[Talk:A_Digital_Media_Primer_For_Geeks_(episode_1)#Analog_vs_Digital|Discuss this section]]&amp;lt;/small&amp;gt;&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
[[WikiPedia:Sound|Sound]] is the propagation of pressure waves through air, spreading out&lt;br /&gt;
from a source like ripples spread from a stone tossed into a pond.  A&lt;br /&gt;
microphone, or the human ear for that matter, transforms these passing&lt;br /&gt;
ripples of pressure into an electric signal.  Right, this is&lt;br /&gt;
middle school science class, everyone remembers this.  Moving on.&lt;br /&gt;
&lt;br /&gt;
That audio signal is a one-dimensional function, a single value&lt;br /&gt;
varying over time.  If we slow the [[WikiPedia:Oscilloscope|&#039;scope]] down a bit... that should be&lt;br /&gt;
a little easier to see. A few other aspects of the signal are&lt;br /&gt;
important. It&#039;s [[WikiPedia:Continuous_function|continuous]] in both value and time; that is, at any&lt;br /&gt;
given time it can have any real value, and there&#039;s a smoothly varying&lt;br /&gt;
value at every point in time.  No matter how much we zoom in, there&lt;br /&gt;
are no discontinuities, no singularities, no instantaneous steps or&lt;br /&gt;
points where the signal ceases to exist. It&#039;s defined&lt;br /&gt;
everywhere. Classic continuous math works very well on these signals.&lt;br /&gt;
&lt;br /&gt;
A digital signal on the other hand is [[WikiPedia:Discrete_math|discrete]] in both value and time.&lt;br /&gt;
In the simplest and most common system, called [[WikiPedia:Pulse code modulation|Pulse Code Modulation]],&lt;br /&gt;
one of a fixed number of possible values directly represents the&lt;br /&gt;
instantaneous signal amplitude at points in time spaced a fixed&lt;br /&gt;
distance apart.  The end result is a stream of digits.&lt;br /&gt;
&lt;br /&gt;
Now this looks an awful lot like this.  It seems intuitive that we&lt;br /&gt;
should somehow be able to rigorously transform one into the other, and&lt;br /&gt;
good news, the [[WikiPedia:Nyquist-Shannon sampling theorem|Sampling Theorem]] says we can and tells us&lt;br /&gt;
how. Published in its most recognizable form by [[WikiPedia:Claude Shannon|Claude Shannon]] in 1949&lt;br /&gt;
and built on the work of [[WikiPedia:Harry Nyquist|Nyquist]], and [[WikiPedia:Ralph Hartley|Hartley]], and tons of others, the&lt;br /&gt;
sampling theorem not only tells us that we can go back and&lt;br /&gt;
forth between analog and digital, but also lays&lt;br /&gt;
down the conditions under which the conversion is lossless and the two&lt;br /&gt;
representations become equivalent and interchangeable.  When the&lt;br /&gt;
lossless conditions aren&#039;t met, the sampling theorem tells us how and&lt;br /&gt;
how much information is lost or corrupted.&lt;br /&gt;
&lt;br /&gt;
Up until very recently, analog technology was the basis for&lt;br /&gt;
practically everything done with audio, and that&#039;s not because most&lt;br /&gt;
audio comes from an originally analog source.  You may also think that&lt;br /&gt;
since computers are fairly recent, analog signal technology must have&lt;br /&gt;
come first.  Nope. Digital is actually older.  The [[WikiPedia:Telegraph|telegraph]] predates&lt;br /&gt;
the telephone by half a century and was already fully mechanically&lt;br /&gt;
automated by the 1860s, sending coded, multiplexed digital signals&lt;br /&gt;
long distances. You know... [[WikiPedia:Tickertape|tickertape]]. Harry Nyquist of [[WikiPedia:Bell_labs|Bell Labs]] was&lt;br /&gt;
researching telegraph pulse transmission when he published his&lt;br /&gt;
description of what later became known as the [[WikiPedia:Nyquist_frequency|Nyquist frequency]], the&lt;br /&gt;
core concept of the sampling theorem.  Now, it&#039;s true the telegraph&lt;br /&gt;
was transmitting symbolic information, text, not a digitized analog&lt;br /&gt;
signal, but with the advent of the telephone and radio, analog and&lt;br /&gt;
digital signal technology progressed rapidly and side-by-side.&lt;br /&gt;
&lt;br /&gt;
Audio had always been manipulated as an analog signal because... well,&lt;br /&gt;
gee, it&#039;s so much easier.  A [[WikiPedia:Low-pass_filter#Continuous-time_low-pass_filters|second-order low-pass filter]], for example,&lt;br /&gt;
requires two passive components.  An all-analog [[WikiPedia:Short-time_Fourier_transform|short-time Fourier&lt;br /&gt;
transform]], a few hundred.  Well, maybe a thousand if you want to build&lt;br /&gt;
something really fancy (bang on the [http://www.testequipmentdepot.com/usedequipment/hewlettpackard/spectrumanalyzers/3585a.htm 3585]).  Processing signals&lt;br /&gt;
digitally requires millions to billions of transistors running at&lt;br /&gt;
microwave frequencies, support hardware at very least to digitize and&lt;br /&gt;
reconstruct the analog signals, a complete software ecosystem for&lt;br /&gt;
programming and controlling that billion-transistor juggernaut,&lt;br /&gt;
digital storage just in case you want to keep any of those bits for&lt;br /&gt;
later...&lt;br /&gt;
&lt;br /&gt;
So we come to the conclusion that analog is the only practical way to&lt;br /&gt;
do much with audio... well, unless you happen to have a billion&lt;br /&gt;
transistors and all the other things just lying around. And [[WikiPedia:File:Transistor_Count_and_Moore&#039;s_Law_-_2008.svg|since we&lt;br /&gt;
do]], digital signal processing becomes very attractive.&lt;br /&gt;
&lt;br /&gt;
For one thing, analog componentry just doesn&#039;t have the flexibility of&lt;br /&gt;
a general purpose computer.  Adding a new function to this&lt;br /&gt;
beast... yeah, it&#039;s probably not going to happen.  On a digital&lt;br /&gt;
processor though, just write a new program.  Software isn&#039;t trivial,&lt;br /&gt;
but it is a lot easier.&lt;br /&gt;
&lt;br /&gt;
Perhaps more importantly, though, every analog component is an&lt;br /&gt;
approximation. There&#039;s no such thing as a perfect transistor, or a&lt;br /&gt;
perfect inductor, or a perfect capacitor.  In analog, every component&lt;br /&gt;
adds [[WikiPedia:Johnson–Nyquist_noise|noise]] and [[WikiPedia:Distortion#Electronic_signals|distortion]], usually not very much, but it adds up. Just&lt;br /&gt;
transmitting an analog signal, especially over long distances,&lt;br /&gt;
progressively, measurably, irretrievably corrupts it.  Besides, all of&lt;br /&gt;
those single-purpose analog components take up a lot of space.  Two&lt;br /&gt;
lines of code on the billion transistors back here can implement a&lt;br /&gt;
filter that would require an [[WikiPedia:Inductor|inductor]] the size of a refrigerator.&lt;br /&gt;
&lt;br /&gt;
Digital systems don&#039;t have these drawbacks.  Digital signals can be&lt;br /&gt;
stored, copied, manipulated, and transmitted without adding any noise&lt;br /&gt;
or distortion. We do use [[WikiPedia:Lossy_compression|lossy]] algorithms from time to time, but the&lt;br /&gt;
only unavoidably non-ideal steps are digitization and reconstruction,&lt;br /&gt;
where digital has to interface with all of that messy analog.  Messy&lt;br /&gt;
or not, modern [[WikiPedia:Digital-to-analog_converter|conversion stages]] are very, very good.  By the&lt;br /&gt;
standards of our ears, we can consider them practically lossless as&lt;br /&gt;
well.&lt;br /&gt;
&lt;br /&gt;
With a little extra hardware, then, most of which is now small and&lt;br /&gt;
inexpensive due to our modern industrial infrastructure, digital audio&lt;br /&gt;
is the clear winner over analog.  So let us then go about storing it,&lt;br /&gt;
copying it, manipulating it, and transmitting it.&lt;br /&gt;
&lt;br /&gt;
&amp;lt;center&amp;gt;&amp;lt;div style=&amp;quot;background-color:#DDDDFF;border-color:#CCCCDD;border-style:solid;width:80%;padding:0 1em 1em 1em;text-align:left;&amp;quot;&amp;gt;&lt;br /&gt;
&#039;&#039;&#039;Going deeper…&#039;&#039;&#039;&lt;br /&gt;
*Wikipedia: [[wiki:Nyquist–Shannon_sampling_theorem|Nyquist–Shannon sampling theorem]]&lt;br /&gt;
*MIT OpenCourseWare [http://ocw.mit.edu/courses/electrical-engineering-and-computer-science/6-003-signals-and-systems-spring-2010/lecture-notes/ Lecture notes from 6.003 signals and systems.]&lt;br /&gt;
&amp;lt;/div&amp;gt;&amp;lt;/center&amp;gt;&lt;br /&gt;
&lt;br /&gt;
&amp;lt;br style=&amp;quot;clear:both;&amp;quot;/&amp;gt;&lt;br /&gt;
&lt;br /&gt;
==Raw (digital audio) meat==&lt;br /&gt;
&amp;lt;small&amp;gt;[[Talk:A_Digital_Media_Primer_For_Geeks_(episode_1)#Raw_.28digital_audio.29_meat|Discuss this section]]&amp;lt;/small&amp;gt;&lt;br /&gt;
&lt;br /&gt;
Pulse Code Modulation is the most common representation for &lt;br /&gt;
raw audio.  Other practical representations do exist: for example, the&lt;br /&gt;
[[WikiPedia:Delta-sigma_modulation|Sigma-Delta coding]] used by the [[WikiPedia:Super_Audio_CD|SACD]], which is a form of [[wikipedia:Pulse-density_modulation|Pulse Density&lt;br /&gt;
Modulation]].  That said, Pulse Code Modulation is far&lt;br /&gt;
and away dominant, mainly because it&#039;s so mathematically&lt;br /&gt;
convenient.  An audio engineer can spend an entire career without&lt;br /&gt;
running into anything else.&lt;br /&gt;
&lt;br /&gt;
PCM encoding can be characterized in three parameters, making it easy&lt;br /&gt;
to account for every possible PCM variant with mercifully little&lt;br /&gt;
hassle.&lt;br /&gt;
&amp;lt;br style=&amp;quot;clear:both;&amp;quot;/&amp;gt;&lt;br /&gt;
===sample rate===&lt;br /&gt;
[[Image:Dmpfg_009.jpg|360px|right]]&lt;br /&gt;
[[Image:Dmpfg_008.jpg|360px|right]]&lt;br /&gt;
The first parameter is the [[wikipedia:Sampling_rate|sampling rate]].  The highest frequency an&lt;br /&gt;
encoding can represent is called the Nyquist Frequency.  The Nyquist&lt;br /&gt;
frequency of PCM happens to be exactly half the sampling rate.&lt;br /&gt;
Therefore, the sampling rate directly determines the highest possible&lt;br /&gt;
frequency in the digitized signal.&lt;br /&gt;
&lt;br /&gt;
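To make that relationship concrete, here&#039;s a minimal C sketch (a wiki-edition illustration, not from the video; the helper names are made up) that samples a test tone at 8kHz.  Any frequency below the 4kHz Nyquist frequency is representable.&lt;br /&gt;
&lt;br /&gt;
 /* build: cc nyquist.c -lm */&lt;br /&gt;
 #include &amp;lt;math.h&amp;gt;&lt;br /&gt;
 #include &amp;lt;stdio.h&amp;gt;&lt;br /&gt;
 &lt;br /&gt;
 int main(void) {&lt;br /&gt;
     const double pi = 3.141592653589793;&lt;br /&gt;
     const double sample_rate = 8000.0;         /* samples per second */&lt;br /&gt;
     const double nyquist = sample_rate / 2.0;  /* highest representable frequency */&lt;br /&gt;
     const double freq = 1000.0;                /* test tone; must stay below nyquist */&lt;br /&gt;
     for (int n = 0; n &amp;lt; 16; n++) {&lt;br /&gt;
         double t = n / sample_rate;            /* time of the nth sample */&lt;br /&gt;
         printf(&amp;quot;%d %f\n&amp;quot;, n, sin(2.0 * pi * freq * t));&lt;br /&gt;
     }&lt;br /&gt;
     printf(&amp;quot;Nyquist: %g Hz\n&amp;quot;, nyquist);&lt;br /&gt;
     return 0;&lt;br /&gt;
 }&lt;br /&gt;
&lt;br /&gt;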
Analog telephone systems traditionally [[wikipedia:Bandlimiting|band-limited]] voice channels to&lt;br /&gt;
just under 4kHz, so digital telephony and most classic voice&lt;br /&gt;
applications use an 8kHz sampling rate: the minimum sampling rate&lt;br /&gt;
necessary to capture the entire bandwidth of a 4kHz channel.  This is&lt;br /&gt;
what an 8kHz sampling rate sounds like&amp;amp;mdash;a bit muffled but perfectly&lt;br /&gt;
intelligible for voice.  This is the lowest sampling rate that&#039;s ever&lt;br /&gt;
been used widely in practice.&lt;br /&gt;
&lt;br /&gt;
From there, as power, and memory, and storage increased, consumer&lt;br /&gt;
computer hardware went to offering 11, and then 16, and then 22, and&lt;br /&gt;
then 32kHz sampling.  With each increase in the sampling rate and the&lt;br /&gt;
Nyquist frequency, it&#039;s obvious that the high end becomes a little&lt;br /&gt;
clearer and the sound more natural.&lt;br /&gt;
&lt;br /&gt;
The Compact Disc uses a 44.1kHz sampling rate, which is again slightly&lt;br /&gt;
better than 32kHz, but the gains are becoming less distinct.  44.1kHz&lt;br /&gt;
is a bit of an oddball choice, especially given that it hadn&#039;t been&lt;br /&gt;
used for anything prior to the compact disc, but the huge success of&lt;br /&gt;
the CD has made it a common rate.&lt;br /&gt;
&lt;br /&gt;
The most common hi-fidelity sampling rate aside from the CD is 48kHz.&lt;br /&gt;
There&#039;s virtually no audible difference between the two.  This video,&lt;br /&gt;
or at least the original version of it, was shot and produced with&lt;br /&gt;
48kHz audio, which happens to be the original standard for&lt;br /&gt;
high-fidelity audio with video.&lt;br /&gt;
&lt;br /&gt;
Super-hi-fidelity sampling rates of 88, and 96, and 192kHz have also&lt;br /&gt;
appeared. The reason for the sampling rates beyond 48kHz isn&#039;t to&lt;br /&gt;
extend the audible high frequencies further. It&#039;s for a different&lt;br /&gt;
reason.&lt;br /&gt;
&lt;br /&gt;
Stepping back for just a second, the French mathematician [[wikipedia:Joseph_Fourier|Jean&lt;br /&gt;
Baptiste Joseph Fourier]] showed that we can also think of signals like&lt;br /&gt;
audio as a set of component frequencies.  This [[wikipedia:Frequency_domain|frequency-domain]]&lt;br /&gt;
representation is equivalent to the time representation; the signal is&lt;br /&gt;
exactly the same, we&#039;re just looking at it [[wikipedia:Basis_(linear_algebra)|a different way]].  Here we see the&lt;br /&gt;
frequency-domain representation of a hypothetical analog signal we&lt;br /&gt;
intend to digitally sample.&lt;br /&gt;
&lt;br /&gt;
The sampling theorem tells us two essential things about the sampling&lt;br /&gt;
process. First, that a digital signal can&#039;t represent any&lt;br /&gt;
frequencies above the Nyquist frequency. Second, and this is the new&lt;br /&gt;
part, if we don&#039;t remove those frequencies with a low-pass filter&lt;br /&gt;
before sampling, the sampling process will fold them down into the&lt;br /&gt;
representable frequency range as [[wikipedia:Aliasing|aliasing distortion]].&lt;br /&gt;
&lt;br /&gt;
Aliasing, in a nutshell, sounds freakin&#039; awful, so it&#039;s essential to&lt;br /&gt;
remove any beyond-Nyquist frequencies before sampling and after&lt;br /&gt;
reconstruction.&lt;br /&gt;
&lt;br /&gt;
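A back-of-the-envelope sketch of where an unfiltered tone lands (a wiki-edition illustration, not from the video): a tone at frequency f, sampled at rate fs with no low-pass in front of the sampler, reappears at the nearest image of f inside the representable band.&lt;br /&gt;
&lt;br /&gt;
 #include &amp;lt;math.h&amp;gt;&lt;br /&gt;
 #include &amp;lt;stdio.h&amp;gt;&lt;br /&gt;
 &lt;br /&gt;
 /* Frequency at which a pure tone at f reappears after sampling at&lt;br /&gt;
    rate fs with no anti-aliasing low-pass in front of the sampler. */&lt;br /&gt;
 static double alias(double f, double fs) {&lt;br /&gt;
     return fabs(f - fs * round(f / fs));&lt;br /&gt;
 }&lt;br /&gt;
 &lt;br /&gt;
 int main(void) {&lt;br /&gt;
     /* A 30kHz tone sampled at 48kHz folds down to 18kHz. */&lt;br /&gt;
     printf(&amp;quot;%g\n&amp;quot;, alias(30000.0, 48000.0));&lt;br /&gt;
     return 0;&lt;br /&gt;
 }&lt;br /&gt;
&lt;br /&gt;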
Human frequency perception is considered to extend to about 20kHz. In&lt;br /&gt;
44.1 or 48kHz sampling, the low pass before the sampling stage has to&lt;br /&gt;
be extremely sharp to avoid cutting any audible frequencies below&lt;br /&gt;
[[wikipedia:Hearing_range|20kHz]] but still not allow frequencies above the Nyquist to leak&lt;br /&gt;
forward into the sampling process.  This is a difficult filter to&lt;br /&gt;
build, and no practical filter succeeds completely. If the sampling&lt;br /&gt;
rate is 96kHz or 192kHz on the other hand, the low pass has an extra&lt;br /&gt;
[[wikipedia:Octave_(electronics)|octave]] or two for its transition band. This is a much easier filter to&lt;br /&gt;
build.  Sampling rates beyond 48kHz are actually one of those messy&lt;br /&gt;
analog stage compromises.&lt;br /&gt;
&amp;lt;br style=&amp;quot;clear:both;&amp;quot;/&amp;gt;&lt;br /&gt;
===sample format===&lt;br /&gt;
[[Image:Dmpfg_anim.gif|right]]&lt;br /&gt;
&lt;br /&gt;
The second fundamental PCM parameter is the sample format; that is,&lt;br /&gt;
the format of each digital number.  A number is a number, but a number&lt;br /&gt;
can be represented in bits a number of different ways.&lt;br /&gt;
&lt;br /&gt;
Early PCM was [[wikipedia:Quantization_(sound_processing)#Audio_quantization|eight-bit]] [[wikipedia:Linear_pulse_code_modulation|linear]], encoded as an [[wikipedia:Signedness|unsigned]] [[wikipedia:Integer_(computer_science)#Bytes_and_octets|byte]].  The&lt;br /&gt;
[[wikipedia:Dynamic_range#Audio|dynamic range]] is limited to about [[wikipedia:Decibel|50dB]]  and the [[wikipedia:Quantization_error|quantization noise]], as&lt;br /&gt;
you can hear, is pretty severe.  Eight-bit audio is vanishingly rare&lt;br /&gt;
today.&lt;br /&gt;
&lt;br /&gt;
Digital telephony typically uses one of two related non-linear eight&lt;br /&gt;
bit encodings called [[wikipedia:A-law_algorithm|A-law]] and [[wikipedia:Μ-law_algorithm|μ-law]]. These formats encode a roughly&lt;br /&gt;
[[wikipedia:Audio_bit_depth#Dynamic_range|14 bit dynamic range]] into eight bits by spacing the higher amplitude&lt;br /&gt;
values farther apart. A-law and mu-law obviously improve quantization&lt;br /&gt;
noise compared to linear 8-bit, and voice harmonics especially hide&lt;br /&gt;
the remaining quantization noise well. All three eight-bit encodings,&lt;br /&gt;
linear, A-law, and mu-law, are typically paired with an 8kHz sampling&lt;br /&gt;
rate, though I&#039;m demonstrating them here at 48kHz.&lt;br /&gt;
&lt;br /&gt;
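The mu-law companding curve itself is tiny; here&#039;s a sketch of the continuous form (a wiki-edition illustration; the actual G.711 wire format adds a bias and piecewise-linear segments that this omits):&lt;br /&gt;
&lt;br /&gt;
 #include &amp;lt;math.h&amp;gt;&lt;br /&gt;
 &lt;br /&gt;
 /* Continuous mu-law compression: maps x in [-1,1] to [-1,1],&lt;br /&gt;
    spending more of the output range on small amplitudes. */&lt;br /&gt;
 static double mulaw_compress(double x) {&lt;br /&gt;
     const double mu = 255.0;                /* the mu constant of mu-law */&lt;br /&gt;
     double sign = (x &amp;lt; 0.0) ? -1.0 : 1.0;&lt;br /&gt;
     return sign * log(1.0 + mu * fabs(x)) / log(1.0 + mu);&lt;br /&gt;
 }&lt;br /&gt;
&lt;br /&gt;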
Most modern PCM uses 16- or 24-bit [[wikipedia:Two&#039;s_complement|two&#039;s-complement]] signed integers to&lt;br /&gt;
encode the range from negative infinity to zero decibels in 16 or 24&lt;br /&gt;
bits of precision. The maximum absolute value corresponds to zero decibels.&lt;br /&gt;
As with all the sample formats so far, signals beyond zero decibels, and thus&lt;br /&gt;
beyond the maximum representable range, are [[wikipedia:Clipping_(audio)|clipped]].&lt;br /&gt;
&lt;br /&gt;
In mixing and mastering, it&#039;s not unusual to use [[wikipedia:Floating_point|floating-point]]&lt;br /&gt;
numbers for PCM instead of [[wikipedia:Integer_(computer_science)|integers]].  A 32 bit [[wikipedia:IEEE_754-2008|IEEE754]] float, that&#039;s&lt;br /&gt;
the normal kind of floating point you see on current computers, has 24&lt;br /&gt;
bits of resolution, but a seven bit floating-point exponent increases&lt;br /&gt;
the representable range.  Floating point usually represents zero&lt;br /&gt;
decibels as +/-1.0, and because floats can obviously represent&lt;br /&gt;
considerably beyond that, temporarily exceeding zero decibels during&lt;br /&gt;
the mixing process doesn&#039;t cause clipping.  Floating-point PCM takes&lt;br /&gt;
up more space, so it tends to be used only as an intermediate&lt;br /&gt;
production format.&lt;br /&gt;
&lt;br /&gt;
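The final conversion from floating-point PCM back down to 16-bit integers is exactly where a mix that still exceeds zero decibels gets clipped; a minimal sketch (a wiki-edition illustration; rounding details omitted):&lt;br /&gt;
&lt;br /&gt;
 #include &amp;lt;stdint.h&amp;gt;&lt;br /&gt;
 &lt;br /&gt;
 /* One float sample (0dB == +/-1.0) to 16-bit signed PCM; anything&lt;br /&gt;
    beyond full scale is clipped to the representable range. */&lt;br /&gt;
 static int16_t float_to_s16(float x) {&lt;br /&gt;
     float v = x * 32768.0f;&lt;br /&gt;
     if (v &amp;gt;  32767.0f) v =  32767.0f;    /* clip positive overrange */&lt;br /&gt;
     if (v &amp;lt; -32768.0f) v = -32768.0f;    /* clip negative overrange */&lt;br /&gt;
     return (int16_t)v;&lt;br /&gt;
 }&lt;br /&gt;
&lt;br /&gt;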
Lastly, most general purpose computers still read and&lt;br /&gt;
write data in octet bytes, so it&#039;s important to remember that samples&lt;br /&gt;
bigger than eight bits can be in [[wikipedia:Endianness|big- or little-endian order]], and both&lt;br /&gt;
endiannesses are common.  For example, Microsoft [[wikipedia:WAV|WAV]] files are little-endian,&lt;br /&gt;
and Apple [[wikipedia:AIFC|AIFC]] files tend to be big-endian.  Be aware of it.&lt;br /&gt;
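&lt;br /&gt;
The usual defensive idiom is to assemble each sample from individual bytes so the code works on any host, rather than trusting the CPU&#039;s own byte order; a sketch (a wiki-edition illustration):&lt;br /&gt;
&lt;br /&gt;
 #include &amp;lt;stdint.h&amp;gt;&lt;br /&gt;
 &lt;br /&gt;
 /* Read one signed 16-bit sample from a little-endian byte stream&lt;br /&gt;
    (e.g. WAV data), independent of the host machine&#039;s endianness. */&lt;br /&gt;
 static int16_t read_s16le(const uint8_t *p) {&lt;br /&gt;
     return (int16_t)(p[0] | (p[1] &amp;lt;&amp;lt; 8));&lt;br /&gt;
 }&lt;br /&gt;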
&amp;lt;br style=&amp;quot;clear:both;&amp;quot;/&amp;gt;&lt;br /&gt;
===channels===&lt;br /&gt;
&lt;br /&gt;
The third PCM parameter is the number of [[wikipedia:Multichannel_audio|channels]].  The convention in&lt;br /&gt;
raw PCM is to encode multiple channels by interleaving the samples of&lt;br /&gt;
each channel together into a single stream.  Straightforward and extensible.&lt;br /&gt;
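&lt;br /&gt;
So a stereo stream is simply L R L R...; splitting it back into per-channel planes looks like this sketch (a wiki-edition illustration):&lt;br /&gt;
&lt;br /&gt;
 #include &amp;lt;stdint.h&amp;gt;&lt;br /&gt;
 &lt;br /&gt;
 /* Split interleaved stereo (L R L R ...) into two separate planes. */&lt;br /&gt;
 static void deinterleave_stereo(const int16_t *in, int16_t *left,&lt;br /&gt;
                                 int16_t *right, int frames) {&lt;br /&gt;
     for (int i = 0; i &amp;lt; frames; i++) {&lt;br /&gt;
         left[i]  = in[2 * i];        /* even samples: left channel  */&lt;br /&gt;
         right[i] = in[2 * i + 1];    /* odd samples:  right channel */&lt;br /&gt;
     }&lt;br /&gt;
 }&lt;br /&gt;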
&amp;lt;br style=&amp;quot;clear:both;&amp;quot;/&amp;gt;&lt;br /&gt;
===done!===&lt;br /&gt;
&lt;br /&gt;
And that&#039;s it!  That describes every PCM representation ever.  Done.&lt;br /&gt;
Digital audio is &#039;&#039;so easy&#039;&#039;!  There&#039;s more to do of course, but at this&lt;br /&gt;
point we&#039;ve got a nice useful chunk of audio data, so let&#039;s get some&lt;br /&gt;
video too.&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
&amp;lt;center&amp;gt;&amp;lt;div style=&amp;quot;background-color:#DDDDFF;border-color:#CCCCDD;border-style:solid;width:80%;padding:0 1em 1em 1em;text-align:left;&amp;quot;&amp;gt;&lt;br /&gt;
&#039;&#039;&#039;Going deeper…&#039;&#039;&#039;&lt;br /&gt;
* [[wikipedia:Roll-off|Wikipedia&#039;s article on filter roll-off]], to learn why it&#039;s hard to build analog filters with a very narrow [[wikipedia:Transition_band|transition band]] between the [[wikipedia:Passband|passband]] and the [[wikipedia:Stopband|stopband]].&lt;br /&gt;
* [http://wiki.multimedia.cx/index.php?title=PCM Some more minutiae] about PCM in practice.&lt;br /&gt;
* [[wikipedia:DPCM|DPCM]] and [[wikipedia:ADPCM|ADPCM]], simple audio codecs loosely inspired by PCM.&lt;br /&gt;
&amp;lt;/div&amp;gt;&amp;lt;/center&amp;gt;&lt;br /&gt;
&lt;br /&gt;
&amp;lt;br style=&amp;quot;clear:both;&amp;quot;/&amp;gt;&lt;br /&gt;
&lt;br /&gt;
==Video vegetables (they&#039;re good for you!)==&lt;br /&gt;
[[Image:Dmpfg_010.jpg|360px|right]]&lt;br /&gt;
&amp;lt;small&amp;gt;[[Talk:A_Digital_Media_Primer_For_Geeks_(episode_1)#Video_vegetables_.28they.27re_good_for_you.21.29|Discuss this section]]&amp;lt;/small&amp;gt;&lt;br /&gt;
&lt;br /&gt;
One could think of video as being like audio but with two additional&lt;br /&gt;
spatial dimensions, X and Y, in addition to the dimension of time.&lt;br /&gt;
This is mathematically sound. The Sampling Theorem applies to all&lt;br /&gt;
three video dimensions just as it does to the single time dimension of&lt;br /&gt;
audio.&lt;br /&gt;
&lt;br /&gt;
Audio and video are obviously quite different in practice. For one,&lt;br /&gt;
compared to audio, video is huge. [[wikipedia:Red_Book_(audio_Compact_Disc_standard)#Technical_details|Raw CD audio]] is about 1.4 megabits&lt;br /&gt;
per second. Raw [[wikipedia:1080i|1080i]] HD video is over 700 megabits per second. That&#039;s&lt;br /&gt;
more than 500 times more data to capture, process, and store per&lt;br /&gt;
second.  By [[wikipedia:Moore&#039;s_law|Moore&#039;s law]]... that&#039;s... let&#039;s see... roughly eight&lt;br /&gt;
doublings times two years, so yeah, computers requiring about an extra&lt;br /&gt;
fifteen years to handle raw video after getting raw audio down pat was&lt;br /&gt;
about right.&lt;br /&gt;
&lt;br /&gt;
Basic raw video is also just more complex than basic raw audio. The&lt;br /&gt;
sheer volume of data currently necessitates a representation more&lt;br /&gt;
efficient than the linear PCM used for audio.  In addition, electronic&lt;br /&gt;
video comes almost entirely from broadcast television alone, and the&lt;br /&gt;
standards committees that govern broadcast video have always been very&lt;br /&gt;
concerned with backward compatibility.  Up until just last year in the&lt;br /&gt;
US, a sixty-year-old black and white television could still show a&lt;br /&gt;
normal [[wikipedia:NTSC|analog television broadcast]].  That&#039;s actually a really neat&lt;br /&gt;
trick.&lt;br /&gt;
&lt;br /&gt;
The downside to backward compatibility is that once a detail makes it&lt;br /&gt;
into a standard, you can&#039;t ever really throw it out again. Electronic&lt;br /&gt;
video has never started over from scratch the way audio has multiple&lt;br /&gt;
times.  Sixty years worth of clever but obsolete hacks necessitated by&lt;br /&gt;
the passing technology of a given era have built up into quite a pile,&lt;br /&gt;
and because digital standards also come from broadcast television, all&lt;br /&gt;
these eldritch hacks have been brought forward into the digital&lt;br /&gt;
standards as well.&lt;br /&gt;
&lt;br /&gt;
In short, there are a whole lot more details involved in digital video&lt;br /&gt;
than there were with audio. There&#039;s no hope of covering them&lt;br /&gt;
all completely here, so we&#039;ll cover the broad fundamentals.&lt;br /&gt;
&amp;lt;br style=&amp;quot;clear:both;&amp;quot;/&amp;gt;&lt;br /&gt;
===resolution and aspect===&lt;br /&gt;
[[Image:Dmpfg_011.jpg|360px|right]]&lt;br /&gt;
&lt;br /&gt;
The most obvious raw video parameters are the width and height of the&lt;br /&gt;
picture in pixels. As simple as that may sound, the pixel dimensions&lt;br /&gt;
alone don&#039;t actually specify the absolute width and height of the&lt;br /&gt;
picture, as most broadcast-derived video doesn&#039;t use square pixels.&lt;br /&gt;
The number of [[wikipedia:Scan_line|scanlines]] in a broadcast image was fixed, but the&lt;br /&gt;
effective number of horizontal pixels was a function of channel&lt;br /&gt;
[[wikipedia:Bandwidth_(signal_processing)|bandwidth]]. Effective horizontal resolution could result in pixels that&lt;br /&gt;
were either narrower or wider than the spacing between scanlines.&lt;br /&gt;
&lt;br /&gt;
Standards have generally specified that digitally sampled video should&lt;br /&gt;
reflect the real resolution of the original analog source, so a large&lt;br /&gt;
amount of digital video also uses non-square pixels. For example, a&lt;br /&gt;
normal 4:3 aspect NTSC DVD is typically encoded with a display&lt;br /&gt;
resolution of [[wikipedia:DVD-Video#Frame_size_and_frame_rate|704 by 480]], a ratio wider than 4:3.  In this case, the&lt;br /&gt;
pixels themselves are assigned an aspect ratio of [[wikipedia:Standard-definition_television#Resolution|10:11]], making them&lt;br /&gt;
taller than they are wide and narrowing the image horizontally to the&lt;br /&gt;
correct aspect.  Such an image has to be resampled to show properly on&lt;br /&gt;
a digital display with square pixels.&lt;br /&gt;
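&lt;br /&gt;
Working the arithmetic through: 704 luma samples, each only 10/11 as wide as it is tall, span the width of 704 &amp;amp;times; 10/11 = 640 square pixels, and 640:480 reduces to exactly 4:3, the intended display aspect.&lt;br /&gt;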
&amp;lt;br style=&amp;quot;clear:both;&amp;quot;/&amp;gt;&lt;br /&gt;
===frame rate and interlacing===&lt;br /&gt;
[[Image:Dmpfg_012.jpg|360px|right]]&lt;br /&gt;
&lt;br /&gt;
The second obvious video parameter is the [[wikipedia:Frame_rate|frame rate]], the number of&lt;br /&gt;
full frames per second.  Several standard frame rates are in active&lt;br /&gt;
use. Digital video, in one form or another, can use all of them.  Or,&lt;br /&gt;
any other frame rate.  Or even variable rates where the frame rate&lt;br /&gt;
changes adaptively over the course of the video. The higher the frame&lt;br /&gt;
rate, the smoother the motion, and that brings us, unfortunately, to&lt;br /&gt;
[[wikipedia:Interlace|interlacing]].&lt;br /&gt;
&lt;br /&gt;
In the very earliest days of broadcast video, engineers sought the&lt;br /&gt;
fastest practical frame rate to smooth motion and to minimize [[wikipedia:Flicker_(screen)|flicker]]&lt;br /&gt;
on phosphor-based [[wikipedia:Cathode_ray_tube|CRTs]].  They were also under pressure to use the&lt;br /&gt;
least possible bandwidth for the highest resolution and fastest frame&lt;br /&gt;
rate.  Their solution was to interlace the video where the even lines&lt;br /&gt;
are sent in one pass and the odd lines in the next.  Each pass is&lt;br /&gt;
called a field and two fields sort of produce one complete frame.&lt;br /&gt;
&amp;quot;Sort of&amp;quot;, because the even and odd fields aren&#039;t actually from the&lt;br /&gt;
same source frame.  In a 60 field per second picture, the source frame&lt;br /&gt;
rate is actually 60 full frames per second, and half of each frame,&lt;br /&gt;
every other line, is simply discarded.  This is why we can&#039;t&lt;br /&gt;
[[wikipedia:Deinterlacing|deinterlace]] a video simply by combining two fields into one frame;&lt;br /&gt;
they&#039;re not actually from one frame to begin with.&lt;br /&gt;
&amp;lt;br style=&amp;quot;clear:both;&amp;quot;/&amp;gt;&lt;br /&gt;
&lt;br /&gt;
===gamma===&lt;br /&gt;
[[Image:Dmpfg_013.jpg|360px|right]]&lt;br /&gt;
&lt;br /&gt;
The cathode ray tube was the only available display technology for&lt;br /&gt;
most of the history of electronic video. A CRT&#039;s output brightness is&lt;br /&gt;
nonlinear, approximately equal to the input controlling voltage raised&lt;br /&gt;
to the 2.5th power. This exponent, 2.5, is designated gamma, and so&lt;br /&gt;
it&#039;s often referred to as the gamma of a display.  Cameras, though,&lt;br /&gt;
are linear, and if you feed a CRT a linear input signal, it looks a&lt;br /&gt;
bit like this.&lt;br /&gt;
&lt;br /&gt;
As there were originally to be very few cameras, which were&lt;br /&gt;
fantastically expensive anyway, and hopefully many, many television&lt;br /&gt;
sets, which had best be as inexpensive as possible, engineers decided to&lt;br /&gt;
add the necessary [[wikipedia:Gamma_correction|gamma correction]] circuitry to the cameras rather&lt;br /&gt;
than the sets. Video transmitted over the airwaves would thus have a&lt;br /&gt;
nonlinear intensity using the inverse of the set&#039;s gamma exponent, so that&lt;br /&gt;
once a camera&#039;s signal was finally displayed on the CRT, the overall&lt;br /&gt;
response of the system from camera to set was back to linear again.&lt;br /&gt;
&lt;br /&gt;
Almost.&lt;br /&gt;
&lt;br /&gt;
There were also two other tweaks. A television camera actually uses a&lt;br /&gt;
gamma exponent that&#039;s the inverse of 2.2, not 2.5.  That&#039;s just a&lt;br /&gt;
correction for viewing in a dim environment. Also, the exponential&lt;br /&gt;
curve transitions to a linear ramp near black.  That&#039;s just an old&lt;br /&gt;
hack for suppressing sensor noise in the camera.&lt;br /&gt;
&lt;br /&gt;
Gamma correction also had a lucky benefit. It just so happens that the&lt;br /&gt;
human eye has a perceptual gamma of about 3.  This is relatively close&lt;br /&gt;
to the CRT&#039;s gamma of 2.5. An image using gamma correction devotes&lt;br /&gt;
more resolution to lower intensities, where the eye happens to have&lt;br /&gt;
its finest intensity discrimination, and therefore uses the available&lt;br /&gt;
scale resolution more efficiently.  Although CRTs are currently&lt;br /&gt;
vanishing, a standard [[wikipedia:sRGB|sRGB]] computer display still uses a nonlinear&lt;br /&gt;
intensity curve similar to television, with a linear ramp near black,&lt;br /&gt;
followed by an exponential curve with a gamma exponent of 2.4. This&lt;br /&gt;
encodes a sixteen bit linear range down into eight bits.&lt;br /&gt;
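&lt;br /&gt;
For reference, here is the sRGB encoding curve just described, sketched in C (a wiki-edition illustration) with linear light and the encoded value both normalized to [0,1]:&lt;br /&gt;
&lt;br /&gt;
 #include &amp;lt;math.h&amp;gt;&lt;br /&gt;
 &lt;br /&gt;
 /* sRGB encoding: a linear ramp near black, then an exponential&lt;br /&gt;
    segment with a gamma exponent of 2.4. */&lt;br /&gt;
 static double srgb_encode(double linear) {&lt;br /&gt;
     if (linear &amp;lt;= 0.0031308)&lt;br /&gt;
         return 12.92 * linear;&lt;br /&gt;
     return 1.055 * pow(linear, 1.0 / 2.4) - 0.055;&lt;br /&gt;
 }&lt;br /&gt;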
&amp;lt;br style=&amp;quot;clear:both;&amp;quot;/&amp;gt;&lt;br /&gt;
&lt;br /&gt;
===color and colorspace===&lt;br /&gt;
[[Image:Dmpfg_014.jpg|360px|right]]&lt;br /&gt;
&lt;br /&gt;
The human eye has three apparent color channels, red, green, and blue,&lt;br /&gt;
and most displays use these three colors as [[wikipedia:Additive_color|additive primaries]] to&lt;br /&gt;
produce a full range of color output.  The primary pigments in&lt;br /&gt;
printing are [[wikipedia:CMYK|Cyan, Magenta, and Yellow]] for the same reason; pigments&lt;br /&gt;
are [[wikipedia:Subtractive_color|subtractive]], and each of these pigments subtracts one pure color&lt;br /&gt;
from reflected light.  Cyan subtracts red, magenta subtracts green, and&lt;br /&gt;
yellow subtracts blue.&lt;br /&gt;
&lt;br /&gt;
Video can be, and sometimes is, represented with red, green, and blue&lt;br /&gt;
color channels, but RGB video is atypical. The human eye is far more&lt;br /&gt;
sensitive to [[wikipedia:Luminance_(relative)|luminosity]] than it is to color, and RGB tends to spread&lt;br /&gt;
the energy of an image across all three color channels.  That is, the&lt;br /&gt;
red plane looks like a red version of the original picture, the green&lt;br /&gt;
plane looks like a green version of the original picture, and the blue&lt;br /&gt;
plane looks like a blue version of the original picture.  Black and&lt;br /&gt;
white times three.  Not efficient.&lt;br /&gt;
&lt;br /&gt;
For those reasons and because, oh hey, television just happened to&lt;br /&gt;
start out as black and white anyway, video usually is represented as a&lt;br /&gt;
high resolution [[wikipedia:Luma_(video)|luma channel]]&amp;amp;mdash;the black &amp;amp; white&amp;amp;mdash;along with&lt;br /&gt;
additional, often lower resolution [[wikipedia:Chrominance|chroma channels]], the color. The&lt;br /&gt;
luma channel, Y, is produced by weighting and then adding the separate&lt;br /&gt;
red, green and blue signals.  The chroma channels U and V are then&lt;br /&gt;
produced by subtracting the luma signal from blue and the luma signal&lt;br /&gt;
from red.&lt;br /&gt;
&lt;br /&gt;
When YUV is scaled, offset, and quantized for digital video, it&#039;s&lt;br /&gt;
usually more correctly called [[wikipedia:Y&#039;CbCr|Y&#039;CbCr]], but the more generic term YUV is&lt;br /&gt;
widely used to describe all the analog and digital variants of this&lt;br /&gt;
color model.&lt;br /&gt;
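&lt;br /&gt;
As a concrete sketch (a wiki-edition illustration) using the Rec. 601 weights, with r, g, and b normalized to [0,1]; Rec. 709 uses different constants:&lt;br /&gt;
&lt;br /&gt;
 /* RGB to Y&#039;CbCr, Rec. 601 weights.  Y is the weighted luma sum;&lt;br /&gt;
    Cb and Cr are scaled blue- and red-difference channels. */&lt;br /&gt;
 static void rgb_to_ycbcr(double r, double g, double b,&lt;br /&gt;
                          double *y, double *cb, double *cr) {&lt;br /&gt;
     *y  = 0.299 * r + 0.587 * g + 0.114 * b;&lt;br /&gt;
     *cb = 0.564 * (b - *y);    /* == (b - y) / 1.772 */&lt;br /&gt;
     *cr = 0.713 * (r - *y);    /* == (r - y) / 1.402 */&lt;br /&gt;
 }&lt;br /&gt;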
&amp;lt;br style=&amp;quot;clear:both;&amp;quot;/&amp;gt;&lt;br /&gt;
&lt;br /&gt;
===chroma subsampling===&lt;br /&gt;
[[Image:Dmpfg_015.jpg|360px|right]]&lt;br /&gt;
&lt;br /&gt;
The U and V chroma channels can have the same resolution as the Y&lt;br /&gt;
channel, but because the human eye has far less spatial color&lt;br /&gt;
resolution than spatial luminosity resolution, chroma resolution is&lt;br /&gt;
usually [[wikipedia:Chroma_subsampling|halved or even quartered]] in the horizontal direction, the&lt;br /&gt;
vertical direction, or both, usually without any significant impact on the&lt;br /&gt;
apparent raw image quality.  Practically every possible subsampling&lt;br /&gt;
variant has been used at one time or another, but the common choices&lt;br /&gt;
today are [[wikipedia:Chroma_subsampling#4:4:4_Y.27CbCr|4:4:4]] video, which isn&#039;t actually subsampled at all, [[wikipedia:Chroma_subsampling#4:2:2|4:2:2]] video in&lt;br /&gt;
which the horizontal resolution of the U and V channels is halved, and&lt;br /&gt;
most common of all, [[wikipedia:Chroma_subsampling#4:2:0|4:2:0]] video in which both the horizontal and vertical&lt;br /&gt;
resolutions of the chroma channels are halved, resulting in U and V&lt;br /&gt;
planes that are each one quarter the size of Y.&lt;br /&gt;
&lt;br /&gt;
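The buffer arithmetic falls straight out of those ratios; a sketch for 8-bit 4:2:0 frames with even dimensions (a wiki-edition illustration; odd sizes need the format&#039;s own rounding rules):&lt;br /&gt;
&lt;br /&gt;
 #include &amp;lt;stddef.h&amp;gt;&lt;br /&gt;
 &lt;br /&gt;
 /* Bytes in one 8-bit 4:2:0 frame: a full-size Y plane plus U and V&lt;br /&gt;
    planes at half resolution in both directions. */&lt;br /&gt;
 static size_t yuv420_frame_size(int width, int height) {&lt;br /&gt;
     size_t luma   = (size_t)width * height;&lt;br /&gt;
     size_t chroma = (size_t)(width / 2) * (height / 2);&lt;br /&gt;
     return luma + 2 * chroma;    /* == width * height * 3 / 2 */&lt;br /&gt;
 }&lt;br /&gt;
&lt;br /&gt;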
The terms 4:2:2, 4:2:0, [[wikipedia:Chroma_subsampling#4:1:1|4:1:1]], and so on and so forth, aren&#039;t complete&lt;br /&gt;
descriptions of a chroma subsampling. There are multiple possible ways&lt;br /&gt;
to position the chroma pixels relative to luma, and again, several&lt;br /&gt;
variants are in active use for each subsampling.  For example, [[wikipedia:Motion_Jpeg|motion&lt;br /&gt;
JPEG]], [[wikipedia:MPEG-1#Part_2:_Video|MPEG-1 video]], [[wikipedia:MPEG-2#Video_coding_.28simplified.29|MPEG-2 video]], [[wikipedia:DV#DV_Compression|DV]], [[wikipedia:Theora|Theora]], and [[wikipedia:WebM|WebM]] all use or can&lt;br /&gt;
use 4:2:0 subsampling, but they site the chroma pixels [http://www.mir.com/DMG/chroma.html three different ways].&lt;br /&gt;
&lt;br /&gt;
Motion JPEG, MPEG-1 video, Theora and WebM all site chroma pixels&lt;br /&gt;
between luma pixels both horizontally and vertically.&lt;br /&gt;
&lt;br /&gt;
MPEG-2 sites chroma pixels between lines, but horizontally aligned with&lt;br /&gt;
every other luma pixel. Interlaced modes complicate things somewhat,&lt;br /&gt;
resulting in a siting arrangement that&#039;s a tad bizarre.&lt;br /&gt;
&lt;br /&gt;
And finally PAL-DV, which is always interlaced, places the chroma&lt;br /&gt;
pixels in the same position as every other luma pixel in the&lt;br /&gt;
horizontal direction, and vertically alternates chroma channel on&lt;br /&gt;
each line.&lt;br /&gt;
&lt;br /&gt;
That&#039;s just 4:2:0 video. I&#039;ll leave the other subsamplings as homework for the&lt;br /&gt;
viewer.  Got the basic idea, moving on.&lt;br /&gt;
&amp;lt;br style=&amp;quot;clear:both;&amp;quot;/&amp;gt;&lt;br /&gt;
&lt;br /&gt;
===pixel formats===&lt;br /&gt;
[[Image:Dmpfg_016.jpg|360px|right]]&lt;br /&gt;
&lt;br /&gt;
In audio, we always represent multiple channels in a PCM stream by&lt;br /&gt;
interleaving the samples from each channel in order. Video uses both&lt;br /&gt;
packed formats that interleave the color channels, as well as planar&lt;br /&gt;
formats that keep the pixels from each channel together in separate&lt;br /&gt;
planes stacked in order in the frame. There are at least [http://www.fourcc.org/yuv.php 50 different formats] in&lt;br /&gt;
these two broad categories with possibly ten or fifteen in common use.&lt;br /&gt;
&lt;br /&gt;
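For instance, in the common planar 4:2:0 layout I420 (Y plane first, then U, then V at quarter size; YV12 is the same layout with U and V swapped), the plane pointers into one contiguous frame fall out like this sketch (a wiki-edition illustration):&lt;br /&gt;
&lt;br /&gt;
 #include &amp;lt;stddef.h&amp;gt;&lt;br /&gt;
 #include &amp;lt;stdint.h&amp;gt;&lt;br /&gt;
 &lt;br /&gt;
 /* Locate the three planes inside a contiguous 8-bit I420 frame. */&lt;br /&gt;
 static void i420_planes(uint8_t *frame, int w, int h,&lt;br /&gt;
                         uint8_t **y, uint8_t **u, uint8_t **v) {&lt;br /&gt;
     size_t luma   = (size_t)w * h;&lt;br /&gt;
     size_t chroma = (size_t)(w / 2) * (h / 2);&lt;br /&gt;
     *y = frame;                   /* full-resolution luma */&lt;br /&gt;
     *u = frame + luma;            /* quarter-size U plane */&lt;br /&gt;
     *v = frame + luma + chroma;   /* quarter-size V plane */&lt;br /&gt;
 }&lt;br /&gt;
&lt;br /&gt;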
Each chroma subsampling and different bit-depth requires a different&lt;br /&gt;
packing arrangement, and so a different pixel format.  For a given&lt;br /&gt;
unique subsampling, there are usually also several equivalent formats&lt;br /&gt;
that consist of trivial channel order rearrangements or repackings, due either to&lt;br /&gt;
convenience once-upon-a-time on some particular piece of hardware, or&lt;br /&gt;
sometimes just good old-fashioned spite.&lt;br /&gt;
&lt;br /&gt;
Pixel formats are described by a unique name or [[wikipedia:FourCC|fourcc]] code.  There&lt;br /&gt;
are quite a few of these and there&#039;s no sense going over each one now.&lt;br /&gt;
Google is your friend.  Be aware that fourcc codes for raw video&lt;br /&gt;
specify the pixel arrangement and chroma subsampling, but generally&lt;br /&gt;
don&#039;t imply anything certain about chroma siting or color space.  [http://www.fourcc.org/yuv.php#UYVY YV12]&lt;br /&gt;
video, to pick one, can use JPEG, MPEG-2, or DV chroma siting, and any&lt;br /&gt;
one of [[wikipedia:YUV#BT.709_and_BT.601|several YUV colorspace definitions]].&lt;br /&gt;
&amp;lt;br style=&amp;quot;clear:both;&amp;quot;/&amp;gt;&lt;br /&gt;
&lt;br /&gt;
===done!===&lt;br /&gt;
&lt;br /&gt;
That wraps up our not-so-quick and yet very incomplete tour of raw&lt;br /&gt;
video. The good news is we can already get quite a lot of real work&lt;br /&gt;
done using that overview. In plenty of situations, a frame of video&lt;br /&gt;
data is a frame of video data.  The details matter, greatly, when it&lt;br /&gt;
comes time to write software, but for now I am satisfied that the&lt;br /&gt;
esteemed viewer is broadly aware of the relevant issues.&lt;br /&gt;
&lt;br /&gt;
&amp;lt;center&amp;gt;&amp;lt;div style=&amp;quot;background-color:#DDDDFF;border-color:#CCCCDD;border-style:solid;width:80%;padding:0 1em 1em 1em;text-align:left;&amp;quot;&amp;gt;&lt;br /&gt;
&#039;&#039;&#039;Going deeper…&#039;&#039;&#039;&lt;br /&gt;
* [http://wiki.multimedia.cx/index.php?title=YUV4MPEG2 The y4m format] is the most common simple container for raw YUV video.  People occasionally use [[OggYUV]] to store it in Ogg instead.&lt;br /&gt;
* Learn about [[wikipedia:High_dynamic_range_imaging|high dynamic range imaging]], which achieves better representation of the full range of brightnesses in the real world by using more than 8 bits per channel.&lt;br /&gt;
* Learn about how [[wikipedia:Trichromatic_vision|trichromatic color vision]] works in humans, and how human color perception is encoded in the [[wikipedia:CIE 1931 color space|CIE 1931 XYZ color space]].&lt;br /&gt;
** Compare with the [[wikipedia:Lab_color_space|Lab color space]], mathematically equivalent but structured to account for &amp;quot;perceptual uniformity&amp;quot;.&lt;br /&gt;
** If we were all [[wikipedia:Dichromacy|dichromats]] then video would only need two color channels.  Some humans might be [[wikipedia:Tetrachromacy#Possibility_of_human_tetrachromats|tetrachromats]], in which case they would need an additional color channel for video to fully represent their vision.&lt;br /&gt;
** [http://www.xritephoto.com/ph_toolframe.aspx?action=coloriq Test your color vision] (or at least your monitor).&lt;br /&gt;
* YCbCr is defined in terms of RGB by the ITU in two incompatible standards: [[wikipedia:Rec. 601|Rec. 601]] and [[wikipedia:Rec. 709|Rec. 709]].  Both conversion standards are lossy, which has prompted some to adopt a lossless alternative called [http://wiki.multimedia.cx/index.php?title=YCoCg YCoCg].&lt;br /&gt;
&amp;lt;/div&amp;gt;&amp;lt;/center&amp;gt;&lt;br /&gt;
&lt;br /&gt;
&amp;lt;br style=&amp;quot;clear:both;&amp;quot;/&amp;gt;&lt;br /&gt;
&lt;br /&gt;
==Containers==&lt;br /&gt;
[[Image:Dmpfg_017.jpg|360px|right]]&lt;br /&gt;
&amp;lt;small&amp;gt;[[Talk:A_Digital_Media_Primer_For_Geeks_(episode_1)#Containers|Discuss this section]]&amp;lt;/small&amp;gt;&lt;br /&gt;
&lt;br /&gt;
So. We have audio data. We have video data. What remains is the more&lt;br /&gt;
familiar non-signal data and straight-up engineering that software&lt;br /&gt;
developers are used to, and plenty of it.&lt;br /&gt;
&lt;br /&gt;
Chunks of raw audio and video data have no externally-visible&lt;br /&gt;
structure, but they&#039;re often uniformly sized.  We could just string&lt;br /&gt;
them together in a rigid predetermined ordering for streaming and&lt;br /&gt;
storage, and some simple systems do approximately that. Compressed&lt;br /&gt;
frames, though, aren&#039;t necessarily a predictable size, and we usually want&lt;br /&gt;
some flexibility in using a range of different data types in streams.&lt;br /&gt;
If we string random formless data together, we lose the boundaries&lt;br /&gt;
that separate frames and don&#039;t necessarily know what data belongs to&lt;br /&gt;
which streams.  A stream needs some generalized structure to be&lt;br /&gt;
generally useful.&lt;br /&gt;
&lt;br /&gt;
In addition to our signal data, we also have our PCM and video&lt;br /&gt;
parameters.  There&#039;s probably plenty of other [[wikipedia:Metadata#Video|metadata]] we also want to&lt;br /&gt;
deal with, like audio tags and video chapters and subtitles, all&lt;br /&gt;
essential components of rich media.  It makes sense to place this&lt;br /&gt;
metadata&amp;amp;mdash;that is, data about the data&amp;amp;mdash;within the media itself.&lt;br /&gt;
&lt;br /&gt;
Storing and structuring formless data and disparate metadata is the&lt;br /&gt;
job of a [[wikipedia:Container_format_(digital)|container]].  Containers provide framing for the data blobs,&lt;br /&gt;
interleave and identify multiple data streams, provide timing&lt;br /&gt;
information, and store the metadata necessary to parse, navigate,&lt;br /&gt;
manipulate, and present the media.  In general, any container can hold&lt;br /&gt;
any kind of data.  And data can be put into any container.&lt;br /&gt;
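&lt;br /&gt;
As a toy illustration of what framing means (a made-up format for this wiki-edition sketch, not Ogg or any real container), each blob gets a small header a demuxer can use to find boundaries, tell streams apart, and order everything in time:&lt;br /&gt;
&lt;br /&gt;
 #include &amp;lt;stdint.h&amp;gt;&lt;br /&gt;
 &lt;br /&gt;
 /* Hypothetical packet header for a toy container format.  (A real&lt;br /&gt;
    format would also pin down byte order and struct packing.) */&lt;br /&gt;
 struct toy_packet_header {&lt;br /&gt;
     uint32_t stream_id;      /* which stream this blob belongs to       */&lt;br /&gt;
     uint64_t timestamp;      /* presentation time in a fixed timebase   */&lt;br /&gt;
     uint32_t payload_size;   /* bytes of blob data following the header */&lt;br /&gt;
 };&lt;br /&gt;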
&amp;lt;br style=&amp;quot;clear:both;&amp;quot;/&amp;gt;&lt;br /&gt;
&lt;br /&gt;
==Credits==&lt;br /&gt;
&lt;br /&gt;
In the past thirty minutes, we&#039;ve covered digital audio, video, some&lt;br /&gt;
history, some math and a little engineering. We&#039;ve barely scratched the&lt;br /&gt;
surface, but it&#039;s time for a well-earned break.&lt;br /&gt;
&lt;br /&gt;
There&#039;s so much more to talk about, so I hope you&#039;ll join me again in&lt;br /&gt;
our next episode.  Until then&amp;amp;mdash;Cheers!&lt;br /&gt;
&lt;br /&gt;
Written by:&lt;br /&gt;
Christopher (Monty) Montgomery&lt;br /&gt;
and the Xiph.Org Community&lt;br /&gt;
&lt;br /&gt;
Intro, title and credits music:&amp;lt;br&amp;gt;&lt;br /&gt;
&amp;quot;Boo Boo Coming&amp;quot;, by Joel Forrester&amp;lt;br&amp;gt;&lt;br /&gt;
Performed by the Microscopic Septet&amp;lt;br&amp;gt;&lt;br /&gt;
Used by permission of Cuneiform Records.&amp;lt;br&amp;gt;&lt;br /&gt;
Original source track All Rights Reserved.&amp;lt;br&amp;gt;&lt;br /&gt;
www.cuneiformrecords.com&lt;br /&gt;
&lt;br /&gt;
This Video Was Produced Entirely With Free and Open Source Software&lt;br /&gt;
&lt;br /&gt;
GNU&lt;br /&gt;
Linux&lt;br /&gt;
Fedora&lt;br /&gt;
Cinelerra&lt;br /&gt;
The Gimp&lt;br /&gt;
Audacity&lt;br /&gt;
Postfish&lt;br /&gt;
Gstreamer&lt;br /&gt;
&lt;br /&gt;
CC BY-NC-SA&lt;br /&gt;
A Co-Production of Xiph.Org and Red Hat Inc.&lt;br /&gt;
(C) 2010, Some Rights Reserved&lt;br /&gt;
&lt;br /&gt;
&amp;lt;br style=&amp;quot;clear:both;&amp;quot;/&amp;gt;&lt;br /&gt;
&lt;br /&gt;
==The making of…==&lt;br /&gt;
&amp;lt;small&amp;gt;[[Talk:A_Digital_Media_Primer_For_Geeks_(episode_1)#The_making_of.E2.80.A6|Discuss this section]]&amp;lt;/small&amp;gt;&lt;br /&gt;
===Equipment===&lt;br /&gt;
====Camera====&lt;br /&gt;
Canon HV40 HDV camera w/ wide-angle lens operating on a tripod.&lt;br /&gt;
&lt;br /&gt;
The wide angle lens gives the camera a nice close macro mode, and approximately triples the amount of light coming into the sensor for a given zoom/aperture.  Useful for shooting indoors at night.&lt;br /&gt;
&lt;br /&gt;
No additional lighting kit was used.&lt;br /&gt;
&lt;br /&gt;
====Audio====&lt;br /&gt;
&lt;br /&gt;
Two Crown PCC160 boundary microphones placed on a table approximately 4-8 feet in front of the speaker, run through a cheap Behringer portable mixer and into the camera&#039;s microphone input.  &lt;br /&gt;
&lt;br /&gt;
No additional audio kit was used.&lt;br /&gt;
&lt;br /&gt;
====Sundries====&lt;br /&gt;
&lt;br /&gt;
Whiteboard markers by &#039;Bic&#039;&lt;br /&gt;
&lt;br /&gt;
Drawing aids by Staedtler, McMaster-Carr, and &#039;Generic&#039;.&lt;br /&gt;
&lt;br /&gt;
===Video shooting sequence===&lt;br /&gt;
&lt;br /&gt;
Scenes were pre-scripted and memorized, usually with lots of on-the-fly revision.  (In the future... I&#039;m totally getting a teleprompter.  Wh000.  OTOH, I can totally rattle off the entire video script from beginning to end as a party trick, thus ensuring I&#039;ll not be invited to many parties --Monty).&lt;br /&gt;
&lt;br /&gt;
Diagrams were drawn by hand on a physical whiteboard with whiteboard markers and magnetic T-squares, triangles, and yardsticks.  Despite looking a lot like greenscreen work, there is no image compositing in use (actually-- there are two small composites where an error in a whiteboard diagram was corrected by subtracting part of the original image and then adding a corrected version of the diagram).&lt;br /&gt;
&lt;br /&gt;
Camera operated in 24F shutter priority mode (Tv set to &amp;quot;24&amp;quot;) with exposure and white balance both calibrated to the white board (or a white piece of paper) and locked.  Microphone attenuation setting active, with gain locked such that room noise peaked at -40dB (all the rooms in the shooting sequences were noisy due to the building&#039;s ventilation system, or active equipment). Lighting in the whiteboard rooms tended to be odd, with little relative light cast on a presenter standing just in front of the whiteboard; a presenter is practically standing in the room&#039;s only shadow.  Most of the room light is focused on the table and walls.  Additional fill lighting kit would have been useful, but for the first vid, I didn&#039;t want &#039;perfect&#039; to be the enemy of &#039;good&#039;.&lt;br /&gt;
&lt;br /&gt;
Autofocus used for whiteboard scenes, manual focus used for several workshop scenes as the autofocus tended to hunt continuously in very low light.&lt;br /&gt;
&lt;br /&gt;
Continuous capture to a ThinkPad with FireWire input via a simple gstreamer script.&lt;br /&gt;
&lt;br /&gt;
===Production sequence===&lt;br /&gt;
====All hail Cinelerra.  You better hail, or Cinelerra will get pissy about it.====&lt;br /&gt;
&lt;br /&gt;
Most of the production sequence hinged on making Cinelerra happy; it is a hulking rusty cast iron WWI tank of a program that can seem like it&#039;s composed entirely of compressed bugs.  That said, it was neither particularly crashy nor did it ever accidentally corrupt or lose work.  It was also the only FOSS editor with a working 2D compositor.  It got the job done once I found a workflow it would cope with (and fixed a number of bugs; these fixes are available from my Cinelerra Git repo at http://git.xiph.org/?p=users/xiphmont/cinelerraCV.git;a=summary).&lt;br /&gt;
&lt;br /&gt;
====Choosing takes====&lt;br /&gt;
&lt;br /&gt;
Each shooting session yielded four to six hours of raw video.  The first step was to load the raw video into the cinelerra timeline, label each complete take, compare and choose the take to use, then render the chosen take out to a raw clip as a YUV4MPEG raw video file and a WAV raw audio file.  Be careful that Settings-&amp;gt;Align Cursor On Frames is set, else the audio and video renders won&#039;t start on the same boundary.&lt;br /&gt;
&lt;br /&gt;
====Postprocessing====&lt;br /&gt;
&lt;br /&gt;
At this point, the raw video clips were adjusted for gamma, contrast, and saturation in gstreamer and mplayer.  In the earlier shoots the camera was underexposing due to pilot error, which required quite a bit of gamma and saturation inflation to &#039;correct&#039; (there is no real correction as the low-end data is gone, but it&#039;s possible to make it look better).  Later shoots used saner settings and the adjustments were mostly to keep different shooting sessions more uniform.  The whiteboard tends not to look white because it&#039;s mildly reflective, and picked up the color of the cyan and orange audio baffles in the room like a big diffuse mirror.&lt;br /&gt;
&lt;br /&gt;
The audio was both noisy (due to the building&#039;s ventilation system which either sounded like a low loud rumble or a jet-engine taking off) and reverberant (the rooms were glass on two sides and plaster on the other two).  Early takes used no additional sound absorbing material in the rooms, and the Postfish filtering and deverb was used heavily.  It gives the early audio in the vid a slightly odd, processed feel (I had almost decided the original audio was simply unusable).  Later takes used some big fleece &#039;soft flats&#039; in the room to absorb some additional reverb, and the later takes are less heavily filtered.&lt;br /&gt;
&lt;br /&gt;
The postfish filtering chain used declip (for the occasional overrange oops), deverb (remove room reverberation), multicompand (noise gating), single compand (for volume levelling) and EQ (the Crown mics are nice, but are very midrange heavy).&lt;br /&gt;
&lt;br /&gt;
====Special Effects====&lt;br /&gt;
&lt;br /&gt;
Audio special effects were one-offs, mostly done using SoX.  The processed demo sections of audio were then spliced back into the original audio takes using Audacity.&lt;br /&gt;
&lt;br /&gt;
Video special effects (e.g., removing a color channel) were done by writing quick, one-off filters in C for y4oi.  A few effects were done by dumping a take as a directory full of PNGs, batch-processing the PNGs with a one-off C program, and then reassembling with mplayer.  Video effects were then stitched back into the original video takes in Cinelerra.&lt;br /&gt;
&lt;br /&gt;
====Editing====&lt;br /&gt;
&lt;br /&gt;
All editing was done in Cinelerra.  This primarily consisted of stitching the individual takes back together with crossfades.  All input and rendering output were done with raw YUV4MPEG and WAV files.  Note that making this work well and correctly required several patches to the YUV4MPEG handler and colorspace conversion code.&lt;br /&gt;
&lt;br /&gt;
====Encoding====&lt;br /&gt;
&lt;br /&gt;
Encoding was done by hand external to Cinelerra using mplayer for final postprocessing, the encoder_example tool included with the [Ptalarbvorm] Theora source distribution, and ivfenc for WebM.&lt;br /&gt;
&lt;br /&gt;
Sample Theora encode command line (note this is using an mplayer patched for y4o support; it could be done just as easily with a yuv4mpeg pipe):&lt;br /&gt;
&lt;br /&gt;
# 360p, 128-ish kbps (-a 4) audio + 500-ish kbps (-v 50) video&lt;br /&gt;
mplayer -vf hqdn3d,scale=640:360,gradfun=1.5,unsharp=l3x3:.1,yuv4ogg complete2.m2v -fast -noconsolecontrols -vo null &amp;gt; /dev/null &amp;amp; ~/MotherfishSVN/theora-ptalarbvorm/examples/encoder_example -a 4 -v 50 -k 240 complete2.wav output.y4o -o A_Digital_Media_Primer_For_Geeks-360p-a4+v50.ogv&lt;/div&gt;</summary>
		<author><name>Mindspillage</name></author>
	</entry>
	<entry>
		<id>https://wiki.xiph.org/index.php?title=Videos/A_Digital_Media_Primer_For_Geeks&amp;diff=12439</id>
		<title>Videos/A Digital Media Primer For Geeks</title>
		<link rel="alternate" type="text/html" href="https://wiki.xiph.org/index.php?title=Videos/A_Digital_Media_Primer_For_Geeks&amp;diff=12439"/>
		<updated>2010-09-22T04:23:47Z</updated>

		<summary type="html">&lt;p&gt;Mindspillage: /* Introduction */ link&lt;/p&gt;
&lt;hr /&gt;
&lt;div&gt;[[Image:Dmpfg_001.jpg|360px|right]]&lt;br /&gt;
&amp;lt;small&amp;gt;&#039;&#039;Wiki edition&#039;&#039;&amp;lt;/small&amp;gt;&lt;br /&gt;
&lt;br /&gt;
This first video from Xiph.Org presents the technical foundations of modern digital media via a half-hour firehose of information. One community member called it &amp;quot;a Uni lecture I never got but really wanted.&amp;quot;&lt;br /&gt;
&lt;br /&gt;
The program offers a brief history of digital media, a quick summary of the sampling theorem, and myriad details of low level audio and video characterization and formatting. It&#039;s intended for budding geeks looking to get into video coding, as well as the technically curious who want to know more about the media they wrangle for work or play.&lt;br /&gt;
&amp;lt;br/&amp;gt;&amp;lt;br/&amp;gt;&amp;lt;br/&amp;gt;&lt;br /&gt;
&amp;lt;center&amp;gt;&amp;lt;font size=&amp;quot;+2&amp;quot;&amp;gt;[http://www.xiph.org/video/vid1.shtml Download or Watch online]&amp;lt;/font&amp;gt;&amp;lt;/center&amp;gt;&lt;br /&gt;
&amp;lt;br style=&amp;quot;clear:both;&amp;quot;/&amp;gt;&lt;br /&gt;
&amp;lt;hr/&amp;gt;&lt;br /&gt;
&lt;br /&gt;
==Introduction==&lt;br /&gt;
[[Image:Dmpfg_000.jpg|360px|right]]&lt;br /&gt;
[[Image:Dmpfg_002.jpg|360px|right]]&lt;br /&gt;
&amp;lt;small&amp;gt;[[Talk:A_Digital_Media_Primer_For_Geeks_(episode_1)#Introduction|Discuss this section]]&amp;lt;/small&amp;gt;&lt;br /&gt;
&lt;br /&gt;
Workstations and high-end personal computers have been able to&lt;br /&gt;
manipulate digital audio pretty easily for about fifteen years now.&lt;br /&gt;
It&#039;s only been about five years that a decent workstation&#039;s been able&lt;br /&gt;
to handle raw video without a lot of expensive special purpose&lt;br /&gt;
hardware.&lt;br /&gt;
&lt;br /&gt;
But today even most cheap home PCs have the processor power and&lt;br /&gt;
storage necessary to really toss raw video around, at least without&lt;br /&gt;
too much of a struggle. So now that everyone has all of this cheap media-capable hardware, &lt;br /&gt;
more people, not surprisingly, want to do interesting&lt;br /&gt;
things with digital media, especially streaming. YouTube was the first huge&lt;br /&gt;
success, and now everybody wants in.&lt;br /&gt;
&lt;br /&gt;
Well good!  Because this stuff is a lot of fun!&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
It&#039;s no problem finding consumers for digital media.  But here I&#039;d&lt;br /&gt;
like to address the engineers, the mathematicians, the hackers, the&lt;br /&gt;
people who are interested in discovering and making things and&lt;br /&gt;
building the technology itself. The people after my own heart.&lt;br /&gt;
&lt;br /&gt;
Digital media, compression especially, is perceived to be super-elite,&lt;br /&gt;
somehow incredibly more difficult than anything else in computer&lt;br /&gt;
science. The big industry players in the field don&#039;t mind this&lt;br /&gt;
perception at all; it helps justify the staggering number of very&lt;br /&gt;
basic patents they hold.  They like the image that their media&lt;br /&gt;
researchers &amp;quot;are the best of the best, so much smarter than anyone&lt;br /&gt;
else that their brilliant ideas can&#039;t even be understood by mere&lt;br /&gt;
mortals.&amp;quot; This is bunk.&lt;br /&gt;
&lt;br /&gt;
Digital audio and video and streaming and compression offer endless&lt;br /&gt;
deep and stimulating mental challenges, just like any other&lt;br /&gt;
discipline. It seems elite because so few people have been&lt;br /&gt;
involved.  So few people have been involved perhaps because so few&lt;br /&gt;
people could afford the expensive, special-purpose equipment it&lt;br /&gt;
required. But today, just about anyone watching this video has a&lt;br /&gt;
cheap, general-purpose computer powerful enough to play with the big&lt;br /&gt;
boys. There are battles going on today around HTML5 and browsers and&lt;br /&gt;
video and open vs. closed.  So now is a pretty good time to get&lt;br /&gt;
involved.  The easiest place to start is probably understanding the&lt;br /&gt;
technology we have right now.&lt;br /&gt;
&lt;br /&gt;
This is an introduction. Since it&#039;s an introduction, it glosses over a&lt;br /&gt;
ton of details so that the big picture&#039;s a little easier to see.&lt;br /&gt;
Quite a few people watching are going to be way past anything that I&#039;m&lt;br /&gt;
talking about, at least for now.  On the other hand, I&#039;m probably&lt;br /&gt;
going to go too fast for folks who really are brand new to all of&lt;br /&gt;
this, so if this is all new, relax. The important thing is to pick out&lt;br /&gt;
any ideas that really grab your imagination. Especially pay attention&lt;br /&gt;
to the terminology surrounding those ideas, because with those, and&lt;br /&gt;
Google, and Wikipedia, you can dig as deep as interests you.&lt;br /&gt;
&lt;br /&gt;
So, without any further ado, welcome to one hell of a new hobby.&lt;br /&gt;
&lt;br /&gt;
&amp;lt;center&amp;gt;&amp;lt;div style=&amp;quot;background-color:#DDDDFF;border-color:#CCCCDD;border-style:solid;width:80%;padding:0 1em 1em 1em;text-align:left;&amp;quot;&amp;gt;&lt;br /&gt;
&#039;&#039;&#039;Going deeper…&#039;&#039;&#039;&lt;br /&gt;
*[http://www.0xdeadbeef.com/weblog/2010/01/html5-video-and-h-264-what-history-tells-us-and-why-were-standing-with-the-web/ HTML5 Video and H.264: what history tells us and why we&#039;re standing with the web]: Chris Blizzard of Mozilla on free formats and the open web&lt;br /&gt;
*[http://diveintohtml5.org/video.html Dive into HTML5] web video tutorial&lt;br /&gt;
&amp;lt;/div&amp;gt;&amp;lt;/center&amp;gt;&lt;br /&gt;
&lt;br /&gt;
&amp;lt;br style=&amp;quot;clear:both;&amp;quot;/&amp;gt;&lt;br /&gt;
&lt;br /&gt;
==Analog vs Digital==&lt;br /&gt;
[[Image:Dmpfg_004.jpg|360px|right]]&lt;br /&gt;
[[Image:Dmpfg_006.jpg|360px|right]]&lt;br /&gt;
[[Image:Dmpfg_007.jpg|360px|right]]&lt;br /&gt;
&amp;lt;small&amp;gt;[[Talk:A_Digital_Media_Primer_For_Geeks_(episode_1)#Analog_vs_Digital|Discuss this section]]&amp;lt;/small&amp;gt;&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
[[WikiPedia:Sound|Sound]] is the propagation of pressure waves through air, spreading out&lt;br /&gt;
from a source like ripples spread from a stone tossed into a pond.  A&lt;br /&gt;
microphone, or the human ear for that matter, transforms these passing&lt;br /&gt;
ripples of pressure into an electric signal.  Right, this is&lt;br /&gt;
middle school science class, everyone remembers this.  Moving on.&lt;br /&gt;
&lt;br /&gt;
That audio signal is a one-dimensional function, a single value&lt;br /&gt;
varying over time.  If we slow the [[WikiPedia:Oscilloscope|&#039;scope]] down a bit... that should be&lt;br /&gt;
a little easier to see. A few other aspects of the signal are&lt;br /&gt;
important. It&#039;s [[WikiPedia:Continuous_function|continuous]] in both value and time; that is, at any&lt;br /&gt;
given time it can have any real value, and there&#039;s a smoothly varying&lt;br /&gt;
value at every point in time.  No matter how much we zoom in, there&lt;br /&gt;
are no discontinuities, no singularities, no instantaneous steps or&lt;br /&gt;
points where the signal ceases to exist. It&#039;s defined&lt;br /&gt;
everywhere. Classic continuous math works very well on these signals.&lt;br /&gt;
&lt;br /&gt;
A digital signal on the other hand is [[WikiPedia:Discrete_math|discrete]] in both value and time.&lt;br /&gt;
In the simplest and most common system, called [[WikiPedia:Pulse code modulation|Pulse Code Modulation]],&lt;br /&gt;
one of a fixed number of possible values directly represents the&lt;br /&gt;
instantaneous signal amplitude at points in time spaced a fixed&lt;br /&gt;
distance apart.  The end result is a stream of digits.&lt;br /&gt;
&lt;br /&gt;
Now this looks an awful lot like this.  It seems intuitive that we&lt;br /&gt;
should somehow be able to rigorously transform one into the other, and&lt;br /&gt;
good news, the [[WikiPedia:Nyquist-Shannon sampling theorem|Sampling Theorem]] says we can and tells us&lt;br /&gt;
how. Published in its most recognizable form by [[WikiPedia:Claude Shannon|Claude Shannon]] in 1949&lt;br /&gt;
and built on the work of [[WikiPedia:Harry Nyquist|Nyquist]], and [[WikiPedia:Ralph Hartley|Hartley]], and tons of others, the&lt;br /&gt;
sampling theorem states that not only can we go back and&lt;br /&gt;
forth between analog and digital, but also lays&lt;br /&gt;
down a set of conditions for which conversion is lossless and the two&lt;br /&gt;
representations become equivalent and interchangeable.  When the&lt;br /&gt;
lossless conditions aren&#039;t met, the sampling theorem tells us how and&lt;br /&gt;
how much information is lost or corrupted.&lt;br /&gt;
&lt;br /&gt;
Up until very recently, analog technology was the basis for&lt;br /&gt;
practically everything done with audio, and that&#039;s not because most&lt;br /&gt;
audio comes from an originally analog source.  You may also think that&lt;br /&gt;
since computers are fairly recent, analog signal technology must have&lt;br /&gt;
come first.  Nope. Digital is actually older.  The [[WikiPedia:Telegraph|telegraph]] predates&lt;br /&gt;
the telephone by half a century and was already fully mechanically&lt;br /&gt;
automated by the 1860s, sending coded, multiplexed digital signals&lt;br /&gt;
long distances. You know... [[WikiPedia:Tickertape|tickertape]]. Harry Nyquist of [[WikiPedia:Bell_labs|Bell Labs]] was&lt;br /&gt;
researching telegraph pulse transmission when he published his&lt;br /&gt;
description of what later became known as the [[WikiPedia:Nyquist_frequency|Nyquist frequency]], the&lt;br /&gt;
core concept of the sampling theorem.  Now, it&#039;s true the telegraph&lt;br /&gt;
was transmitting symbolic information, text, not a digitized analog&lt;br /&gt;
signal, but with the advent of the telephone and radio, analog and&lt;br /&gt;
digital signal technology progressed rapidly and side-by-side.&lt;br /&gt;
&lt;br /&gt;
Audio had always been manipulated as an analog signal because... well,&lt;br /&gt;
gee, it&#039;s so much easier.  A [[WikiPedia:Low-pass_filter#Continuous-time_low-pass_filters|second-order low-pass filter]], for example,&lt;br /&gt;
requires two passive components.  An all-analog [[WikiPedia:Short-time_Fourier_transform|short-time Fourier&lt;br /&gt;
transform]], a few hundred.  Well, maybe a thousand if you want to build&lt;br /&gt;
something really fancy (bang on the [http://www.testequipmentdepot.com/usedequipment/hewlettpackard/spectrumanalyzers/3585a.htm 3585]).  Processing signals&lt;br /&gt;
digitally requires millions to billions of transistors running at&lt;br /&gt;
microwave frequencies, support hardware at the very least to digitize and&lt;br /&gt;
reconstruct the analog signals, a complete software ecosystem for&lt;br /&gt;
programming and controlling that billion-transistor juggernaut,&lt;br /&gt;
digital storage just in case you want to keep any of those bits for&lt;br /&gt;
later...&lt;br /&gt;
&lt;br /&gt;
So we come to the conclusion that analog is the only practical way to&lt;br /&gt;
do much with audio... well, unless you happen to have a billion&lt;br /&gt;
transistors and all the other things just lying around. And [[WikiPedia:File:Transistor_Count_and_Moore&#039;s_Law_-_2008.svg|since we&lt;br /&gt;
do]], digital signal processing becomes very attractive.&lt;br /&gt;
&lt;br /&gt;
For one thing, analog componentry just doesn&#039;t have the flexibility of&lt;br /&gt;
a general purpose computer.  Adding a new function to this&lt;br /&gt;
beast... yeah, it&#039;s probably not going to happen.  On a digital&lt;br /&gt;
processor though, just write a new program.  Software isn&#039;t trivial,&lt;br /&gt;
but it is a lot easier.&lt;br /&gt;
&lt;br /&gt;
Perhaps more importantly, though, every analog component is an&lt;br /&gt;
approximation. There&#039;s no such thing as a perfect transistor, or a&lt;br /&gt;
perfect inductor, or a perfect capacitor.  In analog, every component&lt;br /&gt;
adds [[WikiPedia:Johnson–Nyquist_noise|noise]] and [[WikiPedia:Distortion#Electronic_signals|distortion]], usually not very much, but it adds up. Just&lt;br /&gt;
transmitting an analog signal, especially over long distances,&lt;br /&gt;
progressively, measurably, irretrievably corrupts it.  Besides, all of&lt;br /&gt;
those single-purpose analog components take up a lot of space.  Two&lt;br /&gt;
lines of code on the billion transistors back here can implement a&lt;br /&gt;
filter that would require an [[WikiPedia:Inductor|inductor]] the size of a refrigerator.&lt;br /&gt;
&lt;br /&gt;
Digital systems don&#039;t have these drawbacks.  Digital signals can be&lt;br /&gt;
stored, copied, manipulated, and transmitted without adding any noise&lt;br /&gt;
or distortion. We do use [[WikiPedia:Lossy_compression|lossy]] algorithms from time to time, but the&lt;br /&gt;
only unavoidably non-ideal steps are digitization and reconstruction,&lt;br /&gt;
where digital has to interface with all of that messy analog.  Messy&lt;br /&gt;
or not, modern [[WikiPedia:Digital-to-analog_converter|conversion stages]] are very, very good.  By the&lt;br /&gt;
standards of our ears, we can consider them practically lossless as&lt;br /&gt;
well.&lt;br /&gt;
&lt;br /&gt;
With a little extra hardware, then, most of which is now small and&lt;br /&gt;
inexpensive due to our modern industrial infrastructure, digital audio&lt;br /&gt;
is the clear winner over analog.  So let us then go about storing it,&lt;br /&gt;
copying it, manipulating it, and transmitting it.&lt;br /&gt;
&lt;br /&gt;
&amp;lt;center&amp;gt;&amp;lt;div style=&amp;quot;background-color:#DDDDFF;border-color:#CCCCDD;border-style:solid;width:80%;padding:0 1em 1em 1em;text-align:left;&amp;quot;&amp;gt;&lt;br /&gt;
&#039;&#039;&#039;Going deeper…&#039;&#039;&#039;&lt;br /&gt;
*Wikipedia: [[wiki:Nyquist–Shannon_sampling_theorem|Nyquist–Shannon sampling theorem]]&lt;br /&gt;
*MIT OpenCourseWare [http://ocw.mit.edu/courses/electrical-engineering-and-computer-science/6-003-signals-and-systems-spring-2010/lecture-notes/ Lecture notes from 6.003 signals and systems.]&lt;br /&gt;
&amp;lt;/div&amp;gt;&amp;lt;/center&amp;gt;&lt;br /&gt;
&lt;br /&gt;
&amp;lt;br style=&amp;quot;clear:both;&amp;quot;/&amp;gt;&lt;br /&gt;
&lt;br /&gt;
==Raw (digital audio) meat==&lt;br /&gt;
&amp;lt;small&amp;gt;[[Talk:A_Digital_Media_Primer_For_Geeks_(episode_1)#Raw_.28digital_audio.29_meat|Discuss this section]]&amp;lt;/small&amp;gt;&lt;br /&gt;
&lt;br /&gt;
Pulse Code Modulation is the most common representation for &lt;br /&gt;
raw audio.  Other practical representations do exist: for example, the&lt;br /&gt;
[[WikiPedia:Delta-sigma_modulation|Sigma-Delta coding]] used by the [[WikiPedia:Super_Audio_CD|SACD]], which is a form of [[wikipedia:Pulse-density_modulation|Pulse Density&lt;br /&gt;
Modulation]].  That said, Pulse Code Modulation is far&lt;br /&gt;
and away dominant, mainly because it&#039;s so mathematically&lt;br /&gt;
convenient.  An audio engineer can spend an entire career without&lt;br /&gt;
running into anything else.&lt;br /&gt;
&lt;br /&gt;
PCM encoding can be characterized in three parameters, making it easy&lt;br /&gt;
to account for every possible PCM variant with mercifully little&lt;br /&gt;
hassle.&lt;br /&gt;
&amp;lt;br style=&amp;quot;clear:both;&amp;quot;/&amp;gt;&lt;br /&gt;
===sample rate===&lt;br /&gt;
[[Image:Dmpfg_009.jpg|360px|right]]&lt;br /&gt;
[[Image:Dmpfg_008.jpg|360px|right]]&lt;br /&gt;
The first parameter is the [[wikipedia:Sampling_rate|sampling rate]].  The highest frequency an&lt;br /&gt;
encoding can represent is called the Nyquist Frequency.  The Nyquist&lt;br /&gt;
frequency of PCM happens to be exactly half the sampling rate.&lt;br /&gt;
Therefore, the sampling rate directly determines the highest possible&lt;br /&gt;
frequency in the digitized signal.&lt;br /&gt;
&lt;br /&gt;
Analog telephone systems traditionally [[wikipedia:Bandlimiting|band-limited]] voice channels to&lt;br /&gt;
just under 4kHz, so digital telephony and most classic voice&lt;br /&gt;
applications use an 8kHz sampling rate: the minimum sampling rate&lt;br /&gt;
necessary to capture the entire bandwidth of a 4kHz channel.  This is&lt;br /&gt;
what an 8kHz sampling rate sounds like&amp;amp;mdash;a bit muffled but perfectly&lt;br /&gt;
intelligible for voice.  This is the lowest sampling rate that&#039;s ever&lt;br /&gt;
been used widely in practice.&lt;br /&gt;
&lt;br /&gt;
From there, as power, and memory, and storage increased, consumer&lt;br /&gt;
computer hardware went to offering 11, and then 16, and then 22, and&lt;br /&gt;
then 32kHz sampling.  With each increase in the sampling rate and the&lt;br /&gt;
Nyquist frequency, it&#039;s obvious that the high end becomes a little&lt;br /&gt;
clearer and the sound more natural.&lt;br /&gt;
&lt;br /&gt;
The Compact Disc uses a 44.1kHz sampling rate, which is again slightly&lt;br /&gt;
better than 32kHz, but the gains are becoming less distinct.  44.1kHz&lt;br /&gt;
is a bit of an oddball choice, especially given that it hadn&#039;t been&lt;br /&gt;
used for anything prior to the compact disc, but the huge success of&lt;br /&gt;
the CD has made it a common rate.&lt;br /&gt;
&lt;br /&gt;
The most common hi-fidelity sampling rate aside from the CD is 48kHz.&lt;br /&gt;
There&#039;s virtually no audible difference between the two.  This video,&lt;br /&gt;
or at least the original version of it, was shot and produced with&lt;br /&gt;
48kHz audio, which happens to be the original standard for&lt;br /&gt;
high-fidelity audio with video.&lt;br /&gt;
&lt;br /&gt;
Super-hi-fidelity sampling rates of 88, and 96, and 192kHz have also&lt;br /&gt;
appeared. The reason for the sampling rates beyond 48kHz isn&#039;t to&lt;br /&gt;
extend the audible high frequencies further. It&#039;s for a different&lt;br /&gt;
reason.&lt;br /&gt;
&lt;br /&gt;
Stepping back for just a second, the French mathematician [[wikipedia:Joseph_Fourier|Jean&lt;br /&gt;
Baptiste Joseph Fourier]] showed that we can also think of signals like&lt;br /&gt;
audio as a set of component frequencies.  This [[wikipedia:Frequency_domain|frequency-domain]]&lt;br /&gt;
representation is equivalent to the time representation; the signal is&lt;br /&gt;
exactly the same, we&#039;re just looking at it [[wikipedia:Basis_(linear_algebra)|a different way]].  Here we see the&lt;br /&gt;
frequency-domain representation of a hypothetical analog signal we&lt;br /&gt;
intend to digitally sample.&lt;br /&gt;
&lt;br /&gt;
The sampling theorem tells us two essential things about the sampling&lt;br /&gt;
process. First, that a digital signal can&#039;t represent any&lt;br /&gt;
frequencies above the Nyquist frequency. Second, and this is the new&lt;br /&gt;
part, if we don&#039;t remove those frequencies with a low-pass filter&lt;br /&gt;
before sampling, the sampling process will fold them down into the&lt;br /&gt;
representable frequency range as [[wikipedia:Aliasing|aliasing distortion]].&lt;br /&gt;
&lt;br /&gt;
Aliasing, in a nutshell, sounds freakin&#039; awful, so it&#039;s essential to&lt;br /&gt;
remove any beyond-Nyquist frequencies before sampling and after&lt;br /&gt;
reconstruction.&lt;br /&gt;
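&lt;br /&gt;
To make the fold-down concrete, here&#039;s a minimal C sketch (mine, not anything from the video) that computes where a tone lands after sampling: a 30kHz tone sampled at 48kHz, for example, folds down to an 18kHz alias.&lt;br /&gt;
&lt;br /&gt;
#include &amp;lt;math.h&amp;gt;&lt;br /&gt;
#include &amp;lt;stdio.h&amp;gt;&lt;br /&gt;
&lt;br /&gt;
/* Frequency at which a real tone of frequency f appears after&lt;br /&gt;
   sampling at rate fs: fold around the Nyquist frequency fs/2. */&lt;br /&gt;
double alias(double f, double fs) {&lt;br /&gt;
  double r = fmod(f, fs);&lt;br /&gt;
  return (r &amp;gt; fs / 2) ? fs - r : r;&lt;br /&gt;
}&lt;br /&gt;
&lt;br /&gt;
int main(void) {&lt;br /&gt;
  printf(&amp;quot;%g\n&amp;quot;, alias(30000.0, 48000.0)); /* prints 18000 */&lt;br /&gt;
  return 0;&lt;br /&gt;
}&lt;br /&gt;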
&lt;br /&gt;
Human frequency perception is considered to extend to about 20kHz. In&lt;br /&gt;
44.1 or 48kHz sampling, the low pass before the sampling stage has to&lt;br /&gt;
be extremely sharp to avoid cutting any audible frequencies below&lt;br /&gt;
[[wikipedia:Hearing_range|20kHz]] but still not allow frequencies above the Nyquist to leak&lt;br /&gt;
forward into the sampling process.  This is a difficult filter to&lt;br /&gt;
build, and no practical filter succeeds completely. If the sampling&lt;br /&gt;
rate is 96kHz or 192kHz on the other hand, the low pass has an extra&lt;br /&gt;
[[wikipedia:Octave_(electronics)|octave]] or two for its transition band. This is a much easier filter to&lt;br /&gt;
build.  Sampling rates beyond 48kHz are actually one of those messy&lt;br /&gt;
analog stage compromises.&lt;br /&gt;
&amp;lt;br style=&amp;quot;clear:both;&amp;quot;/&amp;gt;&lt;br /&gt;
===sample format===&lt;br /&gt;
[[Image:Dmpfg_anim.gif|right]]&lt;br /&gt;
&lt;br /&gt;
The second fundamental PCM parameter is the sample format; that is,&lt;br /&gt;
the format of each digital number.  A number is a number, but a number&lt;br /&gt;
can be represented in bits a number of different ways.&lt;br /&gt;
&lt;br /&gt;
Early PCM was [[wikipedia:Quantization_(sound_processing)#Audio_quantization|eight-bit]] [[wikipedia:Linear_pulse_code_modulation|linear]], encoded as an [[wikipedia:Signedness|unsigned]] [[wikipedia:Integer_(computer_science)#Bytes_and_octets|byte]].  The&lt;br /&gt;
[[wikipedia:Dynamic_range#Audio|dynamic range]] is limited to about [[wikipedia:Decibel|50dB]]  and the [[wikipedia:Quantization_error|quantization noise]], as&lt;br /&gt;
you can hear, is pretty severe.  Eight-bit audio is vanishingly rare&lt;br /&gt;
today.&lt;br /&gt;
&lt;br /&gt;
Digital telephony typically uses one of two related non-linear eight&lt;br /&gt;
bit encodings called [[wikipedia:A-law_algorithm|A-law]] and [[wikipedia:Μ-law_algorithm|μ-law]]. These formats encode a roughly&lt;br /&gt;
[[wikipedia:Audio_bit_depth#Dynamic_range|14 bit dynamic range]] into eight bits by spacing the higher amplitude&lt;br /&gt;
values farther apart. A-law and mu-law obviously improve quantization&lt;br /&gt;
noise compared to linear 8-bit, and voice harmonics especially hide&lt;br /&gt;
the remaining quantization noise well. All three eight-bit encodings,&lt;br /&gt;
linear, A-law, and mu-law, are typically paired with an 8kHz sampling&lt;br /&gt;
rate, though I&#039;m demonstrating them here at 48kHz.&lt;br /&gt;
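&lt;br /&gt;
For the curious, the continuous μ-law curve is simple enough to sketch in a few lines of C.  Note this is the idealized formula with μ = 255, not the segmented table approximation that the G.711 standard actually specifies.&lt;br /&gt;
&lt;br /&gt;
#include &amp;lt;math.h&amp;gt;&lt;br /&gt;
&lt;br /&gt;
/* Idealized mu-law companding of a sample in [-1.0, 1.0].&lt;br /&gt;
   The output, also in [-1.0, 1.0], spaces high amplitudes&lt;br /&gt;
   farther apart and is ready to quantize to eight bits. */&lt;br /&gt;
double mulaw_compress(double x) {&lt;br /&gt;
  const double mu = 255.0;&lt;br /&gt;
  double sign = (x &amp;lt; 0) ? -1.0 : 1.0;&lt;br /&gt;
  return sign * log(1.0 + mu * fabs(x)) / log(1.0 + mu);&lt;br /&gt;
}&lt;br /&gt;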
&lt;br /&gt;
Most modern PCM uses 16- or 24-bit [[wikipedia:Two&#039;s_complement|two&#039;s-complement]] signed integers to&lt;br /&gt;
encode the range from negative infinity to zero decibels in 16 or 24&lt;br /&gt;
bits of precision. The maximum absolute value corresponds to zero decibels.&lt;br /&gt;
As with all the sample formats so far, signals beyond zero decibels, and thus&lt;br /&gt;
beyond the maximum representable range, are [[wikipedia:Clipping_(audio)|clipped]].&lt;br /&gt;
&lt;br /&gt;
In mixing and mastering, it&#039;s not unusual to use [[wikipedia:Floating_point|floating-point]]&lt;br /&gt;
numbers for PCM instead of [[wikipedia:Integer_(computer_science)|integers]].  A 32 bit [[wikipedia:IEEE_754-2008|IEEE754]] float, that&#039;s&lt;br /&gt;
the normal kind of floating point you see on current computers, has 24&lt;br /&gt;
bits of resolution, but a seven bit floating-point exponent increases&lt;br /&gt;
the representable range.  Floating point usually represents zero&lt;br /&gt;
decibels as +/-1.0, and because floats can obviously represent&lt;br /&gt;
considerably beyond that, temporarily exceeding zero decibels during&lt;br /&gt;
the mixing process doesn&#039;t cause clipping.  Floating-point PCM takes&lt;br /&gt;
up more space, so it tends to be used only as an intermediate&lt;br /&gt;
production format.&lt;br /&gt;
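&lt;br /&gt;
To make both conventions concrete, here&#039;s a sketch (mine, not from the video) of the conversion from floating-point PCM at +/-1.0 full scale down to 16-bit signed integers.  The clamp at the ends of the range is exactly where clipping happens.&lt;br /&gt;
&lt;br /&gt;
#include &amp;lt;math.h&amp;gt;&lt;br /&gt;
#include &amp;lt;stdint.h&amp;gt;&lt;br /&gt;
&lt;br /&gt;
/* Convert one float sample (zero decibels at +/-1.0) to 16-bit&lt;br /&gt;
   signed PCM.  Values beyond full scale are hard-clipped. */&lt;br /&gt;
int16_t float_to_s16(float x) {&lt;br /&gt;
  if (x &amp;gt; 1.0f) x = 1.0f;    /* clip */&lt;br /&gt;
  if (x &amp;lt; -1.0f) x = -1.0f;  /* clip */&lt;br /&gt;
  return (int16_t)lrintf(x * 32767.0f);&lt;br /&gt;
}&lt;br /&gt;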
&lt;br /&gt;
Lastly, most general purpose computers still read and&lt;br /&gt;
write data in octet bytes, so it&#039;s important to remember that samples&lt;br /&gt;
bigger than eight bits can be in [[wikipedia:Endianness|big- or little-endian order]], and both&lt;br /&gt;
endiannesses are common.  For example, Microsoft [[wikipedia:WAV|WAV]] files are little-endian,&lt;br /&gt;
and Apple [[wikipedia:AIFC|AIFC]] files tend to be big-endian.  Be aware of it.&lt;br /&gt;
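&lt;br /&gt;
Reading samples a byte at a time sidesteps the host machine&#039;s own byte order entirely; for example (a sketch), a little-endian 16-bit sample can be assembled like this no matter what machine the code runs on:&lt;br /&gt;
&lt;br /&gt;
#include &amp;lt;stdint.h&amp;gt;&lt;br /&gt;
&lt;br /&gt;
/* Assemble a 16-bit signed sample from two bytes stored&lt;br /&gt;
   little-endian (as in a WAV file), independent of host order. */&lt;br /&gt;
int16_t le16_sample(const uint8_t *b) {&lt;br /&gt;
  return (int16_t)(b[0] | (b[1] &amp;lt;&amp;lt; 8));&lt;br /&gt;
}&lt;br /&gt;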
&amp;lt;br style=&amp;quot;clear:both;&amp;quot;/&amp;gt;&lt;br /&gt;
===channels===&lt;br /&gt;
&lt;br /&gt;
The third PCM parameter is the number of [[wikipedia:Multichannel_audio|channels]].  The convention in&lt;br /&gt;
raw PCM is to encode multiple channels by interleaving the samples of&lt;br /&gt;
each channel together into a single stream.  Straightforward and extensible.&lt;br /&gt;
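&lt;br /&gt;
In code, that convention means sample n of channel c lives at a fixed, computable index; a sketch:&lt;br /&gt;
&lt;br /&gt;
#include &amp;lt;stdint.h&amp;gt;&lt;br /&gt;
&lt;br /&gt;
/* Fetch sample n of channel c from an interleaved PCM buffer&lt;br /&gt;
   carrying the given number of channels. */&lt;br /&gt;
int16_t get_sample(const int16_t *buf, int n, int channels, int c) {&lt;br /&gt;
  return buf[n * channels + c];&lt;br /&gt;
}&lt;br /&gt;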
&amp;lt;br style=&amp;quot;clear:both;&amp;quot;/&amp;gt;&lt;br /&gt;
===done!===&lt;br /&gt;
&lt;br /&gt;
And that&#039;s it!  That describes every PCM representation ever.  Done.&lt;br /&gt;
Digital audio is &#039;&#039;so easy&#039;&#039;!  There&#039;s more to do of course, but at this&lt;br /&gt;
point we&#039;ve got a nice useful chunk of audio data, so let&#039;s get some&lt;br /&gt;
video too.&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
&amp;lt;center&amp;gt;&amp;lt;div style=&amp;quot;background-color:#DDDDFF;border-color:#CCCCDD;border-style:solid;width:80%;padding:0 1em 1em 1em;text-align:left;&amp;quot;&amp;gt;&lt;br /&gt;
&#039;&#039;&#039;Going deeper…&#039;&#039;&#039;&lt;br /&gt;
* [[wikipedia:Roll-off|Wikipedia&#039;s article on filter roll-off]], to learn why it&#039;s hard to build analog filters with a very narrow [[wikipedia:Transition_band|transition band]] between the [[wikipedia:Passband|passband]] and the [[wikipedia:Stopband|stopband]].&lt;br /&gt;
* [http://wiki.multimedia.cx/index.php?title=PCM Some more minutiae] about PCM in practice.&lt;br /&gt;
* [[wikipedia:DPCM|DPCM]] and [[wikipedia:ADPCM|ADPCM]], simple audio codecs loosely inspired by PCM.&lt;br /&gt;
&amp;lt;/div&amp;gt;&amp;lt;/center&amp;gt;&lt;br /&gt;
&lt;br /&gt;
&amp;lt;br style=&amp;quot;clear:both;&amp;quot;/&amp;gt;&lt;br /&gt;
&lt;br /&gt;
==Video vegetables (they&#039;re good for you!)==&lt;br /&gt;
[[Image:Dmpfg_010.jpg|360px|right]]&lt;br /&gt;
&amp;lt;small&amp;gt;[[Talk:A_Digital_Media_Primer_For_Geeks_(episode_1)#Video_vegetables_.28they.27re_good_for_you.21.29|Discuss this section]]&amp;lt;/small&amp;gt;&lt;br /&gt;
&lt;br /&gt;
One could think of video as being like audio but with two additional&lt;br /&gt;
spatial dimensions, X and Y, in addition to the dimension of time.&lt;br /&gt;
This is mathematically sound. The Sampling Theorem applies to all&lt;br /&gt;
three video dimensions just as it does the single time dimension of&lt;br /&gt;
audio.&lt;br /&gt;
&lt;br /&gt;
Audio and video are obviously quite different in practice. For one,&lt;br /&gt;
compared to audio, video is huge. [[wikipedia:Red_Book_(audio_Compact_Disc_standard)#Technical_details|Raw CD audio]] is about 1.4 megabits&lt;br /&gt;
per second. Raw [[wikipedia:1080i|1080i]] HD video is over 700 megabits per second. That&#039;s&lt;br /&gt;
more than 500 times more data to capture, process, and store per&lt;br /&gt;
second.  By [[wikipedia:Moore&#039;s_law|Moore&#039;s law]]... that&#039;s... let&#039;s see... roughly eight&lt;br /&gt;
doublings times two years, so yeah, computers requiring about an extra&lt;br /&gt;
fifteen years to handle raw video after getting raw audio down pat was&lt;br /&gt;
about right.&lt;br /&gt;
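&lt;br /&gt;
The arithmetic behind those numbers is easy to check; here&#039;s a sketch assuming 8-bit 4:2:0 frames at roughly 30 full frames per second for the HD figure:&lt;br /&gt;
&lt;br /&gt;
#include &amp;lt;stdio.h&amp;gt;&lt;br /&gt;
&lt;br /&gt;
int main(void) {&lt;br /&gt;
  double audio = 44100.0 * 16 * 2;            /* raw CD audio in bits/s */&lt;br /&gt;
  double video = 1920 * 1080 * 1.5 * 8 * 30;  /* 8-bit 4:2:0 at 30fps   */&lt;br /&gt;
  printf(&amp;quot;audio: %.1f Mbps\n&amp;quot;, audio / 1e6);   /* about 1.4 */&lt;br /&gt;
  printf(&amp;quot;video: %.1f Mbps\n&amp;quot;, video / 1e6);   /* about 746 */&lt;br /&gt;
  printf(&amp;quot;ratio: %.0f to 1\n&amp;quot;, video / audio); /* about 529 */&lt;br /&gt;
  return 0;&lt;br /&gt;
}&lt;br /&gt;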
&lt;br /&gt;
Basic raw video is also just more complex than basic raw audio. The&lt;br /&gt;
sheer volume of data currently necessitates a representation more&lt;br /&gt;
efficient than the linear PCM used for audio.  In addition, electronic&lt;br /&gt;
video comes almost entirely from broadcast television alone, and the&lt;br /&gt;
standards committees that govern broadcast video have always been very&lt;br /&gt;
concerned with backward compatibility.  Up until just last year in the&lt;br /&gt;
US, a sixty-year-old black and white television could still show a&lt;br /&gt;
normal [[wikipedia:NTSC|analog television broadcast]].  That&#039;s actually a really neat&lt;br /&gt;
trick.&lt;br /&gt;
&lt;br /&gt;
The downside to backward compatibility is that once a detail makes it&lt;br /&gt;
into a standard, you can&#039;t ever really throw it out again. Electronic&lt;br /&gt;
video has never started over from scratch the way audio has multiple&lt;br /&gt;
times.  Sixty years worth of clever but obsolete hacks necessitated by&lt;br /&gt;
the passing technology of a given era have built up into quite a pile,&lt;br /&gt;
and because digital standards also come from broadcast television, all&lt;br /&gt;
these eldritch hacks have been brought forward into the digital&lt;br /&gt;
standards as well.&lt;br /&gt;
&lt;br /&gt;
In short, there are a whole lot more details involved in digital video&lt;br /&gt;
than there were with audio. There&#039;s no hope of covering them&lt;br /&gt;
all completely here, so we&#039;ll cover the broad fundamentals.&lt;br /&gt;
&amp;lt;br style=&amp;quot;clear:both;&amp;quot;/&amp;gt;&lt;br /&gt;
===resolution and aspect===&lt;br /&gt;
[[Image:Dmpfg_011.jpg|360px|right]]&lt;br /&gt;
&lt;br /&gt;
The most obvious raw video parameters are the width and height of the&lt;br /&gt;
picture in pixels. As simple as that may sound, the pixel dimensions&lt;br /&gt;
alone don&#039;t actually specify the absolute width and height of the&lt;br /&gt;
picture, as most broadcast-derived video doesn&#039;t use square pixels.&lt;br /&gt;
The number of [[wikipedia:Scan_line|scanlines]] in a broadcast image was fixed, but the&lt;br /&gt;
effective number of horizontal pixels was a function of channel&lt;br /&gt;
[[wikipedia:Bandwidth_(signal_processing)|bandwidth]]. Effective horizontal resolution could result in pixels that&lt;br /&gt;
were either narrower or wider than the spacing between scanlines.&lt;br /&gt;
&lt;br /&gt;
Standards have generally specified that digitally sampled video should&lt;br /&gt;
reflect the real resolution of the original analog source, so a large&lt;br /&gt;
amount of digital video also uses non-square pixels. For example, a&lt;br /&gt;
normal 4:3 aspect NTSC DVD is typically encoded with a display&lt;br /&gt;
resolution of [[wikipedia:DVD-Video#Frame_size_and_frame_rate|704 by 480]], a ratio wider than 4:3.  In this case, the&lt;br /&gt;
pixels themselves are assigned an aspect ratio of [[wikipedia:Standard-definition_television#Resolution|10:11]], making them&lt;br /&gt;
taller than they are wide and narrowing the image horizontally to the&lt;br /&gt;
correct aspect.  Such an image has to be resampled to show properly on&lt;br /&gt;
a digital display with square pixels.&lt;br /&gt;
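&lt;br /&gt;
The arithmetic is worth seeing once (a sketch): scaling the 704-pixel storage width by the 10:11 pixel aspect lands right back on a 4:3 picture.&lt;br /&gt;
&lt;br /&gt;
#include &amp;lt;stdio.h&amp;gt;&lt;br /&gt;
&lt;br /&gt;
int main(void) {&lt;br /&gt;
  /* 704x480 storage resolution with a 10:11 pixel aspect ratio */&lt;br /&gt;
  double w = 704.0 * 10.0 / 11.0;  /* 640 square pixels wide */&lt;br /&gt;
  printf(&amp;quot;%.0f x 480, %.3f:1\n&amp;quot;, w, w / 480.0); /* 1.333, i.e. 4:3 */&lt;br /&gt;
  return 0;&lt;br /&gt;
}&lt;br /&gt;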
&amp;lt;br style=&amp;quot;clear:both;&amp;quot;/&amp;gt;&lt;br /&gt;
===frame rate and interlacing===&lt;br /&gt;
[[Image:Dmpfg_012.jpg|360px|right]]&lt;br /&gt;
&lt;br /&gt;
The second obvious video parameter is the [[wikipedia:Frame_rate|frame rate]], the number of&lt;br /&gt;
full frames per second.  Several standard frame rates are in active&lt;br /&gt;
use. Digital video, in one form or another, can use all of them.  Or,&lt;br /&gt;
any other frame rate.  Or even variable rates where the frame rate&lt;br /&gt;
changes adaptively over the course of the video. The higher the frame&lt;br /&gt;
rate, the smoother the motion, and that brings us, unfortunately, to&lt;br /&gt;
[[wikipedia:Interlace|interlacing]].&lt;br /&gt;
&lt;br /&gt;
In the very earliest days of broadcast video, engineers sought the&lt;br /&gt;
fastest practical frame rate to smooth motion and to minimize [[wikipedia:Flicker_(screen)|flicker]]&lt;br /&gt;
on phosphor-based [[wikipedia:Cathode_ray_tube|CRTs]].  They were also under pressure to use the&lt;br /&gt;
least possible bandwidth for the highest resolution and fastest frame&lt;br /&gt;
rate.  Their solution was to interlace the video where the even lines&lt;br /&gt;
are sent in one pass and the odd lines in the next.  Each pass is&lt;br /&gt;
called a field and two fields sort of produce one complete frame.&lt;br /&gt;
&amp;quot;Sort of&amp;quot;, because the even and odd fields aren&#039;t actually from the&lt;br /&gt;
same source frame.  In a 60 field per second picture, the source frame&lt;br /&gt;
rate is actually 60 full frames per second, and half of each frame,&lt;br /&gt;
every other line, is simply discarded.  This is why we can&#039;t&lt;br /&gt;
[[wikipedia:Deinterlacing|deinterlace]] a video simply by combining two fields into one frame;&lt;br /&gt;
they&#039;re not actually from one frame to begin with.&lt;br /&gt;
&amp;lt;br style=&amp;quot;clear:both;&amp;quot;/&amp;gt;&lt;br /&gt;
&lt;br /&gt;
===gamma===&lt;br /&gt;
[[Image:Dmpfg_013.jpg|360px|right]]&lt;br /&gt;
&lt;br /&gt;
The cathode ray tube was the only available display technology for&lt;br /&gt;
most of the history of electronic video. A CRT&#039;s output brightness is&lt;br /&gt;
nonlinear, approximately equal to the input controlling voltage raised&lt;br /&gt;
to the 2.5th power. This exponent, 2.5, is designated gamma, and so&lt;br /&gt;
it&#039;s often referred to as the gamma of a display.  Cameras, though,&lt;br /&gt;
are linear, and if you feed a CRT a linear input signal, it looks a&lt;br /&gt;
bit like this.&lt;br /&gt;
&lt;br /&gt;
As there were originally to be very few cameras, which were&lt;br /&gt;
fantastically expensive anyway, and hopefully many, many television&lt;br /&gt;
sets, which had best be as inexpensive as possible, engineers decided to&lt;br /&gt;
add the necessary [[wikipedia:Gamma_correction|gamma correction]] circuitry to the cameras rather&lt;br /&gt;
than the sets. Video transmitted over the airwaves would thus have a&lt;br /&gt;
nonlinear intensity using the inverse of the set&#039;s gamma exponent, so that&lt;br /&gt;
once a camera&#039;s signal was finally displayed on the CRT, the overall&lt;br /&gt;
response of the system from camera to set was back to linear again.&lt;br /&gt;
&lt;br /&gt;
Almost.&lt;br /&gt;
&lt;br /&gt;
There were also two other tweaks. A television camera actually uses a&lt;br /&gt;
gamma exponent that&#039;s the inverse of 2.2, not 2.5.  That&#039;s just a&lt;br /&gt;
correction for viewing in a dim environment. Also, the exponential&lt;br /&gt;
curve transitions to a linear ramp near black.  That&#039;s just an old&lt;br /&gt;
hack for suppressing sensor noise in the camera.&lt;br /&gt;
&lt;br /&gt;
Gamma correction also had a lucky benefit. It just so happens that the&lt;br /&gt;
human eye has a perceptual gamma of about 3.  This is relatively close&lt;br /&gt;
to the CRT&#039;s gamma of 2.5. An image using gamma correction devotes&lt;br /&gt;
more resolution to lower intensities, where the eye happens to have&lt;br /&gt;
its finest intensity discrimination, and therefore uses the available&lt;br /&gt;
scale resolution more efficiently.  Although CRTs are currently&lt;br /&gt;
vanishing, a standard [[wikipedia:sRGB|sRGB]] computer display still uses a nonlinear&lt;br /&gt;
intensity curve similar to television, with a linear ramp near black,&lt;br /&gt;
followed by an exponential curve with a gamma exponent of 2.4. This&lt;br /&gt;
encodes a sixteen bit linear range down into eight bits.&lt;br /&gt;
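&lt;br /&gt;
The sRGB curve is published and short enough to write down; a minimal sketch of the encode direction, linear light in and nonlinear value out, both in [0, 1]:&lt;br /&gt;
&lt;br /&gt;
#include &amp;lt;math.h&amp;gt;&lt;br /&gt;
&lt;br /&gt;
/* sRGB encoding: a linear ramp near black, then a power&lt;br /&gt;
   curve with a gamma exponent of 2.4. */&lt;br /&gt;
double srgb_encode(double lin) {&lt;br /&gt;
  if (lin &amp;lt;= 0.0031308)&lt;br /&gt;
    return 12.92 * lin;&lt;br /&gt;
  return 1.055 * pow(lin, 1.0 / 2.4) - 0.055;&lt;br /&gt;
}&lt;br /&gt;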
&amp;lt;br style=&amp;quot;clear:both;&amp;quot;/&amp;gt;&lt;br /&gt;
&lt;br /&gt;
===color and colorspace===&lt;br /&gt;
[[Image:Dmpfg_014.jpg|360px|right]]&lt;br /&gt;
&lt;br /&gt;
The human eye has three apparent color channels, red, green, and blue,&lt;br /&gt;
and most displays use these three colors as [[wikipedia:Additive_color|additive primaries]] to&lt;br /&gt;
produce a full range of color output.  The primary pigments in&lt;br /&gt;
printing are [[wikipedia:CMYK|Cyan, Magenta, and Yellow]] for the same reason; pigments&lt;br /&gt;
are [[wikipedia:Subtractive_color|subtractive]], and each of these pigments subtracts one pure color&lt;br /&gt;
from reflected light.  Cyan subtracts red, magenta subtracts green, and&lt;br /&gt;
yellow subtracts blue.&lt;br /&gt;
&lt;br /&gt;
Video can be, and sometimes is, represented with red, green, and blue&lt;br /&gt;
color channels, but RGB video is atypical. The human eye is far more&lt;br /&gt;
sensitive to [[wikipedia:Luminance_(relative)|luminosity]] than it is to color, and RGB tends to spread&lt;br /&gt;
the energy of an image across all three color channels.  That is, the&lt;br /&gt;
red plane looks like a red version of the original picture, the green&lt;br /&gt;
plane looks like a green version of the original picture, and the blue&lt;br /&gt;
plane looks like a blue version of the original picture.  Black and&lt;br /&gt;
white times three.  Not efficient.&lt;br /&gt;
&lt;br /&gt;
For those reasons and because, oh hey, television just happened to&lt;br /&gt;
start out as black and white anyway, video usually is represented as a&lt;br /&gt;
high resolution [[wikipedia:Luma_(video)|luma channel]]&amp;amp;mdash;the black &amp;amp; white&amp;amp;mdash;along with&lt;br /&gt;
additional, often lower resolution [[wikipedia:Chrominance|chroma channels]], the color. The&lt;br /&gt;
luma channel, Y, is produced by weighting and then adding the separate&lt;br /&gt;
red, green and blue signals.  The chroma channels U and V are then&lt;br /&gt;
produced by subtracting the luma signal from blue and the luma signal&lt;br /&gt;
from red.&lt;br /&gt;
&lt;br /&gt;
When YUV is scaled, offset, and quantized for digital video, it&#039;s&lt;br /&gt;
usually more correctly called [[wikipedia:Y&#039;CbCr|Y&#039;CbCr]], but the more generic term YUV is&lt;br /&gt;
widely used to describe all the analog and digital variants of this&lt;br /&gt;
color model.&lt;br /&gt;
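&lt;br /&gt;
As a numeric sketch using the Rec. 601 weights (one of several variants in use; see the &amp;quot;Going deeper&amp;quot; notes at the end of this section), the whole conversion is three weighted sums:&lt;br /&gt;
&lt;br /&gt;
/* Rec. 601 luma and color-difference signals from R&#039;G&#039;B&#039;&lt;br /&gt;
   values in [0, 1]: Y is the weighted sum, and U and V are&lt;br /&gt;
   scaled blue- and red-difference channels. */&lt;br /&gt;
void rgb_to_yuv601(double r, double g, double b,&lt;br /&gt;
                   double *y, double *u, double *v) {&lt;br /&gt;
  *y = 0.299 * r + 0.587 * g + 0.114 * b;&lt;br /&gt;
  *u = 0.492 * (b - *y);&lt;br /&gt;
  *v = 0.877 * (r - *y);&lt;br /&gt;
}&lt;br /&gt;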
&amp;lt;br style=&amp;quot;clear:both;&amp;quot;/&amp;gt;&lt;br /&gt;
&lt;br /&gt;
===chroma subsampling===&lt;br /&gt;
[[Image:Dmpfg_015.jpg|360px|right]]&lt;br /&gt;
&lt;br /&gt;
The U and V chroma channels can have the same resolution as the Y&lt;br /&gt;
channel, but because the human eye has far less spatial color&lt;br /&gt;
resolution than spatial luminosity resolution, chroma resolution is&lt;br /&gt;
usually [[wikipedia:Chroma_subsampling|halved or even quartered]] in the horizontal direction, the&lt;br /&gt;
vertical direction, or both, usually without any significant impact on the&lt;br /&gt;
apparent raw image quality.  Practically every possible subsampling&lt;br /&gt;
variant has been used at one time or another, but the common choices&lt;br /&gt;
today are [[wikipedia:Chroma_subsampling#4:4:4_Y.27CbCr|4:4:4]] video, which isn&#039;t actually subsampled at all, [[wikipedia:Chroma_subsampling#4:2:2|4:2:2]] video in&lt;br /&gt;
which the horizontal resolution of the U and V channels is halved, and&lt;br /&gt;
most common of all, [[wikipedia:Chroma_subsampling#4:2:0|4:2:0]] video in which both the horizontal and vertical&lt;br /&gt;
resolutions of the chroma channels are halved, resulting in U and V&lt;br /&gt;
planes that are each one quarter the size of Y.&lt;br /&gt;
&lt;br /&gt;
The terms 4:2:2, 4:2:0, [[wikipedia:Chroma_subsampling#4:1:1|4:1:1]], and so on and so forth, aren&#039;t complete&lt;br /&gt;
descriptions of a chroma subsampling. There are multiple possible ways&lt;br /&gt;
to position the chroma pixels relative to luma, and again, several&lt;br /&gt;
variants are in active use for each subsampling.  For example, [[wikipedia:Motion_Jpeg|motion&lt;br /&gt;
JPEG]], [[wikipedia:MPEG-1#Part_2:_Video|MPEG-1 video]], [[wikipedia:MPEG-2#Video_coding_.28simplified.29|MPEG-2 video]], [[wikipedia:DV#DV_Compression|DV]], [[wikipedia:Theora|Theora]], and [[wikipedia:WebM|WebM]] all use or can&lt;br /&gt;
use 4:2:0 subsampling, but they site the chroma pixels [http://www.mir.com/DMG/chroma.html three different ways].&lt;br /&gt;
&lt;br /&gt;
Motion JPEG, MPEG-1 video, Theora and WebM all site chroma pixels&lt;br /&gt;
between luma pixels both horizontally and vertically.&lt;br /&gt;
&lt;br /&gt;
MPEG-2 sites chroma pixels between lines, but horizontally aligned with&lt;br /&gt;
every other luma pixel. Interlaced modes complicate things somewhat,&lt;br /&gt;
resulting in a siting arrangement that&#039;s a tad bizarre.&lt;br /&gt;
&lt;br /&gt;
And finally PAL-DV, which is always interlaced, places the chroma&lt;br /&gt;
pixels in the same position as every other luma pixel in the&lt;br /&gt;
horizontal direction, and vertically alternates chroma channel on&lt;br /&gt;
each line.&lt;br /&gt;
&lt;br /&gt;
That&#039;s just 4:2:0 video. I&#039;ll leave the other subsamplings as homework for the&lt;br /&gt;
viewer.  Got the basic idea, moving on.&lt;br /&gt;
&amp;lt;br style=&amp;quot;clear:both;&amp;quot;/&amp;gt;&lt;br /&gt;
&lt;br /&gt;
===pixel formats===&lt;br /&gt;
[[Image:Dmpfg_016.jpg|360px|right]]&lt;br /&gt;
&lt;br /&gt;
In audio, we always represent multiple channels in a PCM stream by&lt;br /&gt;
interleaving the samples from each channel in order. Video uses both&lt;br /&gt;
packed formats that interleave the color channels, as well as planar&lt;br /&gt;
formats that keep the pixels from each channel together in separate&lt;br /&gt;
planes stacked in order in the frame. There are at least [http://www.fourcc.org/yuv.php 50 different formats] in&lt;br /&gt;
these two broad categories with possibly ten or fifteen in common use.&lt;br /&gt;
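&lt;br /&gt;
As one concrete example (a sketch of the common I420-style planar layout: a full-size Y plane followed by quarter-size U and V planes), here&#039;s how to find the samples for pixel (x, y) in a 4:2:0 frame:&lt;br /&gt;
&lt;br /&gt;
#include &amp;lt;stdint.h&amp;gt;&lt;br /&gt;
&lt;br /&gt;
/* Look up the Y, U, V samples for pixel (x, y) in an I420-style&lt;br /&gt;
   planar 4:2:0 buffer of width w and height h (both even). */&lt;br /&gt;
void i420_pixel(const uint8_t *buf, int w, int h, int x, int y,&lt;br /&gt;
                uint8_t *Y, uint8_t *U, uint8_t *V) {&lt;br /&gt;
  const uint8_t *up = buf + w * h;&lt;br /&gt;
  const uint8_t *vp = up + (w / 2) * (h / 2);&lt;br /&gt;
  *Y = buf[y * w + x];&lt;br /&gt;
  *U = up[(y / 2) * (w / 2) + x / 2];&lt;br /&gt;
  *V = vp[(y / 2) * (w / 2) + x / 2];&lt;br /&gt;
}&lt;br /&gt;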
&lt;br /&gt;
Each chroma subsampling and different bit-depth requires a different&lt;br /&gt;
packing arrangement, and so a different pixel format.  For a given&lt;br /&gt;
unique subsampling, there are usually also several equivalent formats&lt;br /&gt;
that consist of trivial channel order rearrangements or repackings, due either to&lt;br /&gt;
convenience once-upon-a-time on some particular piece of hardware, or&lt;br /&gt;
sometimes just good old-fashioned spite.&lt;br /&gt;
&lt;br /&gt;
Pixel formats are described by a unique name or [[wikipedia:FourCC|fourcc]] code.  There&lt;br /&gt;
are quite a few of these and there&#039;s no sense going over each one now.&lt;br /&gt;
Google is your friend.  Be aware that fourcc codes for raw video&lt;br /&gt;
specify the pixel arrangement and chroma subsampling, but generally&lt;br /&gt;
don&#039;t imply anything certain about chroma siting or color space.  [http://www.fourcc.org/yuv.php#UYVY YV12]&lt;br /&gt;
video, to pick one, can use JPEG, MPEG-2, or DV chroma siting, and any&lt;br /&gt;
one of [[wikipedia:YUV#BT.709_and_BT.601|several YUV colorspace definitions]].&lt;br /&gt;
&amp;lt;br style=&amp;quot;clear:both;&amp;quot;/&amp;gt;&lt;br /&gt;
&lt;br /&gt;
===done!===&lt;br /&gt;
&lt;br /&gt;
That wraps up our not-so-quick and yet very incomplete tour of raw&lt;br /&gt;
video. The good news is we can already get quite a lot of real work&lt;br /&gt;
done using that overview. In plenty of situations, a frame of video&lt;br /&gt;
data is a frame of video data.  The details matter, greatly, when it&lt;br /&gt;
comes time to write software, but for now I am satisfied that the&lt;br /&gt;
esteemed viewer is broadly aware of the relevant issues.&lt;br /&gt;
&lt;br /&gt;
&amp;lt;center&amp;gt;&amp;lt;div style=&amp;quot;background-color:#DDDDFF;border-color:#CCCCDD;border-style:solid;width:80%;padding:0 1em 1em 1em;text-align:left;&amp;quot;&amp;gt;&lt;br /&gt;
&#039;&#039;&#039;Going deeper…&#039;&#039;&#039;&lt;br /&gt;
* [http://wiki.multimedia.cx/index.php?title=YUV4MPEG2 The y4m format] is the most common simple container for raw YUV video.  People occasionally use [[OggYUV]] to store it in Ogg instead.&lt;br /&gt;
* Learn about [[wikipedia:High_dynamic_range_imaging|high dynamic range imaging]], which achieves better representation of the full range of brightnesses in the real world by using more than 8 bits per channel.&lt;br /&gt;
* Learn about how [[wikipedia:Trichromatic_vision|trichromatic color vision]] works in humans, and how human color perception is encoded in the [[wikipedia:CIE 1931 color space|CIE 1931 XYZ color space]].&lt;br /&gt;
** Compare with the [[wikipedia:Lab_color_space|Lab color space]], mathematically equivalent but structured to account for &amp;quot;perceptual uniformity&amp;quot;.&lt;br /&gt;
** If we were all [[wikipedia:Dichromacy|dichromats]] then video would only need two color channels.  Some humans might be [[wikipedia:Tetrachromacy#Possibility_of_human_tetrachromats|tetrachromats]], in which case they would need an additional color channel for video to fully represent their vision.&lt;br /&gt;
** [http://www.xritephoto.com/ph_toolframe.aspx?action=coloriq Test your color vision] (or at least your monitor).&lt;br /&gt;
* YCbCr is defined in terms of RGB by the ITU in two incompatible standards: [[wikipedia:Rec. 601|Rec. 601]] and [[wikipedia:Rec. 709|Rec. 709]].  Both conversion standards are lossy, which has prompted some to adopt a lossless alternative called [http://wiki.multimedia.cx/index.php?title=YCoCg YCoCg].&lt;br /&gt;
&amp;lt;/div&amp;gt;&amp;lt;/center&amp;gt;&lt;br /&gt;
&lt;br /&gt;
&amp;lt;br style=&amp;quot;clear:both;&amp;quot;/&amp;gt;&lt;br /&gt;
&lt;br /&gt;
==Containers==&lt;br /&gt;
[[Image:Dmpfg_017.jpg|360px|right]]&lt;br /&gt;
&amp;lt;small&amp;gt;[[Talk:A_Digital_Media_Primer_For_Geeks_(episode_1)#Containers|Discuss this section]]&amp;lt;/small&amp;gt;&lt;br /&gt;
&lt;br /&gt;
So. We have audio data. We have video data. What remains is the more&lt;br /&gt;
familiar non-signal data and straight-up engineering that software&lt;br /&gt;
developers are used to, and plenty of it.&lt;br /&gt;
&lt;br /&gt;
Chunks of raw audio and video data have no externally-visible&lt;br /&gt;
structure, but they&#039;re often uniformly sized.  We could just string&lt;br /&gt;
them together in a rigid predetermined ordering for streaming and&lt;br /&gt;
storage, and some simple systems do approximately that. Compressed&lt;br /&gt;
frames, though, aren&#039;t necessarily a predictable size, and we usually want&lt;br /&gt;
some flexibility in using a range of different data types in streams.&lt;br /&gt;
If we string random formless data together, we lose the boundaries&lt;br /&gt;
that separate frames and don&#039;t necessarily know what data belongs to&lt;br /&gt;
which streams.  A stream needs some generalized structure to be&lt;br /&gt;
generally useful.&lt;br /&gt;
&lt;br /&gt;
In addition to our signal data, we also have our PCM and video&lt;br /&gt;
parameters.  There&#039;s probably plenty of other [[wikipedia:Metadata#Video|metadata]] we also want to&lt;br /&gt;
deal with, like audio tags and video chapters and subtitles, all&lt;br /&gt;
essential components of rich media.  It makes sense to place this&lt;br /&gt;
metadata&amp;amp;mdash;that is, data about the data&amp;amp;mdash;within the media itself.&lt;br /&gt;
&lt;br /&gt;
Storing and structuring formless data and disparate metadata is the&lt;br /&gt;
job of a [[wikipedia:Container_format_(digital)|container]].  Containers provide framing for the data blobs,&lt;br /&gt;
interleave and identify multiple data streams, provide timing&lt;br /&gt;
information, and store the metadata necessary to parse, navigate,&lt;br /&gt;
manipulate, and present the media.  In general, any container can hold&lt;br /&gt;
any kind of data.  And data can be put into any container.&lt;br /&gt;
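&lt;br /&gt;
To make the framing idea concrete, here&#039;s a hypothetical (and much simplified) per-blob header of the sort a container wraps around each chunk of data.  Real containers such as Ogg or Matroska differ plenty in the details, but they all carry roughly this information:&lt;br /&gt;
&lt;br /&gt;
#include &amp;lt;stdint.h&amp;gt;&lt;br /&gt;
&lt;br /&gt;
/* Hypothetical container framing; not any real format. */&lt;br /&gt;
typedef struct {&lt;br /&gt;
  uint32_t stream_id;  /* which interleaved stream this belongs to */&lt;br /&gt;
  uint64_t timestamp;  /* presentation time of the payload         */&lt;br /&gt;
  uint32_t length;     /* payload size in bytes: the framing       */&lt;br /&gt;
  uint8_t  payload[];  /* the data blob itself                     */&lt;br /&gt;
} packet;&lt;br /&gt;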
&amp;lt;br style=&amp;quot;clear:both;&amp;quot;/&amp;gt;&lt;br /&gt;
&lt;br /&gt;
==Credits==&lt;br /&gt;
&lt;br /&gt;
In the past thirty minutes, we&#039;ve covered digital audio, video, some&lt;br /&gt;
history, some math and a little engineering. We&#039;ve barely scratched the&lt;br /&gt;
surface, but it&#039;s time for a well-earned break.&lt;br /&gt;
&lt;br /&gt;
There&#039;s so much more to talk about, so I hope you&#039;ll join me again in&lt;br /&gt;
our next episode.  Until then&amp;amp;mdash;Cheers!&lt;br /&gt;
&lt;br /&gt;
Written by:&lt;br /&gt;
Christopher (Monty) Montgomery&lt;br /&gt;
and the Xiph.Org Community&lt;br /&gt;
&lt;br /&gt;
Intro, title and credits music:&amp;lt;br&amp;gt;&lt;br /&gt;
&amp;quot;Boo Boo Coming&amp;quot;, by Joel Forrester&amp;lt;br&amp;gt;&lt;br /&gt;
Performed by the Microscopic Septet&amp;lt;br&amp;gt;&lt;br /&gt;
Used by permission of Cuneiform Records.&amp;lt;br&amp;gt;&lt;br /&gt;
Original source track All Rights Reserved.&amp;lt;br&amp;gt;&lt;br /&gt;
www.cuneiformrecords.com&lt;br /&gt;
&lt;br /&gt;
This Video Was Produced Entirely With Free and Open Source Software&lt;br /&gt;
&lt;br /&gt;
GNU&lt;br /&gt;
Linux&lt;br /&gt;
Fedora&lt;br /&gt;
Cinelerra&lt;br /&gt;
The Gimp&lt;br /&gt;
Audacity&lt;br /&gt;
Postfish&lt;br /&gt;
Gstreamer&lt;br /&gt;
&lt;br /&gt;
CC BY-NC-SA&lt;br /&gt;
A Co-Production of Xiph.Org and Red Hat Inc.&lt;br /&gt;
(C) 2010, Some Rights Reserved&lt;br /&gt;
&lt;br /&gt;
&amp;lt;br style=&amp;quot;clear:both;&amp;quot;/&amp;gt;&lt;br /&gt;
&lt;br /&gt;
==The making of…==&lt;br /&gt;
&amp;lt;small&amp;gt;[[Talk:A_Digital_Media_Primer_For_Geeks_(episode_1)#The_making_of.E2.80.A6|Discuss this section]]&amp;lt;/small&amp;gt;&lt;br /&gt;
===Equipment===&lt;br /&gt;
====Camera====&lt;br /&gt;
Canon HV40 HDV camera w/ wide-angle lens operating on a tripod.&lt;br /&gt;
&lt;br /&gt;
The wide angle lens gives the camera a nice close macro mode, and approximately triples the amount of light coming into the sensor for a given zoom/aperture.  Useful for shooting indoors at night.&lt;br /&gt;
&lt;br /&gt;
No additional lighting kit was used.&lt;br /&gt;
&lt;br /&gt;
====Audio====&lt;br /&gt;
&lt;br /&gt;
Two Crown PCC160 boundary microphones placed on a table approximately 4-8 feet in front of the speaker, run through a cheap Behringer portable mixer and into the camera&#039;s microphone input.  &lt;br /&gt;
&lt;br /&gt;
No additional audio kit was used.&lt;br /&gt;
&lt;br /&gt;
====Sundries====&lt;br /&gt;
&lt;br /&gt;
Whiteboard markers by &#039;Bic&#039;&lt;br /&gt;
&lt;br /&gt;
Drawing aids by Staedtler, McMaster Carr, and &#039;Generic&#039;.&lt;br /&gt;
&lt;br /&gt;
===Video shooting sequence===&lt;br /&gt;
&lt;br /&gt;
Scenes were pre-scripted and memorized, usually with lots of on-the-fly revision.  (In the future... I&#039;m totally getting a teleprompter.  Wh000.  OTOH, I can totally rattle off the entire video script from beginning to end as a party trick, thus ensuring I&#039;ll not be invited to many parties --Monty).&lt;br /&gt;
&lt;br /&gt;
Diagrams were drawn by hand on a physical whiteboard with whiteboard markers and magnetic T-squares, triangles, and yardsticks.  Despite looking a lot like greenscreen work, there is no image compositing in use (actually-- there are two small composites where an error in a whiteboard diagram was corrected by subtracting part of the original image and then adding a corrected version of the diagram).&lt;br /&gt;
&lt;br /&gt;
Camera operated in 24F shutter priority mode (Tv set to &amp;quot;24&amp;quot;) with exposure and white balance both calibrated to the white board (or a white piece of paper) and locked.  Microphone attenuation setting active, with gain locked such that room noise peaked at -40dB (all the rooms in the shooting sequences were noisy due to the building&#039;s ventilation system, or active equipment). Lighting in the whiteboard rooms tended to be odd, with little relative light cast on a presenter standing just in front of the whiteboard; a presenter is practically standing in the room&#039;s only shadow.  Most of the room light is focused on the table and walls.  Additional fill lighting kit would have been useful, but for the first vid, I didn&#039;t want &#039;perfect&#039; to be the enemy of &#039;good&#039;.&lt;br /&gt;
&lt;br /&gt;
Autofocus used for whiteboard scenes, manual focus used for several workshop scenes as the autofocus tended to hunt continuously in very low light.&lt;br /&gt;
&lt;br /&gt;
Continuous capture to a Thinkpad with firewire input via a simple gstreamer script.&lt;br /&gt;
&lt;br /&gt;
===Production sequence===&lt;br /&gt;
====All hail Cinelerra.  You better hail, or Cinelerra will get pissy about it.====&lt;br /&gt;
&lt;br /&gt;
Most of the production sequence hinged on making Cinelerra happy; it is a hulking rusty cast iron WWI tank of a program that can seem like it&#039;s composed entirely of compressed bugs.  That said, it was neither particularly crashy nor did it ever accidentally corrupt or lose work.  It was also the only FOSS editor with a working 2D compositor.  It got the job done once I found a workflow it would cope with (and fixed a number of bugs; these fixes are available from my cinelerra Git repo at http://git.xiph.org/?p=users/xiphmont/cinelerraCV.git;a=summary).&lt;br /&gt;
&lt;br /&gt;
====Choosing takes====&lt;br /&gt;
&lt;br /&gt;
Each shooting session yielded four to six hours of raw video.  The first step was to load the raw video into the cinelerra timeline, label each complete take, compare and choose the take to use, then render the chosen take out to a raw clip as a YUV4MPEG raw video file and a WAV raw audio file.  Be careful that Settings-&amp;gt;Align Cursor On Frames is set, else the audio and video renders won&#039;t start on the same boundary.&lt;br /&gt;
&lt;br /&gt;
====Postprocessing====&lt;br /&gt;
&lt;br /&gt;
At this point, the raw video clips were adjusted for gamma, contrast, and saturation in gstreamer and mplayer.  In the earlier shoots the camera was underexposing due to pilot error, which required quite a bit of gamma and saturation inflation to &#039;correct&#039; (there is no real correction as the low-end data is gone, but it&#039;s possible to make it look better).  Later shoots used saner settings and the adjustments were mostly to keep different shooting sessions more uniform.  The whiteboard tends not to look white because it&#039;s mildly reflective, and picked up the color of the cyan and orange audio baffles in the room like a big diffuse mirror.&lt;br /&gt;
&lt;br /&gt;
The audio was both noisy (the building&#039;s ventilation system sounded like either a loud low rumble or a jet engine taking off) and reverberant (the rooms were glass on two sides and plaster on the other two).  Early takes used no additional sound-absorbing material in the rooms, and Postfish filtering and deverb were used heavily.  This gives the early audio in the vid a slightly odd, processed feel (I had almost decided the original audio was simply unusable).  Later takes used some big fleece &#039;soft flats&#039; in the room to absorb some additional reverb, and those takes are less heavily filtered.&lt;br /&gt;
&lt;br /&gt;
The Postfish filtering chain used declip (for the occasional overrange oops), deverb (for removing room reverberation), multicompand (for noise gating), single compand (for volume levelling), and EQ (the Crown mics are nice, but very midrange-heavy).&lt;br /&gt;
&lt;br /&gt;
====Special Effects====&lt;br /&gt;
&lt;br /&gt;
Audio special effects were one-offs, mostly done using SoX.  The processed demo sections of audio were then spliced back into the original audio takes using Audacity.&lt;br /&gt;
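&lt;br /&gt;
(A sketch of a typical one-off of this kind, with illustrative file names; SoX does bit-depth and rate changes in a single line.)&lt;br /&gt;
&lt;br /&gt;
# requantize a demo section to 8 bits for the bit-depth demo&lt;br /&gt;
sox demo.wav -b 8 demo-8bit.wav&lt;br /&gt;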
&lt;br /&gt;
Video special effects (e.g., removing a color channel) were done by writing quick, one-off filters in C for y4oi.  A few effects were done by dumping a take as a directory full of PNGs, batch-processing the PNGs with a one-off C program, and then reassembling with mplayer.  Video effects were then stitched back into the original video takes in Cinelerra.&lt;br /&gt;
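&lt;br /&gt;
(The PNG round trip, sketched with illustrative names; mencoder is the encoding companion to mplayer, and ffv1 is a lossless intermediate codec.)&lt;br /&gt;
&lt;br /&gt;
# dump a take as numbered PNGs, one per frame&lt;br /&gt;
mplayer -vo png:z=5 -nosound take01.y4m&lt;br /&gt;
# ...batch-process the PNGs..., then reassemble at 24fps&lt;br /&gt;
mencoder mf://*.png -mf fps=24:type=png -ovc lavc -lavcopts vcodec=ffv1 -o processed.avi&lt;br /&gt;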
&lt;br /&gt;
====Editing====&lt;br /&gt;
&lt;br /&gt;
All editing was done in Cinelerra.  This primarily consisted of stitching the individual takes back together with crossfades.  All input and rendering output were done with raw YUV4MPEG and WAV files.  Note that making this work well and correctly required several patches to the YUV4MPEG handler and colorspace conversion code.&lt;br /&gt;
&lt;br /&gt;
====Encoding====&lt;br /&gt;
&lt;br /&gt;
Encoding was done by hand, external to Cinelerra, using mplayer for final postprocessing, the encoder_example tool included with the Ptalarbvorm Theora source distribution, and ivfenc for WebM.&lt;br /&gt;
&lt;br /&gt;
Sample Theora encode command line (note this is using an mplayer patched for y4o support; it could be done just as easily with a yuv4mpeg pipe):&lt;br /&gt;
&lt;br /&gt;
# 360p, 128-ish kbps (a4) audio + 500-ish kbps (v50) video&lt;br /&gt;
mplayer -vf hqdn3d,scale=640:360,gradfun=1.5,unsharp=l3x3:.1,yuv4ogg complete2.m2v -fast -noconsolecontrols -vo null &amp;gt; /dev/null &amp;amp; ~/MotherfishSVN/theora-ptalarbvorm/examples/encoder_example -a 4 -v 50 -k 240 complete2.wav output.y4o -o A_Digital_Media_Primer_For_Geeks-360p-a4+v50.ogv&lt;/div&gt;</summary>
		<author><name>Mindspillage</name></author>
	</entry>
	<entry>
		<id>https://wiki.xiph.org/index.php?title=Videos/A_Digital_Media_Primer_For_Geeks&amp;diff=12423</id>
		<title>Videos/A Digital Media Primer For Geeks</title>
		<link rel="alternate" type="text/html" href="https://wiki.xiph.org/index.php?title=Videos/A_Digital_Media_Primer_For_Geeks&amp;diff=12423"/>
		<updated>2010-09-22T03:16:28Z</updated>

		<summary type="html">&lt;p&gt;Mindspillage: /* gamma */ fix link&lt;/p&gt;
&lt;hr /&gt;
&lt;div&gt;[[Image:Dmpfg_001.jpg|360px|right]]&lt;br /&gt;
&amp;lt;small&amp;gt;&#039;&#039;Wiki edition&#039;&#039;&amp;lt;/small&amp;gt;&lt;br /&gt;
&lt;br /&gt;
This first video from Xiph.Org presents the technical foundations of modern digital media via a half-hour firehose of information. One community member called it &amp;quot;a Uni lecture I never got but really wanted.&amp;quot;&lt;br /&gt;
&lt;br /&gt;
The program offers a brief history of digital media, a quick summary of the sampling theorem, and myriad details of low level audio and video characterization and formatting. It&#039;s intended for budding geeks looking to get into video coding, as well as the technically curious who want to know more about the media they wrangle for work or play.&lt;br /&gt;
&amp;lt;br/&amp;gt;&amp;lt;br/&amp;gt;&amp;lt;br/&amp;gt;&lt;br /&gt;
&amp;lt;center&amp;gt;&amp;lt;font size=&amp;quot;+2&amp;quot;&amp;gt;[http://www.xiph.org/video/vid1.shtml Download or Watch online]&amp;lt;/font&amp;gt;&amp;lt;/center&amp;gt;&lt;br /&gt;
&amp;lt;br style=&amp;quot;clear:both;&amp;quot;/&amp;gt;&lt;br /&gt;
&amp;lt;hr/&amp;gt;&lt;br /&gt;
&lt;br /&gt;
==Introduction==&lt;br /&gt;
[[Image:Dmpfg_000.jpg|360px|right]]&lt;br /&gt;
[[Image:Dmpfg_002.jpg|360px|right]]&lt;br /&gt;
&amp;lt;small&amp;gt;[[Talk:A_Digital_Media_Primer_For_Geeks_(episode_1)#Introduction|Discuss this section]]&amp;lt;/small&amp;gt;&lt;br /&gt;
&lt;br /&gt;
Workstations and high-end personal computers have been able to&lt;br /&gt;
manipulate digital audio pretty easily for about fifteen years now.&lt;br /&gt;
It&#039;s only been about five years that a decent workstation&#039;s been able&lt;br /&gt;
to handle raw video without a lot of expensive special purpose&lt;br /&gt;
hardware.&lt;br /&gt;
&lt;br /&gt;
But today even most cheap home PCs have the processor power and&lt;br /&gt;
storage necessary to really toss raw video around, at least without&lt;br /&gt;
too much of a struggle. So now that everyone has all of this cheap media-capable hardware, &lt;br /&gt;
more people, not surprisingly, want to do interesting&lt;br /&gt;
things with digital media, especially streaming. YouTube was the first huge&lt;br /&gt;
success, and now everybody wants in.&lt;br /&gt;
&lt;br /&gt;
Well good!  Because this stuff is a lot of fun!&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
It&#039;s no problem finding consumers for digital media.  But here I&#039;d&lt;br /&gt;
like to address the engineers, the mathematicians, the hackers, the&lt;br /&gt;
people who are interested in discovering and making things and&lt;br /&gt;
building the technology itself. The people after my own heart.&lt;br /&gt;
&lt;br /&gt;
Digital media, compression especially, is perceived to be super-elite,&lt;br /&gt;
somehow incredibly more difficult than anything else in computer&lt;br /&gt;
science. The big industry players in the field don&#039;t mind this&lt;br /&gt;
perception at all; it helps justify the staggering number of very&lt;br /&gt;
basic patents they hold.  They like the image that their media&lt;br /&gt;
researchers &amp;quot;are the best of the best, so much smarter than anyone&lt;br /&gt;
else that their brilliant ideas can&#039;t even be understood by mere&lt;br /&gt;
mortals.&amp;quot; This is bunk.&lt;br /&gt;
&lt;br /&gt;
Digital audio and video and streaming and compression offer endless&lt;br /&gt;
deep and stimulating mental challenges, just like any other&lt;br /&gt;
discipline. It seems elite because so few people have been&lt;br /&gt;
involved.  So few people have been involved perhaps because so few&lt;br /&gt;
people could afford the expensive, special-purpose equipment it&lt;br /&gt;
required. But today, just about anyone watching this video has a&lt;br /&gt;
cheap, general-purpose computer powerful enough to play with the big&lt;br /&gt;
boys. There are battles going on today around HTML5 and browsers and&lt;br /&gt;
video and open vs. closed.  So now is a pretty good time to get&lt;br /&gt;
involved.  The easiest place to start is probably understanding the&lt;br /&gt;
technology we have right now.&lt;br /&gt;
&lt;br /&gt;
This is an introduction. Since it&#039;s an introduction, it glosses over a&lt;br /&gt;
ton of details so that the big picture&#039;s a little easier to see.&lt;br /&gt;
Quite a few people watching are going to be way past anything that I&#039;m&lt;br /&gt;
talking about, at least for now.  On the other hand, I&#039;m probably&lt;br /&gt;
going to go too fast for folks who really are brand new to all of&lt;br /&gt;
this, so if this is all new, relax. The important thing is to pick out&lt;br /&gt;
any ideas that really grab your imagination. Especially pay attention&lt;br /&gt;
to the terminology surrounding those ideas, because with those, and&lt;br /&gt;
Google, and Wikipedia, you can dig as deep as interests you.&lt;br /&gt;
&lt;br /&gt;
So, without any further ado, welcome to one hell of a new hobby.&lt;br /&gt;
&lt;br /&gt;
&amp;lt;center&amp;gt;&amp;lt;div style=&amp;quot;background-color:#DDDDFF;border-color:#CCCCDD;border-style:solid;width:80%;padding:0 1em 1em 1em;text-align:left;&amp;quot;&amp;gt;&lt;br /&gt;
&#039;&#039;&#039;Going deeper…&#039;&#039;&#039;&lt;br /&gt;
*Fixme: Some good HTML5 baseline codec debate article&lt;br /&gt;
*[http://diveintohtml5.org/video.html Dive into HTML5] web video tutorial&lt;br /&gt;
&amp;lt;/div&amp;gt;&amp;lt;/center&amp;gt;&lt;br /&gt;
&lt;br /&gt;
&amp;lt;br style=&amp;quot;clear:both;&amp;quot;/&amp;gt;&lt;br /&gt;
&lt;br /&gt;
==Analog vs Digital==&lt;br /&gt;
[[Image:Dmpfg_004.jpg|360px|right]]&lt;br /&gt;
[[Image:Dmpfg_006.jpg|360px|right]]&lt;br /&gt;
[[Image:Dmpfg_007.jpg|360px|right]]&lt;br /&gt;
&amp;lt;small&amp;gt;[[Talk:A_Digital_Media_Primer_For_Geeks_(episode_1)#Analog_vs_Digital|Discuss this section]]&amp;lt;/small&amp;gt;&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
[[WikiPedia:Sound|Sound]] is the propagation of pressure waves through air, spreading out&lt;br /&gt;
from a source like ripples spread from a stone tossed into a pond.  A&lt;br /&gt;
microphone, or the human ear for that matter, transforms these passing&lt;br /&gt;
ripples of pressure into an electric signal.  Right, this is&lt;br /&gt;
middle school science class, everyone remembers this.  Moving on.&lt;br /&gt;
&lt;br /&gt;
That audio signal is a one-dimensional function, a single value&lt;br /&gt;
varying over time.  If we slow the [[WikiPedia:Oscilloscope|&#039;scope]] down a bit... that should be&lt;br /&gt;
a little easier to see. A few other aspects of the signal are&lt;br /&gt;
important. It&#039;s [[WikiPedia:Continuous_function|continuous]] in both value and time; that is, at any&lt;br /&gt;
given time it can have any real value, and there&#039;s a smoothly varying&lt;br /&gt;
value at every point in time.  No matter how much we zoom in, there&lt;br /&gt;
are no discontinuities, no singularities, no instantaneous steps or&lt;br /&gt;
points where the signal ceases to exist. It&#039;s defined&lt;br /&gt;
everywhere. Classic continuous math works very well on these signals.&lt;br /&gt;
&lt;br /&gt;
A digital signal on the other hand is [[WikiPedia:Discrete_math|discrete]] in both value and time.&lt;br /&gt;
In the simplest and most common system, called [[WikiPedia:Pulse code modulation|Pulse Code Modulation]],&lt;br /&gt;
one of a fixed number of possible values directly represents the&lt;br /&gt;
instantaneous signal amplitude at points in time spaced a fixed&lt;br /&gt;
distance apart.  The end result is a stream of digits.&lt;br /&gt;
&lt;br /&gt;
Now this looks an awful lot like this.  It seems intuitive that we&lt;br /&gt;
should somehow be able to rigorously transform one into the other, and&lt;br /&gt;
good news, the [[WikiPedia:Nyquist-Shannon sampling theorem|Sampling Theorem]] says we can and tells us&lt;br /&gt;
how. Published in its most recognizable form by [[WikiPedia:Claude Shannon|Claude Shannon]] in 1949&lt;br /&gt;
and built on the work of [[WikiPedia:Harry Nyquist|Nyquist]], and [[WikiPedia:Ralph Hartley|Hartley]], and tons of others, the&lt;br /&gt;
sampling theorem not only states that we can go back and&lt;br /&gt;
forth between analog and digital, but also lays&lt;br /&gt;
down a set of conditions under which conversion is lossless and the two&lt;br /&gt;
representations become equivalent and interchangeable.  When the&lt;br /&gt;
lossless conditions aren&#039;t met, the sampling theorem tells us how and&lt;br /&gt;
how much information is lost or corrupted.&lt;br /&gt;
&lt;br /&gt;
Up until very recently, analog technology was the basis for&lt;br /&gt;
practically everything done with audio, and that&#039;s not because most&lt;br /&gt;
audio comes from an originally analog source.  You may also think that&lt;br /&gt;
since computers are fairly recent, analog signal technology must have&lt;br /&gt;
come first.  Nope. Digital is actually older.  The [[WikiPedia:Telegraph|telegraph]] predates&lt;br /&gt;
the telephone by half a century and was already fully mechanically&lt;br /&gt;
automated by the 1860s, sending coded, multiplexed digital signals&lt;br /&gt;
long distances. You know... [[WikiPedia:Tickertape|tickertape]]. Harry Nyquist of [[WikiPedia:Bell_labs|Bell Labs]] was&lt;br /&gt;
researching telegraph pulse transmission when he published his&lt;br /&gt;
description of what later became known as the [[WikiPedia:Nyquist_frequency|Nyquist frequency]], the&lt;br /&gt;
core concept of the sampling theorem.  Now, it&#039;s true the telegraph&lt;br /&gt;
was transmitting symbolic information, text, not a digitized analog&lt;br /&gt;
signal, but with the advent of the telephone and radio, analog and&lt;br /&gt;
digital signal technology progressed rapidly and side-by-side.&lt;br /&gt;
&lt;br /&gt;
Audio had always been manipulated as an analog signal because... well,&lt;br /&gt;
gee, it&#039;s so much easier.  A [[WikiPedia:Low-pass_filter#Continuous-time_low-pass_filters|second-order low-pass filter]], for example,&lt;br /&gt;
requires two passive components.  An all-analog [[WikiPedia:Short-time_Fourier_transform|short-time Fourier&lt;br /&gt;
transform]], a few hundred.  Well, maybe a thousand if you want to build&lt;br /&gt;
something really fancy (bang on the [http://www.testequipmentdepot.com/usedequipment/hewlettpackard/spectrumanalyzers/3585a.htm 3585]).  Processing signals&lt;br /&gt;
digitally requires millions to billions of transistors running at&lt;br /&gt;
microwave frequencies, support hardware at very least to digitize and&lt;br /&gt;
reconstruct the analog signals, a complete software ecosystem for&lt;br /&gt;
programming and controlling that billion-transistor juggernaut,&lt;br /&gt;
digital storage just in case you want to keep any of those bits for&lt;br /&gt;
later...&lt;br /&gt;
&lt;br /&gt;
So we come to the conclusion that analog is the only practical way to&lt;br /&gt;
do much with audio... well, unless you happen to have a billion&lt;br /&gt;
transistors and all the other things just lying around. And [[WikiPedia:File:Transistor_Count_and_Moore&#039;s_Law_-_2008.svg|since we&lt;br /&gt;
do]], digital signal processing becomes very attractive.&lt;br /&gt;
&lt;br /&gt;
For one thing, analog componentry just doesn&#039;t have the flexibility of&lt;br /&gt;
a general purpose computer.  Adding a new function to this&lt;br /&gt;
beast... yeah, it&#039;s probably not going to happen.  On a digital&lt;br /&gt;
processor though, just write a new program.  Software isn&#039;t trivial,&lt;br /&gt;
but it is a lot easier.&lt;br /&gt;
&lt;br /&gt;
Perhaps more importantly, though, every analog component is an&lt;br /&gt;
approximation. There&#039;s no such thing as a perfect transistor, or a&lt;br /&gt;
perfect inductor, or a perfect capacitor.  In analog, every component&lt;br /&gt;
adds [[WikiPedia:Johnson–Nyquist_noise|noise]] and [[WikiPedia:Distortion#Electronic_signals|distortion]], usually not very much, but it adds up. Just&lt;br /&gt;
transmitting an analog signal, especially over long distances,&lt;br /&gt;
progressively, measurably, irretrievably corrupts it.  Besides, all of&lt;br /&gt;
those single-purpose analog components take up a lot of space.  Two&lt;br /&gt;
lines of code on the billion transistors back here can implement a&lt;br /&gt;
filter that would require an [[WikiPedia:Inductor|inductor]] the size of a refrigerator.&lt;br /&gt;
&lt;br /&gt;
Digital systems don&#039;t have these drawbacks.  Digital signals can be&lt;br /&gt;
stored, copied, manipulated, and transmitted without adding any noise&lt;br /&gt;
or distortion. We do use [[WikiPedia:Lossy_compression|lossy]] algorithms from time to time, but the&lt;br /&gt;
only unavoidably non-ideal steps are digitization and reconstruction,&lt;br /&gt;
where digital has to interface with all of that messy analog.  Messy&lt;br /&gt;
or not, modern [[WikiPedia:Digital-to-analog_converter|conversion stages]] are very, very good.  By the&lt;br /&gt;
standards of our ears, we can consider them practically lossless as&lt;br /&gt;
well.&lt;br /&gt;
&lt;br /&gt;
With a little extra hardware, then, most of which is now small and&lt;br /&gt;
inexpensive due to our modern industrial infrastructure, digital audio&lt;br /&gt;
is the clear winner over analog.  So let us then go about storing it,&lt;br /&gt;
copying it, manipulating it, and transmitting it.&lt;br /&gt;
&lt;br /&gt;
&amp;lt;center&amp;gt;&amp;lt;div style=&amp;quot;background-color:#DDDDFF;border-color:#CCCCDD;border-style:solid;width:80%;padding:0 1em 1em 1em;text-align:left;&amp;quot;&amp;gt;&lt;br /&gt;
&#039;&#039;&#039;Going deeper…&#039;&#039;&#039;&lt;br /&gt;
*Wikipedia: [[wiki:Nyquist–Shannon_sampling_theorem|Nyquist–Shannon sampling theorem]]&lt;br /&gt;
*MIT OpenCourseWare [http://ocw.mit.edu/courses/electrical-engineering-and-computer-science/6-003-signals-and-systems-spring-2010/lecture-notes/ Lecture notes from 6.003 signals and systems.]&lt;br /&gt;
&amp;lt;/div&amp;gt;&amp;lt;/center&amp;gt;&lt;br /&gt;
&lt;br /&gt;
&amp;lt;br style=&amp;quot;clear:both;&amp;quot;/&amp;gt;&lt;br /&gt;
&lt;br /&gt;
==Raw (digital audio) meat==&lt;br /&gt;
&amp;lt;small&amp;gt;[[Talk:A_Digital_Media_Primer_For_Geeks_(episode_1)#Raw_.28digital_audio.29_meat|Discuss this section]]&amp;lt;/small&amp;gt;&lt;br /&gt;
&lt;br /&gt;
Pulse Code Modulation is the most common representation for &lt;br /&gt;
raw audio.  Other practical representations do exist: for example, the&lt;br /&gt;
[[WikiPedia:Delta-sigma_modulation|Sigma-Delta coding]] used by the [[WikiPedia:Super_Audio_CD|SACD]], which is a form of [[wikipedia:Pulse-density_modulation|Pulse Density&lt;br /&gt;
Modulation]].  That said, Pulse Code Modulation is far&lt;br /&gt;
and away dominant, mainly because it&#039;s so mathematically&lt;br /&gt;
convenient.  An audio engineer can spend an entire career without&lt;br /&gt;
running into anything else.&lt;br /&gt;
&lt;br /&gt;
PCM encoding can be characterized by three parameters, making it easy&lt;br /&gt;
to account for every possible PCM variant with mercifully little&lt;br /&gt;
hassle.&lt;br /&gt;
&amp;lt;br style=&amp;quot;clear:both;&amp;quot;/&amp;gt;&lt;br /&gt;
===sample rate===&lt;br /&gt;
[[Image:Dmpfg_009.jpg|360px|right]]&lt;br /&gt;
[[Image:Dmpfg_008.jpg|360px|right]]&lt;br /&gt;
The first parameter is the [[wikipedia:Sampling_rate|sampling rate]].  The highest frequency an&lt;br /&gt;
encoding can represent is called the Nyquist Frequency.  The Nyquist&lt;br /&gt;
frequency of PCM happens to be exactly half the sampling rate.&lt;br /&gt;
Therefore, the sampling rate directly determines the highest possible&lt;br /&gt;
frequency in the digitized signal.&lt;br /&gt;
&lt;br /&gt;
Analog telephone systems traditionally [[wikipedia:Bandlimiting|band-limited]] voice channels to&lt;br /&gt;
just under 4kHz, so digital telephony and most classic voice&lt;br /&gt;
applications use an 8kHz sampling rate: the minimum sampling rate&lt;br /&gt;
necessary to capture the entire bandwidth of a 4kHz channel.  This is&lt;br /&gt;
what an 8kHz sampling rate sounds like&amp;amp;mdash;a bit muffled but perfectly&lt;br /&gt;
intelligible for voice.  This is the lowest sampling rate that&#039;s ever&lt;br /&gt;
been used widely in practice.&lt;br /&gt;
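&lt;br /&gt;
(To hear this yourself, one option is SoX, which low-passes appropriately as part of its rate conversion; file names are illustrative.)&lt;br /&gt;
&lt;br /&gt;
# resample a recording down to telephone rate and listen&lt;br /&gt;
sox voice48k.wav voice8k.wav rate 8000&lt;br /&gt;
play voice8k.wav&lt;br /&gt;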
&lt;br /&gt;
From there, as power, and memory, and storage increased, consumer&lt;br /&gt;
computer hardware went to offering 11, and then 16, and then 22, and&lt;br /&gt;
then 32kHz sampling.  With each increase in the sampling rate and the&lt;br /&gt;
Nyquist frequency, it&#039;s obvious that the high end becomes a little&lt;br /&gt;
clearer and the sound more natural.&lt;br /&gt;
&lt;br /&gt;
The Compact Disc uses a 44.1kHz sampling rate, which is again slightly&lt;br /&gt;
better than 32kHz, but the gains are becoming less distinct.  44.1kHz&lt;br /&gt;
is a bit of an oddball choice, especially given that it hadn&#039;t been&lt;br /&gt;
used for anything prior to the compact disc, but the huge success of&lt;br /&gt;
the CD has made it a common rate.&lt;br /&gt;
&lt;br /&gt;
The most common hi-fidelity sampling rate aside from the CD is 48kHz.&lt;br /&gt;
There&#039;s virtually no audible difference between the two.  This video,&lt;br /&gt;
or at least the original version of it, was shot and produced with&lt;br /&gt;
48kHz audio, which happens to be the original standard for&lt;br /&gt;
high-fidelity audio with video.&lt;br /&gt;
&lt;br /&gt;
Super-hi-fidelity sampling rates of 88, and 96, and 192kHz have also&lt;br /&gt;
appeared. The reason for the sampling rates beyond 48kHz isn&#039;t to&lt;br /&gt;
extend the audible high frequencies further. It&#039;s for a different&lt;br /&gt;
reason.&lt;br /&gt;
&lt;br /&gt;
Stepping back for just a second, the French mathematician [[wikipedia:Joseph_Fourier|Jean&lt;br /&gt;
Baptiste Joseph Fourier]] showed that we can also think of signals like&lt;br /&gt;
audio as a set of component frequencies.  This [[wikipedia:Frequency_domain|frequency-domain]]&lt;br /&gt;
representation is equivalent to the time representation; the signal is&lt;br /&gt;
exactly the same, we&#039;re just looking at it [[wikipedia:Basis_(linear_algebra)|a different way]].  Here we see the&lt;br /&gt;
frequency-domain representation of a hypothetical analog signal we&lt;br /&gt;
intend to digitally sample.&lt;br /&gt;
&lt;br /&gt;
The sampling theorem tells us two essential things about the sampling&lt;br /&gt;
process. First, that a digital signal can&#039;t represent any&lt;br /&gt;
frequencies above the Nyquist frequency. Second, and this is the new&lt;br /&gt;
part, if we don&#039;t remove those frequencies with a low-pass filter&lt;br /&gt;
before sampling, the sampling process will fold them down into the&lt;br /&gt;
representable frequency range as [[wikipedia:Aliasing|aliasing distortion]].&lt;br /&gt;
&lt;br /&gt;
Aliasing, in a nutshell, sounds freakin&#039; awful, so it&#039;s essential to&lt;br /&gt;
remove any beyond-Nyquist frequencies before sampling and after&lt;br /&gt;
reconstruction.  (A worked example: a 30kHz tone sampled at 48kHz&lt;br /&gt;
without filtering folds down to an audible 18kHz alias, mirrored about&lt;br /&gt;
the 24kHz Nyquist frequency.)&lt;br /&gt;
&lt;br /&gt;
Human frequency perception is considered to extend to about 20kHz. In&lt;br /&gt;
44.1 or 48kHz sampling, the low pass before the sampling stage has to&lt;br /&gt;
be extremely sharp to avoid cutting any audible frequencies below&lt;br /&gt;
[[wikipedia:Hearing_range|20kHz]] but still not allow frequencies above the Nyquist to leak&lt;br /&gt;
forward into the sampling process.  This is a difficult filter to&lt;br /&gt;
build, and no practical filter succeeds completely. If the sampling&lt;br /&gt;
rate is 96kHz or 192kHz on the other hand, the low pass has an extra&lt;br /&gt;
[[wikipedia:Octave_(electronics)|octave]] or two for its transition band. This is a much easier filter to&lt;br /&gt;
build.  Sampling rates beyond 48kHz are actually one of those messy&lt;br /&gt;
analog stage compromises.&lt;br /&gt;
&amp;lt;br style=&amp;quot;clear:both;&amp;quot;/&amp;gt;&lt;br /&gt;
===sample format===&lt;br /&gt;
[[Image:Dmpfg_anim.gif|right]]&lt;br /&gt;
&lt;br /&gt;
The second fundamental PCM parameter is the sample format; that is,&lt;br /&gt;
the format of each digital number.  A number is a number, but a number&lt;br /&gt;
can be represented in bits a number of different ways.&lt;br /&gt;
&lt;br /&gt;
Early PCM was [[wikipedia:Quantization_(sound_processing)#Audio_quantization|eight-bit]] [[wikipedia:Linear_pulse_code_modulation|linear]], encoded as an [[wikipedia:Signedness|unsigned]] [[wikipedia:Integer_(computer_science)#Bytes_and_octets|byte]].  The&lt;br /&gt;
[[wikipedia:Dynamic_range#Audio|dynamic range]] is limited to about [[wikipedia:Decibel|50dB]]  and the [[wikipedia:Quantization_error|quantization noise]], as&lt;br /&gt;
you can hear, is pretty severe.  Eight-bit audio is vanishingly rare&lt;br /&gt;
today.&lt;br /&gt;
&lt;br /&gt;
Digital telephony typically uses one of two related non-linear eight&lt;br /&gt;
bit encodings called [[wikipedia:A-law_algorithm|A-law]] and [[wikipedia:Μ-law_algorithm|μ-law]]. These formats encode a roughly&lt;br /&gt;
[[wikipedia:Audio_bit_depth#Dynamic_range|14 bit dynamic range]] into eight bits by spacing the higher amplitude&lt;br /&gt;
values farther apart. A-law and mu-law obviously improve quantization&lt;br /&gt;
noise compared to linear 8-bit, and voice harmonics especially hide&lt;br /&gt;
the remaining quantization noise well. All three eight-bit encodings,&lt;br /&gt;
linear, A-law, and mu-law, are typically paired with an 8kHz sampling&lt;br /&gt;
rate, though I&#039;m demonstrating them here at 48kHz.&lt;br /&gt;
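&lt;br /&gt;
(To hear the companded encodings yourself, SoX supports both; a sketch with illustrative file names.)&lt;br /&gt;
&lt;br /&gt;
# round-trip a clip through 8-bit mu-law at telephone rate&lt;br /&gt;
sox voice.wav -e mu-law -b 8 mulaw.wav rate 8000&lt;br /&gt;
play mulaw.wav&lt;br /&gt;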
&lt;br /&gt;
Most modern PCM uses 16- or 24-bit [[wikipedia:Two&#039;s_complement|two&#039;s-complement]] signed integers to&lt;br /&gt;
encode the range from negative infinity to zero decibels in 16 or 24&lt;br /&gt;
bits of precision. The maximum absolute value corresponds to zero decibels.&lt;br /&gt;
As with all the sample formats so far, signals beyond zero decibels, and thus&lt;br /&gt;
beyond the maximum representable range, are [[wikipedia:Clipping_(audio)|clipped]].&lt;br /&gt;
&lt;br /&gt;
In mixing and mastering, it&#039;s not unusual to use [[wikipedia:Floating_point|floating-point]]&lt;br /&gt;
numbers for PCM instead of [[wikipedia:Integer_(computer_science)|integers]].  A 32 bit [[wikipedia:IEEE_754-2008|IEEE754]] float, that&#039;s&lt;br /&gt;
the normal kind of floating point you see on current computers, has 24&lt;br /&gt;
bits of resolution, but an eight-bit floating-point exponent increases&lt;br /&gt;
the representable range.  Floating point usually represents zero&lt;br /&gt;
decibels as +/-1.0, and because floats can obviously represent&lt;br /&gt;
considerably beyond that, temporarily exceeding zero decibels during&lt;br /&gt;
the mixing process doesn&#039;t cause clipping.  Floating-point PCM takes&lt;br /&gt;
up more space, so it tends to be used only as an intermediate&lt;br /&gt;
production format.&lt;br /&gt;
&lt;br /&gt;
Lastly, most general purpose computers still read and&lt;br /&gt;
write data in octet bytes, so it&#039;s important to remember that samples&lt;br /&gt;
bigger than eight bits can be in [[wikipedia:Endianness|big- or little-endian order]], and both&lt;br /&gt;
endiannesses are common.  For example, Microsoft [[wikipedia:WAV|WAV]] files are little-endian,&lt;br /&gt;
and Apple [[wikipedia:AIFC|AIFC]] files tend to be big-endian.  Be aware of it.&lt;br /&gt;
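&lt;br /&gt;
(Endianness mostly bites when reading headerless data, where the format must be spelled out by hand. A sketch with SoX and illustrative file names; -L means little-endian.)&lt;br /&gt;
&lt;br /&gt;
# wrap headerless 48kHz, 16-bit signed, little-endian stereo PCM in a WAV header&lt;br /&gt;
sox -t raw -r 48000 -e signed -b 16 -c 2 -L capture.raw capture.wav&lt;br /&gt;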
&amp;lt;br style=&amp;quot;clear:both;&amp;quot;/&amp;gt;&lt;br /&gt;
===channels===&lt;br /&gt;
&lt;br /&gt;
The third PCM parameter is the number of [[wikipedia:Multichannel_audio|channels]].  The convention in&lt;br /&gt;
raw PCM is to encode multiple channels by interleaving the samples of&lt;br /&gt;
each channel together into a single stream.  Straightforward and extensible.&lt;br /&gt;
&amp;lt;br style=&amp;quot;clear:both;&amp;quot;/&amp;gt;&lt;br /&gt;
===done!===&lt;br /&gt;
&lt;br /&gt;
And that&#039;s it!  That describes every PCM representation ever.  Done.&lt;br /&gt;
Digital audio is &#039;&#039;so easy&#039;&#039;!  There&#039;s more to do of course, but at this&lt;br /&gt;
point we&#039;ve got a nice useful chunk of audio data, so let&#039;s get some&lt;br /&gt;
video too.&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
&amp;lt;center&amp;gt;&amp;lt;div style=&amp;quot;background-color:#DDDDFF;border-color:#CCCCDD;border-style:solid;width:80%;padding:0 1em 1em 1em;text-align:left;&amp;quot;&amp;gt;&lt;br /&gt;
&#039;&#039;&#039;Going deeper…&#039;&#039;&#039;&lt;br /&gt;
* [[wikipedia:Roll-off|Wikipedia&#039;s article on filter roll-off]], to learn why it&#039;s hard to build analog filters with a very narrow [[wikipedia:Transition_band|transition band]] between the [[wikipedia:Passband|passband]] and the [[wikipedia:Stopband|stopband]].&lt;br /&gt;
* [http://wiki.multimedia.cx/index.php?title=PCM Some more minutiae] about PCM in practice.&lt;br /&gt;
* [[wikipedia:DPCM|DPCM]] and [[wikipedia:ADPCM|ADPCM]], simple audio codecs derived from PCM.&lt;br /&gt;
&amp;lt;/div&amp;gt;&amp;lt;/center&amp;gt;&lt;br /&gt;
&lt;br /&gt;
&amp;lt;br style=&amp;quot;clear:both;&amp;quot;/&amp;gt;&lt;br /&gt;
&lt;br /&gt;
==Video vegetables (they&#039;re good for you!)==&lt;br /&gt;
[[Image:Dmpfg_010.jpg|360px|right]]&lt;br /&gt;
&amp;lt;small&amp;gt;[[Talk:A_Digital_Media_Primer_For_Geeks_(episode_1)#Video_vegetables_.28they.27re_good_for_you.21.29|Discuss this section]]&amp;lt;/small&amp;gt;&lt;br /&gt;
&lt;br /&gt;
One could think of video as being like audio but with two additional&lt;br /&gt;
spatial dimensions, X and Y, in addition to the dimension of time.&lt;br /&gt;
This is mathematically sound. The Sampling Theorem applies to all&lt;br /&gt;
three video dimensions just as it does to the single time dimension of&lt;br /&gt;
audio.&lt;br /&gt;
&lt;br /&gt;
Audio and video are obviously quite different in practice. For one,&lt;br /&gt;
compared to audio, video is huge. [[wikipedia:Red_Book_(audio_Compact_Disc_standard)#Technical_details|Raw CD audio]] is about 1.4 megabits&lt;br /&gt;
per second. Raw [[wikipedia:1080i|1080i]] HD video is over 700 megabits per second. That&#039;s&lt;br /&gt;
more than 500 times more data to capture, process, and store per&lt;br /&gt;
second.  By [[wikipedia:Moore&#039;s_law|Moore&#039;s law]]... that&#039;s... let&#039;s see... roughly eight&lt;br /&gt;
doublings times two years, so yeah, computers requiring about an extra&lt;br /&gt;
fifteen years to handle raw video after getting raw audio down pat was&lt;br /&gt;
about right.&lt;br /&gt;
&lt;br /&gt;
Basic raw video is also just more complex than basic raw audio. The&lt;br /&gt;
sheer volume of data currently necessitates a representation more&lt;br /&gt;
efficient than the linear PCM used for audio.  In addition, electronic&lt;br /&gt;
video comes almost entirely from broadcast television alone, and the&lt;br /&gt;
standards committees that govern broadcast video have always been very&lt;br /&gt;
concerned with backward compatibility.  Up until just last year in the&lt;br /&gt;
US, a sixty-year-old black and white television could still show a&lt;br /&gt;
normal [[wikipedia:NTSC|analog television broadcast]].  That&#039;s actually a really neat&lt;br /&gt;
trick.&lt;br /&gt;
&lt;br /&gt;
The downside to backward compatibility is that once a detail makes it&lt;br /&gt;
into a standard, you can&#039;t ever really throw it out again. Electronic&lt;br /&gt;
video has never started over from scratch the way audio has multiple&lt;br /&gt;
times.  Sixty years worth of clever but obsolete hacks necessitated by&lt;br /&gt;
the passing technology of a given era have built up into quite a pile,&lt;br /&gt;
and because digital standards also come from broadcast television, all&lt;br /&gt;
these eldritch hacks have been brought forward into the digital&lt;br /&gt;
standards as well.&lt;br /&gt;
&lt;br /&gt;
In short, there are a whole lot more details involved in digital video&lt;br /&gt;
than there were with audio. There&#039;s no hope of covering them&lt;br /&gt;
all completely here, so we&#039;ll cover the broad fundamentals.&lt;br /&gt;
&amp;lt;br style=&amp;quot;clear:both;&amp;quot;/&amp;gt;&lt;br /&gt;
===resolution and aspect===&lt;br /&gt;
[[Image:Dmpfg_011.jpg|360px|right]]&lt;br /&gt;
&lt;br /&gt;
The most obvious raw video parameters are the width and height of the&lt;br /&gt;
picture in pixels. As simple as that may sound, the pixel dimensions&lt;br /&gt;
alone don&#039;t actually specify the absolute width and height of the&lt;br /&gt;
picture, as most broadcast-derived video doesn&#039;t use square pixels.&lt;br /&gt;
The number of [[wikipedia:Scan_line|scanlines]] in a broadcast image was fixed, but the&lt;br /&gt;
effective number of horizontal pixels was a function of channel&lt;br /&gt;
[[wikipedia:Bandwidth_(signal_processing)|bandwidth]]. Effective horizontal resolution could result in pixels that&lt;br /&gt;
were either narrower or wider than the spacing between scanlines.&lt;br /&gt;
&lt;br /&gt;
Standards have generally specified that digitally sampled video should&lt;br /&gt;
reflect the real resolution of the original analog source, so a large&lt;br /&gt;
amount of digital video also uses non-square pixels. For example, a&lt;br /&gt;
normal 4:3 aspect NTSC DVD is typically encoded with a display&lt;br /&gt;
resolution of [[wikipedia:DVD-Video#Frame_size_and_frame_rate|704 by 480]], a ratio wider than 4:3.  In this case, the&lt;br /&gt;
pixels themselves are assigned an aspect ratio of [[wikipedia:Standard-definition_television#Resolution|10:11]], making them&lt;br /&gt;
taller than they are wide and narrowing the image horizontally to the&lt;br /&gt;
correct aspect.  Such an image has to be resampled to show properly on&lt;br /&gt;
a digital display with square pixels.&lt;br /&gt;
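&lt;br /&gt;
(Checking the arithmetic: 704 pixels at a 10:11 pixel aspect ratio span 704 x 10/11 = 640 square pixels, so resampling to 640 by 480 yields the intended 4:3 picture. Illustratively, with mplayer:)&lt;br /&gt;
&lt;br /&gt;
# rescale anamorphic 704x480 DVD video to square pixels for display&lt;br /&gt;
mplayer -vf scale=640:480 dvd_clip.vob&lt;br /&gt;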
&amp;lt;br style=&amp;quot;clear:both;&amp;quot;/&amp;gt;&lt;br /&gt;
===frame rate and interlacing===&lt;br /&gt;
[[Image:Dmpfg_012.jpg|360px|right]]&lt;br /&gt;
&lt;br /&gt;
The second obvious video parameter is the [[wikipedia:Frame_rate|frame rate]], the number of&lt;br /&gt;
full frames per second.  Several standard frame rates are in active&lt;br /&gt;
use. Digital video, in one form or another, can use all of them.  Or,&lt;br /&gt;
any other frame rate.  Or even variable rates where the frame rate&lt;br /&gt;
changes adaptively over the course of the video. The higher the frame&lt;br /&gt;
rate, the smoother the motion and that brings us, unfortunately, to&lt;br /&gt;
[[wikipedia:Interlace|interlacing]].&lt;br /&gt;
&lt;br /&gt;
In the very earliest days of broadcast video, engineers sought the&lt;br /&gt;
fastest practical frame rate to smooth motion and to minimize [[wikipedia:Flicker_(screen)|flicker]]&lt;br /&gt;
on phosphor-based [[wikipedia:Cathode_ray_tube|CRTs]].  They were also under pressure to use the&lt;br /&gt;
least possible bandwidth for the highest resolution and fastest frame&lt;br /&gt;
rate.  Their solution was to interlace the video where the even lines&lt;br /&gt;
are sent in one pass and the odd lines in the next.  Each pass is&lt;br /&gt;
called a field and two fields sort of produce one complete frame.&lt;br /&gt;
&amp;quot;Sort of&amp;quot;, because the even and odd fields aren&#039;t actually from the&lt;br /&gt;
same source frame.  In a 60 field per second picture, the source frame&lt;br /&gt;
rate is actually 60 full frames per second, and half of each frame,&lt;br /&gt;
every other line, is simply discarded.  This is why we can&#039;t&lt;br /&gt;
[[wikipedia:Deinterlacing|deinterlace]] a video simply by combining two fields into one frame;&lt;br /&gt;
they&#039;re not actually from one frame to begin with.&lt;br /&gt;
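&lt;br /&gt;
(Practical deinterlacers interpolate the missing lines instead. As an illustrative sketch, mplayer&#039;s yadif filter in mode 1 reconstructs one full frame per field:)&lt;br /&gt;
&lt;br /&gt;
# deinterlace, producing 60 frames per second from 60 fields per second&lt;br /&gt;
mplayer -vf yadif=1 interlaced_clip.mpg&lt;br /&gt;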
&amp;lt;br style=&amp;quot;clear:both;&amp;quot;/&amp;gt;&lt;br /&gt;
&lt;br /&gt;
===gamma===&lt;br /&gt;
[[Image:Dmpfg_013.jpg|360px|right]]&lt;br /&gt;
&lt;br /&gt;
The cathode ray tube was the only available display technology for&lt;br /&gt;
most of the history of electronic video. A CRT&#039;s output brightness is&lt;br /&gt;
nonlinear, approximately equal to the input controlling voltage raised&lt;br /&gt;
to the 2.5th power. This exponent, 2.5, is designated gamma, and so&lt;br /&gt;
it&#039;s often referred to as the gamma of a display.  Cameras, though,&lt;br /&gt;
are linear, and if you feed a CRT a linear input signal, it looks a&lt;br /&gt;
bit like this.&lt;br /&gt;
&lt;br /&gt;
As there were originally to be very few cameras, which were&lt;br /&gt;
fantastically expensive anyway, and hopefully many, many television&lt;br /&gt;
sets, which had best be as inexpensive as possible, engineers decided to&lt;br /&gt;
add the necessary [[wikipedia:Gamma_correction|gamma correction]] circuitry to the cameras rather&lt;br /&gt;
than the sets. Video transmitted over the airwaves would thus have a&lt;br /&gt;
nonlinear intensity using the inverse of the set&#039;s gamma exponent, so that&lt;br /&gt;
once a camera&#039;s signal was finally displayed on the CRT, the overall&lt;br /&gt;
response of the system from camera to set was back to linear again.&lt;br /&gt;
&lt;br /&gt;
Almost.&lt;br /&gt;
&lt;br /&gt;
There were also two other tweaks. A television camera actually uses a&lt;br /&gt;
gamma exponent that&#039;s the inverse of 2.2, not 2.5.  That&#039;s just a&lt;br /&gt;
correction for viewing in a dim environment. Also, the exponential&lt;br /&gt;
curve transitions to a linear ramp near black.  That&#039;s just an old&lt;br /&gt;
hack for suppressing sensor noise in the camera.&lt;br /&gt;
&lt;br /&gt;
Gamma correction also had a lucky benefit. It just so happens that the&lt;br /&gt;
human eye has a perceptual gamma of about 3.  This is relatively close&lt;br /&gt;
to the CRT&#039;s gamma of 2.5. An image using gamma correction devotes&lt;br /&gt;
more resolution to lower intensities, where the eye happens to have&lt;br /&gt;
its finest intensity discrimination, and therefore uses the available&lt;br /&gt;
scale resolution more efficiently.  Although CRTs are currently&lt;br /&gt;
vanishing, a standard [[wikipedia:sRGB|sRGB]] computer display still uses a nonlinear&lt;br /&gt;
intensity curve similar to television, with a linear ramp near black,&lt;br /&gt;
followed by an exponential curve with a gamma exponent of 2.4. This&lt;br /&gt;
encodes a sixteen bit linear range down into eight bits.&lt;br /&gt;
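&lt;br /&gt;
(For reference, the sRGB encoding curve just described, with its linear ramp near black and 2.4 exponent:)&lt;br /&gt;
&lt;br /&gt;
Vout = 12.92 x Vlin                  (Vlin up to 0.0031308)&lt;br /&gt;
Vout = 1.055 x Vlin^(1/2.4) - 0.055  (Vlin above 0.0031308)&lt;br /&gt;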
&amp;lt;br style=&amp;quot;clear:both;&amp;quot;/&amp;gt;&lt;br /&gt;
&lt;br /&gt;
===color and colorspace===&lt;br /&gt;
[[Image:Dmpfg_014.jpg|360px|right]]&lt;br /&gt;
&lt;br /&gt;
The human eye has three apparent color channels, red, green, and blue,&lt;br /&gt;
and most displays use these three colors as [[wikipedia:Additive_color|additive primaries]] to&lt;br /&gt;
produce a full range of color output.  The primary pigments in&lt;br /&gt;
printing are [[wikipedia:CMYK|Cyan, Magenta, and Yellow]] for the same reason; pigments&lt;br /&gt;
are [[wikipedia:Subtractive_color|subtractive]], and each of these pigments subtracts one pure color&lt;br /&gt;
from reflected light.  Cyan subtracts red, magenta subtracts green, and&lt;br /&gt;
yellow subtracts blue.&lt;br /&gt;
&lt;br /&gt;
Video can be, and sometimes is, represented with red, green, and blue&lt;br /&gt;
color channels, but RGB video is atypical. The human eye is far more&lt;br /&gt;
sensitive to [[wikipedia:Luminance_(relative)|luminosity]] than it is to color, and RGB tends to spread&lt;br /&gt;
the energy of an image across all three color channels.  That is, the&lt;br /&gt;
red plane looks like a red version of the original picture, the green&lt;br /&gt;
plane looks like a green version of the original picture, and the blue&lt;br /&gt;
plane looks like a blue version of the original picture.  Black and&lt;br /&gt;
white times three.  Not efficient.&lt;br /&gt;
&lt;br /&gt;
For those reasons and because, oh hey, television just happened to&lt;br /&gt;
start out as black and white anyway, video usually is represented as a&lt;br /&gt;
high resolution [[wikipedia:Luma_(video)|luma channel]]&amp;amp;mdash;the black &amp;amp; white&amp;amp;mdash;along with&lt;br /&gt;
additional, often lower resolution [[wikipedia:Chrominance|chroma channels]], the color. The&lt;br /&gt;
luma channel, Y, is produced by weighting and then adding the separate&lt;br /&gt;
red, green and blue signals.  The chroma channels U and V are then&lt;br /&gt;
produced by subtracting the luma signal from blue and the luma signal&lt;br /&gt;
from red.&lt;br /&gt;
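&lt;br /&gt;
(Concretely, the classic BT.601 weighting; the U and V scale factors given are the analog-YUV ones:)&lt;br /&gt;
&lt;br /&gt;
Y = 0.299 R + 0.587 G + 0.114 B&lt;br /&gt;
U = 0.492 (B - Y)&lt;br /&gt;
V = 0.877 (R - Y)&lt;br /&gt;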
&lt;br /&gt;
When YUV is scaled, offset, and quantized for digital video, it&#039;s&lt;br /&gt;
usually more correctly called [[wikipedia:Y&#039;CbCr|Y&#039;CbCr]], but the more generic term YUV is&lt;br /&gt;
widely used to describe all the analog and digital variants of this&lt;br /&gt;
color model.&lt;br /&gt;
&amp;lt;br style=&amp;quot;clear:both;&amp;quot;/&amp;gt;&lt;br /&gt;
&lt;br /&gt;
===chroma subsampling===&lt;br /&gt;
&lt;br /&gt;
The U and V chroma channels can have the same resolution as the Y&lt;br /&gt;
channel, but because the human eye has far less spatial color&lt;br /&gt;
resolution than spatial luminosity resolution, chroma resolution is&lt;br /&gt;
usually [[wikipedia:Chroma_subsampling|halved or even quartered]] in the horizontal direction, the&lt;br /&gt;
vertical direction, or both, usually without any significant impact on the&lt;br /&gt;
apparent raw image quality.  Practically every possible subsampling&lt;br /&gt;
variant has been used at one time or another, but the common choices&lt;br /&gt;
today are [[wikipedia:Chroma_subsampling#4:4:4_Y.27CbCr|4:4:4]] video, which isn&#039;t actually subsampled at all, [[wikipedia:Chroma_subsampling#4:2:2|4:2:2]] video in&lt;br /&gt;
which the horizontal resolution of the U and V channels is halved, and&lt;br /&gt;
most common of all, [[wikipedia:Chroma_subsampling#4:2:0|4:2:0]] video in which both the horizontal and vertical&lt;br /&gt;
resolutions of the chroma channels are halved, resulting in U and V&lt;br /&gt;
planes that are each one quarter the size of Y (for a 640 by 480 frame, 320 by 240 each).&lt;br /&gt;
&lt;br /&gt;
The terms 4:2:2, 4:2:0, [[wikipedia:Chroma_subsampling#4:1:1|4:1:1]], and so on and so forth, aren&#039;t complete&lt;br /&gt;
descriptions of a chroma subsampling. There are multiple possible ways&lt;br /&gt;
to position the chroma pixels relative to luma, and again, several&lt;br /&gt;
variants are in active use for each subsampling.  For example, [[wikipedia:Motion_Jpeg|motion&lt;br /&gt;
JPEG]], [[wikipedia:MPEG-1#Part_2:_Video|MPEG-1 video]], [[wikipedia:MPEG-2#Video_coding_.28simplified.29|MPEG-2 video]], [[wikipedia:DV#DV_Compression|DV]], [[wikipedia:Theora|Theora]], and [[wikipedia:WebM|WebM]] all use or can&lt;br /&gt;
use 4:2:0 subsampling, but they site the chroma pixels [http://www.mir.com/DMG/chroma.html three different ways].&lt;br /&gt;
&lt;br /&gt;
Motion JPEG, MPEG-1 video, Theora and WebM all site chroma pixels&lt;br /&gt;
between luma pixels both horizontally and vertically.&lt;br /&gt;
&lt;br /&gt;
MPEG-2 sites chroma pixels between lines, but horizontally aligned with&lt;br /&gt;
every other luma pixel. Interlaced modes complicate things somewhat,&lt;br /&gt;
resulting in a siting arrangement that&#039;s a tad bizarre.&lt;br /&gt;
&lt;br /&gt;
And finally PAL-DV, which is always interlaced, places the chroma&lt;br /&gt;
pixels in the same position as every other luma pixel in the&lt;br /&gt;
horizontal direction, and vertically alternates chroma channel on&lt;br /&gt;
each line.&lt;br /&gt;
&lt;br /&gt;
That&#039;s just 4:2:0 video. I&#039;ll leave the other subsamplings as homework for the&lt;br /&gt;
viewer.  Got the basic idea, moving on.&lt;br /&gt;
&amp;lt;br style=&amp;quot;clear:both;&amp;quot;/&amp;gt;&lt;br /&gt;
&lt;br /&gt;
===pixel formats===&lt;br /&gt;
&lt;br /&gt;
In audio, we always represent multiple channels in a PCM stream by&lt;br /&gt;
interleaving the samples from each channel in order. Video uses both&lt;br /&gt;
packed formats that interleave the color channels, as well as planar&lt;br /&gt;
formats that keep the pixels from each channel together in separate&lt;br /&gt;
planes stacked in order in the frame. There are at least [http://www.fourcc.org/yuv.php 50 different formats] in&lt;br /&gt;
these two broad categories with possibly ten or fifteen in common use.&lt;br /&gt;
&lt;br /&gt;
Each chroma subsampling and different bit-depth requires a different&lt;br /&gt;
packing arrangement, and so a different pixel format.  For a given&lt;br /&gt;
unique subsampling, there are usually also several equivalent formats&lt;br /&gt;
that consist of trivial channel order rearrangements or repackings, due either to&lt;br /&gt;
convenience once-upon-a-time on some particular piece of hardware, or&lt;br /&gt;
sometimes just good old-fashioned spite.&lt;br /&gt;
&lt;br /&gt;
Pixel formats are described by a unique name or [[wikipedia:FourCC|fourcc]] code.  There&lt;br /&gt;
are quite a few of these and there&#039;s no sense going over each one now.&lt;br /&gt;
Google is your friend.  Be aware that fourcc codes for raw video&lt;br /&gt;
specify the pixel arrangement and chroma subsampling, but generally&lt;br /&gt;
don&#039;t imply anything certain about chroma siting or color space.  [http://www.fourcc.org/yuv.php#UYVY YV12]&lt;br /&gt;
video, to pick one, can use JPEG, MPEG-2 or DV chroma siting, and any&lt;br /&gt;
one of [[wikipedia:YUV#BT.709_and_BT.601|several YUV colorspace definitions]].  (YV12 itself is a planar 4:2:0 layout: a full-size Y plane followed by quarter-size V and then U planes.)&lt;br /&gt;
&amp;lt;br style=&amp;quot;clear:both;&amp;quot;/&amp;gt;&lt;br /&gt;
&lt;br /&gt;
===done!===&lt;br /&gt;
&lt;br /&gt;
That wraps up our not-so-quick and yet very incomplete tour of raw&lt;br /&gt;
video. The good news is we can already get quite a lot of real work&lt;br /&gt;
done using that overview. In plenty of situations, a frame of video&lt;br /&gt;
data is a frame of video data.  The details matter, greatly, when it&lt;br /&gt;
comes time to write software, but for now I am satisfied that the&lt;br /&gt;
esteemed viewer is broadly aware of the relevant issues.&lt;br /&gt;
&lt;br /&gt;
&amp;lt;center&amp;gt;&amp;lt;div style=&amp;quot;background-color:#DDDDFF;border-color:#CCCCDD;border-style:solid;width:80%;padding:0 1em 1em 1em;text-align:left;&amp;quot;&amp;gt;&lt;br /&gt;
&#039;&#039;&#039;Going deeper…&#039;&#039;&#039;&lt;br /&gt;
* [http://wiki.multimedia.cx/index.php?title=YUV4MPEG2 The y4m format] is the most common simple container for raw YUV video.  People occasionally use [[OggYUV]] to store it in Ogg instead.&lt;br /&gt;
* Learn about [[wikipedia:High_dynamic_range_imaging|high dynamic range imaging]], which achieves better representation of the full range of brightnesses in the real world by using more than 8 bits per channel.&lt;br /&gt;
* Learn about how [[wikipedia:Trichromatic_vision|trichromatic color vision]] works in humans, and how human color perception is encoded in the [[wikipedia:CIE 1931 color space|CIE 1931 XYZ color space]].&lt;br /&gt;
** Compare with the [[wikipedia:Lab_color_space|Lab color space]], mathematically equivalent but structured to account for &amp;quot;perceptual uniformity&amp;quot;.&lt;br /&gt;
** Some humans might be [[wikipedia:Tetrachromacy#Possibility_of_human_tetrachromats|tetrachromats]], in which case they would need an additional color channel for video to fully represent their vision.&lt;br /&gt;
&amp;lt;/div&amp;gt;&amp;lt;/center&amp;gt;&lt;br /&gt;
&lt;br /&gt;
&amp;lt;br style=&amp;quot;clear:both;&amp;quot;/&amp;gt;&lt;br /&gt;
&lt;br /&gt;
==Containers==&lt;br /&gt;
&amp;lt;small&amp;gt;[[Talk:A_Digital_Media_Primer_For_Geeks_(episode_1)#Containers|Discuss this section]]&amp;lt;/small&amp;gt;&lt;br /&gt;
&lt;br /&gt;
So. We have audio data. We have video data. What remains is the more&lt;br /&gt;
familiar non-signal data and straight-up engineering that software&lt;br /&gt;
developers are used to, and plenty of it.&lt;br /&gt;
&lt;br /&gt;
Chunks of raw audio and video data have no externally-visible&lt;br /&gt;
structure, but they&#039;re often uniformly sized.  We could just string&lt;br /&gt;
them together in a rigid predetermined ordering for streaming and&lt;br /&gt;
storage, and some simple systems do approximately that. Compressed&lt;br /&gt;
frames, though, aren&#039;t necessarily a predictable size, and we usually want&lt;br /&gt;
some flexibility in using a range of different data types in streams.&lt;br /&gt;
If we string random formless data together, we lose the boundaries&lt;br /&gt;
that separate frames and don&#039;t necessarily know what data belongs to&lt;br /&gt;
which streams.  A stream needs some generalized structure to be&lt;br /&gt;
generally useful.&lt;br /&gt;
&lt;br /&gt;
In addition to our signal data, we also have our PCM and video&lt;br /&gt;
parameters.  There&#039;s probably plenty of other [[wikipedia:Metadata#Video|metadata]] we also want to&lt;br /&gt;
deal with, like audio tags and video chapters and subtitles, all&lt;br /&gt;
essential components of rich media.  It makes sense to place this&lt;br /&gt;
metadata&amp;amp;mdash;that is, data about the data&amp;amp;mdash;within the media itself.&lt;br /&gt;
&lt;br /&gt;
Storing and structuring formless data and disparate metadata is the&lt;br /&gt;
job of a [[wikipedia:Container_format_(digital)|container]].  Containers provide framing for the data blobs,&lt;br /&gt;
interleave and identify multiple data streams, provide timing&lt;br /&gt;
information, and store the metadata necessary to parse, navigate,&lt;br /&gt;
manipulate, and present the media.  In general, any container can hold&lt;br /&gt;
any kind of data.  And data can be put into any container.&lt;br /&gt;
&amp;lt;br style=&amp;quot;clear:both;&amp;quot;/&amp;gt;&lt;br /&gt;
==Credits==&lt;br /&gt;
&lt;br /&gt;
In the past thirty minutes, we&#039;ve covered digital audio, video, some&lt;br /&gt;
history, some math and a little engineering. We&#039;ve barely scratched the&lt;br /&gt;
surface, but it&#039;s time for a well-earned break.&lt;br /&gt;
&lt;br /&gt;
There&#039;s so much more to talk about, so I hope you&#039;ll join me again in&lt;br /&gt;
our next episode.  Until then&amp;amp;mdash;Cheers!&lt;br /&gt;
&lt;br /&gt;
Written by:&lt;br /&gt;
Christopher (Monty) Montgomery&lt;br /&gt;
and the Xiph.Org Community&lt;br /&gt;
&lt;br /&gt;
Intro, title and credits music:&amp;lt;br&amp;gt;&lt;br /&gt;
&amp;quot;Boo Boo Coming&amp;quot;, by Joel Forrester&amp;lt;br&amp;gt;&lt;br /&gt;
Performed by the Microscopic Septet&amp;lt;br&amp;gt;&lt;br /&gt;
Used by permission of Cuneiform Records.&amp;lt;br&amp;gt;&lt;br /&gt;
Original source track All Rights Reserved.&amp;lt;br&amp;gt;&lt;br /&gt;
www.cuneiformrecords.com&lt;br /&gt;
&lt;br /&gt;
This Video Was Produced Entirely With Free and Open Source Software&lt;br /&gt;
&lt;br /&gt;
GNU&lt;br /&gt;
Linux&lt;br /&gt;
Fedora&lt;br /&gt;
Cinelerra&lt;br /&gt;
The Gimp&lt;br /&gt;
Audacity&lt;br /&gt;
Postfish&lt;br /&gt;
Gstreamer&lt;br /&gt;
&lt;br /&gt;
CC BY-NC-SA&lt;br /&gt;
A Co-Production of Xiph.Org and Red Hat Inc.&lt;br /&gt;
(C) 2010, Some Rights Reserved&lt;br /&gt;
&lt;br /&gt;
&amp;lt;br style=&amp;quot;clear:both;&amp;quot;/&amp;gt;&lt;br /&gt;
&lt;br /&gt;
==The making of…==&lt;br /&gt;
&amp;lt;small&amp;gt;[[Talk:A_Digital_Media_Primer_For_Geeks_(episode_1)#The_making_of.E2.80.A6|Discuss this section]]&amp;lt;/small&amp;gt;&lt;br /&gt;
===Equipment===&lt;br /&gt;
====Camera====&lt;br /&gt;
Canon HV40 HDV camera w/ wide-angle lens operating on a tripod.&lt;br /&gt;
&lt;br /&gt;
The wide angle lens gives the camera a nice close macro mode, and approximately triples the amount of light coming into the sensor for a given zoom/aperture.  Useful for shooting indoors at night.&lt;br /&gt;
&lt;br /&gt;
No additional lighting kit was used.&lt;br /&gt;
&lt;br /&gt;
====Audio====&lt;br /&gt;
&lt;br /&gt;
Two Crown PCC160 boundary microphones placed on a table approximately 4-8 feet in front of the speaker, run through a cheap Behringer portable mixer and into the camera&#039;s microphone input.  &lt;br /&gt;
&lt;br /&gt;
No additional audio kit was used.&lt;br /&gt;
&lt;br /&gt;
====Sundries====&lt;br /&gt;
&lt;br /&gt;
Whiteboard markers by &#039;Bic&#039;&lt;br /&gt;
&lt;br /&gt;
Drawing aids by Staedtler, McMaster Carr, and &#039;Generic&#039;.&lt;br /&gt;
&lt;br /&gt;
===Video shooting sequence===&lt;br /&gt;
&lt;br /&gt;
Scenes were pre-scripted and memorized, usually with lots of on-the-fly revision.  (In the future... I&#039;m totally getting a teleprompter.  Wh000.  OTOH, I can totally rattle off the entire video script from beginning to end as a party trick, thus ensuring I&#039;ll not be invited to many parties --Monty).&lt;br /&gt;
&lt;br /&gt;
Diagrams were drawn by hand on a physical whiteboard with whiteboard markers and magnetic T-squares, triangles, and yardsticks.  Despite looking a lot like greenscreen work, there is no image compositing in use (actually-- there are two small composites where an error in a whiteboard diagram was corrected by subtracting part of the original image and then adding a corrected version of the diagram).&lt;br /&gt;
&lt;br /&gt;
Camera operated in 24F shutter priority mode (Tv set to &amp;quot;24&amp;quot;) with exposure and white balance both calibrated to the whiteboard (or a white piece of paper) and locked.  Microphone attenuation setting active, with gain locked such that room noise peaked at -40dB (all the rooms in the shooting sequences were noisy due to the building&#039;s ventilation system, or active equipment). Lighting in the whiteboard rooms tended to be odd, with little relative light cast on a presenter standing just in front of the whiteboard; a presenter is practically standing in the room&#039;s only shadow.  Most of the room light is focused on the table and walls.  Additional fill lighting kit would have been useful, but for the first vid, I didn&#039;t want &#039;perfect&#039; to be the enemy of &#039;good&#039;.&lt;br /&gt;
&lt;br /&gt;
Autofocus used for whiteboard scenes, manual focus used for several workshop scenes as the autofocus tended to hunt continuously in very low light.&lt;br /&gt;
&lt;br /&gt;
Continuous capture to a Thinkpad with firewire input via a simple gstreamer script.&lt;br /&gt;
&lt;br /&gt;
===Production sequence===&lt;br /&gt;
====All hail Cinelerra.  You better hail, or Cinelerra will get pissy about it.====&lt;br /&gt;
&lt;br /&gt;
Most of the production sequence hinged on making Cinelerra happy; it is a hulking, rusty, cast-iron WWI tank of a program that can seem like it&#039;s composed entirely of compressed bugs.  That said, it was neither particularly crashy nor did it ever accidentally corrupt or lose work.  It was also the only FOSS editor with a working 2D compositor.  It got the job done once I found a workflow it would cope with (and fixed a number of bugs; these fixes are available from my Cinelerra Git repo at http://git.xiph.org/?p=users/xiphmont/cinelerraCV.git;a=summary).&lt;br /&gt;
&lt;br /&gt;
====Choosing takes====&lt;br /&gt;
&lt;br /&gt;
Each shooting session yielded four to six hours of raw video.  The first step was to load the raw video into the Cinelerra timeline, label each complete take, compare the takes and choose the one to use, then render the chosen take out as a raw clip: a YUV4MPEG raw video file plus a WAV raw audio file.  Be careful that Settings-&amp;gt;Align Cursor On Frames is set, else the audio and video renders won&#039;t start on the same boundary.&lt;br /&gt;
&lt;br /&gt;
====Postprocessing====&lt;br /&gt;
&lt;br /&gt;
At this point, the raw video clips were adjusted for gamma, contrast, and saturation in gstreamer and mplayer.  In the earlier shoots the camera was underexposing due to pilot error, which required quite a bit of gamma and saturation inflation to &#039;correct&#039; (there is no real correction as the low-end data is gone, but it&#039;s possible to make it look better).  Later shoots used saner settings, and the adjustments were mostly to keep different shooting sessions more uniform.  The whiteboard tends not to look white because it&#039;s mildly reflective and picks up the color of the cyan and orange audio baffles in the room like a big diffuse mirror.&lt;br /&gt;
&lt;br /&gt;
The audio was both noisy (the building&#039;s ventilation system sounded like either a loud low rumble or a jet engine taking off) and reverberant (the rooms were glass on two sides and plaster on the other two).  Early takes used no additional sound-absorbing material in the rooms, and Postfish filtering and deverb were used heavily.  This gives the early audio in the vid a slightly odd, processed feel (I had almost decided the original audio was simply unusable).  Later takes used some big fleece &#039;soft flats&#039; in the room to absorb some additional reverb, and those takes are less heavily filtered.&lt;br /&gt;
&lt;br /&gt;
The Postfish filtering chain used declip (for the occasional overrange oops), deverb (for removing room reverberation), multicompand (for noise gating), single compand (for volume levelling), and EQ (the Crown mics are nice, but very midrange-heavy).&lt;br /&gt;
&lt;br /&gt;
====Special Effects====&lt;br /&gt;
&lt;br /&gt;
Audio special effects were one-offs, mostly done using SoX.  The processed demo sections of audio were then spliced back into the original audio takes using Audacity.&lt;br /&gt;
&lt;br /&gt;
Video special effects (e.g., removing a color channel) were done by writing quick, one-off filters in C for y4oi.  A few effects were done by dumping a take as a directory full of PNGs, batch-processing the PNGs with a one-off C program, and then reassembling with mplayer.  Video effects were then stitched back into the original video takes in Cinelerra.&lt;br /&gt;
&lt;br /&gt;
====Editing====&lt;br /&gt;
&lt;br /&gt;
All editing was done in Cinelerra.  This primarily consisted of stitching the individual takes back together with crossfades.  All input and rendering output were done with raw YUV4MPEG and WAV files.  Note that making this work well and correctly required several patches to the YUV4MPEG handler and colorspace conversion code.&lt;br /&gt;
&lt;br /&gt;
====Encoding====&lt;br /&gt;
&lt;br /&gt;
Encoding was done by hand, external to Cinelerra, using mplayer for final postprocessing, the encoder_example tool included with the Ptalarbvorm Theora source distribution, and ivfenc for WebM.&lt;br /&gt;
&lt;br /&gt;
Sample Theora encode command line (note this is using an mplayer patched for y4o support; it could be done just as easily with a yuv4mpeg pipe):&lt;br /&gt;
&lt;br /&gt;
# 360p, 128-ish (a4) audio + 500-ish (v50) video&lt;br /&gt;
mplayer -vf hqdn3d,scale=640:360,gradfun=1.5,unsharp=l3x3:.1,yuv4ogg complete2.m2v -fast -noconsolecontrols -vo null &amp;gt; /dev/null &amp;amp; ~/MotherfishSVN/theora-ptalarbvorm/examples/encoder_example -a 4 -v 50 -k 240 complete2.wav output.y4o -o A_Digital_Media_Primer_For_Geeks-360p-a4+v50.ogv&lt;/div&gt;</summary>
		<author><name>Mindspillage</name></author>
	</entry>
	<entry>
		<id>https://wiki.xiph.org/index.php?title=Videos/A_Digital_Media_Primer_For_Geeks&amp;diff=12422</id>
		<title>Videos/A Digital Media Primer For Geeks</title>
		<link rel="alternate" type="text/html" href="https://wiki.xiph.org/index.php?title=Videos/A_Digital_Media_Primer_For_Geeks&amp;diff=12422"/>
		<updated>2010-09-22T03:12:58Z</updated>

		<summary type="html">&lt;p&gt;Mindspillage: /* frame rate and interlacing */ link&lt;/p&gt;
&lt;hr /&gt;
&lt;div&gt;[[Image:Dmpfg_001.jpg|360px|right]]&lt;br /&gt;
&amp;lt;small&amp;gt;&#039;&#039;Wiki edition&#039;&#039;&amp;lt;/small&amp;gt;&lt;br /&gt;
&lt;br /&gt;
This first video from Xiph.Org presents the technical foundations of modern digital media via a half-hour firehose of information. One community member called it &amp;quot;a Uni lecture I never got but really wanted.&amp;quot;&lt;br /&gt;
&lt;br /&gt;
The program offers a brief history of digital media, a quick summary of the sampling theorem, and myriad details of low level audio and video characterization and formatting. It&#039;s intended for budding geeks looking to get into video coding, as well as the technically curious who want to know more about the media they wrangle for work or play.&lt;br /&gt;
&amp;lt;br/&amp;gt;&amp;lt;br/&amp;gt;&amp;lt;br/&amp;gt;&lt;br /&gt;
&amp;lt;center&amp;gt;&amp;lt;font size=&amp;quot;+2&amp;quot;&amp;gt;[http://www.xiph.org/video/vid1.shtml Download or Watch online]&amp;lt;/font&amp;gt;&amp;lt;/center&amp;gt;&lt;br /&gt;
&amp;lt;br style=&amp;quot;clear:both;&amp;quot;/&amp;gt;&lt;br /&gt;
&amp;lt;hr/&amp;gt;&lt;br /&gt;
&lt;br /&gt;
==Introduction==&lt;br /&gt;
[[Image:Dmpfg_000.jpg|360px|right]]&lt;br /&gt;
[[Image:Dmpfg_002.jpg|360px|right]]&lt;br /&gt;
&amp;lt;small&amp;gt;[[Talk:A_Digital_Media_Primer_For_Geeks_(episode_1)#Introduction|Discuss this section]]&amp;lt;/small&amp;gt;&lt;br /&gt;
&lt;br /&gt;
Workstations and high-end personal computers have been able to&lt;br /&gt;
manipulate digital audio pretty easily for about fifteen years now.&lt;br /&gt;
It&#039;s only been about five years that a decent workstation&#039;s been able&lt;br /&gt;
to handle raw video without a lot of expensive special purpose&lt;br /&gt;
hardware.&lt;br /&gt;
&lt;br /&gt;
But today even most cheap home PCs have the processor power and&lt;br /&gt;
storage necessary to really toss raw video around, at least without&lt;br /&gt;
too much of a struggle. So now that everyone has all of this cheap media-capable hardware, &lt;br /&gt;
more people, not surprisingly, want to do interesting&lt;br /&gt;
things with digital media, especially streaming. YouTube was the first huge&lt;br /&gt;
success, and now everybody wants in.&lt;br /&gt;
&lt;br /&gt;
Well good!  Because this stuff is a lot of fun!&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
It&#039;s no problem finding consumers for digital media.  But here I&#039;d&lt;br /&gt;
like to address the engineers, the mathematicians, the hackers, the&lt;br /&gt;
people who are interested in discovering and making things and&lt;br /&gt;
building the technology itself. The people after my own heart.&lt;br /&gt;
&lt;br /&gt;
Digital media, compression especially, is perceived to be super-elite,&lt;br /&gt;
somehow incredibly more difficult than anything else in computer&lt;br /&gt;
science. The big industry players in the field don&#039;t mind this&lt;br /&gt;
perception at all; it helps justify the staggering number of very&lt;br /&gt;
basic patents they hold.  They like the image that their media&lt;br /&gt;
researchers &amp;quot;are the best of the best, so much smarter than anyone&lt;br /&gt;
else that their brilliant ideas can&#039;t even be understood by mere&lt;br /&gt;
mortals.&amp;quot; This is bunk.&lt;br /&gt;
&lt;br /&gt;
Digital audio and video and streaming and compression offer endless&lt;br /&gt;
deep and stimulating mental challenges, just like any other&lt;br /&gt;
discipline. It seems elite because so few people have been&lt;br /&gt;
involved.  So few people have been involved perhaps because so few&lt;br /&gt;
people could afford the expensive, special-purpose equipment it&lt;br /&gt;
required. But today, just about anyone watching this video has a&lt;br /&gt;
cheap, general-purpose computer powerful enough to play with the big&lt;br /&gt;
boys. There are battles going on today around HTML5 and browsers and&lt;br /&gt;
video and open vs. closed.  So now is a pretty good time to get&lt;br /&gt;
involved.  The easiest place to start is probably understanding the&lt;br /&gt;
technology we have right now.&lt;br /&gt;
&lt;br /&gt;
This is an introduction. Since it&#039;s an introduction, it glosses over a&lt;br /&gt;
ton of details so that the big picture&#039;s a little easier to see.&lt;br /&gt;
Quite a few people watching are going to be way past anything that I&#039;m&lt;br /&gt;
talking about, at least for now.  On the other hand, I&#039;m probably&lt;br /&gt;
going to go too fast for folks who really are brand new to all of&lt;br /&gt;
this, so if this is all new, relax. The important thing is to pick out&lt;br /&gt;
any ideas that really grab your imagination. Especially pay attention&lt;br /&gt;
to the terminology surrounding those ideas, because with those, and&lt;br /&gt;
Google, and Wikipedia, you can dig as deep as interests you.&lt;br /&gt;
&lt;br /&gt;
So, without any further ado, welcome to one hell of a new hobby.&lt;br /&gt;
&lt;br /&gt;
&amp;lt;center&amp;gt;&amp;lt;div style=&amp;quot;background-color:#DDDDFF;border-color:#CCCCDD;border-style:solid;width:80%;padding:0 1em 1em 1em;text-align:left;&amp;quot;&amp;gt;&lt;br /&gt;
&#039;&#039;&#039;Going deeper…&#039;&#039;&#039;&lt;br /&gt;
*Fixme: Some good HTML5 baseline codec debate article&lt;br /&gt;
*[http://diveintohtml5.org/video.html Dive into HTML5] web video tutorial&lt;br /&gt;
&amp;lt;/div&amp;gt;&amp;lt;/center&amp;gt;&lt;br /&gt;
&lt;br /&gt;
&amp;lt;br style=&amp;quot;clear:both;&amp;quot;/&amp;gt;&lt;br /&gt;
&lt;br /&gt;
==Analog vs Digital==&lt;br /&gt;
[[Image:Dmpfg_004.jpg|360px|right]]&lt;br /&gt;
[[Image:Dmpfg_006.jpg|360px|right]]&lt;br /&gt;
[[Image:Dmpfg_007.jpg|360px|right]]&lt;br /&gt;
&amp;lt;small&amp;gt;[[Talk:A_Digital_Media_Primer_For_Geeks_(episode_1)#Analog_vs_Digital|Discuss this section]]&amp;lt;/small&amp;gt;&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
[[WikiPedia:Sound|Sound]] is the propagation of pressure waves through air, spreading out&lt;br /&gt;
from a source like ripples spread from a stone tossed into a pond.  A&lt;br /&gt;
microphone, or the human ear for that matter, transforms these passing&lt;br /&gt;
ripples of pressure into an electric signal.  Right, this is&lt;br /&gt;
middle school science class, everyone remembers this.  Moving on.&lt;br /&gt;
&lt;br /&gt;
That audio signal is a one-dimensional function, a single value&lt;br /&gt;
varying over time.  If we slow the [[WikiPedia:Oscilloscope|&#039;scope]] down a bit... that should be&lt;br /&gt;
a little easier to see. A few other aspects of the signal are&lt;br /&gt;
important. It&#039;s [[WikiPedia:Continuous_function|continuous]] in both value and time; that is, at any&lt;br /&gt;
given time it can have any real value, and there&#039;s a smoothly varying&lt;br /&gt;
value at every point in time.  No matter how much we zoom in, there&lt;br /&gt;
are no discontinuities, no singularities, no instantaneous steps or&lt;br /&gt;
points where the signal ceases to exist. It&#039;s defined&lt;br /&gt;
everywhere. Classic continuous math works very well on these signals.&lt;br /&gt;
&lt;br /&gt;
A digital signal on the other hand is [[WikiPedia:Discrete_math|discrete]] in both value and time.&lt;br /&gt;
In the simplest and most common system, called [[WikiPedia:Pulse code modulation|Pulse Code Modulation]],&lt;br /&gt;
one of a fixed number of possible values directly represents the&lt;br /&gt;
instantaneous signal amplitude at points in time spaced a fixed&lt;br /&gt;
distance apart.  The end result is a stream of digits.&lt;br /&gt;
&lt;br /&gt;
Now this looks an awful lot like this.  It seems intuitive that we&lt;br /&gt;
should somehow be able to rigorously transform one into the other, and&lt;br /&gt;
good news, the [[WikiPedia:Nyquist-Shannon sampling theorem|Sampling Theorem]] says we can and tells us&lt;br /&gt;
how. Published in its most recognizable form by [[WikiPedia:Claude Shannon|Claude Shannon]] in 1949&lt;br /&gt;
and built on the work of [[WikiPedia:Harry Nyquist|Nyquist]], and [[WikiPedia:Ralph Hartley|Hartley]], and tons of others, the&lt;br /&gt;
sampling theorem not only tells us that we can go back and&lt;br /&gt;
forth between analog and digital, but also lays&lt;br /&gt;
down the set of conditions under which the conversion is lossless and the two&lt;br /&gt;
representations become equivalent and interchangeable.  When the&lt;br /&gt;
lossless conditions aren&#039;t met, the sampling theorem tells us how and&lt;br /&gt;
how much information is lost or corrupted.&lt;br /&gt;
&lt;br /&gt;
Up until very recently, analog technology was the basis for&lt;br /&gt;
practically everything done with audio, and that&#039;s not simply because most&lt;br /&gt;
audio comes from an originally analog source.  You may also think that&lt;br /&gt;
since computers are fairly recent, analog signal technology must have&lt;br /&gt;
come first.  Nope. Digital is actually older.  The [[WikiPedia:Telegraph|telegraph]] predates&lt;br /&gt;
the telephone by half a century and was already fully mechanically&lt;br /&gt;
automated by the 1860s, sending coded, multiplexed digital signals&lt;br /&gt;
long distances. You know... [[WikiPedia:Tickertape|tickertape]]. Harry Nyquist of [[WikiPedia:Bell_labs|Bell Labs]] was&lt;br /&gt;
researching telegraph pulse transmission when he published his&lt;br /&gt;
description of what later became known as the [[WikiPedia:Nyquist_frequency|Nyquist frequency]], the&lt;br /&gt;
core concept of the sampling theorem.  Now, it&#039;s true the telegraph&lt;br /&gt;
was transmitting symbolic information, text, not a digitized analog&lt;br /&gt;
signal, but with the advent of the telephone and radio, analog and&lt;br /&gt;
digital signal technology progressed rapidly and side-by-side.&lt;br /&gt;
&lt;br /&gt;
Audio had always been manipulated as an analog signal because... well,&lt;br /&gt;
gee, it&#039;s so much easier.  A [[WikiPedia:Low-pass_filter#Continuous-time_low-pass_filters|second-order low-pass filter]], for example,&lt;br /&gt;
requires two passive components.  An all-analog [[WikiPedia:Short-time_Fourier_transform|short-time Fourier&lt;br /&gt;
transform]], a few hundred.  Well, maybe a thousand if you want to build&lt;br /&gt;
something really fancy (bang on the [http://www.testequipmentdepot.com/usedequipment/hewlettpackard/spectrumanalyzers/3585a.htm 3585]).  Processing signals&lt;br /&gt;
digitally requires millions to billions of transistors running at&lt;br /&gt;
microwave frequencies, support hardware at very least to digitize and&lt;br /&gt;
reconstruct the analog signals, a complete software ecosystem for&lt;br /&gt;
programming and controlling that billion-transistor juggernaut,&lt;br /&gt;
digital storage just in case you want to keep any of those bits for&lt;br /&gt;
later...&lt;br /&gt;
&lt;br /&gt;
So we come to the conclusion that analog is the only practical way to&lt;br /&gt;
do much with audio... well, unless you happen to have a billion&lt;br /&gt;
transistors and all the other things just lying around. And [[WikiPedia:File:Transistor_Count_and_Moore&#039;s_Law_-_2008.svg|since we&lt;br /&gt;
do]], digital signal processing becomes very attractive.&lt;br /&gt;
&lt;br /&gt;
For one thing, analog componentry just doesn&#039;t have the flexibility of&lt;br /&gt;
a general purpose computer.  Adding a new function to this&lt;br /&gt;
beast... yeah, it&#039;s probably not going to happen.  On a digital&lt;br /&gt;
processor though, just write a new program.  Software isn&#039;t trivial,&lt;br /&gt;
but it is a lot easier.&lt;br /&gt;
&lt;br /&gt;
Perhaps more importantly, though, every analog component is an&lt;br /&gt;
approximation. There&#039;s no such thing as a perfect transistor, or a&lt;br /&gt;
perfect inductor, or a perfect capacitor.  In analog, every component&lt;br /&gt;
adds [[WikiPedia:Johnson–Nyquist_noise|noise]] and [[WikiPedia:Distortion#Electronic_signals|distortion]], usually not very much, but it adds up. Just&lt;br /&gt;
transmitting an analog signal, especially over long distances,&lt;br /&gt;
progressively, measurably, irretrievably corrupts it.  Besides, all of&lt;br /&gt;
those single-purpose analog components take up a lot of space.  Two&lt;br /&gt;
lines of code on the billion transistors back here can implement a&lt;br /&gt;
filter that would require an [[WikiPedia:Inductor|inductor]] the size of a refrigerator.&lt;br /&gt;
&lt;br /&gt;
Digital systems don&#039;t have these drawbacks.  Digital signals can be&lt;br /&gt;
stored, copied, manipulated, and transmitted without adding any noise&lt;br /&gt;
or distortion. We do use [[WikiPedia:Lossy_compression|lossy]] algorithms from time to time, but the&lt;br /&gt;
only unavoidably non-ideal steps are digitization and reconstruction,&lt;br /&gt;
where digital has to interface with all of that messy analog.  Messy&lt;br /&gt;
or not, modern [[WikiPedia:Digital-to-analog_converter|conversion stages]] are very, very good.  By the&lt;br /&gt;
standards of our ears, we can consider them practically lossless as&lt;br /&gt;
well.&lt;br /&gt;
&lt;br /&gt;
With a little extra hardware, then, most of which is now small and&lt;br /&gt;
inexpensive due to our modern industrial infrastructure, digital audio&lt;br /&gt;
is the clear winner over analog.  So let us then go about storing it,&lt;br /&gt;
copying it, manipulating it, and transmitting it.&lt;br /&gt;
&lt;br /&gt;
&amp;lt;center&amp;gt;&amp;lt;div style=&amp;quot;background-color:#DDDDFF;border-color:#CCCCDD;border-style:solid;width:80%;padding:0 1em 1em 1em;text-align:left;&amp;quot;&amp;gt;&lt;br /&gt;
&#039;&#039;&#039;Going deeper…&#039;&#039;&#039;&lt;br /&gt;
*Wikipedia: [[wiki:Nyquist–Shannon_sampling_theorem|Nyquist–Shannon sampling theorem]]&lt;br /&gt;
*MIT OpenCourseWare [http://ocw.mit.edu/courses/electrical-engineering-and-computer-science/6-003-signals-and-systems-spring-2010/lecture-notes/ Lecture notes from 6.003 signals and systems.]&lt;br /&gt;
&amp;lt;/div&amp;gt;&amp;lt;/center&amp;gt;&lt;br /&gt;
&lt;br /&gt;
&amp;lt;br style=&amp;quot;clear:both;&amp;quot;/&amp;gt;&lt;br /&gt;
&lt;br /&gt;
==Raw (digital audio) meat==&lt;br /&gt;
&amp;lt;small&amp;gt;[[Talk:A_Digital_Media_Primer_For_Geeks_(episode_1)#Raw_.28digital_audio.29_meat|Discuss this section]]&amp;lt;/small&amp;gt;&lt;br /&gt;
&lt;br /&gt;
Pulse Code Modulation is the most common representation for &lt;br /&gt;
raw audio.  Other practical representations do exist: for example, the&lt;br /&gt;
[[WikiPedia:Delta-sigma_modulation|Sigma-Delta coding]] used by the [[WikiPedia:Super_Audio_CD|SACD]], which is a form of [[wikipedia:Pulse-density_modulation|Pulse Density&lt;br /&gt;
Modulation]].  That said, Pulse Code Modulation is far&lt;br /&gt;
and away dominant, mainly because it&#039;s so mathematically&lt;br /&gt;
convenient.  An audio engineer can spend an entire career without&lt;br /&gt;
running into anything else.&lt;br /&gt;
&lt;br /&gt;
PCM encoding can be characterized in three parameters, making it easy&lt;br /&gt;
to account for every possible PCM variant with mercifully little&lt;br /&gt;
hassle.&lt;br /&gt;
&amp;lt;br style=&amp;quot;clear:both;&amp;quot;/&amp;gt;&lt;br /&gt;
===sample rate===&lt;br /&gt;
[[Image:Dmpfg_009.jpg|360px|right]]&lt;br /&gt;
[[Image:Dmpfg_008.jpg|360px|right]]&lt;br /&gt;
The first parameter is the [[wikipedia:Sampling_rate|sampling rate]].  The highest frequency an&lt;br /&gt;
encoding can represent is called the Nyquist Frequency.  The Nyquist&lt;br /&gt;
frequency of PCM happens to be exactly half the sampling rate.&lt;br /&gt;
Therefore, the sampling rate directly determines the highest possible&lt;br /&gt;
frequency in the digitized signal.&lt;br /&gt;
&lt;br /&gt;
Analog telephone systems traditionally [[wikipedia:Bandlimiting|band-limited]] voice channels to&lt;br /&gt;
just under 4kHz, so digital telephony and most classic voice&lt;br /&gt;
applications use an 8kHz sampling rate: the minimum sampling rate&lt;br /&gt;
necessary to capture the entire bandwidth of a 4kHz channel.  This is&lt;br /&gt;
what an 8kHz sampling rate sounds like&amp;amp;mdash;a bit muffled but perfectly&lt;br /&gt;
intelligible for voice.  This is the lowest sampling rate that&#039;s ever&lt;br /&gt;
been used widely in practice.&lt;br /&gt;
&lt;br /&gt;
From there, as power, and memory, and storage increased, consumer&lt;br /&gt;
computer hardware went to offering 11, and then 16, and then 22, and&lt;br /&gt;
then 32kHz sampling.  With each increase in the sampling rate and the&lt;br /&gt;
Nyquist frequency, it&#039;s obvious that the high end becomes a little&lt;br /&gt;
clearer and the sound more natural.&lt;br /&gt;
&lt;br /&gt;
The Compact Disc uses a 44.1kHz sampling rate, which is again slightly&lt;br /&gt;
better than 32kHz, but the gains are becoming less distinct.  44.1kHz&lt;br /&gt;
is a bit of an oddball choice, especially given that it hadn&#039;t been&lt;br /&gt;
used for anything prior to the compact disc, but the huge success of&lt;br /&gt;
the CD has made it a common rate.&lt;br /&gt;
&lt;br /&gt;
The most common hi-fidelity sampling rate aside from the CD is 48kHz.&lt;br /&gt;
There&#039;s virtually no audible difference between the two.  This video,&lt;br /&gt;
or at least the original version of it, was shot and produced with&lt;br /&gt;
48kHz audio, which happens to be the original standard for&lt;br /&gt;
high-fidelity audio with video.&lt;br /&gt;
&lt;br /&gt;
Super-hi-fidelity sampling rates of 88, and 96, and 192kHz have also&lt;br /&gt;
appeared. The reason for the sampling rates beyond 48kHz isn&#039;t to&lt;br /&gt;
extend the audible high frequencies further. It&#039;s for a different&lt;br /&gt;
reason.&lt;br /&gt;
&lt;br /&gt;
Stepping back for just a second, the French mathematician [[wikipedia:Joseph_Fourier|Jean&lt;br /&gt;
Baptiste Joseph Fourier]] showed that we can also think of signals like&lt;br /&gt;
audio as a set of component frequencies.  This [[wikipedia:Frequency_domain|frequency-domain]]&lt;br /&gt;
representation is equivalent to the time representation; the signal is&lt;br /&gt;
exactly the same, we&#039;re just looking at it [[wikipedia:Basis_(linear_algebra)|a different way]].  Here we see the&lt;br /&gt;
frequency-domain representation of a hypothetical analog signal we&lt;br /&gt;
intend to digitally sample.&lt;br /&gt;
&lt;br /&gt;
The sampling theorem tells us two essential things about the sampling&lt;br /&gt;
process. First, that a digital signal can&#039;t represent any&lt;br /&gt;
frequencies above the Nyquist frequency. Second, and this is the new&lt;br /&gt;
part, if we don&#039;t remove those frequencies with a low-pass filter&lt;br /&gt;
before sampling, the sampling process will fold them down into the&lt;br /&gt;
representable frequency range as [[wikipedia:Aliasing|aliasing distortion]].&lt;br /&gt;
&lt;br /&gt;
Aliasing, in a nutshell, sounds freakin&#039; awful, so it&#039;s essential to&lt;br /&gt;
remove any beyond-Nyquist frequencies before sampling and after&lt;br /&gt;
reconstruction.&lt;br /&gt;
&lt;br /&gt;
Human frequency perception is considered to extend to about 20kHz. In&lt;br /&gt;
44.1 or 48kHz sampling, the low pass before the sampling stage has to&lt;br /&gt;
be extremely sharp to avoid cutting any audible frequencies below&lt;br /&gt;
[[wikipedia:Hearing_range|20kHz]] but still not allow frequencies above the Nyquist to leak&lt;br /&gt;
forward into the sampling process.  This is a difficult filter to&lt;br /&gt;
build, and no practical filter succeeds completely. If the sampling&lt;br /&gt;
rate is 96kHz or 192kHz on the other hand, the low pass has an extra&lt;br /&gt;
[[wikipedia:Octave_(electronics)|octave]] or two for its transition band. This is a much easier filter to&lt;br /&gt;
build.  Sampling rates beyond 48kHz are actually one of those messy&lt;br /&gt;
analog stage compromises.&lt;br /&gt;
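&lt;br /&gt;
To make the folding concrete, here&#039;s a minimal C sketch (an illustration added for this wiki edition, not part of the demo source) that samples a 30kHz tone at 48kHz.  Since 30kHz lies 6kHz above the 24kHz Nyquist frequency, its samples are indistinguishable from those of an 18kHz tone with inverted phase:&lt;br /&gt;
&lt;br /&gt;
 #include &amp;lt;math.h&amp;gt;&lt;br /&gt;
 #include &amp;lt;stdio.h&amp;gt;&lt;br /&gt;
 &lt;br /&gt;
 int main(void){&lt;br /&gt;
   const double fs = 48000.0;   /* sampling rate */&lt;br /&gt;
   const double f  = 30000.0;   /* 6kHz beyond the 24kHz Nyquist frequency */&lt;br /&gt;
   const double fa = fs - f;    /* folds down to 18kHz */&lt;br /&gt;
   for(int n = 0; n &amp;lt; 8; n++){&lt;br /&gt;
     double t = n / fs;&lt;br /&gt;
     printf(&amp;quot;n=%d  30kHz: % .5f  18kHz: % .5f\n&amp;quot;, n,&lt;br /&gt;
            sin(2*M_PI*f*t), sin(2*M_PI*fa*t));&lt;br /&gt;
   }&lt;br /&gt;
   return 0;  /* the columns match except for sign: the alias is phase-inverted */&lt;br /&gt;
 }&lt;br /&gt;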
&amp;lt;br style=&amp;quot;clear:both;&amp;quot;/&amp;gt;&lt;br /&gt;
===sample format===&lt;br /&gt;
[[Image:Dmpfg_anim.gif|right]]&lt;br /&gt;
&lt;br /&gt;
The second fundamental PCM parameter is the sample format; that is,&lt;br /&gt;
the format of each digital number.  A number is a number, but a number&lt;br /&gt;
can be represented in bits a number of different ways.&lt;br /&gt;
&lt;br /&gt;
Early PCM was [[wikipedia:Quantization_(sound_processing)#Audio_quantization|eight-bit]] [[wikipedia:Linear_pulse_code_modulation|linear]], encoded as an [[wikipedia:Signedness|unsigned]] [[wikipedia:Integer_(computer_science)#Bytes_and_octets|byte]].  The&lt;br /&gt;
[[wikipedia:Dynamic_range#Audio|dynamic range]] is limited to about [[wikipedia:Decibel|50dB]]  and the [[wikipedia:Quantization_error|quantization noise]], as&lt;br /&gt;
you can hear, is pretty severe.  Eight-bit audio is vanishingly rare&lt;br /&gt;
today.&lt;br /&gt;
&lt;br /&gt;
Digital telephony typically uses one of two related non-linear eight&lt;br /&gt;
bit encodings called [[wikipedia:A-law_algorithm|A-law]] and [[wikipedia:Μ-law_algorithm|μ-law]]. These formats encode a roughly&lt;br /&gt;
[[wikipedia:Audio_bit_depth#Dynamic_range|14 bit dynamic range]] into eight bits by spacing the higher amplitude&lt;br /&gt;
values farther apart. A-law and mu-law obviously improve quantization&lt;br /&gt;
noise compared to linear 8-bit, and voice harmonics especially hide&lt;br /&gt;
the remaining quantization noise well. All three eight-bit encodings,&lt;br /&gt;
linear, A-law, and mu-law, are typically paired with an 8kHz sampling&lt;br /&gt;
rate, though I&#039;m demonstrating them here at 48kHz.&lt;br /&gt;
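&lt;br /&gt;
As a rough sketch of the companding idea (this uses the smooth mu-law curve with mu = 255, not the exact G.711 segmented bit layout):&lt;br /&gt;
&lt;br /&gt;
 #include &amp;lt;math.h&amp;gt;&lt;br /&gt;
 &lt;br /&gt;
 /* Compress one linear sample in [-1,1] to an eight-bit code; the&lt;br /&gt;
    logarithm spaces the higher amplitude values farther apart. */&lt;br /&gt;
 unsigned char mulaw_encode(double x){&lt;br /&gt;
   const double mu = 255.0;&lt;br /&gt;
   double y = (x &amp;lt; 0 ? -1.0 : 1.0) * log(1.0 + mu*fabs(x)) / log(1.0 + mu);&lt;br /&gt;
   return (unsigned char)lrint((y + 1.0) * 127.5);  /* map [-1,1] to [0,255] */&lt;br /&gt;
 }&lt;br /&gt;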
&lt;br /&gt;
Most modern PCM uses 16- or 24-bit [[wikipedia:Two&#039;s_complement|two&#039;s-complement]] signed integers to&lt;br /&gt;
encode the range from negative infinity to zero decibels in 16 or 24&lt;br /&gt;
bits of precision. The maximum absolute value corresponds to zero decibels.&lt;br /&gt;
As with all the sample formats so far, signals beyond zero decibels, and thus&lt;br /&gt;
beyond the maximum representable range, are [[wikipedia:Clipping_(audio)|clipped]].&lt;br /&gt;
&lt;br /&gt;
In mixing and mastering, it&#039;s not unusual to use [[wikipedia:Floating_point|floating-point]]&lt;br /&gt;
numbers for PCM instead of [[wikipedia:Integer_(computer_science)|integers]].  A 32 bit [[wikipedia:IEEE_754-2008|IEEE754]] float, that&#039;s&lt;br /&gt;
the normal kind of floating point you see on current computers, has 24&lt;br /&gt;
bits of resolution, but an eight-bit floating-point exponent increases&lt;br /&gt;
the representable range.  Floating point usually represents zero&lt;br /&gt;
decibels as +/-1.0, and because floats can obviously represent&lt;br /&gt;
considerably beyond that, temporarily exceeding zero decibels during&lt;br /&gt;
the mixing process doesn&#039;t cause clipping.  Floating-point PCM takes&lt;br /&gt;
up more space, so it tends to be used only as an intermediate&lt;br /&gt;
production format.&lt;br /&gt;
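&lt;br /&gt;
A minimal sketch of the trip back from floating point to 16-bit integers, where anything still beyond zero decibels finally does clip:&lt;br /&gt;
&lt;br /&gt;
 #include &amp;lt;stdint.h&amp;gt;&lt;br /&gt;
 #include &amp;lt;math.h&amp;gt;&lt;br /&gt;
 &lt;br /&gt;
 /* Convert one float sample to signed 16-bit PCM, clipping at 0dB (+/-1.0). */&lt;br /&gt;
 int16_t float_to_s16(float x){&lt;br /&gt;
   if(x &amp;gt;  1.0f) x =  1.0f;&lt;br /&gt;
   if(x &amp;lt; -1.0f) x = -1.0f;&lt;br /&gt;
   return (int16_t)lrintf(x * 32767.0f);&lt;br /&gt;
 }&lt;br /&gt;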
&lt;br /&gt;
Lastly, most general purpose computers still read and&lt;br /&gt;
write data in octet bytes, so it&#039;s important to remember that samples&lt;br /&gt;
bigger than eight bits can be in [[wikipedia:Endianness|big- or little-endian order]], and both&lt;br /&gt;
endiannesses are common.  For example, Microsoft [[wikipedia:WAV|WAV]] files are little-endian,&lt;br /&gt;
and Apple [[wikipedia:AIFC|AIFC]] files tend to be big-endian.  Be aware of it.&lt;br /&gt;
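&lt;br /&gt;
A sketch of what that means in code, assembling a signed 16-bit sample from two raw bytes in either byte order:&lt;br /&gt;
&lt;br /&gt;
 #include &amp;lt;stdint.h&amp;gt;&lt;br /&gt;
 &lt;br /&gt;
 /* The same two bytes, interpreted little-endian and big-endian. */&lt;br /&gt;
 int16_t sample_le(const uint8_t *p){ return (int16_t)(p[0] | (p[1] &amp;lt;&amp;lt; 8)); }&lt;br /&gt;
 int16_t sample_be(const uint8_t *p){ return (int16_t)((p[0] &amp;lt;&amp;lt; 8) | p[1]); }&lt;br /&gt;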
&amp;lt;br style=&amp;quot;clear:both;&amp;quot;/&amp;gt;&lt;br /&gt;
===channels===&lt;br /&gt;
&lt;br /&gt;
The third PCM parameter is the number of [[wikipedia:Multichannel_audio|channels]].  The convention in&lt;br /&gt;
raw PCM is to encode multiple channels by interleaving the samples of&lt;br /&gt;
each channel together into a single stream.  Straightforward and extensible.&lt;br /&gt;
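&lt;br /&gt;
In code, the interleaving convention means sample n of channel c always sits at a fixed stride:&lt;br /&gt;
&lt;br /&gt;
 #include &amp;lt;stdint.h&amp;gt;&lt;br /&gt;
 &lt;br /&gt;
 /* Fetch sample n of channel c from an interleaved 16-bit PCM stream. */&lt;br /&gt;
 int16_t get_sample(const int16_t *pcm, int channels, int n, int c){&lt;br /&gt;
   return pcm[n * channels + c];&lt;br /&gt;
 }&lt;br /&gt;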
&amp;lt;br style=&amp;quot;clear:both;&amp;quot;/&amp;gt;&lt;br /&gt;
===done!===&lt;br /&gt;
&lt;br /&gt;
And that&#039;s it!  That describes every PCM representation ever.  Done.&lt;br /&gt;
Digital audio is &#039;&#039;so easy&#039;&#039;!  There&#039;s more to do of course, but at this&lt;br /&gt;
point we&#039;ve got a nice useful chunk of audio data, so let&#039;s get some&lt;br /&gt;
video too.&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
&amp;lt;center&amp;gt;&amp;lt;div style=&amp;quot;background-color:#DDDDFF;border-color:#CCCCDD;border-style:solid;width:80%;padding:0 1em 1em 1em;text-align:left;&amp;quot;&amp;gt;&lt;br /&gt;
&#039;&#039;&#039;Going deeper…&#039;&#039;&#039;&lt;br /&gt;
* [[wikipedia:Roll-off|Wikipedia&#039;s article on filter roll-off]], to learn why it&#039;s hard to build analog filters with a very narrow [[wikipedia:Transition_band|transition band]] between the [[wikipedia:Passband|passband]] and the [[wikipedia:Stopband|stopband]].&lt;br /&gt;
* [http://wiki.multimedia.cx/index.php?title=PCM Some more minutiae] about PCM in practice.&lt;br /&gt;
* [[wikipedia:DPCM|DPCM]] and [[wikipedia:ADPCM|ADPCM]], simple audio codecs derived from PCM.&lt;br /&gt;
&amp;lt;/div&amp;gt;&amp;lt;/center&amp;gt;&lt;br /&gt;
&lt;br /&gt;
&amp;lt;br style=&amp;quot;clear:both;&amp;quot;/&amp;gt;&lt;br /&gt;
&lt;br /&gt;
==Video vegetables (they&#039;re good for you!)==&lt;br /&gt;
[[Image:Dmpfg_010.jpg|360px|right]]&lt;br /&gt;
&amp;lt;small&amp;gt;[[Talk:A_Digital_Media_Primer_For_Geeks_(episode_1)#Video_vegetables_.28they.27re_good_for_you.21.29|Discuss this section]]&amp;lt;/small&amp;gt;&lt;br /&gt;
&lt;br /&gt;
One could think of video as being like audio but with two additional&lt;br /&gt;
spatial dimensions, X and Y, in addition to the dimension of time.&lt;br /&gt;
This is mathematically sound. The Sampling Theorem applies to all&lt;br /&gt;
three video dimensions just as it does the single time dimension of&lt;br /&gt;
audio.&lt;br /&gt;
&lt;br /&gt;
Audio and video are obviously quite different in practice. For one,&lt;br /&gt;
compared to audio, video is huge. [[wikipedia:Red_Book_(audio_Compact_Disc_standard)#Technical_details|Raw CD audio]] is about 1.4 megabits&lt;br /&gt;
per second. Raw [[wikipedia:1080i|1080i]] HD video is over 700 megabits per second. That&#039;s&lt;br /&gt;
more than 500 times more data to capture, process, and store per&lt;br /&gt;
second.  By [[wikipedia:Moore&#039;s_law|Moore&#039;s law]]... that&#039;s... let&#039;s see... roughly eight&lt;br /&gt;
doublings times two years, so yeah, computers requiring about an extra&lt;br /&gt;
fifteen years to handle raw video after getting raw audio down pat was&lt;br /&gt;
about right.&lt;br /&gt;
&lt;br /&gt;
Basic raw video is also just more complex than basic raw audio. The&lt;br /&gt;
sheer volume of data currently necessitates a representation more&lt;br /&gt;
efficient than the linear PCM used for audio.  In addition, electronic&lt;br /&gt;
video comes almost entirely from broadcast television alone, and the&lt;br /&gt;
standards committees that govern broadcast video have always been very&lt;br /&gt;
concerned with backward compatibility.  Up until just last year in the&lt;br /&gt;
US, a sixty-year-old black and white television could still show a&lt;br /&gt;
normal [[wikipedia:NTSC|analog television broadcast]].  That&#039;s actually a really neat&lt;br /&gt;
trick.&lt;br /&gt;
&lt;br /&gt;
The downside to backward compatibility is that once a detail makes it&lt;br /&gt;
into a standard, you can&#039;t ever really throw it out again. Electronic&lt;br /&gt;
video has never started over from scratch the way audio has multiple&lt;br /&gt;
times.  Sixty years worth of clever but obsolete hacks necessitated by&lt;br /&gt;
the passing technology of a given era have built up into quite a pile,&lt;br /&gt;
and because digital standards also come from broadcast television, all&lt;br /&gt;
these eldritch hacks have been brought forward into the digital&lt;br /&gt;
standards as well.&lt;br /&gt;
&lt;br /&gt;
In short, there are a whole lot more details involved in digital video&lt;br /&gt;
than there were with audio. There&#039;s no hope of covering them&lt;br /&gt;
all completely here, so we&#039;ll cover the broad fundamentals.&lt;br /&gt;
&amp;lt;br style=&amp;quot;clear:both;&amp;quot;/&amp;gt;&lt;br /&gt;
===resolution and aspect===&lt;br /&gt;
[[Image:Dmpfg_011.jpg|360px|right]]&lt;br /&gt;
&lt;br /&gt;
The most obvious raw video parameters are the width and height of the&lt;br /&gt;
picture in pixels. As simple as that may sound, the pixel dimensions&lt;br /&gt;
alone don&#039;t actually specify the absolute width and height of the&lt;br /&gt;
picture, as most broadcast-derived video doesn&#039;t use square pixels.&lt;br /&gt;
The number of [[wikipedia:Scan_line|scanlines]] in a broadcast image was fixed, but the&lt;br /&gt;
effective number of horizontal pixels was a function of channel&lt;br /&gt;
[[wikipedia:Bandwidth_(signal_processing)|bandwidth]]. The effective horizontal resolution could thus yield pixels&lt;br /&gt;
either narrower or wider than the spacing between scanlines.&lt;br /&gt;
&lt;br /&gt;
Standards have generally specified that digitally sampled video should&lt;br /&gt;
reflect the real resolution of the original analog source, so a large&lt;br /&gt;
amount of digital video also uses non-square pixels. For example, a&lt;br /&gt;
normal 4:3 aspect NTSC DVD is typically encoded with a display&lt;br /&gt;
resolution of [[wikipedia:DVD-Video#Frame_size_and_frame_rate|704 by 480]], a ratio wider than 4:3.  In this case, the&lt;br /&gt;
pixels themselves are assigned an aspect ratio of [[wikipedia:Standard-definition_television#Resolution|10:11]], making them&lt;br /&gt;
taller than they are wide and narrowing the image horizontally to the&lt;br /&gt;
correct aspect.  Such an image has to be resampled to show properly on&lt;br /&gt;
a digital display with square pixels.&lt;br /&gt;
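&lt;br /&gt;
The arithmetic behind that resampling is simple; a sketch using the DVD numbers above:&lt;br /&gt;
&lt;br /&gt;
 /* Display width of an anamorphic frame: coded width scaled by the pixel&lt;br /&gt;
    aspect ratio.  For NTSC DVD: 704 * 10 / 11 = 640, and 640x480 is 4:3. */&lt;br /&gt;
 int display_width(int coded_width, int par_num, int par_den){&lt;br /&gt;
   return coded_width * par_num / par_den;&lt;br /&gt;
 }&lt;br /&gt;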
&amp;lt;br style=&amp;quot;clear:both;&amp;quot;/&amp;gt;&lt;br /&gt;
===frame rate and interlacing===&lt;br /&gt;
[[Image:Dmpfg_012.jpg|360px|right]]&lt;br /&gt;
&lt;br /&gt;
The second obvious video parameter is the [[wikipedia:Frame_rate|frame rate]], the number of&lt;br /&gt;
full frames per second.  Several standard frame rates are in active&lt;br /&gt;
use. Digital video, in one form or another, can use all of them.  Or,&lt;br /&gt;
any other frame rate.  Or even variable rates where the frame rate&lt;br /&gt;
changes adaptively over the course of the video. The higher the frame&lt;br /&gt;
rate, the smoother the motion and that brings us, unfortunately, to&lt;br /&gt;
[[wikipedia:Interlace|interlacing]].&lt;br /&gt;
&lt;br /&gt;
In the very earliest days of broadcast video, engineers sought the&lt;br /&gt;
fastest practical frame rate to smooth motion and to minimize [[wikipedia:Flicker_(screen)|flicker]]&lt;br /&gt;
on phosphor-based [[wikipedia:Cathode_ray_tube|CRTs]].  They were also under pressure to use the&lt;br /&gt;
least possible bandwidth for the highest resolution and fastest frame&lt;br /&gt;
rate.  Their solution was to interlace the video where the even lines&lt;br /&gt;
are sent in one pass and the odd lines in the next.  Each pass is&lt;br /&gt;
called a field and two fields sort of produce one complete frame.&lt;br /&gt;
&amp;quot;Sort of&amp;quot;, because the even and odd fields aren&#039;t actually from the&lt;br /&gt;
same source frame.  In a 60 field per second picture, the source frame&lt;br /&gt;
rate is actually 60 full frames per second, and half of each frame,&lt;br /&gt;
every other line, is simply discarded.  This is why we can&#039;t&lt;br /&gt;
[[wikipedia:Deinterlacing|deinterlace]] a video simply by combining two fields into one frame;&lt;br /&gt;
they&#039;re not actually from one frame to begin with.&lt;br /&gt;
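&lt;br /&gt;
For illustration, a sketch of pulling one field out of an interlaced frame buffer (luma plane only, one byte per pixel):&lt;br /&gt;
&lt;br /&gt;
 #include &amp;lt;stdint.h&amp;gt;&lt;br /&gt;
 #include &amp;lt;string.h&amp;gt;&lt;br /&gt;
 &lt;br /&gt;
 /* Copy the even (odd = 0) or odd (odd = 1) lines of a frame into a field. */&lt;br /&gt;
 void extract_field(const uint8_t *frame, uint8_t *field,&lt;br /&gt;
                    int width, int height, int odd){&lt;br /&gt;
   for(int y = odd; y &amp;lt; height; y += 2)&lt;br /&gt;
     memcpy(field + (size_t)(y/2)*width, frame + (size_t)y*width, width);&lt;br /&gt;
 }&lt;br /&gt;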
&amp;lt;br style=&amp;quot;clear:both;&amp;quot;/&amp;gt;&lt;br /&gt;
&lt;br /&gt;
===gamma===&lt;br /&gt;
[[Image:Dmpfg_013.jpg|360px|right]]&lt;br /&gt;
&lt;br /&gt;
The cathode ray tube was the only available display technology for&lt;br /&gt;
most of the history of electronic video. A CRT&#039;s output brightness is&lt;br /&gt;
nonlinear, approximately equal to the input controlling voltage raised&lt;br /&gt;
to the 2.5th power. This exponent, 2.5, is designated gamma, and so&lt;br /&gt;
it&#039;s often referred to as the gamma of a display.  Cameras, though,&lt;br /&gt;
are linear, and if you feed a CRT a linear input signal, it looks a&lt;br /&gt;
bit like this.&lt;br /&gt;
&lt;br /&gt;
As there were originally to be very few cameras, which were&lt;br /&gt;
fantastically expensive anyway, and hopefully many, many television&lt;br /&gt;
sets, which had best be as inexpensive as possible, engineers decided to&lt;br /&gt;
add the necessary [[wiki:Gamma_correction|gamma correction]] circuitry to the cameras rather&lt;br /&gt;
than the sets. Video transmitted over the airwaves would thus have a&lt;br /&gt;
nonlinear intensity using the inverse of the set&#039;s gamma exponent, so that&lt;br /&gt;
once a camera&#039;s signal was finally displayed on the CRT, the overall&lt;br /&gt;
response of the system from camera to set was back to linear again.&lt;br /&gt;
&lt;br /&gt;
Almost.&lt;br /&gt;
&lt;br /&gt;
There were also two other tweaks. A television camera actually uses a&lt;br /&gt;
gamma exponent that&#039;s the inverse of 2.2, not 2.5.  That&#039;s just a&lt;br /&gt;
correction for viewing in a dim environment. Also, the exponential&lt;br /&gt;
curve transitions to a linear ramp near black.  That&#039;s just an old&lt;br /&gt;
hack for suppressing sensor noise in the camera.&lt;br /&gt;
&lt;br /&gt;
Gamma correction also had a lucky benefit. It just so happens that the&lt;br /&gt;
human eye has a perceptual gamma of about 3.  This is relatively close&lt;br /&gt;
to the CRT&#039;s gamma of 2.5. An image using gamma correction devotes&lt;br /&gt;
more resolution to lower intensities, where the eye happens to have&lt;br /&gt;
its finest intensity discrimination, and therefore uses the available&lt;br /&gt;
scale resolution more efficiently.  Although CRTs are currently&lt;br /&gt;
vanishing, a standard [[wikipedia:sRGB|sRGB]] computer display still uses a nonlinear&lt;br /&gt;
intensity curve similar to television, with a linear ramp near black,&lt;br /&gt;
followed by an exponential curve with a gamma exponent of 2.4. This&lt;br /&gt;
encodes a sixteen bit linear range down into eight bits.&lt;br /&gt;
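&lt;br /&gt;
That curve, as a minimal sketch (the constants are from the sRGB specification):&lt;br /&gt;
&lt;br /&gt;
 #include &amp;lt;math.h&amp;gt;&lt;br /&gt;
 &lt;br /&gt;
 /* Encode one linear-light value in [0,1] to the nonlinear sRGB scale:&lt;br /&gt;
    a linear ramp near black, then an exponential curve with gamma 2.4. */&lt;br /&gt;
 double srgb_encode(double l){&lt;br /&gt;
   return (l &amp;lt;= 0.0031308) ? 12.92 * l : 1.055 * pow(l, 1.0/2.4) - 0.055;&lt;br /&gt;
 }&lt;br /&gt;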
&amp;lt;br style=&amp;quot;clear:both;&amp;quot;/&amp;gt;&lt;br /&gt;
===color and colorspace===&lt;br /&gt;
[[Image:Dmpfg_014.jpg|360px|right]]&lt;br /&gt;
&lt;br /&gt;
The human eye has three apparent color channels, red, green, and blue,&lt;br /&gt;
and most displays use these three colors as [[wikipedia:Additive_color|additive primaries]] to&lt;br /&gt;
produce a full range of color output.  The primary pigments in&lt;br /&gt;
printing are [[wikipedia:CMYK|Cyan, Magenta, and Yellow]] for the same reason; pigments&lt;br /&gt;
are [[wikipedia:Subtractive_color|subtractive]], and each of these pigments subtracts one pure color&lt;br /&gt;
from reflected light.  Cyan subtracts red, magenta subtracts green, and&lt;br /&gt;
yellow subtracts blue.&lt;br /&gt;
&lt;br /&gt;
Video can be, and sometimes is, represented with red, green, and blue&lt;br /&gt;
color channels, but RGB video is atypical. The human eye is far more&lt;br /&gt;
sensitive to [[wikipedia:Luminance_(relative)|luminosity]] than it is to color, and RGB tends to spread&lt;br /&gt;
the energy of an image across all three color channels.  That is, the&lt;br /&gt;
red plane looks like a red version of the original picture, the green&lt;br /&gt;
plane looks like a green version of the original picture, and the blue&lt;br /&gt;
plane looks like a blue version of the original picture.  Black and&lt;br /&gt;
white times three.  Not efficient.&lt;br /&gt;
&lt;br /&gt;
For those reasons and because, oh hey, television just happened to&lt;br /&gt;
start out as black and white anyway, video usually is represented as a&lt;br /&gt;
high resolution [[wikipedia:Luma_(video)|luma channel]]&amp;amp;mdash;the black &amp;amp; white&amp;amp;mdash;along with&lt;br /&gt;
additional, often lower resolution [[wikipedia:Chrominance|chroma channels]], the color. The&lt;br /&gt;
luma channel, Y, is produced by weighting and then adding the separate&lt;br /&gt;
red, green and blue signals.  The chroma channels U and V are then&lt;br /&gt;
produced by subtracting the luma signal from blue and the luma signal&lt;br /&gt;
from red.&lt;br /&gt;
&lt;br /&gt;
When YUV is scaled, offset, and quantized for digital video, it&#039;s&lt;br /&gt;
usually more correctly called [[wikipedia:Y&#039;CbCr|Y&#039;CbCr]], but the more generic term YUV is&lt;br /&gt;
widely used to describe all the analog and digital variants of this&lt;br /&gt;
color model.&lt;br /&gt;
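&lt;br /&gt;
A sketch of the weighting just described, using the BT.601 luma weights on full-range eight-bit values (the exact constants and ranges vary between standards, so treat these as one representative choice):&lt;br /&gt;
&lt;br /&gt;
 /* One RGB pixel (components in 0..255) to full-range 8-bit Y&#039;CbCr. */&lt;br /&gt;
 void rgb_to_ycbcr(double r, double g, double b,&lt;br /&gt;
                   unsigned char *y, unsigned char *cb, unsigned char *cr){&lt;br /&gt;
   double Y = 0.299*r + 0.587*g + 0.114*b;        /* weighted sum: luma */&lt;br /&gt;
   *y  = (unsigned char)(Y + 0.5);&lt;br /&gt;
   *cb = (unsigned char)(128.5 + 0.564*(b - Y));  /* scaled blue minus luma */&lt;br /&gt;
   *cr = (unsigned char)(128.5 + 0.713*(r - Y));  /* scaled red minus luma */&lt;br /&gt;
 }&lt;br /&gt;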
&amp;lt;br style=&amp;quot;clear:both;&amp;quot;/&amp;gt;&lt;br /&gt;
&lt;br /&gt;
===chroma subsampling===&lt;br /&gt;
&lt;br /&gt;
The U and V chroma channels can have the same resolution as the Y&lt;br /&gt;
channel, but because the human eye has far less spatial color&lt;br /&gt;
resolution than spatial luminosity resolution, chroma resolution is&lt;br /&gt;
usually [[wikipedia:Chroma_subsampling|halved or even quartered]] in the horizontal direction, the&lt;br /&gt;
vertical direction, or both, usually without any significant impact on the&lt;br /&gt;
apparent raw image quality.  Practically every possible subsampling&lt;br /&gt;
variant has been used at one time or another, but the common choices&lt;br /&gt;
today are [[wikipedia:Chroma_subsampling#4:4:4_Y.27CbCr|4:4:4]] video, which isn&#039;t actually subsampled at all, [[wikipedia:Chroma_subsampling#4:2:2|4:2:2]] video in&lt;br /&gt;
which the horizontal resolution of the U and V channels is halved, and&lt;br /&gt;
most common of all, [[wikipedia:Chroma_subsampling#4:2:0|4:2:0]] video in which both the horizontal and vertical&lt;br /&gt;
resolutions of the chroma channels are halved, resulting in U and V&lt;br /&gt;
planes that are each one quarter the size of Y.&lt;br /&gt;
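&lt;br /&gt;
The resulting bookkeeping is simple.  A sketch of the bytes needed for one eight-bit 4:2:0 frame, rounding odd dimensions up (one common convention; implementations differ):&lt;br /&gt;
&lt;br /&gt;
 #include &amp;lt;stddef.h&amp;gt;&lt;br /&gt;
 &lt;br /&gt;
 /* One full-size Y plane plus two chroma planes at half width and height. */&lt;br /&gt;
 size_t yuv420_frame_bytes(int w, int h){&lt;br /&gt;
   size_t luma   = (size_t)w * h;&lt;br /&gt;
   size_t chroma = (size_t)((w + 1) / 2) * ((h + 1) / 2);&lt;br /&gt;
   return luma + 2 * chroma;&lt;br /&gt;
 }&lt;br /&gt;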
&lt;br /&gt;
The terms 4:2:2, 4:2:0, [[wikipedia:Chroma_subsampling#4:1:1|4:1:1]], and so on and so forth, aren&#039;t complete&lt;br /&gt;
descriptions of a chroma subsampling. There are multiple possible ways&lt;br /&gt;
to position the chroma pixels relative to luma, and again, several&lt;br /&gt;
variants are in active use for each subsampling.  For example, [[wikipedia:Motion_Jpeg|motion&lt;br /&gt;
JPEG]], [[wikipedia:MPEG-1#Part_2:_Video|MPEG-1 video]], [[wikipedia:MPEG-2#Video_coding_.28simplified.29|MPEG-2 video]], [[wikipedia:DV#DV_Compression|DV]], [[wikipedia:Theora|Theora]], and [[wikipedia:WebM|WebM]] all use or can&lt;br /&gt;
use 4:2:0 subsampling, but they site the chroma pixels [http://www.mir.com/DMG/chroma.html three different ways].&lt;br /&gt;
&lt;br /&gt;
Motion JPEG, MPEG-1 video, Theora and WebM all site chroma pixels&lt;br /&gt;
between luma pixels both horizontally and vertically.&lt;br /&gt;
&lt;br /&gt;
MPEG-2 sites chroma pixels between lines, but horizontally aligned with&lt;br /&gt;
every other luma pixel. Interlaced modes complicate things somewhat,&lt;br /&gt;
resulting in a siting arrangement that&#039;s a tad bizarre.&lt;br /&gt;
&lt;br /&gt;
And finally PAL-DV, which is always interlaced, places the chroma&lt;br /&gt;
pixels in the same position as every other luma pixel in the&lt;br /&gt;
horizontal direction, and vertically alternates chroma channel on&lt;br /&gt;
each line.&lt;br /&gt;
&lt;br /&gt;
That&#039;s just 4:2:0 video. I&#039;ll leave the other subsamplings as homework for the&lt;br /&gt;
viewer.  Got the basic idea, moving on.&lt;br /&gt;
&amp;lt;br style=&amp;quot;clear:both;&amp;quot;/&amp;gt;&lt;br /&gt;
&lt;br /&gt;
===pixel formats===&lt;br /&gt;
&lt;br /&gt;
In audio, we always represent multiple channels in a PCM stream by&lt;br /&gt;
interleaving the samples from each channel in order. Video uses both&lt;br /&gt;
packed formats that interleave the color channels, as well as planar&lt;br /&gt;
formats that keep the pixels from each channel together in separate&lt;br /&gt;
planes stacked in order in the frame. There are at least [http://www.fourcc.org/yuv.php 50 different formats] in&lt;br /&gt;
these two broad categories with possibly ten or fifteen in common use.&lt;br /&gt;
&lt;br /&gt;
Each chroma subsampling and different bit-depth requires a different&lt;br /&gt;
packing arrangement, and so a different pixel format.  For a given&lt;br /&gt;
unique subsampling, there are usually also several equivalent formats&lt;br /&gt;
that consist of trivial channel order rearrangements or repackings, due either to&lt;br /&gt;
convenience once-upon-a-time on some particular piece of hardware, or&lt;br /&gt;
sometimes just good old-fashioned spite.&lt;br /&gt;
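&lt;br /&gt;
For instance, a sketch of locating the planes in a planar 4:2:0 buffer with Y, U, and V stacked in that order; the stacking order and rounding are exactly the sort of details that differ between otherwise-equivalent formats:&lt;br /&gt;
&lt;br /&gt;
 #include &amp;lt;stdint.h&amp;gt;&lt;br /&gt;
 #include &amp;lt;stddef.h&amp;gt;&lt;br /&gt;
 &lt;br /&gt;
 /* Find the three planes of a contiguous planar 4:2:0 frame buffer. */&lt;br /&gt;
 void yuv420_planes(uint8_t *buf, int w, int h,&lt;br /&gt;
                    uint8_t **y, uint8_t **u, uint8_t **v){&lt;br /&gt;
   int cw = (w + 1) / 2, ch = (h + 1) / 2;  /* chroma plane dimensions */&lt;br /&gt;
   *y = buf;&lt;br /&gt;
   *u = buf + (size_t)w * h;&lt;br /&gt;
   *v = *u + (size_t)cw * ch;&lt;br /&gt;
 }&lt;br /&gt;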
&lt;br /&gt;
Pixel formats are described by a unique name or [[wikipedia:FourCC|fourcc]] code.  There&lt;br /&gt;
are quite a few of these and there&#039;s no sense going over each one now.&lt;br /&gt;
Google is your friend.  Be aware that fourcc codes for raw video&lt;br /&gt;
specify the pixel arrangement and chroma subsampling, but generally&lt;br /&gt;
don&#039;t imply anything certain about chroma siting or color space.  [http://www.fourcc.org/yuv.php#UYVY YV12]&lt;br /&gt;
video, to pick one, can use JPEG, MPEG-2, or DV chroma siting, and any&lt;br /&gt;
one of [[wikipedia:YUV#BT.709_and_BT.601|several YUV colorspace definitions]].&lt;br /&gt;
&amp;lt;br style=&amp;quot;clear:both;&amp;quot;/&amp;gt;&lt;br /&gt;
&lt;br /&gt;
===done!===&lt;br /&gt;
&lt;br /&gt;
That wraps up our not-so-quick and yet very incomplete tour of raw&lt;br /&gt;
video. The good news is we can already get quite a lot of real work&lt;br /&gt;
done using that overview. In plenty of situations, a frame of video&lt;br /&gt;
data is a frame of video data.  The details matter, greatly, when it&lt;br /&gt;
comes time to write software, but for now I am satisfied that the&lt;br /&gt;
esteemed viewer is broadly aware of the relevant issues.&lt;br /&gt;
&lt;br /&gt;
&amp;lt;center&amp;gt;&amp;lt;div style=&amp;quot;background-color:#DDDDFF;border-color:#CCCCDD;border-style:solid;width:80%;padding:0 1em 1em 1em;text-align:left;&amp;quot;&amp;gt;&lt;br /&gt;
&#039;&#039;&#039;Going deeper…&#039;&#039;&#039;&lt;br /&gt;
* [http://wiki.multimedia.cx/index.php?title=YUV4MPEG2 The y4m format] is the most common simple container for raw YUV video.  People occasionally use [[OggYUV]] to store it in Ogg instead.&lt;br /&gt;
* Learn about [[wikipedia:High_dynamic_range_imaging|high dynamic range imaging]], which achieves better representation of the full range of brightnesses in the real world by using more than 8 bits per channel.&lt;br /&gt;
* Learn about how [[wikipedia:Trichromatic_vision|trichromatic color vision]] works in humans, and how human color perception is encoded in the [[wikipedia:CIE 1931 color space|CIE 1931 XYZ color space]].&lt;br /&gt;
** Compare with the [[wikipedia:Lab_color_space|Lab color space]], mathematically equivalent but structured to account for &amp;quot;perceptual uniformity&amp;quot;.&lt;br /&gt;
** Some humans might be [[wikipedia:Tetrachromacy#Possibility_of_human_tetrachromats|tetrachromats]], in which case they would need an additional color channel for video to fully represent their vision.&lt;br /&gt;
&amp;lt;/div&amp;gt;&amp;lt;/center&amp;gt;&lt;br /&gt;
&lt;br /&gt;
&amp;lt;br style=&amp;quot;clear:both;&amp;quot;/&amp;gt;&lt;br /&gt;
&lt;br /&gt;
==Containers==&lt;br /&gt;
&amp;lt;small&amp;gt;[[Talk:A_Digital_Media_Primer_For_Geeks_(episode_1)#Containers|Discuss this section]]&amp;lt;/small&amp;gt;&lt;br /&gt;
&lt;br /&gt;
So. We have audio data. We have video data. What remains is the more&lt;br /&gt;
familiar non-signal data and straight-up engineering that software&lt;br /&gt;
developers are used to, and plenty of it.&lt;br /&gt;
&lt;br /&gt;
Chunks of raw audio and video data have no externally-visible&lt;br /&gt;
structure, but they&#039;re often uniformly sized.  We could just string&lt;br /&gt;
them together in a rigid predetermined ordering for streaming and&lt;br /&gt;
storage, and some simple systems do approximately that. Compressed&lt;br /&gt;
frames, though, aren&#039;t necessarily a predictable size, and we usually want&lt;br /&gt;
some flexibility in using a range of different data types in streams.&lt;br /&gt;
If we string random formless data together, we lose the boundaries&lt;br /&gt;
that separate frames and don&#039;t necessarily know what data belongs to&lt;br /&gt;
which streams.  A stream needs some generalized structure to be&lt;br /&gt;
generally useful.&lt;br /&gt;
&lt;br /&gt;
In addition to our signal data, we also have our PCM and video&lt;br /&gt;
parameters.  There&#039;s probably plenty of other [[wikipedia:Metadata#Video|metadata]] we also want to&lt;br /&gt;
deal with, like audio tags and video chapters and subtitles, all&lt;br /&gt;
essential components of rich media.  It makes sense to place this&lt;br /&gt;
metadata&amp;amp;mdash;that is, data about the data&amp;amp;mdash;within the media itself.&lt;br /&gt;
&lt;br /&gt;
Storing and structuring formless data and disparate metadata is the&lt;br /&gt;
job of a [[wikipedia:Container_format_(digital)|container]].  Containers provide framing for the data blobs,&lt;br /&gt;
interleave and identify multiple data streams, provide timing&lt;br /&gt;
information, and store the metadata necessary to parse, navigate,&lt;br /&gt;
manipulate, and present the media.  In general, any container can hold&lt;br /&gt;
any kind of data.  And data can be put into any container.&lt;br /&gt;
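&lt;br /&gt;
As a toy illustration only (no real container is this simple), the essence of framing is a small header that restores the boundaries, stream identity, and timing that the raw bytes lost:&lt;br /&gt;
&lt;br /&gt;
 #include &amp;lt;stdint.h&amp;gt;&lt;br /&gt;
 &lt;br /&gt;
 /* A toy packet header: find frame boundaries, route each blob to its&lt;br /&gt;
    stream, and place it back on the timeline. */&lt;br /&gt;
 struct toy_packet {&lt;br /&gt;
   uint32_t stream_id;  /* which elementary stream this blob belongs to */&lt;br /&gt;
   uint64_t timestamp;  /* presentation time in some declared timebase */&lt;br /&gt;
   uint32_t length;     /* bytes of payload that follow the header */&lt;br /&gt;
 };&lt;br /&gt;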
&amp;lt;br style=&amp;quot;clear:both;&amp;quot;/&amp;gt;&lt;br /&gt;
==Credits==&lt;br /&gt;
&lt;br /&gt;
In the past thirty minutes, we&#039;ve covered digital audio, video, some&lt;br /&gt;
history, some math and a little engineering. We&#039;ve barely scratched the&lt;br /&gt;
surface, but it&#039;s time for a well-earned break.&lt;br /&gt;
&lt;br /&gt;
There&#039;s so much more to talk about, so I hope you&#039;ll join me again in&lt;br /&gt;
our next episode.  Until then&amp;amp;mdash;Cheers!&lt;br /&gt;
&lt;br /&gt;
Written by:&lt;br /&gt;
Christopher (Monty) Montgomery&lt;br /&gt;
and the Xiph.Org Community&lt;br /&gt;
&lt;br /&gt;
Intro, title and credits music:&amp;lt;br&amp;gt;&lt;br /&gt;
&amp;quot;Boo Boo Coming&amp;quot;, by Joel Forrester&amp;lt;br&amp;gt;&lt;br /&gt;
Performed by the Microscopic Septet&amp;lt;br&amp;gt;&lt;br /&gt;
Used by permission of Cuneiform Records.&amp;lt;br&amp;gt;&lt;br /&gt;
Original source track All Rights Reserved.&amp;lt;br&amp;gt;&lt;br /&gt;
www.cuneiformrecords.com&lt;br /&gt;
&lt;br /&gt;
This Video Was Produced Entirely With Free and Open Source Software&lt;br /&gt;
&lt;br /&gt;
GNU&lt;br /&gt;
Linux&lt;br /&gt;
Fedora&lt;br /&gt;
Cinelerra&lt;br /&gt;
The Gimp&lt;br /&gt;
Audacity&lt;br /&gt;
Postfish&lt;br /&gt;
Gstreamer&lt;br /&gt;
&lt;br /&gt;
CC BY-NC-SA&lt;br /&gt;
A Co-Production of Xiph.Org and Red Hat Inc.&lt;br /&gt;
(C) 2010, Some Rights Reserved&lt;br /&gt;
&lt;br /&gt;
&amp;lt;br style=&amp;quot;clear:both;&amp;quot;/&amp;gt;&lt;br /&gt;
&lt;br /&gt;
==The making of…==&lt;br /&gt;
&amp;lt;small&amp;gt;[[Talk:A_Digital_Media_Primer_For_Geeks_(episode_1)#The_making_of.E2.80.A6|Discuss this section]]&amp;lt;/small&amp;gt;&lt;br /&gt;
===Equipment===&lt;br /&gt;
====Camera====&lt;br /&gt;
Canon HV40 HDV camera w/ wide-angle lens operating on a tripod.&lt;br /&gt;
&lt;br /&gt;
The wide angle lens gives the camera a nice close macro mode, and approximately triples the amount of light coming into the sensor for a given zoom/aperture.  Useful for shooting indoors at night.&lt;br /&gt;
&lt;br /&gt;
No additional lighting kit was used.&lt;br /&gt;
&lt;br /&gt;
====Audio====&lt;br /&gt;
&lt;br /&gt;
Two Crown PCC160 boundary microphones placed on a table approximately 4-8 feet in front of the speaker, run through a cheap Behringer portable mixer and into the camera&#039;s microphone input.  &lt;br /&gt;
&lt;br /&gt;
No additional audio kit was used.&lt;br /&gt;
&lt;br /&gt;
====Sundries====&lt;br /&gt;
&lt;br /&gt;
Whiteboard markers by &#039;Bic&#039;&lt;br /&gt;
&lt;br /&gt;
Drawing aids by Staedtler, McMaster Carr, and &#039;Generic&#039;.&lt;br /&gt;
&lt;br /&gt;
===Video shooting sequence===&lt;br /&gt;
&lt;br /&gt;
Scenes were pre-scripted and memorized, usually with lots of on-the-fly revision.  (In the future... I&#039;m totally getting a teleprompter.  Wh000.  OTOH, I can totally rattle off the entire video script from beginning to end as a party trick, thus ensuring I&#039;ll not be invited to many parties --Monty).&lt;br /&gt;
&lt;br /&gt;
Diagrams were drawn by hand on a physical whiteboard with whiteboard markers and magnetic T-squares, triangles, and yardsticks.  Despite looking a lot like greenscreen work, there is no image compositing in use (actually-- there are two small composites where an error in a whiteboard diagram was corrected by subtracting part of the original image and then adding a corrected version of the diagram).&lt;br /&gt;
&lt;br /&gt;
Camera operated in 24F shutter priority mode (Tv set to &amp;quot;24&amp;quot;) with exposure and white balance both calibrated to the white board (or a white piece of paper) and locked.  Microphone attenuation setting active, with gain locked such that room noise peaked at -40dB (all the rooms in the shooting sequences were noisy due to the building&#039;s ventilation system, or active equipment). Lighting in the whiteboard rooms tended to be odd, with little relative light cast on a presenter standing just in front of the whiteboard; a presenter is practically standing in the room&#039;s only shadow.  Most of the room light is focused on the table and walls.  Additional fill lighting kit would have been useful, but for the first vid, I didn&#039;t want &#039;perfect&#039; to be the enemy of &#039;good&#039;.&lt;br /&gt;
&lt;br /&gt;
Autofocus used for whiteboard scenes, manual focus used for several workshop scenes as the autofocus tended to hunt continuously in very low light.&lt;br /&gt;
&lt;br /&gt;
Continuous capture to a Thinkpad with firewire input via a simple gstreamer script.&lt;br /&gt;
&lt;br /&gt;
===Production sequence===&lt;br /&gt;
====All hail Cinelerra.  You better hail, or Cinelerra will get pissy about it.====&lt;br /&gt;
&lt;br /&gt;
Most of the production sequence hinged on making Cinelerra happy; it is a hulking rusty cast-iron WWI tank of a program that can seem like it&#039;s composed entirely of compressed bugs.  That said, it was not particularly crashy, nor did it ever accidentally corrupt or lose work.  It was also the only FOSS editor with a working 2D compositor.  It got the job done once I found a workflow it would cope with (and fixed a number of bugs; these fixes are available from my cinelerra Git repo at http://git.xiph.org/?p=users/xiphmont/cinelerraCV.git;a=summary).&lt;br /&gt;
&lt;br /&gt;
====Choosing takes====&lt;br /&gt;
&lt;br /&gt;
Each shooting session yielded four to six hours of raw video.  The first step was to load the raw video into the Cinelerra timeline, label each complete take, compare and choose the take to use, then render the chosen take out as a raw clip: a YUV4MPEG raw video file and a WAV raw audio file.  Be careful that Settings-&amp;gt;Align Cursor On Frames is set, else the audio and video renders won&#039;t start on the same boundary.&lt;br /&gt;
&lt;br /&gt;
====Postprocessing====&lt;br /&gt;
&lt;br /&gt;
At this point, the raw video clips were adjusted for gamma, contrast, and saturation in gstreamer and mplayer.  In the earlier shoots the camera was underexposing due to pilot error, which required quite a bit of gamma and saturation inflation to &#039;correct&#039; (there is no real correction as the low-end data is gone, but it&#039;s possible to make it look better).  Later shoots used saner settings and the adjustments were mostly to keep different shooting sessions more uniform.  The whiteboard tends not to look white because it&#039;s mildly reflective, and picked up the color of the cyan and orange audio baffles in the room like a big diffuse mirror.&lt;br /&gt;
&lt;br /&gt;
The audio was both noisy (the building&#039;s ventilation system sounded like either a loud low rumble or a jet engine taking off) and reverberant (the rooms were glass on two sides and plaster on the other two).  Early takes used no additional sound-absorbing material in the rooms, and the Postfish filtering and deverb were used heavily.  This gives the early audio in the vid a slightly odd, processed feel (I had almost decided the original audio was simply unusable).  Later takes used some big fleece &#039;soft flats&#039; in the room to absorb some additional reverb, and the later takes are less heavily filtered.&lt;br /&gt;
&lt;br /&gt;
The Postfish filtering chain used declip (for the occasional overrange oops), deverb (to remove room reverberation), multicompand (noise gating), single compand (volume levelling), and EQ (the Crown mics are nice, but very midrange-heavy).&lt;br /&gt;
&lt;br /&gt;
====Special Effects====&lt;br /&gt;
&lt;br /&gt;
Audio special effects were one-offs, mostly done using SoX.  The processed demo sections of audio were then spliced back into the original audio takes using Audacity.&lt;br /&gt;
&lt;br /&gt;
Video special effects (e.g., removing a color channel) were done by writing quick, one-off filters in C for y4oi.  A few effects were done by dumping a take as a directory full of PNGs, batch-processing the PNGs with another one-off C program, and then reassembling with mplayer.  Video effects were then stitched back into the original video takes in Cinelerra.&lt;br /&gt;
&lt;br /&gt;
====Editing====&lt;br /&gt;
&lt;br /&gt;
All editing was done in Cinelerra.  This primarily consisted of stitching the individual takes back together with crossfades.  All input and rendering output were done with raw YUV4MPEG and WAV files.  Note that making this work well and correctly required several patches to the YUV4MPEG handler and colorspace conversion code.&lt;br /&gt;
&lt;br /&gt;
====Encoding====&lt;br /&gt;
&lt;br /&gt;
Encoding was done by hand, external to Cinelerra, using mplayer for final postprocessing, the encoder_example tool included with the Ptalarbvorm Theora source distribution, and ivfenc for WebM.&lt;br /&gt;
&lt;br /&gt;
Sample Theora encode command line (note this is using an mplayer patched for y4o support; it could be done just as easily with a yuv4mpeg pipe):&lt;br /&gt;
&lt;br /&gt;
# 360p, 128-ish (a4) audio + 500-ish (v50) video&lt;br /&gt;
mplayer -vf hqdn3d,scale=640:360,gradfun=1.5,unsharp=l3x3:.1,yuv4ogg complete2.m2v -fast -noconsolecontrols -vo null &amp;gt; /dev/null &amp;amp; ~/MotherfishSVN/theora-ptalarbvorm/examples/encoder_example -a 4 -v 50 -k 240 complete2.wav output.y4o -o A_Digital_Media_Primer_For_Geeks-360p-a4+v50.ogv&lt;/div&gt;</summary>
		<author><name>Mindspillage</name></author>
	</entry>
	<entry>
		<id>https://wiki.xiph.org/index.php?title=Videos/A_Digital_Media_Primer_For_Geeks&amp;diff=12368</id>
		<title>Videos/A Digital Media Primer For Geeks</title>
		<link rel="alternate" type="text/html" href="https://wiki.xiph.org/index.php?title=Videos/A_Digital_Media_Primer_For_Geeks&amp;diff=12368"/>
		<updated>2010-09-21T22:25:25Z</updated>

		<summary type="html">&lt;p&gt;Mindspillage: /* Postprocessing */ sp&lt;/p&gt;
&lt;hr /&gt;
&lt;div&gt;&amp;lt;small&amp;gt;&#039;&#039;Wiki edition&#039;&#039;&amp;lt;/small&amp;gt;&lt;br /&gt;
[[Image:Dmpfg_001.jpg|thumb|360px|right]]&lt;br /&gt;
This first video from Xiph.Org presents the technical foundations of modern digital media via a half-hour firehose of information. One community member called it &amp;quot;a Uni lecture I never got but really wanted.&amp;quot;&lt;br /&gt;
&lt;br /&gt;
The program offers a brief history of digital media, a quick summary of the sampling theorem, and myriad details of low-level audio and video characterization and formatting. It&#039;s intended for budding geeks looking to get into video coding, as well as the technically curious who want to know more about the media they wrangle for work or play.&lt;br /&gt;
&amp;lt;br/&amp;gt;&amp;lt;br/&amp;gt;&amp;lt;br/&amp;gt;&lt;br /&gt;
&amp;lt;center&amp;gt;&amp;lt;font size=&amp;quot;+2&amp;quot;&amp;gt;[http://www.xiph.org/video/vid1.shtml Download or Watch online]&amp;lt;/font&amp;gt;&amp;lt;/center&amp;gt;&lt;br /&gt;
&amp;lt;br style=&amp;quot;clear:both;&amp;quot;/&amp;gt;&lt;br /&gt;
&amp;lt;hr/&amp;gt;&lt;br /&gt;
&lt;br /&gt;
==Introduction==&lt;br /&gt;
&amp;lt;small&amp;gt;[[Talk:A_Digital_Media_Primer_For_Geeks_(episode_1)#Introduction|Discuss this section]]&amp;lt;/small&amp;gt;&lt;br /&gt;
&lt;br /&gt;
Workstations and high-end personal computers have been able to&lt;br /&gt;
manipulate digital audio pretty easily for about fifteen years now.&lt;br /&gt;
It&#039;s only been about five years that a decent workstation&#039;s been able&lt;br /&gt;
to handle raw video without a lot of expensive special purpose&lt;br /&gt;
hardware.&lt;br /&gt;
&lt;br /&gt;
But today even most cheap home PCs have the processor power and&lt;br /&gt;
storage necessary to really toss raw video around, at least without&lt;br /&gt;
too much of a struggle. So now that everyone has all of this cheap media-capable hardware, &lt;br /&gt;
more people, not surprisingly, want to do interesting&lt;br /&gt;
things with digital media, especially streaming. YouTube was the first huge&lt;br /&gt;
success, and now everybody wants in.&lt;br /&gt;
&lt;br /&gt;
Well good!  Because this stuff is a lot of fun!&lt;br /&gt;
&lt;br /&gt;
[[Image:Dmpfg_002.jpg|thumb|360px|right]]&lt;br /&gt;
It&#039;s no problem finding consumers for digital media.  But here I&#039;d&lt;br /&gt;
like to address the engineers, the mathematicians, the hackers, the&lt;br /&gt;
people who are interested in discovering and making things and&lt;br /&gt;
building the technology itself. The people after my own heart.&lt;br /&gt;
&lt;br /&gt;
Digital media, compression especially, is perceived to be super-elite,&lt;br /&gt;
somehow incredibly more difficult than anything else in computer&lt;br /&gt;
science. The big industry players in the field don&#039;t mind this&lt;br /&gt;
perception at all; it helps justify the staggering number of very&lt;br /&gt;
basic patents they hold.  They like the image that their media&lt;br /&gt;
researchers &amp;quot;are the best of the best, so much smarter than anyone&lt;br /&gt;
else that their brilliant ideas can&#039;t even be understood by mere&lt;br /&gt;
mortals.&amp;quot; This is bunk.&lt;br /&gt;
&lt;br /&gt;
Digital audio and video and streaming and compression offer endless&lt;br /&gt;
deep and stimulating mental challenges, just like any other&lt;br /&gt;
discipline. It seems elite because so few people have been&lt;br /&gt;
involved.  So few people have been involved perhaps because so few&lt;br /&gt;
people could afford the expensive, special-purpose equipment it&lt;br /&gt;
required. But today, just about anyone watching this video has a&lt;br /&gt;
cheap, general-purpose computer powerful enough to play with the big&lt;br /&gt;
boys. There are battles going on today around HTML5 and browsers and&lt;br /&gt;
video and open vs. closed.  So now is a pretty good time to get&lt;br /&gt;
involved.  The easiest place to start is probably understanding the&lt;br /&gt;
technology we have right now.&lt;br /&gt;
&lt;br /&gt;
This is an introduction. Since it&#039;s an introduction, it glosses over a&lt;br /&gt;
ton of details so that the big picture&#039;s a little easier to see.&lt;br /&gt;
Quite a few people watching are going to be way past anything that I&#039;m&lt;br /&gt;
talking about, at least for now.  On the other hand, I&#039;m probably&lt;br /&gt;
going to go too fast for folks who really are brand new to all of&lt;br /&gt;
this, so if this is all new, relax. The important thing is to pick out&lt;br /&gt;
any ideas that really grab your imagination. Especially pay attention&lt;br /&gt;
to the terminology surrounding those ideas, because with those, and&lt;br /&gt;
Google, and Wikipedia, you can dig as deep as interests you.&lt;br /&gt;
&lt;br /&gt;
So, without any further ado, welcome to one hell of a new hobby.&lt;br /&gt;
&lt;br /&gt;
&amp;lt;center&amp;gt;&amp;lt;div style=&amp;quot;background-color:#DDDDFF;border-color:#CCCCDD;border-style:solid;width:80%;padding:0 1em 1em 1em;text-align:left;&amp;quot;&amp;gt;&lt;br /&gt;
&#039;&#039;&#039;Going deeper…&#039;&#039;&#039;&lt;br /&gt;
*Fixme: Some good HTML5 baseline codec debate article&lt;br /&gt;
*[http://diveintohtml5.org/video.html Dive into HTML5] web video tutorial&lt;br /&gt;
&amp;lt;/div&amp;gt;&amp;lt;/center&amp;gt;&lt;br /&gt;
&lt;br /&gt;
&amp;lt;br style=&amp;quot;clear:both;&amp;quot;/&amp;gt;&lt;br /&gt;
&lt;br /&gt;
==Analog vs Digital==&lt;br /&gt;
&amp;lt;small&amp;gt;[[Talk:A_Digital_Media_Primer_For_Geeks_(episode_1)#Analog_vs_Digital|Discuss this section]]&amp;lt;/small&amp;gt;&lt;br /&gt;
[[Image:Dmpfg_004.jpg|thumb|360px|right]]&lt;br /&gt;
&lt;br /&gt;
Sound is the propagation of pressure waves through air, spreading out&lt;br /&gt;
from a source like ripples spread from a stone tossed into a pond.  A&lt;br /&gt;
microphone, or the human ear for that matter, transforms these passing&lt;br /&gt;
ripples of pressure into an electric signal.  Right, this is&lt;br /&gt;
middle school science class, everyone remembers this.  Moving on.&lt;br /&gt;
&lt;br /&gt;
That audio signal is a one-dimensional function, a single value&lt;br /&gt;
varying over time.  If we slow the &#039;scope down a bit... that should be&lt;br /&gt;
a little easier to see. A few other aspects of the signal are&lt;br /&gt;
important. It&#039;s continuous in both value and time; that is, at any&lt;br /&gt;
given time it can have any real value, and there&#039;s a smoothly varying&lt;br /&gt;
value at every point in time.  No matter how much we zoom in, there&lt;br /&gt;
are no discontinuities, no singularities, no instantaneous steps or&lt;br /&gt;
points where the signal ceases to exist. It&#039;s defined&lt;br /&gt;
everywhere. Classic continuous math works very well on these signals.&lt;br /&gt;
&lt;br /&gt;
A digital signal on the other hand is discrete in both value and time.&lt;br /&gt;
In the simplest and most common system, called Pulse Code Modulation,&lt;br /&gt;
one of a fixed number of possible values directly represents the&lt;br /&gt;
instantaneous signal amplitude at points in time spaced a fixed&lt;br /&gt;
distance apart.  The end result is a stream of digits.&lt;br /&gt;
&lt;br /&gt;
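To make that concrete, here&#039;s a minimal, hypothetical C sketch (my illustration, not code from the video) that digitizes one second of a 1kHz sine tone as 48kHz, 16-bit PCM, producing a literal stream of digits:&lt;br /&gt;
&lt;br /&gt;
#include &lt;math.h&gt;&lt;br /&gt;
#include &lt;stdio.h&gt;&lt;br /&gt;
&lt;br /&gt;
int main(void){&lt;br /&gt;
  const double rate = 48000.;  /* samples per second */&lt;br /&gt;
  const double freq = 1000.;   /* tone frequency in Hz */&lt;br /&gt;
  for(long i = 0; i &lt; 48000; i++){&lt;br /&gt;
    double v = sin(2.*M_PI*freq*i/rate);  /* instantaneous amplitude, -1..1 */&lt;br /&gt;
    short s = (short)lrint(v*32767.);     /* quantize to one of 65536 fixed values */&lt;br /&gt;
    fwrite(&amp;s, sizeof(s), 1, stdout);     /* native-endian mono PCM out */&lt;br /&gt;
  }&lt;br /&gt;
  return 0;&lt;br /&gt;
}&lt;br /&gt;
&lt;br /&gt;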
[[Image:Dmpfg_006.jpg|thumb|360px|right]]&lt;br /&gt;
Now this looks an awful lot like this.  It seems intuitive that we&lt;br /&gt;
should somehow be able to rigorously transform one into the other, and&lt;br /&gt;
good news, the Sampling Theorem says we can and tells us&lt;br /&gt;
how. Published in its most recognizable form by Claude Shannon in 1949&lt;br /&gt;
and built on the work of Nyquist, and Hartley, and tons of others, the&lt;br /&gt;
sampling theorem not only states that we can go back and&lt;br /&gt;
forth between analog and digital, it also lays&lt;br /&gt;
down a set of conditions under which the conversion is lossless and the two&lt;br /&gt;
representations become equivalent and interchangeable.  When the&lt;br /&gt;
lossless conditions aren&#039;t met, the sampling theorem tells us how and&lt;br /&gt;
how much information is lost or corrupted.&lt;br /&gt;
&lt;br /&gt;
Up until very recently, analog technology was the basis for&lt;br /&gt;
practically everything done with audio, and that&#039;s not because most&lt;br /&gt;
audio comes from an originally analog source.  You may also think that&lt;br /&gt;
since computers are fairly recent, analog signal technology must have&lt;br /&gt;
come first.  Nope. Digital is actually older.  The telegraph predates&lt;br /&gt;
the telephone by half a century and was already fully mechanically&lt;br /&gt;
automated by the 1860s, sending coded, multiplexed digital signals&lt;br /&gt;
long distances. You know... tickertape. Harry Nyquist of Bell Labs was&lt;br /&gt;
researching telegraph pulse transmission when he published his&lt;br /&gt;
description of what later became known as the Nyquist frequency, the&lt;br /&gt;
core concept of the sampling theorem.  Now, it&#039;s true the telegraph&lt;br /&gt;
was transmitting symbolic information, text, not a digitized analog&lt;br /&gt;
signal, but with the advent of the telephone and radio, analog and&lt;br /&gt;
digital signal technology progressed rapidly and side-by-side.&lt;br /&gt;
&lt;br /&gt;
[[Image:Dmpfg_007.jpg|thumb|360px|right]]&lt;br /&gt;
Audio had always been manipulated as an analog signal because... well,&lt;br /&gt;
gee, it&#039;s so much easier.  A second-order low-pass filter, for example,&lt;br /&gt;
requires two passive components.  An all-analog short-time Fourier&lt;br /&gt;
transform, a few hundred.  Well, maybe a thousand if you want to build&lt;br /&gt;
something really fancy [bang on the 3585].  Processing signals&lt;br /&gt;
digitally requires millions to billions of transistors running at&lt;br /&gt;
microwave frequencies, support hardware at very least to digitize and&lt;br /&gt;
reconstruct the analog signals, a complete software ecosystem for&lt;br /&gt;
programming and controlling that billion-transistor juggernaut,&lt;br /&gt;
digital storage just in case you want to keep any of those bits for&lt;br /&gt;
later...&lt;br /&gt;
&lt;br /&gt;
So we come to the conclusion that analog is the only practical way to&lt;br /&gt;
do much with audio... well, unless you happen to have a billion&lt;br /&gt;
transistors and all the other things just lying around. And since we&lt;br /&gt;
do, digital signal processing becomes very attractive.&lt;br /&gt;
&lt;br /&gt;
For one thing, analog componentry just doesn&#039;t have the flexibility of&lt;br /&gt;
a general purpose computer.  Adding a new function to this&lt;br /&gt;
beast... yeah, it&#039;s probably not going to happen.  On a digital&lt;br /&gt;
processor though, just write a new program.  Software isn&#039;t trivial,&lt;br /&gt;
but it is a lot easier.&lt;br /&gt;
&lt;br /&gt;
Perhaps more importantly, though, every analog component is an&lt;br /&gt;
approximation. There&#039;s no such thing as a perfect transistor, or a&lt;br /&gt;
perfect inductor, or a perfect capacitor.  In analog, every component&lt;br /&gt;
adds noise and distortion, usually not very much, but it adds up. Just&lt;br /&gt;
transmitting an analog signal, especially over long distances,&lt;br /&gt;
progressively, measurably, irretrievably corrupts it.  Besides, all of&lt;br /&gt;
those single-purpose analog components take up a lot of space.  Two&lt;br /&gt;
lines of code on the billion transistors back here can implement a&lt;br /&gt;
filter that would require an inductor the size of a refrigerator.&lt;br /&gt;
&lt;br /&gt;
Digital systems don&#039;t have these drawbacks.  Digital signals can be&lt;br /&gt;
stored, copied, manipulated, and transmitted without adding any noise&lt;br /&gt;
or distortion. We do use lossy algorithms from time to time, but the&lt;br /&gt;
only unavoidably non-ideal steps are digitization and reconstruction,&lt;br /&gt;
where digital has to interface with all of that messy analog.  Messy&lt;br /&gt;
or not, modern conversion stages are very, very good.  By the&lt;br /&gt;
standards of our ears, we can consider them practically lossless as&lt;br /&gt;
well.&lt;br /&gt;
&lt;br /&gt;
With a little extra hardware, then, most of which is now small and&lt;br /&gt;
inexpensive due to our modern industrial infrastructure, digital audio&lt;br /&gt;
is the clear winner over analog.  So let us then go about storing it,&lt;br /&gt;
copying it, manipulating it, and transmitting it.&lt;br /&gt;
&lt;br /&gt;
&amp;lt;center&amp;gt;&amp;lt;div style=&amp;quot;background-color:#DDDDFF;border-color:#CCCCDD;border-style:solid;width:80%;padding:0 1em 1em 1em;text-align:left;&amp;quot;&amp;gt;&lt;br /&gt;
&#039;&#039;&#039;Going deeper…&#039;&#039;&#039;&lt;br /&gt;
*Wikipedia: [[wiki:Nyquist–Shannon_sampling_theorem|Nyquist–Shannon sampling theorem]]&lt;br /&gt;
*MIT OpenCourseWare [http://ocw.mit.edu/courses/electrical-engineering-and-computer-science/6-003-signals-and-systems-spring-2010/lecture-notes/ Lecture notes from 6.003 signals and systems.]&lt;br /&gt;
&amp;lt;/div&amp;gt;&amp;lt;/center&amp;gt;&lt;br /&gt;
&lt;br /&gt;
&amp;lt;br style=&amp;quot;clear:both;&amp;quot;/&amp;gt;&lt;br /&gt;
&lt;br /&gt;
==Raw (digital audio) meat==&lt;br /&gt;
&amp;lt;small&amp;gt;[[Talk:A_Digital_Media_Primer_For_Geeks_(episode_1)#Raw_.28digital_audio.29_meat|Discuss this section]]&amp;lt;/small&amp;gt;&lt;br /&gt;
&lt;br /&gt;
Pulse Code Modulation is the most common representation for &lt;br /&gt;
raw audio.  Other practical representations do exist: for example, the&lt;br /&gt;
Sigma-Delta coding used by the SACD, which is a form of Pulse Density&lt;br /&gt;
Modulation.  That said, Pulse Code Modulation is far&lt;br /&gt;
and away dominant, mainly because it&#039;s so mathematically&lt;br /&gt;
convenient.  An audio engineer can spend an entire career without&lt;br /&gt;
running into anything else.&lt;br /&gt;
&lt;br /&gt;
PCM encoding can be characterized by three parameters, making it easy&lt;br /&gt;
to account for every possible PCM variant with mercifully little&lt;br /&gt;
hassle.&lt;br /&gt;
&lt;br /&gt;
===sample rate===&lt;br /&gt;
&lt;br /&gt;
The first parameter is the sampling rate.  The highest frequency an&lt;br /&gt;
encoding can represent is called the Nyquist Frequency.  The Nyquist&lt;br /&gt;
frequency of PCM happens to be exactly half the sampling rate.&lt;br /&gt;
Therefore, the sampling rate directly determines the highest possible&lt;br /&gt;
frequency in the digitized signal.&lt;br /&gt;
&lt;br /&gt;
Analog telephone systems traditionally band-limited voice channels to&lt;br /&gt;
just under 4kHz, so digital telephony and most classic voice&lt;br /&gt;
applications use an 8kHz sampling rate: the minimum sampling rate&lt;br /&gt;
necessary to capture the entire bandwidth of a 4kHz channel.  This is&lt;br /&gt;
what an 8kHz sampling rate sounds like&amp;amp;mdash;a bit muffled but perfectly&lt;br /&gt;
intelligible for voice.  This is the lowest sampling rate that&#039;s ever&lt;br /&gt;
been used widely in practice.&lt;br /&gt;
&lt;br /&gt;
From there, as power, and memory, and storage increased, consumer&lt;br /&gt;
computer hardware went to offering 11, and then 16, and then 22, and&lt;br /&gt;
then 32kHz sampling.  With each increase in the sampling rate and the&lt;br /&gt;
Nyquist frequency, it&#039;s obvious that the high end becomes a little&lt;br /&gt;
clearer and the sound more natural.&lt;br /&gt;
&lt;br /&gt;
The Compact Disc uses a 44.1kHz sampling rate, which is again slightly&lt;br /&gt;
better than 32kHz, but the gains are becoming less distinct.  44.1kHz&lt;br /&gt;
is a bit of an oddball choice, especially given that it hadn&#039;t been&lt;br /&gt;
used for anything prior to the compact disc, but the huge success of&lt;br /&gt;
the CD has made it a common rate.&lt;br /&gt;
&lt;br /&gt;
The most common hi-fidelity sampling rate aside from the CD is 48kHz.&lt;br /&gt;
There&#039;s virtually no audible difference between the two.  This video,&lt;br /&gt;
or at least the original version of it, was shot and produced with&lt;br /&gt;
48kHz audio, which happens to be the original standard for&lt;br /&gt;
high-fidelity audio with video.&lt;br /&gt;
&lt;br /&gt;
Super-hi-fidelity sampling rates of 88, and 96, and 192kHz have also&lt;br /&gt;
appeared. The reason for the sampling rates beyond 48kHz isn&#039;t to&lt;br /&gt;
extend the audible high frequencies further. It&#039;s for a different&lt;br /&gt;
reason.&lt;br /&gt;
&lt;br /&gt;
[[Image:Dmpfg_008.jpg|thumb|360px|right]]&lt;br /&gt;
Stepping back for just a second, the French mathematician Jean&lt;br /&gt;
Baptiste Joseph Fourier showed that we can also think of signals like&lt;br /&gt;
audio as a set of component frequencies.  This frequency-domain&lt;br /&gt;
representation is equivalent to the time representation; the signal is&lt;br /&gt;
exactly the same, we&#039;re just looking at it a different way.  Here we see the&lt;br /&gt;
frequency-domain representation of a hypothetical analog signal we&lt;br /&gt;
intend to digitally sample.&lt;br /&gt;
&lt;br /&gt;
The sampling theorem tells us two essential things about the sampling&lt;br /&gt;
process. First, that a digital signal can&#039;t represent any&lt;br /&gt;
frequencies above the Nyquist frequency. Second, and this is the new&lt;br /&gt;
part, if we don&#039;t remove those frequencies with a low-pass filter&lt;br /&gt;
before sampling, the sampling process will fold them down into the&lt;br /&gt;
representable frequency range as aliasing distortion.&lt;br /&gt;
&lt;br /&gt;
Aliasing, in a nutshell, sounds freakin&#039; awful, so it&#039;s essential to&lt;br /&gt;
remove any beyond-Nyquist frequencies before sampling and after&lt;br /&gt;
reconstruction.&lt;br /&gt;
&lt;br /&gt;
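To see the fold-down concretely, here&#039;s a small C sketch of my own (not from the video) computing where an unfiltered input tone lands after sampling:&lt;br /&gt;
&lt;br /&gt;
#include &lt;math.h&gt;&lt;br /&gt;
#include &lt;stdio.h&gt;&lt;br /&gt;
&lt;br /&gt;
/* Frequencies beyond Nyquist fold back down into 0..rate/2. */&lt;br /&gt;
double alias(double f, double rate){&lt;br /&gt;
  double r = fmod(f, rate);               /* sampling can&#039;t tell whole-rate offsets apart */&lt;br /&gt;
  return (r &lt;= rate/2.) ? r : rate - r;   /* ...and mirrors the remainder around Nyquist */&lt;br /&gt;
}&lt;br /&gt;
&lt;br /&gt;
int main(void){&lt;br /&gt;
  /* a 30kHz tone sampled at 48kHz folds down to a very audible 18kHz */&lt;br /&gt;
  printf(&amp;quot;%g\n&amp;quot;, alias(30000., 48000.));&lt;br /&gt;
  return 0;&lt;br /&gt;
}&lt;br /&gt;
&lt;br /&gt;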
Human frequency perception is considered to extend to about 20kHz. In&lt;br /&gt;
44.1 or 48kHz sampling, the low pass before the sampling stage has to&lt;br /&gt;
be extremely sharp to avoid cutting any audible frequencies below&lt;br /&gt;
20kHz but still not allow frequencies above the Nyquist to leak&lt;br /&gt;
forward into the sampling process.  This is a difficult filter to&lt;br /&gt;
build, and no practical filter succeeds completely. If the sampling&lt;br /&gt;
rate is 96kHz or 192kHz on the other hand, the low pass has an extra&lt;br /&gt;
octave or two for its transition band. This is a much easier filter to&lt;br /&gt;
build.  Sampling rates beyond 48kHz are actually one of those messy&lt;br /&gt;
analog stage compromises.&lt;br /&gt;
&lt;br /&gt;
===sample format===&lt;br /&gt;
&lt;br /&gt;
The second fundamental PCM parameter is the sample format; that is,&lt;br /&gt;
the format of each digital number.  A number is a number, but a number&lt;br /&gt;
can be represented in bits a number of different ways.&lt;br /&gt;
&lt;br /&gt;
Early PCM was eight-bit linear, encoded as an unsigned byte.  The&lt;br /&gt;
dynamic range is limited to about 50dB  and the quantization noise, as&lt;br /&gt;
you can hear, is pretty severe.  Eight-bit audio is vanishingly rare&lt;br /&gt;
today.&lt;br /&gt;
&lt;br /&gt;
Digital telephony typically uses one of two related non-linear eight&lt;br /&gt;
bit encodings called A-law and mu-law. These formats encode a roughly&lt;br /&gt;
14 bit dynamic range into eight bits by spacing the higher amplitude&lt;br /&gt;
values farther apart. A-law and mu-law obviously improve quantization&lt;br /&gt;
noise compared to linear 8-bit, and voice harmonics especially hide&lt;br /&gt;
the remaining quantization noise well. All three eight-bit encodings,&lt;br /&gt;
linear, A-law, and mu-law, are typically paired with an 8kHz sampling&lt;br /&gt;
rate, though I&#039;m demonstrating them here at 48kHz.&lt;br /&gt;
&lt;br /&gt;
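For a feel of how that spacing works, here&#039;s the textbook mu-law companding curve as a C sketch (the idea only; real G.711 codecs use a bit-exact, table-driven variant):&lt;br /&gt;
&lt;br /&gt;
#include &lt;math.h&gt;&lt;br /&gt;
&lt;br /&gt;
#define MU 255.0   /* mu-law&#039;s companding constant */&lt;br /&gt;
&lt;br /&gt;
/* map a linear sample in -1..1 onto the companded -1..1 range;&lt;br /&gt;
   quantizing the result spaces high amplitudes farther apart */&lt;br /&gt;
double mulaw_compress(double x){&lt;br /&gt;
  double sign = (x &lt; 0.) ? -1. : 1.;&lt;br /&gt;
  return sign * log(1. + MU*fabs(x)) / log(1. + MU);&lt;br /&gt;
}&lt;br /&gt;
&lt;br /&gt;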
Most modern PCM uses 16- or 24-bit two&#039;s-complement signed integers to&lt;br /&gt;
encode the range from negative infinity to zero decibels in 16 or 24&lt;br /&gt;
bits of precision. The maximum absolute value corresponds to zero decibels.&lt;br /&gt;
As with all the sample formats so far, signals beyond zero decibels, and thus&lt;br /&gt;
beyond the maximum representable range, are clipped.&lt;br /&gt;
&lt;br /&gt;
In mixing and mastering, it&#039;s not unusual to use floating-point&lt;br /&gt;
numbers for PCM instead of integers.  A 32-bit IEEE 754 float, that&#039;s&lt;br /&gt;
the normal kind of floating point you see on current computers, has 24&lt;br /&gt;
bits of resolution, but an eight-bit floating-point exponent increases&lt;br /&gt;
the representable range.  Floating point usually represents zero&lt;br /&gt;
decibels as +/-1.0, and because floats can obviously represent&lt;br /&gt;
considerably beyond that, temporarily exceeding zero decibels during&lt;br /&gt;
the mixing process doesn&#039;t cause clipping.  Floating-point PCM takes&lt;br /&gt;
up more space, so it tends to be used only as an intermediate&lt;br /&gt;
production format.&lt;br /&gt;
&lt;br /&gt;
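The practical upshot, as a C sketch of my own: a float mix bus can swing past +/-1.0 without harm, and clipping happens only at the final conversion back to integers:&lt;br /&gt;
&lt;br /&gt;
#include &lt;math.h&gt;&lt;br /&gt;
&lt;br /&gt;
short float_to_s16(float v){   /* v is nominally -1.0..1.0 */&lt;br /&gt;
  if(v &gt;  1.f) v =  1.f;       /* beyond zero decibels: clip here, and only here */&lt;br /&gt;
  if(v &lt; -1.f) v = -1.f;&lt;br /&gt;
  return (short)lrintf(v*32767.f);&lt;br /&gt;
}&lt;br /&gt;
&lt;br /&gt;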
Lastly, most general purpose computers still read and&lt;br /&gt;
write data in octet bytes, so it&#039;s important to remember that samples&lt;br /&gt;
bigger than eight bits can be in big- or little-endian order, and both&lt;br /&gt;
endiannesses are common.  For example, Microsoft WAV files are little-endian,&lt;br /&gt;
and Apple AIFC files tend to be big-endian.  Be aware of it.&lt;br /&gt;
&lt;br /&gt;
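A byte-order-safe read is cheap insurance; this little C sketch (mine) assembles a little-endian 16-bit sample correctly no matter what the host CPU is:&lt;br /&gt;
&lt;br /&gt;
/* WAV order: low byte first.  Shifting the bytes back together&lt;br /&gt;
   works identically on big- and little-endian hosts. */&lt;br /&gt;
short le16_to_host(const unsigned char *b){&lt;br /&gt;
  return (short)((b[1] &lt;&lt; 8) | b[0]);&lt;br /&gt;
}&lt;br /&gt;
&lt;br /&gt;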
===channels===&lt;br /&gt;
&lt;br /&gt;
The third PCM parameter is the number of channels.  The convention in&lt;br /&gt;
raw PCM is to encode multiple channels by interleaving the samples of&lt;br /&gt;
each channel together into a single stream.  Straightforward and extensible.&lt;br /&gt;
&lt;br /&gt;
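In code, locating a sample in an interleaved stream is simple index arithmetic, as in this trivial sketch of my own:&lt;br /&gt;
&lt;br /&gt;
/* sample for frame n, channel c, in a stream of ch interleaved&lt;br /&gt;
   channels; e.g. stereo is stored L R L R L R ... */&lt;br /&gt;
short get_sample(const short *pcm, long n, int c, int ch){&lt;br /&gt;
  return pcm[n*ch + c];&lt;br /&gt;
}&lt;br /&gt;
&lt;br /&gt;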
===done!===&lt;br /&gt;
&lt;br /&gt;
And that&#039;s it!  That describes every PCM representation ever.  Done.&lt;br /&gt;
Digital audio is &#039;&#039;so easy&#039;&#039;!  There&#039;s more to do of course, but at this&lt;br /&gt;
point we&#039;ve got a nice useful chunk of audio data, so let&#039;s get some&lt;br /&gt;
video too.&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
&amp;lt;center&amp;gt;&amp;lt;div style=&amp;quot;background-color:#DDDDFF;border-color:#CCCCDD;border-style:solid;width:80%;padding:0 1em 1em 1em;text-align:left;&amp;quot;&amp;gt;&lt;br /&gt;
&#039;&#039;&#039;Going deeper…&#039;&#039;&#039;&lt;br /&gt;
* Fixme&lt;br /&gt;
&amp;lt;/div&amp;gt;&amp;lt;/center&amp;gt;&lt;br /&gt;
&lt;br /&gt;
&amp;lt;br style=&amp;quot;clear:both;&amp;quot;/&amp;gt;&lt;br /&gt;
&lt;br /&gt;
==Video vegetables (they&#039;re good for you!)==&lt;br /&gt;
&amp;lt;small&amp;gt;[[Talk:A_Digital_Media_Primer_For_Geeks_(episode_1)#Video_vegetables_.28they.27re_good_for_you.21.29|Discuss this section]]&amp;lt;/small&amp;gt;&lt;br /&gt;
&lt;br /&gt;
One could think of video as being like audio but with two additional&lt;br /&gt;
spatial dimensions, X and Y, in addition to the dimension of time.&lt;br /&gt;
This is mathematically sound. The Sampling Theorem applies to all&lt;br /&gt;
three video dimensions just as it does the single time dimension of&lt;br /&gt;
audio.&lt;br /&gt;
&lt;br /&gt;
Audio and video are obviously quite different in practice. For one,&lt;br /&gt;
compared to audio, video is huge. Raw CD audio is about 1.4 megabits&lt;br /&gt;
per second. Raw 1080i HD video is over 700 megabits per second. That&#039;s&lt;br /&gt;
more than 500 times more data to capture, process, and store per&lt;br /&gt;
second.  By Moore&#039;s law... that&#039;s... let&#039;s see... roughly eight&lt;br /&gt;
doublings times two years, so yeah, computers requiring about an extra&lt;br /&gt;
fifteen years to handle raw video after getting raw audio down pat was&lt;br /&gt;
about right.&lt;br /&gt;
&lt;br /&gt;
Basic raw video is also just more complex than basic raw audio. The&lt;br /&gt;
sheer volume of data currently necessitates a representation more&lt;br /&gt;
efficient than the linear PCM used for audio.  In addition, electronic&lt;br /&gt;
video comes almost entirely from broadcast television alone, and the&lt;br /&gt;
standards committees that govern broadcast video have always been very&lt;br /&gt;
concerned with backward compatibility.  Up until just last year in the&lt;br /&gt;
US, a sixty-year-old black and white television could still show a&lt;br /&gt;
normal analog television broadcast.  That&#039;s actually a really neat&lt;br /&gt;
trick.&lt;br /&gt;
&lt;br /&gt;
The downside to backward compatibility is that once a detail makes it&lt;br /&gt;
into a standard, you can&#039;t ever really throw it out again. Electronic&lt;br /&gt;
video has never started over from scratch the way audio has multiple&lt;br /&gt;
times.  Sixty years worth of clever but obsolete hacks necessitated by&lt;br /&gt;
the passing technology of a given era have built up into quite a pile,&lt;br /&gt;
and because digital standards also come from broadcast television, all&lt;br /&gt;
these eldritch hacks have been brought forward into the digital&lt;br /&gt;
standards as well.&lt;br /&gt;
&lt;br /&gt;
In short, there are a whole lot more details involved in digital video&lt;br /&gt;
than there were with audio. There&#039;s no hope of covering them&lt;br /&gt;
all completely here, so we&#039;ll cover the broad fundamentals.&lt;br /&gt;
&lt;br /&gt;
===resolution and aspect===&lt;br /&gt;
&lt;br /&gt;
The most obvious raw video parameters are the width and height of the&lt;br /&gt;
picture in pixels. As simple as that may sound, the pixel dimensions&lt;br /&gt;
alone don&#039;t actually specify the absolute width and height of the&lt;br /&gt;
picture, as most broadcast-derived video doesn&#039;t use square pixels.&lt;br /&gt;
The number of scanlines in a broadcast image was fixed, but the&lt;br /&gt;
effective number of horizontal pixels was a function of channel&lt;br /&gt;
bandwidth. Effective horizontal resolution could result in pixels that&lt;br /&gt;
were either narrower or wider than the spacing between scanlines.&lt;br /&gt;
&lt;br /&gt;
Standards have generally specified that digitally sampled video should&lt;br /&gt;
reflect the real resolution of the original analog source, so a large&lt;br /&gt;
amount of digital video also uses non-square pixels. For example, a&lt;br /&gt;
normal 4:3 aspect NTSC DVD is typically encoded with a display&lt;br /&gt;
resolution of 704 by 480, a ratio wider than 4:3.  In this case, the&lt;br /&gt;
pixels themselves are assigned an aspect ratio of 10:11, making them&lt;br /&gt;
taller than they are wide and narrowing the image horizontally to the&lt;br /&gt;
correct aspect.  Such an image has to be resampled to show properly on&lt;br /&gt;
a digital display with square pixels.&lt;br /&gt;
&lt;br /&gt;
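The arithmetic checks out: 704 pixels at a 10:11 pixel aspect span 704 &amp;times; 10/11 = 640 square-pixel widths, and 640:480 is exactly 4:3.&lt;br /&gt;
&lt;br /&gt;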
===frame rate and interlacing===&lt;br /&gt;
&lt;br /&gt;
The second obvious video parameter is the frame rate, the number of&lt;br /&gt;
full frames per second.  Several standard frame rates are in active&lt;br /&gt;
use. Digital video, in one form or another, can use all of them.  Or,&lt;br /&gt;
any other frame rate.  Or even variable rates where the frame rate&lt;br /&gt;
changes adaptively over the course of the video. The higher the frame&lt;br /&gt;
rate, the smoother the motion and that brings us, unfortunately, to&lt;br /&gt;
interlacing.&lt;br /&gt;
&lt;br /&gt;
In the very earliest days of broadcast video, engineers sought the&lt;br /&gt;
fastest practical frame rate to smooth motion and to minimize flicker&lt;br /&gt;
on phosphor-based CRTs.  They were also under pressure to use the&lt;br /&gt;
least possible bandwidth for the highest resolution and fastest frame&lt;br /&gt;
rate.  Their solution was to interlace the video where the even lines&lt;br /&gt;
are sent in one pass and the odd lines in the next.  Each pass is&lt;br /&gt;
called a field and two fields sort of produce one complete frame.&lt;br /&gt;
&amp;quot;Sort of&amp;quot;, because the even and odd fields aren&#039;t actually from the&lt;br /&gt;
same source frame.  In a 60 field per second picture, the source frame&lt;br /&gt;
rate is actually 60 full frames per second, and half of each frame,&lt;br /&gt;
every other line, is simply discarded.  This is why we can&#039;t&lt;br /&gt;
deinterlace a video simply by combining two fields into one frame;&lt;br /&gt;
they&#039;re not actually from one frame to begin with.&lt;br /&gt;
&lt;br /&gt;
===gamma===&lt;br /&gt;
&lt;br /&gt;
The cathode ray tube was the only available display technology for&lt;br /&gt;
most of the history of electronic video. A CRT&#039;s output brightness is&lt;br /&gt;
nonlinear, approximately equal to the input controlling voltage raised&lt;br /&gt;
to the 2.5th power. This exponent, 2.5, is designated gamma, and so&lt;br /&gt;
it&#039;s often referred to as the gamma of a display.  Cameras, though,&lt;br /&gt;
are linear, and if you feed a CRT a linear input signal, it looks a&lt;br /&gt;
bit like this.&lt;br /&gt;
&lt;br /&gt;
As there were originally to be very few cameras, which were&lt;br /&gt;
fantastically expensive anyway, and hopefully many, many television&lt;br /&gt;
sets, which had best be as inexpensive as possible, engineers decided to&lt;br /&gt;
add the necessary gamma correction circuitry to the cameras rather&lt;br /&gt;
than the sets. Video transmitted over the airwaves would thus have a&lt;br /&gt;
nonlinear intensity using the inverse of the set&#039;s gamma exponent, so that&lt;br /&gt;
once a camera&#039;s signal was finally displayed on the CRT, the overall&lt;br /&gt;
response of the system from camera to set was back to linear again.&lt;br /&gt;
&lt;br /&gt;
Almost.&lt;br /&gt;
&lt;br /&gt;
There were also two other tweaks. A television camera actually uses a&lt;br /&gt;
gamma exponent that&#039;s the inverse of 2.2, not 2.5.  That&#039;s just a&lt;br /&gt;
correction for viewing in a dim environment. Also, the exponential&lt;br /&gt;
curve transitions to a linear ramp near black.  That&#039;s just an old&lt;br /&gt;
hack for suppressing sensor noise in the camera.&lt;br /&gt;
&lt;br /&gt;
Gamma correction also had a lucky benefit. It just so happens that the&lt;br /&gt;
human eye has a perceptual gamma of about 3.  This is relatively close&lt;br /&gt;
to the CRT&#039;s gamma of 2.5. An image using gamma correction devotes&lt;br /&gt;
more resolution to lower intensities, where the eye happens to have&lt;br /&gt;
its finest intensity discrimination, and therefore uses the available&lt;br /&gt;
scale resolution more efficiently.  Although CRTs are currently&lt;br /&gt;
vanishing, a standard sRGB computer display still uses a nonlinear&lt;br /&gt;
intensity curve similar to television, with a linear ramp near black,&lt;br /&gt;
followed by an exponential curve with a gamma exponent of 2.4. This&lt;br /&gt;
encodes a sixteen bit linear range down into eight bits.&lt;br /&gt;
&lt;br /&gt;
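That sRGB curve is compact enough to quote in full; here it is as a C sketch (standard constants, included for illustration):&lt;br /&gt;
&lt;br /&gt;
#include &lt;math.h&gt;&lt;br /&gt;
&lt;br /&gt;
/* linear light (0..1) -&gt; encoded sRGB value (0..1) */&lt;br /&gt;
double srgb_encode(double lin){&lt;br /&gt;
  if(lin &lt;= 0.0031308) return 12.92*lin;   /* the linear ramp near black */&lt;br /&gt;
  return 1.055*pow(lin, 1./2.4) - 0.055;   /* the gamma 2.4 segment */&lt;br /&gt;
}&lt;br /&gt;
&lt;br /&gt;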
===color and colorspace===&lt;br /&gt;
&lt;br /&gt;
The human eye has three apparent color channels, red, green, and blue,&lt;br /&gt;
and most displays use these three colors as additive primaries to&lt;br /&gt;
produce a full range of color output.  The primary pigments in&lt;br /&gt;
printing are Cyan, Magenta, and Yellow for the same reason; pigments&lt;br /&gt;
are subtractive, and each of these pigments subtracts one pure color&lt;br /&gt;
from reflected light.  Cyan subtracts red, magenta subtracts green, and&lt;br /&gt;
yellow subtracts blue.&lt;br /&gt;
&lt;br /&gt;
Video can be, and sometimes is, represented with red, green, and blue&lt;br /&gt;
color channels, but RGB video is atypical. The human eye is far more&lt;br /&gt;
sensitive to luminosity than it is to color, and RGB tends to spread&lt;br /&gt;
the energy of an image across all three color channels.  That is, the&lt;br /&gt;
red plane looks like a red version of the original picture, the green&lt;br /&gt;
plane looks like a green version of the original picture, and the blue&lt;br /&gt;
plane looks like a blue version of the original picture.  Black and&lt;br /&gt;
white times three.  Not efficient.&lt;br /&gt;
&lt;br /&gt;
For those reasons and because, oh hey, television just happened to&lt;br /&gt;
start out as black and white anyway, video usually is represented as a&lt;br /&gt;
high resolution luma channel&amp;amp;mdash;the black &amp;amp; white&amp;amp;mdash;along with&lt;br /&gt;
additional, often lower resolution chroma channels, the color. The&lt;br /&gt;
luma channel, Y, is produced by weighting and then adding the separate&lt;br /&gt;
red, green and blue signals.  The chroma channels U and V are then&lt;br /&gt;
produced by subtracting the luma signal from blue and the luma signal&lt;br /&gt;
from red.&lt;br /&gt;
&lt;br /&gt;
When YUV is scaled, offset, and quantized for digital video, it&#039;s&lt;br /&gt;
usually more correctly called Y&#039;CbCr, but the more generic term YUV is&lt;br /&gt;
widely used to describe all the analog and digital variants of this&lt;br /&gt;
color model.&lt;br /&gt;
&lt;br /&gt;
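In C, the classic construction looks like this (a sketch using the BT.601 luma weights; the scaling and offsets that squeeze U and V into range for Y&#039;CbCr are omitted):&lt;br /&gt;
&lt;br /&gt;
void rgb_to_yuv(double r, double g, double b,&lt;br /&gt;
                double *y, double *u, double *v){&lt;br /&gt;
  *y = 0.299*r + 0.587*g + 0.114*b;   /* the weighted sum: the black &amp; white */&lt;br /&gt;
  *u = b - *y;                        /* U: blue minus luma */&lt;br /&gt;
  *v = r - *y;                        /* V: red minus luma */&lt;br /&gt;
}&lt;br /&gt;
&lt;br /&gt;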
===chroma subsampling===&lt;br /&gt;
&lt;br /&gt;
The U and V chroma channels can have the same resolution as the Y&lt;br /&gt;
channel, but because the human eye has far less spatial color&lt;br /&gt;
resolution than spatial luminosity resolution, chroma resolution is&lt;br /&gt;
usually halved or even quartered in the horizontal direction, the&lt;br /&gt;
vertical direction, or both, usually without any significant impact on the&lt;br /&gt;
apparent raw image quality.  Practically every possible subsampling&lt;br /&gt;
variant has been used at one time or another, but the common choices&lt;br /&gt;
today are 4:4:4 video, which isn&#039;t actually subsampled at all, 4:2:2 video in&lt;br /&gt;
which the horizontal resolution of the U and V channels is halved, and&lt;br /&gt;
most common of all, 4:2:0 video in which both the horizontal and vertical&lt;br /&gt;
resolutions of the chroma channels are halved, resulting in U and V&lt;br /&gt;
planes that are each one quarter the size of Y.&lt;br /&gt;
&lt;br /&gt;
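In buffer-size terms (a quick sketch of my own), a full 8-bit 4:2:0 frame is only half again the size of its luma plane:&lt;br /&gt;
&lt;br /&gt;
/* bytes in one 8-bit 4:2:0 frame; dimensions assumed even */&lt;br /&gt;
long frame_bytes_420(long w, long h){&lt;br /&gt;
  long y  = w*h;           /* full-resolution luma plane */&lt;br /&gt;
  long uv = (w/2)*(h/2);   /* each chroma plane is a quarter of Y */&lt;br /&gt;
  return y + 2*uv;         /* total: w*h*3/2 */&lt;br /&gt;
}&lt;br /&gt;
&lt;br /&gt;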
The terms 4:2:2, 4:2:0, 4:1:1, and so on and so forth, aren&#039;t complete&lt;br /&gt;
descriptions of a chroma subsampling. There are multiple possible ways&lt;br /&gt;
to position the chroma pixels relative to luma, and again, several&lt;br /&gt;
variants are in active use for each subsampling.  For example, motion&lt;br /&gt;
JPEG, MPEG-1 video, MPEG-2 video, DV, Theora, and WebM all use or can&lt;br /&gt;
use 4:2:0 subsampling, but they site the chroma pixels three different&lt;br /&gt;
ways.&lt;br /&gt;
&lt;br /&gt;
Motion JPEG, MPEG-1 video, Theora, and WebM all site chroma pixels&lt;br /&gt;
between luma pixels both horizontally and vertically.&lt;br /&gt;
&lt;br /&gt;
MPEG-2 sites chroma pixels between lines, but horizontally aligned with&lt;br /&gt;
every other luma pixel. Interlaced modes complicate things somewhat,&lt;br /&gt;
resulting in a siting arrangement that&#039;s a tad bizarre.&lt;br /&gt;
&lt;br /&gt;
And finally PAL-DV, which is always interlaced, places the chroma&lt;br /&gt;
pixels in the same position as every other luma pixel in the&lt;br /&gt;
horizontal direction, and vertically alternates chroma channel on&lt;br /&gt;
each line.&lt;br /&gt;
&lt;br /&gt;
That&#039;s just 4:2:0 video. I&#039;ll leave the other subsamplings as homework for the&lt;br /&gt;
viewer.  Got the basic idea, moving on.&lt;br /&gt;
&lt;br /&gt;
===pixel formats===&lt;br /&gt;
&lt;br /&gt;
In audio, we always represent multiple channels in a PCM stream by&lt;br /&gt;
interleaving the samples from each channel in order. Video uses both&lt;br /&gt;
packed formats that interleave the color channels, as well as planar&lt;br /&gt;
formats that keep the pixels from each channel together in separate&lt;br /&gt;
planes stacked in order in the frame. There are at least 50 different formats in&lt;br /&gt;
these two broad categories with possibly ten or fifteen in common use.&lt;br /&gt;
&lt;br /&gt;
Each chroma subsampling and different bit-depth requires a different&lt;br /&gt;
packing arrangement, and so a different pixel format.  For a given&lt;br /&gt;
unique subsampling, there are usually also several equivalent formats&lt;br /&gt;
that consist of trivial channel order rearrangements or repackings, due either to&lt;br /&gt;
convenience once-upon-a-time on some particular piece of hardware, or&lt;br /&gt;
sometimes just good old-fashioned spite.&lt;br /&gt;
&lt;br /&gt;
Pixel formats are described by a unique name or fourcc code.  There&lt;br /&gt;
are quite a few of these and there&#039;s no sense going over each one now.&lt;br /&gt;
Google is your friend.  Be aware that fourcc codes for raw video&lt;br /&gt;
specify the pixel arrangement and chroma subsampling, but generally&lt;br /&gt;
don&#039;t imply anything certain about chroma siting or color space.  YV12&lt;br /&gt;
video, to pick one, can use JPEG, MPEG-2, or DV chroma siting, and any&lt;br /&gt;
one of several YUV colorspace definitions.&lt;br /&gt;
&lt;br /&gt;
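As one concrete example, here&#039;s a C sketch (mine) mapping the planes inside a YV12 frame:&lt;br /&gt;
&lt;br /&gt;
/* YV12 is 4:2:0 planar with the V plane stored before U&lt;br /&gt;
   (its sibling I420 swaps the two). */&lt;br /&gt;
typedef struct { unsigned char *y, *v, *u; } yv12_planes;&lt;br /&gt;
&lt;br /&gt;
yv12_planes yv12_map(unsigned char *frame, long w, long h){&lt;br /&gt;
  yv12_planes p;&lt;br /&gt;
  p.y = frame;                       /* w*h luma samples */&lt;br /&gt;
  p.v = frame + w*h;                 /* (w/2)*(h/2) V samples */&lt;br /&gt;
  p.u = frame + w*h + (w/2)*(h/2);   /* (w/2)*(h/2) U samples */&lt;br /&gt;
  return p;&lt;br /&gt;
}&lt;br /&gt;
&lt;br /&gt;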
===done!===&lt;br /&gt;
&lt;br /&gt;
That wraps up our not-so-quick and yet very incomplete tour of raw&lt;br /&gt;
video. The good news is we can already get quite a lot of real work&lt;br /&gt;
done using that overview. In plenty of situations, a frame of video&lt;br /&gt;
data is a frame of video data.  The details matter, greatly, when it&lt;br /&gt;
comes time to write software, but for now I am satisfied that the&lt;br /&gt;
esteemed viewer is broadly aware of the relevant issues.&lt;br /&gt;
&lt;br /&gt;
&amp;lt;center&amp;gt;&amp;lt;div style=&amp;quot;background-color:#DDDDFF;border-color:#CCCCDD;border-style:solid;width:80%;padding:0 1em 1em 1em;text-align:left;&amp;quot;&amp;gt;&lt;br /&gt;
&#039;&#039;&#039;Going deeper…&#039;&#039;&#039;&lt;br /&gt;
* Fixme&lt;br /&gt;
&amp;lt;/div&amp;gt;&amp;lt;/center&amp;gt;&lt;br /&gt;
&lt;br /&gt;
&amp;lt;br style=&amp;quot;clear:both;&amp;quot;/&amp;gt;&lt;br /&gt;
&lt;br /&gt;
==Containers==&lt;br /&gt;
&amp;lt;small&amp;gt;[[Talk:A_Digital_Media_Primer_For_Geeks_(episode_1)#Containers|Discuss this section]]&amp;lt;/small&amp;gt;&lt;br /&gt;
&lt;br /&gt;
So. We have audio data. We have video data. What remains is the more&lt;br /&gt;
familiar non-signal data and straight-up engineering that software&lt;br /&gt;
developers are used to, and plenty of it.&lt;br /&gt;
&lt;br /&gt;
Chunks of raw audio and video data have no externally-visible&lt;br /&gt;
structure, but they&#039;re often uniformly sized.  We could just string&lt;br /&gt;
them together in a rigid predetermined ordering for streaming and&lt;br /&gt;
storage, and some simple systems do approximately that. Compressed&lt;br /&gt;
frames, though, aren&#039;t necessarily a predictable size, and we usually want&lt;br /&gt;
some flexibility in using a range of different data types in streams.&lt;br /&gt;
If we string random formless data together, we lose the boundaries&lt;br /&gt;
that separate frames and don&#039;t necessarily know what data belongs to&lt;br /&gt;
which streams.  A stream needs some generalized structure to be&lt;br /&gt;
generally useful.&lt;br /&gt;
&lt;br /&gt;
In addition to our signal data, we also have our PCM and video&lt;br /&gt;
parameters.  There&#039;s probably plenty of other metadata we also want to&lt;br /&gt;
deal with, like audio tags and video chapters and subtitles, all&lt;br /&gt;
essential components of rich media.  It makes sense to place this&lt;br /&gt;
metadata&amp;amp;mdash;that is, data about the data&amp;amp;mdash;within the media itself.&lt;br /&gt;
&lt;br /&gt;
Storing and structuring formless data and disparate metadata is the&lt;br /&gt;
job of a container.  Containers provide framing for the data blobs,&lt;br /&gt;
interleave and identify multiple data streams, provide timing&lt;br /&gt;
information, and store the metadata necessary to parse, navigate,&lt;br /&gt;
manipulate, and present the media.  In general, any container can hold&lt;br /&gt;
any kind of data.  And data can be put into any container.&lt;br /&gt;
&lt;br /&gt;
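As a toy C sketch (no real container is this simple, but every real one records these same facts), the per-blob bookkeeping a container provides looks something like:&lt;br /&gt;
&lt;br /&gt;
typedef struct {&lt;br /&gt;
  int   stream_id;       /* which interleaved stream the blob belongs to */&lt;br /&gt;
  long  timestamp;       /* presentation time in some declared timebase */&lt;br /&gt;
  long  length;          /* framing: where this blob ends */&lt;br /&gt;
  unsigned char *data;   /* the formless payload itself */&lt;br /&gt;
} toy_packet;&lt;br /&gt;
&lt;br /&gt;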
==Credits==&lt;br /&gt;
&lt;br /&gt;
In the past thirty minutes, we&#039;ve covered digital audio, video, some&lt;br /&gt;
history, some math and a little engineering. We&#039;ve barely scratched the&lt;br /&gt;
surface, but it&#039;s time for a well-earned break.&lt;br /&gt;
&lt;br /&gt;
There&#039;s so much more to talk about, so I hope you&#039;ll join me again in&lt;br /&gt;
our next episode.  Until then&amp;amp;mdash;Cheers!&lt;br /&gt;
&lt;br /&gt;
Written by:&lt;br /&gt;
Christopher (Monty) Montgomery&lt;br /&gt;
and the Xiph.Org Community&lt;br /&gt;
&lt;br /&gt;
Intro, title and credits music:&amp;lt;br&amp;gt;&lt;br /&gt;
&amp;quot;Boo Boo Coming&amp;quot;, by Joel Forrester&amp;lt;br&amp;gt;&lt;br /&gt;
Performed by the Microscopic Septet&amp;lt;br&amp;gt;&lt;br /&gt;
Used by permission of Cuneiform Records.&amp;lt;br&amp;gt;&lt;br /&gt;
Original source track All Rights Reserved.&amp;lt;br&amp;gt;&lt;br /&gt;
www.cuneiformrecords.com&lt;br /&gt;
&lt;br /&gt;
This Video Was Produced Entirely With Free and Open Source Software&lt;br /&gt;
&lt;br /&gt;
GNU&lt;br /&gt;
Linux&lt;br /&gt;
Fedora&lt;br /&gt;
Cinelerra&lt;br /&gt;
The Gimp&lt;br /&gt;
Audacity&lt;br /&gt;
Postfish&lt;br /&gt;
Gstreamer&lt;br /&gt;
&lt;br /&gt;
CC BY-NC-SA&lt;br /&gt;
A Co-Production of Xiph.Org and Red Hat Inc.&lt;br /&gt;
(C) 2010, Some Rights Reserved&lt;br /&gt;
&lt;br /&gt;
&amp;lt;br style=&amp;quot;clear:both;&amp;quot;/&amp;gt;&lt;br /&gt;
&lt;br /&gt;
==The making of…==&lt;br /&gt;
&amp;lt;small&amp;gt;[[Talk:A_Digital_Media_Primer_For_Geeks_(episode_1)#The_making_of.E2.80.A6|Discuss this section]]&amp;lt;/small&amp;gt;&lt;br /&gt;
===Equipment===&lt;br /&gt;
====Camera====&lt;br /&gt;
Canon HV40 HDV camera w/ wide-angle lens operating on a tripod.&lt;br /&gt;
&lt;br /&gt;
The wide angle lens gives the camera a nice close macro mode, and approximately triples the amount of light coming into the sensor for a given zoom/aperture.  Useful for shooting indoors at night.&lt;br /&gt;
&lt;br /&gt;
No additional lighting kit was used.&lt;br /&gt;
&lt;br /&gt;
====Audio====&lt;br /&gt;
&lt;br /&gt;
Two Crown PCC160 boundary microphones placed on a table approximately 4-8 feet in front of the speaker, run through a cheap Behringer portable mixer and into the camera&#039;s microphone input.  &lt;br /&gt;
&lt;br /&gt;
No additional audio kit was used.&lt;br /&gt;
&lt;br /&gt;
====Sundries====&lt;br /&gt;
&lt;br /&gt;
Whiteboard markers by &#039;Bic&#039;&lt;br /&gt;
&lt;br /&gt;
Drawing aids by Staedtler, McMaster Carr, and &#039;Generic&#039;.&lt;br /&gt;
&lt;br /&gt;
===Video shooting sequence===&lt;br /&gt;
&lt;br /&gt;
Scenes were pre-scripted and memorized, usually with lots of on-the-fly revision.  (In the future... I&#039;m totally getting a teleprompter.  Wh000.  OTOH, I can totally rattle off the entire video script from beginning to end as a party trick, thus ensuring I&#039;ll not be invited to many parties --Monty).&lt;br /&gt;
&lt;br /&gt;
Diagrams were drawn by hand on a physical whiteboard with whiteboard markers and magnetic T-squares, triangles, and yardsticks.  Despite looking a lot like greenscreen work, there is no image compositing in use (actually-- there are two small composites where an error in a whiteboard diagram was corrected by subtracting part of the original image and then adding a corrected version of the diagram).&lt;br /&gt;
&lt;br /&gt;
Camera operated in 24F shutter priority mode (Tv set to &amp;quot;24&amp;quot;) with exposure and white balance both calibrated to the white board (or a white piece of paper) and locked.  Microphone attenuation setting active, with gain locked such that room noise peaked at -40dB (all the rooms in the shooting sequences were noisy due to the building&#039;s ventilation system, or active equipment). Lighting in the whiteboard rooms tended to be odd, with little relative light cast on a presenter standing just in front of the whiteboard; a presenter is practically standing in the room&#039;s only shadow.  Most of the room light is focused on the table and walls.  Additional fill lighting kit would have been useful, but for the first vid, I didn&#039;t want &#039;perfect&#039; to be the enemy of &#039;good&#039;.&lt;br /&gt;
&lt;br /&gt;
Autofocus used for whiteboard scenes, manual focus used for several workshop scenes as the autofocus tended to hunt continuously in very low light.&lt;br /&gt;
&lt;br /&gt;
Continuous capture to a Thinkpad with firewire input via a simple gstreamer script.&lt;br /&gt;
&lt;br /&gt;
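The actual script isn&#039;t reproduced here; a minimal capture line in the same spirit might look like the following (hdv1394src is the GStreamer 0.10 element for HDV camcorders, which deliver an MPEG-2 transport stream over FireWire; treat the exact pipeline as my assumption, not the one used):&lt;br /&gt;
&lt;br /&gt;
# hypothetical capture one-liner; element names assumed&lt;br /&gt;
gst-launch hdv1394src ! filesink location=capture.m2t&lt;br /&gt;
&lt;br /&gt;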
===Production sequence===&lt;br /&gt;
====All hail Cinelerra.  You better hail, or Cinelerra will get pissy about it.====&lt;br /&gt;
&lt;br /&gt;
Most of the production sequence hinged on making Cinelerra happy; it is a hulking rusty cast iron WWI tank of a program that can seem like it&#039;s composed entirely of compressed bugs.  That said, it was neither particularly crashy nor did it ever accidentally corrupt or lose work.  It was also the only FOSS editor with a working 2D compositor.  It got the job done once I found a workflow it would cope with (and fixed a number of bugs; these fixes are available from my cinelerra Git repo at http://git.xiph.org/?p=users/xiphmont/cinelerraCV.git;a=summary)&lt;br /&gt;
&lt;br /&gt;
====Choosing takes====&lt;br /&gt;
&lt;br /&gt;
Each shooting session yielded four to six hours of raw video.  The first step was to load the raw video into the cinelerra timeline, label each complete take, compare and choose the take to use, then render the chosen take out to a raw clip as a YUV4MPEG raw video file and a WAV raw audio file.  Be careful that Settings-&amp;gt;Align Cursor On Frames is set, else the audio and video renders won&#039;t start on the same boundary.&lt;br /&gt;
&lt;br /&gt;
====Postprocessing====&lt;br /&gt;
&lt;br /&gt;
At this point, the raw video clips were adjusted for gamma, contrast, and saturation in gstreamer and mplayer.  In the earlier shoots the camera was underexposing due to pilot error, which required quite a bit of gamma and saturation inflation to &#039;correct&#039; (there is no real correction as the low-end data is gone, but it&#039;s possible to make it look better).  Later shoots used saner settings and the adjustments were mostly to keep different shooting sessions more uniform.  The whiteboard tends not to look white because it&#039;s mildly reflective, and picked up the color of the cyan and orange audio baffles in the room like a big diffuse mirror.&lt;br /&gt;
&lt;br /&gt;
The audio was both noisy (due to the building&#039;s ventilation system, which sounded like either a loud, low rumble or a jet engine taking off) and reverberant (the rooms were glass on two sides and plaster on the other two).  Early takes used no additional sound-absorbing material in the rooms, and the Postfish filtering and deverb were used heavily.  It gives the early audio in the vid a slightly odd, processed feel (I had almost decided the original audio was simply unusable).  Later takes used some big fleece &#039;soft flats&#039; in the room to absorb some additional reverb, and the later takes are less heavily filtered.&lt;br /&gt;
&lt;br /&gt;
The Postfish filtering chain used declip (for the occasional overrange oops), deverb (to remove room reverberation), multicompand (for noise gating), single compand (for volume levelling), and EQ (the Crown mics are nice, but very midrange-heavy).&lt;br /&gt;
&lt;br /&gt;
====Special Effects====&lt;br /&gt;
&lt;br /&gt;
Audio special effects were one-offs, mostly done using SoX.  The processed demo sections of audio were then spliced back into the original audio takes using Audacity.&lt;br /&gt;
&lt;br /&gt;
Video special effects (e.g., removing a color channel) were done by writing quick, one-off filters in C for y4oi.  A few effects were done by dumping a take as a directory full of PNGs, batch-processing the PNGs with a one-off C program, and then reassembling with mplayer.  Video effects were then stitched back into the original video takes in Cinelerra.&lt;br /&gt;
&lt;br /&gt;
====Editing====&lt;br /&gt;
&lt;br /&gt;
All editing was done in Cinelerra.  This primarily consisted of stitching the individual takes back together with crossfades.  All input and rendering output were done with raw YUV4MPEG and WAV files.  Note that making this work well and correctly required several patches to the YUV4MPEG handler and colorspace conversion code.&lt;br /&gt;
&lt;br /&gt;
====Encoding====&lt;br /&gt;
&lt;br /&gt;
Encoding was done by hand, external to Cinelerra, using mplayer for final postprocessing, the encoder_example tool included with the Ptalarbvorm Theora source distribution, and ivfenc for WebM.&lt;br /&gt;
&lt;br /&gt;
Sample Theora encode command line (note this is using an mplayer patched for y4o support; it could be done just as easily with a yuv4mpeg pipe):&lt;br /&gt;
&lt;br /&gt;
# 360p, 128-ish (a4) audio + 500-ish (v50) video&lt;br /&gt;
mplayer -vf hqdn3d,scale=640:360,gradfun=1.5,unsharp=l3x3:.1,yuv4ogg complete2.m2v -fast -noconsolecontrols -vo null &amp;gt; /dev/null &amp;amp; ~/MotherfishSVN/theora-ptalarbvorm/examples/encoder_example -a 4 -v 50 -k 240 complete2.wav output.y4o -o A_Digital_Media_Primer_For_Geeks-360p-a4+v50.ogv&lt;/div&gt;</summary>
		<author><name>Mindspillage</name></author>
	</entry>
	<entry>
		<id>https://wiki.xiph.org/index.php?title=Videos/A_Digital_Media_Primer_For_Geeks&amp;diff=12367</id>
		<title>Videos/A Digital Media Primer For Geeks</title>
		<link rel="alternate" type="text/html" href="https://wiki.xiph.org/index.php?title=Videos/A_Digital_Media_Primer_For_Geeks&amp;diff=12367"/>
		<updated>2010-09-21T22:24:00Z</updated>

		<summary type="html">&lt;p&gt;Mindspillage: /* Video shooting sequence */ sp&lt;/p&gt;
&lt;hr /&gt;
&lt;div&gt;&amp;lt;small&amp;gt;&#039;&#039;Wiki edition&#039;&#039;&amp;lt;/small&amp;gt;&lt;br /&gt;
[[Image:Dmpfg_001.jpg|thumb|360px|right]]&lt;br /&gt;
This first video from Xiph.Org presents the technical foundations of modern digital media via a half-hour firehose of information. One community member called it &amp;quot;a Uni lecture I never got but really wanted.&amp;quot;&lt;br /&gt;
&lt;br /&gt;
The program offers a brief history of digital media, a quick summary of the sampling theorem, and myriad details of low-level audio and video characterization and formatting. It&#039;s intended for budding geeks looking to get into video coding, as well as the technically curious who want to know more about the media they wrangle for work or play.&lt;br /&gt;
&amp;lt;br/&amp;gt;&amp;lt;br/&amp;gt;&amp;lt;br/&amp;gt;&lt;br /&gt;
&amp;lt;center&amp;gt;&amp;lt;font size=&amp;quot;+2&amp;quot;&amp;gt;[http://www.xiph.org/video/vid1.shtml Download or Watch online]&amp;lt;/font&amp;gt;&amp;lt;/center&amp;gt;&lt;br /&gt;
&amp;lt;br style=&amp;quot;clear:both;&amp;quot;/&amp;gt;&lt;br /&gt;
&amp;lt;hr/&amp;gt;&lt;br /&gt;
&lt;br /&gt;
==Introduction==&lt;br /&gt;
&amp;lt;small&amp;gt;[[Talk:A_Digital_Media_Primer_For_Geeks_(episode_1)#Introduction|Discuss this section]]&amp;lt;/small&amp;gt;&lt;br /&gt;
&lt;br /&gt;
Workstations and high-end personal computers have been able to&lt;br /&gt;
manipulate digital audio pretty easily for about fifteen years now.&lt;br /&gt;
It&#039;s only been about five years that a decent workstation&#039;s been able&lt;br /&gt;
to handle raw video without a lot of expensive special purpose&lt;br /&gt;
hardware.&lt;br /&gt;
&lt;br /&gt;
But today even most cheap home PCs have the processor power and&lt;br /&gt;
storage necessary to really toss raw video around, at least without&lt;br /&gt;
too much of a struggle. So now that everyone has all of this cheap media-capable hardware, &lt;br /&gt;
more people, not surprisingly, want to do interesting&lt;br /&gt;
things with digital media, especially streaming. YouTube was the first huge&lt;br /&gt;
success, and now everybody wants in.&lt;br /&gt;
&lt;br /&gt;
Well good!  Because this stuff is a lot of fun!&lt;br /&gt;
&lt;br /&gt;
[[Image:Dmpfg_002.jpg|thumb|360px|right]]&lt;br /&gt;
It&#039;s no problem finding consumers for digital media.  But here I&#039;d&lt;br /&gt;
like to address the engineers, the mathematicians, the hackers, the&lt;br /&gt;
people who are interested in discovering and making things and&lt;br /&gt;
building the technology itself. The people after my own heart.&lt;br /&gt;
&lt;br /&gt;
Digital media, compression especially, is perceived to be super-elite,&lt;br /&gt;
somehow incredibly more difficult than anything else in computer&lt;br /&gt;
science. The big industry players in the field don&#039;t mind this&lt;br /&gt;
perception at all; it helps justify the staggering number of very&lt;br /&gt;
basic patents they hold.  They like the image that their media&lt;br /&gt;
researchers &amp;quot;are the best of the best, so much smarter than anyone&lt;br /&gt;
else that their brilliant ideas can&#039;t even be understood by mere&lt;br /&gt;
mortals.&amp;quot; This is bunk.&lt;br /&gt;
&lt;br /&gt;
Digital audio and video and streaming and compression offer endless&lt;br /&gt;
deep and stimulating mental challenges, just like any other&lt;br /&gt;
discipline. It seems elite because so few people have been&lt;br /&gt;
involved.  So few people have been involved perhaps because so few&lt;br /&gt;
people could afford the expensive, special-purpose equipment it&lt;br /&gt;
required. But today, just about anyone watching this video has a&lt;br /&gt;
cheap, general-purpose computer powerful enough to play with the big&lt;br /&gt;
boys. There are battles going on today around HTML5 and browsers and&lt;br /&gt;
video and open vs. closed.  So now is a pretty good time to get&lt;br /&gt;
involved.  The easiest place to start is probably understanding the&lt;br /&gt;
technology we have right now.&lt;br /&gt;
&lt;br /&gt;
This is an introduction. Since it&#039;s an introduction, it glosses over a&lt;br /&gt;
ton of details so that the big picture&#039;s a little easier to see.&lt;br /&gt;
Quite a few people watching are going to be way past anything that I&#039;m&lt;br /&gt;
talking about, at least for now.  On the other hand, I&#039;m probably&lt;br /&gt;
going to go too fast for folks who really are brand new to all of&lt;br /&gt;
this, so if this is all new, relax. The important thing is to pick out&lt;br /&gt;
any ideas that really grab your imagination. Especially pay attention&lt;br /&gt;
to the terminology surrounding those ideas, because with those, and&lt;br /&gt;
Google, and Wikipedia, you can dig as deep as interests you.&lt;br /&gt;
&lt;br /&gt;
So, without any further ado, welcome to one hell of a new hobby.&lt;br /&gt;
&lt;br /&gt;
&amp;lt;center&amp;gt;&amp;lt;div style=&amp;quot;background-color:#DDDDFF;border-color:#CCCCDD;border-style:solid;width:80%;padding:0 1em 1em 1em;text-align:left;&amp;quot;&amp;gt;&lt;br /&gt;
&#039;&#039;&#039;Going deeper…&#039;&#039;&#039;&lt;br /&gt;
*Fixme: Some good HTML5 baseline codec debate article&lt;br /&gt;
*[http://diveintohtml5.org/video.html Dive into HTML5] web video tutorial&lt;br /&gt;
&amp;lt;/div&amp;gt;&amp;lt;/center&amp;gt;&lt;br /&gt;
&lt;br /&gt;
&amp;lt;br style=&amp;quot;clear:both;&amp;quot;/&amp;gt;&lt;br /&gt;
&lt;br /&gt;
==Analog vs Digital==&lt;br /&gt;
&amp;lt;small&amp;gt;[[Talk:A_Digital_Media_Primer_For_Geeks_(episode_1)#Analog_vs_Digital|Discuss this section]]&amp;lt;/small&amp;gt;&lt;br /&gt;
[[Image:Dmpfg_004.jpg|thumb|360px|right]]&lt;br /&gt;
&lt;br /&gt;
Sound is the propagation of pressure waves through air, spreading out&lt;br /&gt;
from a source like ripples spread from a stone tossed into a pond.  A&lt;br /&gt;
microphone, or the human ear for that matter, transforms these passing&lt;br /&gt;
ripples of pressure into an electric signal.  Right, this is&lt;br /&gt;
middle school science class, everyone remembers this.  Moving on.&lt;br /&gt;
&lt;br /&gt;
That audio signal is a one-dimensional function, a single value&lt;br /&gt;
varying over time.  If we slow the &#039;scope down a bit... that should be&lt;br /&gt;
a little easier to see. A few other aspects of the signal are&lt;br /&gt;
important. It&#039;s continuous in both value and time; that is, at any&lt;br /&gt;
given time it can have any real value, and there&#039;s a smoothly varying&lt;br /&gt;
value at every point in time.  No matter how much we zoom in, there&lt;br /&gt;
are no discontinuities, no singularities, no instantaneous steps or&lt;br /&gt;
points where the signal ceases to exist. It&#039;s defined&lt;br /&gt;
everywhere. Classic continuous math works very well on these signals.&lt;br /&gt;
&lt;br /&gt;
A digital signal on the other hand is discrete in both value and time.&lt;br /&gt;
In the simplest and most common system, called Pulse Code Modulation,&lt;br /&gt;
one of a fixed number of possible values directly represents the&lt;br /&gt;
instantaneous signal amplitude at points in time spaced a fixed&lt;br /&gt;
distance apart.  The end result is a stream of digits.&lt;br /&gt;
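&lt;br /&gt;
To make that concrete, here&#039;s a minimal Python sketch that samples&lt;br /&gt;
and quantizes a sine wave into 16-bit PCM.  The rate, depth, and tone&lt;br /&gt;
are arbitrary choices for illustration, not anything PCM itself&lt;br /&gt;
mandates:&lt;br /&gt;
&lt;br /&gt;
import math&lt;br /&gt;
&lt;br /&gt;
RATE = 8000     # samples per second (an arbitrary choice)&lt;br /&gt;
FREQ = 440.0    # the tone we digitize, in Hz&lt;br /&gt;
SCALE = 32767   # 16-bit signed range&lt;br /&gt;
&lt;br /&gt;
# one value per sample point, a fixed time step apart: a stream of digits&lt;br /&gt;
pcm = [round(SCALE * math.sin(2 * math.pi * FREQ * n / RATE))&lt;br /&gt;
       for n in range(RATE)]   # one second of audio&lt;br /&gt;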
&lt;br /&gt;
[[Image:Dmpfg_006.jpg|thumb|360px|right]]&lt;br /&gt;
Now this looks an awful lot like this.  It seems intuitive that we&lt;br /&gt;
should somehow be able to rigorously transform one into the other, and&lt;br /&gt;
good news, the Sampling Theorem says we can and tells us&lt;br /&gt;
how. Published in its most recognizable form by Claude Shannon in 1949&lt;br /&gt;
and built on the work of Nyquist, and Hartley, and tons of others, the&lt;br /&gt;
sampling theorem not only states that we can go back and&lt;br /&gt;
forth between analog and digital, but also lays&lt;br /&gt;
down a set of conditions under which conversion is lossless and the two&lt;br /&gt;
representations become equivalent and interchangeable.  When the&lt;br /&gt;
lossless conditions aren&#039;t met, the sampling theorem tells us how and&lt;br /&gt;
how much information is lost or corrupted.&lt;br /&gt;
&lt;br /&gt;
Up until very recently, analog technology was the basis for&lt;br /&gt;
practically everything done with audio, and that&#039;s not just because most&lt;br /&gt;
audio comes from an originally analog source.  You may also think that&lt;br /&gt;
since computers are fairly recent, analog signal technology must have&lt;br /&gt;
come first.  Nope. Digital is actually older.  The telegraph predates&lt;br /&gt;
the telephone by half a century and was already fully mechanically&lt;br /&gt;
automated by the 1860s, sending coded, multiplexed digital signals&lt;br /&gt;
long distances. You know... tickertape. Harry Nyquist of Bell Labs was&lt;br /&gt;
researching telegraph pulse transmission when he published his&lt;br /&gt;
description of what later became known as the Nyquist frequency, the&lt;br /&gt;
core concept of the sampling theorem.  Now, it&#039;s true the telegraph&lt;br /&gt;
was transmitting symbolic information, text, not a digitized analog&lt;br /&gt;
signal, but with the advent of the telephone and radio, analog and&lt;br /&gt;
digital signal technology progressed rapidly and side-by-side.&lt;br /&gt;
&lt;br /&gt;
[[Image:Dmpfg_007.jpg|thumb|360px|right]]&lt;br /&gt;
Audio had always been manipulated as an analog signal because... well,&lt;br /&gt;
gee, it&#039;s so much easier.  A second-order low-pass filter, for example,&lt;br /&gt;
requires two passive components.  An all-analog short-time Fourier&lt;br /&gt;
transform, a few hundred.  Well, maybe a thousand if you want to build&lt;br /&gt;
something really fancy [bang on the 3585].  Processing signals&lt;br /&gt;
digitally requires millions to billions of transistors running at&lt;br /&gt;
microwave frequencies, support hardware at the very least to digitize and&lt;br /&gt;
reconstruct the analog signals, a complete software ecosystem for&lt;br /&gt;
programming and controlling that billion-transistor juggernaut,&lt;br /&gt;
digital storage just in case you want to keep any of those bits for&lt;br /&gt;
later...&lt;br /&gt;
&lt;br /&gt;
So we come to the conclusion that analog is the only practical way to&lt;br /&gt;
do much with audio... well, unless you happen to have a billion&lt;br /&gt;
transistors and all the other things just lying around. And since we&lt;br /&gt;
do, digital signal processing becomes very attractive.&lt;br /&gt;
&lt;br /&gt;
For one thing, analog componentry just doesn&#039;t have the flexibility of&lt;br /&gt;
a general purpose computer.  Adding a new function to this&lt;br /&gt;
beast... yeah, it&#039;s probably not going to happen.  On a digital&lt;br /&gt;
processor though, just write a new program.  Software isn&#039;t trivial,&lt;br /&gt;
but it is a lot easier.&lt;br /&gt;
&lt;br /&gt;
Perhaps more importantly, though, every analog component is an&lt;br /&gt;
approximation. There&#039;s no such thing as a perfect transistor, or a&lt;br /&gt;
perfect inductor, or a perfect capacitor.  In analog, every component&lt;br /&gt;
adds noise and distortion, usually not very much, but it adds up. Just&lt;br /&gt;
transmitting an analog signal, especially over long distances,&lt;br /&gt;
progressively, measurably, irretrievably corrupts it.  Besides, all of&lt;br /&gt;
those single-purpose analog components take up a lot of space.  Two&lt;br /&gt;
lines of code on the billion transistors back here can implement a&lt;br /&gt;
filter that would require an inductor the size of a refrigerator.&lt;br /&gt;
&lt;br /&gt;
Digital systems don&#039;t have these drawbacks.  Digital signals can be&lt;br /&gt;
stored, copied, manipulated, and transmitted without adding any noise&lt;br /&gt;
or distortion. We do use lossy algorithms from time to time, but the&lt;br /&gt;
only unavoidably non-ideal steps are digitization and reconstruction,&lt;br /&gt;
where digital has to interface with all of that messy analog.  Messy&lt;br /&gt;
or not, modern conversion stages are very, very good.  By the&lt;br /&gt;
standards of our ears, we can consider them practically lossless as&lt;br /&gt;
well.&lt;br /&gt;
&lt;br /&gt;
With a little extra hardware, then, most of which is now small and&lt;br /&gt;
inexpensive due to our modern industrial infrastructure, digital audio&lt;br /&gt;
is the clear winner over analog.  So let us then go about storing it,&lt;br /&gt;
copying it, manipulating it, and transmitting it.&lt;br /&gt;
&lt;br /&gt;
&amp;lt;center&amp;gt;&amp;lt;div style=&amp;quot;background-color:#DDDDFF;border-color:#CCCCDD;border-style:solid;width:80%;padding:0 1em 1em 1em;text-align:left;&amp;quot;&amp;gt;&lt;br /&gt;
&#039;&#039;&#039;Going deeper…&#039;&#039;&#039;&lt;br /&gt;
*Wikipedia: [[wiki:Nyquist–Shannon_sampling_theorem|Nyquist–Shannon sampling theorem]]&lt;br /&gt;
*MIT OpenCourseWare [http://ocw.mit.edu/courses/electrical-engineering-and-computer-science/6-003-signals-and-systems-spring-2010/lecture-notes/ Lecture notes from 6.003 signals and systems.]&lt;br /&gt;
&amp;lt;/div&amp;gt;&amp;lt;/center&amp;gt;&lt;br /&gt;
&lt;br /&gt;
&amp;lt;br style=&amp;quot;clear:both;&amp;quot;/&amp;gt;&lt;br /&gt;
&lt;br /&gt;
==Raw (digital audio) meat==&lt;br /&gt;
&amp;lt;small&amp;gt;[[Talk:A_Digital_Media_Primer_For_Geeks_(episode_1)#Raw_.28digital_audio.29_meat|Discuss this section]]&amp;lt;/small&amp;gt;&lt;br /&gt;
&lt;br /&gt;
Pulse Code Modulation is the most common representation for &lt;br /&gt;
raw audio.  Other practical representations do exist: for example, the&lt;br /&gt;
Sigma-Delta coding used by the SACD, which is a form of Pulse Density&lt;br /&gt;
Modulation.  That said, Pulse Code Modulation is far&lt;br /&gt;
and away dominant, mainly because it&#039;s so mathematically&lt;br /&gt;
convenient.  An audio engineer can spend an entire career without&lt;br /&gt;
running into anything else.&lt;br /&gt;
&lt;br /&gt;
PCM encoding can be characterized in three parameters, making it easy&lt;br /&gt;
to account for every possible PCM variant with mercifully little&lt;br /&gt;
hassle.&lt;br /&gt;
&lt;br /&gt;
===sample rate===&lt;br /&gt;
&lt;br /&gt;
The first parameter is the sampling rate.  The highest frequency an&lt;br /&gt;
encoding can represent is called the Nyquist Frequency.  The Nyquist&lt;br /&gt;
frequency of PCM happens to be exactly half the sampling rate.&lt;br /&gt;
Therefore, the sampling rate directly determines the highest possible&lt;br /&gt;
frequency in the digitized signal.&lt;br /&gt;
&lt;br /&gt;
Analog telephone systems traditionally band-limited voice channels to&lt;br /&gt;
just under 4kHz, so digital telephony and most classic voice&lt;br /&gt;
applications use an 8kHz sampling rate: the minimum sampling rate&lt;br /&gt;
necessary to capture the entire bandwidth of a 4kHz channel.  This is&lt;br /&gt;
what an 8kHz sampling rate sounds like&amp;amp;mdash;a bit muffled but perfectly&lt;br /&gt;
intelligible for voice.  This is the lowest sampling rate that&#039;s ever&lt;br /&gt;
been used widely in practice.&lt;br /&gt;
&lt;br /&gt;
From there, as power, and memory, and storage increased, consumer&lt;br /&gt;
computer hardware went to offering 11, and then 16, and then 22, and&lt;br /&gt;
then 32kHz sampling.  With each increase in the sampling rate and the&lt;br /&gt;
Nyquist frequency, it&#039;s obvious that the high end becomes a little&lt;br /&gt;
clearer and the sound more natural.&lt;br /&gt;
&lt;br /&gt;
The Compact Disc uses a 44.1kHz sampling rate, which is again slightly&lt;br /&gt;
better than 32kHz, but the gains are becoming less distinct.  44.1kHz&lt;br /&gt;
is a bit of an oddball choice, especially given that it hadn&#039;t been&lt;br /&gt;
used for anything prior to the compact disc, but the huge success of&lt;br /&gt;
the CD has made it a common rate.&lt;br /&gt;
&lt;br /&gt;
The most common hi-fidelity sampling rate aside from the CD is 48kHz.&lt;br /&gt;
There&#039;s virtually no audible difference between the two.  This video,&lt;br /&gt;
or at least the original version of it, was shot and produced with&lt;br /&gt;
48kHz audio, which happens to be the original standard for&lt;br /&gt;
high-fidelity audio with video.&lt;br /&gt;
&lt;br /&gt;
Super-hi-fidelity sampling rates of 88, and 96, and 192kHz have also&lt;br /&gt;
appeared. The reason for the sampling rates beyond 48kHz isn&#039;t to&lt;br /&gt;
extend the audible high frequencies further. It&#039;s for a different&lt;br /&gt;
reason.&lt;br /&gt;
&lt;br /&gt;
[[Image:Dmpfg_008.jpg|thumb|360px|right]]&lt;br /&gt;
Stepping back for just a second, the French mathematician Jean&lt;br /&gt;
Baptiste Joseph Fourier showed that we can also think of signals like&lt;br /&gt;
audio as a set of component frequencies.  This frequency-domain&lt;br /&gt;
representation is equivalent to the time representation; the signal is&lt;br /&gt;
exactly the same, we&#039;re just looking at it a different way.  Here we see the&lt;br /&gt;
frequency-domain representation of a hypothetical analog signal we&lt;br /&gt;
intend to digitally sample.&lt;br /&gt;
&lt;br /&gt;
The sampling theorem tells us two essential things about the sampling&lt;br /&gt;
process. First, that a digital signal can&#039;t represent any&lt;br /&gt;
frequencies above the Nyquist frequency. Second, and this is the new&lt;br /&gt;
part, if we don&#039;t remove those frequencies with a low-pass filter&lt;br /&gt;
before sampling, the sampling process will fold them down into the&lt;br /&gt;
representable frequency range as aliasing distortion.&lt;br /&gt;
&lt;br /&gt;
Aliasing, in a nutshell, sounds freakin&#039; awful, so it&#039;s essential to&lt;br /&gt;
remove any beyond-Nyquist frequencies before sampling and after&lt;br /&gt;
reconstruction.&lt;br /&gt;
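&lt;br /&gt;
To sketch where an unfiltered tone actually lands, here&#039;s the&lt;br /&gt;
fold-down in a few lines of Python; the 48kHz rate and the test&lt;br /&gt;
frequencies are just examples:&lt;br /&gt;
&lt;br /&gt;
RATE = 48000                    # sampling rate, Hz&lt;br /&gt;
NYQUIST = RATE / 2              # 24000 Hz&lt;br /&gt;
&lt;br /&gt;
def alias(f):&lt;br /&gt;
    # frequencies beyond Nyquist reflect back into the representable range&lt;br /&gt;
    f = f % RATE&lt;br /&gt;
    return RATE - f if f &amp;gt; NYQUIST else f&lt;br /&gt;
&lt;br /&gt;
print(alias(30000))   # a 30kHz tone folds down to 18kHz&lt;br /&gt;
print(alias(18000))   # an 18kHz tone is below Nyquist and passes unchanged&lt;br /&gt;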
&lt;br /&gt;
Human frequency perception is considered to extend to about 20kHz. In&lt;br /&gt;
44.1 or 48kHz sampling, the low pass before the sampling stage has to&lt;br /&gt;
be extremely sharp to avoid cutting any audible frequencies below&lt;br /&gt;
20kHz but still not allow frequencies above the Nyquist to leak&lt;br /&gt;
forward into the sampling process.  This is a difficult filter to&lt;br /&gt;
build, and no practical filter succeeds completely. If the sampling&lt;br /&gt;
rate is 96kHz or 192kHz on the other hand, the low pass has an extra&lt;br /&gt;
octave or two for its transition band. This is a much easier filter to&lt;br /&gt;
build.  Sampling rates beyond 48kHz are actually one of those messy&lt;br /&gt;
analog stage compromises.&lt;br /&gt;
&lt;br /&gt;
===sample format===&lt;br /&gt;
&lt;br /&gt;
The second fundamental PCM parameter is the sample format; that is,&lt;br /&gt;
the format of each digital number.  A number is a number, but a number&lt;br /&gt;
can be represented in bits in a number of different ways.&lt;br /&gt;
&lt;br /&gt;
Early PCM was eight-bit linear, encoded as an unsigned byte.  The&lt;br /&gt;
dynamic range is limited to about 50dB  and the quantization noise, as&lt;br /&gt;
you can hear, is pretty severe.  Eight-bit audio is vanishingly rare&lt;br /&gt;
today.&lt;br /&gt;
&lt;br /&gt;
Digital telephony typically uses one of two related non-linear&lt;br /&gt;
eight-bit encodings called A-law and mu-law. These formats encode a roughly&lt;br /&gt;
14 bit dynamic range into eight bits by spacing the higher amplitude&lt;br /&gt;
values farther apart. A-law and mu-law obviously improve quantization&lt;br /&gt;
noise compared to linear 8-bit, and voice harmonics especially hide&lt;br /&gt;
the remaining quantization noise well. All three eight-bit encodings,&lt;br /&gt;
linear, A-law, and mu-law, are typically paired with an 8kHz sampling&lt;br /&gt;
rate, though I&#039;m demonstrating them here at 48kHz.&lt;br /&gt;
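&lt;br /&gt;
The mu-law curve itself is simple enough to sketch in a few lines of&lt;br /&gt;
Python.  This is the continuous form of the encoder, ignoring the&lt;br /&gt;
bit-exact table details a real telephony codec uses:&lt;br /&gt;
&lt;br /&gt;
import math&lt;br /&gt;
&lt;br /&gt;
MU = 255.0    # the mu-law constant used in North American telephony&lt;br /&gt;
&lt;br /&gt;
def mulaw(x):&lt;br /&gt;
    # x is a linear sample in -1.0..1.0; the output is companded&lt;br /&gt;
    return math.copysign(math.log1p(MU * abs(x)) / math.log1p(MU), x)&lt;br /&gt;
&lt;br /&gt;
print(mulaw(0.01))   # ~0.23: quiet samples get stretched apart&lt;br /&gt;
print(mulaw(0.5))    # ~0.88: loud samples get squeezed together&lt;br /&gt;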
&lt;br /&gt;
Most modern PCM uses 16- or 24-bit two&#039;s-complement signed integers to&lt;br /&gt;
encode the range from negative infinity to zero decibels in 16 or 24&lt;br /&gt;
bits of precision. The maximum absolute value corresponds to zero decibels.&lt;br /&gt;
As with all the sample formats so far, signals beyond zero decibels, and thus&lt;br /&gt;
beyond the maximum representable range, are clipped.&lt;br /&gt;
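&lt;br /&gt;
In code, the 16-bit case looks something like this sketch: samples&lt;br /&gt;
arrive as floats where +/-1.0 is zero decibels, and anything beyond&lt;br /&gt;
the representable range simply clips:&lt;br /&gt;
&lt;br /&gt;
def float_to_int16(x):&lt;br /&gt;
    # clamp to the representable range, then scale to 16-bit signed&lt;br /&gt;
    x = max(-1.0, min(1.0, x))&lt;br /&gt;
    return round(x * 32767)&lt;br /&gt;
&lt;br /&gt;
print(float_to_int16(0.5))   # 16384&lt;br /&gt;
print(float_to_int16(1.2))   # 32767: anything past zero decibels clips&lt;br /&gt;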
&lt;br /&gt;
In mixing and mastering, it&#039;s not unusual to use floating-point&lt;br /&gt;
numbers for PCM instead of integers.  A 32-bit IEEE 754 float, that&#039;s&lt;br /&gt;
the normal kind of floating point you see on current computers, has 24&lt;br /&gt;
bits of resolution, but an eight-bit floating-point exponent increases&lt;br /&gt;
the representable range.  Floating point usually represents zero&lt;br /&gt;
decibels as +/-1.0, and because floats can obviously represent&lt;br /&gt;
considerably beyond that, temporarily exceeding zero decibels during&lt;br /&gt;
the mixing process doesn&#039;t cause clipping.  Floating-point PCM takes&lt;br /&gt;
up more space, so it tends to be used only as an intermediate&lt;br /&gt;
production format.&lt;br /&gt;
&lt;br /&gt;
Lastly, most general purpose computers still read and&lt;br /&gt;
write data in octet bytes, so it&#039;s important to remember that samples&lt;br /&gt;
bigger than eight bits can be in big- or little-endian order, and both&lt;br /&gt;
endiannesses are common.  For example, Microsoft WAV files are little-endian,&lt;br /&gt;
and Apple AIFC files tend to be big-endian.  Be aware of it.&lt;br /&gt;
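&lt;br /&gt;
Python&#039;s struct module makes the distinction easy to see; both&lt;br /&gt;
lines pack the same 16-bit sample, just with the bytes swapped:&lt;br /&gt;
&lt;br /&gt;
import struct&lt;br /&gt;
&lt;br /&gt;
sample = 1000                            # one 16-bit signed sample, 0x03E8&lt;br /&gt;
print(struct.pack(&#039;&amp;lt;h&#039;, sample))    # b&#039;\xe8\x03&#039;: little-endian, as in WAV&lt;br /&gt;
print(struct.pack(&#039;&amp;gt;h&#039;, sample))    # b&#039;\x03\xe8&#039;: big-endian, as in AIFF-C&lt;br /&gt;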
&lt;br /&gt;
===channels===&lt;br /&gt;
&lt;br /&gt;
The third PCM parameter is the number of channels.  The convention in&lt;br /&gt;
raw PCM is to encode multiple channels by interleaving the samples of&lt;br /&gt;
each channel together into a single stream.  Straightforward and extensible.&lt;br /&gt;
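&lt;br /&gt;
A sketch of that convention for stereo; the left and right sample&lt;br /&gt;
lists simply zip together into one stream:&lt;br /&gt;
&lt;br /&gt;
left  = [10, 11, 12]             # left-channel samples&lt;br /&gt;
right = [20, 21, 22]             # right-channel samples&lt;br /&gt;
&lt;br /&gt;
# interleave: L R L R L R...&lt;br /&gt;
stream = [s for frame in zip(left, right) for s in frame]&lt;br /&gt;
print(stream)                    # [10, 20, 11, 21, 12, 22]&lt;br /&gt;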
&lt;br /&gt;
===done!===&lt;br /&gt;
&lt;br /&gt;
And that&#039;s it!  That describes every PCM representation ever.  Done.&lt;br /&gt;
Digital audio is &#039;&#039;so easy&#039;&#039;!  There&#039;s more to do of course, but at this&lt;br /&gt;
point we&#039;ve got a nice useful chunk of audio data, so let&#039;s get some&lt;br /&gt;
video too.&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
&amp;lt;center&amp;gt;&amp;lt;div style=&amp;quot;background-color:#DDDDFF;border-color:#CCCCDD;border-style:solid;width:80%;padding:0 1em 1em 1em;text-align:left;&amp;quot;&amp;gt;&lt;br /&gt;
&#039;&#039;&#039;Going deeper…&#039;&#039;&#039;&lt;br /&gt;
* Fixme&lt;br /&gt;
&amp;lt;/div&amp;gt;&amp;lt;/center&amp;gt;&lt;br /&gt;
&lt;br /&gt;
&amp;lt;br style=&amp;quot;clear:both;&amp;quot;/&amp;gt;&lt;br /&gt;
&lt;br /&gt;
==Video vegetables (they&#039;re good for you!)==&lt;br /&gt;
&amp;lt;small&amp;gt;[[Talk:A_Digital_Media_Primer_For_Geeks_(episode_1)#Video_vegetables_.28they.27re_good_for_you.21.29|Discuss this section]]&amp;lt;/small&amp;gt;&lt;br /&gt;
&lt;br /&gt;
One could think of video as being like audio but with two additional&lt;br /&gt;
spatial dimensions, X and Y, in addition to the dimension of time.&lt;br /&gt;
This is mathematically sound. The Sampling Theorem applies to all&lt;br /&gt;
three video dimensions just as it does the single time dimension of&lt;br /&gt;
audio.&lt;br /&gt;
&lt;br /&gt;
Audio and video are obviously quite different in practice. For one,&lt;br /&gt;
compared to audio, video is huge. Raw CD audio is about 1.4 megabits&lt;br /&gt;
per second. Raw 1080i HD video is over 700 megabits per second. That&#039;s&lt;br /&gt;
more than 500 times more data to capture, process, and store per&lt;br /&gt;
second.  By Moore&#039;s law... that&#039;s... let&#039;s see... roughly eight&lt;br /&gt;
doublings times two years, so yeah, computers requiring about an extra&lt;br /&gt;
fifteen years to handle raw video after getting raw audio down pat was&lt;br /&gt;
about right.&lt;br /&gt;
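&lt;br /&gt;
The back-of-the-envelope version of that arithmetic, assuming 8-bit&lt;br /&gt;
4:2:0 video (one common raw layout) and standard CD audio:&lt;br /&gt;
&lt;br /&gt;
cd_bps = 44100 * 16 * 2             # raw CD audio: ~1.4 megabits/second&lt;br /&gt;
video_bps = 1920 * 1080 * 30 * 12   # 8-bit 4:2:0 at ~30 frames/second is&lt;br /&gt;
                                    # 12 bits/pixel: ~750 megabits/second&lt;br /&gt;
print(video_bps / cd_bps)           # ~529: more than 500 times the data&lt;br /&gt;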
&lt;br /&gt;
Basic raw video is also just more complex than basic raw audio. The&lt;br /&gt;
sheer volume of data currently necessitates a representation more&lt;br /&gt;
efficient than the linear PCM used for audio.  In addition, electronic&lt;br /&gt;
video comes almost entirely from broadcast television, and the&lt;br /&gt;
standards committees that govern broadcast video have always been very&lt;br /&gt;
concerned with backward compatibility.  Up until just last year in the&lt;br /&gt;
US, a sixty-year-old black and white television could still show a&lt;br /&gt;
normal analog television broadcast.  That&#039;s actually a really neat&lt;br /&gt;
trick.&lt;br /&gt;
&lt;br /&gt;
The downside to backward compatibility is that once a detail makes it&lt;br /&gt;
into a standard, you can&#039;t ever really throw it out again. Electronic&lt;br /&gt;
video has never started over from scratch the way audio has multiple&lt;br /&gt;
times.  Sixty years worth of clever but obsolete hacks necessitated by&lt;br /&gt;
the passing technology of a given era have built up into quite a pile,&lt;br /&gt;
and because digital standards also come from broadcast television, all&lt;br /&gt;
these eldritch hacks have been brought forward into the digital&lt;br /&gt;
standards as well.&lt;br /&gt;
&lt;br /&gt;
In short, there are a whole lot more details involved in digital video&lt;br /&gt;
than there were with audio. There&#039;s no hope of covering them&lt;br /&gt;
all completely here, so we&#039;ll cover the broad fundamentals.&lt;br /&gt;
&lt;br /&gt;
===resolution and aspect===&lt;br /&gt;
&lt;br /&gt;
The most obvious raw video parameters are the width and height of the&lt;br /&gt;
picture in pixels. As simple as that may sound, the pixel dimensions&lt;br /&gt;
alone don&#039;t actually specify the absolute width and height of the&lt;br /&gt;
picture, as most broadcast-derived video doesn&#039;t use square pixels.&lt;br /&gt;
The number of scanlines in a broadcast image was fixed, but the&lt;br /&gt;
effective number of horizontal pixels was a function of channel&lt;br /&gt;
bandwidth. Effective horizontal resolution could result in pixels that&lt;br /&gt;
were either narrower or wider than the spacing between scanlines.&lt;br /&gt;
&lt;br /&gt;
Standards have generally specified that digitally sampled video should&lt;br /&gt;
reflect the real resolution of the original analog source, so a large&lt;br /&gt;
amount of digital video also uses non-square pixels. For example, a&lt;br /&gt;
normal 4:3 aspect NTSC DVD is typically encoded with a display&lt;br /&gt;
resolution of 704 by 480, a ratio wider than 4:3.  In this case, the&lt;br /&gt;
pixels themselves are assigned an aspect ratio of 10:11, making them&lt;br /&gt;
taller than they are wide and narrowing the image horizontally to the&lt;br /&gt;
correct aspect.  Such an image has to be resampled to show properly on&lt;br /&gt;
a digital display with square pixels.&lt;br /&gt;
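&lt;br /&gt;
The arithmetic behind that example is a one-liner; multiply the&lt;br /&gt;
stored width by the pixel aspect ratio to get the width in square&lt;br /&gt;
pixels:&lt;br /&gt;
&lt;br /&gt;
stored_w, stored_h = 704, 480   # what the DVD actually stores&lt;br /&gt;
par = 10 / 11                   # pixel aspect ratio (width over height)&lt;br /&gt;
&lt;br /&gt;
display_w = stored_w * par      # 640.0 square pixels wide&lt;br /&gt;
print(display_w / stored_h)     # 1.333...: the intended 4:3 display aspect&lt;br /&gt;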
&lt;br /&gt;
===frame rate and interlacing===&lt;br /&gt;
&lt;br /&gt;
The second obvious video parameter is the frame rate, the number of&lt;br /&gt;
full frames per second.  Several standard frame rates are in active&lt;br /&gt;
use. Digital video, in one form or another, can use all of them.  Or,&lt;br /&gt;
any other frame rate.  Or even variable rates where the frame rate&lt;br /&gt;
changes adaptively over the course of the video. The higher the frame&lt;br /&gt;
rate, the smoother the motion and that brings us, unfortunately, to&lt;br /&gt;
interlacing.&lt;br /&gt;
&lt;br /&gt;
In the very earliest days of broadcast video, engineers sought the&lt;br /&gt;
fastest practical frame rate to smooth motion and to minimize flicker&lt;br /&gt;
on phosphor-based CRTs.  They were also under pressure to use the&lt;br /&gt;
least possible bandwidth for the highest resolution and fastest frame&lt;br /&gt;
rate.  Their solution was to interlace the video where the even lines&lt;br /&gt;
are sent in one pass and the odd lines in the next.  Each pass is&lt;br /&gt;
called a field and two fields sort of produce one complete frame.&lt;br /&gt;
&amp;quot;Sort of&amp;quot;, because the even and odd fields aren&#039;t actually from the&lt;br /&gt;
same source frame.  In a 60 field per second picture, the source frame&lt;br /&gt;
rate is actually 60 full frames per second, and half of each frame,&lt;br /&gt;
every other line, is simply discarded.  This is why we can&#039;t&lt;br /&gt;
deinterlace a video simply by combining two fields into one frame;&lt;br /&gt;
they&#039;re not actually from one frame to begin with.&lt;br /&gt;
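&lt;br /&gt;
Splitting a frame into fields is nothing more than taking every&lt;br /&gt;
other scanline, as in this toy sketch; the catch, as above, is that&lt;br /&gt;
in broadcast material the two fields sample different instants in&lt;br /&gt;
time:&lt;br /&gt;
&lt;br /&gt;
frame = list(range(8))          # scanlines 0..7 of a tiny example frame&lt;br /&gt;
&lt;br /&gt;
top_field    = frame[0::2]      # even lines: [0, 2, 4, 6]&lt;br /&gt;
bottom_field = frame[1::2]      # odd lines:  [1, 3, 5, 7]&lt;br /&gt;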
&lt;br /&gt;
===gamma===&lt;br /&gt;
&lt;br /&gt;
The cathode ray tube was the only available display technology for&lt;br /&gt;
most of the history of electronic video. A CRT&#039;s output brightness is&lt;br /&gt;
nonlinear, approximately equal to the input controlling voltage raised&lt;br /&gt;
to the 2.5th power. This exponent, 2.5, is designated gamma, and so&lt;br /&gt;
it&#039;s often referred to as the gamma of a display.  Cameras, though,&lt;br /&gt;
are linear, and if you feed a CRT a linear input signal, it looks a&lt;br /&gt;
bit like this.&lt;br /&gt;
&lt;br /&gt;
As there were originally to be very few cameras, which were&lt;br /&gt;
fantastically expensive anyway, and hopefully many, many television&lt;br /&gt;
sets, which had best be as inexpensive as possible, engineers decided to&lt;br /&gt;
add the necessary gamma correction circuitry to the cameras rather&lt;br /&gt;
than the sets. Video transmitted over the airwaves would thus have a&lt;br /&gt;
nonlinear intensity using the inverse of the set&#039;s gamma exponent, so that&lt;br /&gt;
once a camera&#039;s signal was finally displayed on the CRT, the overall&lt;br /&gt;
response of the system from camera to set was back to linear again.&lt;br /&gt;
&lt;br /&gt;
Almost.&lt;br /&gt;
&lt;br /&gt;
There were also two other tweaks. A television camera actually uses a&lt;br /&gt;
gamma exponent that&#039;s the inverse of 2.2, not 2.5.  That&#039;s just a&lt;br /&gt;
correction for viewing in a dim environment. Also, the exponential&lt;br /&gt;
curve transitions to a linear ramp near black.  That&#039;s just an old&lt;br /&gt;
hack for suppressing sensor noise in the camera.&lt;br /&gt;
&lt;br /&gt;
Gamma correction also had a lucky benefit. It just so happens that the&lt;br /&gt;
human eye has a perceptual gamma of about 3.  This is relatively close&lt;br /&gt;
to the CRT&#039;s gamma of 2.5. An image using gamma correction devotes&lt;br /&gt;
more resolution to lower intensities, where the eye happens to have&lt;br /&gt;
its finest intensity discrimination, and therefore uses the available&lt;br /&gt;
scale resolution more efficiently.  Although CRTs are currently&lt;br /&gt;
vanishing, a standard sRGB computer display still uses a nonlinear&lt;br /&gt;
intensity curve similar to television, with a linear ramp near black,&lt;br /&gt;
followed by an exponential curve with a gamma exponent of 2.4. This&lt;br /&gt;
encodes a sixteen bit linear range down into eight bits.&lt;br /&gt;
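&lt;br /&gt;
That sRGB transfer curve is easy to write down.  This sketch follows&lt;br /&gt;
the standard constants: a linear ramp below a small threshold, then&lt;br /&gt;
the exponential segment:&lt;br /&gt;
&lt;br /&gt;
def srgb_encode(linear):&lt;br /&gt;
    # linear is 0.0..1.0; returns the gamma-encoded sRGB value&lt;br /&gt;
    if linear &amp;lt;= 0.0031308:&lt;br /&gt;
        return 12.92 * linear                    # linear ramp near black&lt;br /&gt;
    return 1.055 * linear ** (1 / 2.4) - 0.055   # exponential segment&lt;br /&gt;
&lt;br /&gt;
print(srgb_encode(0.5))   # ~0.735: middle grey lands high in the code range&lt;br /&gt;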
&lt;br /&gt;
===color and colorspace===&lt;br /&gt;
&lt;br /&gt;
The human eye has three apparent color channels, red, green, and blue,&lt;br /&gt;
and most displays use these three colors as additive primaries to&lt;br /&gt;
produce a full range of color output.  The primary pigments in&lt;br /&gt;
printing are Cyan, Magenta, and Yellow for the same reason; pigments&lt;br /&gt;
are subtractive, and each of these pigments subtracts one pure color&lt;br /&gt;
from reflected light.  Cyan subtracts red, magenta subtracts green, and&lt;br /&gt;
yellow subtracts blue.&lt;br /&gt;
&lt;br /&gt;
Video can be, and sometimes is, represented with red, green, and blue&lt;br /&gt;
color channels, but RGB video is atypical. The human eye is far more&lt;br /&gt;
sensitive to luminosity than it is to color, and RGB tends to spread&lt;br /&gt;
the energy of an image across all three color channels.  That is, the&lt;br /&gt;
red plane looks like a red version of the original picture, the green&lt;br /&gt;
plane looks like a green version of the original picture, and the blue&lt;br /&gt;
plane looks like a blue version of the original picture.  Black and&lt;br /&gt;
white times three.  Not efficient.&lt;br /&gt;
&lt;br /&gt;
For those reasons and because, oh hey, television just happened to&lt;br /&gt;
start out as black and white anyway, video usually is represented as a&lt;br /&gt;
high resolution luma channel&amp;amp;mdash;the black &amp;amp; white&amp;amp;mdash;along with&lt;br /&gt;
additional, often lower resolution chroma channels, the color. The&lt;br /&gt;
luma channel, Y, is produced by weighting and then adding the separate&lt;br /&gt;
red, green and blue signals.  The chroma channels U and V are then&lt;br /&gt;
produced by subtracting the luma signal from blue and the luma signal&lt;br /&gt;
from red.&lt;br /&gt;
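&lt;br /&gt;
As a sketch, using the classic BT.601 luma weights (other standards&lt;br /&gt;
use slightly different ones, and real systems also scale U and V),&lt;br /&gt;
the conversion is just:&lt;br /&gt;
&lt;br /&gt;
def rgb_to_yuv(r, g, b):&lt;br /&gt;
    # r, g, b in 0.0..1.0; BT.601 weights for the luma sum&lt;br /&gt;
    y = 0.299 * r + 0.587 * g + 0.114 * b&lt;br /&gt;
    return y, b - y, r - y      # luma, blue-minus-luma, red-minus-luma&lt;br /&gt;
&lt;br /&gt;
print(rgb_to_yuv(1.0, 1.0, 1.0))   # white: (1.0, 0.0, 0.0), no chroma&lt;br /&gt;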
&lt;br /&gt;
When YUV is scaled, offset, and quantized for digital video, it&#039;s&lt;br /&gt;
usually more correctly called Y&#039;CbCr, but the more generic term YUV is&lt;br /&gt;
widely used to describe all the analog and digital variants of this&lt;br /&gt;
color model.&lt;br /&gt;
&lt;br /&gt;
===chroma subsampling===&lt;br /&gt;
&lt;br /&gt;
The U and V chroma channels can have the same resolution as the Y&lt;br /&gt;
channel, but because the human eye has far less spatial color&lt;br /&gt;
resolution than spatial luminosity resolution, chroma resolution is&lt;br /&gt;
usually halved or even quartered in the horizontal direction, the&lt;br /&gt;
vertical direction, or both, usually without any significant impact on the&lt;br /&gt;
apparent raw image quality.  Practically every possible subsampling&lt;br /&gt;
variant has been used at one time or another, but the common choices&lt;br /&gt;
today are 4:4:4 video, which isn&#039;t actually subsampled at all, 4:2:2 video in&lt;br /&gt;
which the horizontal resolution of the U and V channels is halved, and&lt;br /&gt;
most common of all, 4:2:0 video in which both the horizontal and vertical&lt;br /&gt;
resolutions of the chroma channels are halved, resulting in U and V&lt;br /&gt;
planes that are each one quarter the size of Y.&lt;br /&gt;
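&lt;br /&gt;
The storage savings are easy to tally.  For an 8-bit 4:2:0 frame,&lt;br /&gt;
each chroma plane holds one quarter as many samples as luma:&lt;br /&gt;
&lt;br /&gt;
w, h = 1920, 1080&lt;br /&gt;
y_size  = w * h                  # full-resolution luma plane&lt;br /&gt;
uv_size = (w // 2) * (h // 2)    # each chroma plane: half width, half height&lt;br /&gt;
&lt;br /&gt;
total = y_size + 2 * uv_size     # one byte per sample&lt;br /&gt;
print(total / (3 * y_size))      # 0.5: half the size of unsubsampled 4:4:4&lt;br /&gt;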
&lt;br /&gt;
The terms 4:2:2, 4:2:0, 4:1:1, and so on and so forth, aren&#039;t complete&lt;br /&gt;
descriptions of a chroma subsampling. There are multiple possible ways&lt;br /&gt;
to position the chroma pixels relative to luma, and again, several&lt;br /&gt;
variants are in active use for each subsampling.  For example, motion&lt;br /&gt;
JPEG, MPEG-1 video, MPEG-2 video, DV, Theora, and WebM all use or can&lt;br /&gt;
use 4:2:0 subsampling, but they site the chroma pixels three different&lt;br /&gt;
ways.&lt;br /&gt;
&lt;br /&gt;
Motion JPEG, MPEG1 video, Theora and WebM all site chroma pixels&lt;br /&gt;
between luma pixels both horizontally and vertically.&lt;br /&gt;
&lt;br /&gt;
MPEG2 sites chroma pixels between lines, but horizontally aligned with&lt;br /&gt;
every other luma pixel. Interlaced modes complicate things somewhat,&lt;br /&gt;
resulting in a siting arrangement that&#039;s a tad bizarre.&lt;br /&gt;
&lt;br /&gt;
And finally PAL-DV, which is always interlaced, places the chroma&lt;br /&gt;
pixels in the same position as every other luma pixel in the&lt;br /&gt;
horizontal direction, and vertically alternates chroma channel on&lt;br /&gt;
each line.&lt;br /&gt;
&lt;br /&gt;
That&#039;s just 4:2:0 video. I&#039;ll leave the other subsamplings as homework for the&lt;br /&gt;
viewer.  Got the basic idea, moving on.&lt;br /&gt;
&lt;br /&gt;
===pixel formats===&lt;br /&gt;
&lt;br /&gt;
In audio, we always represent multiple channels in a PCM stream by&lt;br /&gt;
interleaving the samples from each channel in order. Video uses both&lt;br /&gt;
packed formats that interleave the color channels, as well as planar&lt;br /&gt;
formats that keep the pixels from each channel together in separate&lt;br /&gt;
planes stacked in order in the frame. There are at least 50 different formats in&lt;br /&gt;
these two broad categories with possibly ten or fifteen in common use.&lt;br /&gt;
&lt;br /&gt;
Each chroma subsampling and different bit-depth requires a different&lt;br /&gt;
packing arrangement, and so a different pixel format.  For a given&lt;br /&gt;
unique subsampling, there are usually also several equivalent formats&lt;br /&gt;
that consist of trivial channel order rearrangements or repackings, due either to&lt;br /&gt;
convenience once-upon-a-time on some particular piece of hardware, or&lt;br /&gt;
sometimes just good old-fashioned spite.&lt;br /&gt;
&lt;br /&gt;
Pixel formats are described by a unique name or fourcc code.  There&lt;br /&gt;
are quite a few of these and there&#039;s no sense going over each one now.&lt;br /&gt;
Google is your friend.  Be aware that fourcc codes for raw video&lt;br /&gt;
specify the pixel arrangement and chroma subsampling, but generally&lt;br /&gt;
don&#039;t imply anything certain about chroma siting or color space.  YV12&lt;br /&gt;
video, to pick one, can use JPEG, MPEG-2 or DV chroma siting, and any&lt;br /&gt;
one of several YUV colorspace definitions.&lt;br /&gt;
&lt;br /&gt;
===done!===&lt;br /&gt;
&lt;br /&gt;
That wraps up our not-so-quick and yet very incomplete tour of raw&lt;br /&gt;
video. The good news is we can already get quite a lot of real work&lt;br /&gt;
done using that overview. In plenty of situations, a frame of video&lt;br /&gt;
data is a frame of video data.  The details matter, greatly, when it&lt;br /&gt;
comes time to write software, but for now I am satisfied that the&lt;br /&gt;
esteemed viewer is broadly aware of the relevant issues.&lt;br /&gt;
&lt;br /&gt;
&amp;lt;center&amp;gt;&amp;lt;div style=&amp;quot;background-color:#DDDDFF;border-color:#CCCCDD;border-style:solid;width:80%;padding:0 1em 1em 1em;text-align:left;&amp;quot;&amp;gt;&lt;br /&gt;
&#039;&#039;&#039;Going deeper…&#039;&#039;&#039;&lt;br /&gt;
* Fixme&lt;br /&gt;
&amp;lt;/div&amp;gt;&amp;lt;/center&amp;gt;&lt;br /&gt;
&lt;br /&gt;
&amp;lt;br style=&amp;quot;clear:both;&amp;quot;/&amp;gt;&lt;br /&gt;
&lt;br /&gt;
==Containers==&lt;br /&gt;
&amp;lt;small&amp;gt;[[Talk:A_Digital_Media_Primer_For_Geeks_(episode_1)#Containers|Discuss this section]]&amp;lt;/small&amp;gt;&lt;br /&gt;
&lt;br /&gt;
So. We have audio data. We have video data. What remains is the more&lt;br /&gt;
familiar non-signal data and straight-up engineering that software&lt;br /&gt;
developers are used to, and plenty of it.&lt;br /&gt;
&lt;br /&gt;
Chunks of raw audio and video data have no externally-visible&lt;br /&gt;
structure, but they&#039;re often uniformly sized.  We could just string&lt;br /&gt;
them together in a rigid predetermined ordering for streaming and&lt;br /&gt;
storage, and some simple systems do approximately that. Compressed&lt;br /&gt;
frames, though, aren&#039;t necessarily a predictable size, and we usually want&lt;br /&gt;
some flexibility in using a range of different data types in streams.&lt;br /&gt;
If we string random formless data together, we lose the boundaries&lt;br /&gt;
that separate frames and don&#039;t necessarily know what data belongs to&lt;br /&gt;
which streams.  A stream needs some generalized structure to be&lt;br /&gt;
generally useful.&lt;br /&gt;
&lt;br /&gt;
In addition to our signal data, we also have our PCM and video&lt;br /&gt;
parameters.  There&#039;s probably plenty of other metadata we also want to&lt;br /&gt;
deal with, like audio tags and video chapters and subtitles, all&lt;br /&gt;
essential components of rich media.  It makes sense to place this&lt;br /&gt;
metadata&amp;amp;mdash;that is, data about the data&amp;amp;mdash;within the media itself.&lt;br /&gt;
&lt;br /&gt;
Storing and structuring formless data and disparate metadata is the&lt;br /&gt;
job of a container.  Containers provide framing for the data blobs,&lt;br /&gt;
interleave and identify multiple data streams, provide timing&lt;br /&gt;
information, and store the metadata necessary to parse, navigate,&lt;br /&gt;
manipulate, and present the media.  In general, any container can hold&lt;br /&gt;
any kind of data.  And data can be put into any container.&lt;br /&gt;
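&lt;br /&gt;
No real container is anywhere near this simple, but a toy sketch of&lt;br /&gt;
the framing idea (a stream tag, a timestamp, and a length in front of&lt;br /&gt;
each blob) might look like:&lt;br /&gt;
&lt;br /&gt;
import io, struct&lt;br /&gt;
&lt;br /&gt;
def write_packet(out, stream_id, timestamp, payload):&lt;br /&gt;
    # toy framing: which stream, when it plays, and how big the blob is&lt;br /&gt;
    out.write(struct.pack(&#039;&amp;lt;BQI&#039;, stream_id, timestamp, len(payload)))&lt;br /&gt;
    out.write(payload)&lt;br /&gt;
&lt;br /&gt;
buf = io.BytesIO()&lt;br /&gt;
write_packet(buf, 0, 0, b&#039;audio frame bytes&#039;)   # stream 0: audio&lt;br /&gt;
write_packet(buf, 1, 0, b&#039;video frame bytes&#039;)   # stream 1: video, interleaved&lt;br /&gt;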
&lt;br /&gt;
==Credits==&lt;br /&gt;
&lt;br /&gt;
In the past thirty minutes, we&#039;ve covered digital audio, video, some&lt;br /&gt;
history, some math and a little engineering. We&#039;ve barely scratched the&lt;br /&gt;
surface, but it&#039;s time for a well-earned break.&lt;br /&gt;
&lt;br /&gt;
There&#039;s so much more to talk about, so I hope you&#039;ll join me again in&lt;br /&gt;
our next episode.  Until then&amp;amp;mdash;Cheers!&lt;br /&gt;
&lt;br /&gt;
Written by:&lt;br /&gt;
Christopher (Monty) Montgomery&lt;br /&gt;
and the Xiph.Org Community&lt;br /&gt;
&lt;br /&gt;
Intro, title and credits music:&amp;lt;br&amp;gt;&lt;br /&gt;
&amp;quot;Boo Boo Coming&amp;quot;, by Joel Forrester&amp;lt;br&amp;gt;&lt;br /&gt;
Performed by the Microscopic Septet&amp;lt;br&amp;gt;&lt;br /&gt;
Used by permission of Cuneiform Records.&amp;lt;br&amp;gt;&lt;br /&gt;
Original source track All Rights Reserved.&amp;lt;br&amp;gt;&lt;br /&gt;
www.cuneiformrecords.com&lt;br /&gt;
&lt;br /&gt;
This Video Was Produced Entirely With Free and Open Source Software&lt;br /&gt;
&lt;br /&gt;
GNU&lt;br /&gt;
Linux&lt;br /&gt;
Fedora&lt;br /&gt;
Cinelerra&lt;br /&gt;
The Gimp&lt;br /&gt;
Audacity&lt;br /&gt;
Postfish&lt;br /&gt;
Gstreamer&lt;br /&gt;
&lt;br /&gt;
CC BY-NC-SA&lt;br /&gt;
A Co-Production of Xiph.Org and Red Hat Inc.&lt;br /&gt;
(C) 2010, Some Rights Reserved&lt;br /&gt;
&lt;br /&gt;
&amp;lt;br style=&amp;quot;clear:both;&amp;quot;/&amp;gt;&lt;br /&gt;
&lt;br /&gt;
==The making of…==&lt;br /&gt;
&amp;lt;small&amp;gt;[[Talk:A_Digital_Media_Primer_For_Geeks_(episode_1)#The_making_of.E2.80.A6|Discuss this section]]&amp;lt;/small&amp;gt;&lt;br /&gt;
===Equipment===&lt;br /&gt;
====Camera====&lt;br /&gt;
Canon HV40 HDV camera w/ wide-angle lens operating on a tripod.&lt;br /&gt;
&lt;br /&gt;
The wide angle lens gives the camera a nice close macro mode, and approximately triples the amount of light coming into the sensor for a given zoom/aperture.  Useful for shooting indoors at night.&lt;br /&gt;
&lt;br /&gt;
No additional lighting kit was used.&lt;br /&gt;
&lt;br /&gt;
====Audio====&lt;br /&gt;
&lt;br /&gt;
Two Crown PCC160 boundary microphones placed on a table approximately 4-8 feet in front of the speaker, run through a cheap Behringer portable mixer and into the camera&#039;s microphone input.  &lt;br /&gt;
&lt;br /&gt;
No additional audio kit was used.&lt;br /&gt;
&lt;br /&gt;
====Sundries====&lt;br /&gt;
&lt;br /&gt;
Whiteboard markers by &#039;Bic&#039;&lt;br /&gt;
&lt;br /&gt;
Drawing aids by Staedtler, McMaster Carr, and &#039;Generic&#039;.&lt;br /&gt;
&lt;br /&gt;
===Video shooting sequence===&lt;br /&gt;
&lt;br /&gt;
Scenes were pre-scripted and memorized, usually with lots of on-the-fly revision.  (In the future... I&#039;m totally getting a teleprompter.  Wh000.  OTOH, I can totally rattle off the entire video script from beginning to end as a party trick, thus ensuring I&#039;ll not be invited to many parties --Monty).&lt;br /&gt;
&lt;br /&gt;
Diagrams were drawn by hand on a physical whiteboard with whiteboard markers and magnetic T-squares, triangles, and yardsticks.  Despite looking a lot like greenscreen work, there is no image compositing in use (actually-- there are two small composites where an error in a whiteboard diagram was corrected by subtracting part of the original image and then adding a corrected version of the diagram).&lt;br /&gt;
&lt;br /&gt;
Camera operated in 24F shutter priority mode (Tv set to &amp;quot;24&amp;quot;) with exposure and white balance both calibrated to the white board (or a white piece of paper) and locked.  Microphone attenuation setting active, with gain locked such that room noise peaked at -40dB (all the rooms in the shooting sequences were noisy due to the building&#039;s ventilation system, or active equipment). Lighting in the whiteboard rooms tended to be odd, with little relative light cast on a presenter standing just in front of the whiteboard; a presenter is practically standing in the room&#039;s only shadow.  Most of the room light is focused on the table and walls.  Additional fill lighting kit would have been useful, but for the first vid, I didn&#039;t want &#039;perfect&#039; to be the enemy of &#039;good&#039;.&lt;br /&gt;
&lt;br /&gt;
Autofocus used for whiteboard scenes, manual focus used for several workshop scenes as the autofocus tended to hunt continuously in very low light.&lt;br /&gt;
&lt;br /&gt;
Continuous capture to a Thinkpad with firewire input via a simple gstreamer script.&lt;br /&gt;
&lt;br /&gt;
===Production sequence===&lt;br /&gt;
====All hail Cinelerra.  You better hail, or Cinelerra will get pissy about it.====&lt;br /&gt;
&lt;br /&gt;
Most of the production sequence hinged on making Cinelerra happy; it is a hulking rusty cast iron WWI tank of a program that can seem like it&#039;s composed entirely of compressed bugs.  That said, it was neither particularly crashy nor did it ever accidentally corrupt or lose work.  It was also the only FOSS editor with a working 2D compositor.  It got the job done once I found a workflow it would cope with (and fixed a number of bugs; these fixes are available from my cinelerra Git repo at http://git.xiph.org/?p=users/xiphmont/cinelerraCV.git;a=summary)&lt;br /&gt;
&lt;br /&gt;
====Choosing takes====&lt;br /&gt;
&lt;br /&gt;
Each shooting session yielded four to six hours of raw video.  The first step was to load the raw video into the cinelerra timeline, label each complete take, compare and choose the take to use, then render the chosen take out to a raw clip as a YUV4MPEG raw video file and a WAV raw audio file.  Be careful that Settings-&amp;gt;Align Cursor On Frames is set, else the audio and video renders won&#039;t start on the same boundary.&lt;br /&gt;
&lt;br /&gt;
====Postprocessing====&lt;br /&gt;
&lt;br /&gt;
At this point, the raw video clips were adjusted for gamma, contrast and saturation in gstreamer and mplayer.  In the earlier shoots the camera was underexposing due to pilot error, which required quite a bit of gamma and saturation inflation to &#039;correct&#039; (there is no real correction as the low-end data is gone, but it&#039;s possible to make it look better).  Later shoots used saner settings and the adjustments were mostly to keep different shooting sessions more uniform.  The whiteboard tends not to look white because it&#039;s mildly reflective, and picked up the color of the cyan and orange audio baffles in the room like a big diffuse mirror.&lt;br /&gt;
&lt;br /&gt;
The audio was both noisy (due to the building&#039;s ventilation system which either sounded like a low loud rumble or a jet-engine taking off) and reverberant (the rooms were glass on two sides and plaster on the other two).  Early takes used no additional sound absorbing material in the rooms, and the Postfish filtering and deverb was used heavily.  It gives the early audio in the vid a slightly odd, processed feel (I had almost decided the original audio was simply unusable).  Later takes used some big fleece &#039;soft flats&#039; in the room to absorb some additional reverb, and the later takes are less heavily filtered.&lt;br /&gt;
&lt;br /&gt;
The postfish filtering chain used declip (for the occasional overrange oops), deverb (remove room reverberation), multicompand (noise gating), single compand (for volume levelling) and EQ (the Crown mics are nice, but are very midrange heavy). &lt;br /&gt;
&lt;br /&gt;
====Special Effects====&lt;br /&gt;
&lt;br /&gt;
Audio special effects were one-offs, mostly done using Sox.  The processed demo sections of audio were then spliced back into the original audio takes using Audacity.&lt;br /&gt;
&lt;br /&gt;
Video special effects (e.g., removing a color channel, etc.) were done by writing quick, one-off filters in C for y4oi.  A few effects were done by dumping a take as a directory full of PNGs and then batch-processing the PNGs again using a one-off C program, then reassembling with mplayer.  Video effects were then stitched back into the original video takes in Cinelerra.&lt;br /&gt;
&lt;br /&gt;
====Editing====&lt;br /&gt;
&lt;br /&gt;
All editing was done in Cinelerra.  This primarily consisted of stitching the individual takes back together with crossfades.  All input and rendering output were done with raw YUV4MPEG and WAV files.  Note that making this work well and correctly required several patches to the YUV4MPEG handler and colorspace conversion code.&lt;br /&gt;
&lt;br /&gt;
====Encoding====&lt;br /&gt;
&lt;br /&gt;
Encoding was done by hand external to Cinelerra using mplayer for final postprocessing, the example_encoder included with the [Ptalarbvorm] Theora source distribution, and ivfenc for WebM.&lt;br /&gt;
&lt;br /&gt;
Sample Theora encode command line (note this is using an mplayer patched for y4o support; it could be done just as easily with a yuv4mpeg pipe):&lt;br /&gt;
&lt;br /&gt;
# 360p, 128-ish (a4) audio + 500-ish (v50) video&lt;br /&gt;
mplayer -vf hqdn3d,scale=640:360,gradfun=1.5,unsharp=l3x3:.1,yuv4ogg complete2.m2v -fast -noconsolecontrols -vo null &amp;gt; /dev/null &amp;amp; ~/MotherfishSVN/theora-ptalarbvorm/examples/encoder_example -a 4 -v 50 -k 240 complete2.wav output.y4o -o A_Digital_Media_Primer_For_Geeks-360p-a4+v50.ogv&lt;/div&gt;</summary>
		<author><name>Mindspillage</name></author>
	</entry>
	<entry>
		<id>https://wiki.xiph.org/index.php?title=Videos/A_Digital_Media_Primer_For_Geeks&amp;diff=12366</id>
		<title>Videos/A Digital Media Primer For Geeks</title>
		<link rel="alternate" type="text/html" href="https://wiki.xiph.org/index.php?title=Videos/A_Digital_Media_Primer_For_Geeks&amp;diff=12366"/>
		<updated>2010-09-21T22:22:41Z</updated>

		<summary type="html">&lt;p&gt;Mindspillage: /* Credits */ sp, punct&lt;/p&gt;
&lt;hr /&gt;
&lt;div&gt;&amp;lt;small&amp;gt;&#039;&#039;Wiki edition&#039;&#039;&amp;lt;/small&amp;gt;&lt;br /&gt;
[[Image:Dmpfg_001.jpg|thumb|360px|right]]&lt;br /&gt;
This first video from Xiph.Org presents the technical foundations of modern digital media via a half-hour firehose of information. One community member called it &amp;quot;a Uni lecture I never got but really wanted.&amp;quot;&lt;br /&gt;
&lt;br /&gt;
The program offers a brief history of digital media, a quick summary of the sampling theorem, and myriad details of low level audio and video characterization and formatting. It&#039;s intended for budding geeks looking to get into video coding, as well as the technically curious who want to know more about the media they wrangle for work or play.&lt;br /&gt;
&amp;lt;br/&amp;gt;&amp;lt;br/&amp;gt;&amp;lt;br/&amp;gt;&lt;br /&gt;
&amp;lt;center&amp;gt;&amp;lt;font size=&amp;quot;+2&amp;quot;&amp;gt;[http://www.xiph.org/video/vid1.shtml Download or Watch online]&amp;lt;/font&amp;gt;&amp;lt;/center&amp;gt;&lt;br /&gt;
&amp;lt;br style=&amp;quot;clear:both;&amp;quot;/&amp;gt;&lt;br /&gt;
&amp;lt;hr/&amp;gt;&lt;br /&gt;
&lt;br /&gt;
==Introduction==&lt;br /&gt;
&amp;lt;small&amp;gt;[[Talk:A_Digital_Media_Primer_For_Geeks_(episode_1)#Introduction|Discuss this section]]&amp;lt;/small&amp;gt;&lt;br /&gt;
&lt;br /&gt;
Workstations and high-end personal computers have been able to&lt;br /&gt;
manipulate digital audio pretty easily for about fifteen years now.&lt;br /&gt;
It&#039;s only been about five years that a decent workstation&#039;s been able&lt;br /&gt;
to handle raw video without a lot of expensive special purpose&lt;br /&gt;
hardware.&lt;br /&gt;
&lt;br /&gt;
But today even most cheap home PCs have the processor power and&lt;br /&gt;
storage necessary to really toss raw video around, at least without&lt;br /&gt;
too much of a struggle. So now that everyone has all of this cheap media-capable hardware, &lt;br /&gt;
more people, not surprisingly, want to do interesting&lt;br /&gt;
things with digital media, especially streaming. YouTube was the first huge&lt;br /&gt;
success, and now everybody wants in.&lt;br /&gt;
&lt;br /&gt;
Well good!  Because this stuff is a lot of fun!&lt;br /&gt;
&lt;br /&gt;
[[Image:Dmpfg_002.jpg|thumb|360px|right]]&lt;br /&gt;
It&#039;s no problem finding consumers for digital media.  But here I&#039;d&lt;br /&gt;
like to address the engineers, the mathematicians, the hackers, the&lt;br /&gt;
people who are interested in discovering and making things and&lt;br /&gt;
building the technology itself. The people after my own heart.&lt;br /&gt;
&lt;br /&gt;
Digital media, compression especially, is perceived to be super-elite,&lt;br /&gt;
somehow incredibly more difficult than anything else in computer&lt;br /&gt;
science. The big industry players in the field don&#039;t mind this&lt;br /&gt;
perception at all; it helps justify the staggering number of very&lt;br /&gt;
basic patents they hold.  They like the image that their media&lt;br /&gt;
researchers &amp;quot;are the best of the best, so much smarter than anyone&lt;br /&gt;
else that their brilliant ideas can&#039;t even be understood by mere&lt;br /&gt;
mortals.&amp;quot; This is bunk.&lt;br /&gt;
&lt;br /&gt;
Digital audio and video and streaming and compression offer endless&lt;br /&gt;
deep and stimulating mental challenges, just like any other&lt;br /&gt;
discipline. It seems elite because so few people have been&lt;br /&gt;
involved.  So few people have been involved perhaps because so few&lt;br /&gt;
people could afford the expensive, special-purpose equipment it&lt;br /&gt;
required. But today, just about anyone watching this video has a&lt;br /&gt;
cheap, general-purpose computer powerful enough to play with the big&lt;br /&gt;
boys. There are battles going on today around HTML5 and browsers and&lt;br /&gt;
video and open vs. closed.  So now is a pretty good time to get&lt;br /&gt;
involved.  The easiest place to start is probably understanding the&lt;br /&gt;
technology we have right now.&lt;br /&gt;
&lt;br /&gt;
This is an introduction. Since it&#039;s an introduction, it glosses over a&lt;br /&gt;
ton of details so that the big picture&#039;s a little easier to see.&lt;br /&gt;
Quite a few people watching are going to be way past anything that I&#039;m&lt;br /&gt;
talking about, at least for now.  On the other hand, I&#039;m probably&lt;br /&gt;
going to go too fast for folks who really are brand new to all of&lt;br /&gt;
this, so if this is all new, relax. The important thing is to pick out&lt;br /&gt;
any ideas that really grab your imagination. Especially pay attention&lt;br /&gt;
to the terminology surrounding those ideas, because with those, and&lt;br /&gt;
Google, and Wikipedia, you can dig as deep as interests you.&lt;br /&gt;
&lt;br /&gt;
So, without any further ado, welcome to one hell of a new hobby.&lt;br /&gt;
&lt;br /&gt;
&amp;lt;center&amp;gt;&amp;lt;div style=&amp;quot;background-color:#DDDDFF;border-color:#CCCCDD;border-style:solid;width:80%;padding:0 1em 1em 1em;text-align:left;&amp;quot;&amp;gt;&lt;br /&gt;
&#039;&#039;&#039;Going deeper…&#039;&#039;&#039;&lt;br /&gt;
*Fixme: Some good HTML5 baseline codec debate article&lt;br /&gt;
*[http://diveintohtml5.org/video.html Dive into HTML5] web video tutorial&lt;br /&gt;
&amp;lt;/div&amp;gt;&amp;lt;/center&amp;gt;&lt;br /&gt;
&lt;br /&gt;
&amp;lt;br style=&amp;quot;clear:both;&amp;quot;/&amp;gt;&lt;br /&gt;
&lt;br /&gt;
==Analog vs Digital==&lt;br /&gt;
&amp;lt;small&amp;gt;[[Talk:A_Digital_Media_Primer_For_Geeks_(episode_1)#Analog_vs_Digital|Discuss this section]]&amp;lt;/small&amp;gt;&lt;br /&gt;
[[Image:Dmpfg_004.jpg|thumb|360px|right]]&lt;br /&gt;
&lt;br /&gt;
Sound is the propagation of pressure waves through air, spreading out&lt;br /&gt;
from a source like ripples spread from a stone tossed into a pond.  A&lt;br /&gt;
microphone, or the human ear for that matter, transforms these passing&lt;br /&gt;
ripples of pressure into an electric signal.  Right, this is&lt;br /&gt;
middle school science class, everyone remembers this.  Moving on.&lt;br /&gt;
&lt;br /&gt;
That audio signal is a one-dimensional function, a single value&lt;br /&gt;
varying over time.  If we slow the &#039;scope down a bit... that should be&lt;br /&gt;
a little easier to see. A few other aspects of the signal are&lt;br /&gt;
important. It&#039;s continuous in both value and time; that is, at any&lt;br /&gt;
given time it can have any real value, and there&#039;s a smoothly varying&lt;br /&gt;
value at every point in time.  No matter how much we zoom in, there&lt;br /&gt;
are no discontinuities, no singularities, no instantaneous steps or&lt;br /&gt;
points where the signal ceases to exist. It&#039;s defined&lt;br /&gt;
everywhere. Classic continuous math works very well on these signals.&lt;br /&gt;
&lt;br /&gt;
A digital signal on the other hand is discrete in both value and time.&lt;br /&gt;
In the simplest and most common system, called Pulse Code Modulation,&lt;br /&gt;
one of a fixed number of possible values directly represents the&lt;br /&gt;
instantaneous signal amplitude at points in time spaced a fixed&lt;br /&gt;
distance apart.  The end result is a stream of digits.&lt;br /&gt;
&lt;br /&gt;
[[Image:Dmpfg_006.jpg|thumb|360px|right]]&lt;br /&gt;
Now this looks an awful lot like this.  It seems intuitive that we&lt;br /&gt;
should somehow be able to rigorously transform one into the other, and&lt;br /&gt;
good news, the Sampling Theorem says we can and tells us&lt;br /&gt;
how. Published in its most recognizable form by Claude Shannon in 1949&lt;br /&gt;
and built on the work of Nyquist, and Hartley, and tons of others, the&lt;br /&gt;
sampling theorem not only states that we can go back and&lt;br /&gt;
forth between analog and digital, but also lays&lt;br /&gt;
down a set of conditions under which conversion is lossless and the two&lt;br /&gt;
representations become equivalent and interchangeable.  When the&lt;br /&gt;
lossless conditions aren&#039;t met, the sampling theorem tells us how and&lt;br /&gt;
how much information is lost or corrupted.&lt;br /&gt;
&lt;br /&gt;
Up until very recently, analog technology was the basis for&lt;br /&gt;
practically everything done with audio, and that&#039;s not just because most&lt;br /&gt;
audio comes from an originally analog source.  You may also think that&lt;br /&gt;
since computers are fairly recent, analog signal technology must have&lt;br /&gt;
come first.  Nope. Digital is actually older.  The telegraph predates&lt;br /&gt;
the telephone by half a century and was already fully mechanically&lt;br /&gt;
automated by the 1860s, sending coded, multiplexed digital signals&lt;br /&gt;
long distances. You know... tickertape. Harry Nyquist of Bell Labs was&lt;br /&gt;
researching telegraph pulse transmission when he published his&lt;br /&gt;
description of what later became known as the Nyquist frequency, the&lt;br /&gt;
core concept of the sampling theorem.  Now, it&#039;s true the telegraph&lt;br /&gt;
was transmitting symbolic information, text, not a digitized analog&lt;br /&gt;
signal, but with the advent of the telephone and radio, analog and&lt;br /&gt;
digital signal technology progressed rapidly and side-by-side.&lt;br /&gt;
&lt;br /&gt;
[[Image:Dmpfg_007.jpg|thumb|360px|right]]&lt;br /&gt;
Audio had always been manipulated as an analog signal because... well,&lt;br /&gt;
gee, it&#039;s so much easier.  A second-order low-pass filter, for example,&lt;br /&gt;
requires two passive components.  An all-analog short-time Fourier&lt;br /&gt;
transform, a few hundred.  Well, maybe a thousand if you want to build&lt;br /&gt;
something really fancy [bang on the 3585].  Processing signals&lt;br /&gt;
digitally requires millions to billions of transistors running at&lt;br /&gt;
microwave frequencies, support hardware at the very least to digitize and&lt;br /&gt;
reconstruct the analog signals, a complete software ecosystem for&lt;br /&gt;
programming and controlling that billion-transistor juggernaut,&lt;br /&gt;
digital storage just in case you want to keep any of those bits for&lt;br /&gt;
later...&lt;br /&gt;
&lt;br /&gt;
So we come to the conclusion that analog is the only practical way to&lt;br /&gt;
do much with audio... well, unless you happen to have a billion&lt;br /&gt;
transistors and all the other things just lying around. And since we&lt;br /&gt;
do, digital signal processing becomes very attractive.&lt;br /&gt;
&lt;br /&gt;
For one thing, analog componentry just doesn&#039;t have the flexibility of&lt;br /&gt;
a general purpose computer.  Adding a new function to this&lt;br /&gt;
beast... yeah, it&#039;s probably not going to happen.  On a digital&lt;br /&gt;
processor though, just write a new program.  Software isn&#039;t trivial,&lt;br /&gt;
but it is a lot easier.&lt;br /&gt;
&lt;br /&gt;
Perhaps more importantly, though, every analog component is an&lt;br /&gt;
approximation. There&#039;s no such thing as a perfect transistor, or a&lt;br /&gt;
perfect inductor, or a perfect capacitor.  In analog, every component&lt;br /&gt;
adds noise and distortion, usually not very much, but it adds up. Just&lt;br /&gt;
transmitting an analog signal, especially over long distances,&lt;br /&gt;
progressively, measurably, irretrievably corrupts it.  Besides, all of&lt;br /&gt;
those single-purpose analog components take up a lot of space.  Two&lt;br /&gt;
lines of code on the billion transistors back here can implement a&lt;br /&gt;
filter that would require an inductor the size of a refrigerator.&lt;br /&gt;
&lt;br /&gt;
Digital systems don&#039;t have these drawbacks.  Digital signals can be&lt;br /&gt;
stored, copied, manipulated, and transmitted without adding any noise&lt;br /&gt;
or distortion. We do use lossy algorithms from time to time, but the&lt;br /&gt;
only unavoidably non-ideal steps are digitization and reconstruction,&lt;br /&gt;
where digital has to interface with all of that messy analog.  Messy&lt;br /&gt;
or not, modern conversion stages are very, very good.  By the&lt;br /&gt;
standards of our ears, we can consider them practically lossless as&lt;br /&gt;
well.&lt;br /&gt;
&lt;br /&gt;
With a little extra hardware, then, most of which is now small and&lt;br /&gt;
inexpensive due to our modern industrial infrastructure, digital audio&lt;br /&gt;
is the clear winner over analog.  So let us then go about storing it,&lt;br /&gt;
copying it, manipulating it, and transmitting it.&lt;br /&gt;
&lt;br /&gt;
&amp;lt;center&amp;gt;&amp;lt;div style=&amp;quot;background-color:#DDDDFF;border-color:#CCCCDD;border-style:solid;width:80%;padding:0 1em 1em 1em;text-align:left;&amp;quot;&amp;gt;&lt;br /&gt;
&#039;&#039;&#039;Going deeper…&#039;&#039;&#039;&lt;br /&gt;
*Wikipedia: [[wiki:Nyquist–Shannon_sampling_theorem|Nyquist–Shannon sampling theorem]]&lt;br /&gt;
*MIT OpenCourseWare [http://ocw.mit.edu/courses/electrical-engineering-and-computer-science/6-003-signals-and-systems-spring-2010/lecture-notes/ Lecture notes from 6.003 signals and systems.]&lt;br /&gt;
&amp;lt;/div&amp;gt;&amp;lt;/center&amp;gt;&lt;br /&gt;
&lt;br /&gt;
&amp;lt;br style=&amp;quot;clear:both;&amp;quot;/&amp;gt;&lt;br /&gt;
&lt;br /&gt;
==Raw (digital audio) meat==&lt;br /&gt;
&amp;lt;small&amp;gt;[[Talk:A_Digital_Media_Primer_For_Geeks_(episode_1)#Raw_.28digital_audio.29_meat|Discuss this section]]&amp;lt;/small&amp;gt;&lt;br /&gt;
&lt;br /&gt;
Pulse Code Modulation is the most common representation for &lt;br /&gt;
raw audio.  Other practical representations do exist: for example, the&lt;br /&gt;
Sigma-Delta coding used by the SACD, which is a form of Pulse Density&lt;br /&gt;
Modulation.  That said, Pulse Code Modulation is far&lt;br /&gt;
and away dominant, mainly because it&#039;s so mathematically&lt;br /&gt;
convenient.  An audio engineer can spend an entire career without&lt;br /&gt;
running into anything else.&lt;br /&gt;
&lt;br /&gt;
PCM encoding can be characterized by three parameters, making it easy&lt;br /&gt;
to account for every possible PCM variant with mercifully little&lt;br /&gt;
hassle.&lt;br /&gt;
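&lt;br /&gt;
To make that concrete before diving into each parameter: the complete&lt;br /&gt;
description of any raw PCM stream fits in three fields.  A minimal&lt;br /&gt;
sketch in C (the names here are hypothetical, not from any real API):&lt;br /&gt;
&lt;br /&gt;
struct pcm_params {&lt;br /&gt;
    int sample_rate;    /* samples per second, e.g. 48000    */&lt;br /&gt;
    int sample_format;  /* e.g. 16-bit signed, little-endian */&lt;br /&gt;
    int channels;       /* e.g. 2 for stereo                 */&lt;br /&gt;
};&lt;br /&gt;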
&lt;br /&gt;
===sample rate===&lt;br /&gt;
&lt;br /&gt;
The first parameter is the sampling rate.  The highest frequency an&lt;br /&gt;
encoding can represent is called the Nyquist frequency.  The Nyquist&lt;br /&gt;
frequency of PCM happens to be exactly half the sampling rate.&lt;br /&gt;
Therefore, the sampling rate directly determines the highest possible&lt;br /&gt;
frequency in the digitized signal.&lt;br /&gt;
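&lt;br /&gt;
To put numbers on that: 48kHz sampling gives a Nyquist frequency of&lt;br /&gt;
48/2 = 24kHz, and 44.1kHz sampling gives 22.05kHz.&lt;br /&gt;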
&lt;br /&gt;
Analog telephone systems traditionally band-limited voice channels to&lt;br /&gt;
just under 4kHz, so digital telephony and most classic voice&lt;br /&gt;
applications use an 8kHz sampling rate: the minimum sampling rate&lt;br /&gt;
necessary to capture the entire bandwidth of a 4kHz channel.  This is&lt;br /&gt;
what an 8kHz sampling rate sounds like&amp;amp;mdash;a bit muffled but perfectly&lt;br /&gt;
intelligible for voice.  This is the lowest sampling rate that&#039;s ever&lt;br /&gt;
been used widely in practice.&lt;br /&gt;
&lt;br /&gt;
From there, as power, and memory, and storage increased, consumer&lt;br /&gt;
computer hardware went to offering 11, and then 16, and then 22, and&lt;br /&gt;
then 32kHz sampling.  With each increase in the sampling rate and the&lt;br /&gt;
Nyquist frequency, it&#039;s obvious that the high end becomes a little&lt;br /&gt;
clearer and the sound more natural.&lt;br /&gt;
&lt;br /&gt;
The Compact Disc uses a 44.1kHz sampling rate, which is again slightly&lt;br /&gt;
better than 32kHz, but the gains are becoming less distinct.  44.1kHz&lt;br /&gt;
is a bit of an oddball choice, especially given that it hadn&#039;t been&lt;br /&gt;
used for anything prior to the compact disc, but the huge success of&lt;br /&gt;
the CD has made it a common rate.&lt;br /&gt;
&lt;br /&gt;
The most common hi-fidelity sampling rate aside from the CD is 48kHz.&lt;br /&gt;
There&#039;s virtually no audible difference between the two.  This video,&lt;br /&gt;
or at least the original version of it, was shot and produced with&lt;br /&gt;
48kHz audio, which happens to be the original standard for&lt;br /&gt;
high-fidelity audio with video.&lt;br /&gt;
&lt;br /&gt;
Super-hi-fidelity sampling rates of 88, and 96, and 192kHz have also&lt;br /&gt;
appeared. The reason for the sampling rates beyond 48kHz isn&#039;t to&lt;br /&gt;
extend the audible high frequencies further. It&#039;s for a different&lt;br /&gt;
reason.&lt;br /&gt;
&lt;br /&gt;
[[Image:Dmpfg_008.jpg|thumb|360px|right]]&lt;br /&gt;
Stepping back for just a second, the French mathematician Jean&lt;br /&gt;
Baptiste Joseph Fourier showed that we can also think of signals like&lt;br /&gt;
audio as a set of component frequencies.  This frequency-domain&lt;br /&gt;
representation is equivalent to the time representation; the signal is&lt;br /&gt;
exactly the same, we&#039;re just looking at it a different way.  Here we see the&lt;br /&gt;
frequency-domain representation of a hypothetical analog signal we&lt;br /&gt;
intend to digitally sample.&lt;br /&gt;
&lt;br /&gt;
The sampling theorem tells us two essential things about the sampling&lt;br /&gt;
process. First, that a digital signal can&#039;t represent any&lt;br /&gt;
frequencies above the Nyquist frequency. Second, and this is the new&lt;br /&gt;
part, if we don&#039;t remove those frequencies with a low-pass filter&lt;br /&gt;
before sampling, the sampling process will fold them down into the&lt;br /&gt;
representable frequency range as aliasing distortion.&lt;br /&gt;
&lt;br /&gt;
Aliasing, in a nutshell, sounds freakin&#039; awful, so it&#039;s essential to&lt;br /&gt;
remove any beyond-Nyquist frequencies before sampling and after&lt;br /&gt;
reconstruction.&lt;br /&gt;
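&lt;br /&gt;
As a toy illustration of the fold-down (a quick sketch of the&lt;br /&gt;
arithmetic, not production DSP code), a few lines of C can predict&lt;br /&gt;
exactly where an unfiltered tone will land:&lt;br /&gt;
&lt;br /&gt;
#include &amp;lt;math.h&amp;gt;&lt;br /&gt;
#include &amp;lt;stdio.h&amp;gt;&lt;br /&gt;
&lt;br /&gt;
/* Where does a pure tone of frequency f land after sampling at&lt;br /&gt;
   rate fs with no anti-aliasing filter?  It folds down around&lt;br /&gt;
   multiples of the sampling rate. */&lt;br /&gt;
double alias(double f, double fs){&lt;br /&gt;
    double r = fmod(f, fs);&lt;br /&gt;
    return r &amp;lt;= fs/2 ? r : fs - r;&lt;br /&gt;
}&lt;br /&gt;
&lt;br /&gt;
int main(void){&lt;br /&gt;
    /* a 26kHz tone sampled at 48kHz aliases down to 22kHz */&lt;br /&gt;
    printf(&amp;quot;%g Hz\n&amp;quot;, alias(26000., 48000.));&lt;br /&gt;
    return 0;&lt;br /&gt;
}&lt;br /&gt;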
&lt;br /&gt;
Human frequency perception is considered to extend to about 20kHz. In&lt;br /&gt;
44.1 or 48kHz sampling, the low pass before the sampling stage has to&lt;br /&gt;
be extremely sharp to avoid cutting any audible frequencies below&lt;br /&gt;
20kHz but still not allow frequencies above the Nyquist to leak&lt;br /&gt;
forward into the sampling process.  This is a difficult filter to&lt;br /&gt;
build, and no practical filter succeeds completely. If the sampling&lt;br /&gt;
rate is 96kHz or 192kHz on the other hand, the low pass has an extra&lt;br /&gt;
octave or two for its transition band. This is a much easier filter to&lt;br /&gt;
build.  Sampling rates beyond 48kHz are actually one of those messy&lt;br /&gt;
analog stage compromises.&lt;br /&gt;
&lt;br /&gt;
===sample format===&lt;br /&gt;
&lt;br /&gt;
The second fundamental PCM parameter is the sample format; that is,&lt;br /&gt;
the format of each digital number.  A number is a number, but a number&lt;br /&gt;
can be represented in bits in a number of different ways.&lt;br /&gt;
&lt;br /&gt;
Early PCM was eight-bit linear, encoded as an unsigned byte.  The&lt;br /&gt;
dynamic range is limited to about 50dB  and the quantization noise, as&lt;br /&gt;
you can hear, is pretty severe.  Eight-bit audio is vanishingly rare&lt;br /&gt;
today.&lt;br /&gt;
&lt;br /&gt;
Digital telephony typically uses one of two related non-linear&lt;br /&gt;
eight-bit encodings called A-law and mu-law. These formats encode a roughly&lt;br /&gt;
14-bit dynamic range into eight bits by spacing the higher amplitude&lt;br /&gt;
values farther apart. A-law and mu-law obviously improve quantization&lt;br /&gt;
noise compared to linear 8-bit, and voice harmonics especially hide&lt;br /&gt;
the remaining quantization noise well. All three eight-bit encodings,&lt;br /&gt;
linear, A-law, and mu-law, are typically paired with an 8kHz sampling&lt;br /&gt;
rate, though I&#039;m demonstrating them here at 48kHz.&lt;br /&gt;
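&lt;br /&gt;
For the curious, the textbook continuous form of mu-law companding is&lt;br /&gt;
a few lines of C (a sketch; the real G.711 codec uses a segmented&lt;br /&gt;
approximation of the same curve):&lt;br /&gt;
&lt;br /&gt;
#include &amp;lt;math.h&amp;gt;&lt;br /&gt;
&lt;br /&gt;
/* Continuous mu-law compression with mu = 255: the input is a&lt;br /&gt;
   linear sample in [-1,1], and the output spaces the higher&lt;br /&gt;
   amplitude values farther apart, as described above. */&lt;br /&gt;
double mulaw(double x){&lt;br /&gt;
    const double mu = 255.0;&lt;br /&gt;
    double sign = x &amp;lt; 0 ? -1.0 : 1.0;&lt;br /&gt;
    return sign * log(1.0 + mu*fabs(x)) / log(1.0 + mu);&lt;br /&gt;
}&lt;br /&gt;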
&lt;br /&gt;
Most modern PCM uses 16- or 24-bit two&#039;s-complement signed integers to&lt;br /&gt;
encode the range from negative infinity to zero decibels in 16 or 24&lt;br /&gt;
bits of precision. The maximum absolute value corresponds to zero decibels.&lt;br /&gt;
As with all the sample formats so far, signals beyond zero decibels, and thus&lt;br /&gt;
beyond the maximum representable range, are clipped.&lt;br /&gt;
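&lt;br /&gt;
The rule of thumb behind these figures: each bit of linear precision&lt;br /&gt;
is worth about 6dB of dynamic range, so 16-bit PCM spans roughly 96dB&lt;br /&gt;
and 24-bit roughly 144dB between the quantization floor and the zero&lt;br /&gt;
decibel clip point.  The same arithmetic gives the roughly 50dB figure&lt;br /&gt;
quoted above for eight bits.&lt;br /&gt;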
&lt;br /&gt;
In mixing and mastering, it&#039;s not unusual to use floating-point&lt;br /&gt;
numbers for PCM instead of integers.  A 32-bit IEEE 754 float, that&#039;s&lt;br /&gt;
the normal kind of floating point you see on current computers, has 24&lt;br /&gt;
bits of resolution, but its eight-bit floating-point exponent increases&lt;br /&gt;
the representable range.  Floating point usually represents zero&lt;br /&gt;
decibels as +/-1.0, and because floats can obviously represent&lt;br /&gt;
considerably beyond that, temporarily exceeding zero decibels during&lt;br /&gt;
the mixing process doesn&#039;t cause clipping.  Floating-point PCM takes&lt;br /&gt;
up more space, so it tends to be used only as an intermediate&lt;br /&gt;
production format.&lt;br /&gt;
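&lt;br /&gt;
A minimal sketch of the final conversion from floating-point PCM back&lt;br /&gt;
to 16-bit integers; this is the moment where anything still beyond&lt;br /&gt;
+/-1.0 has to clip (rounding and dither omitted for brevity):&lt;br /&gt;
&lt;br /&gt;
#include &amp;lt;stdint.h&amp;gt;&lt;br /&gt;
&lt;br /&gt;
/* One float sample (zero decibels == +/-1.0) to 16-bit signed&lt;br /&gt;
   PCM.  Anything beyond full scale clips here. */&lt;br /&gt;
int16_t float_to_s16(float x){&lt;br /&gt;
    if(x &amp;gt;  1.0f) x =  1.0f;&lt;br /&gt;
    if(x &amp;lt; -1.0f) x = -1.0f;&lt;br /&gt;
    return (int16_t)(x * 32767.0f);&lt;br /&gt;
}&lt;br /&gt;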
&lt;br /&gt;
Lastly, most general purpose computers still read and&lt;br /&gt;
write data in octet bytes, so it&#039;s important to remember that samples&lt;br /&gt;
bigger than eight bits can be in big- or little-endian order, and both&lt;br /&gt;
endiannesses are common.  For example, Microsoft WAV files are little-endian,&lt;br /&gt;
and Apple AIFC files tend to be big-endian.  Be aware of it.&lt;br /&gt;
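&lt;br /&gt;
What that means in practice is a one-line difference when assembling&lt;br /&gt;
a 16-bit sample from raw bytes:&lt;br /&gt;
&lt;br /&gt;
#include &amp;lt;stdint.h&amp;gt;&lt;br /&gt;
&lt;br /&gt;
/* The same two bytes read as a little-endian (WAV) or a&lt;br /&gt;
   big-endian (AIFF/AIFC) signed 16-bit sample. */&lt;br /&gt;
int16_t le16(const uint8_t *b){ return (int16_t)(b[0] | b[1] &amp;lt;&amp;lt; 8); }&lt;br /&gt;
int16_t be16(const uint8_t *b){ return (int16_t)(b[1] | b[0] &amp;lt;&amp;lt; 8); }&lt;br /&gt;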
&lt;br /&gt;
===channels===&lt;br /&gt;
&lt;br /&gt;
The third PCM parameter is the number of channels.  The convention in&lt;br /&gt;
raw PCM is to encode multiple channels by interleaving the samples of&lt;br /&gt;
each channel together into a single stream.  Straightforward and extensible.&lt;br /&gt;
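&lt;br /&gt;
In other words, sample n of channel c lives at index n*channels + c,&lt;br /&gt;
so undoing the interleave is a single loop.  A sketch for stereo:&lt;br /&gt;
&lt;br /&gt;
/* Split an interleaved stereo stream (LRLRLR...) back into two&lt;br /&gt;
   planar channel buffers of n samples each. */&lt;br /&gt;
void deinterleave(const short *in, short *left, short *right, int n){&lt;br /&gt;
    for(int i = 0; i &amp;lt; n; i++){&lt;br /&gt;
        left[i]  = in[2*i];&lt;br /&gt;
        right[i] = in[2*i + 1];&lt;br /&gt;
    }&lt;br /&gt;
}&lt;br /&gt;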
&lt;br /&gt;
===done!===&lt;br /&gt;
&lt;br /&gt;
And that&#039;s it!  That describes every PCM representation ever.  Done.&lt;br /&gt;
Digital audio is &#039;&#039;so easy&#039;&#039;!  There&#039;s more to do of course, but at this&lt;br /&gt;
point we&#039;ve got a nice useful chunk of audio data, so let&#039;s get some&lt;br /&gt;
video too.&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
&amp;lt;center&amp;gt;&amp;lt;div style=&amp;quot;background-color:#DDDDFF;border-color:#CCCCDD;border-style:solid;width:80%;padding:0 1em 1em 1em;text-align:left;&amp;quot;&amp;gt;&lt;br /&gt;
&#039;&#039;&#039;Going deeper…&#039;&#039;&#039;&lt;br /&gt;
* Fixme&lt;br /&gt;
&amp;lt;/div&amp;gt;&amp;lt;/center&amp;gt;&lt;br /&gt;
&lt;br /&gt;
&amp;lt;br style=&amp;quot;clear:both;&amp;quot;/&amp;gt;&lt;br /&gt;
&lt;br /&gt;
==Video vegetables (they&#039;re good for you!)==&lt;br /&gt;
&amp;lt;small&amp;gt;[[Talk:A_Digital_Media_Primer_For_Geeks_(episode_1)#Video_vegetables_.28they.27re_good_for_you.21.29|Discuss this section]]&amp;lt;/small&amp;gt;&lt;br /&gt;
&lt;br /&gt;
One could think of video as being like audio but with two spatial&lt;br /&gt;
dimensions, X and Y, in addition to the dimension of time.&lt;br /&gt;
This is mathematically sound. The Sampling Theorem applies to all&lt;br /&gt;
three video dimensions just as it does to the single time dimension of&lt;br /&gt;
audio.&lt;br /&gt;
&lt;br /&gt;
Audio and video are obviously quite different in practice. For one,&lt;br /&gt;
compared to audio, video is huge. Raw CD audio is about 1.4 megabits&lt;br /&gt;
per second. Raw 1080i HD video is over 700 megabits per second. That&#039;s&lt;br /&gt;
more than 500 times more data to capture, process, and store per&lt;br /&gt;
second.  By Moore&#039;s law... that&#039;s... let&#039;s see... roughly eight&lt;br /&gt;
doublings times two years, so yeah, computers requiring about an extra&lt;br /&gt;
fifteen years to handle raw video after getting raw audio down pat was&lt;br /&gt;
about right.&lt;br /&gt;
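&lt;br /&gt;
One way to arrive at those figures, assuming 8-bit 4:2:0 sampling at&lt;br /&gt;
30 full frames per second: 1920 x 1080 x 30 x 12 bits is about 746&lt;br /&gt;
megabits per second of video, versus 44100 x 16 x 2, about 1.4&lt;br /&gt;
megabits per second, for CD audio; the ratio is roughly 530.&lt;br /&gt;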
&lt;br /&gt;
Basic raw video is also just more complex than basic raw audio. The&lt;br /&gt;
sheer volume of data currently necessitates a representation more&lt;br /&gt;
efficient than the linear PCM used for audio.  In addition, electronic&lt;br /&gt;
video comes almost entirely from broadcast television, and the&lt;br /&gt;
standards committees that govern broadcast video have always been very&lt;br /&gt;
concerned with backward compatibility.  Up until just last year in the&lt;br /&gt;
US, a sixty-year-old black and white television could still show a&lt;br /&gt;
normal analog television broadcast.  That&#039;s actually a really neat&lt;br /&gt;
trick.&lt;br /&gt;
&lt;br /&gt;
The downside to backward compatibility is that once a detail makes it&lt;br /&gt;
into a standard, you can&#039;t ever really throw it out again. Electronic&lt;br /&gt;
video has never started over from scratch the way audio has multiple&lt;br /&gt;
times.  Sixty years worth of clever but obsolete hacks necessitated by&lt;br /&gt;
the passing technology of a given era have built up into quite a pile,&lt;br /&gt;
and because digital standards also come from broadcast television, all&lt;br /&gt;
these eldritch hacks have been brought forward into the digital&lt;br /&gt;
standards as well.&lt;br /&gt;
&lt;br /&gt;
In short, there are a whole lot more details involved in digital video&lt;br /&gt;
than there were with audio. There&#039;s no hope of covering them&lt;br /&gt;
all completely here, so we&#039;ll cover the broad fundamentals.&lt;br /&gt;
&lt;br /&gt;
===resolution and aspect===&lt;br /&gt;
&lt;br /&gt;
The most obvious raw video parameters are the width and height of the&lt;br /&gt;
picture in pixels. As simple as that may sound, the pixel dimensions&lt;br /&gt;
alone don&#039;t actually specify the absolute width and height of the&lt;br /&gt;
picture, as most broadcast-derived video doesn&#039;t use square pixels.&lt;br /&gt;
The number of scanlines in a broadcast image was fixed, but the&lt;br /&gt;
effective number of horizontal pixels was a function of channel&lt;br /&gt;
bandwidth. Depending on that effective horizontal resolution, pixels&lt;br /&gt;
could be either narrower or wider than the spacing between scanlines.&lt;br /&gt;
&lt;br /&gt;
Standards have generally specified that digitally sampled video should&lt;br /&gt;
reflect the real resolution of the original analog source, so a large&lt;br /&gt;
amount of digital video also uses non-square pixels. For example, a&lt;br /&gt;
normal 4:3 aspect NTSC DVD is typically encoded with a display&lt;br /&gt;
resolution of 704 by 480, a ratio wider than 4:3.  In this case, the&lt;br /&gt;
pixels themselves are assigned an aspect ratio of 10:11, making them&lt;br /&gt;
taller than they are wide and narrowing the image horizontally to the&lt;br /&gt;
correct aspect.  Such an image has to be resampled to show properly on&lt;br /&gt;
a digital display with square pixels.&lt;br /&gt;
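&lt;br /&gt;
The arithmetic: 704 pixels, each 10/11 as wide as it is tall, span the&lt;br /&gt;
same width as 704 x 10/11 = 640 square pixels, and 640:480 reduces to&lt;br /&gt;
exactly 4:3.&lt;br /&gt;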
&lt;br /&gt;
===frame rate and interlacing===&lt;br /&gt;
&lt;br /&gt;
The second obvious video parameter is the frame rate, the number of&lt;br /&gt;
full frames per second.  Several standard frame rates are in active&lt;br /&gt;
use. Digital video, in one form or another, can use all of them.  Or,&lt;br /&gt;
any other frame rate.  Or even variable rates where the frame rate&lt;br /&gt;
changes adaptively over the course of the video. The higher the frame&lt;br /&gt;
rate, the smoother the motion, and that brings us, unfortunately, to&lt;br /&gt;
interlacing.&lt;br /&gt;
&lt;br /&gt;
In the very earliest days of broadcast video, engineers sought the&lt;br /&gt;
fastest practical frame rate to smooth motion and to minimize flicker&lt;br /&gt;
on phosphor-based CRTs.  They were also under pressure to use the&lt;br /&gt;
least possible bandwidth for the highest resolution and fastest frame&lt;br /&gt;
rate.  Their solution was to interlace the video where the even lines&lt;br /&gt;
are sent in one pass and the odd lines in the next.  Each pass is&lt;br /&gt;
called a field and two fields sort of produce one complete frame.&lt;br /&gt;
&amp;quot;Sort of&amp;quot;, because the even and odd fields aren&#039;t actually from the&lt;br /&gt;
same source frame.  In a 60 field per second picture, the source frame&lt;br /&gt;
rate is actually 60 full frames per second, and half of each frame,&lt;br /&gt;
every other line, is simply discarded.  This is why we can&#039;t&lt;br /&gt;
deinterlace a video simply by combining two fields into one frame;&lt;br /&gt;
they&#039;re not actually from one frame to begin with.&lt;br /&gt;
&lt;br /&gt;
===gamma===&lt;br /&gt;
&lt;br /&gt;
The cathode ray tube was the only available display technology for&lt;br /&gt;
most of the history of electronic video. A CRT&#039;s output brightness is&lt;br /&gt;
nonlinear, approximately equal to the input controlling voltage raised&lt;br /&gt;
to the 2.5th power. This exponent, 2.5, is designated gamma, and so&lt;br /&gt;
it&#039;s often referred to as the gamma of a display.  Cameras, though,&lt;br /&gt;
are linear, and if you feed a CRT a linear input signal, it looks a&lt;br /&gt;
bit like this.&lt;br /&gt;
&lt;br /&gt;
As there were originally to be very few cameras, which were&lt;br /&gt;
fantastically expensive anyway, and hopefully many, many television&lt;br /&gt;
sets, which had best be as inexpensive as possible, engineers decided to&lt;br /&gt;
add the necessary gamma correction circuitry to the cameras rather&lt;br /&gt;
than the sets. Video transmitted over the airwaves would thus have a&lt;br /&gt;
nonlinear intensity using the inverse of the set&#039;s gamma exponent, so that&lt;br /&gt;
once a camera&#039;s signal was finally displayed on the CRT, the overall&lt;br /&gt;
response of the system from camera to set was back to linear again.&lt;br /&gt;
&lt;br /&gt;
Almost.&lt;br /&gt;
&lt;br /&gt;
There were also two other tweaks. A television camera actually uses a&lt;br /&gt;
gamma exponent that&#039;s the inverse of 2.2, not 2.5.  That&#039;s just a&lt;br /&gt;
correction for viewing in a dim environment. Also, the exponential&lt;br /&gt;
curve transitions to a linear ramp near black.  That&#039;s just an old&lt;br /&gt;
hack for suppressing sensor noise in the camera.&lt;br /&gt;
&lt;br /&gt;
Gamma correction also had a lucky benefit. It just so happens that the&lt;br /&gt;
human eye has a perceptual gamma of about 3.  This is relatively close&lt;br /&gt;
to the CRT&#039;s gamma of 2.5. An image using gamma correction devotes&lt;br /&gt;
more resolution to lower intensities, where the eye happens to have&lt;br /&gt;
its finest intensity discrimination, and therefore uses the available&lt;br /&gt;
scale resolution more efficiently.  Although CRTs are currently&lt;br /&gt;
vanishing, a standard sRGB computer display still uses a nonlinear&lt;br /&gt;
intensity curve similar to television, with a linear ramp near black,&lt;br /&gt;
followed by an exponential curve with a gamma exponent of 2.4. This&lt;br /&gt;
encodes a sixteen bit linear range down into eight bits.&lt;br /&gt;
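&lt;br /&gt;
For reference, a sketch of that sRGB encoding curve, with the standard&lt;br /&gt;
constants from the sRGB specification:&lt;br /&gt;
&lt;br /&gt;
#include &amp;lt;math.h&amp;gt;&lt;br /&gt;
&lt;br /&gt;
/* sRGB encoding: linear light in [0,1] to a nonlinear value in&lt;br /&gt;
   [0,1].  A linear ramp handles the region near black; the rest&lt;br /&gt;
   is an exponential segment with a gamma exponent of 2.4. */&lt;br /&gt;
double srgb_encode(double lin){&lt;br /&gt;
    if(lin &amp;lt;= 0.0031308)&lt;br /&gt;
        return 12.92 * lin;&lt;br /&gt;
    return 1.055 * pow(lin, 1.0/2.4) - 0.055;&lt;br /&gt;
}&lt;br /&gt;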
&lt;br /&gt;
===color and colorspace===&lt;br /&gt;
&lt;br /&gt;
The human eye has three apparent color channels, red, green, and blue,&lt;br /&gt;
and most displays use these three colors as additive primaries to&lt;br /&gt;
produce a full range of color output.  The primary pigments in&lt;br /&gt;
printing are cyan, magenta, and yellow for the same reason; pigments&lt;br /&gt;
are subtractive, and each of these pigments subtracts one pure color&lt;br /&gt;
from reflected light.  Cyan subtracts red, magenta subtracts green, and&lt;br /&gt;
yellow subtracts blue.&lt;br /&gt;
&lt;br /&gt;
Video can be, and sometimes is, represented with red, green, and blue&lt;br /&gt;
color channels, but RGB video is atypical. The human eye is far more&lt;br /&gt;
sensitive to luminosity than it is to color, and RGB tends to spread&lt;br /&gt;
the energy of an image across all three color channels.  That is, the&lt;br /&gt;
red plane looks like a red version of the original picture, the green&lt;br /&gt;
plane looks like a green version of the original picture, and the blue&lt;br /&gt;
plane looks like a blue version of the original picture.  Black and&lt;br /&gt;
white times three.  Not efficient.&lt;br /&gt;
&lt;br /&gt;
For those reasons and because, oh hey, television just happened to&lt;br /&gt;
start out as black and white anyway, video usually is represented as a&lt;br /&gt;
high resolution luma channel&amp;amp;mdash;the black &amp;amp; white&amp;amp;mdash;along with&lt;br /&gt;
additional, often lower resolution chroma channels, the color. The&lt;br /&gt;
luma channel, Y, is produced by weighting and then adding the separate&lt;br /&gt;
red, green and blue signals.  The chroma channels U and V are then&lt;br /&gt;
produced by subtracting the luma signal from blue and the luma signal&lt;br /&gt;
from red.&lt;br /&gt;
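&lt;br /&gt;
As a sketch using the classic Rec.601 weights (digital formats&lt;br /&gt;
additionally scale, offset, and quantize these values, which is the&lt;br /&gt;
Y&#039;CbCr distinction mentioned just below):&lt;br /&gt;
&lt;br /&gt;
/* Luma and chroma from gamma-corrected R&#039;G&#039;B&#039; using the Rec.601&lt;br /&gt;
   weights.  U and V here are the raw difference signals. */&lt;br /&gt;
void rgb_to_yuv(double r, double g, double b,&lt;br /&gt;
                double *y, double *u, double *v){&lt;br /&gt;
    *y = 0.299*r + 0.587*g + 0.114*b;   /* weighted sum     */&lt;br /&gt;
    *u = b - *y;                        /* blue minus luma  */&lt;br /&gt;
    *v = r - *y;                        /* red minus luma   */&lt;br /&gt;
}&lt;br /&gt;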
&lt;br /&gt;
When YUV is scaled, offset, and quantized for digital video, it&#039;s&lt;br /&gt;
usually more correctly called Y&#039;CbCr, but the more generic term YUV is&lt;br /&gt;
widely used to describe all the analog and digital variants of this&lt;br /&gt;
color model.&lt;br /&gt;
&lt;br /&gt;
===chroma subsampling===&lt;br /&gt;
&lt;br /&gt;
The U and V chroma channels can have the same resolution as the Y&lt;br /&gt;
channel, but because the human eye has far less spatial color&lt;br /&gt;
resolution than spatial luminosity resolution, chroma resolution is&lt;br /&gt;
usually halved or even quartered in the horizontal direction, the&lt;br /&gt;
vertical direction, or both, usually without any significant impact on the&lt;br /&gt;
apparent raw image quality.  Practically every possible subsampling&lt;br /&gt;
variant has been used at one time or another, but the common choices&lt;br /&gt;
today are 4:4:4 video, which isn&#039;t actually subsampled at all, 4:2:2 video in&lt;br /&gt;
which the horizontal resolution of the U and V channels is halved, and&lt;br /&gt;
most common of all, 4:2:0 video in which both the horizontal and vertical&lt;br /&gt;
resolutions of the chroma channels are halved, resulting in U and V&lt;br /&gt;
planes that are each one quarter the size of Y.&lt;br /&gt;
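&lt;br /&gt;
Worked out for a 640x480 frame at eight bits per sample: the Y plane&lt;br /&gt;
is 640 x 480 = 307200 bytes, and the U and V planes are 320 x 240 =&lt;br /&gt;
76800 bytes each, for 460800 bytes in total; that&#039;s 1.5 bytes per&lt;br /&gt;
pixel rather than the 3 that unsubsampled 4:4:4 would need.&lt;br /&gt;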
&lt;br /&gt;
The terms 4:2:2, 4:2:0, 4:1:1, and so on and so forth, aren&#039;t complete&lt;br /&gt;
descriptions of a chroma subsampling. There are multiple possible ways&lt;br /&gt;
to position the chroma pixels relative to luma, and again, several&lt;br /&gt;
variants are in active use for each subsampling.  For example, Motion&lt;br /&gt;
JPEG, MPEG-1 video, MPEG-2 video, DV, Theora, and WebM all use or can&lt;br /&gt;
use 4:2:0 subsampling, but they site the chroma pixels three different&lt;br /&gt;
ways.&lt;br /&gt;
&lt;br /&gt;
Motion JPEG, MPEG-1 video, Theora, and WebM all site chroma pixels&lt;br /&gt;
between luma pixels both horizontally and vertically.&lt;br /&gt;
&lt;br /&gt;
MPEG-2 sites chroma pixels between lines, but horizontally aligned with&lt;br /&gt;
every other luma pixel. Interlaced modes complicate things somewhat,&lt;br /&gt;
resulting in a siting arrangement that&#039;s a tad bizarre.&lt;br /&gt;
&lt;br /&gt;
And finally PAL-DV, which is always interlaced, places the chroma&lt;br /&gt;
pixels in the same position as every other luma pixel in the&lt;br /&gt;
horizontal direction, and vertically alternates chroma channel on&lt;br /&gt;
each line.&lt;br /&gt;
&lt;br /&gt;
That&#039;s just 4:2:0 video. I&#039;ll leave the other subsamplings as homework for the&lt;br /&gt;
viewer.  Got the basic idea, moving on.&lt;br /&gt;
&lt;br /&gt;
===pixel formats===&lt;br /&gt;
&lt;br /&gt;
In audio, we always represent multiple channels in a PCM stream by&lt;br /&gt;
interleaving the samples from each channel in order. Video uses both&lt;br /&gt;
packed formats that interleave the color channels, and planar&lt;br /&gt;
formats that keep the pixels from each channel together in separate&lt;br /&gt;
planes stacked in order in the frame. There are at least 50 different formats in&lt;br /&gt;
these two broad categories with possibly ten or fifteen in common use.&lt;br /&gt;
&lt;br /&gt;
Each chroma subsampling and different bit-depth requires a different&lt;br /&gt;
packing arrangement, and so a different pixel format.  For a given&lt;br /&gt;
unique subsampling, there are usually also several equivalent formats&lt;br /&gt;
that consist of trivial channel order rearrangements or repackings, due either to&lt;br /&gt;
convenience once-upon-a-time on some particular piece of hardware, or&lt;br /&gt;
sometimes just good old-fashioned spite.&lt;br /&gt;
&lt;br /&gt;
Pixel formats are described by a unique name or fourcc code.  There&lt;br /&gt;
are quite a few of these and there&#039;s no sense going over each one now.&lt;br /&gt;
Google is your friend.  Be aware that fourcc codes for raw video&lt;br /&gt;
specify the pixel arrangement and chroma subsampling, but generally&lt;br /&gt;
don&#039;t imply anything certain about chroma siting or color space.  YV12&lt;br /&gt;
video, to pick one, can use JPEG, MPEG-2, or DV chroma siting, and any&lt;br /&gt;
one of several YUV colorspace definitions.&lt;br /&gt;
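&lt;br /&gt;
As one concrete example of what a fourcc does pin down, here&#039;s where&lt;br /&gt;
the three planes sit in a YV12 buffer (a sketch assuming no row&lt;br /&gt;
padding; note that V is stored before U):&lt;br /&gt;
&lt;br /&gt;
#include &amp;lt;stdint.h&amp;gt;&lt;br /&gt;
&lt;br /&gt;
/* Locate the planes of a w x h YV12 frame: planar 4:2:0 with the&lt;br /&gt;
   V (Cr) plane stored before U (Cb), and stride == width. */&lt;br /&gt;
void yv12_planes(uint8_t *buf, int w, int h,&lt;br /&gt;
                 uint8_t **y, uint8_t **v, uint8_t **u){&lt;br /&gt;
    *y = buf;                        /* w   x h   luma   */&lt;br /&gt;
    *v = buf + w*h;                  /* w/2 x h/2 chroma */&lt;br /&gt;
    *u = buf + w*h + (w/2)*(h/2);    /* w/2 x h/2 chroma */&lt;br /&gt;
}&lt;br /&gt;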
&lt;br /&gt;
===done!===&lt;br /&gt;
&lt;br /&gt;
That wraps up our not-so-quick and yet very incomplete tour of raw&lt;br /&gt;
video. The good news is we can already get quite a lot of real work&lt;br /&gt;
done using that overview. In plenty of situations, a frame of video&lt;br /&gt;
data is a frame of video data.  The details matter, greatly, when it&lt;br /&gt;
comes time to write software, but for now I am satisfied that the&lt;br /&gt;
esteemed viewer is broadly aware of the relevant issues.&lt;br /&gt;
&lt;br /&gt;
&amp;lt;center&amp;gt;&amp;lt;div style=&amp;quot;background-color:#DDDDFF;border-color:#CCCCDD;border-style:solid;width:80%;padding:0 1em 1em 1em;text-align:left;&amp;quot;&amp;gt;&lt;br /&gt;
&#039;&#039;&#039;Going deeper…&#039;&#039;&#039;&lt;br /&gt;
* Fixme&lt;br /&gt;
&amp;lt;/div&amp;gt;&amp;lt;/center&amp;gt;&lt;br /&gt;
&lt;br /&gt;
&amp;lt;br style=&amp;quot;clear:both;&amp;quot;/&amp;gt;&lt;br /&gt;
&lt;br /&gt;
==Containers==&lt;br /&gt;
&amp;lt;small&amp;gt;[[Talk:A_Digital_Media_Primer_For_Geeks_(episode_1)#Containers|Discuss this section]]&amp;lt;/small&amp;gt;&lt;br /&gt;
&lt;br /&gt;
So. We have audio data. We have video data. What remains is the more&lt;br /&gt;
familiar non-signal data and straight-up engineering that software&lt;br /&gt;
developers are used to, and plenty of it.&lt;br /&gt;
&lt;br /&gt;
Chunks of raw audio and video data have no externally-visible&lt;br /&gt;
structure, but they&#039;re often uniformly sized.  We could just string&lt;br /&gt;
them together in a rigid predetermined ordering for streaming and&lt;br /&gt;
storage, and some simple systems do approximately that. Compressed&lt;br /&gt;
frames, though, aren&#039;t necessarily a predictable size, and we usually want&lt;br /&gt;
some flexibility in using a range of different data types in streams.&lt;br /&gt;
If we string random formless data together, we lose the boundaries&lt;br /&gt;
that separate frames and don&#039;t necessarily know what data belongs to&lt;br /&gt;
which streams.  A stream needs some generalized structure to be&lt;br /&gt;
generally useful.&lt;br /&gt;
&lt;br /&gt;
In addition to our signal data, we also have our PCM and video&lt;br /&gt;
parameters.  There&#039;s probably plenty of other metadata we also want to&lt;br /&gt;
deal with, like audio tags and video chapters and subtitles, all&lt;br /&gt;
essential components of rich media.  It makes sense to place this&lt;br /&gt;
metadata&amp;amp;mdash;that is, data about the data&amp;amp;mdash;within the media itself.&lt;br /&gt;
&lt;br /&gt;
Storing and structuring formless data and disparate metadata is the&lt;br /&gt;
job of a container.  Containers provide framing for the data blobs,&lt;br /&gt;
interleave and identify multiple data streams, provide timing&lt;br /&gt;
information, and store the metadata necessary to parse, navigate,&lt;br /&gt;
manipulate, and present the media.  In general, any container can hold&lt;br /&gt;
any kind of data.  And data can be put into any container.&lt;br /&gt;
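&lt;br /&gt;
A conceptual sketch of the minimum a container wraps around each blob&lt;br /&gt;
(the field names are illustrative, not any real container&#039;s layout):&lt;br /&gt;
&lt;br /&gt;
#include &amp;lt;stddef.h&amp;gt;&lt;br /&gt;
#include &amp;lt;stdint.h&amp;gt;&lt;br /&gt;
&lt;br /&gt;
/* The bare minimum per data blob: framing, stream identity, and&lt;br /&gt;
   timing.  Real containers (Ogg, WebM, MP4...) differ wildly in&lt;br /&gt;
   how they encode exactly this information. */&lt;br /&gt;
struct packet {&lt;br /&gt;
    int      stream_id;  /* which interleaved stream         */&lt;br /&gt;
    int64_t  timestamp;  /* when to present it               */&lt;br /&gt;
    size_t   length;     /* framing: where the blob ends     */&lt;br /&gt;
    uint8_t *data;       /* opaque audio/video/subtitle data */&lt;br /&gt;
};&lt;br /&gt;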
&lt;br /&gt;
==Credits==&lt;br /&gt;
&lt;br /&gt;
In the past thirty minutes, we&#039;ve covered digital audio, video, some&lt;br /&gt;
history, some math and a little engineering. We&#039;ve barely scratched the&lt;br /&gt;
surface, but it&#039;s time for a well-earned break.&lt;br /&gt;
&lt;br /&gt;
There&#039;s so much more to talk about, so I hope you&#039;ll join me again in&lt;br /&gt;
our next episode.  Until then&amp;amp;mdash;Cheers!&lt;br /&gt;
&lt;br /&gt;
Written by:&lt;br /&gt;
Christopher (Monty) Montgomery&lt;br /&gt;
and the Xiph.Org Community&lt;br /&gt;
&lt;br /&gt;
Intro, title and credits music:&amp;lt;br&amp;gt;&lt;br /&gt;
&amp;quot;Boo Boo Coming&amp;quot;, by Joel Forrester&amp;lt;br&amp;gt;&lt;br /&gt;
Performed by the Microscopic Septet&amp;lt;br&amp;gt;&lt;br /&gt;
Used by permission of Cuneiform Records.&amp;lt;br&amp;gt;&lt;br /&gt;
Original source track All Rights Reserved.&amp;lt;br&amp;gt;&lt;br /&gt;
www.cuneiformrecords.com&lt;br /&gt;
&lt;br /&gt;
This Video Was Produced Entirely With Free and Open Source Software&lt;br /&gt;
&lt;br /&gt;
GNU&lt;br /&gt;
Linux&lt;br /&gt;
Fedora&lt;br /&gt;
Cinelerra&lt;br /&gt;
The Gimp&lt;br /&gt;
Audacity&lt;br /&gt;
Postfish&lt;br /&gt;
Gstreamer&lt;br /&gt;
&lt;br /&gt;
CC BY-NC-SA&lt;br /&gt;
A Co-Production of Xiph.Org and Red Hat Inc.&lt;br /&gt;
(C) 2010, Some Rights Reserved&lt;br /&gt;
&lt;br /&gt;
&amp;lt;br style=&amp;quot;clear:both;&amp;quot;/&amp;gt;&lt;br /&gt;
&lt;br /&gt;
==The making of…==&lt;br /&gt;
&amp;lt;small&amp;gt;[[Talk:A_Digital_Media_Primer_For_Geeks_(episode_1)#The_making_of.E2.80.A6|Discuss this section]]&amp;lt;/small&amp;gt;&lt;br /&gt;
===Equipment===&lt;br /&gt;
====Camera====&lt;br /&gt;
Canon HV40 HDV camera w/ wide-angle lens operating on a tripod.&lt;br /&gt;
&lt;br /&gt;
The wide angle lens gives the camera a nice close macro mode, and approximately triples the amount of light coming into the sensor for a given zoom/aperture.  Useful for shooting indoors at night.&lt;br /&gt;
&lt;br /&gt;
No additional lighting kit was used.&lt;br /&gt;
&lt;br /&gt;
====Audio====&lt;br /&gt;
&lt;br /&gt;
Two Crown PCC160 boundary microphones placed on a table approximately 4-8 feet in front of the speaker, run through a cheap Behringer portable mixer and into the camera&#039;s microphone input.  &lt;br /&gt;
&lt;br /&gt;
No additional audio kit was used.&lt;br /&gt;
&lt;br /&gt;
====Sundries====&lt;br /&gt;
&lt;br /&gt;
Whiteboard markers by &#039;Bic&#039;&lt;br /&gt;
&lt;br /&gt;
Drawing aids by Staedtler, McMaster-Carr, and &#039;Generic&#039;.&lt;br /&gt;
&lt;br /&gt;
===Video shooting sequence===&lt;br /&gt;
&lt;br /&gt;
Scenes were pre-scripted and memorized, usually with lots of on-the-fly revision.  (In the future... I&#039;m totally getting a teleprompter.  Wh000.  OTOH, I can totally rattle off the entire video script from beginning to end as a party trick, thus ensuring I&#039;ll not be invited to many parties --Monty).&lt;br /&gt;
&lt;br /&gt;
Diagrams were drawn by hand on a physical whiteboard with whiteboard markers and magnetic T-squares, triangles, and yardsticks.  Despite looking a lot like greenscreen work, there is no image compositing in use (actually-- there are two small composites where an error in a whiteboard diagram was corrected by subtracting part of the original image and then adding a corrected version of the diagram).&lt;br /&gt;
&lt;br /&gt;
Camera operated in 24F shutter priority mode (Tv set to &amp;quot;24&amp;quot;) with exposure and white balance both calibrated to the white board (or a white piece of paper) and locked.  Microphone attenuation setting active, with gain locked such that room noise peaked at -40dB (all the rooms in the shooting sequences were noisy due to the building&#039;s ventilation system, or active equipment). Lighting in the whiteboard rooms tended to be odd, with little relative light cast on a presenter standing just in front of the whiteboard; a presenter is practically standing in the room&#039;s only shadow.  Most of the room light is focused on the table and walls.  Additional fill lighting kit would have been useful, but for the first vid, I didn&#039;t want &#039;perfect&#039; to be the enemy of &#039;good&#039;.&lt;br /&gt;
&lt;br /&gt;
Autofocus used for whiteboard scenes, manual focus used for several workshop scenes as the autofocus tended to hunt continuously in very low light.&lt;br /&gt;
&lt;br /&gt;
Continuous capture to a Thinkpad with firewire input via a simple gstreamer script.&lt;br /&gt;
&lt;br /&gt;
===Production sequence===&lt;br /&gt;
====All hail Cinelerra.  You better hail, or Cinelerra will get pissy about it.====&lt;br /&gt;
&lt;br /&gt;
Most of the production sequence hinged on making Cinelerra happy; it is a hulking rusty cast iron WWI tank of a program that can seem like it&#039;s composed entirely of compressed bugs.  That said, it was neither particularly crashy nor did it ever accidentally corrupt or lose work.  It was also the only FOSS editor with a working 2D compositor.  It got the job done once I found a workflow it would cope with (and fixed a number of bugs; these fixes are available from my cinelerra Git repo at http://git.xiph.org/?p=users/xiphmont/cinelerraCV.git;a=summary).&lt;br /&gt;
&lt;br /&gt;
====Choosing takes====&lt;br /&gt;
&lt;br /&gt;
Each shooting session yielded four to six hours of raw video.  The first step was to load the raw video into the Cinelerra timeline, label each complete take, compare and choose the take to use, then render the chosen take out to a raw clip as a YUV4MPEG raw video file and a WAV raw audio file.  Be careful that Settings-&amp;gt;Align Cursor On Frames is set, else the audio and video renders won&#039;t start on the same boundary.&lt;br /&gt;
&lt;br /&gt;
====Postprocessing====&lt;br /&gt;
&lt;br /&gt;
At this point, the raw video clips were adjusted for gamma, contrast, and saturation in gstreamer and mplayer.  In the earlier shoots the camera was underexposing due to pilot error, which required quite a bit of gamma and saturation inflation to &#039;correct&#039; (there is no real correction as the low-end data is gone, but it&#039;s possible to make it look better).  Later shoots used saner settings and the adjustments were mostly to keep different shooting sessions more uniform.  The whiteboard tends not to look white because it&#039;s mildly reflective, and picked up the color of the cyan and orange audio baffles in the room like a big diffuse mirror.&lt;br /&gt;
&lt;br /&gt;
The audio was both noisy (due to the building&#039;s ventilation system, which either sounded like a low loud rumble or a jet-engine taking off) and reverberant (the rooms were glass on two sides and plaster on the other two).  Early takes used no additional sound absorbing material in the rooms, and the Postfish filtering and deverb were used heavily.  It gives the early audio in the vid a slightly odd, processed feel (I had almost decided the original audio was simply unusable).  Later takes used some big fleece &#039;soft flats&#039; in the room to absorb some additional reverb, and the later takes are less heavily filtered.&lt;br /&gt;
&lt;br /&gt;
The postfish filtering chain used declip (for the occasional overrange oops), deverb (remove room reverberation), multicompand (noise gating), single compand (for volume levelling) and EQ (the Crown mics are nice, but are very midrange heavy). &lt;br /&gt;
&lt;br /&gt;
====Special Effects====&lt;br /&gt;
&lt;br /&gt;
Audio special effects were one-offs, mostly done using SoX.  The processed demo sections of audio were then spliced back into the original audio takes using Audacity.&lt;br /&gt;
&lt;br /&gt;
Video special effects (e.g., removing a color channel) were done by writing quick, one-off filters in C for y4oi.  A few effects were done by dumping a take as a directory full of PNGs and then batch-processing the PNGs again using a one-off C program, then reassembling with mplayer.  Video effects were then stitched back into the original video takes in Cinelerra.&lt;br /&gt;
&lt;br /&gt;
====Editing====&lt;br /&gt;
&lt;br /&gt;
All editing was done in Cinelerra.  This primarily consisted of stitching the individual takes back together with crossfades.  All input and rendering output were done with raw YUV4MPEG and WAV files.  Note that making this work well and correctly required several patches to the YUV4MPEG handler and colorspace conversion code.&lt;br /&gt;
&lt;br /&gt;
====Encoding====&lt;br /&gt;
&lt;br /&gt;
Encoding was done by hand external to Cinelerra using mplayer for final postprocessing, the encoder_example included with the [Ptalarbvorm] Theora source distribution, and ivfenc for WebM.&lt;br /&gt;
&lt;br /&gt;
Sample Theora encode command line (note this is using an mplayer patched for y4o support; it could be done just as easily with a yuv4mpeg pipe):&lt;br /&gt;
&lt;br /&gt;
# 360p, 128-ish (a4) audio + 500-ish (v50) video&lt;br /&gt;
mplayer -vf hqdn3d,scale=640:360,gradfun=1.5,unsharp=l3x3:.1,yuv4ogg complete2.m2v -fast -noconsolecontrols -vo null &amp;gt; /dev/null &amp;amp; ~/MotherfishSVN/theora-ptalarbvorm/examples/encoder_example -a 4 -v 50 -k 240 complete2.wav output.y4o -o A_Digital_Media_Primer_For_Geeks-360p-a4+v50.ogv&lt;/div&gt;</summary>
		<author><name>Mindspillage</name></author>
	</entry>
	<entry>
		<id>https://wiki.xiph.org/index.php?title=Videos/A_Digital_Media_Primer_For_Geeks&amp;diff=12365</id>
		<title>Videos/A Digital Media Primer For Geeks</title>
		<link rel="alternate" type="text/html" href="https://wiki.xiph.org/index.php?title=Videos/A_Digital_Media_Primer_For_Geeks&amp;diff=12365"/>
		<updated>2010-09-21T22:20:42Z</updated>

		<summary type="html">&lt;p&gt;Mindspillage: /* Containers */ sp, punct&lt;/p&gt;
&lt;hr /&gt;
&lt;div&gt;&amp;lt;small&amp;gt;&#039;&#039;Wiki edition&#039;&#039;&amp;lt;/small&amp;gt;&lt;br /&gt;
[[Image:Dmpfg_001.jpg|thumb|360px|right]]&lt;br /&gt;
This first video from Xiph.Org presents the technical foundations of modern digital media via a half-hour firehose of information. One community member called it &amp;quot;a Uni lecture I never got but really wanted.&amp;quot;&lt;br /&gt;
&lt;br /&gt;
The program offers a brief history of digital media, a quick summary of the sampling theorem, and myriad details of low level audio and video characterization and formatting. It&#039;s intended for budding geeks looking to get into video coding, as well as the technically curious who want to know more about the media they wrangle for work or play.&lt;br /&gt;
&amp;lt;br/&amp;gt;&amp;lt;br/&amp;gt;&amp;lt;br/&amp;gt;&lt;br /&gt;
&amp;lt;center&amp;gt;&amp;lt;font size=&amp;quot;+2&amp;quot;&amp;gt;[http://www.xiph.org/video/vid1.shtml Download or Watch online]&amp;lt;/font&amp;gt;&amp;lt;/center&amp;gt;&lt;br /&gt;
&amp;lt;br style=&amp;quot;clear:both;&amp;quot;/&amp;gt;&lt;br /&gt;
&amp;lt;hr/&amp;gt;&lt;br /&gt;
&lt;br /&gt;
==Introduction==&lt;br /&gt;
&amp;lt;small&amp;gt;[[Talk:A_Digital_Media_Primer_For_Geeks_(episode_1)#Introduction|Discuss this section]]&amp;lt;/small&amp;gt;&lt;br /&gt;
&lt;br /&gt;
Workstations and high-end personal computers have been able to&lt;br /&gt;
manipulate digital audio pretty easily for about fifteen years now.&lt;br /&gt;
It&#039;s only been about five years that a decent workstation&#039;s been able&lt;br /&gt;
to handle raw video without a lot of expensive special purpose&lt;br /&gt;
hardware.&lt;br /&gt;
&lt;br /&gt;
But today even most cheap home PCs have the processor power and&lt;br /&gt;
storage necessary to really toss raw video around, at least without&lt;br /&gt;
too much of a struggle. So now that everyone has all of this cheap media-capable hardware, &lt;br /&gt;
more people, not surprisingly, want to do interesting&lt;br /&gt;
things with digital media, especially streaming. YouTube was the first huge&lt;br /&gt;
success, and now everybody wants in.&lt;br /&gt;
&lt;br /&gt;
Well good!  Because this stuff is a lot of fun!&lt;br /&gt;
&lt;br /&gt;
[[Image:Dmpfg_002.jpg|thumb|360px|right]]&lt;br /&gt;
It&#039;s no problem finding consumers for digital media.  But here I&#039;d&lt;br /&gt;
like to address the engineers, the mathematicians, the hackers, the&lt;br /&gt;
people who are interested in discovering and making things and&lt;br /&gt;
building the technology itself. The people after my own heart.&lt;br /&gt;
&lt;br /&gt;
Digital media, compression especially, is perceived to be super-elite,&lt;br /&gt;
somehow incredibly more difficult than anything else in computer&lt;br /&gt;
science. The big industry players in the field don&#039;t mind this&lt;br /&gt;
perception at all; it helps justify the staggering number of very&lt;br /&gt;
basic patents they hold.  They like the image that their media&lt;br /&gt;
researchers &amp;quot;are the best of the best, so much smarter than anyone&lt;br /&gt;
else that their brilliant ideas can&#039;t even be understood by mere&lt;br /&gt;
mortals. This is bunk.  &lt;br /&gt;
&lt;br /&gt;
Digital audio and video and streaming and compression offer endless&lt;br /&gt;
deep and stimulating mental challenges, just like any other&lt;br /&gt;
discipline. It seems elite because so few people have been been&lt;br /&gt;
involved.  So few people have been involved perhaps because so few&lt;br /&gt;
people could afford the expensive, special-purpose equipment it&lt;br /&gt;
required. But today, just about anyone watching this video has a&lt;br /&gt;
cheap, general-purpose computer powerful enough to play with the big&lt;br /&gt;
boys. There are battles going on today around HTML5 and browsers and&lt;br /&gt;
video and open vs. closed.  So now is a pretty good time to get&lt;br /&gt;
involved.  The easiest place to start is probably understanding the&lt;br /&gt;
technology we have right now.&lt;br /&gt;
&lt;br /&gt;
This is an introduction. Since it&#039;s an introduction, it glosses over a&lt;br /&gt;
ton of details so that the big picture&#039;s a little easier to see.&lt;br /&gt;
Quite a few people watching are going to be way past anything that I&#039;m&lt;br /&gt;
talking about, at least for now.  On the other hand, I&#039;m probably&lt;br /&gt;
going to go too fast for folks who really are are brand new to all of&lt;br /&gt;
this, so if this is all new, relax. The important thing is to pick out&lt;br /&gt;
any ideas that really grab your imagination. Especially pay attention&lt;br /&gt;
to the terminology surrounding those ideas, because with those, and&lt;br /&gt;
Google, and Wikipedia, you can dig as deep as interests you.&lt;br /&gt;
&lt;br /&gt;
So, without any further ado, welcome to one hell of a new hobby.&lt;br /&gt;
&lt;br /&gt;
&amp;lt;center&amp;gt;&amp;lt;div style=&amp;quot;background-color:#DDDDFF;border-color:#CCCCDD;border-style:solid;width:80%;padding:0 1em 1em 1em;text-align:left;&amp;quot;&amp;gt;&lt;br /&gt;
&#039;&#039;&#039;Going deeper…&#039;&#039;&#039;&lt;br /&gt;
*Fixme: Some good HTML5 baseline codec debate article&lt;br /&gt;
*[http://diveintohtml5.org/video.html Dive into HTML5] web video tutorial&lt;br /&gt;
&amp;lt;/div&amp;gt;&amp;lt;/center&amp;gt;&lt;br /&gt;
&lt;br /&gt;
&amp;lt;br style=&amp;quot;clear:both;&amp;quot;/&amp;gt;&lt;br /&gt;
&lt;br /&gt;
==Analog vs Digital==&lt;br /&gt;
&amp;lt;small&amp;gt;[[Talk:A_Digital_Media_Primer_For_Geeks_(episode_1)#Analog_vs_Digital|Discuss this section]]&amp;lt;/small&amp;gt;&lt;br /&gt;
[[Image:Dmpfg_004.jpg|thumb|360px|right]]&lt;br /&gt;
&lt;br /&gt;
Sound is the propagation of pressure waves through air, spreading out&lt;br /&gt;
from a source like ripples spread from a stone tossed into a pond.  A&lt;br /&gt;
microphone, or the human ear for that matter, transforms these passing&lt;br /&gt;
ripples of pressure into an electric signal.  Right, this is&lt;br /&gt;
middle school science class, everyone remembers this.  Moving on.&lt;br /&gt;
&lt;br /&gt;
That audio signal is a one-dimensional function, a single value&lt;br /&gt;
varying over time.  If we slow the &#039;scope down a bit... that should be&lt;br /&gt;
a little easier to see. A few other aspects of the signal are&lt;br /&gt;
important. It&#039;s continuous in both value and time; that is, at any&lt;br /&gt;
given time it can have any real value, and there&#039;s a smoothly varying&lt;br /&gt;
value at every point in in time.  No matter how much we zoom in, there&lt;br /&gt;
are no discontinuities, no singularities, no instantaneous steps or&lt;br /&gt;
points where the signal ceases to exist. It&#039;s defined&lt;br /&gt;
everywhere. Classic continuous math works very well on these signals.&lt;br /&gt;
&lt;br /&gt;
A digital signal on the other hand is discrete in both value and time.&lt;br /&gt;
In the simplest and most common system, called Pulse Code Modulation,&lt;br /&gt;
one of a fixed number of possible values directly represents the&lt;br /&gt;
instantaneous signal amplitude at points in time spaced a fixed&lt;br /&gt;
distance apart.  The end result is a stream of digits.&lt;br /&gt;
&lt;br /&gt;
[[Image:Dmpfg_006.jpg|thumb|360px|right]]&lt;br /&gt;
Now this looks an awful lot like this.  It seems intuitive that we&lt;br /&gt;
should somehow be able to rigorously transform one into the other, and&lt;br /&gt;
good news, the Sampling Theorem says we can and tells us&lt;br /&gt;
how. Published in its most recognizable form by Claude Shannon in 1949&lt;br /&gt;
and built on the work of Nyquist, and Hartley, and tons of others, the&lt;br /&gt;
sampling theorem states that not only can we go back and&lt;br /&gt;
forth between analog and digital, but also lays&lt;br /&gt;
down a set of conditions for which conversion is lossless and the two&lt;br /&gt;
representations become equivalent and interchangeable.  When the&lt;br /&gt;
lossless conditions aren&#039;t met, the sampling theorem tells us how and&lt;br /&gt;
how much information is lost or corrupted.&lt;br /&gt;
&lt;br /&gt;
Up until very recently, analog technology was the basis for&lt;br /&gt;
practically everything done with audio, and that&#039;s not because most&lt;br /&gt;
audio comes from an originally analog source.  You may also think that&lt;br /&gt;
since computers are fairly recent, analog signal technology must have&lt;br /&gt;
come first.  Nope. Digital is actually older.  The telegraph predates&lt;br /&gt;
the telephone by half a century and was already fully mechanically&lt;br /&gt;
automated by the 1860s, sending coded, multiplexed digital signals&lt;br /&gt;
long distances. You know... tickertape. Harry Nyquist of Bell Labs was&lt;br /&gt;
researching telegraph pulse transmission when he published his&lt;br /&gt;
description of what later became known as the Nyquist frequency, the&lt;br /&gt;
core concept of the sampling theorem.  Now, it&#039;s true the telegraph&lt;br /&gt;
was transmitting symbolic information, text, not a digitized analog&lt;br /&gt;
signal, but with the advent of the telephone and radio, analog and&lt;br /&gt;
digital signal technology progressed rapidly and side-by-side.&lt;br /&gt;
&lt;br /&gt;
[[Image:Dmpfg_007.jpg|thumb|360px|right]]&lt;br /&gt;
Audio had always been manipulated as an analog signal because... well,&lt;br /&gt;
gee, it&#039;s so much easier.  A second-order low-pass filter, for example,&lt;br /&gt;
requires two passive components.  An all-analog short-time Fourier&lt;br /&gt;
transform, a few hundred.  Well, maybe a thousand if you want to build&lt;br /&gt;
something really fancy [bang on the 3585].  Processing signals&lt;br /&gt;
digitally requires millions to billions of transistors running at&lt;br /&gt;
microwave frequencies, support hardware at very least to digitize and&lt;br /&gt;
reconstruct the analog signals, a complete software ecosystem for&lt;br /&gt;
programming and controlling that billion-transistor juggernaut,&lt;br /&gt;
digital storage just in case you want to keep any of those bits for&lt;br /&gt;
later...&lt;br /&gt;
&lt;br /&gt;
So we come to the conclusion that analog is the only practical way to&lt;br /&gt;
do much with audio... well, unless you happen to have a billion&lt;br /&gt;
transistors and all the other things just lying around. And since we&lt;br /&gt;
do, digital signal processing becomes very attractive.&lt;br /&gt;
&lt;br /&gt;
For one thing, analog componentry just doesn&#039;t have the flexibility of&lt;br /&gt;
a general purpose computer.  Adding a new function to this&lt;br /&gt;
beast... yeah, it&#039;s probably not going to happen.  On a digital&lt;br /&gt;
processor though, just write a new program.  Software isn&#039;t trivial,&lt;br /&gt;
but it is a lot easier.&lt;br /&gt;
&lt;br /&gt;
Perhaps more importantly though every analog component is an&lt;br /&gt;
approximation. There&#039;s no such thing as a perfect transistor, or a&lt;br /&gt;
perfect inductor, or a perfect capacitor.  In analog, every component&lt;br /&gt;
adds noise and distortion, usually not very much, but it adds up. Just&lt;br /&gt;
transmitting an analog signal, especially over long distances,&lt;br /&gt;
progressively, measurably, irretrievably corrupts it.  Besides, all of&lt;br /&gt;
those single-purpose analog components take up a lot of space.  Two&lt;br /&gt;
lines of code on the billion transistors back here can implement a&lt;br /&gt;
filter that would require an inductor the size of a refrigerator.&lt;br /&gt;
&lt;br /&gt;
Digital systems don&#039;t have these drawbacks.  Digital signals can be&lt;br /&gt;
stored, copied, manipulated, and transmitted without adding any noise&lt;br /&gt;
or distortion. We do use lossy algorithms from time to time, but the&lt;br /&gt;
only unavoidably non-ideal steps are digitization and reconstruction,&lt;br /&gt;
where digital has to interface with all of that messy analog.  Messy&lt;br /&gt;
or not, modern conversion stages are very, very good.  By the&lt;br /&gt;
standards of our ears, we can consider them practically lossless as&lt;br /&gt;
well.&lt;br /&gt;
&lt;br /&gt;
With a little extra hardware, then, most of which is now small and&lt;br /&gt;
inexpensive due to our modern industrial infrastructure, digital audio&lt;br /&gt;
is the clear winner over analog.  So let us then go about storing it,&lt;br /&gt;
copying it, manipulating it, and transmitting it.&lt;br /&gt;
&lt;br /&gt;
&amp;lt;center&amp;gt;&amp;lt;div style=&amp;quot;background-color:#DDDDFF;border-color:#CCCCDD;border-style:solid;width:80%;padding:0 1em 1em 1em;text-align:left;&amp;quot;&amp;gt;&lt;br /&gt;
&#039;&#039;&#039;Going deeper…&#039;&#039;&#039;&lt;br /&gt;
*Wikipedia: [[wiki:Nyquist–Shannon_sampling_theorem|Nyquist–Shannon sampling theorem]]&lt;br /&gt;
*MIT OpenCourseWare [http://ocw.mit.edu/courses/electrical-engineering-and-computer-science/6-003-signals-and-systems-spring-2010/lecture-notes/ Lecture notes from 6.003 signals and systems.]&lt;br /&gt;
&amp;lt;/div&amp;gt;&amp;lt;/center&amp;gt;&lt;br /&gt;
&lt;br /&gt;
&amp;lt;br style=&amp;quot;clear:both;&amp;quot;/&amp;gt;&lt;br /&gt;
&lt;br /&gt;
==Raw (digital audio) meat==&lt;br /&gt;
&amp;lt;small&amp;gt;[[Talk:A_Digital_Media_Primer_For_Geeks_(episode_1)#Raw_.28digital_audio.29_meat|Discuss this section]]&amp;lt;/small&amp;gt;&lt;br /&gt;
&lt;br /&gt;
Pulse Code Modulation is the most common representation for &lt;br /&gt;
raw audio.  Other practical representations do exist: for example, the&lt;br /&gt;
Sigma-Delta coding used by the SACD, which is a form of Pulse Density&lt;br /&gt;
Modulation.  That said, Pulse Code Modulation is far&lt;br /&gt;
and away dominant, mainly because it&#039;s so mathematically&lt;br /&gt;
convenient.  An audio engineer can spend an entire career without&lt;br /&gt;
running into anything else.&lt;br /&gt;
&lt;br /&gt;
PCM encoding can be characterized in three parameters, making it easy&lt;br /&gt;
to account for every possible PCM variant with mercifully little&lt;br /&gt;
hassle.&lt;br /&gt;
&lt;br /&gt;
===sample rate===&lt;br /&gt;
&lt;br /&gt;
The first parameter is the sampling rate.  The highest frequency an&lt;br /&gt;
encoding can represent is called the Nyquist Frequency.  The Nyquist&lt;br /&gt;
frequency of PCM happens to be exactly half the sampling rate.&lt;br /&gt;
Therefore, the sampling rate directly determines the highest possible&lt;br /&gt;
frequency in the digitized signal.&lt;br /&gt;
&lt;br /&gt;
Analog telephone systems traditionally band-limited voice channels to&lt;br /&gt;
just under 4kHz, so digital telephony and most classic voice&lt;br /&gt;
applications use an 8kHz sampling rate: the minimum sampling rate&lt;br /&gt;
necessary to capture the entire bandwidth of a 4kHz channel.  This is&lt;br /&gt;
what an 8kHz sampling rate sounds like&amp;amp;mdash;a bit muffled but perfectly&lt;br /&gt;
intelligible for voice.  This is the lowest sampling rate that&#039;s ever&lt;br /&gt;
been used widely in practice.&lt;br /&gt;
&lt;br /&gt;
From there, as power, and memory, and storage increased, consumer&lt;br /&gt;
computer hardware went to offering 11, and then 16, and then 22, and&lt;br /&gt;
then 32kHz sampling.  With each increase in the sampling rate and the&lt;br /&gt;
Nyquist frequency, it&#039;s obvious that the high end becomes a little&lt;br /&gt;
clearer and the sound more natural.&lt;br /&gt;
&lt;br /&gt;
The Compact Disc uses a 44.1kHz sampling rate, which is again slightly&lt;br /&gt;
better than 32kHz, but the gains are becoming less distinct.  44.1kHz&lt;br /&gt;
is a bit of an oddball choice, especially given that it hadn&#039;t been&lt;br /&gt;
used for anything prior to the compact disc, but the huge success of&lt;br /&gt;
the CD has made it a common rate.&lt;br /&gt;
&lt;br /&gt;
The most common hi-fidelity sampling rate aside from the CD is 48kHz.&lt;br /&gt;
There&#039;s virtually no audible difference between the two.  This video,&lt;br /&gt;
or at least the original version of it, was shot and produced with&lt;br /&gt;
48kHz audio, which happens to be the original standard for&lt;br /&gt;
high-fidelity audio with video.&lt;br /&gt;
&lt;br /&gt;
Super-hi-fidelity sampling rates of 88, and 96, and 192kHz have also&lt;br /&gt;
appeared. The reason for the sampling rates beyond 48kHz isn&#039;t to&lt;br /&gt;
extend the audible high frequencies further. It&#039;s for a different&lt;br /&gt;
reason.&lt;br /&gt;
&lt;br /&gt;
[[Image:Dmpfg_008.jpg|thumb|360px|right]]&lt;br /&gt;
Stepping back for just a second, the French mathematician Jean&lt;br /&gt;
Baptiste Joseph Fourier showed that we can also think of signals like&lt;br /&gt;
audio as a set of component frequencies.  This frequency-domain&lt;br /&gt;
representation is equivalent to the time representation; the signal is&lt;br /&gt;
exactly the same, we&#039;re just looking at it a different way.  Here we see the&lt;br /&gt;
frequency-domain representation of a hypothetical analog signal we&lt;br /&gt;
intend to digitally sample.&lt;br /&gt;
&lt;br /&gt;
The sampling theorem tells us two essential things about the sampling&lt;br /&gt;
process. First, that a digital signal can&#039;t represent any&lt;br /&gt;
frequencies above the Nyquist frequency. Second, and this is the new&lt;br /&gt;
part, if we don&#039;t remove those frequencies with a low-pass filter&lt;br /&gt;
before sampling, the sampling process will fold them down into the&lt;br /&gt;
representable frequency range as aliasing distortion.&lt;br /&gt;
&lt;br /&gt;
Aliasing, in a nutshell, sounds freakin&#039; awful, so it&#039;s essential to&lt;br /&gt;
remove any beyond-Nyquist frequencies before sampling and after&lt;br /&gt;
reconstruction.&lt;br /&gt;
&lt;br /&gt;
Human frequency perception is considered to extend to about 20kHz. In&lt;br /&gt;
44.1 or 48kHz sampling, the low pass before the sampling stage has to&lt;br /&gt;
be extremely sharp to avoid cutting any audible frequencies below&lt;br /&gt;
20kHz but still not allow frequencies above the Nyquist to leak&lt;br /&gt;
forward into the sampling process.  This is a difficult filter to&lt;br /&gt;
build, and no practical filter succeeds completely. If the sampling&lt;br /&gt;
rate is 96kHz or 192kHz on the other hand, the low pass has an extra&lt;br /&gt;
octave or two for its transition band. This is a much easier filter to&lt;br /&gt;
build.  Sampling rates beyond 48kHz are actually one of those messy&lt;br /&gt;
analog stage compromises.&lt;br /&gt;
&lt;br /&gt;
===sample format===&lt;br /&gt;
&lt;br /&gt;
The second fundamental PCM parameter is the sample format; that is,&lt;br /&gt;
the format of each digital number.  A number is a number, but a number&lt;br /&gt;
can be represented in bits a number of different ways.&lt;br /&gt;
&lt;br /&gt;
Early PCM was eight-bit linear, encoded as an unsigned byte.  The&lt;br /&gt;
dynamic range is limited to about 50dB  and the quantization noise, as&lt;br /&gt;
you can hear, is pretty severe.  Eight-bit audio is vanishingly rare&lt;br /&gt;
today.&lt;br /&gt;
&lt;br /&gt;
Digital telephony typically uses one of two related non-linear eight&lt;br /&gt;
bit encodings called A-law and mu-law. These formats encode a roughly&lt;br /&gt;
14 bit dynamic range into eight bits by spacing the higher amplitude&lt;br /&gt;
values farther apart. A-law and mu-law obviously improve quantization&lt;br /&gt;
noise compared to linear 8-bit, and voice harmonics especially hide&lt;br /&gt;
the remaining quantization noise well. All three eight-bit encodings,&lt;br /&gt;
linear, A-law, and mu-law, are typically paired with an 8kHz sampling&lt;br /&gt;
rate, though I&#039;m demonstrating them here at 48kHz.&lt;br /&gt;
&lt;br /&gt;
Most modern PCM uses 16- or 24-bit two&#039;s-complement signed integers to&lt;br /&gt;
encode the range from negative infinity to zero decibels in 16 or 24&lt;br /&gt;
bits of precision. The maximum absolute value corresponds to zero decibels.&lt;br /&gt;
As with all the sample formats so far, signals beyond zero decibels, and thus&lt;br /&gt;
beyond the maximum representable range, are clipped.&lt;br /&gt;
&lt;br /&gt;
In mixing and mastering, it&#039;s not unusual to use floating-point&lt;br /&gt;
numbers for PCM instead of integers.  A 32 bit IEEE754 float, that&#039;s&lt;br /&gt;
the normal kind of floating point you see on current computers, has 24&lt;br /&gt;
bits of resolution, but a seven bit floating-point exponent increases&lt;br /&gt;
the representable range.  Floating point usually represents zero&lt;br /&gt;
decibels as +/-1.0, and because floats can obviously represent&lt;br /&gt;
considerably beyond that, temporarily exceeding zero decibels during&lt;br /&gt;
the mixing process doesn&#039;t cause clipping.  Floating-point PCM takes&lt;br /&gt;
up more space, so it tends to be used only as an intermediate&lt;br /&gt;
production format.&lt;br /&gt;
&lt;br /&gt;
Lastly, most general purpose computers still read and&lt;br /&gt;
write data in octet bytes, so it&#039;s important to remember that samples&lt;br /&gt;
bigger than eight bits can be in big- or little-endian order, and both&lt;br /&gt;
endiannesses are common.  For example, Microsoft WAV files are little-endian,&lt;br /&gt;
and Apple AIFC files tend to be big-endian.  Be aware of it.&lt;br /&gt;
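&lt;br /&gt;
The portable habit is to assemble samples from bytes explicitly rather&lt;br /&gt;
than trusting the host CPU&#039;s byte order; a minimal sketch:&lt;br /&gt;
&lt;br /&gt;
 #include &lt;stdint.h&gt;&lt;br /&gt;
 &lt;br /&gt;
 /* Read one 16-bit sample from a little-endian stream (WAV-style)&lt;br /&gt;
    without depending on the byte order of the host CPU. */&lt;br /&gt;
 static int16_t read_s16_le(const unsigned char *p){&lt;br /&gt;
   return (int16_t)(p[0] | (p[1] &lt;&lt; 8));&lt;br /&gt;
 }&lt;br /&gt;
 &lt;br /&gt;
 /* The big-endian (AIFC-style) variant just swaps the byte roles. */&lt;br /&gt;
 static int16_t read_s16_be(const unsigned char *p){&lt;br /&gt;
   return (int16_t)(p[1] | (p[0] &lt;&lt; 8));&lt;br /&gt;
 }&lt;br /&gt;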
&lt;br /&gt;
===channels===&lt;br /&gt;
&lt;br /&gt;
The third PCM parameter is the number of channels.  The convention in&lt;br /&gt;
raw PCM is to encode multiple channels by interleaving the samples of&lt;br /&gt;
each channel together into a single stream.  Straightforward and extensible.&lt;br /&gt;
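&lt;br /&gt;
In code, the interleaving convention is just a nested loop; a sketch:&lt;br /&gt;
&lt;br /&gt;
 /* Interleave PCM channels: frame 0 of every channel, then frame 1&lt;br /&gt;
    of every channel, and so on, in channel order. */&lt;br /&gt;
 static void interleave(float **in, float *out,&lt;br /&gt;
                        int channels, int frames){&lt;br /&gt;
   for(int f = 0; f &lt; frames; f++)&lt;br /&gt;
     for(int c = 0; c &lt; channels; c++)&lt;br /&gt;
       out[f * channels + c] = in[c][f];&lt;br /&gt;
 }&lt;br /&gt;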
&lt;br /&gt;
===done!===&lt;br /&gt;
&lt;br /&gt;
And that&#039;s it!  That describes every PCM representation ever.  Done.&lt;br /&gt;
Digital audio is &#039;&#039;so easy&#039;&#039;!  There&#039;s more to do of course, but at this&lt;br /&gt;
point we&#039;ve got a nice useful chunk of audio data, so let&#039;s get some&lt;br /&gt;
video too.&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
&amp;lt;center&amp;gt;&amp;lt;div style=&amp;quot;background-color:#DDDDFF;border-color:#CCCCDD;border-style:solid;width:80%;padding:0 1em 1em 1em;text-align:left;&amp;quot;&amp;gt;&lt;br /&gt;
&#039;&#039;&#039;Going deeper…&#039;&#039;&#039;&lt;br /&gt;
* Fixme&lt;br /&gt;
&amp;lt;/div&amp;gt;&amp;lt;/center&amp;gt;&lt;br /&gt;
&lt;br /&gt;
&amp;lt;br style=&amp;quot;clear:both;&amp;quot;/&amp;gt;&lt;br /&gt;
&lt;br /&gt;
==Video vegetables (they&#039;re good for you!)==&lt;br /&gt;
&amp;lt;small&amp;gt;[[Talk:A_Digital_Media_Primer_For_Geeks_(episode_1)#Video_vegetables_.28they.27re_good_for_you.21.29|Discuss this section]]&amp;lt;/small&amp;gt;&lt;br /&gt;
&lt;br /&gt;
One could think of video as being like audio but with two additional&lt;br /&gt;
spatial dimensions, X and Y, in addition to the dimension of time.&lt;br /&gt;
This is mathematically sound. The Sampling Theorem applies to all&lt;br /&gt;
three video dimensions just as it does the single time dimension of&lt;br /&gt;
audio.&lt;br /&gt;
&lt;br /&gt;
Audio and video are obviously quite different in practice. For one,&lt;br /&gt;
compared to audio, video is huge. Raw CD audio is about 1.4 megabits&lt;br /&gt;
per second. Raw 1080i HD video is over 700 megabits per second. That&#039;s&lt;br /&gt;
more than 500 times more data to capture, process, and store per&lt;br /&gt;
second.  By Moore&#039;s law... that&#039;s... let&#039;s see... roughly eight&lt;br /&gt;
doublings times two years, so yeah, computers requiring about an extra&lt;br /&gt;
fifteen years to handle raw video after getting raw audio down pat was&lt;br /&gt;
about right.&lt;br /&gt;
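&lt;br /&gt;
Those figures are easy to sanity-check. Here&#039;s the arithmetic as a C&lt;br /&gt;
sketch, assuming 8-bit 4:2:0 chroma at about 30 frames per second for&lt;br /&gt;
the video; exact raw rates vary with pixel format and frame rate:&lt;br /&gt;
&lt;br /&gt;
 #include &lt;stdio.h&gt;&lt;br /&gt;
 &lt;br /&gt;
 int main(void){&lt;br /&gt;
   /* CD audio: 44100 Hz x 16 bits x 2 channels */&lt;br /&gt;
   double cd = 44100.0 * 16 * 2;              /* ~1.41 Mbps */&lt;br /&gt;
   /* 1080 video, 8-bit 4:2:0 (1.5 bytes/pixel), ~30 fps */&lt;br /&gt;
   double hd = 1920.0 * 1080 * 1.5 * 8 * 30;  /* ~746 Mbps  */&lt;br /&gt;
   printf(&quot;CD %.2f Mbps, HD %.0f Mbps, ratio %.0fx\n&quot;,&lt;br /&gt;
          cd / 1e6, hd / 1e6, hd / cd);&lt;br /&gt;
   return 0;&lt;br /&gt;
 }&lt;br /&gt;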
&lt;br /&gt;
Basic raw video is also just more complex than basic raw audio. The&lt;br /&gt;
sheer volume of data currently necessitates a representation more&lt;br /&gt;
efficient than the linear PCM used for audio.  In addition, electronic&lt;br /&gt;
video comes almost entirely from broadcast television alone, and the&lt;br /&gt;
standards committees that govern broadcast video have always been very&lt;br /&gt;
concerned with backward compatibility.  Up until just last year in the&lt;br /&gt;
US, a sixty-year-old black and white television could still show a&lt;br /&gt;
normal analog television broadcast.  That&#039;s actually a really neat&lt;br /&gt;
trick.&lt;br /&gt;
&lt;br /&gt;
The downside to backward compatibility is that once a detail makes it&lt;br /&gt;
into a standard, you can&#039;t ever really throw it out again. Electronic&lt;br /&gt;
video has never started over from scratch the way audio has multiple&lt;br /&gt;
times.  Sixty years worth of clever but obsolete hacks necessitated by&lt;br /&gt;
the passing technology of a given era have built up into quite a pile,&lt;br /&gt;
and because digital standards also come from broadcast television, all&lt;br /&gt;
these eldritch hacks have been brought forward into the digital&lt;br /&gt;
standards as well.&lt;br /&gt;
&lt;br /&gt;
In short, there are a whole lot more details involved in digital video&lt;br /&gt;
than there were with audio. There&#039;s no hope of covering them&lt;br /&gt;
all completely here, so we&#039;ll cover the broad fundamentals.&lt;br /&gt;
&lt;br /&gt;
===resolution and aspect===&lt;br /&gt;
&lt;br /&gt;
The most obvious raw video parameters are the width and height of the&lt;br /&gt;
picture in pixels. As simple as that may sound, the pixel dimensions&lt;br /&gt;
alone don&#039;t actually specify the absolute width and height of the&lt;br /&gt;
picture, as most broadcast-derived video doesn&#039;t use square pixels.&lt;br /&gt;
The number of scanlines in a broadcast image was fixed, but the&lt;br /&gt;
effective number of horizontal pixels was a function of channel&lt;br /&gt;
bandwidth. Effective horizontal resolution could result in pixels that&lt;br /&gt;
were either narrower or wider than the spacing between scanlines.&lt;br /&gt;
&lt;br /&gt;
Standards have generally specified that digitally sampled video should&lt;br /&gt;
reflect the real resolution of the original analog source, so a large&lt;br /&gt;
amount of digital video also uses non-square pixels. For example, a&lt;br /&gt;
normal 4:3 aspect NTSC DVD is typically encoded with a display&lt;br /&gt;
resolution of 704 by 480, a ratio wider than 4:3.  In this case, the&lt;br /&gt;
pixels themselves are assigned an aspect ratio of 10:11, making them&lt;br /&gt;
taller than they are wide and narrowing the image horizontally to the&lt;br /&gt;
correct aspect.  Such an image has to be resampled to show properly on&lt;br /&gt;
a digital display with square pixels.&lt;br /&gt;
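&lt;br /&gt;
The aspect arithmetic is worth seeing once; a quick check in C:&lt;br /&gt;
&lt;br /&gt;
 #include &lt;stdio.h&gt;&lt;br /&gt;
 &lt;br /&gt;
 int main(void){&lt;br /&gt;
   /* 704x480 NTSC DVD frame, 10:11 pixel aspect ratio: scaling&lt;br /&gt;
      the width by the pixel aspect gives the square-pixel shape. */&lt;br /&gt;
   double w = 704.0 * 10 / 11;   /* = 640 square pixels */&lt;br /&gt;
   printf(&quot;%.0f x 480, aspect %.4f (4:3 = %.4f)\n&quot;,&lt;br /&gt;
          w, w / 480, 4.0 / 3);&lt;br /&gt;
   return 0;&lt;br /&gt;
 }&lt;br /&gt;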
&lt;br /&gt;
===frame rate and interlacing===&lt;br /&gt;
&lt;br /&gt;
The second obvious video parameter is the frame rate, the number of&lt;br /&gt;
full frames per second.  Several standard frame rates are in active&lt;br /&gt;
use. Digital video, in one form or another, can use all of them.  Or,&lt;br /&gt;
any other frame rate.  Or even variable rates where the frame rate&lt;br /&gt;
changes adaptively over the course of the video. The higher the frame&lt;br /&gt;
rate, the smoother the motion and that brings us, unfortunately, to&lt;br /&gt;
interlacing.&lt;br /&gt;
&lt;br /&gt;
In the very earliest days of broadcast video, engineers sought the&lt;br /&gt;
fastest practical frame rate to smooth motion and to minimize flicker&lt;br /&gt;
on phosphor-based CRTs.  They were also under pressure to use the&lt;br /&gt;
least possible bandwidth for the highest resolution and fastest frame&lt;br /&gt;
rate.  Their solution was to interlace the video where the even lines&lt;br /&gt;
are sent in one pass and the odd lines in the next.  Each pass is&lt;br /&gt;
called a field and two fields sort of produce one complete frame.&lt;br /&gt;
&amp;quot;Sort of&amp;quot;, because the even and odd fields aren&#039;t actually from the&lt;br /&gt;
same source frame.  In a 60 field per second picture, the source frame&lt;br /&gt;
rate is actually 60 full frames per second, and half of each frame,&lt;br /&gt;
every other line, is simply discarded.  This is why we can&#039;t&lt;br /&gt;
deinterlace a video simply by combining two fields into one frame;&lt;br /&gt;
they&#039;re not actually from one frame to begin with.&lt;br /&gt;
&lt;br /&gt;
===gamma===&lt;br /&gt;
&lt;br /&gt;
The cathode ray tube was the only available display technology for&lt;br /&gt;
most of the history of electronic video. A CRT&#039;s output brightness is&lt;br /&gt;
nonlinear, approximately equal to the input controlling voltage raised&lt;br /&gt;
to the 2.5th power. This exponent, 2.5, is designated gamma, and so&lt;br /&gt;
it&#039;s often referred to as the gamma of a display.  Cameras, though,&lt;br /&gt;
are linear, and if you feed a CRT a linear input signal, it looks a&lt;br /&gt;
bit like this.&lt;br /&gt;
&lt;br /&gt;
As there were originally to be very few cameras, which were&lt;br /&gt;
fantastically expensive anyway, and hopefully many, many television&lt;br /&gt;
sets which had best be as inexpensive as possible, engineers decided to&lt;br /&gt;
add the necessary gamma correction circuitry to the cameras rather&lt;br /&gt;
than the sets. Video transmitted over the airwaves would thus have a&lt;br /&gt;
nonlinear intensity using the inverse of the set&#039;s gamma exponent, so that&lt;br /&gt;
once a camera&#039;s signal was finally displayed on the CRT, the overall&lt;br /&gt;
response of the system from camera to set was back to linear again.&lt;br /&gt;
&lt;br /&gt;
Almost.&lt;br /&gt;
&lt;br /&gt;
There were also two other tweaks. A television camera actually uses a&lt;br /&gt;
gamma exponent that&#039;s the inverse of 2.2, not 2.5.  That&#039;s just a&lt;br /&gt;
correction for viewing in a dim environment. Also, the exponential&lt;br /&gt;
curve transitions to a linear ramp near black.  That&#039;s just an old&lt;br /&gt;
hack for suppressing sensor noise in the camera.&lt;br /&gt;
&lt;br /&gt;
Gamma correction also had a lucky benefit. It just so happens that the&lt;br /&gt;
human eye has a perceptual gamma of about 3.  This is relatively close&lt;br /&gt;
to the CRT&#039;s gamma of 2.5. An image using gamma correction devotes&lt;br /&gt;
more resolution to lower intensities, where the eye happens to have&lt;br /&gt;
its finest intensity discrimination, and therefore uses the available&lt;br /&gt;
scale resolution more efficiently.  Although CRTs are currently&lt;br /&gt;
vanishing, a standard sRGB computer display still uses a nonlinear&lt;br /&gt;
intensity curve similar to television, with a linear ramp near black,&lt;br /&gt;
followed by an exponential curve with a gamma exponent of 2.4. This&lt;br /&gt;
encodes a sixteen bit linear range down into eight bits.&lt;br /&gt;
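&lt;br /&gt;
That curve is compact enough to write out. A sketch of the standard&lt;br /&gt;
sRGB transfer function, both directions:&lt;br /&gt;
&lt;br /&gt;
 #include &lt;math.h&gt;&lt;br /&gt;
 &lt;br /&gt;
 /* sRGB encode: a linear ramp near black, then an exponential&lt;br /&gt;
    segment with a gamma exponent of 2.4.  Values are in [0,1]. */&lt;br /&gt;
 static double srgb_encode(double lin){&lt;br /&gt;
   if(lin &lt;= 0.0031308) return 12.92 * lin;&lt;br /&gt;
   return 1.055 * pow(lin, 1.0 / 2.4) - 0.055;&lt;br /&gt;
 }&lt;br /&gt;
 &lt;br /&gt;
 /* The decode direction inverts both segments. */&lt;br /&gt;
 static double srgb_decode(double enc){&lt;br /&gt;
   if(enc &lt;= 0.04045) return enc / 12.92;&lt;br /&gt;
   return pow((enc + 0.055) / 1.055, 2.4);&lt;br /&gt;
 }&lt;br /&gt;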
&lt;br /&gt;
===color and colorspace===&lt;br /&gt;
&lt;br /&gt;
The human eye has three apparent color channels, red, green, and blue,&lt;br /&gt;
and most displays use these three colors as additive primaries to&lt;br /&gt;
produce a full range of color output.  The primary pigments in&lt;br /&gt;
printing are Cyan, Magenta, and Yellow for the same reason; pigments&lt;br /&gt;
are subtractive, and each of these pigments subtracts one pure color&lt;br /&gt;
from reflected light.  Cyan subtracts red, magenta subtracts green, and&lt;br /&gt;
yellow subtracts blue.&lt;br /&gt;
&lt;br /&gt;
Video can be, and sometimes is, represented with red, green, and blue&lt;br /&gt;
color channels, but RGB video is atypical. The human eye is far more&lt;br /&gt;
sensitive to luminosity than it is to color, and RGB tends to spread&lt;br /&gt;
the energy of an image across all three color channels.  That is, the&lt;br /&gt;
red plane looks like a red version of the original picture, the green&lt;br /&gt;
plane looks like a green version of the original picture, and the blue&lt;br /&gt;
plane looks like a blue version of the original picture.  Black and&lt;br /&gt;
white times three.  Not efficient.&lt;br /&gt;
&lt;br /&gt;
For those reasons and because, oh hey, television just happened to&lt;br /&gt;
start out as black and white anyway, video usually is represented as a&lt;br /&gt;
high resolution luma channel&amp;amp;mdash;the black &amp;amp; white&amp;amp;mdash;along with&lt;br /&gt;
additional, often lower resolution chroma channels, the color. The&lt;br /&gt;
luma channel, Y, is produced by weighting and then adding the separate&lt;br /&gt;
red, green and blue signals.  The chroma channels U and V are then&lt;br /&gt;
produced by subtracting the luma signal from blue and the luma signal&lt;br /&gt;
from red.&lt;br /&gt;
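&lt;br /&gt;
A sketch of that construction, using the BT.601 luma weights as one&lt;br /&gt;
common choice; other standards weight the primaries differently:&lt;br /&gt;
&lt;br /&gt;
 /* Luma is a weighted sum of R, G, B -- BT.601 weights shown --&lt;br /&gt;
    and chroma is blue-minus-luma and red-minus-luma.  All values&lt;br /&gt;
    nominally in [0,1]. */&lt;br /&gt;
 static void rgb_to_yuv(double r, double g, double b,&lt;br /&gt;
                        double *y, double *u, double *v){&lt;br /&gt;
   *y = 0.299 * r + 0.587 * g + 0.114 * b;&lt;br /&gt;
   *u = b - *y;   /* scaled and offset in digital use */&lt;br /&gt;
   *v = r - *y;&lt;br /&gt;
 }&lt;br /&gt;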
&lt;br /&gt;
When YUV is scaled, offset, and quantized for digital video, it&#039;s&lt;br /&gt;
usually more correctly called Y&#039;CbCr, but the more generic term YUV is&lt;br /&gt;
widely used to describe all the analog and digital variants of this&lt;br /&gt;
color model.&lt;br /&gt;
&lt;br /&gt;
===chroma subsampling===&lt;br /&gt;
&lt;br /&gt;
The U and V chroma channels can have the same resolution as the Y&lt;br /&gt;
channel, but because the human eye has far less spatial color&lt;br /&gt;
resolution than spatial luminosity resolution, chroma resolution is&lt;br /&gt;
usually halved or even quartered in the horizontal direction, the&lt;br /&gt;
vertical direction, or both, usually without any significant impact on the&lt;br /&gt;
apparent raw image quality.  Practically every possible subsampling&lt;br /&gt;
variant has been used at one time or another, but the common choices&lt;br /&gt;
today are 4:4:4 video, which isn&#039;t actually subsampled at all, 4:2:2 video in&lt;br /&gt;
which the horizontal resolution of the U and V channels is halved, and&lt;br /&gt;
most common of all, 4:2:0 video in which both the horizontal and vertical&lt;br /&gt;
resolutions of the chroma channels are halved, resulting in U and V&lt;br /&gt;
planes that are each one quarter the size of Y.&lt;br /&gt;
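&lt;br /&gt;
The bookkeeping for a 4:2:0 frame, sketched in C and assuming even&lt;br /&gt;
dimensions:&lt;br /&gt;
&lt;br /&gt;
 #include &lt;stddef.h&gt;&lt;br /&gt;
 &lt;br /&gt;
 /* Bytes in one 8-bit 4:2:0 frame: a full-size Y plane plus two&lt;br /&gt;
    chroma planes at half resolution both ways, i.e. 1.5 bytes per&lt;br /&gt;
    pixel overall. */&lt;br /&gt;
 static size_t yuv420_frame_bytes(size_t w, size_t h){&lt;br /&gt;
   size_t y  = w * h;&lt;br /&gt;
   size_t uv = (w / 2) * (h / 2);   /* per chroma plane */&lt;br /&gt;
   return y + 2 * uv;&lt;br /&gt;
 }&lt;br /&gt;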
&lt;br /&gt;
The terms 4:2:2, 4:2:0, 4:1:1, and so on and so forth, aren&#039;t complete&lt;br /&gt;
descriptions of a chroma subsampling. There are multiple possible ways&lt;br /&gt;
to position the chroma pixels relative to luma, and again, several&lt;br /&gt;
variants are in active use for each subsampling.  For example, motion&lt;br /&gt;
JPEG, MPEG-1 video, MPEG-2 video, DV, Theora, and WebM all use or can&lt;br /&gt;
use 4:2:0 subsampling, but they site the chroma pixels three different&lt;br /&gt;
ways.&lt;br /&gt;
&lt;br /&gt;
Motion JPEG, MPEG1 video, Theora and WebM all site chroma pixels&lt;br /&gt;
between luma pixels both horizontally and vertically.&lt;br /&gt;
&lt;br /&gt;
MPEG2 sites chroma pixels between lines, but horizontally aligned with&lt;br /&gt;
every other luma pixel. Interlaced modes complicate things somewhat,&lt;br /&gt;
resulting in a siting arrangement that&#039;s a tad bizarre.&lt;br /&gt;
&lt;br /&gt;
And finally PAL-DV, which is always interlaced, places the chroma&lt;br /&gt;
pixels in the same position as every other luma pixel in the&lt;br /&gt;
horizontal direction, and vertically alternates chroma channel on&lt;br /&gt;
each line.&lt;br /&gt;
&lt;br /&gt;
That&#039;s just 4:2:0 video. I&#039;ll leave the other subsamplings as homework for the&lt;br /&gt;
viewer.  Got the basic idea, moving on.&lt;br /&gt;
&lt;br /&gt;
===pixel formats===&lt;br /&gt;
&lt;br /&gt;
In audio, we always represent multiple channels in a PCM stream by&lt;br /&gt;
interleaving the samples from each channel in order. Video uses both&lt;br /&gt;
packed formats that interleave the color channels, as well as planar&lt;br /&gt;
formats that keep the pixels from each channel together in separate&lt;br /&gt;
planes stacked in order in the frame. There are at least 50 different formats in&lt;br /&gt;
these two broad categories with possibly ten or fifteen in common use.&lt;br /&gt;
&lt;br /&gt;
Each chroma subsampling and different bit-depth requires a different&lt;br /&gt;
packing arrangement, and so a different pixel format.  For a given&lt;br /&gt;
unique subsampling, there are usually also several equivalent formats&lt;br /&gt;
that consist of trivial channel order rearrangements or repackings, due either to&lt;br /&gt;
convenience once-upon-a-time on some particular piece of hardware, or&lt;br /&gt;
sometimes just good old-fashioned spite.&lt;br /&gt;
&lt;br /&gt;
Pixel formats are described by a unique name or fourcc code.  There&lt;br /&gt;
are quite a few of these and there&#039;s no sense going over each one now.&lt;br /&gt;
Google is your friend.  Be aware that fourcc codes for raw video&lt;br /&gt;
specify the pixel arrangement and chroma subsampling, but generally&lt;br /&gt;
don&#039;t imply anything certain about chroma siting or color space.  YV12&lt;br /&gt;
video, to pick one, can use JPEG, MPEG-2, or DV chroma siting, and any&lt;br /&gt;
one of several YUV colorspace definitions.&lt;br /&gt;
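&lt;br /&gt;
To make &amp;quot;pixel arrangement&amp;quot; concrete, here&#039;s the plane layout the YV12&lt;br /&gt;
fourcc pins down, as a sketch; I420 is the same thing with the U and V&lt;br /&gt;
planes swapped:&lt;br /&gt;
&lt;br /&gt;
 #include &lt;stddef.h&gt;&lt;br /&gt;
 &lt;br /&gt;
 /* Plane offsets in one YV12 frame: full-size Y first, then the&lt;br /&gt;
    quarter-size V plane, then U.  The fourcc fixes exactly this&lt;br /&gt;
    packing and nothing about siting or colorspace. */&lt;br /&gt;
 static void yv12_offsets(size_t w, size_t h,&lt;br /&gt;
                          size_t *y, size_t *v, size_t *u){&lt;br /&gt;
   *y = 0;&lt;br /&gt;
   *v = w * h;&lt;br /&gt;
   *u = w * h + (w / 2) * (h / 2);&lt;br /&gt;
 }&lt;br /&gt;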
&lt;br /&gt;
===done!===&lt;br /&gt;
&lt;br /&gt;
That wraps up our not-so-quick and yet very incomplete tour of raw&lt;br /&gt;
video. The good news is we can already get quite a lot of real work&lt;br /&gt;
done using that overview. In plenty of situations, a frame of video&lt;br /&gt;
data is a frame of video data.  The details matter, greatly, when it&lt;br /&gt;
comes time to write software, but for now I am satisfied that the&lt;br /&gt;
esteemed viewer is broadly aware of the relevant issues.&lt;br /&gt;
&lt;br /&gt;
&amp;lt;center&amp;gt;&amp;lt;div style=&amp;quot;background-color:#DDDDFF;border-color:#CCCCDD;border-style:solid;width:80%;padding:0 1em 1em 1em;text-align:left;&amp;quot;&amp;gt;&lt;br /&gt;
&#039;&#039;&#039;Going deeper…&#039;&#039;&#039;&lt;br /&gt;
* Fixme&lt;br /&gt;
&amp;lt;/div&amp;gt;&amp;lt;/center&amp;gt;&lt;br /&gt;
&lt;br /&gt;
&amp;lt;br style=&amp;quot;clear:both;&amp;quot;/&amp;gt;&lt;br /&gt;
&lt;br /&gt;
==Containers==&lt;br /&gt;
&amp;lt;small&amp;gt;[[Talk:A_Digital_Media_Primer_For_Geeks_(episode_1)#Containers|Discuss this section]]&amp;lt;/small&amp;gt;&lt;br /&gt;
&lt;br /&gt;
So. We have audio data. We have video data. What remains is the more&lt;br /&gt;
familiar non-signal data and straight-up engineering that software&lt;br /&gt;
developers are used to, and plenty of it.&lt;br /&gt;
&lt;br /&gt;
Chunks of raw audio and video data have no externally-visible&lt;br /&gt;
structure, but they&#039;re often uniformly sized.  We could just string&lt;br /&gt;
them together in a rigid predetermined ordering for streaming and&lt;br /&gt;
storage, and some simple systems do approximately that. Compressed&lt;br /&gt;
frames, though, aren&#039;t necessarily a predictable size, and we usually want&lt;br /&gt;
some flexibility in using a range of different data types in streams.&lt;br /&gt;
If we string random formless data together, we lose the boundaries&lt;br /&gt;
that separate frames and don&#039;t necessarily know what data belongs to&lt;br /&gt;
which streams.  A stream needs some generalized structure to be&lt;br /&gt;
generally useful.&lt;br /&gt;
&lt;br /&gt;
In addition to our signal data, we also have our PCM and video&lt;br /&gt;
parameters.  There&#039;s probably plenty of other metadata we also want to&lt;br /&gt;
deal with, like audio tags and video chapters and subtitles, all&lt;br /&gt;
essential components of rich media.  It makes sense to place this&lt;br /&gt;
metadata&amp;amp;mdash;that is, data about the data&amp;amp;mdash;within the media itself.&lt;br /&gt;
&lt;br /&gt;
Storing and structuring formless data and disparate metadata is the&lt;br /&gt;
job of a container.  Containers provide framing for the data blobs,&lt;br /&gt;
interleave and identify multiple data streams, provide timing&lt;br /&gt;
information, and store the metadata necessary to parse, navigate,&lt;br /&gt;
manipulate, and present the media.  In general, any container can hold&lt;br /&gt;
any kind of data.  And data can be put into any container.&lt;br /&gt;
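&lt;br /&gt;
To make the framing idea concrete, here&#039;s a toy packet header; it&#039;s&lt;br /&gt;
entirely hypothetical, but every real container carries equivalents of&lt;br /&gt;
these fields in its own layout:&lt;br /&gt;
&lt;br /&gt;
 #include &lt;stdint.h&gt;&lt;br /&gt;
 &lt;br /&gt;
 /* A toy container packet header (for illustration only): which&lt;br /&gt;
    stream a data blob belongs to, how big it is, and when to&lt;br /&gt;
    present it. */&lt;br /&gt;
 struct toy_packet {&lt;br /&gt;
   uint32_t stream_id;   /* identifies the interleaved stream */&lt;br /&gt;
   uint32_t length;      /* framing: payload bytes that follow */&lt;br /&gt;
   uint64_t timestamp;   /* presentation time, in some timebase */&lt;br /&gt;
   /* payload bytes follow */&lt;br /&gt;
 };&lt;br /&gt;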
&lt;br /&gt;
==Credits==&lt;br /&gt;
&lt;br /&gt;
In the past thirty minutes, we&#039;ve covered digital audio, video, some&lt;br /&gt;
history, some math and a little engineering. We&#039;ve barely scratched the&lt;br /&gt;
surface, but it&#039;s time for a well earned break.&lt;br /&gt;
&lt;br /&gt;
There&#039;s so much more to talk about, so I hope you&#039;ll join me again in&lt;br /&gt;
our next episode.  Until then--- Cheers!&lt;br /&gt;
&lt;br /&gt;
Written by:&lt;br /&gt;
Christopher (Monty) Montgomery&lt;br /&gt;
and the Xiph.Org Community&lt;br /&gt;
&lt;br /&gt;
Intro, title and credits music:&lt;br /&gt;
&amp;quot;Boo Boo Coming&amp;quot;, by Joel Forrester&lt;br /&gt;
Performed by the Microscopic Septet&lt;br /&gt;
Used by permission of Cuneiform Records.&lt;br /&gt;
Original source track All Rights Reserved.&lt;br /&gt;
www.cuneiformrecords.com&lt;br /&gt;
&lt;br /&gt;
This Video Was Produced Entirely With Free and Open Source Software&lt;br /&gt;
&lt;br /&gt;
GNU&lt;br /&gt;
Linux&lt;br /&gt;
Fedora&lt;br /&gt;
Cinelerra&lt;br /&gt;
The Gimp&lt;br /&gt;
Audacity&lt;br /&gt;
Postfish&lt;br /&gt;
Gstreamer&lt;br /&gt;
&lt;br /&gt;
CC BY-NC-SA&lt;br /&gt;
A Co-Production of Xiph.Org and Red Hat Inc.&lt;br /&gt;
(C) 2010, Some Rights Reserved&lt;br /&gt;
&lt;br /&gt;
&amp;lt;br style=&amp;quot;clear:both;&amp;quot;/&amp;gt;&lt;br /&gt;
==The making of…==&lt;br /&gt;
&amp;lt;small&amp;gt;[[Talk:A_Digital_Media_Primer_For_Geeks_(episode_1)#The_making_of.E2.80.A6|Discuss this section]]&amp;lt;/small&amp;gt;&lt;br /&gt;
===Equipment===&lt;br /&gt;
====Camera====&lt;br /&gt;
Canon HV40 HDV camera w/ wide-angle lens operating on a tripod.&lt;br /&gt;
&lt;br /&gt;
The wide angle lens gives the camera a nice close macro mode, and approximately triples the amount of light coming into the sensor for a given zoom/aperture.  Useful for shooting indoors at night.&lt;br /&gt;
&lt;br /&gt;
No additional lighting kit was used.&lt;br /&gt;
&lt;br /&gt;
====Audio====&lt;br /&gt;
&lt;br /&gt;
Two Crown PCC160 boundary microphones placed on a table approximately 4-8 feet in front of the speaker, run through a cheap Behringer portable mixer and into the camera&#039;s microphone input.  &lt;br /&gt;
&lt;br /&gt;
No additional audio kit was used.&lt;br /&gt;
&lt;br /&gt;
====Sundries====&lt;br /&gt;
&lt;br /&gt;
Whiteboard markers by &#039;Bic&#039;&lt;br /&gt;
&lt;br /&gt;
Drawing aids by Staedtler, McMaster Carr, and &#039;Generic&#039;.&lt;br /&gt;
&lt;br /&gt;
===Video shooting sequence===&lt;br /&gt;
&lt;br /&gt;
Scenes were pre-scripted and memorized, usually with lots of on-the-fly revision.  (In the future... I&#039;m totally getting a teleprompter.  Wh000.  OTOH, I can totally rattle off the entire video script from beginning to end as a party trick, thus ensuring I&#039;ll not be invited to many parties --Monty).&lt;br /&gt;
&lt;br /&gt;
Diagrams were drawn by hand on a physical whiteboard with whiteboard markers and magnetic T-squares, triangles, and yardsticks.  Despite looking a lot like greenscreen work, there is no image compositing in use (actually-- there are two small composites where an error in a whiteboard diagram was corrected by subtracting part of the original image and then adding a corrected version of the diagram).&lt;br /&gt;
&lt;br /&gt;
Camera operated in 24F shutter priority mode (Tv set to &amp;quot;24&amp;quot;) with exposure and white balance both calibrated to the white board (or a white piece of paper) and locked.  Microphone attenuation setting active, with gain locked such that room noise peaked at -40dB (all the rooms in the shooting sequences were noisy due to the building&#039;s ventilation system, or active equipment). Lighting in the whiteboard rooms tended to be odd, with little relative light cast on a presenter standing just in front of the whiteboard; a presenter is practically standing in the room&#039;s only shadow.  Most of the room light is focused on the table and walls.  Additional fill lighting kit would have been useful, but for the first vid, I didn&#039;t want &#039;perfect&#039; to be the enemy of &#039;good&#039;.&lt;br /&gt;
&lt;br /&gt;
Autofocus used for whiteboard scenes, manual focus used for several workshop scenes as the autofocus tended to hunt continuously in very low light.&lt;br /&gt;
&lt;br /&gt;
Continuous capture to a Thinkpad with firewire input via a simple gstreamer script.&lt;br /&gt;
&lt;br /&gt;
===Production sequence===&lt;br /&gt;
====All hail Cinelerra.  You better hail, or Cinelerra will get pissy about it.====&lt;br /&gt;
&lt;br /&gt;
Most of the production sequence hinged on making Cinelerra happy; it is a hulking rusty cast iron WWI tank of a program that can seem like it&#039;s composed entirely of compressed bugs.  That said, it was neither particularly crashy nor did it ever accidentally corrupt or lose work.  It was also the only FOSS editor with a working 2D compositor.  It got the job done once I found a workflow it would cope with (and fixed a number of bugs; these fixes are available from my cinelerra Git repo at http://git.xiph.org/?p=users/xiphmont/cinelerraCV.git;a=summary)&lt;br /&gt;
&lt;br /&gt;
====Choosing takes====&lt;br /&gt;
&lt;br /&gt;
Each shooting session yielded four to six hours of raw video.  The first step was to load the raw video into the cinelerra timeline, label each complete take, compare and choose the take to use, then render the chosen take out to a raw clip as a YUV4MPEG raw video file and a WAV raw audio file.  Be careful that Settings-&amp;gt;Align Cursor On Frames is set, else the audio and video renders won&#039;t start on the same boundary.&lt;br /&gt;
&lt;br /&gt;
====Postprocessing====&lt;br /&gt;
&lt;br /&gt;
At this point, the raw video clips were adjusted for gamma, contrast and saturation in gstreamer and mplayer.  In the earlier shoots, the camera was underexposing due to pilot error, which required quite a bit of gamma and saturation inflation to &#039;correct&#039; (there is no real correction as the low-end data is gone, but it&#039;s possible to make it look better).  Later shoots used saner settings and the adjustments were mostly to keep different shooting sessions more uniform.  The whiteboard tends not to look white because it&#039;s mildly reflective, and picked up the color of the cyan and orange audio baffles in the room like a big diffuse mirror.&lt;br /&gt;
&lt;br /&gt;
The audio was both noisy (due to the building&#039;s ventilation system which either sounded like a low loud rumble or a jet-engine taking off) and reverberant (the rooms were glass on two sides and plaster on the other two).  Early takes used no additional sound absorbing material in the rooms, and the Postfish filtering and deverb was used heavily.  It gives the early audio in the vid a slightly odd, processed feel (I had almost decided the original audio was simply unusable).  Later takes used some big fleece &#039;soft flats&#039; in the room to absorb some additional reverb, and the later takes are less heavily filtered.&lt;br /&gt;
&lt;br /&gt;
The postfish filtering chain used declip (for the occasional overrange oops), deverb (remove room reverberation), multicompand (noise gating), single compand (for volume levelling) and EQ (the Crown mics are nice, but are very midrange heavy). &lt;br /&gt;
&lt;br /&gt;
====Special Effects====&lt;br /&gt;
&lt;br /&gt;
Audio special effects were one-offs, mostly done using Sox.  The processed demo sections of audio were then spliced back into the original audio takes using Audacity.&lt;br /&gt;
&lt;br /&gt;
Video special effects (e.g., removing a color channel) were done by writing quick, one-off filters in C for y4oi.  A few effects were done by dumping a take as a directory full of PNGs, batch-processing the PNGs using a one-off C program, then reassembling with mplayer.  Video effects were then stitched back into the original video takes in Cinelerra.&lt;br /&gt;
&lt;br /&gt;
====Editing====&lt;br /&gt;
&lt;br /&gt;
All editing was done in Cinelerra.  This primarily consisted of stitching the individual takes back together with crossfades.  All input and rendering output were done with raw YUV4MPEG and WAV files.  Note that making this work well and correctly required several patches to the YUV4MPEG handler and colorspace conversion code.&lt;br /&gt;
&lt;br /&gt;
====Encoding====&lt;br /&gt;
&lt;br /&gt;
Encoding was done by hand external to Cinelerra using mplayer for final postprocessing, the encoder_example included with the [Ptalarbvorm] Theora source distribution, and ivfenc for WebM.&lt;br /&gt;
&lt;br /&gt;
Sample Theora encode command line (note this is using an mplayer patched for y4o support; it could be done just as easily with a yuv4mpeg pipe):&lt;br /&gt;
&lt;br /&gt;
# 360p, 128-ish (a4) audio + 500-ish (v50) video&lt;br /&gt;
mplayer -vf hqdn3d,scale=640:360,gradfun=1.5,unsharp=l3x3:.1,yuv4ogg complete2.m2v -fast -noconsolecontrols -vo null &amp;gt; /dev/null &amp;amp; ~/MotherfishSVN/theora-ptalarbvorm/examples/encoder_example -a 4 -v 50 -k 240 complete2.wav output.y4o -o A_Digital_Media_Primer_For_Geeks-360p-a4+v50.ogv&lt;/div&gt;</summary>
		<author><name>Mindspillage</name></author>
	</entry>
	<entry>
		<id>https://wiki.xiph.org/index.php?title=Videos/A_Digital_Media_Primer_For_Geeks&amp;diff=12364</id>
		<title>Videos/A Digital Media Primer For Geeks</title>
		<link rel="alternate" type="text/html" href="https://wiki.xiph.org/index.php?title=Videos/A_Digital_Media_Primer_For_Geeks&amp;diff=12364"/>
		<updated>2010-09-21T22:19:20Z</updated>

		<summary type="html">&lt;p&gt;Mindspillage: /* Video vegetables (they&amp;#039;re good for you!) */ sp, punct&lt;/p&gt;
&lt;hr /&gt;
&lt;div&gt;&amp;lt;small&amp;gt;&#039;&#039;Wiki edition&#039;&#039;&amp;lt;/small&amp;gt;&lt;br /&gt;
[[Image:Dmpfg_001.jpg|thumb|360px|right]]&lt;br /&gt;
This first video from Xiph.Org presents the technical foundations of modern digital media via a half-hour firehose of information. One community member called it &amp;quot;a Uni lecture I never got but really wanted.&amp;quot;&lt;br /&gt;
&lt;br /&gt;
The program offers a brief history of digital media, a quick summary of the sampling theorem, and myriad details of low level audio and video characterization and formatting. It&#039;s intended for budding geeks looking to get into video coding, as well as the technically curious who want to know more about the media they wrangle for work or play.&lt;br /&gt;
&amp;lt;br/&amp;gt;&amp;lt;br/&amp;gt;&amp;lt;br/&amp;gt;&lt;br /&gt;
&amp;lt;center&amp;gt;&amp;lt;font size=&amp;quot;+2&amp;quot;&amp;gt;[http://www.xiph.org/video/vid1.shtml Download or Watch online]&amp;lt;/font&amp;gt;&amp;lt;/center&amp;gt;&lt;br /&gt;
&amp;lt;br style=&amp;quot;clear:both;&amp;quot;/&amp;gt;&lt;br /&gt;
&amp;lt;hr/&amp;gt;&lt;br /&gt;
&lt;br /&gt;
==Introduction==&lt;br /&gt;
&amp;lt;small&amp;gt;[[Talk:A_Digital_Media_Primer_For_Geeks_(episode_1)#Introduction|Discuss this section]]&amp;lt;/small&amp;gt;&lt;br /&gt;
&lt;br /&gt;
Workstations and high-end personal computers have been able to&lt;br /&gt;
manipulate digital audio pretty easily for about fifteen years now.&lt;br /&gt;
It&#039;s only been about five years that a decent workstation&#039;s been able&lt;br /&gt;
to handle raw video without a lot of expensive special purpose&lt;br /&gt;
hardware.&lt;br /&gt;
&lt;br /&gt;
But today even most cheap home PCs have the processor power and&lt;br /&gt;
storage necessary to really toss raw video around, at least without&lt;br /&gt;
too much of a struggle. So now that everyone has all of this cheap media-capable hardware, &lt;br /&gt;
more people, not surprisingly, want to do interesting&lt;br /&gt;
things with digital media, especially streaming. YouTube was the first huge&lt;br /&gt;
success, and now everybody wants in.&lt;br /&gt;
&lt;br /&gt;
Well good!  Because this stuff is a lot of fun!&lt;br /&gt;
&lt;br /&gt;
[[Image:Dmpfg_002.jpg|thumb|360px|right]]&lt;br /&gt;
It&#039;s no problem finding consumers for digital media.  But here I&#039;d&lt;br /&gt;
like to address the engineers, the mathematicians, the hackers, the&lt;br /&gt;
people who are interested in discovering and making things and&lt;br /&gt;
building the technology itself. The people after my own heart.&lt;br /&gt;
&lt;br /&gt;
Digital media, compression especially, is perceived to be super-elite,&lt;br /&gt;
somehow incredibly more difficult than anything else in computer&lt;br /&gt;
science. The big industry players in the field don&#039;t mind this&lt;br /&gt;
perception at all; it helps justify the staggering number of very&lt;br /&gt;
basic patents they hold.  They like the image that their media&lt;br /&gt;
researchers &amp;quot;are the best of the best, so much smarter than anyone&lt;br /&gt;
else that their brilliant ideas can&#039;t even be understood by mere&lt;br /&gt;
mortals.&amp;quot; This is bunk.&lt;br /&gt;
&lt;br /&gt;
Digital audio and video and streaming and compression offer endless&lt;br /&gt;
deep and stimulating mental challenges, just like any other&lt;br /&gt;
discipline. It seems elite because so few people have been&lt;br /&gt;
involved.  So few people have been involved perhaps because so few&lt;br /&gt;
people could afford the expensive, special-purpose equipment it&lt;br /&gt;
required. But today, just about anyone watching this video has a&lt;br /&gt;
cheap, general-purpose computer powerful enough to play with the big&lt;br /&gt;
boys. There are battles going on today around HTML5 and browsers and&lt;br /&gt;
video and open vs. closed.  So now is a pretty good time to get&lt;br /&gt;
involved.  The easiest place to start is probably understanding the&lt;br /&gt;
technology we have right now.&lt;br /&gt;
&lt;br /&gt;
This is an introduction. Since it&#039;s an introduction, it glosses over a&lt;br /&gt;
ton of details so that the big picture&#039;s a little easier to see.&lt;br /&gt;
Quite a few people watching are going to be way past anything that I&#039;m&lt;br /&gt;
talking about, at least for now.  On the other hand, I&#039;m probably&lt;br /&gt;
going to go too fast for folks who really are brand new to all of&lt;br /&gt;
this, so if this is all new, relax. The important thing is to pick out&lt;br /&gt;
any ideas that really grab your imagination. Especially pay attention&lt;br /&gt;
to the terminology surrounding those ideas, because with those, and&lt;br /&gt;
Google, and Wikipedia, you can dig as deep as interests you.&lt;br /&gt;
&lt;br /&gt;
So, without any further ado, welcome to one hell of a new hobby.&lt;br /&gt;
&lt;br /&gt;
&amp;lt;center&amp;gt;&amp;lt;div style=&amp;quot;background-color:#DDDDFF;border-color:#CCCCDD;border-style:solid;width:80%;padding:0 1em 1em 1em;text-align:left;&amp;quot;&amp;gt;&lt;br /&gt;
&#039;&#039;&#039;Going deeper…&#039;&#039;&#039;&lt;br /&gt;
*Fixme: Some good HTML5 baseline codec debate article&lt;br /&gt;
*[http://diveintohtml5.org/video.html Dive into HTML5] web video tutorial&lt;br /&gt;
&amp;lt;/div&amp;gt;&amp;lt;/center&amp;gt;&lt;br /&gt;
&lt;br /&gt;
&amp;lt;br style=&amp;quot;clear:both;&amp;quot;/&amp;gt;&lt;br /&gt;
&lt;br /&gt;
==Analog vs Digital==&lt;br /&gt;
&amp;lt;small&amp;gt;[[Talk:A_Digital_Media_Primer_For_Geeks_(episode_1)#Analog_vs_Digital|Discuss this section]]&amp;lt;/small&amp;gt;&lt;br /&gt;
[[Image:Dmpfg_004.jpg|thumb|360px|right]]&lt;br /&gt;
&lt;br /&gt;
Sound is the propagation of pressure waves through air, spreading out&lt;br /&gt;
from a source like ripples spread from a stone tossed into a pond.  A&lt;br /&gt;
microphone, or the human ear for that matter, transforms these passing&lt;br /&gt;
ripples of pressure into an electric signal.  Right, this is&lt;br /&gt;
middle school science class, everyone remembers this.  Moving on.&lt;br /&gt;
&lt;br /&gt;
That audio signal is a one-dimensional function, a single value&lt;br /&gt;
varying over time.  If we slow the &#039;scope down a bit... that should be&lt;br /&gt;
a little easier to see. A few other aspects of the signal are&lt;br /&gt;
important. It&#039;s continuous in both value and time; that is, at any&lt;br /&gt;
given time it can have any real value, and there&#039;s a smoothly varying&lt;br /&gt;
value at every point in time.  No matter how much we zoom in, there&lt;br /&gt;
are no discontinuities, no singularities, no instantaneous steps or&lt;br /&gt;
points where the signal ceases to exist. It&#039;s defined&lt;br /&gt;
everywhere. Classic continuous math works very well on these signals.&lt;br /&gt;
&lt;br /&gt;
A digital signal on the other hand is discrete in both value and time.&lt;br /&gt;
In the simplest and most common system, called Pulse Code Modulation,&lt;br /&gt;
one of a fixed number of possible values directly represents the&lt;br /&gt;
instantaneous signal amplitude at points in time spaced a fixed&lt;br /&gt;
distance apart.  The end result is a stream of digits.&lt;br /&gt;
&lt;br /&gt;
[[Image:Dmpfg_006.jpg|thumb|360px|right]]&lt;br /&gt;
Now this looks an awful lot like this.  It seems intuitive that we&lt;br /&gt;
should somehow be able to rigorously transform one into the other, and&lt;br /&gt;
good news, the Sampling Theorem says we can and tells us&lt;br /&gt;
how. Published in its most recognizable form by Claude Shannon in 1949&lt;br /&gt;
and built on the work of Nyquist, and Hartley, and tons of others, the&lt;br /&gt;
sampling theorem not only states that we can go back and&lt;br /&gt;
forth between analog and digital, but also lays&lt;br /&gt;
down a set of conditions for which conversion is lossless and the two&lt;br /&gt;
representations become equivalent and interchangeable.  When the&lt;br /&gt;
lossless conditions aren&#039;t met, the sampling theorem tells us how and&lt;br /&gt;
how much information is lost or corrupted.&lt;br /&gt;
&lt;br /&gt;
Up until very recently, analog technology was the basis for&lt;br /&gt;
practically everything done with audio, and that&#039;s not because most&lt;br /&gt;
audio comes from an originally analog source.  You may also think that&lt;br /&gt;
since computers are fairly recent, analog signal technology must have&lt;br /&gt;
come first.  Nope. Digital is actually older.  The telegraph predates&lt;br /&gt;
the telephone by half a century and was already fully mechanically&lt;br /&gt;
automated by the 1860s, sending coded, multiplexed digital signals&lt;br /&gt;
long distances. You know... tickertape. Harry Nyquist of Bell Labs was&lt;br /&gt;
researching telegraph pulse transmission when he published his&lt;br /&gt;
description of what later became known as the Nyquist frequency, the&lt;br /&gt;
core concept of the sampling theorem.  Now, it&#039;s true the telegraph&lt;br /&gt;
was transmitting symbolic information, text, not a digitized analog&lt;br /&gt;
signal, but with the advent of the telephone and radio, analog and&lt;br /&gt;
digital signal technology progressed rapidly and side-by-side.&lt;br /&gt;
&lt;br /&gt;
[[Image:Dmpfg_007.jpg|thumb|360px|right]]&lt;br /&gt;
Audio had always been manipulated as an analog signal because... well,&lt;br /&gt;
gee, it&#039;s so much easier.  A second-order low-pass filter, for example,&lt;br /&gt;
requires two passive components.  An all-analog short-time Fourier&lt;br /&gt;
transform, a few hundred.  Well, maybe a thousand if you want to build&lt;br /&gt;
something really fancy [bang on the 3585].  Processing signals&lt;br /&gt;
digitally requires millions to billions of transistors running at&lt;br /&gt;
microwave frequencies, support hardware at very least to digitize and&lt;br /&gt;
reconstruct the analog signals, a complete software ecosystem for&lt;br /&gt;
programming and controlling that billion-transistor juggernaut,&lt;br /&gt;
digital storage just in case you want to keep any of those bits for&lt;br /&gt;
later...&lt;br /&gt;
&lt;br /&gt;
So we come to the conclusion that analog is the only practical way to&lt;br /&gt;
do much with audio... well, unless you happen to have a billion&lt;br /&gt;
transistors and all the other things just lying around. And since we&lt;br /&gt;
do, digital signal processing becomes very attractive.&lt;br /&gt;
&lt;br /&gt;
For one thing, analog componentry just doesn&#039;t have the flexibility of&lt;br /&gt;
a general purpose computer.  Adding a new function to this&lt;br /&gt;
beast... yeah, it&#039;s probably not going to happen.  On a digital&lt;br /&gt;
processor though, just write a new program.  Software isn&#039;t trivial,&lt;br /&gt;
but it is a lot easier.&lt;br /&gt;
&lt;br /&gt;
Perhaps more importantly, though, every analog component is an&lt;br /&gt;
approximation. There&#039;s no such thing as a perfect transistor, or a&lt;br /&gt;
perfect inductor, or a perfect capacitor.  In analog, every component&lt;br /&gt;
adds noise and distortion, usually not very much, but it adds up. Just&lt;br /&gt;
transmitting an analog signal, especially over long distances,&lt;br /&gt;
progressively, measurably, irretrievably corrupts it.  Besides, all of&lt;br /&gt;
those single-purpose analog components take up a lot of space.  Two&lt;br /&gt;
lines of code on the billion transistors back here can implement a&lt;br /&gt;
filter that would require an inductor the size of a refrigerator.&lt;br /&gt;
&lt;br /&gt;
Digital systems don&#039;t have these drawbacks.  Digital signals can be&lt;br /&gt;
stored, copied, manipulated, and transmitted without adding any noise&lt;br /&gt;
or distortion. We do use lossy algorithms from time to time, but the&lt;br /&gt;
only unavoidably non-ideal steps are digitization and reconstruction,&lt;br /&gt;
where digital has to interface with all of that messy analog.  Messy&lt;br /&gt;
or not, modern conversion stages are very, very good.  By the&lt;br /&gt;
standards of our ears, we can consider them practically lossless as&lt;br /&gt;
well.&lt;br /&gt;
&lt;br /&gt;
With a little extra hardware, then, most of which is now small and&lt;br /&gt;
inexpensive due to our modern industrial infrastructure, digital audio&lt;br /&gt;
is the clear winner over analog.  So let us then go about storing it,&lt;br /&gt;
copying it, manipulating it, and transmitting it.&lt;br /&gt;
&lt;br /&gt;
&amp;lt;center&amp;gt;&amp;lt;div style=&amp;quot;background-color:#DDDDFF;border-color:#CCCCDD;border-style:solid;width:80%;padding:0 1em 1em 1em;text-align:left;&amp;quot;&amp;gt;&lt;br /&gt;
&#039;&#039;&#039;Going deeper…&#039;&#039;&#039;&lt;br /&gt;
*Wikipedia: [[wiki:Nyquist–Shannon_sampling_theorem|Nyquist–Shannon sampling theorem]]&lt;br /&gt;
*MIT OpenCourseWare [http://ocw.mit.edu/courses/electrical-engineering-and-computer-science/6-003-signals-and-systems-spring-2010/lecture-notes/ Lecture notes from 6.003 signals and systems.]&lt;br /&gt;
&amp;lt;/div&amp;gt;&amp;lt;/center&amp;gt;&lt;br /&gt;
&lt;br /&gt;
&amp;lt;br style=&amp;quot;clear:both;&amp;quot;/&amp;gt;&lt;br /&gt;
&lt;br /&gt;
==Raw (digital audio) meat==&lt;br /&gt;
&amp;lt;small&amp;gt;[[Talk:A_Digital_Media_Primer_For_Geeks_(episode_1)#Raw_.28digital_audio.29_meat|Discuss this section]]&amp;lt;/small&amp;gt;&lt;br /&gt;
&lt;br /&gt;
Pulse Code Modulation is the most common representation for &lt;br /&gt;
raw audio.  Other practical representations do exist: for example, the&lt;br /&gt;
Sigma-Delta coding used by the SACD, which is a form of Pulse Density&lt;br /&gt;
Modulation.  That said, Pulse Code Modulation is far&lt;br /&gt;
and away dominant, mainly because it&#039;s so mathematically&lt;br /&gt;
convenient.  An audio engineer can spend an entire career without&lt;br /&gt;
running into anything else.&lt;br /&gt;
&lt;br /&gt;
PCM encoding can be characterized in three parameters, making it easy&lt;br /&gt;
to account for every possible PCM variant with mercifully little&lt;br /&gt;
hassle.&lt;br /&gt;
&lt;br /&gt;
===sample rate===&lt;br /&gt;
&lt;br /&gt;
The first parameter is the sampling rate.  The highest frequency an&lt;br /&gt;
encoding can represent is called the Nyquist Frequency.  The Nyquist&lt;br /&gt;
frequency of PCM happens to be exactly half the sampling rate.&lt;br /&gt;
Therefore, the sampling rate directly determines the highest possible&lt;br /&gt;
frequency in the digitized signal.&lt;br /&gt;
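&lt;br /&gt;
In code form, trivial but worth pinning down:&lt;br /&gt;
&lt;br /&gt;
 /* The Nyquist frequency of PCM is half the sampling rate, so the&lt;br /&gt;
    rate directly fixes the highest representable frequency. */&lt;br /&gt;
 static double nyquist_hz(double sample_rate_hz){&lt;br /&gt;
   return sample_rate_hz / 2.0;   /* e.g. 48000 -&gt; 24000 Hz */&lt;br /&gt;
 }&lt;br /&gt;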
&lt;br /&gt;
Analog telephone systems traditionally band-limited voice channels to&lt;br /&gt;
just under 4kHz, so digital telephony and most classic voice&lt;br /&gt;
applications use an 8kHz sampling rate: the minimum sampling rate&lt;br /&gt;
necessary to capture the entire bandwidth of a 4kHz channel.  This is&lt;br /&gt;
what an 8kHz sampling rate sounds like&amp;amp;mdash;a bit muffled but perfectly&lt;br /&gt;
intelligible for voice.  This is the lowest sampling rate that&#039;s ever&lt;br /&gt;
been used widely in practice.&lt;br /&gt;
&lt;br /&gt;
From there, as power, and memory, and storage increased, consumer&lt;br /&gt;
computer hardware went to offering 11, and then 16, and then 22, and&lt;br /&gt;
then 32kHz sampling.  With each increase in the sampling rate and the&lt;br /&gt;
Nyquist frequency, it&#039;s obvious that the high end becomes a little&lt;br /&gt;
clearer and the sound more natural.&lt;br /&gt;
&lt;br /&gt;
The Compact Disc uses a 44.1kHz sampling rate, which is again slightly&lt;br /&gt;
better than 32kHz, but the gains are becoming less distinct.  44.1kHz&lt;br /&gt;
is a bit of an oddball choice, especially given that it hadn&#039;t been&lt;br /&gt;
used for anything prior to the compact disc, but the huge success of&lt;br /&gt;
the CD has made it a common rate.&lt;br /&gt;
&lt;br /&gt;
The most common hi-fidelity sampling rate aside from the CD is 48kHz.&lt;br /&gt;
There&#039;s virtually no audible difference between the two.  This video,&lt;br /&gt;
or at least the original version of it, was shot and produced with&lt;br /&gt;
48kHz audio, which happens to be the original standard for&lt;br /&gt;
high-fidelity audio with video.&lt;br /&gt;
&lt;br /&gt;
Super-hi-fidelity sampling rates of 88, and 96, and 192kHz have also&lt;br /&gt;
appeared. The reason for the sampling rates beyond 48kHz isn&#039;t to&lt;br /&gt;
extend the audible high frequencies further. It&#039;s for a different&lt;br /&gt;
reason.&lt;br /&gt;
&lt;br /&gt;
[[Image:Dmpfg_008.jpg|thumb|360px|right]]&lt;br /&gt;
Stepping back for just a second, the French mathematician Jean&lt;br /&gt;
Baptiste Joseph Fourier showed that we can also think of signals like&lt;br /&gt;
audio as a set of component frequencies.  This frequency-domain&lt;br /&gt;
representation is equivalent to the time representation; the signal is&lt;br /&gt;
exactly the same, we&#039;re just looking at it a different way.  Here we see the&lt;br /&gt;
frequency-domain representation of a hypothetical analog signal we&lt;br /&gt;
intend to digitally sample.&lt;br /&gt;
&lt;br /&gt;
The sampling theorem tells us two essential things about the sampling&lt;br /&gt;
process. First, that a digital signal can&#039;t represent any&lt;br /&gt;
frequencies above the Nyquist frequency. Second, and this is the new&lt;br /&gt;
part, if we don&#039;t remove those frequencies with a low-pass filter&lt;br /&gt;
before sampling, the sampling process will fold them down into the&lt;br /&gt;
representable frequency range as aliasing distortion.&lt;br /&gt;
&lt;br /&gt;
Aliasing, in a nutshell, sounds freakin&#039; awful, so it&#039;s essential to&lt;br /&gt;
remove any beyond-Nyquist frequencies before sampling and after&lt;br /&gt;
reconstruction.&lt;br /&gt;
&lt;br /&gt;
Human frequency perception is considered to extend to about 20kHz. In&lt;br /&gt;
44.1 or 48kHz sampling, the low pass before the sampling stage has to&lt;br /&gt;
be extremely sharp to avoid cutting any audible frequencies below&lt;br /&gt;
20kHz but still not allow frequencies above the Nyquist to leak&lt;br /&gt;
forward into the sampling process.  This is a difficult filter to&lt;br /&gt;
build, and no practical filter succeeds completely. If the sampling&lt;br /&gt;
rate is 96kHz or 192kHz on the other hand, the low pass has an extra&lt;br /&gt;
octave or two for its transition band. This is a much easier filter to&lt;br /&gt;
build.  Sampling rates beyond 48kHz are actually one of those messy&lt;br /&gt;
analog stage compromises.&lt;br /&gt;
&lt;br /&gt;
===sample format===&lt;br /&gt;
&lt;br /&gt;
The second fundamental PCM parameter is the sample format; that is,&lt;br /&gt;
the format of each digital number.  A number is a number, but a number&lt;br /&gt;
can be represented in bits a number of different ways.&lt;br /&gt;
&lt;br /&gt;
Early PCM was eight-bit linear, encoded as an unsigned byte.  The&lt;br /&gt;
dynamic range is limited to about 50dB  and the quantization noise, as&lt;br /&gt;
you can hear, is pretty severe.  Eight-bit audio is vanishingly rare&lt;br /&gt;
today.&lt;br /&gt;
&lt;br /&gt;
Digital telephony typically uses one of two related non-linear eight&lt;br /&gt;
bit encodings called A-law and mu-law. These formats encode a roughly&lt;br /&gt;
14 bit dynamic range into eight bits by spacing the higher amplitude&lt;br /&gt;
values farther apart. A-law and mu-law obviously improve quantization&lt;br /&gt;
noise compared to linear 8-bit, and voice harmonics especially hide&lt;br /&gt;
the remaining quantization noise well. All three eight-bit encodings,&lt;br /&gt;
linear, A-law, and mu-law, are typically paired with an 8kHz sampling&lt;br /&gt;
rate, though I&#039;m demonstrating them here at 48kHz.&lt;br /&gt;
&lt;br /&gt;
Most modern PCM uses 16- or 24-bit two&#039;s-complement signed integers to&lt;br /&gt;
encode the range from negative infinity to zero decibels in 16 or 24&lt;br /&gt;
bits of precision. The maximum absolute value corresponds to zero decibels.&lt;br /&gt;
As with all the sample formats so far, signals beyond zero decibels, and thus&lt;br /&gt;
beyond the maximum representable range, are clipped.&lt;br /&gt;
&lt;br /&gt;
In mixing and mastering, it&#039;s not unusual to use floating-point&lt;br /&gt;
numbers for PCM instead of integers.  A 32 bit IEEE754 float, that&#039;s&lt;br /&gt;
the normal kind of floating point you see on current computers, has 24&lt;br /&gt;
bits of resolution, but a seven bit floating-point exponent increases&lt;br /&gt;
the representable range.  Floating point usually represents zero&lt;br /&gt;
decibels as +/-1.0, and because floats can obviously represent&lt;br /&gt;
considerably beyond that, temporarily exceeding zero decibels during&lt;br /&gt;
the mixing process doesn&#039;t cause clipping.  Floating-point PCM takes&lt;br /&gt;
up more space, so it tends to be used only as an intermediate&lt;br /&gt;
production format.&lt;br /&gt;
&lt;br /&gt;
Lastly, most general purpose computers still read and&lt;br /&gt;
write data in octet bytes, so it&#039;s important to remember that samples&lt;br /&gt;
bigger than eight bits can be in big- or little-endian order, and both&lt;br /&gt;
endiannesses are common.  For example, Microsoft WAV files are little-endian,&lt;br /&gt;
and Apple AIFC files tend to be big-endian.  Be aware of it.&lt;br /&gt;
&lt;br /&gt;
===channels===&lt;br /&gt;
&lt;br /&gt;
The third PCM parameter is the number of channels.  The convention in&lt;br /&gt;
raw PCM is to encode multiple channels by interleaving the samples of&lt;br /&gt;
each channel together into a single stream.  Straightforward and extensible.&lt;br /&gt;
&lt;br /&gt;
===done!===&lt;br /&gt;
&lt;br /&gt;
And that&#039;s it!  That describes every PCM representation ever.  Done.&lt;br /&gt;
Digital audio is &#039;&#039;so easy&#039;&#039;!  There&#039;s more to do of course, but at this&lt;br /&gt;
point we&#039;ve got a nice useful chunk of audio data, so let&#039;s get some&lt;br /&gt;
video too.&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
&amp;lt;center&amp;gt;&amp;lt;div style=&amp;quot;background-color:#DDDDFF;border-color:#CCCCDD;border-style:solid;width:80%;padding:0 1em 1em 1em;text-align:left;&amp;quot;&amp;gt;&lt;br /&gt;
&#039;&#039;&#039;Going deeper…&#039;&#039;&#039;&lt;br /&gt;
* Fixme&lt;br /&gt;
&amp;lt;/div&amp;gt;&amp;lt;/center&amp;gt;&lt;br /&gt;
&lt;br /&gt;
&amp;lt;br style=&amp;quot;clear:both;&amp;quot;/&amp;gt;&lt;br /&gt;
&lt;br /&gt;
==Video vegetables (they&#039;re good for you!)==&lt;br /&gt;
&amp;lt;small&amp;gt;[[Talk:A_Digital_Media_Primer_For_Geeks_(episode_1)#Video_vegetables_.28they.27re_good_for_you.21.29|Discuss this section]]&amp;lt;/small&amp;gt;&lt;br /&gt;
&lt;br /&gt;
One could think of video as being like audio but with two additional&lt;br /&gt;
spatial dimensions, X and Y, in addition to the dimension of time.&lt;br /&gt;
This is mathematically sound. The Sampling Theorem applies to all&lt;br /&gt;
three video dimensions just as it does the single time dimension of&lt;br /&gt;
audio.&lt;br /&gt;
&lt;br /&gt;
Audio and video are obviously quite different in practice. For one,&lt;br /&gt;
compared to audio, video is huge. Raw CD audio is about 1.4 megabits&lt;br /&gt;
per second. Raw 1080i HD video is over 700 megabits per second. That&#039;s&lt;br /&gt;
more than 500 times more data to capture, process, and store per&lt;br /&gt;
second.  By Moore&#039;s law... that&#039;s... let&#039;s see... roughly eight&lt;br /&gt;
doublings times two years, so yeah, computers requiring about an extra&lt;br /&gt;
fifteen years to handle raw video after getting raw audio down pat was&lt;br /&gt;
about right.&lt;br /&gt;
&lt;br /&gt;
Basic raw video is also just more complex than basic raw audio. The&lt;br /&gt;
sheer volume of data currently necessitates a representation more&lt;br /&gt;
efficient than the linear PCM used for audio.  In addition, electronic&lt;br /&gt;
video comes almost entirely from broadcast television alone, and the&lt;br /&gt;
standards committees that govern broadcast video have always been very&lt;br /&gt;
concerned with backward compatibility.  Up until just last year in the&lt;br /&gt;
US, a sixty-year-old black and white television could still show a&lt;br /&gt;
normal analog television broadcast.  That&#039;s actually a really neat&lt;br /&gt;
trick.&lt;br /&gt;
&lt;br /&gt;
The downside to backward compatibility is that once a detail makes it&lt;br /&gt;
into a standard, you can&#039;t ever really throw it out again. Electronic&lt;br /&gt;
video has never started over from scratch the way audio has multiple&lt;br /&gt;
times.  Sixty years worth of clever but obsolete hacks necessitated by&lt;br /&gt;
the passing technology of a given era have built up into quite a pile,&lt;br /&gt;
and because digital standards also come from broadcast television, all&lt;br /&gt;
these eldritch hacks have been brought forward into the digital&lt;br /&gt;
standards as well.&lt;br /&gt;
&lt;br /&gt;
In short, there are a whole lot more details involved in digital video&lt;br /&gt;
than there were with audio. There&#039;s no hope of covering them&lt;br /&gt;
all completely here, so we&#039;ll cover the broad fundamentals.&lt;br /&gt;
&lt;br /&gt;
===resolution and aspect===&lt;br /&gt;
&lt;br /&gt;
The most obvious raw video parameters are the width and height of the&lt;br /&gt;
picture in pixels. As simple as that may sound, the pixel dimensions&lt;br /&gt;
alone don&#039;t actually specify the absolute width and height of the&lt;br /&gt;
picture, as most broadcast-derived video doesn&#039;t use square pixels.&lt;br /&gt;
The number of scanlines in a broadcast image was fixed, but the&lt;br /&gt;
effective number of horizontal pixels was a function of channel&lt;br /&gt;
bandwidth. Effective horizontal resolution could result in pixels that&lt;br /&gt;
were either narrower or wider than the spacing between scanlines.&lt;br /&gt;
&lt;br /&gt;
Standards have generally specified that digitally sampled video should&lt;br /&gt;
reflect the real resolution of the original analog source, so a large&lt;br /&gt;
amount of digital video also uses non-square pixels. For example, a&lt;br /&gt;
normal 4:3 aspect NTSC DVD is typically encoded with a display&lt;br /&gt;
resolution of 704 by 480, a ratio wider than 4:3.  In this case, the&lt;br /&gt;
pixels themselves are assigned an aspect ratio of 10:11, making them&lt;br /&gt;
taller than they are wide and narrowing the image horizontally to the&lt;br /&gt;
correct aspect.  Such an image has to be resampled to show properly on&lt;br /&gt;
a digital display with square pixels.&lt;br /&gt;
&lt;br /&gt;
===frame rate and interlacing===&lt;br /&gt;
&lt;br /&gt;
The second obvious video parameter is the frame rate, the number of&lt;br /&gt;
full frames per second.  Several standard frame rates are in active&lt;br /&gt;
use. Digital video, in one form or another, can use all of them.  Or,&lt;br /&gt;
any other frame rate.  Or even variable rates where the frame rate&lt;br /&gt;
changes adaptively over the course of the video. The higher the frame&lt;br /&gt;
rate, the smoother the motion and that brings us, unfortunately, to&lt;br /&gt;
interlacing.&lt;br /&gt;
&lt;br /&gt;
In the very earliest days of broadcast video, engineers sought the&lt;br /&gt;
fastest practical frame rate to smooth motion and to minimize flicker&lt;br /&gt;
on phosphor-based CRTs.  They were also under pressure to use the&lt;br /&gt;
least possible bandwidth for the highest resolution and fastest frame&lt;br /&gt;
rate.  Their solution was to interlace the video where the even lines&lt;br /&gt;
are sent in one pass and the odd lines in the next.  Each pass is&lt;br /&gt;
called a field and two fields sort of produce one complete frame.&lt;br /&gt;
&amp;quot;Sort of&amp;quot;, because the even and odd fields aren&#039;t actually from the&lt;br /&gt;
same source frame.  In a 60 field per second picture, the source frame&lt;br /&gt;
rate is actually 60 full frames per second, and half of each frame,&lt;br /&gt;
every other line, is simply discarded.  This is why we can&#039;t&lt;br /&gt;
deinterlace a video simply by combining two fields into one frame;&lt;br /&gt;
they&#039;re not actually from one frame to begin with.&lt;br /&gt;
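&lt;br /&gt;
A minimal sketch in C of what a field split looks like, assuming a single&lt;br /&gt;
8-bit plane and that the top field carries the even lines; real formats&lt;br /&gt;
have to signal which field comes first.&lt;br /&gt;
&lt;br /&gt;
#include &amp;lt;stddef.h&amp;gt;&lt;br /&gt;
&lt;br /&gt;
/* Copy the even lines of a frame into one field and the odd lines into&lt;br /&gt;
   the other.  Each field ends up half the height of the frame. */&lt;br /&gt;
void split_fields(const unsigned char *frame, size_t width, size_t height,&lt;br /&gt;
                  unsigned char *top, unsigned char *bottom)&lt;br /&gt;
{&lt;br /&gt;
    for (size_t y = 0; y &amp;lt; height; y++) {&lt;br /&gt;
        unsigned char *dst = (y &amp;amp; 1) ? bottom : top;&lt;br /&gt;
        for (size_t x = 0; x &amp;lt; width; x++)&lt;br /&gt;
            dst[(y / 2) * width + x] = frame[y * width + x];&lt;br /&gt;
    }&lt;br /&gt;
}&lt;br /&gt;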
&lt;br /&gt;
===gamma===&lt;br /&gt;
&lt;br /&gt;
The cathode ray tube was the only available display technology for&lt;br /&gt;
most of the history of electronic video. A CRT&#039;s output brightness is&lt;br /&gt;
nonlinear, approximately equal to the input controlling voltage raised&lt;br /&gt;
to the 2.5th power. This exponent, 2.5, is designated gamma, and so&lt;br /&gt;
it&#039;s often referred to as the gamma of a display.  Cameras, though,&lt;br /&gt;
are linear, and if you feed a CRT a linear input signal, it looks a&lt;br /&gt;
bit like this.&lt;br /&gt;
&lt;br /&gt;
As there were originally to be very few cameras, which were&lt;br /&gt;
fantastically expensive anyway, and hopefully many, many television&lt;br /&gt;
sets, which had best be as inexpensive as possible, engineers decided to&lt;br /&gt;
add the necessary gamma correction circuitry to the cameras rather&lt;br /&gt;
than the sets. Video transmitted over the airwaves would thus have a&lt;br /&gt;
nonlinear intensity using the inverse of the set&#039;s gamma exponent, so that&lt;br /&gt;
once a camera&#039;s signal was finally displayed on the CRT, the overall&lt;br /&gt;
response of the system from camera to set was back to linear again.&lt;br /&gt;
&lt;br /&gt;
Almost.&lt;br /&gt;
&lt;br /&gt;
There were also two other tweaks. A television camera actually uses a&lt;br /&gt;
gamma exponent that&#039;s the inverse of 2.2, not 2.5.  That&#039;s just a&lt;br /&gt;
correction for viewing in a dim environment. Also, the exponential&lt;br /&gt;
curve transitions to a linear ramp near black.  That&#039;s just an old&lt;br /&gt;
hack for suppressing sensor noise in the camera.&lt;br /&gt;
&lt;br /&gt;
Gamma correction also had a lucky benefit. It just so happens that the&lt;br /&gt;
human eye has a perceptual gamma of about 3.  This is relatively close&lt;br /&gt;
to the CRT&#039;s gamma of 2.5. An image using gamma correction devotes&lt;br /&gt;
more resolution to lower intensities, where the eye happens to have&lt;br /&gt;
its finest intensity discrimination, and therefore uses the available&lt;br /&gt;
scale resolution more efficiently.  Although CRTs are currently&lt;br /&gt;
vanishing, a standard sRGB computer display still uses a nonlinear&lt;br /&gt;
intensity curve similar to television, with a linear ramp near black,&lt;br /&gt;
followed by an exponential curve with a gamma exponent of 2.4. This&lt;br /&gt;
encodes a sixteen bit linear range down into eight bits.&lt;br /&gt;
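&lt;br /&gt;
For the curious, that sRGB transfer looks roughly like this in C; the&lt;br /&gt;
0.04045 and 12.92 constants are the ones usually quoted for sRGB&#039;s&lt;br /&gt;
linear segment near black.&lt;br /&gt;
&lt;br /&gt;
#include &amp;lt;math.h&amp;gt;&lt;br /&gt;
&lt;br /&gt;
/* Map an encoded sRGB value (0.0 .. 1.0) back to linear light: a&lt;br /&gt;
   linear ramp near black, then a power curve with exponent 2.4. */&lt;br /&gt;
double srgb_to_linear(double v)&lt;br /&gt;
{&lt;br /&gt;
    if (v &amp;lt;= 0.04045)&lt;br /&gt;
        return v / 12.92;&lt;br /&gt;
    return pow((v + 0.055) / 1.055, 2.4);&lt;br /&gt;
}&lt;br /&gt;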
&lt;br /&gt;
===color and colorspace===&lt;br /&gt;
&lt;br /&gt;
The human eye has three apparent color channels, red, green, and blue,&lt;br /&gt;
and most displays use these three colors as additive primaries to&lt;br /&gt;
produce a full range of color output.  The primary pigments in&lt;br /&gt;
printing are Cyan, Magenta, and Yellow for the same reason; pigments&lt;br /&gt;
are subtractive, and each of these pigments subtracts one pure color&lt;br /&gt;
from reflected light.  Cyan subtracts red, magenta subtracts green, and&lt;br /&gt;
yellow subtracts blue.&lt;br /&gt;
&lt;br /&gt;
Video can be, and sometimes is, represented with red, green, and blue&lt;br /&gt;
color channels, but RGB video is atypical. The human eye is far more&lt;br /&gt;
sensitive to luminosity than it is to color, and RGB tends to spread&lt;br /&gt;
the energy of an image across all three color channels.  That is, the&lt;br /&gt;
red plane looks like a red version of the original picture, the green&lt;br /&gt;
plane looks like a green version of the original picture, and the blue&lt;br /&gt;
plane looks like a blue version of the original picture.  Black and&lt;br /&gt;
white times three.  Not efficient.&lt;br /&gt;
&lt;br /&gt;
For those reasons and because, oh hey, television just happened to&lt;br /&gt;
start out as black and white anyway, video usually is represented as a&lt;br /&gt;
high resolution luma channel&amp;amp;mdash;the black &amp;amp; white&amp;amp;mdash;along with&lt;br /&gt;
additional, often lower resolution chroma channels, the color. The&lt;br /&gt;
luma channel, Y, is produced by weighting and then adding the separate&lt;br /&gt;
red, green and blue signals.  The chroma channels U and V are then&lt;br /&gt;
produced by subtracting the luma signal from blue and the luma signal&lt;br /&gt;
from red.&lt;br /&gt;
&lt;br /&gt;
When YUV is scaled, offset, and quantized for digital video, it&#039;s&lt;br /&gt;
usually more correctly called Y&#039;CbCr, but the more generic term YUV is&lt;br /&gt;
widely used to describe all the analog and digital variants of this&lt;br /&gt;
color model.&lt;br /&gt;
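&lt;br /&gt;
As a sketch of that construction in C, using the classic 0.299/0.587/0.114&lt;br /&gt;
luma weights; the various YUV and Y&#039;CbCr variants scale and offset&lt;br /&gt;
these results differently.&lt;br /&gt;
&lt;br /&gt;
/* R, G, B in 0.0 .. 1.0; Y comes out in 0.0 .. 1.0 and the chroma&lt;br /&gt;
   terms come out signed, centered on zero. */&lt;br /&gt;
void rgb_to_yuv(double r, double g, double b,&lt;br /&gt;
                double *y, double *u, double *v)&lt;br /&gt;
{&lt;br /&gt;
    *y = 0.299 * r + 0.587 * g + 0.114 * b;  /* weighted luma */&lt;br /&gt;
    *u = b - *y;                             /* blue minus luma */&lt;br /&gt;
    *v = r - *y;                             /* red minus luma */&lt;br /&gt;
}&lt;br /&gt;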
&lt;br /&gt;
===chroma subsampling===&lt;br /&gt;
&lt;br /&gt;
The U and V chroma channels can have the same resolution as the Y&lt;br /&gt;
channel, but because the human eye has far less spatial color&lt;br /&gt;
resolution than spatial luminosity resolution, chroma resolution is&lt;br /&gt;
usually halved or even quartered in the horizontal direction, the&lt;br /&gt;
vertical direction, or both, usually without any significant impact on the&lt;br /&gt;
apparent raw image quality.  Practically every possible subsampling&lt;br /&gt;
variant has been used at one time or another, but the common choices&lt;br /&gt;
today are 4:4:4 video, which isn&#039;t actually subsampled at all, 4:2:2 video in&lt;br /&gt;
which the horizontal resolution of the U and V channels is halved, and&lt;br /&gt;
most common of all, 4:2:0 video in which both the horizontal and vertical&lt;br /&gt;
resolutions of the chroma channels are halved, resulting in U and V&lt;br /&gt;
planes that are each one quarter the size of Y.&lt;br /&gt;
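&lt;br /&gt;
In code the bookkeeping reduces to a pair of shifts; this sketch rounds&lt;br /&gt;
odd dimensions up, which is one common convention.&lt;br /&gt;
&lt;br /&gt;
/* hshift/vshift of 0,0 = 4:4:4;  1,0 = 4:2:2;  1,1 = 4:2:0.  For 4:2:0&lt;br /&gt;
   each chroma plane is one quarter the area of Y. */&lt;br /&gt;
void chroma_plane_size(int w, int h, int hshift, int vshift,&lt;br /&gt;
                       int *cw, int *ch)&lt;br /&gt;
{&lt;br /&gt;
    *cw = (w + (1 &amp;lt;&amp;lt; hshift) - 1) &amp;gt;&amp;gt; hshift;&lt;br /&gt;
    *ch = (h + (1 &amp;lt;&amp;lt; vshift) - 1) &amp;gt;&amp;gt; vshift;&lt;br /&gt;
}&lt;br /&gt;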
&lt;br /&gt;
The terms 4:2:2, 4:2:0, 4:1:1, and so on and so forth, aren&#039;t complete&lt;br /&gt;
descriptions of a chroma subsampling. There are multiple possible ways&lt;br /&gt;
to position the chroma pixels relative to luma, and again, several&lt;br /&gt;
variants are in active use for each subsampling.  For example, motion&lt;br /&gt;
JPEG, MPEG-1 video, MPEG-2 video, DV, Theora, and WebM all use or can&lt;br /&gt;
use 4:2:0 subsampling, but they site the chroma pixels three different&lt;br /&gt;
ways.&lt;br /&gt;
&lt;br /&gt;
Motion JPEG, MPEG1 video, Theora and WebM all site chroma pixels&lt;br /&gt;
between luma pixels both horizontally and vertically.&lt;br /&gt;
&lt;br /&gt;
MPEG2 sites chroma pixels between lines, but horizontally aligned with&lt;br /&gt;
every other luma pixel. Interlaced modes complicate things somewhat,&lt;br /&gt;
resulting in a siting arrangement that&#039;s a tad bizarre.&lt;br /&gt;
&lt;br /&gt;
And finally PAL-DV, which is always interlaced, places the chroma&lt;br /&gt;
pixels in the same position as every other luma pixel in the&lt;br /&gt;
horizontal direction, and vertically alternates chroma channel on&lt;br /&gt;
each line.&lt;br /&gt;
&lt;br /&gt;
That&#039;s just 4:2:0 video. I&#039;ll leave the other subsamplings as homework for the&lt;br /&gt;
viewer.  Got the basic idea, moving on.&lt;br /&gt;
&lt;br /&gt;
===pixel formats===&lt;br /&gt;
&lt;br /&gt;
In audio, we always represent multiple channels in a PCM stream by&lt;br /&gt;
interleaving the samples from each channel in order. Video uses both&lt;br /&gt;
packed formats that interleave the color channels, as well as planar&lt;br /&gt;
formats that keep the pixels from each channel together in separate&lt;br /&gt;
planes stacked in order in the frame. There are at least 50 different formats in&lt;br /&gt;
these two broad categories with possibly ten or fifteen in common use.&lt;br /&gt;
&lt;br /&gt;
Each chroma subsampling and different bit-depth requires a different&lt;br /&gt;
packing arrangement, and so a different pixel format.  For a given&lt;br /&gt;
unique subsampling, there are usually also several equivalent formats&lt;br /&gt;
that consist of trivial channel order rearrangements or repackings, due either to&lt;br /&gt;
convenience once-upon-a-time on some particular piece of hardware, or&lt;br /&gt;
sometimes just good old-fashioned spite.&lt;br /&gt;
&lt;br /&gt;
Pixel formats are described by a unique name or fourcc code.  There&lt;br /&gt;
are quite a few of these and there&#039;s no sense going over each one now.&lt;br /&gt;
Google is your friend.  Be aware that fourcc codes for raw video&lt;br /&gt;
specify the pixel arrangement and chroma subsampling, but generally&lt;br /&gt;
don&#039;t imply anything certain about chroma siting or color space.  YV12&lt;br /&gt;
video, to pick one, can use JPEG, MPEG-2 or DV chroma siting, and any&lt;br /&gt;
one of several YUV colorspace definitions.&lt;br /&gt;
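&lt;br /&gt;
As one concrete example, a tightly packed planar 4:2:0 buffer such as&lt;br /&gt;
YV12 is usually documented as the full-size Y plane followed by the&lt;br /&gt;
quarter-size V plane and then U (I420 swaps the last two). Real buffers&lt;br /&gt;
often pad each row out to a stride, which this sketch ignores.&lt;br /&gt;
&lt;br /&gt;
#include &amp;lt;stddef.h&amp;gt;&lt;br /&gt;
&lt;br /&gt;
typedef struct {&lt;br /&gt;
    unsigned char *y, *u, *v;&lt;br /&gt;
} yv12_planes;&lt;br /&gt;
&lt;br /&gt;
/* Locate the three planes inside one contiguous YV12 buffer;&lt;br /&gt;
   assumes even dimensions and no row padding. */&lt;br /&gt;
yv12_planes yv12_map(unsigned char *buf, int w, int h)&lt;br /&gt;
{&lt;br /&gt;
    yv12_planes p;&lt;br /&gt;
    size_t luma   = (size_t)w * h;&lt;br /&gt;
    size_t chroma = (size_t)(w / 2) * (h / 2);&lt;br /&gt;
    p.y = buf;&lt;br /&gt;
    p.v = buf + luma;           /* V precedes U in YV12 */&lt;br /&gt;
    p.u = buf + luma + chroma;&lt;br /&gt;
    return p;&lt;br /&gt;
}&lt;br /&gt;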
&lt;br /&gt;
===done!===&lt;br /&gt;
&lt;br /&gt;
That wraps up our not-so-quick and yet very incomplete tour of raw&lt;br /&gt;
video. The good news is we can already get quite a lot of real work&lt;br /&gt;
done using that overview. In plenty of situations, a frame of video&lt;br /&gt;
data is a frame of video data.  The details matter, greatly, when it&lt;br /&gt;
comes time to write software, but for now I am satisfied that the&lt;br /&gt;
esteemed viewer is broadly aware of the relevant issues.&lt;br /&gt;
&lt;br /&gt;
&amp;lt;center&amp;gt;&amp;lt;div style=&amp;quot;background-color:#DDDDFF;border-color:#CCCCDD;border-style:solid;width:80%;padding:0 1em 1em 1em;text-align:left;&amp;quot;&amp;gt;&lt;br /&gt;
&#039;&#039;&#039;Going deeper…&#039;&#039;&#039;&lt;br /&gt;
* Fixme&lt;br /&gt;
&amp;lt;/div&amp;gt;&amp;lt;/center&amp;gt;&lt;br /&gt;
&lt;br /&gt;
&amp;lt;br style=&amp;quot;clear:both;&amp;quot;/&amp;gt;&lt;br /&gt;
&lt;br /&gt;
==Containers==&lt;br /&gt;
&amp;lt;small&amp;gt;[[Talk:A_Digital_Media_Primer_For_Geeks_(episode_1)#Containers|Discuss this section]]&amp;lt;/small&amp;gt;&lt;br /&gt;
&lt;br /&gt;
So. We have audio data. We have video data. What remains is the more&lt;br /&gt;
familiar non-signal data and straight up engineering that software&lt;br /&gt;
developers are used to, and plenty of it.&lt;br /&gt;
&lt;br /&gt;
Chunks of raw audio and video data have no externally visible&lt;br /&gt;
structure, but they&#039;re often uniformly sized.  We could just string&lt;br /&gt;
them together in a rigid pre-determined ordering for streaming and&lt;br /&gt;
storage, and some simple systems do approximately that. Compressed&lt;br /&gt;
frames though aren&#039;t necessarily a predictable size, and we usually want&lt;br /&gt;
some flexibility in using a range of different data types in streams.&lt;br /&gt;
If we string random formless data together, we lose the boundaries&lt;br /&gt;
that separate frames and don&#039;t necessarily know what data belongs to&lt;br /&gt;
which streams.  A stream needs some generalized structure to be&lt;br /&gt;
generally useful.&lt;br /&gt;
&lt;br /&gt;
In addition to our signal data, we also have our PCM and video&lt;br /&gt;
parameters.  There&#039;s probably plenty of other metadata we also want to&lt;br /&gt;
deal with, like audio tags and video chapters and subtitles, all&lt;br /&gt;
essential components of rich media.  It makes sense to place this&lt;br /&gt;
metadata, that is, data about the data, within the media itself.&lt;br /&gt;
&lt;br /&gt;
Storing and structuring formless data and disparate metadata is the&lt;br /&gt;
job of a container.  Containers provide framing for the data blobs,&lt;br /&gt;
interleave and identify multiple data streams, provide timing&lt;br /&gt;
information, and store the metadata necessary to parse, navigate,&lt;br /&gt;
manipulate and present the media.  In general, any container can hold&lt;br /&gt;
any kind of data.  And data can be put into any container.&lt;br /&gt;
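&lt;br /&gt;
To make that concrete, here&#039;s a purely hypothetical packet header in C,&lt;br /&gt;
not the layout of any real container; it shows the three jobs framing&lt;br /&gt;
has to do: mark boundaries, identify streams, and carry timing.&lt;br /&gt;
&lt;br /&gt;
#include &amp;lt;stdint.h&amp;gt;&lt;br /&gt;
&lt;br /&gt;
typedef struct {&lt;br /&gt;
    uint8_t  stream_id;  /* which interleaved stream this blob belongs to */&lt;br /&gt;
    uint64_t timestamp;  /* presentation time in some fixed timebase */&lt;br /&gt;
    uint32_t length;     /* payload size, so frame boundaries survive&lt;br /&gt;
                            variable-sized compressed data */&lt;br /&gt;
    /* followed by length bytes of codec data */&lt;br /&gt;
} packet_header;&lt;br /&gt;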
&lt;br /&gt;
==Credits==&lt;br /&gt;
&lt;br /&gt;
In the past thirty minutes, we&#039;ve covered digital audio, video, some&lt;br /&gt;
history, some math and a little engineering. We&#039;ve barely scratched the&lt;br /&gt;
surface, but it&#039;s time for a well earned break.&lt;br /&gt;
&lt;br /&gt;
There&#039;s so much more to talk about, so I hope you&#039;ll join me again in&lt;br /&gt;
our next episode.  Until then--- Cheers!&lt;br /&gt;
&lt;br /&gt;
Written by:&lt;br /&gt;
Christopher (Monty) Montgomery&lt;br /&gt;
and the Xiph.Org Community&lt;br /&gt;
&lt;br /&gt;
Intro, title and credits music:&lt;br /&gt;
&amp;quot;Boo Boo Coming&amp;quot;, by Joel Forrester&lt;br /&gt;
Performed by the Microscopic Septet&lt;br /&gt;
Used by permission of Cuneiform Records.&lt;br /&gt;
Original source track All Rights Reserved.&lt;br /&gt;
www.cuneiformrecords.com&lt;br /&gt;
&lt;br /&gt;
This Video Was Produced Entirely With Free and Open Source Software&lt;br /&gt;
&lt;br /&gt;
GNU&lt;br /&gt;
Linux&lt;br /&gt;
Fedora&lt;br /&gt;
Cinelerra&lt;br /&gt;
The Gimp&lt;br /&gt;
Audacity&lt;br /&gt;
Postfish&lt;br /&gt;
Gstreamer&lt;br /&gt;
&lt;br /&gt;
CC BY-NC-SA&lt;br /&gt;
A Co-Production of Xiph.Org and Red Hat Inc.&lt;br /&gt;
(C) 2010, Some Rights Reserved&lt;br /&gt;
&lt;br /&gt;
&amp;lt;br style=&amp;quot;clear:both;&amp;quot;/&amp;gt;&lt;br /&gt;
==The making of…==&lt;br /&gt;
&amp;lt;small&amp;gt;[[Talk:A_Digital_Media_Primer_For_Geeks_(episode_1)#The_making_of.E2.80.A6|Discuss this section]]&amp;lt;/small&amp;gt;&lt;br /&gt;
===Equipment===&lt;br /&gt;
====Camera====&lt;br /&gt;
Canon HV40 HDV camera w/ wide-angle lens operating on a tripod.&lt;br /&gt;
&lt;br /&gt;
The wide angle lens gives the camera a nice close macro mode, and approximately triples the amount of light coming into the sensor for a given zoom/aperture.  Useful for shooting indoors at night.&lt;br /&gt;
&lt;br /&gt;
No additional lighting kit was used.&lt;br /&gt;
&lt;br /&gt;
====Audio====&lt;br /&gt;
&lt;br /&gt;
Two Crown PCC160 boundary microphones placed on a table approximately 4-8 feet in front of the speaker, run through a cheap Behringer portable mixer and into the camera&#039;s microphone input.  &lt;br /&gt;
&lt;br /&gt;
No additional audio kit was used.&lt;br /&gt;
&lt;br /&gt;
====Sundries====&lt;br /&gt;
&lt;br /&gt;
Whiteboard markers by &#039;Bic&#039;&lt;br /&gt;
&lt;br /&gt;
Drawing aids by Staedtler, McMaster Carr, and &#039;Generic&#039;.&lt;br /&gt;
&lt;br /&gt;
===Video shooting sequence===&lt;br /&gt;
&lt;br /&gt;
Scenes were pre-scripted and memorized, usually with lots of on-the-fly revision.  (In the future... I&#039;m totally getting a teleprompter.  Wh000.  OTOH, I can totally rattle off the entire video script from beginning to end as a party trick, thus ensuring I&#039;ll not be invited to many parties --Monty).&lt;br /&gt;
&lt;br /&gt;
Diagrams were drawn by hand on a physical whiteboard with whiteboard markers and magnetic T-squares, triangles, and yardsticks.  Despite looking a lot like greenscreen work, there is no image compositing in use (actually-- there are two small composites where an error in a whiteboard diagram was corrected by subtracting part of the original image and then adding a corrected version of the diagram).&lt;br /&gt;
&lt;br /&gt;
Camera operated in 24F shutter priority mode (Tv set to &amp;quot;24&amp;quot;) with exposure and white balance both calibrated to the white board (or a white piece of paper) and locked.  Microphone attenuation setting active, with gain locked such that room noise peaked at -40dB (all the rooms in the shooting sequences were noisy due to the building&#039;s ventilation system, or active equipment). Lighting in the whiteboard rooms tended to be odd, with little relative light cast on a presenter standing just in front of the whiteboard; a presenter is practically standing in the room&#039;s only shadow.  Most of the room light is focused on the table and walls.  Additional fill lighting kit would have been useful, but for the first vid, I didn&#039;t want &#039;perfect&#039; to be the enemy of &#039;good&#039;.&lt;br /&gt;
&lt;br /&gt;
Autofocus used for whiteboard scenes, manual focus used for several workshop scenes as the autofocus tended to hunt continuously in very low light.&lt;br /&gt;
&lt;br /&gt;
Continuous capture to a Thinkpad with firewire input via a simple gstreamer script.&lt;br /&gt;
&lt;br /&gt;
===Production sequence===&lt;br /&gt;
====All hail Cinelerra.  You better hail, or Cinelerra will get pissy about it.====&lt;br /&gt;
&lt;br /&gt;
Most of the production sequence hinged on making Cinelerra happy; it is a hulking rusty cast iron WWI tank of a program that can seem like it&#039;s composed entirely of compressed bugs.  That said, it was neither particularly crashy nor did it ever accidentally corrupt or lose work.  It was also the only FOSS editor with a working 2D compositor.  It got the job done once I found a workflow it would cope with (and fixed a number of bugs; these fixes are available from my cinelerra Git repo at http://git.xiph.org/?p=users/xiphmont/cinelerraCV.git;a=summary)&lt;br /&gt;
&lt;br /&gt;
====Choosing takes====&lt;br /&gt;
&lt;br /&gt;
Each shooting session yielded four to six hours of raw video.  The first step was to load the raw video into the cinelerra timeline, label each complete take, compare and choose the take to use, then render the chosen take out to a raw clip as a YUV4MPEG raw video file and a WAV raw audio file.  Be careful that Settings-&amp;gt;Align Cursor On Frames is set, else the audio and video renders won&#039;t start on the same boundary.&lt;br /&gt;
&lt;br /&gt;
====Postprocessing====&lt;br /&gt;
&lt;br /&gt;
At this point, the raw video clips were adjusted for gamma, contrast and saturation in gstreamer and mplayer.  In the earlier shoots the camera was underexposing due to pilot error, which required quite a bit of gamma and saturation inflation to &#039;correct&#039; (there is no real correction as the low-end data is gone, but it&#039;s possible to make it look better).  Later shoots used saner settings and the adjustments were mostly to keep different shooting sessions more uniform.  The whiteboard tends not to look white because it&#039;s mildly reflective, and picked up the color of the cyan and orange audio baffles in the room like a big diffuse mirror.&lt;br /&gt;
&lt;br /&gt;
The audio was both noisy (due to the building&#039;s ventilation system which either sounded like a low loud rumble or a jet-engine taking off) and reverberant (the rooms were glass on two sides and plaster on the other two).  Early takes used no additional sound absorbing material in the rooms, and the Postfish filtering and deverb was used heavily.  It gives the early audio in the vid a slightly odd, processed feel (I had almost decided the original audio was simply unusable).  Later takes used some big fleece &#039;soft flats&#039; in the room to absorb some additional reverb, and the later takes are less heavily filtered.&lt;br /&gt;
&lt;br /&gt;
The postfish filtering chain used declip (for the occasional overrange oops), deverb (remove room reverberation), multicompand (noise gating), single compand (for volume levelling) and EQ (the Crown mics are nice, but are very midrange heavy). &lt;br /&gt;
&lt;br /&gt;
====Special Effects====&lt;br /&gt;
&lt;br /&gt;
Audio special effects were one-offs, mostly done using Sox.  The processed demo sections of audio were then spliced back into the original audio takes using Audacity.&lt;br /&gt;
&lt;br /&gt;
Video special effects (e.g., removing a color channel, etc.) were done by writing quick, one-off filters in C for y4oi.  A few effects were done by dumping a take as a directory full of PNGs and then batch-processing the PNGs again using a one-off C program, then reassembling with mplayer.  Video effects were then stitched back into the original video takes in Cinelerra.&lt;br /&gt;
&lt;br /&gt;
====Editing====&lt;br /&gt;
&lt;br /&gt;
All editing was done in Cinelerra.  This primarily consisted of stitching the individual takes back together with crossfades.  All input and rendering output were done with raw YUV4MPEG and WAV files.  Note that making this work well and correctly required several patches to the YUV4MPEG handler and colorspace conversion code.&lt;br /&gt;
&lt;br /&gt;
====Encoding====&lt;br /&gt;
&lt;br /&gt;
Encoding was done by hand external to Cinelerra using mplayer for final postprocessing, the encoder_example tool included with the [Ptalarbvorm] Theora source distribution, and ivfenc for WebM.&lt;br /&gt;
&lt;br /&gt;
Sample Theora encode command line (note this is using an mplayer patched for y4o support; it could be done just as easily with a yuv4mpeg pipe):&lt;br /&gt;
&lt;br /&gt;
# 360p, 128-ish (a4) audio + 500-ish (v50) video&lt;br /&gt;
mplayer -vf hqdn3d,scale=640:360,gradfun=1.5,unsharp=l3x3:.1,yuv4ogg complete2.m2v -fast -noconsolecontrols -vo null &amp;gt; /dev/null &amp;amp; ~/MotherfishSVN/theora-ptalarbvorm/examples/encoder_example -a 4 -v 50 -k 240 complete2.wav output.y4o -o A_Digital_Media_Primer_For_Geeks-360p-a4+v50.ogv&lt;/div&gt;</summary>
		<author><name>Mindspillage</name></author>
	</entry>
	<entry>
		<id>https://wiki.xiph.org/index.php?title=Videos/A_Digital_Media_Primer_For_Geeks&amp;diff=12362</id>
		<title>Videos/A Digital Media Primer For Geeks</title>
		<link rel="alternate" type="text/html" href="https://wiki.xiph.org/index.php?title=Videos/A_Digital_Media_Primer_For_Geeks&amp;diff=12362"/>
		<updated>2010-09-21T21:53:49Z</updated>

		<summary type="html">&lt;p&gt;Mindspillage: /* Raw (digital audio) meat */ sp, punct&lt;/p&gt;
&lt;hr /&gt;
&lt;div&gt;&amp;lt;small&amp;gt;&#039;&#039;Wiki edition&#039;&#039;&amp;lt;/small&amp;gt;&lt;br /&gt;
[[Image:Dmpfg_001.jpg|thumb|360px|right]]&lt;br /&gt;
This first video from Xiph.Org presents the technical foundations of modern digital media via a half-hour firehose of information. One community member called it &amp;quot;a Uni lecture I never got but really wanted.&amp;quot;&lt;br /&gt;
&lt;br /&gt;
The program offers a brief history of digital media, a quick summary of the sampling theorem, and myriad details of low level audio and video characterization and formatting. It&#039;s intended for budding geeks looking to get into video coding, as well as the technically curious who want to know more about the media they wrangle for work or play.&lt;br /&gt;
&amp;lt;br/&amp;gt;&amp;lt;br/&amp;gt;&amp;lt;br/&amp;gt;&lt;br /&gt;
&amp;lt;center&amp;gt;&amp;lt;font size=&amp;quot;+2&amp;quot;&amp;gt;[http://www.xiph.org/video/vid1.shtml Download or Watch online]&amp;lt;/font&amp;gt;&amp;lt;/center&amp;gt;&lt;br /&gt;
&amp;lt;br style=&amp;quot;clear:both;&amp;quot;/&amp;gt;&lt;br /&gt;
&amp;lt;hr/&amp;gt;&lt;br /&gt;
&lt;br /&gt;
==Introduction==&lt;br /&gt;
&amp;lt;small&amp;gt;[[Talk:A_Digital_Media_Primer_For_Geeks_(episode_1)#Introduction|Discuss this section]]&amp;lt;/small&amp;gt;&lt;br /&gt;
&lt;br /&gt;
Workstations and high-end personal computers have been able to&lt;br /&gt;
manipulate digital audio pretty easily for about fifteen years now.&lt;br /&gt;
It&#039;s only been about five years that a decent workstation&#039;s been able&lt;br /&gt;
to handle raw video without a lot of expensive special purpose&lt;br /&gt;
hardware.&lt;br /&gt;
&lt;br /&gt;
But today even most cheap home PCs have the processor power and&lt;br /&gt;
storage necessary to really toss raw video around, at least without&lt;br /&gt;
too much of a struggle. So now that everyone has all of this cheap media-capable hardware, &lt;br /&gt;
more people, not surprisingly, want to do interesting&lt;br /&gt;
things with digital media, especially streaming. YouTube was the first huge&lt;br /&gt;
success, and now everybody wants in.&lt;br /&gt;
&lt;br /&gt;
Well good!  Because this stuff is a lot of fun!&lt;br /&gt;
&lt;br /&gt;
[[Image:Dmpfg_002.jpg|thumb|360px|right]]&lt;br /&gt;
It&#039;s no problem finding consumers for digital media.  But here I&#039;d&lt;br /&gt;
like to address the engineers, the mathematicians, the hackers, the&lt;br /&gt;
people who are interested in discovering and making things and&lt;br /&gt;
building the technology itself. The people after my own heart.&lt;br /&gt;
&lt;br /&gt;
Digital media, compression especially, is perceived to be super-elite,&lt;br /&gt;
somehow incredibly more difficult than anything else in computer&lt;br /&gt;
science. The big industry players in the field don&#039;t mind this&lt;br /&gt;
perception at all; it helps justify the staggering number of very&lt;br /&gt;
basic patents they hold.  They like the image that their media&lt;br /&gt;
researchers &amp;quot;are the best of the best, so much smarter than anyone&lt;br /&gt;
else that their brilliant ideas can&#039;t even be understood by mere&lt;br /&gt;
mortals.&amp;quot; This is bunk.&lt;br /&gt;
&lt;br /&gt;
Digital audio and video and streaming and compression offer endless&lt;br /&gt;
deep and stimulating mental challenges, just like any other&lt;br /&gt;
discipline. It seems elite because so few people have been&lt;br /&gt;
involved.  So few people have been involved perhaps because so few&lt;br /&gt;
people could afford the expensive, special-purpose equipment it&lt;br /&gt;
required. But today, just about anyone watching this video has a&lt;br /&gt;
cheap, general-purpose computer powerful enough to play with the big&lt;br /&gt;
boys. There are battles going on today around HTML5 and browsers and&lt;br /&gt;
video and open vs. closed.  So now is a pretty good time to get&lt;br /&gt;
involved.  The easiest place to start is probably understanding the&lt;br /&gt;
technology we have right now.&lt;br /&gt;
&lt;br /&gt;
This is an introduction. Since it&#039;s an introduction, it glosses over a&lt;br /&gt;
ton of details so that the big picture&#039;s a little easier to see.&lt;br /&gt;
Quite a few people watching are going to be way past anything that I&#039;m&lt;br /&gt;
talking about, at least for now.  On the other hand, I&#039;m probably&lt;br /&gt;
going to go too fast for folks who really are brand new to all of&lt;br /&gt;
this, so if this is all new, relax. The important thing is to pick out&lt;br /&gt;
any ideas that really grab your imagination. Especially pay attention&lt;br /&gt;
to the terminology surrounding those ideas, because with those, and&lt;br /&gt;
Google, and Wikipedia, you can dig as deep as interests you.&lt;br /&gt;
&lt;br /&gt;
So, without any further ado, welcome to one hell of a new hobby.&lt;br /&gt;
&lt;br /&gt;
&amp;lt;center&amp;gt;&amp;lt;div style=&amp;quot;background-color:#DDDDFF;border-color:#CCCCDD;border-style:solid;width:80%;padding:0 1em 1em 1em;text-align:left;&amp;quot;&amp;gt;&lt;br /&gt;
&#039;&#039;&#039;Going deeper…&#039;&#039;&#039;&lt;br /&gt;
*Fixme: Some good HTML5 baseline codec debate article&lt;br /&gt;
*[http://diveintohtml5.org/video.html Dive into HTML5] web video tutorial&lt;br /&gt;
&amp;lt;/div&amp;gt;&amp;lt;/center&amp;gt;&lt;br /&gt;
&lt;br /&gt;
&amp;lt;br style=&amp;quot;clear:both;&amp;quot;/&amp;gt;&lt;br /&gt;
&lt;br /&gt;
==Analog vs Digital==&lt;br /&gt;
&amp;lt;small&amp;gt;[[Talk:A_Digital_Media_Primer_For_Geeks_(episode_1)#Analog_vs_Digital|Discuss this section]]&amp;lt;/small&amp;gt;&lt;br /&gt;
[[Image:Dmpfg_004.jpg|thumb|360px|right]]&lt;br /&gt;
&lt;br /&gt;
Sound is the propagation of pressure waves through air, spreading out&lt;br /&gt;
from a source like ripples spread from a stone tossed into a pond.  A&lt;br /&gt;
microphone, or the human ear for that matter, transforms these passing&lt;br /&gt;
ripples of pressure into an electric signal.  Right, this is&lt;br /&gt;
middle school science class, everyone remembers this.  Moving on.&lt;br /&gt;
&lt;br /&gt;
That audio signal is a one-dimensional function, a single value&lt;br /&gt;
varying over time.  If we slow the &#039;scope down a bit... that should be&lt;br /&gt;
a little easier to see. A few other aspects of the signal are&lt;br /&gt;
important. It&#039;s continuous in both value and time; that is, at any&lt;br /&gt;
given time it can have any real value, and there&#039;s a smoothly varying&lt;br /&gt;
value at every point in time.  No matter how much we zoom in, there&lt;br /&gt;
are no discontinuities, no singularities, no instantaneous steps or&lt;br /&gt;
points where the signal ceases to exist. It&#039;s defined&lt;br /&gt;
everywhere. Classic continuous math works very well on these signals.&lt;br /&gt;
&lt;br /&gt;
A digital signal on the other hand is discrete in both value and time.&lt;br /&gt;
In the simplest and most common system, called Pulse Code Modulation,&lt;br /&gt;
one of a fixed number of possible values directly represents the&lt;br /&gt;
instantaneous signal amplitude at points in time spaced a fixed&lt;br /&gt;
distance apart.  The end result is a stream of digits.&lt;br /&gt;
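&lt;br /&gt;
A minimal sketch of PCM in C, not from the video: quantize a 1kHz sine&lt;br /&gt;
into 16-bit samples at a 48kHz rate. Each sample is one of 65536 fixed&lt;br /&gt;
values taken at evenly spaced instants.&lt;br /&gt;
&lt;br /&gt;
#include &amp;lt;math.h&amp;gt;&lt;br /&gt;
#include &amp;lt;stdint.h&amp;gt;&lt;br /&gt;
&lt;br /&gt;
void make_tone(int16_t *out, int n)&lt;br /&gt;
{&lt;br /&gt;
    const double rate = 48000.0, freq = 1000.0;&lt;br /&gt;
    const double tau  = 6.283185307179586;  /* 2 * pi */&lt;br /&gt;
    for (int i = 0; i &amp;lt; n; i++)&lt;br /&gt;
        out[i] = (int16_t)lrint(32767.0 * sin(tau * freq * i / rate));&lt;br /&gt;
}&lt;br /&gt;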
&lt;br /&gt;
[[Image:Dmpfg_006.jpg|thumb|360px|right]]&lt;br /&gt;
Now this looks an awful lot like this.  It seems intuitive that we&lt;br /&gt;
should somehow be able to rigorously transform one into the other, and&lt;br /&gt;
good news, the Sampling Theorem says we can and tells us&lt;br /&gt;
how. Published in its most recognizable form by Claude Shannon in 1949&lt;br /&gt;
and built on the work of Nyquist, and Hartley, and tons of others, the&lt;br /&gt;
sampling theorem states that not only can we go back and&lt;br /&gt;
forth between analog and digital, it also lays&lt;br /&gt;
down a set of conditions for which conversion is lossless and the two&lt;br /&gt;
representations become equivalent and interchangeable.  When the&lt;br /&gt;
lossless conditions aren&#039;t met, the sampling theorem tells us how and&lt;br /&gt;
how much information is lost or corrupted.&lt;br /&gt;
&lt;br /&gt;
Up until very recently, analog technology was the basis for&lt;br /&gt;
practically everything done with audio, and that&#039;s not because most&lt;br /&gt;
audio comes from an originally analog source.  You may also think that&lt;br /&gt;
since computers are fairly recent, analog signal technology must have&lt;br /&gt;
come first.  Nope. Digital is actually older.  The telegraph predates&lt;br /&gt;
the telephone by half a century and was already fully mechanically&lt;br /&gt;
automated by the 1860s, sending coded, multiplexed digital signals&lt;br /&gt;
long distances. You know... tickertape. Harry Nyquist of Bell Labs was&lt;br /&gt;
researching telegraph pulse transmission when he published his&lt;br /&gt;
description of what later became known as the Nyquist frequency, the&lt;br /&gt;
core concept of the sampling theorem.  Now, it&#039;s true the telegraph&lt;br /&gt;
was transmitting symbolic information, text, not a digitized analog&lt;br /&gt;
signal, but with the advent of the telephone and radio, analog and&lt;br /&gt;
digital signal technology progressed rapidly and side-by-side.&lt;br /&gt;
&lt;br /&gt;
[[Image:Dmpfg_007.jpg|thumb|360px|right]]&lt;br /&gt;
Audio had always been manipulated as an analog signal because... well,&lt;br /&gt;
gee, it&#039;s so much easier.  A second-order low-pass filter, for example,&lt;br /&gt;
requires two passive components.  An all-analog short-time Fourier&lt;br /&gt;
transform, a few hundred.  Well, maybe a thousand if you want to build&lt;br /&gt;
something really fancy [bang on the 3585].  Processing signals&lt;br /&gt;
digitally requires millions to billions of transistors running at&lt;br /&gt;
microwave frequencies, support hardware at very least to digitize and&lt;br /&gt;
reconstruct the analog signals, a complete software ecosystem for&lt;br /&gt;
programming and controlling that billion-transistor juggernaut,&lt;br /&gt;
digital storage just in case you want to keep any of those bits for&lt;br /&gt;
later...&lt;br /&gt;
&lt;br /&gt;
So we come to the conclusion that analog is the only practical way to&lt;br /&gt;
do much with audio... well, unless you happen to have a billion&lt;br /&gt;
transistors and all the other things just lying around. And since we&lt;br /&gt;
do, digital signal processing becomes very attractive.&lt;br /&gt;
&lt;br /&gt;
For one thing, analog componentry just doesn&#039;t have the flexibility of&lt;br /&gt;
a general purpose computer.  Adding a new function to this&lt;br /&gt;
beast... yeah, it&#039;s probably not going to happen.  On a digital&lt;br /&gt;
processor though, just write a new program.  Software isn&#039;t trivial,&lt;br /&gt;
but it is a lot easier.&lt;br /&gt;
&lt;br /&gt;
Perhaps more importantly, though, every analog component is an&lt;br /&gt;
approximation. There&#039;s no such thing as a perfect transistor, or a&lt;br /&gt;
perfect inductor, or a perfect capacitor.  In analog, every component&lt;br /&gt;
adds noise and distortion, usually not very much, but it adds up. Just&lt;br /&gt;
transmitting an analog signal, especially over long distances,&lt;br /&gt;
progressively, measurably, irretrievably corrupts it.  Besides, all of&lt;br /&gt;
those single-purpose analog components take up a lot of space.  Two&lt;br /&gt;
lines of code on the billion transistors back here can implement a&lt;br /&gt;
filter that would require an inductor the size of a refrigerator.&lt;br /&gt;
&lt;br /&gt;
Digital systems don&#039;t have these drawbacks.  Digital signals can be&lt;br /&gt;
stored, copied, manipulated, and transmitted without adding any noise&lt;br /&gt;
or distortion. We do use lossy algorithms from time to time, but the&lt;br /&gt;
only unavoidably non-ideal steps are digitization and reconstruction,&lt;br /&gt;
where digital has to interface with all of that messy analog.  Messy&lt;br /&gt;
or not, modern conversion stages are very, very good.  By the&lt;br /&gt;
standards of our ears, we can consider them practically lossless as&lt;br /&gt;
well.&lt;br /&gt;
&lt;br /&gt;
With a little extra hardware, then, most of which is now small and&lt;br /&gt;
inexpensive due to our modern industrial infrastructure, digital audio&lt;br /&gt;
is the clear winner over analog.  So let us then go about storing it,&lt;br /&gt;
copying it, manipulating it, and transmitting it.&lt;br /&gt;
&lt;br /&gt;
&amp;lt;center&amp;gt;&amp;lt;div style=&amp;quot;background-color:#DDDDFF;border-color:#CCCCDD;border-style:solid;width:80%;padding:0 1em 1em 1em;text-align:left;&amp;quot;&amp;gt;&lt;br /&gt;
&#039;&#039;&#039;Going deeper…&#039;&#039;&#039;&lt;br /&gt;
*Wikipedia: [[wiki:Nyquist–Shannon_sampling_theorem|Nyquist–Shannon sampling theorem]]&lt;br /&gt;
*MIT OpenCourseWare [http://ocw.mit.edu/courses/electrical-engineering-and-computer-science/6-003-signals-and-systems-spring-2010/lecture-notes/ Lecture notes from 6.003 signals and systems.]&lt;br /&gt;
&amp;lt;/div&amp;gt;&amp;lt;/center&amp;gt;&lt;br /&gt;
&lt;br /&gt;
&amp;lt;br style=&amp;quot;clear:both;&amp;quot;/&amp;gt;&lt;br /&gt;
&lt;br /&gt;
==Raw (digital audio) meat==&lt;br /&gt;
&amp;lt;small&amp;gt;[[Talk:A_Digital_Media_Primer_For_Geeks_(episode_1)#Raw_.28digital_audio.29_meat|Discuss this section]]&amp;lt;/small&amp;gt;&lt;br /&gt;
&lt;br /&gt;
Pulse Code Modulation is the most common representation for &lt;br /&gt;
raw audio.  Other practical representations do exist: for example, the&lt;br /&gt;
Sigma-Delta coding used by the SACD, which is a form of Pulse Density&lt;br /&gt;
Modulation.  That said, Pulse Code Modulation is far&lt;br /&gt;
and away dominant, mainly because it&#039;s so mathematically&lt;br /&gt;
convenient.  An audio engineer can spend an entire career without&lt;br /&gt;
running into anything else.&lt;br /&gt;
&lt;br /&gt;
PCM encoding can be characterized in three parameters, making it easy&lt;br /&gt;
to account for every possible PCM variant with mercifully little&lt;br /&gt;
hassle.&lt;br /&gt;
&lt;br /&gt;
===sample rate===&lt;br /&gt;
&lt;br /&gt;
The first parameter is the sampling rate.  The highest frequency an&lt;br /&gt;
encoding can represent is called the Nyquist Frequency.  The Nyquist&lt;br /&gt;
frequency of PCM happens to be exactly half the sampling rate.&lt;br /&gt;
Therefore, the sampling rate directly determines the highest possible&lt;br /&gt;
frequency in the digitized signal.&lt;br /&gt;
&lt;br /&gt;
Analog telephone systems traditionally band-limited voice channels to&lt;br /&gt;
just under 4kHz, so digital telephony and most classic voice&lt;br /&gt;
applications use an 8kHz sampling rate: the minimum sampling rate&lt;br /&gt;
necessary to capture the entire bandwidth of a 4kHz channel.  This is&lt;br /&gt;
what an 8kHz sampling rate sounds like&amp;amp;mdash;a bit muffled but perfectly&lt;br /&gt;
intelligible for voice.  This is the lowest sampling rate that&#039;s ever&lt;br /&gt;
been used widely in practice.&lt;br /&gt;
&lt;br /&gt;
From there, as power, and memory, and storage increased, consumer&lt;br /&gt;
computer hardware went to offering 11, and then 16, and then 22, and&lt;br /&gt;
then 32kHz sampling.  With each increase in the sampling rate and the&lt;br /&gt;
Nyquist frequency, it&#039;s obvious that the high end becomes a little&lt;br /&gt;
clearer and the sound more natural.&lt;br /&gt;
&lt;br /&gt;
The Compact Disc uses a 44.1kHz sampling rate, which is again slightly&lt;br /&gt;
better than 32kHz, but the gains are becoming less distinct.  44.1kHz&lt;br /&gt;
is a bit of an oddball choice, especially given that it hadn&#039;t been&lt;br /&gt;
used for anything prior to the compact disc, but the huge success of&lt;br /&gt;
the CD has made it a common rate.&lt;br /&gt;
&lt;br /&gt;
The most common hi-fidelity sampling rate aside from the CD is 48kHz.&lt;br /&gt;
There&#039;s virtually no audible difference between the two.  This video,&lt;br /&gt;
or at least the original version of it, was shot and produced with&lt;br /&gt;
48kHz audio, which happens to be the original standard for&lt;br /&gt;
high-fidelity audio with video.&lt;br /&gt;
&lt;br /&gt;
Super-hi-fidelity sampling rates of 88, and 96, and 192kHz have also&lt;br /&gt;
appeared. The reason for the sampling rates beyond 48kHz isn&#039;t to&lt;br /&gt;
extend the audible high frequencies further. It&#039;s for a different&lt;br /&gt;
reason.&lt;br /&gt;
&lt;br /&gt;
[[Image:Dmpfg_008.jpg|thumb|360px|right]]&lt;br /&gt;
Stepping back for just a second, the French mathematician Jean&lt;br /&gt;
Baptiste Joseph Fourier showed that we can also think of signals like&lt;br /&gt;
audio as a set of component frequencies.  This frequency-domain&lt;br /&gt;
representation is equivalent to the time representation; the signal is&lt;br /&gt;
exactly the same, we&#039;re just looking at it a different way.  Here we see the&lt;br /&gt;
frequency-domain representation of a hypothetical analog signal we&lt;br /&gt;
intend to digitally sample.&lt;br /&gt;
&lt;br /&gt;
The sampling theorem tells us two essential things about the sampling&lt;br /&gt;
process. First, that a digital signal can&#039;t represent any&lt;br /&gt;
frequencies above the Nyquist frequency. Second, and this is the new&lt;br /&gt;
part, if we don&#039;t remove those frequencies with a low-pass filter&lt;br /&gt;
before sampling, the sampling process will fold them down into the&lt;br /&gt;
representable frequency range as aliasing distortion.&lt;br /&gt;
&lt;br /&gt;
Aliasing, in a nutshell, sounds freakin&#039; awful, so it&#039;s essential to&lt;br /&gt;
remove any beyond-Nyquist frequencies before sampling and after&lt;br /&gt;
reconstruction.&lt;br /&gt;
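&lt;br /&gt;
For a concrete feel for the fold-down: an unfiltered component lands at&lt;br /&gt;
its distance from the nearest multiple of the sampling rate, so at 48kHz&lt;br /&gt;
a 30kHz tone aliases to 18kHz. A sketch:&lt;br /&gt;
&lt;br /&gt;
#include &amp;lt;math.h&amp;gt;&lt;br /&gt;
&lt;br /&gt;
/* Fold a frequency into the 0 .. Nyquist range. */&lt;br /&gt;
double alias_of(double f, double rate)&lt;br /&gt;
{&lt;br /&gt;
    double r = fmod(f, rate);&lt;br /&gt;
    return (r &amp;lt;= rate / 2.0) ? r : rate - r;&lt;br /&gt;
}&lt;br /&gt;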
&lt;br /&gt;
Human frequency perception is considered to extend to about 20kHz. In&lt;br /&gt;
44.1 or 48kHz sampling, the low pass before the sampling stage has to&lt;br /&gt;
be extremely sharp to avoid cutting any audible frequencies below&lt;br /&gt;
20kHz but still not allow frequencies above the Nyquist to leak&lt;br /&gt;
forward into the sampling process.  This is a difficult filter to&lt;br /&gt;
build, and no practical filter succeeds completely. If the sampling&lt;br /&gt;
rate is 96kHz or 192kHz on the other hand, the low pass has an extra&lt;br /&gt;
octave or two for its transition band. This is a much easier filter to&lt;br /&gt;
build.  Sampling rates beyond 48kHz are actually one of those messy&lt;br /&gt;
analog stage compromises.&lt;br /&gt;
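&lt;br /&gt;
The arithmetic behind that &amp;quot;extra octave or two&amp;quot;: measured from the&lt;br /&gt;
20kHz edge of hearing up to the Nyquist frequency, 48kHz sampling leaves&lt;br /&gt;
about a quarter octave for the filter&#039;s transition band, 96kHz about 1.3&lt;br /&gt;
octaves, and 192kHz about 2.3.&lt;br /&gt;
&lt;br /&gt;
#include &amp;lt;math.h&amp;gt;&lt;br /&gt;
&lt;br /&gt;
/* Octaves available between 20kHz and Nyquist for the low-pass to roll&lt;br /&gt;
   off in: log2(24/20) = 0.26, log2(48/20) = 1.26, log2(96/20) = 2.26 */&lt;br /&gt;
double transition_octaves(double rate)&lt;br /&gt;
{&lt;br /&gt;
    return log2(rate / 2.0 / 20000.0);&lt;br /&gt;
}&lt;br /&gt;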
&lt;br /&gt;
===sample format===&lt;br /&gt;
&lt;br /&gt;
The second fundamental PCM parameter is the sample format; that is,&lt;br /&gt;
the format of each digital number.  A number is a number, but a number&lt;br /&gt;
can be represented in bits a number of different ways.&lt;br /&gt;
&lt;br /&gt;
Early PCM was eight-bit linear, encoded as an unsigned byte.  The&lt;br /&gt;
dynamic range is limited to about 50dB, and the quantization noise, as&lt;br /&gt;
you can hear, is pretty severe.  Eight-bit audio is vanishingly rare&lt;br /&gt;
today.&lt;br /&gt;
&lt;br /&gt;
Digital telephony typically uses one of two related non-linear eight&lt;br /&gt;
bit encodings called A-law and mu-law. These formats encode a roughly&lt;br /&gt;
14 bit dynamic range into eight bits by spacing the higher amplitude&lt;br /&gt;
values farther apart. A-law and mu-law obviously improve quantization&lt;br /&gt;
noise compared to linear 8-bit, and voice harmonics especially hide&lt;br /&gt;
the remaining quantization noise well. All three eight-bit encodings,&lt;br /&gt;
linear, A-law, and mu-law, are typically paired with an 8kHz sampling&lt;br /&gt;
rate, though I&#039;m demonstrating them here at 48kHz.&lt;br /&gt;
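&lt;br /&gt;
The textbook mu-law curve, with mu = 255, looks like this in C; actual&lt;br /&gt;
G.711 uses a segmented approximation of the same curve, so treat this&lt;br /&gt;
sketch as the idea rather than the spec.&lt;br /&gt;
&lt;br /&gt;
#include &amp;lt;math.h&amp;gt;&lt;br /&gt;
&lt;br /&gt;
/* Compress an amplitude in -1.0 .. 1.0 logarithmically before 8-bit&lt;br /&gt;
   quantization, spacing the high-amplitude values farther apart. */&lt;br /&gt;
double mulaw_compress(double x)&lt;br /&gt;
{&lt;br /&gt;
    const double mu = 255.0;&lt;br /&gt;
    double sign = (x &amp;lt; 0) ? -1.0 : 1.0;&lt;br /&gt;
    return sign * log(1.0 + mu * fabs(x)) / log(1.0 + mu);&lt;br /&gt;
}&lt;br /&gt;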
&lt;br /&gt;
Most modern PCM uses 16- or 24-bit two&#039;s-complement signed integers to&lt;br /&gt;
encode the range from negative infinity to zero decibels in 16 or 24&lt;br /&gt;
bits of precision. The maximum absolute value corresponds to zero decibels.&lt;br /&gt;
As with all the sample formats so far, signals beyond zero decibels, and thus&lt;br /&gt;
beyond the maximum representable range, are clipped.&lt;br /&gt;
&lt;br /&gt;
In mixing and mastering, it&#039;s not unusual to use floating-point&lt;br /&gt;
numbers for PCM instead of integers.  A 32 bit IEEE754 float, that&#039;s&lt;br /&gt;
the normal kind of floating point you see on current computers, has 24&lt;br /&gt;
bits of resolution, but an eight bit floating-point exponent increases&lt;br /&gt;
the representable range.  Floating point usually represents zero&lt;br /&gt;
decibels as +/-1.0, and because floats can obviously represent&lt;br /&gt;
considerably beyond that, temporarily exceeding zero decibels during&lt;br /&gt;
the mixing process doesn&#039;t cause clipping.  Floating-point PCM takes&lt;br /&gt;
up more space, so it tends to be used only as an intermediate&lt;br /&gt;
production format.&lt;br /&gt;
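&lt;br /&gt;
A sketch of moving between the two conventions, using one common scaling;&lt;br /&gt;
note that the clip only has to happen on the way back to integers.&lt;br /&gt;
&lt;br /&gt;
#include &amp;lt;stdint.h&amp;gt;&lt;br /&gt;
&lt;br /&gt;
float int16_to_float(int16_t s)&lt;br /&gt;
{&lt;br /&gt;
    return s / 32768.0f;  /* zero decibels maps to roughly +/-1.0 */&lt;br /&gt;
}&lt;br /&gt;
&lt;br /&gt;
int16_t float_to_int16(float f)&lt;br /&gt;
{&lt;br /&gt;
    if (f &amp;gt;=  1.0f) return  32767;   /* clip anything past 0 dB */&lt;br /&gt;
    if (f &amp;lt;= -1.0f) return -32768;&lt;br /&gt;
    return (int16_t)(f * 32768.0f);&lt;br /&gt;
}&lt;br /&gt;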
&lt;br /&gt;
Lastly, most general purpose computers still read and&lt;br /&gt;
write data in octet bytes, so it&#039;s important to remember that samples&lt;br /&gt;
bigger than eight bits can be in big- or little-endian order, and both&lt;br /&gt;
endiannesses are common.  For example, Microsoft WAV files are little-endian,&lt;br /&gt;
and Apple AIFC files tend to be big-endian.  Be aware of it.&lt;br /&gt;
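&lt;br /&gt;
One way to stay out of trouble is to assemble samples from octets&lt;br /&gt;
explicitly instead of trusting the machine&#039;s native order; a sketch:&lt;br /&gt;
&lt;br /&gt;
#include &amp;lt;stdint.h&amp;gt;&lt;br /&gt;
&lt;br /&gt;
int16_t read_s16le(const uint8_t *p)  /* little-endian, e.g. WAV */&lt;br /&gt;
{&lt;br /&gt;
    return (int16_t)(p[0] | (p[1] &amp;lt;&amp;lt; 8));&lt;br /&gt;
}&lt;br /&gt;
&lt;br /&gt;
int16_t read_s16be(const uint8_t *p)  /* big-endian, e.g. AIFC */&lt;br /&gt;
{&lt;br /&gt;
    return (int16_t)((p[0] &amp;lt;&amp;lt; 8) | p[1]);&lt;br /&gt;
}&lt;br /&gt;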
&lt;br /&gt;
===channels===&lt;br /&gt;
&lt;br /&gt;
The third PCM parameter is the number of channels.  The convention in&lt;br /&gt;
raw PCM is to encode multiple channels by interleaving the samples of&lt;br /&gt;
each channel together into a single stream.  Straightforward and extensible.&lt;br /&gt;
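&lt;br /&gt;
In code the convention is just this, shown here for stereo:&lt;br /&gt;
&lt;br /&gt;
/* Interleave two mono channels as L R L R ... one sample per channel&lt;br /&gt;
   per instant, in channel order. */&lt;br /&gt;
void interleave(const short *left, const short *right,&lt;br /&gt;
                short *out, int frames)&lt;br /&gt;
{&lt;br /&gt;
    for (int i = 0; i &amp;lt; frames; i++) {&lt;br /&gt;
        out[2 * i]     = left[i];&lt;br /&gt;
        out[2 * i + 1] = right[i];&lt;br /&gt;
    }&lt;br /&gt;
}&lt;br /&gt;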
&lt;br /&gt;
===done!===&lt;br /&gt;
&lt;br /&gt;
And that&#039;s it!  That describes every PCM representation ever.  Done.&lt;br /&gt;
Digital audio is &#039;&#039;so easy&#039;&#039;!  There&#039;s more to do of course, but at this&lt;br /&gt;
point we&#039;ve got a nice useful chunk of audio data, so let&#039;s get some&lt;br /&gt;
video too.&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
&amp;lt;center&amp;gt;&amp;lt;div style=&amp;quot;background-color:#DDDDFF;border-color:#CCCCDD;border-style:solid;width:80%;padding:0 1em 1em 1em;text-align:left;&amp;quot;&amp;gt;&lt;br /&gt;
&#039;&#039;&#039;Going deeper…&#039;&#039;&#039;&lt;br /&gt;
* Fixme&lt;br /&gt;
&amp;lt;/div&amp;gt;&amp;lt;/center&amp;gt;&lt;br /&gt;
&lt;br /&gt;
&amp;lt;br style=&amp;quot;clear:both;&amp;quot;/&amp;gt;&lt;br /&gt;
&lt;br /&gt;
==Video vegetables (they&#039;re good for you!)==&lt;br /&gt;
&amp;lt;small&amp;gt;[[Talk:A_Digital_Media_Primer_For_Geeks_(episode_1)#Video_vegetables_.28they.27re_good_for_you.21.29|Discuss this section]]&amp;lt;/small&amp;gt;&lt;br /&gt;
&lt;br /&gt;
One could think of video as being like audio but with two additional&lt;br /&gt;
spatial dimensions, X and Y, in addition to the dimension of time.&lt;br /&gt;
This is mathematically sound. The Sampling Theorem applies to all&lt;br /&gt;
three video dimensions just as it does the single time dimension of&lt;br /&gt;
audio.&lt;br /&gt;
&lt;br /&gt;
Audio and video are obviously quite different in practice. For one,&lt;br /&gt;
compared to audio, video is huge. Raw CD audio is about 1.4 megabits&lt;br /&gt;
per second. Raw 1080i HD video is over 700 megabits per second. That&#039;s&lt;br /&gt;
more than 500 times more data to capture, process and store per&lt;br /&gt;
second.  By Moore&#039;s law... that&#039;s... let&#039;s see... roughly eight&lt;br /&gt;
doublings times two years, so yeah, computers requiring about an extra&lt;br /&gt;
fifteen years to handle raw video after getting raw audio down pat was&lt;br /&gt;
about right.&lt;br /&gt;
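&lt;br /&gt;
The back-of-the-envelope version of those figures, assuming 8-bit 4:2:0&lt;br /&gt;
video at 30 frames per second:&lt;br /&gt;
&lt;br /&gt;
/* CD audio:  44100 samples/s x 16 bits x 2 channels = 1411200 bits/s */&lt;br /&gt;
/* 1080 video: 1920 x 1080 pixels x 12 bits (4:2:0) x 30 frames/s&lt;br /&gt;
              = 746496000 bits/s, roughly 500 times the audio rate */&lt;br /&gt;
static const long cd_bits_per_sec    = 44100L * 16 * 2;&lt;br /&gt;
static const long video_bits_per_sec = 1920L * 1080 * 12 * 30;&lt;br /&gt;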
&lt;br /&gt;
Basic raw video is also just more complex than basic raw audio. The&lt;br /&gt;
sheer volume of data currently necessitates a representation more&lt;br /&gt;
efficient than the linear PCM used for audio.  In addition, electronic&lt;br /&gt;
video comes almost entirely from broadcast television alone, and the&lt;br /&gt;
standards committees that govern broadcast video have always been very&lt;br /&gt;
concerned with backward compatibility.  Up until just last year in the&lt;br /&gt;
US, a sixty year old black and white television could still show a&lt;br /&gt;
normal analog television broadcast.  That&#039;s actually a really neat&lt;br /&gt;
trick.&lt;br /&gt;
&lt;br /&gt;
The downside to backward compatibility is that once a detail makes it&lt;br /&gt;
into a standard, you can&#039;t ever really throw it out again. Electronic&lt;br /&gt;
video has never started over from scratch the way audio has multiple&lt;br /&gt;
times.  Sixty years&#039; worth of clever but obsolete hacks necessitated by&lt;br /&gt;
the passing technology of a given era have built up into quite a pile,&lt;br /&gt;
and because digital standards also come from broadcast television, all&lt;br /&gt;
these eldritch hacks have been brought forward into the digital&lt;br /&gt;
standards as well.&lt;br /&gt;
&lt;br /&gt;
In short, there are a whole lot more details involved in digital video&lt;br /&gt;
than there were with audio. There&#039;s no hope of covering them&lt;br /&gt;
all completely here, so we&#039;ll cover the broad fundamentals.&lt;br /&gt;
&lt;br /&gt;
===resolution and aspect===&lt;br /&gt;
&lt;br /&gt;
The most obvious raw video parameters are the width and height of the&lt;br /&gt;
picture in pixels. As simple as that may sound, the pixel dimensions&lt;br /&gt;
alone don&#039;t actually specify the absolute width and height of the&lt;br /&gt;
picture, as most broadcast-derived video doesn&#039;t use square pixels.&lt;br /&gt;
The number of scanlines in a broadcast image was fixed, but the&lt;br /&gt;
effective number of horizontal pixels was a function of channel&lt;br /&gt;
bandwidth. Effective horizontal resolution could result in pixels that&lt;br /&gt;
were either narrower or wider than the spacing between scanlines.&lt;br /&gt;
&lt;br /&gt;
Standards have generally specified that digitally sampled video should&lt;br /&gt;
reflect the real resolution of the original analog source, so a large&lt;br /&gt;
amount of digital video also uses non-square pixels. For example, a&lt;br /&gt;
normal 4:3 aspect NTSC DVD is typically encoded with a display&lt;br /&gt;
resolution of 704 by 480, a ratio wider than 4:3.  In this case, the&lt;br /&gt;
pixels themselves are assigned an aspect ratio of 10:11, making them&lt;br /&gt;
taller than they are wide and narrowing the image horizontally to the&lt;br /&gt;
correct aspect.  Such an image has to be resampled to show properly on&lt;br /&gt;
a digital display with square pixels.&lt;br /&gt;
&lt;br /&gt;
===frame rate and interlacing===&lt;br /&gt;
&lt;br /&gt;
The second obvious video parameter is the frame rate, the number of&lt;br /&gt;
full frames per second.  Several standard frame rates are in active&lt;br /&gt;
use. Digital video, in one form or another, can use all of them.  Or,&lt;br /&gt;
any other frame rate.  Or even variable rates where the frame rate&lt;br /&gt;
changes adaptively over the course of the video. The higher the frame&lt;br /&gt;
rate, the smoother the motion, and that brings us, unfortunately, to&lt;br /&gt;
interlacing.&lt;br /&gt;
&lt;br /&gt;
In the very earliest days of broadcast video, engineers sought the&lt;br /&gt;
fastest practical frame rate to smooth motion and to minimize flicker&lt;br /&gt;
on phosphor-based CRTs.  They were also under pressure to use the&lt;br /&gt;
least possible bandwidth for the highest resolution and fastest frame&lt;br /&gt;
rate.  Their solution was to interlace the video where the even lines&lt;br /&gt;
are sent in one pass and the odd lines in the next.  Each pass is&lt;br /&gt;
called a field and two fields sort of produce one complete frame.&lt;br /&gt;
&amp;quot;Sort of&amp;quot;, because the even and odd fields aren&#039;t actually from the&lt;br /&gt;
same source frame.  In a 60 field per second picture, the source frame&lt;br /&gt;
rate is actually 60 full frames per second, and half of each frame,&lt;br /&gt;
every other line, is simply discarded.  This is why we can&#039;t&lt;br /&gt;
deinterlace a video simply by combining two fields into one frame;&lt;br /&gt;
they&#039;re not actually from one frame to begin with.&lt;br /&gt;
&lt;br /&gt;
===gamma===&lt;br /&gt;
&lt;br /&gt;
The cathode ray tube was the only available display technology for&lt;br /&gt;
most of the history of electronic video. A CRT&#039;s output brightness is&lt;br /&gt;
nonlinear, approximately equal to the input controlling voltage raised&lt;br /&gt;
to the 2.5th power. This exponent, 2.5, is designated gamma, and so&lt;br /&gt;
it&#039;s often referred to as the gamma of a display.  Cameras, though,&lt;br /&gt;
are linear, and if you feed a CRT a linear input signal, it looks a&lt;br /&gt;
bit like this.&lt;br /&gt;
&lt;br /&gt;
As there were originally to be very few cameras, which were&lt;br /&gt;
fantastically expensive anyway, and hopefully many, many television&lt;br /&gt;
sets, which had best be as inexpensive as possible, engineers decided to&lt;br /&gt;
add the necessary gamma correction circuitry to the cameras rather&lt;br /&gt;
than the sets. Video transmitted over the airwaves would thus have a&lt;br /&gt;
nonlinear intensity using the inverse of the set&#039;s gamma exponent, so that&lt;br /&gt;
once a camera&#039;s signal was finally displayed on the CRT, the overall&lt;br /&gt;
response of the system from camera to set was back to linear again.&lt;br /&gt;
&lt;br /&gt;
Almost.&lt;br /&gt;
&lt;br /&gt;
There were also two other tweaks. A television camera actually uses a&lt;br /&gt;
gamma exponent that&#039;s the inverse of 2.2, not 2.5.  That&#039;s just a&lt;br /&gt;
correction for viewing in a dim environment. Also, the exponential&lt;br /&gt;
curve transitions to a linear ramp near black.  That&#039;s just an old&lt;br /&gt;
hack for suppressing sensor noise in the camera.&lt;br /&gt;
&lt;br /&gt;
Gamma correction also had a lucky benefit. It just so happens that the&lt;br /&gt;
human eye has a perceptual gamma of about 3.  This is relatively close&lt;br /&gt;
to the CRT&#039;s gamma of 2.5. An image using gamma correction devotes&lt;br /&gt;
more resolution to lower intensities, where the eye happens to have&lt;br /&gt;
its finest intensity discrimination, and therefore uses the available&lt;br /&gt;
scale resolution more efficiently.  Although CRTs are currently&lt;br /&gt;
vanishing, a standard sRGB computer display still uses a nonlinear&lt;br /&gt;
intensity curve similar to television, with a linear ramp near black,&lt;br /&gt;
followed by an exponential curve with a gamma exponent of 2.4. This&lt;br /&gt;
encodes a sixteen bit linear range down into eight bits.&lt;br /&gt;
&lt;br /&gt;
===color and colorspace===&lt;br /&gt;
&lt;br /&gt;
The human eye has three apparent color channels, red, green, and blue,&lt;br /&gt;
and most displays use these three colors as additive primaries to&lt;br /&gt;
produce a full range of color output.  The primary pigments in&lt;br /&gt;
printing are Cyan, Magenta, and Yellow for the same reason; pigments&lt;br /&gt;
are subtractive, and each of these pigments subtracts one pure color&lt;br /&gt;
from reflected light.  Cyan subtracts red, magenta subtracts green, and&lt;br /&gt;
yellow subtracts blue.&lt;br /&gt;
&lt;br /&gt;
Video can be and sometimes is represented with red, green, and blue&lt;br /&gt;
color channels, but RGB video is atypical. The human eye is far more&lt;br /&gt;
sensitive to luminosity than it is to color, and RGB tends to spread&lt;br /&gt;
the energy of an image across all three color channels.  That is, the&lt;br /&gt;
red plane looks like a red version of the original picture, the green&lt;br /&gt;
plane looks like a green version of the original picture, and the blue&lt;br /&gt;
plane looks like a blue version of the original picture.  Black and&lt;br /&gt;
white times three.  Not efficient.&lt;br /&gt;
&lt;br /&gt;
For those reasons and because, oh hey, television just happened to&lt;br /&gt;
start out as black and white anyway, video usually is represented as a&lt;br /&gt;
high resolution luma channel, the black &amp;amp; white, along with&lt;br /&gt;
additional, often lower resolution chroma channels, the color. The&lt;br /&gt;
luma channel, Y, is produced by weighting and then adding the separate&lt;br /&gt;
red, green and blue signals.  The chroma channels U and V are then&lt;br /&gt;
produced by subtracting the luma signal from blue and the luma signal&lt;br /&gt;
from red.&lt;br /&gt;
&lt;br /&gt;
When YUV is scaled, offset and quantized for digital video, it&#039;s&lt;br /&gt;
usually more correctly called Y&#039;CbCr, but the more generic term YUV is&lt;br /&gt;
widely used to describe all the analog and digital variants of this&lt;br /&gt;
color model.&lt;br /&gt;
&lt;br /&gt;
===chroma subsampling===&lt;br /&gt;
&lt;br /&gt;
The U and V chroma channels can have the same resolution as the Y&lt;br /&gt;
channel, but because the human eye has far less spatial color&lt;br /&gt;
resolution than spatial luminosity resolution, chroma resolution is&lt;br /&gt;
usually halved or even quartered in the horizontal direction, the&lt;br /&gt;
vertical direction, or both, usually without any significant impact on the&lt;br /&gt;
apparent raw image quality.  Practically every possible subsampling&lt;br /&gt;
variant has been used at one time or another, but the common choices&lt;br /&gt;
today are 4:4:4 video, which isn&#039;t actually subsampled at all, 4:2:2 video in&lt;br /&gt;
which the horizontal resolution of the U and V channels is halved, and&lt;br /&gt;
most common of all, 4:2:0 video in which both the horizontal and vertical&lt;br /&gt;
resolutions of the chroma channels are halved, resulting in U and V&lt;br /&gt;
planes that are each one quarter the size of Y.&lt;br /&gt;
&lt;br /&gt;
The terms 4:2:2, 4:2:0, 4:1:1 and so on and so forth, aren&#039;t complete&lt;br /&gt;
descriptions of a chroma subsampling. There are multiple possible ways&lt;br /&gt;
to position the chroma pixels relative to luma, and again, several&lt;br /&gt;
variants are in active use for each subsampling.  For example, motion&lt;br /&gt;
JPEG, MPEG-1 video, MPEG-2 video, DV, Theora and WebM all use or can&lt;br /&gt;
use 4:2:0 subsampling, but they site the chroma pixels three different&lt;br /&gt;
ways.&lt;br /&gt;
&lt;br /&gt;
Motion JPEG, MPEG1 video, Theora and WebM all site chroma pixels&lt;br /&gt;
between luma pixels both horizontally and vertically.&lt;br /&gt;
&lt;br /&gt;
MPEG2 sites chroma pixels between lines, but horizontally aligned with&lt;br /&gt;
every other luma pixel. Interlaced modes complicate things somewhat,&lt;br /&gt;
resulting in a siting arrangement that&#039;s a tad bizarre.&lt;br /&gt;
&lt;br /&gt;
And finally PAL-DV, which is always interlaced, places the chroma&lt;br /&gt;
pixels in the same position as every other luma pixel in the&lt;br /&gt;
horizontal direction, and vertically alternates chroma channel on&lt;br /&gt;
each line.&lt;br /&gt;
&lt;br /&gt;
That&#039;s just 4:2:0 video. I&#039;ll leave the other subsamplings as homework for the&lt;br /&gt;
viewer.  Got the basic idea, moving on.&lt;br /&gt;
&lt;br /&gt;
===pixel formats===&lt;br /&gt;
&lt;br /&gt;
In audio, we always represent multiple channels in a PCM stream by&lt;br /&gt;
interleaving the samples from each channel in order. Video uses both&lt;br /&gt;
packed formats that interleave the color channels, as well as planar&lt;br /&gt;
formats that keep the pixels from each channel together in separate&lt;br /&gt;
planes stacked in order in the frame. There are at least 50 different formats in&lt;br /&gt;
these two broad categories with possibly ten or fifteen in common use.&lt;br /&gt;
&lt;br /&gt;
Each chroma subsampling and different bit-depth requires a different&lt;br /&gt;
packing arrangement, and so a different pixel format.  For a given&lt;br /&gt;
unique subsampling, there are usually also several equivalent formats&lt;br /&gt;
that consist of trivial channel order rearrangements or repackings due either to&lt;br /&gt;
convenience once-upon-a-time on some particular piece of hardware or&lt;br /&gt;
sometimes just good old-fashioned spite.&lt;br /&gt;
&lt;br /&gt;
Pixel formats are described by a unique name or fourcc code.  There&lt;br /&gt;
are quite a few of these and there&#039;s no sense going over each one now.&lt;br /&gt;
Google is your friend.  Be aware that fourcc codes for raw video&lt;br /&gt;
specify the pixel arrangement and chroma subsampling, but generally&lt;br /&gt;
don&#039;t imply anything certain about chroma siting or color space.  YV12&lt;br /&gt;
video, to pick one, can use JPEG, MPEG-2 or DV chroma siting, and any&lt;br /&gt;
one of several YUV colorspace definitions.&lt;br /&gt;
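&lt;br /&gt;
To make the packing point concrete, here&#039;s a sketch (hypothetical&lt;br /&gt;
helper, even dimensions, no row padding assumed) of locating the&lt;br /&gt;
planes inside a single YV12 frame buffer.  YV12 stores the V plane&lt;br /&gt;
before U; the otherwise identical I420 layout swaps them:&lt;br /&gt;
&lt;br /&gt;
 /* Plane pointers within one YV12 (planar 4:2:0, V first) frame. */&lt;br /&gt;
 void yv12_planes(unsigned char *buf, int w, int h,&lt;br /&gt;
                  unsigned char **y, unsigned char **u,&lt;br /&gt;
                  unsigned char **v) {&lt;br /&gt;
   *y = buf;                     /* w x h luma samples */&lt;br /&gt;
   *v = buf + w * h;             /* w/2 x h/2, V plane comes first */&lt;br /&gt;
   *u = *v + (w / 2) * (h / 2);  /* w/2 x h/2, U plane follows */&lt;br /&gt;
 }&lt;br /&gt;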
&lt;br /&gt;
===done!===&lt;br /&gt;
&lt;br /&gt;
That wraps up our not so quick and yet very incomplete tour of raw&lt;br /&gt;
video. The good news is we can already get quite a lot of real work&lt;br /&gt;
done using that overview. In plenty of situations, a frame of video&lt;br /&gt;
data is a frame of video data.  The details matter, greatly, when it&lt;br /&gt;
comes time to write software, but for now I am satisfied that the&lt;br /&gt;
esteemed viewer is broadly aware of the relevant issues.&lt;br /&gt;
&lt;br /&gt;
&amp;lt;center&amp;gt;&amp;lt;div style=&amp;quot;background-color:#DDDDFF;border-color:#CCCCDD;border-style:solid;width:80%;padding:0 1em 1em 1em;text-align:left;&amp;quot;&amp;gt;&lt;br /&gt;
&#039;&#039;&#039;Going deeper…&#039;&#039;&#039;&lt;br /&gt;
* Fixme&lt;br /&gt;
&amp;lt;/div&amp;gt;&amp;lt;/center&amp;gt;&lt;br /&gt;
&lt;br /&gt;
&amp;lt;br style=&amp;quot;clear:both;&amp;quot;/&amp;gt;&lt;br /&gt;
&lt;br /&gt;
==Containers==&lt;br /&gt;
&amp;lt;small&amp;gt;[[Talk:A_Digital_Media_Primer_For_Geeks_(episode_1)#Containers|Discuss this section]]&amp;lt;/small&amp;gt;&lt;br /&gt;
&lt;br /&gt;
So. We have audio data. We have video data. What remains is the more&lt;br /&gt;
familiar non-signal data and straight up engineering that software&lt;br /&gt;
developers are used to, and plenty of it.&lt;br /&gt;
&lt;br /&gt;
Chunks of raw audio and video data have no externally visible&lt;br /&gt;
structure, but they&#039;re often uniformly sized.  We could just string&lt;br /&gt;
them together in a rigid pre-determined ordering for streaming and&lt;br /&gt;
storage, and some simple systems do approximately that. Compressed&lt;br /&gt;
frames, though, aren&#039;t necessarily a predictable size, and we usually want&lt;br /&gt;
some flexibility in using a range of different data types in streams.&lt;br /&gt;
If we string random formless data together, we lose the boundaries&lt;br /&gt;
that separate frames and don&#039;t necessarily know what data belongs to&lt;br /&gt;
which streams.  A stream needs some generalized structure to be&lt;br /&gt;
generally useful.&lt;br /&gt;
&lt;br /&gt;
In addition to our signal data, we also have our PCM and video&lt;br /&gt;
parameters.  There&#039;s probably plenty of other metadata we also want to&lt;br /&gt;
deal with, like audio tags and video chapters and subtitles, all&lt;br /&gt;
essential components of rich media.  It makes sense to place this&lt;br /&gt;
metadata, that is, data about the data, within the media itself.&lt;br /&gt;
&lt;br /&gt;
Storing and structuring formless data and disparate metadata is the&lt;br /&gt;
job of a container.  Containers provide framing for the data blobs,&lt;br /&gt;
interleave and identify multiple data streams, provide timing&lt;br /&gt;
information, and store the metadata necessary to parse, navigate,&lt;br /&gt;
manipulate and present the media.  In general, any container can hold&lt;br /&gt;
any kind of data.  And data can be put into any container.&lt;br /&gt;
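&lt;br /&gt;
To make the framing idea concrete, here&#039;s a toy per-packet header in&lt;br /&gt;
C.  It&#039;s purely illustrative (it is not the layout of Ogg, Matroska,&lt;br /&gt;
MP4, or any other real container), but every container has to provide&lt;br /&gt;
this kind of bookkeeping in some form:&lt;br /&gt;
&lt;br /&gt;
 #include &amp;lt;stdint.h&amp;gt;&lt;br /&gt;
 &lt;br /&gt;
 /* A hypothetical packet header, for illustration only. */&lt;br /&gt;
 struct toy_packet_header {&lt;br /&gt;
   uint32_t stream_id;    /* identifies which stream the blob belongs to */&lt;br /&gt;
   uint32_t payload_len;  /* framing: where this blob ends */&lt;br /&gt;
   uint64_t timestamp;    /* timing: when to present the payload */&lt;br /&gt;
 };&lt;br /&gt;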
&lt;br /&gt;
==Credits==&lt;br /&gt;
&lt;br /&gt;
In the past thirty minutes, we&#039;ve covered digital audio, video, some&lt;br /&gt;
history, some math and a little engineering. We&#039;ve barely scratched the&lt;br /&gt;
surface, but it&#039;s time for a well earned break.&lt;br /&gt;
&lt;br /&gt;
There&#039;s so much more to talk about, so I hope you&#039;ll join me again in&lt;br /&gt;
our next episode.  Until then--- Cheers!&lt;br /&gt;
&lt;br /&gt;
Written by:&lt;br /&gt;
Christopher (Monty) Montgomery&lt;br /&gt;
and the Xiph.Org Community&lt;br /&gt;
&lt;br /&gt;
Intro, title and credits music:&lt;br /&gt;
&amp;quot;Boo Boo Coming&amp;quot;, by Joel Forrester&lt;br /&gt;
Performed by the Microscopic Septet&lt;br /&gt;
Used by permission of Cuneiform Records.&lt;br /&gt;
Original source track All Rights Reserved.&lt;br /&gt;
www.cuneiformrecords.com&lt;br /&gt;
&lt;br /&gt;
This Video Was Produced Entirely With Free and Open Source Software&lt;br /&gt;
&lt;br /&gt;
GNU&lt;br /&gt;
Linux&lt;br /&gt;
Fedora&lt;br /&gt;
Cinelerra&lt;br /&gt;
The Gimp&lt;br /&gt;
Audacity&lt;br /&gt;
Postfish&lt;br /&gt;
Gstreamer&lt;br /&gt;
&lt;br /&gt;
CC BY-NC-SA&lt;br /&gt;
A Co-Production of Xiph.Org and Red Hat Inc.&lt;br /&gt;
(C) 2010, Some Rights Reserved&lt;br /&gt;
&lt;br /&gt;
&amp;lt;br style=&amp;quot;clear:both;&amp;quot;/&amp;gt;&lt;br /&gt;
==The making of…==&lt;br /&gt;
&amp;lt;small&amp;gt;[[Talk:A_Digital_Media_Primer_For_Geeks_(episode_1)#The_making_of.E2.80.A6|Discuss this section]]&amp;lt;/small&amp;gt;&lt;br /&gt;
===Equipment===&lt;br /&gt;
====Camera====&lt;br /&gt;
Canon HV40 HDV camera w/ wide-angle lens operating on a tripod.&lt;br /&gt;
&lt;br /&gt;
The wide angle lens gives the camera a nice close macro mode, and approximately triples the amount of light coming into the sensor for a given zoom/aperture.  Useful for shooting indoors at night.&lt;br /&gt;
&lt;br /&gt;
No additional lighting kit was used.&lt;br /&gt;
&lt;br /&gt;
====Audio====&lt;br /&gt;
&lt;br /&gt;
Two Crown PCC160 boundary microphones placed on a table approximately 4-8 feet in front of the speaker, run through a cheap Behringer portable mixer and into the camera&#039;s microphone input.  &lt;br /&gt;
&lt;br /&gt;
No additional audio kit was used.&lt;br /&gt;
&lt;br /&gt;
====Sundries====&lt;br /&gt;
&lt;br /&gt;
Whiteboard markers by &#039;Bic&#039;.&lt;br /&gt;
&lt;br /&gt;
Drawing aids by Staedtler, McMaster Carr, and &#039;Generic&#039;.&lt;br /&gt;
&lt;br /&gt;
===Video shooting sequence===&lt;br /&gt;
&lt;br /&gt;
Scenes were pre-scripted and memorized, usually with lots of on-the-fly revision.  (In the future... I&#039;m totally getting a teleprompter.  Wh000.  OTOH, I can totally rattle off the entire video script from beginning to end as a party trick, thus ensuring I&#039;ll not be invited to many parties --Monty).&lt;br /&gt;
&lt;br /&gt;
Diagrams were drawn by hand on a physical whiteboard with whiteboard markers and magnetic T-squares, triangles, and yardsticks.  Despite looking a lot like greenscreen work, there is no image compositing in use (actually-- there are two small composites where an error in a whiteboard diagram was corrected by subtracting part of the original image and then adding a corrected version of the diagram).&lt;br /&gt;
&lt;br /&gt;
Camera operated in 24F shutter priority mode (Tv set to &amp;quot;24&amp;quot;) with exposure and white balance both calibrated to the white board (or a white piece of paper) and locked.  Microphone attenuation setting active, with gain locked such that room noise peaked at -40dB (all the rooms in the shooting sequences were noisy due to the building&#039;s ventilation system, or active equipment). Lighting in the whiteboard rooms tended to be odd, with little relative light cast on a presenter standing just in front of the whiteboard; a presenter is practically standing in the room&#039;s only shadow.  Most of the room light is focused on the table and walls.  Additional fill lighting kit would have been useful, but for the first vid, I didn&#039;t want &#039;perfect&#039; to be the enemy of &#039;good&#039;.&lt;br /&gt;
&lt;br /&gt;
Autofocus used for whiteboard scenes, manual focus used for several workshop scenes as the autofocus tended to hunt continuously in very low light.&lt;br /&gt;
&lt;br /&gt;
Continuous capture to a Thinkpad with firewire input via a simple gstreamer script.&lt;br /&gt;
&lt;br /&gt;
===Production sequence===&lt;br /&gt;
====All hail Cinelerra.  You better hail, or Cinelerra will get pissy about it.====&lt;br /&gt;
&lt;br /&gt;
Most of the production sequence hinged on making Cinelerra happy; it is a hulking rusty cast iron WWI tank of a program that can seem like it&#039;s composed entirely of compressed bugs.  That said, it was neither particularly crashy nor did it ever accidentally corrupt or lose work.  It was also the only FOSS editor with a working 2D compositor.  It got the job done once I found a workflow it would cope with (and fixed a number of bugs; these fixes are available from my Cinelerra Git repo at http://git.xiph.org/?p=users/xiphmont/cinelerraCV.git;a=summary).&lt;br /&gt;
&lt;br /&gt;
====Choosing takes====&lt;br /&gt;
&lt;br /&gt;
Each shooting session yielded four to six hours of raw video.  The first step was to load the raw video into the cinelerra timeline, label each complete take, compare and choose the take to use, then render the chosen take out to a raw clip as a YUV4MPEG raw video file and a WAV raw audio file.  Be careful that Settings-&amp;gt;Align Cursor On Frames is set, else the audio and video renders won&#039;t start on the same boundary.&lt;br /&gt;
&lt;br /&gt;
====Postprocessing====&lt;br /&gt;
&lt;br /&gt;
At this point, the raw video clips were adjusted for gamma, contrast and saturation in gstreamer and mplayer.  In the earlier shoots the camera was underexposing due to pilot error, which required quite a bit of gamma and saturation inflation to &#039;correct&#039; (there is no real correction as the low-end data is gone, but it&#039;s possible to make it look better).  Later shoots used saner settings and the adjustments were mostly to keep different shooting sessions more uniform.  The whiteboard tends not to look white because it&#039;s mildly reflective, and picks up the color of the cyan and orange audio baffles in the room like a big diffuse mirror.&lt;br /&gt;
&lt;br /&gt;
The audio was both noisy (due to the building&#039;s ventilation system, which either sounded like a low loud rumble or a jet engine taking off) and reverberant (the rooms were glass on two sides and plaster on the other two).  Early takes used no additional sound absorbing material in the rooms, and the Postfish filtering and deverb were used heavily.  It gives the early audio in the vid a slightly odd, processed feel (I had almost decided the original audio was simply unusable).  Later takes used some big fleece &#039;soft flats&#039; in the room to absorb some additional reverb, and the later takes are less heavily filtered.&lt;br /&gt;
&lt;br /&gt;
The Postfish filtering chain used declip (for the occasional overrange oops), deverb (to remove room reverberation), multicompand (noise gating), single compand (for volume levelling) and EQ (the Crown mics are nice, but are very midrange heavy).&lt;br /&gt;
&lt;br /&gt;
====Special Effects====&lt;br /&gt;
&lt;br /&gt;
Audio special effects were one-offs, mostly done using SoX.  The processed demo sections of audio were then spliced back into the original audio takes using Audacity.&lt;br /&gt;
&lt;br /&gt;
Video special effects (e.g., removing a color channel) were done by writing quick, one-off filters in C for y4oi.  A few effects were done by dumping a take as a directory full of PNGs and then batch-processing the PNGs again using a one-off C program, then reassembling with mplayer.  Video effects were then stitched back into the original video takes in Cinelerra.&lt;br /&gt;
&lt;br /&gt;
====Editing====&lt;br /&gt;
&lt;br /&gt;
All editing was done in Cinelerra.  This primarily consisted of stitching the individual takes back together with crossfades.  All input and rendering output were done with raw YUV4MPEG and WAV files.  Note that making this work well and correctly required several patches to the YUV4MPEG handler and colorspace conversion code.&lt;br /&gt;
&lt;br /&gt;
====Encoding====&lt;br /&gt;
&lt;br /&gt;
Encoding was done by hand external to Cinelerra using mplayer for final postprocessing, the encoder_example included with the [Ptalarbvorm] Theora source distribution, and ivfenc for WebM.&lt;br /&gt;
&lt;br /&gt;
Sample Theora encode command line (note this is using an mplayer patched for y4o support; it could be done just as easily with a yuv4mpeg pipe):&lt;br /&gt;
&lt;br /&gt;
# 360p, 128-ish (a4) audio + 500-ish (v50) video&lt;br /&gt;
mplayer -vf hqdn3d,scale=640:360,gradfun=1.5,unsharp=l3x3:.1,yuv4ogg complete2.m2v -fast -noconsolecontrols -vo null &amp;gt; /dev/null &amp;amp; ~/MotherfishSVN/theora-ptalarbvorm/examples/encoder_example -a 4 -v 50 -k 240 complete2.wav output.y4o -o A_Digital_Media_Primer_For_Geeks-360p-a4+v50.ogv&lt;/div&gt;</summary>
		<author><name>Mindspillage</name></author>
	</entry>
	<entry>
		<id>https://wiki.xiph.org/index.php?title=Videos/A_Digital_Media_Primer_For_Geeks&amp;diff=12355</id>
		<title>Videos/A Digital Media Primer For Geeks</title>
		<link rel="alternate" type="text/html" href="https://wiki.xiph.org/index.php?title=Videos/A_Digital_Media_Primer_For_Geeks&amp;diff=12355"/>
		<updated>2010-09-21T21:45:50Z</updated>

		<summary type="html">&lt;p&gt;Mindspillage: /* Analog vs Digital */ sp, punct&lt;/p&gt;
&lt;hr /&gt;
&lt;div&gt;&amp;lt;small&amp;gt;&#039;&#039;Wiki edition&#039;&#039;&amp;lt;/small&amp;gt;&lt;br /&gt;
[[Image:Dmpfg_001.jpg|thumb|360px|right]]&lt;br /&gt;
This first video from Xiph.Org presents the technical foundations of modern digital media via a half-hour firehose of information. One community member called it &amp;quot;a Uni lecture I never got but really wanted.&amp;quot;&lt;br /&gt;
&lt;br /&gt;
The program offers a brief history of digital media, a quick summary of the sampling theorem, and myriad details of low level audio and video characterization and formatting. It&#039;s intended for budding geeks looking to get into video coding, as well as the technically curious who want to know more about the media they wrangle for work or play.&lt;br /&gt;
&amp;lt;br/&amp;gt;&amp;lt;br/&amp;gt;&amp;lt;br/&amp;gt;&lt;br /&gt;
&amp;lt;center&amp;gt;&amp;lt;font size=&amp;quot;+2&amp;quot;&amp;gt;[http://www.xiph.org/video/vid1.shtml Download or Watch online]&amp;lt;/font&amp;gt;&amp;lt;/center&amp;gt;&lt;br /&gt;
&amp;lt;br style=&amp;quot;clear:both;&amp;quot;/&amp;gt;&lt;br /&gt;
&amp;lt;hr/&amp;gt;&lt;br /&gt;
&lt;br /&gt;
==Introduction==&lt;br /&gt;
&amp;lt;small&amp;gt;[[Talk:A_Digital_Media_Primer_For_Geeks_(episode_1)#Introduction|Discuss this section]]&amp;lt;/small&amp;gt;&lt;br /&gt;
&lt;br /&gt;
Workstations and high-end personal computers have been able to&lt;br /&gt;
manipulate digital audio pretty easily for about fifteen years now.&lt;br /&gt;
It&#039;s only been about five years that a decent workstation&#039;s been able&lt;br /&gt;
to handle raw video without a lot of expensive special purpose&lt;br /&gt;
hardware.&lt;br /&gt;
&lt;br /&gt;
But today even most cheap home PCs have the processor power and&lt;br /&gt;
storage necessary to really toss raw video around, at least without&lt;br /&gt;
too much of a struggle. So now that everyone has all of this cheap media-capable hardware, &lt;br /&gt;
more people, not surprisingly, want to do interesting&lt;br /&gt;
things with digital media, especially streaming. YouTube was the first huge&lt;br /&gt;
success, and now everybody wants in.&lt;br /&gt;
&lt;br /&gt;
Well good!  Because this stuff is a lot of fun!&lt;br /&gt;
&lt;br /&gt;
[[Image:Dmpfg_002.jpg|thumb|360px|right]]&lt;br /&gt;
It&#039;s no problem finding consumers for digital media.  But here I&#039;d&lt;br /&gt;
like to address the engineers, the mathematicians, the hackers, the&lt;br /&gt;
people who are interested in discovering and making things and&lt;br /&gt;
building the technology itself. The people after my own heart.&lt;br /&gt;
&lt;br /&gt;
Digital media, compression especially, is perceived to be super-elite,&lt;br /&gt;
somehow incredibly more difficult than anything else in computer&lt;br /&gt;
science. The big industry players in the field don&#039;t mind this&lt;br /&gt;
perception at all; it helps justify the staggering number of very&lt;br /&gt;
basic patents they hold.  They like the image that their media&lt;br /&gt;
researchers &amp;quot;are the best of the best, so much smarter than anyone&lt;br /&gt;
else that their brilliant ideas can&#039;t even be understood by mere&lt;br /&gt;
mortals.&amp;quot; This is bunk.&lt;br /&gt;
&lt;br /&gt;
Digital audio and video and streaming and compression offer endless&lt;br /&gt;
deep and stimulating mental challenges, just like any other&lt;br /&gt;
discipline. It seems elite because so few people have been&lt;br /&gt;
involved.  So few people have been involved perhaps because so few&lt;br /&gt;
people could afford the expensive, special-purpose equipment it&lt;br /&gt;
required. But today, just about anyone watching this video has a&lt;br /&gt;
cheap, general-purpose computer powerful enough to play with the big&lt;br /&gt;
boys. There are battles going on today around HTML5 and browsers and&lt;br /&gt;
video and open vs. closed.  So now is a pretty good time to get&lt;br /&gt;
involved.  The easiest place to start is probably understanding the&lt;br /&gt;
technology we have right now.&lt;br /&gt;
&lt;br /&gt;
This is an introduction. Since it&#039;s an introduction, it glosses over a&lt;br /&gt;
ton of details so that the big picture&#039;s a little easier to see.&lt;br /&gt;
Quite a few people watching are going to be way past anything that I&#039;m&lt;br /&gt;
talking about, at least for now.  On the other hand, I&#039;m probably&lt;br /&gt;
going to go too fast for folks who really are brand new to all of&lt;br /&gt;
this, so if this is all new, relax. The important thing is to pick out&lt;br /&gt;
any ideas that really grab your imagination. Especially pay attention&lt;br /&gt;
to the terminology surrounding those ideas, because with those, and&lt;br /&gt;
Google, and Wikipedia, you can dig as deep as interests you.&lt;br /&gt;
&lt;br /&gt;
So, without any further ado, welcome to one hell of a new hobby.&lt;br /&gt;
&lt;br /&gt;
&amp;lt;center&amp;gt;&amp;lt;div style=&amp;quot;background-color:#DDDDFF;border-color:#CCCCDD;border-style:solid;width:80%;padding:0 1em 1em 1em;text-align:left;&amp;quot;&amp;gt;&lt;br /&gt;
&#039;&#039;&#039;Going deeper…&#039;&#039;&#039;&lt;br /&gt;
*Fixme: Some good HTML5 baseline codec debate article&lt;br /&gt;
*[http://diveintohtml5.org/video.html Dive into HTML5] web video tutorial&lt;br /&gt;
&amp;lt;/div&amp;gt;&amp;lt;/center&amp;gt;&lt;br /&gt;
&lt;br /&gt;
&amp;lt;br style=&amp;quot;clear:both;&amp;quot;/&amp;gt;&lt;br /&gt;
&lt;br /&gt;
==Analog vs Digital==&lt;br /&gt;
[[Image:Dmpfg_004.jpg|thumb|360px|right]]&lt;br /&gt;
&lt;br /&gt;
Sound is the propagation of pressure waves through air, spreading out&lt;br /&gt;
from a source like ripples spread from a stone tossed into a pond.  A&lt;br /&gt;
microphone, or the human ear for that matter, transforms these passing&lt;br /&gt;
ripples of pressure into an electric signal.  Right, this is&lt;br /&gt;
middle school science class, everyone remembers this.  Moving on.&lt;br /&gt;
&lt;br /&gt;
That audio signal is a one-dimensional function, a single value&lt;br /&gt;
varying over time.  If we slow the &#039;scope down a bit... that should be&lt;br /&gt;
a little easier to see. A few other aspects of the signal are&lt;br /&gt;
important. It&#039;s continuous in both value and time; that is, at any&lt;br /&gt;
given time it can have any real value, and there&#039;s a smoothly varying&lt;br /&gt;
value at every point in time.  No matter how much we zoom in, there&lt;br /&gt;
are no discontinuities, no singularities, no instantaneous steps or&lt;br /&gt;
points where the signal ceases to exist. It&#039;s defined&lt;br /&gt;
everywhere. Classic continuous math works very well on these signals.&lt;br /&gt;
&lt;br /&gt;
A digital signal on the other hand is discrete in both value and time.&lt;br /&gt;
In the simplest and most common system, called Pulse Code Modulation,&lt;br /&gt;
one of a fixed number of possible values directly represents the&lt;br /&gt;
instantaneous signal amplitude at points in time spaced a fixed&lt;br /&gt;
distance apart.  The end result is a stream of digits.&lt;br /&gt;
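&lt;br /&gt;
As a tiny illustrative sketch (the rate, frequency and helper name are&lt;br /&gt;
arbitrary), this is all it takes to produce one second of 16 bit PCM&lt;br /&gt;
from an ideal mathematical sine wave in C:&lt;br /&gt;
&lt;br /&gt;
 #include &amp;lt;math.h&amp;gt;    /* sin, lrint; M_PI is POSIX */&lt;br /&gt;
 #include &amp;lt;stdint.h&amp;gt;&lt;br /&gt;
 &lt;br /&gt;
 #define RATE 48000  /* samples per second: the fixed time spacing */&lt;br /&gt;
 &lt;br /&gt;
 /* Digitize one second of a 1 kHz sine: take the amplitude at */&lt;br /&gt;
 /* fixed intervals and round each value to one of 65536 steps. */&lt;br /&gt;
 void make_pcm(int16_t out[RATE]) {&lt;br /&gt;
   for (int n = 0; n &amp;lt; RATE; n++) {&lt;br /&gt;
     double v = sin(2.0 * M_PI * 1000.0 * n / RATE);  /* analog value */&lt;br /&gt;
     out[n] = (int16_t)lrint(v * 32767.0);            /* quantize */&lt;br /&gt;
   }&lt;br /&gt;
 }&lt;br /&gt;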
&lt;br /&gt;
[[Image:Dmpfg_006.jpg|thumb|360px|right]]&lt;br /&gt;
Now this looks an awful lot like this.  It seems intuitive that we&lt;br /&gt;
should somehow be able to rigorously transform one into the other, and&lt;br /&gt;
good news, the Sampling Theorem says we can and tells us&lt;br /&gt;
how. Published in its most recognizable form by Claude Shannon in 1949&lt;br /&gt;
and built on the work of Nyquist, and Hartley, and tons of others, the&lt;br /&gt;
sampling theorem not only states that we can go back and&lt;br /&gt;
forth between analog and digital, but also lays&lt;br /&gt;
down a set of conditions under which conversion is lossless and the two&lt;br /&gt;
representations become equivalent and interchangeable.  When the&lt;br /&gt;
lossless conditions aren&#039;t met, the sampling theorem tells us how and&lt;br /&gt;
how much information is lost or corrupted.&lt;br /&gt;
&lt;br /&gt;
Up until very recently, analog technology was the basis for&lt;br /&gt;
practically everything done with audio, and that&#039;s not because most&lt;br /&gt;
audio comes from an originally analog source.  You may also think that&lt;br /&gt;
since computers are fairly recent, analog signal technology must have&lt;br /&gt;
come first.  Nope. Digital is actually older.  The telegraph predates&lt;br /&gt;
the telephone by half a century and was already fully mechanically&lt;br /&gt;
automated by the 1860s, sending coded, multiplexed digital signals&lt;br /&gt;
long distances. You know... tickertape. Harry Nyquist of Bell Labs was&lt;br /&gt;
researching telegraph pulse transmission when he published his&lt;br /&gt;
description of what later became known as the Nyquist frequency, the&lt;br /&gt;
core concept of the sampling theorem.  Now, it&#039;s true the telegraph&lt;br /&gt;
was transmitting symbolic information, text, not a digitized analog&lt;br /&gt;
signal, but with the advent of the telephone and radio, analog and&lt;br /&gt;
digital signal technology progressed rapidly and side-by-side.&lt;br /&gt;
&lt;br /&gt;
[[Image:Dmpfg_007.jpg|thumb|360px|right]]&lt;br /&gt;
Audio had always been manipulated as an analog signal because... well,&lt;br /&gt;
gee, it&#039;s so much easier.  A second-order low-pass filter, for example,&lt;br /&gt;
requires two passive components.  An all-analog short-time Fourier&lt;br /&gt;
transform, a few hundred.  Well, maybe a thousand if you want to build&lt;br /&gt;
something really fancy [bang on the 3585].  Processing signals&lt;br /&gt;
digitally requires millions to billions of transistors running at&lt;br /&gt;
microwave frequencies, support hardware at very least to digitize and&lt;br /&gt;
reconstruct the analog signals, a complete software ecosystem for&lt;br /&gt;
programming and controlling that billion-transistor juggernaut,&lt;br /&gt;
digital storage just in case you want to keep any of those bits for&lt;br /&gt;
later...&lt;br /&gt;
&lt;br /&gt;
So we come to the conclusion that analog is the only practical way to&lt;br /&gt;
do much with audio... well, unless you happen to have a billion&lt;br /&gt;
transistors and all the other things just lying around. And since we&lt;br /&gt;
do, digital signal processing becomes very attractive.&lt;br /&gt;
&lt;br /&gt;
For one thing, analog componentry just doesn&#039;t have the flexibility of&lt;br /&gt;
a general purpose computer.  Adding a new function to this&lt;br /&gt;
beast... yeah, it&#039;s probably not going to happen.  On a digital&lt;br /&gt;
processor though, just write a new program.  Software isn&#039;t trivial,&lt;br /&gt;
but it is a lot easier.&lt;br /&gt;
&lt;br /&gt;
Perhaps more importantly, though, every analog component is an&lt;br /&gt;
approximation. There&#039;s no such thing as a perfect transistor, or a&lt;br /&gt;
perfect inductor, or a perfect capacitor.  In analog, every component&lt;br /&gt;
adds noise and distortion, usually not very much, but it adds up. Just&lt;br /&gt;
transmitting an analog signal, especially over long distances,&lt;br /&gt;
progressively, measurably, irretrievably corrupts it.  Besides, all of&lt;br /&gt;
those single-purpose analog components take up a lot of space.  Two&lt;br /&gt;
lines of code on the billion transistors back here can implement a&lt;br /&gt;
filter that would require an inductor the size of a refrigerator.&lt;br /&gt;
&lt;br /&gt;
Digital systems don&#039;t have these drawbacks.  Digital signals can be&lt;br /&gt;
stored, copied, manipulated, and transmitted without adding any noise&lt;br /&gt;
or distortion. We do use lossy algorithms from time to time, but the&lt;br /&gt;
only unavoidably non-ideal steps are digitization and reconstruction,&lt;br /&gt;
where digital has to interface with all of that messy analog.  Messy&lt;br /&gt;
or not, modern conversion stages are very, very good.  By the&lt;br /&gt;
standards of our ears, we can consider them practically lossless as&lt;br /&gt;
well.&lt;br /&gt;
&lt;br /&gt;
With a little extra hardware, then, most of which is now small and&lt;br /&gt;
inexpensive due to our modern industrial infrastructure, digital audio&lt;br /&gt;
is the clear winner over analog.  So let us then go about storing it,&lt;br /&gt;
copying it, manipulating it, and transmitting it.&lt;br /&gt;
&lt;br /&gt;
&amp;lt;center&amp;gt;&amp;lt;div style=&amp;quot;background-color:#DDDDFF;border-color:#CCCCDD;border-style:solid;width:80%;padding:0 1em 1em 1em;text-align:left;&amp;quot;&amp;gt;&lt;br /&gt;
&#039;&#039;&#039;Going deeper…&#039;&#039;&#039;&lt;br /&gt;
*Wikipedia: [[wiki:Nyquist–Shannon_sampling_theorem|Nyquist–Shannon sampling theorem]]&lt;br /&gt;
*MIT OpenCourseWare [http://ocw.mit.edu/courses/electrical-engineering-and-computer-science/6-003-signals-and-systems-spring-2010/lecture-notes/ Lecture notes from 6.003 signals and systems.]&lt;br /&gt;
&amp;lt;/div&amp;gt;&amp;lt;/center&amp;gt;&lt;br /&gt;
&lt;br /&gt;
&amp;lt;br style=&amp;quot;clear:both;&amp;quot;/&amp;gt;&lt;br /&gt;
&lt;br /&gt;
==Raw (digital audio) meat==&lt;br /&gt;
&lt;br /&gt;
Pulse Code Modulation is the most common representation for &lt;br /&gt;
raw audio.  Other practical representations do exist, for example the&lt;br /&gt;
Sigma-Delta coding used by the SACD, which is a form of Pulse Density&lt;br /&gt;
Modulation.  That said, Pulse Code Modulation is far&lt;br /&gt;
and away dominant, mainly because it&#039;s so mathematically&lt;br /&gt;
convenient.  An audio engineer can spend an entire career without&lt;br /&gt;
running into anything else.&lt;br /&gt;
&lt;br /&gt;
PCM encoding can be characterized by three parameters, making it easy&lt;br /&gt;
to account for every possible PCM variant with mercifully little&lt;br /&gt;
hassle.&lt;br /&gt;
&lt;br /&gt;
===sample rate===&lt;br /&gt;
&lt;br /&gt;
The first parameter is the sampling rate.  The highest frequency an&lt;br /&gt;
encoding can represent is called the Nyquist Frequency.  The Nyquist&lt;br /&gt;
frequency of PCM happens to be exactly half the sampling rate.&lt;br /&gt;
Therefore the sampling rate directly determines the highest possible&lt;br /&gt;
frequency in the digitized signal.&lt;br /&gt;
&lt;br /&gt;
Analog telephone systems traditionally band-limited voice channels to&lt;br /&gt;
just under 4kHz, so digital telephony and most classic voice&lt;br /&gt;
applications use an 8kHz sampling rate, the minimum sampling rate&lt;br /&gt;
necessary to capture the entire bandwidth of a 4kHz channel.  This is&lt;br /&gt;
what an 8kHz sampling rate sounds like--- a bit muffled but perfectly&lt;br /&gt;
intelligible for voice.  This is the lowest sampling rate that&#039;s ever&lt;br /&gt;
been used widely in practice.&lt;br /&gt;
&lt;br /&gt;
From there, as power, and memory, and storage increased, consumer&lt;br /&gt;
computer hardware went to offering 11, and then 16, and then 22, and&lt;br /&gt;
then 32kHz sampling.  With each increase in the sampling rate and the&lt;br /&gt;
Nyquist frequency, it&#039;s obvious that the high end becomes a little&lt;br /&gt;
clearer and the sound more natural.&lt;br /&gt;
&lt;br /&gt;
The Compact Disc uses a 44.1kHz sampling rate, which is again slightly&lt;br /&gt;
better than 32kHz, but the gains are becoming less distinct.  44.1kHz&lt;br /&gt;
is a bit of an oddball choice, especially given that it hadn&#039;t been&lt;br /&gt;
used for anything prior to the compact disc, but the huge success of&lt;br /&gt;
the CD has made it a common rate.&lt;br /&gt;
&lt;br /&gt;
The most common hi-fidelity sampling rate aside from the CD is 48kHz.&lt;br /&gt;
There&#039;s virtually no audible difference between the two.  This video,&lt;br /&gt;
or at least the original version of it, was shot and produced with&lt;br /&gt;
48kHz audio, which happens to be the original standard for&lt;br /&gt;
high-fidelity audio with video.&lt;br /&gt;
&lt;br /&gt;
Super-hi-fidelity sampling rates of 88, and 96, and 192kHz have also&lt;br /&gt;
appeared. The reason for the sampling rates beyond 48kHz isn&#039;t to&lt;br /&gt;
extend the audible high frequencies further. It&#039;s for a different&lt;br /&gt;
reason.&lt;br /&gt;
&lt;br /&gt;
[[Image:Dmpfg_008.jpg|thumb|360px|right]]&lt;br /&gt;
Stepping back for just a second, the French mathematician Jean&lt;br /&gt;
Baptiste Joseph Fourier showed that we can also think of signals like&lt;br /&gt;
audio as a set of component frequencies.  This frequency domain&lt;br /&gt;
representation is equivalent to the time representation; the signal is&lt;br /&gt;
exactly the same, we&#039;re just looking at it a different way.  Here we see the&lt;br /&gt;
frequency domain representation of a hypothetical analog signal we&lt;br /&gt;
intend to digitally sample.&lt;br /&gt;
&lt;br /&gt;
The sampling theorem tells us two essential things about the sampling&lt;br /&gt;
process. First, that a digital signal can&#039;t represent any&lt;br /&gt;
frequencies above the Nyquist frequency. Second, and this is the new&lt;br /&gt;
part, if we don&#039;t remove those frequencies with a lowpass filter&lt;br /&gt;
before sampling, the sampling process will fold them down into the&lt;br /&gt;
representable frequency range as aliasing distortion.&lt;br /&gt;
&lt;br /&gt;
Aliasing, in a nutshell, sounds freakin&#039; awful, so it&#039;s essential to&lt;br /&gt;
remove any beyond-Nyquist frequencies before sampling and after&lt;br /&gt;
reconstruction.&lt;br /&gt;
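&lt;br /&gt;
The fold-down itself is easy to compute; a sketch, with a hypothetical&lt;br /&gt;
helper name:&lt;br /&gt;
&lt;br /&gt;
 #include &amp;lt;math.h&amp;gt;&lt;br /&gt;
 &lt;br /&gt;
 /* Where a tone at frequency f lands after sampling at rate fs with */&lt;br /&gt;
 /* no lowpass in front.  Anything above fs/2 folds back into the */&lt;br /&gt;
 /* 0..fs/2 range; e.g. 30 kHz at fs = 48 kHz lands at 18 kHz. */&lt;br /&gt;
 double aliased_freq(double f, double fs) {&lt;br /&gt;
   return fabs(f - fs * round(f / fs));&lt;br /&gt;
 }&lt;br /&gt;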
&lt;br /&gt;
Human frequency perception is considered to extend to about 20kHz. In&lt;br /&gt;
44.1 or 48kHz sampling, the lowpass before the sampling stage has to&lt;br /&gt;
be extremely sharp to avoid cutting any audible frequencies below&lt;br /&gt;
20kHz but still not allow frequencies above the Nyquist to leak&lt;br /&gt;
forward into the sampling process.  This is a difficult filter to&lt;br /&gt;
build and no practical filter succeeds completely. If the sampling&lt;br /&gt;
rate is 96kHz or 192kHz on the other hand, the lowpass has an extra&lt;br /&gt;
octave or two for its transition band. This is a much easier filter to&lt;br /&gt;
build.  Sampling rates beyond 48kHz are actually one of those messy&lt;br /&gt;
analog stage compromises.&lt;br /&gt;
&lt;br /&gt;
===sample format===&lt;br /&gt;
&lt;br /&gt;
The second fundamental PCM parameter is the sample format, that is,&lt;br /&gt;
the format of each digital number.  A number is a number, but a number&lt;br /&gt;
can be represented in bits a number of different ways.&lt;br /&gt;
&lt;br /&gt;
Early PCM was eight bit linear, encoded as an unsigned byte.  The&lt;br /&gt;
dynamic range is limited to about 50dB and the quantization noise, as&lt;br /&gt;
you can hear, is pretty severe.  Eight bit audio is vanishingly rare&lt;br /&gt;
today.&lt;br /&gt;
&lt;br /&gt;
Digital telephony typically uses one of two related non-linear eight&lt;br /&gt;
bit encodings called A-law and mu-law. These formats encode a roughly&lt;br /&gt;
14 bit dynamic range into eight bits by spacing the higher amplitude&lt;br /&gt;
values farther apart. A-law and mu-law obviously improve quantization&lt;br /&gt;
noise compared to linear 8-bit, and voice harmonics especially hide&lt;br /&gt;
the remaining quantization noise well. All three eight bit encodings,&lt;br /&gt;
linear, A-law, and mu-law, are typically paired with an 8kHz sampling&lt;br /&gt;
rate, though I&#039;m demonstrating them here at 48kHz.&lt;br /&gt;
&lt;br /&gt;
Most modern PCM uses 16 or 24 bit two&#039;s-complement signed integers to&lt;br /&gt;
encode the range from negative infinity to zero decibels in 16 or 24&lt;br /&gt;
bits of precision. The maximum absolute value corresponds to zero decibels.&lt;br /&gt;
As with all the sample formats so far, signals beyond zero decibels and thus&lt;br /&gt;
beyond the maximum representable range are clipped.&lt;br /&gt;
&lt;br /&gt;
In mixing and mastering, it&#039;s not unusual to use floating point&lt;br /&gt;
numbers for PCM instead of integers.  A 32 bit IEEE754 float, that&#039;s&lt;br /&gt;
the normal kind of floating point you see on current computers, has 24&lt;br /&gt;
bits of resolution, but a seven bit floating point exponent increases&lt;br /&gt;
the representable range.  Floating point usually represents zero&lt;br /&gt;
decibels as +/-1.0, and because floats can obviously represent&lt;br /&gt;
considerably beyond that, temporarily exceeding zero decibels during&lt;br /&gt;
the mixing process doesn&#039;t cause clipping.  Floating point PCM takes&lt;br /&gt;
up more space, so it tends to be used only as an intermediate&lt;br /&gt;
production format.&lt;br /&gt;
&lt;br /&gt;
Lastly, most general purpose computers still read and&lt;br /&gt;
write data in octet bytes, so it&#039;s important to remember that samples&lt;br /&gt;
bigger than eight bits can be in big or little endian order, and both&lt;br /&gt;
endiannesses are common.  For example, Microsoft WAV files are little endian,&lt;br /&gt;
and Apple AIFC files tend to be big-endian.  Be aware of it.&lt;br /&gt;
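&lt;br /&gt;
Putting the last two points together, here&#039;s a sketch (hypothetical&lt;br /&gt;
helper name) of converting one floating point sample to a 16 bit&lt;br /&gt;
integer sample stored explicitly as little endian bytes, the WAV&lt;br /&gt;
convention:&lt;br /&gt;
&lt;br /&gt;
 #include &amp;lt;math.h&amp;gt;&lt;br /&gt;
 #include &amp;lt;stdint.h&amp;gt;&lt;br /&gt;
 &lt;br /&gt;
 /* Clip anything beyond 0 dB (+/-1.0), quantize to 16 bits, and */&lt;br /&gt;
 /* write the two bytes low byte first. */&lt;br /&gt;
 void float_to_s16le(float v, uint8_t out[2]) {&lt;br /&gt;
   if (v &amp;gt; 1.0f)  v = 1.0f;&lt;br /&gt;
   if (v &amp;lt; -1.0f) v = -1.0f;&lt;br /&gt;
   int16_t s = (int16_t)lrintf(v * 32767.0f);&lt;br /&gt;
   out[0] = (uint8_t)(s &amp;amp; 0xff);           /* low byte */&lt;br /&gt;
   out[1] = (uint8_t)((uint16_t)s &amp;gt;&amp;gt; 8);  /* high byte */&lt;br /&gt;
 }&lt;br /&gt;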
&lt;br /&gt;
===channels===&lt;br /&gt;
&lt;br /&gt;
The third PCM parameter is the number of channels.  The convention in&lt;br /&gt;
raw PCM is to encode multiple channels by interleaving the samples of&lt;br /&gt;
each channel together into a single stream.  Straightforward and extensible.&lt;br /&gt;
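&lt;br /&gt;
A sketch of the convention for stereo (hypothetical helper name):&lt;br /&gt;
&lt;br /&gt;
 #include &amp;lt;stdint.h&amp;gt;&lt;br /&gt;
 &lt;br /&gt;
 /* Interleave two mono channels, n samples each, into one stereo */&lt;br /&gt;
 /* PCM stream: L0 R0 L1 R1 ... */&lt;br /&gt;
 void interleave_stereo(const int16_t *l, const int16_t *r,&lt;br /&gt;
                        int16_t *out, int n) {&lt;br /&gt;
   for (int i = 0; i &amp;lt; n; i++) {&lt;br /&gt;
     out[2 * i]     = l[i];  /* left sample first */&lt;br /&gt;
     out[2 * i + 1] = r[i];  /* then right */&lt;br /&gt;
   }&lt;br /&gt;
 }&lt;br /&gt;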
&lt;br /&gt;
===done!===&lt;br /&gt;
&lt;br /&gt;
And that&#039;s it!  That describes every PCM representation ever.  Done.&lt;br /&gt;
Digital audio is _so easy_!  There&#039;s more to do of course, but at this&lt;br /&gt;
point we&#039;ve got a nice useful chunk of audio data, so let&#039;s get some&lt;br /&gt;
video too.&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
&amp;lt;center&amp;gt;&amp;lt;div style=&amp;quot;background-color:#DDDDFF;border-color:#CCCCDD;border-style:solid;width:80%;padding:0 1em 1em 1em;text-align:left;&amp;quot;&amp;gt;&lt;br /&gt;
&#039;&#039;&#039;Going deeper…&#039;&#039;&#039;&lt;br /&gt;
* Fixme&lt;br /&gt;
&amp;lt;/div&amp;gt;&amp;lt;/center&amp;gt;&lt;br /&gt;
&lt;br /&gt;
&amp;lt;br style=&amp;quot;clear:both;&amp;quot;/&amp;gt;&lt;br /&gt;
&lt;br /&gt;
==Video vegetables (they&#039;re good for you!)==&lt;br /&gt;
&lt;br /&gt;
One could think of video as being like audio but with two additional&lt;br /&gt;
spatial dimensions, X and Y, in addition to the dimension of time.&lt;br /&gt;
This is mathematically sound. The Sampling Theorem applies to all&lt;br /&gt;
three video dimensions just as it does to the single time dimension of&lt;br /&gt;
audio.&lt;br /&gt;
&lt;br /&gt;
Audio and video are obviously quite different in practice. For one,&lt;br /&gt;
compared to audio, video is huge. Raw CD audio is about 1.4 megabits&lt;br /&gt;
per second. Raw 1080i HD video is over 700 megabits per second. That&#039;s&lt;br /&gt;
more than 500 times more data to capture, process and store per&lt;br /&gt;
second.  By Moore&#039;s law... that&#039;s... let&#039;s see... roughly eight&lt;br /&gt;
doublings times two years, so yeah, computers requiring about an extra&lt;br /&gt;
fifteen years to handle raw video after getting raw audio down pat was&lt;br /&gt;
about right.&lt;br /&gt;
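&lt;br /&gt;
(Checking the arithmetic: CD audio is 44100 samples per second times 16&lt;br /&gt;
bits times 2 channels, about 1.4 megabits per second.  Assuming 8 bit&lt;br /&gt;
4:2:0 video at 30 frames per second, 1080 video is 1920 x 1080 pixels&lt;br /&gt;
times 12 bits times 30, about 746 megabits per second, roughly 530&lt;br /&gt;
times the CD rate.  The exact figures depend on the subsampling and&lt;br /&gt;
frame rate assumed.)&lt;br /&gt;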
&lt;br /&gt;
Basic raw video is also just more complex than basic raw audio. The&lt;br /&gt;
sheer volume of data currently necessitates a representation more&lt;br /&gt;
efficient than the linear PCM used for audio.  In addition, electronic&lt;br /&gt;
video comes almost entirely from broadcast television alone, and the&lt;br /&gt;
standards committees that govern broadcast video have always been very&lt;br /&gt;
concerned with backward compatibility.  Up until just last year in the&lt;br /&gt;
US, a sixty year old black and white television could still show a&lt;br /&gt;
normal analog television broadcast.  That&#039;s actually a really neat&lt;br /&gt;
trick.&lt;br /&gt;
&lt;br /&gt;
The downside to backward compatibility is that once a detail makes it&lt;br /&gt;
into a standard, you can&#039;t ever really throw it out again. Electronic&lt;br /&gt;
video has never started over from scratch the way audio has multiple&lt;br /&gt;
times.  Sixty years worth of clever but obsolete hacks necessitated by&lt;br /&gt;
the passing technology of a given era have built up into quite a pile,&lt;br /&gt;
and because digital standards also come from broadcast television, all&lt;br /&gt;
these eldritch hacks have been brought forward into the digital&lt;br /&gt;
standards as well.&lt;br /&gt;
&lt;br /&gt;
In short, there are a whole lot more details involved in digital video&lt;br /&gt;
than there were with audio. There&#039;s no hope of covering them&lt;br /&gt;
all completely here, so we&#039;ll cover the broad fundamentals.&lt;br /&gt;
&lt;br /&gt;
===resolution and aspect===&lt;br /&gt;
&lt;br /&gt;
The most obvious raw video parameters are the width and height of the&lt;br /&gt;
picture in pixels. As simple as that may sound, the pixel dimensions&lt;br /&gt;
alone don&#039;t actually specify the absolute width and height of the&lt;br /&gt;
picture, as most broadcast-derived video doesn&#039;t use square pixels.&lt;br /&gt;
The number of scanlines in a broadcast image was fixed, but the&lt;br /&gt;
effective number of horizontal pixels was a function of channel&lt;br /&gt;
bandwidth. Effective horizontal resolution could result in pixels that&lt;br /&gt;
were either narrower or wider than the spacing between scanlines.&lt;br /&gt;
&lt;br /&gt;
Standards have generally specified that digitally sampled video should&lt;br /&gt;
reflect the real resolution of the original analog source, so a large&lt;br /&gt;
amount of digital video also uses non-square pixels. For example, a&lt;br /&gt;
normal 4:3 aspect NTSC DVD is typically encoded with a display&lt;br /&gt;
resolution of 704 by 480, a ratio wider than 4:3.  In this case, the&lt;br /&gt;
pixels themselves are assigned an aspect ratio of 10:11, making them&lt;br /&gt;
taller than they are wide and narrowing the image horizontally to the&lt;br /&gt;
correct aspect.  Such an image has to be resampled to show properly on&lt;br /&gt;
a digital display with square pixels.&lt;br /&gt;
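&lt;br /&gt;
(Checking the numbers: 704 pixels times 10/11 gives an effective width&lt;br /&gt;
of 640, and 640:480 is exactly 4:3, so resampling to 640 x 480, or any&lt;br /&gt;
size with the same ratio, shows the frame correctly on square pixels.)&lt;br /&gt;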
&lt;br /&gt;
===frame rate and interlacing===&lt;br /&gt;
&lt;br /&gt;
The second obvious video parameter is the frame rate, the number of&lt;br /&gt;
full frames per second.  Several standard frame rates are in active&lt;br /&gt;
use. Digital video, in one form or another, can use all of them.  Or,&lt;br /&gt;
any other frame rate.  Or even variable rates where the frame rate&lt;br /&gt;
changes adaptively over the course of the video. The higher the frame&lt;br /&gt;
rate, the smoother the motion, and that brings us, unfortunately, to&lt;br /&gt;
interlacing.&lt;br /&gt;
&lt;br /&gt;
In the very earliest days of broadcast video, engineers sought the&lt;br /&gt;
fastest practical framerate to smooth motion and to minimize flicker&lt;br /&gt;
on phosphor-based CRTs.  They were also under pressure to use the&lt;br /&gt;
least possible bandwidth for the highest resolution and fastest frame&lt;br /&gt;
rate.  Their solution was to interlace the video where the even lines&lt;br /&gt;
are sent in one pass and the odd lines in the next.  Each pass is&lt;br /&gt;
called a field and two fields sort of produce one complete frame.&lt;br /&gt;
&amp;quot;Sort of&amp;quot;, because the even and odd fields aren&#039;t actually from the&lt;br /&gt;
same source frame.  In a 60 field per second picture, the source frame&lt;br /&gt;
rate is actually 60 full frames per second, and half of each frame,&lt;br /&gt;
every other line, is simply discarded.  This is why we can&#039;t&lt;br /&gt;
deinterlace a video simply by combining two fields into one frame;&lt;br /&gt;
they&#039;re not actually from one frame to begin with.&lt;br /&gt;
&lt;br /&gt;
===gamma===&lt;br /&gt;
&lt;br /&gt;
The cathode ray tube was the only available display technology for&lt;br /&gt;
most of the history of electronic video. A CRT&#039;s output brightness is&lt;br /&gt;
nonlinear, approximately equal to the input controlling voltage raised&lt;br /&gt;
to the 2.5th power. This exponent, 2.5, is designated gamma, and so&lt;br /&gt;
it&#039;s often referred to as the gamma of a display.  Cameras, though,&lt;br /&gt;
are linear, and if you feed a CRT a linear input signal, it looks a&lt;br /&gt;
bit like this.&lt;br /&gt;
&lt;br /&gt;
As there were originally to be very few cameras, which were&lt;br /&gt;
fantastically expensive anyway, and hopefully many, many television&lt;br /&gt;
sets which had best be as inexpensive as possible, engineers decided to&lt;br /&gt;
add the necessary gamma correction circuitry to the cameras rather&lt;br /&gt;
than the sets. Video transmitted over the airwaves would thus have a&lt;br /&gt;
nonlinear intensity using the inverse of the set&#039;s gamma exponent, so that&lt;br /&gt;
once a camera&#039;s signal was finally displayed on the CRT, the overall&lt;br /&gt;
response of the system from camera to set was back to linear again.&lt;br /&gt;
&lt;br /&gt;
Almost.&lt;br /&gt;
&lt;br /&gt;
There were also two other tweaks. A television camera actually uses a&lt;br /&gt;
gamma exponent that&#039;s the inverse of 2.2, not 2.5.  That&#039;s just a&lt;br /&gt;
correction for viewing in a dim environment. Also, the exponential&lt;br /&gt;
curve transitions to a linear ramp near black.  That&#039;s just an old&lt;br /&gt;
hack for suppressing sensor noise in the camera.&lt;br /&gt;
&lt;br /&gt;
Gamma correction also had a lucky benefit. It just so happens that the&lt;br /&gt;
human eye has a perceptual gamma of about 3.  This is relatively close&lt;br /&gt;
to the CRT&#039;s gamma of 2.5. An image using gamma correction devotes&lt;br /&gt;
more resolution to lower intensities, where the eye happens to have&lt;br /&gt;
its finest intensity discrimination, and therefore uses the available&lt;br /&gt;
scale resolution more efficiently.  Although CRTs are currently&lt;br /&gt;
vanishing, a standard sRGB computer display still uses a nonlinear&lt;br /&gt;
intensity curve similar to television, with a linear ramp near black,&lt;br /&gt;
followed by an exponential curve with a gamma exponent of 2.4. This&lt;br /&gt;
encodes a sixteen bit linear range down into eight bits.&lt;br /&gt;
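&lt;br /&gt;
That curve is simple enough to state directly; here&#039;s a sketch of the&lt;br /&gt;
sRGB encoding side in C (hypothetical helper name, standard sRGB&lt;br /&gt;
constants):&lt;br /&gt;
&lt;br /&gt;
 #include &amp;lt;math.h&amp;gt;&lt;br /&gt;
 &lt;br /&gt;
 /* The sRGB transfer curve: a linear ramp near black, then a power */&lt;br /&gt;
 /* curve with exponent 1/2.4.  Input is linear light in 0..1; the */&lt;br /&gt;
 /* output is the nonlinear encoded value in 0..1. */&lt;br /&gt;
 double srgb_encode(double linear) {&lt;br /&gt;
   if (linear &amp;lt;= 0.0031308)&lt;br /&gt;
     return 12.92 * linear;&lt;br /&gt;
   return 1.055 * pow(linear, 1.0 / 2.4) - 0.055;&lt;br /&gt;
 }&lt;br /&gt;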
&lt;br /&gt;
===color and colorspace===&lt;br /&gt;
&lt;br /&gt;
The human eye has three apparent color channels, red, green, and blue,&lt;br /&gt;
and most displays use these three colors as additive primaries to&lt;br /&gt;
produce a full range of color output.  The primary pigments in&lt;br /&gt;
printing are Cyan, Magenta, and Yellow for the same reason; pigments&lt;br /&gt;
are subtractive, and each of these pigments subtracts one pure color&lt;br /&gt;
from reflected light.  Cyan subtracts red, magenta subtracts green, and&lt;br /&gt;
yellow subtracts blue.&lt;br /&gt;
&lt;br /&gt;
Video can be and sometimes is represented with red, green, and blue&lt;br /&gt;
color channels, but RGB video is atypical. The human eye is far more&lt;br /&gt;
sensitive to luminosity than it is to color, and RGB tends to spread&lt;br /&gt;
the energy of an image across all three color channels.  That is, the&lt;br /&gt;
red plane looks like a red version of the original picture, the green&lt;br /&gt;
plane looks like a green version of the original picture, and the blue&lt;br /&gt;
plane looks like a blue version of the original picture.  Black and&lt;br /&gt;
white times three.  Not efficient.&lt;br /&gt;
&lt;br /&gt;
For those reasons and because, oh hey, television just happened to&lt;br /&gt;
start out as black and white anyway, video usually is represented as a&lt;br /&gt;
high resolution luma channel, the black &amp;amp; white, along with&lt;br /&gt;
additional, often lower resolution chroma channels, the color. The&lt;br /&gt;
luma channel, Y, is produced by weighting and then adding the separate&lt;br /&gt;
red, green and blue signals.  The chroma channels U and V are then&lt;br /&gt;
produced by subtracting the luma signal from blue and the luma signal&lt;br /&gt;
from red.&lt;br /&gt;
&lt;br /&gt;
When YUV is scaled, offset and quantized for digital video, it&#039;s&lt;br /&gt;
usually more correctly called Y&#039;CbCr, but the more generic term YUV is&lt;br /&gt;
widely used to describe all the analog and digital variants of this&lt;br /&gt;
color model.&lt;br /&gt;
&lt;br /&gt;
===chroma subsampling===&lt;br /&gt;
&lt;br /&gt;
The U and V chroma channels can have the same resolution as the Y&lt;br /&gt;
channel, but because the human eye has far less spatial color&lt;br /&gt;
resolution than spatial luminosity resolution, chroma resolution is&lt;br /&gt;
usually halved or even quartered in the horizontal direction, the&lt;br /&gt;
vertical direction, or both, typically without any significant impact on the&lt;br /&gt;
apparent raw image quality.  Practically every possible subsampling&lt;br /&gt;
variant has been used at one time or another, but the common choices&lt;br /&gt;
today are 4:4:4 video, which isn&#039;t actually subsampled at all, 4:2:2 video in&lt;br /&gt;
which the horizontal resolution of the U and V channels is halved, and&lt;br /&gt;
most common of all, 4:2:0 video in which both the horizontal and vertical&lt;br /&gt;
resolutions of the chroma channels are halved, resulting in U and V&lt;br /&gt;
planes that are each one quarter the size of Y.&lt;br /&gt;
&lt;br /&gt;
The terms 4:2:2, 4:2:0, 4:1:1 and so on and so forth aren&#039;t complete&lt;br /&gt;
descriptions of a chroma subsampling. There are multiple possible ways&lt;br /&gt;
to position the chroma pixels relative to luma, and again, several&lt;br /&gt;
variants are in active use for each subsampling.  For example, motion&lt;br /&gt;
JPEG, MPEG-1 video, MPEG-2 video, DV, Theora and WebM all use or can&lt;br /&gt;
use 4:2:0 subsampling, but they site the chroma pixels three different&lt;br /&gt;
ways.&lt;br /&gt;
&lt;br /&gt;
Motion JPEG, MPEG-1 video, Theora and WebM all site chroma pixels&lt;br /&gt;
between luma pixels both horizontally and vertically.&lt;br /&gt;
&lt;br /&gt;
MPEG-2 sites chroma pixels between lines, but horizontally aligned with&lt;br /&gt;
every other luma pixel. Interlaced modes complicate things somewhat,&lt;br /&gt;
resulting in a siting arrangement that&#039;s a tad bizarre.&lt;br /&gt;
&lt;br /&gt;
And finally PAL-DV, which is always interlaced, places the chroma&lt;br /&gt;
pixels in the same position as every other luma pixel in the&lt;br /&gt;
horizontal direction, and vertically alternates the chroma channel on&lt;br /&gt;
each line.&lt;br /&gt;
&lt;br /&gt;
That&#039;s just 4:2:0 video. I&#039;ll leave the other subsamplings as homework for the&lt;br /&gt;
viewer.  Got the basic idea, moving on.&lt;br /&gt;
&lt;br /&gt;
===pixel formats===&lt;br /&gt;
&lt;br /&gt;
In audio, we always represent multiple channels in a PCM stream by&lt;br /&gt;
interleaving the samples from each channel in order. Video uses both&lt;br /&gt;
packed formats that interleave the color channels, as well as planar&lt;br /&gt;
formats that keep the pixels from each channel together in separate&lt;br /&gt;
planes stacked in order in the frame. There are at least 50 different formats in&lt;br /&gt;
these two broad categories with possibly ten or fifteen in common use.&lt;br /&gt;
&lt;br /&gt;
Each chroma subsampling and different bit-depth requires a different&lt;br /&gt;
packing arrangement, and so a different pixel format.  For a given&lt;br /&gt;
unique subsampling, there are usually also several equivalent formats&lt;br /&gt;
that consist of trivial channel order rearrangements or repackings due either to&lt;br /&gt;
convenience once-upon-a-time on some particular piece of hardware or&lt;br /&gt;
sometimes just good old-fashioned spite.&lt;br /&gt;
&lt;br /&gt;
Pixel formats are described by a unique name or fourcc code.  There&lt;br /&gt;
are quite a few of these and there&#039;s no sense going over each one now.&lt;br /&gt;
Google is your friend.  Be aware that fourcc codes for raw video&lt;br /&gt;
specify the pixel arrangement and chroma subsampling, but generally&lt;br /&gt;
don&#039;t imply anything certain about chroma siting or color space.  YV12&lt;br /&gt;
video, to pick one, can use JPEG, MPEG-2 or DV chroma siting, and any&lt;br /&gt;
one of several YUV colorspace definitions.&lt;br /&gt;
&lt;br /&gt;
===done!===&lt;br /&gt;
&lt;br /&gt;
That wraps up our not so quick and yet very incomplete tour of raw&lt;br /&gt;
video. The good news is we can already get quite a lot of real work&lt;br /&gt;
done using that overview. In plenty of situations, a frame of video&lt;br /&gt;
data is a frame of video data.  The details matter, greatly, when it&lt;br /&gt;
comes time to write software, but for now I am satisfied that the&lt;br /&gt;
esteemed viewer is broadly aware of the relevant issues.&lt;br /&gt;
&lt;br /&gt;
&amp;lt;center&amp;gt;&amp;lt;div style=&amp;quot;background-color:#DDDDFF;border-color:#CCCCDD;border-style:solid;width:80%;padding:0 1em 1em 1em;text-align:left;&amp;quot;&amp;gt;&lt;br /&gt;
&#039;&#039;&#039;Going deeper…&#039;&#039;&#039;&lt;br /&gt;
* Fixme&lt;br /&gt;
&amp;lt;/div&amp;gt;&amp;lt;/center&amp;gt;&lt;br /&gt;
&lt;br /&gt;
&amp;lt;br style=&amp;quot;clear:both;&amp;quot;/&amp;gt;&lt;br /&gt;
&lt;br /&gt;
==Containers==&lt;br /&gt;
So. We have audio data. We have video data. What remains is the more&lt;br /&gt;
familiar non-signal data and straight up engineering that software&lt;br /&gt;
developers are used to, and plenty of it.&lt;br /&gt;
&lt;br /&gt;
Chunks of raw audio and video data have no externally visible&lt;br /&gt;
structure, but they&#039;re often uniformly sized.  We could just string&lt;br /&gt;
them together in a rigid pre-determined ordering for streaming and&lt;br /&gt;
storage, and some simple systems do approximately that. Compressed&lt;br /&gt;
frames, though, aren&#039;t necessarily a predictable size, and we usually want&lt;br /&gt;
some flexibility in using a range of different data types in streams.&lt;br /&gt;
If we string random formless data together, we lose the boundaries&lt;br /&gt;
that separate frames and don&#039;t necessarily know what data belongs to&lt;br /&gt;
which streams.  A stream needs some generalized structure to be&lt;br /&gt;
generally useful.&lt;br /&gt;
&lt;br /&gt;
In addition to our signal data, we also have our PCM and video&lt;br /&gt;
parameters.  There&#039;s probably plenty of other metadata we also want to&lt;br /&gt;
deal with, like audio tags and video chapters and subtitles, all&lt;br /&gt;
essential components of rich media.  It makes sense to place this&lt;br /&gt;
metadata, that is, data about the data, within the media itself.&lt;br /&gt;
&lt;br /&gt;
Storing and structuring formless data and disparate metadata is the&lt;br /&gt;
job of a container.  Containers provide framing for the data blobs,&lt;br /&gt;
interleave and identify multiple data streams, provide timing&lt;br /&gt;
information, and store the metadata necessary to parse, navigate,&lt;br /&gt;
manipulate and present the media.  In general, any container can hold&lt;br /&gt;
any kind of data.  And data can be put into any container.&lt;br /&gt;
&lt;br /&gt;
==Credits==&lt;br /&gt;
&lt;br /&gt;
In the past thirty minutes, we&#039;ve covered digital audio, video, some&lt;br /&gt;
history, some math and a little engineering. We&#039;ve barely scratched the&lt;br /&gt;
surface, but it&#039;s time for a well earned break.&lt;br /&gt;
&lt;br /&gt;
There&#039;s so much more to talk about, so I hope you&#039;ll join me again in&lt;br /&gt;
our next episode.  Until then--- Cheers!&lt;br /&gt;
&lt;br /&gt;
Written by:&lt;br /&gt;
Christopher (Monty) Montgomery&lt;br /&gt;
and the Xiph.Org Community&lt;br /&gt;
&lt;br /&gt;
Intro, title and credits music:&lt;br /&gt;
&amp;quot;Boo Boo Coming&amp;quot;, by Joel Forrester&lt;br /&gt;
Performed by the Microscopic Septet&lt;br /&gt;
Used by permission of Cuneiform Records.&lt;br /&gt;
Original source track All Rights Reserved.&lt;br /&gt;
www.cuneiformrecords.com&lt;br /&gt;
&lt;br /&gt;
This Video Was Produced Entirely With Free and Open Source Software&lt;br /&gt;
&lt;br /&gt;
GNU&lt;br /&gt;
Linux&lt;br /&gt;
Fedora&lt;br /&gt;
Cinelerra&lt;br /&gt;
The Gimp&lt;br /&gt;
Audacity&lt;br /&gt;
Postfish&lt;br /&gt;
Gstreamer&lt;br /&gt;
&lt;br /&gt;
CC BY-NC-SA&lt;br /&gt;
A Co-Production of Xiph.Org and Red Hat Inc.&lt;br /&gt;
(C) 2010, Some Rights Reserved&lt;br /&gt;
&lt;br /&gt;
&amp;lt;br style=&amp;quot;clear:both;&amp;quot;/&amp;gt;&lt;br /&gt;
==The making of…==&lt;br /&gt;
&lt;br /&gt;
===Equipment===&lt;br /&gt;
&lt;br /&gt;
====Camera====&lt;br /&gt;
Canon HV40 HDV camera w/ wide-angle lens operating on a tripod.&lt;br /&gt;
&lt;br /&gt;
The wide angle lens gives the camera a nice close macro mode, and approximately triples the amount of light coming into the sensor for a given zoom/aperture.  Useful for shooting indoors at night.&lt;br /&gt;
&lt;br /&gt;
No additional lighting kit was used.&lt;br /&gt;
&lt;br /&gt;
====Audio====&lt;br /&gt;
&lt;br /&gt;
Two Crown PCC160 boundary microphones placed on a table approximately 4-8 feet in front of the speaker, run through a cheap Behringer portable mixer and into the camera&#039;s microphone input.  &lt;br /&gt;
&lt;br /&gt;
No additional audio kit was used.&lt;br /&gt;
&lt;br /&gt;
====Sundries====&lt;br /&gt;
&lt;br /&gt;
Whiteboard markers by &#039;Bic&#039;.&lt;br /&gt;
&lt;br /&gt;
Drawing aids by Staedtler, McMaster Carr, and &#039;Generic&#039;.&lt;br /&gt;
&lt;br /&gt;
===Video shooting sequence===&lt;br /&gt;
&lt;br /&gt;
Scenes were pre-scripted and memorized, usually with lots of on-the-fly revision.  (In the future... I&#039;m totally getting a teleprompter.  Wh000.  OTOH, I can totally rattle off the entire video script from beginning to end as a party trick, thus ensuring I&#039;ll not be invited to many parties --Monty).&lt;br /&gt;
&lt;br /&gt;
Diagrams were drawn by hand on a physical whiteboard with whiteboard markers and magnetic T-squares, triangles, and yardsticks.  Despite looking a lot like greenscreen work, there is no image compositing in use (actually-- there are two small composites where an error in a whiteboard diagram was corrected by subtracting part of the original image and then adding a corrected version of the diagram).&lt;br /&gt;
&lt;br /&gt;
Camera operated in 24F shutter priority mode (Tv set to &amp;quot;24&amp;quot;) with exposure and white balance both calibrated to the white board (or a white piece of paper) and locked.  Microphone attenuation setting active, with gain locked such that room noise peaked at -40dB (all the rooms in the shooting sequences were noisy due to the building&#039;s ventilation system, or active equipment). Lighting in the whiteboard rooms tended to be odd, with little relative light cast on a presenter standing just in front of the whiteboard; a presenter is practically standing in the room&#039;s only shadow.  Most of the room light is focused on the table and walls.  Additional fill lighting kit would have been useful, but for the first vid, I didn&#039;t want &#039;perfect&#039; to be the enemy of &#039;good&#039;.&lt;br /&gt;
&lt;br /&gt;
Autofocus used for whiteboard scenes, manual focus used for several workshop scenes as the autofocus tended to hunt continuously in very low light.&lt;br /&gt;
&lt;br /&gt;
Continuous capture to a Thinkpad with firewire input via a simple gstreamer script.&lt;br /&gt;
&lt;br /&gt;
===Production sequence===&lt;br /&gt;
&lt;br /&gt;
====All hail Cinelerra.  You better hail, or Cinelerra will get pissy about it.====&lt;br /&gt;
&lt;br /&gt;
Most of the production sequence hinged on making Cinelerra happy; it is a hulking rusty cast iron WWI tank of a program that can seem like it&#039;s composed entirely of compressed bugs.  That said, it was neither particularly crashy nor did it ever accidentally corrupt or lose work.  It was also the only FOSS editor with a working 2D compositor.  It got the job done once I found a workflow it would cope with (and fixed a number of bugs; these fixes are available from my Cinelerra Git repo at http://git.xiph.org/?p=users/xiphmont/cinelerraCV.git;a=summary).&lt;br /&gt;
&lt;br /&gt;
====Choosing takes====&lt;br /&gt;
&lt;br /&gt;
Each shooting session yielded four to six hours of raw video.  The first step was to load the raw video into the Cinelerra timeline, label each complete take, compare and choose the take to use, then render the chosen take out to a raw clip as a YUV4MPEG raw video file and a WAV raw audio file.  Be careful that Settings-&amp;gt;Align Cursor On Frames is set, else the audio and video renders won&#039;t start on the same boundary.&lt;br /&gt;
&lt;br /&gt;
====Postprocessing====&lt;br /&gt;
&lt;br /&gt;
At this point, the raw video clips were adjusted for gamma, contrast and saturation in gstreamer and mplayer.  In the earlier shoots the camera was underexposing due to pilot error, which required quite a bit of gamma and saturation inflation to &#039;correct&#039; (there is no real correction as the low-end data is gone, but it&#039;s possible to make it look better).  Later shoots used saner settings and the adjustments were mostly to keep different shooting sessions more uniform.  The whiteboard tends not to look white because it&#039;s mildly reflective, and picked up the color of the cyan and orange audio baffles in the room like a big diffuse mirror.&lt;br /&gt;
&lt;br /&gt;
The audio was both noisy (due to the building&#039;s ventilation system which either sounded like a low loud rumble or a jet-engine taking off) and reverberant (the rooms were glass on two sides and plaster on the other two).  Early takes used no additional sound absorbing material in the rooms, and the Postfish filtering and deverb was used heavily.  It gives the early audio in the vid a slightly odd, processed feel (I had almost decided the original audio was simply unusable).  Later takes used some big fleece &#039;soft flats&#039; in the room to absorb some additional reverb, and the later takes are less heavily filtered.&lt;br /&gt;
&lt;br /&gt;
The Postfish filtering chain used declip (for the occasional overrange oops), deverb (remove room reverberation), multicompand (noise gating), single compand (for volume levelling) and EQ (the Crown mics are nice, but are very midrange heavy).&lt;br /&gt;
&lt;br /&gt;
====Special Effects====&lt;br /&gt;
&lt;br /&gt;
Audio special effects were one-offs, mostly done using SoX.  The processed demo sections of audio were then spliced back into the original audio takes using Audacity.&lt;br /&gt;
&lt;br /&gt;
Video special effects (e.g., removing a color channel) were done by writing quick, one-off filters in C for y4oi.  A few effects were done by dumping a take as a directory full of PNGs, batch-processing the PNGs with a one-off C program, then reassembling with mplayer.  Video effects were then stitched back into the original video takes in Cinelerra.&lt;br /&gt;
&lt;br /&gt;
====Editing====&lt;br /&gt;
&lt;br /&gt;
All editing was done in Cinelerra.  This primarily consisted of stitching the individual takes back together with crossfades.  All input and rendering output were done with raw YUV4MPEG and WAV files.  Note that making this work well and correctly required several patches to the YUV4MPEG handler and colorspace conversion code.&lt;br /&gt;
&lt;br /&gt;
====Encoding====&lt;br /&gt;
&lt;br /&gt;
Encoding was done by hand external to Cinelerra using mplayer for final postprocessing, the encoder_example included with the [Ptalarbvorm] Theora source distribution, and ivfenc for WebM.&lt;br /&gt;
&lt;br /&gt;
Sample Theora encode command line (note this is using an mplayer patched for y4o support; it could be done just as easily with a yuv4mpeg pipe):&lt;br /&gt;
&lt;br /&gt;
# 360p, 128-ish (a4) audio + 500-ish (v50) video&lt;br /&gt;
mplayer -vf hqdn3d,scale=640:360,gradfun=1.5,unsharp=l3x3:.1,yuv4ogg complete2.m2v -fast -noconsolecontrols -vo null &amp;gt; /dev/null &amp;amp; ~/MotherfishSVN/theora-ptalarbvorm/examples/encoder_example -a 4 -v 50 -k 240 complete2.wav output.y4o -o A_Digital_Media_Primer_For_Geeks-360p-a4+v50.ogv&lt;/div&gt;</summary>
		<author><name>Mindspillage</name></author>
	</entry>
	<entry>
		<id>https://wiki.xiph.org/index.php?title=Videos/A_Digital_Media_Primer_For_Geeks&amp;diff=12349</id>
		<title>Videos/A Digital Media Primer For Geeks</title>
		<link rel="alternate" type="text/html" href="https://wiki.xiph.org/index.php?title=Videos/A_Digital_Media_Primer_For_Geeks&amp;diff=12349"/>
		<updated>2010-09-21T21:39:01Z</updated>

		<summary type="html">&lt;p&gt;Mindspillage: /* Introduction */ punctuation&lt;/p&gt;
&lt;hr /&gt;
&lt;div&gt;&amp;lt;small&amp;gt;&#039;&#039;Wiki edition&#039;&#039;&amp;lt;/small&amp;gt;&lt;br /&gt;
[[Image:Dmpfg_001.jpg|thumb|360px|right]]&lt;br /&gt;
This first video from Xiph.Org presents the technical foundations of modern digital media via a half-hour firehose of information. One community member called it &amp;quot;a Uni lecture I never got but really wanted.&amp;quot;&lt;br /&gt;
&lt;br /&gt;
The program offers a brief history of digital media, a quick summary of the sampling theorem, and myriad details of low level audio and video characterization and formatting. It&#039;s intended for budding geeks looking to get into video coding, as well as the technically curious who want to know more about the media they wrangle for work or play.&lt;br /&gt;
&amp;lt;br/&amp;gt;&amp;lt;br/&amp;gt;&amp;lt;br/&amp;gt;&lt;br /&gt;
&amp;lt;center&amp;gt;&amp;lt;font size=&amp;quot;+2&amp;quot;&amp;gt;[http://www.xiph.org/video/vid1.shtml Download or Watch online]&amp;lt;/font&amp;gt;&amp;lt;/center&amp;gt;&lt;br /&gt;
&amp;lt;br style=&amp;quot;clear:both;&amp;quot;/&amp;gt;&lt;br /&gt;
&amp;lt;hr/&amp;gt;&lt;br /&gt;
&lt;br /&gt;
==Introduction==&lt;br /&gt;
&amp;lt;small&amp;gt;[[Talk:A_Digital_Media_Primer_For_Geeks_(episode_1)#Introduction|Discuss this section]]&amp;lt;/small&amp;gt;&lt;br /&gt;
&lt;br /&gt;
Workstations and high-end personal computers have been able to&lt;br /&gt;
manipulate digital audio pretty easily for about fifteen years now.&lt;br /&gt;
It&#039;s only been about five years that a decent workstation&#039;s been able&lt;br /&gt;
to handle raw video without a lot of expensive special purpose&lt;br /&gt;
hardware.&lt;br /&gt;
&lt;br /&gt;
But today even most cheap home PCs have the processor power and&lt;br /&gt;
storage necessary to really toss raw video around, at least without&lt;br /&gt;
too much of a struggle. So now that everyone has all of this cheap media-capable hardware, &lt;br /&gt;
more people, not surprisingly, want to do interesting&lt;br /&gt;
things with digital media, especially streaming. YouTube was the first huge&lt;br /&gt;
success, and now everybody wants in.&lt;br /&gt;
&lt;br /&gt;
Well good!  Because this stuff is a lot of fun!&lt;br /&gt;
&lt;br /&gt;
[[Image:Dmpfg_002.jpg|thumb|360px|right]]&lt;br /&gt;
It&#039;s no problem finding consumers for digital media.  But here I&#039;d&lt;br /&gt;
like to address the engineers, the mathematicians, the hackers, the&lt;br /&gt;
people who are interested in discovering and making things and&lt;br /&gt;
building the technology itself. The people after my own heart.&lt;br /&gt;
&lt;br /&gt;
Digital media, compression especially, is perceived to be super-elite,&lt;br /&gt;
somehow incredibly more difficult than anything else in computer&lt;br /&gt;
science. The big industry players in the field don&#039;t mind this&lt;br /&gt;
perception at all; it helps justify the staggering number of very&lt;br /&gt;
basic patents they hold.  They like the image that their media&lt;br /&gt;
researchers &amp;quot;are the best of the best, so much smarter than anyone&lt;br /&gt;
else that their brilliant ideas can&#039;t even be understood by mere&lt;br /&gt;
mortals.&amp;quot; This is bunk.&lt;br /&gt;
&lt;br /&gt;
Digital audio and video and streaming and compression offer endless&lt;br /&gt;
deep and stimulating mental challenges, just like any other&lt;br /&gt;
discipline. It seems elite because so few people have been&lt;br /&gt;
involved.  So few people have been involved perhaps because so few&lt;br /&gt;
people could afford the expensive, special-purpose equipment it&lt;br /&gt;
required. But today, just about anyone watching this video has a&lt;br /&gt;
cheap, general-purpose computer powerful enough to play with the big&lt;br /&gt;
boys. There are battles going on today around HTML5 and browsers and&lt;br /&gt;
video and open vs. closed.  So now is a pretty good time to get&lt;br /&gt;
involved.  The easiest place to start is probably understanding the&lt;br /&gt;
technology we have right now.&lt;br /&gt;
&lt;br /&gt;
This is an introduction. Since it&#039;s an introduction, it glosses over a&lt;br /&gt;
ton of details so that the big picture&#039;s a little easier to see.&lt;br /&gt;
Quite a few people watching are going to be way past anything that I&#039;m&lt;br /&gt;
talking about, at least for now.  On the other hand, I&#039;m probably&lt;br /&gt;
going to go too fast for folks who really are brand new to all of&lt;br /&gt;
this, so if this is all new, relax. The important thing is to pick out&lt;br /&gt;
any ideas that really grab your imagination. Especially pay attention&lt;br /&gt;
to the terminology surrounding those ideas, because with those, and&lt;br /&gt;
Google, and Wikipedia, you can dig as deep as interests you.&lt;br /&gt;
&lt;br /&gt;
So, without any further ado, welcome to one hell of a new hobby.&lt;br /&gt;
&lt;br /&gt;
&amp;lt;center&amp;gt;&amp;lt;div style=&amp;quot;background-color:#DDDDFF;border-color:#CCCCDD;border-style:solid;width:80%;padding:0 1em 1em 1em;text-align:left;&amp;quot;&amp;gt;&lt;br /&gt;
&#039;&#039;&#039;Going deeper…&#039;&#039;&#039;&lt;br /&gt;
*Fixme: Some good HTML5 baseline codec debate article&lt;br /&gt;
*[http://diveintohtml5.org/video.html Dive into HTML5] web video tutorial&lt;br /&gt;
&amp;lt;/div&amp;gt;&amp;lt;/center&amp;gt;&lt;br /&gt;
&lt;br /&gt;
&amp;lt;br style=&amp;quot;clear:both;&amp;quot;/&amp;gt;&lt;br /&gt;
&lt;br /&gt;
==Analog vs Digital==&lt;br /&gt;
[[Image:Dmpfg_004.jpg|thumb|360px|right]]&lt;br /&gt;
&lt;br /&gt;
Sound is the propagation of pressure waves through air, spreading out&lt;br /&gt;
from a source like ripples spread from a stone tossed into a pond.  A&lt;br /&gt;
microphone, or the human ear for that matter, transforms these passing&lt;br /&gt;
ripples of pressure into an electric signal.  Right, this is&lt;br /&gt;
middle school science class, everyone remembers this.  Moving on.&lt;br /&gt;
&lt;br /&gt;
That audio signal is a one-dimensional function, a single value&lt;br /&gt;
varying over time.  If we slow the &#039;scope down a bit... that should be&lt;br /&gt;
a little easier to see. A few other aspects of the signal are&lt;br /&gt;
important. It&#039;s continuous in both value and time; that is, at any&lt;br /&gt;
given time it can have any real value, and there&#039;s a smoothly varying&lt;br /&gt;
value at every point in time.  No matter how much we zoom in, there&lt;br /&gt;
are no discontinuities, no singularities, no instantaneous steps or&lt;br /&gt;
points where the signal ceases to exist. It&#039;s defined&lt;br /&gt;
everywhere. Classic continuous math works very well on these signals.&lt;br /&gt;
&lt;br /&gt;
A digital signal on the other hand is discrete in both value and time.&lt;br /&gt;
In the simplest and most common system, called Pulse Code Modulation,&lt;br /&gt;
one of a fixed number of possible values directly represents the&lt;br /&gt;
instantaneous signal amplitude at points in time spaced a fixed&lt;br /&gt;
distance apart.  The end result is a stream of digits.&lt;br /&gt;
&lt;br /&gt;
[[Image:Dmpfg_006.jpg|thumb|360px|right]]&lt;br /&gt;
Now this looks an awful lot like this.  It seems intuitive that we&lt;br /&gt;
should somehow be able to rigorously transform one into the other, and&lt;br /&gt;
good news, the Sampling Theorem says we can and tells us&lt;br /&gt;
how. Published in its most recognizable form by Claude Shannon in 1949&lt;br /&gt;
and built on the work of Nyquist, and Hartley, and tons of others, the&lt;br /&gt;
sampling theorem states that not only can we go back and&lt;br /&gt;
forth between analog and digital, but also lays&lt;br /&gt;
down a set of conditions for which conversion is lossless and the two&lt;br /&gt;
representations become equivalent and interchangeable.  When the&lt;br /&gt;
lossless conditions aren&#039;t met, the sampling theorem tells us how and&lt;br /&gt;
how much information is lost or corrupted.&lt;br /&gt;
&lt;br /&gt;
Up until very recently, analog technology was the basis for&lt;br /&gt;
practically everything done with audio, and that&#039;s not because most&lt;br /&gt;
audio comes from an originally analog source.  You may also think that&lt;br /&gt;
since computers are fairly recent, analog signal technology must have&lt;br /&gt;
come first.  Nope. Digital is actually older.  The telegraph predates&lt;br /&gt;
the telephone by half a century and was already fully mechanically&lt;br /&gt;
automated by the 1860s, sending coded, multiplexed digital signals&lt;br /&gt;
long distances. You know... Tickertape. Harry Nyquist of Bell Labs was&lt;br /&gt;
researching telegraph pulse transmission when he published his&lt;br /&gt;
description of what later became known as the Nyquist frequency, the&lt;br /&gt;
core concept of the sampling theorem.  Now, it&#039;s true the telegraph&lt;br /&gt;
was transmitting symbolic information, text, not a digitized analog&lt;br /&gt;
signal, but with the advent of the telephone and radio, analog and&lt;br /&gt;
digital signal technology progressed rapidly and side-by-side.&lt;br /&gt;
&lt;br /&gt;
[[Image:Dmpfg_007.jpg|thumb|360px|right]]&lt;br /&gt;
Audio had always been manipulated as an analog signal because... well,&lt;br /&gt;
gee it&#039;s so much easier.  A second-order lowpass filter, for example,&lt;br /&gt;
requires two passive components.  An all-analog short-time Fourier&lt;br /&gt;
transform, a few hundred.  Well, maybe a thousand if you want to build&lt;br /&gt;
something really fancy [bang on the 3585].  Processing signals&lt;br /&gt;
digitally requires millions to billions of transistors running at&lt;br /&gt;
microwave frequencies, support hardware at the very least to digitize and&lt;br /&gt;
reconstruct the analog signals, a complete software ecosystem for&lt;br /&gt;
programming and controlling that billion-transistor juggernaut,&lt;br /&gt;
digital storage just in case you want to keep any of those bits for&lt;br /&gt;
later...&lt;br /&gt;
&lt;br /&gt;
So we come to the conclusion that analog is the only practical way to&lt;br /&gt;
do much with audio... well, unless you happen to have a billion&lt;br /&gt;
transistors and all the other things just lying around. And since we&lt;br /&gt;
do, digital signal processing becomes very attractive.&lt;br /&gt;
&lt;br /&gt;
For one thing, analog componentry just doesn&#039;t have the flexibility of&lt;br /&gt;
a general purpose computer.  Adding a new function to this&lt;br /&gt;
beast... yeah, it&#039;s probably not going to happen.  On a digital&lt;br /&gt;
processor though, just write a new program.  Software isn&#039;t trivial,&lt;br /&gt;
but it is a lot easier.&lt;br /&gt;
&lt;br /&gt;
Perhaps more importantly, though, every analog component is an&lt;br /&gt;
approximation. There&#039;s no such thing as a perfect transistor, or a&lt;br /&gt;
perfect inductor, or a perfect capacitor.  In analog, every component&lt;br /&gt;
adds noise and distortion, usually not very much, but it adds up. Just&lt;br /&gt;
transmitting an analog signal, especially over long distances,&lt;br /&gt;
progressively, measurably, irretrievably corrupts it.  Besides, all of&lt;br /&gt;
those single-purpose analog components take up a lot of space.  Two&lt;br /&gt;
lines of code on the billion transistors back here can implement a&lt;br /&gt;
filter that would require an inductor the size of a refrigerator.&lt;br /&gt;
&lt;br /&gt;
Digital systems don&#039;t have these drawbacks.  Digital signals can be&lt;br /&gt;
stored, copied, manipulated and transmitted without adding any noise&lt;br /&gt;
or distortion. We do use lossy algorithms from time to time, but the&lt;br /&gt;
only unavoidably non-ideal steps are digitization and reconstruction,&lt;br /&gt;
where digital has to interface with all of that messy analog.  Messy&lt;br /&gt;
or not, modern conversion stages are very, very good.  By the&lt;br /&gt;
standards of our ears, we can consider them practically lossless as&lt;br /&gt;
well.&lt;br /&gt;
&lt;br /&gt;
With a little extra hardware, then, most of which is now small and&lt;br /&gt;
inexpensive due to our modern industrial infrastructure, digital audio&lt;br /&gt;
is the clear winner over analog.  So let us then go about storing it,&lt;br /&gt;
copying it, manipulating it, and transmitting it.&lt;br /&gt;
&lt;br /&gt;
&amp;lt;center&amp;gt;&amp;lt;div style=&amp;quot;background-color:#DDDDFF;border-color:#CCCCDD;border-style:solid;width:80%;padding:0 1em 1em 1em;text-align:left;&amp;quot;&amp;gt;&lt;br /&gt;
&#039;&#039;&#039;Going deeper…&#039;&#039;&#039;&lt;br /&gt;
*Fixme: Some good HTML5 baseline codec debate article&lt;br /&gt;
*[http://diveintohtml5.org/video.html Dive into HTML5] web video tutorial&lt;br /&gt;
&amp;lt;/div&amp;gt;&amp;lt;/center&amp;gt;&lt;br /&gt;
&lt;br /&gt;
&amp;lt;br style=&amp;quot;clear:both;&amp;quot;/&amp;gt;&lt;br /&gt;
&lt;br /&gt;
==Raw (digital audio) meat==&lt;br /&gt;
&lt;br /&gt;
Pulse Code Modulation is the most common representation for &lt;br /&gt;
raw audio.  Other practical representations do exist, for example the&lt;br /&gt;
Sigma-Delta coding used by the SACD, which is a form of Pulse Density&lt;br /&gt;
Modulation.  That said, Pulse Code Modulation is far&lt;br /&gt;
and away dominant, mainly because it&#039;s so mathematically&lt;br /&gt;
convenient.  An audio engineer can spend an entire career without&lt;br /&gt;
running into anything else.&lt;br /&gt;
&lt;br /&gt;
PCM encoding can be characterized in three parameters, making it easy&lt;br /&gt;
to account for every possible PCM variant with mercifully little&lt;br /&gt;
hassle.&lt;br /&gt;
&lt;br /&gt;
===sample rate===&lt;br /&gt;
&lt;br /&gt;
The first parameter is the sampling rate.  The highest frequency an&lt;br /&gt;
encoding can represent is called the Nyquist Frequency.  The Nyquist&lt;br /&gt;
frequency of PCM happens to be exactly half the sampling rate.&lt;br /&gt;
Therefore the sampling rate directly determines the highest possible&lt;br /&gt;
frequency in the digitized signal.&lt;br /&gt;
&lt;br /&gt;
Analog telephone systems traditionally band-limited voice channels to&lt;br /&gt;
just under 4kHz, so digital telephony and most classic voice&lt;br /&gt;
applications use an 8kHz sampling rate, the minimum sampling rate&lt;br /&gt;
necessary to capture the entire bandwidth of a 4kHz channel.  This is&lt;br /&gt;
what an 8kHz sampling rate sounds like--- a bit muffled but perfectly&lt;br /&gt;
intelligible for voice.  This is the lowest sampling rate that&#039;s ever&lt;br /&gt;
been used widely in practice.&lt;br /&gt;
&lt;br /&gt;
From there, as power, and memory, and storage increased, consumer&lt;br /&gt;
computer hardware went to offering 11, and then 16, and then 22, and&lt;br /&gt;
then 32kHz sampling.  With each increase in the sampling rate and the&lt;br /&gt;
Nyquist frequency, it&#039;s obvious that the high end becomes a little&lt;br /&gt;
clearer and the sound more natural.&lt;br /&gt;
&lt;br /&gt;
The Compact Disc uses a 44.1kHz sampling rate, which is again slightly&lt;br /&gt;
better than 32kHz, but the gains are becoming less distinct.  44.1kHz&lt;br /&gt;
is a bit of an oddball choice, especially given that it hadn&#039;t been&lt;br /&gt;
used for anything prior to the compact disc, but the huge success of&lt;br /&gt;
the CD has made it a common rate.&lt;br /&gt;
&lt;br /&gt;
The most common hi-fidelity sampling rate aside from the CD is 48kHz.&lt;br /&gt;
There&#039;s virtually no audible difference between the two.  This video,&lt;br /&gt;
or at least the original version of it, was shot and produced with&lt;br /&gt;
48kHz audio, which happens to be the original standard for&lt;br /&gt;
high-fidelity audio with video.&lt;br /&gt;
&lt;br /&gt;
Super-hi-fidelity sampling rates of 88, and 96, and 192kHz have also&lt;br /&gt;
appeared. The reason for the sampling rates beyond 48kHz isn&#039;t to&lt;br /&gt;
extend the audible high frequencies further. It&#039;s for a different&lt;br /&gt;
reason.&lt;br /&gt;
&lt;br /&gt;
[[Image:Dmpfg_008.jpg|thumb|360px|right]]&lt;br /&gt;
Stepping back for just a second, the French mathematician Jean&lt;br /&gt;
Baptiste Joseph Fourier showed that we can also think of signals like&lt;br /&gt;
audio as a set of component frequencies.  This frequency domain&lt;br /&gt;
representation is equivalent to the time representation; the signal is&lt;br /&gt;
exactly the same, we&#039;re just looking at it a different way.  Here we see the&lt;br /&gt;
frequency domain representation of a hypothetical analog signal we&lt;br /&gt;
intend to digitally sample.&lt;br /&gt;
&lt;br /&gt;
The sampling theorem tells us two essential things about the sampling&lt;br /&gt;
process. First, that a digital signal can&#039;t represent any&lt;br /&gt;
frequencies above the Nyquist frequency. Second, and this is the new&lt;br /&gt;
part, if we don&#039;t remove those frequencies with a lowpass filter&lt;br /&gt;
before sampling, the sampling process will fold them down into the&lt;br /&gt;
representable frequency range as aliasing distortion.&lt;br /&gt;
&lt;br /&gt;
Aliasing, in a nutshell, sounds freakin&#039; awful, so it&#039;s essential to&lt;br /&gt;
remove any beyond-Nyquist frequencies before sampling and after&lt;br /&gt;
reconstruction.&lt;br /&gt;
&lt;br /&gt;
Human frequency perception is considered to extend to about 20kHz. In&lt;br /&gt;
44.1 or 48kHz sampling, the lowpass before the sampling stage has to&lt;br /&gt;
be extremely sharp to avoid cutting any audible frequencies below&lt;br /&gt;
20kHz but still not allow frequencies above the Nyquist to leak&lt;br /&gt;
forward into the sampling process.  This is a difficult filter to&lt;br /&gt;
build and no practical filter succeeds completely. If the sampling&lt;br /&gt;
rate is 96kHz or 192kHz on the other hand, the lowpass has an extra&lt;br /&gt;
octave or two for its transition band. This is a much easier filter to&lt;br /&gt;
build.  Sampling rates beyond 48kHz are actually one of those messy&lt;br /&gt;
analog stage compromises.&lt;br /&gt;
&lt;br /&gt;
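To put numbers on that tradeoff, here&#039;s a quick back-of-the-envelope sketch in C (an illustration only, not code from the video):&lt;br /&gt;
&lt;br /&gt;
double transition_band_hz(double rate_hz)&lt;br /&gt;
{&lt;br /&gt;
    double nyquist = rate_hz / 2.0;   /* Nyquist is half the sampling rate */&lt;br /&gt;
    return nyquist - 20000.0;         /* room left above the ~20kHz limit of hearing */&lt;br /&gt;
}&lt;br /&gt;
/* 48000.0 gives a 4000 Hz transition band: a very sharp filter.   */&lt;br /&gt;
/* 96000.0 gives 28000 Hz: over an extra octave, a far easier job. */&lt;br /&gt;
&lt;br /&gt;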
===sample format===&lt;br /&gt;
&lt;br /&gt;
The second fundamental PCM parameter is the sample format, that is,&lt;br /&gt;
the format of each digital number.  A number is a number, but a number&lt;br /&gt;
can be represented in bits in a number of different ways.&lt;br /&gt;
&lt;br /&gt;
Early PCM was eight bit linear, encoded as an unsigned byte.  The&lt;br /&gt;
dynamic range is limited to about 50dB and the quantization noise, as&lt;br /&gt;
you can hear, is pretty severe.  Eight bit audio is vanishingly rare&lt;br /&gt;
today.&lt;br /&gt;
&lt;br /&gt;
Digital telephony typically uses one of two related non-linear eight&lt;br /&gt;
bit encodings called A-law and mu-law. These formats encode a roughly&lt;br /&gt;
14 bit dynamic range into eight bits by spacing the higher amplitude&lt;br /&gt;
values farther apart. A-law and mu-law obviously improve quantization&lt;br /&gt;
noise compared to linear 8-bit, and voice harmonics especially hide&lt;br /&gt;
the remaining quantization noise well. All three eight bit encodings,&lt;br /&gt;
linear, A-law, and mu-law, are typically paired with an 8kHz sampling&lt;br /&gt;
rate, though I&#039;m demonstrating them here at 48kHz.&lt;br /&gt;
&lt;br /&gt;
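The continuous mu-law curve itself is simple to write down.  A sketch in C, assuming mu = 255; the real G.711 encoder approximates this curve with piecewise-linear segments packed into eight bits, so treat this as an illustration rather than a drop-in codec:&lt;br /&gt;
&lt;br /&gt;
#include &amp;lt;math.h&amp;gt;&lt;br /&gt;
&lt;br /&gt;
/* Map a linear sample in [-1,1] onto [-1,1] through the mu-law curve.   */&lt;br /&gt;
/* Low amplitudes get more of the output range than high amplitudes do. */&lt;br /&gt;
double mulaw_compress(double x)&lt;br /&gt;
{&lt;br /&gt;
    const double mu = 255.0;&lt;br /&gt;
    double sign = (x &amp;lt; 0.0) ? -1.0 : 1.0;&lt;br /&gt;
    return sign * log(1.0 + mu * fabs(x)) / log(1.0 + mu);&lt;br /&gt;
}&lt;br /&gt;
&lt;br /&gt;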
Most modern PCM uses 16 or 24 bit two&#039;s-complement signed integers to&lt;br /&gt;
encode the range from negative infinity to zero decibels in 16 or 24&lt;br /&gt;
bits of precision. The maximum absolute value corresponds to zero decibels.&lt;br /&gt;
As with all the sample formats so far, signals beyond zero decibels and thus&lt;br /&gt;
beyond the maximum representable range are clipped.&lt;br /&gt;
&lt;br /&gt;
In mixing and mastering, it&#039;s not unusual to use floating point&lt;br /&gt;
numbers for PCM instead of integers.  A 32 bit IEEE754 float, that&#039;s&lt;br /&gt;
the normal kind of floating point you see on current computers, has 24&lt;br /&gt;
bits of resolution, but an eight bit floating point exponent increases&lt;br /&gt;
the representable range.  Floating point usually represents zero&lt;br /&gt;
decibels as +/-1.0, and because floats can obviously represent&lt;br /&gt;
considerably beyond that, temporarily exceeding zero decibels during&lt;br /&gt;
the mixing process doesn&#039;t cause clipping.  Floating point PCM takes&lt;br /&gt;
up more space, so it tends to be used only as an intermediate&lt;br /&gt;
production format.&lt;br /&gt;
&lt;br /&gt;
Lastly, most general purpose computers still read and&lt;br /&gt;
write data in octet bytes, so it&#039;s important to remember that samples&lt;br /&gt;
bigger than eight bits can be in big or little endian order, and both&lt;br /&gt;
endiannesses are common.  For example, Microsoft WAV files are little endian,&lt;br /&gt;
and Apple AIFC files tend to be big-endian.  Be aware of it.&lt;br /&gt;
&lt;br /&gt;
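A small sketch in C of what being aware of it looks like in practice; assembling each sample from individual bytes sidesteps the host machine&#039;s own endianness entirely (an assumed illustration, not code from the video):&lt;br /&gt;
&lt;br /&gt;
#include &amp;lt;stdint.h&amp;gt;&lt;br /&gt;
#include &amp;lt;stddef.h&amp;gt;&lt;br /&gt;
&lt;br /&gt;
/* Decode little-endian signed 16 bit PCM, as in a WAV file, to floats in [-1,1). */&lt;br /&gt;
void pcm16le_to_float(const uint8_t *in, float *out, size_t n)&lt;br /&gt;
{&lt;br /&gt;
    for (size_t i = 0; i &amp;lt; n; i++) {&lt;br /&gt;
        int16_t s = (int16_t)(in[2*i] | (in[2*i + 1] &amp;lt;&amp;lt; 8));&lt;br /&gt;
        out[i] = s / 32768.0f;&lt;br /&gt;
    }&lt;br /&gt;
}&lt;br /&gt;
&lt;br /&gt;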
===channels===&lt;br /&gt;
&lt;br /&gt;
The third PCM parameter is the number of channels.  The convention in&lt;br /&gt;
raw PCM is to encode multiple channels by interleaving the samples of&lt;br /&gt;
each channel together into a single stream.  Straightforward and extensible.&lt;br /&gt;
&lt;br /&gt;
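In code, interleaving is nothing more than an indexing rule.  A one-line sketch in C:&lt;br /&gt;
&lt;br /&gt;
/* Interleaved PCM: the sample for channel c in frame n sits at n*channels + c. */&lt;br /&gt;
/* A stereo stream is thus laid out L R L R L R...                              */&lt;br /&gt;
float get_sample(const float *buf, int channels, long frame, int channel)&lt;br /&gt;
{&lt;br /&gt;
    return buf[frame * (long)channels + channel];&lt;br /&gt;
}&lt;br /&gt;
&lt;br /&gt;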
===done!===&lt;br /&gt;
&lt;br /&gt;
And that&#039;s it!  That describes every PCM representation ever.  Done.&lt;br /&gt;
Digital audio is _so easy_!  There&#039;s more to do of course, but at this&lt;br /&gt;
point we&#039;ve got a nice useful chunk of audio data, so let&#039;s get some&lt;br /&gt;
video too.&lt;br /&gt;
&lt;br /&gt;
&amp;lt;br style=&amp;quot;clear:both;&amp;quot;/&amp;gt;&lt;br /&gt;
==Video vegetables (they&#039;re good for you!)==&lt;br /&gt;
&lt;br /&gt;
One could think of video as being like audio but with two additional&lt;br /&gt;
spatial dimensions, X and Y, in addition to the dimension of time.&lt;br /&gt;
This is mathematically sound. The Sampling Theorem applies to all&lt;br /&gt;
three video dimensions just as it does the single time dimension of&lt;br /&gt;
audio.&lt;br /&gt;
&lt;br /&gt;
Audio and video are obviously quite different in practice. For one,&lt;br /&gt;
compared to audio, video is huge. Raw CD audio is about 1.4 megabits&lt;br /&gt;
per second. Raw 1080i HD video is over 700 megabits per second. That&#039;s&lt;br /&gt;
more than 500 times more data to capture, process and store per&lt;br /&gt;
second.  By Moore&#039;s law... that&#039;s... let&#039;s see... roughly eight&lt;br /&gt;
doublings times two years, so yeah, computers requiring about an extra&lt;br /&gt;
fifteen years to handle raw video after getting raw audio down pat was&lt;br /&gt;
about right.&lt;br /&gt;
&lt;br /&gt;
Basic raw video is also just more complex than basic raw audio. The&lt;br /&gt;
sheer volume of data currently necessitates a representation more&lt;br /&gt;
efficient than the linear PCM used for audio.  In addition, electronic&lt;br /&gt;
video comes almost entirely from broadcast television alone, and the&lt;br /&gt;
standards committees that govern broadcast video have always been very&lt;br /&gt;
concerned with backward compatibility.  Up until just last year in the&lt;br /&gt;
US, a sixty year old black and white television could still show a&lt;br /&gt;
normal analog television broadcast.  That&#039;s actually a really neat&lt;br /&gt;
trick.&lt;br /&gt;
&lt;br /&gt;
The downside to backward compatibility is that once a detail makes it&lt;br /&gt;
into a standard, you can&#039;t ever really throw it out again. Electronic&lt;br /&gt;
video has never started over from scratch the way audio has multiple&lt;br /&gt;
times.  Sixty years worth of clever but obsolete hacks necessitated by&lt;br /&gt;
the passing technology of a given era have built up into quite a pile,&lt;br /&gt;
and because digital standards also come from broadcast television, all&lt;br /&gt;
these eldritch hacks have been brought forward into the digital&lt;br /&gt;
standards as well.&lt;br /&gt;
&lt;br /&gt;
In short, there are a whole lot more details involved in digital video&lt;br /&gt;
than there were with audio. There&#039;s no hope of covering them&lt;br /&gt;
all completely here, so we&#039;ll cover the broad fundamentals.&lt;br /&gt;
&lt;br /&gt;
===resolution and aspect===&lt;br /&gt;
&lt;br /&gt;
The most obvious raw video parameters are the width and height of the&lt;br /&gt;
picture in pixels. As simple as that may sound, the pixel dimensions&lt;br /&gt;
alone don&#039;t actually specify the absolute width and height of the&lt;br /&gt;
picture, as most broadcast-derived video doesn&#039;t use square pixels.&lt;br /&gt;
The number of scanlines in a broadcast image was fixed, but the&lt;br /&gt;
effective number of horizontal pixels was a function of channel&lt;br /&gt;
bandwidth. Effective horizontal resolution could result in pixels that&lt;br /&gt;
were either narrower or wider than the spacing between scanlines.&lt;br /&gt;
&lt;br /&gt;
Standards have generally specified that digitally sampled video should&lt;br /&gt;
reflect the real resolution of the original analog source, so a large&lt;br /&gt;
amount of digital video also uses non-square pixels. For example, a&lt;br /&gt;
normal 4:3 aspect NTSC DVD is typically encoded with a display&lt;br /&gt;
resolution of 704 by 480, a ratio wider than 4:3.  In this case, the&lt;br /&gt;
pixels themselves are assigned an aspect ratio of 10:11, making them&lt;br /&gt;
taller than they are wide and narrowing the image horizontally to the&lt;br /&gt;
correct aspect.  Such an image has to be resampled to show properly on&lt;br /&gt;
a digital display with square pixels.&lt;br /&gt;
&lt;br /&gt;
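The arithmetic is worth making concrete.  A tiny sketch in C of the DVD example above:&lt;br /&gt;
&lt;br /&gt;
/* Scale the stored width by the pixel aspect ratio to get the display width. */&lt;br /&gt;
int display_width(int stored_width, int par_num, int par_den)&lt;br /&gt;
{&lt;br /&gt;
    return stored_width * par_num / par_den;&lt;br /&gt;
}&lt;br /&gt;
/* display_width(704, 10, 11) == 640, so 704x480 displays as 640x480: proper 4:3. */&lt;br /&gt;
&lt;br /&gt;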
===frame rate and interlacing===&lt;br /&gt;
&lt;br /&gt;
The second obvious video parameter is the frame rate, the number of&lt;br /&gt;
full frames per second.  Several standard frame rates are in active&lt;br /&gt;
use. Digital video, in one form or another, can use all of them.  Or,&lt;br /&gt;
any other frame rate.  Or even variable rates where the frame rate&lt;br /&gt;
changes adaptively over the course of the video. The higher the frame&lt;br /&gt;
rate, the smoother the motion and that brings us, unfortunately, to&lt;br /&gt;
interlacing.&lt;br /&gt;
&lt;br /&gt;
In the very earliest days of broadcast video, engineers sought the&lt;br /&gt;
fastest practical framerate to smooth motion and to minimize flicker&lt;br /&gt;
on phosphor-based CRTs.  They were also under pressure to use the&lt;br /&gt;
least possible bandwidth for the highest resolution and fastest frame&lt;br /&gt;
rate.  Their solution was to interlace the video where the even lines&lt;br /&gt;
are sent in one pass and the odd lines in the next.  Each pass is&lt;br /&gt;
called a field and two fields sort of produce one complete frame.&lt;br /&gt;
&amp;quot;Sort of&amp;quot;, because the even and odd fields aren&#039;t actually from the&lt;br /&gt;
same source frame.  In a 60 field per second picture, the source frame&lt;br /&gt;
rate is actually 60 full frames per second, and half of each frame,&lt;br /&gt;
every other line, is simply discarded.  This is why we can&#039;t&lt;br /&gt;
deinterlace a video simply by combining two fields into one frame;&lt;br /&gt;
they&#039;re not actually from one frame to begin with.&lt;br /&gt;
&lt;br /&gt;
===gamma===&lt;br /&gt;
&lt;br /&gt;
The cathode ray tube was the only available display technology for&lt;br /&gt;
most of the history of electronic video. A CRT&#039;s output brightness is&lt;br /&gt;
nonlinear, approximately equal to the input controlling voltage raised&lt;br /&gt;
to the 2.5th power. This exponent, 2.5, is designated gamma, and so&lt;br /&gt;
it&#039;s often referred to as the gamma of a display.  Cameras, though,&lt;br /&gt;
are linear, and if you feed a CRT a linear input signal, it looks a&lt;br /&gt;
bit like this.&lt;br /&gt;
&lt;br /&gt;
As there were originally to be very few cameras, which were&lt;br /&gt;
fantastically expensive anyway, and hopefully many, many television&lt;br /&gt;
sets which had best be as inexpensive as possible, engineers decided to&lt;br /&gt;
add the necessary gamma correction circuitry to the cameras rather&lt;br /&gt;
than the sets. Video transmitted over the airwaves would thus have a&lt;br /&gt;
nonlinear intensity using the inverse of the set&#039;s gamma exponent, so that&lt;br /&gt;
once a camera&#039;s signal was finally displayed on the CRT, the overall&lt;br /&gt;
response of the system from camera to set was back to linear again.&lt;br /&gt;
&lt;br /&gt;
Almost.&lt;br /&gt;
&lt;br /&gt;
There were also two other tweaks. A television camera actually uses a&lt;br /&gt;
gamma exponent that&#039;s the inverse of 2.2, not 2.5.  That&#039;s just a&lt;br /&gt;
correction for viewing in a dim environment. Also, the exponential&lt;br /&gt;
curve transitions to a linear ramp near black.  That&#039;s just an old&lt;br /&gt;
hack for suppressing sensor noise in the camera.&lt;br /&gt;
&lt;br /&gt;
Gamma correction also had a lucky benefit. It just so happens that the&lt;br /&gt;
human eye has a perceptual gamma of about 3.  This is relatively close&lt;br /&gt;
to the CRT&#039;s gamma of 2.5. An image using gamma correction devotes&lt;br /&gt;
more resolution to lower intensities, where the eye happens to have&lt;br /&gt;
its finest intensity discrimination, and therefore uses the available&lt;br /&gt;
scale resolution more efficiently.  Although CRTs are currently&lt;br /&gt;
vanishing, a standard sRGB computer display still uses a nonlinear&lt;br /&gt;
intensity curve similar to television, with a linear ramp near black,&lt;br /&gt;
followed by an exponential curve with a gamma exponent of 2.4. This&lt;br /&gt;
encodes a sixteen bit linear range down into eight bits.&lt;br /&gt;
&lt;br /&gt;
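The sRGB curve is simple enough to state exactly.  A sketch in C of the standard transfer function, with values normalized to [0,1]:&lt;br /&gt;
&lt;br /&gt;
#include &amp;lt;math.h&amp;gt;&lt;br /&gt;
&lt;br /&gt;
/* sRGB encoding: a linear ramp near black, then a power curve with a 2.4 exponent. */&lt;br /&gt;
double srgb_encode(double linear)&lt;br /&gt;
{&lt;br /&gt;
    if (linear &amp;lt;= 0.0031308)&lt;br /&gt;
        return 12.92 * linear;&lt;br /&gt;
    return 1.055 * pow(linear, 1.0 / 2.4) - 0.055;&lt;br /&gt;
}&lt;br /&gt;
&lt;br /&gt;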
===color and colorspace===&lt;br /&gt;
&lt;br /&gt;
The human eye has three apparent color channels, red, green, and blue,&lt;br /&gt;
and most displays use these three colors as additive primaries to&lt;br /&gt;
produce a full range of color output.  The primary pigments in&lt;br /&gt;
printing are Cyan, Magenta, and Yellow for the same reason; pigments&lt;br /&gt;
are subtractive, and each of these pigments subtracts one pure color&lt;br /&gt;
from reflected light.  Cyan subtracts red, magenta subtracts green, and&lt;br /&gt;
yellow subtracts blue.&lt;br /&gt;
&lt;br /&gt;
Video can be and sometimes is represented with red, green, and blue&lt;br /&gt;
color channels, but RGB video is atypical. The human eye is far more&lt;br /&gt;
sensitive to luminosity than it is to color, and RGB tends to spread&lt;br /&gt;
the energy of an image across all three color channels.  That is, the&lt;br /&gt;
red plane looks like a red version of the original picture, the green&lt;br /&gt;
plane looks like a green version of the original picture, and the blue&lt;br /&gt;
plane looks like a blue version of the original picture.  Black and&lt;br /&gt;
white times three.  Not efficient.&lt;br /&gt;
&lt;br /&gt;
For those reasons and because, oh hey, television just happened to&lt;br /&gt;
start out as black and white anyway, video usually is represented as a&lt;br /&gt;
high resolution luma channel, the black &amp;amp; white, along with&lt;br /&gt;
additional, often lower resolution chroma channels, the color. The&lt;br /&gt;
luma channel, Y, is produced by weighting and then adding the separate&lt;br /&gt;
red, green and blue signals.  The chroma channels U and V are then&lt;br /&gt;
produced by subtracting the luma signal from blue and the luma signal&lt;br /&gt;
from red.&lt;br /&gt;
&lt;br /&gt;
When YUV is scaled, offset and quantized for digital video, it&#039;s&lt;br /&gt;
usually more correctly called Y&#039;CbCr, but the more generic term YUV is&lt;br /&gt;
widely used to describe all the analog and digital variants of this&lt;br /&gt;
color model.&lt;br /&gt;
&lt;br /&gt;
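As a concrete example, here&#039;s that weighting in C using the BT.601 luma weights common in standard definition video (an assumed illustration, not from the video; digital Y&#039;CbCr further scales and offsets these normalized values into integer ranges):&lt;br /&gt;
&lt;br /&gt;
typedef struct { double y, cb, cr; } yuv;&lt;br /&gt;
&lt;br /&gt;
/* Gamma-corrected r, g, b each in [0,1]. */&lt;br /&gt;
yuv rgb_to_yuv601(double r, double g, double b)&lt;br /&gt;
{&lt;br /&gt;
    yuv out;&lt;br /&gt;
    out.y  = 0.299 * r + 0.587 * g + 0.114 * b;   /* luma: a weighted sum of R, G, B */&lt;br /&gt;
    out.cb = (b - out.y) / 1.772;                 /* chroma: blue minus luma, scaled */&lt;br /&gt;
    out.cr = (r - out.y) / 1.402;                 /* chroma: red minus luma, scaled  */&lt;br /&gt;
    return out;&lt;br /&gt;
}&lt;br /&gt;
&lt;br /&gt;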
===chroma subsampling===&lt;br /&gt;
&lt;br /&gt;
The U and V chroma channels can have the same resolution as the Y&lt;br /&gt;
channel, but because the human eye has far less spatial color&lt;br /&gt;
resolution than spatial luminosity resolution, chroma resolution is&lt;br /&gt;
usually halved or even quartered in the horizontal direction, the&lt;br /&gt;
vertical direction, or both, usually without any significant impact on the&lt;br /&gt;
apparent raw image quality.  Practically every possible subsampling&lt;br /&gt;
variant has been used at one time or another, but the common choices&lt;br /&gt;
today are 4:4:4 video, which isn&#039;t actually subsampled at all, 4:2:2 video in&lt;br /&gt;
which the horizontal resolution of the U and V channels is halved, and&lt;br /&gt;
most common of all, 4:2:0 video in which both the horizontal and vertical&lt;br /&gt;
resolutions of the chroma channels are halved, resulting in U and V&lt;br /&gt;
planes that are each one quarter the size of Y.&lt;br /&gt;
&lt;br /&gt;
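The plane bookkeeping is a one-liner per axis.  A sketch in C, rounding odd luma dimensions up as most implementations do:&lt;br /&gt;
&lt;br /&gt;
/* Chroma plane dimensions for a w x h luma plane.                   */&lt;br /&gt;
/* 4:2:0 uses sub_x = 2, sub_y = 2; 4:2:2 uses sub_x = 2, sub_y = 1. */&lt;br /&gt;
void chroma_dims(int w, int h, int sub_x, int sub_y, int *cw, int *ch)&lt;br /&gt;
{&lt;br /&gt;
    *cw = (w + sub_x - 1) / sub_x;&lt;br /&gt;
    *ch = (h + sub_y - 1) / sub_y;&lt;br /&gt;
}&lt;br /&gt;
&lt;br /&gt;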
The terms 4:2:2, 4:2:0, 4:1:1 and so on and so forth, aren&#039;t complete&lt;br /&gt;
descriptions of a chroma subsampling. There are multiple possible ways&lt;br /&gt;
to position the chroma pixels relative to luma, and again, several&lt;br /&gt;
variants are in active use for each subsampling.  For example, motion&lt;br /&gt;
JPEG, MPEG-1 video, MPEG-2 video, DV, Theora and WebM all use or can&lt;br /&gt;
use 4:2:0 subsampling, but they site the chroma pixels three different&lt;br /&gt;
ways.&lt;br /&gt;
&lt;br /&gt;
Motion JPEG, MPEG1 video, Theora and WebM all site chroma pixels&lt;br /&gt;
between luma pixels both horizontally and vertically.&lt;br /&gt;
&lt;br /&gt;
MPEG2 sites chroma pixels between lines, but horizontally aligned with&lt;br /&gt;
every other luma pixel. Interlaced modes complicate things somewhat,&lt;br /&gt;
resulting in a siting arrangement that&#039;s a tad bizarre.&lt;br /&gt;
&lt;br /&gt;
And finally PAL-DV, which is always interlaced, places the chroma&lt;br /&gt;
pixels in the same position as every other luma pixel in the&lt;br /&gt;
horizontal direction, and vertically alternates the chroma channel on&lt;br /&gt;
each line.&lt;br /&gt;
&lt;br /&gt;
That&#039;s just 4:2:0 video. I&#039;ll leave the other subsamplings as homework for the&lt;br /&gt;
viewer.  Got the basic idea, moving on.&lt;br /&gt;
&lt;br /&gt;
===pixel formats===&lt;br /&gt;
&lt;br /&gt;
In audio, we always represent multiple channels in a PCM stream by&lt;br /&gt;
interleaving the samples from each channel in order. Video uses both&lt;br /&gt;
packed formats that interleave the color channels, as well as planar&lt;br /&gt;
formats that keep the pixels from each channel together in separate&lt;br /&gt;
planes stacked in order in the frame. There are at least 50 different formats in&lt;br /&gt;
these two broad categories with possibly ten or fifteen in common use.&lt;br /&gt;
&lt;br /&gt;
Each chroma subsampling and different bit-depth requires a different&lt;br /&gt;
packing arrangement, and so a different pixel format.  For a given&lt;br /&gt;
unique subsampling, there are usually also several equivalent formats&lt;br /&gt;
that consist of trivial channel order rearrangements or repackings due either to&lt;br /&gt;
convenience once-upon-a-time on some particular piece of hardware or&lt;br /&gt;
sometimes just good old-fashioned spite.&lt;br /&gt;
&lt;br /&gt;
Pixel formats are described by a unique name or fourcc code.  There&lt;br /&gt;
are quite a few of these and there&#039;s no sense going over each one now.&lt;br /&gt;
Google is your friend.  Be aware that fourcc codes for raw video&lt;br /&gt;
specify the pixel arrangement and chroma subsampling, but generally&lt;br /&gt;
don&#039;t imply anything certain about chroma siting or color space.  YV12&lt;br /&gt;
video, to pick one, can use JPEG, MPEG-2 or DV chroma siting, and any&lt;br /&gt;
one of several YUV colorspace definitions.&lt;br /&gt;
&lt;br /&gt;
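To make that concrete, here&#039;s a sketch in C of the plane layout for YV12, a common planar 4:2:0 packing; its V-before-U plane order is exactly the sort of trivial rearrangement that separates it from its sibling I420:&lt;br /&gt;
&lt;br /&gt;
#include &amp;lt;stddef.h&amp;gt;&lt;br /&gt;
&lt;br /&gt;
typedef struct { size_t y, v, u, total; } yv12_offsets;&lt;br /&gt;
&lt;br /&gt;
/* Byte offsets of each plane within a YV12 frame (assumes even w and h). */&lt;br /&gt;
yv12_offsets yv12_layout(size_t w, size_t h)&lt;br /&gt;
{&lt;br /&gt;
    yv12_offsets o;&lt;br /&gt;
    o.y = 0;                         /* full-size luma plane first    */&lt;br /&gt;
    o.v = w * h;                     /* then the quarter-size V plane */&lt;br /&gt;
    o.u = o.v + (w / 2) * (h / 2);   /* then U; I420 swaps V and U    */&lt;br /&gt;
    o.total = o.u + (w / 2) * (h / 2);&lt;br /&gt;
    return o;&lt;br /&gt;
}&lt;br /&gt;
&lt;br /&gt;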
===done!===&lt;br /&gt;
&lt;br /&gt;
That wraps up our not so quick and yet very incomplete tour of raw&lt;br /&gt;
video. The good news is we can already get quite a lot of real work&lt;br /&gt;
done using that overview. In plenty of situations, a frame of video&lt;br /&gt;
data is a frame of video data.  The details matter, greatly, when it&lt;br /&gt;
comes time to write software, but for now I am satisfied that the&lt;br /&gt;
esteemed viewer is broadly aware of the relevant issues.&lt;br /&gt;
&lt;br /&gt;
&amp;lt;br style=&amp;quot;clear:both;&amp;quot;/&amp;gt;&lt;br /&gt;
==Containers==&lt;br /&gt;
So. We have audio data. We have video data. What remains is the more&lt;br /&gt;
familiar non-signal data and straight up engineering that software&lt;br /&gt;
developers are used to, and plenty of it.&lt;br /&gt;
&lt;br /&gt;
Chunks of raw audio and video data have no externally visible&lt;br /&gt;
structure, but they&#039;re often uniformly sized.  We could just string&lt;br /&gt;
them together in a rigid pre-determined ordering for streaming and&lt;br /&gt;
storage, and some simple systems do approximately that. Compressed&lt;br /&gt;
frames, though, aren&#039;t necessarily a predictable size, and we usually want&lt;br /&gt;
some flexibility in using a range of different data types in streams.&lt;br /&gt;
If we string random formless data together, we lose the boundaries&lt;br /&gt;
that separate frames and don&#039;t necessarily know what data belongs to&lt;br /&gt;
which streams.  A stream needs some generalized structure to be&lt;br /&gt;
generally useful.&lt;br /&gt;
&lt;br /&gt;
In addition to our signal data, we also have our PCM and video&lt;br /&gt;
parameters.  There&#039;s probably plenty of other metadata we also want to&lt;br /&gt;
deal with, like audio tags and video chapters and subtitles, all&lt;br /&gt;
essential components of rich media.  It makes sense to place this&lt;br /&gt;
metadata, that is, data about the data, within the media itself.&lt;br /&gt;
&lt;br /&gt;
Storing and structuring formless data and disparate metadata is the&lt;br /&gt;
job of a container.  Containers provide framing for the data blobs,&lt;br /&gt;
interleave and identify multiple data streams, provide timing&lt;br /&gt;
information, and store the metadata necessary to parse, navigate,&lt;br /&gt;
manipulate and present the media.  In general, any container can hold&lt;br /&gt;
any kind of data.  And data can be put into any container.&lt;br /&gt;
&lt;br /&gt;
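A toy sketch in C of the sort of framing a container wraps around each blob; this illustrates the idea and is not the layout of any real container:&lt;br /&gt;
&lt;br /&gt;
#include &amp;lt;stdint.h&amp;gt;&lt;br /&gt;
&lt;br /&gt;
struct toy_packet {&lt;br /&gt;
    uint32_t stream_id;   /* which elementary stream the blob belongs to      */&lt;br /&gt;
    uint64_t timestamp;   /* presentation time, in some declared timebase     */&lt;br /&gt;
    uint32_t length;      /* payload size in bytes, since frames vary in size */&lt;br /&gt;
    /* ...followed by length bytes of codec data... */&lt;br /&gt;
};&lt;br /&gt;
&lt;br /&gt;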
==Credits==&lt;br /&gt;
&lt;br /&gt;
In the past thirty minutes, we&#039;ve covered digital audio, video, some&lt;br /&gt;
history, some math and a little engineering. We&#039;ve barely scratched the&lt;br /&gt;
surface, but it&#039;s time for a well earned break.&lt;br /&gt;
&lt;br /&gt;
There&#039;s so much more to talk about, so I hope you&#039;ll join me again in&lt;br /&gt;
our next episode.  Until then--- Cheers!&lt;br /&gt;
&lt;br /&gt;
Written by:&lt;br /&gt;
Christopher (Monty) Montgomery&lt;br /&gt;
and the Xiph.Org Community&lt;br /&gt;
&lt;br /&gt;
Intro, title and credits music:&lt;br /&gt;
&amp;quot;Boo Boo Coming&amp;quot;, by Joel Forrester&lt;br /&gt;
Performed by the Microscopic Septet&lt;br /&gt;
Used by permission of Cuneiform Records.&lt;br /&gt;
Original source track All Rights Reserved.&lt;br /&gt;
www.cuneiformrecords.com&lt;br /&gt;
&lt;br /&gt;
This Video Was Produced Entirely With Free and Open Source Software&lt;br /&gt;
&lt;br /&gt;
GNU&lt;br /&gt;
Linux&lt;br /&gt;
Fedora&lt;br /&gt;
Cinelerra&lt;br /&gt;
The Gimp&lt;br /&gt;
Audacity&lt;br /&gt;
Postfish&lt;br /&gt;
Gstreamer&lt;br /&gt;
&lt;br /&gt;
CC BY-NC-SA&lt;br /&gt;
A Co-Production of Xiph.Org and Red Hat Inc.&lt;br /&gt;
(C) 2010, Some Rights Reserved&lt;br /&gt;
&lt;br /&gt;
&amp;lt;br style=&amp;quot;clear:both;&amp;quot;/&amp;gt;&lt;br /&gt;
==The making of…==&lt;br /&gt;
&lt;br /&gt;
===Equipment===&lt;br /&gt;
&lt;br /&gt;
====Camera====&lt;br /&gt;
Canon HV40 HDV camera w/ wide-angle lens operating on a tripod.&lt;br /&gt;
&lt;br /&gt;
The wide angle lens gives the camera a nice close macro mode, and approximately triples the amount of light coming into the sensor for a given zoom/aperture.  Useful for shooting indoors at night.&lt;br /&gt;
&lt;br /&gt;
No additional lighting kit was used.&lt;br /&gt;
&lt;br /&gt;
====Audio====&lt;br /&gt;
&lt;br /&gt;
Two Crown PCC160 boundary microphones placed on a table approximately 4-8 feet in front of the speaker, run through a cheap Behringer portable mixer and into the camera&#039;s microphone input.  &lt;br /&gt;
&lt;br /&gt;
No additional audio kit was used.&lt;br /&gt;
&lt;br /&gt;
====Sundries====&lt;br /&gt;
&lt;br /&gt;
Whiteboard markers by &#039;Bic&#039;&lt;br /&gt;
&lt;br /&gt;
Drawing aids by Staedtler, McMaster-Carr, and &#039;Generic&#039;.&lt;br /&gt;
&lt;br /&gt;
===Video shooting sequence===&lt;br /&gt;
&lt;br /&gt;
Scenes were pre-scripted and memorized, usually with lots of on-the-fly revision.  (In the future... I&#039;m totally getting a teleprompter.  Wh000.  OTOH, I can totally rattle off the entire video script from beginning to end as a party trick, thus ensuring I&#039;ll not be invited to many parties --Monty).&lt;br /&gt;
&lt;br /&gt;
Diagrams were drawn by hand on a physical whiteboard with whiteboard markers and magnetic T-squares, triangles, and yardsticks.  Despite looking a lot like greenscreen work, there is no image compositing in use (actually-- there are two small composites where an error in a whiteboard diagram was corrected by subtracting part of the original image and then adding a corrected version of the diagram).&lt;br /&gt;
&lt;br /&gt;
Camera operated in 24F shutter priority mode (Tv set to &amp;quot;24&amp;quot;) with exposure and white balance both calibrated to the whiteboard (or a white piece of paper) and locked.  Microphone attenuation setting active, with gain locked such that room noise peaked at -40dB (all the rooms in the shooting sequences were noisy due to the building&#039;s ventilation system, or active equipment). Lighting in the whiteboard rooms tended to be odd, with little relative light cast on a presenter standing just in front of the whiteboard; a presenter is practically standing in the room&#039;s only shadow.  Most of the room light is focused on the table and walls.  Additional fill lighting kit would have been useful, but for the first vid, I didn&#039;t want &#039;perfect&#039; to be the enemy of &#039;good&#039;.&lt;br /&gt;
&lt;br /&gt;
Autofocus used for whiteboard scenes, manual focus used for several workshop scenes as the autofocus tended to hunt continuously in very low light.&lt;br /&gt;
&lt;br /&gt;
Continuous capture to a Thinkpad with firewire input via a simple gstreamer script.&lt;br /&gt;
&lt;br /&gt;
===Production sequence===&lt;br /&gt;
&lt;br /&gt;
====All hail Cinelerra.  You better hail, or Cinelerra will get pissy about it.====&lt;br /&gt;
&lt;br /&gt;
Most of the production sequence hinged on making Cinelerra happy; it is a hulking rusty cast iron WWI tank of a program that can seem like it&#039;s composed entirely of compressed bugs.  That said, it was neither particularly crashy nor did it ever accidentally corrupt or lose work.  It was also the only FOSS editor with a working 2D compositor.  It got the job done once I found a workflow it would cope with (and fixed a number of bugs; these fixes are available from my Cinelerra Git repo at http://git.xiph.org/?p=users/xiphmont/cinelerraCV.git;a=summary).&lt;br /&gt;
&lt;br /&gt;
====Choosing takes====&lt;br /&gt;
&lt;br /&gt;
Each shooting session yielded four to six hours of raw video.  The first step was to load the raw video into the Cinelerra timeline, label each complete take, compare and choose the take to use, then render the chosen take out to a raw clip as a YUV4MPEG raw video file and a WAV raw audio file.  Be careful that Settings-&amp;gt;Align Cursor On Frames is set, else the audio and video renders won&#039;t start on the same boundary.&lt;br /&gt;
&lt;br /&gt;
====Postprocessing====&lt;br /&gt;
&lt;br /&gt;
At this point, the raw video clips were adjusted for gamma, contrast and saturation in gstreamer and mplayer.  In the earlier shoots the camera was underexposing due to pilot error, which required quite a bit of gamma and saturation inflation to &#039;correct&#039; (there is no real correction as the low-end data is gone, but it&#039;s possible to make it look better).  Later shoots used saner settings and the adjustments were mostly to keep different shooting sessions more uniform.  The whiteboard tends not to look white because it&#039;s mildly reflective, and picked up the color of the cyan and orange audio baffles in the room like a big diffuse mirror.&lt;br /&gt;
&lt;br /&gt;
The audio was both noisy (due to the building&#039;s ventilation system which either sounded like a low loud rumble or a jet-engine taking off) and reverberant (the rooms were glass on two sides and plaster on the other two).  Early takes used no additional sound absorbing material in the rooms, and the Postfish filtering and deverb was used heavily.  It gives the early audio in the vid a slightly odd, processed feel (I had almost decided the original audio was simply unusable).  Later takes used some big fleece &#039;soft flats&#039; in the room to absorb some additional reverb, and the later takes are less heavily filtered.&lt;br /&gt;
&lt;br /&gt;
The Postfish filtering chain used declip (for the occasional overrange oops), deverb (remove room reverberation), multicompand (noise gating), single compand (for volume levelling) and EQ (the Crown mics are nice, but are very midrange heavy).&lt;br /&gt;
&lt;br /&gt;
====Special Effects====&lt;br /&gt;
&lt;br /&gt;
Audio special effects were one-offs, mostly done using SoX.  The processed demo sections of audio were then spliced back into the original audio takes using Audacity.&lt;br /&gt;
&lt;br /&gt;
Video special effects (e.g., removing a color channel) were done by writing quick, one-off filters in C for y4oi.  A few effects were done by dumping a take as a directory full of PNGs, batch-processing the PNGs with a one-off C program, then reassembling with mplayer.  Video effects were then stitched back into the original video takes in Cinelerra.&lt;br /&gt;
&lt;br /&gt;
====Editing====&lt;br /&gt;
&lt;br /&gt;
All editing was done in Cinelerra.  This primarily consisted of stitching the individual takes back together with crossfades.  All input and rendering output were done with raw YUV4MPEG and WAV files.  Note that making this work well and correctly required several patches to the YUV4MPEG handler and colorspace conversion code.&lt;br /&gt;
&lt;br /&gt;
====Encoding====&lt;br /&gt;
&lt;br /&gt;
Encoding was done by hand external to Cinelerra using mplayer for final postprocessing, the encoder_example included with the [Ptalarbvorm] Theora source distribution, and ivfenc for WebM.&lt;br /&gt;
&lt;br /&gt;
Sample Theora encode command line (note this is using an mplayer patched for y4o support; it could be done just as easily with a yuv4mpeg pipe):&lt;br /&gt;
&lt;br /&gt;
# 360p, 128-ish (a4) audio + 500-ish (v50) video&lt;br /&gt;
mplayer -vf hqdn3d,scale=640:360,gradfun=1.5,unsharp=l3x3:.1,yuv4ogg complete2.m2v -fast -noconsolecontrols -vo null &amp;gt; /dev/null &amp;amp; ~/MotherfishSVN/theora-ptalarbvorm/examples/encoder_example -a 4 -v 50 -k 240 complete2.wav output.y4o -o A_Digital_Media_Primer_For_Geeks-360p-a4+v50.ogv&lt;/div&gt;</summary>
		<author><name>Mindspillage</name></author>
	</entry>
	<entry>
		<id>https://wiki.xiph.org/index.php?title=Videos/A_Digital_Media_Primer_For_Geeks&amp;diff=12320</id>
		<title>Videos/A Digital Media Primer For Geeks</title>
		<link rel="alternate" type="text/html" href="https://wiki.xiph.org/index.php?title=Videos/A_Digital_Media_Primer_For_Geeks&amp;diff=12320"/>
		<updated>2010-09-15T13:03:00Z</updated>

		<summary type="html">&lt;p&gt;Mindspillage: /* Introduction */ sp&lt;/p&gt;
&lt;hr /&gt;
&lt;div&gt;&amp;lt;small&amp;gt;&#039;&#039;Wiki edition&#039;&#039;&amp;lt;/small&amp;gt;&lt;br /&gt;
&lt;br /&gt;
&amp;lt;hr/&amp;gt;&lt;br /&gt;
*Watch&lt;br /&gt;
*Download&lt;br /&gt;
*Up to videos page&lt;br /&gt;
&amp;lt;hr/&amp;gt;&lt;br /&gt;
&lt;br /&gt;
==Introduction==&lt;br /&gt;
Workstations and high end personal computers have been able to&lt;br /&gt;
manipulate digital audio pretty easily for about fifteen years now.&lt;br /&gt;
It&#039;s only been about five years that a decent workstation&#039;s been able&lt;br /&gt;
to handle raw video without a lot of expensive special purpose&lt;br /&gt;
hardware.&lt;br /&gt;
&lt;br /&gt;
But today even most cheap home PCs have the processor power and&lt;br /&gt;
storage necessary to really toss raw video around, at least without&lt;br /&gt;
too much of a struggle. So now that everyone has all of this cheap media-capable hardware, &lt;br /&gt;
more people, not surprisingly, want to do interesting&lt;br /&gt;
things with digital media, especially streaming. YouTube was the first huge&lt;br /&gt;
success, and now everybody wants in.&lt;br /&gt;
&lt;br /&gt;
Well good!  Because this stuff is a lot of fun!&lt;br /&gt;
&lt;br /&gt;
It&#039;s no problem finding consumers for digital media.  But here, I&#039;d&lt;br /&gt;
like to address the engineers, the mathematicians, the hackers, the&lt;br /&gt;
people who are interested in discovering and making things and&lt;br /&gt;
building the technology itself. The people after my own heart.&lt;br /&gt;
&lt;br /&gt;
Digital media, compression especially, is perceived to be super-elite,&lt;br /&gt;
somehow incredibly more difficult than anything else in computer&lt;br /&gt;
science. The big industry players in the field don&#039;t mind this&lt;br /&gt;
perception at all; it helps justify the staggering number of very&lt;br /&gt;
basic patents they hold.  They like the image that their media&lt;br /&gt;
researchers &amp;quot;are the best of the best, so much smarter than anyone&lt;br /&gt;
else that their brilliant ideas can&#039;t even be understood by mere&lt;br /&gt;
mortals.&amp;quot; This is bunk.&lt;br /&gt;
&lt;br /&gt;
Digital audio and video and streaming and compression offer endless&lt;br /&gt;
deep and stimulating mental challenges, just like any other&lt;br /&gt;
discipline. It seems elite because so few people have been been&lt;br /&gt;
involved.  So few people have been involved perhaps because so few&lt;br /&gt;
people could afford the expensive, special-purpose equipment it&lt;br /&gt;
required. But today, just about anyone watching this video has a&lt;br /&gt;
cheap, general-purpose computer powerful enough to play with the big&lt;br /&gt;
boys. There are battles going on today around HTML5 and browsers and&lt;br /&gt;
video and open vs. closed.  So now is a pretty good time to get&lt;br /&gt;
involved.  The easiest place to start is probably understanding the&lt;br /&gt;
technology we have right now.&lt;br /&gt;
&lt;br /&gt;
This is an introduction. Since it&#039;s an introduction, it glosses over a&lt;br /&gt;
ton of details so that the big picture&#039;s a little easier to see.&lt;br /&gt;
Quite a few people watching are going to be way past anything that I&#039;m&lt;br /&gt;
talking about, at least for now.  On the other hand, I&#039;m probably&lt;br /&gt;
going to go too fast for folks who really are brand new to all of&lt;br /&gt;
this, so if this is all new, relax. The important thing is to pick out&lt;br /&gt;
any ideas that really grab your imagination. Especially pay attention&lt;br /&gt;
to the terminology surrounding those ideas, because with those, and&lt;br /&gt;
Google, and Wikipedia, you can dig as deep as interests you.&lt;br /&gt;
&lt;br /&gt;
So, without any further ado, welcome to one hell of a new hobby.&lt;br /&gt;
&lt;br /&gt;
==Analog vs Digital==&lt;br /&gt;
&lt;br /&gt;
Sound is the propagation of pressure waves through air, spreading out&lt;br /&gt;
from a source like ripples spread from a stone tossed into a pond.  A&lt;br /&gt;
microphone, or the human ear for that matter, transforms these passing&lt;br /&gt;
ripples of pressure into an electric signal.  Right, this is&lt;br /&gt;
middle school science class, everyone remembers this.  Moving on.&lt;br /&gt;
&lt;br /&gt;
That audio signal is a one-dimensional function, a single value&lt;br /&gt;
varying over time.  If we slow the &#039;scope down a bit... that should be&lt;br /&gt;
a little easier to see. A few other aspects of the signal are&lt;br /&gt;
important. It&#039;s continuous in both value and time; that is, at any&lt;br /&gt;
given time it can have any real value, and there&#039;s a smoothly varying&lt;br /&gt;
value at every point in time.  No matter how much we zoom in, there&lt;br /&gt;
are no discontinuities, no singularities, no instantaneous steps or&lt;br /&gt;
points where the signal ceases to exist. It&#039;s defined&lt;br /&gt;
everywhere. Classic continuous math works very well on these signals.&lt;br /&gt;
&lt;br /&gt;
A digital signal on the other hand is discrete in both value and time.&lt;br /&gt;
In the simplest and most common system, called Pulse Code Modulation,&lt;br /&gt;
one of a fixed number of possible values directly represents the&lt;br /&gt;
instantaneous signal amplitude at points in time spaced a fixed&lt;br /&gt;
distance apart.  The end result is a stream of digits.&lt;br /&gt;
&lt;br /&gt;
Now this looks an awful lot like this.  It seems intuitive that we&lt;br /&gt;
should somehow be able to rigorously transform one into the other, and&lt;br /&gt;
good news, the Sampling Theorem says we can and tells us&lt;br /&gt;
how. Published in its most recognizable form by Claude Shannon in 1949&lt;br /&gt;
and built on the work of Nyquist, and Hartley, and tons of others, the&lt;br /&gt;
sampling theorem not only states that we can go back and&lt;br /&gt;
forth between analog and digital, but also lays&lt;br /&gt;
down a set of conditions under which conversion is lossless and the two&lt;br /&gt;
representations become equivalent and interchangeable.  When the&lt;br /&gt;
lossless conditions aren&#039;t met, the sampling theorem tells us how and&lt;br /&gt;
how much information is lost or corrupted.&lt;br /&gt;
&lt;br /&gt;
Up until very recently, analog technology was the basis for&lt;br /&gt;
practically everything done with audio, and that&#039;s not because most&lt;br /&gt;
audio comes from an originally analog source.  You may also think that&lt;br /&gt;
since computers are fairly recent, analog signal technology must have&lt;br /&gt;
come first.  Nope. Digital is actually older.  The telegraph predates&lt;br /&gt;
the telephone by half a century and was already fully mechanically&lt;br /&gt;
automated by the 1860s, sending coded, multiplexed digital signals&lt;br /&gt;
long distances. You know... Tickertape. Harry Nyquist of Bell Labs was&lt;br /&gt;
researching telegraph pulse transmission when he published his&lt;br /&gt;
description of what later became known as the Nyquist frequency, the&lt;br /&gt;
core concept of the sampling theorem.  Now, it&#039;s true the telegraph&lt;br /&gt;
was transmitting symbolic information, text, not a digitized analog&lt;br /&gt;
signal, but with the advent of the telephone and radio, analog and&lt;br /&gt;
digital signal technology progressed rapidly and side-by-side.&lt;br /&gt;
&lt;br /&gt;
Audio had always been manipulated as an analog signal because... well,&lt;br /&gt;
gee it&#039;s so much easier.  A second-order lowpass filter, for example,&lt;br /&gt;
requires two passive components.  An all-analog short-time Fourier&lt;br /&gt;
transform, a few hundred.  Well, maybe a thousand if you want to build&lt;br /&gt;
something really fancy [bang on the 3585].  Processing signals&lt;br /&gt;
digitally requires millions to billions of transistors running at&lt;br /&gt;
microwave frequencies, support hardware at the very least to digitize and&lt;br /&gt;
reconstruct the analog signals, a complete software ecosystem for&lt;br /&gt;
programming and controlling that billion-transistor juggernaut,&lt;br /&gt;
digital storage just in case you want to keep any of those bits for&lt;br /&gt;
later...&lt;br /&gt;
&lt;br /&gt;
So we come to the conclusion that analog is the only practical way to&lt;br /&gt;
do much with audio... well, unless you happen to have a billion&lt;br /&gt;
transistors and all the other things just lying around. And since we&lt;br /&gt;
do, digital signal processing becomes very attractive.&lt;br /&gt;
&lt;br /&gt;
For one thing, analog componentry just doesn&#039;t have the flexibility of&lt;br /&gt;
a general purpose computer.  Adding a new function to this&lt;br /&gt;
beast... yeah, it&#039;s probably not going to happen.  On a digital&lt;br /&gt;
processor though, just write a new program.  Software isn&#039;t trivial,&lt;br /&gt;
but it is a lot easier.&lt;br /&gt;
&lt;br /&gt;
Perhaps more importantly, though, every analog component is an&lt;br /&gt;
approximation. There&#039;s no such thing as a perfect transistor, or a&lt;br /&gt;
perfect inductor, or a perfect capacitor.  In analog, every component&lt;br /&gt;
adds noise and distortion, usually not very much, but it adds up. Just&lt;br /&gt;
transmitting an analog signal, especially over long distances,&lt;br /&gt;
progressively, measurably, irretrievably corrupts it.  Besides, all of&lt;br /&gt;
those single-purpose analog components take up a lot of space.  Two&lt;br /&gt;
lines of code on the billion transistors back here can implement a&lt;br /&gt;
filter that would require an inductor the size of a refrigerator.&lt;br /&gt;
&lt;br /&gt;
Digital systems don&#039;t have these drawbacks.  Digital signals can be&lt;br /&gt;
stored, copied, manipulated and transmitted without adding any noise&lt;br /&gt;
or distortion. We do use lossy algorithms from time to time, but the&lt;br /&gt;
only unavoidably non-ideal steps are digitization and reconstruction,&lt;br /&gt;
where digital has to interface with all of that messy analog.  Messy&lt;br /&gt;
or not, modern conversion stages are very, very good.  By the&lt;br /&gt;
standards of our ears, we can consider them practically lossless as&lt;br /&gt;
well.&lt;br /&gt;
&lt;br /&gt;
With a little extra hardware, then, most of which is now small and&lt;br /&gt;
inexpensive due to our modern industrial infrastructure, digital audio&lt;br /&gt;
is the clear winner over analog.  So let us then go about storing it,&lt;br /&gt;
copying it, manipulating it, and transmitting it.&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
==Raw (digital audio) meat==&lt;br /&gt;
&lt;br /&gt;
Pulse Code Modulation is the most common representation for &lt;br /&gt;
raw audio.  Other practical representations do exist, for example the&lt;br /&gt;
Sigma-Delta coding used by the SACD, which is a form of Pulse Density&lt;br /&gt;
Modulation.  That said, Pulse Code Modulation is far&lt;br /&gt;
and away dominant, mainly because it&#039;s so mathematically&lt;br /&gt;
convenient.  An audio engineer can spend an entire career without&lt;br /&gt;
running into anything else.&lt;br /&gt;
&lt;br /&gt;
PCM encoding can be characterized in three parameters, making it easy&lt;br /&gt;
to account for every possible PCM variant with mercifully little&lt;br /&gt;
hassle.&lt;br /&gt;
&lt;br /&gt;
===sample rate===&lt;br /&gt;
&lt;br /&gt;
The first parameter is the sampling rate.  The highest frequency an&lt;br /&gt;
encoding can represent is called the Nyquist Frequency.  The Nyquist&lt;br /&gt;
frequency of PCM happens to be exactly half the sampling rate.&lt;br /&gt;
Therefore the sampling rate directly determines the highest possible&lt;br /&gt;
frequency in the digitized signal.&lt;br /&gt;
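&lt;br /&gt;
As a tiny illustrative sketch (the rate and pitch here are arbitrary&lt;br /&gt;
choices, not anything from the demos), this is all it takes to&lt;br /&gt;
synthesize one second of a 1kHz sine as PCM samples in C:&lt;br /&gt;
&lt;br /&gt;
 #include &amp;lt;math.h&amp;gt;&lt;br /&gt;
 #include &amp;lt;stdio.h&amp;gt;&lt;br /&gt;
 #define RATE 48000  /* samples per second; Nyquist = 24kHz */&lt;br /&gt;
 int main(void){&lt;br /&gt;
   int i;&lt;br /&gt;
   for(i=0;i&amp;lt;RATE;i++){&lt;br /&gt;
     double t = (double)i/RATE;  /* the instant sample i represents */&lt;br /&gt;
     printf(&amp;quot;%f\n&amp;quot;, sin(2.*M_PI*1000.*t));&lt;br /&gt;
   }&lt;br /&gt;
   return 0;&lt;br /&gt;
 }&lt;br /&gt;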
&lt;br /&gt;
Analog telephone systems traditionally band-limited voice channels to&lt;br /&gt;
just under 4kHz, so digital telephony and most classic voice&lt;br /&gt;
applications use an 8kHz sampling rate, the minimum sampling rate&lt;br /&gt;
necessary to capture the entire bandwidth of a 4kHz channel.  This is&lt;br /&gt;
what an 8kHz sampling rate sounds like--- a bit muffled but perfectly&lt;br /&gt;
intelligible for voice.  This is the lowest sampling rate that&#039;s ever&lt;br /&gt;
been used widely in practice.&lt;br /&gt;
&lt;br /&gt;
From there, as power, and memory, and storage increased, consumer&lt;br /&gt;
computer hardware went to offering 11, and then 16, and then 22, and&lt;br /&gt;
then 32kHz sampling.  With each increase in the sampling rate and the&lt;br /&gt;
Nyquist frequency, it&#039;s obvious that the high end becomes a little&lt;br /&gt;
clearer and the sound more natural.&lt;br /&gt;
&lt;br /&gt;
The Compact Disc uses a 44.1kHz sampling rate, which is again slightly&lt;br /&gt;
better than 32kHz, but the gains are becoming less distinct.  44.1kHz&lt;br /&gt;
is a bit of an oddball choice, a holdover from mastering early&lt;br /&gt;
digital recordings on video-based equipment, but the huge success of&lt;br /&gt;
the CD has made it a common rate.&lt;br /&gt;
&lt;br /&gt;
The most common hi-fidelity sampling rate aside from the CD is 48kHz.&lt;br /&gt;
There&#039;s virtually no audible difference between the two.  This video,&lt;br /&gt;
or at least the original version of it, was shot and produced with&lt;br /&gt;
48kHz audio, which happens to be the original standard for&lt;br /&gt;
high-fidelity audio with video.&lt;br /&gt;
&lt;br /&gt;
Super-hi-fidelity sampling rates of 88, and 96, and 192kHz have also&lt;br /&gt;
appeared. The reason for the sampling rates beyond 48kHz isn&#039;t to&lt;br /&gt;
extend the audible high frequencies further. It&#039;s for a different&lt;br /&gt;
reason.&lt;br /&gt;
&lt;br /&gt;
Stepping back for just a second, the French mathematician Jean&lt;br /&gt;
Baptiste Joseph Fourier showed that we can also think of signals like&lt;br /&gt;
audio as a set of component frequencies.  This frequency domain&lt;br /&gt;
representation is equivalent to the time representation; the signal is&lt;br /&gt;
exactly the same, we&#039;re just looking at it a different way.  Here we see the&lt;br /&gt;
frequency domain representation of a hypothetical analog signal we&lt;br /&gt;
intend to digitally sample.&lt;br /&gt;
&lt;br /&gt;
The sampling theorem tells us two essential things about the sampling&lt;br /&gt;
process. First, that a digital signal can&#039;t represent any&lt;br /&gt;
frequencies above the Nyquist frequency. Second, and this is the new&lt;br /&gt;
part, if we don&#039;t remove those frequencies with a lowpass filter&lt;br /&gt;
before sampling, the sampling process will fold them down into the&lt;br /&gt;
representable frequency range as aliasing distortion.&lt;br /&gt;
&lt;br /&gt;
Aliasing, in a nutshell, sounds freakin&#039; awful, so it&#039;s essential to&lt;br /&gt;
remove any beyond-Nyquist frequencies before sampling and after&lt;br /&gt;
reconstruction.&lt;br /&gt;
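&lt;br /&gt;
Folding is easy to watch happen in a couple of lines of C. In this toy&lt;br /&gt;
sketch (my own example, not one of the bench demos), a 30kHz cosine&lt;br /&gt;
sampled at 48kHz produces exactly the same sample values as its 18kHz&lt;br /&gt;
alias, 48 minus 30:&lt;br /&gt;
&lt;br /&gt;
 #include &amp;lt;math.h&amp;gt;&lt;br /&gt;
 #include &amp;lt;stdio.h&amp;gt;&lt;br /&gt;
 int main(void){&lt;br /&gt;
   double fs = 48000., f = 30000., alias = fs - f;  /* 18kHz */&lt;br /&gt;
   int n;&lt;br /&gt;
   for(n=0;n&amp;lt;8;n++)  /* the two columns match, sample for sample */&lt;br /&gt;
     printf(&amp;quot;%f %f\n&amp;quot;, cos(2.*M_PI*f*n/fs), cos(2.*M_PI*alias*n/fs));&lt;br /&gt;
   return 0;&lt;br /&gt;
 }&lt;br /&gt;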
&lt;br /&gt;
Human frequency perception is considered to extend to about 20kHz. In&lt;br /&gt;
44.1 or 48kHz sampling, the lowpass before the sampling stage has to&lt;br /&gt;
be extremely sharp to avoid cutting any audible frequencies below&lt;br /&gt;
20kHz but still not allow frequencies above the Nyquist to leak&lt;br /&gt;
forward into the sampling process.  This is a difficult filter to&lt;br /&gt;
build and no practical filter succeeds completely. If the sampling&lt;br /&gt;
rate is 96kHz or 192kHz on the other hand, the lowpass has an extra&lt;br /&gt;
octave or two for its transition band. This is a much easier filter to&lt;br /&gt;
build.  Sampling rates beyond 48kHz are actually one of those messy&lt;br /&gt;
analog stage compromises.&lt;br /&gt;
&lt;br /&gt;
===sample format===&lt;br /&gt;
&lt;br /&gt;
The second fundamental PCM parameter is the sample format, that is,&lt;br /&gt;
the format of each digital number.  A number is a number, but a number&lt;br /&gt;
can be represented in bits in a number of different ways.&lt;br /&gt;
&lt;br /&gt;
Early PCM was eight bit linear, encoded as an unsigned byte.  The&lt;br /&gt;
dynamic range is limited to about 50dB and the quantization noise, as&lt;br /&gt;
you can hear, is pretty severe.  Eight bit audio is vanishingly rare&lt;br /&gt;
today.&lt;br /&gt;
&lt;br /&gt;
Digital telephony typically uses one of two related non-linear eight&lt;br /&gt;
bit encodings called A-law and mu-law. These formats encode a roughly&lt;br /&gt;
14 bit dynamic range into eight bits by spacing the higher amplitude&lt;br /&gt;
values farther apart. A-law and mu-law obviously improve quantization&lt;br /&gt;
noise compared to linear 8-bit, and voice harmonics especially hide&lt;br /&gt;
the remaining quantization noise well. All three eight bit encodings,&lt;br /&gt;
linear, A-law, and mu-law, are typically paired with an 8kHz sampling&lt;br /&gt;
rate, though I&#039;m demonstrating them here at 48kHz.&lt;br /&gt;
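&lt;br /&gt;
The continuous curve behind mu-law fits in a few lines of C. This is&lt;br /&gt;
only a sketch of the underlying formula; real G.711 codecs use a&lt;br /&gt;
segmented, piecewise-linear approximation of the same curve:&lt;br /&gt;
&lt;br /&gt;
 #include &amp;lt;math.h&amp;gt;&lt;br /&gt;
 #define MU 255.&lt;br /&gt;
 /* map a linear sample in [-1,1] onto [-1,1], spending more of the&lt;br /&gt;
    output range on small amplitudes; quantizing the result to eight&lt;br /&gt;
    bits then spaces the high-amplitude values farther apart */&lt;br /&gt;
 static double mulaw(double x){&lt;br /&gt;
   double sign = x &amp;lt; 0. ? -1. : 1.;&lt;br /&gt;
   return sign * log(1. + MU*fabs(x)) / log(1. + MU);&lt;br /&gt;
 }&lt;br /&gt;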
&lt;br /&gt;
Most modern PCM uses 16 or 24 bit two&#039;s-complement signed integers to&lt;br /&gt;
encode the range from negative infinity to zero decibels in 16 or 24&lt;br /&gt;
bits of precision. The maximum absolute value corresponds to zero decibels.&lt;br /&gt;
As with all the sample formats so far, signals beyond zero decibels and thus&lt;br /&gt;
beyond the maximum representable range are clipped.&lt;br /&gt;
&lt;br /&gt;
In mixing and mastering, it&#039;s not unusual to use floating point&lt;br /&gt;
numbers for PCM instead of integers.  A 32 bit IEEE 754 float, that&#039;s&lt;br /&gt;
the normal kind of floating point you see on current computers, has 24&lt;br /&gt;
bits of resolution, but a seven bit floating point exponent increases&lt;br /&gt;
the representable range.  Floating point usually represents zero&lt;br /&gt;
decibels as +/-1.0, and because floats can obviously represent&lt;br /&gt;
considerably beyond that, temporarily exceeding zero decibels during&lt;br /&gt;
the mixing process doesn&#039;t cause clipping.  Floating point PCM takes&lt;br /&gt;
up more space, so it tends to be used only as an intermediate&lt;br /&gt;
production format.&lt;br /&gt;
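&lt;br /&gt;
The final conversion from floating point back to 16 bit integers is&lt;br /&gt;
where anything still beyond zero decibels at the end of production&lt;br /&gt;
finally gets clipped. A minimal sketch of that step:&lt;br /&gt;
&lt;br /&gt;
 #include &amp;lt;stdint.h&amp;gt;&lt;br /&gt;
 static int16_t float_to_s16(float v){&lt;br /&gt;
   if(v &amp;gt;  1.f) v =  1.f;  /* clip anything above zero decibels */&lt;br /&gt;
   if(v &amp;lt; -1.f) v = -1.f;&lt;br /&gt;
   return (int16_t)(v * 32767.f);&lt;br /&gt;
 }&lt;br /&gt;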
&lt;br /&gt;
Lastly, most general purpose computers still read and&lt;br /&gt;
write data in octet bytes, so it&#039;s important to remember that samples&lt;br /&gt;
bigger than eight bits can be in big or little endian order, and both&lt;br /&gt;
endiannesses are common.  For example, Microsoft WAV files are little-endian,&lt;br /&gt;
and Apple AIFC files tend to be big-endian.  Be aware of it.&lt;br /&gt;
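&lt;br /&gt;
Reading samples byte by byte sidesteps the host machine&#039;s own byte&lt;br /&gt;
order entirely; a sketch of both decodes:&lt;br /&gt;
&lt;br /&gt;
 #include &amp;lt;stdint.h&amp;gt;&lt;br /&gt;
 static int16_t le16(const uint8_t *b){  /* e.g. WAV */&lt;br /&gt;
   return (int16_t)(b[0] | (b[1] &amp;lt;&amp;lt; 8));&lt;br /&gt;
 }&lt;br /&gt;
 static int16_t be16(const uint8_t *b){  /* e.g. AIFF/AIFC */&lt;br /&gt;
   return (int16_t)((b[0] &amp;lt;&amp;lt; 8) | b[1]);&lt;br /&gt;
 }&lt;br /&gt;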
&lt;br /&gt;
===channels===&lt;br /&gt;
&lt;br /&gt;
The third PCM parameter is the number of channels.  The convention in&lt;br /&gt;
raw PCM is to encode multiple channels by interleaving the samples of&lt;br /&gt;
each channel together into a single stream.  Straightforward and extensible.&lt;br /&gt;
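&lt;br /&gt;
Interleaving keeps addressing trivial: with C channels, sample number i&lt;br /&gt;
of channel c sits at stream index i*C + c. A stereo sketch:&lt;br /&gt;
&lt;br /&gt;
 /* channel 0 is conventionally left, channel 1 right */&lt;br /&gt;
 static float left_sample (const float *stream, int i){ return stream[2*i];     }&lt;br /&gt;
 static float right_sample(const float *stream, int i){ return stream[2*i + 1]; }&lt;br /&gt;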
&lt;br /&gt;
===done!===&lt;br /&gt;
&lt;br /&gt;
And that&#039;s it!  That describes every PCM representation ever.  Done.&lt;br /&gt;
Digital audio is _so easy_!  There&#039;s more to do of course, but at this&lt;br /&gt;
point we&#039;ve got a nice useful chunk of audio data, so let&#039;s get some&lt;br /&gt;
video too.&lt;br /&gt;
&lt;br /&gt;
==Video vegetables (they&#039;re good for you!)==&lt;br /&gt;
&lt;br /&gt;
One could think of video as being like audio but with two additional&lt;br /&gt;
spatial dimensions, X and Y, in addition to the dimension of time.&lt;br /&gt;
This is mathematically sound. The Sampling Theorem applies to all&lt;br /&gt;
three video dimensions just as it does to the single time dimension of&lt;br /&gt;
audio.&lt;br /&gt;
&lt;br /&gt;
Audio and video are obviously quite different in practice. For one,&lt;br /&gt;
compared to audio, video is huge. Raw CD audio is about 1.4 megabits&lt;br /&gt;
per second. Raw 1080i HD video is over 700 megabits per second. That&#039;s&lt;br /&gt;
more than 500 times more data to capture, process and store per&lt;br /&gt;
second.  By Moore&#039;s law... that&#039;s... let&#039;s see... roughly eight&lt;br /&gt;
doublings times two years, so yeah, computers requiring about an extra&lt;br /&gt;
fifteen years to handle raw video after getting raw audio down pat was&lt;br /&gt;
about right.&lt;br /&gt;
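&lt;br /&gt;
To see where the &amp;quot;over 700 megabits&amp;quot; figure comes from, here&#039;s a&lt;br /&gt;
back-of-the-envelope sketch (it assumes 1920x1080 luma at 30 full&lt;br /&gt;
frames per second with 8 bit 4:2:0 sampling; tweak the numbers for&lt;br /&gt;
other formats):&lt;br /&gt;
&lt;br /&gt;
 #include &amp;lt;stdio.h&amp;gt;&lt;br /&gt;
 int main(void){&lt;br /&gt;
   /* 4:2:0 stores 12 bits per pixel: 8 luma plus 4 shared chroma */&lt;br /&gt;
   double bits_per_sec = 1920.*1080.*30.*12.;&lt;br /&gt;
   printf(&amp;quot;%.0f Mbit/s\n&amp;quot;, bits_per_sec/1e6);  /* about 746 */&lt;br /&gt;
   return 0;&lt;br /&gt;
 }&lt;br /&gt;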
&lt;br /&gt;
Basic raw video is also just more complex than basic raw audio. The&lt;br /&gt;
sheer volume of data currently necessitates a representation more&lt;br /&gt;
efficient than the linear PCM used for audio.  In addition, electronic&lt;br /&gt;
video comes almost entirely from broadcast television alone, and the&lt;br /&gt;
standards committees that govern broadcast video have always been very&lt;br /&gt;
concerned with backward compatibility.  Up until just last year in the&lt;br /&gt;
US, a sixty year old black and white television could still show a&lt;br /&gt;
normal analog television broadcast.  That&#039;s actually a really neat&lt;br /&gt;
trick.&lt;br /&gt;
&lt;br /&gt;
The downside to backward compatibility is that once a detail makes it&lt;br /&gt;
into a standard, you can&#039;t ever really throw it out again. Electronic&lt;br /&gt;
video has never started over from scratch the way audio has multiple&lt;br /&gt;
times.  Sixty years&#039; worth of clever but obsolete hacks necessitated by&lt;br /&gt;
the passing technology of a given era have built up into quite a pile,&lt;br /&gt;
and because digital standards also come from broadcast television, all&lt;br /&gt;
these eldritch hacks have been brought forward into the digital&lt;br /&gt;
standards as well.&lt;br /&gt;
&lt;br /&gt;
In short, there are a whole lot more details involved in digital video&lt;br /&gt;
than there were with audio. There&#039;s no hope of covering them&lt;br /&gt;
all completely here, so we&#039;ll cover the broad fundamentals.&lt;br /&gt;
&lt;br /&gt;
===resolution and aspect===&lt;br /&gt;
&lt;br /&gt;
The most obvious raw video parameters are the width and height of the&lt;br /&gt;
picture in pixels. As simple as that may sound, the pixel dimensions&lt;br /&gt;
alone don&#039;t actually specify the absolute width and height of the&lt;br /&gt;
picture, as most broadcast-derived video doesn&#039;t use square pixels.&lt;br /&gt;
The number of scanlines in a broadcast image was fixed, but the&lt;br /&gt;
effective number of horizontal pixels was a function of channel&lt;br /&gt;
bandwidth. Effective horizontal resolution could result in pixels that&lt;br /&gt;
were either narrower or wider than the spacing between scanlines.&lt;br /&gt;
&lt;br /&gt;
Standards have generally specified that digitally sampled video should&lt;br /&gt;
reflect the real resolution of the original analog source, so a large&lt;br /&gt;
amount of digital video also uses non-square pixels. For example, a&lt;br /&gt;
normal 4:3 aspect NTSC DVD is typically encoded with a display&lt;br /&gt;
resolution of 704 by 480, a ratio wider than 4:3.  In this case, the&lt;br /&gt;
pixels themselves are assigned an aspect ratio of 10:11, making them&lt;br /&gt;
taller than they are wide and narrowing the image horizontally to the&lt;br /&gt;
correct aspect.  Such an image has to be resampled to show properly on&lt;br /&gt;
a digital display with square pixels.&lt;br /&gt;
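&lt;br /&gt;
The arithmetic is worth checking once by hand. In the DVD example&lt;br /&gt;
above, 704 pixels that are each 10/11 as wide as they are tall squeeze&lt;br /&gt;
down to the width of 640 square pixels, and 640 by 480 is exactly 4:3:&lt;br /&gt;
&lt;br /&gt;
 #include &amp;lt;stdio.h&amp;gt;&lt;br /&gt;
 int main(void){&lt;br /&gt;
   double par = 10./11.;           /* pixel aspect ratio, width:height */&lt;br /&gt;
   double w = 704.*par, h = 480.;  /* 640 square pixels wide */&lt;br /&gt;
   printf(&amp;quot;%.0f x %.0f = %.4f:1\n&amp;quot;, w, h, w/h);  /* 1.3333:1 */&lt;br /&gt;
   return 0;&lt;br /&gt;
 }&lt;br /&gt;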
&lt;br /&gt;
===frame rate and interlacing===&lt;br /&gt;
&lt;br /&gt;
The second obvious video parameter is the frame rate, the number of&lt;br /&gt;
full frames per second.  Several standard frame rates are in active&lt;br /&gt;
use. Digital video, in one form or another, can use all of them.  Or,&lt;br /&gt;
any other frame rate.  Or even variable rates where the frame rate&lt;br /&gt;
changes adaptively over the course of the video. The higher the frame&lt;br /&gt;
rate, the smoother the motion, and that brings us, unfortunately, to&lt;br /&gt;
interlacing.&lt;br /&gt;
&lt;br /&gt;
In the very earliest days of broadcast video, engineers sought the&lt;br /&gt;
fastest practical framerate to smooth motion and to minimize flicker&lt;br /&gt;
on phosphor-based CRTs.  They were also under pressure to use the&lt;br /&gt;
least possible bandwidth for the highest resolution and fastest frame&lt;br /&gt;
rate.  Their solution was to interlace the video where the even lines&lt;br /&gt;
are sent in one pass and the odd lines in the next.  Each pass is&lt;br /&gt;
called a field and two fields sort of produce one complete frame.&lt;br /&gt;
&amp;quot;Sort of&amp;quot;, because the even and odd fields aren&#039;t actually from the&lt;br /&gt;
same source frame.  In a 60 field per second picture, the source frame&lt;br /&gt;
rate is actually 60 full frames per second, and half of each frame,&lt;br /&gt;
every other line, is simply discarded.  This is why we can&#039;t&lt;br /&gt;
deinterlace a video simply by combining two fields into one frame;&lt;br /&gt;
they&#039;re not actually from one frame to begin with.&lt;br /&gt;
&lt;br /&gt;
===gamma===&lt;br /&gt;
&lt;br /&gt;
The cathode ray tube was the only available display technology for&lt;br /&gt;
most of the history of electronic video. A CRT&#039;s output brightness is&lt;br /&gt;
nonlinear, approximately equal to the input controlling voltage raised&lt;br /&gt;
to the 2.5th power. This exponent, 2.5, is designated gamma, and so&lt;br /&gt;
it&#039;s often referred to as the gamma of a display.  Cameras, though,&lt;br /&gt;
are linear, and if you feed a CRT a linear input signal, it looks a&lt;br /&gt;
bit like this.&lt;br /&gt;
&lt;br /&gt;
As there were originally to be very few cameras, which were&lt;br /&gt;
fantastically expensive anyway, and hopefully many, many television&lt;br /&gt;
sets, which had best be as inexpensive as possible, engineers decided to&lt;br /&gt;
add the necessary gamma correction circuitry to the cameras rather&lt;br /&gt;
than the sets. Video transmitted over the airwaves would thus have a&lt;br /&gt;
nonlinear intensity using the inverse of the set&#039;s gamma exponent, so that&lt;br /&gt;
once a camera&#039;s signal was finally displayed on the CRT, the overall&lt;br /&gt;
response of the system from camera to set was back to linear again.&lt;br /&gt;
&lt;br /&gt;
Almost.&lt;br /&gt;
&lt;br /&gt;
There were also two other tweaks. A television camera actually uses a&lt;br /&gt;
gamma exponent that&#039;s the inverse of 2.2, not 2.5.  That&#039;s just a&lt;br /&gt;
correction for viewing in a dim environment. Also, the exponential&lt;br /&gt;
curve transitions to a linear ramp near black.  That&#039;s just an old&lt;br /&gt;
hack for suppressing sensor noise in the camera.&lt;br /&gt;
&lt;br /&gt;
Gamma correction also had a lucky benefit. It just so happens that the&lt;br /&gt;
human eye has a perceptual gamma of about 3.  This is relatively close&lt;br /&gt;
to the CRT&#039;s gamma of 2.5. An image using gamma correction devotes&lt;br /&gt;
more resolution to lower intensities, where the eye happens to have&lt;br /&gt;
its finest intensity discrimination, and therefore uses the available&lt;br /&gt;
scale resolution more efficiently.  Although CRTs are currently&lt;br /&gt;
vanishing, a standard sRGB computer display still uses a nonlinear&lt;br /&gt;
intensity curve similar to television, with a linear ramp near black,&lt;br /&gt;
followed by an exponential curve with a gamma exponent of 2.4. This&lt;br /&gt;
encodes a sixteen bit linear range down into eight bits.&lt;br /&gt;
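&lt;br /&gt;
The sRGB curve is simple enough to state as code; here&#039;s a sketch of&lt;br /&gt;
the standard encode direction, taking a linear intensity in [0,1]:&lt;br /&gt;
&lt;br /&gt;
 #include &amp;lt;math.h&amp;gt;&lt;br /&gt;
 static double srgb_encode(double linear){&lt;br /&gt;
   if(linear &amp;lt;= 0.0031308)&lt;br /&gt;
     return 12.92*linear;                     /* linear ramp near black */&lt;br /&gt;
   return 1.055*pow(linear, 1./2.4) - 0.055;  /* gamma 2.4 segment */&lt;br /&gt;
 }&lt;br /&gt;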
&lt;br /&gt;
===color and colorspace===&lt;br /&gt;
&lt;br /&gt;
The human eye has three apparent color channels, red, green, and blue,&lt;br /&gt;
and most displays use these three colors as additive primaries to&lt;br /&gt;
produce a full range of color output.  The primary pigments in&lt;br /&gt;
printing are Cyan, Magenta, and Yellow for the same reason; pigments&lt;br /&gt;
are subtractive, and each of these pigments subtracts one pure color&lt;br /&gt;
from reflected light.  Cyan subtracts red, magenta subtracts green, and&lt;br /&gt;
yellow subtracts blue.&lt;br /&gt;
&lt;br /&gt;
Video can be and sometimes is represented with red, green, and blue&lt;br /&gt;
color channels, but RGB video is atypical. The human eye is far more&lt;br /&gt;
sensitive to luminosity than it is to color, and RGB tends to spread&lt;br /&gt;
the energy of an image across all three color channels.  That is, the&lt;br /&gt;
red plane looks like a red version of the original picture, the green&lt;br /&gt;
plane looks like a green version of the original picture, and the blue&lt;br /&gt;
plane looks like a blue version of the original picture.  Black and&lt;br /&gt;
white times three.  Not efficient.&lt;br /&gt;
&lt;br /&gt;
For those reasons and because, oh hey, television just happened to&lt;br /&gt;
start out as black and white anyway, video usually is represented as a&lt;br /&gt;
high resolution luma channel, the black &amp;amp; white, along with&lt;br /&gt;
additional, often lower resolution chroma channels, the color. The&lt;br /&gt;
luma channel, Y, is produced by weighting and then adding the separate&lt;br /&gt;
red, green and blue signals.  The chroma channels U and V are then&lt;br /&gt;
produced by subtracting the luma signal from blue and the luma signal&lt;br /&gt;
from red.&lt;br /&gt;
&lt;br /&gt;
When YUV is scaled, offset and quantized for digital video, it&#039;s&lt;br /&gt;
usually more correctly called Y&#039;CbCr, but the more generic term YUV is&lt;br /&gt;
widely used to describe all the analog and digital variants of this&lt;br /&gt;
color model.&lt;br /&gt;
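&lt;br /&gt;
As a sketch of the split, here&#039;s the luma weighting using the classic&lt;br /&gt;
BT.601 coefficients (digital Y&#039;CbCr adds scaling and offsets that are&lt;br /&gt;
omitted here):&lt;br /&gt;
&lt;br /&gt;
 typedef struct { double y, u, v; } yuv;&lt;br /&gt;
 static yuv rgb_to_yuv(double r, double g, double b){&lt;br /&gt;
   yuv out;&lt;br /&gt;
   out.y = 0.299*r + 0.587*g + 0.114*b;  /* weighted sum: luma */&lt;br /&gt;
   out.u = b - out.y;                    /* blue difference */&lt;br /&gt;
   out.v = r - out.y;                    /* red difference */&lt;br /&gt;
   return out;&lt;br /&gt;
 }&lt;br /&gt;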
&lt;br /&gt;
===chroma subsampling===&lt;br /&gt;
&lt;br /&gt;
The U and V chroma channels can have the same resolution as the Y&lt;br /&gt;
channel, but because the human eye has far less spatial color&lt;br /&gt;
resolution than spatial luminosity resolution, chroma resolution is&lt;br /&gt;
usually halved or even quartered in the horizontal direction, the&lt;br /&gt;
vertical direction, or both, usually without any significant impact on the&lt;br /&gt;
apparent raw image quality.  Practically every possible subsampling&lt;br /&gt;
variant has been used at one time or another, but the common choices&lt;br /&gt;
today are 4:4:4 video, which isn&#039;t actually subsampled at all, 4:2:2 video in&lt;br /&gt;
which the horizontal resolution of the U and V channels is halved, and&lt;br /&gt;
most common of all, 4:2:0 video in which both the horizontal and vertical&lt;br /&gt;
resolutions of the chroma channels are halved, resulting in U and V&lt;br /&gt;
planes that are each one quarter the size of Y.&lt;br /&gt;
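&lt;br /&gt;
The storage win is easy to quantify. A sketch for 8 bit 4:2:0,&lt;br /&gt;
assuming even frame dimensions:&lt;br /&gt;
&lt;br /&gt;
 #include &amp;lt;stddef.h&amp;gt;&lt;br /&gt;
 static size_t frame_bytes_420(int w, int h){&lt;br /&gt;
   size_t y  = (size_t)w*h;          /* full resolution luma plane */&lt;br /&gt;
   size_t uv = (size_t)(w/2)*(h/2);  /* each chroma plane: quarter size */&lt;br /&gt;
   return y + 2*uv;                  /* 1.5 bytes per pixel overall */&lt;br /&gt;
 }&lt;br /&gt;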
&lt;br /&gt;
The terms 4:2:2, 4:2:0, 4:1:1 and so on and so forth, aren&#039;t complete&lt;br /&gt;
descriptions of a chroma subsampling. There are multiple possible ways&lt;br /&gt;
to position the chroma pixels relative to luma, and again, several&lt;br /&gt;
variants are in active use for each subsampling.  For example, motion&lt;br /&gt;
JPEG, MPEG-1 video, MPEG-2 video, DV, Theora and WebM all use or can&lt;br /&gt;
use 4:2:0 subsampling, but they site the chroma pixels three different&lt;br /&gt;
ways.&lt;br /&gt;
&lt;br /&gt;
Motion JPEG, MPEG1 video, Theora and WebM all site chroma pixels&lt;br /&gt;
between luma pixels both horizontally and vertically.&lt;br /&gt;
&lt;br /&gt;
MPEG2 sites chroma pixels between lines, but horizontally aligned with&lt;br /&gt;
every other luma pixel. Interlaced modes complicate things somewhat,&lt;br /&gt;
resulting in a siting arrangement that&#039;s a tad bizarre.&lt;br /&gt;
&lt;br /&gt;
And finally PAL-DV, which is always interlaced, places the chroma&lt;br /&gt;
pixels in the same position as every other luma pixel in the&lt;br /&gt;
horizontal direction, and vertically alternates chroma channel on&lt;br /&gt;
each line.&lt;br /&gt;
&lt;br /&gt;
That&#039;s just 4:2:0 video. I&#039;ll leave the other subsamplings as homework for the&lt;br /&gt;
viewer.  Got the basic idea, moving on.&lt;br /&gt;
&lt;br /&gt;
===pixel formats===&lt;br /&gt;
&lt;br /&gt;
In audio, we always represent multiple channels in a PCM stream by&lt;br /&gt;
interleaving the samples from each channel in order. Video uses both&lt;br /&gt;
packed formats that interleave the color channels, as well as planar&lt;br /&gt;
formats that keep the pixels from each channel together in separate&lt;br /&gt;
planes stacked in order in the frame. There are at least 50 different formats in&lt;br /&gt;
these two broad categories with possibly ten or fifteen in common use.&lt;br /&gt;
&lt;br /&gt;
Each chroma subsampling and different bit-depth requires a different&lt;br /&gt;
packing arrangement, and so a different pixel format.  For a given&lt;br /&gt;
unique subsampling, there are usually also several equivalent formats&lt;br /&gt;
that consist of trivial channel order rearrangements or repackings due either to&lt;br /&gt;
convenience once-upon-a-time on some particular piece of hardware or&lt;br /&gt;
sometimes just good old-fashioned spite.&lt;br /&gt;
&lt;br /&gt;
Pixel formats are described by a unique name or fourcc code.  There&lt;br /&gt;
are quite a few of these and there&#039;s no sense going over each one now.&lt;br /&gt;
Google is your friend.  Be aware that fourcc codes for raw video&lt;br /&gt;
specify the pixel arrangement and chroma subsampling, but generally&lt;br /&gt;
don&#039;t imply anything certain about chroma siting or color space.  YV12&lt;br /&gt;
video, to pick one, can use JPEG, MPEG-2 or DV chroma siting, and any&lt;br /&gt;
one of several YUV colorspace definitions.&lt;br /&gt;
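&lt;br /&gt;
A fourcc, by the way, is nothing deeper than four ASCII characters&lt;br /&gt;
packed into a 32 bit integer; a sketch of the usual little-endian&lt;br /&gt;
packing:&lt;br /&gt;
&lt;br /&gt;
 #include &amp;lt;stdint.h&amp;gt;&lt;br /&gt;
 #define FOURCC(a,b,c,d) ((uint32_t)(a) | ((uint32_t)(b) &amp;lt;&amp;lt; 8) | \&lt;br /&gt;
                          ((uint32_t)(c) &amp;lt;&amp;lt; 16) | ((uint32_t)(d) &amp;lt;&amp;lt; 24))&lt;br /&gt;
 /* e.g. FOURCC(&#039;Y&#039;,&#039;V&#039;,&#039;1&#039;,&#039;2&#039;) names the YV12 pixel format */&lt;br /&gt;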
&lt;br /&gt;
===done!===&lt;br /&gt;
&lt;br /&gt;
That wraps up our not so quick and yet very incomplete tour of raw&lt;br /&gt;
video. The good news is we can already get quite a lot of real work&lt;br /&gt;
done using that overview. In plenty of situations, a frame of video&lt;br /&gt;
data is a frame of video data.  The details matter, greatly, when it&lt;br /&gt;
comes time to write software, but for now I am satisfied that the&lt;br /&gt;
esteemed viewer is broadly aware of the relevant issues.&lt;br /&gt;
&lt;br /&gt;
==Containers==&lt;br /&gt;
So. We have audio data. We have video data. What remains is the more&lt;br /&gt;
familiar non-signal data and straight up engineering that software&lt;br /&gt;
developers are used to, and plenty of it.&lt;br /&gt;
&lt;br /&gt;
Chunks of raw audio and video data have no externally visible&lt;br /&gt;
structure, but they&#039;re often uniformly sized.  We could just string&lt;br /&gt;
them together in a rigid pre-determined ordering for streaming and&lt;br /&gt;
storage, and some simple systems do approximately that. Compressed&lt;br /&gt;
frames though aren&#039;t necessarily a predictable size, and we usually want&lt;br /&gt;
some flexibility in using a range of different data types in streams.&lt;br /&gt;
If we string random formless data together, we lose the boundaries&lt;br /&gt;
that separate frames and don&#039;t necessarily know what data belongs to&lt;br /&gt;
which streams.  A stream needs some generalized structure to be&lt;br /&gt;
generally useful.&lt;br /&gt;
&lt;br /&gt;
In addition to our signal data, we also have our PCM and video&lt;br /&gt;
parameters.  There&#039;s probably plenty of other metadata we also want to&lt;br /&gt;
deal with, like audio tags and video chapters and subtitles, all&lt;br /&gt;
essential components of rich media.  It makes sense to place this&lt;br /&gt;
metadata, that is,  data about the data, within the media itself.&lt;br /&gt;
&lt;br /&gt;
Storing and structuring formless data and disparate metadata is the&lt;br /&gt;
job of a container.  Containers provide framing for the data blobs,&lt;br /&gt;
interleave and identify multiple data streams, provide timing&lt;br /&gt;
information, and store the metadata necessary to parse, navigate,&lt;br /&gt;
manipulate and present the media.  In general, any container can hold&lt;br /&gt;
any kind of data.  And data can be put into any container.&lt;br /&gt;
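&lt;br /&gt;
To make the framing idea concrete, here&#039;s a toy header, purely&lt;br /&gt;
illustrative and not any real container format: just enough structure&lt;br /&gt;
to find each blob, know which stream owns it, and know when to present&lt;br /&gt;
it:&lt;br /&gt;
&lt;br /&gt;
 #include &amp;lt;stdint.h&amp;gt;&lt;br /&gt;
 typedef struct {&lt;br /&gt;
   uint32_t stream_id;  /* which elementary stream the payload belongs to */&lt;br /&gt;
   uint64_t timestamp;  /* presentation time, in stream ticks */&lt;br /&gt;
   uint32_t length;     /* bytes of payload that follow this header */&lt;br /&gt;
 } frame_header;&lt;br /&gt;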
&lt;br /&gt;
==Credits==&lt;br /&gt;
&lt;br /&gt;
In the past thirty minutes, we&#039;ve covered digital audio, video, some&lt;br /&gt;
history, some math and a little engineering. We&#039;ve barely scratched the&lt;br /&gt;
surface, but it&#039;s time for a well earned break.&lt;br /&gt;
&lt;br /&gt;
There&#039;s so much more to talk about, so I hope you&#039;ll join me again in&lt;br /&gt;
our next episode.  Until then--- Cheers!&lt;br /&gt;
&lt;br /&gt;
Written by:&lt;br /&gt;
Christopher (Monty) Montgomery&lt;br /&gt;
and the Xiph.Org Community&lt;br /&gt;
&lt;br /&gt;
Intro, title and credits music:&lt;br /&gt;
&amp;quot;Boo Boo Coming&amp;quot;, by Joel Forrester&lt;br /&gt;
Performed by the Microscopic Septet&lt;br /&gt;
Used by permission of Cuneiform Records.&lt;br /&gt;
Original source track All Rights Reserved.&lt;br /&gt;
www.cuneiformrecords.com&lt;br /&gt;
&lt;br /&gt;
This Video Was Produced Entirely With Free and Open Source Software&lt;br /&gt;
&lt;br /&gt;
GNU&lt;br /&gt;
Linux&lt;br /&gt;
Fedora&lt;br /&gt;
Cinelerra&lt;br /&gt;
The Gimp&lt;br /&gt;
Audacity&lt;br /&gt;
Postfish&lt;br /&gt;
Gstreamer&lt;br /&gt;
&lt;br /&gt;
CC BY-NC-SA&lt;br /&gt;
A Co-Production of Xiph.Org and Red Hat Inc.&lt;br /&gt;
(C) 2010, Some Rights Reserved&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
==The making of…==&lt;br /&gt;
&lt;br /&gt;
===Equipment===&lt;br /&gt;
&lt;br /&gt;
Video:&lt;br /&gt;
Canon HV40 HDV camera w/ wide-angle lens operating in 24F shutter priority mode on a Tiltall tripod&lt;br /&gt;
Shutter set to Tv &amp;quot;24&amp;quot;, then exposure and white balance both calibrated to the white board (or a white piece of paper) and locked&lt;br /&gt;
Autofocus used for whiteboard scenes, manual focus used for several workshop scenes as the autofocus tended to hunt continuously in very low light.&lt;br /&gt;
No additional lighting kit.&lt;br /&gt;
&lt;br /&gt;
Audio:&lt;br /&gt;
Two Crown PCC160 boundary microphones placed on a table approximately 4-8 feet in front of the speaker&lt;br /&gt;
Portable Behringer (yes, ugh) 602 board used as microphone preamp, stereo output fed directly into HDV camera&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
Realtime capture to laptop via firewire using a gstreamer script&lt;br /&gt;
&lt;br /&gt;
Audio was postprocessed in Postfish using the declip (for the occasional overrange oops), deverb (remove room reverberation), multicompand (for noise gating), single compand (for volume levelling) and EQ filters. &lt;br /&gt;
&lt;br /&gt;
Video was postprocessed in gstreamer to scale and adjust saturation, gamma, and contrast.&lt;br /&gt;
&lt;br /&gt;
Takes were clipped out of the long HDV grabs using a patched Cinelerra and exported to working source files in raw YUV4MPEG and WAV format.  Editing was done on these raw intermediates in Cinelerra.&lt;br /&gt;
&lt;br /&gt;
Some video effects were performed by small one-off C programs that operated directly on the source clips; these effects were then stitched into the main video using Cinelerra.&lt;br /&gt;
&lt;br /&gt;
Text, caption overlays, and still conversions performed in the Gimp.&lt;br /&gt;
&lt;br /&gt;
Final renders encoded using libtheora (ptalarbvorm) and libvorbis via the Theora &#039;example encoder&#039;.&lt;/div&gt;</summary>
		<author><name>Mindspillage</name></author>
	</entry>
	<entry>
		<id>https://wiki.xiph.org/index.php?title=IDABC_Questionnaire_2009&amp;diff=10742</id>
		<title>IDABC Questionnaire 2009</title>
		<link rel="alternate" type="text/html" href="https://wiki.xiph.org/index.php?title=IDABC_Questionnaire_2009&amp;diff=10742"/>
		<updated>2009-11-24T22:33:13Z</updated>

		<summary type="html">&lt;p&gt;Mindspillage: /* Meeting and consultation */ &amp;quot;limited abilities&amp;quot; almost certainly means impairment here; changing language.&lt;/p&gt;
&lt;hr /&gt;
&lt;div&gt;&amp;lt;strong&amp;gt;This is a draft document. A work in progress. A scratchpad for ideas. It should not be widely circulated in this form.&amp;lt;/strong&amp;gt;&lt;br /&gt;
&lt;br /&gt;
= Context =&lt;br /&gt;
We received [http://lists.xiph.org/pipermail/theora/2009-November/002996.html an e-mail] from a consultant studying the suitability of Theora for use in &amp;quot;eGovernment&amp;quot;, on behalf of the [http://ec.europa.eu/idabc/ IDABC], an EU governmental agency responsible for &amp;quot;Interoperability&amp;quot; with an emphasis on open source.  The investigation is in the context of [http://ec.europa.eu/idabc/en/document/7728 European Interoperability Framework], about which there has been [http://www.computerworlduk.com/community/blogs/index.cfm?entryid=2620&amp;amp;blogid=14&amp;amp;pn=1 some real controversy].&lt;br /&gt;
&lt;br /&gt;
The method of assessment is the Common Assessment Method for Standards and Specifications, including the questions below.&lt;br /&gt;
&lt;br /&gt;
= CAMSS Questions =&lt;br /&gt;
== Part 4: Market Criteria ==&lt;br /&gt;
&lt;br /&gt;
This group of Market criteria analyses the formal specification in the scope of its market environment, and more precisely it examines the implementations of the formal specification and the market players. This implies identifying to which extent the formal specification benefits from market support and wide adoption, what are its level of maturity and its capacity of reusability.&lt;br /&gt;
&lt;br /&gt;
Market support is evaluated through an analysis of how many products implementing the formal specification exist, what their market share is and who their end-users are. The quality and the completeness (in case of partitioning) of the implementations of the formal specification can also be analysed. Availability of existing or planned mechanisms to assess conformity of implementations to the standard or to the specification could also be identified. The existence of at least one reference implementation (i.e.: mentioning a recognized certification process) - and of which one is an open source implementation - can also be relevant to the assessment. Wide adoption can also be assessed across domains (i.e.: public and private sectors), in an open environment, and/or in a similar field (i.e.: best practices).&lt;br /&gt;
&lt;br /&gt;
A formal specification is mature if it has been in use and development for long enough that most of its initial problems have been overcome and its underlying technology is well understood and well defined. Maturity is also assessed by identifying if all aspects of the formal specification are considered as validated by usage, (i.e.: if the formal specification is partitioned), and if the reported issues have been solved and documented.&lt;br /&gt;
&lt;br /&gt;
Reusability of a formal specification is enabled if it includes guidelines for its implementation in a given context. The identification of successful implementations of the standard or specification should focus on good practices in a similar field. Its incompatibility with related standards or specifications should also be taken into account.&lt;br /&gt;
&lt;br /&gt;
The ideas behind the Market Criteria can also be expressed in the form of the following questions:&lt;br /&gt;
&lt;br /&gt;
=== Market support ===&lt;br /&gt;
* Does the standard have strong support in the marketplace? &lt;br /&gt;
: &#039;&#039;&#039;Yes.  For example, among web browsers, support for Xiph&#039;s Ogg, Theora, and Vorbis standards is now included by default in Mozilla Firefox, Google Chrome, and the latest versions of Opera, representing hundreds of millions of installed users just in this market alone. On Windows, DirectShow filters exist which also enable all Windows applications that use the DirectShow framework to use Xiph&#039;s Ogg, Theora, and Vorbis standards. A QuickTime component exists which enables use of Xiph&#039;s Ogg, Theora, and Vorbis standards in all Mac OS X applications that make use of the QuickTime framework---which includes Safari/Webkit, iMovie, QuickTime, and many others.&#039;&#039;&#039;&lt;br /&gt;
* What products exist for this formal specification ? &lt;br /&gt;
: &#039;&#039;&#039;Theora is a video codec, and as such the required products are encoders, decoders, and transmission systems.  All three types of products are widely available for Theora.&#039;&#039;&#039;&lt;br /&gt;
* How many implementations of the formal specification are there? &lt;br /&gt;
: &#039;&#039;&#039;Xiph does not require implementors to acquire any license before implementing the specification.  Therefore, we do not have a definitive count of the number of implementations.  In addition to the reference implementation, which has been ported to most modern platforms and highly optimized for x86 and ARM CPUs and TI C64x+ DSPs, we are aware of a number of independent, conformant or mostly-conformant implementations.  These include two C decoders ([http://ffmpeg.org/ FFmpeg] and [http://sourceforge.jp/projects/qtheora/ QTheora]), a Java decoder ([http://www.theora.org/cortado/ Jheora]), a [http://www.wreckedgames.com/downloads/cSharpTheora.zip C# decoder], an [http://svn.xiph.org/trunk/theora-fpga/ FPGA decoder], and an [http://sourceforge.net/projects/elphel/ FPGA encoder].&#039;&#039;&#039;&lt;br /&gt;
* Are there products from different suppliers in the market that implement this formal specification ? &lt;br /&gt;
: &#039;&#039;&#039;Yes.  Corporations such as Atari, Canonical, DailyMotion, Elphel, Fluendo, Google, Mozilla, Novell, Opera, Red Hat, Sun Microsystems, and Ubisoft have supplied products with an implementation of the Theora standard.&#039;&#039;&#039;&lt;br /&gt;
* Are there many products readily available from a variety of suppliers? &lt;br /&gt;
: &#039;&#039;&#039;Yes. Theora has been deployed in embedded devices, security cameras, video games, video conferencing systems, web browsers, home theater systems, and many other products.  A complete, legal, open-source reference implementation can also be downloaded free of charge, including components for all major media frameworks (DirectShow, gstreamer, and Quicktime), giving most applications the ability to use the codec.&#039;&#039;&#039;&lt;br /&gt;
* What is the market share of the products implementing the formal specification, versus other implementations of competing formal specifications ? &lt;br /&gt;
: &#039;&#039;&#039;Theora playback is extremely widely available, covering virtually the entire market of personal computers.  Theora is also increasingly available in mobile and embedded devices.  Since we do not require licensing for products that implement the specification, we do not have market share numbers that can be compared with competing formal specifications.  Because implementations are readily available and free, Theora is included in many products that support multiple codecs, and is sometimes the only video codec included in free software products.&#039;&#039;&#039;&lt;br /&gt;
* Who are the end-users of these products implementing the formal specification?&lt;br /&gt;
: &#039;&#039;&#039;The end users are television viewers, video gamers, web surfers, movie makers, business people, video distribution services, and anyone else who interacts with moving pictures.&#039;&#039;&#039;&lt;br /&gt;
&lt;br /&gt;
=== Maturity ===&lt;br /&gt;
* Are there any existing or planned mechanisms to assess conformity of the implementations of the formal specification? &lt;br /&gt;
: &#039;&#039;&#039;Yes.  In addition to a continuous peer review process, we maintain a suite of [http://v2v.cc/~j/theora_testsuite/ test vectors] that allow implementors to assess decoder conformity. We also provide free online developer support and testing for those attempting to make a conforming implementation. An [http://validator.xiph.org/ online validation service] is available.&#039;&#039;&#039;&lt;br /&gt;
* Is there a reference implementation (i.e.: mentioning a recognized certification process)? &lt;br /&gt;
: &#039;&#039;&#039;Yes.  Xiph maintains a reference implementation called [http://downloads.xiph.org/releases/theora/ libtheora].  In addition to serving as a reference, libtheora is also highly optimized to achieve the maximum possible speed, accuracy, reliability, efficiency, and video quality.  As a result, many implementors of Theora adopt the reference implementation.&#039;&#039;&#039;&lt;br /&gt;
* Is there an open source implementation? &lt;br /&gt;
: &#039;&#039;&#039;Yes.  libtheora is made available under a permissive BSD-like license.  Its open-source nature also contributes to its quality as a reference implementation, as implementors are welcome to contribute their improvements to the reference.  There are also several other open source implementations in addition to the reference.&#039;&#039;&#039;&lt;br /&gt;
* Does the formal specification show wide adoption? &lt;br /&gt;
** across different domains? (I.e.: public and private) &lt;br /&gt;
: &#039;&#039;&#039;Yes.  In addition to the private companies mentioned in the previous section, Theora has also been specified as the sole format supported by non-profit organizations such as Wikipedia, currently the 6th largest website in the world, and as one of a small number of preferred formats supported by other public institutions, such as the Norwegian government.&#039;&#039;&#039;&lt;br /&gt;
** in an open environment? &lt;br /&gt;
: &#039;&#039;&#039;Yes.  On open/free operating systems such as those distributed by Novell/SuSE, Canonical, and Red Hat, Theora is the primary default video codec.&#039;&#039;&#039;&lt;br /&gt;
** in a similar field? (i.e.: can best practices be identified?) &lt;br /&gt;
: &#039;&#039;&#039;Yes.  Most prominently, Theora has been used for eGovernment video distribution in the United States at [http://metavid.org Metavid].  Metavid is the most comprehensive, interactive archive of video footage from the United States legislature.  Metavid not only distributes video; they also enable citizen engagement by allowing them to annotate videos and correct transcripts.  Metavid distributes its entire archive in Theora format.  Metavid&#039;s source code is entirely open and reusable for any purpose, providing instant access to best practices for eGovernment with Theora.  Metavid&#039;s video display component is also available separately as [http://metavid.org/wiki/Mv_Embed mv_embed], which provides reusable best practices for easy Theora display on the web.&#039;&#039;&#039;&lt;br /&gt;
: &#039;&#039;&#039;Another important user of Theora is Wikipedia, which distributes video exclusively in Theora format.  Wikipedia&#039;s best practices for Theora distribution are encapsulated in [http://www.mediawiki.org/wiki/Extension:OggHandler OggHandler], which can be freely reused by anyone using the open-source MediaWiki software.&#039;&#039;&#039;&lt;br /&gt;
* Has the formal specification been in use and development long enough that most of its initial problems have been overcome? &lt;br /&gt;
: &#039;&#039;&#039;Yes.  Theora was derived from VP3, which was originally released in May 2000.  The Theora specification was completed in 2004.  Theora has now been used in a wide variety of applications, on the full spectrum of computing devices.&#039;&#039;&#039;&lt;br /&gt;
* Is the underlying technology of the standard well-understood? (e.g., a reference model is well defined, appropriate concepts of the technology are in widespread use, the technology may have been in use for many years, a formal mathematical model is defined, etc.) &lt;br /&gt;
: &#039;&#039;&#039;Yes.  The underlying technology has been in use for nearly a decade, and most of the concepts have been in widespread use for even longer.&#039;&#039;&#039;&lt;br /&gt;
* Is the formal specification based upon technology that has not been well-defined and may be relatively new? &lt;br /&gt;
: &#039;&#039;&#039;No.  The formal specification is based on technology from the On2 VP3 codec, which is substantially similar to simple block-transform codecs like H.261. This class of codecs is extremely well understood, and has been in active use for over 20 years.&#039;&#039;&#039;&lt;br /&gt;
* Has the formal specification been revised? (Yes/No, Nof) &lt;br /&gt;
: &#039;&#039;&#039;The formal specification of the Theora decoder has been stable for years.  However, the text of the specification is continuously revised, based on user feedback, to improve the clarity and accuracy of the description of the technology.&#039;&#039;&#039;&lt;br /&gt;
* Is the formal specification under the auspices of an architectural board? (Yes/No) &lt;br /&gt;
: &#039;&#039;&#039;No.  The specification is officially maintained by the Xiph.Org Foundation, but anyone is free to join that organization, and one need not even be a member to make contributions. However, the core developers review contributions and make sure they do not contradict the general architecture and work well with the existing code and the test cases.&#039;&#039;&#039;&lt;br /&gt;
* Is the formal specification partitioned in its functionality? (Yes/No) &lt;br /&gt;
: &#039;&#039;&#039;No.  Theora is very deliberately not partitioned, to avoid the confusion created by a &amp;quot;standard&amp;quot; composed of many incompatible &amp;quot;profiles&amp;quot;.  The Theora standard does not have any optional components.  A compliant Theora decoder can correctly process any Theora stream.&#039;&#039;&#039;&lt;br /&gt;
** To what extent does each partition participate to its overall functionality? (NN%) &lt;br /&gt;
: &#039;&#039;&#039;N/A.&#039;&#039;&#039;&lt;br /&gt;
** To what extent is each partition implemented? (NN%) (cf market adoption)&lt;br /&gt;
: &#039;&#039;&#039;N/A.&#039;&#039;&#039;&lt;br /&gt;
&lt;br /&gt;
=== Re-usability === &lt;br /&gt;
* Does the formal specification provide guidelines for its implementation in a given organisation? &lt;br /&gt;
: &#039;&#039;&#039;Yes.  For example, [http://theora.org/doc/Theora.pdf the Theora specification] provides &amp;quot;non-normative&amp;quot; advice and explanation for implementors of Theora decoders and encoders, including example algorithms for implementing required mathematical transforms.  Xiph also maintains [http://wiki.xiph.org/Main_Page a documentation base] for implementors who desire more guidelines beyond the specification itself.&#039;&#039;&#039;&lt;br /&gt;
* Can other cases where similar systems implement the formal specification be considered as successful implementations and good practices? &lt;br /&gt;
: &#039;&#039;&#039;Xiph&#039;s standards have successfully been implemented by many organisations in a wide variety of environments.  We maintain (non-exhaustive) [http://wiki.xiph.org/TheoraSoftwarePlayers lists] of products which implement Theora support, many of them open source, so that others may use them as a reference when preparing their own products.&#039;&#039;&#039;&lt;br /&gt;
* Is its compatibility with related formal specification documented?&lt;br /&gt;
: &#039;&#039;&#039;Yes.  For example, [http://theora.org/doc/Theora.pdf the Theora specification] also documents the use of Theora within the [http://www.ietf.org/rfc/rfc3533.txt standard Ogg encapsulation format], and the [http://svn.xiph.org/trunk/theora/doc/draft-ietf-avt-rtp-theora-00.txt TheoraRTP draft specification] explains how to transmit Theora using the [http://tools.ietf.org/html/rfc3550 RTP standard].  In addition, the specification documents Theora&#039;s compatibility with ITU-R B.470, ITU-R B.601, ITU-R B.709, SMPTE-170M, [http://tools.ietf.org/html/rfc2044 UTF-8], ISO 10646, and [http://www.xiph.org/vorbis/doc/Vorbis_I_spec.pdf Ogg Vorbis].&#039;&#039;&#039;&lt;br /&gt;
&lt;br /&gt;
== Part 5: Standardisation Criteria == &lt;br /&gt;
From Idabc-camss&lt;br /&gt;
&lt;br /&gt;
Note: Throughout this section, “Organisation” refers to the standardisation/fora/consortia body in charge of the formal specification.&lt;br /&gt;
&lt;br /&gt;
Significant characteristics of the way the organisation operates are for example the way it gives the possibility to stakeholders to influence the evolution of the formal specification, or which conditions it attaches to the use of the formal specification or its implementation. Moreover, it is important to know how the formal specification is defined, supported, and made available, as well as how interaction with stakeholders is managed by the organisation during these steps. Governance of interoperability testing with other formal specifications is also indicative.&lt;br /&gt;
&lt;br /&gt;
The standardisation criteria therefore analyse the following elements:&lt;br /&gt;
&lt;br /&gt;
=== Availability of Documentation ===&lt;br /&gt;
The availability of documentation criterion is linked to cost and online availability. Access to all preliminary results documentation can be online, online for members only, offline, offline for members only, or not available. Access can be free or for a fee (which fee?).&lt;br /&gt;
: &#039;&#039;&#039;Every Xiph standard is permanently available online to everyone at no cost.  For example, we invite everyone to download [http://theora.org/doc/Theora.pdf the most up-to-date copy of the Theora specification], and [http://xiph.org/vorbis/doc/Vorbis_I_spec.html the latest revision of Vorbis].  All previous revisions are available from Xiph&#039;s [http://svn.xiph.org/ revision control system].&#039;&#039;&#039;&lt;br /&gt;
&lt;br /&gt;
=== Intellectual Property Right ===&lt;br /&gt;
The Intellectual Property Rights evaluation criteria relates to the ability for implementers to use the formal specification in products without legal or financial implications. The IPR policy of the organisation is therefore evaluated according to: &lt;br /&gt;
* the availability of the IPR or copyright policies of the organisation (available on-line or off-line, or not available);&lt;br /&gt;
: &#039;&#039;&#039;The reference implementations of each codec include all necessary IPR and copyright licenses for that codec, including all documentation, and are freely available to everyone.&#039;&#039;&#039;&lt;br /&gt;
* the organisation’s governance to disclose any IPR from any contributor (ex-ante, online, offline, for free for all, for a fee for all, for members only, not available);&lt;br /&gt;
: &#039;&#039;&#039;Xiph does not require the identification of specific patents that may be required to implement a standard; however, it does require an open-source compatible, royalty free license from a contributor for any such patents they may own before the corresponding technology can be included in a standard. These licenses are made available online, for free, to all parties.&#039;&#039;&#039;&lt;br /&gt;
* the level of IPR set &amp;quot;mandatory&amp;quot; by the organisation (no patent, royalty free patent, patent and RAND with limited liability , patent and classic RAND, patent with explicit licensing, patent with defensive licensing, or none); &lt;br /&gt;
: &#039;&#039;&#039;All standards, specifications, and software published by the Xiph.Org Foundation are required to have &amp;quot;open-source compatible&amp;quot; IPR.  This means that a contribution must either be entirely clear of any known patents, or any patents that read upon the contribution must be available under a transferable, irrevocable public nonassertion agreement to all people everywhere.  For example, see [http://svn.xiph.org/trunk/theora/LICENSE our On2 patent nonassertion warrant].  Other common &amp;quot;royalty free&amp;quot; patent licenses are either not transferable, are revocable under certain conditions (such as patent infringement litigation against the originating party), or otherwise impose restrictions that would prevent distribution under common [http://www.opensource.org/ OSI]-approved licenses.  These would not be acceptable.&#039;&#039;&#039;&lt;br /&gt;
* the level of IPR &amp;quot;recommended&amp;quot; by the organisation (no patent, royalty free patent, patent and RAND with limited liability, patent and classic RAND, patent with explicit licensing, patent with defensive licensing, or none). [Note: RAND (Reasonable and Non Discriminatory License) is based on a &amp;quot;fairness&amp;quot; concept. Companies agree that if they receive any patents on technologies that become essential to the standard then they agree to allow other groups attempting to implement the standard to use these patents and they agree that the charges for the patents shall be reasonable. &amp;quot;RAND with limited availability&amp;quot; is a version of RAND where the &amp;quot;reasonable charges&amp;quot; have an upper limit.]&lt;br /&gt;
: &#039;&#039;&#039;Xiph&#039;s recommended IPR requirements are the same as our mandatory requirements.&#039;&#039;&#039;&lt;br /&gt;
&lt;br /&gt;
=== Accessibility ===&lt;br /&gt;
&lt;br /&gt;
The accessibility evaluation criteria describe the importance of equal and safe accessibility for the users of implementations of formal specifications. This aspect can be related to safety (physical safety and conformance safety) and to accessibility for physically impaired people (design for all).&lt;br /&gt;
&lt;br /&gt;
Focus is placed particularly on accessibility and conformance safety. Conformance testing is testing to determine whether a system meets a given formal specification; the result may take the form of test-suite results. Conformance validation is when the conformance test uniquely qualifies a given implementation as conformant or not. Conformance certification is a process that provides a public and easily visible &amp;quot;stamp of approval&amp;quot; confirming that an implementation of a standard has been validated as conformant.&lt;br /&gt;
&lt;br /&gt;
The following questions allow an assessment of accessibility and conformance safety: &lt;br /&gt;
* Does a mechanism that ensures disability support by a formal specification exist? (Y/N) &lt;br /&gt;
: &#039;&#039;&#039;Yes.  Xiph ensures support for users with disabilities by providing specifications for accessible technologies independent of the codec itself.  Notably, the Xiph [http://wiki.xiph.org/OggKate OggKate] codec for time-aligned text and image content provides support for subtitles for internationalisation, captions for the hearing-impaired, and textual audio descriptions for the visually impaired. Further, Ogg supports multiple tracks of audio and video content in one container, such that sign language tracks and audio descriptions can be included in one file. For this to work, Xiph has defined [http://wiki.xiph.org/Ogg_Skeleton Skeleton], which holds metadata about each track encapsulated within a single Ogg file. When Theora is transmitted or stored in an Ogg container, it is automatically compatible with these accessibility measures. (An illustrative sketch of this multi-track structure appears after this list.)&#039;&#039;&#039;&lt;br /&gt;
* Is conformance governance always part of a standard? (Y/N) &lt;br /&gt;
: &#039;&#039;&#039;Yes.  Xiph&#039;s standards always precisely specify the requirements that an implementation must meet in order to be considered conformant.&#039;&#039;&#039;&lt;br /&gt;
* Is a conformance test offered to implementers? (Y/N) &lt;br /&gt;
: &#039;&#039;&#039;Yes.  Xiph maintains a freely available suite of [http://v2v.cc/~j/theora_testsuite/ test vectors] and an [http://validator.xiph.org online validation service] that anyone can use to confirm basic conformance, in addition to tools such as the oggz-validate program included with liboggz, which has been widely used for conformance testing. (A usage sketch for oggz-validate appears after this list.)&#039;&#039;&#039;&lt;br /&gt;
* Is conformance validation available to implementers? (Y/N) &lt;br /&gt;
: &#039;&#039;&#039;Yes.  Informal conformance testing is available to implementors upon request, and Xiph has provided such testing for a number of implementations in the past.&#039;&#039;&#039;&lt;br /&gt;
* Is conformance certification available? (Y/N) &lt;br /&gt;
: &#039;&#039;&#039;Yes.  Xiph does not require certification, but maintains the right to withhold the use of our trademarks from implementors who act in bad faith.  Implementors may, however, request explicit permission to use our trademarks with a conforming implementation.&#039;&#039;&#039;&lt;br /&gt;
* Is localisation of a formal specification possible? (Y/N)&lt;br /&gt;
: &#039;&#039;&#039;Yes.  We welcome anyone who wishes to translate Xiph specifications into other languages.  We have no policy requiring that the normative specification be written in English.&#039;&#039;&#039;&lt;br /&gt;
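&lt;br /&gt;
The following minimal Python sketch illustrates the multi-track structure described above.  It is an editorial illustration rather than part of any Xiph deliverable: the input file name is hypothetical, and the codec magic values shown are illustrative, not exhaustive (each codec&#039;s specification gives the authoritative values).  The sketch reads the beginning-of-stream (BOS) pages that open an Ogg file, as defined in [http://www.ietf.org/rfc/rfc3533.txt RFC 3533], and reports the serial number and probable codec of each logical bitstream, including Skeleton and Kate tracks when present.&lt;br /&gt;
&lt;br /&gt;
 # List the logical bitstreams declared at the start of an Ogg file.&lt;br /&gt;
 # Per RFC 3533, every logical bitstream opens with a page whose&lt;br /&gt;
 # header-type flag 0x02 (BOS) is set, and all BOS pages come first,&lt;br /&gt;
 # so we can stop at the first non-BOS page.&lt;br /&gt;
 import struct&lt;br /&gt;
 &lt;br /&gt;
 # First bytes of the initial header packet for some Xiph bitstream&lt;br /&gt;
 # types (illustrative, not exhaustive).&lt;br /&gt;
 MAGIC = {&lt;br /&gt;
     b&amp;quot;\x80theora&amp;quot;: &amp;quot;Theora video&amp;quot;,&lt;br /&gt;
     b&amp;quot;\x01vorbis&amp;quot;: &amp;quot;Vorbis audio&amp;quot;,&lt;br /&gt;
     b&amp;quot;fishead\x00&amp;quot;: &amp;quot;Skeleton metadata&amp;quot;,&lt;br /&gt;
     b&amp;quot;\x80kate\x00\x00\x00&amp;quot;: &amp;quot;Kate timed text&amp;quot;,&lt;br /&gt;
 }&lt;br /&gt;
 &lt;br /&gt;
 def list_tracks(path):&lt;br /&gt;
     with open(path, &amp;quot;rb&amp;quot;) as f:&lt;br /&gt;
         while True:&lt;br /&gt;
             header = f.read(27)  # fixed-size part of an Ogg page header&lt;br /&gt;
             if len(header) &amp;lt; 27 or header[:4] != b&amp;quot;OggS&amp;quot;:&lt;br /&gt;
                 break&lt;br /&gt;
             version, htype = header[4], header[5]&lt;br /&gt;
             serial, = struct.unpack(&amp;quot;&amp;lt;I&amp;quot;, header[14:18])&lt;br /&gt;
             segtable = f.read(header[26])  # lacing values&lt;br /&gt;
             body = f.read(sum(segtable))&lt;br /&gt;
             if version != 0 or not htype &amp;amp; 0x02:  # stop after the BOS pages&lt;br /&gt;
                 break&lt;br /&gt;
             codec = next((name for magic, name in MAGIC.items()&lt;br /&gt;
                           if body.startswith(magic)), &amp;quot;unknown codec&amp;quot;)&lt;br /&gt;
             print(&amp;quot;serial 0x%08x: %s&amp;quot; % (serial, codec))&lt;br /&gt;
 &lt;br /&gt;
 list_tracks(&amp;quot;example.ogv&amp;quot;)  # hypothetical input file&lt;br /&gt;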
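&lt;br /&gt;
As a usage illustration of the oggz-validate tool mentioned above, the Python sketch below runs it over a local directory of Ogg files and summarises the results.  It assumes liboggz is installed so that oggz-validate is on the PATH; the directory name is hypothetical.&lt;br /&gt;
&lt;br /&gt;
 # Run oggz-validate (from liboggz) over a directory of Ogg files.&lt;br /&gt;
 # The tool exits with a non-zero status when a file fails validation.&lt;br /&gt;
 import pathlib&lt;br /&gt;
 import subprocess&lt;br /&gt;
 &lt;br /&gt;
 def validate_all(directory=&amp;quot;theora_testsuite&amp;quot;):  # hypothetical path&lt;br /&gt;
     for path in sorted(pathlib.Path(directory).glob(&amp;quot;*.og[gvx]&amp;quot;)):&lt;br /&gt;
         result = subprocess.run([&amp;quot;oggz-validate&amp;quot;, str(path)],&lt;br /&gt;
                                 capture_output=True, text=True)&lt;br /&gt;
         status = &amp;quot;ok&amp;quot; if result.returncode == 0 else &amp;quot;FAILED&amp;quot;&lt;br /&gt;
         print(&amp;quot;%-40s %s&amp;quot; % (path.name, status))&lt;br /&gt;
         if result.returncode != 0:&lt;br /&gt;
             print(result.stderr.strip())&lt;br /&gt;
 &lt;br /&gt;
 validate_all()&lt;br /&gt;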
&lt;br /&gt;
=== Interoperability governance === &lt;br /&gt;
The interoperability governance evaluation criteria relate to how interoperability is identified and maintained between related formal specifications. In order to do this, the organisation may provide governance for: &lt;br /&gt;
* open identification in formal specifications, &lt;br /&gt;
: &#039;&#039;&#039;Yes.  The Xiph codecs can be precisely identified by [http://wiki.xiph.org/index.php/MIMETypesCodecs their MIME types], as formally defined by [http://tools.ietf.org/html/rfc5334 IETF RFC 5334], an open specification. (A sketch applying the RFC 5334 type mapping appears after this list.)&#039;&#039;&#039;&lt;br /&gt;
* open negotiation in formal specifications,&lt;br /&gt;
: &#039;&#039;&#039;Yes.  For example, a [http://tools.ietf.org/html/draft-barbato-avt-rtp-theora-01 draft RTP specification] describes how Theora interoperates with the [http://tools.ietf.org/html/rfc3264 offer/answer model] of the Session Description Protocol (SDP), a mechanism for negotiating the parameters of RTP sessions.&#039;&#039;&#039;&lt;br /&gt;
* open selection in formal specifications.&lt;br /&gt;
: &#039;&#039;&#039;Yes. There are many open specifications that provide a mechanism for selecting Theora from among many codecs.  One such specification is [http://www.whatwg.org/specs/web-apps/current-work/multipage/video.html#video HTML 5 video], which allows the user agent to select Theora based on its MIME type, using [http://www.whatwg.org/specs/web-apps/current-work/multipage/video.html#the-source-element the source element].&#039;&#039;&#039;&lt;br /&gt;
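&lt;br /&gt;
To make the identification and selection mechanisms above concrete, the following minimal Python sketch (an editorial illustration, not a normative mapping; the port number is arbitrary) applies the file-extension-to-MIME-type registrations of RFC 5334 and serves the current directory with those types set, so that an HTML 5 user agent can identify and select an Ogg/Theora stream via the source element.&lt;br /&gt;
&lt;br /&gt;
 # Serve Ogg files with the MIME types registered by IETF RFC 5334,&lt;br /&gt;
 # so that user agents can identify and select them by type.&lt;br /&gt;
 from http.server import HTTPServer, SimpleHTTPRequestHandler&lt;br /&gt;
 &lt;br /&gt;
 # Extension-to-type mapping from RFC 5334.&lt;br /&gt;
 OGG_TYPES = {&lt;br /&gt;
     &amp;quot;.ogv&amp;quot;: &amp;quot;video/ogg&amp;quot;,        # video (e.g. Theora) in Ogg&lt;br /&gt;
     &amp;quot;.oga&amp;quot;: &amp;quot;audio/ogg&amp;quot;,        # audio in Ogg&lt;br /&gt;
     &amp;quot;.ogg&amp;quot;: &amp;quot;audio/ogg&amp;quot;,        # historically also application/ogg&lt;br /&gt;
     &amp;quot;.spx&amp;quot;: &amp;quot;audio/ogg&amp;quot;,        # Speex in Ogg&lt;br /&gt;
     &amp;quot;.ogx&amp;quot;: &amp;quot;application/ogg&amp;quot;,  # multiplexed or other Ogg content&lt;br /&gt;
 }&lt;br /&gt;
 SimpleHTTPRequestHandler.extensions_map.update(OGG_TYPES)&lt;br /&gt;
 &lt;br /&gt;
 # Arbitrary local port for the demonstration.&lt;br /&gt;
 HTTPServer((&amp;quot;&amp;quot;, 8000), SimpleHTTPRequestHandler).serve_forever()&lt;br /&gt;
&lt;br /&gt;
A page served this way can then offer, for example, &amp;lt;source src=&amp;quot;clip.ogv&amp;quot; type=&amp;quot;video/ogg&amp;quot;&amp;gt; (the file name is hypothetical) inside a video element, and the user agent selects a format it can play.&lt;br /&gt;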
&lt;br /&gt;
=== Meeting and consultation ===&lt;br /&gt;
The meeting and consultation evaluation criteria relate to the process of defining a formal specification. As formal specifications are usually defined by committees, and these committees normally consist of members of the organisation, these criteria examine how one becomes a member, what the financial barriers to membership are, and how non-members are able to influence the process of defining the formal specification. The analysis covers: &lt;br /&gt;
* if the organisation is open to all types of companies and organisations and to individuals; &lt;br /&gt;
: &#039;&#039;&#039;Yes.  Xiph welcomes representatives from all companies and organizations as well as all individuals.&#039;&#039;&#039;&lt;br /&gt;
* if the standardisation process may specifically allow participation of members with limited abilities when relevant; &lt;br /&gt;
: &#039;&#039;&#039;Yes.  Standardization occurs almost entirely in text-based, accessible internet communications channels, allowing participants with disabilities to engage fully in the standards development process.&#039;&#039;&#039;&lt;br /&gt;
* if meetings are open to all members;&lt;br /&gt;
: &#039;&#039;&#039;Xiph meetings are open to everyone.  We charge no fee for and place no restrictions on attendance or participation.  For example, anyone interested in contributing to the Theora specification may join [http://lists.xiph.org/pipermail/theora-dev/ the Theora development mailing list].&#039;&#039;&#039;&lt;br /&gt;
* if all can participate in the formal specification creation process; &lt;br /&gt;
: &#039;&#039;&#039;Yes.  All people are welcome to participate in the specification creation process.  No dues or fees are required to participate.&#039;&#039;&#039;&lt;br /&gt;
* if non-members can participate in the formal specification creation process.&lt;br /&gt;
: &#039;&#039;&#039;Yes.  Xiph does not maintain an explicit list of members, and no one is excluded from contributing to specifications as they are developed.&#039;&#039;&#039;&lt;br /&gt;
&lt;br /&gt;
=== Consensus ===&lt;br /&gt;
Consensus concerns decision making, primarily with regard to the approval of formal specifications and review with interest groups (non-members). The consensus evaluation criterion is evaluated with the following questions:&lt;br /&gt;
* Does the organisation have a stated objective of reaching consensus when making decisions on standards? &lt;br /&gt;
: &#039;&#039;&#039;There is no explicitly stated objective of reaching consensus. However, when new contributions are made, the key specification developers are able to veto the introduction of a new feature. Generally, differences are discussed openly and new features are adapted until they fit the overall architecture of the standard, at which stage they are introduced into the specification, standard, and software.&#039;&#039;&#039;&lt;br /&gt;
* If consensus is not reached, can the standard be approved? (answers are: cannot be approved but referred back to working group/committee, approved with 75% majority, approved with 66% majority, approved with 51% majority, can be decided by a &amp;quot;director&amp;quot; or similar in the organisation).&lt;br /&gt;
: &#039;&#039;&#039;The standard can be approved without consensus via the decision of a &amp;quot;director&amp;quot; or similar.&#039;&#039;&#039;&lt;br /&gt;
* Is there a formal process for external review of standard proposals by interest groups (nonmembers)?&lt;br /&gt;
: &#039;&#039;&#039;Since anyone may participate in the development process and make proposals, there is no need for a separate formal process to include proposals by nonmembers.&#039;&#039;&#039;&lt;br /&gt;
&lt;br /&gt;
=== Due Process ===&lt;br /&gt;
The due process evaluation criterion relates to the respect accorded to each member&#039;s rights within the organisation. More specifically, it must be assured that if a member believes an error has been made in the process of defining a formal specification, it is possible to appeal this to an independent, higher instance. The question is therefore: can a member formally appeal or raise objections to a procedure or to a technical specification before an independent, higher instance?&lt;br /&gt;
&lt;br /&gt;
: &#039;&#039;&#039;Yes.  Moreover, even if a member&#039;s appeal within the organization fails, because all of the technology Xiph standardizes is open and freely implementable, they remain free to develop their own, competing version.  Such competing versions may even still be eligible for standardization under the Xiph umbrella.&#039;&#039;&#039;&lt;br /&gt;
&lt;br /&gt;
=== Changes to the formal specification ===&lt;br /&gt;
The suggested changes made to a formal specification need to be presented, evaluated and approved in the same way as the formal specification was first defined. This criterion therefore applies the above criteria (availability of documentation, Intellectual Property Right, accessibility, interoperability governance, meeting and consultation, consensus, due process) to the changes made to the formal specification.&lt;br /&gt;
&lt;br /&gt;
: &#039;&#039;&#039;The exact same process is used for revisions to the standard as was used for the original development of the standard, and thus the answers to all of the above questions remain the same.&#039;&#039;&#039;&lt;br /&gt;
&lt;br /&gt;
=== Support ===&lt;br /&gt;
It is critical that the organisation takes responsibility for the formal specification throughout its life span. This can be done in several ways, such as a regular periodic review of the formal specification. The support criteria relate to the level of commitment the organisation has made to support the formal specification throughout its life: &lt;br /&gt;
* does the organisation provide support until removal of the published formal specification from the public domain (including this removal process)?&lt;br /&gt;
: &#039;&#039;&#039;Xiph.Org standards are never removed from the public domain.  Xiph endeavors to provide support for as long as the standard remains in use.&#039;&#039;&#039;&lt;br /&gt;
* does the organisation make the formal specification still available even when in non-maintenance mode?&lt;br /&gt;
: &#039;&#039;&#039;Yes.  All Xiph.Org standards are freely licensed and will always be available.&#039;&#039;&#039;&lt;br /&gt;
* does the organisation add new features and keep the formal specification up-to-date?&lt;br /&gt;
: &#039;&#039;&#039;Yes.  Xiph maintains its ecosystem of standards on a continuous basis.&#039;&#039;&#039;&lt;br /&gt;
* does the organisation rectify problems identified in initial implementations?&lt;br /&gt;
: &#039;&#039;&#039;Yes.  Xiph maintains [https://trac.xiph.org/report a problem reporting system] that is open to the public, and invites everyone to submit suggestions for improvements.  Improvements are made both to the standards documents and to the reference implementations.&#039;&#039;&#039;&lt;br /&gt;
* does the organisation only create the formal specification?&lt;br /&gt;
: &#039;&#039;&#039;No.  Xiph also produces high-quality reusable reference implementations of its standards, released under an open license.&#039;&#039;&#039;&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
&amp;lt;strong&amp;gt;This is a draft document. A work in progress. A scratchpad for ideas. It should not be widely circulated in this form.&amp;lt;/strong&amp;gt;&lt;/div&gt;</summary>
		<author><name>Mindspillage</name></author>
	</entry>
	<entry>
		<id>https://wiki.xiph.org/index.php?title=IDABC_Questionnaire_2009&amp;diff=10741</id>
		<title>IDABC Questionnaire 2009</title>
		<link rel="alternate" type="text/html" href="https://wiki.xiph.org/index.php?title=IDABC_Questionnaire_2009&amp;diff=10741"/>
		<updated>2009-11-24T22:21:30Z</updated>

		<summary type="html">&lt;p&gt;Mindspillage: /* Intellectual Property Right */ punctuation&lt;/p&gt;
&lt;hr /&gt;
&lt;div&gt;&amp;lt;strong&amp;gt;This is a draft document. A work in progress. A scratchpad for ideas. It should not be widely circulated in this form.&amp;lt;/strong&amp;gt;&lt;br /&gt;
&lt;br /&gt;
= Context =&lt;br /&gt;
We received [http://lists.xiph.org/pipermail/theora/2009-November/002996.html an e-mail] from a consultant studying the suitability of Theora for use in &amp;quot;eGovernment&amp;quot;, on behalf of the [http://ec.europa.eu/idabc/ IDABC], an EU governmental agency responsible for &amp;quot;Interoperability&amp;quot; with an emphasis on open source.  The investigation is in the context of the [http://ec.europa.eu/idabc/en/document/7728 European Interoperability Framework], about which there has been [http://www.computerworlduk.com/community/blogs/index.cfm?entryid=2620&amp;amp;blogid=14&amp;amp;pn=1 some real controversy].&lt;br /&gt;
&lt;br /&gt;
The method of assessment is the Common Assessment Method for Standards and Specifications (CAMSS), which includes the questions below.&lt;br /&gt;
&lt;br /&gt;
= CAMSS Questions =&lt;br /&gt;
== Part 4: Market Criteria ==&lt;br /&gt;
&lt;br /&gt;
This group of Market criteria analyses the formal specification in the scope of its market environment, and more precisely it examines the implementations of the formal specification and the market players. This implies identifying the extent to which the formal specification benefits from market support and wide adoption, its level of maturity, and its capacity for reuse.&lt;br /&gt;
&lt;br /&gt;
Market support is evaluated through an analysis of how many products implementing the formal specification exist, what their market share is, and who their end-users are. The quality and the completeness (in case of partitioning) of the implementations of the formal specification can also be analysed. Availability of existing or planned mechanisms to assess conformity of implementations to the standard or to the specification could also be identified. The existence of at least one reference implementation (i.e.: mentioning a recognized certification process) - and whether one of the implementations is open source - can also be relevant to the assessment. Wide adoption can also be assessed across domains (i.e.: public and private sectors), in an open environment, and/or in a similar field (i.e.: best practices).&lt;br /&gt;
&lt;br /&gt;
A formal specification is mature if it has been in use and development for long enough that most of its initial problems have been overcome and its underlying technology is well understood and well defined. Maturity is also assessed by identifying whether all aspects of the formal specification are considered as validated by usage (i.e.: if the formal specification is partitioned), and whether the reported issues have been solved and documented.&lt;br /&gt;
&lt;br /&gt;
Reusability of a formal specification is enabled if it includes guidelines for its implementation in a given context. The identification of successful implementations of the standard or specification should focus on good practices in a similar field. Any incompatibility with related standards or specifications should also be taken into account.&lt;br /&gt;
&lt;br /&gt;
The ideas behind the Market Criteria can also be expressed in the form of the following questions:&lt;br /&gt;
&lt;br /&gt;
=== Market support ===&lt;br /&gt;
* Does the standard have strong support in the marketplace? &lt;br /&gt;
: &#039;&#039;&#039;Yes.  For example, among web browsers, support for Xiph&#039;s Ogg, Theora, and Vorbis standards is now included by default in Mozilla Firefox, Google Chrome, and the latest versions of Opera, representing hundreds of millions of users in this market alone. On Windows, DirectShow filters enable any application that uses the DirectShow framework to use Xiph&#039;s Ogg, Theora, and Vorbis standards. A QuickTime component likewise enables their use in all Mac OS X applications that make use of the QuickTime framework, which includes Safari/WebKit, iMovie, QuickTime Player, and many others.&#039;&#039;&#039;&lt;br /&gt;
* What products exist for this formal specification ? &lt;br /&gt;
: &#039;&#039;&#039;Theora is a video codec, and as such the required products are encoders, decoders, and transmission systems.  All three types of products are widely available for Theora.&#039;&#039;&#039;&lt;br /&gt;
* How many implementations of the formal specification are there? &lt;br /&gt;
: &#039;&#039;&#039;Xiph does not require implementors to acquire any license before implementing the specification.  Therefore, we do not have a definitive count of the number of implementations.  In addition to the reference implementation, which has been ported to most modern platforms and highly optimized for x86 and ARM CPUs and TI C64x+ DSPs, we are aware of a number of independent, conformant or mostly-conformant implementations.  These include two C decoders ([http://ffmpeg.org/ FFmpeg] and [http://sourceforge.jp/projects/qtheora/ QTheora]), a Java decoder ([http://www.theora.org/cortado/ Jheora]), a [http://www.wreckedgames.com/downloads/cSharpTheora.zip C# decoder], an [http://svn.xiph.org/trunk/theora-fpga/ FPGA decoder], and an [http://sourceforge.net/projects/elphel/ FPGA encoder].&#039;&#039;&#039;&lt;br /&gt;
* Are there products from different suppliers in the market that implement this formal specification ? &lt;br /&gt;
: &#039;&#039;&#039;Yes.  Corporations such as Atari, Canonical, DailyMotion, Elphel, Fluendo, Google, Mozilla, Novell, Opera, Red Hat, Sun Microsystems, and Ubisoft have supplied products with an implementation of the Theora standard.&#039;&#039;&#039;&lt;br /&gt;
* Are there many products readily available from a variety of suppliers? &lt;br /&gt;
: &#039;&#039;&#039;Yes. Theora has been deployed in embedded devices, security cameras, video games, video conferencing systems, web browsers, home theater systems, and many other products.  A complete, legal, open-source reference implementation can also be downloaded free of charge, including components for all major media frameworks (DirectShow, GStreamer, and QuickTime), giving most applications the ability to use the codec.&#039;&#039;&#039;&lt;br /&gt;
* What is the market share of the products implementing the formal specification, versus other implementations of competing formal specifications ? &lt;br /&gt;
: &#039;&#039;&#039;Theora playback is extremely widely available, covering virtually the entire market of personal computers.  Theora is also increasingly available in mobile and embedded devices.  Since we do not require licensing for products that implement the specification, we do not have market share numbers that can be compared with competing formal specifications.  Because implementations are readily available and free, Theora is included in many products that support multiple codecs, and is sometimes the only video codec included in free software products.&#039;&#039;&#039;&lt;br /&gt;
* Who are the end-users of these products implementing the formal specification?&lt;br /&gt;
: &#039;&#039;&#039;The end users are television viewers, video gamers, web surfers, movie makers, business people, video distribution services, and anyone else who interacts with moving pictures.&#039;&#039;&#039;&lt;br /&gt;
&lt;br /&gt;
=== Maturity ===&lt;br /&gt;
* Are there any existing or planned mechanisms to assess conformity of the implementations of the formal specification? &lt;br /&gt;
: &#039;&#039;&#039;Yes.  In addition to a continuous peer review process, we maintain a suite of [http://v2v.cc/~j/theora_testsuite/ test vectors] that allow implementors to assess decoder conformity. We also provide free online developer support and testing for those attempting to make a conforming implementation. An [http://validator.xiph.org/ online validation service] is available.&#039;&#039;&#039;&lt;br /&gt;
* Is there a reference implementation (i.e.: mentioning a recognized certification process)? &lt;br /&gt;
: &#039;&#039;&#039;Yes.  Xiph maintains a reference implementation called [http://downloads.xiph.org/releases/theora/ libtheora].  In addition to serving as a reference, libtheora is also highly optimized to achieve the maximum possible speed, accuracy, reliability, efficiency, and video quality.  As a result, many implementors of Theora adopt the reference implementation.&#039;&#039;&#039;&lt;br /&gt;
* Is there an open source implementation? &lt;br /&gt;
: &#039;&#039;&#039;Yes.  libtheora is made available under a permissive BSD-like license.  Its open-source nature also contributes to its quality as a reference implementation, as implementors are welcome to contribute their improvements to the reference.  There are also several other open source implementations in addition to the reference.&#039;&#039;&#039;&lt;br /&gt;
* Does the formal specification show wide adoption? &lt;br /&gt;
** across different domains? (I.e.: public and private) &lt;br /&gt;
: &#039;&#039;&#039;Yes.  In addition to the private companies mentioned in the previous section, Theora has also been specified as the sole format supported by non-profit organizations such as Wikipedia, currently the 6th largest website in the world, and as one of a small number of preferred formats supported by other public institutions, such as the Norwegian government.&#039;&#039;&#039;&lt;br /&gt;
** in an open environment? &lt;br /&gt;
: &#039;&#039;&#039;Yes.  On open/free operating systems such as those distributed by Novell/SuSE, Canonical, and Red Hat, Theora is the primary default video codec.&#039;&#039;&#039;&lt;br /&gt;
** in a similar field? (i.e.: can best practices be identified?) &lt;br /&gt;
: &#039;&#039;&#039;Yes.  Most prominently, Theora has been used for eGovernment video distribution in the United States at [http://metavid.org Metavid].  Metavid is the most comprehensive, interactive archive of video footage from the United States legislature.  Metavid not only distributes video; they also enable citizen engagement by allowing them to annotate videos and correct transcripts.  Metavid distributes its entire archive in Theora format.  Metavid&#039;s source code is entirely open and reusable for any purpose, providing instant access to best practices for eGovernment with Theora.  Metavid&#039;s video display component is also available separately as [http://metavid.org/wiki/Mv_Embed mv_embed], which provides reusable best practices for easy Theora display on the web.&#039;&#039;&#039;&lt;br /&gt;
: &#039;&#039;&#039;Another important user of Theora is Wikipedia, which distributes video exclusively in Theora format.  Wikipedia&#039;s best practices for Theora distribution are encapsulated in [http://www.mediawiki.org/wiki/Extension:OggHandler OggHandler], which can be freely reused by anyone using the open-source MediaWiki software.&#039;&#039;&#039;&lt;br /&gt;
* Has the formal specification been in use and development long enough that most of its initial problems have been overcome? &lt;br /&gt;
: &#039;&#039;&#039;Yes.  Theora was derived from VP3, which was originally released in May 2000.  The Theora specification was completed in 2004.  Theora has now been used in a wide variety of applications, on the full spectrum of computing devices.&#039;&#039;&#039;&lt;br /&gt;
* Is the underlying technology of the standard well-understood? (e.g., a reference model is well defined, appropriate concepts of the technology are in widespread use, the technology may have been in use for many years, a formal mathematical model is defined, etc.) &lt;br /&gt;
: &#039;&#039;&#039;Yes.  The underlying technology has been in use for nearly a decade, and most of the concepts have been in widespread use for even longer.&#039;&#039;&#039;&lt;br /&gt;
* Is the formal specification based upon technology that has not been well-defined and may be relatively new? &lt;br /&gt;
: &#039;&#039;&#039;No.  The formal specification is based on technology from the On2 VP3 codec, which is substantially similar to simple block-transform codecs like H.261. This class of codecs is extremely well understood, and has been in active use for over 20 years.&#039;&#039;&#039;&lt;br /&gt;
* Has the formal specification been revised? (Yes/No; number of revisions) &lt;br /&gt;
: &#039;&#039;&#039;The formal specification of the Theora decoder has been stable for years.  However, the text of the specification is continuously revised, based on user feedback, to improve the clarity and accuracy of the description of the technology.&#039;&#039;&#039;&lt;br /&gt;
* Is the formal specification under the auspices of an architectural board? (Yes/No) &lt;br /&gt;
: &#039;&#039;&#039;No.  Although the specification is officially maintained by the Xiph.Org Foundation, anyone is free to join that organization, and one need not even be a member to make contributions. However, the core developers review contributions and make sure they do not contradict the general architecture and that they work well with the existing code and the test cases.&#039;&#039;&#039;&lt;br /&gt;
* Is the formal specification partitioned in its functionality? (Yes/No) &lt;br /&gt;
: &#039;&#039;&#039;No.  Theora is very deliberately not partitioned, to avoid the confusion created by a &amp;quot;standard&amp;quot; composed of many incompatible &amp;quot;profiles&amp;quot;.  The Theora standard does not have any optional components.  A compliant Theora decoder can correctly process any Theora stream.&#039;&#039;&#039;&lt;br /&gt;
** To what extent does each partition contribute to its overall functionality? (NN%) &lt;br /&gt;
: &#039;&#039;&#039;N/A.&#039;&#039;&#039;&lt;br /&gt;
** To what extent is each partition implemented? (NN%) (cf market adoption)&lt;br /&gt;
: &#039;&#039;&#039;N/A.&#039;&#039;&#039;&lt;br /&gt;
&lt;br /&gt;
=== Re-usability === &lt;br /&gt;
* Does the formal specification provide guidelines for its implementation in a given organisation? &lt;br /&gt;
: &#039;&#039;&#039;Yes.  For example, [http://theora.org/doc/Theora.pdf the Theora specification] provides &amp;quot;non-normative&amp;quot; advice and explanation for implementors of Theora decoders and encoders, including example algorithms for implementing required mathematical transforms.  Xiph also maintains [http://wiki.xiph.org/Main_Page a documentation base] for implementors who desire more guidelines beyond the specification itself.&#039;&#039;&#039;&lt;br /&gt;
* Can other cases where similar systems implement the formal specification be considered as successful implementations and good practices? &lt;br /&gt;
: &#039;&#039;&#039;Xiph&#039;s standards have successfully been implemented by many organisations in a wide variety of environments.  We maintain (non-exhaustive) [http://wiki.xiph.org/TheoraSoftwarePlayers lists] of products which implement Theora support, many of them open source, so that others may use them as a reference when preparing their own products.&#039;&#039;&#039;&lt;br /&gt;
* Is its compatibility with related formal specification documented?&lt;br /&gt;
: &#039;&#039;&#039;Yes.  For example, [http://theora.org/doc/Theora.pdf the Theora specification] also documents the use of Theora within the [http://www.ietf.org/rfc/rfc3533.txt standard Ogg encapsulation format], and the [http://svn.xiph.org/trunk/theora/doc/draft-ietf-avt-rtp-theora-00.txt TheoraRTP draft specification] explains how to transmit Theora using the [http://tools.ietf.org/html/rfc3550 RTP standard].  In addition, the specification documents Theora&#039;s compatibility with ITU-R BT.470, ITU-R BT.601, ITU-R BT.709, SMPTE 170M, [http://tools.ietf.org/html/rfc2044 UTF-8], ISO 10646, and [http://www.xiph.org/vorbis/doc/Vorbis_I_spec.pdf Ogg Vorbis].&#039;&#039;&#039;&lt;br /&gt;
&lt;br /&gt;
== Part 5: Standardisation Criteria == &lt;br /&gt;
From IDABC-CAMSS&lt;br /&gt;
&lt;br /&gt;
Note: Throughout this section, “Organisation” refers to the standardisation/fora/consortia body in charge of the formal specification.&lt;br /&gt;
&lt;br /&gt;
Significant characteristics of the way the organisation operates are for example the way it gives the possibility to stakeholders to influence the evolution of the formal specification, or which conditions it attaches to the use of the formal specification or its implementation. Moreover, it is important to know how the formal specification is defined, supported, and made available, as well as how interaction with stakeholders is managed by the organisation during these steps. Governance of interoperability testing with other formal specifications is also indicative.&lt;br /&gt;
&lt;br /&gt;
The standardisation criteria therefore analyse the following elements:&lt;br /&gt;
&lt;br /&gt;
=== Availability of Documentation ===&lt;br /&gt;
The availability of documentation criterion is linked to cost and online availability. Access to all preliminary results documentation can be online, online for members only, offline, offline for members only, or not available. Access can be free or for a fee (which fee?).&lt;br /&gt;
: &#039;&#039;&#039;Every Xiph standard is permanently available online to everyone at no cost.  For example, we invite everyone to download [http://theora.org/doc/Theora.pdf the most up-to-date copy of the Theora specification], and [http://xiph.org/vorbis/doc/Vorbis_I_spec.html the latest revision of Vorbis].  All previous revisions are available from Xiph&#039;s [http://svn.xiph.org/ revision control system].&#039;&#039;&#039;&lt;br /&gt;
&lt;br /&gt;
=== Intellectual Property Right ===&lt;br /&gt;
The Intellectual Property Rights evaluation criterion relates to the ability of implementers to use the formal specification in products without legal or financial implications. The IPR policy of the organisation is therefore evaluated according to: &lt;br /&gt;
* the availability of the IPR or copyright policies of the organisation (available on-line or off-line, or not available);&lt;br /&gt;
: &#039;&#039;&#039;The reference implementations of each codec include all necessary IPR and copyright licenses for that codec, including all documentation, and are freely available to everyone.&#039;&#039;&#039;&lt;br /&gt;
* the organisation’s governance to disclose any IPR from any contributor (ex-ante, online, offline, for free for all, for a fee for all, for members only, not available);&lt;br /&gt;
: &#039;&#039;&#039;Xiph does not require the identification of specific patents that may be required to implement a standard; however, it does require an open-source compatible, royalty free license from a contributor for any such patents they may own before the corresponding technology can be included in a standard. These licenses are made available online, for free, to all parties.&#039;&#039;&#039;&lt;br /&gt;
* the level of IPR set &amp;quot;mandatory&amp;quot; by the organisation (no patent, royalty free patent, patent and RAND with limited liability, patent and classic RAND, patent with explicit licensing, patent with defensive licensing, or none); &lt;br /&gt;
: &#039;&#039;&#039;All standards, specifications, and software published by the Xiph.Org Foundation are required to have &amp;quot;open-source compatible&amp;quot; IPR.  This means that a contribution must either be entirely clear of any known patents, or any patents that read upon the contribution must be available under a transferable, irrevocable public nonassertion agreement to all people everywhere.  For example, see [http://svn.xiph.org/trunk/theora/LICENSE our On2 patent nonassertion warrant].  Other common &amp;quot;royalty free&amp;quot; patent licenses are either not transferable, are revocable under certain conditions (such as patent infringement litigation against the originating party), or otherwise impose restrictions that would prevent distribution under common [http://www.opensource.org/ OSI]-approved licenses.  These would not be acceptable.&#039;&#039;&#039;&lt;br /&gt;
* the level of IPR &amp;quot;recommended&amp;quot; by the organisation (no patent, royalty free patent, patent and RAND with limited liability, patent and classic RAND, patent with explicit licensing, patent with defensive licensing, or none). [Note: RAND (Reasonable and Non-Discriminatory) licensing is based on a &amp;quot;fairness&amp;quot; concept. Companies agree that if they receive any patents on technologies that become essential to the standard, they will allow other groups attempting to implement the standard to use these patents, and that the charges for the patents shall be reasonable. &amp;quot;RAND with limited liability&amp;quot; is a version of RAND where the &amp;quot;reasonable charges&amp;quot; have an upper limit.]&lt;br /&gt;
: &#039;&#039;&#039;Xiph&#039;s recommended IPR requirements are the same as our mandatory requirements.&#039;&#039;&#039;&lt;br /&gt;
&lt;br /&gt;
=== Accessibility ===&lt;br /&gt;
&lt;br /&gt;
The accessibility evaluation criteria describe the importance of equal and safe accessibility for the users of implementations of formal specifications. This aspect can be related to safety (physical safety and conformance safety) and to accessibility for physically impaired people (design for all).&lt;br /&gt;
&lt;br /&gt;
Focus is placed particularly on accessibility and conformance safety. Conformance testing is testing to determine whether a system meets a given formal specification; the result may take the form of test-suite results. Conformance validation is when the conformance test uniquely qualifies a given implementation as conformant or not. Conformance certification is a process that provides a public and easily visible &amp;quot;stamp of approval&amp;quot; confirming that an implementation of a standard has been validated as conformant.&lt;br /&gt;
&lt;br /&gt;
The following questions allow an assessment of accessibility and conformance safety: &lt;br /&gt;
* Does a mechanism that ensures disability support by a formal specification exist? (Y/N) &lt;br /&gt;
: &#039;&#039;&#039;Yes.  Xiph ensures support for users with disabilities by providing specifications for accessible technologies independent of the codec itself.  Notably, the Xiph [http://wiki.xiph.org/OggKate OggKate] codec for time-aligned text and image content provides support for subtitles for internationalisation, captions for the hearing-impaired, and textual audio descriptions for the visually impaired. Further, Ogg supports multiple tracks of audio and video content in one container, such that sign language tracks and audio descriptions can be included in one file. For this to work, Xiph has defined [http://wiki.xiph.org/Ogg_Skeleton Skeleton], which holds metadata about each track encapsulated within a single Ogg file. When Theora is transmitted or stored in an Ogg container, it is automatically compatible with these accessibility measures.&#039;&#039;&#039;&lt;br /&gt;
* Is conformance governance always part of a standard? (Y/N) &lt;br /&gt;
: &#039;&#039;&#039;Yes.  Xiph&#039;s standards always precisely specify the requirements that an implementation must meet in order to be considered conformant.&#039;&#039;&#039;&lt;br /&gt;
* Is a conformance test offered to implementers? (Y/N) &lt;br /&gt;
: &#039;&#039;&#039;Yes.  Xiph maintains a freely available suite of [http://v2v.cc/~j/theora_testsuite/ test vectors] and an [http://validator.xiph.org online validation service] that anyone can use to confirm basic conformance, in addition to tools such as the oggz-validate program included with liboggz, which has been widely used for conformance testing.&#039;&#039;&#039;&lt;br /&gt;
* Is conformance validation available to implementers? (Y/N) &lt;br /&gt;
: &#039;&#039;&#039;Yes.  Informal conformance testing is available to implementors upon request, and Xiph has provided such testing for a number of implementations in the past.&#039;&#039;&#039;&lt;br /&gt;
* Is conformance certification available? (Y/N) &lt;br /&gt;
: &#039;&#039;&#039;Yes.  Xiph does not require certification, but maintains the right to withhold the use of our trademarks from implementors who act in bad faith.  Implementors may, however, request explicit permission to use our trademarks with a conforming implementation.&#039;&#039;&#039;&lt;br /&gt;
* Is localisation of a formal specification possible? (Y/N)&lt;br /&gt;
: &#039;&#039;&#039;Yes.  We welcome anyone who wishes to translate Xiph specifications into other languages.  We have no policy requiring that the normative specification be written in English.&#039;&#039;&#039;&lt;br /&gt;
&lt;br /&gt;
=== Interoperability governance === &lt;br /&gt;
The interoperability governance evaluation criteria relate to how interoperability is identified and maintained between related formal specifications. In order to do this, the organisation may provide governance for: &lt;br /&gt;
* open identification in formal specifications, &lt;br /&gt;
: &#039;&#039;&#039;Yes.  The Xiph codecs can be precisely identified by [http://wiki.xiph.org/index.php/MIMETypesCodecs their MIME types], as formally defined by [http://tools.ietf.org/html/rfc5334 IETF RFC 5334], an open specification.&#039;&#039;&#039;&lt;br /&gt;
* open negotiation in formal specifications,&lt;br /&gt;
: &#039;&#039;&#039;Yes.  For example, a [http://tools.ietf.org/html/draft-barbato-avt-rtp-theora-01 draft RTP specification] describes how Theora interoperates with the [http://tools.ietf.org/html/rfc3264 offer/answer model] of the Session Description Protocol (SDP), a mechanism for negotiating the parameters of RTP sessions.&#039;&#039;&#039;&lt;br /&gt;
* open selection in formal specifications.&lt;br /&gt;
: &#039;&#039;&#039;Yes. There are many open specifications that provide a mechanism for selecting Theora from among many codecs.  One such specification is [http://www.whatwg.org/specs/web-apps/current-work/multipage/video.html#video HTML 5 video], which allows the user agent to select Theora based on its MIME type, using [http://www.whatwg.org/specs/web-apps/current-work/multipage/video.html#the-source-element the source element].&#039;&#039;&#039;&lt;br /&gt;
&lt;br /&gt;
=== Meeting and consultation ===&lt;br /&gt;
The meeting and consultation evaluation criteria relate to the process of defining a formal specification. As formal specifications are usually defined by committees, and these committees normally consist of members of the organisation, these criteria examine how one becomes a member, what the financial barriers to membership are, and how non-members are able to influence the process of defining the formal specification. The analysis covers: &lt;br /&gt;
* if the organisation is open to all types of companies and organisations and to individuals; &lt;br /&gt;
: &#039;&#039;&#039;Yes.  Xiph welcomes representatives from all companies and organizations.&#039;&#039;&#039;&lt;br /&gt;
* if the standardisation process may specifically allow participation of members with limited abilities when relevant; &lt;br /&gt;
: &#039;&#039;&#039;Yes.  Standardization occurs almost entirely in internet communications channels, allowing participants with disabilities to engage fully in the standards development process.  We also encourage nonexperts and students to assist us as they can, and to learn about Xiph technologies by participating in the standards development process.&#039;&#039;&#039;&lt;br /&gt;
* if meetings are open to all members;&lt;br /&gt;
: &#039;&#039;&#039;Xiph meetings are open to everyone.  We charge no fee for and place no restrictions on attendance or participation.  For example, anyone interested in contributing to the Theora specification may join [http://lists.xiph.org/pipermail/theora-dev/ the Theora development mailing list].&#039;&#039;&#039;&lt;br /&gt;
* if all can participate in the formal specification creation process; &lt;br /&gt;
: &#039;&#039;&#039;Yes.  All people are welcome to participate in the specification creation process.  No dues or fees are required to participate.&#039;&#039;&#039;&lt;br /&gt;
* if non-members can participate in the formal specification creation process.&lt;br /&gt;
: &#039;&#039;&#039;Yes.  Xiph does not maintain an explicit list of members, and no one is excluded from contributing to specifications as they are developed.&#039;&#039;&#039;&lt;br /&gt;
&lt;br /&gt;
=== Consensus ===&lt;br /&gt;
Consensus concerns decision making, primarily with regard to the approval of formal specifications and review with interest groups (non-members). The consensus evaluation criterion is evaluated with the following questions:&lt;br /&gt;
* Does the organisation have a stated objective of reaching consensus when making decisions on standards? &lt;br /&gt;
: &#039;&#039;&#039;There is no explicitly stated objective of reaching consensus. However, when new contributions are made, the key specification developers are able to veto the introduction of a new feature. Generally, differences are discussed openly and new features are adapted until they fit the overall architecture of the standard, at which stage they are introduced into the specification, standard, and software.&#039;&#039;&#039;&lt;br /&gt;
* If consensus is not reached, can the standard be approved? (answers are: cannot be approved but referred back to working group/committee, approved with 75% majority, approved with 66% majority, approved with 51% majority, can be decided by a &amp;quot;director&amp;quot; or similar in the organisation).&lt;br /&gt;
: &#039;&#039;&#039;The standard can be approved without consensus via the decision of a &amp;quot;director&amp;quot; or similar.&#039;&#039;&#039;&lt;br /&gt;
* Is there a formal process for external review of standard proposals by interest groups (nonmembers)?&lt;br /&gt;
: &#039;&#039;&#039;Since anyone may participate in the development process and make proposals, there is no need for a separate formal process to include proposals by nonmembers.&#039;&#039;&#039;&lt;br /&gt;
&lt;br /&gt;
=== Due Process ===&lt;br /&gt;
The due process evaluation criterion relates to the respect accorded to each member&#039;s rights within the organisation. More specifically, it must be assured that if a member believes an error has been made in the process of defining a formal specification, it is possible to appeal this to an independent, higher instance. The question is therefore: can a member formally appeal or raise objections to a procedure or to a technical specification before an independent, higher instance?&lt;br /&gt;
&lt;br /&gt;
: &#039;&#039;&#039;Yes.  Moreover, even if a member&#039;s appeal within the organization fails, because all of the technology Xiph standardizes is open and freely implementable, they remain free to develop their own, competing version.  Such competing versions may even still be eligible for standardization under the Xiph umbrella.&#039;&#039;&#039;&lt;br /&gt;
&lt;br /&gt;
=== Changes to the formal specification ===&lt;br /&gt;
The suggested changes made to a formal specification need to be presented, evaluated and approved in the same way as the formal specification was first defined. This criterion therefore applies the above criteria (availability of documentation, Intellectual Property Right, accessibility, interoperability governance, meeting and consultation, consensus, due process) to the changes made to the formal specification.&lt;br /&gt;
&lt;br /&gt;
: &#039;&#039;&#039;The exact same process is used for revisions to the standard as was used for the original development of the standard, and thus the answers to all of the above questions remain the same.&#039;&#039;&#039;&lt;br /&gt;
&lt;br /&gt;
=== Support ===&lt;br /&gt;
It is critical that the organisation takes responsibility for the formal specification throughout its life span. This can be done in several ways, such as a regular periodic review of the formal specification. The support criteria relate to the level of commitment the organisation has made to support the formal specification throughout its life: &lt;br /&gt;
* does the organisation provide support until removal of the published formal specification from the public domain (including this removal process)?&lt;br /&gt;
: &#039;&#039;&#039;Xiph.Org standards are never removed from the public domain.  Xiph endeavors to provide support for as long as the standard remains in use.&#039;&#039;&#039;&lt;br /&gt;
* does the organisation make the formal specification still available even when in non-maintenance mode?&lt;br /&gt;
: &#039;&#039;&#039;Yes.  All Xiph.Org standards are freely licensed and will always be available.&#039;&#039;&#039;&lt;br /&gt;
* does the organisation add new features and keep the formal specification up-to-date?&lt;br /&gt;
: &#039;&#039;&#039;Yes.  Xiph maintains its ecosystem of standards on a continuous basis.&#039;&#039;&#039;&lt;br /&gt;
* does the organisation rectify problems identified in initial implementations?&lt;br /&gt;
: &#039;&#039;&#039;Yes.  Xiph maintains [https://trac.xiph.org/report a problem reporting system] that is open to the public, and invites everyone to submit suggestions for improvements.  Improvements are made both to the standards documents and to the reference implementations.&#039;&#039;&#039;&lt;br /&gt;
* does the organisation only create the formal specification?&lt;br /&gt;
: &#039;&#039;&#039;No.  Xiph also produces high-quality reusable reference implementations of its standards, released under an open license.&#039;&#039;&#039;&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
&amp;lt;strong&amp;gt;This is a draft document. A work in progress. A scratchpad for ideas. It should not be widely circulated in this form.&amp;lt;/strong&amp;gt;&lt;/div&gt;</summary>
		<author><name>Mindspillage</name></author>
	</entry>
	<entry>
		<id>https://wiki.xiph.org/index.php?title=IDABC_Questionnaire_2009&amp;diff=10625</id>
		<title>IDABC Questionnaire 2009</title>
		<link rel="alternate" type="text/html" href="https://wiki.xiph.org/index.php?title=IDABC_Questionnaire_2009&amp;diff=10625"/>
		<updated>2009-11-04T19:27:55Z</updated>

		<summary type="html">&lt;p&gt;Mindspillage: /* Consensus */ no need for separate process for nonmembers&lt;/p&gt;
&lt;hr /&gt;
&lt;div&gt;= Context =&lt;br /&gt;
We received [http://lists.xiph.org/pipermail/theora/2009-November/002996.html an e-mail] from a consultant studying the suitability of Theora for use in &amp;quot;eGovernment&amp;quot;, on behalf of the [http://ec.europa.eu/idabc/ IDABC], an EU governmental agency responsible for &amp;quot;Interoperability&amp;quot; with an emphasis on open source.  The investigation is in the context of the [http://ec.europa.eu/idabc/en/document/7728 European Interoperability Framework], about which there has been [http://www.computerworlduk.com/community/blogs/index.cfm?entryid=2620&amp;amp;blogid=14&amp;amp;pn=1 some real controversy].&lt;br /&gt;
&lt;br /&gt;
The method of assessment is the Common Assessment Method for Standards and Specifications (CAMSS), which includes the questions below.&lt;br /&gt;
&lt;br /&gt;
= CAMSS Questions =&lt;br /&gt;
== Part 4: Market Criteria ==&lt;br /&gt;
&lt;br /&gt;
This group of Market criteria analyses the formal specification in the scope of its market environment, and more precisely it examines the implementations of the formal specification and the market players. This implies identifying the extent to which the formal specification benefits from market support and wide adoption, its level of maturity, and its capacity for reuse.&lt;br /&gt;
&lt;br /&gt;
Market support is evaluated through an analysis of how many products implementing the formal specification exist, what their market share is, and who their end-users are. The quality and the completeness (in case of partitioning) of the implementations of the formal specification can also be analysed. Availability of existing or planned mechanisms to assess conformity of implementations to the standard or to the specification could also be identified. The existence of at least one reference implementation (i.e.: mentioning a recognized certification process) - and whether one of the implementations is open source - can also be relevant to the assessment. Wide adoption can also be assessed across domains (i.e.: public and private sectors), in an open environment, and/or in a similar field (i.e.: best practices).&lt;br /&gt;
&lt;br /&gt;
A formal specification is mature if it has been in use and development for long enough that most of its initial problems have been overcome and its underlying technology is well understood and well defined. Maturity is also assessed by identifying whether all aspects of the formal specification are considered as validated by usage (i.e.: if the formal specification is partitioned), and whether the reported issues have been solved and documented.&lt;br /&gt;
&lt;br /&gt;
Reusability of a formal specification is enabled if it includes guidelines for its implementation in a given context. The identification of successful implementations of the standard or specification should focus on good practices in a similar field. Any incompatibility with related standards or specifications should also be taken into account.&lt;br /&gt;
&lt;br /&gt;
The ideas behind the Market Criteria can also be expressed in the form of the following questions:&lt;br /&gt;
&lt;br /&gt;
=== Market support ===&lt;br /&gt;
* Does the standard have strong support in the marketplace? &lt;br /&gt;
: Yes.  For example, among web browsers, support for Xiph&#039;s Ogg, Theora, and Vorbis standards is now included by default in Mozilla Firefox, Google Chrome, and the latest versions of Opera, representing many millions of users in this market alone.&lt;br /&gt;
* What products exist for this formal specification ? &lt;br /&gt;
: Theora is a video codec, and as such the required products are encoders, decoders, and transmission systems.  All three types of products are widely available for Theora.&lt;br /&gt;
* How many implementations of the formal specification are there? &lt;br /&gt;
: Xiph.org does not require implementors to acquire any license before implementing the specification.  Therefore, we do not know how many implementations have been created.  From voluntary notification given to us by implementors, we know that Theora has at least been implemented in embedded devices, security cameras, video games, video conferencing systems, web browsers, home theater systems, and many other applications.&lt;br /&gt;
* Are there products from different suppliers in the market that implement this formal specification ? &lt;br /&gt;
: Yes.  Corporations such as Novell, Opera, Google, Mozilla, Red Hat, Sun Microsystems, Canonical, DailyMotion, Elphel, and countless others have supplied implementations of the Theora standard.&lt;br /&gt;
* Are there many products readily available from a variety of suppliers? &lt;br /&gt;
: Yes.  Theora is extremely readily available: any user with a modern home computer can have a complete, legal, open source implementation, free of charge, downloaded and running in less than 30 seconds.&lt;br /&gt;
* What is the market share of the products implementing the formal specification, versus other implementations of competing formal specifications ? &lt;br /&gt;
: Theora playback is extremely widely available, covering virtually the entire market of personal computers.  Theora is also increasingly available in mobile and embedded devices.&lt;br /&gt;
* Who are the end-users of these products implementing the formal specification?&lt;br /&gt;
: The end users are television viewers, movie makers, business people, video distribution services, and anyone else who interacts with moving pictures.&lt;br /&gt;
&lt;br /&gt;
=== Maturity ===&lt;br /&gt;
* Are there any existing or planned mechanisms to assess conformity of the implementations of the formal specification? &lt;br /&gt;
: Yes.  In addition to a continuous peer review process, we maintain a suite of [http://v2v.cc/~j/theora_testsuite/ test vectors] that allow implementors to assess decoder conformity.&lt;br /&gt;
* Is there a reference implementation (i.e.: mentioning a recognized certification process)? &lt;br /&gt;
: Yes.  Xiph.org maintains a constantly updated reference implementation called [http://downloads.xiph.org/releases/theora/ libtheora].  libtheora is maintained not only as a reference, but also to achieve the maximum possible speed, accuracy, reliability, efficiency, and video quality.  As a result, many implementors of Theora adopt the reference implementation.&lt;br /&gt;
* Is there an open source implementation? &lt;br /&gt;
: Yes.  libtheora is made available under a completely permissive BSD-like license.  Its open-source nature also contributes to its quality as a reference implementation, as implementors are welcome to contribute their improvements to the reference.  There are also several other open source implementations.&lt;br /&gt;
* Does the formal specification show wide adoption? &lt;br /&gt;
** across different domains? (I.e.: public and private) &lt;br /&gt;
** in an open environment? &lt;br /&gt;
: Yes.  On open/free operating systems such as those distributed by Novell/SuSE, Canonical, and Red Hat, Theora is the primary default video codec.&lt;br /&gt;
** in a similar field? (i.e.: can best practices be identified?) &lt;br /&gt;
* Has the formal specification been in use and development long enough that most of its initial problems have been overcome? &lt;br /&gt;
: Yes.  Theora has now been used in a wide variety of applications, on the full spectrum of computing devices.&lt;br /&gt;
* Is the underlying technology of the standard well-understood? (e.g., a reference model is well defined, appropriate concepts of the technology are in widespread use, the technology may have been in use for many years, a formal mathematical model is defined, etc.) &lt;br /&gt;
* Is the formal specification based upon technology that has not been well-defined and may be relatively new? &lt;br /&gt;
: No.  The formal specification is based on technology from the On2 VP3 codec, which is substantially similar to simple block-transform codecs like H.261. This class of codecs is extremely well understood, and has been actively in use for over 20 years.&lt;br /&gt;
* Has the formal specification been revised? (Yes/No; number of revisions) &lt;br /&gt;
: Yes.  The specification is continuously revised to improve clarity and accuracy.&lt;br /&gt;
* Is the formal specification under the auspices of an architectural board? (Yes/No) &lt;br /&gt;
* Is the formal specification partitioned in its functionality? (Yes/No) &lt;br /&gt;
: No.  Theora is very deliberately not partitioned, to avoid the confusion created by a &amp;quot;standard&amp;quot; composed of many incompatible &amp;quot;profiles&amp;quot;.  The Theora standard does not have any optional components.  A compliant Theora decoder can correctly process any Theora stream.&lt;br /&gt;
** To what extent does each partition contribute to its overall functionality? (NN%) &lt;br /&gt;
** To what extent is each partition implemented? (NN%) (cf market adoption)&lt;br /&gt;
&lt;br /&gt;
=== Re-usability === &lt;br /&gt;
* Does the formal specification provide guidelines for its implementation in a given organisation? &lt;br /&gt;
: Yes.  For example, [http://theora.org/doc/Theora.pdf the Theora specification document] provides &amp;quot;non-normative&amp;quot; advice and explanation for implementors of Theora decoders and encoders, including example algorithms for implementing required mathematical transforms.  Xiph also maintains [http://wiki.xiph.org/Main_Page a documentation base] for implementors who desire more guidelines beyond the specification itself.&lt;br /&gt;
* Can other cases where similar systems implement the formal specification be considered as successful implementations and good practices? &lt;br /&gt;
: Yes.  Xiph.org standards have been successfully implemented by many organisations in a wide variety of environments.  We encourage independent implementations and unexpected applications of our standards.&lt;br /&gt;
* Is its compatibility with related formal specifications documented?&lt;br /&gt;
: Yes.  For example, [http://theora.org/doc/Theora.pdf the Theora specification] also documents the use of Theora within the [http://www.ietf.org/rfc/rfc3533.txt standard Ogg encapsulation format], and the [http://svn.xiph.org/trunk/theora/doc/draft-ietf-avt-rtp-theora-00.txt TheoraRTP draft specification] explains how to transmit Theora using the [http://tools.ietf.org/html/rfc3550 RTP standard].  A brief sketch of Ogg demultiplexing appears below.&lt;br /&gt;
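&lt;br /&gt;
: As a companion to the Ogg encapsulation reference above, here is a minimal, non-normative sketch (in C, against the libogg 1.x API; the demux() helper name is illustrative, and a single logical stream is assumed) of recovering packets from an Ogg byte stream before they are handed to a Theora decoder:&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
#include &amp;lt;ogg/ogg.h&amp;gt;&lt;br /&gt;
#include &amp;lt;stdio.h&amp;gt;&lt;br /&gt;
&lt;br /&gt;
/* Illustrative helper: pull pages out of an Ogg byte stream and&lt;br /&gt;
   extract the packets of the first logical stream encountered. */&lt;br /&gt;
void demux(FILE *in)&lt;br /&gt;
{&lt;br /&gt;
  ogg_sync_state   sync;&lt;br /&gt;
  ogg_stream_state stream;&lt;br /&gt;
  ogg_page         page;&lt;br /&gt;
  ogg_packet       packet;&lt;br /&gt;
  int have_stream = 0;&lt;br /&gt;
  ogg_sync_init(&amp;amp;sync);&lt;br /&gt;
  for (;;) {&lt;br /&gt;
    /* Hand a block of raw bytes to the sync layer... */&lt;br /&gt;
    char *buf = ogg_sync_buffer(&amp;amp;sync, 4096);&lt;br /&gt;
    long n = (long)fread(buf, 1, 4096, in);&lt;br /&gt;
    if (n &amp;lt;= 0) break;&lt;br /&gt;
    ogg_sync_wrote(&amp;amp;sync, n);&lt;br /&gt;
    /* ...then extract complete pages, and packets from pages. */&lt;br /&gt;
    while (ogg_sync_pageout(&amp;amp;sync, &amp;amp;page) == 1) {&lt;br /&gt;
      if (!have_stream) {&lt;br /&gt;
        ogg_stream_init(&amp;amp;stream, ogg_page_serialno(&amp;amp;page));&lt;br /&gt;
        have_stream = 1;&lt;br /&gt;
      }&lt;br /&gt;
      ogg_stream_pagein(&amp;amp;stream, &amp;amp;page);&lt;br /&gt;
      while (ogg_stream_packetout(&amp;amp;stream, &amp;amp;packet) == 1) {&lt;br /&gt;
        /* packet is now ready for th_decode_headerin() or&lt;br /&gt;
           th_decode_packetin() */&lt;br /&gt;
      }&lt;br /&gt;
    }&lt;br /&gt;
  }&lt;br /&gt;
  if (have_stream) ogg_stream_clear(&amp;amp;stream);&lt;br /&gt;
  ogg_sync_clear(&amp;amp;sync);&lt;br /&gt;
}&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;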
&lt;br /&gt;
== Part 5: Standardisation Criteria == &lt;br /&gt;
From IDABC CAMSS (Common Assessment Method for Standards and Specifications)&lt;br /&gt;
&lt;br /&gt;
Note: Throughout this section, “Organisation” refers to the standardisation/fora/consortia body in charge of the formal specification.&lt;br /&gt;
&lt;br /&gt;
Significant characteristics of the way the organisation operates include, for example, how it allows stakeholders to influence the evolution of the formal specification, and which conditions it attaches to the use of the formal specification or its implementation. Moreover, it is important to know how the formal specification is defined, supported, and made available, as well as how interaction with stakeholders is managed by the organisation during these steps. Governance of interoperability testing with other formal specifications is also indicative.&lt;br /&gt;
&lt;br /&gt;
The standardisation criteria therefore analyses the following elements:&lt;br /&gt;
&lt;br /&gt;
=== Availability of Documentation ===&lt;br /&gt;
The availability of documentation criteria is linked to cost and online availability. Access to all preliminary results documentation can be online, online for members only, offline, offline for members only, or not available. Access can be free or for a fee (which fee?).&lt;br /&gt;
: Every Xiph.org standard is permanently available online to everyone at no cost.  For example, we invite everyone to download [http://theora.org/doc/Theora.pdf the most up-to-date copy of the Theora specification], and [http://xiph.org/vorbis/doc/Vorbis_I_spec.html the latest revision of Vorbis].&lt;br /&gt;
&lt;br /&gt;
=== Intellectual Property Right ===&lt;br /&gt;
The Intellectual Property Rights evaluation criteria relates to the ability for implementers to use the formal specification in products without legal or financial implications. The IPR policy of the organisation is therefore evaluated according to: &lt;br /&gt;
* the availability of the IPR or copyright policies of the organisation (available on-line or off-line, or not available); &lt;br /&gt;
* the organisation’s governance to disclose any IPR from any contributor (ex-ante, online, offline, for free for all, for a fee for all, for members only, not available); &lt;br /&gt;
* the level of IPR set &amp;quot;mandatory&amp;quot; by the organisation (no patent, royalty free patent, patent and RAND with limited liability, patent and classic RAND, patent with explicit licensing, patent with defensive licensing, or none); &lt;br /&gt;
: Xiph.org imposes a mandatory requirement that its standards be absolutely free of IPR encumbrance.  This means that a standard must either be entirely clear of any known patents, or any patents that read upon the standard must be available under an irrevocable public nonassertion agreement to all people everywhere.  For example, see [http://svn.xiph.org/trunk/theora/LICENSE our On2 patent nonassertion warrant].&lt;br /&gt;
* the level of IPR &amp;quot;recommended&amp;quot; by the organisation (no patent, royalty free patent, patent and RAND with limited liability, patent and classic RAND, patent with explicit licensing, patent with defensive licensing, or none). [Note: RAND (Reasonable and Non-Discriminatory License) is based on a &amp;quot;fairness&amp;quot; concept. Companies agree that if they receive any patents on technologies that become essential to the standard, they will allow other groups attempting to implement the standard to use these patents, and that the charges for the patents shall be reasonable. &amp;quot;RAND with limited liability&amp;quot; is a version of RAND where the &amp;quot;reasonable charges&amp;quot; have an upper limit.]&lt;br /&gt;
: Xiph&#039;s mandatory IPR requirements are the most stringent possible, and so our IPR recommendation is necessarily the same as the mandatory requirements.&lt;br /&gt;
&lt;br /&gt;
=== Accessibility ===&lt;br /&gt;
&lt;br /&gt;
The accessibility evaluation criteria describe the importance of equal and safe accessibility by the users of implementations of formal specifications. This aspect can be related to safety (physical safety and conformance safety) and accessibility for physically impaired people (design for all).&lt;br /&gt;
&lt;br /&gt;
Focus is placed particularly on accessibility and conformance safety. Conformance testing is testing to determine whether a system meets a specified formal specification. The result can take the form of results from a test suite. Conformance validation is when the conformance test uniquely qualifies a given implementation as conformant or not. Conformance certification is a process that provides a public and easily visible &amp;quot;stamp of approval&amp;quot; that an implementation of a standard validates as conformant.&lt;br /&gt;
&lt;br /&gt;
The following questions allow an assessment of accessibility and conformance safety: &lt;br /&gt;
* Does a mechanism that ensures disability support by a formal specification exist? (Y/N) &lt;br /&gt;
: Yes.  Xiph.org ensures disability support by providing specifications for accessible technologies independent of the codec itself.  Notable Xiph specifications include [http://wiki.xiph.org/OggKate OggKate] and [http://wiki.xiph.org/index.php/CMML CMML], which provide subtitles for the hearing-impaired, as well as [http://wiki.xiph.org/Ogg_Skeleton Skeleton], which provides metadata that can be used to specify scene description audio tracks for the visually impaired.  When Theora is transmitted or stored in an Ogg container, it is automatically compatible with these accessibility measures.&lt;br /&gt;
* Is conformance governance always part of a standard? (Y/N) &lt;br /&gt;
* Is a conformance test offered to implementers? (Y/N) &lt;br /&gt;
* Is conformance validation available to implementers? (Y/N) &lt;br /&gt;
* Is conformance certification available? (Y/N) &lt;br /&gt;
* Is localisation of a formal specification possible? (Y/N)&lt;br /&gt;
: Yes.  We welcome anyone who wishes to translate Xiph specifications into other languages.  We have no policy requiring that the normative specification be written in English.&lt;br /&gt;
&lt;br /&gt;
=== Interoperability governance === &lt;br /&gt;
The interoperability governance evaluation criteria relates to how interoperability is identified and maintained between interoperable formal specifications. In order to do this, the organisation may provide governance for: &lt;br /&gt;
* open identification in formal specifications, &lt;br /&gt;
* open negotiation in formal specifications, &lt;br /&gt;
* open selection in formal specifications. &lt;br /&gt;
&lt;br /&gt;
=== Meeting and consultation ===&lt;br /&gt;
The meeting and consultation evaluation criteria relates to the process of defining a formal specification. As formal specifications are usually defined by committees, and these committees normally consist of members of the organisation, this criteria studies how one becomes a member and what the financial barriers to membership are, as well as how non-members are able to influence the process of defining the formal specification. It analyses: &lt;br /&gt;
* if the organisation is open to all types of companies and organisations and to individuals; &lt;br /&gt;
: Yes.  Xiph welcomes representatives from all companies and organizations.&lt;br /&gt;
* if the standardisation process may specifically allow participation of members with limited abilities when relevant; &lt;br /&gt;
: Yes.  Standardization occurs almost entirely in internet communications channels, allowing participants with disabilities to engage fully in the standards development process.  We also encourage nonexperts and students to assist us as they can, and to learn about Xiph technologies by participating in the standards development process.&lt;br /&gt;
* if meetings are open to all members;&lt;br /&gt;
: Xiph meetings are open to everyone.  We place no restrictions on attendance or participation.  For example, anyone interested in contributing to the Theora specification may join [http://lists.xiph.org/pipermail/theora-dev/ the Theora development mailing list].&lt;br /&gt;
* if all can participate in the formal specification creation process; &lt;br /&gt;
: Yes.  All people are welcome to participate in the specification creation process.  No dues or fees are required to participate.&lt;br /&gt;
* if non-members can participate in the formal specification creation process.&lt;br /&gt;
: Yes.  Xiph does not maintain an explicit list of members, and no one is excluded from contributing to specifications as they are developed.&lt;br /&gt;
&lt;br /&gt;
=== Consensus ===&lt;br /&gt;
Consensus refers to decision-making, primarily with regard to the approval of formal specifications and review with interest groups (non-members). The consensus evaluation criterion is evaluated with the following questions:&lt;br /&gt;
* Does the organisation have a stated objective of reaching consensus when making decisions on standards? &lt;br /&gt;
* If consensus is not reached, can the standard be approved? (answers are: cannot be approved but referred back to working group/committee, approved with 75% majority, approved with 66% majority, approved with 51% majority, can be decided by a &amp;quot;director&amp;quot; or similar in the organisation). &lt;br /&gt;
* Is there a formal process for external review of standard proposals by interest groups (non-members)?&lt;br /&gt;
: Since anyone may participate in the development process and make proposals, there is no need for a separate formal process to include proposals by non-members.&lt;br /&gt;
&lt;br /&gt;
=== Due Process ===&lt;br /&gt;
The due process evaluation criteria relates to the extent to which the rights of each member of the organisation are respected. More specifically, if a member believes an error has been made in the process of defining a formal specification, it must be possible to appeal this to an independent, higher instance. The question is therefore: can a member formally appeal or raise objections to a procedure or to a technical specification before an independent, higher instance?&lt;br /&gt;
&lt;br /&gt;
=== Changes to the formal specification ===&lt;br /&gt;
The suggested changes made to a formal specification need to be presented, evaluated and approved in the same way as the formal specification was first defined. This criteria therefore applies the above criteria to the changes made to the formal specification (availability of documentation, Intellectual Property Right, accessibility, interoperability governance, meeting and consultation, consensus, due process).&lt;br /&gt;
&lt;br /&gt;
=== Support ===&lt;br /&gt;
It is critical that the organisation takes responsibility for the formal specification throughout its life span. This can be done in several ways, for example through a regular periodic review of the formal specification. The support criteria relates to the level of commitment the organisation has made to support the formal specification throughout its life: &lt;br /&gt;
* does the organisation provide support until removal of the published formal specification from the public domain (including this removal process)? &lt;br /&gt;
: Xiph.org standards are never removed from the public domain.  Xiph endeavors to provide support for as long as the standard remains in use.&lt;br /&gt;
* does the organisation keep the formal specification available even when it is in non-maintenance mode?&lt;br /&gt;
: Yes.  All Xiph.org standards are freely licensed and will always be available.&lt;br /&gt;
* does the organisation add new features and keep the formal specification up-to-date?&lt;br /&gt;
: Yes.  Xiph.org maintains its ecosystem of standards on a continuous basis.&lt;br /&gt;
* does the organisation rectify problems identified in initial implementations?&lt;br /&gt;
: Yes.  Xiph.org maintains [https://trac.xiph.org/report a problem reporting system] that is open to the public, and invites everyone to submit suggestions for improvements.  Improvements are made both to the standards documents and to the reference implementations.&lt;br /&gt;
* does the organisation only create the formal specification?&lt;br /&gt;
: No.  Xiph.org also produces high-quality reusable reference implementations of its standards, released under an open license.&lt;/div&gt;</summary>
		<author><name>Mindspillage</name></author>
	</entry>
</feed>