Talk:Videos/A Digital Media Primer For Geeks


Revision as of 18:20, 8 March 2012

Welcome to the discussion.

To discuss the video, make an account and hit edit. Please feel free to point out errata, suggest additional resources, or just ask questions!


Introduction

Analog vs Digital

Raw (digital audio) meat

Don't forget when talking about higher sampling rates that frequency and temporal response are inherently linked. One often overlooked aspect of this is the value of higher sampling rates in presenting subtle differences in multi-channel timing (e.g. the stereo field). Even fairly uncritical listeners presented with sample audio blind can notice this. --Chaboud

They aren't merely "linked"; they're mathematically indistinguishable. If a system doesn't have a response beyond some frequency, it also lacks time resolution beyond some point.
To the best of my knowledge, a perceptually justified need for higher rates is not supported by the available science on the subject. Not only is there no real physiological mechanism proposed for this kind of sensitivity, but well-controlled blind listening tests don't support it either— well-controlled being key: loudspeakers can suffer from considerable non-linear effects, including intermodulation, and a lot of otherwise inaudible ultrasonic content can produce audible distortion at lower frequencies. Another common error is running the DAC at different frequencies, with the obvious interactions with the reconstruction and analog filters. A correct test for determining the audibility of different sample rates needs to use a single DAC stage at the highest frequency, re-sampling digitally to create the bandpass... etc. I'm not aware of any such test supporting a need for information beyond 24kHz.
I normally suggest that people looking for increased spatial realism look into acoustic holography techniques like higher-order ambisonics and wavefield synthesis.
The beyond-48kHz sampling subject has been discussed a number of times on Hydrogen Audio; I recommend reading the threads there, they are quite informative. Most audio groups out there, online and off, are not very scientifically oriented (i.e. evidence-based)— HA is special because it is one of the few that are.--Gmaxwell 06:00, 24 September 2010 (UTC)
I don't think higher Fs's than 48000 Hz are justified for psychoacoustic reasons, but we definitely use them in production for sound effects and music mastering, because it significantly improves the quality of pitch shifting and time stretching. On features we record all sound effects at 96 kHz (at least) so we have the liberty to pitch them down an octave. We shoot at 96k or 192k, and I archive to FLAC and use Apple Lossless .m4a's for online use, since they're better supported by our DAWs. As a distribution format, though, a base rate of 48kHz is definitely all you need for the home listening environment. Iluvcapra 18:39, 26 September 2010 (UTC)
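
A minimal numpy sketch of the time/frequency duality point above (an editorial illustration, not part of the original thread; the 0.01-sample delay is an arbitrary example value): a bandlimited impulse delayed by a hundredth of a 48 kHz sample period is still exactly representable, and the sub-sample delay can be recovered from the samples alone.

 import numpy as np
 
 n = np.arange(-256, 256)        # sample indices around the impulse
 
 # A bandlimited impulse is a sampled sinc; delaying it by a fraction
 # of a sample period just shifts the sinc's argument. The result is
 # still bandlimited to fs/2 and thus perfectly representable at 48 kHz.
 delay = 0.01                    # 0.01 samples, about 0.2 us at 48 kHz
 x = np.sinc(n - delay)          # the delayed impulse, sampled at 48 kHz
 
 # Recover the delay by sinc-interpolating the samples on a fine grid.
 t = np.linspace(-2, 2, 4001)    # grid step: 0.001 samples
 y = np.array([np.dot(x, np.sinc(tt - n)) for tt in t])
 print("recovered delay:", t[np.argmax(y)], "samples")   # ~0.01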

Video vegetables (they're good for you!)

An interesting point is that the discussion of the linear segment in the normal display responses (e.g. sRGB) is incorrect, or at best incomplete, though I've come up short on good citations for this, so Wikipedia remains uncorrected at this time.--Gmaxwell 05:15, 22 September 2010 (UTC)
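
For reference (an added illustration, not part of Gmaxwell's comment), the piecewise sRGB encoding being discussed, with its linear segment near black; the constants are those of IEC 61966-2-1:

 def srgb_encode(linear):
     """Encode a linear-light value in [0, 1] to sRGB (IEC 61966-2-1).
 
     Below a small threshold the curve is a straight line through the
     origin; above it, a 1/2.4-exponent power law with an offset.
     """
     if linear <= 0.0031308:
         return 12.92 * linear
     return 1.055 * linear ** (1 / 2.4) - 0.055
 
 print(srgb_encode(0.001))   # linear toe: 0.01292
 print(srgb_encode(0.5))     # power segment: ~0.7354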


Hi there, great tutorial, but in fact the most common DVD standard is 720 pixels by 480 pixels, with a pixel aspect ratio of 0.9, yielding a device aspect ratio of 1.35. I understand that you're trying to simplify the lecture to 4:3 aspect (1.333) for newbies, but I think this is ultimately misleading, since the vast majority of DVDs are not sampled at 704x480. --Dryo

Sort of-- the most common encoding is 720x480, but with the crop area set to 704x480; that's what the standard calls for (I was being sneaky when I said 'display resolution of 704x480'). Many software players ignore the crop rectangle and also display the horizontal overscan area. Many software encoders also just blindly encode 720x480 without setting the crop area. It is a source of *much* confusion. --Monty
"The standard" here being— Rec. 601? Is there anything else? We should probably at least link Wikipedia:overscan. --Gmaxwell 13:13, 24 September 2010 (UTC)
OK, thanks for the clarification Monty... I did not even know that the horizontal crop area existed.
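
A quick arithmetic check of the figures in this thread (an added sketch; 10/11 is one commonly cited Rec. 601-derived NTSC pixel aspect ratio, alongside Dryo's 0.9):

 from fractions import Fraction
 
 par_601 = Fraction(10, 11)          # Rec. 601-derived NTSC pixel aspect
 
 print(704 * par_601 / 480)          # 4/3: the 704x480 crop is exactly 4:3
 print(float(720 * par_601 / 480))   # ~1.364: the full 720 width adds overscan
 print(720 * 0.9 / 480)              # 1.35: Dryo's figure, using PAR 0.9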

"[...] most displays use [RGB] colors [...]". Doesn't that sentence contradict this one : "[...] video usually is represented as a [...] luma channel along with additional [...] chroma channels, the color". I don't understand what "position the chroma pixels" means exactly. Are we talking of real points on a display ? Thanks, great video ! --Ledahulevogyre 13:59, 24 September 2010 (UTC)

Display devices use RGB. Most video is actually encoded as YUV, luma plus two color "difference" channels. This reduces the bandwidth of raw video by cleverly exploiting limitations in human perception. Additionally, color samples need not be as frequent as luminance samples. So "chroma pixels" are the color data samples, not the pixels on a real display. --Dryo
Thanks Dryo! That's what I thought. Then I don't quite understand what this chroma sample positioning/siting is about. Is it actually defining the algorithm you should use to compute RGB pixels from YUV samples? Is it defining the influence zone of chroma samples over luminance ones? What I don't get is how you can talk about spatial positioning for something that is, well... not spatial (samples). Thank you again! --Ledahulevogyre 09:52, 25 September 2010 (UTC)
Imagine a small 2x2 image, with the top two pixels blue and the bottom two pixels red. Luminance will be sampled at each pixel, but (for 4:2:0) only one sample of Cr will be taken for this 2x2 set, so you'll have to decide where. If you place the sample in the middle horizontally but aligned with the even or odd line, you'll get a sample from either blue or red. If you place it in the middle both horizontally and vertically, you'll get a sample of pink. Similarly for each other possible placement. Ogg.k.ogg.k 10:24, 25 September 2010 (UTC)
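
A small numpy sketch of that 2x2 example (an added illustration; the siting names are simplified stand-ins for the JPEG/MPEG conventions, and the BT.601 constants are full-range approximations):

 import numpy as np
 
 # 2x2 image: top row blue, bottom row red.
 rgb = np.array([[[0, 0, 255], [0, 0, 255]],
                 [[255, 0, 0], [255, 0, 0]]], dtype=float)
 
 # BT.601 luma and Cr (full-range approximation).
 y  = 0.299 * rgb[..., 0] + 0.587 * rgb[..., 1] + 0.114 * rgb[..., 2]
 cr = (rgb[..., 0] - y) * 0.713 + 128.0
 
 def site_chroma(block, siting):
     """The single 4:2:0 chroma sample for a 2x2 block, per siting choice."""
     if siting == "co-sited":    # aligned with the top-left luma sample
         return block[0, 0]
     if siting == "mid-line":    # midway between the two lines
         return block[:, 0].mean()
     return block.mean()         # "centered": middle of the 2x2 block
 
 for s in ("co-sited", "mid-line", "centered"):
     print(s, "->", round(site_chroma(cr, s), 1))
 # co-sited -> 107.3 (pure blue's chroma); the centered sitings give
 # 181.4, a blue/red blend: the "pink" sample from the example above.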

color and colorspace

I'm not 100% sure of this, but when I was messing with analogue and digitised-analogue video at University, I thought the key difference between 4:1:1 YUV and 4:2:0 YUV was that the former has the same color sub-sampling on every field (dealing with interlaced content), each at one-quarter of full rate, whereas 4:2:0 YUV has only Y and U in the first field, then only Y and V in the second.

Effectively the "4:2:0" signal is successively 4:2:0 and 4:0:2, which is why the V component doesn't go away altogether as the name implies. The reason for such a strange encoding standard and its use for PAL-format DV encoding is that the analogue PAL composite signal used to get video around production facilities already throws away half the colour information from each field, so there would only be reduced-temporal-resolution U and V input to be digitised. It obviously makes a lot less sense for progressive-scan content without the interlace (it would look pretty poor to reduce the colour frame rate to 12.5 fps). There is an argument that the conversion of 4:2:0 interlaced content into progressive data for computer display converts it into 4:1:1 material when the color planes of the two fields are married up?

The only important thing to come out of this is that the diagram on the whiteboard looks a lot more like 4:1:1 video, and I would expect that to be the correct choice for progressive-scan content (which I take your images to be, it being simpler). The narration of the next scene also uses 4:1:1 rather than 4:2:0, which tends to emphasise the same point.

Containers

General discussion

The video hasn't yet been formally released but we have all the sites up early in order to get everything debugged... Feedback on site functionality prior to the official release would be very helpful. --Gmaxwell 15:15, 22 September 2010 (UTC)

Released now, but still tell us about bugs :-) --Monty

When do you plan to create and/or release the next episode in this series? --Minerva 05:52, 16 November 2010 (UTC)

Mad props to you for taking the time and spending the effort to make this video. I've been waiting for episode 2 since this first came out, any chance of that being produced soon? --StFS 17:20, 14 June 2011 (PDT)

Atom/RSS feed

Could not find an Atom/RSS feed for the video episodes. A videocast URL with video-link enclosures would be ideal for getting future episodes. But even an announce-only feed would be convenient for tracking new episode releases. --Gsauthof 17:41, 24 September 2010 (UTC)

One does not exist yet— as a stopgap you can follow the Xiph tag on Monty's blog and you'll be sure to hear about new videos. This has to be the most requested feature— I'll make sure we do it before the next video.--Gmaxwell 20:50, 24 September 2010 (UTC)
Monty uses the tag 'admpfg' for these videos on his blog, and Livejournal supports RSS feeds for tags. So, here's an RSS feed: http://xiphmont.livejournal.com/data/rss?tag=admpfg Nerd65536 17:52, 26 September 2010 (UTC)

44100 Hz Trivia

The reason CDs use a 44,100 Hz sampling rate (actually 44,056 Hz in the United States) is that, before dedicated digital recorders became mainstream, the only way a recording engineer or producer could record digital audio was with a piece of gear called a "PCM processor" or "PCM adaptor" (like a Sony PCM-F1 or PCM-501). These would take an audio input and, after running it through the A/D if necessary, modulate it onto a baseband monochrome NTSC or PAL video signal that could then be recorded onto a 3/4" U-Matic video tape. The processors would accept two inputs at 16 bits, giving a total bit rate of 1,411,200 bps. This number has the serendipitous property of being evenly divisible by both 30 and 25 (giving 47,040 and 56,448), and those quotients let both NTSC and PAL encode the same number of bits per scan line, 98, on the NTSC 480-line raster and the PAL 576-line raster. It was just a convenient selection of integers. CDs would be recorded at 44.1k in Europe, as they were mastered onto 25 fps tapes, while CDs in the US, mastered at a "nominal" 30 fps (actually 29.97), were actually at 44,056 Hz; the difference in tone is basically inaudible. Iluvcapra 18:44, 24 September 2010 (UTC)

Note that the PCM audio signal, once modulated to NTSC or PAL, can be recorded on any video recorder, not just U-matic. The most common tape format for PCM audio was Sony Betamax. Sony sold Betamax decks bundled with external PCM A/D converter units for the pro audio market. The PCM-F1 was designed to be used with Betamax VCRs. -- Dryo
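
The integer bookkeeping in Iluvcapra's post is easy to verify; a quick sketch:

 # Checking the arithmetic in the 44.1 kHz post above.
 bits_per_second = 44100 * 16 * 2            # stereo, 16 bits per sample
 assert bits_per_second == 1_411_200
 
 bits_ntsc_frame = bits_per_second // 30     # 47,040 bits per NTSC frame
 bits_pal_frame  = bits_per_second // 25     # 56,448 bits per PAL frame
 
 # Both rasters carry the same payload per scan line:
 assert bits_ntsc_frame // 480 == bits_pal_frame // 576 == 98
 print("98 bits per scan line on both NTSC and PAL")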

32 bit IEEE754 float

The video says that 32 bit floats have 24 bits of resolution and a 7 bit exponent. This is incorrect. The exponent is eight bits. The mantissa does have a resolution of 24 bits, but only 23 bits are explicit. The encoding has one implicit bit that is always '1'. A 32 bit float can store all possible 24 bit integers exactly. Xiphmont 17:20, 8 March 2012 (PST)
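
A quick Python check of the bit-field layout and the 24-bit-integer claim (an added illustration of the IEEE 754 single-precision format):

 import struct
 
 def fields(x):
     """Split a 32-bit float into (sign, biased exponent, explicit mantissa)."""
     bits = struct.unpack(">I", struct.pack(">f", x))[0]
     return bits >> 31, (bits >> 23) & 0xFF, bits & 0x7FFFFF
 
 print(fields(1.0))   # (0, 127, 0): 8-bit exponent, biased by 127; the
                      # mantissa's leading '1' is implicit, not stored.
 
 # Every 24-bit integer survives a float round trip exactly...
 assert all(struct.unpack(">f", struct.pack(">f", i))[0] == i
            for i in (2**24 - 1, 2**24))
 # ...but the first 25-bit integer does not: it rounds to 2**24.
 assert struct.unpack(">f", struct.pack(">f", 2**24 + 1))[0] == 2**24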