Talk:Videos/Digital Show and Tell

From XiphWiki

Greetings! Feel free to comment here (just log in to edit) or join us on IRC chat.

The wiki version of the video isn't yet as complete as the last video, due to schedules and timelines. In particular I think it could use some more going-deeper coverage. I'm surprised that I couldn't find better HTML5 audio API examples of the "type your own JS, get audio and a scope" kind; if anyone knows of a better one than the one we have now, that would be great.

--Gmaxwell 02:54, 26 February 2013 (PST)

Just dropping a line to say thank you! I'm continually impressed by the guides and overall outreach coming out of the xiph team. The latest video was a great introduction that managed to walk that fine line between theory and application without falling over or flailing about madly (in my opinion, anyway). Not to mention, I'm going through the gtk-bounce and waveform code now and really like it! It's not so trivial a piece of software as to be meaningless when learning to code useful applications, but it's not so gigantic as to be unapproachable either. Hell, I think it would serve as a great example for the GNOME folks to use in their documentation. Most guides on GTK just have you draw shapes on the screen and leave it at that. All in all, I'm really impressed and hope to have a similar setup replicated in a few weeks at my university, just for the sake of it.

--Aggroskater 23:16, 26 February 2013 (PST)

Some parts are better written than others... I used enough cut & paste to warn against taking it too seriously :-) --Xiphmont 10:50, 28 February 2013 (PST)

@Monty: Thanks for this information. And as a non-native English speaker I want to thank you for your clear pronunciation. What did you mean by "no one ever ruined a great recording by not dithering the final master"? Do you mean that nobody would ever forget it, or that it was not ruinous? That "CSS is awesome" cup made me really nervous. I hope it means something like "Cascading Style Sheets" and not, what would fit better in this context, "Content Scramble System" [shudder]! --Akf 15:20, 27 February 2013 (PST)

I meant that "not adding dither is not ruinous". I recall in at least one listening test on the subject, a minority of participants had a statistically significant preference for undithered versions, at least on those samples where it was in fact audible and the testers were encouraged to increase gain and listen to fade-outs. *However* I can't find the results of that test now that I've gone back to look for it, so my memory may be faulty. I've asked the HA folks to help me find it again if it really existed :-)
CSS refers to Cascading Style Sheets. Thus the typesetting that overflows the box. --Xiphmont 10:50, 28 February 2013 (PST)
I once built an 8-bit ADC from transistors, for experimentation. One strange result was how few bits are needed for speech. 8kHz with just one bit resolution is still quite intelligible (though rather noisy). --RichardNeill 08:08, 4 March 2013 (PST)

Thanks for these resources! One question: In the vid, you mention the Gibbs phenomenon. Is that in any way related to the Fourier uncertainty principle? These days, in various audio-related forums, people throw this Oppenheim and Magnasco (2013) paper entitled "Human hearing beats the Fourier uncertainty principle" around in response to your 24/192 article. Does the paper qualify any of the results presented in the video and/or your 24/192 article? (Just fixed the reference.) Lenfaki 13:14, 28 February 2013 (PST)

It is related to the Fourier uncertainty principle in that all of these effects are in some way related by the same math. As for the "Human hearing beats the Fourier uncertainty principle" paper floating around, a) the headline is effectively wrong, b) the effect described as 'newly discovered' has been understood for roughly 100 years, this merely adds some new hard measurements to the data set, c) the Gabor limit does not even apply to the detection task they're describing. So either the authors or their editor are partly confused. There's been a decent discussion of it at Hydrogen Audio, with none other than James Johnston and Ethan Winer weighing in. --Xiphmont 10:50, 28 February 2013 (PST)

You talk about discrete values (whether the analog sample points, or the infinitesimal image pixels). BUT, these are in some way, averages. In a digital camera, the pixel value is the integral across about 90% of the pixel-pitch. In analog audio, is it an instantaneous sample, or an average over the preceding sample-interval, or is it sometimes even more "blurred" than that? Also, when performing DAC, how do we get rid of the stairstep so perfectly without distortion? --RichardNeill 07:51, 4 March 2013 (PST)

The pixel values in a camera are area averages exactly as you say, a necessary compromise in order to have enough light to work with. The sensor sits behind an optical lowpass filter that is intentionally blurring the image to prevent much of the aliasing distortion (Moiré) that would otherwise occur. Despite that, cameras still alias a bit, and if you _remove_ that anti-aliasing filter, you get much more (I have such a camera, danged filter was bonded to the hot filter, so both had to go to photograph hydrogen alpha lines).
Audio does in fact use as close to an instantaneous sample as possible. The 'stairsteps' of a zero-order hold are quite regular in the frequency domain; they're folded mirror images of the original spectrum extending to infinity. All the anti-imaging filter has to do is cut off everything above the original channel bandwidth, and it doesn't even have to do a great job to beat a human ear :-) --Xiphmont 02:56, 12 March 2013 (PDT)

What's the correct way to plot a reconstructed waveform? If I have an array of samples and play them back through a DAC, the oscilloscope shows a smooth curve. But plotting them with eg matplotlib shows a stairstep. Thanks --RichardNeill 07:58, 4 March 2013 (PST)

well, a fully reconstructed waveform is equal to the original input; it's a smooth continuous waveform. OTOH, if you want to plot an actual zero-order hold, a zero order hold really is a staircase waveform.
If you want to plot the digital waveform pre-reconstruction, a mathematician would always use lollipops, an engineer will use whatever's the most convenient. --Xiphmont 02:56, 12 March 2013 (PDT)
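For anyone who wants to try the three views described above, here is a minimal Python sketch. It is an illustration only, not code from the video or from gtk-bounce: the function names are made up for this example, time is measured in sample periods, and the smooth curve uses a truncated Whittaker-Shannon (sinc) sum, so it is only trustworthy away from the ends of the sample array.

```python
import math

def sinc(u):
    """Normalized sinc: sin(pi*u)/(pi*u), with sinc(0) = 1."""
    if u == 0.0:
        return 1.0
    return math.sin(math.pi * u) / (math.pi * u)

def lollipops(samples):
    """Pre-reconstruction view: one (index, value) point per sample."""
    return [(float(n), x) for n, x in enumerate(samples)]

def staircase(samples):
    """Zero-order hold: each value is held until the next sample instant."""
    points = []
    for n, x in enumerate(samples):
        points.append((float(n), x))
        points.append((float(n + 1), x))
    return points

def smooth(samples, oversample=8):
    """Band-limited curve through the samples (truncated sinc sum).

    Assumption: the array is treated as zero outside its ends, so the
    result is only accurate well inside the array.
    """
    points = []
    for k in range((len(samples) - 1) * oversample + 1):
        t = k / oversample  # time in units of the sample period
        y = sum(x * sinc(t - n) for n, x in enumerate(samples))
        points.append((t, y))
    return points
```

Each function returns a list of (time, value) pairs that can be handed to whatever plotting tool is convenient; in matplotlib, for instance, the lollipop points suit a stem plot and the other two an ordinary line plot.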

I have a strange issue with the gtk-bounce program - on (k)ubuntu 12.10, spectrum and waveform work just fine, but if I scroll into the gtk-bounce panel, the cursor disappears. Anyone seen that behaviour? - Julf

Edit out the calls to "hide_mouse()" in gtk-bounce-widget.c. It's hiding the mouse on purpose because it's supposedly a touch application :-) --Xiphmont 02:56, 12 March 2013 (PDT)

One issue I'm not sure you have covered relates to content with short, loud sections (e.g. more like a movie or maybe classical music, less like pop music). I understand that a 10dB change in sound level is perceived as twice as loud. Let's say we have some content where one section is 4 times (20dB) louder than the rest (not unreasonable - that is the difference between a conversation and street noise according to this page). If each extra bit adds 6dB of dynamic range, then the quieter sections will effectively be quantized using roughly 3 fewer bits than the louder section (e.g. 13 bits rather than 16 bits). If the majority of the content is relatively quiet (i.e. being quantized using 13 bits or less, depending on how quiet it is relative to the peaks), then is it really fair to claim "16-bit quality" for the entire piece of content? Is this a real problem? Is this ever audible? Klodj 21:59, 11 March 2013 (PDT)

the bit depth doesn't affect the correctness or 'fineness' of the reconstruction, it only changes the noise floor. It is the same and sounds the same as what happens when recording quieter-than-full-range on analogue tape. Compare a 16-bit digital signal recorded at -20dBFS to the same signal recorded on tape at -20dBV. Both will be smooth and analogue, but the tape will be noisier ;-) --Xiphmont 02:56, 12 March 2013 (PDT)
Thank you for your reply! OK, I understand that, when comparing digital to "analog" (tape), this is a non-issue. The tape noise floor is higher. But could you clarify a related point that is entirely in the digital domain? Let's say we have a 16-bit recording of the 1812 Overture where the cannons don't clip. The average level is going to depend on how the dynamic range is compressed to handle the cannons, but let's say it's -36dBFS. If I adjust the volume to suit the average level, then won't I effectively be hearing a noise floor equivalent to 10-bit quantization (16 bits - 36dB/6dB_per_bit) for the majority of the recording (dither aside)? --Klodj 19:13, 13 March 2013 (PDT)
Yes. --Xiphmont 00:54, 14 March 2013 (PDT)
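The arithmetic in the question can be written out as a small Python sketch. The helper name is hypothetical, and the 6.02 dB-per-bit default is the usual rule of thumb rather than a figure from this thread (the question rounds it to 6 dB).

```python
def effective_bits(bit_depth, level_dbfs, db_per_bit=6.02):
    """Bits of range left between a signal at level_dbfs and the quantization floor.

    level_dbfs is negative for signals below full scale.
    """
    return bit_depth + level_dbfs / db_per_bit

# The thread's numbers: a 16-bit recording averaging -36 dBFS, at 6 dB per bit.
print(effective_bits(16, -36.0, db_per_bit=6.0))  # 10.0
# With the slightly more precise 6.02 dB per bit the answer barely moves.
print(round(effective_bits(16, -36.0), 2))        # 10.02
```

This only restates the noise-floor bookkeeping; as noted earlier in the thread, the quiet passage is still reconstructed smoothly, just with a relatively higher noise floor.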

I've really enjoyed your videos, many thanks for your work producing them. In keeping with your message in the video about breaking things I was curious to see what would happen as you pushed the signal up to and beyond the Nyquist frequency. Would the filters mean that it would simply fade out smoothly (what order filter is employed?)? I suppose I could check this out myself, but would be interested to hear your answer. Also is the Gibbs phenomenon audible to any degree? --Stuarticus 14:00, 11 June 2013 (PDT)

If the filter is smooth, it will fade smoothly. If the filter is sharp, it will drop off suddenly. Some DACs put the transition band of the anti-imaging filters straddling the Nyquist frequency or slightly past, so you might even see the signal fold back about Nyquist as it drops off (this used to be more common about 10 years ago). So long as no significant aliasing reaches back under 20kHz, this is just one of many arbitrary design decisions that don't really affect the audio quality. The order of the digital filter used in the upsampling stage can be nearly anything, but they're not likely to be as huge as many software resampling filters, where a linear-phase FIR of 512 or 1024 taps is common. The analog filter stages, if they're there at all, are unlikely to be more than a handful of taps. Detailed spec sheets should say outright, and if they don't they should at least usually mention the approximate slope. --Xiphmont 03:45, 15 June 2013 (UTC)

I just watched the video and have one question. In the demonstration, Monty feeds a signal generator at various frequencies into the converter, digitizes at 44.1kHz, converts back to analog, and shows the analog result on a second oscilloscope, which is a good way to show the whole digital encode and decode path. With a 1kHz sine wave input sampled at 44.1kHz, there are 44.1 samples per cycle, or about 22 samples per half cycle, which is not too bad for representing the sine wave, so the output of the digital-to-analog conversion should still be quite close to the original sine wave. However, with a 20kHz input there are only about 2.2 samples per cycle, or just about 1 sample per half cycle, which seems barely enough to represent a sine wave. My question is: if there is just one sample per half cycle, how can the output of the digital-to-analog conversion still be a very good sine wave? If 20kHz is sampled at 192kHz, there are 9.6 samples per cycle, or 4.8 samples per half cycle, which is far from perfect but still better than one. Does this mean that we should get better output if we increase the sampling rate from 44.1kHz to 192kHz? --Somchaisis 05:37, 25 August 2013 (UTC)

You'll have exactly the same 20 kHz sine wave at 192 kHz as at 44.1 kHz. Watch 6m 40s - 7m 06s again. Only slightly more than two sample points per period are needed to perfectly recreate a sine wave; any extra samples are redundant. --Leorex 09:29, 15 January 2014 (PST)
2.2 samples per period is enough to fully recreate the original sine wave. It's counterintuitive, but try to think of it like this. You know the (analog) input signal is band-limited to 20kHz, so there are no frequencies higher than 20kHz to be reconstructed. Now look at the 2.2 samples per period; try to draw a continuous line through all the samples without using any frequencies above 20kHz. The maximum frequency of 20kHz limits how "quickly" the line can rise or fall. So in fact, there is only *1* solution for the line you draw through the sampling points. You can 100% recreate the original analog signal from the sampling points. You will not get better output by increasing the sampling rate to 192kHz, because you have already reconstructed 100% of the signal. -- Nhand42 13:26, 14 February 2014 (PST)
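The claim is easy to check numerically. The sketch below is a rough illustration under stated assumptions, not how a real DAC works: it samples an ideal 20 kHz sine analytically, rebuilds it at arbitrary instants with a truncated Whittaker-Shannon (sinc) sum, and compares the 44.1 kHz and 192 kHz cases. The function names, the 2000-sample half width, and the test instants are all choices made for this example.

```python
import math

def reconstruct(freq_hz, rate_hz, t, half_width=2000):
    """Truncated sinc-sum reconstruction of sin(2*pi*freq_hz*t) at time t.

    Only the 2*half_width + 1 samples nearest to t are used, so the
    result carries a small truncation error.
    """
    center = round(t * rate_hz)
    total = 0.0
    for n in range(center - half_width, center + half_width + 1):
        sample = math.sin(2.0 * math.pi * freq_hz * n / rate_hz)
        u = t * rate_hz - n  # distance from sample n, in sample periods
        if u == 0.0:
            total += sample
        else:
            total += sample * math.sin(math.pi * u) / (math.pi * u)
    return total

def worst_error(freq_hz, rate_hz, instants):
    """Largest deviation from the true sine over the given instants."""
    return max(
        abs(reconstruct(freq_hz, rate_hz, t) - math.sin(2.0 * math.pi * freq_hz * t))
        for t in instants
    )

# Instants that mostly fall between sample points at both rates.
instants = [k * 1.7e-6 for k in range(1, 30)]
err_44 = worst_error(20000.0, 44100.0, instants)
err_192 = worst_error(20000.0, 192000.0, instants)
print(err_44, err_192)
```

Both errors come out small, and what remains is due to cutting the infinite sum short rather than to the sample rate; widening `half_width` shrinks it further.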

Regardless of reconstruction filter, though, approaching the Nyquist limit it seems that bit depth and phase become extremely important: A cosine wave sampled on the peaks at Nyquist should reconstruct with full bit depth, but a sine wave will be sampled at its zeros and so have basically zero bit depth. A slightly-shifted sine wave would have around 1 bit of depth, and anything other than a cosine wave has a phase ambiguity (i.e., a cos wave shifted to the right by delta samples the same way as a cos wave shifted to the left by delta). Thoughts? BenFrantzDale (talk) 06:35, 9 November 2015 (PST)
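The edge case described here can be reproduced in a few lines of Python. This is only a sketch of a tone at exactly half the sample rate, with a made-up helper name; the sampling theorem itself requires the signal to stay strictly below that frequency, which is why this case is ambiguous.

```python
import math

def sample_at_nyquist(phase, count=8):
    """Samples of cos(pi*n + phase): a tone at exactly half the sample rate."""
    return [math.cos(math.pi * n + phase) for n in range(count)]

cosine = sample_at_nyquist(0.0)           # sampled on its peaks: +1, -1, +1, ...
sine = sample_at_nyquist(-math.pi / 2.0)  # sampled on its zero crossings: ~0

# Shifting the cosine left or right by the same amount gives the same samples,
# which is the phase ambiguity mentioned above.
shifted_left = sample_at_nyquist(+0.3)
shifted_right = sample_at_nyquist(-0.3)

print(cosine)
print(sine)
print(shifted_left)
print(shifted_right)
```

Any tone even slightly below half the sample rate does not have this problem, because its phase relative to the sample instants drifts over time instead of staying locked.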

Related: the notion that you can place a step anywhere is clearly (by the pigeonhole principle) a pedagogical white lie. (It shouldn't matter 'cause with even moderate bit depth, the crossing time can be placed to unreasonably good precision.) Still, the time at which a step function crosses a given value is strongly a function of the neighboring two samples, so if the signal is sufficiently low-amplitude, you'll run out of bits and stepping either sample by 1 DAC unit would move the crossing by a nontrivial fraction of the sampling period. BenFrantzDale (talk) 06:35, 9 November 2015 (PST)

Thank you for the entertaining and educational demo. You have a natural talent for explaining complicated concepts. I studied signals and systems at university and I swear you managed in 30 minutes to explain the same material that took my professors nearly 6 months! I'm looking forward to your next show and tell. -- Nhand42 12:54, 14 February 2014 (PST)

So I understand that only one bandlimited signal is able to pass through the samples perfectly. But how does the DAC know which one? Does it do some math inside? How are all the volts created that form this smooth curve? --Vietwoojagig 04:30, 4 April 2014 (PDT)

And another question: Audiophile "believers" will argue that digital signals always have jitter. What type of noise is that? And how would you argue that this does not matter? --Vietwoojagig 04:30, 4 April 2014 (PDT)

Hi, I've got 3 questions, but first let me say thank you for the great explanations, I donate to Wikipedia and have also donated here. Great videos. On to the questions:

The 3 questions relate to Sample rate, Waveform Representation and the Frequencies (if you like) ADSR Envelope behaviour.

1) If we record (or generate data raw from inside the computer) using a Super High Sample Rate, like 192kHz, we would be bandlimiting at a much higher frequency, thus capturing more harmonics, which would in turn, give us a far more accurate representation of the waveform inside the computer when we re-generate the waveform using the recorded samples. So when we have the interest of editing (or processing) the recorded waveform with a much higher level of detail, it is beneficial to use much higher sample rates. So since the higher the Sample Rate = the more accurate the Waveform, it does make sense to use as high a sample rate as possible (considering your resources for processing or storing data). Would this entire statement be 100% accurate?

The article 24/192 Music Downloads ...and why they make no sense Section "192kHz considered harmful" gives you the answer to that question --Vietwoojagig 02:40, 5 May 2014 (PDT)

2) About the ADSR (envelope) recording of each Frequency. In short, would an increased Bit Depth produce a more accurate representation of each frequency's Attack, Decay, Sustain & Release in Amplitude? And would the output therefore be better able to represent transients (and what follows) accurately?

A frequency doesn't have attack, decay, or sustain. When we are talking about frequency components, we are talking about making up the entire song (say) out of a sum of sine and cosine waves of different frequencies. What our ears hear as attack, sustain, and decay comes from a sum of many many many different sine/cosine waves. It's a little mind-bending if you haven't grokked it before. BenFrantzDale (talk) 07:07, 9 November 2015 (PST)
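A tiny Python sketch may make this less mind-bending. It adds two sine waves of constant amplitude and shows that their sum swells and fades under an envelope, even though neither component changes level. The two frequencies are arbitrary values picked for illustration, and a real attack or decay needs many more components than two.

```python
import math

F1, F2 = 440.0, 444.0  # hypothetical tone frequencies in Hz

def two_tones(t):
    """Sum of two steady sine waves, each with constant amplitude 1."""
    return math.sin(2.0 * math.pi * F1 * t) + math.sin(2.0 * math.pi * F2 * t)

def envelope(t):
    """Envelope of the sum, from sin(a) + sin(b) = 2 sin((a+b)/2) cos((a-b)/2)."""
    return abs(2.0 * math.cos(math.pi * (F2 - F1) * t))

# The sum never exceeds the envelope, which swells and fades over time.
for t in (0.0, 0.0625, 0.125, 0.1875, 0.25):
    print(t, round(envelope(t), 3))
```

The envelope falls from 2 to roughly 0 and back over a quarter of a second, so the loudness change lives in the relationship between the components, not in either frequency on its own.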

3) Another question relating to capturing ADSR behaviour of frequencies. I came across an idea online that in order to capture a transient's ADSR behaviour more accurately, you will need (at least) twice the Sample Rate of the Transient's Sustain Rate, otherwise the Transient will be deformed slightly (therefore distorted) - The Transient would still be represented at the output, but because the voltage has "fewer" samples to go through, the Transient will be less accurately represented compared to what the original Input was. The more samples the voltage runs through smoothly, the more detailed the capture of the frequencies' ADSR Amplitude behaviour. Is this statement 100% accurate?

I'm not sure about this one, but I think that's just saying you need to properly sample every frequency component you are interested in. BenFrantzDale (talk) 07:07, 9 November 2015 (PST)

--Electronic 09:10, 03 May 2014 (PST)

Sample rates and A/D/A conversions

Hi Monty, or others that know the answers, if I were you I'd be completely frustrated by what I'm about to ask regarding sample rates and A/D/A conversions after all the trouble you've been to to explain, so thanks for your patience. I've read the "24/192 Music Downloads" article and watched this video/read the wiki, but these particular questions are not answered:

1) The A/D/A process produces a constant analogue signal, analogous to that which it is sampling. Could you explain the difference that would be seen if using music rather than a constant frequency/amplitude analogue signal?

Music is a sum of frequencies. So no difference.--Vietwoojagig 07:00, 2 February 2015 (PST)

What I'm failing to understand is, if using a constantly varying analogue input, the sampling, which is fast but not the speed of analogue fast, will miss some small part of those frequency/amplitude changes and the result, shown in the analogue output, will be an average of the previous and successive samples. Is this correct? Leading to...

Noise--Vietwoojagig 07:00, 2 February 2015 (PST)
The in-between values that get missed by ideal instant sampling aren't actually missed: they are seen by the low-pass analog prefilter, and so those values do affect the digitized samples. Of course, if you add a signal that is, e.g., 2x the Nyquist frequency (one full cycle per sample), then there are some interesting cases: If that high-frequency signal is out of phase with the samples, then without a prefilter you wouldn't see it, and with a prefilter you still wouldn't see it because that content would be filtered out. If it is in phase with the sampling, then without a prefilter you'd be shifting every sample by the same amount (the tone aliases to DC), but with prefiltering it would be rejected, and so digitization still wouldn't see it. All of this assumes that your signal is long enough to be considered infinite (or repeating), of course. BenFrantzDale (talk) 06:56, 9 November 2015 (PST)
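As a rough numerical illustration of the no-prefilter case, the Python sketch below computes where an unfiltered tone lands after sampling and then samples a tone at exactly the sample rate with two different phases. The helper name is invented for this example, and no prefilter is modelled at all; with one in place, none of this content would reach the sampler.

```python
import math

def alias_frequency(freq_hz, rate_hz):
    """Frequency at which an unfiltered tone at freq_hz shows up after sampling."""
    return abs(freq_hz - rate_hz * round(freq_hz / rate_hz))

print(alias_frequency(44100.0, 44100.0))  # 0.0: a tone at the sample rate lands on DC
print(alias_frequency(24100.0, 44100.0))  # 20000.0: folded back about Nyquist

# One full cycle per sample, sampled in phase (on the peaks) and out of phase
# (on the zero crossings).
in_phase = [math.cos(2.0 * math.pi * n) for n in range(6)]
out_of_phase = [math.sin(2.0 * math.pi * n) for n in range(6)]
print(in_phase)      # every sample is 1: a constant offset
print(out_of_phase)  # every sample is ~0: the tone is invisible
```

The in-phase tone turns into a constant offset and the out-of-phase tone vanishes, which is why both have to be removed before sampling rather than afterwards.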

2a) "24/192..." made reference to better quality masters (SACD Vs CD); is a higher sample rate a contributing factor to that higher quality?

No--Vietwoojagig 07:00, 2 February 2015 (PST)

2b) If sampling/recording/digitising live music, a higher sample rate will produce a more accurate digital master than the same live music recorded at lower sample rates?

No. Also live music is the sum of frequencies--Vietwoojagig 07:00, 2 February 2015 (PST)
The interesting caveat Monty mentioned in one of his videos is that sampling at a higher rate allows some of the prefiltering to be done digitally, but you don't have to *record* (i.e., *save*) at a high sample rate; you get the full benefit by just sampling at that rate, filtering the stream and downsampling to 44.1 kHz. BenFrantzDale (talk) 06:56, 9 November 2015 (PST)

3) Each sample is 16Bits long. What/How much data do these bits store/transmit?

16 Bit = 65536 different values from -32768...32767--Vietwoojagig 07:00, 2 February 2015 (PST)

There is reference everywhere to a Bit carrying Dynamic Range of the sample.

Dynamic range is the ratio between the largest and smallest signal a system can record or reproduce. [1]--Vietwoojagig 07:00, 2 February 2015 (PST)
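To put numbers on the two answers above, here is a short Python sketch. The helper names are made up, and the dynamic range figure is the plain ratio between full scale and one quantization step; it ignores dither and noise shaping.

```python
import math

def sample_range(bits):
    """Smallest and largest value of a signed two's-complement sample."""
    return -(2 ** (bits - 1)), 2 ** (bits - 1) - 1

def dynamic_range_db(bits):
    """Ratio between full scale and one quantization step, in decibels."""
    return 20.0 * math.log10(2 ** bits)

print(sample_range(16))                # (-32768, 32767)
print(round(dynamic_range_db(16), 2))  # 96.33
print(round(dynamic_range_db(1), 2))   # 6.02, the usual "6 dB per bit"
```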

Where does the information regarding frequency come from; is it an artefact of the sample rate combined with data carried by the Bit?

After thinking about it: yes.--Vietwoojagig 07:00, 2 February 2015 (PST)

Regards, Stephen Morris. And Thanks --> -- Storris 11:15, 15 September 2014 (PST)

Hey guys, I'm new to Linux and I'm having some trouble starting the software configuration shown in the video. I've managed to build all the programs and also created the fifos. When I try to start the configuration with the command from the wiki, I get this output on my console: "bash: gtk-bounce: command not found..." I can start gtk-bounce as a single application, but this config doesn't work. Any ideas what I have to do to make it work?

Regards, Moritz Schmidt ----Tujumo 14:09, 17. January 2015 (PST)

Graphical analog AA filtering

Great video! I've thought about the same issues in graphics (where at best the reconstruction actually is with square pixels(!)), which has led me to the Wikipedia Kell factor page, which I haven't heard discussed elsewhere. Now that we have sane resolution on our screens, it would be nice if they included antialiasing filters so we no longer had square pixels (and in particular, so a droplet of water didn't enlarge a pixel and show R/G/B!). It is interesting to note that typically a camera is actually diffraction limited, which introduces a pretty decent AA filter; it's just displays that are the problem. BenFrantzDale (talk) 06:35, 9 November 2015 (PST)