Talk:Videos/Digital Show and Tell

From XiphWiki
Revision as of 00:14, 3 May 2014 by Electronic (talk | contribs)

Greetings! Feel free to comment here (just log in to edit) or join us on IRC chat.

Hi, I've got 3 questions, but first let me say thank you for the great explanations. I also donate to Wikipedia and have also donated here. Great videos. On to the questions:

The 3 questions relate to sample rate, waveform representation, and the ADSR envelope behaviour of frequencies (if you like).


If we record (or generate raw data inside the computer) using a very high sample rate, like 192kHz, we would be band-limiting at a much higher frequency and thus capturing more harmonics, which would in turn give us a far more accurate representation of the waveform inside the computer when we regenerate the waveform from the recorded samples. So when we are interested in editing (or processing) the recorded waveform at a much higher level of detail, it is beneficial to use much higher sample rates. So, since a higher sample rate means a more accurate waveform, it makes sense to use as high a sample rate as possible (considering your resources for processing or storing data). Would this entire statement be 100% accurate?


About the ADSR (envelope) recording of each frequency. In short, would an increased bit depth produce a more accurate representation of each frequency's Attack, Decay, Sustain & Release in amplitude, and therefore, on the output, represent transients (and what follows) more accurately?


Another question relating to capturing the ADSR behaviour of frequencies. I came across an idea online that in order to capture a transient's ADSR behaviour more accurately, you need (at least) twice the sample rate of the transient's sustain rate, otherwise the transient will be deformed slightly (and therefore distorted). The transient would still be represented at the output, but because the voltage has fewer samples to go through, the transient will be less accurately represented compared to the original input. The more samples the voltage runs through smoothly, the more detailed the captured behaviour of the frequencies. Is this statement 100% accurate?

--Electronic 09:10, 03 May 2013 (PST)

The wiki version of the video isn't yet as complete as the last video, due to schedules and timelines. In particular I think it could use some more going-deeper coverage. I'm surprised that I couldn't find better HTML5 audio API examples of the "type your own JS, get audio and a scope" kind; if anyone knows of a better one than the one we have now, that would be great.

--Gmaxwell 02:54, 26 February 2013 (PST)

Just dropping a line to say thank you! I'm continually impressed by the guides and overall outreach coming out of the xiph team. The latest video was a great introduction that managed to walk that fine line between theory and application without falling over or flailing about madly (in my opinion, anyway). Not to mention, I'm going through the gtk-bounce and waveform code now and really like it! It's not so trivial a piece of software as to be meaningless when learning to code useful applications, but it's not so gigantic as to be unapproachable either. Hell, I think it would serve as a great example for the GNOME folks to use in their documentation. Most guides on GTK just have you draw shapes on the screen and leave it at that. All in all, I'm really impressed and hope to have a similar setup replicated in a few weeks at my university, just for the sake of it.

--Aggroskater 23:16, 26 February 2013 (PST)

Some parts are better written than others... I used enough cut & paste to warn against taking it too seriously :-) --Xiphmont 10:50, 28 February 2013 (PST)

@Monty: Thanks for this information. And as a non-native English speaker I want to thank you for your clear pronunciation. What did you mean by "no one ever ruined a great recording by not dithering the final master."? Do you mean that nobody would ever forget it, or that it was not ruinous? That "CSS is awesome" cup made me really nervous. I hope it means something like "Cascading Style Sheets", and not, what would fit better in this context, "Content Scramble System" [shudder]! --Akf 15:20, 27 February 2013 (PST)

I meant that "not adding dither is not ruinous". I recall in at least one listening test on the subject, a minority of participants had a statistically significant preference for undithered versions, at least on those samples where it was in fact audible and the testers were encouraged to increase gain and listen to fade-outs. *However* I can't find the results of that test now that I've gone back to look for it, so my memory may be faulty. I've asked the HA folks to help me find it again if it really existed :-)
CSS refers to Cascading Style Sheets. Thus the typesetting that overflows the box. --Xiphmont 10:50, 28 February 2013 (PST)
I once built an 8-bit ADC from transistors, for experimentation. One strange result was how few bits are needed for speech. 8kHz with just one bit resolution is still quite intelligible (though rather noisy). --RichardNeill 08:08, 4 March 2013 (PST)

Thanks for these resources! One question: In the vid, you mention the Gibbs phenomenon. Is that in any way related to the Fourier uncertainty principle? These days, in various audio-related forums, people throw this Oppenheim and Magnasco (2013) paper entitled "Human hearing beats the Fourier uncertainty principle" around in response to your 24/192 article. Does the paper qualify any of the results presented in the video and/or your 24/192 article? (Just fixed the reference.) Lenfaki 13:14, 28 February 2013 (PST)

It is related to the Fourier uncertainty principle in that all of these effects are in some way related by the same math. As for the "Human hearing beats the Fourier uncertainty principle" paper floating around, a) the headline is effectively wrong, b) the effect described as 'newly discovered' has been understood for roughly 100 years, and this merely adds some new hard measurements to the data set, c) the Gabor limit does not even apply to the detection task they're describing. So either the authors or their editor are partly confused. There's been a decent discussion of it at Hydrogen Audio, with none other than James Johnston and Ethan Winer weighing in. --Xiphmont 10:50, 28 February 2013 (PST)

You talk about discrete values (whether the analog sample points, or the infinitesimal image pixels). BUT, these are in some way, averages. In a digital camera, the pixel value is the integral across about 90% of the pixel-pitch. In analog audio, is it an instantaneous sample, or an average over the preceding sample-interval, or is it sometimes even more "blurred" than that? Also, when performing DAC, how do we get rid of the stairstep so perfectly without distortion? --RichardNeill 07:51, 4 March 2013 (PST)

The pixel values in a camera are area averages exactly as you say, a necessary compromise in order to have enough light to work with. The sensor sits behind an optical lowpass filter that is intentionally blurring the image to prevent much of the aliasing distortion (Moiré) that would otherwise occur. Despite that, cameras still alias a bit, and if you _remove_ that anti-aliasing filter, you get much more (I have such a camera, danged filter was bonded to the hot filter, so both had to go to photograph hydrogen alpha lines).
Audio does in fact use as close to an instantaneous sample as possible. The 'stairsteps' of a zero-order hold are quite regular in the frequency domain; they're folded mirror images of the original spectrum extending to infinity. All the anti-imaging filter has to do is cut off everything above the original channel bandwidth, and it doesn't even have to do a great job to beat a human ear :-) --Xiphmont 02:56, 12 March 2013 (PDT)
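To put rough numbers on the images described above, here is a minimal sketch. It assumes an ideal zero-order hold, whose magnitude response is |sinc(f/fs)|, and lists the first few images of a single tone. The function names and the 1 kHz / 44.1 kHz values are illustrative choices, not taken from the video.

```python
import math

def zoh_gain(freq_hz, sample_rate_hz):
    """Magnitude response of an ideal zero-order hold: |sinc(f / fs)|."""
    x = freq_hz / sample_rate_hz
    if x == 0:
        return 1.0
    return abs(math.sin(math.pi * x) / (math.pi * x))

def zoh_images(tone_hz, sample_rate_hz, count=4):
    """Return (frequency, gain) pairs for the first few images of a tone.

    Images sit at k*fs - f and k*fs + f for k = 1, 2, ...
    """
    images = []
    k = 1
    while len(images) < count:
        for f in (k * sample_rate_hz - tone_hz, k * sample_rate_hz + tone_hz):
            images.append((f, zoh_gain(f, sample_rate_hz)))
        k += 1
    return images[:count]

if __name__ == "__main__":
    fs, tone = 44100.0, 1000.0
    print("baseband", tone, round(zoh_gain(tone, fs), 4))
    for f, g in zoh_images(tone, fs):
        print("image", f, round(g, 4))
```

Under these assumptions the tone itself passes almost untouched, while every image lies well above the audio band, which is why the anti-imaging filter only has to remove content above the original channel bandwidth.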

What's the correct way to plot a reconstructed waveform? If I have an array of samples and play them back through a DAC, the oscilloscope shows a smooth curve. But plotting them with eg matplotlib shows a stairstep. Thanks --RichardNeill 07:58, 4 March 2013 (PST)

Well, a fully reconstructed waveform is equal to the original input; it's a smooth continuous waveform. OTOH, if you want to plot an actual zero-order hold, a zero-order hold really is a staircase waveform.
If you want to plot the digital waveform pre-reconstruction, a mathematician would always use lollipops, an engineer will use whatever's the most convenient. --Xiphmont 02:56, 12 March 2013 (PDT)
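For anyone who wants the smooth curve in a plot, one possible approach is to evaluate the Whittaker-Shannon (sinc) interpolation formula on a fine time grid and draw a line through the result. This is only a sketch: the helper names are made up here, the sum is truncated to the samples available (so the ends of the record are less accurate), and it is not how gtk-bounce or any DAC does it.

```python
import math

def sinc(x):
    """Normalized sinc: sin(pi x) / (pi x), with sinc(0) = 1."""
    return 1.0 if x == 0 else math.sin(math.pi * x) / (math.pi * x)

def reconstruct(samples, t):
    """Sinc-interpolate the samples at time t, measured in sample periods."""
    return sum(s * sinc(t - n) for n, s in enumerate(samples))

def dense_curve(samples, points_per_sample=8):
    """Return (times, values) on a fine grid, suitable for a line plot."""
    total = (len(samples) - 1) * points_per_sample + 1
    times = [i / points_per_sample for i in range(total)]
    return times, [reconstruct(samples, t) for t in times]
```

With matplotlib, something like `plt.plot(times, values)` for the curve and `plt.stem(range(len(samples)), samples)` for the lollipops should give both views in one figure.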

I have a strange issue with the gtk-bounce program - on (k)ubuntu 12.10, spectrum and waveform work just fine, but if I scroll into the gtk-bounce panel, the cursor disappears. Anyone seen that behaviour? - Julf

Edit out the calls to "hide_mouse()" in gtk-bounce-widget.c. It's hiding the mouse on purpose because it's supposedly a touch application :-) --Xiphmont 02:56, 12 March 2013 (PDT)

One issue I'm not sure you have covered relates to content with short, loud sections (e.g. more like a movie or maybe classical music, less like pop music). I understand that a 10dB change in sound level is perceived as twice as loud. Let's say we have some content where one section is 4 times (20dB) louder than the rest (not unreasonable: that is the difference between a conversation and street noise according to this page). If each extra bit adds 6dB of dynamic range, then the quieter sections will effectively be quantized using roughly 3 fewer bits than the louder section (e.g. 13 bits rather than 16 bits). If the majority of the content is relatively quiet (i.e. being quantized using 13 bits or less, depending on how quiet it is relative to the peaks), then is it really fair to claim "16-bit quality" for the entire piece of content? Is this a real problem? Is this ever audible? Klodj 21:59, 11 March 2013 (PDT)

The bit depth doesn't affect the correctness or 'fineness' of the reconstruction; it only changes the noise floor. It is the same and sounds the same as what happens when recording quieter-than-full-range on analogue tape. Compare a 16-bit digital signal recorded at -20dBFS to the same signal recorded on tape at -20dBV. Both will be smooth and analogue, but the tape will be noisier ;-) --Xiphmont 02:56, 12 March 2013 (PDT)
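A quick numerical check of the noise-floor point, as a sketch only: it quantizes a sine with a plain rounding quantizer (no dither, which is a simplification) and measures the ratio of signal power to error power. The 997 Hz test tone and the function names are assumptions made for the example.

```python
import math

def quantize(x, bits):
    """Round x (in -1..1) to the nearest step of a `bits`-bit uniform quantizer."""
    step = 2.0 / (2 ** bits)
    return round(x / step) * step

def snr_db(bits, level_dbfs=0.0, freq_hz=997.0, sample_rate_hz=44100.0, count=44100):
    """Measure signal-to-quantization-noise ratio for a sine at the given level."""
    amp = 10 ** (level_dbfs / 20.0)
    signal_power = 0.0
    noise_power = 0.0
    for n in range(count):
        x = amp * math.sin(2 * math.pi * freq_hz * n / sample_rate_hz)
        err = quantize(x, bits) - x  # the only thing bit depth changes
        signal_power += x * x
        noise_power += err * err
    return 10 * math.log10(signal_power / noise_power)

if __name__ == "__main__":
    for bits in (8, 16):
        print(bits, "bits, 0 dBFS:", round(snr_db(bits), 1), "dB")
    print("16 bits, -20 dBFS:", round(snr_db(16, -20.0), 1), "dB")
```

The measured ratio should land near the textbook 6.02 dB per bit plus 1.76 dB for a full-scale sine, and recording 20 dB lower should cost about 20 dB of that ratio, because the noise floor stays where it is while the signal drops.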
Thank you for your reply! OK, I understand that, when comparing digital to "analog" (tape), this is a non-issue. The tape noise floor is higher. But could you clarify a related point that is entirely in the digital domain? Let's say we have a 16-bit recording of the 1812 Overture where the cannons don't clip. The average level is going to depend on how the dynamic range is compressed to handle the cannons, but let's say it is -36dBFS. If I adjust the volume to suit the average level, then won't I effectively be hearing a noise floor equivalent to 10-bit quantization (16 bits - 36dB / 6dB per bit) for the majority of the recording (dither aside)? --Klodj 19:13, 13 March 2013 (PDT)
Yes. --Xiphmont 00:54, 14 March 2013 (PDT)
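The arithmetic in this exchange can be written out as a small helper. This is a sketch of the rule of thumb only; `effective_bits` is a name invented here, and it uses 20*log10(2), about 6.02 dB per bit, rather than the rounded 6 dB.

```python
import math

DB_PER_BIT = 20 * math.log10(2)  # about 6.02 dB of dynamic range per bit

def effective_bits(bit_depth, level_dbfs):
    """Bits of resolution left above the noise floor for a signal at level_dbfs (<= 0)."""
    return bit_depth + level_dbfs / DB_PER_BIT

if __name__ == "__main__":
    print(round(effective_bits(16, -36.0), 2))  # the 1812 Overture example
    print(round(effective_bits(16, -20.0), 2))  # the conversation vs. street noise example
```

For the -36dBFS case this comes out at roughly 10 bits, and for the 20dB case at a little under 13 bits, matching the estimates in the comments above.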

I've really enjoyed your videos, many thanks for your work producing them. In keeping with your message in the video about breaking things I was curious to see what would happen as you pushed the signal up to and beyond the Nyquist frequency. Would the filters mean that it would simply fade out smoothly (what order filter is employed?)? I suppose I could check this out myself, but would be interested to hear your answer. Also is the Gibbs phenomenon audible to any degree? --Stuarticus 14:00, 11 June 2013 (PDT)

If the filter is smooth, it will fade smoothly. If the filter is sharp, it will drop off suddenly. Some DACs put the transition band of the anti-imaging filters straddling the Nyquist frequency or slightly past, so you might even see the signal fold back about Nyquist as it drops off (this used to be more common about 10 years ago). So long as no significant aliasing reaches back under 20kHz, this is just one of many arbitrary design decisions that don't really affect the audio quality. The order of the digital filter used in the upsampling stage can be nearly anything, but they're not likely to be as huge as many software resampling filters, where a linear-phase FIR of 512 or 1024 taps is common. The analog filter stages, if they're there at all, are unlikely to be more than a handful of taps. Detailed spec sheets should say outright, and if they don't they should at least usually mention the approximate slope. --Xiphmont 03:45, 15 June 2013 (UTC)

I just watched the video and have one question. In the demonstration, Monty feeds a signal generator at various frequencies into a converter running at 44.1kHz, converts back to analog, and shows the analog result on a second oscilloscope, which is a good way to show the whole digital encode and decode path. With a 1kHz sine wave input sampled at 44.1kHz, there are 44.1 samples per cycle, or about 22 samples per half cycle, which is not too bad for representing the sine wave, so the output of the digital-to-analog conversion should still be quite close to the original sine wave. However, with a 20kHz input there are only about 2.2 samples per cycle, or roughly 1 sample per half cycle, which seems barely enough to represent a sine wave. My question is: if there is just one sample per half cycle, how can the output of the digital-to-analog conversion still be a very good sine wave? If 20kHz is sampled at 192kHz, there are 9.6 samples per cycle, or 4.8 samples per half cycle, which is far from perfect but still better than one. Does this mean that we should get better output if we increase the sampling rate from 44.1kHz to 192kHz? --Somchaisis 05:37, 25 August 2013 (UTC)

You'll have exactly the same 20 kHz sine wave at 192 kHz as at 44.1 kHz. Watch 6m 40s - 7m 06s again. Only slightly more than two sample points per period are needed to perfectly recreate a sine wave; any extra samples are redundant. --Leorex 09:29, 15 January 2014 (PST)
2.2 samples per period is enough to fully recreate the original sine wave. It's counterintuitive, but try to think of it like this. You know the (analog) input signal is band-limited to 20kHz, so there are no frequencies higher than 20kHz to be reconstructed. Now look at the 2.2 samples per period; try to draw a continuous line through all the samples without using any frequencies above 20kHz. The maximum frequency of 20kHz limits how quickly the line can rise or fall. So in fact there is only *1* solution for the line you draw through the sampling points. You can 100% recreate the original analog signal from the sampling points. You will not get better output by increasing the sampling rate to 192kHz, because you have already reconstructed 100% of the signal. -- Nhand42 13:26, 14 February 2014 (PST)
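The claim is easy to try numerically. The sketch below samples a 20 kHz sine at a given rate, rebuilds its value at instants that fall between samples using sinc interpolation, and reports the worst error against the true sine. The function name and the record length are arbitrary choices for the example. Because the ideal sum is infinite and this one is truncated, a small residual error remains that comes from the truncation, not from the sample rate.

```python
import math

def sinc(x):
    """Normalized sinc: sin(pi x) / (pi x), with sinc(0) = 1."""
    return 1.0 if x == 0 else math.sin(math.pi * x) / (math.pi * x)

def reconstruction_error(tone_hz, sample_rate_hz, half_width=1000):
    """Sample a sine, rebuild it between samples, and return the worst error."""
    count = 2 * half_width + 1
    w = 2 * math.pi * tone_hz / sample_rate_hz  # radians per sample
    samples = [math.sin(w * n) for n in range(count)]
    worst = 0.0
    # Check a few off-grid instants near the middle of the record.
    for offset in (0.25, 0.5, 0.75):
        t = half_width + offset
        rebuilt = sum(s * sinc(t - n) for n, s in enumerate(samples))
        worst = max(worst, abs(rebuilt - math.sin(w * t)))
    return worst

if __name__ == "__main__":
    for fs in (44100.0, 192000.0):
        print(fs, "Hz:", fs / 20000.0, "samples per period, worst error",
              reconstruction_error(20000.0, fs))
```

Both rates should give a worst-case error well under 1% of full scale with this record length, and the error shrinks further as more samples are included in the sum.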

Thank you for the entertaining and educational demo. You have a natural talent for explaining complicated concepts. I studied signals and systems at university and I swear you managed in 30 minutes to explain the same material that took my professors nearly 6 months! I'm looking forward to your next show and tell. -- Nhand42 12:54, 14 February 2014 (PST)

So I understand that only one band-limited signal is able to pass through the samples perfectly. But how does the DAC know which one? Does it do some math inside? How are all the voltages created that form this smooth curve? --Vietwoojagig 04:30, 4 April 2014 (PDT)

And another question: audiophile "believers" will argue that digital signals always have jitter. What type of noise is that? And how would you argue that this does not matter? --Vietwoojagig 04:30, 4 April 2014 (PDT)