User:Izx:GSoC2007
Ishaan Dalal
E-mail: [ishaand [at] gmail dot com]
My background
Currently, I'm working towards my master's degree in Electrical Engineering (EE) at the Cooper Union for the Advancement of Science and Art in New York, NY. I graduated from Cooper with a B.E. in EE in 2006. Source coding (audio and video in particular) has always been my primary academic interest; it began in high school when I tried to discover *how* MP3 let me digitize my tape collection to such small sizes while the results remained eminently listenable.
To a great degree, this interest has dictated my upper-level and graduate coursework. Besides the usual EE undergrad curriculum, math coursework relevant to this proposal includes upper-level linear algebra and some graduate stochastics. Graduate EE courses, most of which I took as a junior/senior, included the theory of digital video and audio coding, adaptive filtering and multiresolution techniques (filterbanks/wavelets). Each of these also had a "final project"; these were, respectively, a hierarchical DCT-based video codec, a from-scratch MIDI player and a simple subband audio codec, an embedded zerotree wavelet (EZW) image codec and non-separable 3D DWTs for image volumes (i.e. video).
My hands-on experience with audio coding includes a simple MDCT-based codec with a rudimentary psychoacoustic model, whose primary goal was to implement real-time 44.1 kHz stereo decoding/playback on a 16-bit RISC microcontroller (the dsPIC). The decoder was written from scratch in PIC assembly, including sine windowing, the MDCT, and RLE/Huffman decoding. My senior project also focused on signal processing: a software-defined RF receiver and image processor for biomedical applications (MRI), implemented on an FPGA; it won the IEEE Region 1 Student Paper Contest and was also the subject of two conference papers (http://i.zdom.org/pubs).
I consider myself quite proficient at coding "math" in both C and Octave/MATLAB; however, my experience with coding GUIs is very limited. I have not contributed code to an open-source project before this; I'm an irregular on the LAME (MP3 encoder) mailing list, where I try to answer questions from others when I can.
Project
I'd like to help in developing Ghost. Considering that Ghost is still fairly nascent, I'll try to formulate what I'd like to do as best I can, after having had discussions with both Monty and JM. Ghost will probably be a "sinusoidal+noise" codec. The "sinusoidal" portion requires a method of estimating the primary sinusoids in a given frame, classifying them psychoacoustically, and then quantizing them. Both Monty and JM have experimented, or are experimenting, with STFT/basis methods for estimation, and I will help them with implementing and testing their ideas, based on their advice. This could range from testing/tweaking their basis solvers to trying out dynamic windowing, quantization, etc.
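To make the estimation step concrete, here is a minimal Python sketch of the simplest STFT-style approach: window one frame, take its DFT, and keep the strongest local maxima of the magnitude spectrum. It is only an illustration under stated assumptions and is not the method Monty or JM use; the function names (`hann`, `dft_magnitudes`, `pick_peaks`), the Hann window and the naive O(N^2) DFT are all choices made for this example.

```python
import cmath
import math


def hann(n_samples):
    """Periodic Hann window of the given length."""
    return [0.5 - 0.5 * math.cos(2 * math.pi * n / n_samples)
            for n in range(n_samples)]


def dft_magnitudes(frame):
    """Magnitudes of DFT bins 0..N/2 (naive O(N^2) DFT, for clarity only)."""
    n_samples = len(frame)
    mags = []
    for k in range(n_samples // 2 + 1):
        acc = sum(x * cmath.exp(-2j * math.pi * k * n / n_samples)
                  for n, x in enumerate(frame))
        mags.append(abs(acc))
    return mags


def pick_peaks(frame, sample_rate, max_peaks=2):
    """Return the frequencies (Hz) of the strongest spectral peaks in a frame.

    Hypothetical helper: frequencies are quantized to the DFT bin grid,
    and no psychoacoustic classification is attempted.
    """
    window = hann(len(frame))
    mags = dft_magnitudes([x * w for x, w in zip(frame, window)])
    # A bin is a peak if it rises above its left neighbour and does not
    # fall below its right neighbour.
    peaks = [(mags[k], k) for k in range(1, len(mags) - 1)
             if mags[k] > mags[k - 1] and mags[k] >= mags[k + 1]]
    peaks.sort(reverse=True)  # strongest first
    return sorted(k * sample_rate / len(frame) for _, k in peaks[:max_peaks])
```

Called on a 256-sample frame at an assumed 8 kHz sample rate, `pick_peaks(frame, 8000)` returns frequencies on a 31.25 Hz grid. That coarse resolution, and the leakage from sinusoids that fall between bins, is one reason to look at basis solvers and other estimators.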
Another sinusoidal estimation technique that seems promising and, to the best of my knowledge, has not yet been tried by either Monty or JM is "Matching Pursuit" (MP) [1]. If given a chance, I would like to see whether MP is qualitatively and computationally feasible for sinusoidal estimation in a real-time low-latency codec. The first part would probably include testing different kinds of dictionaries: audio researchers have been successful with atoms that are damped sinusoids [2], complex exponentials [3], and harmonic sinusoids. Psycho-adaptive and hybrid MP methods, such as Bark-band-type splitting followed by MP [4], both estimate and, to a degree, classify, which might be computationally more efficient. Finally, there are also fast MP techniques, such as [5], that use the DFT for fast correlation.
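For reference, the basic greedy MP iteration of [1] can be sketched in a few lines of Python: at each step, pick the unit-norm atom most correlated with the current residual, record its coefficient, and subtract its contribution. The sketch below is an illustration only. The dictionary of real cosine and sine atoms on a frequency grid twice as fine as the DFT's, the function names (`make_dictionary`, `matching_pursuit`, `dot`), and the brute-force correlation are assumptions made for this example; it uses none of the damped, harmonic, psycho-adaptive or DFT-accelerated variants mentioned above.

```python
import math


def dot(a, b):
    """Inner product of two equal-length real sequences."""
    return sum(x * y for x, y in zip(a, b))


def make_dictionary(n_samples, oversample=2):
    """Unit-norm cosine and sine atoms on a grid finer than the DFT's.

    Each entry is (frequency in cycles/sample, "cos" or "sin", samples).
    """
    atoms = []
    for j in range(1, oversample * n_samples // 2):
        freq = j / (oversample * n_samples)
        for name, func in (("cos", math.cos), ("sin", math.sin)):
            atom = [func(2 * math.pi * freq * n) for n in range(n_samples)]
            norm = math.sqrt(dot(atom, atom))
            atoms.append((freq, name, [a / norm for a in atom]))
    return atoms


def matching_pursuit(signal, atoms, n_iter):
    """Greedy MP: returns the chosen (freq, name, coeff) list and the residual."""
    residual = list(signal)
    picks = []
    for _ in range(n_iter):
        # Brute-force search for the atom with the largest |correlation|.
        coeff, best = max(((dot(residual, atom[2]), atom) for atom in atoms),
                          key=lambda pair: abs(pair[0]))
        # Remove that atom's contribution from the residual.
        residual = [r - coeff * a for r, a in zip(residual, best[2])]
        picks.append((best[0], best[1], coeff))
    return picks, residual
```

Because the atoms have unit norm, each iteration removes exactly `coeff**2` of energy from the residual, so the residual energy never increases. The brute-force search costs one inner product per atom per iteration, which is the cost that fast, DFT-based correlation aims to reduce.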
Project outcomes
Again, it's hard to give specific numbers here. Over the summer, I would like to help reach a consensus about the direction Ghost will take regarding sinusoidal analysis, including preliminary investigations into possible ways to improve the computational efficiency and rate-distortion (R-D) tradeoff of the chosen analysis technique.
Schedule
A rough schedule would be:
May 28 - July 10: Experiment with and decide on a promising sinusoidal estimation technique
July 11 - July 31: Figure out sinusoidal classification - deltas, psych-model picking, etc.
August 1 - August 20: Work on efficient quantization of estimated/classified sinusoids.
I have no other formal commitments for the summer, and expect to devote the time-equivalent of a full-time job to this project. I will be away for two weeks at some point during the summer, visiting family in California. I've also submitted a paper to the MWSCAS 2007 conference in Montreal; if accepted, I will be there from August 5-8.
Why me?
The ability to write good code will be a common denominator among all the applicants. What's relatively unique about me is a broad knowledge of the underlying math and signal processing theory, which will allow me to assist with Ghost's design in a more substantial and efficient manner than a "mercenary" coder could. As a hypothetical example, I could interpret certain results and "fix" the algorithm/code without having to be micromanaged by Monty/JM. My experience with implementing audio/signal processing algorithms in assembly and hardware also lets me easily break out of the high-level-language/floating-point mold when optimization is necessary.
References
1. S. Mallat and Z. Zhang, "Matching Pursuits with time-frequency dictionaries", IEEE Trans. Sig. Proc., Vol. 41, No. 12, 1993.
2. M. Goodwin and M. Vetterli, "Matching Pursuit and Atomic Signal Models Based on Recursive Filter Banks", IEEE Trans. Sig. Proc., Vol. 47, No. 7, July 1999.
3. R. Heusdens et al., "Sinusoidal modeling of audio and speech using psychoacoustic-adaptive matching pursuits", ICASSP '01, pp. 3281-3284.
4. H. Jang and S. Park, "Multiresolution Sinusoidal Model With Dynamic Segmentation for Timescale Modification of Polyphonic Audio Signals", IEEE Trans. Speech and Audio Proc., Vol. 13, No. 2, March 2005.
5. T.S. Verma and T.H.Y. Meng, "Sinusoidal modeling using frame-based perceptually weighted matching pursuits", ICASSP '99, pp. 981-984.