

This page is meant to track ideas about low-delay, high-quality audio coding. The work has just started, so don't expect anything in the near future (or at all for that matter).
** Many (sometimes closely spaced) harmonics
* Shaped noise
** Signals that are (or are indistinguishable from) filtered (coloured) white noise
* Transients
** Whatever doesn't fit above, I guess
== Signal analysis ==
Tried so far:
* MUSIC: fails on non-trivial signals and is very complex, although there's an AES paper that recommends first whitening the noise part of the signal before applying the algorithm. Haven't tried that so far.
* ESPRIT: fails on non-trivial signals and is very complex (see above for a possible solution).
* LPC: would probably work, but requires an insane order -> impractical; plus it tends to be numerically unstable anyway.
* FFT: poor resolution, but that's all we have left so far. There's an AES paper that describes a sort of time-domain phase unwrapping that could help.
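The FFT's resolution limit can be partly worked around by parabolic interpolation of the log-magnitude spectrum around a peak, a standard trick sketched below in plain Python. The naive DFT and the test signal are purely illustrative; this is not the time-domain phase-unwrapping method from the AES paper.

```python
import cmath
import math

def dft_mag(x):
    """Naive DFT magnitude spectrum (O(N^2), fine for a sketch)."""
    N = len(x)
    return [abs(sum(x[n] * cmath.exp(-2j * math.pi * k * n / N)
                    for n in range(N))) for k in range(N // 2)]

def refine_peak(mags, k):
    """Parabolic interpolation of the log-magnitude around bin k;
    returns a fractional bin offset in [-0.5, 0.5]."""
    a, b, c = math.log(mags[k - 1]), math.log(mags[k]), math.log(mags[k + 1])
    return 0.5 * (a - c) / (a - 2 * b + c)

# Illustrative test: a sinusoid lying between bins 10 and 11.
N = 128
true_bin = 10.3
win = [0.5 - 0.5 * math.cos(2 * math.pi * n / N) for n in range(N)]  # Hann
x = [win[n] * math.cos(2 * math.pi * true_bin * n / N) for n in range(N)]
mags = dft_mag(x)
k = max(range(1, len(mags) - 1), key=lambda i: mags[i])
est_bin = k + refine_peak(mags, k)  # much closer to 10.3 than +/-0.5 bins
```

With a Hann window the residual bias of this estimator is a small fraction of a bin, which may or may not be enough for high-quality sinusoidal modeling.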
Step two: what to match
==== Phase-locked loop (PLL) ====
== Quantization Ideas ==
After the sinusoids have been extracted, they have to be quantized. The possible approaches are:
* Sort the sinusoids by energy and transmit only a fixed number of them, or only those at or above a given energy. The original indices of the sinusoids (before sorting) will also have to be sent.
** I think it's worth checking which is most efficient. Sorting the sinusoids will help quantizing the amplitude, but make it harder to encode frequency. [[User:Jmspeex|Jmspeex]] 05:45, 28 June 2006 (PDT)
* Use psychoacoustic properties and remove all sinusoids that will be masked by other tones.
** Of course, we don't want to encode perceptually irrelevant sinusoids. Actually, we want the resolution (in amplitude, phase and probably frequency) to scale with the amplitude-to-mask ratio or something like that. [[User:Jmspeex|Jmspeex]] 05:45, 28 June 2006 (PDT)
* After removing perceptually irrelevant and low-energy tones, the energy in each critical band has to be adjusted to match the initial energy.
** Possibly -- I don't know much on that topic. Monty probably has valuable experience. [[User:Jmspeex|Jmspeex]] 05:45, 28 June 2006 (PDT)
* Time-differential coding of sinusoids across frames can be used
** Definitely. This is very important if we plan on using short frames. It would be important to minimize inter-frame redundancy, but still make it possible to recover from packet loss. For that, we could either use a leaky predictor (like the pitch in CELP) or use key-frames (like a video codec). [[User:Jmspeex|Jmspeex]] 05:45, 28 June 2006 (PDT)
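As a concrete sketch of the sort-and-select idea above (function name and bitstream layout are hypothetical):

```python
def select_top_k(amplitudes, k):
    """Keep the k strongest sinusoids. Returns (indices, amplitudes);
    the indices are what would have to be transmitted so the decoder
    knows which sinusoids survived (hypothetical layout)."""
    order = sorted(range(len(amplitudes)), key=lambda i: amplitudes[i],
                   reverse=True)
    keep = sorted(order[:k])  # ascending indices are cheap to delta-code
    return keep, [amplitudes[i] for i in keep]

idx, amps = select_top_k([0.1, 0.9, 0.05, 0.7, 0.3], 3)
# idx = [1, 3, 4], amps = [0.9, 0.7, 0.3]
```

Keeping the list in descending-energy order instead would make the amplitudes easier to code but the indices harder, which is exactly the trade-off noted in the comment above.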
==== Quantization of frequencies ====
* Quantize frequencies of a few selected sinusoids and recreate other values using interpolation.
** How would you do that? (maybe I'm not following here) [[User:Jmspeex|Jmspeex]] 05:45, 28 June 2006 (PDT)
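One possible reading of the interpolation idea, entirely hypothetical: transmit a few (index, frequency) anchor points and reconstruct the rest by linear interpolation. This only works when the intermediate frequencies really do lie near a smooth curve between the anchors.

```python
def interpolate_freqs(anchors, n):
    """Reconstruct n frequencies from a few (index, frequency) anchor
    sinusoids by linear interpolation (hypothetical scheme)."""
    out = [0.0] * n
    for (i0, f0), (i1, f1) in zip(anchors, anchors[1:]):
        for i in range(i0, i1 + 1):
            t = (i - i0) / (i1 - i0)
            out[i] = f0 + t * (f1 - f0)
    return out

freqs = interpolate_freqs([(0, 100.0), (4, 500.0)], 5)
# -> [100.0, 200.0, 300.0, 400.0, 500.0]
```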
==== Quantization of Amplitudes ====
* Model the energy curve of the sinusoids – for instance using an exponential curve
** Exponential decay might be a good way to do inter-frame prediction. [[User:Jmspeex|Jmspeex]] 05:45, 28 June 2006 (PDT)
* Quantize amplitudes of a few selected sinusoids and recreate other values using interpolation.
** Possibly, but probably not at first (hard problem). [[User:Jmspeex|Jmspeex]] 05:45, 28 June 2006 (PDT)
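A tiny sketch of the exponential-decay inter-frame prediction suggested above. The decay factor 0.85 is a made-up example value, and a real coder would of course quantize the residuals:

```python
DECAY = 0.85  # made-up leakage factor; a real coder would tune this

def encode_residuals(amps, prev_amps):
    """Transmit only the difference from an exponential-decay prediction."""
    return [a - DECAY * p for a, p in zip(amps, prev_amps)]

def decode_residuals(residuals, prev_amps):
    return [r + DECAY * p for r, p in zip(residuals, prev_amps)]

prev = [1.0, 0.5]
cur = [0.9, 0.4]
res = encode_residuals(cur, prev)
rec = decode_residuals(res, prev)  # matches cur (up to rounding)
```

Because the predictor is leaky, a decoder that starts from a wrong state after packet loss sees its error shrink by the decay factor every frame, the same trick as the pitch predictor in CELP mentioned above.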
==== Quantization of phase and modulation parameters ====
* Can be scalar quantized with the number of bits allocated being proportional to the energy of the sinusoid
** Yes. Also, this is something that can be predicted very well across frames. It's not even necessary to make that one robust to losses, because as long as the phase is continuous, no one will notice [[User:Jmspeex|Jmspeex]] 05:45, 28 June 2006 (PDT)
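A minimal sketch of energy-proportional scalar quantization of phase; the allocation rule and all numbers are illustrative only:

```python
import math

def allocate_bits(energies, total_bits, max_bits=8):
    """Crude allocation: bits roughly proportional to energy share."""
    total = sum(energies)
    return [min(max_bits, int(total_bits * e / total)) for e in energies]

def quantize_phase(phase, bits):
    """Uniform scalar quantizer on [-pi, pi); returns (code, reconstruction)."""
    levels = 1 << bits
    step = 2 * math.pi / levels
    code = int(round((phase + math.pi) / step)) % levels
    return code, -math.pi + code * step

bits = allocate_bits([4.0, 1.0, 1.0], total_bits=6)   # -> [4, 1, 1]
code, rec = quantize_phase(1.0, bits[0])              # error below pi/16
```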
==== Quantization of indices ====
==== Quantization of energy gains in critical bands ====
=== Excitation similarity weighting ===
The idea behind the ESW technique is to select sinusoids such that each new sinusoid added provides the maximum incremental gain in matching between the auditory excitation pattern of the original signal and that of the modeled signal. To accomplish this, an iterative process is proposed in which each sinusoid extracted during conventional analysis is assigned an excitation similarity weight. During each iteration, the sinusoid having the largest weight is added to the modeled representation. New sinusoids are accumulated until some constraint is exhausted, for example a bit budget. The algorithm tends to converge as the number of modeled sinusoids increases.
-- Not sure I understand here. Any reference? [[User:Jmspeex|Jmspeex]] 05:45, 28 June 2006 (PDT)
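As best as it can be reconstructed from the description above, the loop looks something like the greedy sketch below, where the auditory excitation pattern is reduced to a toy per-band energy vector. All names and numbers are hypothetical:

```python
def esw_select(per_sin_bands, target, budget):
    """Greedy sketch: at each step add the sinusoid whose inclusion most
    reduces the squared mismatch between the modeled and target band
    energies (a toy stand-in for an auditory excitation pattern)."""
    chosen = []
    model = [0.0] * len(target)
    remaining = list(range(len(per_sin_bands)))

    def mismatch(m):
        return sum((t - v) ** 2 for t, v in zip(target, m))

    while remaining and len(chosen) < budget:
        best = min(remaining, key=lambda s: mismatch(
            [m + e for m, e in zip(model, per_sin_bands[s])]))
        model = [m + e for m, e in zip(model, per_sin_bands[best])]
        chosen.append(best)
        remaining.remove(best)
    return chosen

order = esw_select([[2.0, 0.0], [0.0, 1.0], [0.5, 0.5]],
                   target=[2.0, 1.0], budget=2)
# picks sinusoid 0 first (biggest mismatch reduction), then sinusoid 1
```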
=== Trajectory tracking ===
Once the meaningful sinusoidal peaks and their parameters have been estimated, the peaks are tracked into inter-frame trajectories. At each frame, a peak continuation algorithm tries to connect the sinusoidal peaks to the trajectories already existing at the previous frame, resulting in smooth curves of frequencies and amplitudes. The continuation was tested with two algorithms: the traditional one, which uses only the parameters of the sinusoids to obtain smooth trajectories, and an original method which synthesizes the possible continuations inside certain deviation limits and compares them to the original signal. There are also other systems which use more advanced methods, for example hidden Markov models, to track the trajectories.
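The nearest-peak flavour of the continuation algorithm can be sketched as follows: greedy matching with a relative deviation limit, unmatched trajectories dying and unmatched peaks being born. Everything here is illustrative.

```python
def continue_trajectories(prev_freqs, new_peaks, max_dev=0.05):
    """Toy peak continuation: link each existing trajectory to the
    nearest unused new peak within a relative deviation limit; peaks
    left over start new trajectories (one simple greedy variant)."""
    links, used = {}, set()
    for ti, f in enumerate(prev_freqs):
        best, best_d = None, max_dev
        for pi, g in enumerate(new_peaks):
            if pi in used:
                continue
            d = abs(g - f) / f
            if d < best_d:
                best, best_d = pi, d
        if best is not None:
            links[ti] = best
            used.add(best)
    births = [pi for pi in range(len(new_peaks)) if pi not in used]
    return links, births

links, births = continue_trajectories([440.0, 880.0], [442.0, 1320.0])
# 440 Hz continues to 442 Hz; 880 Hz dies; 1320 Hz is a new trajectory
```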
Sinusoidal trajectories contain all the information needed for the reconstruction of the harmonic parts of input signals: amplitudes, frequencies and phases of each trajectory at each frame. To avoid discontinuities at frame boundaries, the amplitudes, frequencies and phases are interpolated from frame to frame.
* Amplitudes are linearly interpolated
* Phases are interpolated with cubic polynomials
-- Any reference? [[User:Jmspeex|Jmspeex]] 05:45, 28 June 2006 (PDT)
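The standard reference for the cubic phase interpolation is McAulay and Quatieri's sinusoidal speech model. A sketch of that construction: the cubic matches the end phase modulo 2*pi and the end frequency, with the unwrapping integer M chosen to make the trajectory maximally smooth.

```python
import math

def cubic_phase(theta0, omega0, theta1, omega1, T):
    """Cubic phase interpolation in the McAulay-Quatieri style:
    theta(t) = theta0 + omega0*t + a*t^2 + b*t^3 matches the end phase
    (mod 2*pi) and the end frequency omega1 at t = T, with the
    unwrapping integer M chosen for maximal smoothness."""
    M = round(((theta0 + omega0 * T - theta1) + (omega1 - omega0) * T / 2)
              / (2 * math.pi))
    d = theta1 + 2 * math.pi * M - theta0 - omega0 * T
    a = 3 * d / T ** 2 - (omega1 - omega0) / T
    b = -2 * d / T ** 3 + (omega1 - omega0) / T ** 2
    return lambda t: theta0 + omega0 * t + a * t * t + b * t ** 3

phase = cubic_phase(0.2, 2.0, 1.0, 2.5, 1.0)  # illustrative parameters
```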
