## Thursday, August 11, 2016

### Audio Resampling in Python

(Note: this post was written as a Jupyter Notebook which can be found with the Python code at https://nbviewer.jupyter.org/gist/jthiem/0bb8b2296c8dbd0b83bfab8270a8eb24.)
One of the most basic tasks in audio processing anyone would need to do is resampling audio files; seems like the data you want to process is never sampled in the rate you want. 44.1k? 16k? 8k? Those are the common ones; there are some really odd ones out there.
Resampling is actually quite hard to get right. You need to properly choose your antialias filters, and write a interpolation/decimation procedure that won't introduce too much noise. There have been books written on this topic. For the most part, there is no single univeral way to to this right, since context matters.
For a more detailed discussion on sample rate conversion see the aptly named "Digital Audio Resampling Homepage" [https://ccrma.stanford.edu/~jos/resample/]. Then look at [http://src.infinitewave.ca/] for a super informative comparison of a ton of resampling implementations. No seriously, go there. (It inspired me to compare the methods below using a sine sweep.)
MATLAB has the built-in resample function. Everyone uses it. It works, but also sucks in many aspects - a much better function is in the DSP toolbox, but as near as I can tell, noone uses it (even when copious other parts of the DSP toolbox are being used!)
But I'm not talking about MATLAB, I'm talking about Python here, and it doesn't have a default resampling method - except if you're doing any kind of numerical computing, you'll be using numpy and probably scipy --- and the latter has a built-in resample function.
To get ahead of myself a bit, scipy.signal.resample sucks for audio resampling. That becomes apparent quite quickly - it works in frequency domain, by basically truncation or zero-padding the signal in the frequency domain. This is quite ugly in time domain (especially since it assumes the signal to be circular).
There are two alternatives I want to point at that are "turn-key", that is, you just have to specify your resampling ratio, and the library does the rest. I know of a few others, but at the time of writing this they didn't seem as easy to use (eg. just doing the interpolation/decimation, but not calculating a filter.) If there are other notable libraries I should know of, please mention it in the comments of the blog version of this notebook [https://signalsprocessed.blogspot.de].
The first alternative is scikit.resample. Using it has two problems: it relies on an external (C-language) library called "Secret Rabbit Code" (SRC as in "sample rate conversion", geddit?) aka libsamplerate [http://www.mega-nerd.com/SRC/]. Not a problem in itself, but you need to install it and this step is not automated in pip and friends (as far as I know at time of writing). (Problem for me was that I needed to run it on a platform where I couldn't do this easily.) Problem 2: scikit.resample 0.3.3 is Python 2 only, and the Python 3 version is unofficial (0.4.0-dev, I used [https://github.com/gregorias/samplerate]).
The other one is resampy, which can be found in PyPI or [https://github.com/bmcfee/resampy]. It is easy to install and works with Python 3 (librosa now uses it as preferred resampler over libsamplerate)
TL;DR: if you can install external libs, use scikit.resample (0.4.0-dev). resampy is faster, but not as good, but entirely reasonable. (If MATLAB resample is good enough for you, resampy will serve you just fine and is easier to install)
Now for a comparison with pretty pictures.

## Downsampling a sweep

A common task is to convert between the CD sampling rate of 44.1 kHz, and the multiples of 8kHz that originated from the telekon industry. In particular, 48kHz. (Supposedly the rate of 44.1k was chosen in part because is was difficult to convert to 48 kHz; but it is also connected to NTSC timing). So let's have a sweep sampled at 48 kHz. (See the Jupyter Notebook for the Python code)
 Sweep 100-23900 Hz, sampled at 48 kHz.
Now convert it to 44.1 kHz using the different libraries, and plot the spectrograms.
 Same sweep downsampled to 44.1 kHz using scikit.resample.
scikit.samplerate (or libsamplerate AKA "Secret Rabbit Code") does it well. Almost no aliasing, yet close to Nyquist, and noise levels in line with the input. Note that once the sweep is above Nyquist, the output is basically nil.
 The same sweep but now downsampled using scipy.signal.resample.
And this is scipy.signal.resample. Throughout the entire signal there is a high-frequency tone, and there is considerable energy even if the sweep is above Nyquist. This is a result of the frequency-domain processing operating on the entire signal at one. This is NOT suited to audio signals.
 The same sweep downsampled using resampy.
Finally, resampy. There are some funny nonlinear effects, but at very low level - for audio applications generally nothing to worry about. Even the aliasing is below -60 dB. So, while not as good as scikit.resample is is useable and certainly better than scipy.signal.resample!

## Upsampling an impulse

A way to examine the antialias filter is to upsample an impulse. Here, note a frustrating aspect of all these differing libraries: the interface is completely different beween all of them; and to boot, resampy and scikit.resample give different length outputs for the same upsampling ratio. This means you can't just switch between them using a from foo import resample statement, you'll need a wrapper function if you want to switch freely amongst them (see librosa as an example).
>>> us1 = sk_samplerate.resample(impulse, P/Q, 'sinc_best')
>>> us2 = scipy_signal.resample(impulse, int(len(impulse)*P/Q))
>>> us3 = resampy.resample(impulse, Q, P)
>>> print(us1.shape, us2.shape, us3.shape)
(1087,) (1088,) (1088,)
 Frequency response of an impulse upsampled from 44.1 kHz to 48 kHz

'scikit.resample' certainly uses the best antialias filter, with a steep slope, allowing very little energy over Nyquist. 'resampy' is a more gentle filter (lower order, perhaps?) but decent. 'scipy.signal.resample'... well, anyways.
What do the impulse responses look like in time domain?
 Time-domain impulse upsampled from 44.1 kHz to 48 kHz
 The same section of the impulse, but with samples converted into dB (10*log10 magnitude squared)
Unsurprisingly, 'resampy' is the most compact, which explains the gentler cut-off. 'scikit.resample' is wider, but acceptable. 'scipy.signal.resample' sticks out like a sore thumb.

## Speed comparison

>>> %timeit sk_samplerate.resample(sig, Q/P, 'sinc_best')
10 loops, best of 3: 98.1 ms per loop
>>> %timeit scipy_signal.resample(np.float64(sig), int(len(sig)*Q/P))
100 loops, best of 3: 4.52 ms per loop
>>> %timeit resampy.resample(np.float64(sig), P, Q)
10 loops, best of 3: 40.7 ms per loop

And here is a notable merit of 'scipy.signal.resample'. It is faster by an order of magnitude compared to the other methods. I suspect that if you make sure your signals are of length 2^N, you'll get even faster results, since it'll switch to a FFT instead of a DFT. The other two are probably losing some speed in the passing of data from Python to C - but fundamentally, frequency domain techniques do tend to be fast. So keep that in mind you you need speed.

## Conclusion

I'll just repeat the above TL;DR: use scikit.resample (0.4.0-dev) if you can install libsamplerate in a sane place. Else, use resampy, which is faster, but not as good --- but entirely reasonable. Use 'scipy.signal.resample' only if you really need the speed, and be aware of its shortcomings (note the circular assumption!).


## Monday, June 6, 2016

### FFT-based Overlap-Add FIR filtering in Python

Here is a small Python function I've written (github), that might be useful if you're doing signal processing in Python.  The function implements the classic FFT-based Overlap-Add filtering, potentially saving a heck of a lot of processing time (assuming your filter is of sufficiently high order: usually about 128 taps).

For a thorough explanation of the algorithm, see the Wikipedia article.

I wrote this code as part of a lager ongoing project (gammatone filtering) that I will release eventually.  What I still have to add to this code is the ability to set and save state information (it's simply the part of the response cut off at the end of the function) and the ability to filter complex signals with complex filters (replacing rfft with fft).  Changes made in the latest version.

## Wednesday, May 18, 2016

### A simple scikit-learn classifier based on Gaussian Mixture Models (GMM)

When I started switching to Python for my work on CASA, it wasn't entirely clear to me how to use the sklearn GMM (sklearn.mixture.GMM) for classification.  Turned out a bit easier than expected (yay for scikit-learn!), but for others, here is my implementation of a class that behaves like the other classifiers (eg. sklean.svm.SVC).  All you need to decide is how many Gaussians you want to model your data with, and off you go.

Link to Github repo. A Jupyter notebook shows a sample use.

Why isn't something like this in sklearn yet?  Well, turns out someone did propose it already (no surprise) in a much more general way: see this discussion on GitHub.  (I myself was pointed there when I asked about my own code)  My bit of code is far more primitive, but I hope easier to understand.

## Tuesday, April 12, 2016

### DAGA2016 article: Probabilistic 2D localization of sound sources using a multichannel bilateral hearing aid

Just put my DAGA 2016 article online. Link to paper.

Abstract: In the context of localization for Computational Auditory Scene Analysis (CASA), probabilistic localisation is a technique where a probability that a sound source is present is computed for each possible direction. This approach has been shown to work well with binaural signals provided the location of the sources to be localized is in front of the user and approximately on the same plane as the ears. Modern hearing aids use multiple microphones to perform array processing, and in a bilateral configuration, the extra microphones can be used by localization algorithms to not only estimate the horizontal direction (azimuth), but vertical direction (elevation) as well, thereby also resolving the front-back confusion. In this work, we present three different approaches to use Gaussian Mixture Model classifiers to localize sounds relative to a multi- microphone bilateral hearing aid. One approach is to divide a unit sphere into a nonuniform grid and assign a class to each grid point; the other two approaches estimate elevation and azimuth separately, using either a vertical-polar coordinate system or an ear- polar coordinate system. The benefits and drawbacks in terms of performance, computational complexity and memory requirements are discussed for each of these approaches.

## Monday, February 8, 2016

### GMM based localizer on custom ASIC model

 The model interface hardware with the FPGA in-circuit emulator.
 Lukas Gerlach (L) and Christopher Seifert (R) demoing their ASIC model setup, running realtime on a FPGA.
It's always nice to see one's own research code running on real actual hardware with live data rather than just having a simulation in MATLAB.  My colleagues over at the Institut für Mikroelektronische Systeme (IMS) of the Leibnitz Universität in Hannover presented a demo of their hardware at the Hearing4All winter plenary held last week in Soltau.  The code running on the hardware visible is a GMM based localizer originally written by Tobias May, but since heavily modified by myself.  The next step is that we'll write up exactly what we did to make this all work and how well it does - so look out for an article on this in the near future! It's one of the advantages of being at an intengrated cluster; at Hearing4All, pretty much everything related to hearing loss is being investigated: from basic ear physiology, to audiology, models, algorithms, clinical procedures, implants, and new ground-breaking hardware.

## Tuesday, February 2, 2016

### The Selective Binaural Beamformer: It's out!

After six or so months of going through the peer review gauntlet, our paper on the Selective Binaural Beamformer (or simply SBB) is finally published.  Thanks to all my coauthors (Menno, Daniel, Simon, and Steven) as well as the reviewers (especially reviewer #2, who gave very tough but important feedback) I think this became a very nice paper.  Please go ahead and read it at http://www.asp.eurasipjournals.com/content/2016/1/12 (EURASIP Journal on Advances in Signal Processing, full title "Speech enhancement for multimicrophone binaural hearing aids aiming to preserve the spatial auditory scene"): it's open access, one can read it either at the above address or download a PDF (see the right sidebar on the linked page).  Being open access, it's free and CC-A 4.0 licensed.

The basic idea behind the algorithm is this: Normally, if using a beamforming algorithm on a binaural hearing aid, the entire auditory image will collapse to the position of the beam direction, that is ALL sound will appear (to the hearing aid user) to originate from the same location.  Various methods have been proposed to fix this - Simon Doclo in particular has done a lot of work on this topic (which is why it was so helpful to have him as coauthor).  My approach to this problem was to take the signal in the STFT domain (ie, the signal is divided into discrete short time frames and narrow frequency bins) and in each "bin" (time-frequency unit) make a decision if the target signal is dominant, or if the background noise is dominant.  In the first case, I use the beamformer output: the signal is enhanced, collapsed, but that's OK - it _should_ be coming from the target direction anyways.  In the second case, I simply use the signal as it comes from the two microphones closest to the ear canal, without processing - hence there is (almost) no difference from the "real" signal reaching the ears.  So, all the benefit of the beamformer without the nasty collapse of the auditory field!

...Well, mostly.  The tricky part is to make a good speech/noise decision (or actually a "target signal"/background noise decision).  But there's a fancy SNR estimator in there, from Adam Kuklasinski (see ref. 19 - I met him in Lisbon where he presented it at EUSIPCO), and that works pretty well.

So if this is the kind of thing that seems interesting to you, read the paper - and I will post some of the sample files (that were used during subjective testing) soonish on my personal homepage.

## Thursday, July 2, 2015

### A bit of Python path hackery

Like many people, I have a directory in \$HOME for useful python code I write. Similar to MATLAB's "Documents/MATLAB" directory, I want those files to be easily available if I'm hacking new stuff. The typical way of handling this is to do:

import sys
sys.path.append('/path/to/my/dir')


but that is a bit ugly, and has the problem that if I'm doing an IPython Notebook (as I often do), this append function gets reevaluated if I reexecute the cell in which I do all my imports (since as I'm edition the notebook and adding stuff, I would do a lot). Besides I now have sys in my namespace.

I could just dump it into ".local/lib/python3.4/site-packages" (too hidden, outside the tree that is synced between machines), or link "Documents/Python" to that path, but then I don't want to have pip install stuff there, and also, there might be times I don't want "Documents/Python" to be in the path.

So here's my solution.  I place a file in ".local/lib/python3.4/site-packages" and ".local/lib/python2.7/site-packages" with a name like "LocalPath.py", and this file contains

       
import sys

myPythonDir = '/home/jthiem/Documents/Python'
if myPythonDir not in sys.path:
sys.path.append(myPythonDir);


I can now do
       
import LocalPath

and have instant no-fuss access to my homebrew packages.  Now isn't that nice.