This page explains the example code in SPRAAK/examples/exp_aur4/. The examples in the previous section use the Speecon data, which cannot be included with this distribution. Aurora-4 can be purchased at a lower cost, so most users should be able to reproduce this example. Aurora-4 is derived from the WSJ0 corpus by adding noise. The new (noise-added) data is provided with Aurora-4, but the language model and lexicon are not. Hence, the WSJ0 database is required as well.

Running the Aurora-4 experiments requires the following steps:

cd examples/exp_wsj
MAKE_RESOURCES  # adjust the paths in the MAKE_RESOURCES script to point to your copy of the WSJ0 DBase
cd ../exp_aur4
MAKE_RESOURCES  # adjust the paths in the MAKE_RESOURCES script to point to your copy of the Aurora-4 DBase
RUN_EXPERIMENTS_MDT

The MAKE_RESOURCES scripts convert index files to corpora, download the CMU dictionary and extract the words required for training and testing, convert the language model to the SPRAAK binary format, and so on.

The exp_aur4/RUN_EXPERIMENTS_MDT script performs the following steps:

Create an initial segmentation file using a small acoustic model included in the SPRAAK distribution.
Train a standard acoustic model on the 'clean' (noise-free) Aurora-4 train data. This model uses VAD to estimate the channel (meannorm) on speech only. This matches the MDT setup, whose channel estimate also ignores silence frames.
Evaluate the standard (non-MDT) model. Begin/end point detection is added as a simple way to gain some noise robustness. This result serves as the baseline.
Create all resources needed for the MDT-setup:
- the stream exponents and the prospect transformation matrix
- the VQ codebooks for the mask estimation
- the cluster Gaussians
- the association table (Gaussian short-lists)
Evaluate the MDT-setup twice: once with static masks only, and once with static and delta masks.
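
Step 2 above estimates the channel (meannorm) on speech frames only. The idea can be sketched in a few lines of Python. This is a minimal illustration and not SPRAAK's implementation: the function name meannorm_speech_only, the list-of-lists feature layout and the per-frame boolean VAD flags are all assumptions made for this example.

```python
from typing import List, Sequence


def meannorm_speech_only(
    frames: Sequence[Sequence[float]], is_speech: Sequence[bool]
) -> List[List[float]]:
    """Subtract the per-dimension mean computed over speech frames only.

    Hypothetical helper for illustration; not part of SPRAAK.
    """
    if len(frames) != len(is_speech):
        raise ValueError("need one VAD flag per frame")
    speech = [f for f, s in zip(frames, is_speech) if s]
    if not speech:
        # No speech detected: fall back to the mean over all frames.
        speech = list(frames)
    if not speech:
        return []
    dim = len(speech[0])
    mean = [sum(f[d] for f in speech) / len(speech) for d in range(dim)]
    # The estimate comes from speech frames, but it is removed from every frame.
    return [[f[d] - mean[d] for d in range(dim)] for f in frames]


# Two silence frames around two speech frames (toy values).
frames = [[0.0, 0.0], [2.0, 4.0], [4.0, 8.0], [0.0, 0.0]]
vad = [False, True, True, False]
normed = meannorm_speech_only(frames, vad)
print(normed)
```

Because the silence frames are left out of the mean, the amount of silence in an utterance does not change the estimate.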

Some practical notes:

The setup assumes a 4-core machine, but everything can be easily adapted to more or fewer cores.
Most options that control the behaviour of the MDT-system are located in exp_aur4/aur4_clean_mida_vad_mdt.ini. Note that when using the 'channel feedback', the preprocessing file (resources/aur4_mida_vad_mdt.preproc) must contain a matching [feedback] block.
Although many different preprocessing scripts are involved, they all originate from a single master preprocessing script by selecting the appropriate blocks. In other words, the preprocessing scripts share code heavily. We opted for separate, task-specific preprocessing scripts for efficiency reasons (if something is not needed, it is not calculated).
The preprocessing scripts are modular and flexible and hence it should be easy to plug in your own mask estimation techniques; you could for example read masks from external files.
The relevant preprocessing scripts are examples/resources/preproc/aur4_* and examples/resources/preproc/impute_spec_hd.preproc.
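
On the note about the number of cores: adapting a job to more or fewer cores usually comes down to dividing the list of utterances into as many parts as there are parallel jobs. The sketch below shows one way to do that split in Python. It is a generic illustration only; the function split_for_jobs and the utterance names are hypothetical, and the way the SPRAAK scripts actually distribute work is not shown here.

```python
from typing import List, Sequence, TypeVar

T = TypeVar("T")


def split_for_jobs(items: Sequence[T], n_jobs: int) -> List[List[T]]:
    """Split items into n_jobs contiguous chunks whose sizes differ by at most one.

    Hypothetical helper for illustration; not part of SPRAAK.
    """
    if n_jobs < 1:
        raise ValueError("n_jobs must be at least 1")
    base, extra = divmod(len(items), n_jobs)
    chunks: List[List[T]] = []
    start = 0
    for job in range(n_jobs):
        # The first `extra` jobs each take one additional item.
        size = base + (1 if job < extra else 0)
        chunks.append(list(items[start:start + size]))
        start += size
    return chunks


# Ten made-up utterance names divided over four jobs.
utterances = [f"utt{i:02d}" for i in range(10)]
for job, chunk in enumerate(split_for_jobs(utterances, 4)):
    print(job, chunk)
```

Contiguous chunks keep the original corpus order inside each job, so the partial outputs can be concatenated in job order afterwards.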

Results:

The results listed below are indicative only. Small deviations from these results are not abnormal; they just mean that your compiler organized some floating point operations differently.

Baseline results, result with static masks only (threshold set to 1) and result with static and delta masks (threshold set to 1 and 35 respectively):

                 clean   car     babble  resto   street  airport train           clean   car     babble  resto   street  airport train
                 01      02      03      04      05      06      07      AV      08      09      10      11      12      13      14      AV
   ---------------------------------------------------------------------------------------------------------------------------------------
   baseline      5.55    14.18   32.67   40.22   35.51   27.91   34.75   27.26   19.13   34.34   50.98   55.71   56.53   46.44   55.50   45.52
   static        5.64     9.99   21.60   25.76   25.26   17.99   26.77   19.00   15.26   24.70   38.84   39.79   42.35   34.04   43.92   34.13
   static+delta  5.90     9.56   20.68   25.24   25.63   17.99   27.12   18.88   13.13   21.30   37.27   39.16   40.59   32.92   43.38   32.53
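
To put the averages in the table above into perspective, the relative improvement over the baseline can be computed from the two AV columns. The snippet below is a small worked example in Python. It assumes the figures are error rates in percent (the table does not label them), and the helper name relative_reduction is chosen for this example only.

```python
# AV columns copied from the table above: (test sets 01-07, test sets 08-14).
# Assumption: the values are error rates in percent.
RESULTS = {
    "baseline": (27.26, 45.52),
    "static masks": (19.00, 34.13),
    "static + delta masks": (18.88, 32.53),
}


def relative_reduction(baseline: float, system: float) -> float:
    """Relative error reduction in percent with respect to the baseline."""
    return 100.0 * (baseline - system) / baseline


base = RESULTS["baseline"]
for name, scores in RESULTS.items():
    if name == "baseline":
        continue
    gains = [relative_reduction(b, s) for b, s in zip(base, scores)]
    print(f"{name}: sets 01-07 {gains[0]:.1f}%, sets 08-14 {gains[1]:.1f}%")
```

Under that assumption, both MDT variants reduce the average on test sets 01-07 by roughly 30% relative, and on test sets 08-14 by roughly 25% (static masks) and 29% (static and delta masks).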

Robust features:

As an alternative to MDT, the exp_aur4/RUN_EXPERIMENTS_PREPROC script will create two acoustic models based on preprocessing that includes noise normalisation, i.e. noise robust features. For more details, we refer to [1].
Note: the noise normalisation will inject some noise in the clean speech features. Hence clean speech results will be somewhat affected compared to the other techniques listed on this page.

The two models, fixed noise masking level and variable noise masking level respectively, obtain the following results:

                 clean   car     babble  resto   street  airport train           clean   car     babble  resto   street  airport train
                 01      02      03      04      05      06      07      AV      08      09      10      11      12      13      14      AV
   ---------------------------------------------------------------------------------------------------------------------------------------
   fixed         7.14     9.98   15.54   20.47   19.63   15.97   18.77   15.36   20.34   28.41   36.95   39.21   40.91   36.02   39.85   34.53
   variable      6.63     9.15   15.28   18.33   18.87   13.56   18.53   14.34   17.97   25.91   34.21   35.74   39.01   32.45   38.76   32.01

[1]: Kris Demuynck, Xueru Zhang, Dirk Van Compernolle and Hugo Van hamme. Feature versus Model Based Noise Robustness. In Proc. INTERSPEECH, pages 721–724, Makuhari, Japan, September 2010.