SPRAAK
In this TUTORIAL we show how to do a simple training of an acoustic model. You will experience SPRAAK's ease of handling large corpora. The top level scripts and their associated resources are the only things you really need to understand; much of the complex internal data handling and processing is conveniently hidden from you.
While the setup that we use is very simple in nature, it may serve as a starting point for contemporary research.
We assume that you have already completed the tutorials on Recognizing and Aligning with SPRAAK and did the associated recommended reading.
Speech recognition experiments typically require handling large corpora of data. First of all, you will need sufficient disk space. As much of this is temporary data that is easily reproduced, it is best to store your permanent data (resources, configurations and final models) in a different place than the intermediate data. You will want your permanent data to be backed up regularly, while the intermediate data can be stored in a scratch directory.
For ease of working, you may want to put a link to your scratch disk inside your experimentation directory (assuming MYSCRATCHDISK is predefined); a sketch of such a layout is given below. We have also preconfigured a few experimental setups in the './scripts' subdirectory; you may want to copy the entire directory. Finally, we suggest you create an extra directory where you will run the experiments.
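A minimal Python sketch of such a layout is given here. The ~/MyTimit experiment tree and the 'scratch' link name are illustrative assumptions, not directories mandated by SPRAAK; MYSCRATCHDISK and SPR_HOME are assumed to be set in the environment.

    # Illustrative layout only: the directory names below are assumptions,
    # not requirements imposed by SPRAAK.
    import os
    import shutil

    exp_root = os.path.expanduser("~/MyTimit")   # assumed experiment tree
    scratch  = os.environ["MYSCRATCHDISK"]       # predefined scratch disk
    spr_home = os.environ["SPR_HOME"]            # SPRAAK installation

    # directory from which the experiments will be run
    os.makedirs(os.path.join(exp_root, "exp"), exist_ok=True)

    # link the scratch disk into the experiment tree for intermediate data
    link = os.path.join(exp_root, "scratch")
    if not os.path.islink(link):
        os.symlink(scratch, link)

    # copy the preconfigured experimental setups
    scripts = os.path.join(exp_root, "scripts")
    if not os.path.isdir(scripts):
        shutil.copytree(os.path.join(spr_home, "examples", "scripts"), scripts)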
SPRAAK uses a top level PYTHON script spr_train.py to perform acoustic model training according to the specifications listed in a ".config" file. The configuration for this tutorial is in the file "e1.config".
A configuration file is actually PYTHON code executed by the master trainer script. In it you find the specifications for the training setup.
In this example the training consists of a single invocation of the 'trainer' object: trainer.tied(). Each call to the trainer constitutes a MAJOR iteration; iterations within such a call constitute MINOR iterations (numbering starts at 1).
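As an illustration only, a configuration fragment might look roughly like the sketch below. The resource filenames are those of this tutorial, but the variable and keyword argument names are assumptions made for the sketch, not the actual spr_train.py interface; consult e1.config for the settings that are really used.

    # Hypothetical sketch of a ".config" file; it is ordinary PYTHON code that
    # spr_train.py executes, with the 'trainer' object provided by that script.
    # All names below except the resource files are illustrative assumptions.

    corpus   = "../resources/train.cor"              # training corpus specification
    segfile  = "../resources/train_hand_states.seg"  # hand labeled segmentations
    preproc  = "../resources/mfcc39.ssp"             # feature extraction
    phones   = "../resources/timit51.ci"             # phone alphabet
    states   = "../resources/timit51.cd"             # state definition file
    lexicon  = "../resources/timit51.dic"            # lexicon

    # a single call to the trainer = one MAJOR iteration; the minor iteration
    # count (here 3, cf. trainer_tied_iter_m with m=1,2,3) is an assumed argument
    trainer.tied(niter=3)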
spr_eval.py is a PYTHON script performing recognition and scoring of the results, as already explained in the previous tutorial. The evaluation script uses both the information given in a ".ini" initialization file (./scripts/e1.ini for Tutorial-1) and command line arguments.
The initialization file specifies default parameter settings and resources to be pre-loaded by the recognizer. The command line arguments further specify which of these components should be used during recognition.
The scripts, configuration files and other tools are stored in the ./scripts directory. The top level script that you need to run this experiment is ./scripts/e1.csh. The parameter settings assume that you copy the script to your ./exp directory and run it from there. All output will be generated in this directory, making it easy to clean up if something goes wrong. The top level script performs both training and evaluation.
NOTE: As training (and recognition) may take hours or days, it is good practice to "nice" your jobs so that they do not interfere too much with interactive processes, or to submit them to specific compute nodes (a niced variant of the run below is sketched after the command sequence).
Running the example TIMIT experiment can be done by executing the following sequence of commands:
    > cd ~/MyTimit/exp
    > cp $SPR_HOME/examples/scripts/e1.* .
    > e1.csh
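Following the note above, the same run can be launched at low priority. A minimal Python sketch (running nice directly from the shell works just as well; e1.csh is assumed to be in the current directory):

    # Launch the training+evaluation script at low CPU priority.
    import subprocess

    subprocess.run(["nice", "-n", "19", "csh", "./e1.csh"], check=True)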
In the example script the relative weights of the acoustic and language model are preset (the cost_C & cost_A parameters). Experimentation often involves a hill-climbing optimization over such parameter settings, which may be done by a simple loop over the eval script (of course preferably done on an independent development test set), as sketched below. (Language Model Scaling)
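A sketch of such a loop in Python, assuming spr_eval.py accepts the initialization file and a cost_C override on its command line; the option name and calling convention are assumptions made for illustration, so check e1.csh for how the eval script is actually invoked:

    # Hypothetical sweep over the language model scaling (cost_C).
    # The "-cost_C" option and the argument order are assumptions,
    # not the documented spr_eval.py interface.
    import subprocess

    for cost_c in (4.0, 6.0, 8.0, 10.0, 12.0):   # candidate scaling factors
        subprocess.run(["python", "spr_eval.py", "e1.ini",
                        "-cost_C", str(cost_c)], check=True)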
The following processes and subtasks will be performed:
    spr_train.py: trainer.tied()
        trainer_tied_init
            ....                 # several initialization tasks
            SelFrames            # selected frames for Gaussian initialization, maintain balancing per phone
            MakeTiedMvg          # make tied Gaussian pool
            MakeTiedHMM          # initialize HMM weights
            Segpass              # perform an initial segmentation pass (spr_segpass.c)
        trainer_tied_iter_m (3 times; m=1,2,3)
            Vitpass              # compute Viterbi counts (spr_vitpass.c)
            CtToHmm              # update HMM statistics (spr_ct2hmm.c)
    spr_eval.py:
A very rough estimate of the execution times on a contemporary (2010) dual core machine is:
    training:   25'
    evaluation:  5'
Filenames of all files directly involved in the experiment are given below. Filenames are relative to the local experiment directory.
    SETUP:
        e1.csh                               master script for running training+evaluation
        e1.config                            config file for training
        e1.ini                               config file for evaluation
    OTHER INPUT FILES/DIRECTORIES USED:
        ../resources/mfcc39.ssp              feature extraction file
        ../resources/timit51.ci              phone alphabet
        ../resources/timit51.cd              state definition file
        ../resources/timit51.dic             lexicon
        ../resources/train_hand_states.seg   segmentation file (hand labeled)
        ../resources/train.cor               specification of the training corpus
        ../resources/test.39.cor             specification of the test corpus
        ../dbase/---                         database with speech waveform files
    GENERATED FILES (primary):
        e1_.log                              log file, contains logging information on the experiment
        e1_recovery.log                      recovery log file, contains recovery points for automatic restart
        e1_m1/acmod.mvg                      multivariate Gaussians
        e1_m1/acmod.hmm                      HMM coefficients
        e1_m1/acmod.sel                      index of HMM coefficients
        e1_.RES                              result file
    GENERATED FILES (supporting):
        e1_.CMD                              commands sent to spr_cwr_main.c during evaluation
        e1_.OUT                              output generated by spr_cwr_main.c
        e1_m1/acmod.x.xxx                    acoustic model files at the end of a minor iteration
Now you're ready not only to run the example experiment e1, but also to generate your own variants.
You may try any of the following:
PS: experiment e1 will result in a 29% error rate. Can you do better?