SPRAAK
In this TUTORIAL we show how to do a simple training of an acoustic model. You will experience SPRAAK's ease of handling large corpora. The top level scripts and their associated resources are the only things you really need to understand; much of the complex internal data handling and processing is conveniently hidden from you.
While the setup that we use is very simple in nature, it may serve as a starting point for contemporary research.
We assume that you have already completed the tutorials on Recognizing and Aligning with SPRAAK and did the associated recommended reading.
Speech recognition experiments typically require handling large corpora of data. First of all, you will need sufficient disk space. As much of this is temporary data that is easily reproduced, it is best to store your permanent data (resources, configurations and final models) in a different place than the intermediate data. You will want your permanent data to be backed up regularly, while the intermediate data can be stored in a scratch directory.
For ease of working, you may want to put a link to your scratch disk inside your experimentation directory (assuming MYSCRATCHDISK is predefined); a sketch of such a layout is given below. We have also preconfigured a few experimental setups in the './scripts' subdirectory; you may want to copy the entire directory. Finally, we suggest you create an extra directory where you will run the experiments.
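A minimal Python sketch of such a layout is given here. The ~/MyTimit experiment tree and the 'scratch' link name are illustrative assumptions, not directories mandated by SPRAAK; MYSCRATCHDISK and SPR_HOME are assumed to be set in the environment.

    # Illustrative layout only: the directory names below are assumptions,
    # not requirements imposed by SPRAAK.
    import os
    import shutil

    exp_root = os.path.expanduser("~/MyTimit")   # assumed experiment tree
    scratch  = os.environ["MYSCRATCHDISK"]       # predefined scratch disk
    spr_home = os.environ["SPR_HOME"]            # SPRAAK installation

    # directory from which the experiments will be run
    os.makedirs(os.path.join(exp_root, "exp"), exist_ok=True)

    # link the scratch disk into the experiment tree for intermediate data
    link = os.path.join(exp_root, "scratch")
    if not os.path.islink(link):
        os.symlink(scratch, link)

    # copy the preconfigured experimental setups
    scripts = os.path.join(exp_root, "scripts")
    if not os.path.isdir(scripts):
        shutil.copytree(os.path.join(spr_home, "examples", "scripts"), scripts)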
SPRAAK uses a top level PYTHON script spr_train.py to perform acoustic model training according to the specifications listed in a ".config" file. The configuration for this tutorial is in the file "e1.config".
A configuration file is actually PYTHON code executed by the master trainer script. In it you find the specifications for the training setup.
In this example the training consists of a single invocation of the 'trainer' object: trainer.tied(). Each call to the trainer constitutes a MAJOR iteration; iterations within such a call constitute MINOR iterations (numbering starts at 1).
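As an illustration only, a configuration fragment might look roughly like the sketch below. The resource filenames are those of this tutorial, but the variable and keyword argument names are assumptions made for the sketch, not the actual spr_train.py interface; consult e1.config for the settings that are really used.

    # Hypothetical sketch of a ".config" file; it is ordinary PYTHON code that
    # spr_train.py executes, with the 'trainer' object provided by that script.
    # All names below except the resource files are illustrative assumptions.

    corpus   = "../resources/train.cor"              # training corpus specification
    segfile  = "../resources/train_hand_states.seg"  # hand labeled segmentations
    preproc  = "../resources/mfcc39.ssp"             # feature extraction
    phones   = "../resources/timit51.ci"             # phone alphabet
    states   = "../resources/timit51.cd"             # state definition file
    lexicon  = "../resources/timit51.dic"            # lexicon

    # a single call to the trainer = one MAJOR iteration; the minor iteration
    # count (here 3, cf. trainer_tied_iter_m with m=1,2,3) is an assumed argument
    trainer.tied(niter=3)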
spr_eval.py is a PYTHON script performing recognition and scoring of the results, as already explained in the previous tutorial. The evaluation script uses both the information given in a ".ini" initialization file (./scripts/e1.ini for Tutorial-1) and command line arguments.
The initialization file specifies default parameter settings and resources to be pre-loaded by the recognizer. The command line arguments further specify which of these components should be used during recognition.
The scripts, configuration files and other tools are stored in the ./scripts directory. The top level script that you need to run this experiment is ./scripts/e1.csh. The parameter settings assume that you copy the script to your ./exp directory and run it from there. All output will be generated in this directory, making it easy to clean up if something goes wrong. The top level script performs both training and evaluation.
NOTE: As training (and recognition) may take hours or days, it is good practice to "nice" your jobs so that they do not interfere too much with interactive processes, or to submit them to specific compute nodes (a niced variant of the run below is sketched after the command sequence).
Running the example TIMIT experiment can be done by executing the following sequence of commands:
    > cd ~/MyTimit/exp
    > cp $SPR_HOME/examples/scripts/e1.* .
    > e1.csh
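Following the note above, the same run can be launched at low priority. A minimal Python sketch (running nice directly from the shell works just as well; e1.csh is assumed to be in the current directory):

    # Launch the training+evaluation script at low CPU priority.
    import subprocess

    subprocess.run(["nice", "-n", "19", "csh", "./e1.csh"], check=True)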
In the example script the relative weights of the acoustic and language model are preset (the cost_C & cost_A parameters). Experimentation often involves a hill-climbing optimization over such parameter settings, which may be done by a simple loop over the eval script (of course preferably done on an independent development test set), as sketched below. (Language Model Scaling)
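A sketch of such a loop in Python, assuming spr_eval.py accepts the initialization file and a cost_C override on its command line; the option name and calling convention are assumptions made for illustration, so check e1.csh for how the eval script is actually invoked:

    # Hypothetical sweep over the language model scaling (cost_C).
    # The "-cost_C" option and the argument order are assumptions,
    # not the documented spr_eval.py interface.
    import subprocess

    for cost_c in (4.0, 6.0, 8.0, 10.0, 12.0):   # candidate scaling factors
        subprocess.run(["python", "spr_eval.py", "e1.ini",
                        "-cost_C", str(cost_c)], check=True)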
The following processes and subtasks will be performed:
    spr_train.py: trainer.tied()
        trainer_tied_init
            ....                 # several initialization tasks
            SelFrames            # selected frames for Gaussian initialization, maintain balancing per phone
            MakeTiedMvg          # make tied Gaussian pool
            MakeTiedHMM          # initialize HMM weights
            Segpass              # perform an initial segmentation pass (spr_segpass.c)
        trainer_tied_iter_m (3 times; m=1,2,3)
            Vitpass              # compute Viterbi counts (spr_vitpass.c)
            CtToHmm              # update HMM statistics (spr_ct2hmm.c)
    spr_eval.py:
A very rough estimate of the execution times on a contemporary (2010) dual core machine is:
    training:   25'
    evaluation:  5'
Filenames of all files directly involved in the experiment are given below. Filenames are relative to the local experiment directory.
    SETUP:
        e1.csh                               master script for running training+evaluation
        e1.config                            config file for training
        e1.ini                               config file for evaluation
    OTHER INPUT FILES/DIRECTORIES USED:
        ../resources/mfcc39.ssp              feature extraction file
        ../resources/timit51.ci              phone alphabet
        ../resources/timit51.cd              state definition file
        ../resources/timit51.dic             lexicon
        ../resources/train_hand_states.seg   segmentation file (hand labeled)
        ../resources/train.cor               specification of the training corpus
        ../resources/test.39.cor             specification of the test corpus
        ../dbase/---                         database with speech waveform files
    GENERATED FILES (primary):
        e1_.log                              log file, contains logging information on the experiment
        e1_recovery.log                      recovery log file, contains recovery points for automatic restart
        e1_m1/acmod.mvg                      multivariate Gaussians
        e1_m1/acmod.hmm                      HMM coefficients
        e1_m1/acmod.sel                      index of HMM coefficients
        e1_.RES                              result file
    GENERATED FILES (supporting):
        e1_.CMD                              commands sent to spr_cwr_main.c during evaluation
        e1_.OUT                              output generated by spr_cwr_main.c
        e1_m1/acmod.x.xxx                    acoustic model files at the end of a minor iteration
Now you're ready not only to run the example experiment e1, but also to generate your own variants.
You may try any of the following:
PS: experiment e1 will result in a 29% error rate. Can you do better?