A single test

The configuration files for the evaluation of acoustic models are the .ini files in the ./scripts directory. They contain the names of the dictionaries and the language models. In this tutorial we focus on the 20k open vocabulary evaluation set (nov.92).

Converting the lexicon and the language model is explained in Preparing the data. To evaluate your models, run this script:

> spr_eval.py -v LEX=20k -v LM=t20k wsj0+1_mida_vtln_CD.ini MY_EXP  ../resources/SPRAAK/nov92_np_20k.cor ../data/wsj0 wv1 thr=50.0 lma=1.5 lmc=-5

The transcriptions are saved in MY_EXP.OUT files and the results in MY_EXP.RES files. Try different values of lma, lmc, and thr to get the best transcription with your models.
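Trying several values by hand quickly becomes tedious, so a small helper that prints one command line per parameter combination can be useful. The sketch below only rebuilds the invocation shown above with different thr, lma and lmc values. The experiment naming scheme (MY_EXP_thr..._lma..._lmc...) and the value grids are assumptions made for illustration, not something spr_eval.py prescribes.

```python
import itertools
import shlex

# Fixed parts of the command, copied from the example invocation above.
BASE = [
    "spr_eval.py", "-v", "LEX=20k", "-v", "LM=t20k",
    "wsj0+1_mida_vtln_CD.ini",
]
TAIL = ["../resources/SPRAAK/nov92_np_20k.cor", "../data/wsj0", "wv1"]


def sweep_commands(thr_values, lma_values, lmc_values, prefix="MY_EXP"):
    """Return one shell command string per (thr, lma, lmc) combination."""
    commands = []
    for thr, lma, lmc in itertools.product(thr_values, lma_values, lmc_values):
        # Hypothetical naming scheme: gives every run its own experiment
        # name so that the output files of different runs do not collide.
        exp = f"{prefix}_thr{thr}_lma{lma}_lmc{lmc}"
        args = BASE + [exp] + TAIL + [f"thr={thr}", f"lma={lma}", f"lmc={lmc}"]
        commands.append(" ".join(shlex.quote(a) for a in args))
    return commands


if __name__ == "__main__":
    # Illustrative grid only; pick values that make sense for your models.
    for cmd in sweep_commands([40.0, 50.0], [1.3, 1.5], [-5]):
        print(cmd)
```

The printed lines can be pasted into a shell or redirected into a script file; the helper itself does not run anything.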

Optimizing Parameters

There are a few parameters that need optimization in virtually every setup. These are the parameters that weight the acoustic vs. the language model and the beam search parameters.

SPRAAK uses the following unorthodox global scoring:

total_score = acoustic_model_score(log10) + LMA*language_model_score(log) + LMC
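To get a feeling for how LMA and LMC shift the balance, the formula can be evaluated for a few made-up scores. The sketch below is a literal transcription of the formula as written above; the function name and the example scores are invented for illustration and are not taken from SPRAAK itself.

```python
def total_score(am_score_log10, lm_score_log, lma, lmc):
    """Combine the scores exactly as in the formula above.

    am_score_log10: acoustic model score (log10 domain)
    lm_score_log:   language model score (log domain)
    lma:            weight applied to the language model score
    lmc:            constant added to the weighted sum
    """
    return am_score_log10 + lma * lm_score_log + lmc


if __name__ == "__main__":
    # Made-up scores, used only to show the effect of changing lma.
    am, lm = -120.0, -8.0
    for lma in (1.0, 1.5, 2.0):
        print(f"lma={lma}: total_score={total_score(am, lm, lma, lmc=-5)}")
```

With these invented numbers, a larger lma makes the (negative) language model score count more heavily, so the total score drops as lma grows.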

For the beam search two parameters matter:

max_bw  the maximum beamwidth (depends on vocabulary and language model)
thr     is an absolute threshold used for beam pruning

The max_bw parameter can be set once for a given task. For WSJ, for example, a maximum beam width of 80000 has proven to be fine. The threshold depends somewhat on the acoustic model and on the parameter settings for AM and LM weighting. We have seen good compromises between performance and efficiency when the average beam width is about 1/5 of the maximum beam width. The average beam width is given in the result file. Finding a good value for thr may therefore require a bit of tuning.
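The 1/5 rule of thumb is easy to check once the average beam width has been read from the result file. The following sketch assumes you copy that number by hand (no parsing of the result file is attempted here); the function name and the tolerance band around the target ratio are hypothetical choices for illustration.

```python
def beam_usage(avg_bw, max_bw=80000, target=0.2, tolerance=0.05):
    """Compare the average beam width with the 1/5 rule of thumb.

    avg_bw:    average beam width, read manually from the result file
    max_bw:    maximum beam width used for the run
    target:    desired ratio avg_bw / max_bw (1/5 in the text above)
    tolerance: hypothetical band around the target that counts as "near"
    """
    ratio = avg_bw / max_bw
    if ratio < target - tolerance:
        verdict = "below target"
    elif ratio > target + tolerance:
        verdict = "above target"
    else:
        verdict = "near target"
    return ratio, verdict


if __name__ == "__main__":
    # Invented average beam widths, for illustration only.
    for avg in (4000, 16000, 40000):
        ratio, verdict = beam_usage(avg)
        print(f"avg_bw={avg}: ratio={ratio:.2f} ({verdict})")
```

A verdict other than "near target" suggests rerunning the evaluation with a different thr and checking the average beam width again.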