Compatibility with previous versions of SPRAAK

The handling of the first/last frame(s) has changed.

The change in convention may lead to small (insignificant) changes in WER. If identical behaviour is required, the following changes need to be made to the pre-processing scripts:

Add
.trail zero

at the top of the pre-processing script.
See Also
preprocessing_modules__Global_commands
Add the option
extend zero

to every [sam2trk] section in the pre-processing script.
See Also
preprocessing_modules__sam2trk

Filterbank shapes

The filterbank shapes have changed: they now interpolate on the corner frequencies instead of rounding each corner frequency up or down to the nearest FFT bin (old convention). The new convention behaves better in combination with VTLN or when using a large number of filter banks.

For compatibility reasons the 'TRI' shape (see preprocessing_modules__filter_bank) still uses the old convention. Use the 'TRI2' shape to obtain the new convention.
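
The difference between the two conventions can be illustrated with a small sketch. It is not SPRAAK code: the function names, the 16 kHz / 512-point FFT setup and the corner frequencies are assumptions chosen for illustration, and "interpolating on the corner frequencies" is read here as evaluating the triangle at each FFT bin using the exact (unrounded) corners.

```python
def triangle_weights(f_lo, f_c, f_hi, bin_freqs):
    """Sample a triangular filter (0 at f_lo and f_hi, 1 at f_c) at each FFT bin."""
    weights = []
    for f in bin_freqs:
        if f == f_c:
            w = 1.0
        elif f_lo < f < f_c:
            w = (f - f_lo) / (f_c - f_lo)
        elif f_c < f < f_hi:
            w = (f_hi - f) / (f_hi - f_c)
        else:
            w = 0.0
        weights.append(w)
    return weights


def snap_to_bin(freq, bin_width):
    """Old convention (as sketched here): move a corner to the nearest FFT bin."""
    return round(freq / bin_width) * bin_width


def rounded_filter(corners, bin_freqs, bin_width):
    snapped = [snap_to_bin(f, bin_width) for f in corners]
    return triangle_weights(*snapped, bin_freqs)


def interpolated_filter(corners, bin_freqs):
    """New convention (as sketched here): keep the exact corner frequencies."""
    return triangle_weights(*corners, bin_freqs)


def warped(corners, factor):
    """Scale all corner frequencies by a VTLN-like warping factor."""
    return [factor * f for f in corners]


# Hypothetical analysis setup: 16 kHz audio, 512-point FFT.
SAMPLE_RATE = 16000.0
FFT_SIZE = 512
BIN_WIDTH = SAMPLE_RATE / FFT_SIZE
BIN_FREQS = [k * BIN_WIDTH for k in range(FFT_SIZE // 2 + 1)]
BASE_CORNERS = (100.0, 150.0, 200.0)

rounded_a = rounded_filter(warped(BASE_CORNERS, 1.00), BIN_FREQS, BIN_WIDTH)
rounded_b = rounded_filter(warped(BASE_CORNERS, 1.01), BIN_FREQS, BIN_WIDTH)
interp_a = interpolated_filter(warped(BASE_CORNERS, 1.00), BIN_FREQS)
interp_b = interpolated_filter(warped(BASE_CORNERS, 1.01), BIN_FREQS)

# With rounding, a 1% warp leaves this narrow filter unchanged; with exact
# corners the weights follow the warp smoothly.
print("rounded filters identical:", rounded_a == rounded_b)
print("interpolated filters identical:", interp_a == interp_b)
```

In this setup the rounded filter does not react to a small warp at all and jumps once a corner crosses a bin boundary, which is the kind of behaviour that matters for VTLN and for narrow filters.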

When using adaptive VTLN, it is also advised to turn the 'norm' option on (see preprocessing_modules__filter_bank). This option keeps the area of each filterbank (which relates to the gain of the filterbank) constant. Note that a VTLN-dependent filterbank gain is not a problem with static VTLN, since the subsequent [meannorm] fully compensates for static changes in the energy content.
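
The reasoning behind this advice can be checked numerically. The sketch below is an illustration under assumptions, not SPRAAK code: it assumes that mean normalization is applied to log filterbank energies over an utterance, and the energies, gains and helper names are made up. A gain that is constant over the utterance becomes a constant offset in the log domain and is removed by subtracting the mean; a gain that changes from frame to frame is not.

```python
import math


def log_mean_normalize(energies):
    """Take the log of each energy and subtract the mean over the utterance."""
    logs = [math.log(e) for e in energies]
    mean = sum(logs) / len(logs)
    return [v - mean for v in logs]


def normalize_area(weights):
    """Scale filter weights so that they sum to one (constant filter gain)."""
    total = sum(weights)
    return [w / total for w in weights]


def max_abs_diff(a, b):
    return max(abs(x - y) for x, y in zip(a, b))


# Made-up output of one filterbank channel over four frames.
energies = [2.0, 4.0, 8.0, 16.0]

# Static VTLN: the same filterbank gain for every frame.
static = [1.3 * e for e in energies]

# Adaptive VTLN: the gain changes while the warping factor is being updated.
adaptive_gains = [1.0, 1.1, 1.2, 1.3]
adaptive = [g * e for g, e in zip(adaptive_gains, energies)]

reference = log_mean_normalize(energies)
max_static_diff = max_abs_diff(reference, log_mean_normalize(static))
max_adaptive_diff = max_abs_diff(reference, log_mean_normalize(adaptive))

print("residual after static gain:  ", max_static_diff)
print("residual after adaptive gain:", max_adaptive_diff)

# Normalizing the area removes the gain difference between a narrow and a
# wide filter before it can reach the features.
narrow = normalize_area([0.5, 1.0, 0.5])
wide = normalize_area([0.25, 0.5, 0.75, 1.0, 0.75, 0.5, 0.25])
print("areas after normalization:", sum(narrow), sum(wide))
```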

New features

pre-processing

Improved handling of first/last frame in the preprocessing.
See Also
Conventions for Frames, "The handling of the first/last frame(s) has changed." and preprocessing_modules__deriv
The [filter_bank] preprocessing module now allows the specification of the begin and end frequency and the number of filter banks. All filterbank shapes (except TRI) now interpolate on corner frequencies instead of rounding the corner frequency to the nearest FFT bin.
See Also
Filterbank shapes and preprocessing_modules__filter_bank
The HTK Mel-scale has been added, making it easier to export/import models to/from HTK.
See Also
preprocessing_modules__filter_bank
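
For reference, the Mel scale as defined in the HTK Book is Mel(f) = 2595 log10(1 + f/700). The sketch below implements that formula and uses it to place corner frequencies between a begin and an end frequency. The function names, the choice of 24 filters and the 0 to 8000 Hz range are assumptions for illustration; how SPRAAK itself places its corners is documented in preprocessing_modules__filter_bank.

```python
import math


def hz_to_mel_htk(freq_hz):
    """Mel scale as given in the HTK Book: 2595 * log10(1 + f / 700)."""
    return 2595.0 * math.log10(1.0 + freq_hz / 700.0)


def mel_to_hz_htk(mel):
    """Inverse of hz_to_mel_htk."""
    return 700.0 * (10.0 ** (mel / 2595.0) - 1.0)


def mel_spaced_corners(f_begin, f_end, n_filters):
    """Return n_filters + 2 corner frequencies equally spaced on the Mel scale.

    Filter i would use corners[i], corners[i + 1] and corners[i + 2] as its
    lower, centre and upper corner frequency.
    """
    mel_begin = hz_to_mel_htk(f_begin)
    mel_end = hz_to_mel_htk(f_end)
    step = (mel_end - mel_begin) / (n_filters + 1)
    return [mel_to_hz_htk(mel_begin + i * step) for i in range(n_filters + 2)]


# Hypothetical configuration: 24 filters between 0 Hz and 8000 Hz.
corners = mel_spaced_corners(0.0, 8000.0, 24)
for lower, centre, upper in zip(corners, corners[1:], corners[2:]):
    print(f"{lower:8.1f} {centre:8.1f} {upper:8.1f}")
```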
Least-squares overlap-add signal reconstruction in the [trk2sam] pre-processing module. This allows, for example, the reconstruction of audio from an amplitude spectrogram using the Griffin-Lim algorithm.
See Also
preprocessing_modules__trk2sam
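
The least-squares overlap-add step can be sketched as follows. This is a minimal illustration of the general technique (each output sample is the window-weighted sum of the overlapping frames divided by the sum of the squared window values), not the [trk2sam] implementation; the window, frame length, hop size and function names are assumptions. A complete Griffin-Lim reconstruction would alternate this step with a short-time Fourier transform, which is left out here.

```python
import math


def make_window(length):
    """Hann-like window sampled at half-sample offsets, so no tap is exactly zero."""
    return [0.5 - 0.5 * math.cos(2.0 * math.pi * (i + 0.5) / length)
            for i in range(length)]


def frame_signal(signal, window, hop):
    """Cut the signal into overlapping frames and apply the analysis window."""
    n = len(window)
    return [[signal[start + i] * window[i] for i in range(n)]
            for start in range(0, len(signal) - n + 1, hop)]


def ls_overlap_add(frames, window, hop):
    """Least-squares overlap-add of windowed frames."""
    n = len(window)
    length = hop * (len(frames) - 1) + n
    numerator = [0.0] * length
    denominator = [0.0] * length
    for m, frame in enumerate(frames):
        start = m * hop
        for i in range(n):
            numerator[start + i] += window[i] * frame[i]
            denominator[start + i] += window[i] * window[i]
    # Samples not covered by any window energy are set to zero.
    return [a / b if b > 0.0 else 0.0 for a, b in zip(numerator, denominator)]


# Hypothetical test signal: 64 samples of a sine, 16-sample frames, hop of 4.
signal = [math.sin(2.0 * math.pi * 3.0 * t / 64.0) for t in range(64)]
window = make_window(16)
frames = frame_signal(signal, window, hop=4)
reconstructed = ls_overlap_add(frames, window, hop=4)
max_error = max(abs(a - b) for a, b in zip(signal, reconstructed))
print("maximum reconstruction error:", max_error)
```

When the frames are consistent (they really are windowed segments of one signal, as in this example), the step returns that signal. When they are not, for instance after the phase has been discarded, it returns the signal whose windowed frames are closest to the given ones in the least-squares sense.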
Options to make the [meannorm] preprocessing module robust against receiving no input frames when using a voice activity detector.
See Also
preprocessing_modules__meannorm

lattices

Phone or state scores are now also stored with the phone/state trace-back and written to lattices (if requested).
See Also
spr_eval.py and spr_cwr_main.c
Word scores in lattices are now more precise when handling very long files.

acoustic modelling

Addition of a simple 'direct feed through' acoustic model, allowing externally calculated likelihoods to be used.
See Also
dft_am.c and spr_cwr_main.c
Increased flexibility in defining the amount of Gaussian tying between states during training. The scheme even allows one subset of Gaussians to be used to approximate states modeled by another subset of Gaussians.
Segmentation files can now also contain scores, e.g. generated by spr_vitalign. This should make it possible, for example, to automatically find unreliable reference transcriptions or deviant acoustics in databases.
Programs (spr_fv_gauss.c, spr_mdt_make_cgt.c) that previously worked only with an internal Viterbi alignment now also accept a segmentation file.
Programs that use the recognizer in sentence mode (Viterbi alignment & training) can now also choose the acoustic model type.
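
As an illustration of how scores in a segmentation could be used to spot suspicious material, the sketch below flags segments whose per-frame score is much lower than is typical for the file. Everything in it is hypothetical: the in-memory tuple layout, the threshold and the function name are assumptions, the scores are taken to be log-likelihoods summed over the segment (so lower means a worse match), and reading an actual SPRAAK segmentation file is not shown.

```python
import statistics


def flag_suspect_segments(segments, threshold=3.0):
    """Return labels of segments with an unusually low per-frame score.

    segments: list of (label, begin_frame, end_frame, score) tuples, with
    score the total log-likelihood of the segment (an assumed layout).
    """
    # Normalize by duration so long segments are not penalized for length.
    per_frame = [score / (end - begin) for _, begin, end, score in segments]
    median = statistics.median(per_frame)
    # Median absolute deviation as a robust measure of the normal spread.
    mad = statistics.median([abs(v - median) for v in per_frame])
    scale = mad if mad > 0.0 else 1.0
    return [seg[0] for seg, v in zip(segments, per_frame)
            if (median - v) / scale > threshold]


# Made-up segmentation: segment "d" matches the acoustics far worse per frame.
segments = [
    ("a", 0, 10, -50.0),
    ("b", 10, 30, -104.0),
    ("c", 30, 40, -52.0),
    ("d", 40, 50, -120.0),
    ("e", 50, 70, -98.0),
]
flagged = flag_suspect_segments(segments)
print("suspect segments:", flagged)
```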

other changes in functionality

Support for writing .wav files.
Mac OS X support.
Various small improvements and corrections.

documentation & examples

The WSJ examples now use cmudict.0.7a (was cmudict.0.6d).
Addition of an example of a noise robust system by means of noise normalization.
Additional documentation.
Better (program specific) documentation for spr_sel_gauss, spr_add_gauss, spr_setminprob and the various spr_hmm... programs.

internal changes

Support for function templates, e.g. matrix functions (and documentation) for float and double precision based on a single piece of code.
Legacy code removed.
First steps in introducing a generic interface toward the acoustic model.
Code clean-up and various code improvements.