SPRAAK
What's new in SPRAAK V1.1.362

Compatibility with previous versions of SPRAAK

The handling of the first/last frame(s) has changed.

The change in convention may lead to small (insignificant) changes in WER. If identical behaviour is required, the following changes need to be made to the pre-processing scripts:

Filterbank shapes

The filterbank shapes have changed: the filters now interpolate at the corner frequencies instead of rounding each corner frequency up or down to the nearest FFT bin (the old convention). The new convention behaves better in combination with VTLN or when a large number of filters is used.

For compatibility reasons the 'TRI' shape (see preprocessing_modules__filter_bank) still uses the old convention. Use the 'TRI2' shape to obtain the new convention.
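The difference between the two conventions can be illustrated with a small sketch. This is one plausible reading of "rounding" versus "interpolating" for a single triangular filter, not the SPRAAK implementation: the function names, the 16 kHz / 512-point FFT setup and the corner frequencies are all assumptions chosen for illustration.

```python
def tri_weights_rounded(f_lo, f_mid, f_hi, bin_hz, n_bins):
    """Old-style sketch: snap each corner to the nearest FFT bin first."""
    lo, mid, hi = (round(f / bin_hz) for f in (f_lo, f_mid, f_hi))
    w = [0.0] * n_bins
    for k in range(n_bins):
        if k == mid:
            w[k] = 1.0                      # peak sits exactly on a bin
        elif lo < k < mid:
            w[k] = (k - lo) / (mid - lo)    # rising slope, in bin units
        elif mid < k < hi:
            w[k] = (hi - k) / (hi - mid)    # falling slope, in bin units
    return w


def tri_weights_interpolated(f_lo, f_mid, f_hi, bin_hz, n_bins):
    """New-style sketch: evaluate the triangle at each bin's true frequency."""
    w = [0.0] * n_bins
    for k in range(n_bins):
        f = k * bin_hz
        if f_lo < f <= f_mid:
            w[k] = (f - f_lo) / (f_mid - f_lo)
        elif f_mid < f < f_hi:
            w[k] = (f_hi - f) / (f_hi - f_mid)
    return w


# Assumed setup: 16 kHz sampling, 512-point FFT, so 31.25 Hz per bin.
BIN_HZ, N_BINS = 16000 / 512, 257
corners = (300.0, 400.0, 500.0)
warped = tuple(1.02 * f for f in corners)   # a 2% VTLN-like warp

old_a = tri_weights_rounded(*corners, BIN_HZ, N_BINS)
old_b = tri_weights_rounded(*warped, BIN_HZ, N_BINS)
new_a = tri_weights_interpolated(*corners, BIN_HZ, N_BINS)
new_b = tri_weights_interpolated(*warped, BIN_HZ, N_BINS)
```

In this setup the rounded filter does not react to the 2% warp at all (both corner sets snap to the same bins), whereas the interpolated filter changes smoothly with the warping factor. With many narrow filters the same rounding effect becomes relatively larger.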

When using adaptive VTLN, it is also advisable to turn on the 'norm' option (see preprocessing_modules__filter_bank). This option keeps the area of each filter (which relates to the gain of that filter) constant. Note that a VTLN-dependent filterbank gain is not a problem with static VTLN, since the subsequent [meannorm] fully compensates for static changes in the energy content.
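Both statements can be checked numerically. The sketch below assumes that the area of a filter is the sum of its weights and that mean normalisation operates on log energies; the helper names and the example numbers are hypothetical and only illustrate the reasoning, not the behaviour of the actual 'norm' option or [meannorm] module.

```python
import math


def normalise_area(weights, target_area=1.0):
    """Rescale filter weights so that they sum to target_area."""
    area = sum(weights)
    if area == 0.0:
        return list(weights)  # empty filter: nothing to rescale
    return [w * target_area / area for w in weights]


def mean_normalise(log_energies):
    """Subtract the mean over time from a sequence of log energies."""
    mean = sum(log_energies) / len(log_energies)
    return [x - mean for x in log_energies]


# One filter under two slightly different warping factors (made-up weights).
filt_a = [0.125, 0.4375, 0.75, 0.9375, 0.625, 0.3125]
filt_b = [0.064, 0.370, 0.676, 0.983, 0.691, 0.385, 0.078]
area_a = sum(normalise_area(filt_a))  # approximately 1.0
area_b = sum(normalise_area(filt_b))  # approximately 1.0

# A static gain g turns into a constant offset log(g) in the log domain,
# and subtracting the mean removes any constant offset.
energies = [2.0, 5.0, 3.0, 8.0]
plain = mean_normalise([math.log(e) for e in energies])
scaled = mean_normalise([math.log(1.7 * e) for e in energies])
```

With adaptive VTLN the gain is no longer constant over the span on which the mean is computed, so the offset argument no longer holds; that is the case in which keeping the filter area fixed helps.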

New features

pre-processing
lattices
  • Phone or state scores are now also stored with the phone/state trace-back and written to lattices (if requested).
    See Also
    spr_eval.py and spr_cwr_main.c
  • Word scores in lattices are now more precise when handling very long files.
acoustic modelling
  • Addition of a simple 'direct feed through' acoustic model, allowing externally calculated likelihoods to be used.
    See Also
    dft_am.c and spr_cwr_main.c
  • Increased flexibility in defining the amount of Gaussian tying between states during training. The scheme even allows one subset of Gaussians to be used to approximate states modeled by another subset of Gaussians.
  • Segmentation files can now also contain scores, e.g. generated by spr_vitalign. This should make it possible, for example, to automatically find unreliable reference transcriptions or deviant acoustics in databases.
  • Programs (spr_fv_gauss.c, spr_mdt_make_cgt.c) that previously worked only with an internal Viterbi alignment now also accept a segmentation file.
  • Programs that use the recognizer in sentence mode (Viterbi alignment & training) can now also choose the acoustic model type.
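As an illustration of how scores in segmentation files might be used to spot unreliable transcriptions, the sketch below flags utterances whose per-frame alignment score lies far below the corpus median. The (id, score, frame count) tuples are a hypothetical in-memory layout, not the SPRAAK segmentation file format, and the outlier rule and threshold are assumptions.

```python
from statistics import median


def flag_suspect_utterances(scored, threshold=3.0):
    """Return ids whose per-frame score lies far below the corpus median.

    scored: iterable of (utt_id, total_score, n_frames) tuples, where a
    higher score means a better acoustic fit (hypothetical layout).
    """
    per_frame = {u: s / n for u, s, n in scored if n > 0}
    values = list(per_frame.values())
    med = median(values)
    mad = median(abs(v - med) for v in values)  # median absolute deviation
    if mad == 0.0:
        return []  # all scores identical: nothing stands out
    return sorted(u for u, v in per_frame.items()
                  if (med - v) / mad > threshold)


# Made-up alignment scores: utterance "e" fits its transcription badly.
example = [("a", -500.0, 100), ("b", -520.0, 100), ("c", -480.0, 100),
           ("d", -510.0, 100), ("e", -1500.0, 100)]
suspects = flag_suspect_utterances(example)
```

Dividing by the number of frames keeps long utterances from being flagged merely for their length; utterances flagged this way would still need manual inspection.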
other changes in functionality
  • Support for writing .wav files.
  • Mac OS X support.
  • Various small improvements and corrections.
documentation & examples
  • The WSJ examples now use cmudict.0.7a (was cmudict.0.6d).
  • Addition of an example of a noise-robust system by means of noise normalization.
  • Additional documentation.
  • Better (program specific) documentation for spr_sel_gauss, spr_add_gauss, spr_setminprob and the various spr_hmm... programs.
internal changes
  • Support for function templates, e.g. matrix functions (and documentation) for float and double precision based on a single piece of code.
  • Legacy code removed.
  • First steps in introducing a generic interface toward the acoustic model.
  • Code clean-up + various code improvements