SPRAAK
Speech recognizers do not work directly on sampled data. SPRAAK, like the great majority of speech recognition systems, works with frames, i.e. features extracted from the data at regularly spaced time intervals. How SPRAAK maps samples to frames is explained in Frames and Times.
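To make the notion of a frame concrete, the following is a minimal sketch of cutting a sampled signal into overlapping frames. It is not SPRAAK code: the function name `split_into_frames`, the 16 kHz sample rate, the 25 ms window and the 10 ms shift are all illustrative assumptions, and the mapping SPRAAK actually uses is the one documented in Frames and Times.

```python
def split_into_frames(samples, frame_length, frame_shift):
    """Cut a sequence of samples into overlapping frames.

    Hypothetical helper for illustration only. Trailing samples that
    do not fill a complete frame are dropped.
    """
    frames = []
    start = 0
    while start + frame_length <= len(samples):
        frames.append(samples[start:start + frame_length])
        start += frame_shift
    return frames


# Assumed settings: 16 kHz audio, 25 ms window (400 samples),
# 10 ms shift (160 samples). One second of dummy samples.
signal = list(range(16000))
frames = split_into_frames(signal, frame_length=400, frame_shift=160)
print(len(frames), len(frames[0]))
```

A feature vector would then be computed from each frame, so the recognizer sees one vector per frame shift rather than one value per sample.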
SPRAAK has a wealth of signal processing routines and options built in. The algorithm to be executed is described in a scripting language in a signal processing configuration file. It may be run 'in-line', i.e. processing is done as soon as input becomes available and the computed features are streamed automatically to the further processing steps; this can be done during both training and recognition. This is the standard setup for most types of signal processing, which require only minimal computing time on today's systems. Processing may also be done 'off-line', in which case the signal processing output is written to file. The latter may be the preferred modus operandi for complex signal processing algorithms.
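The difference between the two modes can be sketched in a few lines of Python. This illustrates the general idea only, not SPRAAK's implementation: the functions `stream_features` and `write_features` are hypothetical, and log energy stands in for whatever feature the configuration file would actually describe.

```python
import io
import math


def log_energy(frame):
    """Stand-in feature: log of the frame energy (small floor avoids log(0))."""
    return math.log(sum(x * x for x in frame) + 1e-10)


def stream_features(sample_source, frame_length, frame_shift):
    """'In-line' style: yield a feature as soon as a full frame is buffered.

    Assumes 1 <= frame_shift <= frame_length.
    """
    buffer = []
    for sample in sample_source:
        buffer.append(sample)
        if len(buffer) == frame_length:
            yield log_energy(buffer)
            # Keep the overlapping tail for the next frame.
            buffer = buffer[frame_shift:]


def write_features(sample_source, frame_length, frame_shift, out):
    """'Off-line' style: same processing, but the output is stored in a file."""
    count = 0
    for feature in stream_features(sample_source, frame_length, frame_shift):
        out.write(f"{feature:.6f}\n")
        count += 1
    return count


# In-line: a consumer receives features one at a time while input arrives.
streamed = list(stream_features(range(10), frame_length=4, frame_shift=2))

# Off-line: features are written out first and read back by a later step.
feature_file = io.StringIO()  # stands in for a file on disk
written = write_features(range(10), 4, 2, feature_file)
```

Both paths run the same processing; they differ only in whether the result is handed directly to the next step or stored for later use.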
A list of all signal processing modules is found in Preprocessing modules. The implemented modules include: FFT, cepstra, mel-cepstra, mean normalization, histogram equalization, vocal tract length normalization, begin-endpoint detection, time derivatives, ... Furthermore, hooks are provided for plugging in MATLAB code.
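Two of the listed operations, mean normalization and time derivatives, are simple enough to sketch in plain Python. This is a sketch of the general technique, not of SPRAAK's modules: the function names are hypothetical, and the central-difference derivative with repeated edge frames is an assumed formula, since this section does not specify which one SPRAAK uses.

```python
def mean_normalize(frames):
    """Subtract the per-dimension mean computed over the whole utterance."""
    n = len(frames)
    dim = len(frames[0])
    means = [sum(f[d] for f in frames) / n for d in range(dim)]
    return [[f[d] - means[d] for d in range(dim)] for f in frames]


def time_derivative(frames):
    """Central difference (x[t+1] - x[t-1]) / 2, repeating the edge frames.

    Assumed formula for illustration; real systems often use a longer
    regression window.
    """
    last = len(frames) - 1
    out = []
    for t in range(len(frames)):
        prev = frames[max(t - 1, 0)]
        nxt = frames[min(t + 1, last)]
        out.append([(b - a) / 2.0 for a, b in zip(prev, nxt)])
    return out


def add_derivatives(frames):
    """Append first- and second-order derivatives to each static vector."""
    delta = time_derivative(frames)
    delta2 = time_derivative(delta)
    return [s + d + dd for s, d, dd in zip(frames, delta, delta2)]


# Toy utterance: three frames of two static coefficients each.
static = [[1.0, 2.0], [3.0, 6.0], [5.0, 10.0]]
features = add_derivatives(mean_normalize(static))
```

Each output vector is three times as long as its static input. Under the common convention of 13 static coefficients (an assumption, not stated in this section), the same stacking would give 39 dimensions.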
The example script below extracts 39-dimensional mel cepstral features (including first- and second-order derivatives and sentence-based mean normalization), a common baseline for comparative experiments.