SPRAAK
Feature Extraction

Speech recognizers do not work directly on sampled data. SPRAAK, like the great majority of speech recognition systems, works with frames, i.e. features extracted from the data at regularly spaced time intervals. How SPRAAK maps samples to frames is explained in Frames and Times.
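
To make the notion of a frame concrete, the following Python sketch cuts a sampled signal into overlapping frames. It is a generic illustration, not SPRAAK code: the function name split_into_frames and the rule used for the number of frames are assumptions, and the exact mapping SPRAAK applies is the one described in Frames and Times. The default 10 msec shift and 25 msec length are the values used in the example script further down.

```python
import numpy as np

def split_into_frames(samples, sample_rate, frame_shift=0.010, frame_length=0.025):
    """Cut a 1-D signal into overlapping frames (generic sketch, not SPRAAK's exact mapping)."""
    shift = int(round(frame_shift * sample_rate))    # samples between two frame starts
    length = int(round(frame_length * sample_rate))  # samples per frame
    samples = np.asarray(samples, dtype=float)
    if len(samples) < length:
        return np.empty((0, length), dtype=float)
    # Assumption: only frames that fit completely inside the signal are kept.
    n_frames = 1 + (len(samples) - length) // shift
    # Row i holds samples[i*shift : i*shift + length].
    index = np.arange(length)[None, :] + shift * np.arange(n_frames)[:, None]
    return samples[index]

signal = np.arange(16000)                  # one second of dummy data at 16 kHz
frames = split_into_frames(signal, 16000)
print(frames.shape)                        # (98, 400)
```

With a 16 kHz signal each frame holds 400 samples and consecutive frames start 160 samples apart, so neighbouring frames overlap by 240 samples.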

SPRAAK has a wealth of signal processing routines and options built in. The algorithm to be executed is described in a scripting language in a signal processing configuration file. Processing may be run 'in-line', i.e. it is done as soon as input becomes available and the computed features are streamed automatically to further processing steps; this works both during training and during recognition. This is the standard setup for most types of signal processing, which require only minimal amounts of computing time on today's systems. Processing may also be done 'off-line', in which case the signal processing output is written to file. The latter may be the preferred modus operandi for complex signal processing algorithms.

A list of all signal processing modules is found in Preprocessing modules. The implemented modules include: FFT, cepstra, mel-cepstra, mean normalization, histogram equalization, vocal tract length normalization, begin-endpoint detection, time derivatives, ... Furthermore, hooks are provided to plug in MATLAB code.

The example script below extracts 39-dimensional mel cepstral features (13 cepstral coefficients including cep0, plus their first and second order derivatives, with sentence-based mean normalization), a common baseline for comparative experiments.

% mfcc39.ssp
% Feature Extraction Script
% MEL FREQUENCY CEPSTRAL COEFFICIENTS, incl. first and second derivatives
%
% Parameter Settings:
% - frame shift: 10 msec
% - frame length: 25 msec
% - preemphasis: 0.95
% - Hamming window
% - mean normalization ON (file based)
% - triangular shaped filterbank using the full spectrum (24 channels for 16 kHz)
% - cep0 is included as 0th order coefficient
%
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
% setting frame rate/shift
.fshift 0.01
.convert to FLOAT
% MEL-spectrum
[sam2trk]
flength 0.025
preemp 0.95
[scale]
window HAMMING norm
[anafft]
type POWER
[filter_bank]
scale MELDM
bank TRI
output dB
% mean normalization
[meannorm]
alpha 0.0
nfr -1
% conversion to cepstrum
[idct]
nparam 12
% adding first and second derivatives
.var Vlen
.set Vlen = vlen
[deriv]
COPY begin 0:Vlen
PROCESS 0:Vlen
order first robust
[deriv]
COPY begin 0:2*Vlen
PROCESS Vlen:Vlen
order first simple
%
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
% the following scaling is optional; it provides compatibility with HTK cepstral coefficients
% in order to perform this scaling uncomment the following two lines
% [scale]
% coeff 0.072596 0.22978 0.4135 0.58653 0.62205 0.92992 1.0196 1.1581 1.2839 1.5252 1.614 1.6749 1.828 0.031819 0.10027 0.17512 0.25378 0.24688 0.35509 0.38311 0.42749 0.46578 0.53164 0.56466 0.6017 0.63302 0.03095 0.10172 0.16327 0.236 0.23339 0.31393 0.33192 0.36971 0.39564 0.43845 0.4623 0.4908 0.51367
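
The last two stages of the script, mean normalization over the whole file and the stacking of first and second derivatives, can be illustrated with a short Python sketch. It is an approximation under stated assumptions, not SPRAAK's implementation: the helper names are hypothetical, and a plain central difference stands in for the 'robust' and 'simple' derivative variants, whose exact definitions are not given here. The sketch normalizes the cepstra directly, whereas the script applies [meannorm] before [idct]; assuming the cepstral transform is linear, subtracting the mean before or after it gives the same result.

```python
import numpy as np

def mean_normalize(features):
    """Subtract the per-coefficient mean computed over all frames of the utterance."""
    return features - features.mean(axis=0, keepdims=True)

def delta(features):
    """Central-difference time derivative (assumed stand-in for SPRAAK's deriv module)."""
    # Repeat the first and last frame so the output has as many frames as the input.
    padded = np.pad(features, ((1, 1), (0, 0)), mode="edge")
    return 0.5 * (padded[2:] - padded[:-2])

def add_derivatives(static):
    """Stack static features, first derivatives and second derivatives per frame."""
    first = delta(static)
    second = delta(first)   # derivative of the derivative, as in the second [deriv] step
    return np.concatenate([static, first, second], axis=1)

rng = np.random.default_rng(0)
cepstra = rng.normal(size=(98, 13))        # dummy data: 98 frames, cep0..cep12
features = add_derivatives(mean_normalize(cepstra))
print(features.shape)                      # (98, 39)
```

Each frame ends up with 13 static coefficients followed by 13 first and 13 second derivatives, which matches the 39 values in the optional scaling vector above.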