SPRAAK
sspmod_vtl_estim.c File Reference

Estimation of warping factors for vocal tract length (VTL) normalisation.

Functions

void spr_vtl_estim_free (SprSspInfo *Info)
 
int spr_vtl_estim_setup (SprSspInfo *Info, const char **descript, void *aux_info)
 
int spr_vtl_estim_process (SprSspInfo *Info, const void *frame_in, void *frame_out)
 
void spr_vtl_estim_reset (SprSspInfo *Info, SprSspStatus *action)
 

Detailed Description

Estimation of warping factors for vocal tract length (VTL) normalisation.

This module estimates warping factors for VTL normalisation. The estimation is based on the (maximum) likelihood of general speech models (mixtures of Gaussians), one for each of the targeted warping factors.

Multi-speaker audio is supported. Note that the output contains an estimated warping factor for each speaker. This decouples the frames that are very good samples for a speaker (and on which that speaker's warping factor should be estimated) from the frames for which that speaker's estimated warping factor will finally be used. If both decisions coincide, a single warping factor for the so-called current speaker can easily be found using fun_eval.


[vtl_estim]
nfr_init <number>(0)
Number of frames used for the initial warping factor estimation. This value times the number of speakers is the minimal delay introduced by this routine. Set it to -1 if all frames should be used. Default: 0 (typical for weighted warping factor estimation).
max_delay <number>(-1)
Do not delay the stream by more than <number> frames, even if the <nfr_init> requirements are not met.
no_update
Do not update the warping factor estimate after the initial estimation. Early estimates (estimates forced by the <max_delay> constraint) will still be updated. By default, estimates are updated.
weight <number> [partial/absolute](absolute)
The resulting warping factor depends on the total likelihoods of the models that correspond to the different target warping factors. Use -1 as <number> to select the target warping factor with the maximum likelihood. For positive values of <number>, a root of the likelihoods is taken first: the <number>-th root for <absolute> (the default; this counteracts the frame dependency effect), or the nfr/<number>-th root for <partial> (with nfr the number of frames on which the likelihood is based). Next, these values are normalized and the resulting warping factor is found as the weighted sum of the different target warping factors. By default, a weighting with the 10th root is used.
no_reset
Do not reset the warping factor estimation at the beginning of a new file.
history <hist_len>
Limit the history length to <hist_len> (by default, the history is unlimited).
wf <warp_fac1> ...
Target warping factors (order as in the model file).
ngss <nr_gss1> ...
Number of Gaussians in the mixture of the model for each warping factor. By default, every model is assumed to have the same mixture size.
models <gss_file>
File with the models (mixtures of Gaussians) for all target warping factors.
spdet_in <copy/move> <buf_name>
Input from a silence/speech detector: the warping factor estimate is only updated for speech frames.
multi_spkr <N> <copy/move> <buf_name>
The named buffer <buf_name> contains a number in the range 0...<N>-1 that defines the current speaker.

Author
Jacques Duchateau
Date
07 Oct 2004