SPRAAK
Estimation of warping factors for vocal tract length (VTL) normalisation.
Functions | |
---|---|
void | spr_vtl_estim_free (SprSspInfo *Info) |
int | spr_vtl_estim_setup (SprSspInfo *Info, const char **descript, void *aux_info) |
int | spr_vtl_estim_process (SprSspInfo *Info, const void *frame_in, void *frame_out) |
void | spr_vtl_estim_reset (SprSspInfo *Info, SprSspStatus *action) |
This module estimates warping factors for VTL normalisation. The estimation is based on the (maximum) likelihood of general speech models (mixtures of Gaussians), one for each of the target warping factors.
Multi-speaker audio is supported. Note that the output contains an estimated warping factor for each speaker. This decouples the frames that are very good samples for a speaker (and on which the warping factor estimate should be based) from the frames for which that speaker's estimated warping factor will finally be used. If both decisions coincide, a single warping factor for the so-called current speaker can easily be found using fun_eval.
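The per-speaker bookkeeping described above can be sketched as follows. This is only an illustration under stated assumptions, not the SPRAAK implementation: the type `VtlAccu`, the functions `vtl_accu_init`, `vtl_accu_add` and `vtl_accu_ml`, and the fixed array bounds are all hypothetical. The per-frame log-likelihoods are assumed to be computed elsewhere, and returning 1.0 for a speaker without data is likewise an assumption.

```c
#include <assert.h>

#define MAX_SPKR 4   /* hypothetical upper bounds for this sketch */
#define MAX_WARP 8

/* Hypothetical accumulator, not SPRAAK's internal data structure. */
typedef struct {
    int    n_spkr;                      /* number of speakers */
    int    n_warp;                      /* number of target warping factors */
    double wf[MAX_WARP];                /* target warping factors */
    double loglik[MAX_SPKR][MAX_WARP];  /* summed log-likelihood per speaker and model */
    long   nfr[MAX_SPKR];               /* number of frames accumulated per speaker */
} VtlAccu;

/* Reset all sums and store the target warping factors. */
void vtl_accu_init(VtlAccu *a, int n_spkr, int n_warp, const double *wf)
{
    assert(n_spkr > 0 && n_spkr <= MAX_SPKR);
    assert(n_warp > 0 && n_warp <= MAX_WARP);
    a->n_spkr = n_spkr;
    a->n_warp = n_warp;
    for (int s = 0; s < n_spkr; s++) {
        a->nfr[s] = 0;
        for (int i = 0; i < n_warp; i++)
            a->loglik[s][i] = 0.0;
    }
    for (int i = 0; i < n_warp; i++)
        a->wf[i] = wf[i];
}

/* Add one frame for speaker spkr; frame_ll[i] is the log-likelihood of the
 * frame under the model that belongs to target warping factor i. */
void vtl_accu_add(VtlAccu *a, int spkr, const double *frame_ll)
{
    assert(spkr >= 0 && spkr < a->n_spkr);
    for (int i = 0; i < a->n_warp; i++)
        a->loglik[spkr][i] += frame_ll[i];
    a->nfr[spkr]++;
}

/* Maximum likelihood choice for one speaker: the target warping factor whose
 * model has the highest summed log-likelihood. */
double vtl_accu_ml(const VtlAccu *a, int spkr)
{
    assert(spkr >= 0 && spkr < a->n_spkr);
    if (a->nfr[spkr] == 0)
        return 1.0;  /* assumption: no warping until data is available */
    int best = 0;
    for (int i = 1; i < a->n_warp; i++)
        if (a->loglik[spkr][i] > a->loglik[spkr][best])
            best = i;
    return a->wf[best];
}
```

Keeping one row of sums per speaker means a frame only influences the estimate of the speaker it is attributed to, while the estimates for all speakers remain available at every frame.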
[vtl_estim] | |
---|---|
nfr_init <number>(0) | |
Number of frames used for the initial warping factor estimation. This value times the number of speakers is the minimum delay introduced by this routine. Set it to -1 if all frames should be used. Default: 0 (typical for weighted warping factor estimation). | |
max_delay <number>(-1) | |
Do not delay the stream by more than <number> frames, even if the <nfr_init> requirements are not met. | |
no_update | |
Do not update the warping factor estimate after the initial estimation. Early estimates (estimates forced by the <max_delay> constraint) will still be updated. By default, estimates are updated. | |
weight <number> [partial/absolute](absolute) | |
The resulting warping factor depends on the total likelihoods of the models that correspond to the different target warping factors. Use -1 as <number> for maximum likelihood based selection of the target warping factor. For positive values of <number>, a root of the likelihoods is taken first: the <number>-th root for <absolute> (the default; this counteracts the frame dependency effect), or the (nfr/<number>)-th root for <partial> (with nfr the number of frames on which the likelihood is based). These values are then normalised, and the resulting warping factor is found as the weighted sum of the different target warping factors. By default, weighting with the 10th root is used. | |
no_reset | |
Do not reset the warping factor estimation at the beginning of a new file. | |
history <hist_len> | |
Limit the history length to <hist_len> (by default the history is unlimited). | |
wf <warp_fac1> ... | |
Target warping factors (order as in the model file). | |
ngss <nr_gss1> ... | |
Number of Gaussians in the mixture of the model for each warping factor. By default, every model is assumed to have the same mixture size. | |
models <gss_file> | |
File with the models (mixtures of Gaussians) for all target warping factors. | |
spdet_in <copy/move> <buf_name> | |
Input from a silence/speech detector: the warping factor estimate is only updated for speech frames. | |
multi_spkr <N> <copy/move> <buf_name> | |
The named buffer <buf_name> contains a number in the range 0...<N>-1 that defines the current speaker. |