Impute a spectrum given a decomposed spectrum.

Functions
int	spr_vqmask_process (SprSspInfo *Info, const void *frame_in, void *frame_out)

void	spr_vqmask_free (SprSspInfo *Info)

int	spr_vqmask_setup (SprSspInfo *Info, const char *descript, void *aux_info)

Detailed Description

Impute a MEL-spectrum given:

- the noisy speech spectrum decomposed into a $y^{(v)}$ and a $y^{(r)}$ component ($v$ = voiced, $r$ = residual in the harmonic decomposition; other decompositions should work as well)
- an estimate of the mean $\mu^{(r)}_n$ and variance $\sigma^{(r)}_n$ for the $r$ component of the noise
- the Gaussian distribution ${\cal N}(n^{(v)}-n^{(r)};\mu^{(v-r)}_n,\sigma^{(v-r)}_n)$ that links $n^{(v)}$ to $n^{(r)}$
- a speech model, i.e. $i=1\ldots N$ (diagonal) Gaussian distributions ${\cal N}_i(x^{(v)},x^{(r)};\mu^{(v)}_x,\mu^{(r)}_x,\sigma^{(v)}_x,\sigma^{(r)}_x)$ modelling the decomposed clean speech

Right now all variances (sigma) are pooled and set to:

- 1.0 for the speech Gaussians
- $1/\sqrt{\lambda}$ for ${\cal N}(n^{(v)}-n^{(r)})$
- $1/\sqrt{\gamma}$ for the $r$ component of the noise

Also, the speech Gaussians are assumed to have uniform prior probabilities.

The imputation estimates the underlying noise $\hat{n}^{(v)},\hat{n}^{(r)}$ for each spectral component and each clean speech codebook entry ${\cal N}_i(\cdot)$ independently, given the decomposed noisy speech spectrum $y^{(v)}$, $y^{(r)}$ and the estimate of the noise mean $\mu^{(r)}_n$, using the following equation:

$\hat{n}^{(v)},\hat{n}^{(r)} = \mbox{arg}\min_{n^{(v)},n^{(r)}} (y^{(v)}-\max(\mu^{(v)}_x,n^{(v)}))^2 + (y^{(r)}-\max(\mu^{(r)}_x,n^{(r)}))^2 + \gamma (n^{(r)}-\mu^{(r)}_n)^2 + \lambda (n^{(v)}-n^{(r)} - \mu^{(v-r)}_n)^2$

The imputed noise and speech spectra are $\max(\hat{n}^{(r)},\hat{n}^{(v)})$ and $\max(\mu^{(v)}_x,\mu^{(r)}_x)$, respectively.

Input:
1. $y^{(v)}$
2. $y^{(r)}$
3. $\mu^{(r)}_n$
Output:
1. $\hat{n}^{(v)}$
2. $\hat{n}^{(r)}$
3. $\mu^{(v)}_x$ of the best fitting codebook
4. $\mu^{(r)}_x$ of the best fitting codebook
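
The criterion for the "best fitting codebook" is not spelled out here. One plausible reading, given the uniform priors and pooled variances, is the entry whose objective value summed over all spectral components is lowest. The sketch below implements that reading. The function name `vq_best_entry` and the row-major cost layout are assumptions for this example.

```c
#include <assert.h>
#include <stddef.h>

/* cost[i * n_bins + k] holds the optimised objective value of codebook
 * entry i for spectral component k (hypothetical layout).
 * Returns the index of the entry with the lowest summed cost;
 * on ties the lowest index wins. */
size_t vq_best_entry(const double *cost, size_t n_entries, size_t n_bins)
{
    size_t best = 0;
    double best_total = 0.0;
    assert(n_entries > 0 && n_bins > 0);
    for (size_t i = 0; i < n_entries; i++) {
        double total = 0.0;
        for (size_t k = 0; k < n_bins; k++)
            total += cost[i * n_bins + k];
        if (i == 0 || total < best_total) {
            best = i;
            best_total = total;
        }
    }
    return best;
}
```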

[vqmask]
`codebook <filename>`
Codebook filename (a .mvg file).
`Nsil <Nsil>(0) [SpchExclSil/SpchInclSil](SpchInclSil)`
Use the first <Nsil> entries for silence frames and the remaining entries for speech frames. Use either all the codebook entries (SpchInclSil; advised when the VAD has 'hangover', i.e. bridges short silence gaps) or only the non-silence ones (SpchExclSil; may work better if the VAD is excellent) for non-silence frames.
`gamma <val>(1.0)`
Weight of the noise mean Gaussian (1/sqrt(var)).
`lambda <val>(1.0)`
Weight of the noise linking Gaussian (1/sqrt(var)).
`mu_noise [val](0.0) ...`
The means $\mu^{(v-r)}_n$ for the Gaussian linking $n^{(v)}$ with $n^{(r)}$.
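
The effect of the `Nsil` option on which codebook entries are searched can be sketched as follows. This is an illustration of the rule described above, not code from the module: the enum, the function name `vq_entry_range` and the half-open range convention are assumptions.

```c
#include <assert.h>

/* Hypothetical mirror of the SpchInclSil / SpchExclSil keywords. */
typedef enum { SPCH_INCL_SIL, SPCH_EXCL_SIL } VqSpeechMode;

/* Select the half-open range [*first, *end) of codebook entries to
 * search for one frame, given the VAD decision is_speech. */
void vq_entry_range(int n_entries, int n_sil, VqSpeechMode mode,
                    int is_speech, int *first, int *end)
{
    assert(n_sil >= 0 && n_sil <= n_entries);
    if (!is_speech) {                  /* silence frame: first Nsil entries */
        *first = 0;
        *end = n_sil;
    } else if (mode == SPCH_EXCL_SIL) { /* speech frame, silence excluded   */
        *first = n_sil;
        *end = n_entries;
    } else {                           /* speech frame, all entries         */
        *first = 0;
        *end = n_entries;
    }
}
```

With the default `Nsil` of 0 this sketch yields an empty range for silence frames. How the module itself treats that case is not stated in this description.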

See Also: ssp_master.c

Date: 09/02/2011

Author: Kris Demuynck, Maarten Van Segbroeck
