SPRAAK
The preferred acoustic modeling technique in SPRAAK is semi-continuous HMMs (SCHMMs) with diagonal covariance Gaussians. In this setup each distribution in the acoustic model is computed as a weighted sum of Gaussians drawn from a large pool. It serves as a generic approach that covers untied and fully tied mixtures as the two extremes of its implementation.
In semi-continuous HMMs the model of the observation probabilities is organized in two layers, consisting of:
In semi-continuous HMMs the number of basis functions tends to be very large (several thousand), while only a few of them (typically around 100 in SPRAAK) will have non-zero weights for any individual state.
Assume:
then, the probability of observing in state
is given as:
In the case of diagonal covariance tied Gaussian mixtures (default) with:
the above expands to:
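As an illustration, the weighted sum and its diagonal covariance expansion can be written out as follows. The symbols are chosen for this sketch and are assumptions, not necessarily SPRAAK's own notation: x is a D-dimensional feature vector, s_i a state, G_j the j-th Gaussian in a pool of size M, w_ij the weight of Gaussian j in state i, and mu_jd and sigma_jd the mean and standard deviation of Gaussian j along axis d.

```latex
f(x \mid s_i) = \sum_{j=1}^{M} w_{ij}\, G_j(x)

G_j(x) = \prod_{d=1}^{D} \frac{1}{\sqrt{2\pi}\,\sigma_{jd}}
         \exp\!\left( -\frac{(x_d - \mu_{jd})^2}{2\sigma_{jd}^2} \right)

f(x \mid s_i) = \sum_{j=1}^{M} w_{ij} \prod_{d=1}^{D}
         \frac{1}{\sqrt{2\pi}\,\sigma_{jd}}
         \exp\!\left( -\frac{(x_d - \mu_{jd})^2}{2\sigma_{jd}^2} \right)
```

In a semi-continuous system most w_ij are zero, so the sum over j effectively runs over only the few Gaussians attached to state i.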
The same concept readily accommodates other commonly used variants of tying:
While SPRAAK functions correctly for all types of tying, a number of its computational optimizations are designed from the viewpoint of a semi-continuous system.
In semi-continuous modeling the weight matrix linking basis functions to states tends to be very sparse (i.e. most weights are '0'). SPRAAK accommodates this with a flexible sparse data structure, putting the information in 3 separate files (typically with the same basename, e.g. 'acmod'):
The files used for storing HMM parameters are described in more detail in HMM Files.
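The sparse link between states and the shared Gaussian pool can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not SPRAAK's file format or internal layout: the names diag_gauss_logpdf, state_likelihood, pool and state_weights are hypothetical, and the toy model is invented for the example.

```python
import math

def diag_gauss_logpdf(x, mean, std):
    """Natural-log density of a diagonal covariance Gaussian at x."""
    return sum(
        -0.5 * math.log(2.0 * math.pi) - math.log(s) - 0.5 * ((xi - m) / s) ** 2
        for xi, m, s in zip(x, mean, std)
    )

def state_likelihood(x, pool, sparse_weights):
    """Weighted sum over only the Gaussians with non-zero weight for one state.

    pool: list of (mean, std) tuples shared by all states.
    sparse_weights: list of (gaussian_index, weight) pairs for this state.
    """
    return sum(
        w * math.exp(diag_gauss_logpdf(x, *pool[j])) for j, w in sparse_weights
    )

# Hypothetical toy model: a pool of 4 one-dimensional Gaussians and 2 states.
pool = [([0.0], [1.0]), ([2.0], [1.0]), ([5.0], [0.5]), ([-3.0], [2.0])]
state_weights = {
    0: [(0, 0.7), (1, 0.3)],  # state 0 uses Gaussians 0 and 1 only
    1: [(2, 1.0)],            # state 1 uses Gaussian 2 only
}
likelihoods = {
    s: state_likelihood([0.0], pool, w) for s, w in state_weights.items()
}
```

Storing only (index, weight) pairs per state keeps the cost proportional to the number of non-zero weights rather than to the size of the pool.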
The acoustic model returns NORMALIZED scores for a segment of data. By default the Viterbi algorithm is used, but other computational approaches and acoustic models other than the standard SCHMM are possible.
'NORMALIZED' implies that the sum of scores over all possible states for any given time instant will be equal to 1.0. In practice log10-probabilities are returned. Thus:
where: = the prior probability for state j
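One way to obtain scores with this sum-to-one property is sketched below in Python. It assumes that each score is the prior-weighted likelihood of a state divided by the sum of prior-weighted likelihoods over all states. That is one reading consistent with the statement above, not necessarily SPRAAK's exact formula. The function name and the toy numbers are hypothetical.

```python
import math

def normalized_log10_scores(likelihoods, priors):
    """Return log10 scores whose linear-domain values sum to 1 over all states.

    Assumption: score_i = log10(f_i * p_i / sum_j(f_j * p_j)), with f_i the
    likelihood of the frame in state i and p_i the prior of state i.
    All likelihoods and priors are expected to be strictly positive.
    """
    weighted = [f * p for f, p in zip(likelihoods, priors)]
    total = sum(weighted)
    return [math.log10(v / total) for v in weighted]

# Toy frame with three states (invented values).
scores = normalized_log10_scores([0.30, 0.05, 0.15], [0.5, 0.25, 0.25])
```

Converting the returned values back with 10**score and summing them gives 1.0 up to rounding, which is the property the text describes.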
The motivation for using these normalized scores is:
All probabilities in SPRAAK that are stored in files are by default given as log10() values.
SPRAAK uses an efficient bottom-up scheme to predict which Gaussians in the pool need to be evaluated and which do not. This is done for the whole pool of Gaussians at once and not on a state-by-state basis. The data structure describing, for each axis, which regions are relevant for which Gaussians is computed once when the acoustic model is loaded.
More information on setting parameters for the FROG system may be found in rm_gauss.c
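The idea of a per-axis relevance table built once at load time can be illustrated with a small Python sketch. It is not SPRAAK's actual selection code: the region boundaries (edges), the cut-off of width standard deviations, the bitmask representation and both function names are assumptions made for the example.

```python
import bisect

def build_axis_tables(pool, edges, width=3.0):
    """For each axis and each region, precompute a bitmask of relevant Gaussians.

    pool: list of (mean, std) tuples; edges: sorted region boundaries shared
    by all axes (hypothetical). A Gaussian is marked relevant for a region when
    the region overlaps [mean - width*std, mean + width*std] on that axis.
    """
    n_axes = len(pool[0][0])
    bounds = [float("-inf")] + list(edges) + [float("inf")]
    tables = []
    for d in range(n_axes):
        regions = []
        for lo, hi in zip(bounds[:-1], bounds[1:]):  # region is [lo, hi)
            mask = 0
            for j, (mean, std) in enumerate(pool):
                if mean[d] - width * std[d] < hi and mean[d] + width * std[d] >= lo:
                    mask |= 1 << j
            regions.append(mask)
        tables.append(regions)
    return tables

def select_gaussians(x, tables, edges):
    """Intersect the per-axis bitmasks of the regions that x falls into."""
    mask = -1  # all bits set before the first axis is applied
    for d, xd in enumerate(x):
        mask &= tables[d][bisect.bisect_right(edges, xd)]
    return [j for j in range(mask.bit_length()) if (mask >> j) & 1]

# Toy pool of three 2-dimensional Gaussians and a single boundary per axis.
pool = [([0.0, 0.0], [1.0, 1.0]), ([10.0, 0.0], [1.0, 1.0]), ([0.0, 10.0], [1.0, 1.0])]
edges = [5.0]
tables = build_axis_tables(pool, edges)      # computed once at load time
near_origin = select_gaussians([0.5, -1.0], tables, edges)
```

The tables are built once. Per frame, the work reduces to one lookup per axis and a bitwise AND over the whole pool, independent of how many states share each Gaussian.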