SPRAAK
Increase the level of tying in a reduced SC_HMM.
spr_add_gauss <-i hmm_file> [-gi mvg_file] [-seli sel_file] [-u unit_fname] [-am_opt acmod_opt] [-o hmm_file] [-go hmm_file] [-selo sel_file] [-nu unit_fname] [-method method for add_gauss](dist) [-nsel NrGaussPerState] [-w WeightFactor](0.5) [-vm Value](1e-2) [-cm Value](9.0) [-ggi GaussianTying]
Increases the level of tying in a reduced SC_HMM. The idea is to build a fully tied SC_HMM but, because the fully tied HMM can be prohibitively large, to immediately select a fixed number Nsel (option -nsel) of gaussians per state. Note that the fully tied SC_HMM is never stored in memory; calculations are done state by state.
Two methods are supported (see the option -method) to select 'suitable' gaussians for a given tied state.
The first method (-method dist) ranks all gaussians according to their distance (see PhD Jacques Duchateau, page 103) to the input state, and the Nsel closest ones are selected.
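The ranking step of the dist method can be sketched as follows. This is only an illustration, not the SPRAAK implementation: the function name select_closest is hypothetical, and the distances are assumed to be precomputed, since the distance measure itself is defined in the cited thesis and is not reproduced here.

```python
def select_closest(distances, nsel):
    """Return the indices of the nsel gaussians closest to the input state.

    distances[j] is the (assumed precomputed) distance of gaussian j to the
    input state; smaller means closer.
    """
    if nsel <= 0:
        return []
    # sorted() is stable, so gaussians at equal distance keep index order.
    ranked = sorted(range(len(distances)), key=lambda j: distances[j])
    # Slicing copes with nsel larger than the number of gaussians.
    return ranked[:nsel]
```

For example, select_closest([0.5, 0.1, 0.9, 0.1], 2) keeps the gaussians with indices 1 and 3.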
The second method (-method em) performs a 'simulated' EM-pass and then selects the Nsel gaussians with the highest mixture weights. The 'simulated' EM-pass first makes a fully tied gaussian mixture by distributing the probability fraction WGT (given by option -w) equally over all gaussians and adding the remaining probability (1-WGT) to the original gaussians in the input state in proportion to their weight in the input state. Next, a single 'simulated' EM-pass is performed. The term 'simulated' reflects the fact that no data points are used; instead, the gaussians in the original mixture are used as data points. The likelihood of a gaussian (instead of a point) is calculated with the method presented in the PhD of Kris Demuynck, page 101. The probability distribution obtained per gaussian is then multiplied by the a priori probabilities (the fully tied mixture weights), normalized to 1.0, and weighted with the original weight of the gaussian. Finally, the probability distributions for all gaussians in the given state are accumulated to obtain the mixture weights used for ranking the gaussians.
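The bookkeeping of the 'simulated' EM-pass can be sketched as below. This is a minimal illustration under stated assumptions, not the actual SPRAAK code: the function name and argument layout are hypothetical, and the log-likelihood of one gaussian under another (loglik) is taken as a precomputed input because the method from the cited thesis is not reproduced here.

```python
import math

def simulated_em_weights(n_gauss, orig_weights, loglik, wgt=0.5):
    """Accumulated mixture weights after one 'simulated' EM-pass.

    n_gauss      : total number of gaussians in the fully tied mixture
    orig_weights : {gaussian index: weight in the input state}, summing to 1
    loglik       : {i: [log-likelihood of original gaussian i under gaussian j
                        for j in range(n_gauss)]}  (assumed precomputed)
    wgt          : the fraction WGT given by option -w
    """
    # Fully tied prior: WGT spread equally, (1-WGT) over the original gaussians.
    prior = [wgt / n_gauss] * n_gauss
    for i, w in orig_weights.items():
        prior[i] += (1.0 - wgt) * w

    acc = [0.0] * n_gauss
    for i, w in orig_weights.items():
        row = loglik[i]
        top = max(row)  # subtract the maximum for numerical stability
        post = [math.exp(ll - top) * p for ll, p in zip(row, prior)]
        total = sum(post)
        for j in range(n_gauss):
            # Normalize to 1.0, weight with the original weight, accumulate.
            acc[j] += w * post[j] / total
    return acc
```

The Nsel gaussians with the largest entries in the returned list would then be the ones selected for the state.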
By default, this program pursues sharing between all Gaussians. Restricted sharing can be requested by specifying a "Gaussian Grouping Info" (GGI) file. The GGI-file has the following format:
    .spr
    DATA DES
    DIM1 <Nlines>
    #
    [group] <el> ...
    <phon>#<istate> <share_with> ...
    ( <phon>#<istate> ... ) <share_with> ...
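After the standard header, the body of the file contains a mix of three types of lines: group definitions ('[group] <el> ...'), sharing specifications for a single state ('<phon>#<istate> <share_with> ...'), and sharing specifications for a parenthesized list of states ('( <phon>#<istate> ... ) <share_with> ...').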
States are referred to with a <phon>#<istate> denomination (cf. segmentation files). The name of a group must start with '[' and end with ']'. The elements in the definition of a group (<el> ...) or of a Gaussian grouping (<share_with> ...) can refer to states or to previously defined groups. All group names must be unique. If the model uses tied states, the group sharing for such a state is the union of all group sharings that refer to that state. Groups can only be used on the right-hand side (the <share_with> part).
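The rules above can be illustrated with a small parser for the body of a GGI-file. This is a sketch under several assumptions that the text does not confirm: tokens are whitespace-separated, parentheses appear as separate tokens, the header (up to the '#' line) has already been stripped, and repeated entries for the same state name are merged by union. The function name and the state names in the usage note are hypothetical.

```python
def parse_ggi_body(lines):
    """Map each state to the set of states it may share gaussians with."""
    groups = {}    # "[name]" -> set of states
    sharing = {}   # "<phon>#<istate>" -> set of states

    def expand(tokens):
        # Replace group names by their members; plain tokens are states.
        states = set()
        for tok in tokens:
            if tok.startswith("[") and tok.endswith("]"):
                if tok not in groups:
                    raise ValueError("group used before definition: " + tok)
                states |= groups[tok]
            else:
                states.add(tok)
        return states

    for line in lines:
        tok = line.split()
        if not tok:
            continue
        if tok[0].startswith("[") and tok[0].endswith("]"):
            # Group definition: [group] <el> ...
            if tok[0] in groups:
                raise ValueError("duplicate group name: " + tok[0])
            groups[tok[0]] = expand(tok[1:])
            continue
        if tok[0] == "(":
            # List of states: ( <phon>#<istate> ... ) <share_with> ...
            close = tok.index(")")
            lhs, rhs = tok[1:close], tok[close + 1:]
        else:
            # Single state: <phon>#<istate> <share_with> ...
            lhs, rhs = tok[:1], tok[1:]
        targets = expand(rhs)
        for state in lhs:
            sharing.setdefault(state, set()).update(targets)
    return sharing
```

With the hypothetical body lines "[vowels] a#1 e#1", "a#1 [vowels] o#1" and "( e#1 o#1 ) a#1", state a#1 would be allowed to share with a#1, e#1 and o#1, while e#1 and o#1 would each share only with a#1.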
Finally, the weights for the selected gaussians in the output state are calculated: the weight WGT (given by option -w) is distributed equally over all selected gaussians (method dist) or over all gaussians (method em, which provides consistency with the fully tied intermediate model), and the remaining weight (1-WGT) is added to the original gaussians in the input state in proportion to their weight in the input state. The weights are then normalized to obtain a total weight of 1.0.
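The final weight calculation can be sketched as follows. This is one reading of the description above, not the SPRAAK source: the function name and the n_total argument are hypothetical, with n_total left at None for method dist (spread WGT over the selected gaussians) and set to the total number of gaussians for method em.

```python
def output_weights(selected, orig_weights, wgt=0.5, n_total=None):
    """Weights of the selected gaussians in the output state.

    selected     : indices of the gaussians kept for this state
    orig_weights : {gaussian index: weight in the input state}
    wgt          : the fraction WGT given by option -w
    n_total      : None  -> spread WGT over the selected gaussians (dist)
                   int   -> spread WGT over all n_total gaussians (em)
    """
    n_spread = len(selected) if n_total is None else n_total
    raw = {}
    for j in selected:
        # Equal share of WGT plus the gaussian's share of (1-WGT), if any.
        raw[j] = wgt / n_spread + (1.0 - wgt) * orig_weights.get(j, 0.0)
    total = sum(raw.values())
    # Normalize so that the weights of the output state sum to 1.0.
    return {j: w / total for j, w in raw.items()}
```

With four selected gaussians, original weights 0.75 and 0.25 on the first two, and WGT = 0.5, the dist variant gives weights 0.5, 0.25, 0.125 and 0.125.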