SPRAAK
spr_add_gauss.c File Reference

Increase the level of tying in a reduced SC_HMM.

Detailed Description

Increase the level of tying in a reduced SC_HMM.

spr_add_gauss <-i hmm_file> [-gi mvg_file] [-seli sel_file] [-u unit_fname] [-am_opt acmod_opt]
    [-o hmm_file] [-go hmm_file] [-selo sel_file] [-nu unit_fname]
    [-method method for add_gauss](dist) [-nsel NrGaussPerState] [-w WeightFactor](0.5)
    [-vm Value](1e-2) [-cm Value](9.0) [-ggi GaussianTying]
Parameters
-i hmm_file
HMM-file to read.
-gi mvg_file
MVG-file to read. If specified, this MVG file is read instead of the MVG file given in the header of the input HMM.
-seli sel_file
SEL-file to read. If specified, this SELECT file is read instead of the SELECT file given in the header of the input HMM.
-u unit_fname
UNIT-file to read. If specified, this unit file is used to read the input HMM instead of the file corresponding to the filename in the header of the input HMM.
-am_opt acmod_opt
Extra options for loading the acoustic model.
-o hmm_file
HMM-file to write to. If not specified, the input HMM will be overwritten.
-go hmm_file
MVG-file to write to. By default the input MVG file will be overwritten.
-selo sel_file
SEL-file to write to. By default the input SEL file will be overwritten.
-nu unit_fname
Specify a new unit file to write the units to (or as input for rearranging the existing units).
-method method for add_gauss
Add gaussians according to a distance criterion (dist) or according to a 'simulated' EM-training (em). See the explanation of add_gauss for more details on both methods.
-nsel NrGaussPerState
Indicates the number of gaussians per state for each of the mixtures. If one value is given, it is used for all mixtures. Otherwise a value per mixture must be given. Separate values with a comma.
-w WeightFactor
Weight for the equally distributed probability mass.
-vm Value
Lower limit on sqrt(variance), relative to the weighted average of the variances.
-cm Value
Lower limit on the count of a Gaussian (number of points assigned to that Gaussian); Gaussians with a count below the limit are excluded from the shared pool of Gaussians.
-ggi GaussianTying
File that specifies which states share Gaussians.
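
The value format of the -nsel option described above (one value for all mixtures, or one comma-separated value per mixture) can be illustrated with a small Python sketch. The function name `expand_nsel` and the `n_mixtures` argument are hypothetical and not part of SPRAAK; the sketch only mirrors the rule stated in the option description.

```python
from __future__ import annotations


def expand_nsel(spec: str, n_mixtures: int) -> list[int]:
    """Expand an -nsel style value into one count per mixture.

    Hypothetical helper: a single value is reused for every mixture,
    otherwise exactly one value per mixture is required.
    """
    values = [int(v) for v in spec.split(",")]
    if len(values) == 1:
        return values * n_mixtures
    if len(values) != n_mixtures:
        raise ValueError(
            f"expected 1 or {n_mixtures} values, got {len(values)}"
        )
    return values
```

With three mixtures, `expand_nsel("8", 3)` and `expand_nsel("8,4,2", 3)` are both accepted, while `expand_nsel("8,4", 3)` raises an error.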

Increases the level of tying in a reduced SC_HMM. The idea is to make a fully tied SC_HMM, but to immediately select a fixed number Nsel (option -nsel) of gaussians per state, as the fully tied HMM can be prohibitively large. Note that this fully tied SC_HMM is at no point stored in memory; calculations are done state by state.

Two methods are supported (see the option -method) to select 'suitable' gaussians for a given tied state.

The first method (-method dist) ranks all gaussians according to their distance (see PhD Jacques Duchateau, page 103) to the input state, and the Nsel closest ones are selected.
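
The ranking step of the dist method can be sketched as follows. This is a minimal illustration, not the SPRAAK implementation: the actual distance measure is the one defined in the cited thesis and is not reproduced here, so the sketch takes it as a caller-supplied function (`distance_to_state`, a hypothetical name).

```python
from __future__ import annotations

from typing import Callable


def select_closest(
    n_gauss: int,
    distance_to_state: Callable[[int], float],
    nsel: int,
) -> list[int]:
    """Return the indices of the nsel gaussians closest to the input state.

    distance_to_state(j) is a placeholder for the real gaussian-to-state
    distance; ties keep the lower index first because sorted() is stable.
    """
    ranked = sorted(range(n_gauss), key=distance_to_state)
    return ranked[:nsel]


# Toy distances for four gaussians (illustrative values only).
toy_distances = [0.9, 0.1, 0.5, 0.3]
closest = select_closest(4, toy_distances.__getitem__, 2)
```

Because the selection is done per state, only the distances of one state need to be held at a time, which matches the state-by-state processing described earlier.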

The second method (-method em) performs a 'simulated' EM-pass and then selects the Nsel gaussians with the highest mixture weights. The 'simulated' EM-pass first makes a fully tied gaussian mixture by distributing the probability fraction WGT (given by option -w) equally over all gaussians and adding the remaining probability (1-WGT) to the original gaussians in the input state according to their weight in the input state. Next, a single 'simulated' EM-pass is performed. The term 'simulated' reflects the fact that no data points are used; instead, the gaussians in the original mixture serve as data points. The likelihood of a gaussian (instead of a point) is calculated with the method presented in the PhD of Kris Demuynck, page 101. Next, the probability distribution obtained per gaussian is multiplied by the a priori probabilities (the fully tied mixture weights), normalized to 1.0, and weighted with the original weight of the gaussian. Finally, the probability distributions for all gaussians in the given state are accumulated to obtain the mixture weights for ranking the gaussians.
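
The steps of the 'simulated' EM-pass can be sketched in Python as shown below. This is an illustration under stated assumptions, not the SPRAAK code: the likelihood of one gaussian given another comes from the cited thesis and is not reproduced, so it is passed in as a hypothetical callable `likelihood(i, j)`; the function and argument names are likewise invented for the example.

```python
from __future__ import annotations

from typing import Callable


def simulated_em_select(
    state_weights: dict[int, float],
    n_gauss: int,
    wgt: float,
    likelihood: Callable[[int, int], float],
    nsel: int,
) -> tuple[list[int], list[float]]:
    """One simulated EM-pass for a single state, then pick the top nsel.

    state_weights maps the gaussians of the input state to their weights.
    likelihood(i, j) is a placeholder for the likelihood of original
    gaussian i under gaussian j.
    """
    # Fully tied prior: WGT spread equally, (1-WGT) on the original gaussians.
    prior = [
        wgt / n_gauss + (1.0 - wgt) * state_weights.get(j, 0.0)
        for j in range(n_gauss)
    ]
    new_w = [0.0] * n_gauss
    for i, w_i in state_weights.items():
        # The original gaussian i plays the role of a data point.
        post = [likelihood(i, j) * prior[j] for j in range(n_gauss)]
        total = sum(post)
        if total <= 0.0:
            continue  # no gaussian explains this one; skip it
        for j in range(n_gauss):
            # Normalize to 1.0, weight by the original weight, accumulate.
            new_w[j] += w_i * post[j] / total
    ranked = sorted(range(n_gauss), key=lambda j: new_w[j], reverse=True)
    return ranked[:nsel], new_w
```

The accumulated weights sum to the total weight of the input state, since each per-gaussian distribution is normalized before being scaled by its original weight.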

By default, this program pursues sharing between all Gaussians. Restricted sharing can be requested by specifying a "Gaussian Grouping Info" (GGI) file. The GGI-file has the following format:

.spr
DATA    DES
DIM1    <Nlines>
#
[group]                 <el> ...
<phon>#<istate>         <share_with> ...
( <phon>#<istate> ... ) <share_with> ...

After the standard header, the body of the file contains a mix of three types of lines:

  1. the definition of a group
  2. the definition of the Gaussian grouping for a state
  3. the definition of the Gaussian grouping for multiple states

States are referred to with a <phon>#<istate> denomination (cf. segmentation files). The name of a group must start with '[' and end with ']'. The elements in the definition of a group (<el> ...) or of a Gaussian grouping (<share_with> ...) can refer to states or previously defined groups. All group names must be unique. If the model uses tied states, the group sharing for such a state will be the union of all group sharing that refer to that state. Groups can only be used in the right-hand side (the <share_with> part).
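
The three line types of the GGI body can be illustrated with a small Python reader. This is a sketch under assumptions, not the parser used by the program: it assumes that all tokens (including the parentheses) are separated by whitespace, that the header ends at the line containing only '#', and that repeated definitions for the same state are merged by union. The phone names in the sample are made up.

```python
from __future__ import annotations


def is_group(token: str) -> bool:
    """A group name starts with '[' and ends with ']'."""
    return token.startswith("[") and token.endswith("]")


def parse_ggi(text: str) -> dict[str, set[str]]:
    """Map each state to the set of states it shares gaussians with."""
    lines = [ln.strip() for ln in text.splitlines()]
    body = lines[lines.index("#") + 1:]  # skip the standard header
    groups: dict[str, set[str]] = {}
    sharing: dict[str, set[str]] = {}

    def expand(tokens: list[str]) -> set[str]:
        # Replace previously defined groups by their member states.
        states: set[str] = set()
        for tok in tokens:
            if is_group(tok):
                states |= groups[tok]  # KeyError if not defined earlier
            else:
                states.add(tok)
        return states

    for line in body:
        tokens = line.split()
        if not tokens:
            continue
        head = tokens[0]
        if is_group(head):  # type 1: group definition
            if head in groups:
                raise ValueError(f"duplicate group name {head}")
            groups[head] = expand(tokens[1:])
            continue
        if head == "(":  # type 3: grouping for multiple states
            close = tokens.index(")")
            targets, rest = tokens[1:close], tokens[close + 1:]
        else:  # type 2: grouping for a single state
            targets, rest = [head], tokens[1:]
        shared = expand(rest)
        for state in targets:
            sharing.setdefault(state, set()).update(shared)
    return sharing


SAMPLE = """.spr
DATA    DES
DIM1    3
#
[vowels] a#1 e#1
s#2 [vowels] z#2
( a#1 e#1 ) [vowels]
"""
sample_sharing = parse_ggi(SAMPLE)
```

In the sample, state s#2 ends up sharing with a#1, e#1 and z#2, while a#1 and e#1 each share with the members of the [vowels] group.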

Finally, the weights for the selected gaussians in the output state are calculated: the weight WGT (given by option -w) is distributed equally over all selected gaussians (method dist) or over all gaussians (method em, which provides consistency with the fully tied intermediate model), and the remaining weight (1-WGT) is added to the original gaussians in the input state according to their weight in the input state. The weights are then normalized to obtain a total weight of 1.0.
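
The output weight computation can be sketched as follows. The names are hypothetical, and one point is an assumption rather than something the text states: an original gaussian that is not among the selected ones simply contributes nothing, and the final normalization restores a total of 1.0.

```python
from __future__ import annotations


def output_weights(
    selected: list[int],
    state_weights: dict[int, float],
    wgt: float,
    n_share: int | None = None,
) -> dict[int, float]:
    """Weights of the selected gaussians in the output state.

    n_share is the number of gaussians the mass WGT is spread over:
    len(selected) for method dist (the default), the total number of
    gaussians for method em.
    """
    if n_share is None:
        n_share = len(selected)
    raw = {
        j: wgt / n_share + (1.0 - wgt) * state_weights.get(j, 0.0)
        for j in selected
    }
    total = sum(raw.values())
    return {j: w / total for j, w in raw.items()}


# Method dist: WGT = 0.5 spread over the two selected gaussians.
w_dist = output_weights([0, 2], {0: 0.6, 1: 0.4}, 0.5)
# Method em: WGT = 0.5 spread over all four gaussians, two of them kept.
w_em = output_weights([0, 1], {0: 0.6, 1: 0.4}, 0.5, n_share=4)
```

In the dist example, gaussian 0 receives 0.25 + 0.3 = 0.55 and gaussian 2 receives 0.25 before normalization, which gives 0.6875 and 0.3125 afterwards.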

Revision History:
05/02, JD
creation
01/04, KD
added new method
05/12, KD
added the option to specify which states share Gaussians