SPRAAK
spr_mdt_make_cgt.c File Reference

Create cluster gaussian to back-end gaussian association table.

Detailed Description


spr_mdt_make_cgt [-SEG](flag: segmentation mode) [-tbl output association table]
    [-stats collected statistics] [-add collected statistics] [-l length of shortlist](5)
    [-p fraction](0.0) [-cg cluster gaussians] [-exp exponent file] [-beam beam search info]
    <-c Corpus> [-range b:e](0:-1) [-ssp script](SPR_BSS_DEV_NULL) [-obs ObsDir]
    [-suffix FileSuffix](sam) [-arcd FileName] <-h FileName> [-g FileName] [-sel FileName]
    [-am_opt Options] [-top_n Number(s)](0) [-rmg rmg_params](no) [-LMout Value](-100)
    [-NOGS](flag: no gauss sel) <-ci unit description files> [-cd unit description files]
    <-d dictionary> [-unwind unwind format] (-u==-ci)
Parameters
-SEG: flag, segmentation mode
Work in segmentation mode, i.e. read the segmentation from the corpus/segmentation file instead of doing a Viterbi alignment.
-tbl: output association table
File to write the association table to.
-stats: collected statistics
File to write the co-occurrence statistics to.
-add: collected statistics
File(s) to read pre-computed co-occurrence statistics from. Multiple files can be specified, separated by ' .AND. ' (the results are merged).
-l: length of shortlist
Maximum length for the shortlists.
-p: fraction
Remove the least promising back-end gaussians from the shortlists as long as the removed gaussians do not represent more than <fraction> of the total co-occurrence counts.
-cg: cluster gaussians
The cluster gaussians.
-exp: exponent file
The exponents to apply to the different components of a feature vector.
-beam: beam search info
See load_search_space() for more details.
-c: Corpus
File with corpus entries or segmentations.
-range: b:e
Optional begin and end entry in the corpus/segmentation file. Counting starts at 0.
-ssp: script
The signal processing script used to preprocess the input data.
-obs: ObsDir
Observation directory name.
-suffix: FileSuffix
File suffix of the observation files (without leading '.').
-arcd: FileName
Unit file name (.arcd or .cd format).
-h: FileName
The input HMM file.
-g: FileName
The input MVG file (gaussians).
-sel: FileName
The input select file name (tied gaussians).
-am_opt: Options
Extra options for loading the acoustic model. A non-default acoustic model can be selected by giving '=<am_type>;' as the first option. See cwr_am_tbl.c for a list of the available acoustic models.
-top_n: Number(s)
Only take the top-N gaussians into account when calculating output probabilities. If one value is given, it is used for all mixtures; otherwise, a value per mixture must be given, separated by commas. Use '0' to set top_n to the number of gaussians in the mixture.
-rmg: rmg_params
The parameters for the quick selection of gaussians. If one value is given, it is used for all mixtures; otherwise, a value per mixture must be given, separated by commas. Use 'no' if no quick selection is wanted. See rm_gauss.c for a description of the parameters.
-LMout: Value
Floor the state likelihoods of an observation using a fraction of the unconditional likelihood of the observation (the weighted sum of the state likelihoods). This is practically necessary if only a few gaussians are evaluated (-top_n or -rmg options). The given value offsets an automatically determined log10(fraction). Use -100 to turn the flooring off, and 0.0 to use the automatically determined fraction as is.
-NOGS: flag, no gauss sel
Forgo the (sentence level) lexicon based Gaussian selection. The lexicon based Gaussian selection speeds up the decoding but may interfere with score normalization techniques that assume all Gaussians were evaluated.
-ci: unit description files
The two unit description files, separated by white space. The first file just lists the units (phones). The second file describes the context dependencies.
-cd: unit description files
The two unit description files, separated by white space. The first file just lists the units (phones). The second file describes the context dependencies.
-d: dictionary
Dictionary file name.
-unwind: unwind format
Define the parameters to modify the word transcriptions. See spr_cwr_lex_desc_read() for more details.
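The -top_n convention (one value for all mixtures, or one comma-separated value per mixture, with 0 meaning all gaussians of the mixture) can be illustrated with a small C sketch. The function name parse_top_n, the limit of 64 listed values and the clamping of values larger than the mixture size are assumptions for illustration only, not the actual SPRAAK option parser.

```c
#include <assert.h>
#include <stdlib.h>

#define MAX_TOP_N_VALUES 64  /* hypothetical limit on the number of listed values */

/* Hypothetical parser for a -top_n argument such as "0", "8" or "8,0,4".
 * n_gauss[i] is the number of gaussians in mixture i; on success top_n[i]
 * receives the value to use for mixture i and 0 is returned, otherwise -1. */
static int parse_top_n(const char *arg, int n_mix, const int *n_gauss, int *top_n)
{
  long vals[MAX_TOP_N_VALUES];
  int n_vals = 0;
  const char *p = arg;
  char *end;

  assert(n_mix > 0);
  for (;;) {
    long v = strtol(p, &end, 10);
    if (end == p || v < 0 || n_vals == MAX_TOP_N_VALUES)
      return -1;                  /* not a number, negative, or too many values */
    vals[n_vals++] = v;
    if (*end == '\0')
      break;                      /* last value reached */
    if (*end != ',')
      return -1;                  /* values must be separated by commas */
    p = end + 1;
  }
  if (n_vals != 1 && n_vals != n_mix)
    return -1;                    /* either one value or one per mixture */

  for (int i = 0; i < n_mix; i++) {
    long v = (n_vals == 1) ? vals[0] : vals[i];
    /* 0 selects all gaussians; clamping larger values is an assumption */
    top_n[i] = (v == 0 || v > n_gauss[i]) ? n_gauss[i] : (int)v;
  }
  return 0;
}
```

For three mixtures with 32, 64 and 16 gaussians, the argument "8,0,4" yields 8, 64 and 4, while a single "0" makes every mixture use all of its gaussians.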


The -ssp option
This program assumes a single preprocessing file that provides both the feature vectors for the back-end model and the cluster gaussians (in that order).
Note
Specify '/' as the segmentation file, lexicon, HMM and ci-file to only combine counts, without adding new data from a segmentation file.
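In this combine-only mode the work reduces to reading every file given with -add and merging the tables. The C sketch below assumes that the statistics are dense n_cluster x n_backend count matrices and that merging means element-wise addition; the actual file format is not described here, and split_add_arg and merge_counts are hypothetical names.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical: split a -add argument in place at every " .AND. " separator.
 * names[i] is set to point into buf; returns the number of file names found
 * (at most max_names, any remainder stays attached to the last name). */
static size_t split_add_arg(char *buf, char **names, size_t max_names)
{
  static const char sep[] = " .AND. ";
  const size_t sep_len = sizeof sep - 1;
  size_t n = 0;
  char *p = buf;

  assert(max_names > 0);
  for (;;) {
    names[n++] = p;
    char *hit = strstr(p, sep);
    if (hit == NULL || n == max_names)
      break;              /* last name, or no room for more names */
    *hit = '\0';          /* terminate the current file name */
    p = hit + sep_len;    /* continue after the separator */
  }
  return n;
}

/* Merge one co-occurrence table into another by element-wise addition
 * (assumed merge rule); both are dense n_cluster x n_backend matrices. */
static void merge_counts(double *dst, const double *src,
                         size_t n_cluster, size_t n_backend)
{
  for (size_t i = 0; i < n_cluster * n_backend; i++)
    dst[i] += src[i];
}
```

Splitting in place avoids extra allocations; each name is then read and its table accumulated into a single running total.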

For each frame listed in the segmentation file, evaluate both the cluster gaussians and the back-end gaussians of the corresponding state, find the most likely gaussian in both systems, and update the co-occurrence counts.
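The per-frame update can be sketched in C as follows. The sketch assumes diagonal-covariance gaussians scored by their log-likelihood, a dense count matrix indexed by [cluster][back-end], and a frame vector that holds the back-end features followed by the cluster features, the order required for the -ssp script. The DiagGauss type and all function names are hypothetical.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical diagonal-covariance gaussian. */
typedef struct {
  const double *mean;   /* mean vector */
  const double *ivar;   /* inverse variances */
  double log_norm;      /* precomputed log normalisation constant */
} DiagGauss;

/* Log-likelihood of x (dimension dim) under gaussian g. */
static double gauss_loglik(const DiagGauss *g, const double *x, size_t dim)
{
  double d2 = 0.0;
  for (size_t k = 0; k < dim; k++) {
    double d = x[k] - g->mean[k];
    d2 += d * d * g->ivar[k];
  }
  return g->log_norm - 0.5 * d2;
}

/* Index of the most likely gaussian among g[idx[0..n-1]], or among
 * g[0..n-1] when idx is NULL. */
static size_t best_gauss(const DiagGauss *g, const size_t *idx, size_t n,
                         const double *x, size_t dim)
{
  assert(n > 0);
  size_t best = idx ? idx[0] : 0;
  double best_ll = gauss_loglik(&g[best], x, dim);
  for (size_t i = 1; i < n; i++) {
    size_t j = idx ? idx[i] : i;
    double ll = gauss_loglik(&g[j], x, dim);
    if (ll > best_ll) {
      best_ll = ll;
      best = j;
    }
  }
  return best;
}

/* Add one count to counts[c * n_backend + b] for one frame, where b is the
 * best back-end gaussian of the aligned state and c the best cluster gaussian. */
static void update_counts(double *counts, size_t n_backend,
                          const DiagGauss *backend, const size_t *state_gauss,
                          size_t n_state_gauss, size_t dim_backend,
                          const DiagGauss *cluster, size_t n_cluster,
                          size_t dim_cluster, const double *frame)
{
  size_t b = best_gauss(backend, state_gauss, n_state_gauss, frame, dim_backend);
  size_t c = best_gauss(cluster, NULL, n_cluster, frame + dim_backend, dim_cluster);
  counts[c * n_backend + b] += 1.0;
}
```

Restricting the back-end search to state_gauss mirrors the description above, where only the gaussians of the state given by the segmentation are considered; the hard count of 1 per frame is an assumption of this sketch.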

From the co-occurrence counts, a shortlist of likely back-end gaussians is made for each cluster gaussian. These shortlists are used in the mdt (missing data) recognizer.
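A possible way to turn one row of the co-occurrence matrix into a shortlist, honouring -l and -p, is sketched below in C. It assumes that the -p fraction is measured against the row total of the cluster gaussian, that the -l cap is applied before the -p pruning, and that at least one entry is kept; the Entry type and make_shortlist are hypothetical names.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical shortlist entry: a back-end gaussian and its count. */
typedef struct {
  size_t backend;
  double count;
} Entry;

/* qsort comparator: larger counts first. */
static int by_count_desc(const void *a, const void *b)
{
  double ca = ((const Entry *)a)->count;
  double cb = ((const Entry *)b)->count;
  return (ca < cb) - (ca > cb);
}

/* Build the shortlist for one cluster gaussian from its row of counts.
 * work must hold n_backend entries; the shortlist is returned in
 * work[0..len-1], sorted by decreasing count, and len is returned. */
static size_t make_shortlist(const double *row, size_t n_backend,
                             size_t max_len, double fraction, Entry *work)
{
  size_t n = 0;
  double total = 0.0;

  for (size_t b = 0; b < n_backend; b++) {
    total += row[b];
    if (row[b] > 0.0) {           /* only gaussians that co-occurred */
      work[n].backend = b;
      work[n].count = row[b];
      n++;
    }
  }
  qsort(work, n, sizeof *work, by_count_desc);
  if (n > max_len)
    n = max_len;                  /* -l: cap the shortlist length */

  double removed = 0.0;           /* -p: drop the weakest entries while */
  while (n > 1 && removed + work[n - 1].count <= fraction * total) {
    removed += work[n - 1].count; /* their share stays within fraction */
    n--;
  }
  return n;
}
```

With the default -p value of 0.0 nothing is pruned in this sketch, so only the -l cap and the non-zero count requirement limit each list.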

Author
Kris Demuynck
Date
04/02/2011
Revision History:
04/02/2011 - KD
creation (copied from spr_fv_gauss and modified)