SPRAAK
spr_mvg_init.c File Reference

Initialization of K-means cluster centers and multivariate gaussians.

Detailed Description

Initialization of K-means cluster centers and multivariate gaussians.

spr_mvg_init [-MSG](flag: ) [-CDB](flag: ) [-WG](flag: ) [-BD](flag: ) [-VC](flag: )
    [-RVC](flag: ) [-VVC file] [-FVC file] [-VCW file/wgt_vec] [-pool PoolingString]
    [-vm Value](1e-2) [-init initialization method](PCA) [-train train method](EM)
    [-PCA_lim PCA -> LDA](-1) <-i data file name> [-f firstframe:lastframe] [-ci class info]
    [-hmm hmm file] [-u units] <-o output file name> [-rmg FRoG flags](no)
    [-fcov full covariance] [-hplanes hyper planes] [-Alpha LVQ-param](1.0)
    [-Beta class weighting](1.0) [-Gamma deterministic annealing](1.0) [-topN #gauss](32)
    [-nc nr. gaussians](1) [-ni nr. iterations](5)
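
A hypothetical example invocation (the file names are illustrative only; all flags are taken from the synopsis above, and options not given keep their defaults):

    # hypothetical input/output file names
    spr_mvg_init -MSG -init PCA -train EM -nc 64 -ni 10 \
        -i features.track -o clusters.mvg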
Parameters
-MSG flag
Messages: Write some interesting comments on the INFO level.
-CDB flag
CodeBooks: Write K-means codebook instead of gaussians.
-WG flag
Weighted gaussians: Assign weights to the gaussians (proportional to the counts). By default all gaussians are assumed to be equiprobable, which forces the system towards an equal number of points underneath each gaussian.
-BD flag
Boundary Detection: Split the data to minimize some distortion measure. By default the data is split so that each gaussian contains approximately an equal number of points.
-VC flag
Variance Compensation: Normalize the variance of the input data before the first binary split.
-RVC flag
Recursive Variance Compensation: Normalize the variance of the input data before each binary split.
-VVC file
Vector Variance Compensation: Set the variance of the input data to the given vector (filename) before the first binary split.
-FVC file
Fixed Variance Compensation: Divide the input data by the given vector (filename) before the first binary split.
-VCW file/wgt_vec
Vector Component Weighting: in the LDA initialization method, assign different weights (priorities) to each axis in the input space (implementation: left- and right-multiply the within-class covariance matrix with diag(1/sqrt(VCW))). Either a file name is specified or an expression (starting with '[').
-pool PoolingString
Determines the variance pooling. P: the same variance for all gaussians. FP: the same variance for all gaussians and directions (no variance).
-vm Value
Lower limit on sqrt(variance), relative to the weighted average of the variances.
-init initialization method
Select the initialization method: PCA, LDA or NONE.
-train train method
Select the train method: EM, KM or NONE.
-PCA_lim PCA -> LDA
Switch back to PCA-initialization instead of LDA-initialization if the number of mixtures to be assigned is less than or equal to the given number. A value of zero enables an automatic switch based on the perplexity of the remaining data.
-i data file name
File that contains the input data to cluster.
-f firstframe:lastframe
Select these specific frames from the input data (normally all frames are used).
-ci class info
File that contains the class info (LDA/LVQ). Same format as the 'select.siz' file made by spr_sel_frames, i.e. 3 fields (and an optional 4th field): <class_name> <start_frame> <end_frame> [wgt]. A hypothetical example is given after this parameter list.
-hmm hmm file
File name of the mixture weights (HMM). Also used to read initial mixture weights if init-style NONE is chosen. The HMM is used for LVQ-training and is created when using LDA-initialization.
-u units
The unit file (.arcd or .cd format). Used by LVQ-training to read/write the HMM or by LDA-initialization to write the HMM. Should be consistent with the class information (select.siz) file.
-o output file name
File that contains the resulting cluster centers or gaussians. Also used to read initial values for the gaussians if init-style NONE is chosen.
-rmg FRoG flags
Initialization of the Fast Removal of Gaussians system. For more information about the flags see the man-page of rm_gauss.
-fcov full covariance
Write the full covariance information for the gaussians.
-hplanes hyper planes
Write the hyperplane information to files. Only the base-name must be specified, the extensions are added automatically.
-Alpha LVQ-param
Alpha > 0.0: Force the LVQ to make more (|Alpha| > 1.0) or less specific (|Alpha| < 1.0) gaussians.
Alpha <= 0.0: Force the LVQ to make more (|Alpha| > 1.0) or less specific (|Alpha| < 1.0) states.
If |Alpha| == 1.0, a standard EM-training (with mixture weights) is done.
-Beta class weighting
Unequal weighting of the points in the classes.
Beta < 1.0: give more importance to the classes with few points.
Beta > 1.0: give more importance to the classes with many points.
-Gamma deterministic annealing
Artificial smoothing of the gaussians by widening the surface they span.
-topN #gauss
Only update the N-best scoring gaussians.
-nc nr. gaussians
Number of gaussians or codebook size.
-ni nr. iterations
Number of reestimation passes.
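
A hypothetical class information file for the -ci option (class names, frame ranges and the optional weight field are illustrative only; the layout follows the 3/4-field format described above):

    sil      0    199
    vowel    200  449  0.5
    nasal    450  599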

Initialization of K-means cluster centers and multivariate gaussians. Performs a binary tree splitting on the input data points. In each step, the direction with the largest eigenvalue (PCA or LDA) is split. For each final region, a mean and variance are calculated from the data points it contains. The global distortion is minimized further by performing some K-means (assign each data point to the nearest mean) or EM (distribute data points over the means according to the distance) reestimation passes.
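
The K-means reestimation pass mentioned above can be sketched as follows. This is a minimal illustrative sketch, not SPRAAK code: it assumes the data points are plain double vectors held in memory, uses hard-coded toy dimensions and data, and omits the preceding binary PCA/LDA splitting as well as the variance estimation.

    /* Illustrative K-means reestimation pass: assign each data point to the
     * nearest mean, then recompute the means. NOT SPRAAK code; the data
     * layout, dimensions and initial means (normally produced by the binary
     * PCA/LDA splitting) are assumptions made for this sketch. */
    #include <float.h>
    #include <stdio.h>

    #define DIM  2   /* feature dimension (assumption)        */
    #define NPTS 6   /* number of data points (assumption)    */
    #define NMIX 2   /* number of cluster centers / gaussians */

    static double sq_dist(const double *a, const double *b)
    {
        double d = 0.0;
        for (int i = 0; i < DIM; i++)
            d += (a[i] - b[i]) * (a[i] - b[i]);
        return d;
    }

    /* one pass: hard-assign every point to its nearest mean, update the means */
    static void kmeans_pass(double x[][DIM], int npts, double mu[][DIM])
    {
        double sum[NMIX][DIM] = {{0.0}};
        int    cnt[NMIX]      = {0};

        for (int p = 0; p < npts; p++) {
            int    best  = 0;
            double bestd = DBL_MAX;
            for (int m = 0; m < NMIX; m++) {
                double d = sq_dist(x[p], mu[m]);
                if (d < bestd) { bestd = d; best = m; }
            }
            cnt[best]++;
            for (int i = 0; i < DIM; i++)
                sum[best][i] += x[p][i];
        }
        for (int m = 0; m < NMIX; m++)
            if (cnt[m] > 0)                 /* leave empty clusters untouched */
                for (int i = 0; i < DIM; i++)
                    mu[m][i] = sum[m][i] / cnt[m];
    }

    int main(void)
    {
        /* toy data and initial means (the real initial means come from the
         * binary tree splitting described above) */
        double x[NPTS][DIM]  = {{0,0},{0,1},{1,0},{8,8},{8,9},{9,8}};
        double mu[NMIX][DIM] = {{0,0},{5,5}};

        for (int it = 0; it < 5; it++)      /* cf. the -ni option */
            kmeans_pass(x, NPTS, mu);

        for (int m = 0; m < NMIX; m++)
            printf("mean %d: (%.2f, %.2f)\n", m, mu[m][0], mu[m][1]);
        return 0;
    }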

Author
Kris Demuynck
Date
26/08/1995 - KD
Creation.
xx/01/1996 - JD
Adjusted for new hmm structures, training added for pooling and full pooling cases.
xx/05/1997 - KD
Added boundary detection for the PCA method.
xx/05/1997 - KD
Added the LDA method.
xx/06/1997 - KD
Added the LVQ training (first version).
xx/08/1997 - KD
LDA/LVQ - weighting of the data.
25/05/2018 - KD
Added the range option.