SPRAAK
spr_mvg_init.c File Reference

Initialization of K-means cluster centers and multivariate gaussians.

Detailed Description

Initialization of K-means cluster centers and multivariate gaussians.

spr_mvg_init [-MSG](flag: ) [-CDB](flag: ) [-WG](flag: ) [-BD](flag: ) [-VC](flag: )
    [-RVC](flag: ) [-VVC file] [-FVC file] [-VCW file/wgt_vec] [-pool PoolingString]
    [-vm Value](1e-2) [-init initialization method](PCA) [-train train method](EM)
    [-PCA_lim PCA -> LDA](-1) <-i data file name> [-f firstframe:lastframe] [-ci class info]
    [-hmm hmm file] [-u units] <-o output file name> [-rmg FRoG flags](no)
    [-fcov full covariance] [-hplanes hyper planes] [-Alpha LVQ-param](1.0)
    [-Beta class weighting](1.0) [-Gamma deterministic annealing](1.0) [-topN #gauss](32)
    [-nc nr. gaussians](1) [-ni nr. iterations](5)
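
A hypothetical example invocation (the file names are illustrative only; all flags are taken from the synopsis above, and options not given keep their defaults):

    # hypothetical input/output file names
    spr_mvg_init -MSG -init PCA -train EM -nc 64 -ni 10 \
        -i features.track -o clusters.mvg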
Parameters
-MSG flag
Messages: Write some interesting comments on the INFO level.
-CDB flag
CodeBooks: Write K-means codebook instead of gaussians.
-WG flag
Weighted gaussians: Assign weights to the gaussians (proportional to the counts). By default all gaussians are assumed to be equiprobable, which forces the system towards an equal number of points underneath each gaussian.
-BD flag
Boundary Detection: Split the data to minimize some distortion measure. By default the data is split so that each gaussian contains approximately an equal number of points.
-VC flag
Variance Compensation: Normalize the variance of the input data before the first binary split.
-RVC flag
Recursive Variance Compensation: Normalize the variance of the input data before each binary split.
-VVC file
Vector Variance Compensation: Set the variance of the input data to the given vector (filename) before the first binary split.
-FVC file
Fixed Variance Compensation: Divide the input data by the given vector (filename) before the first binary split.
-VCW file/wgt_vec
Vector Component Weighting: in the LDA initialization method, assign different weights (priorities) to each axis in the input space (implementation: left- and right-multiply the within-class covariance matrix with diag(1/sqrt(VCW))). Either a file name is specified or an expression (starting with '[').
-pool PoolingString
Determines the variance pooling. P: the same variance for all gaussians. FP: the same variance for all gaussians and directions (no variance).
-vm Value
Lower limit on sqrt(variance), relative to the weighted average of the variances.
-init initialization method
Select the initialization method: PCA, LDA or NONE.
-train train method
Select the train method: EM, KM or NONE.
-PCA_lim PCA -> LDA
Switch back to PCA-initialization instead of LDA-initialization if the number of mixtures to be assigned is less than or equal to the given number. A value of zero enables an automatic switch based on the perplexity of the remaining data.
-i data file name
File that contains the input data to cluster.
-f firstframe:lastframe
Select these specific frames from the input data (normally all frames are used).
-ci class info
File that contains the class info (LDA/LVQ). Same format as the 'select.siz' file made by spr_sel_frames, i.e. 3 fields (and an optional 4th field): <class_name> <start_frame> <end_frame> [wgt]. A hypothetical example is given after this parameter list.
-hmm hmm file
File name of the mixture weights (HMM). Also used to read initial mixture weights if init-style NONE is chosen. The HMM is used for LVQ-training and is created when using LDA-initialization.
-u units
The unit file (.arcd or .cd format). Used by LVQ-training to read/write the HMM or by LDA-initialization to write the HMM. Should be consistent with the class information (select.siz) file.
-o output file name
File that contains the resulting cluster centers or gaussians. Also used to read initial values for the gaussians if init-style NONE is chosen.
-rmg FRoG flags
Initialization of the Fast Removal of Gaussians system. For more information about the flags see the man-page of rm_gauss.
-fcov full covariance
Write the full covariance information for the gaussians.
-hplanes hyper planes
Write the hyperplane information to files. Only the base-name must be specified, the extensions are added automatically.
-Alpha LVQ-param
Alpha > 0.0: Force the LVQ to make more (|Alpha| > 1.0) or less specific (|Alpha| < 1.0) gaussians.
Alpha <= 0.0: Force the LVQ to make more (|Alpha| > 1.0) or less specific (|Alpha| < 1.0) states.
If |Alpha| == 1.0, a standard EM-training (with mixture weights) is done.
-Beta class weighting
Unequal weighting of the points in the classes.
Beta < 1.0: give more importance to the classes with few points.
Beta > 1.0: give more importance to the classes with many points.
-Gamma deterministic annealing
Artificial smoothing of the gaussians by widening the surface they span.
-topN #gauss
Only update the N-best scoring gaussians.
-nc nr. gaussians
Number of gaussians or codebook size.
-ni nr. iterations
Number of reestimation passes.
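
A hypothetical class information file for the -ci option (class names, frame ranges and the optional weight field are illustrative only; the layout follows the 3/4-field format described above):

    sil      0    199
    vowel    200  449  0.5
    nasal    450  599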

Initialization of K-means cluster centers and multivariate gaussians. Performs a binary tree splitting on the input data points. In each step, the direction with the largest eigenvalue (PCA or LDA) is split. For each final region, a mean and variance are calculated from the data points it contains. The global distortion is minimized further by performing some K-means (assign each data point to the nearest mean) or EM (distribute data points over the means according to the distance) reestimation passes.
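
The K-means reestimation pass mentioned above can be sketched as follows. This is a minimal illustrative sketch, not SPRAAK code: it assumes the data points are plain double vectors held in memory, uses hard-coded toy dimensions and data, and omits the preceding binary PCA/LDA splitting as well as the variance estimation.

    /* Illustrative K-means reestimation pass: assign each data point to the
     * nearest mean, then recompute the means. NOT SPRAAK code; the data
     * layout, dimensions and initial means (normally produced by the binary
     * PCA/LDA splitting) are assumptions made for this sketch. */
    #include <float.h>
    #include <stdio.h>

    #define DIM  2   /* feature dimension (assumption)        */
    #define NPTS 6   /* number of data points (assumption)    */
    #define NMIX 2   /* number of cluster centers / gaussians */

    static double sq_dist(const double *a, const double *b)
    {
        double d = 0.0;
        for (int i = 0; i < DIM; i++)
            d += (a[i] - b[i]) * (a[i] - b[i]);
        return d;
    }

    /* one pass: hard-assign every point to its nearest mean, update the means */
    static void kmeans_pass(double x[][DIM], int npts, double mu[][DIM])
    {
        double sum[NMIX][DIM] = {{0.0}};
        int    cnt[NMIX]      = {0};

        for (int p = 0; p < npts; p++) {
            int    best  = 0;
            double bestd = DBL_MAX;
            for (int m = 0; m < NMIX; m++) {
                double d = sq_dist(x[p], mu[m]);
                if (d < bestd) { bestd = d; best = m; }
            }
            cnt[best]++;
            for (int i = 0; i < DIM; i++)
                sum[best][i] += x[p][i];
        }
        for (int m = 0; m < NMIX; m++)
            if (cnt[m] > 0)                 /* leave empty clusters untouched */
                for (int i = 0; i < DIM; i++)
                    mu[m][i] = sum[m][i] / cnt[m];
    }

    int main(void)
    {
        /* toy data and initial means (the real initial means come from the
         * binary tree splitting described above) */
        double x[NPTS][DIM]  = {{0,0},{0,1},{1,0},{8,8},{8,9},{9,8}};
        double mu[NMIX][DIM] = {{0,0},{5,5}};

        for (int it = 0; it < 5; it++)      /* cf. the -ni option */
            kmeans_pass(x, NPTS, mu);

        for (int m = 0; m < NMIX; m++)
            printf("mean %d: (%.2f, %.2f)\n", m, mu[m][0], mu[m][1]);
        return 0;
    }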

Author
Kris Demuynck
Date
26/08/1995 - KD
Creation.
xx/01/1996 - JD
Adjusted for new hmm structures, training added for pooling and full pooling cases.
xx/05/1997 - KD
Added boundary detection for the PCA method.
xx/05/1997 - KD
Added the LDA method.
xx/06/1997 - KD
Added the LVQ training (first version).
xx/08/1997 - KD
LDA/LVQ - weighting of the data.
25/05/2018 - KD
Added the range option.