Extensions on standard LMs.

Data Structures
struct	SprCwrLMExtSM
	sub-model info

struct	SprCwrLMExtWI
	word info

struct	SprCwrLMExtESB
	the actual word strings stored per command in a large array

struct	SprCwrLMExt

struct	spr_t_cwr_lm_instance_ext

Typedefs
typedef float(*	_FuncPtr1_CWR_LM_EXT_ )(const SprCwrLMInstance lm, const SprLMContext lmc, int wndx)

typedef float(*	_FuncPtr2_CWR_LM_EXT_ )(SprCwrLMInstance lm, const SprLMContext lmc, SprLMContext lmc_next, SprCwrLMWord wrd)

typedef SprCwrLMWord(*	_FuncPtr3_CWR_LM_EXT_ )(const SprCwrLMInstance lm, const SprLMContext lmc, SprCwrLMWord *wrd)

typedef float(*	_FuncPtr4_CWR_LM_EXT_ )(const SprCwrLMInstance lm, const SprLMContext lmc, float all_probs, SprCwrLMWord wrd)

typedef int(*	_FuncPtr5_CWR_LM_EXT_ )(SprCwrLMInstance *lm, unsigned int action,...)

Functions
SprCwrLMExt *	spr_cwr_lm_ext_free (SprCwrLMExt *ext)

void	spr_cwr_lm_ext_lmcr (SprCwrLMInstance lm, const SprLMContext lmc)

int	spr_cwr_lm_ext_uninstall (SprCwrLMInterface lmi, SprCwrLMInstance lm_data)

float *	spr_cwr_lm_ext_read_wdsf (float *wdsf_base, const char *fname, SprCwrLMStrRead io, const SprStrHashTbl wlst, const SprCwrLMExt *ext)

SprCwrLMExt *	spr_cwr_lm_ext_read (const char *ext_fname, const char *wdsf_fname, SprCwrLMStrRead io, const SprStrHashTbl wlst, int *eos_ndx_ptr, double prob_sf)

int *	spr_cwr_lm_ext_wxlat (const SprCwrSLex *lex, const SprStrHashTbl whash_main, const SprStrHashTbl *whash_ext, int check_lvl, int closed_voc)

int	spr_cwr_lm_ext_install (SprCwrLMInterface lmi, SprCwrLMInstance lm_data, SprCwrLMExt *ext, const SprStrHashTbl whash_main, const SprCwrSLex *lex, int check_lvl, int closed_voc)

Detailed Description

Extensions on standard LMs.

This module provides the routines needed to push a set of common extensions on top of a standard LM. The extensions are typically loaded with the ext=<fname>; option. The format of the extension file and the behaviour of the extensions are described below.

Generic info:

The <noC/addC> flag: defines whether the C-cost (word start-up cost) must be added for the given word (addC) or not (noC), e.g. noC for inter-word silence.
The <frac(log10)> parameter: indicates that only the given fraction (expressed in log10) of the probability of <map_to_word> should be used. This is useful, for example, when mapping multiple words to the same symbol, e.g. when using different filler models for unknown words.
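To make the arithmetic behind <frac(log10)> concrete: using a fraction f of a probability P gives f·P, which in the log10 domain is log10(P) + log10(f), so the parameter is simply added to the log10 probability of <map_to_word>. The helper below is a hypothetical sketch of that step under this reading; it is not part of the SPRAAK API, and it ignores the C-cost.

```c
#include <assert.h>

/* Hypothetical helper (not a SPRAAK function): log10 score of a word that is
 * remapped to <map_to_word>, keeping only a fraction of that probability.
 * log10(f * P) = log10(P) + log10(f), so the <frac(log10)> value is added. */
static double remapped_log10_prob(double log10_p_map_to_word, double frac_log10)
{
    return log10_p_map_to_word + frac_log10;
}

/* Tolerance-based comparison; avoids linking libm just for fabs(). */
static int nearly_equal(double a, double b, double eps)
{
    double d = a - b;
    return (d < 0.0 ? -d : d) < eps;
}
```

For example, if four filler words are mapped to the same unknown-word symbol and should share its probability equally, each would use frac = log10(1/4) ≈ -0.602; with log10 P(<map_to_word>|context) = -2.0, each filler then scores about -2.602.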

Specific info:

subdef <name> <file> <options> [cost_A](1) [cost_C](0) [open_symbol](<s>) [close_symbol](</s>)
Define the sub-model with name <name>, i.e. (1) read the data from file <file> with options <options>, (2) set cost A and cost C, and (3) define the open and close symbols ("" means no open/close symbol).
eos0 <noC/addC> <lmc=S/lmc=OS/lmc=OES> <map_to_word> <frac(log10)> <start_symbol> <words> ...
Define an end-of-sentence symbol by remapping specific EOS words <words> (e.g. </s-point>) to a generic EOS symbol <map_to_word> (e.g. </s>). The LM-context is also remapped to (or extended with) the start-of-sentence symbol <start_symbol>: lmc=S: the new LM-context consists of the SOS symbol only. lmc=OS: the new LM-context equals the old LM-context extended with the SOS symbol. lmc=OES: the new LM-context equals the old LM-context extended with the remapped EOS symbol and then with the SOS symbol.
eos1 <noC/addC> <lmc=S/lmc=OS/lmc=OES> <frac(log10)> <start_symbol> <words> ...
Define the end-of-sentence symbols (without remapping). For an explanation of the arguments, see 'eos0'.
sos_cost_adjust=<log10_value>
Cost adjustment applied when starting a new sentence. Since most LM toolkits model individual sentences rather than continuous text, the value of P(<s>|...,</s>) is typically meaningless. This option corrects for this by adding the given offset (in log10) to that value.
fill0 <noC/addC> <lmc=O/lmc=O*> <map_to_word> <frac(log10)> <words> ...
Remap the words <words> to <map_to_word>. The new LM-context can either incorporate <map_to_word> (lmc=O*) or ignore it (lmc=O). The latter is useful, for example, when defining fillers.
fill1 <noC/addC> <prob(log10)> <words> ...
Define a new word by giving it a fixed probability, irrespective of the LM-context (e.g. fillers).
submod <noC/addC> <lmc=O/lmc=O*> <map_to_word> <frac(log10)> <submod> <words> ...
Declare that the words <words> belong to the sub-model <submod>. How sub-models work is explained below.
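The LM-context options used by eos0/eos1 (lmc=S, lmc=OS, lmc=OES) and by fill0/submod (lmc=O, lmc=O*) can be sketched as operations on a word history. The sketch assumes a hypothetical fixed-length history of word indices (oldest first) that drops its oldest word when full, roughly like an N-gram context; the types and function names are illustrative only and do not reflect SPRAAK's actual SprLMContext. For eos1, which does no remapping, the EOS word itself is presumably what lmc=OES appends.

```c
#include <assert.h>
#include <string.h>

#define CTX_MAX 4  /* hypothetical maximum history length */

/* Hypothetical LM-context: the most recent word indices, oldest first. */
typedef struct {
    int words[CTX_MAX];
    int len;
} LmCtx;

/* Context update modes as named in the extension file. */
typedef enum { LMC_S, LMC_OS, LMC_OES, LMC_O, LMC_O_STAR } LmcMode;

/* Append a word, dropping the oldest one when the history is full. */
static void ctx_push(LmCtx *c, int w)
{
    if (c->len == CTX_MAX) {
        memmove(c->words, c->words + 1, (CTX_MAX - 1) * sizeof c->words[0]);
        c->len--;
    }
    c->words[c->len++] = w;
}

/* Return the new LM-context derived from `old`.
 * sym: the (remapped) EOS symbol for the eos modes, <map_to_word> for lmc=O/O*.
 * sos: <start_symbol>, only used by the eos modes. */
static LmCtx ctx_next(const LmCtx *old, LmcMode mode, int sym, int sos)
{
    LmCtx c = *old;
    switch (mode) {
    case LMC_S:      c.len = 0; ctx_push(&c, sos); break;         /* SOS only */
    case LMC_OS:     ctx_push(&c, sos); break;                    /* old + SOS */
    case LMC_OES:    ctx_push(&c, sym); ctx_push(&c, sos); break; /* old + EOS + SOS */
    case LMC_O:      break;                                       /* <map_to_word> ignored */
    case LMC_O_STAR: ctx_push(&c, sym); break;                    /* old + <map_to_word> */
    }
    return c;
}
```

The only difference between lmc=O and lmc=O* is whether the mapped symbol enters the history, which is why lmc=O suits fillers: they are scored but leave the context of the surrounding words untouched.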

Sub-models: when going from the normal LM to a sub-model, the following steps are taken:

Take the probability of <map_to_word> given the current LM-context; if <map_to_word> equals "", this step is omitted.
Either extend the LM-context with <map_to_word> (lmc=O*) or leave it unchanged (lmc=O).
Save the current LM-context and open a new LM-context for the given sub-model containing only the <open_symbol>.
Add the probability of the first word given the context <open_symbol>, and update the LM-context of the sub-model.
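The four entry steps above can be sketched as follows. This is a simplified, hypothetical model, not SPRAAK code: each LM is reduced to a callback returning log10 P(word | previous word), so an LM-context is just the previous word, and -1 stands for the empty symbol "". It also assumes that the submod line's <frac(log10)> is applied to the <map_to_word> probability in the first step, as described under the generic info.

```c
#include <assert.h>

#define NO_WORD (-1)  /* stands for "" (no symbol) in the extension file */

/* Hypothetical scoring callback: log10 P(word | prev_word) for one model. */
typedef double (*LogProbFn)(int prev_word, int word);

typedef struct {
    int base_ctx;  /* base-LM context; kept unchanged (saved) while in the sub-model */
    int sub_ctx;   /* context of the active sub-model */
} LmState;

/* Enter a sub-model whose first word is `word`; returns the log10 score added. */
static double enter_submodel(LmState *st, LogProbFn base_lp, LogProbFn sub_lp,
                             int map_to_word, double frac_log10,
                             int lmc_o_star, /* 1 = lmc=O*, 0 = lmc=O */
                             int open_symbol, int word)
{
    double score = 0.0;

    /* Step 1: P(<map_to_word> | base context), scaled by <frac(log10)>; skipped for "". */
    if (map_to_word != NO_WORD)
        score += base_lp(st->base_ctx, map_to_word) + frac_log10;

    /* Step 2: lmc=O* extends the base context with <map_to_word>; lmc=O does not. */
    if (lmc_o_star && map_to_word != NO_WORD)
        st->base_ctx = map_to_word;

    /* Step 3: the base context stays saved in st->base_ctx; the sub-model
     * starts from a context holding only <open_symbol>. */
    st->sub_ctx = open_symbol;

    /* Step 4: P(first word | <open_symbol>) in the sub-model, then update its context. */
    score += sub_lp(st->sub_ctx, word);
    st->sub_ctx = word;
    return score;
}

/* Toy models for illustration (word 0 plays the role of <s>). */
static double toy_base_lp(int prev_word, int word) { (void)prev_word; (void)word; return -2.0; }
static double toy_sub_lp(int prev_word, int word) { (void)word; return prev_word == 0 ? -0.5 : -1.0; }
```

With the toy models, entering via <map_to_word> 42 with frac = -0.25 and lmc=O* adds -2.0 - 0.25 - 0.5 = -2.75 and leaves the base context at 42.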

When going from a sub-model to the normal LM (or to another sub-model), the following steps are taken:

Take the probability of the <close_symbol> (if defined by the sub-model) given the current LM-context of the sub-model; if the close leads to multiple contexts, the highest probability is used.
Remove the LM-context of the sub-model and restore the previously saved LM-context of the base LM.
Add the probability of the current word; when going to yet another sub-model, follow the steps explained above.
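The exit steps above can be sketched in the same simplified, hypothetical setting (a callback returning log10 P(word | previous word), so the context is the previous word). Here the alternatives the close may lead to are passed as an array of log10 probabilities of <close_symbol>, of which the highest is kept; an empty array means the sub-model defines no close symbol. None of these names are SPRAAK API.

```c
#include <assert.h>

/* Hypothetical scoring callback: log10 P(word | prev_word) for one model. */
typedef double (*LogProbFn)(int prev_word, int word);

/* Leave the active sub-model and score `word` in the base LM.
 * close_lp[0..n_close-1]: log10 P(<close_symbol> | ...) for each alternative
 * the close may lead to; n_close == 0 means no close symbol is defined.
 * *base_ctx: the base-LM context that was saved on entry to the sub-model. */
static double leave_submodel(const double *close_lp, int n_close,
                             int *base_ctx, LogProbFn base_lp, int word)
{
    double score = 0.0;

    /* Step 1: probability of the close symbol; keep the highest alternative. */
    if (n_close > 0) {
        double best = close_lp[0];
        for (int i = 1; i < n_close; i++)
            if (close_lp[i] > best)
                best = close_lp[i];
        score += best;
    }

    /* Step 2: the sub-model context is dropped; the saved base context is used. */
    /* Step 3: add P(word | base context) and update the base context. */
    score += base_lp(*base_ctx, word);
    *base_ctx = word;
    return score;
}

/* Toy base LM for illustration: every word has log10 probability -1.0. */
static double toy_base_lp(int prev_word, int word)
{
    (void)prev_word;
    (void)word;
    return -1.0;
}
```

When the next word belongs to yet another sub-model, the last step would instead follow the entry procedure for sub-models.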

LM extension format:
`subdef=<N-gram/FSG/comb/dump/XXX> <name> <file> <options> [cost_A](1) [cost_C](0) [open_symbol](<s>) [close_symbol](</s>)`
`subdef <name> <file> <options> [cost_A](1) [cost_C](0) [open_symbol](<s>) [close_symbol](</s>)`
`eos0 <noC/addC> <lmc=S/lmc=OS/lmc=OES> <map_to_word> <frac(log10)> <start_symbol> <words> ...`
`eos1 <noC/addC> <lmc=S/lmc=OS/lmc=OES> <frac(log10)> <start_symbol> <words> ...`
`fill0 <noC/addC> <lmc=O/lmc=O*> <map_to_word> <frac(log10)> <words> ...`
`fill1 <noC/addC> <prob(log10)> <words> ...`
`submod <noC/addC> <lmc=O/lmc=O*> <map_to_word> <frac(log10)> <submod> <words> ...`
`sos_cost_adjust=<log10_value>`

Date: July 2002

Author: Kris Demuynck

Bug:

The LM is not normalized when using 'extensions'. This has no tangible effect on the recognition, but perplexities calculated with the given LM will be incorrect.

When stacking multi-class LMs, only the base LM may have extensions.

If the FULL_DEBUG flag is set during compilation (-DFULL_DEBUG), some extra debug messages and tests are added. These tests add considerable overhead.

Revision History:

10/2003 - KD: Added the LM-extensions to cwr_lm_std.c
03/2005 - KD: Isolated the LM-extensions so that they can be pushed on top of LMs other than the N-gram LM (and made the extensions more general).
