SPRAAK
cwr_lm_ext.c File Reference

Extensions on standard LM's.

Data Structures

struct  SprCwrLMExtSM
 sub-model info
 
struct  SprCwrLMExtWI
 word info
 
struct  SprCwrLMExtESB
 the actual word strings stored per command in a large array
 
struct  SprCwrLMExt
 
struct  spr_t_cwr_lm_instance_ext
 

Typedefs

typedef float(* _FuncPtr1_CWR_LM_EXT_ )(const SprCwrLMInstance *lm, const SprLMContext *lmc, int wndx)
 
typedef float(* _FuncPtr2_CWR_LM_EXT_ )(SprCwrLMInstance *lm, const SprLMContext *lmc, SprLMContext *lmc_next, SprCwrLMWord *wrd)
 
typedef SprCwrLMWord *(* _FuncPtr3_CWR_LM_EXT_ )(const SprCwrLMInstance *lm, const SprLMContext *lmc, SprCwrLMWord *wrd)
 
typedef float *(* _FuncPtr4_CWR_LM_EXT_ )(const SprCwrLMInstance *lm, const SprLMContext *lmc, float *all_probs, SprCwrLMWord *wrd)
 
typedef int(* _FuncPtr5_CWR_LM_EXT_ )(SprCwrLMInstance *lm, unsigned int action,...)
 

Functions

SprCwrLMExt * spr_cwr_lm_ext_free (SprCwrLMExt *ext)
 
void spr_cwr_lm_ext_lmcr (SprCwrLMInstance *lm, const SprLMContext *lmc)
 
int spr_cwr_lm_ext_uninstall (SprCwrLMInterface *lmi, SprCwrLMInstance *lm_data)
 
float * spr_cwr_lm_ext_read_wdsf (float **wdsf_base, const char *fname, SprCwrLMStrRead *io, const SprStrHashTbl *wlst, const SprCwrLMExt *ext)
 
SprCwrLMExt * spr_cwr_lm_ext_read (const char *ext_fname, const char *wdsf_fname, SprCwrLMStrRead *io, const SprStrHashTbl *wlst, int *eos_ndx_ptr, double prob_sf)
 
int * spr_cwr_lm_ext_wxlat (const SprCwrSLex *lex, const SprStrHashTbl *whash_main, const SprStrHashTbl *whash_ext, int check_lvl, int closed_voc)
 
int spr_cwr_lm_ext_install (SprCwrLMInterface *lmi, SprCwrLMInstance *lm_data, SprCwrLMExt *ext, const SprStrHashTbl *whash_main, const SprCwrSLex *lex, int check_lvl, int closed_voc)
 

Detailed Description

Extensions on standard LM's.

This module provides the routines needed to push a set of common extensions on top of a standard LM. The extensions are typically loaded with the ext=<fname>; option. The format of the extension file and the behaviour of the extensions are described below.

Generic info:

The <noC/addC> flag
defines whether the given word needs the C-cost (word startup cost) to be added or not (e.g. inter-word silence).
The <frac(log10)> parameter
indicates that only the given fraction of the probability of <map_to_word> should be used. This is useful when mapping multiple words to the same symbol, e.g. when using different filler models for unknown words.

Specific info:

Sub-models: When going from the normal LM to the sub-model the following steps are taken:

  1. Take the probability of the <map_to_word> given the current LM-context. If <map_to_word> equals "", this step is omitted.
  2. Either expand the LM-context with the <map_to_word> (lmc=O*) or not (lmc=O).
  3. Save the current LM-context, and open a new LM-context for the given sub-model containing the <open_symbol> only.
  4. Add the prob of the first word, given the context <open_symbol>, and update the LM-context for the sub-model.

When going from a sub-model to the normal LM (or to another sub-model), the following steps are taken:

  1. Take the probability of the <close_symbol> (if defined by the sub-model) given the current LM-context of the sub-model; the highest probability is used in case the close leads to multiple contexts.
  2. Remove the LM-context of the sub-model, and restore the previously saved LM-context of the base LM.
  3. Add the probability of the current word; see the explanation above when going to yet another sub-model.
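
The two step lists above can be sketched with a toy model. Everything in this sketch is hypothetical: the types, the function names, and the constant-score toy LMs are not part of SPRAAK, and the LM-context is reduced to a single history word. The <frac(log10)> scaling, the case where a close leads to multiple contexts, and the direct transition to another sub-model are left out.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical LM callback: returns log10 P(word | context). */
typedef float (*toy_prob_fn)(const char *context, const char *word);

typedef struct {
    const char *context;        /* current LM-context (one history word) */
    const char *saved_context;  /* base-LM context kept while in a sub-model */
    int in_submodel;
} ToyState;

/* Toy LMs with constant scores, for demonstration only. */
float toy_base_lm(const char *context, const char *word)
{ (void)context; (void)word; return -1.0f; }
float toy_sub_lm(const char *context, const char *word)
{ (void)context; (void)word; return -0.5f; }

/* Normal LM -> sub-model. expand_ctx != 0 mimics lmc=O*, 0 mimics lmc=O. */
float toy_enter_submodel(ToyState *st, toy_prob_fn base_lm, toy_prob_fn sub_lm,
                         const char *map_to_word, int expand_ctx,
                         const char *open_symbol, const char *first_word)
{
    float logp = 0.0f;
    int has_map = strcmp(map_to_word, "") != 0;

    /* Step 1: P(map_to_word | context), omitted when map_to_word is "". */
    if (has_map)
        logp += base_lm(st->context, map_to_word);
    /* Step 2: optionally expand the context with map_to_word. */
    if (expand_ctx && has_map)
        st->context = map_to_word;
    /* Step 3: save the base context, open one holding only open_symbol. */
    st->saved_context = st->context;
    st->context = open_symbol;
    st->in_submodel = 1;
    /* Step 4: add P(first_word | open_symbol), update sub-model context. */
    logp += sub_lm(st->context, first_word);
    st->context = first_word;
    return logp;
}

/* Sub-model -> normal LM. close_symbol may be NULL if none is defined. */
float toy_leave_submodel(ToyState *st, toy_prob_fn base_lm, toy_prob_fn sub_lm,
                         const char *close_symbol, const char *next_word)
{
    float logp = 0.0f;

    /* Step 1: P(close_symbol | sub-model context), if defined. */
    if (close_symbol != NULL)
        logp += sub_lm(st->context, close_symbol);
    /* Step 2: drop the sub-model context, restore the saved base context. */
    st->context = st->saved_context;
    st->saved_context = NULL;
    st->in_submodel = 0;
    /* Step 3: add P(next_word | restored context) and move on. */
    logp += base_lm(st->context, next_word);
    st->context = next_word;
    return logp;
}
```

With the toy LMs above, entering a sub-model through a non-empty <map_to_word> and leaving it again through a defined <close_symbol> each accumulate one base-LM score and one sub-model score.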

[LM_extension_format]
subdef=<N-gram/FSG/comb/dump/XXX> <name> <file> <options> [cost_A](1) [cost_C](0) [open_symbol](<s>) [close_symbol](</s>)
subdef <name> <file> <options> [cost_A](1) [cost_C](0) [open_symbol](<s>) [close_symbol](</s>)
eos0 <noC/addC> <lmc=S/lmc=OS/lmc=OES> <map_to_word> <frac(log10)> <start_symbol> <words> ...
eos1 <noC/addC> <lmc=S/lmc=OS/lmc=OES> <frac(log10)> <start_symbol> <words> ...
fill0 <noC/addC> <lmc=O/lmc=O*> <map_to_word> <frac(log10)> <words> ...
fill1 <noC/addC> <prob(log10)> <words> ...
submod <noC/addC> <lmc=O/lmc=O*> <map_to_word> <frac(log10)> <submod> <words> ...
sos_cost_adjust=<log10_value>
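
As a hypothetical illustration of how the fill0 and fill1 templates above might be instantiated: the word names and numeric values below are invented, and it is an assumption that the flag tokens (noC, addC, lmc=O) are written literally and that fields are separated by whitespace.

```
fill1 noC -2.0 <sil> <breath>
fill0 addC lmc=O <UNK> -0.30 <unk_short> <unk_long>
```

Under that reading, the first line would give two filler words a fixed log10 probability of -2.0 without a C-cost, and the second would map two filler models to <UNK>, each using a fraction of 10^-0.30 (about half) of its probability while leaving the LM-context unexpanded.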

Date
July 2002
Author
Kris Demuynck
Bug:

The LM is not normalized when using 'extensions'. This has no tangible effect on the recognition, but perplexities calculated with the given LM will be incorrect.

When stacking multi-class LM's, only the base LM may have extensions.

If the FULL_DEBUG flag is set during compilation (-DFULL_DEBUG), some extra debug messages and tests are added. These tests add significant overhead.

Revision History:
10/2003 - KD
Added the LM-extensions to cwr_lm_std.c
03/2005 - KD
Isolated the LM-extensions so that they can be pushed on top of other LM's than the N-gram LM (and made the extensions more general).