SPRAAK
cwr_lm_fsg.c File Reference

Finite state grammar. More...

Data Structures

struct  SprCwrFsgWarc
 hash table entry: {state,wid} -> this struct More...
 
struct  SprCwrFsgMarc
 list of sub-models for given FSG-node More...
 
union  _Union1_CWR_LM_FSG_
 
struct  SprCwrFsgSmwh
 hash table entry: {submod,wid} -> this struct More...
 
struct  SprCwrFsgState
 
struct  spr_t_cwr_fsg_sm_hdr
 common header for the sub-model structures More...
 
struct  spr_t_cwr_fsg_fsg
 a FSG (sub)model More...
 
struct  spr_t_cwr_fsg_ngram
 an N-gram sub-model More...
 
struct  spr_t_cwr_fsg_elm
 a sub-model (other type of LM) More...
 
union  _Union2_CWR_LM_FSG_
 
struct  SprCwrFsgSm
 a sub-model More...
 
struct  SprCwrFsg
 
struct  SprCwrFsgSmFsgi
 interface to a FSG sub-model More...
 
struct  SprCwrFsgSmElmi
 interface to a sub-model (other type of LM) More...
 

Functions

SprCwrFsg * spr_cwr_fsg_free (SprCwrFsg *lm)
 Free all data corresponding to the LM. Return NULL. More...
 
int spr_cwr_fsg_uninstall (SprCwrLMInterface *lmi, SprCwrLMInstance *lm_data)
 
int spr_cwr_fsg_write (SprStream *fd, const SprCwrFsg *lm)
 
int spr_cwr_fsg_install (SprCwrLMInterface *lmi, SprCwrLMInstance *lm_data, unsigned int sz_lm_data, const SprCwrFsg *lm, const SprCwrSLex *lex)
 
SprCwrFsg * spr_cwr_fsg_read (SprCwrFsg *dst, const char *fname, const char *options)
 

Variables

const int spr_cwr_lm_fsg_offs_sm_id_main
 needed since we don't have descent object yet More...
 
const SprCmdOptDesc spr_cwr_od_fsg []
 
const SprCwrSRMDesc spr_cwr_srm_desc_lm_fsg
 

Detailed Description

Finite state grammar.

Implementation of a finite state grammar (FSG) with sub-models. The implementation is optimized for speed at the cost of memory; hence one should not try to encode a large N-gram as an FSG using this implementation.

An observation in a given state is evaluated as follows:

Remarks on eps-arcs
Eps-arcs can introduce a substantial overhead. Avoid eps-arcs if possible. If eps-arcs are needed, try to adhere to the following guidelines:
  • when two or more states are connected with eps-arcs, only one state (or none) should be an end-state; a simple technique to fulfill this guideline is to make sure that source states with eps-arcs are not end-states, by using an eps-arc referring to a global end-state instead
  • when two or more states are connected with eps-arcs, only one state (or none) should have a fall-back state
  • avoid chains of eps-arcs; or limit the depth using the <eps_lvl_depth> or <prob_clip> configuration parameters
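As an illustration of the first guideline: suppose states 3 and 4 are connected by eps-arcs and both could terminate the model. Instead of marking both as end-states, give each an eps-arc (word '[]', cost 0) to a single global end-state, here state 9, which becomes the only end-state. The syntax follows the [FSG_file_format] section below; the state numbers and costs are illustrative:

```
arc 3 9 [] 0.0 []
arc 4 9 [] 0.0 []
end 0.0 NO_FIT 9
```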
Remark on sub-models
Sub-models introduce a substantial overhead. For optimal performance: combine the FSGs into one monolithic FSG and minimize that.
Remark on end-states and end conditions
Avoid the TRY condition, as it may introduce a large number of alternatives.

[FSG_file_format]
name <str>
Give the FSG a name.
Nstate <Nstate>
The FSG uses <Nstate> states (may be overestimated by a small fraction).
Narc <Narc>
The FSG uses approximately <Narc> (word) arcs; the value does not have to be exact and should only count the arcs that observe words; arcs that start sub-models should not be counted.
fail_cost <cost>
If the incoming word cannot be explained at all, then assign a cost <cost> (and stay in the same state).
accept <word/submod> ...
Define a list of words the FSG (and its sub-models) can use as input symbols, i.e. only these words/sub-models may be observed on an arc.
output <word/submod> ...
Define the words that can be output by the FSG.
arc <from> <to> <word/submod> <cost> <symb_out>([]) [to...] ...
Define one or more arcs starting from node <from>. Each arc goes from node <from> to node <to> while observing word <word> (use '[]' for an epsilon arc) and with a cost <cost>. Optionally, there can also be an output symbol (no output is indicated by []).
end <ecost> <TRY/NO_END/NO_FIT/OOV> <state_nr> ...
Signal that the sub-model can end in the given set of states with a given extra cost and when a certain condition is met:
OOV if the observed word is an OOV word;
NO_FIT if the observed word does not fit the current state;
TRY try to both stop the sub-model and continue it (results in two hypotheses to investigate).
fb_state <to_state> <cost> <state_nr> ...
If the incoming word cannot be observed in state(s) <state_nr>, then fall back to state <to_state> with an extra cost <cost>.
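Putting the keywords above together, a minimal grammar file might look as follows. The word list, costs, and state numbering are illustrative, and details such as whitespace handling may differ from the actual format. The grammar accepts "one" or "two" followed by "three", ending in state 2:

```
name digits
Nstate 3
Narc 3
fail_cost 10.0
accept one two three
output one two three
arc 0 1 one 0.5 []
arc 0 1 two 0.7 []
arc 1 2 three 0.3 []
end 0.0 NO_FIT 2
```

Note that Narc counts only the three word arcs; per the description above, eps-arcs and sub-model arcs would not be included.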

[Load_flags]
check_lvl=<I32:val>(0)
Do some checking to see if there are errors in the LM, the level controls the output: (0) no checking, (1) give error count, (2) give detailed error info.
closed=<I32:val>(1)
The FSG has a closed vocabulary; report open-vocabulary words.
max_eps_arc=<I32:val>(1)
Limit the number of consecutive eps-arcs to <val>.
max_arc=<I32:val>(1024)
Specify the maximum number of arcs that will match for any given word; calculating this from the FSG is slow and hence for practical reasons a conservative upper bound is used. Specify -1 to calculate the real number.
prob_clip=<F32:val>(-999.9)
Discard options when the probability is too low.
FWDstate=<I32:val>(-1)
Use the probabilities in the specified state as static LM smearing probs. By default max(P(w|*)) is used as the static LM smearing prob (excluding submod-end and eps-arc transitions).
ext=<String:val>(NULL)
A file containing word extensions to the base lexicon.
wdsf=<String:val>(NULL)
A file containing word dependent LM scale factors.
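For example, a load-flag string enabling detailed checking, a deeper eps-arc limit, and probability clipping might look as follows; the values and the use of whitespace as flag separator are assumptions, not confirmed by this page:

```
check_lvl=2 max_eps_arc=2 prob_clip=-50.0
```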
Date
March 2004
Author
Kris Demuynck
Revision History:
15/03/04 - KD
implemented based on cwr_lm_template and cwr_lm_std.
25/11/04 - KD
added eps-arcs + fixed some bugs
20/04/05 - KD
fixed some bugs, made it more general
Bug:
The implementation is not optimal in some circumstances, e.g. when reaching a sure end of a sub-model: the current implementation stays in the sub-model. Closing the sub-model would reduce the number of open LM-contexts and hence allow for more recombination in the search.