SPRAAK
Finite state grammar.
Data Structures | |
| struct | SprCwrFsgWarc |
| hash table entry: {state,wid} -> this struct | |
| struct | SprCwrFsgMarc |
| list of sub-models for given FSG-node | |
| union | _Union1_CWR_LM_FSG_ |
| struct | SprCwrFsgSmwh |
| hash table entry: {submod,wid} -> this struct | |
| struct | SprCwrFsgState |
| struct | spr_t_cwr_fsg_sm_hdr |
| common header for the sub-model structures | |
| struct | spr_t_cwr_fsg_fsg |
| a FSG (sub)model | |
| struct | spr_t_cwr_fsg_ngram |
| an N-gram sub-model | |
| struct | spr_t_cwr_fsg_elm |
| a sub-model (other type of LM) | |
| union | _Union2_CWR_LM_FSG_ |
| struct | SprCwrFsgSm |
| a sub-model | |
| struct | SprCwrFsg |
| struct | SprCwrFsgSmFsgi |
| interface to a FSG sub-model | |
| struct | SprCwrFsgSmElmi |
| interface to a sub-model (other type of LM) | |
Functions | |
| SprCwrFsg * | spr_cwr_fsg_free (SprCwrFsg *lm) |
| Free all data corresponding to the LM. Return NULL. | |
| int | spr_cwr_fsg_uninstall (SprCwrLMInterface *lmi, SprCwrLMInstance *lm_data) |
| int | spr_cwr_fsg_write (SprStream *fd, const SprCwrFsg *lm) |
| int | spr_cwr_fsg_install (SprCwrLMInterface *lmi, SprCwrLMInstance *lm_data, unsigned int sz_lm_data, const SprCwrFsg *lm, const SprCwrSLex *lex) |
| SprCwrFsg * | spr_cwr_fsg_read (SprCwrFsg *dst, const char *fname, const char *options) |
Variables | |
| const int | spr_cwr_lm_fsg_offs_sm_id_main |
| needed since we don't have decent objects yet | |
| const SprCmdOptDesc | spr_cwr_od_fsg [] |
| const SprCwrSRMDesc | spr_cwr_srm_desc_lm_fsg |
Implementation of a finite state grammar (FSG) with sub-models. The implementation is optimized for speed at the cost of memory; hence one should not try to encode a large N-gram as an FSG using this implementation.
An observation in a given state is evaluated as follows:
| [FSG_file_format] | |
|---|---|
name <str> | |
Give the FSG a name. | |
Nstate <Nstate> | |
The FSG uses <Nstate> states (may be overestimated by a small fraction). | |
Narc <Narc> | |
The FSG uses approximately <Narc> (word) arcs. The count does not have to be exact. It should only include the arcs that observe words; arcs that start sub-models should not be counted. | |
fail_cost <cost> | |
If the incoming word cannot be explained at all, then assign a cost <cost> (and stay in the same state). | |
accept <word/submod> ... | |
Define a list of words the FSG (and its sub-models) can use as input symbols, i.e. only these words/sub-models may be observed on an arc. | |
output <word/submod> ... | |
Define the words that can be output by the FSG. | |
arc <from> <to> <word/submod> <cost> <symb_out>([]) [to...] ... | |
Define one or more arcs starting from node <from>. Each arc goes from node <from> to node <to> while observing word <word> (use '[]' for an epsilon arc) and with a cost <cost>. Optionally, there can also be an output symbol (no output is indicated by []). | |
end <ecost> <TRY/NO_END/NO_FIT/OOV> <state_nr> ... | |
Signal that the sub-model can end in the given set of states with a given extra cost when a certain condition is met: OOV if the observed word is an OOV word; NO_FIT if the observed word does not fit the current state; TRY to try both stopping the sub-model and continuing it (results in two hypotheses to investigate). | |
fb_state <to_state> <cost> <state_nr> ... | |
If the incoming word cannot be observed in state(s) <state_nr>, then fall back to state <to_state> with an extra cost <cost>. | |
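The interplay of `arc`, `fb_state` and `fail_cost` described above can be illustrated with a small toy model. This is a minimal sketch and not SPRAAK's implementation: all names (`ToyArc`, `ToyFallback`, `ToyFsg`, `toy_fsg_step`) are hypothetical, a linear search stands in for the {state,wid} hash table, only a single level of fall-back is tried, and sub-models, epsilon arcs and output symbols are left out.

```c
#include <assert.h>
#include <stddef.h>

#define NO_FB (-1)  /* hypothetical marker: state has no fall-back */

typedef struct { int from, to, wid; float cost; } ToyArc;  /* "arc" line */
typedef struct { int to_state; float cost; } ToyFallback;  /* "fb_state" line */

typedef struct {
    const ToyArc *arcs;     /* all word arcs */
    int n_arcs;
    const ToyFallback *fb;  /* one entry per state */
    float fail_cost;        /* "fail_cost" line */
} ToyFsg;

/* Linear search; stands in for a {state,wid} hash lookup. */
static const ToyArc *toy_find_arc(const ToyFsg *g, int state, int wid)
{
    for (int i = 0; i < g->n_arcs; i++)
        if (g->arcs[i].from == state && g->arcs[i].wid == wid)
            return &g->arcs[i];
    return NULL;
}

/* Observe word `wid` in `*state`; update `*state` and return the cost. */
static float toy_fsg_step(const ToyFsg *g, int *state, int wid)
{
    assert(g != NULL && state != NULL);
    const ToyArc *a = toy_find_arc(g, *state, wid);
    if (a) {                        /* 1. direct arc */
        *state = a->to;
        return a->cost;
    }
    int fb = g->fb[*state].to_state;
    if (fb != NO_FB) {              /* 2. fall back, then retry (assumed: one level only) */
        a = toy_find_arc(g, fb, wid);
        if (a) {
            float c = g->fb[*state].cost + a->cost;
            *state = a->to;
            return c;
        }
    }
    return g->fail_cost;            /* 3. unexplained word: stay in the same state */
}

/* Toy grammar: 0 --hello/1.0--> 1 --world/0.5--> 2; state 1 falls back to 0 at cost 2.0. */
enum { W_HELLO = 0, W_WORLD = 1, W_OTHER = 2 };
static const ToyArc toy_arcs[] = { {0, 1, W_HELLO, 1.0f}, {1, 2, W_WORLD, 0.5f} };
static const ToyFallback toy_fb[] = { {NO_FB, 0.0f}, {0, 2.0f}, {NO_FB, 0.0f} };
static const ToyFsg toy = { toy_arcs, 2, toy_fb, 10.0f };
```

Starting in state 0, observing `W_HELLO` follows the direct arc to state 1. Observing `W_HELLO` again in state 1 falls back to state 0 (extra cost 2.0) and then takes the same arc. A word with no arc and no fall-back is charged `fail_cost` and leaves the state unchanged.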
| [Load_flags] | ||
|---|---|---|
check_lvl=<I32:val>(0) | ||
Check the LM for errors; the level controls the output: (0) no checking, (1) give error count, (2) give detailed error info. | ||
closed=<I32:val>(1) | ||
The FSG is closed vocab – report open vocab words. | ||
max_eps_arc=<I32:val>(1) | ||
Limit the number of consecutive eps-arcs to <val>. | ||
max_arc=<I32:val>(1024) | ||
Specify the maximum number of arcs that will match for any given word; calculating this from the FSG is slow, and hence for practical reasons a conservative upper bound is used. Specify -1 to calculate the real number. | ||
prob_clip=<F32:val>(-999.9) | ||
Discard options whose probability is below <val>. | ||
FWDstate=<I32:val>(-1) | ||
Use the probabilities in the specified state as static LM smearing probabilities. By default, max(P(w|*)) is used as the static LM smearing probability (excluding submod-end and eps-arc transitions). | ||
ext=<String:val>(NULL) | ||
A file containing word extensions to the base lexicon. | ||
wdsf=<String:val>(NULL) | ||
A file containing word-dependent LM scale factors. | ||
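To make the effect of a limit such as `max_eps_arc` concrete, the sketch below marks the states reachable from a start state through a bounded number of consecutive epsilon arcs. It is an illustration only, under stated assumptions: the names (`ToyEpsArc`, `toy_eps_closure`, `TOY_MAX_STATES`) are hypothetical, and how SPRAAK itself enforces the limit, or what it does with paths that exceed it, is not documented here.

```c
#include <assert.h>

#define TOY_MAX_STATES 8  /* hypothetical size of the toy state space */

typedef struct { int from, to; } ToyEpsArc;  /* an epsilon arc: no word observed */

/* Mark in `reached` every state that can be entered from `state`
 * by following at most `max_eps` consecutive epsilon arcs.
 * The depth bound also guarantees termination on epsilon cycles. */
static void toy_eps_closure(const ToyEpsArc *arcs, int n_arcs, int state,
                            int max_eps, unsigned char *reached)
{
    assert(state >= 0 && state < TOY_MAX_STATES);
    reached[state] = 1;
    if (max_eps <= 0)
        return;  /* budget of consecutive epsilon arcs is used up */
    for (int i = 0; i < n_arcs; i++)
        if (arcs[i].from == state)
            toy_eps_closure(arcs, n_arcs, arcs[i].to, max_eps - 1, reached);
}

/* Toy chain of epsilon arcs: 0 -> 1 -> 2 -> 3 */
static const ToyEpsArc toy_eps[] = { {0, 1}, {1, 2}, {2, 3} };
```

With the chain above and a limit of 1 (the documented default for `max_eps_arc`), only states 0 and 1 are marked when starting from state 0; raising the limit to 2 also marks state 2.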