SPRAAK
|
Finite state grammar. More...
Data Structures | |
struct | SprCwrFsgWarc |
hash table entry: {state,wid} -> this struct More... | |
struct | SprCwrFsgMarc |
list of sub-models for given FSG-node More... | |
union | _Union1_CWR_LM_FSG_ |
struct | SprCwrFsgSmwh |
hash table entry: {submod,wid} -> this struct More... | |
struct | SprCwrFsgState |
struct | spr_t_cwr_fsg_sm_hdr |
common header for the sub-model structures More... | |
struct | spr_t_cwr_fsg_fsg |
a FSG (sub)model More... | |
struct | spr_t_cwr_fsg_ngram |
an N-gram sub-model More... | |
struct | spr_t_cwr_fsg_elm |
a sub-model (other type of LM) More... | |
union | _Union2_CWR_LM_FSG_ |
struct | SprCwrFsgSm |
a sub-model More... | |
struct | SprCwrFsg |
struct | SprCwrFsgSmFsgi |
interface to a FSG sub-model More... | |
struct | SprCwrFsgSmElmi |
interface to a sub-model (other type of LM) More... | |
Functions | |
SprCwrFsg * | spr_cwr_fsg_free (SprCwrFsg *lm) |
Free all data corresponding to the LM. Return NULL. More... | |
int | spr_cwr_fsg_uninstall (SprCwrLMInterface *lmi, SprCwrLMInstance *lm_data) |
int | spr_cwr_fsg_write (SprStream *fd, const SprCwrFsg *lm) |
int | spr_cwr_fsg_install (SprCwrLMInterface *lmi, SprCwrLMInstance *lm_data, unsigned int sz_lm_data, const SprCwrFsg *lm, const SprCwrSLex *lex) |
SprCwrFsg * | spr_cwr_fsg_read (SprCwrFsg *dst, const char *fname, const char *options) |
Variables | |
const int | spr_cwr_lm_fsg_offs_sm_id_main |
needed since we don't have descent object yet More... | |
const SprCmdOptDesc | spr_cwr_od_fsg [] |
const SprCwrSRMDesc | spr_cwr_srm_desc_lm_fsg |
Finite state grammar.
Implementation of a finite state grammar (FSG) with sub-models. The implementation is optimized towards speed at the cost of memory; hence one should not try to encode a large N-gram as FSG using this implementation.
An observation in a given state is evaluated as follows:
[FSG_file_format] | |
---|---|
name <str> | |
Give the FSG a name. | |
Nstate <Nstate> | |
The FSG uses <Nstate> states (may be overestimated with a small fraction). | |
Narc <Narc> | |
The FSG uses approximately <Narc> (word) arcs; does not have to be exact, should only include the arcs which observe words, arcs which start sub-models should not be counted. | |
fail_cost <cost> | |
If the incomming word cannot be explained at all, then assign a cost <cost> (and stay in the same state). | |
accept <word/submod> ... | |
Define a list of words the FSG (and its sub-models) can use as input symbols, i.e. only these words/sub-models may be observed on an arc. | |
output <word/submod> ... | |
Define the words that can be output by the FSG. | |
arc <from> <to> <word/submod> <cost> <symb_out>([]) [to...] ... | |
Define one ore more arcs starting from node <from>. Each arc goes from node <from> to node <to> while observing word <word> (use '[]' for an epsilon arc) and with a cost <cost>. Optionally, there can also be an output symbol (no output is indicated by []). | |
end <ecost> <TRY/NO_END/NO_FIT/OOV> <state_nr> ... | |
Signal that the sub-model can end in the given set of states with a given extra cost and when a certain condition is met: OOV if the observed word is an OOV word; NO_FIT if the observed word does not fit for the current state TRY try to both stop the sub-model and continue it (results in two hypotheses to investigate) | |
fb_state <to_state> <cost> <state_nr> ... | |
If the incomming word cannot be observed in state(s) <state_nr>, then fall-back to state <to_state> with an extra cost <cost>. |
[Load_flags] | ||
---|---|---|
check_lvl=<I32:val>(0) | ||
Do some checking to see if there are errors in the LM, the level controls the output: (0) no checking, (1) give error count, (2) give detailed error info. | ||
closed=<I32:val>(1) | ||
The FSG is closed vocab – report open vocab words. | ||
max_eps_arc=<I32:val>(1) | ||
Limit the number of consequetive eps-arcs to <max_eps_lvl>. | ||
max_arc=<I32:val>(1024) | ||
Specify the maximum number of arcs that will match for any given word; calculating this from the FSG is slow and hence for practical reasons a conservative upperbound is used. Specify -1 to calculate the real number. | ||
prob_clip=<F32:val>(-999.9) | ||
Discard options when the probability is too low. | ||
FWDstate=<I32:val>(-1) | ||
Use the probabilities in the specified state as static LM smearing probs. By default max(P(w|*)) is used as static LM smearing prob. (excluding submod-end and eps-arc transitions). | ||
ext=<String:val>(NULL) | ||
A file containing word extension on the base lexicon. | ||
wdsf=<String:val>(NULL) | ||
A file containing word dependent LM scale factors. |