Large Vocabulary Continuous Speech Recognition with Hidden Markov models.

Data Structures
struct	TokenHeap

struct	CwrTrack

Detailed Description

Large Vocabulary Continuous Speech Recognition with Hidden Markov models.

spr_cwr_main [-i input stream](stdin) [-o output stream](stdout) [-e error stream]

Parameters

-i input stream	File/stream to read commands from.
-o output stream	File/stream to write results to.
-e error stream	File/stream to write error messages to.

Interactive recognition system for large vocabulary continuous speech recognition. The available commands can be recalled by entering the 'help' command after the prompt.
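As a rough illustration of how a command script for the `-i` stream might be put together, the sketch below assembles a short session from commands documented on this page. All file names and the pruning threshold are hypothetical placeholders, and the order in which resources have to be loaded (as well as whether a language model is also required) is an assumption.

```python
def build_session(ci_phones, cd_phones, lexicon, hmm, audio, prune_thresh=100.0):
    """Return a command script (one command per line) for the -i stream.

    The command names come from this page; the argument values and the
    loading order are illustrative assumptions only.
    """
    commands = [
        "% minimal recognition session (comment line)",
        f"load phone_set {ci_phones} {cd_phones}",
        f"load lexicon {lexicon}",
        f"load hmm {hmm}",
        f"search maximum {prune_thresh}",
        f"file {audio}",
        "quit",
    ]
    return "\n".join(commands) + "\n"


# Hypothetical file names, for illustration only.
script = build_session("ci.phones", "cd.phones", "words.lex", "acoustic.hmm", "utt1.wav")
print(script, end="")
```

The resulting text could be saved to a file and passed with `-i`, or written to the standard input of the program.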

[commands]
`load phone_set <ci_fname> <cd_fname>`
Load a new phone(me) set. Both the context independent and the context dependent set must be specified. The context dependent set also contains information about the HMMs.
`load lexicon <bin_lex/text_lex> [bin_lex] [unwind]`
unwind (options are separated by a ';' character):
`check_lvl=<I32:val>`
Check whether the lexicon is correct (level>=0) and optimal (level>=1).
`add_in_front=<String:val>`
Add a specified phone sequence in front of all word/sentence descriptions.
`add_between=<String:val>`
Add a specified phone sequence between all words in a sentence description.
`add_at_rear=<String:val>`
Add a specified phone sequence at the rear of all word/sentence descriptions.
`sent_context=<String:val>`
Alignment: a simple phone sequence (no optional sequences) used as context before (and after) a sentence for the CI-phone to the CD-phone conversion (not applied with the assimilation rules).
`sil_word=<String:val>`
Recognition: a word used for marking valid start sequences (all paths extending from that word) and end sequences (all paths going to that word).
`assimilation=<String:val>`
Use the specified file with assimilation rules.
`apply_Wrules=<String:val>`
Apply only those within-word rules that have any of the specified characters set. By default all rules are applied within-word.
`apply_Arules=<String:val>`
Apply only those across-word rules that have any of the specified characters set. By default all rules are applied across-word.
`cross_word=<no/yes>`
(Do not) allow cross-word CD-phones (value: yes/no). By default cross-word CD-phones are allowed.
`quin_patch=<I8:val>`
Recognition: Set to a value of 2 to use cross-word quinphones even for sequences of words that consist of a single phoneme. To allow this, the single-phoneme words [spw] and multi-phoneme words [mpw] are isolated and reorganized in a new network as follows: ROOT->(s1,s2); s1->spw->(s2,s3); s2->mpw->(s1,s2); s3->spw->(s1,s2). Alignment: allow crossing up to the given number of word boundaries in a match (this gives no problems since the network for alignment is acyclic).
`print_info=<none/some/all>`
(Do not) print advance information while converting the lexicon network (values=all/some/none). By default no advance info is printed.
`lc_weight=<I32:val>`
Number between [-1024,1024] giving more or less importance to the left (positive values) or right context. By default the left context is given some preference (lc_weight=1).
`old_multi_pron`
Support the old way of specifying multiple pronunciations, i.e. multiple entries for the same word.
`no_wsep`
Do not separate the words by a dummy START state. This allows looking across two word boundaries (normal operation only allows one word boundary, which may give problems when quinphones are used).
`graph_ext=<I8:val>`
Character to indicate grapheme extensions. Default: '('.
`xO<I32:val>`
Optimization level (0=NO, 1=Normal, 2=Full; the default is 2).
`RExO`
Redo the network optimisation when loading the binary lexicon.
Load a new lexicon. The type of the lexicon (textual or binary) is detected automatically. If the lexicon is in the textual format and the second argument is specified (and does not equal '*'), the intermediate binary lexicon representation will be saved, providing faster binary loads afterwards. For the unwind options, see read_lex_desc().
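The ';'-separated unwind string can be assembled programmatically. The following is a minimal sketch, assuming that valued options are written as `key=value`, that bare flags such as `no_wsep` (or `xO2`) are written as-is, and that no trailing separator or quoting is needed; the helper name `build_option_string` is hypothetical.

```python
def build_option_string(options):
    """Join options into a ';'-separated string.

    A value of None marks a bare flag (e.g. no_wsep); every other entry
    is written as key=value. Quoting rules are not handled here.
    """
    parts = []
    for key, value in options.items():
        parts.append(key if value is None else f"{key}={value}")
    return ";".join(parts)


unwind = build_option_string({"check_lvl": 1, "cross_word": "no", "no_wsep": None})
print(unwind)  # check_lvl=1;cross_word=no;no_wsep
```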
`load lm <fname> [options]()`	`[requires c-strings]`
options (options are separated by a ';' character):
`FWD<I32:val>(1)`
Method used to calculate the forwarded probabilities: (0) no prob. forwarding, (1) the unigram probabilities, (2) a weighted sum of the N-gram probabilities over all seen events, (3) sqrt(fwd_1*fwd_2), (4) (fwd_1+fwd_2)/2.
`check_lvl=<I8:val>(0)`
Do some checking to see if there are errors in the LM, the level controls the output: (0) no checking, (1) give error count, (2) give detailed error info, (3) do not allow non zero back-off costs on empty LM-contexts, (4) report the first empty LM-context with a non zero back-off cost, (5) do not allow non-zero LM-contexts, (6) report the first non-zero LM-context.
`CLOSED(NULL)`
Do not allow a closed vocabulary LM to be used in an open vocabulary setup.
`CSR(NULL)`
Do not treat </s> as a blocking end-of-sentence symbol.
`ELMC0(NULL)`
Force the back-off cost of empty LM-contexts to zero.
`ext=<String:val>(NULL)`
A file containing word extension on the base lexicon.
`wdsf=<String:val>(NULL)`
A file containing word dependent LM scale factors.
Load a new N-gram language model.
`load lm=<N-gram/FSG/comb/dump/XXX> <fname> [options]()`	`[requires c-strings]`
Load a new language model of a certain type.
`load hmm <hmm> [mvg] [sel] [opt]`
Load new HMMs and the corresponding set of gaussians. For both the mvg and select file name a '*' may be specified to indicate that the default file name should be used.
`load hmm=<std/mdt/dft> <hmm> [mvg] [sel] [opt]`
Load new HMMs of a certain type and the corresponding set of gaussians. For both the mvg and select file name a '*' may be specified to indicate that the default file name should be used.
`set fast models <ON/OFF>(ON)`
Fast evaluation of the acoustic models.
`set norm <ON/OFF>(ON)`
Set score normalisation (divide state likelihoods by the unconditional frame likelihood, i.e. poor man's posteriors) on or off (only relevant for the fast models).
`set asf <asf>(0.434294482)`
Set the scale factor for the acoustic scores (log likelihoods). By default, log10 values are used (<asf>=1/log(10)=0.434...).
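The default value can be checked numerically. The sketch below assumes that the unscaled acoustic scores are natural-log likelihoods, which is what the relation 1/log(10) suggests; the function name is hypothetical.

```python
import math

# 1/ln(10) = 0.434294..., the documented default for <asf>.
ASF_LOG10 = 1.0 / math.log(10.0)


def scale_acoustic_score(ln_likelihood, asf=ASF_LOG10):
    """Scale a natural-log likelihood by the acoustic scale factor."""
    return asf * ln_likelihood


# With the default factor, a natural-log score becomes a log10 score.
print(scale_acoustic_score(math.log(1000.0)))  # approximately 3.0
```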
`set top_n <values>`
Set the top_n value(s) (number of evaluated gaussians).
`set rmg <rmg_flags> ...`
Set the rmg flags (fast selection of gaussians).
`set sel_gauss <ON/OFF>(ON)`
Limit the pool of gaussians to those needed by the search space (states needed by the lexicon); this speeds up the search for small problems but changes the normalized acoustic scores.
`set logminout <value>`
Set the logminout value (flooring of gaussians and outputprobs).
`search maximum <prune_thresh> [frac_wend_prune](0.6) [Ntokens_max](-1) [size_lm_cache](-1) [size_lmc_cache](-1)`
Set the parameters that control the search effort or are related to it. See modify_search_space() for more details.
`search minimum <Ntokens_min>(0) [threshold_frac](1.0)`
Try to assure a minimal beam-width. See modify_search_space() for more details.
`search adapt <H0>(-1) [H1](-1) [Hscale](1.0) [t0](10) [Ntoken_lt_frac](-1) [t1](30)`
Adjust the beam search parameters based on observable statistics concerning the search and acoustics. The pruning threshold is adjusted based on the average acoustic entropy <Hav> for the last frames. The average acoustic entropy is computed with a first order filter with time constant <t0> (measured in number of frames). The pruning threshold is decreased by the following value: max(min(<Hav>,<H1>)-<H0>,0)*<Hscale>. The maximum beam width is adjusted based on the long term average beam-width. The long term average is computed with a first order filter with time constant <t1> (measured in number of frames). The upper limit on the long term average equals <Ntoken_lt_frac>*<Ntokens_max> with <Ntokens_max> the instantaneous maximum beam-width (see "search maximum"). Use negative/illegal values to disable threshold or beam width adjustments.
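The quantities described for `search adapt` can be sketched as follows. This is an illustration only: the exact form of the first order filter is an assumption (a simple exponential update that moves 1/time_constant of the way to each new value), the clipped entropy term is read as a product with Hscale, and all function names are hypothetical.

```python
def update_average(avg, value, time_constant):
    """One step of a first-order filter (assumed exponential form)."""
    return avg + (value - avg) / time_constant


def threshold_decrease(h_av, h0, h1, h_scale=1.0):
    """Amount by which the pruning threshold is decreased:
    max(min(Hav, H1) - H0, 0) * Hscale."""
    return max(min(h_av, h1) - h0, 0.0) * h_scale


def long_term_limit(ntoken_lt_frac, ntokens_max):
    """Upper limit on the long term average beam width."""
    return ntoken_lt_frac * ntokens_max


# Track the average entropy over a few frames with t0 = 10 frames.
h_av = 2.0
for frame_entropy in (3.0, 3.0, 3.0):
    h_av = update_average(h_av, frame_entropy, 10)

print(threshold_decrease(h_av, h0=1.0, h1=2.0, h_scale=0.5))  # clipped at H1
print(long_term_limit(0.5, 20000))
```

With these example values the averaged entropy exceeds H1, so the decrease is limited to (H1 - H0) * Hscale.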
`search trace <nodes/phones/nothing>`
Store additional traceback information while recognizing.
`search mode <CSR/CWR/IWR>(CWR)`
Set the recognition mode to CSR (multiple sentences), CWR (single sentence) or IWR (single word).
`search lmi [A](1.0) [C](0.0) [FULL/LTD/UNI/NONE](UNI)`
Set the parameters that modify the impact of the language model. The total LM_cost is calculated as follows: cost=A*log(LM_prob)+C. The third parameter controls the LM-forwarding (early application of the LM).
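The cost formula can be written out directly. In this sketch the base of the logarithm is an assumption (log10 is used by default, in line with the default acoustic scale factor), and the function name is hypothetical.

```python
import math


def lm_cost(lm_prob, a=1.0, c=0.0, log=math.log10):
    """cost = A * log(LM_prob) + C; the log base is an assumption."""
    return a * log(lm_prob) + c


print(lm_cost(0.01))                 # default A and C
print(lm_cost(0.01, a=2.0, c=-1.0))  # stronger LM plus a constant offset per word
```

A larger A gives the language model more weight relative to the acoustic scores, while C adds the same offset to every word.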
`search constraint <sentence/word/none>(sentence) [start+/start-](start+) [end+/end-](end+) [max_empty_words](1)`
Constrain the search space. See modify_search_space() for more details.
`load ssp_script <ssp_script> ...`
Load a new signal processing script.
`preprocess <input_file> [output_file]({key=FORMAT:ASCII;fname=stdout;}) [btime/f0](0.0) [etime/nfr](-1)`
Preprocess a given file. This will configure the preprocessing and allow all speaker adaptive settings to adapt to the audio.
`load spkr_data <fname>`
Load previously stored speaker information from the preprocessing pipeline. Note: the preprocessing must be fully configured. Process an empty file with a legal header (see the 'preprocess' command) if necessary.
`save spkr_data <fname>`
Save all speaker information from the preprocessing pipeline.
`set ssp obsdir <obsdir>`
Set the directory name to add in front of all data file names.
`set ssp suffix <suffix>`
Set the file suffix to add to all data file names.
`set ssp timebase <CONTINUOUS/DISCRETE> [ABSOLUTE+/ABSOLUTE/RELATIVE]`
Select whether CONTINUOUS (seconds) or DISCRETE (frame number) time indications have to be used in the file and block commands. The second parameter selects whether ABSOLUTE times (e.g. begin and end times) or RELATIVE times (e.g. start at frame X and process Y frames) are preferred. The ABSOLUTE+ option will process up to and including the last frame. By default, an ABSOLUTE-CONTINUOUS timebase is used.
`file <fname> [btime](0) [etime](-1) [spkr_id](-1)`
Open the specified file (if the file is a new one) and process the selected data. The meaning of <btime> and <etime> is explained in the timebase command. Use the <spkr_id> to save/reload adaptive beam search parameters on a speaker by speaker basis.
`<record+play/record> [a2d_args] ...`
Record data and process it.
`play [fname]`
Play back a given (or the last recorded) file.
`track <init/clear> <topN> <file>`
Setup/clear the search tracker, i.e. dump for each frame all available information on the <topN> best tokens to the file <file>.
`search prune_wl <init/clear> [max_nfr](200) [score_dec](32.0) [file]`
Setup/clear an extra pruning constraint based on the word length. The acoustic score of all words that are longer than <max_nfr> frames is decremented (per extra frame) by a value of <score_dec>. Word specific values for <max_nfr> can be specified in the optional file <file>.
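The word-length penalty amounts to a simple piecewise-linear function of the word duration. The sketch below uses the documented defaults; the function name is hypothetical, and how the decrement is applied inside the search is not shown.

```python
def word_length_penalty(nfr, max_nfr=200, score_dec=32.0):
    """Total score decrement for a word that lasts nfr frames.

    Frames up to max_nfr are free; every extra frame costs score_dec.
    """
    extra_frames = max(nfr - max_nfr, 0)
    return extra_frames * score_dec


print(word_length_penalty(150))             # within the limit, no penalty
print(word_length_penalty(210))             # 10 extra frames
print(word_length_penalty(60, max_nfr=50))  # word-specific limit
```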
`word_graph <init/clear> <file&options> [write_delay](-1) [finalize_delay](-1) [options]`
file&options (options are separated by a ';' character):
`fname=<String:val>`
`ei=<none/node/node+score/BE>`
`<String:val>`
options (options are separated by a ';' character):
`store_nodes=<no/yes>(no)`
`best_end_only=<no/yes>(no)`
Setup/clear the word graph handler. A (-1) delay means infinite delay: as long as the system deems necessary, or until the end of the file. If the option 'store_nodes=yes' is given, the node sequences (see the 'search trace' command) or derivative information may be stored in the lattice as well (see the <file&options> argument for the additional options). The 'best_end_only=yes' option will output only a single end node, corresponding to the 'best' end condition (see the 'search constraint' command).
`early_out <init/clear> <program> [text] ...`
Setup/clear an early output handler (displaying results while processing the data). If the program name has the format '* < input_fname > output_fname', an existing program is assumed, with the specified communication pipes. See also 'escape sequences'.
`early_out timer <millisec>`
Set the early output timer.
`print scores [on/off](on)`
Do (not) print the model scores before the result is given.
`print words [on/off](on)`
Print words instead of model numbers as result.
`print normalized [on/off](on)`
Print normalized scores (score divided by nr. of frames).
`print nfr [on/off](on)`
Always print the number of frames used.
`print <top_mod>`
Print only the results for the <top_mod> best models.
`list`
List the recognition parameters.
`recognizer <check/reset/compact>`
Perform a self test on the search space (check), or reset the search space and optionally free all non used memory (reset/compact).
`list models`
List the active vocabulary.
`give vocab`
List the vocabulary in a raw format.
`[give/g/write/w] units <sentence> ...`
Give the phones as stored in the lexicon of a given word/sentence. The write version will give a representation that is close to the actual network structure. The give version will give a compact representation.
`print lexicon [dest](stdout)`
Print the lexicon to a file.
`write rmg <rmg_fname>`
Write the rmg-information for quick initialization later on.
`give segmentation <word/node/both>`
Print a detailed segmentation for the recognized sentence.
`give segmentation <word_nr>`
Give the recognized phones or states (and their duration) as recognized by the system for the given word (specified by its place in the recognized sentence).
`query <lm/lm*> <action> [context] [words]`
action (options are separated by a ',' character):
`word`
Print for all specified words (1) whether they are in the lexicon, (2) whether they are in the LM, and (3) their static LM-smearing prob.
`context`
Print information concerning the given context (e.g. discount prob.) and how the specified words react in that context (possible classes + probs).
`next`
Give all possible successor words + classes (no fall back) and the corresponding probability for a given context.
`prev`
Give all possible predecessor words + classes (no fall back in the reverse LM) for a given context.
`tag`
Give the best possible class sequence corresponding to a given sequence of words for a certain start context and its probability.
`explain`
Calculate the total probability of a given sequence of words for a certain start condition and explain all steps.
`test_prob`
Verify the good operation of the func_prob(), func_lmcn(), func_lmcf() and func_lmcq() routines for the given sentence.
`test_pall`
Verify the good operation of the func_pall() routine for the given sentence.
Query the language model with scaling of the probabilities (lm) or without (lm*). The words must be separated by underscores. See also LM_query().
`calculate perplexity <test_corpus> [sentence_start_cond] [sentence_end_cond]`
Calculate the perplexity for a given test corpus assuming a single sentence per entry. The start condition (usually <s>) and the end condition (usually </s>) must be specified. The probability of the end condition is included in the perplexity calculation.
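For reference, the standard definition of perplexity can be sketched as below. This is not taken from the program itself: it assumes that each sentence contributes the probabilities of its words plus that of the end condition, and that the normalization counts all of these events.

```python
import math


def perplexity(sentences):
    """Perplexity over a corpus.

    sentences: list of lists; each inner list holds the conditional
    probabilities of the words of one sentence followed by the
    probability of the end condition.
    """
    log_sum = 0.0
    count = 0
    for probs in sentences:
        for p in probs:
            log_sum += math.log(p)
            count += 1
    return math.exp(-log_sum / count)


# Every event has probability 0.25, so the perplexity is (close to) 4.
print(perplexity([[0.25, 0.25], [0.25]]))
```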
`set <ci-phone/cd-phone/word-in/word-out> delimiter <delim>`	`[allows c-strings]`
Set/modify the delimiter used to separate phones (ci/cd) or words (when reading/writing).
`debug <level>`
Set debug level.
`debug add <routine> ...`
Add items to the DEBUG_IN list.
`debug delete [routine] ...`
Delete items in the DEBUG_IN list. Delete all items if no <routine> is given.
`debug stream <stream>`
Redirect the message-stream to a new file.
`debug list`
Print the DEBUG_IN list.
`debug print <hmminfo/states/mvgs/units> [number]`
Print the structure asked for, used as debug information. A specific state, mvg or segment number can be asked for.
`==== <sync_nr> ====`
Set a synchronisation mark in the data stream: copy the command to the output file/pipe.
`setenv <name> [value] ...`
Set environment variables.
`unsetenv <name> ...`
Clear environment variables.
`man [item] ...`
Display help on an item.
`help [item] ...`
Display help on an item.
`history [size](-1) [fname]`
Print the command history to the given file.
`diary [fname](/)`
Turn logging to a specified file on/off.
`tic`
Reset the timer.
`toc`
Print the timer.
`echo [string] ...`	`[allows c-strings]`
Echo back some text.
`<chdir/cd> <directory>`
Change directory.
`var <var_name> ...`
Define or delete (precede the variable name with a '-') variables for use with the builtin calculator (see '@' command).
`whos [var_name] ...`
List the variables for the builtin calculator (see '@' command).
`@<expression> ...`
Evaluate a mathematical expression.
`![cmd] [args] ...`
Invoke the shell indicated by the 'SHELL' environment variable and execute the given command.
`<quit/exit/q/x>`
Leave the program.
`%[txt] ...`
Comment lines.
`#[txt] ...`
Comment lines.
`set prompt [prompt] ...`
Set/clear the prompt. See also 'escape sequences'.
`set record cmd [cmd] ...`
Set the record command. See also 'escape sequences'.
`set play cmd [cmd] ...`
Set the play back command. See also 'escape sequences'.
`escape sequences`
Some command and string definitions have limited support for escape sequences. In addition to the most common C-escape sequences, the following codes can be used: '\>': is replaced by the output file name. '\<': is replaced by the input file name. '\#': is replaced by the extra arguments specified with the command.
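The substitution of the three extra codes can be mimicked as follows. This is a sketch of the described behaviour, not the program's own implementation: it ignores the C-escape sequences, and the function name and the command `my_filter` are hypothetical.

```python
import re


def expand_escapes(template, input_fname, output_fname, extra_args=""):
    r"""Replace '\<', '\>' and '\#' in a command template."""
    replacements = {"<": input_fname, ">": output_fname, "#": extra_args}
    # A replacement function avoids re-interpreting backslashes that
    # may occur in the file names themselves.
    return re.sub(r"\\([<>#])", lambda m: replacements[m.group(1)], template)


print(expand_escapes(r"my_filter \# \< \>", "in.wav", "out.txt", "-v"))
# my_filter -v in.wav out.txt
```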

Date: Nov 1995

Author: Kris Demuynck

Revision History:

9/98 - KD: added the continuous recognition mode
9/98 - KD: added the word lattices
8/02 - KD: changed the command syntax
