SPRAAK
The SPRAAK system works with a configuration table. This table specifies the resources (Lexicon, LM, ...) and all other relevant configuration parameters. On startup, the configuration table is read from file. Since the SPRAAK configuration table is basically a duplicate of this configuration file, every option that can be set in the configuration file can be found in the SPRAAK configuration table and vice versa.
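To make the file-to-table correspondence concrete, here is a minimal Python sketch of reading such a file into an in-memory table. It is not the SPRAAK loader: the function names `parse_config` and `get_option` are hypothetical, and the sketch assumes the layout used on this page (section headers in square brackets, lines starting with `%` as comments, and a key separated from its value by whitespace). It also assumes that keys appearing before any section header belong to the default section.

```python
def parse_config(text):
    """Parse config text into {section_name: {key: value}} (all names lowercased)."""
    table = {}
    section = "default"  # assumption: keys before any header go to [default]
    for raw in text.splitlines():
        line = raw.strip()
        # Skip blank lines and whole-line comments. Treating '%' as a comment
        # marker only at the start of a line is an assumption of this sketch.
        if not line or line.startswith("%"):
            continue
        if line.startswith("[") and line.endswith("]"):
            # Section names are case insensitive, so store them lowercased.
            section = line[1:-1].strip().lower()
            table.setdefault(section, {})
            continue
        # The first whitespace-delimited token is the key, the rest is the value.
        parts = line.split(None, 1)
        key = parts[0].lower()
        value = parts[1].strip() if len(parts) > 1 else ""
        table.setdefault(section, {})[key] = value
    return table


def get_option(table, section, key):
    """Case-insensitive lookup; returns None when the option is absent."""
    return table.get(section.lower(), {}).get(key.lower())
```

Because both functions lowercase section and key names, `get_option(table, "SETUP:Demo", "MAX_BW")` and `get_option(table, "setup:demo", "max_bw")` return the same entry, and the order of the lines within a section does not affect the result.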
The configuration file/table consists of sections, which contain key-value pairs. The order in which something is specified is irrelevant. Key and section names are case insensitive (but are restricted to ASCII characters). The following sections and keys (described as written in the config file) have a meaning to the SPRAAK system:
% default section: used for initial values and as a last resort
[default]
% where to find all files
basedir           base_directory_for_all_relative_paths
% test the default setup with some file
test_file         file_to_recognize_as_sanity_check
% specify default resources
SETUP             the_default_setup
SSP               the_default_preprocessing
AM                the_default_acoustic_model
LEX               the_default_lexicon
LM                the_default_language_model
SPKR              the_default_speaker
% specify default values (only used as a last resort)
cost_A            LM_scale_factor
cost_C            word_startup_cost(log(prob))
max_bw            maximum search beam width (#tokens)
threshold         maximum pruning threshold (relative w.r.t. best token)
min_bw            minimum search beam width (only relevant for FSG LM's)
add_min_frac      keep real_threshold<=threshold*(1+add_min_frac) when trying to satisfy min_bw
frac_wend         multiply threshold with this value when handling word transitions
lm_cache          size of the LM-cache
lmc_cache         size of the LM-context hash table
% Use the new adaptive pruning strategy (useful when handling acoustically
% challenging conditions characterized by high entropy in the phone state
% posteriors; lower the threshold in this case).
% Adjust the beam search parameters based on observable statistics concerning
% the search and acoustics.
% The pruning threshold is adjusted based on the average acoustic entropy <Hav>
% for the last frames. The average acoustic entropy is computed with a first
% order filter with time constant <t0> (measured in number of frames). The
% pruning threshold is decreased with the following value:
%   max(min(<Hav>,<H1>)-<H0>,0)*<Hscale>
% The maximum beam width is adjusted based on the long term average beam width.
% The long term average is computed with a first order filter with time
% constant <t1> (measured in number of frames). The upper limit on the long
% term average equals <Ntoken_lt_frac>*<Ntokens_max> with <Ntokens_max> the
% instantaneous maximum beam width (see "search maximum").
% Use negative/illegal values to disable threshold or beam width adjustments.
% Hereunder we use sensible (but fairly conservative) values; nevertheless,
% pruning may still be too harsh (this new pruning strategy is still
% experimental).
adapt.H0              0.5
adapt.H1              2.0
adapt.Hscale          1.0
adapt.t0              10
adapt.Ntoken_lt_frac  0.2
adapt.t1              30

% define a preprocessing resource
[SSP:...]
% the preprocessing pipe-line (multiple scripts are just concatenated)
file              script1 script2...
% file that can be used to initialize the preprocessing (determine the sample
% frequency, number of channels, size of internal buffers, ...)
ini_file          some_audio_file
% if no ini file is present, or if one wants to process stream data, the audio
% format must be specified as follows
audio.TYPE        sample_type(I16/ALAW/...)
audio.FORMAT      sample_endianness(BIN01/BIN10)
audio.SAMPLEFREQ  sample_frequency
audio.NCHAN       number_of_channels(1(mono)/2(stereo)/...)
audio.FSHIFT      frame_shift_for_non_sample_data

% define an acoustic model resource
[AM:...]
% description of the (context-dependent) (phone) units
ci                list_of_phone_units
cd                context_dependency_description
% files describing the state probability distributions: priors, transition
% probabilities, observation density functions (only if GMMs)
hmm               hmm_file
mvg               gaussians
sel               sparse_gaussian_mixtures
rmg               fast evaluation of gaussians
% type of acoustic model: std=HMM+GMM; dft=HMM+whatever (state log likelihoods
% provided by the preprocessing)
type              std/dft
% options for the acoustic model
% also supports generic options that specify what the acoustic model computes
% (likelihoods/posteriors, log/linear values, ...); see @spr_am_flags_od for
% details
options           options
% convert the log/linear likelihoods/posteriors into whatever is needed by the
% acoustic model + compute some extra info like the total probability (useful
% for computing confidence scores) or the acoustic entropy (useful for
% speeding up the decoding of acoustically challenging parts); see
% @spr_am_flags_od for details
cvt               options
% configuration settings
lmo               likelihood_flooring

% define a lexicon resource
[LEX:...]
% the lexicon content
file              file_containing_the_lexicon(txt_or_bin)
% if 'file_bin' is specified and the file referred to exists, this file is used
% instead of 'file'; otherwise 'file_bin' (if specified) will be created based
% on 'file' (and be available the next time)
file_bin          file_containing_the_lexicon(bin)
% how to handle noise, ...
unwind            unwind_options
% prebuild the resource (based on the acoustic model)
preload           1/0

% define a language model resource
[LM:...]
% the LM content
file              file_containing_the_LM
type              type_of_LM(N-gram/FSG/XXX)
% how to handle noise, unknown words, ...
options           LM_options
% preload the resource (can be large)
preload           1/0
% how to integrate with the acoustic model (if not specified by the setup)
cost_A            LM_scale_factor
cost_C            word_startup_cost(log(prob))

% define 'no' LM
[LM:NULL]
preload           1

% define a speaker resource
[SPKR:...]
file              file_to_store_the_speaker_specific_data

% define a setup
[SETUP:...]
% resources to use (if not specified, we fall back to the default section)
SSP               preprocessing
AM                acoustic_model
LEX               lexicon
LM                language_model
SPKR              speaker
% how to combine the LM with the acoustic model
cost_A            LM_scale_factor
cost_C            word_startup_cost(log(prob))
mode              IWR/CWR
% pruning parameters
max_bw            maximum search beam width (#tokens)
threshold         maximum pruning threshold (relative w.r.t. best token)
min_bw            minimum search beam width (only relevant for FSG LM's)
add_min_frac      keep real_threshold<=threshold*(1+add_min_frac) when trying to satisfy min_bw
frac_wend         multiply threshold with this factor when handling word transitions
lm_cache          size of the LM-cache
lmc_cache         size of the LM-context hash table

% config parameters for monitor functions
[monitor]
% time step for the progress monitor, see SPRaak_mon_recog()
timer             <sec>
% store some measures during the decoding that help in determining the acoustic
% confidence; automatically enabled when monitoring the progress
confidence        1/0
% calculate the LM backwards as well; can be used as a feature for a confidence
% measure iff one uses a real probabilistic LM; default: 0
lm_bwd            1/0
% install the default volume monitor function (not implemented yet);
% default: 0
volume            1/0

% some free sections, e.g. the audio device specification as used by
% the program spr_spraak_socket
[audio]
device            /dev/dsp
mixer             /dev/mixer
buflen            10
port              mic
gain              0.1
dataformat        I16
samplefreq        16000
nchan             1
fshift            0.01
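
The threshold adjustment described in the adaptive pruning comments of the [default] section can be sketched in a few lines of Python. This is an illustration only, not SPRAAK code: the function names are hypothetical, and the exact form of the "first order filter" is not given above, so the sketch assumes a simple exponential average with smoothing factor 1/t0. Only the expression max(min(Hav,H1)-H0,0)*Hscale is taken directly from the description. The default arguments mirror the adapt.* values listed in the [default] section.

```python
def smooth_entropy(h_av, h_frame, t0=10):
    """One update of the running average acoustic entropy.

    Assumption: the first order filter is an exponential average with
    smoothing factor 1/t0, where t0 is a time constant in frames.
    """
    return h_av + (h_frame - h_av) / t0


def threshold_reduction(h_av, h0=0.5, h1=2.0, h_scale=1.0):
    """Amount by which the pruning threshold is decreased:
    max(min(Hav, H1) - H0, 0) * Hscale.
    """
    return max(min(h_av, h1) - h0, 0.0) * h_scale


def adapted_threshold(threshold, h_av, h0=0.5, h1=2.0, h_scale=1.0):
    """Pruning threshold after subtracting the entropy-based reduction."""
    return threshold - threshold_reduction(h_av, h0, h1, h_scale)
```

With these values, an average entropy at or below H0 = 0.5 leaves the threshold unchanged, and the reduction saturates at (H1 - H0) * Hscale = 1.5 once the average entropy reaches H1 = 2.0. The beam width adjustment is not sketched here because the description gives only the upper limit on the long term average, not the adjustment rule itself.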