SPRAAK
Higher level API configuration file and table format

The SPRAAK system works with a configuration table. This configuration specifies the resources (lexicon, LM, ...) and all other relevant configuration parameters. On startup, the configuration table is read from file. Since the SPRAAK configuration table is basically a duplicate of this configuration file, every option that can be set in the configuration file can also be found in the SPRAAK configuration table, and vice versa.

The configuration file/table consists of sections, which contain key-value pairs. The order in which items are specified is irrelevant. Key and section names are case-insensitive (but are restricted to ASCII characters). The following sections and keys (described as written in the config file) have a meaning to the SPRAAK system:

% default section: used for initial values and as a last resort
[default]
  % where to find all files
  basedir       base_directory_for_all_relative_paths
  % test the default setup with some file
  test_file     file_to_recognize_as_sanity_check
  % specify default resources
  SETUP         the_default_setup
  SSP           the_default_preprocessing
  AM            the_default_acoustic_model
  LEX           the_default_lexicon
  LM            the_default_language_model
  SPKR          the_default_speaker
  % specify default values (only used as a last resort)
  cost_A        LM_scale_factor
  cost_C        word_startup_cost(log(prob))
  max_bw        maximum search beam width (#tokens)
  threshold     maximum pruning threshold (relative to the best token)
  min_bw        minimum search beam width (only relevant for FSG LMs)
  add_min_frac  keep real_threshold<=threshold*(1+add_min_frac) when trying to satisfy min_bw
  frac_wend     multiply threshold with this value when handling word transitions
  lm_cache      size of the LM-cache
  lmc_cache     size of the LM-context hash table
  % Use the new adaptive pruning strategy (useful when handling acoustically challenging conditions characterized by high entropy in the phone state posteriors; lower the threshold in this case).
  % The beam search parameters are adjusted based on observable statistics concerning the search and the acoustics.
  % The pruning threshold is adjusted based on the average acoustic entropy <Hav> over the last frames. The average acoustic entropy is computed with a first-order filter with time constant <t0> (measured in number of frames). The pruning threshold is decreased by the following value: max(min(<Hav>,<H1>)-<H0>,0)*<Hscale>.
  % The maximum beam width is adjusted based on the long-term average beam width. The long-term average is computed with a first-order filter with time constant <t1> (measured in number of frames). The upper limit on the long-term average equals <Ntoken_lt_frac>*<Ntokens_max>, with <Ntokens_max> the instantaneous maximum beam width (see "search maximum").
  % Use negative/illegal values to disable the threshold or beam width adjustments.
  % Below we use sensible (but fairly conservative) values; nevertheless, pruning may still be too harsh (this new pruning strategy is still experimental).
  adapt.H0              0.5
  adapt.H1              2.0
  adapt.Hscale          1.0
  adapt.t0              10
  adapt.Ntoken_lt_frac  0.2
  adapt.t1              30

  
% define a preprocessing resource
[SSP:...]
  % the preprocessing pipeline (multiple scripts are simply concatenated)
  file                  script1 script2...
  % file that can be used to initialize the preprocessing (determine the sample frequency, number of channels, size of internal buffers, ...)
  ini_file              some_audio_file
  % if no ini file is present, or if one wants to process stream data, the audio format must be specified as follows
  audio.TYPE            sample_type(I16/ALAW/...)
  audio.FORMAT          sample_endianness(BIN01/BIN10)
  audio.SAMPLEFREQ      sample_frequency
  audio.NCHAN           number_of_channels(1(mono)/2(stereo)/...)
  audio.FSHIFT          frame_shift_for_non_sample_data

% define an acoustic model resource
[AM:...]
  % description of the (context-dependent) (phone) units
  ci            list_of_phone_units
  cd            context_dependency_description
  % files describing the state probability distributions: priors, transition probabilities, observation density functions (only if GMMs)
  hmm           hmm_file
  mvg           gaussians
  sel           sparse_gaussian_mixtures
  rmg           fast evaluation of gaussians
  % type of acoustic model: std=HMM+GMM; dft=HMM+whatever (state log likelihoods provided by the preprocessing)
  type          std/dft
  % options for the acoustic model
  % also supports generic options that specify what the acoustic model computes (likelihoods/posteriors, log/linear values, ...); see @spr_am_flags_od for details
  options       options
  % convert the log/linear likelihoods/posteriors into whatever is needed by the acoustic model + compute some extra info like the total probability (useful for computing confidence scores) or the acoustic entropy (useful for speeding up the decoding of acoustically challenging parts); see @spr_am_flags_od for details
  cvt           options
  % configuration settings
  lmo           likelihood_flooring

% define a lexicon resource
[LEX:...]
  % the lexicon content
  file          file_containing_the_lexicon(txt_or_bin)
  % if 'file_bin' is specified and the file referred to exists, this file is used instead of 'file'; otherwise 'file_bin' (if specified) will be created based on 'file' (and be available the next time)
  file_bin      file_containing_the_lexicon(bin)
  % how to handle noise, ...
  unwind        unwind_options
  % prebuild the resource (based on the acoustic model)
  preload       1/0

% define a language model resource
[LM:...]
  % the LM content
  file          file_containing_the_LM
  type          type_of_LM(N-gram/FSG/XXX)
  % how to handle noise, unknown words, ...
  options       LM_options
  % preload the resource (can be large)
  preload       1/0
  % how to integrate with the acoustic model (if not specified by the setup)
  cost_A        LM_scale_factor
  cost_C        word_startup_cost(log(prob))

% define 'no' LM
[LM:NULL]
  preload 1

% define a speaker resource
[SPKR:...]
  file          file_to_store_the_speaker_specific_data

% define a setup
[SETUP:...]
  % resources to use (if not specified, we fall back to the default section)
  SSP           preprocessing
  AM            acoustic_model
  LEX           lexicon
  LM            language_model
  SPKR          speaker
  % how to combine the LM with the acoustic model
  cost_A        LM_scale_factor
  cost_C        word_startup_cost(log(prob))
  mode          IWR/CWR
  % pruning parameters
  max_bw        maximum search beam width (#tokens)
  threshold     maximum pruning threshold (relative to the best token)
  min_bw        minimum search beam width (only relevant for FSG LMs)
  add_min_frac  keep real_threshold<=threshold*(1+add_min_frac) when trying to satisfy min_bw
  frac_wend     multiply threshold with this factor when handling word transitions
  lm_cache      size of the LM-cache
  lmc_cache     size of the LM-context hash table

% config parameters for monitor functions
[monitor]
  % time step for the progress monitor, see SPRaak_mon_recog()
  timer         <sec>
  % store some measures during the decoding that help in determining the acoustic
  % confidence; automatically enabled when monitoring the progress
  confidence    1/0
  % calculate the LM backwards as well; can be used as a feature for a confidence
  % measure iff one uses a real probabilistic LM; default: 0
  lm_bwd        1/0
  % install the default volume monitor function (not implemented yet);
  % default: 0
  volume        1/0

% some free sections, e.g. the audio device specification as used by
% the program spr_spraak_socket
[audio]
  device        /dev/dsp
  mixer         /dev/mixer
  buflen        10
  port          mic
  gain          0.1
  dataformat    I16
  samplefreq    16000
  nchan         1
  fshift        0.01
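
As an illustration of the format described above, the following Python sketch reads a configuration text into a table, looks up a key with fall-back to the [default] section, and evaluates the adaptive threshold reduction max(min(<Hav>,<H1>)-<H0>,0)*<Hscale>. It is not part of SPRAAK: the function names (parse_config, lookup, threshold_reduction), the setup name "demo" and all values in the example text are hypothetical, and the parsing rules (comment lines start with %, a key is separated from its value by whitespace) are inferred from the listing rather than taken from the SPRAAK implementation.

```python
def parse_config(text):
    """Parse SPRAAK-style config text into {section: {key: value}}.

    Assumed rules (inferred from the listing, not from the SPRAAK sources):
    lines starting with '%' are comments, '[name]' opens a section, and the
    first whitespace-delimited token on any other line is the key.
    """
    table = {}
    section = None
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("%"):
            continue
        if line.startswith("[") and line.endswith("]"):
            # Section names are case-insensitive, so store them lowercased.
            section = line[1:-1].strip().lower()
            table.setdefault(section, {})
            continue
        if section is None:
            raise ValueError(f"key outside of any section: {line!r}")
        parts = line.split(None, 1)
        key = parts[0].lower()  # keys are case-insensitive as well
        value = parts[1].strip() if len(parts) > 1 else ""
        table[section][key] = value
    return table


def lookup(table, section, key, fallback="default"):
    """Return the value of key in section, else in the fallback section."""
    for name in (section.lower(), fallback):
        value = table.get(name, {}).get(key.lower())
        if value is not None:
            return value
    return None


def threshold_reduction(h_av, h0, h1, h_scale):
    """Amount by which the pruning threshold is decreased."""
    return max(min(h_av, h1) - h0, 0.0) * h_scale


# Hypothetical configuration text, for illustration only.
EXAMPLE = """
% hypothetical values
[default]
  threshold     100
  adapt.H0      0.5
  adapt.H1      2.0
  adapt.Hscale  1.0
[SETUP:demo]
  max_bw        5000
"""

table = parse_config(EXAMPLE)

# 'threshold' is not set in the setup, so the default section is used.
threshold = float(lookup(table, "SETUP:demo", "Threshold"))

h0 = float(lookup(table, "SETUP:demo", "adapt.H0"))
h1 = float(lookup(table, "SETUP:demo", "adapt.H1"))
h_scale = float(lookup(table, "SETUP:demo", "adapt.Hscale"))

# An average entropy above H1 is clipped: (2.0 - 0.5) * 1.0 = 1.5
reduction = threshold_reduction(3.0, h0, h1, h_scale)
adapted_threshold = threshold - reduction
```

The sketch treats every value as a plain string and leaves conversion to the caller, which keeps the table a direct mirror of the file. How <Hav> itself is filtered over time is not shown here, since the exact form of the first-order filter is not given in the text above.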