Defines the attributes of the configuration used by the trainer objects.
The following attributes can be used:
basedir
- base directory for the resulting models.
exp
- base name of the experiment
log
- log-file, relative to
basedir
mname
, miname
- naming conventions for the resulting models. These allow, for example, models to be stored either in subdirectories or in a flat directory structure. If specified, the
miname
naming convention is used to save the acoustic models after each minor iteration. The following items are replaced by their current value:
- {ITER} major iteration number (sequence of different model types)
- {iter} minor iteration number (viterbi iteration number) preceded by a dot (.)
- {name} name of the current file (select, mod, ...)
- {exp} name of the current experiment
These names should be relative names. They are not allowed to start with a '/'. To specify a location for the final models use the basedir
config attribute.
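The placeholder substitution described above can be sketched as follows. This is a minimal illustration, not the SPRAAK implementation: the helper name expand_model_name and the example templates are hypothetical, and the handling of {iter} when there is no minor iteration (expanding to an empty string) is an assumption.

```python
def expand_model_name(template, major, minor, name, exp):
    """Expand a naming template (hypothetical helper, not part of SPRAAK)."""
    # The documentation states that names must be relative.
    if template.startswith("/"):
        raise ValueError("naming templates must be relative: %r" % template)
    # Assumption: the minor iteration expands to ".<n>", and to an empty
    # string when no minor iteration is given.
    minor_txt = "" if minor is None else ".%d" % minor
    values = {
        "{ITER}": str(major),
        "{iter}": minor_txt,
        "{name}": name,
        "{exp}": exp,
    }
    for key, value in values.items():
        template = template.replace(key, value)
    return template


# A flat layout and a layout with subdirectories (both templates are made up).
flat = expand_model_name("{exp}_{ITER}{iter}.{name}", 2, 3, "mod", "demo")
nested = expand_model_name("{exp}/{ITER}/{name}", 2, None, "select", "demo")
```

With these inputs, flat is "demo_2.3.mod" and nested is "demo/2/select".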
ldir
, sdir
- local/global scratch directory to store large temporary files. If not absolute they'll be relative to
basedir
.
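As an illustration of the "relative to basedir" rule, the sketch below resolves a directory setting against a base directory. The helper resolve_dir is hypothetical and only shows the rule as described here; the trainer may resolve paths differently.

```python
import os


def resolve_dir(basedir, path):
    """Return path unchanged if absolute, else join it onto basedir.

    Hypothetical helper illustrating the documented rule.
    """
    return path if os.path.isabs(path) else os.path.join(basedir, path)


scratch_rel = resolve_dir("/exp/base", "scratch")
scratch_abs = resolve_dir("/exp/base", "/tmp/scratch")
```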
preproc
, mida_np
, mida_opt
- preprocessing + number of parameters to extract with
MIDA + decorr
- note1: set
mida_np
to 0
if no MIDA + decorr
is wanted
- note2: complete
preprocessing == preproc + [MIDA + decorr] + gauss_decorr
- note3:
[MIDA + decorr] + gauss_decorr
is represented in 1 matrix and 1 extra preprocessing file
ph_ci
, ph_cd
, ph_arcd
- phonetic units
ph_spec
- list of special phonemes (noise, garbage, ...)
- will have their weights reduced in MIDA/decorr
- will always be modelled as context-independent phonemes
This attribute expects a single string with the special phonemes separated by whitespace.
ggi
- Gaussian grouping (tying) info; see spr_add_gauss.c for more information.
questions
- phonetic questions for making the context-dependent states (decision tree)
seg
- initial segmentation file
cor
, dic
, unwind
- Train corpus & dictionary. The
unwind
string is a semicolon-separated list of statements having the following format: <property>=<value>
. The following properties can be set:
check_lvl
- Check whether the lexicon is correct (level>=0) and optimal (level>=1).
lc_weight
- Number between
[-1024,1024]
giving more or less importance to the left (positive values) or right context. By default the left context is given some preference (lc_weight
= 1).
add_in_front
- Add a specified phone sequence in front of all word/sentence descriptions.
add_between
- Add a specified phone sequence between all words in a sentence description.
add_at_rear
- Add a specified phone sequence at the rear of all word/sentence descriptions.
sent_context
- Alignment: a simple phone sequence (no optional sequences) used as context before (and after) a sentence for the CI-phone to the CD-phone conversion (not applied with the assimilation rules).
sil_word
- Recognition: a word used for marking valid start sequences (all paths extending from that word) and end sequences (all paths going to that word).
cross_word
- (Do not) allow cross-word CD-phones (value: yes/no). By default cross-word CD-phones are allowed.
print_info
- (Do not) print advance information while converting the lexicon network (values=all/some/none). By default no advance info is printed.
assimilation
- Use the specified file with assimilation rules.
apply_Wrules
- Apply only those within-word rules that have any of the specified characters set. By default all rules are applied within-word.
apply_Arules
- Apply only those across-word rules that have any of the specified characters set. By default all rules are applied across-word.
old_multi_pron
- Support the old way of specifying multiple pronunciations, i.e. multiple entries for the same word.
xO
- Optimization level (0=none, 1=normal, 2=full; the default is 2)
RExO
- Redo the network optimisation
no_wsep
- Do not separate the words by a dummy START state. This makes it possible to look across two word boundaries (normal operation only allows one word boundary, which may cause problems when quinphones are used).
quin_patch
- Recognition: set to a value of 2 to use cross-word quinphones even for sequences of words that consist of a single phoneme. To allow this, the single-phoneme words [spw] and multi-phoneme words [mpw] are isolated and reorganized in a new network as follows:
ROOT->(s1,s2)
s1->spw->(s2,s3)
s2->mpw->(s1,s2)
s3->spw->(s1,s2)
Alignment: allow crossing up to the given number of word boundaries in a match (this gives no problems since the network for alignment is non-cyclic).
For backward compatibility, the following flags are also supported:
S
- Add silence in front, between and after each word in the sentence.
s
- Add silence in front and after the sentence description.
=
- Do not add silence (default).
Note that the unwind description may not contain white-space characters.
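The format of the unwind string can be illustrated with a small helper that joins property/value pairs and enforces the no-white-space rule. The helper build_unwind is hypothetical and not part of SPRAAK; it only covers the <property>=<value> form, not the backward-compatible flags, and the chosen properties and values are just examples taken from the list above.

```python
def build_unwind(props):
    """Join property/value pairs into an unwind string (hypothetical helper)."""
    # Statements have the form <property>=<value>, separated by semicolons.
    unwind = ";".join("%s=%s" % (key, value) for key, value in props.items())
    # The unwind description may not contain white-space characters.
    if any(ch.isspace() for ch in unwind):
        raise ValueError("the unwind description may not contain white-space")
    return unwind


unwind = build_unwind({"check_lvl": 1, "cross_word": "yes", "xO": 2})
```

Here unwind becomes "check_lvl=1;cross_word=yes;xO=2", which could then be assigned to config.unwind.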
obsdir
, suffix
- where to find the data
pass_split_cnt
- Use this attribute to set the number of parts into which the training passes are split. To be effective, this should be equal to or greater than the number of computing nodes that are available. Larger numbers give the scheduler more flexibility, but also incur more overhead.
local_paths
- This attribute holds a list of directories that are not available on other computing nodes. To add directories to this attribute the following construction can be used:
config.local_paths.append("<some-local-dir>")
On ESAT for example the local drives are mounted with paths starting with /volume1/
. These local drives are not exported and are not available on other nodes. Tasks that use files from these local drives will not be scheduled remotely.
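To make the idea concrete, the sketch below tests whether a file lies under one of the configured local directories. How the scheduler actually makes this decision is not documented here; uses_local_path is a hypothetical helper and the prefix test is an assumption.

```python
import os


def uses_local_path(path, local_paths):
    """Return True if path lies inside one of the local directories.

    Hypothetical helper; the real scheduler may decide differently.
    """
    path = os.path.abspath(path)
    for local in local_paths:
        local = os.path.abspath(local)
        # Compare on whole path components so that /volume10 does not
        # count as being inside /volume1.
        prefix = local.rstrip(os.sep) + os.sep
        if path == local or path.startswith(prefix):
            return True
    return False


local_paths = ["/volume1/"]
is_local = uses_local_path("/volume1/data/train.cor", local_paths)
is_shared = not uses_local_path("/volume10/data/train.cor", local_paths)
```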
remote_env_script
- Use this attribute to set the command that needs to be run to setup the environment on the remote host. This can be used to set for example the
PATH
and LD_LIBRARY_PATH
environment variables on the remote node so that the various SPRAAK commands required during the training can be found and executed.
host_info
- Use this attribute to configure the hosts that can be used for distributed training. This attribute holds a list of pairs. The first element of the pair is a host name, or
None
for localhost. The second element is a number that gives an upper limit on the number of processes that can be started on that host. Always include at least one pair with None
. For example, if the local node has 2 processors and 2 remote nodes (spchcl09 and spchcl10, each with 4 processors) are available, we could configure this as follows:
config.host_info = [(None, 2), ("spchcl09", 4), ("spchcl10", 4)]
If the local node supports automatic and transparent process migration you can just configure to schedule everything locally and let the process migration system handle the distribution.
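A quick sanity check of a host_info list might look like the sketch below. The helper check_host_info is hypothetical; it verifies only the one rule stated above (at least one pair with None) and sums the per-host process limits.

```python
def check_host_info(host_info):
    """Validate a host_info list and return the total process limit.

    Hypothetical helper, not part of SPRAAK.
    """
    # At least one pair must name the local host as None.
    if not any(host is None for host, _ in host_info):
        raise ValueError("host_info needs at least one (None, n) pair")
    return sum(limit for _, limit in host_info)


host_info = [(None, 2), ("spchcl09", 4), ("spchcl10", 4)]
total_slots = check_host_info(host_info)
```

For the example configuration, total_slots is 10.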
metadir
- Use this attribute to set a directory where progress data will be written. This defaults to the current directory.
split_entries_func
- This is an advanced setting. This attribute can be used to redefine the function that is used to calculate the ranges of entries that will be used to split a corpus or segmentation file. The function should accept 3 parameters:
fname
: The filename of the corpus or segmentation file.
ftype
: This will be "corpus" or "segmentation" to indicate how the file should be interpreted.
N
: This is the number of splits that is requested.
The return value should be a list of length N
holding tuples with beginning and end entries. Entries are selected for begin <= entry < end
.
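A replacement function with the required signature could be sketched as follows. The names even_ranges and split_entries_by_line are hypothetical. The sketch assumes, purely for illustration, that the file holds one entry per line with no header and that entries are numbered from 0; real corpus and segmentation files may need a different way of counting entries.

```python
def even_ranges(n_entries, N):
    """Split n_entries into N contiguous (begin, end) tuples.

    Entries are selected for begin <= entry < end; the first
    n_entries % N ranges get one extra entry.
    """
    base, extra = divmod(n_entries, N)
    ranges, begin = [], 0
    for i in range(N):
        end = begin + base + (1 if i < extra else 0)
        ranges.append((begin, end))
        begin = end
    return ranges


def split_entries_by_line(fname, ftype, N):
    """Sketch of a split_entries_func replacement (hypothetical).

    Assumption: one entry per line, no header, for both file types.
    """
    if ftype not in ("corpus", "segmentation"):
        raise ValueError("unexpected file type: %r" % ftype)
    with open(fname) as fh:
        n_entries = sum(1 for _ in fh)
    return even_ranges(n_entries, N)


example = even_ranges(10, 3)
```

Here example is [(0, 4), (4, 7), (7, 10)]. Such a function would then be installed with config.split_entries_func = split_entries_by_line.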
def spr_pylib.train.master.MasterConfig.__init__(self)
def spr_pylib.train.master.MasterConfig.checkConfig(self)
Perform some basic sanity checks on the configuration attributes.
Currently, any attribute equal to None raises an error indicating a missing config attribute. Some additional checks on the attributes help avoid problems during the training steps.
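The "attribute equal to None" rule can be sketched as below. This is not the checkConfig implementation: DemoConfig is a hypothetical stand-in for the configuration object, and the exception type and message are assumptions.

```python
class DemoConfig:
    """Stand-in for a configuration object (hypothetical; not MasterConfig)."""

    def __init__(self):
        self.basedir = None
        self.exp = None
        self.log = None


def find_missing(config):
    """Return the sorted names of all attributes that are still None."""
    return sorted(name for name, value in vars(config).items() if value is None)


def check_config(config):
    """Raise an error naming every attribute that has not been set."""
    missing = find_missing(config)
    if missing:
        raise ValueError("missing config attribute(s): " + ", ".join(missing))


config = DemoConfig()
config.basedir = "/exp/base"
missing = find_missing(config)
```

In this example, missing is ["exp", "log"], so check_config(config) would raise an error until both are set.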