Different data files contain different types of data; the type is specified by the header field DATA. Individual SPRAAK programs rely only weakly on the association between file name extension and file type. However, consistency is advised for readability, and certain standardized extensions are used in the training and evaluation scripts and in the demonstration scripts.
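Because the type is recorded in the header rather than in the extension, a tool can decide how to treat a file by reading the DATA field. The sketch below illustrates that idea only. It assumes a hypothetical header layout (one whitespace-separated "KEY value" pair per line, closed by a line containing only "#"); this layout, the function name read_data_type and the sample header are assumptions made for the example, not the documented SPRAAK format, which should be checked against the file format description.

```python
import io


def read_data_type(stream, terminator="#"):
    """Return the value of the DATA header field, or None if it is absent.

    Assumption (hypothetical layout): the header consists of lines of the
    form 'KEY value' and ends at a line equal to `terminator`.
    """
    for raw in stream:
        line = raw.strip()
        if line == terminator:
            break  # end of header reached without finding DATA
        parts = line.split(None, 1)  # split key from the rest of the line
        if len(parts) == 2 and parts[0] == "DATA":
            return parts[1].strip()
    return None


# Illustrative header only; not copied from a real SPRAAK file.
example = io.StringIO("DATA TRACK\nDIM1 13\n#\n")
data_type = read_data_type(example)
```

A caller could then choose a reader based on data_type instead of the file name, which matches the weak role that extensions play in the SPRAAK programs.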

Sample and Track Files

Parameter Files

Parameter files hold parameters used in or derived from certain processing steps. These can differ widely in nature: filter parameters, HMM parameters, ... The file organization is to a large extent similar to that of TRACK files. More detailed descriptions of the most common data types are given below:

HMM Files

Linguistic Resources, Corpora, Segmentations

We need all kinds of definitions and enumerations: the set of acoustic units, the enumeration of utterances in a corpus, ... Most of this data can be given as a list. So, independent of the content, these files all look very similar. They are stored in ASCII for convenience, so that they can easily be inspected, created and edited with general-purpose tools.

Language Models, Grammars

For a description of language model files, see spr_um_lm.