SPRAAK largely follows the standard linguistic hierarchy, in which each level is expressed as a sequence of units from the next lower level according to the so-called beads-on-a-string model. The levels, and the knowledge source that defines each of them, are:

Level           Definition                      Knowledge Source
------------------------------------------------------------------------------------------
Sentence == sequence of words                   Language Model (e.g. Ngram)
Word     == sequence of acoustic units          Lexicon
Unit     == sequence of states                  UnitFile
State    == sequence of frames                  Acoustic Model (e.g. HMM)

Examples of how one level is expressed in units of the next lower level are given below:

----------- sentence as sequence of words --------------
Sentence        speech_is_easy
----------- word as sequence of phones ----------------
speech          spit+S
is              i[s/z]
easy            izi
-- context independent phone as sequence of states -----
i               -i-     4       13 14 15 16
s               -s-     3       28 29 30
z               -z-     3       31 32 34
-- context dependent phone as sequence of states ------
s#98            [i]-s-[i]       3   147 149 150
z#137           [eaiuo]-z-      3   148 253 339
--------------------------------------------------------
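The expansion from word to states can be sketched in a few lines of Python. This is only an illustration built on the example data above, not SPRAAK code: the function names are hypothetical, each phone is assumed to be a single character, and `[s/z]` is read as "either s or z", which is an assumption about the notation. The word "speech" is left out because its transcription uses symbols that are not in the small context-independent unit table.

```python
import re
from itertools import product

# Context-independent unit table copied from the example: phone -> state ids.
UNIT_STATES = {
    "i": [13, 14, 15, 16],
    "s": [28, 29, 30],
    "z": [31, 32, 34],
}

# Lexicon entries copied from the example: word -> transcription.
LEXICON = {
    "is": "i[s/z]",
    "easy": "izi",
}


def sentence_to_words(sentence):
    """Split a sentence written as in the example, with "_" between words."""
    return sentence.split("_")


def pronunciations(transcription):
    """Expand a transcription into every phone sequence it allows.

    Assumption: one character per phone, and "[a/b]" means "a or b".
    """
    slots = []
    for group, single in re.findall(r"\[([^\]]*)\]|(.)", transcription):
        slots.append(group.split("/") if group else [single])
    return [list(choice) for choice in product(*slots)]


def word_to_states(word):
    """Return one flat state sequence per pronunciation variant of a word."""
    return [
        [state for phone in phones for state in UNIT_STATES[phone]]
        for phones in pronunciations(LEXICON[word])
    ]


if __name__ == "__main__":
    print(sentence_to_words("speech_is_easy"))  # ['speech', 'is', 'easy']
    print(pronunciations("i[s/z]"))             # [['i', 's'], ['i', 'z']]
    print(word_to_states("easy"))
    # [[13, 14, 15, 16, 31, 32, 34, 13, 14, 15, 16]]
```

Each level is a plain lookup into the table of the level below, which is what the beads-on-a-string model amounts to. Context-dependent units such as s#98 would additionally need the neighbouring phones to select the right entry; that step is not covered by this sketch.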

The knowledge sources and the data/model files that link the different levels form the core of a speech recognition system. The software itself is the glue that makes them work together.

The language model and the acoustic model, which are stored in model files (see Acoustic Model, Language Model), are typically derived by training on a large body of data.

The lexicon and the unit definitions, stored in the Lexicon File and the Acoustic Unit File, are more often derived from a combination of expert knowledge and automatic procedures.