SPRAAK
SPRAAK uses a standard linguistic hierarchy in which each level is expressed as a sequence of units from the next lower level, following the so-called beads-on-a-string model. The levels, their definitions, and the knowledge sources that link them are:
Level      Definition                       Knowledge Source
------------------------------------------------------------------------------------------
Sentence   == sequence of words             Language Model (e.g. Ngram)
Word       == sequence of acoustic units    Lexicon
Unit       == sequence of states            UnitFile
State      == sequence of frames            Acoustic Model (e.g. HMM)
Examples of how one level is expressed in units of the next lower level are given below:
-----------sentence as sequence words ------------------
Sentence   speech_is_easy
----------- word as sequence of phones ----------------
speech     spit+S
is         i[s/z]
easy       izi
-- context independent phone as sequence of states -----
i          -i-           4   13 14 15 16
s          -s-           3   28 29 30
z          -z-           3   31 32 34
-- context dependent phone as sequence of states ------
s#98       [i]-s-[i]     3   147 149 150
z#137      [eaiuo]-z-    3   148 253 339
--------------------------------------------------------
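The expansion from word to phones to states shown above can be sketched in a few lines of Python. This is an illustration only and does not use SPRAAK's own API or file formats: the names UNIT_STATES, LEXICON, expand_pronunciation and word_to_states are hypothetical. It also assumes that "[s/z]" in a lexicon entry denotes a choice between alternative phones, that every other character is one phone, and that the first number after a unit name is the count of the state indices that follow. Only the context-independent units are used, and "speech" is left out because its units do not appear in the unit table of the example.

```python
import itertools
import re

# Context-independent unit table copied from the example above:
# unit name -> state indices (the state-count column is implied by len()).
UNIT_STATES = {
    "i": [13, 14, 15, 16],
    "s": [28, 29, 30],
    "z": [31, 32, 34],
}

# Lexicon entries copied from the example above (hypothetical in-memory form).
LEXICON = {
    "is": "i[s/z]",
    "easy": "izi",
}


def expand_pronunciation(pron):
    """Return every phone sequence described by a pronunciation string.

    Assumption: "[a/b]" is a choice between alternatives and every other
    character is a single-character phone.
    """
    tokens = re.findall(r"\[[^\]]*\]|.", pron)
    choices = [t[1:-1].split("/") if t.startswith("[") else [t] for t in tokens]
    return [list(seq) for seq in itertools.product(*choices)]


def word_to_states(word):
    """Map a word to one flat state sequence per pronunciation variant."""
    variants = []
    for phones in expand_pronunciation(LEXICON[word]):
        # Beads on a string: concatenate the states of consecutive units.
        variants.append([s for p in phones for s in UNIT_STATES[p]])
    return variants


if __name__ == "__main__":
    for word in ("is", "easy"):
        print(word, word_to_states(word))
```

Under these assumptions, "is" yields two state sequences (one ending in the states of s, one in the states of z), while "easy" yields a single sequence. A context-dependent system would pick a unit such as s#98 from the neighbouring phones instead of looking up the bare phone.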
The knowledge sources and data/model files that link the different levels form the core of a speech recognition system. The software itself is the glue that makes them function together.
The language model and the acoustic model, which are stored in the model files (see Acoustic Model, Language Model), are typically derived by training on a large body of data.
The lexicon and the unit definitions, stored in the Lexicon File and the Acoustic Unit File, are more often derived from a combination of expert knowledge and automatic procedures.