SPRAAK
Word-level transcriptions do not contain everything that is required to make a detailed acoustic transcription (e.g. intraword silence, coughs, ...). Moreover, the pronunciation of a word may change when it is embedded in a sentence, so the canonical transcription given in the lexicon may no longer be the correct one. A simple and frequently used solution is to lexicalize the most relevant pronunciation variants.
SPRAAK has multiple options for specifying what can happen acoustically between words and across word boundaries. There are provisions for on-the-fly modifications to the transcriptions when converting lexical transcriptions to another level. The effect of these modifications differs slightly depending on whether a sentence is mapped to a sequence of acoustic units for Viterbi alignment or training (top down), or a lexicon is converted to a search space for continuous word recognition (bottom up).
When mapping a sentence transcription:
When loading a lexicon for doing continuous word recognition (creating the search space):
In both cases:
Other 'unwind' options are used for debugging, compatibility issues, and so on. For more details, see spr_cwr_lex_desc_read().
Assimilation rules may optionally be added to a lexicon.
[A/E]B=CD=[]
[A/E][]=CD=E
[A/E]B=[X/Y/[]]D=E/F
[A/E][B=C]D
[A/E][[]=C]D
[A/E][B=[]]D
[A/E][B=[B/C/[]]][D=E/F]
[A/E][B=[(.1)B/(.7)C/(.2)[]]]D
This 'flat' notation strikes a good balance between readability and expressiveness. In the few cases where very complex descriptions are needed, the following formats can be used:
=<nr_of_nodes>[<from_node>/<to_node>/(<prob>)<phone>]...
=<nr_of_nodes>[<from_node>/<to_node>/<phone>=(<prob>)<phone>]...
The (<prob>) fields are optional. For example, the assimilation rule
[A/E][B=[(.1)B/(.7)C/(.2)[]]]D
can also be written as:
=4[0/1/A][0/1/E][1/2/B=(.1)B][1/2/B=(.7)C][1/2/B=(.2)[]][2/3/D]
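
To make the node-based format more concrete, the following is a minimal Python sketch that splits such a rule into its arcs. It is only an illustration of the syntax shown above, not SPRAAK's own parser: the names Arc, split_arcs and parse_graph_rule are hypothetical, and the sketch assumes that every arc is one top-level bracket group and that a nested [] stands for the empty phone.

```python
import re
from typing import List, NamedTuple, Optional, Tuple


class Arc(NamedTuple):
    """One arc of a node-based rule (hypothetical helper type, not a SPRAAK structure)."""
    src: int                 # <from_node>
    dst: int                 # <to_node>
    in_phone: Optional[str]  # phone before '=', or None if the arc has no '='
    prob: Optional[float]    # value of the optional (<prob>) field
    out_phone: str           # phone after the optional (<prob>); '[]' is kept as-is


def split_arcs(body: str) -> List[str]:
    """Return the contents of each top-level [...] group, keeping nested '[]' intact."""
    groups, depth, start = [], 0, 0
    for i, ch in enumerate(body):
        if ch == "[":
            if depth == 0:
                start = i + 1
            depth += 1
        elif ch == "]":
            depth -= 1
            if depth < 0:
                raise ValueError("unbalanced ']' in rule")
            if depth == 0:
                groups.append(body[start:i])
        elif depth == 0 and not ch.isspace():
            raise ValueError(f"unexpected character {ch!r} outside brackets")
    if depth != 0:
        raise ValueError("unbalanced '[' in rule")
    return groups


def parse_graph_rule(text: str) -> Tuple[int, List[Arc]]:
    """Parse '=<nr_of_nodes>[<from>/<to>/<label>]...' into (nr_of_nodes, arcs)."""
    m = re.match(r"=(\d+)(.*)$", text.strip())
    if m is None:
        raise ValueError("rule must start with '=<nr_of_nodes>'")
    n_nodes = int(m.group(1))
    arcs = []
    for item in split_arcs(m.group(2)):
        src_s, dst_s, label = item.split("/", 2)
        src, dst = int(src_s), int(dst_s)
        if not (0 <= src < n_nodes and 0 <= dst < n_nodes):
            raise ValueError(f"node index out of range in [{item}]")
        # '<phone>=(<prob>)<phone>' form versus plain '(<prob>)<phone>' form.
        lhs, sep, rhs = label.partition("=")
        in_phone, rest = (lhs, rhs) if sep else (None, label)
        pm = re.match(r"\(([^)]*)\)(.*)$", rest)
        prob, out_phone = (float(pm.group(1)), pm.group(2)) if pm else (None, rest)
        arcs.append(Arc(src, dst, in_phone, prob, out_phone))
    return n_nodes, arcs


rule = "=4[0/1/A][0/1/E][1/2/B=(.1)B][1/2/B=(.7)C][1/2/B=(.2)[]][2/3/D]"
n_nodes, arcs = parse_graph_rule(rule)
for arc in arcs:
    print(arc)
```

Applied to the rule above, the sketch yields six arcs over four nodes: two alternative arcs from node 0 to node 1 (A or E), three weighted rewrites of B from node 1 to node 2, and a single arc for D from node 2 to node 3.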