Word-level transcriptions do not contain everything that is required to make a detailed acoustic transcription (e.g. intra-word silence, coughs, ...). Moreover, the pronunciation of words may change when they are embedded in a sentence, so the canonical transcription given in the lexicon may no longer be the correct one. A simple and frequently used solution is to lexicalize the most relevant pronunciation variants.

SPRAAK has multiple options for specifying what can happen acoustically between words and across word boundaries. There are provisions for on-the-fly modifications to the transcriptions when converting lexical transcriptions to another level. The effect of these modifications differs slightly between mapping a sentence to a sequence of acoustic units for Viterbi alignment or training (top down) and converting a lexicon to a search space for continuous word recognition (bottom up).

Unwind Rules

When mapping a sentence transcription:

one can add a sequence of acoustic units (and alternatives using the '[/]' construction) at the beginning of the sentence, in between the words (i.e. before each word, except for the first one) and after the sentence (options 'add_in_front=', 'add_between=' and 'add_at_rear=' respectively).
one can specify the acoustic units that should be used as left context for the initial acoustic units of the sentence and as right context for the final acoustic units of the sentence (option 'sent_context=')
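
As a rough illustration of the top-down case, the Python sketch below shows where the three kinds of added units end up in the unit sequence of a sentence. It is not SPRAAK code: the function name unwind_sentence, the toy lexicon and the '#' unit are hypothetical, and only the option names are taken from the text above. Alternatives written with '[/]' and the 'sent_context=' option are not modelled.

```python
def unwind_sentence(words, lexicon, add_in_front=(), add_between=(), add_at_rear=()):
    """Map a word sequence to a flat list of acoustic units (top-down case).

    Hypothetical helper: the keyword names mirror the SPRAAK options,
    but the logic is only an illustration of where the units are placed.
    """
    units = list(add_in_front)            # once, before the sentence
    for i, word in enumerate(words):
        if i > 0:
            units.extend(add_between)     # before each word except the first one
        units.extend(lexicon[word])       # canonical transcription of the word
    units.extend(add_at_rear)             # once, after the sentence
    return units


# Toy lexicon and silence symbol, both assumptions made for this example.
lexicon = {"the": ["D", "@"], "cat": ["k", "{", "t"]}
units = unwind_sentence(
    ["the", "cat"], lexicon,
    add_in_front=["#"], add_between=["#"], add_at_rear=["#"],
)
print(units)  # ['#', 'D', '@', '#', 'k', '{', 't', '#']
```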

When loading a lexicon for doing continuous word recognition (creating the search space):

one can add a sequence of acoustic units (and alternatives using the '[/]' construction) before and after each word (options 'add_in_front=' and 'add_at_rear=' respectively).
one can specify the required acoustic context left of the first recognized word and the preferred acoustic context right of the last recognized word; the acoustic context can only be specified as a lexical item (a word) using the option 'sil_word='
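
For contrast with the sentence case, the following Python sketch illustrates the bottom-up case, where the units are attached to every lexicon entry instead of to the sentence as a whole. Again this is a simplified picture under stated assumptions: unwind_lexicon, the toy lexicon and the '#' unit are hypothetical names, and the 'sil_word=' context handling is left out.

```python
def unwind_lexicon(lexicon, add_in_front=(), add_at_rear=()):
    """Attach units before and after every word of a lexicon (bottom-up case).

    Hypothetical helper: the keyword names mirror the SPRAAK options,
    the rest is an illustration only.
    """
    return {
        word: list(add_in_front) + list(phones) + list(add_at_rear)
        for word, phones in lexicon.items()
    }


# Toy lexicon and silence symbol, both assumptions made for this example.
lexicon = {"the": ["D", "@"], "cat": ["k", "{", "t"]}
search_lexicon = unwind_lexicon(lexicon, add_at_rear=["#"])
print(search_lexicon["the"])  # ['D', '@', '#']
print(search_lexicon["cat"])  # ['k', '{', 't', '#']
```

In this simplified picture every word carries its own added units, so there is no separate 'add_between=' step: whatever appears between two recognized words is the rear part of the first word followed by the front part of the second.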

In both cases:

one can specify a set of assimilation rules and flags that indicate which rules should be activated in word-internal and in cross-word situations (options 'assimilation=', 'apply_Wrules=' and 'apply_Arules='); note that the current implementation is still based on HMM75 and is very fragile when applying assimilation rules on a search space for doing continuous word recognition.
one can turn the use of cross-word context-dependent phones on or off (option 'cross_word=')

Other 'unwind' options are used for debugging, compatibility issues, and so on. For more details, see spr_cwr_lex_desc_read()

Assimilation Rules

Assimilation rules may optionally be added to a lexicon.

An '=' sign in a lexicon entry indicates that the entry is an assimilation rule.
When writing assimilation rules, a single phone or the empty set [] can be replaced by another phone, by the empty set or by a complex construction:
[A/E]B=CD=[]
[A/E][]=CD=E
[A/E]B=[X/Y/[]]D=E/F
Note: square brackets can be used to make the rules more readable:
[A/E][B=C]D
[A/E][[]=C]D
[A/E][B=[]]D
[A/E][B=[B/C/[]]][D=E/F]
When writing assimilation rules, probabilities can only be added at the right hand side of the '=' sign:
[A/E][B=[(.1)B/(.7)C/(.2)[]]]D
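
To make the probability notation concrete, the Python sketch below splits the right-hand side of such a rule into (probability, phone) pairs. It is a minimal sketch, not the SPRAAK parser: parse_alternatives is a hypothetical helper, it only handles a single level of alternatives (optionally wrapped in one pair of square brackets), and the final sum is merely a sanity check one might want, not a documented requirement.

```python
import re

# A probability is written as a number in parentheses, e.g. (.1) or (0.25).
PROB = re.compile(r"\((\d*\.?\d+)\)")


def parse_alternatives(text):
    """Split e.g. '[(.1)B/(.7)C/(.2)[]]' into (probability, phone) pairs.

    Hypothetical helper; entries without a probability get None.
    """
    # Drop one pair of enclosing brackets, but keep a lone empty set '[]'.
    if text != "[]" and text.startswith("[") and text.endswith("]"):
        text = text[1:-1]
    pairs = []
    for item in text.split("/"):
        match = PROB.match(item)
        if match:
            pairs.append((float(match.group(1)), item[match.end():]))
        else:
            pairs.append((None, item))
    return pairs


alternatives = parse_alternatives("[(.1)B/(.7)C/(.2)[]]")
print(alternatives)  # [(0.1, 'B'), (0.7, 'C'), (0.2, '[]')]

# Sanity check (an assumption, not a documented rule): probabilities sum to one.
total = sum(prob for prob, _ in alternatives if prob is not None)
print(abs(total - 1.0) < 1e-9)  # True
```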

This 'flat' notation strikes a good balance between readability and expressiveness. In the few cases where very complex descriptions are needed, the following formats can be used:

  =<nr_of_nodes>[<from_node>/<to_node>/(<prob>)<phone>]...
  =<nr_of_nodes>[<from_node>/<to_node>/<phone>=(<prob>)<phone>]...

The (<prob>) fields are optional. For example, the assimilation rule
[A/E][B=[(.1)B/(.7)C/(.2)[]]]D
can also be written as:
=4[0/1/A][0/1/E][1/2/B=(.1)B][1/2/B=(.7)C][1/2/B=(.2)[]][2/3/D]
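
The node-based notation can be read as a list of arcs between numbered nodes. The Python sketch below parses such a rule into that list, mainly to show how the fields of the format line up. It is a minimal sketch under stated assumptions, not the SPRAAK implementation: Arc, parse_graph_rule and the field names are hypothetical, and an arc without '=' is assumed to leave its phone unchanged.

```python
import re
from typing import NamedTuple, Optional


class Arc(NamedTuple):
    src: int               # <from_node>
    dst: int               # <to_node>
    phone_in: str          # phone before the '=' (or the only phone)
    phone_out: str         # phone after the '=' (same as phone_in if no '=')
    prob: Optional[float]  # optional (<prob>) field


HEADER = re.compile(r"=(\d+)")
ARC_START = re.compile(r"\[(\d+)/(\d+)/")
PROB = re.compile(r"\((\d*\.?\d+)\)")


def _split_prob(text):
    """Return (probability or None, remaining text)."""
    match = PROB.match(text)
    if match:
        return float(match.group(1)), text[match.end():]
    return None, text


def parse_graph_rule(text):
    """Parse '=<nr_of_nodes>[<from>/<to>/...]...' into (nr_of_nodes, arcs)."""
    header = HEADER.match(text)
    if header is None:
        raise ValueError("rule must start with '=<nr_of_nodes>'")
    nr_nodes = int(header.group(1))
    body = text[header.end():]
    # Every arc starts with '[<digits>/<digits>/'; the empty set '[]' never does.
    starts = list(ARC_START.finditer(body))
    arcs = []
    for i, start in enumerate(starts):
        end = starts[i + 1].start() if i + 1 < len(starts) else len(body)
        label = body[start.end():end]
        if not label.endswith("]"):
            raise ValueError("arc is missing its closing bracket")
        label = label[:-1]  # strip the closing bracket of the arc itself
        src, dst = int(start.group(1)), int(start.group(2))
        if max(src, dst) >= nr_nodes:
            raise ValueError("node index out of range")
        if "=" in label:
            phone_in, rhs = label.split("=", 1)
            prob, phone_out = _split_prob(rhs)
        else:
            prob, phone_in = _split_prob(label)
            phone_out = phone_in
        arcs.append(Arc(src, dst, phone_in, phone_out, prob))
    return nr_nodes, arcs


rule = "=4[0/1/A][0/1/E][1/2/B=(.1)B][1/2/B=(.7)C][1/2/B=(.2)[]][2/3/D]"
nr_nodes, arcs = parse_graph_rule(rule)
for arc in arcs:
    print(arc)
```

For the example rule this gives four nodes and six arcs: two parallel arcs (A, E) from node 0 to node 1, three weighted arcs that rewrite B between nodes 1 and 2, and one arc for D from node 2 to node 3.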