SPRAAK
Unit files contain lists of 'acoustic units' with additional information such as the number of states, the state numbers and the context dependency. In the SPRAAK hierarchy, units form the layer between words and states, i.e. words are composed of units and units are composed of states.
'Acoustic units' are hence not necessarily phones, but can be any subword unit.
Unit files of different complexity appear at different stages in the design of a speech recognition system. Typically two types are distinguished: context-independent unit files ('.ci') and context-dependent unit files ('.cd').
The file extensions '.ci' and '.cd' may be somewhat confusing, as the '.cd' file is needed with context-independent phone models as well. Moreover, SPRAAK versions up to v0.9 also required an '.arcd' file; see the legacy description of the '.arcd' format at the end of this section for more details.
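Each line in a context-dependent unit file contains the following information:
<unitname> <transcription> <nstates> [state numbers]
in which <unitname> is the allophone identifier, <transcription> describes the context dependency of the unit, <nstates> is the number of states, and [state numbers] lists the absolute state number of each state. These fields are explained in more detail after the examples.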
The first example shows a context-independent unit file, which typically defines the phonetic alphabet.
.spr
DATA UNIT
DIM1 43
UNIT_TYPE CI phonemes
LANGUAGE English
#
i
I
I!
E
{
@
@!
}
}!
u
U
O
A
e+I
a+U
a+I
...
d+Z
#
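As a rough illustration of how such a file could be read, the sketch below splits a unit file into its header fields and its list of units. It is a minimal sketch, not SPRAAK's own reader: the helper name 'read_spr_units' is hypothetical, and the layout it expects (a '.spr' first line, 'KEY value' header lines, a line holding only '#' to close the header, then one unit per line) is inferred from the examples in this section only.

```python
from typing import Dict, List, Tuple


def read_spr_units(text: str) -> Tuple[Dict[str, str], List[str]]:
    """Split a SPRAAK-style unit file into header fields and unit lines.

    Hypothetical helper, assuming: the first line is '.spr', each header
    line is '<KEY> <value...>', and a line holding only '#' ends the header.
    """
    lines = [ln.strip() for ln in text.splitlines()]
    if not lines or lines[0] != ".spr":
        raise ValueError("missing '.spr' first line")
    header: Dict[str, str] = {}
    i = 1
    while i < len(lines) and lines[i] != "#":
        if lines[i]:
            key, _, value = lines[i].partition(" ")
            header[key] = value.strip()
        i += 1
    if i == len(lines):
        raise ValueError("header is not terminated by '#'")
    # Everything after the first '#' line is treated as data, one unit per
    # line; a later '#' line is therefore kept as an ordinary unit entry.
    units = [ln for ln in lines[i + 1:] if ln]
    # Assumed sanity check: DIM1 appears to give the number of units
    # (e.g. identifiers 0 to 2130 for DIM1 2131 in the '.cd' example).
    if "DIM1" in header and int(header["DIM1"]) != len(units):
        raise ValueError(f"DIM1={header['DIM1']} but {len(units)} units found")
    return header, units


example = ".spr\nDATA UNIT\nDIM1 4\nUNIT_TYPE CI phonemes\nLANGUAGE English\n#\ni\nI\nE\n#\n"
header, units = read_spr_units(example)
print(header["UNIT_TYPE"], units)  # -> CI phonemes ['i', 'I', 'E', '#']
```

The header is closed at the first '#' line only, so a '#' that appears later in the unit list is returned like any other unit.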
The second example shows an excerpt of a context-dependent ('.cd') unit file:
.spr
DIM1 2131
DATA UNIT
#
@0 [d]-@-[sz] 3 0 118 198
@1 [d]-@-[r] 3 0 120 197
@2 [d]-@-[l] 3 0 122 196
.........
s1572 [mN]-s-[SZ] 3 443 461 433
s1573 [mN]-s-[n] 3 443 461 434
s1574 [mN]-s-[l] 3 443 461 436
s1575 [mN]-s-[szr*] 3 443 461 437
s1576 [td]-s-[#] 3 444 455 432
s1577 [td]-s-[d] 3 444 456 430
s1578 [td]-s-[t] 3 444 457 431
s1579 [td]-s-[w] 3 444 458 429
s1580 [td]-s-[m] 3 444 458 434
s1581 [td]-s-[pb] 3 444 458 435
s1582 [td]-s-[fv] 3 444 458 437
s1583 [td]-s-[hj] 3 444 459 429
s1584 [td]-s-[kgxGN] 3 444 459 433
.......
r2127 [pbkgfvxG*#]-r-[gxG] 3 552 573 564
r2128 [pbkgfvxG*#]-r-[p] 3 552 574 563
r2129 [pbkgfvxG*#]-r-[bfv*] 3 552 574 564
2130 -*- 1 575
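Each data line of a '.cd' file can be split into the four fields <unitname>, <transcription>, <nstates> and the state numbers. The sketch below assumes whitespace-separated fields; 'parse_cd_line' and 'CDUnit' are hypothetical names. The check that the number of listed states equals <nstates> is an assumption drawn from the example, and it is skipped when no state numbers are given, since the format shows them in brackets.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class CDUnit:
    name: str          # allophone identifier, e.g. 's1572'
    context: str       # context description, e.g. '[mN]-s-[SZ]'
    nstates: int       # number of states of the unit
    states: List[int]  # absolute state numbers, one per state


def parse_cd_line(line: str) -> CDUnit:
    """Parse one '<unitname> <transcription> <nstates> [state numbers]' line."""
    fields = line.split()
    if len(fields) < 3:
        raise ValueError(f"too few fields: {line!r}")
    name, context, nstates = fields[0], fields[1], int(fields[2])
    states = [int(s) for s in fields[3:]]
    # Assumed consistency rule: if states are listed, there is one per state.
    if states and len(states) != nstates:
        raise ValueError(f"{name}: expected {nstates} states, got {len(states)}")
    return CDUnit(name, context, nstates, states)


for line in ["s1572 [mN]-s-[SZ] 3 443 461 433", "2130 -*- 1 575"]:
    print(parse_cd_line(line))
```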
Each line in a .cd-file describes a single allophonic unit.
The allophonic identifier is the concatenation of the 'context-independent' phone and a unique numerical identifier. This number is unique across the allophones of all phones; it is not the n-th variant of a phone. For example, s1572 is not the 1572nd variant of 's'.
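Because the numerical identifier is appended directly to the phone symbol, an identifier can be split again by taking the trailing run of digits. This is a sketch with a hypothetical helper name ('split_allophone_id'); it assumes that no phone symbol ends in a digit, which holds for the phone set in the examples above but is not guaranteed in general.

```python
import re
from typing import Tuple

# Non-greedy prefix = phone symbol; trailing digits = numerical identifier.
_ALLOPHONE_ID = re.compile(r"^(.*?)(\d+)$")


def split_allophone_id(name: str) -> Tuple[str, int]:
    """Split an allophone identifier such as 's1572' into ('s', 1572)."""
    m = _ALLOPHONE_ID.match(name)
    if m is None:
        raise ValueError(f"no numerical identifier in {name!r}")
    return m.group(1), int(m.group(2))


print(split_allophone_id("@0"), split_allophone_id("s1572"))
# -> ('@', 0) ('s', 1572)
```

Since the numbers are unique over all phones, the number alone already identifies the allophone; the phone prefix mainly aids readability.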
The second field gives the context classes that define the allophonic variant. The center phone is given between '-' signs; the clustered contexts are given in square brackets. The example above shows triphone models, i.e. both the left and the right context are defined. Quinphones are represented as [L2][L1]-ph-[R1][R2], and so on. Each context list is interpreted as an 'OR' list, e.g. [sz] stands for 's' or 'z'. Omitting the left context, the right context or both implies that there is no context dependency on that side.
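The context notation and its 'OR' semantics can be made concrete with a small matcher. This is a sketch only: 'parse_context' and 'matches' are hypothetical names, each bracketed class is assumed to be a list of single-character phone symbols (true for the example, not necessarily for every phone set), and symbols such as '*' are compared literally because their meaning is not defined in this section.

```python
import re
from typing import List, Sequence, Tuple

# [L..][L1]-center-[R1][R..]; either context side may be empty.
_CONTEXT = re.compile(r"^((?:\[[^\]]*\])*)-([^\[\]-]+)-((?:\[[^\]]*\])*)$")
_BRACKET = re.compile(r"\[([^\]]*)\]")


def parse_context(desc: str) -> Tuple[List[str], str, List[str]]:
    """Split '[L2][L1]-ph-[R1][R2]' into (['L2', 'L1'], 'ph', ['R1', 'R2'])."""
    m = _CONTEXT.match(desc)
    if m is None:
        raise ValueError(f"malformed context description: {desc!r}")
    left, center, right = m.groups()
    return _BRACKET.findall(left), center, _BRACKET.findall(right)


def matches(desc: str, left: Sequence[str], center: str, right: Sequence[str]) -> bool:
    """Check whether a phone and its neighbours fit a context description.

    'left' lists preceding phones in text order (immediate neighbour last);
    'right' lists following phones (immediate neighbour first).
    """
    lctx, ph, rctx = parse_context(desc)
    if ph != center or len(left) < len(lctx) or len(right) < len(rctx):
        return False
    # Left classes are compared from the center outwards: [L1] vs left[-1], ...
    for cls, phone in zip(reversed(lctx), reversed(left)):
        if phone not in set(cls):  # OR list: any symbol of the class matches
            return False
    for cls, phone in zip(rctx, right):
        if phone not in set(cls):
            return False
    return True


print(parse_context("[td]-s-[kgxGN]"))                # -> (['td'], 's', ['kgxGN'])
print(matches("[td]-s-[kgxGN]", ["d"], "s", ["G"]))  # -> True
print(matches("[td]-s-[kgxGN]", ["d"], "s", ["l"]))  # -> False
```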
The remaining columns give the absolute state numbers. There is no file that lists the states: the software infers the list from the .cd file and assumes it is consistent with the HMM files, which contain no state names and are mere lists of numbers. Note that units may share states; in the example above, state 443 is the first state of s1572 up to s1575.
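Inferring the state list from the '.cd' file amounts to collecting every absolute state number that the units reference. The sketch below (hypothetical helper 'infer_states', assuming the four-field line layout described above) also counts how many units use each state, which exposes shared states.

```python
from collections import Counter
from typing import Iterable, List, Tuple


def infer_states(cd_lines: Iterable[str]) -> Tuple[List[int], Counter]:
    """Collect the absolute state numbers referenced by '.cd' unit lines.

    Returns the sorted distinct states and, per state, the number of units
    using it (a count above 1 means the state is shared between units).
    """
    usage: Counter = Counter()
    for line in cd_lines:
        fields = line.split()
        nstates = int(fields[2])
        for state in fields[3:3 + nstates]:
            usage[int(state)] += 1
    return sorted(usage), usage


lines = [
    "s1572 [mN]-s-[SZ] 3 443 461 433",
    "s1573 [mN]-s-[n] 3 443 461 434",
    "s1576 [td]-s-[#] 3 444 455 432",
]
states, usage = infer_states(lines)
print(states)      # -> [432, 433, 434, 443, 444, 455, 461]
print(usage[443])  # -> 2
```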
In some programs it is more convenient to reference states relative to the allophone. In that case the notation <phone><identifier>#<relative_state_number> is used, in which relative state numbers count 0, 1, 2, ... Thus "r2129#1" refers to absolute state number 574 in the example above.
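Resolving a relative reference is a table lookup: find the unit, then index its list of absolute states. In this sketch, 'resolve_state' is a hypothetical name and the table is filled by hand from the '.cd' example. As a precaution, the reference is split on its last '#', so that a unit name which itself contains '#' would still resolve.

```python
from typing import Dict, List


def resolve_state(ref: str, unit_states: Dict[str, List[int]]) -> int:
    """Map '<phone><identifier>#<relative_state_number>' to an absolute state."""
    unit, sep, rel = ref.rpartition("#")
    if not sep or not unit or not rel.isdigit():
        raise ValueError(f"not a relative state reference: {ref!r}")
    return unit_states[unit][int(rel)]  # relative numbers start at 0


# Absolute state numbers copied from the '.cd' example.
unit_states = {
    "r2128": [552, 574, 563],
    "r2129": [552, 574, 564],
}
print(resolve_state("r2129#1", unit_states))  # -> 574
print(resolve_state("r2129#2", unit_states))  # -> 564
```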
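The '#' symbol is used with several different meanings in the SPRAAK package. Although this never leads to parsing problems, it may somewhat hamper readability. Within this section alone, '#' terminates the header of the unit files, appears as a symbol in context classes such as [td]-s-[#], and separates a unit name from a relative state number, as in r2129#1.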
The '.arcd' file closely resembles the '.cd' file, except that it lacks the field specifying the context dependency of the unit; it thus contains only unit_name, number_of_states and state_numbers. Hence it was only suited for CI topology specifications, a role now fully taken over by the '.cd' file.
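The difference between the two formats is a single column, as the small sketch below shows. The helper name 'cd_to_arcd' is hypothetical, and only the data lines are handled: the header and exact spacing of legacy '.arcd' files are not described here and are left out.

```python
def cd_to_arcd(line: str) -> str:
    """Drop the context field from a '.cd' unit line.

    The result follows the 'unit_name number_of_states state_numbers'
    layout described for '.arcd' files; header handling is not covered.
    """
    fields = line.split()
    if len(fields) < 3:
        raise ValueError(f"too few fields: {line!r}")
    return " ".join([fields[0]] + fields[2:])


print(cd_to_arcd("s1572 [mN]-s-[SZ] 3 443 461 433"))  # -> s1572 3 443 461 433
```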