What ?

These are tutorials on HOW to use SPRAAK. They are NOT tutorials on speech recognition as such and assume that the user has at least a basic understanding of speech recognition. Each individual tutorial treats one specific aspect of a recognition system and is thus self-contained; at the same time, they are placed in a natural order. It is not strictly necessary to go through them sequentially, though you should expect many things to be explained only once. The majority of the tutorials on acoustic modeling are developed for the TIMIT database. Apart from that, example setups are available for other tasks, including WSJ.

Each of the tutorials consists of:

a manual page giving some theoretical background and explaining how this is implemented in SPRAAK
a set of scripts, resources and configuration files with which you can run an experiment illustrating the current topic
suggestions to make small modifications to the presented setup

All experiments relating to the tutorials and example setups are available in:

$SPR_HOME/examples/timit        for TIMIT tutorials
$SPR_HOME/examples/wsj          for WSJ example setups (to be used in conjunction with ./examples/resources)
$SPR_HOME/examples/fsg          for example Finite State Grammars in different formats

If you haven't read the general materials of the user manual, you should at least have a look at Example Linguistic Resources before moving on.

The tutorials are built up step by step and are best run sequentially.

Why TIMIT ?

PRELIMINARY REMARK: The TIMIT Database is NOT distributed with SPRAAK. For running the complete TIMIT based tutorials, you will need the database to be available on your system. For this you need a license from the LDC which can be found at http://www.ldc.upenn.edu.

The TIMIT database contains speech from (mainly native) US English speakers, each pronouncing a relatively small set of sentences selected from a shared pool. This database has been around for several decades and is still widely used today. We chose TIMIT as the database for our introductory tutorials because:

it is commonly available and widely used
it is small enough to guarantee a fast turnaround time for complete experiments (less than 1/2 hr.)
it is large enough to demonstrate many of the capabilities in SPRAAK
the hand segmentations provide for easy bootstrapping
the default phone recognition setup is simple

On the other hand, it is our understanding that, due to its limitations, TIMIT is more suited for fast prototyping than for large scale benchmarking.

Setting up for the TIMIT Tutorials

Part of the challenge in running speech recognition experiments is maintaining an overview of the numerous files and parameters that come into play. SPRAAK has a number of configuration files that help in maintaining an overview. Having a good directory structure is the other part of the solution. An 'optimal' directory structure is a personal choice that will evolve as you get more used to using SPRAAK. For the tutorials we advise you to mimic the directory structure in the tutorials and to follow the detailed instructions. You should also specify a scratch directory for storage of temporary files. SPRAAK doesn't do a full cleanup of the scratch directories, but you can clean them up yourself whenever needed. All interesting files are stored elsewhere (and there too, you may like to do some cleanup from time to time).

So before getting started on the tutorials, you should execute the following commands (assuming MYSCRATCHDISK and TIMIT are variables set to point, respectively, to your own scratch disk and to the installation of the TIMIT CDs on your system):

> mkdir ~/MyTimit                               # create a top level working directory for your experiments
> cd ~/MyTimit
> ln -s $TIMIT dbase                            # put a link to the TIMIT database
> ln -s $MYSCRATCHDISK scratch                  # define a toplevel scratch directory
> cp -r $SPR_HOME/examples/timit/resources .    # define a directory with (linguistic) resources
> cp -r $SPR_HOME/examples/timit/scripts .      # define a directory with scripts and configuration files
> cp -r $SPR_HOME/examples/timit/models .       # define a directory with reference models
> mkdir exp                                     # create an experiment directory

This will give you the following directory structure:

dbase@   exp/    models/    resources/     scripts/      scratch@
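
As a quick sanity check of the setup above, the sketch below reports which of the six expected entries are missing from a working directory. The function name check_layout is hypothetical and not part of SPRAAK; it only assumes the directory names listed above and a POSIX shell.

```shell
#!/bin/sh
# Hypothetical helper (NOT part of SPRAAK): report which of the expected
# entries are missing from a tutorial working directory.
check_layout() {
    top=$1
    missing=""
    for entry in dbase exp models resources scripts scratch; do
        # -e follows symbolic links, so a dangling dbase or scratch link
        # is reported as missing as well
        if [ ! -e "$top/$entry" ]; then
            missing="$missing $entry"
        fi
    done
    if [ -n "$missing" ]; then
        echo "missing:$missing"
        return 1
    fi
    echo "layout ok"
}
```

Calling check_layout ~/MyTimit after the commands above should print "layout ok"; a line starting with "missing:" usually means that TIMIT or MYSCRATCHDISK was not set, or that one of the copy commands failed.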

Converting TIMIT CDs to SPRAAK resources

The scripts below show how we converted the TIMIT resources available on the distribution CDs to the SPRAAK format. These scripts will need modifications to run within your computing environment.

The Dictionary (.dic), Phone alphabet (.ci), State description (.cd) and Alphabet translation (.xlat) files were generated by hand (see also the description in Example Linguistic Resources).

The Corpus files and Segmentations were generated by the following scripts, starting from the original TIMIT CDs:

MAKE_COR_SEG: makes corpus and continuous time segmentation files
SPR_SEG_CTB2DTB: converts continuous time phone segmentations to discrete time state segmentations

TBD **

The Phone Language Models were created using the SRI LM toolkit and some extra local scripts:

SPR_MAKE_PHONE_LM

TBD **