SPRAAK
The Wall Street Journal (WSJ) test suite has been one of the most widely used speech recognition benchmarks since the early 1990s. This demo suite contains a configuration for training a state-of-the-art HMM-based system for the WSJ benchmark. On the popular November 92 non-verbalized punctuation test set, using a trigram language model, the system obtains an error rate of less than 8%.
This tutorial is not written for first-time users of SPRAAK and/or speech recognition. We advise first-time users to go through the TIMIT tutorials first, as these introduce the individual components and concepts of the SPRAAK package step by step. This tutorial assumes an understanding of the concepts underlying a large vocabulary speech recognition system. Experienced speech recognition researchers may start here and consult the general manual pages or selected pages from the TIMIT tutorials when needed.
This tutorial shows all the steps in creating and evaluating a large vocabulary system, including:
– getting all the required resources in place and in the right format
– training an acoustic model
– preparing a language model
– evaluating and fine-tuning the final system
The WSJ benchmark was established in the early 1990s to drive the development of large vocabulary speech recognition and to establish a uniform evaluation methodology for it. The benchmark has mainly been used to evaluate progress in acoustic modeling. While it is a popular and useful benchmark, one should also understand its limitations. Because the data consists of read speech in high-quality recordings, neither the results obtained on it nor a system trained on this data alone are representative of real-life applications.