| SprCwrSLex * | spr_cwr_slex_free (SprCwrSLex *lex, SprMsgId *routine) |
|   | Free a set of lexicon names. |
| SprCwrSLex * | spr_cwr_slex_alloc (SprCwrSLex *lex, int nr_words, int wlen_tot) |
|   | Allocate and/or initialise the set of lexicon names. |
| int | spr_cwr_sum_str_len (const SprCwrSLex *lex) |
| SprCwrSLex * | spr_cwr_slex_dup (SprCwrSLex *lex_dst, const SprCwrSLex *lex_src) |
| SprCwrSLex * | spr_cwr_slex1_create (SprCwrSLex *lex, int duplicate, const SprStrHashTbl *whash_main, const SprStrHashTbl *whash_ext) |
| SprCwrSLex * | spr_cwr_slex_read (SprCwrSLex *lex, const char *fname, SprKeySet *keys, int split_phon_desc, int unsorted) |
| int | spr_cwr_slex_dump (SprStream *fd, const SprCwrSLex *lex) |
| const char * | spr_cwr_slex_get_word (int word_id, const SprCwrSLex *lex) |
|   | Convert the word id to the corresponding word string. |
| int | spr_cwr_is_html_marker (const char *word, int len) |
| int | spr_cwr_get_word_id (SprCwrWordSet *word_set, const char *word, int len, const SprCwrSLex *lex, int notify_level) |
| void | spr_cwr_word_set_print (SprStream *dest, const SprCwrWordSet *word, const SprCwrSLex *lex) |
|   | Print all words in a word_set. |
| int | spr_cwr_word_seq_decode (SprCwrWordSeq *word_seq, const char *word_str, const SprCwrSLex *lex, int notify_level) |

Handling of words in all cwr-routines. 
- Used symbols
 WORD_ID      = (GRAPHEME,GRAPHEME_EXTENSION)
WORD_MODEL   = (WORD_ID,PRONONCIATION)
LM_WORD_ID   = (GRAPHEME,GRAPHEME_EXTENSION')
WORD_CLASSES = (MEANING)
 The GRAPHEME_EXTENSION can be used for several purposes. Examples are: 
- 
Male/female models. 
 
- 
OOV-models for different word lengths (number of syllables) 
 
- 
Different types of OOV-models (e.g. one for proper names, geographic names, ...) 
 
- 
Different words (PRONONCIATION,MEANING) with the same GRAPHEME (e.g. bedelen, appel, ...). 
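
The pairing of GRAPHEME and GRAPHEME_EXTENSION can be pictured with a small C sketch. The types `WordId` and `WordModel` and both helper functions are hypothetical illustrations, not the `SprCwr*` structures; representing a missing extension as an empty string is likewise an assumption.

```c
#include <assert.h>
#include <string.h>

/* Hypothetical types for illustration only; not the SprCwr* structures. */
typedef struct {
    const char *grapheme;   /* written form, e.g. "appel" */
    const char *extension;  /* GRAPHEME_EXTENSION; "" when absent (assumption) */
} WordId;

typedef struct {
    WordId      id;             /* WORD_ID */
    const char *pronunciation;  /* phonetic description */
} WordModel;

/* Order WORD_IDs by grapheme first and by extension second. */
int word_id_cmp(const WordId *a, const WordId *b)
{
    int c;
    assert(a != NULL && b != NULL);
    c = strcmp(a->grapheme, b->grapheme);
    return c != 0 ? c : strcmp(a->extension, b->extension);
}

/* Two WORD_IDs share a GRAPHEME but differ in their extension. */
int word_id_is_homograph(const WordId *a, const WordId *b)
{
    assert(a != NULL && b != NULL);
    return strcmp(a->grapheme, b->grapheme) == 0
        && strcmp(a->extension, b->extension) != 0;
}
```

With this ordering, all variants of one GRAPHEME sit next to each other in a sorted lexicon, which is what makes a lookup by GRAPHEME alone cheap.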
 
- How are words handled
 
- 
The recognizer 
- 
Creates WORD_ID hypotheses and sends them to the LM. 
 
- 
The LM translates them into the corresponding LM_WORD_ID's, which are extended with the WORD_CLASS information. The LM also checks for special WORD_CLASSES (e.g. SENTENCE_MARKER, FILLER_MODEL, ...) 
 
 
- 
Perplexity/tagging operations: 
- 
The input consists of the GRAPHEME's. 
 
- 
They are translated into (multiple) WORD_ID hypotheses by the (binary search) lexicon lookup algorithm. Multiple WORD_ID hypotheses occur when a word is given without its GRAPHEME_EXTENSION. 
 
- 
The work of the LM remains the same as during the recognition. 
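
The lookup step for perplexity/tagging can be sketched as follows. This is a minimal illustration under stated assumptions, not the library's implementation: `LexEntry`, `lex_lower_bound` and `lex_lookup` are hypothetical names, the lexicon is assumed to be an array sorted by GRAPHEME, and a `NULL` extension stands for "no GRAPHEME_EXTENSION given".

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical lexicon entry; the array is assumed sorted by grapheme. */
typedef struct {
    const char *grapheme;
    const char *extension;
} LexEntry;

/* Binary search: index of the first entry whose grapheme is >= the key. */
size_t lex_lower_bound(const LexEntry *lex, size_t n, const char *grapheme)
{
    size_t lo = 0, hi = n;
    assert(lex != NULL || n == 0);
    while (lo < hi) {
        size_t mid = lo + (hi - lo) / 2;
        if (strcmp(lex[mid].grapheme, grapheme) < 0)
            lo = mid + 1;
        else
            hi = mid;
    }
    return lo;
}

/* Return the number of WORD_ID hypotheses and store the index of the
 * first one in *first. With extension == NULL every variant of the
 * grapheme is a hypothesis; otherwise at most one entry matches. */
size_t lex_lookup(const LexEntry *lex, size_t n, const char *grapheme,
                  const char *extension, size_t *first)
{
    size_t i = lex_lower_bound(lex, n, grapheme);
    size_t count = 0;
    assert(first != NULL);
    *first = i;
    for (; i < n && strcmp(lex[i].grapheme, grapheme) == 0; i++) {
        if (extension == NULL) {
            count++;
        } else if (strcmp(lex[i].extension, extension) == 0) {
            *first = i;
            return 1;
        }
    }
    return count;
}
```

Looking up a GRAPHEME without an extension returns the whole run of adjacent entries, so the caller receives several WORD_ID hypotheses; supplying the extension narrows the result to a single entry.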
 
 
- How is the information stored
 
- 
The LEXICON stores 
- 
The WORD_MODEL's (GRAPHEME,GRAPHEME_EXTENSION,PRONONCIATION) 
 
- 
The GRAPHEME for the UNKNOWN_WORD. 
 
 
- 
The COUNT(LM)-file stores the LM-info, i.e.: 
- 
The CLASS-sequences and counts. 
 
- 
The LM_WORD_ID information (GRAPHEME,GRAPHEME_EXTENSION'). 
 
- 
The LM_WORD_ID to WORD_CLASSES conversion table. 
 
- 
The LM_WORD_ID distribution for all appropriate CLASSES. 
 
- 
A list of special CLASSES with their appropriate flags. 
 
 
- Remarks
 
- 
Special (fixed) WORD_CLASSES and WORD_MODEL's
 These classes are fixed, so they can be used at any time and by any program without having to read any resources (e.g. bootstrapping). For each special class, there is a corresponding special word. These words allow you to use the GRAPHEME's (both as input and as output), even if they do not occur in the lexicon. 
- 
SENTENCE_BEGIN 
 
- 
SENTENCE_END 
 
- 
UNKNOWN_WORD 
 
Other WORD_CLASSES are defined for internal use only: 
- 
EMPTY_LM_SLOT 
 
- 
PARTIAL_WORD 
 
 
- 
Fallback routine for unknown words:
 Replace by the UNKNOWN_WORD symbol and retry (may result in multiple WORD_MODEL candidates).  
- 
Handling sentence begin/end: 
- 
Only one symbol is provided (i.e. the SENTENCE_MARKER). 
 
- 
No statistics are recorded regarding cross-sentence effects. 
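
The fallback for unknown words described in the remarks above can be sketched like this. Everything here is an assumption made for illustration: the `Entry` type, the helper names, the linear scan, and in particular the spelling `"<UNK>"` for the UNKNOWN_WORD grapheme, which this page does not specify.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Assumed spelling of the UNKNOWN_WORD grapheme; not taken from the library. */
#define UNKNOWN_WORD_GRAPHEME "<UNK>"

/* Hypothetical lexicon entry. */
typedef struct {
    const char *grapheme;
    const char *extension;
} Entry;

/* Count the entries that carry the given grapheme (linear scan for brevity). */
size_t count_matches(const Entry *lex, size_t n, const char *grapheme)
{
    size_t i, count = 0;
    assert(lex != NULL || n == 0);
    for (i = 0; i < n; i++)
        if (strcmp(lex[i].grapheme, grapheme) == 0)
            count++;
    return count;
}

/* Look the word up; when nothing matches, replace it by the UNKNOWN_WORD
 * symbol and retry. *used receives the grapheme that was finally matched. */
size_t lookup_with_fallback(const Entry *lex, size_t n,
                            const char *grapheme, const char **used)
{
    size_t count;
    assert(used != NULL);
    *used = grapheme;
    count = count_matches(lex, n, grapheme);
    if (count == 0) {
        *used = UNKNOWN_WORD_GRAPHEME;
        count = count_matches(lex, n, *used);
    }
    return count;
}
```

If the lexicon holds several OOV-models, for instance one per word length, the retry yields more than one candidate, which matches the remark that the fallback may result in multiple WORD_MODEL candidates.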
 
 
- Author
 - Kris Demuynck 
 
- Date
 - Oct 1996