SPRAAK
cwr_lex.c File Reference

Handling of linear and tree structured lexicon descriptions.

Data Structures

struct  SprLexOD
 
union  SprCwrLexNL
 Nodes and links in the lexicon network: final version.
 
struct  SprCwrLexDesc
 Compressed lexicon description.
 

Typedefs

typedef int(* _FuncPtr1_CWR_LEX_ )(SprCwrLexNL *lex_ptr, int node_nr, SprCwrLexDesc *lex_desc, va_list *args)
 
int spr_cwr_lex_desc_traverse (SprCwrLexDesc *lex_desc, _FuncPtr1_CWR_LEX_ func,...)
 

Enumerations

enum  {
  SPR_CWR_START_NODE, SPR_CWR_START_NODE_X, SPR_CWR_END_NODE, SPR_CWR_END_NODE_X,
  SPR_CWR_ROOT_NODE, SPR_CWR_ROOT_NODE_X, SPR_CWR_LINK_NODE, SPR_CWR_LINK_NODE_X,
  SPR_CWR_XXX_NODE, SPR_CWR_XXX_NODE_X, SPR_CWR_NORMAL_NODE, SPR_CWR_FREE_NODE
}
 
enum  {
  SPR_CWR_CI_PHONES, SPR_CWR_CD_PHONES, SPR_CWR_CX_PHONES, SPR_CWR_CXI_PHONES,
  SPR_CWR_CXD_PHONES, SPR_CWR_SS_PHONES, SPR_CWR_STR_PHONES
}
 
enum  {
  SPR_CWR_LI_PHONE_FINAL, SPR_CWR_LI_TRIV_NEXT, SPR_CWR_LI_LAST_LINK, SPR_CWR_LI_TRIV2LAST,
  SPR_CWR_LI_LEX_HIST, SPR_CWR_LI_NO_LEX_HIST, SPR_CWR_LI_MARK_FSIL, SPR_CWR_LI_MASK,
  SPR_CWR_LI_SHIFT
}
 
enum  { FSILxSTART, FSILxEND }
 

Functions

int spr_cwr_lex_size (const SprCwrLexDesc *lex_desc)
 
void spr_cwr_lex_word_print (SprStream *dest, const char *lm, int word_nr, const char *rm, const SprCwrLexDesc *lex_desc)
 
SprCwrLexDesc * spr_cwr_lex_desc_info_free (SprCwrLexDesc *lex_desc, SprMsgId *routine)
 
SprCwrLexDesc * spr_cwr_lex_desc_free (SprCwrLexDesc *lex_desc, SprMsgId *routine)
 Free the lexicon description.
 
void spr_cwr_lex_desc_print (SprStream *dest, SprCwrLexDesc *lex_desc, const SprCwrPhoneDesc *ph)
 Print the tree structured lexicon in a linear format.
 
SprCwrLexNL * spr_cwr_lex_node_next (const SprCwrLexNL *lex_ptr)
 
void spr_cwr_lex_desc_dump (const SprCwrLexDesc *lex_desc, SprStream *fd)
 Dump all information in the compressed/state-based lexicon.
 
int spr_cwr_lex_desc_check (SprCwrLexDesc *lex_desc, const SprCwrPhoneDesc *ph)
 Check for errors in the lexicon network.
 
int spr_cwr_lex_desc_opt_check (SprCwrLexDesc *lex_desc, const SprCwrPhoneDesc *ph)
 
SprCwrLexDesc * spr_cwr_lex_desc_read (SprCwrLexDesc *lex_desc, const char *lex_fname, const char *clex_fname, const SprCwrPhoneDesc *phone_desc, const char *unwind)
 
SprCwrLexDesc * spr_cwr_lex_desc_scan (SprCwrLexDesc *lex_desc, SprCwrPhoneDesc *phone_desc, const char *unwind, const char *lex_fname)
 
int spr_cwr_lex_sent_to_trellis (SprCwrLexDesc *lex_desc, const char *sentence, int word_level)
 
int spr_cwr_lex_node_expands_to (SprCwrLexNL *lex_ptr, SprCwrLexNL *to)
 
char * spr_cwr_phone_desc (char *word_str, SprCwrLexDesc *lex_desc, SprCwrPhoneDesc *ph, int compact, int wexpand_flags)
 
int spr_cwr_lexicon_print (const char *dest, SprCwrLexDesc *lex_desc, SprCwrPhoneDesc *ph, int wexpand_flags, SprStream *progress)
 

Variables

const char *const spr_cwr_special_node_str [SPR_CWR_NR_SPECIAL_NODES+1]
 
const char *const spr_cwr_special_node_str_short [SPR_CWR_NR_SPECIAL_NODES]
 
const char *const spr_cwr_phone_type_str []
 
const SprOptDesc spr_lex_load_opt_od []
 
const SprOptDesc spr_lex_load_req_od []
 
const SprCmdOptDesc spr_cwr_od_lex_unwind []
 
SprCwrLexDesc const spr_cwr_empty_lex_desc
 

Detailed Description

Handling of linear and tree structured lexicon descriptions.

The lexicon tree/network:
The lexicon is described as a left to right, minimal branching, word tag optimized network.
left to right:
The lexicon network may not contain loops, except through the special END_NODEs and START_NODEs.
network:
The nodes may have several children as well as several parents.
minimal branching:
No node can have two similar (same tag) parents.
word tag optimized:
The word tag (lex_hist field on the links) must occur as early as possible.
The network starts with one START_NODE or ROOT_NODE. The end of the network is marked with one or more END_NODEs. Word beginnings are indicated with START_NODEs. Unlike the END_NODEs, these START_NODEs may occur in the middle of the network. The END_NODEs are thus the terminating elements: the only links leaving these nodes should be the loop-back links (going to a START_NODE).
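The "left to right" and END_NODE constraints above can be illustrated with a small stand-alone check. This is only a sketch on a toy adjacency-matrix network: the names toy_kind, toy_net and toy_net_check are hypothetical and are not part of SPRAAK, and the code does not use SprCwrLexNL or SprCwrLexDesc. It is not a description of how spr_cwr_lex_desc_check works.

```c
#include <assert.h>

/* Toy node kinds; these are NOT the SPRAAK SPR_CWR_* values. */
enum toy_kind { TOY_START, TOY_NORMAL, TOY_END };

#define TOY_MAX_NODES 16

struct toy_net {
    int nr_nodes;
    enum toy_kind kind[TOY_MAX_NODES];
    /* link[a][b] != 0 means there is a link from node a to node b */
    unsigned char link[TOY_MAX_NODES][TOY_MAX_NODES];
};

/* A loop-back link goes from an END node to a START node. */
static int toy_is_loop_back(const struct toy_net *net, int from, int to)
{
    return net->kind[from] == TOY_END && net->kind[to] == TOY_START;
}

/* Depth-first search that ignores loop-back links.
 * state: 0 = unvisited, 1 = on the current path, 2 = finished.
 * Returns 1 if a loop is found, 0 otherwise. */
static int toy_visit(const struct toy_net *net, int node, unsigned char *state)
{
    state[node] = 1;
    for (int to = 0; to < net->nr_nodes; to++) {
        if (!net->link[node][to] || toy_is_loop_back(net, node, to))
            continue;
        if (state[to] == 1)
            return 1;
        if (state[to] == 0 && toy_visit(net, to, state))
            return 1;
    }
    state[node] = 2;
    return 0;
}

/* Returns  0 if the toy network respects both constraints,
 *         -1 if an END node has a link that does not go to a START node,
 *         -2 if there is a loop that does not pass through an END->START link. */
int toy_net_check(const struct toy_net *net)
{
    unsigned char state[TOY_MAX_NODES] = {0};

    assert(net->nr_nodes >= 0 && net->nr_nodes <= TOY_MAX_NODES);
    for (int from = 0; from < net->nr_nodes; from++)
        for (int to = 0; to < net->nr_nodes; to++)
            if (net->link[from][to] && net->kind[from] == TOY_END
                && net->kind[to] != TOY_START)
                return -1;
    for (int node = 0; node < net->nr_nodes; node++)
        if (state[node] == 0 && toy_visit(net, node, state))
            return -2;
    return 0;
}
```

For a three-node network START -> NORMAL -> END with a loop-back link END -> START, toy_net_check returns 0. Adding a link NORMAL -> START creates a loop that bypasses the END node and gives -2.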
Creating the lexicon tree:
  • Add all possible phoneme sequences to the lexicon tree.
  • Set the link between the end node and the start node (CWR).
  • Apply the given assimilation rules.
  • Compact the word endings (inverse tree structure).
  • Convert to context dependent phonemes and compress again (+ dummy link nodes).
  • Convert to a state based representation.
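The first step of the list above (adding phoneme sequences to a tree so that common prefixes are shared) can be sketched as a plain prefix tree. The types and functions below (toy_tree, toy_tree_init, toy_tree_add) and the phone inventory size are assumptions made for illustration; they are not SPRAAK data structures. The sketch stores the word number on the last node and covers none of the later steps (assimilation rules, compaction of word endings, context dependent phones, state based representation).

```c
#include <assert.h>

#define TOY_NR_PHONES      4   /* hypothetical phone inventory size */
#define TOY_MAX_TREE_NODES 64

struct toy_tree {
    int nr_nodes;                                  /* node 0 is the root */
    int child[TOY_MAX_TREE_NODES][TOY_NR_PHONES];  /* child node index, or -1 */
    int word[TOY_MAX_TREE_NODES];                  /* word ending here, or -1 */
};

/* Allocate one node without children or word tag; returns its index. */
static int toy_tree_new_node(struct toy_tree *t)
{
    assert(t->nr_nodes < TOY_MAX_TREE_NODES);
    int n = t->nr_nodes++;
    for (int p = 0; p < TOY_NR_PHONES; p++)
        t->child[n][p] = -1;
    t->word[n] = -1;
    return n;
}

/* Create an empty tree that contains only the root node. */
void toy_tree_init(struct toy_tree *t)
{
    t->nr_nodes = 0;
    toy_tree_new_node(t);
}

/* Add one phoneme sequence; nodes for an existing prefix are reused.
 * Returns the index of the node on which the word ends. */
int toy_tree_add(struct toy_tree *t, const int *phones, int nr_phones,
                 int word_nr)
{
    int node = 0;
    for (int i = 0; i < nr_phones; i++) {
        int p = phones[i];
        assert(p >= 0 && p < TOY_NR_PHONES);
        if (t->child[node][p] < 0) {
            int n = toy_tree_new_node(t);
            t->child[node][p] = n;
        }
        node = t->child[node][p];
    }
    t->word[node] = word_nr;  /* a homophone would overwrite the earlier tag */
    return node;
}
```

Adding the sequences {0, 1, 2} and {0, 1, 3} to an empty tree yields five nodes: the root, two shared prefix nodes, and one final node per word.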
Author
Kris Demuynck
Date
xx/10/1995 - KD
Creation.
01/02/1996 - KD
Allow nesting of brackets in textual word descriptions.
19/02/1996 - KD
Compact word endings.
06/07/1996 - KD
Assimilation rules added.
04/08/1997 - KD
State-based lexicon.
02/10/1998 - KD
State-based lexicon - compression.
27/07/1999 - KD
Modification to make quinphones run.
31/03/2004 - KD
Improved the robustness when using assimilation rules (still not very robust).
Bug:

Assimilation rules (should) work in sentence mode (vitpass, vitalign), but are very sensitive in recognition mode.

If the FULL_DEBUG flag is set during compilation (-DFULL_DEBUG), some extra debug messages and consistency tests are added. These messages and tests result in a serious overhead.
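A compile-time flag of this kind is commonly wired up with conditional compilation. The sketch below shows that general pattern only: the macro TOY_DEBUG_CHECK and the function toy_count_links are hypothetical, and nothing here is taken from the cwr_lex.c source.

```c
#include <stdio.h>

/* With -DFULL_DEBUG the check is compiled in and reports failures on stderr.
 * Without the flag the macro expands to an empty statement, so the condition
 * is never evaluated and costs nothing at run time. */
#ifdef FULL_DEBUG
#define TOY_DEBUG_CHECK(cond, msg) \
    do { \
        if (!(cond)) \
            fprintf(stderr, "consistency check failed: %s\n", (msg)); \
    } while (0)
#else
#define TOY_DEBUG_CHECK(cond, msg) do { } while (0)
#endif

/* Sum the number of outgoing links over all nodes of a toy network. */
int toy_count_links(const int *nr_children, int nr_nodes)
{
    int total = 0;
    for (int i = 0; i < nr_nodes; i++) {
        TOY_DEBUG_CHECK(nr_children[i] >= 0, "negative child count");
        total += nr_children[i];
    }
    return total;
}
```

Because the condition disappears entirely in a normal build, it must not have side effects that the surrounding code relies on.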

If the DAVINCIHOME environment variable is set, the debug information will also contain lexicon networks in the Davinci format.