Handling of linear and tree structured lexicon descriptions.

Data Structures
struct	SprLexOD

union	SprCwrLexNL
	Nodes and links in the lexicon network: final version.

struct	SprCwrLexDesc
	Compressed lexicon description.

Typedefs
typedef int(*	_FuncPtr1_CWR_LEX_ )(SprCwrLexNL lex_ptr, int node_nr, SprCwrLexDesc lex_desc, va_list args)

Functions
int	spr_cwr_lex_desc_traverse (SprCwrLexDesc lex_desc, _FuncPtr1_CWR_LEX_ func,...)

int	spr_cwr_lex_size (const SprCwrLexDesc *lex_desc)

void	spr_cwr_lex_word_print (SprStream dest, const char lm, int word_nr, const char rm, const SprCwrLexDesc lex_desc)

SprCwrLexDesc *	spr_cwr_lex_desc_info_free (SprCwrLexDesc lex_desc, SprMsgId routine)

SprCwrLexDesc *	spr_cwr_lex_desc_free (SprCwrLexDesc lex_desc, SprMsgId routine)
	Free the lexicon description.

void	spr_cwr_lex_desc_print (SprStream dest, SprCwrLexDesc lex_desc, const SprCwrPhoneDesc *ph)
	Print the tree structured lexicon in a linear format.

SprCwrLexNL *	spr_cwr_lex_node_next (const SprCwrLexNL *lex_ptr)

void	spr_cwr_lex_desc_dump (const SprCwrLexDesc lex_desc, SprStream fd)
	Dump all information in the compressed/state-based lexicon.

int	spr_cwr_lex_desc_check (SprCwrLexDesc lex_desc, const SprCwrPhoneDesc ph)
	Check for errors in the lexicon network.

int	spr_cwr_lex_desc_opt_check (SprCwrLexDesc lex_desc, const SprCwrPhoneDesc ph)

SprCwrLexDesc *	spr_cwr_lex_desc_read (SprCwrLexDesc lex_desc, const char lex_fname, const char clex_fname, const SprCwrPhoneDesc phone_desc, const char *unwind)

SprCwrLexDesc *	spr_cwr_lex_desc_scan (SprCwrLexDesc lex_desc, SprCwrPhoneDesc phone_desc, const char unwind, const char lex_fname)

int	spr_cwr_lex_sent_to_trellis (SprCwrLexDesc lex_desc, const char sentence, int word_level)

int	spr_cwr_lex_node_expands_to (SprCwrLexNL lex_ptr, SprCwrLexNL to)

char *	spr_cwr_phone_desc (char word_str, SprCwrLexDesc lex_desc, SprCwrPhoneDesc *ph, int compact, int wexpand_flags)

int	spr_cwr_lexicon_print (const char dest, SprCwrLexDesc lex_desc, SprCwrPhoneDesc ph, int wexpand_flags, SprStream progress)

Variables
const char *const	spr_cwr_special_node_str [SPR_CWR_NR_SPECIAL_NODES+1]

const char *const	spr_cwr_special_node_str_short [SPR_CWR_NR_SPECIAL_NODES]

const char *const	spr_cwr_phone_type_str []

const SprOptDesc	spr_lex_load_opt_od []

const SprOptDesc	spr_lex_load_req_od []

const SprCmdOptDesc	spr_cwr_od_lex_unwind []

SprCwrLexDesc const	spr_cwr_empty_lex_desc

Detailed Description

Handling of linear and tree structured lexicon descriptions.

The lexicon tree/network:

The lexicon is described as a left to right, minimal branching, word tag optimized network.

left to right: The lexicon network may not contain loops, except through the special END_NODE's and START_NODE's.
network: The nodes may have several children as well as parents.
minimal branching: No node can have two similar (same tag) parents.
word tag optimized: The word tag (lex_hist field on the links) must occur as early as possible.

The network starts with one START_NODE or ROOT_NODE. The end of the network is marked with one or more END_NODE's. Word beginnings are indicated with START_NODE's. These START_NODE's may, in contrast to the END_NODE's, occur in the middle of the network. Thus, the END_NODE's are the terminating elements; the only links starting from these nodes should be the loop-back links (going to a START_NODE).
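
The "left to right" constraint can be illustrated with a small self-contained sketch. The types and names below (toy_node, toy_is_left_to_right, the TOY_ constants) are hypothetical and are not the SprCwrLexNL or SprCwrLexDesc layouts; the sketch only assumes a network stored as an array of nodes with successor indices. It checks that links leaving an END node go to a START node, and that no loop exists once those loop-back links are ignored.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical toy types; NOT the real SprCwrLexNL layout. */
enum toy_node_type { TOY_START_NODE, TOY_END_NODE, TOY_NORMAL_NODE };

#define TOY_MAX_LINKS 4
#define TOY_MAX_NODES 64

struct toy_node {
    enum toy_node_type type;
    int nlinks;                 /* number of outgoing links */
    int links[TOY_MAX_LINKS];   /* indices of the successor nodes */
};

/* Depth-first search; colour: 0 = unvisited, 1 = on current path, 2 = done. */
static int toy_has_cycle(const struct toy_node *net, int node,
                         unsigned char *colour)
{
    int i;
    if (colour[node] == 1) return 1;   /* back edge: a loop was found */
    if (colour[node] == 2) return 0;
    colour[node] = 1;
    /* Loop-back links leaving an END node are allowed: do not follow them. */
    if (net[node].type != TOY_END_NODE) {
        for (i = 0; i < net[node].nlinks; i++)
            if (toy_has_cycle(net, net[node].links[i], colour)) return 1;
    }
    colour[node] = 2;
    return 0;
}

/* Returns 1 if the toy network is left to right, 0 otherwise. */
int toy_is_left_to_right(const struct toy_node *net, int nnodes)
{
    unsigned char colour[TOY_MAX_NODES] = {0};
    int n, i;

    assert(net != NULL && nnodes > 0 && nnodes <= TOY_MAX_NODES);
    for (n = 0; n < nnodes; n++) {
        assert(net[n].nlinks >= 0 && net[n].nlinks <= TOY_MAX_LINKS);
        for (i = 0; i < net[n].nlinks; i++) {
            int to = net[n].links[i];
            assert(to >= 0 && to < nnodes);
            /* An END node may only link back to a START node. */
            if (net[n].type == TOY_END_NODE && net[to].type != TOY_START_NODE)
                return 0;
        }
    }
    for (n = 0; n < nnodes; n++)
        if (toy_has_cycle(net, n, colour)) return 0;
    return 1;
}
```

Calling toy_is_left_to_right() on a three-node chain START -> NORMAL -> END with a loop-back link END -> START accepts the network; adding a self-link on the NORMAL node makes the check fail.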

Creating the lexicon tree:

Add all possible phoneme sequences to the lexicon tree.
Set the link between the end node and the start node (CWR).
Apply the given assimilation rules.
Compact the word endings (inverse tree structure).
Convert to context dependent phonemes and compress again (+ dummy link nodes).
Convert to a state based representation.
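
As a rough illustration of the first step only, the sketch below adds phoneme sequences to a prefix tree so that words with a common beginning share nodes. Everything here is hypothetical: the toy_tree types, the fixed node pool, and the assumption that each phoneme is a single character. It does not cover assimilation rules, compaction of word endings, context dependent phonemes, or the state based conversion.

```c
#include <assert.h>
#include <string.h>

#define TOY_MAX_NODES 256
#define TOY_NO_NODE   (-1)

/* Hypothetical toy node; NOT the real lexicon node layout. */
struct toy_tree_node {
    char phone;         /* one character per phoneme (an assumption) */
    int  first_child;   /* index of the first child, or TOY_NO_NODE */
    int  next_sibling;  /* index of the next sibling, or TOY_NO_NODE */
    int  word_id;       /* word ending at this node, or -1 */
};

struct toy_tree {
    struct toy_tree_node node[TOY_MAX_NODES];
    int nnodes;         /* node[0] is the root */
};

void toy_tree_init(struct toy_tree *t)
{
    assert(t != NULL);
    t->node[0].phone = '\0';
    t->node[0].first_child = TOY_NO_NODE;
    t->node[0].next_sibling = TOY_NO_NODE;
    t->node[0].word_id = -1;
    t->nnodes = 1;
}

/* Add one phoneme sequence. Returns the index of the node where the word
 * ends, or TOY_NO_NODE when the node pool is full. Homophones simply
 * overwrite the stored word_id in this sketch. */
int toy_tree_add(struct toy_tree *t, const char *phones, int word_id)
{
    int cur = 0;
    size_t i, len;

    assert(t != NULL && phones != NULL);
    len = strlen(phones);
    for (i = 0; i < len; i++) {
        /* Look for an existing child carrying this phoneme. */
        int child = t->node[cur].first_child;
        while (child != TOY_NO_NODE && t->node[child].phone != phones[i])
            child = t->node[child].next_sibling;
        if (child == TOY_NO_NODE) {
            /* No shared prefix from here on: allocate a new node. */
            if (t->nnodes >= TOY_MAX_NODES) return TOY_NO_NODE;
            child = t->nnodes++;
            t->node[child].phone = phones[i];
            t->node[child].first_child = TOY_NO_NODE;
            t->node[child].next_sibling = t->node[cur].first_child;
            t->node[child].word_id = -1;
            t->node[cur].first_child = child;
        }
        cur = child;
    }
    t->node[cur].word_id = word_id;
    return cur;
}
```

Adding the sequences "kat", "kap" and "ka" to an initialized tree creates five nodes in total (the root plus k, a, t and p), because the common prefix "ka" is stored once.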

Author: Kris Demuynck

Date

xx/10/1995 - KD: Creation.
01/02/1996 - KD: Allow nesting of brackets in textual word descriptions.
19/02/1996 - KD: Compact word endings.
06/07/1996 - KD: Assimilation rules added.
04/08/1997 - KD: State-based lexicon.
02/10/1998 - KD: State-based lexicon - compression.
27/07/1999 - KD: Modification to make quinphones run.
31/03/2004 - KD: Improved the robustness when using assimilation rules (still not very robust).

Bug:

Assimilation rules (should) work in sentence mode (vitpass, vitalign), but are very sensitive in recognition mode.

If the FULL_DEBUG flag is set during compilation (-DFULL_DEBUG), some extra debug messages and consistency tests are added. These messages and tests result in a serious overhead.

If the DAVINCIHOME environment variable is set, the debug information will also contain lexicon networks in the Davinci format.
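
The general pattern of a compile-time debug flag combined with a run-time environment check might look like the sketch below. It is not taken from the library: toy_check_links, toy_want_graph_dump and TOY_FULL_DEBUG are hypothetical names, and treating any value of DAVINCIHOME (even an empty one) as "set" is an assumption.

```c
#include <assert.h>
#include <stdlib.h>

/* Map the compile-time flag (-DFULL_DEBUG) onto a constant that can be
 * used in ordinary expressions. */
#ifdef FULL_DEBUG
#define TOY_FULL_DEBUG 1
#else
#define TOY_FULL_DEBUG 0
#endif

/* Hypothetical consistency test: every link must point to a valid node.
 * The test costs time, so it only runs in FULL_DEBUG builds.
 * Returns 0 when no problem was found (or nothing was checked), -1 otherwise. */
int toy_check_links(const int *links, int nlinks, int nnodes)
{
    int i;

    assert(nlinks >= 0 && (links != NULL || nlinks == 0));
    if (!TOY_FULL_DEBUG)
        return 0;               /* checks compiled out of normal builds */
    for (i = 0; i < nlinks; i++)
        if (links[i] < 0 || links[i] >= nnodes)
            return -1;
    return 0;
}

/* Hypothetical helper: graph dumps are only wanted in debug builds and
 * only when the DAVINCIHOME environment variable exists. */
int toy_want_graph_dump(void)
{
    return TOY_FULL_DEBUG && getenv("DAVINCIHOME") != NULL;
}
```

Built without -DFULL_DEBUG, both functions reduce to constant results and add no measurable overhead; built with the flag, the link check runs on every call.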

Enumerations
enum	{ SPR_CWR_START_NODE, SPR_CWR_START_NODE_X, SPR_CWR_END_NODE, SPR_CWR_END_NODE_X, SPR_CWR_ROOT_NODE, SPR_CWR_ROOT_NODE_X, SPR_CWR_LINK_NODE, SPR_CWR_LINK_NODE_X, SPR_CWR_XXX_NODE, SPR_CWR_XXX_NODE_X, SPR_CWR_NORMAL_NODE, SPR_CWR_FREE_NODE }

enum	{ SPR_CWR_CI_PHONES, SPR_CWR_CD_PHONES, SPR_CWR_CX_PHONES, SPR_CWR_CXI_PHONES, SPR_CWR_CXD_PHONES, SPR_CWR_SS_PHONES, SPR_CWR_STR_PHONES }

enum	{ SPR_CWR_LI_PHONE_FINAL, SPR_CWR_LI_TRIV_NEXT, SPR_CWR_LI_LAST_LINK, SPR_CWR_LI_TRIV2LAST, SPR_CWR_LI_LEX_HIST, SPR_CWR_LI_NO_LEX_HIST, SPR_CWR_LI_MARK_FSIL, SPR_CWR_LI_MASK, SPR_CWR_LI_SHIFT }

enum	{ FSILxSTART, FSILxEND }
