SPRAAK
spr_scoreres.c File Reference

(Re)calculate the CSR score for a given result file.

Detailed Description

(Re)calculate the CSR score for a given result file.

spr_scoreres [-CIS] [-PAR] [-c corpus_fname]
    [-ref ref_txt_fname_or_string] [-r result_fname] [-tst tst_txt_fname] [-dic dictionary]
    [-dist distance_matrix] [-nr new_result_fname] [-res new_result_fname] [-omit words_to_omit]
Parameters
-CIS  (flag) correct skipped
Correct recognition results are not listed in the result file, i.e. any entry for which no result is found is considered correct.
-PAR  (flag) print all
Print all recognition results, not only the wrong ones.
-c corpus_fname
Name of the corpus file containing the reference transcriptions.
-ref ref_txt_fname_or_string
Alternative way to provide the reference transcriptions: lines containing word sequences delimited by white-space characters, one line per entry. A single reference entry can also be given directly on the command line; such a string must start with '@' as its first word (this word is ignored in the alignment).
-r result_fname
Name of the result file containing the test sequences.
-tst tst_txt_fname
Alternative way to provide the test transcriptions: lines containing word sequences delimited by white-space characters, one line per entry. A single test string can also be given directly on the command line; such a string must start with '@' as its first word (this word is ignored in the alignment).
-dic dictionary
Name of an optional dictionary (only needed in combination with a distance matrix).
-dist distance_matrix
File defining the distance between the dictionary elements (requires a dictionary), or a reference to a known distance function "@<func_name>[param1=val;param2=val;..]"; at this moment only a simple count of unique characters (or character sequences), i.e. those occurring in wrd1 and not in wrd2 or vice versa, is supported: "@chr_diff[N=2;w1=1;w2=0.1]"
-nr new_result_fname
Name of the file to write the results to.
-res new_result_fname
Alternative way to get the results: one line per arc in the alignment; each line contains two words, the reference word on the left and the test word on the right; empty words are replaced with '/'; entries are separated by an empty line.
-omit words_to_omit
Words that have to be omitted before scoring.

This program calculates the word error rate from a reference file and a recognition result. The recognition results are read either from a result file produced by the spr_eval.py script or from a text file containing one line per entry, with words separated by spaces. The reference transcriptions are read from a corpus file or from a text file containing one line per entry, with words separated by spaces. Every occurrence of the words given with the -omit option (e.g. sentence-ending markers) can be omitted from both the correct and the recognised string before calculating the score. A result summary is written to standard output.
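The word error rate reported here is the standard one: the minimal number of substitutions, deletions and insertions needed to turn the reference word sequence into the test sequence, divided by the reference length. A minimal sketch (plain Levenshtein alignment on word sequences; the omit filtering mirrors the -omit option, not SPRAAK's actual code):

```python
def word_error_rate(ref, hyp, omit=()):
    """Word error rate via Levenshtein alignment on words; words listed
    in 'omit' are removed from both sequences before scoring."""
    ref = [w for w in ref.split() if w not in omit]
    hyp = [w for w in hyp.split() if w not in omit]
    # d[i][j] = minimal edit cost between ref[:i] and hyp[:j]
    d = [[0] * (len(hyp) + 1) for _ in range(len(ref) + 1)]
    for i in range(1, len(ref) + 1):
        d[i][0] = i                      # deleting i reference words
    for j in range(1, len(hyp) + 1):
        d[0][j] = j                      # inserting j test words
    for i in range(1, len(ref) + 1):
        for j in range(1, len(hyp) + 1):
            sub = d[i - 1][j - 1] + (ref[i - 1] != hyp[j - 1])
            d[i][j] = min(sub,           # match / substitution
                          d[i - 1][j] + 1,   # deletion
                          d[i][j - 1] + 1)   # insertion
    return d[len(ref)][len(hyp)] / max(len(ref), 1)
```

For example, scoring "a x c" against the reference "a b c" gives one substitution out of three reference words.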

Instead of only a summary, a new result file can also be written (option -nr), with the omitted words removed. In this case nothing is written to standard output. If the -PAR flag is set, all sentences are printed in the result file, even the correct ones. The resulting alignment can also be dumped in a simple format using the '-res' option.
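The '-res' dump format described above (one ref/test word pair per line, '/' for an empty word, entries separated by a blank line) is easy to post-process; a short sketch with hypothetical helper names, assuming exactly the documented line format:

```python
def parse_res(lines):
    """Parse a '-res' alignment dump into a list of entries; each entry
    is a list of (ref, hyp) pairs, with None for an empty ('/') word."""
    entries, arcs = [], []
    for line in lines:
        line = line.strip()
        if not line:                 # blank line ends the current entry
            if arcs:
                entries.append(arcs)
                arcs = []
            continue
        ref, hyp = line.split()
        arcs.append((None if ref == "/" else ref,
                     None if hyp == "/" else hyp))
    if arcs:
        entries.append(arcs)
    return entries

def count_errors(entry):
    """Count (substitutions, deletions, insertions) in one entry."""
    sub = sum(1 for r, h in entry if r and h and r != h)
    dele = sum(1 for r, h in entry if r and not h)
    ins = sum(1 for r, h in entry if h and not r)
    return sub, dele, ins
```

A deletion shows up as a '/' on the test side of the arc, an insertion as a '/' on the reference side.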

Author
Jacques Duchateau, Kris Demuynck
Date
07/05/1996 - JD
Creation
15/05/2004 - KD
Adapted for new cwr_err conventions
15/09/2015 - KD
Added the options to process simple text files