SPRAAK
Convert arpa-style N-grams to and from the binary format used in SPRAAK.
spr_lm_arpabo [-R](flag: bin->arpa) [-FIX how](no) [-i input LM](stdin) [-o output LM](stdout) [-c build options] [-rf LM-load options](check_lvl=2) [-l level](0)
Convert arpa LMs to and from the LMs used by the SPRAAK system.
Convert the N-gram representation between the arpa-style format and the compact binary format used in SPRAAK.
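The arpa format is plain text: a \data\ header with the N-gram counts, then one \N-grams: section per order, in which each line holds a log10 probability, the N words and, optionally, a log10 back-off weight, with \end\ closing the file. As a rough sketch of what the arpa side of the conversion has to read, the Python below parses these sections into dictionaries. The function name parse_arpa is hypothetical and this is not SPRAAK's reader: it ignores the header counts and performs no validation.

```python
from typing import Dict, Optional, Tuple

# Per N-gram: (log10 probability, optional log10 back-off weight).
NgramTable = Dict[Tuple[str, ...], Tuple[float, Optional[float]]]


def parse_arpa(text: str) -> Dict[int, NgramTable]:
    """Hypothetical sketch (not SPRAAK's reader): parse arpa N-gram sections.

    Returns a dict mapping the order N to a table of N-grams.
    """
    tables: Dict[int, NgramTable] = {}
    order: Optional[int] = None
    for raw in text.splitlines():
        line = raw.strip()
        if not line:
            continue
        if line == "\\end\\":
            break
        if line == "\\data\\" or line.startswith("ngram "):
            order = None  # header lines; the counts are not needed here
            continue
        if line.startswith("\\") and line.endswith("-grams:"):
            order = int(line[1:line.index("-")])  # e.g. "\2-grams:" -> 2
            tables[order] = {}
            continue
        if order is None:
            continue
        fields = line.split()
        logprob = float(fields[0])
        words = tuple(fields[1:1 + order])
        # A trailing extra field, if present, is the back-off weight.
        backoff = float(fields[1 + order]) if len(fields) > 1 + order else None
        tables[order][words] = (logprob, backoff)
    return tables
```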
When converting from the arpa-style format to the binary format, a file with options can be specified. Default values are given in parentheses. The possible options are:
Option | Description
---|---
pquant <quant_precision>(10000.0) | Quantise the probabilities with a precision of 1.0/<quant_precision>.
Nlmc <estimated_nr_of_LM_contexts> | Initial size of the LM-context hash table. The default size tends to be over-estimated, so providing a good initial value may save some memory during the conversion. When specified, this value must not be underestimated, i.e. it must not be less than the actual number of LM contexts.
pct_init <percentage>(2.0) | Start with an LM-context hash table that is <percentage> percent larger than the minimal size.
pct_add <percentage> | Enlarge the LM-context hash table by <percentage> percent whenever no perfect hash table can be constructed.
max_prob <max_prob(log10)>(0.0) [msg_lvl](0) | Clip all probabilities larger than <max_prob> (log10) to <max_prob>. Report each change at message level <msg_lvl>.
min_prob <min_prob(log10)>(-Inf) [msg_lvl](0) | Clip all probabilities smaller than <min_prob> (log10) to <min_prob>. Report each change at message level <msg_lvl>.
max_disc <max_discount_frac(log10)>(10.0) [msg_lvl](0) | Clip all discount fractions larger than <max_discount_frac> (log10) to <max_discount_frac>. Report each change at message level <msg_lvl>.
min_disc <min_discount_frac(log10)>(-Inf) [msg_lvl](0) | Clip all discount fractions smaller than <min_discount_frac> (log10) to <min_discount_frac>. Report each change at message level <msg_lvl>.
<lprob/ldisc> offs <prob_scale_factor(log10)> <words> ... | Scale the probability (lprob) or the discount fraction (ldisc) of the given words by the factor <prob_scale_factor> (log10). Since all operations are done in the log10 domain, this scaling amounts to adding <prob_scale_factor>.
<lprob/ldisc> fac <prob_power(log10)> <words> ... | Raise the probability (lprob) or the discount fraction (ldisc) of the given words to the power <prob_power>. Since all operations are done in the log10 domain, raising to a power amounts to multiplying by <prob_power>.