-
Recognition of the Test Data:
At this point we assume that a trained set of models is ready
to be used to recognize the test data. If not, please see
this section for details on training a set of models.
The decoder used for the evaluation is trace_projector.
This decoder can be used in several modes. For the alphadigits
task, we defined a word graph in the data preparation section
of the manual, so we use the decoder in the Lattice Rescoring
mode. Since our models are cross-word triphones, we also need
to set the appropriate options and parameters.
A detailed tutorial on how to use multiple LMs simultaneously
for decoding and how to switch LMs dynamically at runtime can be
found here.
-
Required Data:
Features for the test data - Use the extract_feature utility.

Grammar/Lattice for the test data - We generated this in the
data preparation stage. Note that we need one lattice file per
test utterance, even when every utterance uses the same
lattice, as is the case with alphadigits.

Models from the final pass of training - states, model
definitions, phone map, transitions and the lexicon.
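Because the decoder needs one lattice file per test utterance, a shared lattice has to be replicated before decoding. The following is a minimal Python sketch of that step, not part of the toolkit. The ".lat" extension, the utterance IDs, and the idea of writing a list file of the copies are assumptions made for illustration; use whatever naming and list format your setup expects.

```python
import shutil
from pathlib import Path


def replicate_lattice(shared_lattice, utterance_ids, out_dir, list_name="lattice.list"):
    """Copy one shared lattice to a file per utterance and list the copies.

    The file naming ("<utterance id>.lat") and the list file are
    hypothetical conventions, not something the decoder prescribes.
    """
    out_dir = Path(out_dir)
    out_dir.mkdir(parents=True, exist_ok=True)

    paths = []
    for utt_id in utterance_ids:
        target = out_dir / f"{utt_id}.lat"
        shutil.copyfile(shared_lattice, target)
        paths.append(target)

    # One path per line, in the same order as the utterance IDs.
    list_path = out_dir / list_name
    list_path.write_text("".join(f"{p}\n" for p in paths))
    return list_path
```

Keeping the list in the same order as the feature files makes it easier to check that each utterance is paired with its own lattice copy.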
-
Using the Decoder:
Here is a parameter file we would use for the recognition
process. Note that we set the context_mode tag to
"cross_word". The decoder is then run as:

    trace_projector -p params.text
The pruning thresholds are set based on the complexity
of the recognition task. Since alphadigits is a
relatively simple task, the thresholds can be tight. For
tasks like Switchboard, they would need to be much
higher. The "wdpenalty" option is very useful for
controlling the insertion of short words, such as the letter
"o" in the alphadigits task.
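To illustrate why a word penalty curbs insertions of short words, here is a small Python sketch. It assumes log-domain scores and a penalty that is simply added once per hypothesized word. This is a common formulation, not necessarily the exact one trace_projector uses, and all of the scores below are made-up numbers.

```python
def total_score(acoustic, lm, num_words, word_penalty):
    """Combine log-domain scores with an additive per-word penalty.

    Assumed formulation for illustration only; a negative penalty
    makes every extra word cost something.
    """
    return acoustic + lm + num_words * word_penalty


def best_hypothesis(hypotheses, word_penalty):
    """Return the hypothesis with the highest penalized score."""
    return max(
        hypotheses,
        key=lambda h: total_score(
            h["acoustic"], h["lm"], len(h["words"]), word_penalty
        ),
    )


# Two competing hypotheses with invented scores. The first one
# contains a spurious "o" that fits the acoustics slightly better.
hyps = [
    {"words": ["a", "o", "b"], "acoustic": -100.0, "lm": -6.0},
    {"words": ["a", "b"], "acoustic": -103.0, "lm": -4.0},
]

for penalty in (0.0, -2.0):
    print(penalty, best_hypothesis(hyps, penalty)["words"])
```

With no penalty the three-word hypothesis wins by one point; with a penalty of -2 per word the extra "o" costs more than it gains, so the two-word hypothesis wins.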
-
Word Error Rate (WER) Computation:
We use the standard NIST scoring software for evaluation of
recognition performance. We do, however, have a script that
does the necessary format conversions to allow "sclite" to do
the WER computation.
The NIST tools expect a reference transcript in what we call
the score format. The isip_eval utility is used to convert the
output from the recognizer into this format and then to
evaluate the recognition performance using the NIST tools.
Here is the score file generated from the evaluation data we
used. Since we ran the decoder to output data in the "word"
format, we use isip_eval as follows:

    isip_eval isip_word output.list ref.score output.score

The script outputs the error statistics to stdout and also
creates a report file that contains the alignment, confusion
pairs, etc.
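For reference, the WER figure itself comes from a minimum edit distance alignment between the reference and the hypothesis. The Python sketch below shows the idea with uniform costs for substitutions, deletions and insertions. It is an illustration only and not a replacement for sclite, whose alignment may weight the error types differently and which also handles the score format.

```python
def align_counts(ref, hyp):
    """Return (substitutions, deletions, insertions) for the
    minimum edit distance alignment of two word lists."""
    rows, cols = len(ref) + 1, len(hyp) + 1
    # Each cell holds (total errors, subs, dels, ins) for
    # ref[:i] aligned against hyp[:j].
    table = [[None] * cols for _ in range(rows)]
    table[0][0] = (0, 0, 0, 0)
    for i in range(1, rows):
        table[i][0] = (i, 0, i, 0)  # only deletions
    for j in range(1, cols):
        table[0][j] = (j, 0, 0, j)  # only insertions

    for i in range(1, rows):
        for j in range(1, cols):
            t, s, d, n = table[i - 1][j - 1]
            if ref[i - 1] == hyp[j - 1]:
                diag = (t, s, d, n)          # match, no cost
            else:
                diag = (t + 1, s + 1, d, n)  # substitution
            t, s, d, n = table[i - 1][j]
            delete = (t + 1, s, d + 1, n)
            t, s, d, n = table[i][j - 1]
            insert = (t + 1, s, d, n + 1)
            table[i][j] = min(diag, delete, insert, key=lambda c: c[0])

    _, subs, dels, ins = table[-1][-1]
    return subs, dels, ins


def wer(ref, hyp):
    """Word error rate in percent, relative to the reference length."""
    if not ref:
        raise ValueError("reference must contain at least one word")
    subs, dels, ins = align_counts(ref, hyp)
    return 100.0 * (subs + dels + ins) / len(ref)
```

For example, `wer("a b".split(), "a o b".split())` counts one insertion against a two-word reference, which is the kind of error the word penalty is meant to reduce.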