4.3.2 NIST Scoring: Scoring Reports
NIST provides standard scoring software that automatically computes
the WER metrics described in the previous section. The software
computes the number and type of errors in each sentence and provides
a detailed listing for each category of error.
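To make the per-sentence error counts concrete, here is a minimal
sketch of how substitutions, deletions, and insertions can be counted
with a word-level edit-distance alignment. It is not the NIST
implementation: the function names `count_errors` and `wer` are
hypothetical, every error type is assumed to cost 1, and ties are
broken in favor of substitutions, so the actual software may align
some sentences differently.

```python
def count_errors(ref, hyp):
    """Return (substitutions, deletions, insertions) for one sentence pair.

    Illustrative only: unit costs for all error types (an assumption).
    """
    r, h = ref.split(), hyp.split()
    # cell[i][j] = (total_errors, subs, dels, ins) for r[:i] versus h[:j]
    cell = [[None] * (len(h) + 1) for _ in range(len(r) + 1)]
    cell[0][0] = (0, 0, 0, 0)
    for i in range(1, len(r) + 1):
        cell[i][0] = (i, 0, i, 0)      # only deletions
    for j in range(1, len(h) + 1):
        cell[0][j] = (j, 0, 0, j)      # only insertions
    for i in range(1, len(r) + 1):
        for j in range(1, len(h) + 1):
            t, s, d, n = cell[i - 1][j - 1]
            if r[i - 1] == h[j - 1]:
                diag = (t, s, d, n)            # correct word
            else:
                diag = (t + 1, s + 1, d, n)    # substitution
            t, s, d, n = cell[i - 1][j]
            dele = (t + 1, s, d + 1, n)        # reference word missing
            t, s, d, n = cell[i][j - 1]
            ins = (t + 1, s, d, n + 1)         # extra hypothesis word
            # min() returns the first minimal candidate, so ties go to diag.
            cell[i][j] = min((diag, dele, ins), key=lambda c: c[0])
    return cell[-1][-1][1:]


def wer(ref, hyp):
    """Word error rate in percent: (S + D + I) / (reference words) * 100."""
    s, d, i = count_errors(ref, hyp)
    return 100.0 * (s + d + i) / len(ref.split())
```

For instance, `count_errors("one two three", "one three")` reports a
single deletion, since the reference word "two" has no counterpart in
the hypothesis.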
An example of a NIST scoring report can be seen on page 2 of
lecture 43. The scoring process is explained in detail in the lecture
on evaluation metrics from our online speech recognition course notes.
NIST scoring also computes simple statistics, such as overall
percentages for each category of error, speaker-specific error rates,
and significance measures that indicate whether an experimental result
is meaningful. One useful output is a list of confusion pairs, which
show, for a given pair of words, the number of times one word was
mistakenly recognized as, or "confused" with, another word.
For example, a confusion-pair entry with a count of 13 for the words
"five" and "oh" indicates that the word "five" was mistaken for the
word "oh" 13 times.
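The idea behind confusion pairs can be sketched in a few lines of
Python. The example below aligns each reference sentence with its
hypothesis using a unit-cost edit distance, then counts every
substituted (reference word, hypothesis word) pair. The function names
`align` and `confusion_pairs`, the unit costs, and the toy sentences
are all assumptions made for illustration; the NIST software uses its
own alignment and report format.

```python
from collections import Counter


def align(ref_words, hyp_words):
    """Align two word lists; return (ref, hyp) pairs, None marking a gap."""
    n, m = len(ref_words), len(hyp_words)
    dist = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(n + 1):
        dist[i][0] = i
    for j in range(m + 1):
        dist[0][j] = j
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            mismatch = ref_words[i - 1] != hyp_words[j - 1]
            dist[i][j] = min(dist[i - 1][j - 1] + mismatch,  # match or sub
                             dist[i - 1][j] + 1,             # deletion
                             dist[i][j - 1] + 1)             # insertion
    # Trace back from the bottom-right corner to recover the alignment.
    pairs, i, j = [], n, m
    while i > 0 or j > 0:
        if (i > 0 and j > 0 and dist[i][j] ==
                dist[i - 1][j - 1] + (ref_words[i - 1] != hyp_words[j - 1])):
            pairs.append((ref_words[i - 1], hyp_words[j - 1]))
            i, j = i - 1, j - 1
        elif i > 0 and dist[i][j] == dist[i - 1][j] + 1:
            pairs.append((ref_words[i - 1], None))
            i -= 1
        else:
            pairs.append((None, hyp_words[j - 1]))
            j -= 1
    pairs.reverse()
    return pairs


def confusion_pairs(sentence_pairs):
    """Count how often each reference word was recognized as another word."""
    counts = Counter()
    for ref, hyp in sentence_pairs:
        for r, h in align(ref.split(), hyp.split()):
            if r is not None and h is not None and r != h:
                counts[(r, h)] += 1
    return counts


# Toy data, invented for this sketch: (reference, hypothesis) sentences.
data = [("one five two", "one oh two"),
        ("five five", "oh five"),
        ("three", "three")]
for (ref_word, hyp_word), count in confusion_pairs(data).most_common():
    print(count, ref_word, "recognized as", hyp_word)
```

Deletions and insertions are skipped here because a confusion pair,
as described above, involves one word being recognized as another.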
The NIST scoring software requires all reference texts
of the actual sentences spoken as well as the corresponding
hypotheses produced by the decoder. Both the reference
texts and the hypotheses must be properly formatted.
We provide tools to obtain and convert this information
to the proper format.
Continue to "Generating the NIST Scoring Report" in the next section
to learn how to convert reference texts and hypotheses to the NIST
format for automated scoring, how to generate reports, and how to
interpret the results.