4.3.4 Scoring: Converting Recognition Output and Generating a Scoring Report
The output file generated in Section 4.2.5 is in text Sof format and
contains an annotation graph representation for each utterance. To
convert this file to a format that can be used by the NIST scoring
software, go to the following directory:
$ISIP_TUTORIAL/sections/s04/s04_03_p04/
and run the following command:
isip_extract_hypo -level word -format NIST_TRN -list $ISIP_TUTORIAL/databases/lists/identifiers_test.sof -exclude exclude_symbols.sof -output ./results.score -debug brief $ISIP_TUTORIAL/sections/s04/s04_02_p05/results.out
Expected output:
loading transcription database: /ftp/pub/research/isip/projects/speech/software/tutorials/production/fundamentals/current/examples/sections/s04/s04_02_p05/results.out
retrieving transcription for identifier: ah_111a, level: word
retrieving transcription for identifier: ah_1a, level: word
retrieving transcription for identifier: ah_27o6571a, level: word
retrieving transcription for identifier: ah_2o5a, level: word
retrieving transcription for identifier: ah_3b, level: word
retrieving transcription for identifier: ah_416a, level: word
retrieving transcription for identifier: ah_4b, level: word
...
The hypothesized transcriptions will be extracted from the annotation graph
and stored in the file results.score in NIST scoring format.
Next, we will do the same for the reference transcriptions. Run the
command:
isip_extract_hypo -level word -format NIST_TRN -list $ISIP_TUTORIAL/databases/lists/identifiers_test.sof -output ./reference.score -exclude exclude_symbols.sof -debug brief $ISIP_TUTORIAL/databases/db/tidigits_trans_word_test_db.sof
The reference transcriptions will be extracted from the annotation graph
and stored in the file reference.score in NIST scoring format.
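Before scoring, it can be useful to confirm that the hypothesis and reference files cover the same utterances. The sketch below assumes the usual NIST transcript (TRN) layout, in which each line holds the words of one utterance followed by its identifier in parentheses. The helper names (parse_trn_line, parse_trn, check_same_ids) and the sample word sequences are hypothetical and are not taken from the tutorial's data; only the identifier style follows the output shown earlier.

```python
import re

# Assumed TRN layout: "<words ...> (<utterance_id>)", one utterance per line.
TRN_LINE = re.compile(r"^(?P<words>.*?)\s*\((?P<utt_id>[^()\s]+)\)\s*$")


def parse_trn_line(line):
    """Split one TRN line into (utterance_id, list_of_words)."""
    match = TRN_LINE.match(line)
    if match is None:
        raise ValueError(f"not a TRN line: {line!r}")
    return match.group("utt_id"), match.group("words").split()


def parse_trn(lines):
    """Map utterance identifiers to word lists, skipping blank lines."""
    return dict(parse_trn_line(line) for line in lines if line.strip())


def check_same_ids(hyp, ref):
    """Return the identifiers found only in hyp and only in ref."""
    only_hyp = sorted(set(hyp) - set(ref))
    only_ref = sorted(set(ref) - set(hyp))
    return only_hyp, only_ref


# Hypothetical sample lines; real input would come from results.score
# and reference.score.
hyp_lines = ["one one one (ah_111a)", "one (ah_1a)"]
ref_lines = ["one one one (ah_111a)", "one (ah_1a)", "three (ah_3b)"]

hyp = parse_trn(hyp_lines)
ref = parse_trn(ref_lines)
only_hyp, only_ref = check_same_ids(hyp, ref)
print("only in hypothesis:", only_hyp)
print("only in reference:", only_ref)
```

To apply the same check to the files produced above, pass open file objects instead of the sample lists, for example parse_trn(open("results.score")). An identifier that appears on only one side usually means the two extraction commands were run with different identifier lists.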
Now that the hypothesis and reference transcriptions have been converted to
the appropriate format, we can use the tool isip_eval to generate
a scoring report. From the same directory, run the command:
isip_eval score results.score reference.score results
Expected output:
/usr/local/sctk/bin/sclite -F -i swb -r reference.score -h results.score -o dtl all
sclite: 2.2 TK Version 1.2
Begin alignment of Ref File: 'reference.score' and Hyp File: 'results.score'
Alignment# 18 for speaker ah
Alignment# 17 for speaker ar
Alignment# 17 for speaker at
Alignment# 17 for speaker bc
Alignment# 17 for speaker be
Alignment# 17 for speaker bm
Alignment# 17 for speaker bn
Alignment# 17 for speaker cc
Alignment# 17 for speaker ce
Alignment# 17 for speaker cp
Alignment# 17 for speaker df
Alignment# 18 for speaker dj
Alignment# 17 for speaker ed
Alignment# 17 for speaker ef
Alignment# 17 for speaker et
Alignment# 17 for speaker fa
Alignment# 17 for speaker fg
Alignment# 17 for speaker fh
Alignment# 17 for speaker fm
Alignment# 11 for speaker fp
Writing scoring report to 'results.score.sys'
Writing raw scoring report to 'results.score.raw'
Writing overall detailed scoring report 'results.score.dtl'
Writing string alignments to 'results.score.pra'
Successful Completion
===============================================================================
SENTENCE RECOGNITION PERFORMANCE
sentences 336
with errors 10.1% ( 34)
with substitions 0.6% ( 2)
with deletions 0.0% ( 0)
with insertions 9.5% ( 32)
WORD RECOGNITION PERFORMANCE
Percent Total Error = 3.6% ( 39)
Percent Correct = 99.8% (1082)
Percent Substitution = 0.2% ( 2)
Percent Deletions = 0.0% ( 0)
Percent Insertions = 3.4% ( 37)
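The figures in this report can be cross-checked with a few lines of arithmetic. The sketch below copies the counts from the output above and assumes the usual scoring convention that word-level percentages are taken relative to the number of reference words (correct + substitutions + deletions). Under that assumption the numbers reproduce exactly, and it also explains why Percent Correct and Percent Total Error add up to more than 100%: insertions count as errors but are not reference words. The variable and function names are illustrative only.

```python
# Counts copied from the scoring output above.
alignments_per_speaker = [18, 17, 17, 17, 17, 17, 17, 17, 17, 17,
                          17, 18, 17, 17, 17, 17, 17, 17, 17, 11]
sentences = 336
sentences_with_errors = 34
correct, substitutions, deletions, insertions = 1082, 2, 0, 37


def percent(count, total):
    """Percentage rounded to one decimal place, as in the report."""
    return round(100.0 * count / total, 1)


# Assumption: every reference word is either correct, substituted, or deleted.
reference_words = correct + substitutions + deletions
total_errors = substitutions + deletions + insertions

# The per-speaker alignment counts should add up to the sentence total.
assert sum(alignments_per_speaker) == sentences

print("reference words      :", reference_words)
print("sentence error rate  :", percent(sentences_with_errors, sentences))
print("percent total error  :", percent(total_errors, reference_words))
print("percent correct      :", percent(correct, reference_words))
print("percent substitution :", percent(substitutions, reference_words))
print("percent insertions   :", percent(insertions, reference_words))
```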
The command has generated a scoring report called results.report.
The contents of this file are discussed in the next section.