Overview:
An overview of the speech recognition process is shown below.
There are three main components to the process: acoustic modeling,
language modeling, and search. Search, often referred to as
recognition, decoding, or evaluation, is the process by which
the system uses a fully trained recognizer to produce a hypothesis
of what was spoken. It is the main topic of this section.
Acoustic modeling is described in Section 5, and language modeling
is described in Section 6.
Conversion of the speech signal to a text message containing the
spoken words is only one of many tasks entailed in the process of
automatic speech recognition. Once the acoustic and language
models are built, recognition requires searching all possibilities
generated by these models. The number of possibilities can be
prohibitively large, so efficient search techniques are critical
to the performance of a recognizer.
Most recognition systems use the Viterbi beam search algorithm,
but other algorithms may be used and are supported in the software.
Continue to Section 4.1 for additional theoretical information on
search algorithms for speech recognition.
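
To make the idea of Viterbi beam search concrete, here is a minimal sketch in Python on a toy hidden Markov model. It is not the implementation used in the software: the function names (`viterbi_beam_search`, `prune`), the `beam_width` parameter, and the two-state model are all assumptions made for illustration. The sketch keeps one best-scoring hypothesis per state at each time step and discards any hypothesis whose log score falls more than `beam_width` below the current best.

```python
import math


def prune(hyps, beam_width):
    """Keep only hypotheses within beam_width (log domain) of the best one."""
    best = max(score for score, _ in hyps.values())
    return {s: h for s, h in hyps.items() if h[0] >= best - beam_width}


def viterbi_beam_search(observations, states, log_init, log_trans, log_emit,
                        beam_width=10.0):
    """Return (best_path, best_log_score) for a toy HMM (illustrative only)."""
    # Active hypotheses: state -> (log score, state path ending in that state).
    active = {}
    for s in states:
        active[s] = (log_init[s] + log_emit[s][observations[0]], [s])
    active = prune(active, beam_width)

    for obs in observations[1:]:
        extended = {}
        # Extend only the hypotheses that survived pruning.
        for prev, (score, path) in active.items():
            for s in states:
                cand = score + log_trans[prev][s] + log_emit[s][obs]
                # Viterbi recombination: keep the best path into each state.
                if s not in extended or cand > extended[s][0]:
                    extended[s] = (cand, path + [s])
        active = prune(extended, beam_width)

    best_state = max(active, key=lambda s: active[s][0])
    return active[best_state][1], active[best_state][0]


def to_log(table):
    """Convert a (possibly nested) dict of positive probabilities to logs."""
    return {k: to_log(v) if isinstance(v, dict) else math.log(v)
            for k, v in table.items()}


# Hypothetical two-state model, chosen only to exercise the search.
states = ["Healthy", "Fever"]
log_init = to_log({"Healthy": 0.6, "Fever": 0.4})
log_trans = to_log({
    "Healthy": {"Healthy": 0.7, "Fever": 0.3},
    "Fever": {"Healthy": 0.4, "Fever": 0.6},
})
log_emit = to_log({
    "Healthy": {"normal": 0.5, "cold": 0.4, "dizzy": 0.1},
    "Fever": {"normal": 0.1, "cold": 0.3, "dizzy": 0.6},
})

path, log_score = viterbi_beam_search(
    ["normal", "cold", "dizzy"], states, log_init, log_trans, log_emit)
print(path, math.exp(log_score))
```

A wide beam makes the search equivalent to an exhaustive Viterbi search, while a narrow beam reduces the number of hypotheses extended at each step at the risk of pruning the path that would eventually have scored best. In a real recognizer the scores would come from the acoustic and language models rather than from hand-written tables.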
Contents: