-
Why Flat-start?
The first step in the HMM training process is to seed the models
with initial values. Several techniques exist for this. A common
one is to choose representative examples of each unit (phones, in
our example) and seed the corresponding models with their feature
data. However, this is a time-consuming process that requires
linguistic/phonetic expertise.
A much simpler strategy is to initialize all models with the
global mean and variance computed over a small subset of the
training data. Though not elegant, this approach has been found
to be very effective. We call this Flat-start.
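To make the idea concrete, here is a minimal Python sketch of flat-start, assuming diagonal-covariance Gaussians and feature vectors held in memory as lists of floats. The functions `global_mean_var` and `flat_start`, the toy data, and the model names are hypothetical; they illustrate the technique, not how init_hmm is implemented.

```python
# Flat-start sketch: every state of every model receives the same Gaussian,
# namely the global mean and diagonal variance of a small data subset.
# Names and data are hypothetical; this is not init_hmm's implementation.
from typing import Dict, List, Tuple

Vector = List[float]
State = Tuple[Vector, Vector]  # (mean, variance)


def global_mean_var(frames: List[Vector]) -> State:
    """Per-dimension mean and (maximum-likelihood) variance over all frames."""
    n = len(frames)
    dim = len(frames[0])
    mean = [sum(f[d] for f in frames) / n for d in range(dim)]
    var = [sum((f[d] - mean[d]) ** 2 for f in frames) / n for d in range(dim)]
    return mean, var


def flat_start(num_states: Dict[str, int], frames: List[Vector]) -> Dict[str, List[State]]:
    """Give every state of every model its own copy of the global statistics."""
    mean, var = global_mean_var(frames)
    return {model: [(list(mean), list(var)) for _ in range(n)]
            for model, n in num_states.items()}


if __name__ == "__main__":
    subset = [[1.0, 2.0], [3.0, 2.0], [5.0, 8.0]]  # toy 2-D "feature vectors"
    models = flat_start({"sil": 3, "ah": 3}, subset)
    print(models["ah"][0])  # ([3.0, 4.0], [2.6666666666666665, 8.0])
```

Every state therefore starts with identical parameters; it is the subsequent training passes that differentiate the models.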
-
Model Initialization:
The utility we use to flat-start models is init_hmm.
Before we flat-start the models, we need to decide the number
of states for each model and the set of input data we would
like to use in this stage. An illustration of the data flow is
shown below.
The command line used is:
init_hmm -input mfcc_data.list -models fs_num_states.list -trans fs_trans.text -state fs_states.text -mode binary -vfloor_file varfloor.text -var_floor 0.0002
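The -var_floor 0.0002 option indicates that variances are floored to keep them from collapsing toward zero. The exact semantics in init_hmm, and the contents of varfloor.text, are not described here, so the sketch below assumes the simplest form: each variance component is clamped to at least the floor value. The `relative` flag shows a common alternative in which the floor is a fraction of the global variance (HTK's HCompV -f option works that way). `apply_var_floor` is a hypothetical helper.

```python
# Variance-flooring sketch. ASSUMPTION: the floor is applied per dimension as
# max(variance, floor). `relative=True` shows the alternative reading in which
# the floor is a fraction of the global variance. Hypothetical helper.
from typing import List, Optional


def apply_var_floor(variances: List[float], floor: float,
                    global_var: Optional[List[float]] = None,
                    relative: bool = False) -> List[float]:
    """Clamp each variance component so it never drops below its floor."""
    if relative:
        if global_var is None:
            raise ValueError("relative flooring needs the global variance")
        floors = [floor * g for g in global_var]
    else:
        floors = [floor] * len(variances)
    return [max(v, f) for v, f in zip(variances, floors)]


if __name__ == "__main__":
    print(apply_var_floor([1e-6, 0.5], 0.0002))  # [0.0002, 0.5]
```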
Typically we do not train the "sp" model at this point, but
we do define it as a placeholder for training in later stages.
We now define the model structure (topology). The initial
transitions file assumes a strictly left-to-right topology with
no skips: each state can either stay in the same state or move
to the next one. For silence, it is typical to move toward an
ergodic model by also allowing transitions from the last
emitting state to the first emitting state and vice versa. This
change has to be made manually in the transitions file, which
results in a transition matrix of the following form for the
"sil" model (states 0 and 4 are the non-emitting entry and exit
states):
0.000000e+00 1.000000e+00 0.000000e+00 0.000000e+00 0.000000e+00
0.000000e+00 5.000000e-01 3.000000e-01 2.000000e-01 0.000000e+00
0.000000e+00 0.000000e+00 5.000000e-01 5.000000e-01 0.000000e+00
0.000000e+00 2.000000e-01 0.000000e+00 3.000000e-01 5.000000e-01
0.000000e+00 0.000000e+00 0.000000e+00 0.000000e+00 0.000000e+00
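The following Python sketch builds both topologies in the file's layout, assuming states 0 and 4 are the non-emitting entry and exit states and, hypothetically, a self-loop probability of 0.5 for the default left-to-right matrix. It also checks that every row except the exit row sums to one, which is worth verifying after any manual edit of the transitions file.

```python
# Sketch: default strict left-to-right matrix and the modified "sil" matrix.
# ASSUMPTIONS: states 0 and 4 are non-emitting entry/exit states; the default
# self-loop probability of 0.5 is hypothetical.
from typing import List

Matrix = List[List[float]]


def left_to_right(n: int = 5, self_loop: float = 0.5) -> Matrix:
    """Entry goes to the first emitting state; each emitting state stays or advances."""
    a = [[0.0] * n for _ in range(n)]
    a[0][1] = 1.0
    for i in range(1, n - 1):
        a[i][i] = self_loop
        a[i][i + 1] = 1.0 - self_loop
    return a


def sil_matrix() -> Matrix:
    """Add the skip 1 -> 3 and the backward jump 3 -> 1 used for silence."""
    a = left_to_right(5)
    a[1][1], a[1][2], a[1][3] = 0.5, 0.3, 0.2  # first emitting state
    a[3][1], a[3][3], a[3][4] = 0.2, 0.3, 0.5  # last emitting state
    return a


def rows_ok(a: Matrix, tol: float = 1e-9) -> bool:
    """Every row but the exit row must sum to 1; the exit row must be all zero."""
    return (all(abs(sum(row) - 1.0) < tol for row in a[:-1])
            and all(p == 0.0 for p in a[-1]))


if __name__ == "__main__":
    m = sil_matrix()
    assert rows_ok(m)
    for row in m:
        print(" ".join(f"{p:e}" for p in row))  # same layout as the file
```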
To determine the index of the transition matrix corresponding to
the "sil" model, look at the model definition file created in the
data preparation section of this tutorial. It shows that
transition matrix 1 is the one we want to replace with the
matrix defined above.
Now we are all set to start training the system.
-
Viterbi Training:
To save disk space and I/O time, we prefer to keep all files in
binary format. Since init_hmm outputs an ASCII text file, we
convert the fs_states.text file to binary using the convert_mmf
utility.
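As a rough illustration of the savings, the sketch below compares the number of bytes needed to store the same values as %e-formatted ASCII text (the layout of the transitions file shown earlier) and as raw little-endian 32-bit floats. The binary layout is an assumption chosen for illustration and is not the convert_mmf file format.

```python
# Illustration only: bytes needed to store the same values as "%e" ASCII text
# versus raw little-endian float32. This binary layout is an assumption and
# is NOT the convert_mmf file format.
import struct
from typing import List


def ascii_bytes(values: List[float]) -> int:
    """Size of the values written as space-separated %e text."""
    return len(" ".join(f"{v:e}" for v in values).encode("ascii"))


def binary_bytes(values: List[float]) -> int:
    """Size of the values packed as 4-byte little-endian floats."""
    return len(struct.pack(f"<{len(values)}f", *values))


if __name__ == "__main__":
    params = [0.5, -1.25, 3.0e-4] * 1000  # 3000 hypothetical parameters
    print(ascii_bytes(params), binary_bytes(params))  # 39999 12000
```

Besides being smaller, binary data can be read back without converting text to numbers.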
Training is performed using the hmm_train utility. Most of the
user input to this utility is supplied via the parameter file.
For our example, a parameter file would look like this.
We use model labels for forced alignment at this stage of the
training process. These alignments are very primitive: they are
merely the reference word sequences converted into their
corresponding phone sequences, as explained in the data
preparation section of this tutorial.
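The word-to-phone expansion can be sketched in Python as follows. The lexicon entries, phone symbols, and the `sil` padding at utterance boundaries are hypothetical examples; the actual label format comes from the data preparation step. Note that the resulting labels carry no time boundaries: locating those is the job of the Viterbi alignment performed during training.

```python
# Sketch: expand a reference word sequence into a phone-level label sequence
# using a pronunciation lexicon. The lexicon, phone set, and "sil" padding
# are hypothetical examples, not the tutorial's actual data.
from typing import Dict, List

LEXICON: Dict[str, List[str]] = {
    "yes": ["y", "eh", "s"],
    "no": ["n", "ow"],
}


def words_to_phones(words: List[str], lexicon: Dict[str, List[str]],
                    boundary: str = "sil") -> List[str]:
    """Concatenate each word's pronunciation, padded with a boundary model."""
    phones = [boundary]
    for w in words:
        if w not in lexicon:
            raise KeyError(f"word not in lexicon: {w}")
        phones.extend(lexicon[w])
    phones.append(boundary)
    return phones


if __name__ == "__main__":
    print(words_to_phones(["yes", "no"], LEXICON))
    # ['sil', 'y', 'eh', 's', 'n', 'ow', 'sil']
```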
Now we execute the following command to begin the training
process:
hmm_train -p fs_params.text -c CI
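Viterbi training replaces the soft state occupancies of Baum-Welch re-estimation with a single best path: each frame is assigned to exactly one state by forced alignment along the label sequence. Here is a generic Python sketch of that alignment step for a left-to-right chain with no skips, assuming one-dimensional Gaussian emissions and fixed transition probabilities. It is not hmm_train's implementation, and `force_align` is a hypothetical name.

```python
# Sketch of Viterbi forced alignment: frames are assigned to a left-to-right
# chain of states (self-loop or advance by one, no skips) so that the total
# log-likelihood is maximal. 1-D Gaussians and fixed log transition
# probabilities are simplifying assumptions.
import math
from typing import List, Tuple


def log_gauss(x: float, mean: float, var: float) -> float:
    """Log density of a 1-D Gaussian."""
    return -0.5 * (math.log(2.0 * math.pi * var) + (x - mean) ** 2 / var)


def force_align(frames: List[float], states: List[Tuple[float, float]],
                log_stay: float = math.log(0.5),
                log_move: float = math.log(0.5)) -> List[int]:
    """Return the best state index for each frame (start in 0, end in last)."""
    T, S = len(frames), len(states)
    if T < S:
        raise ValueError("need at least one frame per state")
    NEG = float("-inf")
    score = [[NEG] * S for _ in range(T)]
    back = [[0] * S for _ in range(T)]
    score[0][0] = log_gauss(frames[0], *states[0])
    for t in range(1, T):
        for s in range(S):
            stay = score[t - 1][s] + log_stay
            move = score[t - 1][s - 1] + log_move if s > 0 else NEG
            best, prev = (stay, s) if stay >= move else (move, s - 1)
            if best == NEG:
                continue  # state s is unreachable at frame t
            score[t][s] = best + log_gauss(frames[t], *states[s])
            back[t][s] = prev
    # Trace back from the last state at the last frame.
    path = [S - 1]
    for t in range(T - 1, 0, -1):
        path.append(back[t][path[-1]])
    return path[::-1]


if __name__ == "__main__":
    frames = [0.0, 0.1, 5.0, 5.2, 9.9, 10.0]       # toy 1-D features
    states = [(0.0, 1.0), (5.0, 1.0), (10.0, 1.0)]  # (mean, variance) per state
    print(force_align(frames, states))  # [0, 0, 1, 1, 2, 2]
```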
As a rule of thumb, one should make four such passes over the
same feature data, replacing the old states file and transitions
file with the newly generated ones at the end of each pass. In
many cases the transitions are not updated during the first few
passes.
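One re-estimation pass can be sketched in Python as follows, assuming one-dimensional features and a frame-to-state alignment such as forced alignment produces. State means and variances are recomputed from the frames assigned to each state; transition probabilities are re-counted only when `update_transitions` is true, so they can be frozen during the first passes. The matrix uses the same layout as the "sil" example (non-emitting entry and exit states); the variance floor and the choice of two frozen passes are hypothetical values.

```python
# Sketch of one Viterbi re-estimation pass from a fixed frame-to-state alignment.
# ASSUMPTIONS: 1-D features; transition matrices use the layout of the "sil"
# example (state 0 = non-emitting entry, state n-1 = non-emitting exit);
# the variance floor and FROZEN_PASSES values are hypothetical.
from typing import Dict, List, Tuple

Matrix = List[List[float]]


def reestimate(frames: List[float], alignment: List[int], old_trans: Matrix,
               update_transitions: bool, var_floor: float = 1e-4
               ) -> Tuple[Dict[int, Tuple[float, float]], Matrix]:
    """Return new (mean, variance) per emitting state and a transition matrix."""
    n = len(old_trans)
    gaussians: Dict[int, Tuple[float, float]] = {}
    for s in range(1, n - 1):
        xs = [x for x, a in zip(frames, alignment) if a == s]
        if not xs:
            raise ValueError(f"state {s} received no frames")
        mean = sum(xs) / len(xs)
        var = sum((x - mean) ** 2 for x in xs) / len(xs)
        gaussians[s] = (mean, max(var, var_floor))
    if not update_transitions:
        return gaussians, [row[:] for row in old_trans]  # transitions frozen
    # Count transitions along the path entry -> aligned states -> exit.
    counts = [[0] * n for _ in range(n)]
    path = [0] + alignment + [n - 1]
    for a, b in zip(path, path[1:]):
        counts[a][b] += 1
    new_trans = [[c / sum(row) for c in row] if sum(row) else old[:]
                 for row, old in zip(counts, old_trans)]
    return gaussians, new_trans


if __name__ == "__main__":
    FROZEN_PASSES = 2  # hypothetical value for "the first few passes"
    trans = [[0, 1, 0, 0, 0], [0, .5, .5, 0, 0], [0, 0, .5, .5, 0],
             [0, 0, 0, .5, .5], [0, 0, 0, 0, 0]]
    frames = [0.0, 0.2, 5.0, 5.0, 5.0, 10.0]
    alignment = [1, 1, 2, 2, 2, 3]  # a real system would re-align before every pass
    for pass_index in range(4):
        gaussians, trans = reestimate(frames, alignment, trans,
                                      update_transitions=pass_index >= FROZEN_PASSES)
        print(pass_index, gaussians[2], trans[2])
```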