
 
 
  • Why Flat-start?

    The first step in HMM training is to seed the models with initial values. There are several techniques for doing this. A common one is to choose representative examples of the units (phones in our example) and seed the models with the corresponding feature data. However, this is a time-consuming process that requires linguistic and phonetic expertise.

    A much simpler strategy is to initialize all models with the global mean and variance computed over a small subset of the training data. This process, though not elegant, has been found to be very effective. We call it flat-start.
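
    As a rough illustration of the idea (not the init_hmm implementation), the sketch below computes a global per-dimension mean and variance over a set of feature frames and copies them into every state. The function names, the dictionary layout, and the optional variance floor are assumptions made for this example only.

```python
from typing import Dict, List, Sequence, Tuple


def global_mean_variance(
    frames: Sequence[Sequence[float]],
) -> Tuple[List[float], List[float]]:
    """Return the per-dimension mean and (population) variance of all frames."""
    n = len(frames)
    if n == 0:
        raise ValueError("need at least one feature frame")
    dim = len(frames[0])
    mean = [sum(f[d] for f in frames) / n for d in range(dim)]
    var = [sum((f[d] - mean[d]) ** 2 for f in frames) / n for d in range(dim)]
    return mean, var


def flat_start(
    frames: Sequence[Sequence[float]],
    num_states: int,
    var_floor: float = 0.0,  # hypothetical safeguard against zero variance
) -> List[Dict[str, List[float]]]:
    """Give every state the same global mean and (floored) variance."""
    mean, var = global_mean_variance(frames)
    var = [max(v, var_floor) for v in var]
    # Each state gets its own copy so later training can update them separately.
    return [{"mean": list(mean), "var": list(var)} for _ in range(num_states)]
```

    Because every state starts from identical parameters, the models only begin to differ once training reassigns frames to states.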

  • Model Initialization:

    The utility we use to flat-start models is init_hmm. Before we flat-start the models, we need to decide the number of states for each model and the set of input data we would like to use in this stage. An illustration of the data flow is shown below.

    [Figure: Flat start data flow]


    The command line used is:

    init_hmm -input mfcc_data.list -models fs_num_states.list -trans fs_trans.text -state fs_states.text -mode binary -vfloor_file varfloor.text -var_floor 0.0002

    Typically we do not train the "sp" model at this point, but we do define it as a placeholder for training in later stages.

    We now define the model structure (topology). The initialized transitions file assumes a strict left-to-right topology with no skips: each state can either stay in the same state or move to the next state. For silence it is typical to use a more ergodic model that also allows transitions from the last state to the first state and vice versa. This change has to be made manually in the transitions file, which gives the model "sil" a transition matrix of the following form:

    0.000000e+00 1.000000e+00 0.000000e+00 0.000000e+00 0.000000e+00
    0.000000e+00 5.000000e-01 3.000000e-01 2.000000e-01 0.000000e+00
    0.000000e+00 0.000000e+00 5.000000e-01 5.000000e-01 0.000000e+00
    0.000000e+00 2.000000e-01 0.000000e+00 3.000000e-01 5.000000e-01
    0.000000e+00 0.000000e+00 0.000000e+00 0.000000e+00 0.000000e+00


    In order to determine the index of the transition matrix corresponding to the "sil" model, look at the model definition file created in the data preparation section of this tutorial. It shows that transition matrix 1 is the matrix we would like to replace with the one defined above. Now we are all set to start training the system.
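
    Since the "sil" matrix is edited by hand, a quick sanity check can catch typing mistakes. The sketch below parses the matrix shown above, verifies that each row sums to one, and lists the arcs that depart from the strict left-to-right pattern. It assumes that the final all-zero row is an exit state with no outgoing transitions; that reading and all function names are assumptions for this example, not part of the toolkit.

```python
from typing import List, Tuple

SIL_TRANS = """\
0.000000e+00 1.000000e+00 0.000000e+00 0.000000e+00 0.000000e+00
0.000000e+00 5.000000e-01 3.000000e-01 2.000000e-01 0.000000e+00
0.000000e+00 0.000000e+00 5.000000e-01 5.000000e-01 0.000000e+00
0.000000e+00 2.000000e-01 0.000000e+00 3.000000e-01 5.000000e-01
0.000000e+00 0.000000e+00 0.000000e+00 0.000000e+00 0.000000e+00
"""


def parse_matrix(text: str) -> List[List[float]]:
    """Turn whitespace-separated rows of numbers into a list of float rows."""
    return [[float(x) for x in line.split()] for line in text.strip().splitlines()]


def check_rows(matrix: List[List[float]], tol: float = 1e-6) -> None:
    """Raise ValueError unless the matrix is square and its rows are valid.

    Every row must sum to 1, except the last row, which is assumed to be
    an exit state and must sum to 0.
    """
    n = len(matrix)
    for i, row in enumerate(matrix):
        if len(row) != n:
            raise ValueError(f"row {i + 1} has {len(row)} entries, expected {n}")
        expected = 0.0 if i == n - 1 else 1.0
        if abs(sum(row) - expected) > tol:
            raise ValueError(f"row {i + 1} sums to {sum(row)}, expected {expected}")


def extra_arcs(matrix: List[List[float]]) -> List[Tuple[int, int]]:
    """Return 1-based (from, to) arcs that are neither self-loops nor next-state moves."""
    return [
        (i + 1, j + 1)
        for i, row in enumerate(matrix)
        for j, p in enumerate(row)
        if p > 0.0 and j not in (i, i + 1)
    ]


sil = parse_matrix(SIL_TRANS)
check_rows(sil)
print(extra_arcs(sil))
```

    For the matrix above, the only arcs outside the left-to-right pattern are the two added by hand: state 2 to state 4 and state 4 back to state 2.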

  • Viterbi Training:

    To save disk space and I/O time, we would like all files to be in binary format. Since init_hmm outputs an ASCII text file, we can convert the fs_states.text file to a binary file using the convert_mmf utility.

    Training is performed using the hmm_train utility. Most of the user input to this utility is via the parameter file. For our example a parameter file would look like this.

    We use model labels for forced alignment in this stage of the training process. The alignments are very primitive in that they are merely the reference word sequences converted to their corresponding phone sequences as explained in the data preparation section of this tutorial.

    Now we execute the following command to begin the training process.

    hmm_train -p fs_params.text -c CI

    A rule of thumb is to make four such passes over the same feature data, replacing the old states file and transitions file with the newly generated ones at the end of each pass. In many cases the transitions are not updated for the first few passes.
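
    The repeated passes can be scripted. The sketch below is one possible driver, written under several assumptions: the paths for the current and newly generated states and transitions files are hypothetical parameters (the tutorial does not name the output files), and holding back the transitions is modelled simply by not copying the new transitions file during the early passes. Only the hmm_train command line itself is taken from the text above.

```python
import shutil
import subprocess
from pathlib import Path
from typing import Callable, List

# Command line as given in the tutorial.
TRAIN_CMD = ["hmm_train", "-p", "fs_params.text", "-c", "CI"]


def run_passes(
    num_passes: int,
    current_states: Path,   # hypothetical: states file read by the parameter file
    new_states: Path,       # hypothetical: states file written by each pass
    current_trans: Path,    # hypothetical: transitions file read by the parameter file
    new_trans: Path,        # hypothetical: transitions file written by each pass
    first_trans_pass: int = 1,
    runner: Callable[..., object] = subprocess.run,
) -> List[int]:
    """Run training passes, swapping in the new model files after each one.

    Transitions are only replaced from pass `first_trans_pass` onward.
    Returns the list of pass numbers in which transitions were replaced.
    """
    replaced = []
    for pass_no in range(1, num_passes + 1):
        runner(TRAIN_CMD, check=True)  # raises if the real command fails
        shutil.copyfile(new_states, current_states)
        if pass_no >= first_trans_pass:
            shutil.copyfile(new_trans, current_trans)
            replaced.append(pass_no)
    return replaced
```

    A call such as run_passes(4, ...) with first_trans_pass=3 would follow the rule of thumb of four passes while leaving the transitions untouched for the first two.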

