
 
 
  • NIST SPHERE to 16-bit linear format:

    The current version of the feature extraction utility, extract_feature, expects the input speech data to be in 16-bit linear format. Since most speech database distributions come in the NIST SPHERE format, the data must first be converted to 16-bit linear format. We do this using conversion routines distributed by NIST as part of their SPHERE file processing utility suite.

    We use w_decode to produce a 16-bit linear file with a SPHERE header and then remove the SPHERE header using the h_strip utility. For example:

    # convert the SPHERE file to 16-bit linear format with a SPHERE header
    #
    w_decode -f -opcm AD-2623.p7.wav AD-2623.p7.lin

    # remove the SPHERE header
    #
    h_strip AD-2623.p7.lin AD-2623.p7.raw
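
    For more than a handful of files, the two steps above can be wrapped in a loop. The following is a minimal sketch, not part of the ISIP distribution: the function name convert_sphere_dir, the .wav filter, the .lin/.raw naming and the one-path-per-line list file are all assumptions made for illustration. Check the list format that extract_feature actually expects before relying on it.

```shell
#!/bin/sh
# Sketch: convert every SPHERE file in a directory to headerless 16-bit
# linear data and record the results in a list file.
# Assumptions (not from the tutorial): the function name, the .wav filter,
# the .lin/.raw naming, and a list file with one path per line.

convert_sphere_dir() {
    src_dir=$1      # directory holding the SPHERE .wav files
    out_dir=$2      # directory for the .lin and .raw output
    list_file=$3    # list of the generated .raw files

    mkdir -p "$out_dir" || return 1
    : > "$list_file"                    # start with an empty list

    for wav in "$src_dir"/*.wav; do
        [ -e "$wav" ] || continue       # skip the literal pattern if nothing matched
        base=$(basename "$wav" .wav)
        lin="$out_dir/$base.lin"
        raw="$out_dir/$base.raw"

        # same two steps as the single-file example
        w_decode -f -opcm "$wav" "$lin" || { echo "w_decode failed: $wav" >&2; continue; }
        h_strip "$lin" "$raw" || { echo "h_strip failed: $lin" >&2; continue; }

        echo "$raw" >> "$list_file"
    done
}
```

    A call such as convert_sphere_dir sphere_data raw_data raw_data.list would then leave one .raw file per input file in raw_data, skipping any file for which either tool reports an error.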

  • Feature Extraction:

    The extract_feature utility has a large number of options to choose from. Typically we use 12 MFCC features and energy, along with their differences (delta coefficients) and double differences (acceleration coefficients). Other options must also be specified, including standard signal processing parameters such as window type, pre-emphasis, and window/frame durations.

    The data generated for this tutorial uses the following options:

    extract_feature -delta -acc -cms -zero_mean -energy -energy_norm \
        -mfcc 12 -delta_win 2 -lifter_coeff 22 \
        -window_dur 25 -frame_dur 10 -num_fbanks 24 \
        -window_type hamming -pre_emph_coeff 0.97 \
        -output_mode binary -input raw_data.list -output mfcc_data.list
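
    As a sanity check on these options, the arithmetic they imply can be sketched in a few lines of shell. This is only an illustration: the helper names are hypothetical, the durations are assumed to be in milliseconds, the 8000 Hz sample rate is an assumption (the tutorial does not state it), and the frame count ignores whatever edge handling extract_feature itself applies.

```shell
#!/bin/sh
# Sketch: arithmetic implied by the extract_feature options.
# Assumptions: durations are in milliseconds; SAMPLE_RATE is a guess and
# must be replaced with the real rate of your data.

WINDOW_MS=25       # -window_dur 25
FRAME_MS=10        # -frame_dur 10
SAMPLE_RATE=8000   # assumed sample rate in Hz

# MFCCs plus one energy term, then the same number again for the
# delta and the acceleration coefficients
feature_dim() {
    echo $(( ($1 + 1) * 3 ))
}

# number of samples spanned by a duration in ms at a given sample rate
ms_to_samples() {
    echo $(( $1 * $2 / 1000 ))
}

# approximate frame count for a headerless file of 16-bit samples,
# counting only windows that fit entirely inside the signal
approx_frames() {
    samples=$(( $1 / 2 ))               # two bytes per sample
    win=$(ms_to_samples "$WINDOW_MS" "$SAMPLE_RATE")
    hop=$(ms_to_samples "$FRAME_MS" "$SAMPLE_RATE")
    if [ "$samples" -lt "$win" ]; then
        echo 0
        return
    fi
    echo $(( (samples - win) / hop + 1 ))
}
```

    With -mfcc 12, feature_dim 12 prints 39, the per-frame vector size for 12 MFCCs and energy with their delta and acceleration coefficients. Under the assumed sample rate, a 25 ms window covers 200 samples and consecutive frames start 80 samples apart.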

    Now we are all set to start using the feature data for training our HMMs and for performing recognition using the ISIP speech recognition system.


