- NIST SPHERE to 16-bit linear format:
The current version of the feature extraction utility,
extract_feature, expects the input speech data to be in 16-bit
linear format. Since most speech database distributions come in
the NIST SPHERE format, the data must first be converted. We do
this with the conversion routines that NIST distributes as part
of its SPHERE file processing utility suite. We use w_decode to
produce a 16-bit linear file with a SPHERE header, and then
remove the SPHERE header with the h_strip utility. For example:
# convert the SPHERE file to 16-bit linear format with a SPHERE header
#
w_decode -f -opcm AD-2623.p7.wav AD-2623.p7.lin
# remove the SPHERE header
#
h_strip AD-2623.p7.lin AD-2623.p7.raw
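The two-step conversion above can be wrapped in a small shell helper when many files need converting. This is only a sketch: the function names convert_sphere and run, the DRY_RUN switch, and the assumption that input files end in .wav are illustrative and not part of the NIST or ISIP tools. The w_decode and h_strip invocations are copied unchanged from the example above.

```shell
#!/bin/sh
# Sketch: convert a SPHERE file to headerless 16-bit linear data.
# convert_sphere, run and DRY_RUN are hypothetical helpers for this
# example; only w_decode and h_strip come from the SPHERE utilities.

# Run a command, or just print it when DRY_RUN=1.
run() {
    if [ "${DRY_RUN:-0}" = "1" ]; then
        echo "$*"
    else
        "$@"
    fi
}

# Convert one file: NAME.wav -> NAME.lin (with header) -> NAME.raw.
convert_sphere() {
    src=$1
    base=${src%.wav}    # assumes the input name ends in .wav
    run w_decode -f -opcm "$src" "$base.lin" &&
    run h_strip "$base.lin" "$base.raw"
}

# Dry run: print the commands that would be executed for one file.
DRY_RUN=1
convert_sphere AD-2623.p7.wav
```

With DRY_RUN unset or set to 0, the same function runs the commands instead of printing them, so a loop such as `for f in *.wav; do convert_sphere "$f"; done` would convert a whole directory.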
- Feature Extraction:
The extract_feature utility has a large number of options.
Typically we use 12 MFCC features and energy, together with
their differences (delta coefficients) and double-differences
(acceleration coefficients). Other options must also be
specified, including standard signal processing parameters such
as window type, pre-emphasis, and window/frame durations.
The data generated for this tutorial uses the following
options:
extract_feature \
  -delta -acc -cms -zero_mean -energy -energy_norm -mfcc 12 \
  -delta_win 2 -lifter_coeff 22 -window_dur 25 -frame_dur 10 \
  -num_fbanks 24 -window_type hamming -pre_emph_coeff 0.97 \
  -output_mode binary -input raw_data.list -output mfcc_data.list
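As a sanity check on the options above, the arithmetic below works out the feature vector size and frame rate they imply. It is a sketch under stated assumptions, not a description of the extract_feature output format: it assumes that -energy adds one term to the 12 cepstral coefficients, that -delta and -acc each append one more copy of that base vector, and that -window_dur and -frame_dur are given in milliseconds (the text above does not state the units).

```shell
#!/bin/sh
# Values taken from the command line above.
num_mfcc=12        # -mfcc 12
num_energy=1       # -energy (assumed to add a single term)
frame_dur_ms=10    # -frame_dur 10 (assumed milliseconds)
window_dur_ms=25   # -window_dur 25 (assumed milliseconds)

# Base vector: cepstral coefficients plus energy.
base_dim=$((num_mfcc + num_energy))

# Static + delta + acceleration: three copies of the base vector.
total_dim=$((base_dim * 3))

# One feature vector per frame step.
frames_per_sec=$((1000 / frame_dur_ms))

# Consecutive analysis windows share this much signal.
overlap_ms=$((window_dur_ms - frame_dur_ms))

echo "base=$base_dim total=$total_dim frames_per_sec=$frames_per_sec overlap_ms=$overlap_ms"
```

Under these assumptions each second of speech yields 100 vectors of 39 values, and each 25 ms analysis window overlaps the next by 15 ms.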
Now we are all set to start using the feature data for training
our HMMs and for performing recognition using the ISIP speech
recognition system.