This is a graphical user interface tool for speech segmentation and
speech transcription. The tool provides spectrograms and energy plots,
speech selection, and audio playback capabilities. This single-channel
(mono) version is designed for quick access to multiple files from a
single speaker. It is written partly in object-oriented C++ (compiled
with GNU's gcc compiler) and interfaced to Tcl-Tk (v8.0) utilities.
To download the current version of the Transcriber tool click here.
Also, please feel free to send us comments or suggestions regarding
the tool.
The interface:
The main features include
- displays signal waveforms
- displays spectrograms
- displays energy plots
- zoom in/out on the displays
- ability to set time marks
- automatic completion of words listed in a lexicon file
- ability to set attributes of each individual transcription
- ability to quickly switch between audio files from a list
- plays audio on mono channels
- modify/enter transcriptions
- set user-defined configurations
The energy plot display:
The energy plot features include
- zoom in/out on the energy plot
- ability to set the canvas size
- ability to change the amplitude
- ability to set the frame length
- ability to set the window length
- ability to set the RMS scale factor
- ability to set the preemphasis coefficient
- ability to set the window function
The signal plot display:
The signal plot features include
- zoom in/out on the signal plot
- ability to change the amplitude
- ability to change the volume
- ability to set the audio device and server
The spectrogram display:
The spectrogram features include
- zoom in/out on the spectrogram
- ability to change brightness
- ability to change the contrast
- ability to set the preemphasis coefficient
- ability to set the window function
The Transcriber tool currently supports only 16-bit single-channel
linear data (RAW). To use the Transcriber with other types of data,
you will need to use the NIST SPHERE tools to convert your data to
RAW format.
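As a rough illustration of what 16-bit single-channel linear RAW data looks like to a program, the sketch below decodes a headerless stream of signed 16-bit samples. The function names and the byteorder parameter are hypothetical, and the byte order the Transcriber expects is not stated here, so little-endian is only an assumed default.

```python
import array
import sys


def decode_pcm16(data, byteorder="little"):
    """Decode headerless signed 16-bit mono PCM bytes into an array of ints.

    `byteorder` is an assumption of this sketch ("little" or "big").
    """
    if len(data) % 2:
        raise ValueError("data length is not a multiple of 2 bytes")
    samples = array.array("h")      # "h" = signed 16-bit on common platforms
    samples.frombytes(data)         # interprets bytes in native byte order
    if byteorder != sys.byteorder:  # swap if the file order differs from native
        samples.byteswap()
    return samples


def read_raw_pcm16(path, byteorder="little"):
    """Read a whole RAW file from `path` and decode it."""
    with open(path, "rb") as f:
        return decode_pcm16(f.read(), byteorder)
```

A file whose size is not a multiple of two bytes cannot be 16-bit data, so the sketch rejects it instead of silently dropping the last byte.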
In order to use your own data with the Transcriber, you will need to
set up a configuration file with parameters such as the audio device,
audio server, sample frequency, and number of bytes per sample. You
will also need to specify the lexicon file path (lexfile) and the call
file path (callfile). The lexicon file is a user-defined reference
dictionary that can be viewed, searched, and modified according to
one's preference. The call file contains the locations of the
transcription file, the audio list, and the comment file. Each of
these three files has a distinct role. The transcription file contains
a set of key-value pairs that describe each entry in the file. The
comment file contains a set of bookmarks that record the start and
stop times, along with the duration, of the transcription process.
Finally, the audio list contains the locations of all the audio data
associated with the given transcription file.
An example directory structure of the Transcriber follows:
Several options are available for the display and audio facilities.
These options are accessible by clicking on the Config button on the
main screen.
In the first section under Session File the current configuration
file, comment file, transcription file and lexicon file are
listed. You can even browse through and select another configuration
file via the Browse button.
In the second section, under Audio-Related Parameters, you have the
option of setting the audio device (sparc, dat, ncd, x86) for your
system. You can also select the audio server (speaker, headphone,
line) from the options offered.
In the third section under Energy Plot Parameters you have the option
of changing the energy plot parameters. The options include changing
the frame length, window length and the RMS scale factor of the energy
plot. You can even enlarge or diminish the size of the energy plot
canvas by setting the Canvas Size option to your preference.
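To make the energy plot parameters concrete, here is a minimal sketch of a short-time RMS energy computation. It assumes that the frame length is the step between successive analysis positions, that the window length is the number of samples analysed at each position, and that the RMS scale factor is a plain multiplier. These meanings, the function name, and the use of sample counts as units are assumptions of this sketch, not a description of the Transcriber's own code.

```python
import math


def rms_energy(samples, frame_len, window_len, scale=1.0):
    """Return one scaled RMS value per frame.

    frame_len  -- step between successive frames, in samples (assumed meaning)
    window_len -- samples analysed per frame (assumed meaning)
    scale      -- multiplier applied to every RMS value (assumed meaning)
    """
    if frame_len <= 0 or window_len <= 0:
        raise ValueError("frame_len and window_len must be positive")
    energies = []
    for start in range(0, len(samples), frame_len):
        # The last windows may be shorter than window_len near the end.
        window = samples[start:start + window_len]
        mean_square = sum(x * x for x in window) / len(window)
        energies.append(scale * math.sqrt(mean_square))
    return energies
```

Under these assumptions, a window longer than the frame makes neighbouring frames overlap, which smooths the plotted curve, while a shorter frame length gives more points along the time axis.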
In the fourth section under Spectrogram Parameters you have the option
of setting the brightness and contrast of the spectrogram to any
specific value instead of using the slider bars on the main screen.
Finally, in the last section, under Miscellaneous, you have the option
of preemphasizing the data for the energy plot and spectrogram. You
can turn preemphasis on or off, and you can set the preemphasis
coefficient to any desired value.
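Preemphasis is commonly implemented as a first-order filter, y[n] = x[n] - a * x[n-1], where a is the preemphasis coefficient. The sketch below assumes that form. The exact filter used by the Transcriber is not given here, and the function name, the enabled flag, and the default coefficient of 0.95 are illustrative choices only.

```python
def preemphasize(samples, coeff=0.95, enabled=True):
    """Apply y[n] = x[n] - coeff * x[n-1]; the first sample passes through.

    With enabled=False the samples are returned unchanged (as floats),
    mirroring an on/off preemphasis switch.
    """
    if not enabled or len(samples) == 0:
        return [float(x) for x in samples]
    out = [float(samples[0])]
    for prev, cur in zip(samples, samples[1:]):
        out.append(cur - coeff * prev)
    return out
```

A constant signal is strongly attenuated by this filter while rapid changes pass through, which is why preemphasis boosts the high-frequency detail seen in the spectrogram.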
The user also has the option of windowing the data using the standard
window functions like Hamming, Hanning, rectangular, Bartlett and
Blackman.
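For reference, the listed window functions can be sketched with their usual textbook definitions, as below. The function names are hypothetical, and the coefficients shown are the standard symmetric forms, which may differ in detail from what the Transcriber computes internally.

```python
import math


def make_window(name, length):
    """Return `length` coefficients of the named window (textbook forms)."""
    if length < 1:
        raise ValueError("length must be at least 1")
    if length == 1:
        return [1.0]
    m = length - 1  # index of the last coefficient
    formulas = {
        "rectangular": lambda n: 1.0,
        "hamming": lambda n: 0.54 - 0.46 * math.cos(2 * math.pi * n / m),
        "hanning": lambda n: 0.5 - 0.5 * math.cos(2 * math.pi * n / m),
        "bartlett": lambda n: 1.0 - abs(2 * n / m - 1.0),
        "blackman": lambda n: (0.42 - 0.5 * math.cos(2 * math.pi * n / m)
                               + 0.08 * math.cos(4 * math.pi * n / m)),
    }
    try:
        formula = formulas[name.lower()]
    except KeyError:
        raise ValueError(f"unknown window function: {name!r}") from None
    return [formula(n) for n in range(length)]


def apply_window(frame, name):
    """Multiply each sample of `frame` by the matching window coefficient."""
    if len(frame) == 0:
        return []
    coeffs = make_window(name, len(frame))
    return [x * c for x, c in zip(frame, coeffs)]
```

For example, `apply_window([2, 2, 2, 2, 2], "bartlett")` tapers the frame linearly to zero at both ends, giving `[0.0, 1.0, 2.0, 1.0, 0.0]`.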
The Transcriber also has an auto fill facility that completes the
word being typed when you hit the Tab key. However, the auto fill
facility will not work if the word to be completed is not in the
lexicon file. If a word has several possible completions, a pop-up
box lists all of them, and you can then select the desired word from
the list.
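The completion behaviour described above amounts to a prefix lookup in the lexicon. A minimal sketch follows, assuming the lexicon has already been loaded into a sorted list of words. The function name is hypothetical and the lexicon file format is not specified here.

```python
import bisect


def completions(prefix, sorted_words):
    """Return every word in `sorted_words` that starts with `prefix`.

    `sorted_words` must be sorted; matching words then form one
    contiguous run beginning at the insertion point of `prefix`.
    """
    start = bisect.bisect_left(sorted_words, prefix)
    matches = []
    for word in sorted_words[start:]:
        if not word.startswith(prefix):
            break
        matches.append(word)
    return matches


lexicon = sorted(["speech", "spectrogram", "spectrum", "signal"])
print(completions("si", lexicon))    # one match: fill it in directly
print(completions("spec", lexicon))  # several matches: offer a choice
print(completions("x", lexicon))     # no match: nothing to complete
```

An empty result corresponds to the case where auto fill does nothing, a single result can be filled in directly, and multiple results correspond to the pop-up list of candidates.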
The Transcriber can be used not only for speech segmentation and
speech transcription but also for viewing signals. To view the signal
without having to deal with the transcription protocols, just click
on the Lock button on the screen. Apart from viewing the signal,
several display options are available. These include zooming into a
section of the signal, changing the amplitude of the signal, and
setting the brightness and contrast of the spectrogram.