|
ACTIVE (Alphabetical Order By Directory):
-
Nonlinear Statistical Modeling of Speech
Hidden Markov models (HMMs) have been the primary approach
to speech recognition for almost 25 years. The goal of
this project is to develop a new approach to statistical
modeling of speech based on nonlinear statistics. Our first
step will be to implement a speaker recognition system
using a nonlinear time series approach to modeling the
signal. This approach will be compared to our previous
attempts to advance HMMs based on Support Vector Machines
(SVMs) and Relevance Vector Machines (RVMs).
-
Internet-Accessible Speech Recognition Technology
Speech recognition research remains a major activity for
ISIP. Large vocabulary conversational speech recognition
(LVCSR) is a fascinating technology that draws heavily from
the diverse research areas of statistical pattern
recognition, digital signal processing, artificial
intelligence, linguistics, and information theory. On this
web site you will find a powerful and flexible public
domain speech recognition system written in C++.
INACTIVE (Alphabetical Order By Directory):
-
Vehicle Performance Monitoring System
This project is a one-year collaboration with the
Mississippi Department of Transportation (MDOT) to adapt
and apply the Mississippi State University wireless
web-based vehicle performance and monitoring system
(VPMS) (developed in the
Campus Bus Networking
project) to measure vehicle utilization, monitor
performance in real time, and record historical
information on vehicle travel paths. This
research will also provide key technology components to be
incorporated into future transportation safety programs to
be conducted by the Mississippi Department of
Transportation (MDOT).
-
Campus Bus Networking
Networked vehicles will be a cornerstone of the next
generation intelligent transportation system. In this
project, we are developing the hardware and software
necessary to perform two-way communications with a vehicle
track and to collect critical vehicle performance
data. Visit our web page that tracks the campus bus system
in real time.
-
IP Version 6 Research
IP version 6 (IPv6) is the next generation Internet
protocol that has the potential to drastically change the
way we use the Internet as part of our everyday lives. We
are exploring IPv6 and areas of research in which we can
contribute to the development and deployment of this next
generation protocol. We are currently investigating
peer-to-peer IPv6 networks and applications, mobile IPv6,
and high-performance routing.
-
Aurora Evaluation Of Speech Recognition Front Ends
The goal of this project is to evaluate and compare the
robustness of feature extraction algorithms on a large
vocabulary task. The target application is cellular
telephony. These evaluations are being conducted under the
auspices of the
Aurora Distributed Speech Recognition
working group of
The European Telecommunications Standards Institute
(ETSI). The Wall Street Journal database (WSJ0) is being
used as the basis for experiments.
-
Bulldog Stock Exchange
As part of a unique entrepreneurship thrust in MS State's
College of Engineering, EE Senior Design teams form
companies. These companies are publicly traded on the
Bulldog Stock Exchange. This simulation teaches our
students about the intimate relationships between
technology and business.
-
In-Vehicle Dialog Systems
A voice interface is a superb tool for in-vehicle
information access when your hands and eyes are busy. In
this project, we are developing a dialog system that
provides information about the university and its
surrounding area. For example, a user can ask "Where is the
nearest restaurant to my hotel?" or "How do I get from the
airport to my hotel?".
-
A Japanese Command and Control Word Database
The Japan Electronic Industry Development Association's
Common Speech Data (JCSD) Corpus is an isolated phrase
corpus consisting of 150 speakers (75 males/75 females) and
almost 200,000 utterances. It represents an important
milestone in Japanese speech recognition technology
development. The JCSD Corpus was originally collected in
1986 in Japan in a nationwide project managed by Professor
Shuichi Itahashi in coordination with the Japan Electronic
Industry Development Association (JEIDA). Its importance to Japanese
speech recognition technology development is, to some
extent, comparable to Texas Instruments' famous 46-word
speaker-dependent corpus. The JCSD Corpus was one of the
first industry-standard and freely available corpora for
the study of Japanese language speech recognition. Most of
the competitive Japanese language speech recognition
systems developed in Japan have been benchmarked on various
subsets of this corpus. Hence, it is one of the most
important standards of comparison that exist for Japanese
language systems.
-
Automatic Pronunciation Generation
Correct recognition of proper nouns is critical to problems
in speech understanding and applications involving voice
interfaces. The recognition system requires accurate
pronunciation networks for correct recognition of such
words. This is a challenging problem because a large number
of proper nouns have multiple valid pronunciations that do
not follow typical letter-to-sound conversion rules.
Generating such pronunciation dictionaries by hand is
highly impractical, and classical rule-based text-to-speech
systems are unsuitable for this task as they inherently
generate only a single pronunciation. ISIP has developed a
suite of algorithms involving stochastic neural networks,
decision trees and other statistical techniques that are
capable of automatically generating multiple pronunciations
for proper nouns based on only the text-based spelling of
the name.
-
Spoken Language Information Retrieval
Our goal is to better understand how integration of
prosodic information, speech recognition and parsing can
impact the problem of information extraction from spoken
documents. This research will provide initial steps towards
information extraction from telephone messages,
conversations, or university lectures, or from any text
(such as encyclopedias), and can serve as the basis for a
sorely needed sophisticated web browser technology and data
mining applications.
-
Powertrain Design and Optimization
State-of-the-art design tools in automotive engineering
still lack the power, sophistication, and automation of
design tools for the electronics industry. It is our goal
to fundamentally advance automotive design engineering by
introducing optimization and physics-based design
principles into standard industry design tools. This will
allow designers to globally optimize design criteria such
as size, efficiency, cost, weight, and volume, and to achieve
unprecedented reductions in design turnaround time.
-
Robust Acoustic Modeling
Field deployment of speech recognition technology results
in a number of interesting problems, such as microphone
saturation, which severely limit the performance of speech
recognition engines. In this project, we study the effects
of microphone saturation and develop algorithms to improve
robustness to saturation, clipping, and other forms of
signal degradation.
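As a rough illustration of the kind of measurement involved, the
sketch below flags saturated samples in a block of 16-bit audio. It
is not the algorithm developed in this project: the function names,
the 16-bit full-scale value, and the 0.99 margin are all assumptions
chosen for the example.

```python
def clipped_fraction(samples, full_scale=32767, margin=0.99):
    """Return the fraction of samples at or beyond margin * full_scale.

    The full_scale and margin defaults are illustrative assumptions
    for 16-bit PCM audio, not values taken from the project.
    """
    if not samples:
        return 0.0
    threshold = margin * full_scale
    clipped = sum(1 for s in samples if abs(s) >= threshold)
    return clipped / len(samples)


def longest_clipped_run(samples, full_scale=32767, margin=0.99):
    """Return the length of the longest run of consecutive clipped samples."""
    threshold = margin * full_scale
    longest = 0
    current = 0
    for s in samples:
        if abs(s) >= threshold:
            current += 1
            longest = max(longest, current)
        else:
            current = 0
    return longest


if __name__ == "__main__":
    # A short, hand-made block: three consecutive samples sit at full scale.
    block = [0, 1000, 32767, 32767, -32768, 200]
    print(clipped_fraction(block))
    print(longest_clipped_run(block))
```

A long run of clipped samples is usually a stronger sign of
saturation than the same number of isolated peaks, which is why the
sketch reports both quantities.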
-
Robust Low Perplexity Voice Interfaces
Robust speech recognition technology for speech recorded
and transmitted over narrowband channels requires advances
in several components of a speech recognition system:
signal processing techniques that produce invariant feature
sets; acoustic modeling and training that produce
channel-independent acoustic models; and noise cancellation
techniques that mitigate the effects of impulsive and
application-dependent transient noise. This project is a
one-year collaboration with the MITRE Corporation that will
result in a prototype of a near real-time system that
provides a robust and flexible command and control voice
interface in realistic tactical noisy environments.
-
Southern-Accented Speech
Southern accents are underrepresented in most publicly
available databases. This has led to speculation that
performance for such speakers is worse than for speakers of
better-represented dialects. To test this hypothesis, a
small data collection effort was recently conducted that
targeted Southern-accented speakers. Data was collected
from February 21 to February 25, 2000. The resulting data
covers 23 speakers (13 males and 10 females) ranging in age
from 18 to 56.
-
Switchboard Resegmentation
The SWITCHBOARD Corpus (SWB) has become critical to the
success of state-of-the-art LVCSR systems. Using this
data, however, has not been without its share of drawbacks.
Word-level transcription of SWB is difficult, and
conventions associated with such transcriptions are highly
controversial and often application dependent. By 1998,
the quality of the SWB transcriptions for LVCSR was
recognized to be less than ideal, and many years of small
projects attempting to correct the transcriptions had taken
their toll. In February of 1998, ISIP began a project to do
a final cleanup of the SWB Corpus, and to organize and
integrate all existing resources related to the data into
this final release.
-
A Digital Telephone Interface For Sun Workstations
Using the Linkon system, a speech data collection board, we
have developed a fully expandable, robust system for
platform-independent collection of telephone speech data.
Our object-oriented software libraries and intuitive GUI
provide powerful tools with which even a novice user can
efficiently prototype complex applications. Using the
system, one can generate programs that range from simple
single-user prompt/record demonstrations to robust
SWITCHBOARD-type multi-user applications.
-
Scenic Beauty Estimation of Forestry Images
The United States Department of Agriculture and Forest
Services require the automatic determination of the scenic
beauty of a given forest scene. Their requirement is a
consequence of rising public concern to preserve forest
beauty. To achieve this, we have developed an extensive
database that can support our algorithm development. The
database consists of 637 unique images, each with several
subjective ratings of its scenic beauty.
The database extensively samples several dimensions of the
problem including year, season, time of day, angle and
treatment. In order to automatically relate the beauty of
an image to the subjective beauty ratings, we have
developed algorithms to extract features from the image
that determine its scenic beauty. The extracted features
are compared to model files using a standard pattern-matching
paradigm. A second goal of this project is to recognize
the various constituents of a forest scene. To achieve
this, we will use a variety of techniques that are
currently being used for speech recognition purposes.
To date, we have produced algorithms that can classify
images into high, medium or low scenic beauty with an
accuracy of 62.3%.
-
Cognitive Assessment Using Voice Analysis
The goal of this project is to design an effective fatigue
monitoring and assessment system by characterizing changes
in a human voice as a speaker becomes fatigued or stressed.
A remote, near-real-time assessment system to monitor the
fatigue levels of military personnel will be developed
during the course of this project.
|