Abstract |
In adverse listening conditions (e.g. presence of noise, hearing-impaired listeners etc.),
people adjust their speech in order to overcome the communication difficulty and
successfully deliver their message. This remarkable adjustment produces different
speaking styles compared to unobstructed speech (casual speech) that vary among
speakers and conditions, but share a common characteristic: high intelligibility.
Developing algorithms that exploit acoustic features of intelligible human speech
could be beneficial for speech technology applications that seek methods to enhance
the intelligibility of “speaking-devices”. Besides the commercial orientation (e.g.,
mobile telephone, GPS, customer service systems) of these applications, most
important is their medical context, providing assistive communication to people with
speech or hearing deficits. However, current speech technology is “deaf”, meaning that
it cannot adjust, as humans do, to dynamically changing real environments or to
the listener’s specific needs.
This work proposes signal modifications based on the acoustic properties of a highly
intelligible human speaking style, clear speech, assisting in the development of
smart speech technology systems that “mimic” the way people produce intelligible
speech.
Unlike other speaking styles, clear speech has a high intelligibility impact on
various listening populations (native and non-native listeners, hearing impaired,
cochlear implant users, elderly people, people with learning disabilities etc.) in many
listening conditions (quiet, noise, reverberation).
A significant part of this work is devoted to the comparative analysis of casual
and clear speech, which reveals differences in prosody, vowel spaces, spectral energy
and modulation depth of the temporal envelopes. Based on these observed and
measured differences between the two speaking styles, we propose modifications for
enhancing the intelligibility of casual speech. Compared to other state-of-the-art
modification systems, our modification techniques (1) do not require excessive
computation, (2) are speaker and speech independent, (3) maintain speech quality, and (4)
are explicit, since they do not require statistical training or the preexistence of clear
speech recordings.
Evaluations on intelligibility and quality are performed objectively using recently
proposed objective intelligibility scores and subjectively with listening tests conducted
with native and non-native listeners in noisy environments (speech-shaped noise, SSN),
reverberation and in quiet. Results show that our modifications enhance speech
intelligibility in SSN and reverberation for native and non-native listeners. Specifically,
the proposed spectral modification technique, namely Mix-filtering, increases the
intelligibility of speech in noise and reverberation while maintaining the quality of the
original signal, unlike other intelligibility boosters. Moreover, a modulation depth
enhancement technique called DMod increases speech intelligibility by more than 30%
in SSN. The DMod algorithm is inspired both by clear speech properties and by the
non-linear phenomena that take place on the basilar membrane. DMod not only
enhances speech intelligibility, but also introduces a novel method for manipulating
the modulation spectrum of the signal. Results
of this study indicate a connection between the modulations of the temporal envelopes
and speech perception, specifically processes that take place on the basilar membrane
of the human ear, and pave the way for analyzing and comprehending speech in terms
of modulations.
|