Recent advances in auditory neuroscience are characterized by a close interaction between neurophysiological findings, psychophysical effects and integrative models that attempt to bridge the gap between neuroscience and psychophysics. This volume introduces the latest developments in this quickly evolving interdisciplinary area. Tutorials by leading international scientists as well as more focused contributions by active researchers provide an invaluable summary of our current knowledge of psychophysics and auditory physiology and the main lines of research in this field. The book will be of interest to anyone involved in hearing research, including neuroscientists, behavioral scientists, acousticians and biophysicists.
https://doi.org/10.1142/9789812818140_fmatter
The following sections are included:
https://doi.org/10.1142/9789812818140_0001
In this paper we present the results of a study comparing the encoding of the pitch of complex stimuli that differ greatly in their temporal structure. We used cosine-phase (CPH) and random-phase (RPH) harmonic complexes, and iterated rippled noise (IRN), with various fundamental frequencies or delays, respectively. Most neurones in the ventral cochlear nucleus were able to represent the pitch of harmonic complexes in their temporal response properties when all the components were in cosine phase. However, some units failed to provide a robust representation of the pitch when the components were in random phase (as is often the case for harmonic sounds in natural, echoic environments) or when the stimuli were IRNs. Based on first-order inter-spike interval statistics, sustained chopper units showed bandpass periodicity tuning in response to IRN and RPH stimuli. Surprisingly, bandpass periodicity tuning was also observed for onset-chopper units. Whereas ‘best periods’ for sustained chopper cells ranged from 4 to 10 ms, onset-chopper units revealed best periods of up to 16 ms, thus also covering the lower pitch range. We hypothesise that sustained chopper and onset-chopper units play a role in the extraction of the pitch of complex signals.
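As a minimal illustration of the first-order inter-spike interval analysis described above (a generic sketch, not the authors' code), the following estimates a unit's ‘best period’ from the mode of its first-order ISI histogram; the spike times, bin width and interval range are hypothetical.

```python
import numpy as np

def best_period_from_isi(spike_times_ms, bin_ms=0.5, max_isi_ms=20.0):
    """Estimate a 'best period' as the mode of the first-order
    inter-spike interval (ISI) histogram."""
    isis = np.diff(np.sort(spike_times_ms))          # first-order intervals only
    isis = isis[isis <= max_isi_ms]
    bins = np.arange(0.0, max_isi_ms + bin_ms, bin_ms)
    counts, edges = np.histogram(isis, bins=bins)
    k = np.argmax(counts)
    return 0.5 * (edges[k] + edges[k + 1])

# Hypothetical spike train locked to an 8-ms stimulus period:
rng = np.random.default_rng(0)
spikes = np.cumsum(rng.normal(8.0, 0.5, size=200))
print(best_period_from_isi(spikes))                  # ~8 ms
```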
https://doi.org/10.1142/9789812818140_0002
Simultaneous recording and differential analysis of pre- and postsynaptic spike activity in spherical bushy cells of the anteroventral cochlear nucleus of the Mongolian gerbil was used to evaluate the interaction of acoustically driven excitation and inhibition in auditory signal processing. The data show that in two thirds of the units, acoustically evoked neuronal activation, even when pure tones are used as stimuli, is due to an integration of excitatory and inhibitory afferents. This inhibition can effectively narrow the bandwidth of acoustically evoked responses, enlarge the dynamic range of responsiveness, and increase the acuity of the coupling of the discharges to transients in the acoustic stimuli.
https://doi.org/10.1142/9789812818140_0003
Many natural sounds are characterized by periodic envelope or amplitude modulation (AM). The perceptual qualities corresponding to this signal characteristic are rhythm for low and pitch for high modulation frequencies (fm). We demonstrate that fm ranges corresponding to these two perceptual qualities seem to be represented by different codes in the primary auditory cortex (AI) of the Mongolian gerbil: Low fm are represented by a synchrony code whereas high fm are represented by a rate-place code. For the rate-place code we show that in response to AM neurons integrate over spectral ranges much wider than expected on the basis of their classical frequency receptive fields. Furthermore, we present preliminary data from learning experiments where gerbils were trained to discriminate between AM with different fm. It seems that the different cortical codes used to represent AM with different perceptual qualities are also reflected in the learning curves: learning performance is better for low fm ranges, where phase-locking is still present, than for high fm ranges that are encoded by a rate-place code.
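The phase-locking underlying the synchrony code mentioned above is commonly quantified by vector strength; the sketch below is one standard way to compute it, added here for orientation rather than taken from the chapter.

```python
import numpy as np

def vector_strength(spike_times_s, fm):
    """Vector strength of phase locking to modulation frequency fm:
    1 = all spikes at the same modulation phase, 0 = no locking."""
    phases = 2.0 * np.pi * fm * np.asarray(spike_times_s)
    return np.abs(np.mean(np.exp(1j * phases)))
```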
https://doi.org/10.1142/9789812818140_0004
A great majority of neurons in the central nucleus of the inferior colliculus are spatially tuned for broadband sounds, as revealed by free-field or virtual sound source (VSS) stimulation. The shape of the receptive fields can be explained by the shape of the monaural spectra and the interaural differences (time difference, level difference) of the individual head-related transfer functions (HRTFs). Variation of the overall sound pressure level had little effect on the preferred directions of the neurons. The size of the receptive fields, however, changed in a variety of ways: constancy, expansion, contraction, or combinations of these effects were found. Stimulation with binaurally uncorrelated VSS led to a loss of spatial tuning in many low-frequency neurons. Most high-frequency neurons showed no differences in their receptive fields between the two conditions. In order to obtain information about the integration time of inferior colliculus neurons, different stimulus durations were also tested. More than half of the neurons lost spatial selectivity when the stimulus duration was shorter than 5 ms.
https://doi.org/10.1142/9789812818140_0005
Monaural auditory brainstem processing within the superior olivary complex (SOC) and cochlear nucleus, specifically the neuronal coding of temporal, spectral and intensity cues, is a prerequisite for both sound localization and the differentiation of acoustic objects. Neurons which are innervated strictly monaurally are located in the anteroventral cochlear nucleus (AVCN), the medial nucleus of the trapezoid body (MNTB) and the superior paraolivary nucleus (SPN) (66% of the neurons). In all three nuclei, 58% of the investigated neurons receive a prominent inhibitory input. Different sets of acoustic stimuli were used to characterize the interaction between excitation and inhibition. In 19% of the AVCN neurons, inhibition causes an upper threshold in the response areas, while in a number of MNTB neurons thresholds are elevated due to inhibition. In another population of AVCN neurons (33%), the inhibitory influence increases the dynamic range. Many SPN neurons show a very broad excitatory or inhibitory frequency response. Cells which are dominated by inhibition are further characterized by a strong off-response. Both excitatory and inhibitory SPN neurons show strongly synchronized responses to AM stimuli. This indicates that transients in sound amplitude are the main features processed by these neurons.
https://doi.org/10.1142/9789812818140_0006
The following sections are included:
https://doi.org/10.1142/9789812818140_0007
The recording of event-related brain potentials (ERPs) is a useful tool for studying central auditory processing in humans, since it can be utilized in basic and clinical research to study the brain's pre-attentive auditory functioning without contamination by high-level, task-related processing. For example, the mismatch negativity (MMN) component of the ERP can tap the mechanisms of involuntary attention switching underlying the conscious registration of sound change, and the features of passive auditory sensory memory that create the informational basis of active memory. This pre-attentive brain response is believed to be elicited by the outcome of a comparison process that registers a difference between the neural representation of the actual input and the memory trace of the invariances inherent in the recent stimulation.
https://doi.org/10.1142/9789812818140_0008
The presented study was designed to investigate ERP effects of auditory spatial attention in a sustained attention condition (where the to-be-attended location is defined in a blockwise manner) and in a transient attention condition (where the to-be-attended location is defined on a trial-by-trial basis). Lateralization in the azimuthal plane was manipulated (a) via monaural presentation of left- and right-ear sounds, (b) via interaural intensity differences, (c) via interaural time differences, (d) via an artificial-head recording, and (e) via free-field stimulation. Ten participants were presented with frequent Nogo and infrequent Go stimuli. In one half of the experiment participants were instructed to press a button if they detected a Go stimulus at a predefined side (sustained attention); in the other half they were required to detect Go stimuli following an arrow cue at the cued side (transient attention). Results revealed negative differences (Nd) between ERPs elicited by to-be-attended and to-be-ignored sounds in all conditions. These Nd effects were larger for the sustained than for the transient attention condition, indicating that attentional selection according to spatial criteria is improved when subjects can focus on one and the same location for a series of stimuli.
https://doi.org/10.1142/9789812818140_0009
The following sections are included:
https://doi.org/10.1142/9789812818140_0010
Most of our knowledge about the processing of amplitude-modulated signals in the auditory system is based on single-nerve recordings in animals and psychoacoustical experiments in humans. An objective investigation method in humans is the recording of amplitude modulation following responses (AMFRs). AMFRs are a subset of envelope following responses (EFRs) and can be recorded from the human scalp. For AMFRs, continuous amplitude-modulated (AM) stimuli with carrier frequency fc and modulation frequency fm are used. Responses arise at the frequency fm and are largest for modulation frequencies below 100 Hz, with maxima depending on the subject's age and state of arousal. Responses are assumed to be evoked by neural excitation originating mainly from the basilar membrane region corresponding to the carrier frequency fc, since the stimulus bandwidth is very narrow. However, in masking experiments we could show that regions of the basilar membrane corresponding to frequencies up to several octaves above the carrier frequency also contribute to the responses.
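Since the AMFR is by definition the scalp response component at the modulation frequency fm, one common way to extract it is to read off the Fourier component at fm. The sketch below is a schematic version of that step (sampling rate, record length and data are hypothetical, and the chapters' actual analysis may differ).

```python
import numpy as np

def response_at_fm(eeg, fs, fm):
    """Amplitude and phase of the scalp-recorded component at the
    modulation frequency fm, via the discrete Fourier transform."""
    spectrum = np.fft.rfft(eeg) / len(eeg)
    freqs = np.fft.rfftfreq(len(eeg), d=1.0 / fs)
    k = np.argmin(np.abs(freqs - fm))        # bin closest to fm
    return 2.0 * np.abs(spectrum[k]), np.angle(spectrum[k])
```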
https://doi.org/10.1142/9789812818140_0011
The following sections are included:
https://doi.org/10.1142/9789812818140_0012
The Mismatch Negativity (MMN) is an event-related potential component generated by pre-attentive difference-detection/comparison processes in the auditory cortex of the human brain [1] when a deviant stimulus is presented embedded in a sequence of standard stimuli. Traditionally, the MMN is defined as the difference between the response to the deviant stimulus and the response to the standard stimulus (i.e. deviant minus standard) [2]. If the MMN is interpreted as an indication of, and a measure of, the detection process, this traditional definition may not be adequate, because it measures not only the field caused by neurons detecting the difference between the stimuli but also any difference in the processing of the two stimuli, be it related to an explicit comparison or not. Several alternative definitions of the MMN are proposed and the resulting MMNs are compared.
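The traditional definition translates directly into a difference of averaged ERPs; the sketch below shows that baseline computation, against which the chapter's alternative definitions can be compared (array shapes are hypothetical).

```python
import numpy as np

def mmn_difference_wave(deviant_trials, standard_trials):
    """Traditional MMN: average ERP to deviants minus average ERP
    to standards. Inputs are (trials x time) arrays."""
    return deviant_trials.mean(axis=0) - standard_trials.mean(axis=0)
```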
https://doi.org/10.1142/9789812818140_0013
Presently we are establishing two sets of psychoacoustic tests (basic tests and integrational tests) which should allow the differentiation of peripheral and central auditory processing in normal-hearing and hearing-impaired subjects. In a baseline study performed on one hundred normal-hearing subjects aged 20-70 years, we used these tests to evaluate age-dependent alterations in auditory processing. The basic tests measured (1) the monaural audiogram, (2) frequency resolution, and the limits of simultaneous (3) frequency and (4) intensity discrimination. The integrational tests reported here evaluate the limits of binaural (5) frequency, (6) intensity and (7) temporal discrimination. These measures, together with tests under successive stimulus conditions, show that temporal processing and the processing of stimulus frequency worsen with increasing age, while the processing of stimulus intensity remains stable.
https://doi.org/10.1142/9789812818140_0014
The following sections are included:
https://doi.org/10.1142/9789812818140_0015
The detection of frequency modulation of formants (signals) can be disrupted by the simultaneous presentation of off-frequency modulated formants (maskers) to the same ear [1]. The present study examines the effects of various binaural presentation schemes for the masker on this frequency modulation detection interference (FMDI) effect for formants. Signals and maskers were complex tones, bandlimited around 1500 and 3000 Hz, respectively. The signals were always presented to the right ear. The maskers were either stationary or modulated. No interference was found for stationary maskers. The amount of FMDI found for modulated maskers depended strongly on the binaural presentation scheme of the masker. Substantial interference was found for maskers presented to the ipsilateral ear. FMDI was smaller for contralateral maskers. In both cases it increased with increasing masker level. Substantial interference was also found for binaural (both diotic and dichotic) masker presentations. The perceived location of the maskers, manipulated using interaural time and level differences, did not affect FMDI. The amount of interference for binaural maskers was roughly equal to that of the monaural masker with the largest effect. The data were described very well by a simple model in which the amount of FMDI was predicted from a weighted combination of the monaural masker levels.
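The ‘simple model’ is described only as a weighted combination of the monaural masker levels. One plausible reading, given here purely as an assumption with hypothetical weights, is a weighted sum of the ipsilateral and contralateral masker levels:

```python
def predicted_fmdi(level_ipsi_db, level_contra_db, w_ipsi=1.0, w_contra=0.4):
    """Predict FMDI from a weighted combination of the two monaural
    masker levels. The weights are illustrative only; the ipsilateral
    ear is weighted more strongly, as the reported data suggest."""
    return w_ipsi * level_ipsi_db + w_contra * level_contra_db
```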
https://doi.org/10.1142/9789812818140_0016
The following sections are included:
https://doi.org/10.1142/9789812818140_0017
The following sections are included:
https://doi.org/10.1142/9789812818140_0018
In this study, subjects had to localize sound sources in free-field listening conditions and in virtual environments. Under virtual listening conditions, three auditory displays with distinct degrees of individualization were used. In contrast to previous studies using verbal reports or pointing techniques, the participants indicated the perceived location of a sound source by turning their eyes towards it. Localization performance under virtual conditions was comparable to free-field conditions for all subjects, at least if individual HRTFs were used. In general, judgments turned out to be more accurate than in previous studies, especially with regard to the dispersion of single judgments. This may be due to a close link between the auditory and the oculomotor systems, leading to a quasi-automatic saccade towards an auditory stimulus, even without a corresponding visual target.
https://doi.org/10.1142/9789812818140_0019
The following sections are included:
https://doi.org/10.1142/9789812818140_0020
The following sections are included:
https://doi.org/10.1142/9789812818140_0021
Fundamental frequency (F0) discrimination can be impaired substantially by the presence of another complex (the “fringe”) immediately before and after the target complex. It has been shown [2] that for the impairment to occur (i) target and fringes have to be in the same frequency region; and (ii) if all harmonics of target and fringes are unresolved, then they may differ in F0; otherwise, they have to be similar. The present experiments investigated the effect of fringes lateralized differently from the target. In a 2I-2AFC procedure, difference limens for F0 for a 100-ms harmonic target complex were measured in the presence and absence of 200-ms harmonic fringes. The nominal F0 was 88 Hz or 250 Hz. Stimuli were bandpass filtered between 125-625 Hz, 1375-1875 Hz, or 3900-5400 Hz. The target was presented monaurally, while the fringes were: (i) contralateral with the same level as the target, (ii) contralateral with a level increased by 20 dB, (iii) dichotic (ILD of +20 dB contralateral), (iv) diotic, and (v) ipsilateral. The effect of the fringes was reduced when their subjective location differed from that of the target. This reduction depended on the resolvability of both the fringes and the target. The results are consistent with the idea that the fringes produce interference when the listeners have difficulty segregating the target from the fringes.
https://doi.org/10.1142/9789812818140_0022
The following sections are included:
https://doi.org/10.1142/9789812818140_0023
The following sections are included:
https://doi.org/10.1142/9789812818140_0024
This work investigates the influence of the experimental setup on the resulting equal-loudness level contours (ELLCs) obtained with an adaptive two-alternative forced-choice procedure. The amount of change in the resulting contours is measured for different parameters. For a simple adaptive procedure, the influence of the starting level is smaller than the range effect obtained when using the method of constant stimuli. The effect of the starting level is reduced when a large initial step size is used in the adaptive procedure. However, the effect can still amount to 6 dB. This effect diminishes when an interleaved procedure is used. The maximum measurement error for the different procedures is estimated. In addition, it is shown that interindividual variance is reduced when the individual threshold is taken into account.
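For orientation, a minimal sketch of the kind of adaptive track at issue: a simple 1-up/1-down level staircase for a loudness comparison, with a large initial step size that shrinks after each reversal. All parameters (step sizes, reversal count) are hypothetical, not the procedure actually used in the chapter.

```python
def run_staircase(respond, start_db, steps_db=(8.0, 4.0, 2.0, 1.0),
                  n_reversals=8, max_trials=200):
    """Simple 1-up/1-down adaptive track. `respond(level)` returns True
    if the comparison tone was judged louder than the reference."""
    level, step_idx, reversals, last_dir = start_db, 0, [], 0
    for _ in range(max_trials):
        if len(reversals) >= n_reversals:
            break
        direction = -1 if respond(level) else +1   # judged louder -> decrease level
        if last_dir and direction != last_dir:     # direction change = reversal
            reversals.append(level)
            step_idx = min(step_idx + 1, len(steps_db) - 1)
        level += direction * steps_db[step_idx]
        last_dir = direction
    return sum(reversals[2:]) / max(len(reversals) - 2, 1)  # mean of later reversals
```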
https://doi.org/10.1142/9789812818140_0025
The following sections are included:
https://doi.org/10.1142/9789812818140_0026
This study presents results from our time-domain nonlinear model of the cochlea. After a general presentation of the model structure, some notable aspects of the cochlear time response are shown, in particular for a periodic pulse stimulus.
https://doi.org/10.1142/9789812818140_0027
Auditory signal processing begins long before the signal reaches the brainstem. The waveform is first split by the head into two waveforms. The pinnae introduce further spectral transformations that assist with estimating the elevation of the source. The ear canal and the middle-ear action of the malleus, incus and stapes attenuate potentially interfering very low and high frequency signals. Mechanical filtering of the signal in the cochlea can help to identify the spectral profile of the sound, but may also have far-reaching implications for segregating simultaneously present sound sources through subsequent channel selection. The nonlinear response of the cochlea compresses the signal into a more manageable, narrower dynamic range. It also cunningly fails to compress attenuated signals remote from the best frequency, with the effect of smearing information usefully across the array of mechanical filters. Electrical transduction at the inner hair cell transforms the signal in a number of ways. The process of adaptation makes it easier to identify the onset of a sound (or its modulation). Half-wave rectification and low-pass filtering isolate the envelope of the signal for high-frequency carriers while leaving the fine structure in place for low-frequency signals. All this sophisticated signal processing occurs before the nervous system becomes involved. As a consequence, auditory modelling needs to replicate all of these processes if psychophysical phenomena are to be explained and predicted with quantitative accuracy. However, each one of these stages could easily occupy a lifetime of research. An important problem in modelling, therefore, is knowing how much detail is required and how much can be omitted. The answer lies in an understanding of what each stage contributes to the listening experience.
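The envelope-extraction step named here (half-wave rectification followed by low-pass filtering) can be sketched in a few lines; the cutoff frequency and filter order below are illustrative choices, not the values of any specific model.

```python
import numpy as np
from scipy.signal import butter, lfilter

def extract_envelope(x, fs, cutoff_hz=1000.0, order=2):
    """Half-wave rectify, then low-pass filter: isolates the envelope of
    high-frequency carriers while fine structure survives at low
    frequencies (where the carrier itself is below the cutoff)."""
    rectified = np.maximum(x, 0.0)
    b, a = butter(order, cutoff_hz / (fs / 2.0), btype="low")
    return lfilter(b, a, rectified)
```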
https://doi.org/10.1142/9789812818140_0028
It is widely believed that the frequency selectivity measured behaviourally in masking experiments is largely determined by the filtering that occurs on the basilar membrane. The tuning of a specific point of the basilar membrane is determined by two components: a passive component reflecting the hydromechanical properties of the basilar membrane and surrounding fluids and structures; and an active component reflecting the influence of the outer hair cells on basilar membrane motion. We have modelled this as the sum of two rounded-exponential functions, following Patterson et al. [17]. One of these functions represents the "tip" filter determined by the active mechanism. It is assumed to have a gain at the tip which varies with level in the same way as the basilar-membrane gain, for a tone close to the characteristic frequency; the gain is greater at low levels than at high levels. The low-frequency side of the tip filter is also assumed to vary in sharpness with level. The second rounded-exponential function represents the “tail” filter determined by the passive mechanism. Its gain and shape are assumed to be invariant with level. The model was used to fit a large body of data obtained with the notched-noise method. The model took into account the modification of the spectrum of the stimuli produced by the middle ear. Generally, the model fitted the data well using only a small number of free parameters. The filter shapes derived from the notched-noise masking data show a level dependence that strongly resembles the level dependence of filtering on the basilar membrane.
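For readers unfamiliar with the rounded-exponential (roex) form, a single roex function has the weighting W(g) = (1 + pg)·exp(−pg), where g is the normalised deviation from the centre frequency and p controls sharpness. The sketch below combines a level-dependent tip filter with a fixed tail filter in that spirit; the gain and sharpness rules and all constants are illustrative assumptions, not the chapter's fitted parameters.

```python
import numpy as np

def roex(g, p):
    """Rounded-exponential weighting, W(g) = (1 + p*g) * exp(-p*g)."""
    pg = p * np.abs(g)
    return (1.0 + pg) * np.exp(-pg)

def two_roex_filter(f, cf, level_db):
    """Sum of a level-dependent 'tip' filter (active mechanism) and a
    level-invariant 'tail' filter (passive mechanism)."""
    g = (f - cf) / cf
    gain_db = max(40.0 - 0.5 * (level_db - 30.0), 0.0)  # tip gain shrinks with level
    p_low = 10.0 + 20.0 * gain_db / 40.0                # low side broadens with level
    p = np.where(g < 0.0, p_low, 30.0)
    tip = 10.0 ** (gain_db / 10.0) * roex(g, p)
    tail = roex(g, 8.0)                                 # fixed passive tail
    return tip + tail
```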
https://doi.org/10.1142/9789812818140_0029
This paper describes a number of computer simulations illustrating the main effects of outer hair cell loss upon the representation of sounds on the basilar membrane. These include (1) a basalward shift of the place of maximum vibration, (2) a decreased gain near the place of maximum vibration, (3) a broadening of the tuning curves, (4) an altered summation of activity across frequency components, and (5) an altered temporal response. It is argued that, for large classes of sounds, the basilar membrane patterns resulting from a loss of outer hair cells cannot, in principle, be compensated for by hearing aid signal processing techniques, such as multi-band amplitude compression and spectral sharpening.
https://doi.org/10.1142/9789812818140_0030
A model combining signal information from multiple frequency-time (f-t) windows is developed to address detection and intensity discrimination performance for stimuli with different spectro-temporal shapes. The model is based on the concept of elementary auditory f-t windows for stimulus analysis in the auditory system. Experiments have shown that this f-t window is a critical band wide and has a duration of about 4 ms at 1 kHz (1 ms at 4 kHz). The spectro-temporal shape of the stimulus determines the number of auditory f-t windows excited in the auditory system. The internal noise (coding acuity) in each f-t window is proportional to the stimulus energy (in accordance with Weber's Law). The information of the different f-t windows involved in the task is combined. Simulations with this model show that, for stimuli with different spectro-temporal shapes and equal total energy, the masked detection threshold is independent of the number of f-t windows, whereas the intensity discrimination threshold decreases if the number of f-t windows increases. These trends match our experimental data.
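A standard way to combine independent observations across windows, consistent with (though not necessarily identical to) the model described, is to root-sum-square the per-window sensitivities d'. A sketch, assuming Weber-law internal noise in each window:

```python
import numpy as np

def combined_dprime(delta_energies, energies, k=0.1):
    """Optimal combination of independent f-t windows: per-window
    d'_i = dE_i / (k * E_i), i.e. internal noise proportional to the
    window's energy (Weber's law), combined as sqrt(sum d'_i^2).
    The Weber fraction k is illustrative."""
    d = np.asarray(delta_energies) / (k * np.asarray(energies))
    return np.sqrt(np.sum(d ** 2))
```

With N equal windows sharing a fixed total energy and a fixed relative increment, each per-window d' is unchanged while the combined value grows as sqrt(N), matching the reported decrease of intensity-discrimination thresholds with the number of excited windows.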
https://doi.org/10.1142/9789812818140_0031
Binaural hearing helps humans and animals to localize and unmask sounds. Here, binaural computations in the barn owl's auditory system are discussed. Barn owls use the interaural time difference (ITD) for azimuthal sound localization, and they use the interaural level difference (ILD) for elevational sound localization. ITD and ILD and their precursors are processed in separate neural pathways, the time pathway and the intensity pathway, respectively. Representation of ITD involves four main computational steps, while the representation of ILD is accomplished in three steps. In the discussion, neural processing in the owl's auditory system is compared with neural computations present in mammals.
https://doi.org/10.1142/9789812818140_0032
Electrophysiological investigations in the midbrain of the guinea pig (ICc) revealed that more than 90% of the neurons were spatially tuned when stimulated with broadband virtual sound sources. Most of the neurons preferred lateral positions, but tuning to frontal or rear directions and to different elevations was also observed. Based on the individual head-related transfer functions of each animal, the interaural level differences (ILD), interaural time differences (ITD) and the monaural directivity were calculated in 1/3-octave bands for the upper hemisphere. It was assumed that the neurons received input from ILD and ITD processors and from monaural pathways. The relative weights of these 72 inputs were estimated by a least-squares approximation of the neuronal response. The modeled responses were in good agreement with the measured responses. The weights were different for each of the tested neurons. High weights were found for ILD cues around the characteristic frequency of the neurons. Based on this single-neuron model, a localization model using a population of neurons tuned to different directions was tested in a localization task. The model allowed a robust estimation of the direction of the sound source.
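The weight-estimation step can be sketched as an ordinary least-squares fit of the neuron's response to the 72 cue values per direction (matrix shapes are hypothetical; the chapter's actual fitting procedure may differ in detail).

```python
import numpy as np

def fit_cue_weights(cues, rates):
    """cues: (n_directions, 72) array of ILD/ITD/monaural cue values per
    sound direction; rates: (n_directions,) measured responses.
    Returns the least-squares weight vector."""
    weights, *_ = np.linalg.lstsq(cues, rates, rcond=None)
    return weights

# cues @ weights then approximates the measured spatial response,
# and the weight magnitudes show which cues dominate for this neuron.
```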
https://doi.org/10.1142/9789812818140_0033
Coincidence detector neurons increase their firing rate significantly if the input changes from random to coherent. Similarly, neurons in the avian nucleus laminaris vary their firing rate as a function of the interaural time difference (ITD). In both cases, neurons transform temporally coded input into a rate-coded output. To characterize the quality of this transformation we define a new measure, which explicitly takes the noisy spike output of neurons into account. As an application, we investigate the coincidence detection properties of an integrate-and-fire (I&F) neuron as a function of internal parameters and input statistics. We show that there is an optimal threshold and, furthermore, that there is a broad range of near-optimal threshold values. The theoretical results are applied to the ITD tuning of neurons in the nucleus laminaris of the barn owl.
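A minimal leaky integrate-and-fire coincidence detector of the kind analysed here can be written as follows; the time constants, weight and threshold are illustrative, not the chapter's parameters.

```python
import numpy as np

def lif_coincidence(input_spikes, dt=1e-4, tau=5e-4, threshold=3.0, w=1.0):
    """Leaky integrate-and-fire neuron. input_spikes: (n_inputs, n_steps)
    0/1 array. Near-coincident input spikes drive the membrane across
    threshold far more often than the same spikes spread at random,
    which is why output rate signals input coherence (or matched ITD)."""
    drive = w * input_spikes.sum(axis=0)
    v, out = 0.0, []
    for t in range(input_spikes.shape[1]):
        v += dt * (-v / tau) + drive[t]     # leak plus summed input
        if v >= threshold:
            out.append(t * dt)
            v = 0.0                         # reset after an output spike
    return np.array(out)
```

Sweeping `threshold` for fixed input statistics reproduces the qualitative point of the chapter: output becomes uninformative when the threshold is so low that random input fires the cell, or so high that even coherent input rarely does, with a broad near-optimal range in between.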
https://doi.org/10.1142/9789812818140_0034
Binaural models attempt to explain binaural phenomena in terms of neural mechanisms that extract binaural information from acoustic stimuli. In this paper, a model setup is presented that can be used to simulate binaural detection tasks. In contrast to the cross-correlation between the right and left channels used by most models, this model is based on contralateral inhibition. The model is applied to a wide range of binaural detection experiments. It provides a good fit for changes in masker bandwidth and masker correlation, for static and dynamic cues, and for level and frequency dependencies.
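The core contrast with cross-correlation can be sketched with a single excitation-inhibition element: instead of multiplying the left and right channels, each element subtracts a delayed, scaled contralateral signal from the ipsilateral one. This is a schematic reading of the contralateral-inhibition idea, not the chapter's full model.

```python
import numpy as np

def ei_element_output(left, right, delay_samples, gain_db):
    """One EI-type element: ipsilateral excitation minus delayed, scaled
    contralateral inhibition. The residual is minimal when the element's
    internal delay and gain match the stimulus ITD and ILD, so a grid of
    such elements maps out the binaural cues."""
    g = 10.0 ** (gain_db / 20.0)
    shifted = np.roll(right, delay_samples)
    return np.mean((left - g * shifted) ** 2)
```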
https://doi.org/10.1142/9789812818140_0035
The following sections are included:
https://doi.org/10.1142/9789812818140_0036
A review is given of past work in our group that focussed on identifying and modeling the factors involved in sensorineural hearing loss. Only the first two factors (i.e., the attenuation component or “linear” portion of the hearing loss, and the compression loss or “nonlinear” portion of the hearing loss) are primarily of cochlear origin. The “binaural loss”, i.e., a deterioration in binaural interaction encountered in most hearing-impaired listeners, constitutes the third factor, which is a consequence of the distorted transformation from the cochlear output into the “internal representation” of the incoming acoustical signals. Finally, the fourth factor is the “central component”, which may be viewed as an increased internal noise at the stage of the “internal representation”. This factor may be due both to noise in the transformation process from the acoustical signal to the internal representation and to increased additive noise in the central processing mechanisms. The experimental data are interpreted in terms of a model of “effective” signal processing in the brain. Based on these findings, conclusions for the design of “intelligent” hearing aids can be drawn.
https://doi.org/10.1142/9789812818140_0037
In Cambridge, Oldenburg and Eindhoven, new efforts are being made to construct reasonably comprehensive, multi-channel, auditory models based on physiological knowledge. These are time-domain models that take sound waves as input rather than their power spectra, and they are intended to explain a substantial range of perceptual data associated with temporal auditory processing. In the first part of this paper, we review the spectral analysis and neural transduction stages of these models, basically to note the high degree of similarity and the implied common purpose. The output of these time-domain, multi-channel, cochlea simulations is a Neural Activity Pattern, or NAP, that preserves phase-locked time intervals up to 3-5 kHz. In the second part of the paper, we ask whether auditory modules proposed to extract perceptual properties like pitch, loudness, and modulation, all operate independently and in parallel on the NAP; or whether there is a mandatory stage of temporal integration prior to the distribution of the information to the independent, parallel modules.
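The spectral-analysis stage these models share is typically a gammatone filterbank, whose impulse response is g(t) = t^(n−1)·exp(−2πbt)·cos(2πf_c t). The direct time-domain implementation below is a generic textbook sketch (ERB-based bandwidth, brute-force convolution), not any one group's code.

```python
import numpy as np

def gammatone_ir(fc, fs, n=4, duration=0.025):
    """Impulse response of an n-th order gammatone filter with centre
    frequency fc; bandwidth follows the ERB convention b = 1.019*ERB(fc)."""
    t = np.arange(int(duration * fs)) / fs
    erb = 24.7 + 0.108 * fc                  # equivalent rectangular bandwidth (Hz)
    b = 1.019 * erb
    ir = t ** (n - 1) * np.exp(-2 * np.pi * b * t) * np.cos(2 * np.pi * fc * t)
    return ir / np.sqrt(np.sum(ir ** 2))     # unit-energy normalisation

def filterbank(x, fs, centre_freqs):
    """Multi-channel output: convolve the waveform with one gammatone
    per channel; half-wave rectification and compression would follow
    in the neural transduction stage."""
    return np.stack([np.convolve(x, gammatone_ir(fc, fs), mode="same")
                     for fc in centre_freqs])
```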
https://doi.org/10.1142/9789812818140_0038
Recently, a model was presented which is able to predict human performance in a variety of experimental conditions related to modulation detection and modulation masking [2,3]. As a substantial part of the signal processing, a modulation filterbank was introduced to analyze the envelope fluctuations of the stimuli in each peripheral auditory filter. Modulation channels account for effects of frequency selectivity in the modulation-frequency domain. However, the model does not cover conditions which require across-channel processing. Such across-channel processing of temporally modulated complex stimuli might occur, for example, in conditions of modulation detection interference (MDI) and comodulation masking release (CMR). A new modeling approach is presented that integrates envelope fluctuations across frequency while preserving the model's capabilities for describing modulation masking as well as spectral masking in the audio-frequency domain. The model can account for the main features of MDI, but it does not account for CMR.
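A schematic modulation-filterbank stage, to make the idea concrete: a set of bandpass filters applied to the (already extracted) envelope within one peripheral channel. The centre frequencies, filter order and Q below are assumptions for illustration, not the published model parameters.

```python
import numpy as np
from scipy.signal import butter, lfilter

def modulation_filterbank(envelope, fs, centre_freqs=(4, 8, 16, 32, 64, 128), q=1.0):
    """Decompose the envelope of one peripheral channel into modulation
    channels with one second-order bandpass per centre frequency.
    Assumes fs is well above the highest modulation band edge."""
    outputs = []
    for fm in centre_freqs:
        lo, hi = fm * (1 - 0.5 / q), fm * (1 + 0.5 / q)
        b, a = butter(2, [lo / (fs / 2), hi / (fs / 2)], btype="band")
        outputs.append(lfilter(b, a, envelope))
    return np.stack(outputs)
```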
https://doi.org/10.1142/9789812818140_0039
Optimised chirp signals for ABR, recently suggested by Dau et al. (1998), have been tested with the Auditory Image Model (AIM; Patterson et al., 1995). AIM is a comprehensive model of the peripheral auditory system, including a stage of strobed temporal integration to produce a stabilised image of repeating neural activity patterns. The chirps are calculated to compensate for spatial dispersion along the cochlea in order to stimulate all auditory nerve fibres at the same moment in time. Outputs of the several stages of the model in response to the chirps illustrate the form that the internal representation of these signals might take. Both a linear gammatone filterbank and a transmission-line model of basilar membrane motion produce responses that are reasonably consistent with what would be expected from the chirp signals, i.e., a distinct alignment of cochlear excitation over time. At the higher stages of auditory processing, some of the intended alignment seems to get degraded. The auditory images of upward and downward sweeps show less asymmetry than would be expected from the excitation along the cochlea.
https://doi.org/10.1142/9789812818140_0040
The following sections are included:
https://doi.org/10.1142/9789812818140_0041
A digital VLSI implementation of an algorithm modeling the “effective” signal processing of the human auditory system is presented. The model consists of several stages, psychoacoustically and physiologically motivated by the signal processing in the human ear, and has been successfully applied to various speech processing applications. The processing scheme was partitioned for implementation in a set of three chips. It is demonstrated how an application of the model has been used to determine the necessary wordlengths for the transfer of the floating-point algorithm into a version suitable for hardware implementation. The developed synthesizable VHDL descriptions of this fixed-point version are now being evaluated in real time on FPGAs and will be manufactured as ASICs in a later version.
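The wordlength analysis can be mimicked in software by re-running the floating-point algorithm with each intermediate value quantised to a candidate fixed-point format and checking when the application-level performance degrades. A generic quantiser for such experiments (not the chapter's tooling) might look like this:

```python
import numpy as np

def quantize(x, word_bits, frac_bits):
    """Round to a signed fixed-point grid with `word_bits` total bits,
    `frac_bits` of them fractional, saturating at the format limits."""
    scale = 2.0 ** frac_bits
    max_code = 2 ** (word_bits - 1) - 1
    codes = np.clip(np.round(np.asarray(x) * scale), -max_code - 1, max_code)
    return codes / scale
```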
https://doi.org/10.1142/9789812818140_0042
For normal-hearing listeners, a reduction of the modulations in speech transmission yields reduced intelligibility. Modulation transfer is even a good predictor of intelligibility in many listening conditions. For hearing-impaired listeners it is often argued that the limited dynamic range of the ear and the associated loudness recruitment need to be compensated for by a compressive mapping of the level variations within speech (syllabic compression). However, because the results of numerous experiments on syllabic compression are generally negative, it is interesting to see whether the results for hearing-impaired listeners can be accounted for by a reduced transfer of modulations. Intelligibility scores for 16 normal-hearing listeners and 16 hearing-impaired listeners were obtained for speech after syllabic compression, with the number of processing channels and the compression ratio as parameters. Simple modulation transfer accounts only partly for the variability in intelligibility scores. A much better result is obtained with a phase-locked transfer of modulations, using only those output modulations in quarter octaves that are in phase with the modulations of the input signal. Both for normal-hearing listeners and for hearing-impaired listeners, the data of the various compression conditions can be equated in terms of the phase-locked modulation transfer.
https://doi.org/10.1142/9789812818140_0043
The following sections are included:
https://doi.org/10.1142/9789812818140_0044
Speech signals can be represented in the time-frequency domain and visualised by a spectrogram, i.e. the energy distribution in a number of spectral bands for successive speech frames (typically 10-20 ms fragments). This spectral representation is highly redundant. Principal component analysis (PCA) applied to such spectral representations of speech sounds reveals that a large proportion of the variance in speech can be described by only a few underlying components, or new dimensions. These dimensions define a subspace in which, by definition, the spectral variability of speech sounds is optimally represented. Projections of acoustical signals, speech and non-speech sounds, onto this subspace show different patterns. A speech detection scheme is presented which uses these differences to detect speech at low signal-to-noise ratios, purely from its spectral variation.
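A sketch of the analysis step: PCA of band-energy frames via the singular value decomposition, keeping the few components that capture most of the variance. The frame layout and component count are illustrative.

```python
import numpy as np

def speech_subspace(frames, n_components=4):
    """frames: (n_frames, n_bands) array of (log) band energies.
    Returns the mean frame, the top principal directions, and the
    fraction of variance each direction explains."""
    mean = frames.mean(axis=0)
    _, s, vt = np.linalg.svd(frames - mean, full_matrices=False)
    explained = (s ** 2) / np.sum(s ** 2)
    return mean, vt[:n_components], explained[:n_components]

# (new_frames - mean) @ vt[:n_components].T gives the low-dimensional
# trajectory whose variation pattern differs for speech and non-speech.
```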
https://doi.org/10.1142/9789812818140_0045
The statistics of the binaural difference parameters were investigated by analyzing binaural signals from different directions in various sound environments. By using these statistics as a priori probabilities, an a posteriori probability of the source location can be calculated using a Bayesian approach. A real-time localization algorithm based on this approach was implemented and tested. First results show good azimuth and elevation localization accuracy in quiet environments and robust azimuth estimation at high noise levels, down to a signal-to-noise ratio (SNR) of -5 dB.
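The Bayesian step itself is compact: given per-direction likelihoods of the observed binaural difference parameters and prior probabilities from the measured statistics, the posterior over direction follows from Bayes' rule. The Gaussian likelihood below is an assumption for illustration; the chapter's measured distributions may take another form.

```python
import numpy as np

def direction_posterior(observed, means, stds, prior):
    """observed: (n_params,) binaural difference parameters; means/stds:
    (n_directions, n_params) statistics per candidate direction;
    prior: (n_directions,). Returns the normalised posterior."""
    log_lik = -0.5 * np.sum(((observed - means) / stds) ** 2
                            + np.log(2 * np.pi * stds ** 2), axis=1)
    log_post = log_lik + np.log(prior)
    post = np.exp(log_post - log_post.max())   # subtract max for stability
    return post / post.sum()

# np.argmax(direction_posterior(...)) picks the most probable
# azimuth/elevation bin for the current signal frame.
```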
https://doi.org/10.1142/9789812818140_0046
In noisy and reverberant environments the performance of automatic speech recognition systems drops below acceptable levels. It has been shown that using a psychoacoustical model of peripheral auditory processing, as introduced by Dau [1], yields much higher recognition rates in noisy background situations than standard pre-processing methods [2]. To further improve the robustness of a speaker-independent digit recognition system, single-microphone noise reduction procedures [3] were combined with the auditory-model feature extractor and continuous Hidden Markov Models as well as locally recurrent neural networks. The study shows that the recognition rates improve significantly for certain noise conditions, while the performance for clean speech is not negatively affected.
https://doi.org/10.1142/9789812818140_0047
In an adaptive listening test, the bandwidth of speech in complementary notched noise was varied. The bandwidth (center frequency 1 kHz) required for 50% speech intelligibility is called the Speech Reception Bandwidth Threshold (SRBT). The SRBT was measured for 10 normal-hearing and 30 hearing-impaired listeners. The average SRBT of the normal-hearing listeners is 1.4 octaves. The performance of seven hearing-impaired listeners is considered normal, whereas 23 hearing-impaired listeners have a wider-than-normal SRBT. The SRBT of a hearing-impaired listener may be wider than normal due to the inaudibility of part of the speech band, or due to an impairment in the processing of speech. The Speech Intelligibility Index (SII) is used to separate these effects. The SII may be regarded as the proportion of the total speech information that is available to the listener. Each individual SRBT is converted to an SII value. For the normal-hearing listeners, the SII is about 0.3. For 21 hearing-impaired listeners, the SII is higher. This points to a speech-processing impairment in the 1-kHz frequency region. The deviation of an individual SII value from 0.3 can be used to “quantify” the degree of processing impairment.
https://doi.org/10.1142/9789812818140_0048
The following sections are included:
https://doi.org/10.1142/9789812818140_0049
In this study single-microphone noise reduction procedures were investigated for use in digital hearing aids. One widely reported artifact of most noise suppression systems, the musical noise phenomenon, can partly be overcome by the Ephraim-Malah noise suppression algorithms [1,2]. Based on these algorithms, three different versions have been implemented together with a procedure for automatically updating the noise-spectrum estimate. To evaluate the algorithms, different tests have been performed with six normal-hearing and six hearing-impaired subjects. With ‘standard’ measurement methods no increase in speech intelligibility was found compared to the unprocessed signal. However, benefits with respect to reductions in listener fatigue and in the mental effort needed to listen to speech in noise over longer periods of time were found in this study by use of a newly developed ease-of-listening test. Subsequent paired comparison tests also revealed a clear preference of the hearing-impaired subjects for the noise-reduced signals in situations with rather stationary noise. In the case of strongly fluctuating noise at low SNR, however, the subjects preferred the unprocessed signal due to speech distortions caused by the noise reduction algorithms.
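For orientation, the decision-directed a priori SNR estimate at the heart of the Ephraim-Malah algorithms can be sketched per frequency bin as below. For brevity a Wiener-type gain stands in for the full MMSE-STSA gain, so this is a simplified illustration of the principle, not one of the three implemented versions.

```python
import numpy as np

def dd_gain(noisy_power, noise_power, prev_clean_power, alpha=0.98):
    """Decision-directed a priori SNR estimate with a Wiener-type gain.
    Smoothing the SNR estimate over frames (alpha close to 1) is what
    suppresses the 'musical noise' of plain spectral subtraction."""
    gamma = noisy_power / noise_power                        # a posteriori SNR
    xi = alpha * prev_clean_power / noise_power \
         + (1.0 - alpha) * np.maximum(gamma - 1.0, 0.0)      # a priori SNR
    return xi / (1.0 + xi)                                   # per-bin gain
```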
https://doi.org/10.1142/9789812818140_bmatter
The following sections are included: