Schroeter, J., Mehta, S.K., and Carter, G.C., "Acoustic Signal Processing," in The Electrical Engineering Handbook, Ed. Richard C. Dorf, Boca Raton: CRC Press LLC, 2000.
19 Acoustic Signal Processing

Juergen Schroeter, Acoustics Research Dept., AT&T Bell Laboratories
Sanjay K. Mehta, NUWC Detachment
G. Clifford Carter, NUWC Detachment

19.1 Digital Signal Processing in Audio and Electroacoustics
Steerable Microphone Arrays • Digital Hearing Aids • Spatial Processing • Audio Coding • Echo Cancellation • Active Noise and Sound Control

19.2 Underwater Acoustical Signal Processing
What Is Underwater Acoustical Signal Processing? • Technical Overview • Underwater Propagation • Processing Functions • Advanced Signal Processing • Application

19.1 Digital Signal Processing in Audio and Electroacoustics

Juergen Schroeter

In this section we will focus on advances in algorithms and technologies in digital signal processing (DSP) that have already had or, most likely, will soon have a major impact on audio and electroacoustics (A&E). Because A&E embraces a wide range of topics, it is impossible for us to go into any depth in any one of them here. Instead, this section will try to give a compressed overview of the topics the author judges to be most important. In the following, we will look into steerable microphone arrays, digital hearing aids, spatial processing, audio coding, echo cancellation, and active noise and sound control. We will not cover basic techniques in digital recording [Pohlmann, 1989] or computer music [Moore, 1990].

Steerable Microphone Arrays

Steerable microphone arrays have controllable directional characteristics. One important application is in teleconferencing, where sound pickup can be highly degraded by reverberation and room noise. One solution to this problem is to utilize highly directional microphones. Instead of pointing such a microphone manually at a desired talker, steerable microphone arrays, combined with a suitable speech detection algorithm, can be used for reliable automatic tracking of speakers as they move around in a noisy room or auditorium.

Figure 19.1 depicts the simplest kind of steerable array, using N microphones that are uniformly spaced with distance d along the linear x-axis. It can be shown that the response of this system to a plane wave impinging at an angle θ is

    H(j\omega) = \sum_{n=0}^{N-1} a_n \, e^{-j(\omega/c)\,nd\cos\theta}    (19.1)

Here, j = √–1, ω is the radian frequency, and c is the speed of sound. Equation (19.1) is a spatial filter with coefficients a_n and the delay operator z⁻¹ = exp(−j(ω/c)d cos θ). Therefore, we can apply finite impulse response (FIR) filter theory. For example, we could taper the weights a_n to suppress sidelobes of the array.

FIGURE 19.1 A linear array of N microphones (here, N = 5; τ = (d/c) cos θ).
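Since Eq. (19.1) is just an FIR filter evaluated over angle, it is easy to compute numerically. The following Python sketch evaluates the response of a uniform linear array, once with equal weights and once with a Hamming taper to suppress sidelobes; the element spacing, probe frequency, and speed of sound are illustrative assumptions, not values taken from the text.

```python
import numpy as np

def array_response(weights, d, freq, theta, c=343.0):
    """Evaluate Eq. (19.1): H(jw) = sum_n a_n exp(-j (w/c) n d cos(theta)).

    weights : microphone weights a_n
    d       : element spacing in meters
    freq    : frequency in Hz
    theta   : arrival angle(s) in radians (0 = along the array axis)
    c       : speed of sound in m/s (assumed 343 m/s here)
    """
    w = 2.0 * np.pi * freq
    n = np.arange(len(weights))
    # One complex phase term per angle and per microphone.
    phases = np.exp(-1j * (w / c) * np.outer(np.cos(theta), n * d))
    return phases @ weights

# Illustrative geometry: five elements, 8-cm spacing, probed at 1 kHz.
N, d, f = 5, 0.08, 1000.0
theta = np.deg2rad(np.arange(181))               # 0..180 degrees

uniform = np.ones(N) / N
tapered = np.hamming(N) / np.hamming(N).sum()    # tapered weights a_n

for name, a in [("uniform", uniform), ("tapered", tapered)]:
    mag_db = 20 * np.log10(np.abs(array_response(a, d, f, theta)) + 1e-12)
    print(f"{name:8s} broadside {mag_db[90]:6.1f} dB, 45 deg {mag_db[45]:6.1f} dB")
```

The taper trades a slightly wider main lobe for lower sidelobes, the same window trade-off familiar from FIR filter design.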
We also have to guard against spatial aliasing, that is, grating lobes that make the directional characteristic of the array ambiguous. The array is steered to an angle θ0 by introducing appropriate delays into the N microphone lines. In Eq. (19.1), we can incorporate these delays by letting

    a_n = e^{-j\omega t_0} \, e^{+j(\omega/c)\,nd\cos\theta_0}    (19.2)

Here t_0 is an overall delay, equal to or larger than (Nd/c) cos θ0, that ensures causality, while the second term in Eq. (19.2) cancels the corresponding term in Eq. (19.1) at θ = θ0. Due to the axial symmetry of the one-dimensional (linear, 1-D) array, the directivity of the array is a figure of revolution around the x-axis. Therefore, in case we want the array to point to a single direction in space, we need a 2-D array.

Since most of the energy of typical room noise and the highest level of reverberation in a room are at low frequencies, one would like to use arrays that have their highest directivity (i.e., narrowest beamwidth) at low frequencies. Unfortunately, this need collides with the physics of arrays: the smaller the array relative to the wavelength, the wider the beam. (Again, the corresponding notion in filter theory is that systems with shorter impulse responses have wider bandwidth.) One solution to this problem is to superimpose different-size arrays and filter each output by the appropriate bandpass filter, similar to a crossover network used in two- or three-way loudspeaker designs. Such a superposition of three five-element arrays is shown in Fig. 19.2. Note that we only need nine microphones in this example, instead of 5 × 3 = 15.

FIGURE 19.2 Three superimposed linear arrays depicted by large, midsize, and small circles. The largest array covers the low frequencies, the midsize array covers the midrange frequencies, and the smallest covers the high frequencies.

Another interesting application is the use of an array to mitigate discrete noise sources in a room. For this, we need to attach an FIR filter to each of the microphone signal outputs. For any given frequency, one can show that N microphones can produce N – 1 nulls in the directional characteristic of the array. Similarly, attaching an M-point FIR filter to each of the microphones, we can get these zeros at M – 1 frequencies. The weights for these filters have to be adapted, usually under the constraint that the transfer function (frequency characteristic) of the array for the desired source is optimally flat. In practical tests, systems of this kind work nicely in (almost) anechoic environments. Their performance degrades, however, with increasing reverberation.

More information on microphone arrays can be found in Flanagan et al. [1991]; in particular, they describe how to make arrays adapt to changing talker positions in a room by constantly scanning the room with a moving search beam and by switching the main beam accordingly. Current research issues are, among others, 3-D arrays and how to take advantage of low-order wall reflections.
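To make the steering rule of Eq. (19.2) concrete, the minimal sketch below reuses the illustrative geometry of the previous example, forms the steered weights, and confirms that the magnitude response now peaks at θ0. The choice θ0 = 60° and the safe bound t0 = Nd/c are assumptions made only for the demonstration.

```python
import numpy as np

def response(weights, d, freq, theta, c=343.0):
    # Same spatial filter as Eq. (19.1).
    w = 2.0 * np.pi * freq
    n = np.arange(len(weights))
    return np.exp(-1j * (w / c) * np.outer(np.cos(theta), n * d)) @ weights

def steered_weights(N, d, freq, theta0, c=343.0):
    """Eq. (19.2): a_n = exp(-j w t0) * exp(+j (w/c) n d cos(theta0)).

    t0 is an overall delay no smaller than (N d / c) cos(theta0); it
    keeps the equivalent time-domain steering delays causal.
    """
    w = 2.0 * np.pi * freq
    t0 = N * d / c                        # safe upper bound for any theta0
    n = np.arange(N)
    return np.exp(-1j * w * t0) * np.exp(1j * (w / c) * n * d * np.cos(theta0))

N, d, f = 5, 0.08, 1000.0                 # spacing < lambda/2: no grating lobes
theta = np.deg2rad(np.arange(181))
theta0 = np.deg2rad(60)                   # steer the main beam to 60 degrees

a = steered_weights(N, d, f, theta0) / N
mag = np.abs(response(a, d, f, theta))
print("main beam at", np.degrees(theta[np.argmax(mag)]), "degrees")  # ~60
```

Placing nulls on interferers works the same way: the N weights give N – 1 degrees of freedom per frequency, which an adaptive algorithm can spend on zeros of the pattern while keeping the response toward the desired talker flat.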
Digital Hearing Aids

Commonly used hearing aids attempt to compensate for sensorineural (cochlear) hearing loss by delivering an amplified acoustic signal to the external ear canal. As will be pointed out below, the most important problem is how to find the best aid for a given patient.

Historically, technology has been the limiting factor in hearing aids. Early on, carbon hearing aids provided a limited gain and a narrow, peaky frequency response. Nowadays, hearing aids have a broader bandwidth and a flatter frequency response. Consequently, more people can benefit from the improved technology. With the advent of digital technology, the promise is that even more people would be able to do so. Unfortunately, as will be pointed out below, we have not fulfilled this promise yet.

We distinguish between analog, digitally controlled analog, and digital hearing aids. Analog hearing aids contain only a (low-power) pre-amp, filter(s), (optional) automatic gain control (AGC) or compressor, power amp, and output limiter. Digitally controlled aids have certain additional components: one kind adds a digital controller to monitor and adjust the analog components of the aid. Another kind contains switched-capacitor circuits that represent sampled signals in analog form, in effect allowing simple discrete-time processing (e.g., filtering). Aids with switched-capacitor circuits have a lower power consumption compared to digital aids. Digital aids (none are yet commercially available) contain A/D and D/A converters and at least one programmable digital signal processing (DSP) chip, allowing for the use of sophisticated DSP algorithms, (small) microphone arrays, speech enhancement in noise, etc. Experts disagree, however, as to the usefulness of these techniques. To date, the most successful approach seems to be to ensure that all parts of the signal get amplified so that they are clearly audible but not too loud, and to "let the brain sort out signal and noise."

Hearing aids pose a tremendous challenge for the DSP engineer, as well as for the audiologist and acoustician. Due to the continuing progress in chip technology, the physical size of a digital aid should no longer be a serious problem in the near future; however, power consumption will still be a problem for quite some time. Besides the obvious necessity of avoiding howling (acoustic feedback), for example, by employing sophisticated models of the electroacoustic transducers, acoustic leaks, and ear canal to control the aid accordingly, there is a much more fundamental problem: since DSP allows complex schemes of splitting, filtering, compressing, and (re-)combining the signal, hearing aid performance is no longer limited by bottlenecks in technology. It is still limited, however, by the lack of basic knowledge about how to map an arbitrary input signal (i.e., speech from a desired speaker) onto the reduced capabilities of the auditory system of the targeted wearer of the aid. Hence, the selection and fitting of an appropriate aid becomes the most important issue. This serious problem is illustrated in Fig. 19.3.

FIGURE 19.3 Peak third-octave band levels of normal to loud speech (hatched) and typical levels/dominant frequencies of speech sounds (identifiers). Both can be compared to the third-octave threshold of normal-hearing people (solid line), and to thresholds for a mildly hearing-impaired person (A), a severely hearing-impaired person (B), and a profoundly hearing-impaired person (C). For example, for person (A), sibilants and some weak consonants in a normal conversation cannot be perceived. (Source: H. Levitt, "Speech discrimination ability in the hearing impaired: spectrum considerations," in The Vanderbilt Hearing-Aid Report: State of the Art-Research Needs, G.A. Studebaker and F.H. Bess (Eds.), Monographs in Contemporary Audiology, Upper Darby, Pa., 1982, p. 34. With permission.)

It is important to note that for speech presented at a constant level, a linear (no compression) hearing aid can be tuned to do as well as a hearing aid with compression. However, if parameters like signal and background noise levels change dynamically, compression aids, in particular those with two bands or more, should have an advantage. While a patient usually has no problem telling whether setting A or B is "clearer," adjusting more than just 2–3 (usually interdependent) parameters is very time consuming. For a multiparameter aid, an efficient fitting procedure that maximizes a certain objective is needed. Possible objectives are, for example, intelligibility maximization or loudness restoration.
The latter objective is assumed in the following. It is known that an impaired ear has a reduced dynamic range. Therefore, the procedure for fitting a patient with a hearing aid could estimate the so-called loudness-growth function (LGF) that relates the sound pressure level of a specific (band-limited) sound to its loudness. An efficient way of measuring the LGF is described by Allen et al. [1990]. Once the LGF of an impaired ear is known, a multiband hearing aid can implement the necessary compression for each band [Villchur, 1973]. Note, however, that this assumes that interactions between the bands can be neglected (the problem of summation of partial loudnesses). This might not be valid for aids with a large number of bands. Other open questions include the choice of widths and filter shapes of the bands, and the optimization of dynamic aspects of the compression (e.g., time constants). For aids with just two bands, the crossover frequency is a crucial parameter that is difficult to optimize.
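The band-splitting-plus-compression idea can be sketched in a few lines of Python. The example below is a toy, not a fitting prescription: the band edges, the fixed 2:1 compression ratio standing in for the patient's measured LGF, the target level, and the envelope time constant are all assumptions chosen only to show the signal flow.

```python
import numpy as np
from scipy.signal import butter, sosfilt

def multiband_compress(x, fs, edges=(250.0, 1000.0, 4000.0),
                       ratio=2.0, target_db=-20.0, tau=0.010):
    """Toy multiband compressor: split into bands, compress each, recombine.

    The fixed `ratio` stands in for the slope of the patient's
    loudness-growth function; a real aid would apply a per-band,
    level-dependent gain derived from the measured LGF.
    """
    # Band-split with 4th-order Butterworth bandpass filters.
    corners = [50.0, *edges, 0.45 * fs]
    bands = []
    for lo, hi in zip(corners[:-1], corners[1:]):
        sos = butter(4, [lo, hi], btype="bandpass", fs=fs, output="sos")
        bands.append(sosfilt(sos, x))

    alpha = np.exp(-1.0 / (tau * fs))        # one-pole envelope smoothing
    y = np.zeros_like(x)
    for b in bands:
        env = np.abs(b)
        for i in range(1, len(env)):         # peak follower with decay
            env[i] = max(env[i], alpha * env[i - 1])
        level_db = 20.0 * np.log10(env + 1e-9)
        gain_db = (target_db - level_db) * (1.0 - 1.0 / ratio)
        gain_db = np.clip(gain_db, 0.0, 40.0)  # cap gain; don't amplify silence
        y += b * 10.0 ** (gain_db / 20.0)
    return y

# Example: a soft 1-kHz tone burst is mapped toward the target level.
fs = 16000
t = np.arange(fs) / fs
x = 0.05 * np.sin(2 * np.pi * 1000.0 * t) * (t > 0.5)
y = multiband_compress(x, fs)
```

The dynamic behavior (attack/release) and the interaction between neighboring bands are exactly the open questions raised above; the sketch makes no attempt to resolve them.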
Spatial Processing

In spatial processing, audio signals are modified to give them new spatial attributes, such as, for example, the perception of having been recorded in a specific concert hall. The auditory system, using only the two ears as inputs, is capable of perceiving the direction and distance of a sound source with a high degree of accuracy, by exploiting binaural and monaural spectral cues. Wave propagation in the ear canal is essentially one-dimensional. Hence, the 3-D spatial information is coded by sound diffraction into spectral information before the sound enters the ear canal. The sound diffraction is caused by the head/torso (on the order of 20-dB and 600-μs interaural level difference and delay, respectively) and at the two pinnae (auriculae); see, for example, Shaw [1980]. Binaural techniques like the one discussed below can be used for evaluating room and concert-hall acoustics (optionally in reduced-scale model rooms using a miniature dummy head), for noise assessment (e.g., in cars), and for "Kunstkopfstereophonie" (dummy-head stereophony). In addition, there are techniques for loudspeaker reproduction (like "Q-Sound") that try to extend the range in horizontal angle of traditional stereo speakers by using interaural cross cancellation. Largely an open question is how to reproduce spatial information for large audiences, for example, in movie theaters.

FIGURE 19.4 Measuring and using transfer functions of the external ear for binaural mixing (FIR = finite impulse response). (Source: E.M. Wenzel, Localization in virtual acoustic displays, Presence, vol. 1, p. 91, 1992. With permission.)

Figure 19.4 illustrates the technique for filtering a single-channel source using measured head-related transfer functions, in effect creating a virtual sound source in a given direction of the listener's auditory space (assuming plane waves, i.e., infinite source distance). On the left in this figure, the measurement of head-related transfer functions is shown. Focusing on the left ear for a moment (subscript l), we need to estimate the so-called free-field transfer function (subscript ff) for given angles of incidence in the horizontal plane (azimuth φ) and vertical plane (elevation δ):

    H_{\mathrm{ff},l}(j\omega, \varphi, \delta) = P_{\mathrm{probe},l}(j\omega, \varphi, \delta) \,/\, P_{\mathrm{ref}}(j\omega)    (19.3)

where P_probe,l is the Fourier transform of the sound pressure measured in the subject's left ear, and P_ref is the Fourier transform of the pressure measured at a suitable reference point in the free field without the subject being present (e.g., at the midpoint between the two ears). (Note that P_ref is independent of the direction of sound incidence since we assume an anechoic environment.) The middle of Fig. 19.4 depicts the convolution
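The binaural-mixing stage of Fig. 19.4 amounts to convolving a single-channel signal with a pair of FIR filters, one per ear. As a rough illustration, the sketch below uses a deliberately crude synthetic impulse-response pair (a pure interaural delay of up to about 600 μs plus a level difference) as a stand-in; a real system would use FIR filters derived from transfer functions measured via Eq. (19.3), which also carry the pinna's spectral cues that this toy omits.

```python
import numpy as np
from scipy.signal import fftconvolve

def toy_hrir_pair(azimuth_deg, fs, n_taps=128):
    """Crude synthetic HRIR pair: interaural time and level difference only.

    Real binaural mixing uses FIR filters derived from measured
    free-field transfer functions (Eq. 19.3); this stand-in merely
    delays and attenuates the far ear.
    """
    az = np.deg2rad(azimuth_deg)
    itd = 0.0006 * np.sin(az)            # up to ~600 us interaural delay
    ild_db = 10.0 * np.sin(az)           # crude interaural level difference
    h_l = np.zeros(n_taps)
    h_r = np.zeros(n_taps)
    center = n_taps // 2
    # Positive azimuth = source on the right: left ear lags and is softer.
    h_r[center] = 10.0 ** (+ild_db / 40.0)
    h_l[center + int(round(itd * fs))] = 10.0 ** (-ild_db / 40.0)
    return h_l, h_r

fs = 16000
t = np.arange(fs) / fs
mono = np.sin(2 * np.pi * 440.0 * t)     # any single-channel source

h_l, h_r = toy_hrir_pair(45, fs)         # virtual source 45 deg to the right
left = fftconvolve(mono, h_l)[: len(mono)]
right = fftconvolve(mono, h_r)[: len(mono)]
binaural = np.stack([left, right], axis=1)  # two-channel result for headphones
```

Played over headphones, the interaural delay and level difference alone already pull the source off-center; resolving elevation and front/back ambiguity requires the measured, direction-dependent spectral detail of real head-related transfer functions.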