Audio signal processing

Audio signal processing is a subfield of signal processing concerned with the electronic manipulation of audio signals. Audio signals are electronic representations of sound waves, which are longitudinal waves that travel through air as a series of compressions and rarefactions. The energy contained in audio signals is typically measured in decibels. As audio signals may be represented in either digital or analog format, processing may occur in either domain. Analog processors operate directly on the electrical signal, while digital processors operate mathematically on its digital representation.
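As a rough illustration of the decibel measure mentioned above, the following sketch converts an amplitude ratio to decibels using the common 20·log10 convention for amplitude quantities. The function names `rms` and `amplitude_to_db` are illustrative rather than taken from any library, and the reference level of 1.0 (full scale) is an assumption.

```python
import math

def rms(samples):
    """Root-mean-square value of a non-empty sequence of samples."""
    return math.sqrt(sum(x * x for x in samples) / len(samples))

def amplitude_to_db(amplitude, reference=1.0):
    """Level of `amplitude` relative to `reference`, in decibels.

    Uses 20*log10 because amplitude is a field quantity. `amplitude`
    must be positive, since the logarithm of zero is undefined.
    """
    return 20.0 * math.log10(amplitude / reference)

# One cycle of a full-scale sine wave sampled at 8 points (hypothetical test signal).
tone = [math.sin(2 * math.pi * n / 8) for n in range(8)]
level_db = amplitude_to_db(rms(tone))  # roughly -3 dB relative to full scale
```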


Audio signal processing was first motivated at the beginning of the 20th century by inventions like the telephone, phonograph, and radio, which allowed for the transmission and storage of audio signals. Audio processing was necessary for early radio broadcasting, as there were many problems with studio-to-transmitter links.[1] The theory of signal processing and its application to audio was largely developed at Bell Labs in the mid-20th century. Claude Shannon and Harry Nyquist's early work on communication theory, sampling theory and pulse-code modulation (PCM) laid the foundations for the field. In 1957, Max Mathews became the first person to synthesize audio from a computer, giving birth to computer music.

Major developments in digital audio coding and audio data compression include differential pulse-code modulation (DPCM) by C. Chapin Cutler at Bell Labs in 1950,[2] linear predictive coding (LPC) by Fumitada Itakura (Nagoya University) and Shuzo Saito (Nippon Telegraph and Telephone) in 1966,[3] adaptive DPCM (ADPCM) by P. Cummiskey, Nikil S. Jayant and James L. Flanagan at Bell Labs in 1973,[4][5] discrete cosine transform (DCT) coding by Nasir Ahmed, T. Natarajan and K. R. Rao in 1974,[6] and modified discrete cosine transform (MDCT) coding by J. P. Princen, A. W. Johnson and A. B. Bradley at the University of Surrey in 1987.[7] LPC is the basis for perceptual coding and is widely used in speech coding,[8] while MDCT coding is widely used in modern audio coding formats such as MP3[9] and Advanced Audio Coding (AAC).[10]

Analog signals

An analog audio signal is a continuous signal represented by an electrical voltage or current that is “analogous” to the sound waves in the air. Analog signal processing then involves physically altering the continuous signal by changing the voltage or current or charge via electrical circuits.

Historically, before the advent of widespread digital technology, analog was the only method by which to manipulate a signal. Since that time, as computers and software have become more capable and affordable, digital signal processing has become the method of choice. However, in music applications, analog technology is still often desirable because it produces nonlinear responses that are difficult to replicate with digital filters.

Digital signals

A digital representation expresses the audio waveform as a sequence of symbols, usually binary numbers. This permits signal processing using digital circuits such as digital signal processors, microprocessors and general-purpose computers. Most modern audio systems use a digital approach as the techniques of digital signal processing are much more powerful and efficient than analog domain signal processing.[11]
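A minimal sketch of how a waveform might be expressed as a sequence of binary numbers is shown below. It assumes the signal has already been sampled into floating-point values in the range -1.0 to 1.0 and maps each one to a signed integer code, in the spirit of linear PCM. The function names and the 16-bit default are assumptions made for illustration, not a description of any particular system.

```python
def quantize_pcm(samples, bits=16):
    """Map floats in [-1.0, 1.0] to signed integer codes of the given bit depth."""
    scale = 2 ** (bits - 1) - 1          # 32767 for 16-bit audio
    codes = []
    for x in samples:
        x = max(-1.0, min(1.0, x))       # clamp out-of-range input
        codes.append(int(round(x * scale)))
    return codes

def dequantize_pcm(codes, bits=16):
    """Map integer codes back to floats in [-1.0, 1.0]."""
    scale = 2 ** (bits - 1) - 1
    return [c / scale for c in codes]

codes = quantize_pcm([0.0, 0.3, -1.0, 1.0])
restored = dequantize_pcm(codes)         # close to the input, up to rounding error
```

The difference between `restored` and the original samples is the quantization error, which shrinks as the bit depth grows.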

Application areas

Processing methods and application areas include storage, data compression, music information retrieval, speech processing, localization, acoustic detection, transmission, noise cancellation, acoustic fingerprinting, sound recognition, synthesis, and enhancement (e.g. equalization, filtering, level compression, and echo and reverb removal or addition).

Audio broadcasting

Audio signal processing is used when broadcasting audio signals in order to enhance their fidelity or optimize for bandwidth or latency. In this domain, the most important audio processing takes place just before the transmitter. The audio processor here must prevent or minimize overmodulation, compensate for non-linear transmitters (a potential issue with medium wave and shortwave broadcasting), and adjust overall loudness to the desired level.

Active noise control

Active noise control is a technique designed to reduce unwanted sound. By creating a signal that is identical to the unwanted noise but with the opposite polarity, the two signals cancel out due to destructive interference.
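The cancellation principle can be sketched in its most idealized form as follows. The example assumes the anti-noise signal is an exact, perfectly aligned copy of the noise with inverted polarity, which this sketch simply takes for granted. The helper names, the 100 Hz test tone, and the sample rate are all hypothetical.

```python
import math

def invert_polarity(samples):
    """Return the same signal with the opposite polarity."""
    return [-x for x in samples]

def mix(a, b):
    """Sum two equal-length signals sample by sample."""
    return [x + y for x, y in zip(a, b)]

sample_rate = 8000  # assumed sample rate in Hz
noise = [0.8 * math.sin(2 * math.pi * 100 * n / sample_rate) for n in range(80)]
anti_noise = invert_polarity(noise)
residual = mix(noise, anti_noise)  # every sample is zero in this ideal case
```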

Audio synthesis

Audio synthesis is the electronic generation of audio signals. A musical instrument that accomplishes this is called a synthesizer. Synthesizers can either imitate sounds or generate new ones. Audio synthesis is also used to generate human speech using speech synthesis.
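As a small illustration of generating an audio signal electronically, the sketch below computes the samples of a sine tone and sums several tones into a simple chord. It is only one of many possible synthesis approaches. The function names, the 44.1 kHz sample rate, and the chosen frequencies are assumptions made for the example.

```python
import math

def sine_wave(freq_hz, duration_s, sample_rate=44100, amplitude=0.5):
    """Generate samples of a sine tone as a list of floats."""
    n_samples = int(duration_s * sample_rate)
    step = 2 * math.pi * freq_hz / sample_rate   # phase increment per sample
    return [amplitude * math.sin(step * n) for n in range(n_samples)]

def mix_tones(freqs_hz, duration_s, sample_rate=44100):
    """Sum equal-weight sine tones, scaled so the peak stays within [-1, 1]."""
    weight = 1.0 / len(freqs_hz)
    tones = [sine_wave(f, duration_s, sample_rate, amplitude=weight)
             for f in freqs_hz]
    return [sum(column) for column in zip(*tones)]

a440 = sine_wave(440.0, 0.5)                     # half a second of a 440 Hz tone
chord = mix_tones([440.0, 550.0, 660.0], 0.5)    # three tones sounding together
```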

Audio effects

Audio effects are systems designed to alter how an audio signal sounds. Unprocessed audio is metaphorically referred to as dry, while processed audio is referred to as wet.[12]

  • delay or echo - to simulate the effect of reverberation in a large hall or cavern, one or several delayed signals are added to the original signal. To be perceived as echo, the delay has to be on the order of 35 milliseconds or above. Short of actually playing a sound in the desired environment, the effect of echo can be implemented using either digital or analog methods. Analog echo effects are implemented using tape delays or bucket-brigade devices. When large numbers of delayed signals are mixed, a reverberation effect is produced; the resulting sound has the effect of being presented in a large room.
  • flanger - to create an unusual sound, a delayed signal is added to the original signal with a continuously variable delay (usually smaller than 10 ms). This effect is now done electronically using DSP, but originally the effect was created by playing the same recording on two synchronized tape players, and then mixing the signals together. As long as the machines were synchronized, the mix would sound more or less normal, but if the operator placed their finger on the flange of one of the players (hence "flanger"), that machine would slow down and its signal would fall out of phase with its partner, producing a phasing comb filter effect. Once the operator took their finger off, the player would speed up until it was back in phase with the master, and as this happened, the phasing effect would appear to slide up the frequency spectrum. This phasing up and down the register can be performed rhythmically.
  • phaser - another way of creating an unusual sound; the signal is split, a portion is filtered with a variable all-pass filter to produce a phase-shift, and then the unfiltered and filtered signals are mixed to produce a comb filter. The phaser effect was originally a simpler implementation of the flanger effect since delays were difficult to implement with analog equipment.
  • chorus - a delayed version of the signal is added to the original signal. The delay has to be short in order not to be perceived as echo, but above 5 ms to be audible. If the delay is too short, it will destructively interfere with the un-delayed signal and create a flanging effect. Often, the delayed signals will be slightly pitch shifted to more realistically convey the effect of multiple voices.
  • equalization - frequency response is adjusted using audio filter(s) to produce desired spectral characteristics. Frequency ranges can be emphasized or attenuated using low-pass, high-pass, band-pass or band-stop filters. Moderate equalization can be used to fine-tune the tonal quality of a recording; extreme equalization, such as heavily cutting a certain frequency, can create more unusual effects. Band-pass filtering of voice can simulate the effect of a telephone because telephones use band-pass filters.
  • overdrive - can be used to produce distorted sounds and increase loudness. The most basic overdrive effect involves clipping the signal when its absolute value exceeds a certain threshold.
  • timescale-pitch modification - this effect shifts a signal up or down in pitch. For example, a signal may be shifted an octave up or down. Blending the original signal with shifted duplicate(s) can create harmonization. Another application of pitch shifting is pitch correction where a musical signal is adjusted to improve intonation. The complement of pitch shift is timescale modification, that is, the process of changing the speed of an audio signal without affecting its pitch.
  • resonators - emphasize harmonic frequency content on specified frequencies. These may be created from parametric equations or from delay-based comb filters.
  • robotic voice effects are used to make an actor's voice sound like a synthesized human voice.
  • ring modulation is an effect made famous by Doctor Who's Daleks and commonly used throughout sci-fi.
  • dynamic range compression - the control of the dynamic range of a sound to avoid unintentional or undesirable fluctuation in level. Dynamic range compression is not to be confused with audio data compression, where the amount of data is reduced without affecting the amplitude of the sound it represents.
  • 3D audio effects - placement of sounds outside the spatial range available through stereo or surround imaging.
  • wave field synthesis - a spatial audio rendering technique for the creation of virtual acoustic environments.
  • de-esser - control of sibilance in speech and singing.
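Two of the effects listed above, echo and basic overdrive, are simple enough to sketch in a few lines. The example below assumes signals are lists of floats in the range -1.0 to 1.0. The function names, the default gain and threshold, and the 48 kHz sample rate are hypothetical choices for illustration rather than a reference implementation.

```python
def ms_to_samples(delay_ms, sample_rate=48000):
    """Convert a delay in milliseconds to a whole number of samples."""
    return int(round(delay_ms * sample_rate / 1000.0))

def echo(samples, delay_samples, gain=0.5):
    """Add one delayed, attenuated copy of the signal to the original.

    The output is `delay_samples` longer than the input so that the
    tail of the delayed copy is not cut off.
    """
    out = list(samples) + [0.0] * delay_samples
    for i, x in enumerate(samples):
        out[i + delay_samples] += gain * x
    return out

def hard_clip(samples, threshold=0.5):
    """Basic overdrive: limit each sample to the range [-threshold, threshold]."""
    return [max(-threshold, min(threshold, x)) for x in samples]

dry = [1.0, 0.0, 0.0, 0.0]                         # a single impulse
wet = echo(dry, delay_samples=2)                   # impulse plus a quieter repeat
driven = hard_clip([0.2, 0.9, -1.5])               # peaks flattened at +/-0.5
echo_delay = ms_to_samples(35)                     # the 35 ms figure from the list
```

Mixing many such delayed copies with different delays and gains moves the result from a discrete echo toward the reverberation effect described in the list.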

  1. ^ Spanias, Andreas; Painter, Ted; Atti, Venkatraman (2006). Audio signal processing and coding (Online ed.). Hoboken, NJ: John Wiley & Sons. p. 464. ISBN 0-471-79147-4.
  2. ^ US patent 2605361, C. Chapin Cutler, "Differential Quantization of Communication Signals", issued 1952-07-29 
  3. ^ Gray, Robert M. (2010). "A History of Realtime Digital Speech on Packet Networks: Part II of Linear Predictive Coding and the Internet Protocol" (PDF). Found. Trends Signal Process. 3 (4): 203–303. doi:10.1561/2000000036. ISSN 1932-8346.
  4. ^ P. Cummiskey, Nikil S. Jayant, and J. L. Flanagan, "Adaptive quantization in differential PCM coding of speech", Bell Syst. Tech. J., vol. 52, pp. 1105–1118, Sept. 1973.
  5. ^ Cummiskey, P.; Jayant, Nikil S.; Flanagan, J. L. (1973). "Adaptive quantization in differential PCM coding of speech". The Bell System Technical Journal. 52 (7): 1105–1118. doi:10.1002/j.1538-7305.1973.tb02007.x. ISSN 0005-8580.
  6. ^ Nasir Ahmed; T. Natarajan; Kamisetty Ramamohan Rao (January 1974). "Discrete Cosine Transform" (PDF). IEEE Transactions on Computers. C-23 (1): 90–93. doi:10.1109/T-C.1974.223784.
  7. ^ J. P. Princen, A. W. Johnson and A. B. Bradley: "Subband/transform coding using filter bank designs based on time domain aliasing cancellation", IEEE Proc. Intl. Conference on Acoustics, Speech, and Signal Processing (ICASSP), 2161–2164, 1987.
  8. ^ Schroeder, Manfred R. (2014). "Bell Laboratories". Acoustics, Information, and Communication: Memorial Volume in Honor of Manfred R. Schroeder. Springer. p. 388. ISBN 9783319056609.
  9. ^ Guckert, John (Spring 2012). "The Use of FFT and MDCT in MP3 Audio Compression" (PDF). University of Utah. Retrieved 14 July 2019.
  10. ^ Brandenburg, Karlheinz (1999). "MP3 and AAC Explained" (PDF). Archived (PDF) from the original on 2017-02-13.
  11. ^ Zölzer, Udo (1997). Digital Audio Signal Processing. John Wiley and Sons. ISBN 0-471-97226-6.
  12. ^ Hodgson, Jay (2010). Understanding Records, p.95. ISBN 978-1-4411-5607-5.

Further reading