Open main menu

Linear predictive coding

Linear predictive coding (LPC) is a method used mostly in audio signal processing and speech processing for representing the spectral envelope of a digital signal of speech in compressed form, using the information of a linear predictive model.[1][2] It is one of the most powerful speech analysis techniques, and one of the most useful methods for encoding good quality speech at a low bit rate and provides highly accurate estimates of speech parameters. LPC is the most widely used method in speech coding and speech synthesis.

OverviewEdit

LPC starts with the assumption that a speech signal is produced by a buzzer at the end of a tube (voiced sounds), with occasional added hissing and popping sounds (sibilants and plosive sounds). Although apparently crude, this model is actually a close approximation of the reality of speech production. The glottis (the space between the vocal folds) produces the buzz, which is characterized by its intensity (loudness) and frequency (pitch). The vocal tract (the throat and mouth) forms the tube, which is characterized by its resonances, which give rise to formants, or enhanced frequency bands in the sound produced. Hisses and pops are generated by the action of the tongue, lips and throat during sibilants and plosives.

LPC analyzes the speech signal by estimating the formants, removing their effects from the speech signal, and estimating the intensity and frequency of the remaining buzz. The process of removing the formants is called inverse filtering, and the remaining signal after the subtraction of the filtered modelled signal is called the residue.

The numbers which describe the intensity and frequency of the buzz, the formants, and the residue signal, can be stored or transmitted somewhere else. LPC synthesizes the speech signal by reversing the process: use the buzz parameters and the residue to create a source signal, use the formants to create a filter (which represents the tube), and run the source through the filter, resulting in speech.

Because speech signals vary with time, this process is done on short chunks of the speech signal, which are called frames; generally, 30 to 50 frames per second give an intelligible speech with good compression.

Early historyEdit

The origins of linear predictive coding (LPC) date back to 1966, when the concept was first proposed by Fumitada Itakura of Nagoya University and Shuzo Saito of Nippon Telegraph and Telephone (NTT). They described an approach to automatic phoneme discrimination that involved the first maximum likelihood approach to speech coding. In 1967, John Burg outlined a maximum entropy approach. In 1969, Itakura and Saito introduced partial correlation (PARCOR), Glen Culler proposed real-time speech encoding, and Bishnu S. Atal presented an LPC speech coder at the Annual Meeting of the Acoustical Society of America.[3] In 1971, realtime LPC using 16-bit LPC hardware was demonstrated by Philco-Ford; four units were sold.

LPC was the basis for voice-over-IP (VoIP) technology.[3] In 1972, Bob Kahn of ARPA, with Jim Forgie (Lincoln Laboratory, LL) and Dave Walden (BBN Technologies), started the first developments in packetized speech, which would eventually lead to voice-over-IP technology. In 1973, according to Lincoln Laboratory informal history, the first real-time 2400 bit/s LPC was implemented by Ed Hofstetter. In 1974, the first real-time two-way LPC packet speech communication was accomplished over the ARPANET at 3500 bit/s between Culler-Harrison and Lincoln Laboratory. In 1976, the first LPC conference took place over the ARPANET using the Network Voice Protocol, between Culler-Harrison, ISI, SRI, and LL at 3500 bit/s.

LPC technology was advanced by Bishnu Atal and Manfred Schroeder during the 1970s–1980s.[3] In 1978, Atal and Vishwanath et al. of BBN developed the first variable-rate LPC algorithm.[3] The same year, Atal and Manfred R. Schroeder at Bell Labs proposed an LPC speech codec called adaptive predictive coding, which used a psychoacoustic coding algorithm exploiting the masking properties of the human ear.[4][5] This later became the basis for the perceptual coding technique used by the MP3 audio compression format, introduced in 1993.[4] Code-excited linear prediction (CELP) was developed by Schroeder and Atal in 1985.[6]

LPC coefficient representationsEdit

LPC is frequently used for transmitting spectral envelope information, and as such it has to be tolerant of transmission errors. Transmission of the filter coefficients directly (see linear prediction for a definition of coefficients) is undesirable, since they are very sensitive to errors. In other words, a very small error can distort the whole spectrum, or worse, a small error might make the prediction filter unstable.

There are more advanced representations such as log area ratios (LAR), line spectral pairs (LSP) decomposition and reflection coefficients. Of these, especially LSP decomposition has gained popularity since it ensures the stability of the predictor, and spectral errors are local for small coefficient deviations.

ApplicationsEdit

LPC is the most widely used method in speech coding and speech synthesis.[7] It is generally used for speech analysis and resynthesis. It is used as a form of voice compression by phone companies, such as in the GSM standard, for example. It is also used for secure wireless, where voice must be digitized, encrypted and sent over a narrow voice channel; an early example of this is the US government's Navajo I.

LPC synthesis can be used to construct vocoders where musical instruments are used as an excitation signal to the time-varying filter estimated from a singer's speech. This is somewhat popular in electronic music. Paul Lansky made the well-known computer music piece notjustmoreidlechatter using linear predictive coding. [1] A 10th-order LPC was used in the popular 1980s Speak & Spell educational toy.

LPC predictors are used in Shorten, MPEG-4 ALS, FLAC, SILK audio codec, and other lossless audio codecs.

LPC is receiving some attention as a tool for use in the tonal analysis of violins and other stringed musical instruments.[8]

See alsoEdit

NotesEdit

  1. ^ Deng, Li; Douglas O'Shaughnessy (2003). Speech processing: a dynamic and optimization-oriented approach. Marcel Dekker. pp. 41–48. ISBN 978-0-8247-4040-5.
  2. ^ Beigi, Homayoon (2011). Fundamentals of Speaker Recognition. Berlin: Springer-Verlag. ISBN 978-0-387-77591-3.
  3. ^ a b c d Gray, Robert M. (2010). "A History of Realtime Digital Speech on Packet Networks: Part II of Linear Predictive Coding and the Internet Protocol" (PDF). Found. Trends Signal Process. 3 (4): 203–303. doi:10.1561/2000000036. ISSN 1932-8346.
  4. ^ a b Schroeder, Manfred R. (2014). "Bell Laboratories". Acoustics, Information, and Communication: Memorial Volume in Honor of Manfred R. Schroeder. Springer. p. 388. ISBN 9783319056609.
  5. ^ Atal, B.; Schroeder, M. (1978). "Predictive coding of speech signals and subjective error criteria". ICASSP '78. IEEE International Conference on Acoustics, Speech, and Signal Processing. 3: 573–576. doi:10.1109/ICASSP.1978.1170564.
  6. ^ Schroeder, Manfred R.; Atal, Bishnu S. (1985). "Code-excited linear prediction (CELP): High-quality speech at very low bit rates". ICASSP '85. IEEE International Conference on Acoustics, Speech, and Signal Processing. 10: 937–940. doi:10.1109/ICASSP.1985.1168147.
  7. ^ Gupta, Shipra (May 2016). "Application of MFCC in Text Independent Speaker Recognition" (PDF). International Journal of Advanced Research in Computer Science and Software Engineering. 6 (5): 805-810 (806). ISSN 2277-128X. Retrieved 18 October 2019.
  8. ^ Tai, Hwan-Ching; Chung, Dai-Ting (June 14, 2012). "Stradivari Violins Exhibit Formant Frequencies Resembling Vowels Produced by Females". Savart Journal. 1 (2).

ReferencesEdit

Further readingEdit

External linksEdit