Music alignment

Music can be described and represented in many different ways, including sheet music, symbolic representations, and audio recordings. For each of these representations, there may exist different versions that correspond to the same musical work. The general goal of music alignment (sometimes also referred to as music synchronization) is to automatically link the various data streams, thus interrelating the multiple information sets related to a given musical work. More precisely, music alignment is taken to mean a procedure which, for a given position in one representation of a piece of music, determines the corresponding position within another representation. ^[1] In the accompanying figure, each such alignment is visualized by a red bidirectional arrow.

Such synchronization results form the basis for novel interfaces that allow users to access, search, and browse musical content in a convenient way.
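As a minimal sketch of this notion of alignment, suppose an alignment is already available as a list of corresponding time positions (in seconds) in two versions of a piece. The helper `transfer_position` below is hypothetical and not taken from the cited literature, and the anchor values are made up for illustration. It maps a position in the first version to the second by linear interpolation between neighbouring anchor pairs, which is one simple choice among several.

```python
from bisect import bisect_right

def transfer_position(t, anchors):
    """Map time t in version A to the corresponding time in version B.

    anchors: list of (time_a, time_b) pairs, strictly increasing in time_a
    (assumed to come from some alignment procedure).
    """
    times_a = [a for a, _ in anchors]
    # Clamp positions outside the aligned range to the first/last anchor.
    if t <= times_a[0]:
        return anchors[0][1]
    if t >= times_a[-1]:
        return anchors[-1][1]
    # Find the anchors surrounding t: times_a[k - 1] <= t < times_a[k].
    k = bisect_right(times_a, t)
    (a0, b0), (a1, b1) = anchors[k - 1], anchors[k]
    # Interpolate linearly between the two surrounding anchors.
    return b0 + (t - a0) * (b1 - b0) / (a1 - a0)

# Made-up anchors: version B is slower at first, then matches A's tempo.
anchors = [(0.0, 0.0), (1.0, 2.0), (2.0, 3.0)]
print(transfer_position(0.5, anchors))  # 1.0
print(transfer_position(1.5, anchors))  # 2.5
```

An interface of the kind mentioned above could use such a mapping to jump to the corresponding position when the user switches between two recordings.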

Given two different music representations, typical music alignment approaches proceed in two steps. In the first step, the two representations are transformed into sequences of suitable features. In general, such feature representations need to find a compromise between two conflicting goals. On the one hand, features should be robust to variations that are irrelevant to the task at hand. On the other hand, features should capture enough characteristic information to accomplish the given task. In the second step, the derived feature sequences have to be brought into (temporal) correspondence. To this end, techniques related to dynamic time warping (DTW) or hidden Markov models (HMMs) are used to compute an optimal alignment between the two feature sequences.
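The second step can be illustrated with a textbook form of dynamic time warping. The following is a minimal sketch, not the implementation of any particular system. It assumes the feature extraction has already been done, uses the step sizes (1,1), (1,0), and (0,1), and takes the local distance as a parameter. The function name `dtw` and the scalar toy features are assumptions made for illustration; real systems typically use feature vectors such as chroma features.

```python
import math

def dtw(X, Y, dist):
    """Return (total_cost, path) of an optimal warping path between X and Y.

    path is a list of index pairs (i, j) meaning X[i] is aligned with Y[j].
    """
    n, m = len(X), len(Y)
    # D[i][j]: minimal accumulated cost of aligning X[:i] with Y[:j].
    D = [[math.inf] * (m + 1) for _ in range(n + 1)]
    D[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            cost = dist(X[i - 1], Y[j - 1])
            D[i][j] = cost + min(D[i - 1][j - 1], D[i - 1][j], D[i][j - 1])

    # Backtrack from the end, always moving to the cheapest predecessor.
    path = []
    i, j = n, m
    while i > 0 and j > 0:
        path.append((i - 1, j - 1))
        _, i, j = min(
            (D[i - 1][j - 1], i - 1, j - 1),
            (D[i - 1][j], i - 1, j),
            (D[i][j - 1], i, j - 1),
        )
    path.reverse()
    return D[n][m], path

# Toy feature sequences: scalars standing in for real feature vectors.
X = [1, 2, 3, 3, 4]
Y = [1, 2, 2, 3, 4]
cost, path = dtw(X, Y, dist=lambda a, b: abs(a - b))
print(cost)  # 0.0
print(path)  # [(0, 0), (1, 1), (1, 2), (2, 3), (3, 3), (4, 4)]
```

In the toy example, the second element of `X` is matched to two elements of `Y`, and two elements of `X` are matched to the fourth element of `Y`, which is how the warping path absorbs local tempo differences. Filling the accumulated cost table takes time proportional to the product of the two sequence lengths.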

Basic Procedure

First theme of Symphony No. 5 by Ludwig van Beethoven in a sheet music, audio, and piano-roll representation. The red bidirectional arrows indicate the aligned time positions of corresponding note events in the different representations.

Music alignment and related synchronization tasks have been studied extensively within the field of music information retrieval. In the following, we give some pointers to the literature. Depending upon the respective types of music representations, one can distinguish between various synchronization scenarios. For example, audio alignment refers to the task of temporally aligning two different audio recordings of a piece of music. Similarly, the goal of score–audio alignment is to coordinate note events given in the score representation with audio data.

Related Tasks

In the offline scenario, the two data streams to be aligned are known prior to the actual alignment. In this case, one can use global optimization procedures such as dynamic time warping (DTW) to find an optimal alignment. In general, it is harder to deal with scenarios where the data streams are to be processed online. One prominent online scenario is known as score following, where a musician performs a piece according to a given musical score. The goal is then to identify the currently played musical events depicted in the score with high accuracy and low latency ^[2], ^[3]. In this scenario, the score is known as a whole in advance, but the performance is known only up to the current point in time. In this context, alignment techniques such as hidden Markov models or particle filters have been employed, where the current score position and tempo are modeled in a statistical sense ^[4], ^[5]. As opposed to classical DTW, whose running time is proportional to the product of the two sequence lengths, such an online synchronization procedure inherently has a running time that is linear in the duration of the performed version. However, as a main disadvantage, an online strategy is very sensitive to local tempo variations and deviations from the score: once the procedure is out of sync, it is very hard to recover and return to the right track.

A further online synchronization problem is known as automatic accompaniment. Here, a musician plays a solo part, and the task of the computer is to accompany the musician according to a given score by adjusting the tempo and other parameters in real time. Such systems were already proposed some decades ago (see ^[6], ^[7], and ^[8] for an overview).
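The score-following scenario described above can be sketched with a strongly simplified left-to-right hidden Markov model. This is an illustration under stated assumptions, not the method of the cited works: each score event is one state, each incoming frame is reduced to a single observed pitch (a MIDI note number), the emission likelihood is Gaussian-shaped, and tempo is not modeled. The function name `follow_score` and the parameters `p_stay` and `sigma` are hypothetical, and the score is assumed to be non-empty.

```python
import math

def follow_score(score, frames, p_stay=0.5, sigma=0.5):
    """Yield the most likely score index after each incoming frame.

    score:  list of expected pitches, one per score event (known in advance)
    frames: iterable of observed pitches, consumed one at a time (online)
    """
    n = len(score)
    # Before the first frame, the performance is assumed to start at event 0.
    prior = [1.0] + [0.0] * (n - 1)
    for obs in frames:
        # Weight each state by how well its pitch explains the observation.
        posterior = [
            p * math.exp(-((obs - pitch) ** 2) / (2 * sigma ** 2))
            for p, pitch in zip(prior, score)
        ]
        total = sum(posterior)
        # Normalize; fall back to the prior if all likelihoods underflow.
        posterior = [p / total for p in posterior] if total > 0 else prior
        yield max(range(n), key=posterior.__getitem__)
        # Predict the next frame: stay on the event or advance by one.
        prior = [0.0] * n
        for s, p in enumerate(posterior):
            if s + 1 < n:
                prior[s] += p * p_stay
                prior[s + 1] += p * (1.0 - p_stay)
            else:
                prior[s] += p  # the last event can only be held

score = [60, 62, 64, 65]                 # made-up four-note score
performance = [60, 60, 62, 64, 64, 65]   # made-up observed pitches
positions = list(follow_score(score, performance))
print(positions)  # [0, 0, 1, 2, 2, 3]
```

Each frame is processed once with a fixed amount of work per score event, and only past frames are used, so the total work grows linearly with the length of the performance. Because the model can only stay or advance by one event, it also shows the weakness noted above: after a skipped or repeated passage, the estimate has no direct way to jump back to the correct position.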

Automated accompaniment - the computer simulates the process of playing along with an instrumental or vocal soloist or ensemble.

Automated score following - the computer has the task of tracking the position in the score while listening to a performance.

References

^ Müller, Meinard (2015). Fundamentals of Music Processing: Audio, Analysis, Algorithms, Applications. Springer. ISBN 978-3-319-21944-8.
^ Christensen, Mads Græsbøll; Jakobsson, Andreas (2009). "Multi-Pitch Estimation". Synthesis Lectures on Speech and Audio Processing. 5 (1): 1–160. doi:10.2200/S00178ED1V01Y200903SAP005. ISSN 1932-121X.
^ Ono, Nobutaka; Miyamoto, Kenichi; Kameoka, Hirokazu; Sagayama, Shigeki (2008). "A real-time equalizer of harmonic and percussive components in music signals". Proceedings of the International Society for Music Information Retrieval Conference (ISMIR): 139–144.
^ Driedger, Jonathan; Müller, Meinard; Ewert, Sebastian (2014). "Improving Time-Scale Modification of Music Signals Using Harmonic-Percussive Separation". IEEE Signal Processing Letters. 21 (1): 105–109. doi:10.1109/LSP.2013.2294023. ISSN 1070-9908.
^ Klapuri, Anssi; Davy, Manuel (2006). "Signal Processing Methods for Music Transcription". doi:10.1007/0-387-32845-9.
^ Dolson, Mark (1986). "The Phase Vocoder: A Tutorial". Computer Music Journal. 10 (4): 14–27. doi:10.2307/3680093.
^ Driedger, Jonathan; Grohganz, Harald; Prätzlich, Thomas; Ewert, Sebastian; Müller, Meinard (2013). "Score-informed audio decomposition and applications": 541–544. doi:10.1145/2502081.2502143.
^ Paulus, Jouni (2009). Signal Processing Methods for Drum Transcription and Music Structure Analysis (Ph.D.). Tampere University of Technology.
