Artificial Intelligence and Music

Department of Computational Perception

Sponsor: Austrian Science Fund (FWF)

Project Type: Wittgenstein Prize 2009 to Gerhard Widmer

Project Number: Z159

Duration: 2010 - 2017

Brief Summary for the General Public (from the Final Project Report)

FWF Project Z159 is the result of a Wittgenstein Prize awarded to Gerhard Widmer in 2009. The project was supported by the generous sum of EUR 1.4 million, and its purpose was to greatly advance our research at the intersection of computer science, Artificial Intelligence (AI), and music.

The general goal of our research is to develop computer systems that can 'listen' to music, develop a basic 'understanding' of the contents and meaning of musical signals, and learn to recognise, classify, synchronise, and manipulate music in 'intelligent' ways, and in this way support many important practical applications in the digital music world.

This particular research project focused on two grand goals. The first was to teach computers to recognise musically relevant patterns and structure in music recordings, much as human listeners do, at many different levels -- e.g., to recognise event onsets in music, identify beat, rhythm, tempo, metrical structure, harmonies, instruments, voices -- and to recognise musical pieces and track them (i.e., follow along in the sheet music, as musicians do) in real time (live). The outcome of this work is a number of new computer listening algorithms that are among the best (or are the best) in the world for these tasks, as was also shown by winning first prizes in many international scientific competitions (see our 'Hall of Fame'). Some of these algorithms have also found their way into real commercial applications in the digital media world (e.g., automatic media monitoring, radio broadcast analysis, and music search and recommendation services).

The second goal was to go a level 'deeper', developing computer methods that can help us get a deeper understanding of the 'meaning' of music, its expressive aspects, and how music becomes 'human music' through the artistic act of interpretation and performance. Here, we greatly advanced previous work on computer systems that investigate the art of expressive music performance, analysing performances by great human musicians and learning to describe and predict how music needs to be played (e.g., in terms of timing, dynamics, articulation) so as to sound 'musical' and 'natural' to us. The results of this strand of research are computer programs that helped discover and describe interesting details about the art of great pianists (these were also published in the international world of musicology), and programs that can themselves learn to play music in musically meaningful and 'expressive' ways, winning, among other things, an international Computer Piano Performance Contest (RENCON 2011). The most recent result in this respect (Spring 2017) is that in a blind listening test, our computer's performance of a piano piece was judged by human listeners as more 'human' than that of an actual concert pianist ...

The work started in the Wittgenstein Project Z159 will be continued and brought to a great synthesis in a new long-term project ("Con Espressione") funded by the European Research Council (ERC) -- a project that would not have been possible without the support of the Wittgenstein project, and the FWF.

Results / Publications

MIR: Towards Real-time Computational Music Perception

Arzt, A., Böck, S. and Widmer, G. (2012).
Fast Identification of Piece and Score Position Via Symbolic Fingerprinting.
In Proceedings of the 13th International Society for Music Information Retrieval Conference (ISMIR 2012), Porto, Portugal.
PDF

Arzt, A., Widmer, G. and Dixon, S. (2012).
Adaptive Distance Normalization for Real-Time Music Tracking.
In Proceedings of the 20th European Signal Processing Conference (EUSIPCO 2012), Bucharest, Romania.
PDF

Arzt, A., Widmer, G., Böck, S., Sonnleitner, R. and Frostel, H. (2012).
Towards a Complete Classical Music Companion.
In Proceedings of the 20th European Conference on Artificial Intelligence (ECAI 2012), Montpellier, France.
PDF

Arzt, A., Widmer, G. and Sonnleitner, R. (2014).
Tempo- and Transposition-invariant Identification of Piece and Score Position.
In Proceedings of the 15th International Society for Music Information Retrieval Conference (ISMIR 2014), Taipei, Taiwan.
PDF

Arzt, A., Liem, C. and Widmer, G. (2014).
A Tempo- and Transposition-Invariant Piano Music Companion.
Demonstration Session, 15th International Society for Music Information Retrieval Conference (ISMIR 2014), Taipei, Taiwan.
PDF

Arzt, A., Böck, S., Flossmann, S., Frostel, H., Gasser, M., Liem, C.C.S. and Widmer, G. (2014).
The Piano Music Companion.
In Proceedings of the Conference on Prestigious Applications of Intelligent Systems (PAIS 2014), Prague, Czech Republic.
Best Demonstration Award.
PDF

Arzt, A., Böck, S., Flossmann, S., Frostel, H., Gasser, M., and Widmer, G. (2014).
The Complete Classical Music Companion v0.9.
In Proceedings of the 53rd AES Conference on Semantic Audio, Audio Engineering Society, London, Jan. 2014.
Best Demonstration Award.
PDF

Arzt, A. and Widmer, G. (2015).
Real-time Music Tracking Using Multiple Performances as a Reference.
In Proceedings of the 16th International Society for Music Information Retrieval Conference (ISMIR 2015), Malaga, Spain.
Best Paper Award.
PDF

Arzt, A., Frostel, H., Gadermaier, T., Gasser, M., Grachten, M. and Widmer, G. (2015).
Artificial Intelligence in the Concertgebouw.
In Proceedings of the International Joint Conference on Artificial Intelligence (IJCAI 2015), Buenos Aires, Argentina.
PDF

Arzt, A., Goebl, W. and Widmer, G. (2015).
Flexible Score Following: The Piano Music Companion and Beyond.
In Vienna Talk on Music Acoustics (2015), Vienna, Austria.
PDF

Böck, S., Krebs, F. and Schedl, M. (2012).
Evaluating the Online Capabilities of Onset Detection Methods.
In Proceedings of the 13th International Society for Music Information Retrieval Conference (ISMIR 2012), Porto, Portugal.
PDF

Böck, S., Arzt, A., Krebs, F. and Schedl, M. (2012).
Online Real-time Onset Detection with Recurrent Neural Networks.
In Proceedings of the 15th International Conference on Digital Audio Effects (DAFx-12), York, UK.
PDF

Böck, S., Krebs, F. and Widmer, G. (2014).
A Multi-Model Approach to Beat Tracking Considering Heterogeneous Music Styles.
In Proceedings of the 15th International Society for Music Information Retrieval Conference (ISMIR 2014), Taipei, Taiwan.
PDF

Böck, S., Krebs, F. and Widmer, G. (2015).
Accurate Tempo Estimation based on Recurrent Neural Networks and Resonating Comb Filters.
In Proceedings of the 16th International Society for Music Information Retrieval Conference (ISMIR 2015), Malaga, Spain.
PDF

Böck, S., Krebs, F. and Widmer, G. (2016).
Joint Beat and Downbeat Tracking with Recurrent Neural Networks.
In Proceedings of the 17th International Society for Music Information Retrieval Conference (ISMIR), New York, USA.
PDF

Böck, S., Korzeniowski, F., Schlüter, J., Krebs, F. and Widmer, G. (2016).
madmom: A New Python Audio and Music Signal Processing Library.
In Proceedings of the 2016 ACM Multimedia Conference, Amsterdam, the Netherlands.
PDF (arxiv)

Dittmar, C., Lehner, B., Prätzlich, T., Müller, M. and Widmer, G. (2015).
Cross-Version Singing Voice Detection in Classical Opera Recordings.
In Proceedings of the 16th International Society for Music Information Retrieval Conference (ISMIR 2015), Malaga, Spain.
PDF

Dzhambazov, G. (2014).
Towards a Drum Transcription System Aware of Bar Position.
In Proceedings of the 53rd AES Conference on Semantic Audio, Audio Engineering Society, London, Jan. 2014.
AES E-Library

Eghbal-zadeh, H., Lehner, B., Schedl, M. and Widmer, G. (2015).
I-Vectors for Timbre-Based Music Similarity and Music Artist Classification.
In Proceedings of the 16th International Society for Music Information Retrieval Conference (ISMIR), Malaga, Spain.
PDF

Eghbal-zadeh, H., Dorfer, M. and Widmer, G. (2016).
A Cosine-Distance based Neural Network for Music Artist Recognition Using Raw I-vector Features.
In Proceedings of the 19th International Conference on Digital Audio Effects (DAFx16), Brno, Czech Republic.
PDF

Eghbal-zadeh, H. and Widmer, G. (2016).
Noise-Robust Music Artist Recognition Using I-vector Features.
In Proceedings of the 17th International Society for Music Information Retrieval Conference (ISMIR 2016), New York, USA.
PDF

Eghbal-zadeh, H., Lehner, B., Dorfer, M. and Widmer, G. (2016).
A Hybrid Approach Using Binaural i-vectors and Deep Convolutional Neural Networks.
IEEE AASP Challenge on Detection and Classification of Acoustic Scenes and Events (DCASE).
https://www.cs.tut.fi/sgn/arg/dcase2016/documents/challenge_technical_reports/Task1/Eghbal-Zadeh_2016_task1.pdf
PDF

Eghbal-zadeh, H., Lehner, B., Dorfer, M. and Widmer, G. (2017).
A Hybrid Approach with Multi-channel I-Vectors and Convolutional Neural Networks for Acoustic Scene Classification.
In Proceedings of the 25th European Signal Processing Conference (EUSIPCO 2017), Kos, Greece.
PDF (arxiv)

Frostel, H., Arzt, A. and Widmer, G. (2011).
The Vowel Worm: Real-Time Mapping and Visualisation of Sung Vowels in Music.
In Proceedings of the 8th Sound and Music Computing Conference (SMC 2011), Padova, Italy.
PDF

Holzapfel, A., Flexer, A. and Widmer, G. (2011).
Improving Tempo-Sensitive and Tempo-Robust Descriptors for Rhythmic Similarity.
In Proceedings of the 8th Sound and Music Computing Conference (SMC 2011), Padova, Italy.
PDF

Holzapfel, A., Krebs, F., and Srinivasamurthy, A. (2014).
Tracking the 'Odd': Meter Inference in a Culturally Diverse Music Corpus.
In Proceedings of the 15th International Society for Music Information Retrieval Conference (ISMIR 2014), Taipei, Taiwan.
PDF

Korzeniowski, F. and Widmer, G. (2013).
Refined Spectral Template Models for Score Following.
In Proceedings of the Sound and Music Computing Conference (SMC 2013), Stockholm, Sweden.
PDF

Korzeniowski, F., Krebs, F., Arzt, A. and Widmer, G. (2013).
Tracking Rests and Tempo Changes: Improved Score Following with Particle Filters.
In Proceedings of the International Computer Music Conference (ICMC), Perth, Australia.
PDF

Krebs, F. and Widmer, G. (2012).
MIREX 2012 Audio Beat Tracking Evaluation: Beat.E.
Music Information Retrieval Evaluation eXchange (MIREX) 2012, 13th International Society for Music Information Retrieval Conference (ISMIR 2012), Porto, Portugal.
PDF

Krebs, F., Böck, S. and Widmer, G. (2013).
Rhythmic Pattern Modeling for Beat- and Downbeat Tracking in Musical Audio.
In Proceedings of the 14th International Society for Music Information Retrieval Conference (ISMIR 2013), Curitiba, Brazil.
PDF

Krebs, F., Korzeniowski, F., Grachten, M., and Widmer, G. (2014).
Unsupervised Learning and Refinement of Rhythmic Patterns for Beat and Downbeat Tracking.
In Proceedings of the 22nd European Signal Processing Conference (EUSIPCO), Lisbon, Portugal.
PDF

Krebs, F., Holzapfel, A., Cemgil, A.T. and Widmer, G. (2015).
Inferring Metrical Structure in Music Using Particle Filters.
IEEE/ACM Transactions on Audio, Speech and Language Processing 23(5), 817-827.
PDF

Krebs, F., Böck, S., and Widmer, G. (2015).
An Efficient State-Space Model for Joint Tempo and Meter Tracking.
In Proceedings of the 16th International Society for Music Information Retrieval Conference (ISMIR 2015), Malaga, Spain.
PDF

Krebs, F., Böck, S., Dorfer, M. and Widmer, G. (2016).
Downbeat Tracking Using Beat-synchronous Features and Recurrent Neural Networks.
In Proceedings of the 17th International Society for Music Information Retrieval Conference (ISMIR), New York, USA.
PDF

Lehner, B., Sonnleitner, R. and Widmer, G. (2013).
Towards Light-weight, Real-time-capable Singing Voice Detection.
In Proceedings of the 14th International Society for Music Information Retrieval Conference (ISMIR 2013), Curitiba, Brazil.
PDF

Lehner, B., Widmer, G. and Sonnleitner, R. (2014).
On the Reduction of False Positives in Singing Voice Detection.
In Proceedings of the 39th International Conference on Acoustics, Speech, and Signal Processing (ICASSP 2014), Florence, Italy.
PDF

Lehner, B. and Widmer, G. (2015).
Monaural Blind Source Separation in the Context of Vocal Detection.
In Proceedings of the 16th International Society for Music Information Retrieval Conference (ISMIR), Malaga, Spain.
PDF

Lehner, B., Widmer, G. and Böck, S. (2015).
A Low-Latency, Real-Time-Capable Singing Voice Detection Method with LSTM Recurrent Neural Networks.
In Proceedings of the 23rd European Signal Processing Conference (EUSIPCO 2015), Nice, France.
PDF

Lehner, B. and Widmer, G. (2015).
Improving Voice Activity Detection in Movies.
In Proceedings of the 16th Annual Conference of the International Speech Communication Association (INTERSPEECH 2015), Dresden, Germany.
PDF

Niedermayer, B., Widmer, G., and Reuter, C. (2011).
Version Detection for Historical Musical Automata.
In Proceedings of the 8th Sound and Music Computing Conference (SMC 2011), Padova, Italy.
PDF

Niedermayer, B., Böck, S., and Widmer, G. (2011).
On the Importance of "Real" Audio Data for MIR Algorithm Evaluation at the Note-Level - A Comparative Study.
In Proceedings of the 12th International Society for Music Information Retrieval Conference (ISMIR 2011), Miami, Florida, USA.
PDF

Schedl, M., Hoeglinger, C. and Knees, P. (2011).
Large-Scale Music Exploration in Hierarchically Organized Landscapes Using Prototypicality Information.
In Proceedings of the ACM International Conference on Multimedia Retrieval (ICMR 2011), Trento, Italy.
PDF

Schedl, M., Widmer, G., Knees, P. and Pohle, T. (2011).
A Music Information System Automatically Generated via Web Content Mining Techniques.
Information Processing & Management 47, 426-439.
HTML/PDF (ScienceDirect)

Schedl, M., Pohle, T., Knees, P. and Widmer, G. (2011).
Exploring the Music Similarity Space on the Web.
ACM Transactions on Information Systems 29(3), article 14.
PDF (SemanticScholar)

Schlüter, J. and Osendorfer, C. (2011).
Music Similarity Estimation with the Mean-Covariance Restricted Boltzmann Machine.
In Proceedings of the 10th International Conference on Machine Learning and Applications (ICMLA), Honolulu, HI, USA.
PDF

Schlüter, J. and Sonnleitner, R. (2012).
Unsupervised Feature Learning for Speech and Music Detection in Radio Broadcasts.
In Proceedings of the 15th International Conference on Digital Audio Effects (DAFx-12), York, UK.
PDF

Schnitzer, D., Flexer, A., Schedl, M. and Widmer, G. (2011).
Using Mutual Proximity to Improve Content-Based Audio Similarity.
In Proceedings of the 12th International Society for Music Information Retrieval Conference (ISMIR 2011), Miami, Florida.
PDF

Schnitzer, D., Flexer, A., Schedl, M. and Widmer, G. (2012).
Local and Global Scaling Reduce Hubs in Space.
Journal of Machine Learning Research 13(Oct), 2871-2902.
PDF

Seyerlehner, K., Sonnleitner, R., Schedl, M., Hauger, D., and Ionescu, B. (2012).
From Improved Auto-taggers to Improved Music Similarity Measures.
In Proceedings of the 10th International Workshop on Adaptive Multimedia Retrieval (AMR 2012), Copenhagen, Denmark.
PDF

Sonnleitner, R., Niedermayer, B., Widmer, G. and Schlüter, J. (2012).
A Simple and Effective Spectral Feature for Speech Detection in Mixed Audio Signals.
In Proceedings of the 15th International Conference on Digital Audio Effects (DAFx-12), York, UK.
PDF

Sonnleitner, R. and Widmer, G. (2014).
Quad-Based Audio Fingerprinting Robust to Time and Frequency Scaling.
In Proceedings of the 17th International Conference on Digital Audio Effects (DAFx-14), Erlangen, Germany.
Best Student Paper Award.
PDF

Sonnleitner, R. and Widmer, G. (2016).
Robust Quad-based Audio Fingerprinting.
IEEE/ACM Transactions on Audio, Speech and Language Processing 24(3), 409-421.
PDF

Sonnleitner, R., Arzt, A. and Widmer, G. (2016).
Landmark-based Audio Fingerprinting for DJ Mix Monitoring.
In Proceedings of the 17th International Society for Music Information Retrieval Conference (ISMIR), New York, USA.
PDF

Vall, A., Eghbal-zadeh, H., Dorfer, M. and Schedl, M. (2016).
Timbral and Semantic Features for Music Playlists.
Machine Learning for Music Discovery Workshop, International Conference on Machine Learning (ICML 2016), New York City, USA.
PDF

Widmer, G. (2016).
Getting Closer to the Essence of Music: The Con Espressione Manifesto.
ACM Transactions on Intelligent Systems and Technology 8(2), Article 19.
PDF

Widmer, G. (2014).
What Really Moves Us in Music: Expressivity as a Challenge to Semantic Audio Research. (Extended Abstract)
In Proceedings of the 53rd AES Conference on Semantic Audio, Audio Engineering Society, London, Jan. 2014.
PDF

Music Performance Research: Computational Models and Feature Learning

Collins, T., Arzt, A., Flossmann, S., and Widmer, G. (2013).
SIARCT-CFP: Improving Precision and the Discovery of Inexact Musical Patterns in Point-set Representations.
In Proceedings of the 14th International Society for Music Information Retrieval Conference (ISMIR 2013), Curitiba, Brazil.
PDF

Collins, T. and Meredith, D. (2013).
Maximal Translational Equivalence Classes of Musical Patterns in Point-Set Representations.
In Proceedings of Mathematics and Computation in Music (MCM 2013), Montreal, Canada.
PDF

Collins, T., Böck, S., Krebs, F., and Widmer, G. (2014).
Bridging the Audio-Symbolic Gap: The Discovery of Repeated Note Content Directly from Polyphonic Music Audio.
In Proceedings of the 53rd AES Conference on Semantic Audio, Audio Engineering Society, London, Jan. 2014.
Best Paper Award.
PDF

Collins, T., Arzt, A., Frostel, H. and Widmer, G. (2016).
Using Geometric Symbolic Fingerprinting to Discover Distinctive Patterns in Polyphonic Music Corpora.
In D. Meredith (Ed.), Computational Music Analysis. Berlin: Springer Verlag.
PDF

Flossmann, S., Goebl, W., Grachten, M., Niedermayer, B. and Widmer, G. (2010).
The Magaloff Project: An Interim Report.
Journal of New Music Research 39 (4), 363-377.
PDF

Flossmann, S., Goebl, W. and Widmer, G. (2010).
The Magaloff Corpus: An Empirical Error Study.
In Proceedings of the 11th International Conference on Music Perception and Cognition (ICMPC), Seattle, WA, USA.
PDF

Flossmann, S., and Widmer, G. (2011).
Toward a Multilevel Model of Expressive Piano Performance.
In Proceedings of the International Symposium on Performance Science (ISPS 2011), Toronto, Canada.
PDF

Flossmann, S., and Widmer, G. (2011).
Toward a Model of Performance Errors: A Qualitative Review of Magaloff's Chopin.
In Proceedings of the International Symposium on Performance Science (ISPS 2011), Toronto, Canada.
PDF

Flossmann, S., Grachten, M. and Widmer, G. (2011).
Expressive Performance with Bayesian Networks and Linear Basis Models. Extended abstract.
Rencon Workshop 2011: Musical Performance Rendering competition for Computer Systems, Padova, Italy.
PDF

Flossmann, S., Grachten, M., and Widmer, G. (2012).
Expressive Performance Rendering with Probabilistic Models.
In A. Kirke & E. Miranda, (Eds.), Guide to Computing for Expressive Music Performance, Springer Verlag.
PDF

Goebl, W., Flossmann, S., and Widmer, G. (2010).
Investigations into between-hand synchronization in Magaloff's Chopin.
Computer Music Journal 34(3), 35-44.
PDF (IEEE XPlore)

Grachten, M. and Widmer, G. (2011).
A Method to Determine the Contribution of Annotated Performance Directives in Music Performances.
In Proceedings of the International Symposium on Performance Science (ISPS 2011), Toronto, Canada.
PDF

Grachten, M. and Widmer, G. (2011).
Explaining Expressive Dynamics as a Mixture of Basis Functions.
In Proceedings of the 8th Sound and Music Computing Conference (SMC 2011), Padova, Italy.
PDF

Grachten, M. and Widmer, G. (2012).
Linear Basis Models for Prediction and Analysis of Musical Expression.
Journal of New Music Research 41 (4), 311-322.
PDF

Grachten, M. and Krebs, F. (2014).
An Assessment of Learned Score Features for Modeling Expressive Dynamics in Music.
IEEE Transactions on Multimedia 16(5), 1211-1218.
PDF

Krebs, F. and Grachten, M. (2012).
Combining Score and Filter Based Models to Predict Tempo Fluctuations in Expressive Music Performances.
In Proceedings of the 9th Sound and Music Computing Conference (SMC 2012), Copenhagen, Denmark.
PDF

Molina, M., Grachten, M. and Widmer, G. (2010).
Evidence for Pianist-specific Rubato Style in Chopin Nocturnes.
In Proceedings of the 11th International Society for Music Information Retrieval Conference (ISMIR 2010), Utrecht, The Netherlands.
PDF (SemanticScholar)

Niedermayer, B. and Widmer, G. (2010).
Strategies towards the Automatic Annotation of Classical Piano Music.
In Proceedings of the 7th Sound and Music Computing Conference (SMC 2010), Barcelona, Spain.
PDF

Niedermayer, B. and Widmer, G. (2010).
A Multi-Pass Algorithm for Accurate Audio-to-Score Alignment.
In Proceedings of the 11th International Society for Music Information Retrieval Conference (ISMIR 2010), Utrecht, The Netherlands.
PDF

Ritter, A., Grachten, M. and Widmer, G. (2011).
Macht Musizieren gesund? Zur Herzrate und deren Variabilität während Mozarts Klavierkonzert Nr. 14.
In 27. Jahrestagung der Deutschen Gesellschaft für Musikpsychologie. Osnabrück, Germany.

van Herwaarden, S., Grachten, M. and de Haas, B. (2014).
Predicting Expressive Dynamics in Piano Performances Using Neural Networks.
In Proceedings of the 15th International Society for Music Information Retrieval Conference (ISMIR 2014), Taipei, Taiwan.
PDF

PhD Theses Supported by the Wittgenstein Project

Arzt, Andreas (2016).
Flexible and Robust Music Tracking.
Doctoral Dissertation, Dept. of Computational Perception, Johannes Kepler University Linz, Austria.
PDF

Flossmann, Sebastian (2012).
Expressive Performance Rendering with Probabilistic Models. Creating, Analyzing, and Using the Magaloff Corpus.
Doctoral Dissertation, Dept. of Computational Perception, Johannes Kepler University Linz, Austria.
PDF

Krebs, Florian (2016).
Metrical Analysis of Musical Audio Using Probabilistic Models.
Doctoral Dissertation, Dept. of Computational Perception, Johannes Kepler University Linz, Austria.
PDF

Lehner, Bernhard (2017).
Robust Real-time-capable Singing Voice Detection in Audio (Working Title).
Doctoral Dissertation, Dept. of Computational Perception, Johannes Kepler University Linz, Austria. (in preparation)

Niedermayer, Bernhard (2012).
Accurate Audio-to-Score Alignment -- Data Acquisition in the Context of Computational Musicology.
Doctoral Dissertation, Dept. of Computational Perception, Johannes Kepler University Linz, Austria.
PDF

Schlüter, Jan (2017).
Deep Learning for Event Detection, Sequence Labeling and Similarity Estimation in Music Signals.
Doctoral Dissertation, Dept. of Computational Perception, Johannes Kepler University Linz, Austria.

Sonnleitner, Reinhard (2017).
Audio Identification via Fingerprinting: Achieving Robustness to Severe Signal Modifications.
Doctoral Dissertation, Dept. of Computational Perception, Johannes Kepler University Linz, Austria.
PDF

last edited by gw on June 20, 2017