Department of Computational Perception
Johannes Kepler Universität Linz



Music Data Sets

This page contains data sets frequently used in web- and social media-based music information research as well as in multimodal music information retrieval tasks. For the MusicMicro data set, see here. For the MusiClef 2012 data set, see here.


C224a

This is a collection of 224 artists categorized into 14 genres with a uniform genre distribution (16 artists per genre). It was proposed in the paper:
Artist Classification with Web-based Data
P. Knees, E. Pampalk, and G. Widmer.
Proceedings of the 5th International Conference on Music Information Retrieval (ISMIR'04), pp. 517-524, Barcelona, Spain, October 10-14, 2004.

C3ka

This is a collection of 3,000 artists, corresponding to the top-ranked last.fm artists (filtered by occurrence in allmusic.com). The genre assignment originates from allmusic.com (18 distinct genres, skewed genre distribution). The paper where the data set was first used is:

Investigating Web-Based Approaches to Revealing Prototypical Music Artists in Genre Taxonomies
M. Schedl, P. Knees, and G. Widmer.
Proceedings of the 1st IEEE International Conference on Digital Information Management (ICDIM'06), Bangalore, India, December 6-8, 2006.


C49ka and C111ka

These are two artist collections used for microblog indexing experiments. C111ka lists 110,588 artists without genre information. C49ka comprises 48,800 artists with genre information.


last edited by ms at 2012-12-17