Department of Computational Perception
Johannes Kepler Universität Linz

Music Data Sets

This page contains data sets frequently used in web- and social media-based music information research as well as in multimodal music information retrieval tasks. For the MusicMicro data set, see here. For the MusiClef 2012 data set, see here.


This is a collection of 224 artists categorized into 14 genres with a uniform genre distribution (16 artists per genre). It was proposed in the paper:
Artist Classification with Web-based Data
P. Knees, E. Pampalk, and G. Widmer.
Proceedings of the 5th International Conference on Music Information Retrieval (ISMIR'04), pp. 517-524, Barcelona, Spain, October 10-14, 2004.


This is a collection of 3,000 artists, corresponding to the top-ranked artists (filtered by occurrence in […]). The genre assignment originates from […] (18 distinct genres, skewed genre distribution). The data set was first used in the paper:

Investigating Web-Based Approaches to Revealing Prototypical Music Artists in Genre Taxonomies
M. Schedl, P. Knees, and G. Widmer.
Proceedings of the 1st IEEE International Conference on Digital Information Management (ICDIM'06), Bangalore, India, December 6-8, 2006.
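The two collections above differ mainly in their genre distribution: uniform for the 224-artist set, skewed for the 3,000-artist set. The following is a minimal sketch of how one might check this property, assuming the artist–genre assignments have already been loaded as `(artist, genre)` pairs. The function names and the synthetic example data are hypothetical and are not part of the distributed data sets.

```python
from collections import Counter


def genre_distribution(pairs):
    """Count artists per genre from an iterable of (artist, genre) pairs."""
    return Counter(genre for _artist, genre in pairs)


def is_uniform(counts):
    """Return True if every genre holds the same number of artists."""
    return len(set(counts.values())) <= 1


if __name__ == "__main__":
    # Hypothetical synthetic data mimicking the 224-artist layout:
    # 14 genres with 16 artists each.
    pairs = [(f"artist_{g}_{i}", f"genre_{g}") for g in range(14) for i in range(16)]
    counts = genre_distribution(pairs)
    print(len(counts), sum(counts.values()), is_uniform(counts))  # 14 224 True
```

On a skewed collection such as the 3,000-artist set, `is_uniform` would return `False`, and `counts.most_common()` shows which genres dominate.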

C49ka and C111ka

These are two artist collections used in microblog indexing experiments. C111ka contains 110,588 artists (without genre information). C49ka comprises 48,800 artists, for which genre information is also available.

last edited by ms at 2012-12-17