MusiClef 2012 Multimodal Music Data Set

This page hosts the MusiClef 2012 data set, which is described in a paper accepted at ACM MMSys 2013 (Dataset Session).

The data set contains multimodal data on 1355 popular music songs by 218 leading artists, and is a considerably expanded version of the data set that was used for the MusiClef Multimodal Music Tagging Task at MediaEval 2012. Following the corpus annotation standard recently proposed by Peeters and Fort at ISMIR 2012 (reference [10] in our paper), the data set used at MediaEval 2012 would be identified as corpus:MIR:MusiClef:2012:MediaEval:version1.0, while the data set on this page would be identified as corpus:MIR:MusiClef:2012:MMSys:version1.0.

If you use the data set in your own research, please cite the corresponding paper:

A Professionally Annotated and Enriched Multimodal Data Set on Popular Music
Schedl, M., Liem, C.C.S., Peeters, G., and Orio, N.
Proceedings of the 4th ACM Multimedia Systems Conference (MMSys 2013), Oslo, Norway, February-March 2013.
>> PDF, BibTeX

You can download either the entire data set (musiclef_2012_dataset.zip, 12.5 GB) or each component separately below.

Editorial Metadata

To identify the music items in the data set and link them to other data sources, we provide lists of artists and songs, the corresponding album information for the songs, and the corresponding MusicBrainz identifiers. For artists, we further provide shortened representations ("webartist"), which are used to identify artists in the web crawling subsets (to avoid file naming problems). The file names and the composition of their entries are as follows:

songs.csv	<song-id, song, artist-id, artist>
artists.csv	<artist-id, artist, webartist>
mbids.csv	<song-id, song-mbid, song, artist-id, artist-mbid, artist>
artists-songs-albums-tags.csv	<song, artist, album, tag1, tag2, ..., tagN>
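
As an illustration of how the variable-length rows of artists-songs-albums-tags.csv might be read, here is a minimal Python sketch. It assumes comma-delimited rows without a header line, which this page does not confirm; the function name `parse_tag_rows` and the sample rows are hypothetical.

```python
import csv
import io


def parse_tag_rows(text, delimiter=","):
    """Parse rows shaped like <song, artist, album, tag1, ..., tagN>.

    Assumption: comma-delimited, no header row. Adjust `delimiter`
    if the actual file uses a different separator.
    """
    records = []
    for row in csv.reader(io.StringIO(text), delimiter=delimiter):
        if len(row) < 3:
            continue  # skip blank or malformed lines
        song, artist, album, *tags = (field.strip() for field in row)
        records.append({
            "song": song,
            "artist": artist,
            "album": album,
            "tags": [tag for tag in tags if tag],  # drop empty trailing fields
        })
    return records


# Hypothetical sample rows, not taken from the data set.
sample = "Song A,Artist X,Album 1,rock,energetic\nSong B,Artist Y,Album 2,jazz\n"
for record in parse_tag_rows(sample):
    print(record["artist"], record["song"], record["tags"])
```

For a real file, the text could be obtained with `open(path, encoding="utf-8").read()`; the encoding is likewise an assumption.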

Audio Features

The data set contains different music content features, computed from the audio signal of the music pieces. Two low-level features (FB-Mel and MFCCs) are provided, along with the output of two state-of-the-art audio feature extraction algorithms (BLF and PS09).

FB-Mel

FB-Mel [the signal is decomposed using a bank of 40 triangular filters distributed according to the Mel scale]: fbmel.zip, fbmel.tar.bz2
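
To make the filter bank idea concrete, the following Python sketch builds 40 triangular filters whose edges are equally spaced on the Mel scale and applies them to a power spectrum. It is not the extraction code used for the data set: the Mel formula variant, the sample rate, the number of spectrum bins, and all function names are assumptions chosen for illustration.

```python
import math


def hz_to_mel(hz):
    # One common Mel formula; the variant used for the data set is not stated.
    return 2595.0 * math.log10(1.0 + hz / 700.0)


def mel_to_hz(mel):
    return 700.0 * (10.0 ** (mel / 2595.0) - 1.0)


def mel_filterbank(n_filters, n_bins, sample_rate):
    """Return n_filters triangular weight vectors over n_bins spectrum bins."""
    nyquist = sample_rate / 2.0
    top = hz_to_mel(nyquist)
    # n_filters + 2 edge frequencies, equally spaced on the Mel scale.
    edges = [mel_to_hz(top * i / (n_filters + 1)) for i in range(n_filters + 2)]
    # Centre frequency of each spectrum bin, from 0 Hz up to Nyquist.
    bin_hz = [nyquist * k / (n_bins - 1) for k in range(n_bins)]
    bank = []
    for m in range(1, n_filters + 1):
        lo, mid, hi = edges[m - 1], edges[m], edges[m + 1]
        weights = []
        for f in bin_hz:
            if lo < f <= mid:
                weights.append((f - lo) / (mid - lo))   # rising slope
            elif mid < f < hi:
                weights.append((hi - f) / (hi - mid))   # falling slope
            else:
                weights.append(0.0)
        bank.append(weights)
    return bank


def apply_filterbank(bank, power_spectrum):
    """Weighted sum of the power spectrum under each triangular filter."""
    return [sum(w * p for w, p in zip(weights, power_spectrum)) for weights in bank]


# Assumed parameters: 22050 Hz audio and a 257-bin power spectrum.
bank = mel_filterbank(n_filters=40, n_bins=257, sample_rate=22050)
energies = apply_filterbank(bank, [1.0] * 257)
print(len(energies))  # one energy value per filter
```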

MFCC

MFCC [a Discrete Cosine Transform (DCT) is computed over the logarithm of the FB-Mel descriptors, and the first 20 coefficients are retained]: mfcc.zip, mfcc.tar.bz2
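
The MFCC step described above can be sketched in a few lines of Python. The sketch assumes an unnormalized DCT-II and the natural logarithm, neither of which is specified on this page; `mfcc_from_fbmel` and the small `floor` guard against log(0) are hypothetical additions.

```python
import math


def mfcc_from_fbmel(fbmel_energies, n_coeffs=20, floor=1e-10):
    """Log of the Mel-band energies, then DCT-II, keeping the first n_coeffs.

    Assumptions: natural log and an unnormalized DCT-II; `floor` avoids log(0).
    """
    log_energies = [math.log(max(e, floor)) for e in fbmel_energies]
    n = len(log_energies)
    return [
        sum(x * math.cos(math.pi * k * (i + 0.5) / n)
            for i, x in enumerate(log_energies))
        for k in range(min(n_coeffs, n))
    ]


# One frame of 40 made-up FB-Mel energies yields 20 coefficients.
frame = [1.0 + 0.1 * band for band in range(40)]
print(len(mfcc_from_fbmel(frame)))
```

With 40 input bands, keeping 20 coefficients discards the upper half of the DCT output, which carries the finer spectral detail.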

BLF and PS09

Block-Level Features are a combination of various descriptors that model temporal aspects of the audio signal. We provide similarity scores between all music pieces in the collection: blf_similarities.txt
We further provide the feature vector representations on which these similarities are computed: blf_features.txt. For details about the feature vector composition, please see the corresponding paper (reference [17] in our paper).

PS09 Features are an aggregation of various rhythm features (Fluctuation Patterns, Onset Patterns, Onset Coefficients) and timbre features (MFCCs, Spectral Contrast Coefficients, Harmonicness, Attackness). Again, we provide pairwise similarity scores between all music items in the collection: ps09_similarities.txt
We further provide the feature vector representations on which these similarities are computed: ps09_features.txt. For details about the feature vector composition, please see the corresponding paper (reference [11] in our paper).
The assignment between song-ids and feature vectors for both BLF and PS09 features can be found in song_ids.txt.
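
As a rough illustration of how the similarity scores and song_ids.txt might be combined, the Python sketch below looks up the most similar songs for a query song. It assumes that row i of the similarity matrix belongs to the i-th entry of song_ids.txt and that higher scores mean greater similarity; the exact file layouts are not documented here, so the toy data and the function `most_similar` are hypothetical.

```python
def most_similar(song_ids, similarities, query_id, k=3):
    """Return the k songs with the highest similarity to query_id.

    Assumptions: similarities[i][j] relates song_ids[i] to song_ids[j],
    and a higher score means the two songs are more similar.
    """
    index = song_ids.index(query_id)
    ranked = sorted(
        ((score, other)
         for other, score in zip(song_ids, similarities[index])
         if other != query_id),          # exclude the query song itself
        reverse=True,
    )
    return [(other, score) for score, other in ranked[:k]]


# Toy data standing in for song_ids.txt and a parsed similarity file.
song_ids = ["s1", "s2", "s3"]
similarities = [
    [1.0, 0.8, 0.1],
    [0.8, 1.0, 0.4],
    [0.1, 0.4, 1.0],
]
print(most_similar(song_ids, similarities, "s1", k=2))  # [('s2', 0.8), ('s3', 0.1)]
```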

User Tags

Collaborative tags representing the "wisdom of the crowds" are provided on the track level. They were gathered from Last.fm and are presented along with corresponding weights, normalized to the range [0,100]: lastfm_tags.zip
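
One plausible way to obtain weights in the range [0,100] is to scale each tag's raw count relative to the most frequent tag of the same track. The Python sketch below shows that idea only; how the weights in lastfm_tags.zip were actually computed is not stated here, and `normalize_tag_weights` and the sample counts are hypothetical.

```python
def normalize_tag_weights(raw_counts):
    """Scale tag counts so that the most frequent tag gets weight 100.

    Assumption: max-relative scaling; the data set's own procedure
    is not documented on this page.
    """
    if not raw_counts:
        return {}
    top = max(raw_counts.values())
    if top <= 0:
        return {tag: 0.0 for tag in raw_counts}
    return {tag: 100.0 * count / top for tag, count in raw_counts.items()}


# Made-up counts for a single track.
print(normalize_tag_weights({"rock": 50, "pop": 25}))  # {'rock': 100.0, 'pop': 50.0}
```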

Web Pages

To offer contextual music data, we crawled various sets of web pages in six different languages (English, German, French, Italian, Spanish, Swedish), on the level of artists and of releases. The corresponding data sets contain the raw HTML pages, information about the crawling process, standard vector space representations of the artists/releases as TF-IDF weight vectors, and Lucene indices of the web page sets (artist-level: 6 languages, release-level: English). The sets are decomposed as follows:

web_crawls.zip	crawled web pages with meta-information about the crawl
web_weights.zip	term weights (TF and TF-IDF)
web_indices.zip	Lucene indices of the web pages
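
For readers unfamiliar with TF-IDF term weighting, the following Python sketch computes one common variant (raw term frequency multiplied by the logarithm of the inverse document frequency) over tokenized documents. The exact weighting formula and tokenization behind web_weights.zip are not specified on this page, so this variant, the function `tf_idf`, and the toy documents are assumptions.

```python
import math
from collections import Counter


def tf_idf(documents):
    """documents maps a name to a list of tokens; returns name -> {term: weight}.

    Assumed variant: weight = tf * ln(N / df), with raw term counts as tf.
    """
    n_docs = len(documents)
    doc_freq = Counter()
    for tokens in documents.values():
        doc_freq.update(set(tokens))     # count each term once per document
    weights = {}
    for name, tokens in documents.items():
        term_freq = Counter(tokens)
        weights[name] = {
            term: count * math.log(n_docs / doc_freq[term])
            for term, count in term_freq.items()
        }
    return weights


# Toy "artist pages", already tokenized.
pages = {
    "artist_a": ["rock", "guitar", "rock"],
    "artist_b": ["rock", "piano"],
}
for artist, vector in tf_idf(pages).items():
    print(artist, vector)
```

In this variant, a term that occurs in every document (here "rock") receives weight 0, since it does not help to distinguish one artist from another.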

Expert Labels

The music items were tagged by professional music annotators with respect to genre and mood aspects. We provide a unique list of tags used by the annotators (tag_list.csv), the <song-id, tag> assignments which were provided for training in the MusiClef 2012 campaign (train.csv), the test set containing only the song identifiers for which the tags had to be predicted (test.csv), and the ground truth for the test set, i.e., expert tags for the song identifiers in the test set (test_with_groundtruth.csv).

tag_list.csv	list containing only the tags used in the annotation process
train.csv	training set
test.csv	test set
test_with_groundtruth.csv	test set including ground truth annotations
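
To show how predictions could be compared against the ground truth, here is a small Python sketch that groups <song-id, tag> pairs by song and computes a per-song F1 score averaged over all songs. The evaluation measure used in the MusiClef 2012 campaign is not stated on this page, so mean F1 is only an example; the file layout (one comma-delimited pair per row, no header) and the function names are assumptions as well.

```python
import csv
import io
from collections import defaultdict


def read_assignments(text):
    """Group rows of <song-id, tag> into song-id -> set of tags.

    Assumption: one comma-delimited pair per row, no header line.
    """
    tags_by_song = defaultdict(set)
    for row in csv.reader(io.StringIO(text)):
        if len(row) >= 2:
            tags_by_song[row[0].strip()].add(row[1].strip())
    return dict(tags_by_song)


def mean_f1(truth, predicted):
    """Average per-song F1 between true and predicted tag sets."""
    scores = []
    for song_id, true_tags in truth.items():
        guess = predicted.get(song_id, set())
        hits = len(true_tags & guess)
        if hits == 0:
            scores.append(0.0)           # no overlap, or nothing predicted
            continue
        precision = hits / len(guess)
        recall = hits / len(true_tags)
        scores.append(2 * precision * recall / (precision + recall))
    return sum(scores) / len(scores) if scores else 0.0


# Made-up ground truth and predictions for two songs.
truth = read_assignments("1,rock\n1,happy\n2,jazz\n")
predicted = {"1": {"rock"}, "2": {"jazz"}}
print(round(mean_f1(truth, predicted), 3))
```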

MediaEval 2012 Reference Implementation

We provide a reference implementation in which several text and audio features are fused to create a simple baseline auto-tagger. The reference implementation is available as Matlab scripts: reference-implementation.zip. Please see README.TXT in the root of the zip file for further details.

Last edited by ms and cl on 2013-02-08