LFM-2b Dataset
Corpus of Music Listening Events for Music Recommendation and Retrieval
Description
This web page hosts the LFM-2b dataset of more than two billion listening events, intended to be used for various music retrieval and recommendation tasks.
Dataset Variants
For the sake of reproducibility, we host both the full version of the dataset and the version released with the paper Investigating gender fairness of recommendation algorithms in the music domain, published in the journal Information Processing & Management (IP&M). We also host the 2020 subset of the data used in the RecSys'22 paper ProtoMF: Prototype-based Matrix Factorization for Effective and Explainable Recommendations.
Joined Dataset
You can download the LFM-2b dataset here: LFM-2b.zip (~132.7GB, uncompressed).

Field | Type | Meaning |
---|---|---|
User Id | Integer | Unique user Id |
Country | String | Two-letter country code of the user |
Age | Integer | Age of the user |
Gender | String | Gender of the user as specified on Last.fm |
Track Name | String | Name of the track |
Artist Name | String | Name of the artist |
Timestamp | Timestamp <YYYY-MM-DD 00:00:00> | Timestamp of the listening event |
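
As an illustration of the record layout in the table above, the following is a minimal sketch of parsing one joined row into a typed object. It assumes the file is tab-separated, has the columns in the order listed, and stores timestamps as `YYYY-MM-DD HH:MM:SS`; none of this is confirmed on this page, so check a few lines of the real file first. `ListeningEvent` and `parse_event` are hypothetical names, and the sample row is made up.

```python
from dataclasses import dataclass
from datetime import datetime


@dataclass
class ListeningEvent:
    user_id: int
    country: str
    age: int
    gender: str
    track_name: str
    artist_name: str
    timestamp: datetime


def parse_event(line: str, sep: str = "\t") -> ListeningEvent:
    """Parse one row of the joined dataset.

    Assumption: columns are separated by `sep` and appear in the order of
    the field table (User Id, Country, Age, Gender, Track Name, Artist Name,
    Timestamp).
    """
    user_id, country, age, gender, track, artist, ts = line.rstrip("\n").split(sep)
    return ListeningEvent(
        user_id=int(user_id),
        country=country,
        age=int(age),
        gender=gender,
        track_name=track,
        artist_name=artist,
        timestamp=datetime.strptime(ts, "%Y-%m-%d %H:%M:%S"),
    )


# Made-up example row, not taken from the dataset.
event = parse_event("42\tAT\t30\tf\tSome Track\tSome Artist\t2019-05-01 13:45:10")
print(event.user_id, event.artist_name, event.timestamp.year)
```

Rows with missing or non-numeric values would raise a `ValueError` here; whether such rows occur in the data is not stated, so a real loader may need a fallback.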
Adapted for Track Recommendation
The dataset is also available as three separate files for easier integration into recommender systems:

Field | Type | Meaning |
---|---|---|
User Id | Integer | User Id |
Track Id | Integer | Id of the pair Track Name and Artist Name |
Playcount | Integer | Number of times the user listened to the Track overall |

Field | Type | Meaning |
---|---|---|
Track Id | Integer | Track Id |
Artist Name | String | Full artist name |
Track Name | String | Full track title |

Field | Type | Meaning |
---|---|---|
User Id | Integer | User Id |
Country | String | Two-letter country code of the user |
Age | Integer | Age of the user |
Gender | String | Gender of the user as specified on Last.fm |
User registration date | Timestamp <YYYY-MM-DD 00:00:00> | The creation date of the corresponding Last.fm account |
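
The three tables above can be recombined by joining on User Id and Track Id. The sketch below shows one way to do this with the standard library on tiny made-up samples. It assumes tab-separated files with a header row, and borrows the column names (`user_id`, `track_id`, `count`, ...) from the Available Files table on this page; `read_tsv` and `join_counts` are hypothetical helpers.

```python
import csv
import io

# Tiny made-up samples standing in for the playcount, track and user files.
COUNTS = "user_id\ttrack_id\tcount\n1\t10\t5\n1\t11\t2\n2\t10\t7\n"
TRACKS = "track_id\tartist_name\ttrack_name\n10\tArtist A\tSong X\n11\tArtist B\tSong Y\n"
USERS = (
    "user_id\tcountry\tage\tgender\tcreation_time\n"
    "1\tAT\t30\tf\t2010-01-01 00:00:00\n"
    "2\tDE\t25\tm\t2012-06-15 00:00:00\n"
)


def read_tsv(text):
    """Read tab-separated text with a header row into a list of dicts."""
    reader = csv.DictReader(io.StringIO(text), delimiter="\t", quoting=csv.QUOTE_NONE)
    return list(reader)


def join_counts(counts, tracks, users):
    """Attach track and user metadata to each (user, track, playcount) row."""
    track_by_id = {t["track_id"]: t for t in tracks}
    user_by_id = {u["user_id"]: u for u in users}
    joined = []
    for row in counts:
        track = track_by_id.get(row["track_id"])
        user = user_by_id.get(row["user_id"])
        if track is None or user is None:
            continue  # skip rows whose ids have no metadata
        joined.append({
            "user_id": int(row["user_id"]),
            "country": user["country"],
            "artist_name": track["artist_name"],
            "track_name": track["track_name"],
            "playcount": int(row["count"]),
        })
    return joined


rows = join_counts(read_tsv(COUNTS), read_tsv(TRACKS), read_tsv(USERS))
for r in rows:
    print(r)
```

Holding the lookup dictionaries in memory is reasonable for the track and user files, but the playcount file itself is large, so in practice it would be streamed row by row rather than read into a list.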
Adapted for Artist Recommendation
For artist recommendation, we provide the following analogous files:

Field | Type | Meaning |
---|---|---|
User Id | Integer | User Id |
Artist Id | Integer | Unique Id of the Artist |
Playcount | Integer | Number of times the user listened to the Artist's tracks overall |

Field | Type | Meaning |
---|---|---|
Artist Id | Integer | Artist Id |
Artist Name | String | Full name of the artist |
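
To illustrate how the user-artist playcounts and the artist lookup fit together, here is a small sketch that ranks each user's most-played artists. The input tuples follow the (User Id, Artist Id, Playcount) layout of the tables above; the values are made up, and `top_artists` is a hypothetical helper rather than part of the dataset tooling.

```python
import heapq
from collections import defaultdict


def top_artists(playcounts, artist_names, k=2):
    """Return each user's k most-played artists as (artist_name, playcount)."""
    per_user = defaultdict(list)
    for user_id, artist_id, count in playcounts:
        per_user[user_id].append((count, artist_id))
    return {
        user_id: [
            (artist_names.get(artist_id, "<unknown>"), count)
            for count, artist_id in heapq.nlargest(k, pairs)
        ]
        for user_id, pairs in per_user.items()
    }


# Made-up sample data in (user_id, artist_id, playcount) form.
playcounts = [(1, 100, 12), (1, 101, 3), (1, 102, 40), (2, 100, 1)]
artist_names = {100: "Artist A", 101: "Artist B", 102: "Artist C"}

ranking = top_artists(playcounts, artist_names)
print(ranking)
```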
Available Files
File | Size | md5sum | Records | Fields |
---|---|---|---|---|
albums.tsv.bz2 | 245M | 938e232f0d5d7a9162487088829378ed | 24,237,348 | album_id, album_name, artist_name |
artists.tsv.bz2 | 45M | 0fcaea92c8c2fb1e247c5a3d0d4e8e3e | 5,159,580 | artist_id, artist_name |
listening-events.tsv.bz2 | 14G | cebd1047535d562a67801377ca2db0e4 | 2,014,164,872 | user_id, track_id, album_id, timestamp |
spotify-uris.tsv.bz2 | 49M | 7ac4d2e0a6845cd3a658a0a5e486a602 | 2,378,113 | track_id, uri |
tracks.tsv.bz2 | 641M | d89ca3c4d5344a6166da5cf5305e71fe | 50,813,373 | track_id, artist_name, track_name |
listening-counts.tsv.bz2 | 2.3G | 9c761797d89640e2137670598031f577 | 519,293,333 | user_id, track_id, count |
users.tsv.bz2 | 797K | 0aca8ab3a67ea71b1422d28c6c76d834 | 120,322 | user_id, country, age, gender, creation_time |
lyrics-features.json.bz2 | 3.8G | 500f36096bf2f06119dd3a0d15141ca8 | 1,266,554 | features{...} |
tags.json.bz2 | 142M | f49891c59a7028a1fb3b798544e6edeb | 2,230,814 | <tag, weight>+ |
tags-micro-genres.json.bz2 | 38M | e937e2d0317323c77e3e4f02b4d5be5b | 1,638,468 | <micro-genre, weight>+ |
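
Given the md5sum column above, a download can be checked before use, and the `.tsv.bz2` files can be read without unpacking them first. The following is a minimal sketch using only the standard library. It assumes the TSV files are UTF-8 encoded and tab-separated, which this page does not state; `md5sum` and `iter_tsv_bz2` are hypothetical helper names, and the demo runs on a throwaway file rather than on the real data.

```python
import bz2
import hashlib
import os
import tempfile


def md5sum(path, chunk_size=1 << 20):
    """Compute the MD5 hex digest of a file, reading it in chunks."""
    digest = hashlib.md5()
    with open(path, "rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def iter_tsv_bz2(path, limit=None):
    """Yield rows of a bz2-compressed TSV as lists of strings, decompressing on the fly."""
    with bz2.open(path, "rt", encoding="utf-8") as fh:
        for i, line in enumerate(fh):
            if limit is not None and i >= limit:
                break
            yield line.rstrip("\n").split("\t")


# Self-contained demo on a temporary file. With the real data, pass a path
# such as "users.tsv.bz2" and compare md5sum(path) with the table entry.
with tempfile.TemporaryDirectory() as tmp:
    demo_path = os.path.join(tmp, "demo.tsv.bz2")
    payload = bz2.compress(b"user_id\tcountry\n1\tAT\n2\tDE\n")
    with open(demo_path, "wb") as fh:
        fh.write(payload)
    checksum_ok = md5sum(demo_path) == hashlib.md5(payload).hexdigest()
    first_rows = list(iter_tsv_bz2(demo_path, limit=2))

print(checksum_ok, first_rows)
```

Streaming with a `limit` is a convenient way to inspect the first few lines of a large file such as listening-events.tsv.bz2, for example to see whether it starts with a header row.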
Format Clarifications
Code