MusicStream Analytics Pipeline
End-to-end music analytics pipeline integrating Spotify and Last.fm APIs, Python-based data processing, and a MySQL database to analyze artist popularity and genre trends.
Python · APIs · ETL · MySQL
Key Insights
- Spotify popularity scores (a 0 to 100 scale) cluster in the middle of the range, with only a small subset of tracks reaching very high visibility.
- Genre tags, which Spotify assigns to artists rather than to individual tracks, vary widely from artist to artist; these tagging inconsistencies complicate genre-level analysis.
- Combining Spotify and Last.fm data provided richer artist context and highlighted differences between Spotify's popularity scores and Last.fm's listener and play counts.
Project Overview
MusicStream is an end-to-end data pipeline project designed to analyze trends in music popularity by integrating data from the Spotify and Last.fm APIs. The project involved extracting song and artist metadata, enriching records across platforms, and storing the results in a relational MySQL database for analysis.
The workflow covered the full analytics pipeline, including API integration, data extraction, transformation, database design, and SQL-based analysis. The resulting dataset enabled exploration of artist popularity, genre trends, and listening patterns across the sampled music catalog.
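The storage and analysis stages can be sketched as a small relational schema plus one genre-level query. This is a minimal sketch, not the project's actual schema: the table and column names (artists, tracks, artist_genres) are hypothetical, Python's built-in sqlite3 stands in for MySQL so the example runs without a server, and each track is simplified to a single primary artist. Because Spotify attaches genres to artists, genres live in their own many-to-many table, and the query joins through it to get per-genre song counts and average popularity.

```python
import sqlite3

# Hypothetical schema; sqlite3 stands in for MySQL so this runs without a server.
# The DDL is plain SQL and ports to MySQL with minor type changes (e.g. VARCHAR).
SCHEMA = """
CREATE TABLE artists (
    artist_id        TEXT PRIMARY KEY,   -- Spotify artist ID
    name             TEXT NOT NULL,
    lastfm_listeners INTEGER,
    lastfm_playcount INTEGER,
    bio              TEXT
);
CREATE TABLE tracks (
    track_id   TEXT PRIMARY KEY,         -- Spotify track ID
    name       TEXT NOT NULL,
    popularity INTEGER,                  -- Spotify score, 0 to 100
    artist_id  TEXT REFERENCES artists(artist_id)  -- simplified: primary artist only
);
CREATE TABLE artist_genres (             -- one row per (artist, genre) tag
    artist_id TEXT REFERENCES artists(artist_id),
    genre     TEXT,
    PRIMARY KEY (artist_id, genre)
);
"""

# Song count and average popularity per genre, reached through the artist.
GENRE_TRENDS = """
SELECT g.genre,
       COUNT(DISTINCT t.track_id)  AS song_count,
       ROUND(AVG(t.popularity), 1) AS avg_popularity
FROM artist_genres g
JOIN tracks t ON t.artist_id = g.artist_id
GROUP BY g.genre
ORDER BY song_count DESC, g.genre;
"""


def build_db() -> sqlite3.Connection:
    """Create an in-memory database with the hypothetical schema."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(SCHEMA)
    return conn


if __name__ == "__main__":
    conn = build_db()
    # Illustrative toy rows, not project data.
    conn.executemany("INSERT INTO artists (artist_id, name) VALUES (?, ?)",
                     [("a1", "Artist One"), ("a2", "Artist Two")])
    conn.executemany("INSERT INTO tracks VALUES (?, ?, ?, ?)",
                     [("t1", "Song A", 60, "a1"), ("t2", "Song B", 40, "a1"),
                      ("t3", "Song C", 80, "a2")])
    conn.executemany("INSERT INTO artist_genres VALUES (?, ?)",
                     [("a1", "indie pop"), ("a2", "indie pop"), ("a2", "synthpop")])
    print(conn.execute(GENRE_TRENDS).fetchall())
    # → [('indie pop', 3, 60.0), ('synthpop', 1, 80.0)]
```

Note that a track counts toward every genre its artist is tagged with, so inconsistent artist-level tagging flows directly into genre totals.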
What I Worked On
- Configured developer access and authentication for the Spotify and Last.fm APIs to enable automated data extraction.
- Built Python workflows to retrieve music metadata from Spotify, including tracks, artists, genres, and popularity metrics.
- Enriched Spotify track data with additional artist information from Last.fm, such as biographies, listener statistics, and play counts.
- Used Selenium and BeautifulSoup to extract and clean artist biographies when API responses returned partial or truncated content.
- Designed and implemented a MySQL relational database to store the integrated datasets.
- Wrote SQL queries to analyze artist popularity, song counts, and genre-level listening trends.
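The authentication and enrichment steps above might look roughly like this. It is a hedged sketch using only the standard library, assuming Spotify's Client Credentials flow (app-level access without a user login) and Last.fm's `artist.getInfo` method with JSON output; the helper names and the flattened row layout are hypothetical, and the Selenium/BeautifulSoup fallback for truncated biographies is left out.

```python
import base64
import json
import urllib.parse
import urllib.request

SPOTIFY_TOKEN_URL = "https://accounts.spotify.com/api/token"
SPOTIFY_API_URL = "https://api.spotify.com/v1"
LASTFM_API_URL = "https://ws.audioscrobbler.com/2.0/"


def spotify_token_request(client_id: str, client_secret: str) -> urllib.request.Request:
    """Build a Client Credentials token request (Basic auth with the app's keys)."""
    creds = base64.b64encode(f"{client_id}:{client_secret}".encode()).decode()
    body = urllib.parse.urlencode({"grant_type": "client_credentials"}).encode()
    return urllib.request.Request(
        SPOTIFY_TOKEN_URL,
        data=body,
        headers={
            "Authorization": f"Basic {creds}",
            "Content-Type": "application/x-www-form-urlencoded",
        },
        method="POST",
    )


def spotify_api_request(path: str, token: str) -> urllib.request.Request:
    """Build an authenticated GET, e.g. path='tracks/<id>' or 'artists/<id>'."""
    return urllib.request.Request(
        f"{SPOTIFY_API_URL}/{path}",
        headers={"Authorization": f"Bearer {token}"},
    )


def lastfm_artist_info_url(artist: str, api_key: str) -> str:
    """Build a Last.fm artist.getInfo URL that returns JSON."""
    params = {"method": "artist.getinfo", "artist": artist,
              "api_key": api_key, "format": "json"}
    return f"{LASTFM_API_URL}?{urllib.parse.urlencode(params)}"


def fetch_json(req) -> dict:
    """Perform the HTTP call (accepts a URL string or a Request) and decode JSON."""
    with urllib.request.urlopen(req, timeout=10) as resp:
        return json.load(resp)


def merge_track_with_lastfm(track: dict, lastfm: dict) -> dict:
    """Flatten a Spotify track object and a Last.fm artist.getInfo payload into one row."""
    artist = lastfm.get("artist", {})
    stats = artist.get("stats", {})
    bio = artist.get("bio", {})
    return {
        "track_id": track["id"],
        "track_name": track["name"],
        "popularity": track.get("popularity"),
        "artist_name": track["artists"][0]["name"],  # primary artist only
        # Last.fm returns counts as strings, so cast them to int.
        "lastfm_listeners": int(stats.get("listeners", 0)),
        "lastfm_playcount": int(stats.get("playcount", 0)),
        "bio_summary": bio.get("summary", ""),
    }
```

A typical run would exchange the client credentials for a token with `fetch_json(spotify_token_request(...))`, fetch tracks with `spotify_api_request`, then call `lastfm_artist_info_url` per artist and merge. Keeping request construction and response parsing separate from `fetch_json` lets the parsing be tested against saved responses without network access.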