This repository contains an AWS Glue job, written in Python with PySpark, for Spotify music recommendations. The job handles the Extract-Transform-Load (ETL) stage, which preprocesses the data.
Before running this AWS Glue job, ensure you have the following:
- An AWS account with the necessary permissions to run AWS Glue jobs.
- Required AWS Glue job resources and configurations set up in your AWS environment.
- Python libraries or dependencies required for the Glue job. You can specify the dependencies in the job script or provide a requirements file.
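
One common way to supply extra Python dependencies to a Glue job is the `--additional-python-modules` job parameter, which takes a comma-separated list of pip-style module specifiers. The sketch below only builds that argument dictionary. The helper name `build_default_arguments` and the module names are illustrative assumptions, not this project's actual requirements.

```python
def build_default_arguments(modules):
    """Build Glue job default arguments that request extra pip modules.

    Hypothetical helper for illustration; it is not part of this repository.
    """
    # Glue expects one comma-separated string, so drop blanks and join.
    cleaned = [m.strip() for m in modules if m.strip()]
    if not cleaned:
        return {}
    return {"--additional-python-modules": ",".join(cleaned)}


# Placeholder module names; replace them with whatever the job really needs.
default_arguments = build_default_arguments(["pandas==1.5.3", "pyarrow"])
print(default_arguments)
```

The resulting dictionary could be passed as `DefaultArguments` when creating the job through the AWS SDK, or entered as a job parameter in the Glue console.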
The dataset used in this project is derived from the Culture-Aware Music Recommendation Dataset provided by Eva Zangerle.
This dataset covers 55,190 users, 3,471,884 tracks, and more than 120 million listening events, along with Hofstede's cultural dimensions data for 47 countries and the World Happiness Report (WHR) for over 160 countries. It consists of five files: four in `.tsv` format and one in `.tar.gz` format. The files are:
- acoustic_features_lfm_id.tsv (265.0 MB)
- events.tar.gz (2.8 GB)
- hofstede.tsv (1.7 kB)
- users.tsv (1.9 MB)
- world_happiness_report_2018.tsv (439.8 kB)
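
Before pointing a Glue job at these files, it can help to check their layout locally. The following is a minimal sketch using only the Python standard library. It assumes the `.tsv` files are UTF-8 encoded with a header row, which this README does not confirm, and the function names are hypothetical.

```python
import csv
import tarfile


def read_tsv_preview(path, max_rows=5):
    """Return (header, rows) for the first few rows of a tab-separated file."""
    with open(path, newline="", encoding="utf-8") as handle:
        reader = csv.reader(handle, delimiter="\t")
        # Assumption: the first line holds the column names.
        header = next(reader, [])
        rows = []
        for row in reader:
            if len(rows) >= max_rows:
                break
            rows.append(row)
    return header, rows


def list_archive_members(path):
    """Return the names of the files packed inside a .tar.gz archive."""
    # Listing members scans the whole archive, so a multi-gigabyte file
    # such as events.tar.gz may take a while.
    with tarfile.open(path, mode="r:gz") as archive:
        return archive.getnames()
```

For example, `read_tsv_preview("users.tsv")` would return the column names and the first five rows, and `list_archive_members("events.tar.gz")` would show what the archive unpacks to.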
Below are the visual ETL diagrams generated by AWS Glue for each job.
Copyright (c) 2023 Sorawit Chokphantavee and Sirawit Chokphantavee