SorawitChok/AWS-Glue-data-preprocess

Spotify Music Recommendation AWS Glue Job

This repository contains an AWS Glue job developed in Python using PySpark for Spotify music recommendations. The Glue job performs the Extract-Transform-Load (ETL) step, which preprocesses the data.

Prerequisites

Before running this AWS Glue job, ensure you have the following:

  • An AWS account with the necessary permissions to run AWS Glue jobs.
  • Required AWS Glue job resources and configurations set up in your AWS environment.
  • Any Python libraries or dependencies the Glue job requires. You can specify them in the job script or provide a requirements file.
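
As one possible way to supply dependencies from a requirements file, the sketch below turns the file's contents into the value of the `--additional-python-modules` job parameter, which AWS Glue 2.0 and later accept as a comma-separated list of modules. This repository does not state which mechanism it actually uses, so treat this as an illustration only. The function name `requirements_to_glue_arguments` and the packages in the example are hypothetical.

```python
def requirements_to_glue_arguments(requirements_text):
    """Build Glue job arguments from the text of a requirements file.

    Blank lines and '#' comments are skipped; every remaining line is
    passed through unchanged as one module specifier.
    """
    modules = []
    for line in requirements_text.splitlines():
        # Drop inline comments and surrounding whitespace.
        line = line.split("#", 1)[0].strip()
        if line:
            modules.append(line)
    # Glue expects a single comma-separated string for this parameter.
    return {"--additional-python-modules": ",".join(modules)}


# Hypothetical requirements; the real job's dependencies are not listed here.
example_requirements = """
# example only
pandas==1.5.3
numpy
"""

glue_arguments = requirements_to_glue_arguments(example_requirements)
print(glue_arguments)
```

The resulting dictionary could be supplied as the job's default arguments when the job is created or edited, for example in the job parameters section of the Glue console.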

Dataset

The dataset used in this project can be derived from the Culture-Aware Music Recommendation Dataset provided by Eva Zangerle.

This dataset covers 55,190 users, 3,471,884 tracks, and more than 120 million listening events, along with Hofstede's cultural dimensions data for 47 countries and the World Happiness Report (WHR) for over 160 countries. It consists of 5 files: 4 in *.tsv format and 1 in *.tar.gz format. The files are:

  • acoustic_features_lfm_id.tsv (265.0 MB)
  • events.tar.gz (2.8 GB)
  • hofstede.tsv (1.7 kB)
  • users.tsv (1.9 MB)
  • world_happiness_report_2018.tsv (439.8 kB)
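
Before uploading the files for the Glue job, it can help to inspect them locally. The following is a minimal sketch using only the Python standard library: it parses tab-separated text and lists the files inside a gzip-compressed tar archive. It assumes that each *.tsv file starts with a header row, which this README does not confirm. The column names and the archive member name `events.tsv` in the demo are made up for illustration.

```python
import csv
import io
import tarfile


def read_tsv(source):
    """Parse tab-separated text with a header row into a list of dicts."""
    return list(csv.DictReader(source, delimiter="\t"))


def list_archive_members(fileobj):
    """Return the names of the regular files in a gzip-compressed tar archive."""
    with tarfile.open(fileobj=fileobj, mode="r:gz") as archive:
        return [member.name for member in archive.getmembers() if member.isfile()]


# Demo on in-memory data (hypothetical columns, not the real dataset schema).
sample_tsv = io.StringIO("country\tscore\nAA\t1.5\nBB\t2.0\n")
rows = read_tsv(sample_tsv)

# Build a tiny .tar.gz in memory to stand in for events.tar.gz.
buffer = io.BytesIO()
with tarfile.open(fileobj=buffer, mode="w:gz") as archive:
    payload = b"user\ttrack\n1\t10\n"
    info = tarfile.TarInfo(name="events.tsv")  # hypothetical member name
    info.size = len(payload)
    archive.addfile(info, io.BytesIO(payload))
buffer.seek(0)
members = list_archive_members(buffer)

print(rows)
print(members)
```

For the real files, pass an open file handle instead of the in-memory buffers, for example `open("users.tsv", newline="", encoding="utf-8")` for `read_tsv` and `open("events.tar.gz", "rb")` for `list_archive_members`.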

Job Visualization

The visual ETL graphs generated by AWS Glue for each job are shown below.

WHR job

[Image: WHR-job]

User job

[Image: user-job]

Hofstede job

[Image: hofstede-job]

Event-Acoustic job

[Image: event-acous-job]

User-EventAcoustic job

[Image: user-eventacous-job]

Hofstede-WHR job

[Image: hofstede-whr-job]

User-profiling job

[Image: user-profile-job]

Copyright

Copyright (c) 2023 Sorawit Chokphantavee and Sirawit Chokphantavee
