Start a [MusicLab] project

MusicLab - Music Chart Scraper

During the bootcamp, we were given this assignment.

There are many sites that summarize data or aggregate news for a particular field. The task was to create a backend server that collects data from well-known sites in a specific field and analyzes trends.

As I mentioned in the previous post, I love music, so I decided to make a music chart scraper.
At first, I thought of scraping several Korean streaming sites. The problem with Korean music streaming services, which is less common on overseas-based services, is that the fandoms of idols and other artists with large fan bases push new or existing songs up the charts, so the rankings don't reflect what the public actually listens to.

Also, scraping only the Korean charts would have produced a fairly small amount of data. Following our mentor's advice to cover more countries, we decided to target the service with the highest market share in each of three countries: Korea, the U.S., and Japan.

The end goal was to compare data from multiple services and show "what's really popular" without being tied to a single platform.

The target for each country:
Korea: YouTube Music
US: Spotify
Japan: Apple Music

It didn't take me long to realize that this was the wrong choice. The bootcamp did have a course where we got to practice scraping, but I hadn't taken it yet because I was busy cramming for an advanced class assignment. I paid the price for targeting three large companies with a blank slate and no knowledge of scraping.

I started the project by Googling "Python scraping". The most common result was static scraping with BeautifulSoup, but the problem was that Apple Music and Spotify, two of the services I wanted to scrape, were not suited to static scraping.
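
To make that limitation concrete, here is a minimal sketch of what a static scraper gets to see. The HTML and the `track` class name are made up for illustration (they are not the real markup of any of these services), and the sketch uses the standard library's `html.parser` in place of BeautifulSoup so that it runs without installing anything. The idea is the same: a static parser only sees the rows that are already in the HTML the server sent, not the ones JavaScript adds later.

```python
from html.parser import HTMLParser

# Hypothetical markup: roughly what a plain HTTP GET might return for a
# chart page whose remaining rows are filled in later by JavaScript.
STATIC_HTML = """
<html><body>
  <ol id="chart">
    <li class="track">Song A</li>
    <li class="track">Song B</li>
  </ol>
  <script>/* further rows would be fetched here as the user scrolls */</script>
</body></html>
"""


class TrackParser(HTMLParser):
    """Collects the text of every <li class="track"> element."""

    def __init__(self):
        super().__init__()
        self.tracks = []
        self._in_track = False

    def handle_starttag(self, tag, attrs):
        # attrs is a list of (name, value) pairs
        if tag == "li" and ("class", "track") in attrs:
            self._in_track = True

    def handle_endtag(self, tag):
        if tag == "li":
            self._in_track = False

    def handle_data(self, data):
        if self._in_track and data.strip():
            self.tracks.append(data.strip())


def parse_tracks(html):
    """Return the track names present in the given HTML string."""
    parser = TrackParser()
    parser.feed(html)
    return parser.tracks


# Only the rows already present in the static HTML are found.
print(parse_tracks(STATIC_HTML))
```

If the server sends back only an empty shell (or a login page), `parse_tracks` simply returns an empty list, which is roughly the wall I ran into.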

So at first, instead of scraping, I got a CSV file from each site and planned to present the data from it.

But then I realized that this wasn't really scraping at all.
Even if I automated the CSV import, if you asked me whether that really fits the purpose of the assignment, I would say absolutely not.
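
For reference, automating the import would have looked roughly like the sketch below. The column names (`rank`, `title`, `artist`) and the `load_chart` helper are assumptions for illustration; the actual CSV files each site provides may be laid out differently.

```python
import csv
import io

# Hypothetical chart export; real files may use different columns.
SAMPLE_CSV = """rank,title,artist
1,Song A,Artist X
2,Song B,Artist Y
"""


def load_chart(csv_file, source):
    """Read chart rows from a file-like object and tag them with their source."""
    reader = csv.DictReader(csv_file)
    return [
        {
            "source": source,
            "rank": int(row["rank"]),
            "title": row["title"],
            "artist": row["artist"],
        }
        for row in reader
    ]


rows = load_chart(io.StringIO(SAMPLE_CSV), source="spotify")
print(rows[0])
```

It works, but all it does is read a file someone else prepared, which is why it didn't feel like it met the assignment.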

So I started thinking about other ways to do it. The idea of using an API came to mind. Luckily, both Spotify and Apple Music have APIs, but Apple's MusicKit requires a paid membership in the Apple Developer Program, which costs about 140,000 won.

I didn't have the courage to pay 140,000 won for a mini-project, even a personal one.

Spotify's API wasn't a clean fit either: getting the daily/weekly popularity charts the way I wanted would have required working around it.

In the end, I was left with scraping all the sites myself. After analyzing them, I found that Apple Music's charts initially load only ranks 1 to 50 and fetch the following ranks dynamically as the user scrolls, while Spotify offers no way to view the charts without logging in. Static scraping with BeautifulSoup was therefore not an option.

So I searched and found Selenium, the library I needed for dynamic scrolling, and the grind began. Within a day of starting to scrape, I realized that I had badly underestimated crawling.
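
The scrolling part can be sketched roughly as follows. This is a minimal illustration rather than the exact code I ended up with: `scroll_until_stable`, `pause`, and `max_rounds` are names made up here, and the only thing it relies on from Selenium is the driver's `execute_script` method. It keeps scrolling to the bottom until the page height stops growing, which is a common (if crude) way to trigger lazy-loaded rows.

```python
import time


def scroll_until_stable(driver, pause=1.0, max_rounds=20):
    """Scroll to the bottom repeatedly until the page height stops growing.

    `driver` is expected to be a Selenium WebDriver (anything exposing
    `execute_script` works). Returns the number of scrolls performed.
    """
    last_height = driver.execute_script("return document.body.scrollHeight")
    for rounds in range(1, max_rounds + 1):
        driver.execute_script("window.scrollTo(0, document.body.scrollHeight);")
        time.sleep(pause)  # crude fixed wait for newly loaded rows
        new_height = driver.execute_script("return document.body.scrollHeight")
        if new_height == last_height:
            # Nothing new was loaded, so assume we reached the end.
            return rounds
        last_height = new_height
    return max_rounds


# Usage with a real browser (requires Selenium and a browser driver):
#   from selenium import webdriver
#   driver = webdriver.Chrome()
#   driver.get(chart_url)          # chart_url is a placeholder
#   scroll_until_stable(driver)
#   html = driver.page_source      # now contains the lazily loaded rows
#   driver.quit()
```

A fixed `time.sleep` is the simplest thing that works, but it is slow and fragile; if the chart lives inside its own scrollable container rather than the page body, the scripts would need to target that element instead.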
