rachel jackson

ux research + museums

Hikki & Spotify’s API

A look at Hikaru Utada’s music statistics with Spotify API and R

About this project

While searching for a dataset to use for this project, I came across #TidyTuesday, a weekly challenge to practice building data visualizations in RStudio. Searching through years’ worth of .csv files, I finally came across a topic that piqued my interest - a dataset created using spotifyr. Months ago, I asked Spotify to send me my account data, but I still haven’t heard back. Eager to finally get my hands on Spotify stats, I dived right in.

Thumbnail: ‘Sakura DROPS’ by Hikaru Utada

Tools Used

The main tool used for this project was RStudio, a development environment for the R programming language, which I used to produce the data visualizations. I also used spotifyr, an R package wrapper “for pulling track audio features and other information from Spotify’s Web API in bulk. By automatically batching API requests, it allows you to enter an artist’s name and retrieve their entire discography in seconds, along with Spotify’s audio features and track/album popularity metrics. You can also pull song and playlist information for a given Spotify User.” (“R Wrapper for the ’Spotify’ Web API,” n.d.) All of the data was pulled directly from Spotify’s servers, meaning that there was no .csv file involved in my process. Thankfully, the spotifyr documentation contains a directory of every usable function for Spotify’s API. Additionally, I used Plotly and its online library to create the interactive charts in this document. Plotly is a chart-building library available for R and other languages such as Python and JavaScript. (“Plotly r Graphing Library,” n.d.)

Methodology

My original goal for this project was to analyze my personal Spotify data. However, I soon discovered that the Spotify API does not provide most of the functions I was hoping to use. I could not find any functions that would adequately replicate Spotify’s annual ‘Wrapped’ recap.

get_user_profile("bzgggexjmb54ugltmxpp9whgr", authorization = get_spotify_access_token())

## # A tibble: 1 × 10
##   display_name external_urls.spot… followers.total href     id     images.height
##   <chr>        <chr>               <chr>           <chr>    <chr>  <chr>        
## 1 rachel       https://open.spoti… 2               https:/… bzggg… <NA>         
## # … with 4 more variables: images.url <chr>, images.width <chr>, type <chr>,
## #   uri <chr>

# My publicly available account information
# IDs for artists, users, playlists, songs, etc., can be found in their respective Spotify URLs
# To access or edit user data, you also need a Client ID and Client Secret (issued to a Spotify developer app)
# The access token expires, so it has to be refreshed regularly
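As a rough sketch of the authorization step described in the comments above: spotifyr reads its credentials from the SPOTIFY_CLIENT_ID and SPOTIFY_CLIENT_SECRET environment variables by default. The helper name has_spotify_credentials() is made up for this illustration, and the API is only called when both variables are set and the package is installed.

```r
# Hypothetical helper: TRUE when both environment variables that
# spotifyr reads by default are set to non-empty values.
has_spotify_credentials <- function() {
  id <- Sys.getenv("SPOTIFY_CLIENT_ID")
  secret <- Sys.getenv("SPOTIFY_CLIENT_SECRET")
  nzchar(id) && nzchar(secret)
}

if (has_spotify_credentials() && requireNamespace("spotifyr", quietly = TRUE)) {
  # Request a fresh token each session, since tokens expire
  token <- spotifyr::get_spotify_access_token()
  profile <- spotifyr::get_user_profile("bzgggexjmb54ugltmxpp9whgr",
                                        authorization = token)
  print(profile$display_name)
} else {
  message("Set SPOTIFY_CLIENT_ID and SPOTIFY_CLIENT_SECRET before calling the API.")
}
```

Keeping the credentials in environment variables (for example in a local .Renviron file) means they never have to appear in the script itself.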

A press image of Utada for their 2022 BAD MODE album.
Image courtesy of utadahikaru.jp.

Moving forward, I set my focus on analyzing the data of a single artist, an option that Spotify provides many data points for. I decided to research Hikaru Utada’s discography. Utada is a Japanese-American singer-songwriter whose 1999 debut album remains the best-selling album of all time in Japan. Over the years, Utada has experimented with different sounds, and I was eager to analyze the data points of their music. For this project, I will be excluding their English-language albums (Exodus and This Is the One), because they are listed under a separate artist profile, Utada.

I started my research by looking at Utada’s overall popularity. spotifyr provides the function get_artist(), which pulls an artist’s Spotify information, including number of followers, attached genres, sub-genres, and popularity.

Utada <- get_artist("7lbSsjYACZHn1MSDXPxNF2")
# This string is Utada's unique Artist ID

The printed result showed that Utada currently has over 2.1 million Spotify followers and that their current artist popularity score is 68 out of 100.
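To show how those two numbers can be read out of the result, here is a minimal sketch. It assumes get_artist() returns a nested list shaped like Spotify's artist object (name, popularity, followers$total, genres). The function summarise_artist() and the stand-in list are hypothetical, and the genre value is a placeholder rather than real output.

```r
# Hypothetical helper: build a one-line summary from an artist record.
# `artist` is assumed to be a list shaped like Spotify's artist object.
summarise_artist <- function(artist) {
  sprintf("%s: %s followers, popularity %d/100, genres: %s",
          artist$name,
          format(artist$followers$total, big.mark = ",", scientific = FALSE),
          as.integer(artist$popularity),
          paste(artist$genres, collapse = ", "))
}

# Hand-made stand-in for the API response (rounded follower count, placeholder genre)
example_artist <- list(
  name = "Hikaru Utada",
  popularity = 68,
  followers = list(total = 2100000),
  genres = c("j-pop")
)

cat(summarise_artist(example_artist), "\n")
```

With the real object, the same call would be summarise_artist(Utada).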

My first visualization compares the popularity of Utada’s album tracks. I created a playlist featuring all of Utada’s tracks, with a few exceptions to prevent duplicates and unwanted results: remasters, karaoke versions, and instrumentals.

hikki <- get_playlist_tracks(
  '4xiEpLHqV9fHwZRsWma1ON',
  offset = 0, include_meta_info = FALSE)
# This function pulls the specific playlist I created
# The string is the playlist's ID
# Hikki is Utada's longstanding nickname

Popularity

Most Popular:

Utada’s most popular song worldwide is ‘First Love,’ the title track of their debut album.

Least Popular:

The least popular song is also from that album, a 30-second interlude. Seeing the track ‘Interlude’ in last place doesn’t surprise me; I often skip it myself (in part, to keep it from affecting my #SpotifyWrapped data).
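For anyone who wants to find the most and least popular tracks without reading them off a chart, a sketch along these lines should work. It assumes the data frame from get_playlist_tracks() has track.name and track.popularity columns. The small data frame below is a stand-in, and its popularity values are invented purely for illustration.

```r
# Stand-in for the playlist data; popularity values are invented
tracks <- data.frame(
  track.name = c("First Love", "Interlude", "Sakura DROPS"),
  track.popularity = c(70, 20, 55),
  stringsAsFactors = FALSE
)

# Sort from most to least popular
ranked <- tracks[order(tracks$track.popularity, decreasing = TRUE), ]

most_popular <- ranked$track.name[1]
least_popular <- ranked$track.name[nrow(ranked)]

cat("Most popular:", most_popular, "\n")
cat("Least popular:", least_popular, "\n")
```

With the real playlist, replacing `tracks` with `hikki` should give the same kind of ranking.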

Other Measurements

Spotify has many unique song measurements available to analyze. This report includes three of them in relation to Utada’s discography:

Danceability: Can you dance to it?
- How suitable a track is for dancing based on a combination of musical elements including tempo, rhythm stability, beat strength, and overall regularity. A value of 0.0 is least danceable and 1.0 is most danceable.
Valence: Does the rhythm and tempo evoke a positive feeling?
- A measure from 0.0 to 1.0 describing the musical positiveness conveyed by a track. Tracks with high valence sound more positive (e.g. happy, cheerful, euphoric), while tracks with low valence sound more negative (e.g. sad, depressed, angry).
Speechiness: How many words are being spoken and what is the rhythm?
- Speechiness detects the presence of spoken words in a track. The more exclusively speech-like the recording (e.g. talk show, audio book, poetry), the closer to 1.0 the attribute value. Values above 0.66 describe tracks that are probably made entirely of spoken words. Values between 0.33 and 0.66 describe tracks that may contain both music and speech, either in sections or layered, including such cases as rap music. Values below 0.33 most likely represent music and other non-speech-like tracks.
  (“Tidytuesday/Data/2020/2020-01-21 at Master · Rfordatascience/Tidytuesday,” n.d.) Descriptions courtesy of the tidytuesday page for spotifyr.
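The speechiness thresholds quoted above can be turned into labels with base R's cut(). This is only a sketch: the function name classify_speechiness() and the three label names are my own, and values that land exactly on 0.33 or 0.66 fall into the lower band here, which the description leaves unspecified.

```r
# Hypothetical helper: bucket speechiness scores using the 0.33 and 0.66 thresholds
classify_speechiness <- function(speechiness) {
  cut(speechiness,
      breaks = c(0, 0.33, 0.66, 1),
      labels = c("music", "music and speech", "spoken word"),
      include.lowest = TRUE)
}

# Invented example scores, one per band
example_scores <- c(0.05, 0.50, 0.90)
print(data.frame(speechiness = example_scores,
                 category = classify_speechiness(example_scores)))
```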

Danceability

Below is a segment of the code used to create the danceability chart. The code for the positivity and speechiness charts is nearly identical. There was a lot of trial and error figuring out how to get the hover tool to share the song details and release year!

library(plotly)   # plot_ly(), layout(), hide_colorbar(), and the %>% pipe
library(stringr)  # str_sub()

# The following charts use the get_playlist_audio_features() function
# It uses the same inputs as get_user_profile(): my ID and a refreshable access token
dance <- plot_ly(
  data = audio_features,
  x = ~track.name,
  y = ~danceability * 10,
  # I multiplied the scores for danceability and the other measurements
  # to give them a more understandable & similar scale
  color = ~danceability,
  type = "scatter",
  mode = "markers",
  marker = list(size = 10), # Has to sit inside plot_ly() to affect the dots
  text = ~paste('Song:', track.name, '<br>Danceability:', danceability * 10,
                # The first four characters of the release date are the year
                '<br>Released:', str_sub(track.album.release_date, 1, 4)),
  hoverinfo = 'text') %>%
  hide_colorbar() # For these straightforward charts, I removed the legend
dance <- dance %>%
  layout(title = "Danceability of Hikaru Utada's Discography",
         yaxis = list(title = 'Danceability', ticksuffix = ""),
         # Tick suffix was originally for creating percentages
         xaxis = list(title = 'Song Title',
                      showticklabels = FALSE))
dance
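Since the three charts differ only in which measurement they plot, the shared preparation step could be pulled into one helper. The sketch below uses base R only. The function prep_feature(), its arguments, and the example data frame are hypothetical, and the feature values are invented. It assumes the same column names as the chart code (track.name, track.album.release_date).

```r
# Hypothetical helper: scale one audio feature to 0-10 and build its hover text
prep_feature <- function(data, feature, label) {
  score <- round(data[[feature]] * 10, 1)
  year <- substr(data$track.album.release_date, 1, 4)
  data.frame(
    track.name = data$track.name,
    score = score,
    hover = paste0("Song: ", data$track.name,
                   "<br>", label, ": ", score,
                   "<br>Released: ", year),
    stringsAsFactors = FALSE
  )
}

# Stand-in for audio_features; the feature values are invented
audio_features_example <- data.frame(
  track.name = c("First Love", "Automatic"),
  track.album.release_date = c("1999-03-10", "1999-03-10"),
  danceability = c(0.5, 0.75),
  valence = c(0.25, 0.5),
  stringsAsFactors = FALSE
)

valence_data <- prep_feature(audio_features_example, "valence", "Positivity")
print(valence_data$hover[1])
```

The resulting score and hover columns could then be passed to plot_ly() as y = ~score and text = ~hover, so only the feature name and label change between charts.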

Most Danceable:

Least Danceable:

The least danceable song, 少年時代 (‘Shounen Jidai,’ or ‘Boyhood’), was part of a tribute album for Japanese artist Inoue Yosui, so I’m also acknowledging the second-least danceable song, 海路 (‘Kairo,’ or ‘Sea Route’).

Positivity

Most Positive:

Least Positive:

It’s interesting to see ‘Kairo’ at the bottom again! It is the least positive song and, setting the tribute-album track aside, the least danceable.

Speechiness

Most Speech:

Least Speech:
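The highest and lowest track for each measurement can also be looked up in one pass. This is a sketch with a made-up helper, extremes(), and a stand-in data frame whose scores are invented. Only the ordering of ‘Kairo’ and ‘Shounen Jidai’ for danceability and valence mirrors what the charts showed.

```r
# Hypothetical helper: name the highest- and lowest-scoring track per feature
extremes <- function(data, features) {
  do.call(rbind, lapply(features, function(f) {
    data.frame(
      feature = f,
      highest = data$track.name[which.max(data[[f]])],
      lowest = data$track.name[which.min(data[[f]])],
      stringsAsFactors = FALSE
    )
  }))
}

# Stand-in data; all scores are invented for illustration
features_example <- data.frame(
  track.name = c("First Love", "Kairo", "Shounen Jidai"),
  danceability = c(0.50, 0.25, 0.20),
  valence = c(0.40, 0.10, 0.30),
  speechiness = c(0.03, 0.04, 0.05),
  stringsAsFactors = FALSE
)

result <- extremes(features_example,
                   c("danceability", "valence", "speechiness"))
print(result)
```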

Peer Review

I sent a very rough draft of this project to my peer review partner to look over. I provided Chelsea with a screenshot of the Danceability chart while it was still being worked on. She recommended that I explain how Spotify calculates the song measurements, and “for the details that appear over hover, the text may be hard to read where the background color is darker.”

I took Chelsea’s review and went back over my descriptions of the measurements. I’m still a beginner when it comes to using Plotly, so I had a hard time figuring out how to edit the colors on the hover tool. While I agree that some tooltip details can be a bit difficult to see at first glance, Plotly itself actually adjusts the text from light to dark depending on the dot color. In the future, I’ll keep this in mind while building charts.
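For future charts, one possible way to address the readability comment is Plotly's hoverlabel layout setting, which fixes the tooltip background and font color instead of letting them follow the marker color. This is a sketch rather than what the charts in this post use. The colors and the tiny example plot are arbitrary, and the plotting part only runs if plotly is installed.

```r
# Tooltip styling: a fixed light background with dark text (arbitrary choices)
hover_style <- list(
  bgcolor = "white",
  bordercolor = "gray",
  font = list(color = "black", size = 12)
)

if (requireNamespace("plotly", quietly = TRUE)) {
  # Tiny throwaway scatter plot, just to show where the setting goes
  fig <- plotly::plot_ly(x = c(1, 2, 3), y = c(2, 4, 3),
                         type = "scatter", mode = "markers")
  fig <- plotly::layout(fig, hoverlabel = hover_style)
}
```

The same layout(hoverlabel = ...) call could be chained onto an existing chart such as `dance`.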

Conclusion

There are multiple popular web apps that can perform these same tasks for users. To use these websites, Spotify users grant the apps authorized access to their accounts - the same authorization I used to view my data. After reading through the function list, it’s a bit jarring how much control these apps have over user profiles. There are functions available that can make users follow artists, playlists, or other users, control music playback, and export personal information. I plan to keep away from these kinds of apps from now on, especially since I can now do the work myself.

This is a terribly indulgent project. I’ve been trying to figure out the right time to try out Spotify data in a visualization, and RStudio ended up being the perfect opportunity. I’m interested in using more R and RStudio in my work! Despite the trouble I had with spotifyr, I’m eager to try out other datasets with the program. It would be interesting to see Spotify add a ‘Singability’ category, for songs that users enjoy singing along to.

Lastly, thanks to Hikaru Utada for the music - I wouldn’t have been able to get through this project without listening to my Hikki playlist on loop!🧸

References

“October 11, 2021-October 17, 2021 Oricon Week Total Album Ranking.” n.d. https://www.oricon.co.jp/rank/coa/w/2021-10-25/.

“Plotly r Graphing Library.” n.d. https://plotly.com/r/.

“R Wrapper for the ’Spotify’ Web API.” n.d. https://www.rcharlie.com/spotifyr/index.html.

“Tidytuesday/Data/2020/2020-01-21 at Master · Rfordatascience/Tidytuesday.” n.d. https://github.com/rfordatascience/tidytuesday.