Spotify is the world's largest and most popular music and video streaming subscription service, letting you search for and listen to millions of songs from artists all over the world. It launched in October 2008 as a Swedish audio streaming and media service with a small catalog. It now has 70 million songs in its library, with 60,000 added every day.
Spotify had 365 million users as of 2021 and is growing rapidly. Its library currently includes 2.9 million podcasts, it is available in 171 markets, and 44 percent of its users listen daily. It dominates the way we consume music in the twenty-first century.
You may be wondering how Spotify processes this much data and so many user records, and which programming languages and technologies make up its powerful technology stack. No worries, I'll walk you through it.
Spotify uses a variety of programming languages to build its tech stack, including Python, Java, C++, and Clojure.
Spotify uses Python for backend services, quick scripts, build processes, and data analysis. Approximately 80% of its services are written in Python and are linked by Hermes, a messaging protocol built on ZeroMQ and protobuf (Protocol Buffers). Spotify chose Python because it is simple to write and shortens the development cycle. (Source: Python at Spotify)
An estimated 90% of Spotify's MapReduce jobs are written in Python.
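To make the MapReduce model concrete, here is a minimal sketch of a job written in the style of Hadoop Streaming, which lets any program that reads lines from standard input and writes tab-separated key/value lines to standard output act as a mapper or reducer. It is not Spotify code: the play-log format ("user_id<TAB>track_id") and the function names `mapper`, `reducer`, and `run_locally` are hypothetical, and the job simply counts plays per track. `run_locally` stands in for Hadoop's shuffle step by sorting the mapper output before reducing.

```python
from itertools import groupby
from typing import Iterable, Iterator, List


def mapper(lines: Iterable[str]) -> Iterator[str]:
    """Map phase: emit "track_id<TAB>1" for each well-formed play record.

    Hypothetical input format: "user_id<TAB>track_id", one line per play.
    """
    for line in lines:
        fields = line.rstrip("\n").split("\t")
        if len(fields) == 2:  # skip malformed records
            _user_id, track_id = fields
            yield f"{track_id}\t1"


def reducer(sorted_lines: Iterable[str]) -> Iterator[str]:
    """Reduce phase: sum the counts for each key.

    Input must be sorted by key, which Hadoop guarantees between the
    map and reduce phases.
    """
    pairs = (line.rstrip("\n").split("\t", 1) for line in sorted_lines)
    for track_id, group in groupby(pairs, key=lambda pair: pair[0]):
        total = sum(int(count) for _, count in group)
        yield f"{track_id}\t{total}"


def run_locally(lines: Iterable[str]) -> List[str]:
    """Simulate map -> shuffle/sort -> reduce on a single machine."""
    return list(reducer(sorted(mapper(lines))))


if __name__ == "__main__":
    sample = ["u1\tt1", "u2\tt2", "u3\tt1"]
    for row in run_locally(sample):
        print(row)
```

On a real cluster, the mapper and reducer would typically run as separate processes reading `sys.stdin`, with Hadoop handling the sort and spreading the work across nodes.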
Non-Python services are typically written in Java. Spotify uses a Java SE subscription, which provides Java SE licensing and support (source). Java is fast, well documented, and backed by large, active developer communities. Spotify also uses Java for its Android app.
Some of the shared code base is written in C++ and reused across other platforms.
Objective-C is used for the iOS applications.
Big Data Tools
Spotify uses various tools to store and manage large volumes of varied data.
Spotify has millions of users, songs, and podcasts. Analyzing data at this scale and generating useful insights from it is difficult, so Spotify uses a Hadoop-based "data lake" for complex analysis.
Spotify used it to train machine learning models for better track recommendations, calculate royalties, serve targeted ads, measure audience response to new features, and generate business reports and marketing campaigns.
According to this 2017 blog post, Spotify handled its big data inflows with a 2,500-node on-premises Apache Hadoop cluster.
Apache Spark is a distributed processing system used for big data workloads.
Spotify uses Apache Spark for machine learning applications, taking advantage of its in-memory caching and optimized query execution to run fast queries against large volumes of data.
Spotify used Apache Storm, a leading tool for processing information in real time, to power real-time use cases such as new-user suggestions, ad targeting, and product metrics.
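Storm itself runs on the JVM and organizes work into topologies of spouts and bolts, but the core idea behind a use case like live product metrics can be sketched in a few lines of Python: keep a rolling count of events per key over a fixed time window and evict events as they age out. The `SlidingWindowCounter` class below is a hypothetical illustration, not Storm's API, and it assumes events arrive in timestamp order.

```python
from collections import Counter, deque
from typing import Deque, List, Tuple


class SlidingWindowCounter:
    """Count events per key over the most recent `window_seconds`.

    Hypothetical sketch; assumes timestamps arrive in non-decreasing order.
    """

    def __init__(self, window_seconds: float) -> None:
        self.window_seconds = window_seconds
        self._events: Deque[Tuple[float, str]] = deque()  # (timestamp, key), oldest first
        self._counts: Counter = Counter()

    def add(self, timestamp: float, key: str) -> None:
        """Record one event, e.g. a play of track `key` at `timestamp`."""
        self._events.append((timestamp, key))
        self._counts[key] += 1
        self._evict(timestamp)

    def _evict(self, now: float) -> None:
        """Drop events that have fallen out of the window ending at `now`."""
        while self._events and self._events[0][0] <= now - self.window_seconds:
            _, old_key = self._events.popleft()
            self._counts[old_key] -= 1
            if self._counts[old_key] == 0:
                del self._counts[old_key]

    def top(self, n: int, now: float) -> List[Tuple[str, int]]:
        """Return the `n` most frequent keys in the current window."""
        self._evict(now)
        return self._counts.most_common(n)
```

Because each event is added once and evicted once, the cost per event stays constant no matter how long the stream runs, which is the property a real-time pipeline needs.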
One of the popular data pipeline tools used at Spotify is Luigi.
Luigi is a Python-based ETL tool that feeds Spotify's Hadoop data intelligence work. It helps developers schedule, monitor, and manage batch jobs rather than continuous streaming workloads, and it handles the plumbing between Hadoop jobs, such as dependency resolution and workflow management.
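In Luigi, each task declares what it `requires()`, what it produces via `output()`, and how to produce it in `run()`; a task whose output already exists is considered complete and is not re-run. The toy below mimics that idea using only the standard library. It is not Luigi's API: the `Task` class, the `build` function, and the two-step extract/report pipeline are hypothetical.

```python
import os
import tempfile
from typing import Callable, List, Optional


class Task:
    """Hypothetical toy task: complete once its output file exists."""

    def __init__(self, name: str, output: str, action: Callable[[str], None],
                 requires: Optional[List["Task"]] = None) -> None:
        self.name = name
        self.output = output              # path this task produces
        self.action = action              # writes the output file
        self.requires = requires or []    # upstream tasks

    def complete(self) -> bool:
        return os.path.exists(self.output)


def build(task: Task, log: List[str]) -> None:
    """Build `task` after its dependencies (depth-first), skipping finished work."""
    if task.complete():
        log.append(f"skip {task.name}")
        return
    for dependency in task.requires:
        build(dependency, log)
    task.action(task.output)
    log.append(f"run {task.name}")


def demo() -> List[str]:
    log: List[str] = []
    with tempfile.TemporaryDirectory() as workdir:
        plays = os.path.join(workdir, "plays.tsv")
        report = os.path.join(workdir, "report.txt")

        def extract(path: str) -> None:
            # Stand-in for pulling raw play logs from storage.
            with open(path, "w") as f:
                f.write("u1\tt1\nu2\tt1\n")

        def summarize(path: str) -> None:
            # Count the extracted play records and write a tiny report.
            with open(plays) as f:
                count = sum(1 for _ in f)
            with open(path, "w") as f:
                f.write(f"plays={count}\n")

        report_task = Task("report", report, summarize,
                           requires=[Task("extract", plays, extract)])
        build(report_task, log)  # first run: extract, then report
        build(report_task, log)  # second run: report already exists
        with open(report) as f:
            log.append(f.read().strip())
    return log


if __name__ == "__main__":
    print(demo())
```

Because completion is tied to the presence of output files, rerunning the pipeline is cheap and safe: finished steps are skipped, and only missing outputs are rebuilt.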
Google Cloud Pub/Sub
In 2016, Spotify moved its event delivery system from Apache Kafka to Google Cloud Pub/Sub.
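Event delivery over a publish/subscribe service works roughly like this: producers publish messages to a topic, each subscription receives its own copy of every message published after it was created, and a message is redelivered until the subscriber acknowledges it (at-least-once delivery). The `InMemoryTopic` class below is a hypothetical sketch of those semantics, not the Google Cloud Pub/Sub client library, and it ignores networking, persistence, and acknowledgement deadlines.

```python
import itertools
from collections import OrderedDict
from typing import Dict, Iterable, List, Tuple


class InMemoryTopic:
    """Hypothetical in-memory topic with per-subscription pending queues."""

    def __init__(self) -> None:
        # subscription name -> unacknowledged messages, keyed by message ID
        self._subscriptions: Dict[str, "OrderedDict[int, bytes]"] = {}
        self._next_id = itertools.count()

    def create_subscription(self, name: str) -> None:
        self._subscriptions.setdefault(name, OrderedDict())

    def publish(self, data: bytes) -> int:
        message_id = next(self._next_id)
        # Fan out: every existing subscription gets its own copy.
        for pending in self._subscriptions.values():
            pending[message_id] = data
        return message_id

    def pull(self, name: str, max_messages: int = 10) -> List[Tuple[int, bytes]]:
        # Messages are not removed here; anything left unacknowledged is
        # returned again by the next pull (at-least-once delivery).
        pending = self._subscriptions[name]
        return list(itertools.islice(pending.items(), max_messages))

    def acknowledge(self, name: str, message_ids: Iterable[int]) -> None:
        pending = self._subscriptions[name]
        for message_id in message_ids:
            pending.pop(message_id, None)
```

Separate subscriptions (for example, a hypothetical "analytics" consumer and a "royalties" consumer) can then process the same event stream independently, each at its own pace.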
The following DevOps tools are used at Spotify: Docker, New Relic, Datadog, Pingdom, Percy, Apache CloudStack, and Helios.
Docker packages an application together with all of the programs and dependencies it needs into a lightweight, standalone container. Unlike a virtual machine, a container shares the host's operating system kernel instead of running a full guest OS. Docker simplifies the creation, deployment, and execution of any application.
New Relic is a service that helps website and application owners track the performance of their web apps. It works like a real-time performance dashboard with "X-ray vision" and helps pinpoint the actual source of problems in an app.
New Relic takes much of the burden of monitoring, troubleshooting, and scaling a web app off your shoulders and makes it simpler for you.
Datadog is a monitoring, security, and analytics platform for cloud-based applications, used by developers, IT operators, security engineers, and business users. It monitors servers, databases, tools, services, and more.
Pingdom is a website performance and availability monitoring tool.
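At its simplest, availability monitoring of the kind Pingdom performs means probing a URL on a schedule, recording whether it responded and how fast, and reporting the share of successful checks as uptime. The helpers below are a hypothetical, single-location sketch using only the Python standard library; they are not Pingdom's API, and a hosted monitor typically probes from many locations and alerts on failures.

```python
import time
import urllib.request
from typing import List, Tuple


def check(url: str, timeout: float = 5.0) -> Tuple[bool, float]:
    """Probe `url` once and return (is_up, elapsed_seconds)."""
    start = time.monotonic()
    try:
        # urlopen follows redirects; HTTPError (raised for 4xx/5xx) is a
        # subclass of OSError, so it is caught below.
        with urllib.request.urlopen(url, timeout=timeout) as response:
            is_up = 200 <= response.status < 400
    except (OSError, ValueError):
        # OSError: DNS failures, refused connections, timeouts, HTTP errors.
        # ValueError: malformed URLs.
        is_up = False
    return is_up, time.monotonic() - start


def availability(results: List[bool]) -> float:
    """Uptime as the percentage of successful checks."""
    if not results:
        raise ValueError("need at least one check result")
    return 100.0 * sum(results) / len(results)
```

For example, running `check` once a minute and passing the collected `is_up` flags to `availability` gives a rolling uptime percentage, while the elapsed times track response speed.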
Apache CloudStack is an open-source cloud computing management platform that lets you create, manage, and deploy various cloud services. Its key features include built-in high availability for hosts and VMs, a user-friendly web-based UI for cloud management, snapshot management, accounting of network, compute, and storage resources, usage metering, and virtual routers, firewalls, and load balancers.