In this post, I have collected a few articles that help explain how classic web-based systems became scalable, together with an overview of general techniques that can make the difference between a system that supports 100 active users and one that supports 1 million active users.
Interesting articles
A couple of interesting articles describing how LinkedIn and Meta solved some of their scalability issues.
- A Brief History of Scaling LinkedIn: how LinkedIn moved from a monolith to a service-oriented architecture and got rid of its tightly coupled internal components, with Kafka emerging as a byproduct.
- Building Timeline: Scaling up to hold your life story: how distributing and denormalising data helped Facebook build real-time timelines for millions of users.
- A few more insider points of view on well-known features and architectures:
Common techniques
The following sections describe general concepts for making a system scalable. I learned these techniques from my experience at Groupon and FarmHedge. They are also common to the systems outlined in the articles above and are frequent topics in system design interviews (as described in the book System Design Interview by Alex Xu).
Let’s assume we want to optimise a web-based system in which a set of clients, such as mobile apps or web apps, connect to a server to access data. Assuming we have already exploited the limited performance improvements offered by vertical scaling, a few common techniques to make data access even more scalable are:
Statelessness and load balancers
If servers do not hold sessions, a load balancer can be put in front of the backend architecture to distribute requests across servers independently. It is crucial that servers do not hold sessions, because subsequent requests from the same client might be directed to different web servers. Sessions can instead be kept in shared storage, normally a fast-access NoSQL database or key-value store that holds session tokens, such as Redis.
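As a minimal sketch of this idea, assuming a reachable Redis instance (the hostname and TTL below are hypothetical), a session store shared by all servers behind the load balancer could look like this in Python with the redis-py client:

```python
import secrets
import redis  # assumes the redis-py client is installed

# Any server behind the load balancer reaches the same Redis instance,
# so it does not matter which server handles a given request.
store = redis.Redis(host="sessions.internal", port=6379)  # hypothetical host

SESSION_TTL = 30 * 60  # expire idle sessions after 30 minutes

def create_session(user_id: str) -> str:
    """Issue a session token and persist it in the shared store."""
    token = secrets.token_urlsafe(32)
    store.setex(f"session:{token}", SESSION_TTL, user_id)
    return token

def resolve_session(token: str) -> str | None:
    """Look up the user for a token; works on any server in the pool."""
    user_id = store.get(f"session:{token}")
    return user_id.decode() if user_id else None
```

Because the servers themselves keep no state, any of them can be added, removed or restarted without logging users out.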
Caches
Caches can be used in different ways to skip repeated operations that take time, such as database access or downstream calls to other services. Memcached and Redis are examples of high-performance storage systems that can be used for in-memory and distributed caching.
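A common way to apply this is the cache-aside pattern. The sketch below assumes the same hypothetical Redis host as above and a hypothetical `load_product_from_db` function standing in for the slow database query:

```python
import json
import redis  # assumes the redis-py client is installed

cache = redis.Redis(host="cache.internal", port=6379)  # hypothetical host

def get_product(product_id: int) -> dict:
    """Cache-aside: try the cache first, fall back to the database."""
    key = f"product:{product_id}"
    cached = cache.get(key)
    if cached is not None:
        return json.loads(cached)  # cache hit: the database is skipped
    product = load_product_from_db(product_id)  # hypothetical slow query
    cache.setex(key, 300, json.dumps(product))  # keep it for 5 minutes
    return product
```

The TTL bounds how stale a cached entry can get, which is the usual trade-off against hitting the database on every request.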
CDNs
Content Delivery Networks (CDNs) store static resources close to where they are used, reducing the access time to these resources. HTML, scripts and other assets are commonly stored in CDNs. Examples of commonly used delivery networks are Cloudflare and Akamai.
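In practice the origin server tells the CDN how long it may keep a copy via standard HTTP caching headers. A minimal sketch using Flask (the route, filename and max-age are hypothetical, and the asset is assumed to be content-fingerprinted so it never changes under the same name):

```python
from flask import Flask, Response  # assumes Flask is installed

app = Flask(__name__)

@app.route("/assets/app.js")
def serve_script() -> Response:
    # A long max-age plus "immutable" tells the CDN (and browsers) that
    # this asset never changes, so edge nodes can keep serving it
    # without contacting the origin server again.
    response = app.send_static_file("app.js")
    response.headers["Cache-Control"] = "public, max-age=31536000, immutable"
    return response
```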
Message queues
Heavy-duty operations, intra-service asynchronous communication and asynchronous access to third-party services can be handled with message queues. Message queues act as a buffer, and they can also throttle or rate-limit the messages exchanged for further scalability and availability purposes. Examples are Amazon SQS and RabbitMQ.
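As an illustration with SQS, the producer returns immediately while a separate worker drains the buffer at its own pace. The queue URL and the `resize_image` function below are hypothetical, and AWS credentials are assumed to be configured in the environment:

```python
import json
import boto3  # assumes AWS credentials are configured in the environment

sqs = boto3.client("sqs")
QUEUE_URL = "https://sqs.eu-west-1.amazonaws.com/123456789012/image-resize"  # hypothetical

def enqueue_resize_job(image_id: str) -> None:
    """Producer: hand the heavy work to the queue and return immediately."""
    sqs.send_message(QueueUrl=QUEUE_URL,
                     MessageBody=json.dumps({"image_id": image_id}))

def worker_loop() -> None:
    """Consumer: drains the buffer at its own pace, regardless of traffic spikes."""
    while True:
        batch = sqs.receive_message(QueueUrl=QUEUE_URL,
                                    MaxNumberOfMessages=10,
                                    WaitTimeSeconds=20)
        for message in batch.get("Messages", []):
            job = json.loads(message["Body"])
            resize_image(job["image_id"])  # hypothetical heavy-duty operation
            sqs.delete_message(QueueUrl=QUEUE_URL,
                               ReceiptHandle=message["ReceiptHandle"])
```

If traffic spikes, messages simply accumulate in the queue instead of overloading the workers, which is exactly the buffering behaviour described above.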
Database replication
By replicating the same data across multiple databases, it is possible to parallelise read operations. This technique is most useful when the number of read operations far exceeds the number of write operations. Modern DB systems provide replication out of the box, usually via a master-slave configuration: data is written to one or more masters only, and read from the slaves (the copies).
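At the application level this usually means read/write splitting. A minimal sketch assuming a PostgreSQL cluster with streaming replication already set up (the hostnames and schema below are hypothetical):

```python
import random
import psycopg2  # assumes a PostgreSQL setup with replication configured

# Hypothetical connections: one master for writes, two read replicas.
master = psycopg2.connect("host=db-master.internal dbname=shop")
replicas = [
    psycopg2.connect("host=db-replica-1.internal dbname=shop"),
    psycopg2.connect("host=db-replica-2.internal dbname=shop"),
]

def record_order(user_id: int, amount: int) -> None:
    """All writes go to the master, which replicates them to the copies."""
    with master, master.cursor() as cur:
        cur.execute("INSERT INTO orders (user_id, amount) VALUES (%s, %s)",
                    (user_id, amount))

def list_orders(user_id: int) -> list:
    """Reads are spread across the replicas, parallelising the read load."""
    replica = random.choice(replicas)
    with replica.cursor() as cur:
        cur.execute("SELECT id, amount FROM orders WHERE user_id = %s",
                    (user_id,))
        return cur.fetchall()
```

Note that a replica may lag slightly behind the master, which is the consistency trade-off discussed next.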
Bear in mind that replication introduces complications related to data consistency among the copies. As dictated by the CAP theorem, only two of the three properties Consistency, Availability and Partition tolerance can be achieved at the same time. Which pair is chosen usually depends on the nature of the system and the data it stores. For example, a payment system might need Consistency, while an ordering system might prioritise Availability. MongoDB leans towards being a CP (Consistency and Partition tolerance) type of system, while Cassandra is an AP (Availability and Partition tolerance) system.
Database sharding
Sharding is the concept of distributing data across multiple database servers, called shards. How data is distributed among the shards is usually determined by one or more sharding keys, a subset of the data keys. Sharding is also provided out of the box by most modern DB systems. MongoDB, for example, supports sharding with the ability to dynamically re-shard data, with no downtime, when shards become unbalanced.
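To make the idea concrete, here is a minimal sketch of hash-based routing on a sharding key. The shard hostnames are hypothetical, and each entry would in practice be a connection to one database server:

```python
import hashlib

# Hypothetical shard pool: each entry stands for one database server.
SHARDS = ["shard-0.internal", "shard-1.internal",
          "shard-2.internal", "shard-3.internal"]

def shard_for(user_id: str) -> str:
    """Map a sharding key to a shard.

    A stable hash is used (not Python's built-in hash(), which varies
    between processes) so the same user always maps to the same shard,
    and users spread roughly evenly across the pool.
    """
    digest = hashlib.md5(user_id.encode()).hexdigest()
    return SHARDS[int(digest, 16) % len(SHARDS)]
```

A plain modulo scheme like this remaps most keys whenever a shard is added or removed, which is exactly why managed re-sharding (as in MongoDB) or consistent hashing is used in real systems.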
Denormalisation and precomputation
Another common technique to increase read performance is to have the data ready when it is needed. DB joins can be avoided by denormalising data: denormalising essentially means copying the joined data into the source table. Of course, this results in data duplication and introduces consistency problems that need to be taken care of. This is one of the optimisations that the Facebook timeline uses to provide real-time data; see the article Building Timeline: Scaling up to hold your life story for more details.
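A small sketch of the trade-off, on a hypothetical posts/users schema: the denormalised read touches a single table, but every update to the duplicated field must repair all the copies:

```python
# Normalised: every timeline read pays for a join against the users table.
NORMALISED_READ = """
SELECT posts.body, users.display_name
FROM posts JOIN users ON users.id = posts.author_id
WHERE posts.timeline_id = %s
"""

# Denormalised: the author's name is copied into each post row at write
# time, so the hot read path touches a single table and skips the join.
DENORMALISED_READ = """
SELECT body, author_display_name
FROM posts
WHERE timeline_id = %s
"""

def rename_user(cur, user_id: int, new_name: str) -> None:
    """The price of duplication: an update must repair every copy."""
    cur.execute("UPDATE users SET display_name = %s WHERE id = %s",
                (new_name, user_id))
    cur.execute("UPDATE posts SET author_display_name = %s WHERE author_id = %s",
                (new_name, user_id))
```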
Data centres
The whole backend architecture can be replicated in multiple geographical locations to increase the availability and performance of the system. Normally, the closest geographical data centre is used, which reduces the latency of all requests. Furthermore, when a data centre fails, requests can be rerouted to another one, making the system more available as a whole. For example, Amazon provides data centres scattered across the (rich part of the) world. To increase availability and scalability even further, Amazon also provides another level of subdivision within each region, called “Availability Zones”.
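The routing logic boils down to "closest healthy location wins". A toy sketch with a hypothetical latency table:

```python
# Hypothetical latencies (ms) from a client's region to each data centre.
LATENCY_MS = {"eu-west-1": 18, "us-east-1": 95, "ap-southeast-1": 240}

def pick_data_centre(healthy: set[str]) -> str:
    """Route to the closest healthy data centre; fail over if it is down."""
    candidates = [dc for dc in LATENCY_MS if dc in healthy]
    if not candidates:
        raise RuntimeError("no healthy data centre available")
    return min(candidates, key=LATENCY_MS.get)

# Normal operation: the nearest centre wins.
assert pick_data_centre({"eu-west-1", "us-east-1"}) == "eu-west-1"
# eu-west-1 fails: traffic is rerouted to the next closest centre.
assert pick_data_centre({"us-east-1", "ap-southeast-1"}) == "us-east-1"
```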