How to scale a web application to handle an infinite amount of traffic

Michael Choi
19 min read · Apr 28, 2020

This is the second part of the Ikea store analogy, covering how to set up a web application so that it can handle an infinite amount of traffic.

First of all, if you haven’t read the first part of the article, please read that first. The analogy will make a lot more sense once you’ve read the first article, which covers the HTTP request/response cycle and introduces the Ikea store analogy.

Back to the Ikea store analogy

Imagine that your Ikea store suddenly has a big problem. It has become so popular that tens of thousands of people visit every day. In fact, a customer who arrives now has to wait 5+ hours just to walk into the store and place an order!

“It’s crazy that customers have to wait this long!” you tell your friends and your employees. So you start looking for an experienced general manager to take care of the store and fix these issues!

So you start interviewing a bunch of managers, essentially asking each of them: “How do you plan on solving this issue?!”

A bunch of experienced managers give you a lot of opinions. Some of them tell you: “You need to hire C++ workers. They are so much faster than your current workers, who speak Python/JavaScript/Node.js. In fact, they assemble furniture faster than workers of any other language by a factor of 10!”

Other managers you’ve interviewed say: “You need to move to a bigger store! Your store needs more space and more workers. Upgrade to a bigger store so that you can have more workers!”

Still other managers say: “You need to use this new cool database that just came out. It’s faster than any other database by a factor of 20! It’s the greatest thing in the world!”

The answers these candidates give you are endless, and you start getting overwhelmed, not sure who is right or wrong. After all, these are the “experts” you’re speaking to, right?

Thought exercise: if you were a manager applying for this role, what would your response be? Take 3 minutes to write down how you would improve this store before going any further.

So who was right?

The answer is that none of them were.

Scenario 1

Imagine that this was what was happening in the store:

  1. Receptionists were handling requests extremely quickly, say in 0.0001 seconds.
  2. Workers were extremely fast and processed all the instructions and assembled all the furniture (once the parts were provided from the warehouse) in 0.00001 seconds.
  3. The database was really slow. The warehouse manager didn’t have a good system for organizing the inventory, so they ended up opening every single drawer one by one to find out if that drawer had the part the worker needed! With, say, 10,000,000 drawers, it took a really, really long time for the warehouse worker to find all the parts the workers needed. Say this was the process that took 5 hours.

Imagine that, with the above scenario, you did indeed fire all your workers and replaced them with C++ workers who did things, say, 10 times faster. The workers’ step drops from 0.00001 seconds to 0.000001 seconds, so customers, instead of waiting 5 hours, now wait 5 hours minus 0.000009 seconds. Was that a smart thing to do?

How this database issue can be resolved is shown in the next scenario, although that scenario has a different problem of its own.

Scenario 2

Imagine this was the process:

  1. The receptionist handled requests extremely quickly, again in 0.0001 seconds.
  2. Workers had inefficient instructions they were following. For example, instead of visiting the warehouse once to get all the parts they needed, they went back to the warehouse every time they needed a new part. To build 500 Ikea desks, they went to the warehouse 500 × 4 times to get each of the legs, and then 500 more times to get each tabletop, and so on. On top of this, for things they could have done in 3 steps, they were given instructions that took 50 steps instead. All of this piled up, and the workers became the bottleneck, taking close to 5 hours to assemble the furniture.
  3. The database was not slow this time and delivered the needed parts very quickly. This warehouse manager had decided to create an ‘index’, essentially a catalog where they could look up the name of the part a worker needed and see which aisle/section to go to. For example, to retrieve a silver Ikea chair leg, they could simply walk over to Aisle AA, Section 10 and find the part there! This was a lot more efficient than starting from Aisle 1 and checking each section one by one until the right part was found!

For example, this database index would have looked something like this:
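As a rough sketch of the same idea in code, here is what creating and using an index looks like in SQLite (the table, columns, and part names are made up purely for illustration):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE parts (id INTEGER PRIMARY KEY, name TEXT, aisle TEXT, section INTEGER)")
conn.executemany(
    "INSERT INTO parts (name, aisle, section) VALUES (?, ?, ?)",
    [("silver chair leg", "AA", 10), ("oak tabletop", "B", 3), ("hex bolt", "C", 7)],
)

# Without an index, a lookup by name scans every row ("opens every drawer").
# With the index below, the database can jump straight to the matching row.
conn.execute("CREATE INDEX idx_parts_name ON parts (name)")

row = conn.execute(
    "SELECT aisle, section FROM parts WHERE name = ?", ("silver chair leg",)
).fetchone()
print(row)  # ('AA', 10)
```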

So, looking at this scenario, would it have helped to replace your Python workers with, say, Java workers, even if that brought a 10% improvement in speed?

The answer in this scenario is not to replace the workers but to find a more efficient way for them to produce the final output. For example, you can identify all the queries that are executed to render a particular page and then analyze whether you could obtain the same results with fewer queries, by using indexes or foreign keys, or by avoiding nested queries. This is especially worth checking if your developers are using ORM libraries, as an ORM can make it easy to introduce a lot of repetitive queries that could really be combined into a single query.

For example, years ago, I consulted for a site that crowdsourced great deals from the internet. Its homepage showed the top deals of the day, and it was taking unusually long to load, often over 30 seconds. When we looked further, it turned out the developer’s logic was as follows:

  1. they fetched all the deals from the database (let’s say there were N deals)
  2. they had a for loop where they iterated through each deal and did some computation for it. To get this computation, they sent a query to the database on every iteration.
  3. inside this for loop, there was another for loop making yet another query to the database. Not exactly sure why.

In essence, for N deals in their database, they were making N² queries. With only 10,000 deals (which is a really small number for a database), the homepage was making a whopping 100,000,000 queries (that is a lot; you usually want to limit the number of queries per page to roughly 3–20).

We simply removed the two for loops, did the computation in 1–3 queries, and got the site to load in under 0.5 seconds. They had previously been thinking about upgrading their servers, believing they needed a lot more of them to handle the traffic. That wasn’t needed once the code was optimized. This example shows how the workers’ flow can be streamlined and why it’s good to have a developer who always thinks about how to get the workers to do what is needed most efficiently. A good developer can write steps where the workers are very productive, while an inexperienced developer can write logic that slows down the entire process.
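To make the pattern concrete, here is a simplified sketch of the before-and-after (not the client’s actual code; the tables, columns, and scoring logic are hypothetical):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE deals (id INTEGER PRIMARY KEY, title TEXT)")
conn.execute("CREATE TABLE votes (deal_id INTEGER, score INTEGER)")
conn.executemany("INSERT INTO deals (title) VALUES (?)", [("Deal %d" % i,) for i in range(1, 4)])
conn.executemany("INSERT INTO votes VALUES (?, ?)", [(1, 5), (1, 3), (2, 4), (3, 1)])

# Slow version: one query to fetch the deals, then one more query per deal
# (N+1 in total, and N^2 once another loop of queries is nested inside).
def top_deals_slow():
    results = []
    for deal_id, title in conn.execute("SELECT id, title FROM deals"):
        score = conn.execute(
            "SELECT COALESCE(SUM(score), 0) FROM votes WHERE deal_id = ?", (deal_id,)
        ).fetchone()[0]
        results.append((title, score))
    return sorted(results, key=lambda r: r[1], reverse=True)

# Fast version: a single aggregate query with a join does the same work.
def top_deals_fast():
    return conn.execute(
        """SELECT d.title, COALESCE(SUM(v.score), 0) AS score
           FROM deals d LEFT JOIN votes v ON v.deal_id = d.id
           GROUP BY d.id ORDER BY score DESC"""
    ).fetchall()

print(top_deals_slow())
print(top_deals_fast())
```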

Scenario 3

Imagine that the reason the customers were waiting in line for so long was that someone sent a massive number of requests to your server, blocking the entire thread. This is often called a DDoS attack and is, in fact, quite easy to engineer. You could even pull up your browser, open the JavaScript console, and write a for loop that sends lots of Ajax requests to a server. If the server is not configured correctly, you can simply open 20 tabs in your browser, send 10,000 requests at a time, and cause the server to be backlogged with useless HTTP requests. All of this can be done with a few lines of JavaScript in any browser.

If this was the bottleneck, it wouldn’t help to have faster workers, receptionists, or databases. You could instead block these attacks by recording the attackers’ IP addresses, limiting the number of requests accepted from any single IP address, and so on.
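As a minimal sketch of the per-IP limiting idea (in practice this is usually done at the proxy or firewall layer rather than in application code, and the window and threshold below are arbitrary):

```python
import time
from collections import defaultdict, deque

WINDOW_SECONDS = 60
MAX_REQUESTS_PER_WINDOW = 100   # illustrative threshold, tune for your traffic

_recent = defaultdict(deque)    # ip -> timestamps of that ip's recent requests

def allow_request(ip, now=None):
    """Return False once an IP exceeds the limit within the sliding window."""
    now = time.time() if now is None else now
    timestamps = _recent[ip]
    while timestamps and now - timestamps[0] > WINDOW_SECONDS:
        timestamps.popleft()            # forget requests older than the window
    if len(timestamps) >= MAX_REQUESTS_PER_WINDOW:
        return False                    # too many recent requests: reject early
    timestamps.append(now)
    return True
```

In your request handler you would call allow_request(client_ip) before doing any real work and return an error (for example, HTTP 429) when it comes back False.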

You can also look at technologies such as Cloudflare to help manage these scenarios.

Scenario 4

Let’s say that the users thought the reason they were waiting, say, 30–40 seconds for your page to load was that your web server was just slow and not optimized. Let’s say, however, that upon closer inspection, what was happening was as follows:

  1. The receptionist, workers, and warehouse all worked really quickly, generating the appropriate HTTP response within, say, 0.5 seconds.
  2. When this response was sent back to the browser, it turned out the browser needed to make many additional requests to fully render the page. For example, say there were dozens of JavaScript files, dozens of image files, and some of the images were, say, 10MB–100MB in size each!

In this scenario, you could have used Chrome’s ‘Network’ tab to visualize this. Chrome’s Network tab displays this information for every HTTP request the browser sends to render the page. Note that each CSS, JavaScript, and image file requires a separate HTTP request.

Note that a total of 60 requests were made and 1.5MB of data was transferred (shown in the bottom left corner). Imagine that this was instead 50MB of data (due to large uncompressed files or extremely high-resolution photos) and the user’s internet speed was only 1MB/s. That would be roughly 50 seconds of waiting for the page to render, even if your web server was super quick. This is like the Ikea store generating truckloads of output but having all of it delivered by a tiny car on a single-lane dirt road. It’s not the store’s fault that the information took forever to arrive.

The ways to fix this issue (other than having the user get faster internet) are:

  • compressing and minifying your JavaScript/CSS files (removing unnecessary white space helps); a toy sketch follows this list
  • making sure the images are not higher in resolution than what you need
  • combining multiple images into a single image so that, instead of multiple HTTP requests, a single HTTP request grabs one image containing multiple icons/graphics (and then only a portion of that bigger image is used for each element)
  • making sure that any old or unnecessary CSS or JavaScript files are removed and not loaded
  • instead of hosting public JavaScript or CSS libraries such as React, Angular, jQuery, Bootstrap, or Foundation yourself, serving them from a CDN so the browser can use a version it has already cached instead of downloading the file from your own server
  • just like combining multiple images into a single image, combining the CSS files that are used together into a single CSS file, and likewise combining the JavaScript files you use together into a single file, minified by removing line breaks and extra spaces
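In practice you would use a real bundler/minifier and let the web server compress responses; this toy sketch (with a made-up, repetitive stylesheet) just shows how much whitespace removal and compression shrink a text asset:

```python
import gzip
import re

# A made-up stylesheet with lots of repetition, like real CSS tends to have.
rule = """
.button {
    color:  #ffffff;
    background-color:  #3366ff;
    padding:  10px   20px;
}
"""
css = rule * 200

# Crude whitespace "minification" -- real minifiers are much smarter than this.
minified = re.sub(r"\s+", " ", css).strip()

print("original bytes:", len(css.encode()))       # roughly 20 KB
print("minified bytes:", len(minified.encode()))
print("gzipped bytes: ", len(gzip.compress(minified.encode())))  # a small fraction of the original
```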

Scenario 5

Imagine that most users were on the East Coast of the U.S.A. but your servers were somehow in the middle of India. It would take time for the HTTP request to travel across the continents and for the response to travel back. Yes, the internet is fast, but it still takes time for information to travel that far.

In this case, move your web servers to where the majority of your customers are.

Taking it a step further

The ultimate answer for scaling a web application is very simple: find the bottleneck and address it.

Some of the typical scenarios for the bottleneck were addressed above. Let’s take it a step further as the above scenarios are only helpful for optimizing a single server.

Let’s say that you have a store that’s optimized. You figured out how to get the workers to work most efficiently. You’ve streamlined how information is stored in the database. You’ve organized your database table neatly using indexes, foreign keys, storing things as integers as much as possible, breaking down large tables into smaller pieces that are more modular, etc.

Say that the site is doing well and it is able to handle a lot more traffic!

Say that after a few months, the server starts to suffer again and customers start waiting in line for minutes/hours again. What do you do next? Is it now time to buy a bunch of other computers in the cloud?

You can, but there is another thing you should try first. Take a moment to guess what that is.

Let me give you a hint. Imagine that in your web server, you had 1 million people lined up. Let’s say that 99% of them want the same thing: to visit your homepage, where they get to see the top 100 news items of the day. They line up, and your receptionist, workers, and warehouse folks work really hard to deliver that output. Each time a customer talks to the receptionist, the receptionist hands off that information to the workers at lightning speed. The workers also get busy very quickly and visit the database, which is really well organized and streamlined. It still takes a bit of time to obtain the 100 news items, but since everything is indexed well and neatly organized, the information is found fairly quickly. The workers assemble that information into a neat HTML, CSS, and JavaScript package and hand it off to the receptionist. This repeats over and over again.

Can you imagine the warehouse manager getting smart and saying, “Wait… instead of walking down the aisles and opening a bunch of drawers, why don’t I just make a copy of what was in there? After all, no items have been added, removed, or updated in the cabinets.”

So imagine that when a worker comes, they talk to the warehouse manager, and the manager hands them a copy of the top 100 news items without ever walking down the aisles to open any drawers! The warehouse manager is now freed up!

This concept is called caching, and in this example, the database decided to utilize caching.

Imagine that later, the workers find out about this trick, so they also create a copy of their output and, instead of even going to the warehouse, just hand off that copy to the receptionist.

You can imagine that the receptionist can follow a similar pattern: instead of handing off the order to the workers, they can also just save a copy and hand that copy to the customer who wanted the top 100 news items of the day.

All these layers of caching were done on the server side. Caching can also happen before the request even hits the web server. Your internet service provider can cache popular pages and send you the response without ever directing the request to the actual web server. Your browser can also cache previously requested files, loading your web page much faster because many of the files are already stored locally.

Common files can also be served from a single place (a CDN) so that your browser requests them once and keeps them cached instead of downloading the same CSS or JavaScript file every time it visits a site that uses it. This is why many common CSS and JavaScript libraries (such as Bootstrap, Foundation, React, and Angular) are hosted on CDNs: these files are quite large, and downloading them from every web application that uses them would be time-consuming.

Examples of caching
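As a minimal sketch of the server-side version of this idea, here is a tiny in-memory cache with an expiry time wrapped around a pretend “top 100 news” page (the function, cache key, and TTL are all made up for illustration):

```python
import time

CACHE_TTL_SECONDS = 60   # illustrative: rebuild the cached page at most once a minute
_cache = {}              # key -> (expires_at, value)

def build_top_news_page():
    """Stand-in for the expensive part: database queries, templating, etc."""
    time.sleep(0.5)
    return "<html>top 100 news of the day</html>"

def get_top_news_page():
    entry = _cache.get("top_news")
    if entry and entry[0] > time.time():
        return entry[1]                    # cache hit: hand back the saved copy
    value = build_top_news_page()          # cache miss: do the real work once
    _cache["top_news"] = (time.time() + CACHE_TTL_SECONDS, value)
    return value

print(get_top_news_page())   # takes ~0.5s, builds the page
print(get_top_news_page())   # returns instantly from the cache
```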

What’s the next step to scale my web server even more?

Addressing bottlenecks within a server can improve the performance of your web server significantly. In fact, my rule of thumb is that a server that costs less than $20/month should easily handle about 1M visitors per month for a typical web application. If you have less than 1M visitors per month and you’re spending a lot more than $20/month, there is a high chance that your server hasn’t been optimized yet. In that case, measure each step of the HTTP request/response cycle, find out what the bottleneck is and optimize that part of the process until you can’t optimize any further. Follow the steps outlined above.
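One low-tech way to do that measuring is to time each stage of a request and see where the seconds actually go. The stage names and sleeps below are placeholders for your real queries, logic, and rendering:

```python
import time
from contextlib import contextmanager

@contextmanager
def timed(stage):
    """Print how long one stage of handling a request took."""
    start = time.perf_counter()
    yield
    print(f"{stage}: {time.perf_counter() - start:.3f}s")

def handle_request():
    with timed("database queries"):
        time.sleep(0.40)    # stand-in for the real queries
    with timed("business logic"):
        time.sleep(0.01)
    with timed("template rendering"):
        time.sleep(0.05)

handle_request()   # the stage with the biggest number is your bottleneck
```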

Once you’ve optimized each process, here is what you can do next.

Get a bigger store

Whether this is the best plan or not, we’ll discuss later when additional concepts are introduced, but for now, let’s imagine that this is the game plan. In other words, you decide to upgrade your server to a computer with a lot more RAM and CPU power. More RAM and CPU is essentially like moving your store to a much bigger space where, with the added room and computing power, you can hire a lot more receptionists and workers. With this, say you’re now spending $50/month compared to $20/month previously.

This can be a short term boost. However, even the largest computer has a limit on RAM and CPU and sooner or later, you’ll come to the point where a single massive computer just isn’t enough to handle all the web traffic.

Separate the database

Another thing you could do is create a dedicated server that only hosts the database. As the database usually takes up a lot of storage space, separating it from the workers and the web server can result in a significant performance boost. Once this is done, your servers would look as follows:

Note that your database server would most likely have a different IP address.
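At the code level, this change is often just pointing the application at the database server’s address instead of localhost. The host names and credentials below are placeholders:

```python
import os

# Before the split, the app and the database shared one machine:
#   DATABASE_HOST = "127.0.0.1"
# After the split, the database lives on its own server with its own address.
DATABASE_CONFIG = {
    "host": os.environ.get("DATABASE_HOST", "10.0.1.25"),   # the database server's IP
    "port": int(os.environ.get("DATABASE_PORT", "5432")),
    "name": os.environ.get("DATABASE_NAME", "app_production"),
    "user": os.environ.get("DATABASE_USER", "app"),
    "password": os.environ.get("DATABASE_PASSWORD", ""),
}
```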

Scaling the database

With two separate servers, your site should handle a lot more traffic, but what do you do if it’s still not enough and the database becomes the bottleneck again?

There are a few options:

1. Create multiple database servers that are identical to each other but where the load can be shared

You could create copies of the database server and, in essence, add a ‘load balancer’, which is essentially a receptionist who routes each request to a different server.

Instead of sending, say, 1M requests all to a single database server, you could set up 5 database servers and have each one serve 200,000 requests. Assuming your database can handle that many requests, this is a perfectly fine way of scaling your database servers. With this, your store would look as follows:

Now, if you needed more database servers, you could easily set up more and scale “up” or “down” based on your needs. Using the cloud, you can also easily write code so that the number of replica servers goes “up” or “down” based on traffic.
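As a toy sketch of the routing idea, here is a round-robin picker that sends writes to the primary and spreads reads across the replicas (the host names are placeholders; real setups usually rely on a managed load balancer or a database proxy to do this):

```python
import itertools

PRIMARY = "db-primary.internal"                               # placeholder host name
READ_REPLICAS = [f"db-replica-{i}.internal" for i in range(1, 6)]

_next_replica = itertools.cycle(READ_REPLICAS)

def pick_database(query):
    """Send writes to the primary; spread read queries round-robin over the replicas."""
    is_read = query.lstrip().lower().startswith("select")
    return next(_next_replica) if is_read else PRIMARY

print(pick_database("SELECT * FROM deals LIMIT 100"))              # one of the replicas
print(pick_database("INSERT INTO deals (title) VALUES ('...')"))   # the primary
```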

2. “Shard” the database into multiple smaller databases, where each one is not a clone of the others but holds a smaller portion of the data

Say that even the largest database server can only hold 1TB of data, and your database has grown so large that a single server (even the biggest one you can buy) just can’t hold it anymore.

Before you explore databases built for really large data (like Oracle or Hadoop), you could also just break your large database into multiple smaller databases. For example, you could split it into 10 smaller databases and, depending on the operation, go to the appropriate database server to retrieve the information you need. This route does get more complex, especially when one database server doesn’t have all the information you need and you have to query multiple servers to piece the answer together. For now, just know that this option is a possibility.
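A toy sketch of the routing piece: hash a key (here, a hypothetical user ID) to pick which of the 10 smaller databases to talk to. Real sharding schemes also have to handle cross-shard queries and rebalancing, which this ignores:

```python
import hashlib

SHARDS = [f"db-shard-{i}.internal" for i in range(10)]   # placeholder host names

def shard_for(user_id):
    """Deterministically map a user to one of the 10 smaller databases."""
    digest = hashlib.md5(str(user_id).encode()).hexdigest()
    return SHARDS[int(digest, 16) % len(SHARDS)]

print(shard_for(42))        # always the same shard for the same user
print(shard_for(1000001))   # a different user may live on a different shard
```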

3. Use Amazon or other cloud service’s database

You could also just have Amazon AWS handle all the database complexities, including scaling the database servers up and down. This can feel like an easy way out, but it can be an effective strategy. If you do this, though, make sure you’ve fully optimized your database first. A common mistake is to move unoptimized database tables over and end up spending thousands or tens of thousands of dollars each month on server costs. If you optimize heavily before relying on these cloud databases, you can often cut the server cost by a factor of 10–100. If you’re VC funded, have a lot of money, and thousands or tens of thousands of dollars per month is not a big deal, then maybe this is okay…

4. Oracle or Hadoop

If you need to handle an even larger amount of data, you can explore these options. Amazon AWS and other cloud services have also started offering solutions for handling large amounts of data. Hadoop is very interesting: it utilizes a lot of small servers, maps the tasks that need to be done out to those servers, and then reduces (combines) their partial results into a final answer. In essence, this “map-reduce” approach makes Hadoop very robust, very affordable, and extremely powerful for analyzing large sets of data. Most web applications, however, don’t need to go to this stage.
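To illustrate just the map/reduce idea (not Hadoop itself), here is a toy word count in plain Python; a real Hadoop job spreads the map and reduce steps across many machines:

```python
from collections import Counter
from functools import reduce

documents = [
    "scale the web application",
    "scale the database",
    "cache the web page",
]

# Map step: each "worker" turns its document into partial word counts.
partial_counts = [Counter(doc.split()) for doc in documents]

# Reduce step: merge the partial results into one combined answer.
total_counts = reduce(lambda a, b: a + b, partial_counts, Counter())
print(total_counts.most_common(3))   # e.g. [('the', 3), ('scale', 2), ('web', 2)]
```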

As a rule of thumb, if your database is no more than a few hundred GB, a single database server should be enough. If your database is less than 16 TB, you can use Amazon RDS to handle the complexities. If you have data even bigger than that, look at Oracle or Hadoop. In fact, Amazon AWS even has services that let you run Oracle or Hadoop on AWS too.

Scaling the web servers

Note that so far, we’ve only scaled up or down the database. If the web server is also becoming a bottleneck, you can also easily set up a load balancer and have multiple web servers. When you do this, it would look as follows:

You could also scale the number of web servers up or down and even write a script to manage how many web servers are running. This way, you can have lots of web servers up and running when traffic is high and reduce this to a single server when traffic is low.
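The decision logic for such a script can be very simple; the actual calls to add or remove servers depend entirely on your cloud provider’s API, so this sketch (with made-up capacity numbers) only shows how many servers to aim for:

```python
import math

MIN_SERVERS = 1
MAX_SERVERS = 10
REQUESTS_PER_SERVER = 200   # illustrative capacity of one web server (requests/second)

def desired_server_count(requests_per_second):
    """Scale the fleet with traffic while staying within the min/max bounds."""
    needed = math.ceil(requests_per_second / REQUESTS_PER_SERVER)
    return max(MIN_SERVERS, min(MAX_SERVERS, needed))

print(desired_server_count(50))     # 1  -- quiet night, a single server is enough
print(desired_server_count(1500))   # 8  -- traffic spike, add servers
```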

Introduction of Docker and containerization

If you’ve been around a while, you’ll remember how the power of the virtual machine changed the whole software development industry. Before, people used to buy an actual server and host it themselves, either at their office or at home. If you needed a faster computer with more RAM and CPU power, you bought a new, more powerful computer and set up the server yourself.

The concept of a virtual machine changed everything. Now one could buy an extremely powerful (and extremely expensive) computer and split its resources to create many virtual computers, easily adjusting the amount of resources (CPU, RAM, disk storage) given to each ‘virtual’ computer. This, in essence, allowed services like Amazon AWS and other cloud technology companies to scale their operations massively.

This way, when you rent a computer from the cloud with 8GB of RAM and a certain amount of CPU power, you’re not actually getting a physical computer of that size; you’re getting a virtual machine with that amount of dedicated resources.

Just as virtual machines changed the industry, I believe a newer concept called ‘containerization’ will transform it again.

The good news is that once everything is set up, this makes managing servers a lot easier, particularly managing the number of web or database servers needed to handle the traffic. The bad news is that there is a steep learning curve, and it will probably take at least 3–6 months for someone unfamiliar with containerization to learn it and apply it in their workflow.

In a nutshell, think of a virtual machine as breaking a giant computer into multiple virtual computers. A virtual machine is nice, but the full operating system and all the dependencies (basically, all the software required for the web server as well as the operating system itself) consume vast amounts of RAM, disk space, and CPU power. Imagine if you could strip most of that out and package just your application and its dependencies into a ‘container’ that runs largely independently of the host operating system.

This results in a lot of benefits. For example, now the container is significantly smaller! As an operating system can be 100–1000 times bigger than your actual code, now you’re saving a lot of disk space as well as money!

In addition, developers used to spend a lot of time getting the virtual machine configured just right for their code to run smoothly. For example, code that worked just fine on a Mac would not work the same way on Windows or on certain Linux distributions (e.g., CentOS, Ubuntu). What worked on Ubuntu x.x would not work on another version, and so on.

This usually required separate engineers just to manage all the servers. These engineers were called DevOps engineers, and they were responsible for managing all the server configurations.

Now, with containerization, some of what DevOps engineers had to do is no longer necessary, but developers are expected to know how to set up these containers and do much of what DevOps used to do. The expectations for a good software developer have grown: a software engineer is now expected to handle a lot of the work that DevOps used to own.

Docker is one of the pioneers of this space. Google also built ‘Kubernetes’, which can automatically deploy and manage these containers for you.

Summary

There was a lot of information covered in this article. We covered how all the different web technologies come together, what the role of the web server is, how different programming languages play a role, what databases are, why there are different frameworks as well as databases, how to scale web servers and databases up and down, the role of load balancers, what indexing and caching are, and how to troubleshoot scalability issues for common bottlenecks within a web application.

Although this information can be overwhelming to digest at first, if you are ever faced with a scalability issue, I recommend thinking back to the Ikea store analogy and asking what you would do in that scenario, remembering that it’s always best to identify the bottleneck and address it before taking any hasty actions. Also, instead of scaling the number of servers up, focus relentlessly on optimizing a single server first. By optimizing your server, you can often get 10–100 times more out of the original server than the initial setup provided. Once all of that is done and each step of the HTTP request/response cycle is optimized, then you can get a more powerful server before adding load balancers and increasing the number of servers supporting your application. After that, you can consider moving all of your services to the cloud. This will make sure that your servers run efficiently and that you save money by not buying additional servers you may not need.

Liked what you’ve read?

If you liked what you read, please follow me to get notified of other articles I will be writing. You can also check other articles I’ve written about education, coding, and technology.
