How to improve Software Architecture in a Cloud Environment?

About this article
The need for software architecture today has grown more critical due to the increasing complexity, scale, and expectations of modern software systems. Applications today aren't simple. They involve multiple layers: frontend, backend, databases, integrations, microservices, and sometimes even AI/ML components. A strong architecture provides a roadmap for organizing this complexity into manageable pieces, making it easier to develop, maintain, and scale.
This article explains how we can improve the existing architecture on a project to make it more robust and powerful for all of today's challenges.
Introduction
When creating an architecture for any application, the first step is to understand the requirements, that is, a formal description of what we need to build. Types of requirements are:
- Functional requirements - Features of the system
- Multi-Tier Architecture (Monolithic app with or without an API Gateway)
- Microservices Architecture
- Event-Driven Architecture
Existing application
Now, we can focus on an example of how we can improve the software architecture of an existing application, which is the goal of this text.
Let's assume we have an existing app similar to Instagram ( https://www.instagram.com/ ), Tripadvisor ( https://www.tripadvisor.com/ ) or TikTok ( https://www.tiktok.com/ ), with the following functional requirements:
- A user can sign up and log in to post, vote, or comment. A post consists of a title, topic tags, and a body (text and/or uploaded images and/or uploaded files)
- A user should be able to comment on any existing post
- Comments are ordered chronologically as a flat list
- A logged-in user can upvote or downvote an existing post or comment
- The homepage presents the most popular posts of the last 24 hours
The current implementation of the app is shown in the following architecture diagram:
- Web App Service - A central service that serves the web pages responsible for presenting content
- Users Service - Used for creating users and logging them in
- Post & Comments Service - Used for creating and storing posts and comments
- Ranking Service - Ranks posts efficiently by their popularity in the last 24 hours
- Votes Service - Stores the votes
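As a rough illustration of what the Ranking Service computes, the sketch below picks the top posts from the votes of the last 24 hours. It assumes that "popularity" means net votes (upvotes minus downvotes) and that each vote arrives as a `(post_id, timestamp, delta)` tuple; both the definition and the names are assumptions for illustration, not part of the original design.

```python
import heapq
from collections import Counter
from typing import Iterable, List, Tuple

WINDOW_SECONDS = 24 * 3600  # the 24-hour popularity window


def top_posts(
    votes: Iterable[Tuple[str, float, int]], now: float, k: int = 10
) -> List[Tuple[str, int]]:
    """Return up to k (post_id, net_votes) pairs, most popular first.

    votes: (post_id, timestamp_seconds, delta) where delta is +1 or -1.
    Assumption: popularity is the net vote count inside the window.
    """
    scores: Counter = Counter()
    for post_id, timestamp, delta in votes:
        if now - WINDOW_SECONDS <= timestamp <= now:
            scores[post_id] += delta
    return heapq.nlargest(k, scores.items(), key=lambda item: item[1])
```

A production service would more likely maintain these counters incrementally than rescan all votes on every request; the scan is kept here only to make the windowing logic visible.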
Next, we consider the non-functional requirements, which describe the quality of our application under real-world usage. These are the Non-Functional Requirements for this application:
- Scalability (the app should support millions of daily users)
- Performance (response time under 500 ms at the 99th percentile)
- Fault Tolerance/ High Availability (99.9%)
- Availability and Partition Tolerance (we prioritize "Availability + Partition Tolerance" (AP) over "Consistency + Partition Tolerance" (CP); per the CAP theorem, a distributed system cannot guarantee both availability and consistency while a network partition is in progress)
- Durability
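To make the performance and availability targets concrete, here is a small sketch of the arithmetic behind them. The function names are illustrative, and the nearest-rank percentile is only one common definition; the requirements above do not prescribe a measurement method.

```python
from typing import List


def allowed_downtime_hours(availability_percent: float,
                           period_hours: float = 365 * 24) -> float:
    """Downtime budget (in hours) for an availability target over a period."""
    return period_hours * (1 - availability_percent / 100)


def p99(latencies_ms: List[float]) -> float:
    """99th percentile latency using the nearest-rank method."""
    if not latencies_ms:
        raise ValueError("at least one sample is required")
    ordered = sorted(latencies_ms)
    rank = (99 * len(ordered) + 99) // 100  # ceil(0.99 * n), 1-based
    return ordered[rank - 1]


def meets_latency_target(latencies_ms: List[float],
                         limit_ms: float = 500) -> bool:
    """True if the 99th percentile is below the limit."""
    return p99(latencies_ms) < limit_ms
```

For a 365-day year, `allowed_downtime_hours(99.9)` comes to roughly 8.76 hours, which is the total outage budget that the fault tolerance measures have to stay within.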
The question is: "How can we improve the current architecture to cover the required Non-Functional Requirements?"
It becomes manageable if we go step by step and address the requirements one by one.
Scalability
How can we improve scalability? We can place a Load Balancer in front of the Web App Service and in front of each of the other services. This lets us scale out every service independently, depending on the traffic it receives.
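The core idea of a load balancer can be sketched in a few lines. This is a minimal round-robin example with hypothetical instance names; real load balancers such as the ones listed at the end of this article also handle health checks, connection draining, and other distribution policies.

```python
from itertools import cycle
from typing import Iterable


class RoundRobinBalancer:
    """Hands out service instances in a fixed rotation."""

    def __init__(self, instances: Iterable[str]) -> None:
        instance_list = list(instances)
        if not instance_list:
            raise ValueError("at least one instance is required")
        self._rotation = cycle(instance_list)

    def next_instance(self) -> str:
        """Return the instance that should receive the next request."""
        return next(self._rotation)
```

Scaling out then means adding more instances behind the same balancer, without clients needing to know how many there are.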
Then we can place an API Gateway to decouple the frontend code from the system's internal structure. This will allow us to make changes internally and increase our organizational scalability.
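The decoupling an API Gateway provides can be illustrated with a simple routing table. The paths and backend names below are hypothetical; the point is only that the frontend sees stable public paths while the mapping to internal services can change freely.

```python
from typing import Dict, Optional

# Hypothetical public path prefixes mapped to internal services.
ROUTES: Dict[str, str] = {
    "/users": "users-service",
    "/posts": "posts-comments-service",
    "/votes": "votes-service",
    "/ranking": "ranking-service",
}


def resolve_backend(path: str, routes: Dict[str, str] = ROUTES) -> Optional[str]:
    """Return the backend for a request path using longest-prefix matching."""
    for prefix in sorted(routes, key=len, reverse=True):
        # Match the prefix itself or a sub-path, but not "/postscript".
        if path == prefix or path.startswith(prefix + "/"):
            return routes[prefix]
    return None
```

If the Post & Comments Service were later split in two, only this table would change; the frontend code would keep calling the same paths.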
On the posts and comments database side, as we store more posts and comments, our database instance may run out of space. It also may not be able to handle the traffic from so many users.
So the solution will be sharding our posts and comments collection across different database shards. Database sharding is the process of storing a large database across multiple machines. A single machine, or database server, can store and process only a limited amount of data. Database sharding overcomes this limitation by splitting data into smaller chunks, called shards, and storing them across several database servers. All database servers usually have the same underlying technologies, and they work together to store and process large volumes of data. This way we can scale to as many posts or comments as we want to, by adding more instances and distributing our data across all those instances.
We will use a range sharding strategy which many NoSQL databases support. Range-based sharding involves dividing data into contiguous ranges determined by the shard key values. In this model, documents with "close" shard key values are likely to be in the same chunk or shard. This allows for efficient queries where reads target documents within a contiguous range.
In our case, the best fit for a range sharding strategy is a compound shard key made of the post ID and the comment ID.
In the worst case, the data for a single post will be split across two shards, as shown in the picture below:
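The range-based routing described above can also be sketched in code. The example assumes integer IDs and hypothetical shard names and split points; a real database chooses and rebalances the split points itself.

```python
from bisect import bisect_right
from typing import List, Sequence, Tuple

ShardKey = Tuple[int, int]  # (post_id, comment_id)


class RangeShardRouter:
    """Maps a compound key to a shard using sorted split points.

    Shard i holds the keys below split_points[i]; keys at or above the
    last split point go to the last shard.
    """

    def __init__(self, split_points: Sequence[ShardKey],
                 shard_names: Sequence[str]) -> None:
        if len(shard_names) != len(split_points) + 1:
            raise ValueError("need exactly one more shard than split points")
        self._split_points: List[ShardKey] = sorted(split_points)
        self._shard_names = list(shard_names)

    def shard_for(self, post_id: int, comment_id: int) -> str:
        index = bisect_right(self._split_points, (post_id, comment_id))
        return self._shard_names[index]
```

With split points `[(1000, 0), (2000, 500)]`, the key `(2000, 499)` lands on the second shard and `(2000, 500)` on the third, which is exactly the worst case where one post straddles two shards.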
Performance
One potential performance bottleneck is the image blob store. Every time a user loads a post or a news feed with the most popular posts, we need to serve images from our system. Loading those images in the browser can add a lot of latency.
To solve this problem, we can utilize a global network of CDN servers. A content delivery network (CDN) is a geographically distributed group of servers that caches content close to end users. A CDN allows for the quick transfer of assets needed for loading Internet content, including HTML pages, JavaScript files, stylesheets, images, and videos.
By using a pull model with a long TTL (time to live: in this context, how long a cached copy is served before the CDN fetches it from the origin again), images of popular posts are cached on edge servers located close to users in different geographical locations. This reduces the traffic load on our system and also significantly reduces the page load time for users. A few popular CDN solutions are Cloudflare CDN, Amazon CloudFront, Google Cloud CDN, Azure CDN, and Oracle CDN.
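In a pull model, the CDN fetches an image from our origin on the first request and then serves its cached copy. One common way for the origin to tell the CDN how long to keep that copy is the standard HTTP `Cache-Control` response header. The sketch below builds such a header; the function name and the seven-day default are assumptions for illustration, and each CDN also offers its own TTL settings.

```python
from typing import Dict

SEVEN_DAYS_SECONDS = 7 * 24 * 3600  # assumed "long" TTL for post images


def image_cache_headers(ttl_seconds: int = SEVEN_DAYS_SECONDS) -> Dict[str, str]:
    """Response headers the origin attaches to an image.

    "public" allows shared caches such as CDN edge servers to store the
    response; "max-age" is how long it may be reused, in seconds.
    """
    if ttl_seconds < 0:
        raise ValueError("ttl_seconds must not be negative")
    return {"Cache-Control": f"public, max-age={ttl_seconds}"}
```

A long TTL works best when an image never changes under the same URL, for example when an edited image is uploaded under a new URL.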
At the API Gateway level, we can introduce caching. At peak times, when a small group of posts is viewed by many users simultaneously, we can store those posts in a cache and refresh the cache every few minutes. This also reduces the response time for our users and goes hand in hand with our decision to prioritize availability over consistency.
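A minimal sketch of such a cache is shown below. It assumes a hypothetical `loader` function that fetches a post from the backend, and a two-minute TTL standing in for "every few minutes"; the clock is injectable so the expiry behaviour can be tested without waiting.

```python
import time
from typing import Any, Callable, Dict, Hashable, Tuple


class TTLCache:
    """Caches loader results and reloads an entry after it expires."""

    def __init__(self, loader: Callable[[Hashable], Any],
                 ttl_seconds: float = 120.0,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self._loader = loader
        self._ttl = ttl_seconds
        self._clock = clock
        self._entries: Dict[Hashable, Tuple[float, Any]] = {}

    def get(self, key: Hashable) -> Any:
        now = self._clock()
        entry = self._entries.get(key)
        if entry is not None and now < entry[0]:
            return entry[1]  # still fresh: no backend call
        value = self._loader(key)  # miss or expired: ask the backend
        self._entries[key] = (now + self._ttl, value)
        return value
```

Between refreshes, readers may see a slightly stale post, which is the availability-over-consistency trade-off in practice.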
At the database level, we can add indexes. For example, an index on the post ID makes looking up a particular post fast (typically logarithmic or near-constant time, depending on the index type) instead of scanning the whole collection of posts.
Similarly, we can reuse the compound index on the post ID and comment ID that we created for sharding the comments. This way, requesting a batch of comments for a given post will also be much faster. Of course, we can also use indexing for the other databases.
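To show why a compound index on the post ID and comment ID speeds up fetching a batch of comments, here is an in-memory sketch of the same idea. It assumes integer IDs and that comment IDs grow over time, so key order matches chronological order; the class is illustrative and not a database API.

```python
from bisect import bisect_left, insort
from typing import List, Tuple


class CommentIndex:
    """Sorted (post_id, comment_id) keys, like a compound index."""

    def __init__(self) -> None:
        self._keys: List[Tuple[int, int]] = []

    def add(self, post_id: int, comment_id: int) -> None:
        insort(self._keys, (post_id, comment_id))

    def comment_ids_for_post(self, post_id: int, limit: int = 50) -> List[int]:
        """Find the contiguous key range of one post by binary search."""
        # (post_id,) sorts before every (post_id, comment_id) pair.
        start = bisect_left(self._keys, (post_id,))
        end = bisect_left(self._keys, (post_id + 1,))
        return [comment_id for _, comment_id in self._keys[start:end][:limit]]
```

Because all comments of a post sit next to each other in key order, a batch is read as one contiguous slice instead of a scan over every comment.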
Votes present another performance problem. As we recall, the votes are stored in the Votes Service database as individual records, one per vote. However, when we show a post or a comment in the UI, we want to show the total number of upvotes and downvotes for each entity. So what we can do is introduce a Message Broker between the Votes Service and the Post & Comments Service. Additionally, we can add two fields to the posts and comments collections: one for total upvotes and one for total downvotes. Any time a user upvotes or downvotes a post or comment, the Votes Service adds that entry to its own database and publishes it as an event through the message broker. The Post & Comments Service then consumes this event and updates the totals, which are stored with the post or comment record.
This approach really helps when we have a short but sudden traffic spike, with many people voting on the same post or on several posts simultaneously: we can buffer those votes in the message broker, apply the updates at a sustainable rate, and complete them after the spike is over. Of course, while the backlog is being processed, users will not see the most up-to-date totals, but that is acceptable because we prioritized availability over consistency when we gathered our non-functional requirements.
Finally, now that we have the total number of up votes and down votes stored together with each post and comment, we can load that data easily and efficiently to the user.
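The vote flow can be sketched with an in-process queue standing in for the message broker. The class and field names are hypothetical, and `queue.Queue` only imitates the buffering; a real deployment would use a broker such as the ones listed at the end of this article.

```python
import queue
from dataclasses import dataclass
from typing import Dict, List, Optional, Tuple


@dataclass(frozen=True)
class VoteEvent:
    entity_id: str  # ID of the post or comment that was voted on
    is_upvote: bool


class VotesService:
    """Stores each vote individually and publishes an event for it."""

    def __init__(self, broker: "queue.Queue[VoteEvent]") -> None:
        self.votes: List[Tuple[str, str, bool]] = []
        self._broker = broker

    def cast_vote(self, user_id: str, entity_id: str, is_upvote: bool) -> None:
        self.votes.append((user_id, entity_id, is_upvote))
        self._broker.put(VoteEvent(entity_id, is_upvote))


class PostsCommentsService:
    """Consumes vote events and keeps running totals per entity."""

    def __init__(self, broker: "queue.Queue[VoteEvent]") -> None:
        self.totals: Dict[str, Dict[str, int]] = {}
        self._broker = broker

    def drain(self, max_events: Optional[int] = None) -> int:
        """Apply up to max_events buffered events; return how many were applied."""
        processed = 0
        while max_events is None or processed < max_events:
            try:
                event = self._broker.get_nowait()
            except queue.Empty:
                break
            counters = self.totals.setdefault(event.entity_id, {"up": 0, "down": 0})
            counters["up" if event.is_upvote else "down"] += 1
            processed += 1
        return processed
```

Calling `drain` with a small `max_events` imitates a consumer working through a spike at its own pace: the totals lag behind for a while and catch up once the queue is empty.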
Fault Tolerance/ High Availability (99.9%)
To ensure we don't lose any data if a database instance or database shard crashes, we can introduce database replication. Similarly, every message broker, data store, and microservice is replicated, adding to our overall system's fault tolerance.
In addition, we can run our system across multiple data centers in different geographical locations and utilize a global load balancing service. This way, we can always fail over to another region if there is a natural disaster in one of those regions. As an added benefit, this also improves our performance, because our system gets physically closer to our users on different continents.
The following image shows a situation where the closest server is unavailable to the user. In this situation, the user contacts the second closest server that is available.
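The failover behaviour described above can be sketched as a selection rule: among the regions that are currently healthy, pick the closest one. The region names, the latency figures, and the use of latency as a stand-in for distance are assumptions for illustration; a global load balancing service makes this decision for us in practice.

```python
from dataclasses import dataclass
from typing import Iterable, Optional


@dataclass(frozen=True)
class Region:
    name: str
    latency_ms: float  # assumed proxy for distance to the user
    healthy: bool


def pick_region(regions: Iterable[Region]) -> Optional[Region]:
    """Return the closest healthy region, or None if every region is down."""
    candidates = [region for region in regions if region.healthy]
    if not candidates:
        return None
    return min(candidates, key=lambda region: region.latency_ms)
```

When the closest region is marked unhealthy, the same call simply returns the second closest one, so clients need no special failover logic.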
Availability + Partition Tolerance (AP over CP)
In our architecture we prioritize availability over consistency when it comes to data.
Durability
- Replication and Backups
Using database replication across multiple instances and data centers, together with rolling backups, we provide durability, so that stored data is not lost when an instance or a data center fails.
Additional - Possible implementation
Load Balancing Solutions
Open Source Software Load Balancing Solutions:
- HAProxy
- NGINX
Cloud Based Load Balancing Solutions:
- AWS - Elastic Load Balancing (ELB):
  - Application (Layer 7) Load Balancer
  - Network (Layer 4) Load Balancer
  - Gateway Load Balancer
  - Classic Load Balancer
- GCP - Cloud Load Balancing:
  - External HTTP(S) Load Balancer
  - Internal HTTP(S) Load Balancer
  - External TCP/UDP Network Load Balancer
  - Internal TCP/UDP Load Balancer
- Microsoft Azure Load Balancer:
  - Standard Load Balancer
  - Gateway Load Balancer
  - Basic Load Balancer
API Gateway
Open Source API Gateways
- Netflix Zuul
Cloud-Based API Gateways
- Amazon API Gateway
- Google Cloud Platform:
  - API Gateway
  - Apigee
- Microsoft Azure API Management
CDN
- Cloudflare
- Fastly
- Akamai
- Amazon CloudFront
- Google Cloud Platform CDN
- Microsoft Azure Content Delivery Network
Message Broker
Open Source Message Broker
- Apache Kafka
- RabbitMQ
Cloud Based Message Brokers
- AWS:
  - Amazon Simple Queue Service (SQS)
- Google Cloud:
  - Pub/Sub and Cloud Tasks
- Microsoft Azure:
  - Service Bus
  - Event Hubs
  - Event Grid
Contact and support
author: Milan Karajovic
Portfolio: milan.karajovic.rs