Applications in prod. How to handle skyrocketing growth with caching.

We are lucky, and the company we work at has skyrocketed. In the previous steps, we set up the Postgres database(s) well, set up monitoring and alerting, optimized our data and queries, and even switched the application to read from read replicas. And now the request rate is so high that our app and Postgres can’t handle it. There are many tricks to increase Postgres performance. But in the end, to apply a long-term fix, we have to fight the root cause of our problems. We have only one option: to decrease the QPS (Queries Per Second) on our Postgres instances and decrease the database size. Real life with a highly loaded system is complex and full of pain, and we don’t want to get lost in it.
We have 6 basic strategies to fight QPS and table sizes as long-term solutions:
- caching,
- rebuilding business processes to decrease the number of SQL queries required to process a transaction,
- splitting the application into a microservice architecture and moving the tables used by each microservice to other instances or database technologies, like NoSQL databases,
- switching to asynchronous transaction processing,
- table partitioning,
- sharding the database, or splitting transaction processing from one database cluster across multiple clusters.
Following only one strategy will not lead us to success; in real life, it is always a mix.
Let’s discuss how best to apply these strategies. It is a huge topic, so in this article we start with caching.
Caching
Caching is a brilliant strategy to heavily decrease both QPS and API latency. It is applicable when the data and content produced by the application and/or the results of API calls are the same for a group of users over some period of time and, therefore, can be cached. The main things: you should have clear caching rules (why, when, and what to cache), a TTL for cached content, and a fallback algorithm. There are 3 main approaches to applying caching:
- in-app caching, like cachetools for Python, Caffeine Cache for Java/Spring Boot, etc.,
- a key-value in-memory database, like Redis or Memcached,
- an external app for caching, like a reverse proxy - nginx, traefik, etc.
Sometimes the term ‘cache tiering’ is used, but I don’t like it; the most successful high-load applications typically use all ‘tiers’ at the same time.
The benefits of in-app caching:
- cached data can be shared with and controlled by all classes and functions in the worker process,
- no dependency on any third-party infrastructure application like an external Nginx cache, Redis, or Memcached,
- the FASTEST cache solution for apps ever; it provides the maximum LATENCY-DECREASING effect.
Weaknesses:
- the data in the cache is NOT PERSISTENT (*); any restart of the worker process will completely destroy the cache,
- requires additional memory for the application worker process, and it is always limited,
- dedicated to a single worker process and cannot be shared between multiple processes.
Taking into account the benefits and weaknesses, in-app caching best suits any short-lived, frequently accessed, small-sized data, like session details, tokens, feature flags, counters - for throttling, accounting, and metrics - user/session limits, etc. Some weaknesses can be mitigated; for example, during the worker process start sequence, the local in-app cache can be preheated with useful data.
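A minimal sketch of what this can look like with cachetools (the cache size, TTL, and the feature-flag loader below are illustrative assumptions, not the only way to do it):

```python
from cachetools import TTLCache, cached

def load_feature_flags_from_db(user_id: int) -> dict:
    # Placeholder for the real Postgres query.
    return {"new_checkout": True, "dark_mode": False}

# Up to 10,000 entries, each evicted 60 seconds after it was written.
flags_cache = TTLCache(maxsize=10_000, ttl=60)

@cached(flags_cache)
def get_feature_flags(user_id: int) -> dict:
    # Cache miss: one Postgres query. Cache hit: the query is skipped
    # entirely, which is exactly the QPS reduction we are after.
    return load_feature_flags_from_db(user_id)
```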
The benefits of key-value databases for caching (Redis and its clones, Memcached, etc.):
- data can be distributed between application worker processes
- atomic operations like INCR/DECR without race conditions
- some PERSISTENCY features
- sharding and high-availability features
- data structures like Strings, Lists, Sets, Sorted Sets, Hashes, and more
- housekeeping policies like LRU, LFU, and TTL-based eviction
- Pub/Sub & Streams: can be used for messaging.
Weaknesses:
- Even with persistency features enabled, you may LOSE some DATA on Redis restarts,
- they are in-memory databases, and the total cache size is still limited.
So, key-value databases are suitable for distributed caching and messaging, real-time analytics, distributed locking, leaderboards, session storage, etc.
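A sketch of the kind of work Redis can take off Postgres (the key names and redis-py connection parameters are hypothetical):

```python
import time
import redis

# Connection parameters are placeholders for a real Redis deployment.
r = redis.Redis(host="localhost", port=6379, decode_responses=True)

def count_request(user_id: int) -> int:
    # Atomic INCR shared by every worker process - usable for throttling.
    bucket = f"rate:{user_id}:{int(time.time()) // 60}"  # per-minute bucket
    calls = r.incr(bucket)
    r.expire(bucket, 120)  # housekeeping: drop the bucket after two minutes
    return calls

def top_sellers() -> list:
    # Sorted sets give cheap leaderboards without touching Postgres.
    return r.zrevrange("leaderboard:sales", 0, 9, withscores=True)
```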
With external caching proxies, responses can be served to the client before the request ever reaches the application server (a sample Nginx config follows the list of weaknesses below). The benefits of external caching on proxies:
- offloads a large number of requests from the application workers and serves them with very low latency,
- reduces bandwidth to your servers if some kind of CDN is being used, like CloudFlare, AWS CloudFront, Google Cloud CDN, etc.
Weaknesses:
- supports only HTTP/HTTPS protocols
- less flexibility: only the entire response can be cached, not all data can be cached due to security reasons, etc.,
- cache keys are hard to maintain, and proxy configs can be very complicated.
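A rough sketch of an Nginx response cache; the cache path, zone name, upstream, and TTLs here are hypothetical:

```nginx
# Keep up to 1 GB of cached responses on disk, key metadata in 10 MB of shared memory.
proxy_cache_path /var/cache/nginx levels=1:2 keys_zone=api_cache:10m
                 max_size=1g inactive=10m use_temp_path=off;

server {
    listen 80;

    location /api/catalog/ {
        proxy_cache       api_cache;
        proxy_cache_key   "$scheme$request_method$host$request_uri";
        proxy_cache_valid 200 60s;                     # cache successful responses for 60 seconds
        proxy_cache_use_stale error timeout updating;  # serve stale content if the app is struggling
        add_header X-Cache-Status $upstream_cache_status;
        proxy_pass http://app_backend;                 # hypothetical upstream
    }
}
```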
It is very important to understand the caching rules.
Here are some basic questions to answer to build the caching rules and choose the caching strategies:
- What data to cache, what are the data types, what is the size, etc?
- What will the outcome of caching be - how much QPS will we save, how much faster will the API call be, etc.?
- Security - what if the data is exposed to another worker process or another class/function in your app? In the case of Redis, what if the data is accessible from a terminal with redis-cli?
- What should cache keys look like (see the key sketch after this list)?
- What group of users/API calls/business processes/whatever grouping we have will be affected by the caching algorithm?
- What about sharing the cache between processes?
- etc
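As an illustration of the cache-key question above, a predictable, versioned key layout is one common answer; the namespace, entity names, and version suffixes here are purely hypothetical:

```python
# Convention: <app>:<entity>:<id>:<representation>:v<schema-version>.
# The version suffix lets you change the cached payload format without
# having to invalidate old keys by hand.
def profile_cache_key(user_id: int) -> str:
    return f"shop:user:{user_id}:profile:v2"

def catalog_page_key(category: str, page: int) -> str:
    return f"shop:catalog:{category}:page:{page}:v1"
```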
TTL
Just believe me and never use cached data without a TTL! It is the basic mechanism and the easiest way to refresh stale data and free cache space if the data is not needed anymore. Also, if the cache supports eviction policies like LRU or LFU, the policy should be set, too. TTL and eviction policies help to keep your cache operational and automate housekeeping processes.
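For example, with Redis (the key, TTL, and memory limits below are placeholders, not recommendations):

```python
import json
import redis

r = redis.Redis(decode_responses=True)

# Every write carries an explicit TTL - never cache without one.
r.set("session:42", json.dumps({"user_id": 42, "role": "customer"}), ex=900)

# On the server side, an eviction policy keeps the cache inside its memory
# budget even if some keys outlive their usefulness (redis.conf):
#   maxmemory 2gb
#   maxmemory-policy allkeys-lru
```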
Fallback.
Fallbacks are the most important part of building a caching system. Fallbacks should handle any exceptions with caching: refresh the cache if the data is stale and its TTL has expired, recalculate and update the cache if no data is available for the key, and do the calculations directly if the cache is not available (any Redis error, a network issue between the worker process and Redis, the in-app cache being disabled, etc.). In other words, fallbacks allow applications to work even if caches are not available at all. IMHO, a well-designed system should keep working and at least handle the median workload without any 503/502 or other errors, even if caches become unavailable through an incident or misconfiguration. It may work with higher latencies or may go asynchronous, but it keeps working and serving clients in any case. But that happens only in a perfect world.
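A minimal fallback sketch, assuming Redis as the cache and a hypothetical load_product_from_db query behind it:

```python
import json
import logging
import redis

r = redis.Redis(decode_responses=True)
log = logging.getLogger(__name__)

def load_product_from_db(product_id: int) -> dict:
    # Placeholder for the real Postgres query.
    return {"id": product_id, "name": "example"}

def get_product(product_id: int) -> dict:
    key = f"product:{product_id}:v1"
    try:
        cached = r.get(key)
        if cached is not None:
            return json.loads(cached)            # cache hit
    except redis.RedisError:
        log.warning("cache unavailable, falling back to Postgres")
        return load_product_from_db(product_id)  # cache down: keep serving
    product = load_product_from_db(product_id)   # cache miss or expired TTL
    try:
        r.set(key, json.dumps(product), ex=600)  # refresh the cache with a TTL
    except redis.RedisError:
        pass                                     # best effort: never fail the request over the cache
    return product
```

The important property is that every branch ends with a served response; the cache only ever makes things faster, never breaks them.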
Cache preheat.
In theory, your app should work without caches, but real life is more interesting. Usually, applications can’t work and/or can’t handle the workload without data in caches. You will definitely lose the data in in-app caches on restarts, or you could lose data in Redis or proxy caches, and it will affect the application performance. To avoid performance degradation after data loss, you need to fill caches with data. This process is called cache preheat. Preheating routines usually run on application restarts, but sometimes the preheating runs periodically, cron-like, to refresh the data in caches.
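A preheat routine can be as small as the sketch below (the in-app cache and the hot-data query are hypothetical); it is called once in the worker start-up sequence, before the worker accepts traffic, and can be rerun on a schedule:

```python
from cachetools import TTLCache

config_cache = TTLCache(maxsize=1_000, ttl=300)

def load_hot_configs_from_db() -> dict:
    # Placeholder: in reality, one Postgres query for the hottest rows.
    return {"currency_rates": {"USD": 1.0, "EUR": 0.92}, "default_tax_rate": 0.2}

def preheat_cache() -> None:
    # Fill the local cache before the first request arrives, so a fresh
    # worker does not start its life with a storm of cache misses.
    for key, value in load_hot_configs_from_db().items():
        config_cache[key] = value

preheat_cache()
```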
And, of course, your Grafana or an Application Performance Monitoring (APM) tool must have enough metrics and graphs to see the outcome and efficiency of your efforts to decrease QPS and latencies.
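One possible way to expose such numbers is a pair of hit/miss counters per cache; the metric names and the use of prometheus_client here are an assumption, not a prescription:

```python
from prometheus_client import Counter

CACHE_HITS = Counter("app_cache_hits_total", "Cache hits", ["cache"])
CACHE_MISSES = Counter("app_cache_misses_total", "Cache misses", ["cache"])

def record_lookup(cache_name: str, hit: bool) -> None:
    # A Grafana panel of hits / (hits + misses) gives the cache hit ratio;
    # put it next to the Postgres QPS and API latency graphs to see the effect.
    (CACHE_HITS if hit else CACHE_MISSES).labels(cache=cache_name).inc()
```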
Conclusions
- Caching is a great strategy to decrease QPS and latency.
- We have in-app, key-value database, and proxy caching solutions, and all of them have their own benefits and weaknesses; consider them when choosing what to apply.
- Always keep clear caching rules and policies - what data and how to cache.
- TTL and fallbacks should always be applied; it is a hygiene factor.
- To prevent performance degradation if data in caches is lost, consider using cache preheating.
- Always use the graphs, metrics, and numbers before and after applying caching to evaluate your efforts and make your clients and bosses happy.
PS: I use AI a lot as a learning partner or an advisor but not in production. No AI was used to write this article.
(*) For clarity, you can persist the in-app cache, for example, as Nginx does. But it is a very complicated topic with a lot of factors to be taken into account, like security, IOPS, housekeeping, surviving restarts, etc. Since the currently most popular backend application design approach keeps application worker processes stateless, we consider the in-app cache not persistent.