Saturday, March 7, 2026

System Design : Ad Click Aggregator

https://www.youtube.com/watch?v=6TroztUV3f8&list=PLjTveVh7FakLGZ36GbWAk_DMf_0xBZpGv&index=4

https://docs.google.com/presentation/d/1sx1oxYRYNwvhEoPM7LqgXPWaZI5kv64-139hNdLpwB4/edit?slide=id.g3707c69fac3_0_10#slide=id.g3707c69fac3_0_10

Requirement : Functional

  • Every Click of each advertisement should be counted
  • We should be able to run arbitrary analytical queries on the data

Requirement : Non Functional

  • Secure
  • Data Privacy Compliance
  • Call Volumes
  • High Availability
  • Low Latency : queries should return as fast as possible
    • time-series granularity of 1 min
  • FAST : data should be available within a few seconds of a click
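The 1-minute granularity requirement can be sketched with a toy in-memory aggregator that buckets each click into its minute (class and method names here are hypothetical, not from the video):

```python
from collections import defaultdict

def minute_bucket(ts_epoch_sec: int) -> int:
    """Truncate a timestamp down to its 1-minute bucket."""
    return ts_epoch_sec - (ts_epoch_sec % 60)

class ClickAggregator:
    """Toy in-memory aggregator: counts clicks per (ad_id, minute bucket).
    A real system would keep these counts in an OLAP store."""
    def __init__(self):
        self.counts = defaultdict(int)

    def record(self, ad_id: str, ts_epoch_sec: int) -> None:
        self.counts[(ad_id, minute_bucket(ts_epoch_sec))] += 1

    def count(self, ad_id: str, ts_epoch_sec: int) -> int:
        return self.counts[(ad_id, minute_bucket(ts_epoch_sec))]
```

Clicks at t=120s and t=150s land in the same bucket (120), while a click at t=185s starts a new one (180).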

High level










  • since the APIs are stateless, a round-robin load balancer can be used
  • check for duplicate clicks to guard against malicious users
  • batch data as Parquet files and push them into the data warehouse
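The duplicate check can be sketched as an idempotency filter keyed on a per-click ID (a minimal in-memory sketch; a real system would use a TTL'd store such as Redis so the set does not grow forever):

```python
class ClickDeduper:
    """Drop repeated click IDs, e.g. replays from malicious users.
    The set here is an in-memory stand-in for a shared, TTL'd store."""
    def __init__(self):
        self._seen = set()

    def is_new(self, click_id: str) -> bool:
        if click_id in self._seen:
            return False          # already counted: ignore the replay
        self._seen.add(click_id)
        return True
```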












DynamoDB is not a good substitute for Kafka here; it may not be able to support the write throughput.

Kafka uses log-based writes: it always appends data, so the disk moves sequentially rather than randomly.
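The log-structured idea can be illustrated with a minimal append-only file (this is a toy sketch of the access pattern, not how Kafka itself is implemented): records are only ever appended, never updated in place, so writes stay sequential.

```python
import json

class AppendOnlyLog:
    """Minimal log-structured writer: every write is an append at the
    tail of the file, so the write path never seeks backwards."""
    def __init__(self, path: str):
        self.path = path

    def append(self, record: dict) -> None:
        with open(self.path, "a") as f:   # "a" = append mode
            f.write(json.dumps(record) + "\n")

    def read_all(self) -> list:
        with open(self.path) as f:
            return [json.loads(line) for line in f]
```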






















A single Kafka partition can easily handle ~50k writes per second.




Data / Entity Object diagram (he did not do this in the video)

  • linkClicked(String ad-unique-id, String user-id, timestamp)  
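The event above can be written out as a small immutable record type (a sketch; only the field names come from the notes):

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class LinkClicked:
    """One event per ad click, matching linkClicked(...) above."""
    ad_unique_id: str
    user_id: str
    timestamp: int  # epoch seconds
```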


Block Diagram



System Design : Paste Bin

Requirements : Functional

  • Create a Paste
  • Read a Paste
  • Delete a Paste
  • Edit a Paste
  • Expire URL
  • No User registration , Authentication , Authorization
  • No Favorite 

Requirements : Non Functional

  • Secure
  • Data Privacy / Compliance
  • Call Volumes
    • create = 500 million per month ≈ 192 per sec
      • 60 sec x 60 min x 24 hrs x 30 days = 2,592,000 (~2.6 million) sec in a month
    • read = 20x writes = 10 billion per month ≈ 4,000 per sec
    • avg paste size of 100kb
    • max paste size of 5gb
    • spike
  • High Availability
    • within US / Global
    • 99%
  • Low Latency 
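The call-volume arithmetic above can be checked in a few lines:

```python
# Back-of-envelope check of the call-volume numbers above.
SECONDS_PER_MONTH = 60 * 60 * 24 * 30        # 2,592,000 (~2.6 million)

creates_per_month = 500_000_000
reads_per_month = 20 * creates_per_month     # read:write ratio of 20:1

creates_per_sec = creates_per_month / SECONDS_PER_MONTH   # ~192
reads_per_sec = reads_per_month / SECONDS_PER_MONTH       # ~3,858 (rounded to 4,000 above)
```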

Data Design / Entity Diagram

  • text : 100kb (s3 link)
  • unique id : 8 bytes
  • hashed short url : 8 bytes
  • create time stamp : 8 bytes
  • update time stamp : 8 bytes
  • user : 8 bytes
100kb = 0.1mb

Data Volumes

  • per month
    • 60 sec x 60 min x 24 hrs x 30 days = 2,592,000 seconds in a month
    • at ~192 creates/sec, that is ~500 million pastes per month
    • 500 million x 0.1 MB avg ≈ 50 TB of paste content per month (another reason to keep the text in S3 and only small metadata rows in the DB)
  • yearly : ~600 TB of content
  • Is it fair to assume that 80% of traffic is generated by 20% of URLs?

APIs

  • string createShortText (string LongText)
  • string getLongText (string shortText)
  • void deleteText (string shortText)
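The three APIs can be sketched as a toy in-memory store (hypothetical names; a real system stores the text in S3 and must handle hash collisions):

```python
import hashlib

class PasteStore:
    """Toy in-memory version of createShortText / getLongText / deleteText."""
    def __init__(self):
        self._pastes = {}

    def create_short_text(self, long_text: str) -> str:
        # 8-character id taken from a hash prefix; collisions are ignored
        # in this sketch but must be handled for real.
        short = hashlib.sha256(long_text.encode()).hexdigest()[:8]
        self._pastes[short] = long_text
        return short

    def get_long_text(self, short: str) -> str:
        return self._pastes[short]

    def delete_text(self, short: str) -> None:
        del self._pastes[short]
```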

High Level 








This has a single point of failure: the load balancer.


























Extending arch to PasteBin









Questions

  • do we want to use S3 ? does it make sense ?
  • use of CDN ?




















Large values also hurt cache efficiency and mess up the database's page layout.

Generally, storing a 1GB text file directly in a SQL database is not a good idea. While most modern SQL databases (like PostgreSQL or MySQL) have "Large Object" types that can technically hold up to 1GB or 4GB, doing so often leads to significant performance and operational headaches.

Why it's usually a bad idea

  • RAM and Buffer Bloat: SQL databases are optimized for small, row-based data. Loading a 1GB field into memory can clear out the database's cache, forcing other frequent queries to read from the slow disk instead of RAM.
  • Backup and Recovery: Large Blobs make your database backups (dumps) massive and slow. Restoring a 50GB database with 40 huge files takes much longer than restoring a 10GB database of structured data.
  • Network Latency: Moving 1GB from the database to your application server takes significant time. If your app is not careful about "streaming" the data, it might try to load the entire 1GB into its own RAM, causing an Out of Memory (OOM) crash.
  • Fragmentation: Storing huge chunks of data can lead to disk fragmentation within the database files, making overall database performance degrade over time.

The "Standard" Alternative: Object Storage

The industry-standard approach is to use Object Storage (like AWS S3, Google Cloud Storage, or Azure Blob Storage).
  • Store the file in Object Storage.
  • Store the URL/Path (a small string) in your SQL database.

When it MIGHT be okay

There are rare cases where storing it in the DB is acceptable:
  • Strict Transactional Integrity: You need the file and its metadata to be deleted or updated in a single "atomic" operation (all or nothing).
  • Security: You need the database's specific Row-Level Security (RLS) to manage access to the file. 











How does content get into S3 ?














  • avoid routing uploads through the app server
    • handling large payloads can become a bottleneck
    • it adds extra network hops
  • upload requests should go directly to S3

The other option is a Pre-Signed URL
(since we do not want to share S3 credentials/config with clients). The max object size for a single PUT is 5 GB; bigger files require multipart upload.
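The pre-signed URL idea can be sketched with a plain HMAC over the object key and expiry (a deliberately simplified illustration: real S3 pre-signing uses SigV4, and the URL format below is invented):

```python
import hashlib, hmac

SECRET = b"server-side-secret"   # never shared with the client

def presign(key: str, expires_at: int) -> str:
    """Hand the client a time-limited, tamper-proof upload token
    instead of our storage credentials."""
    sig = hmac.new(SECRET, f"{key}:{expires_at}".encode(), hashlib.sha256).hexdigest()
    return f"https://bucket.example.com/{key}?expires={expires_at}&sig={sig}"

def verify(key: str, expires_at: int, sig: str, now: int) -> bool:
    """Storage side: recompute the signature and check the expiry."""
    expected = hmac.new(SECRET, f"{key}:{expires_at}".encode(), hashlib.sha256).hexdigest()
    return hmac.compare_digest(sig, expected) and now < expires_at
```

Tampering with the key or using the URL after expiry both fail verification.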












Naive Attempt:

  • CreatePaste(String longStringOfText, long expirationTimeMs) -> String shortURL
  • FetchPaste(String shortURL) -> String longStringOfText

Better:

  • CreateSignedURL(long expirationTime) -> String s3URL
  • CreatePaste(String s3URL, long expirationTimeMs) -> String shortURL
  • FetchPaste(String shortURL) -> String s3URL





































If we write to the DB first and the S3 upload then fails, we are left with dangling entries in the DB, so that ordering is a bad option.
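The safer ordering (object store first, then DB commit) can be sketched with in-memory stand-ins, where dicts play the roles of S3 and the database:

```python
def create_paste(s3: dict, db: dict, short_url: str, content: bytes) -> None:
    """Upload to object storage first, then commit the metadata row.
    If the DB insert fails, delete the uploaded object, so no DB row
    ever points at a missing object (the DB-first failure mode)."""
    s3[short_url] = content                 # stand-in for an S3 PUT
    try:
        db[short_url] = {"s3_key": short_url}
    except Exception:
        del s3[short_url]                   # compensate: remove the orphan
        raise
```

If the compensating delete itself fails, the leftover object is an orphan in S3 only (no DB row points at it), which a lifecycle rule or sweep job can clean up.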



































If we don't want to use S3 lifecycle rules, a custom cron job can delete expired objects.












  • use EPOCH time for object names, zero-padded so the same number of characters exists in ALL names (fixed-width names sort lexicographically in timestamp order).
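A minimal sketch of such naming (the width and `.parquet` suffix are illustrative choices, not from the notes):

```python
def object_name(epoch_sec: int, width: int = 12) -> str:
    """Zero-pad the epoch timestamp so every name has the same length;
    equal-width names compare lexicographically in timestamp order."""
    return f"{epoch_sec:0{width}d}.parquet"
```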

















System Design : Load Balancers

How can i use multiple load balancers in a system design architecture diagram ?

Using multiple load balancers (LBs) in a system design isn't just about handling more traffic; it’s about removing Single Points of Failure (SPOF) and organizing your architecture into manageable layers. 

Here is how you can effectively integrate multiple load balancers into your design.

1. Multi-Tier Load Balancing (Vertical Scaling)

The most common way to use multiple LBs is to place them at different layers of the "stack." This separates external traffic management from internal service communication.
  • Tier 1: Global/External LB: This sits at the edge of your network (often a DNS-based LB like AWS Route 53 or a Content Delivery Network). It routes traffic to the nearest geographic data center.
  • Tier 2: Regional/External LB: Once traffic hits a data center, a Layer 4 (Transport) or Layer 7 (Application) LB distributes it across web servers.
  • Tier 3: Internal LB: Used for communication between internal microservices. For example, your Web Server layer might talk to a "Payment Service" through an internal load balancer to ensure the payment service itself is highly available.

2. High Availability Pair (Redundancy)

Placing a single load balancer in front of your cluster creates a bottleneck. If the LB crashes, the whole system goes down. To solve this, use a primary-secondary or active-active setup.
  • Active-Passive: One LB handles traffic while the other stays on standby. They share a "Floating IP." If the primary fails, the secondary takes over the IP.
  • Active-Active: Both LBs handle traffic simultaneously. This is often managed via DNS (Round Robin) or BGP (Border Gateway Protocol).

3. Specialized Load Balancing Types

When drawing your diagram, distinguish between these two based on where they sit in the OSI model:
 


 





Choosing between a Network Load Balancer (NLB) and an Application Load Balancer (ALB) depends on whether you need raw speed and protocol flexibility or intelligent, content-aware routing.

Quick Decision Guide
  • Choose a Network Load Balancer (Layer 4) if you need to handle millions of requests per second with ultra-low latency, or if you are using non-HTTP protocols like TCP, UDP, or TLS. It is ideal for gaming, IoT, and real-time streaming.
  • Choose an Application Load Balancer (Layer 7) if you are running modern web applications or microservices that require routing based on URL paths, host headers, or cookies. It is the standard choice for HTTP/HTTPS traffic.

If using multiple load balancers ,how to decide which request goes to which load balancer ?

When you have multiple load balancers, the decision of "who goes where" happens through a layered routing hierarchy. Think of it like a funnel: traffic starts broad and gets more specific as it moves deeper into your system.

1. The Entry Point: DNS & GSLB

The very first decision is made before the request even hits your servers. When a user types your URL, Global Server Load Balancing (GSLB) via DNS decides which regional load balancer to use.
  • Proximity/Latency: Routing the user to the data center physically closest to them (e.g., a London user goes to eu-west-1).
  • Health: If the US-East data center is down, DNS sends everyone to US-West.
  • Weighted Round Robin: Sending 10% of traffic to a new "Canary" deployment to test features.

2. The Network Layer: L4 Load Balancing

Once traffic hits a specific data center, a Network Load Balancer (NLB) often acts as the "front door." It doesn't look at the data inside the packet; it just looks at the IP address and Port.
  • Hash-based Routing: The NLB takes the Source IP and Port and runs a math function (hash) to pick a target. This ensures that a specific user stays connected to the same backend for that session (Sticky Sessions).
  • High Throughput: Since it doesn't "open" the packets, it can redirect millions of requests per second to a fleet of internal Application Load Balancers.

3. The Application Layer: L7 Load Balancing

This is where the "smart" decisions happen. An Application Load Balancer (ALB) opens the HTTP packet and makes decisions based on the content.
  • Path-based Routing: example.com/api goes to the API Cluster, while example.com/images goes to the Image Bucket.
  • Host-based Routing: app.example.com goes to one fleet, while blog.example.com goes to another.
  • Header/Cookie Routing: If a request has a cookie user_tier=premium, the ALB can route it to high-priority, faster servers.
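The three L7 rule types above can be sketched as one routing function (hostnames, paths, and fleet names are hypothetical):

```python
def route(host: str, path: str, cookies: dict) -> str:
    """Toy ALB rule evaluation: host-based, then path-based,
    then cookie-based routing, with a default fleet."""
    if host == "blog.example.com":
        return "blog-fleet"
    if path.startswith("/api"):
        return "api-cluster"
    if cookies.get("user_tier") == "premium":
        return "premium-fleet"
    return "default-fleet"
```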
 


System Design : Databases

LSM-based databases

Log-Structured Merge Tree :
  • designed for high write throughput
  • uses a log-structured merge-tree to manage data by first writing to memory (MemTable)
  • then flushing sorted, immutable files (SSTables) to disk
  • Common examples include RocksDB, Apache Cassandra, LevelDB, InfluxDB, and ScyllaDB, which are frequently used in NoSQL and time-series applications

Reads of immutable SSTables need no locks, which is a performance benefit.
 

B-trees based databases

  • Like Postgres & MySQL

When searching

  • LSM : may have to search memory (MemTable) and several SSTables on disk, so reads can be slower
  • B-tree : always searches by traversing to a single location (single lookup path) on disk.
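The LSM read path can be sketched with dicts standing in for the MemTable and SSTables (a toy model; real engines use sorted files, bloom filters, and compaction):

```python
class ToyLSM:
    """Reads check the MemTable first, then SSTables newest-to-oldest.
    A single read may touch several structures, which is why LSM reads
    can be slower than one B-tree traversal."""
    def __init__(self):
        self.memtable = {}
        self.sstables = []                 # list of dicts, newest first

    def put(self, key, value):
        self.memtable[key] = value         # writes only touch memory

    def flush(self):
        # Persist the MemTable as a sorted, immutable "SSTable".
        self.sstables.insert(0, dict(sorted(self.memtable.items())))
        self.memtable = {}

    def get(self, key):
        if key in self.memtable:
            return self.memtable[key]
        for table in self.sstables:        # newest table wins
            if key in table:
                return table[key]
        return None
```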

 



System Design : Tiny URL

https://www.youtube.com/watch?v=xFeWVugaouk&list=PLjTveVh7FakLGZ36GbWAk_DMf_0xBZpGv&index=2

https://docs.google.com/presentation/d/1kAAxyxuY8KX4Oo9aDo1ric7-AxnhPdPI0ceFasTM4j8/edit?slide=id.g33d3b1923dc_0_61#slide=id.g33d3b1923dc_0_61

Requirements : Functional

  • createShortURL(String longURL) : returns string shortURL
    • expiration time
    • purge time 
  • fetchShortURL(String shortURL) : redirects user to corresponding longURL

Requirements : Non Functional

  • secure
  • data privacy / compliance 
  • call volume : 
    • launch 1million users , DAU , WAU , MAU
    • URL creation : 600 million URLs per month ≈ 228 per second
    • URL retrieval : 10 billion per month ≈ 3,805 per second
    • spike 5x
  • low latency 
  • High availability 
  • Uptime : 99% 

Data Volume

How many characters should we use for our URL suffix?
  • 600 million creations/mo = 7.2 billion creations/y = 720 billion creations/century
  • If each URL can contain a-z and 0-9, we have 36 choices per character
  • If using 8 characters, we have 36^8 ≈ 2.8 trillion possibilities
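Checking the keyspace arithmetic:

```python
choices = 26 + 10                                # a-z plus 0-9
combos = choices ** 8                            # 8-character suffixes: ~2.8 trillion
creations_per_century = 600_000_000 * 12 * 100   # 720 billion
headroom = combos / creations_per_century        # ~3.9x the century's demand
```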
Back of Envelope Calculation 










Number Of Database Nodes
3805 database reads per second
Simple reads, can probably be handled on a single node.

228 writes per second
Can most likely be handled on a single node, depending on write complexity

74 bytes * 720 billion ≈ 53 TB of data
Can be handled on a beefy node, but likely worth horizontally scaling

 Block diagram




















Create Database Index for reads , indexed on shortURL value








Replication










If a leader fails, Postgres can promote one of the replicas to leader.

We may lose some data if a leader change happens (writes not yet replicated are lost).








Since we do not know which short URLs will be hot, use an LRU eviction policy to maximize the cache hit rate.
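An LRU cache for shortURL lookups can be sketched with an OrderedDict (a minimal sketch; in practice this would be Redis or Memcached with an LRU eviction policy):

```python
from collections import OrderedDict

class LRUCache:
    """Evict the least-recently-used entry when capacity is exceeded."""
    def __init__(self, capacity: int):
        self.capacity = capacity
        self._data = OrderedDict()

    def get(self, key):
        if key not in self._data:
            return None
        self._data.move_to_end(key)          # mark as most recently used
        return self._data[key]

    def put(self, key, value):
        if key in self._data:
            self._data.move_to_end(key)
        self._data[key] = value
        if len(self._data) > self.capacity:
            self._data.popitem(last=False)   # evict the LRU entry
```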



















One option is Citus

Citus is an open-source extension that transforms PostgreSQL into a distributed database, allowing it to scale horizontally across multiple commodity servers

Key Features and Concepts
  • As an Extension, not a Fork: Citus is implemented as an extension to standard PostgreSQL. This means it can stay current with the latest PostgreSQL features, tooling, and ecosystem, including support for recent versions like Postgres 18.
  • Horizontal Scaling: It enables PostgreSQL to use more CPU cores, memory, and storage than a single machine by distributing data and queries across a cluster of "nodes" in a shared-nothing architecture.
  • Sharding and Distribution: Citus shards (splits) large tables into smaller pieces distributed across "worker" nodes, while "reference" tables (smaller, frequently joined tables) are replicated across all nodes for maximum performance.
  • Distributed Query Engine: It includes a query planner and executor that parallelizes SQL queries across the worker nodes, resulting in dramatically improved response times for data-intensive applications.
  • Compatibility: Users can interact with the distributed database using standard SQL and existing PostgreSQL tools.
  • Deployment Options: Citus is available as a 100% open-source download or as a managed cloud service through Azure Cosmos DB for PostgreSQL.
Primary Use Cases
Citus is designed for performance-critical applications that have outgrown a single Postgres instance. Common use cases include:
  • Multi-tenant SaaS applications: Sharding data by tenant to isolate workloads and improve performance.
  • Real-time analytics and dashboards: Providing low-latency queries on large datasets (terabytes or more).
  • Time-series and IoT data platforms: Efficiently managing and analyzing large volumes of time-stamped data
 




















Replication

If we use single-leader replication, we risk losing data if a failover happens.








Example : Cassandra & ScyllaDB

Do we need strong consistency ?

No; if a leader goes down we can tolerate some inconsistent reads.

Multi-leader replication enables low-latency writes in geo-diverse regions, so it can be used, but it comes at the cost of write conflicts when 2 users try to claim the same short URL.

We can avoid the conflicts by sharding: each shard has a single leader that owns a disjoint range of short URLs.
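The shard-leader idea depends on deterministic routing: every claim of a given shortURL must reach the same leader. A minimal sketch of hash-based shard selection:

```python
import hashlib

def shard_for(short_url: str, num_shards: int) -> int:
    """Deterministically map a shortURL to one shard. Because each shard
    has a single leader, two users claiming the same URL always hit the
    same leader, which resolves the conflict (first write wins)."""
    h = int(hashlib.sha256(short_url.encode()).hexdigest(), 16)
    return h % num_shards
```

Note that with plain modulo, changing `num_shards` remaps most keys; consistent hashing avoids that, at the cost of more machinery.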