Saturday, March 7, 2026

System Design : Ad Click Aggregator

https://www.youtube.com/watch?v=6TroztUV3f8&list=PLjTveVh7FakLGZ36GbWAk_DMf_0xBZpGv&index=4

https://docs.google.com/presentation/d/1sx1oxYRYNwvhEoPM7LqgXPWaZI5kv64-139hNdLpwB4/edit?slide=id.g3707c69fac3_0_10#slide=id.g3707c69fac3_0_10

Requirement : Functional

  • Every Click of each advertisement should be counted
  • We should be able to run arbitrary analytical queries on the data

Requirement : Non Functional

  • Secure
  • Data Privacy Compliance
  • Call Volumes
  • High Availability
  • Low Latency : queries should return as fast as possible
    • time-series granularity of 1 min
  • FAST : data should be available within a few seconds of a click
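The 1-minute granularity requirement can be sketched with a toy in-memory aggregator that buckets each click into its minute (class and method names here are hypothetical, not from the video):

```python
from collections import defaultdict

def minute_bucket(ts_epoch_sec: int) -> int:
    """Truncate a timestamp down to its 1-minute bucket."""
    return ts_epoch_sec - (ts_epoch_sec % 60)

class ClickAggregator:
    """Toy in-memory aggregator: counts clicks per (ad_id, minute bucket).
    A real system would keep these counts in an OLAP store."""
    def __init__(self):
        self.counts = defaultdict(int)

    def record(self, ad_id: str, ts_epoch_sec: int) -> None:
        self.counts[(ad_id, minute_bucket(ts_epoch_sec))] += 1

    def count(self, ad_id: str, ts_epoch_sec: int) -> int:
        return self.counts[(ad_id, minute_bucket(ts_epoch_sec))]
```

Clicks at t=120s and t=150s land in the same bucket (120), while a click at t=185s starts a new one (180).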

High level










  • since the APIs are stateless, a round-robin load balancer can be used
  • check for duplicate clicks to guard against malicious users
  • batch data as Parquet files and push them into the data warehouse
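The duplicate check can be sketched as an idempotency filter keyed on a per-click ID (a minimal in-memory sketch; a real system would use a TTL'd store such as Redis so the set does not grow forever):

```python
class ClickDeduper:
    """Drop repeated click IDs, e.g. replays from malicious users.
    The set here is an in-memory stand-in for a shared, TTL'd store."""
    def __init__(self):
        self._seen = set()

    def is_new(self, click_id: str) -> bool:
        if click_id in self._seen:
            return False          # already counted: ignore the replay
        self._seen.add(click_id)
        return True
```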












DynamoDB is not a good substitute for Kafka here; it may not be able to support the write throughput.

Kafka uses log-based writes: it always appends data, so the disk moves sequentially rather than randomly.
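The log-structured idea can be illustrated with a minimal append-only file (this is a toy sketch of the access pattern, not how Kafka itself is implemented): records are only ever appended, never updated in place, so writes stay sequential.

```python
import json

class AppendOnlyLog:
    """Minimal log-structured writer: every write is an append at the
    tail of the file, so the write path never seeks backwards."""
    def __init__(self, path: str):
        self.path = path

    def append(self, record: dict) -> None:
        with open(self.path, "a") as f:   # "a" = append mode
            f.write(json.dumps(record) + "\n")

    def read_all(self) -> list:
        with open(self.path) as f:
            return [json.loads(line) for line in f]
```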






















A single Kafka partition can easily handle ~50k writes per second.




Data / Entity Object diagram (he did not do this in the video)

  • linkClicked(String ad-unique-id, String user-id, timestamp)  
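The event above can be written out as a small immutable record type (a sketch; only the field names come from the notes):

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class LinkClicked:
    """One event per ad click, matching linkClicked(...) above."""
    ad_unique_id: str
    user_id: str
    timestamp: int  # epoch seconds
```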


Block Diagram



System Design : Paste Bin

Requirements : Functional

  • Create a Paste
  • Read a Paste
  • Delete a Paste
  • Edit a Paste
  • Expire URL
  • No User registration , Authentication , Authorization
  • No Favorite 

Requirements : Non Functional

  • Secure
  • Data Privacy / Compliance
  • Call Volumes
    • create = 500 million per month ≈ 192 per sec
      • 60 sec x 60 min x 24 hrs x 30 days = 2,592,000 (~2.6 million) sec in a month
    • read = 20x writes = 10 billion per month ≈ 4,000 per sec
    • avg paste size of 100kb
    • max paste size of 5gb
    • spike
  • High Availability
    • within US / Global
    • 99%
  • Low Latency 
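The call-volume arithmetic above can be checked in a few lines:

```python
# Back-of-envelope check of the call-volume numbers above.
SECONDS_PER_MONTH = 60 * 60 * 24 * 30        # 2,592,000 (~2.6 million)

creates_per_month = 500_000_000
reads_per_month = 20 * creates_per_month     # read:write ratio of 20:1

creates_per_sec = creates_per_month / SECONDS_PER_MONTH   # ~192
reads_per_sec = reads_per_month / SECONDS_PER_MONTH       # ~3,858 (rounded to 4,000 above)
```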

Data Design / Entity Diagram

  • text : 100kb (s3 link)
  • unique id : 8 bytes
  • hashed short url : 8 bytes
  • create time stamp : 8 bytes
  • update time stamp : 8 bytes
  • user : 8 bytes
100kb = 0.1mb

Data Volumes

  • per month
    • 60 sec x 60 min x 24 hrs x 30 days = 2,592,000 seconds in a month
    • at ~192 creates/sec, that is ~500 million pastes per month
    • 500 million x 0.1 MB avg ≈ 50 TB of paste content per month (another reason to keep the text in S3 and only small metadata rows in the DB)
  • yearly : ~600 TB of content
  • Is it fair to assume that 80% of traffic is generated by 20% of URLs?

APIs

  • string createShortText (string LongText)
  • string getLongText (string shortText)
  • void deleteText (string shortText)
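The three APIs can be sketched as a toy in-memory store (hypothetical names; a real system stores the text in S3 and must handle hash collisions):

```python
import hashlib

class PasteStore:
    """Toy in-memory version of createShortText / getLongText / deleteText."""
    def __init__(self):
        self._pastes = {}

    def create_short_text(self, long_text: str) -> str:
        # 8-character id taken from a hash prefix; collisions are ignored
        # in this sketch but must be handled for real.
        short = hashlib.sha256(long_text.encode()).hexdigest()[:8]
        self._pastes[short] = long_text
        return short

    def get_long_text(self, short: str) -> str:
        return self._pastes[short]

    def delete_text(self, short: str) -> None:
        del self._pastes[short]
```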

High Level 








This has a single point of failure: the load balancer.


























Extending arch to PasteBin









Questions

  • do we want to use S3 ? does it make sense ?
  • use of CDN ?




















Large values also hurt cache efficiency and mess up the database's page layout.

Generally, storing a 1GB text file directly in a SQL database is not a good idea. While most modern SQL databases (like PostgreSQL or MySQL) have "Large Object" types that can technically hold up to 1GB or 4GB, doing so often leads to significant performance and operational headaches.

Why it's usually a bad idea

  • RAM and Buffer Bloat: SQL databases are optimized for small, row-based data. Loading a 1GB field into memory can clear out the database's cache, forcing other frequent queries to read from the slow disk instead of RAM.
  • Backup and Recovery: Large Blobs make your database backups (dumps) massive and slow. Restoring a 50GB database with 40 huge files takes much longer than restoring a 10GB database of structured data.
  • Network Latency: Moving 1GB from the database to your application server takes significant time. If your app is not careful about "streaming" the data, it might try to load the entire 1GB into its own RAM, causing an Out of Memory (OOM) crash.
  • Fragmentation: Storing huge chunks of data can lead to disk fragmentation within the database files, making overall database performance degrade over time.

The "Standard" Alternative: Object Storage

The industry-standard approach is to use Object Storage (like AWS S3, Google Cloud Storage, or Azure Blob Storage).
  • Store the file in Object Storage.
  • Store the URL/Path (a small string) in your SQL database.

When it MIGHT be okay

There are rare cases where storing it in the DB is acceptable:
  • Strict Transactional Integrity: You need the file and its metadata to be deleted or updated in a single "atomic" operation (all or nothing).
  • Security: You need the database's specific Row-Level Security (RLS) to manage access to the file. 











How does content get into S3 ?














  • avoid routing uploads through the app server
    • handling large payloads can become a bottleneck
    • it adds extra network hops
  • upload requests should go directly to S3

The other option is a Pre-Signed URL
(since we do not want to share S3 credentials/config with clients). The max object size for a single PUT is 5 GB; bigger files require multipart upload.
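The pre-signed URL idea can be sketched with a plain HMAC over the object key and expiry (a deliberately simplified illustration: real S3 pre-signing uses SigV4, and the URL format below is invented):

```python
import hashlib, hmac

SECRET = b"server-side-secret"   # never shared with the client

def presign(key: str, expires_at: int) -> str:
    """Hand the client a time-limited, tamper-proof upload token
    instead of our storage credentials."""
    sig = hmac.new(SECRET, f"{key}:{expires_at}".encode(), hashlib.sha256).hexdigest()
    return f"https://bucket.example.com/{key}?expires={expires_at}&sig={sig}"

def verify(key: str, expires_at: int, sig: str, now: int) -> bool:
    """Storage side: recompute the signature and check the expiry."""
    expected = hmac.new(SECRET, f"{key}:{expires_at}".encode(), hashlib.sha256).hexdigest()
    return hmac.compare_digest(sig, expected) and now < expires_at
```

Tampering with the key or using the URL after expiry both fail verification.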












Naive Attempt:

  • CreatePaste(String longStringOfText, long expirationTimeMs) -> String shortURL
  • FetchPaste(String shortURL) -> String longStringOfText

Better:

  • CreateSignedURL(long expirationTime) -> String s3URL
  • CreatePaste(String s3URL, long expirationTimeMs) -> String shortURL
  • FetchPaste(String shortURL) -> String s3URL





































If we write to the DB first and the S3 upload then fails, we are left with dangling entries in the DB, so that ordering is a bad option.
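The safer ordering (object store first, then DB commit) can be sketched with in-memory stand-ins, where dicts play the roles of S3 and the database:

```python
def create_paste(s3: dict, db: dict, short_url: str, content: bytes) -> None:
    """Upload to object storage first, then commit the metadata row.
    If the DB insert fails, delete the uploaded object, so no DB row
    ever points at a missing object (the DB-first failure mode)."""
    s3[short_url] = content                 # stand-in for an S3 PUT
    try:
        db[short_url] = {"s3_key": short_url}
    except Exception:
        del s3[short_url]                   # compensate: remove the orphan
        raise
```

If the compensating delete itself fails, the leftover object is an orphan in S3 only (no DB row points at it), which a lifecycle rule or sweep job can clean up.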



































If we don't want to use S3 lifecycle rules, a custom cron job can delete expired objects.












  • use EPOCH time for object names, zero-padded so the same number of characters exists in ALL names (fixed-width names sort lexicographically in timestamp order).
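A minimal sketch of such naming (the width and `.parquet` suffix are illustrative choices, not from the notes):

```python
def object_name(epoch_sec: int, width: int = 12) -> str:
    """Zero-pad the epoch timestamp so every name has the same length;
    equal-width names compare lexicographically in timestamp order."""
    return f"{epoch_sec:0{width}d}.parquet"
```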

















System Design : Load Balancers

How can i use multiple load balancers in a system design architecture diagram ?

Using multiple load balancers (LBs) in a system design isn't just about handling more traffic; it’s about removing Single Points of Failure (SPOF) and organizing your architecture into manageable layers. 

Here is how you can effectively integrate multiple load balancers into your design.

1. Multi-Tier Load Balancing (Vertical Scaling)

The most common way to use multiple LBs is to place them at different layers of the "stack." This separates external traffic management from internal service communication.
  • Tier 1: Global/External LB: This sits at the edge of your network (often a DNS-based LB like AWS Route 53 or a Content Delivery Network). It routes traffic to the nearest geographic data center.
  • Tier 2: Regional/External LB: Once traffic hits a data center, a Layer 4 (Transport) or Layer 7 (Application) LB distributes it across web servers.
  • Tier 3: Internal LB: Used for communication between internal microservices. For example, your Web Server layer might talk to a "Payment Service" through an internal load balancer to ensure the payment service itself is highly available.

2. High Availability Pair (Redundancy)

Placing a single load balancer in front of your cluster creates a bottleneck. If the LB crashes, the whole system goes down. To solve this, use a primary-secondary or active-active setup.
  • Active-Passive: One LB handles traffic while the other stays on standby. They share a "Floating IP." If the primary fails, the secondary takes over the IP.
  • Active-Active: Both LBs handle traffic simultaneously. This is often managed via DNS (Round Robin) or BGP (Border Gateway Protocol).

3. Specialized Load Balancing Types

When drawing your diagram, distinguish between these two based on where they sit in the OSI model:
 


 





Choosing between a Network Load Balancer (NLB) and an Application Load Balancer (ALB) depends on whether you need raw speed and protocol flexibility or intelligent, content-aware routing.

Quick Decision Guide
  • Choose a Network Load Balancer (Layer 4) if you need to handle millions of requests per second with ultra-low latency, or if you are using non-HTTP protocols like TCP, UDP, or TLS. It is ideal for gaming, IoT, and real-time streaming.
  • Choose an Application Load Balancer (Layer 7) if you are running modern web applications or microservices that require routing based on URL paths, host headers, or cookies. It is the standard choice for HTTP/HTTPS traffic.

If using multiple load balancers ,how to decide which request goes to which load balancer ?

When you have multiple load balancers, the decision of "who goes where" happens through a layered routing hierarchy. Think of it like a funnel: traffic starts broad and gets more specific as it moves deeper into your system.

1. The Entry Point: DNS & GSLB

The very first decision is made before the request even hits your servers. When a user types your URL, Global Server Load Balancing (GSLB) via DNS decides which regional load balancer to use.
  • Proximity/Latency: Routing the user to the data center physically closest to them (e.g., a London user goes to eu-west-1).
  • Health: If the US-East data center is down, DNS sends everyone to US-West.
  • Weighted Round Robin: Sending 10% of traffic to a new "Canary" deployment to test features.

2. The Network Layer: L4 Load Balancing

Once traffic hits a specific data center, a Network Load Balancer (NLB) often acts as the "front door." It doesn't look at the data inside the packet; it just looks at the IP address and Port.
  • Hash-based Routing: The NLB takes the Source IP and Port and runs a math function (hash) to pick a target. This ensures that a specific user stays connected to the same backend for that session (Sticky Sessions).
  • High Throughput: Since it doesn't "open" the packets, it can redirect millions of requests per second to a fleet of internal Application Load Balancers.

3. The Application Layer: L7 Load Balancing

This is where the "smart" decisions happen. An Application Load Balancer (ALB) opens the HTTP packet and makes decisions based on the content.
  • Path-based Routing: example.com/api goes to the API Cluster, while example.com/images goes to the Image Bucket.
  • Host-based Routing: app.example.com goes to one fleet, while blog.example.com goes to another.
  • Header/Cookie Routing: If a request has a cookie user_tier=premium, the ALB can route it to high-priority, faster servers.
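The three L7 rule types above can be sketched as one routing function (hostnames, paths, and fleet names are hypothetical):

```python
def route(host: str, path: str, cookies: dict) -> str:
    """Toy ALB rule evaluation: host-based, then path-based,
    then cookie-based routing, with a default fleet."""
    if host == "blog.example.com":
        return "blog-fleet"
    if path.startswith("/api"):
        return "api-cluster"
    if cookies.get("user_tier") == "premium":
        return "premium-fleet"
    return "default-fleet"
```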
 


System Design : Databases

LSM-based databases

Log-Structured Merge Tree :
  • designed for high write throughput
  • uses a log-structured merge-tree to manage data by first writing to memory (MemTable)
  • then flushing sorted, immutable files (SSTables) to disk
  • Common examples include RocksDB, Apache Cassandra, LevelDB, InfluxDB, and ScyllaDB, which are frequently used in NoSQL and time-series applications

Reads of immutable SSTables need no locks, which is a performance benefit.
 

B-trees based databases

  • Like Postgres & MySQL

When searching

  • LSM : may have to search memory (MemTable) and several SSTables on disk, so reads can be slower
  • B-tree : always searches by traversing to a single location (single lookup path) on disk.
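The LSM read path can be sketched with dicts standing in for the MemTable and SSTables (a toy model; real engines use sorted files, bloom filters, and compaction):

```python
class ToyLSM:
    """Reads check the MemTable first, then SSTables newest-to-oldest.
    A single read may touch several structures, which is why LSM reads
    can be slower than one B-tree traversal."""
    def __init__(self):
        self.memtable = {}
        self.sstables = []                 # list of dicts, newest first

    def put(self, key, value):
        self.memtable[key] = value         # writes only touch memory

    def flush(self):
        # Persist the MemTable as a sorted, immutable "SSTable".
        self.sstables.insert(0, dict(sorted(self.memtable.items())))
        self.memtable = {}

    def get(self, key):
        if key in self.memtable:
            return self.memtable[key]
        for table in self.sstables:        # newest table wins
            if key in table:
                return table[key]
        return None
```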

 



System Design : Tiny URL

https://www.youtube.com/watch?v=xFeWVugaouk&list=PLjTveVh7FakLGZ36GbWAk_DMf_0xBZpGv&index=2

https://docs.google.com/presentation/d/1kAAxyxuY8KX4Oo9aDo1ric7-AxnhPdPI0ceFasTM4j8/edit?slide=id.g33d3b1923dc_0_61#slide=id.g33d3b1923dc_0_61

Requirements : Functional

  • createShortURL(String longURL) : returns string shortURL
    • expiration time
    • purge time 
  • fetchShortURL(String shortURL) : redirects user to corresponding longURL

Requirements : Non Functional

  • secure
  • data privacy / compliance 
  • call volume : 
    • launch 1million users , DAU , WAU , MAU
    • URL creation : 600 million URLs per month ≈ 228 per second
    • URL retrieval : 10 billion per month ≈ 3,805 per second
    • spike 5x
  • low latency 
  • High availability 
  • Uptime : 99% 

Data Volume

How many characters should we use for our URL suffix?
  • 600 million creations/mo = 7.2 billion creations/y = 720 billion creations/century
  • If each URL can contain a-z and 0-9, we have 36 choices per character
  • If using 8 characters, we have 36^8 ≈ 2.8 trillion possibilities
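Checking the keyspace arithmetic:

```python
choices = 26 + 10                                # a-z plus 0-9
combos = choices ** 8                            # 8-character suffixes: ~2.8 trillion
creations_per_century = 600_000_000 * 12 * 100   # 720 billion
headroom = combos / creations_per_century        # ~3.9x the century's demand
```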
Back of Envelope Calculation 










Number Of Database Nodes
3805 database reads per second
Simple reads, can probably be handled on a single node.

228 writes per second
Can most likely be handled on a single node, depending on write complexity

74 bytes * 720 billion ≈ 53 TB of data
Can be handled on a beefy node, but likely worth horizontally scaling

 Block diagram




















Create Database Index for reads , indexed on shortURL value








Replication










If a leader fails, Postgres can promote one of the replicas to leader.

We may lose some data if a leader change happens (writes not yet replicated are lost).








Since we do not know which short URLs will be hot, use an LRU eviction policy to maximize the cache hit rate.
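An LRU cache for shortURL lookups can be sketched with an OrderedDict (a minimal sketch; in practice this would be Redis or Memcached with an LRU eviction policy):

```python
from collections import OrderedDict

class LRUCache:
    """Evict the least-recently-used entry when capacity is exceeded."""
    def __init__(self, capacity: int):
        self.capacity = capacity
        self._data = OrderedDict()

    def get(self, key):
        if key not in self._data:
            return None
        self._data.move_to_end(key)          # mark as most recently used
        return self._data[key]

    def put(self, key, value):
        if key in self._data:
            self._data.move_to_end(key)
        self._data[key] = value
        if len(self._data) > self.capacity:
            self._data.popitem(last=False)   # evict the LRU entry
```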



















One option is Citus

Citus is an open-source extension that transforms PostgreSQL into a distributed database, allowing it to scale horizontally across multiple commodity servers

Key Features and Concepts
  • As an Extension, not a Fork: Citus is implemented as an extension to standard PostgreSQL. This means it can stay current with the latest PostgreSQL features, tooling, and ecosystem, including support for recent versions like Postgres 18.
  • Horizontal Scaling: It enables PostgreSQL to use more CPU cores, memory, and storage than a single machine by distributing data and queries across a cluster of "nodes" in a shared-nothing architecture.
  • Sharding and Distribution: Citus shards (splits) large tables into smaller pieces distributed across "worker" nodes, while "reference" tables (smaller, frequently joined tables) are replicated across all nodes for maximum performance.
  • Distributed Query Engine: It includes a query planner and executor that parallelizes SQL queries across the worker nodes, resulting in dramatically improved response times for data-intensive applications.
  • Compatibility: Users can interact with the distributed database using standard SQL and existing PostgreSQL tools.
  • Deployment Options: Citus is available as a 100% open-source download or as a managed cloud service through Azure Cosmos DB for PostgreSQL.
Primary Use Cases
Citus is designed for performance-critical applications that have outgrown a single Postgres instance. Common use cases include:
  • Multi-tenant SaaS applications: Sharding data by tenant to isolate workloads and improve performance.
  • Real-time analytics and dashboards: Providing low-latency queries on large datasets (terabytes or more).
  • Time-series and IoT data platforms: Efficiently managing and analyzing large volumes of time-stamped data
 




















Replication

If we use single-leader replication, we risk losing data if a failover happens.








Example : Cassandra & ScyllaDB

Do we need strong consistency ?

No; if a leader goes down we can tolerate some inconsistent reads.

Multi-leader replication enables low-latency writes in geo-diverse regions, so it can be used, but it comes at the cost of write conflicts when 2 users try to claim the same short URL.

We can avoid the conflicts by sharding: each shard has a single leader that owns a disjoint range of short URLs.
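The shard-leader idea depends on deterministic routing: every claim of a given shortURL must reach the same leader. A minimal sketch of hash-based shard selection:

```python
import hashlib

def shard_for(short_url: str, num_shards: int) -> int:
    """Deterministically map a shortURL to one shard. Because each shard
    has a single leader, two users claiming the same URL always hit the
    same leader, which resolves the conflict (first write wins)."""
    h = int(hashlib.sha256(short_url.encode()).hexdigest(), 16)
    return h % num_shards
```

Note that with plain modulo, changing `num_shards` remaps most keys; consistent hashing avoids that, at the cost of more machinery.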