No user registration, authentication, or authorization
No favorites
Requirements: Non-Functional
Secure
Data Privacy / Compliance
Call Volumes
create = 500 million per month ≈ 192 per sec
60 sec x 60 min x 24 hrs x 30 days = 2.6 million sec in a month
read = 20x = 10 billion per month ≈ 4000 per sec
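These rates can be sanity-checked with a quick back-of-envelope script (numbers taken from the assumptions above):

```python
# Back-of-envelope rate check for the call volumes above.
seconds_per_month = 60 * 60 * 24 * 30               # 2,592,000 ≈ 2.6 million
writes_per_sec = 500_000_000 / seconds_per_month    # ≈ 193 creates/sec
reads_per_sec = 20 * writes_per_sec                 # ≈ 3,858 reads/sec ≈ 4000

print(seconds_per_month, round(writes_per_sec), round(reads_per_sec))
```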
avg paste size of 100kb
max paste size of 5gb
spike
High Availability
within US / Global
99%
Low Latency
Data Design / Entity Diagram
text : 100kb (s3 link)
unique id : 8 bytes
hashed short url : 8 bytes
create time stamp : 8 bytes
update time stamp : 8 bytes
user : 8 bytes
100kb = 0.1mb
Data Volumes
per month:
60 sec in a min
3600 sec in an hr
24 x 3600 sec in a day
30 x 24 x 3600 = 2,592,000 sec in a month
500 million creates x 0.1 MB = 50,000,000 MB = 50 TB per month
yearly : ~600 TB
Is it fair to assume that 80% of traffic is generated by 20% of URLs?
APIs
string createShortText(string longText)
string getLongText(string shortText)
void deleteText(string shortText)
High Level
this has a single point of failure: the load balancer
Extending arch to PasteBin
Questions
do we want to use S3 ? does it make sense ?
use of CDN ?
Large blobs hurt cache efficiency and blow up database page sizes.
Generally, storing a 1GB text file directly in a SQL database is not a good idea. While most modern SQL databases (like PostgreSQL or MySQL) have "Large Object" types that can technically hold up to 1GB or 4GB, doing so often leads to significant performance and operational headaches.
Why it's usually a bad idea
RAM and Buffer Bloat: SQL databases are optimized for small, row-based data. Loading a 1GB field into memory can clear out the database's cache, forcing other frequent queries to read from the slow disk instead of RAM.
Backup and Recovery: Large Blobs make your database backups (dumps) massive and slow. Restoring a 50GB database with 40 huge files takes much longer than restoring a 10GB database of structured data.
Network Latency: Moving 1GB from the database to your application server takes significant time. If your app is not careful about "streaming" the data, it might try to load the entire 1GB into its own RAM, causing an Out of Memory (OOM) crash.
Fragmentation: Storing huge chunks of data can lead to disk fragmentation within the database files, making overall database performance degrade over time.
The "Standard" Alternative: Object Storage
The industry-standard approach is to use Object Storage (like AWS S3, Google Cloud Storage, or Azure Blob Storage).
Store the file in Object Storage.
Store the URL/Path (a small string) in your SQL database.
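A minimal sketch of this pattern. The bucket name "paste-bucket" and the `pastes` table schema are illustrative assumptions, and `s3_client` stands in for something like a boto3 client:

```python
# Sketch of the "file in object storage, pointer in SQL" pattern.
import uuid

def make_object_key(paste_id):
    """Object-storage key under which a paste's body is stored."""
    return f"pastes/{paste_id}.txt"

def store_paste(s3_client, db_conn, text):
    paste_id = uuid.uuid4().hex[:8]
    key = make_object_key(paste_id)
    # 1. Put the potentially huge body in object storage...
    s3_client.put_object(Bucket="paste-bucket", Key=key, Body=text)
    # 2. ...and only a small pointer row in SQL.
    #    ('?' placeholders are sqlite3 style; syntax varies by driver.)
    db_conn.execute("INSERT INTO pastes (id, s3_key) VALUES (?, ?)",
                    (paste_id, key))
    return paste_id
```

Reads do the reverse: look up the small row by id, then fetch (or stream) the body from object storage.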
When it MIGHT be okay
There are rare cases where storing it in the DB is acceptable:
Strict Transactional Integrity: You need the file and its metadata to be deleted or updated in a single "atomic" operation (all or nothing).
Security: You need the database's specific Row-Level Security (RLS) to manage access to the file.
How does content get into S3 ?
avoid routing uploads through the app server
handling large uploads would make the server a bottleneck
and it adds extra network hops
upload requests should go directly to S3
The other option is a pre-signed URL
(since we do not want to share S3 credentials with clients). A single upload is limited to 5 GB; for bigger files, use multipart upload.
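To illustrate the idea (this is a toy HMAC scheme, not AWS Signature V4; in practice you would call boto3's `generate_presigned_url`), the server, which holds the secret, signs the object key plus an expiry, so the client can upload directly to storage without ever seeing the credentials:

```python
import hashlib
import hmac
import time
from urllib.parse import urlencode

def presign_upload_url(base_url, key, secret, expires_in=900, now=None):
    """Toy pre-signed URL: sign (key, expiry) so a client can upload
    directly to object storage. Illustration only; real S3 uses SigV4."""
    now = int(time.time()) if now is None else now
    expires = now + expires_in
    signature = hmac.new(secret, f"{key}:{expires}".encode(),
                         hashlib.sha256).hexdigest()
    query = urlencode({"expires": expires, "signature": signature})
    return f"{base_url}/{key}?{query}"
```

The storage side recomputes the HMAC over the same (key, expiry) pair and rejects the upload if the signature mismatches or the expiry has passed.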
Naive Attempt
Better
CreatePaste(String longStringOfText, long expirationTimeMs) -> String shortURL
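A minimal in-memory sketch of this API. The counter-based short URL and the method names are illustrative; expired pastes simply return None:

```python
import time

class PasteStore:
    """Toy in-memory model of CreatePaste/GetPaste with expiration."""
    def __init__(self):
        self._counter = 0
        self._pastes = {}  # shortURL -> (text, expires_at_ms)

    def create_paste(self, text, expiration_time_ms, now_ms=None):
        now_ms = int(time.time() * 1000) if now_ms is None else now_ms
        self._counter += 1
        # Placeholder ID scheme; a real system hashes/encodes a unique ID.
        short_url = f"p{self._counter}"
        self._pastes[short_url] = (text, now_ms + expiration_time_ms)
        return short_url

    def get_paste(self, short_url, now_ms=None):
        now_ms = int(time.time() * 1000) if now_ms is None else now_ms
        entry = self._pastes.get(short_url)
        if entry is None or now_ms >= entry[1]:
            return None   # missing or expired
        return entry[0]
```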
How can i use multiple load balancers in a system design architecture diagram ?
Using multiple load balancers (LBs) in a system design isn't just about handling more traffic; it’s about removing Single Points of Failure (SPOF) and organizing your architecture into manageable layers.
Here is how you can effectively integrate multiple load balancers into your design.
1. Multi-Tier Load Balancing (Vertical Scaling)
The most common way to use multiple LBs is to place them at different layers of the "stack." This separates external traffic management from internal service communication.
Tier 1: Global/External LB: This sits at the edge of your network (often a DNS-based LB like AWS Route 53 or a Content Delivery Network). It routes traffic to the nearest geographic data center.
Tier 2: Regional/External LB: Once traffic hits a data center, a Layer 4 (Transport) or Layer 7 (Application) LB distributes it across web servers.
Tier 3: Internal LB: Used for communication between internal microservices. For example, your Web Server layer might talk to a "Payment Service" through an internal load balancer to ensure the payment service itself is highly available.
2. High Availability Pair (Redundancy)
Placing a single load balancer in front of your cluster creates a bottleneck. If the LB crashes, the whole system goes down. To solve this, use a primary-secondary or active-active setup.
Active-Passive: One LB handles traffic while the other stays on standby. They share a "Floating IP." If the primary fails, the secondary takes over the IP.
Active-Active: Both LBs handle traffic simultaneously. This is often managed via DNS (Round Robin) or BGP (Border Gateway Protocol).
3. Specialized Load Balancing Types
When drawing your diagram, distinguish between these two based on where they sit in the OSI model:
Choosing between a Network Load Balancer (NLB) and an Application Load Balancer (ALB) depends on whether you need raw speed and protocol flexibility or intelligent, content-aware routing.
Quick Decision Guide
Choose a Network Load Balancer (Layer 4) if you need to handle millions of requests per second with ultra-low latency, or if you are using non-HTTP protocols like TCP, UDP, or TLS. It is ideal for gaming, IoT, and real-time streaming.
Choose an Application Load Balancer (Layer 7) if you are running modern web applications or microservices that require routing based on URL paths, host headers, or cookies. It is the standard choice for HTTP/HTTPS traffic.
If using multiple load balancers, how do we decide which request goes to which load balancer?
When you have multiple load balancers, the decision of "who goes where" happens through a layered routing hierarchy. Think of it like a funnel: traffic starts broad and gets more specific as it moves deeper into your system.
1. The Entry Point: DNS & GSLB
The very first decision is made before the request even hits your servers. When a user types your URL, Global Server Load Balancing (GSLB) via DNS decides which regional load balancer to use.
Proximity/Latency: Routing the user to the data center physically closest to them (e.g., a London user goes to eu-west-1).
Health: If the US-East data center is down, DNS sends everyone to US-West.
Weighted Round Robin: Sending 10% of traffic to a new "Canary" deployment to test features.
2. The Network Layer: L4 Load Balancing
Once traffic hits a specific data center, a Network Load Balancer (NLB) often acts as the "front door." It doesn't look at the data inside the packet; it just looks at the IP address and Port.
Hash-based Routing: The NLB takes the Source IP and Port and runs a math function (hash) to pick a target. This ensures that a specific user stays connected to the same backend for that session (Sticky Sessions).
High Throughput: Since it doesn't "open" the packets, it can redirect millions of requests per second to a fleet of internal Application Load Balancers.
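The hash-based routing step can be sketched as follows (a simplification; real NLBs hash the full connection 5-tuple in kernel space or hardware):

```python
import hashlib

def pick_backend(src_ip, src_port, backends):
    """Deterministically map a (source IP, source port) pair to one
    backend, so the same client connection always lands on the same
    target (sticky for the life of the connection)."""
    digest = hashlib.sha256(f"{src_ip}:{src_port}".encode()).hexdigest()
    return backends[int(digest, 16) % len(backends)]
```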
3. The Application Layer: L7 Load Balancing
This is where the "smart" decisions happen. An Application Load Balancer (ALB) opens the HTTP packet and makes decisions based on the content.
Path-based Routing: example.com/api goes to the API Cluster, while example.com/images goes to the Image Bucket.
Host-based Routing: app.example.com goes to one fleet, while blog.example.com goes to another.
Header/Cookie Routing: If a request has a cookie user_tier=premium, the ALB can route it to high-priority, faster servers.
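The L7 rules above could be modeled as a simple decision function (hostnames, paths, and fleet names are made up for illustration):

```python
def route_l7(host, path, cookies):
    """Content-aware routing decision, mirroring the ALB rules above."""
    if cookies.get("user_tier") == "premium":
        return "premium-fleet"          # header/cookie routing
    if host.startswith("blog."):
        return "blog-fleet"             # host-based routing
    if path.startswith("/api"):
        return "api-cluster"            # path-based routing
    if path.startswith("/images"):
        return "image-bucket"
    return "default-web-fleet"
```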
If each URL can contain a-z and 0-9, we have 36 choices per character
If using 8 characters, we have 36^8 ≈ 2.8 trillion possibilities
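One way to generate such URLs is to base-36 encode a unique numeric ID, padded to a fixed 8 characters (a sketch; the padding character choice is arbitrary):

```python
ALPHABET = "abcdefghijklmnopqrstuvwxyz0123456789"

def encode_base36(n, width=8):
    """Encode a non-negative integer as a fixed-width base-36 string."""
    chars = []
    while n > 0:
        n, rem = divmod(n, 36)
        chars.append(ALPHABET[rem])
    return "".join(reversed(chars)).rjust(width, ALPHABET[0])

# 36^8 distinct 8-character strings:
assert 36 ** 8 == 2_821_109_907_456   # ≈ 2.8 trillion
```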
Back of Envelope Calculation
Number Of Database Nodes
3805 database reads per second: simple reads, can probably be handled on a single node
228 writes per second: can most likely be handled on a single node, depending on write complexity
74 bytes x 720 billion ≈ 53 TB of data: can be handled on a beefy node, but likely worth horizontally scaling
Block diagram
Create Database Index for reads , indexed on shortURL value
Replication
if a leader fails, Postgres is able to promote one of the replicas to leader
We may lose some data if a change of leader happens
Since we do not know which short URLs will be hot, use an LRU eviction policy to maximize the cache hit rate
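A compact LRU cache sketch using `OrderedDict` (capacity and keys are illustrative; in practice this would be Redis or Memcached with an LRU eviction policy):

```python
from collections import OrderedDict

class LRUCache:
    """Least-recently-used cache: on overflow, evict the entry
    that was touched longest ago."""
    def __init__(self, capacity):
        self.capacity = capacity
        self._data = OrderedDict()

    def get(self, key):
        if key not in self._data:
            return None
        self._data.move_to_end(key)         # mark as most recently used
        return self._data[key]

    def put(self, key, value):
        if key in self._data:
            self._data.move_to_end(key)
        self._data[key] = value
        if len(self._data) > self.capacity:
            self._data.popitem(last=False)  # evict least recently used
```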
One option is Citus
Citus is an open-source extension that transforms PostgreSQL into a distributed database, allowing it to scale horizontally across multiple commodity servers
Key Features and Concepts
As an Extension, not a Fork: Citus is implemented as an extension to standard PostgreSQL. This means it can stay current with the latest PostgreSQL features, tooling, and ecosystem, including support for recent versions like Postgres 18.
Horizontal Scaling: It enables PostgreSQL to use more CPU cores, memory, and storage than a single machine by distributing data and queries across a cluster of "nodes" in a shared-nothing architecture.
Sharding and Distribution: Citus shards (splits) large tables into smaller pieces distributed across "worker" nodes, while "reference" tables (smaller, frequently joined tables) are replicated across all nodes for maximum performance.
Distributed Query Engine: It includes a query planner and executor that parallelizes SQL queries across the worker nodes, resulting in dramatically improved response times for data-intensive applications.
Compatibility: Users can interact with the distributed database using standard SQL and existing PostgreSQL tools.
Deployment Options: Citus is available as a 100% open-source download or as a managed cloud service through Azure Cosmos DB for PostgreSQL.
Primary Use Cases
Citus is designed for performance-critical applications that have outgrown a single Postgres instance. Common use cases include:
Multi-tenant SaaS applications: Sharding data by tenant to isolate workloads and improve performance.
Real-time analytics and dashboards: Providing low-latency queries on large datasets (terabytes or more).
Time-series and IoT data platforms: Efficiently managing and analyzing large volumes of time-stamped data
LSM-based databases
designed for high write throughput; a log-structured merge-tree manages data by first writing to memory (MemTable), then flushing sorted, immutable files (SSTables) to disk. Common examples include RocksDB, Apache Cassandra, LevelDB, InfluxDB, and ScyllaDB, which are frequently used in NoSQL and time-series applications.
Reading immutable SSTables requires no locks, which is a performance benefit
B-trees
Many popular database management systems use B-trees (or the related B+ trees) as their primary indexing method due to their efficiency in handling data stored on disk.
When searching
LSM: may have to search both the in-memory MemTable and several SSTables on disk, so reads can be slower
B-tree: a lookup follows a single path to one location on disk
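The read-path difference can be sketched with a toy LSM store: a lookup checks the MemTable first, then SSTables newest to oldest (real engines add bloom filters and compaction; names here are illustrative):

```python
class ToyLSM:
    """Toy LSM read/write path: writes land in an in-memory MemTable,
    which is flushed as an immutable sorted 'SSTable' when full."""
    def __init__(self, memtable_limit=2):
        self.memtable = {}
        self.sstables = []          # immutable snapshots, newest last
        self.limit = memtable_limit

    def put(self, key, value):
        self.memtable[key] = value
        if len(self.memtable) >= self.limit:
            # Flush: sort and freeze the MemTable as an SSTable.
            self.sstables.append(dict(sorted(self.memtable.items())))
            self.memtable = {}

    def get(self, key):
        # The MemTable (newest data) is checked first...
        if key in self.memtable:
            return self.memtable[key]
        # ...then SSTables newest to oldest. This is why LSM reads may
        # touch several locations, unlike a single B-tree root-to-leaf walk.
        for table in reversed(self.sstables):
            if key in table:
                return table[key]
        return None
```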
Replication
If we use single-leader replication, we risk losing data if a failover happens
Example: Cassandra & ScyllaDB (leaderless replication)
Do we need strong consistency?
No; if a leader goes down, we can tolerate temporarily inconsistent reads
Multi-leader replication enables low-latency writes in geographically diverse regions. It can be used here, but comes at the cost of write conflicts when two users try to claim the same short URL at the same time.
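A toy illustration of that conflict, assuming each leader logs shortURL -> (owner, timestamp) and the merge resolves conflicts last-write-wins (one common, deliberately lossy strategy):

```python
def merge_leaders(log_a, log_b):
    """Merge two leaders' writes; on a conflict for the same short URL,
    keep the claim with the later timestamp (last-write-wins).
    Note: the 'losing' user's claim is silently discarded."""
    merged = dict(log_a)
    for short_url, (owner, ts) in log_b.items():
        if short_url not in merged or ts > merged[short_url][1]:
            merged[short_url] = (owner, ts)
    return merged
```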