Requirements: Functional
- Create a paste
- Read a paste
- Delete a paste
- Edit a paste
- Expire a URL
- No user registration, authentication, or authorization
- No favorites
Requirements: Non-Functional
- Secure
- Data privacy / compliance
- Call volumes
  - create = 500 million per month ≈ 192 per sec
  - 60 sec x 60 min x 24 hrs x 30 days ≈ 2.6 million sec in a month
  - read = 20x writes = 10 billion per month ≈ 4,000 per sec
  - avg paste size of 100 KB
  - max paste size of 5 GB
  - expect spikes above these averages
- High availability
  - within US / global
  - 99%
- Low latency
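The call-volume numbers above can be sanity-checked with a few lines (500 million creates/month and the 20:1 read:write ratio are taken from the list above):

```python
# Back-of-the-envelope check for the call-volume estimates.
SECONDS_PER_MONTH = 60 * 60 * 24 * 30           # 2,592,000

creates_per_month = 500_000_000
reads_per_month = 20 * creates_per_month        # read:write = 20:1

creates_per_sec = creates_per_month / SECONDS_PER_MONTH
reads_per_sec = reads_per_month / SECONDS_PER_MONTH

print(f"{creates_per_sec:.0f} creates/sec")     # prints "193 creates/sec"
print(f"{reads_per_sec:.0f} reads/sec")         # prints "3858 reads/sec"
```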
Data Design / Entity Diagram
- text : 100 KB avg, stored in S3; the row holds the S3 link
- unique id : 8 bytes
- hashed short url : 8 bytes
- create timestamp : 8 bytes
- update timestamp : 8 bytes
- user : 8 bytes
(100 KB = 0.1 MB)
Data Volumes
- seconds in a month: 60 x 60 x 24 x 30 = 2,592,000
- content: ~192 creates/sec x 0.1 MB = ~19.2 MB/sec
- monthly: 2,592,000 sec x 19.2 MB/sec ≈ 50 TB (= 500 million pastes x 0.1 MB)
- yearly: ~600 TB of paste content
- metadata: 40 bytes/row x 500 million rows ≈ 20 GB per month
- Is it fair to assume that 80% of read traffic is generated by 20% of URLs? If so, caching that hot 20% serves most reads.
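The storage figures follow the same style of estimate; the 100 KB average and the 40-byte metadata row are taken from the sections above:

```python
# Storage estimate: paste content (bound for S3) vs. row metadata (DB).
creates_per_month = 500_000_000
avg_paste_bytes = 100 * 1024                    # 100 KB average paste
row_bytes = 8 * 5                               # 5 fields x 8 bytes each

content_tb = creates_per_month * avg_paste_bytes / 1024**4
metadata_gb = creates_per_month * row_bytes / 1024**3

print(f"content: ~{content_tb:.0f} TB/month")   # ~47 TB/month (binary TB)
print(f"metadata: ~{metadata_gb:.0f} GB/month") # ~19 GB/month
```

The content dwarfs the metadata by three orders of magnitude, which is the quantitative argument for putting the text in object storage and only the pointer in SQL.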
APIs
- string createShortText(string longText)
- string getLongText(string shortText)
- void deleteText(string shortText)
High Level
The basic design has a single point of failure: the load balancer.
Extending the arch to PasteBin
Questions
- do we want to use S3? does it make sense?
- use of a CDN?
Storing large values inline hurts cache efficiency and messes up DB page layout.
Generally, storing a 1GB text file directly in a SQL database is not a good idea. While most modern SQL databases (like PostgreSQL or MySQL) have "Large Object" types that can technically hold up to 1GB or 4GB, doing so often leads to significant performance and operational headaches.
Why it's usually a bad idea
- RAM and Buffer Bloat: SQL databases are optimized for small, row-based data. Loading a 1GB field into memory can clear out the database's cache, forcing other frequent queries to read from the slow disk instead of RAM.
- Backup and Recovery: Large Blobs make your database backups (dumps) massive and slow. Restoring a 50GB database with 40 huge files takes much longer than restoring a 10GB database of structured data.
- Network Latency: Moving 1GB from the database to your application server takes significant time. If your app is not careful about "streaming" the data, it might try to load the entire 1GB into its own RAM, causing an Out of Memory (OOM) crash.
- Fragmentation: Storing huge chunks of data can lead to disk fragmentation within the database files, making overall database performance degrade over time.
The "Standard" Alternative: Object Storage
The industry-standard approach is to use Object Storage (like AWS S3, Google Cloud Storage, or Azure Blob Storage).
- Store the file in Object Storage.
- Store the URL/Path (a small string) in your SQL database.
When it MIGHT be okay
There are rare cases where storing it in the DB is acceptable:
- Strict Transactional Integrity: You need the file and its metadata to be deleted or updated in a single "atomic" operation (all or nothing).
- Security: You need the database's specific Row-Level Security (RLS) to manage access to the file.
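The "bytes in object storage, pointer in SQL" split can be sketched with sqlite standing in for the metadata DB (the exact table layout is an assumption based on the entity list above):

```python
import sqlite3
import time

conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE pastes (
        short_url  TEXT PRIMARY KEY,   -- hashed short url
        s3_key     TEXT NOT NULL,      -- pointer to the 100 KB..5 GB object
        created_at INTEGER NOT NULL,   -- create timestamp
        updated_at INTEGER NOT NULL    -- update timestamp
    )
""")

now = int(time.time())
# The row stores only a small string; the paste bytes never touch the DB.
conn.execute("INSERT INTO pastes VALUES (?, ?, ?, ?)",
             ("aB3xK9qZ", "pastes/aB3xK9qZ", now, now))

(s3_key,) = conn.execute(
    "SELECT s3_key FROM pastes WHERE short_url = ?",
    ("aB3xK9qZ",)).fetchone()
print(s3_key)  # the app then fetches the content from object storage
```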
How does content get into S3?
- avoid proxying uploads through the app server
  - handling large uploads would make the server a bottleneck
  - it adds extra network hops
- the upload request should go directly to S3
The other option is a Pre-Signed URL (since we do not want to share S3 config/credential details with clients). A single PUT maxes out at 5 GB; for bigger files, use multipart upload.
If we write the DB row first and the S3 create then fails, we are left with dangling entries in the DB, so that ordering is a bad option.
If we don't want to use S3 lifecycle rules, run a custom cron job to delete expired objects.
- use EPOCH time in object names, zero-padded so ALL names have the same number of characters (lexicographic order then matches chronological order).
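The zero-padded-epoch naming trick in code: with a fixed width, string order equals time order, so the expiry sweep can binary-search to the cutoff (the 10-digit width and sample keys are illustrative):

```python
import bisect

def expiring_key(epoch: int, short_url: str) -> str:
    # Zero-pad to a fixed 10 digits so ALL names sort lexicographically
    # in the same order as their timestamps.
    return f"{epoch:010d}/{short_url}"

keys = sorted([
    expiring_key(1_700_000_000, "aB3xK9qZ"),
    expiring_key(1_700_000_500, "Qr7mN2pX"),
    expiring_key(1_700_001_000, "zT4vH8kL"),
])

cutoff = expiring_key(1_700_000_600, "")      # everything older has expired
expired = keys[:bisect.bisect_left(keys, cutoff)]
print(expired)  # the cron job deletes these two objects
```

Without the padding, `"999..."` would sort after `"1700..."` and the sweep would miss or over-delete objects.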