Requirements: Functional
- Create a Paste
- Read a Paste
- Delete a Paste
- Edit a Paste
- Expire URL
- No user registration, authentication, or authorization
- No favorites
Requirements: Non-Functional
- Secure
- Data Privacy / Compliance
- Call Volumes
- create = 500 million per month = 192 per sec
- 60 sec x 60 min x 24 hrs x 30 days = 2.6 million sec in a month
- read = 20x = 10billion per month = 4000 per sec
- avg paste size of 100 KB
- max paste size of 5 GB
- spike
- High Availability
- within US / Global
- 99%
- Low Latency
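The call-volume arithmetic above can be checked with a few lines; all the inputs (500M creates/month, 20x read ratio) come straight from the notes.

```python
# Back-of-envelope check of the call-volume numbers above.
SECONDS_PER_MONTH = 60 * 60 * 24 * 30        # 2,592,000 ≈ 2.6M seconds

creates_per_month = 500_000_000
creates_per_sec = creates_per_month / SECONDS_PER_MONTH   # ≈ 193/sec

reads_per_month = 20 * creates_per_month                  # 10 billion
reads_per_sec = reads_per_month / SECONDS_PER_MONTH       # ≈ 3,858/sec, i.e. ~4000/sec

print(f"{creates_per_sec:.0f} creates/sec, {reads_per_sec:.0f} reads/sec")
```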
Data Design / Entity Diagram
- text : 100 KB avg (stored in S3; the DB keeps only the link)
- unique id : 8 bytes
- hashed short url : 8 bytes
- create timestamp : 8 bytes
- update timestamp : 8 bytes
- user : 8 bytes
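The per-paste metadata row above (five 8-byte fields plus the S3 link) can be sketched as a record; the field names here are illustrative assumptions, not part of the notes.

```python
from dataclasses import dataclass

# One metadata row per paste; the text itself lives in object storage
# and the row keeps only the key. Field names are illustrative.
@dataclass
class PasteMeta:
    paste_id: int        # unique id, 8 bytes
    short_url_hash: int  # hashed short url, 8 bytes
    created_at: int      # create timestamp (epoch), 8 bytes
    updated_at: int      # update timestamp (epoch), 8 bytes
    user_id: int         # user, 8 bytes
    s3_key: str          # link to the 100 KB-avg text in S3
```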
Data Volumes
- seconds in a month : 60 x 60 x 24 x 30 = 2,592,000
- blob storage : 500 million pastes x 100 KB avg = 50,000 GB = 50 TB per month
- yearly : ~600 TB
- metadata : ~40 bytes per row x 500 million = ~20 GB per month (negligible next to the blobs)
- Is it fair to assume that 80% of traffic is generated by 20% of URLs?
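The storage math can likewise be verified directly from the stated assumptions (500M creates/month, 100 KB average paste):

```python
# Storage estimate from the assumptions above (decimal units: 1 TB = 1e9 KB).
AVG_PASTE_KB = 100
CREATES_PER_MONTH = 500_000_000

monthly_tb = CREATES_PER_MONTH * AVG_PASTE_KB / 1_000_000_000  # KB -> TB
yearly_tb = 12 * monthly_tb

print(f"{monthly_tb:.0f} TB/month, {yearly_tb:.0f} TB/year")
```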
APIs
- string createShortText(string longText)
- string getLongText(string shortText)
- void deleteText(string shortText)
High Level
Questions
- do we want to use S3 ? does it make sense ?
- use of CDN ?
Why storing large pastes as BLOBs in the SQL database is usually a bad idea
- RAM and Buffer Bloat: SQL databases are optimized for small, row-based data. Loading a 1GB field into memory can clear out the database's cache, forcing other frequent queries to read from the slow disk instead of RAM.
- Backup and Recovery: Large Blobs make your database backups (dumps) massive and slow. Restoring a 50GB database with 40 huge files takes much longer than restoring a 10GB database of structured data.
- Network Latency: Moving 1GB from the database to your application server takes significant time. If your app is not careful about "streaming" the data, it might try to load the entire 1GB into its own RAM, causing an Out of Memory (OOM) crash.
- Fragmentation: Storing huge chunks of data can lead to disk fragmentation within the database files, making overall database performance degrade over time.
The "Standard" Alternative: Object Storage
The industry-standard approach is to use Object Storage (like AWS S3, Google Cloud Storage, or Azure Blob Storage):
- Store the file in Object Storage.
- Store the URL/path (a small string) in your SQL database.
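The blob-in-object-storage, pointer-in-SQL pattern can be sketched as below, with a dict standing in for S3 and an in-memory SQLite table as the database (both stand-ins are assumptions; production would use a real S3 client and a real DB):

```python
import sqlite3

# Stand-in for S3/GCS/Azure: maps object key -> bytes.
object_store: dict[str, bytes] = {}

db = sqlite3.connect(":memory:")
db.execute("CREATE TABLE paste (id INTEGER PRIMARY KEY, s3_key TEXT)")

def save_paste(paste_id: int, text: bytes) -> None:
    key = f"pastes/{paste_id}"
    object_store[key] = text          # 1. upload the blob first
    db.execute("INSERT INTO paste VALUES (?, ?)",
               (paste_id, key))       # 2. then record only the small pointer
    db.commit()
```

Note the ordering: upload to object storage first, then write the DB row, so a failed upload never leaves a dangling DB entry.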
When it MIGHT be okay
There are rare cases where storing it in the DB is acceptable:
- Strict Transactional Integrity: You need the file and its metadata to be deleted or updated in a single "atomic" operation (all or nothing).
- Security: You need the database's specific Row-Level Security (RLS) to manage access to the file.
- Avoid proxying uploads through the app server
  - handling large uploads on the server can become a bottleneck
  - it adds an extra network hop
  - requests should go directly to S3 (e.g. via pre-signed URLs)
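A concept sketch of direct-to-storage uploads via a signed URL. This is NOT AWS's actual SigV4 pre-signing; the SECRET, the URL shape, and both helpers are illustrative assumptions. The idea shown is only that the server signs (key, expiry) so the client can upload straight to storage without the bytes passing through the app server.

```python
import hashlib
import hmac
import time

SECRET = b"demo-secret"  # illustrative; a real deployment uses managed credentials

def presign(key: str, ttl_sec: int = 300) -> str:
    # Sign the object key plus an expiry time; the client uses this URL
    # to PUT directly to storage within the TTL.
    expires = int(time.time()) + ttl_sec
    sig = hmac.new(SECRET, f"{key}:{expires}".encode(), hashlib.sha256).hexdigest()
    return f"https://storage.example.com/{key}?expires={expires}&sig={sig}"

def verify(key: str, expires: int, sig: str) -> bool:
    # Storage side: reject expired or tampered requests.
    if time.time() > expires:
        return False
    expected = hmac.new(SECRET, f"{key}:{expires}".encode(), hashlib.sha256).hexdigest()
    return hmac.compare_digest(expected, sig)
```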
If we write to the DB first and the S3 upload then fails, we are left with dangling entries in the DB, so that ordering is a bad option.
If we don't want to use S3 lifecycle rules, run a custom cron job to delete expired objects.
- use EPOCH time in object names, zero-padded so the same number of characters exists in ALL names (keeps lexicographic order aligned with creation time).
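The epoch-in-the-name idea can be sketched as follows; the 10-digit zero padding and the name layout are assumptions for illustration.

```python
# Fixed-width epoch-prefixed object names: zero-padding keeps every name
# the same length, so lexicographic order matches creation time and an
# expiry sweep (e.g. a cron job) can delete in order up to a cutoff.
def object_name(paste_id: str, created_epoch: int) -> str:
    return f"{created_epoch:010d}-{paste_id}"   # 10 digits covers epoch seconds well past today

def expired(name: str, ttl_sec: int, now: int) -> bool:
    created = int(name[:10])
    return now - created > ttl_sec
```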