Requirements: Functional
- Create a paste
- Read a paste
- Delete a paste
- Edit a paste
- Expire a URL
- No user registration, authentication, or authorization
- No favorites
Requirements: Non-Functional
- Secure
- Data privacy / compliance
- Call volumes
  - create = 500 million per month ≈ 192 per sec
  - 60 sec x 60 min x 24 hrs x 30 days ≈ 2.6 million sec in a month
  - read = 20x writes = 10 billion per month ≈ 4,000 per sec
  - avg paste size of 100 KB
  - max paste size of 5 GB
  - expect spikes above these averages
- High availability
  - within US / global
  - 99%
- Low latency
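The call-volume numbers above can be sanity-checked with a few lines (500 million creates/month and the 20:1 read:write ratio are taken from the list above):

```python
# Back-of-the-envelope check for the call-volume estimates.
SECONDS_PER_MONTH = 60 * 60 * 24 * 30           # 2,592,000

creates_per_month = 500_000_000
reads_per_month = 20 * creates_per_month        # read:write = 20:1

creates_per_sec = creates_per_month / SECONDS_PER_MONTH
reads_per_sec = reads_per_month / SECONDS_PER_MONTH

print(f"{creates_per_sec:.0f} creates/sec")     # prints "193 creates/sec"
print(f"{reads_per_sec:.0f} reads/sec")         # prints "3858 reads/sec"
```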
Data Design / Entity Diagram
- text : 100 KB avg, stored in S3; the row holds the S3 link
- unique id : 8 bytes
- hashed short url : 8 bytes
- create timestamp : 8 bytes
- update timestamp : 8 bytes
- user : 8 bytes
(100 KB = 0.1 MB)
Data Volumes
- seconds in a month: 60 x 60 x 24 x 30 = 2,592,000
- content: ~192 creates/sec x 0.1 MB = ~19.2 MB/sec
- monthly: 2,592,000 sec x 19.2 MB/sec ≈ 50 TB (= 500 million pastes x 0.1 MB)
- yearly: ~600 TB of paste content
- metadata: 40 bytes/row x 500 million rows ≈ 20 GB per month
- Is it fair to assume that 80% of read traffic is generated by 20% of URLs? If so, caching that hot 20% serves most reads.
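The storage figures follow the same style of estimate; the 100 KB average and the 40-byte metadata row are taken from the sections above:

```python
# Storage estimate: paste content (bound for S3) vs. row metadata (DB).
creates_per_month = 500_000_000
avg_paste_bytes = 100 * 1024                    # 100 KB average paste
row_bytes = 8 * 5                               # 5 fields x 8 bytes each

content_tb = creates_per_month * avg_paste_bytes / 1024**4
metadata_gb = creates_per_month * row_bytes / 1024**3

print(f"content: ~{content_tb:.0f} TB/month")   # ~47 TB/month (binary TB)
print(f"metadata: ~{metadata_gb:.0f} GB/month") # ~19 GB/month
```

The content dwarfs the metadata by three orders of magnitude, which is the quantitative argument for putting the text in object storage and only the pointer in SQL.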
APIs
- string createShortText(string longText)
- string getLongText(string shortText)
- void deleteText(string shortText)
High Level
The basic design has a single point of failure: the load balancer.
Extending the arch to PasteBin
Questions
- do we want to use S3? does it make sense?
- use of a CDN?
Storing large values inline hurts cache efficiency and messes up DB page layout.
Generally, storing a 1GB text file directly in a SQL database is not a good idea. While most modern SQL databases (like PostgreSQL or MySQL) have "Large Object" types that can technically hold up to 1GB or 4GB, doing so often leads to significant performance and operational headaches.
Why it's usually a bad idea
- RAM and Buffer Bloat: SQL databases are optimized for small, row-based data. Loading a 1GB field into memory can clear out the database's cache, forcing other frequent queries to read from the slow disk instead of RAM.
- Backup and Recovery: Large Blobs make your database backups (dumps) massive and slow. Restoring a 50GB database with 40 huge files takes much longer than restoring a 10GB database of structured data.
- Network Latency: Moving 1GB from the database to your application server takes significant time. If your app is not careful about "streaming" the data, it might try to load the entire 1GB into its own RAM, causing an Out of Memory (OOM) crash.
- Fragmentation: Storing huge chunks of data can lead to disk fragmentation within the database files, making overall database performance degrade over time.
The "Standard" Alternative: Object Storage
The industry-standard approach is to use Object Storage (like AWS S3, Google Cloud Storage, or Azure Blob Storage).
- Store the file in Object Storage.
- Store the URL/Path (a small string) in your SQL database.
When it MIGHT be okay
There are rare cases where storing it in the DB is acceptable:
- Strict Transactional Integrity: You need the file and its metadata to be deleted or updated in a single "atomic" operation (all or nothing).
- Security: You need the database's specific Row-Level Security (RLS) to manage access to the file.
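The "bytes in object storage, pointer in SQL" split can be sketched with sqlite standing in for the metadata DB (the exact table layout is an assumption based on the entity list above):

```python
import sqlite3
import time

conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE pastes (
        short_url  TEXT PRIMARY KEY,   -- hashed short url
        s3_key     TEXT NOT NULL,      -- pointer to the 100 KB..5 GB object
        created_at INTEGER NOT NULL,   -- create timestamp
        updated_at INTEGER NOT NULL    -- update timestamp
    )
""")

now = int(time.time())
# The row stores only a small string; the paste bytes never touch the DB.
conn.execute("INSERT INTO pastes VALUES (?, ?, ?, ?)",
             ("aB3xK9qZ", "pastes/aB3xK9qZ", now, now))

(s3_key,) = conn.execute(
    "SELECT s3_key FROM pastes WHERE short_url = ?",
    ("aB3xK9qZ",)).fetchone()
print(s3_key)  # the app then fetches the content from object storage
```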
How does content get into S3?
- avoid proxying uploads through the app server
  - handling large uploads would make the server a bottleneck
  - it adds extra network hops
- the upload request should go directly to S3
The other option is a Pre-Signed URL (since we do not want to share S3 config/credential details with clients). A single PUT maxes out at 5 GB; for bigger files, use multipart upload.
If we write the DB row first and the S3 create then fails, we are left with dangling entries in the DB, so that ordering is a bad option.
If we don't want to use S3 lifecycle rules, run a custom cron job to delete expired objects.
- use EPOCH time in object names, zero-padded so ALL names have the same number of characters (lexicographic order then matches chronological order).
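The zero-padded-epoch naming trick in code: with a fixed width, string order equals time order, so the expiry sweep can binary-search to the cutoff (the 10-digit width and sample keys are illustrative):

```python
import bisect

def expiring_key(epoch: int, short_url: str) -> str:
    # Zero-pad to a fixed 10 digits so ALL names sort lexicographically
    # in the same order as their timestamps.
    return f"{epoch:010d}/{short_url}"

keys = sorted([
    expiring_key(1_700_000_000, "aB3xK9qZ"),
    expiring_key(1_700_000_500, "Qr7mN2pX"),
    expiring_key(1_700_001_000, "zT4vH8kL"),
])

cutoff = expiring_key(1_700_000_600, "")      # everything older has expired
expired = keys[:bisect.bisect_left(keys, cutoff)]
print(expired)  # the cron job deletes these two objects
```

Without the padding, `"999..."` would sort after `"1700..."` and the sweep would miss or over-delete objects.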