https://www.youtube.com/watch?v=xFeWVugaouk&list=PLjTveVh7FakLGZ36GbWAk_DMf_0xBZpGv&index=2
Requirements : Functional
- createShortURL(String longURL) : returns string shortURL
- expiration time
- purge time
- fetchShortURL(String shortURL) : redirects user to corresponding longURL
Requirements : Non Functional
- secure
- data privacy / compliance
- call volume :
- launch 1million users , DAU , WAU , MAU
- URL creation 600 million users per month , 228 per second
- URL Retrieval 10 billion per month : 3805 per second
- spike 5x
- low latency
- High availability
- Uptime : 99%
Data Volume
- 600 million creations/mo = 7.2 billion creations/y = 720 billion creations/century
If each URL can contain a-z and 0-9, we have 36 choices per characterIf using 8 characters, we have 368 = 3 trillion possibilities
Simple reads, can probably be handled on a single node.
228 writes per second
Can most likely be handled on a single node, depending on write complexity
74 bytes * 720 billion = 48TB of data
Can be handled on a beefy node, but likely worth horizontally scaling
Create Database Index for reads , indexed on shortURL value
Replication
if a leader fails, postgres is able to promote one of the replica as leader
We may loose some data if change of leader happens
Since we do not know which short URL will be hot , use LRU policy will max hit rate
One option is Citus
- As an Extension, not a Fork: Citus is implemented as an extension to standard PostgreSQL. This means it can stay current with the latest PostgreSQL features, tooling, and ecosystem, including support for recent versions like Postgres 18.
- Horizontal Scaling: It enables PostgreSQL to use more CPU cores, memory, and storage than a single machine by distributing data and queries across a cluster of "nodes" in a shared-nothing architecture.
- Sharding and Distribution: Citus shards (splits) large tables into smaller pieces distributed across "worker" nodes, while "reference" tables (smaller, frequently joined tables) are replicated across all nodes for maximum performance.
- Distributed Query Engine: It includes a query planner and executor that parallelizes SQL queries across the worker nodes, resulting in dramatically improved response times for data-intensive applications.
- Compatibility: Users can interact with the distributed database using standard SQL and existing PostgreSQL tools.
- Deployment Options: Citus is available as a 100% open-source download or as a managed cloud service through Azure Cosmos DB for PostgreSQL.
- Multi-tenant SaaS applications: Sharding data by tenant to isolate workloads and improve performance.
- Real-time analytics and dashboards: Providing low-latency queries on large datasets (terabytes or more).
- Time-series and IoT data platforms: Efficiently managing and analyzing large volumes of time-stamped data
- LSM may search in memory and disk hence read maybe slower
- Btree : always search by going to single location (single search) on disk.
Example : Cassandra & Sila
Do we need strong consistency ?
no , if leader goes down it can have inconsistent reads
Multi Leader replication : enables low latency writers in geo diverse regions , can be used , comes with a cost of write conflict when 2 users come & try to claim same short URL,
We can solve it by multiple shard leaders
No comments:
Post a Comment