Saturday, November 1, 2014

System Design: Design a Tiny URL

1. What is a Tiny URL?
http://n00tc0d3r.blogspot.com/2013/09/big-data-tinyurl.html

tinyurl is a URL service that users enter a long URL and then the service return a shorter and unique url such as "http://tiny.me/5ie0V2". The highlight part can be any string with 6 letters containing [0-9, a-z, A-Z]. That is, 62^6 ~= 56.8 billions unique strings.

Step 1: Constraints and use cases
Use cases:
  1. shortening: take a URL => return a much shorter URL
  2. Redirection: take a short URL => redirect to the original URL
  3. Custom URL
  4. High availability of the system
Out of scope:
  1. Analytics
  2. Automatic link expiration
  3. Manual link removal
  4. UI v.s. API

System contraints
1. New URLs per month: 100 MLN
2. 1BN requests per month
3. 10% from shortening and 90% from redirection
4. Request per second: 400 + (40 shortening, 360 from redirection)
5. 6BN URLs in 5 years
6. 500 bytes per URL
7. 5 bytes per hash
8. 3 TB for all URLs, 36 GB for hashes
9. New data written per second (40 * (500 + 5)) = 20 K
10. New data read per second 360 * (500 + 5) = 180 K

Step 2: Abstract Design

  1. Application service layer
    1. Shortening service
    2. Redirection service
  2. Data storage layer (keep track of hash <=> url mapping)
    1. Acts like a big hash table


hashed_url = convert_to_base64(md5(original_url + random_salt))[:6]


Step 3: Understand the bottlenecks
Traffic:
Lot of data:

Bottleneck:
Traffic is not probably very hard
Data is interesting

Step 4: Scaling your abstract design




Useful links:
http://en.wikipedia.org/wiki/URL_shortening
http://en.wikipedia.org/wiki/TinyURL
http://en.wikipedia.org/wiki/Bitly

Useful concepts:
1. MD5 http://en.wikipedia.org/wiki/MD5
2. Base64 http://en.wikipedia.org/wiki/Base64
3. mysql, nosql http://www.thegeekstuff.com/2014/01/sql-vs-nosql-db/ 
5. consistent hashing http://en.wikipedia.org/wiki/Consistent_hashing
6. MongoDB http://en.wikipedia.org/wiki/MongoDB

No comments:

Post a Comment