Short URL Service Design


Project Description

Given a long url and a user specified short url. Map the short one to the long one. Very similar to bit.do or zgzg.link

The service is for personal usage, peak traffic is 1 qps

Following features should be implemented, ordered by priority. P0 means top priority, or mission critical feature. P1 means important feature. P2 means good to have feature.

  1. P0: the domain for short link should be short and easy to type on phone. – bosong.link is chosen.
    1. other domains is either too expensive or is taken. such as short.link, quick.link, b.link, bo.link, boso.link, bs.link, etc. bobo.link asks for $500.
  2. P0: bosong.link lands at a short link creation page. In the link creation page:
    1. P0: User specifies short URL and original url
    2. P1: User could login (google login, wechat login) to claim existing links, edit links, delete links.
  3. P0: bosong.link/[shortlink] fetches the corresponding long link in database, redirects to the long link if it exists. Otherwise redirects to the landing page with short link prefilled.

Tech Stack Selection

Tech stack consists of

  • programming language
  • operating system
  • web server
  • database
  • web development framework

There are several mature web app stacks on the market, this article summarizes them. Following are the highlights:

  • LAMP (Linux + Apache + MySQL + PhP)
    • most popular, and most traditional one.
    • Easy to customize
    • This stack does not contain framework
  • MEAN (MongoDB + Express.js (backend framework) + AngularJS (frontend framework) + Node.js (runtime environment)) , javascript in both frontend and backend
    • Node.js is a runtime environment that could execute javascript outside of a browser.
    • Express.js is the framework for Node.js environment.
  • MERN (replace AngularJS with ReactJS in MEAN)
  • Python-Django Apache + MySQL
    • Quick development
    • Suitable for create MVP projects.

SQL vs Non SQL

In almost all situations SQL databases are vertically scalable. This means that you can increase the load on a single server by increasing things like RAM, CPU or SSD. But on the other hand NoSQL databases are horizontally scalable. This means that you handle more traffic by sharding, or adding more servers in your Non SQL database.

There are two kinds of sharding – vertical partition and horizontal partition. Vertical partition – partition based on columns. Columns are clustered into different shards, Horizontal partition – rows are patition into different shards.

数据库sharding基本思想和切分策略

Decision

The table is quite simple in this short URL service, but tons of rows would be added into the table. So Non SQL with horizontal sharding and horizontal scalability is preferred.

However, since current www.bo-song.com server is using LAMP and current memory usage is consistently over 50%. It’s not practical to deploy another Non SQL database.

Therefore, LAMP is used and a new database is created in MySQL.

Pros: save server resources. quick deploy

Cons:

  1. short link service is coupled with personal website. MySQL is the single point of failure.
  2. MySQL is hard to horizontally scale
  3. No framework, hard to extend the code to support more complex logic

Given it’s a personal usage short link. The cons above is tolerant. A system redesign is needed to support scale up in the future.

Detailed Design

Redirection code – 301 vs 302

HTTP 301 – permanent redirection

HTTP 302 – temporary redirection

Decision: HTTP 302

https://www.searchenginejournal.com/301-vs-302-redirects-seo/299843/#close

Table Design – minimize read cost

# first version
CREATE TABLE url ( 
        id INT unsigned NOT NULL AUTO_INCREMENT PRIMARY KEY, 
	short_url VARCHAR(255) NOT NULL, 
	original_url VARCHAR(65535) NOT NULL, 
	owner_email VARCHAR(255), 
	is_deleted BOOLEAN DEFAULT false, 
	visit_count INT DEFAULT 0, 
	creation_date DATE DEFAULT CURRENT_DATE, 
	last_visited_date DEFAULT DATE CURRENT_DATE);

There are some bottlenecks in the first version. Several fields need to be updated in every read operation, such as last_visited_date and visit_count. Though these costy operations could be done after server returns the response so that user is unware of such delay. However, it increases the server burden and might crash the server in peak traffic.

Therefore in version 2, these two fields are removed. There is no DB write operation in url query request.

# version 2
CREATE TABLE url ( id INT unsigned NOT NULL AUTO_INCREMENT PRIMARY KEY, 
     short_url VARCHAR(255) NOT NULL, 
     original_url VARCHAR(65535) NOT NULL, 
     owner_email VARCHAR(255), 
     is_deleted BOOLEAN DEFAULT false, 
     creation_date DATE DEFAULT CURRENT_DATE);

Future Work (Non MVP features)

Hotlink Memory Caching

QR code generation

Link edit/delete, user login

DDos Prevention

Useful Resources

MySQL php operation https://www.w3schools.com/php/php_mysql_connect.asp

MySQL creating new db

https://www.a2hosting.com/kb/developer-corner/mysql/managing-mysql-databases-and-users-from-the-command-line

Write 302 redirection

Set up mod_write rules

https://gist.github.com/RaVbaker/2254618

Apache official document for rewrite rules

https://httpd.apache.org/docs/current/rewrite/remapping.html

It seems different Apache versions handle the file path in RewriteCond differently, try different approaches to find out the correct pattern for it.

https://www.digitalocean.com/community/tutorials/how-to-set-up-mod_rewrite

Leave a comment

Your email address will not be published. Required fields are marked *

This site uses Akismet to reduce spam. Learn how your comment data is processed.