Project Description
Given a long url and a user specified short url. Map the short one to the long one. Very similar to bit.do or zgzg.link
The service is for personal usage, peak traffic is 1 qps
Following features should be implemented, ordered by priority. P0 means top priority, or mission critical feature. P1 means important feature. P2 means good to have feature.
- P0: the domain for short link should be short and easy to type on phone. – bosong.link is chosen.
- other domains is either too expensive or is taken. such as short.link, quick.link, b.link, bo.link, boso.link, bs.link, etc. bobo.link asks for $500.
- P0: bosong.link lands at a short link creation page. In the link creation page:
- P0: User specifies short URL and original url
- P1: User could login (google login, wechat login) to claim existing links, edit links, delete links.
- P0: bosong.link/[shortlink] fetches the corresponding long link in database, redirects to the long link if it exists. Otherwise redirects to the landing page with short link prefilled.
Tech Stack Selection
Tech stack consists of
- programming language
- operating system
- web server
- database
- web development framework
There are several mature web app stacks on the market, this article summarizes them. Following are the highlights:
- LAMP (Linux + Apache + MySQL + PhP)
- most popular, and most traditional one.
- Easy to customize
- This stack does not contain framework
- MEAN (MongoDB + Express.js (backend framework) + AngularJS (frontend framework) + Node.js (runtime environment)) , javascript in both frontend and backend
- Node.js is a runtime environment that could execute javascript outside of a browser.
- Express.js is the framework for Node.js environment.
- MERN (replace AngularJS with ReactJS in MEAN)
- Python-Django Apache + MySQL
- Quick development
- Suitable for create MVP projects.
SQL vs Non SQL
In almost all situations SQL databases are vertically scalable. This means that you can increase the load on a single server by increasing things like RAM, CPU or SSD. But on the other hand NoSQL databases are horizontally scalable. This means that you handle more traffic by sharding, or adding more servers in your Non SQL database.
There are two kinds of sharding – vertical partition and horizontal partition. Vertical partition – partition based on columns. Columns are clustered into different shards, Horizontal partition – rows are patition into different shards.
— 数据库sharding基本思想和切分策略
Decision
The table is quite simple in this short URL service, but tons of rows would be added into the table. So Non SQL with horizontal sharding and horizontal scalability is preferred.
However, since current www.bo-song.com server is using LAMP and current memory usage is consistently over 50%. It’s not practical to deploy another Non SQL database.
Therefore, LAMP is used and a new database is created in MySQL.
Pros: save server resources. quick deploy
Cons:
- short link service is coupled with personal website. MySQL is the single point of failure.
- MySQL is hard to horizontally scale
- No framework, hard to extend the code to support more complex logic
Given it’s a personal usage short link. The cons above is tolerant. A system redesign is needed to support scale up in the future.
Detailed Design
Redirection code – 301 vs 302
HTTP 301 – permanent redirection
HTTP 302 – temporary redirection
Decision: HTTP 302
https://www.searchenginejournal.com/301-vs-302-redirects-seo/299843/#close
Table Design – minimize read cost
# first version
CREATE TABLE url (
id INT unsigned NOT NULL AUTO_INCREMENT PRIMARY KEY,
short_url VARCHAR(255) NOT NULL,
original_url VARCHAR(65535) NOT NULL,
owner_email VARCHAR(255),
is_deleted BOOLEAN DEFAULT false,
visit_count INT DEFAULT 0,
creation_date DATE DEFAULT CURRENT_DATE,
last_visited_date DEFAULT DATE CURRENT_DATE);
There are some bottlenecks in the first version. Several fields need to be updated in every read operation, such as last_visited_date and visit_count. Though these costy operations could be done after server returns the response so that user is unware of such delay. However, it increases the server burden and might crash the server in peak traffic.
Therefore in version 2, these two fields are removed. There is no DB write operation in url query request.
# version 2
CREATE TABLE url ( id INT unsigned NOT NULL AUTO_INCREMENT PRIMARY KEY,
short_url VARCHAR(255) NOT NULL,
original_url VARCHAR(65535) NOT NULL,
owner_email VARCHAR(255),
is_deleted BOOLEAN DEFAULT false,
creation_date DATE DEFAULT CURRENT_DATE);
Future Work (Non MVP features)
Hotlink Memory Caching
QR code generation
Link edit/delete, user login
DDos Prevention
Useful Resources
MySQL php operation https://www.w3schools.com/php/php_mysql_connect.asp
MySQL creating new db
Write 302 redirection
Set up mod_write rules
https://gist.github.com/RaVbaker/2254618
Apache official document for rewrite rules
https://httpd.apache.org/docs/current/rewrite/remapping.html
It seems different Apache versions handle the file path in RewriteCond differently, try different approaches to find out the correct pattern for it.
https://www.digitalocean.com/community/tutorials/how-to-set-up-mod_rewrite