What is CDN? This article will introduce it in detail. - Weekly Sharing

Summary: The Internet is now part of everyday life, and Internet businesses such as e-commerce, portals, live streaming, and games reach a wide audience. Behind these services sits an important component: the CDN. This article introduces how a CDN works and the main technical concepts behind it.

The Internet is now part of everyday life, and Internet businesses such as e-commerce, portals, live streaming, and games reach a wide audience. Behind these services sits an important component: the CDN.

Most businesses on the Internet depend on CDN support. It is fair to say that without CDNs, today's Internet would not have flourished as it has.

In this article, we introduce how a CDN works and the main technical concepts behind it.

I. What is CDN?

In the 1990s, Tim Berners-Lee, a professor at the Massachusetts Institute of Technology and the inventor of the World Wide Web, predicted that, given the rapid growth of Internet traffic at the time, network congestion would soon become the biggest obstacle to the Internet's development. He therefore posed an academic challenge: to invent a fundamentally new way to distribute Internet content without congestion.

This academic challenge eventually led to a revolutionary Internet service, the CDN, commercialized by Berners-Lee's colleague Professor Tom Leighton and several other scientists, who founded the world's first CDN company: Akamai.

Wikipedia describes a CDN as follows:

A content delivery network, or content distribution network (CDN), is a geographically distributed network of proxy servers and their data centres. The goal is to provide high availability and performance by distributing the service spatially relative to end users. CDNs came into existence in the late 1990s to alleviate the performance bottlenecks of the Internet as the Internet was starting to become a mission-critical medium for people and enterprises. Since then, CDNs have grown to serve a large portion of the Internet content today, including web objects (text, graphics and scripts), downloadable objects (media files, software, documents), applications (e-commerce, portals), live streaming media, on-demand streaming media, and social media sites.

In one sentence: a CDN lets users fetch resources from a nearby node, which improves access efficiency.

For example, suppose a user wants to access a video on a website. Without a CDN, every user request must be handled by the source station (the origin server), and the resource must be downloaded from it.

This setup has several problems. The first is the impact of cross-operator networks. Besides the three major operators, there are many operators of various sizes around the country, and traffic that crosses between operator networks often suffers high latency, which degrades network performance. The second is the cross-region problem. Internet services often target users nationwide or across an even wider area, and as the distance between a user and the source station grows, the round-trip time of data grows too, which may hurt the service experience. Finally, when a large number of user requests are all handled by the source station, the load on it becomes very heavy and, in serious cases, can overwhelm the site.

With a CDN, the request path is as shown in the following figure. Resources from the source station are distributed to the CDN's edge nodes, and each user requests resources from an assigned node chosen on the principle of nearby access, which improves access efficiency.

II. How Does a CDN Work?

Implementing a CDN relies on DNS. Two DNS terms are involved here: the A record and the CNAME record.

A record: an address record maps a domain name to an IP address, for example, www.test.com → 10.10.10.10. When a client queries a domain name with this kind of record, the DNS server resolves it to the corresponding IP address and returns that address to the client.
CNAME record: also known as an alias record, it maps one domain name to another, for example, www.aaa.com → www.bbb.com. When a client queries this domain name, DNS first resolves it to the alias, then resolves the alias to an IP address and returns that address to the client. This record type lets a domain owner delegate resolution of a name to a third party, which is why it is commonly used with CDNs.
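
As a rough illustration of how the two record types interact, the sketch below resolves a name against a small in-memory table. The table contents, the addresses, and the resolve helper are all hypothetical and exist only for this example; a real resolver queries authoritative servers and also handles caching and TTLs.

```python
# Toy record table: name -> (record type, value). All entries are made up.
RECORDS = {
    "www.test.com": ("A", "10.10.10.10"),
    "www.aaa.com": ("CNAME", "www.bbb.com"),
    "www.bbb.com": ("A", "10.20.30.40"),
}


def resolve(name, max_hops=8):
    """Follow CNAME records until an A record is found; return its IP address."""
    for _ in range(max_hops):
        rtype, value = RECORDS[name]
        if rtype == "A":
            return value
        name = value  # CNAME: keep resolving the alias target
    raise RuntimeError("too many CNAME hops")


print(resolve("www.test.com"))  # answered directly by an A record
print(resolve("www.aaa.com"))   # answered through the alias www.bbb.com
```

The hop limit is a simple guard against alias loops in the toy table.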

Without a CDN, we usually configure an A record in DNS, that is, the domain name resolves to the origin IP. When a client accesses the domain name, the following process takes place:

With a CDN, the source station's domain name is configured with a CNAME record instead. Resolution is handed over to the CDN's domain name, and the final IP address is assigned by the CDN vendor's GSLB. The overall access process then becomes as shown in the figure below, and the browser requests resources from a CDN node.

The acceleration function of a CDN is mainly implemented by two parts: GSLB (Global Server Load Balancing) and the cache system.

1. GSLB

A GSLB system can be built on intelligent DNS technology, which offers more flexible resolution than traditional DNS. Based on preconfigured policies, GSLB assigns the most suitable node address to each user.

The following are the common scheduling policies of GSLB:

Static Scheduling Based on Local DNS

This policy looks up the IP address of the Local DNS (or of the client machine) in its configuration, finds the region that the IP belongs to, and returns the most appropriate CDN node address for that region to the client.
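
A minimal sketch of this lookup, assuming a hand-written configuration: the address ranges, region names, node addresses, and the pick_node_static helper are hypothetical (the IPs come from the ranges reserved for documentation), and a production GSLB would use a much larger IP-to-region database.

```python
import ipaddress

# Hypothetical configuration: Local DNS address ranges mapped to regions,
# and regions mapped to CDN node addresses.
REGION_BY_NETWORK = {
    ipaddress.ip_network("198.51.100.0/24"): "east",
    ipaddress.ip_network("203.0.113.0/24"): "south",
}
NODE_BY_REGION = {"east": "192.0.2.10", "south": "192.0.2.20"}
DEFAULT_NODE = "192.0.2.1"


def pick_node_static(local_dns_ip):
    """Return the configured node for the region that contains local_dns_ip."""
    addr = ipaddress.ip_address(local_dns_ip)
    for network, region in REGION_BY_NETWORK.items():
        if addr in network:
            return NODE_BY_REGION[region]
    return DEFAULT_NODE  # no region matched: fall back to a default node


print(pick_node_static("198.51.100.7"))
```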

RTT-based Scheduling

RTT (Round-Trip Time) is the round-trip delay of data between a node and a target. Using the IP address of the Local DNS, this policy compares the RTT from each candidate CDN node to that address and assigns the node with the smallest RTT to the user.
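
The selection step can be sketched as follows, assuming the RTT from each candidate node to the Local DNS has already been measured. The node addresses, the RTT values, and the pick_node_by_rtt helper are hypothetical.

```python
def pick_node_by_rtt(rtt_ms_by_node):
    """Return the node whose measured RTT to the Local DNS is smallest."""
    if not rtt_ms_by_node:
        raise ValueError("no candidate nodes")
    return min(rtt_ms_by_node, key=rtt_ms_by_node.get)


# Hypothetical RTT measurements (milliseconds) from each candidate node
# to one Local DNS address.
measured = {"192.0.2.10": 38.0, "192.0.2.20": 12.5, "192.0.2.30": 27.1}
print(pick_node_by_rtt(measured))  # 192.0.2.20
```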

Cost and Bandwidth Based Scheduling

Cost is considered mainly from the CDN vendor's perspective. For example, in regions with little traffic, the scheduler may send some requests to nodes in other regions, which reduces the number of nodes the vendor must deploy in that region. Bandwidth-based scheduling, on the other hand, computes weights from the egress bandwidth of each CDN node and allocates requests accordingly.
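
One simple way to picture the bandwidth-based part is weighted random selection, where a node's chance of being chosen is proportional to its egress bandwidth. This is only a sketch under that assumption; the node names, bandwidth figures, and both helper functions are hypothetical, and real schedulers may weight nodes differently.

```python
import random

# Hypothetical egress bandwidth per node, in Gbit/s.
EGRESS_GBPS = {"node-a": 40, "node-b": 10, "node-c": 50}


def bandwidth_weights(egress):
    """Turn egress bandwidth into weights that sum to 1."""
    total = sum(egress.values())
    return {node: bw / total for node, bw in egress.items()}


def pick_node_weighted(egress, rng=random):
    """Pick one node at random, with probability proportional to its bandwidth."""
    nodes = list(egress)
    return rng.choices(nodes, weights=[egress[n] for n in nodes], k=1)[0]


print(bandwidth_weights(EGRESS_GBPS))
print(pick_node_weighted(EGRESS_GBPS))
```

With these figures, node-c should receive about half of the requests over time and node-b about a tenth.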

Service Level-Based Scheduling

This policy is based on the enterprise service level associated with the target domain name. Enterprise customers with a higher service level are usually assigned better-quality nodes so that they receive better service.

The above are common scheduling policies. CDN vendors combine them to serve users from nearby nodes as far as possible, provided that cost and bandwidth constraints are met. Some CDN vendors may also have their own custom strategies.

2. Cache System

The basic working units of the cache system are its cache nodes (cache servers). Cache nodes respond directly to end users' requests and serve locally cached content quickly. They also synchronize with the source station: they fetch updated content, and content they do not yet hold, from the source station and store it locally.

The cache system may have a multi-level architecture, such as a typical three-tier architecture. The edge node is the node closest to the user and serves nearby requests. When the edge node misses a resource, it requests it from the node one level up. If the request still misses at the central node, the central node goes back to the source station to fetch it.
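
The miss-and-fetch chain can be sketched with a small class in which every tier asks its parent on a miss and the top tier goes back to the source station. The CacheNode class, the tier names (including "regional" for the middle tier), and the dictionary standing in for the source station are all assumptions made for illustration; expiry and eviction are left out.

```python
class CacheNode:
    """One cache tier; on a miss it asks its parent, or the source station."""

    def __init__(self, name, parent=None, source=None):
        self.name = name
        self.parent = parent  # next tier up, or None for the top tier
        self.source = source  # dict standing in for the source station
        self.store = {}

    def get(self, key):
        """Return (content, where it was found) and cache the content locally."""
        if key in self.store:
            return self.store[key], self.name
        if self.parent is not None:
            content, served_by = self.parent.get(key)
        else:
            content, served_by = self.source[key], "source"  # back-to-source
        self.store[key] = content
        return content, served_by


source = {"/video.mp4": b"video-bytes"}
central = CacheNode("central", source=source)
regional = CacheNode("regional", parent=central)
edge = CacheNode("edge", parent=regional)

print(edge.get("/video.mp4")[1])  # first request: fetched from the source
print(edge.get("/video.mp4")[1])  # second request: served by the edge cache
```

After the first request every tier holds a copy, so a second edge node attached to the same regional node would be served from the regional tier rather than from the source station.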

III. CDN Application Scenarios

1. Website Acceleration

This mainly targets portals and e-commerce sites such as NetEase and Taobao. Such sites often have many static files (text, images, and so on) that a CDN can cache, which significantly improves page response time and user experience.

2. File Download Acceleration

File download acceleration is an important CDN function. Common scenarios include releasing software patches and distributing game installers. These files are large, and downloads can easily put performance and bandwidth pressure on the source station. A CDN spreads this load and improves download speed for clients.

3. Streaming Media Acceleration

Streaming media acceleration pushes streaming content to the edge nodes nearest to users so that users can fetch it nearby, which improves video transmission quality, shortens access time, and saves backbone network traffic. It covers both live and on-demand modes and applies to audio and video websites and applications such as TikTok and iQIYI.

4. Whole-Site Acceleration

This mainly targets sites with a large share of dynamic content. Dynamic acceleration techniques such as intelligent routing and protocol optimization improve network efficiency between the client and the source station and speed up access to dynamic resources.

IV. Benefits from CDN

1. Cost Saving

A CDN spreads traffic across widely deployed nodes. The hit rate for static resources is usually above 90%, which greatly reduces the bandwidth and server resources the source station needs and can substantially cut an enterprise's costs.
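
To make the effect of the hit rate concrete, the short calculation below estimates how much traffic still reaches the source station. It assumes the hit rate is measured by bytes rather than by request count, and the 100 Gbit/s figure and the source_traffic helper are hypothetical.

```python
def source_traffic(total_gbps, hit_rate):
    """Traffic that still reaches the source station, given a cache hit rate."""
    if not 0.0 <= hit_rate <= 1.0:
        raise ValueError("hit_rate must be between 0 and 1")
    return total_gbps * (1.0 - hit_rate)


# Hypothetical numbers: 100 Gbit/s of user traffic at a 90% hit rate
# leaves roughly 10 Gbit/s for the source station.
print(round(source_traffic(100, 0.90), 2))
```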

In addition, Internet businesses tend to have highly volatile traffic. For example, during a promotion, an e-commerce company's traffic on the day may reach several times its usual level and then drop sharply once the promotion ends.

Without a CDN, the company often has to provision enough resources to handle the traffic peak, which wastes a lot of resources the rest of the time. CDNs are billed by usage after the fact (pay-as-you-go), which effectively reduces this waste.

2. Improve User Experience

There is a so-called 8-second rule on the Internet: when users visit a website and have to wait more than 8 seconds for the page to open, more than 70% of them give up. And every extra second of loading time costs 7% of users. This rule shows how much speed matters to Internet businesses.

CDN services address common causes of network congestion, such as cross-region and cross-operator access. Intelligent scheduling and nearby access effectively improve users' download speed for the relevant resources and thus greatly improve user experience.

3. Increase Security

Attacks on the Internet are common. Traffic attacks such as DDoS often exhaust the source station's resources with a flood of requests so that legitimate users cannot get through. Because the domain name points to the CDN through a CNAME record, the source station's IP address is hidden, which makes it hard for attackers to hit the source station directly. The CDN's widely distributed nodes also spread out attack traffic, which reduces the damage and improves the security of the business.

4. Reduce Operation and Maintenance Complexity

CDN vendors usually provide a one-stop set of services, including monitoring and alerting, service analysis, and software tools, which reduces operation and maintenance complexity and lets teams put more energy into their core business.

Appendix: CDN Related Terms

Accelerated domain name: a domain name that uses the CDN acceleration service.
Edge node: a cache server that serves nearby users.
Hit rate: A CDN accelerates static files mainly through caching. When a client request reaches a CDN node and the requested file is already cached, the node returns it directly from the cache. If it is not cached, the node goes back to the source station to pull the file. The hit rate is the proportion of client requests that are served from the CDN cache.
Back-to-source: When the cache is not hit, the CDN node goes back to the source station to obtain the resource. This process is called back-to-source.
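Refresh: remove the specified resources from the CDN nodes' caches so that the next request goes back to the source station for the latest version.
Prefetch: load the specified resources to the CDN in advance.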