Summary: The first cloud computing product was launched in 2006, when the concept of cloud computing was also proposed. Cloud computing has since penetrated almost all industries and application scenarios. We may not directly feel its impact on daily life, work, and study, but as IT infrastructure it quietly supports the applications we use.

6 Principles of Cloud Computing Architecture Design

Another way to understand the overall architecture and service capability of cloud computing is through the cloud computing architecture system. From the bottom up, it comprises infrastructure, the cloud computing operating system, the product system (including security and management), the solution system, and the service system.

Designing a complete cloud computing architecture is also a step-by-step process: collect and analyze requirements, design the architecture based on that analysis, evaluate and improve the design, deliver the implementation, and then move into continuous operations.

Architecture design based on cloud computing should follow certain principles, which are the targets every technical solution aims for. There are 6 principles of cloud computing architecture design: reasonable deployment, business continuity, elastic expansion, performance efficiency, security compliance, and continuous operation. Each represents a different perspective to consider, and a well-rounded architecture solution takes all of them into account. In practice, however, it is not necessary to pack every design pattern into every architecture; doing so only produces needlessly complicated solutions.

Principle 1: Reasonable Deployment

Business systems deployed on the public cloud can run on cloud hosts in the form of virtual machines or on higher-performance physical cloud hosts. Hosting services include hosted applications and hosted physical servers.

Because of legacy IT resources and compliance requirements, many enterprises have not yet migrated to the cloud. For these cases, the cloud computing operating system is extracted and packaged as independent software and services that are deployed in the user's private environment. Unlike public clouds, which are available to "any" user, private deployments are available only to a select few users.

Hybrid architecture enables unified management and scheduling of resources across the public cloud, privately deployed platforms, traditional VMware or OpenStack virtualization platforms, and physical servers. It keeps the local environment unchanged and meets compliance requirements while also drawing on the abundant resources and service capabilities of the cloud platform. Hybrid architecture is also an intermediate state in enterprises' current transformation to the cloud, and it will exist for a long time.

Scenarios such as cross-border e-commerce and games expanding overseas involve multiple regions around the world. Deploying services and data closer to users reduces network latency and improves the access experience. Global deployment therefore focuses on how to deploy as close to users as possible on a global scale, and on how to keep data storage and processing synchronized across regions.

Don't trust any single hard drive, cloud host, availability zone, or region, and don't fully trust any single cloud provider. When deploying business systems, select multiple public cloud platforms to enhance business continuity, make up for the shortcomings of individual cloud providers in resources and services, and reduce exposure to technical lock-in and commercial tie-in.

Principle 2: Business Continuity

Business continuity covers 3 aspects: high availability, continuous operations, and disaster recovery. The design patterns follow this logic.

High Availability refers to avoiding business interruption by redundancy and other designs when the resources running the business fail.
Continuous Operations means that the resources the business runs on stay fault-free and the business can continuously provide services.
Disaster Recovery refers to the ability to restore applications and data in different environments when the business operating environment is damaged.

Redundancy and business continuity should be implemented in every layer of the architecture design. Without redundancy there is a single point of failure, and a failure at that point will cause local service termination.

Storage products: Block storage achieves redundancy by keeping 3 copies; when one copy has an error, the data is verified against and recovered from the other copies. Object storage uses error-correction codes as redundancy checks to make data recoverable, and provides cross-region replication so that a single geographic region does not become a single point of failure.
Backup solution: Improve reliability through cross-availability zone and cross-region data backup in the cloud, so that data is never stored as a single copy. In a hybrid architecture, backing up data to the cloud allows recovery from cloud backup files when data in the local environment is damaged.
Disaster recovery solution: Implement disaster recovery for business systems so that the current business environment does not become a single point of failure, improving the availability and risk resistance of the overall business.
High Availability: Achieve redundancy of cloud hosts and availability zones by deploying behind cross-availability zone load balancing; achieve high availability across regions and cloud platforms through global load balancing.
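
The cross-availability zone load balancing described above can be illustrated with a minimal sketch. It is not any cloud provider's load balancer; the `Backend` and `ZoneAwareBalancer` names, the host names, and the zone labels are hypothetical. It only shows the idea that traffic keeps flowing as long as one healthy host remains in some zone.

```python
from dataclasses import dataclass


@dataclass
class Backend:
    """A cloud host behind the balancer (hypothetical model)."""
    name: str
    zone: str
    healthy: bool = True


class ZoneAwareBalancer:
    """Round-robin over backends, skipping any that are marked unhealthy."""

    def __init__(self, backends):
        self.backends = list(backends)
        self._next = 0  # index where the next search starts

    def pick(self):
        count = len(self.backends)
        for offset in range(count):
            candidate = self.backends[(self._next + offset) % count]
            if candidate.healthy:
                # Resume after the chosen backend on the next call.
                self._next = (self._next + offset + 1) % count
                return candidate
        raise RuntimeError("no healthy backend in any availability zone")

    def mark_zone_down(self, zone):
        """Simulate the failure of a whole availability zone."""
        for backend in self.backends:
            if backend.zone == zone:
                backend.healthy = False
```

With one host in each of two zones, requests alternate between them; after `mark_zone_down("zone-a")`, every request lands on the host in the surviving zone instead of failing.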

Principle 3: Elastic Expansion

Tightly coupled systems are hard to expand, and hard to troubleshoot when software bugs and system failures occur. Each component bears a different load, so small problems are magnified step by step and can easily interrupt the entire business. To keep the system elastically expandable, the first step is to decouple its components, including separating dynamic data from static data. Once decoupled, each component becomes a functional unit with a single responsibility.

Decoupling is followed by the expansion of components and services: vertical expansion, horizontal expansion, and automatic expansion. This includes expanding the database layer, and using a hybrid architecture to extend the computing, storage and backup, security protection, and product service capabilities of the local environment. Migration of applications and data also counts as an expansion of the entire system: when moving from one environment to another, the system should remain elastically expandable and the migration should be quick to carry out when required. Finally, load needs to be balanced across the expanded components.

Component decoupling is a prerequisite for achieving expandability and can be done in the following ways.

Keep services stateless and store state data in Redis.
Put services behind load balancing, so that scaling out or in has no effect on the overall business.
Decouple through message queues or an API Gateway, so that producers and consumers can be expanded without affecting each other.
Apply global load balancing to the business, so that back-end services can be expanded across hybrid architecture and multi-cloud environments.
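
As a rough illustration of decoupling through a message queue, the sketch below uses Python's in-process `queue.Queue` as a stand-in for a real message queue service. The `run_pipeline` function and the doubling "business logic" are placeholders invented for this example. The point is that the producer code does not change when the number of consumers changes.

```python
import queue
import threading


def run_pipeline(orders, consumer_count):
    """Process orders through a queue with a configurable number of consumers."""
    tasks = queue.Queue()
    results = queue.Queue()

    def consumer():
        while True:
            order = tasks.get()
            if order is None:  # sentinel: no more work for this consumer
                tasks.task_done()
                return
            results.put(order * 2)  # placeholder for real business logic
            tasks.task_done()

    workers = [threading.Thread(target=consumer) for _ in range(consumer_count)]
    for worker in workers:
        worker.start()

    # The producer only knows about the queue, not about the consumers.
    for order in orders:
        tasks.put(order)
    for _ in workers:
        tasks.put(None)  # one sentinel per consumer

    tasks.join()
    for worker in workers:
        worker.join()
    return sorted(results.get() for _ in range(len(orders)))
```

Calling `run_pipeline([1, 2, 3], 1)` and `run_pipeline([1, 2, 3], 4)` yields the same results; only the degree of parallelism on the consumer side differs.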

Principle 4: Performance Efficiency

A very large number of solutions and cases involve performance challenges due to high concurrency and traffic surges. The primary goal in performance efficiency is to discover and enhance the performance of the application and improve the efficiency of resources and components.

The first is computing performance. The single-machine performance is improved by using high-configuration cloud hosts or physical cloud hosts, and the overall service performance is expanded through clusters.

The second is storage and caching. Caching hot data and temporary state data in Redis, and performing calculations in memory, can improve business performance. Hot configuration files and hot data are cached in Redis and loaded in advance to reduce access time.
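
A common way to cache hot data is the read-through (cache-aside) pattern, sketched below under simplifying assumptions. An in-memory dictionary with expiry times stands in for Redis, and `TTLCache`, `read_through`, and `load_from_db` are hypothetical names rather than part of any Redis client.

```python
import time


class TTLCache:
    """In-memory stand-in for a cache such as Redis, with per-entry expiry."""

    def __init__(self, ttl_seconds, clock=time.monotonic):
        self.ttl = ttl_seconds
        self.clock = clock  # injectable so expiry can be tested without sleeping
        self._data = {}

    def get(self, key):
        entry = self._data.get(key)
        if entry is None:
            return None
        value, expires_at = entry
        if self.clock() >= expires_at:
            del self._data[key]  # expired: treat as a miss
            return None
        return value

    def set(self, key, value):
        self._data[key] = (value, self.clock() + self.ttl)


def read_through(cache, key, load_from_db):
    """Return the cached value, loading and caching it on a miss."""
    value = cache.get(key)
    if value is None:
        value = load_from_db(key)  # slow path, e.g. a database query
        cache.set(key, value)
    return value
```

In this sketch a value of `None` is never cached, so a key that is missing from the backing store is looked up again on every request; a production setup would usually handle that case explicitly.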

The third is network optimization. When the business is deployed globally, select the optimal data center, and use the global infrastructure network, CDN, and global application acceleration to improve network performance and speed up requests.

The last is the introduction of application performance monitoring and stress testing. These evaluate the current performance status, identify bottlenecks from the application's perspective, and allow problems to be solved in a targeted manner.

Principle 5: Security Compliance

Security compliance serves two purposes: meeting the business's own security protection needs, and meeting regulatory security requirements. In a specific implementation, these two aspects overlap.

Set up master accounts and sub-accounts in the account system, and manage public keys and private keys separately. Define appropriate roles and assign accounts and roles only the minimum permissions they require.
Control network access through ACLs, restrict open ports on cloud hosts through security groups, and control communication across subnets through subnetting and routing. Place databases and cloud hosts that only need internal access in intranet VPCs, specify which VPCs are allowed to access them, and keep them disconnected from the external network.
Protect against DDoS, CC (HTTP flood), SQL injection, XSS, and other attacks.
Perform security audits. Keep access logs and operation logs, and move them gradually to infrequent-access storage and archive storage.
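
To make the security group idea concrete, here is a minimal sketch of a default-deny allowlist check using Python's standard `ipaddress` module. The `Rule` type and `is_allowed` function are hypothetical, and real security groups also handle protocols, port ranges, and rule priorities, which are omitted here.

```python
import ipaddress
from dataclasses import dataclass


@dataclass(frozen=True)
class Rule:
    """Allow inbound traffic from a CIDR block to a single port."""
    cidr: str
    port: int


def is_allowed(rules, source_ip, port):
    """Default deny: traffic passes only if some rule matches both IP and port."""
    address = ipaddress.ip_address(source_ip)
    return any(
        port == rule.port and address in ipaddress.ip_network(rule.cidr)
        for rule in rules
    )
```

For example, with the rules `Rule("10.0.0.0/16", 3306)` and `Rule("0.0.0.0/0", 443)`, a database port is reachable only from the internal address range, while HTTPS is open to everyone and every other port stays closed.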

Principle 6: Continuous Operation

In continuous operation, cloud resources, cloud services, events, and users' applications are monitored, and alarms are set on them. When an alarm condition is reached, the relevant personnel are notified by phone, SMS, email, WeChat, and so on. Alarms can also be handed to callback functions, which carry out automated fault handling or the corresponding contingency plan and reduce manual intervention.
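
The alarm-plus-callback flow can be sketched as follows. This is an illustration only: the `Alarm` class, the metric name, and the callback signature are assumptions, not the interface of any particular monitoring service.

```python
from dataclasses import dataclass, field
from typing import Callable, List


@dataclass
class Alarm:
    """Fires its callbacks when a metric value exceeds the threshold."""
    metric: str
    threshold: float
    callbacks: List[Callable[[str, float], None]] = field(default_factory=list)

    def evaluate(self, value: float) -> bool:
        """Return True and run every callback if the alarm condition is met."""
        if value <= self.threshold:
            return False
        for callback in self.callbacks:
            # A callback might send a notification or run a contingency plan.
            callback(self.metric, value)
        return True
```

One callback could notify an on-call engineer while another restarts a failed service, so the same alarm drives both human notification and automated handling.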

In addition, the system needs automatic response and processing functions. Automatic scaling, for example, increases or decreases the number of cloud hosts automatically by monitoring indicators such as CPU usage.
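
A threshold-based scaling decision might look like the sketch below. The function name, the 70% and 30% thresholds, and the host limits are illustrative assumptions; real auto scaling services add cooldown periods and more elaborate policies.

```python
def desired_host_count(current, avg_cpu_percent,
                       scale_out_at=70.0, scale_in_at=30.0,
                       min_hosts=1, max_hosts=10):
    """Return the target number of cloud hosts for the observed CPU usage."""
    if avg_cpu_percent > scale_out_at:
        target = current + 1  # under pressure: add one host
    elif avg_cpu_percent < scale_in_at:
        target = current - 1  # mostly idle: remove one host
    else:
        target = current      # within the comfortable band: no change
    # Never go below the minimum or above the maximum fleet size.
    return max(min_hosts, min(max_hosts, target))
```

Keeping a gap between the scale-out and scale-in thresholds avoids adding and removing hosts repeatedly when CPU usage hovers around a single value.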

Detect changes in consumption and business costs promptly and optimize costs accordingly. Set alarm thresholds on account balances to catch runaway spending early and keep costs under control.
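
One possible way to express a balance alarm is shown below. It is a sketch with hypothetical names and parameters: it alerts when the balance drops below a floor, or when recent average daily spending would exhaust the balance within a chosen number of days.

```python
def should_alert(balance, recent_daily_spend, min_balance, min_runway_days):
    """Decide whether the account balance warrants a cost alarm."""
    if balance < min_balance:
        return True  # balance is already below the configured floor
    if not recent_daily_spend:
        return False  # no spending history to project from
    average = sum(recent_daily_spend) / len(recent_daily_spend)
    if average <= 0:
        return False  # nothing is being spent
    # Days the balance would last at the recent average spending rate.
    return balance / average < min_runway_days
```

Checking the projected runway in addition to the absolute balance helps catch a sudden jump in spending while the balance is still well above the floor.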