A Load Balancing Solution for Kubernetes: MetalLB - Weekly Sharing

Summary: MetalLB addresses the lack of load balancer support in bare-metal Kubernetes clusters. It provides a LoadBalancer implementation that integrates with standard network equipment, so that externally exposed services on bare-metal clusters "just work" as much as possible, reducing operation and maintenance costs.

I. Product Introduction

After deploying a service on Kubernetes, we often need to expose it to external users. On a cloud platform (AliCloud, Tencent Cloud, AWS, etc.) this is simple: the platform's own load balancer provides it through Services of type LoadBalancer.

On a self-built bare-metal Kubernetes cluster, however, it is much more troublesome. Kubernetes does not ship a network load balancer implementation for bare-metal clusters, so a LoadBalancer Service created there stays in the Pending state indefinitely, and external access has to rely on Ingress, NodePort, or ExternalIPs instead. Each of these options has shortcomings of its own, which makes bare-metal clusters second-class citizens in the Kubernetes ecosystem.

MetalLB aims to solve this problem by providing a LoadBalancer implementation that integrates with standard network equipment, so that externally exposed services on bare-metal clusters "just work" as much as possible, reducing operation and maintenance costs.

II. Deployment Requirements

MetalLB requires the following to run:

A Kubernetes cluster, running Kubernetes 1.13.0 or later, that does not already have network load balancing functionality;
Some IPv4 addresses for MetalLB to allocate;
For BGP mode, one or more routers capable of speaking BGP;
For layer 2 mode, traffic on port 7946 (TCP and UDP) must be allowed between cluster nodes, as required by memberlist, which the speakers use to communicate with each other;
A cluster network add-on (CNI) that is compatible with MetalLB, as shown in the table below.

Network add-on (CNI)	MetalLB compatibility
Antrea	Yes
Calico	Mostly
Canal	Yes
Cilium	Yes
Flannel	Yes
Kube-ovn	Yes
Kube-router	Mostly
Weave Net	Mostly
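The requirements and the table above can be turned into a quick pre-flight check. The sketch below is hypothetical: it assumes the Kubernetes version string looks like the one kubectl version prints (for example v1.24.3) and that you supply the CNI name yourself. The compatibility data is simply copied from the table.

```python
from __future__ import annotations

import re

# Minimum Kubernetes version from the requirements list above.
MIN_K8S_VERSION = (1, 13, 0)

# Copied from the compatibility table above (keys lower-cased for lookup).
CNI_COMPATIBILITY = {
    "antrea": "Yes",
    "calico": "Mostly",
    "canal": "Yes",
    "cilium": "Yes",
    "flannel": "Yes",
    "kube-ovn": "Yes",
    "kube-router": "Mostly",
    "weave net": "Mostly",
}


def parse_version(text: str) -> tuple[int, int, int]:
    """Turn 'v1.24.3' or '1.24.3+k3s1' into (1, 24, 3)."""
    match = re.match(r"v?(\d+)\.(\d+)\.(\d+)", text.strip())
    if match is None:
        raise ValueError(f"unrecognised version string: {text!r}")
    major, minor, patch = (int(part) for part in match.groups())
    return (major, minor, patch)


def check_prerequisites(k8s_version: str, cni: str) -> list[str]:
    """Return human-readable problems; an empty list means none were found."""
    problems = []
    if parse_version(k8s_version) < MIN_K8S_VERSION:
        problems.append(f"Kubernetes {k8s_version} is older than 1.13.0")
    support = CNI_COMPATIBILITY.get(cni.strip().lower())
    if support is None:
        problems.append(f"{cni} is not in the compatibility table")
    elif support == "Mostly":
        problems.append(f"{cni} is only mostly compatible; check MetalLB's known issues")
    return problems


if __name__ == "__main__":
    print(check_prerequisites("v1.24.3", "Flannel"))
    print(check_prerequisites("v1.12.10", "Calico"))
```

The remaining requirements (spare IPv4 addresses, a BGP-capable router, port 7946 between nodes) depend on your network and are not checked here.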

III. Working Principle

MetalLB consists of two components: the controller and the speaker. The controller runs as a Deployment, while the speaker runs on every node in the cluster as a DaemonSet.

The working principle is shown in the figure below. The controller watches for Service changes. When a Service is of type LoadBalancer, the controller assigns it an IP address from the configured pool and manages that IP's life cycle. The speaker then announces the IP using the selected protocol (ARP/NDP in layer 2 mode, BGP in BGP mode) so that the address is reachable. When traffic for that IP arrives at a node over TCP or UDP, kube-proxy on the node handles it and forwards it to the Pods of the corresponding Service.
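To make the controller's role concrete, here is a toy model of IP assignment and release. It is a simplified illustration with hypothetical names (the IPPool class, the Service keys, and the address range), not MetalLB's actual allocator, which also handles pool selection, IP sharing, and explicitly requested IPs.

```python
from __future__ import annotations

import ipaddress


class IPPool:
    """Hypothetical toy allocator: hands out addresses from an inclusive range."""

    def __init__(self, first: str, last: str) -> None:
        start = int(ipaddress.IPv4Address(first))
        end = int(ipaddress.IPv4Address(last))
        # Every address from first to last (inclusive) starts out free.
        self._free = [ipaddress.IPv4Address(i) for i in range(start, end + 1)]
        self._assigned: dict[str, ipaddress.IPv4Address] = {}

    def assign(self, service: str) -> ipaddress.IPv4Address:
        """Give a LoadBalancer Service an IP; a Service keeps the IP it already has."""
        if service in self._assigned:
            return self._assigned[service]
        if not self._free:
            # In a real cluster the Service's EXTERNAL-IP would stay <pending>.
            raise RuntimeError("IP pool exhausted")
        ip = self._free.pop(0)  # lowest free address first
        self._assigned[service] = ip
        return ip

    def release(self, service: str) -> None:
        """Return a deleted Service's IP to the pool, ending that IP's life cycle."""
        ip = self._assigned.pop(service, None)
        if ip is not None:
            self._free.append(ip)
            self._free.sort()


if __name__ == "__main__":
    pool = IPPool("192.168.1.240", "192.168.1.241")
    print(pool.assign("default/web"))
    print(pool.assign("default/api"))
    pool.release("default/web")
    print(pool.assign("default/extra"))
```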

MetalLB supports two modes: layer 2 mode and BGP mode.

1. Layer2 mode

In layer 2 mode, MetalLB elects one node as the leader for a service IP, and all traffic for that IP flows to this node. On the leader, kube-proxy forwards the received traffic to the Pods of the corresponding Service. If the leader fails, another node takes over. In this sense, layer 2 mode provides high availability (failover) rather than true load balancing, because only one node receives traffic for a given IP at any time.
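One way for all speakers to agree on a leader without extra coordination is for each of them to rank the healthy nodes by a hash of the node name and the service IP and pick the lowest. The sketch below assumes that scheme for illustration only; MetalLB's own election logic may differ in detail, and the node names are hypothetical.

```python
from __future__ import annotations

import hashlib


def elect_leader(service_ip: str, healthy_nodes: list[str]) -> str:
    """Assumed scheme: the node whose hash(node, IP) is smallest announces the IP."""
    if not healthy_nodes:
        raise ValueError("no healthy node can announce this IP")

    def rank(node: str) -> bytes:
        return hashlib.sha256(f"{node}#{service_ip}".encode()).digest()

    # Every speaker computes the same hashes, so they all pick the same node,
    # regardless of the order in which they list the nodes.
    return min(healthy_nodes, key=rank)


if __name__ == "__main__":
    nodes = ["node-a", "node-b", "node-c"]
    leader = elect_leader("192.168.1.240", nodes)
    print("leader:", leader)
    # Failover: drop the leader and elect again among the survivors.
    print("after failure:", elect_leader("192.168.1.240", [n for n in nodes if n != leader]))
```

Because the hash includes the service IP, different IPs tend to land on different leaders, but each individual IP is still served by a single node.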

Layer 2 mode has two limitations: a single-node bottleneck and potentially slow failover.

Because layer 2 mode sends all traffic for a service IP to a single elected leader, the service's ingress bandwidth is limited to that of one node, and that node's traffic-handling capacity becomes the bottleneck for all external traffic entering the cluster through that IP.

For failover, MetalLB elects a new leader, which then sends gratuitous layer 2 packets to notify clients that the MAC address associated with the service IP has changed. Most operating systems handle these packets correctly and update their neighbor caches promptly, so failover usually completes within a few seconds. However, some clients handle them poorly or not at all; during an unplanned failover, the service IP stays unreachable from such clients until they refresh their cache entries.
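For IPv4, the packet a new leader broadcasts is a gratuitous ARP. The sketch below only builds such a frame, in the RFC 5227 ARP Announcement layout, to show which fields tell clients that the service IP has moved to a new MAC address. It does not send anything (that would need a raw socket and root privileges), MetalLB's exact packets may differ, and the MAC and IP values are hypothetical.

```python
import ipaddress
import struct

BROADCAST_MAC = b"\xff" * 6
ETHERTYPE_ARP = 0x0806


def mac_bytes(mac: str) -> bytes:
    """'02:42:ac:11:00:02' -> 6 raw bytes."""
    return bytes(int(part, 16) for part in mac.split(":"))


def gratuitous_arp(new_mac: str, service_ip: str) -> bytes:
    """Build (not send) an Ethernet frame announcing service_ip at new_mac."""
    mac = mac_bytes(new_mac)
    ip = ipaddress.IPv4Address(service_ip).packed
    # Ethernet header: broadcast destination so every neighbour sees it.
    ethernet = BROADCAST_MAC + mac + struct.pack("!H", ETHERTYPE_ARP)
    arp = struct.pack(
        "!HHBBH6s4s6s4s",
        1,            # hardware type: Ethernet
        0x0800,       # protocol type: IPv4
        6,            # hardware address length
        4,            # protocol address length
        1,            # opcode: request (RFC 5227 announcement form)
        mac,          # sender MAC: the new leader's interface
        ip,           # sender IP: the service IP
        b"\x00" * 6,  # target MAC: ignored in an announcement
        ip,           # target IP equals sender IP, which makes it gratuitous
    )
    return ethernet + arp


if __name__ == "__main__":
    print(gratuitous_arp("02:42:ac:11:00:02", "192.168.1.240").hex())
```

Clients that process this frame overwrite their cached MAC for the service IP, which is what makes failover fast; clients that ignore it keep sending to the old MAC until their cache entry expires.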

2. BGP mode

BGP mode provides true load balancing, but it requires routers that support BGP. Each node in the cluster establishes a BGP peering session with the network router and uses that session to advertise the load balancer IPs. The routes announced by MetalLB are equivalent (equal-cost) to one another, so the router uses all of the target nodes and balances traffic across them. Once packets reach a node, kube-proxy handles the last hop, forwarding them to the Pods of the corresponding Service.

How traffic is balanced depends on your specific router model and configuration. A common behavior is per-connection balancing based on a packet hash, which means all packets of a single TCP or UDP session are directed to the same machine in the cluster.
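Per-connection hashing can be sketched as follows. The assumption is that the router hashes the TCP/UDP 5-tuple and takes the result modulo the number of next hops; real routers use vendor-specific hash functions, and SHA-256 appears here only because it is deterministic and in the standard library. The node addresses are hypothetical.

```python
from __future__ import annotations

import hashlib
from typing import NamedTuple


class Flow(NamedTuple):
    src_ip: str
    src_port: int
    dst_ip: str
    dst_port: int
    protocol: str  # "tcp" or "udp"


def flow_hash(flow: Flow) -> int:
    """Deterministic 64-bit hash of the 5-tuple (stand-in for a router's hash)."""
    key = "|".join(str(field) for field in flow).encode()
    return int.from_bytes(hashlib.sha256(key).digest()[:8], "big")


def pick_next_hop(flow: Flow, next_hops: list[str]) -> str:
    """Every packet of the same flow hashes to the same node."""
    return next_hops[flow_hash(flow) % len(next_hops)]


if __name__ == "__main__":
    nodes = ["10.0.0.11", "10.0.0.12", "10.0.0.13"]  # hypothetical node IPs
    for port in (50000, 50001, 50002):
        flow = Flow("203.0.113.7", port, "192.168.10.240", 80, "tcp")
        print(port, "->", pick_next_hop(flow, nodes))
```

Different client ports give different flows, so connections spread over the nodes, while each individual connection sticks to one node.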

BGP mode has limitations of its own. The router hashes certain fields of each packet and uses the hash value as an index into the array of next hops (backend nodes) to decide where to send the packet. However, the hash used by routers is usually not stable: whenever the number of backend nodes changes, existing connections are rehashed more or less at random, so most of them are forwarded to a different node that knows nothing about the original connection state. To mitigate this, use a more stable ECMP hashing algorithm, such as resilient ECMP, if your router supports it.
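The difference between a plain modulo hash and a resilient scheme is easy to measure in a toy model. The sketch below is a simplified stand-in for resilient ECMP (a fixed table of buckets in which only the failed node's buckets are reassigned), not any vendor's implementation; the node names, bucket count, and flow counts are hypothetical.

```python
from __future__ import annotations

import hashlib

BUCKETS = 64  # hypothetical size of the resilient hash table


def bucket_hash(flow_id: int) -> int:
    return int.from_bytes(hashlib.sha256(str(flow_id).encode()).digest()[:8], "big")


def modulo_owner(flow_id: int, nodes: list[str]) -> str:
    """Plain ECMP: hash modulo the current number of next hops."""
    return nodes[bucket_hash(flow_id) % len(nodes)]


def build_table(nodes: list[str]) -> list[str]:
    """Fill the bucket table round-robin with the initial nodes."""
    return [nodes[i % len(nodes)] for i in range(BUCKETS)]


def remove_node(table: list[str], dead: str) -> list[str]:
    """Reassign only the dead node's buckets; every other bucket keeps its owner."""
    survivors = sorted(set(table) - {dead})
    new_table = list(table)
    orphaned = [i for i, owner in enumerate(table) if owner == dead]
    for n, i in enumerate(orphaned):
        new_table[i] = survivors[n % len(survivors)]
    return new_table


def resilient_owner(flow_id: int, table: list[str]) -> str:
    return table[bucket_hash(flow_id) % len(table)]


def moved_fraction(before: list[str], after: list[str]) -> float:
    return sum(a != b for a, b in zip(before, after)) / len(before)


if __name__ == "__main__":
    nodes = ["node-a", "node-b", "node-c", "node-d"]
    flows = range(10_000)
    survivors = nodes[:-1]  # node-d fails

    before_mod = [modulo_owner(f, nodes) for f in flows]
    after_mod = [modulo_owner(f, survivors) for f in flows]

    table = build_table(nodes)
    new_table = remove_node(table, "node-d")
    before_res = [resilient_owner(f, table) for f in flows]
    after_res = [resilient_owner(f, new_table) for f in flows]

    print(f"modulo:    {moved_fraction(before_mod, after_mod):.0%} of flows moved")
    print(f"resilient: {moved_fraction(before_res, after_res):.0%} of flows moved")
```

With four nodes, plain modulo hashing moves roughly three quarters of all flows when one node leaves, while the bucket table moves only the quarter that was on the failed node (and those flows had to move anyway).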

IV. Deployment Installation

MetalLB can be deployed with Kubernetes manifests, Helm, or Kustomize. In this article, we use the Kubernetes manifests to walk through the installation. The version deployed here is v0.13.4, the latest at the time of writing.

Note: Starting with v0.13.0, MetalLB is no longer configured through a ConfigMap but through custom resources (CRDs), so the configuration in this example differs from that of older versions.

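1. Enable strict ARP mode in kube-proxy

If the cluster runs kube-proxy in IPVS mode, strict ARP mode must be enabled (required since Kubernetes v1.14.2). This is done by setting strictARP: true in the ipvs section of the kube-proxy ConfigMap in the kube-system namespace.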
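The MetalLB documentation makes this change by piping kubectl get configmap kube-proxy -n kube-system -o yaml through sed (replacing strictARP: false with strictARP: true) and back into kubectl apply -f - -n kube-system. The Python sketch below performs the same text substitution but first checks that the setting actually exists; the script name strict_arp.py in the usage note is hypothetical.

```python
import re
import sys

# Match a "strictARP: false" line inside the embedded kube-proxy config.
STRICT_ARP_FALSE = re.compile(r"^([ \t]*strictARP:[ \t]*)false[ \t]*$", re.MULTILINE)
STRICT_ARP_TRUE = re.compile(r"^[ \t]*strictARP:[ \t]*true[ \t]*$", re.MULTILINE)


def enable_strict_arp(configmap_yaml: str) -> str:
    """Flip strictARP to true, keeping indentation and all other lines intact."""
    updated, count = STRICT_ARP_FALSE.subn(r"\1true", configmap_yaml)
    if count == 0 and not STRICT_ARP_TRUE.search(configmap_yaml):
        raise ValueError("no strictARP setting found; edit the ConfigMap by hand")
    return updated


if __name__ == "__main__" and len(sys.argv) == 2:
    # Usage: python3 strict_arp.py kube-proxy.yaml | kubectl apply -f - -n kube-system
    with open(sys.argv[1], encoding="utf-8") as fh:
        sys.stdout.write(enable_strict_arp(fh.read()))
```

For example, save the ConfigMap with kubectl get configmap kube-proxy -n kube-system -o yaml > kube-proxy.yaml, then run python3 strict_arp.py kube-proxy.yaml | kubectl apply -f - -n kube-system. Running it again on an already-updated file leaves the file unchanged.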

2. Install MetalLB related components

Run the following command to install the MetalLB components. By default, MetalLB is deployed to the metallb-system namespace.
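After installation, it is worth confirming that the controller Pod and one speaker Pod per node are running. The helper below parses the output of kubectl get pods -n metallb-system. It assumes kubectl's default column layout (NAME READY STATUS RESTARTS AGE) and that the Pods are named after the controller Deployment and the speaker DaemonSet; the Pod names in the example data are hypothetical.

```python
from __future__ import annotations


def _rows(kubectl_output: str) -> list[list[str]]:
    """Split kubectl's table into rows of fields, dropping the header row."""
    lines = [line for line in kubectl_output.strip().splitlines() if line.strip()]
    return [line.split() for line in lines[1:]]


def pods_not_ready(kubectl_output: str) -> list[str]:
    """Names of Pods that are not Running with all containers ready."""
    problems = []
    for name, ready, status, *_ in _rows(kubectl_output):
        ready_count, total = (int(x) for x in ready.split("/"))
        if status != "Running" or ready_count != total:
            problems.append(name)
    return problems


def has_components(kubectl_output: str) -> bool:
    """Assumes default names: a 'controller-...' Pod and at least one 'speaker-...' Pod."""
    names = [row[0] for row in _rows(kubectl_output)]
    return any(n.startswith("controller-") for n in names) and any(
        n.startswith("speaker-") for n in names
    )
```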

3. Configure the Operating Mode

3.1 Layer2 mode configuration

Create an IPAddressPool that specifies the range of IP addresses MetalLB may allocate.

Create an L2Advertisement. If it does not list any IP pools, it applies to all IPAddressPools by default.
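Both layer 2 resources can be generated as a single Kubernetes List. This sketch assumes the metallb.io/v1beta1 API of the v0.13 CRDs. The pool name first-pool, the advertisement name example, and the range 192.168.1.240-192.168.1.250 are hypothetical and should be replaced with addresses that are free on your LAN. Because kubectl accepts JSON as well as YAML, the output can be piped into kubectl apply -f -.

```python
from __future__ import annotations

import json

NAMESPACE = "metallb-system"


def ip_address_pool(name: str, addresses: list[str]) -> dict:
    return {
        "apiVersion": "metallb.io/v1beta1",
        "kind": "IPAddressPool",
        "metadata": {"name": name, "namespace": NAMESPACE},
        "spec": {"addresses": addresses},  # ranges or CIDRs
    }


def l2_advertisement(name: str, pools: list[str] | None = None) -> dict:
    spec = {}
    # Leaving ipAddressPools out makes the advertisement cover every pool.
    if pools:
        spec["ipAddressPools"] = pools
    return {
        "apiVersion": "metallb.io/v1beta1",
        "kind": "L2Advertisement",
        "metadata": {"name": name, "namespace": NAMESPACE},
        "spec": spec,
    }


def layer2_manifest() -> str:
    items = [
        ip_address_pool("first-pool", ["192.168.1.240-192.168.1.250"]),  # hypothetical range
        l2_advertisement("example"),
    ]
    return json.dumps({"apiVersion": "v1", "kind": "List", "items": items}, indent=2)


if __name__ == "__main__":
    print(layer2_manifest())
```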

3.2 BGP mode configuration

For a basic configuration with a BGP router and an IP address range, you need four pieces of information:

The IP address of the router that MetalLB should connect to
The AS number of the router
The AS number that MetalLB should use
The IP address range to allocate, expressed as a CIDR prefix

Example: MetalLB uses AS number 64500 and is assigned the address range 192.168.10.0/24, and it connects to a router at 10.0.0.1 with AS number 64501. The configuration is as follows:

Create a BGPPeer

Configure the IP address pool

Create a BGPAdvertisement
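The three BGP resources for the example above can be generated the same way. The sketch assumes the API versions shown in the v0.13 documentation (BGPPeer in metallb.io/v1beta2; IPAddressPool and BGPAdvertisement in metallb.io/v1beta1), so check which versions the CRDs in your cluster serve. The object names are hypothetical. Before emitting anything, it rejects malformed addresses, a CIDR with host bits set, and out-of-range AS numbers.

```python
from __future__ import annotations

import ipaddress
import json

NAMESPACE = "metallb-system"


def check_asn(asn: int) -> int:
    # AS numbers are 32-bit; 0 is reserved.
    if not 1 <= asn <= 4_294_967_295:
        raise ValueError(f"invalid AS number: {asn}")
    return asn


def _meta(name: str) -> dict:
    return {"name": name, "namespace": NAMESPACE}


def bgp_manifest(my_asn: int, peer_asn: int, peer_address: str, cidr: str) -> str:
    ipaddress.ip_address(peer_address)   # raises ValueError if malformed
    network = ipaddress.ip_network(cidr)  # raises ValueError if host bits are set
    items = [
        {
            "apiVersion": "metallb.io/v1beta2",  # assumption: v0.13 docs version
            "kind": "BGPPeer",
            "metadata": _meta("sample"),
            "spec": {
                "myASN": check_asn(my_asn),
                "peerASN": check_asn(peer_asn),
                "peerAddress": peer_address,
            },
        },
        {
            "apiVersion": "metallb.io/v1beta1",
            "kind": "IPAddressPool",
            "metadata": _meta("first-pool"),
            "spec": {"addresses": [str(network)]},
        },
        {
            # No ipAddressPools listed, so every pool is advertised.
            "apiVersion": "metallb.io/v1beta1",
            "kind": "BGPAdvertisement",
            "metadata": _meta("example"),
            "spec": {},
        },
    ]
    return json.dumps({"apiVersion": "v1", "kind": "List", "items": items}, indent=2)


if __name__ == "__main__":
    print(bgp_manifest(64500, 64501, "10.0.0.1", "192.168.10.0/24"))
```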

V. Functional Verification

For this test, we use the layer 2 configuration above.

1. Create and apply a sample YAML file containing a Deployment and a Service of type LoadBalancer.

2. Check the status of the created Service and obtain its external IP address.

3. Access the service from a browser outside the cluster using that IP address.
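The verification steps can also be scripted. The sketch below builds a sample Deployment and LoadBalancer Service and extracts the EXTERNAL-IP column from kubectl get svc output. The app name, the nginx image tag, the replica count, and the port are hypothetical choices, and the parser assumes kubectl's default column layout.

```python
from __future__ import annotations

import json


def sample_app(name: str = "nginx", image: str = "nginx:1.23", port: int = 80) -> str:
    """Hypothetical test app: a Deployment plus a LoadBalancer Service, as a List."""
    labels = {"app": name}
    deployment = {
        "apiVersion": "apps/v1",
        "kind": "Deployment",
        "metadata": {"name": name},
        "spec": {
            "replicas": 2,
            "selector": {"matchLabels": labels},
            "template": {
                "metadata": {"labels": labels},
                "spec": {"containers": [
                    {"name": name, "image": image, "ports": [{"containerPort": port}]}
                ]},
            },
        },
    }
    service = {
        "apiVersion": "v1",
        "kind": "Service",
        "metadata": {"name": name},
        "spec": {
            "type": "LoadBalancer",  # this is what makes MetalLB assign an IP
            "selector": labels,
            "ports": [{"port": port, "targetPort": port, "protocol": "TCP"}],
        },
    }
    return json.dumps({"apiVersion": "v1", "kind": "List", "items": [deployment, service]}, indent=2)


def external_ip(kubectl_get_svc: str, service: str) -> str | None:
    """Return the Service's EXTERNAL-IP, or None while it is still <pending>."""
    lines = kubectl_get_svc.strip().splitlines()
    col = lines[0].split().index("EXTERNAL-IP")
    for line in lines[1:]:
        fields = line.split()
        if fields and fields[0] == service:
            return None if fields[col] == "<pending>" else fields[col]
    raise KeyError(f"service {service!r} not found")
```

Apply the manifest with kubectl apply -f -, rerun kubectl get svc until external_ip returns an address, and then open http://<that address>/ in a browser.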

VI. Project Maturity

The MetalLB project is currently in beta, but it is already used by a number of people and companies in both production and non-production clusters. Judging by the frequency of bug reports, no major bugs have been found so far.