Extending AWS EKS Cluster IPs
For most Kubernetes clusters running on EKS, there is a real risk of running out of IP addresses, since each pod gets its own IP address from the VPC. For large enterprise clusters, this is a problem. So how do we solve it? We can use a VPC feature, secondary CIDR ranges, combined with a customized AWS VPC CNI configuration.
Secondary IPs
For those who are not aware, AWS released a feature back in 2017 that allows you to extend a VPC with secondary CIDRs. For example, if your primary CIDR is 10.2.0.0/16, you can associate a secondary CIDR such as 172.14.0.0/16 with the same VPC. We will take advantage of this, together with the AWS VPC CNI's support for custom network configuration, to extend our EKS cluster.
The Cluster
We will create a VPC with the primary CIDR 10.0.0.0/16 and an additional secondary CIDR 172.2.0.0/16. In this VPC, we will create three private subnets: two in 10.0.0.0/16 and one in 172.2.0.0/16.
resource "aws_vpc" "eks_vpc" {
  cidr_block           = "10.0.0.0/16"
  enable_dns_support   = true //These settings are needed for a private EKS cluster
  enable_dns_hostnames = true
  tags = {
    "kubernetes.io/cluster/test-cluster" = "shared"
  }
}
...
resource "aws_vpc_ipv4_cidr_block_association" "secondary_cidr" {
  vpc_id     = aws_vpc.eks_vpc.id
  cidr_block = "172.2.0.0/16"
}
...
resource "aws_subnet" "private_1" {
  vpc_id            = aws_vpc.eks_vpc.id
  availability_zone = "us-east-1a"
  cidr_block        = "10.0.3.0/24"
  tags = {
    Name                                        = "private_1"
    "kubernetes.io/cluster/${var.cluster_name}" = "shared" #We are adding this because EKS automatically does this anyway.
  }
}
resource "aws_subnet" "private_2" {
  vpc_id            = aws_vpc.eks_vpc.id
  availability_zone = "us-east-1b"
  cidr_block        = "10.0.4.0/24"
  tags = {
    Name                                        = "private_2"
    "kubernetes.io/cluster/${var.cluster_name}" = "shared"
  }
}
#This is the secondary CIDR subnet.
resource "aws_subnet" "private_3" {
  #Referencing the association makes Terraform wait until the secondary CIDR is attached to the VPC.
  vpc_id            = aws_vpc_ipv4_cidr_block_association.secondary_cidr.vpc_id
  availability_zone = "us-east-1a" #The secondary subnet must be in the same AZ as the node's primary subnet for the AWS CNI to use its IPs.
  cidr_block        = "172.2.3.0/24"
  tags = {
    Name                                        = "private_3"
    "kubernetes.io/cluster/${var.cluster_name}" = "shared"
  }
}
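The subnet CIDRs above are hardcoded. As a minimal sketch, and not part of the original configuration, they could instead be derived from the two VPC ranges with Terraform's built-in cidrsubnet function. The local names here are hypothetical and chosen only for illustration.

```hcl
locals {
  # Hypothetical names; the CIDR values mirror the ones used above.
  primary_cidr   = "10.0.0.0/16"
  secondary_cidr = "172.2.0.0/16"

  # cidrsubnet(prefix, newbits, netnum): adding 8 bits to a /16 yields a /24,
  # and netnum selects which /24 inside the range.
  private_1_cidr = cidrsubnet(local.primary_cidr, 8, 3)   # "10.0.3.0/24"
  private_2_cidr = cidrsubnet(local.primary_cidr, 8, 4)   # "10.0.4.0/24"
  private_3_cidr = cidrsubnet(local.secondary_cidr, 8, 3) # "172.2.3.0/24"
}
```

Each aws_subnet could then set cidr_block to the matching local, which keeps the subnets consistent if the VPC ranges ever change.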
Now that our VPC has been set up, let's create our EKS cluster in the private_1 and private_2 subnets, both of which belong to the 10.0.0.0/16 CIDR. For this demo, we will launch our worker nodes into one of the subnets in us-east-1a.
resource "aws_eks_cluster" "test_cluster" {
  name     = var.cluster_name
  role_arn = aws_iam_role.cluster.arn
  vpc_config {
    subnet_ids         = [aws_subnet.private_1.id, aws_subnet.private_2.id] #A minimum of 2 is required
    security_group_ids = [aws_security_group.cluster.id]
  }
}
...
resource "aws_launch_template" "eks-cluster-worker-nodes" {
  iam_instance_profile {
    arn = aws_iam_instance_profile.workers-node.arn
  }
  image_id               = data.aws_ami.eks-worker.id
  instance_type          = "t3.medium"
  key_name               = "mykey.pem" #Must match the name of an existing EC2 key pair
  vpc_security_group_ids = [aws_security_group.workers-node.id]
  user_data              = base64encode(local.workers-node-userdata)
  lifecycle {
    create_before_destroy = true
  }
}
resource "aws_autoscaling_group" "eks-cluster-worker-nodes-spot" {
  ...
  mixed_instances_policy {
    ...
    launch_template {
      launch_template_specification {
        launch_template_id = aws_launch_template.eks-cluster-worker-nodes.id
        version            = "$Latest"
      }
      override {
        instance_type = "t3.medium"
      }
    }
  }
  ...
}
To connect to the cluster, we will need the aws-auth ConfigMap, the kubeconfig, and an ENIConfig, which tells the AWS VPC CNI which subnet to use for a particular node. These will be generated by Terraform.
locals {
  kubeconfig = <<KUBECONFIG
apiVersion: v1
clusters:
- cluster:
    server: ${aws_eks_cluster.test_cluster.endpoint}
    certificate-authority-data: ${aws_eks_cluster.test_cluster.certificate_authority[0].data}
  name: kubernetes
contexts:
- context:
    cluster: kubernetes
    user: aws
  name: aws
current-context: aws
kind: Config
preferences: {}
users:
- name: aws
  user:
    exec:
      apiVersion: client.authentication.k8s.io/v1alpha1
      command: aws-iam-authenticator
      args:
        - "token"
        - "-i"
        - "${var.cluster_name}"
KUBECONFIG

  config_map_aws_auth = <<CONFIGMAPAWSAUTH
apiVersion: v1
kind: ConfigMap
metadata:
  name: aws-auth
  namespace: kube-system
data:
  mapRoles: |
    - rolearn: ${aws_iam_role.workers-node.arn}
      username: system:node:{{EC2PrivateDNSName}}
      groups:
        - system:bootstrappers
        - system:nodes
CONFIGMAPAWSAUTH

  awsauth = <<AWSAUTH
apiVersion: v1
kind: ConfigMap
metadata:
  name: aws-auth
  namespace: kube-system
data:
  mapRoles: |
    - rolearn: ${aws_iam_role.workers-node.arn}
      username: system:node:{{EC2PrivateDNSName}}
      groups:
        - system:bootstrappers
        - system:nodes
AWSAUTH
}
resource "local_file" "kubeconfig" {
  content  = local.kubeconfig
  filename = "kubeconfig"
}
resource "local_file" "aws_auth" {
  content  = local.config_map_aws_auth
  filename = "awsauth.yaml"
}
resource "local_file" "eni-a" {
  content  = local.eni_a
  filename = "eni-${aws_subnet.private_1.availability_zone}.yaml"
}
...
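The definition of local.eni_a is not shown above. As a rough sketch of what it might look like, the block below assumes three things that are not confirmed by the original code: the ENIConfig is named after the availability zone (so that it matches the zone label the CNI is told to look up in the demo), it points at the secondary-CIDR subnet private_3, and it reuses the worker node security group.

```hcl
locals {
  # Assumed stand-in for the elided local.eni_a definition.
  # The ENIConfig name must equal the value of the node label that
  # ENI_CONFIG_LABEL_DEF points at, which here is the availability zone.
  eni_a = <<ENICONFIG
apiVersion: crd.k8s.amazonaws.com/v1alpha1
kind: ENIConfig
metadata:
  name: ${aws_subnet.private_1.availability_zone}
spec:
  subnet: ${aws_subnet.private_3.id}
  securityGroups:
    - ${aws_security_group.workers-node.id}
ENICONFIG
}
```

With this shape, the generated file eni-us-east-1a.yaml would describe an ENIConfig named us-east-1a whose pod ENIs are placed in the 172.2.3.0/24 subnet.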
DEMO
Once terraform apply is complete, these files will be generated. Apply them using the following steps:
- Export KUBECONFIG so that it points at the generated kubeconfig file.
- Run kubectl apply -f eni-us-east-1a.yaml to create the ENIConfig custom resource, which tells the CNI which subnet to create worker pods in. Then update the CNI daemonset:
  kubectl apply -f eni-us-east-1a.yaml
  kubectl set env daemonset aws-node -n kube-system AWS_VPC_K8S_CNI_CUSTOM_NETWORK_CFG=true
  kubectl set env daemonset aws-node -n kube-system ENI_CONFIG_LABEL_DEF=failure-domain.beta.kubernetes.io/zone
- Run kubectl apply -f awsauth.yaml so that the worker nodes can join the cluster.
- Once the nodes have joined, you should see the pods scheduled on the 172.2.3.0/24 subnet rather than on the subnet of the primary ENI.
$ kubectl get pods -n kube-system -owide
NAME                       READY   STATUS    RESTARTS   AGE    IP            NODE                         NOMINATED NODE   READINESS GATES
aws-node-7tv4z             1/1     Running   0          2m     10.0.3.232    ip-10-0-3-232.ec2.internal   <none>           <none>
coredns-69bc49bfdd-s5t75   1/1     Running   0          3m9s   172.2.3.218   ip-10-0-3-232.ec2.internal   <none>           <none>
coredns-69bc49bfdd-wk48q   1/1     Running   0          3m9s   172.2.3.230   ip-10-0-3-232.ec2.internal   <none>           <none>
kube-proxy-fm564           1/1     Running   0          2m     10.0.3.232    ip-10-0-3-232.ec2.internal   <none>           <none>
What happened here is that L-IPAMD creates an ENI and, instead of attaching secondary IPs from the primary ENI's subnet, uses IPs from the subnet specified in the ENIConfig associated with the node. The aws-node and kube-proxy pods still show the node's own address (10.0.3.232) because they run with host networking. Note that the subnet in the ENIConfig must be in the same AZ as the subnet of the primary ENI.
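One consequence of this behaviour is that the primary ENI no longer supplies pod IPs, so each node can host fewer pods. The sketch below works through the arithmetic for the t3.medium instance type used in this demo. It assumes the limits AWS publishes for t3.medium (3 ENIs, 6 IPv4 addresses per ENI) and the max-pods formula from the AWS custom networking documentation; the local and output names are hypothetical, and the figures should be checked against current AWS documentation.

```hcl
locals {
  # Assumed limits for t3.medium, taken from AWS's per-instance-type ENI table.
  enis_per_instance = 3
  ips_per_eni       = 6

  # Default networking: every ENI contributes (IPs - 1) pod addresses,
  # plus 2 for pods that use host networking.
  max_pods_default = local.enis_per_instance * (local.ips_per_eni - 1) + 2 # 3 * 5 + 2 = 17

  # Custom networking: the primary ENI is excluded from pod IP allocation.
  max_pods_custom = (local.enis_per_instance - 1) * (local.ips_per_eni - 1) + 2 # 2 * 5 + 2 = 12
}

output "max_pods_custom_networking" {
  value = local.max_pods_custom
}
```

Under these assumptions, a t3.medium worker drops from 17 to 12 schedulable pods, which is worth factoring into node sizing when moving pods onto the secondary CIDR.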