Background
EKS clusters come with the AWS VPC CNI plugin by default, which provides some excellent features, such as giving each pod an IP address within the VPC subnet range. One limitation of the AWS CNI comes from the number of IP addresses and ENIs that you can assign to a single instance. Refer to the official page, which shows that limit. As you can see,
Instance type | Maximum network interfaces | Private IPv4 addresses per interface | IPv6 addresses per interface |
---|---|---|---|
t3.large | 3 | 12 | 12 |
AWS VPC-CNI IPs limitation
For t3.large, you can assign 3 × 12 = 36 IP addresses to a single EC2 instance, which severely limits the number of pods that we can schedule on a single node.
Here is the formula for the maximum number of pods:
Max Pods = (Maximum Network Interfaces) * (IPv4 Addresses per Interface) - 1
For example, a t3.large instance supports a maximum of three network interfaces and 12 IPs per interface, so you can create only 35 pods, including the Kubernetes internal pods, because one IP is reserved for the node itself.
3 * 12 - 1 = 35
If you want to replace the default VPC CNI plugin with Calico, here is the process.
Prerequisites
Before you get started, make sure you have downloaded and configured the necessary prerequisites.
Create EKS cluster
Create cluster “aws eks way”
We can replace the VPC CNI with Calico in an EKS cluster no matter how the cluster was created in the first place. But I ran into some problems while trying to increase the number of pods we can deploy on each machine when the cluster was created using the **aws eks** command. In that case, please follow the “eksctl way” method mentioned below to create a cluster.
I have created a quick-and-dirty bash script to create a cluster. It is not perfect, but it will do the job.
- First, create a CloudFormation template file using the following content and call it amazon-eks-nodegroup-role.yaml. You can also find the template on the official AWS page, but that version is missing the following two lines,
...
- !FindInMap [ServicePrincipals, !Ref "AWS::Partition", eks]
...
- !Sub "arn:${AWS::Partition}:iam::aws:policy/AmazonEKSClusterPolicy"
...
amazon-eks-nodegroup-role.yaml
AWSTemplateFormatVersion: "2010-09-09"
Description: Amazon EKS - Node Group Role
Mappings:
  ServicePrincipals:
    aws-cn:
      ec2: ec2.amazonaws.com.cn
    aws-us-gov:
      ec2: ec2.amazonaws.com
    aws:
      ec2: ec2.amazonaws.com
      eks: eks.amazonaws.com
Resources:
  NodeInstanceRole:
    Type: "AWS::IAM::Role"
    Properties:
      AssumeRolePolicyDocument:
        Version: "2012-10-17"
        Statement:
          - Effect: Allow
            Principal:
              Service:
                - !FindInMap [ServicePrincipals, !Ref "AWS::Partition", ec2]
                - !FindInMap [ServicePrincipals, !Ref "AWS::Partition", eks]
            Action:
              - "sts:AssumeRole"
      ManagedPolicyArns:
        - !Sub "arn:${AWS::Partition}:iam::aws:policy/AmazonEKSWorkerNodePolicy"
        - !Sub "arn:${AWS::Partition}:iam::aws:policy/AmazonEKS_CNI_Policy"
        - !Sub "arn:${AWS::Partition}:iam::aws:policy/AmazonEC2ContainerRegistryReadOnly"
        - !Sub "arn:${AWS::Partition}:iam::aws:policy/AmazonEKSClusterPolicy"
      Path: /
Outputs:
  NodeInstanceRole:
    Description: The node instance role
    Value: !GetAtt NodeInstanceRole.Arn
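Optionally, you can check the template for syntax errors before creating the stack, using the standard CloudFormation validation command:
aws cloudformation validate-template \
--template-body file://amazon-eks-nodegroup-role.yaml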
Now set the following environment variables for your bash script, as per your environment.
export AWS_ACCOUNT_ID=111111111111
export REGION="us-west-2"
export SUBNET1="subnet-00212121212121"
export SUBNET2="subnet-00212123232323"
export SECURITY_GROUP="sg-095df33a10a8"
export CLUSTER_NAME="democluster"
export INSTANCE_TYPE="t3.large"
export PRIVATE_AWS_KEY_NAME="demokey"
Let’s create the cluster using the following script,
aws cloudformation create-stack \
--stack-name eksrole \
--template-body file://amazon-eks-nodegroup-role.yaml \
--capabilities CAPABILITY_IAM \
--output text || true
# Sleep 20 seconds; CloudFormation sometimes takes a while to create the stack.
sleep 20
export eks_role_arn=$(aws cloudformation describe-stacks \
--stack-name eksrole \
--query "Stacks[0].Outputs[?OutputKey=='NodeInstanceRole'].OutputValue" \
--output text)
echo ${eks_role_arn}
# This will create a cluster.
aws eks create-cluster \
--region ${REGION} \
--name ${CLUSTER_NAME} \
--kubernetes-version 1.16 \
--role-arn ${eks_role_arn} \
--resources-vpc-config subnetIds=${SUBNET1},${SUBNET2}
Now wait for some time before you create the node group; cluster creation takes approximately 10 minutes.
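Instead of checking manually, you can block until the control plane is ACTIVE with the EKS waiter (assuming your AWS CLI version includes the eks wait sub-commands):
aws eks wait cluster-active --name ${CLUSTER_NAME} --region ${REGION}
After that, you can create the node group with the following command.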
aws eks create-nodegroup --cluster-name ${CLUSTER_NAME} \
--nodegroup-name ${CLUSTER_NAME} \
--subnets ${SUBNET1} ${SUBNET2} \
--node-role ${eks_role_arn} \
--remote-access=ec2SshKey=${PRIVATE_AWS_KEY_NAME},sourceSecurityGroups=${SECURITY_GROUP} \
--kubernetes-version=1.16 \
--scaling-config=minSize=1,maxSize=1,desiredSize=1 \
--instance-types ${INSTANCE_TYPE} \
--region ${REGION}
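Node group creation also takes a few minutes; the matching waiter (again assuming a recent AWS CLI) blocks until it is ready:
aws eks wait nodegroup-active --cluster-name ${CLUSTER_NAME} \
--nodegroup-name ${CLUSTER_NAME} --region ${REGION}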
Fetch the kubeconfig file locally so that you can use the kubectl command, as follows,
aws eks update-kubeconfig --name ${CLUSTER_NAME}
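As a quick sanity check that kubectl now points at the new cluster, list the nodes; the t3.large worker from the node group should show up as Ready:
kubectl get nodes -o wide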
Deploy a sample application
Let’s deploy a simple Nginx application to check whether the app comes up.
cat <<EOF | kubectl apply -f -
apiVersion: apps/v1
kind: Deployment
metadata:
  labels:
    run: load-balancer-example
  name: hello-world
spec:
  replicas: 1
  selector:
    matchLabels:
      run: load-balancer-example
  template:
    metadata:
      labels:
        run: load-balancer-example
    spec:
      containers:
        - image: nginx:1.15.8
          name: hello-world
          ports:
            - containerPort: 80
              protocol: TCP
---
apiVersion: v1
kind: Service
metadata:
  labels:
    run: load-balancer-example
  name: hello-service
spec:
  ports:
    - port: 80
      protocol: TCP
      targetPort: 80
  selector:
    run: load-balancer-example
  type: LoadBalancer
EOF
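Since hello-service is of type LoadBalancer, AWS provisions an ELB for it; you can watch until the EXTERNAL-IP column shows a hostname (press Ctrl+C once it appears):
kubectl get service hello-service -w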
Scale Up sample application
Let’s create 20 replicas of the sample application as follows,
kubectl scale deployment hello-world --replicas=20
and validate
kubectl get deployment
NAME READY UP-TO-DATE AVAILABLE AGE
hello-world 20/20 20 20 16d
So here, all 20 pods are active. Now let’s scale these replicas to 50.
kubectl scale deployment hello-world --replicas=50
and validate
kubectl get deployment
NAME READY UP-TO-DATE AVAILABLE AGE
hello-world 30/50 50 30 16d
Only 30 pods are active. Since I am using t3.large, the remaining pods (5 pods, based on the calculation above) must be system-related pods. Let’s validate,
$ kubectl get nodes
NAME STATUS ROLES AGE VERSION
ip-10-10-26-180.us-west-2.compute.internal Ready <none> 16d v1.16.8-eks-e16311
$ kubectl get pods -A -o wide | grep ip-10-10-26-180.us-west-2.compute.internal | wc -l
35
As per this, the total number of running pods is 35, which matches our calculation.
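If you want to see which of those 35 are system pods rather than hello-world replicas, and what pod capacity the node is advertising, one way is (the node name is taken from the output above; the second command should print 35 on a t3.large with the default AWS CNI settings):
kubectl get pods -A -o wide | grep ip-10-10-26-180.us-west-2.compute.internal | grep -v hello-world
kubectl get node ip-10-10-26-180.us-west-2.compute.internal -o jsonpath='{.status.capacity.pods}{"\n"}'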
Remove existing AWS CNI components
First, we need to get rid of the AWS CNI. We could delete the individual resources, but there are a lot of them. The easy way is to use the following manifest file. At the time of writing, I am using the v1.6 version of the vpc-cni plugin, which is also the latest one available at the moment.
kubectl delete -f https://raw.githubusercontent.com/aws/amazon-vpc-cni-k8s/release-1.6/config/v1.6/aws-k8s-cni.yaml
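You can confirm that the aws-node daemon set is gone before moving on; the command below should return a NotFound error once the deletion has gone through:
kubectl get daemonset aws-node -n kube-system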
Deploy calico components
You can follow this quickstart guide to deploy Calico. Since we are installing Calico in an existing cluster, you only need to run the following command,
kubectl apply -f https://docs.projectcalico.org/manifests/calico.yaml
You will see output similar to this,
...
clusterrole.rbac.authorization.k8s.io/calico-node created
clusterrolebinding.rbac.authorization.k8s.io/calico-node created
daemonset.apps/calico-node created
serviceaccount/calico-node created
deployment.apps/calico-kube-controllers created
serviceaccount/calico-kube-controllers created
Let’s confirm whether everything is up and running.
watch kubectl get pods -n kube-system -o wide
Here the calico-node DaemonSet has a STATUS of Running, but calico-kube-controllers is not running.
NAME READY STATUS RESTARTS AGE IP NODE NOMINATED NODE READINESS GATES
calico-kube-controllers-77d6cbc65f-cvhkn 0/1 ContainerCreating 0 19m <none> ip-10-10-3-213.us-west-2.compute.internal <none> <none>
calico-node-2qthl 1/1 Running 0 19m 10.10.3.213 ip-10-10-3-213.us-west-2.compute.internal <none> <none>
coredns-5c97f79574-sxvc7 1/1 Running 0 43m 10.10.3.31 ip-10-10-3-213.us-west-2.compute.internal <none> <none>
coredns-5c97f79574-txm9f 1/1 Running 0 43m 10.10.7.151 ip-10-10-3-213.us-west-2.compute.internal <none> <none>
kube-proxy-lnknf 1/1 Running 0 36m 10.10.3.213 ip-10-10-3-213.us-west-2.compute.internal <none> <none>
Let’s troubleshoot,
kubectl describe po calico-kube-controllers-77d6cbc65f-cvhkn -n kube-system
Here I see the following error log,
...
kubelet, ip-10-10-3-213.us-west-2.compute.internal Failed create pod sandbox: rpc error: code = Unknown desc = [failed to set up sandbox container "c5446664b39653e3ef08f88bb55d25b70cae569eb059e4be3d201740ba5b50f7" network for pod "calico-kube-controllers-77d6cbc65f-cvhkn": networkPlugin cni failed to set up pod "calico-kube-controllers-77d6cbc65f-cvhkn_kube-system" network: add cmd: Error received from AddNetwork gRPC call: rpc error: code = Unavailable desc = connection error: desc = "transport: Error while dialing dial tcp 127.0.0.1:50051: connect: connection refused", failed to clean up sandbox container "c5446664b39653e3ef08f88bb55d25b70cae569eb059e4be3d201740ba5b50f7" network for pod "calico-kube-controllers-77d6cbc65f-cvhkn": networkPlugin cni failed to teardown pod "calico-kube-controllers-77d6cbc65f-cvhkn_kube-system" network: del cmd: error received from DelNetwork gRPC call: rpc error: code = Unavailable desc = connection error: desc = "transport: Error while dialing dial tcp 127.0.0.1:50051: connect: connection refused"]
Normal SandboxChanged 4m41s (x160 over 39m) kubelet, ip-10-10-3-213.us-west-2.compute.internal Pod sandbox changed, it will be killed and re-created.
In this log, kubelet is timing out while trying to reach port 50051. This port is mentioned in the readinessProbe and livenessProbe of the VPC CNI manifest file: https://github.com/aws/amazon-vpc-cni-k8s/blob/master/config/v1.6/aws-k8s-cni.yaml#L108-L115
...
readinessProbe:
  exec:
    command: ["/app/grpc-health-probe", "-addr=:50051"]
  initialDelaySeconds: 35
livenessProbe:
  exec:
    command: ["/app/grpc-health-probe", "-addr=:50051"]
  initialDelaySeconds: 35
...
This means it is still trying to reach the VPC CNI; possibly there is some caching going on. I deleted the node, and the autoscaling group brought up a new one.
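For reference, a rough sketch of how the node can be recycled (the node name is from the output above; the instance ID is a placeholder you would look up yourself):
# Drain and remove the stale node from the cluster.
kubectl drain ip-10-10-3-213.us-west-2.compute.internal --ignore-daemonsets --force
kubectl delete node ip-10-10-3-213.us-west-2.compute.internal
# Terminate the backing EC2 instance; the node group's Auto Scaling group
# will launch a replacement that starts clean with the Calico CNI config.
aws ec2 terminate-instances --instance-ids <instance-id> --region ${REGION}
Once the replacement node registers, it appears in the node list: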
kubectl get nodes -o wide
NAME STATUS ROLES AGE VERSION INTERNAL-IP EXTERNAL-IP OS-IMAGE KERNEL-VERSION CONTAINER-RUNTIME
ip-10-10-26-180.us-west-2.compute.internal Ready <none> 4m1s v1.16.8-eks-e16311 10.10.26.180 34.219.58.217 Amazon Linux 2 4.14.177-139.254.amzn2.x86_64 docker://18.9.9
After that, everything seems fine and working.
kubectl get pods -n kube-system
NAME READY STATUS RESTARTS AGE
calico-kube-controllers-77d6cbc65f-pvx6r 1/1 Running 0 5m18s
calico-node-rjctk 1/1 Running 0 4m26s
coredns-5c97f79574-746kc 1/1 Running 0 5m18s
coredns-5c97f79574-qvdjr 1/1 Running 0 5m19s
kube-proxy-lgl9k 1/1 Running 0 4m26s
Validate hello-world application
Check the status of the sample application deployed previously. The new pods are coming up in the 192.168.*.* range, which is the Calico network. Verify using the following.
kubectl get all -o wide
NAME READY STATUS RESTARTS AGE IP NODE NOMINATED NODE READINESS GATES
pod/hello-world-64db9f698b-2sd79 1/1 Running 0 16d 192.168.36.132 ip-10-10-26-180.us-west-2.compute.internal <none> <none>
pod/hello-world-64db9f698b-4dw94 0/1 Pending 0 13m <none> <none> <none> <none>
pod/hello-world-64db9f698b-5v2t9 0/1 Pending 0 13m <none> <none> <none> <none>
pod/hello-world-64db9f698b-6dld2 1/1 Running 0 16d 192.168.36.150 ip-10-10-26-180.us-west-2.compute.internal <none> <none>
...
...
NAME TYPE CLUSTER-IP EXTERNAL-IP PORT(S) AGE SELECTOR
service/hello-service LoadBalancer 172.20.130.23 a90c1d3d1ac3b4bcd8ed21ece59a6b47-2002783034.us-west-2.elb.amazonaws.com 80:31095/TCP 16d run=load-balancer-example
service/kubernetes ClusterIP 172.20.0.1 <none> 443/TCP 16d <none>
NAME READY UP-TO-DATE AVAILABLE AGE CONTAINERS IMAGES SELECTOR
deployment.apps/hello-world 30/50 50 30 16d hello-world nginx:1.15.8 run=load-balancer-example
NAME DESIRED CURRENT READY AGE CONTAINERS IMAGES SELECTOR
replicaset.apps/hello-world-64db9f698b 50 50 30 16d hello-world nginx:1.15.8 pod-template-hash=64db9f698b,run=load-balancer-example
But still, the total number of active pods is 30, which means some restriction is being enforced on the EKS node group. That feature is missing from the aws eks command, so now I have to install eksctl just to handle this. If you know of a workaround, please let me know.
Using eksctl
I found the following options,
$ eksctl create nodegroup --help
...
New nodegroup flags:
-n, --name string name of the new nodegroup (generated if unspecified, e.g. "ng-91a7a011")
-t, --node-type string node instance type (default "m5.large")
...
--max-pods-per-node int maximum number of pods per node (set automatically if unspecified)
...
So initially, I thought the following command would do the trick.
eksctl create nodegroup --cluster ${CLUSTER_NAME} --node-type t3.large --node-ami auto --max-pods-per-node 100
But as per this GitHub issue, eksctl does not support clusters that were not created by eksctl. Since I did not create the cluster using eksctl, I am hitting that limitation and seeing the following error,
[ℹ] eksctl version 0.22.0
[ℹ] using region us-west-2
[ℹ] will use version 1.16 for new nodegroup(s) based on control plane version
Error: getting VPC configuration for cluster "calico": no eksctl-managed CloudFormation stacks found for "calico"
Test Sample application
Let’s verify whether the application is running using the following,
curl a90c1d3d1ac3b4bcd8ed21ece59a6b47-2002783034.us-west-2.elb.amazonaws.com
I see the following output, which is the response coming from Nginx.
<!DOCTYPE html>
<html>
<head>
<title>Welcome to nginx!</title>
<style>
body {
width: 35em;
margin: 0 auto;
font-family: Tahoma, Verdana, Arial, sans-serif;
}
</style>
</head>
<body>
<h1>Welcome to nginx!</h1>
<p>If you see this page, the nginx web server is successfully installed and
working. Further configuration is required.</p>
<p>For online documentation and support please refer to
<a href="http://nginx.org/">nginx.org</a>.<br/>
Commercial support is available at
<a href="http://nginx.com/">nginx.com</a>.</p>
<p><em>Thank you for using nginx.</em></p>
</body>
</html>
eksctl way
- First, create an Amazon EKS cluster without any nodes. This command will create everything from scratch, including the VPC, subnets, and other resources. If you want to use an existing VPC and subnets, explore eksctl further by checking its help menu or the official documentation.
eksctl create cluster --name my-calico-cluster --without-nodegroup
Output:
[ℹ] eksctl version 0.22.0
[ℹ] using region us-west-2
[ℹ] setting availability zones to [us-west-2c us-west-2b us-west-2a]
[ℹ] subnets for us-west-2c - public:192.168.0.0/19 private:192.168.96.0/19
[ℹ] subnets for us-west-2b - public:192.168.32.0/19 private:192.168.128.0/19
[ℹ] subnets for us-west-2a - public:192.168.64.0/19 private:192.168.160.0/19
[ℹ] using Kubernetes version 1.16
[ℹ] creating EKS cluster "my-calico-cluster" in "us-west-2" region with
[ℹ] if you encounter any issues, check CloudFormation console or try 'eksctl utils describe-stacks --region=us-west-2 --cluster=my-calico-cluster'
[ℹ] CloudWatch logging will not be enabled for cluster "my-calico-cluster" in "us-west-2"
[ℹ] you can enable it with 'eksctl utils update-cluster-logging --region=us-west-2 --cluster=my-calico-cluster'
[ℹ] Kubernetes API endpoint access will use default of {publicAccess=true, privateAccess=false} for cluster "my-calico-cluster" in "us-west-2"
[ℹ] 2 sequential tasks: { create cluster control plane "my-calico-cluster", no tasks }
[ℹ] building cluster stack "eksctl-my-calico-cluster-cluster"
[ℹ] deploying stack "eksctl-my-calico-cluster-cluster"
[ℹ] waiting for the control plane availability...
[✔] saved kubeconfig as "/Users/pandeyb/.kube/config"
[ℹ] no tasks
[✔] all EKS cluster resources for "my-calico-cluster" have been created
[ℹ] kubectl command should work with "/Users/pandeyb/.kube/config", try 'kubectl get nodes'
[✔] EKS cluster "my-calico-cluster" in "us-west-2" region is ready
- Since this cluster will use Calico for networking, you must delete the aws-node daemon set to disable AWS VPC networking for pods. Follow the “Remove existing AWS CNI components” section above for this.
- Now that you have a cluster configured, you can install Calico. Follow the “Deploy calico components” section above for this.
- Add nodes to the cluster.
eksctl create nodegroup --cluster my-calico-cluster --node-type t3.large --node-ami auto --max-pods-per-node 100
Output:
[ℹ] eksctl version 0.22.0
[ℹ] using region us-west-2
[ℹ] will use version 1.16 for new nodegroup(s) based on control plane version
[ℹ] nodegroup "ng-6d80fb78" will use "ami-06e2c973f2d0373fa" [AmazonLinux2/1.16]
[ℹ] 1 nodegroup (ng-6d80fb78) was included (based on the include/exclude rules)
[ℹ] will create a CloudFormation stack for each of 1 nodegroups in cluster "my-calico-cluster"
[ℹ] 2 sequential tasks: { fix cluster compatibility, 1 task: { 1 task: { create nodegroup "ng-6d80fb78" } } }
[ℹ] checking cluster stack for missing resources
[ℹ] cluster stack has all required resources
[ℹ] building nodegroup stack "eksctl-my-calico-cluster-nodegroup-ng-6d80fb78"
[ℹ] --nodes-min=2 was set automatically for nodegroup ng-6d80fb78
[ℹ] --nodes-max=2 was set automatically for nodegroup ng-6d80fb78
[ℹ] deploying stack "eksctl-my-calico-cluster-nodegroup-ng-6d80fb78"
[ℹ] no tasks
[ℹ] adding identity "arn:aws:iam::xxxxxxxxxx:role/eksctl-my-calico-cluster-nodegrou-NodeInstanceRole-S668LPGH9HFZ" to auth ConfigMap
[ℹ] nodegroup "ng-6d80fb78" has 0 node(s)
[ℹ] waiting for at least 2 node(s) to become ready in "ng-6d80fb78"
[ℹ] nodegroup "ng-6d80fb78" has 2 node(s)
[ℹ] node "ip-192-168-26-237.us-west-2.compute.internal" is ready
[ℹ] node "ip-192-168-68-246.us-west-2.compute.internal" is ready
[✔] created 1 nodegroup(s) in cluster "my-calico-cluster"
[✔] created 0 managed nodegroup(s) in cluster "my-calico-cluster"
[ℹ] checking security group configuration for all nodegroups
[ℹ] all nodegroups have up-to-date configuration
- Validate
$ kubectl get nodes
NAME STATUS ROLES AGE VERSION
ip-192-168-26-237.us-west-2.compute.internal Ready <none> 72s v1.16.8-eks-e16311
ip-192-168-68-246.us-west-2.compute.internal Ready <none> 69s v1.16.8-eks-e16311
$ kubectl get pods -n kube-system -o wide
NAME READY STATUS RESTARTS AGE IP NODE NOMINATED NODE READINESS GATES
calico-kube-controllers-69cb4d4df7-pm9k4 1/1 Running 0 9m19s 172.16.178.195 ip-192-168-26-237.us-west-2.compute.internal <none> <none>
calico-node-htg6v 1/1 Running 0 102s 192.168.26.237 ip-192-168-26-237.us-west-2.compute.internal <none> <none>
calico-node-lzqbg 1/1 Running 0 99s 192.168.68.246 ip-192-168-68-246.us-west-2.compute.internal <none> <none>
coredns-5c97f79574-dtrv5 1/1 Running 0 69m 172.16.178.194 ip-192-168-26-237.us-west-2.compute.internal <none> <none>
coredns-5c97f79574-r59gk 1/1 Running 0 69m 172.16.178.193 ip-192-168-26-237.us-west-2.compute.internal <none> <none>
kube-proxy-lbvl9 1/1 Running 0 99s 192.168.68.246 ip-192-168-68-246.us-west-2.compute.internal <none> <none>
kube-proxy-njwxh 1/1 Running 0 102s 192.168.26.237 ip-192-168-26-237.us-west-2.compute.internal <none> <none>
Everything looks good.
- Deploy the sample application. For this, follow the “Deploy a sample application” section above, and validate as follows,
$ kubectl get pods -o wide
NAME READY STATUS RESTARTS AGE IP NODE NOMINATED NODE READINESS GATES
hello-world-64db9f698b-wsmbw 1/1 Running 0 22s 172.16.180.1 ip-192-168-68-246.us-west-2.compute.internal <none> <none>
Now let’s create 100 replicas; this time it should be able to create all the pods.
kubectl scale deployment hello-world --replicas=100
Validate:
kubectl get deployment hello-world
NAME READY UP-TO-DATE AVAILABLE AGE
hello-world 100/100 100 100 2m55s
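Because the node group was created with --max-pods-per-node 100, each node now advertises a pod capacity of 100, which you can confirm with:
kubectl get nodes -o custom-columns=NAME:.metadata.name,MAXPODS:.status.capacity.pods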
Cleanup
Delete the EKS cluster and AWS resources using the following,
eksctl delete cluster --name my-calico-cluster
Output:
[ℹ] eksctl version 0.22.0
[ℹ] using region us-west-2
[ℹ] deleting EKS cluster "my-calico-cluster"
[ℹ] deleted 0 Fargate profile(s)
[✔] kubeconfig has been updated
[ℹ] cleaning up LoadBalancer services
[ℹ] 2 sequential tasks: { delete nodegroup "ng-6d80fb78", delete cluster control plane "my-calico-cluster" [async] }
[ℹ] will delete stack "eksctl-my-calico-cluster-nodegroup-ng-6d80fb78"
[ℹ] waiting for stack "eksctl-my-calico-cluster-nodegroup-ng-6d80fb78" to get deleted
[ℹ] will delete stack "eksctl-my-calico-cluster-cluster"
[✔] all cluster resources were deleted
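If you also created the earlier democluster with the AWS CLI, clean it up separately; a sketch, assuming the same environment variables as before:
aws eks delete-nodegroup --cluster-name ${CLUSTER_NAME} --nodegroup-name ${CLUSTER_NAME} --region ${REGION}
aws eks wait nodegroup-deleted --cluster-name ${CLUSTER_NAME} --nodegroup-name ${CLUSTER_NAME} --region ${REGION}
aws eks delete-cluster --name ${CLUSTER_NAME} --region ${REGION}
aws cloudformation delete-stack --stack-name eksrole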
Troubleshooting
I encountered various problems while completing this blog. Those errors are listed here; they are already mitigated in the steps above.
- An error occurred (InvalidParameterException) when calling the CreateNodegroup operation: Subnets are required
- An error occurred (InvalidParameterException) when calling the CreateNodegroup operation: One or more security groups in remote-access is not valid!
- An error occurred (InvalidParameterException) when calling the CreateNodegroup operation: Following required service principals [eks.amazonaws.com] were not found in the trust relationships of clusterRole arn:aws:iam::XXXXXXXXXX:role/eksrole-NodeInstanceRole-DYX5G48JN3NP
- An error occurred (AlreadyExistsException) when calling the CreateStack operation: Stack [eksrole] already exists
- An error occurred (InvalidParameterException) when calling the CreateNodegroup operation: The role with name eksrole-NodeInstanceRole-DYX2148JN21P cannot be found. (Service: AmazonIdentityManagement; Status Code: 404; Error Code: NoSuchEntity; Request ID: ca90689e-daas-4233-955b-963121d5604c; Proxy: null)
- An error occurred (InvalidRequestException) when calling the CreateNodegroup operation: Cluster ‘democluster’ is not in ACTIVE status
- An error occurred (InvalidParameterException) when calling the CreateNodegroup operation: Subnets are not tagged with the required tag. Please tag all subnets with Key: kubernetes.io/cluster/democluster Value: shared