-
Logging With AWS Kubernetes EKS Cluster
Logs
EKS is the managed Kubernetes offering from AWS that saves you the stress of running your own control plane, with the trade-off of giving up some visibility into what goes on inside it. Control plane logging was not available when the service went GA, but it has since been added. Here are the kinds of logs that it provides:
- API server component logs: You know that component of your cluster that validates requests, serves the REST API endpoints, and so on? These are the logs from the apiserver, and they are critical when diagnosing things like why your pods are not being created, admission controller issues, etc.
```
E0523 03:27:22.258958 1 memcache.go:134] couldn't get resource list for metrics.k8s.io/v1beta1: the server is currently unable to handle the request
```
- Audit logs: People make changes in your cluster and you want to know who did what and when. These logs give you the ability to do that.
```json
{
  "kind": "Event",
  "apiVersion": "audit.k8s.io/v1beta1",
  "metadata": {
    "creationTimestamp": "2019-05-23T02:08:34Z"
  },
  "level": "Request",
  "timestamp": "2019-05-23T02:08:34Z",
  "auditID": "84662c40-8d4f-4d3e-99b2-0d4005e44375",
  "stage": "ResponseComplete",
  "requestURI": "/api/v1/namespaces/default/services/kubernetes",
  "verb": "get",
  "user": {
    "username": "system:apiserver",
    "uid": "2d8ad7ed-25ed-4f37-a2f0-416d2af705e9",
    "groups": [
      "system:masters"
    ]
  },
  "sourceIPs": [
    "::1"
  ],
  "userAgent": "kube-apiserver/v1.12.6 (linux/amd64) kubernetes/d69f1bf",
  "objectRef": {
    "resource": "services",
    "namespace": "default",
    "name": "kubernetes",
    "apiVersion": "v1"
  },
  "responseStatus": {
    "metadata": {},
    "code": 200
  },
  "requestReceivedTimestamp": "2019-05-23T02:08:34.498973Z",
  "stageTimestamp": "2019-05-23T02:08:34.501446Z",
  "annotations": {
    "authorization.k8s.io/decision": "allow",
    "authorization.k8s.io/reason": ""
  }
}
```
- Authenticator logs: EKS uses a component called aws-iam-authenticator to, you guessed it, authenticate against the EKS cluster using AWS credentials and IAM roles. These logs contain events from those activities.
```
time="2019-05-16T22:19:48Z" level=info msg="Using assumed role for EC2 API" roleARN="arn:aws:iam::523447765480:role/idaas-kubernetes-cluster-idauto-dev-masters-role"
```
- Controller manager logs: For those familiar with Kubernetes objects such as Deployments, ReplicaSets, etc., these are managed by controllers that ship with the Kubernetes controller manager. To see what these controllers are doing under the hood, you need these logs.
```
E0523 02:07:55.486872 1 horizontal.go:212] failed to compute desired number of replicas based on listed metrics for Deployment/routing/rapididentity-default-backend: failed to get memory utilization: unable to get metrics for resource memory: unable to fetch metrics from resource metrics API: the server is currently unable to handle the request (get pods.metrics.k8s.io)
```
- Scheduler logs: This component of the control plane does what its name says: it puts pods on the right node after factoring in a number of constraints and the resources available. To see how this component makes its decisions, check these logs.
Enabling Logs
You can easily enable these logs in the EKS console, and AWS updates your cluster so that the logs ship to CloudWatch. The corresponding CloudWatch log group will be displayed in your console. For those using Terraform to provision their cluster, you can simply pass in the types of logs that you want to enable and also create the log group to ship them to.
resource "aws_eks_cluster" "my_cluster" { depends_on = ["aws_cloudwatch_log_group.eks_log_group"] enabled_cluster_log_types = ["api", "audit"] name = "${var.cluster_name}" # ... other configuration ... } - API server component logs: You know that component of your cluster that validates requests, provides api rest endpoint and so on? These are the logs from the apiserver which are very critical when trying to diagnose things like why your pods are not creating, admission controller issues etc.
-
Setting Up Jenkins As Code
OK, so our goal here is to deploy Jenkins with the click of a button, with our jobs configured and all. Our secret sauce for this will be the Jenkins Configuration as Code plugin (JCasC), which allows you to define your Jenkins setup in a YAML file or folder. The problem is, we want to use JCasC to configure Jenkins, but we need the JCasC plugin installed ahead of time for it to do that for us. Thankfully, there is a solution: we will use Jenkins' built-in process to install plugins.
Install plugins
```
workflow-aggregator:latest
blueocean:latest
pipeline-maven:latest
configuration-as-code-support:latest
job-dsl:latest
```
For those installing Jenkins using Kubernetes, you will need to update your Helm values file accordingly. Now, let's crank things up;
```
#plugins.txt
workflow-aggregator:2.6
blueocean:1.16.0
pipeline-maven:3.6.11
configuration-as-code-support:1.14
job-dsl:1.74
workflow-job:2.32
credentials-binding:1.18
git:3.10.0
```

Build and Configure
```yaml
jenkins:
  systemMessage: "I did this using Jenkins Configuration as Code Plugin \n\n"
tool:
  git:
    installations:
      - home: "git"
        name: "Default"
  maven:
    installations:
      - name: "Maven 3"
        properties:
          - installSource:
              installers:
                - maven:
                    id: "3.5.4"
jobs:
  - script: >
      pipelineJob('pipeline') {
          definition {
              cpsScm {
                  scriptPath 'Jenkinsfile'
                  scm {
                      git {
                          remote { url 'https://github.com/mkrzyzanowski/blog-001.git' }
                          branch '*/docker-for-mac'
                          extensions {}
                      }
                  }
              }
          }
      }
```
These are the plugins that we are trying to install as well as how we want our Jenkins set up. Here is the Docker image build that takes care of installing them for us.
```
#Dockerfile
FROM jenkins/jenkins:lts
COPY plugins.txt /usr/share/jenkins/ref/plugins.txt
RUN /usr/local/bin/install-plugins.sh < /usr/share/jenkins/ref/plugins.txt
```
Once the image has been built, we need a way to let JCasC know the location of our configuration file (named jenkins.yaml in most cases). There are a few ways to do this:
- Copy the jenkins.yaml file to /var/jenkins_home/; JCasC looks for it there by default
- Use the CASC_JENKINS_CONFIG environment variable to point to the configuration location, which can be any of these (see the sketch after this list);
- A file path (/my/path/jenkins.yaml)
- A folder path (/my/path/jenkins_casc_configs/)
- A configuration file URL (https://example.com/git/jenkins.yaml)
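For example, if you go the environment variable route with Docker, it could look something like this; the image name and the mounted path are just illustrative placeholders:

```bash
# point JCasC at a folder of config files via CASC_JENKINS_CONFIG
# (my_jenkins_image and the host path are hypothetical)
docker run --name jenkins -d -p 8081:8080 \
  -e CASC_JENKINS_CONFIG=/var/jenkins_conf/ \
  -v $(pwd)/casc_configs:/var/jenkins_conf \
  my_jenkins_image
```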
For this example, I will mount the jenkins.yaml to /var/jenkins_home with docker
```
$ docker run --name jenkins -d -p 8081:8080 -v $(pwd):/var/jenkins_home my_jenkins_image
Running from: /usr/share/jenkins/jenkins.war
webroot: EnvVars.masterEnvVars.get("JENKINS_HOME")
May 08, 2019 12:00:19 AM org.eclipse.jetty.util.log.Log initialized
INFO: Logging initialized @612ms to org.eclipse.jetty.util.log.JavaUtilLog
May 08, 2019 12:00:19 AM winstone.Logger logInternal
INFO: Beginning extraction from war file
May 08, 2019 12:00:40 AM org.eclipse.jetty.server.handler.ContextHandler setContextPath
WARNING: Empty contextPath
May 08, 2019 12:00:40 AM org.eclipse.jetty.server.Server doStart
INFO: jetty-9.4.z-SNAPSHOT; built: 2018-08-30T13:59:14.071Z; git: 27208684755d94a92186989f695db2d7b21ebc51; jvm 1.8.0_212-8u212-b01-1~deb9u1-b01
May 08, 2019 12:00:47 AM org.eclipse.jetty.webapp.StandardDescriptorProcessor visitServlet
INFO: NO JSP Support for /, did not find org.eclipse.jetty.jsp.JettyJspServlet
May 08, 2019 12:00:47 AM org.eclipse.jetty.server.session.DefaultSessionIdManager doStart
INFO: DefaultSessionIdManager workerName=node0
May 08, 2019 12:00:47 AM org.eclipse.jetty.server.session.DefaultSessionIdManager doStart
INFO: No SessionScavenger set, using defaults
May 08, 2019 12:00:47 AM org.eclipse.jetty.server.session.HouseKeeper startScavenging
INFO: node0 Scavenging every 660000ms
Jenkins home directory: /var/jenkins_home found at: EnvVars.masterEnvVars.get("JENKINS_HOME")
May 08, 2019 12:00:50 AM org.eclipse.jetty.server.handler.ContextHandler doStart
INFO: Started w.@a50b09c{Jenkins v2.164.2,/,file:///var/jenkins_home/war/,AVAILABLE}{/var/jenkins_home/war}
May 08, 2019 12:00:50 AM org.eclipse.jetty.server.AbstractConnector doStart
INFO: Started ServerConnector@5a38588f{HTTP/1.1,[http/1.1]}{0.0.0.0:8080}
May 08, 2019 12:00:50 AM org.eclipse.jetty.server.Server doStart
INFO: Started @31513ms
May 08, 2019 12:00:50 AM winstone.Logger logInternal
INFO: Winstone Servlet Engine v4.0 running: controlPort=disabled
May 08, 2019 12:00:53 AM jenkins.InitReactorRunner$1 onAttained
INFO: Started initialization
May 08, 2019 12:02:20 AM hudson.ClassicPluginStrategy createClassJarFromWebInfClasses
WARNING: Created /var/jenkins_home/plugins/job-dsl/WEB-INF/lib/classes.jar; update plugin to a version created with a newer harness
May 08, 2019 12:02:36 AM jenkins.InitReactorRunner$1 onAttained
INFO: Listed all plugins
May 08, 2019 12:02:58 AM jenkins.InitReactorRunner$1 onAttained
INFO: Prepared all plugins
May 08, 2019 12:02:58 AM jenkins.InitReactorRunner$1 onAttained
INFO: Started all plugins
May 08, 2019 12:03:09 AM jenkins.InitReactorRunner$1 onAttained
INFO: Augmented all extensions
May 08, 2019 12:03:10 AM io.jenkins.plugins.casc.impl.configurators.DataBoundConfigurator tryConstructor
INFO: Setting class hudson.plugins.git.GitTool.name = Default
May 08, 2019 12:03:10 AM io.jenkins.plugins.casc.impl.configurators.DataBoundConfigurator tryConstructor
INFO: Setting class hudson.plugins.git.GitTool.home = git
May 08, 2019 12:03:10 AM io.jenkins.plugins.casc.impl.configurators.DataBoundConfigurator tryConstructor
INFO: Setting class hudson.tasks.Maven$MavenInstallation.name = Maven 3
May 08, 2019 12:03:10 AM io.jenkins.plugins.casc.impl.configurators.DataBoundConfigurator tryConstructor
INFO: Setting class hudson.tasks.Maven$MavenInstaller.id = 3.5.4
May 08, 2019 12:03:10 AM io.jenkins.plugins.casc.impl.configurators.DataBoundConfigurator tryConstructor
INFO: Setting class hudson.tools.InstallSourceProperty.installers = [{maven={}}]
May 08, 2019 12:03:10 AM io.jenkins.plugins.casc.impl.configurators.DataBoundConfigurator tryConstructor
INFO: Setting class hudson.tasks.Maven$MavenInstallation.properties = [{installSource={}}]
May 08, 2019 12:03:11 AM io.jenkins.plugins.casc.Attribute setValue
INFO: Setting hudson.model.Hudson@4fbfd7e4.systemMessage = I did this using Jenkins Configuration as Code Plugin
Processing provided DSL script
May 08, 2019 12:03:15 AM javaposse.jobdsl.plugin.JenkinsJobManagement createOrUpdateConfig
INFO: createOrUpdateConfig for pipeline
May 08, 2019 12:03:16 AM io.jenkins.plugins.casc.impl.configurators.DataBoundConfigurator tryConstructor
INFO: Setting class hudson.plugins.git.GitTool.name = Default
May 08, 2019 12:03:16 AM io.jenkins.plugins.casc.impl.configurators.DataBoundConfigurator tryConstructor
INFO: Setting class hudson.plugins.git.GitTool.home = git
May 08, 2019 12:03:16 AM io.jenkins.plugins.casc.Attribute setValue
INFO: Setting hudson.plugins.git.GitTool$DescriptorImpl@7d18607f.installations = [GitTool[Default]]
....
```
Here is a screenshot of our newly configured Jenkins.

Happy Automation!!!
-
Kubernetes Performance And CPU Manager
So you have a workload that is CPU sensitive and you want to optimize things by providing better CPU performance to it; CPU Manager can help. Now, how exactly does it help you? Before we can talk about that, let's try to understand some CFS (Completely Fair Scheduler) slang.
CFS Share
No, this is not like a stock market share; we are talking CPU here. Think about a fixed amount of time that everyone is trying to take a slice of. CPU shares simply express how much of the system's CPU time you have access to.
- CPU Share: This determines your weight when competing for a CPU core under excess load. Let's say two processes (A and B) land on a CPU core and both get allocated 1024 shares each (the default allocation unless you change things); they carry the same weight in terms of time allocation, with each getting 1/2 of the CPU core's time. Now, if we make things interesting and update process B's share to 512, B gets (512/(1024+512)) = 1/3 of the CPU time. One more thing to remember: if process A goes idle, process B can use some of that idle CPU time, provided we only have A and B on the core.
- CPU Period: This is part of CFS bandwidth control and it defines what a period means to the CPU. What is a period? Think of it as the time slice that represents one CPU cycle, usually 100ms (100,000µs) on most systems, and it is expressed as cfs_period_us.
- CPU Quota: A process with a 20ms (20,000µs) quota will get 1/5 of the time during a 100ms CPU period. So the quota is basically how much of the time slice you get to use. You see this variable expressed as cfs_quota_us. You can peek at these values on a node, as shown right after this list.
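These knobs are plain files in the cgroup filesystem, so on a cgroup v1 host you can read the defaults straight off the root cpu cgroup. The exact mount path can vary by distro; treat this as a sketch:

```bash
# root cgroup defaults on a typical cgroup v1 host (path may differ on your system)
cat /sys/fs/cgroup/cpu,cpuacct/cpu.cfs_period_us   # usually 100000 (100ms)
cat /sys/fs/cgroup/cpu,cpuacct/cpu.cfs_quota_us    # -1 means no quota/limit
cat /sys/fs/cgroup/cpu,cpuacct/cpu.shares          # defaults to 1024
```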
OK, enough of the jargon. How does Kubernetes translate a container requesting 100m (0.1 CPU) into shares and quota? You can see the answer below: this Kubernetes Go code explains it all for those interested in how it is done.
```go
// milliCPUToShares converts milliCPU to CPU shares
func milliCPUToShares(milliCPU int64) int64 {
	if milliCPU == 0 {
		// Return 2 here to really match kernel default for zero milliCPU.
		return minShares
	}
	// Conceptually (milliCPU / milliCPUToCPU) * sharesPerCPU, but factored to improve rounding.
	shares := (milliCPU * sharesPerCPU) / milliCPUToCPU
	// for example, 100m gives shares = (100 * 1024) / 1000 = 102
	if shares < minShares {
		return minShares
	}
	return shares
}

// milliCPUToQuota converts milliCPU to CFS quota and period values
func milliCPUToQuota(milliCPU int64) (quota int64, period int64) {
	// CFS quota is measured in two values:
	//  - cfs_period_us=100ms (the amount of time to measure usage across)
	//  - cfs_quota=20ms (the amount of cpu time allowed to be used across a period)
	// so in the above example, you are limited to 20% of a single CPU
	// for multi-cpu environments, you just scale equivalent amounts
	if milliCPU == 0 {
		return
	}
	// we set the period to 100ms by default
	period = quotaPeriod
	// we then convert your milliCPU to a value normalized over a period
	// e.g. 100m gives quota = (100 * 100000) / 1000 = 10000µs = 10ms out of every 100ms
	quota = (milliCPU * quotaPeriod) / milliCPUToCPU
	// quota needs to be a minimum of 1ms.
	if quota < minQuotaPeriod {
		quota = minQuotaPeriod
	}
	return
}
```

CPU Manager and Scheduling
Nice, we have gone through all these terms. The question remains: how does CPU Manager help with my CPU-sensitive workload? Basically, it uses the cpuset feature in Linux to place containers on specific CPUs. It takes a slice of CPUs equal to the requests (or limits) specified in the container, sets it apart, and assigns it to your container, thereby preventing context switching and noisy-neighbour issues. Let's look under the covers; it creates different pools of CPUs as shown below:
- Shared Pool: This is the pool of CPUs that every scheduled container gets assigned to until a decision is made to move it elsewhere.
- Reserved Pool: Remember that your kubelet can reserve CPU, right? Yes, those are these guys. Simply put, CPUs in the shared pool that you cannot touch (you can see the effect of this reservation on a node, as shown after this list).
- Assignable: This is where containers that meet the exclusivity requirement get their CPUs from. They are taken from the CPUs remaining after removing the kubelet's reserved pool. Once assigned to a container, they are removed from the shared pool.
- Exclusive Allocations: This pool contains the cpusets that have been assigned to containers.
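The reserved pool comes from the kubelet's own reservation settings (the systemReserved/kubeReserved values you will see in the kubelet config further down). A quick way to get a feel for it is to compare a node's capacity with what it reports as allocatable; a small sketch, with the node name as a placeholder:

```bash
# Allocatable = node capacity minus what the kubelet reserves for system/kube daemons
# (my-node is a hypothetical node name)
kubectl describe node my-node | grep -A 6 -i allocatable
```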
The next question: who qualifies for the assignable pool? Any Guaranteed container (requests = limits) with an integer CPU count. Yes, integer! Containers like the one below:
```yaml
apiVersion: v1
kind: Pod
metadata:
  name: memory-demo
  namespace: mem-example
spec:
  containers:
  - name: memory-demo-ctr
    image: polinux/stress
    resources:
      limits:
        cpu: 1
        memory: "200Mi"
      requests:
        cpu: 1
        memory: "200Mi"
    command: ["stress"]
```
OK, I mentioned that everyone gets assigned to the shared pool at first. What moves them to the exclusive pool? Well, the kubelet does what we call a resync (a configurable kubelet option): it checks the containers in the shared pool every so often and moves those that qualify to the exclusive pool. That means your pod could sit in the shared pool until the next resync. Also, please note that it is possible for the kubelet or system processes to end up running on the exclusive CPU set, because the manager only guarantees exclusivity among pods; other processes on the system are not the kubelet's business.

### Show me the money

Enough theory, let's get dirty. How do we enable this feature on our kubelet? Just enable the CPUManager feature gate and pass in the static policy. I used this in my test kubeadm setup:
```yaml
kind: KubeletConfiguration
featureGates:
  CPUManager: true
cpuManagerPolicy: static
systemReserved:
  cpu: 500m
  memory: 256M
kubeReserved:
  cpu: 500m
  memory: 256M
```
Let's go ahead and create this pod as an example:
```yaml
apiVersion: v1
kind: Pod
metadata:
  name: myapp-pod
  labels:
    app: myapp
spec:
  containers:
  - name: myapp-container
    image: busybox
    resources:
      requests:
        memory: "24Mi"
        cpu: "150m"
      limits:
        memory: "28Mi"
        cpu: "160m"
    command: ['sh', '-c', 'echo Hello Kubernetes! && sleep 3600']
```
After creating the pod on a cluster with the CPU Manager feature gate enabled, it gets scheduled onto a node. At this point, it is a Burstable pod with 153 CPU shares (150/1000 * 1024) and a CPU quota of 16000 (160/1000 * 100,000). You can confirm this by looking at the container's cgroup.
```
cat /sys/fs/cgroup/cpu,cpuacct/kubepods/burstable/podf19e6b4b-6eb0-11e9-898e-062ad3dc4fe4/138208e13ba73882fc0a5c06862b7b0bc7f6d3f43116d61ecf2488fae11d6004/cpu.shares
153
cat /sys/fs/cgroup/cpu,cpuacct/kubepods/burstable/podf19e6b4b-6eb0-11e9-898e-062ad3dc4fe4/138208e13ba73882fc0a5c06862b7b0bc7f6d3f43116d61ecf2488fae11d6004/cpu.cfs_quota_us
16000
```
The above pod is a Burstable pod, but to test CPU Manager we need a Guaranteed pod with a whole-number CPU. Once such a pod is scheduled, the kubelet should configure our container runtime to run it on a particular core (or cores) using cpuset.
```yaml
apiVersion: v1
kind: Pod
metadata:
  name: guaranteed-myapp-pod
  labels:
    app: myapp
spec:
  containers:
  - name: myapp-container
    image: busybox
    resources:
      requests:
        memory: "38Mi"
        cpu: "1"
      limits:
        memory: "38Mi"
        cpu: "1"
    command: ['sh', '-c', 'echo Hello Kubernetes! && sleep 3600']
```
We can see this in the container's docker inspect output as well as in the cgroup.
```
# docker inspect 59964c06d765 | grep -i cpu
            "CpuShares": 1024,
            "NanoCpus": 0,
            "CpuPeriod": 100000,
            "CpuQuota": 100000,
            "CpuRealtimePeriod": 0,
            "CpuRealtimeRuntime": 0,
            "CpusetCpus": "1",
            "CpusetMems": "",
            "CpuCount": 0,
            "CpuPercent": 0,

# cat /sys/fs/cgroup/cpuset/kubepods/pod273e9d00-706c-11e9-a529-062ad3dc4fe4/59964c06d7657face0585c9db375d8773dcb1b351a2d7e87204e89a2e47c2b97/cpuset.effective_cpus
1
```
If you look at other burstable containers, they will be restricted to the shared cores.
```
# docker inspect 6ca5ea998f0d | grep -i cpu
            "CpuShares": 153,
            "NanoCpus": 0,
            "CpuPeriod": 100000,
            "CpuQuota": 16000,
            "CpuRealtimePeriod": 0,
            "CpuRealtimeRuntime": 0,
            "CpusetCpus": "0,2-3",
            "CpusetMems": "",
            "CpuCount": 0,
            "CpuPercent": 0,

# cat /sys/fs/cgroup/cpuset/kubepods/burstable/podd024aac6-706b-11e9-a529-062ad3dc4fe4/539b9b1c4ec3e49fa49b84b22ff4a06058a1c0bd6db57667cb30786927d3a380/cpuset.cpus
0,2-3
```
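Beyond docker inspect and the cgroup filesystem, the static policy also records its decisions in a checkpoint file on the node, which shows the shared (default) cpuset and any exclusive assignments in one place. This assumes the default kubelet root dir, and the JSON layout varies by Kubernetes version:

```bash
# CPU Manager checkpoint kept by the kubelet (path assumes the default --root-dir)
cat /var/lib/kubelet/cpu_manager_state
```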
Helpful Links

- https://stupefied-goodall-e282f7.netlify.com/contributors/design-proposals/node/cpu-manager/
- https://goldmann.pl/blog/2014/09/11/resource-management-in-docker/#_cpu
- https://software.intel.com/en-us/blogs/2018/08/07/cpu-manager-for-performance-sensitive-applications
- https://medium.com/@betz.mark/understanding-resource-limits-in-kubernetes-cpu-time-9eff74d3161b
- https://medium.com/@mcastelino/kubernetes-resource-management-deep-dive-b337ba15359c