# Kubernetes Orchestration Skill

## Table of Contents

- Introduction
- Core Concepts
- Workloads
- Services and Networking
- Ingress Controllers
- Configuration Management
- Storage
- Namespaces and Resource Isolation
- Security and RBAC
- Autoscaling
- Monitoring and Observability
- Logging
- Production Operations
- Troubleshooting
## Introduction
Kubernetes is an open-source container orchestration platform that automates the deployment, scaling, and management of containerized applications. It provides a robust framework for running distributed systems resiliently, handling scaling and failover for your applications, and providing deployment patterns.
### Key Benefits

- Service Discovery and Load Balancing: Automatic DNS and load balancing for containers
- Storage Orchestration: Mount storage systems from local, cloud, or network storage
- Automated Rollouts and Rollbacks: Declarative deployment with health monitoring
- Automatic Bin Packing: Optimal placement of containers based on resource requirements
- Self-Healing: Automatic restart, replacement, and rescheduling of failed containers
- Secret and Configuration Management: Store and manage sensitive information securely
- Horizontal Scaling: Scale applications up and down automatically or manually
- Batch Execution: Manage batch and CI workloads
## Core Concepts

### Cluster Architecture

A Kubernetes cluster consists of:

Control Plane Components:

- kube-apiserver: The API server is the front end for the Kubernetes control plane
- etcd: Consistent and highly available key-value store for all cluster data
- kube-scheduler: Watches for newly created Pods and assigns them to nodes
- kube-controller-manager: Runs controller processes
- cloud-controller-manager: Integrates with cloud provider APIs

Node Components:

- kubelet: Agent that runs on each node and ensures containers are running
- kube-proxy: Network proxy maintaining network rules on nodes
- container runtime: Software responsible for running containers (containerd, CRI-O)
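On many distributions (kubeadm-based clusters in particular), the control plane components above run as Pods in the kube-system namespace and can be listed directly; managed offerings such as EKS or GKE hide them from view:

```shell
kubectl get pods -n kube-system   # control plane and node agents (layout varies by distribution)
kubectl get nodes -o wide         # nodes with kubelet version and container runtime
```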
### Objects and Specifications

Kubernetes objects are persistent entities representing the state of your cluster. Every object includes:

- metadata: Data about the object (name, namespace, labels, annotations)
- spec: The desired state
- status: The current state (managed by Kubernetes)
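Putting the three sections together, a minimal manifest supplies only metadata and spec; status is written by the control plane and can be inspected with `kubectl get <resource> -o yaml`:

```yaml
apiVersion: v1
kind: Pod
metadata:            # identifying data: name, labels, annotations
  name: example
spec:                # desired state, declared by the user
  containers:
  - name: app
    image: nginx:1.21
# status:            # current state, managed by Kubernetes; never set by hand
```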
## Workloads

### Pods

Pods are the smallest deployable units in Kubernetes, representing one or more containers that share storage and network resources.
Basic Pod Example:

```yaml
apiVersion: v1
kind: Pod
metadata:
  name: nginx-pod
  labels:
    app: nginx
spec:
  containers:
  - name: nginx
    image: nginx:1.21
    ports:
    - containerPort: 80
```
Multi-Container Pod:

```yaml
apiVersion: v1
kind: Pod
metadata:
  name: multi-container-pod
spec:
  containers:
  - name: nginx
    image: nginx:1.21
    ports:
    - containerPort: 80
    volumeMounts:
    - name: shared-data
      mountPath: /usr/share/nginx/html
  - name: sidecar
    image: busybox
    command: ['sh', '-c', 'while true; do echo "$(date)" > /pod-data/index.html; sleep 30; done']
    volumeMounts:
    - name: shared-data
      mountPath: /pod-data
  volumes:
  - name: shared-data
    emptyDir: {}
```
Pod with Init Container:

```yaml
apiVersion: v1
kind: Pod
metadata:
  name: init-demo
spec:
  initContainers:
  - name: install
    image: busybox:1.28
    command:
    - wget
    - "-O"
    - "/work-dir/index.html"
    - http://info.cern.ch
    volumeMounts:
    - name: workdir
      mountPath: "/work-dir"
  containers:
  - name: nginx
    image: nginx:1.21
    ports:
    - containerPort: 80
    volumeMounts:
    - name: workdir
      mountPath: /usr/share/nginx/html
  volumes:
  - name: workdir
    emptyDir: {}
```
Pod with Security Context:

```yaml
apiVersion: v1
kind: Pod
metadata:
  name: security-context-demo
spec:
  securityContext:
    runAsUser: 1000
    runAsGroup: 3000
    fsGroup: 2000
  containers:
  - name: sec-ctx-container
    image: busybox
    command: ["sh", "-c", "sleep 1h"]
    securityContext:
      allowPrivilegeEscalation: false
      capabilities:
        drop:
        - ALL        # ALL already covers NET_RAW and every other capability
      runAsNonRoot: true
      seccompProfile:
        type: RuntimeDefault
```
Pod with Resource Limits and Requests:

```yaml
apiVersion: v1
kind: Pod
metadata:
  name: resource-demo
spec:
  containers:
  - name: app
    image: nginx:1.21
    resources:
      requests:
        memory: "64Mi"
        cpu: "250m"
      limits:
        memory: "128Mi"
        cpu: "500m"
```
Pod with Probes:

```yaml
apiVersion: v1
kind: Pod
metadata:
  name: probe-demo
spec:
  containers:
  - name: app
    image: nginx:1.21
    ports:
    - containerPort: 80
    livenessProbe:
      httpGet:
        path: /healthz
        port: 80
      initialDelaySeconds: 3
      periodSeconds: 10
      timeoutSeconds: 1
      failureThreshold: 3
    readinessProbe:
      httpGet:
        path: /ready
        port: 80
      initialDelaySeconds: 5
      periodSeconds: 5
      successThreshold: 1
    startupProbe:
      httpGet:
        path: /startup
        port: 80
      initialDelaySeconds: 0
      periodSeconds: 10
      failureThreshold: 30
```
### Deployments

Deployments provide declarative updates for Pods and ReplicaSets, enabling rolling updates and rollbacks.
Basic Deployment:

```yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: nginx-deployment
  labels:
    app: nginx
spec:
  replicas: 3
  selector:
    matchLabels:
      app: nginx
  template:
    metadata:
      labels:
        app: nginx
    spec:
      containers:
      - name: nginx
        image: nginx:1.21
        ports:
        - containerPort: 80
```
Deployment with Rolling Update Strategy:

```yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: rolling-update-deployment
spec:
  replicas: 5
  strategy:
    type: RollingUpdate
    rollingUpdate:
      maxSurge: 1
      maxUnavailable: 1
  selector:
    matchLabels:
      app: myapp
  template:
    metadata:
      labels:
        app: myapp
    spec:
      containers:
      - name: myapp
        image: myapp:v2
        ports:
        - containerPort: 8080
        resources:
          requests:
            cpu: 100m
            memory: 128Mi
          limits:
            cpu: 500m
            memory: 512Mi
```
Deployment with Recreate Strategy:

```yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: recreate-deployment
spec:
  replicas: 3
  strategy:
    type: Recreate
  selector:
    matchLabels:
      app: database-migration
  template:
    metadata:
      labels:
        app: database-migration
    spec:
      containers:
      - name: migrator
        image: migrator:v1
```
Blue-Green Deployment Pattern:

Blue Deployment:

```yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: myapp-blue
spec:
  replicas: 3
  selector:
    matchLabels:
      app: myapp
      version: blue
  template:
    metadata:
      labels:
        app: myapp
        version: blue
    spec:
      containers:
      - name: myapp
        image: myapp:v1.0
        ports:
        - containerPort: 8080
```

Green Deployment:

```yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: myapp-green
spec:
  replicas: 3
  selector:
    matchLabels:
      app: myapp
      version: green
  template:
    metadata:
      labels:
        app: myapp
        version: green
    spec:
      containers:
      - name: myapp
        image: myapp:v2.0
        ports:
        - containerPort: 8080
```
### StatefulSets

StatefulSets manage stateful applications requiring stable network identities and persistent storage.
Basic StatefulSet with Headless Service:

```yaml
apiVersion: v1
kind: Service
metadata:
  name: nginx
  labels:
    app: nginx
spec:
  ports:
  - port: 80
    name: web
  clusterIP: None
  selector:
    app: nginx
---
apiVersion: apps/v1
kind: StatefulSet
metadata:
  name: web
spec:
  serviceName: "nginx"
  replicas: 3
  selector:
    matchLabels:
      app: nginx
  template:
    metadata:
      labels:
        app: nginx
    spec:
      containers:
      - name: nginx
        image: registry.k8s.io/nginx-slim:0.21
        ports:
        - containerPort: 80
          name: web
        volumeMounts:
        - name: www
          mountPath: /usr/share/nginx/html
  volumeClaimTemplates:
  - metadata:
      name: www
    spec:
      accessModes: [ "ReadWriteOnce" ]
      resources:
        requests:
          storage: 1Gi
```
StatefulSet with Parallel Pod Management:

```yaml
apiVersion: apps/v1
kind: StatefulSet
metadata:
  name: web-parallel
spec:
  serviceName: "nginx"
  podManagementPolicy: "Parallel"
  replicas: 5
  selector:
    matchLabels:
      app: nginx
  template:
    metadata:
      labels:
        app: nginx
    spec:
      containers:
      - name: nginx
        image: registry.k8s.io/nginx-slim:0.24
        ports:
        - containerPort: 80
          name: web
        volumeMounts:
        - name: www
          mountPath: /usr/share/nginx/html
  volumeClaimTemplates:
  - metadata:
      name: www
    spec:
      accessModes: [ "ReadWriteOnce" ]
      resources:
        requests:
          storage: 1Gi
```
StatefulSet for Database (MySQL):

```yaml
apiVersion: v1
kind: Service
metadata:
  name: mysql-headless
spec:
  ports:
  - port: 3306
    name: mysql
  clusterIP: None
  selector:
    app: mysql
---
apiVersion: apps/v1
kind: StatefulSet
metadata:
  name: mysql
spec:
  serviceName: mysql-headless
  replicas: 3
  selector:
    matchLabels:
      app: mysql
  template:
    metadata:
      labels:
        app: mysql
    spec:
      containers:
      - name: mysql
        image: mysql:8.0
        ports:
        - containerPort: 3306
          name: mysql
        env:
        - name: MYSQL_ROOT_PASSWORD
          valueFrom:
            secretKeyRef:
              name: mysql-secret
              key: root-password
        volumeMounts:
        - name: data
          mountPath: /var/lib/mysql
  volumeClaimTemplates:
  - metadata:
      name: data
    spec:
      accessModes: [ "ReadWriteOnce" ]
      resources:
        requests:
          storage: 10Gi
```
### DaemonSets

DaemonSets ensure that all or specific nodes run a copy of a Pod, ideal for logging, monitoring, and cluster storage.
Logging DaemonSet:

```yaml
apiVersion: apps/v1
kind: DaemonSet
metadata:
  name: fluentd-elasticsearch
  namespace: kube-system
  labels:
    k8s-app: fluentd-logging
spec:
  selector:
    matchLabels:
      name: fluentd-elasticsearch
  template:
    metadata:
      labels:
        name: fluentd-elasticsearch
    spec:
      tolerations:
      - key: node-role.kubernetes.io/master
        effect: NoSchedule
      containers:
      - name: fluentd-elasticsearch
        image: quay.io/fluentd_elasticsearch/fluentd:v2.5.2
        resources:
          limits:
            memory: 200Mi
          requests:
            cpu: 100m
            memory: 200Mi
        volumeMounts:
        - name: varlog
          mountPath: /var/log
        - name: varlibdockercontainers
          mountPath: /var/lib/docker/containers
          readOnly: true
      terminationGracePeriodSeconds: 30
      volumes:
      - name: varlog
        hostPath:
          path: /var/log
      - name: varlibdockercontainers
        hostPath:
          path: /var/lib/docker/containers
```
Monitoring DaemonSet (Node Exporter):

```yaml
apiVersion: apps/v1
kind: DaemonSet
metadata:
  name: node-exporter
  namespace: monitoring
spec:
  selector:
    matchLabels:
      app: node-exporter
  template:
    metadata:
      labels:
        app: node-exporter
    spec:
      hostNetwork: true
      hostPID: true
      containers:
      - name: node-exporter
        image: prom/node-exporter:v1.3.1
        args:
        - --path.procfs=/host/proc
        - --path.sysfs=/host/sys
        - --path.rootfs=/host/root
        - --collector.filesystem.ignored-mount-points=^/(dev|proc|sys|var/lib/docker/.+)($|/)
        ports:
        - containerPort: 9100
          hostPort: 9100
          name: metrics
        volumeMounts:
        - name: proc
          mountPath: /host/proc
          readOnly: true
        - name: sys
          mountPath: /host/sys
          readOnly: true
        - name: root
          mountPath: /host/root
          readOnly: true
      volumes:
      - name: proc
        hostPath:
          path: /proc
      - name: sys
        hostPath:
          path: /sys
      - name: root
        hostPath:
          path: /
```
### Jobs

Jobs create one or more Pods and ensure a specified number successfully complete.
Basic Job:

```yaml
apiVersion: batch/v1
kind: Job
metadata:
  name: pi
spec:
  template:
    spec:
      containers:
      - name: pi
        image: perl:5.34
        command: ["perl", "-Mbignum=bpi", "-wle", "print bpi(2000)"]
      restartPolicy: Never
  backoffLimit: 4
```
Parallel Job with Fixed Completions:

```yaml
apiVersion: batch/v1
kind: Job
metadata:
  name: parallel-job
spec:
  completions: 8
  parallelism: 2
  completionMode: Indexed   # required for the JOB_COMPLETION_INDEX env var
  template:
    spec:
      containers:
      - name: worker
        image: busybox
        command: ["sh", "-c", "echo Processing item $JOB_COMPLETION_INDEX && sleep 5"]
      restartPolicy: Never
  backoffLimit: 3
```
Job with TTL After Finished:

```yaml
apiVersion: batch/v1
kind: Job
metadata:
  name: ttl-job
spec:
  ttlSecondsAfterFinished: 100
  template:
    spec:
      containers:
      - name: cleaner
        image: busybox
        command: ["sh", "-c", "echo Cleaning up && sleep 10"]
      restartPolicy: Never
```
### CronJobs

CronJobs create Jobs on a repeating schedule.
Basic CronJob:

```yaml
apiVersion: batch/v1
kind: CronJob
metadata:
  name: hello
spec:
  schedule: "*/5 * * * *"   # standard cron: minute hour day-of-month month day-of-week
  jobTemplate:
    spec:
      template:
        spec:
          containers:
          - name: hello
            image: busybox
            command:
            - /bin/sh
            - -c
            - date; echo Hello from the Kubernetes cluster
          restartPolicy: OnFailure
```
Backup CronJob:

```yaml
apiVersion: batch/v1
kind: CronJob
metadata:
  name: database-backup
spec:
  schedule: "0 2 * * *"
  successfulJobsHistoryLimit: 3
  failedJobsHistoryLimit: 1
  concurrencyPolicy: Forbid
  jobTemplate:
    spec:
      template:
        spec:
          containers:
          - name: backup
            image: postgres:14
            command:
            - /bin/sh
            - -c
            - pg_dump -h $DB_HOST -U $DB_USER $DB_NAME | gzip > /backup/db-$(date +%Y%m%d-%H%M%S).sql.gz
            env:
            - name: DB_HOST
              value: postgres-service
            - name: DB_USER
              valueFrom:
                secretKeyRef:
                  name: db-credentials
                  key: username
            - name: PGPASSWORD
              valueFrom:
                secretKeyRef:
                  name: db-credentials
                  key: password
            - name: DB_NAME
              value: mydb
            volumeMounts:
            - name: backup-storage
              mountPath: /backup
          restartPolicy: OnFailure
          volumes:
          - name: backup-storage
            persistentVolumeClaim:
              claimName: backup-pvc
```
Report Generation CronJob:

```yaml
apiVersion: batch/v1
kind: CronJob
metadata:
  name: daily-report
spec:
  schedule: "0 8 * * 1-5"
  timeZone: "America/New_York"
  startingDeadlineSeconds: 3600
  concurrencyPolicy: Replace
  jobTemplate:
    spec:
      template:
        spec:
          containers:
          - name: report-generator
            image: report-app:v1
            command:
            - python
            - generate_report.py
            - --format=pdf
            - --email-recipients=team@example.com
          restartPolicy: OnFailure
```
## Services and Networking

### ClusterIP Service

The default Service type, providing internal cluster communication.

```yaml
apiVersion: v1
kind: Service
metadata:
  name: backend-service
spec:
  type: ClusterIP
  selector:
    app: backend
  ports:
  - protocol: TCP
    port: 80
    targetPort: 8080
```
Service with Multiple Ports:

```yaml
apiVersion: v1
kind: Service
metadata:
  name: multi-port-service
spec:
  selector:
    app: myapp
  ports:
  - name: http
    protocol: TCP
    port: 80
    targetPort: 8080
  - name: https
    protocol: TCP
    port: 443
    targetPort: 8443
  - name: metrics
    protocol: TCP
    port: 9090
    targetPort: 9090
```
Headless Service:

```yaml
apiVersion: v1
kind: Service
metadata:
  name: stateful-service
spec:
  clusterIP: None
  selector:
    app: stateful-app
  ports:
  - port: 80
    targetPort: 8080
```
### NodePort Service

Exposes the Service on each node's IP at a static port.

```yaml
apiVersion: v1
kind: Service
metadata:
  name: nodeport-service
spec:
  type: NodePort
  selector:
    app: frontend
  ports:
  - protocol: TCP
    port: 80
    targetPort: 8080
    nodePort: 30080
```
### LoadBalancer Service

Creates an external load balancer in cloud environments.

```yaml
apiVersion: v1
kind: Service
metadata:
  name: loadbalancer-service
  annotations:
    service.beta.kubernetes.io/aws-load-balancer-type: "nlb"
spec:
  type: LoadBalancer
  selector:
    app: web
  ports:
  - protocol: TCP
    port: 80
    targetPort: 8080
  loadBalancerSourceRanges:
  - "10.0.0.0/8"
  - "172.16.0.0/12"
```
### ExternalName Service

Maps a Service to an external DNS name.

```yaml
apiVersion: v1
kind: Service
metadata:
  name: external-database
spec:
  type: ExternalName
  externalName: database.example.com
```
### Service with Session Affinity

```yaml
apiVersion: v1
kind: Service
metadata:
  name: sticky-service
spec:
  selector:
    app: myapp
  ports:
  - protocol: TCP
    port: 80
    targetPort: 8080
  sessionAffinity: ClientIP
  sessionAffinityConfig:
    clientIP:
      timeoutSeconds: 10800
```
## Ingress Controllers

Ingress manages external access to services, typically HTTP/HTTPS, providing load balancing, SSL termination, and name-based virtual hosting.
### Basic Ingress

```yaml
apiVersion: networking.k8s.io/v1
kind: Ingress
metadata:
  name: basic-ingress
spec:
  ingressClassName: nginx
  rules:
  - host: myapp.example.com
    http:
      paths:
      - path: /
        pathType: Prefix
        backend:
          service:
            name: myapp-service
            port:
              number: 80
```
### Ingress with TLS

```yaml
apiVersion: networking.k8s.io/v1
kind: Ingress
metadata:
  name: tls-ingress
spec:
  ingressClassName: nginx
  tls:
  - hosts:
    - myapp.example.com
    secretName: myapp-tls-secret
  rules:
  - host: myapp.example.com
    http:
      paths:
      - path: /
        pathType: Prefix
        backend:
          service:
            name: myapp-service
            port:
              number: 80
```
### Path-Based Routing

```yaml
apiVersion: networking.k8s.io/v1
kind: Ingress
metadata:
  name: path-based-ingress
  annotations:
    nginx.ingress.kubernetes.io/rewrite-target: /
spec:
  ingressClassName: nginx
  rules:
  - host: example.com
    http:
      paths:
      - path: /api
        pathType: Prefix
        backend:
          service:
            name: api-service
            port:
              number: 8080
      - path: /web
        pathType: Prefix
        backend:
          service:
            name: web-service
            port:
              number: 80
      - path: /admin
        pathType: Prefix
        backend:
          service:
            name: admin-service
            port:
              number: 3000
```
### Multi-Host Ingress

```yaml
apiVersion: networking.k8s.io/v1
kind: Ingress
metadata:
  name: multi-host-ingress
spec:
  ingressClassName: nginx
  tls:
  - hosts:
    - app1.example.com
    - app2.example.com
    secretName: multi-tls-secret
  rules:
  - host: app1.example.com
    http:
      paths:
      - path: /
        pathType: Prefix
        backend:
          service:
            name: app1-service
            port:
              number: 80
  - host: app2.example.com
    http:
      paths:
      - path: /
        pathType: Prefix
        backend:
          service:
            name: app2-service
            port:
              number: 80
```
### Ingress with Authentication

```yaml
apiVersion: networking.k8s.io/v1
kind: Ingress
metadata:
  name: auth-ingress
  annotations:
    nginx.ingress.kubernetes.io/auth-type: basic
    nginx.ingress.kubernetes.io/auth-secret: basic-auth
    nginx.ingress.kubernetes.io/auth-realm: 'Authentication Required'
spec:
  ingressClassName: nginx
  rules:
  - host: secure.example.com
    http:
      paths:
      - path: /
        pathType: Prefix
        backend:
          service:
            name: secure-service
            port:
              number: 80
```
### Ingress with Rate Limiting

```yaml
apiVersion: networking.k8s.io/v1
kind: Ingress
metadata:
  name: rate-limit-ingress
  annotations:
    nginx.ingress.kubernetes.io/limit-rps: "10"
    nginx.ingress.kubernetes.io/limit-connections: "5"
spec:
  ingressClassName: nginx
  rules:
  - host: api.example.com
    http:
      paths:
      - path: /
        pathType: Prefix
        backend:
          service:
            name: api-service
            port:
              number: 8080
```
## Configuration Management

### ConfigMaps

ConfigMaps store non-confidential data in key-value pairs.
ConfigMap from Literals:

```yaml
apiVersion: v1
kind: ConfigMap
metadata:
  name: app-config
data:
  database_host: "postgres.default.svc.cluster.local"
  database_port: "5432"
  log_level: "INFO"
  feature_flags: |
    feature1=enabled
    feature2=disabled
    feature3=enabled
```
ConfigMap with File Content:

```yaml
apiVersion: v1
kind: ConfigMap
metadata:
  name: nginx-config
data:
  nginx.conf: |
    events {
      worker_connections 1024;
    }
    http {
      server {
        listen 80;
        location / {
          root /usr/share/nginx/html;
          index index.html;
        }
      }
    }
```
Using ConfigMap as Environment Variables:

```yaml
apiVersion: v1
kind: Pod
metadata:
  name: config-env-pod
spec:
  containers:
  - name: app
    image: myapp:v1
    envFrom:
    - configMapRef:
        name: app-config
    env:
    - name: SPECIFIC_CONFIG
      valueFrom:
        configMapKeyRef:
          name: app-config
          key: log_level
```
Using ConfigMap as Volume:

```yaml
apiVersion: v1
kind: Pod
metadata:
  name: config-volume-pod
spec:
  containers:
  - name: nginx
    image: nginx:1.21
    volumeMounts:
    - name: config-volume
      mountPath: /etc/nginx/nginx.conf
      subPath: nginx.conf
  volumes:
  - name: config-volume
    configMap:
      name: nginx-config
```
### Secrets

Secrets store sensitive information such as passwords, tokens, and keys.
Opaque Secret:

```yaml
apiVersion: v1
kind: Secret
metadata:
  name: db-credentials
type: Opaque
data:
  username: YWRtaW4=            # base64-encoded "admin"
  password: cGFzc3dvcmQxMjM=    # base64-encoded "password123"
stringData:
  connection-string: "postgresql://admin:password123@postgres:5432/mydb"
```
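The same Secret can also be created imperatively, letting kubectl handle the base64 encoding (values here match the manifest above):

```shell
kubectl create secret generic db-credentials \
  --from-literal=username=admin \
  --from-literal=password=password123
```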
TLS Secret:

```yaml
apiVersion: v1
kind: Secret
metadata:
  name: tls-secret
type: kubernetes.io/tls
data:
  tls.crt: LS0tLS1CRUdJTi...    # base64-encoded certificate
  tls.key: LS0tLS1CRUdJTi...    # base64-encoded private key
```
Docker Registry Secret:

```yaml
apiVersion: v1
kind: Secret
metadata:
  name: registry-credentials
type: kubernetes.io/dockerconfigjson
data:
  .dockerconfigjson: eyJhdXRocyI6eyJodHRwczovL2luZGV4...
```
Using Secrets as Environment Variables:

```yaml
apiVersion: v1
kind: Pod
metadata:
  name: secret-env-pod
spec:
  containers:
  - name: app
    image: myapp:v1
    env:
    - name: DB_USERNAME
      valueFrom:
        secretKeyRef:
          name: db-credentials
          key: username
    - name: DB_PASSWORD
      valueFrom:
        secretKeyRef:
          name: db-credentials
          key: password
```
Using Secrets as Volume:

```yaml
apiVersion: v1
kind: Pod
metadata:
  name: secret-volume-pod
spec:
  containers:
  - name: app
    image: myapp:v1
    volumeMounts:
    - name: secret-volume
      mountPath: /etc/secrets
      readOnly: true
  volumes:
  - name: secret-volume
    secret:
      secretName: db-credentials
```
Pod with Service Account and Secrets:

```yaml
apiVersion: v1
kind: Pod
metadata:
  name: prod-db-client-pod
  labels:
    name: prod-db-client
spec:
  serviceAccountName: prod-db-client   # serviceAccount is the deprecated alias
  containers:
  - name: db-client-container
    image: postgres:14
    env:
    - name: DATABASE_URL
      valueFrom:
        secretKeyRef:
          name: db-credentials
          key: connection-string
```
## Storage

### PersistentVolumes (PV)

PersistentVolumes are cluster-level storage resources.

```yaml
apiVersion: v1
kind: PersistentVolume
metadata:
  name: pv-example
spec:
  capacity:
    storage: 10Gi
  accessModes:
  - ReadWriteOnce
  persistentVolumeReclaimPolicy: Retain
  storageClassName: standard
  hostPath:
    path: /mnt/data
```
NFS PersistentVolume:

```yaml
apiVersion: v1
kind: PersistentVolume
metadata:
  name: nfs-pv
spec:
  capacity:
    storage: 100Gi
  accessModes:
  - ReadWriteMany
  nfs:
    server: nfs-server.example.com
    path: /exports/data
  persistentVolumeReclaimPolicy: Retain
  storageClassName: nfs
```
### PersistentVolumeClaims (PVC)

PVCs request storage from PersistentVolumes.

```yaml
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: mysql-pvc
spec:
  accessModes:
  - ReadWriteOnce
  resources:
    requests:
      storage: 20Gi
  storageClassName: standard
```
PVC with Selector:

```yaml
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: selective-pvc
spec:
  accessModes:
  - ReadWriteOnce
  resources:
    requests:
      storage: 10Gi
  storageClassName: fast-ssd
  selector:
    matchLabels:
      environment: production
      tier: database
```
### StorageClass

StorageClasses define different classes of storage.

```yaml
apiVersion: storage.k8s.io/v1
kind: StorageClass
metadata:
  name: fast-ssd
provisioner: ebs.csi.aws.com   # gp3 with iops/throughput requires the EBS CSI driver
parameters:
  type: gp3
  iops: "3000"
  throughput: "125"
  encrypted: "true"
volumeBindingMode: WaitForFirstConsumer
allowVolumeExpansion: true
```
Azure StorageClass:

```yaml
apiVersion: storage.k8s.io/v1
kind: StorageClass
metadata:
  name: azure-premium
provisioner: kubernetes.io/azure-disk
parameters:
  storageaccounttype: Premium_LRS
  kind: Managed
reclaimPolicy: Delete
allowVolumeExpansion: true
```
Using PVC in Pod:

```yaml
apiVersion: v1
kind: Pod
metadata:
  name: pvc-pod
spec:
  containers:
  - name: app
    image: nginx:1.21
    volumeMounts:
    - name: data
      mountPath: /usr/share/nginx/html
  volumes:
  - name: data
    persistentVolumeClaim:
      claimName: mysql-pvc
```
## Namespaces and Resource Isolation

### Creating Namespaces

```yaml
apiVersion: v1
kind: Namespace
metadata:
  name: development
  labels:
    environment: dev
    team: engineering
```
Namespace with Annotations:

```yaml
apiVersion: v1
kind: Namespace
metadata:
  name: production
  labels:
    environment: prod
    compliance: required
  annotations:
    owner: "platform-team@example.com"
    cost-center: "12345"
```
### ResourceQuota

ResourceQuotas limit resource consumption in a namespace.

```yaml
apiVersion: v1
kind: ResourceQuota
metadata:
  name: compute-quota
  namespace: development
spec:
  hard:
    requests.cpu: "10"
    requests.memory: 20Gi
    limits.cpu: "20"
    limits.memory: 40Gi
    persistentvolumeclaims: "10"
    pods: "50"
```
Object Count Quota:

```yaml
apiVersion: v1
kind: ResourceQuota
metadata:
  name: object-quota
  namespace: development
spec:
  hard:
    configmaps: "10"
    secrets: "10"
    services: "10"
    services.loadbalancers: "2"
    services.nodeports: "5"
```
### LimitRange

LimitRanges set default resource limits and requests.

```yaml
apiVersion: v1
kind: LimitRange
metadata:
  name: resource-limits
  namespace: development
spec:
  limits:
  - max:
      cpu: "2"
      memory: 4Gi
    min:
      cpu: 100m
      memory: 128Mi
    default:
      cpu: 500m
      memory: 512Mi
    defaultRequest:
      cpu: 200m
      memory: 256Mi
    type: Container
  - max:
      storage: 10Gi
    min:
      storage: 1Gi
    type: PersistentVolumeClaim
```
## Security and RBAC

### ServiceAccounts

```yaml
apiVersion: v1
kind: ServiceAccount
metadata:
  name: app-service-account
  namespace: default
```
ServiceAccount with Image Pull Secrets:

```yaml
apiVersion: v1
kind: ServiceAccount
metadata:
  name: build-robot
  namespace: default
imagePullSecrets:
- name: registry-credentials
```
### Roles and RoleBindings

Role (Namespace-scoped):

```yaml
apiVersion: rbac.authorization.k8s.io/v1
kind: Role
metadata:
  namespace: development
  name: pod-reader
rules:
- apiGroups: [""]
  resources: ["pods"]
  verbs: ["get", "watch", "list"]
- apiGroups: [""]
  resources: ["pods/log"]
  verbs: ["get"]
```
RoleBinding:

```yaml
apiVersion: rbac.authorization.k8s.io/v1
kind: RoleBinding
metadata:
  name: read-pods
  namespace: development
subjects:
- kind: User
  name: jane
  apiGroup: rbac.authorization.k8s.io
- kind: ServiceAccount
  name: app-service-account
  namespace: development
roleRef:
  kind: Role
  name: pod-reader
  apiGroup: rbac.authorization.k8s.io
```
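Whether a binding grants the intended access can be checked with `kubectl auth can-i`, impersonating the subjects defined above (results depend on what is actually applied in the cluster):

```shell
kubectl auth can-i list pods --as jane -n development     # expected: yes, via pod-reader
kubectl auth can-i delete pods --as jane -n development   # expected: no, pod-reader is read-only
kubectl auth can-i get pods -n development \
  --as system:serviceaccount:development:app-service-account
```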
### ClusterRole and ClusterRoleBinding

ClusterRole (Cluster-scoped):

```yaml
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRole
metadata:
  name: cluster-admin-role
rules:
- apiGroups: [""]
  resources: ["nodes"]
  verbs: ["get", "list", "watch"]
- apiGroups: ["apps"]
  resources: ["deployments", "statefulsets", "daemonsets"]
  verbs: ["get", "list", "watch", "create", "update", "patch", "delete"]
- apiGroups: [""]
  resources: ["namespaces"]
  verbs: ["get", "list"]
```
ClusterRoleBinding:

```yaml
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRoleBinding
metadata:
  name: cluster-admin-binding
subjects:
- kind: User
  name: admin-user
  apiGroup: rbac.authorization.k8s.io
- kind: Group
  name: system:masters
  apiGroup: rbac.authorization.k8s.io
roleRef:
  kind: ClusterRole
  name: cluster-admin-role
  apiGroup: rbac.authorization.k8s.io
```
Developer Role:

```yaml
apiVersion: rbac.authorization.k8s.io/v1
kind: Role
metadata:
  namespace: development
  name: developer
rules:
- apiGroups: ["", "apps", "batch"]
  resources: ["pods", "deployments", "services", "configmaps", "secrets", "jobs", "cronjobs"]
  verbs: ["get", "list", "watch", "create", "update", "patch", "delete"]
- apiGroups: [""]
  resources: ["pods/log", "pods/exec"]
  verbs: ["get", "create"]
```
### NetworkPolicy

Deny All Ingress:

```yaml
apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
  name: deny-all-ingress
  namespace: production
spec:
  podSelector: {}
  policyTypes:
  - Ingress
```
Allow Specific Ingress:

```yaml
apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
  name: allow-frontend
  namespace: production
spec:
  podSelector:
    matchLabels:
      app: backend
  policyTypes:
  - Ingress
  ingress:
  - from:
    - podSelector:
        matchLabels:
          app: frontend
    ports:
    - protocol: TCP
      port: 8080
```
Allow from Specific Namespace:

```yaml
apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
  name: allow-from-namespace
spec:
  podSelector:
    matchLabels:
      app: myapp
  policyTypes:
  - Ingress
  ingress:
  - from:
    - namespaceSelector:
        matchLabels:
          environment: production
      podSelector:
        matchLabels:
          role: client
```
Egress Network Policy:

```yaml
apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
  name: allow-dns-egress
spec:
  podSelector:
    matchLabels:
      app: myapp
  policyTypes:
  - Egress
  egress:
  - to:
    - namespaceSelector:
        matchLabels:
          name: kube-system
    ports:
    - protocol: UDP
      port: 53
  - to:
    - podSelector:
        matchLabels:
          app: database
    ports:
    - protocol: TCP
      port: 5432
```
### PodSecurityPolicy

PodSecurityPolicy was deprecated in Kubernetes 1.21 and removed in 1.25; on current clusters, use Pod Security Admission (or a policy engine such as Kyverno or Gatekeeper) instead. The example below applies only to clusters older than 1.25.

```yaml
apiVersion: policy/v1beta1
kind: PodSecurityPolicy
metadata:
  name: restricted
spec:
  privileged: false
  allowPrivilegeEscalation: false
  requiredDropCapabilities:
  - ALL
  volumes:
  - 'configMap'
  - 'emptyDir'
  - 'projected'
  - 'secret'
  - 'downwardAPI'
  - 'persistentVolumeClaim'
  hostNetwork: false
  hostIPC: false
  hostPID: false
  runAsUser:
    rule: MustRunAsNonRoot
  seLinux:
    rule: RunAsAny
  fsGroup:
    rule: RunAsAny
  readOnlyRootFilesystem: false
```
## Autoscaling

### Horizontal Pod Autoscaler (HPA)

HPA automatically scales the number of Pods based on observed metrics.
CPU-based HPA:

```yaml
apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
  name: webapp-hpa
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: webapp
  minReplicas: 2
  maxReplicas: 10
  metrics:
  - type: Resource
    resource:
      name: cpu
      target:
        type: Utilization
        averageUtilization: 70
```
Memory-based HPA:

```yaml
apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
  name: memory-hpa
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: memory-intensive-app
  minReplicas: 3
  maxReplicas: 15
  metrics:
  - type: Resource
    resource:
      name: memory
      target:
        type: Utilization
        averageUtilization: 80
```
Multi-Metric HPA:

```yaml
apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
  name: multi-metric-hpa
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: api-server
  minReplicas: 2
  maxReplicas: 20
  metrics:
  - type: Resource
    resource:
      name: cpu
      target:
        type: Utilization
        averageUtilization: 60
  - type: Resource
    resource:
      name: memory
      target:
        type: Utilization
        averageUtilization: 70
  - type: Pods
    pods:
      metric:
        name: http_requests_per_second
      target:
        type: AverageValue
        averageValue: "1000"
  behavior:
    scaleDown:
      stabilizationWindowSeconds: 300
      policies:
      - type: Percent
        value: 50
        periodSeconds: 60
    scaleUp:
      stabilizationWindowSeconds: 0
      policies:
      - type: Percent
        value: 100
        periodSeconds: 30
      - type: Pods
        value: 4
        periodSeconds: 30
      selectPolicy: Max
```
### Vertical Pod Autoscaler (VPA)

VPA automatically adjusts CPU and memory requests/limits. It is a separate add-on and must be installed in the cluster before the resource below can be applied.

```yaml
apiVersion: autoscaling.k8s.io/v1
kind: VerticalPodAutoscaler
metadata:
  name: app-vpa
spec:
  targetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: myapp
  updatePolicy:
    updateMode: "Auto"
  resourcePolicy:
    containerPolicies:
    - containerName: app
      minAllowed:
        cpu: 100m
        memory: 128Mi
      maxAllowed:
        cpu: 2
        memory: 4Gi
      controlledResources: ["cpu", "memory"]
```
### Cluster Autoscaler

Cluster Autoscaler adjusts the number of nodes in the cluster.

```yaml
apiVersion: v1
kind: ConfigMap
metadata:
  name: cluster-autoscaler-priority-expander
  namespace: kube-system
data:
  priorities: |
    10:
      - .*-spot-.*
    50:
      - .*-ondemand-.*
```
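Cluster Autoscaler itself is deployed per cloud provider rather than via a single generic manifest; a sketch of commonly used container arguments from the upstream project (the node-group value is a placeholder):

```yaml
# excerpt: container args for a cluster-autoscaler Deployment
args:
- --cloud-provider=aws
- --nodes=2:10:my-node-group       # min:max:node-group-name (placeholder)
- --expander=priority              # consults the priority ConfigMap above
- --scale-down-unneeded-time=10m
- --balance-similar-node-groups
```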
## Monitoring and Observability

### Metrics Server

Metrics Server provides resource usage metrics.
```yaml
apiVersion: v1
kind: ServiceAccount
metadata:
  name: metrics-server
  namespace: kube-system
---
apiVersion: apps/v1
kind: Deployment
metadata:
  name: metrics-server
  namespace: kube-system
spec:
  selector:
    matchLabels:
      k8s-app: metrics-server
  template:
    metadata:
      labels:
        k8s-app: metrics-server
    spec:
      serviceAccountName: metrics-server
      containers:
      - name: metrics-server
        image: k8s.gcr.io/metrics-server/metrics-server:v0.6.1
        args:
        - --cert-dir=/tmp
        - --secure-port=4443
        - --kubelet-preferred-address-types=InternalIP,ExternalIP,Hostname
        - --kubelet-use-node-status-port
        ports:
        - name: https
          containerPort: 4443
          protocol: TCP
```
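With Metrics Server running, resource usage becomes available through kubectl:

```shell
kubectl top nodes                  # CPU/memory per node
kubectl top pods -n kube-system    # CPU/memory per pod
```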
### Prometheus

Prometheus Deployment:

```yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: prometheus
  namespace: monitoring
spec:
  replicas: 1
  selector:
    matchLabels:
      app: prometheus
  template:
    metadata:
      labels:
        app: prometheus
    spec:
      serviceAccountName: prometheus
      containers:
      - name: prometheus
        image: prom/prometheus:v2.40.0
        args:
        - '--config.file=/etc/prometheus/prometheus.yml'
        - '--storage.tsdb.path=/prometheus'
        - '--storage.tsdb.retention.time=15d'
        ports:
        - containerPort: 9090
        volumeMounts:
        - name: config-volume
          mountPath: /etc/prometheus
        - name: storage-volume
          mountPath: /prometheus
      volumes:
      - name: config-volume
        configMap:
          name: prometheus-config
      - name: storage-volume
        persistentVolumeClaim:
          claimName: prometheus-pvc
```
ServiceMonitor for Prometheus Operator:

```yaml
apiVersion: monitoring.coreos.com/v1
kind: ServiceMonitor
metadata:
  name: app-metrics
  namespace: monitoring
spec:
  selector:
    matchLabels:
      app: myapp
  endpoints:
  - port: metrics
    interval: 30s
    path: /metrics
```
### Grafana

```yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: grafana
  namespace: monitoring
spec:
  replicas: 1
  selector:
    matchLabels:
      app: grafana
  template:
    metadata:
      labels:
        app: grafana
    spec:
      containers:
      - name: grafana
        image: grafana/grafana:9.3.0
        ports:
        - containerPort: 3000
        env:
        - name: GF_SECURITY_ADMIN_PASSWORD
          valueFrom:
            secretKeyRef:
              name: grafana-credentials
              key: admin-password
        volumeMounts:
        - name: grafana-storage
          mountPath: /var/lib/grafana
      volumes:
      - name: grafana-storage
        persistentVolumeClaim:
          claimName: grafana-pvc
```
## Logging

### Fluentd DaemonSet

```yaml
apiVersion: apps/v1
kind: DaemonSet
metadata:
  name: fluentd
  namespace: kube-system
spec:
  selector:
    matchLabels:
      k8s-app: fluentd-logging
  template:
    metadata:
      labels:
        k8s-app: fluentd-logging
    spec:
      serviceAccountName: fluentd
      containers:
      - name: fluentd
        image: fluent/fluentd-kubernetes-daemonset:v1.15-debian-elasticsearch7-1
        env:
        - name: FLUENT_ELASTICSEARCH_HOST
          value: "elasticsearch.logging.svc.cluster.local"
        - name: FLUENT_ELASTICSEARCH_PORT
          value: "9200"
        volumeMounts:
        - name: varlog
          mountPath: /var/log
        - name: varlibdockercontainers
          mountPath: /var/lib/docker/containers
          readOnly: true
        - name: config-volume
          mountPath: /fluentd/etc
      volumes:
      - name: varlog
        hostPath:
          path: /var/log
      - name: varlibdockercontainers
        hostPath:
          path: /var/lib/docker/containers
      - name: config-volume
        configMap:
          name: fluentd-config
```
Elasticsearch
```yaml
apiVersion: apps/v1
kind: StatefulSet
metadata:
  name: elasticsearch
  namespace: logging
spec:
  serviceName: elasticsearch
  replicas: 3
  selector:
    matchLabels:
      app: elasticsearch
  template:
    metadata:
      labels:
        app: elasticsearch
    spec:
      containers:
      - name: elasticsearch
        image: docker.elastic.co/elasticsearch/elasticsearch:8.5.0
        env:
        - name: cluster.name
          value: "k8s-logs"
        - name: node.name
          valueFrom:
            fieldRef:
              fieldPath: metadata.name
        - name: discovery.seed_hosts
          value: "elasticsearch-0.elasticsearch,elasticsearch-1.elasticsearch,elasticsearch-2.elasticsearch"
        - name: cluster.initial_master_nodes
          value: "elasticsearch-0,elasticsearch-1,elasticsearch-2"
        - name: ES_JAVA_OPTS
          value: "-Xms512m -Xmx512m"
        ports:
        - containerPort: 9200
          name: rest
        - containerPort: 9300
          name: inter-node
        volumeMounts:
        - name: data
          mountPath: /usr/share/elasticsearch/data
  volumeClaimTemplates:
  - metadata:
      name: data
    spec:
      accessModes: [ "ReadWriteOnce" ]
      resources:
        requests:
          storage: 50Gi
```
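A StatefulSet's `spec.serviceName` must refer to a headless Service, which provides the stable per-pod DNS names (elasticsearch-0.elasticsearch, and so on) used in `discovery.seed_hosts`. A minimal sketch of that Service, assuming the same `logging` namespace and ports as the StatefulSet:

```yaml
# Headless Service (clusterIP: None) backing the StatefulSet's serviceName.
# DNS resolves to individual pod IPs instead of a single virtual IP, giving
# each pod a stable name like elasticsearch-0.elasticsearch.logging.svc.
apiVersion: v1
kind: Service
metadata:
  name: elasticsearch
  namespace: logging
spec:
  clusterIP: None
  selector:
    app: elasticsearch
  ports:
  - port: 9200
    name: rest
  - port: 9300
    name: inter-node
```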
Production Operations
Health Checks and Probes
Kubernetes provides three types of probes:
- Liveness Probe: Determines if a container is still healthy; on failure, the kubelet restarts the container
- Readiness Probe: Determines if a container is ready to serve traffic; on failure, the Pod is removed from Service endpoints
- Startup Probe: Determines if the application has finished starting; the other probes are held off until it succeeds
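All three probe types can be combined on a single container. A minimal sketch, assuming a hypothetical application that exposes `/healthz` and `/ready` HTTP endpoints on port 8080 (the image name and thresholds are illustrative):

```yaml
apiVersion: v1
kind: Pod
metadata:
  name: probed-app          # hypothetical example pod
spec:
  containers:
  - name: app
    image: myapp:1.0        # placeholder image
    ports:
    - containerPort: 8080
    startupProbe:           # gates the other probes until the app has started
      httpGet:
        path: /healthz
        port: 8080
      failureThreshold: 30  # allow up to 30 x 10s = 300s for slow startup
      periodSeconds: 10
    livenessProbe:          # failure causes the kubelet to restart the container
      httpGet:
        path: /healthz
        port: 8080
      periodSeconds: 10
    readinessProbe:         # failure removes the Pod from Service endpoints
      httpGet:
        path: /ready
        port: 8080
      periodSeconds: 5
```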
Rolling Updates
```bash
# Update the container image (triggers a rolling update)
kubectl set image deployment/myapp myapp=myapp:v2

# Watch the rollout progress
kubectl rollout status deployment/myapp

# Inspect revision history
kubectl rollout history deployment/myapp

# Roll back to the previous revision
kubectl rollout undo deployment/myapp
```
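How aggressively a rolling update proceeds is controlled by the Deployment's update strategy. A sketch of the relevant fields for a `myapp` Deployment; the replica count and surge values shown are illustrative, not taken from the original text:

```yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: myapp
spec:
  replicas: 4
  strategy:
    type: RollingUpdate
    rollingUpdate:
      maxSurge: 1         # at most 1 Pod above the desired count during the update
      maxUnavailable: 1   # at most 1 Pod below the desired count at any time
  selector:
    matchLabels:
      app: myapp
  template:
    metadata:
      labels:
        app: myapp
    spec:
      containers:
      - name: myapp
        image: myapp:v2
```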
Pod Disruption Budgets
```yaml
apiVersion: policy/v1
kind: PodDisruptionBudget
metadata:
  name: app-pdb
spec:
  minAvailable: 2
  selector:
    matchLabels:
      app: myapp
```
PDB with Max Unavailable:
```yaml
apiVersion: policy/v1
kind: PodDisruptionBudget
metadata:
  name: database-pdb
spec:
  maxUnavailable: 1
  selector:
    matchLabels:
      app: database
```
Taints and Tolerations
Node Taint:
```bash
kubectl taint nodes node1 key=value:NoSchedule
```
Pod Toleration:
```yaml
apiVersion: v1
kind: Pod
metadata:
  name: toleration-pod
spec:
  tolerations:
  - key: "key"
    operator: "Equal"
    value: "value"
    effect: "NoSchedule"
  containers:
  - name: app
    image: nginx:1.21
```
Node Affinity
```yaml
apiVersion: v1
kind: Pod
metadata:
  name: affinity-pod
spec:
  affinity:
    nodeAffinity:
      requiredDuringSchedulingIgnoredDuringExecution:
        nodeSelectorTerms:
        - matchExpressions:
          - key: disktype
            operator: In
            values:
            - ssd
      preferredDuringSchedulingIgnoredDuringExecution:
      - weight: 1
        preference:
          matchExpressions:
          - key: zone
            operator: In
            values:
            - us-east-1a
  containers:
  - name: app
    image: nginx:1.21
```
Pod Anti-Affinity
```yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: web-server
spec:
  replicas: 3
  selector:
    matchLabels:
      app: web
  template:
    metadata:
      labels:
        app: web
    spec:
      affinity:
        podAntiAffinity:
          requiredDuringSchedulingIgnoredDuringExecution:
          - labelSelector:
              matchExpressions:
              - key: app
                operator: In
                values:
                - web
            topologyKey: "kubernetes.io/hostname"
      containers:
      - name: web
        image: nginx:1.21
```
Troubleshooting
Common kubectl Commands
Get resources
```bash
kubectl get pods
kubectl get deployments
kubectl get services
kubectl get nodes
```
Describe resources
```bash
kubectl describe pod <pod-name>
kubectl describe node <node-name>
```
View logs
```bash
kubectl logs <pod-name>
kubectl logs <pod-name> -c <container-name>
kubectl logs -f <pod-name>   # Follow logs
```
Execute commands in pod
```bash
kubectl exec -it <pod-name> -- /bin/bash
kubectl exec <pod-name> -- ls /app
```
Port forwarding
```bash
kubectl port-forward pod/<pod-name> 8080:80
kubectl port-forward service/<service-name> 8080:80
```
Resource usage
```bash
kubectl top nodes
kubectl top pods
```
Events
```bash
kubectl get events --sort-by='.lastTimestamp'
kubectl get events --field-selector involvedObject.name=<pod-name>
```
Debug
```bash
kubectl debug node/<node-name> -it --image=ubuntu
kubectl run debug-pod --rm -i --tty --image=busybox -- /bin/sh
```
Common Issues and Solutions
Pod Stuck in Pending State:
- Check node resources: kubectl describe node
- Check PVC binding: kubectl describe pvc
- Check pod events: kubectl describe pod <pod-name>

CrashLoopBackOff:
- Check logs: kubectl logs <pod-name> --previous
- Check resource limits
- Verify liveness and readiness probes

ImagePullBackOff:
- Verify image name and tag
- Check registry credentials
- Verify network connectivity

Service Not Accessible:
- Verify service selector matches pod labels
- Check endpoints: kubectl get endpoints <service-name>
- Test DNS: kubectl run -it --rm debug --image=busybox --restart=Never -- nslookup <service-name>
This comprehensive guide covers the essential aspects of Kubernetes orchestration. For production deployments, always follow security best practices, implement proper monitoring and logging, and regularly update your cluster and applications.