Apache Kafka
Event streaming for Kubernetes. Strimzi operator, KRaft mode, no ZooKeeper.
Quick Start (Tested)
make install    # Deploy Strimzi + Kafka
make test       # Verify everything works
make status     # Show resources
make uninstall  # Clean up
Requirements: Kubernetes cluster, Helm 3+
Versions: Strimzi 0.49+, Kafka 4.1.1
Resource Detection & Adaptation
Before generating manifests, detect the target environment:
# Detect machine memory (macOS via sysctl, otherwise Linux via /proc/meminfo).
# The sysctl result is checked before piping so the Linux fallback actually runs when sysctl fails.
{ mem=$(sysctl -n hw.memsize 2>/dev/null) && echo "$mem" | awk '{print $0/1024/1024/1024 " GB"}'; } ||
  grep MemTotal /proc/meminfo | awk '{print $2/1024/1024 " GB"}'
# Detect Docker Desktop allocation
docker info --format '{{.MemTotal}}' 2>/dev/null | awk '{print $0/1024/1024/1024 " GB"}'
# Detect Kubernetes node capacity
kubectl get nodes -o jsonpath='{.items[0].status.capacity.memory}' 2>/dev/null
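The kubectl command above prints a Kubernetes quantity string such as 16374164Ki rather than a plain number of gigabytes, so it has to be normalized before it can be compared with the thresholds below. The following is a minimal sketch of that conversion; the function name memory_to_gib is hypothetical, and only the common binary and decimal suffixes are handled.

```python
import re

# Multipliers for the common Kubernetes memory suffixes (assumption: rarer
# suffixes such as Pi, Ei, or the milli form are out of scope for this sketch).
_BINARY = {"Ki": 2**10, "Mi": 2**20, "Gi": 2**30, "Ti": 2**40}
_DECIMAL = {"k": 10**3, "M": 10**6, "G": 10**9, "T": 10**12}


def memory_to_gib(quantity: str) -> float:
    """Convert a quantity such as '16374164Ki' or '8Gi' to GiB."""
    match = re.fullmatch(r"(\d+(?:\.\d+)?)([A-Za-z]*)", quantity.strip())
    if match is None:
        raise ValueError(f"unrecognized memory quantity: {quantity!r}")
    number, suffix = match.groups()
    if suffix == "":
        factor = 1  # A bare number is a byte count
    elif suffix in _BINARY:
        factor = _BINARY[suffix]
    elif suffix in _DECIMAL:
        factor = _DECIMAL[suffix]
    else:
        raise ValueError(f"unsupported suffix: {suffix!r}")
    return float(number) * factor / 2**30
```

For example, memory_to_gib("16777216Ki") evaluates to 16.0.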
Adapt resource configuration based on detection:
| Detected RAM | Profile | Kafka Memory | Action |
| --- | --- | --- | --- |
| < 12GB | Minimal | 512Mi-1Gi | Warn user about constraints |
| 12-24GB | Standard | 1Gi-2Gi | Default configuration |
| > 24GB | Production | 4Gi-8Gi | Enable full features |
Adaptive Resource Templates
Minimal (detected < 12GB):
resources:
  requests:
    memory: 512Mi
    cpu: 200m
  limits:
    memory: 1Gi
    cpu: 500m
⚠️ Agent should warn: "Limited resources detected. Kafka may be unstable under load."
Standard (detected 12-24GB):
resources:
  requests:
    memory: 1Gi
    cpu: 250m
  limits:
    memory: 2Gi
    cpu: 1000m
Production (detected > 24GB or real cluster):
resources:
  requests:
    memory: 4Gi
    cpu: 1000m
  limits:
    memory: 8Gi
    cpu: 4000m
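The mapping from detected RAM to one of the three templates can be sketched as a small lookup. This is an illustration only: the names select_profile, PROFILES, and MINIMAL_WARNING are hypothetical, and treating exactly 12GB and exactly 24GB as Standard is an assumption, since the ranges above do not say which side the boundaries fall on.

```python
# Resource values copied from the three templates above.
PROFILES = {
    "minimal": {
        "requests": {"memory": "512Mi", "cpu": "200m"},
        "limits": {"memory": "1Gi", "cpu": "500m"},
    },
    "standard": {
        "requests": {"memory": "1Gi", "cpu": "250m"},
        "limits": {"memory": "2Gi", "cpu": "1000m"},
    },
    "production": {
        "requests": {"memory": "4Gi", "cpu": "1000m"},
        "limits": {"memory": "8Gi", "cpu": "4000m"},
    },
}

MINIMAL_WARNING = "Limited resources detected. Kafka may be unstable under load."


def select_profile(detected_gb: float, real_cluster: bool = False):
    """Return (profile name, resources, warning or None) for the detected RAM."""
    if real_cluster or detected_gb > 24:
        name = "production"
    elif detected_gb >= 12:  # Assumption: both boundaries count as Standard
        name = "standard"
    else:
        name = "minimal"
    warning = MINIMAL_WARNING if name == "minimal" else None
    return name, PROFILES[name], warning
```

A caller would render the returned resources into the manifest and show the warning, if any, to the user before deploying.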
Agent Behavior
- Always detect before generating manifests
- Adapt resource configs to detected environment
- Warn if resources are insufficient for requested workload
- Suggest Docker Desktop settings if running locally
What This Skill Does
| Task | How |
| --- | --- |
| Analyze coupling | Identify temporal, availability, behavioral issues |
| Explain eventual consistency | Consistency windows, read-your-writes patterns |
| Design events | Domain events, CloudEvents, Avro schemas |
| Deploy Kafka | Helm (Strimzi) + kubectl (manifests) |
| Create topics | KafkaTopic CRD |
| Build producers | confluent-kafka-python templates |
| Build consumers | AIOConsumer for FastAPI |
| Debug issues | Runbooks in references/ |
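As an illustration of the "Design events" row, a domain event can be wrapped in a CloudEvents 1.0 envelope before it is produced. The sketch below is not taken from this skill's templates: the helper name make_cloudevent and the example type and source values are hypothetical. The attributes specversion, id, source, and type are the ones the CloudEvents specification requires; time and datacontenttype are optional.

```python
import json
import uuid
from datetime import datetime, timezone


def make_cloudevent(event_type: str, source: str, data: dict) -> dict:
    """Wrap a domain event payload in a CloudEvents 1.0 JSON envelope."""
    return {
        "specversion": "1.0",                            # Required
        "id": str(uuid.uuid4()),                         # Required, unique per event
        "source": source,                                # Required, identifies the emitter
        "type": event_type,                              # Required, names the domain event
        "time": datetime.now(timezone.utc).isoformat(),  # Optional, RFC 3339 timestamp
        "datacontenttype": "application/json",           # Optional
        "data": data,
    }


# Hypothetical example values; serialize to bytes for use as a Kafka message value.
event = make_cloudevent("com.example.order.created", "/orders", {"order_id": "o-1"})
payload = json.dumps(event).encode("utf-8")
```

The unique id also gives consumers something to deduplicate on when a message is delivered more than once.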
What This Skill Does NOT Do
- Deploy ZooKeeper (KRaft only)
- Manage Kafka Streams applications
- Configure multi-datacenter replication
Deployment
Install Strimzi Operator
helm repo add strimzi https://strimzi.io/charts
helm install strimzi-operator strimzi/strimzi-kafka-operator -n kafka --create-namespace --wait
Deploy Kafka Cluster
kubectl apply -f manifests/kafka-cluster.yaml -n kafka
kubectl wait kafka/dev-cluster --for=condition=Ready --timeout=300s -n kafka
Create Topic
kubectl apply -f manifests/kafka-topic.yaml -n kafka
Verify
kubectl get kafka,kafkatopic,pods -n kafka
Core Concepts
Topic          = Named stream (like a database table)
Partition      = Ordered log within topic (parallelism unit)
Consumer Group = Consumers sharing work (partition → one consumer)
Offset         = Consumer position (commit to track progress)
Broker         = Kafka server
Controller     = Metadata manager (KRaft replaces ZooKeeper)
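The "partition → one consumer" rule is easiest to see with a toy assignment. The sketch below spreads partitions over group members round-robin. It is a simplification for illustration, not the assignor Kafka actually runs, and the function name assign_partitions is hypothetical.

```python
def assign_partitions(partitions: int, consumers: list[str]) -> dict[str, list[int]]:
    """Give every partition to exactly one consumer in the group, round-robin."""
    assignment: dict[str, list[int]] = {consumer: [] for consumer in consumers}
    for partition in range(partitions):
        owner = consumers[partition % len(consumers)]
        assignment[owner].append(partition)
    return assignment


# Six partitions shared by two consumers: each consumer reads three partitions.
print(assign_partitions(6, ["c1", "c2"]))
# Two partitions shared by three consumers: the third consumer gets nothing.
print(assign_partitions(2, ["a", "b", "c"]))
```

The second call shows why the partition count caps parallelism: consumers beyond the number of partitions sit idle.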
Local Development
Connect from your host machine (no port-forward needed):
# From your local machine (outside Kubernetes)
producer = Producer({'bootstrap.servers': 'localhost:30092'})
Connect from inside Kubernetes (pod-to-pod):
# From another pod in the cluster
producer = Producer({'bootstrap.servers': 'dev-cluster-kafka-bootstrap.kafka:9092'})
| Location | Bootstrap Server |
| --- | --- |
| Local machine | localhost:30092 |
| Same namespace | dev-cluster-kafka-bootstrap:9092 |
| Different namespace | dev-cluster-kafka-bootstrap.kafka.svc.cluster.local:9092 |
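One way to avoid hard-coding the address is to choose it at startup. The sketch below assumes the process is inside a pod whenever the KUBERNETES_SERVICE_HOST environment variable is present, which Kubernetes sets in containers. The helper name bootstrap_servers is hypothetical, and its defaults mirror the dev-cluster name and kafka namespace used above.

```python
import os


def bootstrap_servers(environ=None, namespace="kafka", cluster="dev-cluster"):
    """Pick a bootstrap address depending on where the process is running."""
    env = os.environ if environ is None else environ
    if "KUBERNETES_SERVICE_HOST" in env:
        # Inside a pod: the fully qualified service name resolves from any namespace.
        return f"{cluster}-kafka-bootstrap.{namespace}.svc.cluster.local:9092"
    # On the host machine: use the port exposed on localhost.
    return "localhost:30092"
```

The result can be passed directly as the 'bootstrap.servers' value of a Producer or Consumer configuration.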
Producer/Consumer (Python)
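from confluent_kafka import Producer, Consumer
# Producer (production config)
producer = Producer({
    'bootstrap.servers': 'localhost:30092',  # Or K8s service for pods
    'acks': 'all',                           # Wait for all in-sync replicas
    'enable.idempotence': True,              # Retries do not create duplicates
})
producer.produce('my-topic', key='key', value='message')
producer.flush()  # Block until queued messages are delivered
# Consumer
consumer = Consumer({
    'bootstrap.servers': 'localhost:30092',
    'group.id': 'my-group',
    'auto.offset.reset': 'earliest',
    'enable.auto.commit': False,
})
consumer.subscribe(['my-topic'])
msg = consumer.poll(1.0)  # Returns None if nothing arrives within 1 second
if msg is not None and msg.error() is None:
    print(msg.value())
    consumer.commit(message=msg)  # Manual commit, since auto commit is disabled
consumer.close()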
See assets/templates/producer-consumer.py for async FastAPI integration.
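Because enable.auto.commit is False, offsets only advance when the application commits them. The sketch below shows one possible at-least-once loop that commits after each message has been handled. It is not the template shipped in assets/templates. The function name consume_batch and its parameters are hypothetical; it accepts any object exposing the poll and commit methods of a confluent_kafka Consumer, and raising on every consumer error is a deliberate simplification.

```python
def consume_batch(consumer, handle, max_messages=10, timeout=1.0, max_empty_polls=3):
    """Process up to max_messages, committing each offset only after handling succeeds."""
    processed = 0
    empty_polls = 0
    while processed < max_messages and empty_polls < max_empty_polls:
        msg = consumer.poll(timeout)
        if msg is None:
            empty_polls += 1  # Nothing arrived within the timeout
            continue
        empty_polls = 0
        if msg.error():
            raise RuntimeError(f"consumer error: {msg.error()}")
        handle(msg.key(), msg.value())
        # Commit synchronously after processing: a crash before this line means
        # the message is redelivered rather than lost.
        consumer.commit(message=msg, asynchronous=False)
        processed += 1
    return processed
```

With the consumer configured above, a call might look like consume_batch(consumer, lambda key, value: print(key, value)).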
Debugging
# Check consumer lag
kubectl exec -n kafka dev-cluster-dual-role-0 -- \
  bin/kafka-consumer-groups.sh --bootstrap-server localhost:9092 \
  --describe --group <group-name>
# List topics
kubectl exec -n kafka dev-cluster-dual-role-0 -- \
  bin/kafka-topics.sh --bootstrap-server localhost:9092 --list
# Describe topic
kubectl exec -n kafka dev-cluster-dual-role-0 -- \
  bin/kafka-topics.sh --bootstrap-server localhost:9092 \
  --describe --topic <topic-name>
See references/debugging-runbooks.md for detailed troubleshooting.
Delivery Semantics
| Guarantee | Config | Use When |
| --- | --- | --- |
| At-most-once | acks=0 | Metrics, logs (may lose) |
| At-least-once | acks=all + manual commit | Most cases (may duplicate) |
| Exactly-once | Transactions | Financial (higher latency) |
Default: At-least-once with idempotent consumers.
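At-least-once delivery means the same event can arrive twice, so the handler has to make a repeat harmless. A minimal sketch of one way to do that is to remember processed event IDs and skip repeats. The class name IdempotentHandler is hypothetical, and the in-memory set is an assumption made to keep the example short; a real service would need a durable store that survives restarts.

```python
class IdempotentHandler:
    """Apply each event at most once, keyed by a unique event ID."""

    def __init__(self, apply):
        self._apply = apply
        self._seen = set()  # Assumption: in-memory only, lost on restart

    def handle(self, event_id: str, payload) -> bool:
        """Return True if the event was applied, False if it was a duplicate."""
        if event_id in self._seen:
            return False
        self._apply(payload)
        self._seen.add(event_id)  # Recorded only after the side effect succeeded
        return True


# A redelivered event changes the balance only once.
balance = []
handler = IdempotentHandler(balance.append)
handler.handle("evt-1", 100)
handler.handle("evt-1", 100)  # Duplicate delivery, ignored
handler.handle("evt-2", 50)
```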
File Structure
kafka/
├── Makefile                      # Tested deployment commands
├── manifests/
│   ├── kafka-cluster.yaml        # KRaft cluster (tested)
│   └── kafka-topic.yaml          # Topic CRD
├── assets/templates/
│   └── producer-consumer.py      # Python async templates
└── references/                   # Deep knowledge
    ├── core-concepts.md
    ├── producers.md
    ├── consumers.md
    ├── debugging-runbooks.md
    ├── gotchas.md
    └── ... (15 files)
Architecture Analysis
When analyzing synchronous architectures for coupling:
Scenario: Service A calls B, C, D directly (500ms each)
Temporal Coupling?
└── Does caller wait for all responses? → YES = coupled
Availability Coupling?
└── If B is down, does A fail? → YES = coupled
Behavioral Coupling?
└── Does A import B, C, D clients? → YES = coupled
Solution: Publish domain event, services consume independently.
See references/architecture-patterns.md for detailed analysis templates.
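The cost of the synchronous scenario can be put in numbers. The sketch below adds up the three 500ms calls, assuming A makes them one after another, and multiplies per-service availabilities to estimate how often A succeeds. The 99.9% availability figure is an assumption chosen purely for illustration and does not come from this document; the function names are hypothetical.

```python
from math import prod


def sequential_latency_ms(call_latencies_ms):
    """Temporal coupling: the caller waits for every downstream call in turn."""
    return sum(call_latencies_ms)


def combined_availability(availabilities):
    """Availability coupling: the caller succeeds only if every dependency is up."""
    return prod(availabilities)


# Service A calls B, C, D directly, 500ms each.
total_ms = sequential_latency_ms([500, 500, 500])
# Assumed availability of 99.9% per downstream service (illustrative only).
availability = combined_availability([0.999, 0.999, 0.999])
print(f"{total_ms} ms per request, {availability:.4%} effective availability")
```

Under these assumptions A spends 1500ms per request and is available roughly 99.7% of the time. When A publishes an event instead, its own latency and availability no longer depend on B, C, and D.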
References
| File | When to Read |
| --- | --- |
| references/architecture-patterns.md | Coupling analysis, eventual consistency, when to use Kafka |
| references/agent-event-patterns.md | AI agent coordination, correlation IDs, fanout |
| references/strimzi-deployment.md | KRaft mode, CRDs, storage sizing |
| references/producers.md | Producer configuration, batching, tuning |
| references/consumers.md | Consumer groups, commits |
| references/delivery-semantics.md | At-most/least/exactly-once decision tree |
| references/outbox-pattern.md | Transactional outbox with Debezium CDC |
| references/debugging-runbooks.md | Lag, rebalancing issues |
| references/monitoring.md | Prometheus, alerts, Grafana |
| references/gotchas.md | Common mistakes |
| references/security-patterns.md | SCRAM, mTLS |
Related Skills
| Skill | Use For |
| --- | --- |
| /kubernetes | Cluster operations |
| /helm | Chart customization |
| /docker | Local development |