cloud-devops-expert

Safety Notice

This listing is imported from skills.sh public index metadata. Review upstream SKILL.md and repository scripts before running.

Install skill "cloud-devops-expert" with this command: npx skills add oimiragieo/agent-studio/oimiragieo-agent-studio-cloud-devops-expert

Cloud DevOps Expert

AWS (Amazon Web Services) Patterns

Core Services:

  • Compute: EC2, Lambda (serverless), ECS/EKS (containers), Fargate

  • Storage: S3 (object), EBS (block), EFS (file system)

  • Database: RDS (relational), DynamoDB (NoSQL), Aurora (MySQL/PostgreSQL-compatible)

  • Networking: VPC, ALB/NLB, CloudFront (CDN), Route 53 (DNS)

  • Monitoring: CloudWatch (metrics, logs, alarms)

Best Practices:

  • Use AWS Organizations for multi-account management

  • Implement least privilege with IAM roles and policies

  • Enable CloudTrail for audit logging

  • Use AWS Config for compliance and resource tracking

  • Tag all resources for cost allocation and management
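
To make the least-privilege bullet concrete, here is a minimal sketch that builds an IAM policy document allowing reads under a single S3 prefix, plus a small check for wildcard actions. The bucket name, prefix, and helper names are hypothetical and chosen only for illustration.

```python
import json


def read_only_prefix_policy(bucket: str, prefix: str) -> dict:
    """Build an IAM policy that allows reading objects under one prefix only."""
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Sid": "ReadPrefixOnly",
                "Effect": "Allow",
                "Action": ["s3:GetObject"],
                "Resource": f"arn:aws:s3:::{bucket}/{prefix}/*",
            }
        ],
    }


def has_wildcard_action(policy: dict) -> bool:
    """Return True if any statement grants '*' or 'service:*' actions."""
    for statement in policy["Statement"]:
        actions = statement["Action"]
        if isinstance(actions, str):
            actions = [actions]
        if any(a == "*" or a.endswith(":*") for a in actions):
            return True
    return False


if __name__ == "__main__":
    # "example-bucket" and "reports" are placeholder values.
    policy = read_only_prefix_policy("example-bucket", "reports")
    print(json.dumps(policy, indent=2))
    print("wildcard actions:", has_wildcard_action(policy))
```

A check like has_wildcard_action could run in code review or CI to catch overly broad grants before they reach an account.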

GCP (Google Cloud Platform) Patterns

Core Services:

  • Compute: Compute Engine (VMs), Cloud Functions (serverless), GKE (Kubernetes)

  • Storage: Cloud Storage (object), Persistent Disk (block)

  • Database: Cloud SQL, Cloud Spanner, Firestore

  • Networking: VPC, Cloud Load Balancing, Cloud CDN

  • Monitoring: Cloud Monitoring, Cloud Logging

Best Practices:

  • Use Google Cloud Identity for centralized identity management

  • Implement VPC Service Controls for security perimeters

  • Enable Cloud Audit Logs for compliance

  • Use labels for resource organization and billing

Azure Patterns

Core Services:

  • Compute: Virtual Machines, Azure Functions, AKS (Kubernetes), Container Instances

  • Storage: Blob Storage, Azure Files, Managed Disks

  • Database: Azure SQL, Cosmos DB (NoSQL), PostgreSQL/MySQL

  • Networking: Virtual Network, Application Gateway, Front Door (CDN)

  • Monitoring: Azure Monitor, Log Analytics

Best Practices:

  • Use Microsoft Entra ID (formerly Azure AD) for identity and access management

  • Implement Azure Policy for governance

  • Enable Microsoft Defender for Cloud (formerly Azure Security Center) for threat protection

  • Use resource groups for logical organization

Terraform Best Practices

Project Structure:

terraform/
├── environments/
│   ├── dev/
│   │   ├── main.tf
│   │   ├── variables.tf
│   │   └── terraform.tfvars
│   ├── staging/
│   └── prod/
├── modules/
│   ├── vpc/
│   ├── eks/
│   └── rds/
└── global/
    └── backend.tf

Code Organization:

  • Use modules for reusable infrastructure components

  • Separate environments with workspaces or directories

  • Store state remotely (S3 + DynamoDB for AWS, GCS for GCP, Azure Blob for Azure)

  • Use variables for environment-specific values

  • Never commit secrets (use AWS Secrets Manager, HashiCorp Vault, etc.)

Terraform Workflow:

# Initialize
terraform init

# Plan (review changes)
terraform plan -out=tfplan

# Apply (execute changes)
terraform apply tfplan

# Destroy (when needed)
terraform destroy

Best Practices:

  • Use terraform fmt for consistent formatting

  • Use terraform validate to check syntax

  • Implement state locking to prevent concurrent modifications

  • Use terraform import for existing resources

  • Pin versions: required_version = "~> 1.5" constrains the Terraform CLI itself; pin providers with version constraints in required_providers

  • Use data sources for referencing existing resources

  • Use depends_on only for dependencies Terraform cannot infer from resource references
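
The "~> 1.5" constraint in the list above uses the pessimistic operator: only the rightmost listed component may increase. The sketch below illustrates that rule in Python under simplifying assumptions (numeric components only, no pre-release tags); it is not Terraform's own implementation, and the function name is hypothetical.

```python
def satisfies_pessimistic(version: str, constraint: str) -> bool:
    """Check `version` against `~> constraint` (pre-release tags not handled)."""
    base = [int(p) for p in constraint.split(".")]
    if len(base) < 2:
        raise ValueError("this sketch expects at least MAJOR.MINOR")
    ver = [int(p) for p in version.split(".")]
    # Pad the version with zeros so "1.5" compares equal to "1.5.0".
    ver += [0] * (len(base) - len(ver))
    # Upper bound: drop the last component and bump the one before it.
    upper = base[:-1]
    upper[-1] += 1
    return base <= ver and ver < upper


if __name__ == "__main__":
    for candidate in ["1.4.9", "1.5.0", "1.9.3", "2.0.0"]:
        print(candidate, satisfies_pessimistic(candidate, "1.5"))
```

Under this rule "~> 1.5" accepts any 1.x release from 1.5 onward but rejects 2.0.0, while "~> 1.5.0" accepts only 1.5.x patch releases.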

Kubernetes Deployment Patterns

Deployment Strategies:

  • Rolling Update: Gradual replacement of pods (default)

  • Blue/Green: Run two identical environments, switch traffic

  • Canary: Gradual traffic shift to new version

  • Recreate: Terminate old pods before creating new ones (downtime)
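
The canary strategy can be sketched as a loop that raises the traffic weight step by step and rolls back when the error rate crosses a threshold. This is an illustrative model only: the step sizes, the threshold, and the error_rate_at callback are assumptions standing in for a real traffic router and metrics source.

```python
from typing import Callable, List, Tuple


def run_canary(
    steps: List[int],
    error_rate_at: Callable[[int], float],
    max_error_rate: float,
) -> Tuple[bool, List[int]]:
    """Shift traffic through `steps` (percent routed to the new version).

    Returns (promoted, history). On a bad step the weight goes back to 0.
    """
    history: List[int] = []
    for weight in steps:
        history.append(weight)
        if error_rate_at(weight) > max_error_rate:
            history.append(0)  # roll back: all traffic to the old version
            return False, history
    return True, history


if __name__ == "__main__":
    # Hypothetical metrics: the new version starts failing at 50% traffic.
    def flaky(weight: int) -> float:
        return 0.05 if weight >= 50 else 0.001

    print(run_canary([5, 25, 50, 100], lambda w: 0.001, 0.01))
    print(run_canary([5, 25, 50, 100], flaky, 0.01))
```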

Resource Management:

apiVersion: apps/v1
kind: Deployment
metadata:
  name: myapp
spec:
  replicas: 3
  selector:
    matchLabels:
      app: myapp
  template:
    metadata:
      labels:
        app: myapp
    spec:
      containers:
        - name: myapp
          image: myapp:v1.0.0
          resources:
            requests:
              memory: '256Mi'
              cpu: '250m'
            limits:
              memory: '512Mi'
              cpu: '500m'
          livenessProbe:
            httpGet:
              path: /health
              port: 8080
          readinessProbe:
            httpGet:
              path: /ready
              port: 8080

Best Practices:

  • Use namespaces for environment/team isolation

  • Implement RBAC for access control

  • Define resource requests and limits

  • Use liveness and readiness probes

  • Use ConfigMaps and Secrets for configuration

  • Enforce Pod Security Standards (PSS) through Pod Security Admission; Pod Security Policies (PSP) were removed in Kubernetes 1.25

  • Use Horizontal Pod Autoscaler (HPA) for auto-scaling
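
For the HPA bullet, the scaling rule documented by Kubernetes is desiredReplicas = ceil(currentReplicas * currentMetricValue / desiredMetricValue), skipped when the ratio is within a tolerance (0.1 by default). The sketch below reproduces that arithmetic; the function name and the min/max defaults are illustrative assumptions, not cluster settings.

```python
import math


def desired_replicas(
    current_replicas: int,
    current_metric: float,
    target_metric: float,
    tolerance: float = 0.1,
    min_replicas: int = 1,
    max_replicas: int = 10,
) -> int:
    """Apply the HPA ratio formula, then clamp to the configured bounds."""
    ratio = current_metric / target_metric
    if abs(ratio - 1.0) <= tolerance:
        # Close enough to the target: leave the replica count alone.
        return current_replicas
    wanted = math.ceil(current_replicas * ratio)
    return max(min_replicas, min(max_replicas, wanted))


if __name__ == "__main__":
    # 3 pods averaging 90% CPU against a 60% target.
    print(desired_replicas(3, 90, 60))
```

With 3 replicas at 90% average CPU and a 60% target, the ratio is 1.5, so the result is ceil(4.5) = 5 replicas.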

CI/CD Pipeline Patterns

GitHub Actions Example:

name: CI/CD Pipeline

on:
  push:
    branches: [main, develop]
  pull_request:
    branches: [main]

jobs:
  test:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v3
      - name: Run tests
        run: npm test

  build:
    needs: test
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v3
      - name: Build Docker image
        run: docker build -t myapp:${{ github.sha }} .
      - name: Push to registry
        run: docker push myapp:${{ github.sha }}

  deploy:
    needs: build
    runs-on: ubuntu-latest
    if: github.ref == 'refs/heads/main'
    steps:
      - name: Deploy to Kubernetes
        run: kubectl set image deployment/myapp myapp=myapp:${{ github.sha }}

Best Practices:

  • Implement automated testing (unit, integration, e2e)

  • Use matrix builds for multi-platform testing

  • Cache dependencies to speed up builds

  • Use secrets management for sensitive data

  • Implement deployment gates and approvals for production

  • Use semantic versioning for releases

  • Implement rollback strategies
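
As a small illustration of semantic versioning for releases, the sketch below bumps a MAJOR.MINOR.PATCH string. It assumes plain numeric versions with no pre-release or build metadata, and the bump helper is hypothetical rather than part of any release tool.

```python
def bump(version: str, part: str) -> str:
    """Return the next version after bumping 'major', 'minor', or 'patch'."""
    major, minor, patch = (int(p) for p in version.split("."))
    if part == "major":
        # Breaking change: reset minor and patch.
        return f"{major + 1}.0.0"
    if part == "minor":
        # Backward-compatible feature: reset patch.
        return f"{major}.{minor + 1}.0"
    if part == "patch":
        # Backward-compatible fix.
        return f"{major}.{minor}.{patch + 1}"
    raise ValueError(f"unknown part: {part!r}")


if __name__ == "__main__":
    print(bump("1.4.2", "minor"))
```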

Infrastructure as Code (IaC) Principles

Version Control:

  • Store all infrastructure code in Git

  • Use pull requests for code review

  • Implement branch protection rules

  • Tag releases for production deployments

Testing:

  • Use terraform plan to preview changes

  • Implement policy-as-code with Sentinel, OPA, or Checkov

  • Use tflint for Terraform linting

  • Test modules in isolation
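
Policy checks can also run against the machine-readable plan that terraform show -json tfplan produces. The sketch below is a simplified stand-in for tools such as Sentinel, OPA, or Checkov: it flags planned deletions of protected resource types. The sample plan and the protected-type list are made up for illustration; only the resource_changes, type, address, and change.actions fields of the plan JSON are relied on.

```python
from typing import List


def destructive_changes(plan: dict, protected_types: List[str]) -> List[str]:
    """Return addresses of protected resources the plan would delete."""
    flagged = []
    for rc in plan.get("resource_changes", []):
        # A replacement shows up as both "delete" and "create" actions.
        if rc["type"] in protected_types and "delete" in rc["change"]["actions"]:
            flagged.append(rc["address"])
    return flagged


# Hypothetical, heavily trimmed plan document.
SAMPLE_PLAN = {
    "resource_changes": [
        {"address": "aws_db_instance.main", "type": "aws_db_instance",
         "change": {"actions": ["delete", "create"]}},
        {"address": "aws_s3_bucket.logs", "type": "aws_s3_bucket",
         "change": {"actions": ["update"]}},
        {"address": "aws_instance.web", "type": "aws_instance",
         "change": {"actions": ["delete"]}},
    ]
}

if __name__ == "__main__":
    problems = destructive_changes(SAMPLE_PLAN, ["aws_db_instance", "aws_s3_bucket"])
    print("blocked:", problems)
```

In a pipeline, a non-empty result would typically fail the job before terraform apply runs.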

Documentation:

  • Document module inputs and outputs

  • Maintain README files for each module

  • Use terraform-docs to auto-generate documentation

Monitoring and Observability

The Three Pillars:

Metrics (Prometheus + Grafana)

  • Use Prometheus for metrics collection

  • Define SLIs (Service Level Indicators)

  • Set up alerting rules

  • Create Grafana dashboards for visualization

Logs (ELK Stack, CloudWatch, Cloud Logging)

  • Centralize logs from all services

  • Implement structured logging (JSON format)

  • Use log aggregation and parsing

  • Set up log-based alerts
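
Structured logging can be sketched with the Python standard library alone: a custom Formatter that emits one JSON object per record. The field names and the "context" convention for extra fields are assumptions made for this example, not a standard schema.

```python
import json
import logging


class JsonFormatter(logging.Formatter):
    """Render each log record as a single JSON line."""

    def format(self, record: logging.LogRecord) -> str:
        payload = {
            "timestamp": self.formatTime(record),
            "level": record.levelname,
            "logger": record.name,
            "message": record.getMessage(),
        }
        # Fields passed via extra={"context": {...}} are merged in.
        payload.update(getattr(record, "context", {}))
        if record.exc_info:
            payload["exception"] = self.formatException(record.exc_info)
        return json.dumps(payload)


def build_logger(name: str) -> logging.Logger:
    """Create a logger that writes JSON lines to stderr."""
    logger = logging.getLogger(name)
    handler = logging.StreamHandler()
    handler.setFormatter(JsonFormatter())
    logger.handlers = [handler]
    logger.setLevel(logging.INFO)
    logger.propagate = False
    return logger


if __name__ == "__main__":
    log = build_logger("checkout")
    log.info("order placed", extra={"context": {"order_id": "A123", "latency_ms": 42}})
```

Because every line is valid JSON, a log aggregator can index fields such as order_id directly instead of parsing free text.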

Traces (Jaeger, Zipkin, X-Ray)

  • Implement distributed tracing

  • Track request flow across microservices

  • Identify performance bottlenecks

  • Correlate traces with logs and metrics

Observability Best Practices:

  • Define SLOs (Service Level Objectives) and SLAs

  • Implement health check endpoints

  • Use APM (Application Performance Monitoring) tools

  • Set up on-call rotations and runbooks

  • Practice incident response procedures
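
An SLO implies an error budget: the amount of unavailability the objective still permits in a window. The sketch below works through that arithmetic for an availability SLO; the 30-day window and the function names are illustrative choices.

```python
def error_budget_minutes(slo_percent: float, window_days: int = 30) -> float:
    """Minutes of allowed downtime for an availability SLO over the window."""
    if not 0.0 < slo_percent < 100.0:
        raise ValueError("slo_percent must be between 0 and 100 (exclusive)")
    total_minutes = window_days * 24 * 60
    return total_minutes * (100.0 - slo_percent) / 100.0


def budget_remaining(slo_percent: float, downtime_minutes: float,
                     window_days: int = 30) -> float:
    """Fraction of the error budget still unspent (negative if overspent)."""
    budget = error_budget_minutes(slo_percent, window_days)
    return (budget - downtime_minutes) / budget


if __name__ == "__main__":
    print(round(error_budget_minutes(99.9), 1), "minutes of budget")
    print(round(budget_remaining(99.9, 21.6), 2), "of budget remaining")
```

A 99.9% SLO over 30 days allows about 43.2 minutes of downtime (43,200 minutes × 0.1%), so 21.6 minutes of outage consumes half the budget.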

Container Orchestration (Kubernetes)

Helm Charts:

  • Use Helm for package management

  • Create reusable chart templates

  • Use values files for environment-specific configuration

  • Version and publish charts to chart repository

Kubernetes Operators:

  • Automate operational tasks

  • Manage complex stateful applications

  • Examples: Prometheus Operator, Postgres Operator

Service Mesh (Istio, Linkerd):

  • Implement traffic management (canary, blue/green)

  • Enable mutual TLS for service-to-service communication

  • Implement circuit breakers and retries

  • Observe traffic with distributed tracing
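
In a service mesh the proxies apply circuit breaking for you, but the underlying logic is easy to show at application level. The sketch below is a minimal, single-threaded circuit breaker: it opens after a number of consecutive failures, rejects calls while open, and allows one trial call after a cool-down. The thresholds and class names are assumptions for illustration and do not mirror Istio or Linkerd configuration.

```python
import time
from typing import Callable, Optional, TypeVar

T = TypeVar("T")


class CircuitOpenError(RuntimeError):
    """Raised when a call is rejected because the circuit is open."""


class CircuitBreaker:
    def __init__(self, max_failures: int = 3, reset_after: float = 30.0,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self.max_failures = max_failures
        self.reset_after = reset_after
        self.clock = clock  # injectable so tests can control time
        self.failures = 0
        self.opened_at: Optional[float] = None

    def call(self, fn: Callable[[], T]) -> T:
        if self.opened_at is not None:
            if self.clock() - self.opened_at < self.reset_after:
                raise CircuitOpenError("circuit open; call rejected")
            # Cool-down elapsed (half-open): let one trial call through.
        try:
            result = fn()
        except Exception:
            self.failures += 1
            if self.opened_at is not None or self.failures >= self.max_failures:
                self.opened_at = self.clock()  # open, or re-open after a failed trial
            raise
        # Success closes the circuit and clears the failure count.
        self.failures = 0
        self.opened_at = None
        return result
```

Rejecting calls while the circuit is open gives a struggling dependency time to recover instead of piling retries on top of it.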

Cost Optimization

AWS Cost Optimization:

  • Use Reserved Instances or Savings Plans for predictable workloads

  • Implement auto-scaling to match demand

  • Use S3 lifecycle policies to transition to cheaper storage classes

  • Enable Cost Explorer and set up budgets

  • Right-size instances based on usage metrics

Multi-Cloud Cost Management:

  • Use tags/labels for cost allocation

  • Implement chargeback models for team accountability

  • Use spot/preemptible instances for non-critical workloads

  • Monitor unused resources (idle VMs, unattached volumes)
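
Tag-based cost allocation and chargeback come down to grouping billing line items by an ownership tag and surfacing whatever is untagged. The sketch below shows that grouping; the line items, the "team" tag key, and the cost figures are invented sample data, not real billing output.

```python
from collections import defaultdict
from typing import Dict, Iterable, Mapping


def allocate_costs(line_items: Iterable[Mapping], tag_key: str = "team") -> Dict[str, float]:
    """Sum cost per tag value; items without the tag land in 'untagged'."""
    totals: Dict[str, float] = defaultdict(float)
    for item in line_items:
        owner = item.get("tags", {}).get(tag_key, "untagged")
        totals[owner] += item["cost"]
    return dict(totals)


# Hypothetical monthly line items.
SAMPLE_ITEMS = [
    {"resource": "vm-1", "cost": 120.0, "tags": {"team": "payments"}},
    {"resource": "vm-2", "cost": 30.0, "tags": {"team": "payments"}},
    {"resource": "bucket-1", "cost": 45.5, "tags": {"team": "search"}},
    {"resource": "disk-9", "cost": 12.0, "tags": {}},
]

if __name__ == "__main__":
    for owner, total in sorted(allocate_costs(SAMPLE_ITEMS).items()):
        print(f"{owner}: {total:.2f}")
```

The size of the "untagged" bucket is a useful signal of how well the tagging policy is being followed.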

Cloudflare Developer Platform

Cloudflare Workers & Pages:

  • Edge computing platform for serverless functions

  • Deploy at the edge (close to users globally)

  • Use Workers KV for edge key-value storage

  • Use Durable Objects for stateful applications

Cloudflare Primitives:

  • R2: S3-compatible object storage (no egress fees)

  • D1: SQLite-based serverless database

  • KV: Key-value storage (globally distributed)

  • AI: Run AI inference at the edge

  • Queues: Message queuing service

  • Vectorize: Vector database for embeddings

Configuration (wrangler.toml):

name = "my-worker"
main = "src/index.ts"
compatibility_date = "2024-01-01"

[[kv_namespaces]]
binding = "MY_KV"
id = "xxx"

[[r2_buckets]]
binding = "MY_BUCKET"
bucket_name = "my-bucket"

[[d1_databases]]
binding = "DB"
database_name = "my-db"
database_id = "xxx"

Consolidated Skills

This expert skill consolidates 1 individual skill:

  • cloudflare-developer-tools-rule

Related Skills

  • docker-compose: Container orchestration and multi-container application management

Memory Protocol (MANDATORY)

Before starting:

cat .claude/context/memory/learnings.md

After completing: Record any new patterns or exceptions discovered.

ASSUME INTERRUPTION: Your context may reset. If it's not in memory, it didn't happen.

Source Transparency

This detail page is rendered from real SKILL.md content. Trust labels are metadata-based hints, not a safety guarantee.
