cloud-infrastructure

Comprehensive cloud infrastructure skill covering multi-cloud architecture, Infrastructure as Code, cost optimization, and production deployment patterns.

Safety Notice

This listing is imported from skills.sh public index metadata. Review upstream SKILL.md and repository scripts before running.

Copy this and send it to your AI assistant to learn

Install skill "cloud-infrastructure" with this command: npx skills add 89jobrien/steve/89jobrien-steve-cloud-infrastructure

Cloud Infrastructure

Comprehensive cloud infrastructure skill covering multi-cloud architecture, Infrastructure as Code, cost optimization, and production deployment patterns.

When to Use This Skill

  • Designing cloud architecture for new applications

  • Implementing Infrastructure as Code (Terraform, CloudFormation, Pulumi)

  • Cost optimization and resource right-sizing

  • Multi-region and high-availability deployments

  • Cloud migration planning

  • Security and compliance implementation

  • Auto-scaling and performance optimization

Cloud Architecture Patterns

Compute Patterns

Pattern AWS Azure GCP Use Case

Serverless Lambda Functions Cloud Functions Event-driven, variable load

Containers ECS/EKS AKS GKE Microservices, consistent env

VMs EC2 Virtual Machines Compute Engine Legacy apps, full control

Batch Batch Batch Batch Large-scale processing

Storage Patterns

Type AWS Azure GCP Use Case

Object S3 Blob Storage Cloud Storage Static files, backups

Block EBS Managed Disks Persistent Disk Database storage

File EFS Azure Files Filestore Shared file systems

Archive Glacier Archive Coldline Long-term retention

Database Patterns

Type AWS Azure GCP Use Case

Relational RDS, Aurora SQL Database Cloud SQL ACID transactions

NoSQL DynamoDB Cosmos DB Firestore Flexible schema

Cache ElastiCache Cache for Redis Memorystore Session, caching

Data Warehouse Redshift Synapse BigQuery Analytics

Infrastructure as Code

Terraform Best Practices

Project Structure:

infrastructure/ ├── modules/ │ ├── networking/ │ ├── compute/ │ └── database/ ├── environments/ │ ├── dev/ │ ├── staging/ │ └── prod/ ├── main.tf ├── variables.tf ├── outputs.tf └── versions.tf

State Management:

  • Use remote state (S3, Azure Blob, GCS)

  • Enable state locking (DynamoDB, Blob lease)

  • Separate state per environment

  • Never commit state files

Module Design:

  • Single responsibility per module

  • Expose minimal required variables

  • Document inputs/outputs

  • Version modules with git tags

Cost Optimization

Compute Savings:

  • Reserved Instances (1-3 year commitment): 30-60% savings

  • Spot/Preemptible instances: 60-90% savings for interruptible workloads

  • Right-sizing: Match instance size to actual usage

  • Auto-scaling: Scale down during low usage

Storage Savings:

  • Lifecycle policies: Auto-transition to cheaper tiers

  • Compression: Reduce storage footprint

  • Deduplication: Eliminate redundant data

  • Delete unused resources: Orphaned volumes, snapshots

Network Savings:

  • Use CDN for static content

  • Optimize data transfer paths

  • Use private endpoints

  • Compress API responses

High Availability Patterns

Multi-AZ Deployment

  • Deploy across 2-3 availability zones

  • Use load balancers for distribution

  • Database replication across AZs

  • Automatic failover configuration

Multi-Region Deployment

  • Active-active or active-passive

  • DNS-based routing (Route53, Traffic Manager)

  • Data replication strategy

  • Disaster recovery procedures

Resilience Patterns

  • Circuit breakers for external dependencies

  • Retry with exponential backoff

  • Bulkhead isolation

  • Graceful degradation

Security Best Practices

Identity & Access

  • Principle of least privilege

  • Use IAM roles, not long-term credentials

  • Enable MFA for privileged accounts

  • Regular access reviews

Network Security

  • VPC/VNet isolation

  • Security groups as firewalls

  • Private subnets for backend services

  • VPN/Direct Connect for hybrid

Data Protection

  • Encryption at rest (KMS)

  • Encryption in transit (TLS)

  • Key rotation policies

  • Backup and recovery testing

Monitoring & Observability

Key Metrics

  • CPU, Memory, Disk utilization

  • Network throughput and latency

  • Error rates and types

  • Cost per service/team

Alerting Strategy

  • Set thresholds based on baselines

  • Alert on symptoms, not causes

  • Runbooks for each alert

  • Escalation paths defined

Reference Files

  • references/terraform_patterns.md

  • IaC patterns and examples

  • references/cost_optimization.md

  • Detailed cost reduction strategies

Integration with Other Skills

  • security-engineering - For security architecture

  • network-engineering - For network design

  • performance - For optimization strategies

  • devops-runbooks - For operational procedures

Source Transparency

This detail page is rendered from real SKILL.md content. Trust labels are metadata-based hints, not a safety guarantee.

Related Skills

Related by shared tags or category signals.

Coding

dead-code-removal

No summary provided by upstream source.

Repository SourceNeeds Review
Coding

ai-code-cleanup

No summary provided by upstream source.

Repository SourceNeeds Review
Coding

python-scripting

No summary provided by upstream source.

Repository SourceNeeds Review