server-management

Server management principles for production operations. Learn to THINK, not memorize commands.

Safety Notice

This listing is imported from skills.sh public index metadata. Review upstream SKILL.md and repository scripts before running.

Install skill "server-management" with this command: npx skills add tai-ch0802/skills-bundle/tai-ch0802-skills-bundle-server-management

  1. Process Management Principles

Tool Selection

| Scenario | Tool |
|----------|------|
| Node.js app | PM2 (clustering, reload) |
| Any app | systemd (Linux native) |
| Containers | Docker/Podman |
| Orchestration | Kubernetes, Docker Swarm |

Process Management Goals

| Goal | What It Means |
|------|---------------|
| Restart on crash | Auto-recovery |
| Zero-downtime reload | No service interruption |
| Clustering | Use all CPU cores |
| Persistence | Survive server reboot |
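To make the "restart on crash" goal concrete, here is a minimal sketch of a supervisor loop with exponential backoff. It is only an illustration of the idea: in production this job belongs to PM2, systemd, or an orchestrator, and the function name `supervise` and its parameters are hypothetical.

```python
import subprocess
import time


def supervise(cmd, max_restarts=3, base_delay=1.0, sleep=time.sleep):
    """Run cmd and restart it after a non-zero exit, backing off each time.

    Hypothetical helper for illustration. Returns the number of restarts
    performed before the process exited cleanly, and raises RuntimeError
    once the restart budget is used up.
    """
    restarts = 0
    while True:
        exit_code = subprocess.run(cmd).returncode
        if exit_code == 0:
            return restarts  # clean exit: nothing to recover from
        if restarts >= max_restarts:
            raise RuntimeError(
                f"gave up after {restarts} restarts (last exit code {exit_code})"
            )
        # Wait 1x, 2x, 4x ... base_delay so a crash loop does not spin the CPU.
        sleep(base_delay * 2 ** restarts)
        restarts += 1
```

The restart budget and the backoff are the two design choices worth noticing: without them, a process that crashes on startup is restarted in a tight loop and hides the real problem.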

  2. Monitoring Principles

What to Monitor

| Category | Key Metrics |
|----------|-------------|
| Availability | Uptime, health checks |
| Performance | Response time, throughput |
| Errors | Error rate, types |
| Resources | CPU, memory, disk |

Alert Severity Strategy

| Level | Response |
|-------|----------|
| Critical | Immediate action |
| Warning | Investigate soon |
| Info | Review daily |
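One way to apply the severity levels above is to map each metric to a pair of thresholds. The sketch below assumes made-up metric names and threshold values; real thresholds depend on the workload and should be tuned from observed baselines.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Threshold:
    warning: float
    critical: float


# Hypothetical metric names and limits, chosen only for illustration.
THRESHOLDS = {
    "cpu_percent": Threshold(warning=75.0, critical=90.0),
    "disk_percent": Threshold(warning=80.0, critical=95.0),
    "error_rate_percent": Threshold(warning=1.0, critical=5.0),
}


def classify(metric, value):
    """Return 'critical', 'warning', or 'info' for a metric reading."""
    limits = THRESHOLDS[metric]
    if value >= limits.critical:
        return "critical"  # immediate action
    if value >= limits.warning:
        return "warning"   # investigate soon
    return "info"          # review daily
```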

Monitoring Tool Selection

| Need | Options |
|------|---------|
| Simple/Free | PM2 metrics, htop |
| Full observability | Grafana, Datadog |
| Error tracking | Sentry |
| Uptime | UptimeRobot, Pingdom |

  3. Log Management Principles

Log Strategy

| Log Type | Purpose |
|----------|---------|
| Application logs | Debug, audit |
| Access logs | Traffic analysis |
| Error logs | Issue detection |

Log Principles

  • Rotate logs to prevent disk fill

  • Structured logging (JSON) for parsing

  • Appropriate levels (error/warn/info/debug)

  • No sensitive data in logs
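The four log principles above can be combined in a few lines with Python's standard `logging` module. This is a sketch under assumptions: the `context` attribute, the `SENSITIVE_KEYS` set, the logger name `app`, and the size limits are illustrative choices, not part of the original skill.

```python
import json
import logging
from logging.handlers import RotatingFileHandler

# Assumed list of field names that must never be written in clear text.
SENSITIVE_KEYS = {"password", "token", "authorization"}


class JsonFormatter(logging.Formatter):
    """Emit one JSON object per line and redact sensitive context fields."""

    def format(self, record):
        payload = {
            "time": self.formatTime(record),
            "level": record.levelname,
            "logger": record.name,
            "message": record.getMessage(),
        }
        # 'context' is a hypothetical attribute passed via extra={"context": {...}}.
        for key, value in getattr(record, "context", {}).items():
            redact = key.lower() in SENSITIVE_KEYS
            payload[key] = "[REDACTED]" if redact else value
        return json.dumps(payload, default=str)


def build_logger(path, level=logging.INFO):
    """Return a logger that rotates at 10 MB and keeps 5 old files."""
    handler = RotatingFileHandler(path, maxBytes=10 * 1024 * 1024, backupCount=5)
    handler.setFormatter(JsonFormatter())
    logger = logging.getLogger("app")
    logger.setLevel(level)
    logger.handlers = [handler]
    logger.propagate = False
    return logger
```

A call such as `logger.info("login ok", extra={"context": {"user": "alice", "password": "hunter2"}})` would then write the user name but replace the password with `[REDACTED]`.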

  4. Scaling Decisions

When to Scale

| Symptom | Solution |
|---------|----------|
| High CPU | Add instances (horizontal) |
| High memory | Increase RAM or fix leak |
| Slow response | Profile first, then scale |
| Traffic spikes | Auto-scaling |

Scaling Strategy

| Type | When to Use |
|------|-------------|
| Vertical | Quick fix, single instance |
| Horizontal | Sustainable, distributed |
| Auto | Variable traffic |
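For the auto-scaling case, a common approach is a proportional rule: scale the instance count by the ratio of observed load to target load, then clamp it. The sketch below follows that rule (the same shape as the formula documented for the Kubernetes Horizontal Pod Autoscaler); the function name, the CPU metric, and the default bounds are assumptions for illustration.

```python
import math


def desired_instances(current, observed_cpu, target_cpu,
                      min_instances=1, max_instances=10):
    """Return how many instances are needed to bring CPU back to target.

    Hypothetical helper: current is the running instance count,
    observed_cpu and target_cpu are average utilisation percentages.
    """
    if current < 1 or target_cpu <= 0:
        raise ValueError("current must be >= 1 and target_cpu must be > 0")
    wanted = math.ceil(current * observed_cpu / target_cpu)
    # Clamp so a metric glitch can neither scale to zero nor scale without bound.
    return max(min_instances, min(max_instances, wanted))
```

For example, 4 instances at 90% CPU with a 60% target gives `ceil(4 * 90 / 60) = 6` instances.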

  5. Health Check Principles

What Constitutes Healthy

| Check | Meaning |
|-------|---------|
| HTTP 200 | Service responding |
| Database connected | Data accessible |
| Dependencies OK | External services reachable |
| Resources OK | CPU/memory not exhausted |

Health Check Implementation

  • Simple: Just return 200

  • Deep: Check all dependencies

  • Choose based on load balancer needs
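The difference between a simple and a deep health check can be sketched as two functions that return a status code and a body. The names `shallow_health` and `deep_health`, the `checks` mapping, and the use of 503 for an unhealthy result are assumptions for illustration; wiring them into a web framework is left out.

```python
def shallow_health():
    """Simple check: the process is up and able to answer."""
    return 200, {"status": "ok"}


def deep_health(checks):
    """Deep check: run every dependency probe and report each result.

    checks maps a name to a zero-argument callable that returns a truthy
    value on success. A probe that raises counts as a failure.
    """
    results = {}
    for name, probe in checks.items():
        try:
            results[name] = bool(probe())
        except Exception:
            results[name] = False
    healthy = all(results.values())
    status = 200 if healthy else 503
    return status, {"status": "ok" if healthy else "degraded", "checks": results}
```

A load balancer that polls every few seconds usually wants the shallow variant, since a deep check multiplies load on the database and on external services with every probe.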

  6. Security Principles

| Area | Principle |
|------|-----------|
| Access | SSH keys only, no passwords |
| Firewall | Only needed ports open |
| Updates | Regular security patches |
| Secrets | Environment vars, not files |
| Audit | Log access and changes |
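As a small illustration of the "Secrets" row, an application can read its secrets from the environment and fail fast when one is missing, rather than falling back to a file or a hard-coded default. The helper name `require_env` and the variable name in the usage note are hypothetical.

```python
import os


def require_env(name, environ=os.environ):
    """Return the value of a required environment variable.

    Raises RuntimeError at startup instead of letting the service run
    with an empty or missing secret.
    """
    value = environ.get(name, "")
    if not value:
        raise RuntimeError(f"missing required environment variable: {name}")
    return value
```

Calling something like `require_env("DATABASE_URL")` once at startup surfaces a misconfiguration immediately instead of at the first request.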

  7. Troubleshooting Priority

When something's wrong, work through these checks in order:

  • Check if running (process status)

  • Check logs (error messages)

  • Check resources (disk, memory, CPU)

  • Check network (ports, DNS)

  • Check dependencies (database, APIs)
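The checklist above can be sketched as an ordered list of probes that stops at the first failure, so attention goes to the earliest broken layer. The helpers `disk_ok`, `port_open`, and `first_failure`, along with the 90% disk limit, are hypothetical and cover only some of the steps.

```python
import shutil
import socket


def disk_ok(path="/", max_used_percent=90.0):
    """Resource check: True while disk usage stays under the limit."""
    usage = shutil.disk_usage(path)
    return usage.used / usage.total * 100 < max_used_percent


def port_open(host, port, timeout=2.0):
    """Network check: True if a TCP connection to host:port succeeds."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False


def first_failure(checks):
    """Run (name, probe) pairs in order and return the first failing name.

    Returns None when every probe passes. A probe that raises is
    treated as a failure.
    """
    for name, probe in checks:
        try:
            ok = probe()
        except Exception:
            ok = False
        if not ok:
            return name
    return None
```

Passing the probes as an ordered list keeps the priority explicit: a full disk is reported before anyone starts debugging DNS.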

  8. Anti-Patterns

| ❌ Don't | ✅ Do |
|----------|-------|
| Run as root | Use non-root user |
| Ignore logs | Set up log rotation |
| Skip monitoring | Monitor from day one |
| Manual restarts | Auto-restart config |
| No backups | Regular backup schedule |

Remember: A well-managed server is boring. That's the goal.

Source Transparency

This detail page is rendered from real SKILL.md content. Trust labels are metadata-based hints, not a safety guarantee.
