performance-profiling

Analyze system and application performance using Linux kernel-level tools (perf, ftrace, eBPF, SystemTap) and application-level profiling to identify bottlenecks and optimize performance

Safety Notice

This listing is imported from skills.sh public index metadata. Review upstream SKILL.md and repository scripts before running.


Install skill "performance-profiling" with this command: npx skills add wojons/skills/wojons-skills-performance-profiling

Performance Profiling

Analyze system and application performance with Linux kernel-level tools (perf, ftrace, eBPF, SystemTap) and application-level profilers, identify bottlenecks, and recommend optimizations that improve responsiveness, throughput, and resource efficiency.

When to use me

Use this skill when:

  • Application performance is slow or degrading
  • System resource utilization is high
  • You need to identify CPU, memory, I/O, or network bottlenecks
  • You are optimizing application response times
  • You are debugging performance regressions
  • You are planning capacity or sizing resources
  • You are comparing performance before and after changes
  • You are analyzing production performance issues
  • You are creating performance baselines
  • You are tuning system or application parameters

What I do

1. System-Level Profiling

  • CPU profiling: Analyze CPU usage, context switches, interrupts, scheduler latency
  • Memory profiling: Analyze memory usage, page faults, swapping, memory leaks
  • I/O profiling: Analyze disk I/O, file system performance, storage latency
  • Network profiling: Analyze network throughput, latency, packet loss, connections
  • Kernel profiling: Analyze kernel functions, system calls, interrupt handlers
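
As one illustration of system-level CPU profiling, a user/system/iowait/steal breakdown can be derived from two snapshots of the aggregate `cpu` line in /proc/stat. The sketch below is a minimal example under that assumption; the function names and the snapshot values are hypothetical and not part of this skill.

```python
# Field order of the aggregate "cpu" line in /proc/stat (see proc(5)).
FIELDS = ("user", "nice", "system", "idle", "iowait", "irq", "softirq", "steal")

def parse_cpu_line(stat_text):
    """Return the jiffy counters from the aggregate 'cpu' line as a dict."""
    for line in stat_text.splitlines():
        parts = line.split()
        if parts and parts[0] == "cpu":
            return dict(zip(FIELDS, map(int, parts[1:1 + len(FIELDS)])))
    raise ValueError("no aggregate 'cpu' line found")

def cpu_breakdown(before, after):
    """Percentage of elapsed CPU time spent in each state between two snapshots."""
    deltas = {name: after[name] - before[name] for name in FIELDS}
    total = sum(deltas.values())
    if total <= 0:
        raise ValueError("snapshots must be taken at different times")
    return {name: 100.0 * delta / total for name, delta in deltas.items()}

# Hypothetical snapshots; on a live system, read /proc/stat twice a second apart.
snap1 = parse_cpu_line("cpu  1000 0 500 8000 300 0 0 200\ncpu0 500 0 250 4000 150 0 0 100")
snap2 = parse_cpu_line("cpu  1450 0 700 8250 380 0 0 220\ncpu0 700 0 350 4100 190 0 0 110")
usage = cpu_breakdown(snap1, snap2)
print({name: round(pct, 1) for name, pct in usage.items()})
```

Working from deltas rather than raw counters matters because the counters in /proc/stat are cumulative since boot.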

2. Application-Level Profiling

  • Application CPU usage: Profile application-specific CPU consumption
  • Memory allocation: Track heap allocations, garbage collection, memory leaks
  • Function timing: Measure function execution times and call frequencies
  • Database query profiling: Analyze SQL query performance and optimization
  • API endpoint profiling: Measure API response times and throughput

3. Tool Integration

  • Linux perf: CPU profiling, hardware performance counters, tracepoints
  • eBPF/BCC: Dynamic tracing, custom performance instrumentation
  • Ftrace: Kernel function tracing, event tracing, latency measurements
  • SystemTap: System-wide tracing and profiling
  • Application profilers: Language-specific profiling tools
  • Container profiling: Docker, Kubernetes performance analysis

4. Bottleneck Identification

  • Hot spot detection: Identify frequently executed code paths
  • Resource contention: Detect lock contention, CPU starvation, I/O wait
  • Latency analysis: Measure and analyze latency distributions
  • Scalability analysis: Identify scalability limits and bottlenecks
  • Anomaly detection: Detect performance anomalies and regressions

5. Optimization Recommendations

  • Code optimizations: Suggest algorithmic improvements, caching strategies
  • Configuration tuning: Recommend system and application tuning parameters
  • Architecture improvements: Suggest architectural changes for performance
  • Resource allocation: Recommend optimal resource allocation strategies
  • Monitoring setup: Recommend performance monitoring configurations

6. Visualization & Reporting

  • Flame graphs: Generate CPU and memory flame graphs for visualization
  • Heat maps: Create latency heat maps for time-series analysis
  • Performance dashboards: Create real-time performance dashboards
  • Trend analysis: Analyze performance trends over time
  • Comparison reports: Compare performance across versions/environments

Profiling Tools Covered

Linux Kernel-Level Tools

  • perf: Linux performance events for CPU profiling, hardware counters
  • eBPF/BCC: Extended Berkeley Packet Filter for dynamic tracing
  • bpftrace: High-level tracing language for eBPF
  • Ftrace: Linux kernel internal tracer for function tracing
  • SystemTap: System-wide tracing and profiling framework
  • LTTng: Linux Trace Toolkit next generation
  • ktap: Lightweight kernel tracing (no longer actively developed)

Application-Level Tools

  • Java: JProfiler, YourKit, VisualVM, Async Profiler
  • Python: cProfile, py-spy, Scalene, line_profiler
  • Node.js: clinic.js, 0x, node --prof, v8-profiler
  • Go: pprof, trace, delve, gops
  • Ruby: ruby-prof, stackprof, rbspy
  • .NET: dotnet-counters, dotnet-trace, PerfView
  • PHP: Xdebug, Blackfire, Tideways
  • C/C++: gprof, Valgrind, Intel VTune, perf

System Monitoring Tools

  • top/htop: Process monitoring
  • vmstat: Virtual memory statistics
  • iostat: I/O statistics
  • netstat/ss: Network statistics
  • sar: System activity reporter
  • dstat: Versatile resource statistics
  • nmon: Nigel's performance monitor

Visualization Tools

  • FlameGraph: CPU and memory flame graphs
  • perfetto: System tracing and performance visualization
  • grafana: Performance dashboard visualization
  • prometheus: Time-series monitoring and alerting
  • jaeger: Distributed tracing visualization

Analysis Techniques

CPU Profiling with perf

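# Sample all CPUs at 99 Hz with call graphs for 30 seconds (usually requires root)
perf record -F 99 -ag -- sleep 30

# Generate flame graph (stackcollapse-perf.pl and flamegraph.pl come from the FlameGraph repository)
perf script | stackcollapse-perf.pl | flamegraph.pl > flamegraph.svg

# Analyze hardware performance counters
perf stat -e cycles,instructions,cache-misses,branch-misses ./application

# Trace system calls (quote the glob so the shell does not expand it)
perf trace -e 'syscalls:sys_enter_*' ./application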
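
The folded stacks produced by stackcollapse-perf.pl can also be ranked in text form when an SVG is inconvenient. The sketch below assumes the usual folded format (semicolon-separated frames, a space, then a sample count, with the process name as the first frame); the function name `hot_functions` and the sample data are hypothetical.

```python
from collections import Counter

def hot_functions(folded_text, top=5):
    """Rank functions by inclusive share of samples in folded-stack output."""
    inclusive = Counter()
    total = 0
    for line in folded_text.splitlines():
        line = line.strip()
        if not line:
            continue
        stack, _, count = line.rpartition(" ")
        samples = int(count)
        total += samples
        # Skip the first element (assumed to be the process name) and count each
        # function once per stack so that recursion is not double counted.
        for frame in dict.fromkeys(stack.split(";")[1:]):
            inclusive[frame] += samples
    if total == 0:
        return []
    return [(name, 100.0 * n / total) for name, n in inclusive.most_common(top)]

# Hypothetical folded stacks for illustration only.
folded = """\
app;main;handleRequest;processPayment;sign 30
app;main;handleRequest;processPayment 5
app;main;handleRequest;validateTransaction 22
app;main;handleRequest;updateDatabase 18
app;main;idle 25
"""
ranking = hot_functions(folded, top=3)
for name, share in ranking:
    print(f"{name}: {share:.1f}%")
```

Inclusive percentages sum to more than 100% because a parent frame is charged for every sample taken in its children.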

eBPF Tracing with BCC

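from bcc import BPF

# eBPF program that measures how long each call to a traced function takes
bpf_text = """
#include <uapi/linux/ptrace.h>
#include <linux/sched.h>

struct data_t {
    u64 timestamp;
    u32 pid;
    char comm[TASK_COMM_LEN];
    u64 duration_ns;
};

BPF_HASH(start, u32);
BPF_PERF_OUTPUT(events);

int trace_entry(struct pt_regs *ctx) {
    // The lower 32 bits hold the thread ID; keying on it keeps concurrent threads apart
    u32 tid = bpf_get_current_pid_tgid();
    u64 ts = bpf_ktime_get_ns();

    start.update(&tid, &ts);
    return 0;
}

int trace_return(struct pt_regs *ctx) {
    u64 pid_tgid = bpf_get_current_pid_tgid();
    u32 tid = pid_tgid;
    u64 *tsp = start.lookup(&tid);

    if (tsp == 0) {
        return 0;  // entry probe was missed
    }

    u64 now = bpf_ktime_get_ns();

    struct data_t data = {};
    data.timestamp = now;
    data.pid = pid_tgid >> 32;  // the upper 32 bits hold the process ID (TGID)
    data.duration_ns = now - *tsp;
    bpf_get_current_comm(&data.comm, sizeof(data.comm));

    events.perf_submit(ctx, &data, sizeof(data));
    start.delete(&tid);

    return 0;
}
"""

# Attach to function entry and return.
# "name" is the path to the binary (or a library name); "sym" is the symbol to trace.
bpf = BPF(text=bpf_text)
bpf.attach_uprobe(name="application", sym="function_name", fn_name="trace_entry")
bpf.attach_uretprobe(name="application", sym="function_name", fn_name="trace_return")

# Print each event until interrupted with Ctrl-C
def print_event(cpu, data, size):
    event = bpf["events"].event(data)
    comm = event.comm.decode("utf-8", "replace")
    print(f"{event.pid} {comm} {event.duration_ns} ns")

bpf["events"].open_perf_buffer(print_event)
while True:
    try:
        bpf.perf_buffer_poll()
    except KeyboardInterrupt:
        break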

Memory Leak Detection

# Check for leaked allocations with Valgrind (expect a large slowdown)
valgrind --leak-check=full --show-leak-kinds=all ./application

# Track heap allocations with eBPF
/usr/share/bcc/tools/memleak -p $(pidof application)

# Sum the proportional set size (PSS) in kB; repeat periodically to track growth over time
awk '/^Pss:/ {total+=$2} END {print total " kB"}' /proc/$(pidof -s application)/smaps

# Monitor garbage collection (Java)
jstat -gc $(pidof java) 1s
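
Repeated memory readings such as the PSS totals above only indicate a leak when they trend upward. The sketch below fits a least-squares line through (time, memory) samples to estimate the growth rate; the function name and the readings are hypothetical, and a steady slope is a hint to investigate rather than proof of a leak.

```python
def growth_rate_mb_per_min(samples):
    """Least-squares slope of memory (MB) against time, in MB per minute.

    samples: list of (seconds_since_start, memory_mb) pairs.
    """
    if len(samples) < 2:
        raise ValueError("need at least two samples")
    n = len(samples)
    mean_t = sum(t for t, _ in samples) / n
    mean_m = sum(m for _, m in samples) / n
    var_t = sum((t - mean_t) ** 2 for t, _ in samples)
    if var_t == 0:
        raise ValueError("samples must span more than one timestamp")
    cov = sum((t - mean_t) * (m - mean_m) for t, m in samples)
    return cov / var_t * 60.0  # MB per second -> MB per minute

# Hypothetical readings taken every 5 minutes from a steadily growing process.
readings = [(0, 500.0), (300, 510.0), (600, 520.0), (900, 530.0), (1200, 540.0)]
rate = growth_rate_mb_per_min(readings)
print(f"{rate:.2f} MB/min")
```

A fitted slope is less sensitive to a single noisy reading than the difference between the first and last samples.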

Latency Analysis

def analyze_latency_distribution(latency_samples):
    """
    Analyze latency distribution and identify outliers.
    """
    import numpy as np

    latencies = np.asarray(latency_samples, dtype=float)
    if latencies.size == 0:
        raise ValueError("latency_samples must not be empty")
    
    analysis = {
        'count': len(latencies),
        'mean': np.mean(latencies),
        'median': np.median(latencies),
        'p90': np.percentile(latencies, 90),
        'p95': np.percentile(latencies, 95),
        'p99': np.percentile(latencies, 99),
        'std_dev': np.std(latencies),
        'min': np.min(latencies),
        'max': np.max(latencies),
        'outliers': []
    }
    
    # Identify outliers using IQR method
    q1 = np.percentile(latencies, 25)
    q3 = np.percentile(latencies, 75)
    iqr = q3 - q1
    lower_bound = q1 - 1.5 * iqr
    upper_bound = q3 + 1.5 * iqr
    
    outliers = latencies[(latencies < lower_bound) | (latencies > upper_bound)]
    analysis['outliers'] = outliers.tolist()
    analysis['outlier_percentage'] = len(outliers) / len(latencies) * 100
    
    return analysis
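
Summary percentiles can hide the shape of a distribution, for example a second mode caused by cache misses. A power-of-two histogram, similar in spirit to the output of bpftrace's hist(), makes that shape visible. The sketch below is a standard-library illustration; the function names and the sample values are hypothetical.

```python
from collections import Counter

def log2_histogram(latencies_ms):
    """Count samples per power-of-two bucket: [1, 2), [2, 4), [4, 8), ..."""
    buckets = Counter()
    for value in latencies_ms:
        if value < 1:
            lower = 0  # everything under 1 ms shares the first bucket
        else:
            lower = 1 << (int(value).bit_length() - 1)
        buckets[lower] += 1
    return dict(sorted(buckets.items()))

def render(histogram, width=40):
    """Render the histogram as text bars scaled to the largest bucket."""
    peak = max(histogram.values())
    lines = []
    for lower, count in histogram.items():
        upper = 1 if lower == 0 else lower * 2
        bar = "#" * max(1, round(width * count / peak))
        lines.append(f"[{lower:>5}, {upper:>5}) ms {count:>4} {bar}")
    return "\n".join(lines)

# Hypothetical latency samples in milliseconds.
samples = [3, 5, 6, 7, 9, 12, 14, 15, 80, 85, 90, 320, 1200]
hist = log2_histogram(samples)
print(render(hist))
```

Logarithmic buckets keep the fast majority and the slow tail readable in the same chart, which linear buckets rarely manage.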

Examples

# Profile CPU usage for 60 seconds
npm run performance-profiling:cpu -- --duration 60 --output cpu-profile.json

# Generate flame graph
npm run performance-profiling:flamegraph -- --pid $(pidof application) --output flamegraph.svg

# Analyze memory leaks
npm run performance-profiling:memory -- --application myapp --leak-check

# Trace database queries
npm run performance-profiling:database -- --database postgresql --duration 300

# Profile API endpoints
npm run performance-profiling:api -- --endpoints "/api/*" --duration 60 --output api-performance.json

# Compare performance before/after changes
npm run performance-profiling:compare -- --before baseline.json --after new-version.json --output comparison.json

# Analyze system resource usage
npm run performance-profiling:system -- --metrics cpu,memory,disk,network --duration 300

# Create performance dashboard
npm run performance-profiling:dashboard -- --metrics all --interval 1s --duration 3600

# Detect bottlenecks in microservices
npm run performance-profiling:microservices -- --services auth,payment,notification --duration 600

# Optimize configuration based on profiling
npm run performance-profiling:optimize -- --profile profile.json --output optimizations.md

# Monitor production performance
npm run performance-profiling:monitor -- --production --alert-threshold p95:200ms

Output format

Performance Profiling Report:

Performance Profiling Report
────────────────────────────
System: payment-processing-service
Analysis Date: 2026-02-26
Duration: 300 seconds
Profiling Tools: perf, eBPF, Application Profiler

Executive Summary:
⚠️ Performance issues detected: 3 critical, 2 warnings
✅ System resources: Within normal limits
📊 Overall performance score: 72/100

Critical Issues:
1. ❌ Database query bottleneck (Severity: Critical)
   • Query: SELECT * FROM transactions WHERE user_id = ?
   • Average latency: 450ms (p95: 1200ms)
   • Frequency: 1200 executions/minute
   • Root cause: Missing index on user_id column
   • Impact: 40% of API latency
   • Recommendation: Add index on transactions.user_id

2. ❌ Memory leak in cache service (Severity: Critical)
   • Service: redis-cache-service
   • Memory growth: 2MB/minute
   • Total leaked: 120MB over 1 hour
   • Pattern: Cache entries not expired properly
   • Recommendation: Implement TTL and LRU eviction

3. ❌ CPU contention in payment processor (Severity: Critical)
   • Function: processPayment() in payment-service
   • CPU usage: 85% during peak
   • Bottleneck: Cryptographic operations
   • Recommendation: Implement caching or hardware acceleration

Warnings:
1. ⚠️ API endpoint latency degradation (Severity: Warning)
   • Endpoint: POST /api/v1/payments
   • p95 latency increase: 150ms → 320ms (+113%)
   • Timeframe: Last 7 days
   • Recommendation: Profile endpoint and optimize

2. ⚠️ Garbage collection pauses (Severity: Warning)
   • Application: notification-service (Java)
   • GC pauses: 45ms average, 120ms max
   • Frequency: Every 30 seconds
   • Recommendation: Tune JVM garbage collector

System Resource Analysis:
┌──────────────────────┬────────────┬────────────┬────────────┐
│ Resource             │ Usage      │ Threshold  │ Status     │
├──────────────────────┼────────────┼────────────┼────────────┤
│ CPU                  │ 65%        │ 80%        │ ✅ Normal  │
│ Memory               │ 72%        │ 85%        │ ✅ Normal  │
│ Disk I/O             │ 45%        │ 70%        │ ✅ Normal  │
│ Network              │ 38%        │ 60%        │ ✅ Normal  │
│ Database Connections │ 85%        │ 90%        │ ⚠️ Warning │
└──────────────────────┴────────────┴────────────┴────────────┘

Application Performance:
• API Response Times:
  - p50: 85ms ✅
  - p95: 320ms ⚠️
  - p99: 1200ms ❌
  - Success Rate: 99.8% ✅

• Database Performance:
  - Query Cache Hit Rate: 65% ⚠️
  - Average Query Time: 85ms ✅
  - Slow Queries (>100ms): 12% ⚠️
  - Connection Pool Usage: 85% ⚠️

• Cache Performance:
  - Redis Hit Rate: 92% ✅
  - Cache Latency: 3ms ✅
  - Memory Usage: 78% ⚠️
  - Eviction Rate: 5% ✅

Flame Graph Analysis:
• Hot Functions:
  1. processPayment() - 35% CPU time
  2. validateTransaction() - 22% CPU time
  3. updateDatabase() - 18% CPU time
  4. sendNotification() - 8% CPU time
  5. logActivity() - 5% CPU time

• Optimization Opportunities:
  1. Cache validation results (potential 15% improvement)
  2. Batch database updates (potential 10% improvement)
  3. Async notifications (potential 8% improvement)

Memory Analysis:
• Heap Usage: 2.4GB
• Stack Usage: 320MB
• Native Memory: 450MB
• Garbage Collection:
  - Young GC: 45ms every 30s
  - Full GC: 120ms every 5min
  - Throughput: 98.5%

I/O Analysis:
• Disk Read: 45MB/s (average)
• Disk Write: 28MB/s (average)
• File Descriptors: 1250/4096 (31%)
• Network Throughput:
  - Inbound: 85Mbps
  - Outbound: 120Mbps
  - Connections: 850 active

Bottleneck Timeline:
┌─────────────────────────────────────────────────────────────┐
│ Bottleneck Timeline (Last 60 minutes)                       │
│                                                             │
│ 00:00 ┼───────┬──────────────┬─────────────┬────────────── │
│       │ CPU   │ Database     │ Memory      │ Network       │
│ 15:00 ┼───────┼──────────────┼─────────────┼────────────── │
│       │ ███   │ █████████    │ ███         │ ██            │
│ 30:00 ┼───────┼──────────────┼─────────────┼────────────── │
│       │ █████ │ ████████████ │ █████       │ ███           │
│ 45:00 ┼───────┼──────────────┼─────────────┼────────────── │
│       │ ██████│ █████████████│ ███████     │ ████          │
│ 60:00 ┼───────┴──────────────┴─────────────┴────────────── │
│      0%                   50%                   100%       │
└─────────────────────────────────────────────────────────────┘

Optimization Recommendations:
1. Immediate (High Impact):
   • Add database index on transactions.user_id
   • Implement cache TTL for redis-cache-service
   • Optimize processPayment() cryptographic operations

2. Short-term (Medium Impact):
   • Implement connection pooling for database
   • Add query caching for frequent queries
   • Batch database writes where possible

3. Long-term (Architectural):
   • Implement read replicas for database
   • Add CDN for static assets
   • Implement circuit breakers for external services

Performance Metrics Baseline:
• CPU Usage: < 70% target
• Memory Usage: < 80% target
• API p95 Latency: < 200ms target
• Database Query Time: < 100ms target
• Cache Hit Rate: > 90% target

Monitoring Configuration:
• Alert on: p95 latency > 200ms
• Alert on: CPU usage > 80% for 5 minutes
• Alert on: Memory usage > 85%
• Alert on: Error rate > 1%
• Dashboard: Real-time performance metrics

Next Steps:
1. Implement database index (estimate: 2 hours)
2. Fix memory leak in cache service (estimate: 4 hours)
3. Optimize payment processor CPU usage (estimate: 8 hours)
4. Deploy optimizations with feature flags
5. Monitor performance for 24 hours
6. Schedule performance regression tests

JSON Output Format:

{
  "analysis": {
    "system": "payment-processing-service",
    "analysis_date": "2026-02-26",
    "duration_seconds": 300,
    "profiling_tools": ["perf", "ebpf", "application_profiler"],
    "overall_score": 72
  },
  "critical_issues": [
    {
      "id": "issue-db-001",
      "description": "Database query bottleneck",
      "severity": "critical",
      "component": "database",
      "metric": "query_latency",
      "average_value": 450,
      "p95_value": 1200,
      "unit": "ms",
      "frequency": "1200 executions/minute",
      "root_cause": "Missing index on user_id column",
      "impact": "40% of API latency",
      "recommendation": "Add index on transactions.user_id",
      "estimated_effort_hours": 2,
      "priority": "high"
    },
    {
      "id": "issue-memory-001",
      "description": "Memory leak in cache service",
      "severity": "critical",
      "component": "cache",
      "metric": "memory_growth",
      "average_value": 2,
      "unit": "MB/minute",
      "total_leaked": 120,
      "total_leaked_unit": "MB",
      "timeframe": "1 hour",
      "pattern": "Cache entries not expired properly",
      "recommendation": "Implement TTL and LRU eviction",
      "estimated_effort_hours": 4,
      "priority": "high"
    }
  ],
  "system_resources": {
    "cpu": {
      "usage_percentage": 65,
      "threshold": 80,
      "status": "normal",
      "breakdown": {
        "user": 45,
        "system": 20,
        "iowait": 8,
        "steal": 2
      }
    },
    "memory": {
      "usage_percentage": 72,
      "threshold": 85,
      "status": "normal",
      "breakdown": {
        "heap": 2400,
        "stack": 320,
        "native": 450,
        "cached": 1200
      }
    },
    "disk_io": {
      "usage_percentage": 45,
      "threshold": 70,
      "status": "normal",
      "read_mbps": 45,
      "write_mbps": 28
    },
    "network": {
      "usage_percentage": 38,
      "threshold": 60,
      "status": "normal",
      "inbound_mbps": 85,
      "outbound_mbps": 120,
      "connections": 850
    }
  },
  "application_performance": {
    "api_response_times": {
      "p50_ms": 85,
      "p95_ms": 320,
      "p99_ms": 1200,
      "success_rate": 99.8
    },
    "database_performance": {
      "query_cache_hit_rate": 65,
      "average_query_time_ms": 85,
      "slow_queries_percentage": 12,
      "connection_pool_usage": 85
    },
    "cache_performance": {
      "hit_rate": 92,
      "latency_ms": 3,
      "memory_usage_percentage": 78,
      "eviction_rate": 5
    }
  },
  "flame_graph_analysis": {
    "hot_functions": [
      {
        "function": "processPayment",
        "cpu_percentage": 35,
        "optimization_opportunity": "Cache validation results"
      },
      {
        "function": "validateTransaction",
        "cpu_percentage": 22,
        "optimization_opportunity": "Batch validation"
      }
    ],
    "optimization_opportunities": [
      {
        "description": "Cache validation results",
        "estimated_improvement": 15,
        "effort_hours": 8
      },
      {
        "description": "Batch database updates",
        "estimated_improvement": 10,
        "effort_hours": 6
      }
    ]
  },
  "optimization_recommendations": {
    "immediate": [
      "Add database index on transactions.user_id",
      "Implement cache TTL for redis-cache-service",
      "Optimize processPayment() cryptographic operations"
    ],
    "short_term": [
      "Implement connection pooling for database",
      "Add query caching for frequent queries",
      "Batch database writes where possible"
    ],
    "long_term": [
      "Implement read replicas for database",
      "Add CDN for static assets",
      "Implement circuit breakers for external services"
    ]
  },
  "performance_baseline": {
    "cpu_usage_target": 70,
    "memory_usage_target": 80,
    "api_p95_latency_target": 200,
    "database_query_time_target": 100,
    "cache_hit_rate_target": 90
  },
  "next_steps": [
    {
      "action": "Implement database index",
      "estimate_hours": 2,
      "priority": "high"
    },
    {
      "action": "Fix memory leak in cache service",
      "estimate_hours": 4,
      "priority": "high"
    },
    {
      "action": "Optimize payment processor CPU usage",
      "estimate_hours": 8,
      "priority": "medium"
    }
  ]
}
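
A report in this shape can be checked against its own `performance_baseline` block by a script. The sketch below is a minimal illustration that embeds a trimmed copy of the sample report; the `CHECKS` mapping and the function names are assumptions about how a consumer might pair targets with measurements, not part of the skill's tooling.

```python
import json

# Trimmed copy of the sample report above, reduced to the fields used here.
report_json = """
{
  "system_resources": {
    "cpu": {"usage_percentage": 65},
    "memory": {"usage_percentage": 72}
  },
  "application_performance": {
    "api_response_times": {"p95_ms": 320},
    "database_performance": {"average_query_time_ms": 85},
    "cache_performance": {"hit_rate": 92}
  },
  "performance_baseline": {
    "cpu_usage_target": 70,
    "memory_usage_target": 80,
    "api_p95_latency_target": 200,
    "database_query_time_target": 100,
    "cache_hit_rate_target": 90
  }
}
"""

# (baseline key, path to the measured value, True when higher is better)
CHECKS = [
    ("cpu_usage_target", ("system_resources", "cpu", "usage_percentage"), False),
    ("memory_usage_target", ("system_resources", "memory", "usage_percentage"), False),
    ("api_p95_latency_target",
     ("application_performance", "api_response_times", "p95_ms"), False),
    ("database_query_time_target",
     ("application_performance", "database_performance", "average_query_time_ms"), False),
    ("cache_hit_rate_target",
     ("application_performance", "cache_performance", "hit_rate"), True),
]

def lookup(document, path):
    """Follow a sequence of keys into nested dictionaries."""
    for key in path:
        document = document[key]
    return document

def baseline_violations(report):
    """Return (target key, measured value, target) for every missed target."""
    violations = []
    for target_key, path, higher_is_better in CHECKS:
        target = report["performance_baseline"][target_key]
        actual = lookup(report, path)
        ok = actual >= target if higher_is_better else actual <= target
        if not ok:
            violations.append((target_key, actual, target))
    return violations

violations = baseline_violations(json.loads(report_json))
print(violations)
```

With the sample values, only the API p95 latency misses its target, which agrees with the warning in the text report.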

Performance Dashboard:

Performance Dashboard
────────────────────
Status: ACTIVE
Last Update: 2026-02-26 19:45:00
Update Interval: 1 second

Real-time Metrics:
┌────────────────────┬────────────┬────────────┬────────────┐
│ Metric             │ Current    │ 1min Avg   │ Trend      │
├────────────────────┼────────────┼────────────┼────────────┤
│ CPU Usage          │ 65%        │ 62%        │ ↗️ Rising   │
│ Memory Usage       │ 72%        │ 71%        │ → Stable   │
│ API Latency (p95)  │ 320ms      │ 310ms      │ ↗️ Rising   │
│ Database Latency   │ 85ms       │ 82ms       │ → Stable   │
│ Cache Hit Rate     │ 92%        │ 91%        │ ↘️ Falling  │
│ Error Rate         │ 0.2%       │ 0.3%       │ ↘️ Falling  │
└────────────────────┴────────────┴────────────┴────────────┘

Alerts:
• ⚠️  API p95 latency above threshold (200ms): 320ms
• ✅  CPU usage within limits
• ✅  Memory usage within limits
• ⚠️  Database connections approaching limit (85%)

Hotspots:
1. processPayment(): 35% CPU (🔥 Hot)
2. validateTransaction(): 22% CPU (⚠️ Warm)
3. updateDatabase(): 18% CPU (⚠️ Warm)

Resource Utilization Trend:
CPU:    ████████████████████████████████████░░░░ 65%
Memory: ██████████████████████████████████████░░ 72%
Disk:   █████████████████████░░░░░░░░░░░░░░░░░░░ 45%
Network:████████████████░░░░░░░░░░░░░░░░░░░░░░░░ 38%

Recent Events:
• 19:40: Database query slowdown detected
• 19:35: Cache miss rate increased by 15%
• 19:30: API latency spike (p95: 450ms)
• 19:25: Memory usage increased by 2%

Recommendations:
1. Add index on transactions.user_id (pending)
2. Implement cache TTL (in progress)
3. Optimize payment processor (planned)

Performance Score: 72/100
Status: Needs Improvement

Notes

  • Profile in production-like environments for accurate results
  • Use appropriate sampling rates to balance overhead and accuracy
  • Compare against baselines to identify regressions
  • Monitor profiling overhead to avoid affecting production performance
  • Use flame graphs for visual bottleneck identification
  • Combine multiple tools for comprehensive analysis
  • Profile representative workloads that match production usage
  • Consider security implications of profiling in production
  • Document profiling methodology for reproducibility
  • Automate performance regression testing in CI/CD pipelines
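
The notes on comparing against baselines and automating regression tests can be combined into a simple CI gate. The sketch below flags any latency metric that grew by more than a tolerance relative to the baseline; the function name, the 10% tolerance, and the metric values are hypothetical choices for illustration.

```python
def find_regressions(baseline, candidate, tolerance_pct=10.0):
    """Return metrics that grew by more than tolerance_pct (lower is better)."""
    regressions = {}
    for name, old in baseline.items():
        new = candidate.get(name)
        if new is None or old <= 0:
            continue  # metric missing or baseline unusable; skip it
        change = 100.0 * (new - old) / old
        if change > tolerance_pct:
            regressions[name] = round(change, 1)
    return regressions

# Hypothetical measurements from a baseline build and a candidate build.
baseline = {"p50_ms": 80, "p95_ms": 150, "p99_ms": 1100}
candidate = {"p50_ms": 85, "p95_ms": 320, "p99_ms": 1200}
regressions = find_regressions(baseline, candidate)
print(regressions)
# A CI job would typically exit with a non-zero status when this dict is non-empty.
```

A relative tolerance absorbs ordinary run-to-run noise; measuring that noise on the baseline first helps pick a threshold that neither hides regressions nor fails builds at random.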

Source Transparency

This detail page is rendered from real SKILL.md content. Trust labels are metadata-based hints, not a safety guarantee.
