tokio-troubleshooting

Tokio Troubleshooting

This skill provides techniques for debugging and troubleshooting async applications built with Tokio.

Using tokio-console for Runtime Inspection

Monitor the async runtime in real time:

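# In Cargo.toml
# Note: tokio must be built with its "tracing" feature and with
# RUSTFLAGS="--cfg tokio_unstable" for tokio-console to see task data.
[dependencies]
console-subscriber = "0.2"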

// In main.rs
fn main() {
    console_subscriber::init();

    tokio::runtime::Builder::new_multi_thread()
        .enable_all()
        .build()
        .unwrap()
        .block_on(async {
            run_application().await
        });
}

Run the console in a separate terminal:

tokio-console

Key metrics to monitor:

Task spawn rate and total tasks
Poll duration per task
Idle vs. busy time
Waker operations
Resource utilization

Identifying issues:

Long poll durations: CPU-intensive work in async context
Many wakers: Potential contention or inefficient polling
Growing task count: Task leak or unbounded spawning
High idle time: Tasks are mostly waiting (on I/O, timers, locks, or channels); normal under light load, suspicious when throughput is low

Debugging Deadlocks and Hangs

Detect and resolve deadlock situations:

Common Deadlock Pattern

use std::sync::Arc;
use std::time::Duration;
use tokio::sync::Mutex;

// BAD: Potential deadlock (the two tasks take the locks in opposite order)
async fn deadlock_example() {
    let mutex1 = Arc::new(Mutex::new(()));
    let mutex2 = Arc::new(Mutex::new(()));

    let m1 = mutex1.clone();
    let m2 = mutex2.clone();
    tokio::spawn(async move {
        let _g1 = m1.lock().await;
        tokio::time::sleep(Duration::from_millis(10)).await;
        let _g2 = m2.lock().await; // May deadlock
    });

    let _g2 = mutex2.lock().await;
    tokio::time::sleep(Duration::from_millis(10)).await;
    let _g1 = mutex1.lock().await; // May deadlock
}

// GOOD: Consistent lock ordering
async fn no_deadlock_example() {
    let mutex1 = Arc::new(Mutex::new(()));
    let mutex2 = Arc::new(Mutex::new(()));

    // Always acquire locks in the same order, in every task
    let _g1 = mutex1.lock().await;
    let _g2 = mutex2.lock().await;
}
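One way to make the ordering rule hard to break is to derive the order from the mutexes themselves, for example by address. The sketch below is an illustration, not part of the original skill: `lock_both`, `transfer`, and `run_transfers` are hypothetical names, and it uses `std::sync::Mutex` with threads so that it runs without a runtime. The same idea should carry over to `tokio::sync::Mutex` by awaiting the locks in the chosen order.

```rust
use std::sync::{Arc, Mutex, MutexGuard};
use std::thread;

/// Hypothetical helper: locks two distinct mutexes lowest address first,
/// and returns the guards in the same order as the arguments.
fn lock_both<'a, T>(
    a: &'a Arc<Mutex<T>>,
    b: &'a Arc<Mutex<T>>,
) -> (MutexGuard<'a, T>, MutexGuard<'a, T>) {
    assert!(!Arc::ptr_eq(a, b), "locking the same mutex twice would deadlock");
    if Arc::as_ptr(a) < Arc::as_ptr(b) {
        let ga = a.lock().unwrap();
        let gb = b.lock().unwrap();
        (ga, gb)
    } else {
        let gb = b.lock().unwrap();
        let ga = a.lock().unwrap();
        (ga, gb)
    }
}

/// Moves `amount` between two balances while holding both locks.
fn transfer(from: &Arc<Mutex<i64>>, to: &Arc<Mutex<i64>>, amount: i64) {
    let (mut src, mut dst) = lock_both(from, to);
    *src -= amount;
    *dst += amount;
}

/// Runs opposing transfers on two threads and returns the final balances.
fn run_transfers() -> (i64, i64) {
    let a = Arc::new(Mutex::new(100));
    let b = Arc::new(Mutex::new(100));
    let (a2, b2) = (a.clone(), b.clone());

    // The spawned thread moves a -> b while this thread moves b -> a.
    let worker = thread::spawn(move || {
        for _ in 0..1000 {
            transfer(&a2, &b2, 1);
        }
    });
    for _ in 0..1000 {
        transfer(&b, &a, 1);
    }
    worker.join().unwrap();

    let balances = (*a.lock().unwrap(), *b.lock().unwrap());
    balances
}

fn main() {
    println!("{:?}", run_transfers());
}
```

Although the two threads name the mutexes in opposite order, both end up locking the lower address first, so neither can hold one lock while waiting for the other.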

// BETTER: Avoid nested locks
use tokio::sync::mpsc;

async fn best_example() {
    // Use message passing instead
    let (tx, mut rx) = mpsc::channel(10);

    tokio::spawn(async move {
        while let Some(msg) = rx.recv().await {
            process_message(msg).await;
        }
    });

    tx.send(message).await.unwrap();
}

Detecting Hangs with Timeouts

use tokio::time::{timeout, Duration};

async fn detect_hang() {
    match timeout(Duration::from_secs(5), potentially_hanging_operation()).await {
        Ok(result) => println!("Completed: {:?}", result),
        Err(_) => {
            eprintln!("Operation timed out - potential hang detected");
            // Log stack traces, metrics, etc.
        }
    }
}

Deadlock Detection with try_lock

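use std::time::Duration;
use tokio::sync::Mutex;

// Assumes State: Clone. Repeated try_lock failures show that the lock is held
// for a long time; that suggests, but does not prove, a deadlock.
async fn try_with_timeout(mutex: &Mutex<State>) -> Option<State> {
    for _ in 0..10 {
        if let Ok(guard) = mutex.try_lock() {
            return Some(guard.clone());
        }
        tokio::time::sleep(Duration::from_millis(10)).await;
    }
    eprintln!("Failed to acquire lock - possible deadlock");
    None
}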

Memory Leak Detection

Identify and fix memory leaks:

Task Leaks

// BAD: Tasks never complete, and new ones are spawned without bound
async fn leaking_tasks() {
    loop {
        tokio::spawn(async {
            loop {
                // Never exits
                tokio::time::sleep(Duration::from_secs(1)).await;
            }
        });
    }
}

// GOOD: Bounded number of tasks, each with an exit condition
async fn proper_tasks(shutdown: broadcast::Receiver<()>) {
    // Spawn a fixed number of workers (4 is illustrative) instead of looping forever
    for _ in 0..4 {
        let mut shutdown_rx = shutdown.resubscribe();
        tokio::spawn(async move {
            loop {
                tokio::select! {
                    _ = shutdown_rx.recv() => break,
                    _ = tokio::time::sleep(Duration::from_secs(1)) => {
                        // Work
                    }
                }
            }
        });
    }
}

Arc Cycles

// BAD: Reference cycle
struct Node {
    next: Option<Arc<Mutex<Node>>>,
    prev: Option<Arc<Mutex<Node>>>, // Creates cycle!
}

// GOOD: Use weak references
use std::sync::Weak;

struct Node {
    next: Option<Arc<Mutex<Node>>>,
    prev: Option<Weak<Mutex<Node>>>, // Weak reference breaks cycle
}
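The effect of the weak back-pointer can be checked with reference counts. The sketch below is illustrative only: it uses `std::sync::Mutex` so that it runs without a runtime, and the function names are hypothetical.

```rust
use std::sync::{Arc, Mutex, Weak};

struct Node {
    next: Option<Arc<Mutex<Node>>>,
    prev: Option<Weak<Mutex<Node>>>, // Weak: does not keep the previous node alive
}

/// Builds a linked head and tail pair.
fn build_pair() -> (Arc<Mutex<Node>>, Arc<Mutex<Node>>) {
    let head = Arc::new(Mutex::new(Node { next: None, prev: None }));
    let tail = Arc::new(Mutex::new(Node {
        next: None,
        prev: Some(Arc::downgrade(&head)),
    }));
    head.lock().unwrap().next = Some(tail.clone());
    (head, tail)
}

/// Returns (strong count of head, strong count of tail as seen through head.next).
fn counts_while_linked() -> (usize, usize) {
    let (head, _tail) = build_pair();
    let head_count = Arc::strong_count(&head);
    let tail_count = head
        .lock()
        .unwrap()
        .next
        .as_ref()
        .map(Arc::strong_count)
        .unwrap_or(0);
    (head_count, tail_count)
}

/// Returns true if tail.prev can no longer be upgraded once head is dropped.
fn prev_released_after_drop() -> bool {
    let (head, tail) = build_pair();
    drop(head);
    let guard = tail.lock().unwrap();
    let released = guard.prev.as_ref().and_then(|weak| weak.upgrade()).is_none();
    released
}

fn main() {
    println!("counts while linked: {:?}", counts_while_linked());
    println!("prev released after drop: {}", prev_released_after_drop());
}
```

The head's strong count stays at 1 because the tail holds only a `Weak` to it, so dropping the head frees it. With an `Arc` in `prev`, the count would be 2 and neither node would ever be freed.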

Monitoring Memory Usage

// With sysinfo 0.30 and later, import only `System`; the `SystemExt` trait was removed.
use std::time::Duration;
use sysinfo::{System, SystemExt};

// Reports system-wide memory, not just this process.
pub async fn memory_monitor() {
    let mut system = System::new_all();
    let mut interval = tokio::time::interval(Duration::from_secs(60));

    loop {
        interval.tick().await;
        system.refresh_memory();

        let used = system.used_memory();
        let total = system.total_memory();
        let percent = (used as f64 / total as f64) * 100.0;

        tracing::info!(
            used_mb = used / 1024 / 1024,
            total_mb = total / 1024 / 1024,
            percent = %format!("{:.2}", percent),
            "Memory usage"
        );

        if percent > 80.0 {
            tracing::warn!("High memory usage detected");
        }
    }
}

Performance Profiling with Tracing

Instrument code for performance analysis:

use tracing::{info_span, instrument, Instrument};

#[instrument]
async fn process_request(id: u64) -> Result<Response, Error> {
    // Do not hold a Span::enter() guard across an .await: the span would stay
    // entered while other tasks run on the thread. Attach the span to the
    // future with .instrument() instead.
    let data = fetch_from_database(id)
        .instrument(info_span!("database_query"))
        .await?;

    let result = transform_data(data)
        .instrument(info_span!("transformation"))
        .await?;

    Ok(Response { result })
}

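// Configure a subscriber: formatted output, filtered by RUST_LOG.
// For flame graphs, a dedicated layer (for example from the tracing-flame crate)
// would be added to the registry as well.
use tracing_subscriber::layer::SubscriberExt;
use tracing_subscriber::util::SubscriberInitExt; // provides .init()

fn init_tracing() {
    let fmt_layer = tracing_subscriber::fmt::layer();
    let filter_layer = tracing_subscriber::EnvFilter::from_default_env();

    tracing_subscriber::registry()
        .with(filter_layer)
        .with(fmt_layer)
        .init();
}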

Understanding Panic Messages

Common async panic patterns:

Panics in Spawned Tasks

// Panic is isolated to the task
tokio::spawn(async {
    panic!("This won't crash the program");
});

// To catch panics
let handle = tokio::spawn(async {
    // Work that might panic
});

match handle.await {
    Ok(result) => println!("Success: {:?}", result),
    Err(e) if e.is_panic() => {
        eprintln!("Task panicked: {:?}", e);
        // Handle panic
    }
    Err(e) => eprintln!("Task cancelled: {:?}", e),
}

Send + 'static Errors

use std::rc::Rc;
use std::sync::Arc;

// ERROR: future cannot be sent between threads safely
async fn bad_example() {
    let rc = Rc::new(5); // Rc is !Send
    tokio::spawn(async move {
        println!("{}", rc); // Error!
    });
}

// FIX: Use Arc instead
async fn good_example() {
    let rc = Arc::new(5); // Arc is Send
    tokio::spawn(async move {
        println!("{}", rc); // OK
    });
}

// ERROR: async block may outlive the current function, but it borrows `data`
async fn lifetime_error() {
    let data = String::from("hello");
    tokio::spawn(async {
        println!("{}", data); // Error: data is borrowed, but the task must be 'static
    });
}

// FIX: Move ownership
async fn lifetime_fixed() {
    let data = String::from("hello");
    tokio::spawn(async move {
        println!("{}", data); // OK: data is moved
    });
}

Common Error Patterns and Solutions

Blocking in Async Context

// PROBLEM: Detected with tokio-console (long poll time)
async fn blocking_example() {
    std::thread::sleep(Duration::from_secs(1)); // Blocks thread!
}

// SOLUTION
async fn non_blocking_example() {
    tokio::time::sleep(Duration::from_secs(1)).await; // Yields control
}

// For unavoidable blocking
async fn necessary_blocking() {
    tokio::task::spawn_blocking(|| {
        expensive_cpu_work()
    })
    .await
    .unwrap();
}

Channel Closed Errors

// PROBLEM: SendError because receiver dropped
async fn send_error_example() {
    let (tx, rx) = mpsc::channel(10);
    drop(rx); // Receiver dropped

    match tx.send(42).await {
        Ok(_) => println!("Sent"),
        Err(e) => eprintln!("Send failed: {}", e), // Channel closed
    }
}

// SOLUTION: Keep the receiver alive, or handle the send error
async fn handle_closed_channel() {
    let (tx, mut rx) = mpsc::channel(10); // rx must be mutable to call recv()

    tokio::spawn(async move {
        // Receiver keeps channel open
        while let Some(msg) = rx.recv().await {
            process(msg).await;
        }
    });

    // Or handle the error
    if let Err(e) = tx.send(42).await {
        tracing::warn!("Channel closed: {}", e);
        // Cleanup or alternative action
    }
}

Task Cancellation

// PROBLEM: Task cancelled unexpectedly
let handle = tokio::spawn(async {
    // Long-running work
});

handle.abort(); // Cancels the task at its next .await point

// SOLUTION: Handle cancellation gracefully
let handle = tokio::spawn(async {
    let result = tokio::select! {
        result = do_work() => result,
        _ = tokio::signal::ctrl_c() => {
            cleanup().await;
            return Err(Error::Cancelled);
        }
    };
    result
});

Testing Async Code Effectively

Write reliable async tests:

#[tokio::test]
async fn test_with_timeout() {
    tokio::time::timeout(Duration::from_secs(5), async {
        let result = my_async_function().await;
        assert!(result.is_ok());
    })
    .await
    .expect("Test timed out");
}

#[tokio::test]
async fn test_concurrent_access() {
    let shared = Arc::new(Mutex::new(0));

    let handles: Vec<_> = (0..10)
        .map(|_| {
            let shared = shared.clone();
            tokio::spawn(async move {
                let mut lock = shared.lock().await;
                *lock += 1;
            })
        })
        .collect();

    for handle in handles {
        handle.await.unwrap();
    }

    assert_eq!(*shared.lock().await, 10);
}

// Test with mocked time (start_paused requires tokio's "test-util" feature)
#[tokio::test(start_paused = true)]
async fn test_with_time_control() {
    let start = tokio::time::Instant::now();

    tokio::time::sleep(Duration::from_secs(100)).await;

    // The clock is paused and auto-advances, so the sleep returns immediately
    // in real time while tokio's Instant reports the full 100 seconds.
    assert!(start.elapsed() >= Duration::from_secs(100));
}

Debugging Checklist

When troubleshooting async issues:

Use tokio-console to monitor runtime behavior
Check for blocking operations with tracing
Verify all locks are released properly
Look for task leaks (growing task count)
Monitor memory usage over time
Add timeouts to detect hangs
Check for channel closure errors
Verify Send + 'static bounds are satisfied
Use try_lock to detect potential deadlocks
Profile with tracing for performance bottlenecks
Test time-based code with paused time (#[tokio::test(start_paused = true)])
Check for Arc cycles and break them with Weak references

Helpful Tools

tokio-console: Real-time async runtime monitoring
tracing: Structured logging and profiling
cargo-flamegraph: Generate flame graphs
valgrind/heaptrack: Memory profiling
perf: CPU profiling on Linux
Instruments: Profiling on macOS

Best Practices

Always use tokio-console in development
Add tracing spans to critical code paths
Use timeouts liberally to detect hangs
Monitor task count for leaks
Profile before optimizing - measure first
Test with real concurrency - don't just test happy paths
Handle cancellation gracefully in all tasks
Use structured logging for debugging
Avoid nested locks - prefer message passing
Document lock ordering when necessary
