cpp-embedded

Expert guidance for writing C (C99/C11) and C++ (C++17) code for embedded systems and microcontrollers. Use this skill whenever the user is working with: STM32, ESP32, Arduino, PIC, AVR, nRF52, or any other MCU; FreeRTOS, Zephyr, ThreadX, or any RTOS; bare-metal firmware; hardware registers, DMA, interrupts, or memory-mapped I/O; memory pools, allocators, or fixed-size buffers; MISRA C or MISRA C++ compliance; smart pointers or RAII in embedded contexts; stack vs heap decisions; placement new; volatile correctness; alignment and struct packing; C99/C11 patterns; C and C++ interoperability; debugging firmware crashes, HardFaults, stack overflows, or heap corruption; firmware architecture decisions (superloop vs RTOS vs event-driven); low-power modes (WFI/WFE/sleep); CubeMX project setup; HAL vs LL driver selection; CI/CD for firmware; embedded code review; MPU configuration; watchdog strategies; safety-critical design (IEC 61508, SIL); peripheral protocol selection (UART/I2C/SPI/CAN); linker script memory placement; or C/C++ callback patterns. Also trigger on implicit cues like "my MCU keeps crashing", "writing firmware", "ISR safe", "embedded allocator", "no dynamic memory", "power consumption", "CubeMX regenerated my code", "which RTOS pattern should I use", "MPU fault", "watchdog keeps resetting", "which protocol should I use for my sensor", "ESP32 deep sleep", "PSRAM vs DRAM", "ESP32 heap keeps shrinking", "ESP.getFreeHeap()", "task stack overflow on ESP32", or "WiFi reconnect after deep sleep is slow".

Safety Notice

This listing is imported from skills.sh public index metadata. Review upstream SKILL.md and repository scripts before running.

Copy this and send it to your AI assistant to learn

Install skill "cpp-embedded" with this command: npx skills add bmad-labs/skills/bmad-labs-skills-cpp-embedded

Embedded C and C++ Skill

Quick navigation

  • Memory patterns, ESP32 heap fragmentation, ETL containers → references/memory-patterns.md
  • C99/C11 patterns, UART read-to-clear, _Static_assert → references/c-patterns.md
  • Debugging workflows (stack overflow, heap corruption, HardFault) → references/debugging.md
  • Coding style and conventions → references/coding-style.md
  • Firmware architecture, RTOS IPC, low-power, ESP32 deep sleep, CI/CD → references/architecture.md
  • STM32 pitfalls, CubeMX, hard-to-debug issues, code review → references/stm32-pitfalls.md
  • Design patterns, SOLID for embedded, HAL design, C/C++ callbacks → references/design-patterns.md
  • MPU protection, watchdog hierarchy, safety-critical, protocols, linker scripts → references/safety-hardware.md

Embedded Memory Mindset

Embedded systems have no OS safety net. A bad pointer dereference doesn't produce a polite segfault — it silently corrupts memory, triggers a HardFault hours later, or hangs in an ISR. The stakes of every allocation decision are higher than in hosted environments.

Three principles govern embedded memory:

Determinism over convenience. Dynamic allocation (malloc/new) is non-deterministic in both time and failure mode. MISRA C Rule 21.3 and MISRA C++ Rule 21.6.1 ban dynamic memory after initialization. Even outside MISRA, avoid heap allocation in production paths.

Size is known at compile time. Embedded software has a fixed maximum number of each object type. Design around this. If you need 8 UART message buffers, declare 8 at compile time. Don't discover the maximum at runtime.

ISRs are sacred ground. Never allocate, never block, never call non-reentrant functions from an ISR. Keep ISRs minimal — set a flag or write to a ring buffer, then do the real work in a task.


Allocation Decision Table

NeedSolutionNotes
Short-lived local dataStackKeep < 256 bytes per frame; profile with -fstack-usage
Fixed singleton objectsstatic at file or function scopeZero-initialized before main()
Fixed array of objectsObject pool (ObjectPool<T, N>)O(1) alloc/free, no fragmentation
Temporary scratch spaceArena / bump allocatorReset whole arena at end of operation
Variable-size messagesRing buffer of fixed-size slotsSimplest ISR-safe comms pattern
Custom lifetime controlPlacement new + static storageFull control, no heap involvement
Never in ISRAny of the above except stackAllocator calls are not ISR-safe
Avoid entirelymalloc/new / std::vectorNon-deterministic; fragmentation risk

Critical Patterns

RAII Resource Guard

Acquire on construction, release on destruction. Guarantees release even through early returns or exceptions (if using exceptions — rare in embedded, but possible in C++ environments that allow them).

class SpiGuard {
public:
    explicit SpiGuard(SPI_HandleTypeDef* spi) : spi_(spi) {
        HAL_GPIO_WritePin(CS_GPIO_Port, CS_Pin, GPIO_PIN_RESET);
    }
    ~SpiGuard() {
        HAL_GPIO_WritePin(CS_GPIO_Port, CS_Pin, GPIO_PIN_SET);
    }
    // Non-copyable, non-movable — guard is tied to this scope
    SpiGuard(const SpiGuard&) = delete;
    SpiGuard& operator=(const SpiGuard&) = delete;
private:
    SPI_HandleTypeDef* spi_;
};

// Usage: CS deasserts automatically at end of scope
void read_sensor() {
    SpiGuard guard(&hspi1);
    // ... transfer bytes ...
}  // CS deasserts here

ISR-to-Task Communication: Ring Buffer vs Ping-Pong DMA

For passing data from an ISR/DMA to a task, two main patterns exist:

  • SPSC ring buffer (power-of-2 capacity with bitmask indexing) — best for variable-rate streams. See references/memory-patterns.md §3 for the full implementation. Always use _Static_assert or static_assert to validate the capacity is a power of 2.
  • Ping-pong (double) buffering — best for fixed-burst DMA transfers. ISR fills one buffer while task processes the other.

When asked for ISR-to-task communication, include a ring buffer with power-of-2 bitmask indexing as the primary pattern, even if ping-pong is also mentioned as an alternative for specific DMA scenarios.

Static Object Pool

Pre-allocates N objects of type T with O(1) alloc/free and no heap involvement. Read references/memory-patterns.md §1 for the full arena allocator and §2 for CRTP patterns.

template<typename T, size_t N>
class ObjectPool {
public:
    template<typename... Args>
    T* allocate(Args&&... args) {
        for (auto& slot : slots_) {
            if (!slot.used) {
                slot.used = true;
                return new (&slot.storage) T(std::forward<Args>(args)...);
            }
        }
        return nullptr;  // Pool exhausted — handle at call site
    }

    void free(T* obj) {
        obj->~T();
        for (auto& slot : slots_) {
            if (reinterpret_cast<T*>(&slot.storage) == obj) {
                slot.used = false;
                return;
            }
        }
    }

private:
    struct Slot {
        alignas(T) std::byte storage[sizeof(T)];
        bool used = false;
    };
    Slot slots_[N]{};
};

Volatile Hardware Register Access

volatile tells the compiler the value can change outside its knowledge (hardware can write it). Without volatile, the compiler may cache the read in a register and never see the hardware update.

// Define register layout matching the hardware manual
struct UartRegisters {
    volatile uint32_t SR;   // Status register
    volatile uint32_t DR;   // Data register
    volatile uint32_t BRR;  // Baud rate register
    volatile uint32_t CR1;  // Control register 1
};

// Map to the hardware base address
auto* uart = reinterpret_cast<UartRegisters*>(0x40011000U);

// Read status — volatile ensures each read hits the hardware
if (uart->SR & (1U << 5)) {   // RXNE bit
    uint8_t byte = static_cast<uint8_t>(uart->DR);
}

Interrupt-Safe Access

Sharing data between an ISR and a task requires either a critical section or std::atomic. Use atomics when the type fits in a single load/store (usually ≤ pointer size). Use critical sections for larger structures.

#include <atomic>

// Atomic: ISR and task can access without disabling interrupts
std::atomic<uint32_t> adc_value{0};

// In ISR:
void ADC_IRQHandler() {
    adc_value.store(ADC1->DR, std::memory_order_relaxed);
}

// In task:
uint32_t val = adc_value.load(std::memory_order_relaxed);

// Critical section for larger structures (ARM Cortex-M):
struct SensorFrame { uint32_t timestamp; int16_t x, y, z; };
volatile SensorFrame latest_frame{};

void update_frame_from_isr(const SensorFrame& f) {
    __disable_irq();
    latest_frame = f;
    __enable_irq();
}

Smart Pointer Policy

Pointer typeUse in embedded?Guidance
Raw pointer (observing)YesFor non-owning references; make ownership explicit in naming
Raw pointer (owning)CarefullyOnly into static/pool storage where lifetime is obvious
std::unique_ptrYes, with careZero overhead; use with custom deleters for pool objects
std::unique_ptr + custom deleterYesReturns pool objects to their pool on destruction
std::shared_ptrAvoidReference counting uses heap and is non-deterministic
std::weak_ptrAvoidTied to shared_ptr; same concerns
// unique_ptr with pool deleter — zero heap, automatic return to pool
ObjectPool<SensorData, 8> sensor_pool;

auto deleter = [](SensorData* p) { sensor_pool.free(p); };
using PooledSensor = std::unique_ptr<SensorData, decltype(deleter)>;

PooledSensor acquire_sensor() {
    return PooledSensor(sensor_pool.allocate(), deleter);
}

Compile-Time Preferences

Prefer compile-time computation and verification over runtime checks:

// constexpr: computed at compile time, no runtime cost
constexpr uint32_t BAUD_DIVISOR = PCLK_FREQ / (16U * TARGET_BAUD);
static_assert(BAUD_DIVISOR > 0 && BAUD_DIVISOR < 65536, "Baud divisor out of range");

// std::array: bounds info preserved, unlike raw arrays
std::array<uint8_t, 64> tx_buffer{};

// std::span: non-owning view, no allocation, C++20 but often available via ETL
// std::string_view: for string literals and buffers, no heap

// CRTP replaces virtual dispatch — zero runtime overhead
// See references/memory-patterns.md §2 for full example

C++ features to avoid in embedded:

AvoidReasonAlternative
-fexceptionsCode size (10-30% increase), non-deterministic stack unwindstd::optional, std::expected (C++23/polyfill), error codes
-frtti / typeid / dynamic_castRuntime type tables increase ROMCRTP, explicit type tags
std::vector, std::string, std::mapHeap allocationstd::array, ETL containers
std::threadRequires OS primitivesRTOS tasks
std::functionHeap allocation for capturesFunction pointers, templates
Virtual destructors in deep hierarchiesvtable size, indirect dispatch, blocks inliningCRTP or flat hierarchies (≤2 levels); virtual OK for non-critical paths

Compile with -fno-exceptions -fno-rtti for ARM targets. Use the Embedded Template Library (ETL) for fixed-size etl::vector, etl::map, etl::string alternatives.


Common Anti-Patterns

These patterns cause real bugs in production firmware. Knowing them saves hours of debugging.

Anti-PatternProblemFix
std::function with capturesHeap-allocates when captures exceed small-buffer threshold (~16-32 bytes, implementation-dependent)Function pointer + void* context, template callable, or lambda passed directly to template parameter
Arduino String in loopsEvery concatenation/conversion heap-allocates; 10Hz × 4 sensors = 3.4M alloc/day → fragmentationFixed char[] with snprintf; for complex strings use etl::string<N>
new/delete in periodic tasksSame fragmentation; violates MISRA C++ Rule 21.6.1Object pool, static array, or ETL fixed-capacity container
std::shared_ptr anywhereAtomic ref-counting overhead, control block on heapstd::unique_ptr with pool deleter, or raw pointer with clear ownership
Deep virtual hierarchiesvtable per class (ROM), indirect dispatch, blocks inliningCRTP or flat hierarchy (≤2 levels)
Hidden std::string/std::vectorDynamic allocation on every operation — often hidden in library APIsETL containers (etl::string<N>, etl::vector<T,N>)
volatile for thread syncvolatile only prevents the compiler from caching or eliminating reads/writes — it does NOT prevent reordering between threads or provide atomicity. Data races on volatile variables are still undefined behaviorstd::atomic with explicit memory ordering (memory_order_relaxed suffices for ISR-to-task on single-core Cortex-M)
ISR handler name mismatchMisspelled ISR name silently falls through to Default_Handler — system hangs or resetsVerify names against startup .s vector table; use -Wl,--undefined=USART1_IRQHandler to catch missing symbols at link time
printf/sprintf in ISRHeap allocation, locking, non-reentrant — crashes or deadlockssnprintf to pre-allocated buffer outside ISR, or ITM/SWO trace output
Unbounded recursionStack overflow on MCU with 1–8KB stack; MISRA C Rule 17.2 bans recursionConvert to iterative with explicit stack

Hidden allocation checklist: Before using any STL type, check whether it allocates. Common surprises: std::string::operator+=, std::vector::push_back (reallocation), std::function (type-erased captures), std::any, std::regex.


Common Memory Bug Diagnosis

SymptomLikely causeFirst action
Crash after N hours of uptimeHeap fragmentationSwitch to pools/ETL containers; cite MISRA C Rule 21.3; see references/memory-patterns.md §9 for ESP32-specific guidance
HardFault with BFAR/MMFAR validNull/wild pointer dereference or bus faultRead CFSR sub-registers; check MMARVALID/BFARVALID before using address registers; see references/debugging.md §4
Stack pointer in wrong regionStack overflowCheck .su files; add MPU guard region; see references/debugging.md §1
ISR data looks staleMissing volatileAdd volatile to shared variables; audit ISR data paths
Random corruption near ISRData raceApply atomics or critical section; see references/debugging.md §3
Use-after-freeObject returned to pool while still referencedVerify no aliasing; use unique_ptr with pool deleter
MPU fault in taskTask overflowed its stack into neighboring regionIncrease stack size or reduce frame depth
Uninitialized readLocal variable used before assignmentEnable -Wuninitialized; initialize all locals
HardFault on first float operationFPU not enabled in CPACR`SCB->CPACR
ISR does nothing / default handler runsISR function name misspelledVerify name against startup .s vector table

Debugging Tools Decision Tree

Is the bug reproducible on a host (PC)?
├── YES → Use AddressSanitizer (ASan) + Valgrind
│         Compile embedded logic for PC with -fsanitize=address
│         See references/debugging.md §5
└── NO  → Is it a memory layout/access issue?
          ├── YES → Enable MPU; add stack canaries; read CFSR on fault
          │         See references/debugging.md §1, §4
          └── NO  → Is it a data-race between ISR and task?
                    ├── YES → Audit shared state; apply atomics/critical section
                    │         See references/debugging.md §3
                    └── NO  → Use GDB watchpoint on the corrupted address
                              See references/debugging.md §6

Static analysis: run clang-tidy with clang-analyzer-* and cppcoreguidelines-* checks. Run cppcheck --enable=all for C code. Both catch many issues before target hardware.


Error Handling Philosophy

Four layers, each for a distinct failure category:

Recoverable errors (with context) — use std::expected (C++23, or tl::expected/etl::expected as header-only polyfills for C++17) when you need to communicate why something failed. Zero heap allocation, zero exceptions, type-safe error propagation:

enum class SensorError : uint8_t { not_ready, crc_fail, timeout };

[[nodiscard]] std::expected<SensorReading, SensorError> read_sensor() {
    if (!sensor_ready()) return std::unexpected(SensorError::not_ready);
    auto raw = read_raw();
    if (!verify_crc(raw)) return std::unexpected(SensorError::crc_fail);
    return SensorReading{.temp = convert(raw)};
}

Recoverable errors (simple) — use std::optional when the only failure is "no value":

[[nodiscard]] std::optional<SensorReading> read_sensor() {
    if (!sensor_ready()) return std::nullopt;
    return SensorReading{.temp = read_temp(), .humidity = read_humidity()};
}

[[nodiscard]] enforcement: Mark every function returning an error code, status, or allocated pointer with [[nodiscard]]. The compiler then warns if the caller silently ignores the return value — this catches the #1 error handling bug in firmware (discarded error codes). In C, use [[nodiscard]] (C23) or __attribute__((warn_unused_result)).

Programming errors — use assert or a trap that halts with debug info:

void write_to_pool(uint8_t* buf, size_t len) {
    assert(buf != nullptr);
    assert(len <= MAX_PACKET_SIZE);  // Trips in debug, removed in release with NDEBUG
    // ...
}

Unrecoverable runtime errors — log fault reason + registers to non-volatile memory (flash/EEPROM), then let the watchdog reset the system. Without pre-reset logging, field failures leave no post-mortem data. Write CFSR + stacked PC + LR to a dedicated flash sector, then call NVIC_SystemReset() or spin and let the watchdog fire.


Testability Architecture

Write firmware that can be tested on a PC without target hardware. The key: separate business logic from hardware I/O at a clear HAL boundary.

Compile-time dependency injection — the hardware is known at compile time, so use templates instead of virtual dispatch. Zero runtime overhead, full testability:

// HAL interface as a concept (or just template parameter)
template<typename Hal>
class SensorController {
public:
    explicit SensorController(Hal& hal) : hal_(hal) {}
    std::optional<float> read_temperature() {
        auto raw = hal_.i2c_read(SENSOR_ADDR, TEMP_REG, 2);
        if (!raw) return std::nullopt;
        return convert_raw_to_celsius(*raw);
    }
private:
    Hal& hal_;
};

// Production: uses real hardware
SensorController<StmHal> controller(real_hal);

// Test: uses mock — same code, no vtable, no overhead in production
SensorController<MockHal> test_controller(mock_hal);

Testing pyramid for firmware:

  • Unit tests (host PC): Business logic with mock HAL — runs with ASan/UBSan, fast CI
  • Integration tests (QEMU): Full firmware on emulated Cortex-M — catches linker/startup issues
  • Hardware-in-the-loop (HIL): On real target — catches timing, peripheral, and electrical issues

Firmware Architecture Selection

Choose the simplest architecture that meets your requirements:

ArchitectureComplexityBest for
Superloop (bare-metal polling)Lowest< 5 tasks, loose timing, fully deterministic
Cooperative scheduler (time-triggered)LowHard real-time, safety-critical (IEC 61508 SIL 1–2), analyzable
RTOS preemptive (FreeRTOS/Zephyr)MediumComplex multi-task, priority-based scheduling
Active Object (QP framework)HighestEvent-heavy, hierarchical state machines, protocol handling

For FreeRTOS IPC selection (task notifications vs queues vs stream buffers), low-power patterns, CI/CD pipeline setup, and binary size budgeting → see references/architecture.md.

For STM32 CubeMX pitfalls, HAL vs LL driver selection, hard-to-debug embedded issues, and code review checklists → see references/stm32-pitfalls.md.


ESP32 Platform Guidance

When responding to any ESP32/ESP32-S2/S3/C3 question, always consider these platform-specific concerns:

Memory architecture — always address in ESP32 responses: ESP32 has multiple non-contiguous memory regions. In every ESP32 response, explicitly discuss where buffers should be placed:

  • DRAM (~320KB, fast): Ring buffers, DMA buffers, ISR data, FreeRTOS stacks. All real-time and latency-sensitive data goes here.
  • PSRAM (4-8MB, ~10× slower, optional on -WROVER/S3): Large non-realtime data like SD card write buffers, CSV formatting buffers, web server buffers, display framebuffers. Allocate with heap_caps_malloc(size, MALLOC_CAP_SPIRAM) or ps_malloc(). Never use PSRAM in ISRs or tight control loops — access latency is ~100ns vs ~10ns for DRAM.
  • If the ESP32 variant has PSRAM, recommend moving large format/write buffers there to free DRAM for real-time use. If PSRAM is not available, note the DRAM pressure and size budgets. Always mention both regions so the user understands the tradeoff.

ETL on ESP32: When replacing std::vector, std::string, or std::map on ESP32, always recommend the Embedded Template Library (ETL) by name with specific types: etl::vector<T, N>, etl::string<N>, etl::map<K, V, N>, etl::queue_spsc_atomic<T, N>. ETL works on ESP32 with both Arduino (lib_deps = ETLCPP/Embedded Template Library) and ESP-IDF (add as component). Even when providing a custom implementation (like a ring buffer), mention ETL as a production-ready alternative the user should consider.

Deep sleep and fast wake: For battery-powered ESP32 sensor nodes, see references/architecture.md for RTC memory WiFi caching, static IP, and sensor forced mode patterns.

Task stack sizing on ESP32: WiFi tasks need 4096-8192 bytes, BLE 4096-8192, TLS/SSL 8192-16384. Monitor with uxTaskGetStackHighWaterMark(). See references/memory-patterns.md §9 for details.

For full ESP32 heap fragmentation diagnosis, monitoring, and ETL integration → see references/memory-patterns.md §9.


Coding Conventions Summary

See references/coding-style.md for the full guide. Key rules:

  • Variables and functions: snake_case
  • Classes and structs: PascalCase
  • Constants: kConstantName (Google style) or ALL_CAPS for macros
  • Member variables: trailing_underscore_
  • Include guards: #pragma once (prefer) or #ifndef HEADER_H_ guard
  • const correctness: const every non-mutating method, const every parameter that isn't modified
  • [[nodiscard]]: on any function whose return value must not be silently dropped (error codes, pool allocate)

Reference File Index

FileRead when
references/memory-patterns.mdImplementing arena, ring buffer, DMA buffers, lock-free SPSC, singletons, linker sections, ESP32 heap fragmentation, ETL containers
references/c-patterns.mdWriting C99/C11 firmware, C memory pools, C error handling, C/C++ interop, MISRA C rules, UART read-to-clear mechanism, _Static_assert for buffer validation
references/debugging.mdDiagnosing stack overflow, heap corruption, HardFault, data races, NVIC priority issues, or running ASan/GDB
references/coding-style.mdNaming conventions, feature usage table, struct packing, attributes, include guards
references/architecture.mdChoosing firmware architecture, FreeRTOS IPC patterns, low-power modes, ESP32 deep sleep + fast wake, CI/CD pipeline setup, binary size budgets
references/stm32-pitfalls.mdCubeMX code generation issues, HAL vs LL selection, hard-to-debug issues (cache, priority inversion, flash stall), code review checklist
references/design-patterns.mdHAL design patterns (CRTP/template/virtual/opaque), dependency injection strategies, SOLID for embedded, callback + trampoline patterns
references/safety-hardware.mdMPU protection patterns, watchdog hierarchy, IEC 61508 fault recovery, peripheral protocol selection (UART/I2C/SPI/CAN), linker script memory placement

Source Transparency

This detail page is rendered from real SKILL.md content. Trust labels are metadata-based hints, not a safety guarantee.

Related Skills

Related by shared tags or category signals.

Coding

typescript-e2e-testing

No summary provided by upstream source.

Repository SourceNeeds Review
Coding

typescript-unit-testing

No summary provided by upstream source.

Repository SourceNeeds Review
Coding

typescript-clean-code

No summary provided by upstream source.

Repository SourceNeeds Review
General

slides-generator

No summary provided by upstream source.

Repository SourceNeeds Review