# Google Continuous Fuzzing

## Overview

Google's continuous fuzzing infrastructure (OSS-Fuzz + ClusterFuzz) has found over 10,000 bugs in 1,000+ open source projects, including critical security vulnerabilities like Heartbleed-class bugs. This technique turns fuzzing from a one-time activity into a continuous quality gate.
## References

- Paper: "OSS-Fuzz: Google's continuous fuzzing service for open source software" (USENIX Security '17)
- Documentation: https://google.github.io/oss-fuzz/
- ClusterFuzz: https://google.github.io/clusterfuzz/
## Core Philosophy

> "Fuzzing should be continuous, not a one-time event."
>
> "Every bug found by fuzzing is a bug not found by attackers."

Fuzzing is most effective when it runs continuously against the latest code, with automatic bug reporting and regression tracking.
## Key Concepts

### Coverage-Guided Fuzzing

Traditional fuzzing generates random inputs blindly. Coverage-guided fuzzing keeps any input that increases code coverage and mutates it further:

```
Corpus → Mutate → Execute → Measure Coverage → Keep interesting inputs
   ↑                                                      |
   └──────────────────────────────────────────────────────┘
```
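The keep-if-coverage-grows loop above can be sketched in a few lines of Python. This is a toy illustration, not a real fuzzer: `toy_parse` and its string "branch" labels stand in for an instrumented target, and the mutation strategy is deliberately naive.

```python
import random

def toy_parse(data: bytes) -> set:
    """Stand-in target: returns the set of 'branches' the input exercised."""
    branches = set()
    if data[:1] == b"F":
        branches.add("magic_F")
        if data[1:2] == b"U":
            branches.add("magic_FU")
            if data[2:3] == b"Z":
                branches.add("magic_FUZ")
    return branches

def mutate(data: bytes) -> bytes:
    """Flip one random byte; occasionally grow the input."""
    buf = bytearray(data or b"\x00")
    buf[random.randrange(len(buf))] = random.randrange(256)
    if random.random() < 0.3:
        buf.append(random.randrange(256))
    return bytes(buf)

def fuzz(iterations: int = 50000, seed: int = 0):
    random.seed(seed)
    corpus = [b"AAA"]          # seed corpus
    seen_coverage = set()
    for _ in range(iterations):
        candidate = mutate(random.choice(corpus))
        coverage = toy_parse(candidate)
        if not coverage <= seen_coverage:   # new branch hit: keep the input
            seen_coverage |= coverage
            corpus.append(candidate)
    return corpus, seen_coverage
```

Note how each kept input becomes a stepping stone: an input starting with `F` is retained, so later mutations of it can discover `FU`, then `FUZ` — depth a blind random generator would almost never reach.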
### The Fuzzing Pipeline

1. Build: Compile with sanitizers (ASan, MSan, UBSan)
2. Fuzz: Run fuzzers continuously on a cluster
3. Triage: Automatically deduplicate and file bugs
4. Reproduce: Generate a minimal reproducer
5. Verify: Confirm the fix eliminates the bug
6. Regress: Add the reproducer to the regression corpus
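The triage step hinges on deduplication: thousands of crashing inputs usually map to a handful of distinct bugs. A common heuristic, sketched below, is to hash the top stack frames with their instruction offsets stripped (the frame strings here are illustrative, not real ClusterFuzz output):

```python
import hashlib

def crash_signature(stack_frames: list, top_n: int = 3) -> str:
    """Hash the top N frames with '+offset' suffixes stripped, so the
    same bug crashing at slightly different addresses collapses into
    one signature."""
    key = "|".join(frame.split("+")[0] for frame in stack_frames[:top_n])
    return hashlib.sha256(key.encode()).hexdigest()[:16]

# Two crashes in the same function at different offsets: one bug.
crash_a = ["parse_header+0x1f", "parse_input+0x44", "main+0x10"]
crash_b = ["parse_header+0x2b", "parse_input+0x44", "main+0x10"]
# A crash in a different function: a separate bug.
crash_c = ["read_chunk+0x08", "parse_input+0x44", "main+0x10"]
```

With this scheme `crash_a` and `crash_b` share a signature and are filed as one bug, while `crash_c` gets its own.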
## When Implementing

### Always

- Use sanitizers (AddressSanitizer, MemorySanitizer, UndefinedBehaviorSanitizer)
- Build a seed corpus from existing tests and real inputs
- Integrate fuzzing into the CI/CD pipeline
- Track coverage metrics over time
- Minimize reproducers for easier debugging
- Keep regression tests for all found bugs
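The last two points combine naturally: minimized reproducers become a regression suite that replays on every build. A minimal sketch, where the `target` callable stands in for your real parsing entry point:

```python
from pathlib import Path

def run_regression_corpus(corpus_dir: str, target) -> list:
    """Replay every saved reproducer against `target`; any exception
    means a previously fixed bug has come back."""
    failures = []
    for path in sorted(Path(corpus_dir).glob("*")):
        try:
            target(path.read_bytes())
        except Exception as exc:
            failures.append((path.name, repr(exc)))
    return failures
```

Wiring this into a test runner (one test case per reproducer file) gives each found bug a permanent, named guard.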
### Never

- Fuzz only once and declare victory
- Ignore crashes in dependencies
- Skip sanitizers to "improve performance"
- Discard valuable corpus data
- Treat fuzzing as separate from testing
### Prefer

- LibFuzzer/AFL++ over basic random testing
- Structure-aware fuzzing for complex formats
- Continuous fuzzing over periodic runs
- Automated triage over manual analysis
- Coverage metrics over time-based metrics
## Implementation Patterns

### Basic Fuzz Target (C/C++)

```cpp
// fuzz_target.cc
// A fuzz target is a function that takes arbitrary bytes

#include <stdint.h>
#include <stddef.h>

// Your library headers
#include "parser.h"

extern "C" int LLVMFuzzerTestOneInput(const uint8_t *data, size_t size) {
  // Call the function under test with fuzzer-provided data
  parse_input(data, size);

  // Return 0 - non-zero return values are reserved
  return 0;
}

// Build with:
//   clang++ -g -fsanitize=address,fuzzer fuzz_target.cc parser.cc -o fuzzer
// Run with:
//   ./fuzzer corpus_dir/
```
### Fuzz Target with Structure

```cpp
// Structure-aware fuzzing for better coverage

#include <stdint.h>
#include <stddef.h>
#include <string.h>

// Fuzz a function expecting a specific structure
struct Header {
  uint32_t magic;
  uint32_t version;
  uint32_t length;
  uint8_t flags;
};

extern "C" int LLVMFuzzerTestOneInput(const uint8_t *data, size_t size) {
  // Need at least header size
  if (size < sizeof(Header)) {
    return 0;
  }

  Header header;
  memcpy(&header, data, sizeof(Header));

  // Constrain to valid magic (helps fuzzer find deeper paths)
  if (header.magic != 0xDEADBEEF) {
    return 0;
  }

  // Constrain length to available data
  size_t payload_size = size - sizeof(Header);
  if (header.length > payload_size) {
    header.length = payload_size;
  }

  const uint8_t *payload = data + sizeof(Header);

  // Now fuzz with valid-looking input
  process_packet(&header, payload, header.length);
  return 0;
}
```
### Python Fuzzing with Atheris

```python
#!/usr/bin/env python3
# fuzz_json_parser.py

import atheris
import sys

# Import the module to fuzz
import json


def test_one_input(data):
    """Fuzz target: called with random bytes"""
    fdp = atheris.FuzzedDataProvider(data)

    # Convert bytes to string for JSON parsing
    json_str = fdp.ConsumeUnicodeNoSurrogates(
        fdp.ConsumeIntInRange(0, 1024)
    )

    try:
        # This should never crash, only raise ValueError
        json.loads(json_str)
    except (json.JSONDecodeError, ValueError):
        pass  # Expected for invalid input
    except Exception:
        # Unexpected exception = potential bug
        raise


def main():
    atheris.instrument_all()  # add coverage instrumentation to loaded code
    atheris.Setup(sys.argv, test_one_input)
    atheris.Fuzz()


if __name__ == "__main__":
    main()
```

Run with:

```
python fuzz_json_parser.py corpus_dir/ -max_len=1024
```
### Go Fuzzing (Native)

```go
// fuzz_test.go
// Go 1.18+ has built-in fuzzing support

package parser

import (
	"testing"
)

func FuzzParseInput(f *testing.F) {
	// Seed corpus with known inputs
	f.Add([]byte("valid input"))
	f.Add([]byte(`{"key": "value"}`))
	f.Add([]byte(""))

	f.Fuzz(func(t *testing.T, data []byte) {
		// Call function under test
		result, err := ParseInput(data)
		if err != nil {
			// Errors are fine, panics are not
			return
		}
		// Optionally verify invariants
		if result != nil && result.Length < 0 {
			t.Errorf("negative length: %d", result.Length)
		}
	})
}

// Run with:
//   go test -fuzz=FuzzParseInput -fuzztime=60s
```
### OSS-Fuzz Integration

```dockerfile
# Dockerfile for OSS-Fuzz integration
FROM gcr.io/oss-fuzz-base/base-builder

RUN apt-get update && apt-get install -y \
    make \
    autoconf \
    automake \
    libtool

# Clone your project
RUN git clone --depth 1 https://github.com/your/project.git
WORKDIR project
COPY build.sh $SRC/
```

```bash
#!/bin/bash
# build.sh - OSS-Fuzz build script

# Build the library with fuzzing instrumentation
./configure
make clean
make -j$(nproc) CC="$CC" CXX="$CXX" CFLAGS="$CFLAGS" CXXFLAGS="$CXXFLAGS"

# Build fuzz targets
$CXX $CXXFLAGS $LIB_FUZZING_ENGINE \
    fuzz_target.cc -o $OUT/fuzz_target \
    -I. libproject.a

# Copy seed corpus
zip -j $OUT/fuzz_target_seed_corpus.zip seeds/*

# Copy dictionary if available
cp project.dict $OUT/fuzz_target.dict
```
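The dictionary copied above is a plain-text file of quoted tokens that the fuzzer splices into mutations, helping it get past magic numbers and keywords. libFuzzer and AFL++ share the format; the entry names and tokens below are illustrative (the magic bytes match the little-endian `0xDEADBEEF` from the structure-aware example):

```
# fuzz_target.dict
# One token per line; '#' starts a comment.
magic_le="\xEF\xBE\xAD\xDE"
kw_version="version"
brace_open="{"
```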
### Corpus Management

```python
# corpus_manager.py
# Manage and minimize fuzzing corpus

import hashlib
import subprocess
from pathlib import Path


class CorpusManager:
    def __init__(self, corpus_dir: str):
        self.corpus_dir = Path(corpus_dir)
        self.corpus_dir.mkdir(exist_ok=True)

    def add(self, data: bytes) -> str:
        """Add input to corpus with content-based filename"""
        hash_name = hashlib.sha256(data).hexdigest()[:16]
        path = self.corpus_dir / hash_name
        if not path.exists():
            path.write_bytes(data)
        return str(path)

    def minimize(self, fuzzer_binary: str) -> int:
        """Minimize corpus using the fuzzer's merge feature"""
        minimized_dir = self.corpus_dir.parent / "corpus_minimized"
        minimized_dir.mkdir(exist_ok=True)

        # LibFuzzer's -merge=1 copies only inputs that add coverage
        subprocess.run([
            fuzzer_binary,
            "-merge=1",
            str(minimized_dir),
            str(self.corpus_dir)
        ], capture_output=True)

        return len(list(minimized_dir.iterdir()))

    def get_coverage_report(self, fuzzer_binary: str) -> dict:
        """Generate coverage report for corpus"""
        # Replay the corpus without generating new inputs
        subprocess.run([
            fuzzer_binary,
            "-runs=0",  # Don't generate new inputs
            str(self.corpus_dir)
        ], capture_output=True, text=True)

        # Parse coverage from output
        # (actual implementation depends on sanitizer output format)
        return {"corpus_size": len(list(self.corpus_dir.iterdir()))}
```
### CI/CD Integration

```yaml
# .github/workflows/fuzz.yml
name: Continuous Fuzzing

on:
  push:
    branches: [main]
  schedule:
    - cron: '0 0 * * *'  # Daily

jobs:
  fuzz:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v3

      - name: Build fuzzer
        run: |
          clang++ -g -O1 \
            -fsanitize=address,fuzzer \
            -fno-omit-frame-pointer \
            fuzz_target.cc -o fuzzer

      - name: Download corpus
        uses: actions/cache@v3
        with:
          path: corpus
          key: fuzz-corpus-${{ github.sha }}
          restore-keys: fuzz-corpus-

      - name: Run fuzzer
        run: |
          mkdir -p corpus
          timeout 600 ./fuzzer corpus/ -max_total_time=600 || true

      - name: Upload crash artifacts
        if: always()
        uses: actions/upload-artifact@v3
        with:
          name: crashes
          path: crash-*
          if-no-files-found: ignore

      - name: Check for crashes
        run: |
          if ls crash-* 1> /dev/null 2>&1; then
            echo "Crashes found!"
            exit 1
          fi
```
## Mental Model

Google's fuzzing approach asks:

- Is this running continuously? One-time fuzzing misses regression bugs
- Are sanitizers enabled? Without them, many memory bugs never crash and go undetected
- Is the corpus growing? Coverage should increase over time
- Are bugs being tracked? Filing and deduplication should be automatic
- Are fixes verified? Reproducers become regression tests
## Signature Moves

- Coverage-guided mutation (LibFuzzer, AFL++)
- Sanitizer builds (ASan, MSan, UBSan, TSan)
- Automatic corpus management and minimization
- CI/CD integration for every commit
- Regression corpus from found bugs
- Structure-aware fuzzing for protocols