Test-Driven Development Workflow for R
This skill ensures all R code development follows TDD principles with comprehensive test coverage using testthat.
When to Activate
- Writing new functions or features
- Fixing bugs or issues
- Refactoring existing code
- Adding new model types
- Creating data processing pipelines
- Building Shiny components
Getting Started
Initialize testing infrastructure for your package:
```r
# Set up testthat (edition 3)
usethis::use_testthat(3)

# Create a test file for an existing source file
usethis::use_test("function_name")

# Or create the test and source file together
usethis::use_r("function_name")
usethis::use_test("function_name")
```
Core Principles
- Tests BEFORE Code
  ALWAYS write tests first, then implement code to make tests pass.
- Coverage Requirements
  - Minimum 80% coverage (unit + integration)
  - 100% coverage for statistical calculations
  - 100% coverage for data validation
  - All edge cases covered
  - Error scenarios tested
- Test Types
  Tests follow a three-level hierarchy: File → Test → Expectation
Unit Tests
Individual functions and utilities:
test_that("rescale01 normalizes to [0, 1] range", { expect_equal(rescale01(c(0, 5, 10)), c(0, 0.5, 1)) expect_equal(rescale01(c(-10, 0, 10)), c(0, 0.5, 1)) })
test_that("rescale01 handles edge cases", { expect_equal(rescale01(c(5, 5, 5)), c(NaN, NaN, NaN)) expect_equal(rescale01(numeric(0)), numeric(0)) expect_equal(rescale01(c(0, NA, 10)), c(0, NA, 1)) })
Integration Tests
Function interactions and workflows:
test_that("data pipeline produces expected output", { raw_data <- read_fixture("sample_input.csv")
result <- raw_data |> clean_data() |> transform_features() |> summarize_results()
expect_s3_class(result, "tbl_df") expect_named(result, c("group", "mean", "sd", "n")) expect_true(all(result$n > 0)) })
Snapshot Tests
For complex outputs that are hard to specify:
test_that("model summary format is stable", { model <- fit_model(test_data) expect_snapshot(print(summary(model))) })
test_that("error messages are informative", { expect_snapshot( validate_input(invalid_data), error = TRUE ) })
Snapshot workflow:
```r
# Review snapshot changes
testthat::snapshot_review("test_name")

# Accept snapshot changes
testthat::snapshot_accept("test_name")
```
Snapshots are stored in the tests/testthat/_snaps/ directory.
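Each snapshot is a Markdown file named after its test file, recording the code and its captured output. Schematically (exact layout may vary by testthat version):
```
# model summary format is stable

    Code
      print(summary(model))
    Output
      (captured output lines)
```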
BDD Alternative (Optional)
For behavior-driven development, use describe() and it():
describe("matrix()", { it("can be multiplied by a scalar", { m1 <- matrix(1:4, 2, 2) m2 <- m1 * 2 expect_equal(matrix(c(2, 4, 6, 8), 2, 2), m2) })
it("can be transposed", { m <- matrix(1:4, 2, 2) expect_equal(t(m), matrix(c(1, 3, 2, 4), 2, 2)) }) })
Key distinction: "describe() verifies you implement the right things, test_that() ensures you do things right."
Test Design Principles
Self-Sufficient Tests
Each test should contain all setup, execution, and teardown code. Tests must be independent and runnable in isolation without relying on ambient state or prior test execution.
GOOD: Self-contained
test_that("function works with specific data", { data <- tibble(x = 1:10, y = rnorm(10)) # Setup result <- my_function(data) # Execute expect_equal(nrow(result), 10) # Assert })
BAD: Depends on external state
```r
setup_data <- tibble(...)  # Created outside the test

test_that("function works", {
  result <- my_function(setup_data)  # Relies on external data
  expect_equal(nrow(result), 10)
})
```
Duplication Over Factoring
Repetition is acceptable in tests—duplicate setup code rather than extracting it elsewhere. Clarity outweighs avoiding duplication.
GOOD: Duplicated but clear
test_that("clean_data handles missing values", { data <- tibble(x = c(1, NA, 3), y = c(4, 5, 6)) result <- clean_data(data) expect_equal(nrow(result), 2) })
test_that("clean_data handles invalid values", { data <- tibble(x = c(1, -999, 3), y = c(4, 5, 6)) result <- clean_data(data, invalid = -999) expect_equal(nrow(result), 2) })
ACCEPTABLE: Each test is self-contained and readable
Plan for Failure
Write tests assuming they will eventually fail and need debugging: make the logic explicit and obvious, and make sure each test can be run on its own in a fresh R session.
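For instance (an illustrative example), write the expected value out literally rather than recomputing it inside the test, so a failure message points straight at the discrepancy:
```r
# Explicit expected value: a failure shows exactly which element differs
test_that("cumulative sum is computed correctly", {
  expect_equal(cumsum(c(1, 2, 3)), c(1, 3, 6))
})
```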
Use devtools::load_all()
During development, prefer devtools::load_all() over library() (a short usage sketch follows this list). It:
- Exposes unexported functions for testing
- Automatically attaches testthat
- Eliminates unnecessary library() calls in tests
- Simulates package loading without installation
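For example, in an interactive session (my_internal_helper() is a hypothetical unexported function):
```r
devtools::load_all()        # Simulates installing and loading the package
my_internal_helper(1:10)    # Unexported functions are callable without :::
expect_equal(rescale01(c(0, 10)), c(0, 1))  # testthat is attached, so expectations run directly
```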
testthat Edition 3
Edition 3 provides improved snapshot testing, better diffs via waldo, unified condition handling, parallel execution support, and byte-compiled code compatibility for mocking.
Deprecated Patterns → Modern Alternatives
DEPRECATED: context() calls
context("Data validation") # Remove - filename serves this purpose
DEPRECATED: expect_equivalent()
```r
expect_equivalent(x, y)
```
MODERN:
```r
expect_equal(x, y, ignore_attr = TRUE)
```
DEPRECATED: with_mock()
```r
with_mock(
  external_call = function() "mocked",
  {
    result <- my_function()
  }
)
```
MODERN:
```r
local_mocked_bindings(
  external_call = function() "mocked"
)
result <- my_function()
```
DEPRECATED: expect_is()
expect_is(x, "data.frame")
MODERN:
expect_s3_class(x, "data.frame")
Initialize Edition 3
In DESCRIPTION, ensure:
```
Config/testthat/edition: 3
```
Or initialize with:
```r
usethis::use_testthat(3)
```
Essential Expectations Reference
Equality & Identity
```r
expect_equal(x, y)                      # With numeric tolerance
expect_equal(x, y, tolerance = 0.001)
expect_equal(x, y, ignore_attr = TRUE)
expect_identical(x, y)                  # Exact match required
expect_all_equal(x)                     # Every element equal (v3.3.0+)
```
Conditions
```r
expect_error(code)
expect_error(code, "pattern")
expect_error(code, class = "validation_error")
expect_warning(code)
expect_no_warning(code)
expect_message(code)
expect_no_message(code)
```
Collections & Sets
```r
expect_setequal(x, y)          # Same elements, any order
expect_contains(set, element)  # Subset relationship (v3.2.0+)
expect_in(element, set)        # Membership check (v3.2.0+)
expect_disjoint(set1, set2)    # No overlap (v3.3.0+)
expect_named(x, c("a", "b"))   # Named vector/list
```
Type & Structure
expect_type(x, "double") expect_s3_class(x, "data.frame") expect_s4_class(x, "S4Class") expect_r6_class(x, "R6Class") expect_shape(matrix, c(2, 3)) # Matrix/array dimensions (v3.3.0+) expect_length(x, 10)
Logical
```r
expect_true(x)
expect_false(x)
expect_all_true(x)    # Every element TRUE (v3.3.0+)
expect_all_false(x)   # Every element FALSE (v3.3.0+)
```
Other Useful Expectations
```r
expect_null(x)
expect_invisible(result)
expect_output(print(x), "pattern")
expect_snapshot(complex_output)
```
File Organization
Tests mirror your package structure:
```
tests/
├── testthat/
│   ├── test-validation.R        # Tests for R/validation.R
│   ├── test-processing.R        # Tests for R/processing.R
│   ├── test-models.R            # Tests for R/models.R
│   ├── test-output.R            # Tests for R/output.R
│   ├── helper-fixtures.R        # Shared functions (sourced before tests)
│   ├── setup-database.R         # Setup code (run before the test suite)
│   ├── helper-expectations.R    # Custom expectations
│   └── fixtures/                # Static test data files
│       ├── sample_input.csv
│       └── expected_output.rds
└── testthat.R                   # Test runner
```
File Types
- test-*.R: actual test files, paired with source files
- helper-*.R: shared utility functions and custom expectations, sourced before tests run (a custom-expectation sketch appears after the fixture example below)
- setup-*.R: setup code run before the test suite (for example under devtools::test() and R CMD check), but not loaded by devtools::load_all()
- fixtures/: static test data, accessed via test_path("fixtures/file")
Access fixtures:
test_path("fixtures", "sample_data.csv")
TDD Workflow Steps
Step 1: Define Expected Behavior
Document what the function should do:
```
Function: calculate_ci
Purpose: Calculate bootstrap confidence intervals
Inputs:
  - x: numeric vector
  - conf_level: confidence level (default 0.95)
  - n_boot: number of bootstrap samples (default 1000)
Outputs:
  - Named numeric vector with lower and upper bounds
Edge cases:
  - Handle NA values
  - Error on non-numeric input
  - Error on empty input
```
Step 2: Write Failing Tests
```r
# tests/testthat/test-calculate_ci.R

test_that("calculate_ci returns correct structure", {
  set.seed(123)
  result <- calculate_ci(1:100)

  expect_type(result, "double")
  expect_named(result, c("lower", "upper"))
  expect_true(result["lower"] < result["upper"])
})

test_that("calculate_ci respects confidence level", {
  set.seed(123)
  ci_95 <- calculate_ci(1:100, conf_level = 0.95)
  ci_99 <- calculate_ci(1:100, conf_level = 0.99)

  # The 99% CI should be wider than the 95% CI
  expect_true(ci_99["upper"] - ci_99["lower"] > ci_95["upper"] - ci_95["lower"])
})

test_that("calculate_ci handles NA values", {
  set.seed(123)
  result <- calculate_ci(c(1:100, NA, NA))

  expect_false(any(is.na(result)))
})

test_that("calculate_ci validates inputs", {
  expect_error(calculate_ci("not numeric"), class = "validation_error")
  expect_error(calculate_ci(numeric(0)), class = "validation_error")
  expect_error(calculate_ci(1:10, conf_level = 1.5), class = "validation_error")
})
```
Step 3: Run Tests (They Should Fail)
```r
devtools::test()
#> ✖ calculate_ci returns correct structure
#> ✖ calculate_ci respects confidence level
#> ✖ calculate_ci handles NA values
#> ✖ calculate_ci validates inputs
```
Step 4: Implement Minimal Code
```r
# R/calculate_ci.R

#' Calculate Bootstrap Confidence Interval
#'
#' @param x Numeric vector
#' @param conf_level Confidence level (default 0.95)
#' @param n_boot Number of bootstrap samples (default 1000)
#' @return Named numeric vector with lower and upper bounds
#' @export
calculate_ci <- function(x, conf_level = 0.95, n_boot = 1000) {
  # Validate inputs
  if (!is.numeric(x)) {
    cli::cli_abort("{.arg x} must be numeric", class = "validation_error")
  }
  if (length(x) == 0) {
    cli::cli_abort("{.arg x} cannot be empty", class = "validation_error")
  }
  if (conf_level <= 0 || conf_level >= 1) {
    cli::cli_abort("{.arg conf_level} must be between 0 and 1", class = "validation_error")
  }

  # Remove NA values
  x <- x[!is.na(x)]

  # Bootstrap the sample mean
  boot_means <- replicate(n_boot, mean(sample(x, replace = TRUE)))

  # Calculate quantiles
  alpha <- 1 - conf_level
  c(
    lower = unname(quantile(boot_means, alpha / 2)),
    upper = unname(quantile(boot_means, 1 - alpha / 2))
  )
}
```
Step 5: Run Tests Again
```r
devtools::test()
#> ✔ calculate_ci returns correct structure
#> ✔ calculate_ci respects confidence level
#> ✔ calculate_ci handles NA values
#> ✔ calculate_ci validates inputs
```
Step 6: Refactor
Improve while keeping tests green:
```r
# Extract validation to a helper
validate_ci_inputs <- function(x, conf_level) {
  if (!is.numeric(x)) {
    cli::cli_abort("{.arg x} must be numeric", class = "validation_error")
  }
  if (length(x) == 0) {
    cli::cli_abort("{.arg x} cannot be empty", class = "validation_error")
  }
  if (conf_level <= 0 || conf_level >= 1) {
    cli::cli_abort("{.arg conf_level} must be between 0 and 1", class = "validation_error")
  }
}

calculate_ci <- function(x, conf_level = 0.95, n_boot = 1000) {
  validate_ci_inputs(x, conf_level)

  x <- x[!is.na(x)]
  boot_means <- replicate(n_boot, mean(sample(x, replace = TRUE)))

  alpha <- 1 - conf_level
  c(
    lower = unname(quantile(boot_means, alpha / 2)),
    upper = unname(quantile(boot_means, 1 - alpha / 2))
  )
}
```
Step 7: Verify Coverage
```r
covr::package_coverage()
#> calculate_ci.R: 100%
```
Testing Patterns
Testing Data Transformations
test_that("clean_data removes invalid rows", { input <- tibble( id = 1:4, value = c(1, NA, 3, -999) )
result <- clean_data(input, invalid_value = -999)
expect_equal(nrow(result), 2) expect_equal(result$id, c(1, 3)) expect_false(anyNA(result$value)) })
Testing Statistical Functions
test_that("weighted_mean matches manual calculation", { x <- c(1, 2, 3) w <- c(1, 2, 1)
result <- weighted_mean(x, w) expected <- sum(x * w) / sum(w) # (1 + 4 + 3) / 4 = 2
expect_equal(result, expected) })
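Where a trusted reference implementation exists, a cross-check is a useful complement; here stats::weighted.mean() serves as the reference and weighted_mean() is the hypothetical function under test:
```r
test_that("weighted_mean agrees with stats::weighted.mean", {
  x <- c(1.5, 2.0, 3.5)
  w <- c(1, 2, 1)
  expect_equal(weighted_mean(x, w), stats::weighted.mean(x, w))
})
```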
Testing with Fixtures
```r
# helper-fixtures.R
read_fixture <- function(name) {
  path <- testthat::test_path("fixtures", name)
  readr::read_csv(path, show_col_types = FALSE)
}
```
```r
# test-pipeline.R
test_that("pipeline handles real data", {
  input <- read_fixture("sample_data.csv")
  result <- process_pipeline(input)

  expect_snapshot(result)
})
```
Mocking External Dependencies
test_that("fetch_data handles API errors", {
Mock the API call
local_mocked_bindings( httr2_request = function(...) { stop("API unavailable") } )
expect_error( fetch_data("endpoint"), "API unavailable" ) })
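Note that local_mocked_bindings() replaces bindings in your own package's namespace, so the mocked call (httr2_request() in this example) is assumed to be a thin wrapper defined in the package, along these lines:
```r
# R/fetch_data.R (sketch): wrap the external call so tests can mock it
httr2_request <- function(endpoint) {
  httr2::req_perform(httr2::request(endpoint))
}

fetch_data <- function(endpoint) {
  resp <- httr2_request(endpoint)
  httr2::resp_body_json(resp)
}
```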
Using withr for Cleanup
Use withr functions to manage temporary state with automatic restoration:
test_that("function respects options", {
Temporarily set options
withr::local_options(list(digits = 2))
result <- format_number(3.14159) expect_equal(result, "3.14") })
test_that("function writes to temp file", {
Create temp file that's automatically cleaned up
tmp <- withr::local_tempfile(lines = c("line 1", "line 2"))
result <- process_file(tmp) expect_equal(result$n_lines, 2) })
test_that("function uses custom environment variable", {
Temporarily set env var
withr::local_envvar(MY_VAR = "test_value")
result <- get_config() expect_equal(result$my_var, "test_value") })
Test Data Strategies
Choose the appropriate approach for your testing needs:
- Constructor Functions
Create data on-demand with helper functions:
```r
# helper-data.R
make_sample_data <- function(n = 100) {
  tibble(
    id = 1:n,
    group = sample(c("A", "B"), n, replace = TRUE),
    value = rnorm(n)
  )
}
```
```r
# test-analysis.R
test_that("analysis handles grouped data", {
  data <- make_sample_data(n = 50)
  result <- analyze_groups(data)
  expect_s3_class(result, "tbl_df")
})
```
- Local Functions with Cleanup
Handle side effects using withr:
test_that("function reads CSV correctly", {
Create temp file with cleanup
tmp <- withr::local_tempfile(fileext = ".csv") write.csv(mtcars, tmp, row.names = FALSE)
result <- read_and_process(tmp) expect_equal(nrow(result), 32) })
- Static Fixtures
Store data files in fixtures/ directory:
```r
# Stored in tests/testthat/fixtures/sample_data.csv
test_that("function handles real data format", {
  path <- test_path("fixtures", "sample_data.csv")
  data <- read_csv(path)
  result <- process_data(data)
  expect_true(all(result$valid))
})
```
Common Testing Mistakes to Avoid
WRONG: Testing Implementation Details
```r
# Don't test internal state
expect_equal(obj$internal_cache, expected_cache)
```
CORRECT: Test Behavior
```r
# Test observable behavior
expect_equal(get_result(obj), expected_result)
```
WRONG: Brittle Tests
```r
# Breaks on any output change
expect_equal(as.character(result), "Mean: 5.234567890")
```
CORRECT: Flexible Assertions
```r
# Robust to formatting changes
expect_equal(result$mean, 5.23, tolerance = 0.01)
```
WRONG: Dependent Tests
test_that("creates data", { global_data <<- create() }) test_that("uses data", { process(global_data) }) # Depends on previous!
CORRECT: Independent Tests
test_that("creates and uses data", { data <- create() result <- process(data) expect_true(is_valid(result)) })
WRONG: Modifying Tests to Pass
```r
# When a test fails, don't change the test (unless the test itself is wrong)
test_that("function returns 42", {
  expect_equal(my_function(), 42)  # Test fails
})

# DON'T DO THIS:
test_that("function returns 41", {
  expect_equal(my_function(), 41)  # Changed to pass - WRONG!
})
```
CORRECT: Fix the Implementation
```r
# Fix the code to match the expected behavior
test_that("function returns 42", {
  expect_equal(my_function(), 42)  # Test fails
})
# Fix the my_function() implementation instead
```
When Tests Fail
- Do NOT modify tests to make them pass (unless the test itself is wrong)
- Fix the implementation to match the expected behavior
- Add more tests if the failure reveals missing coverage
- Update snapshots only if the change is intentional
```r
# Review and accept snapshot changes
testthat::snapshot_review("test_name")
testthat::snapshot_accept("test_name")
```
Coverage Verification
```r
# Run a coverage report
covr::package_coverage()

# Interactive HTML report
covr::report()

# Check a specific threshold
cov <- covr::package_coverage()
pct <- covr::percent_coverage(cov)
if (pct < 80) {
  stop("Coverage below 80%: ", round(pct, 1), "%")
}
```
In CI, combine the threshold check above with a full coverage run:
```r
# Include tests, vignettes, and examples in the coverage run
covr::package_coverage(type = "all")
```
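If the package lives on GitHub, usethis can scaffold a coverage workflow for you (this assumes GitHub Actions and a Codecov setup):
```r
# Adds .github/workflows/test-coverage.yaml from the r-lib/actions examples
usethis::use_github_action("test-coverage")
```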
Debugging & Development
Running Tests at Different Scales
```r
# Micro: interactive development
devtools::load_all()
expect_equal(my_function(1), 1)  # Direct expectation

# Mezzo: a single file
testthat::test_file("tests/testthat/test-validation.R")

# Macro: the full suite (RStudio: Ctrl/Cmd + Shift + T)
devtools::test()
devtools::check()  # Full package validation
```
Test Reporters
```r
# Find slow tests
devtools::test(reporter = "slow")

# Progress reporter (verbose)
devtools::test(reporter = "progress")

# Check that tests pass regardless of execution order
devtools::test(shuffle = TRUE)
```
Continuous Testing
```r
# Watch mode: automatically re-run tests on file changes
testthat::auto_test_package()
```
Parallel Execution (Edition 3)
Edition 3 supports parallel test execution for faster runs on multi-core systems.
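To opt in, set the flag in DESCRIPTION alongside the edition (per testthat's documentation, test files are distributed across workers while tests within a file still run in order):
```
Config/testthat/edition: 3
Config/testthat/parallel: true
```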
Running Tests
```r
# All tests
devtools::test()
# (RStudio keyboard shortcut: Ctrl/Cmd + Shift + T)

# With coverage
covr::package_coverage()

# A specific file
testthat::test_file("tests/testthat/test-validation.R")

# Watch mode
testthat::auto_test_package()

# Verbose output
devtools::test(reporter = "progress")

# Find slow tests
devtools::test(reporter = "slow")

# Test independence
devtools::test(shuffle = TRUE)

# Full package check
devtools::check()
```
Success Metrics
- 80%+ code coverage achieved
- All tests passing
- No skipped tests
- Fast execution (under 30 seconds for unit tests)
- Tests catch bugs before production
- Confident refactoring enabled
- Tests run independently in any order
- Clear, descriptive test names
- Each test validates one concept
Remember: Tests are not optional. They are the safety net that enables confident refactoring, rapid development, and production reliability. Write them FIRST.