R Package Development Decision Guide

Dependencies, API design, testing, documentation, and best practices for R packages

Dependency Strategy

When to Add Dependencies vs Base R

Add dependency when:

- Significant functionality gain

- Maintenance burden reduction

- User experience improvement

- Complex implementation (regex, dates, web)

Use base R when:

- Simple utility functions

- Package will be widely used (minimize deps)

- Dependency is large for small benefit

- Base R solution is straightforward

Example decisions:

str_detect(x, "pattern") # Worth stringr dependency length(x) > 0 # Don't need purrr for this parse_dates(x) # Worth lubridate dependency x + 1 # Don't need dplyr for this

Tidyverse Dependency Guidelines

Core tidyverse (usually worth it):

dplyr # Complex data manipulation purrr # Functional programming, parallel stringr # String manipulation tidyr # Data reshaping

Specialized tidyverse (evaluate carefully):

lubridate # If heavy date manipulation forcats # If many categorical operations readr # If specific file reading needs ggplot2 # If package creates visualizations

Heavy dependencies (use sparingly):

tidyverse # Meta-package, very heavy shiny # Only for interactive apps

Dependency Specification in DESCRIPTION

Strong dependencies (required)

Imports: dplyr (>= 1.1.0), rlang (>= 1.0.0)

Suggested dependencies (optional)

Suggests: testthat (>= 3.0.0), knitr, rmarkdown

Enhanced functionality (optional but loaded if available)

Enhances: data.table

API Design Patterns

Function Design Strategy

Modern tidyverse API patterns

1. Use .by for per-operation grouping

my_summarise <- function(.data, ..., .by = NULL) {

Support modern grouped operations

}

2. Use {{ }} for user-provided columns

my_select <- function(.data, cols) { .data |> select({{ cols }}) }

3. Use ... for flexible arguments

my_mutate <- function(.data, ..., .by = NULL) { .data |> mutate(..., .by = {{ .by }}) }

4. Return consistent types (tibbles, not data.frames)

my_function <- function(.data) { result |> tibble::as_tibble() }

Input Validation Strategy

Validation level by function type:

User-facing functions - comprehensive validation

user_function <- function(x, threshold = 0.5) {

Check all inputs thoroughly

if (!is.numeric(x)) stop("x must be numeric") if (!is.numeric(threshold) || length(threshold) != 1) { stop("threshold must be a single number") }

... function body

}

Internal functions - minimal validation

.internal_function <- function(x, threshold) {

Assume inputs are valid (document assumptions)

Only check critical invariants

... function body

}

Package functions with vctrs - type-stable validation

safe_function <- function(x, y) { x <- vec_cast(x, double()) y <- vec_cast(y, double())

Automatic type checking and coercion

}

Error Handling Patterns

Good error messages - specific and actionable

if (length(x) == 0) { cli::cli_abort( "Input {.arg x} cannot be empty.", "i" = "Provide a non-empty vector." ) }

Include function name in errors

validate_input <- function(x, call = caller_env()) { if (!is.numeric(x)) { cli::cli_abort("Input must be numeric", call = call) } }

Use consistent error styling

cli package for user-friendly messages

rlang for developer tools

Error Classes

Custom error classes for programmatic handling

my_error <- function(message, ..., call = caller_env()) { cli::cli_abort( message, ..., class = "my_package_error", call = call ) }

Specific error types

validation_error <- function(message, ..., call = caller_env()) { cli::cli_abort( message, ..., class = c("validation_error", "my_package_error"), call = call ) }

When to Create Internal vs Exported Functions

Export Function When

Export when:

- Users will call it directly

- Other packages might want to extend it

- Part of the core package functionality

- Stable API that won't change often

Example: main data processing functions

#' @export process_data <- function(.data, ...) {

Comprehensive input validation

Full documentation required

Stable API contract

}

Keep Function Internal When

Keep internal when:

- Implementation detail that may change

- Only used within package

- Complex implementation helpers

- Would clutter user-facing API

Example: helper functions (no @export)

.validate_input <- function(x, y) {

Minimal documentation

Can change without breaking users

Assume inputs are pre-validated

}

Naming convention: prefix with . for internal functions

.compute_metrics <- function(data) { ... }

Testing and Documentation Strategy

Testing Levels

Unit tests - individual functions

test_that("function handles edge cases", { expect_equal(my_func(c()), expected_empty_result) expect_error(my_func(NULL), class = "my_error_class") })

Integration tests - workflow combinations

test_that("pipeline works end-to-end", { result <- data |> step1() |> step2() |> step3() expect_s3_class(result, "expected_class") })

Property-based tests for package functions

test_that("function properties hold", {

Test invariants across many inputs

})

Test File Organization

tests/ testthat/ test-validation.R # Input validation tests test-processing.R # Core processing tests test-output.R # Output format tests test-integration.R # End-to-end tests helper-fixtures.R # Shared test fixtures testthat.R # Test runner

Snapshot Testing

For complex outputs that are hard to specify exactly

test_that("summary output is correct", { expect_snapshot(summary(my_object)) })

For error messages

test_that("errors are informative", expect_snapshot(my_function(bad_input), error = TRUE) })

Documentation Priorities

Must document:

- All exported functions

- Complex algorithms or formulas

- Non-obvious parameter interactions

- Examples of typical usage

Can skip documentation:

- Simple internal helpers

- Obvious parameter meanings

- Functions that just call other functions

roxygen2 Documentation

#' Process and summarize data #' #' @description #' Takes a data frame and computes summary statistics #' for specified variables. #' #' @param data A data frame or tibble. #' @param vars <[tidy-select][dplyr::dplyr_tidy_select]> Columns to summarize. #' @param .by <[data-masking][dplyr::dplyr_data_masking]> Optional grouping variable. #' #' @return A tibble with summary statistics. #' #' @examples #' mtcars |> process_data(mpg, .by = cyl) #' #' @export process_data <- function(data, vars, .by = NULL) {

...

}

Package Structure

Recommended Directory Layout

mypackage/ DESCRIPTION NAMESPACE LICENSE README.md R/ utils.R # Internal utilities validation.R # Input validation core.R # Core functionality methods.R # S3/S7 methods zzz.R # .onLoad, .onAttach man/ # Generated by roxygen2 tests/ testthat/ testthat.R vignettes/ getting-started.Rmd inst/ extdata/ # Example data files data/ # Package data (lazy-loaded) data-raw/ # Scripts to create package data

DESCRIPTION Best Practices

Package: mypackage Title: What The Package Does (One Line) Version: 0.1.0 Authors@R: person("First", "Last", email = "email@example.com", role = c("aut", "cre")) Description: A longer description that spans multiple lines. Use four spaces for continuation lines. License: MIT + file LICENSE Encoding: UTF-8 Roxygen: list(markdown = TRUE) RoxygenNote: 7.2.3 Imports: dplyr (>= 1.1.0), rlang (>= 1.0.0) Suggests: testthat (>= 3.0.0) Config/testthat/edition: 3

Release Checklist

Before release:

devtools::check() # Must pass with 0 errors, warnings, notes devtools::test() # All tests pass devtools::document() # Documentation up to date urlchecker::url_check() # All URLs valid spelling::spell_check_package() # No typos

Update version

usethis::use_version("minor") # or "major", "patch"

Update NEWS.md with changes

Final checks

devtools::check(remote = TRUE, manual = TRUE)

Common Package Development Mistakes

Avoid - Using library() in package code

library(dplyr) # Never in package code!

Good - Use namespace qualification

dplyr::filter(data, x > 0)

Or import in NAMESPACE via roxygen2

#' @importFrom dplyr filter mutate

Avoid - Modifying global state

options(my_option = TRUE) # Side effect!

Good - Restore state if you must modify

old_opts <- options(my_option = TRUE) on.exit(options(old_opts), add = TRUE)

Avoid - Hardcoded paths

read.csv("/home/user/data.csv")

Good - Use system.file for package data

system.file("extdata", "data.csv", package = "mypackage")

r-package-development

Safety Notice

Copy this and send it to your AI assistant to learn

Add dependency when:

- Significant functionality gain

- Maintenance burden reduction

- User experience improvement

- Complex implementation (regex, dates, web)

Use base R when:

- Simple utility functions

- Package will be widely used (minimize deps)

- Dependency is large for small benefit

- Base R solution is straightforward

Example decisions:

Core tidyverse (usually worth it):

Specialized tidyverse (evaluate carefully):

Heavy dependencies (use sparingly):

Strong dependencies (required)

Suggested dependencies (optional)

Enhanced functionality (optional but loaded if available)

Modern tidyverse API patterns

1. Use .by for per-operation grouping

Support modern grouped operations

2. Use {{ }} for user-provided columns

3. Use ... for flexible arguments

4. Return consistent types (tibbles, not data.frames)

Validation level by function type:

User-facing functions - comprehensive validation

Check all inputs thoroughly

... function body

Internal functions - minimal validation

Assume inputs are valid (document assumptions)

Only check critical invariants

... function body

Package functions with vctrs - type-stable validation

Automatic type checking and coercion

Good error messages - specific and actionable

Include function name in errors

Use consistent error styling

cli package for user-friendly messages

rlang for developer tools

Custom error classes for programmatic handling

Specific error types

Export when:

- Users will call it directly

- Other packages might want to extend it

- Part of the core package functionality

- Stable API that won't change often

Example: main data processing functions

Comprehensive input validation

Full documentation required

Stable API contract

Keep internal when:

- Implementation detail that may change

- Only used within package

- Complex implementation helpers

- Would clutter user-facing API

Example: helper functions (no @export)

Minimal documentation

Can change without breaking users

Assume inputs are pre-validated

Naming convention: prefix with . for internal functions

Unit tests - individual functions

Integration tests - workflow combinations

Property-based tests for package functions

Test invariants across many inputs

For complex outputs that are hard to specify exactly

For error messages

Must document:

- All exported functions

- Complex algorithms or formulas

- Non-obvious parameter interactions

- Examples of typical usage

Can skip documentation:

- Simple internal helpers

- Obvious parameter meanings

- Functions that just call other functions

...

Before release:

Update version