R Guide

Applies to: R 4.1+, Statistical Computing, Data Analysis, R Packages, Shiny Apps

Core Principles

Tidyverse First: Use tidyverse conventions for data manipulation, visualization, and functional programming; fall back to base R only when performance demands it
Vectorize Everything: Prefer vectorized operations and purrr::map() over explicit for loops; R is optimized for vector operations
Reproducibility: Every analysis must be reproducible -- use renv for dependency management, set.seed() for stochastic operations, and R Markdown/Quarto for literate programming
Functional Style: Write pure functions with no side effects; avoid modifying global state or relying on .GlobalEnv
Explicit Over Implicit: No reliance on partial matching, implicit type coercion, or positional argument passing for non-trivial functions
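
As an illustration of the last principle, the sketch below shows two implicit behaviors in base R that are easy to miss: partial matching with `$` and type coercion in `c()`. The `config` list and its field names are made up for this example.

```r
# Partial matching: `$` on a list silently completes an abbreviated name.
config <- list(threshold = 0.5, verbose = TRUE)

partial <- config$thresh        # matches "threshold", returns 0.5
exact <- config[["thresh"]]     # `[[` matches exactly by default, returns NULL

# Implicit coercion: mixing types in c() converts every element to character.
mixed <- c(1, "2", 3)
mixed_class <- class(mixed)     # "character"

# Being explicit: convert deliberately and name non-obvious arguments.
total <- sum(as.numeric(mixed), na.rm = TRUE)
```

Setting `options(warnPartialMatchDollar = TRUE)` during development makes R warn whenever `$` falls back to a partial match.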

Guardrails

Version & Dependencies

Target R 4.1+ (native pipe |>, lambda shorthand \(x))
Manage dependencies with renv -- always commit renv.lock
For packages, declare all dependencies in DESCRIPTION (Imports:, Suggests:)
Pin CRAN snapshot dates in renv for full reproducibility
Audit new dependencies: check CRAN status, reverse dependencies, license (GPL compatibility)
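
The two R 4.1 syntax features named above can be seen in a few lines of base R. This is a minimal sketch; `values` is an arbitrary example vector.

```r
values <- c(4, 9, 16)

# `|>` passes the left-hand side as the first argument of the call on the right.
roots <- values |> sqrt()

# `\(x)` is shorthand for `function(x)`.
shifted <- vapply(values, \(x) x + 1, numeric(1))
```

Both forms are parsed by R itself, so they need no package and fail with a syntax error on versions older than 4.1.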

Code Style

Follow the tidyverse style guide
Run styler::style_pkg() and lintr::lint_package() before every commit
Naming: snake_case for functions/variables, PascalCase for R6/S4 classes
Max line length: 80 characters
Use <- for assignment (not = outside function arguments)
Explicit library() at top of scripts; never use require()
Always use TRUE/FALSE (never T/F -- they can be overwritten)
No attach() or setwd() -- use here::here() for project-relative paths
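
The reason behind the TRUE/FALSE rule can be checked directly. The following sketch assumes nothing beyond base R: it shadows `T` with a number, then shows that the reserved word `TRUE` cannot be reassigned.

```r
# T and F are ordinary variables bound to TRUE and FALSE, so they can be shadowed.
T <- 0
shadowed <- isTRUE(T)        # FALSE: T now refers to the number 0

# TRUE and FALSE are reserved words; assigning to them is an error.
attempt <- tryCatch(
  eval(parse(text = "TRUE <- 0")),
  error = function(e) "assignment rejected"
)

rm(T)                        # remove the shadowing variable
restored <- isTRUE(T)        # TRUE: the base definition is visible again
```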

Vectorization

Prefer vectorized operations: x * 2, not for (i in seq_along(x)) x[i] <- x[i] * 2
Use dplyr::mutate() / dplyr::summarise() for column-wise transformations
Use purrr::map() family for list iteration (map_dbl(), map_chr(), map_dfr()); note that map_dfr() is superseded as of purrr 1.0.0 in favor of map() followed by list_rbind()
Use dplyr::across() for applying functions to multiple columns
Reserve for loops for side effects only (writing files, API calls)
Use vapply() over sapply() when base R is required (explicit return type)
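
A short base R sketch of two of the points above: a vectorized expression next to its loop equivalent, and the difference in return type between sapply() and vapply() on empty input. The variable names are illustrative only.

```r
x <- c(1.5, 2.5, 3.5)

# Vectorized: one expression operates on the whole vector.
doubled <- x * 2

# Equivalent explicit loop, shown only for comparison.
doubled_loop <- numeric(length(x))
for (i in seq_along(x)) {
  doubled_loop[i] <- x[i] * 2
}

# sapply() infers its return type from the data, so empty input yields a list.
empty <- list()
from_sapply <- sapply(empty, length)

# vapply() always returns the declared type, here an integer vector.
from_vapply <- vapply(empty, length, integer(1))
```

The empty-input case is where sapply() most often surprises downstream code, since a function expecting an integer vector receives a list instead.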

Error Handling

Use rlang::abort() / cli::cli_abort() over stop() for structured conditions
Validate inputs at the start of every exported function
Use stopifnot() or rlang::arg_match() for argument validation
Never use try() -- always tryCatch() or purrr::safely()

    validate_dataframe <- function(df, required_cols) {
      if (!is.data.frame(df)) {
        cli::cli_abort("{.arg df} must be a data frame, not {.obj_type_friendly {df}}.")
      }
      missing_cols <- setdiff(required_cols, names(df))
      if (length(missing_cols) > 0) {
        cli::cli_abort(
          "Missing required column{?s}: {.field {missing_cols}}.",
          class = "validation_error"
        )
      }
      invisible(df)
    }
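
The value of a condition class such as "validation_error" is that callers can handle it selectively with tryCatch(). The sketch below shows that idea using only base R (errorCondition(), available since R 3.6) rather than cli or rlang; `abort_validation()` and `check_positive()` are hypothetical helpers invented for this example.

```r
# Hypothetical helper: signal an error that carries a custom class.
abort_validation <- function(message) {
  stop(errorCondition(message, class = "validation_error"))
}

# Hypothetical validator that rejects non-positive input.
check_positive <- function(x) {
  if (any(x <= 0)) {
    abort_validation("`x` must be strictly positive.")
  }
  invisible(x)
}

# The handler is selected by class, so unrelated errors are not swallowed.
outcome <- tryCatch(
  {
    check_positive(c(1, -2))
    "ok"
  },
  validation_error = function(cnd) paste("rejected:", conditionMessage(cnd))
)
```

A handler named `error` would also catch this condition, but it would catch every other error too, which is the situation structured classes are meant to avoid.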

Reproducibility

Always use set.seed() before stochastic operations; document the seed
Use renv::snapshot() after adding or updating packages
Never use absolute paths -- use here::here() for project-relative paths
Use R Markdown (.Rmd) or Quarto (.qmd) for analysis reports
Include sessioninfo::session_info() at the end of reports
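
To show what set.seed() guarantees, the sketch below draws the same random numbers twice. The seed value is arbitrary and chosen only for this example.

```r
seed <- 20240101  # arbitrary value for this sketch; record the real one in the report

set.seed(seed)
first_run <- rnorm(5)

set.seed(seed)
second_run <- rnorm(5)

# Resetting the seed replays the same stream of random numbers.
same_draws <- identical(first_run, second_run)

# Without resetting, the generator continues and produces different values.
third_run <- rnorm(5)
```

Results are reproducible only when the seed is set immediately before the stochastic step, or when the order of all random calls after it stays fixed.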

Project Structure

    mypackage/
    ├── R/                      # Source files
    │   ├── data-clean.R
    │   └── utils.R
    ├── tests/
    │   ├── testthat.R          # Runner
    │   └── testthat/
    │       └── test-data-clean.R
    ├── man/                    # roxygen2
    ├── vignettes/
    ├── data-raw/               # Data scripts
    ├── DESCRIPTION
    ├── NAMESPACE               # roxygen2
    ├── renv.lock
    └── README.md

    myanalysis/
    ├── R/                      # Reusable functions
    ├── analysis/               # Rmd/Quarto (numbered)
    │   ├── 01-exploration.Rmd
    │   └── 02-modeling.qmd
    ├── data/
    │   ├── raw/                # Immutable input
    │   └── processed/          # Generated output
    ├── output/                 # Figures, reports
    ├── tests/testthat/
    ├── renv.lock
    └── README.md

Use roxygen2 for all docs; never edit man/ or NAMESPACE by hand
Raw data is immutable -- store in data/raw/, process into data/processed/

Key Patterns

Tidyverse Pipe Chains

Prefer native pipe |> (R 4.1+) over magrittr %>%

    result <- raw_data |>
      dplyr::filter(year >= 2020, !is.na(revenue)) |>
      dplyr::mutate(
        revenue_m = revenue / 1e6,
        growth = (revenue - dplyr::lag(revenue)) / dplyr::lag(revenue)
      ) |>
      dplyr::summarise(
        mean_revenue = mean(revenue_m, na.rm = TRUE),
        .by = region
      )

Tidy Evaluation

Use {{ }} (embrace) for column names passed as arguments

    summarise_by <- function(df, group_col, value_col) {
      df |>
        dplyr::summarise(
          mean_val = mean({{ value_col }}, na.rm = TRUE),
          n = dplyr::n(),
          .by = {{ group_col }}
        )
    }

Use .data pronoun for string column references

    filter_column <- function(df, col_name, threshold) {
      df |>
        dplyr::filter(.data[[col_name]] > threshold)
    }

Use across() for multiple columns

    standardize_numeric <- function(df) {
      df |>
        dplyr::mutate(dplyr::across(
          where(is.numeric),
          \(x) (x - mean(x, na.rm = TRUE)) / sd(x, na.rm = TRUE)
        ))
    }

ggplot2 Grammar of Graphics

    plot_distribution <- function(df, x_col, fill_col = NULL) {
      ggplot2::ggplot(df, ggplot2::aes(x = {{ x_col }})) +
        ggplot2::geom_histogram(
          ggplot2::aes(fill = {{ fill_col }}),
          bins = 30,
          alpha = 0.7
        ) +
        ggplot2::labs(title = "Distribution", x = NULL, y = "Count") +
        ggplot2::theme_minimal(base_size = 14)
    }

Functional Programming with purrr

Type-stable map variants -- read and combine CSV files

    results <- purrr::map_dfr(file_paths, \(path) {
      readr::read_csv(path, show_col_types = FALSE) |>
        dplyr::mutate(source_file = basename(path))
    })

Safe execution -- capture errors without stopping

    safe_read <- purrr::safely(readr::read_csv)
    reads <- purrr::map(file_paths, safe_read)
    successes <- reads |>
      purrr::keep(\(x) is.null(x$error)) |>
      purrr::map("result")

Testing

Standards

Use testthat 3rd edition (Config/testthat/edition: 3 in DESCRIPTION)
Test files: test-*.R (mirror source: data-clean.R -> test-data-clean.R)
Test names describe behavior: test_that("filter_active removes inactive users", ...)
Coverage target: >80% for business logic, >60% overall (measured with covr)
Use snapshot tests (expect_snapshot()) for complex output (plots, printed tables)
No test interdependencies -- each test_that() block is self-contained
Use withr::local_*() for temporary state changes (env vars, options, files)
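
The withr::local_*() helpers change some state and restore it automatically when the calling function or test exits. The sketch below illustrates that mechanism with base R's on.exit() rather than withr itself, so it is an approximation of the behavior and not withr's implementation; `format_short()` is a hypothetical helper.

```r
# Hypothetical helper: format a number under a temporary `digits` option.
format_short <- function(x) {
  old <- options(digits = 3)          # options() returns the previous values
  on.exit(options(old), add = TRUE)   # restore them when the function exits
  format(x)
}

before <- getOption("digits")
short <- format_short(pi)             # formatted with 3 significant digits
after <- getOption("digits")          # unchanged outside the function
```

Inside a test, withr::local_options(list(digits = 3)) replaces the two bookkeeping lines and scopes the change to the enclosing test_that() block.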

testthat Examples

    test_that("summarise_by computes correct group means", {
      df <- tibble::tibble(
        region = c("east", "east", "west", "west"),
        revenue = c(100, 200, 300, 400)
      )
      result <- summarise_by(df, region, revenue)
      expect_equal(nrow(result), 2)
      expect_equal(result$mean_val[result$region == "east"], 150)
    })

    test_that("validate_dataframe errors on missing columns", {
      df <- tibble::tibble(a = 1, b = 2)
      expect_error(validate_dataframe(df, c("a", "c")), class = "validation_error")
    })

Tooling

Essential Commands

    Rscript -e 'styler::style_pkg()'        # Format package code
    Rscript -e 'lintr::lint_package()'      # Lint package
    Rscript -e 'devtools::test()'           # Run tests
    Rscript -e 'covr::package_coverage()'   # Coverage report
    Rscript -e 'devtools::check()'          # Full R CMD check
    Rscript -e 'renv::snapshot()'           # Lock dependencies
    Rscript -e 'devtools::document()'       # Rebuild roxygen2 docs
    quarto render analysis/report.qmd       # Render Quarto document

References

For detailed patterns and examples, see:

references/patterns.md -- dplyr pipelines, ggplot2 recipes, purrr functional patterns

External References

Tidyverse Style Guide
R for Data Science (2e)
Advanced R (2e)
R Packages (2e)
Tidy Evaluation
testthat 3e Documentation
ggplot2 Documentation
renv Documentation
Quarto Guide
lintr Documentation
