r-guide

Applies to: R 4.1+, Statistical Computing, Data Analysis, R Packages, Shiny Apps

R Guide

Core Principles

  • Tidyverse First: Use tidyverse conventions for data manipulation, visualization, and functional programming; fall back to base R only when performance demands it

  • Vectorize Everything: Prefer vectorized operations and purrr::map() over explicit for loops; R is optimized for vector operations

  • Reproducibility: Every analysis must be reproducible -- use renv for dependency management, set.seed() for stochastic operations, and R Markdown/Quarto for literate programming

  • Functional Style: Write pure functions with no side effects; avoid modifying global state or relying on .GlobalEnv

  • Explicit Over Implicit: No reliance on partial matching, implicit type coercion, or positional argument passing for non-trivial functions
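
As a rough illustration of the last principle, the sketch below shows how partial matching and implicit coercion can pass silently in base R. The `config` list and its `threshold` element are hypothetical names chosen for this example.

```r
# Partial matching: `$` silently matches a prefix of a list element name.
config <- list(threshold = 0.5)   # hypothetical settings list
partial <- config$thresh          # matches "threshold" by prefix
exact <- config[["thresh"]]       # `[[` matches exactly, so this is NULL

# Implicit coercion: mixing types in c() converts every element to character.
mixed <- c(1, "2", 3)

# Explicit conversion states the intended type.
explicit <- as.numeric(c("1", "2", "3"))
```

Setting `options(warnPartialMatchDollar = TRUE)` is one way to surface the first case as a warning during development.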

Guardrails

Version & Dependencies

  • Target R 4.1+ (native pipe |>, lambda shorthand \(x))

  • Manage dependencies with renv -- always commit renv.lock

  • For packages, declare all dependencies in DESCRIPTION (Imports:, Suggests:)

  • Pin CRAN snapshot dates in renv for full reproducibility

  • Audit new dependencies: check CRAN status, reverse dependencies, license (GPL compatibility)
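
The two R 4.1 syntax features this guide targets can be sketched with base R alone. The variable names are illustrative only.

```r
# Native pipe: the left-hand side becomes the first argument of the call.
squares <- c(1, 4, 9)
total <- squares |> sqrt() |> sum()   # same as sum(sqrt(squares))

# Lambda shorthand: \(x) is equivalent to function(x).
add_one <- \(x) x + 1
incremented <- vapply(squares, add_one, numeric(1))
```

Both forms are parsed by R itself, so neither needs magrittr or purrr to be attached.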

Code Style

  • Follow the tidyverse style guide

  • Run styler::style_pkg() and lintr::lint_package() before every commit

  • Naming: snake_case for functions/variables, PascalCase for R6/S4 classes

  • Max line length: 80 characters

  • Use <- for assignment (not = outside function arguments)

  • Explicit library() at top of scripts; never use require()

  • Always use TRUE/FALSE (never T/F -- they can be overwritten)

  • No attach() or setwd() -- use here::here() for project-relative paths
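
A minimal sketch of why the TRUE/FALSE rule matters: `T` and `F` are ordinary variables, while `TRUE` and `FALSE` are reserved words.

```r
# T is an ordinary variable bound to TRUE, so it can be reassigned.
T <- 0
unsafe <- isTRUE(T)     # FALSE: T no longer means TRUE here
safe <- isTRUE(TRUE)    # TRUE is a reserved word and cannot be reassigned

# Remove the shadowing variable; the base definition of T is visible again.
rm(T)
restored <- isTRUE(T)
```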

Vectorization

  • Prefer vectorized operations: x * 2, not for (i in seq_along(x)) x[i] <- x[i] * 2

  • Use dplyr::mutate() / dplyr::summarise() for column-wise transformations

  • Use purrr::map() family for list iteration (map_dbl(), map_chr(), map_dfr()); note that map_dfr() is superseded as of purrr 1.0.0 in favor of map() followed by list_rbind()

  • Use dplyr::across() for applying functions to multiple columns

  • Reserve for loops for side effects only (writing files, API calls)

  • Use vapply() over sapply() when base R is required (explicit return type)
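
The vectorization and `vapply()` points above can be sketched in base R as follows. The loop is included only for contrast, and the inputs are made up for the example.

```r
x <- c(1.5, 2.5, 3.5)

# Vectorized: one call operates on the whole vector.
doubled <- x * 2

# Equivalent explicit loop, shown only for contrast.
doubled_loop <- numeric(length(x))
for (i in seq_along(x)) {
  doubled_loop[i] <- x[i] * 2
}

# vapply() declares the return type up front.
lengths_ok <- vapply(list("a", c("b", "c")), length, integer(1))

# On empty input sapply() returns a list, while vapply() keeps the type.
empty_sapply <- sapply(list(), length)
empty_vapply <- vapply(list(), length, integer(1))
```

The empty-input case is where the two functions diverge most visibly, which is one reason to prefer the explicit return type in package code.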

Error Handling

  • Use rlang::abort() / cli::cli_abort() over stop() for structured conditions

  • Validate inputs at the start of every exported function

  • Use stopifnot() or rlang::arg_match() for argument validation

  • Never use try() -- always tryCatch() or purrr::safely()

    validate_dataframe <- function(df, required_cols) {
      if (!is.data.frame(df)) {
        cli::cli_abort(
          "{.arg df} must be a data frame, not {.obj_type_friendly {df}}."
        )
      }
      missing_cols <- setdiff(required_cols, names(df))
      if (length(missing_cols) > 0) {
        cli::cli_abort(
          "Missing required column{?s}: {.field {missing_cols}}.",
          class = "validation_error"
        )
      }
      invisible(df)
    }
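
On the calling side, a classed condition lets `tryCatch()` treat validation failures differently from unexpected errors. The sketch below is dependency-free: `check_positive()` is a hypothetical helper, and base R's `errorCondition()` stands in for `rlang::abort(class = ...)` purely so the example runs without extra packages.

```r
# Hypothetical helper that signals a classed error.
# errorCondition() is base R (3.6+); it stands in for rlang::abort() here.
check_positive <- function(x) {
  if (!is.numeric(x) || any(x <= 0)) {
    stop(errorCondition("`x` must be positive.", class = "validation_error"))
  }
  invisible(x)
}

# tryCatch() picks the first handler whose class matches the condition.
outcome <- tryCatch(
  {
    check_positive(-1)
    "ok"
  },
  validation_error = function(cnd) {
    paste("invalid input:", conditionMessage(cnd))
  },
  error = function(cnd) {
    paste("unexpected:", conditionMessage(cnd))
  }
)
```

Listing the specific class before the generic `error` handler matters, because handlers are tried in order.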

Reproducibility

  • Always use set.seed() before stochastic operations; document the seed

  • Use renv::snapshot() after adding or updating packages

  • Never use absolute paths -- use here::here() for project-relative paths

  • Use R Markdown (.Rmd) or Quarto (.qmd) for analysis reports

  • Include sessioninfo::session_info() at the end of reports
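
A small sketch of the seeding rule, using base R only. The seed value is arbitrary and chosen just for this example.

```r
seed <- 2024  # arbitrary value for this sketch; record it in the report

set.seed(seed)
first_draw <- rnorm(5)

# Resetting the same seed reproduces the same draws.
set.seed(seed)
second_draw <- rnorm(5)

# Without resetting, the generator continues and the draws differ.
third_draw <- rnorm(5)
```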

Project Structure

    mypackage/
    ├── R/                      # Source files
    │   ├── data-clean.R
    │   └── utils.R
    ├── tests/
    │   ├── testthat.R          # Runner
    │   └── testthat/
    │       └── test-data-clean.R
    ├── man/                    # roxygen2
    ├── vignettes/
    ├── data-raw/               # Data scripts
    ├── DESCRIPTION
    ├── NAMESPACE               # roxygen2
    ├── renv.lock
    └── README.md

    myanalysis/
    ├── R/                      # Reusable functions
    ├── analysis/               # Rmd/Quarto (numbered)
    │   ├── 01-exploration.Rmd
    │   └── 02-modeling.qmd
    ├── data/
    │   ├── raw/                # Immutable input
    │   └── processed/          # Generated output
    ├── output/                 # Figures, reports
    ├── tests/testthat/
    ├── renv.lock
    └── README.md

  • Use roxygen2 for all docs; never edit man/ or NAMESPACE by hand

  • Raw data is immutable -- store in data/raw/, process into data/processed/

Key Patterns

Tidyverse Pipe Chains

Prefer native pipe |> (R 4.1+) over magrittr %>%

    result <- raw_data |>
      dplyr::filter(year >= 2020, !is.na(revenue)) |>
      dplyr::mutate(
        revenue_m = revenue / 1e6,
        growth = (revenue - dplyr::lag(revenue)) / dplyr::lag(revenue)
      ) |>
      dplyr::summarise(
        mean_revenue = mean(revenue_m, na.rm = TRUE),
        .by = region
      )

Tidy Evaluation

Use {{ }} (embrace) for column names passed as arguments

    summarise_by <- function(df, group_col, value_col) {
      df |>
        dplyr::summarise(
          mean_val = mean({{ value_col }}, na.rm = TRUE),
          n = dplyr::n(),
          .by = {{ group_col }}
        )
    }

Use .data pronoun for string column references

    filter_column <- function(df, col_name, threshold) {
      df |>
        dplyr::filter(.data[[col_name]] > threshold)
    }

Use across() for multiple columns

    standardize_numeric <- function(df) {
      df |>
        dplyr::mutate(dplyr::across(
          where(is.numeric),
          \(x) (x - mean(x, na.rm = TRUE)) / sd(x, na.rm = TRUE)
        ))
    }

ggplot2 Grammar of Graphics

    plot_distribution <- function(df, x_col, fill_col = NULL) {
      ggplot2::ggplot(df, ggplot2::aes(x = {{ x_col }})) +
        ggplot2::geom_histogram(
          ggplot2::aes(fill = {{ fill_col }}),
          bins = 30,
          alpha = 0.7
        ) +
        ggplot2::labs(title = "Distribution", x = NULL, y = "Count") +
        ggplot2::theme_minimal(base_size = 14)
    }

Functional Programming with purrr

Type-stable map variants -- read and combine CSV files

    results <- purrr::map_dfr(file_paths, \(path) {
      readr::read_csv(path, show_col_types = FALSE) |>
        dplyr::mutate(source_file = basename(path))
    })

Safe execution -- capture errors without stopping

    safe_read <- purrr::safely(readr::read_csv)
    reads <- purrr::map(file_paths, safe_read)
    successes <- purrr::map(
      purrr::keep(reads, \(x) is.null(x$error)),
      "result"
    )
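
To make the shape of a `safely()` result concrete, here is a small self-contained sketch that needs only purrr. The inputs are made up, and `sqrt()` stands in for a file reader so the example runs without any files on disk.

```r
library(purrr)

inputs <- list(4, "not a number", 9)

# safely() wraps a function so each call returns list(result, error)
# instead of signalling an error.
safe_sqrt <- safely(sqrt)
attempts <- map(inputs, safe_sqrt)

# Keep the successful results.
succeeded <- keep(attempts, \(x) is.null(x$error))
values <- map_dbl(succeeded, "result")

# Collect the messages of the failures for logging or reporting.
failures <- discard(attempts, \(x) is.null(x$error))
messages <- map_chr(failures, \(x) conditionMessage(x$error))
```

Keeping the failure messages alongside the successes makes it easier to report which inputs were skipped rather than dropping them silently.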

Testing

Standards

  • Use testthat 3rd edition (Config/testthat/edition: 3 in DESCRIPTION)

  • Test files: test-*.R (mirror source: data-clean.R -> test-data-clean.R)

  • Test names describe behavior: test_that("filter_active removes inactive users", ...)

  • Coverage target: >80% for business logic, >60% overall (measured with covr)

  • Use snapshot tests (expect_snapshot()) for complex output (plots, printed tables)

  • No test interdependencies -- each test_that() block is self-contained

  • Use withr::local_*() for temporary state changes (env vars, options, files)

testthat Examples

    test_that("summarise_by computes correct group means", {
      df <- tibble::tibble(
        region = c("east", "east", "west", "west"),
        revenue = c(100, 200, 300, 400)
      )
      result <- summarise_by(df, region, revenue)
      expect_equal(nrow(result), 2)
      expect_equal(result$mean_val[result$region == "east"], 150)
    })

    test_that("validate_dataframe errors on missing columns", {
      df <- tibble::tibble(a = 1, b = 2)
      expect_error(
        validate_dataframe(df, c("a", "c")),
        class = "validation_error"
      )
    })

Tooling

Essential Commands

    Rscript -e 'styler::style_pkg()'        # Format package code
    Rscript -e 'lintr::lint_package()'      # Lint package
    Rscript -e 'devtools::test()'           # Run tests
    Rscript -e 'covr::package_coverage()'   # Coverage report
    Rscript -e 'devtools::check()'          # Full R CMD check
    Rscript -e 'renv::snapshot()'           # Lock dependencies
    Rscript -e 'devtools::document()'       # Rebuild roxygen2 docs
    quarto render analysis/report.qmd       # Render Quarto document

References

For detailed patterns and examples, see:

  • references/patterns.md -- dplyr pipelines, ggplot2 recipes, purrr functional patterns

External References

  • Tidyverse Style Guide

  • R for Data Science (2e)

  • Advanced R (2e)

  • R Packages (2e)

  • Tidy Evaluation

  • testthat 3e Documentation

  • ggplot2 Documentation

  • renv Documentation

  • Quarto Guide

  • lintr Documentation
