tidyverse-patterns

Modern Tidyverse Patterns

Safety Notice

This listing is imported from skills.sh public index metadata. Review upstream SKILL.md and repository scripts before running.


Install skill "tidyverse-patterns" with this command: npx skills add ab604/claude-code-r-skills/ab604-claude-code-r-skills-tidyverse-patterns


Best practices for modern tidyverse development with dplyr 1.1+ and R 4.3+

Core Principles

  • Use modern tidyverse patterns - Prioritize dplyr 1.1+ features, native pipe, and current APIs

  • Profile before optimizing - Use profvis and bench to identify real bottlenecks

  • Write readable code first - Optimize only when necessary and after profiling

  • Follow the tidyverse style guide - Consistent naming, spacing, and structure

Pipe Usage (|> not %>%)

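  • Always use native pipe |> instead of magrittr %>%

  • R 4.3+ provides all needed features: |> and the \(x) lambda shorthand (R 4.1), the _ placeholder for named arguments (R 4.2), and _ as the head of an extraction chain such as _$coef (R 4.3)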

Good - Modern native pipe

data |> filter(year >= 2020) |> summarise(mean_value = mean(value))

Avoid - Legacy magrittr pipe

data %>% filter(year >= 2020) %>% summarise(mean_value = mean(value))
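The version notes on the native pipe are easiest to see in a short sketch on the built-in mtcars data. This assumes R 4.3 or later, because the `_$...` extraction form needs it. Where magrittr would use `.`, the native pipe uses `_`, and `_` must be passed to a named argument.

```r
# Requires R >= 4.3 (the `_$...` extraction form)

# `_` placeholder passed to a named argument (R >= 4.2)
fit <- mtcars |> lm(mpg ~ wt, data = _)

# `_` as the head of an extraction chain (R >= 4.3)
slope <- mtcars |>
  lm(mpg ~ wt, data = _) |>
  _$coefficients[["wt"]]

# Anonymous-function shorthand when the piped value is needed more than once
n_heavy <- mtcars |> (\(d) sum(d$wt > mean(d$wt)))()
```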

Join Syntax (dplyr 1.1+)

  • Use join_by() instead of character vectors for joins

  • Supports inequality, rolling, and overlap joins

Good - Modern join syntax

transactions |> inner_join(companies, by = join_by(company == id))

Good - Inequality joins

transactions |> inner_join(companies, join_by(company == id, year >= since))

Good - Rolling joins (closest match)

transactions |> inner_join(companies, join_by(company == id, closest(year >= since)))

Avoid - Old character vector syntax

transactions |> inner_join(companies, by = c("company" = "id"))
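The following sketch contrasts an inequality join with a rolling join. The `companies` and `transactions` tibbles are hypothetical toy data that use the column names from the examples above. An inequality join keeps every row where `since <= year`, so one transaction can match several company rows. `closest()` keeps only the nearest one.

```r
library(dplyr)

# Hypothetical toy data: which tier applied to each transaction?
companies <- tibble(
  id    = c("A", "A", "B"),
  since = c(2015, 2020, 2018),
  tier  = c("old", "new", "mid")
)
transactions <- tibble(
  company = c("A", "A", "B"),
  year    = c(2017, 2022, 2019),
  amount  = c(10, 20, 30)
)

# Inequality join: A/2022 matches both the 2015 and 2020 rows
all_matches <- transactions |>
  inner_join(companies, join_by(company == id, year >= since))

# Rolling join: keep only the closest `since` that is <= `year`
latest <- transactions |>
  inner_join(companies, join_by(company == id, closest(year >= since)))
```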

Multiple Match Handling

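  • Use the relationship, multiple, and unmatched arguments for quality control

Expect 1:1 matches, error otherwise (relationship replaces multiple = "error", which is deprecated since dplyr 1.1.1)

inner_join(x, y, by = join_by(id), relationship = "one-to-one")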

Allow multiple matches explicitly

inner_join(x, y, by = join_by(id), multiple = "all")

Ensure all rows match (for inner joins, unmatched = "error" checks both x and y)

inner_join(x, y, by = join_by(id), unmatched = "error")
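Here is a small sketch of the quality-control checks, using hypothetical toy tibbles `x`, `y` and `y_small`. It assumes dplyr 1.1.1 or later, where `relationship` is available. Each failing join is wrapped in `tryCatch()` so that the error becomes visible as a value.

```r
library(dplyr)

x <- tibble(id = c(1, 2, 2), val = c("a", "b", "c"))
y <- tibble(id = c(1, 2), label = c("one", "two"))

# Each row of x matches at most one row of y, so many-to-one passes
ok <- inner_join(x, y, by = join_by(id), relationship = "many-to-one")

# one-to-one fails: id 2 in y matches two rows of x
one_to_one <- tryCatch(
  inner_join(x, y, by = join_by(id), relationship = "one-to-one"),
  error = function(e) "error"
)

# unmatched = "error" fails: the id 2 rows of x have no partner in y_small
y_small <- tibble(id = 1, label = "one")
strict <- tryCatch(
  inner_join(x, y_small, by = join_by(id), unmatched = "error"),
  error = function(e) "error"
)
```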

Data Masking and Tidy Selection

  • Understand the difference between data masking and tidy selection

  • Use {{}} (embrace) for function arguments

  • Use .data[[]] for character vectors

Data masking functions: arrange(), filter(), mutate(), summarise()

Tidy selection functions: select(), relocate(), across()

Function arguments - embrace with {{}}

my_summary <- function(data, group_var, summary_var) {
  data |>
    group_by({{ group_var }}) |>
    summarise(mean_val = mean({{ summary_var }}))
}

Character vectors - use .data[[]]

for (var in names(mtcars)) { mtcars |> count(.data[[var]]) |> print() }

Multiple columns - use across()

data |> summarise(across({{ summary_vars }}, ~ mean(.x, na.rm = TRUE)))
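The `across({{ summary_vars }})` line only makes sense inside a function that receives `summary_vars` as an argument. Below is a sketch of such a wrapper. `group_means()` is a hypothetical helper name. Embracing also works for the tidy-select `.by` argument.

```r
library(dplyr)

# Hypothetical helper: mean of every selected column, per group
group_means <- function(data, group_var, summary_vars) {
  data |>
    summarise(
      across({{ summary_vars }}, \(x) mean(x, na.rm = TRUE)),
      .by = {{ group_var }}
    )
}

res <- group_means(mtcars, cyl, c(mpg, wt))
```

With `.by`, the grouping column comes first and groups appear in order of first occurrence. For mtcars that order is 6, 4, 8.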

Modern Grouping and Column Operations

  • Use .by for per-operation grouping (dplyr 1.1+)

  • Use pick() for column selection inside data-masking functions

  • Use across() for applying functions to multiple columns

  • Use reframe() for multi-row summaries

Good - Per-operation grouping (always returns ungrouped)

data |> summarise(mean_value = mean(value), .by = category)

Good - Multiple grouping variables

data |> summarise(total = sum(revenue), .by = c(company, year))

Good - pick() for column selection

data |> summarise(n_x_cols = ncol(pick(starts_with("x"))), n_y_cols = ncol(pick(starts_with("y"))))

Good - across() for applying functions

data |> summarise(across(where(is.numeric), mean, .names = "mean_{.col}"), .by = group)

Good - reframe() for multi-row results

data |> reframe(quantiles = quantile(x, c(0.25, 0.5, 0.75)), .by = group)

Avoid - Old persistent grouping pattern

data |> group_by(category) |> summarise(mean_value = mean(value)) |> ungroup()
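The next sketch exercises `.by`, `pick()` and `reframe()` on a tiny hypothetical tibble, so each result can be checked by hand.

```r
library(dplyr)

df <- tibble(group = c("a", "a", "b", "b"), x1 = c(1, 3, 10, 30), x2 = c(2, 4, 6, 8))

# .by groups for this call only; the result comes back ungrouped
means <- df |> summarise(mean_x1 = mean(x1), .by = group)

# pick() hands a data frame of the selected columns to an ordinary function
totals <- df |> mutate(x_total = rowSums(pick(starts_with("x"))))

# reframe() may return any number of rows per group (here two: min and max)
ranges <- df |> reframe(x1_range = range(x1), .by = group)
```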

Modern purrr Patterns

  • Use map() |> list_rbind() instead of superseded map_dfr()

  • Use walk() for side effects (file writing, plotting)

  • Use in_parallel() for scaling across cores

Modern data frame row binding (purrr 1.0+)

models <- data_splits |> map(\(split) train_model(split)) |> list_rbind() # Replaces map_dfr()
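`train_model()` above is not defined anywhere. The following runnable stand-in applies the same `map()` |> `list_rbind()` pattern to mtcars split by cylinder count. Each list element becomes a one-row tibble, and `names_to` keeps the list names as a column.

```r
library(purrr)
library(tibble)

# Summarise each cylinder group, then row-bind the per-group tibbles
by_cyl <- split(mtcars, mtcars$cyl) |>
  map(\(df) tibble(n = nrow(df), mean_mpg = mean(df$mpg))) |>
  list_rbind(names_to = "cyl")
```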

Column binding

summaries <- data_list |> map(\(df) get_summary_stats(df)) |> list_cbind() # Replaces map_dfc()

Side effects with walk()

walk2(data_list, plot_names, \(df, name) {
  p <- ggplot(df, aes(x, y)) + geom_point()
  ggsave(name, p)
})
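To show what `walk2()` does in practice, this sketch writes two small data frames to CSV files in a temporary directory. It assumes readr is installed. `walk2()` is called for its side effect and returns its first input invisibly, which is why there is usually nothing worth assigning.

```r
library(purrr)
library(readr)

dfs <- list(cars = head(mtcars, 2), flowers = head(iris, 2))
paths <- file.path(tempdir(), paste0(names(dfs), ".csv"))

# Calls write_csv(df, path) for each pair; returns dfs invisibly
out <- walk2(dfs, paths, write_csv)
```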

Parallel processing (purrr 1.1.0+)

library(mirai)
daemons(4)
results <- large_datasets |> map(in_parallel(expensive_computation))
daemons(0)

String Manipulation with stringr

  • Use stringr over base R string functions

  • Consistent str_ prefix and string-first argument order

  • Pipe-friendly and vectorized by design

Good - stringr (consistent, pipe-friendly)

text |> str_to_lower() |> str_trim() |> str_replace_all("pattern", "replacement") |> str_extract("\\d+")
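The same pipeline can be run on a hypothetical character vector. In an R string the regex digit class has to be written `"\\d+"`, because the backslash needs escaping.

```r
library(stringr)

text <- c("Order 42 SHIPPED ", "  order 7 pending", "no number")

cleaned <- text |>
  str_to_lower() |>
  str_trim() |>
  str_replace_all("pending", "queued")

# Extract the first run of digits; NA where there is none
numbers <- str_extract(cleaned, "\\d+")
```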

Common patterns

str_detect(text, "pattern")       # vs grepl("pattern", text)
str_extract(text, "pattern")      # vs complex regmatches()
str_replace_all(text, "a", "b")   # vs gsub("a", "b", text)
str_split(text, ",")              # vs strsplit(text, ",")
str_length(text)                  # vs nchar(text)
str_sub(text, 1, 5)               # vs substr(text, 1, 5)

String combination and formatting

str_c("a", "b", "c")         # vs paste0()
str_glue("Hello {name}!")    # templating
str_pad(text, 10, "left")    # padding
str_wrap(text, width = 80)   # text wrapping

Case conversion

str_to_lower(text)   # vs tolower()
str_to_upper(text)   # vs toupper()
str_to_title(text)   # vs tools::toTitleCase()

Pattern helpers for clarity

str_detect(text, fixed("$"))                 # literal match
str_detect(text, regex("\\d+"))              # explicit regex
str_detect(text, coll("e", locale = "fr"))   # locale-aware collation
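The case for `fixed()` is clearest with a pattern that carries a meaning in regex syntax. As a regex, `"$"` is an end-of-string anchor, so it "matches" every string. `fixed("$")` looks for the literal dollar sign.

```r
library(stringr)

prices <- c("$5", "5 dollars")

anchor_hits  <- str_detect(prices, "$")          # regex anchor: TRUE TRUE
literal_hits <- str_detect(prices, fixed("$"))   # literal "$": TRUE FALSE
```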

Avoid - inconsistent base R functions

grepl("pattern", text)           # pattern first, string second
regmatches(text, regexpr(...))   # complex extraction
gsub("a", "b", text)             # string last

Vectorization and Performance

Good - vectorized operations

result <- x + y

Good - Type-stable purrr functions

map_dbl(data, mean)    # always returns double (or errors)
map_chr(data, class)   # always returns character (or errors)

Avoid - Type-unstable base functions

sapply(data, mean) # might return list or vector

Avoid - explicit loops for simple operations

result <- numeric(length(x))
for (i in seq_along(x)) {
  result[i] <- x[i] + y[i]
}
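This sketch shows the type-stability point on empty input. Given an empty list, `sapply()` returns `list()`, while `map_dbl()` still returns a double vector. The explicit loop and `x + y` give the same answer; the vectorized form is simply shorter and faster.

```r
library(purrr)

# sapply() changes its return type with the input
empty_sapply <- sapply(list(), mean)   # list()
empty_map    <- map_dbl(list(), mean)  # numeric(0)
means        <- map_dbl(list(1:3, 4:6), mean)

# Loop vs vectorized addition
x <- c(1, 2, 3)
y <- c(10, 20, 30)
loop_result <- numeric(length(x))
for (i in seq_along(x)) {
  loop_result[i] <- x[i] + y[i]
}
vec_result <- x + y
```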

Common Anti-Patterns to Avoid

Legacy Patterns

Avoid - Old pipe

data %>% f()

Avoid - Old join syntax

inner_join(x, y, by = c("a" = "b"))

Avoid - Implicit type conversion

sapply() # Use map_*() instead

Avoid - String manipulation in data masking

mutate(data, !!paste0("new_", var) := value)

Use across() with .names, or glue-style name injection such as "new_{var}" := value, instead
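Both alternatives are sketched below, with `var` holding a column name as a string. In the glue-style name `"double_{var}"`, `{var}` interpolates the string stored in `var`. `across()` builds the name from its `.names` template.

```r
library(dplyr)

var <- "mpg"
df <- head(mtcars, 3)

# Glue-style name injection on the left of :=
a <- df |> mutate("double_{var}" := .data[[var]] * 2)

# across() with a .names template
b <- df |> mutate(across(all_of(var), \(x) x * 2, .names = "double_{.col}"))
```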

Performance Anti-Patterns

Avoid - Growing objects in loops

result <- c()
for (i in 1:n) {
  result <- c(result, compute(i)) # Slow!
}

Good - Pre-allocate

result <- vector("list", n)
for (i in seq_len(n)) {
  result[[i]] <- compute(i)
}

Better - Use purrr

result <- map(seq_len(n), compute)
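The three loop variants above can be checked against each other. Here `compute()` is a hypothetical stand-in for real work. `seq_len(n)` is used because `1:n` yields `c(1, 0)` when `n` is 0.

```r
library(purrr)

compute <- function(i) i^2  # hypothetical stand-in for real work
n <- 5

# Growing a vector (copies on every iteration)
grown <- c()
for (i in seq_len(n)) grown <- c(grown, compute(i))

# Pre-allocated list
prealloc <- vector("list", n)
for (i in seq_len(n)) prealloc[[i]] <- compute(i)

# purrr
via_map <- map(seq_len(n), compute)
```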

Migration from Old Patterns

From Base R to Modern Tidyverse

Data manipulation

subset(data, condition) -> filter(data, condition)
data[order(data$x), ] -> arrange(data, x)
aggregate(x ~ y, data, mean) -> summarise(data, mean(x), .by = y)
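A quick check on mtcars that the base R and dplyr versions agree. `aggregate()` sorts by the grouping variable, whereas `.by` keeps groups in order of first appearance, so the sketch adds `arrange()` before comparing.

```r
library(dplyr)

# Base R
base_subset <- subset(mtcars, cyl == 4)
base_agg    <- aggregate(mpg ~ cyl, data = mtcars, FUN = mean)

# dplyr equivalents
tidy_subset <- filter(mtcars, cyl == 4)
tidy_agg    <- mtcars |> summarise(mpg = mean(mpg), .by = cyl) |> arrange(cyl)
```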

Functional programming

sapply(x, f) -> map_*(x, f)   # e.g. map_dbl(); type-stable
lapply(x, f) -> map(x, f)

String manipulation

grepl("pattern", text) -> str_detect(text, "pattern")
gsub("old", "new", text) -> str_replace_all(text, "old", "new")
substr(text, 1, 5) -> str_sub(text, 1, 5)
nchar(text) -> str_length(text)
strsplit(text, ",") -> str_split(text, ",")
paste0(a, b) -> str_c(a, b)
tolower(text) -> str_to_lower(text)

From Old to New Tidyverse Patterns

Pipes

data %>% f() -> data |> f()

Grouping (dplyr 1.1+)

group_by(data, x) |> summarise(mean(y)) |> ungroup() -> summarise(data, mean(y), .by = x)

Column selection

across(starts_with("x")) -> pick(starts_with("x")) # for selection only

Joins

by = c("a" = "b") -> by = join_by(a == b)

Multi-row summaries

summarise(data, x, .groups = "drop") -> reframe(data, x)

Data reshaping

gather()/spread() -> pivot_longer()/pivot_wider()
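A round-trip sketch with a hypothetical two-row tibble: `pivot_longer()` stacks `x` and `y` into key/value pairs, and `pivot_wider()` undoes it.

```r
library(tidyr)
library(tibble)

wide <- tibble(id = 1:2, x = c(10, 20), y = c(30, 40))

long <- wide |> pivot_longer(c(x, y), names_to = "key", values_to = "value")
back <- long |> pivot_wider(names_from = key, values_from = value)
```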

String separation (tidyr 1.3+)

separate(col, into = c("a", "b")) -> separate_wider_delim(col, delim = "_", names = c("a", "b"))
extract(col, into = "x", regex) -> separate_wider_regex(col, patterns = c(x = regex))
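The two tidyr 1.3+ functions are shown side by side on a hypothetical `id` column. In `separate_wider_regex()`, named patterns become columns, while unnamed ones (here the `"_"`) are matched and dropped.

```r
library(tidyr)
library(tibble)

df <- tibble(id = c("a_1", "b_2"))

by_delim <- df |>
  separate_wider_delim(id, delim = "_", names = c("letter", "number"))

by_regex <- df |>
  separate_wider_regex(id, patterns = c(letter = "[a-z]+", "_", number = "\\d+"))
```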

Superseded purrr Functions (purrr 1.0+)

map_dfr(x, f) -> map(x, f) |> list_rbind()
map_dfc(x, f) -> map(x, f) |> list_cbind()
map2_dfr(x, y, f) -> map2(x, y, f) |> list_rbind()
pmap_dfr(list, f) -> pmap(list, f) |> list_rbind()
imap_dfr(x, f) -> imap(x, f) |> list_rbind()

For side effects

walk(plots, print)              # side effects only, instead of for loops
walk2(data, paths, write_csv)   # multiple arguments
