Vectorization
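- Explicit element-by-element loops are often slow in R: prefer vectorized functions, or `apply()`, `lapply()`, `sapply()`, or `purrr::map()` for clearer iteration
- Vectorized functions operate on whole vectors: use `sum(x)`, not `for (i in x) total <- total + i`
- `ifelse()` is vectorized, `if` is not: use `ifelse()` for vector conditions
- Column operations are faster than row operations: R stores matrices in column-major order, and data frames as lists of columns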
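
A minimal sketch of the points above, using a made-up vector `x` (the values and variable names are illustrative only):

```r
x <- c(3, 8, 1, 9)

# Loop version: accumulates one element at a time.
total <- 0
for (i in x) total <- total + i

# Vectorized version: a single call over the whole vector.
vec_total <- sum(x)

# ifelse() evaluates the condition element-wise and returns a vector.
labels <- ifelse(x > 5, "high", "low")   # "low" "high" "low" "high"

# sapply() applies a function to each element and simplifies the result.
squares <- sapply(x, function(v) v^2)    # 9 64 1 81
```

For arithmetic like this, `x^2` is already vectorized, so `sapply()` is shown only to illustrate the call shape.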
Indexing Gotchas
- R is 1-indexed: the first element is `x[1]`, not `x[0]`
- `x[0]` returns an empty vector: no error, so it becomes a silent bug
- A negative index excludes: `x[-1]` removes the first element
- `[[` extracts a single element, `[` returns a subset (a list stays a list)
- `df[, 1]` drops to a vector: use `df[, 1, drop = FALSE]` to keep a data frame
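
The indexing rules above can be checked with a short sketch. The vector, list, and data frame here are invented for illustration:

```r
x <- c(10, 20, 30)
first <- x[1]     # 10
empty <- x[0]     # numeric(0): no error is raised
rest  <- x[-1]    # 20 30

lst <- list(a = 1, b = "two")
sub <- lst["a"]     # still a list, of length 1
val <- lst[["a"]]   # the element itself: 1

df <- data.frame(id = 1:3, score = c(90, 85, 70))
col_vec <- df[, 1]                  # plain integer vector
col_df  <- df[, 1, drop = FALSE]    # one-column data frame
```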
NA Handling
- NA propagates: `1 + NA` is `NA`, `NA == NA` is `NA`
- Use `is.na()` to check, not `x == NA`
- Many summary functions need `na.rm = TRUE`: `mean(x)` returns `NA` if any NA is present
- `na.omit()` removes rows with any NA: may lose data unexpectedly
- `complete.cases()` returns a logical vector: `TRUE` for rows without NA
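
A small sketch of NA behavior, assuming an illustrative vector `x` and data frame `df` that are not part of the original notes:

```r
x <- c(4, NA, 10)

cmp     <- x == NA                # NA NA NA: comparison never yields TRUE
missing <- is.na(x)               # FALSE TRUE FALSE
m_raw   <- mean(x)                # NA
m_clean <- mean(x, na.rm = TRUE)  # 7

df <- data.frame(a = c(1, NA, 3), b = c("x", "y", NA))
keep   <- complete.cases(df)      # TRUE FALSE FALSE
n_kept <- nrow(na.omit(df))       # 1: two of three rows are dropped
```

Checking `nrow()` before and after `na.omit()` is one way to notice unexpected data loss.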
Factor Traps
- Old R (before 4.0.0) converted strings to factors by default: use `stringsAsFactors = FALSE` or modern R
- `levels()` shows categories, but factor values are integers internally
- Assigning a value not in the levels gives NA (with a warning): extend the levels first, e.g. `factor(x, levels = c(old, new))`
- `as.numeric(factor)` gives level indices: use `as.numeric(as.character(factor))` for values
- Dropping unused levels: `droplevels()`, or call `factor()` again
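
The factor traps above, sketched with an invented factor `f` of numeric-looking strings:

```r
f <- factor(c("10", "20", "10", "30"))

idx  <- as.numeric(f)                 # 1 2 1 3: level indices, not values
vals <- as.numeric(as.character(f))   # 10 20 10 30

# Extend the levels before assigning a value that is not yet a level.
f2 <- factor(f, levels = c(levels(f), "40"))
f2[1] <- "40"                         # no NA, no warning

# Subsetting keeps unused levels until they are dropped explicitly.
g <- f2[f2 != "30"]
lv_before <- levels(g)                # "10" "20" "30" "40"
lv_after  <- levels(droplevels(g))    # "10" "20" "40"
```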
Recycling
- The shorter vector is recycled to match the longer: `c(1,2,3) + c(10,20)` gives `11, 22, 13`
- No error if the longer length isn't a multiple of the shorter: just a warning, easy to miss
- Single values recycle intentionally: `x + 1` adds 1 to all elements
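
A sketch of recycling, including one way to detect the warning programmatically. The `tryCatch()` wrapper is an illustrative choice, not something the notes prescribe:

```r
a <- c(1, 2, 3)
b <- c(10, 20)

# Lengths 3 and 2: b is recycled as 10, 20, 10 and R emits a warning.
res <- suppressWarnings(a + b)    # 11 22 13

# Turn the warning into a flag so it cannot be missed.
warned <- tryCatch({ a + b; FALSE }, warning = function(w) TRUE)

# Lengths 4 and 2: an exact multiple, so recycling is silent.
silent <- c(1, 2, 3, 4) + b       # 11 22 13 24

# Length-1 values recycle over every element.
plus_one <- a + 1                 # 2 3 4
```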
Data Frames vs Tibbles
- Tibble never converts strings to factors: safer defaults
- Tibble never drops dimensions: `df[, 1]` stays a tibble
- Tibble prints better: shows column types, doesn't flood the console
- `as_tibble()` converts a data frame: available from the `tibble` or `dplyr` package
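
A hedged comparison of the dimension-dropping behavior. It assumes the `tibble` package is installed, so the tibble part is guarded and skipped otherwise. The data frame is invented for illustration:

```r
df <- data.frame(id = 1:3, name = c("a", "b", "c"))
base_col <- df[, 1]              # base data frame drops to an integer vector

if (requireNamespace("tibble", quietly = TRUE)) {
  tb <- tibble::as_tibble(df)
  tb_col <- tb[, 1]              # still a one-column tibble
  tb_vec <- tb[[1]]              # use [[ (or $) to get the vector
}
```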
Assignment
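- `<-` is idiomatic R: `=` works but is avoided in style guides
- `<<-` assigns in an enclosing environment: it modifies the nearest existing binding up the chain, or creates a global one if none exists; usually a mistake
- `->` right assignment exists: rarely used, confusing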
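
A minimal sketch of how `<<-` differs from `<-` inside a function. The names `counter`, `bump_outer`, `bump_local`, and `make_counter` are hypothetical:

```r
counter <- 0

bump_outer <- function() {
  counter <<- counter + 1    # modifies the binding outside the function
  invisible(NULL)
}
bump_outer()                 # counter is now 1

bump_local <- function(counter) {
  counter <- counter + 1     # only changes the function's own copy
  counter
}
local_result <- bump_local(counter)   # 2, while counter stays 1

# One deliberate use of <<-: state kept inside a closure.
make_counter <- function() {
  n <- 0
  function() {
    n <<- n + 1
    n
  }
}
tick <- make_counter()
tick()
second <- tick()             # 2
```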
Scope
- Functions look up free variables in the enclosing environment (where the function was defined): can accidentally use a global variable
- A local variable shadows a global: the same name hides the outer variable
- `local()` creates an isolated scope: variables don't leak out
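
The scoping rules above, sketched with hypothetical names (`rate`, `scale_value`, `scale_shadow`, `scratch_value`):

```r
rate <- 0.5

# rate is not defined inside the function, so R finds the outer one.
scale_value <- function(v) v * rate
from_outer <- scale_value(10)     # 5

# A local rate shadows the outer one; the outer value is untouched.
scale_shadow <- function(v) {
  rate <- 2
  v * rate
}
from_local <- scale_shadow(10)    # 20

# local() evaluates in a fresh environment; scratch_value does not leak out.
result <- local({
  scratch_value <- 100
  scratch_value + 1
})
leaked <- exists("scratch_value", inherits = FALSE)
```

Passing `rate` as an explicit argument is one way to avoid depending on a global by accident.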
Common Mistakes
- `T` and `F` can be overwritten: always use `TRUE` and `FALSE`
- `1:length(x)` fails on empty `x`: gives `c(1, 0)`, use `seq_along(x)`
- `sample(x)` with a single number samples from `1:x`: `sample(c(5))` is the same as `sample(5)` and returns a permutation of 1:5, not `5`
- String splitting: `strsplit()` returns a list, even for a single string
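
A short sketch of three of these mistakes. The index-based workaround for `sample()` is one common approach, shown as an assumption rather than the only fix:

```r
x <- numeric(0)
bad_idx  <- 1:length(x)      # 1 0: a loop over this runs twice
good_idx <- seq_along(x)     # integer(0): a loop over this never runs

n_bad <- 0
for (i in 1:length(x)) n_bad <- n_bad + 1       # n_bad ends at 2
n_good <- 0
for (i in seq_along(x)) n_good <- n_good + 1    # n_good stays 0

# A length-1 numeric is treated as "sample from 1:x".
pool <- c(5)
perm <- sample(pool)                   # five values, a permutation of 1:5
pick <- pool[sample(length(pool))]     # sampling positions instead gives 5

# strsplit() always returns a list, one element per input string.
parts  <- strsplit("a,b,c", ",")
pieces <- parts[[1]]                   # "a" "b" "c"
```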