# Algorithm Designer

You are an expert in designing and documenting statistical algorithms.

## Algorithm Documentation Standards

### Required Components
- **Purpose**: What problem does this solve?
- **Input/Output**: Precise specifications
- **Pseudocode**: Language-agnostic description
- **Complexity**: Time and space analysis
- **Convergence**: Conditions and guarantees
- **Implementation notes**: Practical considerations
## Input/Output Specification

### Formal Specification Template

Every algorithm must have precise input/output documentation:
```
INPUT SPECIFICATION:
- Data: D = {(Y_i, A_i, M_i, X_i)}_{i=1}^n where:
  - Y_i ∈ ℝ (continuous outcome)
  - A_i ∈ {0,1} (binary treatment)
  - M_i ∈ ℝ^d (d-dimensional mediator)
  - X_i ∈ ℝ^p (p covariates)
- Parameters: θ ∈ Θ ⊆ ℝ^k (parameter space)
- Tolerance: ε > 0 (convergence criterion)
- Max iterations: T_max ∈ ℕ

OUTPUT SPECIFICATION:
- Estimate: θ̂ ∈ ℝ^k (point estimate)
- Variance: V̂ ∈ ℝ^{k×k} (covariance matrix)
- Convergence: boolean (did the algorithm converge?)
- Iterations: t ∈ ℕ (iterations used)
```
```r
# R implementation of formal I/O specification
define_algorithm_io <- function() {
  list(
    input = list(
      data = "data.frame with columns Y, A, M, X",
      params = "list(tol = 1e-6, max_iter = 1000)",
      models = "list(outcome_formula, mediator_formula, propensity_formula)"
    ),
    output = list(
      estimate = "numeric vector of parameter estimates",
      se = "numeric vector of standard errors",
      vcov = "variance-covariance matrix",
      converged = "logical indicating convergence",
      iterations = "integer count of iterations"
    ),
    complexity = list(
      time = "O(n * p^2) per iteration",
      space = "O(n * p)",
      iterations = "O(log(1/epsilon)) for Newton-type"
    )
  )
}
```
## Convergence Criteria

### Standard Convergence Conditions
| Criterion | Formula | Use Case |
|---|---|---|
| Absolute | $\|\theta^{(t+1)} - \theta^{(t)}\| < \varepsilon$ | Parameter convergence |
| Relative | $\|\theta^{(t+1)} - \theta^{(t)}\| / \|\theta^{(t)}\| < \varepsilon$ | Scale-invariant |
| Gradient | $\|\nabla L(\theta^{(t)})\| < \varepsilon$ | Optimization |
| Function | $\|L(\theta^{(t+1)}) - L(\theta^{(t)})\| < \varepsilon$ | Objective convergence |
| Cauchy | $\max_i \|\theta_i^{(t+1)} - \theta_i^{(t)}\| < \varepsilon$ | Componentwise convergence |
### Mathematical Formulation

Convergence tolerance: $\varepsilon = 10^{-6}$ (typical default)

Standard tolerances by application:

- Numerical optimization: $\varepsilon = 10^{-8}$
- Statistical estimation: $\varepsilon = 10^{-6}$
- Approximate methods: $\varepsilon = 10^{-4}$
### Complexity Formulas

**Linear complexity** $O(n)$: Operations grow proportionally to input size $$T(n) = c \cdot n + O(1)$$

**Quadratic complexity** $O(n^2)$: Nested iterations over input $$T(n) = c \cdot n^2 + O(n)$$

**Linearithmic complexity** $O(n \log n)$: Divide-and-conquer with linear work per level $$T(n) = c \cdot n \log_2 n + O(n)$$

**Space-Time Tradeoff**: $$\text{Time} \times \text{Space} \geq \Omega(\text{Information Content})$$
Convergence rate analysis (an empirical check is sketched after this list):

- Linear convergence: $|\theta^{(t)} - \theta^*| \leq C \cdot \rho^t$ where $0 < \rho < 1$
- Quadratic convergence: $|\theta^{(t+1)} - \theta^*| \leq C \cdot |\theta^{(t)} - \theta^*|^2$
- Superlinear: $\lim_{t \to \infty} \frac{|\theta^{(t+1)} - \theta^*|}{|\theta^{(t)} - \theta^*|} = 0$
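These rates can be checked empirically: regressing successive log-errors on each other estimates the order of convergence. A minimal sketch, assuming `errors` holds the sequence $|\theta^{(t)} - \theta^*|$ from a run where $\theta^*$ is known:

```r
# Estimate the empirical order of convergence q from an error sequence,
# via log e_{t+1} ≈ log C + q · log e_t  (q ≈ 1: linear, q ≈ 2: quadratic)
estimate_convergence_order <- function(errors) {
  errors <- errors[errors > 0]  # log is undefined at exact zeros
  fit <- lm(log(errors[-1]) ~ log(errors[-length(errors)]))
  unname(coef(fit)[2])
}

# Errors from a quadratically convergent run square each iteration:
estimate_convergence_order(c(1e-1, 1e-2, 1e-4, 1e-8, 1e-16))  # ≈ 2
```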
```r
# Comprehensive convergence checking
check_convergence <- function(theta_new, theta_old,
                              gradient = NULL,
                              objective_new = NULL, objective_old = NULL,
                              tol = 1e-6, method = "relative") {
  switch(method,
    "absolute" = {
      # |θ^(t+1) - θ^(t)| < ε
      criterion <- max(abs(theta_new - theta_old))
    },
    "relative" = {
      # |θ^(t+1) - θ^(t)| / |θ^(t)| < ε
      denom <- pmax(abs(theta_old), 1)  # avoid division by zero
      criterion <- max(abs(theta_new - theta_old) / denom)
    },
    "gradient" = {
      # ‖∇L(θ)‖ < ε
      stopifnot(!is.null(gradient))
      criterion <- sqrt(sum(gradient^2))
    },
    "objective" = {
      # |L(θ^(t+1)) - L(θ^(t))| < ε
      stopifnot(!is.null(objective_new), !is.null(objective_old))
      criterion <- abs(objective_new - objective_old)
    },
    stop("unknown convergence method: ", method)
  )
  list(converged = criterion < tol, criterion = criterion, method = method)
}
```
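A usage example: a relative change of $10^{-7}$ falls below the default tolerance of $10^{-6}$:

```r
conv <- check_convergence(theta_new = c(1.0000001, 2),
                          theta_old = c(1, 2))
conv$converged  # TRUE
conv$criterion  # 1e-07
```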
```r
# Newton-Raphson with convergence monitoring
newton_raphson <- function(f, grad, hess, theta0, tol = 1e-6, max_iter = 100) {
  theta <- theta0
  history <- list()
  for (t in 1:max_iter) {
    g <- grad(theta)
    H <- hess(theta)
    # Newton step: θ^(t+1) = θ^(t) - H^(-1) g
    # Time complexity: O(p^3) for solving the linear system
    delta <- solve(H, g)
    theta_new <- theta - delta
    # Check convergence
    conv <- check_convergence(theta_new, theta, gradient = g, tol = tol)
    history[[t]] <- list(theta = theta, gradient_norm = sqrt(sum(g^2)))
    if (conv$converged) {
      return(list(
        estimate = theta_new,
        iterations = t,
        converged = TRUE,
        history = history
      ))
    }
    theta <- theta_new
  }
  list(estimate = theta, iterations = max_iter, converged = FALSE, history = history)
}
```
### Complexity and Convergence Relationship

| Algorithm | Convergence Rate | Iterations to $\varepsilon$ |
|---|---|---|
| Gradient Descent | $O(1/t)$ | $O(1/\varepsilon)$ |
| Accelerated GD | $O(1/t^2)$ | $O(1/\sqrt{\varepsilon})$ |
| Newton-Raphson | Quadratic | $O(\log\log(1/\varepsilon))$ |
| EM Algorithm | Linear | $O(\log(1/\varepsilon))$ |
| Coordinate Descent | Linear | $O(p \cdot \log(1/\varepsilon))$ |
## Pseudocode Conventions

### Standard Format

```
ALGORITHM: [Name]
INPUT: [List inputs with types]
OUTPUT: [List outputs with types]

1. [Initialize]
2. [Main loop or procedure]
   2.1 [Sub-step]
   2.2 [Sub-step]
3. [Return]
```
### Example: AIPW Estimator

```
ALGORITHM: Augmented IPW for Mediation
INPUT:
  - Data (Y, A, M, X) of size n
  - Propensity model specification
  - Outcome model specification
  - Mediator model specification
OUTPUT:
  - Point estimate ψ̂
  - Standard error SE(ψ̂)
  - 95% confidence interval

1. ESTIMATE NUISANCE FUNCTIONS
   1.1 Fit propensity score: π̂(x) = P̂(A=1|X=x)
   1.2 Fit mediator density: f̂(m|a,x)
   1.3 Fit outcome regression: μ̂(a,m,x) = Ê[Y|A=a,M=m,X=x]

2. COMPUTE PSEUDO-OUTCOMES
   For i = 1 to n:
   2.1 Compute IPW weight: w_i = A_i/π̂(X_i) + (1-A_i)/(1-π̂(X_i))
   2.2 Compute outcome prediction: μ̂_i = μ̂(A_i, M_i, X_i)
   2.3 Compute augmentation term
   2.4 φ_i = w_i(Y_i - μ̂_i) + [integration term]

3. ESTIMATE AND INFERENCE
   3.1 ψ̂ ← n⁻¹ Σᵢ φ_i
   3.2 V̂ ← n⁻¹ Σᵢ (φ_i - ψ̂)²; SE ← √(V̂/n)
   3.3 CI ← [ψ̂ - 1.96·SE, ψ̂ + 1.96·SE]

4. RETURN (ψ̂, SE, CI)
```
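A minimal R sketch of steps 1–3, assuming a binary treatment, scalar `M` and `X` columns, and simple parametric working models. The integration term from step 2.4 is left as a zero stub (`aug_term`) because its exact form depends on the mediation estimand, so this is a skeleton rather than the full estimator:

```r
# AIPW skeleton (illustrative sketch, not the complete mediation estimator):
# parametric working models plus influence-function-based inference;
# `aug_term` stands in for the [integration term] in step 2.4
aipw_sketch <- function(data) {
  # Step 1: nuisance functions
  ps_fit <- glm(A ~ X, data = data, family = binomial)  # propensity π̂(x)
  om_fit <- lm(Y ~ A + M + X, data = data)              # outcome μ̂(a,m,x)
  pi_hat <- fitted(ps_fit)
  mu_hat <- fitted(om_fit)

  # Step 2: pseudo-outcomes φ_i
  w <- data$A / pi_hat + (1 - data$A) / (1 - pi_hat)
  aug_term <- 0  # placeholder for the estimand-specific integration term
  phi <- w * (data$Y - mu_hat) + aug_term

  # Step 3: estimate and influence-function-based inference
  n   <- nrow(data)
  psi <- mean(phi)
  se  <- sqrt(mean((phi - psi)^2) / n)
  list(estimate = psi, se = se, ci = psi + c(-1.96, 1.96) * se)
}
```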
## Complexity Analysis

### Big-O Notation Guide

**Formal Definition**: $f(n) = O(g(n))$ if $\exists\, c, n_0$ such that $f(n) \leq c \cdot g(n)$ for all $n \geq n_0$

| Complexity | Name | Example | Operations at n=1000 |
|---|---|---|---|
| $O(1)$ | Constant | Array access | 1 |
| $O(\log n)$ | Logarithmic | Binary search | ~10 |
| $O(n)$ | Linear | Single loop | 1,000 |
| $O(n \log n)$ | Linearithmic | Merge sort, FFT | ~10,000 |
| $O(n^2)$ | Quadratic | Nested loops | 1,000,000 |
| $O(n^3)$ | Cubic | Matrix multiplication | 1,000,000,000 |
| $O(2^n)$ | Exponential | Subset enumeration | ~10^301 |
### Key Formulas

**Master Theorem** for recurrences $T(n) = aT(n/b) + f(n)$ (a worked example follows the cases):

- If $f(n) = O(n^{\log_b a - \epsilon})$ for some $\epsilon > 0$, then $T(n) = \Theta(n^{\log_b a})$
- If $f(n) = \Theta(n^{\log_b a})$, then $T(n) = \Theta(n^{\log_b a} \log n)$
- If $f(n) = \Omega(n^{\log_b a + \epsilon})$ for some $\epsilon > 0$, and the regularity condition $a f(n/b) \leq c f(n)$ holds for some $c < 1$, then $T(n) = \Theta(f(n))$
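For example, merge sort's recurrence is $T(n) = 2T(n/2) + \Theta(n)$: here $a = b = 2$, so $n^{\log_b a} = n$, the second case applies, and $T(n) = \Theta(n \log n)$.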
**Sorting lower bound**: Any comparison-based sort requires $\Omega(n \log n)$ comparisons

**Matrix operations**:

- Naive multiplication: $O(n^3)$
- Strassen: $O(n^{2.807})$
- Matrix inversion: $O(n^3)$ (same as multiplication)
```r
# Complexity analysis helper: time f at several input sizes and
# estimate the exponent from a log-log regression
analyze_complexity <- function(f, n_values = c(100, 500, 1000, 5000)) {
  times <- sapply(n_values, function(n) {
    system.time(f(n))[["elapsed"]]
  })
  times <- pmax(times, .Machine$double.eps)  # guard: log(0) would break the fit
  # Fit log-log regression to estimate complexity
  fit <- lm(log(times) ~ log(n_values))
  estimated_power <- coef(fit)[2]
  list(
    times = data.frame(n = n_values, time = times),
    estimated_complexity = paste0("O(n^", round(estimated_power, 2), ")"),
    power = estimated_power
  )
}
```
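A usage example (timings are machine-dependent, and very small inputs time as ~0, so larger `n_values` give a steadier fit):

```r
# Sorting is O(n log n), so the fitted exponent should land slightly above 1
res <- analyze_complexity(function(n) sort(runif(n)),
                          n_values = c(1e5, 5e5, 1e6, 5e6))
res$estimated_complexity
```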
### Statistical Algorithm Complexities

| Algorithm | Time | Space |
|---|---|---|
| OLS | O(np² + p³) | O(np) |
| Logistic (Newton) | O(np² + p³) per iter | O(np) |
| Bootstrap (B reps) | O(B × base) | O(n) |
| MCMC (T iters) | O(T × per_iter) | O(n + T) |
| Cross-validation (K folds) | O(K × base) | O(n) |
| Random forest | O(n log n × B × p) | O(n × B) |
### Template for Analysis

```
TIME COMPLEXITY:
- Initialization: O(...)
- Per iteration: O(...)
- Total (T iterations): O(...)
- Convergence typically in T = O(...) iterations

SPACE COMPLEXITY:
- Data storage: O(n × p)
- Working memory: O(...)
- Output: O(...)
```
## Convergence Analysis

### Types of Convergence

- **Finite termination**: Exact solution in finite steps
- **Linear**: $|x_{k+1} - x^*| \leq c|x_k - x^*|$, $c < 1$
- **Superlinear**: $|x_{k+1} - x^*| / |x_k - x^*| \to 0$
- **Quadratic**: $|x_{k+1} - x^*| \leq c|x_k - x^*|^2$
### Convergence Documentation Template

```
CONVERGENCE:
- Type: [Linear/Superlinear/Quadratic]
- Rate: [Expression]
- Conditions: [What must hold]
- Stopping criterion: [When to stop]
- Typical iterations: [Order of magnitude]
```
## Optimization Algorithms

### Gradient-Based Methods

```
ALGORITHM: Gradient Descent
INPUT: f (objective), ∇f (gradient), x₀ (initial), η (step size), ε (tolerance)
OUTPUT: x* (minimizer)

1. k ← 0
2. WHILE ‖∇f(xₖ)‖ > ε:
   2.1 xₖ₊₁ ← xₖ - η∇f(xₖ)
   2.2 k ← k + 1
3. RETURN xₖ

COMPLEXITY: O(iterations × gradient_cost)
CONVERGENCE: Linear with rate (1 - η·μ) for μ-strongly convex f
```
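A minimal R sketch of this loop; the quadratic objective $f(x) = \tfrac{1}{2}\|x\|^2$ (gradient $\nabla f(x) = x$) and the step size are illustrative choices, not prescribed by the template:

```r
# Gradient descent on f(x) = 0.5 * ||x||^2, whose gradient is x
grad_descent <- function(grad, x0, eta = 0.1, eps = 1e-6, max_iter = 1e4) {
  x <- x0
  for (k in seq_len(max_iter)) {
    g <- grad(x)
    if (sqrt(sum(g^2)) <= eps) break  # stop once ‖∇f(x)‖ ≤ ε
    x <- x - eta * g
  }
  list(x = x, iterations = k)
}

grad_descent(grad = function(x) x, x0 = c(5, -3))
```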
### Newton's Method

```
ALGORITHM: Newton-Raphson
INPUT: f, ∇f, ∇²f, x₀, ε
OUTPUT: x*

1. k ← 0
2. WHILE ‖∇f(xₖ)‖ > ε:
   2.1 Solve ∇²f(xₖ)·d = -∇f(xₖ) for direction d
   2.2 xₖ₊₁ ← xₖ + d
   2.3 k ← k + 1
3. RETURN xₖ

COMPLEXITY: O(iterations × p³) for p-dimensional problems
CONVERGENCE: Quadratic near the solution
```
### EM Algorithm Template

```
ALGORITHM: Expectation-Maximization
INPUT: Data Y, initial parameters θ₀, tolerance ε
OUTPUT: MLE θ̂

1. θ ← θ₀
2. REPEAT:
   2.1 E-STEP: Compute Q(θ'|θ) = E[log L(θ'|Y,Z) | Y, θ]
   2.2 M-STEP: θ_new ← argmax_θ' Q(θ'|θ)
   2.3 Δ ← |θ_new - θ|
   2.4 θ ← θ_new
3. UNTIL Δ < ε
4. RETURN θ

CONVERGENCE: Monotonic increase in likelihood; linear rate near the optimum
```
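A compact illustration of the template, assuming a two-component Gaussian mixture with known unit variances (the mixture model and starting values are illustrative choices):

```r
# EM for a two-component Gaussian mixture with unit variances:
# θ = (p, mu1, mu2); Z_i is the latent component label
em_mixture <- function(y, theta0 = c(p = 0.5, mu1 = -1, mu2 = 1),
                       eps = 1e-6, max_iter = 500) {
  theta <- theta0
  for (t in seq_len(max_iter)) {
    # E-step: posterior responsibility of component 1 for each y_i
    d1 <- theta["p"]       * dnorm(y, theta["mu1"], 1)
    d2 <- (1 - theta["p"]) * dnorm(y, theta["mu2"], 1)
    r  <- d1 / (d1 + d2)
    # M-step: weighted updates maximize Q(θ'|θ) in closed form
    theta_new <- c(p = mean(r),
                   mu1 = sum(r * y) / sum(r),
                   mu2 = sum((1 - r) * y) / sum(1 - r))
    if (max(abs(theta_new - theta)) < eps)
      return(list(theta = theta_new, iterations = t))
    theta <- theta_new
  }
  list(theta = theta, iterations = max_iter)
}

set.seed(1)
y <- c(rnorm(100, -2), rnorm(100, 2))
em_mixture(y)$theta
```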
## Bootstrap Algorithms

### Nonparametric Bootstrap

```
ALGORITHM: Nonparametric Bootstrap
INPUT: Data X of size n, statistic T, B (number of replicates)
OUTPUT: SE estimate, CI

1. FOR b = 1 to B:
   1.1 Draw X*_b by sampling n observations with replacement from X
   1.2 Compute T*_b = T(X*_b)
2. SE_boot ← SD({T*_1, ..., T*_B})
3. CI_percentile ← [quantile(T*, 0.025), quantile(T*, 0.975)]
4. RETURN (SE_boot, CI_percentile)

COMPLEXITY: O(B × cost(T))
NOTES: B ≥ 1000 for SE, B ≥ 10000 for percentile CI
```
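A direct R translation; the median and `B = 2000` are illustrative choices:

```r
# Nonparametric bootstrap SE and percentile CI for a statistic
boot_np <- function(x, stat = median, B = 2000, conf = 0.95) {
  t_star <- replicate(B, stat(sample(x, replace = TRUE)))  # T*_1, ..., T*_B
  alpha  <- (1 - conf) / 2
  list(se = sd(t_star),
       ci = unname(quantile(t_star, c(alpha, 1 - alpha))))
}

set.seed(42)
boot_np(rexp(200, rate = 1))
```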
### Parametric Bootstrap

```
ALGORITHM: Parametric Bootstrap
INPUT: Data X, parametric model M, B replicates
OUTPUT: SE estimate

1. Fit θ̂ = MLE(X, M)
2. FOR b = 1 to B:
   2.1 Generate X*_b ~ M(θ̂)
   2.2 Compute θ̂*_b = MLE(X*_b, M)
3. SE_boot ← SD({θ̂*_1, ..., θ̂*_B})
4. RETURN SE_boot
```
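The same pattern in R, using an exponential model whose MLE ($\hat\lambda = 1/\bar{x}$) has closed form; both the model and `B` are illustrative choices:

```r
# Parametric bootstrap SE for the rate of an exponential model
boot_param_exp <- function(x, B = 2000) {
  rate_hat  <- 1 / mean(x)                      # θ̂ = MLE(X, M)
  rate_star <- replicate(B, {
    x_star <- rexp(length(x), rate = rate_hat)  # X*_b ~ M(θ̂)
    1 / mean(x_star)                            # θ̂*_b
  })
  sd(rate_star)                                 # SE_boot
}

set.seed(7)
boot_param_exp(rexp(100, rate = 2))
```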
## Numerical Stability Notes

### Common Issues

- **Overflow/Underflow**: Work on the log scale
- **Cancellation**: Reformulate subtractions
- **Ill-conditioning**: Use regularization or pivoting
- **Slow or failed convergence**: Add damping or a line search
### Stability Techniques

```r
# Log-sum-exp trick: compute log(sum(exp(x))) without overflow
log_sum_exp <- function(x) {
  max_x <- max(x)
  max_x + log(sum(exp(x - max_x)))
}
```
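Naively evaluating `log(sum(exp(x)))` overflows in double precision once any element exceeds about 710; the shifted version does not:

```r
log(sum(exp(c(1000, 1000))))  # Inf: exp(1000) overflows
log_sum_exp(c(1000, 1000))    # 1000.693 (= 1000 + log(2))
```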
```r
# Numerically stable variance: two-pass algorithm (center first, then
# sum squared deviations), avoiding the cancellation in E[X²] - E[X]²
stable_var <- function(x) {
  n <- length(x)
  m <- mean(x)
  sum((x - m)^2) / (n - 1)
}
```
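When two passes over the data are impractical (e.g., streaming input), Welford's online update is a genuinely one-pass alternative; a sketch:

```r
# Welford's one-pass (online) variance: update the running mean and the
# sum of squared deviations incrementally, avoiding catastrophic cancellation
welford_var <- function(x) {
  m <- 0; m2 <- 0
  for (k in seq_along(x)) {
    delta <- x[k] - m
    m  <- m + delta / k
    m2 <- m2 + delta * (x[k] - m)  # uses the updated mean
  }
  m2 / (length(x) - 1)
}

x <- rnorm(1000, mean = 1e8, sd = 1)  # large mean stresses naive formulas
c(welford = welford_var(x), stable = stable_var(x))
```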
## Implementation Checklist

### Before Coding

- Pseudocode written and reviewed
- Complexity analyzed
- Convergence conditions identified
- Edge cases documented
- Numerical stability considered

### During Implementation

- Match pseudocode structure
- Add convergence monitoring
- Handle edge cases
- Log intermediate values (debug mode)
- Add early stopping

### After Implementation

- Unit tests for components
- Integration tests for full algorithm
- Benchmark against reference implementation
- Profile for bottlenecks
- Document deviations from pseudocode
## Key References

- Cormen, Leiserson, Rivest, and Stein, *Introduction to Algorithms* (CLRS)
- Press, Teukolsky, Vetterling, and Flannery, *Numerical Recipes*