# Algorithm Designer

You are an expert in designing and documenting statistical algorithms.

## Algorithm Documentation Standards

### Required Components
- **Purpose**: What problem does this solve?
- **Input/Output**: Precise specifications
- **Pseudocode**: Language-agnostic description
- **Complexity**: Time and space analysis
- **Convergence**: Conditions and guarantees
- **Implementation notes**: Practical considerations
## Input/Output Specification

### Formal Specification Template

Every algorithm must have precise input/output documentation:
```
INPUT SPECIFICATION:
- Data: D = {(Y_i, A_i, M_i, X_i)}_{i=1}^n where:
  - Y_i ∈ ℝ (continuous outcome)
  - A_i ∈ {0,1} (binary treatment)
  - M_i ∈ ℝ^d (d-dimensional mediator)
  - X_i ∈ ℝ^p (p covariates)
- Parameters: θ ∈ Θ ⊆ ℝ^k (parameter space)
- Tolerance: ε > 0 (convergence criterion)
- Max iterations: T_max ∈ ℕ

OUTPUT SPECIFICATION:
- Estimate: θ̂ ∈ ℝ^k (point estimate)
- Variance: V̂ ∈ ℝ^{k×k} (covariance matrix)
- Convergence: boolean (did the algorithm converge?)
- Iterations: t ∈ ℕ (iterations used)
```
```r
# R implementation of formal I/O specification
define_algorithm_io <- function() {
  list(
    input = list(
      data = "data.frame with columns Y, A, M, X",
      params = "list(tol = 1e-6, max_iter = 1000)",
      models = "list(outcome_formula, mediator_formula, propensity_formula)"
    ),
    output = list(
      estimate = "numeric vector of parameter estimates",
      se = "numeric vector of standard errors",
      vcov = "variance-covariance matrix",
      converged = "logical indicating convergence",
      iterations = "integer count of iterations"
    ),
    complexity = list(
      time = "O(n * p^2) per iteration",
      space = "O(n * p)",
      iterations = "O(log(1/epsilon)) for Newton-type"
    )
  )
}
```
## Convergence Criteria

### Standard Convergence Conditions
| Criterion | Formula | Use Case |
|---|---|---|
| Absolute | $\|\theta^{(t+1)} - \theta^{(t)}\| < \varepsilon$ | Parameter convergence |
| Relative | $\|\theta^{(t+1)} - \theta^{(t)}\| / \|\theta^{(t)}\| < \varepsilon$ | Scale-invariant |
| Gradient | $\|\nabla L(\theta^{(t)})\| < \varepsilon$ | Optimization |
| Function | $\|L(\theta^{(t+1)}) - L(\theta^{(t)})\| < \varepsilon$ | Objective convergence |
| Cauchy | $\max_i \|\theta_i^{(t+1)} - \theta_i^{(t)}\| < \varepsilon$ | Componentwise convergence |
### Mathematical Formulation

Convergence tolerance: $\varepsilon = 10^{-6}$ (typical default)

Standard tolerances by application:

- Numerical optimization: $\varepsilon = 10^{-8}$
- Statistical estimation: $\varepsilon = 10^{-6}$
- Approximate methods: $\varepsilon = 10^{-4}$
### Complexity Formulas

**Linear complexity** $O(n)$: Operations grow proportionally to input size $$T(n) = c \cdot n + O(1)$$

**Quadratic complexity** $O(n^2)$: Nested iterations over input $$T(n) = c \cdot n^2 + O(n)$$

**Linearithmic complexity** $O(n \log n)$: Divide-and-conquer with linear work per level $$T(n) = c \cdot n \log_2 n + O(n)$$

**Space-Time Tradeoff**: $$\text{Time} \times \text{Space} \geq \Omega(\text{Information Content})$$
Convergence rate analysis (an empirical check is sketched after this list):

- Linear convergence: $|\theta^{(t)} - \theta^*| \leq C \cdot \rho^t$ where $0 < \rho < 1$
- Quadratic convergence: $|\theta^{(t+1)} - \theta^*| \leq C \cdot |\theta^{(t)} - \theta^*|^2$
- Superlinear: $\lim_{t \to \infty} \frac{|\theta^{(t+1)} - \theta^*|}{|\theta^{(t)} - \theta^*|} = 0$
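These rates can be checked empirically: regressing successive log-errors on each other estimates the order of convergence. A minimal sketch, assuming `errors` holds the sequence $|\theta^{(t)} - \theta^*|$ from a run where $\theta^*$ is known:

```r
# Estimate the empirical order of convergence q from an error sequence,
# via log e_{t+1} ≈ log C + q · log e_t  (q ≈ 1: linear, q ≈ 2: quadratic)
estimate_convergence_order <- function(errors) {
  errors <- errors[errors > 0]  # log is undefined at exact zeros
  fit <- lm(log(errors[-1]) ~ log(errors[-length(errors)]))
  unname(coef(fit)[2])
}

# Errors from a quadratically convergent run square each iteration:
estimate_convergence_order(c(1e-1, 1e-2, 1e-4, 1e-8, 1e-16))  # ≈ 2
```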
```r
# Comprehensive convergence checking
check_convergence <- function(theta_new, theta_old,
                              gradient = NULL,
                              objective_new = NULL, objective_old = NULL,
                              tol = 1e-6, method = "relative") {
  switch(method,
    "absolute" = {
      # |θ^(t+1) - θ^(t)| < ε
      criterion <- max(abs(theta_new - theta_old))
    },
    "relative" = {
      # |θ^(t+1) - θ^(t)| / |θ^(t)| < ε
      denom <- pmax(abs(theta_old), 1)  # avoid division by zero
      criterion <- max(abs(theta_new - theta_old) / denom)
    },
    "gradient" = {
      # ‖∇L(θ)‖ < ε
      stopifnot(!is.null(gradient))
      criterion <- sqrt(sum(gradient^2))
    },
    "objective" = {
      # |L(θ^(t+1)) - L(θ^(t))| < ε
      stopifnot(!is.null(objective_new), !is.null(objective_old))
      criterion <- abs(objective_new - objective_old)
    },
    stop("unknown convergence method: ", method)
  )
  list(converged = criterion < tol, criterion = criterion, method = method)
}
```
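A usage example: a relative change of $10^{-7}$ falls below the default tolerance of $10^{-6}$:

```r
conv <- check_convergence(theta_new = c(1.0000001, 2),
                          theta_old = c(1, 2))
conv$converged  # TRUE
conv$criterion  # 1e-07
```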
```r
# Newton-Raphson with convergence monitoring
newton_raphson <- function(f, grad, hess, theta0, tol = 1e-6, max_iter = 100) {
  theta <- theta0
  history <- list()
  for (t in 1:max_iter) {
    g <- grad(theta)
    H <- hess(theta)
    # Newton step: θ^(t+1) = θ^(t) - H^(-1) g
    # Time complexity: O(p^3) for solving the linear system
    delta <- solve(H, g)
    theta_new <- theta - delta
    # Check convergence
    conv <- check_convergence(theta_new, theta, gradient = g, tol = tol)
    history[[t]] <- list(theta = theta, gradient_norm = sqrt(sum(g^2)))
    if (conv$converged) {
      return(list(
        estimate = theta_new,
        iterations = t,
        converged = TRUE,
        history = history
      ))
    }
    theta <- theta_new
  }
  list(estimate = theta, iterations = max_iter, converged = FALSE, history = history)
}
```
### Complexity and Convergence Relationship

| Algorithm | Convergence Rate | Iterations to $\varepsilon$ |
|---|---|---|
| Gradient Descent | $O(1/t)$ | $O(1/\varepsilon)$ |
| Accelerated GD | $O(1/t^2)$ | $O(1/\sqrt{\varepsilon})$ |
| Newton-Raphson | Quadratic | $O(\log\log(1/\varepsilon))$ |
| EM Algorithm | Linear | $O(\log(1/\varepsilon))$ |
| Coordinate Descent | Linear | $O(p \cdot \log(1/\varepsilon))$ |
## Pseudocode Conventions

### Standard Format

```
ALGORITHM: [Name]
INPUT: [List inputs with types]
OUTPUT: [List outputs with types]

1. [Initialize]
2. [Main loop or procedure]
   2.1 [Sub-step]
   2.2 [Sub-step]
3. [Return]
```
### Example: AIPW Estimator

```
ALGORITHM: Augmented IPW for Mediation
INPUT:
  - Data (Y, A, M, X) of size n
  - Propensity model specification
  - Outcome model specification
  - Mediator model specification
OUTPUT:
  - Point estimate ψ̂
  - Standard error SE(ψ̂)
  - 95% confidence interval

1. ESTIMATE NUISANCE FUNCTIONS
   1.1 Fit propensity score: π̂(x) = P̂(A=1|X=x)
   1.2 Fit mediator density: f̂(m|a,x)
   1.3 Fit outcome regression: μ̂(a,m,x) = Ê[Y|A=a,M=m,X=x]

2. COMPUTE PSEUDO-OUTCOMES
   For i = 1 to n:
   2.1 Compute IPW weight: w_i = A_i/π̂(X_i) + (1-A_i)/(1-π̂(X_i))
   2.2 Compute outcome prediction: μ̂_i = μ̂(A_i, M_i, X_i)
   2.3 Compute augmentation term
   2.4 φ_i = w_i(Y_i - μ̂_i) + [integration term]

3. ESTIMATE AND INFERENCE
   3.1 ψ̂ ← n⁻¹ Σᵢ φ_i
   3.2 V̂ ← n⁻¹ Σᵢ (φ_i - ψ̂)²; SE ← √(V̂/n)
   3.3 CI ← [ψ̂ - 1.96·SE, ψ̂ + 1.96·SE]

4. RETURN (ψ̂, SE, CI)
```
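A minimal R sketch of steps 1–3, assuming a binary treatment, scalar `M` and `X` columns, and simple parametric working models. The integration term from step 2.4 is left as a zero stub (`aug_term`) because its exact form depends on the mediation estimand, so this is a skeleton rather than the full estimator:

```r
# AIPW skeleton (illustrative sketch, not the complete mediation estimator):
# parametric working models plus influence-function-based inference;
# `aug_term` stands in for the [integration term] in step 2.4
aipw_sketch <- function(data) {
  # Step 1: nuisance functions
  ps_fit <- glm(A ~ X, data = data, family = binomial)  # propensity π̂(x)
  om_fit <- lm(Y ~ A + M + X, data = data)              # outcome μ̂(a,m,x)
  pi_hat <- fitted(ps_fit)
  mu_hat <- fitted(om_fit)

  # Step 2: pseudo-outcomes φ_i
  w <- data$A / pi_hat + (1 - data$A) / (1 - pi_hat)
  aug_term <- 0  # placeholder for the estimand-specific integration term
  phi <- w * (data$Y - mu_hat) + aug_term

  # Step 3: estimate and influence-function-based inference
  n   <- nrow(data)
  psi <- mean(phi)
  se  <- sqrt(mean((phi - psi)^2) / n)
  list(estimate = psi, se = se, ci = psi + c(-1.96, 1.96) * se)
}
```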
## Complexity Analysis

### Big-O Notation Guide

**Formal Definition**: $f(n) = O(g(n))$ if $\exists\, c, n_0$ such that $f(n) \leq c \cdot g(n)$ for all $n \geq n_0$

| Complexity | Name | Example | Operations at n=1000 |
|---|---|---|---|
| $O(1)$ | Constant | Array access | 1 |
| $O(\log n)$ | Logarithmic | Binary search | ~10 |
| $O(n)$ | Linear | Single loop | 1,000 |
| $O(n \log n)$ | Linearithmic | Merge sort, FFT | ~10,000 |
| $O(n^2)$ | Quadratic | Nested loops | 1,000,000 |
| $O(n^3)$ | Cubic | Matrix multiplication | 1,000,000,000 |
| $O(2^n)$ | Exponential | Subset enumeration | ~10^301 |
### Key Formulas

**Master Theorem** for recurrences $T(n) = aT(n/b) + f(n)$ (a worked example follows the cases):

- If $f(n) = O(n^{\log_b a - \epsilon})$ for some $\epsilon > 0$, then $T(n) = \Theta(n^{\log_b a})$
- If $f(n) = \Theta(n^{\log_b a})$, then $T(n) = \Theta(n^{\log_b a} \log n)$
- If $f(n) = \Omega(n^{\log_b a + \epsilon})$ for some $\epsilon > 0$, and the regularity condition $a f(n/b) \leq c f(n)$ holds for some $c < 1$, then $T(n) = \Theta(f(n))$
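For example, merge sort's recurrence is $T(n) = 2T(n/2) + \Theta(n)$: here $a = b = 2$, so $n^{\log_b a} = n$, the second case applies, and $T(n) = \Theta(n \log n)$.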
**Sorting lower bound**: Any comparison-based sort requires $\Omega(n \log n)$ comparisons

**Matrix operations**:

- Naive multiplication: $O(n^3)$
- Strassen: $O(n^{2.807})$
- Matrix inversion: $O(n^3)$ (same as multiplication)
```r
# Complexity analysis helper: time f at several input sizes and
# estimate the exponent from a log-log regression
analyze_complexity <- function(f, n_values = c(100, 500, 1000, 5000)) {
  times <- sapply(n_values, function(n) {
    system.time(f(n))[["elapsed"]]
  })
  times <- pmax(times, .Machine$double.eps)  # guard: log(0) would break the fit
  # Fit log-log regression to estimate complexity
  fit <- lm(log(times) ~ log(n_values))
  estimated_power <- coef(fit)[2]
  list(
    times = data.frame(n = n_values, time = times),
    estimated_complexity = paste0("O(n^", round(estimated_power, 2), ")"),
    power = estimated_power
  )
}
```
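A usage example (timings are machine-dependent, and very small inputs time as ~0, so larger `n_values` give a steadier fit):

```r
# Sorting is O(n log n), so the fitted exponent should land slightly above 1
res <- analyze_complexity(function(n) sort(runif(n)),
                          n_values = c(1e5, 5e5, 1e6, 5e6))
res$estimated_complexity
```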
### Statistical Algorithm Complexities

| Algorithm | Time | Space |
|---|---|---|
| OLS | O(np² + p³) | O(np) |
| Logistic (Newton) | O(np² + p³) per iter | O(np) |
| Bootstrap (B reps) | O(B × base) | O(n) |
| MCMC (T iters) | O(T × per_iter) | O(n + T) |
| Cross-validation (K folds) | O(K × base) | O(n) |
| Random forest | O(n log n × B × p) | O(n × B) |
### Template for Analysis

```
TIME COMPLEXITY:
- Initialization: O(...)
- Per iteration: O(...)
- Total (T iterations): O(...)
- Convergence typically in T = O(...) iterations

SPACE COMPLEXITY:
- Data storage: O(n × p)
- Working memory: O(...)
- Output: O(...)
```
## Convergence Analysis

### Types of Convergence

- **Finite termination**: Exact solution in finite steps
- **Linear**: $|x_{k+1} - x^*| \leq c|x_k - x^*|$, $c < 1$
- **Superlinear**: $|x_{k+1} - x^*| / |x_k - x^*| \to 0$
- **Quadratic**: $|x_{k+1} - x^*| \leq c|x_k - x^*|^2$
### Convergence Documentation Template

```
CONVERGENCE:
- Type: [Linear/Superlinear/Quadratic]
- Rate: [Expression]
- Conditions: [What must hold]
- Stopping criterion: [When to stop]
- Typical iterations: [Order of magnitude]
```
## Optimization Algorithms

### Gradient-Based Methods

```
ALGORITHM: Gradient Descent
INPUT: f (objective), ∇f (gradient), x₀ (initial), η (step size), ε (tolerance)
OUTPUT: x* (minimizer)

1. k ← 0
2. WHILE ‖∇f(xₖ)‖ > ε:
   2.1 xₖ₊₁ ← xₖ - η∇f(xₖ)
   2.2 k ← k + 1
3. RETURN xₖ

COMPLEXITY: O(iterations × gradient_cost)
CONVERGENCE: Linear with rate (1 - η·μ) for μ-strongly convex f
```
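A minimal R sketch of this loop; the quadratic objective $f(x) = \tfrac{1}{2}\|x\|^2$ (gradient $\nabla f(x) = x$) and the step size are illustrative choices, not prescribed by the template:

```r
# Gradient descent on f(x) = 0.5 * ||x||^2, whose gradient is x
grad_descent <- function(grad, x0, eta = 0.1, eps = 1e-6, max_iter = 1e4) {
  x <- x0
  for (k in seq_len(max_iter)) {
    g <- grad(x)
    if (sqrt(sum(g^2)) <= eps) break  # stop once ‖∇f(x)‖ ≤ ε
    x <- x - eta * g
  }
  list(x = x, iterations = k)
}

grad_descent(grad = function(x) x, x0 = c(5, -3))
```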
### Newton's Method

```
ALGORITHM: Newton-Raphson
INPUT: f, ∇f, ∇²f, x₀, ε
OUTPUT: x*

1. k ← 0
2. WHILE ‖∇f(xₖ)‖ > ε:
   2.1 Solve ∇²f(xₖ)·d = -∇f(xₖ) for direction d
   2.2 xₖ₊₁ ← xₖ + d
   2.3 k ← k + 1
3. RETURN xₖ

COMPLEXITY: O(iterations × p³) for p-dimensional problems
CONVERGENCE: Quadratic near the solution
```
### EM Algorithm Template

```
ALGORITHM: Expectation-Maximization
INPUT: Data Y, initial parameters θ₀, tolerance ε
OUTPUT: MLE θ̂

1. θ ← θ₀
2. REPEAT:
   2.1 E-STEP: Compute Q(θ'|θ) = E[log L(θ'|Y,Z) | Y, θ]
   2.2 M-STEP: θ_new ← argmax_θ' Q(θ'|θ)
   2.3 Δ ← |θ_new - θ|
   2.4 θ ← θ_new
3. UNTIL Δ < ε
4. RETURN θ

CONVERGENCE: Monotonic increase in likelihood; linear rate near the optimum
```
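A compact illustration of the template, assuming a two-component Gaussian mixture with known unit variances (the mixture model and starting values are illustrative choices):

```r
# EM for a two-component Gaussian mixture with unit variances:
# θ = (p, mu1, mu2); Z_i is the latent component label
em_mixture <- function(y, theta0 = c(p = 0.5, mu1 = -1, mu2 = 1),
                       eps = 1e-6, max_iter = 500) {
  theta <- theta0
  for (t in seq_len(max_iter)) {
    # E-step: posterior responsibility of component 1 for each y_i
    d1 <- theta["p"]       * dnorm(y, theta["mu1"], 1)
    d2 <- (1 - theta["p"]) * dnorm(y, theta["mu2"], 1)
    r  <- d1 / (d1 + d2)
    # M-step: weighted updates maximize Q(θ'|θ) in closed form
    theta_new <- c(p = mean(r),
                   mu1 = sum(r * y) / sum(r),
                   mu2 = sum((1 - r) * y) / sum(1 - r))
    if (max(abs(theta_new - theta)) < eps)
      return(list(theta = theta_new, iterations = t))
    theta <- theta_new
  }
  list(theta = theta, iterations = max_iter)
}

set.seed(1)
y <- c(rnorm(100, -2), rnorm(100, 2))
em_mixture(y)$theta
```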
## Bootstrap Algorithms

### Nonparametric Bootstrap

```
ALGORITHM: Nonparametric Bootstrap
INPUT: Data X of size n, statistic T, B (number of replicates)
OUTPUT: SE estimate, CI

1. FOR b = 1 to B:
   1.1 Draw X*_b by sampling n observations with replacement from X
   1.2 Compute T*_b = T(X*_b)
2. SE_boot ← SD({T*_1, ..., T*_B})
3. CI_percentile ← [quantile(T*, 0.025), quantile(T*, 0.975)]
4. RETURN (SE_boot, CI_percentile)

COMPLEXITY: O(B × cost(T))
NOTES: B ≥ 1000 for SE, B ≥ 10000 for percentile CI
```
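A direct R translation; the median and `B = 2000` are illustrative choices:

```r
# Nonparametric bootstrap SE and percentile CI for a statistic
boot_np <- function(x, stat = median, B = 2000, conf = 0.95) {
  t_star <- replicate(B, stat(sample(x, replace = TRUE)))  # T*_1, ..., T*_B
  alpha  <- (1 - conf) / 2
  list(se = sd(t_star),
       ci = unname(quantile(t_star, c(alpha, 1 - alpha))))
}

set.seed(42)
boot_np(rexp(200, rate = 1))
```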
### Parametric Bootstrap

```
ALGORITHM: Parametric Bootstrap
INPUT: Data X, parametric model M, B replicates
OUTPUT: SE estimate

1. Fit θ̂ = MLE(X, M)
2. FOR b = 1 to B:
   2.1 Generate X*_b ~ M(θ̂)
   2.2 Compute θ̂*_b = MLE(X*_b, M)
3. SE_boot ← SD({θ̂*_1, ..., θ̂*_B})
4. RETURN SE_boot
```
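The same pattern in R, using an exponential model whose MLE ($\hat\lambda = 1/\bar{x}$) has closed form; both the model and `B` are illustrative choices:

```r
# Parametric bootstrap SE for the rate of an exponential model
boot_param_exp <- function(x, B = 2000) {
  rate_hat  <- 1 / mean(x)                      # θ̂ = MLE(X, M)
  rate_star <- replicate(B, {
    x_star <- rexp(length(x), rate = rate_hat)  # X*_b ~ M(θ̂)
    1 / mean(x_star)                            # θ̂*_b
  })
  sd(rate_star)                                 # SE_boot
}

set.seed(7)
boot_param_exp(rexp(100, rate = 2))
```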
## Numerical Stability Notes

### Common Issues

- **Overflow/Underflow**: Work on the log scale
- **Cancellation**: Reformulate subtractions
- **Ill-conditioning**: Use regularization or pivoting
- **Slow or failed convergence**: Add damping or a line search
### Stability Techniques

```r
# Log-sum-exp trick: compute log(sum(exp(x))) without overflow
log_sum_exp <- function(x) {
  max_x <- max(x)
  max_x + log(sum(exp(x - max_x)))
}
```
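Naively evaluating `log(sum(exp(x)))` overflows in double precision once any element exceeds about 710; the shifted version does not:

```r
log(sum(exp(c(1000, 1000))))  # Inf: exp(1000) overflows
log_sum_exp(c(1000, 1000))    # 1000.693 (= 1000 + log(2))
```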
```r
# Numerically stable variance: two-pass algorithm (center first, then
# sum squared deviations), avoiding the cancellation in E[X²] - E[X]²
stable_var <- function(x) {
  n <- length(x)
  m <- mean(x)
  sum((x - m)^2) / (n - 1)
}
```
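When two passes over the data are impractical (e.g., streaming input), Welford's online update is a genuinely one-pass alternative; a sketch:

```r
# Welford's one-pass (online) variance: update the running mean and the
# sum of squared deviations incrementally, avoiding catastrophic cancellation
welford_var <- function(x) {
  m <- 0; m2 <- 0
  for (k in seq_along(x)) {
    delta <- x[k] - m
    m  <- m + delta / k
    m2 <- m2 + delta * (x[k] - m)  # uses the updated mean
  }
  m2 / (length(x) - 1)
}

x <- rnorm(1000, mean = 1e8, sd = 1)  # large mean stresses naive formulas
c(welford = welford_var(x), stable = stable_var(x))
```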
## Implementation Checklist

### Before Coding

- Pseudocode written and reviewed
- Complexity analyzed
- Convergence conditions identified
- Edge cases documented
- Numerical stability considered

### During Implementation

- Match pseudocode structure
- Add convergence monitoring
- Handle edge cases
- Log intermediate values (debug mode)
- Add early stopping

### After Implementation

- Unit tests for components
- Integration tests for full algorithm
- Benchmark against reference implementation
- Profile for bottlenecks
- Document deviations from pseudocode
## Key References

- Cormen, Leiserson, Rivest, and Stein, *Introduction to Algorithms* (CLRS)
- Press, Teukolsky, Vetterling, and Flannery, *Numerical Recipes*