# GitLab CI Optimizer
Analyze GitLab CI/CD pipeline configurations to find speed, cost, and reliability improvements. Examines .gitlab-ci.yml for caching gaps, missing parallelism, inefficient job ordering, bloated Docker images, redundant work, and misconfigured runners. Produces a concrete optimization plan with estimated time savings.
Use when: "speed up our CI", "pipeline takes too long", "optimize gitlab ci", "review our .gitlab-ci.yml", "reduce build costs", "fix flaky pipeline", or when a pipeline configuration needs improvement.
## Analysis Steps

### 1. Parse Pipeline Structure
Read the .gitlab-ci.yml and any included files to build a complete picture:
```bash
# Find the main CI config
cat .gitlab-ci.yml

# Find all included CI files
grep -n "include:" .gitlab-ci.yml
find . -name "*.gitlab-ci.yml" -o -name ".gitlab-ci*.yml" | head -20
find . -path "*/.gitlab/ci/*.yml" | head -20

# Check for CI/CD variables defined in the file
grep -E "^variables:" .gitlab-ci.yml

# List all top-level keys that are not reserved keywords (i.e. the jobs)
grep -E "^[a-zA-Z_][a-zA-Z0-9_-]*:" .gitlab-ci.yml \
  | grep -vE "^(stages|variables|include|default|workflow|image|services|cache|before_script|after_script):"
```
For each job, extract: name, stage, runner tags, Docker image, needs/dependencies, cache/artifacts config, rules/conditions, estimated duration, and whether it runs on every commit or only specific branches.
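If mikefarah's `yq` happens to be available (an assumption, not a requirement), structured parsing is more robust than grepping:

```bash
# yq v4 (assumed installed): print every top-level key; anything that is not a
# reserved keyword (stages, variables, include, default, workflow) is a job
yq 'keys' .gitlab-ci.yml
```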
### 2. Analyze Stage Dependencies
Map the execution flow to find bottlenecks:
```
Stage 1: build    [job-a: 3 min]                 [job-b: 5 min]
                        ↓                              ↓
Stage 2: test     [job-c: 8 min, needs: job-a]   [job-d: 2 min, needs: job-b]
                        ↓                              ↓
Stage 3: deploy   [job-e: 1 min, needs: job-c, job-d]

Path A: job-b (5m) → job-d (2m) → job-e (1m) =  8 min
Path B: job-a (3m) → job-c (8m) → job-e (1m) = 12 min ← CRITICAL PATH

Total wall time:    12 min (limited by the longest path)
Total compute time: 3 + 5 + 8 + 2 + 1 = 19 min (what you pay for)
```
**Key questions:**
- Which job is the longest on the critical path? (optimize this first)
- Are there jobs running sequentially that could run in parallel?
- Are there jobs in the same stage that have no actual dependency on each other?
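Measured durations beat estimates here. A sketch using the pipeline jobs API (the project/pipeline IDs are placeholders, and `GITLAB_TOKEN` is an assumed variable holding a read-API token):

```bash
# Print each job's name and measured duration (seconds) for one pipeline
curl --silent --header "PRIVATE-TOKEN: $GITLAB_TOKEN" \
  "https://gitlab.com/api/v4/projects/<project-id>/pipelines/<pipeline-id>/jobs?per_page=100" \
  | jq -r '.[] | "\(.name)\t\(.duration)s"'
```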
### 3. Evaluate Caching

Check for these common caching problems:

**Missing cache entirely:**
```yaml
# BAD: Downloads all dependencies every run
install:
  script:
    - npm ci

# GOOD: Cache node_modules between runs
install:
  script:
    - npm ci
  cache:
    key:
      files:
        - package-lock.json
    paths:
      - node_modules/
    policy: pull-push
```
**Cache key analysis:**

| Key Strategy | When to Use | Invalidation |
|---|---|---|
| `$CI_COMMIT_REF_SLUG` | Branch-specific caches | New branch = cold start |
| `files: [package-lock.json]` | Dependency caches | Only when the lockfile changes |
| `$CI_JOB_NAME` | Job-specific caches | Never (manual clear) |
| `prefix: $CI_COMMIT_REF_SLUG` + `files:` | Best of both worlds | New branch or lockfile change |
**Cache policy optimization:** Jobs that only READ the cache should use `policy: pull` (saves upload time). Only the install job should use `policy: pull-push`.
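A minimal sketch of that split, reusing the install job's cache key (the `test` job is assumed for illustration):

```yaml
# Consumers pull the cache read-only; only the install job pushes updates
test:
  stage: test
  cache:
    key:
      files:
        - package-lock.json
    paths:
      - node_modules/
    policy: pull   # read-only: skips the cache upload at the end of the job
  script:
    - npx jest
```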
**Cache vs Artifacts decision:**
- Cache = best-effort, speeds up repeated runs, may not be available
- Artifacts = guaranteed, passes files between jobs in the same pipeline
- Rule: use artifacts for build outputs that downstream jobs NEED, and cache for dependencies that are expensive to re-download (sketch below)
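A build job applying that rule, assuming an npm project that emits `dist/`:

```yaml
# node_modules/ is cheap to lose (cache); dist/ must reach later jobs (artifact)
build:
  stage: build
  script:
    - npm ci
    - npm run build
  cache:
    paths:
      - node_modules/
  artifacts:
    paths:
      - dist/
    expire_in: 1 hour
```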
### 4. Optimize Docker Images

**Problem: using large base images**
```yaml
# BAD: 1.2 GB image, takes ~45 seconds to pull
build:
  image: node:18

# BETTER: ~180 MB image, pulls in ~5 seconds
build:
  image: node:18-alpine

# BEST: Pre-built image with your dependencies baked in
build:
  image: registry.gitlab.com/my-org/ci-images/node:18
```
Create custom CI images when `before_script` takes >30s or you install system packages on every run. Use GitLab's dependency proxy (`${CI_DEPENDENCY_PROXY_GROUP_IMAGE_PREFIX}/node:18-alpine`) to cache Docker Hub images and avoid rate limits.
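A minimal sketch of the proxy in use (assumes the dependency proxy is enabled for your group):

```yaml
# Same image as before, pulled through the group's dependency proxy
# instead of directly from Docker Hub
build:
  image: ${CI_DEPENDENCY_PROXY_GROUP_IMAGE_PREFIX}/node:18-alpine
```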
### 5. Implement DAG Dependencies

Replace stage-based ordering with `needs:` for parallel execution:
```yaml
# Without DAG: test-frontend waits for ALL build jobs (including build-backend)
# With DAG: test-frontend starts immediately after build-frontend finishes
test-frontend:
  stage: test
  needs: [build-frontend]   # starts as soon as build-frontend finishes

test-backend:
  stage: test
  needs: [build-backend]    # starts as soon as build-backend finishes

deploy:
  stage: deploy
  needs: [test-frontend, test-backend]   # starts when BOTH tests pass
```
Impact: for the five-job pipeline from step 2 (3+5+8+2+1 = 19 min of compute), stage ordering yields max(3,5) + max(8,2) + 1 = 14 min of wall time, while DAG scheduling cuts it to the 12-min critical path. The gap widens as pipelines get wider and less evenly balanced.
### 6. Apply Parallelism

**Test splitting with `parallel`:**
```yaml
test:
  stage: test
  parallel: 4
  script:
    # CI_NODE_INDEX is 1-based (1..CI_NODE_TOTAL), so shift it to 0-based
    # before the modulo split, or one shard gets no files
    - |
      TEST_FILES=$(find tests/ -name "*.test.js" | sort | awk "NR % $CI_NODE_TOTAL == ($CI_NODE_INDEX - 1)")
      npx jest $TEST_FILES
  artifacts:
    reports:
      junit: junit.xml   # assumes a JUnit reporter (e.g. jest-junit) is configured
```
Use `parallel:matrix` for multi-environment testing, e.g. `NODE_VERSION: ["16", "18", "20"]` crossed with `DB: ["postgres", "mysql"]`.
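A sketch of that six-job matrix (the job name and script are illustrative):

```yaml
# Runs 6 jobs: every NODE_VERSION crossed with every DB
test-matrix:
  stage: test
  image: node:${NODE_VERSION}
  parallel:
    matrix:
      - NODE_VERSION: ["16", "18", "20"]
        DB: ["postgres", "mysql"]
  script:
    - npm test
```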
### 7. Reduce Redundant Work

Three key techniques:

- **Path-based filtering**: only run jobs when relevant files change:

  ```yaml
  test-frontend:
    rules:
      - changes: ["frontend/**/*", "package-lock.json"]
        when: always
      - when: never
  ```

- **Auto-cancel outdated pipelines** (only affects jobs marked interruptible; see the sketch after this list):

  ```yaml
  workflow:
    auto_cancel:
      on_new_commit: interruptible
  ```

- **Pass artifacts, don't rebuild**: use `artifacts: paths: [dist/]` with `expire_in: 1 hour` and `needs:` in downstream jobs.
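Because `on_new_commit: interruptible` only cancels jobs marked interruptible, a common pattern (a sketch; recent GitLab versions allow `interruptible` under `default:`) is to opt everything in and opt deploys out:

```yaml
default:
  interruptible: true    # lets auto_cancel kill superseded pipelines

deploy:
  stage: deploy
  interruptible: false   # never cancel a deploy mid-flight
  script:
    - ./deploy.sh        # illustrative placeholder
```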
### 8. Optimize Runner Configuration
Match runner size to job requirements:
| Job Type | Recommended Size | Why |
|---|---|---|
| Build (compiled) | large (4 CPU) | Compilation is CPU-bound |
| Unit tests | medium (2 CPU) | Moderate CPU, moderate RAM |
| Lint/format | small (1 CPU) | Trivial compute |
| Integration tests | large (4 CPU) | Runs services, needs RAM |
| Deploy | small (1 CPU) | Just runs scripts/API calls |
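Sizing is applied through runner tags; the tag names below are placeholders for whatever your fleet actually uses:

```yaml
# Route each job to an appropriately sized runner (tag names are illustrative)
build:
  tags: [large]

lint:
  tags: [small]
```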
### 9. Common Anti-Patterns

| Anti-Pattern | Problem | Fix |
|---|---|---|
| `apt-get install` in every job | 30-60s wasted per job | Bake into a custom Docker image |
| No `cache:` on dependency install | Downloads 500+ MB every run | Add a cache keyed on the lockfile |
| All jobs in one stage | No ordering; dependent jobs can't consume build outputs | Split into stages + use `needs:` (DAG) |
| `artifacts: paths: ["."]` | Uploads the entire repo as an artifact | Only artifact what downstream jobs need |
| `when: manual` without `allow_failure` | Blocks the entire pipeline | Add `allow_failure: true` for optional manual jobs |
| No `expire_in` on artifacts | Storage grows forever | Set `expire_in: 1 day` for CI artifacts |
| `only`/`except` instead of `rules` | Confusing precedence, deprecated | Migrate to `rules:` syntax |
| `retry: 2` on flaky tests | Masks real problems, slows the pipeline | Fix the flaky test instead of retrying |
| `GIT_STRATEGY: clone` | Full clone every time | Use `GIT_STRATEGY: fetch` or `GIT_DEPTH: 20` |
| Monorepo without path filtering | Every change triggers all jobs | Use `rules: changes:` per component |
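For the `when: manual` row, a sketch of a non-blocking optional job (the name and script are illustrative):

```yaml
# Optional manual job that doesn't gate later stages
load-test:
  stage: test
  when: manual
  allow_failure: true   # pipeline proceeds without waiting for this job
  script:
    - ./scripts/load-test.sh   # illustrative placeholder
```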
## Output Format

```markdown
# GitLab CI Optimization Report

## Pipeline Overview
- **Stages:** {list} | **Jobs:** {count} | **Wall Time:** {duration}
- **Critical Path:** {job-a -> job-b -> job-c}

## Findings (ranked by impact)
### 1. {Finding Title} — Impact: {High|Medium|Low}, saves ~{X} min/pipeline
- **Current:** {what it does now}
- **Recommended:** {what it should do}

## Estimated Savings
- Wall time: {X} min -> {Y} min ({Z}% reduction)
- Monthly cost: ${X} -> ${Y} (${Z} saved)
- Developer wait time saved: {hours}/day
```
## Tips

- Measure before optimizing: get a baseline pipeline duration from GitLab's CI/CD analytics
- Optimize the critical path first: speeding up non-critical-path jobs saves compute cost but not wall time
- Use `interruptible: true` on all jobs except deploy so auto-cancel can kill old pipelines when new commits arrive
- Set `GIT_DEPTH: 20` globally to avoid full clones (unless you need full git history); a snippet follows this list
- Use `rules: changes:` in monorepos to skip unaffected jobs; this is the single biggest optimization for monorepos
- Merge `before_script` commands into custom Docker images when they don't change between runs
- Profile your scripts: prefix commands with `time` to find which step is slow
- Check `artifacts: expire_in` on all jobs; unlimited artifacts eat storage and slow uploads
- Consider GitLab's `parallel` keyword to split test suites: 4x parallelism gives roughly 3.5x speedup (Amdahl's law)
- Use the dependency proxy for Docker Hub images to avoid rate limiting and speed up pulls
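The global shallow-clone setting from the tips above, as a minimal sketch (20 is a sensible default, not a magic number):

```yaml
# Top-level variables apply to every job; GitLab expects string values here
variables:
  GIT_DEPTH: "20"
```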