# GitLab CI Optimizer
Analyze GitLab CI/CD pipeline configurations to find speed, cost, and reliability improvements. Examines .gitlab-ci.yml for caching gaps, missing parallelism, inefficient job ordering, bloated Docker images, redundant work, and misconfigured runners. Produces a concrete optimization plan with estimated time savings.
Use when: "speed up our CI", "pipeline takes too long", "optimize gitlab ci", "review our .gitlab-ci.yml", "reduce build costs", "fix flaky pipeline", or when a pipeline configuration needs improvement.
## Analysis Steps

### 1. Parse Pipeline Structure
Read the .gitlab-ci.yml and any included files to build a complete picture:
```bash
# Find the main CI config
cat .gitlab-ci.yml

# Find all included CI files
grep -n "include:" .gitlab-ci.yml
find . -name "*.gitlab-ci.yml" -o -name ".gitlab-ci*.yml" | head -20
find . -path "*/.gitlab/ci/*.yml" | head -20

# Check for CI/CD variables defined in the file
grep -E "^variables:" .gitlab-ci.yml

# List all top-level keys that are not reserved keywords (i.e. the jobs)
grep -E "^[a-zA-Z_][a-zA-Z0-9_-]*:" .gitlab-ci.yml \
  | grep -vE "^(stages|variables|include|default|workflow|image|services|cache|before_script|after_script):"
```
For each job, extract: name, stage, runner tags, Docker image, needs/dependencies, cache/artifacts config, rules/conditions, estimated duration, and whether it runs on every commit or only specific branches.
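If mikefarah's `yq` happens to be available (an assumption, not a requirement), structured parsing is more robust than grepping:

```bash
# yq v4 (assumed installed): print every top-level key; anything that is not a
# reserved keyword (stages, variables, include, default, workflow) is a job
yq 'keys' .gitlab-ci.yml
```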
### 2. Analyze Stage Dependencies
Map the execution flow to find bottlenecks:
```
Stage 1: build    [job-a: 3 min]                 [job-b: 5 min]
                        ↓                              ↓
Stage 2: test     [job-c: 8 min, needs: job-a]   [job-d: 2 min, needs: job-b]
                        ↓                              ↓
Stage 3: deploy   [job-e: 1 min, needs: job-c, job-d]

Path A: job-b (5m) → job-d (2m) → job-e (1m) =  8 min
Path B: job-a (3m) → job-c (8m) → job-e (1m) = 12 min ← CRITICAL PATH

Total wall time:    12 min (limited by the longest path)
Total compute time: 3 + 5 + 8 + 2 + 1 = 19 min (what you pay for)
```
**Key questions:**
- Which job is the longest on the critical path? (optimize this first)
- Are there jobs running sequentially that could run in parallel?
- Are there jobs in the same stage that have no actual dependency on each other?
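Measured durations beat estimates here. A sketch using the pipeline jobs API (the project/pipeline IDs are placeholders, and `GITLAB_TOKEN` is an assumed variable holding a read-API token):

```bash
# Print each job's name and measured duration (seconds) for one pipeline
curl --silent --header "PRIVATE-TOKEN: $GITLAB_TOKEN" \
  "https://gitlab.com/api/v4/projects/<project-id>/pipelines/<pipeline-id>/jobs?per_page=100" \
  | jq -r '.[] | "\(.name)\t\(.duration)s"'
```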
### 3. Evaluate Caching

Check for these common caching problems:

**Missing cache entirely:**
```yaml
# BAD: Downloads all dependencies every run
install:
  script:
    - npm ci

# GOOD: Cache node_modules between runs
install:
  script:
    - npm ci
  cache:
    key:
      files:
        - package-lock.json
    paths:
      - node_modules/
    policy: pull-push
```
**Cache key analysis:**

| Key Strategy | When to Use | Invalidation |
|---|---|---|
| `$CI_COMMIT_REF_SLUG` | Branch-specific caches | New branch = cold start |
| `files: [package-lock.json]` | Dependency caches | Only when the lockfile changes |
| `$CI_JOB_NAME` | Job-specific caches | Never (manual clear) |
| `prefix: $CI_COMMIT_REF_SLUG` + `files:` | Best of both worlds | New branch or lockfile change |
**Cache policy optimization:** Jobs that only READ the cache should use `policy: pull` (saves upload time). Only the install job should use `policy: pull-push`.
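A minimal sketch of that split, reusing the install job's cache key (the `test` job is assumed for illustration):

```yaml
# Consumers pull the cache read-only; only the install job pushes updates
test:
  stage: test
  cache:
    key:
      files:
        - package-lock.json
    paths:
      - node_modules/
    policy: pull   # read-only: skips the cache upload at the end of the job
  script:
    - npx jest
```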
**Cache vs Artifacts decision:**
- Cache = best-effort, speeds up repeated runs, may not be available
- Artifacts = guaranteed, passes files between jobs in the same pipeline
- Rule: use artifacts for build outputs that downstream jobs NEED, and cache for dependencies that are expensive to re-download (sketch below)
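A build job applying that rule, assuming an npm project that emits `dist/`:

```yaml
# node_modules/ is cheap to lose (cache); dist/ must reach later jobs (artifact)
build:
  stage: build
  script:
    - npm ci
    - npm run build
  cache:
    paths:
      - node_modules/
  artifacts:
    paths:
      - dist/
    expire_in: 1 hour
```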
### 4. Optimize Docker Images

**Problem: using large base images**
```yaml
# BAD: 1.2 GB image, takes ~45 seconds to pull
build:
  image: node:18

# BETTER: ~180 MB image, pulls in ~5 seconds
build:
  image: node:18-alpine

# BEST: Pre-built image with your dependencies baked in
build:
  image: registry.gitlab.com/my-org/ci-images/node:18
```
Create custom CI images when `before_script` takes >30s or you install system packages on every run. Use GitLab's dependency proxy (`${CI_DEPENDENCY_PROXY_GROUP_IMAGE_PREFIX}/node:18-alpine`) to cache Docker Hub images and avoid rate limits.
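A minimal sketch of the proxy in use (assumes the dependency proxy is enabled for your group):

```yaml
# Same image as before, pulled through the group's dependency proxy
# instead of directly from Docker Hub
build:
  image: ${CI_DEPENDENCY_PROXY_GROUP_IMAGE_PREFIX}/node:18-alpine
```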
### 5. Implement DAG Dependencies

Replace stage-based ordering with `needs:` for parallel execution:
```yaml
# Without DAG: test-frontend waits for ALL build jobs (including build-backend)
# With DAG: test-frontend starts immediately after build-frontend finishes
test-frontend:
  stage: test
  needs: [build-frontend]   # starts as soon as build-frontend finishes

test-backend:
  stage: test
  needs: [build-backend]    # starts as soon as build-backend finishes

deploy:
  stage: deploy
  needs: [test-frontend, test-backend]   # starts when BOTH tests pass
```
Impact: for the five-job pipeline from step 2 (3+5+8+2+1 = 19 min of compute), stage ordering yields max(3,5) + max(8,2) + 1 = 14 min of wall time, while DAG scheduling cuts it to the 12-min critical path. The gap widens as pipelines get wider and less evenly balanced.
### 6. Apply Parallelism

**Test splitting with `parallel`:**
```yaml
test:
  stage: test
  parallel: 4
  script:
    # CI_NODE_INDEX is 1-based (1..CI_NODE_TOTAL), so shift it to 0-based
    # before the modulo split, or one shard gets no files
    - |
      TEST_FILES=$(find tests/ -name "*.test.js" | sort | awk "NR % $CI_NODE_TOTAL == ($CI_NODE_INDEX - 1)")
      npx jest $TEST_FILES
  artifacts:
    reports:
      junit: junit.xml   # assumes a JUnit reporter (e.g. jest-junit) is configured
```
Use `parallel:matrix` for multi-environment testing, e.g. `NODE_VERSION: ["16", "18", "20"]` crossed with `DB: ["postgres", "mysql"]`.
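A sketch of that six-job matrix (the job name and script are illustrative):

```yaml
# Runs 6 jobs: every NODE_VERSION crossed with every DB
test-matrix:
  stage: test
  image: node:${NODE_VERSION}
  parallel:
    matrix:
      - NODE_VERSION: ["16", "18", "20"]
        DB: ["postgres", "mysql"]
  script:
    - npm test
```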
### 7. Reduce Redundant Work

Three key techniques:

- **Path-based filtering**: only run jobs when relevant files change:

  ```yaml
  test-frontend:
    rules:
      - changes: ["frontend/**/*", "package-lock.json"]
        when: always
      - when: never
  ```

- **Auto-cancel outdated pipelines** (only affects jobs marked interruptible; see the sketch after this list):

  ```yaml
  workflow:
    auto_cancel:
      on_new_commit: interruptible
  ```

- **Pass artifacts, don't rebuild**: use `artifacts: paths: [dist/]` with `expire_in: 1 hour` and `needs:` in downstream jobs.
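Because `on_new_commit: interruptible` only cancels jobs marked interruptible, a common pattern (a sketch; recent GitLab versions allow `interruptible` under `default:`) is to opt everything in and opt deploys out:

```yaml
default:
  interruptible: true    # lets auto_cancel kill superseded pipelines

deploy:
  stage: deploy
  interruptible: false   # never cancel a deploy mid-flight
  script:
    - ./deploy.sh        # illustrative placeholder
```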
### 8. Optimize Runner Configuration
Match runner size to job requirements:
| Job Type | Recommended Size | Why |
|---|---|---|
| Build (compiled) | large (4 CPU) | Compilation is CPU-bound |
| Unit tests | medium (2 CPU) | Moderate CPU, moderate RAM |
| Lint/format | small (1 CPU) | Trivial compute |
| Integration tests | large (4 CPU) | Runs services, needs RAM |
| Deploy | small (1 CPU) | Just runs scripts/API calls |
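Sizing is applied through runner tags; the tag names below are placeholders for whatever your fleet actually uses:

```yaml
# Route each job to an appropriately sized runner (tag names are illustrative)
build:
  tags: [large]

lint:
  tags: [small]
```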
### 9. Common Anti-Patterns

| Anti-Pattern | Problem | Fix |
|---|---|---|
| `apt-get install` in every job | 30-60s wasted per job | Bake into a custom Docker image |
| No `cache:` on dependency install | Downloads 500+ MB every run | Add a cache keyed on the lockfile |
| All jobs in one stage | No ordering; dependent jobs can't consume build outputs | Split into stages + use `needs:` (DAG) |
| `artifacts: paths: ["."]` | Uploads the entire repo as an artifact | Only artifact what downstream jobs need |
| `when: manual` without `allow_failure` | Blocks the entire pipeline | Add `allow_failure: true` for optional manual jobs |
| No `expire_in` on artifacts | Storage grows forever | Set `expire_in: 1 day` for CI artifacts |
| `only`/`except` instead of `rules` | Confusing precedence, deprecated | Migrate to `rules:` syntax |
| `retry: 2` on flaky tests | Masks real problems, slows the pipeline | Fix the flaky test instead of retrying |
| `GIT_STRATEGY: clone` | Full clone every time | Use `GIT_STRATEGY: fetch` or `GIT_DEPTH: 20` |
| Monorepo without path filtering | Every change triggers all jobs | Use `rules: changes:` per component |
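For the `when: manual` row, a sketch of a non-blocking optional job (the name and script are illustrative):

```yaml
# Optional manual job that doesn't gate later stages
load-test:
  stage: test
  when: manual
  allow_failure: true   # pipeline proceeds without waiting for this job
  script:
    - ./scripts/load-test.sh   # illustrative placeholder
```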
## Output Format

```markdown
# GitLab CI Optimization Report

## Pipeline Overview
- **Stages:** {list} | **Jobs:** {count} | **Wall Time:** {duration}
- **Critical Path:** {job-a -> job-b -> job-c}

## Findings (ranked by impact)
### 1. {Finding Title} — Impact: {High|Medium|Low}, saves ~{X} min/pipeline
- **Current:** {what it does now}
- **Recommended:** {what it should do}

## Estimated Savings
- Wall time: {X} min -> {Y} min ({Z}% reduction)
- Monthly cost: ${X} -> ${Y} (${Z} saved)
- Developer wait time saved: {hours}/day
```
## Tips

- Measure before optimizing: get a baseline pipeline duration from GitLab's CI/CD analytics
- Optimize the critical path first: speeding up non-critical-path jobs saves compute cost but not wall time
- Use `interruptible: true` on all jobs except deploy so auto-cancel can kill old pipelines when new commits arrive
- Set `GIT_DEPTH: 20` globally to avoid full clones (unless you need full git history); a snippet follows this list
- Use `rules: changes:` in monorepos to skip unaffected jobs; this is the single biggest optimization for monorepos
- Merge `before_script` commands into custom Docker images when they don't change between runs
- Profile your scripts: prefix commands with `time` to find which step is slow
- Check `artifacts: expire_in` on all jobs; unlimited artifacts eat storage and slow uploads
- Consider GitLab's `parallel` keyword to split test suites: 4x parallelism gives roughly 3.5x speedup (Amdahl's law)
- Use the dependency proxy for Docker Hub images to avoid rate limiting and speed up pulls
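The global shallow-clone setting from the tips above, as a minimal sketch (20 is a sensible default, not a magic number):

```yaml
# Top-level variables apply to every job; GitLab expects string values here
variables:
  GIT_DEPTH: "20"
```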