CI/CD in Apache Beam
Overview
Apache Beam uses GitHub Actions for CI/CD. Workflows are located in .github/workflows/ .
Workflow Types
PreCommit Workflows
-
Run on PRs and merges
-
Validate code changes before merge
-
Naming: beam_PreCommit_*.yml
PostCommit Workflows
-
Run after merge and on schedule
-
More comprehensive testing
-
Naming: beam_PostCommit_*.yml
Scheduled Workflows
-
Run nightly on master
-
Check for external dependency impacts
-
Tag master with nightly-master
Key Workflows
PreCommit
Workflow Description
beam_PreCommit_Java.yml
Java build and tests
beam_PreCommit_Python.yml
Python tests
beam_PreCommit_Go.yml
Go tests
beam_PreCommit_RAT.yml
License header checks
beam_PreCommit_Spotless.yml
Code formatting
PostCommit - Java
Workflow Description
beam_PostCommit_Java.yml
Full Java test suite
beam_PostCommit_Java_ValidatesRunner_*.yml
Runner validation tests
beam_PostCommit_Java_Examples_*.yml
Example pipeline tests
PostCommit - Python
Workflow Description
beam_PostCommit_Python.yml
Full Python test suite
beam_PostCommit_Python_ValidatesRunner_*.yml
Runner validation
beam_PostCommit_Python_Examples_*.yml
Examples
Load & Performance Tests
Workflow Description
beam_LoadTests_*.yml
Load testing
beam_PerformanceTests_*.yml
I/O performance
Triggering Tests
Automatic
-
PRs trigger PreCommit tests
-
Merges trigger PostCommit tests
Triggering Specific Workflows
Use trigger files to run specific workflows.
Workflow Dispatch
Most workflows support manual triggering via GitHub UI.
Understanding Test Results
Finding Logs
-
Go to PR → Checks tab
-
Click on failed workflow
-
Expand failed job
-
View step logs
Common Failure Patterns
Flaky Tests
-
Random failures unrelated to change
-
Solution: Use trigger files to re-run the specific workflow.
Timeout
-
Increase timeout in workflow if justified
-
Or optimize test
Resource Exhaustion
-
GCP quota issues
-
Check project settings
GCP Credentials
Workflows requiring GCP access use these secrets:
-
GCP_PROJECT_ID
-
Project ID (e.g., apache-beam-testing )
-
GCP_REGION
-
Region (e.g., us-central1 )
-
GCP_TESTING_BUCKET
-
Temp storage bucket
-
GCP_PYTHON_WHEELS_BUCKET
-
Python wheels bucket
-
GCP_SA_EMAIL
-
Service account email
-
GCP_SA_KEY
-
Base64-encoded service account key
Required IAM roles:
-
Storage Admin
-
Dataflow Admin
-
Artifact Registry Writer
-
BigQuery Data Editor
-
Service Account User
Self-hosted vs GitHub-hosted Runners
Self-hosted (majority of workflows)
-
Pre-configured with dependencies
-
GCP credentials pre-configured
-
Naming: beam_*.yml
GitHub-hosted
-
Used for cross-platform testing (Linux, macOS, Windows)
-
May need explicit credential setup
Workflow Structure
name: Workflow Name on: push: branches: [master] pull_request: branches: [master] schedule: - cron: '0 0 * * *' workflow_dispatch:
jobs: build: runs-on: [self-hosted, ...] steps: - uses: actions/checkout@v4 - name: Run Gradle run: ./gradlew :task:name
Local Debugging
Run Same Commands as CI
Check workflow file's run commands:
./gradlew :sdks:java:core:test ./gradlew :sdks:python:test
Common Issues
-
Clean gradle cache: rm -rf ~/.gradle .gradle
-
Remove build directory: rm -rf build
-
Check Java version matches CI
Snapshot Builds
Locations
-
Java SDK: https://repository.apache.org/content/groups/snapshots/org/apache/beam/
-
SDK Containers: https://gcr.io/apache-beam-testing/beam-sdk
-
Portable Runners: https://gcr.io/apache-beam-testing/beam_portability
-
Python SDK: gs://beam-python-nightly-snapshots
Release Workflows
Workflow Purpose
cut_release_branch.yml
Create release branch
build_release_candidate.yml
Build RC
finalize_release.yml
Finalize release
publish_github_release_notes.yml
Publish notes