java-development

Java Development in Apache Beam

Safety Notice

This listing is imported from skills.sh public index metadata. Review upstream SKILL.md and repository scripts before running.

Copy this and send it to your AI assistant to learn

Install skill "java-development" with this command: npx skills add apache/beam/apache-beam-java-development

Java Development in Apache Beam

Project Structure

Key Directories

  • sdks/java/core

  • Core Java SDK (PCollection, PTransform, Pipeline)

  • sdks/java/harness

  • SDK harness (container entrypoint)

  • sdks/java/io/

  • I/O connectors (51+ connectors including BigQuery, Kafka, JDBC, etc.)

  • sdks/java/extensions/

  • Extensions (SQL, ML, protobuf, etc.)

  • runners/

  • Runner implementations:

  • runners/direct-java

  • Direct Runner (local execution)

  • runners/flink/

  • Flink Runner

  • runners/spark/

  • Spark Runner

  • runners/google-cloud-dataflow-java/

  • Dataflow Runner

  • examples/java/

  • Java examples including WordCount

Build System

Apache Beam uses Gradle with a custom BeamModulePlugin . Every Java project's build.gradle starts with:

apply plugin: 'org.apache.beam.module' applyJavaNature( ... )

Common Commands

Build Commands

Compile a specific project

./gradlew -p sdks/java/core compileJava

Build a project (compile + tests)

./gradlew :sdks:java:harness:build

Run WordCount example

./gradlew :examples:java:wordCount

Running Unit Tests

Run all tests in a project

./gradlew :sdks:java:harness:test

Run a specific test class

./gradlew :sdks:java:harness:test --tests org.apache.beam.fn.harness.CachesTest

Run tests matching a pattern

./gradlew :sdks:java:harness:test --tests *CachesTest

Run a specific test method

./gradlew :sdks:java:harness:test --tests *CachesTest.testClearableCache

Running Integration Tests

Integration tests have filenames ending in IT.java and use TestPipeline .

Run I/O integration tests on Direct Runner

./gradlew :sdks:java:io:google-cloud-platform:integrationTest

Run with custom GCP project

./gradlew :sdks:java:io:google-cloud-platform:integrationTest
-PgcpProject=<project> -PgcpTempRoot=gs://<bucket>/path

Run on Dataflow Runner

./gradlew :runners:google-cloud-dataflow-java:examplesJavaRunnerV2IntegrationTest
-PdisableSpotlessCheck=true -PdisableCheckStyle=true -PskipCheckerFramework
-PgcpProject=<project> -PgcpRegion=us-central1 -PgcsTempRoot=gs://<bucket>/tmp

Code Formatting

Format Java code

./gradlew spotlessApply

Writing Integration Tests

@Rule public TestPipeline pipeline = TestPipeline.create();

@Test public void testSomething() { pipeline.apply(...); pipeline.run().waitUntilFinish(); }

Set pipeline options via -DbeamTestPipelineOptions='[...]' :

-DbeamTestPipelineOptions='["--runner=TestDataflowRunner","--project=myproject","--region=us-central1","--stagingLocation=gs://bucket/path"]'

Using Modified Beam Code

Publish to Maven Local

Publish a specific module

./gradlew -Ppublishing -p sdks/java/io/kafka publishToMavenLocal

Publish all modules

./gradlew -Ppublishing publishToMavenLocal

Building SDK Container

Build Java SDK container (for Runner v2)

./gradlew :sdks:java:container:java11:docker

Tag and push

docker tag apache/beam_java11_sdk:2.XX.0.dev
"us-docker.pkg.dev/your-project/beam/beam_java11_sdk:custom" docker push "us-docker.pkg.dev/your-project/beam/beam_java11_sdk:custom"

Building Dataflow Worker Jar

./gradlew :runners:google-cloud-dataflow-java:worker:shadowJar

Test Naming Conventions

  • Unit tests: *Test.java

  • Integration tests: *IT.java

JUnit Report Location

After running tests, find HTML reports at: <project>/build/reports/tests/test/index.html

IDE Setup (IntelliJ)

  • Open /beam (the repository root, NOT sdks/java )

  • Wait for indexing to complete

  • Find examples/java/build.gradle and click Run next to wordCount task to verify setup

Source Transparency

This detail page is rendered from real SKILL.md content. Trust labels are metadata-based hints, not a safety guarantee.

Related Skills

Related by shared tags or category signals.

Coding

python-development

No summary provided by upstream source.

Repository SourceNeeds Review
General

gradle-build

No summary provided by upstream source.

Repository SourceNeeds Review
General

license-compliance

No summary provided by upstream source.

Repository SourceNeeds Review