[BUG] Launching Dataflow with Spanner
Expected Behavior
A pipeline that uses Spanner launches on Dataflow and executes successfully.
Current Behavior
I get an error when launching any Dataflow job that has Spanner as the source or target database:
2025-09-03 13:49:57.908 Exception in thread "main" ; see consecutive INFO logs for details.
2025-09-03 13:49:57.928 Error: Template launch failed: exit status 1
2025-09-03 13:50:35.685 Error occurred in the launcher container: Template launch failed. See console logs.
Context
To build the Docker image, I had to add one dependency to pom.xml:
<dependency>
  <groupId>org.apache.kafka</groupId>
  <artifactId>kafka-clients</artifactId>
  <version>3.7.0</version>
</dependency>
Without it, mvn clean package fails with an error. Could you also look into that?
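For reference, which module pulls in (or fails to pull in) kafka-clients can be traced with a dependency-tree query along these lines (the includes filter is just an illustration):

# show only the kafka-clients paths in the dependency tree
mvn dependency:tree -Dincludes=org.apache.kafka:kafka-clients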
I am trying to sync data from a Spanner table to BigQuery.
Building the Docker image:
mvn clean package -DskipTests -Dimage=us-east4-docker.pkg.dev/pp-test-staging/sxope-emr-files-loader/maven-dataflow-template:latest
Creating the flex template:
gcloud dataflow flex-template build gs://sxp-test-test/beam_test/maven-flex-dataflow.json \
--image "us-east4-docker.pkg.dev/pp-test-staging/sxope-emr-files-loader/maven-dataflow-template:latest" \
--sdk-language "JAVA"
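To rule out a malformed template spec, the generated file can also be inspected directly:

# print the flex template spec that the build step wrote to GCS
gsutil cat gs://sxp-test-test/beam_test/maven-flex-dataflow.json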
Running the flex template:
gcloud dataflow flex-template run maven-dataflow-test \
--region=us-east4 \
--template-file-gcs-location=gs://sxp-test-test/beam_test/maven-flex-dataflow.json \
--parameters=config="$(spanner-to-bigquery.json)”
Config example:
{
  "sources": [
    {
      "name": "spanner",
      "module": "spanner",
      "parameters": {
        "projectId": "test",
        "instanceId": "test",
        "databaseId": "sxope-member-datapoints-stagin",
        "query": "select emr_member_id from emr_members"
      }
    }
  ],
  "sinks": [
    {
      "name": "bigquery",
      "module": "bigquery",
      "input": "spanner",
      "parameters": {
        "table": "pp-import-staging:sb_temp.emr_data_test_11039",
        "createDisposition": "CREATE_IF_NEEDED",
        "writeDisposition": "WRITE_TRUNCATE"
      }
    }
  ]
}
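A quick way to rule out a plain JSON syntax error in the config (jq here is just one option; any JSON parser works):

# exits non-zero and prints the location if the file is not valid JSON
jq . spanner-to-bigquery.json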
Dataflow doesn't render the job graph at all and the launch fails.
I also tried launching the example config from the README:
sources:
  - name: bigquery
    module: bigquery
    parameters:
      query: |-
        SELECT
          *
        FROM
          `myproject.mydataset.mytable`
sinks:
  - name: spanner
    module: spanner
    inputs:
      - bigquery
    parameters:
      projectId: myproject
      instanceId: myinstance
      databaseId: mydatabase
      table: mytable
But it didn't work for me either. Could you check whether it works for you?
Thank you for using our service and for the report! I tried to verify under the same conditions, but I couldn't reproduce the issue in my environment... If possible, could you please share the detailed error message from the failed job in your environment? (The launch job's error logs are output at the INFO level, not the ERROR level, so could you please check the logs at the INFO level?)
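For reference, one way to pull those INFO-level launcher logs from Cloud Logging (JOB_ID and PROJECT_ID are placeholders; depending on the entry, the message may be in jsonPayload rather than textPayload):

# fetch all INFO-and-above log entries for the failed Dataflow job
gcloud logging read \
  'resource.type="dataflow_step" AND resource.labels.job_id="JOB_ID" AND severity>=INFO' \
  --project=PROJECT_ID \
  --limit=100 \
  --format="value(textPayload)"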
@orfeon Thanks for the quick answer. I took some time to experiment with Dataflow. Sadly, there are no console log messages at all, because the VM starts and I hit the error before the container is even created. I have some raw VM logs, but they're difficult to navigate.
I chose another approach:
- Successfully pushed image to Artifact Registry
- Pulled image from Artifact Registry to local Docker
- Ran the pipeline locally with the config, and everything worked (rough sketch of the command below)
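Roughly what the local run looked like; this is an illustrative sketch, not the verbatim command, and MAIN_CLASS and the classpath are placeholders for whatever the template's Dockerfile actually sets:

# pull the image from Artifact Registry and run the pipeline main class locally,
# mounting local gcloud credentials so the DirectRunner can reach Spanner/BigQuery
docker pull us-east4-docker.pkg.dev/pp-test-staging/sxope-emr-files-loader/maven-dataflow-template:latest
docker run --rm \
  --entrypoint java \
  -v "$HOME/.config/gcloud:/root/.config/gcloud:ro" \
  us-east4-docker.pkg.dev/pp-test-staging/sxope-emr-files-loader/maven-dataflow-template:latest \
  -cp "/template/*" MAIN_CLASS \
  --config="$(cat spanner-to-bigquery.json)" \
  --runner=DirectRunner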
That should mean the Docker image and config are correct. But when I try to launch Dataflow from gcloud or the API in GCP, it always fails at the container startup stage.
It's very confusing, because from an access standpoint everything should be fine. I am launching it via the API with a service account that has:
- dataflow.worker
- artifactregistry.reader
- storage.objectAdmin
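One way to double-check which roles the service account actually holds (SA_EMAIL and the project ID are placeholders):

# list every role bound to the service account in the project's IAM policy
gcloud projects get-iam-policy pp-test-staging \
  --flatten="bindings[].members" \
  --filter="bindings.members:SA_EMAIL" \
  --format="table(bindings.role)"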