Mark Pevec

Results: 23 comments by Mark Pevec

This appears to have two causes. First, the retry parameters are not applied correctly because of this line: https://github.com/GoogleCloudPlatform/DataflowTemplates/blob/main/v2/elasticsearch-common/src/main/java/com/google/cloud/teleport/v2/elasticsearch/transforms/WriteToElasticsearch.java#L100 Updating that line to: ``` elasticsearchWriter = elasticsearchWriter.withRetryConfiguration( ElasticsearchIO.RetryConfiguration.create(...
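The reassignment in the proposed fix suggests the usual pitfall with immutable, builder-style objects: a `with...` method returns a configured copy, and calling it without keeping the result leaves the original untouched. The following is a minimal sketch of that pattern only. `SketchWriter`, `withMaxRetries` and `RetryConfigSketch` are hypothetical stand-ins, not the real `ElasticsearchIO` API, and it assumes the original line discarded the return value.

```java
// Hypothetical stand-in for an immutable, builder-style writer.
// This is NOT the real ElasticsearchIO API; it only illustrates the pattern.
final class SketchWriter {
    private final int maxRetries;

    SketchWriter(int maxRetries) {
        this.maxRetries = maxRetries;
    }

    // Returns a NEW writer with the setting applied; the receiver is unchanged.
    SketchWriter withMaxRetries(int retries) {
        return new SketchWriter(retries);
    }

    int maxRetries() {
        return maxRetries;
    }
}

final class RetryConfigSketch {
    // Bug pattern (assumed): the configured copy is created and then discarded.
    static SketchWriter configureDroppingResult(SketchWriter writer, int retries) {
        writer.withMaxRetries(retries); // return value ignored, so nothing changes
        return writer;
    }

    // Fix pattern: keep the returned copy by reassigning the variable.
    static SketchWriter configureWithReassignment(SketchWriter writer, int retries) {
        writer = writer.withMaxRetries(retries);
        return writer;
    }
}
```

With a base writer created with 0 retries, `configureDroppingResult` still reports 0 retries, while `configureWithReassignment` reports the requested value.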

I've added fixes to the above 2 issues as part of my PR for other Elasticsearch template improvements https://github.com/GoogleCloudPlatform/DataflowTemplates/pull/399

@an2x thanks Nick for the review and approval! I noticed the 2 failed checks but they both seem to be based on code in the main branch that hasn't changed...

> @ggprod Sounds like `mvn spotless:apply` should fix the validation for ValueExtractorTransform.java. Did you give it a try? I don't believe there were any spotless problems with ValueExtractorTransform.java. Did you...

@an2x apologies Nick, I had just noticed today there was a minor issue with the README.md files for the BigQuery and GCS Elasticsearch templates, so I made a small commit to...

> > > @ggprod Sounds like `mvn spotless:apply` should fix the validation for ValueExtractorTransform.java. Did you give it a try? > > > > > > I don't believe there...

The autogeneration of the _id causes another problem with these templates: if there is a retry because of an Elasticsearch timeout (but Elasticsearch did receive the initial request with...
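One common way to make retried writes idempotent is to derive the `_id` from the document itself instead of letting it be auto-generated, so a resend targets the same document. The following is a minimal sketch of that idea, assuming the concern is that a retried request could index the same documents again under fresh IDs. `DeterministicIdSketch` and `idFor` are hypothetical names and are not part of the templates.

```java
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;

// Hypothetical helper: derives a stable ID from the document content.
final class DeterministicIdSketch {
    // Returns the lowercase hex SHA-256 digest of the document's UTF-8 bytes.
    static String idFor(String documentJson) {
        try {
            MessageDigest digest = MessageDigest.getInstance("SHA-256");
            byte[] hash = digest.digest(documentJson.getBytes(StandardCharsets.UTF_8));
            StringBuilder hex = new StringBuilder(hash.length * 2);
            for (byte b : hash) {
                hex.append(String.format("%02x", b));
            }
            return hex.toString();
        } catch (NoSuchAlgorithmException e) {
            // SHA-256 is a required algorithm on every Java platform implementation.
            throw new IllegalStateException("SHA-256 unavailable", e);
        }
    }
}
```

The same input always yields the same 64-character ID, so a retried write would overwrite rather than duplicate. The trade-off is that two documents with identical content collapse into one, which may or may not be acceptable for a given pipeline.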

@alexandregiordanelli I have a PR open, but it is waiting for review and approval (I believe this needs to come from a repo maintainer) before it can be merged

I believe this could be fixed by doing a check and conditional flush before this line: https://github.com/GoogleCloudPlatform/DataflowTemplates/blob/main/v2/elasticsearch-common/src/main/java/com/google/cloud/teleport/v2/elasticsearch/utils/ElasticsearchIO.java#L1459
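A "check and conditional flush" can be sketched as follows, assuming the linked line appends a document to the current bulk batch and the goal is to keep the batch under a byte limit. `BoundedBatchSketch` and its members are hypothetical and do not mirror the actual `ElasticsearchIO` code; flushed batches are simply recorded in a list instead of being sent anywhere.

```java
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;

// Hypothetical batcher: flushes BEFORE appending when the limit would be exceeded.
final class BoundedBatchSketch {
    private final long maxBatchBytes;
    private final List<String> batch = new ArrayList<>();
    private final List<List<String>> flushed = new ArrayList<>();
    private long currentBytes = 0;

    BoundedBatchSketch(long maxBatchBytes) {
        this.maxBatchBytes = maxBatchBytes;
    }

    void add(String document) {
        long docBytes = document.getBytes(StandardCharsets.UTF_8).length;
        // Check first: if this document would push the batch over the limit,
        // flush what is already buffered before appending.
        if (!batch.isEmpty() && currentBytes + docBytes > maxBatchBytes) {
            flush();
        }
        batch.add(document);
        currentBytes += docBytes;
    }

    // Records the buffered documents as one batch and resets the buffer.
    void flush() {
        if (batch.isEmpty()) {
            return;
        }
        flushed.add(new ArrayList<>(batch));
        batch.clear();
        currentBytes = 0;
    }

    List<List<String>> flushedBatches() {
        return flushed;
    }
}
```

A single document larger than the limit still goes out alone in its own batch, since there is nothing smaller to split it into.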

It looks like the error in question may not have much to do with the bulk size in bytes and is instead related to the configured JVM heap size: https://discuss.elastic.co/t/org-elasticsearch-common-breaker-circuitbreakingexception-parent-data-too-large-data-for-indices-data-write-bulk-s-r/275660...