Michel Davit issues

Results: 66 issues of Michel Davit

Pinning of different versions for the same artifact

We have a use case where we want to pin different major versions of the same artifact in a multi-module setup, e.g.: - `elasticsearch-7` module depending on `"org.elasticsearch" %...

cat:repo-config

BatchDoFn and scio batch API on SCollection

Amortize processing cost by local batching of elements. Batching respects windowing. This aims to give a symmetric API with the KV batching in #4458. As the batch is emitted on `finishBundle`,...
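
To make the idea of local batching concrete, here is a minimal sketch in plain Scala, with no Beam or scio dependency. The object `LocalBatching` and the helper `mapInBatches` are hypothetical names chosen for illustration and are not part of the scio API. The sketch ignores windowing entirely and only shows how a per-batch function amortizes a per-call cost over several elements.

```scala
object LocalBatching {
  // Hypothetical helper (not a scio API): applies `processBatch` once per
  // group of at most `batchSize` elements instead of once per element.
  def mapInBatches[A, B](elements: Iterator[A], batchSize: Int)(
      processBatch: Seq[A] => Seq[B]
  ): Iterator[B] = {
    require(batchSize > 0, "batchSize must be positive")
    // `grouped` also emits a final, possibly smaller, batch when the input
    // ends, loosely comparable to flushing what is left at the end of a bundle.
    elements.grouped(batchSize).flatMap(batch => processBatch(batch))
  }
}
```

With five elements and a batch size of 2, `processBatch` would be called three times (sizes 2, 2 and 1) rather than five times.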

Implement gRPC lookup API

Set base to `sbt-protoc` #4483. Create an API to facilitate lookup over gRPC on `SCollection` elements with idiomatic Scala.

Migrate from sbt-protobuf to sbt-protoc

This enables `grpc` Java codegen support. Clean up the useless `scio-schemas`. Move test schemas to the project's Test configuration.

Improve KV batch API

I realized that scio only has an API for Beam `GroupIntoBatches` with `ofSize` 1. this can be problematic in streaming pipelines when keys have few elements and batch takes...

Planned deprecation for v0.13.0

Cleanup planned deprecation for `v0.13.0` milestone - [ ] elasticsearch 6 module (end of life)

Streaming pipeline update with naming on composite transform

When launching a replacement job for a streaming pipeline in Dataflow, the `transformNameMapping` option must be given when transformation names have changed. Scio by default uses the callsite for the transformation...
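
As an illustration of what such a mapping could look like, the sketch below renders old-to-new transform names as a JSON object. It assumes that `transformNameMapping` accepts a JSON object of the form `{"oldName":"newName"}`. The object `TransformNameMapping`, the helper `toJson`, and the transform names used with it are hypothetical and do not come from scio or Dataflow.

```scala
object TransformNameMapping {
  // Escapes backslashes and double quotes so the name stays a valid JSON string.
  private def quote(s: String): String =
    "\"" + s.replace("\\", "\\\\").replace("\"", "\\\"") + "\""

  // Hypothetical helper: renders (oldName -> newName) pairs as a JSON object,
  // preserving the order in which the pairs were given.
  def toJson(mapping: Seq[(String, String)]): String =
    mapping
      .map { case (oldName, newName) => s"${quote(oldName)}:${quote(newName)}" }
      .mkString("{", ",", "}")
}
```

For example, `TransformNameMapping.toJson(Seq("oldParse" -> "parse"))` builds a string that could be passed as the value of the option when launching the replacement job.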

bug

streaming

Support BigQuery Load jobs for batch loading

From the Beam [documentation](https://beam.apache.org/documentation/io/built-in/google-bigquery/#setting-the-insertion-method) > When you specify load jobs as the insertion method using BigQueryIO.write().withMethod(FILE_LOADS). Scio should also give users the possibility to use files to load data to BigQuery.

Support BigQuery Storage Write API

From the Beam BigQuery [documentation](https://beam.apache.org/documentation/io/built-in/google-bigquery/#writing-to-bigquery) > Starting with version 2.36.0 of the Beam SDK for Java, you can use the [BigQuery Storage Write API](https://cloud.google.com/bigquery/docs/write-api) from the BigQueryIO connector. Scio should also...

SMB module is pulling all storage implementations

Depending on `scio-smb` transitively pulls all storage implementation dependencies for: - parquet - json - avro - tensorflow. TensorFlow dependencies alone are ~200 MB. Users should only have the desired storage...