[Feature Request]: Set quota project in `beam.io.ReadFromBigQuery`
What would you like to happen?
This issue relates to the Python SDK, but it is probably relevant to other SDKs as well:
We have a use case where queries initiated by `beam.io.ReadFromBigQuery` should be billed to a specific GCP project ID.
As we use a custom container, the only option for now is to set the environment variable `GOOGLE_CLOUD_QUOTA_PROJECT` in the Dockerfile, but that affects all other GCP services as well.
It would be best to make it configurable via the connector (i.e., `beam.io.ReadFromBigQuery(..., quota_project_id='some-project-id')`).
When implementing, you could take inspiration from a similar feature in `beam.io.WriteToBigQuery`: https://github.com/apache/beam/pull/16186.
Issue Priority
Priority: 2 (default / most feature requests should be filed as P2)
Issue Components
- [X] Component: Python SDK
- [X] Component: Java SDK
- [X] Component: Go SDK
- [X] Component: Typescript SDK
- [X] Component: IO connector
- [ ] Component: Beam YAML
- [ ] Component: Beam examples
- [ ] Component: Beam playground
- [ ] Component: Beam katas
- [ ] Component: Website
- [ ] Component: Spark Runner
- [ ] Component: Flink Runner
- [ ] Component: Samza Runner
- [ ] Component: Twister2 Runner
- [ ] Component: Hazelcast Jet Runner
- [x] Component: Google Cloud Dataflow Runner
@shahar1 this level of customization might make sense.
Let's explore your specific concern for a moment. I might have other questions, but I imagine it's worth understanding your needs and use case:
Which quotas are getting hit that are problematic? Or, what are the specific billing charges you are looking to attribute elsewhere?
Are you running on Dataflow? Or something else? [ not critical, but curious ]
Which read method? [ BQ Storage Read API? ]
You want to run the compute in one GCP project, but use BQ from another? If the read exports to GCS and then loads into Dataflow [ that is another way this can occur ], do you intend to specify which project [ and bucket within it ] the data is written to?
Also, I wonder whether implementation of this issue would help with https://github.com/apache/beam/issues/30747
Thanks for your response! Here are the answers to your questions:
- I'd like to attribute the query execution to another project. In my case, BigQuery is in project A, and `beam.io.ReadFromBigQuery` runs in project B. I'd like to bill project B for the queries (for that matter, it could also be project C).
- We use Dataflow and the direct runner (the implementation should be a general solution and not Dataflow-specific).
- I use both methods. If I'm not wrong, in both cases you could set the `quota_project_id` via `ClientOptions`.
- Yup, you got the idea correctly :)
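
To illustrate the `ClientOptions` point outside of Beam, here is a minimal sketch using the plain `google-cloud-bigquery` client. The helper names `bigquery_client_kwargs` and `make_client` are hypothetical and not part of Beam or the client library; the sketch assumes that `bigquery.Client` accepts `client_options` either as a `ClientOptions` instance or as a dict with the same field names. It does not show how `ReadFromBigQuery` would wire this through, and whether the quota project alone changes which project is billed for query jobs would still need to be verified.

```python
from typing import Any, Dict, Optional


def bigquery_client_kwargs(
    project: str, quota_project_id: Optional[str] = None
) -> Dict[str, Any]:
    """Build keyword arguments for google.cloud.bigquery.Client.

    `project` is the client's default project. `quota_project_id`, when
    given, is forwarded through `client_options` (hypothetical helper).
    """
    kwargs: Dict[str, Any] = {"project": project}
    if quota_project_id:
        # Assumption: a plain dict is accepted here in place of a
        # google.api_core.client_options.ClientOptions instance.
        kwargs["client_options"] = {"quota_project_id": quota_project_id}
    return kwargs


def make_client(project: str, quota_project_id: Optional[str] = None):
    """Create a BigQuery client scoped to an optional quota project."""
    # Imported lazily so the helper above can be used and tested without
    # the google-cloud-bigquery package installed.
    from google.cloud import bigquery

    return bigquery.Client(**bigquery_client_kwargs(project, quota_project_id))
```

With something like this, only the BigQuery client created by `make_client("project-a", "project-b")` would carry the quota project, unlike the `GOOGLE_CLOUD_QUOTA_PROJECT` environment variable, which applies to every client in the container.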
As for #30747: it is related, but the implementation might differ, as GCS is a project's resource rather than a service.
@shahar1 sounds like you've got a decent idea/design in mind, which could be supported.
Are you interested in contributing? Feel free to start and include me on PRs, if that's the case.
I'd be happy to try! I first need to learn how development works here (I'm coming from the Airflow community).
.take-issue
This should be pretty good --> https://github.com/apache/beam/blob/master/CONTRIBUTING.md
If you find a problem [ or something that is outdated ], let's work through it and fix the docs along the way.
Are there any updates on this?
I haven't managed to work on it yet. If you (or anyone else) find this feature useful and want to take over, please let me know.