Matthias Roels
Absolutely agree with the comment from Anna! A couple of remarks from a release management perspective though: storing flow code in S3 or e.g. GitHub instead of a container image...
Alternatively, it could also be useful to be able to do something similar to https://github.com/anna-geller/packaging-prefect-flows/blob/master/flows_no_build/docker_script_kubernetes_run_custom_ecr_image.py in Orion…
Is there any further update on this issue?
That’s exactly what I always do: I use credentials as env vars. However, it would still be nice to be able to inject them into a dataset (think db credentials...
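To make the env-var approach concrete, here is a minimal sketch of collecting credentials from environment variables into a dict that could then be handed to whatever needs them. The helper name `credentials_from_env`, the `DB_USER`/`DB_PASSWORD` variable names and the `environ` parameter are all made up for illustration; nothing here is an existing Kedro or Prefect API.

```python
import os


def credentials_from_env(prefix, keys=("USER", "PASSWORD"), environ=None):
    """Collect credentials from env vars named <prefix>_<KEY>.

    Hypothetical helper: the naming scheme is an assumption, not a convention
    of any particular framework.
    """
    env = os.environ if environ is None else environ
    names = {key: f"{prefix}_{key}" for key in keys}

    # Fail early and list every missing variable at once.
    missing = [name for name in names.values() if name not in env]
    if missing:
        raise KeyError(f"Missing environment variables: {', '.join(missing)}")

    # Lower-case the keys so the result reads like typical connection kwargs.
    return {key.lower(): env[name] for key, name in names.items()}


if __name__ == "__main__":
    # Passing a plain dict instead of os.environ keeps the demo self-contained.
    demo_env = {"DB_USER": "reader", "DB_PASSWORD": "s3cret"}
    print(credentials_from_env("DB", environ=demo_env))
```

Accepting an optional `environ` mapping is only there to make the helper easy to test without touching the real process environment.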
> Interesting to learn about why Rust choose object store instead of a general filesystem interface. @noklam: It is explained in the [docs](https://docs.rs/object_store/latest/object_store/#why-not-a-filesystem-interface).
I did some experiments to see where we are in using plain vanilla `read_*`, `scan_*` and `write_*` operations on object stores (for a local filesystem, they work as expected). -...
As per my comment [here](https://github.com/kedro-org/kedro-plugins/issues/702#issuecomment-2195299774), I wouldn't recommend using streaming or `sink_*` methods. Even when using `.collect(streaming=True)`, it is explicitly mentioned [in the docs](https://docs.pola.rs/api/python/stable/reference/api/polars.LazyFrame.sink_parquet.html) that streaming mode is considered unstable.
If you use the Lazy API, you already get some optimisations such as predicate and projection pushdown. This means that you only read the rows/columns into memory that you need...
I can't comment on Spark, but be careful when forcing something like pandas >= 2.0, as users typically use other packages that might not be compatible with pandas 2.0 yet....
Haha I also misread the title 😅. Thanks @noklam for pointing it out!