Randy Gelhausen issues

Results: 60 issues of Randy Gelhausen

[FEA] Clarify non-existent subdirectory error messages for file writers

With the latest cudf nightlies, attempting to write a file to a nonexistent subdirectory fails as expected (matching Pandas), but the error is unclear: ``` RuntimeError: cuDF failure at: ../src/io/utilities/data_sink.cpp:36: Cannot...

feature request

good first issue

cuDF (Python)

cuIO

inactive-30d
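To illustrate the kind of clearer message this request is after, here is a minimal sketch of a pre-write check. It is not cuDF's implementation; `check_parent_dir` is a hypothetical helper name, and the message wording is only an assumption about what a more helpful error could look like.

```python
from pathlib import Path


def check_parent_dir(path):
    """Raise a descriptive error if the target file's parent directory is missing.

    Hypothetical helper for illustration only; not part of cuDF or pandas.
    """
    target = Path(path)
    parent = target.expanduser().resolve().parent
    if not parent.is_dir():
        # Name both the requested file and the missing directory in the message.
        raise FileNotFoundError(
            f"Cannot write to '{path}': parent directory '{parent}' does not exist"
        )
    return target
```

A writer could call such a check before opening the output sink, so the user sees the missing directory named directly instead of an internal source location.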

[QST] Why does cuml have a dependency on ucx-py?

I'm trying to use cuml and dask-cuda in an environment that doesn't have libnuma installed and I unfortunately cannot install it. It seems that cuml has a dependency on ucx-py,...

question

inactive-30d

inactive-90d

[DOC] How to create tables from a public GCS bucket?

Following the [Google Cloud Storage bucket](https://docs.blazingdb.com/docs/google-storage) docs, I see I need a GCP project id. However, I'm trying to read the taxi dataset out of a public bucket `gcs://anaconda-public-data/nyc-taxi/csv/`. If...

[DF] select * limit 5 seems to do a full scan

I'm struggling to find a programmatic reproducer for this, but on the datafusion-sql-planner branch: ``` c.sql("SELECT * FROM large_table limit 5") ``` results in reading the entire dataset before filtering...

bug

needs triage

[DF] Some grouped aggregations fail

Repro: ``` import pandas as pd from dask_sql import Context c = Context() df = pd.DataFrame({"id": [0, 1, 1, 2], "val": [1, 1, 2, 1]}) c.create_table("df", df) c.sql(""" SELECT val,...

bug

needs triage

[DF] Any query containing ORDER BY immediately executes

``` import pandas as pd from dask_sql import Context c = Context() df = pd.DataFrame({"id": [0, 1, 2]}) c.create_table("df", df) # returns a DataFrame c.sql("select * from df") # returns...

bug

needs triage

[ENH] Support executing multiple, semicolon delimited statements sequentially

I often inherit existing SQL files which contain a series of queries/statements that should be executed one after the other. It's fairly easy to do something like: ``` with open("my_sql.txt")...

enhancement

datafusion
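The workaround this request alludes to can be sketched roughly as follows. The snippet in the issue is truncated, so this is not a reconstruction of it; `run_statements` is a hypothetical helper, and `execute` stands in for whatever callable runs one statement (with dask-sql, presumably `c.sql`). The split is deliberately naive and would break on semicolons inside string literals or comments.

```python
def run_statements(sql_text, execute):
    """Run each semicolon-delimited statement in order and collect the results.

    Hypothetical helper: `execute` is any callable that accepts one SQL string.
    Naive split: semicolons inside string literals or comments are not handled.
    """
    results = []
    for statement in sql_text.split(";"):
        statement = statement.strip()
        if statement:  # skip empty fragments, e.g. after the final semicolon
            results.append(execute(statement))
    return results
```

Statements run strictly in file order, so a later query can rely on a table created by an earlier one.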

[ENH] Support INTERSECT operator

Sometimes instead of using a `JOIN`, an [`INTERSECT`](https://www.techonthenet.com/sql/intersect.php) is used to find the overlap in two sets of records: ``` import pandas as pd df_a = pd.DataFrame({'id': [0, 1, 2]})...

enhancement

SQL grammar

needs triage
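For reference, the semantics being requested can be shown with Python's built-in `sqlite3` module, which already supports `INTERSECT`. This is only a sketch of the operator's behaviour, not dask-sql code; the table names `a` and `b` and the helper `intersect_ids` are made up for illustration.

```python
import sqlite3


def intersect_ids(ids_a, ids_b):
    """Return the distinct ids present in both inputs, using SQL INTERSECT.

    Illustrative only: uses an in-memory SQLite database, not dask-sql.
    """
    conn = sqlite3.connect(":memory:")
    try:
        conn.execute("CREATE TABLE a (id INTEGER)")
        conn.execute("CREATE TABLE b (id INTEGER)")
        conn.executemany("INSERT INTO a VALUES (?)", [(i,) for i in ids_a])
        conn.executemany("INSERT INTO b VALUES (?)", [(i,) for i in ids_b])
        rows = conn.execute(
            "SELECT id FROM a INTERSECT SELECT id FROM b"
        ).fetchall()
    finally:
        conn.close()
    # Sort explicitly: without ORDER BY, SQL does not guarantee row order.
    return sorted(row[0] for row in rows)
```

Unlike an inner `JOIN`, `INTERSECT` removes duplicate rows, so repeated ids on either side appear once in the result.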

[BUG] Can't infer file type of table when passing directory name only

``` from dask_sql import Context import pandas as pd import dask.dataframe as dd c = Context() pd.DataFrame({'id': [0, 1, 2]}).to_parquet('/data/test/part.0.parquet') # this works c.sql(""" CREATE OR REPLACE TABLE test WITH...

bug

needs triage

[BUG] read_parquet filters don't seem to be applied as passed in CREATE TABLE statements

I'm trying to use a `CREATE TABLE WITH (... filters=[...])` on a Parquet dataset, and trying to achieve row group filtering based on filters supplied in the `CREATE TABLE` statement,...

bug

needs triage