Randy Gelhausen

Results 60 issues of Randy Gelhausen

With the latest cudf nightlies, attempting to write a file to a nonexistent subdirectory fails as expected (matching pandas), but the error is unclear:
```
RuntimeError: cuDF failure at: ../src/io/utilities/data_sink.cpp:36: Cannot...
```

feature request
good first issue
cuDF (Python)
cuIO
inactive-30d

I'm trying to use cuml and dask-cuda in an environment that doesn't have libnuma installed and I unfortunately cannot install it. It seems that cuml has a dependency on ucx-py,...

question
inactive-30d
inactive-90d

Following the [Google Cloud Storage bucket](https://docs.blazingdb.com/docs/google-storage) docs, I see I need a GCP project id. However, I'm trying to read the taxi dataset out of a public bucket `gcs://anaconda-public-data/nyc-taxi/csv/`. If...

I'm struggling to find a programmatic reproducer for this, but on the datafusion-sql-planner branch: ``` c.sql("SELECT * FROM large_table limit 5") ``` results in reading the entire dataset before filtering...

bug
needs triage

Repro:
```
import pandas as pd
from dask_sql import Context

c = Context()
df = pd.DataFrame({"id": [0, 1, 1, 2], "val": [1, 1, 2, 1]})
c.create_table("df", df)
c.sql("""
SELECT val,...
```

bug
needs triage

```
import pandas as pd
from dask_sql import Context

c = Context()
df = pd.DataFrame({"id": [0, 1, 2]})
c.create_table("df", df)

# returns a DataFrame
c.sql("select * from df")

# returns...
```

bug
needs triage

I often inherit existing SQL files which contain a series of queries/statements that should be executed one after the other. It's fairly easy to do something like:
```
with open("my_sql.txt")...
```

enhancement
datafusion
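A minimal sketch of the workflow described in this issue: split a script into statements and run them in order. The helper names `split_statements` and `run_script` are hypothetical, and the standard library's `sqlite3` stands in for a dask-sql `Context` so the example is self-contained. The split on `;` is deliberately naive and would break on semicolons inside string literals or comments.

```python
import sqlite3


def split_statements(script: str) -> list[str]:
    """Naively split a SQL script on ';' and drop empty fragments."""
    return [part.strip() for part in script.split(";") if part.strip()]


def run_script(execute, script: str) -> list:
    """Run each statement in order with `execute`; collect every result."""
    return [execute(statement) for statement in split_statements(script)]


# Hypothetical script contents, used only for illustration.
script = """
CREATE TABLE t (id INTEGER);
INSERT INTO t VALUES (0), (1), (2);
SELECT COUNT(*) FROM t;
"""

conn = sqlite3.connect(":memory:")
results = run_script(lambda sql: conn.execute(sql).fetchall(), script)
last_result = results[-1]  # rows returned by the final SELECT
```

With dask-sql, the `execute` argument could presumably be the `c.sql` method used elsewhere in these issues, though statement splitting would still need a real SQL tokenizer to be robust.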

Sometimes instead of using a `JOIN`, an [`INTERSECT`](https://www.techonthenet.com/sql/intersect.php) is used to find the overlap in two sets of records:
```
import pandas as pd
df_a = pd.DataFrame({'id': [0, 1, 2]})...
```

enhancement
SQL grammar
needs triage
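For reference, a small sketch of the equivalence this issue relies on: `INTERSECT` returns the distinct rows common to both queries, which matches a `DISTINCT` inner join on the same column. The table names `left_ids` and `right_ids` and their contents are hypothetical, and `sqlite3` is used only as a readily available SQL engine, not as a statement about dask-sql's behavior.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical tables for illustration only.
conn.executescript(
    """
    CREATE TABLE left_ids (id INTEGER);
    CREATE TABLE right_ids (id INTEGER);
    INSERT INTO left_ids VALUES (0), (1), (2);
    INSERT INTO right_ids VALUES (1), (2), (3);
    """
)

# Set intersection: distinct ids present in both tables.
intersect_rows = conn.execute(
    "SELECT id FROM left_ids INTERSECT SELECT id FROM right_ids ORDER BY id"
).fetchall()

# The same overlap written as a de-duplicated inner join.
join_rows = conn.execute(
    "SELECT DISTINCT l.id FROM left_ids AS l "
    "JOIN right_ids AS r ON l.id = r.id ORDER BY l.id"
).fetchall()
```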

```
from dask_sql import Context
import pandas as pd
import dask.dataframe as dd

c = Context()
pd.DataFrame({'id': [0, 1, 2]}).to_parquet('/data/test/part.0.parquet')

# this works
c.sql("""
CREATE OR REPLACE TABLE test WITH...
```

bug
needs triage

I'm trying to use a `CREATE TABLE WITH (... filters=[...])` on a Parquet dataset, and trying to achieve row group filtering based on filters supplied in the `CREATE TABLE` statement,...

bug
needs triage