gpu-bdb
RAPIDS GPU-BDB
This PR aims to re-enable query 27. It currently fixes the breakages, but the query now returns empty results, and we don't yet know why.
After generating test data with the PDGF tool and placing the .dat files under $DATA_DIR/SF, what is the next step to run the benchmark? Running the load test generates .parquet...
I noticed that gpu-bdb query 18 was using private cudf methods for `find_multiple`: https://github.com/rapidsai/gpu-bdb/blob/f48c05d63d5cb4baa59708cb262506f6d9d3f4f1/gpu_bdb/bdb_tools/q18_utils.py#L21 https://github.com/rapidsai/gpu-bdb/blob/f48c05d63d5cb4baa59708cb262506f6d9d3f4f1/gpu_bdb/bdb_tools/q18_utils.py#L127 cudf now has a public API that performs the same task: `Series.str.find_multiple`. This should be...
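For reference, here is a minimal pure-Python sketch of what `find_multiple` computes, written so it runs without a GPU. It assumes the cudf method behaves as we read its documentation: for each string in a Series, it returns a list holding the position of the first occurrence of every pattern, or -1 when a pattern is absent. The function name below only mirrors the cudf method; it is not the q18 code. On a GPU, the equivalent would be calling `.str.find_multiple(patterns)` on a `cudf.Series` of strings.

```python
from typing import List


def find_multiple(strings: List[str], patterns: List[str]) -> List[List[int]]:
    """Return, for each string, the index of the first match of every pattern.

    A value of -1 means the pattern does not occur in that string, the same
    convention Python's str.find uses.
    """
    return [[s.find(p) for p in patterns] for s in strings]


if __name__ == "__main__":
    strings = ["strings", "to", "search", "in"]
    patterns = ["a", "string", "g", "inn", "o", "r", "sea"]
    for s, positions in zip(strings, find_multiple(strings, patterns)):
        print(f"{s!r}: {positions}")
```

Each row yields one position per pattern, so in cudf the result is a list column; the nested Python lists play that role here.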
The queries below rely on cuML models for GPU ML. Depending on performance, we need to decide between a distributed (dask-ml) and a non-distributed (sklearn) implementation for the ML...
ucx-py recently added Dockerfiles for CUDA-enabled containers that include all the prerequisites for building ucx+ib from source. We should update our images to use those as a central source...
This PR enables the CPU backend option for DataFrame queries 11, 12, 15, 16, 17, and 22. I also verified that the DataFrame versions of all of the other...
When multiple instances of GPU_BDB run from the same working directory, the interim result files (`benchmarked_*.csv`) get overwritten and/or corrupted. Add the run_id configuration option to each filename to...
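A minimal sketch of the proposed naming change, assuming the run_id is a short string that can be spliced into a filename. The helper name `run_scoped_filename`, the random fallback id, and the example file name `benchmarked_q01.csv` are hypothetical, not taken from the gpu-bdb code base.

```python
import os
import uuid
from typing import Optional


def run_scoped_filename(filename: str, run_id: Optional[str] = None) -> str:
    """Insert a run identifier just before the file extension.

    Hypothetical helper: if no run_id is configured, fall back to a random
    8-character hex id so concurrent runs still write to distinct files.
    """
    if run_id is None:
        run_id = uuid.uuid4().hex[:8]
    stem, ext = os.path.splitext(filename)
    return f"{stem}_{run_id}{ext}"


if __name__ == "__main__":
    # Two runs sharing a working directory now write to different files.
    print(run_scoped_filename("benchmarked_q01.csv", "run_a"))  # benchmarked_q01_run_a.csv
    print(run_scoped_filename("benchmarked_q01.csv", "run_b"))  # benchmarked_q01_run_b.csv
```

Keeping the original stem and extension means existing globs such as `benchmarked_*.csv` still match the new names.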
We should add CPU backend support for queries 25, 26, and 30, since we have to revert those changes (https://github.com/rapidsai/gpu-bdb/pull/244). More details are in that PR.
We should add some level of CI testing at sf1 (which should work on a single GPU) to catch breakages earlier. This came up in the discussion on #244.