Scott Lee

9 issues by Scott Lee

Signed-off-by: Scott Lee ## Why are these changes needed? When creating Datasets with ragged arrays, the resulting Dataset incorrectly uses `ArrowTensorArray` instead of `ArrowVariableShapedTensorArray` as the underlying schema type. This...

## Why are these changes needed? WIP - checking what changes are needed to enable optimizer by default 100%. ## Related issue number ## Checks - [ ] I've signed...

## Why are these changes needed? Implement the `LogicalOperator` and `PhysicalOperator` for `Dataset.union()`, and make `union()` lazy. This PR also introduces `Nary` and `NaryOperator` Logical/Physical Operators to support abstraction for...

@author-action-required
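
To make the idea of a lazy, n-ary union concrete, here is a minimal pure-Python sketch. It is not Ray Data's implementation: `LazySource`, `LazyUnion`, and `tracked` are hypothetical names invented for illustration. They only show how an operator holding any number of inputs can defer all work until the plan is executed.

```python
from dataclasses import dataclass
from typing import Callable, Iterator, List, Tuple


@dataclass(frozen=True)
class LazySource:
    """Hypothetical leaf operator: produces rows only when executed."""
    make_rows: Callable[[], List[int]]

    def execute(self) -> Iterator[int]:
        yield from self.make_rows()


@dataclass(frozen=True)
class LazyUnion:
    """Hypothetical n-ary operator: concatenates any number of inputs."""
    inputs: Tuple[object, ...]

    def execute(self) -> Iterator[int]:
        # Inputs are executed in order, and only when the union itself runs.
        for op in self.inputs:
            yield from op.execute()


calls: List[str] = []


def tracked(name: str, rows: List[int]) -> LazySource:
    """Build a source that records when its rows are actually produced."""
    def make() -> List[int]:
        calls.append(name)
        return rows
    return LazySource(make)


# Building the plan does no work; nothing has been recorded yet.
plan = LazyUnion((tracked("a", [1, 2]), tracked("b", [3])))
calls_before = list(calls)

# Executing the plan pulls rows from each input in turn.
result = list(plan.execute())
```

In this sketch, `calls_before` stays empty because constructing the plan only records the inputs; the sources run when `plan.execute()` is consumed.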

## Why are these changes needed? - Modifies the existing multi node train benchmark code to enable testing with heterogeneous clusters. - Adds a new release test `read_images_train_1_gpu_5_cpu` with 1...

## Why are these changes needed? Whenever there is any error when using Ray Data, the full stack trace is currently printed to stdout. If the exception originates from the...

### What happened + What you expected to happen When iterating over a Ray Dataset within the `TorchTrainer` train loop, a non-`None` `local_shuffle_buffer_size` causes a decrease in throughput compared to...

bug
P1
performance
data
ray 2.11
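
For readers unfamiliar with the term, a local shuffle buffer can be sketched in plain Python as below. This is an illustrative assumption about the general technique, not Ray Data's code: the function name `iter_with_local_shuffle` and its parameters are hypothetical. It shows that a non-`None` buffer size routes every row through extra buffering and random selection, while `None` lets rows stream straight through.

```python
import random
from typing import Iterable, Iterator, List, Optional, TypeVar

T = TypeVar("T")


def iter_with_local_shuffle(
    rows: Iterable[T], buffer_size: Optional[int], seed: int = 0
) -> Iterator[T]:
    """Yield rows, optionally passing them through a bounded shuffle buffer."""
    if buffer_size is None:
        # No shuffling: rows stream straight through with no extra work.
        yield from rows
        return

    rng = random.Random(seed)
    buffer: List[T] = []
    for row in rows:
        buffer.append(row)
        if len(buffer) >= buffer_size:
            # Swap a random element to the end, then pop it.
            i = rng.randrange(len(buffer))
            buffer[i], buffer[-1] = buffer[-1], buffer[i]
            yield buffer.pop()

    # Drain whatever is left in random order.
    rng.shuffle(buffer)
    yield from buffer


unshuffled = list(iter_with_local_shuffle(range(5), None))
shuffled = list(iter_with_local_shuffle(range(10), 4))
```

Every input row is yielded exactly once in both modes; only the order and the per-row bookkeeping differ.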

## Why are these changes needed? There is a bug with progress bars generated from Ray Data on Jupyter notebooks, where the progress bar is left partially complete after the...

## Why are these changes needed? It can be useful to have access to the `DataContext` for a particular `Dataset` in the execution plan optimizer path. This PR adds a...

go

## Why are these changes needed? Improve docs around Parquet filter predicate / column selection pushdown, so that they are easier to access from multiple parts of the Ray Data...