Liangcai Li

Results 18 issues of Liangcai Li

We hit this exception when repeatedly running a regression training with the xgboost JVM jars built from the latest master branch using Scala 2.11, along with our spark example...

There will be more than 100 cases, so we may need multiple sub-issues for this. [Click to see the full list of type casts that CPU ORC supports.](https://github.com/apache/orc/blob/main/java/core/src/java/org/apache/orc/impl/ConvertTreeReaderFactory.java#L2258)

task

CPU ORC reading supports schema evolution as described in issue #135, but GPU does not. GPU will run into exceptions when users specify a reader schema which is different from...

feature request

Test to reproduce this bug. ``` @Test void testPartitionEmptyTable() { try (Table t = new Table.TestBuilder() .timestampDayColumn() .build(); ColumnVector parts = ColumnVector .fromInts(); PartitionedTable pt = t.partition(parts, 3)) { assertArrayEquals(new...

bug
? - Needs Triage
libcudf

closes https://github.com/NVIDIA/spark-rapids/issues/6313 This PR adds the columnar support for the new API `mapInArrow` which is introduced in Spark 3.3.0. ***Performance*** - About 6.8 GB Parquet data in local files. -...

feature request

**Describe the bug** The GPU JSON reader cannot read a JSON string with an empty body `{}`, but Spark can read it successfully. **Steps/Code to reproduce bug** There are two...

bug

close https://github.com/NVIDIA/spark-rapids/issues/10968 This PR adds `MinBy` support on GPU. The GPU `MinBy` may produce different results from the CPU when multiple rows in the ordering column have the...

feature request

Contribute to https://github.com/NVIDIA/spark-rapids/issues/10790 Fix https://github.com/NVIDIA/spark-rapids/issues/10841 This PR is trying to accelerate the normal shuffle path by partitioning and slicing tables on GPU. The sliced table is already serializable so can...

**Describe the bug** PR https://github.com/NVIDIA/spark-rapids/pull/10912 introduces Parquet support for `GpuInsertIntoHiveTable`, along with the relevant tests. In some of the tests on Databricks, the `ProjectExec` will fall back to CPU...

bug
performance

close https://github.com/NVIDIA/spark-rapids/issues/11646 `curXORShiftRandomSeed` is marked as `transient`, so it will be null on executors without retry-restore context, leading to this NPE. This fix removes the `transient` modifier from `curXORShiftRandomSeed`, `seed`...

bug
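
The `transient` behavior behind the NPE in the last item can be reproduced with plain Java serialization. The sketch below is only an illustration, not the spark-rapids code: the class `RandLike` and its fields `transientSeed` and `keptSeed` are hypothetical stand-ins, and the round trip through a byte array only approximates what happens when an object is shipped to an executor.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.io.Serializable;

public class TransientDemo {
    // Hypothetical stand-in for an expression object that carries a seed.
    static class RandLike implements Serializable {
        private static final long serialVersionUID = 1L;

        transient Long transientSeed; // skipped by Java serialization
        Long keptSeed;                // serialized normally

        RandLike(long seed) {
            this.transientSeed = seed;
            this.keptSeed = seed;
        }
    }

    // Serialize and then deserialize the object through an in-memory buffer.
    static RandLike roundTrip(RandLike in) throws IOException, ClassNotFoundException {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (ObjectOutputStream out = new ObjectOutputStream(bytes)) {
            out.writeObject(in);
        }
        try (ObjectInputStream ois =
                 new ObjectInputStream(new ByteArrayInputStream(bytes.toByteArray()))) {
            return (RandLike) ois.readObject();
        }
    }

    public static void main(String[] args) throws Exception {
        RandLike copy = roundTrip(new RandLike(42L));
        // The transient field comes back as null; unboxing it would throw an NPE.
        System.out.println("transientSeed = " + copy.transientSeed);
        System.out.println("keptSeed = " + copy.keptSeed);
    }
}
```

After the round trip the transient field is `null` while the non-transient field keeps its value, which is why any code path that reads the field without first re-initializing it can fail on the receiving side.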