Liangcai Li issues

Results: 18 issues of Liangcai Li

XGBoost-Spark training fails sometimes due to "java.lang.NumberFormatException: For input string: "inf""

We hit this exception when repeatedly running a regression training job with the XGBoost JVM jars built from the latest master branch using Scala 2.11, along with our Spark example...
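The exception text matches what the JVM's standard floating-point parser reports for the C-style spelling `inf`: `Double.parseDouble` accepts `Infinity` and `NaN` but not `inf`. The sketch below only illustrates that parsing behavior. It does not claim where in XGBoost-Spark the string is produced or parsed, and `parseLenient` is a hypothetical helper, not part of any library.

```java
import java.util.Locale;

final class InfParsingDemo {
    // Returns true when the JVM's standard parser accepts the text as a double.
    static boolean parsesAsDouble(String text) {
        try {
            Double.parseDouble(text);
            return true;
        } catch (NumberFormatException e) {
            return false;
        }
    }

    // Hypothetical helper: maps C-style spellings ("inf", "-inf", "nan")
    // to the values Java would otherwise only accept as "Infinity" / "NaN".
    static double parseLenient(String text) {
        String t = text.trim();
        switch (t.toLowerCase(Locale.ROOT)) {
            case "inf":
            case "+inf":
                return Double.POSITIVE_INFINITY;
            case "-inf":
                return Double.NEGATIVE_INFINITY;
            case "nan":
                return Double.NaN;
            default:
                // Everything else goes through the standard parser unchanged.
                return Double.parseDouble(t);
        }
    }

    public static void main(String[] args) {
        System.out.println(parsesAsDouble("inf"));      // rejected by the JVM parser
        System.out.println(parsesAsDouble("Infinity")); // accepted by the JVM parser
        System.out.println(parseLenient("-inf"));
    }
}
```

A lenient wrapper like this is one possible workaround on the JVM side. Whether the right fix is in the writer or the reader of the value is not stated in the issue.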

Implement all the casting cases that GPU can support for ORC reading.

There are more than 100 cases, so this may need multiple sub-issues. [Click to see the full list of type casts that CPU ORC supports.](https://github.com/apache/orc/blob/main/java/core/src/java/org/apache/orc/impl/ConvertTreeReaderFactory.java#L2258)

task

[FEA] ORC reading supports schema evolution

CPU ORC reading supports schema evolution, as described in issue #135, but GPU reading does not. The GPU runs into exceptions when users specify a reader schema which is different from...

feature request

[BUG] Calling `partition()` on an empty Table leads to a crash

Test to reproduce this bug. ``` @Test void testPartitionEmptyTable() { try (Table t = new Table.TestBuilder() .timestampDayColumn() .build(); ColumnVector parts = ColumnVector .fromInts(); PartitionedTable pt = t.partition(parts, 3)) { assertArrayEquals(new...

bug

? - Needs Triage

libcudf

Support columnar processing for mapInArrow[databricks]

Closes https://github.com/NVIDIA/spark-rapids/issues/6313 This PR adds columnar support for the new `mapInArrow` API, which was introduced in Spark 3.3.0. ***Performance*** - About 6.8 GB Parquet data in local files. -...

feature request

[BUG] GPU JSON reader fails to read the JSON string of an empty body

**Describe the bug** The GPU JSON reader cannot read a JSON string with an empty body, `{}`, but Spark reads it successfully. **Steps/Code to reproduce bug** There are two...

bug

Support `MinBy` on GPU

Closes https://github.com/NVIDIA/spark-rapids/issues/10968 This PR adds `MinBy` support on the GPU. The GPU `MinBy` may produce different results from the CPU when multiple rows in the ordering column have the...
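To make the caveat concrete, here is a minimal sketch of a `min_by`-style aggregation: return the value whose ordering key is smallest. It assumes, since the sentence above is truncated, that the difference arises when several rows tie on the minimum ordering key. The `keepFirst` flag and the class name are hypothetical and only model two equally valid tie-breaking choices. This is not the plugin's or Spark's implementation.

```java
final class MinByTieDemo {
    // Returns the value paired with the smallest ordering key.
    // keepFirst decides which row wins when several rows share that key
    // (an illustrative knob, not a real Spark or plugin option).
    static String minBy(String[] values, int[] ordering, boolean keepFirst) {
        if (values.length == 0 || values.length != ordering.length) {
            throw new IllegalArgumentException("inputs must be non-empty and equal in length");
        }
        int best = 0;
        for (int i = 1; i < ordering.length; i++) {
            boolean smaller = ordering[i] < ordering[best];
            boolean tie = ordering[i] == ordering[best];
            if (smaller || (tie && !keepFirst)) {
                best = i;
            }
        }
        return values[best];
    }

    public static void main(String[] args) {
        String[] values = {"a", "b", "c"};
        int[] ordering = {1, 1, 2}; // rows "a" and "b" tie on the minimum key

        // Both answers satisfy "value associated with the minimum key".
        System.out.println(minBy(values, ordering, true));  // a
        System.out.println(minBy(values, ordering, false)); // b
    }
}
```

Under that assumption, a test comparing CPU and GPU output would need ordering keys without ties to get a deterministic answer.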

feature request

Support serializing packed tables directly for the normal shuffle path

Contributes to https://github.com/NVIDIA/spark-rapids/issues/10790 Fixes https://github.com/NVIDIA/spark-rapids/issues/10841 This PR tries to accelerate the normal shuffle path by partitioning and slicing tables on the GPU. The sliced table is already serializable so can...

Figure out why `MapFromArrays` appears in the tests for Hive Parquet write

**Describe the bug** PR https://github.com/NVIDIA/spark-rapids/pull/10912 introduces Parquet support for `GpuInsertIntoHiveTable`, along with the relevant tests. In some of the tests on Databricks, `ProjectExec` falls back to the CPU...

bug

performance

Fix an NPE issue in GpuRand

Closes https://github.com/NVIDIA/spark-rapids/issues/11646 `curXORShiftRandomSeed` is marked as `transient`, so it will be null on executors without retry-restore context, leading to this NPE. This fix removes the `transient` marker from `curXORShiftRandomSeed`, `seed`...
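The mechanism described above can be reproduced with plain Java serialization: a `transient` field is skipped when the object is written, so the deserialized copy sees `null` unless something restores it. The sketch below is a generic illustration. `RandLike` and its fields are hypothetical stand-ins, not the actual `GpuRand` code.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.io.Serializable;

final class TransientFieldDemo {
    // Hypothetical stand-in for an object shipped from the driver to executors.
    static final class RandLike implements Serializable {
        private static final long serialVersionUID = 1L;
        transient long[] transientState; // skipped by serialization
        long[] persistentState;          // written and restored with the object

        RandLike(long seed) {
            this.transientState = new long[] {seed};
            this.persistentState = new long[] {seed};
        }
    }

    // Serializes and deserializes the object in memory, mimicking a trip
    // from one JVM to another.
    static RandLike roundTrip(RandLike in) {
        try {
            ByteArrayOutputStream bytes = new ByteArrayOutputStream();
            try (ObjectOutputStream out = new ObjectOutputStream(bytes)) {
                out.writeObject(in);
            }
            try (ObjectInputStream src =
                     new ObjectInputStream(new ByteArrayInputStream(bytes.toByteArray()))) {
                return (RandLike) src.readObject();
            }
        } catch (IOException | ClassNotFoundException e) {
            throw new IllegalStateException("round trip failed", e);
        }
    }

    public static void main(String[] args) {
        RandLike copy = roundTrip(new RandLike(42L));
        System.out.println(copy.transientState == null); // true: field was not serialized
        System.out.println(copy.persistentState[0]);     // 42
    }
}
```

Dereferencing `copy.transientState` at this point would throw a `NullPointerException`, which is the general failure pattern the fix addresses by dropping `transient`.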

bug