Results: 18 issues by Hao Zhu

**Env:** rapids-4-spark_2.12-22.10.0-20220817.170628-9.jar on a Spark Standalone cluster

**Issue:** Reading the Binary type in an Iceberg table falls back to the CPU. E.g.:

```
spark-shell --packages org.apache.iceberg:iceberg-spark-runtime-3.2_2.12:0.13.1 \
  --jars /xxx/2210snapshot/rapids-4-spark_2.12-22.10.0-20220817.170628-9.jar \
  --conf spark.sql.extensions=org.apache.iceberg.spark.extensions.IcebergSparkSessionExtensions \
  --conf spark.sql.catalog.spark_catalog=org.apache.iceberg.spark.SparkSessionCatalog \
  ...
```
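
A minimal, hypothetical repro sketch (the table name and data are assumptions, not from the truncated issue body): create an Iceberg table with a BINARY column, then scan it.

```
// Hypothetical repro; table name and data are assumed, not from the issue.
spark.sql("CREATE TABLE spark_catalog.default.binary_tab (id INT, b BINARY) USING iceberg")
spark.sql("INSERT INTO spark_catalog.default.binary_tab SELECT 1, cast('abc' AS BINARY)")
// Per the issue, this scan falls back to the CPU because of the BINARY column.
spark.sql("SELECT * FROM spark_catalog.default.binary_tab").show()
```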

bug
P1

I wish we could support FromUTCTimestamp. Reproduce:

```
import org.apache.spark.sql.functions._
import spark.implicits._
import org.apache.spark.sql.types._

var df = spark.sparkContext.parallelize(Seq(1)).toDF()
df = df.withColumn("value82", lit("123456.78").cast(DecimalType(8, 2))).
  withColumn("value63", lit("123.456").cast(DecimalType(6, 3))).
  withColumn("value1510", lit("12345.0123456789").cast(DecimalType(15, 10))).
  withColumn("value2510", lit("123456789012345.0123456789").cast(DecimalType(25, 10))).
  withColumn("value2901", lit("1234567890123456789012345678.1").cast(DecimalType(29, 1))).
  withColumn("value3802", lit("123456789012345678901234567890123456.01").cast(DecimalType(38, 2))).
  ...
```
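
The repro is truncated before it reaches the FromUTCTimestamp call; a minimal sketch of the expression the title refers to (the column name and time zone are assumptions):

```
// Assumed continuation: from_utc_timestamp is the expression that cannot run on GPU.
df = df.withColumn("utc_ts", from_utc_timestamp(current_timestamp(), "Asia/Shanghai"))
df.show()
```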

feature request

I wish we could set the default value of spark.task.resource.gpu.amount to 1/spark.executor.cores so that users do not need to set it manually.
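
A minimal sketch of the requested arithmetic (the values are illustrative, not actual plugin code):

```
// Illustrative only: with 8 executor cores, each task would get 1/8 of a GPU by default.
val executorCores = 8                    // spark.executor.cores (assumed value)
val taskGpuAmount = 1.0 / executorCores  // spark.task.resource.gpu.amount => 0.125
```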

ease of use

I wish we could avoid the CPU fallback caused by date_format: `Failed to convert Unsupported word: SSS null`. Reproduce:

```
import org.apache.spark.sql.functions._
import spark.implicits._
import org.apache.spark.sql.types._

var df = spark.sparkContext.parallelize(Seq(1)).toDF()
df = df.withColumn("value82", lit("123456.78").cast(DecimalType(8, 2))).
  ...
```
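
The repro is truncated before the date_format call; a minimal sketch of a pattern that triggers the reported message (everything beyond the `SSS` word is an assumption):

```
// "SSS" (milliseconds) is the unsupported word named in the fallback message.
df = df.withColumn("formatted", date_format(current_timestamp(), "yyyy-MM-dd HH:mm:ss.SSS"))
df.show()
```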

feature request
P1

I wish we could support the Percentile function. E.g.:

```
select percentile(ss_wholesale_cost, 0.1) from tpcds.store_sales limit 10;

! percentile(ss_wholesale_cost#82, 0.1, 1, 0, 0) cannot run on GPU because GPU does not currently...
```
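
For reference, a minimal sketch of the same aggregation through the DataFrame API (the table name is taken from the SQL above):

```
// Same query as the SQL in the issue, expressed via selectExpr.
spark.table("tpcds.store_sales")
  .selectExpr("percentile(ss_wholesale_cost, 0.1)")
  .show()
```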

feature request
cudf_dependency

Currently, when using the Qualification tool to process lots of event logs on Databricks, the result normally looks like:

```
==================================================================================================================================================================================
| App Name| App ID|App Duration|SQL DF Duration|GPU Opportunity|Estimated GPU Duration|Estimated...
```

feature request
P1
tools

If we enable spark.rapids.sql.exec.CollectLimitExec=true on a 2-node cluster, a CSV read with a header may come back messed up. For example, let's use this example CSV file:

```
wget -q https://raw.githubusercontent.com/JohnSnowLabs/spark-nlp-workshop/master/tutorials/Certification_Trainings/Public/data/news_category_train.csv
```
...
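
A minimal sketch of the read that presumably follows (the options and path are assumptions, since the body is truncated):

```
// Hypothetical continuation: read the downloaded CSV with a header row.
val df = spark.read.option("header", "true").csv("news_category_train.csv")
// Per the issue, with CollectLimitExec enabled on GPU across 2 nodes,
// the header and data rows may come back misordered.
df.show()
```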

bug
? - Needs Triage

When running a join query in the Spark Thrift Server, it reported:

```
SQL Error: org.apache.hive.service.cli.HiveSQLException: Error running query: java.lang.NoClassDefFoundError: com/nvidia/spark/rapids/GpuBroadcastHashJoinExec
    at org.apache.spark.sql.hive.thriftserver.HiveThriftServerErrors$.runningQueryError(HiveThriftServerErrors.scala:44)
    at org.apache.spark.sql.hive.thriftserver.SparkExecuteStatementOperation.org$apache$spark$sql$hive$thriftserver$SparkExecuteStatementOperation$$execute(SparkExecuteStatementOperation.scala:488)
    at org.apache.spark.sql.hive.thriftserver.SparkExecuteStatementOperation$$anon$2$$anon$3.$anonfun$run$2(SparkExecuteStatementOperation.scala:246)
    at scala.runtime.java8.JFunction0$mcV$sp.apply(JFunction0$mcV$sp.java:23)
    at org.apache.spark.sql.hive.thriftserver.SparkOperation.withLocalProperties(SparkOperation.scala:79)
    ...
```

bug

I wish we could support HiveTableScanExec so it can scan a Hive text table. Say you have already created a Hive text table using the code in https://github.com/NVIDIA/spark-rapids/issues/6419, then query it:

```
spark.sql("select...
```
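
A minimal sketch of such a query (the table name is a hypothetical stand-in; the real table is created in issue #6419):

```
// Hypothetical Hive text table; the real one comes from issue #6419.
spark.sql("SELECT * FROM hive_text_table").show()
// Per the issue, HiveTableScanExec is not supported, so this scan runs on the CPU.
```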

feature request

I wish we could support the China Standard Time zone (CST = UTC+8) on the GPU. Say the server time zone is CST and the user does not need to explicitly set `user.timezone=UTC`. As a...
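
A minimal sketch of the situation described (the time zone id is illustrative):

```
// With the session time zone left at the server default (CST / Asia/Shanghai here),
// timestamp expressions currently fall back to the CPU per the issue.
spark.conf.set("spark.sql.session.timeZone", "Asia/Shanghai")
spark.sql("SELECT current_timestamp()").show()
```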

feature request