[SPARK-36663] [FOLLOWUP] [SQL] Support number-only column names in ORC data sources when orc impl is hive
What changes were proposed in this pull request?
This PR aims to support number-only column names in ORC data sources when the ORC implementation is hive (spark.sql.orc.impl=hive). In the current master, the ORC data source can write a DataFrame containing such columns into ORC files:
spark.sql("SELECT 'a' as `1`, 'b' as `2`, 'c' as `3`").write.orc(path)
But reading those ORC files back fails:
val df = spark.read.orc(path)
...
== SQL ==
struct<1:string,2:string,3:string>
-------^^^
at org.apache.spark.sql.catalyst.parser.ParseException.withCommand(ParseDriver.scala:265)
at org.apache.spark.sql.catalyst.parser.AbstractSqlParser.parse(ParseDriver.scala:126)
at org.apache.spark.sql.catalyst.parser.AbstractSqlParser.parseDataType(ParseDriver.scala:40)
at org.apache.spark.sql.hive.orc.OrcFileOperator$$anonfun$readSchema$2.applyOrElse(OrcFileOperator.scala:101)
The cause is that CatalystSqlParser.parseDataType fails to parse a schema string in which a column name (or nested field name) consists only of numbers.
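The failure mode and the quoting workaround can be sketched outside Spark (Python here, purely illustrative; quote_if_needed and to_struct_ddl are hypothetical helpers, not Spark APIs): a bare number is not a valid identifier in the struct DDL, so "struct<1:string,...>" is rejected, while wrapping number-only names in backticks yields a string the parser accepts.

```python
import re

def quote_if_needed(name: str) -> str:
    """Wrap a field name in backticks when it is not a plain identifier."""
    if re.fullmatch(r"[A-Za-z_][A-Za-z0-9_]*", name):
        return name
    # Escape any embedded backticks, then quote the whole name.
    return "`" + name.replace("`", "``") + "`"

def to_struct_ddl(fields):
    """Build a struct type string from (name, type) pairs."""
    return "struct<" + ",".join(
        f"{quote_if_needed(n)}:{t}" for n, t in fields
    ) + ">"

print(to_struct_ddl([("1", "string"), ("2", "string"), ("3", "string")]))
# struct<`1`:string,`2`:string,`3`:string>
```

This mirrors the direction of the fix: quote number-only names before handing the reconstructed schema string to the parser, instead of emitting them bare.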
Why are the changes needed?
For better usability.
Does this PR introduce any user-facing change?
No.
How was this patch tested?
Unit tests.
@cloud-fan please take a look
@dongjoon-hyun I created a new JIRA, please take a look