hyperspace
input_file_name() results change after Hyperspace is enabled
Describe the issue
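The output of input_file_name() changes after Hyperspace is enabled: once the query is rewritten to use a covering index, the column no longer contains the source data file names.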
To Reproduce
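import org.apache.spark.sql.functions.input_file_name
import com.microsoft.hyperspace._
import com.microsoft.hyperspace.index._
// Write a small Parquet dataset and read it back.
spark.range(1000).toDF("A").write.parquet("X")
val df = spark.read.parquet("X")
// Create a covering index on column A.
val hs = Hyperspace()
hs.createIndex(df, IndexConfig("myind", Seq("A"), Nil))
// With Hyperspace enabled, the filter on A becomes eligible for the index.
spark.enableHyperspace
df.filter("A = 1").withColumn("B", input_file_name()).show(false)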
Expected behavior
Column B contains the source file names (the Parquet files under "X"), whether or not Hyperspace is enabled.
Possible fix:
- If index lineage is disabled: Don't apply CoveringIndex if input_file_name() is used in the query.
- If index lineage is enabled: Replace input_file_name() with source file paths using the file IDs.
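
As a rough illustration of the first option, the check could look for an InputFileName expression anywhere in the analyzed plan. This is only a sketch: usesInputFileName is a hypothetical helper, not part of Hyperspace, and it assumes the spark session and df from the repro above are in scope (for example in spark-shell). Where such a check would hook into the index rules is left open here.

```scala
import org.apache.spark.sql.DataFrame
import org.apache.spark.sql.catalyst.expressions.InputFileName
import org.apache.spark.sql.functions.input_file_name
import com.microsoft.hyperspace._

// Hypothetical helper (not a Hyperspace API): returns true if any node of the
// analyzed logical plan carries an input_file_name() expression.
def usesInputFileName(df: DataFrame): Boolean =
  df.queryExecution.analyzed.find { node =>
    node.expressions.exists(_.find(_.isInstanceOf[InputFileName]).isDefined)
  }.isDefined

// User-side workaround sketch: turn Hyperspace off before running a query
// that depends on input_file_name().
val query = df.filter("A = 1").withColumn("B", input_file_name())
if (usesInputFileName(query)) {
  spark.disableHyperspace
}
query.show(false)
```

Until a fix lands, the same check can serve as a user-side guard, as in the last lines of the sketch; it assumes that disabling Hyperspace before the action runs is enough to keep the index rules from being applied to that query.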