input_file_name() results change after Hyperspace is enabled

Open · clee704 opened this issue 4 years ago · 1 comment

Describe the issue

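The values returned by input_file_name() change after Hyperspace is enabled: once the covering index is applied to the query, the column no longer contains the source file names.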

To Reproduce

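import org.apache.spark.sql.functions.input_file_name
import com.microsoft.hyperspace._
import com.microsoft.hyperspace.index._

// Write a small Parquet dataset and read it back.
spark.range(1000).toDF("A").write.parquet("X")
val df = spark.read.parquet("X")

// Create a covering index on column A with no included columns.
val hs = Hyperspace()
hs.createIndex(df, IndexConfig("myind", Seq("A"), Nil))

// Enable Hyperspace so the optimizer can rewrite the query to use the index.
spark.enableHyperspace

// B should hold the path of the source file each row was read from.
df.filter("A = 1").withColumn("B", input_file_name()).show(false)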

Expected behavior

Column B contains the source file names.

clee704 · Jul 21 '21 10:07

Possible fix:

  1. If index lineage is disabled: Don't apply CoveringIndex if input_file_name() is used in the query.
  2. If index lineage is enabled: Replace input_file_name() with source file paths using the file IDs.
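
A minimal sketch of the check that option 1 would need, assuming the rule has access to the query's logical plan. The helper name `usesInputFileName` is hypothetical and not part of Hyperspace; it uses only Spark's Catalyst `InputFileName` expression and the generic tree `find` method. It does not look inside subquery plans, so a real rule would probably need to handle those as well.

```scala
import org.apache.spark.sql.catalyst.expressions.InputFileName
import org.apache.spark.sql.catalyst.plans.logical.LogicalPlan

// Hypothetical helper (not a Hyperspace API): returns true if any operator
// in the plan has an expression tree containing input_file_name().
def usesInputFileName(plan: LogicalPlan): Boolean =
  plan.find { node =>
    // Each operator exposes its expressions; search each expression tree
    // for an InputFileName node.
    node.expressions.exists(_.find(_.isInstanceOf[InputFileName]).isDefined)
  }.isDefined
```

With the repro above, this could be called as `usesInputFileName(df.filter("A = 1").withColumn("B", input_file_name()).queryExecution.analyzed)`; a rule that applies CoveringIndex could skip the rewrite whenever the check returns true.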

clee704 · Jul 21 '21 10:07