
Why convert SparseVector to DenseVector in your DefaultTfRecordRowEncoder.scala?

NeilRon opened this issue 6 years ago · 0 comments

```scala
case VectorType => {
  val field = row.get(index)
  field match {
    // Sparse vectors are densified before being encoded as a FloatList.
    case v: SparseVector => FloatListFeatureEncoder.encode(v.toDense.toArray.map(_.toFloat))
    case v: DenseVector  => FloatListFeatureEncoder.encode(v.toArray.map(_.toFloat))
    case _ => throw new RuntimeException(s"Cannot convert $field to vector")
  }
}
```

I found this code in your DefaultTfRecordRowEncoder.scala; it explicitly converts a SparseVector to a DenseVector.

I have a 1000-dimensional feature vector in my DataFrame with only about 90 non-zero values. Because of this conversion, the TFRecord dataset ends up much larger than the same data stored as snappy-compressed Parquet in Spark.
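For comparison, here is a minimal sketch (not the connector's actual API) of how a sparse vector could be serialized without densifying: emit the indices and values arrays as two parallel features, which is the layout `tf.io.SparseFeature` expects on the TensorFlow side. The object name and the feature-mapping comments are hypothetical; only the Spark `SparseVector` accessors are real.

```scala
import org.apache.spark.ml.linalg.{SparseVector, Vectors}

object SparseEncodingSketch extends App {
  // Example: a 1000-dim vector with 3 non-zeros.
  val v = Vectors.sparse(1000, Array(3, 42, 999), Array(0.5, 1.5, 2.5))
    .asInstanceOf[SparseVector]

  // What the encoder does today: densify to `size` floats.
  val dense = v.toDense.toArray.map(_.toFloat)

  // Hypothetical sparse layout: ~2 * nnz numbers instead of `size`.
  val indices = v.indices.map(_.toLong)  // would go into an Int64List feature
  val values  = v.values.map(_.toFloat)  // would go into a FloatList feature

  println(s"dense payload:  ${dense.length} floats")                              // 1000
  println(s"sparse payload: ${indices.length} indices + ${values.length} values") // 3 + 3
}
```

With ~90 non-zeros out of 1000 dimensions, a layout like this would store roughly 180 numbers per row instead of 1000.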

I'm a little confused about the conversion.

NeilRon · Oct 29 '19, 12:10