Holder instances are returned with null buffer

Open shivamka1 opened this issue 3 years ago • 1 comments

Creating a holder with new VarCharHolder() gives me an instance whose buffer field is null. What mistake am I making here?

build.sbt

libraryDependencies ++= Seq(
  "org.apache.arrow"            % "arrow-vector"                   % "8.0.0",
  "org.apache.arrow"            % "arrow-memory-unsafe"            % "8.0.0"
)

shivamka1 avatar Jul 27 '22 10:07 shivamka1

Hi @iamsmkr, can you provide the code you're using to create the holders?

thanks

lwhite1 avatar Aug 02 '22 19:08 lwhite1

Hi @lwhite1, is there any documentation on how to work with holders properly? It looks like holder instances don't come with buffers; a buffer has to be assigned to the holder explicitly.

shivamka1 avatar Aug 22 '22 08:08 shivamka1

As far as I know, there is no documentation other than what is in the source code. A holder is just a clever way to access values in a vector generically without boxing every primitive value, while creating only one holder per ValueVector.

So, for example, if you have an IntVector called ageVector and you want to add a year to everyone's age, you would create your holders once before starting the loop, then set the value inside the loop and pass the holder to the vector or a writer.

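   IntVector ageVector = ...
   IntHolder ageWriteHolder = new IntHolder();
   // IntVector.get(int, holder) takes a NullableIntHolder, so read through that type
   NullableIntHolder ageReadHolder = new NullableIntHolder();
   for (int i = 0; i < ageVector.getValueCount(); i++) {
       ageVector.get(i, ageReadHolder);
       if (ageReadHolder.isSet == 0) continue; // leave null entries untouched
       ageWriteHolder.value = ageReadHolder.value + 1;
       ageVector.set(i, ageWriteHolder);
   }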
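
To make the "one holder, no boxing" point concrete without depending on Arrow itself, here is a minimal self-contained sketch. SketchIntHolder and SketchIntVector are hypothetical stand-ins invented for this illustration, not Arrow classes; they only mimic the get(index, holder) / set(index, holder) shape described above.

```java
import java.util.Arrays;

// Hypothetical stand-in for a holder: a plain mutable primitive field, no Integer boxing.
final class SketchIntHolder {
    int value;
}

// Hypothetical stand-in for a vector backed by an int array (NOT an Arrow class).
final class SketchIntVector {
    private final int[] data;

    SketchIntVector(int... values) {
        this.data = values.clone();
    }

    int getValueCount() {
        return data.length;
    }

    // Copies the value at index into the caller-supplied holder.
    void get(int index, SketchIntHolder holder) {
        holder.value = data[index];
    }

    // Copies the holder's value into the vector at index.
    void set(int index, SketchIntHolder holder) {
        data[index] = holder.value;
    }

    int[] toArray() {
        return data.clone();
    }
}

final class HolderReuseSketch {
    // Adds one to every element while reusing a single holder for the whole loop.
    static void addOneYear(SketchIntVector ages) {
        SketchIntHolder holder = new SketchIntHolder(); // allocated once, outside the loop
        for (int i = 0; i < ages.getValueCount(); i++) {
            ages.get(i, holder);
            holder.value += 1;
            ages.set(i, holder);
        }
    }

    public static void main(String[] args) {
        SketchIntVector ages = new SketchIntVector(30, 41, 25);
        addOneYear(ages);
        System.out.println(Arrays.toString(ages.toArray()));
    }
}
```

The only object created per vector is the holder itself; each element is moved through its primitive field rather than through a boxed Integer.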

lwhite1 avatar Aug 22 '22 14:08 lwhite1

My use case: I was trying to create a VarCharHolder, populate its buffer manually, and then pass the holder to the ListVector writer. Below is my workaround, but it doesn't seem ideal, since it requests a new buffer of a different size for every string. It would be really helpful if you could suggest a better alternative. Thanks @lwhite1!

      writer.setPosition(row) // position the writer on the row before starting its list
      writer.startList()

      value.foreach { str =>
        val bytes = str.getBytes(StandardCharsets.UTF_8)
        val length = bytes.size

        val buffer = vector.getAllocator.buffer(length) // TODO Fix this 
        buffer.setBytes(0, bytes)
        writer.writeVarChar(0, length, buffer)
        buffer.close() // release the temporary buffer; clear() only resets the reader/writer indices
      }

      writer.endList()

A rather complete working example:

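import java.nio.charset.StandardCharsets

import org.apache.arrow.memory.RootAllocator
import org.apache.arrow.vector.complex.ListVector

import scala.collection.mutable
import scala.jdk.CollectionConverters._ // Scala 2.13+; on 2.12 use scala.collection.JavaConverters._

object TestBuffers2 extends App {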

  val allocator = new RootAllocator()

  val lv = ListVector.empty("listStr", allocator)
  lv.allocateNew()

  val writer = lv.getWriter
  writer.setPosition(0) // position the writer on the row before starting its list
  writer.startList()

  val ls = List("Shivam", "works", "for", "pometry")
  ls.foreach { str =>
    val bytes = str.getBytes(StandardCharsets.UTF_8)
    val length = bytes.size

    val buffer = lv.getAllocator.buffer(length)
    buffer.setBytes(0, bytes)
    writer.writeVarChar(0, length, buffer)
    buffer.close() // release the temporary buffer; clear() only resets the reader/writer indices
  }
  writer.endList()

  writer.setPosition(1) // position the writer on the row before starting its list
  writer.startList()
  ls.foreach { str =>
    val bytes = str.getBytes(StandardCharsets.UTF_8)
    val buffer = lv.getAllocator.buffer(bytes.size)
    buffer.setBytes(0, bytes)
    writer.writeVarChar(0, bytes.size, buffer)
    buffer.close() // release the temporary buffer; clear() only resets the reader/writer indices
  }
  writer.endList()

  lv.setValueCount(2)

  println(lv.getObject(0).asScala.toSet.asInstanceOf[Set[String]])
  println(lv.getObject(1).asScala.toSet.asInstanceOf[Set[String]])

  val res = mutable.Set.empty[String]
  val itr = lv.getObject(0).iterator()

  while(itr.hasNext) {
    val o = itr.next().asInstanceOf[org.apache.arrow.vector.util.Text]
    res.add(new String(o.getBytes, StandardCharsets.UTF_8))
  }

  println(res.toSet)
}
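
On the per-string allocation: one possible direction is to keep a single scratch buffer and grow it only when a value does not fit. The sketch below uses java.nio.ByteBuffer as a stand-in for ArrowBuf so that it runs without Arrow. It assumes the writer copies the bytes out during the write call, so the scratch buffer can be overwritten afterwards; I have not confirmed that this holds for every writer. ScratchBufferSketch, ensureCapacity, and writeAll are hypothetical names made up for this illustration, not part of Arrow.

```java
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch: reuse one scratch buffer instead of allocating per string.
final class ScratchBufferSketch {
    private ByteBuffer scratch = ByteBuffer.allocate(16);
    int allocations = 1; // counts how many buffers were ever allocated

    // Returns a cleared buffer with room for `length` bytes, reallocating only on growth.
    ByteBuffer ensureCapacity(int length) {
        if (scratch.capacity() < length) {
            scratch = ByteBuffer.allocate(Math.max(length, scratch.capacity() * 2));
            allocations++;
        }
        scratch.clear(); // reset position and limit; the old contents get overwritten
        return scratch;
    }

    // Encodes each string into the shared buffer, then copies it out again.
    // The copy stands in for a writer that copies the bytes into its own storage.
    List<String> writeAll(List<String> values) {
        List<String> written = new ArrayList<>();
        for (String s : values) {
            byte[] bytes = s.getBytes(StandardCharsets.UTF_8);
            ByteBuffer buf = ensureCapacity(bytes.length);
            buf.put(bytes);
            buf.flip();
            byte[] copy = new byte[buf.remaining()];
            buf.get(copy);
            written.add(new String(copy, StandardCharsets.UTF_8));
        }
        return written;
    }
}
```

With Arrow, the analogous shape would be one buffer obtained from the allocator before the loop, replaced by a larger one only when a string exceeds its capacity, and released once after the last write.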

shivamka1 avatar Aug 23 '22 13:08 shivamka1