Holder instances are returned with null buffer
Creating a holder with new VarCharHolder() gives me an instance whose buffer field is null. What mistake am I making here?
build.sbt
libraryDependencies ++= Seq(
"org.apache.arrow" % "arrow-vector" % "8.0.0",
"org.apache.arrow" % "arrow-memory-unsafe" % "8.0.0"
)
Hi @iamsmkr, can you provide the code you're using to create the holders?
Thanks
Hi @lwhite1, is there any documentation on how to work with holders properly? It looks like holder instances don't come with buffers; the buffers need to be assigned to the holders!
As far as I know, there is no documentation other than what is in the source code. A holder is just a clever way to access values in a Vector generically without boxing every primitive value, creating only one holder per ValueVector.
So for example, if you have an IntVector called ageVector and you want to add a year to everyone's age, you would create your holder(s) once before starting your loop, then set the value inside the loop and pass the holder to the vector or a writer.
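IntVector ageVector = ...
IntHolder ageWriteHolder = new IntHolder();
// IntVector.get(int, holder) takes a NullableIntHolder, which also carries isSet
NullableIntHolder ageReadHolder = new NullableIntHolder();
for (int i = 0; i < ageVector.getValueCount(); i++) {
    ageVector.get(i, ageReadHolder);
    if (ageReadHolder.isSet == 0) {
        continue; // null slot, nothing to increment
    }
    ageWriteHolder.value = ageReadHolder.value + 1;
    ageVector.set(i, ageWriteHolder);
}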
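To make the null buffer less surprising, here is a minimal, self-contained sketch of the holder idea in plain Java. MiniVarCharHolder and MiniVarCharVector are hypothetical stand-ins written for illustration, not Arrow classes; they only mirror the start/end/buffer shape of a VarChar holder. What it tries to show is that constructing a holder allocates nothing, and that the buffer field is filled in only when a vector's get points the holder at data that already exists.

```java
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;

// Hypothetical stand-in for a VarChar holder: three plain fields, no allocation.
final class MiniVarCharHolder {
    int start;          // offset of the first byte of the value
    int end;            // offset one past the last byte of the value
    ByteBuffer buffer;  // stays null until a vector (or the caller) assigns it
}

// Hypothetical stand-in for a variable-width vector: one data buffer plus offsets.
final class MiniVarCharVector {
    private final ByteBuffer data;
    private final int[] offsets;

    MiniVarCharVector(String... values) {
        offsets = new int[values.length + 1];
        byte[][] encoded = new byte[values.length][];
        for (int i = 0; i < values.length; i++) {
            encoded[i] = values[i].getBytes(StandardCharsets.UTF_8);
            offsets[i + 1] = offsets[i] + encoded[i].length;
        }
        data = ByteBuffer.allocate(offsets[values.length]);
        for (byte[] bytes : encoded) {
            data.put(bytes);
        }
    }

    int getValueCount() {
        return offsets.length - 1;
    }

    // Points the holder at the vector's own buffer; no bytes are copied.
    void get(int index, MiniVarCharHolder holder) {
        holder.start = offsets[index];
        holder.end = offsets[index + 1];
        holder.buffer = data;
    }
}

final class HolderSketch {
    // Decodes the bytes the holder currently points at.
    static String read(MiniVarCharHolder holder) {
        byte[] out = new byte[holder.end - holder.start];
        for (int i = 0; i < out.length; i++) {
            out[i] = holder.buffer.get(holder.start + i); // absolute read
        }
        return new String(out, StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        MiniVarCharHolder holder = new MiniVarCharHolder(); // created once
        System.out.println(holder.buffer == null);          // prints "true"

        MiniVarCharVector vector = new MiniVarCharVector("works", "for");
        for (int i = 0; i < vector.getValueCount(); i++) {
            vector.get(i, holder);                          // reuse the same holder
            System.out.println(read(holder));
        }
    }
}
```

In this model the holder is only a view: it is valid for as long as the buffer it points at is, and it can be reused for every row.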
So I have a use case where I was trying to create a varchar holder, populate the buffer manually (inside the holder), and then forward the holder to the ListVector writer. Below is the workaround, but it doesn't seem ideal since we are asking for a new buffer of a different size every time. It would be really helpful if you could suggest a better alternative. Thanks @lwhite1!
writer.setPosition(row) // position the writer before starting the list
writer.startList()
value.foreach { str =>
  val bytes = str.getBytes(StandardCharsets.UTF_8)
  val length = bytes.size
  val buffer = vector.getAllocator.buffer(length) // TODO Fix this
  buffer.setBytes(0, bytes)
  writer.writeVarChar(0, length, buffer)
  buffer.close() // release the temporary buffer; clear() only resets its indices
}
writer.endList()
A rather complete working example:
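import java.nio.charset.StandardCharsets
import org.apache.arrow.memory.RootAllocator
import org.apache.arrow.vector.complex.ListVector
import org.apache.arrow.vector.util.Text
import scala.collection.JavaConverters._ // scala.jdk.CollectionConverters._ on Scala 2.13+
import scala.collection.mutable

object TestBuffers2 extends App {
  val allocator = new RootAllocator()
  val lv = ListVector.empty("listStr", allocator)
  lv.allocateNew()
  val writer = lv.getWriter
  val ls = List("Shivam", "works", "for", "pometry")

  // First list (row 0)
  writer.setPosition(0)
  writer.startList()
  ls.foreach { str =>
    val bytes = str.getBytes(StandardCharsets.UTF_8)
    val length = bytes.size
    val buffer = lv.getAllocator.buffer(length)
    buffer.setBytes(0, bytes)
    writer.writeVarChar(0, length, buffer)
    buffer.close() // release the temporary buffer; clear() only resets its indices
  }
  writer.endList()

  // Second list (row 1)
  writer.setPosition(1)
  writer.startList()
  ls.foreach { str =>
    val bytes = str.getBytes(StandardCharsets.UTF_8)
    val buffer = lv.getAllocator.buffer(bytes.size)
    buffer.setBytes(0, bytes)
    writer.writeVarChar(0, bytes.size, buffer)
    buffer.close()
  }
  writer.endList()
  lv.setValueCount(2)

  // The list elements are Text instances, so convert them instead of casting to String
  println(lv.getObject(0).asScala.map(_.toString).toSet)
  println(lv.getObject(1).asScala.map(_.toString).toSet)

  // Same read, done with an explicit iterator
  val res = mutable.Set.empty[String]
  val itr = lv.getObject(0).iterator()
  while (itr.hasNext) {
    val o = itr.next().asInstanceOf[Text]
    res.add(o.toString) // getBytes may return a backing array longer than the value
  }
  println(res.toSet)

  // Release the vector and the allocator once done
  lv.close()
  allocator.close()
}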
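One possible way to avoid requesting a new buffer for every string is to keep a single scratch buffer and grow it only when a value does not fit. The sketch below models that idea with java.nio.ByteBuffer so that it runs on its own; ScratchBufferSketch, fill and allocations are hypothetical names, not Arrow API. With Arrow, the assumption is that the allocation would be allocator.buffer(n), that the outgrown buffer would have to be released with close() before being replaced, and that the writer copies the bytes before the next value overwrites the scratch buffer.

```java
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;

// Hypothetical helper: one reusable scratch buffer that grows on demand.
final class ScratchBufferSketch {
    private ByteBuffer scratch = ByteBuffer.allocate(16);
    private int allocations = 1; // counts how many buffers were ever requested

    // Copies bytes to offsets [0, bytes.length) of the scratch buffer,
    // allocating a larger buffer only when the current one is too small.
    ByteBuffer fill(byte[] bytes) {
        if (scratch.capacity() < bytes.length) {
            int newCapacity = Math.max(bytes.length, scratch.capacity() * 2);
            // With an Arrow buffer, the old one would need close() here (assumption).
            scratch = ByteBuffer.allocate(newCapacity);
            allocations++;
        }
        scratch.clear();    // NIO clear(): resets position and limit, frees nothing
        scratch.put(bytes);
        return scratch;
    }

    int allocations() {
        return allocations;
    }

    public static void main(String[] args) {
        ScratchBufferSketch sketch = new ScratchBufferSketch();
        String[] values = {"Shivam", "works", "for", "pometry"};
        for (String value : values) {
            byte[] bytes = value.getBytes(StandardCharsets.UTF_8);
            ByteBuffer buffer = sketch.fill(bytes);
            // Here (0, bytes.length, buffer) would be handed to the writer,
            // which must copy the bytes before the next fill() call.
        }
        System.out.println(sketch.allocations()); // prints 1: every value fits in 16 bytes
    }
}
```

The trade-off is that the scratch buffer is only valid until the next fill call, so this only works when the consumer copies the data out immediately.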