How to save an NdArray object to local disk?
Hi: when I generate an NdArray object from a Spark DataFrame, I want to save it for the next model training run, but I don't know how. Trying to write it out with Java serialization fails with:
java.io.NotSerializableException: org.tensorflow.ndarray.impl.dense.DoubleDenseNdArray
at java.base/java.io.ObjectOutputStream.writeObject0(ObjectOutputStream.java:1185)
at java.base/java.io.ObjectOutputStream.defaultWriteFields(ObjectOutputStream.java:1553)
at java.base/java.io.ObjectOutputStream.writeSerialData(ObjectOutputStream.java:1510)
at java.base/java.io.ObjectOutputStream.writeOrdinaryObject(ObjectOutputStream.java:1433)
at java.base/java.io.ObjectOutputStream.writeObject0(ObjectOutputStream.java:1179)
at java.base/java.io.ObjectOutputStream.writeObject(ObjectOutputStream.java:349)
Unfortunately there is no out-of-the-box endpoint for serializing/deserializing an NdArray. It would be a great addition, though. You probably want to save the array's type, shape, and data instead of the object itself.
Can you show me how you initialize your NdArray? That might give me some clue on how we can tackle this.
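For example, for a DoubleNdArray you could persist the shape and the raw values yourself. A minimal sketch (the helper names saveDoubleNdArray / loadDoubleNdArray are made up, and you would need a variant per element type):

import java.io.{DataInputStream, DataOutputStream, FileInputStream, FileOutputStream}
import org.tensorflow.ndarray.{DoubleNdArray, NdArrays, Shape}
import org.tensorflow.ndarray.buffer.DataBuffers

// Write the number of dimensions, each dimension size, then the flat values.
def saveDoubleNdArray(array: DoubleNdArray, path: String): Unit = {
  val out = new DataOutputStream(new FileOutputStream(path))
  try {
    val dims = array.shape.asArray
    out.writeInt(dims.length)
    dims.foreach(d => out.writeLong(d))
    val buffer = DataBuffers.ofDoubles(array.shape.size)
    array.read(buffer) // copy the NdArray contents into a flat buffer
    for (i <- 0L until buffer.size) out.writeDouble(buffer.getDouble(i))
  } finally out.close()
}

// Read the shape back, then wrap a buffer of the values into a new NdArray.
def loadDoubleNdArray(path: String): DoubleNdArray = {
  val in = new DataInputStream(new FileInputStream(path))
  try {
    val dims = Array.fill(in.readInt())(in.readLong())
    val shape = Shape.of(dims: _*)
    val buffer = DataBuffers.ofDoubles(shape.size)
    for (i <- 0L until buffer.size) buffer.setDouble(in.readDouble(), i)
    NdArrays.wrap(shape, buffer)
  } finally in.close()
}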
Thanks, let me see. Right now we have four ways to generate an NdArray:
1. From a plain Java/Scala array:
val testMatrix1 = StdArrays.ndCopyOf(Array[Array[Int]](Array(1, 2, 3, 4, 45), Array(2, 4, 6, 8, 10), Array(3, 6, 9, 12, 15), Array(4, 8, 12, 16, 20)))
val testMatrix2 = StdArrays.ndCopyOf(Array[Array[Int]](Array(1), Array(0), Array(1), Array(1)))
2. From a file on disk (e.g. a CSV or a zipped archive), an InputStream, or NIO DataBuffers:
@throws[IOException]
private def readArchive(archiveName: String) = {
  // classOf[MnistDataset].getClassLoader.getResourceAsStream(archiveName) returned null here,
  // so read straight from the file system instead
  val dataset = new java.io.FileInputStream(archiveName)
  val archiveStream = new DataInputStream(new GZIPInputStream(dataset))
  archiveStream.readShort // first two bytes are always 0
  val magic = archiveStream.readByte
  if (magic != TYPE_UBYTE) throw new IllegalArgumentException("\"" + archiveName + "\" is not a valid archive")
  val numDims = archiveStream.readByte
  val dimSizes = new Array[Long](numDims)
  var size = 1 // for simplicity, we assume the total size does not exceed Integer.MAX_VALUE
  for (i <- 0 until dimSizes.length) {
    dimSizes(i) = archiveStream.readInt
    size = size * dimSizes(i).toInt
  }
  println(s"size ${size}")
  val bytes = new Array[Byte](size)
  archiveStream.readFully(bytes)
  NdArrays.wrap(Shape.of(dimSizes: _*), DataBuffers.of(bytes, true, false))
}
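For the CSV case mentioned in the same item I don't have real code yet; a rough sketch (assuming a headerless, all-numeric CSV and a made-up helper name loadCsvAsNdArray):

import scala.io.Source
import org.tensorflow.ndarray.{DoubleNdArray, StdArrays}

// Read a headerless CSV of doubles into a 2-D DoubleNdArray (every row must have the same length).
def loadCsvAsNdArray(path: String): DoubleNdArray = {
  val source = Source.fromFile(path)
  try {
    val rows = source.getLines().map(_.split(",").map(_.trim.toDouble)).toArray
    StdArrays.ndCopyOf(rows)
  } finally source.close()
}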
3. From a Spark DataFrame VectorUDT column. This needs three extra packages and some fairly involved conversion to produce the NdArray; I think we could also go the other way and regenerate a Spark DataFrame from the NdArray, then write it out as a CSV or Parquet file.
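A rough sketch of that conversion (assuming the VectorUDT column is named "features" and the data fits on the driver; the helper name featuresToNdArray is made up):

import org.apache.spark.ml.linalg.Vector
import org.apache.spark.sql.DataFrame
import org.tensorflow.ndarray.{DoubleNdArray, StdArrays}

// Collect the vector column to the driver and copy it into a 2-D DoubleNdArray.
def featuresToNdArray(df: DataFrame, column: String = "features"): DoubleNdArray = {
  val rows = df.select(column).collect().map(_.getAs[Vector](0).toArray)
  StdArrays.ndCopyOf(rows)
}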
4. Also from an Operand[T] tensor or a TFRecord Dataset. So far I have only seen how to generate a Tensor from an NdArray, and an NdArray from a Dataset ...
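For completeness, the NdArray -> Tensor direction I mentioned looks roughly like this (a sketch; TInt32 itself implements IntNdArray, so the resulting tensor can also be read back with the NdArray API):

import org.tensorflow.ndarray.{IntNdArray, NdArrays, StdArrays}
import org.tensorflow.types.TInt32

val matrix: IntNdArray = StdArrays.ndCopyOf(Array[Array[Int]](Array(1, 2, 3), Array(4, 5, 6)))
val tensor: TInt32 = TInt32.tensorOf(matrix)       // NdArray -> Tensor
val back: IntNdArray = NdArrays.ofInts(tensor.shape)
tensor.copyTo(back)                                // Tensor -> plain in-memory NdArray
println(back.getInt(1, 2))                         // prints 6
tensor.close()                                     // tensors hold native memory and must be closed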