
How to save NdArray object to local disk?

Open mullerhai opened this issue 3 years ago • 3 comments

Hi: when I generate an NdArray object from a Spark DataFrame, I want to save the NdArray object for the next model training, but I don't know how to save it.

mullerhai avatar Jul 01 '22 06:07 mullerhai

```
java.io.NotSerializableException: org.tensorflow.ndarray.impl.dense.DoubleDenseNdArray
  at java.base/java.io.ObjectOutputStream.writeObject0(ObjectOutputStream.java:1185)
  at java.base/java.io.ObjectOutputStream.defaultWriteFields(ObjectOutputStream.java:1553)
  at java.base/java.io.ObjectOutputStream.writeSerialData(ObjectOutputStream.java:1510)
  at java.base/java.io.ObjectOutputStream.writeOrdinaryObject(ObjectOutputStream.java:1433)
  at java.base/java.io.ObjectOutputStream.writeObject0(ObjectOutputStream.java:1179)
  at java.base/java.io.ObjectOutputStream.writeObject(ObjectOutputStream.java:349)
```

mullerhai avatar Jul 01 '22 10:07 mullerhai

Unfortunately there is no out-of-the-box endpoint for serializing/deserializing an NdArray. It would be a great addition, though. You probably want to save the array type, shape, and data instead of an object?

Can you show me how you initialize your NdArray? That might give me some clue on how we can tackle this.
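
In the meantime, here is a minimal sketch of that idea for a `DoubleNdArray` (the `save`/`load` names are illustrative, not an existing endpoint, and the element type is fixed to double for brevity; a real endpoint would also tag the data type): persist the rank and dimension sizes, then the raw values.

```scala
import java.io.{DataInputStream, DataOutputStream, FileInputStream, FileOutputStream}

import org.tensorflow.ndarray.{DoubleNdArray, NdArrays, Shape}
import org.tensorflow.ndarray.buffer.DataBuffers

def save(path: String, array: DoubleNdArray): Unit = {
  val out = new DataOutputStream(new FileOutputStream(path))
  try {
    val shape = array.shape()
    out.writeInt(shape.numDimensions())            // rank
    shape.asArray().foreach(out.writeLong)         // dimension sizes
    val data = new Array[Double](shape.size().toInt)
    array.read(DataBuffers.of(data, false, false)) // copy the array contents out
    data.foreach(out.writeDouble)
  } finally out.close()
}

def load(path: String): DoubleNdArray = {
  val in = new DataInputStream(new FileInputStream(path))
  try {
    val dims = Array.fill(in.readInt())(in.readLong())
    val shape = Shape.of(dims: _*)
    val data = Array.fill(shape.size().toInt)(in.readDouble())
    NdArrays.wrap(shape, DataBuffers.of(data, false, false))
  } finally in.close()
}
```

This assumes the whole array fits in a single JVM array, i.e. fewer than Integer.MAX_VALUE elements.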

karllessard avatar Jul 01 '22 13:07 karllessard

> Unfortunately there is no out-of-the-box endpoint for serializing/deserializing an NdArray. It would be a great addition, though. You probably want to save the array type, shape, and data instead of an object?
>
> Can you show me how you initialize your NdArray? That might give me some clue on how we can tackle this.

Thanks, let me see. Right now we have four ways to generate an NdArray:

1. From a normal Java/Scala array:

```scala
val testMatrix1 = StdArrays.ndCopyOf(Array[Array[Int]](
  Array(1, 2, 3, 4, 45),
  Array(2, 4, 6, 8, 10),
  Array(3, 6, 9, 12, 15),
  Array(4, 8, 12, 16, 20)))
val testMatrix2 = StdArrays.ndCopyOf(Array[Array[Int]](Array(1), Array(0), Array(1), Array(1)))
```

2. From a file on disk (e.g. a CSV or zip archive), an InputStream, or NIO DataBuffers:

```scala
import java.io.{DataInputStream, FileInputStream, IOException}
import java.util.zip.GZIPInputStream

import org.tensorflow.ndarray.{NdArrays, Shape}
import org.tensorflow.ndarray.buffer.DataBuffers

@throws[IOException]
private def readArchive(archiveName: String) = {
  val archiveStream = new DataInputStream(new GZIPInputStream(new FileInputStream(archiveName)))
  archiveStream.readShort() // the first two header bytes are always 0
  val magic = archiveStream.readByte()
  if (magic != TYPE_UBYTE) // type code for unsigned byte, defined elsewhere
    throw new IllegalArgumentException("\"" + archiveName + "\" is not a valid archive")
  val numDims = archiveStream.readByte()
  val dimSizes = new Array[Long](numDims)
  var size = 1 // for simplicity, we assume the total size does not exceed Integer.MAX_VALUE
  for (i <- dimSizes.indices) {
    dimSizes(i) = archiveStream.readInt()
    size *= dimSizes(i).toInt
  }
  val bytes = new Array[Byte](size)
  archiveStream.readFully(bytes)
  NdArrays.wrap(Shape.of(dimSizes: _*), DataBuffers.of(bytes, true, false))
}
```
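
For saving, a rough inverse of `readArchive` could look like this (a sketch under the same assumptions; `TYPE_UBYTE` is the constant checked above and `writeArchive` is an illustrative name):

```scala
import java.io.{DataOutputStream, FileOutputStream}
import java.util.zip.GZIPOutputStream

import org.tensorflow.ndarray.ByteNdArray
import org.tensorflow.ndarray.buffer.DataBuffers

private def writeArchive(archiveName: String, array: ByteNdArray): Unit = {
  val out = new DataOutputStream(new GZIPOutputStream(new FileOutputStream(archiveName)))
  try {
    out.writeShort(0)                    // first two header bytes are always 0
    out.writeByte(TYPE_UBYTE)            // type code
    val shape = array.shape()
    out.writeByte(shape.numDimensions())
    shape.asArray().foreach(d => out.writeInt(d.toInt))
    val bytes = new Array[Byte](shape.size().toInt)
    array.read(DataBuffers.of(bytes, false, false)) // copy array contents into the buffer
    out.write(bytes)
  } finally out.close()
}
```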

3. From a Spark DataFrame VectorUDT:

Here we need to import three extra packages and do some complex conversion to generate it. I think maybe we could regenerate a Spark DataFrame from the NdArray and write it out as a CSV or Parquet file; a rough sketch follows.
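
Roughly what I mean (it assumes a `SparkSession` named `spark` and a 2-D `DoubleNdArray`; `saveAsParquet` is just an illustrative name):

```scala
import org.apache.spark.sql.SparkSession
import org.tensorflow.ndarray.DoubleNdArray

// Rebuild a DataFrame from a 2-D DoubleNdArray and save it as Parquet.
def saveAsParquet(spark: SparkSession, array: DoubleNdArray, path: String): Unit = {
  import spark.implicits._
  val rows = (0L until array.shape().size(0)).map { i =>
    (0L until array.shape().size(1)).map(j => array.getDouble(i, j)).toArray
  }
  rows.toDF("features").write.parquet(path)
}
```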

4. Also from an Operand[T] tensor or a TFRecord Dataset. Here I only see how to generate a tensor from an NdArray, and an NdArray from a Dataset ...
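
As far as I can tell, the NdArray-to-tensor direction already exists, e.g.:

```scala
import org.tensorflow.ndarray.StdArrays
import org.tensorflow.types.TInt32

val nd = StdArrays.ndCopyOf(Array[Array[Int]](Array(1, 2), Array(3, 4)))
val tensor: TInt32 = TInt32.tensorOf(nd) // the tensor is itself an IntNdArray backed by native memory
```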

mullerhai avatar Jul 02 '22 04:07 mullerhai