
How to save NdArray object to local disk?

Open mullerhai opened this issue 3 years ago • 3 comments

Hi: when I generate an NdArray object from a Spark DataFrame, I want to save the NdArray object for the next model training, but I don't know how to save it.

mullerhai avatar Jul 01 '22 06:07 mullerhai

```
java.io.NotSerializableException: org.tensorflow.ndarray.impl.dense.DoubleDenseNdArray
  at java.base/java.io.ObjectOutputStream.writeObject0(ObjectOutputStream.java:1185)
  at java.base/java.io.ObjectOutputStream.defaultWriteFields(ObjectOutputStream.java:1553)
  at java.base/java.io.ObjectOutputStream.writeSerialData(ObjectOutputStream.java:1510)
  at java.base/java.io.ObjectOutputStream.writeOrdinaryObject(ObjectOutputStream.java:1433)
  at java.base/java.io.ObjectOutputStream.writeObject0(ObjectOutputStream.java:1179)
  at java.base/java.io.ObjectOutputStream.writeObject(ObjectOutputStream.java:349)
```

mullerhai avatar Jul 01 '22 10:07 mullerhai

Unfortunately there is no out-of-the-box endpoint for serializing/deserializing an NdArray. It would be a great addition, though. You probably want to save the array type, shape, and data instead of an object?

Can you show me how you initialize your NdArray? That might give me some clue on how we can tackle this.
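
In the meantime, here is a minimal sketch of that idea for a `DoubleNdArray` (the `save`/`load` names are illustrative, not an existing endpoint, and the element type is fixed to double for brevity; a real endpoint would also tag the data type): persist the rank and dimension sizes, then the raw values.

```scala
import java.io.{DataInputStream, DataOutputStream, FileInputStream, FileOutputStream}

import org.tensorflow.ndarray.{DoubleNdArray, NdArrays, Shape}
import org.tensorflow.ndarray.buffer.DataBuffers

def save(path: String, array: DoubleNdArray): Unit = {
  val out = new DataOutputStream(new FileOutputStream(path))
  try {
    val shape = array.shape()
    out.writeInt(shape.numDimensions())            // rank
    shape.asArray().foreach(out.writeLong)         // dimension sizes
    val data = new Array[Double](shape.size().toInt)
    array.read(DataBuffers.of(data, false, false)) // copy the array contents out
    data.foreach(out.writeDouble)
  } finally out.close()
}

def load(path: String): DoubleNdArray = {
  val in = new DataInputStream(new FileInputStream(path))
  try {
    val dims = Array.fill(in.readInt())(in.readLong())
    val shape = Shape.of(dims: _*)
    val data = Array.fill(shape.size().toInt)(in.readDouble())
    NdArrays.wrap(shape, DataBuffers.of(data, false, false))
  } finally in.close()
}
```

This assumes the whole array fits in a single JVM array, i.e. fewer than Integer.MAX_VALUE elements.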

karllessard avatar Jul 01 '22 13:07 karllessard

> Unfortunately there is no out-of-the-box endpoint for serializing/deserializing an NdArray. It would be a great addition, though. You probably want to save the array type, shape, and data instead of an object?
>
> Can you show me how you initialize your NdArray? That might give me some clue on how we can tackle this.

Thanks, let me see. Right now we have four ways to generate an NdArray:

1. From a normal Java/Scala array:

```scala
val testMatrix1 = StdArrays.ndCopyOf(Array[Array[Int]](
  Array(1, 2, 3, 4, 45),
  Array(2, 4, 6, 8, 10),
  Array(3, 6, 9, 12, 15),
  Array(4, 8, 12, 16, 20)))
val testMatrix2 = StdArrays.ndCopyOf(Array[Array[Int]](Array(1), Array(0), Array(1), Array(1)))
```

2. From a file on disk (e.g. a CSV or zip archive), an InputStream, or NIO DataBuffers:

```scala
import java.io.{DataInputStream, FileInputStream, IOException}
import java.util.zip.GZIPInputStream

import org.tensorflow.ndarray.{NdArrays, Shape}
import org.tensorflow.ndarray.buffer.DataBuffers

@throws[IOException]
private def readArchive(archiveName: String) = {
  val archiveStream = new DataInputStream(new GZIPInputStream(new FileInputStream(archiveName)))
  archiveStream.readShort() // the first two header bytes are always 0
  val magic = archiveStream.readByte()
  if (magic != TYPE_UBYTE) // type code for unsigned byte, defined elsewhere
    throw new IllegalArgumentException("\"" + archiveName + "\" is not a valid archive")
  val numDims = archiveStream.readByte()
  val dimSizes = new Array[Long](numDims)
  var size = 1 // for simplicity, we assume the total size does not exceed Integer.MAX_VALUE
  for (i <- dimSizes.indices) {
    dimSizes(i) = archiveStream.readInt()
    size *= dimSizes(i).toInt
  }
  val bytes = new Array[Byte](size)
  archiveStream.readFully(bytes)
  NdArrays.wrap(Shape.of(dimSizes: _*), DataBuffers.of(bytes, true, false))
}
```
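
For saving, a rough inverse of `readArchive` could look like this (a sketch under the same assumptions; `TYPE_UBYTE` is the constant checked above and `writeArchive` is an illustrative name):

```scala
import java.io.{DataOutputStream, FileOutputStream}
import java.util.zip.GZIPOutputStream

import org.tensorflow.ndarray.ByteNdArray
import org.tensorflow.ndarray.buffer.DataBuffers

private def writeArchive(archiveName: String, array: ByteNdArray): Unit = {
  val out = new DataOutputStream(new GZIPOutputStream(new FileOutputStream(archiveName)))
  try {
    out.writeShort(0)                    // first two header bytes are always 0
    out.writeByte(TYPE_UBYTE)            // type code
    val shape = array.shape()
    out.writeByte(shape.numDimensions())
    shape.asArray().foreach(d => out.writeInt(d.toInt))
    val bytes = new Array[Byte](shape.size().toInt)
    array.read(DataBuffers.of(bytes, false, false)) // copy array contents into the buffer
    out.write(bytes)
  } finally out.close()
}
```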

3. From a Spark DataFrame VectorUDT:

Here we need to import three extra packages and do some complex conversion to generate it. I think maybe we could regenerate a Spark DataFrame from the NdArray and write it out as a CSV or Parquet file; a rough sketch follows.
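
Roughly what I mean (it assumes a `SparkSession` named `spark` and a 2-D `DoubleNdArray`; `saveAsParquet` is just an illustrative name):

```scala
import org.apache.spark.sql.SparkSession
import org.tensorflow.ndarray.DoubleNdArray

// Rebuild a DataFrame from a 2-D DoubleNdArray and save it as Parquet.
def saveAsParquet(spark: SparkSession, array: DoubleNdArray, path: String): Unit = {
  import spark.implicits._
  val rows = (0L until array.shape().size(0)).map { i =>
    (0L until array.shape().size(1)).map(j => array.getDouble(i, j)).toArray
  }
  rows.toDF("features").write.parquet(path)
}
```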

4. Also from an Operand[T] tensor or a TFRecord Dataset. Here I only see how to generate a tensor from an NdArray, and an NdArray from a Dataset ...
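
As far as I can tell, the NdArray-to-tensor direction already exists, e.g.:

```scala
import org.tensorflow.ndarray.StdArrays
import org.tensorflow.types.TInt32

val nd = StdArrays.ndCopyOf(Array[Array[Int]](Array(1, 2), Array(3, 4)))
val tensor: TInt32 = TInt32.tensorOf(nd) // the tensor is itself an IntNdArray backed by native memory
```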

mullerhai avatar Jul 02 '22 04:07 mullerhai