
Generating an Operand tensor from a big NdArray exceeds the maximum protobuf size of 2GB?

Open mullerhai opened this issue 3 years ago • 1 comment

Hi: I generate an org.tensorflow.ndarray.DoubleNdArray from a Spark DataFrame, and when I then try to turn it into an Operand[TFloat64] tensor, I get this error:


scala> val featureVector = SparkConverter.sparkDataframeFeatureVectorConvertTfTensor(finalInputDf,"final_features" )
featureVector: org.tensorflow.ndarray.DoubleNdArray = org.tensorflow.ndarray.impl.dense.DoubleDenseNdArray@e3f6a6a0
scala> val ft  = tf.constant(featureVector)
[libprotobuf ERROR external/com_google_protobuf/src/google/protobuf/message_lite.cc:451] tensorflow.AttrValue exceeded maximum protobuf size of 2GB: 6279090916
org.tensorflow.exceptions.TFInvalidArgumentException: AttrValue missing value with expected type 'tensor'
         for attr 'value'
        ; NodeDef: {{node Const}}; Op<name=Const; signature= -> output:dtype; attr=value:tensor; attr=dtype:type>
  at org.tensorflow.internal.c_api.AbstractTF_Status.throwExceptionIfNotOK(AbstractTF_Status.java:87)
  at org.tensorflow.EagerOperationBuilder.execute(EagerOperationBuilder.java:314)
  at org.tensorflow.EagerOperationBuilder.build(EagerOperationBuilder.java:77)
  at org.tensorflow.EagerOperationBuilder.build(EagerOperationBuilder.java:64)
  at org.tensorflow.op.core.Constant.create(Constant.java:1350)
  at org.tensorflow.op.core.Constant.tensorOf(Constant.java:521)
  at org.tensorflow.op.Ops.constant(Ops.java:1669)
  ... 59 elided

But if I filter the DataFrame down to a smaller subset, it works fine:


scala> val featureVector = SparkConverter.sparkDataframeFeatureVectorConvertTfTensor(finalInputDf.filter(col("pay_status").equalTo(1)),"final_features" )
featureVector: org.tensorflow.ndarray.DoubleNdArray = org.tensorflow.ndarray.impl.dense.DoubleDenseNdArray@627077a

scala> val ft_small  = tf.constant(featureVector)
ft_small: org.tensorflow.op.core.Constant[org.tensorflow.types.TFloat64] = <Const 'Const_2'>

scala> ft_small.asTensor().numBytes()
res43: Long = 1058424696
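
For reference, the Const op stores the tensor value inside a protobuf AttrValue, which is hard-capped at 2GB. The filtered tensor above is about 1.06GB of raw doubles, so it fits, while the full dataset serializes to about 6.28GB (the 6279090916 in the error above). A rough pre-check before calling tf.constant could look like this (just a sketch; it counts raw element bytes and ignores the small protobuf overhead):

import org.tensorflow.ndarray.DoubleNdArray

// true if the raw data of `nd` should fit inside a single Const node's AttrValue
def fitsInConstAttr(nd: DoubleNdArray): Boolean = {
  val rawBytes = nd.shape.asArray().product * java.lang.Double.BYTES // elements * 8 bytes
  rawBytes < (2L << 30)                                              // 2GB protobuf cap
}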

mullerhai avatar Jul 04 '22 09:07 mullerhai

Do I have to split the DoubleNdArray into parts? Or is there another way to convert it to an Operand[T]?
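
One direction that comes to mind, if the target is a graph rather than eager execution (just a sketch, not verified here): avoid the Const node altogether by building the Tensor with TFloat64.tensorOf and feeding it into a placeholder at Session.run time, so the data never passes through an AttrValue proto. The placeholder shape and the fetched output are assumptions:

import org.tensorflow.{Graph, Session}
import org.tensorflow.ndarray.Shape
import org.tensorflow.op.Ops
import org.tensorflow.op.core.Placeholder
import org.tensorflow.types.TFloat64

val graph = new Graph()
val tfg   = Ops.create(graph)

// assumed shape (?, 147), taken from the feature vector above
val input = tfg.placeholder(classOf[TFloat64], Placeholder.shape(Shape.of(Shape.UNKNOWN_SIZE, 147)))
// ... build the rest of the graph on top of `input` ...

// native tensor, no protobuf serialization involved
val bigTensor = TFloat64.tensorOf(featureVector)
val session   = new Session(graph)
// session.runner().feed(input, bigTensor).fetch(someOutput).run()   // someOutput is hypothetical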

I found that there is a java.util.Spliterator:

scala> featureVector.shape
res46: org.tensorflow.ndarray.Shape = [900021, 147]

scala> featureVector.scalars()
res47: org.tensorflow.ndarray.NdArraySequence[org.tensorflow.ndarray.DoubleNdArray] = org.tensorflow.ndarray.impl.sequence.FastElementSequence@f9698af

scala> featureVector.scalars().spliterator
res48: java.util.Spliterator[org.tensorflow.ndarray.DoubleNdArray] = java.util.Spliterators$IteratorSpliterator@bdc74838

scala> featureVector.scalars().spliterator.trySplit
res49: java.util.Spliterator[org.tensorflow.ndarray.DoubleNdArray] = java.util.Spliterators$ArraySpliterator@f4f92e6e
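
Rather than going element by element through the spliterator, another idea (again only a sketch: the chunk size is arbitrary, I have not run this on the full dataset, and the exact Indices helper may differ between tensorflow-java versions) is to slice the DoubleNdArray along axis 0, create one constant per slice so each stays far below the 2GB limit, and concatenate them at runtime:

import org.tensorflow.Operand
import org.tensorflow.ndarray.DoubleNdArray
import org.tensorflow.ndarray.index.Indices
import org.tensorflow.op.Ops
import org.tensorflow.types.TFloat64
import scala.collection.JavaConverters._   // scala.jdk.CollectionConverters._ on Scala 2.13+

def constantInChunks(tf: Ops, nd: DoubleNdArray, rowsPerChunk: Long): Operand[TFloat64] = {
  val totalRows = nd.shape.asArray()(0)
  // One Const per [start, end) row range along axis 0; axes not indexed are taken in full.
  val chunks: Seq[Operand[TFloat64]] =
    (0L until totalRows by rowsPerChunk).map { start =>
      val end = math.min(start + rowsPerChunk, totalRows)
      tf.constant(nd.slice(Indices.range(start, end))): Operand[TFloat64]
    }
  // Concatenation happens at runtime on tensors, which are not protobuf-limited.
  tf.concat(chunks.asJava, tf.constant(0))
}

// e.g. 100000 rows * 147 cols * 8 bytes ~= 118 MB per chunk (chunk size is arbitrary)
// val ft = constantInChunks(tf, featureVector, 100000L)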


mullerhai avatar Jul 04 '22 09:07 mullerhai