Attempt to add scheduled sampling in ComputePredictionFunctional failed
https://github.com/tensorflow/lingvo/blob/2d05484a7d5d73db23f8a4b47d6d729b5e01fa6a/lingvo/tasks/asr/decoder.py#L1059
I found that ComputePredictionDynamic is too slow for scheduled sampling, so I tried to add scheduled sampling in the cell function RnnStep, but it failed. The following is my code:
def ScheduledSampling(state0, inputs):
  # Per-example coin flip: keep the ground-truth token with probability
  # groundtruth_p, otherwise feed back the previously predicted token.
  pick_groundtruth = tf.less(
      tf.random_uniform([dec_bs], seed=p.random_seed),
      state0.misc_states.groundtruth_p)
  emb_ids = tf.stop_gradient(state0.misc_states.prev_predicted_ids)
  curr_emb = self.emb.EmbLookupDefaultTheta(emb_ids)
  target_emb = tf.where(pick_groundtruth, inputs.emb, curr_emb)
  # inputs.id is int32 outside the loop, but int64 inside.
  target_id = tf.where(
      pick_groundtruth, inputs.id,
      tf.cast(state0.misc_states.prev_predicted_ids, inputs.id.dtype))
  return py_utils.NestedMap(
      id=target_id,
      label=inputs.label,
      weight=inputs.weight,
      emb=target_emb,
      padding=inputs.padding,
      misc=inputs.misc)
def RnnStep(recurrent_theta, state0, inputs):
  """Computes one rnn step."""
  self._max_label_prob = 0.1
  theta = recurrent_theta.theta
  packed_src = recurrent_theta.packed_src
  # Use different id and embedding for scheduled sampling.
  if self._max_label_prob > 0:
    inputs = ScheduledSampling(state0, inputs)
  with tf.name_scope('single_decode_step'):
    step_outs, state1 = self.SingleDecodeStep(
        theta,
        packed_src,
        inputs,
        state0,
        use_deterministic_random=True)
    state1.step_outs = step_outs
  if self._max_label_prob > 0:
    # Compute logits.
    logits = self.softmax.Logits(theta.softmax, [step_outs])
    state1 = self.PostStepDecoderStateUpdate(state1, logits)
  else:
    state1 = self.PostStepDecoderStateUpdate(state1, inputs.label)
  return state1, py_utils.NestedMap()
The program failed at this step:
curr_emb = self.emb.EmbLookupDefaultTheta(emb_ids)
The following is the error info:
I0625 13:29:42.561038 140512958846720 base_runner.py:236] trainer done (fatal error).
I0625 13:29:42.561534 140512958846720 base_runner.py:115] trainer exception: Combined status information from 5 operations:
Status code: Cancelled [2x]
[[{{node While}}]]
[[fprop/Cheji/tower_0_2/enc/Forward_M033FFondj4_3]] [1x]
[[{{node While}}]]
[[fprop/Cheji/tower_0_3/enc/Forward_sqqax2No8xE_3]] [1x]
Status code: Not found [3x]
No registered 'DynamicPartition' OpKernel for GPU devices compatible with node {{node ForwardLoopBody_IT6gRuK6Trc/Fwd_yuDjrWf0kAk/embedding_lookup/DynamicPartition}}
(OpKernel was found, but attributes didn't match) Requested Attributes: T=DT_INT32, num_partitions=8, _device="/job:local/replica:0/task:0/device:GPU:1"
. Registered: device='CPU'; T in [DT_VARIANT]
device='CPU'; T in [DT_RESOURCE]
device='CPU'; T in [DT_STRING]
device='CPU'; T in [DT_BOOL]
device='CPU'; T in [DT_COMPLEX128]
device='CPU'; T in [DT_COMPLEX64]
device='CPU'; T in [DT_DOUBLE]
device='CPU'; T in [DT_FLOAT]
device='CPU'; T in [DT_BFLOAT16]
device='CPU'; T in [DT_HALF]
device='CPU'; T in [DT_INT8]
device='CPU'; T in [DT_UINT8]
device='CPU'; T in [DT_INT16]
device='CPU'; T in [DT_UINT16]
device='CPU'; T in [DT_INT32]
device='CPU'; T in [DT_INT64]
device='GPU'; T in [DT_COMPLEX128]
device='GPU'; T in [DT_COMPLEX64]
device='GPU'; T in [DT_DOUBLE]
device='GPU'; T in [DT_FLOAT]
device='GPU'; T in [DT_HALF]
[[ForwardLoopBody_IT6gRuK6Trc/Fwd_yuDjrWf0kAk/embedding_lookup/DynamicPartition]]
[[While]]
[[ArithmeticOptimizer/AddOpsRewrite_add_31_G1154]] [1x]
It seems that the attribute is changed inside the RNN step. Any solution? Really appreciated.
As the error says, it seems the EmbLookupDefaultTheta call cannot be made with an int32 dtype. Try casting emb_ids to float32?
According to https://tensorflow.google.cn/api_docs/python/tf/nn/embedding_lookup, the function tf.nn.embedding_lookup used in
curr_emb = self.emb.EmbLookupDefaultTheta(emb_ids)
requires int32 or int64 ids; I tried both types, and both failed.
Hmm...
To check if EmbLookupDefaultTheta is actually the problem, can you replace that line with curr_emb = tf.zeros(expected_size) and see if everything runs fine?
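For example, something along these lines, where the shape is an assumption based on the batch size dec_bs from your code and the decoder's embedding dimension (emb_dim here is a placeholder):

  # Debugging substitution: bypass the embedding lookup entirely to see
  # whether that op is what fails on GPU.
  curr_emb = tf.zeros([dec_bs, emb_dim], dtype=inputs.emb.dtype)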
I think it's about the device. I can run this function on CPU, but it fails on GPU.
Yes, but from the message it seems to be a problem with embedding_lookup, even though tf.nn.embedding_lookup should be supported on GPU.
This is how I call this function. I need to set allow_implicit_capture=True, or another assertion error is raised; maybe the issue is about the mechanism of the core function recurrent.Recurrent:
accumulated_states, _ = recurrent.Recurrent(
    recurrent_theta, state0_no_fusion, inputs, RnnStep,
    allow_implicit_capture=True)
After replacing that line as you suggested, I can run this function successfully.
I guess that when using GPU, recurrent.Recurrent places everything in cell_fn on the GPU, but this embedding lookup can only run on the CPU, so there is no GPU kernel for the embedding node.
Yes, that is exactly the problem, except that I thought tf.nn.embedding_lookup was supposed to work on GPU.
Otherwise, if it is not possible to use tf.nn.embedding_lookup inside Recurrent on GPU, then you will need to implement your own version of embedding lookup that does work. It should be possible using tf.gather.
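A rough sketch of what that could look like, assuming the embedding table can be materialized as a single [vocab_size, emb_dim] tensor (emb_table below is a placeholder, not the actual Lingvo attribute name):

def GatherEmbLookup(emb_table, ids):
  # tf.gather has GPU kernels for integer indices, so it avoids the
  # DynamicPartition op that tf.nn.embedding_lookup emits for sharded
  # tables (the num_partitions=8 node in the error above).
  return tf.gather(emb_table, tf.cast(ids, tf.int32))

# Inside the cell function, roughly:
#   curr_emb = GatherEmbLookup(emb_table, emb_ids)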
Scheduled sampling in a while loop is much slower than I expected. For example, a model that runs at 60 examples/second with the Recurrent function only reaches 24 examples/second after adding scheduled sampling in a dynamic while loop.
watch this.