李理 comments

Results: 45 comments of 李理

The setting of dropout at './examples/07_convnet_layers.py' looks weird

I think tf.layers.dropout is designed for tf.estimator. See https://github.com/tensorflow/tensorflow/blob/r1.8/tensorflow/examples/tutorials/layers/cnn_mnist.py for an example. To use it with the low-level API, it seems we still need a placeholder for the training flag.

training process is killed because OOM

I changed batch_size to 8, but the process is still killed because it uses too much memory: [3256824.391743] Killed process 9666 (python) total-vm:53893188kB, anon-rss:23892380kB, file-rss:152808kB

training process is killed because OOM

So what's wrong? From the logs under /var/log, it seems this Python process used 23892380 kB (about 23 GB) of CPU memory (not GPU memory): [3256824.391743] Killed process 9666 (python) total-vm:53893188kB, anon-rss:23892380kB, file-rss:152808kB
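
To double-check the memory figures in that kernel message, here is a minimal sketch that parses the "Killed process" line and converts the sizes. The log line is copied from the comment; the regular expression and the helper name parse_oom_line are illustrative, and the conversion assumes the kernel's "kB" means units of 1024 bytes.

```python
import re

# Kernel OOM-killer line copied verbatim from the comment above.
LOG_LINE = (
    "[3256824.391743] Killed process 9666 (python) "
    "total-vm:53893188kB, anon-rss:23892380kB, file-rss:152808kB"
)

# Pattern written for this specific message layout; other kernel
# versions may print additional fields.
PATTERN = re.compile(
    r"Killed process (?P<pid>\d+) \((?P<name>[^)]+)\) "
    r"total-vm:(?P<total_vm>\d+)kB, anon-rss:(?P<anon_rss>\d+)kB, "
    r"file-rss:(?P<file_rss>\d+)kB"
)


def parse_oom_line(line):
    """Return pid, process name and sizes in GiB, or None if no match."""
    match = PATTERN.search(line)
    if match is None:
        return None
    info = {"pid": int(match.group("pid")), "name": match.group("name")}
    for key in ("total_vm", "anon_rss", "file_rss"):
        # Assumption: 1 kB in this message is 1024 bytes.
        info[key + "_gib"] = int(match.group(key)) / (1024 * 1024)
    return info


info = parse_oom_line(LOG_LINE)
print("pid %d (%s)" % (info["pid"], info["name"]))
print("anon-rss: %.1f GiB" % info["anon_rss_gib"])
print("total-vm: %.1f GiB" % info["total_vm_gib"])
```

The anon-rss value is the resident anonymous memory of the process in host RAM, which is why the figure points at CPU memory and not GPU memory.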

Installation problems (TF bindings)

I face the same problem. When I add what bidai541 suggested, it seems to work. But when I run python setup.py test, it fails with: KeyError: "Registering two gradient with name...

DeepFM: question about the FM part

http://fancyerii.github.io/2019/12/19/deepfm/

Runtime error after binding CTC with TensorFlow: undefined symbol: _ZNK10tensorflow14TensorShapeRep11DebugStringB5cxx11Ev

This is caused by changes to custom operations in newer versions of TensorFlow. See https://github.com/fancyerii/deep_learning_theory_and_practice/blob/master/samples/ctc.pdf
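
As a side note on reading the symbol in the title: the B5cxx11 fragment is the mangled form of the [abi:cxx11] tag that GCC's libstdc++ attaches to functions using its C++11 std::string. Its presence suggests, but does not prove, that the extension was compiled with a different C++ ABI setting than the TensorFlow library it was loaded against. The sketch below only checks for that tag with a substring test; the helper name has_cxx11_abi_tag is hypothetical, and this is a heuristic, not a real demangler.

```python
# Mangled form of the libstdc++ "[abi:cxx11]" tag.
CXX11_ABI_TAG = "B5cxx11"

# Symbol copied from the issue title above.
SYMBOL = "_ZNK10tensorflow14TensorShapeRep11DebugStringB5cxx11Ev"


def has_cxx11_abi_tag(mangled_symbol):
    """Heuristic: True if the mangled name carries the cxx11 ABI tag."""
    return CXX11_ABI_TAG in mangled_symbol


if has_cxx11_abi_tag(SYMBOL):
    print("symbol was emitted with the C++11 ABI tag")
else:
    print("no C++11 ABI tag found in symbol")
```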

Is it possible to shard model at loading time (FSDP)

See [this issue](https://github.com/TimDettmers/bitsandbytes/issues/1092#issuecomment-1969161870). QLoRA (load_in_4bit) is not compatible with FSDP/DeepSpeed. Try --use_bnb=False. I also recommend using DeepSpeed instead of FSDP. In my own experience, FSDP is not well implemented. you...

`GLIBC_2.29' not found

Is the code for Frame Interpolation in SVD open source?

Question about text preprocess in examples/language/llama2 and applications/Colossal-LLaMA-2