bestbzw
@gouchangjiang I'm using this script https://github.com/hpcaitech/ColossalAI/blob/main/examples/language/gpt/gemini/run_gemini.sh with my own DataLoader. The gpc config is: BATCH_SIZE = 4, WARMUP_STEPS = 1000, TOTAL_STEPS = 2e+8, SEQ_LEN = 1024, HIDDEN_SIZE = 5120, VOCAB_SIZE...
@frankxyy I ran into the same problem. Could you tell me how you installed the full version of the CUDA toolkit?
Hi, I also ran into the same problem. @tjruwase, have you found a solution?
@JACOBIN-SCTCS I've read the paper "Pointer Networks", but it doesn't say whether it uses one-hot vectors. I found that the Pointer-Generator Network (Get To The Point: Summarization with Pointer-Generator Networks) uses dense...