DingDing
> This version works for me too.
You can use a machine that can both reach the Hugging Face datasets hub and scp files to the target server, e.g. your local machine. Run this Python code:

```python
from datasets import load_dataset

mydataset = load_dataset("glue", "mrpc")
mydataset.save_to_disk("YOURPATH/glue.mrpc")  # the directory does not have to be named glue.mrpc; any name works
```

Then, in a terminal:

```bash
scp -r YOURPATH/glue.mrpc USERNAME@IP:THE_ABSOLUTE_PATH_TO_SAVE_YOUR_DATASET
```

Afterwards, on the server, run the Python code ```python from datasets...
Zhihu: https://zhuanlan.zhihu.com/p/55334148
1. When debugging, do not set the batch size and hidden size to values that divide evenly into each other. Some people set both batchsize and hidden to 32, so a dimension mix-up goes unnoticed. To avoid this, use e.g. 33 and 32 during debugging (likewise for other dimensions). The criterion is "not divisible" rather than "not equal" because PyTorch broadcasts automatically.
2. A CUDA error that floods the console is usually caused by an out-of-bounds list index. Set the environment variable CUDA_LAUNCH_BLOCKING=1 to debug; many posts online cover the details.
3. First check that the input is correct. For NLP tasks, convert the ids back into tokens and verify the sentence is what you expect.
4. Check whether the model gradients are normal:

```python
[(name, parameter.grad.abs().sum(), parameter.sum())
 for name, parameter in model.named_parameters()]
```

In each tuple, the first element is the module name, the second is the sum of absolute gradient values, and the third is the sum of the parameters. If a module that should have a gradient has a zero absolute-gradient sum, something is wrong. If the parameter sum does not change between two prints, that may also indicate a problem, e.g. the learning rate is too small. ...
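Tip 1 can be made concrete. PyTorch follows NumPy's broadcasting rules, so this NumPy sketch shows how equal batch and hidden sizes hide an accidental transpose, while non-divisible sizes surface it immediately (the 32/33 sizes are the example values from the tip):

```python
import numpy as np

# Square case: an accidental transpose goes unnoticed, because the
# transposed shape (32, 32) still lines up, so no error is raised.
x = np.random.randn(32, 32)
bad = x * x.T              # silently "works" despite the logic bug

# Non-divisible sizes (33 vs 32) make the same bug fail immediately:
x = np.random.randn(33, 32)
try:
    bad = x * x.T          # (33, 32) * (32, 33): incompatible shapes
except ValueError as e:
    print("caught shape error:", e)
```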
1 million nodes may be too large for struc2vec. For me, an 80,000-node graph with 24 threads had not generated embeddings after 5 hours of training. And the algorithm is...
But I am wondering: is there an upper bound on the number of threads we should use? I.e., can the algorithm parallelize well with a large number of threads? Have the...
I also noticed this difference. In fact, in the TensorFlow version from the original paper, train_adj and val_adj are different. So I think this implementation detail might have been missed by...
Thank you for reminding me of the self-loops.
What are your torch and huggingface (transformers) versions? I cannot replicate this problem. In my case, the log is:

```python
>>> import torch
>>> from transformers import LlamaTokenizerFast, LlamaForCausalLM
>>> model_path =...
```
Thanks for your information; we will check whether the 4.38.2 version updates break the code.