DingDing
> This version works for me too.
You can use a machine that can both reach the Hugging Face datasets hub and scp files to the target server, e.g. your local machine. Run this Python code:

```python
from datasets import load_dataset

mydataset = load_dataset("glue", "mrpc")
mydataset.save_to_disk("YOURPATH/glue.mrpc")  # the directory does not have to be named glue.mrpc; any name works
```

Then, in a terminal:

```bash
scp -r YOURPATH/glue.mrpc USERNAME@IP:THE_ABSOLUTE_PATH_TO_SAVE_YOUR_DATASET
```

Afterwards, on the server, run the Python code ```python from datasets...
Zhihu: https://zhuanlan.zhihu.com/p/55334148
1. When debugging, do not set the batch size and hidden size to values that divide evenly into each other. Some people set both batchsize and hidden to 32, so a dimension mix-up goes unnoticed. To avoid this, use e.g. 33 and 32 during debugging (likewise for other dimensions). The criterion is "not divisible" rather than "not equal" because PyTorch broadcasts automatically.
2. A CUDA error that floods the console is usually caused by an out-of-bounds list index. Set the environment variable CUDA_LAUNCH_BLOCKING=1 to debug; many posts online cover the details.
3. First check that the input is correct. For NLP tasks, convert the ids back into tokens and verify the sentence is what you expect.
4. Check whether the model gradients are normal:

```python
[(name, parameter.grad.abs().sum(), parameter.sum())
 for name, parameter in model.named_parameters()]
```

In each tuple, the first element is the module name, the second is the sum of absolute gradient values, and the third is the sum of the parameters. If a module that should have a gradient has a zero absolute-gradient sum, something is wrong. If the parameter sum does not change between two prints, that may also indicate a problem, e.g. the learning rate is too small. ...
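Tip 1 can be made concrete. PyTorch follows NumPy's broadcasting rules, so this NumPy sketch shows how equal batch and hidden sizes hide an accidental transpose, while non-divisible sizes surface it immediately (the 32/33 sizes are the example values from the tip):

```python
import numpy as np

# Square case: an accidental transpose goes unnoticed, because the
# transposed shape (32, 32) still lines up, so no error is raised.
x = np.random.randn(32, 32)
bad = x * x.T              # silently "works" despite the logic bug

# Non-divisible sizes (33 vs 32) make the same bug fail immediately:
x = np.random.randn(33, 32)
try:
    bad = x * x.T          # (33, 32) * (32, 33): incompatible shapes
except ValueError as e:
    print("caught shape error:", e)
```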
1 million nodes may be too large for struc2vec. For me, an 80,000-node graph with 24 threads had not generated embeddings after 5 hours of training. And the algorithm is...
But I am wondering: is there an upper bound on the number of threads we should use? I.e., can the algorithm parallelize well with a large number of threads? Have the...
I also noticed this difference. In fact, in the TensorFlow version from the original paper, train_adj and val_adj are different. So I think this implementation detail might have been missed by...
Thank you for reminding me of the self-loops.
What are your torch and huggingface (transformers) versions? I cannot replicate this problem. In my case, the log is:

```python
>>> import torch
>>> from transformers import LlamaTokenizerFast, LlamaForCausalLM
>>> model_path =...
```
Thanks for your information; we will check whether the 4.38.2 version updates break the code.