Shaoyang Xu
Hi, I read your code and roughly understand the idea: if both docs and words are one-hot vectors, then x (the feature matrix) is an identity matrix, which is why the first GCN layer is set to featureless=True, i.e. x does not need to participate in the computation; the second layer's x is no longer an identity matrix but the first layer's activation (output), so there x does participate. Is the reason for this setup that x could in fact be a dense matrix from the start, e.g. words represented by pre-trained word vectors and docs by their own vector representation, such as the average of all word vectors in the document, or ...? My questions: 1. Is my understanding above correct? 2. Have you tried pre-trained word embeddings, and how did they perform?
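For context, the identity-feature shortcut the question describes can be checked with a minimal NumPy sketch (the adjacency and weight matrices below are random placeholders, not the repo's actual data): when X is the identity, A @ X @ W equals A @ W, so a `featureless` layer can skip the feature multiplication, while a dense X (e.g. pre-trained embeddings) would have to participate.

```python
import numpy as np

rng = np.random.default_rng(0)
n_nodes, hidden = 5, 4

# Placeholder normalized adjacency (random symmetric matrix for illustration)
A = rng.random((n_nodes, n_nodes))
A = (A + A.T) / 2
W = rng.random((n_nodes, hidden))

# One-hot features -> X is the identity, so A @ X @ W == A @ W,
# which is exactly what featureless=True exploits.
X = np.eye(n_nodes)
full = A @ X @ W          # standard GCN propagation A X W
featureless = A @ W       # the featureless shortcut
assert np.allclose(full, featureless)

# If X were instead dense pre-trained embeddings (hypothetical shapes),
# the same layer would be A @ X_dense @ W_dense and cannot be skipped.
emb_dim = 3
X_dense = rng.random((n_nodes, emb_dim))
W_dense = rng.random((emb_dim, hidden))
out = A @ X_dense @ W_dense
print(out.shape)  # (5, 4)
```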
2021-01-18 03:07:40.679672: I tensorflow/stream_executor/platform/default/dso_loader.cc:49] Successfully opened dynamic library libcuda.so.1
2021-01-18 03:07:40.701768: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1720] Found device 0 with properties: pciBusID: 0000:02:00.0 name: Tesla M40 24GB computeCapability: 5.2 coreClock: 1.112GHz coreCount: 24...
Hi, dear ziyi~ I found that in your code the BERT output weights are not tied to (set equal to) the input embeddings, as can be seen [here](https://github.com/neulab/awesome-align/blob/5f150d45bbe51e167daf0a84abebaeb07c3323d1/awesome_align/modeling.py#L374) (In detail,...
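For readers unfamiliar with what "tied" means here: in a weight-tied masked-LM head, the output decoder reuses the input embedding matrix (transposed) instead of learning a separate projection. A minimal NumPy sketch with made-up shapes (not the actual awesome-align code):

```python
import numpy as np

rng = np.random.default_rng(0)
vocab, hidden = 10, 4

# Input embedding table E: one row per vocabulary item.
E = rng.random((vocab, hidden))

# Tied output head: logits are computed against E itself
# (H @ E.T), so no separate decoder matrix is learned.
H = rng.random((3, hidden))   # hidden states for 3 tokens
b = np.zeros(vocab)           # per-token output bias
logits = H @ E.T + b
print(logits.shape)  # (3, 10): one score per vocab item per token
```

An untied head would instead allocate its own `(hidden, vocab)` decoder matrix, which is the difference the issue is asking about.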
Hello, I found that the grad matrix ([grad = weight.grad](https://github.com/varun19299/rigl-reproducibility/blob/97443beac90e03f899652943594695e5152c2b09/sparselearning/funcs/grow.py#L86)) has many non-zero elements even though the corresponding values in the weight matrix are zero. I want to ask why this happens (as a beginner...
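As a general observation (not specific to this repo's code), a zero weight can still receive a non-zero gradient, because the gradient depends on the inputs and the error, not on the weight's current value; it measures how the loss would change if the weight were re-activated, which is exactly why RigL-style growth uses it to pick which zero weights to grow. A tiny hand-computed example:

```python
# For the scalar loss L(w) = (w*x - y)**2, the derivative is
# dL/dw = 2*(w*x - y)*x. At a pruned weight w = 0 this becomes
# -2*y*x, which is generally non-zero even though w itself is zero.
def grad(w, x, y):
    """Gradient of (w*x - y)**2 with respect to w."""
    return 2 * (w * x - y) * x

g = grad(0.0, x=1.5, y=2.0)
print(g)  # -6.0: the pruned weight still has a non-zero gradient
```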
Hello authors, your experiment results on harmfulness classification ([harmless_llama2.ipynb](https://github.com/andyzoujm/representation-engineering/blob/main/examples/harmless_harmful/harmless_llama2.ipynb)) show that Llama-2-13b-chat achieves near-100% accuracy, even in the lower layers. I have tried more models: Llama-2-{7,70}b-chat, llama-2-7b, bloomz-{560m,1b1,1b7,3b,7b1}, bloom-7b1, all...