likuanppd

Results 11 comments of likuanppd

Thanks for your reply. I found the reason anyway: the definition of `preprocess` in github/DeepRobust/utils.py is not the same as the one in the installed library. I reinstalled deeprobust-0.2.4, but it still reports...

I tried to load the dataset with `data = Dataset(root='/tmp/', name=args.dataset, setting='gcn')`. It destroys the performance of GCN on the clean graph and makes the attack fail, so the failure...

Line 90 seems equivalent to Meta-Train, and Lines 91-96 are similar to Meta-Self. I ran more experiments, and the results show that PGD-evasion only works on the standard split...

Hi Jin, I have published a paper at ICLR on this problem; it can be found here: https://openreview.net/forum?id=dSYoPjM5J_W&referrer=%5BAuthor%20Console%5D(%2Fgroup%3Fid%3DICLR.cc%2F2023%2FConference%2FAuthors%23your-submissions)

I've tried your solution, and it works perfectly. Good job!

We are preparing it, and it should be released within a few days.

Just comment out these two sections in the script: `### 1. start server ###` and `### 2. Waiting for the server port to be ready ###`. Note that `model_path` still has to be set, because the tokenizer needs to be loaded to check whether inputs exceed the maximum length.
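The length check mentioned above can be sketched as follows. This is an illustrative sketch, not the actual script: `is_too_long` and `max_len` are hypothetical names, and in the real script the tokenizer would come from the model at `model_path` (which is why that path must still be set), e.g. via a Hugging Face `AutoTokenizer`.

```python
# Hedged sketch of the length check that requires model_path:
# the tokenizer is loaded only to count tokens, not to serve a model.
# `is_too_long` and `max_len` are illustrative names, not from the script.

def is_too_long(text, tokenize, max_len):
    """Return True if `text` tokenizes to more than `max_len` tokens.

    `tokenize` is any callable mapping a string to a list of tokens;
    in the real script it would be backed by the tokenizer loaded
    from model_path (e.g. AutoTokenizer.from_pretrained(model_path)).
    """
    return len(tokenize(text)) > max_len

# Stand-in whitespace tokenizer, for demonstration only.
whitespace_tokenize = str.split

print(is_too_long("a b c d", whitespace_tokenize, max_len=3))  # True
print(is_too_long("a b", whitespace_tokenize, max_len=3))      # False
```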

Thanks for your interest! A larger-scale WebSailor is coming soon.

Can you provide more details? The GAIA reproduction has already been attempted by others, and the results are fully reproducible. Reference issue: https://github.com/Alibaba-NLP/DeepResearch/issues/173

```json
{
  "overall": {"avg_pass_at_3": 77.35, "best_pass_at_1": 82.52, "pass_at_3": 91.26},
  "individual": {"Round1_Pass@1": 82.52, "Round2_Pass@1": 75.73, "Round3_Pass@1": 75.25},
  "statistics": {"extra_length": 27.0, "num_invalid": 3.667, "avg_action": 16.814, "avg_visit_action": 8.458, "avg_search_action": 7.561, "avg_other_action": ...
```
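Aggregates like the ones above (best Pass@1 across rounds, Pass@3 as "solved in any round") can be computed roughly as sketched below. This is an illustrative sketch, not the actual evaluation code; the function name `aggregate` and the demo data are made up.

```python
# Hedged sketch of aggregating per-round Pass@1 into the kinds of
# numbers shown above. Purely illustrative, not the real eval script.

def aggregate(results):
    """results: one list of booleans per task, one entry per round."""
    n = len(results)
    rounds = len(results[0])
    # Per-round Pass@1 (% of tasks solved in that round).
    per_round = [100 * sum(task[r] for task in results) / n for r in range(rounds)]
    return {
        "avg_pass_at_1": sum(per_round) / rounds,               # mean over rounds
        "best_pass_at_1": max(per_round),                       # best single round
        "pass_at_3": 100 * sum(any(task) for task in results) / n,  # solved in any round
    }

# Tiny example: 4 tasks, 3 rounds each.
demo = [
    [True, True, True],
    [True, False, True],
    [False, False, True],
    [False, False, False],
]
print(aggregate(demo))  # {'avg_pass_at_1': 50.0, 'best_pass_at_1': 75.0, 'pass_at_3': 75.0}
```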

1. On one hand, Qwen-3 has no large-scale dense model; on the other, it is mostly a matter of efficiency and scaling. Our reasons for choosing MoE are not much different from why virtually all large-model vendors choose MoE models. 2. Mainstream frameworks such as Megatron and Llamafactory all support MoE training; you can take a look at the open-source training frameworks.