PyTorch-On-Angel issues

Error occurred when maven compiling java : Could not find artifact edu.princeton.cs:algs4:jar:1.0.3

1

Here is the Maven output log from `mvn clean package -Dmaven.test.skip=true`:

```
[INFO] ------------------------------------------------------------------------
[INFO] BUILD FAILURE
[INFO] ------------------------------------------------------------------------
[INFO] Total time: 26.028 s
[INFO] Finished at: 2022-05-05T16:24:57Z
[INFO]...
```

ljacg

dependency issue

6

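Could you provide a tutorial that tells us which version of Angel, and which versions of the other dependencies, the current version of pytorch-on-angel should be paired with? I have tried repeatedly for a long time and still run into all kinds of version mismatch problems.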

xiaozhi-alan-zhu

2021Tencent Rhino-bird Open-source Training Program—Angel Zeng Shang

3

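# First assignment

> I am honored to have been selected for the Angel project and to begin the hands-on open-source stage. It is a rare opportunity to study the architecture and design principles of the Angel distributed machine learning platform together with the mentors and fellow students. Below are my hands-on notes from this open-source program. My knowledge is limited, so errors and omissions are inevitable; corrections from experts and readers are welcome.

# Angel environment setup

This project is a paper reproduction based on [Angel-ML/PyTorch-On-Angel](https://github.com/Angel-ML/PyTorch-On-Angel). Before doing any other work, we need to deploy a working environment.

![https://github.com/Angel-ML/PyTorch-On-Angel/blob/master/docs/img/pytorch_on_angel_framework.png?raw=true](https://github.com/Angel-ML/PyTorch-On-Angel/blob/master/docs/img/pytorch_on_angel_framework.png?raw=true)

PyTorch on Angel's architecture

PyTorch-On-Angel consists of three main modules:

1. Python Client: generates the ScriptModule
2. Angel PS: the parameter server, responsible for distributed storage, synchronization, and coordinated computation of the model
3. Spark: Spark...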

earlytobed

CMake error: add_subdirectory given source "pytorch_scatter-2.0.5" which is not an existing directory

1

Hey guys, I got an error at the make stage: -- The C compiler identification is GNU 6.3.0 -- The CXX compiler identification is GNU 6.3.0 -- Check for working C...

jinqinn

2021Tencent Rhino-bird Open-source Training Program—Angel Zhi Shen

1

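## Tencent Rhino-bird hands-on: Angel platform setup and running the examples

### About the runtime platform

1. Platform: any virtual machine on the AT platform works; the other cloud does not!!! In my tests, setup on the other platform raised other errors, possibly because of some LAN settings or a hostname problem.
2. Build method: local build, pseudo-distributed configuration, on CentOS 7.2.
3. gcc: version 7.3 is sufficient. cmake 3.21 reports a warning when configuring libtorch; I do not know whether that causes problems, and I later switched to 3.12, which worked. Reference pages:
   - Upgrading gcc on CentOS: https://blog.csdn.net/ncdx111/article/details/106047228
   - Downloading and installing cmake: https://blog.csdn.net/weixin_30781433/article/details/98787965?utm_medium=distribute.pc_relevant.none-task-blog-2%7Edefault%7EBlogCommendFromMachineLearnPai2%7Edefault-1.base&depth_1-utm_source=distribute.pc_relevant.none-task-blog-2%7Edefault%7EBlogCommendFromMachineLearnPai2%7Edefault-1.base

### About the hadoop, spark, and pytorch versions

1. hadoop: any 2.7.x version works; I have personally tested 2.7.1 and 2.7.5.
2. spark: someone in the group previously found that spark 2.3.0 is required here; version 2.4.0 raises errors.
3....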

xiaohu4313888

2021Tencent Rhino-bird Open-source Training Program—Angel YuZhengze

# TODO

Cavaradossi

modify gen_pt_model.sh for docker env Ye Huanjie

modify gen_pt_model.sh

YeHuanjie

liuqian mmoe python commit

xiaoSUM

gcn_modified.py is added by ZhenbangYou

After local testing, I found that the original implementation of GCN (gcn.py) has some bugs. I fixed them in a new Python file called "gcn_modified.py" in the same folder, leaving...

ZhenbangYou

add sgcn

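Added an sgcn that can be run through GCNExample. The model produced by the program can be used directly to run GCNExample on Pytorch-on-angel; the previous one could only be used with train.py. In the repository, the SGCN branch can be used for training with the train function, and SGCN-run can be used for distributed training. For the SGCN implementation, see the README in the SGCN branch.

Test results on the Cora dataset: for graphsage, test-set accuracy after 200 epochs is 0.8380, with a training time of 167s. For sgcn, test-set accuracy after 200 epochs is 0.8341, with a training time of 128s. The results are consistent with the paper (paper figures: 0.815 and 0.81 respectively). Model accuracy drops slightly, but because the nonlinearity is removed, the number of parameters is greatly reduced and training time drops substantially. #101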
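To illustrate why removing the nonlinearity can shorten training, here is a minimal sketch. It assumes that sgcn refers to simplified graph convolution, where the K propagation steps collapse into a one-time precomputation of S^K X that feeds a single linear classifier. The function names (`normalized_adjacency`, `matmul`, `sgc_features`) and the toy graph are hypothetical and are not taken from the PyTorch-On-Angel repository. Plain Python lists stand in for tensors to keep the sketch dependency-free.

```python
import math


def normalized_adjacency(edges, n):
    """Build S = D^-1/2 (A + I) D^-1/2 as a dense n x n list of lists."""
    a = [[0.0] * n for _ in range(n)]
    for i in range(n):
        a[i][i] = 1.0  # self-loops (the "+ I" term)
    for u, v in edges:
        a[u][v] = 1.0  # treat the graph as undirected
        a[v][u] = 1.0
    deg = [sum(row) for row in a]
    return [[a[i][j] / math.sqrt(deg[i] * deg[j]) for j in range(n)]
            for i in range(n)]


def matmul(x, y):
    """Dense matrix product of two list-of-lists matrices."""
    return [[sum(x[i][k] * y[k][j] for k in range(len(y)))
             for j in range(len(y[0]))]
            for i in range(len(x))]


def sgc_features(edges, features, k=2):
    """Propagate features k times with no nonlinearity in between (S^k X)."""
    s = normalized_adjacency(edges, len(features))
    out = features
    for _ in range(k):
        out = matmul(s, out)
    return out


# Toy graph (hypothetical): two nodes joined by one edge, one feature each.
propagated = sgc_features([(0, 1)], [[1.0], [0.0]], k=2)
```

Because `sgc_features` has no trainable parameters, it can run once before training, and only the final linear layer is learned on the propagated features.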

xiaohu4313888
