Euler2 RGCN speed

Open dewang23 opened this issue 5 years ago • 6 comments

Hi, I am using the RGCN implementation from the examples directory on a custom dataset with 1137061 nodes and 58336927 edges. There are 6 node types and 67 edge types. There are no node features and 1 edge feature (which is equal to the edge type).

The problem is that training is extremely slow. I trained on an n1-standard-64 machine on Google Cloud Platform (see https://cloud.google.com/compute/docs/machine-types) with 64 cores and 240GB of memory, using the following parameters:

layers = 1
num_negs = 2
lr = 0.01
optimizer = adam
hidden_dim = 4
num_epochs = 1
embedding_dim = 4
batch_size = 1024

Training took 183m51.489s in total. These are very low settings, and I would like to use higher ones (more dimensions, more epochs, etc.), but the training time makes that impractical. Is such a long training time expected for a dataset like this, or is something going wrong?

My training logs are here: https://drive.google.com/file/d/1DyEPa9abK3X0UCiOqsemWZ8yxP5GjXBl/view?usp=sharing

dewang23 avatar Jul 10 '20 18:07 dewang23

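Try this:

Euler-2.0: enabling multithreading support in the Euler kernel (optional). Euler is primarily a distributed framework optimized for throughput. To reduce the extra overhead of thread scheduling, the Euler kernel was developed as single-threaded, which causes performance problems for single-machine users in some cases. You can therefore try changing the following line in the top-level CMakeLists.txt of the euler project: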

option(USE_OPENMP "Option for using open mp" OFF)

to

option(USE_OPENMP "Option for using open mp" ON)

Then re-run the build.sh script.

zakheav avatar Jul 11 '20 12:07 zakheav

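I'd like to ask why, after completing the installation by following the tutorial, I can't find the tf_euler package, and some other packages are missing as well?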

ergouy avatar Jul 14 '20 01:07 ergouy

@zakheav I tried building Euler with OpenMP turned on as you suggested, but the training time did not improve. Here are the logs. Training (196m52.826s): https://drive.google.com/file/d/1abcnW0ajZOjYasN-odmnSdrxlTjhIxKu/view?usp=sharing Build: https://drive.google.com/file/d/17Z7-hNsjyTSnHXWd8uQFnJkqHLMeo2-C/view?usp=sharing I installed TensorFlow 1.12.0 from pip before building.

dewang23 avatar Jul 14 '20 02:07 dewang23

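@dewang23 Hey, do you have an installation guide? Whenever I install by following the official tutorial, something is always missing. Any pointers would be appreciated!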

ergouy avatar Jul 14 '20 02:07 ergouy

@dewang23 We implemented only the basic version of RGCN, whose cost depends on the number of relations, so it is relatively slow. The paper describes many speed optimizations.

alinamimi avatar Jul 14 '20 03:07 alinamimi

You mean the RGCN paper, right? Do you mean Sec.2.2 Regularization?

dewang23 avatar Jul 14 '20 14:07 dewang23
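
For readers following the thread, here is a rough NumPy sketch of what "depends on the number of relations" can look like, next to the basis decomposition that Sec. 2.2 of the RGCN paper presents as a regularization. It is not Euler's code: every function name below is hypothetical, the self-loop term is left out, and the choice of 8 bases is arbitrary. The paper motivates basis decomposition as a way to cut parameters; whether it would also shorten training time in Euler is not established in this thread.

```python
import numpy as np


def count_basic(num_rels, in_dim, out_dim):
    # Basic RGCN: one dense weight matrix per relation.
    return num_rels * in_dim * out_dim


def count_basis(num_rels, num_bases, in_dim, out_dim):
    # Basis decomposition: shared basis matrices plus per-relation coefficients.
    return num_bases * in_dim * out_dim + num_rels * num_bases


def rgcn_weights_basis(num_rels, num_bases, in_dim, out_dim, rng):
    # W_r = sum_b coeffs[r, b] * bases[b]
    bases = rng.standard_normal((num_bases, in_dim, out_dim))
    coeffs = rng.standard_normal((num_rels, num_bases))
    weights = np.einsum("rb,bio->rio", coeffs, bases)
    return weights, bases, coeffs


def rgcn_layer(h, edges, weights, num_nodes):
    # edges is an integer array of (src, dst, rel) rows.
    # The loop runs once per relation, which is why the basic layer's
    # cost grows with the number of relations. Self-loops are omitted.
    src, dst, rel = edges[:, 0], edges[:, 1], edges[:, 2]
    out = np.zeros((num_nodes, weights.shape[2]))
    for r in range(weights.shape[0]):
        mask = rel == r
        if not mask.any():
            continue
        msg = h[src[mask]] @ weights[r]
        # Normalize by the number of incoming edges of relation r per target.
        deg = np.bincount(dst[mask], minlength=num_nodes)
        np.add.at(out, dst[mask], msg / deg[dst[mask]][:, None])
    return np.maximum(out, 0.0)  # ReLU


if __name__ == "__main__":
    # 67 edge types and hidden_dim = 4 come from the issue; 8 bases is arbitrary.
    print(count_basic(67, 4, 4), count_basis(67, 8, 4, 4))
    print(count_basic(67, 64, 64), count_basis(67, 8, 64, 64))
```

With the tiny `hidden_dim = 4` used in the issue the two parameter counts are close, so any saving from the decomposition would only become noticeable at the larger dimensions the original post wants to move to.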