[BUG] [GAE] Triangles algorithm results are inconsistent for the same data set when num_workers changes
Describe the bug The results of the triangles algorithm are not consistent for the same data set when the number of workers changes. I ran the triangles algorithm with num_workers = 1, then increased num_workers to 2, and the results differ. Does the triangles algorithm work incorrectly in a multi-worker environment?
To Reproduce Steps to reproduce the behavior: code:
import graphscope
from graphscope.framework.loader import Loader
import os


def execute_triangles(num_workers=1):
    sess = None
    curr_dir = os.path.abspath('.')
    dataset_name = "LDBC_SNB_REGRESSION"
    try:
        sess = graphscope.session(cluster_type="hosts", num_workers=num_workers, vineyard_shared_mem='10Gi')
        graph = sess.g()
        graph = graph.add_vertices(
            Loader(os.path.join(curr_dir, dataset_name, "person.csv"), delimiter=","),
            label="person",
            properties=["_rank"]
        )
        graph = graph.add_edges(
            Loader(os.path.join(curr_dir, dataset_name, "person_knows_person.csv"), delimiter=","),
            label="knows", src_label="person", dst_label="person",
            properties=["_rank"]
        )
        ctx = graphscope.triangles(graph)
        dataframe = ctx.to_dataframe(selector={"v": "v.id", "r": "r"}).sort_values(by='r', ascending=False)
        print(dataframe)
        return dataframe
    finally:
        if sess:
            sess.close()


if __name__ == "__main__":
    execute_triangles(num_workers=1).to_csv("triangles1.csv")
    execute_triangles(num_workers=2).to_csv("triangles2.csv")
result:
# num_workers=1
                   v    r
451   32985348834375  720
1443   2199023256816  642
468    6597069767242  543
184             1564  239
# num_workers=2
                   v    r
986   32985348834375  719
346    2199023256816  642
617    6597069767242  540
479             1564  239
Expected behavior The number of triangles for each vertex should be the same regardless of num_workers. Please clarify whether this behavior is expected.
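To pinpoint which vertices disagree between the two runs, the two CSV files written by the script can be compared by vertex id rather than by row index (the index differs between runs). The following is only a minimal sketch: the helper names `load_counts` and `diff_counts` are hypothetical, and it assumes the files have the `v` and `r` columns produced by the script above. The sample values are the four rows shown in the result.

```python
import csv


def load_counts(path):
    """Read a CSV with 'v' and 'r' columns (as written by to_csv above) into {v: r}."""
    with open(path, newline="") as f:
        # float() first so that a count written as "720.0" is also accepted.
        return {int(row["v"]): int(float(row["r"])) for row in csv.DictReader(f)}


def diff_counts(counts_a, counts_b):
    """Return {vertex: (count_a, count_b)} for vertices whose counts differ.

    A vertex missing from one side is reported with None for that side.
    """
    diffs = {}
    for v in sorted(counts_a.keys() | counts_b.keys()):
        a, b = counts_a.get(v), counts_b.get(v)
        if a != b:
            diffs[v] = (a, b)
    return diffs


# The four rows shown in the result above, keyed by vertex id.
one_worker = {32985348834375: 720, 2199023256816: 642, 6597069767242: 543, 1564: 239}
two_workers = {32985348834375: 719, 2199023256816: 642, 6597069767242: 540, 1564: 239}

print(diff_counts(one_worker, two_workers))
# {6597069767242: (543, 540), 32985348834375: (720, 719)}
```

For the full files, `diff_counts(load_counts("triangles1.csv"), load_counts("triangles2.csv"))` would list every vertex whose count changed.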
Environment (please complete the following information):
(gs20_py10) (base) $ python3 --version
Python 3.10.11
(gs20_py10) (base) $ pip3 list|grep graphscope
graphscope 0.26.0
graphscope-client 0.26.0
(gs20_py10) (base) $ uname -a
Darwin f4d488816bd7 22.5.0 Darwin Kernel Version 22.5.0: Thu Jun 8 22:22:20 PDT 2023; root:xnu-8796.121.3~7/RELEASE_ARM64_T6000 arm64
Additional context LDBC_SNB_REGRESSION.zip
Thanks for reporting! We will investigate this and keep you updated!
This implementation of the triangles algorithm requires the graph to be undirected, i.e., created with graph = sess.g(directed=False). On directed graphs, correctness cannot be guaranteed even in a single-partition (single-worker) setup. We've added schema detection logic.
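For cross-checking on a small sample of the edge list, a plain-Python reference count that ignores edge direction may be useful. This is only an illustrative sketch and not GraphScope's implementation: the function name `count_triangles_undirected` is hypothetical, and it assumes edges are given as (src, dst) pairs that fit in memory.

```python
from collections import defaultdict
from itertools import combinations


def count_triangles_undirected(edges):
    """Return {vertex: number of triangles containing it}, ignoring edge direction."""
    neighbors = defaultdict(set)
    for src, dst in edges:
        if src == dst:
            continue  # a self-loop cannot be part of a triangle
        # Store each edge in both directions so the graph is treated as undirected;
        # duplicate or reversed edges collapse because neighbors are kept in sets.
        neighbors[src].add(dst)
        neighbors[dst].add(src)

    counts = {}
    for v, nbrs in neighbors.items():
        # A triangle through v is a pair of v's neighbors that are themselves adjacent.
        counts[v] = sum(1 for a, b in combinations(nbrs, 2) if b in neighbors[a])
    return counts


# One triangle (1, 2, 3) plus a pendant vertex 4 attached to 3.
print(count_triangles_undirected([(1, 2), (2, 3), (3, 1), (3, 4)]))
```

With the graph created via sess.g(directed=False) as suggested above, the per-vertex counts returned by graphscope.triangles could be compared against this reference on a small subset of person_knows_person.csv, for both num_workers=1 and num_workers=2.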