[BUG] [GAE] The triangles algorithm results cannot be consistent for the same data set

Open dzhiwei opened this issue 1 year ago • 1 comments

Describe the bug: The triangles algorithm returns inconsistent results for the same data set when the number of workers changes. I ran the triangles algorithm with num_workers = 1, then increased num_workers to 2, and the results differ. Does the triangles algorithm work incorrectly in a multi-worker environment?

To Reproduce: run the following code.

import graphscope
from graphscope.framework.loader import Loader
import os

def execute_triangles(num_workers=1):
    sess = None
    curr_dir = os.path.abspath('.')
    dataset_name = "LDBC_SNB_REGRESSION"
    try:
        sess = graphscope.session(cluster_type="hosts", num_workers=num_workers, vineyard_shared_mem='10Gi')
        graph = sess.g()
        graph = graph.add_vertices(
            Loader(os.path.join(curr_dir, dataset_name, "person.csv"), delimiter=","),
            label="person",
            properties=["_rank"],
        )
        graph = graph.add_edges(
            Loader(os.path.join(curr_dir, dataset_name, "person_knows_person.csv"), delimiter=","),
            label="knows", src_label="person", dst_label="person",
            properties=["_rank"],
        )
        ctx = graphscope.triangles(graph)
        dataframe = ctx.to_dataframe(selector={"v":"v.id","r":"r"}).sort_values(by='r', ascending=False)
        print(dataframe)
        return dataframe
    finally:
        if sess:
            sess.close()
            
if __name__ == "__main__":
    execute_triangles(num_workers=1).to_csv("triangles1.csv")
    execute_triangles(num_workers=2).to_csv("triangles2.csv")

result:

# num_workers=1
                   v    r
451   32985348834375  720
1443   2199023256816  642
468    6597069767242  543
184             1564  239
# num_workers=2
                   v    r
986   32985348834375  719
346    2199023256816  642
617    6597069767242  540
479             1564  239

Expected behavior: The number of triangles for each vertex should be the same regardless of num_workers. Please help clarify whether this behavior is expected.
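To see exactly which vertices disagree between the two runs, the per-vertex counts can be compared directly. The sketch below is plain Python and hard-codes only the four rows shown in the result excerpt above; `diff_counts` is a hypothetical helper name and is not part of GraphScope.

```python
# Triangle counts per vertex id, copied from the result excerpt in this report.
counts_1_worker = {32985348834375: 720, 2199023256816: 642, 6597069767242: 543, 1564: 239}
counts_2_workers = {32985348834375: 719, 2199023256816: 642, 6597069767242: 540, 1564: 239}


def diff_counts(a, b):
    """Return {vertex: (count_a, count_b)} for every vertex whose counts differ.

    A vertex missing from one side is reported with None on that side.
    """
    mismatches = {}
    for v in sorted(a.keys() | b.keys()):
        if a.get(v) != b.get(v):
            mismatches[v] = (a.get(v), b.get(v))
    return mismatches


mismatches = diff_counts(counts_1_worker, counts_2_workers)
for v, (c1, c2) in mismatches.items():
    print(f"vertex {v}: num_workers=1 -> {c1}, num_workers=2 -> {c2}")
```

For the full data set, the same function could be fed dictionaries built from triangles1.csv and triangles2.csv.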

Environment (please complete the following information):

(gs20_py10) (base) $ python3 --version
Python 3.10.11
(gs20_py10) (base) $ pip3 list|grep graphscope
graphscope 0.26.0
graphscope-client 0.26.0
(gs20_py10) (base) $ uname -a
Darwin f4d488816bd7 22.5.0 Darwin Kernel Version 22.5.0: Thu Jun 8 22:22:20 PDT 2023; root:xnu-8796.121.3~7/RELEASE_ARM64_T6000 arm64

Additional context: LDBC_SNB_REGRESSION.zip

dzhiwei • May 09 '24 08:05

Thanks for reporting! We will investigate this and keep you updated!

yecol • Jun 07 '24 02:06

This triangles algorithm implementation requires the graph to be undirected, i.e., created with graph = sess.g(directed=False). On directed graphs, correctness is not guaranteed even in a single-partition (single-worker) setup. We've added schema detection logic.
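As an independent cross-check on small inputs, per-vertex triangle counts can be computed on the undirected view of an edge list in plain Python. This is a minimal sketch, not GraphScope's implementation: `triangles_per_vertex` is a hypothetical helper, it assumes the whole edge list fits in memory, and it treats every edge as undirected and ignores duplicates and self-loops.

```python
def triangles_per_vertex(edges):
    """Count triangles per vertex on the undirected view of an edge list."""
    adj = {}
    for u, v in edges:
        if u == v:
            continue  # a self-loop cannot be part of a triangle
        # Store each edge in both directions so adjacency is symmetric.
        adj.setdefault(u, set()).add(v)
        adj.setdefault(v, set()).add(u)

    counts = {}
    for u, nbrs in adj.items():
        # Each triangle through u is found twice, once from each of
        # its other two vertices, hence the division by 2.
        shared = sum(len(nbrs & adj[v]) for v in nbrs)
        counts[u] = shared // 2
    return counts


# One triangle (1, 2, 3) plus a pendant vertex 4.
edges = [(1, 2), (2, 3), (3, 1), (3, 4)]
# Same graph with edge directions flipped and one edge duplicated.
flipped = [(2, 1), (2, 3), (1, 3), (4, 3), (3, 4)]

print(triangles_per_vertex(edges))
print(triangles_per_vertex(flipped) == triangles_per_vertex(edges))
```

Because adjacency is symmetrized before counting, the result does not depend on the direction in which each edge was listed, which is the property an undirected graph provides.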

luoxiaojian • Jul 16 '24 05:07