
[Question] The number of edges I queried is inconsistent with the number of edges I imported

Open LiJie20190102 opened this issue 3 years ago • 17 comments

Problem Type

others (please edit later)

Before submit

  • [X] I have searched the existing issues and FAQ and confirmed that there is no identical or duplicate problem

Environment

  • Server Version: 1.0.0 (Apache Release Version)
  • Backend: RocksDB x nodes, HDD or SSD
  • OS: xx CPUs, xx G RAM, Ubuntu 2x.x / CentOS 7.x
  • Data Size: 65608366 vertices, 1806067135 edges

Your Question

I imported 65608366 vertices and 1806067135 edges. When I used hugegraph-computer or gremlin to query, the number of query edges was correct.

However, when I used "hugeClient.traverser().iteratorEdges(shard, 500)" to count the edges in each shard and summed the results, I got 1806312225 edges, which is 245090 more than I imported. I don't know why the numbers are inconsistent. Can't "hugeClient.traverser().iteratorEdges" be used to obtain the total number of edges?

hugegraph-computer log: image

gremlin result: image

Details of the "hugeClient.traverser().iteratorEdges(shard, 500)" approach:

Step 1: Query all shard information (http://x.x.x.x:8065/graphs/hugegraph/traversers/edges/shards?split_size=1048576).

Step 2: Use "hugeClient.traverser().iteratorEdges" to obtain the number of edges for each shard, then sum them.

Result: the number of edges is 1806312225, not 1806067135.

Vertex/Edge example

No response

Schema [VertexLabel, EdgeLabel, IndexLabel]

{
    "vertexlabels": [
        {
            "id": 1,
            "name": "person",
            "id_strategy": "CUSTOMIZE_NUMBER",
            "primary_keys": [],
            "nullable_keys": [],
            "index_labels": [
                "personByAge"
            ],
            "properties": [
                "id"
            ],
            "status": "CREATED",
            "ttl": 0,
            "enable_label_index": true,
            "user_data": {
                "~create_time": "2023-03-13 09:52:29.084"
            }
        }
    ]
}

{
    "edgelabels": [
        {
            "id": 1,
            "name": "friend",
            "source_label": "person",
            "target_label": "person",
            "frequency": "SINGLE",
            "sort_keys": [],
            "nullable_keys": [],
            "index_labels": [],
            "properties": [],
            "status": "CREATED",
            "ttl": 0,
            "enable_label_index": true,
            "user_data": {
                "~create_time": "2023-03-13 09:52:30.760"
            }
        }
    ]
}

LiJie20190102 avatar Mar 31 '23 08:03 LiJie20190102

Thanks a lot for the details, could u tell us how to reproduce it with the minimum data?

imbajin avatar Mar 31 '23 15:03 imbajin

Sorry, I don't know yet. The number of vertices queried is still correct (65608366).

LiJie20190102 avatar Apr 01 '23 07:04 LiJie20190102

@coderzc @imbajin Hello, is anyone looking into this issue? I think it is quite important. Thank you for your help.

LiJie20190102 avatar Apr 03 '23 03:04 LiJie20190102

We need to know how to reproduce it first, thanks.

imbajin avatar Apr 03 '23 03:04 imbajin

The problem scenario is as follows:

  1. First, import 65608366 vertices and 1806067135 edges;

image

  2. When I used "hugeClient.traverser().iteratorEdges(shard, 500)" to query the number of edges for each shard and summed them, I found that the total was 1806312225, not 1806067135.
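One way to narrow this down on a small dataset is to track distinct edge ids while summing the per-shard counts: if the total exceeds the distinct count, some edges are being returned by more than one shard. The following is a minimal sketch, not HugeGraph code. The class `ShardCountCheck`, its `Result` type, and the modelling of each shard as a plain iterator of edge-id strings are assumptions made so that the example runs without a server; in a real run each iterator would be fed by the edges that "hugeClient.traverser().iteratorEdges(shard, 500)" returns.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.Iterator;
import java.util.List;
import java.util.Set;

/**
 * Hypothetical helper: sums per-shard edge counts and also counts distinct ids.
 * Each shard is modelled as an iterator of edge-id strings (an assumption).
 */
public class ShardCountCheck {

    /** Totals gathered while scanning all shards. */
    public static final class Result {
        public final long total;              // edges seen, duplicates included
        public final long distinct;           // unique edge ids seen
        public final List<String> duplicates; // ids returned more than once

        Result(long total, long distinct, List<String> duplicates) {
            this.total = total;
            this.distinct = distinct;
            this.duplicates = duplicates;
        }
    }

    /** Scans every shard once, counting all ids and remembering repeats. */
    public static Result count(List<Iterator<String>> shards) {
        long total = 0;
        Set<String> seen = new HashSet<>();
        List<String> duplicates = new ArrayList<>();
        for (Iterator<String> shard : shards) {
            while (shard.hasNext()) {
                String id = shard.next();
                total++;
                // Set.add returns false when the id was already present
                if (!seen.add(id)) {
                    duplicates.add(id);
                }
            }
        }
        return new Result(total, seen.size(), duplicates);
    }

    public static void main(String[] args) {
        // Two made-up shards that both contain edge "e3"
        List<Iterator<String>> shards = Arrays.asList(
                Arrays.asList("e1", "e2", "e3").iterator(),
                Arrays.asList("e3", "e4").iterator());
        Result r = count(shards);
        // prints: total=5 distinct=4 duplicates=[e3]
        System.out.println("total=" + r.total + " distinct=" + r.distinct
                + " duplicates=" + r.duplicates);
    }
}
```

Holding every id in memory is not practical for 1806067135 edges, so this check only suits a reduced reproduction dataset.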

LiJie20190102 avatar Apr 03 '23 03:04 LiJie20190102

We ran the test based on this article: https://blog.csdn.net/penriver/article/details/115124350. Our numbers of edges and vertices are consistent with the article. Please help with this, thank you. @coderzc @imbajin

LiJie20190102 avatar Apr 04 '23 06:04 LiJie20190102

OK, got it, thanks for the feedback. You could also try count(-1) in a Gremlin query.

imbajin avatar Apr 04 '23 06:04 imbajin

When I used 'count (-1)', there were some exceptions

image (WeCom screenshot)

LiJie20190102 avatar Apr 04 '23 07:04 LiJie20190102

Execute the Gremlin query asynchronously instead; refer to async-gremlin.

imbajin avatar Apr 04 '23 08:04 imbajin

When I use count(-1), I cannot get the correct result; it displays 0: image

image

At the same time, when I use count(), I can find the correct data: image

LiJie20190102 avatar Apr 04 '23 12:04 LiJie20190102

please note the 'count (-1)' may mean .limit(-1).count()

javeme avatar Apr 17 '23 12:04 javeme

With .limit(-1).count(), the result is: image

LiJie20190102 avatar Apr 24 '23 05:04 LiJie20190102

@javeme @imbajin @coderzc Hello, do you have any relevant conclusions?

LiJie20190102 avatar May 10 '23 02:05 LiJie20190102

@LiJie20190102 do you mean that, with the RocksDB backend, the counts from iteratorEdges() and g.E().count() differ, i.e. count(iteratorEdges()) != g.E().count()?

javeme avatar May 10 '23 13:05 javeme

yeah

LiJie20190102 avatar May 10 '23 13:05 LiJie20190102

@javeme @imbajin @coderzc We are planning to use hugegraph in the production environment, but we are currently experiencing this issue. Please help us solve it as soon as possible. Thank you all

LiJie20190102 avatar May 19 '23 08:05 LiJie20190102

You are welcome to use HugeGraph. The shard imprecision may be caused by some empty holes, but we need a way to reproduce it to confirm that, and we lack the time/priority for now.

In addition, because this case is relatively minor, it can only be handled as scheduling allows. If you need urgent troubleshooting or dedicated support, you can reply "support" in the WeChat official account.

BTW, another good way to get high-priority support is to join our dev community (one for all, and all for one).
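While building a reproduction, it may also help to check whether the shard ranges returned by the shards endpoint tile the key space cleanly, since overlapping ranges could return some edges twice and gaps could skip some. This is a minimal sketch under strong assumptions: `ShardRangeCheck` and `Range` are hypothetical names, shard boundaries are modelled as `long` values with an exclusive end, and real backends (RocksDB included) may encode boundaries differently, so parsing the server response is left out.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Comparator;
import java.util.List;

/** Hypothetical helper: reports overlaps and gaps between shard ranges. */
public class ShardRangeCheck {

    /** A shard range [start, end); the exclusive end is an assumption. */
    public static final class Range {
        public final long start;
        public final long end;

        public Range(long start, long end) {
            this.start = start;
            this.end = end;
        }

        @Override
        public String toString() {
            return "[" + start + ", " + end + ")";
        }
    }

    /** Sorts ranges by start and compares each start with the furthest end seen so far. */
    public static List<String> findProblems(List<Range> ranges) {
        List<String> problems = new ArrayList<>();
        if (ranges.isEmpty()) {
            return problems;
        }
        List<Range> sorted = new ArrayList<>(ranges);
        sorted.sort(Comparator.comparingLong(r -> r.start));
        long maxEnd = sorted.get(0).end;
        for (int i = 1; i < sorted.size(); i++) {
            Range cur = sorted.get(i);
            if (cur.start < maxEnd) {
                problems.add("overlap at " + cur);  // keys could be read twice
            } else if (cur.start > maxEnd) {
                problems.add("gap before " + cur);  // keys could be skipped
            }
            maxEnd = Math.max(maxEnd, cur.end);
        }
        return problems;
    }

    public static void main(String[] args) {
        // Made-up ranges: one overlap (190 < 200) and one gap (350 > 300)
        List<Range> ranges = Arrays.asList(
                new Range(0, 100), new Range(100, 200),
                new Range(190, 300), new Range(350, 400));
        for (String problem : findProblems(ranges)) {
            System.out.println(problem);
        }
    }
}
```

A clean result here would not rule out the shard scan as the cause; it would only exclude overlapping or missing ranges under the assumptions above.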

imbajin avatar May 19 '23 10:05 imbajin