[#714] feat(Java): Chunk Info Reader

Open sapienza88 opened this issue 7 months ago • 14 comments

This PR introduces a set of utility methods designed to efficiently read information pertaining to graph vertex chunks. These utilities will streamline the process of accessing and interpreting how vertices are grouped and stored, which is crucial for optimizing graph processing and analysis tasks.

Specifically, this PR provides:

  • Methods to query and retrieve metadata about vertex chunks.
  • Functions to read the boundaries and contents of individual vertex chunks.
  • Utilities to facilitate the navigation and processing of vertex data in a chunked manner.

This enhancement will benefit features requiring granular access to graph vertex data, such as distributed graph algorithms, incremental graph updates, and optimized data loading.

sapienza88 avatar Jun 28 '25 18:06 sapienza88

If m lines of data are split into n files:

  • The first n-1 files have chunkSize entries each (obtained from getChunkSize() on VertexInfo/EdgeInfo)
  • The last file has the remainder: m - (n-1)*chunkSize

So we don’t need to read each file's record count.

How do we know "m"? Is it the record count of the data files associated with the vertex?

sapienza88 avatar Jun 30 '25 16:06 sapienza88

It can be read from vertex_count.
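
The arithmetic above can be sketched in Java as follows. This is only an illustration under stated assumptions: `ChunkMath`, `chunkCount`, and `recordsInChunk` are hypothetical names, not part of the GraphAr Java API, and the caller is assumed to have already obtained m from vertex_count and chunkSize from getChunkSize().

```java
/** Hypothetical helper for illustration only; not part of the GraphAr Java API. */
final class ChunkMath {
    private ChunkMath() {}

    /** Number of chunk files n needed to hold vertexCount (m) records. */
    static long chunkCount(long vertexCount, long chunkSize) {
        if (vertexCount < 0 || chunkSize <= 0) {
            throw new IllegalArgumentException("vertexCount must be >= 0 and chunkSize > 0");
        }
        // Ceiling division written without (m + chunkSize - 1) to avoid overflow.
        return vertexCount / chunkSize + (vertexCount % chunkSize == 0 ? 0 : 1);
    }

    /** Number of records stored in the chunk with the given 0-based index. */
    static long recordsInChunk(long vertexCount, long chunkSize, long chunkIndex) {
        long n = chunkCount(vertexCount, chunkSize);
        if (chunkIndex < 0 || chunkIndex >= n) {
            throw new IndexOutOfBoundsException("chunkIndex out of range: " + chunkIndex);
        }
        // The first n-1 chunks are full; the last one holds m - (n-1)*chunkSize.
        return chunkIndex < n - 1 ? chunkSize : vertexCount - (n - 1) * chunkSize;
    }
}
```

For example, with m = 10 and chunkSize = 4 this gives 3 chunks holding 4, 4, and 2 records, without opening any data file.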

yangxk1 avatar Jul 01 '25 02:07 yangxk1

@yangxk1 pls provide review and merge

sapienza88 avatar Jul 24 '25 02:07 sapienza88

@yangxk1 pls allow CI to re-trigger automatically when I commit new code, so that I don't have to wait for you to do it manually. Thanks.

sapienza88 avatar Jul 31 '25 07:07 sapienza88

To ensure security, workflow execution for first-time contributors requires approval from a project committer.

You can change the branches setting in java-info.yml to trigger the workflow in your own fork, or run the script commands from java-info.yml directly in your local environment. Just remember to revert any changes to the .yml file before submitting your final PR.

yangxk1 avatar Jul 31 '25 11:07 yangxk1

@yangxk1 thanks for merging the PR on the version parsing, pls also do the same for this PR to let us merge this. PS: Allow edits by maintainers is enabled.

sapienza88 avatar Aug 11 '25 12:08 sapienza88

@yangxk1 thanks for updating the PR on the version parsing, pls also do the same for this PR to let us merge this. PS: Allow edits by maintainers is enabled.

This PR does not use protobuf, so it is not that urgent. I will still help you as soon as possible.

yangxk1 avatar Aug 11 '25 12:08 yangxk1

@yangxk1 pls approve and merge this or let me know the changes required to merge before the next release so it can be included in the next release.

sapienza88 avatar Aug 16 '25 16:08 sapienza88

Hi @unical1988, I think you should open an issue to discuss the necessity of this PR.

yangxk1 avatar Aug 18 '25 02:08 yangxk1

@yangxk1 I already described here what the PR is intended for; it couldn't be clearer.

sapienza88 avatar Aug 18 '25 09:08 sapienza88

You need to think about these:

If m lines of data are split into n files:

  • The first n-1 files have chunkSize entries each (obtained from getChunkSize() on VertexInfo/EdgeInfo)
  • The last file has the remainder: m - (n-1)*chunkSize

So we don’t need to read each file's record count.

Open an issue that focuses on discussion rather than description. This is also what the CONTRIBUTING document specifies.

yangxk1 avatar Aug 18 '25 09:08 yangxk1

@yangxk1 I can and will add tests; everything else has been discussed here. Pls note that this is implemented in C++.

sapienza88 avatar Aug 18 '25 10:08 sapienza88

@yangxk1 do you agree that this function doesn't require opening an issue? Why do you think it is not correct?

sapienza88 avatar Sep 19 '25 22:09 sapienza88

  1. We have not discussed whether this function is necessary.
  2. We don’t know whether it should be implemented in java-info, java-io, or some other module.

yangxk1 avatar Sep 22 '25 02:09 yangxk1