feat(python): Unify the graph level load_from and save_to API & Bump up vineyard to v0.21.5
What do these changes do?
Unify the load_from and save_to APIs to support different kinds of data sources.
related discussion issue: #2836 #2920
save_to
def save_to(self, path, format="serialization", **kwargs)
We use save_to to dump a graph to data in a given format. It currently supports graphar and serialization, and it can be extended with new formats.
- path: output dir; supports local, oss, s3, hdfs
- format: format to save; default is 'serialization'
- related configurations: the writing of each format may take related configurations. We name these configurations with the strategy "format_config1", "format_config2". For example, graphar provides the configurations:
- graphar_graph_name
- graphar_vertex_chunk_size
- graphar_edge_chunk_size
- graphar_file_type
- graphar_store_in_local
- selector: to dump a subgraph with selected vertices and edges. Users can define a selector to dump only a certain set of vertices and edges, for example:
selector = {
"vertices": {
"person": ["id", "firstName", "secondName"],
"comment": None, # select all properties
},
"edges": {
"knows": ["CreationDate"],
"replyOf": None,
},
}
- return format
{"type": format, "URI": "format+file:///tmp/graph/xxx"}
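The selector semantics described above (a list picks specific properties, None picks all of them) can be sketched as follows. This is only an illustration: `resolve_selector` and the toy `schema` dict are hypothetical and are not part of the GraphScope API, and the real implementation may validate selectors differently.

```python
def resolve_selector(schema, selector=None):
    """Expand a selector into explicit property lists (hypothetical helper).

    schema and selector both look like
    {"vertices": {label: [props]}, "edges": {label: [props]}};
    in the selector, None means "all properties of this label".
    """
    if selector is None:
        # No selector: everything in the schema is kept.
        return {kind: {label: list(props) for label, props in labels.items()}
                for kind, labels in schema.items()}
    resolved = {}
    for kind in ("vertices", "edges"):
        resolved[kind] = {}
        for label, props in selector.get(kind, {}).items():
            if label not in schema[kind]:
                raise ValueError(f"unknown label in selector: {label}")
            available = schema[kind][label]
            if props is None:
                resolved[kind][label] = list(available)
                continue
            unknown = [p for p in props if p not in available]
            if unknown:
                raise ValueError(f"unknown properties for {label}: {unknown}")
            resolved[kind][label] = list(props)
    return resolved


# Toy schema, assumed for illustration only.
schema = {
    "vertices": {
        "person": ["id", "firstName", "secondName", "gender"],
        "comment": ["id", "content"],
        "post": ["id"],
    },
    "edges": {"knows": ["CreationDate", "weight"], "replyOf": []},
}
selector = {
    "vertices": {"person": ["id", "firstName", "secondName"], "comment": None},
    "edges": {"knows": ["CreationDate"], "replyOf": None},
}
resolved = resolve_selector(schema, selector)
```

Labels that do not appear in the selector (here `post`) are simply left out of the result, which matches the idea of dumping only a subgraph.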
Example:
# graphar
g.save_to("/tmp/graphar/", format="graphar", graphar_graph_name="ldbc", graphar_vertex_chunk_size=1024,
          graphar_edge_chunk_size=4096, graphar_file_type="parquet", graphar_store_in_local=False)
{'type': 'graphar', 'URI': 'graphar+file:///tmp/graphar/ldbc.graph.yml'}
# graphar with selector
selector = {
"vertices": {
"person": ["id", "firstName", "secondName"],
"comment": None, # select all properties
},
"edges": {
"knows": ["CreationDate"],
"replyOf": None,
},
}
g.save_to("/tmp/graphar/", format="graphar", selector=selector, graphar_graph_name="ldbc")
{'type': 'graphar', 'URI': 'graphar+file:///tmp/graphar/ldbc.graph.yml'}
# serialization
g.save_to("/tmp/serialization/")
{'type':'serialization', 'URI': '/tmp/serialization/'}
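One way to read the "format_config" naming strategy used in the calls above: every keyword argument that starts with the format name plus an underscore belongs to that format. A minimal sketch of such a split follows; `split_format_options` is a hypothetical helper written for illustration, not the function this PR adds.

```python
def split_format_options(fmt, kwargs):
    """Separate '<fmt>_<name>' keyword arguments from the rest (hypothetical helper)."""
    prefix = f"{fmt}_"
    format_options = {}
    other_options = {}
    for key, value in kwargs.items():
        if key.startswith(prefix):
            # Strip the prefix: 'graphar_graph_name' -> 'graph_name'.
            format_options[key[len(prefix):]] = value
        else:
            other_options[key] = value
    return format_options, other_options


options, rest = split_format_options(
    "graphar",
    {
        "graphar_graph_name": "ldbc",
        "graphar_edge_chunk_size": 4096,
        "selector": None,
    },
)
```

Under this reading, a key without the prefix ends up in `rest` and would not be treated as a graphar setting.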
load_from
def load_from(uri, **kwargs)
We use load_from to load a graph from a data source in a given format. It currently supports graphar and serialization, and it can be extended with new data sources.
- uri examples
graphar+file:///tmp/graphar/ldbc.graph.yaml
graphar+oss://bucket/graphar/ldbc.graphar.yaml
/tmp/serialization
- related configurations: the loading of each format may take related configurations (for example storage_options). We name these configurations with the strategy "format_config1", "format_config2". For example, graphar can set:
- graphar_store_in_local: the graphar files are stored in the local file system of the workers; only supported for the local file system.
- selector: to load a subgraph with selected vertices and edges. Users can define a selector to load only a certain set of vertices and edges, for example:
selector = {
"vertices": {
"person": None,  # select all properties
"comment": None,
},
"edges": {
"knows": None,
"replyOf": None,
},
}
- return: a new Graph
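The URI examples above suggest a "format+storage://location" convention, with a bare path falling back to serialization. A rough sketch of how such a URI could be split is shown here; `parse_graph_uri` is a hypothetical name, and the dispatch logic in the PR itself may differ.

```python
from urllib.parse import urlsplit


def parse_graph_uri(uri, default_format="serialization"):
    """Split 'format+storage://location' into (format, storage URI) (hypothetical helper)."""
    parts = urlsplit(uri)
    if "+" in parts.scheme:
        # 'graphar+file' -> format 'graphar', storage scheme 'file'.
        fmt, storage = parts.scheme.split("+", 1)
        return fmt, f"{storage}://{parts.netloc}{parts.path}"
    # No 'format+' prefix: treat the whole string as a path in the default format.
    return default_format, uri


local = parse_graph_uri("graphar+file:///tmp/graphar/ldbc.graph.yaml")
remote = parse_graph_uri("graphar+oss://bucket/graphar/ldbc.graphar.yaml")
plain = parse_graph_uri("/tmp/serialization")
```

Keeping the format in the scheme lets one entry point serve both formats without a separate `format` argument on the loading side.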
Example:
```python
# graphar
Graph.load_from("graphar+file:///tmp/graphar/ldbc.graph.yml", graphar_store_in_local=True)

# graphar with selector
selector = {
    "vertices": {
        "person": None,  # select all properties
        "comment": None,
    },
    "edges": {
        "knows": None,
        "replyOf": None,
    },
}
Graph.load_from("graphar+file:///tmp/graphar/ldbc.graph.yml", selector=selector)

# serialization
g.load_from("/tmp/serialization/")
```
Related issue number
Fixes #2836 Fixes #2920
Codecov Report
Attention: Patch coverage is 89.03226%, with 17 lines in your changes missing coverage. Please review.
Project coverage is 42.96%. Comparing base (d46a354) to head (0971f2b). Report is 12 commits behind head on main.
Additional details and impacted files
@@ Coverage Diff @@
## main #3610 +/- ##
===========================================
+ Coverage 27.76% 42.96% +15.19%
===========================================
Files 178 188 +10
Lines 16245 17540 +1295
===========================================
+ Hits 4511 7536 +3025
+ Misses 11734 10004 -1730
| Files | Coverage Δ | |
|---|---|---|
| python/graphscope/__init__.py | 82.85% <ø> (-0.48%) | :arrow_down: |
| python/graphscope/client/session.py | 68.25% <ø> (+9.00%) | :arrow_up: |
| python/graphscope/framework/dag_utils.py | 65.31% <100.00%> (+22.94%) | :arrow_up: |
| python/graphscope/framework/graph_builder.py | 87.17% <ø> (+24.67%) | :arrow_up: |
| python/graphscope/tests/unittest/test_graphar.py | 100.00% <100.00%> (ø) | |
| python/graphscope/framework/graph.py | 82.87% <76.05%> (+17.68%) | :arrow_up: |
... and 89 files with indirect coverage changes
Legend: Δ = absolute <relative> (impact), ø = not affected, ? = missing data. Last update bca8b74...0971f2b.
https://github.com/alibaba/GraphScope/actions/runs/8720836297/job/23936695839?pr=3610 The k8s-test is always cancelled after ~44m. Can someone help me with the CI? @siyuan0322 @dashanji @lidongze0629
@siyuan0322 Since the manylinux2014-ci-test tag workflow has passed in https://github.com/alibaba/GraphScope/actions/runs/8733355972, I have updated the manylinux2014 tag runner. FYI