Graph view builder (Smoother post-processing workflow for windowed data)
It would be really nice to have a workflow for getting stats from multiple algorithms and window sizes into a pandas DataFrame (similar to to_df in the old Raphtory for global state algorithms). I don't know whether this belongs in the core library or in a notebook example. It would be nice to have something like:
| time | windowsize | number_of_nodes | number_of_vertices | other_metrics |
|---|---|---|---|---|
| 1 | 86400 | 36 | 24 | ... |
| ... | ... | ... | ... | ... |
Working with lists as numpy arrays is a bit painful for further processing.
On this point, in #884 we added a time_index() function to the window sets returned by g.rolling() and similar functions. It gives you a Python iterable and was designed with this kind of scenario in mind. Basically, you can do things like:
import pandas as pd
windows = g.rolling('1 day')
df = pd.DataFrame()
df['time'] = windows.time_index()
df['number_of_vertices'] = [w.num_vertices() for w in windows]
df
which should output:
| time | number_of_vertices |
|---|---|
| 2020-06-21 12:34:56 | 86400 |
| ... | ... |
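Extending this to several window sizes and metrics, as in the table requested at the top of this issue, could look roughly like the sketch below. The helper collect_window_stats and the Toy* classes are hypothetical stand-ins (not part of Raphtory) so the snippet runs on its own. It only assumes that g.rolling() returns something iterable that also exposes time_index(), as described above.

```python
from typing import Any, Callable, Dict, Iterable, List


def collect_window_stats(
    graph: Any,
    window_sizes: Iterable[Any],
    metrics: Dict[str, Callable[[Any], Any]],
) -> List[Dict[str, Any]]:
    """Build one row per (window size, window), with one column per metric."""
    rows: List[Dict[str, Any]] = []
    for size in window_sizes:
        windows = graph.rolling(size)
        # Materialise the timestamps first, then walk the views in the same order.
        times = list(windows.time_index())
        for time, view in zip(times, windows):
            row = {"time": time, "windowsize": size}
            for name, metric in metrics.items():
                row[name] = metric(view)
            rows.append(row)
    return rows


# --- Hypothetical stand-ins so the sketch runs without a real graph ---
class ToyView:
    def __init__(self, n_vertices: int, n_edges: int) -> None:
        self._nv, self._ne = n_vertices, n_edges

    def num_vertices(self) -> int:
        return self._nv

    def num_edges(self) -> int:
        return self._ne


class ToyWindowSet:
    def __init__(self, times: list, views: list) -> None:
        self._times, self._views = times, views

    def time_index(self):
        return iter(self._times)

    def __iter__(self):
        return iter(self._views)


class ToyGraph:
    def rolling(self, size: int) -> ToyWindowSet:
        # Two windows per size; the counts are made up for illustration.
        return ToyWindowSet([size, 2 * size], [ToyView(3, 2), ToyView(5, 4)])


rows = collect_window_stats(
    ToyGraph(),
    window_sizes=[10, 20],
    metrics={
        "number_of_vertices": lambda v: v.num_vertices(),
        "number_of_edges": lambda v: v.num_edges(),
    },
)
```

The resulting list of dicts can be passed directly to pd.DataFrame(rows) to get one row per window and window size.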
I think we could go further by implementing more vectorized functions on top of window sets. For instance, a num_vertices function that returns the number of vertices per window, so you can simply do:
df['number_of_vertices'] = windows.num_vertices()
In addition, these iterables could offer a to_pandas function that returns a pandas Series. That would let us carry the time index as the index of the Series. That way we could do things like:
import pandas as pd
windows = g.rolling('1 day')
df = pd.DataFrame()
df['number_of_vertices'] = windows.num_vertices().to_pandas()
df['number_of_edges'] = windows.num_edges().to_pandas()
df
And the output would be:
| index | number_of_vertices | number_of_edges |
|---|---|---|
| 2020-06-21 12:34:56 | 86400 | 340770 |
| ... | ... | ... |
without needing to explicitly set the index. Another name for to_pandas() might be with_index(), but to_pandas() probably conveys better that we are moving into the pandas world after calling this function.
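To make the proposal concrete, here is a minimal sketch of what such a per-window result could look like. WindowMetric is a hypothetical name, not an existing Raphtory class, and the timestamps and counts are made-up illustration values. The idea is that a vectorized call such as windows.num_vertices() would return an object like this.

```python
import pandas as pd


class WindowMetric:
    """Hypothetical per-window result that remembers the window timestamps."""

    def __init__(self, times, values):
        self._times = list(times)
        self._values = list(values)

    def __iter__(self):
        # Still usable as a plain iterable of values.
        return iter(self._values)

    def to_pandas(self) -> pd.Series:
        # The window timestamps become the Series index.
        return pd.Series(self._values, index=pd.Index(self._times, name="time"))


# Illustration values only; a real window set would supply these.
times = pd.to_datetime(["2020-06-21", "2020-06-22", "2020-06-23"])
num_vertices = WindowMetric(times, [36, 40, 41])
num_edges = WindowMetric(times, [24, 30, 33])

df = pd.DataFrame()
df["number_of_vertices"] = num_vertices.to_pandas()
df["number_of_edges"] = num_edges.to_pandas()
```

Because the first Series is assigned to an empty DataFrame, the frame adopts its time index, and later columns are aligned on that same index.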
Hijacking this ticket a little bit as it would be great if we could create a view_builder class which allows:
- Specifying combinations of views, e.g. rolling window x these layers x these subgraphs
- An apply function to specify what to run and return over all of these views
- Post-processing helpers for the results
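As a starting point for discussion, the three bullets above could be sketched like this. Everything here is hypothetical: the ViewBuilder class, its dimension() and apply() methods, and the toy edge list that stands in for a graph. Each dimension maps a label to a function that narrows the current view, and apply() runs a function over the cross product of all dimensions.

```python
from itertools import product
from typing import Any, Callable, Dict, List, Tuple


class ViewBuilder:
    """Hypothetical builder that applies a function to every combination of views."""

    def __init__(self, graph: Any) -> None:
        self._graph = graph
        self._dimensions: List[Tuple[str, Dict[Any, Callable[[Any], Any]]]] = []

    def dimension(self, name: str, options: Dict[Any, Callable[[Any], Any]]) -> "ViewBuilder":
        """Register one axis of the grid: options maps a label to a view -> view function."""
        self._dimensions.append((name, dict(options)))
        return self

    def apply(self, func: Callable[[Any], Any]) -> List[Dict[str, Any]]:
        """Run func on every combination and return one labelled row per combination."""
        names = [name for name, _ in self._dimensions]
        option_items = [list(opts.items()) for _, opts in self._dimensions]
        rows: List[Dict[str, Any]] = []
        for combo in product(*option_items):
            view = self._graph
            row: Dict[str, Any] = {}
            # Narrow the view one dimension at a time, recording the labels used.
            for name, (label, transform) in zip(names, combo):
                view = transform(view)
                row[name] = label
            row["result"] = func(view)
            rows.append(row)
        return rows


# Toy "graph": a list of (time, layer, src, dst) edges, for illustration only.
edges = [
    (1, "a", 1, 2),
    (2, "b", 2, 3),
    (3, "a", 3, 4),
    (4, "b", 1, 4),
]

rows = (
    ViewBuilder(edges)
    .dimension("window", {
        "t<=2": lambda es: [e for e in es if e[0] <= 2],
        "t<=4": lambda es: [e for e in es if e[0] <= 4],
    })
    .dimension("layer", {
        "a": lambda es: [e for e in es if e[1] == "a"],
        "b": lambda es: [e for e in es if e[1] == "b"],
    })
    .apply(len)  # here the "metric" is simply the number of edges in the view
)
```

Returning labelled rows keeps the post-processing step simple, since a list of dicts converts directly into a pandas DataFrame with one column per dimension.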