Graph view builder (Smoother post-processing workflow for windowed data)

Open narnolddd opened this issue 2 years ago • 2 comments

Would be really nice to have a workflow for getting stats from multiple different algos and window sizes into a pandas dataframe (kind of similar to to_df in the old Raphtory for global state algorithms). I don't know whether this best lives in the core library or as a notebook example. Would be nice to have something like:

time  windowsize  number_of_nodes  number_of_vertices  other_metrics
1     86400       36               24                  ...
...   ...         ...              ...                 ...

Working with lists as numpy arrays is a bit painful when it comes to doing further processing on them.
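A minimal sketch of the kind of workflow described above, assuming nothing about the Raphtory API: `FakeWindow`, `collect_stats` and the metric names are hypothetical stand-ins, and the numbers are made up. It only illustrates collecting one row per (window size, window) with one column per metric.

```python
from dataclasses import dataclass
from typing import Callable, Dict, Iterable, List


@dataclass
class FakeWindow:
    """Hypothetical stand-in for one windowed graph view (not a Raphtory class)."""
    time: int
    nodes: int
    edges: int


def collect_stats(
    windows_by_size: Dict[int, Iterable[FakeWindow]],
    metrics: Dict[str, Callable[[FakeWindow], float]],
) -> List[dict]:
    """Return one row per (window size, window) with one column per metric."""
    rows = []
    for size, windows in windows_by_size.items():
        for w in windows:
            row = {"time": w.time, "windowsize": size}
            for name, fn in metrics.items():
                row[name] = fn(w)
            rows.append(row)
    return rows


# Made-up data, purely for illustration.
windows_by_size = {
    86400: [FakeWindow(1, 36, 24), FakeWindow(2, 40, 31)],
    3600: [FakeWindow(1, 5, 3)],
}
metrics = {
    "number_of_nodes": lambda w: w.nodes,
    "number_of_edges": lambda w: w.edges,
}
rows = collect_stats(windows_by_size, metrics)
```

Because `rows` is a list of dicts with identical keys, passing it to `pd.DataFrame(rows)` would give the long-format table sketched above, ready for groupby or pivoting on `windowsize`.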

narnolddd avatar May 19 '23 14:05 narnolddd

On this point, with #884 we added a function time_index() to the window sets returned by g.rolling() and similar functions. It gives you a Python iterable with this kind of scenario in mind. Basically you can do things like:

import pandas as pd

# g is an existing Raphtory graph
windows = g.rolling('1 day')

df = pd.DataFrame()
df['time'] = windows.time_index()
df['number_of_vertices'] = [w.num_vertices() for w in windows]
df

which should output:

time                 number_of_vertices
2020-06-21 12:34:65  86400
...                  ...

I think we could go further by implementing more vectorized functions on top of window sets. For instance, a num_vertices function that returns the number of vertices per window, so you can simply do:

df['number_of_vertices'] = windows.num_vertices()

In addition, we could make a function to_pandas available on these iterables that returns a pandas Series. That would let us carry the time index as the Series index. That way we could do things like:

windows = g.rolling('1 day')

df = pd.DataFrame()
df['number_of_vertices'] = windows.num_vertices().to_pandas()
df['number_of_edges'] = windows.num_edges().to_pandas()
df

And the output would be:

index                number_of_vertices  number_of_edges
2020-06-21 12:34:65  86400               340770
...                  ...                 ...

without needing to explicitly set the index. Another name for to_pandas() might be with_index(), but to_pandas() is maybe better at conveying the fact that we are moving into pandas world after calling this function.
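To make the proposal a bit more concrete, here is a rough sketch of how vectorized per-window functions and a to_pandas() step could fit together. Everything in it is an assumption for illustration: `IndexedValues` and `VectorizedWindows` are hypothetical names, not Raphtory classes, and plain dicts stand in for graph views.

```python
from typing import Callable, Sequence


class IndexedValues:
    """Hypothetical container pairing per-window values with their time index."""

    def __init__(self, index: Sequence, values: Sequence):
        if len(index) != len(values):
            raise ValueError("index and values must have the same length")
        self.index = list(index)
        self.values = list(values)

    def __iter__(self):
        return iter(self.values)

    def __len__(self):
        return len(self.values)

    def to_pandas(self):
        # Imported lazily so pandas stays an optional dependency.
        import pandas as pd
        return pd.Series(self.values, index=self.index)


class VectorizedWindows:
    """Hypothetical wrapper over (time, window) pairs; not Raphtory's window set."""

    def __init__(self, times: Sequence, windows: Sequence):
        self._times = list(times)
        self._windows = list(windows)

    def map(self, fn: Callable) -> IndexedValues:
        # Apply fn to every window and keep the time index alongside the results.
        return IndexedValues(self._times, [fn(w) for w in self._windows])

    def num_vertices(self) -> IndexedValues:
        return self.map(lambda w: w["vertices"])

    def num_edges(self) -> IndexedValues:
        return self.map(lambda w: w["edges"])


# Plain dicts stand in for graph views here; the numbers are made up.
ws = VectorizedWindows(
    times=["day 1", "day 2"],
    windows=[{"vertices": 3, "edges": 2}, {"vertices": 5, "edges": 7}],
)
vertex_counts = ws.num_vertices()
```

With pandas installed, `pd.DataFrame({"number_of_vertices": ws.num_vertices().to_pandas(), "number_of_edges": ws.num_edges().to_pandas()})` would line both columns up on the shared time index. The result is still directly iterable, so it also works for callers who never leave plain Python.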

ricopinazo avatar May 23 '23 10:05 ricopinazo

Hijacking this ticket a little bit as it would be great if we could create a view_builder class which allows:

  • Specifying combinations of views i.e. rolling window x these layers x these subgraphs
  • An apply function to specify what to run and return over all of these views
  • Post-processing helpers for the results
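One possible shape for such a class, as a sketch only: `ViewBuilder`, `over`, `combinations` and `apply` are hypothetical names chosen for illustration, and `fake_metrics` is a placeholder for whatever would actually build a view and run algorithms on it.

```python
from itertools import product
from typing import Any, Callable, Dict, List, Sequence


class ViewBuilder:
    """Hypothetical sketch; the names and methods are not an existing Raphtory API."""

    def __init__(self):
        self._dims: Dict[str, List[Any]] = {}

    def over(self, name: str, options: Sequence[Any]) -> "ViewBuilder":
        """Register one dimension of views, e.g. window sizes or layers."""
        self._dims[name] = list(options)
        return self

    def combinations(self) -> List[Dict[str, Any]]:
        """Cartesian product of all registered dimensions."""
        names = list(self._dims)
        return [dict(zip(names, combo)) for combo in product(*self._dims.values())]

    def apply(self, fn: Callable[..., Dict[str, Any]]) -> List[Dict[str, Any]]:
        """Run fn on every combination and merge its result into that combination's row."""
        return [{**combo, **fn(**combo)} for combo in self.combinations()]


def fake_metrics(window: str, layer: str) -> Dict[str, Any]:
    # Placeholder: a real function would build the view and run algorithms on it.
    return {"label": f"{layer}@{window}"}


results = (
    ViewBuilder()
    .over("window", ["1 day", "1 week"])
    .over("layer", ["follows", "mentions"])
    .apply(fake_metrics)
)
```

Each result row carries the view parameters together with the computed values, so post-processing helpers could stay thin: for example, the list of dicts can be handed straight to `pd.DataFrame(results)`.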

miratepuffin avatar Sep 07 '24 20:09 miratepuffin