Is streamz not maintained anymore? What happened to cuStreamz?
As the title suggests, I would like to know why there is no active development anymore. Is streamz deprecated? Is it no longer maintained?
Also, does anyone know what happened to cuStreamz? (NVIDIA's GPU-accelerated streamz fork within their rapids.ai framework)
Yeah, I'm also wondering about this. I'd be happy to help with this, I actually like the framework very much.
Let's update it then I guess.
Hi! maintainer here. Thanks for the interest. Indeed, there has not been much activity at all in this project. I try to answer usage questions and fix anything broken, and I can make releases. However, there just hasn't been demand for new development. The niche streamz tried to fill seems to be one that few want (not that I have tried marketing).
@rjzamora , who at nvidia can speak to the status of cuStreamz?
I think custreamz is still maintained within the cudf repo, but I don’t think the project is very active. @jdye64 can probably provide a more accurate status :)
Hi. Panel and HoloViz ecosystem maintainer here.
I'm in constant doubt about whether I/we should still be recommending and promoting examples that use the Streamz library in our docs, for example https://panel.holoviz.org/reference/panes/Streamz.html and https://holoviews.org/user_guide/Streaming_Data.html.
When trying to use Streamz for my work-related use cases, I find it hard because it's not a very active library. And most of the functionality (if not all?) of Streamz can now be replaced by Param Reactive Expressions and Param Generators, for which I can find support in the community.
Any thoughts here? Would it be better for the Python ecosystem in general to continue using/recommending Streamz? Or is it a project that should not be used any more?
In https://github.com/holoviz/panel/pull/6980 I suggest recommending Panel/Param native features instead of Streamz. But that is my personal recommendation, based on my experience using Streamz.
> When trying to use Streamz for my work-related use cases, I find it hard because it's not a very active library.
Yes.
> And most of the functionality (if not all?) of Streamz can now be replaced by Param Reactive Expressions and Param Generators, for which I can find support in the community.
Is that so? I do not want to go off-topic here, so I will put the code in spoiler boxes, but I think Param only covers a subset of Streamz. I did not test the following code, so it might well contain bugs (I just quickly drafted something that came to mind where the two libraries would be similar).
If we were to build some reactive updating mechanism to dynamically change a parameter in a visualization task, then an implementation using param could look somewhat like this:
import numpy as np  # needed for np.linspace / np.sin below
import param
import panel as pn
import matplotlib.pyplot as plt

class ReactivePlot(param.Parameterized):
    # Typed, bounded parameter: a number in the interval [0.1, 10]
    frequency = param.Number(1, bounds=(0.1, 10))

    @param.depends('frequency')
    def view(self):
        # Re-run whenever `frequency` changes
        fig, ax = plt.subplots()
        t = np.linspace(0, 1, 100)
        y = np.sin(2 * np.pi * self.frequency * t)
        ax.plot(t, y)
        return fig

reactive_plot = ReactivePlot()
app = pn.Column(reactive_plot.param, reactive_plot.view)
app.show()
Whereas one might achieve something similar with streamz somewhat like this:
import numpy as np
import matplotlib.pyplot as plt
from streamz import Stream
import panel as pn

source = Stream()

def update_plot(frequency):
    fig, ax = plt.subplots()
    t = np.linspace(0, 1, 100)
    y = np.sin(2 * np.pi * frequency * t)
    ax.plot(t, y)
    plt.close(fig)  # avoid accumulating open figures
    return fig

plot_stream = source.map(update_plot)

def set_frequency(event):
    # Push each new slider value into the stream
    source.emit(event.new)

slider = pn.widgets.FloatSlider(name='Frequency', start=0.1, end=10, value=1)
slider.param.watch(set_frequency, 'value')

pn.panel(plot_stream.latest()).servable()
pn.Row(slider).servable()
# Serve with `panel serve <this_file>.py`; a bare pn.serve() call fails
# because pn.serve needs the objects to serve as its first argument.
In both cases, the crucial behavior to achieve would be having "something dynamic that gets updated throughout the process". This could be dynamic/reactive parameters using param or simply values that can flow at will through a pipeline that trigger a callback.
However, at least in my understanding @MarcSkovMadsen, the two libraries also differ a lot. param mostly focuses on (dynamic) parameters and even on specifying "types" (like "this is a number; interval is ..."). streamz does not specify or type anything. You can certainly filter or guard inputs or outputs at any point within the stream, but those would be functions (just like everything else) that you implement within the stream.
In my opinion, param focuses on the actual data, types, ... and their behavior, whereas the focus of streamz is not the data "running through the pipes" but rather providing an easy, functional framework for joining the "pipes" into one Stream (computational graph, network, ... whatever you like to call it). What actually flows and happens inside the stream is solely up to the functions that were chained.
Moreover, I fail to see where reactive behavior, parameter validation, lazy evaluation, ... would be available in streamz and, the other way around, I do not see where param would offer flow control, splitting and joining streams of data, batching & collecting, parallelization, ...
I think @Daniel451 's outline comparison of the libraries is fair. (@philippjfr surely has thoughts on whether param can be a complete replacement, with long experience with both)
As far as this library goes, indeed there has been no development, and there are no plans, as I said above. It always seemed like a great idea, but without quite enough of a demonstrative use case to get users interested. If your end use case is really connecting with panel or other viz/frontend, maybe param makes a lot of sense. If you actually want to do general purpose async event management with complex branching/aggregation logic in python, maybe you would end up writing something like streamz specialised to the particular use case.
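To make "writing something like streamz specialised to the particular use case" a bit more concrete, here is a minimal sketch of a hand-rolled, push-based pipeline with branching. The `Node` class and its method names are hypothetical and only loosely modelled on the streamz style; this is not the streamz API. It is synchronous and has no backpressure, buffering, or error handling.

```python
class Node:
    """Hypothetical minimal push-based pipeline node (not the streamz API)."""

    def __init__(self, func=None):
        self.func = func          # transformation applied to each element
        self.downstreams = []     # nodes that receive this node's output

    def emit(self, x):
        # Apply this node's function (if any), then push the result onward.
        result = x if self.func is None else self.func(x)
        for node in self.downstreams:
            node.emit(result)

    def map(self, func):
        # Attach a child node; calling map twice on one node creates a branch.
        child = Node(func)
        self.downstreams.append(child)
        return child

    def sink_to_list(self):
        # Terminal node that collects everything it receives.
        collected = []
        self.map(collected.append)
        return collected


source = Node()
doubled = source.map(lambda x: 2 * x).sink_to_list()   # branch 1
squared = source.map(lambda x: x * x).sink_to_list()   # branch 2

for value in range(4):
    source.emit(value)
```

Everything beyond this (async emitters, rate limiting, joins, windowing) is where a DIY version starts to grow towards what streamz already offers.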
Thx for taking the time @Daniel451. I also think your comparison is very fair.
For completeness, I was thinking about and referring to param.rx/panel.rx. Your example would look something like this:
`pn.rx`
import numpy as np
import matplotlib.pyplot as plt
import panel as pn

source = pn.rx(1)  # same as param.rx(1)

def update_plot(frequency):
    fig, ax = plt.subplots()
    t = np.linspace(0, 1, 100)
    y = np.sin(2 * np.pi * frequency * t)
    ax.plot(t, y)
    plt.close(fig)
    return fig

plot_stream = source.rx.pipe(update_plot)

def set_frequency(event):
    source.rx.value = event.new

slider = pn.widgets.FloatSlider(name='Frequency', start=0.1, end=10, value=1)
slider.param.watch(set_frequency, 'value')

pn.Column(plot_stream, slider).servable()
Here is an alternative version that is more .rx like
`pn.rx` - more .rx like
import numpy as np
import matplotlib.pyplot as plt
import panel as pn

slider = pn.widgets.FloatSlider(name='Frequency', start=0.1, end=10, value=1)
frequency = slider.rx()

t = np.linspace(0, 1, 100)
y = np.sin(2 * np.pi * frequency * t)

def update_plot(y):
    fig, ax = plt.subplots()
    ax.plot(t, y)
    plt.close(fig)
    return fig

plot_rx = pn.rx(update_plot)
plot_stream = plot_rx(y)  # could also have been y.rx.pipe(update_plot)

pn.panel(plot_stream).servable()
In my view, param.rx can play a role very similar to streamz.Stream. Param does not contain all the more advanced stream functionality that Streamz contains. But that is also the part that, in my experience, is hard to use, and it is often either not needed or simpler to understand (also for a team) when implemented directly in the project.
What is especially hard for me to understand is whether Streamz provides any kind of built-in performance optimizations, for example when displaying rolling windows of a DataFrame while streaming one row at a time. That would be a big advantage of Streamz over DIY via param.rx for specific use cases.
For completeness, here is a streaming version based on a generator function.
`pn.rx` - streaming generator
import numpy as np
import matplotlib.pyplot as plt
from time import sleep
import panel as pn

def source():
    while True:
        for i in range(0, 10):
            yield i
            sleep(0.25)

frequency = pn.rx(source)

t = np.linspace(0, 1, 100)
y = np.sin(2 * np.pi * frequency * t)

def update_plot(y):
    fig, ax = plt.subplots()
    ax.plot(t, y)
    plt.close(fig)
    return fig

plot_rx = pn.rx(update_plot)
plot_stream = plot_rx(y)  # could also have been y.rx.pipe(update_plot)

pn.panel(plot_stream).servable()
> whether Streamz provides any kind of performance optimizations
The plotting itself is handled by the panel/hv stack, so streamz only passes the data on. Streamz can handle stateful dataframes, including whole-dataframe updates and per-row updates, and hv allows in-browser data to be updated one row at a time without replacing previous data. You should get decent performance either way (within what is possible from the jupyter comm/websocket implementation).
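For the DIY side of that comparison, a rolling window fed one row at a time can be sketched with a bounded `collections.deque`. This is only an illustration of the idea, not what streamz or hv do internally; `WINDOW` and `push_row` are hypothetical names, and rows are plain dicts rather than DataFrame rows to keep the sketch dependency-free.

```python
from collections import deque

WINDOW = 3                      # hypothetical window length
window = deque(maxlen=WINDOW)   # the oldest row is dropped automatically


def push_row(row):
    """Append one row and return the current window as a list of rows."""
    window.append(row)
    return list(window)


# Stream five rows in, one at a time; only the last WINDOW rows are kept.
for i in range(5):
    latest = push_row({"t": i, "value": i * i})
```

Appending to a bounded deque is constant-time, so the cost per incoming row does not grow with the total number of rows seen.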
Rx may be useful for GUI apps, but I don't think streamz can be replaced for complex data pipelines, especially because it can scale with dask.
What's also important when going distributed is that data should not be updated in place, since tasks may have to be retried. For example, streamz' accumulation node lets you account for that: it gives you access to both acc (previous state) and x (current element), so you specify yourself how the state is updated, whereas rx abstracts this away.
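A minimal sketch of what such a non-mutating update function can look like. The function `add_count` and the sample events are made up for illustration, and `itertools.accumulate` stands in for the stream so the snippet runs without streamz; a function of the same `(acc, x)` shape is what gets passed to the accumulation node.

```python
from itertools import accumulate


def add_count(acc, x):
    """Return a NEW state; `acc` is never mutated, so a retry sees the same input."""
    new_state = dict(acc)                    # shallow copy of the previous state
    new_state[x] = new_state.get(x, 0) + 1   # update only the copy
    return new_state


start = {}
events = ["a", "b", "a"]

# itertools.accumulate plays the role of the stream here (assumption for the
# sake of a runnable example); each step receives the previous state and one element.
states = list(accumulate(events, add_count, initial=start))
```

Because every intermediate state is a separate object, re-running a step with the same `acc` and `x` gives the same result, which is the property a retried task relies on.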
What's also very nice is how you can call .visualize() on any node to see a flow-diagram of your entire pipeline.
I use streamz pretty much every day, so it's a shame that it's not maintained more actively.
While it has grouping by keys in the partition node, it should also support grouping in all other nodes that hold a state. E.g. here's an example of how I've adjusted streamz' accumulation node locally:
OB_latest_state = (
    agg_deltas
    .accumulate(
        update_book_state,
        start={},
        groupby=symbol_grouper_fn,  # <-- custom grouper
    )
    .map(OB_record_to_dataset)
)
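Note that `groupby=` above is a local modification, not a stock parameter. Without patching the node, something similar could probably be approximated by wrapping the update function so that the accumulated state is a dict keyed by group. The sketch below is an assumption about how that might look, not the implementation used above; `grouped`, `running_total`, and the sample events are hypothetical.

```python
def grouped(update, key_fn, start):
    """Wrap an (acc, x) update so that state is tracked separately per key."""
    def wrapper(states, x):
        key = key_fn(x)
        new_states = dict(states)  # copy so the previous state stays intact
        new_states[key] = update(states.get(key, start), x)
        return new_states
    return wrapper


def running_total(acc, x):
    # Per-key update: add this event's quantity to the key's running total.
    return acc + x["qty"]


step = grouped(running_total, key_fn=lambda x: x["symbol"], start=0)

states = {}
for event in [
    {"symbol": "AAA", "qty": 1},
    {"symbol": "BBB", "qty": 5},
    {"symbol": "AAA", "qty": 2},
]:
    states = step(states, event)
```

The wrapped `step` has the usual `(acc, x)` shape, so it could be handed to an accumulation node with `start={}`; the downside compared to native grouping is that every emitted state contains all keys.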
Another nice thing to add could be metrics per node, so it's easier to spot where the backpressure is coming from.
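Until something like that exists, a rough approximation is to wrap the functions before handing them to the nodes. The sketch below uses hypothetical names (`metrics`, `timed`) and only measures time spent inside each function, not time spent waiting in buffers, so it is a proxy for where the pipeline is slow rather than a true backpressure metric.

```python
import time
from collections import defaultdict

# Per-node counters, keyed by a name chosen by the caller.
metrics = defaultdict(lambda: {"calls": 0, "seconds": 0.0})


def timed(name, func):
    """Wrap `func` so every call updates the call count and total time for `name`."""
    def wrapper(*args, **kwargs):
        t0 = time.perf_counter()
        try:
            return func(*args, **kwargs)
        finally:
            m = metrics[name]
            m["calls"] += 1
            m["seconds"] += time.perf_counter() - t0
    return wrapper


# The wrapped functions are what would be passed to e.g. a map step.
parse = timed("parse", int)
square = timed("square", lambda x: x * x)

results = [square(parse(s)) for s in ["1", "2", "3"]]
```

Comparing `metrics[name]["seconds"]` across names then gives a first hint about which step dominates.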
Let's revisit this. I also now have some more time on my hands.
I am still watching, let me know if you have any concrete plans.
Will do. I'll take a look at things this weekend and see what I can work on.
Still going through the codebase to understand this a bit more. I'll have an update for you for next week.
Sorry if the Tornado stuff is outmoded - this package was started very long ago!
No worries.