Low rendering performance due to glReadPixels
I have been using visvis for a while now, but I get the impression I must be
doing something wrong.
Rendering performance has always been low, but it is starting to become an
issue. A simple mesh plot from any of the examples, for instance, will update
at no more than a few (<10) fps. I am running on a 2012 retina MBP. My graphics
card works fine in every other application. What gives? I see the same
behavior on my work computer (Win7, GTX 680; hardware otherwise not the issue
either) and on my old MBP.
I don't get any errors initializing OpenGL on startup or anything. But even if
it were software rendering, I'd be disappointed. There are fewer triangles on
screen than in Duke3D, and that runs buttery smooth on my Android phone.
Looking at the visvis code, I don't see anything suspicious.
Is it just me? There seems to be a fill-rate dependency: if I shrink the
window, frame rates become tolerable. But seriously...
Original issue reported on code.google.com by [email protected] on 16 Apr 2013 at 1:42
The problem is related to calls to glReadPixels(), which are very slow. This
creates a bottleneck, so that even the simplest graphics draw at low speed.
The extent to which these calls slow down rendering varies a lot between OSes
and drivers (and not so much with how fast the card actually is). This can be
fixed, but that requires some refactoring, and I haven't gotten around to it
yet.
It should help if you set the 'useBuffer' property of the Axes objects to
False, since that would get rid of some of the calls.
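For illustration, a minimal sketch of that workaround (the plot call and the
event-loop boilerplate are just placeholders for whatever your script already
does):

```python
import visvis as vv

vv.plot([1, 2, 3, 2, 4])   # any simple scene
axes = vv.gca()            # grab the current Axes object
axes.useBuffer = False     # skip the buffered-render path and its glReadPixels calls

app = vv.use()
app.Run()
```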
Original comment by [email protected] on 16 Apr 2013 at 5:54
- Changed state: Accepted
- Added labels: Type-Enhancement
- Removed labels: Type-Defect
Thanks for the prompt reply. So is it correct to say that the issue is the
transfer of the frame from the GPU to the UI code running on the CPU, which
then has to send it back to its own GPU backend eventually?
I can imagine that keeping that buffer in GPU memory may be a bit of a pain,
depending on whether and how the UI backend has been designed to accommodate
that. A shame, though; I'd be thrilled if you did get around to fixing it! I
know Qt at least has support for this kind of thing.
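Roughly what I have in mind, in raw PyOpenGL terms (just a sketch, not visvis
code; the names and boilerplate are illustrative): render into a framebuffer
object and blit it to the window, so the frame never leaves GPU memory.

```python
from OpenGL.GL import (
    glGenFramebuffers, glBindFramebuffer, glGenTextures, glBindTexture,
    glTexImage2D, glFramebufferTexture2D, glBlitFramebuffer,
    GL_FRAMEBUFFER, GL_READ_FRAMEBUFFER, GL_DRAW_FRAMEBUFFER,
    GL_TEXTURE_2D, GL_RGBA, GL_UNSIGNED_BYTE, GL_COLOR_ATTACHMENT0,
    GL_COLOR_BUFFER_BIT, GL_NEAREST,
)

def create_offscreen_buffer(width, height):
    """Create an FBO with an RGBA texture attachment to render into."""
    tex = glGenTextures(1)
    glBindTexture(GL_TEXTURE_2D, tex)
    glTexImage2D(GL_TEXTURE_2D, 0, GL_RGBA, width, height, 0,
                 GL_RGBA, GL_UNSIGNED_BYTE, None)
    fbo = glGenFramebuffers(1)
    glBindFramebuffer(GL_FRAMEBUFFER, fbo)
    glFramebufferTexture2D(GL_FRAMEBUFFER, GL_COLOR_ATTACHMENT0,
                           GL_TEXTURE_2D, tex, 0)
    glBindFramebuffer(GL_FRAMEBUFFER, 0)
    return fbo, tex

def present(fbo, width, height):
    """Blit the offscreen frame to the window; no CPU round-trip."""
    glBindFramebuffer(GL_READ_FRAMEBUFFER, fbo)
    glBindFramebuffer(GL_DRAW_FRAMEBUFFER, 0)  # 0 = the window's framebuffer
    glBlitFramebuffer(0, 0, width, height, 0, 0, width, height,
                      GL_COLOR_BUFFER_BIT, GL_NEAREST)
```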
I suppose it is somewhat reassuring news, though; I was worried that if
performance is like this for the simple scene I am drawing now, I might as
well draw the scenes I eventually want by hand. But if this is the problem, at
least it shouldn't get worse.
Original comment by [email protected] on 16 Apr 2013 at 3:24
> So is it correct to say that the issue is the transfer of the frame from the
GPU to the UI code running on the CPU, which then has to send it back to its
own GPU backend eventually?
For the axes it is, yes. Actually, on my Linux machine with NVIDIA it makes
things faster if I have a volume rendering in one Axes and a simple plot in
another. But indeed, I've seen terrible frame rates on quite modern ATI
hardware on Windows.
There is one place where the pixels are grabbed in order to support picking.
This can be removed as well, but requires some redesign.
> I suppose it is somewhat reassuring news, though; I was worried that if
performance is like this for the simple scene I am drawing now, I might as
well draw the scenes I eventually want by hand. But if this is the problem, at
least it shouldn't get worse.
Rest assured, I've done animated volume rendering at near-interactive
frame rates :)
Original comment by [email protected] on 17 Apr 2013 at 5:59
I am running NVIDIA under Win7 64-bit. But Linux has been tempting me for
other reasons as well; maybe I should give that a spin. I know readback from
the GPU is never going to be a path to killer performance, but it is a bit sad
to see that even in the days of PCIe 3 it isn't possible to send a single
framebuffer back and forth at anything resembling real-time frame rates. There
isn't anything in the hardware that should make this convenience so
extravagantly expensive, so the blame probably lies with some shitty Windows
kernel driver... perhaps someone knows whether Windows 8 gives any relief here?
Original comment by [email protected] on 17 Apr 2013 at 7:14
In other projects we've been quite successful at using glReadPixels only for the specific pixel under the mouse. This should be a significant performance win.
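In plain PyOpenGL terms it looks something like this (a sketch, not the actual
code; the coordinate handling and helper name are illustrative):

```python
import numpy as np
from OpenGL.GL import glReadPixels, GL_RGBA, GL_UNSIGNED_BYTE

def pick_pixel(mouse_x, mouse_y, window_height):
    """Read back only the 1x1 RGBA region under the mouse:
    4 bytes instead of a full framebuffer transfer."""
    gl_y = window_height - mouse_y - 1  # GL's origin is bottom-left
    raw = glReadPixels(mouse_x, gl_y, 1, 1, GL_RGBA, GL_UNSIGNED_BYTE)
    return np.frombuffer(raw, dtype=np.uint8)  # -> array([r, g, b, a])
```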
Looked into this briefly, but the screenshot functionality is intertwined with e.g. Axes.useBuffer. Improving this might easily break code. Probably not worth the effort.
In PyGfx we have a kick-ass picking system that only selects the pixel of interest.