Culling and zooming to 100% puts the system under full load / crashes dt / causes a Windows blue screen
Describe the bug
In lighttable while culling, e.g. with two pictures in my selection, I zoom to 100%. Depending on the pictures, the CPU reaches 100% load quite soon and RAM fills up completely. This state persists for a very long time, rendering the whole system more or less unresponsive. Sometimes it ends in an application crash or even a blue screen. This was not happening ~2-3 months (perhaps even longer) ago on the same system.
The application crash is reported in Windows event viewer for darktable\bin\gdbus.exe => C:\WINDOWS\System32\ucrtbase.dll
Some crashes also involved the nvidia driver C:\WINDOWS\System32\DriverStore\FileRepository\nvamsi.inf_amd64_e3b048849db4cdea\Display.NvContainer\NVDisplay.Container.exe
Working with the pictures in darkroom is fine, the overall response normal and fluent.
Steps to reproduce
Select two pictures (see attachment) for culling, then quickly zoom to 100%, e.g. using the mouse wheel. The issue occurs both in full screen mode (F11) and without.
Expected behavior
Remaining responsive; neither a crash nor a blue screen would be fine.
Logfile | Screenshot | Screencast
darktable-log.txt darktable-log_06_zoom_100perc_2_pictures_culling_crash_bluescreen.txt
Commit
No response
Where did you obtain darktable from?
self compiled
darktable version
4.9.0+865~gf3a78631ad
What OS are you using?
Windows
What is the version of your OS?
Windows 11
Describe your system?
Laptop, Intel i5 8265U, 8GB RAM, nvidia GTX 1050 Max-Q Design (4GB)
Are you using OpenCL GPU in darktable?
Yes
If yes, what is the GPU card and driver?
nvidia GTX 1050 Max-Q Design (4GB), driver 561.09
Please provide additional context if applicable. You can attach files too, but might need to rename to .txt or .zip
2 NEF with related XMP: picture_01.zip picture_02.zip
Right now I can't think of anything recently introduced in dt that would increase memory requirements and could be the culprit.
Two points from your log:
2,9664 [dt_get_sysresource_level] switched to 2 as `large'
2,9664 total mem: 8043MB
sounds hairy. That is pretty little memory; heavy resource usage might be a problem depending on the memory already used by the OS or other running apps.
USE HEADROOM: 400Mb
might be very small if other apps (Firefox, for example) or the OS also make use of the nvidia card.
Please check.
Thanks for looking into this so quickly - I double-checked on my computer:
- my GPU has already been configured to be just used by darktable
- I changed options > processing > darktable resources from "large" to "default"
Unfortunately, no change: I took another pair of pictures, zoomed in to 100%, and after some (long) time of eating up CPU and RAM I got a blue screen again. Sorry that I'm not able to report anything else 😬 Currently I'm on [40f497b4].
Ok, one more thing in the logs seems to be wrong: the CL device simply does not have that amount of memory.
185,8313 process CPU [thumbnail] diffuse ( 0/ 0) 3728x5600 scale=1,0000 --> ( 0/ 0) 3728x5600 scale=1,0000 34 IOP_CS_RGB 5428MB
187,1191 process CPU [thumbnail] diffuse.1 ( 0/ 0) 3728x5600 scale=1,0000 --> ( 0/ 0) 3728x5600 scale=1,0000 35 IOP_CS_RGB 4092MB
290,9487 process CL0 [thumbnail] exposure ( 0/ 0) 5600x3728 scale=1,0000 --> ( 0/ 0) 5600x3728 scale=1,0000 couldn't copy image to OpenCL device
290,9750 pipe aborts CL0 [thumbnail] exposure ( 0/ 0) 5600x3728 scale=1,0000 --> ( 0/ 0) 5600x3728 scale=1,0000 couldn't run module on GPU, falling back to CPU
290,9911 process CL0 [thumbnail] exposure ( 0/ 0) 5600x3728 scale=1,0000 --> ( 0/ 0) 5600x3728 scale=1,0000 couldn't copy data back to host memory (C)
Could you please do the following (with darktable not running):
- Edit your darktablerc file, remove all lines containing opencl, and delete the OpenCL kernels (I don't know how to do that on Windows, but you will find out :-))
- Then use `darktable -d pipe -d opencl -d memory -d tiling` for the log.
For sure your system is under big memory stress, and we have to find out whether something goes wrong with OpenCL tiling ...
BTW, does this all work without crashes if OpenCL is not activated? (It will be very slow.)
So I deleted the OpenCL kernel cache (at %localappdata%\Microsoft\Windows\INetCache\darktable) and cleaned up darktablerc as you suggested. The first try resulted in a blue screen; the second (after rebooting the machine, and thus with more RAM available) just took a very long time and then the application returned.
Logfiles:
- with blue screen: darktable-log_01.txt
- without blue screen: darktable-log_02.txt
I will post feedback about what happens when deactivating openCL later this evening...
After turning OpenCL off and restarting darktable (to be sure), I did two more tests, both resulting in an application crash or even a blue screen:
- less available RAM (browser was running): darktable-log_03.txt
- more available RAM after fresh boot due to a blue screen 😁: darktable-log_04.txt
Ok, and thanks. The memory consumption is definitely too high. Not sure yet whether this is something we can handle better in dt.
I think I understand now what's happening. You are requesting two mipmap images, thus starting two export pipelines; the second is started immediately after the first.
The algorithm that decides whether dt should tile doesn't know about this and grants every pipe the full amount that dt allows.
I don't know how to fix that right now; it certainly requires a lot of thinking and source reading. I am pretty sure it has been like this for a very long time. We relied on swapping to keep us safe, which is certainly no longer true.
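A back-of-the-envelope sketch (not darktable code) of the overcommit described above, using the numbers from the attached logs:

```python
# Numbers taken from the logs in this issue.
total_ram_mb = 8043        # "total mem: 8043MB"
peak_per_pipe_mb = 5428    # largest single-module buffer seen ("diffuse ... 5428MB")
pipes = 2                  # two culling thumbnails start two export pipelines

# Each pipe is granted the full budget, so combined demand is additive.
combined_mb = pipes * peak_per_pipe_mb
print(combined_mb, combined_mb > total_ram_mb)  # → 10856 True
```

With both pipes allowed the whole budget, the combined demand exceeds physical RAM, which matches the observed swapping and instability.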
Maybe something else has changed on your system, making it more unstable when it's forced into swapping.
Thanks for analyzing this so quickly.
I noted that it also seems to depend on the set of applied modules. My base processing stack includes three instances of D&S. As the color equalizer is such a nice tool, which I have been using for some months now, I would assume this might be the crucial change on my system... I double-checked and found that a set of pictures I used for providing lens correction (mostly featureless white/grey sky) simply doesn't run into this problem.
So, as this issue seems to be quite complicated to solve, as you already indicated, the bottom line seems to be: invest in an additional SO-DIMM module 😁 If there is anything I can do to assist further, please just let me know 👋
@Macchiato17 If OpenCL is enabled, you might try the "very fast GPU" mode and make sure you have set `opencl_mandatory_timeout=10000` or similar in darktablerc to enforce the OpenCL pipe. I think that should help until you get your module :-)
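For reference, the darktablerc lines in question might look like the following sketch. Note that `opencl_scheduling_profile` is my assumption for the key behind the "very fast GPU" preference; check your own darktablerc for the exact key names.

```
opencl_scheduling_profile=very fast GPU
opencl_mandatory_timeout=10000
```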
I really appreciate your great commitment to dt development and all the effort you put into optimization, also for quite small and not-too-powerful systems (as mine perhaps is 😉). I just saw your PR already merged into master, did a compile and gave it a try (turning darktablerc back to "very fast GPU" and applying the increased timeout value as you suggested).
---> 😎 <---
The GPU is heavily stressed 😁, the system stays responsive, and rendering the 100% zoom in lighttable returns in an acceptable amount of time. Thanks so much again for your strong support and such a quick fix. That's certainly not to be taken for granted!
Have a good time 👋