Culling and zooming to 100% puts the system to full load / crash of dt / Windows blue screen

Open Macchiato17 opened this issue 1 year ago • 7 comments

Describe the bug

In lighttable while culling, e.g. with two pictures in my selection, I zoom to 100%. Depending on the pictures, the CPU soon reaches 100% load and RAM fills up completely. This state lasts a very long time and leaves the whole system more or less unresponsive. Sometimes it ends in an application crash or even a blue screen. This did not happen ~2-3 months ago (perhaps even longer) on the same system.

Windows Event Viewer reports the application crash for darktable\bin\gdbus.exe => C:\WINDOWS\System32\ucrtbase.dll

Some crashes also involved the nvidia driver C:\WINDOWS\System32\DriverStore\FileRepository\nvamsi.inf_amd64_e3b048849db4cdea\Display.NvContainer\NVDisplay.Container.exe

Working with the pictures in darkroom is fine; the overall response is normal and fluent.

Steps to reproduce

Select two pictures (see attachment) for culling, then quickly zoom to 100%, e.g. using the mouse-wheel. The issue occurs in full screen mode (F11) or without.

Expected behavior

More responsiveness; neither an application crash nor a blue screen.

Logfile | Screenshot | Screencast

darktable-log.txt darktable-log_06_zoom_100perc_2_pictures_culling_crash_bluescreen.txt

Commit

No response

Where did you obtain darktable from?

self compiled

darktable version

4.9.0+865~gf3a78631ad

What OS are you using?

Windows

What is the version of your OS?

Windows 11

Describe your system?

Laptop, Intel i5 8265U, 8GB RAM, nvidia GTX 1050 Max-Q Design (4GB)

Are you using OpenCL GPU in darktable?

Yes

If yes, what is the GPU card and driver?

nvidia GTX 1050 Max-Q Design (4GB), driver 561.09

Please provide additional context if applicable. You can attach files too, but might need to rename to .txt or .zip

2 NEF with related XMP: picture_01.zip picture_02.zip

Macchiato17, Oct 17 '24 20:10

I can't think of anything introduced in dt that could lead to increased memory requirements that could be the culprit right now.

Two points from your log:

     2,9664 [dt_get_sysresource_level] switched to 2 as `large'
     2,9664   total mem:       8043MB

This sounds hairy. With such a small amount of memory, letting darktable take lots of resources might well be a problem, depending on the memory already used by the OS or other running apps.

     USE HEADROOM:             400Mb

This might be too small if other apps (Firefox, for example) or the OS make use of the nvidia card.

Please check.

jenshannoschwalm, Oct 20 '24 08:10

Thanks for looking into this so quickly. I double-checked on my computer:

  • my GPU has already been configured to be just used by darktable
  • I changed options > processing > darktable resources from "large" to "default"

Unfortunately, no change: I took another pair of pictures, zoomed in to 100%, and after a (long) time of eating up CPU and RAM, I got a blue screen again. Sorry that I'm not able to report anything else 😬 Currently I'm on [40f497b4].

Macchiato17, Oct 20 '24 14:10

Ok, one more thing in the logs that seems to be wrong. The CL device simply does not have that amount of memory.

   185,8313 process                   CPU [thumbnail]      diffuse                (   0/   0) 3728x5600 scale=1,0000 --> (   0/   0) 3728x5600 scale=1,0000  34 IOP_CS_RGB 5428MB
   187,1191 process                   CPU [thumbnail]      diffuse.1              (   0/   0) 3728x5600 scale=1,0000 --> (   0/   0) 3728x5600 scale=1,0000  35 IOP_CS_RGB 4092MB
   290,9487 process                   CL0 [thumbnail]      exposure               (   0/   0) 5600x3728 scale=1,0000 --> (   0/   0) 5600x3728 scale=1,0000 couldn't copy image to OpenCL device
   290,9750 pipe aborts               CL0 [thumbnail]      exposure               (   0/   0) 5600x3728 scale=1,0000 --> (   0/   0) 5600x3728 scale=1,0000 couldn't run module on GPU, falling back to CPU
   290,9911 process                   CL0 [thumbnail]      exposure               (   0/   0) 5600x3728 scale=1,0000 --> (   0/   0) 5600x3728 scale=1,0000 couldn't copy data back to host memory (C)
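To put these figures in perspective, here is a rough back-of-the-envelope sketch. It assumes (for illustration only, not something the log states) that each full-resolution pipeline buffer holds 4 float32 channels per pixel, and it treats the log's "MB" loosely as MiB. Under those assumptions one 5600x3728 buffer needs about 318.6 MiB, so the 5428MB reported for diffuse corresponds to roughly 17 such buffers.

```python
# Rough memory estimate for full-resolution pipeline buffers.
# Assumption: 4 channels (RGBA) of float32 per pixel.
BYTES_PER_FLOAT32 = 4
CHANNELS = 4


def buffer_bytes(width: int, height: int, channels: int = CHANNELS) -> int:
    """Bytes needed for one width x height buffer of float32 channels."""
    return width * height * channels * BYTES_PER_FLOAT32


def to_mib(n_bytes: int) -> float:
    """Convert bytes to MiB (2**20 bytes)."""
    return n_bytes / 2**20


one_buffer_mib = to_mib(buffer_bytes(5600, 3728))
print(f"one 5600x3728 buffer: {one_buffer_mib:.1f} MiB")

# Figures taken from the log excerpts in this thread.
diffuse_mb = 5428
total_mem_mb = 8043
print(f"diffuse is about {diffuse_mb / one_buffer_mib:.0f} buffer-equivalents")
# Hypothetical worst case: two pipes peaking at the same time with a similar footprint.
print(f"two such pipes at once: {2 * diffuse_mb} MB vs {total_mem_mb} MB total")
```

On an 8GB laptop, even one such module leaves little room; two overlapping pipes of that size would not fit in RAM at all.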

Could you please do the following (with darktable not running):

Edit your darktablerc file, remove all lines containing opencl, and delete the compiled OpenCL kernels (I don't know how to do that on Windows, but you will find out :-)

Then use darktable -d pipe -d opencl -d memory -d tiling for the log.

Your system is certainly under heavy memory stress, and we have to find out whether something goes wrong with OpenCL tiling.

BTW, does this all work without crashes if opencl is not activated? (It will be very slow)
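The darktablerc clean-up requested above can be scripted. This is a minimal sketch, assuming darktablerc is a plain text file with one setting per line; the function names are hypothetical, and the path to darktablerc is passed in rather than guessed. It keeps a backup copy and should only be run while darktable is closed, as requested. Deleting the OpenCL kernel cache is left as a manual step because its location depends on the installation.

```python
import shutil
from pathlib import Path


def drop_opencl_lines(text: str) -> str:
    """Return config text without any line that mentions 'opencl' (case-insensitive)."""
    kept = [line for line in text.splitlines(keepends=True)
            if "opencl" not in line.lower()]
    return "".join(kept)


def clean_darktablerc(path: Path) -> int:
    """Back up darktablerc, strip opencl lines in place, return the number of removed lines.

    Run this only while darktable is closed.
    """
    original = path.read_text(encoding="utf-8")
    # Keep a copy next to the original, e.g. darktablerc.bak
    shutil.copy2(path, path.with_name(path.name + ".bak"))
    cleaned = drop_opencl_lines(original)
    path.write_text(cleaned, encoding="utf-8")
    return len(original.splitlines()) - len(cleaned.splitlines())
```

Filtering on the substring keeps the script faithful to "remove all lines containing opencl" without needing to know the individual key names.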

jenshannoschwalm, Oct 21 '24 06:10

So I deleted the OpenCL kernel cache (at %localappdata%\Microsoft\Windows\INetCache\darktable) and cleaned up darktablerc as you suggested. The first try resulted in a blue screen; the second (after rebooting the machine, and thus with more RAM available) just took a very long time before the application became responsive again.

Logfiles:

I will post feedback later this evening about what happens when OpenCL is deactivated...

Macchiato17, Oct 21 '24 17:10

After turning OpenCL off and restarting darktable (to be sure), I did two more tests, which also resulted in an application crash or even a blue screen.

Macchiato17, Oct 21 '24 20:10

Ok and thanks. For sure the memory consumption is too high. Not sure yet if this is something we can handle better in dt.

jenshannoschwalm, Oct 24 '24 06:10

I think I understand now what's happening. You are requesting two mipmap images, thus starting two export pipelines; the second starts immediately after the first.

The algorithm that decides whether dt should tile doesn't know about this and allows each pipe the full amount of memory dt is allowed to use.

I don't know how to fix that right now; it certainly requires a lot of thinking and source reading. I am pretty sure it has been like this for a very long time. We relied on swapping to keep us safe, which is certainly not true any more.

Maybe something else has changed on your system, making it more unstable when it's forced into swapping.
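The problem described above can be illustrated with a toy model. This is a hypothetical sketch, not darktable's tiling code: the `Pipe` class, the budget figure and the even split are all assumptions made for illustration. When each pipe compares its needs against the whole budget, two concurrent pipes can each decide not to tile and together exceed it; splitting the budget among the pipes running at the same time forces both to tile.

```python
from dataclasses import dataclass


@dataclass
class Pipe:
    name: str
    needed_mb: int  # memory the pipe would use if it ran untiled (hypothetical)


def tiling_per_pipe(pipes: list[Pipe], budget_mb: int) -> dict[str, bool]:
    """Each pipe is checked against the whole budget, unaware of the others."""
    return {p.name: p.needed_mb > budget_mb for p in pipes}


def tiling_shared(pipes: list[Pipe], budget_mb: int) -> dict[str, bool]:
    """Concurrent pipes split the budget evenly before deciding (one possible policy)."""
    share = budget_mb / max(len(pipes), 1)
    return {p.name: p.needed_mb > share for p in pipes}


pipes = [Pipe("image 1", 4000), Pipe("image 2", 4000)]
budget = 6000
print(tiling_per_pipe(pipes, budget))  # neither tiles: 8000 MB requested against 6000 MB
print(tiling_shared(pipes, budget))    # both tile: each only gets 3000 MB
```

An even split is the simplest choice; a real fix might instead track memory already committed by running pipes, which this sketch does not attempt.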

jenshannoschwalm, Oct 24 '24 10:10

Thanks for analyzing this so quickly.

I noticed that it also seems to depend on the set of applied modules. My base processing stack includes three instances of D&S (diffuse or sharpen). Color equalizer is such a nice tool that I have been using it for some months now, so I would assume this might be the crucial change on my system... I double-checked and found that a set of pictures I used for providing lens correction (mostly featureless white/grey sky) simply doesn't run into this problem.

So, as this issue seems quite complicated to solve, as you already indicated, the bottom line seems to be: invest in an additional SO-DIMM bar 😁 If there is anything I can do to assist further, please just let me know 👋

Macchiato17, Oct 24 '24 20:10

@Macchiato17 If OpenCL is enabled, you might try the "very fast GPU" mode and make sure you have set "opencl_mandatory_timeout=10000" (or similar) in darktablerc to enforce the OpenCL pipe. I think that should help until you get your bar :-)
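If you prefer to script the timeout change rather than edit the file by hand, here is a minimal sketch. It assumes darktablerc stores one key=value entry per line; `set_conf_value` is a hypothetical helper, and the "very fast GPU" mode itself is left to the preferences dialog. As with any darktablerc edit, darktable should be closed while the file is changed.

```python
def set_conf_value(text: str, key: str, value: str) -> str:
    """Set key=value in darktablerc-style text, replacing an existing entry or appending one."""
    lines = text.splitlines()
    prefix = key + "="
    for i, line in enumerate(lines):
        if line.startswith(prefix):
            lines[i] = prefix + value  # replace the first existing entry
            break
    else:
        lines.append(prefix + value)  # key not present yet: append it
    return "\n".join(lines) + "\n"
```

For example, `set_conf_value(text, "opencl_mandatory_timeout", "10000")` rewrites an existing entry in place, so other settings keep their order.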

jenshannoschwalm, Oct 25 '24 05:10

I really appreciate your great commitment to dt development and all the effort you put into optimization, also for quite small and not too powerful systems (as mine perhaps may be 😉). I just saw that your PR has already been merged into master, so I compiled it and gave it a try (setting darktablerc back to "very fast GPU" and applying the increased timeout value as you suggested):

---> 😎 <---

The GPU is heavily stressed 😁, but the system stays responsive, and rendering the 100% zoom in lighttable completes in an acceptable amount of time. Thanks so much again for your strong support and such a quick fix. That's certainly not to be taken for granted!

Have a good time 👋

Macchiato17, Oct 25 '24 20:10