ddcutil 2.0.0-rc1 detect command reports incorrect state due to displays-cache

The --enable-displays-cache option seems to be enabled by default. I have two monitors. After I powered one down, detect reported:

% ddcutil detect
Invalid display
   I2C bus:  /dev/i2c-0
   DRM connector:           card0-DP-1
...

When I power the monitor back on, ddcutil detect still reports Invalid display until I delete ~/.cache/ddcutil/displays or pass --disable-displays-cache.

My assumption was that --enable-displays-cache would cache attributes for the capabilities command, but that the detect command would not use it.

I would prefer the existing behaviour be the default, but I can change vdu_controls to pass detect --disable-displays-cache for ddcutil version > 2.0 if necessary.
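A minimal sketch of what such a version gate might look like in a wrapper script. The helper names are hypothetical, the version text is assumed to contain an x.y.z number (as in "ddcutil 2.0.0-rc1"), and the threshold of 2.0.0 is an assumption about where the cache became the default:

```python
import re

def parse_ddcutil_version(text):
    """Return (major, minor, patch) from text such as 'ddcutil 2.0.0-rc1'."""
    match = re.search(r"(\d+)\.(\d+)\.(\d+)", text)
    if match is None:
        raise ValueError(f"no version number found in {text!r}")
    return tuple(int(part) for part in match.groups())

def detect_command(version_text):
    """Build the detect argument list, adding the cache opt-out for 2.0.0 and later."""
    args = ["ddcutil", "detect"]
    # Assumption: only 2.0.0 and later know the displays-cache options.
    if parse_ddcutil_version(version_text) >= (2, 0, 0):
        args.append("--disable-displays-cache")
    return args
```

The returned list can be handed to subprocess.run unchanged. Older versions never see the option, so they cannot reject it as unknown.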

Jun 24 '23 04:06 digitaltrails

While debugging this I was deleting the displays cache without deleting the dsa cache. Could that cause issues? I was seeing some unexpected errors issuing getvcp/setvcp - they were transient and went away for some reason and I don't seem to be able to get them to reoccur (maybe dynamic sleep played some role).

Jun 24 '23 04:06 digitaltrails

After running vdu_controls with 2.0.0-rc1 all of today, I see I definitely need to pass --disable-displays-cache for all ddcutil invocations, not just for detect. Otherwise the different error behaviour defeats the monitor power-off/suspend heuristics I've coded. The intent of these heuristics being to prevent repeatedly reporting errors while monitors are offline (the monitors can be offline for quite some time when I walk away from my PC for an extended period).

I guess others who have scripts wrapping ddcutil may also be affected.

Jun 24 '23 08:06 digitaltrails

While I definitely need to pass --disable-displays-cache for detect, I'm not 100% sure where some of the other differences I'm seeing are coming from. I've recently also changed the graphics driver, VDU physical connections, python version, and kernel version. I will conduct some tests on an older more stable system.

In particular I'm seeing some python exception chaining errors which I thought I would have been experiencing earlier. Either python has changed, or the sequencing of errors has changed and surfaced coding errors in vdu_controls. I've prepared a vdu_controls release to go out when ddcutil 2.0 is released.

Jun 25 '23 10:06 digitaltrails

Today I initially struggled to reproduce the above issue. I've found that it occurs if I run a detect while the monitor is powering up. If I pick the right moment, the display doesn't yet have a valid display number, and ddcutil caches the display (dispno) number as -1. Once cached as dispno -1, I need to pass detect --disable-displays-cache to clear it.
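To check from a script whether detect is still reporting a stale entry, the output can be scanned for Invalid display blocks. This is only a sketch: the function name is hypothetical, and it assumes each block starts with an unindented header line followed by indented attribute lines, as in the excerpt above.

```python
def invalid_display_buses(detect_output):
    """Return the I2C bus paths of blocks headed 'Invalid display'."""
    buses = []
    in_invalid_block = False
    for line in detect_output.splitlines():
        if line and not line[0].isspace():
            # An unindented line starts a new block.
            in_invalid_block = line.strip() == "Invalid display"
        elif in_invalid_block and line.strip().startswith("I2C bus:"):
            buses.append(line.split(":", 1)[1].strip())
    return buses
```

If a bus is still listed after the monitor has finished powering up, re-running detect with --disable-displays-cache would be the workaround described above.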

Cache files are attached: ddcutil-cache-2023-06-26.zip

Perhaps some of the other glitches I encountered are due to this timing issue. If vdu_controls is waiting to auto-adjust for a changed Lux reading, it might just happen to retry during the warm-up period, which would then trigger ongoing problems.

(Edited to attach the correct zip).

Jun 25 '23 22:06 digitaltrails

@digitaltrails I've added changes to branch 2.0.0-dev intended to avoid restoring stale cached display references. I was not able to replicate your problem of a display ref for a disabled display being restored once the monitor was turned back on, but I was able to create the inverse problem: the valid display ref was being restored from cache even though the monitor had been turned off. (Probably an amdgpu vs nvidia driver issue - if need be I'll install a Nvidia card on a test bench system.)

Basically what happens now is that a cached display reference is only restored at the DDC code level if the lower I2C layer detects slave address x37 present. It only writes out a cached display reference for valid displays. To watch what is being saved and restored, use options --trcfunc serialize_one_display --trcfunc ddc_find_deserialized_display. (Tracing the latter function is only available with the latest changes.)

Interestingly, even if a display is turned off, the EDID can still be read. That's because the EEPROM storing the EDID gets power via its connector. Nor does turning a display off change the values of attributes enabled or status in /sys/class/drm/cardN-XXX for the display. So I've had to rely on whether slave address x37 is active as a way to detect whether the monitor is turned on. (That happens at the I2C layer of code. Testing for actual DDC communication happens at the higher DDC layer.)
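For anyone who wants to observe the sysfs behaviour described here, a small sketch that reads the status and enabled attributes for each connector. The function name and the drm_root parameter are illustrative, not part of ddcutil. Running it before and after switching a monitor off should, per the above, show no difference.

```python
from pathlib import Path

def read_connector_states(drm_root="/sys/class/drm"):
    """Map connector name (e.g. 'card0-DP-1') to its 'status' and 'enabled' values."""
    states = {}
    for connector in sorted(Path(drm_root).glob("card*-*")):
        values = {}
        for attribute in ("status", "enabled"):
            attribute_file = connector / attribute
            # A missing attribute file is recorded as None rather than raising.
            values[attribute] = (
                attribute_file.read_text().strip() if attribute_file.is_file() else None
            )
        states[connector.name] = values
    return states
```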

You should not have to change your code because the displays cache is enabled. If you are encountering problems, others will as well. If this problem cannot be fixed then at the very least --disable-displays-cache needs to be the default. The only reason for changing application code should be to take advantage of new features.

Jun 26 '23 11:06 rockowitz

@digitaltrails I've added changes to branch 2.0.0-dev intended to avoid restoring stale cached display references. I was not able to replicate your problem of a display ref for a disabled display being restored once the monitor was turned back on, but I was able to create the inverse problem: the valid display ref was being restored from cache even though the monitor had been turned off. (Probably an amdgpu vs nvidia driver issue - if need be I'll install a Nvidia card on a test bench system.)

Thanks, I'll clone and build that branch shortly.

Basically what happens now is that a cached display reference is only restored at the DDC code level if the lower I2C layer detects slave address x37 present. It only writes out a cached display reference for valid displays. To watch what is being saved and restored, use options --trcfunc serialize_one_display --trcfunc ddc_find_deserialized_display. (Tracing the latter function is only available with the latest changes.)

Interestingly, even if a display is turned off, the EDID can still be read. That's because the EEPROM storing the EDID gets power via its connector. Nor does turning a display off change the values of attributes enabled or status in /sys/class/drm/cardN-XXX for the display. So I've had to rely on whether slave address x37 is active as a way to detect whether the monitor is turned on. (That happens at the I2C layer of code. Testing for actual DDC communication happens at the higher DDC layer.)

I'd noticed this zombie monitor effect. Some time ago someone was asking how to determine programmatically from Qt which VDUs were actually on, and I suggested that ddcutil was one of the few things that seemed to be able to tell. I believe DisplayPort differs from HDMI/DVI in this respect: KDE seems to know when I power down a DisplayPort monitor (and reconfigures the desktop and windows).

You should not have to change your code because the displays cache is enabled. If you are encountering problems, others will as well. If this problem cannot be fixed then at the very least --disable-displays-cache needs to be the default. The only reason for changing application code should be to take advantage of new features.

As long as, with --enable-displays-cache as the default, detect only reports display numbers for VDUs that are really on, I think my code will be OK. I think too much has changed on my desktop to be certain about the causes of what I termed other glitches, but it would be reassuring if all ddcutil commands gave the exact same results no matter the state of --enable-displays-cache/--disable-displays-cache.

I've reverted the Nvidia driver, so any further testing will be with what appeared to be a more stable configuration.

Jun 27 '23 03:06 digitaltrails

One difference between --disable-displays-cache and --enable-displays-cache is what happens after a previously identified VDU, specified by EDID, gets powered off or suspended. Doing a getvcp to the EDID with the cache off returns the error 'Display not found\n', whereas with the cache enabled no message is issued. In both cases the exit code is 1.

Cache off:

subprocess result:  ddcutil --enable-dynamic-sleep --force --disable-displays-cache --brief getvcp 10 --edid 00ffffffffffff001e6d07777b5101... stderr='Display not found
', exception=... returned non-zero exit status 1.

Cache on:

19:53:38 ERROR: subprocess result:  ddcutil --enable-dynamic-sleep --force --enable-displays-cache --brief getvcp 10 --edid 00ffffffffffff001e6d07777b5101... stderr='', exception=... returned non-zero exit status 1.

This causes a change in behaviour for vdu_controls. For auto-brightness control I treat 'Display not found' for a previously detected VDU as a temporary error due to VDUs being suspended or turned off. This allows me to avoid excessive logging during the "down time". I throttle logging errors until the not-found errors cease. In the absence of a not-found message, I treat the error as a "real error" and will log it repeatedly.
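A simplified sketch of that kind of heuristic, not the actual vdu_controls code. The class and method names are hypothetical. It logs the first not-found failure of an outage, suppresses the repeats, and logs every other failure:

```python
NOT_FOUND_MARKER = "Display not found"

class OfflineLogThrottle:
    """Log the first not-found error of an outage, then stay quiet until it ends."""

    def __init__(self):
        self.offline = False

    def should_log(self, exit_code, stderr):
        if exit_code == 0:
            self.offline = False
            return False  # success: nothing to log
        if NOT_FOUND_MARKER in stderr:
            first_report = not self.offline
            self.offline = True
            return first_report
        return True  # any other failure counts as a real error
```

With the cache enabled, stderr is empty, so every failed poll falls through to the last branch and is logged. That is the change in behaviour described above.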

Presumably some folk may have scripts that rely on this behaviour.

(I should mention I've switched to ddcutil 2.0.0-dev, which is what the above refers to.)

Jun 27 '23 08:06 digitaltrails

It turns out that using slave address x37 as a proxy for whether the monitor is turned on does not always work. I have a monitor (Samsung U32H750x) that reports address x37 as active even when the monitor is turned off using the adjustment nub. i2cdetect confirms this behaviour. Unless I can find another solution, caching of display information will be removed.

Jun 29 '23 09:06 rockowitz

It turns out that using slave address x37 as a proxy for whether the monitor is turned on does not always work. I have a monitor (Samsung U32H750x) that reports address x37 as active even when the monitor is turned off using the adjustment nub. i2cdetect confirms this behaviour. Unless I can find another solution, caching of display information will be removed.

In vdu_controls I don't cache VDU on/off state, I just cache capabilities. It's been a long few days, so I might not be thinking straight, but would just caching capabilities be a sufficient win for caching in ddcutil?

Jun 29 '23 10:06 digitaltrails

Displays caching has been eliminated. Options --enable-displays-cache and --disable-displays-cache and command discard displays cache are no longer recognized. These changes have been applied to branch 2.0.0-dev. Thank you @digitaltrails for catching the problems with displays caching before the feature was released.

Jun 29 '23 10:06 rockowitz

@digitaltrails capabilities is by far the most expensive ddcutil command, entailing multiple write/read requests, associated sleep time, and possible retries. For any given monitor model, the capabilities string is constant, so caching the string turns the capabilities command from expensive to cheap. It's a big win for that operation, but only that operation.
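The idea can be sketched as a simple memoising wrapper. This is illustrative only and is not how ddcutil or vdu_controls implement it. The class name, the model_key argument, and the injected fetch callable (standing in for an actual capabilities query) are all assumptions:

```python
class CapabilitiesCache:
    """Remember the capabilities string per monitor model so it is read only once."""

    def __init__(self, fetch):
        self._fetch = fetch  # callable(model_key) -> capabilities string
        self._cache = {}

    def get(self, model_key):
        # Only the first request for a model pays for the expensive query.
        if model_key not in self._cache:
            self._cache[model_key] = self._fetch(model_key)
        return self._cache[model_key]
```

Because the string is constant per model, nothing in this cache depends on whether the monitor is currently on, which is what distinguishes it from caching detection results.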

The capabilities string is cached as part of the displays cache, but that is simply because a display reference contains a copy of the capabilities string IF the string has been read.

The displays cache reflects the probing that happens during display detection at the start of every ddcutil command to verify DDC communication, check how the monitor reports invalid features, and query the MCCS version. In the typical case it uses two getvcp requests, i.e. write/read operations and associated sleeps. This causes a noticeable elapsed time if the sleep-multiplier is 1.0 and there are multiple monitors. However, if the sleep-multiplier is significantly reduced, either by option --sleep-multiplier or option --enable-dynamic-sleep, the detection operation becomes much faster. So long as --enable-dynamic-sleep proves solid, for most configurations the loss of --enable-displays-cache should not be particularly noticeable.

Jun 29 '23 11:06 rockowitz

Thanks for the explanation, so --enable-displays-cache was directed at speeding up all operations, quite different from caching in vdu_controls.

I've been using --enable-dynamic-sleep as my daily-driver right back from the time where it was activated by --dsa2. It's been working fine (except as noted in #321, which appears to be Nvidia's problem). Subjectively --enable-dynamic-sleep feels more responsive.

Jun 29 '23 21:06 digitaltrails