
[DEFEDIT-6399] Retain specific expensive outputs in editor cache

Open matgis opened this issue 3 years ago • 5 comments

Fixes #6399

User-facing changes

  • Reduced the time spent on subsequent Build and Save operations when editing a large project.
  • Added a Shared Editor Settings editor to the Project menu where projects can adjust the cache size used by the editor for all users working in the project. If you still experience frequent delays in the editor after this change, try increasing the cache size.
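As a concrete illustration, a shared settings file raising the cache size might look like the following. This is a hypothetical sketch: the section/key layout is assumed from the dotted setting name `performance.cache_capacity` and the simple ini-file format mentioned in the technical notes; the actual file is edited through the Shared Editor Settings editor.

```ini
[performance]
cache_capacity = 40000
```

Since the file lives in the project root as `/project.shared_editor_settings`, checking it into version control applies the setting for everyone working in the project.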

Technical changes

  • The system cache now uses a custom LRU cache implementation.
  • Certain expensive-to-compute outputs are protected from eviction from the cache as it fills. Effectively, these outputs do not count towards the cache limit, and will not be evicted to make room for new entries.
  • To compensate, the default cache capacity for the remaining outputs has been decreased from 40000 to 20000.
  • When creating a system cache, you can now provide a :cache-retain? predicate that takes an Endpoint and is expected to return true if the value should be retained in the cache even as it fills up.
  • When loading a project, we will look for a file named /project.shared_editor_settings, and use its performance.cache_capacity setting for the size of the system cache, if specified.
  • We now defer loading of namespaces related to the code view until after we've shown the Welcome screen.
  • Added a few helper functions to the resource-node namespace that let us resolve the Resource from a resource node id without having to create an evaluation-context.
  • Added settings/SimpleSettingsResourceNode to enable editing of simple ini-file format resources.
  • Endpoints now implement the Comparable interface, enabling their use as keys in sorted maps.
  • Evaluating a property on an OverrideNode no longer evaluates it on all originals recursively all the time.
  • We now use transient updates when modifying the system cache, which is a bit more efficient for our use case.
  • The JVM property defold.tests will now be set to true during tests.
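The retain behavior above can be sketched in a language-agnostic way. The following is a minimal illustration in Python, not the editor's actual Clojure implementation: entries matching a retain predicate are kept outside the LRU bookkeeping, so they neither count toward the capacity limit nor get evicted, while dependency invalidation still removes them.

```python
from collections import OrderedDict

class RetainLruCache:
    """Toy LRU cache where entries matching retain_pred never count
    toward the limit and are never evicted (illustrative only)."""

    def __init__(self, limit, retain_pred):
        self.limit = limit
        self.retain_pred = retain_pred
        self.retained = {}        # protected entries, not size-limited
        self.lru = OrderedDict()  # regular entries, oldest first

    def put(self, key, value):
        if self.retain_pred(key):
            self.retained[key] = value
            return
        self.lru[key] = value
        self.lru.move_to_end(key)
        while len(self.lru) > self.limit:
            self.lru.popitem(last=False)  # evict least recently used

    def get(self, key, default=None):
        if key in self.retained:
            return self.retained[key]
        if key in self.lru:
            self.lru.move_to_end(key)  # mark as recently used
            return self.lru[key]
        return default

    def invalidate(self, keys):
        # invalidation (as opposed to capacity eviction) removes
        # entries from both the retained and the LRU partitions
        for key in keys:
            self.retained.pop(key, None)
            self.lru.pop(key, None)
```

The key design point matches the description above: capacity pressure only ever touches the LRU partition, while invalidation due to changed dependencies clears retained entries as well.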

matgis avatar Aug 04 '22 13:08 matgis

This PR seems to be affecting our efforts on the #5447 front. I tried this code on editor-dev and this branch, using "lots_of_collections1.zip" repro in the linked issue:

(in-ns 'internal.graph)

(require 'criterium.core)

(let [c (dynamo.graph/cache)
      deps (basis-dependencies
             (dynamo.graph/now)
             #{#g/endpoint [(editor.defold-project/get-resource-node
                              (ffirst (dynamo.graph/targets-of 0 :resource-list))
                              "/main/main.script")
                            :modified-lines]})]
  (criterium.core/quick-bench
    (internal.cache/cache-invalidate c deps)))

Results on editor-dev:

;Evaluation count : 30 in 6 samples of 5 calls.
;             Execution time mean : 25,691161 ms
;    Execution time std-deviation : 4,889217 ms
;   Execution time lower quantile : 23,363918 ms ( 2,5%)
;   Execution time upper quantile : 34,175660 ms (97,5%)
;                   Overhead used : 7,127517 ns
;
;Found 1 outliers in 6 samples (16,6667 %)
;    low-severe     1 (16,6667 %)
; Variance from outliers : 48,2024 % Variance is moderately inflated by outliers

Results on this branch:

;Evaluation count : 18 in 6 samples of 3 calls.
;             Execution time mean : 42,450507 ms
;    Execution time std-deviation : 6,554726 ms
;   Execution time lower quantile : 36,502785 ms ( 2,5%)
;   Execution time upper quantile : 51,301903 ms (97,5%)
;                   Overhead used : 7,109459 ns

I'm worried this will hurt editor performance in a noticeable way...

vlaaad avatar Aug 04 '22 16:08 vlaaad

Maybe a different approach would be better, like adding a bespoke flag to defnode macro similar to :cached that will use a different type of cache altogether. I haven't thought this through, just spitballing here :)

vlaaad avatar Aug 04 '22 16:08 vlaaad

Ohh, nice find about the potential slowdown during cache invalidation! That's surprising. I'll look into it! We definitely do not want to make the cache-invalidation situation worse.

Regarding the approach taken, I considered the approach of tagging the affected outputs in the defnode declaration, but decided against it for multiple reasons. Let me give you some background.

The retain rules differ between output types, so it wouldn't be as simple as just tagging an output as :cache-retained. Save-related outputs are retained in the cache for nodes that represent actual files in the project, whereas build-related outputs are retained for both files and zip-entries. The node types would be the same for both files and zip-entries, so there would need to be something like :cache :save and :cache :build to categorize the outputs.

Unfortunately, we cannot rely on only nodes representing actual project files (as opposed to zip-entries) being connected to the save system. One of the outputs save relies on (and that we want to retain), :source-value, is also evaluated to determine the load order of nodes through the resource-node/resource-node-dependencies function, which evaluates it on zip-entries as well as files. So we really do need the two different retain rules.

But even with that, we don't actually want to retain every output tagged :cache :save or :cache :build. A lot of the :source-value and :build-targets outputs are on override instances of resource nodes rather than the "real" resource nodes. Retaining everything would immensely increase the number of cached entries in a large project, so we only retain the value of these outputs for the "real" (i.e. original) resource nodes and take the hit of redoing the work in case a dependency changes. For non-originals, regular eviction rules apply.
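To make the two rules concrete, here is a hypothetical sketch (the function name, output-name sets, and the endpoint shape are invented for illustration; the real predicate lives in the editor code): save-related outputs are retained only for file-backed nodes, build-related outputs for both files and zip-entries, and in either case only on original (non-override) nodes.

```python
# An endpoint is modeled as a dict for illustration:
#   label      - the output name
#   original?  - True for original ("real") resource nodes,
#                False for override instances
#   file?      - True for nodes backed by actual project files,
#                False for zip-entry resources

SAVE_RELATED = {"save-data", "source-value"}
BUILD_RELATED = {"build-targets"}

def cache_retain(endpoint):
    if not endpoint["original?"]:
        return False  # override nodes follow regular eviction rules
    if endpoint["label"] in SAVE_RELATED:
        return endpoint["file?"]  # actual files only, not zip-entries
    if endpoint["label"] in BUILD_RELATED:
        return True  # retained for both files and zip-entries
    return False
```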

One approach would be to refer to the retain? predicate in the output definition. Something like

(output save-data :cached :retain? retain-rules/save-related? ...)
(output source-value :cached :retain? retain-rules/save-related? ...)
(output build-targets :cached :retain? retain-rules/build-related? ...)

This looks like a promising approach, but it has some drawbacks. First, we'd have to modify every one of these outputs on every defnode that is also a ResourceNode, touching a lot of code and potentially missing a few outputs if we're not careful. Second, it introduces a dependency from every code file containing one of these node definitions on this new hypothetical retain-rules module and all its transitive dependencies, which might complicate things in the future.

I don't like involving philosophy when it comes to decisions about architecture because I think it tends to overshadow practical concerns, but you could argue that whether or not a value should be retained in the cache is a cache-concern, not a node-concern. Especially considering our system has multiple different cache implementations, and both the null cache and unlimited cache implementations aren't concerned with retaining entries at all.

I hope this explains my reasoning for the approach I've taken, but I'd love to discuss it further if we can do better!

matgis avatar Aug 05 '22 10:08 matgis

I was not able to reproduce as drastic a slowdown as you were seeing. Perhaps there was still some background processing going on when you ran the test on my branch? I was also surprised to see my times being so much lower than yours. However, I did see a slight but consistent performance degradation on my branch compared to editor-dev: evicting these elements from the cache took around 113% of the editor-dev runtime.

Attempting to isolate the cause of the slowdown, I was getting inconsistent run times. Even when I used the same cache implementation as on editor-dev, the mere presence of my additional code around it seemed to make it run slower. The only way I could get it back to the original speed was to remove my extension protocol from the cache implementations. I tried moving the functionality to multimethods instead, or even to standalone functions switching on class internally. Nothing worked. It was as if simply having my code loaded made calls to core.cache/evict slower.

Eventually I gave up and concluded my change was causing the generated bytecode to exceed some kind of internal JIT-limit, changing how the code was being optimized internally. I decided then to try to minimize the number of calls to the core.cache/evict function. The pattern used in the editor is to evict many cache entries at once, so I tried to use transient updates of the cache internals instead. Unfortunately, the data.priority-map used for the LRU does not support transient updates, so only the cache map itself benefits from this. After this change, the cache evictions surpassed their original speed on editor-dev. On my machine I was seeing around 70% of the runtime compared to editor-dev.
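The batching idea can be illustrated outside Clojure as well. The sketch below is in Python (the real change uses Clojure transients on the cache map; data.priority-map, which tracks LRU order, has no transient support, so only the entry map benefits). The point is to replace many whole-map "updates" with a single pass:

```python
def evict_each(cache, keys):
    # naive approach: one full persistent-style update per key,
    # as if calling core.cache/evict once per invalidated dependency;
    # each step rebuilds the map
    for key in keys:
        cache = {k: v for k, v in cache.items() if k != key}
    return cache

def evict_batch(cache, keys):
    # batched approach: a single pass over the cache, analogous to
    # doing all removals inside one transient/persistent! cycle
    dead = set(keys)
    return {k: v for k, v in cache.items() if k not in dead}
```

Both produce the same result, but the batched version does O(n) work for the whole invalidation instead of O(n) per evicted key, which matters when the editor invalidates tens of thousands of entries at once.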

My machine running editor-dev (baseline)

Number of dependencies: 40535
WARNING: Final GC required 30.7771517470939 % of runtime
Evaluation count : 192 in 6 samples of 32 calls.
             Execution time mean : 3,214822 ms
    Execution time std-deviation : 48,177338 µs
   Execution time lower quantile : 3,179466 ms ( 2,5%)
   Execution time upper quantile : 3,291455 ms (97,5%)
                   Overhead used : 6,830577 ns

My branch before addressing slowdown

Number of dependencies: 40535
WARNING: Final GC required 30.34009822798207 % of runtime
Evaluation count : 168 in 6 samples of 28 calls.
             Execution time mean : 3,623864 ms
    Execution time std-deviation : 42,324257 µs
   Execution time lower quantile : 3,594355 ms ( 2,5%)
   Execution time upper quantile : 3,685611 ms (97,5%)
                   Overhead used : 6,830619 ns

My branch after using transient cache updates

Number of dependencies: 40535
WARNING: Final GC required 30.1312520559907 % of runtime
Evaluation count : 270 in 6 samples of 45 calls.
             Execution time mean : 2,247145 ms
    Execution time std-deviation : 15,017086 µs
   Execution time lower quantile : 2,231313 ms ( 2,5%)
   Execution time upper quantile : 2,268298 ms (97,5%)
                   Overhead used : 6,832376 ns

matgis avatar Aug 09 '22 08:08 matgis

Here's the code I used to perform the measurements in isolation without loading up the editor:

(ns scratch
  (:require [criterium.core :as criterium]
            [dynamo.graph :as g]
            [editor.defold-project :as project]
            [editor.editor-extensions :as extensions]
            [editor.progress :as progress]
            [editor.workspace :as workspace]
            [integration.test-util :as tu]
            [internal.cache :as c]
            [internal.graph.types :as gt]
            [support.test-support :as ts]))

(let [project-dir-path "/Users/mats/Defold/lot_of_collections1"]
  (System/gc)
  (ts/with-clean-system
    {:cache-size 20000
     :cache-retain? project/cache-retain?}
    (let [workspace (tu/setup-workspace! world project-dir-path)
          game-project-resource (workspace/find-resource workspace "/game.project")
          main-script-resource (workspace/find-resource workspace "/main/main.script")
          project-graph (g/make-graph! :history true :volatility 1)
          extensions (extensions/make project-graph)
          project (project/open-project! project-graph extensions workspace game-project-resource progress/null-render-progress!)]
      (let [main-script (project/get-resource-node project main-script-resource)
            basis (g/now)
            cache (g/cache)
            dependencies (gt/dependencies basis [(gt/endpoint main-script :modified-lines)])]
        (println "Number of dependencies:" (count dependencies))
        (criterium/quick-bench
          (c/cache-invalidate cache dependencies))))))

matgis avatar Aug 09 '22 08:08 matgis

I have a different number of dependencies (163454), I changed the project at some point to have even more references so the issue is more noticeable.

vlaaad avatar Aug 15 '22 08:08 vlaaad