Add Image Caching Mechanism to Improve Performance of geom_image

Open xiayh17 opened this issue 1 year ago • 0 comments

Description:

This pull request introduces an image caching mechanism to geom_image, significantly improving performance for plots with multiple images or repeated plot generations.

Let's consider the example provided:

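library(ggplot2)
library(ggimage)

# PNG files bundled with ggimage
images = list.files(system.file("extdata", package="ggimage"),
                    pattern="png", full.names=TRUE)

# 20 x 20 grid; each point gets an image sampled with replacement
df = data.frame(x = rep(1:20, each = 20),
                y = rep(1:20, 20),
                image = sample(images, 400, replace = TRUE))

ggplot(df, aes(x, y)) + 
  geom_image(aes(image=image), size=0.04)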

This example creates a plot with 400 image points sampled with replacement from the package's bundled PNG files, so the same images recur many times. Without caching:

  1. Each of the 400 points would require loading its image from disk, even if it's a duplicate.
  2. For large datasets or repeated plot generations (e.g., in Shiny apps), this could lead to significant performance issues.

With caching:

  1. Each unique image is loaded only once and stored in memory.
  2. Subsequent uses of the same image retrieve it from the cache instead of reloading from disk.
  3. This significantly reduces I/O operations and improves rendering speed, especially for larger datasets or interactive applications.
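The load-once behaviour described above can be sketched in base R. This is only an illustration, not the code in this PR: `.image_cache`, `cached_load`, and `fake_loader` are hypothetical names, and the stand-in loader merely counts how often the "disk" would be read. A real loader such as `magick::image_read` could be passed instead.

```r
# Hypothetical cache: an environment keyed by file path (assumption, not ggimage's internals).
.image_cache <- new.env(parent = emptyenv())

# Call `loader` only the first time a path is requested; reuse the stored copy afterwards.
cached_load <- function(path, loader) {
  if (!exists(path, envir = .image_cache, inherits = FALSE)) {
    assign(path, loader(path), envir = .image_cache)
  }
  get(path, envir = .image_cache, inherits = FALSE)
}

# Stand-in loader that records each simulated disk read.
n_reads <- 0
fake_loader <- function(path) {
  n_reads <<- n_reads + 1
  paste("pixels of", path)
}

# 400 requests over 3 unique paths, mirroring the 400-point example.
paths <- rep(c("a.png", "b.png", "c.png"), length.out = 400)
imgs <- vapply(paths, cached_load, character(1), loader = fake_loader)

n_reads  # one read per unique path, not one per point
```

Keying by path keeps the lookup simple, but it assumes a file does not change on disk while it is cached.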

A simple benchmark demonstrates the difference:

library(microbenchmark)

> # With caching 
> microbenchmark(
+   print(ggplot(df, aes(x, y)) + geom_image(aes(image=image), size=0.04)),
+   times = 10
+ )
Unit: milliseconds
                                                                            expr      min      lq     mean   median       uq      max neval
 print(ggplot(df, aes(x, y)) + geom_image(aes(image = image),      size = 0.04)) 570.8385 574.692 600.3304 582.8108 597.2787 733.2289    10

> # Without caching
> microbenchmark(
+   print(ggplot(df, aes(x, y)) + geom_image(aes(image=image), size=0.04)),
+   times = 10
+ )
Unit: seconds
                                                                            expr     min      lq     mean   median       uq      max neval
 print(ggplot(df, aes(x, y)) + geom_image(aes(image = image),      size = 0.04)) 6.55834 48.1042 45.05545 49.15449 49.98102 51.79424    10

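The results show a substantial improvement: the median time drops from about 49.2 s without caching to about 583 ms with caching, roughly 84 times faster. The gain is largest on subsequent runs, once the cache has been populated.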

Key changes:

  1. Implemented an internal cache using an environment to store loaded images.
  2. Modified imageGrob and related functions to utilize the cache.
  3. Added functions to manage the cache (clear cache, get cache size).
  4. Removed alpha; opacity is used instead.
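As a rough sketch of what the cache-management helpers in item 3 might look like, assuming the cache is a plain environment: `cache_size` and `clear_cache` are hypothetical names, and "size" here means the number of cached entries (the PR's own function may report something else, such as memory use).

```r
# Hypothetical cache environment (assumption, not ggimage's internals).
.image_cache <- new.env(parent = emptyenv())

# Number of entries currently cached.
cache_size <- function() {
  length(ls(.image_cache, all.names = TRUE))
}

# Remove every cached entry so images are reloaded on next use.
clear_cache <- function() {
  rm(list = ls(.image_cache, all.names = TRUE), envir = .image_cache)
  invisible(NULL)
}

# Populate the cache with two placeholder entries, then clear it.
assign("a.png", "pixels A", envir = .image_cache)
assign("b.png", "pixels B", envir = .image_cache)
size_before <- cache_size()
clear_cache()
size_after <- cache_size()
```

A clear function matters in long-running sessions such as Shiny apps, where cached images would otherwise stay in memory for the life of the process.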

Benefits:

  • Reduced disk I/O: Each unique image is loaded only once.
  • Improved rendering speed: Subsequent uses of the same image retrieve it from memory.
  • Enhanced performance for large datasets and interactive applications.
  • Distinguished alpha from opacity.

geom_subview also gains a cache, but that is not the main source of the speedup.

xiayh17, Sep 01 '24 15:09