next-shared-cache Don't Discard Page Contents Generated During Build

Brief Description of the Bug

When configured to use a Redis cache, the package currently discards the page contents generated during the NextJS build phase. As a result, initial page contents are generated twice in the common case when deployment follows shortly after build: (1) once during build (slow, developer facing) and (2) again during first page load (slow, user facing (!)).

Severity

Major

Frequency of Occurrence

Always

Steps to Reproduce Provide detailed steps to reproduce the behavior, including any specific conditions or configurations where the bug occurs:

Build a NextJS project that uses ISR for some pages
Deploy the project using this package configured with a single redis-strings cache handler
Load one of the pages generated just seconds ago using ISR, experience a significant delay while the page is built again so that it can be loaded into the cache

Expected vs. Actual Behavior

Pages generated during the NextJS build phase should not be immediately rebuilt after deployment.

Environment:

OS: MacOS
Node.js version: 20.12.2
@neshca/cache-handler version: 1.3.1
next version: 14.2.3

Attempted Solutions or Workarounds

I've implemented a simple cache handler that loads initial data and HTML from the on-disk files included in the build output. But this should really be the default behavior and included in the handler, not something that a user has to implement—if not only because I probably didn't get it quite right. Another alternative would be to provide a filesystem cache handler which could provide this functionality, but I think users will be surprised that this doesn't work out of the box.

Additional context

It looks like a previous version of this package did implement this behavior, but it was removed right before 1.0.0 was released. (I reused some of the code from that version for my handler.) It wasn't clear to me why that option was removed, and it should probably have fired a warning so users would notice that the package was no longer supporting loading initial values from the filesystem. Next builds are also fairly slow when many pages are being built, so it seems problematic to do all that work and then discard it immediately.

There's language in the current documentation about the package being able to use (say) Redis during the build phase, which perhaps would address this? But that seems at odds with the Redis security model, which states that:

Redis is designed to be accessed by trusted clients inside trusted environments. This means that usually it is not a good idea to expose the Redis instance directly to the internet or, in general, to an environment where untrusted clients can directly access the Redis TCP port or UNIX socket.

At least in my deployment flow, I perform the builds on my local machine, push the Docker container, and deploy that to our Kubernetes cluster. Meaning that the build step does not run inside the trusted environment and so doesn't have access to the Redis instance using during production. Perhaps this is odd? It doesn't seem that way to me, necessarily. And it may be more common for users trying to deploy NextJS applications to non-Vercel environments, which AFAICT is essentially the intended audience for this package...

May 07 '24 20:05 gchallen

same thoughts here

May 27 '24 07:05 lzw1926

While implementing this for our starterkit for drupal -next.js projects I have been wondering about this same thing.

We are in the same situation described above, and it seems that basically using this cache handler makes the build step in the normal ISR next.js workflow futile. The data that is built at build time is then never used, I think.

And, coupled with the issue described here: https://github.com/caching-tools/next-shared-cache/issues/556 it also seems to negate a lot of the benefits of ISR in general.

Maybe I am missing something though :-)

May 31 '24 13:05 vermario

Regarding this, I am wondering if something like this could work (in our case we use CircleCI):

Add redis to circleci
during the build, add cached data to the redis instance inside circle ci
export the redis database, pass it on as an artifact
when starting redis in kubernetes, give it that data to import

Jun 07 '24 07:06 vermario

@vermario: I think that there are a variety of workarounds. The question is whether this library should support this out of the box.

It's also worth pointing out that a previous version did in fact support loading initial values from the filesystem, but that support was removed at some point. It's not clear why.

It does seem like the pre-built files should have enough data for cache insertion, given that the creation time can be used to determine when it's time to rebuild the page for ISR.

I currently have my own hacky workaround, but I'd rather see this library handle it internally so it can be done properly.

Jun 07 '24 16:06 gchallen

I am wondering if I just assumed that this library was meant to work with ISR, but maybe I have imagined it? :-)

Jun 07 '24 17:06 vermario

I would hope that it doesn't break later ISR page regeneration. I think what's being discussed here is just the initial load. NextJS doing an initial pass on generating ISR pages during the build step, assuming that these will be used as the initial page contents. (Of course, this assumes that the built pages will be rapidly deployed, but that's probably not a bad assumption. If they aren't, the ISR interval could expire before the pages were used.)

The initial page build also ensures that the first page load can always return a stale page immediately, even if an ISR build is triggered due to the contents being stale. That's part of what's so frustrating about this limitation of this library. Here's what's supposed to happen:

next build: creates page /foo
(sometime later): /foo loaded, returns prebuilt contents immediately
(if needed): ISR triggered if page is stale, rebuilt in background
(next load): gets latest /foo contents
and so on

Here's what happens when you use this caching library:

next build: creates page /foo
(sometime later): prebuilt /foo has been discarded, so a full page build is done while the user waits!

Not good.

Jun 08 '24 14:06 gchallen

In addition to this, if you check https://github.com/caching-tools/next-shared-cache/issues/556 , it seems that also under normal operations the stale-while-revalidate mechanism can't work, since the cached contents in redis are evicted straight away. This to my mind makes ISR not compatible at all with this.

Jun 10 '24 06:06 vermario

After modifying our handler according to @better-salmon 's suggestions in #556, we are able to see that stale while revalidate works nicely (great!) :-)

But the issue discussed in this thread still means that basically if we adopt this library, it means that pre-generating becomes totally futile, since next.js won't be aware of the prebuilt content, and will generate each page when the first user visits it, no matter if it was included in getStaticPaths or not.

I wonder if we are again missing something obvious, maybe @better-salmon you have better information about this?

It seems that somehow what is needed would be to read the json files that next.js generates during build and "import" into redis as initial data?

Jul 30 '24 13:07 vermario

Trying to investigate this more, we came across the serverDistDir context property on this page: https://caching-tools.github.io/next-shared-cache/api-reference/on-creation

That points to the absolute path of the server directory, which in turn contains the pages directory that has the json and html files for each pre-generated page.

Is the way to to to try and read the contents of that directory and somehow set them into redis when the client is instantiated? is this what you are doing with your custom code @gchallen ?

Jul 30 '24 14:07 vermario

so, we looked some more into this, and by adding some custom code it seems like we can achieve what we need:

when the client is instantiated for the first time:

we create a list of the HTML files that were generated during build in the pages folder
we iterate over the files, calling the handler to cache the contents according to the file modification date, and with the same attributes in terms of ttl and expiration that the handler would normally assign during runtime

this is visible here: https://github.com/wunderio/next-drupal-starterkit/pull/182/files#diff-e3c8a9ea9ddb0989e98b5e09063f02e372a49e57fbfb4a1bbd69ac55645d25c1

(feel free to take a look and tell us if your implementations are similar or we missed something :) )

Jul 31 '24 19:07 vermario

Thanks @vermario! That sounds like a nice approach. I was doing something much more hacky, but your approach seems more correct and I'll give it a try. Thanks for sharing!

Aug 01 '24 21:08 gchallen

Thanks! It seems to work ok. A couple of things:

the prepopulation happens when the client is created, which means when the running next.js server is hit by an actual request. I wonder if if's possible to make it happen when npm run start happens?
we see that the client might get instantiated twice (We added a safeguard mechanism based on setting a redis key with no expiration to signal that the cache population has happened already).

Aug 02 '24 05:08 vermario

Have you examined the NextJS instrumentation hook? That might do what you want WRT being able to do this on startup. In my case I'm using Kubernetes to hit a route before the container is activated, so I think the initialization will happen at that point.
Yup, saw that. I think that's the best you can do!

Aug 02 '24 18:08 gchallen

Also it looks like you're running into a problem I had, which is how to pull the correct expiration value on a per-page basis rather than using a hard-coded value. Is it possible this information is in the .json file?

Aug 02 '24 19:08 gchallen

Nope, it looks like the JSON file only has page contents.

FWIW, one way to avoid the big load on startup is to do what I was doing previously and only check the filesystem on a per-request basis. I merged in your code to set the timestamps with mine and it seems to work properly. However, you still need to track which contents have been loaded from the filesystem so that you don't reload from the filesystem after a revalidation. This seems to be safe, in that the Redis handler from this package will reject the change. However, it's inefficient in the sense that you're reading the same file twice.

Also, one thing I noticed is that revalidatePath in an App route will call revalidateTag with the tag equal to the page key prepended with N_T_. So for that to work, that tag needs to be added to the "PAGE" value loaded from the filesystem in your example above. Otherwise the revalidation will just silently not do anything. I think I had been running into that before and it's super frustrating. You actually also need to set this value in the set method as well.

Anyway, here's what I've come up with so far:

  const redisHandler = createRedisHandler({ client, keyPrefix: buildId })
  const mergedHandler: Handler = {
    name: "mergedHandler",
    async get(key, meta) {
      const existingResult = await redisHandler.get(key, meta)
      if (existingResult) {
        logger.debug(`Returning already cached result for ${key}`)
        return existingResult
      }

      if (await client.get(`${buildId}_${key}:filesystem`)) {
        logger.debug(`Not reloading ${key} from filesystem`)
        return null
      }

      logger.debug(`Looking for ${key} on filesystem`)
      try {
        const jsonPath = path.join(serverDistDir, "pages", `${key}.json`)
        const jsonExists = await fs
          .access(jsonPath, fs.constants.F_OK)
          .then(() => true)
          .catch(() => false)
        if (!jsonExists) {
          logger.debug(`JSON for ${key} does not exist, returning null`)
          return null
        }
        const pageData = JSON.parse(await fs.readFile(jsonPath, "utf8"))
        logger.trace(`pageData for ${key}: ${JSON.stringify(pageData, null, 2)}`)

        const htmlPath = path.join(serverDistDir, "pages", `${key}.html`)
        const htmlExists = await fs
          .access(htmlPath, fs.constants.F_OK)
          .then(() => true)
          .catch(() => false)
        if (!htmlExists) {
          logger.warn(`HTML for ${key} does not exist, returning null`)
          return null
        }
        const html = await fs.readFile(htmlPath, "utf8")

        const htmlFileStats = await fs.stat(htmlPath)
        const htmlFileLastModifiedTime = Math.round(htmlFileStats.mtimeMs / 1000)

        const newValue: CacheHandlerValue = {
          value: {
            kind: "PAGE",
            html,
            pageData,
            postponed: undefined,
            headers: undefined,
            status: undefined,
          },
          tags: [`${NEXT_CACHE_IMPLICIT_TAG_ID}${key}`],
          lastModified: htmlFileLastModifiedTime * 1000,
          lifespan: {
            lastModifiedAt: htmlFileLastModifiedTime,
            staleAge: DEFAULT_STALE_AGE,
            expireAge: DEFAULT_STALE_AGE,
            staleAt: htmlFileLastModifiedTime + DEFAULT_STALE_AGE,
            expireAt: htmlFileLastModifiedTime + DEFAULT_STALE_AGE,
            revalidate: REVALIDATE_LONG,
          },
        }

        logger.debug(`Inserting filesystem value for ${key} into cache`)
        await redisHandler.set(key, newValue)

        const newResult = await redisHandler.get(key, meta)
        if (newResult) {
          await client.set(`${buildId}_${key}:filesystem`, "true")
          return newResult
        } else {
          logger.error(`Value for ${key} does not exist after insertion`)
          return null
        }
      } catch (err) {
        logger.error(`Error looking in filesystem for ${key}: ${err}`)
        return null
      }
    },
    async set(key, cacheHandlerValue) {
      logger.debug(`Insertion for ${key}: ${cacheHandlerValue.tags}`)
      await client.get(`${buildId}_${key}:filesystem`)
      if (cacheHandlerValue.value.kind === "PAGE" && cacheHandlerValue.tags.length === 0) {
        cacheHandlerValue.tags = [`${NEXT_CACHE_IMPLICIT_TAG_ID}${key}`]
      }
      return redisHandler.set(key, cacheHandlerValue)
    },
    async revalidateTag(tag) {
      logger.debug(`Revalidating ${tag}`)
      return redisHandler.revalidateTag(tag)
    },
  }

  return {
    handlers: [mergedHandler],
    ttl: {
      defaultStaleAge: DEFAULT_STALE_AGE,
      estimateExpireAge: () => DEFAULT_STALE_AGE,
    },
  }

Overall I really feel like this is way too much work and requires knowing too much about NextJS internals. I may try implementing a cache handler using Redis from scratch, just to see if it's this package that's muddying the waters somehow.

Aug 03 '24 23:08 gchallen

@vermario @gchallen @lzw1926 hi,

I'm excited to share that the feature you've all been anticipating is now in progress!

I've just opened a PR that introduces pre-population of the cache with the initial data at application startup, which is expected to bring a noticeable performance improvement. This new functionality will be made possible via a registerInitialCache function, which leverages the CacheHandler class to populate the cache for all Handlers.

To achieve this, it is utilizing Next.js's instrumentation hook to run just before the server starts. This way, you can efficiently pre-populate the cache without the need for any additional scripts right before the server startup.

It's worth noting that this is currently an experimental feature and may evolve in future releases.

Feel free to check out the PR here: #743. I appreciate your feedback as I continue to refine this feature!

Sep 09 '24 21:09 better-salmon