After updating to v3.1, large repo build takes 3 hours
Have you read the Contributing Guidelines on issues?
- [X] I have read the Contributing Guidelines on issues.
Prerequisites
- [X] I'm using the latest version of Docusaurus.
- [X] I have tried the `npm run clear` or `yarn clear` command.
- [X] I have tried `rm -rf node_modules yarn.lock package-lock.json` and re-installing packages.
- [ ] I have tried creating a repro with https://new.docusaurus.io.
- [X] I have read the console error message carefully (if applicable).
Description
~/Downloads/www on main *3 +9 !1 qqqq 128 ✘ took 56s base at 12:56:46 PM
$ $npm_execpath run prepare-to-launch && $npm_execpath run scc && $npm_execpath run format && git add . && git commit -m 'wrote something' && git push && $npm_execpath run build && $npm_execpath redirects && until $npm_execpath ship; do :; done
$ $npm_execpath run clear && $npm_execpath run sanitize && $npm_execpath run process-blog && $npm_execpath run process-docs && $npm_execpath run backlinks && $npm_execpath run figcaption && $npm_execpath run readme
$ docusaurus clear && rm -rf 'blog' && rm -rf 'docs' && rm -rf '**/*.config.js' && rm -rf '**/*.config.js.map' && rm -f 'docusaurus.config.js.map' && rm -f 'docusaurus.config.js' && rm -rf 'i18n /**/*.md' && cp tools/안녕.md i18n/ko/docusaurus-plugin-content-docs/current/Hey.md && rm -rf 'i18n /**/*.png' && rm -rf 'i18n /**/*.svg' && rm -rf 'i18n /**/*.jpg' && rm -rf 'i18n /**/*.jpeg'
[SUCCESS] Removed the Webpack persistent cache folder at "/Users/cho/Downloads/www/node_modules/.cache".
$ python3 tools/sanitize.py
Found 2289 MD and MDX files.
Replaced 0 hex marks.
$ python3 tools/process-blog.py
$ python3 tools/process-docs.py
Replaced 10642 wikilinks.
$ python3 tools/process-backlinks.py
Found 4552 MD files.
Wrote 2860 files with 8283 mentions to backlinks.ts.
Wrote 2276 filenames to filenames.ts.
$ python3 tools/img-alt-to-figcaption.py
Found 2324 MD and MDX files.
Replaced 1320 alt texts.
$ cp tools/README.src.md README.md && printf "\n\n## Last updated \n\n$(date)\n" >> README.md
$ printf '\n## Stats\n' >> README.md && printf '\n```\n' >> README.md && scc . >> README.md && printf '```\n' >> README.md
$ prettier --log-level silent --config .prettierrc -w '**/*.{ts,tsx,json,md,mdx,css,scss,html,yml,yaml,mts,mjs,cts,cjs,js,jsx,xml}'
[main 478a064d] wrote something
9 files changed, 13 insertions(+), 6 deletions(-)
create mode 100644 Research/assets/F6FE2E.png
Enumerating objects: 30, done.
Counting objects: 100% (30/30), done.
Delta compression using up to 12 threads
Compressing objects: 100% (16/16), done.
Writing objects: 100% (16/16), 1.43 MiB | 33.99 MiB/s, done.
Total 16 (delta 6), reused 0 (delta 0), pack-reused 0
remote: Resolving deltas: 100% (6/6), completed with 6 local objects.
To https://github.com/anaclumos/extracranial.git
f316c4bd..478a064d main -> main
$ NODE_OPTIONS="--max-old-space-size=16384" docusaurus build
[INFO] Website will be built for all these locales:
- en
- ko
[INFO] [en] Creating an optimized production build...
✔ Client
Compiled successfully in 1.30m
✔ Server
Compiled successfully in 4.46m
[SUCCESS] Generated static files in "build".
[INFO] [ko] Creating an optimized production build...
✔ Client
Compiled successfully in 1.35m
✔ Server
Compiled successfully in 4.43m
[SUCCESS] Generated static files in "build/ko".
[INFO] Use `npm run serve` command to test your build locally.
$ cp _redirects build/_redirects
$ wrangler pages deploy ./build --commit-dirty=true --project-name=memex
🌎 Uploading... (16403/16403)
✨ Success! Uploaded 8158 files (8245 already uploaded) (155.03 sec)
✨ Uploading _redirects
✨ Deployment complete! Take a peek over at https://25a4b06d.memex.pages.dev
⌛ Done in 12618.98s.
~/Downloads/www on main *3 !2 ✔ took 3h 30m 19s base at 04:48:29 PM
On the last line, take a look at 3h 30m 19s. Even though the client and server were compiled in ~4m, the build just hangs there forever, and the node process takes ~7GB of RAM. In previous versions it went up to ~14GB; was there any change in how Docusaurus limits RAM usage at the expense of compilation speed?
Reproducible demo
https://github.com/anaclumos/extracranial
Steps to reproduce
- Run `all-in-one:build`
Expected behavior
Compiles relatively fast, preferably under 30 minutes
Actual behavior
Takes 3 hours to build.
Your environment
- Public source code:
- Public site URL:
- Docusaurus version used:
- Environment name and version (e.g. Chrome 89, Node.js 16.4):
- Operating system and version (e.g. Ubuntu 20.04.2 LTS):
Self-service
- [X] I'd be willing to fix this bug myself.
Hey
No we didn't change anything recently that could lead to such a significant difference.
But your report is not clear enough.
What was the version of Docusaurus you used before exactly?
How long did it take to build previously?
Can you replicate this only on your computer, or also on CI such as GitHub Actions?
What was the upgrade PR?
Are we even sure it's Docusaurus's fault? Your log shows `Done in 12618.98s.`. Please show us the time it takes executing only the Docusaurus build command, building just one language for example, and nothing else.
How come you are reporting using Node.js 16.4 while Docusaurus v3.0 requires Node 18?
First off, huge fan of Docusaurus. Wanted to comment along. This might be tangential, but we also saw a 2x increase in build times upgrading from Docusaurus 3.0.1 to 3.1. We ended up downgrading back to 3.0.1. We leverage our own CI solution, Harness CI Enterprise.
- 3.0.1 builds: 8-9 mins
- 3.1 builds: 17-20 mins
We'd like to dig in a little further if anyone on the Docusaurus project side can weigh in on the Broken Anchors feature [https://github.com/facebook/docusaurus/pull/9528]. If that feature has to build a list of the anchors, that step could take time on larger sites. We tried configuring onBrokenMarkdownLinks to ignore, but I believe the process still runs and just doesn't produce or throw the output. Could ignore potentially be changed to not execute the check at all?
The big increase comes between Server Compile and the "done" hook.
[success] [webpackbar] Server: Compiled successfully in 7.36m
[SUCCESS] Generated static files in "build".
[INFO] Use `npm run serve` command to test your build locally.
Done in 983.84s.
Node Build Version: 18.19.0
Thanks for a great project!
Hey
No we didn't change anything recently that could lead to such a significant difference.
But your report is not clear enough.
What was the version of Docusaurus you used before exactly?
Was using 3.0.1.
How long did it take to build previously?
It took under 30 minutes.
Can you replicate this only on your computer, or also on CI such as GitHub Actions?
My Docusaurus site is pretty big and doesn't fit on CI machines. RAM usage used to spike to 14GB during the sealing process, and all CI machines crashed at that point.
Are we even sure it's Docusaurus's fault? Your log shows `Done in 12618.98s.`. Please show us the time it takes executing only the Docusaurus build command, building just one language for example, and nothing else.
I am sure. Every other script finishes under 1 minute, and it's only the Docusaurus build step that hangs.
How come you are reporting using Node.js 16.4 while Docusaurus v3.0 requires Node 18?
I am using v18.17.1. Where did you get this information, may I ask?
+1 on the sealing process, where the resource usage/time seems to spike for us also. Anything added to that process from 3.0.1 -> 3.1, e.g On Broken Anchors? Thanks!
If it's coming from the broken anchors check, you may try setting `onBrokenAnchors` to `"ignore"` in your docusaurus.config file.
Maybe you can disable it in your CI but still have a build process somewhere that you run manually / every few times to check for broken links / anchors.
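For reference, here's a minimal sketch of what that could look like (onBrokenAnchors is the option introduced in 3.1; the other fields are just placeholder values for illustration):

```js
// docusaurus.config.js — minimal illustrative sketch, not a complete config
// @ts-check

/** @type {import('@docusaurus/types').Config} */
const config = {
  title: 'My Site',           // placeholder
  url: 'https://example.com', // placeholder
  baseUrl: '/',
  // Skip reporting broken anchors (option added in Docusaurus 3.1):
  onBrokenAnchors: 'ignore',
  // Broken path links are controlled by a separate setting:
  onBrokenLinks: 'warn',
};

module.exports = config;
```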
Just to add, we've recently rolled back from 3.1 to 3.0.1 for this exact same issue (we also have a large site). It would normally take approx 45 mins to build, and with 3.1 it moved to just over 2 hours.
However, maybe of interest, when we initially rolled back we updated our package-lock.json and noticed the build times stayed the same (close to 2 hours). Reverting to the original package-lock.json prior to our 3.1 upgrade that we used when originally on 3.0.1, the build went back to 45 mins.
I've just tried it again: when using 3.0.1 but building without a package-lock.json (so the latest dependencies are resolved), the build time more than doubles.
As an aside, `onBrokenAnchors: "ignore"` made no difference for us (and we also fixed all the broken anchors).
Thanks @OzakIOne, it's a great feature. Curious, we noticed the same behavior with ignore. Does the onBrokenAnchors check always run and just not display results when set to ignore? If it does not run, we can rule that out.
@andrewgbell it looks like the build-time increase is not related to the 3.1 upgrade, but rather the upgrade of a transitive dependency that has a perf regression.
It would be super helpful for me to be able to see/run that upgrade myself and study the package-lock.json diff.
Can someone share a site / branch that builds faster in 3.0.1, and where I could reproduce the build time regression by upgrading?
Does the onBrokenAnchors check always run and just not display results when set to ignore? If it does not run, we can rule that out.
@ravilach I'd recommend trying to set both `onBrokenLinks: "ignore"` and `onBrokenAnchors: "ignore"`, because we only "bypass" the broken link checker if both are ignored atm.
I'll try to optimize that better in the future, but in the meantime the code looks like this:
```js
if (onBrokenLinks === 'ignore' && onBrokenAnchors === 'ignore') {
  return;
}
const brokenLinks = getBrokenLinks({
  routes,
  collectedLinks: normalizeCollectedLinks(collectedLinks),
});
reportBrokenLinks({brokenLinks, onBrokenLinks, onBrokenAnchors});
```
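In other words, with the code above the checker is only skipped entirely when both options are ignored. As a config fragment, that would look roughly like this (illustrative sketch only):

```js
// docusaurus.config.js (fragment) — illustrative sketch
const config = {
  // The early return shown above only triggers when BOTH options are 'ignore';
  // e.g. 'ignore' + 'warn' still runs the full broken link collection.
  onBrokenLinks: 'ignore',
  onBrokenAnchors: 'ignore',
};

module.exports = config;
```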
Note: is it possible that you encounter longer build times only due to cache eviction?
We use Webpack with persistent caching, and on rebuilds it's supposed to be faster.
It may be possible that your site takes longer to build simply because the caches were empty?
In this case I suggest trying to run `docusaurus clear && docusaurus build` on your "fast branch" and seeing if it becomes slower to build.
@anaclumos I tried using your repo before the upgrade (https://github.com/anaclumos/extracranial/tree/f144432acdfff55d741a1dbc568ae0b51dd052fe) but the usage of Bun package manager makes it inconvenient to troubleshoot.
First, when I run bun install on your repo with the latest Bun version, it seems to resolve newer versions within the Docusaurus dependency ranges and modify your bun.lockb file.
Then, the binary format of the lockfile makes it super inconvenient to inspect and diff.
Maybe I could try using the exact same version of Bun you are using, and it would not upgrade? For now I'm unable to troubleshoot this using your repo.
Unfortunately the repo isn't (yet) open source, but I can share the package-lock.json files from both runs if they're of any use, and potentially the build log files if there's anything in particular you need?
@andrewgbell I'd have to run this locally myself, partially upgrading some libs in a dichotomic way to find out which transitive dep causes the problem. I doubt seeing a diff will be enough to identify the problem, unfortunately; I need to run the code.
Ours is Open Source: https://github.com/harness/developer-hub if that helps. Currently on DS 3.0.1.
Here is the yarn.lock from the 3.1 upgrade: https://github.com/harness/developer-hub/blob/7b5fbafc4036f61d30e094362a67204cc573cf7a/yarn.lock
@slorber If you need another repo, let me know as I can invite you into our org.
Still investigating your site @ravilach, but it looks like there are 2 problems:
- the broken link checker now using node `new URL()` is much slower (edit: it's not that, it's the matchRoutes calls)
- a transitive dependency is causing longer build times (I suspect related to postcss or css-loader)
Have any of you tried to upgrade without fully regenerating the lockfile, and disabling all the broken link checkers?
yarn upgrade @docusaurus/core@latest @docusaurus/cssnano-preset@latest @docusaurus/plugin-client-redirects@latest @docusaurus/plugin-debug@latest @docusaurus/plugin-google-analytics@latest @docusaurus/plugin-google-gtag@latest @docusaurus/plugin-sitemap@latest @docusaurus/preset-classic@latest @docusaurus/theme-classic@latest @docusaurus/theme-mermaid@latest @docusaurus/theme-search-algolia@latest @docusaurus/module-type-aliases@latest @docusaurus/tsconfig@latest
- `onBrokenLinks: "ignore"`
- `onBrokenAnchors: "ignore"`
Thanks @slorber, much appreciated!
Maybe I could try using the exact same version of Bun you are using, and it would not upgrade? For now I'm unable to troubleshoot this using your repo.
I migrated to pnpm.
Hi, I've added:
onBrokenLinks: "ignore",
onBrokenAnchors: "ignore",
onBrokenMarkdownLinks: "throw",
alongside running
npm upgrade @docusaurus/core @docusaurus/cssnano-preset @docusaurus/plugin-client-redirects @docusaurus/plugin-debug @docusaurus/plugin-google-analytics @docusaurus/plugin-google-gtag @docusaurus/plugin-sitemap @docusaurus/preset-classic @docusaurus/theme-classic @docusaurus/theme-mermaid @docusaurus/theme-search-algolia @docusaurus/module-type-aliases @docusaurus/tsconfig
And build time dropped back to the expected (in fact a few minutes quicker, approx 40 mins). I've tried removing `onBrokenAnchors: "ignore"`, however build time jumped back up to over 2 hours.
I've also tried adding these ignores again
onBrokenLinks: "ignore",
onBrokenAnchors: "ignore",
onBrokenMarkdownLinks: "throw",
but this time upgrading the whole package-lock.json. As of today, it slows by about 10% over the run above (about 45 mins), which is a huge improvement on where we were last week, so I'm not sure if a dependency has updated since.
So it looks like you're correct on the two issues, though the broken links and anchors check seems to have a far greater impact.
Thanks for reporting @andrewgbell
I've submitted a PR that should optimize things, likely faster than before: https://github.com/facebook/docusaurus/pull/9778
So far it seems to work on @ravilach site.
Could you give it a test by building locally with this modified file?
`node_modules/@docusaurus/core/lib/server/brokenLinks.js`

```js
"use strict";
/**
* Copyright (c) Facebook, Inc. and its affiliates.
*
* This source code is licensed under the MIT license found in the
* LICENSE file in the root directory of this source tree.
*/
Object.defineProperty(exports, "__esModule", { value: true });
exports.handleBrokenLinks = void 0;
const tslib_1 = require("tslib");
const lodash_1 = tslib_1.__importDefault(require("lodash"));
const logger_1 = tslib_1.__importDefault(require("@docusaurus/logger"));
const react_router_config_1 = require("react-router-config");
const utils_1 = require("@docusaurus/utils");
const utils_2 = require("./utils");
function matchRoutes(routeConfig, pathname) {
// @ts-expect-error: React router types RouteConfig with an actual React
// component, but we load route components with string paths.
// We don't actually access component here, so it's fine.
return (0, react_router_config_1.matchRoutes)(routeConfig, pathname);
}
function createBrokenLinksHelper({ collectedLinks, routes, }) {
const validPathnames = new Set(collectedLinks.keys());
// Matching against the route array can be expensive
// If the route is already in the valid pathnames,
// we can avoid matching against it as an optimization
const remainingRoutes = routes
.filter((route) => !validPathnames.has(route.path));
function isPathnameMatchingAnyRoute(pathname) {
if (matchRoutes(remainingRoutes, pathname).length > 0) {
// IMPORTANT: this is an optimization here
// See https://github.com/facebook/docusaurus/issues/9754
// Large Docusaurus sites have many routes!
// We try to minimize calls to a possibly expensive matchRoutes function
validPathnames.add(pathname);
return true;
}
return false;
}
function isPathBrokenLink(linkPath) {
const pathnames = [linkPath.pathname, decodeURI(linkPath.pathname)];
if (pathnames.some((p) => validPathnames.has(p))) {
return false;
}
if (pathnames.some(isPathnameMatchingAnyRoute)) {
return false;
}
return true;
}
function isAnchorBrokenLink(linkPath) {
const { pathname, hash } = linkPath;
// Link has no hash: it can't be a broken anchor link
if (hash === undefined) {
return false;
}
// Link has empty hash ("#", "/page#"...): we do not report it as broken
// Empty hashes are used for various weird reasons, by us and other users...
// See for example: https://github.com/facebook/docusaurus/pull/6003
if (hash === '') {
return false;
}
const targetPage = collectedLinks.get(pathname) || collectedLinks.get(decodeURI(pathname));
// link with anchor to a page that does not exist (or did not collect any
// link/anchor) is considered as a broken anchor
if (!targetPage) {
return true;
}
// it's a not broken anchor if the anchor exists on the target page
if (targetPage.anchors.has(hash) ||
targetPage.anchors.has(decodeURIComponent(hash))) {
return false;
}
return true;
}
return {
collectedLinks,
isPathBrokenLink,
isAnchorBrokenLink,
};
}
function getBrokenLinksForPage({ pagePath, helper, }) {
const pageData = helper.collectedLinks.get(pagePath);
const brokenLinks = [];
pageData.links.forEach((link) => {
const linkPath = (0, utils_1.parseURLPath)(link, pagePath);
if (helper.isPathBrokenLink(linkPath)) {
brokenLinks.push({
link,
resolvedLink: (0, utils_1.serializeURLPath)(linkPath),
anchor: false,
});
}
else if (helper.isAnchorBrokenLink(linkPath)) {
brokenLinks.push({
link,
resolvedLink: (0, utils_1.serializeURLPath)(linkPath),
anchor: true,
});
}
});
return brokenLinks;
}
/**
* The route defs can be recursive, and have a parent match-all route. We don't
* want to match broken links like /docs/brokenLink against /docs/*. For this
* reason, we only consider the "final routes" that do not have subroutes.
* We also need to remove the match-all 404 route
*/
function filterIntermediateRoutes(routesInput) {
const routesWithout404 = routesInput.filter((route) => route.path !== '*');
return (0, utils_2.getAllFinalRoutes)(routesWithout404);
}
function getBrokenLinks({ collectedLinks, routes, }) {
const filteredRoutes = filterIntermediateRoutes(routes);
const helper = createBrokenLinksHelper({
collectedLinks,
routes: filteredRoutes,
});
const result = {};
collectedLinks.forEach((_unused, pagePath) => {
try {
result[pagePath] = getBrokenLinksForPage({
pagePath,
helper,
});
}
catch (e) {
throw new Error(`Unable to get broken links for page ${pagePath}.`, {
cause: e,
});
}
});
return result;
}
function brokenLinkMessage(brokenLink) {
const showResolvedLink = brokenLink.link !== brokenLink.resolvedLink;
return `${brokenLink.link}${showResolvedLink ? ` (resolved as: ${brokenLink.resolvedLink})` : ''}`;
}
function createBrokenLinksMessage(pagePath, brokenLinks) {
const type = brokenLinks[0]?.anchor === true ? 'anchor' : 'link';
const anchorMessage = brokenLinks.length > 0
? `- Broken ${type} on source page path = ${pagePath}:
-> linking to ${brokenLinks
.map(brokenLinkMessage)
.join('\n -> linking to ')}`
: '';
return `${anchorMessage}`;
}
function createBrokenAnchorsMessage(brokenAnchors) {
if (Object.keys(brokenAnchors).length === 0) {
return undefined;
}
return `Docusaurus found broken anchors!
Please check the pages of your site in the list below, and make sure you don't reference any anchor that does not exist.
Note: it's possible to ignore broken anchors with the 'onBrokenAnchors' Docusaurus configuration, and let the build pass.
Exhaustive list of all broken anchors found:
${Object.entries(brokenAnchors)
.map(([pagePath, brokenLinks]) => createBrokenLinksMessage(pagePath, brokenLinks))
.join('\n')}
`;
}
function createBrokenPathsMessage(brokenPathsMap) {
if (Object.keys(brokenPathsMap).length === 0) {
return undefined;
}
/**
* If there's a broken link appearing very often, it is probably a broken link
* on the layout. Add an additional message in such case to help user figure
* this out. See https://github.com/facebook/docusaurus/issues/3567#issuecomment-706973805
*/
function getLayoutBrokenLinksHelpMessage() {
const flatList = Object.entries(brokenPathsMap).flatMap(([pagePage, brokenLinks]) => brokenLinks.map((brokenLink) => ({ pagePage, brokenLink })));
const countedBrokenLinks = lodash_1.default.countBy(flatList, (item) => item.brokenLink.link);
const FrequencyThreshold = 5; // Is this a good value?
const frequentLinks = Object.entries(countedBrokenLinks)
.filter(([, count]) => count >= FrequencyThreshold)
.map(([link]) => link);
if (frequentLinks.length === 0) {
return '';
}
return logger_1.default.interpolate `
It looks like some of the broken links we found appear in many pages of your site.
Maybe those broken links appear on all pages through your site layout?
We recommend that you check your theme configuration for such links (particularly, theme navbar and footer).
Frequent broken links are linking to:${frequentLinks}`;
}
return `Docusaurus found broken links!
Please check the pages of your site in the list below, and make sure you don't reference any path that does not exist.
Note: it's possible to ignore broken links with the 'onBrokenLinks' Docusaurus configuration, and let the build pass.${getLayoutBrokenLinksHelpMessage()}
Exhaustive list of all broken links found:
${Object.entries(brokenPathsMap)
.map(([pagePath, brokenPaths]) => createBrokenLinksMessage(pagePath, brokenPaths))
.join('\n')}
`;
}
function splitBrokenLinks(brokenLinks) {
const brokenPaths = {};
const brokenAnchors = {};
Object.entries(brokenLinks).forEach(([pathname, pageBrokenLinks]) => {
const [anchorBrokenLinks, pathBrokenLinks] = lodash_1.default.partition(pageBrokenLinks, (link) => link.anchor);
if (pathBrokenLinks.length > 0) {
brokenPaths[pathname] = pathBrokenLinks;
}
if (anchorBrokenLinks.length > 0) {
brokenAnchors[pathname] = anchorBrokenLinks;
}
});
return { brokenPaths, brokenAnchors };
}
function reportBrokenLinks({ brokenLinks, onBrokenLinks, onBrokenAnchors, }) {
// We need to split the broken links reporting in 2 for better granularity
// This is because we need to report broken path/anchors independently
// For v3.x retro-compatibility, we can't throw by default for broken anchors
// TODO Docusaurus v4: make onBrokenAnchors throw by default?
const { brokenPaths, brokenAnchors } = splitBrokenLinks(brokenLinks);
const pathErrorMessage = createBrokenPathsMessage(brokenPaths);
if (pathErrorMessage) {
logger_1.default.report(onBrokenLinks)(pathErrorMessage);
}
const anchorErrorMessage = createBrokenAnchorsMessage(brokenAnchors);
if (anchorErrorMessage) {
logger_1.default.report(onBrokenAnchors)(anchorErrorMessage);
}
}
// Users might use the useBrokenLinks() API in weird unexpected ways
// JS users might call "collectLink(undefined)" for example
// TS users might call "collectAnchor('#hash')" with/without #
// We clean/normalize the collected data to avoid obscure errors being thrown
// We also use optimized data structures for a faster algorithm
function normalizeCollectedLinks(collectedLinks) {
const result = new Map();
Object.entries(collectedLinks).forEach(([pathname, pageCollectedData]) => {
result.set(pathname, {
links: new Set(pageCollectedData.links.filter(lodash_1.default.isString)),
anchors: new Set(pageCollectedData.anchors
.filter(lodash_1.default.isString)
.map((anchor) => (anchor.startsWith('#') ? anchor.slice(1) : anchor))),
});
});
return result;
}
async function handleBrokenLinks({ collectedLinks, onBrokenLinks, onBrokenAnchors, routes, }) {
if (onBrokenLinks === 'ignore' && onBrokenAnchors === 'ignore') {
return;
}
const brokenLinks = getBrokenLinks({
routes,
collectedLinks: normalizeCollectedLinks(collectedLinks),
});
reportBrokenLinks({ brokenLinks, onBrokenLinks, onBrokenAnchors });
}
exports.handleBrokenLinks = handleBrokenLinks;
```
@slorber Yes, that worked great! Replaced the file and ran it with:
onBrokenLinks: "warn", onBrokenAnchors: "warn", onBrokenMarkdownLinks: "throw",
And it built just as quickly as earlier. Thanks for all your help with this!
Thanks @andrewgbell
Don't you see any improvement too? On @ravilach's site (which I simplified a bit, just 1 docs plugin instance instead of 5), I see a significant improvement in the time to handle broken links and in total build time.
3.0
handleBrokenLinks: 3:28.361 (m:ss.mmm)
✨ Done in 636.47s.
3.1 before
handleBrokenLinks: 6:32.570 (m:ss.mmm)
✨ Done in 785.73s.
3.1 after optimizations
handleBrokenLinks: 694.907ms
✨ Done in 361.92s.
Hi @slorber, sorry, yes. I'd been comparing the 3.1 optimisations with 3.1 ignoring broken links, so I hadn't spotted it.
But looking again we get:
3.0 build time with handleBrokenLinks - 54 mins
3.1 (without fix) build time with handleBrokenLinks - 137 mins
3.1 (with fix) build time with handleBrokenLinks - 41 mins
So very significant! Thanks!
Awesome news then 🎉 thanks for reporting
Just updated. It's even faster than before!! Thank you so much 😃
awesome news @anaclumos
Do you mind sharing numbers? How much faster is it?
It used to take around 20 minutes. Now it finishes around 11 minutes.
🤯 Didn't expect it to have such an impact.
Finally this perf regression was a good thing 😄