After updating to v3.1, large repo build takes 3 hours
Have you read the Contributing Guidelines on issues?
- [X] I have read the Contributing Guidelines on issues.
Prerequisites
- [X] I'm using the latest version of Docusaurus.
- [X] I have tried the `npm run clear` or `yarn clear` command.
- [X] I have tried `rm -rf node_modules yarn.lock package-lock.json` and re-installing packages.
- [ ] I have tried creating a repro with https://new.docusaurus.io.
- [X] I have read the console error message carefully (if applicable).
Description
~/Downloads/www on main *3 +9 !1 qqqq 128 ✘ took 56s base at 12:56:46 PM
$ $npm_execpath run prepare-to-launch && $npm_execpath run scc && $npm_execpath run format && git add . && git commit -m 'wrote something' && git push && $npm_execpath run build && $npm_execpath redirects && until $npm_execpath ship; do :; done
$ $npm_execpath run clear && $npm_execpath run sanitize && $npm_execpath run process-blog && $npm_execpath run process-docs && $npm_execpath run backlinks && $npm_execpath run figcaption && $npm_execpath run readme
$ docusaurus clear && rm -rf 'blog' && rm -rf 'docs' && rm -rf '**/*.config.js' && rm -rf '**/*.config.js.map' && rm -f 'docusaurus.config.js.map' && rm -f 'docusaurus.config.js' && rm -rf 'i18n /**/*.md' && cp tools/안녕.md i18n/ko/docusaurus-plugin-content-docs/current/Hey.md && rm -rf 'i18n /**/*.png' && rm -rf 'i18n /**/*.svg' && rm -rf 'i18n /**/*.jpg' && rm -rf 'i18n /**/*.jpeg'
[SUCCESS] Removed the Webpack persistent cache folder at "/Users/cho/Downloads/www/node_modules/.cache".
$ python3 tools/sanitize.py
Found 2289 MD and MDX files.
Replaced 0 hex marks.
$ python3 tools/process-blog.py
$ python3 tools/process-docs.py
Replaced 10642 wikilinks.
$ python3 tools/process-backlinks.py
Found 4552 MD files.
Wrote 2860 files with 8283 mentions to backlinks.ts.
Wrote 2276 filenames to filenames.ts.
$ python3 tools/img-alt-to-figcaption.py
Found 2324 MD and MDX files.
Replaced 1320 alt texts.
$ cp tools/README.src.md README.md && printf "\n\n## Last updated \n\n$(date)\n" >> README.md
$ printf '\n## Stats\n' >> README.md && printf '\n```\n' >> README.md && scc . >> README.md && printf '```\n' >> README.md
$ prettier --log-level silent --config .prettierrc -w '**/*.{ts,tsx,json,md,mdx,css,scss,html,yml,yaml,mts,mjs,cts,cjs,js,jsx,xml}'
[main 478a064d] wrote something
9 files changed, 13 insertions(+), 6 deletions(-)
create mode 100644 Research/assets/F6FE2E.png
Enumerating objects: 30, done.
Counting objects: 100% (30/30), done.
Delta compression using up to 12 threads
Compressing objects: 100% (16/16), done.
Writing objects: 100% (16/16), 1.43 MiB | 33.99 MiB/s, done.
Total 16 (delta 6), reused 0 (delta 0), pack-reused 0
remote: Resolving deltas: 100% (6/6), completed with 6 local objects.
To https://github.com/anaclumos/extracranial.git
f316c4bd..478a064d main -> main
$ NODE_OPTIONS="--max-old-space-size=16384" docusaurus build
[INFO] Website will be built for all these locales:
- en
- ko
[INFO] [en] Creating an optimized production build...
✔ Client
Compiled successfully in 1.30m
✔ Server
Compiled successfully in 4.46m
[SUCCESS] Generated static files in "build".
[INFO] [ko] Creating an optimized production build...
✔ Client
Compiled successfully in 1.35m
✔ Server
Compiled successfully in 4.43m
[SUCCESS] Generated static files in "build/ko".
[INFO] Use `npm run serve` command to test your build locally.
$ cp _redirects build/_redirects
$ wrangler pages deploy ./build --commit-dirty=true --project-name=memex
🌎 Uploading... (16403/16403)
✨ Success! Uploaded 8158 files (8245 already uploaded) (155.03 sec)
✨ Uploading _redirects
✨ Deployment complete! Take a peek over at https://25a4b06d.memex.pages.dev
⌛ Done in 12618.98s.
~/Downloads/www on main *3 !2 ✔ took 3h 30m 19s base at 04:48:29 PM
On the last line, take a look at 3h 30m 19s. Even though the client and server were compiled in ~4m, the build just hangs there forever, and the node process takes ~7GB of RAM. In previous versions it went up to ~14GB; was there any change in how Docusaurus limits RAM usage at the expense of compilation speed?
Reproducible demo
https://github.com/anaclumos/extracranial
Steps to reproduce
- Run `all-in-one:build`
Expected behavior
Compiles relatively fast, preferably under 30 minutes
Actual behavior
Takes 3 hours to build.
Your environment
- Public source code:
- Public site URL:
- Docusaurus version used:
- Environment name and version (e.g. Chrome 89, Node.js 16.4):
- Operating system and version (e.g. Ubuntu 20.04.2 LTS):
Self-service
- [X] I'd be willing to fix this bug myself.
Hey
No we didn't change anything recently that could lead to such a significant difference.
But your report is not clear enough.
What was the version of Docusaurus you used before exactly?
How long did it take to build previously?
Can you replicate this only on your computer, or also on CI such as GitHub Actions?
What was the upgrade PR?
Are we even sure it's Docusaurus's fault? Your log shows `Done in 12618.98s.`. Please show us the time it takes executing only the Docusaurus build command, building just one language for example, and nothing else.
How come you are reporting using Node.js 16.4 while Docusaurus v3.0 requires Node 18?
First off, huge fan of Docusaurus. Wanted to comment along. This might be tangential, but we also saw a 2x increase in build times upgrading from Docusaurus 3.0.1 to 3.1. We ended up downgrading back to 3.0.1. We leverage our own CI solution, Harness CI Enterprise.
- 3.0.1 builds: 8-9 mins
- 3.1 builds: 17-20 mins
We'd like to dig in a little further if anyone on the Docusaurus project side can weigh in on the Broken Anchors feature [https://github.com/facebook/docusaurus/pull/9528]. If that feature has to build a list of the anchors, that step could take time on larger sites. We tried configuring onBrokenMarkdownLinks to ignore, but I believe the process still runs and just doesn't produce or throw the output. Could ignore potentially be changed to not execute the check at all?
The big increase comes between Server Compile and the "done" hook.
[success] [webpackbar] Server: Compiled successfully in 7.36m
[SUCCESS] Generated static files in "build".
[INFO] Use `npm run serve` command to test your build locally.
Done in 983.84s.
Node Build Version: 18.19.0
Thanks for a great project!
Hey
No we didn't change anything recently that could lead to such a significant difference.
But your report is not clear enough.
What was the version of Docusaurus you used before exactly?
Was using 3.0.1.
How long did it take to build previously?
It took under 30 minutes.
Can you replicate this only on your computer, or also on CI such as GitHub Actions?
My Docusaurus site is pretty big and doesn't fit on CI machines. RAM usage used to spike to 14GB during the sealing process, and all CI machines crashed at that point.
Are we even sure it's Docusaurus's fault? Your log shows `Done in 12618.98s.`. Please show us the time it takes executing only the Docusaurus build command, building just one language for example, and nothing else.
I am sure. Every other script finishes under 1 minute, and it's only the Docusaurus build step that hangs.
How come you are reporting using Node.js 16.4 while Docusaurus v3.0 requires Node 18?
I am using v18.17.1. Where did you get this information, may I ask?
+1 on the sealing process, where the resource usage/time seems to spike for us also. Anything added to that process from 3.0.1 -> 3.1, e.g On Broken Anchors? Thanks!
If it's coming from the broken anchors check, you may try setting `onBrokenAnchors` to `"ignore"` in your docusaurus.config file.
Maybe you can disable it in your CI but still have a build process somewhere that you run manually / every few times to check for broken links / anchors.
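For reference, here's a minimal sketch of what that could look like (onBrokenAnchors is the option introduced in 3.1; the other fields are just placeholder values for illustration):

```js
// docusaurus.config.js — minimal illustrative sketch, not a complete config
// @ts-check

/** @type {import('@docusaurus/types').Config} */
const config = {
  title: 'My Site',           // placeholder
  url: 'https://example.com', // placeholder
  baseUrl: '/',
  // Skip reporting broken anchors (option added in Docusaurus 3.1):
  onBrokenAnchors: 'ignore',
  // Broken path links are controlled by a separate setting:
  onBrokenLinks: 'warn',
};

module.exports = config;
```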
Just to add, we've recently rolled back from 3.1 to 3.0.1 for this exact same issue (we also have a large site). It would normally take approx 45 mins to build, and with 3.1 it moved to just over 2 hours.
However, maybe of interest, when we initially rolled back we updated our package-lock.json and noticed the build times stayed the same (close to 2 hours). Reverting to the original package-lock.json prior to our 3.1 upgrade that we used when originally on 3.0.1, the build went back to 45 mins.
I've just tried it again: when using 3.0.1 but building without a package-lock.json (so the latest dependencies are resolved), the build time more than doubles.
As an aside, `onBrokenAnchors: "ignore"` made no difference for us (and we also fixed all the broken anchors).
Thanks @OzakIOne, it's a great feature. Curious, we noticed the same behavior with ignore. Does the onBrokenAnchors check always run and just not display results when set to ignore? If it does not run, we can rule that out.
@andrewgbell it looks like the build-time increase is not related to the 3.1 upgrade, but rather the upgrade of a transitive dependency that has a perf regression.
It would be super helpful for me to be able to see/run that upgrade myself and study the package-lock.json diff.
Can someone share a site / branch that builds faster in 3.0.1, and where I could reproduce the build time regression by upgrading?
Does the onBrokenAnchors check always run and just not display results when set to ignore? If it does not run, we can rule that out.
@ravilach I'd recommend trying to set both `onBrokenLinks: "ignore"` and `onBrokenAnchors: "ignore"`, because we only "bypass" the broken link checker if both are ignored atm.
I'll try to optimize that better in the future, but in the meantime the code looks like this:
```js
if (onBrokenLinks === 'ignore' && onBrokenAnchors === 'ignore') {
  return;
}
const brokenLinks = getBrokenLinks({
  routes,
  collectedLinks: normalizeCollectedLinks(collectedLinks),
});
reportBrokenLinks({brokenLinks, onBrokenLinks, onBrokenAnchors});
```
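In other words, with the code above the checker is only skipped entirely when both options are ignored. As a config fragment, that would look roughly like this (illustrative sketch only):

```js
// docusaurus.config.js (fragment) — illustrative sketch
const config = {
  // The early return shown above only triggers when BOTH options are 'ignore';
  // e.g. 'ignore' + 'warn' still runs the full broken link collection.
  onBrokenLinks: 'ignore',
  onBrokenAnchors: 'ignore',
};

module.exports = config;
```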
Note: is it possible that you encounter longer build times only due to cache eviction?
We use Webpack with persistent caching, and on rebuilds it's supposed to be faster.
It may be possible that your site takes longer to build simply because the caches were empty?
In this case I suggest trying to run `docusaurus clear && docusaurus build` on your "fast branch" and seeing if it becomes slower to build.
@anaclumos I tried using your repo before the upgrade (https://github.com/anaclumos/extracranial/tree/f144432acdfff55d741a1dbc568ae0b51dd052fe) but the usage of Bun package manager makes it inconvenient to troubleshoot.
First, when I run bun install on your repo with the latest Bun version, it seems to resolve newer versions within the Docusaurus dependency ranges and modify your bun.lockb file.
Then, the binary format of the lockfile makes it super inconvenient to inspect and diff.
Maybe I could try using the exact same version of Bun you are using, and it would not upgrade? For now I'm unable to troubleshoot this using your repo.
Unfortunately the repo isn't (yet) open source, but I can share the package-lock.json files from both runs if they're of any use, and potentially the build log files if there's anything in particular you need?
@andrewgbell I'd have to run this locally myself, partially upgrading some libs in a dichotomic way to find out which transitive dep causes the problem. I doubt seeing a diff will be enough to identify the problem, unfortunately; I need to run the code.
Ours is Open Source: https://github.com/harness/developer-hub if that helps. Currently on DS 3.0.1.
Here is the yarn.lock from the 3.1 upgrade: https://github.com/harness/developer-hub/blob/7b5fbafc4036f61d30e094362a67204cc573cf7a/yarn.lock
@slorber If you need another repo, let me know as I can invite you into our org.
Still investigating your site @ravilach, but it looks like there are 2 problems:
- the broken link checker now using node `new URL()` is much slower (edit: it's not that, it's the matchRoutes calls)
- a transitive dependency is causing longer build times (I suspect related to postcss or css-loader)
Have any of you tried to upgrade without fully regenerating the lockfile, and disabling all the broken link checkers?
yarn upgrade @docusaurus/core@latest @docusaurus/cssnano-preset@latest @docusaurus/plugin-client-redirects@latest @docusaurus/plugin-debug@latest @docusaurus/plugin-google-analytics@latest @docusaurus/plugin-google-gtag@latest @docusaurus/plugin-sitemap@latest @docusaurus/preset-classic@latest @docusaurus/theme-classic@latest @docusaurus/theme-mermaid@latest @docusaurus/theme-search-algolia@latest @docusaurus/module-type-aliases@latest @docusaurus/tsconfig@latest
- `onBrokenLinks: "ignore"`
- `onBrokenAnchors: "ignore"`
Thanks @slorber, much appreciated!
Maybe I could try using the exact same version of Bun you are using, and it would not upgrade? For now I'm unable to troubleshoot this using your repo.
I migrated to pnpm.
Hi, I've added:
onBrokenLinks: "ignore",
onBrokenAnchors: "ignore",
onBrokenMarkdownLinks: "throw",
alongside running
npm upgrade @docusaurus/core @docusaurus/cssnano-preset @docusaurus/plugin-client-redirects @docusaurus/plugin-debug @docusaurus/plugin-google-analytics @docusaurus/plugin-google-gtag @docusaurus/plugin-sitemap @docusaurus/preset-classic @docusaurus/theme-classic @docusaurus/theme-mermaid @docusaurus/theme-search-algolia @docusaurus/module-type-aliases @docusaurus/tsconfig
And build time dropped back to the expected (in fact a few minutes quicker, approx 40 mins). I've tried removing `onBrokenAnchors: "ignore"`, however build time jumped back up to over 2 hours.
I've also tried adding these ignores again
onBrokenLinks: "ignore",
onBrokenAnchors: "ignore",
onBrokenMarkdownLinks: "throw",
but this time upgrading the whole package-lock.json. As of today, it slows by about 10% over the run above (about 45 mins), which is a huge improvement on where we were last week, so I'm not sure if a dependency has updated since.
So it looks like you're correct on the two issues, though the broken links and anchors check seems to have a far greater impact.
Thanks for reporting @andrewgbell
I've submitted a PR that should optimize things, likely faster than before: https://github.com/facebook/docusaurus/pull/9778
So far it seems to work on @ravilach site.
Could you give it a test by building locally with this modified file?
`node_modules/@docusaurus/core/lib/server/brokenLinks.js`

```js
"use strict";
/**
* Copyright (c) Facebook, Inc. and its affiliates.
*
* This source code is licensed under the MIT license found in the
* LICENSE file in the root directory of this source tree.
*/
Object.defineProperty(exports, "__esModule", { value: true });
exports.handleBrokenLinks = void 0;
const tslib_1 = require("tslib");
const lodash_1 = tslib_1.__importDefault(require("lodash"));
const logger_1 = tslib_1.__importDefault(require("@docusaurus/logger"));
const react_router_config_1 = require("react-router-config");
const utils_1 = require("@docusaurus/utils");
const utils_2 = require("./utils");
function matchRoutes(routeConfig, pathname) {
// @ts-expect-error: React router types RouteConfig with an actual React
// component, but we load route components with string paths.
// We don't actually access component here, so it's fine.
return (0, react_router_config_1.matchRoutes)(routeConfig, pathname);
}
function createBrokenLinksHelper({ collectedLinks, routes, }) {
const validPathnames = new Set(collectedLinks.keys());
// Matching against the route array can be expensive
// If the route is already in the valid pathnames,
// we can avoid matching against it as an optimization
const remainingRoutes = routes
.filter((route) => !validPathnames.has(route.path));
function isPathnameMatchingAnyRoute(pathname) {
if (matchRoutes(remainingRoutes, pathname).length > 0) {
// IMPORTANT: this is an optimization here
// See https://github.com/facebook/docusaurus/issues/9754
// Large Docusaurus sites have many routes!
// We try to minimize calls to a possibly expensive matchRoutes function
validPathnames.add(pathname);
return true;
}
return false;
}
function isPathBrokenLink(linkPath) {
const pathnames = [linkPath.pathname, decodeURI(linkPath.pathname)];
if (pathnames.some((p) => validPathnames.has(p))) {
return false;
}
if (pathnames.some(isPathnameMatchingAnyRoute)) {
return false;
}
return true;
}
function isAnchorBrokenLink(linkPath) {
const { pathname, hash } = linkPath;
// Link has no hash: it can't be a broken anchor link
if (hash === undefined) {
return false;
}
// Link has empty hash ("#", "/page#"...): we do not report it as broken
// Empty hashes are used for various weird reasons, by us and other users...
// See for example: https://github.com/facebook/docusaurus/pull/6003
if (hash === '') {
return false;
}
const targetPage = collectedLinks.get(pathname) || collectedLinks.get(decodeURI(pathname));
// link with anchor to a page that does not exist (or did not collect any
// link/anchor) is considered as a broken anchor
if (!targetPage) {
return true;
}
// it's a not broken anchor if the anchor exists on the target page
if (targetPage.anchors.has(hash) ||
targetPage.anchors.has(decodeURIComponent(hash))) {
return false;
}
return true;
}
return {
collectedLinks,
isPathBrokenLink,
isAnchorBrokenLink,
};
}
function getBrokenLinksForPage({ pagePath, helper, }) {
const pageData = helper.collectedLinks.get(pagePath);
const brokenLinks = [];
pageData.links.forEach((link) => {
const linkPath = (0, utils_1.parseURLPath)(link, pagePath);
if (helper.isPathBrokenLink(linkPath)) {
brokenLinks.push({
link,
resolvedLink: (0, utils_1.serializeURLPath)(linkPath),
anchor: false,
});
}
else if (helper.isAnchorBrokenLink(linkPath)) {
brokenLinks.push({
link,
resolvedLink: (0, utils_1.serializeURLPath)(linkPath),
anchor: true,
});
}
});
return brokenLinks;
}
/**
* The route defs can be recursive, and have a parent match-all route. We don't
* want to match broken links like /docs/brokenLink against /docs/*. For this
* reason, we only consider the "final routes" that do not have subroutes.
* We also need to remove the match-all 404 route
*/
function filterIntermediateRoutes(routesInput) {
const routesWithout404 = routesInput.filter((route) => route.path !== '*');
return (0, utils_2.getAllFinalRoutes)(routesWithout404);
}
function getBrokenLinks({ collectedLinks, routes, }) {
const filteredRoutes = filterIntermediateRoutes(routes);
const helper = createBrokenLinksHelper({
collectedLinks,
routes: filteredRoutes,
});
const result = {};
collectedLinks.forEach((_unused, pagePath) => {
try {
result[pagePath] = getBrokenLinksForPage({
pagePath,
helper,
});
}
catch (e) {
throw new Error(`Unable to get broken links for page ${pagePath}.`, {
cause: e,
});
}
});
return result;
}
function brokenLinkMessage(brokenLink) {
const showResolvedLink = brokenLink.link !== brokenLink.resolvedLink;
return `${brokenLink.link}${showResolvedLink ? ` (resolved as: ${brokenLink.resolvedLink})` : ''}`;
}
function createBrokenLinksMessage(pagePath, brokenLinks) {
const type = brokenLinks[0]?.anchor === true ? 'anchor' : 'link';
const anchorMessage = brokenLinks.length > 0
? `- Broken ${type} on source page path = ${pagePath}:
-> linking to ${brokenLinks
.map(brokenLinkMessage)
.join('\n -> linking to ')}`
: '';
return `${anchorMessage}`;
}
function createBrokenAnchorsMessage(brokenAnchors) {
if (Object.keys(brokenAnchors).length === 0) {
return undefined;
}
return `Docusaurus found broken anchors!
Please check the pages of your site in the list below, and make sure you don't reference any anchor that does not exist.
Note: it's possible to ignore broken anchors with the 'onBrokenAnchors' Docusaurus configuration, and let the build pass.
Exhaustive list of all broken anchors found:
${Object.entries(brokenAnchors)
.map(([pagePath, brokenLinks]) => createBrokenLinksMessage(pagePath, brokenLinks))
.join('\n')}
`;
}
function createBrokenPathsMessage(brokenPathsMap) {
if (Object.keys(brokenPathsMap).length === 0) {
return undefined;
}
/**
* If there's a broken link appearing very often, it is probably a broken link
* on the layout. Add an additional message in such case to help user figure
* this out. See https://github.com/facebook/docusaurus/issues/3567#issuecomment-706973805
*/
function getLayoutBrokenLinksHelpMessage() {
const flatList = Object.entries(brokenPathsMap).flatMap(([pagePage, brokenLinks]) => brokenLinks.map((brokenLink) => ({ pagePage, brokenLink })));
const countedBrokenLinks = lodash_1.default.countBy(flatList, (item) => item.brokenLink.link);
const FrequencyThreshold = 5; // Is this a good value?
const frequentLinks = Object.entries(countedBrokenLinks)
.filter(([, count]) => count >= FrequencyThreshold)
.map(([link]) => link);
if (frequentLinks.length === 0) {
return '';
}
return logger_1.default.interpolate `
It looks like some of the broken links we found appear in many pages of your site.
Maybe those broken links appear on all pages through your site layout?
We recommend that you check your theme configuration for such links (particularly, theme navbar and footer).
Frequent broken links are linking to:${frequentLinks}`;
}
return `Docusaurus found broken links!
Please check the pages of your site in the list below, and make sure you don't reference any path that does not exist.
Note: it's possible to ignore broken links with the 'onBrokenLinks' Docusaurus configuration, and let the build pass.${getLayoutBrokenLinksHelpMessage()}
Exhaustive list of all broken links found:
${Object.entries(brokenPathsMap)
.map(([pagePath, brokenPaths]) => createBrokenLinksMessage(pagePath, brokenPaths))
.join('\n')}
`;
}
function splitBrokenLinks(brokenLinks) {
const brokenPaths = {};
const brokenAnchors = {};
Object.entries(brokenLinks).forEach(([pathname, pageBrokenLinks]) => {
const [anchorBrokenLinks, pathBrokenLinks] = lodash_1.default.partition(pageBrokenLinks, (link) => link.anchor);
if (pathBrokenLinks.length > 0) {
brokenPaths[pathname] = pathBrokenLinks;
}
if (anchorBrokenLinks.length > 0) {
brokenAnchors[pathname] = anchorBrokenLinks;
}
});
return { brokenPaths, brokenAnchors };
}
function reportBrokenLinks({ brokenLinks, onBrokenLinks, onBrokenAnchors, }) {
// We need to split the broken links reporting in 2 for better granularity
// This is because we need to report broken path/anchors independently
// For v3.x retro-compatibility, we can't throw by default for broken anchors
// TODO Docusaurus v4: make onBrokenAnchors throw by default?
const { brokenPaths, brokenAnchors } = splitBrokenLinks(brokenLinks);
const pathErrorMessage = createBrokenPathsMessage(brokenPaths);
if (pathErrorMessage) {
logger_1.default.report(onBrokenLinks)(pathErrorMessage);
}
const anchorErrorMessage = createBrokenAnchorsMessage(brokenAnchors);
if (anchorErrorMessage) {
logger_1.default.report(onBrokenAnchors)(anchorErrorMessage);
}
}
// Users might use the useBrokenLinks() API in weird unexpected ways
// JS users might call "collectLink(undefined)" for example
// TS users might call "collectAnchor('#hash')" with/without #
// We clean/normalize the collected data to avoid obscure errors being thrown
// We also use optimized data structures for a faster algorithm
function normalizeCollectedLinks(collectedLinks) {
const result = new Map();
Object.entries(collectedLinks).forEach(([pathname, pageCollectedData]) => {
result.set(pathname, {
links: new Set(pageCollectedData.links.filter(lodash_1.default.isString)),
anchors: new Set(pageCollectedData.anchors
.filter(lodash_1.default.isString)
.map((anchor) => (anchor.startsWith('#') ? anchor.slice(1) : anchor))),
});
});
return result;
}
async function handleBrokenLinks({ collectedLinks, onBrokenLinks, onBrokenAnchors, routes, }) {
if (onBrokenLinks === 'ignore' && onBrokenAnchors === 'ignore') {
return;
}
const brokenLinks = getBrokenLinks({
routes,
collectedLinks: normalizeCollectedLinks(collectedLinks),
});
reportBrokenLinks({ brokenLinks, onBrokenLinks, onBrokenAnchors });
}
exports.handleBrokenLinks = handleBrokenLinks;
```
@slorber Yes, that worked great! Replaced the file and ran it with:
onBrokenLinks: "warn", onBrokenAnchors: "warn", onBrokenMarkdownLinks: "throw",
And it built just as quickly as earlier. Thanks for all your help with this!
Thanks @andrewgbell
Don't you see any improvement too? On @ravilach's site (which I simplified a bit, just 1 docs plugin instance instead of 5), I see a significant improvement in the time to handle broken links and in total build time.
3.0
handleBrokenLinks: 3:28.361 (m:ss.mmm)
✨ Done in 636.47s.
3.1 before
handleBrokenLinks: 6:32.570 (m:ss.mmm)
✨ Done in 785.73s.
3.1 after optimizations
handleBrokenLinks: 694.907ms
✨ Done in 361.92s.
Hi @slorber, sorry, yes. I'd been comparing the 3.1 optimisations with 3.1 ignoring broken links, so I hadn't spotted it.
But looking again we get:
3.0 build time with handleBrokenLinks - 54 mins
3.1 (without fix) build time with handleBrokenLinks - 137 mins
3.1 (with fix) build time with handleBrokenLinks - 41 mins
So very significant! Thanks!
Awesome news then 🎉 thanks for reporting
Just updated. It's even faster than before!! Thank you so much 😃
awesome news @anaclumos
Do you mind sharing numbers? How much faster is it?
It used to take around 20 minutes. Now it finishes around 11 minutes.
🤯 Didn't expect it to have such an impact.
Finally this perf regression was a good thing 😄