RangeError: Invalid string length when building with a very large number of blogs #9907

Open
6 of 7 tasks
ardavank opened this issue Mar 4, 2024 · 7 comments
Labels
bug An error in the Docusaurus core causing instability or issues with its execution

Comments

@ardavank

ardavank commented Mar 4, 2024

Have you read the Contributing Guidelines on issues?

Prerequisites

  • I'm using the latest version of Docusaurus.
  • I have tried the npm run clear or yarn clear command.
  • I have tried rm -rf node_modules yarn.lock package-lock.json and re-installing packages.
  • I have tried creating a repro with https://new.docusaurus.io.
  • I have read the console error message carefully (if applicable).

Description

docusaurus build fails with RangeError: Invalid string length when building a website with a very large number of blog posts (~30K).

Error:

$ NODE_OPTIONS=--max_old_space_size=12288 npm run build -- --locale fa

> [email protected] build
> docusaurus build --locale fa

[INFO] [fa] Creating an optimized production build...

Error: Unable to build website for locale fa.
    at tryToBuildLocale (/Users/ardavan/Documents/sample.com/node_modules/@docusaurus/core/lib/commands/build.js:55:19) {
  [cause]: RangeError: Invalid string length
      at JSON.stringify (<anonymous>)
      at Object.contentLoaded (/Users/ardavan/Documents/sample.com/node_modules/@docusaurus/plugin-content-blog/lib/index.js:135:104)
      at /Users/ardavan/Documents/sample.com/node_modules/@docusaurus/core/lib/server/plugins/index.js:99:22
      at async Promise.all (index 3)
      at async loadPlugins (/Users/ardavan/Documents/sample.com/node_modules/@docusaurus/core/lib/server/plugins/index.js:63:5)
      at async load (/Users/ardavan/Documents/sample.com/node_modules/@docusaurus/core/lib/server/index.js:76:58)
      at async buildLocale (/Users/ardavan/Documents/sample.com/node_modules/@docusaurus/core/lib/commands/build.js:95:19)
      at async tryToBuildLocale (/Users/ardavan/Documents/sample.com/node_modules/@docusaurus/core/lib/commands/build.js:46:20)
}

[INFO] Docusaurus version: 3.1.1
Node version: v20.11.0
error Command failed with exit code 1.
info Visit https://yarnpkg.com/en/docs/cli/run for documentation about this command.

Looking at the code, it looks like JSON.stringify({blogPosts: listedBlogPosts}) is the problematic part.
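
For context, RangeError: Invalid string length is what V8 throws when a single string would exceed its maximum length, which Node.js exposes as buffer.constants.MAX_STRING_LENGTH. The sketch below is a hypothetical pre-check (the helper names are illustrative, not part of Docusaurus) that adds up the compact serialized length of each post separately, so it never has to build the full string. It ignores the whitespace added by an indent argument, so pretty-printed output would be larger still.

```javascript
const {constants} = require('node:buffer');

// Hypothetical helper: length of JSON.stringify(items) for an array,
// computed item by item so the full string is never materialized.
function compactJsonLength(items) {
  let total = 2; // opening "[" and closing "]"
  for (const item of items) {
    // Inside arrays, JSON.stringify emits "null" for unserializable values.
    total += (JSON.stringify(item) ?? 'null').length;
  }
  return total + Math.max(items.length - 1, 0); // commas between items
}

// True when the compact serialization alone would not fit in one string.
function exceedsStringLimit(items) {
  return compactJsonLength(items) > constants.MAX_STRING_LENGTH;
}

const samplePosts = [
  {id: 'post-1', title: 'Hello'},
  {id: 'post-2', title: 'سلام'},
];
console.log(compactJsonLength(samplePosts), exceedsStringLimit(samplePosts));
```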

Reproducible demo

No response

Steps to reproduce

  • Generate a lot of blog posts (~30K)
  • run build

Expected behavior

  • Should build the website without any issues

Actual behavior

It throws RangeError: Invalid string length.

Your environment

  • Public source code:
  • Public site URL:
  • Docusaurus version used:
  • Environment name and version (e.g. Chrome 89, Node.js 16.4):
    Docusaurus version: 3.1.1
    Node version: v20.11.0
  • Operating system and version (e.g. Ubuntu 20.04.2 LTS):
    macOS 14.3.1 (23D60)

Self-service

  • I'd be willing to fix this bug myself.
ardavank added the "bug" and "status: needs triage" labels on Mar 4, 2024
slorber removed the "status: needs triage" label on Mar 4, 2024
@slorber
Collaborator

slorber commented Mar 4, 2024

30k blog posts? 🤯 Why do you have so many?

Yes, it seems like we should not use JSON.stringify but instead use a serialization lib supporting streaming.

For example https://github.com/dominictarr/JSONStream#jsonstreamstringifyopen-sep-close
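
As a rough illustration of the idea, here is a minimal sketch that writes the archive JSON one post at a time using only Node's built-in fs module. It is not the JSONStream API and not how Docusaurus does it; the helper name writeJsonArrayIncrementally and the temp file path are assumptions made for the example.

```javascript
const fs = require('node:fs');
const os = require('node:os');
const path = require('node:path');

// Hypothetical helper (not a Docusaurus or JSONStream API): writes
// {"<key>": [...items]} to filePath one item at a time.
function writeJsonArrayIncrementally(filePath, key, items) {
  const fd = fs.openSync(filePath, 'w');
  try {
    fs.writeSync(fd, `{${JSON.stringify(key)}:[`);
    let first = true;
    for (const item of items) {
      // Only one item is serialized at a time, so no single string
      // has to hold the whole document.
      const json = JSON.stringify(item) ?? 'null';
      fs.writeSync(fd, first ? json : `,${json}`);
      first = false;
    }
    fs.writeSync(fd, ']}');
  } finally {
    fs.closeSync(fd);
  }
}

// Usage: write a tiny archive to a temp file and read it back.
const archivePath = path.join(os.tmpdir(), `archive-${process.pid}.json`);
const listedBlogPosts = [{id: 1, title: 'First'}, {id: 2, title: 'Second'}];
writeJsonArrayIncrementally(archivePath, 'blogPosts', listedBlogPosts);
const roundTripped = JSON.parse(fs.readFileSync(archivePath, 'utf8'));
fs.unlinkSync(archivePath);
```

This only removes the single-string limit on the write side; whatever consumes the file afterwards still has to cope with its size.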


Note to myself: I'll probably take this opportunity to encapsulate this streaming, and expose simpler, more testable interfaces to create route data bundles because I'm not a fan of our historic plugin actions API for that:

// Create a blog archive route
const archiveProp = await createData(
  `${docuHash(archiveUrl)}.json`,
  JSON.stringify({blogPosts: listedBlogPosts}, null, 2),
);
addRoute({
  path: archiveUrl,
  component: blogArchiveComponent,
  exact: true,
  modules: {
    archive: aliasedSource(archiveProp),
  },
});

That would be more convenient to have everything handled for you, and just write:

addRoute({
  path: archiveUrl,
  component: blogArchiveComponent,
  exact: true,
  props: {
    archive: {blogPosts: listedBlogPosts},
  },
});

Note we'd still need to keep a way to create data bundles independent from routes (+ streaming support), because those data bundles can be shared between routes, reducing the amount of data to load when navigating from one route to another. (although this data could probably be added as a routeContext)
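
To make the proposed shape concrete, here is a hypothetical wrapper that accepts props and converts them into data bundles behind the scenes. It is a sketch only, not an existing Docusaurus API: addRouteWithProps, toDataFileName and the in-memory fakeActions are invented for illustration, and the docuHash and aliasedSource steps from the original snippet are left out.

```javascript
// Hypothetical wrapper, not an existing Docusaurus API. `actions` is assumed
// to expose createData(name, data) -> Promise<path> and addRoute(config),
// matching how they are used in the plugin code quoted earlier.
async function addRouteWithProps(actions, {props = {}, ...route}, toDataFileName) {
  const modules = {};
  for (const [propName, value] of Object.entries(props)) {
    const fileName = toDataFileName(route.path, propName);
    // The serialization strategy (JSON.stringify here, streaming later)
    // becomes an internal detail hidden from plugin authors.
    modules[propName] = await actions.createData(
      fileName,
      JSON.stringify(value, null, 2),
    );
  }
  actions.addRoute({...route, modules});
}

// Usage with in-memory stand-ins for the real plugin actions.
const written = new Map();
const routes = [];
const fakeActions = {
  createData: async (name, data) => {
    written.set(name, data);
    return `/generated/${name}`;
  },
  addRoute: (config) => {
    routes.push(config);
  },
};
const done = addRouteWithProps(
  fakeActions,
  {
    path: '/blog/archive',
    component: '@theme/BlogArchivePage',
    exact: true,
    props: {archive: {blogPosts: [{id: 1}]}},
  },
  (routePath, propName) => `${routePath.replace(/\W+/g, '_')}_${propName}.json`,
);
```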

@ardavank
Author

ardavank commented Mar 5, 2024

@slorber thanks for looking into this issue. I'm replacing WordPress with Docusaurus to migrate an existing news website. The reasons are the simplicity, speed, and low maintenance cost of Docusaurus!

You can see it in action: https://mesghalapp.com/en/news

Currently I have two blockers:
1- The issue with JSON.stringify
2- The 26 MB file size limit of Cloudflare

[Screenshot 2024-03-04 at 23 01 13]

The screenshot above is from a build that includes only ~2K blog posts.
JSON.stringify({blogPosts: listedBlogPosts}) creates a massive file, especially when the blog posts are written in languages whose characters need to be encoded as well.

[Screenshot 2024-03-04 at 23 14 18]

Is it possible to split this file into smaller files as well?
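
One way to picture such a split, as a sketch under stated assumptions rather than anything Docusaurus provides: group the posts into chunks whose serialized size stays under a byte budget. Sizes are measured in UTF-8 bytes with Buffer.byteLength, because non-Latin text takes more bytes than its character count suggests. The helper name chunkByByteSize and the tiny budget are made up for the example.

```javascript
// Hypothetical helper: groups items into chunks whose compact JSON
// serialization stays within maxBytes (UTF-8 bytes, not characters).
function chunkByByteSize(items, maxBytes) {
  const chunks = [];
  let current = [];
  let currentBytes = 2; // "[" and "]"
  for (const item of items) {
    // +1 budgets for a separating comma (slightly pessimistic for the
    // first item of each chunk).
    const itemBytes =
      Buffer.byteLength(JSON.stringify(item) ?? 'null', 'utf8') + 1;
    if (current.length > 0 && currentBytes + itemBytes > maxBytes) {
      chunks.push(current);
      current = [];
      currentBytes = 2;
    }
    current.push(item);
    currentBytes += itemBytes;
  }
  if (current.length > 0) chunks.push(current);
  return chunks;
}

// Usage with a deliberately tiny budget of 16 bytes per chunk.
const titles = ['aaaa', 'bbbb', 'سلام'];
const chunks = chunkByByteSize(titles, 16);
console.log(chunks);
```

An item that is larger than the budget on its own still ends up in a chunk by itself, so a real implementation would need to decide how to handle that case.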

@slorber
Collaborator

slorber commented Mar 5, 2024

https://mesghalapp.com/en/news/archive/

To be honest I don't think Docusaurus is designed to support that kind of scale. It seems you have 2k entries just for 2024, and if it keeps growing at the same pace the build time will quickly become unsustainable.

You'd be better off using a framework that supports server-side rendering.

Note that the blog archive page can be disabled with the option archiveBasePath: null, so it might unblock you temporarily:
https://docusaurus.io/docs/api/plugins/@docusaurus/plugin-content-blog#archiveBasePath

But even with that workaround, I doubt Docusaurus will be the best choice for your needs.
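
For reference, a config fragment showing where that option would go. This assumes the blog is configured through the classic preset (an assumption, since the site's config is not shown in this thread); required fields such as title, url and baseUrl are omitted.

```javascript
// docusaurus.config.js (fragment). Assumes the blog is configured through
// the classic preset; adapt if the blog plugin is registered directly.
const config = {
  presets: [
    [
      'classic',
      {
        blog: {
          // null disables the generated blog archive page.
          archiveBasePath: null,
        },
      },
    ],
  ],
};

module.exports = config;
```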

@ardavank
Author

ardavank commented Mar 6, 2024

@slorber thanks for the info. Currently I'm only keeping the last 3 months of news and deleting older posts to work around this limitation.
I feel like the build process can be improved. Can you please guide me on how to debug the build and get more detailed performance logs for each step of the process?
I'm interested in improving the build performance.

@slorber
Collaborator

slorber commented Mar 6, 2024

> I'm interested in improving the build performance

You are lucky because I'm actively working on Docusaurus performance issues right now.

The upcoming v3.2 release will be faster and have some basic perf logging that you can turn on with DOCUSAURUS_PERF_LOGGER=true (internal usage, not publicly documented for now). You can try using the latest canary to benefit from these improvements. AFAIK canary doesn't have any breaking changes from v3 yet, so it should be fine.

However, it does not fix all the problems yet, and the main unfixed bottleneck remains bundling your app with Webpack for both client consumption and SSR. Also, the bundle we assemble for server/Node usage is historically a single huge JS file, which causes memory issues during the SSG phase.

Yes, the build process can be improved, but this is likely quite technical and I'd prefer to handle it myself. Most likely we will try to swap Webpack for Rspack soon and provide a flag to enable Rspack, to offer an incremental migration path. But Rspack is not yet 100% backward compatible with Webpack, so it might not even work right now.

@ardavank
Author

@slorber
I tried version 0.0.0-5861, and here are the results:

> [email protected] build
> docusaurus build --locale en --out-dir build/en

[PERF] Get locales to build: 0.197ms
[INFO] [en] Creating an optimized production build...
[PERF] Load - loadContext: 149.412ms
[PERF] Plugins - initPlugins: 123.724ms
[PERF] Plugin - loadContent - docusaurus-plugin-sitemap@default: 0.609ms
[PERF] Plugin - loadContent - docusaurus-plugin-google-gtag@default: 0.646ms
[PERF] Plugin - loadContent - docusaurus-bootstrap-plugin@default: 0.703ms
[PERF] Plugin - loadContent - docusaurus-mdx-fallback-plugin@default: 0.865ms
[PERF] Plugin - loadContent - docusaurus-theme-classic@default: 22.973ms
[PERF] Plugin - loadContent - docusaurus-plugin-content-pages@default: 110.539ms
[PERF] Plugin - loadContent - docusaurus-plugin-content-blog@default: 1.233s
[PERF] Plugins - loadContent: 1.234s
[PERF] Plugins - contentLoaded - docusaurus-plugin-sitemap@default: 0.081ms
[PERF] Plugins - contentLoaded - docusaurus-theme-classic@default: 0.13ms
[PERF] Plugins - contentLoaded - docusaurus-bootstrap-plugin@default: 0.207ms
[PERF] Plugins - contentLoaded - docusaurus-mdx-fallback-plugin@default: 0.254ms
[PERF] Plugins - contentLoaded - docusaurus-plugin-google-gtag@default: 1.55ms
[PERF] Plugins - contentLoaded - docusaurus-plugin-content-pages@default: 3.212ms
[PERF] Plugins - contentLoaded - docusaurus-plugin-content-blog@default: 803.121ms
[PERF] Plugins - contentLoaded: 814.188ms
[PERF] Plugins - allContentLoaded - docusaurus-plugin-content-blog@default: 0.085ms
[PERF] Plugins - allContentLoaded - docusaurus-plugin-content-pages@default: 0.089ms
[PERF] Plugins - allContentLoaded - docusaurus-plugin-sitemap@default: 0.115ms
[PERF] Plugins - allContentLoaded - docusaurus-theme-classic@default: 0.134ms
[PERF] Plugins - allContentLoaded - docusaurus-plugin-google-gtag@default: 0.152ms
[PERF] Plugins - allContentLoaded - docusaurus-bootstrap-plugin@default: 0.168ms
[PERF] Plugins - allContentLoaded - docusaurus-mdx-fallback-plugin@default: 0.185ms
[PERF] Plugins - allContentLoaded: 1.043ms
[PERF] Plugins - loadPlugins: 2.176s
[PERF] Load - loadPlugins: 2.176s
[PERF] Load - loadSiteMetadata: 0.478ms
[PERF] Load - loadCodeTranslations: 1.057ms
[PERF] Load - createSiteFiles: 195.6ms
[PERF] Loading site: 2.529s
[PERF] Creating webpack configs: 334.828ms
[PERF] Deleting previous client manifest: 0.569ms

✔ Client
  Compiled successfully in 5.21m

✔ Server

[PERF] Bundling: 5:21.375 (m:ss.mmm)
[PERF] Reading client manifest: 17.088ms
[PERF] Compiling SSR template: 1.507ms
SSG - Load server bundle
[PERF] SSG - Load server bundle: 44.964ms
[PERF] SSG - Server bundle size = 29.351 MB
[PERF] SSG - Evaluate server bundle: 842.003ms
[PERF] Loading App renderer: 887.578ms
[PERF] Generate static files: 1:55.173 (m:ss.mmm)
[PERF] Executing static site generation: 1:56.080 (m:ss.mmm)
[PERF] Deleting server bundle: 2.146ms
[PERF] Executing postBuild(): 215.985ms
[PERF] Executing broken links checker: 809.84ms
[SUCCESS] Generated static files in "build/en".
[INFO] Use `npm run serve` command to test your build locally.

It looks like the "Generate static files" and "Executing static site generation" steps are taking ~4 minutes combined.
Is this expected?

@slorber
Collaborator

slorber commented Mar 10, 2024

It's closer to 2 min than 4 min: the log is unclear, but one task is part of the other.
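
The nesting can be checked against the numbers in the log. The sketch below is a hypothetical parser for the [PERF] lines (parseDurationMs and parsePerfLog are illustrative names, and the regexes only cover the three duration formats seen in this thread). It compares the outer "Executing static site generation" timing with the sum of "Loading App renderer" and "Generate static files".

```javascript
// Converts the duration formats seen in the log ("0.197ms", "1.233s",
// "1:55.173 (m:ss.mmm)") to milliseconds; returns NaN for anything else.
function parseDurationMs(text) {
  let m;
  if ((m = /^(\d+):(\d+(?:\.\d+)?) \(m:ss\.mmm\)$/.exec(text))) {
    return (Number(m[1]) * 60 + Number(m[2])) * 1000;
  }
  if ((m = /^(\d+(?:\.\d+)?)ms$/.exec(text))) return Number(m[1]);
  if ((m = /^(\d+(?:\.\d+)?)s$/.exec(text))) return Number(m[1]) * 1000;
  return NaN;
}

// Extracts {label: milliseconds} from "[PERF] <label>: <duration>" lines.
function parsePerfLog(log) {
  const timings = {};
  for (const line of log.split('\n')) {
    const m = /^\[PERF\] (.+?): (.+)$/.exec(line.trim());
    if (!m) continue;
    const ms = parseDurationMs(m[2]);
    if (!Number.isNaN(ms)) timings[m[1]] = ms;
  }
  return timings;
}

const timings = parsePerfLog(`
[PERF] Loading App renderer: 887.578ms
[PERF] Generate static files: 1:55.173 (m:ss.mmm)
[PERF] Executing static site generation: 1:56.080 (m:ss.mmm)
`);
const outer = timings['Executing static site generation'];
const inner =
  timings['Generate static files'] + timings['Loading App renderer'];
console.log(Math.round(outer), Math.round(inner));
```

The two totals land close to each other, which is consistent with the outer step containing the other two rather than running in addition to them.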

And yes, it seems expected that rendering, minifying and writing thousands of static files takes time.

For a blog, the number of pages to generate can grow quickly depending on your usage of tags and your pagination setting.

What takes the most time remains the bundling phase, which has not been optimized yet.

3 participants
@slorber @ardavank and others