Character encoding issue in static pages in NextJS
Last modified on Wed 18 May 2022

Motivation

Although this character encoding issue comes from React rather than NextJS (check more details here), you can run into it if your static pages contain HTML entities in the page content, e.g. an ampersand (&) in query parameters.

By default, these HTML entities are left encoded in the rendered HTML (e.g. &amp; instead of &), which results in incorrect content. This is mostly an issue for crawlers.

Ampersand is encoded incorrectly

This can be fixed by adding a custom decode method in the _document file.

Disclaimer:

To prepare for React 18, we recommend avoiding customizing getInitialProps and renderPage, if possible.

Implementation

If you do not have a _document file in the pages folder, create one with the default content (copy everything except getInitialProps from the default _document) and add the following:

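import Document from 'next/document';
import { decode } from 'html-entities';

class MyDocument extends Document {
    static async getInitialProps(ctx) {
        const initialProps = await Document.getInitialProps(ctx);
        // based on https://github.com/vercel/next.js/issues/2006
        return {
            ...initialProps,
            // Decode HTML entities (e.g. &amp;) inside href, src and srcSet values.
            html: initialProps.html.replace(
                /(href|src|srcSet)="([^"]+)"/g,
                (match, attribute, value) => `${attribute}="${decode(value)}"`
            ),
        };
    }

    // render() stays the same as in the default _document
}

export default MyDocument;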

The above code finds every href, src and srcSet attribute in the rendered HTML of each page and replaces its value with the decoded version.
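To see what the replacement does to a single piece of markup, here is a minimal sketch that runs outside NextJS. It makes a few assumptions: decodeBasicEntities is a hypothetical stand-in for decode from html-entities that only handles three entities, decodeUrlAttributes is an illustrative helper name, and the sample markup is made up.

```javascript
// Hypothetical stand-in for `decode` from 'html-entities'; it only handles
// a few named entities so the example runs without dependencies.
function decodeBasicEntities(value) {
    const entities = { '&amp;': '&', '&lt;': '<', '&gt;': '>' };
    return value.replace(/&(amp|lt|gt);/g, (entity) => entities[entity]);
}

// Same pattern as in getInitialProps: rewrite only href, src and srcSet values.
function decodeUrlAttributes(html) {
    return html.replace(
        /(href|src|srcSet)="([^"]+)"/g,
        (match, attribute, value) => `${attribute}="${decodeBasicEntities(value)}"`
    );
}

// Made-up sample markup with an escaped ampersand in the query string.
const sampleHtml = '<a href="/search?q=next&amp;page=2" title="Q&amp;A">Search</a>';
const decodedHtml = decodeUrlAttributes(sampleHtml);
console.log(decodedHtml);
// prints: <a href="/search?q=next&page=2" title="Q&amp;A">Search</a>
```

Only the href value changes; the title attribute keeps its entity because the pattern does not match it. One thing to keep in mind with a full decoder: a value containing an encoded double quote would be decoded as well, which would end the attribute early, so it may be worth checking your URLs for that case.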

Ampersand is now encoded correctly

Conclusion

If you need to decode HTML entities in these attributes, the proposed solution will work well. There is no need to add it by default.