Can you opt-out of webpackCompilationHash (to improve Netlify build times) #15872

Closed
afenton90 opened this issue Jul 18, 2019 · 21 comments · Fixed by #16686

@afenton90
Contributor

Summary

Within the gatsby-script-loader script at the bottom of every page, a window.webpackCompilationHash is set. This hash changes whenever anything in the src directory changes.

This means deploys to services such as Netlify take longer than expected. Netlify only transfers assets whose content has changed since the last build to the underlying CDN. However, because webpackCompilationHash changes whenever anything in src changes, every file is transferred even for a small change.

For example:

  1. gatsby build - To give an initial built site.
  2. Change the 404 template.
  3. gatsby build - Builds site but all outputted pages are changed because a single file has been updated.

For larger sites generating 1000s of pages this is a real problem. An update to a fairly static page will result in the dynamic pages also being updated.
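
A minimal sketch of why one build-wide hash touches every file, assuming the deploy service compares a content fingerprint per file. The `renderPage` and `fingerprint` helpers and the page contents are hypothetical and only illustrate the effect; they are not Gatsby or Netlify code.

```javascript
const { createHash } = require("crypto");

// Hypothetical helper: render a page with the build-wide hash inlined.
function renderPage(body, compilationHash) {
  return `<html><body>${body}<script>window.webpackCompilationHash="${compilationHash}";</script></body></html>`;
}

// Content fingerprint, similar in spirit to what a deploy service might compare.
function fingerprint(content) {
  return createHash("sha1").update(content).digest("hex");
}

// Page bodies stay identical between the two builds.
const pages = { "/": "<h1>Home</h1>", "/about/": "<h1>About</h1>" };

function buildFingerprints(compilationHash) {
  return Object.fromEntries(
    Object.entries(pages).map(([path, body]) => [
      path,
      fingerprint(renderPage(body, compilationHash)),
    ])
  );
}

const before = buildFingerprints("aaa111");
const after = buildFingerprints("bbb222"); // only the build-wide hash differs

// Every page is reported as changed although no page body changed.
const changed = Object.keys(before).filter((p) => before[p] !== after[p]);
console.log(changed);
```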

Is it possible that we could add an option for the hash to not be generated and inlined in the HTML output?

How integral is it to the inner workings of Gatsby?

Relevant information

Mainly affects large sites being deployed on services such as Netlify.

@afenton90
Contributor Author

Guessing lines such as this

pageResources.page.webpackCompilationHash !==
mean that it is pretty integral to some Gatsby features.

@KyleAMathews
Contributor

Just double checking but are you on the latest version? We recently made a change where the goal was to address this exact issue #11982

It's possible, however, that we still haven't completely nailed the solution.

@afenton90
Contributor Author

I've re-tried this morning with [email protected] and I see the same behaviour. It is only the window.webpackCompilationHash that updates, which is the root cause of the issue.

Happy to help working on a resolution for this. May need some guidance on the best place to start though.

@sidharthachatterjee
Contributor

Just spoke with Alex and on preliminary investigation, it appears that webpackCompilationHash is just an artifact from before. I'll be looking into this with some end to end tests!

@afenton90
Contributor Author

A thought on an approach for this. @KyleAMathews @sidharthachatterjee
Seems that webpackCompilationHash is still used to decipher whether the Gatsby app has been updated on the server or not. If the webpackCompilationHash in the page data file and window are different then the page reloads. Awesome!
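
A minimal sketch of the comparison described here, assuming the inlined hash and the hash fetched with the page data are both plain strings. `shouldForceReload` is a hypothetical name, not Gatsby's actual function, and treating a missing hash as "no reload" is an assumption made for the sketch.

```javascript
// Hypothetical stand-in for the staleness check; not Gatsby's actual code.
function shouldForceReload(inlinedHash, fetchedHash) {
  // Assumption: a missing hash on either side means "unknown", so do not reload.
  if (!inlinedHash || !fetchedHash) return false;
  // A mismatch means the site was rebuilt after this tab loaded it.
  return inlinedHash !== fetchedHash;
}

console.log(shouldForceReload("bb7aaaf1", "bb7aaaf1")); // same build
console.log(shouldForceReload("bb7aaaf1", "c0ffee42")); // newer build on the server
```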

I appreciate that this is still desirable functionality. From reading the comments in #13004, could an option be to use a chunkHash for each page's data rather than a hash for the whole of src?

That way the browser could still figure out if the individual page/template had been updated on the server, and build change blast radius would be scoped to just those pages that have been updated.

In the example given in this issue, updating the 404 template would change the hash for that page chunk, but not the hash for the index page or any other generated page. Features such as those offered by Netlify would then work as intended.
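
A rough sketch of the per-page idea, assuming each page's hash is derived only from the template it uses. The `pageHashes` helper and the template strings are hypothetical; a real build would use webpack's chunk hashes rather than hashing source text like this.

```javascript
const { createHash } = require("crypto");

const sha1 = (text) => createHash("sha1").update(text).digest("hex").slice(0, 8);

// Hypothetical inputs: template source keyed by name, and the template each page uses.
function pageHashes(templates, pageToTemplate) {
  const hashes = {};
  for (const [path, templateName] of Object.entries(pageToTemplate)) {
    // Each page's hash depends only on its own template.
    hashes[path] = sha1(templates[templateName]);
  }
  return hashes;
}

const pageToTemplate = { "/": "index", "/404/": "notFound" };
const before = pageHashes({ index: "index v1", notFound: "404 v1" }, pageToTemplate);
const after = pageHashes({ index: "index v1", notFound: "404 v2" }, pageToTemplate);

// Only the page whose template changed gets a new hash.
const changedPages = Object.keys(pageToTemplate).filter((p) => before[p] !== after[p]);
console.log(changedPages);
```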

@afenton90
Contributor Author

After thinking more about this today while working on PR #16389, I think the problem may be better resolved by keeping webpackCompilationHash out of page-data.json files and instead building it into its own file. Say app-data.json?

app-data.json would have the following lifecycle:

  1. gatsby build generates an app-data.json file on every build containing the webpackCompilationHash.
  2. On load of production-app.js the app-data.json file will be fetched, and stored in its current location of window.___webpackCompilationHash.
  3. Once loaded, each navigation change will re-fetch the app-data.json file along with the page-data.json file.
  4. Existing behaviour will be maintained where if the re-fetched webpackCompilationHash does not match that which is assigned to window.___webpackCompilationHash the page will be reloaded.
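
The lifecycle above could be sketched roughly as follows. Everything here is hypothetical: `createAppDataChecker`, the injected `fetchJson` and `reload` functions, and the `/app-data.json` URL are assumptions made so the sketch runs outside a browser, not the implementation from the PR.

```javascript
// `fetchJson(url)` resolves to parsed JSON; in a browser it might wrap fetch().
// `reload(path)` stands in for forcing a full page load.
function createAppDataChecker(fetchJson, reload, appDataUrl = "/app-data.json") {
  let currentHash = null; // plays the role of window.___webpackCompilationHash

  return {
    // Step 2: on initial load, remember the hash from app-data.json.
    async init() {
      const appData = await fetchJson(appDataUrl);
      currentHash = appData.webpackCompilationHash;
    },
    // Steps 3 and 4: re-fetch on navigation and reload on a mismatch.
    async onNavigate(path) {
      const appData = await fetchJson(appDataUrl);
      if (appData.webpackCompilationHash !== currentHash) {
        reload(path);
        return false; // navigation was replaced by a reload
      }
      return true; // same build, client-side navigation can continue
    },
  };
}
```

With this shape, a change in src alters only the one small JSON file rather than every page-data.json file.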

Thoughts on this approach?

afenton90 added a commit to afenton90/gatsby that referenced this issue Aug 16, 2019
The webpackCompilationHash is loaded in a new app-data.json file to reduce the blast radius of src
changes on generated files

gatsbyjs#15872
@KyleAMathews
Contributor

Interesting idea! Maybe build-data.json? Thoughts @Moocar? We could throttle fetching this to say every 15 seconds to avoid littering people's network tabs with repeated fetches.
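
A minimal sketch of the throttling idea, assuming a wrapper that reuses the previous result until the interval has passed. `throttleFetch` and its injectable `now` clock are hypothetical names chosen for illustration.

```javascript
// Wraps a loader so it actually runs at most once per interval.
function throttleFetch(load, intervalMs, now = Date.now) {
  let lastRun = -Infinity;
  let lastResult;
  return () => {
    const t = now();
    if (t - lastRun >= intervalMs) {
      lastRun = t;
      lastResult = load(); // real request
    }
    return lastResult; // otherwise reuse the previous result
  };
}

// Usage with a fake clock so the behaviour is easy to follow.
let fakeTime = 0;
let requests = 0;
const loadBuildData = throttleFetch(() => ++requests, 15000, () => fakeTime);

loadBuildData();   // first call performs a request
fakeTime = 5000;
loadBuildData();   // within 15 seconds: reuses the earlier result
fakeTime = 15000;
loadBuildData();   // interval elapsed: performs a second request
console.log(requests);
```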

@Moocar
Contributor

Moocar commented Aug 17, 2019

@afenton90 This is a really interesting idea. I definitely hear you on the problem of a single change to anything in src resulting in changes to thousands of files. It's one of the trade-offs that I wasn't happy with in the whole page-data.json change.

Without having thought too deeply, I think this could potentially work. One downside is that where we once only had to request a single page-data.json, we now have to request an additional app-data.json. But seeing as those two requests can be performed in parallel, and the app-data.json is so small, I don't think that would be a big issue.
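
As a sketch of the parallel requests mentioned here, assuming an injected `fetchJson` loader: the `loadPage` helper and both URL shapes are illustrative assumptions, not Gatsby's actual loader.

```javascript
// `fetchJson(url)` is a hypothetical loader that resolves to parsed JSON.
async function loadPage(fetchJson, pagePath) {
  // Both requests start immediately; total wait is roughly the slower of the two.
  const [pageData, appData] = await Promise.all([
    fetchJson(`/page-data${pagePath}page-data.json`),
    fetchJson(`/app-data.json`),
  ]);
  return { pageData, appData };
}
```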

I'll dive in to see if it would work.

@afenton90
Contributor Author

Thanks for your thoughts @KyleAMathews & @Moocar. I went ahead and did PR #16686 to cover this off.

I’m going to pair with @sidharthachatterjee on Tuesday to make sure all scenarios for this are covered and the tests are good.
Feel free to jump on the call if you are interested?

@sidharthachatterjee
Contributor

Published #16686 under gatsby@app-data

@gatsbot

gatsbot bot commented Sep 10, 2019

Hiya!

This issue has gone quiet. Spooky quiet. 👻

We get a lot of issues, so we currently close issues after 30 days of inactivity. It’s been at least 20 days since the last update here.

If we missed this issue or if you want to keep it open, please reply here. You can also add the label "not stale" to keep this issue open!

As a friendly reminder: the best way to see this issue, or any other, fixed is to open a Pull Request. Check out gatsby.dev/contribute for more information about opening PRs, triaging issues, and contributing!

Thanks for being a part of the Gatsby community! 💪💜

@gatsbot gatsbot bot added the stale? Issue that may be closed soon due to the original author not responding any more. label Sep 10, 2019
@afenton90
Contributor Author

not stale

@KyleAMathews KyleAMathews added not stale and removed stale? Issue that may be closed soon due to the original author not responding any more. labels Sep 11, 2019
@seidtgeist

seidtgeist commented Sep 12, 2019

Me and @pauloelias are having a possibly related problem. We build periodically every hour and it seems our webpackCompilationHash is changing no matter what. That’s bad, but it’s more of an optimization problem. Also note that we’re running Fastly in front of our site. We only expire HTML pages on successful builds on the CDN, none of the assets.

Could it be that, since page-data.json files don’t contain any kind of unique identifier (such as a content hash), it’s possible to end up with a very inconsistent set of compilation hashes across pages very quickly? Which then causes many page navigations to force reloads?

Update: This is really bad for us, as transitions from pages to pages with a query string almost always cause a page reload without the query string, breaking site functionality. Downgrading to 2.8 :(

@seidtgeist

It looks like this line in navigation.js actually has a bug:

Shouldn’t window.location be set to the full path in to instead of just pathname? In other words, why remove the querystring and hash from the URL when forcing a reload? They are used for routing purposes and should not be discarded.
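
To illustrate the difference being described, here is a small sketch assuming `to` is a path that may carry a query string and a hash. `reloadTarget` is a hypothetical helper; it only computes the two candidate URLs and does not touch window.location.

```javascript
// Compares the pathname-only target with the full target for a forced reload.
function reloadTarget(to) {
  // Parse against a dummy origin so that relative paths are accepted.
  const { pathname, search, hash } = new URL(to, "http://localhost");
  return {
    pathnameOnly: pathname,             // drops the query string and hash
    fullPath: pathname + search + hash, // keeps them for routing
  };
}

const target = reloadTarget("/search/?q=gatsby#results");
console.log(target.pathnameOnly);
console.log(target.fullPath);
```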

@sidharthachatterjee
Contributor

Published in [email protected]

@msbtrends

Hi, is there any fix for this yet? I'm trying to run the Gatsby experimental page build and it seems this is the only thing that updates on each page, forcing a rebuild.

i.e window.___webpackCompilationHash="bb7aaaf15e866acc2cb4"

@DennisRosen

I am having the same issue. According to this PR from 4 years ago the webpackCompilationHash was moved to an app-data.json file. When doing a production build locally I can see that the file ends up in public/page-data/app-data.json and indeed only contains a webpackCompilationHash property. However all built HTML files still contain a __webpackCompilationHash with the same value as the one in app-data.json, which updates on every build causing slow build times. Was this PR partially reverted for some reason? Perhaps the PR only removed webpackCompilationHash from page-data.json files (I see no webpackCompilationHash property inside them)? Is there any reason the hash is still present in HTML files?

@maxthrottleup

maxthrottleup commented Jun 14, 2023

I'm having a similar issue and I'm glad I'm not the only one, so we can discuss a solution or workaround. I've been doing research about this problem for about a day and I can present my findings here. But I suspect that this issue might not be related directly to the webpackCompilationHash. However, the change in hash makes the issue evident.

TL;DR
I suspect something in the build for our specific projects is appending a VERY long line at the end of the file. Sometimes that VERY long line is around 95% of the content of the file. At the end of that long line we find the webpackCompilationHash. So, when the webpackCompilationHash updates, Gatsby's incremental build process believes a huge portion of the HTML file was changed, and thus regenerates the file.

As you see it's a combination of the structure of the code plus the modification of the hash.

More details on my research
In my case, I have a Gatsby TS project using Tailwind CSS and WordPress as the CMS via the gatsby-source-wordpress plugin. Even on incremental builds (with no code or CMS updates), all the HTML files were regenerated, which does not make sense. So I started ripping out parts of the code to find a minimum viable site that I could use to troubleshoot, and I found this article on troubleshooting incremental builds: https://www.gatsbyjs.com/docs/debugging-incremental-builds/

I was able to cut the site down to 72 HTML pages for troubleshooting, and then I followed the guide for troubleshooting incremental builds. I found that Gatsby detected a change in those 72 HTML files, which is why it decided to regenerate them. Upon analyzing the diff between incremental builds, I found that the webpackCompilationHash is the only thing that changes, but since it shares the line with a bunch of inline Tailwind CSS, Gatsby thinks (I am guessing here) that a lot of the HTML file was changed, even though only the webpackCompilationHash on the last line of the .html file changed.

Some things I did to support my theory:

  1. Created a plain starter Gatsby site and performed the steps from the article about incremental builds. The diff did not find any changes. So no webpackCompilationHash was modified here (since I did not make any code modifications).
  2. Created a starter Gatsby site with Tailwind CSS and a CMS connection to my back-end, limiting the sourcing to only 10 nodes each. When I followed the same steps from the article, a diff was reported and all changes were due to the webpackCompilationHash. However, this DID NOT trigger a regeneration of the HTML files. This leads me to conclude that the webpackCompilationHash change alone DOES NOT trigger a regeneration of HTML files.
  3. Repeated the same analysis with my code and found that the diff reports changes only on HTML files whose last line is a VERY long line of inline code (probably caused by Tailwind) ending with the modified webpackCompilationHash. The diff reported a 95% code change, but in reality only the webpackCompilationHash changed, and it shares a line with other inline content. (It seems the diff tool currently detects changes by line.)

CONCLUSION / POSSIBLE SOLUTION
I think a viable solution here could be one of the following points:

  1. Make Gatsby's incremental build detection more intelligent, so that it can recognize when the webpackCompilationHash is the only thing that changed.
  2. Put the webpackCompilationHash on a new line in HTML files, so the diff is detected properly and is not affected by whatever precedes the webpackCompilationHash on the same line.
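
A small sketch of why option 2 could help, assuming the change detection is line-based as guessed above. `changedRatio` is a naive, hypothetical measure (the share of characters that sit on changed lines) and is not Gatsby's actual diff logic; the 5000-character string stands in for the long inlined line.

```javascript
// Share of characters in newText that sit on lines differing from oldText.
function changedRatio(oldText, newText) {
  const oldLines = oldText.split("\n");
  const newLines = newText.split("\n");
  let changedChars = 0;
  newLines.forEach((line, i) => {
    if (line !== oldLines[i]) changedChars += line.length;
  });
  return changedChars / newText.length;
}

const head = "<html><head></head><body>\n";
const inlineCss = "x".repeat(5000); // stands in for a very long inlined line
const withHash = (hash, separator) =>
  `${head}${inlineCss}${separator}window.___webpackCompilationHash="${hash}";`;

// Hash on the same line as the inline content: almost the whole file looks changed.
const sameLine = changedRatio(withHash("aaa", ""), withHash("bbb", ""));
// Hash on its own line: only that short line looks changed.
const ownLine = changedRatio(withHash("aaa", "\n"), withHash("bbb", "\n"));

console.log(sameLine.toFixed(3), ownLine.toFixed(3));
```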

These are just theories I have. I will continue to do some tests and report back here, but just wanted to share my insights so far. Thanks!

@maxthrottleup

maxthrottleup commented Jun 14, 2023

UPDATE: I can confirm that the Solution 2 worked in my case. I modified these files from the gatsby core (I am using "gatsby": "4.21.1"):

node_modules/gatsby/cache-dir/static-entry.js: lines 366-369 (added a line break at the beginning of that string)

// Add page metadata for the current page
const windowPageData = `\n/*<![CDATA[*/window.pagePath="${pagePath}";window.___webpackCompilationHash="${webpackCompilationHash}";${
  inlinePageData ? `window.pageData=${JSON.stringify(pageData)};` : ``
}/*]]>*/`

node_modules/gatsby/cache-dir/commonjs/static-entry.js: lines 403-410 (added a line break at the beginning of that string)

const windowPageData = `\n/*<![CDATA[*/window.pagePath="${pagePath}";window.___webpackCompilationHash="${webpackCompilationHash}";${inlinePageData ? `window.pageData=${JSON.stringify(pageData)};` : ``}/*]]>*/`;
postBodyComponents.push( /*#__PURE__*/React.createElement("script", {
  key: `script-loader`,
  id: `gatsby-script-loader`,
  dangerouslySetInnerHTML: {
    __html: windowPageData
  }
})); // Add chunk mapping metadata

After applying those changes, modifying a post in WordPress, and rebuilding incrementally, I was able to see that Gatsby successfully detected the incremental build and only generated the one HTML file matching the change made.

@sidharthachatterjee Could you please take a look at my previous messages and advise on a long-term solution for this? Thanks!!

@DennisRosen

@maxthrottleup that's a great find! Not sure if @sidharthachatterjee is working on Gatsby anymore though. Actually I'm not sure who to ping for support here at all.

@maxthrottleup

@DennisRosen This issue may be related to this: #33450 (comment)

I have not tested the opting out of Webpack cache but I will when I have some time.
