NODE_COMPILE_CACHE doesn't cache all loaded files #4484
Comments
cc @joyeecheung as I believe you implemented the cache.
Have you checked how many unique files are loaded? The load hooks are triggered multiple times whenever a module identified by the same URL is loaded (which I believe is a drawback of the current
The cache would be regenerated (invalidated) when the code changes, when the flags used to run Node.js are changed, etc. It's hard to tell why without looking at the code. Have you tried running it without using the load hooks? To debug it without using the load hooks, you can try the
Thanks for the fast response!
I used a set - so as long as uniqueness isn't based on something other than the module URL, it should be good.
I have not - about a year ago we migrated the code from something heinous into a more standard ESM approach, but to do this we used a load hook to add the import attributes where we import JSON files. I can probably fix these (maybe 100 places in total) - would simply eliminating the load hooks be sufficient? Or would I also need to remove the
as in, remove
I suspect the load hooks might play a role in the strange behaviors you've seen here - the loader hooks are a bit all over the place, and the mapping from loading to compilation is not really 1:1. The compile cache is only hit when the module gets to the compilation phase. If the application somehow maps many different pieces of source code to the same URL, it may fare poorly, since the compile cache is basically mapped 1:1 to source files, keyed by the file name (converted from the URL if it's ESM).
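To make the "keyed by file name, 1:1" point concrete, here is a deliberately simplified model. It is hypothetical: `ToyCompileCache` is not Node.js's implementation. It only assumes one cache slot per file name and that an entry is reused only when the source it was built from is unchanged. Under those assumptions, a stable file hits on the second compile, while two different sources served under the same name keep evicting each other and never hit.

```javascript
const crypto = require('node:crypto');

// Hypothetical, simplified model (NOT Node.js internals): one cache slot per
// file name, and each slot remembers a hash of the source it was built from.
class ToyCompileCache {
  constructor() {
    this.entries = new Map(); // filename -> source hash
    this.hits = 0;
    this.misses = 0;
  }

  compile(filename, source) {
    const hash = crypto.createHash('sha256').update(source).digest('hex');
    if (this.entries.get(filename) === hash) {
      this.hits++;
      return 'hit';
    }
    // Miss: regenerate and overwrite, because there is only one slot per name.
    this.misses++;
    this.entries.set(filename, hash);
    return 'miss';
  }
}

// Stable source under one name: the second compile is a hit.
const stable = new ToyCompileCache();
stable.compile('/app/a.js', 'export const x = 1;');
stable.compile('/app/a.js', 'export const x = 1;');

// Two different sources under the same name: every compile misses, because
// each one overwrites the entry the other would have matched.
const shared = new ToyCompileCache();
for (let i = 0; i < 4; i++) {
  shared.compile('/app/b.js', i % 2 === 0 ? 'export default 1;' : 'export default 2;');
}

console.log(stable.hits, stable.misses); // 1 1
console.log(shared.hits, shared.misses); // 0 4
```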
You put in both -
That gave me exactly the output I wanted - thanks! The problem was that I was killing the process, which made it exit halfway through writing the cache files. Now the startup takes only 11 seconds (roughly what I'd expected), there are
Is there a programmatic way to persist the files? Or a specific exit code that will ensure it waits? Not super important, but how is the hash calculated? I'm still only matching 359 files (out of either 15k or 25k, depending on how you look at it).
No obvious variations (e.g., absolute) exist in the cache directory - can you point me to how to calculate the value? I'm pretty close because it works... some of the time :D For reference, the loader is super trivial:
There is a WIP in nodejs/node#54971
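On persisting the cache from code: newer Node.js releases expose `module.enableCompileCache()` (v22.8.0+) and, to our knowledge, `module.flushCompileCache()` (v22.10.0 / v23.0.0 and later). The sketch below guards both with `typeof` checks, so it degrades to a no-op where they are missing. The cache directory name and the decision to flush on `SIGINT` and exit with 130 (the usual 128 + signal number shell convention) are illustrative assumptions, not something prescribed in this thread.

```javascript
const os = require('node:os');
const path = require('node:path');
const mod = require('node:module');

// Enable the compile cache from code instead of via NODE_COMPILE_CACHE.
// Returns null if this Node.js version has no module.enableCompileCache().
function enableCache(dir) {
  if (typeof mod.enableCompileCache !== 'function') return null;
  // The result's .status is one of mod.constants.compileCacheStatus.*
  return mod.enableCompileCache(dir);
}

// Write pending cache entries now instead of relying on a clean exit.
// Returns false if this Node.js version has no module.flushCompileCache().
function persistCache() {
  if (typeof mod.flushCompileCache !== 'function') return false;
  mod.flushCompileCache();
  return true;
}

// Assumption for illustration: on Ctrl+C, flush first so an interrupted
// startup does not leave half-written cache files, then exit with 130.
process.once('SIGINT', () => {
  persistCache();
  process.exit(130);
});

// Hypothetical cache location, chosen only for this example.
const cacheResult = enableCache(path.join(os.tmpdir(), 'example-compile-cache'));
if (cacheResult) console.log(cacheResult.status, cacheResult.directory);
```

Calling `persistCache()` right after the expensive imports finish also works, and avoids depending on how the process eventually exits.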
It's roughly:

```javascript
const zlib = require('node:zlib'); // zlib.crc32() needs Node.js v22.2.0+ / v20.15.0+

const kCommonJS = 0, kESM = 1;

function getCacheName(filename, type) {
  let crc = 0;
  crc = zlib.crc32(Buffer.from([type]), crc);               // module type byte first
  crc = zlib.crc32(Buffer.from(filename, 'utf-8'), crc);    // then the full file name
  return (crc >>> 0).toString(16) + '.cache';
}

getCacheName('/Users/joyee/projects/node/test/fixtures/compile-cache-flush.js', kCommonJS);
// '6a7f7d45.cache' - matches what I see locally with the logs
```

Note that the full file name, not the file URL, is part of the hash, and it should be UTF-8 encoded.
The
Actually I am not sure if the
Node.js Version
v22.8.0
NPM Version
10.8.2
Operating System
Linux xxx 6.5.0-41-generic #41~22.04.2-Ubuntu SMP PREEMPT_DYNAMIC Mon Jun 3 11:32:55 UTC 2 x86_64 x86_64 x86_64 GNU/Linux
Subsystem
fs, v8, vm
Description
I'm working on a large application that loads around 15k files at startup and takes 20 seconds - I suspect a lot of this time is spent just compiling the modules. As such, I've been playing around with the new `NODE_COMPILE_CACHE` option, and I'm a bit confused - this is primarily a request to understand the caching, or for some hints on how to debug it myself.
`load` plugin to get an exact count of the files loaded - it came out to 15312, the same exact count across multiple runs, exactly as I'd expect.
Each run of the application should have been exactly the same - e.g., there are no dynamic imports; I literally ran `npm start` multiple times.
Is there a good way of debugging this? Are there any hooks or logs that can be triggered during a cache insert, hit, or miss? There is no obvious pattern to the files that were matched - e.g., there was a smattering of `node_modules`, and individual files or pairs of files from folders within the application with very similar siblings, loaded in the exact same way.
Have I materially misunderstood the implementation of this feature? E.g., is the hash key perhaps dependent on load order?
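One low-tech way to see which entries churn between supposedly identical runs is to snapshot the cache directory before and after each `npm start` and diff the two. The sketch below is a hypothetical debugging aid, not a Node.js feature: `snapshot()` walks the directory recursively (in case entries live in a subdirectory) and records each file's size and mtime, and `diff()` reports added, rewritten, and removed files. Entries that keep appearing under `changed` across identical runs would be candidates for being invalidated.

```javascript
const fs = require('node:fs');
const os = require('node:os');
const path = require('node:path');

// Record every file under dir (recursively) as relativePath -> "size:mtimeMs".
function snapshot(dir, base = dir, out = new Map()) {
  if (!fs.existsSync(dir)) return out;
  for (const entry of fs.readdirSync(dir, { withFileTypes: true })) {
    const full = path.join(dir, entry.name);
    if (entry.isDirectory()) {
      snapshot(full, base, out);
    } else {
      const { size, mtimeMs } = fs.statSync(full);
      out.set(path.relative(base, full), `${size}:${mtimeMs}`);
    }
  }
  return out;
}

// Compare two snapshots taken before and after a run.
function diff(before, after) {
  const added = [];
  const changed = [];
  const removed = [];
  for (const [name, sig] of after) {
    if (!before.has(name)) added.push(name);
    else if (before.get(name) !== sig) changed.push(name);
  }
  for (const name of before.keys()) {
    if (!after.has(name)) removed.push(name);
  }
  return { added, changed, removed };
}

// Demo: one entry is rewritten (its size changes) and one new entry appears.
const root = fs.mkdtempSync(path.join(os.tmpdir(), 'cc-snap-'));
fs.mkdirSync(path.join(root, 'v'));
fs.writeFileSync(path.join(root, 'v', 'a.cache'), 'x');
const before = snapshot(root);
fs.writeFileSync(path.join(root, 'v', 'a.cache'), 'xyz');
fs.writeFileSync(path.join(root, 'b.cache'), 'y');
const changes = diff(before, snapshot(root));
console.log(changes);
```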
Minimal Reproduction
It would be extremely difficult to provide a minimal repro here - I expect the complexity of the application is part of the problem. I was not able to repro on a trivial application (e.g., a single huge source file) - in this situation the cache contains a single entry and I'm able to match the hash from the absolute path to the cache file.
Output
No response