Don't store resources content in memory #386
To underline why this might be important: I was trying to scrape https://www.image-line.com/support/flstudio_online_manual/ and ended up with a 20 GB memory footprint. That would have crashed on many machines =)
I'm having issues with this same problem; is there an ETA? :)
Hi @beije |
If you can write up what needs to be done, at least, then I'm sure someone would be willing to work on it. Even if there's no time to write code, there's always time to go "for those who want to work on this, you want to look in files A, B, and C, in the functions X, Y, and Z, because that's where U happens, which leads to V" =) The original comment is already more than most project maintainers will drop in a "to do" issue, but for external contributions it just needs a little bit more to get folks started on helping out.
Hey @Pomax, you are right, it's better to document what needs to be done. I've updated the initial issue description with more information. The tricky part is that it will be a huge update: the whole mechanism has to be reworked, and it looks quite complicated to me right now. But there is also a good part: we have quite good test coverage, which will help with the updates :)
oh dear, that does sound daunting... thank goodness for test coverage! And thank you for updating what needs to be done!
Well, to avoid turning everything on its head, maybe a first step could be the standard approach of saving each file from its buffer, file by file, as a lifecycle event (almost like a plugin, but with default saving disabled), without streaming. It would still be much better, because keeping X files in memory (where X = number of concurrent connections) is far better than holding all files in memory at the same time until they are dumped at the end. The UX would also improve: the first few times I ran the script, I waited a long time seeing nothing, because even the output directory was not created until the run finished. I didn't look at the codebase; I'm just brainstorming to hopefully push things forward, even if it's not ideal in the first iteration.
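A minimal sketch of the file-by-file idea, in TypeScript on the assumption that the project is a JavaScript/Node.js codebase (its promise-based identifiers suggest so). `scrapeFileByFile`, `download`, and `save` are hypothetical names, not the project's API. With at most `concurrency` jobs running, at most that many bodies sit in memory, and each file reaches disk as soon as it arrives rather than at the end of the run.

```typescript
// Hypothetical callbacks (not a real scraper API): `download` returns one
// resource body, `save` writes it to disk.
type Download = (url: string) => Promise<Uint8Array>;
type Save = (url: string, body: Uint8Array) => Promise<void>;

// Processes `urls` with at most `concurrency` download+save jobs at a time.
// Each body is written as soon as it arrives and can then be garbage-collected,
// so memory holds at most `concurrency` bodies instead of the whole site.
async function scrapeFileByFile(
  urls: readonly string[],
  concurrency: number,
  download: Download,
  save: Save,
): Promise<void> {
  // Index of the next URL to claim; safe because JS runs one callback at a time.
  let next = 0;
  async function worker(): Promise<void> {
    while (next < urls.length) {
      const url = urls[next++];
      const body = await download(url);
      await save(url, body); // saved immediately, not at the end of the run
    }
  }
  const workerCount = Math.max(1, Math.min(concurrency, urls.length));
  await Promise.all(Array.from({ length: workerCount }, () => worker()));
}
```

This only bounds memory for resources that need no post-processing; resources whose links must be rewritten still have to wait for their dependencies.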
Hi @pavelloz, just to clarify: not all files are stored in memory and saved only at the end. When a resource has no dependencies, it's saved to the directory immediately. Only resources with dependencies are saved after all their dependencies are resolved and downloaded (for example, an HTML file with 5 images will be saved after all the images are downloaded). Thank you for the suggestion.
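The save order described in this reply can be modeled as a small dependency tracker. This is a hypothetical sketch, not the project's code: `SaveScheduler`, `received`, and `saveOrder` are illustrative names. A resource with no outstanding dependencies is saved at once; a parent such as an HTML page is saved only after every resource it references has been saved, which in turn can release its own parent.

```typescript
// Hypothetical model of the save order described above (not the project's code).
class SaveScheduler {
  private readonly finished = new Set<string>(); // resources already saved
  private readonly waiting = new Map<string, Set<string>>(); // parent -> dependencies not yet saved
  readonly saveOrder: string[] = []; // stands in for files written to disk

  // Call once a resource has been downloaded and its references parsed;
  // `deps` is empty for leaf resources such as images.
  received(url: string, deps: readonly string[]): void {
    const missing = new Set(deps.filter((d) => !this.finished.has(d)));
    if (missing.size === 0) {
      this.save(url); // nothing to wait for: write it right away
    } else {
      this.waiting.set(url, missing); // per the reply, this is when content is held in memory
    }
  }

  private save(url: string): void {
    this.saveOrder.push(url);
    this.finished.add(url);
    // Release any parent whose last missing dependency was this resource.
    for (const [parent, missing] of this.waiting) {
      missing.delete(url);
      if (missing.size === 0) {
        this.waiting.delete(parent);
        this.save(parent);
      }
    }
  }
}
```

Because a parent waits for its children, two pages that reference each other would never be saved in this sketch; a real implementation would need a way to break such cycles.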
I think we can actually solve the memory usage issue fairly easily.
Now all pages are stored in memory (each resource's content is stored in `Resource.text`), which causes high memory consumption. It would be nice to avoid storing `Resource.text` and to save resources directly to the FS just after they are received. Probably we can use streams for that:

- `Request -> update links/images/styles/etc. -> saveResource`
- `Request -> saveResource` when content modification is not needed

To do:

- Remove the `text` property and related functionality. Probably store a reference to a stream for each resource.
- Replace the `requestQueue` property with `streamsQueue`, replace `requestedResourcePromises` with `requestResourceStreams` (or remove it), and use streams instead of promises in the request file.
- `afterResponse`, `saveResource`
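A sketch of the two stream pipelines from the issue description, assuming Node.js streams. `pipeline` (from `node:stream/promises`), `createWriteStream`, and `StringDecoder` are standard Node.js APIs; `RewriteLinks` and `saveStreamed` are hypothetical names. Rewriting links inside a stream is the tricky part, because a URL can be split across two chunks, so the transform holds back the last (longest URL length minus 1) characters of each chunk until the next one arrives.

```typescript
import { Readable, Transform, type TransformCallback } from "node:stream";
import { pipeline } from "node:stream/promises";
import { createWriteStream } from "node:fs";
import { StringDecoder } from "node:string_decoder";

// Hypothetical "update links/images/styles" step: replaces known absolute
// URLs with local paths without buffering the whole document.
class RewriteLinks extends Transform {
  private readonly links: Map<string, string>;
  private readonly keys: string[];
  private readonly holdBack: number;
  private readonly decoder = new StringDecoder("utf8"); // multi-byte chars split across chunks
  private pending = ""; // unprocessed tail carried into the next chunk

  constructor(links: Map<string, string>) {
    super();
    this.links = links;
    // Longest URLs first so the longest match wins; empty keys are ignored.
    this.keys = [...links.keys()].filter((k) => k.length > 0).sort((a, b) => b.length - a.length);
    // Any URL that starts inside the held-back tail may continue in the next chunk.
    this.holdBack = this.keys.length > 0 ? this.keys[0].length - 1 : 0;
  }

  // Rewrites every match that starts before `limit`; keeps the rest in `pending`.
  private consume(text: string, limit: number): string {
    let out = "";
    let i = 0;
    while (i < limit) {
      const key = this.keys.find((k) => text.startsWith(k, i));
      if (key !== undefined) {
        out += this.links.get(key) ?? key;
        i += key.length; // the match is complete even if it ends past `limit`
      } else {
        out += text.charAt(i);
        i += 1;
      }
    }
    this.pending = text.slice(i);
    return out;
  }

  _transform(chunk: Buffer, _encoding: BufferEncoding, callback: TransformCallback): void {
    const text = this.pending + this.decoder.write(chunk);
    let limit = Math.max(0, text.length - this.holdBack);
    // Never end the emitted part on the first half of a UTF-16 surrogate pair.
    const last = text.charCodeAt(limit - 1);
    if (limit > 0 && last >= 0xd800 && last <= 0xdbff) limit -= 1;
    const out = this.consume(text, limit);
    if (out.length > 0) this.push(out);
    callback();
  }

  _flush(callback: TransformCallback): void {
    const text = this.pending + this.decoder.end();
    const out = this.consume(text, text.length);
    if (out.length > 0) this.push(out);
    callback();
  }
}

// Hypothetical save step. With `links`: Request -> update links -> saveResource.
// Without: Request -> saveResource, bytes copied unchanged.
async function saveStreamed(body: Readable, filename: string, links?: Map<string, string>): Promise<void> {
  const file = createWriteStream(filename);
  if (links !== undefined && links.size > 0) {
    await pipeline(body, new RewriteLinks(links), file);
  } else {
    await pipeline(body, file);
  }
}
```

Only one chunk plus a short tail is in memory per resource. Plain string replacement stands in for real HTML/CSS parsing here: it rewrites only URLs that appear verbatim as keys in `links`, and the parent still needs to know each dependency's local path before its stream can be written.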
Questions: