Why does saving_lib/_load_model_from_fileobj unpack the archive? #19971
-
I came across a nasty race condition in _load_model_from_fileobj (recently patched by bba7b1a, kudos to @james77777778 for the fix): two processes loading the same model would each create a file named model_weights.h5, and then both would try to remove it. My workaround was to raise an OSError manually so that the 'fallback' path is always used, but it got me wondering: why unpack the zip file at all if we can extract the weights directly with H5IOStore(_VARS_NAME_H5, zf)? Is there a performance benefit to extracting the weights beforehand?

I'm on an NFS filesystem that is really slow at handling many small files (or lots of seeks and small reads in large files), so if there's a way to reduce the number of file creation and deletion events, I'd be very happy. For reference, loading a 256KiB .keras file takes about five seconds on our system with the (buggy) 3.4.1 release.

Relatedly, what's the timeline for releasing 3.4.2? I can't currently advise people to use my code on our institute cluster, since they could hit the race condition.

Cheers,
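For anyone else hitting this on a network filesystem, here's a minimal sketch of the kind of direct-from-archive read I have in mind. This is not Keras's actual code path; the archive member name varies by Keras version, and reading the whole member into a BytesIO is my own simplification:

```python
import io
import zipfile

import h5py

# Read the weights HDF5 straight out of the .keras archive into memory,
# so no temporary weights file is ever written to the (slow) NFS mount.
with zipfile.ZipFile("model.keras", "r") as zf:
    # The member name depends on the Keras version that wrote the archive
    # (e.g. "model.weights.h5" in recent releases) -- adjust as needed.
    with zf.open("model.weights.h5") as member:
        buffer = io.BytesIO(member.read())

# h5py accepts a file-like object, so the weights can be inspected
# (or handed to a loader) without any extraction step.
with h5py.File(buffer, "r") as f:
    f.visit(print)  # list the stored weight datasets
```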
-
Hello Charles, if possible, can you try backend-based model saving/loading and check whether this is unique to Keras or whether some condition in the backend is causing it? If possible, please attach a minimal Google Colab gist; that makes it easier to spot the bug and suggest a fix. Thank you.
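For example, a rough sketch of one way to compare the .keras archive path against a plain weights-file round trip might look like the following (the model and paths are placeholders; /tmp would need to be swapped for the NFS mount to reproduce the slowdown):

```python
import time

import keras

# Tiny model just to compare the two loading paths on the same filesystem.
model = keras.Sequential([
    keras.Input(shape=(4,)),
    keras.layers.Dense(8),
])

# 1) Full .keras archive (zip-based, goes through saving_lib).
model.save("/tmp/model.keras")
start = time.perf_counter()
keras.saving.load_model("/tmp/model.keras")
print("load_model (.keras):", time.perf_counter() - start)

# 2) Weights-only HDF5 file (no zip archive involved).
model.save_weights("/tmp/model.weights.h5")
start = time.perf_counter()
model.load_weights("/tmp/model.weights.h5")
print("load_weights (.h5):", time.perf_counter() - start)
```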
-
Hi @mmtrebuchet
I have submitted a PR #19989 to improve saving/loading performance for small models.
Feel free to share your thoughts on it. Thanks.