Why does saving_lib/_load_model_from_fileobj unpack the archive? #19971
-
I came across a nasty race condition in _load_model_from_fileobj (recently patched by bba7b1a, kudos to @james77777778 for the fix): two processes loading the same model would each create a file named model_weights.h5, and then both would try to remove it. My workaround was to raise an OSError manually so that the 'fallback' path is always used, but it got me wondering: why unpack the zip file at all if we can extract the weights directly with H5IOStore(_VARS_NAME_H5, zf)? Is there a performance benefit to extracting the weights beforehand?

I'm on an NFS filesystem that is really slow at handling many small files (or lots of seeks and small reads in large files), so if there's a way to reduce the number of file creation and deletion events, I'd be very happy. For reference, loading a 256KiB .keras file takes about five seconds on our system with the (buggy) 3.4.1 release.

Relatedly, what's the timeline for releasing 3.4.2? I can't currently advise people to use my code on our institute cluster, since they could hit the race condition.

Cheers,
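For anyone else hitting this on a network filesystem, here's a minimal sketch of the kind of direct-from-archive read I have in mind. This is not Keras's actual code path; the archive member name varies by Keras version, and reading the whole member into a BytesIO is my own simplification:

```python
import io
import zipfile

import h5py

# Read the weights HDF5 straight out of the .keras archive into memory,
# so no temporary weights file is ever written to the (slow) NFS mount.
with zipfile.ZipFile("model.keras", "r") as zf:
    # The member name depends on the Keras version that wrote the archive
    # (e.g. "model.weights.h5" in recent releases) -- adjust as needed.
    with zf.open("model.weights.h5") as member:
        buffer = io.BytesIO(member.read())

# h5py accepts a file-like object, so the weights can be inspected
# (or handed to a loader) without any extraction step.
with h5py.File(buffer, "r") as f:
    f.visit(print)  # list the stored weight datasets
```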
-
Hello Charles, if possible, can you try backend-based model saving/loading and check whether this is unique to Keras or whether some condition in the backend is causing it? If possible, please attach a minimal Google Colab gist; that makes it easier to spot the bug and suggest a fix. Thank you.
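For example, a rough sketch of one way to compare the .keras archive path against a plain weights-file round trip might look like the following (the model and paths are placeholders; /tmp would need to be swapped for the NFS mount to reproduce the slowdown):

```python
import time

import keras

# Tiny model just to compare the two loading paths on the same filesystem.
model = keras.Sequential([
    keras.Input(shape=(4,)),
    keras.layers.Dense(8),
])

# 1) Full .keras archive (zip-based, goes through saving_lib).
model.save("/tmp/model.keras")
start = time.perf_counter()
keras.saving.load_model("/tmp/model.keras")
print("load_model (.keras):", time.perf_counter() - start)

# 2) Weights-only HDF5 file (no zip archive involved).
model.save_weights("/tmp/model.weights.h5")
start = time.perf_counter()
model.load_weights("/tmp/model.weights.h5")
print("load_weights (.h5):", time.perf_counter() - start)
```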
-
Hi @mmtrebuchet
I have submitted a PR #19989 to improve saving/loading performance for small models.
Feel free to share your thoughts on it. Thanks.