(I've discussed this loosely with Matt and Jacob in Slack, but writing it up here)
When a site is hosted on a platform with a hard, non-configurable limit on how long an HTTP request can take (eg 30 seconds), a transfer that involves a sizeable video, or a number of other media files, can easily exceed that limit. This kills the transfer: pages are rolled back, but third-party models (eg wagtailmedia) can be left in an indeterminate state, with files on disk somewhere.
The timeout happens because the whole WT import takes place over a single HTTP request, and transferring an asset file as part of that request-response cycle adds the time taken to copy the file.
This problem is exacerbated when media files are stored in cloud storage, which is common for many PaaS setups.
eg:
Destination Server asks Source Server for the file -> Source Server asks Source's Storage -> Source's Storage returns the file to Source Server -> Source Server sends the file to Destination Server -> Destination Server stores the file in Destination's Storage.
So that's the same file being processed (read or written) 3 or 4 times, depending on upload spooling.
Possible solutions
Temporary workaround: use WAGTAILTRANSFER_LOOKUP_FIELDS to identify the problematic model types by field, and manually pre-copy the data to the Destination server so that the Source Server does not have to send it. This works for wagtailmedia.media and its MTI subclasses (see the settings sketch after this list)
A solution: move the Transfer process to multiple AJAX calls (as suggested by Matt), so that we reduce the risk of a timeout. However, a single large file could still exceed the limit
Alternative solution: Detect if files are in cloud storage and support direct cloud-to-cloud syncing of those files, if possible. (However this could also get complex, especially if a cloud-based function is needed to do the copy)
Boto3 appears to have support for CopyObject [docs], which is promising (see the copy sketch after this list)
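For the workaround, the settings change is small. A minimal sketch, assuming media objects can be matched by title (pick any field that is unique and identical on both sites for every pre-copied object):

```python
# settings.py on both the source and destination sites.
# Tells Wagtail Transfer to match wagtailmedia objects by a natural key rather
# than by UID mapping, so pre-copied objects are recognised instead of re-sent.
WAGTAILTRANSFER_LOOKUP_FIELDS = {
    "wagtailmedia.media": ["title"],  # "title" is illustrative, not prescriptive
}
```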
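And for the cloud-to-cloud route, a minimal sketch of the Boto3 call (bucket and key names are placeholders; the destination's credentials need read access to the source object, or the object must be public):

```python
import boto3

s3 = boto3.client("s3")

# Server-side copy: S3 moves the bytes between buckets itself, so neither the
# Source nor the Destination app server ever streams the file.
s3.copy_object(
    Bucket="destination-media-bucket",
    Key="media/original_videos/talk.mp4",
    CopySource={"Bucket": "source-media-bucket", "Key": "media/original_videos/talk.mp4"},
)
```

Note that CopyObject tops out at 5 GB per call; beyond that, boto3's managed `copy()` does a multipart copy instead.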
More to come - and more welcome
(Separate from all the above, it would be nice to have a pre-flight check before a transfer to warn about large files that will be sent over)
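Such a check could be a simple loop over the objects about to be exported. A minimal sketch, not part of Wagtail Transfer - the 50 MB threshold is arbitrary, and `instances`/`file_attr` are assumptions about how it would be wired in. On cloud storage each `.size` access is itself a metadata request (eg an S3 HEAD), so this is one round-trip per file:

```python
LARGE_FILE_THRESHOLD = 50 * 1024 * 1024  # 50 MB, arbitrary


def oversized_files(instances, file_attr="file"):
    """Yield (instance, size_in_bytes) for files likely to threaten the timeout.

    `instances` is whatever iterable of model objects the transfer will
    serialise; `file_attr` names the FileField to inspect.
    """
    for obj in instances:
        file_field = getattr(obj, file_attr, None)
        if file_field and file_field.name:
            size = file_field.size  # storage call; one HEAD per file on S3
            if size > LARGE_FILE_THRESHOLD:
                yield obj, size
```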
I've done some work to do direct S3-to-S3 copying using a custom field adapter, which - while not yet in production - seems to be pretty reliable within some known constraints (eg only works for data with a public-read policy). If anyone's interested, the code is open source and I can point you at the relevant bits of the implementation.
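In outline it looks something like this. A sketch rather than the production code: the register_field_adapters hook and FileAdapter base class are part of wagtail-transfer, but the populate_field signature and the serialised value format are from my reading of the source and may differ between versions, and the bucket name and URL parsing are placeholder assumptions:

```python
from urllib.parse import urlparse

import boto3
from django.db import models
from wagtail import hooks  # `wagtail.core.hooks` on older Wagtail versions
from wagtail_transfer.field_adapters import FileAdapter


class S3ToS3FileAdapter(FileAdapter):
    """Copy file objects bucket-to-bucket instead of streaming them through
    both app servers. Only safe when the source object is publicly readable
    (the known constraint mentioned above)."""

    def populate_field(self, instance, value, context):
        if not value:
            return
        # wagtail-transfer serialises FileFields as a dict that includes the
        # source file's download URL (format assumed from the source).
        url = urlparse(value["download_url"])
        source_bucket = url.hostname.split(".")[0]  # virtual-hosted-style URL assumed
        key = url.path.lstrip("/")

        # Server-side copy: no file bytes pass through either Django process.
        boto3.client("s3").copy_object(
            Bucket="destination-media-bucket",  # placeholder
            Key=key,
            CopySource={"Bucket": source_bucket, "Key": key},
        )
        # Point the field at the copied object without re-uploading it.
        setattr(instance, self.field.get_attname(), key)


@hooks.register("register_field_adapters")
def register_s3_to_s3_adapter():
    # Route every FileField (and subclasses) through the S3-aware adapter.
    return {models.FileField: S3ToS3FileAdapter}
```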
If there's appetite for making this part of WT, @jacobtoppm, I'd be happy to do that when I have time.