-
Notifications
You must be signed in to change notification settings - Fork 11
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Update to Manual and new case study from BFI #432
base: main
Are you sure you want to change the base?
Conversation
First entry
Adding anchor points
1 year study data
Image sequence assessment
Add horizontal lines
Add demux section
Update mux to encode
Spaces and conclusion
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Added those amendments Jérôme, thanks!
- 40Gbps Network card | ||
- NAS storage with 40Gbps network card | ||
|
||
The more CPU threads you have the better your FFmpeg encode to FFV1 will perform. To calculate the CPU threads for your server you can multiply the Threads x Cores x Sockets. So for our configuration this would be 2 (threads) x 16 (sockets) x 2 (cores) = 64. To retrieve these figures we would use Linux's ```lscpu```. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@digitensions is it possible to have a rough estimate of the encoding speed with one content processed e.g. RGB 10-bit 2K. I am curious to see how the CPU behaves with one single encoding.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@digitensions also, if possible, to have an average speed per "parallel" jobs and the count of "parallel" jobs you have, for similar content e.g. all RGB 10-bit 2K, so we can see the difference between a single job and "parallel" usage.
Yes I’ll try and get one encoded in coming weeks. We have a lot of encodings running at the moment to clear storage backlogs.On 7 Mar 2024, at 15:43, Jérôme Martinez ***@***.***> wrote:
@JeromeMartinez commented on this pull request.
In Doc/Case_study.md:
+* [Additional resources](#links)
+
+---
+### <a name="server_config">Server configurations</a>
+
+To encode our DPX sequences we have a single server that completes this work against 6 different Network Attached Storage (NAS) devices in parallel.
+
+Our current server configuration:
+- Intel(R) Xeon(R) Gold 5218 CPU @ 2.30GHz
+- 252GB RAM
+- 32-core with 64 CPU threads
+- Ubuntu 20.04 LTS
+- 40Gbps Network card
+- NAS storage with 40Gbps network card
+
+The more CPU threads you have the better your FFmpeg encode to FFV1 will perform. To calculate the CPU threads for your server you can multiply the Threads x Cores x Sockets. So for our configuration this would be 2 (threads) x 16 (sockets) x 2 (cores) = 64. To retrieve these figures we would use Linux's ```lscpu```.
@digitensions is it possible to have a rough estimate of the encoding speed with one content processed e.g. RGB 10-bit 2K. I am curious to see how the CPU behaves with one single encoding.
—Reply to this email directly, view it on GitHub, or unsubscribe.You are receiving this because you were mentioned.Message ID: ***@***.***>
|
Add 2k bits
Add sublicense data
Typing error fixes
Add MKV durations
Clean up month statement
Doc/Case_study.md
Outdated
|
||
Next the first file within the image sequence is checked against a [Media Area's MediaConch software](https://mediaarea.net/MediaConch) policy for the file ([BFI's DPX policy](https://github.com/bfidatadigipres/dpx_encoding/blob/main/rawcooked_dpx_policy.xml)). If it passes then we know it can be encoded by RAWcooked and by our current licence. Any that fail are assessed for possible RAWcooked licence expansion or possible anomalies in the DPX. | ||
|
||
The frame pixel size and colourspace of the sequence are used to calculate the potential reduction rate of the RAWcooked encode based on previous reduction experience. We make an assumption that 2K RGB will always be atleast one third smaller, so calculate a 1.3TB sequences to make a 1TB FFV1 Matroska. For 2K Luma and all 4K we must assume that very small size reductions could occur so map 1TB to 1TB. This step is necessary to control file ingest sizes to our Digital Preservation Infrastructure where we currently have a maximum verifiable ingest file size of 1TB. Where a sequence is over 1TB we have Python scripts to split that DPX sequence across additional folders depending on total size. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
atleast
> at least
?
Doc/Case_study.md
Outdated
|
||
### <a name="muxing">Encoding the image sequence</a> | ||
|
||
To encode our image sequences we use the ```--all``` flag released in RAWcooked v21. This flag was a sponsorship development by [NYPL](https://www.nypl.org/), and sees several preservation essential flags merged into this one simple flag. Most imporantly it includes the creation of checksum hashes for every image file in the sequence, with this data being saved into the RAWcooked reversibility file and embedded into the Matroska wrapper. This ensures that when decoded the retrieved sequence can be verified as bit-identical to the original source sequence. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
imporantly
> importantly
Doc/Case_study.md
Outdated
|
||
### <a name="ffv1_demux">FFV1 Matroska decode to image sequence</a> | ||
|
||
We have automation scripts that return an FFV1 Matroska back to the original image sequence. These are essential for our film preseration colleagues who may need to perform grading or enhancement work on preserved films. For this we use the ```--all``` command again which can select decode when an FFV1 Matroska is supplied. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
preseration
?
Doc/Case_study.md
Outdated
--- | ||
## Conclusion | ||
|
||
We began using RAWcooked to convert 3 petabytes of 2K DPX sequence data to FFV1 Matroska for our *Unlocking Film Heritage* project. This lossless compression to FFV1 has saved us an estimated 1600TB of storage space, which has saved thousands of pounds of additional magnetic storage tape purchases. Undoubtedly this software offers amazing financial incentives with all the benefits of open standards and open-source tools. It also creates a viewable video file of an otherwise invisible DPX scan, so useful for viewing the unseen technology of film. We plan to begin testing RAWcooked encoding of TIFF image sequences shortly with the intention of moving DCDM image sequences to FFV1. Today, our workflow runs 24/7 performing automated encoding of business-as-usual DPX sequences with relatively little overview. There is a need for manual intervention when repeated errors are encountered, usually indicated when an image sequences doesn't make it to our Digital Preservation Infrastructure. Most often this is caused by a new image sequence 'flavour' that we do not have covered by our RAWcooked licence, or sometimes it can indicate a problem with either RAWcooked or FFmpeg file encoding a specific DPX scan - there can be many differences found in DPX metadata depending on the scanning technology. Where errors are found by our automations these are reported to an error log named after the image seqeuence, a build up of reported errors will indicate repeated problems. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
seqeuence
> sequence
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
a few typos
Update encoding timings with set parallel encoding number
Rewrite last paragraph, for review
Clean up spelling errors
Added network contention info to the introduction section
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
A few typos, otherwise LGTM.
|
||
Next the first file within the image sequence is checked against a DPX policy created using [Media Area's MediaConch software](https://mediaarea.net/MediaConch) - ([BFI's DPX policy](https://github.com/bfidatadigipres/dpx_encoding/blob/main/rawcooked_dpx_policy.xml)). If it passes then we know it can be encoded by RAWcooked and by our current licence. Any that fail are assessed for possible RAWcooked licence expansion or possible anomalies in the DPX. | ||
|
||
The frame pixel size and colourspace of the sequence are used to calculate the potential reduction rate of the RAWcooked encode based on previous reduction experience. We make an assumption that 2K RGB will always be atleast one third smaller, so calculate a 1.3TB sequence will make a 1TB FFV1 Matroska. For 2K Luma and all 4K we must assume that very small size reductions could occur so map 1TB to 1TB. This step is necessary to control file ingest sizes to our Digital Preservation Infrastructure where we currently have a maximum verifiable ingest file size of 1TB. Where a sequence is over 1TB we have Python scripts to split that DPX sequence across additional folders depending on total size. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
atleast
> at least
?
|
||
### <a name="muxing">Encoding the image sequence</a> | ||
|
||
To encode our image sequences we use the ```--all``` flag released in RAWcooked v21. This flag was a sponsorship development by [NYPL](https://www.nypl.org/), and sees several preservation essential flags merged into one. Most imporantly it includes the creation of checksum hashes for every image file in the sequence, with this data being saved into the RAWcooked reversibility file and embedded into the Matroska wrapper. This ensures that when decoded the retrieved sequence can be verified as bit-identical to the original source sequence. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
imporantly
> importantly
|
||
We began using RAWcooked to convert 3 petabytes of 2K DPX sequence data to FFV1 Matroska for our *Unlocking Film Heritage* project. This lossless compression to FFV1 has saved us an estimated 1600TB of storage space, which has saved thousands of pounds of additional magnetic storage tape purchases. Undoubtedly this software offers amazing financial incentives with all the benefits of open standards and open-source tools. It also creates a viewable video file of an otherwise invisible DPX scan, so useful for viewing the unseen technology of film. | ||
|
||
Today, our workflow runs 24/7 performing automated encoding of business-as-usual DPX sequences with relatively little overview. There is a need for manual intervention when repeated errors are encountered. This is usually indicated in error logs or when an image sequences doesn't make it to our Digital Preservation Infrastructure. Most often this is caused by a new image sequence 'flavour' that we do not have covered by our RAWcooked licence, or sometimes it can indicate a problem with either RAWcooked or FFmpeg while encoding a specific DPX scan. There can be many differences found in DPX metadata depending on the scanning technology used. Where errors are found by our automations these are reported to an error log named after the image seqeuence. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
seqeuence
> sequence
* Parallel 2K RGB 16-bit DPX (367 GB) - MKV duration 11:34 - encoding time 2:40:00 - MKV was 27.6% smaller than the DPX | ||
* Parallel 2K RGB 16-bit DPX (325 GB) - MKV duration 10:15 - encoding time 2:21:00 - MKV was 24.4% smaller than the DPX | ||
|
||
It provides us with great reassurance to implement the ```--all``` command and we remain highly satisfied with RAWcooked encoding of DPX sequences despite the reduction in our concurrent encodings. The embedded DPX hashes which ```--all``` includes are critical for long-term preservation of the digitised film. In addition there are checksums embedded in the slices of every video frame (up to 576 checksums *per* video frame) allowing granular analysis of any problems found with digital FFV1 preservation files, should they arise. This is thanks to the FFV1 codec, and it allows us to pinpoint exactly where digital damage may have ocurred. This means we can easily replace the impacted DPX files using our duplicate preservation copies. Open-source RAWcooked, FFV1 and Matroska allow open access to their source code which means reduced likelihood of obsolescence long into the future. Finally, we plan to begin testing RAWcooked encoding of TIFF image sequences with the intention of encoding DCDM image sequences to FFV1 also. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
ocurred
> occurred
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
A few typos, otherwise LGTM.
- muxes these into a Matroska container (.mkv) | ||
- uses FFmpeg for this encoding process | ||
|
||
To encode your sequences using the best preservation flags within RAWcooked then you can use the ```--all``` flag. This flag concatenates several important flags into one ensuring lossless compression and assured reversibilty: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
reversibilty
> reversibility
| ```--accept-gaps``` | Where gaps in a sequence are found this flag ensures the encoding completes successfully. If you require that gaps are not encoded then follow the ```--all``` command with ```--no-accept-gaps``` | | ||
|
||
|
||
If you do not require all of these flags you can build your own command with just the flags you prefer, for exmaple: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
exmaple
> example
|
||
## Capturing logs | ||
|
||
It is advisable to always capture the console output of your `RAWcooked` encoding and decoding for review over time. The console output will include `RAWcooked` software information, warning or error messagess, plus confirmation of a successful encode or decode. The console also outputs important encoding information from the FFmpeg encoding software including FFmpeg version, file metadata and stream encoding configurations. Over time this information can be valuable for understanding your compressed files. To capture console log outputs for standard output and standard errors you can use the following commands. You may want to add ```-y``` or ```-n``` which answers yes or no to any questions asked by `RAWcooked` software, unless you're happy monitoring the logs as they are created to intercept any questions. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
messagess
> messages
Update 4K results following NAS upgrades
Hi there,
Submitting some developments for your review, as possible additions/developments of the RAWcooked website.
Thanks,
Joanna