BUG: fix mtime of sdist archive members #452
Thanks @dnicolodi!
SOURCE_DATE_EPOCH, if it's set, is actually used for all the files in the wheel - both on main and with this PR. I think that's a useful feature, and it's tested - so we should probably document that.
In addition, SOURCE_DATE_EPOCH was used for the generated PKG_INFO file in main, and that is dropped in this PR. That is perhaps a regression, because with this PR it now always uses the current time - and that makes it impossible for two subsequent invocations of python -m build --sdist to yield the exact same output.
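To illustrate the concern about two subsequent invocations, here is a minimal sketch, not the meson-python implementation. The helper names `pkg_info_mtime` and `make_tar` are hypothetical, and the archive is built in memory with only the standard library. It shows that a tar archive is byte-identical across runs when every member's mtime comes from SOURCE_DATE_EPOCH, and that falling back to the current time is what makes the outputs drift.

```python
import io
import os
import tarfile
import time


def pkg_info_mtime(environ=os.environ):
    """Return SOURCE_DATE_EPOCH as an int if it is set, else the current time.

    Hypothetical helper: honoring the variable here is the behavior under
    discussion, not something this sketch claims the project does.
    """
    value = environ.get("SOURCE_DATE_EPOCH")
    return int(value) if value is not None else int(time.time())


def make_tar(files, mtime):
    """Build an uncompressed tar in memory with every member stamped `mtime`."""
    buf = io.BytesIO()
    with tarfile.open(fileobj=buf, mode="w") as tar:
        # Sort the names so member order does not depend on dict ordering.
        for name, data in sorted(files.items()):
            info = tarfile.TarInfo(name)
            info.size = len(data)
            info.mtime = mtime
            tar.addfile(info, io.BytesIO(data))
    return buf.getvalue()


files = {
    "pkg-1.0/PKG-INFO": b"Metadata-Version: 2.1\n",
    "pkg-1.0/module.py": b"x = 1\n",
}
fixed = pkg_info_mtime({"SOURCE_DATE_EPOCH": "1700000000"})
# Same inputs and same timestamp give the same bytes.
print(make_tar(files, fixed) == make_tar(files, fixed))
```

With the variable unset, `pkg_info_mtime` returns the wall-clock time, so two builds started in different seconds produce different archives.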
The main change in this PR - changing from time zero (1-1-1970) to the time of the last commit - does seem to me to be correct and an improvement on what this code did before.
Why is this useful? The whole reproducible builds thing is about producing reproducible binaries, not source tarballs. What we should maybe do is to use the time of the last commit for
I think the current implementation is just the best of the compromises. We still copy into the tarball the file content from the source directory rather than the content of the last commit. Thus using the timestamp of the commit as mtime seems a bit off. I still think that picking up changes not committed to version control is not useful. But, as we discussed, removing this behavior requires a bit more analysis of the use cases, so I left it in place.
https://reproducible-builds.org/docs/archives/ talks about timestamps for archives and how to fix this up if it isn't reproducible with the For SciPy we just got a couple of bug reports from OpenSUSE related to non-reproducible builds. So let's ask the author of those issues. @bmwiedemann does it matter to you whether sdists (source archives) have non-reproducible timestamps, or only wheels (installable packages)? In other words, would you ever run
Reproducible builds can be seen as a more general framework for deterministic transformations, including a files->tar transformation. So there is some value in being able to do it. Some builds let .tar files end up in packages, so it can matter there for the journey to bit-reproducible results.
@bmwiedemann Thanks for chiming in. I understand that a desirable property of an archiving tool is to always produce binary-identical archives for the same input files. However, I would like input on a more nuanced case. We are wondering which modification time we should use to store the generated file in the archive: the same as for the source code files or the current time. Further, we would like to know if honoring I'm ambivalent in choosing between current time and last commit timestamp. I see value in using either. The latter of course results in a reproducible source code distribution archive; the former does not. However, I'm pretty sure that respecting
@rgommers One of the reasons I went for the current time for
The current patch would make it pretty hard to reproduce tarballs (even if it is "only" source tarballs) because And SOURCE_DATE_EPOCH can be a good alternative to current time. It is at least an indicator that someone would prefer builds to be reproducible.
Sure. But does anyone care? None of the source tarballs produced by meson are reproducible (there is at least one source of non-determinism: meson stores the current time in the gzip header) and I don't know of any complaint.
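As an aside on the gzip header mentioned above: the gzip format keeps a 4-byte little-endian MTIME field at byte offsets 4 to 7, so two compressions of the same payload differ from the fifth byte onward when that field is taken from the clock. The following is a minimal standard-library sketch; `gzip_header_mtime` is a hypothetical helper name, and the fixed timestamps are arbitrary values chosen for illustration.

```python
import gzip
import struct


def gzip_header_mtime(blob):
    """Read the little-endian MTIME field stored in bytes 4..7 of a gzip header."""
    return struct.unpack("<I", blob[4:8])[0]


payload = b"identical tar bytes"

# Explicit, different timestamps stand in for two builds run at different times.
first = gzip.compress(payload, mtime=1)
second = gzip.compress(payload, mtime=2)

print(first == second)          # the two outputs differ
print(first[:4] == second[:4])  # magic, method and flags are the same
print(first[8:] == second[8:])  # everything after MTIME is the same
print(gzip_header_mtime(first), gzip_header_mtime(second))
```

Passing a constant `mtime` is the usual way to remove this particular source of non-determinism.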
I don't understand what you mean. This is about the source distribution for Meson, which is not created using Anyway, the source distribution for meson is not reproducible. Meson 0.29.0 is ancient history, however:
$ git checkout 0.29.0
$ python3.11 -m build -s
$ mv dist/meson-0.29.0.tar.gz dist/meson-0.29.0.tar.gz.A
$ python3.11 -m build -s
$ mv dist/meson-0.29.0.tar.gz dist/meson-0.29.0.tar.gz.B
$ cmp dist/meson-0.29.0.tar.gz.A dist/meson-0.29.0.tar.gz.B
dist/meson-0.29.0.tar.gz.A dist/meson-0.29.0.tar.gz.B differ: char 5, line 1
Not about PyPI, but more general:
This is a very broad scope project. If this is your goal, I think that you may want to start pressuring more popular tools commonly used to produce source distributions than meson-python. Even focusing only on the Python packaging niche: probably the most commonly used tool is setuptools, which as demonstrated above, does not produce reproducible tarballs. Even if we embrace the goal of reproducible source distributions,
Set the mtime of all archive members to the mtime set by meson dist, also for files that may have been locally modified. Use the current time for the PKG-INFO file.

Drop support for the $SOURCE_DATE_EPOCH environment variable when creating the sdist. This environment variable is meant to be used as a work-around for build tools that produce non-reproducible binary packages. It does not have a standardized meaning for tools producing source distributions. See https://reproducible-builds.org/docs/source-date-epoch/

Fixes mesonbuild#450.
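The behavior described in the commit message can be sketched roughly as follows. This is an illustration under assumptions, not the code from this PR: the function name `repack` and its parameters are hypothetical, the input is any tar archive held in memory, and `overrides` stands in for files whose content was modified locally.

```python
import io
import tarfile
import time


def repack(dist_tar_bytes, overrides, pkg_info_name, pkg_info):
    """Copy a tar archive, keeping each member's recorded mtime.

    `overrides` maps member names to replacement content (for example,
    locally modified files). A PKG-INFO member is appended with the
    current time as its mtime.
    """
    out = io.BytesIO()
    with tarfile.open(fileobj=io.BytesIO(dist_tar_bytes), mode="r") as src, \
            tarfile.open(fileobj=out, mode="w") as dst:
        for member in src.getmembers():
            if not member.isfile():
                # Directories and links carry no data; copy the header as is.
                dst.addfile(member)
                continue
            data = overrides.get(member.name)
            if data is None:
                data = src.extractfile(member).read()
            # Only the size changes; the mtime from the input archive is kept.
            member.size = len(data)
            dst.addfile(member, io.BytesIO(data))
        info = tarfile.TarInfo(pkg_info_name)
        info.size = len(pkg_info)
        info.mtime = int(time.time())  # generated file: current time
        dst.addfile(info, io.BytesIO(pkg_info))
    return out.getvalue()
```

In this sketch the only member whose timestamp varies between runs is the generated PKG-INFO, which is the trade-off discussed in this thread.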
Okay
One of the reasons I went for the current time for PKG_INFO is that other generated files in the archive (added or modified by the scripts registered with meson.add_dist_script()) take the current time as mtime in the archive generated by Meson, unless the package authors take special care (and I'm almost certain that very few do). As we don't do anything to make the mtime of these constant, I thought using the current time for PKG_INFO is also fine
I'm not fully convinced by this argument, because (a) only a small fraction of packages will be using add_dist_script and (b) it didn't seem necessary to make this change, and it's not impossible that someone may need this kind of sdist reproducibility.
However, the main change in this PR is more important and in good shape. So let's get this in in its current form, because it fixes a bug. If someone comes knocking about the change to PKG_INFO, we can always deal with that then - and we'll have learned something.
So in it goes. Thanks again @dnicolodi. And thanks @bmwiedemann for your thoughts on this issue.