Config: support unicode en-dash (codepoint 8211) during copyright year substitution. #12449

Closed
jayaddison opened this issue Jun 20, 2024 · 4 comments

Comments

@jayaddison
Contributor

jayaddison commented Jun 20, 2024

Is your feature request related to a problem? Please describe.
During evaluation of the config.copyright and config.epub_copyright fields, Sphinx attempts to substitute the year derived from the SOURCE_DATE_EPOCH environment variable, when that variable is set.

This allows future rebuilds of a Sphinx documentation project to produce bit-for-bit identical results, a property that provides reassurance that the source code and build infrastructure have not been tampered with, and that the resulting documentation is as intended.
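
To make the mechanism concrete, here is a minimal sketch of how a build year can be derived from SOURCE_DATE_EPOCH. The helper name `source_date_year` is hypothetical and is not claimed to match Sphinx's internals; the sketch assumes the variable holds an integer count of seconds since the Unix epoch, interpreted in UTC.

```python
import os
import time


def source_date_year(environ=os.environ):
    """Return the UTC year encoded by SOURCE_DATE_EPOCH, or None if it is unset.

    Hypothetical helper for illustration only; not Sphinx's own code.
    """
    value = environ.get("SOURCE_DATE_EPOCH")
    if value is None:
        return None
    # Use UTC so the result does not depend on the build host's timezone.
    return time.gmtime(int(value)).tm_year
```

Because the year comes from the environment rather than the host clock, two builds that share the same SOURCE_DATE_EPOCH value should agree on the year regardless of when they run.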

The copyright format patterns that Sphinx matches support year-to-year ranges (1998-2004, for example), but only an ASCII dash is accepted as the range separator, not the Unicode en-dash at decimal codepoint 8211.

Describe the solution you'd like
Extend the config-handling code to additionally support the Unicode en-dash at codepoint 8211 decimal in the check below:

if copyright_line[4] != '-':
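
As an illustration of the request only, a rough sketch of what accepting both separators could look like follows. The function `substitute_year_sketch` and the `RANGE_SEPARATORS` set are hypothetical names, and the surrounding logic is an assumption loosely modelled on the check quoted above, not a copy of Sphinx's implementation.

```python
# ASCII hyphen-minus plus the en-dash (decimal codepoint 8211).
RANGE_SEPARATORS = {"-", "\u2013"}


def substitute_year_sketch(copyright_line: str, replace_year: str) -> str:
    """Replace the final year of a leading 'YYYY' or 'YYYY<sep>YYYY' prefix.

    Hypothetical sketch; the real matching rules in Sphinx may differ.
    """
    # Lines that do not start with a four-digit year are left untouched.
    if len(copyright_line) < 4 or not copyright_line[:4].isdigit():
        return copyright_line

    # Single year, e.g. "2009, Author": replace the leading year.
    if copyright_line[4:5] in {"", " ", ","}:
        return replace_year + copyright_line[4:]

    # Otherwise the fifth character must be an accepted range separator.
    if copyright_line[4] not in RANGE_SEPARATORS:
        return copyright_line

    # Range, e.g. "1998-2004, Author": keep the start year and the
    # separator as written, and replace only the end year.
    if copyright_line[5:9].isdigit() and copyright_line[9:10] in {"", " ", ","}:
        return copyright_line[:5] + replace_year + copyright_line[9:]

    return copyright_line
```

With this sketch, a range written with either separator would have its end year replaced, whereas a comparison against '-' alone leaves the en-dash form unchanged.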

Describe alternatives you've considered
An alternative approach, not mutually exclusive with the above, would be to track down the individual projects affected by this and suggest that they use the ASCII dash character instead. That could be useful in some cases, especially if their plans to upgrade Sphinx are limited.

Additional context
One software package in Debian known to be affected by this, and therefore failing build reproducibility testing, is the uncertainties Python library. The current reproducible-build test output for this package shows a diff in the copyright footer when two comparison builds run with host build timestamps that fall in different years: https://tests.reproducible-builds.org/debian/rb-pkg/unstable/amd64/uncertainties.html


@jayaddison
Contributor Author

Please read #12451 before considering applying changes that adjust this behaviour.

@jayaddison
Contributor Author

matplotlib/matplotlib#28418 (comment) confirms that matplotlib chose to use an en-dash intentionally, so that substitution does not occur for their copyright lines when SOURCE_DATE_EPOCH is set.

So we should not do this. A different approach for the uncertainties library may be required; I'll either suggest an individual patch to Debian, or perhaps some other approach.

@jayaddison
Contributor Author

I'm leaving this issue open for a while to allow time for discussion and consensus, but at some point I intend to close it as 'not planned'.

@jayaddison
Contributor Author

I don't think that we should implement a change to support en-dash during copyright substitution matching at the moment, because en-dash is currently used intentionally to work around that replacement logic in at least one case.

Closing pending an alternative, ideally more precise, method for copyright notice year substitution.

@jayaddison jayaddison closed this as not planned Jun 29, 2024
@github-actions github-actions bot locked as resolved and limited conversation to collaborators Jul 30, 2024