expose links to all export formats via Signposting #11045
base: develop
Looks good. Nice to have all our formats in Signposting - we should make sure Herbert vdS knows. I suggested one change to avoid problems with permalinks.
try {
    exporter = ExportService.getInstance().getExporter(formatName);
    describedby += ",<" + systemConfig.getDataverseSiteUrl() + "/api/datasets/export?exporter=" + formatName + "&persistentId="
            + ds.getProtocol() + ":" + ds.getAuthority() + "/" + ds.getIdentifier() + ">;rel=\"describedby\"" + ";type=\"" + exporter.getMediaType() + "\"";
These (this and line 137) won't work for all permalinks, since permalinks don't necessarily use / as a separator. I think you can just use ds.getGlobalId().asString() instead. For a real dataset, I don't think you can ever have a null GlobalId, so I'm not sure you even need to check for that.
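A minimal sketch of the suggestion, assuming the PID is already available as a single string (for example from ds.getGlobalId().asString()). The class name, the helper describedBy, the permalink-style PID "perma:ABC123", and the media type are illustrative assumptions, not code from the PR:

```java
public class DescribedByLinkSketch {

    // Hypothetical helper: builds one Signposting "describedby" entry from a
    // ready-made PID string instead of joining protocol, authority, and
    // identifier with a hard-coded "/".
    static String describedBy(String siteUrl, String formatName, String pid, String mediaType) {
        return "<" + siteUrl + "/api/datasets/export?exporter=" + formatName
                + "&persistentId=" + pid
                + ">;rel=\"describedby\";type=\"" + mediaType + "\"";
    }

    public static void main(String[] args) {
        String site = "http://localhost:8080";
        // DOI-style PID, where "/" happens to separate authority and identifier.
        System.out.println(describedBy(site, "croissant", "doi:10.5072/FK2/6A3292", "application/ld+json"));
        // Illustrative permalink-style PID with no "/" separator (assumed value).
        System.out.println(describedBy(site, "croissant", "perma:ABC123", "application/ld+json"));
    }
}
```

Because the PID is passed through as-is, the helper makes no assumption about which separator a given PID provider uses.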
Looks good.
Whoops, tests are failing. https://jenkins.dataverse.org/blue/organizations/jenkins/IQSS-Dataverse-Develop-PR/detail/PR-11045/3/tests I'll take a look.
…0542 The test file is used in InfoIT#testGetExportFormats
Before this PR...
In development:
Expected: is "http://localhost:8080/dataset.xhtml?persistentId=doi:10.5072/FK2/6A3292"
Actual: is "http://localhost:8080/dataset.xhtml?persistentId=doi:10.5072/FK2/6A3292"
On Jenkins:
Expected: is "http://localhost:8080/dataset.xhtml?persistentId=doi:10.5072/FK2/6A3292"
Actual: http://ec2-3-225-221-142.compute-1.amazonaws.com/dataset.xhtml?persistentId=doi:10.5072/FK2/6A3292
So we'll change to just "endsWith" since we aren't actually testing the base URL, just the datasetPid, which we fixed up in ca93d60.
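To illustrate why a suffix check is more robust here, the following is a rough sketch in plain Java rather than the project's actual test code. The class and method names are hypothetical; the URLs are the ones quoted in the commit message:

```java
public class EndsWithCheckSketch {

    // Hypothetical check: only the path and PID matter, not the host the
    // test server happens to run on.
    static boolean pointsAtDataset(String actualUrl, String pid) {
        return actualUrl.endsWith("/dataset.xhtml?persistentId=" + pid);
    }

    public static void main(String[] args) {
        String pid = "doi:10.5072/FK2/6A3292";
        String dev = "http://localhost:8080/dataset.xhtml?persistentId=" + pid;
        String jenkins = "http://ec2-3-225-221-142.compute-1.amazonaws.com/dataset.xhtml?persistentId=" + pid;

        // An exact comparison against the localhost URL only holds in development.
        System.out.println(dev.equals(jenkins));
        // The suffix check holds in both environments.
        System.out.println(pointsAtDataset(dev, pid) && pointsAtDataset(jenkins, pid));
    }
}
```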
I pushed some commits to fix the broken tests, update the API changelog and release note, and fix a broken header in the docs. This test was failing in Jenkins:
... on this line:
Again, I don't think it has anything to do with changes in the PR but I'm just mentioning it in case we start seeing it elsewhere. @qqmyers if you would take another look I'd appreciate it!
By the way, @4tikhonov pointed out to me that Dataverse shows up in this report of who has implemented Signposting: https://s11.no/2024/signposting-report/
Updates look fine. I agree the one test failure is probably unrelated to the PR - possibly a timing issue again.
What this PR does / why we need it:
Especially for Croissant but really all export formats, we're interested in exposing the URLs for each format via Signposting so that crawlers (and API users) can efficiently get just the HEAD of a page (or linkset API) to get the URLs.
If Google and others adopt Signposting, it will mean they can do a HEAD, get the Croissant URL (for example), and download the Croissant file, which has the potential to be large. See also discussion at mlcommons/croissant#530 (comment) and especially this URL:
<head>
mlcommons/croissant#646
Which issue(s) this PR closes:
Special notes for your reviewer:
Please see the comment about how I changed the mimetype for our schema.org format.
I also did a fair amount of doc improvement. Feedback welcome. Heads up to @julian-schneider that I tweaked your docs in #10739 (just merged).
Suggestions on how to test this:
Try HEAD and GET on a published dataset, looking for the "Link" header:
Try the linkset API
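As a rough aid for the HEAD test above, here is a sketch of how a client might read the "Link" header and pick out the "describedby" URLs for one media type. The class and method names are hypothetical, the sample header in main is made up to match the shape of the code under review, and the parser is a simplification that assumes no commas appear inside parameter values:

```java
import java.io.IOException;
import java.net.URI;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class LinkHeaderSketch {

    // One entry looks like: <url>;rel="...";type="..."
    private static final Pattern ENTRY = Pattern.compile("<([^>]+)>([^,]*)");
    private static final Pattern REL = Pattern.compile("rel=\"([^\"]*)\"");
    private static final Pattern TYPE = Pattern.compile("type=\"([^\"]*)\"");

    // Sends a HEAD request and joins all Link header values into one string.
    static String fetchLinkHeader(String pageUrl) throws IOException, InterruptedException {
        HttpRequest request = HttpRequest.newBuilder(URI.create(pageUrl))
                .method("HEAD", HttpRequest.BodyPublishers.noBody())
                .build();
        HttpResponse<Void> response = HttpClient.newHttpClient()
                .send(request, HttpResponse.BodyHandlers.discarding());
        return String.join(",", response.headers().allValues("Link"));
    }

    // Returns the URLs of all entries with the given rel and media type.
    static List<String> urlsFor(String header, String rel, String type) {
        List<String> urls = new ArrayList<>();
        Matcher entry = ENTRY.matcher(header);
        while (entry.find()) {
            String params = entry.group(2);
            if (rel.equals(first(REL, params)) && type.equals(first(TYPE, params))) {
                urls.add(entry.group(1));
            }
        }
        return urls;
    }

    private static String first(Pattern pattern, String text) {
        Matcher m = pattern.matcher(text);
        return m.find() ? m.group(1) : null;
    }

    public static void main(String[] args) {
        // Made-up sample header; a real one would come from fetchLinkHeader(...).
        String sample = "<https://doi.org/10.5072/FK2/6A3292>;rel=\"cite-as\","
                + "<http://localhost:8080/api/datasets/export?exporter=croissant"
                + "&persistentId=doi:10.5072/FK2/6A3292>;rel=\"describedby\";type=\"application/ld+json\"";
        System.out.println(urlsFor(sample, "describedby", "application/ld+json"));
    }
}
```

Since several export formats may share a media type, the sketch returns a list rather than a single URL.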
Does this PR introduce a user interface change? If mockups are available, please link/include them here:
No.
Is there a release notes update needed for this change?:
Yes, included.
Additional documentation:
A good entry point for doc changes: https://dataverse-guide--11045.org.readthedocs.build/en/11045/api/native-api.html#retrieve-signposting-information