Skip to content

PDF Manuscripts

Erik Hetzner edited this page Apr 18, 2018 · 7 revisions

PDF Manuscripts

Overview

PDF Upload support landed through the tickets in the epic for PDF and \latex support. The benefit is that authors using \latex can submit their manuscript looking just the way they want for the initial phase.

To achieve this support, we employ PDF.JS (Section PDF.JS and pdf.js-viewer) to render the document on the client. There are also feature flags and interactions with Attachments workflow that changed for this addition.

The following sections touch on the high-level points of using PDFs in Aperta. The discussion of the back-end models and settings will set the stage illustrating how to upload a PDF manuscript. The front end will focus on display and some screenshots of the specific user interactions when the journal enables PDF uploads.

Rails (server) Details

Starting at the top of Figure pdfmanuscripterd , Journal appears first. Each Journal has it's own setting enabling a PDF upload. Listing journalschema shows an abbreviated schema for Journal with the pdf_allowed boolean. Any given journal also can hold custom css for pdf display. The default is to not allow a PDF submission. New journals will need to set this appropriately.

create_table "journals", force: :cascade do |t|
  # <snip>
  t.text     "pdf_css"
  # <snip>
  t.boolean  "pdf_allowed",          default: false
end

Moving to the next model in Figure pdfmanuscripterd , we arrive at Paper. It is the parent, or owner, of any attachments. A manuscript is such an attachment being of the type ManuscriptAttachment. Paper comes into play by updating the body text correctly when an author uploads a new manuscript. In the case of PDF, this blanks out the body for versioning.

Finally, there is Attachment. This class is the parent of the sub-classes ManuscriptAttachment and SourcefileAttachment. The first is where any type (.doc, .docx, .pdf) of manuscript lives. ManuscriptAttachment uses CarrierWave for storage to S3. There is nothing unique about this class for PDFs as the manuscripts themselves are just files. It does store type in the .file attribute.

If the staff accept a manuscript for publication, they will require the source document of the PDF. This is where SourcefileAttachment comes into play. Aperta's current behavior requires any submission 1.0 and beyond to have an associated source file for export.

Exporting

One final class caring about a PDF manuscript is ExportPackager. The source file requirements exists just for export. The packager adds all of the files (PDF and source) to the exported zip file.

Converting

Aperta has pieces in place to convert documents into PDFs. The PaperConverters module handles these transformations. Table conversions shows the main transformations available.

Source Type Output Requested Converter
pdf pdf IdentityPaperConverter
docx/doc docx/doc IdentityPaperConverter
docx/doc docx/doc (sourcefile) SourcePaperConverter
docx/doc pdf (with attachments) PdfPaperConverter
pdf pdf with attachments PdfWithAttachmentsPaperConverter
docx/doc pdf for export (router) AutoGeneratedDocxToPdfConverter
pdf pdf for export (router) AutoGeneratedDocxToPdfConverter

The first three entries make sure that we can generate the right URL for to directly link to the files requested. The only complexitiy is for a docx listed as a source file. It has a separate converter.

In the second group, the generated PDFs have any attachments appended into the PDF. This is true for Microsoft Word sources or PDF sources.

Finally, the last section converts the source materials into a PDF for the router. This is the export PDF which gets packaged up for Apex.

Allowing PDF Uploads

This is a manual step taken on by a developer. There is no setting on the administration section of Aperta to turn this on, so someone must use the console. The following command in Listing pdfallowed will make the necessary journal setting.

Journal.find(id).update(pdf_allowed: true)

Ember (client) Details

Allowing PDF uploads changes the client-side interactions. The following sections discuss these additions to the submission workflow.

PDF.JS and pdf.js-viewer

For the client side, Aperta shows an HTML representation of the body if the uploaded manuscript is a docx file. For a PDF, Aperta makes use of the PDF.JS library from the good people at Mozilla.

However, the main drawback of this library is that it lacks an embeddable viewer. It ships with an example of a full-page implementation that looks exactly like the PDF viewer in Firefox. They ask that if you use this then your version should not look the same as the reference.

This is where pdf.js-viewer comes into play. It is an implementation of an embeddable PDF viewer based on PDF.JS. However, customization is always a snag. This library modifies the PDF.JS reference to get the job done. For Aperta, the pdf.js-viewer needed editing to meet the system needs.

This means we have the following installation, implementation and upgrade path:

  1. Get latest PDF.JS
  2. Appy pdf.js-viewer patches
    1. Fix any broken patches
  3. Apply Aperta patches
    1. Fix any broken patches
    2. Update build script

The path vendor/assets/pdfjs-viewer/ contains all of the files related to these assets. There is a README.md explaining the steps and an update.sh script automating the upgrade steps. Although these helpers exist, the upgrade is still intensly manual.

The first reason that Aperta hosts it's own version of these files is due to customization. The other is that the library is only included on pages that need PDF support. In this way, the system minimizes the extra libraries for the client.

Uploading

For submissions to journals with PDF upload enabled, an author sees the intial prompt in Figure uploadpdf . The text under the button changes to only list the Microsoft Word formats if PDF upload is disabled.

After the inital upload, every subsequent submission requires the author to complete the Upload Manuscript task. Figure upload shows the task for PDF submissions. There is the normal file uploaded for the manuscript and then a secondary one for the source file.

Contrast Figure upload with Figure uploaddocx . In the latter, there is no extra uploader for a source file. We can tell PDFs are permissable by the upload instructions. If the authors swaps out the manuscript for a PDF version, the task will again ask for the source.

The source files can be .docx, .doc or .zip. The zip files are intended to package up all of the assets for a \LaTeX submission.

The attached source file is enforced by task validation. If an author tries to finish the task without uploading all relevant files it will show some errors for the user to fix. Figure uploadfail has an example of these failures.

Versions

When using PDFs, Aperta does not provide the ability to see differences in the body between versions. Figure pdfdiff illustrates the warning displayed for the user.

The meta-data associated with a manuscript will still show any changes in the right pane. Notice too that the files in Figure pdfdiff indicate that they are PDFs.

Downloads

For the manuscript downloads, PDF files are generated from submitted .docx files. However, when a user submits a PDF, the uploaded PDF can be provided directly. This is what happens in the Downloads section of the interface in Figure downloads .

Each submission in Figure downloads has a PDF version. It's easy to see that the 0.0 version had only a PDF since no Word format is listed.

In journals without PDF support, a user would still see the PDF links, but those versions would be generated from the Word files and not the original author submissions.

Future Enhancements

I know that you thought that the PDF support was perfect, but... Surprise! There are areas to improve. Two main suggestions are to provide some level of manuscript diffing and maybe craft Aperta's own viewer.

Diffing

Showing manuscript body differences is not possible when the source is PDF. Aperta lacks the ability to show changes between two different PDF submissions. If this becomes a need for reviewers and staff, then APERTA-6472 and APERTA-7296 may provide a starting point.

PLOS PDF.JS

The initial integration of pdfjs-viewer successfully displays PDFs, but has proven to be difficult to maintain going forward. APERTA-11040 outlines using PDF.JS to create an Aperta-specific PDF viewer.

Implementing this will be non-trivial, but provide the ability to more closely track PDF.JS releases and customize the display look, feel, and function.

It may also be possible to build this with the base PDF.JS in such a way as to pull down a CDN version of that library. This avoids hosting it with Aperta and may be kinder to those far away from the servers (provided they have a CDN nearby).

Appendix

No browsers were harmed in the collection of screenshots. All computes generating the diagrams are not certified organic. Do not iron printed materials. The following sections explain all diagramming tools and techniques used in this document.

ERD Generation

Listing papererd shows the code to generate the ERD diagram. Note that this requires the installation of the rails-erd gem.

rake erd only="Journal,Paper,Attachment,ManuscriptAttachment,SourcefileAttachment" filename="pdfmanuscripterd" filetype="pdf" title=false inheritance=true attributes=false

References

Attachments:

pdfmanuscripterd.pdf (application/pdf)

pdfviewer.png (image/png)

uploadpdf.png (image/png)

upload.png (image/png)

uploaddocx.png (image/png)

uploadfail.png (image/png)

downloads.png (image/png)

pdfdiff.png (image/png)

Clone this wiki locally