PDF documents have the following disadvantages when building a project in Project Builder:
- PDF documents cannot be merged or split, and their pages cannot be deleted.
- PDF documents make testing classification, barcode, zone and table locators very slow.
Kofax Transformation needs to render each PDF page to pixels before pixel-based locators (such as the automatic table locator, which looks for horizontal and vertical lines) and layout classification can see the non-text parts of the document. This rendering can take about 1 second per page.
Here is how to make your locators, benchmarks and classification run 10 times faster or more in KT Project Builder.
This script converts the PDFs into TIFF images and gives these benefits:
- The PDF text layer is preserved.
- Documents can be split and merged, and pages can be deleted.
- All locators, layout classification and all Benchmark Tools now run at full speed.
This script makes Project Builder run faster, but it does not make your project run faster at production time. It provides no value for production systems.
- Add the script below to the project-level script by right-clicking Project Class in the Project Tree and selecting Show Script.
- Open your Document Set with PDF Documents.
- Select the Documents that you want to convert from PDF to TIFF.
- Select Classify from the Process menu (or press F5).
- Save the Documents by clicking Save Selected Documents (the single floppy disk icon, not the double!).
- Reload your Document Set.
- You will now see that the PDF icon is gone and that the XDocuments are TIFF-based.
- In Windows Explorer you will see the original PDF, the XDocument and each page as a TIFF image.
- Remove the script from your project when you no longer need it. It can run in production, but only if you want these TIFFs at runtime for some reason.
- To undo everything and go back to using the PDFs directly, load the document set with Source files set to PDF only.
The script does the following:
- Create a single-page TIFF image for each page. (Kofax Capture, KTM and KTA all prefer to work with single-page TIFFs. Export connectors can merge them into multi-page TIFFs if needed.)
- Optionally perform basic VRS on the image.
- Replace each page in the XDoc with the TIFF pages.
- Preserve the PDF Text layer to avoid performing OCR.
Private Sub Document_AfterClassifyXDoc(ByVal pXDoc As CASCADELib.CscXDocument)
   XDocument_ReplacePDFwithTIFF(pXDoc)
End Sub

Public Sub XDocument_ReplacePDFwithTIFF(pXDoc As CscXDocument)
   'Project Builder is very slow with PDFs. Use this to keep the PDF text layer but use TIFF images for speed while testing.
   Dim P As Long, Image As CscImage, Filename As String
   For P = 0 To pXDoc.CDoc.Pages.Count - 1
      'Append .BinarizeWithVRS() to GetImage for basic VRS. The Table Locator NEEDS black & white images for vertical and horizontal line detection.
      Set Image = pXDoc.CDoc.Pages(P).GetImage
      'One TIFF per page, named after the XDocument with a 3-digit, 1-based page number.
      Filename = Replace(pXDoc.FileName, ".xdc", "_") & Format(P + 1, "000") & ".tif"
      Image.Save(Filename, CscImgFileFormatTIFFFaxG4)
      'Point the page at the new TIFF; the text layer already in the XDocument is left untouched.
      pXDoc.ReplacePageSourceFile(Filename, "TIFF", P, 0)
   Next
   pXDoc.Save
End Sub
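
To check which TIFF files the script will write next to each XDocument, the naming expression used in the loop can be pulled out into a small helper. This is only a sketch: the function name PageTiffFileName and the sample path in the usage note are made up for illustration and are not part of the original script. It assumes, as the script does, that the XDocument file name ends in a lowercase ".xdc".

```vb
'Hypothetical helper (not part of the original script).
'Returns the TIFF path that the loop above builds for a zero-based page index.
Public Function PageTiffFileName(ByVal XDocFileName As String, ByVal PageIndex As Long) As String
   'Replace the ".xdc" extension with "_" and append a 3-digit, 1-based page number.
   PageTiffFileName = Replace(XDocFileName, ".xdc", "_") & Format(PageIndex + 1, "000") & ".tif"
End Function
```

For example, with a hypothetical XDocument at C:\Samples\Invoice1.xdc, PageTiffFileName("C:\Samples\Invoice1.xdc", 0) gives C:\Samples\Invoice1_001.tif, and page index 1 gives C:\Samples\Invoice1_002.tif. These are the per-page files that appear beside the PDF and the XDocument in Windows Explorer.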