Edit Workflow - Processing steps

Processing steps can transform or extract data from scanned documents. Each processing step can be enabled and configured independently of the others.

You may enable or disable any of the processing steps below in any workflow. Note that the processing steps available in the YSoft SafeQ management interface depend on your YSoft SafeQ Managed Workflow license.

Processing steps can either change the resulting document (e.g., process images into a searchable PDF) or extract information from scanned documents into variables.

Barcode

Enable this processing step to find barcodes of the selected barcode type.

  • Workflow variable – The barcode step reads the barcode value and saves it into the %barcode% variable. If multiple barcodes of the selected type are found, the value of the leftmost and uppermost barcode is used. If no barcode is found, the %barcode% variable is set to an empty string.

    • Special characters are removed using the configuration option parameterProhibitedCharacters to prevent unsafe user input.

    • Example:

      • %barcode% = ABCD1234

  • All workflow variables – Variables %barcode<X>% contain each barcode value, with X starting from 1. If multiple barcodes are found, they are sorted from upper-left to bottom-right.

    • Special characters are removed using the configuration option parameterProhibitedCharacters to prevent unsafe user input.

    • Example:

      • %barcode1% = ABCD1234

      • %barcode2% = C--TEMP

  • External Processing Step variables – Variables %barcodeInsecure<X>% contain each unmodified barcode value, with X starting from 1. If multiple barcodes are found, they are sorted from upper-left to bottom-right.

    • The value may contain unsafe user input and should be used in an External Processing Step.

    • Example:

      • %barcodeInsecure1% = ABCD1234

      • %barcodeInsecure2% = C:\TEMP

  • JSON variables – Variables %barcode_<JSON key path>% contain the values of a JSON barcode payload. The key path is the dot-notation path with each dot replaced by an underscore character.

    • Special characters are removed using the configuration option parameterProhibitedCharacters to prevent unsafe user input.

    • Example:

      • %barcode_store_book_0_title% = My Book

      • %barcode_store_book_0_price% = $10.00
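
The relationship between the sanitized variables and the JSON variables above can be sketched in Python. This is only an illustration, not the YSoft SafeQ implementation: the function names and the prohibited character set are hypothetical, and the replacement by a hyphen is inferred from the C:\TEMP and C--TEMP examples. The real character set comes from the parameterProhibitedCharacters option.

```python
import json

# Hypothetical stand-in for the parameterProhibitedCharacters option.
PROHIBITED_CHARACTERS = ":\\"


def sanitize(value: str, prohibited: str = PROHIBITED_CHARACTERS) -> str:
    """Replace each prohibited character with a hyphen (assumed behavior)."""
    return "".join("-" if ch in prohibited else ch for ch in value)


def flatten_barcode_json(payload: str, prefix: str = "barcode") -> dict:
    """Map every leaf of a JSON payload to a %prefix_key_path% variable name."""
    variables = {}

    def walk(node, path):
        if isinstance(node, dict):
            for key, child in node.items():
                walk(child, path + [str(key)])
        elif isinstance(node, list):
            for index, child in enumerate(node):
                walk(child, path + [str(index)])
        else:
            # Leaf value: join the path with underscores instead of dots.
            variables["%" + "_".join(path) + "%"] = sanitize(str(node))

    walk(json.loads(payload), [prefix])
    return variables
```

For a barcode that encodes {"store": {"book": [{"title": "My Book", "price": "$10.00"}]}}, this sketch produces the two variable names shown in the JSON example above.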


images/download/attachments/246350603/ProcessingStepBarcode.PNG

  1. Barcode type – Determines which barcode type should be identified and extracted from a document. Valid types are:

    • UPC A

    • UPC E

    • EAN 8

    • EAN 13

    • Code 39

    • Code 93

    • Code 128

    • Codabar

    • ITF

    • RSS 14

    • RSS Expanded

    • Any 1D barcode – the barcode in a document can be any of the supported (above-listed) 1D barcode types

    • Aztec

    • Data Matrix

    • Maxicode

    • PDF 417

    • QR Code

Scan Job Separation ADVANCED WORKFLOWS

Enable these processing steps to split a batch scan into multiple documents. Scan jobs may be separated as follows:

  • Upon detection of a barcode: A new document starts when a barcode of the specified type is detected in the scan

  • Upon detection of a Y Soft standard separation sheet: A new document starts when the standard separation sheet is detected in the scan. The standard separation sheet can be downloaded in the Scan Job Separation processing step section of the Edit Workflow page

  • Page count: A new document starts after the specified number of pages


The available items, their options, and their behavior are as follows:

  • Barcode

    • Options – Barcode type: UPC A, UPC E, EAN 8, EAN 13, Code 39, Code 93, Code 128, Codabar, ITF, RSS 14, RSS Expanded, Any 1D barcode, Aztec, Data Matrix, Maxicode, PDF 417, QR Code

    • Description – Starts a new document (and finishes the previous one) each time the selected type of barcode is encountered. The %separationBarcode% variable contains the value of the corresponding barcode. Pages with these barcodes can be included in or excluded from scan jobs.

  • Standard Separation sheet

    • Options – Link to download the standard separation sheet

    • Description – Starts a new document (and finishes the previous one) each time the standard separation sheet is encountered. Separation sheets can be included in or excluded from scan jobs.

  • Page Count

    • Options – Every # page

    • Description – Starts a new document (and finishes the previous one) every # pages.
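
The separation rules described above can be illustrated with a short Python sketch. It only models the splitting logic on a list of page objects. The function names, the is_separator callback, and the keep_separator flag are hypothetical and do not correspond to YSoft SafeQ settings or APIs.

```python
from typing import Callable, List, Sequence, TypeVar

Page = TypeVar("Page")


def split_by_page_count(pages: Sequence[Page], every: int) -> List[List[Page]]:
    """Start a new document after every `every` pages."""
    if every < 1:
        raise ValueError("every must be at least 1")
    return [list(pages[i:i + every]) for i in range(0, len(pages), every)]


def split_on_separator(
    pages: Sequence[Page],
    is_separator: Callable[[Page], bool],
    keep_separator: bool = False,
) -> List[List[Page]]:
    """Start a new document at each separator page (barcode or separation sheet)."""
    documents: List[List[Page]] = []
    current: List[Page] = []
    for page in pages:
        if is_separator(page):
            # The separator finishes the previous document, if there is one.
            if current:
                documents.append(current)
            # Optionally keep the separator as the first page of the new document.
            current = [page] if keep_separator else []
        else:
            current.append(page)
    if current:
        documents.append(current)
    return documents
```

The keep_separator flag mirrors the option to include separator pages in, or exclude them from, the resulting scan jobs.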

Highlighter Extraction ADVANCED WORKFLOWS

Enable the highlighter extraction step to identify all text in a scanned document highlighted with a given color. The highlighter extraction step concatenates all highlighted text into a single-line text string (individual words are separated by a whitespace character) and saves the text string into the %highlightedText% variable.

Remarks:

  • The highlighter extraction step works on black and white documents only

  • Highlighter extraction gives the best results at 300 DPI. Higher DPI settings are not recommended because they degrade the performance of the highlighter extraction

  • Highlight each word precisely so that the ABBYY OCR engine can detect it correctly. The word must be fully highlighted, and the highlight must not cover parts of another word, because the ABBYY OCR engine might recognize those parts as additional characters

images/download/attachments/246350603/highlightExtractionWithLanguage.png

  1. Highlighter color – Determines the highlighter color that should be detected. Text highlighted with this color will be extracted. Available colors are:

    • Green

    • Red

  2. Language – The language of the extracted text (improves the accuracy of the extracted text).

  3. Search for highlighter

    The page range of the scanned document in which to search for highlighted text.

    • In the entire document

    • Specific range

      • Use page numbers (including blank pages)


      images/download/attachments/246350603/ProcessingStepHighlightExtractionRange.PNG

      • From – The lower limit of the page range. If empty, the range starts on the first scanned page

      • To – The upper limit of the page range. If empty, the range ends on the last scanned page
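
The From and To defaults and the single-line result described above can be pictured with the following Python sketch. The function names are hypothetical, and clamping out-of-range values to the document bounds is an assumption rather than documented behavior.

```python
from typing import Iterable, Optional


def resolve_page_range(
    total_pages: int,
    start: Optional[int] = None,
    end: Optional[int] = None,
) -> range:
    """Return the 1-based page numbers to search for highlighted text."""
    # An empty From starts on the first page; an empty To ends on the last page.
    first = 1 if start is None else max(1, start)
    last = total_pages if end is None else min(total_pages, end)
    return range(first, last + 1)


def join_highlighted_words(words: Iterable[str]) -> str:
    """Concatenate highlighted words into one line, as stored in %highlightedText%."""
    return " ".join(word for word in words if word)
```

Blank pages count toward the page numbers, so the range is resolved against the total number of scanned pages.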

Highlighter Redaction ADVANCED WORKFLOWS

Use the highlighter redaction step to redact (overlay with black) areas in a scanned document marked with the given color.

images/download/attachments/246350603/ScanWorkflows-highlightRedaction-callouts.png

Remarks:

  • The highlighter redaction step works on black and white documents only

  • Redaction on highly compressed files is not recommended. Ensure that scan quality settings are adjusted accordingly prior to redacting

  • Best results are achieved using the lowest compression settings on the MFD (even though the processing step filters compression-related image noise as much as possible)

  1. Highlighter color – Determines the highlighter color that should be detected. Areas highlighted with this color will be redacted. Available colors are:

    • Green

    • Red

  2. Search for highlighter

    The page range of the scanned document in which to search for highlighted areas.

    • In the entire document

    • Specific range

      • Use page numbers (including blank pages)

        images/download/attachments/246350603/ProcessingStepHighlightExtractionRange2.PNG

      • From – The lower limit of the page range. If empty, the range starts on the first scanned page

      • To – The upper limit of the page range. If empty, the range ends on the last scanned page

OCR ADVANCED WORKFLOWS OCR

Use the OCR step to analyze text documents, recognize and extract document text and formatting, and save the result to a file in the selected output format. The OCR step recognizes only text printed in common typefaces.

images/download/attachments/246350603/ProcessingStepOcr.PNG

  1. Language – Document languages. OCR processing requires the language setting to correctly recognize the document's text. It is possible to select multiple languages.

    Please refer to OCR Processing Step – Supported Languages for document languages recognized by the OCR processing step.

    OCR accuracy depends on many factors outside of Y Soft SafeQ's control, such as the quality of the printed document, the scanner quality, and the text size. It is therefore never 100%, even with a correctly configured document language. For the best OCR results, use the following scan resolution guidelines:

    • For regular texts (font size 8-10 points), it is recommended to use 300 dpi resolution for OCR (in most cases, represented by the "Fine" setting) as the OCR technologies are tuned for that resolution

    • For smaller font text sizes (8 points or smaller), it is recommended to use 400-600 dpi resolution

    • Low image quality (for example, low resolution) may reduce both recognition quality and speed, because uncertainty in the character image produces more recognition variants to process


  2. Remove blank pages – If checked, blank pages are removed from the scanned document. Blank page detection is heuristic, so its results are not always accurate. Detection uses the following predefined criteria. A page that exceeds any of the thresholds is not considered blank.

    • The maximum number of letters belonging to the recognition languages is five

    • The maximum percentage of black areas on a page is 1%

    • The maximum number of objects found on a page is 20

    • If a page contains a barcode, it is not considered blank

  3. Detect page orientation – If checked, detects the page orientation of each page in the document and rotates pages that are upside-down to correct the orientation

  4. Split dual pages – If checked, splits dual pages (e.g., when scanning double pages of a book) into two pages in the resulting document

  5. Despeckle – If checked, removes noise (speckles) from the scanned document. This may affect the speed and performance of the OCR engine
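
The blank page criteria listed under Remove blank pages can be read as a simple check. The following Python sketch is only an illustration under stated assumptions: the PageStats structure and its field names are hypothetical, and the real detection is performed by the OCR engine.

```python
from dataclasses import dataclass

# Thresholds taken from the criteria listed above.
MAX_LETTERS = 5
MAX_BLACK_PERCENT = 1.0
MAX_OBJECTS = 20


@dataclass
class PageStats:
    """Hypothetical per-page measurements used for blank page detection."""
    letters: int          # letters belonging to the recognition languages
    black_percent: float  # percentage of the page covered by black areas
    objects: int          # number of objects found on the page
    has_barcode: bool     # whether a barcode was detected on the page


def is_blank(stats: PageStats) -> bool:
    """A page is blank only if it has no barcode and exceeds no threshold."""
    if stats.has_barcode:
        return False
    return (
        stats.letters <= MAX_LETTERS
        and stats.black_percent <= MAX_BLACK_PERCENT
        and stats.objects <= MAX_OBJECTS
    )
```

Because the check is heuristic, a page with a few stray marks may still be treated as blank, and a nearly empty page with a barcode is always kept.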

The speed of OCR can be increased by adjusting the ocrProcessesPerJob and ocrPoolSize properties in the expert configuration.

The type of file sent from the MFD for OCR processing can be changed with the ocrInputFileType property in the expert configuration.

External Processing Step

The External Processing Step is a way to extend the built-in capabilities of Scan Workflows in YSoft SafeQ. Use it to run an external command before delivering scanned documents. The command will be executed by the WPS service on the server that processes the scan job, under the identity of the WPS service account, and it will be able to modify the scan job before it is delivered by WPS.

Some examples of how the External Processing Step can be used include:

  • Archiving all scan jobs

  • Delivering scan jobs with an accompanying metadata file in a custom format

  • Rejecting scans not fulfilling some condition, e.g. where a particular text or barcode pattern is not found

  • Delivering multiple image files in a single ZIP archive

images/download/attachments/246350603/image2020-3-3_21-6-38.png

The command will be executed as the last step before delivering the documents to a destination. Two text files in the UTF-8 encoding will be created before the command is executed:

  • A metadata file that lists all available metadata and user variables. Each line contains a key=value pair. Use the %metadata% variable to pass the filename of the metadata file as an argument.

  • A file-list file that lists all image files that comprise the scan job, with each file name on a separate line. Use the %fileList% variable to pass the filename of the file-list file as an argument (or read the fileList variable from the metadata file).

The command may modify one or both of the files in place in order to modify the outcome of the workflow.

If the command fails to execute or returns a non-zero exit code, the workflow will be aborted.
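
As a minimal sketch of such a command, the following Python code reads both files and rejects the scan job when no barcode was found. It relies only on what is described above (UTF-8 files, key=value lines, one file name per line, non-zero exit code aborts the workflow). The function names, the barcode metadata key, and the way the script is invoked are assumptions for illustration.

```python
from pathlib import Path
from typing import Dict, List


def read_metadata(path: str) -> Dict[str, str]:
    """Parse the metadata file: one key=value pair per line, UTF-8 encoded."""
    metadata: Dict[str, str] = {}
    # "utf-8-sig" also accepts a file that starts with a byte order mark.
    for line in Path(path).read_text(encoding="utf-8-sig").splitlines():
        if "=" not in line:
            continue  # skip lines that are not key=value pairs
        key, value = line.split("=", 1)  # the value itself may contain "="
        metadata[key] = value
    return metadata


def read_file_list(path: str) -> List[str]:
    """Parse the file-list file: one image file name per line."""
    lines = Path(path).read_text(encoding="utf-8-sig").splitlines()
    return [line for line in lines if line.strip()]


def main(argv: List[str]) -> int:
    """Return 0 to continue the workflow, non-zero to abort it."""
    if len(argv) != 3:
        return 2  # expected: <script> <metadata file> <file-list file>
    metadata = read_metadata(argv[1])
    files = read_file_list(argv[2])
    # Assumed key name: reject the scan when no barcode value is present.
    if not metadata.get("barcode", "").strip():
        return 1
    if not files:
        return 1
    return 0
```

To use the sketch as a command, the script would end with `import sys` and `raise SystemExit(main(sys.argv))`, and the step would pass %metadata% and %fileList% as the two arguments. The exact command line depends on how Python is installed for the WPS service account.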