Edit Workflow - Processing steps
Processing steps can transform or extract data from scanned documents. Each processing step can be enabled and configured independently.
You may enable or disable any of the processing steps below in any workflow. Note that the processing steps available in the YSoft SafeQ management interface depend on your YSoft SafeQ Managed Workflow license.
Processing steps can either change the resulting document (e.g., process images into a searchable PDF) or extract information from scanned documents into variables.
Barcode
Enable this processing step to find barcodes of the selected barcode type.
Workflow variable – The barcode step reads the barcode value and saves it into the %barcode% variable. If multiple barcodes of the selected type are found, the value of the leftmost and uppermost barcode is used. If no barcode is found, the %barcode% variable is set to an empty string.
Special characters are removed using the configuration option parameterProhibitedCharacters to prevent unsafe user input.
Example:
%barcode% = ABCD1234
All workflow variables – Variables %barcode<X>% contain each barcode value with X starting from 1. If multiple barcodes are found they are sorted from upper-left to bottom-right.
Special characters are removed using the configuration option parameterProhibitedCharacters to prevent unsafe user input.
Example:
%barcode1% = ABCD1234
%barcode2% = C--TEMP
External Processing Step variables – Variables %barcodeInsecure<X>% contain each unmodified barcode value with X starting from 1. If multiple barcodes are found they are sorted from upper-left to bottom-right.
The value may contain unsafe user input and should be used in an External Processing Step.
Example:
%barcodeInsecure1% = ABCD1234
%barcodeInsecure2% = C:\TEMP
JSON variables – Variables %barcode_<JSON key path>% contain barcode JSON values. The variable name is built from the key path of the value, with the dots of the dot notation replaced by underscore characters.
Special characters are removed using the configuration option parameterProhibitedCharacters to prevent unsafe user input.
Example:
%barcode_store_book_0_title% = My Book
%barcode_store_book_0_price% = $10.00
Barcode type – Determines which barcode type should be identified and extracted from a document. Valid types are:
UPC A
UPC E
EAN 8
EAN 13
Code 39
Code 93
Code 128
Codabar
ITF
RSS 14
RSS Expanded
Any 1D barcode – the barcode in a document can be any of the supported (above-listed) 1D barcode types
Aztec
Data Matrix
Maxicode
PDF 417
QR Code
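The naming scheme of the JSON variables described above can be illustrated with a short Python sketch. It only demonstrates how a dot-notation key path turns into an underscore-separated variable name. It is not the product's implementation: the function name flatten_barcode_json and the sample payload are hypothetical, and the removal of prohibited characters is not modelled.

```python
import json


def flatten_barcode_json(payload, prefix="barcode"):
    """Map each leaf value of a JSON document to a variable name built from its key path."""
    variables = {}

    def walk(node, path):
        if isinstance(node, dict):
            for key, value in node.items():
                walk(value, path + [str(key)])
        elif isinstance(node, list):
            # Array elements are addressed by their zero-based index.
            for index, value in enumerate(node):
                walk(value, path + [str(index)])
        else:
            # Leaf value: join the path segments with underscores instead of dots.
            variables["%" + "_".join([prefix] + path) + "%"] = node

    walk(json.loads(payload), [])
    return variables


# Hypothetical barcode content, chosen to match the example variables above.
sample = '{"store": {"book": [{"title": "My Book", "price": "$10.00"}]}}'
flat = flatten_barcode_json(sample)
for name, value in flat.items():
    print(f"{name} = {value}")
```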
Scan Job Separation ADVANCED WORKFLOWS
Enable this processing step to split a batch scan into multiple documents. Scan jobs may be separated as follows:
Upon detection of a barcode: A new document starts when a barcode of the specified type is detected in the scan
Upon detection of a Y Soft standard separation sheet: A new document starts when the standard separation sheet is detected in the scan. The standard separation sheet can be downloaded in the Scan Job Separation processing step section of the Edit Workflow page
Page count: A new document starts after the specified number of pages
Item | Options | Description |
Barcode | Barcode type | Starts a new document (and finishes the previous one) each time a barcode of the selected type is encountered. The %separationBarcode% variable contains the value of the corresponding barcode. Pages with these barcodes can be included in or excluded from scan jobs. |
Standard Separation sheet | Link to download the standard separation sheet | Starts a new document (and finishes the previous one) each time the standard separation sheet is encountered. Separation sheets can be included in or excluded from scan jobs. |
Page Count | Every # pages | Starts a new document (and finishes the previous one) after every # pages |
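The separation rules in the table can be sketched in a few lines of Python. This is a simplified model, not the product's implementation: pages are represented as plain list items, the function names are hypothetical, and it is an assumption that an included separator page becomes the first page of the new document.

```python
def split_by_page_count(pages, every):
    """Start a new document after every `every` pages."""
    if every < 1:
        raise ValueError("every must be at least 1")
    return [pages[i:i + every] for i in range(0, len(pages), every)]


def split_on_separator(pages, is_separator, include_separator=False):
    """Start a new document each time a separator page (barcode or sheet) is found."""
    documents, current = [], []
    for page in pages:
        if is_separator(page):
            # Finish the previous document, if it has any pages.
            if current:
                documents.append(current)
            # Assumption: an included separator opens the new document.
            current = [page] if include_separator else []
        else:
            current.append(page)
    if current:
        documents.append(current)
    return documents


batch = ["SEP", "a", "b", "SEP", "c"]
print(split_by_page_count(batch, 2))
print(split_on_separator(batch, lambda p: p == "SEP"))
```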
Highlighter Extraction ADVANCED WORKFLOWS
Enable the highlighter extraction step to identify all text in a scanned document highlighted with a given color. The highlighter extraction step concatenates all highlighted text into a single-line text string (individual words are separated by the white space character) and saves the text string into the %highlightedText% variable.
Remarks:
The highlighter extraction step works on black and white documents only
Highlighter extraction gives the best results at 300 DPI. Higher DPI settings are not recommended because they have a negative impact on the performance of the highlighter extraction
A word should be highlighted precisely to allow the ABBYY OCR engine to detect the word correctly. The word must be fully highlighted and not include highlighted parts of another word as the ABBYY OCR engine might recognize it as additional characters
Highlighter color – Determines the highlighter color that should be detected. Text highlighted with this color will be extracted. Available colors are:
Green
Red
Language – The language of the extracted text (improves the accuracy of the extracted text).
Search for highlighter
The page range of the scanned document that is searched for highlighted text.
In the entire document
Specific range
Use page numbers (including blank pages)
From – The lower limit of page range. If empty, the range starts on the first scanned page
To – The upper limit of page range. If empty, the range ends on the last scanned page
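The way empty From and To fields fall back to the first and last scanned page can be expressed as a small Python sketch. The function name resolve_page_range is hypothetical, page numbers are assumed to be 1-based, and clamping out-of-range limits to the document is an assumption rather than documented behavior.

```python
def resolve_page_range(total_pages, first=None, last=None):
    """Return the 1-based page numbers searched for highlighted text."""
    # An empty From starts at the first scanned page.
    start = 1 if first is None else max(1, first)
    # An empty To ends at the last scanned page.
    end = total_pages if last is None else min(total_pages, last)
    return list(range(start, end + 1))


print(resolve_page_range(5))           # entire document
print(resolve_page_range(5, first=2))  # From set, To empty
print(resolve_page_range(5, last=3))   # From empty, To set
```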
Highlighter Redaction ADVANCED WORKFLOWS
Use the highlighter redaction step to redact (overlay with black) areas in a scanned document marked with the given color.
Remarks:
The highlighter redaction step works on black and white documents only
Redaction on highly compressed files is not recommended. Ensure that scan quality settings are adjusted accordingly prior to redacting
Best results are achieved using the lowest compression settings on the MFD (even though the processing step filters compression-related image noise as much as possible)
Highlighter color – Determines the highlighter color that should be detected. Areas highlighted with this color will be redacted. Available colors are:
Green
Red
Search for highlighter
The page range of the scanned document that is searched for highlighted areas.
In the entire document
Specific range
Use page numbers (including blank pages)
From – The lower limit of the page range. If empty, the range starts on the first scanned page
To – The upper limit of the page range. If empty, the range ends on the last scanned page
OCR ADVANCED WORKFLOWS OCR
Use the OCR step to analyze text documents, recognize and extract document text and formatting, and save the result to a file of a selected output format. The OCR step recognizes only the common typographic type of text.
Language – Document languages. OCR processing requires the language setting to correctly recognize the document's text. It is possible to select multiple languages.
Please refer to OCR Processing Step – Supported Languages for document languages recognized by the OCR processing step.
The accuracy of OCR depends on many factors outside YSoft SafeQ's control, such as the quality of the printed document, scanner quality, and text size. It is therefore never 100%, even with a correctly configured document language. For the best OCR results, use the following scan resolution guidelines:
For regular texts (font size 8-10 points), it is recommended to use 300 dpi resolution for OCR (in most cases, represented by the "Fine" setting) as the OCR technologies are tuned for that resolution
For smaller text (font size 8 points or smaller), it is recommended to use 400-600 dpi resolution
Low image quality (i.e. resolution/DPI) may lead to quality and speed degradation as uncertainty in the character picture produces more recognition variants to process
Remove blank pages – If checked, blank pages are removed from the scanned document. The blank page detection algorithm is heuristic, so its results are not always accurate. It uses predefined thresholds: a page that exceeds any of the following limits is not considered blank.
At most five letters belonging to the recognition languages
At most 1% of the page covered by black areas
At most 20 objects found on the page
If a page contains a barcode, it is not considered blank
Detect page orientation – If checked, detects the page orientation of each page in the document and rotates pages that are upside-down to correct the orientation
Split dual pages – If checked, splits dual pages (e.g., when scanning double pages of a book) into two pages in the resulting document
Despeckle – If checked, cleans the noise in the scanned document. This may impact the speed and performance of the OCR engine
The speed of OCR can be increased by adjusting the ocrProcessesPerJob and ocrPoolSize properties in the expert configuration.
The type of processed file that is being sent from the MFD for OCR processing can be modified by the ocrInputFileType property in the expert configuration.
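The thresholds listed under the Remove blank pages option can be read as a simple decision rule, sketched below in Python. The PageStats structure and its field names are hypothetical. How the OCR engine actually measures letters, black areas, and objects is not covered here.

```python
from dataclasses import dataclass

# Thresholds taken from the Remove blank pages description.
MAX_LETTERS = 5
MAX_BLACK_PERCENT = 1.0
MAX_OBJECTS = 20


@dataclass
class PageStats:
    """Hypothetical per-page measurements produced by page analysis."""
    letters: int
    black_percent: float
    objects: int
    has_barcode: bool = False


def is_blank(page):
    """A page is blank only if it has no barcode and exceeds none of the thresholds."""
    if page.has_barcode:
        return False
    return (
        page.letters <= MAX_LETTERS
        and page.black_percent <= MAX_BLACK_PERCENT
        and page.objects <= MAX_OBJECTS
    )


print(is_blank(PageStats(letters=0, black_percent=0.2, objects=3)))
print(is_blank(PageStats(letters=6, black_percent=0.2, objects=3)))
```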
External Processing Step
The External Processing Step is a way to extend the built-in capabilities of Scan Workflows in YSoft SafeQ. Use it to run an external command before delivering scanned documents. The command will be executed by the WPS service on the server that processes the scan job, under the identity of the WPS service account, and it will be able to modify the scan job before it is delivered by WPS.
Some examples of how the External Processing Step can be used include:
Archiving all scan jobs
Delivering scan jobs with an accompanying metadata file in a custom format
Rejecting scans not fulfilling some condition, e.g. where a particular text or barcode pattern is not found
Delivering multiple image files in a single ZIP archive
The command will be executed as the last step before delivering the documents to a destination. Two text files in the UTF-8 encoding will be created before the command is executed:
A metadata file that lists all available metadata and user variables. Each line contains a key=value pair. Use the %metadata% variable to pass the filename of the metadata file as an argument.
A file-list file that lists all image files that comprise the scan job, with each file name on a separate line. Use the %fileList% variable to pass the filename of the file as an argument (or read the fileList variable from the metadata file).
The command may modify one or both of the files in place in order to modify the outcome of the workflow.
If the command fails to execute or returns a non-zero exit code, the workflow will be aborted.
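As an illustration of the "rejecting scans" use case, the following Python sketch shows what an external command might look like. It is a minimal example under several assumptions, not a reference implementation: the script name check_scan.py is hypothetical, the command is assumed to be configured so that %metadata% and %fileList% are passed as the first and second argument, and the barcode value is assumed to appear in the metadata file under the key barcode.

```python
import sys
from pathlib import Path


def read_metadata(path):
    """Parse the key=value lines of the metadata file into a dict."""
    metadata = {}
    # utf-8-sig reads UTF-8 with or without a byte order mark.
    for line in Path(path).read_text(encoding="utf-8-sig").splitlines():
        key, sep, value = line.partition("=")
        if sep:  # skip lines that are not key=value pairs
            metadata[key] = value
    return metadata


def read_file_list(path):
    """Return the image file names of the scan job, one per non-empty line."""
    text = Path(path).read_text(encoding="utf-8-sig")
    return [line for line in text.splitlines() if line.strip()]


def evaluate(metadata_path, file_list_path):
    """Return the exit code: 0 to continue the workflow, non-zero to abort it."""
    metadata = read_metadata(metadata_path)
    files = read_file_list(file_list_path)
    # Hypothetical rule: reject the scan job when no barcode value is present.
    # The key name "barcode" is an assumption about the metadata file.
    if not metadata.get("barcode"):
        print("no barcode found, rejecting scan job", file=sys.stderr)
        return 1
    print(f"accepted {len(files)} file(s) with barcode {metadata['barcode']}")
    return 0


# Run only when both file names were passed on the command line.
if __name__ == "__main__" and len(sys.argv) == 3:
    sys.exit(evaluate(sys.argv[1], sys.argv[2]))
```

With these assumptions, the command configured in the External Processing Step could look like python check_scan.py %metadata% %fileList%. Because a non-zero exit code aborts the workflow, returning 1 is enough to reject the scan job.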