Centralized YSoft SafeQ Workflow Processing System deployment

Advanced Workflows, especially those with OCR, can be very resource intensive. Consolidating scan processing on a central server or server farm has benefits.

Remember that YSoft SafeQ is licensed by the number of devices. The number of servers (Scan Servers or any other) has no influence on the YSoft SafeQ license cost.

High latency between the device and the Scan Server negatively impacts the user experience on the terminal: each action (tap) has a high response time.

Server Roles

Management Server

Stores and distributes system configuration including devices and terminals. Replicates users from external sources. Provides a management interface for administrators.

Site Server

Serves as a caching Management Server proxy for other YSoft SafeQ server roles. Controls devices, terminals and YSoft SafeQ Client applications in a part of the system. Provides an end user interface (e.g. allows users to print via web upload). Implements a server-based spooler (as opposed to a client-based spooler implemented by the YSoft SafeQ Client).

Scan Server

Enables document capture on devices equipped with a YSoft SafeQ terminal, processes scanned documents (including OCR) and delivers processed documents to configured destinations. It contains the YSoft SafeQ Workflow Processing service only.

Scenario: Scanning to the Cloud

This scenario can optimize resource usage in distributed environments, especially when utilizing centralized (commercial) cloud infrastructure:

  • The customer does not need to duplicate powerful hardware at branch offices. The resource-intensive Scan Servers are deployed to a data center that already has powerful hardware.

  • While the Management Server requires fixed allocation of resources to guarantee timely responses to real-time requests and does not benefit much from extra resources, Scan Servers can benefit from dynamic allocation of extra resources for faster image processing.

All scan jobs are transferred to the data center, likely over WAN.

No third-party network load balancer is required. Each branch office uses one Scan Server. Multiple branch offices can share a common Scan Server. See the Scanning Farm scenario below when more Scan Servers are required to process scans from a single branch office.

The Site Server in the data center is used for Scan Server configuration and management.

images/download/attachments/160480846/Centralized_scanning_1.png


Scenario: Scanning Farm

When you expect to run scanning with OCR on thousands of pages, a scanning farm can be created. Scan Servers can benefit from dynamic allocation of resources, which ensures that full performance is available when needed. This scenario requires a third-party load balancer.

All scan jobs are transferred to the data center, likely over WAN.

The Site Server in the data center is used for Scan Server configuration and management.

images/download/attachments/160480846/Centralized_scanning_2.png


Component Communication Overview

The interface between the Site Server and the Scan Server is HTTP/S and is designed to be stateless. Therefore, any Scan Server can serve any request.

Each Scan Server (or Scan Server farm) requires a Document Store. The WebDAV or shared folder (SMB) protocols are used for this. See Configuring Document Store for details about how the communication flows.

How to Configure

For all scenarios, it is important to configure:

  1. Set the location that documents are scanned into. See Configuring Document Store.

  2. Point the Scan Server to a Site Server (in all scenarios, one Site Server in the data center is dedicated to Scan Servers). Edit the configuration file <SafeQ6>\WPS\WpsService.exe.config:

    <add key="spoolerControllerEndpoint" value="tcp://SITE SERVER FQDN OR IP:5555" />
  3. Point the Terminal Server to the Scan Server. Edit the configuration file <SAFEQ6>\SPOC\terminalserver\terminalserver.exe.config:

    <add key="wpsBaseAddress" value="http://SCAN SERVER FQDN OR IP:5600/" />
  4. If multiple YSoft SafeQ Workflow Processing Services (Scan Servers) are connected to a single YSoft SafeQ Spooler Controller (Site Server), you must manually set the YSoft SafeQ Workflow Processing Service instance name to a unique value in <SafeQ6>\WPS\WpsService.exe.config:

    <add key="name" value="UNIQUE INSTANCE NAME" />
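
The key/value edits in the steps above can also be scripted. The following is a minimal sketch, not an official tool: it assumes the keys sit in an <appSettings> element directly under the root of the configuration file (the usual .NET layout, but not confirmed here), and the function name, sample XML, and host names are hypothetical. Comments in the file are not preserved by this approach, so work on a backup copy.

```python
import xml.etree.ElementTree as ET


def set_app_setting(config_xml: str, key: str, value: str) -> str:
    """Return config_xml with <add key=... value=...> set under <appSettings>."""
    root = ET.fromstring(config_xml)
    # Assumption: settings live in <appSettings> directly under the root element.
    settings = root.find("appSettings")
    if settings is None:
        settings = ET.SubElement(root, "appSettings")
    for entry in settings.findall("add"):
        if entry.get("key") == key:
            entry.set("value", value)  # key already present: overwrite its value
            break
    else:
        # Key not present yet: append a new <add> element.
        ET.SubElement(settings, "add", {"key": key, "value": value})
    return ET.tostring(root, encoding="unicode")


# Hypothetical stand-in for the contents of WpsService.exe.config.
SAMPLE = """<configuration>
  <appSettings>
    <add key="name" value="default" />
  </appSettings>
</configuration>"""

# Give this Scan Server a unique instance name and point it to the Site Server.
updated = set_app_setting(SAMPLE, "name", "scan-server-01")
updated = set_app_setting(
    updated, "spoolerControllerEndpoint", "tcp://site-server.example.com:5555"
)
print(updated)
```

To apply this to a real file, read its contents into a string, call set_app_setting once per key, and write the result back.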

Load Balancer Integration

Configure the load balancer to forward HTTPS communication arriving on port 5600 to the pool of Scan Servers. Use a persistence of 30 minutes per request.

Change the configuration in the file <SAFEQ6>\SPOC\terminalserver\terminalserver.exe.config:

<add key="wpsBaseAddress" value="http://LOAD BALANCER FQDN OR IP:5600/" />

Optimize Performance

In the management interface, find and adjust the following parameters:

  • ocrProcessesPerJob - The maximum number of simultaneous OCR threads that can be used to process a single job. Higher numbers speed up processing of large scan jobs. The maximum supported value is 8. See also ocrPoolSize which controls the number of jobs that involve OCR that can be processed in parallel. Note that the number of OCR threads actually used to process a single job is further limited by the number of pages and the total number of CPU cores in the system. Also note that the final document synthesis is done in a single thread because it cannot be parallelized.

    Example: ocrPoolSize = 2 and ocrProcessesPerJob = 2. At most two jobs are processed simultaneously. More jobs will wait in a queue. Each job utilizes at most two CPU cores. In total, OCR processing may utilize as many as 4 CPU cores. If the host has only 4 cores, other applications running on the same host may not have enough resources.

  • ocrPoolSize - The maximum number of scan jobs that involve OCR that will be processed in parallel by a single Workflow Processing System server (Scan Server). More jobs will wait in a queue. See also ocrProcessesPerJob which controls the number of OCR threads that can be used to process a single job. Together, these two options limit the number of CPU cores that can be used for OCR processing.

    Example: ocrPoolSize = 2 and ocrProcessesPerJob = 3 means that the system will process a maximum of two jobs in parallel and will utilize as many as 6 CPU cores in total.

In summary:

  • If the customer produces a large number of small jobs, you should increase ocrPoolSize and keep ocrProcessesPerJob low.

  • If the customer produces large jobs, you should keep ocrPoolSize low and increase ocrProcessesPerJob.
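
The arithmetic behind the two parameters can be sketched as follows. This is an illustration only: the function names are hypothetical, and combining the limits with a simple minimum is an assumption based on the parameter descriptions above, not the product's actual scheduling logic.

```python
MAX_OCR_PROCESSES_PER_JOB = 8  # maximum supported value of ocrProcessesPerJob


def ocr_threads_for_job(ocr_processes_per_job: int, pages: int, cpu_cores: int) -> int:
    """Estimate the OCR threads used by one job.

    Assumption: the limit is the smallest of the configured value, the
    supported maximum, the page count, and the number of CPU cores.
    """
    limit = min(ocr_processes_per_job, MAX_OCR_PROCESSES_PER_JOB, pages, cpu_cores)
    return max(1, limit)


def max_ocr_cores(ocr_pool_size: int, ocr_processes_per_job: int, cpu_cores: int) -> int:
    """Upper bound on CPU cores busy with OCR when the job pool is full."""
    per_job = min(ocr_processes_per_job, MAX_OCR_PROCESSES_PER_JOB)
    # A host cannot use more cores than it has.
    return min(ocr_pool_size * per_job, cpu_cores)


# The two examples from the parameter descriptions, on a hypothetical 8-core host.
print(max_ocr_cores(ocr_pool_size=2, ocr_processes_per_job=2, cpu_cores=8))
print(max_ocr_cores(ocr_pool_size=2, ocr_processes_per_job=3, cpu_cores=8))
```

When the result equals the number of cores on the host, OCR can saturate the machine and leave no headroom for other applications.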

Limitations

  • No offline mode: if the connection to the Scan Server is not available, the scan application on terminals cannot be displayed.

  • All scan jobs are considered equal, as are all Scan Servers. An administrator cannot dedicate one server to large jobs (keep ocrPoolSize low and increase ocrProcessesPerJob) and others to small jobs (increase ocrPoolSize and keep ocrProcessesPerJob low).

  • High latency between the device and the Scan Server negatively impacts the user experience on the terminal. Each action in the scanning application interface (each click) has a high response time.
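
To get a rough feel for the latency limitation before centralizing Scan Servers, the network round trip from a branch office can be measured. The sketch below times TCP connections to the Scan Server port (5600, as used in wpsBaseAddress). It is only a proxy: TCP connect time is an assumption-laden stand-in for the real per-tap response time, and the function name and host name are hypothetical.

```python
import socket
import time


def tcp_connect_latency_ms(host: str, port: int, attempts: int = 5,
                           timeout: float = 3.0) -> float:
    """Return the median time in milliseconds to open a TCP connection."""
    samples = []
    for _ in range(attempts):
        start = time.perf_counter()
        # Open and immediately close a connection; only the handshake is timed.
        with socket.create_connection((host, port), timeout=timeout):
            pass
        samples.append((time.perf_counter() - start) * 1000.0)
    samples.sort()
    return samples[len(samples) // 2]


# Example call (hypothetical host name), run from a machine in the branch office:
#   tcp_connect_latency_ms("scan-server.example.com", 5600)
```

The median is used instead of the mean so that a single slow connection attempt does not dominate the result.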