UiPath Document Understanding Solution Architecture and Approach
Solution Architecture for Document Understanding
Era of Automation | Issue 05 | 28-Jul-2020
Many of the routine, mundane tasks we perform involve a considerable amount of paperwork. The documents used in these processes can be structured, semi-structured, or unstructured. We spend many hours reading the material, extracting the data, and manually passing the required information to the respective applications. Because this work is so manual, data extraction from documents is prone to human error, so paperwork-heavy processes may be neither accurate nor reliable. The UiPath Document Understanding solution allows you to intelligently process documents such as invoices, receipts, financial statements, utility bills, and other kinds of text with varying structures, with a high level of accuracy and reliability.
Once you understand the UiPath Document Understanding Framework, shown in the above diagram and described in the publication “Era of Automation — Understanding Documents with UiPath,” the next question is: what is the best architecture for a Document Understanding solution that suits your needs?
To decide on the approach, we first need to list the core requirements of the document understanding process.
- Do we need only to classify the documents?
- Do we need to classify and extract data from documents?
- Who will be handling the manual verification when needed?
- How would you escalate verification needs to higher management?
- What is the basis of verification and escalation to management?
One primary concern is that the solution should not halt the entire process while it waits for a human to perform a manual verification. Instead, the process should escalate the check to the respective party and continue evaluating and processing the rest of the documents.
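The non-blocking behavior described above can be sketched in a few lines of Python. This is only an illustration, not UiPath code: the `Document` class, the `needs_human_check` flag, and the `process_batch` function are hypothetical names introduced here, and a real solution would hand escalated documents to a reviewer queue rather than a list.

```python
from dataclasses import dataclass, field


@dataclass
class Document:
    name: str
    needs_human_check: bool  # hypothetical flag set by earlier verification rules


@dataclass
class RunResult:
    processed: list = field(default_factory=list)
    escalated: list = field(default_factory=list)


def process_batch(documents):
    """Process every document; park the ones needing review instead of waiting."""
    result = RunResult()
    for doc in documents:
        if doc.needs_human_check:
            # Record the escalation and move straight on to the next document.
            result.escalated.append(doc.name)
            continue
        result.processed.append(doc.name)
    return result


batch = [
    Document("invoice_001.pdf", needs_human_check=False),
    Document("receipt_002.pdf", needs_human_check=True),
    Document("invoice_003.pdf", needs_human_check=False),
]
outcome = process_batch(batch)
print("Processed:", outcome.processed)
print("Escalated:", outcome.escalated)
```

The point of the design is that the escalated receipt never delays the third invoice: the loop records the escalation and keeps going.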
Classification Based Approach
There are scenarios where data extraction is not essential, and the priority is to segregate the documents by classification so that another process can handle them later. In such cases, the UiPath Document Understanding solution comes in very handy, as it can classify documents based on keywords found within them. The Document Understanding solution also offers the ability to train the classifiers intelligently when setting up the automation solution. These classifiers can also learn each time a document is classified, which improves accuracy over time. Classification and classification verification are suitable for attended automation. Attended automation provides a Classification Station, where the user can verify and correct the obtained classification if the confidence of the automatic classification is less than a pre-defined value. The diagram shown below is one sample solution design to handle this requirement.
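As a rough illustration of the confidence check that decides whether a classification goes to the Classification Station, consider the Python sketch below. The threshold value, the function names, and the tuple layout are assumptions made for this example and do not come from UiPath.

```python
CONFIDENCE_THRESHOLD = 0.80  # assumed value; tune it for your own process


def needs_classification_review(confidence, threshold=CONFIDENCE_THRESHOLD):
    """Return True when the classifier confidence is below the threshold."""
    return confidence < threshold


def triage(classified):
    """Split (file, type, confidence) results into accepted and review lists."""
    accepted, review = [], []
    for file_name, doc_type, confidence in classified:
        target = review if needs_classification_review(confidence) else accepted
        target.append((file_name, doc_type))
    return accepted, review


results = [
    ("a.pdf", "invoice", 0.95),
    ("b.pdf", "receipt", 0.42),
    ("c.pdf", "invoice", 0.80),
]
accepted, review = triage(results)
print("Auto-accepted:", accepted)
print("Send to Classification Station:", review)
```

Note that a confidence exactly equal to the threshold is accepted here, because the rule is "less than a pre-defined value."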
Document Classification and Extraction Approach
In most scenarios, classification is not the only requirement. Many processes require extracting the data from the documents and processing it according to the business requirements. However, we still cannot ignore classification in the automated approach, as the robot must identify the type of the document to know which fields to extract and how. Different methods are available to handle the manual verification.
- Attended only approach
The Validation Station is used to show the validation screen on the machine where the process is executing. This approach is also ideal when UiPath Orchestrator is not available.
- Unattended only approach
Use Action Center to handle the human involvement.
- Hybrid approach
Use attended and unattended collaboration in scenarios where the process should be manually triggered. If the same user who triggered the process is doing the validation, the use of a Validation Station is possible. However, based on the business logic, if certain exceptional cases need management approval, such escalations can be directed to Action Center without showing the Validation Station to the user.
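The choice among the three approaches can be expressed as a small decision function. The Python sketch below is only one reading of the rules above: the `Channel` enum, the `choose_channel` function, and its parameters are hypothetical, and it assumes that Action Center is reachable only when Orchestrator is available.

```python
from enum import Enum


class Channel(Enum):
    VALIDATION_STATION = "validation_station"
    ACTION_CENTER = "action_center"


def choose_channel(attended, orchestrator_available, needs_management_approval):
    """Pick where a human validates a document, following the rules above."""
    if needs_management_approval and orchestrator_available:
        # Hybrid case: escalations skip the Validation Station entirely.
        return Channel.ACTION_CENTER
    if attended:
        # The user who triggered the process validates on the same machine.
        return Channel.VALIDATION_STATION
    if orchestrator_available:
        # Unattended run: human involvement goes through Action Center.
        return Channel.ACTION_CENTER
    raise ValueError("An unattended run without Orchestrator has no validation channel")


print(choose_channel(attended=True, orchestrator_available=False,
                     needs_management_approval=False))
print(choose_channel(attended=True, orchestrator_available=True,
                     needs_management_approval=True))
```

Keeping this decision in one function makes it easier to change the escalation rules later without touching the classification or extraction steps.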
Whichever of the three approaches you choose, it is good practice to break your Document Understanding solution into separate, manageable sub-processes. As a generic solution that fits most cases, we could introduce three sub-processes to handle the Document Understanding Framework.
The high-level diagram showcases a sample architecture for the Document Understanding process. The architecture used here breaks the entire document understanding process into three main sub-processes. The three main components are the Initiator process along with the processing logic (Process 1), the UiPath Action Center component for task assignment and management (Process 2), and finally the post-processing and model training component (Process 3), which handles training the intelligent classifiers and passing the data to other applications. The detailed architecture of each part is as follows.
Process 1 — The initiator process
The Initiator process is the primary process that handles document classification, data extraction, and the verification logic. The verification logic includes the rules that define how to handle verification automatically, and when to use the Validation Station, Action Center, or both if human intervention is needed. Depending on the outcome of the validation logic, the data is then passed to either the Action Center Processor or the Post Processing process to continue to the next steps. The diagram below shows a sample architecture for the initiator process.
Process 2 — Action Center Processor
The Action Center Processor is the process that creates the tasks in Action Center, waits for their completion, and finally passes the data to the Post Processing process, which completes the Document Understanding Framework. The diagram below shows a sample architecture for the Action Center process.
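The create, wait, and hand-over cycle can be modeled with a small in-memory stand-in. The Python sketch below does not call Action Center or any UiPath API: `InMemoryTaskBoard`, `ValidationTask`, and their methods are hypothetical names used only to show the lifecycle of a task.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class ValidationTask:
    document: str
    status: str = "pending"
    validated_data: Optional[dict] = None


class InMemoryTaskBoard:
    """Hypothetical stand-in for a task service; not a real UiPath API."""

    def __init__(self):
        self._tasks = {}
        self._next_id = 1

    def create_task(self, document):
        task_id = self._next_id
        self._next_id += 1
        self._tasks[task_id] = ValidationTask(document)
        return task_id

    def complete_task(self, task_id, validated_data):
        # In practice a human reviewer triggers this step.
        task = self._tasks[task_id]
        task.status = "completed"
        task.validated_data = validated_data

    def collect_completed(self):
        """Remove and return finished tasks so they can go to post-processing."""
        done_ids = [tid for tid, t in self._tasks.items() if t.status == "completed"]
        return [self._tasks.pop(tid) for tid in done_ids]

    def pending_count(self):
        return len(self._tasks)


board = InMemoryTaskBoard()
first = board.create_task("invoice_001.pdf")
second = board.create_task("receipt_002.pdf")
board.complete_task(first, {"total": "120.00"})

ready_for_post_processing = board.collect_completed()
print([t.document for t in ready_for_post_processing])
print("Still waiting:", board.pending_count())
```

Only completed tasks are handed over, so the second document stays on the board until its reviewer finishes, without holding up the first.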
Process 3 — Post Processing
Post-processing includes the steps needed to export the final verified data, train the models, and finally pass the data to a different process outside the document understanding framework, which continues with any system interactions. The data is handed over to a separate process because such steps are not part of the document understanding scope, and they should be maintained independently to preserve the integrity and reusability of each component or process. It is also a design best practice to have sub-processes that address the various segments of an extensive process. The diagram below shows a sample architecture design for the final stage of the document understanding framework, and it also showcases how Process 1 and Process 2 connect with Process 3.
The architecture showcased above could represent the foundation of the document understanding process in your organization. You may also customize it to suit your business requirements. Although the business need varies from one process to another, the core architecture showcased above remains the same across all document understanding processes.
If you want to learn more about UiPath Document Understanding, join the UiPath Academy, where you can find courses on the above subject.
Excellence is achieved through constant challenging and breaking off from your limitations. It isn’t taught or given; it begins within you. — Lahiru Fernando