The FormAssist: Deep learning methods for converting handwritten forms into digital assets

Srivathshan KS
Saurav Kumar
Shreekanth R
Midhilesh E
Parvej Reja Saleh


Customer agreements are required to follow statutory and legal requirements, which include agreements being manually signed. In India, paper forms are still prevalent in the banking industry. These forms require customers to fill in a template in capital letters and sign manually to agree to the terms. This creates a challenge for analytical systems, because the data is captured outside the system and takes time to become part of the data pipeline. Banking is poised to become digital; however, historical data is still needed to train models for current data applications. This limitation is a known bottleneck in designing data applications for real-time decision making. Optical Character Recognition (OCR) with capabilities comparable to those of a human is still not achievable, in spite of decades of intensive research. Because each individual form is idiosyncratic, researchers in industry and academia have directed their attention towards OCR. This paper presents an efficient model that captures offline handwritten forms and converts them into digital records. The techniques are based on deep learning and show higher accuracy on our test set of real application forms from selected banks. We experimented with different feature extraction techniques to extract handwritten characters from the forms, and our experiments evolved over time towards a generalized solution with better results. The final model uses the relative position of the characters to extract them from the forms and Convolutional Neural Networks (CNNs) to predict the characters. The paper also discusses the serverless architecture that hosts FormAssist as a REST API, with a model calibration feature to accommodate multiple types of forms.

