Text Classification Models for Form Entity Linking
- 1 Department of Mathematics and Computer Science, University of La Rioja
Publisher: La Rochelle Université
Year of publication: 2022
Pages: 40-43
Conference: 15th IAPR International Workshop on Document Analysis Systems (DAS 2022), 22-25 May 2022.
Type: Conference contribution
Abstract
Forms are a widespread type of template-based document used in a great variety of fields. Automatic extraction of the information in these documents is in high demand because of the increasing volume of forms generated on a daily basis. However, this is not a straightforward task when working with scanned forms, because of the great diversity of templates, the varying locations of form entities, and the quality of the scanned documents. In this context, there is one feature shared by all forms: they contain a collection of interlinked entities built as key-value (or label-value) pairs, together with other entities such as headers or images. In this work, we have tackled the problem of entity linking in forms by combining image processing techniques and a text classification model based on the BERT architecture. This approach achieves state-of-the-art results with an F1-score of 0.80 on the FUNSD dataset, a 5% improvement over the best previous method.
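
As a rough illustration of how entity linking can be framed as text classification and scored with F1, the following minimal Python sketch builds candidate key-value pairs and evaluates a set of predicted links. It is not the authors' implementation: the entity dictionaries, the restriction to question-answer pairs, the `[SEP]`-joined input string, and the function names `candidate_pairs`, `pair_to_text`, and `link_f1` are all assumptions made for this example, and no BERT model or image processing step is included.

```python
from itertools import product


def candidate_pairs(entities):
    """Pair every 'question' entity with every 'answer' entity.

    Assumption: only question-answer links are considered here; a real
    system would likely prune candidates, e.g. by spatial position.
    """
    questions = [e for e in entities if e["label"] == "question"]
    answers = [e for e in entities if e["label"] == "answer"]
    return list(product(questions, answers))


def pair_to_text(key, value):
    """Join both entity texts into one string for a binary text classifier.

    Assumption: a hypothetical input format; the separator is illustrative.
    """
    return f"{key['text']} [SEP] {value['text']}"


def link_f1(predicted, gold):
    """F1-score over sets of (key_id, value_id) links."""
    predicted, gold = set(predicted), set(gold)
    true_positives = len(predicted & gold)
    if true_positives == 0:
        return 0.0
    precision = true_positives / len(predicted)
    recall = true_positives / len(gold)
    return 2 * precision * recall / (precision + recall)
```

Under these assumptions, a classifier would label each string produced by `pair_to_text` as "linked" or "not linked", and the accepted pairs would be compared against the annotated links with `link_f1`.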