Text Classification Models for Form Entity Linking

María Villota; César Domínguez; Jónathan Heras; Eloy Mata; Vico Pascual

doi:10.48550/ARXIV.2112.07443


1 Department of Mathematics and Computer Science, University of La Rioja

Proceedings:

15th IAPR International Workshop on Document Analysis Systems (DAS 2022), 22-25 May 2022. Short Paper Booklet

Publisher: La Rochelle Université

Year of publication: 2022

Pages: 40-43

Conference: 15th IAPR International Workshop on Document Analysis Systems (DAS 2022), 22-25 May 2022.

Type: Conference contribution


DOI: 10.48550/ARXIV.2112.07443

Institutional repository: Open access; Publisher

Abstract

Forms are a widespread type of template-based document used in a great variety of fields. The automatic extraction of the information included in these documents is in great demand due to the increasing volume of forms that are generated on a daily basis. However, this is not a straightforward task when working with scanned forms because of the great diversity of templates, with different locations of form entities, and the quality of the scanned documents. In this context, there is a feature that is shared by all forms: they contain a collection of interlinked entities built as key-value (or label-value) pairs, together with other entities such as headers or images. In this work, we have tackled the problem of entity linking in forms by combining image processing techniques and a text classification model based on the BERT architecture. This approach achieves state-of-the-art results with an F1-score of 0.80 on the FUNSD dataset, a 5% improvement over the best previous method.
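The abstract does not spell out how entity linking is cast as text classification, so the following is only a minimal sketch of one plausible framing, not the authors' pipeline. It assumes that every key entity is paired with every value entity, that each pair is joined into a single text sequence, and that a binary classifier decides whether the pair is linked. The helper names (`candidate_pairs`, `pair_to_text`, `link_f1`), the `[SEP]` joining convention, and `toy_classifier` (a hypothetical stand-in for a fine-tuned BERT model) are all illustrative assumptions. The image processing step mentioned in the abstract is not covered.

```python
from itertools import product


def candidate_pairs(entities):
    """Pair every 'question' (key) entity with every 'answer' (value) entity."""
    keys = [e for e in entities if e["label"] == "question"]
    values = [e for e in entities if e["label"] == "answer"]
    return list(product(keys, values))


def pair_to_text(key, value, sep=" [SEP] "):
    """Join the two entity texts into one sequence for a text classifier."""
    return key["text"] + sep + value["text"]


def link_f1(predicted, gold):
    """F1-score over sets of (key_id, value_id) links."""
    predicted, gold = set(predicted), set(gold)
    true_positives = len(predicted & gold)
    if true_positives == 0:
        return 0.0
    precision = true_positives / len(predicted)
    recall = true_positives / len(gold)
    return 2 * precision * recall / (precision + recall)


# Toy form with FUNSD-style entity labels (illustrative data, not from the dataset).
entities = [
    {"id": 0, "label": "question", "text": "Name:"},
    {"id": 1, "label": "answer", "text": "Jane Doe"},
    {"id": 2, "label": "question", "text": "Date:"},
    {"id": 3, "label": "answer", "text": "12/03/1998"},
]


def toy_classifier(text):
    """Hypothetical stand-in for a fine-tuned BERT classifier.

    It accepts the two correct pairs plus one deliberate false positive.
    """
    accepted = {
        "Name: [SEP] Jane Doe",
        "Date: [SEP] 12/03/1998",
        "Date: [SEP] Jane Doe",  # wrong link, kept to make the score non-trivial
    }
    return text in accepted


predicted = [
    (key["id"], value["id"])
    for key, value in candidate_pairs(entities)
    if toy_classifier(pair_to_text(key, value))
]
gold = [(0, 1), (2, 3)]
score = link_f1(predicted, gold)
print(predicted, round(score, 2))
```

In this toy case the classifier returns three links, two of them correct, so precision is 2/3 and recall is 1. In a real system the stand-in would be replaced by a trained model, and the set of candidate pairs would likely be pruned using layout information rather than enumerated exhaustively.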