Building Spanish Trustworthy Question-Answer Datasets for Suicide Information

Pablo Ascorbe; María S. Campos; César Domínguez; Jónathan Heras; Magdalena Pérez; Ana Rosa Terroba-Reinares

Pablo Ascorbe ¹
María S. Campos ²
César Domínguez ¹
Jónathan Heras ¹
Magdalena Pérez ³
Ana Rosa Terroba-Reinares ¹,⁴

1 Departamento de Matemáticas y Computación, Universidad de La Rioja, Spain
2 Unidad de Salud Mental Espartero, Logroño, La Rioja
3 Teléfono de la Esperanza
4 Fundación Rioja Salud

Proceedings:

Nineteenth International Conference on Computer Aided Systems Theory (EUROCAST 2024). Extended Abstracts

Alexis Quesada-Arencibia (ed.)
José Carlos Rodríguez-Rodríguez (ed.)
Gabriele Salvatore de Blasio (ed.)
Carmelo Rubén García (ed.)
Roberto Moreno-Díaz (ed.)

Publisher: IUCTC Universidad de Las Palmas de Gran Canaria

ISBN: 978-84-09-58721-6

Year of publication: 2024

Pages: 157-158

Conference: 19th International Conference on Computer Aided Systems Theory (EUROCAST 2024). Extended Abstracts. Las Palmas de Gran Canaria, Spain, February 25-March 1, 2024

Type: Conference contribution

Abstract

Suicide is a public health problem: worldwide, more than 800,000 suicides are estimated to occur every year, one every 40 seconds, figures of epidemic proportions [4]. Such is the scale of the problem that specialists tasked with answering questions related to it are overwhelmed and overburdened. Moreover, much of the information that can be found about suicide on the Internet could be more harmful than helpful. These problems could be tackled by means of automatic Question-Answering (Q&A) systems [3]; however, it is necessary to have trustworthy corpora that help to validate, train, and guide the construction of such systems [2]. Since, to the best of the authors' knowledge, there is no Spanish Q&A corpus for suicide information, the aim of this work is to create such corpora. In particular, we have considered three levels of quality [1]: bronze-standard, when the entire data has been generated automatically and has undergone little processing; silver-standard, when, starting from a bronze corpus, a processing stage is applied to refine the data, followed by an annotation and validation stage conducted by experts; and gold-standard, when the corpus has been manually generated and validated by experts.

The starting point to obtain the corpora is a set of trustworthy Spanish documents provided by suicide experts (in our case, a total of 151 documents). Then, a bronze-standard Q&A corpus was built using three large language models, two in Spanish (bertin-gpt-j-6B-alpaca and Llama-2-7b-ft-instruct-es) and one in English (t5-base-squad-qag), all of them freely available on the Hugging Face platform. We split the documents into chunks and, for each chunk, asked the language models to generate a Q&A pair. Using this procedure, a total of 22,920 Q&A pairs were obtained, but many generated questions were incomplete, repeated, or contained essentially the same information.
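The chunk-and-generate procedure can be illustrated with a minimal sketch. The abstract does not state the chunk size or how chunk boundaries were chosen, so the word-based splitting, the `max_words` default, and the function names below are assumptions made for illustration only; each generator is assumed to be any callable that wraps one of the language models and returns a (question, answer) tuple.

```python
from typing import Callable, Iterable, List, Tuple

QAPair = Tuple[str, str]  # (question, answer)


def split_into_chunks(text: str, max_words: int = 200) -> List[str]:
    """Split a document into consecutive chunks of at most max_words words.

    Word-based splitting and the default size are assumptions; the
    abstract only says that documents were split into chunks.
    """
    if max_words <= 0:
        raise ValueError("max_words must be positive")
    words = text.split()
    return [
        " ".join(words[i:i + max_words])
        for i in range(0, len(words), max_words)
    ]


def generate_pairs(
    documents: Iterable[str],
    generators: Iterable[Callable[[str], QAPair]],
    max_words: int = 200,
) -> List[QAPair]:
    """Ask every generator for one Q&A pair per chunk of every document."""
    generators = list(generators)  # allow reuse across chunks
    pairs: List[QAPair] = []
    for document in documents:
        for chunk in split_into_chunks(document, max_words):
            for generate in generators:
                pairs.append(generate(chunk))
    return pairs
```

In practice each entry of `generators` would wrap a call to one of the three models named above; the raw output of such a loop is what the subsequent filtering stages operate on.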
Therefore, some filters were applied to increase the quality of the generated corpus. The first filter consisted of basic data processing operations, such as the elimination of empty, duplicated, or incomplete pairs. Then, we trained a deep learning classification model (named bertin-roberta-base-spanish-spanish-suicide-intent, again freely available at Hugging Face) using a suicidal behaviour dataset to determine whether a question-answer pair contains information about suicide. This model allowed us to filter out pairs not related to suicide. Finally, we removed the pairs that were semantically similar by using an embedding and the cosine distance. After all these steps, we obtained the final version of our bronze-standard corpus, leaving us with 4,901 Q&A pairs.

From that bronze corpus, a manual filtering was conducted by non-experts to eliminate uninteresting Q&A pairs; i.e., pairs that talked about suicide but seemed not to be useful or were too ambiguous. This left 484 pairs to be evaluated by experts. In order to perform such an evaluation, a web application was developed for the validation of the corpus by a group of psychologists and psychiatrists, allowing them to update or remove the pairs that did not pass their filter, thus obtaining the silver-standard corpus with 380 Q&A pairs.

In order to evaluate the models' performance in generating interesting Q&A pairs, we report the number of pairs obtained by each model in each step of the process. As a starting point, 7,806 pairs were generated by Bertin, 4,558 by Llama-2, and 10,557 by t5-base-squad. After the automatic filters, the number of pairs was reduced to 760 from Bertin, 1,166 from Llama-2, and 2,977 from t5-base-squad. After manual filtering by a non-expert, we had 337 from Bertin, 55 from Llama-2, and 92 from t5-base-squad. Finally, the silver corpus had 272 from Bertin, 45 from Llama-2, and 63 from t5-base-squad.
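One way to make the per-model comparison concrete is to compute, for each model, the fraction of generated pairs that survive to the silver corpus. The sketch below uses only the counts reported above; the dictionary layout and the helper name `retention` are illustrative choices rather than part of the original work.

```python
# Per-model pair counts at each stage, as reported in the abstract.
counts = {
    "Bertin": {"generated": 7806, "automatic": 760, "manual": 337, "silver": 272},
    "Llama-2": {"generated": 4558, "automatic": 1166, "manual": 55, "silver": 45},
    "t5-base-squad": {"generated": 10557, "automatic": 2977, "manual": 92, "silver": 63},
}


def retention(stages: dict, start: str = "generated", end: str = "silver") -> float:
    """Fraction of pairs present at `start` that are still present at `end`."""
    return stages[end] / stages[start]


# Share of each model's generated pairs that reached the silver corpus.
silver_rates = {model: retention(stages) for model, stages in counts.items()}

# Model with the highest generated-to-silver retention.
best_model = max(silver_rates, key=silver_rates.get)

for model, rate in silver_rates.items():
    print(f"{model}: {100 * rate:.2f}% of generated pairs reached the silver corpus")
print(f"Highest retention: {best_model}")
```

Passing other stage names to `retention`, for example `start="automatic"` and `end="manual"`, gives the survival rate of a single filtering step.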
We can observe that the Bertin model had the best performance, contributing the most pairs to the silver corpus.

In addition, a gold-standard corpus, extracted directly from the FAQ sections contained in some of the documents provided by experts (i.e., not automatically generated), has also been provided. This dataset consists of 118 Q&A items.

The Q&A corpora and models described above are freely available at https://huggingface.co/PrevenIA. These corpora are the first step towards the validation, training, and construction of automatic Q&A systems that provide information about suicide.
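Two of the automatic filters mentioned in the abstract, basic cleaning and the removal of semantically similar pairs, can be sketched as follows. This is only an illustration under stated assumptions: the question-mark heuristic for incomplete questions, the similarity threshold of 0.9, and the `embed` callable (a stand-in for whatever sentence-embedding model is used) are not specified in the abstract, and the suicide-intent classification step is omitted.

```python
import math
from typing import Callable, List, Sequence, Tuple

QAPair = Tuple[str, str]  # (question, answer)


def basic_clean(pairs: List[QAPair]) -> List[QAPair]:
    """Drop empty, incomplete, and exactly duplicated pairs."""
    seen = set()
    kept: List[QAPair] = []
    for question, answer in pairs:
        q, a = question.strip(), answer.strip()
        if not q or not a:
            continue  # empty question or answer
        if not q.endswith("?"):
            continue  # assumption: a question without "?" is incomplete
        key = (q.lower(), a.lower())
        if key in seen:
            continue  # exact duplicate (case-insensitive)
        seen.add(key)
        kept.append((q, a))
    return kept


def cosine_similarity(u: Sequence[float], v: Sequence[float]) -> float:
    """Cosine similarity; the cosine distance is 1 minus this value."""
    dot = sum(x * y for x, y in zip(u, v))
    norm_u = math.sqrt(sum(x * x for x in u))
    norm_v = math.sqrt(sum(y * y for y in v))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return dot / (norm_u * norm_v)


def drop_near_duplicates(
    pairs: List[QAPair],
    embed: Callable[[str], Sequence[float]],
    threshold: float = 0.9,  # hypothetical value, not from the abstract
) -> List[QAPair]:
    """Keep a pair only if it is not too similar to any pair already kept."""
    kept: List[QAPair] = []
    kept_vectors: List[Sequence[float]] = []
    for question, answer in pairs:
        vector = embed(question + " " + answer)
        if all(cosine_similarity(vector, other) < threshold for other in kept_vectors):
            kept.append((question, answer))
            kept_vectors.append(vector)
    return kept
```

The greedy strategy keeps the first pair of each group of near-duplicates, so the result depends on input order; it compares each candidate against every kept pair, which is adequate for a corpus of a few tens of thousands of pairs but would need an approximate nearest-neighbour index at larger scale.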

References

1. Casola, S., Lavelli, A., Saggion, H.: Creating a silver standard for patent simplification. In: Proceedings of the 46th International ACM SIGIR Conference on Research and Development in Information Retrieval. pp. 1045–1055 (2023)
2. Das, B., Nirmala, S.J.: Improving healthcare question answering system by identifying suitable answers. In: 2022 IEEE 2nd Mysore Sub Section International Conference (MysuruCon). pp. 1–6. IEEE (2022)
3. Rogers, A., Gardner, M., Augenstein, I.: QA dataset explosion: A taxonomy of NLP resources for question answering and reading comprehension. ACM Computing Surveys 55(10), 1–45 (2023)
4. WHO: Suicide worldwide in 2019: global health estimates (2021)