Enhancing the understanding of clinical trials with a sentence-level simplification dataset

  1. Campillos-Llanos, Leonardo
  2. Bartolomé, Rocío
  3. Terroba Reinares, Ana R.
Journal: Procesamiento del lenguaje natural

ISSN: 1135-5948

Year of publication: 2024

Issue: 72

Pages: 31–43

Type: Article

Access: Open access (publisher)

Abstract

We present a set of 1,200 manually simplified sentences from Spanish clinical trials (144,019 tokens). We analyzed 1,040 trial announcements from the European Clinical Trials Register (EudraCT) and selected sentences that contained ambiguities or were longer than 25 words. We defined simplification criteria and compiled them in a guideline that is released publicly together with the dataset. Two versions were produced: sentences with syntactic simplification only, and sentences with both lexical and syntactic simplification. We report a quantitative and a qualitative evaluation, together with an assessment by three independent raters of grammaticality/fluency, semantic adequacy and degree of simplification. The results show that the resource is suitable for advancing research on the automatic simplification of medical texts.
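The length criterion used to pick candidate sentences (more than 25 words) can be illustrated with a short Python sketch. This is not the authors' pipeline: the regex-based sentence splitter, the `\w+` word count and the function names (`split_sentences`, `count_words`, `long_sentences`) are assumptions made for illustration. The other selection criterion, ambiguity, depends on human judgment and is not modelled here.

```python
import re

# Threshold from the abstract: keep sentences with MORE than 25 words.
MAX_WORDS = 25


def split_sentences(text: str) -> list[str]:
    """Naive splitter (assumption): break after '.', '!' or '?' followed by whitespace."""
    parts = re.split(r"(?<=[.!?])\s+", text.strip())
    return [p for p in parts if p]


def count_words(sentence: str) -> int:
    """Count word tokens; on str patterns, \\w also matches accented Spanish letters."""
    return len(re.findall(r"\w+", sentence))


def long_sentences(text: str, max_words: int = MAX_WORDS) -> list[str]:
    """Return the sentences whose word count exceeds max_words."""
    return [s for s in split_sentences(text) if count_words(s) > max_words]


if __name__ == "__main__":
    sample = "Se incluirán pacientes adultos. " + " ".join(["palabra"] * 30) + "."
    for sentence in long_sentences(sample):
        print(count_words(sentence), sentence[:40])
```

Word counts depend on the tokenizer: with `\w+`, hyphenated words and expressions such as "3/4" count as two tokens, so a real selection step would need a tokenization convention fixed in advance.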
