Generación de flexión morfológica con UniMorph. Evaluación con base de datos relacional y pautas de entrenamiento

  1. Martín Arista, Javier

Journal: Procesamiento del lenguaje natural

ISSN: 1135-5948

Year of publication: 2022

Issue: 68

Pages: 61-70

Type: Article

Institutional repository: Open access (postprint); Open access (publisher version)

Abstract

This article assesses the morphological inflection generation for Old English in the UniMorph data set. The method is based on McCarthy et al.'s (2020) model for generating putative morphological paradigms. The assessment covers inflections (morphological features and values), inflectional forms and stems. It also addresses plausibility, understood as the effective attestedness of an inflectional form. The assessment tasks are carried out in a relational database specifically designed for filing and comparing the relevant data sets, including treebanks and databases of Old English lexicographical and textual sources. The overall conclusion is that the Old English UniMorph data set is consistent and robust. On the basis of the assessment, however, training guidelines for the generation model are proposed that concern characters, diacritical marks, the prefix ge- in verbs, the superlative grade of adjectives, the adjectivally inflected participle and some local shortcomings.
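
The attestedness check described in the abstract can be pictured with a small relational sketch. This is not the database used in the article: the table names, the helper functions and the sample rows below are all assumptions made for illustration. The only detail taken from UniMorph is its three-column layout (lemma, inflected form, feature bundle). Generated forms go into one table, forms found in a textual source into another, and a left join reports the generated forms that have no attested match.

```python
import sqlite3

# Toy rows in UniMorph's three-column layout (lemma, form, features).
# Illustrative only; not taken from the Old English UniMorph data set.
UNIMORPH_ROWS = [
    ("bindan", "binde", "V;IND;PRS;1;SG"),
    ("bindan", "band", "V;IND;PST;1;SG"),
    ("bindan", "bundon", "V;IND;PST;PL"),
]

# Hypothetical word list standing in for a textual source.
ATTESTED_FORMS = ["binde", "bundon"]


def build_db():
    """Create an in-memory database holding both data sets."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE generated (lemma TEXT, form TEXT, features TEXT)")
    conn.execute("CREATE TABLE attested (form TEXT PRIMARY KEY)")
    conn.executemany("INSERT INTO generated VALUES (?, ?, ?)", UNIMORPH_ROWS)
    conn.executemany(
        "INSERT INTO attested VALUES (?)", [(form,) for form in ATTESTED_FORMS]
    )
    return conn


def unattested(conn):
    """Return generated forms with no match in the attested table."""
    query = """
        SELECT g.lemma, g.form, g.features
        FROM generated AS g
        LEFT JOIN attested AS a ON a.form = g.form
        WHERE a.form IS NULL
        ORDER BY g.lemma, g.form
    """
    return conn.execute(query).fetchall()


def attestation_rate(conn):
    """Share of generated forms that are found in the attested table."""
    total = conn.execute("SELECT COUNT(*) FROM generated").fetchone()[0]
    missing = len(unattested(conn))
    return (total - missing) / total if total else 0.0


if __name__ == "__main__":
    db = build_db()
    for row in unattested(db):
        print("unattested:", row)
    print("attestation rate:", round(attestation_rate(db), 2))
```

A real comparison would also have to normalise characters and diacritical marks before joining, which is one of the points the proposed training guidelines concern.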

Bibliographic References

  • Anthony, L. 2020. AntConc (Version 3.5.9) [Computer Software]. Tokyo, Japan: Waseda University. Available from https://www.laurenceanthony.net/software
  • Campbell, A. 1987. Old English Grammar. Oxford University Press, Oxford.
  • Çöltekin, Ç. 2019. Cross-lingual morphological inflection with explicit alignment. Proceedings of the 16th Workshop on Computational Research in Phonetics, Phonology, and Morphology, pages 71–79, Association for Computational Linguistics.
  • Cotterell, R., C. Kirov, J. Sylak-Glassman, G. Walther, E. Vylomova, A. D. McCarthy, K. Kann, S. Mielke, G. Nicolai, M. Silfverberg, D. Yarowsky, J. Eisner, and M. Hulden. 2018. The CoNLL–SIGMORPHON 2018 Shared Task: Universal Morphological Reinflection. Proceedings of the CoNLL–SIGMORPHON 2018 Shared Task: Universal Morphological Reinflection, pages 1–27, Association for Computational Linguistics.
  • Healey, A. (ed.), J. Wilkin, and X. Xiang. 2004. The Dictionary of Old English web corpus. Toronto: Dictionary of Old English Project, Centre for Medieval Studies, University of Toronto.
  • Healey, A. (ed.). 2018. The Dictionary of Old English in electronic form A-I. Toronto: Dictionary of Old English Project, Centre for Medieval Studies, University of Toronto.
  • Hogg, R. M., and R. D. Fulk. 2011. A Grammar of Old English. Volume 2: Morphology. Blackwell.
  • Johnson, B. 2009. Using the Levenshtein algorithm for automatic lemmatization in Old English. MA Thesis, The University of Georgia.
  • Jurafsky, D., and J. H. Martin. Speech and Language Processing (3rd edition). Forthcoming.
  • Kastovsky, D. 1992. Semantics and vocabulary. In R. Hogg (ed.) The Cambridge history of the English language I: The beginnings to 1066, pages 290–408, Cambridge University Press, Cambridge.
  • Martín Arista, J. 2012. The Old English prefix ge-: A panchronic reappraisal. Australian Journal of Linguistics, 32(4):411–433.
  • Martín Arista, J., S. Domínguez Barragán, L. García Fernández, E. Ruíz Narbona, R. Torre Alonso, and R. Vea Escarza. 2021. ParCorOEv2. An open access annotated parallel corpus Old English-English. Nerthus Project, Universidad de La Rioja, www.nerthusproject.com.
  • McCarthy, A. D., C. Kirov, M. Grella, A. Nidhi, P. Xia, K. Gorman, E. Vylomova, S. J. Mielke, G. Nicolai, M. Silfverberg, T. Arkhangelskij, N. Krizhanovsky, A. Krizhanovsky, E. Klyachko, A. Sorokin, J. Mansfield, V. Ernštreits, Y. Pinter, C. L. Jacobs, R. Cotterell, M. Hulden, and D. Yarowsky. 2020. UniMorph 3.0: Universal Morphology. Proceedings of the 12th Conference on Language Resources and Evaluation (LREC 2020), pages 3922–3931, European Language Resources Association.
  • Sylak-Glassman, J. 2016. The Composition and Use of the Universal Morphological Feature Schema (UniMorph Schema). Working draft, v. 2. Forthcoming.
  • Taylor, A., A. Warner, S. Pintzuk, and F. Beths. 2003. The York-Toronto-Helsinki Parsed Corpus of Old English Prose. https://www-users.york.ac.uk/~lang22/YcoeHome1.htm.
  • Torre Alonso, R. 2021. Old English Class I Strong Verbs Lemmatization: A Morphological Generation Approach. Studia Neophilologica. To appear. DOI: 10.1080/00393274.2021.2010128.