Lemmatising Treebanks. Corpus Annotation with Knowledge Bases

  1. Carmen Novo Urraca
  2. Ana Elvira Ojanguren López
Journal:
RAEL: revista electrónica de lingüística aplicada

ISSN: 1885-9089

Year of publication: 2018

Volume: 17

Issue: 1

Pages: 99-120

Type: Article

Institutional repository: Open access (postprint); Open access (publisher version)

Abstract

This article focuses on Old English lexicography and corpus analysis. Its aim is to define a lemmatisation procedure for a type of annotated and parsed Old English corpus known as a treebank. The study addresses two questions: where the data needed to lemmatise the Old English treebank can be found, and what procedure should be adopted to link the lemmatisation available in the sources to the treebank. Drawing on the knowledge bases of the Nerthus Project, a semi-automatic procedure is designed, implemented and evaluated for supplying The York-Toronto-Helsinki Parsed Corpus of Old English Prose with lemma tags.

References

  • Abeillé, A. (2003). Introduction. In A. Abeillé (Ed.), Treebanks: Building and Using Parsed Corpora (pp. xiii-xxvi). Dordrecht: Kluwer.
  • Bosworth, J. & Toller, T. N. (1973 [1898]). An Anglo-Saxon Dictionary. Oxford: Oxford University Press.
  • Brunner, K. (1965). Altenglische Grammatik nach der Angelsächsischen Grammatik von Eduard Sievers (3rd ed.). Tübingen: Max Niemeyer Verlag.
  • Campbell, A. (1987 [1959]). Old English Grammar. Oxford: Oxford University Press.
  • Clark Hall, J. R. (1996). A Concise Anglo-Saxon Dictionary. Supplement by H. D. Merritt. Toronto: University of Toronto Press.
  • Fowler, R. (1972). Wulfstan's Canons of Edgar. Oxford: Oxford University Press.
  • García Fernández, L. (2015). The Lemmatisation of Derived Preterite-Present and Irregular Verbs on a Lexical Database of Old English. Master's Thesis. University of La Rioja.
  • García Fernández, L. Preterite-present verb lemmas from a corpus of Old English. In P. Guerrero Medina, R. Torre Alonso and R. Vea Escarza (Eds.), Verbs, Clauses and Constructions: Functional and Typological Approaches. Newcastle: Cambridge Scholars Publishing. Forthcoming.
  • García García, L. (2012). Morphological causatives in Old English: the quest for a vanishing formation. Transactions of the Philological Society, 110(1), 112-148. doi: 10.1111/j.1467-968X.2012.01287.x
  • García García, L. (2013). Lexicalization and morphological simplification in Old English jan-causatives: some open questions. Sprachwissenschaft, 38(2), 245-264.
  • González Torres, E. (2010a). The continuum inflection-derivation and the Old English suffixes -a, -e, -o, -u. Atlantis, 32(1), 103-122.
  • González Torres, E. (2010b). The bases of derivation of Old English affixed nouns: status and category. Studia Anglica Posnaniensia, 46(2), 21-43.
  • González Torres, E. (2011). Morphological complexity, recursiveness and templates in the formation of Old English nouns. Estudios Ingleses de la Universidad Complutense, 19, 45-70.
  • Hajičová, E., Abeillé, A., Hajič, J., Mírovský, J. & Urešová, Z. (2010). Treebank annotation. In N. Indurkhya & F. Damerau (Eds.), Handbook of Natural Language Processing (pp. 167-188). Boca Raton, FL: Chapman & Hall/CRC.
  • Hargrove, H. L. (1902). King Alfred's Old English Version of St. Augustine's Soliloquies. New York: Henry Holt and Company.
  • Haug, D. (2015). Treebanks in historical linguistic research. In C. Viti (Ed.), Perspectives on Historical Syntax (pp. 188-202). Amsterdam: John Benjamins.
  • Healey, A. diPaolo (Ed.). (2016). The Dictionary of Old English: A to H. Toronto: Dictionary of Old English Project, Centre for Medieval Studies, University of Toronto.
  • Healey, A. diPaolo (Ed.), Price, J. & Xiang, X. (2004). The Dictionary of Old English Web Corpus. Toronto: Dictionary of Old English Project, Centre for Medieval Studies, University of Toronto.
  • Healey, A. diPaolo (Ed.). (2016). The Dictionary of Old English in Electronic Form A-H. Toronto: Dictionary of Old English Project, Centre for Medieval Studies, University of Toronto.
  • Hogg, R. M. & Fulk, R. D. (2011). A Grammar of Old English. Oxford: Wiley-Blackwell.
  • Kastovsky, D. (1992). Semantics and vocabulary. In R. M. Hogg (Ed.), The Cambridge History of the English Language I: The Beginnings to 1066 (pp. 290-408). Cambridge: Cambridge University Press. doi: 10.1017/CHOL9780521264747.006
  • Marcus, M., Marcinkiewicz, M. & Santorini, B. (1993). Building a large annotated corpus of English: The Penn treebank. Computational Linguistics, 19(2), 313-330.
  • Marsden, R. (2004). The Cambridge Old English Reader. Cambridge: Cambridge University Press.
  • Martín Arista, J. (2012a). Lexical database, derivational map and 3D representation. RESLA-Revista Española de Lingüística Aplicada, (Extra 1), 119-144.
  • Martín Arista, J. (2012b). The Old English prefix Ge-: A panchronic reappraisal. Australian Journal of Linguistics, 32(4), 411-433. doi: 10.1080/07268602.2012.744264
  • Martín Arista, J. (2013a). Recursivity, derivational depth and the search for Old English lexical primes. Studia Neophilologica, 85(1), 1-21. doi: 10.1080/00393274.2013.771829
  • Martín Arista, J. (2013b). Nerthus. Lexical Database of Old English: From word-formation to meaning construction. Research Seminar, School of English, University of Sheffield.
  • Martín Arista, J. (2014). Noun layers in Old English. Asymmetry and mismatches in lexical derivation. Nordic Journal of English Studies, 13(3), 160-187.
  • Martín Arista, J. (2017a). El paradigma derivativo del inglés antiguo. Onomázein, 37, 144-169.
  • Martín Arista, J. (2017b). The design and implementation of a pilot parallel corpus of Old English. Paper presented at the SHELL Session of the 2017 International Medieval Conference. Leeds, University of Leeds, United Kingdom. July, 4.
  • Martín Arista, J. (2017c). The Nerthus Project at the crossroads. From lexical database to parallel corpus of Old English. Lecture delivered at the 2017 International Conference of SELIM. Málaga, University of Málaga, Spain.
  • Martín Arista, J. (2017d). The semantic poles of Old English. Towards the 3D representation of complex polysemy. Digital Scholarship in the Humanities. Forthcoming. doi: 10.1093/llc/fqx004.
  • Martín Arista, J. (coord.). Parallel Corpus of Old English Prose. Nerthus Project. Universidad de La Rioja. In preparation.
  • Martín Arista, J. & Cortés Rodríguez, F. (2014). From directionals to telics: meaning construction, word-formation and grammaticalization in role and reference grammar. In M. A. Gómez González, F. Ruiz de Mendoza Ibáñez & F. Gonzálvez García (Eds.), Theory and Practice in Functional-Cognitive Space (pp. 229-250). Amsterdam: John Benjamins. doi: 10.1075/sfsl.69.10mar
  • Martín Arista, J. (Ed.) (2016). NerthusV3. Online Lexical Database of Old English. Nerthus Project. Universidad de La Rioja. Retrieved from: www.nerthusproject.com
  • Martín Arista, J. & Vea Escarza, R. (2016). Assessing the semantic transparency of Old English affixation: adjective and noun formation. English Studies, 97(1-2), 61-77.
  • Mateo Mendaza, R. (2013). The Old English exponent for the semantic prime TOUCH. Descriptive and methodological questions. Australian Journal of Linguistics, 33(4), 449-466. doi: 10.1080/0726.8602.2013.
  • Mateo Mendaza, R. (2014). The Old English adjectival affixes ful- and -ful: a text-based account on productivity. NOWELE-North-Western European Language Evolution, 67(1), 77-94. doi: 10.1075/nowele.67.1.
  • Mateo Mendaza, R. (2015a). Matching productivity indexes and diachronic evolution. The Old English affixes ful-, -isc, -cund and -ful. Canadian Journal of Linguistics, 60(1), 1-24.
  • Mateo Mendaza, R. (2015b). The search for Old English semantic primes: the case of HAPPEN. Nordic Journal of English Studies, 15, 71-99.
  • Mateo Mendaza, R. (2016). The Old English exponent for the semantic prime MOVE. Australian Journal of Linguistics, 34(4), 542-559. doi: 10.1080/07268602.2016.1169976
  • Metola Rodríguez, D. (2015). Lemmatisation of Old English Strong Verbs on a Lexical Database. PhD Dissertation, University of La Rioja, Spain.
  • Metola Rodríguez, D. (2017). Strong verb lemmas from a corpus of Old English. Advances and issues. Revista de Lingüística y Lenguas Aplicadas, 12, 65-76.
  • Mitchell, B. & Robinson, F. (1985). A Guide to Old English. Oxford: Blackwell.
  • Nivre, J. (2008). Treebanks. In A. Lüdeling & M. Kytö (Eds.), Corpus Linguistics. An International Handbook (Volume 1). Berlin: Mouton de Gruyter, 225-241.
  • Novo Urraca, C. (2015). Old English deadjectival paradigms. Productivity and recursivity. NOWELE-North-Western European Language Evolution, 68(1), 61-80.
  • Novo Urraca, C. (2016a). Old English suffixation. Content and transposition. English Studies, 97(6).
  • Novo Urraca, C. (2016b). Morphological relatedness and the typology of adjectival formation in Old English. Studia Neophilologica, 88(1).
  • O'Neill, P. P. (2001). King Alfred's Old English Prose Translation of the First Fifty