La perplejidad como herramienta para estimar la asignación de nivel de competencia en escritos de una lengua extranjera [Perplexity as a tool for estimating proficiency-level assignment in foreign-language writing]

  1. Mata, Gadea
  2. Rubio, Julio
  3. Agustín Llach, María del Pilar
  4. Heras, Jonathan
Journal: Procesamiento del lenguaje natural

ISSN: 1135-5948

Year of publication: 2023

Issue: 71

Pages: 29-38

Type: Article

Institutional repository: Open access (Editor)

Abstract

Assigning proficiency levels to utterances written by foreign language learners is a subjective task, so methods that automatically evaluate written sentences can help both students and teachers. In this work, we explore two approaches to this task using the CAES corpus, which contains written utterances by learners of Spanish labelled with CEFR levels (up to C1). The first approach is a deep learning model, Deep-ELE, which assigns proficiency levels to sentences. The second approach studies the perplexity of sentences written by students at different levels and then allocates levels to sentences based on that analysis. Both approaches have been evaluated, and the results confirm that they can be used to successfully classify written sentences into proficiency levels. In particular, the Deep-ELE model reaches an accuracy of 81.3% and a weighted Cohen's kappa of 0.83. In conclusion, this work is a step towards better understanding how natural language processing methods can help learners of a second language.
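To make the perplexity-based idea concrete, here is a minimal sketch in Python. It is not the authors' implementation: the add-one-smoothed unigram model, the nearest-mean allocation rule, the function names, and all numbers are assumptions chosen only for illustration. Perplexity is computed in the standard way, as the exponential of the negative average log-probability per word.

```python
import math
from collections import Counter


def train_unigram(corpus):
    """Count word frequencies over whitespace-tokenised sentences."""
    counts = Counter(w for s in corpus for w in s.lower().split())
    return counts, sum(counts.values())


def perplexity(sentence, counts, total):
    """Perplexity of a sentence under an add-one-smoothed unigram model.

    This toy model stands in for whatever language model is actually used;
    it is an assumption of this sketch.
    """
    words = sentence.lower().split()
    if not words:
        raise ValueError("sentence must contain at least one word")
    vocab = len(counts) + 1  # one extra slot reserves mass for unseen words
    log_prob = sum(math.log((counts[w] + 1) / (total + vocab)) for w in words)
    return math.exp(-log_prob / len(words))


def allocate_level(ppl, level_means):
    """Hypothetical rule: pick the level whose mean perplexity is closest."""
    return min(level_means, key=lambda lvl: abs(level_means[lvl] - ppl))


# Tiny made-up reference corpus (not taken from CAES).
counts, total = train_unigram(["a b", "a b"])

# Made-up per-level mean perplexities, for illustration only.
level_means = {"A1": 10.0, "B1": 30.0}

seen = perplexity("a", counts, total)    # word present in the corpus
unseen = perplexity("z", counts, total)  # word absent from the corpus
level = allocate_level(12.0, level_means)
```

In this sketch a sentence made of unseen words gets a higher perplexity than one made of frequent words, and a sentence is assigned to whichever level has the closest mean perplexity. A real system would estimate the per-level statistics from labelled learner data and would likely use a stronger language model.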
