Document Type: Research Paper
Author
Alzahra University
Abstract
Evaluating students' writing and assigning scores are traditionally time-intensive and inherently subjective tasks, often resulting in inconsistencies among human raters. Automated essay scoring systems were introduced to address these issues; however, their development has historically been resource-intensive, restricting their application to standardized tests such as the TOEFL and IELTS. Consequently, these systems were not readily accessible to educators and learners. Recent advancements in Artificial Intelligence (AI) have expanded the potential of automated scoring systems, enabling them to analyze written texts and assign scores with greater efficiency and versatility. This study compared the performance of an AI-based scoring system, DeepAI, with that of human evaluators. A quantitative approach, grounded in Corder's (1974) Error Analysis framework, was used to analyze approximately 200 essays written by Persian-speaking EFL learners. Paired-sample t-tests and Pearson correlation coefficients were used to assess the congruence between the errors identified and the scores assigned by the two methods. The findings revealed a moderate correlation between human and AI scores, with the AI system detecting a greater number of errors than the human raters. These results underscore the potential of AI to augment writing assessment practices and highlight its pedagogical implications for language instructors and learners, particularly in evaluating the essays of EFL students.
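For readers unfamiliar with the two statistical procedures named above, the comparison reduces to two aligned vectors of scores, one per rating method, with each index corresponding to the same essay. The following is a minimal, hypothetical sketch (not the study's code, data, or scoring scale) of how a paired-sample t-test and a Pearson correlation could be computed over such vectors with SciPy:

import numpy as np
from scipy.stats import ttest_rel, pearsonr

# Hypothetical scores for illustration only; each index is one essay,
# rated once by a human and once by the AI system.
human_scores = np.array([72, 65, 80, 58, 90, 77, 69, 84])
ai_scores = np.array([70, 68, 78, 61, 88, 80, 66, 86])

# Paired-sample t-test: does the mean score differ between the two
# rating methods for the same set of essays?
t_stat, t_p = ttest_rel(human_scores, ai_scores)

# Pearson correlation: how strongly do the two sets of scores co-vary?
r, r_p = pearsonr(human_scores, ai_scores)

print(f"paired t-test: t = {t_stat:.3f}, p = {t_p:.3f}")
print(f"Pearson correlation: r = {r:.3f}, p = {r_p:.3f}")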