Analyzing Dependability and Bias in WDCT and DSAT Using Many-Facet Rasch and G-Theory

Document Type: Research Paper

Authors

1 Faculty of Foreign Languages, Ilam University, Ilam, Iran

2 Vali-e-Asr University of Rafsanjan, Kerman, Iran

Abstract

Pragmatic testing depends on a variety of factors that can affect its dependability. This study examined these factors using a two-phase approach. The first phase used generalizability theory to estimate how test methods, items, raters, and test-taker characteristics contributed to the variance in pragmatic test scores; the second phase explored potential rater bias using the many-facet Rasch model. Two test types, a Written Discourse Completion Test (WDCT) and a Discourse Self-Assessment Test (DSAT), were administered to 110 English language students (98 female, 12 male) aged 17-24 at Vali-e-Asr University of Rafsanjan. Four raters scored the WDCT using a standardized rubric developed by Liu (2004); test takers self-assessed the DSAT against the same rubric. The findings revealed no significant difference between the WDCT and DSAT test types. However, items and the item-by-test-taker interaction emerged as substantial contributors to score variance, underscoring the importance of item calibration and rater training for mitigating bias in pragmatic testing. Finally, implications for pragmatic assessment are discussed.
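To make the first phase concrete, the sketch below estimates G-theory variance components and a dependability (Phi) coefficient for a simplified persons-by-items (p x i) design. It is a minimal Python illustration with hypothetical data, assuming a fully crossed design with one score per cell; the study's actual G-study additionally crossed raters and test methods, and such analyses are typically run in dedicated software such as EduG (Cardinet et al., 2011).

import numpy as np

def g_study_p_by_i(scores):
    # Variance components for a fully crossed persons x items (p x i)
    # G-study, from a scores matrix of shape (n_persons, n_items) with
    # one observation per cell, via ANOVA expected mean squares.
    n_p, n_i = scores.shape
    grand = scores.mean()
    ss_p = n_i * ((scores.mean(axis=1) - grand) ** 2).sum()
    ss_i = n_p * ((scores.mean(axis=0) - grand) ** 2).sum()
    ss_res = ((scores - grand) ** 2).sum() - ss_p - ss_i
    ms_p = ss_p / (n_p - 1)
    ms_i = ss_i / (n_i - 1)
    ms_res = ss_res / ((n_p - 1) * (n_i - 1))  # pi interaction + error
    var_pi_e = ms_res
    var_p = max((ms_p - ms_res) / n_i, 0.0)   # person (universe-score) variance
    var_i = max((ms_i - ms_res) / n_p, 0.0)   # item variance
    return var_p, var_i, var_pi_e

def phi(var_p, var_i, var_pi_e, n_items):
    # Dependability coefficient for absolute decisions in a D-study with
    # n_items items: item and interaction variance both count as error.
    return var_p / (var_p + (var_i + var_pi_e) / n_items)

# Hypothetical data: 110 test takers x 8 items scored on a 1-5 rubric scale
rng = np.random.default_rng(0)
scores = rng.normal(3.0, 0.8, size=(110, 8)).clip(1, 5)
vp, vi, vpe = g_study_p_by_i(scores)
print(f"var(p)={vp:.3f}, var(i)={vi:.3f}, var(pi,e)={vpe:.3f}")
print(f"Phi with 8 items: {phi(vp, vi, vpe, 8):.3f}")

For the second phase, the many-facet Rasch model in its common rating-scale form, log(P_nijk / P_nij(k-1)) = B_n - D_i - C_j - F_k, separates test-taker ability (B_n) from item difficulty (D_i), rater severity (C_j), and category thresholds (F_k); bias analyses of the kind reported here are usually estimated with specialized software such as FACETS rather than coded by hand.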

References

Ahn, R. C. (2005). Five measures of interlanguage pragmatics in KFL (Korean as a foreign language) learners. Unpublished doctoral dissertation, University of Hawaii at Manoa. https://www.proquest.com/openview/b77e6b2a157cc7f064eef80369123ad8/1?pq-origsite=gscholar&cbl=18750&diss=y
Akhavan Masoumi, G., & Sadeghi, K. (2020). Impact of test format on vocabulary test performance of EFL learners: The role of gender. Language Testing in Asia, 10(1), 2.
Alemi, M., & Rezanejad, A. (2014). Native and non-native English teachers' rating criteria and variation in the assessment of L2 pragmatic production: The speech act of compliment. Issues in Language Teaching, 3(1), 65-88. https://ilt.atu.ac.ir/article_1374.html
Anthony, C. J., Styck, K. M., Volpe, R. J., & Robert, C. R. (2023). Using many-facet Rasch measurement and generalizability theory to explore rater effects for direct behavior rating-multi-item scales. School Psychology, 38(2), 119-128. https://doi.org/10.1037/spq0000518
Azizi, Z., & Namaziandost, E. (2023). Implementing peer-dynamic assessment to cultivate Iranian EFL learners' interlanguage pragmatic competence: A mixed-methods approach. International Journal of Language Testing, 13(1), 18-43.
Bachman, L. F., & Palmer, A. S. (1982). The construct validation of some components of communicative proficiency. TESOL Quarterly, 16(4), 449-465. https://doi.org/10.2307/3586464
Bardovi-Harlig, K., & Hartford, B. S. (1993). Learning the rules of academic talk: A longitudinal study of pragmatic change. Studies in Second Language Acquisition, 15(3), 279-304. https://doi.org/10.1017/S0272263100012122
Bardovi-Harlig, K., & Su, Y. (2023). Developing an empirically-driven aural multiple-choice DCT for conventional expressions in L2 pragmatics. Applied Pragmatics, 5(1), 1-40. https://doi.org/10.1075/ap.20020.bar
Beltran, J. (2019). A meaning-based multiple-choice test of pragmatic knowledge: Does it work? Studies in Applied Linguistics and TESOL, 19(1), 42-71. https://doi.org/10.7916/salt.v19i1.1407
Billmyer, K., & Varghese, M. (2000). Investigating instrument-based pragmatic variability: Effects of enhancing discourse completion tests. Applied Linguistics, 21(4), 517-552. https://doi.org/10.1093/applin/21.4.517
Brown, A. (1995). The effect of rater variables in the development of an occupation-specific language performance test. Language Testing, 12(1), 1-15. https://doi.org/10.1177/026553229501200101
Brown, J. D. (2001). Six types of pragmatics tests in two different contexts. In K. Rose & G. Kasper (Eds.), Pragmatics in language teaching (pp. 301-325). New York: Cambridge University Press.
Brown, J. D. (2008). Raters, functions, item types, and the dependability of L2 pragmatics tests. In E. Alcón Soler & A. Martínez-Flor (Eds.), Investigating pragmatics in foreign language learning, teaching and testing (pp. 224-248). Clevedon, UK: Multilingual Matters.
Brown, J. D., & Ahn, R. C. (2011). Variables that affect the dependability of L2 pragmatics tests. Journal of Pragmatics, 43(1), 198-217. https://doi.org/10.1016/j.pragma.2010.07.026
Budeng, R. B., & Merza, H. N. M. (2023). Assessing interlanguage pragmatic competence on speech acts in a Filipino ESL context. Corpus Pragmatics, 7(2), 85-102. https://doi.org/10.1007/s41701-023-00137-y
Cardinet, J., Johnson, S., & Pini, G. (2011). Applying generalizability theory using EduG. Routledge.
Chapelle, C. A., Cotos, E., & Lee, J. (2015). Validity arguments for diagnostic assessment using automated writing evaluation. Language Testing, 32(3), 385-405. https://doi.org/10.1177/0265532214565386
Chen, Y. S., & Liu, J. (2016). Constructing a scale to assess L2 written speech act performance: WDCT and e-mail tasks. Language Assessment Quarterly, 13(3), 231-250. https://doi.org/10.1080/15434303.2016.1213844
Cohen, A. D. (2020). Considerations in assessing pragmatic appropriateness in spoken language. Language Teaching, 53(2), 183-202.
Cordier, R., Munro, N., Wilkes-Gillan, S., Speyer, R., Parsons, L., & Joosten, A. (2019). Applying Item Response Theory (IRT) modeling to an observational measure of childhood pragmatics: The Pragmatics Observational Measure-2. Frontiers in Psychology, 10, 408. https://www.frontiersin.org/journals/psychology/articles/10.3389/fpsyg.2019.00408/full
Dabbagh, A., & Babaii, E. (2021). L1 pragmatic cultural schema and pragmatic assessment: Variations in non-native teachers' scoring criteria. TESL-EJ, 25(1), 1-17. https://eric.ed.gov/?id=EJ1302438
Derakhshan, A., Shakki, F., & Sarani, M. A. (2020). The effect of dynamic and non-dynamic assessment on the comprehension of Iranian intermediate EFL learners' speech acts of apology and request. Language Related Research, 11(4), 605-637. https://lrr.modares.ac.ir/article-14-40648-en.html
Engelhard Jr, G., & Wind, S. (2017). Invariant measurement with raters and rating scales: Rasch models for rater-mediated assessments. Routledge.
Enochs, K., & Yoshitake-Strain, S. (1999). Evaluating six measures of EFL learners' pragmatic competence. JALT Journal, 21(1), 29-50. https://files.eric.ed.gov/fulltext/ED451718.pdf#page=32
Farashaiyan, A., Sahragard, R., Muthusamy, P., & Muniandy, R. (2020). Questionnaire development and validation of interlanguage pragmatic instructional approaches & techniques in EFL contexts. International Journal of Higher Education, 9(2), 330-342. https://eric.ed.gov/?id=EJ1255710
Fussman, S., & Mashal, N. (2022). Initial validation for the Assessment of Pragmatic Abilities and Cognitive Substrates (APACS) Hebrew battery in adolescents and young adults with typical development. Frontiers in Communication, 6, 758384. https://doi.org/10.3389/fcomm.2021.758384
Gordon, R. A., Peng, F., Curby, T. W., & Zinsser, K. M. (2021). An introduction to the many-facet Rasch model as a method to improve observational quality measures with an application to measuring the teaching of emotion skills. Early Childhood Research Quarterly, 55, 149-164. https://doi.org/10.1016/j.ecresq.2020.11.005
Grabowski, K. (2008). Measuring pragmatic knowledge: Issues of construct underrepresentation or labeling. Language Assessment Quarterly, 5, 154-159. https://doi.org/10.1080/15434300801934736
Han, C. (2021). Detecting and measuring rater effects in interpreting assessment: A methodological comparison of classical test theory, generalizability theory, and many-facet Rasch measurement. In Testing and assessment of interpreting: Recent developments in China (pp. 85-113). Springer.
Hernández, T. A. (2018). L2 Spanish apologies development during short-term study abroad. Studies in Second Language Learning and Teaching, 8(3), 599-620. https://www.ceeol.com/search/article-detail?id=690173
Hernández, T. A. (2021). Explicit instruction for the development of L2 Spanish pragmatic ability during study abroad. System, 96, 102395. https://doi.org/10.1016/j.system.2020.102395
Hernández, T. A., & Boero, P. (2018). Explicit intervention for Spanish pragmatic development during short-term study abroad: An examination of learner request production and cognition. Foreign Language Annals, 51(2), 389-410. https://doi.org/10.1111/flan.12334
Hudson, T., Detmer, E., & Brown, J. D. (1992). A framework for testing cross-cultural pragmatics (Vol. 2). Honolulu: National Foreign Language Resource Center, University of Hawai'i at Manoa.
Iramaneerat, C., Yudkowsky, R., Myford, C. M., & Downing, S. M. (2008). Quality control of an OSCE using generalizability theory and many-faceted Rasch measurement. Advances in Health Sciences Education, 13, 479-493.
Kang, O., Rubin, D., & Kermad, A. (2019). The effect of training and rater differences on oral proficiency assessment. Language Testing, 36(4), 481-504. https://doi.org/10.1177/0265532219849522
Karami, H. (2012). The relative impact of persons, items, subtests, and academic background on performance on a language proficiency test. Psychological Test and Assessment Modeling, 54(3), 211. https://ptam-journal.com/wp-content/uploads/2025/01/04_Ravand_.pdf
Kecskes, I. (2014). Intercultural pragmatics. Oxford: Oxford University Press.
Khodi, A. (2021). The affectability of writing assessment scores: A G-theory analysis of rater, task, and scoring method contribution. Language Testing in Asia, 11(1), 30. https://doi.org/10.1186/s40468-021-00134-5
Kumar, D., Jaipurkar, R., Shekhar, A., Sikri, G., & Srinivas, V. (2021). Item analysis of multiple choice questions: A quality assurance test for an assessment tool. Medical Journal Armed Forces India, 77, S85-S89.
Li, G., Pan, Y., & Wang, W. (2021). Using generalizability theory and many-facet Rasch model to evaluate in-basket tests for managerial positions. Frontiers in Psychology, 12, 660553. https://doi.org/10.3389/fpsyg.2021.660553
Li, S., Li, X., Feng, Y., & Wen, T. (2023). Non-expert raters' scoring behavior and cognition in assessing pragmatic production in L2 Chinese. In Crossing boundaries in researching, understanding, and improving language education: Essays in honor of G. Richard Tucker (pp. 79-102). Cham: Springer International Publishing. https://link.springer.com/chapter/10.1007/978-3-031-24078-2_4
Li, S., Taguchi, N., & Xiao, F. (2019). Variations in rating scale functioning in assessing pragmatic performance in L2 Chinese. Language Assessment Quarterly, 16(3), 271-293. https://doi.org/10.1080/15434303.2019.1648473
Li, S., Wen, T., Li, X., Feng, Y., & Lin, C. (2023). Comparing holistic and analytic marking methods in assessing speech act production in L2 Chinese. Language Testing, 40(2), 249-275. https://doi.org/10.1177/026553222211139
Liu, J. (2004). Measuring interlanguage pragmatic knowledge of Chinese EFL learners (Doctoral dissertation, City University of Hong Kong). https://www.peterlang.com/document/1100119
Liu, J. (2007). Comparing native and non-native speakers' scoring in an interlanguage pragmatics test. Modern Foreign Languages, 30(4), 395-404.
Liu, J., & Xie, L. (2014). Examining rater effects in a WDCT pragmatics test. Iranian Journal of Language Testing, 4(1), 50-65. https://www.ijlt.ir/article_114393.html
Lozano-Ruiz, A., Fasfous, A. F., Ibanez-Casas, I., Cruz-Quintana, F., Perez-Garcia, M., & Pérez-Marfil, M. N. (2021). Cultural bias in intelligence assessment using a culture-free test in Moroccan children. Archives of Clinical Neuropsychology, 36(8), 1502-1510.
Lynch, B. K., & McNamara, T. F. (1998). Using G-theory and many-facet Rasch measurement in the development of performance assessments of the ESL speaking skills of immigrants. Language Testing, 15(2), 158-180. https://doi.org/10.1177/026553229801500202
Mohammad Hosseinpur, R., Bagheri Nevisi, R., & Lowni, A. (2021). A tale of four measures of pragmatic knowledge in an EFL institutional context. Pragmatics, 31(1), 114-143. https://doi.org/10.1075/prag.18052.moh
Namaziandost, E., Nasri, M., Rahimi Esfahani, F., Neisi, L., & Ahmadpour KarimAbadi, F. (2020). A cultural comparison of Persian and English short stories regarding the use of emotive words: Implications for teaching English to Iranian young learners. Asian-Pacific Journal of Second and Foreign Language Education, 5(1), 7. https://doi.org/10.1186/s40862-020-00085-z
Neiriz, R. (2023). Developing and evaluating a contextualized interactional competence rating scale based on a metaphorical conceptualization: A pragmatic mixed-method approach. Journal of Second Language Studies, 6(1), 61-94. https://doi.org/10.1075/jsls.22003.nei
Reynolds, C. R., Altmann, R. A., & Allen, D. N. (2021). The problem of bias in psychological assessment. In Mastering modern psychological testing: Theory and methods (pp. 573-613). Cham: Springer International Publishing.
Richard, P. J., Devinney, T. M., Yip, G. S., & Johnson, G. (2009). Measuring organizational performance: Towards methodological best practice. Journal of Management, 35(3), 718-804. https://doi.org/10.1177/0149206308330560
Roever, C. (2008). Rater, item, and candidate effects in discourse completion tests: A FACETS approach. In E. Alcón Soler & A. Martínez-Flor (Eds.), Investigating pragmatics in foreign language learning, teaching and testing (pp. 249-266). Clevedon, UK: Multilingual Matters. https://books.google.nl/books
Roever, C. (2011). Testing of second language pragmatics: Past and future. Language Testing, 28(4), 463-481. https://doi.org/10.1177/0265532210394633
Roever, C. (2013). Testing implicature under operational conditions. In S. Ross & G. Kasper (Eds.), Assessing second language pragmatics (pp. 43-64). London: Palgrave Macmillan. https://link.springer.com/chapter/10.1057/9781137003522_2
Rose, K. R. (1992). Speech acts and questionnaires: The effect of hearer response. Journal of Pragmatics, 17(1), 49-62. https://doi.org/10.1016/0378-2166(92)90028-A
Rose, K. R. (1994). On the validity of discourse completion tests in non-Western contexts. Applied Linguistics, 15(1), 1-14. https://doi.org/10.1093/applin/15.1.1
Rose, K. R., & Ng, C. (2001). Inductive and deductive teaching of compliments and compliment responses. In K. R. Rose & G. Kasper (Eds.), Pragmatics in language teaching (pp. 145-170). New York: Cambridge University Press. https://www.researchgate.net/publication/265288342_Inductive_and_deductive_approaches_to_teaching_compliments_and_compliment_responses
Rose, K. R., & Ono, R. (1995). Eliciting speech act data in Japanese: The effect of questionnaire type. Language Learning, 45(2), 191-223. https://doi.org/10.1111/j.1467-1770.1995.tb00438.x
Rossi, O., & Brunfaut, T. (2020). Raters of subjectively-scored tests. In The TESOL encyclopedia of English language teaching (pp. 1-7). Wiley. https://doi.org/10.1002/9781118784235.eelt0985
Saleem, A., Saleem, T., & Aziz, A. (2022). A pragmatic study of congratulation strategies of Pakistani ESL learners and British English speakers. Asian-Pacific Journal of Second and Foreign Language Education, 7(1), 8. https://doi.org/10.1186/s40862-022-00134-9
Shahi, R., Ravand, H., & Rohani, G. R. (2025). Examining the effect of item difficulty and rater leniency on Iranian test takers' performance on WDCT and DSAT: A comparative study. International Journal of Language Testing, 15(1), 1-19. https://doi.org/10.22034/ijlt.2024.454478.1341
Sitorus, T. A. P., Siregar, D. Y., Aulia, D. N., Zahra, N. A., Parinduri, A. I., Lubis, D. N. A., & Wardiah, F. D. (2025). A systematic review of pragmatic competence in second language acquisition. Sintaksis: Publikasi Para Ahli Bahasa dan Sastra Inggris, 3(1), 142-152. https://doi.org/10.61132/sintaksis.v3i1.1291
Sonnenburg-Winkler, S. L., Eslami, Z. R., & Derakhshan, A. (2020). Rater variation in pragmatic assessment: The impact of the linguistic background on peer-assessment and self-assessment. Lodz Papers in Pragmatics, 16(1), 67-85. https://doi.org/10.1515/lpp-2020-0004
Steyer, R. (2001). Classical (psychometric) test theory. In International encyclopedia of the social & behavioral sciences (pp. 1955-1962). Elsevier. https://doi.org/10.1016/B0-08-043076-7/00721-X
Su, Y., & Shin, S. Y. (2024). Comparing two formats of data-driven rating scales for classroom assessment of pragmatic performance with role-plays. Language Testing, 41(2), 357-383. https://doi.org/10.1177/02655322231210217
Sydorenko, T., Maynard, C., & Guntly, E. (2014). Rater behavior when judging language learners' pragmatic appropriateness in extended discourse. TESL Canada Journal, 32(1), 19-41. https://doi.org/10.18806/tesl.v32i1.1197
Taguchi, N. (2011). Rater variation in the assessment of speech acts. Pragmatics, 21(3), 453-471. https://doi.org/10.1075/prag.21.3.08tag
Taguchi, N., & Li, S. (2020). Contrastive pragmatics and second language (L2) pragmatics: Approaches to assessing L2 speech act production. Contrastive Pragmatics, 2(1), 1-23. https://brill.com/view/journals/jocp/2/1/article-p1_1.xml
Tajeddin, Z., & Alemi, M. (2014). Pragmatic rater training: Does it affect non-native L2 teachers' rating accuracy and bias? International Journal of Language Testing, 4(1), 66-83. https://www.ijlt.ir/article_114394.html
Tajeddin, Z., Alemi, M., & Khanlarzadeh, N. (2020). Rating criteria and norms for pragmatic assessment in the context of EIL. In Pragmatics pedagogy in English as an international language (pp. 212-231). Routledge.
Timpe-Laughlin, V., & Choi, I. (2017). Exploring the validity of a second language intercultural pragmatics assessment tool. Language Assessment Quarterly, 14(1), 19-35. https://doi.org/10.1080/15434303.2016.1256406
Toe, D., Mood, D., Most, T., Walker, E., & Tucci, S. (2020). The assessment of pragmatic skills in young deaf and hard-of-hearing children. Pediatrics, 146(Supplement 3), S284-S291.
Walters, F. S. (2007). A conversation-analytic hermeneutic rating protocol to assess L2 oral pragmatic competence. Language Testing, 24(2), 155-183. https://doi.org/10.1177/0265532207076362
Wilson, A. C., & Bishop, D. V. (2022). A novel online assessment of pragmatic and core language skills: An attempt to tease apart language domains in children. Journal of Child Language, 49(1), 38-59.
Wolcott, M. D., Olsen, A. A., & Augustine, J. M. (2022). Item response theory in high-stakes pharmacy assessments. Currents in Pharmacy Teaching and Learning, 14(9), 1206-1214.
Xu, L., & Wannaruk, A. (2018). Reliability and validity of WDCT in testing interlanguage pragmatic competence for EFL learners. Journal of Language Teaching and Research, 6(6), 1206-1215. https://doi.org/10.17507/jltr.0606.07
Yang, H. (2022). Second language learners' competence of and beliefs about pragmatic comprehension: Insights from the Chinese EFL context. Frontiers in Psychology, 12, 801315. https://doi.org/10.3389/fpsyg.2021.801315
Yamashita, S. O. (1996). Comparing six cross-cultural pragmatics measures. Unpublished doctoral dissertation, Temple University, Philadelphia, PA. https://www.proquest.com/openview/a45390785a21b1a799ba10f4e346bced/1?pq-origsite=gscholar&cbl=18750&diss=y
Yamashita, S. O. (1997). Self-assessment and role play methods of measuring cross-cultural pragmatics. Pragmatics and Language Learning, 8(1), 129-162. https://scholarspace.manoa.hawaii.edu/collections/abc81e47-c948-4d15-9284-783942d637cd
Youn, S. J. (2007). Rater bias in assessing the pragmatics of KFL learners using facets analysis. Second Language Studies, 26(1), 85-163. http://hdl.handle.net/10125/40691
Youn, S. J. (2015). Validity argument for assessing L2 pragmatics in interaction using mixed methods. Language Testing, 32(2), 199-225. https://doi.org/10.1177/026553221455711
Youn, S. J. (2020). Interactional features of L2 pragmatic interaction in role-play speaking assessment. TESOL Quarterly, 54(1), 201-233. https://doi.org/10.1002/tesq.542
Youn, S. J., & Bi, N. Z. (2019). Investigating test-takers' strategy use in task-based L2 pragmatic speaking assessment. Intercultural Pragmatics, 16(2), 185-218.
Youn, S. J., & Brown, J. D. (2013). Item difficulty and heritage language learner status in pragmatic tests for Korean as a foreign language. In S. Ross & G. Kasper (Eds.), Assessing second language pragmatics (pp. 98-123). London: Palgrave Macmillan. https://link.springer.com/chapter/10.1057/9781137003522_4
Zangoei, A., & Derakhshan, A. (2021). Measuring the predictability of Iranian EFL students' pragmatic listening comprehension with language proficiency, self-regulated learning in listening, and willingness to communicate. Journal of Applied Linguistics and Applied Literature: Dynamics and Advances, 9(2), 79-104.
Zhai, X., Haudek, K. C., Wilson, C. H., & Stuhlsatz, M. (2021). A framework of construct-irrelevant variance for contextualized constructed response assessment. Frontiers in Education, 6, 751283. https://doi.org/10.3389/feduc.2021.751283