Analyzing Dependability and Bias in WDCT and DSAT Using Many-Facet Rasch and G-Theory

Document Type: Research Paper

Authors

1 Faculty of Foreign Languages, Ilam University, Ilam, Iran

2 Vali-e-Asr University of Rafsanjan, Kerman, Iran

Abstract

Pragmatic testing depends on a variety of factors that can affect its dependability. This study examined these factors using a two-phase approach. The first phase used generalizability theory to estimate how test methods, items, raters, and test-taker characteristics contributed to the variance in pragmatic test scores; the second phase explored potential rater bias using the many-facet Rasch model. Two test types, a Written Discourse Completion Test (WDCT) and a Discourse Self-Assessment Test (DSAT), were administered to 110 English language students (98 female, 12 male) aged 17-24 at Vali-e-Asr University of Rafsanjan. Four raters scored the WDCT using a standardized rubric developed by Liu (2004); test takers self-assessed the DSAT against the same rubric. The findings revealed no significant difference between the WDCT and DSAT test types. However, items and the item-by-test-taker interaction emerged as substantial contributors to score variance, underscoring the importance of item calibration and rater training for mitigating bias in pragmatic testing. Finally, implications for pragmatic assessment are discussed.
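To make the first phase concrete, the sketch below estimates G-theory variance components and a dependability (Phi) coefficient for a simplified persons-by-items (p x i) design. It is a minimal Python illustration with hypothetical data, assuming a fully crossed design with one score per cell; the study's actual G-study additionally crossed raters and test methods, and such analyses are typically run in dedicated software such as EduG (Cardinet et al., 2011).

import numpy as np

def g_study_p_by_i(scores):
    # Variance components for a fully crossed persons x items (p x i)
    # G-study, from a scores matrix of shape (n_persons, n_items) with
    # one observation per cell, via ANOVA expected mean squares.
    n_p, n_i = scores.shape
    grand = scores.mean()
    ss_p = n_i * ((scores.mean(axis=1) - grand) ** 2).sum()
    ss_i = n_p * ((scores.mean(axis=0) - grand) ** 2).sum()
    ss_res = ((scores - grand) ** 2).sum() - ss_p - ss_i
    ms_p = ss_p / (n_p - 1)
    ms_i = ss_i / (n_i - 1)
    ms_res = ss_res / ((n_p - 1) * (n_i - 1))  # pi interaction + error
    var_pi_e = ms_res
    var_p = max((ms_p - ms_res) / n_i, 0.0)   # person (universe-score) variance
    var_i = max((ms_i - ms_res) / n_p, 0.0)   # item variance
    return var_p, var_i, var_pi_e

def phi(var_p, var_i, var_pi_e, n_items):
    # Dependability coefficient for absolute decisions in a D-study with
    # n_items items: item and interaction variance both count as error.
    return var_p / (var_p + (var_i + var_pi_e) / n_items)

# Hypothetical data: 110 test takers x 8 items scored on a 1-5 rubric scale
rng = np.random.default_rng(0)
scores = rng.normal(3.0, 0.8, size=(110, 8)).clip(1, 5)
vp, vi, vpe = g_study_p_by_i(scores)
print(f"var(p)={vp:.3f}, var(i)={vi:.3f}, var(pi,e)={vpe:.3f}")
print(f"Phi with 8 items: {phi(vp, vi, vpe, 8):.3f}")

For the second phase, the many-facet Rasch model in its common rating-scale form, log(P_nijk / P_nij(k-1)) = B_n - D_i - C_j - F_k, separates test-taker ability (B_n) from item difficulty (D_i), rater severity (C_j), and category thresholds (F_k); bias analyses of the kind reported here are usually estimated with specialized software such as FACETS rather than coded by hand.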

References

Ahn, R. C. (2005). Five measures of interlanguage pragmatics in KFL (Korean as a foreign language) learners. Unpublished doctoral dissertation, University of Hawaii at Manoa. https://www.proquest.com/openview/b77e6b2a157cc7f064eef80369123ad8/1?pq-origsite=gscholar&cbl=18750&diss=y
Akhavan Masoumi, G., & Sadeghi, K. (2020). Impact of test format on vocabulary test performance of EFL learners: The role of gender. Language Testing in Asia, 10(1), 2.
Alemi, M., & Rezanejad, A. (2014). Native and non-native English teachers' rating criteria and variation in the assessment of L2 pragmatic production: The speech act of compliment. Issues in Language Teaching, 3(1), 65-88. https://ilt.atu.ac.ir/article_1374.html
Anthony, C. J., Styck, K. M., Volpe, R. J., & Robert, C. R. (2023). Using many-facet Rasch measurement and generalizability theory to explore rater effects for direct behavior rating-multi-item scales. School Psychology, 38(2), 119-128. https://doi.org/10.1037/spq0000518
Azizi, Z., & Namaziandost, E. (2023). Implementing peer-dynamic assessment to cultivate Iranian EFL learners' interlanguage pragmatic competence: A mixed-methods approach. International Journal of Language Testing, 13(1), 18-43.
Bachman, L. F., & Palmer, A. S. (1982). The construct validation of some components of communicative proficiency. TESOL Quarterly, 16(4), 449-465. https://doi.org/10.2307/3586464
Bardovi-Harlig, K., & Hartford, B. S. (1993). Learning the rules of academic talk: A longitudinal study of pragmatic change. Studies in Second Language Acquisition, 15(3), 279-304. https://doi.org/10.1017/S0272263100012122
Bardovi-Harlig, K., & Su, Y. (2023). Developing an empirically-driven aural multiple-choice DCT for conventional expressions in L2 pragmatics. Applied Pragmatics, 5(1), 1-40. https://doi.org/10.1075/ap.20020.bar
Beltran, J. (2019). A meaning-based multiple-choice test of pragmatic knowledge: Does it work? Studies in Applied Linguistics and TESOL, 19(1), 42-71. https://doi.org/10.7916/salt.v19i1.1407
Billmyer, K., & Varghese, M. (2000). Investigating instrument-based pragmatic variability: Effects of enhancing discourse completion tests. Applied Linguistics, 21(4), 517-552. https://doi.org/10.1093/applin/21.4.517
Brown, A. (1995). The effect of rater variables in the development of an occupation-specific language performance test. Language Testing, 12(1), 1-15. https://doi.org/10.1177/026553229501200101
Brown, J. D. (2001). Six types of pragmatics tests in two different contexts. In K. Rose & G. Kasper (Eds.), Pragmatics in language teaching (pp. 301-325). New York: Cambridge University Press.
Brown, J. D. (2008). Raters, functions, item types, and the dependability of L2 pragmatics tests. In E. Alcón Soler & A. Martínez-Flor (Eds.), Investigating pragmatics in foreign language learning, teaching and testing (pp. 224-248). Clevedon, UK: Multilingual Matters.
Brown, J. D., & Ahn, R. C. (2011). Variables that affect the dependability of L2 pragmatics tests. Journal of Pragmatics, 43(1), 198-217. https://doi.org/10.1016/j.pragma.2010.07.026
Budeng, R. B., & Merza, H. N. M. (2023). Assessing interlanguage pragmatic competence on speech acts in a Filipino ESL context. Corpus Pragmatics, 7(2), 85-102. https://doi.org/10.1007/s41701-023-00137-y
Cardinet, J., Johnson, S., & Pini, G. (2011). Applying generalizability theory using EduG. Routledge.
Chapelle, C. A., Cotos, E., & Lee, J. (2015). Validity arguments for diagnostic assessment using automated writing evaluation. Language Testing, 32(3), 385-405. https://doi.org/10.1177/0265532214565386
Chen, Y. S., & Liu, J. (2016). Constructing a scale to assess L2 written speech act performance: WDCT and e-mail tasks. Language Assessment Quarterly, 13(3), 231-250. https://doi.org/10.1080/15434303.2016.1213844
Cohen, A. D. (2020). Considerations in assessing pragmatic appropriateness in spoken language. Language Teaching, 53(2), 183-202.
Cordier, R., Munro, N., Wilkes-Gillan, S., Speyer, R., Parsons, L., & Joosten, A. (2019). Applying Item Response Theory (IRT) modeling to an observational measure of childhood pragmatics: The Pragmatics Observational Measure-2. Frontiers in Psychology, 10, 408. https://www.frontiersin.org/journals/psychology/articles/10.3389/fpsyg.2019.00408/full
Dabbagh, A., & Babaii, E. (2021). L1 pragmatic cultural schema and pragmatic assessment: Variations in non-native teachers' scoring criteria. TESL-EJ, 25(1), 1-17. https://eric.ed.gov/?id=EJ1302438
Derakhshan, A., Shakki, F., & Sarani, M. A. (2020). The effect of dynamic and non-dynamic assessment on the comprehension of Iranian intermediate EFL learners' speech acts of apology and request. Language Related Research, 11(4), 605-637. https://lrr.modares.ac.ir/article-14-40648-en.html
Engelhard Jr, G., & Wind, S. (2017). Invariant measurement with raters and rating scales: Rasch models for rater-mediated assessments. Routledge.
Enochs, K., & Yoshitake-Strain, S. (1999). Evaluating six measures of EFL learners' pragmatic competence. JALT Journal, 21(1), 29-50. https://files.eric.ed.gov/fulltext/ED451718.pdf#page=32
Farashaiyan, A., Sahragard, R., Muthusamy, P., & Muniandy, R. (2020). Questionnaire development and validation of interlanguage pragmatic instructional approaches & techniques in EFL contexts. International Journal of Higher Education, 9(2), 330-342. https://eric.ed.gov/?id=EJ1255710
Fussman, S., & Mashal, N. (2022). Initial validation for the Assessment of Pragmatic Abilities and Cognitive Substrates (APACS) Hebrew battery in adolescents and young adults with typical development. Frontiers in Communication, 6, 758384. https://doi.org/10.3389/fcomm.2021.758384
Gordon, R. A., Peng, F., Curby, T. W., & Zinsser, K. M. (2021). An introduction to the many-facet Rasch model as a method to improve observational quality measures with an application to measuring the teaching of emotion skills. Early Childhood Research Quarterly, 55, 149-164. https://doi.org/10.1016/j.ecresq.2020.11.005
Grabowski, K. (2008). Measuring pragmatic knowledge: Issues of construct underrepresentation or labeling. Language Assessment Quarterly, 5, 154-159. https://doi.org/10.1080/15434300801934736
Han, C. (2021). Detecting and measuring rater effects in interpreting assessment: A methodological comparison of classical test theory, generalizability theory, and many-facet Rasch measurement. In Testing and assessment of interpreting: Recent developments in China (pp. 85-113). Springer.
Hernández, T. A. (2018). L2 Spanish apologies development during short-term study abroad. Studies in Second Language Learning and Teaching, 8(3), 599-620. https://www.ceeol.com/search/article-detail?id=690173
Hernández, T. A. (2021). Explicit instruction for the development of L2 Spanish pragmatic ability during study abroad. System, 96, 102395. https://doi.org/10.1016/j.system.2020.102395
Hernández, T. A., & Boero, P. (2018). Explicit intervention for Spanish pragmatic development during short-term study abroad: An examination of learner request production and cognition. Foreign Language Annals, 51(2), 389-410. https://doi.org/10.1111/flan.12334
Hudson, T., Detmer, E., & Brown, J. D. (1992). A framework for testing cross-cultural pragmatics (Vol. 2). Honolulu: National Foreign Language Resource Center, University of Hawai'i at Manoa.
Iramaneerat, C., Yudkowsky, R., Myford, C. M., & Downing, S. M. (2008). Quality control of an OSCE using generalizability theory and many-faceted Rasch measurement. Advances in Health Sciences Education, 13, 479-493.
Kang, O., Rubin, D., & Kermad, A. (2019). The effect of training and rater differences on oral proficiency assessment. Language Testing, 36(4), 481-504. https://doi.org/10.1177/0265532219849522
Karami, H. (2012). The relative impact of persons, items, subtests, and academic background on performance on a language proficiency test. Psychological Test and Assessment Modeling, 54(3), 211. https://ptam-journal.com/wp-content/uploads/2025/01/04_Ravand_.pdf
Kecskes, I. (2014). Intercultural pragmatics. Oxford: Oxford University Press.
Khodi, A. (2021). The affectability of writing assessment scores: A G-theory analysis of rater, task, and scoring method contribution. Language Testing in Asia, 11(1), 30. https://doi.org/10.1186/s40468-021-00134-5
Kumar, D., Jaipurkar, R., Shekhar, A., Sikri, G., & Srinivas, V. (2021). Item analysis of multiple choice questions: A quality assurance test for an assessment tool. Medical Journal Armed Forces India, 77, S85-S89.
Li, G., Pan, Y., & Wang, W. (2021). Using generalizability theory and many-facet Rasch model to evaluate in-basket tests for managerial positions. Frontiers in Psychology, 12, 660553. https://doi.org/10.3389/fpsyg.2021.660553
Li, S., Li, X., Feng, Y., & Wen, T. (2023). Non-expert raters' scoring behavior and cognition in assessing pragmatic production in L2 Chinese. In Crossing boundaries in researching, understanding, and improving language education: Essays in honor of G. Richard Tucker (pp. 79-102). Cham: Springer International Publishing. https://link.springer.com/chapter/10.1007/978-3-031-24078-2_4
Li, S., Taguchi, N., & Xiao, F. (2019). Variations in rating scale functioning in assessing pragmatic performance in L2 Chinese. Language Assessment Quarterly, 16(3), 271-293. https://doi.org/10.1080/15434303.2019.1648473
Li, S., Wen, T., Li, X., Feng, Y., & Lin, C. (2023). Comparing holistic and analytic marking methods in assessing speech act production in L2 Chinese. Language Testing, 40(2), 249-275. https://doi.org/10.1177/026553222211139
Liu, J. (2004). Measuring interlanguage pragmatic knowledge of Chinese EFL learners (Doctoral dissertation, City University of Hong Kong). https://www.peterlang.com/document/1100119
Liu, J. (2007). Comparing native and non-native speakers' scoring in an interlanguage pragmatics test. Modern Foreign Languages, 30(4), 395-404.
Liu, J., & Xie, L. (2014). Examining rater effects in a WDCT pragmatics test. Iranian Journal of Language Testing, 4(1), 50-65. https://www.ijlt.ir/article_114393.html
Lozano-Ruiz, A., Fasfous, A. F., Ibanez-Casas, I., Cruz-Quintana, F., Perez-Garcia, M., & Pérez-Marfil, M. N. (2021). Cultural bias in intelligence assessment using a culture-free test in Moroccan children. Archives of Clinical Neuropsychology, 36(8), 1502-1510.
Lynch, B. K., & McNamara, T. F. (1998). Using G-theory and many-facet Rasch measurement in the development of performance assessments of the ESL speaking skills of immigrants. Language Testing, 15(2), 158-180. https://doi.org/10.1177/026553229801500202
Mohammad Hosseinpur, R., Bagheri Nevisi, R., & Lowni, A. (2021). A tale of four measures of pragmatic knowledge in an EFL institutional context. Pragmatics, 31(1), 114-143. https://doi.org/10.1075/prag.18052.moh
Namaziandost, E., Nasri, M., Rahimi Esfahani, F., Neisi, L., & Ahmadpour KarimAbadi, F. (2020). A cultural comparison of Persian and English short stories regarding the use of emotive words: Implications for teaching English to Iranian young learners. Asian-Pacific Journal of Second and Foreign Language Education, 5(1), 7. https://doi.org/10.1186/s40862-020-00085-z
Neiriz, R. (2023). Developing and evaluating a contextualized interactional competence rating scale based on a metaphorical conceptualization: A pragmatic mixed-method approach. Journal of Second Language Studies, 6(1), 61-94. https://doi.org/10.1075/jsls.22003.nei
Reynolds, C. R., Altmann, R. A., & Allen, D. N. (2021). The problem of bias in psychological assessment. In Mastering modern psychological testing: Theory and methods (pp. 573-613). Cham: Springer International Publishing.
Richard, P. J., Devinney, T. M., Yip, G. S., & Johnson, G. (2009). Measuring organizational performance: Towards methodological best practice. Journal of Management, 35(3), 718-804. https://doi.org/10.1177/0149206308330560
Roever, C. (2008). Rater, item, and candidate effects in discourse completion tests: A FACETS approach. In E. Alcón Soler & A. Martínez-Flor (Eds.), Investigating pragmatics in foreign language learning, teaching and testing (pp. 249-266). Clevedon, UK: Multilingual Matters. https://books.google.nl/books
Roever, C. (2011). Testing of second language pragmatics: Past and future. Language Testing, 28(4), 463-481. https://doi.org/10.1177/0265532210394633
Roever, C. (2013). Testing implicature under operational conditions. In S. Ross & G. Kasper (Eds.), Assessing second language pragmatics (pp. 43-64). London: Palgrave Macmillan. https://link.springer.com/chapter/10.1057/9781137003522_2
Rose, K. R. (1992). Speech acts and questionnaires: The effect of hearer response. Journal of Pragmatics, 17(1), 49-62. https://doi.org/10.1016/0378-2166(92)90028-A
Rose, K. R. (1994). On the validity of discourse completion tests in non-Western contexts. Applied Linguistics, 15(1), 1-14. https://doi.org/10.1093/applin/15.1.1
Rose, K. R., & Ng, C. (2001). Inductive and deductive teaching of compliments and compliment responses. In K. R. Rose & G. Kasper (Eds.), Pragmatics in language teaching (pp. 145-170). New York: Cambridge University Press. https://www.researchgate.net/publication/265288342_Inductive_and_deductive_approaches_to_teaching_compliments_and_compliment_responses
Rose, K. R., & Ono, R. (1995). Eliciting speech act data in Japanese: The effect of questionnaire type. Language Learning, 45(2), 191-223. https://doi.org/10.1111/j.1467-1770.1995.tb00438.x
Rossi, O., & Brunfaut, T. (2020). Raters of subjectively-scored tests. In The TESOL encyclopedia of English language teaching (pp. 1-7). Wiley. https://doi.org/10.1002/9781118784235.eelt0985
Saleem, A., Saleem, T., & Aziz, A. (2022). A pragmatic study of congratulation strategies of Pakistani ESL learners and British English speakers. Asian-Pacific Journal of Second and Foreign Language Education, 7(1), 8. https://doi.org/10.1186/s40862-022-00134-9
Shahi, R., Ravand, H., & Rohani, G. R. (2025). Examining the effect of item difficulty and rater leniency on Iranian test takers' performance on WDCT and DSAT: A comparative study. International Journal of Language Testing, 15(1), 1-19. https://doi.org/10.22034/ijlt.2024.454478.1341
Sitorus, T. A. P., Siregar, D. Y., Aulia, D. N., Zahra, N. A., Parinduri, A. I., Lubis, D. N. A., & Wardiah, F. D. (2025). A systematic review of pragmatic competence in second language acquisition. Sintaksis: Publikasi Para Ahli Bahasa dan Sastra Inggris, 3(1), 142-152. https://doi.org/10.61132/sintaksis.v3i1.1291
Sonnenburg-Winkler, S. L., Eslami, Z. R., & Derakhshan, A. (2020). Rater variation in pragmatic assessment: The impact of the linguistic background on peer-assessment and self-assessment. Lodz Papers in Pragmatics, 16(1), 67-85. https://doi.org/10.1515/lpp-2020-0004
Steyer, R. (2001). Classical (psychometric) test theory. In International encyclopedia of the social & behavioral sciences (pp. 1955-1962). Elsevier. https://doi.org/10.1016/B0-08-043076-7/00721-X
Su, Y., & Shin, S. Y. (2024). Comparing two formats of data-driven rating scales for classroom assessment of pragmatic performance with role-plays. Language Testing, 41(2), 357-383. https://doi.org/10.1177/02655322231210217
Sydorenko, T., Maynard, C., & Guntly, E. (2014). Rater behavior when judging language learners' pragmatic appropriateness in extended discourse. TESL Canada Journal, 32(1), 19-41. https://doi.org/10.18806/tesl.v32i1.1197
Taguchi, N. (2011). Rater variation in the assessment of speech acts. Pragmatics, 21(3), 453-471. https://doi.org/10.1075/prag.21.3.08tag
Taguchi, N., & Li, S. (2020). Contrastive pragmatics and second language (L2) pragmatics: Approaches to assessing L2 speech act production. Contrastive Pragmatics, 2(1), 1-23. https://brill.com/view/journals/jocp/2/1/article-p1_1.xml
Tajeddin, Z., & Alemi, M. (2014). Pragmatic rater training: Does it affect non-native L2 teachers' rating accuracy and bias? International Journal of Language Testing, 4(1), 66-83. https://www.ijlt.ir/article_114394.html
Tajeddin, Z., Alemi, M., & Khanlarzadeh, N. (2020). Rating criteria and norms for pragmatic assessment in the context of EIL. In Pragmatics pedagogy in English as an international language (pp. 212-231). Routledge.
Timpe-Laughlin, V., & Choi, I. (2017). Exploring the validity of a second language intercultural pragmatics assessment tool. Language Assessment Quarterly, 14(1), 19-35. https://doi.org/10.1080/15434303.2016.1256406
Toe, D., Mood, D., Most, T., Walker, E., & Tucci, S. (2020). The assessment of pragmatic skills in young deaf and hard-of-hearing children. Pediatrics, 146(Supplement 3), S284-S291.
Walters, F. S. (2007). A conversation-analytic hermeneutic rating protocol to assess L2 oral pragmatic competence. Language Testing, 24(2), 155-183. https://doi.org/10.1177/0265532207076362
Wilson, A. C., & Bishop, D. V. (2022). A novel online assessment of pragmatic and core language skills: An attempt to tease apart language domains in children. Journal of Child Language, 49(1), 38-59.
Wolcott, M. D., Olsen, A. A., & Augustine, J. M. (2022). Item response theory in high-stakes pharmacy assessments. Currents in Pharmacy Teaching and Learning, 14(9), 1206-1214.
Xu, L., & Wannaruk, A. (2018). Reliability and validity of WDCT in testing interlanguage pragmatic competence for EFL learners. Journal of Language Teaching and Research, 6(6), 1206-1215. https://doi.org/10.17507/jltr.0606.07
Yang, H. (2022). Second language learners' competence of and beliefs about pragmatic comprehension: Insights from the Chinese EFL context. Frontiers in Psychology, 12, 801315. https://doi.org/10.3389/fpsyg.2021.801315
Yamashita, S. O. (1996). Comparing six cross-cultural pragmatics measures. Unpublished doctoral dissertation, Temple University, Philadelphia, PA. https://www.proquest.com/openview/a45390785a21b1a799ba10f4e346bced/1?pq-origsite=gscholar&cbl=18750&diss=y
Yamashita, S. O. (1997). Self-assessment and role play methods of measuring cross-cultural pragmatics. Pragmatics and Language Learning, 8(1), 129-162. https://scholarspace.manoa.hawaii.edu/collections/abc81e47-c948-4d15-9284-783942d637cd
Youn, S. J. (2007). Rater bias in assessing the pragmatics of KFL learners using facets analysis. Second Language Studies, 26(1), 85-163. http://hdl.handle.net/10125/40691
Youn, S. J. (2015). Validity argument for assessing L2 pragmatics in interaction using mixed methods. Language Testing, 32(2), 199-225. https://doi.org/10.1177/026553221455711
Youn, S. J. (2020). Interactional features of L2 pragmatic interaction in role-play speaking assessment. TESOL Quarterly, 54(1), 201-233. https://doi.org/10.1002/tesq.542
Youn, S. J., & Bi, N. Z. (2019). Investigating test-takers' strategy use in task-based L2 pragmatic speaking assessment. Intercultural Pragmatics, 16(2), 185-218.
Youn, S. J., & Brown, J. D. (2013). Item difficulty and heritage language learner status in pragmatic tests for Korean as a foreign language. In S. Ross & G. Kasper (Eds.), Assessing second language pragmatics (pp. 98-123). London: Palgrave Macmillan. https://link.springer.com/chapter/10.1057/9781137003522_4
Zangoei, A., & Derakhshan, A. (2021). Measuring the predictability of Iranian EFL students' pragmatic listening comprehension with language proficiency, self-regulated learning in listening, and willingness to communicate. Journal of Applied Linguistics and Applied Literature: Dynamics and Advances, 9(2), 79-104.
Zhai, X., Haudek, K. C., Wilson, C. H., & Stuhlsatz, M. (2021). A framework of construct-irrelevant variance for contextualized constructed response assessment. Frontiers in Education, 6, 751283. https://doi.org/10.3389/feduc.2021.751283