Abbott, M. (2007). A confirmatory approach to differential item functioning on an ESL reading assessment. Language Testing, 24(1), 7-36.
Ahmadi, A., & Darabi Bazvand, A. (2016). Gender differential item functioning on a national field-specific test: The case of PhD entrance exam of TEFL in Iran. Iranian Journal of Language Teaching Research, 4(1), 63-82.
Ahmadi, A., & Jalili, T. (2014). A confirmatory study of differential item functioning on EFL reading comprehension. Applied Research on English Language, 3(2), 55-68.
Alderson, J. C. (2000). Assessing reading. Cambridge: Cambridge University Press.
Allalouf, A., & Abramzon, A. (2008). Constructing better second language assessments based on differential item functioning analysis. Language Assessment Quarterly, 5(2), 120-141.
Allalouf, A., Hambleton, R. K., & Sireci, S. G. (1999). Identifying the causes of DIF in translated verbal items. Journal of Educational Measurement, 36(3), 185-198.
Aryadoust, V., Goh, C. C. M., & Kim, L. O. (2011). An investigation of differential item functioning in the MELAB listening test. Language Assessment Quarterly, 8, 361-385.
Banks, K. (2012). Are inferential reading items more susceptible to cultural bias than literal reading items? Applied Measurement in Education, 25, 220-245.
Barati, H., Ketabi, S., & Ahmadi, A. (2006). Differential item functioning in high-stakes tests: The effect of field of study. IJAL, 19(2), 27-42.
Bolt, S., & Thurlow, M. (2007). Item-level effects of the read-aloud accommodation for students with reading disabilities. Assessment for Effective Intervention, 33, 15-28.
Brantmeier, C. (2001). Second language reading research on passage content and gender: Challenges for the intermediate-level curriculum. Foreign Language Annals, 34(4), 325-333.
Brantmeier, C. (2003). Beyond linguistic knowledge: Individual differences in second language reading. Foreign Language Annals, 36(1), 33-43.
Brantmeier, C. (2007). Adult second language reading in the USA: The effects of readers’ gender and test methods. Forum on Public Policy, 14, 1-34.
Chen, Z., & Henning, G. (1985). Linguistic and cultural bias in language proficiency tests. Language Testing, 2(2), 155-163.
Cheong, Y. F. (2006). Analysis of school context effects on differential item functioning using hierarchical generalized linear models. International Journal of Testing, 6(1), 57-79.
Cheong, Y. F., & Kamata, A. (2013). Centering, scale indeterminacy, and differential item functioning detection in hierarchical generalized linear and generalized linear mixed models. Applied Measurement in Education, 26, 233-252.
Cho, H-J., Lee, J., & Kingston, N. (2012). Examining the effectiveness of test accommodation using DIF and a mixture IRT model. Applied Measurement in Education, 25, 281-304.
Cohen, A. S., & Bolt, D. M. (2005). A mixture model analysis of differential item functioning. Journal of Educational Measurement, 42(2), 133-148.
Cohen, A., & Macaro, E. (Eds.). (2007). Language learner strategies: Thirty years of research and practice. Oxford, UK: Oxford University Press.
Dörnyei, Z. (2003). Questionnaires in second language research. Mahwah, NJ: Lawrence Erlbaum Associates.
Dörnyei, Z., & Skehan, P. (2003). Individual differences in L2 learning. In C. Doughty & M. Long (Eds.), The handbook of second language acquisition (pp. 589-630). Malden, MA: Blackwell Publishing.
Elder, C., McNamara, T. F., & Congdon, P. (2003). Understanding Rasch measurement: Rasch techniques for detecting bias in performance assessment: An example comparing the performance of native and non-native speakers on a test of academic English. Journal of Applied Measurement, 4, 181-197.
Elosua, P., & Lopez-Jauregui, A. (2007). Potential sources of differential item functioning in the adaptation of tests. International Journal of Testing, 7(1), 39-52.
Ercikan, K. (2002). Disentangling sources of differential item functioning in multilanguage assessments. International Journal of Testing, 2(3&4), 199-215.
Ercikan, K., Roth, W., Simon, M., Sandilands, D., & Lyons-Thomas, J. (2014). Inconsistencies in DIF detection for sub-groups in heterogeneous language groups. Applied Measurement in Education, 27, 273-285.
Ferne, T., & Rupp, A. A. (2007). A synthesis of 15 years of research on DIF in language testing: Methodological advances, challenges, and recommendations. Language Assessment Quarterly, 4(2), 113-148.
Finch, W. H., Hernández Finch, M. E., & French, B. F. (2016). Recursive partitioning to identify potential causes of differential item functioning in cross-national data. International Journal of Testing, 16, 21-53.
Gómez-Benito, J., Sireci, S., Padilla, J-L., Hidalgo, M. D., & Benítez, I. (2018). Differential item functioning: Beyond validity evidence based on internal structure. Psicothema, 30(1), 104-109.
Harding, L. (2011). Accent, listening assessment and the potential for a shared-L1 advantage: A DIF perspective. Language Testing, 29(2), 163-180.
Helwig, R., Rozek-Tedesco, M. A., Tindal, G., Heath, B., & Almond, P. J. (1999). Reading as an access to mathematics problem solving on multiple-choice tests for sixth-grade students. Journal of Educational Research, 93, 113-125.
Hidalgo, M. D., & Gómez-Benito, J. (2010). Differential item functioning. In P. Peterson, E. Baker, & B. McGaw (Eds.), International encyclopedia of education (Vol. 4, pp. 36-44). Oxford: Elsevier.
Hidalgo, M. D., & López-Pina, J. A. (2004). DIF detection and effect size: A comparison between logistic regression and Mantel-Haenszel variation. Educational and Psychological Measurement, 64, 903-915.
Jang, E. E., & Roussos, L. (2009). Integrative analytic approach to detecting and interpreting L2 vocabulary DIF. International Journal of Testing, 9(3), 238-259.
Jodoin, M. G., & Gierl, M. J. (2001). Type-one error and power rates using an effect size measure with the logistic regression for DIF detection. Applied Measurement in Education, 14, 329-349.
Kim, M. (2001). Detecting DIF across the different language groups in a speaking test. Language Testing, 18(1), 89-114.
Koo, J., Becker, B. J., & Kim, Y-S. (2014). Examining differential item functioning trends for English language learners in a reading test: A meta-analytical approach. Language Testing, 31(1), 89-109.
Kunnan, A. J. (1990). DIF in native language and gender groups in an ESL placement test. TESOL Quarterly, 24, 741-746.
Le, L. T. (2009). Investigating gender differential item functioning across countries and test languages for PISA science items. International Journal of Testing, 9(2), 122-133.
Lee, H., & Geisinger, K. F. (2014). The effect of propensity scores on DIF analysis: Inference on the potential cause of DIF. International Journal of Testing, 14, 313-338.
Lynch, B. K., & McNamara, T. F. (1998). Using G-theory and Many-Facet Rasch measurement in the development of performance assessments of the ESL speaking skills of immigrants. Language Testing, 15(2), 158-180.
Mazor, K. M., Kanjee, A., & Clauser, B. E. (1995). Using logistic regression and the Mantel-Haenszel with multiple ability estimates to detect differential item functioning. Journal of Educational Measurement, 32(2), 131-144.
McNamara, T., & Roever, C. (2006). Language testing: The social dimension. Malden, MA & Oxford: Blackwell.
Mendes-Barnett, S., & Ercikan, K. (2006). Examining sources of gender DIF in mathematics assessments using a confirmatory multidimensional model approach. Applied Measurement in Education, 19(4), 289-304.
Newman, M. L., Groom, C. J., Handelman, L. D., & Pennebaker, J. W. (2008). Gender differences in language use: An analysis of 1400 text samples. Discourse Processes, 45, 211-236.
Oliveri, M. E., Ercikan, K., & Zumbo, B. D. (2014). Effects of population heterogeneity on accuracy of DIF detection. Applied Measurement in Education, 27(4), 286-300.
O’Neill, K. A., & McPeek, W. M. (1993). Item and test characteristics that are associated with differential item functioning. In P. W. Holland & H. Wainer (Eds.), Differential item functioning (pp. 255-276). Hillsdale, NJ: Lawrence Erlbaum Associates.
Oshima, T. C., Raju, N. S., Flowers, C. P., & Slinde, J. A. (1998). Differential bundle functioning using the DFIT framework: Procedures for identifying possible sources of differential functioning. Applied Measurement in Education, 11(4), 353-369.
Pae, T. I. (2004b). Gender effect on reading comprehension with Korean EFL learners. System, 32(3), 265-281.
Pae, T. I. (2012). Causes of gender DIF on an EFL language test: A multiple-data analysis over nine years. Language Testing, 29(4), 533-554.
Roever, C. (2005). That’s not fair: Fairness, bias, and differential item functioning in language testing. SLS Brownbag, 1-14.
Roth, W. M., Oliveri, M. E., Sandilands, D. D., & Lyons-Thomas, J. (2013). Investigating linguistic sources of differential item functioning using expert think-aloud protocols in science achievement tests. International Journal of Science Education, 35(4), 546-576.
Ryan, K., & Bachman, L. F. (1992). Differential item functioning on two tests of EFL proficiency. Language Testing, 9(1), 12-29.
Santelices, M. V., & Wilson, M. (2012). On the relationship between differential item functioning and item difficulty: An issue of methods? Item response theory approach to differential item functioning. Educational and Psychological Measurement, 72(1), 5-36.
Sasaki, M. (1991). A comparison of two methods for detecting differential item functioning in an ESL placement test. Language Testing, 8(2), 95-111.
Shermis, M. D., Mao, L., Mulholland, M., & Kieftenbeld, V. (2017). Use of automated scoring features to generate hypotheses regarding language-based DIF. International Journal of Testing, 17(4), 351-371.
Shimizu, Y., & Zumbo, B. D. (2005). Logistic regression for differential item functioning: A primer. Japan Language Testing Association Journal, 7, 110-124.
Sireci, S. G., & Rios, J. (2013). Decisions that make a difference in detecting differential item functioning. Educational Research and Evaluation, 19(2-3), 170-187.
Stricker, L. J., & Emmerich, W. (1999). Possible determinants of differential item functioning: Familiarity, interest, and emotional reaction. Journal of Educational Measurement, 36(4), 347-366.
Suh, Y., & Talley, A. E. (2015). An empirical comparison of DDF detection methods for understanding the causes of DIF in multiple-choice items. Applied Measurement in Education, 28, 48-67.
Takala, S., & Kaftandjieva, F. (2000). Test fairness: A DIF analysis of an L2 vocabulary test. Language Testing, 17(3), 323-340.
Taylor, C. S., & Lee, Y. (2012). Gender DIF in reading and mathematics tests with mixed item formats. Applied Measurement in Education, 25, 246-280.
Tsaousis, I., Sideridis, G., & Al-Saawi, F. (2018). Differential distractor functioning as a method for explaining DIF: The case for a national admissions test in Saudi Arabia. International Journal of Testing, 18(1), 1-26.
Uiterwijk, H., & Vallen, T. (2005). Linguistic sources of item bias for second generation immigrants in Dutch tests. Language Testing, 22(2), 211-234.
Vermunt, J. K. (2008). Latent class and finite mixture models for multilevel data sets. Statistical Methods in Medical Research, 17, 33-51.
Widdowson, H. (2001). Communicative language testing: The art of the possible. In C. Elder, A. Brown, E. Grove, K. Hill, N. Iwashita, T. Lumley, T. McNamara, & K. O'Loughlin (Eds.), Experimenting with uncertainty: Essays in honour of Alan Davies (pp. 12-21). Cambridge: Cambridge University Press.
Wu, A. D., & Ercikan, K. (2006). Using multiple-variable matching to identify cultural sources of differential item functioning. International Journal of Testing, 6(3), 287-300.
Zumbo, B. D. (1999). A handbook on the theory and methods of differential item functioning (DIF): Logistic regression modeling as a unitary framework for binary and Likert-type (ordinal) item scores. Ottawa, Canada: Directorate of Human Resources Research and Evaluation, Department of National Defense.
Zumbo, B. D. (2007). Three generations of DIF analysis: Considering where it has been, where it is now, and where it is going. Language Assessment Quarterly, 4(2), 223-233.
Zumbo, B. D. (2009). Validity as contextualized and pragmatic explanation, and its implications for validation practice. In R. W. Lissitz (Ed.), The concept of validity: Revisions, new directions and applications (pp. 65-82). Charlotte, NC: IAP-Information Age Publishing, Inc.
Zumbo, B. D., & Gelin, M. N. (2005). A matter of test bias in educational policy research: Bringing the context into picture by investigating sociological/community moderated (or mediated) test and item bias. Journal of Educational Research and Policy Studies, 5(1), 1-23.
Zumbo, B. D., Liu, Y., Wu, A. D., Shear, B. R., Olvera Astivia, O. L., & Ark, T. K. (2015). A methodology for Zumbo’s third generation DIF analyses and the ecology of item responding. Language Assessment Quarterly, 12, 136-151.