Deeper analysis of group disparities in ratings motivated by simulated and real data examples

Publication at Faculty of Education | 2020

Abstract

In this talk we present simulated and real-data examples that motivate deeper analysis of disparities in applicant and student ratings. First, we discuss methods for the detection of differential item functioning (DIF).

We introduce two cases illustrating the importance of DIF analysis. A simulated example shows that two groups may have an identical distribution of total scores while an item exhibiting DIF, and thus potentially unfair, is still present in the data. Conversely, a real-data example is provided in which the two groups differ significantly in their overall ability, yet no item bias is detected.
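To make the simulated case concrete, the following R sketch (our own illustration, not necessarily the talk's simulation; it assumes the difR package) generates Rasch data in which both groups share the same ability distribution and two items carry compensating uniform DIF, so total-score distributions remain nearly identical while item-level bias is present:

# Hypothetical construction: identical abilities in both groups,
# items 1 and 2 with compensating uniform DIF.
set.seed(42)
n <- 1000                                 # persons per group
k <- 10                                   # items
theta <- rnorm(2 * n)                     # same ability distribution in both groups
group <- rep(0:1, each = n)               # 0 = reference, 1 = focal
b <- matrix(seq(-1.5, 1.5, length.out = k),
            nrow = 2 * n, ncol = k, byrow = TRUE)
b[group == 1, 1] <- b[group == 1, 1] + 1  # item 1 harder for the focal group
b[group == 1, 2] <- b[group == 1, 2] - 1  # item 2 easier: the DIF effects cancel
p <- plogis(theta - b)                    # Rasch response probabilities
resp <- matrix(rbinom(2 * n * k, 1, p), ncol = k)
colnames(resp) <- paste0("Item", 1:k)

tapply(rowSums(resp), group, summary)     # total scores: nearly identical by group

library(difR)                             # Mantel-Haenszel, one of several DIF methods
difMH(Data = resp, group = group, focal.name = 1)

Because the two DIF effects cancel in the sum score, comparing total-score distributions alone cannot reveal the problem; only the item-level test flags items 1 and 2.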

Second, we describe a real-data example motivating the development of a more flexible model-based estimate of inter-rater reliability (IRR). In this example, IRR calculated on stratified data is not able to detect group differences, while the proposed more flexible model-based approach shows a significant difference in IRR between ratings of internal and external applicants.
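The mechanics of a stratified IRR calculation can be sketched in R as follows (a minimal sketch on fully hypothetical simulated ratings, assuming the lme4 package; here IRR is taken as the intraclass correlation from a one-way random-effects model, and the talk's proposed approach is more flexible than this per-stratum fit, e.g. by letting variance components depend on covariates such as applicant type). The toy construction only shows what the stratified baseline computes; it does not reproduce the talk's finding:

library(lme4)
set.seed(1)

# Hypothetical data: 100 applicants per type, 3 ratings each;
# external applicants get noisier ratings (lower IRR) by construction.
sim <- function(type, sd_resid) {
  applicant <- rep(seq_len(100), each = 3)
  truth <- rnorm(100)[applicant]          # applicant's "true" quality
  data.frame(type = type,
             applicant = paste(type, applicant),
             score = truth + rnorm(300, sd = sd_resid))
}
ratings <- rbind(sim("internal", 0.7), sim("external", 1.4))

# IRR as intraclass correlation:
# applicant variance / (applicant variance + residual variance)
icc <- function(d) {
  fit <- lmer(score ~ 1 + (1 | applicant), data = d)
  vc <- as.data.frame(VarCorr(fit))
  vc$vcov[vc$grp == "applicant"] / sum(vc$vcov)
}
sapply(split(ratings, ratings$type), icc) # one IRR estimate per stratum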

Finally, we present the R package ShinyItemAnalysis, which provides toy data and interactive features for deeper item-level analysis of ratings and assessments. We argue that interesting datasets presented in an interactive way may engage students, reinforce understanding, and motivate research into more flexible analytic methods.
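Getting started with the package takes only a few lines of R; the bundled example datasets can then be explored directly from the application:

install.packages("ShinyItemAnalysis")
library(ShinyItemAnalysis)
startShinyItemAnalysis()   # launches the interactive Shiny application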