The stability of IRT parameters under several test equating conditions

Weber, Dominik; Becker, Nicolas; Spinath, Frank M.; Koch, Marco

Please use this identifier to cite or link to this resource: doi:10.22028/D291-47880

Title:	The stability of IRT parameters under several test equating conditions
Author(s):	Weber, Dominik; Becker, Nicolas; Spinath, Frank M.; Koch, Marco
Language:	English
Journal title:	Frontiers in Psychology
Volume:	16
Publisher/Platform:	Frontiers
Year of publication:	2026
Free keywords:	test equating; item linking; test validity; anchor item; item response theory; simulation study
DDC subject group:	150 Psychology
Document type:	Journal article
Abstract:	Introduction: It is crucial for researchers and test developers to compare results from different test sets (e.g., re-testing, parallel test forms). To ensure comparability, test sets are often linked using anchor items as a common denominator alongside distinct items. To date, most studies on test equating have been limited in scope, typically comparing only absolute numbers of anchor items or focusing on a single IRT model or equating method. Furthermore, previous research has primarily evaluated the absolute deviation of estimated parameters from true parameters. However, in diagnostic contexts, the correlation between these values is often more relevant for ensuring validity and test fairness. Therefore, the aim of this simulation study was to examine the impact of a broad range of key factors on test equating. Methods: We evaluated correlations and recovery indices between predefined true values and values estimated through test equating for three IRT parameters (discrimination, difficulty, and ability). To this end, we varied the equating method (MS, MM, MGM, IRF, TRF), the IRT model (2PL vs. 3PL), guessing probability (0.000–0.250), anchor item proportion (5–25%), test set size (20–80 items), and the discrimination parameters of the anchor items. In addition, we used samples of 25–100 individuals to assess equating quality under challenging conditions as well as samples of 500 and 1,000 individuals to reflect adequate modeling conditions. Results: Low guessing probabilities and high anchor item discrimination parameters strongly improved test equating quality for all three IRT parameters. Recovery of discrimination and ability parameters increased logarithmically with larger test set sizes and higher anchor item proportions, with each of these two factors partially compensating for reductions in the other. While sample sizes below 100 individuals produced inadequate parameter recovery, samples of 100 or 500 individuals were justifiable under certain conditions. However, samples of only 100 individuals carried a slight risk of non-convergence. The choice of the equating method had rather minor effects and the impact of the IRT model was ambivalent. Discussion: These findings highlight the importance of using distractor-free response formats without any guessing probability, anchor items with high discrimination parameters, and large samples to ensure valid test equating. For individual research and test application purposes, we provide a comprehensive data set covering multiple factor levels and a step-by-step simulation guide.
DOI of first publication:	10.3389/fpsyg.2025.1652341
URL of first publication:	https://doi.org/10.3389/fpsyg.2025.1652341
Link to this record:	urn:nbn:de:bsz:291--ds-478803; hdl:20.500.11880/41870; http://dx.doi.org/10.22028/D291-47880
ISSN:	1664-1078
Date of entry:	21-May-2026
Faculty:	HW - Fakultät für Empirische Humanwissenschaften und Wirtschaftswissenschaft
Department:	HW - Psychology
Professorship:	HW - Prof. Dr. Frank Spinath
Collection:	SciDok - Der Wissenschaftsserver der Universität des Saarlandes

Files for this record:

File	Description	Size	Format
fpsyg-16-1652341.pdf		1.78 MB	Adobe PDF


This resource is published under the following copyright terms: Creative Commons license