PROBABILITY MODELING AND STATISTICAL ANALYSIS OF STANDARDIZED TEST DATA FOR PURPOSES OF UNDERSTANDING AND MEASURING TEST EQUITY, LATENT ABILITY SPACE MULTI-DIMENSIONALITY, AND SKILLS DIAGNOSTIC ASSESSMENT
This paper surveys almost two decades of progress by my colleagues and me in three psychometric research areas involving the probability modeling and statistical analysis of standardized ability test data: nonparametric modeling and assessment of multidimensional latent ability structure, modeling and assessment of test fairness, and modeling and assessment of skills diagnosis via educational testing. In the process, it is suggested that the unidimensional-scoring paradigm that has driven standardized ability testing research for over half a century is giving way to a new paradigm of multidimensional latent ability modeling and multiple scoring, one that in particular explains and allows the effective detection of test bias and embraces skills-level formative assessment, thereby opening up a plethora of challenging, exciting, and societally important research problems for psychometricians. It is hoped that this light-stepping history will interest probabilists and statisticians in exploring the field of psychometrics.
Informally, test bias occurs when an examinee is underevaluated or overevaluated by his or her test score relative to the purpose of the test. Also informally, skills diagnosis refers to evaluating examinee levels of mastery (usually scored dichotomously, as master versus nonmaster of each skill) on a moderate number of carefully selected skills; having such student skills profiles can greatly help individual student learning and classroom-level instruction.
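To make the dichotomous mastery formulation concrete, here is a minimal Python sketch of a skills profile acting on a Q-matrix of item-skill requirements. The skill count, Q-matrix entries, and the conjunctive all-required-skills rule are illustrative assumptions chosen for this sketch, not the Unified Model itself:

```python
import numpy as np

# Hypothetical Q-matrix: rows are items, columns are skills;
# Q[i, k] = 1 means item i requires skill k (entries are made up).
Q = np.array([
    [1, 0, 1],   # item 1 requires skills 1 and 3
    [0, 1, 0],   # item 2 requires skill 2
    [1, 1, 0],   # item 3 requires skills 1 and 2
])

# Hypothetical examinee skills profile: 1 = master, 0 = nonmaster.
alpha = np.array([1, 0, 1])

# Conjunctive rule: the examinee is expected to answer item i
# correctly only if every skill that item i requires is mastered.
expected_correct = np.all(Q <= alpha, axis=1).astype(int)
print(expected_correct)  # [1 0 0]: only item 1's requirements are all met
```

A fitted skills-diagnosis model layers response probabilities (allowing slips and guesses) on top of such ideal-response patterns; the sketch shows only the underlying mastery-profile bookkeeping.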
My strategy, strongly influenced by my probabilistic background, for producing interesting and effective psychometric research is to choose psychometric research questions arising from practical challenges facing educational testing. My colleagues and I then bring sophisticated probability modeling and modern statistical thinking to bear on these questions, making the effectiveness of the resulting research in meeting the educational testing challenges the ultimate criterion for judging its worth.
It is somebody's ancient proverb that the acorn sometimes falls far from the oak tree. Y. S. Chow taught me the tools of probability limit theorem research, taught me to approach research with enthusiasm and tenacity, and provided a very supportive environment for me and his other graduate students. Although psychometrics/educational measurement is far from the probabilistic oak tree, whatever success I have had as a psychometrician has been strongly influenced by the supportive, demanding, and creative environment Y. S. creates for his students. By now I have had many Ph.D. students of my own in psychometrics, and with all of them I have followed Y. S.'s just-described model for mentoring Ph.D. students.
- Nonparametric IRT
- NIRT
- Latent unidimensionality
- Latent multidimensionality
- Essential unidimensionality
- Monotone locally independent unidimensional IRT model
- MLI1
- Item pair conditional covariances (see the sketch after this list)
- DIMTEST
- HCA/CCPROX
- DETECT
- Generalized compensatory IRT model
- Approximate simple structure
- DIF
- Differential item functioning
- Differential bundle functioning
- DBF
- Valid subtest
- Multidimensional model for DIF
- MMD
- SIBTEST
- MultiSIB
- Mantel-Haenszel
- PolySIB
- CrossingSIB
- Skills diagnosis
- Formative assessment
- Unified Model
- Bayes Unified Model
- MCMC
- PSAT Score Report Plus
- University of Illinois Department of Statistics Statistical Laboratory for Educational and Psychological Measurement
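For statisticians new to the area, the conditional covariance idea behind the DIMTEST and DETECT entries above can be illustrated directly: under a monotone locally independent unidimensional IRT model, the covariance of an item pair conditional on the total score over the remaining items is approximately zero, whereas multidimensional structure drives it away from zero. The following Python sketch simulates a two-cluster test and computes this statistic; the simulation design and function names are assumptions for illustration, not the DIMTEST or DETECT implementations:

```python
import numpy as np

def conditional_covariance(X, i, j):
    """Average covariance of items i and j over groups of examinees
    sharing the same total score on all *other* items (the rest score).
    X is an (examinees x items) array of 0/1 responses."""
    rest = np.delete(X, [i, j], axis=1).sum(axis=1)
    covs, weights = [], []
    for s in np.unique(rest):
        group = X[rest == s]
        if len(group) > 1:
            covs.append(np.cov(group[:, i], group[:, j])[0, 1])
            weights.append(len(group))
    return np.average(covs, weights=weights)

# Simulate a two-dimensional test: items 0-9 load on one ability,
# items 10-19 on a second, correlated ability.
rng = np.random.default_rng(0)
n = 5000
theta = rng.multivariate_normal([0, 0], [[1, 0.5], [0.5, 1]], size=n)
load = np.array([0] * 10 + [1] * 10)      # which ability each item taps
p = 1 / (1 + np.exp(-theta[:, load]))     # logistic item response curves
X = (rng.random((n, 20)) < p).astype(int)

print(conditional_covariance(X, 0, 1))    # same cluster: positive
print(conditional_covariance(X, 0, 10))   # different clusters: negative
```

Within-cluster pairs come out positive and between-cluster pairs negative; this sign pattern is what DETECT exploits when it searches for the partition of items into dimensionally homogeneous clusters.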