Back when I worked at HMH, I discovered an R package called ITEMAN, which is used for classical item analysis. I've mentioned classical test theory before, which focuses on the overall test or measure, as opposed to individual items. Tests and measures developed with classical test theory can't really be divided up in the way tests and measures developed with item response theory can. But you can still get some useful item statistics when adopting a classical test theory approach, through classical item analysis.

The main item statistic generated in classical item analysis is a P value, not to be confused with the p-value generated in inferential statistical analysis. In this context, P refers to difficulty, and it is abbreviated as P because it is the proportion or percentage of examinees who get the item correct. If almost no one gets the item correct, it is a difficult item. If almost everyone gets the item correct, it is an easy item.

The problem here is that the P value is entirely sample-dependent. If you have an exceptionally capable sample take your test, your items will all look easy, even if relatively speaking they are not. Item response theory and Rasch, on the other hand, provide item difficulty that is not sample dependent. If you look at the math behind IRT and Rasch, you can see exactly where sample is being controlled for and therefore partialled out. (Sadly, you'll have to take all of that at face value, because going into how IRT and Rasch are not sample dependent goes beyond the scope of this post/series. But maybe I need to do an A to Z of Rasch next year!)

Basically, when doing classical item analysis, where the capability of your sample can completely change your item statistics, it becomes even more important to validate content and have experts on hand to help determine what items are appropriate for different ability groups. It also highlights the importance of sampling when piloting the test or measure.

In my previous job, I began working on a cognitive ability test developed with classical test theory. This was a fixed form test, or rather, a set of fixed form tests that were written for specific ages. So an 8-year-old would take the form created for 8-year-olds, which would contain items appropriate for that age level as well as the age levels on either side (7 and 9).
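
To make the P value and its sample dependence concrete, here is a minimal Python sketch. It is not how ITEMAN computes anything; the function name `item_p_values` and both response matrices are made up for illustration, and the only assumption is that each item is scored 1 (correct) or 0 (incorrect).

```python
def item_p_values(responses):
    """Return the classical P value (proportion correct) for each item.

    responses: one row per examinee, each row a list of 0/1 item scores.
    """
    n_examinees = len(responses)
    n_items = len(responses[0])
    return [
        sum(row[i] for row in responses) / n_examinees
        for i in range(n_items)
    ]


# Hypothetical scored responses to the same three items from two samples.
average_sample = [
    [1, 1, 0],
    [1, 0, 0],
    [1, 1, 0],
    [0, 0, 1],
]
capable_sample = [
    [1, 1, 1],
    [1, 1, 0],
    [1, 1, 1],
    [1, 1, 1],
]

p_average = item_p_values(average_sample)
p_capable = item_p_values(capable_sample)

print(p_average)  # [0.75, 0.5, 0.25]
print(p_capable)  # [1.0, 1.0, 0.75]
```

The items are identical in both runs, yet the third item looks hard (0.25) in one sample and fairly easy (0.75) in the other. That is the sample dependence at work: the P value describes the item and the people who happened to take it at the same time.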