MODERATING ABILITY OF ITEM RESPONSE THEORY THROUGH PRIOR GUESSING PARAMETER

A psycho-technology approach to discouraging guessing in multiple-choice items is to reduce the a priori guessing probability of an item. This study proposes a psychometric framework based on Item Response Theory (IRT) to model the effect of having different a priori guessing probabilities across items. A prior guessing parameter is proposed to serve as a moderator of the ability parameter in the two-parameter logistic IRT model. The results show that the proposed prior guessing parameter successfully moderates the ability parameters of subjects with different degrees of guessing. However, the prior guessing parameter is insensitive when the performance pattern is mixed within each testlet but similar across testlets with different a priori guessing probabilities.


INTRODUCTION
The pioneering Item Response Theory (IRT) model for dealing with guessing in multiple-choice items is the three-Parameter Logistic (3PL)-IRT model (Lord, 1980), which adds a guessing parameter to the difficulty and discrimination parameters of the 2PL-IRT model (Birnbaum, 1968). However, the empirical study by Pelton (2002) shows that the estimation of the guessing parameters is unstable unless the parameters are constrained to equal a known or an unknown constant. Variants of IRT models have been developed to improve the modeling of guessing by modifying the parameters of IRT, for example the difficulty plus guessing PL model (Kubinger and Draxler, 2006) and the Ability-Based Guessing (1PL-AG) model (Boech and Leuven, 2006).
On the other hand, a psycho-technology approach that reduces the a priori guessing probability of an item (Kubinger et al., 2010) has been proposed to discourage guessing. This approach involves two common ways of formatting the item response: increasing the number of options while maintaining a single correct answer, and increasing the number of correct response options. Let the number of response options be $k$. For a response format with a single correct answer, the a priori guessing probability is $1/k$. Increasing the number of options beyond $k$ while maintaining a single correct answer makes the a priori guessing probability less than $1/k$. When the number of correct response options, $r$, is increased, the probability depends on whether $r$ is made known to the subjects. If $r$ is known, the probability is determined by the number of ways of arranging $r$ correct response options and $(k-r)$ distractors, and is given by $r!(k-r)!/k! = 1/\binom{k}{r}$. If $r$ is unknown, the a priori probability amounts to $(1/2)^k$, which depends only on the number of response options. It can be seen that $1/k > 1/\binom{k}{r} > (1/2)^k$ (for $2 \le r \le k-2$). In other words, the a priori guessing probability is lowest if there are multiple correct response options and the number of correct response options is unknown. Kubinger et al. (2010) show that the difficulty parameter of the response format 'two of the five response options are correct' is higher than that of 'one out of six response options is correct' when $r$ is made known.

JMSS
However, that finding concerns the difficulty of the different response formats rather than the guessing effect of the subjects.
In this study, we consider a mixture of items with different numbers of multiple correct response options, so that the a priori guessing probabilities of these items differ. We propose a prior guessing parameter as a moderator of the ability parameter within the psychometric framework of IRT.
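The three a priori guessing probabilities discussed above can be computed directly. The following is a minimal sketch; the function names are illustrative and are not taken from the cited works.

```python
from math import comb


def prior_guess_single(k):
    """One correct answer among k options: 1/k."""
    return 1 / k


def prior_guess_known_r(k, r):
    """r correct answers among k options, with r known: 1 / C(k, r)."""
    return 1 / comb(k, r)


def prior_guess_unknown_r(k):
    """Number of correct answers unknown: each option is marked or not, (1/2)^k."""
    return 0.5 ** k


# Example: five response options, two of them correct.
k, r = 5, 2
p_single = prior_guess_single(k)       # 1/5
p_known = prior_guess_known_r(k, r)    # 1/10
p_unknown = prior_guess_unknown_r(k)   # 1/32
```

For k = 5 and r = 2 the ordering 1/5 > 1/10 > 1/32 matches the inequality stated in the text. The 'one out of six' format gives 1/6, which is larger than the 1/10 of the 'two of five' format.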

IRT Model with Prior Guessing Parameter
We adopt the concept of a testlet (Wainer and Kiely, 1987) to bundle items that share the same response format. The a priori guessing probabilities of items within the same testlet are equal but differ across testlets. We extend the approach of Wang et al. (2002) for incorporating testlets into IRT and propose a variant of testlet response theory that models the prior guessing effect of subjects.
Let the observed dichotomous responses of $n$ subjects to $m$ items be $Y_{ij}$, where $i = 1, 2, \ldots, n$ and $j = 1, 2, \ldots, m$. An item is scored 1 if correct and 0 otherwise. In the testlet response model, the conditional probability that subject $i$ responds correctly to item $j$ ($Y_{ij} = 1$) is given by:

$$P(Y_{ij} = 1) = c_j + (1 - c_j)\,\frac{\exp\{\alpha_j(\theta_i - \beta_j - \gamma_{it(j)})\}}{1 + \exp\{\alpha_j(\theta_i - \beta_j - \gamma_{it(j)})\}} \tag{1}$$

where $\alpha_j$, $\beta_j$ and $c_j$ are the discrimination, difficulty and guessing parameters respectively for item $j$, $\theta_i$ is the ability parameter of subject $i$, $\gamma_{it(j)}$ is the testlet parameter that accounts for the random effect of subject $i$ on items belonging to the same testlet, and $t(\cdot)$ is the function that assigns items to testlets; for example, $t(1) = 2$ means that Item 1 belongs to Testlet 2. Each testlet parameter is assumed to follow a normal distribution $N(0, \sigma^2_{t(j)})$ and represents a testlet effect through its own testlet-specific variance, $\sigma^2_{t(j)}$. Procedures have been developed to estimate the testlet variance (Wainer et al., 2007; Glas et al., 2000; Jiao et al., 2013). Note that without the testlet effect, Equation (1) becomes:

$$P(Y_{ij} = 1) = c_j + (1 - c_j)\,\frac{\exp\{\alpha_j(\theta_i - \beta_j)\}}{1 + \exp\{\alpha_j(\theta_i - \beta_j)\}} \tag{2}$$

which is the 3PL-IRT model (Lord, 1980). The model assumes that an individual guesses item $j$ correctly with probability $c_j$. If there is no guessing on any item, $c_j = 0$ and Equation (2) reduces to the 2PL-IRT model (Birnbaum, 1968) given by Equation (3):

$$P(Y_{ij} = 1) = \frac{\exp\{\alpha_j(\theta_i - \beta_j)\}}{1 + \exp\{\alpha_j(\theta_i - \beta_j)\}} \tag{3}$$

In this study, we propose a 2PL-IRT model with a testlet effect due to prior guessing. Consider $m$ items bundled into $d$ testlets $t_1, t_2, \ldots, t_d$, whose items have $k_1, k_2, \ldots, k_d$ response options respectively. The a priori guessing probabilities of these testlets are $(1/2)^{k_1}, (1/2)^{k_2}, \ldots, (1/2)^{k_d}$ respectively. The proposed model is given by Equation (4):

$$P(Y_{ij} = 1) = \frac{\exp\{\alpha_j((\theta_i + \lambda_{it(j)}) - \beta_j)\}}{1 + \exp\{\alpha_j((\theta_i + \lambda_{it(j)}) - \beta_j)\}} \tag{4}$$

where $t(j) \in \{t_1, t_2, \ldots, t_d\}$ for $j = 1, 2, \ldots, m$ is the testlet of items having the same number of multiple-correct response options, and $\lambda_{it(j)}$ is the testlet effect for subject $i$ due to guessing on items with different a priori guessing probabilities. Let the number of items in the $s$-th testlet be $m_s$.
The $i$-th subject has a $d$-dimensional response vector $Y_i = (Y_{i1}', Y_{i2}', \ldots, Y_{id}')'$, where $Y_{is} = (Y_{is1}, Y_{is2}, \ldots, Y_{ism_s})'$ is the response vector at the $s$-th testlet, and a prior guessing effect vector $\lambda_i = (\lambda_{i1}, \lambda_{i2}, \ldots, \lambda_{id})'$, where $\lambda_{is}$ is the prior guessing effect at testlet $t_s$. The correlation among the prior guessing parameters measured on the same subject across the $d$ different testlets, i.e., $\lambda_{i1}, \lambda_{i2}, \ldots, \lambda_{id}$, is expected to be higher than the correlation among the prior guessing parameters of different subjects within the same testlet, i.e., $\lambda_{1s}, \lambda_{2s}, \ldots, \lambda_{ns}$. We consider this correlation structure of prior guessing as a moderator of the ability parameter. Since the ability parameters are normally distributed, we assume that the prior guessing effect vectors also come from a normal distribution, $\lambda_i \sim N(\mu_\lambda, \Sigma_\lambda)$, with mean structure $\mu_\lambda$ and covariance $\Sigma_\lambda$. The covariance matrix $\Sigma_\lambda$ is treated as unstructured and decomposed using the Cholesky parameterization, $\Sigma_\lambda = M_\lambda M_\lambda'$, where $M_\lambda$ is a lower triangular matrix with positive diagonal elements and unrestricted elements below the diagonal. Higher values in $\Sigma_\lambda$ indicate higher variation of the prior guessing effect for subject $i$ across testlets with different multiple-correct response options and imply more guessing in responding to the items. Therefore, it can be used as a moderator of the ability parameter.
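The two ingredients of the proposed model can be sketched in a few lines. This is a sketch under stated assumptions: the logit form follows the 'part of ability' configuration described in the Discussion, the function names are illustrative, and all numerical values are hypothetical rather than taken from the study.

```python
import math
import random


def p_correct(theta, lam, alpha, beta):
    """2PL probability with the prior guessing effect added to ability."""
    z = alpha * ((theta + lam) - beta)
    return 1.0 / (1.0 + math.exp(-z))


def covariance_from_cholesky(M):
    """Return Sigma = M M' for a lower-triangular M given as a list of rows."""
    d = len(M)
    return [[sum(M[a][c] * M[b][c] for c in range(d)) for b in range(d)]
            for a in range(d)]


def draw_lambda(mu, M, rng):
    """Draw lambda_i ~ N(mu, M M') as mu + M z, with z standard normal."""
    d = len(mu)
    z = [rng.gauss(0.0, 1.0) for _ in range(d)]
    return [mu[a] + sum(M[a][c] * z[c] for c in range(a + 1)) for a in range(d)]


# Hypothetical Cholesky factor for d = 2 testlets (positive diagonal).
M = [[1.0, 0.0],
     [0.5, 0.8]]
sigma = covariance_from_cholesky(M)  # approximately [[1.0, 0.5], [0.5, 0.89]]

rng = random.Random(1)
lam_i = draw_lambda([0.0, 0.0], M, rng)  # one subject's effects on both testlets
p = p_correct(theta=-2.5, lam=lam_i[0], alpha=1.0, beta=-1.0)
```

Parameterizing through the factor M rather than through the covariance directly keeps the covariance positive definite whenever the diagonal of M is positive, which is the reason for the positivity restriction mentioned above.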

Simulations
Simulations are performed to study how the inclusion of the proposed prior guessing parameter moderates the ability parameter. Twenty items with multiple correct response options are considered. The items are bundled into 2 testlets of 10 items each. The numbers of response options for the 2 testlets are $k_1 = 4$ and $k_2 = 5$ respectively, and the numbers of correct response options are assumed unknown. Thus, the a priori guessing probabilities of the 2 testlets are $(1/2)^4 = 1/16$ and $(1/2)^5 = 1/32$ respectively.
Response data are generated for 9 groups of 20 subjects each. The subjects fall into 3 categories of ability, i.e., low, average and high, with respective ability parameters of -2.5, 0.5 and 2.5. The subjects respond to each testlet of items in one of 3 performance patterns, i.e., poor, average and good. The difficulty parameters used to represent the performance patterns for all 9 groups of subjects are shown in Table 1. The IRT models used to generate the data have a constrained discrimination parameter, $\alpha_j = 1$ for $j = 1, 2, \ldots, m$. The first 3 groups are assumed to respond without guessing, and the 2PL-IRT model is used to generate their response data. For the remaining 6 groups, the 3PL-IRT model with guessing parameters equal to the a priori guessing probabilities is used to generate the response data. Groups 4 and 5 are assumed to guess more than Groups 6 and 7. Groups 8 and 9 have a mixed performance pattern in both testlets. Our focus is only on the subjects with lower ability.
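The data-generating step can be sketched as follows. This is only an illustration: the difficulty values of Table 1 are not reproduced, so the placeholder difficulties below are hypothetical, and the function names are illustrative.

```python
import math
import random


def p_2pl(theta, alpha, beta):
    """2PL-IRT probability of a correct response."""
    return 1.0 / (1.0 + math.exp(-alpha * (theta - beta)))


def p_3pl(theta, alpha, beta, c):
    """3PL-IRT probability; c = 0 reduces to the 2PL model."""
    return c + (1.0 - c) * p_2pl(theta, alpha, beta)


def simulate_group(theta, betas, guess, n_subjects, rng):
    """Return n_subjects rows of 0/1 responses, with alpha_j fixed at 1."""
    data = []
    for _ in range(n_subjects):
        row = []
        for beta, c in zip(betas, guess):
            p = p_3pl(theta, 1.0, beta, c)
            row.append(1 if rng.random() < p else 0)
        data.append(row)
    return data


# Two testlets of 10 items: k1 = 4 and k2 = 5 options, r unknown.
guess = [0.5 ** 4] * 10 + [0.5 ** 5] * 10
no_guess = [0.0] * 20
# Placeholder difficulties (hypothetical; Table 1 values are not used here).
betas = [0.0] * 20

rng = random.Random(2024)
group_no_guess = simulate_group(-2.5, betas, no_guess, 20, rng)  # 2PL data
group_guess = simulate_group(-2.5, betas, guess, 20, rng)        # 3PL data
```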
In this study, Bayesian estimation with Markov Chain Monte Carlo (MCMC) is used to estimate the parameters of the proposed response model. The MCMC method not only provides a framework for experimenting with new models (Kim and Bolt, 2007; Martin et al., 2011), it is also more effective for heavily parameterized IRT models (Baker, 1998; Azevedo et al., 2012; Cho et al., 2013). We adopt the prior distributions $\theta_i \sim N(0, 1)$, $\alpha_j \sim N(0.8, 0.2^2)$ and $\beta_j \sim N(0, 1)$ from Wainer et al. (2007), and the gamma(0.5, 1) prior for $\Sigma_\lambda$ from Bradlow et al. (1999), which was proposed for testlets and restricts the diagonal elements of $M_\lambda$ to be positive. Random initial values are generated for the parameters. Since the number of iterations required for testlet parameters to converge is quite large (Sinharay, 2003; Sun et al., 2012), we use 10,000 iterations in our study. The Deviance Information Criterion (DIC) (Spiegelhalter et al., 2002; Francois and Laval, 2011), developed as a model selection method for Bayesian estimates of model parameters, is used to compare the fit of the proposed model with that of the benchmark IRT model.
The simulations are performed using the BUGS language implemented in OpenBUGS version 3.2.1 (Lunn et al., 2009) and the statistical programming environment R (RDCT, 2010) version 2.14.1.
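For reference, the DIC can be written in terms of the posterior mean deviance and the deviance evaluated at the posterior mean of the parameters. The sketch below only illustrates that definition; the deviance values are made up, and in practice the BUGS software reports the DIC itself.

```python
def dic(deviance_draws, deviance_at_posterior_mean):
    """Return (DIC, pD), where pD = mean deviance - deviance at posterior mean."""
    d_bar = sum(deviance_draws) / len(deviance_draws)
    p_d = d_bar - deviance_at_posterior_mean  # effective number of parameters
    return d_bar + p_d, p_d


# Made-up deviance draws from an MCMC run (illustration only).
draws = [410.0, 402.0, 398.0, 406.0]
dic_value, p_d = dic(draws, 396.0)
```

When two models are fitted to the same data, the one with the smaller DIC is preferred.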

The first analysis focuses on the performance of the prior guessing parameter in the non-guessing groups, Groups 1, 2 and 3. The ability parameters of all 3 groups estimated by the proposed model are higher than those estimated by the 2PL-IRT model. The abilities estimated by the proposed model are adjusted by the prior guessing parameter by +0.11 (= -1.25 - (-1.36)), +0.12 (= 0.58 - 0.46) and +0.23 (= 2.61 - 2.38) for Groups 1, 2 and 3 respectively.
The second analysis evaluates the effect of the prior guessing parameter in moderating the abilities of subjects with different degrees of guessing. Groups 4 and 5 performed better in Testlet 2, which has the lower a priori guessing probability. These groups are assumed to guess more. The ability parameters of these groups estimated by the proposed model differ from the 2PL-IRT estimates by -0.39 (= -0.71 - (-0.32)) and -0.34 (= -0.50 - (-0.16)) for Groups 4 and 5 respectively. In contrast, Groups 6 and 7, which have the opposite performance pattern to Groups 4 and 5, are assumed to guess less. The estimated ability parameters of Groups 6 and 7 are respectively +0.32 (= -0.05 - (-0.37)) and +0.36 (= -0.10 - (-0.46)) higher in the proposed model than in the 2PL-IRT model.
The third analysis evaluates the sensitivity of the prior guessing parameter across testlets with different a priori guessing probabilities but a similar mixed performance pattern (Groups 8 and 9). The result shows that the ability parameters estimated by the two models are very close.
The results also show that, for subjects at the same ability level, the scale of the prior guessing parameter is related to their performance pattern across testlets of items with different a priori guessing probabilities. In this analysis, the low-ability groups, i.e., Groups 4 to 9, are considered. The prior guessing parameters are lowest for Groups 6 and 7, at -1.52 and -1.66 respectively. These two groups perform worse in the testlet with the lower a priori guessing probability. For the groups that perform better in the testlet with the lower a priori guessing probability, i.e., Groups 4 and 5, the prior guessing parameters are the highest, at -0.44 and -0.42 respectively. For Groups 8 and 9, which have a mixed performance pattern within each testlet but a similar pattern across testlets, the values of the prior guessing parameters lie between those of the two aforementioned clusters of groups.
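The adjustments quoted in the first two analyses are simple differences between the two ability estimates. They can be checked with the short sketch below, which uses only the values stated in the text; the variable names are illustrative.

```python
# (proposed-model estimate, 2PL-IRT estimate) of ability, as quoted in the text
estimates = {
    1: (-1.25, -1.36),
    2: (0.58, 0.46),
    3: (2.61, 2.38),
    4: (-0.71, -0.32),
    5: (-0.50, -0.16),
    6: (-0.05, -0.37),
    7: (-0.10, -0.46),
}

# Adjustment = proposed estimate minus 2PL estimate, rounded to 2 decimals.
adjustment = {g: round(p - b, 2) for g, (p, b) in estimates.items()}

raised = sorted(g for g, a in adjustment.items() if a > 0)   # ability raised
lowered = sorted(g for g, a in adjustment.items() if a < 0)  # ability lowered
```

The groups with lowered estimates are exactly Groups 4 and 5, the groups assumed to guess more.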

DISCUSSION
This paper has described a psychometric framework based on the testlet response model to deal with the guessing effect and has shown that it measures a subject's ability more faithfully than the 2PL-IRT model. The proposed model introduces a prior guessing parameter, $\lambda_{is}$, which models the prior guessing effect of subject $i$ at testlet $t_s$ in the testlet response model. The notion is adopted from Glas et al. (2000), where there are at least three mathematically isomorphic ways to include the testlet parameter in the IRT model. With the proposed prior guessing parameter, the logit of Equation (4) can be configured as $\alpha_j((\theta_i + \lambda_{it(j)}) - \beta_j)$, where $\lambda_{it(j)}$ is part of ability, or $\alpha_j(\theta_i + (\lambda_{it(j)} - \beta_j))$, where $\lambda_{it(j)}$ is part of difficulty, or $\alpha_j(\theta_i + \lambda_{it(j)} - \beta_j)$, where $\lambda_{it(j)}$ is an independent entity. The focus of this paper is the first case, in which the prior guessing parameter is considered part of ability. The simulation results show that the proposed prior guessing parameter works well as a moderator of the ability parameter. The results of the first analysis imply that the prior guessing parameter in the proposed model rewards the ability estimates of subjects who do not guess. The result of the second analysis supports the first and further implies that the prior guessing parameter rewards the ability of subjects with a lower degree of guessing but penalizes the ability of subjects with a higher degree of guessing. In terms of the scale of the prior guessing parameter, it is lower for subjects who show a higher degree of guessing.
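The three configurations differ only in how the terms of the logit are grouped, which is why they are mathematically isomorphic. The sketch below evaluates all three for one set of hypothetical parameter values; the function names are illustrative.

```python
def logit_part_of_ability(alpha, theta, lam, beta):
    """lambda grouped with ability: alpha * ((theta + lambda) - beta)."""
    return alpha * ((theta + lam) - beta)


def logit_part_of_difficulty(alpha, theta, lam, beta):
    """lambda grouped with difficulty: alpha * (theta + (lambda - beta))."""
    return alpha * (theta + (lam - beta))


def logit_independent(alpha, theta, lam, beta):
    """lambda as a separate term: alpha * (theta + lambda - beta)."""
    return alpha * (theta + lam - beta)


# Hypothetical parameter values (illustration only).
args = dict(alpha=1.0, theta=-2.5, lam=-0.44, beta=0.5)
values = [
    logit_part_of_ability(**args),
    logit_part_of_difficulty(**args),
    logit_independent(**args),
]
```

All three values agree up to floating-point rounding, so the choice among the configurations affects the interpretation of the parameter rather than the fitted response probabilities.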

However, this model has some limitations. First, a comparison between the results of the third and first analyses shows that the prior guessing parameter does not serve as a sensitive moderator when the performance pattern is similar across testlets but mixed within each testlet. Second, the a priori guessing probability considered in this study depends on the number of response options rather than on the number of correct responses. In other words, the design of multiple correct responses has not been fully utilised.

CONCLUSION
This study proposes a psychometric framework based on IRT to model the effect of having different a priori guessing probabilities across items. The inclusion of the proposed prior guessing parameter in the 2PL-IRT model successfully moderates the ability parameters. However, the model has limitations. Future work will address two main topics: the sensitivity of the prior guessing parameter to the a priori probability, the number of items and the number of testlets; and how to model the partial knowledge of a subject based on the design of multiple correct responses used in the proposed model.