--- title: "IRT without the normality assumption" output: rmarkdown::html_vignette vignette: > %\VignetteIndexEntry{IRT without the normality assumption} %\VignetteEngine{knitr::rmarkdown} %\VignetteEncoding{UTF-8} fontsize: 12pt --- ```{r, include = FALSE} knitr::opts_chunk$set( collapse = TRUE, comment = "#>" ) ``` ```{r setup} library(IRTest) library(ggplot2) library(gridExtra) ``` # 0. Introduction + **IRTest** is a useful tool for $\mathcal{\color{red}{IRT}}$ (item response theory) parameter $\mathcal{\color{red}{est}}$imation, especially when the violation of normality assumption on latent distribution is suspected. + **IRTest** deals with uni-dimensional latent variable. + For missing values, **IRTest** adopts full information maximum likelihood (FIML) approach. + In **IRTest**, including the conventional usage of Gaussian distribution, several methods are available for estimation of latent distribution: + empirical histogram method, + two-component Gaussian mixture distribution, + Davidian curve, + kernel density estimation, + log-linear smoothing. ## Installation The CRAN version of **IRTest** can be installed on R-console with: ``` install.packages("IRTest") ``` For the development version, it can be installed on R-console with: ``` devtools::install_github("SeewooLi/IRTest") ``` ## Functions Followings are the functions of **IRTest**. + `IRTest_Dich` is the estimation function when all items are *dichotomously* scored. + `IRTest_Poly` is the estimation function when all items are *polytomously* scored. + `IRTest_Cont` is the estimation function when all items are *continuously* scored. + `IRTest_Mix` is the estimation function for *a mixed-format test*, a test comprising both dichotomous item(s) and polytomous item(s). + `factor_score` estimates factor scores of examinees. + `coef_se` returns standard errors of item parameter estimates. + `best_model` selects the best model using an evaluation criterion. + `item_fit` tests the statistical fit of all items individually. + `inform_f_item` calculates the information value(s) of an item. + `inform_f_test` calculates the information value(s) of a test. + `plot_item` draws item response function(s) of an item. + `reliability` calculates marginal reliability coefficient of IRT. + `latent_distribution` returns evaluated PDF value(s) of an estimated latent distribution. + `DataGeneration` generates several objects that can be useful for computer simulation studies. Among these are simulated item parameters, ability parameters and the corresponding item-response data. + `dist2` is a probability density function of two-component Gaussian mixture distribution. + `original_par_2GM` converts re-parameterized parameters of two-component Gaussian mixture distribution into original parameters. + `cat_clps` recommends category collapsing based on item parameters (or, equivalently, item response functions). + `recategorize` implements the category collapsing. + `adaptive_test` estimates ability parameter(s) which can be utilized for computer adaptive testing (CAT). + For S3 methods, `anova`, `coef`, `logLik`, `plot`, `print`, and `summary` are available. \ --- \break # 1. Dichotomous items + **Preparing data** The function `DataGeneration` can be used in a preparation step. This function returns a set of artificial data and the true parameters underlying the data. ```{r} Alldata <- DataGeneration(model_D = 2, N=1000, nitem_D = 15, latent_dist = "2NM", d = 1.664, sd_ratio = 2, prob = 0.3) data <- Alldata$data_D theta <- Alldata$theta colnames(data) <- paste0("item", 1:15) ``` \ + **Analysis** (parameter estimation) ```{r, results='hide', message=FALSE} Mod1 <- IRTest_Dich(data = data, model = 2, latent_dist = "LLS", h=4) ``` \ + **Results** ```{r, message=FALSE, fig.align='center', fig.height=6, fig.width=6} ### Summary summary(Mod1) ### Log-likelihood logLik(Mod1) ### The estimated item parameters coef(Mod1) ### Standard errors of the item parameter estimates coef_se(Mod1) ### The estimated ability parameters fscore <- factor_score(Mod1, ability_method = "MLE") plot(theta, fscore$theta) abline(b=1, a=0) ### Standard errors of ability parameter estimates plot(fscore$theta, fscore$theta_se) ``` * The result of latent distribution estimation ```{r, fig.align='center', fig.asp=.6, fig.width=6} plot(Mod1, mapping = aes(colour="Estimated"), linewidth = 1) + lims(y = c(0, .75))+ geom_line( mapping=aes( x=seq(-6,6,length=121), y=dist2(seq(-6,6,length=121), prob = .3, d = 1.664, sd_ratio = 2), colour="True"), linewidth = 1)+ labs(title="The estimated latent density using '2NM'", colour= "Type")+ theme_bw() ``` * Item response function ```{r, fig.align='center', fig.asp=0.8, fig.width=6} p1 <- plot_item(Mod1,1) p2 <- plot_item(Mod1,4) p3 <- plot_item(Mod1,8) p4 <- plot_item(Mod1,10) grid.arrange(p1, p2, p3, p4, ncol=2, nrow=2) ``` + **Item fit** ```{r} item_fit(Mod1) ``` * Reliability ```{r} reliability(Mod1) ``` * Posterior distributions for the examinees Each examinee's posterior distribution is identified in the E-step of the estimation algorithm (i.e., EM algorithm). Posterior distributions can be found in `Mod1$Pk`. ```{r, fig.align='center', fig.asp=.7, fig.width=6} set.seed(1) selected_examinees <- sample(1:1000,6) post_sample <- data.frame( X = rep(seq(-6,6, length.out=121),6), prior = rep(Mod1$Ak/(Mod1$quad[2]-Mod1$quad[1]), 6), posterior = 10*c(t(Mod1$Pk[selected_examinees,])), ID = rep(paste("examinee", selected_examinees), each=121) ) ggplot(data=post_sample, mapping=aes(x=X))+ geom_line(mapping=aes(y=posterior, group=ID, color='Posterior'))+ geom_line(mapping=aes(y=prior, group=ID, color='Prior'))+ labs(title="Posterior densities for selected examinees", x=expression(theta), y='density')+ facet_wrap(~ID, ncol=2)+ theme_bw() ``` * Test information function ```{r, fig.align='center', fig.asp=.8, fig.width=6} ggplot()+ stat_function( fun = inform_f_test, args = list(Mod1) )+ stat_function( fun=inform_f_item, args = list(Mod1, 1), mapping = aes(color="Item 1") )+ stat_function( fun=inform_f_item, args = list(Mod1, 2), mapping = aes(color="Item 2") )+ stat_function( fun=inform_f_item, args = list(Mod1, 3), mapping = aes(color="Item 3") )+ stat_function( fun=inform_f_item, args = list(Mod1, 4), mapping = aes(color="Item 4") )+ stat_function( fun=inform_f_item, args = list(Mod1, 5), mapping = aes(color="Item 5") )+ lims(x=c(-6,6))+ labs(title="Test information function", x=expression(theta), y='information')+ theme_bw() ``` ---------- \break # 2. Polytomous items + **Preparing data** ```{r} Alldata <- DataGeneration(model_P = "GRM", categ = rep(c(3,7), each = 7), N=1000, nitem_P = 14, latent_dist = "2NM", d = 1.664, sd_ratio = 2, prob = 0.3) data <- Alldata$data_P theta <- Alldata$theta colnames(data) <- paste0("item", 1:14) ``` \ + **Analysis** (parameter estimation) ```{r, results='hide', message=FALSE} Mod1 <- IRTest_Poly(data = data, model = "GRM", latent_dist = "KDE") ``` \ + **Results** ```{r, message=FALSE, fig.align='center', fig.height=6, fig.width=6} ### Summary summary(Mod1) ### Log-likelihood logLik(Mod1) ### The estimated item parameters coef(Mod1) ### Standard errors of the item parameter estimates coef_se(Mod1) ### The estimated ability parameters fscore <- factor_score(Mod1, ability_method = "MLE") plot(theta, fscore$theta) abline(b=1, a=0) ### Standard errors of ability parameter estimates plot(fscore$theta, fscore$theta_se) ``` * The result of latent distribution estimation ```{r, fig.align='center', fig.asp=.6, fig.width=6} plot(Mod1, mapping = aes(colour="Estimated"), linewidth = 1) + stat_function( fun = dist2, args = list(prob = .3, d = 1.664, sd_ratio = 2), mapping = aes(colour = "True"), linewidth = 1) + lims(y = c(0, .75)) + labs(title="The estimated latent density using '2NM'", colour= "Type")+ theme_bw() ``` * Item response function ```{r, fig.align='center', fig.asp=0.8, fig.width=6} p1 <- plot_item(Mod1,1) p2 <- plot_item(Mod1,4) p3 <- plot_item(Mod1,8) p4 <- plot_item(Mod1,10) grid.arrange(p1, p2, p3, p4, ncol=2, nrow=2) ``` + **Item fit** ```{r} item_fit(Mod1) ``` * Reliability ```{r} reliability(Mod1) ``` * Posterior distributions for the examinees Each examinee's posterior distribution is identified in the E-step of the estimation algorithm (i.e., EM algorithm). Posterior distributions can be found in `Mod1$Pk`. ```{r, fig.align='center', fig.asp=.7, fig.width=6} set.seed(1) selected_examinees <- sample(1:1000,6) post_sample <- data.frame( X = rep(seq(-6,6, length.out=121),6), prior = rep(Mod1$Ak/(Mod1$quad[2]-Mod1$quad[1]), 6), posterior = 10*c(t(Mod1$Pk[selected_examinees,])), ID = rep(paste("examinee", selected_examinees), each=121) ) ggplot(data=post_sample, mapping=aes(x=X))+ geom_line(mapping=aes(y=posterior, group=ID, color='Posterior'))+ geom_line(mapping=aes(y=prior, group=ID, color='Prior'))+ labs(title="Posterior densities for selected examinees", x=expression(theta), y='density')+ facet_wrap(~ID, ncol=2)+ theme_bw() ``` * Test information function ```{r, fig.align='center', fig.asp=.8, fig.width=6} ggplot()+ stat_function( fun = inform_f_test, args = list(Mod1) )+ stat_function( fun=inform_f_item, args = list(Mod1, 1), mapping = aes(color="Item 1 (3 cats)") )+ stat_function( fun=inform_f_item, args = list(Mod1, 2), mapping = aes(color="Item 2 (3 cats)") )+ stat_function( fun=inform_f_item, args = list(Mod1, 3), mapping = aes(color="Item 3 (3 cats)") )+ stat_function( fun=inform_f_item, args = list(Mod1, 8), mapping = aes(color="Item 8 (7 cats)") )+ stat_function( fun=inform_f_item, args = list(Mod1, 9), mapping = aes(color="Item 9 (7 cats)") )+ stat_function( fun=inform_f_item, args = list(Mod1, 10, "p"), mapping = aes(color="Item10 (7 cats)") )+ lims(x=c(-6,6))+ labs(title="Test information function", x=expression(theta), y='information')+ theme_bw() ``` ---------- \break \ # 3. Continuous items + **Statistical details of the continuous IRT**
Beta distribution (click) $$ \begin{align} f(x) &= \frac{1}{Beta(\alpha, \beta)}x^{\alpha-1}(1-x)^{(\beta-1)} \\ &= \frac{\Gamma(\alpha+\beta)}{\Gamma(\alpha)\Gamma(\beta)}x^{\alpha-1}(1-x)^{(\beta-1)} \end{align} $$ $E(x)=\frac{\alpha}{\alpha+\beta}$ and $Var(x)=\frac{\alpha\beta}{(\alpha+\beta)^2(\alpha+\beta=1)}$ If we reparameterize $\mu=\frac{\alpha}{\alpha+\beta}$ and $\nu=\alpha+\beta$, $$ f(x) = \frac{\Gamma(\nu)}{\Gamma(\mu\nu)\Gamma(\nu(1-\mu))}x^{\mu\nu-1}(1-x)^{(\nu(1-\mu)-1)} $$ No Jacobian transformation required since $\mu$ and $\nu$ are parameters of the $f(x)$, not variables.
Useful equations (click) $\psi(\bullet)$ and $\psi_1(\bullet)$ denote for digamma and trigamma functions, respectively. $$ \begin{align} E[\log{x}] &= \int_{0}^{1}{\log{x}f(x) \,dx} \\ &= \int_{0}^{1}{\log{x} \frac{1}{Beta(\alpha, \beta)}x^{\alpha-1}(1-x)^{(\beta-1)} \,dx} \\ &= \frac{1}{Beta(\alpha, \beta)} \int_{0}^{1}{\log{(x)} x^{\alpha-1}(1-x)^{(\beta-1)} \,dx} \\ &= \frac{1}{Beta(\alpha, \beta)} \int_{0}^{1}{\frac{\partial x^{\alpha-1}(1-x)^{(\beta-1)}}{\partial \alpha} \,dx} \\ &= \frac{1}{Beta(\alpha, \beta)} \frac{\partial}{\partial \alpha}\int_{0}^{1}{ x^{\alpha-1}(1-x)^{(\beta-1)} \,dx} \\ &= \frac{1}{Beta(\alpha, \beta)} \frac{\partial Beta(\alpha, \beta)}{\partial \alpha} \\ &= \frac{\partial \log{[Beta(\alpha, \beta)]}}{\partial \alpha} \\ &= \frac{\partial \log{[\Gamma(\alpha)]}}{\partial \alpha} - \frac{\partial \log{[\Gamma(\alpha + \beta)]}}{\partial \alpha} \\ &= \psi(\alpha) - \psi(\alpha+\beta) \end{align} $$ Similarly, $E[\log{(1-x)}]=\psi(\beta) - \psi(\alpha+\beta)$. Furthermore, using $\frac{\partial Beta(\alpha,\beta)}{\partial \alpha} = Beta(\alpha,\beta)\left(\psi(\alpha)-\psi(\alpha+\beta)\right)$ and $\frac{\partial^2 Beta(\alpha,\beta)}{\partial \alpha^2} = Beta(\alpha,\beta)\left(\psi(\alpha)-\psi(\alpha+\beta)\right)^2 + Beta(\alpha,\beta)\left(\psi_1(\alpha)-\psi_1(\alpha+\beta)\right)$, $$ \begin{align} E\left[(\log{x})^2\right] &= \frac{1}{Beta(\alpha, \beta)} \frac{\partial^2 Beta(\alpha, \beta)}{\partial \alpha^2} \\ &= \left(\psi(\alpha)-\psi(\alpha+\beta)\right)^2 + \left(\psi_1(\alpha)-\psi_1(\alpha+\beta)\right) \end{align} $$ This leads to, $$ \begin{align} Var\left[\log{x}\right] &= E\left[(\log{x})^2\right] - E\left[\log{x}\right]^2 \\ &=\psi_1(\alpha)-\psi_1(\alpha+\beta) \end{align} $$
Continuous IRT (click) + Expected item response $$ \mu = \frac{e^{a(\theta -b)}}{1+e^{a(\theta -b)}} \\ \frac{\partial \mu}{\partial \theta} = a\mu(1-\mu) \\ \frac{\partial \mu}{\partial a} = (\theta - b)\mu(1-\mu) \\ \frac{\partial \mu}{\partial b} = -a\mu(1-\mu) \\ \frac{\partial \mu}{\partial \nu} = 0 $$ + Probability of a response $$ f(x)=P(x|\, \theta, a, b, \nu) = \frac{\Gamma(\nu)}{\Gamma(\mu\nu)\Gamma(\nu(1-\mu))} x^{\mu\nu-1} (1-x)^{\nu(1-\mu)-1} \\ $$ + Gradient $$ \log{f} = \log{\Gamma(\nu)}-\log{\Gamma(\mu\nu)}-\log{\Gamma(\nu(1-\mu))} + (\mu\nu-1)\log{x} + (\nu(1-\mu)-1) \log{(1-x)} $$ $$ \frac{\partial \log{f}}{\partial \theta} = a\nu\mu(1-\mu)\left[-\psi{(\mu\nu)}+\psi{(\nu(1-\mu))}+ \log{\left(\frac{x}{1-x}\right)}\right] $$ + Information $$ E\left[ \left(\frac{\partial \log{f}}{\partial \theta}\right)^2 \right] = (a\nu\mu(1-\mu))^2\left[E\left[ \log{\left(\frac{x}{1-x}\right)^2}\right] -2 \left(\psi{(\mu\nu)}-\psi{(\nu(1-\mu))}\right )E\left[ \log{\left(\frac{x}{1-x}\right)}\right] +\left(\psi{(\mu\nu)}-\psi{(\nu(1-\mu))}\right )^2 \right] $$ $$ \begin{align} E\left[ \log{\left(\frac{x}{1-x}\right)^2}\right] &= E\left[ \log{\left(x\right)^2}\right] -2 E\left[ \log{\left(x\right)}\log{\left(1-x\right)}\right] + E\left[ \log{\left(1-x\right)^2}\right] \\ &= Var\left[ \log{\left(x\right)}\right]+E\left[ \log{\left(x\right)}\right]^2 \\ &\qquad -2 Cov\left[ \log{\left(x\right)}\log{\left(1-x\right)}\right]-2E\left[ \log{\left(x\right)}\right]E\left[ \log{\left(1-x\right)}\right] \\ &\qquad + Var\left[ \log{\left(1-x\right)}\right]+E\left[ \log{\left(1-x\right)}\right]^2 \\ &= \psi_{1}(\alpha)-\psi_{1}(\alpha+\beta) +E\left[ \log{\left(x\right)}\right]^2 \\ &\qquad +2 \psi_{1}(\alpha+\beta)-2E\left[ \log{\left(x\right)}\right]E\left[ \log{\left(1-x\right)}\right] \\ &\qquad + \psi_{1}(\beta)-\psi_{1}(\alpha+\beta)+E\left[ \log{\left(1-x\right)}\right]^2 \\ &= \psi_{1}(\alpha)-\psi_{1}(\alpha+\beta) +\left[ \psi(\alpha)-\psi(\alpha+\beta)\right]^2 \\ &\qquad +2 \psi_{1}(\alpha+\beta)-2 \left(\psi(\alpha)-\psi(\alpha+\beta)\right)\left(\psi(\beta)-\psi(\alpha+\beta)\right) \\ &\qquad + \psi_{1}(\beta)-\psi_{1}(\alpha+\beta)+\left[\psi(\beta)-\psi(\alpha+\beta)\right]^2 \\ &= \psi_{1}(\alpha) +\psi_{1}(\beta) +\left[\psi(\alpha)-\psi(\beta)\right]^2 \end{align} $$ $$ \begin{align} E\left[\left(\frac{\partial \log{f}}{\partial \theta}\right)^2 \right] & = (a\nu\mu(1-\mu))^2\left[E\left[ \log{\left(\frac{x}{1-x}\right)^2}\right] -2 \left(\psi{(\mu\nu)}-\psi{(\nu(1-\mu))}\right )E\left[ \log{\left(\frac{x}{1-x}\right)}\right] +\left(\psi{(\mu\nu)}-\psi{(\nu(1-\mu))}\right )^2 \right] \\ &= (a\nu\mu(1-\mu))^2\left[\psi_{1}(\alpha) +\psi_{1}(\beta) +\left[\psi(\alpha)-\psi(\beta)\right]^2 -2 \left(\psi{(\alpha)}-\psi{(\beta)}\right )\left(\psi{(\alpha)}-\psi{(\beta)}\right ) +\left(\psi{(\alpha)}-\psi{(\beta)}\right )^2 \right] \\ &= (a\nu\mu(1-\mu))^2\left[\psi_{1}(\alpha) +\psi_{1}(\beta) \right] \\ \end{align} $$ $$ I(\theta) = E\left[\left(\frac{\partial \log{f}}{\partial \theta}\right)^2 \right] = (a\nu\mu(1-\mu))^2\left[\psi_{1}(\alpha) +\psi_{1}(\beta)\right] $$ + Log-likelihood in the M-step of the MML-EM procedure Marginal log-likelihood of an item can be expressed as follows: $$ \ell = \sum_{j} \sum_{q}\gamma_{jq}\log{L_{jq}}, $$ where $\gamma_{jq}=E\left[\Pr\left(\theta_j \in \theta_{q}^{*}\right)\right]$ is the expected probability of respondent $j$'s ability ($\theta_j$) belonging to the $\theta_{q}^{*}$ of the quadrature scheme and is calculated at the E-step of the MML-EM procedure, and $L_{jq}$ is the likelihood of respondent $j$'s response at $\theta_{q}^{*}$ for the item of current interest. + First derivative of the likelihood $$ \frac{\partial \ell}{\partial a} = \sum_{q} \left(\theta_{q}-b\right)\nu\mu_{q}\left(1-\mu_{q}\right)\left[ S_{1q}-S_{2q}-f_q\left[ \psi(\mu_{q}\nu)-\psi(\nu(1-\mu_{q})) \right] \right] \\ \frac{\partial \ell}{\partial b} = -a\sum_{q}\nu\mu_{q}\left(1-\mu_{q}\right)\left[ S_{1q}-S_{2q}-f_q\left[ \psi(\mu_{q}\nu)-\psi(\nu(1-\mu_{q})) \right] \right] \\ \frac{\partial \ell}{\partial \nu} = N\psi(\nu) +\sum_{q}\left[ \mu_{q}(S_{1q}-f_q\psi(\mu_{q}\nu)) + (1-\mu_{q})(S_{2q}-f_q\psi(\nu(1-\mu_{q}))) \right] $$ where $S_{1q} = \sum_{j}{\gamma_{jq}\log{x_j}}$ and $S_{2q} = \sum_{j}{\gamma_{jq}\log{(1-x_j)}}$. Since $E_q[S_{1q}]=f_q\left[\psi(\mu_{q}\nu))-\psi(\nu)\right]$ and $E_q[S_{2q}]=f_q\left[\psi(\nu(1-\mu_{q})))-\psi(\nu)\right]$, the expected values of the first derivatives are 0. To keep $\nu$ positive, let $\nu = \exp{\xi}$; $\frac{\partial\nu}{\partial\xi}=\exp{\xi}=\nu$. $$ \frac{\partial \ell}{\partial \xi} = N\nu\psi(\nu) +\nu\sum_{q}\left[ \mu_{q}(S_{1q}-f_q\psi(\mu_{q}\nu)) + (1-\mu_{q})(S_{2q}-f_q\psi(\nu(1-\mu_{q}))) \right] $$ + Second derivative of the likelihood $$ E\left( \frac{\partial^2\ell}{\partial a^2}\right) = -\sum_{q} \left\{\left(\theta_{q}-b\right)\nu\mu_{q}\left(1-\mu_{q}\right)\right\}^{2}f_{q}\left[ \psi_{1}(\mu_{q}\nu)+\psi_{1}(\nu(1-\mu_{q})) \right] \\ E\left(\frac{\partial^2\ell}{\partial a \partial b}\right) = a\sum_{q} \left(\theta_{q}-b\right)\left\{\nu\mu_{q}\left(1-\mu_{q}\right)\right\}^{2}f_{q}\left[ \psi_{1}(\mu_{q}\nu)+\psi_{1}(\nu(1-\mu_{q})) \right] \\ E\left(\frac{\partial^2\ell}{\partial a \partial \nu}\right) = -\sum_{q} \left(\theta_{q}-b\right)\nu\mu_{q}\left(1-\mu_{q}\right)f_q\left[ \mu_{q}\psi_{1}(\mu_{q}\nu)-(1-\mu_{q})\psi_{1}(\nu(1-\mu_{q})) \right] \\ E\left(\frac{\partial^2\ell}{\partial b^2}\right) = -a^{2}\sum_{q} \left\{\nu\mu_{q}\left(1-\mu_{q}\right)\right\}^{2}f_{q}\left[ \psi_{1}(\mu_{q}\nu)+\psi_{1}(\nu(1-\mu_{q})) \right] \\ E\left(\frac{\partial^2\ell}{\partial b \partial \nu}\right) = a\sum_{q} \nu\mu_{q}\left(1-\mu_{q}\right)f_q\left[ \mu_{q}\psi_{1}(\mu_{q}\nu)-(1-\mu_{q})\psi_{1}(\nu(1-\mu_{q})) \right] \\ E\left(\frac{\partial^2\ell}{\partial \nu^2}\right) = N\psi_{1}(\nu) - \sum_{q}f_q\left[ \mu_{q}^{2}\psi_{1}(\mu_{q}\nu)+(1-\mu_{q})^{2}\psi_{1}(\nu(1-\mu_{q})) \right] $$ If we use $\xi$ instead of $\nu$, $$ E\left(\frac{\partial^2\ell}{\partial a \partial \xi}\right) = -\sum_{q} \left(\theta_{q}-b\right)\nu^{2}\mu_{q}\left(1-\mu_{q}\right)f_q\left[ \mu_{q}\psi_{1}(\mu_{q}\nu)-(1-\mu_{q})\psi_{1}(\nu(1-\mu_{q})) \right] \\ E\left(\frac{\partial^2\ell}{\partial b \partial \xi}\right) = a\sum_{q} \nu^{2}\mu_{q}\left(1-\mu_{q}\right)f_q\left[ \mu_{q}\psi_{1}(\mu_{q}\nu)-(1-\mu_{q})\psi_{1}(\nu(1-\mu_{q})) \right] \\ E\left(\frac{\partial^2\ell}{\partial \xi^2}\right) = N\nu^{2}\psi_{1}(\nu) - \nu^{2}\sum_{q}f_q\left[ \mu_{q}^{2}\psi_{1}(\mu_{q}\nu)+(1-\mu_{q})^{2}\psi_{1}(\nu(1-\mu_{q})) \right] $$
\ \ + **Preparing data** The function `DataGeneration` can be used in a preparation step. This function returns a set of artificial data and the true parameters underlying the data. ```{r} Alldata <- DataGeneration(N=1000, nitem_C = 8, latent_dist = "2NM", a_l = .3, a_u = .7, d = 1.664, sd_ratio = 2, prob = 0.3) data <- Alldata$data_C theta <- Alldata$theta colnames(data) <- paste0("item", 1:8) ``` \ + **Analysis** (parameter estimation) ```{r, results='hide', message=FALSE} Mod1 <- IRTest_Cont(data = data, latent_dist = "KDE") ``` \ + **Results** ```{r, message=FALSE, fig.align='center', fig.height=6, fig.width=6} ### Summary summary(Mod1) ### Log-likelihood logLik(Mod1) ### The estimated item parameters coef(Mod1) ### The estimated ability parameters fscore <- factor_score(Mod1, ability_method = "WLE") plot(theta, fscore$theta) abline(b=1, a=0) ### Standard errors of ability parameter estimates plot(fscore$theta, fscore$theta_se) ``` * The result of latent distribution estimation ```{r, fig.align='center', fig.asp=.6, fig.width=6} plot(Mod1, mapping = aes(colour="Estimated"), linewidth = 1) + lims(y = c(0, .75))+ geom_line( mapping=aes( x=seq(-6,6,length=121), y=dist2(seq(-6,6,length=121), prob = .3, d = 1.664, sd_ratio = 2), colour="True"), linewidth = 1)+ labs(title="The estimated latent density using '2NM'", colour= "Type")+ theme_bw() ``` * Item response function ```{r, fig.align='center', fig.asp=0.8, fig.width=6} p1 <- plot_item(Mod1,1) p2 <- plot_item(Mod1,2) p3 <- plot_item(Mod1,3) p4 <- plot_item(Mod1,4) grid.arrange(p1, p2, p3, p4, ncol=2, nrow=2) ``` * Reliability ```{r} reliability(Mod1) ``` * Posterior distributions for the examinees Each examinee's posterior distribution is identified in the E-step of the estimation algorithm (i.e., EM algorithm). Posterior distributions can be found in `Mod1$Pk`. ```{r, fig.align='center', fig.asp=.7, fig.width=6} set.seed(1) selected_examinees <- sample(1:1000,6) post_sample <- data.frame( X = rep(seq(-6,6, length.out=121),6), prior = rep(Mod1$Ak/(Mod1$quad[2]-Mod1$quad[1]), 6), posterior = 10*c(t(Mod1$Pk[selected_examinees,])), ID = rep(paste("examinee", selected_examinees), each=121) ) ggplot(data=post_sample, mapping=aes(x=X))+ geom_line(mapping=aes(y=posterior, group=ID, color='Posterior'))+ geom_line(mapping=aes(y=prior, group=ID, color='Prior'))+ labs(title="Posterior densities for selected examinees", x=expression(theta), y='density')+ facet_wrap(~ID, ncol=2)+ theme_bw() ``` * Test information function ```{r, fig.align='center', fig.asp=.8, fig.width=6} ggplot()+ stat_function( fun = inform_f_test, args = list(Mod1) )+ stat_function( fun=inform_f_item, args = list(Mod1, 1), mapping = aes(color="Item 1") )+ stat_function( fun=inform_f_item, args = list(Mod1, 2), mapping = aes(color="Item 2") )+ stat_function( fun=inform_f_item, args = list(Mod1, 3), mapping = aes(color="Item 3") )+ stat_function( fun=inform_f_item, args = list(Mod1, 4), mapping = aes(color="Item 4") )+ stat_function( fun=inform_f_item, args = list(Mod1, 5), mapping = aes(color="Item 5") )+ stat_function( fun=inform_f_item, args = list(Mod1, 6), mapping = aes(color="Item 6") )+ stat_function( fun=inform_f_item, args = list(Mod1, 7), mapping = aes(color="Item 7") )+ stat_function( fun=inform_f_item, args = list(Mod1, 8), mapping = aes(color="Item 8") )+ lims(x=c(-6,6))+ labs(title="Test information function", x=expression(theta), y='information')+ theme_bw() ``` ---------- \break # 4. Mixed-format test + **Preparing data** As in the cases of dichotomous and polytomous items, the function `DataGeneration` can be used in the preparation step. This function returns artificial data and some useful objects for analysis (i.e., `theta`, `data_D`, `item_D`, `data_P`, & `item_P`). ```{r} Alldata <- DataGeneration(model_D = 2, model_P = "GRM", N=1000, nitem_D = 10, nitem_P = 5, latent_dist = "2NM", d = 1.664, sd_ratio = 1, prob = 0.5) DataD <- Alldata$data_D DataP <- Alldata$data_P theta <- Alldata$theta colnames(DataD) <- paste0("item", 1:10) colnames(DataP) <- paste0("item", 1:5) ``` \ \ \ + **Analysis** (parameter estimation) ```{r, results='hide', message=FALSE} Mod1 <- IRTest_Mix(data_D = DataD, data_P = DataP, model_D = "2PL", model_P = "GRM", latent_dist = "KDE") ``` \ \ \ + **Results** ```{r, message=FALSE, fig.align='center', fig.height=6, fig.width=6} ### Summary summary(Mod1) ### Log-likelihood logLik(Mod1) ### The estimated item parameters coef(Mod1) ### Standard errors of the item parameter estimates coef_se(Mod1) ### The estimated ability parameters fscore <- factor_score(Mod1, ability_method = "MLE") plot(theta, fscore$theta) abline(b=1, a=0) ### Standard errors of ability parameter estimates plot(fscore$theta, fscore$theta_se) ``` * The result of latent distribution estimation ```{r, fig.align='center', fig.asp=.6, fig.width=6} plot(Mod1, mapping = aes(colour="Estimated"), linewidth = 1) + stat_function( fun = dist2, args = list(prob = .5, d = 1.664, sd_ratio = 1), mapping = aes(colour = "True"), linewidth = 1) + lims(y = c(0, .75)) + labs(title="The estimated latent density using '2NM'", colour= "Type")+ theme_bw() ``` * Item response function ```{r, fig.align='center', fig.asp=0.8, fig.width=6} p1 <- plot_item(Mod1,1, type="d") p2 <- plot_item(Mod1,4, type="d") p3 <- plot_item(Mod1,8, type="d") p4 <- plot_item(Mod1,10, type="d") grid.arrange(p1, p2, p3, p4, ncol=2, nrow=2) ``` ```{r, fig.align='center', fig.asp=0.8, fig.width=6} p1 <- plot_item(Mod1,1, type="p") p2 <- plot_item(Mod1,2, type="p") p3 <- plot_item(Mod1,3, type="p") p4 <- plot_item(Mod1,4, type="p") grid.arrange(p1, p2, p3, p4, ncol=2, nrow=2) ``` + **Item fit** ```{r} item_fit(Mod1) ``` * Reliability ```{r} reliability(Mod1) ``` * Posterior distributions for the examinees Each examinee's posterior distribution is identified in the E-step of the estimation algorithm (i.e., EM algorithm). Posterior distributions can be found in `Mod1$Pk`. ```{r, fig.align='center', fig.asp=.7, fig.width=6} set.seed(1) selected_examinees <- sample(1:1000,6) post_sample <- data.frame( X = rep(seq(-6,6, length.out=121),6), prior = rep(Mod1$Ak/(Mod1$quad[2]-Mod1$quad[1]), 6), posterior = 10*c(t(Mod1$Pk[selected_examinees,])), ID = rep(paste("examinee", selected_examinees), each=121) ) ggplot(data=post_sample, mapping=aes(x=X))+ geom_line(mapping=aes(y=posterior, group=ID, color='Posterior'))+ geom_line(mapping=aes(y=prior, group=ID, color='Prior'))+ labs(title="Posterior densities for selected examinees", x=expression(theta), y='density')+ facet_wrap(~ID, ncol=2)+ theme_bw() ``` * Test information function ```{r, fig.align='center', fig.asp=.8, fig.width=8} ggplot()+ stat_function( fun = inform_f_test, args = list(Mod1) )+ stat_function( fun=inform_f_item, args = list(Mod1, 1, "d"), mapping = aes(color="Dichotomous Item 1") )+ stat_function( fun=inform_f_item, args = list(Mod1, 2, "d"), mapping = aes(color="Dichotomous Item 2") )+ stat_function( fun=inform_f_item, args = list(Mod1, 3, "d"), mapping = aes(color="Dichotomous Item 3") )+ stat_function( fun=inform_f_item, args = list(Mod1, 1, "p"), mapping = aes(color="Polytomous Item 1") )+ stat_function( fun=inform_f_item, args = list(Mod1, 2, "p"), mapping = aes(color="Polytomous Item 2") )+ stat_function( fun=inform_f_item, args = list(Mod1, 3, "p"), mapping = aes(color="Polytomous Item 3") )+ lims(x=c(-6,6))+ labs(title="Test information function", x=expression(theta), y='information')+ theme_bw() ``` ---------- \break # 5. Model comparison + **Data generation and model fitting** ```{r} data <- DataGeneration(N=1000, nitem_D = 10, latent_dist = "2NM", d = 1.664, sd_ratio = 2, prob = 0.3)$data_D ``` ```{r, results='hide', message=FALSE} model_fits <- list() model_fits[[1]] <- IRTest_Dich(data) model_fits[[2]] <- IRTest_Dich(data, latent_dist = "EHM") model_fits[[3]] <- IRTest_Dich(data, latent_dist = "2NM") model_fits[[4]] <- IRTest_Dich(data, latent_dist = "KDE") for(i in 1:10){ model_fits[[i+4]] <- IRTest_Dich(data, latent_dist = "DC", h = i) } names(model_fits) <- c("Normal", "EHM", "2NM", "KDM", paste0("DC", 1:10)) ``` + **The best Davidian-curve model** ```{r} do.call(what = "anova", args = model_fits[5:14]) do.call(what = "best_model", args = model_fits[5:14]) ``` + **The best model overall** ```{r} do.call(what = "best_model", args = c(model_fits[c(1:4,5)], criterion ="AIC")) ```