---
title: "IRT without the normality assumption"
output: rmarkdown::html_vignette
vignette: >
  %\VignetteIndexEntry{IRT without the normality assumption}
  %\VignetteEngine{knitr::rmarkdown}
  %\VignetteEncoding{UTF-8}
fontsize: 12pt
---
```{r, include = FALSE}
knitr::opts_chunk$set(
  collapse = TRUE,
  comment = "#>"
)
```
```{r setup}
library(IRTest)
library(ggplot2)
library(gridExtra)
```
# 0. Introduction
+ **IRTest** is a useful tool for $\mathcal{\color{red}{IRT}}$ (item response theory) parameter $\mathcal{\color{red}{est}}$imation, especially when a violation of the normality assumption on the latent distribution is suspected.
+ **IRTest** deals with a unidimensional latent variable.
+ For missing values, **IRTest** adopts the full-information maximum likelihood (FIML) approach.
+ In **IRTest**, in addition to the conventional Gaussian distribution, several methods are available for estimating the latent distribution (selected via the `latent_dist` argument, as sketched below):
    + the empirical histogram method,
    + a two-component Gaussian mixture distribution,
    + the Davidian curve,
    + kernel density estimation,
    + log-linear smoothing.
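The sketch below maps these options onto the `latent_dist` argument of the estimation functions. The calls mirror the examples in later sections of this vignette; `h` is the tuning parameter used there for the Davidian-curve and log-linear-smoothing fits.
```
IRTest_Dich(data)                              # Gaussian (default, "Normal")
IRTest_Dich(data, latent_dist = "EHM")         # empirical histogram method
IRTest_Dich(data, latent_dist = "2NM")         # two-component Gaussian mixture
IRTest_Dich(data, latent_dist = "DC", h = 4)   # Davidian curve
IRTest_Dich(data, latent_dist = "KDE")         # kernel density estimation
IRTest_Dich(data, latent_dist = "LLS", h = 4)  # log-linear smoothing
```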
## Installation
The CRAN version of **IRTest** can be installed from the R console with:
```
install.packages("IRTest")
```
The development version can be installed from GitHub with:
```
devtools::install_github("SeewooLi/IRTest")
```
## Functions
The following functions are available in **IRTest**.
+ `IRTest_Dich` is the estimation function when all items are *dichotomously* scored.
+ `IRTest_Poly` is the estimation function when all items are *polytomously* scored.
+ `IRTest_Cont` is the estimation function when all items are *continuously* scored.
+ `IRTest_Mix` is the estimation function for *a mixed-format test*, a test comprising both dichotomous item(s) and polytomous item(s).
+ `factor_score` estimates factor scores of examinees.
+ `coef_se` returns standard errors of item parameter estimates.
+ `best_model` selects the best model using an evaluation criterion.
+ `item_fit` tests the statistical fit of all items individually.
+ `inform_f_item` calculates the information value(s) of an item.
+ `inform_f_test` calculates the information value(s) of a test.
+ `plot_item` draws item response function(s) of an item.
+ `reliability` calculates the marginal reliability coefficient of IRT.
+ `latent_distribution` returns evaluated PDF value(s) of an estimated latent distribution.
+ `DataGeneration` generates several objects that can be useful for computer simulation studies. Among these are simulated item parameters, ability parameters, and the corresponding item-response data.
+ `dist2` is the probability density function of the two-component Gaussian mixture distribution.
+ `original_par_2GM` converts the re-parameterized parameters of the two-component Gaussian mixture distribution into the original parameters.
+ `cat_clps` recommends category collapsing based on item parameters (or, equivalently, item response functions).
+ `recategorize` implements the category collapsing.
+ `adaptive_test` estimates ability parameter(s), which can be utilized for computerized adaptive testing (CAT).
+ For S3 methods, `anova`, `coef`, `logLik`, `plot`, `print`, and `summary` are available.
\
---
\break
# 1. Dichotomous items
+ **Preparing data**
The function `DataGeneration` can be used in a preparation step.
This function returns a set of artificial data and the true parameters underlying the data.
```{r}
Alldata <- DataGeneration(model_D = 2,
                          N = 1000,
                          nitem_D = 15,
                          latent_dist = "2NM",
                          d = 1.664,
                          sd_ratio = 2,
                          prob = 0.3)
data <- Alldata$data_D
theta <- Alldata$theta
colnames(data) <- paste0("item", 1:15)
```
\
+ **Analysis** (parameter estimation)
```{r, results='hide', message=FALSE}
Mod1 <- IRTest_Dich(data = data,
                    model = 2,
                    latent_dist = "LLS",
                    h = 4)
```
\
+ **Results**
```{r, message=FALSE, fig.align='center', fig.height=6, fig.width=6}
### Summary
summary(Mod1)
### Log-likelihood
logLik(Mod1)
### The estimated item parameters
coef(Mod1)
### Standard errors of the item parameter estimates
coef_se(Mod1)
### The estimated ability parameters
fscore <- factor_score(Mod1, ability_method = "MLE")
plot(theta, fscore$theta)
abline(b=1, a=0)
### Standard errors of ability parameter estimates
plot(fscore$theta, fscore$theta_se)
```
* The result of latent distribution estimation
```{r, fig.align='center', fig.asp=.6, fig.width=6}
plot(Mod1, mapping = aes(colour = "Estimated"), linewidth = 1) +
  lims(y = c(0, .75)) +
  geom_line(
    mapping = aes(
      x = seq(-6, 6, length = 121),
      y = dist2(seq(-6, 6, length = 121), prob = .3, d = 1.664, sd_ratio = 2),
      colour = "True"),
    linewidth = 1) +
  labs(title = "The estimated latent density using '2NM'", colour = "Type") +
  theme_bw()
```
* Item response function
```{r, fig.align='center', fig.asp=0.8, fig.width=6}
p1 <- plot_item(Mod1,1)
p2 <- plot_item(Mod1,4)
p3 <- plot_item(Mod1,8)
p4 <- plot_item(Mod1,10)
grid.arrange(p1, p2, p3, p4, ncol=2, nrow=2)
```
+ **Item fit**
```{r}
item_fit(Mod1)
```
* Reliability
```{r}
reliability(Mod1)
```
* Posterior distributions for the examinees
Each examinee's posterior distribution is identified in the E-step of the estimation algorithm (i.e., EM algorithm).
Posterior distributions can be found in `Mod1$Pk`.
```{r, fig.align='center', fig.asp=.7, fig.width=6}
set.seed(1)
selected_examinees <- sample(1:1000,6)
post_sample <-
  data.frame(
    X = rep(seq(-6, 6, length.out = 121), 6),
    prior = rep(Mod1$Ak / (Mod1$quad[2] - Mod1$quad[1]), 6),
    posterior = 10 * c(t(Mod1$Pk[selected_examinees, ])),
    ID = rep(paste("examinee", selected_examinees), each = 121)
  )
ggplot(data = post_sample, mapping = aes(x = X)) +
  geom_line(mapping = aes(y = posterior, group = ID, color = 'Posterior')) +
  geom_line(mapping = aes(y = prior, group = ID, color = 'Prior')) +
  labs(title = "Posterior densities for selected examinees", x = expression(theta), y = 'density') +
  facet_wrap(~ID, ncol = 2) +
  theme_bw()
```
* Test information function
```{r, fig.align='center', fig.asp=.8, fig.width=6}
ggplot() +
  stat_function(
    fun = inform_f_test,
    args = list(Mod1)
  ) +
  stat_function(
    fun = inform_f_item,
    args = list(Mod1, 1),
    mapping = aes(color = "Item 1")
  ) +
  stat_function(
    fun = inform_f_item,
    args = list(Mod1, 2),
    mapping = aes(color = "Item 2")
  ) +
  stat_function(
    fun = inform_f_item,
    args = list(Mod1, 3),
    mapping = aes(color = "Item 3")
  ) +
  stat_function(
    fun = inform_f_item,
    args = list(Mod1, 4),
    mapping = aes(color = "Item 4")
  ) +
  stat_function(
    fun = inform_f_item,
    args = list(Mod1, 5),
    mapping = aes(color = "Item 5")
  ) +
  lims(x = c(-6, 6)) +
  labs(title = "Test information function", x = expression(theta), y = 'information') +
  theme_bw()
```
----------
\break
# 2. Polytomous items
+ **Preparing data**
```{r}
Alldata <- DataGeneration(model_P = "GRM",
                          categ = rep(c(3, 7), each = 7),
                          N = 1000,
                          nitem_P = 14,
                          latent_dist = "2NM",
                          d = 1.664,
                          sd_ratio = 2,
                          prob = 0.3)
data <- Alldata$data_P
theta <- Alldata$theta
colnames(data) <- paste0("item", 1:14)
```
\
+ **Analysis** (parameter estimation)
```{r, results='hide', message=FALSE}
Mod1 <- IRTest_Poly(data = data,
                    model = "GRM",
                    latent_dist = "KDE")
```
\
+ **Results**
```{r, message=FALSE, fig.align='center', fig.height=6, fig.width=6}
### Summary
summary(Mod1)
### Log-likelihood
logLik(Mod1)
### The estimated item parameters
coef(Mod1)
### Standard errors of the item parameter estimates
coef_se(Mod1)
### The estimated ability parameters
fscore <- factor_score(Mod1, ability_method = "MLE")
plot(theta, fscore$theta)
abline(b=1, a=0)
### Standard errors of ability parameter estimates
plot(fscore$theta, fscore$theta_se)
```
* The result of latent distribution estimation
```{r, fig.align='center', fig.asp=.6, fig.width=6}
plot(Mod1, mapping = aes(colour = "Estimated"), linewidth = 1) +
  stat_function(
    fun = dist2,
    args = list(prob = .3, d = 1.664, sd_ratio = 2),
    mapping = aes(colour = "True"),
    linewidth = 1) +
  lims(y = c(0, .75)) +
  labs(title = "The estimated latent density using '2NM'", colour = "Type") +
  theme_bw()
```
* Item response function
```{r, fig.align='center', fig.asp=0.8, fig.width=6}
p1 <- plot_item(Mod1,1)
p2 <- plot_item(Mod1,4)
p3 <- plot_item(Mod1,8)
p4 <- plot_item(Mod1,10)
grid.arrange(p1, p2, p3, p4, ncol=2, nrow=2)
```
+ **Item fit**
```{r}
item_fit(Mod1)
```
* Reliability
```{r}
reliability(Mod1)
```
* Posterior distributions for the examinees
Each examinee's posterior distribution is identified in the E-step of the estimation algorithm (i.e., EM algorithm).
Posterior distributions can be found in `Mod1$Pk`.
```{r, fig.align='center', fig.asp=.7, fig.width=6}
set.seed(1)
selected_examinees <- sample(1:1000,6)
post_sample <-
  data.frame(
    X = rep(seq(-6, 6, length.out = 121), 6),
    prior = rep(Mod1$Ak / (Mod1$quad[2] - Mod1$quad[1]), 6),
    posterior = 10 * c(t(Mod1$Pk[selected_examinees, ])),
    ID = rep(paste("examinee", selected_examinees), each = 121)
  )
ggplot(data = post_sample, mapping = aes(x = X)) +
  geom_line(mapping = aes(y = posterior, group = ID, color = 'Posterior')) +
  geom_line(mapping = aes(y = prior, group = ID, color = 'Prior')) +
  labs(title = "Posterior densities for selected examinees", x = expression(theta), y = 'density') +
  facet_wrap(~ID, ncol = 2) +
  theme_bw()
```
* Test information function
```{r, fig.align='center', fig.asp=.8, fig.width=6}
ggplot() +
  stat_function(
    fun = inform_f_test,
    args = list(Mod1)
  ) +
  stat_function(
    fun = inform_f_item,
    args = list(Mod1, 1),
    mapping = aes(color = "Item 1 (3 cats)")
  ) +
  stat_function(
    fun = inform_f_item,
    args = list(Mod1, 2),
    mapping = aes(color = "Item 2 (3 cats)")
  ) +
  stat_function(
    fun = inform_f_item,
    args = list(Mod1, 3),
    mapping = aes(color = "Item 3 (3 cats)")
  ) +
  stat_function(
    fun = inform_f_item,
    args = list(Mod1, 8),
    mapping = aes(color = "Item 8 (7 cats)")
  ) +
  stat_function(
    fun = inform_f_item,
    args = list(Mod1, 9),
    mapping = aes(color = "Item 9 (7 cats)")
  ) +
  stat_function(
    fun = inform_f_item,
    args = list(Mod1, 10, "p"),
    mapping = aes(color = "Item 10 (7 cats)")
  ) +
  lims(x = c(-6, 6)) +
  labs(title = "Test information function", x = expression(theta), y = 'information') +
  theme_bw()
```
----------
\break
\
# 3. Continuous items
+ **Statistical details of the continuous IRT**
**Beta distribution**
$$
\begin{align}
f(x) &= \frac{1}{Beta(\alpha, \beta)}x^{\alpha-1}(1-x)^{(\beta-1)} \\
&= \frac{\Gamma(\alpha+\beta)}{\Gamma(\alpha)\Gamma(\beta)}x^{\alpha-1}(1-x)^{(\beta-1)}
\end{align}
$$
$E(x)=\frac{\alpha}{\alpha+\beta}$ and $Var(x)=\frac{\alpha\beta}{(\alpha+\beta)^2(\alpha+\beta+1)}$
If we reparameterize $\mu=\frac{\alpha}{\alpha+\beta}$ and $\nu=\alpha+\beta$,
$$
f(x) = \frac{\Gamma(\nu)}{\Gamma(\mu\nu)\Gamma(\nu(1-\mu))}x^{\mu\nu-1}(1-x)^{(\nu(1-\mu)-1)}
$$
No Jacobian transformation is required, since $\mu$ and $\nu$ are parameters of $f(x)$, not variables.
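As a quick numerical check of the reparameterization (a base-R sketch, not part of **IRTest**): with $\alpha=\mu\nu$ and $\beta=\nu(1-\mu)$, the density above must coincide with `dbeta`.
```{r}
# f(x) under (mu, nu) equals dbeta with shape1 = mu*nu and shape2 = nu*(1-mu)
mu <- 0.3; nu <- 5
x <- seq(0.1, 0.9, by = 0.2)
f <- gamma(nu) / (gamma(mu * nu) * gamma(nu * (1 - mu))) *
  x^(mu * nu - 1) * (1 - x)^(nu * (1 - mu) - 1)
all.equal(f, dbeta(x, shape1 = mu * nu, shape2 = nu * (1 - mu)))
```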
**Useful equations**
$\psi(\bullet)$ and $\psi_1(\bullet)$ denote the digamma and trigamma functions, respectively.
$$
\begin{align}
E[\log{x}] &= \int_{0}^{1}{\log{x}f(x) \,dx} \\
&= \int_{0}^{1}{\log{x} \frac{1}{Beta(\alpha, \beta)}x^{\alpha-1}(1-x)^{(\beta-1)} \,dx} \\
&= \frac{1}{Beta(\alpha, \beta)} \int_{0}^{1}{\log{(x)} x^{\alpha-1}(1-x)^{(\beta-1)} \,dx} \\
&= \frac{1}{Beta(\alpha, \beta)} \int_{0}^{1}{\frac{\partial x^{\alpha-1}(1-x)^{(\beta-1)}}{\partial \alpha} \,dx} \\
&= \frac{1}{Beta(\alpha, \beta)} \frac{\partial}{\partial \alpha}\int_{0}^{1}{ x^{\alpha-1}(1-x)^{(\beta-1)} \,dx} \\
&= \frac{1}{Beta(\alpha, \beta)} \frac{\partial Beta(\alpha, \beta)}{\partial \alpha} \\
&= \frac{\partial \log{[Beta(\alpha, \beta)]}}{\partial \alpha} \\
&= \frac{\partial \log{[\Gamma(\alpha)]}}{\partial \alpha} - \frac{\partial \log{[\Gamma(\alpha + \beta)]}}{\partial \alpha} \\
&= \psi(\alpha) - \psi(\alpha+\beta)
\end{align}
$$
Similarly, $E[\log{(1-x)}]=\psi(\beta) - \psi(\alpha+\beta)$.
Furthermore, using $\frac{\partial Beta(\alpha,\beta)}{\partial \alpha} = Beta(\alpha,\beta)\left(\psi(\alpha)-\psi(\alpha+\beta)\right)$ and $\frac{\partial^2 Beta(\alpha,\beta)}{\partial \alpha^2} = Beta(\alpha,\beta)\left(\psi(\alpha)-\psi(\alpha+\beta)\right)^2 + Beta(\alpha,\beta)\left(\psi_1(\alpha)-\psi_1(\alpha+\beta)\right)$,
$$
\begin{align}
E\left[(\log{x})^2\right] &= \frac{1}{Beta(\alpha, \beta)} \frac{\partial^2 Beta(\alpha, \beta)}{\partial \alpha^2} \\
&= \left(\psi(\alpha)-\psi(\alpha+\beta)\right)^2 + \left(\psi_1(\alpha)-\psi_1(\alpha+\beta)\right)
\end{align}
$$
This leads to,
$$
\begin{align}
Var\left[\log{x}\right] &= E\left[(\log{x})^2\right] - E\left[\log{x}\right]^2 \\
&=\psi_1(\alpha)-\psi_1(\alpha+\beta)
\end{align}
$$
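These identities are easy to verify numerically with base R's `digamma` and `trigamma` (a Monte Carlo sketch for illustration only):
```{r}
# Monte Carlo check of E[log x] and Var[log x] for x ~ Beta(alpha, beta)
set.seed(123)
alpha <- 2; beta <- 5
x <- rbeta(100000, alpha, beta)
c(simulated = mean(log(x)), analytic = digamma(alpha) - digamma(alpha + beta))
c(simulated = var(log(x)),  analytic = trigamma(alpha) - trigamma(alpha + beta))
```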
**Continuous IRT**
+ Expected item response
$$
\mu = \frac{e^{a(\theta -b)}}{1+e^{a(\theta -b)}} \\
\frac{\partial \mu}{\partial \theta} = a\mu(1-\mu) \\
\frac{\partial \mu}{\partial a} = (\theta - b)\mu(1-\mu) \\
\frac{\partial \mu}{\partial b} = -a\mu(1-\mu) \\
\frac{\partial \mu}{\partial \nu} = 0
$$
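A finite-difference check of the first identity, $\frac{\partial \mu}{\partial \theta} = a\mu(1-\mu)$ (an illustrative base-R sketch; `plogis` computes the logistic function that defines $\mu$):
```{r}
# mu(theta) = exp(a * (theta - b)) / (1 + exp(a * (theta - b)))
mu_fn <- function(th, a, b) plogis(a * (th - b))
a <- 1.2; b <- 0.5; th <- 0.3; eps <- 1e-6
(mu_fn(th + eps, a, b) - mu_fn(th - eps, a, b)) / (2 * eps)  # numerical derivative
a * mu_fn(th, a, b) * (1 - mu_fn(th, a, b))                  # analytic: a*mu*(1-mu)
```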
+ Probability of a response
$$
f(x)=P(x|\, \theta, a, b, \nu) = \frac{\Gamma(\nu)}{\Gamma(\mu\nu)\Gamma(\nu(1-\mu))} x^{\mu\nu-1} (1-x)^{\nu(1-\mu)-1} \\
$$
+ Gradient
$$
\log{f} = \log{\Gamma(\nu)}-\log{\Gamma(\mu\nu)}-\log{\Gamma(\nu(1-\mu))} + (\mu\nu-1)\log{x} + (\nu(1-\mu)-1) \log{(1-x)}
$$
$$
\frac{\partial \log{f}}{\partial \theta} = a\nu\mu(1-\mu)\left[-\psi{(\mu\nu)}+\psi{(\nu(1-\mu))}+ \log{\left(\frac{x}{1-x}\right)}\right]
$$
+ Information
$$
E\left[ \left(\frac{\partial \log{f}}{\partial \theta}\right)^2 \right] =
(a\nu\mu(1-\mu))^2\left[E\left[ \log{\left(\frac{x}{1-x}\right)^2}\right]
-2 \left(\psi{(\mu\nu)}-\psi{(\nu(1-\mu))}\right )E\left[ \log{\left(\frac{x}{1-x}\right)}\right]
+\left(\psi{(\mu\nu)}-\psi{(\nu(1-\mu))}\right )^2
\right]
$$
$$
\begin{align}
E\left[ \log{\left(\frac{x}{1-x}\right)^2}\right]
&= E\left[ \log{\left(x\right)^2}\right]
-2 E\left[ \log{\left(x\right)}\log{\left(1-x\right)}\right]
+ E\left[ \log{\left(1-x\right)^2}\right] \\
&= Var\left[ \log{\left(x\right)}\right]+E\left[ \log{\left(x\right)}\right]^2 \\
&\qquad -2 Cov\left[ \log{\left(x\right)}\log{\left(1-x\right)}\right]-2E\left[ \log{\left(x\right)}\right]E\left[ \log{\left(1-x\right)}\right] \\
&\qquad + Var\left[ \log{\left(1-x\right)}\right]+E\left[ \log{\left(1-x\right)}\right]^2 \\
&= \psi_{1}(\alpha)-\psi_{1}(\alpha+\beta) +E\left[ \log{\left(x\right)}\right]^2 \\
&\qquad +2 \psi_{1}(\alpha+\beta)-2E\left[ \log{\left(x\right)}\right]E\left[ \log{\left(1-x\right)}\right] \\
&\qquad + \psi_{1}(\beta)-\psi_{1}(\alpha+\beta)+E\left[ \log{\left(1-x\right)}\right]^2 \\
&= \psi_{1}(\alpha)-\psi_{1}(\alpha+\beta) +\left[ \psi(\alpha)-\psi(\alpha+\beta)\right]^2 \\
&\qquad +2 \psi_{1}(\alpha+\beta)-2 \left(\psi(\alpha)-\psi(\alpha+\beta)\right)\left(\psi(\beta)-\psi(\alpha+\beta)\right) \\
&\qquad + \psi_{1}(\beta)-\psi_{1}(\alpha+\beta)+\left[\psi(\beta)-\psi(\alpha+\beta)\right]^2 \\
&= \psi_{1}(\alpha) +\psi_{1}(\beta) +\left[\psi(\alpha)-\psi(\beta)\right]^2
\end{align}
$$
$$
\begin{align}
E\left[\left(\frac{\partial \log{f}}{\partial \theta}\right)^2 \right] & =
(a\nu\mu(1-\mu))^2\left[E\left[ \log{\left(\frac{x}{1-x}\right)^2}\right]
-2 \left(\psi{(\mu\nu)}-\psi{(\nu(1-\mu))}\right )E\left[ \log{\left(\frac{x}{1-x}\right)}\right]
+\left(\psi{(\mu\nu)}-\psi{(\nu(1-\mu))}\right )^2
\right] \\
&= (a\nu\mu(1-\mu))^2\left[\psi_{1}(\alpha) +\psi_{1}(\beta)
+\left[\psi(\alpha)-\psi(\beta)\right]^2
-2 \left(\psi{(\alpha)}-\psi{(\beta)}\right )\left(\psi{(\alpha)}-\psi{(\beta)}\right )
+\left(\psi{(\alpha)}-\psi{(\beta)}\right )^2
\right] \\
&= (a\nu\mu(1-\mu))^2\left[\psi_{1}(\alpha) +\psi_{1}(\beta)
\right] \\
\end{align}
$$
$$
I(\theta) = E\left[\left(\frac{\partial \log{f}}{\partial \theta}\right)^2 \right] = (a\nu\mu(1-\mu))^2\left[\psi_{1}(\alpha) +\psi_{1}(\beta)\right]
$$
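The closed form above translates directly into a few lines of R. This is only an illustrative sketch of the formula; for fitted models, the package's `inform_f_item` computes item information.
```{r}
# I(theta) = (a*nu*mu*(1-mu))^2 * (trigamma(alpha) + trigamma(beta)),
# with alpha = mu*nu and beta = nu*(1-mu)
info_cont <- function(th, a, b, nu){
  mu <- plogis(a * (th - b))
  (a * nu * mu * (1 - mu))^2 * (trigamma(mu * nu) + trigamma(nu * (1 - mu)))
}
info_cont(th = 0, a = 1.2, b = 0.5, nu = 8)
```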
+ Log-likelihood in the M-step of the MML-EM procedure
The marginal log-likelihood of an item can be expressed as follows:
$$
\ell = \sum_{j} \sum_{q}\gamma_{jq}\log{L_{jq}},
$$
where $\gamma_{jq}=E\left[\Pr\left(\theta_j \in \theta_{q}^{*}\right)\right]$ is
the expected probability that respondent $j$'s ability ($\theta_j$) belongs to
the $q$th quadrature node $\theta_{q}^{*}$; it is calculated in the E-step
of the MML-EM procedure. $L_{jq}$ is the likelihood of respondent $j$'s
response at $\theta_{q}^{*}$ for the item of current interest.
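With $\gamma_{jq}$ and $L_{jq}$ arranged as examinee-by-quadrature matrices, this sum is a one-liner in R; the values below are toy numbers for illustration, not package internals.
```{r}
# Toy illustration of the marginal log-likelihood: sum_j sum_q gamma_jq * log(L_jq)
set.seed(1)
gamma_jq <- matrix(runif(4 * 5), nrow = 4)         # E-step posterior weights
gamma_jq <- gamma_jq / rowSums(gamma_jq)           # each examinee's row sums to 1
L_jq <- matrix(runif(4 * 5, 0.1, 0.9), nrow = 4)   # item likelihoods at the nodes
sum(gamma_jq * log(L_jq))
```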
+ First derivative of the likelihood
$$
\frac{\partial \ell}{\partial a} = \sum_{q}
\left(\theta_{q}-b\right)\nu\mu_{q}\left(1-\mu_{q}\right)\left[
S_{1q}-S_{2q}-f_q\left[ \psi(\mu_{q}\nu)-\psi(\nu(1-\mu_{q})) \right]
\right] \\
\frac{\partial \ell}{\partial b} = -a\sum_{q}\nu\mu_{q}\left(1-\mu_{q}\right)\left[
S_{1q}-S_{2q}-f_q\left[ \psi(\mu_{q}\nu)-\psi(\nu(1-\mu_{q})) \right]
\right] \\
\frac{\partial \ell}{\partial \nu} = N\psi(\nu) +\sum_{q}\left[
\mu_{q}(S_{1q}-f_q\psi(\mu_{q}\nu)) + (1-\mu_{q})(S_{2q}-f_q\psi(\nu(1-\mu_{q})))
\right]
$$
where $S_{1q} = \sum_{j}{\gamma_{jq}\log{x_j}}$ and $S_{2q} = \sum_{j}{\gamma_{jq}\log{(1-x_j)}}$.
Since $E_q[S_{1q}]=f_q\left[\psi(\mu_{q}\nu)-\psi(\nu)\right]$ and
$E_q[S_{2q}]=f_q\left[\psi(\nu(1-\mu_{q}))-\psi(\nu)\right]$, the expected values of
the first derivatives are 0.
To keep $\nu$ positive, let $\nu = \exp{\xi}$; $\frac{\partial\nu}{\partial\xi}=\exp{\xi}=\nu$.
$$
\frac{\partial \ell}{\partial \xi} = N\nu\psi(\nu) +\nu\sum_{q}\left[
\mu_{q}(S_{1q}-f_q\psi(\mu_{q}\nu)) + (1-\mu_{q})(S_{2q}-f_q\psi(\nu(1-\mu_{q})))
\right]
$$
+ Second derivative of the likelihood
$$
E\left( \frac{\partial^2\ell}{\partial a^2}\right) = -\sum_{q}
\left\{\left(\theta_{q}-b\right)\nu\mu_{q}\left(1-\mu_{q}\right)\right\}^{2}f_{q}\left[ \psi_{1}(\mu_{q}\nu)+\psi_{1}(\nu(1-\mu_{q})) \right] \\
E\left(\frac{\partial^2\ell}{\partial a \partial b}\right) = a\sum_{q}
\left(\theta_{q}-b\right)\left\{\nu\mu_{q}\left(1-\mu_{q}\right)\right\}^{2}f_{q}\left[ \psi_{1}(\mu_{q}\nu)+\psi_{1}(\nu(1-\mu_{q})) \right] \\
E\left(\frac{\partial^2\ell}{\partial a \partial \nu}\right) = -\sum_{q}
\left(\theta_{q}-b\right)\nu\mu_{q}\left(1-\mu_{q}\right)f_q\left[ \mu_{q}\psi_{1}(\mu_{q}\nu)-(1-\mu_{q})\psi_{1}(\nu(1-\mu_{q})) \right] \\
E\left(\frac{\partial^2\ell}{\partial b^2}\right) = -a^{2}\sum_{q}
\left\{\nu\mu_{q}\left(1-\mu_{q}\right)\right\}^{2}f_{q}\left[ \psi_{1}(\mu_{q}\nu)+\psi_{1}(\nu(1-\mu_{q})) \right] \\
E\left(\frac{\partial^2\ell}{\partial b \partial \nu}\right) = a\sum_{q}
\nu\mu_{q}\left(1-\mu_{q}\right)f_q\left[ \mu_{q}\psi_{1}(\mu_{q}\nu)-(1-\mu_{q})\psi_{1}(\nu(1-\mu_{q})) \right] \\
E\left(\frac{\partial^2\ell}{\partial \nu^2}\right) = N\psi_{1}(\nu) - \sum_{q}f_q\left[ \mu_{q}^{2}\psi_{1}(\mu_{q}\nu)+(1-\mu_{q})^{2}\psi_{1}(\nu(1-\mu_{q})) \right]
$$
If we use $\xi$ instead of $\nu$,
$$
E\left(\frac{\partial^2\ell}{\partial a \partial \xi}\right) = -\sum_{q}
\left(\theta_{q}-b\right)\nu^{2}\mu_{q}\left(1-\mu_{q}\right)f_q\left[ \mu_{q}\psi_{1}(\mu_{q}\nu)-(1-\mu_{q})\psi_{1}(\nu(1-\mu_{q})) \right] \\
E\left(\frac{\partial^2\ell}{\partial b \partial \xi}\right) = a\sum_{q}
\nu^{2}\mu_{q}\left(1-\mu_{q}\right)f_q\left[ \mu_{q}\psi_{1}(\mu_{q}\nu)-(1-\mu_{q})\psi_{1}(\nu(1-\mu_{q})) \right] \\
E\left(\frac{\partial^2\ell}{\partial \xi^2}\right) = N\nu^{2}\psi_{1}(\nu) - \nu^{2}\sum_{q}f_q\left[ \mu_{q}^{2}\psi_{1}(\mu_{q}\nu)+(1-\mu_{q})^{2}\psi_{1}(\nu(1-\mu_{q})) \right]
$$
\
\
+ **Preparing data**
The function `DataGeneration` can be used in a preparation step.
This function returns a set of artificial data and the true parameters underlying the data.
```{r}
Alldata <- DataGeneration(N = 1000,
                          nitem_C = 8,
                          latent_dist = "2NM",
                          a_l = .3,
                          a_u = .7,
                          d = 1.664,
                          sd_ratio = 2,
                          prob = 0.3)
data <- Alldata$data_C
theta <- Alldata$theta
colnames(data) <- paste0("item", 1:8)
```
\
+ **Analysis** (parameter estimation)
```{r, results='hide', message=FALSE}
Mod1 <- IRTest_Cont(data = data,
                    latent_dist = "KDE")
```
\
+ **Results**
```{r, message=FALSE, fig.align='center', fig.height=6, fig.width=6}
### Summary
summary(Mod1)
### Log-likelihood
logLik(Mod1)
### The estimated item parameters
coef(Mod1)
### The estimated ability parameters
fscore <- factor_score(Mod1, ability_method = "WLE")
plot(theta, fscore$theta)
abline(b=1, a=0)
### Standard errors of ability parameter estimates
plot(fscore$theta, fscore$theta_se)
```
* The result of latent distribution estimation
```{r, fig.align='center', fig.asp=.6, fig.width=6}
plot(Mod1, mapping = aes(colour = "Estimated"), linewidth = 1) +
  lims(y = c(0, .75)) +
  geom_line(
    mapping = aes(
      x = seq(-6, 6, length = 121),
      y = dist2(seq(-6, 6, length = 121), prob = .3, d = 1.664, sd_ratio = 2),
      colour = "True"),
    linewidth = 1) +
  labs(title = "The estimated latent density using '2NM'", colour = "Type") +
  theme_bw()
```
* Item response function
```{r, fig.align='center', fig.asp=0.8, fig.width=6}
p1 <- plot_item(Mod1,1)
p2 <- plot_item(Mod1,2)
p3 <- plot_item(Mod1,3)
p4 <- plot_item(Mod1,4)
grid.arrange(p1, p2, p3, p4, ncol=2, nrow=2)
```
* Reliability
```{r}
reliability(Mod1)
```
* Posterior distributions for the examinees
Each examinee's posterior distribution is identified in the E-step of the estimation algorithm (i.e., EM algorithm).
Posterior distributions can be found in `Mod1$Pk`.
```{r, fig.align='center', fig.asp=.7, fig.width=6}
set.seed(1)
selected_examinees <- sample(1:1000,6)
post_sample <-
  data.frame(
    X = rep(seq(-6, 6, length.out = 121), 6),
    prior = rep(Mod1$Ak / (Mod1$quad[2] - Mod1$quad[1]), 6),
    posterior = 10 * c(t(Mod1$Pk[selected_examinees, ])),
    ID = rep(paste("examinee", selected_examinees), each = 121)
  )
ggplot(data = post_sample, mapping = aes(x = X)) +
  geom_line(mapping = aes(y = posterior, group = ID, color = 'Posterior')) +
  geom_line(mapping = aes(y = prior, group = ID, color = 'Prior')) +
  labs(title = "Posterior densities for selected examinees", x = expression(theta), y = 'density') +
  facet_wrap(~ID, ncol = 2) +
  theme_bw()
```
* Test information function
```{r, fig.align='center', fig.asp=.8, fig.width=6}
ggplot() +
  stat_function(
    fun = inform_f_test,
    args = list(Mod1)
  ) +
  stat_function(
    fun = inform_f_item,
    args = list(Mod1, 1),
    mapping = aes(color = "Item 1")
  ) +
  stat_function(
    fun = inform_f_item,
    args = list(Mod1, 2),
    mapping = aes(color = "Item 2")
  ) +
  stat_function(
    fun = inform_f_item,
    args = list(Mod1, 3),
    mapping = aes(color = "Item 3")
  ) +
  stat_function(
    fun = inform_f_item,
    args = list(Mod1, 4),
    mapping = aes(color = "Item 4")
  ) +
  stat_function(
    fun = inform_f_item,
    args = list(Mod1, 5),
    mapping = aes(color = "Item 5")
  ) +
  stat_function(
    fun = inform_f_item,
    args = list(Mod1, 6),
    mapping = aes(color = "Item 6")
  ) +
  stat_function(
    fun = inform_f_item,
    args = list(Mod1, 7),
    mapping = aes(color = "Item 7")
  ) +
  stat_function(
    fun = inform_f_item,
    args = list(Mod1, 8),
    mapping = aes(color = "Item 8")
  ) +
  lims(x = c(-6, 6)) +
  labs(title = "Test information function", x = expression(theta), y = 'information') +
  theme_bw()
```
----------
\break
# 4. Mixed-format test
+ **Preparing data**
As in the dichotomous and polytomous cases, the function `DataGeneration` can be used in the preparation step.
This function returns artificial data and some useful objects for analysis (i.e., `theta`, `data_D`, `item_D`, `data_P`, and `item_P`).
```{r}
Alldata <- DataGeneration(model_D = 2,
                          model_P = "GRM",
                          N = 1000,
                          nitem_D = 10,
                          nitem_P = 5,
                          latent_dist = "2NM",
                          d = 1.664,
                          sd_ratio = 1,
                          prob = 0.5)
DataD <- Alldata$data_D
DataP <- Alldata$data_P
theta <- Alldata$theta
colnames(DataD) <- paste0("item", 1:10)
colnames(DataP) <- paste0("item", 1:5)
```
\
\
\
+ **Analysis** (parameter estimation)
```{r, results='hide', message=FALSE}
Mod1 <- IRTest_Mix(data_D = DataD,
                   data_P = DataP,
                   model_D = "2PL",
                   model_P = "GRM",
                   latent_dist = "KDE")
```
\
\
\
+ **Results**
```{r, message=FALSE, fig.align='center', fig.height=6, fig.width=6}
### Summary
summary(Mod1)
### Log-likelihood
logLik(Mod1)
### The estimated item parameters
coef(Mod1)
### Standard errors of the item parameter estimates
coef_se(Mod1)
### The estimated ability parameters
fscore <- factor_score(Mod1, ability_method = "MLE")
plot(theta, fscore$theta)
abline(b=1, a=0)
### Standard errors of ability parameter estimates
plot(fscore$theta, fscore$theta_se)
```
* The result of latent distribution estimation
```{r, fig.align='center', fig.asp=.6, fig.width=6}
plot(Mod1, mapping = aes(colour="Estimated"), linewidth = 1) +
stat_function(
fun = dist2,
args = list(prob = .5, d = 1.664, sd_ratio = 1),
mapping = aes(colour = "True"),
linewidth = 1) +
lims(y = c(0, .75)) +
labs(title="The estimated latent density using '2NM'", colour= "Type")+
theme_bw()
```
* Item response function
```{r, fig.align='center', fig.asp=0.8, fig.width=6}
p1 <- plot_item(Mod1,1, type="d")
p2 <- plot_item(Mod1,4, type="d")
p3 <- plot_item(Mod1,8, type="d")
p4 <- plot_item(Mod1,10, type="d")
grid.arrange(p1, p2, p3, p4, ncol=2, nrow=2)
```
```{r, fig.align='center', fig.asp=0.8, fig.width=6}
p1 <- plot_item(Mod1,1, type="p")
p2 <- plot_item(Mod1,2, type="p")
p3 <- plot_item(Mod1,3, type="p")
p4 <- plot_item(Mod1,4, type="p")
grid.arrange(p1, p2, p3, p4, ncol=2, nrow=2)
```
+ **Item fit**
```{r}
item_fit(Mod1)
```
* Reliability
```{r}
reliability(Mod1)
```
* Posterior distributions for the examinees
Each examinee's posterior distribution is identified in the E-step of the estimation algorithm (i.e., EM algorithm).
Posterior distributions can be found in `Mod1$Pk`.
```{r, fig.align='center', fig.asp=.7, fig.width=6}
set.seed(1)
selected_examinees <- sample(1:1000,6)
post_sample <-
  data.frame(
    X = rep(seq(-6, 6, length.out = 121), 6),
    prior = rep(Mod1$Ak / (Mod1$quad[2] - Mod1$quad[1]), 6),
    posterior = 10 * c(t(Mod1$Pk[selected_examinees, ])),
    ID = rep(paste("examinee", selected_examinees), each = 121)
  )
ggplot(data = post_sample, mapping = aes(x = X)) +
  geom_line(mapping = aes(y = posterior, group = ID, color = 'Posterior')) +
  geom_line(mapping = aes(y = prior, group = ID, color = 'Prior')) +
  labs(title = "Posterior densities for selected examinees", x = expression(theta), y = 'density') +
  facet_wrap(~ID, ncol = 2) +
  theme_bw()
```
* Test information function
```{r, fig.align='center', fig.asp=.8, fig.width=8}
ggplot() +
  stat_function(
    fun = inform_f_test,
    args = list(Mod1)
  ) +
  stat_function(
    fun = inform_f_item,
    args = list(Mod1, 1, "d"),
    mapping = aes(color = "Dichotomous Item 1")
  ) +
  stat_function(
    fun = inform_f_item,
    args = list(Mod1, 2, "d"),
    mapping = aes(color = "Dichotomous Item 2")
  ) +
  stat_function(
    fun = inform_f_item,
    args = list(Mod1, 3, "d"),
    mapping = aes(color = "Dichotomous Item 3")
  ) +
  stat_function(
    fun = inform_f_item,
    args = list(Mod1, 1, "p"),
    mapping = aes(color = "Polytomous Item 1")
  ) +
  stat_function(
    fun = inform_f_item,
    args = list(Mod1, 2, "p"),
    mapping = aes(color = "Polytomous Item 2")
  ) +
  stat_function(
    fun = inform_f_item,
    args = list(Mod1, 3, "p"),
    mapping = aes(color = "Polytomous Item 3")
  ) +
  lims(x = c(-6, 6)) +
  labs(title = "Test information function", x = expression(theta), y = 'information') +
  theme_bw()
```
----------
\break
# 5. Model comparison
+ **Data generation and model fitting**
```{r}
data <- DataGeneration(N = 1000,
                       nitem_D = 10,
                       latent_dist = "2NM",
                       d = 1.664,
                       sd_ratio = 2,
                       prob = 0.3)$data_D
```
```{r, results='hide', message=FALSE}
model_fits <- list()
model_fits[[1]] <- IRTest_Dich(data)
model_fits[[2]] <- IRTest_Dich(data, latent_dist = "EHM")
model_fits[[3]] <- IRTest_Dich(data, latent_dist = "2NM")
model_fits[[4]] <- IRTest_Dich(data, latent_dist = "KDE")
for(i in 1:10){
  model_fits[[i+4]] <- IRTest_Dich(data, latent_dist = "DC", h = i)
}
names(model_fits) <- c("Normal", "EHM", "2NM", "KDE", paste0("DC", 1:10))
```
+ **The best Davidian-curve model**
```{r}
do.call(what = "anova", args = model_fits[5:14])
do.call(what = "best_model", args = model_fits[5:14])
```
+ **The best model overall**
```{r}
do.call(what = "best_model", args = c(model_fits[c(1:4,5)], criterion ="AIC"))
```