Package 'IRTest'

Title: Parameter Estimation of Item Response Theory with Estimation of Latent Distribution
Description: Item response theory (IRT) parameter estimation using marginal maximum likelihood and expectation-maximization algorithm (Bock & Aitkin, 1981 <doi:10.1007/BF02293801>). Within parameter estimation algorithm, several methods for latent distribution estimation are available. Reflecting some features of the true latent distribution, these latent distribution estimation methods can possibly enhance the estimation accuracy and free the normality assumption on the latent distribution.
Authors: Seewoo Li [aut, cre, cph]
Maintainer: Seewoo Li <[email protected]>
License: GPL (>= 3)
Version: 2.1.0
Built: 2025-02-22 22:50:38 UTC
Source: https://github.com/seewooli/irtest

Help Index


Ability parameter estimation with fixed item parameters

Description

Ability parameter estimation when item responses and item parameters are given. This function can be useful for ability parameter estimation in adaptive testing.

Usage

adaptive_test(
  response,
  item,
  model = "dich",
  ability_method = "EAP",
  quad = NULL,
  prior = NULL
)

Arguments

response

A matrix of item responses. For mixed-format test, a list of item responses where dichotomous item responses are the first element and polytomous item responses are the second element.

item

A matrix of item parameters. For mixed-format test, a list of item parameters where dichotomous item parameters are the first element and polytomous item parameters are the second element.

model

dich for dichotomous items, cont for continuous items, and a specific item response model (e.g., PCM, GPCM, GRM) for polytomous items and a mixed-format test. The default is dich.

ability_method

The ability parameter estimation method. The available options are Expected a posteriori (EAP), Maximum Likelihood Estimates (MLE), and weighted likelihood estimates (WLE). The default is EAP.

quad

A vector of quadrature points for EAP calculation. If NULL is passed, it is set as seq(-6,6,length.out=121). The default is NULL.

prior

A vector of the prior distribution for EAP calculation. Its length must equal the length of quad. If NULL is passed, the standard normal distribution is used. The default is NULL.

Value

theta

The estimated ability parameter values. If ability_method = "MLE" and an examinee receives the maximum or minimum score on all items, the function returns ±Inf.

theta_se

The standard errors of ability parameter estimates. It returns standard deviations of posteriors for EAPs and asymptotic standard errors (i.e., square root of inverse Fisher information) for MLE. If an examinee receives a maximum or minimum score for all items, the function returns NA for MLE.
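A minimal base-R sketch of the EAP calculation described above, for dichotomous items whose parameter matrix has columns a, b, and c (as in the first example below). The helper eap_dich is hypothetical and not part of IRTest: it evaluates the 3PL response function on the default quadrature grid with a standard normal prior and returns the posterior mean and posterior standard deviation. It illustrates the standard computation and is not necessarily the package's internal code.

```r
# Hypothetical helper (not part of IRTest): EAP estimate and posterior SD for one
# examinee. `item` has columns a, b, c (discrimination, difficulty, guessing).
eap_dich <- function(response, item,
                     quad = seq(-6, 6, length.out = 121),
                     prior = dnorm(quad)) {
  a <- item[, 1]; b <- item[, 2]; g <- item[, 3]
  # P[k, i]: probability of a correct answer to item i at quadrature point k
  P <- sapply(seq_along(a), function(i) g[i] + (1 - g[i]) * plogis(a[i] * (quad - b[i])))
  # Likelihood of the observed response pattern at each quadrature point
  lik <- apply(P, 1, function(p) prod(p^response * (1 - p)^(1 - response)))
  post <- lik * prior / sum(lik * prior)  # normalized posterior weights
  eap <- sum(quad * post)                 # posterior mean (EAP)
  c(theta = eap, theta_se = sqrt(sum((quad - eap)^2 * post)))  # posterior SD
}

item <- matrix(c(  1, -0.5,   0,
                 1.5,   -1,   0,
                 1.2,    0, 0.2), nrow = 3, byrow = TRUE)
eap_dich(c(1, 1, 0), item)
```

Because the prior is proper, EAP stays finite for all-correct or all-incorrect patterns, unlike MLE.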

Author(s)

Seewoo Li [email protected]

Examples

# dichotomous

response <- c(1,1,0)
item <- matrix(
  c(
      1, -0.5,   0,
    1.5,   -1,   0,
    1.2,    0, 0.2
  ), nrow = 3, byrow = TRUE
)
adaptive_test(response, item, model = "dich", ability_method = "WLE")


# polytomous

response <- c(1,2,0)
item <- matrix(
  c(
      1, -0.5, 0.5,
    1.5,   -1,   0,
    1.2,    0, 0.4
  ), nrow = 3, byrow = TRUE
)
adaptive_test(response, item, model="GPCM", ability_method = "WLE")


# mixed-format test

response <- list(c(0,0,0),c(2,2,1))
item <- list(
  matrix(
    c(
        1, -0.5, 0,
      1.5,   -1, 0,
      1.2,    0, 0
    ), nrow = 3, byrow = TRUE
  ),
  matrix(
    c(
        1, -0.5, 0.5,
      1.5,   -1,   0,
      1.2,    0, 0.4
    ), nrow = 3, byrow = TRUE
  )
)
adaptive_test(response, item, model = "GPCM", ability_method = "WLE")


# continuous response

response <- c(0.88, 0.68, 0.21)
item <- matrix(
  c(
    1, -0.5, 10,
    1.5,   -1,  8,
    1.2,    0, 11
  ), nrow = 3, byrow = TRUE
)
adaptive_test(response, item, model = "cont", ability_method = "WLE")

Model comparison

Description

Model comparison

Usage

## S3 method for class 'IRTest'
anova(...)

Arguments

...

Objects of "IRTest"-class to be compared.

Value

Model-fit indices and results of the likelihood ratio test (LRT).
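For two nested models, the LRT statistic is the difference of their deviances ($-2\log L$, which the estimation functions report as logL), referred to a $\chi^2$ distribution whose degrees of freedom equal the difference in the numbers of estimated parameters. The helper lrt below is a hypothetical sketch, not the package's anova() method, and the deviances and parameter counts are made-up numbers for illustration.

```r
# Hypothetical helper: likelihood ratio test between a reduced and a full (nested) model,
# given their deviances (-2 log L) and numbers of estimated parameters.
lrt <- function(dev_reduced, dev_full, npar_reduced, npar_full) {
  stat <- dev_reduced - dev_full   # drop in deviance
  df <- npar_full - npar_reduced   # extra parameters in the full model
  c(chisq = stat, df = df, p_value = pchisq(stat, df = df, lower.tail = FALSE))
}

# Made-up deviances, e.g. a normal latent distribution vs. a 2NM latent distribution
res <- lrt(dev_reduced = 10250.4, dev_full = 10231.9, npar_reduced = 20, npar_full = 22)
res
```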

Author(s)

Seewoo Li [email protected]


Selecting the best model

Description

Selecting the best model

Usage

best_model(..., criterion = "HQ")

Arguments

...

Candidate models

criterion

The model-fit criterion used to select the best model. The default is HQ.

Value

The best model and model-fit indices.
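A sketch of criterion-based selection, assuming that HQ denotes the Hannan-Quinn information criterion ($-2\log L + 2k\log(\log N)$, with $k$ parameters and $N$ examinees) and that the model with the smallest value is preferred. The helper info_criteria is hypothetical, and the deviances and parameter counts are made-up numbers.

```r
# Hypothetical helper: information criteria from a deviance (-2 log L),
# the number of estimated parameters k, and the number of examinees N.
info_criteria <- function(deviance, k, N) {
  c(AIC = deviance + 2 * k,
    BIC = deviance + k * log(N),
    HQ  = deviance + 2 * k * log(log(N)))
}

# Made-up deviances and parameter counts for three candidate models (N = 1000)
cands <- rbind(
  M1 = info_criteria(10250.4, k = 20, N = 1000),
  M2 = info_criteria(10231.9, k = 22, N = 1000),
  M3 = info_criteria(10229.8, k = 30, N = 1000)
)
best <- rownames(cands)[which.min(cands[, "HQ"])]  # smallest HQ wins
best
```

The HQ penalty grows more slowly with N than the BIC penalty but faster than the AIC penalty, which is why M3's small deviance gain does not pay for its eight extra parameters here.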

Author(s)

Seewoo Li [email protected]


A recommendation for category collapsing of items based on item parameters

Description

In a polytomous item, one or more score categories may never have the highest probability among the categories within an acceptable $\theta$ range. In this case, the category may be regarded as redundant from a psychometric point of view and can be collapsed into another score category. This function returns a recommended recategorization scheme based on item parameters.
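The detection step can be sketched as follows, assuming a generalized partial credit model (GPCM) item with slope a and step parameters $b_1, \dots, b_m$, where category $k$ has probability proportional to $\exp\left(\sum_{v=1}^{k} a(\theta - b_v)\right)$. The helpers gpcm_probs and never_modal are hypothetical; they only flag categories that are never modal on a $\theta$ grid (mirroring the range and increment defaults of cat_clps) and do not reproduce the package's full recategorization scheme.

```r
# Hypothetical helper: GPCM category probabilities (categories 0..m) on a theta grid.
gpcm_probs <- function(theta, a, b) {
  m <- length(b)
  steps <- a * outer(theta, b, "-")         # a * (theta - b_v), one column per step
  U <- upper.tri(diag(m), diag = TRUE) * 1  # U[i, j] = 1 if i <= j
  z <- cbind(0, steps %*% U)                # cumulative sums; category 0 gets 0
  z <- z - apply(z, 1, max)                 # stabilize before exponentiating
  ez <- exp(z)
  ez / rowSums(ez)
}

# Hypothetical helper: categories that never have the highest probability on the grid.
never_modal <- function(a, b, range = c(-4, 4), increment = 0.005) {
  theta <- seq(range[1], range[2], by = increment)
  modal <- max.col(gpcm_probs(theta, a, b), ties.method = "first") - 1
  setdiff(0:length(b), unique(modal))
}

never_modal(a = 1, b = c(0.5, -0.5))  # reversed steps: category 1 is never modal
never_modal(a = 1, b = c(-1, 1))      # ordered steps: every category is modal somewhere
```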

Usage

cat_clps(item.matrix, range = c(-4, 4), increment = 0.005)

Arguments

item.matrix

A matrix of item parameters.

range

A range of $\theta$ to be evaluated. The default is c(-4, 4).

increment

The step size of the $\theta$ grid over range. The default is 0.005.

Value

A list of recommended recategorization for each item.

Author(s)

Seewoo Li [email protected]


Extract Standard Errors of Model Coefficients

Description

Standard errors of model coefficients calculated by using Fisher information functions.

Usage

coef_se(object, complete = TRUE)

Arguments

object

An object for which the extraction of standard errors is meaningful.

complete

A logical value indicating if the full standard-error vector should be returned.

Value

Standard errors extracted from the model (object).


Extract Model Coefficients

Description

A generic function which extracts model coefficients from objects returned by modeling functions.

Usage

## S3 method for class 'IRTest'
coef(object, complete = TRUE, ...)

Arguments

object

An object for which the extraction of model coefficients is meaningful.

complete

A logical value indicating if the full coefficient vector should be returned.

...

Other arguments.

Value

Coefficients extracted from the model (object).


Generating an artificial item response dataset

Description

This function generates an artificial item response dataset allowing various options.

Usage

DataGeneration(
  seed = 1,
  N = 2000,
  nitem_D = 0,
  nitem_P = 0,
  nitem_C = 0,
  model_D = "2PL",
  model_P = "GPCM",
  latent_dist = "Normal",
  item_D = NULL,
  item_P = NULL,
  item_C = NULL,
  theta = NULL,
  prob = 0.5,
  d = 1.7,
  sd_ratio = 1,
  m = 0,
  s = 1,
  a_l = 0.8,
  a_u = 2.5,
  b_m = NULL,
  b_sd = NULL,
  c_l = 0,
  c_u = 0.2,
  categ = 5,
  possible_ans = c(0.1, 0.3, 0.5, 0.7, 0.9)
)

Arguments

seed

A numeric value used as the random seed. Setting the seed makes the generated data replicable.

N

A numeric value of the number of examinees.

nitem_D

A numeric value of the number of dichotomous items.

nitem_P

A numeric value of the number of polytomous items.

nitem_C

A numeric value of the number of continuous response items.

model_D

A vector or a character string that represents the probability model for the dichotomous items.

model_P

A character string that represents the probability model for the polytomous items.

latent_dist

A character string that determines the type of latent distribution. Currently available options are "beta" (four-parameter beta distribution; betafunctions::rBeta.4P), "chi" ($\chi^2$ distribution; rchisq), "normal", "Normal", or "N" (standard normal distribution; rnorm), and "Mixture" or "2NM" (two-component Gaussian mixture distribution; see Li (2021) for details).

item_D

An item parameter matrix for using fixed parameter values. The number of columns should be 3: a parameter for the first, b parameter for the second, and c parameter for the third column. Default is NULL.

item_P

An item parameter matrix for using fixed parameter values. The number of columns should be 7: a parameter for the first, and b parameters for the rest of the columns. Default is NULL.

item_C

An item parameter matrix for using fixed parameter values. The number of columns should be 3: a parameter for the first, b parameter for the second, and nu parameter for the third column. Default is NULL.

theta

An ability parameter vector for using fixed parameter values. Default is NULL.

prob

A numeric value used when latent_dist = "2NM". It is the $\pi = \frac{n_1}{N}$ parameter of the two-component Gaussian mixture distribution, where $n_1$ is the estimated number of examinees belonging to the first Gaussian component and $N$ is the total number of examinees (Li, 2021).

d

A numeric value used when latent_dist = "2NM". It is the $\delta = \frac{\mu_2 - \mu_1}{\bar{\sigma}}$ parameter of the two-component Gaussian mixture distribution, where $\mu_1$ and $\mu_2$ are the estimated means of the first and second Gaussian components, respectively, and $\bar{\sigma}$ is the overall standard deviation of the latent distribution (Li, 2021). Without loss of generality, $\mu_2 \ge \mu_1$ is assumed; thus $\delta \ge 0$.

sd_ratio

A numeric value used when latent_dist = "2NM". It is the $\zeta = \frac{\sigma_2}{\sigma_1}$ parameter of the two-component Gaussian mixture distribution, where $\sigma_1$ and $\sigma_2$ are the estimated standard deviations of the first and second Gaussian components, respectively (Li, 2021).

m

A numeric value of the overall mean of the latent distribution. The default is 0.

s

A numeric value of the overall standard deviation of the latent distribution. The default is 1.

a_l

A numeric value. The lower bound of item discrimination parameters (a).

a_u

A numeric value. The upper bound of item discrimination parameters (a).

b_m

A numeric value. The mean of item difficulty parameters (b). If unspecified, the value of m is used.

b_sd

A numeric value. The standard deviation of item difficulty parameters (b). If unspecified, the value of s is used.

c_l

A numeric value. The lower bound of item guessing parameters (c).

c_u

A numeric value. The upper bound of item guessing parameters (c).

categ

A scalar or a numeric vector of length nitem_P giving the number of score categories. The default is 5. If length(categ) > 1, the ith element equals the number of categories of the ith polytomous item.

possible_ans

Possible options for continuous items (e.g., 0.1, 0.3, 0.5, 0.7, 0.9)

Value

This function returns a list of several objects:

theta

A vector of ability parameters ($\theta$).

item_D

A matrix of dichotomous item parameters.

initialitem_D

A matrix that contains initial item parameter values for dichotomous items.

data_D

A matrix of dichotomous item responses where rows indicate examinees and columns indicate items.

item_P

A matrix of polytomous item parameters.

initialitem_P

A matrix that contains initial item parameter values for polytomous items.

data_P

A matrix of polytomous item responses where rows indicate examinees and columns indicate items.

item_C

A matrix of continuous response item parameters.

initialitem_C

A matrix that contains initial item parameter values for continuous response items.

data_C

A matrix of continuous item responses where rows indicate examinees and columns indicate items.
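To show how the dichotomous-item arguments fit together, here is a hypothetical base-R sketch (simulate_3pl is not part of IRTest). It assumes $\theta \sim N(m, s)$, discrimination drawn uniformly between a_l and a_u, difficulty drawn from a normal distribution with mean b_m and SD b_sd, guessing drawn uniformly between c_l and c_u, and 3PL responses; DataGeneration() may use a different sampling scheme.

```r
# Hypothetical sketch in the spirit of DataGeneration(); the sampling scheme is assumed.
simulate_3pl <- function(seed = 1, N = 2000, nitem = 10, m = 0, s = 1,
                         a_l = 0.8, a_u = 2.5, b_m = m, b_sd = s,
                         c_l = 0, c_u = 0.2) {
  set.seed(seed)
  theta <- rnorm(N, m, s)
  item <- cbind(a = runif(nitem, a_l, a_u),
                b = rnorm(nitem, b_m, b_sd),
                c = runif(nitem, c_l, c_u))
  # N x nitem matrix of 3PL correct-response probabilities
  P <- plogis(sweep(outer(theta, item[, "b"], "-"), 2, item[, "a"], "*"))
  P <- sweep(P, 2, item[, "c"], function(p, g) g + (1 - g) * p)
  # Bernoulli draws, filled column by column to match P
  data <- matrix(rbinom(N * nitem, 1, P), nrow = N)
  list(theta = theta, item = item, data = data)
}

sim <- simulate_3pl(N = 500, nitem = 10)
colMeans(sim$data)  # proportion correct per item
```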

Author(s)

Seewoo Li [email protected]

References

Li, S. (2021). Using a two-component normal mixture distribution as a latent distribution in estimating parameters of item response models. Journal of Educational Evaluation, 34(4), 759-789.

Examples

# Dichotomous item responses

Alldata <- DataGeneration(N = 500,
                          nitem_D = 10)


# Polytomous item responses

Alldata <- DataGeneration(N = 1000,
                          nitem_P = 10)


# Mixed-format items

Alldata <- DataGeneration(N = 1000,
                          nitem_D = 20,
                          nitem_P = 10)

# Continuous items

Alldata <- DataGeneration(N = 1000,
                          nitem_C = 10)

# Dataset from non-normal latent density using two-component Gaussian mixture distribution

Alldata <- DataGeneration(N=1000,
                          nitem_P = 10,
                          latent_dist = "2NM",
                          d = 1.664,
                          sd_ratio = 2,
                          prob = 0.3)

Re-parameterized two-component normal mixture distribution

Description

Probability density for the re-parameterized two-component normal mixture distribution.

Usage

dist2(x, prob = 0.5, d = 0, sd_ratio = 1, overallmean = 0, overallsd = 1)

Arguments

x

A numeric vector. The location to evaluate the density function.

prob

A numeric value of the $\pi = \frac{n_1}{N}$ parameter of the two-component Gaussian mixture distribution, where $n_1$ is the estimated number of examinees belonging to the first Gaussian component and $N$ is the total number of examinees (Li, 2021).

d

A numeric value of the $\delta = \frac{\mu_2 - \mu_1}{\bar{\sigma}}$ parameter of the two-component Gaussian mixture distribution, where $\mu_1$ and $\mu_2$ are the estimated means of the first and second Gaussian components, respectively, and $\bar{\sigma}$ is the overall standard deviation of the latent distribution (Li, 2021). Without loss of generality, $\mu_2 \ge \mu_1$ is assumed; thus $\delta \ge 0$.

sd_ratio

A numeric value of the $\zeta = \frac{\sigma_2}{\sigma_1}$ parameter of the two-component Gaussian mixture distribution, where $\sigma_1$ and $\sigma_2$ are the estimated standard deviations of the first and second Gaussian components, respectively (Li, 2021).

overallmean

A numeric value of $\bar{\mu}$ that determines the overall mean of the two-component Gaussian mixture distribution.

overallsd

A numeric value of $\bar{\sigma}$ that determines the overall standard deviation of the two-component Gaussian mixture distribution.

Details

The overall mean and overall standard deviation are obtained from the original parameters as follows:

1) Overall mean ($\bar{\mu}$)

$$\bar{\mu}=\pi\mu_1 + (1-\pi)\mu_2$$

2) Overall standard deviation ($\bar{\sigma}$)

$$\bar{\sigma}=\sqrt{\pi\sigma_{1}^{2}+(1-\pi)\sigma_{2}^{2}+\pi(1-\pi)(\mu_2-\mu_1)^2}$$
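Inverting these identities with $\mu_2 - \mu_1 = \delta\bar{\sigma}$ and $\sigma_2 = \zeta\sigma_1$ gives $\mu_1 = \bar{\mu} - (1-\pi)\delta\bar{\sigma}$, $\mu_2 = \bar{\mu} + \pi\delta\bar{\sigma}$, and $\sigma_1 = \bar{\sigma}\sqrt{\frac{1 - \pi(1-\pi)\delta^2}{\pi + (1-\pi)\zeta^2}}$, which requires $\pi(1-\pi)\delta^2 < 1$. The hypothetical dist2_sketch below evaluates the density from this re-derivation; it is a sketch based on the formulas above and not necessarily how dist2 is implemented.

```r
# Hypothetical re-derivation of the re-parameterized two-component normal mixture density.
dist2_sketch <- function(x, prob = 0.5, d = 0, sd_ratio = 1,
                         overallmean = 0, overallsd = 1) {
  stopifnot(prob * (1 - prob) * d^2 < 1)  # otherwise no valid component SDs exist
  # Component means from mu_bar = pi*mu1 + (1 - pi)*mu2 and mu2 - mu1 = d * sigma_bar
  mu1 <- overallmean - (1 - prob) * d * overallsd
  mu2 <- overallmean + prob * d * overallsd
  # Component SDs from the overall-variance identity with sigma2 = sd_ratio * sigma1
  s1 <- overallsd * sqrt((1 - prob * (1 - prob) * d^2) / (prob + (1 - prob) * sd_ratio^2))
  s2 <- sd_ratio * s1
  prob * dnorm(x, mu1, s1) + (1 - prob) * dnorm(x, mu2, s2)
}

# Same settings as the dist2() example below
f <- function(x) dist2_sketch(x, prob = 0.3, d = 1, sd_ratio = 0.5)
m <- integrate(function(x) x * f(x), -10, 10)$value
v <- integrate(function(x) (x - m)^2 * f(x), -10, 10)$value
c(mean = m, variance = v)
```

Numerically integrating the density confirms the construction: the mean and variance match overallmean and overallsd^2.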

Value

The evaluated probability density value(s).

Author(s)

Seewoo Li [email protected]

References

Li, S. (2021). Using a two-component normal mixture distribution as a latent distribution in estimating parameters of item response models. Journal of Educational Evaluation, 34(4), 759-789.

Examples

# Evaluated density
dnst <- dist2(seq(-6,6,.1), prob = 0.3, d = 1, sd_ratio=0.5)

# Plot of the density
plot(seq(-6,6,.1), dnst)

Estimated factor scores

Description

Factor scores of examinees.

Usage

factor_score(x, ability_method = "EAP", quad = NULL, prior = NULL)

Arguments

x

A model fit object from either IRTest_Dich, IRTest_Poly, IRTest_Cont, or IRTest_Mix.

ability_method

The ability parameter estimation method. The available options are Expected a posteriori (EAP), Maximum Likelihood Estimates (MLE), and weighted likelihood estimates (WLE). The default is EAP.

quad

A vector of quadrature points for EAP calculation.

prior

A vector of the prior distribution for EAP calculation. Its length must equal the length of quad.

Value

theta

The estimated ability parameter values. If ability_method = "MLE" and an examinee receives the maximum or minimum score on all items, the function returns ±Inf.

theta_se

The standard errors of ability parameter estimates. It returns standard deviations of posteriors for EAPs and asymptotic standard errors (i.e., square root of inverse Fisher information) for MLE. If an examinee receives a maximum or minimum score for all items, the function returns NA for MLE.

Author(s)

Seewoo Li [email protected]

Examples

# A preparation of dichotomous item response data

data <- DataGeneration(N=500, nitem_D = 10)$data_D

# Analysis

M1 <- IRTest_Dich(data)

# Factor scores estimated by MLE

factor_score(M1, ability_method = "MLE")

Item information function

Description

Item information function

Usage

inform_f_item(x, test, item = 1, type = "d")

Arguments

x

A vector of $\theta$ value(s).

test

An object returned from an estimation function.

item

A natural number indicating the $n$th item.

type

A character value for a mixed format test which determines the item type: "d" and "p" stand for a dichotomous and polytomous item, respectively.

Value

A vector of the evaluated item information values.
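For a dichotomous item, the standard (Birnbaum) item information of the 3PL model is $I(\theta) = a^2 \frac{1-P}{P}\left(\frac{P-c}{1-c}\right)^2$, which reduces to $a^2 P(1-P)$ when $c = 0$ (2PL). The hypothetical helper below evaluates this textbook formula; it is a sketch and not the source of inform_f_item.

```r
# Hypothetical helper: item information of a 3PL item; g is the guessing parameter (c in the text).
item_info_3pl <- function(theta, a, b, g = 0) {
  p <- g + (1 - g) * plogis(a * (theta - b))      # 3PL probability of a correct response
  a^2 * ((1 - p) / p) * ((p - g) / (1 - g))^2     # Birnbaum's item information
}

theta <- seq(-3, 3, by = 1)
item_info_3pl(theta, a = 1.5, b = 0)              # 2PL: peaks at theta = b with value a^2 / 4
item_info_3pl(theta, a = 1.5, b = 0, g = 0.2)     # guessing lowers the information
```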

Author(s)

Seewoo Li [email protected]


Test information function

Description

Test information function

Usage

inform_f_test(x, test)

Arguments

x

A vector of $\theta$ value(s).

test

An object returned from an estimation function.

Value

A vector of test information values of the same length as x.
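Test information is the sum of the item informations at each $\theta$, and its inverse square root gives the asymptotic standard error of an MLE of $\theta$ (the "square root of inverse Fisher information" mentioned for adaptive_test). The hypothetical helper below sketches this for 2PL items only; the item parameters are taken from the adaptive_test example with the guessing parameters ignored.

```r
# Hypothetical helper: 2PL test information as the sum of item informations a_i^2 * P * (1 - P).
test_info_2pl <- function(theta, a, b) {
  # P[j, i]: 2PL probability for theta[j] on item i
  P <- plogis(sweep(outer(theta, b, "-"), 2, a, "*"))
  rowSums(sweep(P * (1 - P), 2, a^2, "*"))
}

theta <- c(-1, 0, 1)
info <- test_info_2pl(theta, a = c(1, 1.5, 1.2), b = c(-0.5, -1, 0))
se <- 1 / sqrt(info)   # asymptotic SE of the MLE at each theta
rbind(info = info, se = se)
```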

Author(s)

Seewoo Li [email protected]


Item and ability parameters estimation for continuous response items

Description

This function estimates IRT item and ability parameters when all items are scored continuously. Based on Bock & Aitkin's (1981) marginal maximum likelihood and EM algorithm (EM-MML), this function provides several latent distribution estimation algorithms which could free the normality assumption on the latent variable. If the normality assumption is violated, application of these latent distribution estimation methods could reflect non-normal characteristics of the unknown true latent distribution, thereby providing more accurate parameter estimates (Li, 2021; Woods & Lin, 2009; Woods & Thissen, 2006).

Usage

IRTest_Cont(
  data,
  model = 2,
  range = c(-6, 6),
  q = 121,
  initialitem = NULL,
  ability_method = "EAP",
  latent_dist = "Normal",
  max_iter = 200,
  threshold = 1e-04,
  bandwidth = "SJ-ste",
  h = NULL
)

Arguments

data

A matrix or data frame of continuous item responses taking values strictly between 0 and 1. Rows and columns indicate examinees and items, respectively.

model

A scalar or vector that represents types of item characteristic functions: 1, "1PL", "Rasch", or "RASCH" for one-parameter logistic model, and 2, "2PL" for two-parameter logistic model.

range

Range of the latent variable to be considered in the quadrature scheme. The default is from -6 to 6: c(-6, 6).

q

A numeric value that represents the number of quadrature points. The default value is 121.

initialitem

A matrix of initial item parameter values for starting the estimation algorithm. The default value is NULL.

ability_method

The ability parameter estimation method. The available options are Expected a posteriori (EAP), Maximum Likelihood Estimates (MLE), and weighted likelihood estimates (WLE). The default is EAP.

latent_dist

A character string that determines latent distribution estimation method. Insert "Normal", "normal", or "N" for the normality assumption on the latent distribution, "EHM" for empirical histogram method (Mislevy, 1984; Mislevy & Bock, 1985), "2NM" or "Mixture" for using two-component Gaussian mixture distribution (Li, 2021; Mislevy, 1984), "DC" or "Davidian" for Davidian-curve method (Woods & Lin, 2009), "KDE" for kernel density estimation method (Li, 2022), and "LLS" for log-linear smoothing method (Casabianca & Lewis, 2015). The default value is set to "Normal" to follow the convention.

max_iter

A numeric value that determines the maximum number of iterations in the EM-MML. The default value is 200.

threshold

A numeric value that determines the threshold of EM-MML convergence. A maximum item parameter change is monitored and compared with the threshold. The default value is 0.0001.

bandwidth

A character value that can be used if latent_dist = "KDE". This argument determines the bandwidth estimation method for "KDE". The default value is "SJ-ste". See density for available options.

h

A natural number less than or equal to 10 if latent_dist = "DC" or "LLS". This argument determines the complexity of the distribution.

Details

The probability density of a response $u = x$, where $0 < u < 1$, is

$$P(u=x \mid a, b, \nu) = \frac{1}{B(\mu\nu,\, \nu(1-\mu))}\, x^{\mu\nu-1} (1-x)^{\nu(1-\mu)-1}$$

where $\mu = \frac{e^{a(\theta-b)}}{1+e^{a(\theta-b)}}$ and $B(\cdot, \cdot)$ is the beta function.
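This density is a beta distribution with shape parameters $\mu\nu$ and $\nu(1-\mu)$, so its mean is $\mu$ and $\nu$ acts as a precision parameter. The hypothetical helper dcont below evaluates it with base R's dbeta(); the item parameters are the first continuous item from the adaptive_test example.

```r
# Hypothetical helper: density of a continuous response x in (0, 1) given theta and (a, b, nu).
dcont <- function(x, theta, a, b, nu) {
  mu <- plogis(a * (theta - b))  # mean of the response, e^{a(theta-b)} / (1 + e^{a(theta-b)})
  dbeta(x, shape1 = mu * nu, shape2 = nu * (1 - mu))
}

# First continuous item of the adaptive_test() example (a = 1, b = -0.5, nu = 10), theta = 0
dcont(0.88, theta = 0, a = 1, b = -0.5, nu = 10)
```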

Latent distribution estimation methods

1) Empirical histogram method

$$P(\theta=X_k)=A(X_k)$$

where $k = 1, 2, \dots, q$, $X_k$ is the location of the $k$th quadrature point, and $A(X_k)$ is the value of the probability mass function at $X_k$. Because the masses sum to 1, the empirical histogram method has $q-1$ free parameters.

2) Two-component Gaussian mixture distribution

$$P(\theta=X)=\pi\,\phi(X; \mu_1, \sigma_1)+(1-\pi)\,\phi(X; \mu_2, \sigma_2)$$

where $\phi(X; \mu, \sigma)$ is the value of a Gaussian density with mean $\mu$ and standard deviation $\sigma$ evaluated at $X$.

3) Davidian curve method

$$P(\theta=X)=\left\{\sum_{\lambda=0}^{h} m_{\lambda} X^{\lambda}\right\}^{2}\phi(X; 0, 1)$$

where $h$ corresponds to the argument h and determines the degree of the polynomial.
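A sketch of the Davidian-curve density on the quadrature grid: square a degree-$h$ polynomial, multiply by the standard normal density, and rescale so that the masses sum to 1. The helper davidian_pmf and the coefficients are hypothetical, and the numerical rescaling is an assumption; Woods and Lin (2009) impose the normalization through a constraint on the coefficients instead.

```r
# Hypothetical helper: Davidian-curve masses {sum_l m_l X^l}^2 * phi(X) on a quadrature grid.
davidian_pmf <- function(quad, m) {
  # Polynomial sum_{lambda = 0}^{h} m_lambda * X^lambda, with h = length(m) - 1
  poly <- drop(outer(quad, seq_along(m) - 1, "^") %*% m)
  dens <- poly^2 * dnorm(quad)
  dens / sum(dens)  # numerical normalization (assumption)
}

quad <- seq(-6, 6, length.out = 121)
A <- davidian_pmf(quad, m = c(1, 0.3, 0.2))  # h = 2, made-up coefficients
sum(quad * A)                                # mean of the resulting latent distribution
```

With h = 0 the polynomial is constant and the curve reduces to the normal density.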

4) Kernel density estimation method

$$P(\theta=X)=\frac{1}{Nh}\sum_{j=1}^{N} K\left(\frac{X-\theta_j}{h}\right)$$

where $N$ is the number of examinees, $\theta_j$ is the $j$th examinee's ability parameter, $h$ is the bandwidth, whose estimation method is set by the argument bandwidth, and $K(\cdot)$ is a kernel function. The Gaussian kernel is used in this function.
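The formula can be evaluated directly with a Gaussian kernel. The sketch below applies it to made-up ability values and picks $h$ with stats::bw.SJ(method = "ste"), the rule that density() uses for bw = "SJ-ste". The helper kde_gauss is hypothetical and evaluates the formula as written for given $\theta_j$; how the package supplies abilities within the EM cycles is not shown here.

```r
# Hypothetical helper: Gaussian-kernel density estimate at points X, straight from the formula.
kde_gauss <- function(X, theta, h) {
  N <- length(theta)
  # (1 / (N h)) * sum_j K((X - theta_j) / h), with K the standard normal density
  sapply(X, function(x) sum(dnorm((x - theta) / h)) / (N * h))
}

set.seed(1)
theta <- rnorm(500)                  # made-up ability values
h <- bw.SJ(theta, method = "ste")    # Sheather-Jones "solve-the-equation" bandwidth
quad <- seq(-6, 6, length.out = 121)
dens <- kde_gauss(quad, theta, h)
sum(dens) * diff(quad)[1]            # Riemann sum, approximately 1
```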

5) Log-linear smoothing method

$$P(\theta=X_q)=\exp\left(\beta_{0}+\sum_{m=1}^{h}\beta_{m}X_{q}^{m}\right)$$

where $h$ is the hyperparameter (the argument h) that determines the smoothness of the density, and $\theta$ can take a total of $Q$ finite values ($X_1, \dots, X_q, \dots, X_Q$).
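A sketch of the log-linear smoothed masses on $Q$ quadrature points, assuming $\beta_0$ serves as the normalizing constant that makes the masses sum to 1. The helper lls_pmf and the coefficients are hypothetical; estimating $\beta_1, \dots, \beta_h$ is not shown.

```r
# Hypothetical helper: masses exp(beta0 + sum_{m=1}^{h} beta_m X_q^m) on Q quadrature points.
lls_pmf <- function(quad, beta) {
  eta <- drop(outer(quad, seq_along(beta), "^") %*% beta)  # sum_m beta_m X_q^m
  beta0 <- -log(sum(exp(eta)))                             # normalizing constant (assumption)
  exp(beta0 + eta)
}

quad <- seq(-6, 6, length.out = 121)
A <- lls_pmf(quad, beta = c(0.1, -0.5, 0.02, -0.01))  # h = 4, made-up coefficients
sum(quad * A)                                        # mean of the smoothed distribution
```

With h = 2, beta_1 = 0, and beta_2 = -0.5, the masses match a discretized standard normal distribution.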

Value

This function returns a list of several objects:

par_est

The item parameter estimates.

se

The asymptotic standard errors for item parameter estimates.

fk

The estimated frequencies of examinees at quadrature points.

iter

The number of EM-MML iterations elapsed for the convergence.

quad

The location of quadrature points.

diff

The final value of the monitored maximum item parameter change.

Ak

The estimated discrete latent distribution. It is discrete (i.e., probability mass function) by the quadrature scheme.

Pk

The posterior probabilities of examinees at quadrature points.

theta

The estimated ability parameter values. If ability_method = "MLE", the function returns ±Inf for all-correct or all-incorrect response patterns.

theta_se

Standard errors of the ability estimates: the asymptotic standard errors for ability_method = "MLE" (NA for all-correct or all-incorrect response patterns), and the standard deviations of the posterior distributions for ability_method = "EAP".

logL

The deviance (i.e., -2logL).

density_par

The estimated density parameters.

Options

A replication of input arguments and other information.

Author(s)

Seewoo Li [email protected]

References

Bock, R. D., & Aitkin, M. (1981). Marginal maximum likelihood estimation of item parameters: Application of an EM algorithm. Psychometrika, 46(4), 443-459.

Casabianca, J. M., & Lewis, C. (2015). IRT item parameter recovery with marginal maximum likelihood estimation using loglinear smoothing models. Journal of Educational and Behavioral Statistics, 40(6), 547-578.

Li, S. (2021). Using a two-component normal mixture distribution as a latent distribution in estimating parameters of item response models. Journal of Educational Evaluation, 34(4), 759-789.

Li, S. (2022). The effect of estimating latent distribution using kernel density estimation method on the accuracy and efficiency of parameter estimation of item response models [Master's thesis, Yonsei University, Seoul]. Yonsei University Library.

Mislevy, R. J. (1984). Estimating latent distributions. Psychometrika, 49(3), 359-381.

Mislevy, R. J., & Bock, R. D. (1985). Implementation of the EM algorithm in the estimation of item parameters: The BILOG computer program. In D. J. Weiss (Ed.). Proceedings of the 1982 item response theory and computerized adaptive testing conference (pp. 189-202). University of Minnesota, Department of Psychology, Computerized Adaptive Testing Conference.

Woods, C. M., & Lin, N. (2009). Item response theory with estimation of the latent density using Davidian curves. Applied Psychological Measurement, 33(2), 102-117.

Woods, C. M., & Thissen, D. (2006). Item response theory with estimation of the latent population distribution using spline-based densities. Psychometrika, 71(2), 281-301.

Examples

# Generating continuous item response data
data <- DataGeneration(N = 1000, nitem_C = 10)$data_C

# Analysis
M1 <- IRTest_Cont(data, max_iter = 3) # increase `max_iter` in real analyses.

Item and ability parameters estimation for dichotomous items

Description

This function estimates IRT item and ability parameters when all items are scored dichotomously. Based on Bock & Aitkin's (1981) marginal maximum likelihood and EM algorithm (EM-MML), this function provides several latent distribution estimation algorithms which could free the normality assumption on the latent variable. If the normality assumption is violated, application of these latent distribution estimation methods could reflect non-normal characteristics of the unknown true latent distribution, and, thus, could provide more accurate parameter estimates (Li, 2021; Woods & Lin, 2009; Woods & Thissen, 2006).

Usage

IRTest_Dich(
  data,
  model = "2PL",
  range = c(-6, 6),
  q = 121,
  initialitem = NULL,
  ability_method = "EAP",
  latent_dist = "Normal",
  max_iter = 200,
  threshold = 1e-04,
  bandwidth = "SJ-ste",
  h = NULL
)

Arguments

data

A matrix or data frame of item responses where responses are coded as 0 or 1. Rows and columns indicate examinees and items, respectively.

model

A scalar or vector that represents types of item characteristic functions. Insert 1, "1PL", "Rasch", or "RASCH" for one-parameter logistic model, 2, "2PL" for two-parameter logistic model, and 3, "3PL" for three-parameter logistic model. The default is "2PL".

range

Range of the latent variable to be considered in the quadrature scheme. The default is from -6 to 6: c(-6, 6).

q

A numeric value that represents the number of quadrature points. The default value is 121.

initialitem

A matrix of initial item parameter values for starting the estimation algorithm. The default value is NULL.

ability_method

The ability parameter estimation method. The available options are Expected a posteriori (EAP), Maximum Likelihood Estimates (MLE), and weighted likelihood estimates (WLE). The default is EAP.

latent_dist

A character string that determines latent distribution estimation method. Insert "Normal", "normal", or "N" for the normality assumption on the latent distribution, "EHM" for empirical histogram method (Mislevy, 1984; Mislevy & Bock, 1985), "2NM" or "Mixture" for using two-component Gaussian mixture distribution (Li, 2021; Mislevy, 1984), "DC" or "Davidian" for Davidian-curve method (Woods & Lin, 2009), "KDE" for kernel density estimation method (Li, 2022), and "LLS" for log-linear smoothing method (Casabianca & Lewis, 2015). The default value is set to "Normal" to follow the convention.

max_iter

A numeric value that determines the maximum number of iterations in the EM-MML. The default value is 200.

threshold

A numeric value that determines the threshold of EM-MML convergence. A maximum item parameter change is monitored and compared with the threshold. The default value is 0.0001.

bandwidth

A character value that can be used if latent_dist = "KDE". This argument determines the bandwidth estimation method for "KDE". The default value is "SJ-ste". See density for available options.

h

A natural number less than or equal to 10 if latent_dist = "DC" or "LLS". This argument determines the complexity of the distribution.

Details

The probabilities of a correct response ($u=1$)

1) One-parameter logistic (1PL) model

$$P(u=1 \mid \theta, b)=\frac{\exp(\theta-b)}{1+\exp(\theta-b)}$$

2) Two-parameter logistic (2PL) model

$$P(u=1 \mid \theta, a, b)=\frac{\exp(a(\theta-b))}{1+\exp(a(\theta-b))}$$

3) Three-parameter logistic (3PL) model

$$P(u=1 \mid \theta, a, b, c)=c + (1-c)\frac{\exp(a(\theta-b))}{1+\exp(a(\theta-b))}$$
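The three response functions can be written with base R's plogis(), which computes $\exp(x)/(1+\exp(x))$. The helpers p_1pl, p_2pl, and p_3pl are hypothetical illustrations of the formulas above, not IRTest functions; the guessing parameter $c$ is named g to avoid masking base::c.

```r
# Hypothetical helpers: probability of a correct response under the 1PL, 2PL and 3PL models.
p_1pl <- function(theta, b) plogis(theta - b)
p_2pl <- function(theta, a, b) plogis(a * (theta - b))
p_3pl <- function(theta, a, b, g) g + (1 - g) * plogis(a * (theta - b))  # g = c

theta <- seq(-3, 3, by = 1)
round(cbind(theta,
            `1PL` = p_1pl(theta, b = 0),
            `2PL` = p_2pl(theta, a = 1.5, b = 0),
            `3PL` = p_3pl(theta, a = 1.5, b = 0, g = 0.2)), 3)
```

At $\theta = b$ the 3PL probability equals $c + (1-c)/2$, and it approaches $c$ as $\theta$ decreases.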

Latent distribution estimation methods

1) Empirical histogram method

$$P(\theta=X_k)=A(X_k)$$

where $k = 1, 2, \dots, q$, $X_k$ is the location of the $k$th quadrature point, and $A(X_k)$ is the value of the probability mass function at $X_k$. Because the masses sum to 1, the empirical histogram method has $q-1$ free parameters.

2) Two-component Gaussian mixture distribution

$$P(\theta=X)=\pi\,\phi(X; \mu_1, \sigma_1)+(1-\pi)\,\phi(X; \mu_2, \sigma_2)$$

where $\phi(X; \mu, \sigma)$ is the value of a Gaussian density with mean $\mu$ and standard deviation $\sigma$ evaluated at $X$.

3) Davidian curve method

$$P(\theta=X)=\left\{\sum_{\lambda=0}^{h} m_{\lambda} X^{\lambda}\right\}^{2}\phi(X; 0, 1)$$

where $h$ corresponds to the argument h and determines the degree of the polynomial.

4) Kernel density estimation method

$$P(\theta=X)=\frac{1}{Nh}\sum_{j=1}^{N} K\left(\frac{X-\theta_j}{h}\right)$$

where $N$ is the number of examinees, $\theta_j$ is the $j$th examinee's ability parameter, $h$ is the bandwidth, whose estimation method is set by the argument bandwidth, and $K(\cdot)$ is a kernel function. The Gaussian kernel is used in this function.

5) Log-linear smoothing method

$$P(\theta=X_q)=\exp\left(\beta_{0}+\sum_{m=1}^{h}\beta_{m}X_{q}^{m}\right)$$

where $h$ is the hyperparameter (the argument h) that determines the smoothness of the density, and $\theta$ can take a total of $Q$ finite values ($X_1, \dots, X_q, \dots, X_Q$).

Value

This function returns a list of several objects:

par_est

The item parameter estimates.

se

The asymptotic standard errors for item parameter estimates.

fk

The estimated frequencies of examinees at quadrature points.

iter

The number of EM-MML iterations elapsed for the convergence.

quad

The location of quadrature points.

diff

The final value of the monitored maximum item parameter change.

Ak

The estimated discrete latent distribution. It is discrete (i.e., probability mass function) by the quadrature scheme.

Pk

The posterior probabilities of examinees at quadrature points.

theta

The estimated ability parameter values. If ability_method = "MLE", the function returns ±Inf for all-correct or all-incorrect response patterns.

theta_se

Standard errors of the ability estimates: the asymptotic standard errors for ability_method = "MLE" (NA for all-correct or all-incorrect response patterns), and the standard deviations of the posterior distributions for ability_method = "EAP".

logL

The deviance (i.e., -2logL).

density_par

The estimated density parameters.

Options

A replication of input arguments and other information.

Author(s)

Seewoo Li [email protected]

References

Bock, R. D., & Aitkin, M. (1981). Marginal maximum likelihood estimation of item parameters: Application of an EM algorithm. Psychometrika, 46(4), 443-459.

Casabianca, J. M., & Lewis, C. (2015). IRT item parameter recovery with marginal maximum likelihood estimation using loglinear smoothing models. Journal of Educational and Behavioral Statistics, 40(6), 547-578.

Li, S. (2021). Using a two-component normal mixture distribution as a latent distribution in estimating parameters of item response models. Journal of Educational Evaluation, 34(4), 759-789.

Li, S. (2022). The effect of estimating latent distribution using kernel density estimation method on the accuracy and efficiency of parameter estimation of item response models [Master's thesis, Yonsei University, Seoul]. Yonsei University Library.

Mislevy, R. J. (1984). Estimating latent distributions. Psychometrika, 49(3), 359-381.

Mislevy, R. J., & Bock, R. D. (1985). Implementation of the EM algorithm in the estimation of item parameters: The BILOG computer program. In D. J. Weiss (Ed.). Proceedings of the 1982 item response theory and computerized adaptive testing conference (pp. 189-202). University of Minnesota, Department of Psychology, Computerized Adaptive Testing Conference.

Woods, C. M., & Lin, N. (2009). Item response theory with estimation of the latent density using Davidian curves. Applied Psychological Measurement, 33(2), 102-117.

Woods, C. M., & Thissen, D. (2006). Item response theory with estimation of the latent population distribution using spline-based densities. Psychometrika, 71(2), 281-301.

Examples

# Preparation of dichotomous item response data

data <- DataGeneration(N=500,
                       nitem_D = 10)$data_D

# Analysis

M1 <- IRTest_Dich(data)

Item and ability parameter estimation for mixed-format item response data

Description

This function estimates IRT item and ability parameters when a test consists of mixed-format items (i.e., a combination of dichotomous and polytomous items). In educational contexts, combining the two item formats has an advantage: dichotomous items expedite scoring and help cover a broad content domain, while polytomous items (e.g., free-response items) encourage students to exercise complex cognitive skills (Lee et al., 2020). Based on Bock & Aitkin's (1981) marginal maximum likelihood and EM algorithm (EM-MML), this function incorporates several latent distribution estimation methods that can relax the normality assumption on the latent variable. If the normality assumption is violated, these methods can reflect some features of the unknown true latent distribution and thus may provide more accurate parameter estimates (Li, 2021; Woods & Lin, 2009; Woods & Thissen, 2006).

Usage

IRTest_Mix(
  data_D,
  data_P,
  model_D = "2PL",
  model_P = "GPCM",
  range = c(-6, 6),
  q = 121,
  initialitem_D = NULL,
  initialitem_P = NULL,
  ability_method = "EAP",
  latent_dist = "Normal",
  max_iter = 200,
  threshold = 1e-04,
  bandwidth = "SJ-ste",
  h = NULL
)

Arguments

data_D

A matrix or data frame of item responses where responses are coded as 0 or 1. Rows and columns indicate examinees and items, respectively.

data_P

A matrix or data frame of item responses coded as 0, 1, ..., m for an item with m+1 categories. Rows and columns indicate examinees and items, respectively.

model_D

A scalar or vector that specifies the item characteristic function(s) of the dichotomous items. Use 1, "1PL", "Rasch", or "RASCH" for the one-parameter logistic model, 2 or "2PL" for the two-parameter logistic model, and 3 or "3PL" for the three-parameter logistic model. The default is "2PL".

model_P

A character value specifying the IRT model for the polytomous items. Currently, PCM, GPCM, and GRM are available. The default is "GPCM".

range

Range of the latent variable to be considered in the quadrature scheme. The default is from -6 to 6: c(-6, 6).

q

A numeric value that represents the number of quadrature points. The default value is 121.

initialitem_D

A matrix of initial item parameter values for the dichotomous items, used to start the estimation algorithm. The default value is NULL.

initialitem_P

A matrix of initial item parameter values for the polytomous items, used to start the estimation algorithm. The default value is NULL.

ability_method

The ability parameter estimation method. The available options are Expected a posteriori (EAP), Maximum Likelihood Estimates (MLE), and weighted likelihood estimates (WLE). The default is EAP.

latent_dist

A character string that determines latent distribution estimation method. Insert "Normal", "normal", or "N" for the normality assumption on the latent distribution, "EHM" for empirical histogram method (Mislevy, 1984; Mislevy & Bock, 1985), "2NM" or "Mixture" for using two-component Gaussian mixture distribution (Li, 2021; Mislevy, 1984), "DC" or "Davidian" for Davidian-curve method (Woods & Lin, 2009), "KDE" for kernel density estimation method (Li, 2022), and "LLS" for log-linear smoothing method (Casabianca & Lewis, 2015). The default value is set to "Normal" to follow the convention.

max_iter

A numeric value that determines the maximum number of iterations in the EM-MML. The default value is 200.

threshold

A numeric value that determines the threshold of EM-MML convergence. A maximum item parameter change is monitored and compared with the threshold. The default value is 0.0001.

bandwidth

A character value used only if latent_dist = "KDE". This argument determines the bandwidth estimation method for "KDE". The default value is "SJ-ste". See the bw argument of stats::density for the available options.

h

A natural number less than or equal to 10 if latent_dist = "DC" or "LLS". This argument determines the complexity of the distribution.

Details

Dichotomous: the probability of a correct response ($u=1$)

1) One-parameter logistic (1PL) model

$$P(u=1|\theta, b)=\frac{\exp{(\theta-b)}}{1+\exp{(\theta-b)}}$$

2) Two-parameter logistic (2PL) model

$$P(u=1|\theta, a, b)=\frac{\exp{(a(\theta-b))}}{1+\exp{(a(\theta-b))}}$$

3) Three-parameter logistic (3PL) model

$$P(u=1|\theta, a, b, c)=c + (1-c)\frac{\exp{(a(\theta-b))}}{1+\exp{(a(\theta-b))}}$$
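The three dichotomous models differ only in which parameters are free. In the minimal base-R sketch below, `icc()` is a hypothetical helper and not an IRTest function. The 2PL is the case c = 0, and the 1PL additionally fixes a = 1.

```r
# Hypothetical helper: 3PL response function; c = 0 gives the 2PL,
# and a = 1 with c = 0 gives the 1PL
icc <- function(theta, a = 1, b = 0, c = 0) {
  c + (1 - c) * plogis(a * (theta - b))   # plogis(z) = exp(z) / (1 + exp(z))
}

theta <- seq(-3, 3, by = 1)
p1 <- icc(theta, b = 0.5)                     # 1PL
p2 <- icc(theta, a = 1.7, b = 0.5)            # 2PL
p3 <- icc(theta, a = 1.7, b = 0.5, c = 0.2)   # 3PL
```

At $\theta = b$, the 1PL and 2PL give 0.5, and the 3PL gives $c + (1-c)/2$. For example, `icc(0.5, a = 1.7, b = 0.5, c = 0.2)` is 0.6.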

Polytomous: the probability of scoring $u=k$ (i.e., $k=0, 1, ..., m$; $m \ge 2$)

1) Partial credit model (PCM)

$$P(u=0|\theta, b_1, ..., b_{m})=\frac{1}{1+\sum_{c=1}^{m}{\exp{\left[\sum_{v=1}^{c}{(\theta-b_v)}\right]}}}$$

$$P(u=1|\theta, b_1, ..., b_{m})=\frac{\exp{(\theta-b_1)}}{1+\sum_{c=1}^{m}{\exp{\left[\sum_{v=1}^{c}{(\theta-b_v)}\right]}}}$$

$$\vdots$$

$$P(u=m|\theta, b_1, ..., b_{m})=\frac{\exp{\left[\sum_{v=1}^{m}{(\theta-b_v)}\right]}}{1+\sum_{c=1}^{m}{\exp{\left[\sum_{v=1}^{c}{(\theta-b_v)}\right]}}}$$

2) Generalized partial credit model (GPCM)

$$P(u=0|\theta, a, b_1, ..., b_{m})=\frac{1}{1+\sum_{c=1}^{m}{\exp{\left[\sum_{v=1}^{c}{a(\theta-b_v)}\right]}}}$$

$$P(u=1|\theta, a, b_1, ..., b_{m})=\frac{\exp{(a(\theta-b_1))}}{1+\sum_{c=1}^{m}{\exp{\left[\sum_{v=1}^{c}{a(\theta-b_v)}\right]}}}$$

$$\vdots$$

$$P(u=m|\theta, a, b_1, ..., b_{m})=\frac{\exp{\left[\sum_{v=1}^{m}{a(\theta-b_v)}\right]}}{1+\sum_{c=1}^{m}{\exp{\left[\sum_{v=1}^{c}{a(\theta-b_v)}\right]}}}$$

3) Graded response model (GRM)

$$P(u=0|\theta, a, b_1, ..., b_{m})=1-\frac{1}{1+\exp{\left[-a(\theta-b_1)\right]}}$$

$$P(u=1|\theta, a, b_1, ..., b_{m})=\frac{1}{1+\exp{\left[-a(\theta-b_1)\right]}}-\frac{1}{1+\exp{\left[-a(\theta-b_2)\right]}}$$

$$\vdots$$

$$P(u=m|\theta, a, b_1, ..., b_{m})=\frac{1}{1+\exp{\left[-a(\theta-b_m)\right]}}-0$$
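The PCM and GPCM probabilities above share one structure. Each category's log-numerator is a cumulative sum of $a(\theta-b_v)$, and the probabilities are those terms exponentiated and normalized.

In the minimal base-R sketch below, `gpcm_prob()` is a hypothetical helper, not an IRTest function. It returns $P(u=0), ..., P(u=m)$ at a single $\theta$, and the PCM is obtained with a = 1.

```r
# Hypothetical helper: GPCM category probabilities P(u = 0), ..., P(u = m)
# at a single theta. The PCM is the special case a = 1.
gpcm_prob <- function(theta, a = 1, b) {
  # Log-numerators: 0 for category 0, then cumulative sums of a * (theta - b_v)
  z <- c(0, cumsum(a * (theta - b)))
  # Subtracting max(z) avoids overflow and leaves the ratios unchanged
  exp(z - max(z)) / sum(exp(z - max(z)))
}

p_gpcm <- gpcm_prob(theta = 0.3, a = 1.2, b = c(-1, 0, 1.5))  # 4 categories
p_pcm  <- gpcm_prob(theta = 0.3, b = c(-1, 0, 1.5))           # same steps, PCM
```

At $\theta = b_1$, categories 0 and 1 are equally likely, because their ratio is $\exp(a(\theta-b_1)) = 1$. This gives a direct reading of the step parameters.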

Latent distribution estimation methods

1) Empirical histogram method

$$P(\theta=X_k)=A(X_k)$$

where $k=1, 2, ..., q$, $X_k$ is the location of the $k$th quadrature point, and $A(X_k)$ is the value of the probability mass function evaluated at $X_k$. The empirical histogram method thus has $q-1$ parameters.

2) Two-component Gaussian mixture distribution

$$P(\theta=X)=\pi \phi(X; \mu_1, \sigma_1)+(1-\pi) \phi(X; \mu_2, \sigma_2)$$

where $\phi(X; \mu, \sigma)$ is the value of a Gaussian component with mean $\mu$ and standard deviation $\sigma$ evaluated at $X$.

3) Davidian curve method

$$P(\theta=X)=\left\{\sum_{\lambda=0}^{h}{{m}_{\lambda}{X}^{\lambda}}\right\}^{2}\phi(X; 0, 1)$$

where $h$ corresponds to the argument h and determines the degree of the polynomial.

4) Kernel density estimation method

$$P(\theta=X)=\frac{1}{Nh}\sum_{j=1}^{N}{K\left(\frac{X-\theta_j}{h}\right)}$$

where $N$ is the number of examinees, $\theta_j$ is the $j$th examinee's ability parameter, $h$ is the bandwidth estimated by the method given in the argument bandwidth (this $h$ is not the argument h), and $K(\bullet)$ is a kernel function. The Gaussian kernel is used in this function.

5) Log-linear smoothing method

$$P(\theta=X_{q})=\exp{\left(\beta_{0}+\sum_{m=1}^{h}{\beta_{m}X_{q}^{m}}\right)}$$

where $h$ is the hyperparameter that determines the smoothness of the density, and $\theta$ can take a total of $Q$ finite values ($X_1, \dots, X_q, \dots, X_Q$).
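Each latent distribution method yields a set of probability masses on the quadrature grid. The base-R sketch below evaluates the EHM, 2NM, and DC forms above on the default grid (121 points on [-6, 6]). It is not IRTest's internal code, and several parts are assumptions:

- All parameter values are hypothetical.
- The EHM step (expected frequencies divided by their total) is a simplified stand-in for how the histogram is re-estimated in each EM cycle.
- Rescaling to unit sum replaces the constraint that Woods & Lin (2009) place on the Davidian coefficients.

```r
quad <- seq(-6, 6, length.out = 121)
to_pmf <- function(dens) dens / sum(dens)   # rescale so the masses sum to 1

# 1) EHM: masses A(X_k) taken as expected frequencies divided by N (assumed
#    simplified update); fk is a hypothetical stand-in for those frequencies
fk    <- 1000 * to_pmf(dnorm(quad, 0.2, 1.1))
A_ehm <- fk / sum(fk)

# 2) 2NM: pi * phi(X; mu1, sigma1) + (1 - pi) * phi(X; mu2, sigma2)
A_2nm <- to_pmf(0.3 * dnorm(quad, -1, 0.6) + 0.7 * dnorm(quad, 0.5, 0.9))

# 3) DC: squared polynomial of degree h times the standard normal density,
#    with hypothetical coefficients m_0, m_1, m_2 (h = 2)
m <- c(1, 0.3, -0.1)
poly_val <- sapply(quad, function(x) sum(m * x^(seq_along(m) - 1)))
A_dc <- to_pmf(poly_val^2 * dnorm(quad))
```

Squaring the polynomial keeps the Davidian curve non-negative for any coefficients. The EHM's $q$ masses are constrained to sum to 1, which is why that method has $q-1$ free parameters.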

Value

This function returns a list of several objects:

par_est

The list of item parameter estimates. The first and second elements are the matrices of dichotomous and polytomous item parameter estimates, respectively.

se

The list of standard errors of the item parameter estimates. The first and second elements are the matrices of standard errors for the dichotomous and polytomous item parameter estimates, respectively.

fk

The estimated frequencies of examinees at quadrature points.

iter

The number of EM-MML iterations elapsed for the convergence.

quad

The location of quadrature points.

diff

The final value of the monitored maximum item parameter change.

Ak

The estimated discrete latent distribution. It is discrete (i.e., a probability mass function) because of the quadrature scheme.

Pk

The posterior probabilities of examinees at quadrature points.

theta

The estimated ability parameter values. If ability_method = "MLE" and an examinee receives the maximum or minimum score on all items, the function returns ±Inf for that examinee.

theta_se

Standard errors of the ability estimates. For ability_method = "MLE", these are the asymptotic standard errors (the function returns NA for examinees who answer all items correctly or all items incorrectly). For ability_method = "EAP", these are the standard deviations of the posterior distributions.

logL

The deviance (i.e., -2logL).

density_par

The estimated density parameters.

Options

A replication of input arguments and other information.

Author(s)

Seewoo Li

References

Bock, R. D., & Aitkin, M. (1981). Marginal maximum likelihood estimation of item parameters: Application of an EM algorithm. Psychometrika, 46(4), 443-459.

Casabianca, J. M., & Lewis, C. (2015). IRT item parameter recovery with marginal maximum likelihood estimation using loglinear smoothing models. Journal of Educational and Behavioral Statistics, 40(6), 547-578.

Lee, W. C., Kim, S. Y., Choi, J., & Kang, Y. (2020). IRT Approaches to Modeling Scores on Mixed-Format Tests. Journal of Educational Measurement, 57(2), 230-254.

Li, S. (2021). Using a two-component normal mixture distribution as a latent distribution in estimating parameters of item response models. Journal of Educational Evaluation, 34(4), 759-789.

Li, S. (2022). The effect of estimating latent distribution using kernel density estimation method on the accuracy and efficiency of parameter estimation of item response models [Master's thesis, Yonsei University, Seoul]. Yonsei University Library.

Mislevy, R. J. (1984). Estimating latent distributions. Psychometrika, 49(3), 359-381.

Mislevy, R. J., & Bock, R. D. (1985). Implementation of the EM algorithm in the estimation of item parameters: The BILOG computer program. In D. J. Weiss (Ed.). Proceedings of the 1982 item response theory and computerized adaptive testing conference (pp. 189-202). University of Minnesota, Department of Psychology, Computerized Adaptive Testing Conference.

Woods, C. M., & Lin, N. (2009). Item response theory with estimation of the latent density using Davidian curves. Applied Psychological Measurement, 33(2), 102-117.

Woods, C. M., & Thissen, D. (2006). Item response theory with estimation of the latent population distribution using spline-based densities. Psychometrika, 71(2), 281-301.

Examples

# Preparation of mixed-format item response data

Alldata <- DataGeneration(N=1000,
                          nitem_D = 5,
                          nitem_P = 3)

DataD <- Alldata$data_D   # item response data for the dichotomous items
DataP <- Alldata$data_P   # item response data for the polytomous items

# Analysis

M1 <- IRTest_Mix(DataD, DataP)

Item and ability parameter estimation for polytomous items

Description

This function estimates IRT item and ability parameters when all items are scored polytomously. Based on Bock & Aitkin's (1981) marginal maximum likelihood and EM algorithm (EM-MML), this function provides several latent distribution estimation methods that can relax the normality assumption on the latent variable. If the normality assumption is violated, these methods can reflect non-normal characteristics of the unknown true latent distribution and thus may provide more accurate parameter estimates (Li, 2021; Woods & Lin, 2009; Woods & Thissen, 2006).

Usage

IRTest_Poly(
  data,
  model = "GPCM",
  range = c(-6, 6),
  q = 121,
  initialitem = NULL,
  ability_method = "EAP",
  latent_dist = "Normal",
  max_iter = 200,
  threshold = 1e-04,
  bandwidth = "SJ-ste",
  h = NULL
)

Arguments

data

A matrix or data frame of item responses coded as 0, 1, ..., m for an item with m+1 categories. Rows and columns indicate examinees and items, respectively.

model

A character value for an IRT model to be applied. Currently, PCM, GPCM, and GRM are available. The default is "GPCM".

range

Range of the latent variable to be considered in the quadrature scheme. The default is from -6 to 6: c(-6, 6).

q

A numeric value that represents the number of quadrature points. The default value is 121.

initialitem

A matrix of initial item parameter values for starting the estimation algorithm. The default value is NULL.

ability_method

The ability parameter estimation method. The available options are Expected a posteriori (EAP), Maximum Likelihood Estimates (MLE), and weighted likelihood estimates (WLE). The default is EAP.

latent_dist

A character string that determines latent distribution estimation method. Insert "Normal", "normal", or "N" for the normality assumption on the latent distribution, "EHM" for empirical histogram method (Mislevy, 1984; Mislevy & Bock, 1985), "2NM" or "Mixture" for using two-component Gaussian mixture distribution (Li, 2021; Mislevy, 1984), "DC" or "Davidian" for Davidian-curve method (Woods & Lin, 2009), "KDE" for kernel density estimation method (Li, 2022), and "LLS" for log-linear smoothing method (Casabianca & Lewis, 2015). The default value is set to "Normal" to follow the convention.

max_iter

A numeric value that determines the maximum number of iterations in the EM-MML. The default value is 200.

threshold

A numeric value that determines the threshold of EM-MML convergence. A maximum item parameter change is monitored and compared with the threshold. The default value is 0.0001.

bandwidth

A character value used only if latent_dist = "KDE". This argument determines the bandwidth estimation method for "KDE". The default value is "SJ-ste". See the bw argument of stats::density for the available options.

h

A natural number less than or equal to 10 if latent_dist = "DC" or "LLS". This argument determines the complexity of the distribution.

Details

The probability of scoring $u=k$ (i.e., $k=0, 1, ..., m$; $m \ge 2$)

1) Partial credit model (PCM)

$$P(u=0|\theta, b_1, ..., b_{m})=\frac{1}{1+\sum_{c=1}^{m}{\exp{\left[\sum_{v=1}^{c}{(\theta-b_v)}\right]}}}$$

$$P(u=1|\theta, b_1, ..., b_{m})=\frac{\exp{(\theta-b_1)}}{1+\sum_{c=1}^{m}{\exp{\left[\sum_{v=1}^{c}{(\theta-b_v)}\right]}}}$$

$$\vdots$$

$$P(u=m|\theta, b_1, ..., b_{m})=\frac{\exp{\left[\sum_{v=1}^{m}{(\theta-b_v)}\right]}}{1+\sum_{c=1}^{m}{\exp{\left[\sum_{v=1}^{c}{(\theta-b_v)}\right]}}}$$

2) Generalized partial credit model (GPCM)

$$P(u=0|\theta, a, b_1, ..., b_{m})=\frac{1}{1+\sum_{c=1}^{m}{\exp{\left[\sum_{v=1}^{c}{a(\theta-b_v)}\right]}}}$$

$$P(u=1|\theta, a, b_1, ..., b_{m})=\frac{\exp{(a(\theta-b_1))}}{1+\sum_{c=1}^{m}{\exp{\left[\sum_{v=1}^{c}{a(\theta-b_v)}\right]}}}$$

$$\vdots$$

$$P(u=m|\theta, a, b_1, ..., b_{m})=\frac{\exp{\left[\sum_{v=1}^{m}{a(\theta-b_v)}\right]}}{1+\sum_{c=1}^{m}{\exp{\left[\sum_{v=1}^{c}{a(\theta-b_v)}\right]}}}$$

3) Graded response model (GRM)

$$P(u=0|\theta, a, b_1, ..., b_{m})=1-\frac{1}{1+\exp{\left[-a(\theta-b_1)\right]}}$$

$$P(u=1|\theta, a, b_1, ..., b_{m})=\frac{1}{1+\exp{\left[-a(\theta-b_1)\right]}}-\frac{1}{1+\exp{\left[-a(\theta-b_2)\right]}}$$

$$\vdots$$

$$P(u=m|\theta, a, b_1, ..., b_{m})=\frac{1}{1+\exp{\left[-a(\theta-b_m)\right]}}-0$$
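The GRM probabilities above are differences of adjacent cumulative logistic curves $P(u \ge k)$. By convention, $P(u \ge 0) = 1$ and $P(u \ge m+1) = 0$.

In the minimal base-R sketch below, `grm_prob()` is a hypothetical helper, not an IRTest function. It assumes increasing thresholds $b_1 < \dots < b_m$.

```r
# Hypothetical helper: GRM category probabilities P(u = 0), ..., P(u = m)
# at a single theta; b must be increasing (b_1 < ... < b_m)
grm_prob <- function(theta, a, b) {
  p_star <- c(1, plogis(a * (theta - b)), 0)   # P(u >= k) for k = 0, ..., m + 1
  -diff(p_star)                                # P(u = k) = P(u >= k) - P(u >= k + 1)
}

p_grm <- grm_prob(theta = 0.3, a = 1.2, b = c(-1, 0, 1.5))   # 4 categories
```

The trailing 0 in `p_star` plays the role of the "$-0$" term in the formula for $P(u=m)$.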

Latent distribution estimation methods

1) Empirical histogram method

$$P(\theta=X_k)=A(X_k)$$

where $k=1, 2, ..., q$, $X_k$ is the location of the $k$th quadrature point, and $A(X_k)$ is the value of the probability mass function evaluated at $X_k$. The empirical histogram method thus has $q-1$ parameters.

2) Two-component Gaussian mixture distribution

$$P(\theta=X)=\pi \phi(X; \mu_1, \sigma_1)+(1-\pi) \phi(X; \mu_2, \sigma_2)$$

where $\phi(X; \mu, \sigma)$ is the value of a Gaussian component with mean $\mu$ and standard deviation $\sigma$ evaluated at $X$.

3) Davidian curve method

$$P(\theta=X)=\left\{\sum_{\lambda=0}^{h}{{m}_{\lambda}{X}^{\lambda}}\right\}^{2}\phi(X; 0, 1)$$

where $h$ corresponds to the argument h and determines the degree of the polynomial.

4) Kernel density estimation method

$$P(\theta=X)=\frac{1}{Nh}\sum_{j=1}^{N}{K\left(\frac{X-\theta_j}{h}\right)}$$

where $N$ is the number of examinees, $\theta_j$ is the $j$th examinee's ability parameter, $h$ is the bandwidth estimated by the method given in the argument bandwidth (this $h$ is not the argument h), and $K(\bullet)$ is a kernel function. The Gaussian kernel is used in this function.

5) Log-linear smoothing method

$$P(\theta=X_{q})=\exp{\left(\beta_{0}+\sum_{m=1}^{h}{\beta_{m}X_{q}^{m}}\right)}$$

where $h$ is the hyperparameter that determines the smoothness of the density, and $\theta$ can take a total of $Q$ finite values ($X_1, \dots, X_q, \dots, X_Q$).
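The KDE and LLS methods above smooth quantities produced during EM-MML. The base-R sketch below applies both formulas to simulated abilities on the default grid. It is a simplified illustration, not IRTest's internal code. Inside the package, the inputs come from the estimation cycle rather than from simulated values.

Two points in the sketch go beyond the formulas themselves:

- KDE uses `stats::density()` with `bw = "SJ-ste"`, the package's default bandwidth value.
- LLS is fitted by a Poisson-type log-linear regression of frequencies on powers of $X$, which is one common way to fit such models. `quasipoisson()` gives the same coefficient estimates as `poisson()` while tolerating non-integer expected frequencies.

```r
set.seed(1)
quad  <- seq(-6, 6, length.out = 121)
theta <- c(rnorm(300, -1, 0.6), rnorm(700, 0.5, 0.9))   # simulated bimodal abilities

# 4) KDE: Gaussian kernel with a Sheather-Jones ("SJ-ste") bandwidth,
#    evaluated at the quadrature points and rescaled to probability masses
kde   <- density(theta, bw = "SJ-ste", kernel = "gaussian",
                 from = min(quad), to = max(quad), n = length(quad))
A_kde <- kde$y / sum(kde$y)

# Frequencies at the quadrature points: each ability is assigned to its
# nearest point using breaks at the midpoints between neighbouring points
mids <- head(quad, -1) + diff(quad) / 2
fk   <- as.vector(table(cut(theta, breaks = c(-Inf, mids, Inf))))

# 5) LLS: log(frequency) = beta_0 + sum_{m=1}^{h} beta_m * X^m with h = 4
h     <- 4
fit   <- glm(fk ~ poly(quad, h, raw = TRUE), family = quasipoisson())
A_lls <- fitted(fit) / sum(fitted(fit))
```

A larger h lets the log-linear curve follow more features of the frequencies, at the cost of a less smooth density.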

Value

This function returns a list of several objects:

par_est

The item parameter estimates.

se

The asymptotic standard errors for item parameter estimates.

fk

The estimated frequencies of examinees at quadrature points.

iter

The number of EM-MML iterations elapsed for the convergence.

quad

The location of quadrature points.

diff

The final value of the monitored maximum item parameter change.

Ak

The estimated discrete latent distribution. It is discrete (i.e., a probability mass function) because of the quadrature scheme.

Pk

The posterior probabilities of examinees at quadrature points.

theta

The estimated ability parameter values. If ability_method = "MLE" and an examinee receives the maximum or minimum score on all items, the function returns ±Inf for that examinee.

theta_se

Standard errors of the ability estimates. For ability_method = "MLE", these are the asymptotic standard errors (the function returns NA for examinees who answer all items correctly or all items incorrectly). For ability_method = "EAP", these are the standard deviations of the posterior distributions.

logL

The deviance (i.e., -2logL).

density_par

The estimated density parameters.

Options

A replication of input arguments and other information.

Author(s)

Seewoo Li

References

Bock, R. D., & Aitkin, M. (1981). Marginal maximum likelihood estimation of item parameters: Application of an EM algorithm. Psychometrika, 46(4), 443-459.

Casabianca, J. M., & Lewis, C. (2015). IRT item parameter recovery with marginal maximum likelihood estimation using loglinear smoothing models. Journal of Educational and Behavioral Statistics, 40(6), 547-578.

Li, S. (2021). Using a two-component normal mixture distribution as a latent distribution in estimating parameters of item response models. Journal of Educational Evaluation, 34(4), 759-789.

Li, S. (2022). The effect of estimating latent distribution using kernel density estimation method on the accuracy and efficiency of parameter estimation of item response models [Master's thesis, Yonsei University, Seoul]. Yonsei University Library.

Mislevy, R. J. (1984). Estimating latent distributions. Psychometrika, 49(3), 359-381.

Mislevy, R. J., & Bock, R. D. (1985). Implementation of the EM algorithm in the estimation of item parameters: The BILOG computer program. In D. J. Weiss (Ed.). Proceedings of the 1982 item response theory and computerized adaptive testing conference (pp. 189-202). University of Minnesota, Department of Psychology, Computerized Adaptive Testing Conference.

Woods, C. M., & Lin, N. (2009). Item response theory with estimation of the latent density using Davidian curves. Applied Psychological Measurement, 33(2), 102-117.

Woods, C. M., & Thissen, D. (2006). Item response theory with estimation of the latent population distribution using spline-based densities. Psychometrika, 71(2), 281-301.

Examples

# Preparation of polytomous item response data

data <- DataGeneration(N=1000,
                       nitem_P = 8)$data_P

# Analysis

M1 <- IRTest_Poly(data)

Item fit diagnostics

Description

This function analyzes and reports item-fit test results.

Usage

item_fit(x, bins = 10, bin.center = "mean")

Arguments

x

A model fit object from either IRTest_Dich, IRTest_Poly, or IRTest_Mix.

bins

The number of bins used to calculate the statistics. Following Yen's $Q_{1}$ (1981), the default is 10.

bin.center

A method for calculating the center of each bin. Following Yen's $Q_{1}$ (1981), the default is "mean". Use "median" for Bock's $\chi^{2}$ (1960).

Details

Bock's $\chi^{2}$ (1960) and Yen's $Q_{1}$ (1981) are currently available.
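Yen's $Q_{1}$ compares, within ability bins, the observed proportion correct $O_j$ with the model-implied proportion $E_j$:

$$Q_1 = \sum_j N_j (O_j - E_j)^2 / [E_j(1 - E_j)]$$

The minimal base-R sketch below computes this for a single 2PL item. It is a hypothetical re-implementation for illustration, not the source of item_fit(), and it makes three simplifying assumptions:

- True simulated abilities stand in for ability estimates.
- Bins are formed as groups of roughly equal size.
- The degrees of freedom are taken as bins minus the number of item parameters.

Replacing the bin mean with the median would give Bock's $\chi^{2}$ variant.

```r
# Hypothetical helper: Yen's Q1 for one 2PL item
q1_stat <- function(resp, theta, a, b, bins = 10) {
  grp <- cut(rank(theta, ties.method = "first"), breaks = bins, labels = FALSE)
  n   <- tabulate(grp, nbins = bins)     # group sizes N_j
  obs <- tapply(resp, grp, mean)         # observed proportions O_j
  ctr <- tapply(theta, grp, mean)        # bin centers (mean, as in Yen's Q1)
  ex  <- plogis(a * (ctr - b))           # expected proportions E_j
  sum(n * (obs - ex)^2 / (ex * (1 - ex)))
}

# Simulated responses to one 2PL item (a = 1.2, b = 0.3)
set.seed(2)
theta <- rnorm(2000)
resp  <- rbinom(2000, 1, plogis(1.2 * (theta - 0.3)))

q1      <- q1_stat(resp, theta, a = 1.2, b = 0.3)
p_value <- pchisq(q1, df = 10 - 2, lower.tail = FALSE)   # 2PL: 2 item parameters
```

Evaluating the same responses with a misspecified difficulty, e.g. `q1_stat(resp, theta, a = 1.2, b = 1.5)`, produces a much larger statistic.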

Value

This function returns a matrix of item-fit test results.

Author(s)

Seewoo Li

References

Bock, R. D. (1960). Methods and applications of optimal scaling. Chapel Hill, NC: L. L. Thurstone Psychometric Laboratory.

Yen, W. M. (1981). Using simulation results to choose a latent trait model. Applied Psychological Measurement, 5(2), 245–262.

Examples

# Preparation of dichotomous item response data

data <- DataGeneration(N=500,
                       nitem_D = 10)$data_D

# Analysis

M1 <- IRTest_Dich(data)

# Item fit statistics

item_fit(M1)

Latent density function

Description

Density function of the estimated latent distribution with mean and standard deviation equal to 0 and 1, respectively.

Usage

latent_distribution(x, model.fit)

Arguments

x

A numeric vector. Value(s) on the $\theta$ scale at which the PDF is evaluated.

model.fit

An object returned from an estimation function.

Value

The values of the PDF evaluated at x, returned as a vector of the same length as x.

Examples

# Data generation and model fitting
data <- DataGeneration(N=1000,
                       nitem_D = 15,
                       latent_dist = "2NM",
                       d = 1.664,
                       sd_ratio = 2,
                       prob = 0.3)$data_D

M1 <- IRTest_Dich(data = data, latent_dist = "KDE")

# Plotting the latent distribution
ggplot2::ggplot()+
  ggplot2::stat_function(fun=latent_distribution, args=list(model.fit = M1))+
  ggplot2::lims(x=c(-6,6), y=c(0,0.5))

Extract Log-Likelihood

Description

Extract Log-Likelihood

Usage

## S3 method for class 'IRTest'
logLik(object, ...)

Arguments

object

An IRTest-class object from which a log-likelihood value is extracted.

...

Other arguments.

Value

Extracted log-likelihood.


Recovering original parameters of two-component Gaussian mixture distribution from re-parameterized values

Description

Recovering original parameters of two-component Gaussian mixture distribution from re-parameterized values

Usage

original_par_2GM(
  prob = 0.5,
  d = 0,
  sd_ratio = 1,
  overallmean = 0,
  overallsd = 1
)

Arguments

prob

The $\pi = \frac{n_1}{N}$ parameter of the two-component Gaussian mixture distribution, where $n_1$ is the estimated number of examinees belonging to the first Gaussian component and $N$ is the total number of examinees (Li, 2021).

d

The $\delta = \frac{\mu_2 - \mu_1}{\bar{\sigma}}$ parameter of the two-component Gaussian mixture distribution, where $\mu_1$ and $\mu_2$ are the estimated means of the first and second Gaussian components, respectively, and $\bar{\sigma}$ is the overall standard deviation of the latent distribution (Li, 2021). Without loss of generality, $\mu_2 \ge \mu_1$ is assumed; thus, $\delta \ge 0$.

sd_ratio

A numeric value of the $\zeta = \frac{\sigma_2}{\sigma_1}$ parameter of the two-component Gaussian mixture distribution, where $\sigma_1$ and $\sigma_2$ are the estimated standard deviations of the first and second Gaussian components, respectively (Li, 2021).

overallmean

A numeric value of $\bar{\mu}$ that determines the overall mean of the two-component Gaussian mixture distribution.

overallsd

A numeric value of $\bar{\sigma}$ that determines the overall standard deviation of the two-component Gaussian mixture distribution.

Details

Original two-component Gaussian mixture distribution

$$f(x)=\pi\times \phi(x | \mu_1, \sigma_1)+(1-\pi)\times \phi(x | \mu_2, \sigma_2)$$

where $\phi$ is a Gaussian component.

Re-parameterized two-component Gaussian mixture distribution

$$f(x)=2GM(x|\pi, \delta, \zeta, \bar{\mu}, \bar{\sigma})$$

where $\bar{\mu}$ is the overall mean and $\bar{\sigma}$ is the overall standard deviation of the distribution.

The original parameters retrieved from re-parameterized values

1) Mean of the first Gaussian component (m1).

$$\mu_1=-(1-\pi)\delta\bar{\sigma}+\bar{\mu}$$

2) Mean of the second Gaussian component (m2).

$$\mu_2=\pi\delta\bar{\sigma}+\bar{\mu}$$

3) Standard deviation of the first Gaussian component (s1).

$$\sigma_1^2=\bar{\sigma}^2\left(\frac{1-\pi(1-\pi)\delta^2}{\pi+(1-\pi)\zeta^2}\right)$$

4) Standard deviation of the second Gaussian component (s2).

$$\sigma_2^2=\bar{\sigma}^2\left(\frac{1-\pi(1-\pi)\delta^2}{\frac{1}{\zeta^2}\pi+(1-\pi)}\right)=\zeta^2\sigma_1^2$$
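The recovery formulas above can be checked numerically: the recovered components should reproduce the requested overall mean and standard deviation.

In the base-R sketch below, `recover_2gm()` is a hypothetical re-implementation for illustration, not the source of original_par_2GM(). It uses the same argument names and returns c(m1, m2, s1, s2) in the same order. The values prob = 0.3, d = 1.664, and sd_ratio = 2 are borrowed from the DataGeneration examples in this manual.

```r
# Hypothetical re-implementation of the recovery formulas
recover_2gm <- function(prob = 0.5, d = 0, sd_ratio = 1,
                        overallmean = 0, overallsd = 1) {
  m1 <- -(1 - prob) * d * overallsd + overallmean
  m2 <- prob * d * overallsd + overallmean
  v1 <- overallsd^2 * (1 - prob * (1 - prob) * d^2) /
        (prob + (1 - prob) * sd_ratio^2)   # sigma_1^2
  s1 <- sqrt(v1)
  s2 <- sd_ratio * s1                      # sigma_2 = zeta * sigma_1
  c(m1 = m1, m2 = m2, s1 = s1, s2 = s2)
}

par <- recover_2gm(prob = 0.3, d = 1.664, sd_ratio = 2)

# Mean and variance of the resulting mixture
mix_mean <- 0.3 * par[["m1"]] + 0.7 * par[["m2"]]
mix_var  <- 0.3 * (par[["s1"]]^2 + par[["m1"]]^2) +
            0.7 * (par[["s2"]]^2 + par[["m2"]]^2) - mix_mean^2
```

The formulas require $\pi(1-\pi)\delta^2 < 1$. Otherwise, $\sigma_1^2$ would not be positive.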

Value

This function returns a vector of length 4: c(m1,m2,s1,s2).

m1

The location parameter (mean) of the first Gaussian component.

m2

The location parameter (mean) of the second Gaussian component.

s1

The scale parameter (standard deviation) of the first Gaussian component.

s2

The scale parameter (standard deviation) of the second Gaussian component.

Author(s)

Seewoo Li

References

Li, S. (2021). Using a two-component normal mixture distribution as a latent distribution in estimating parameters of item response models. Journal of Educational Evaluation, 34(4), 759-789.


Plot of item response functions

Description

This function draws the item response functions of a single item in the fitted model.

Usage

plot_item(x, item.number = 1, type = NULL)

Arguments

x

A model fit object from either IRTest_Dich, IRTest_Poly, IRTest_Cont, or IRTest_Mix.

item.number

A numeric value indicating the item number.

type

A character string required if inherits(x, c("mix")) == TRUE. It should be either "d" (dichotomous item) or "p" (polytomous item); item.number=1, type="d" indicates the first dichotomous item.

Value

This function returns a plot of item response functions.

Author(s)

Seewoo Li

Examples

# Preparation of dichotomous item response data

data <- DataGeneration(N=500, nitem_D = 10)$data_D

# Analysis

M1 <- IRTest_Dich(data)

# Plotting item response function

plot_item(M1, item.number = 1)

Plot of the estimated latent distribution

Description

This function draws a plot of the estimated latent distribution (the population distribution of the latent variable).

Usage

## S3 method for class 'IRTest'
plot(x, ...)

Arguments

x

An object of "IRTest"-class obtained from either IRTest_Dich, IRTest_Poly, IRTest_Cont, or IRTest_Mix.

...

Other aesthetic argument(s) for drawing the plot. Arguments are passed on to ggplot2::stat_function, if the distribution estimation method is 2NM, KDE, or DC. Otherwise, they are passed on to ggplot2::geom_line.

Value

A plot of estimated latent distribution.

Author(s)

Seewoo Li

Examples

# Data generation and model fitting

data <- DataGeneration(N=1000,
                       nitem_D = 15,
                       latent_dist = "2NM",
                       d = 1.664,
                       sd_ratio = 2,
                       prob = 0.3)$data_D

M1 <- IRTest_Dich(data = data, latent_dist = "KDE")

# Plotting the latent distribution

plot(x = M1, linewidth = 1, color = 'red') +
  ggplot2::lims(x = c(-6, 6), y = c(0, .5))

Printing the result

Description

This function prints the summarized information.

Usage

## S3 method for class 'IRTest'
print(x, ...)

Arguments

x

An object of "IRTest"-class obtained from either IRTest_Dich, IRTest_Poly, or IRTest_Mix.

...

Additional arguments (currently non-functioning).

Value

Text printed to the console that recommends using the summary function and accessing the details directly with the "$" operator.

Author(s)

Seewoo Li

Examples

data <- DataGeneration(N=1000, nitem_P = 8)$data_P

M1 <- IRTest_Poly(data = data, latent_dist = "KDE")

M1

Printing the summary

Description

This function prints the summarized information.

Usage

## S3 method for class 'IRTest_summary'
print(x, ...)

Arguments

x

An object returned from summary.IRTest.

...

Additional arguments (currently non-functioning).

Value

Summary text printed to the console.

Author(s)

Seewoo Li

Examples

data <- DataGeneration(N=1000, nitem_P = 8)$data_P

M1 <- IRTest_Poly(data = data,
                  latent_dist = "2NM")

summary(M1)

Recategorization of data using a new categorization scheme

Description

Given a recategorization scheme, this function recategorizes the input data accordingly.

Usage

recategorize(data, new_cat)

Arguments

data

An item response matrix.

new_cat

A list specifying the new categorization scheme (e.g., the output of cat_clps, as in the example).

Value

The recategorized data.

Author(s)

Seewoo Li

Examples

# Preparation of polytomous item response data

data <- DataGeneration(N=1000,
                       nitem_P = 8)$data_P

# Analysis

M1 <- IRTest_Poly(data)

# Recommendation of category collapsing

new_cat <- cat_clps(M1$par_est)

# Recategorization of data

recategorize(data, new_cat)

Marginal reliability coefficient of IRT

Description

Marginal reliability coefficient of IRT

Usage

reliability(x)

Arguments

x

A model fit object from either IRTest_Dich, IRTest_Poly, IRTest_Cont, or IRTest_Mix.

Details

Reliability coefficient on summed-score scale

In accordance with the concept of reliability in classical test theory (CTT), this function calculates the IRT reliability coefficients.

The basic concept and formula of the reliability coefficient can be expressed as follows (Kim & Feldt, 2010):

An observed score on Item $i$, $X_i$, is decomposed as the sum of a true score $T_i$ and an error $e_i$. Then, assuming $\sigma_{T_{i}e_{j}}=0$ for all $i$ and $j$ and $\sigma_{e_{i}e_{j}}=0$ for $i \ne j$, the reliability coefficient of a test is defined as

$$\rho_{TX}^{2}=\rho_{XX^{'}}=\frac{\sigma_{T}^{2}}{\sigma_{X}^{2}}=\frac{\sigma_{T}^{2}}{\sigma_{T}^{2}+\sigma_{e}^{2}}=1-\frac{\sigma_{e}^{2}}{\sigma_{X}^{2}}$$

See May and Nicewander (1994) for the specific formula used in this function.
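A short simulation can illustrate the CTT identities above. This is only an illustration of the definitions, not the May and Nicewander (1994) formula that the function uses.

When true scores and errors are generated independently, three quantities approach the same population value, here 16 / (16 + 4) = 0.8:

- the ratio of true-score variance to observed-score variance,
- one minus the ratio of error variance to observed-score variance,
- the squared correlation between true and observed scores.

The score scale and variances are arbitrary choices.

```r
set.seed(3)
n  <- 1e5
tr <- rnorm(n, mean = 20, sd = 4)   # true scores T (variance 16)
e  <- rnorm(n, mean = 0,  sd = 2)   # errors e (variance 4), independent of T
x  <- tr + e                        # observed scores X = T + e

rel_ratio <- var(tr) / var(x)       # sigma_T^2 / sigma_X^2
rel_error <- 1 - var(e) / var(x)    # 1 - sigma_e^2 / sigma_X^2
rel_corr  <- cor(tr, x)^2           # squared true-observed correlation
```

The three estimates differ only by sampling noise, because the sample covariance between `tr` and `e` is close to, but not exactly, zero.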

Reliability coefficient on the $\theta$ scale

For the coefficient on the $\theta$ scale, this function calculates the parallel-forms reliability (Green et al., 1984; Kim, 2012):

$$\rho_{\hat{\theta} \hat{\theta}^{'}} =\frac{\sigma_{E\left(\hat{\theta}\mid \theta \right )}^{2}}{\sigma_{E\left(\hat{\theta}\mid \theta \right )}^{2}+E\left( \sigma_{\hat{\theta}|\theta}^{2} \right)} =\frac{1}{1+E\left(I\left(\hat{\theta}\right)^{-1}\right)}$$

This formula assumes that $\sigma_{E\left(\hat{\theta}\mid \theta \right )}^{2}=\sigma_{\theta}^{2}=1$. Although the formula is widely used in IRT studies and applications, this assumption may not hold.
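The right-hand side of the $\theta$-scale formula is straightforward to evaluate on a quadrature grid. The base-R sketch below shows one way to do this, and it is not the source of reliability(). It assumes a hypothetical 10-item 2PL test and takes the expectation of the inverse test information over a standard normal latent distribution on the default grid.

```r
quad <- seq(-6, 6, length.out = 121)
Ak   <- dnorm(quad) / sum(dnorm(quad))   # latent distribution as masses

# Hypothetical 2PL item parameters
a <- c(0.8, 1.0, 1.2, 1.5, 1.1, 0.9, 1.3, 1.0, 1.4, 0.7)
b <- seq(-1.5, 1.5, length.out = 10)

P    <- plogis(sweep(outer(quad, b, "-"), 2, a, "*"))   # [k, i] = P_i(X_k)
info <- as.vector((P * (1 - P)) %*% a^2)                 # 2PL test information
rel_theta <- 1 / (1 + sum(Ak / info))                    # 1 / (1 + E[I^{-1}])
```

Weighting by an estimated latent distribution (for example, a model's Ak) instead of the normal masses changes only the weights in the expectation. The $\sigma_{\theta}^{2}=1$ assumption discussed above still applies.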

Value

Estimated marginal reliability coefficients.

Author(s)

Seewoo Li

References

Green, B. F., Bock, R. D., Humphreys, L. G., Linn, R. L., & Reckase, M. D. (1984). Technical guidelines for assessing computerized adaptive tests. Journal of Educational Measurement, 21(4), 347-360.

Kim, S. (2012). A note on the reliability coefficients for item response model-based ability estimates. Psychometrika, 77(1), 153-162.

Kim, S., & Feldt, L. S. (2010). The estimation of the IRT reliability coefficient and its lower and upper bounds, with comparisons to CTT reliability statistics. Asia Pacific Education Review, 11, 179-188.

May, K., & Nicewander, A. W. (1994). Reliability and information functions for percentile ranks. Journal of Educational Measurement, 31(4), 313-325.

Examples

data <- DataGeneration(N=500, nitem_D = 10)$data_D

# Analysis

M1 <- IRTest_Dich(data)


# Reliability coefficients
reliability(M1)

Summary of the results

Description

This function summarizes the output (e.g., convergence of the estimation algorithm, number of parameters, model-fit, and estimated latent distribution).

Usage

## S3 method for class 'IRTest'
summary(object, ...)

Arguments

object

An object of "IRTest"-class obtained from either IRTest_Dich, IRTest_Poly, or IRTest_Mix.

...

Other argument(s).

Value

Summarized information.

Examples

data <- DataGeneration(N=1000, nitem_P = 8)$data_P

M1 <- IRTest_Poly(data = data, latent_dist = "KDE")

summary(M1)