Programming an IRT, 2Pl Model from Scratch

(So You Don’t Have to)

I recently had a client that was looking to build a 2PL model from scratch. The 2PL describes the process through which someone gets a correct or incorrect response on a test. More specifically, the probability of person j providing a positive answer to item i is given by:

\[Pr(Y_{ij} = 1| \theta) = exp(a_i(\theta_j - b_i)/(1 + exp(a_i(\theta_j - b_i))\] \[\theta \sim N(0, 1)\]

In this model, \[\alpha\] represents the discrimination of item i, \[\theta\] represents the latent trait (i.e., ability) of person j, and \[b\] represents represents the difficulty of item i.

The 2PL model is extremely similar to a logistic regression. In fact, it is often written in matrix notation form that is identical to a logistic regression:

\[Pr(Y_{ij} = 1| \theta) = exp(a_i\theta_j + \beta_i)/(1 + exp(a_i\theta_j + \beta_i))\] where

\[ a_i = a_i \] and \[ b_i = -\beta_i/alpha_i\]

The tricky part is that unlike a logistic regression, where X and Y are known, and you derive weights that make the best-fitting line for X on Y, with IRT, the only thing that is known is Y (the probability of passing an item). Everything else, alpha, beta, and theta are unknown.

To solve this problem, the Expectation–maximization algorithm is used. This procedure finds the maximum likelihood of estimates of parameters, when there are unobserved variables in the model. So, if you want to find parameter estimates for item discrimination and difficulty, you use the Expectation–maximization algorithm to account for the unobserved abilities. More specifically, you use a procedure known as Gaussian quadrature. With Gaussian quadrature, you assume that person abilities occur along a normal distribution, and so you divide a normal distribution into segments, with a representative value (i.e., the quadrature point), and a probability of occurrence (i.e., the weight). So, you pick a number of segments, and for each segment you derive the expected number of correct responses. Then with this expectation, you solve for item parameters. So, you have a list of item parameters over each of your segments, and then you apply weights and average over these item parameters, so that item parameters from unlikely person abilities are given less weight.

Below, I break up the entire procedure into a set of functions (this code was cribbed from Baker’s awesome guide to IRT)

First, make a function to get the probability of getting an item right. You’ll notice that this is identical to a logistic regression.

Prob_Correct = function(alpha, beta, person_parameter){
  DEV=(-beta + alpha) * person_parameter;

Second, get the likelihood of the computed probabilities. Note: there are three loops:

  1. We loop over patterns (in a pre-processing stage, we group people into patterns of responses, to speed up processing time)
  2. Loop over quadrate points
  3. Loop over items

In the end, we get a pattern likelihood where unlikely person traits are given less weight.

get_likelihood <- function(Y, CPT, A, QuadPoints, Weights, NumPatterns){
  PL = rep(0, NumPatterns)
  Prediction = c()
  Likelihood = matrix(nrow = NumPatterns, ncol = 10)
  for (l in 1:NumPatterns){
    for (k in 1:10){
      for (i in 1:NumItems){
        item = Y[l, i]
        Prediction[i] = Prob_Correct(CPT[i], A[i], QuadPoints[k])
        Prediction[i] = ifelse(item == 1, Prediction, 1 - Prediction)
        if ([l,k])){
          Likelihood[l,k] = Prediction[i]
          Likelihood[l,k] = Likelihood[l,k] * Prediction[i]
    PL[l] = sum(Likelihood[l,] * Weights)
  return(list(Likelihood, PL))

Here, we get the expected item responses patterns. We return the expected correct number of responses, and the total nunber of responses for each item and for each unobserved person abilities.

get_rik_nik = function(Likelihood, Weights, PL){
  RIK = matrix(nrow=NumItems, ncol = 10, 0)
  NIK = matrix(nrow=NumItems, ncol = 10, 0)
  for (i in 1:NumItems){
    for (k in 1:10){
      NT=(NumResponses * Likelihood[,k] * Weights[k])/PL
      RIK[i,k] = RIK[i,k] + sum(RT)
      NIK[i,k]=  NIK[i,k] + sum(NT) 
  return(list(RIK, NIK))

Here we estimate the actual parameters.

  • CPT = Intercept (annoyingly, beta is the intercept in IRT, and alpha is the slope).
  • A = Slope

Again, all values here used to calculate derivatives here were taken from Baker’s awesome guide to IRT.

get_parameters <- function(CPT, A, RIK, NIK, QuadPoints){
  for (NIT in 1:100){
    for (k in 1:10){
      PI = RIK[,k]/NIK[,k]
      DEV = CPT + A * QuadPoints[k]
      PH = 1/(1 + exp(-DEV))
      W = PH * (1-PH)
      V = (PI - PH)/W
      P1 = NIK[,k]*W;
    if (any(DM < .00009)){
      print ('an error occured!')
    if(sum(abs(DCPT))<=.05 & sum(abs(DA))<=.05){
    CPT = CPT + DCPT
    A = A + DA
  return(list(CPT, A))

Here we estimate the actual item parameters

estimate_items = function(data, NumItems, QuadPoints, Weights, NumPatterns){
  CPT = rep(0, NumItems)
  A = rep(1, NumItems)
  for (i in 1:100){
    Likelihood_pl = get_likelihood(Y, CPT, A, QuadPoints, Weights, NumPatterns)
    Likelihood = Likelihood_pl[[1]]
    PL = Likelihood_pl[[2]]
    RIK_NIK = get_rik_nik(Likelihood, Weights, PL)
    RIK = RIK_NIK[[1]]
    NIK = RIK_NIK[[2]]
    CPT_A = get_parameters(CPT, A, RIK, NIK, QuadPoints)
    if (sum(CPT - CPT_A[[1]]) < .0001 & sum(A - CPT_A[[2]]) < .0001){
      print ('early return!')
    CPT = CPT_A[[1]]
    A =  CPT_A[[2]]
  return(list(CPT, A))

And now we can test!

#### Make Dummy Data 

J = 5
N = 100

icc.2pl <- function (thetas, alphas, betas, N, J){
  theta <- matrix(thetas,ncol=1) %*% matrix(alphas,nrow=1) 
  logits <- (theta - t(matrix(betas,J,N))) 
  probs <- 1/(1+exp(-logits)) 

betas <- rnorm(J, sd=2) 
alphas <- exp(rnorm(J,sd=.7))
thetas <- rnorm(100, sd=2) 

# generate fake data
probs <- icc.2pl (thetas, alphas,betas,N,J) 

# make random Y 
Y <- matrix(runif(N*J) ,nrow=N,ncol=J)
# assign 1 and 0 
Y <- (ifelse (Y<probs,1,0))
Y = Y[,,]  
responses = Y

# Make response patterns
Response_Patterns = apply(Y, 1, paste, collapse="")
# Get the Count of the response pattern
GetCount <- function(x, Response_Patterns){
  x = paste(x, collapse = "")

NumResponses = apply(Y, 1, GetCount, Response_Patterns)
# dont double count response patterns
Keep = which(! duplicated(Y))
NumResponses = NumResponses[Keep]
Y = Y[Keep,]

NumPatterns = nrow(Y)
NumItems = ncol(Y)

Here we make upu the quadrature points and their associated weights.

QuadPoints = c(-4.0000,-3.1111,-2.2222,-1.3333,-.4444,.4444,1.3333, 2.2222,3.1111,4.0000)

Weights = c(.000119,

Now estimate

CPT_A = estimate_items(Y, NumItems, QuadPoints, Weights, NumPatterns)
## [1] "early return!"

Let’s make sure we did okay

CPT = CPT_A[[1]] 
A = CPT_A[[2]]

sum(Prob_Correct(CPT, A, thetas))
## [1] 51.17944
sum(Prob_Correct(alphas, betas, thetas))
## [1] 49.97478

Looks the predicted number of correct items is very close betwene the simulated parameters and the estimated parameters

Further Reading:

Baker, F. B., & Kim, S. H. (2004). Item response theory: Parameter estimation techniques. CRC Press.

Embretson, S. E., & Reise, S. P. (2013). Item response theory. Psychology Press.