Rrasch

Psychometrics in R, JAGS and WinBUGS, with some Python thrown in... oh.. and it's in the cloud ...

R, JAGS and rjags on an EC2 Instance

WinBUGS and JAGS free Item Response Theory from the dot matrix plots of proprietary software and open up a multicoloured world of posterior predictive model checking. Fitting IRT models using brute force is not for the impatient, however. That’s why, just as early psychometricians shipped off their calculations to teams of monks, I’ve shipped off my model fitting to the West Coast of Ireland (which seems apt considering the number of monasteries there) so that I can use my computer to do more important things. Like write about why I’ve shipped off my model fitting to the West Coast of Ireland.

Thanks to Michael Rutter (and the ever-vigilant Dirk Eddelbuettel for the prompt), getting JAGS and R up and ready on a 64-bit server is as simple as:

CloudR - cloudr.sh
sudo add-apt-repository ppa:marutter/rrutter
sudo apt-get update
sudo apt-get install r-base-dev jags r-cran-rjags

Is running your model in the cloud worth a few bob?

If you want to run JAGS in the cloud it’ll probably cost you a few bob. That’s because on the EC2 micro instances JAGS models tend to trigger the CPU limit and grind to a halt. If you want to run odd jobs then I’d suggest setting up JAGS on an official Ubuntu EC2 Large image, shutting it down, saving the image, then running it at a spot price. That means you’ll be paying about $0.144 per hour while it is running. If you want an image that is always running, I’d suggest you check out monthly hosting rates with other cloud services, which can be cheaper than EC2.

Which Ubuntu image should you use?

I’d check out which ones are supported by Michael Rutter.

What is EC2?

If none of this means anything to you, I’d recommend playing about with EC2 micro instances and making sure you can get into and out of them using SSH. As my main computer is Windows-based I use PuTTY and FileZilla. FileZilla lets you configure a default text editor, so it is fine for writing most programming languages.

A useful guide on setting yourself up on EC2 with Ubuntu
Setting up FTP on Ubuntu

Only pay for what you use

Don’t forget to stop the instance and save the image on AWS. Then, when you are ready to run your job you can request a spot instance and pay for only the time it takes to run your job.
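If you have the AWS command line tools installed and configured, the stop, save and spot steps can probably be scripted. What follows is only a sketch under that assumption: the function names save_image and request_spot are mine, and every ID, image name, price, instance type and key name is a placeholder you would swap for your own.

```shell
# Sketch only: assumes the AWS CLI is installed and has credentials configured.
# All arguments shown in the comments are hypothetical placeholders.

# Stop a configured instance, wait until it has stopped, then save it as an image.
# Prints the ID of the new image.
save_image() {
    instance_id="$1"   # e.g. i-0123456789abcdef0
    image_name="$2"    # e.g. r-jags-base
    aws ec2 stop-instances --instance-ids "$instance_id" &&
    aws ec2 wait instance-stopped --instance-ids "$instance_id" &&
    aws ec2 create-image --instance-id "$instance_id" --name "$image_name" \
        --query ImageId --output text
}

# Request a single one-time spot instance launched from the saved image.
request_spot() {
    image_id="$1"        # image ID printed by save_image
    max_price="$2"       # highest hourly price you will pay, in USD
    instance_type="$3"   # e.g. m1.large
    key_name="$4"        # name of your SSH key pair
    aws ec2 request-spot-instances \
        --spot-price "$max_price" \
        --instance-count 1 \
        --type one-time \
        --launch-specification "{\"ImageId\":\"$image_id\",\"InstanceType\":\"$instance_type\",\"KeyName\":\"$key_name\"}"
}
```

Something like save_image i-0123456789abcdef0 r-jags-base followed by request_spot with the printed image ID would cover the whole round trip. Remember to terminate the spot instance when the job is done, otherwise you keep paying.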

Keep your job running while you are away

If you are SSHing into your instance, don’t forget to start your analysis with the nohup command; otherwise it will be terminated as soon as you log out.
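A minimal sketch of what that looks like, assuming your analysis lives in a script called fit_model.R (a made-up name, as are the log and PID file names):

```shell
# fit_model.R is a hypothetical script that fits your model.
# nohup makes the job ignore the hangup signal sent when you log out;
# stdout and stderr both go to a log file, and & puts the job in the background.
nohup Rscript fit_model.R > fit_model.log 2>&1 &

# Keep the process ID so the job can be checked on or killed later.
echo $! > fit_model.pid
```

When you log back in, tail -f fit_model.log follows the progress, and kill "$(cat fit_model.pid)" stops the job if it has gone astray.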

Ten Reasons to Love the Cloud

Ten reasons to love the cloud:

  1. The names on Heroku. empty-moon-9726. Suddenly the cloud seems prosaic when you can have the moon.
  2. Being reunited with PuTTY. It’s like catching up with an old friend. At first the conversation is a little stilted, but you soon remember just how well you got on and what brought you together in the first place. Things will never be the same between you and the command prompt again.
  3. Never having to talk to anyone in systems admin ever again.
  4. Having one instance for one programming language. Let’s face it, Ruby and Python don’t really get on so keep them on different boxes, in different continents.
  5. Running 64-bit R from your phone. Just because you can.
  6. Being able to say ‘I’ve just set the analysis going on my cluster in Rio de Janeiro’
  7. Never having to talk to anyone in systems. Oh, did I say that?
  8. Telling your boss that 4 virtual cores with 2 EC2 Compute Units should do it. It’s just so Star Trek. Throw in a few tachyons and we’ll hit warp speed.
  9. Never ever writing a line of PHP again.
  10. sudo apt-get install r-base-dev

R in the Cloud

Psychometrics, Qu’est-ce que c’est?

Say psychometrics to people and they think IQ tests. Fair enough.

I think eRm:

library(eRm)

# Rasch model with beta.1 restricted to 0
data(raschdat1)
res <- RM(raschdat1, sum0 = FALSE)
print(res)
summary(res)
res$W  # design matrix

The joy of fitting your first Rasch model in R is unparalleled. Go on, try it. Hmmm, a list of numbers. No idea what they mean?

OK. So you take an IQ test. How do we know how clever you are? We’ve fitted a model to the test, and the model tells us. Ah, but what interests a psychometrician is the residuals: what the model doesn’t tell us. You may not fit our model. Interesting. Perhaps we need different questions for you, or perhaps we need a different model (sorry Benjamin D. Wright). In the old days that would mean expensive software where you spend days figuring out that 0 is not allowed as a valid response. Really? Who writes this software?

Ah, R has an answer. And I love R. I mean really love R. ltm. Eat your heart out with generalised models. Only now the misfit is getting quite obscure. I no longer know if you fit my model, or the items fit my model, or if nothing fits at all, because there is a distribution of a statistic that Sijtsma has told me has certain properties, but I have no way of following his proof.

And so to Bayes, to JAGS, to WinBUGS, rjags, R2WinBUGS and models that have an elegance that was dreamt of but unrealised before Gibbs sampling. Models galore: S. McKay Curtis and the testlet model. Now my items can live within testlets!

model {
  # i indexes persons, j items, k testlets
  for (i in 1:n) {
    for (j in 1:p) {
      Y[i, j] ~ dbern(prob[i, j])
      # d[j] picks out the testlet effect that applies to item j
      logit(prob[i, j]) <- alpha[j] * (theta[i] - delta[j] + gamma[i, d[j]])
    }

    theta[i] ~ dnorm(0.0, 1.0)

    for (k in 1:n.t) {
      gamma[i, k] ~ dnorm(0.0, pr.gamma[k])
    }
    # column n.t + 1 is fixed at zero, i.e. no testlet effect
    gamma[i, n.t + 1] <- 0.0
  }

  for (j in 1:p) {
    alpha[j] ~ dnorm(m.alpha, pr.alpha) I(0.0, )
    delta[j] ~ dnorm(m.delta, pr.delta)
  }
  # dnorm takes a precision, so convert the standard deviations
  pr.alpha <- pow(s.alpha, -2)
  pr.delta <- pow(s.delta, -2)

  for (k in 1:n.t) {
    pr.gamma[k] ~ dgamma(a.sigsq.gamma, b.sigsq.gamma)
    sigsq.gamma[k] <- 1.0 / pr.gamma[k]
  }
}

So you fit your model. And you load up your data. And you iterate. And iterate. And iterate. Minutes turn to hours turn to days. And if you are on Windows you have rebooted six times to install security updates. There must be a better way. Enter, stage left: Django, Python, rpy2 and the cloud.

Follow me into the cloud.

Rooks in the Cloud

Ever since R was born (evoked?) geeks have been trying to get it to talk HTML. A list of web interfaces for R is updated on CRAN here. Aims are various. Some seek to replace R with a traditional GUI. Others are more ambitious and open up a glimpse of an architecture that provides live analysis of ever changing data.

Perhaps the most exciting of the architectures is Rook, an R web service that takes advantage of the built-in web server in R.

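# A Rook application can be a plain function of one argument, env,
# returning a list with status, headers and body ...
app <- function(env){
  body = paste("<h1>Hello World! This is Rook", env$rook.version, ".</h1>")
  list(
    status = 200L,
    headers = list(
      "Content-Type" = "text/html"
    ),
    body = body
  )
}

# ... or a reference class with a call method that does the same job
setRefClass(
  "HelloWorld",
  methods = list(
    call = function(env){
      list(
        status = 200L,
        headers = list(
          "Content-Type" = "text/html"
        ),
        body = paste("<h1>Hello World! This is Rook", env$rook.version, ".</h1>")
      )
    }
  )
)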

Cool!

While I love R, however, there are some things that I’m really not sure it should be doing. Making toast and being a web server are two of them. The joy of Django and Ruby is that they do stuff for you, like looking after cross-site request forgery protection. I’m a whizz with R, but I wouldn’t know where to start with that one.

Anyhow, if the architecture exists where are the applications?

Over at Cambridge we have the brilliant Concerto which makes writing an adaptive test using catR a breeze. It really is worth a go. They will even host a demo account for you.

Elsewhere Jeroen Ooms cuts a lonely figure in R web services. Linear models and ggplots in web interfaces. Check them out here and how he did it here.

But where are the applications reaching critical mass? Where is the Ruby, Django and R integration work going on?

GIBBS Us a Break

So you want to run R in the cloud so you can set your Gibbs sampling off, forget about it, and not be paranoid about power cuts and reboots. Andrew Gelman hosted a good debate on the pros and cons of R in the cloud on his blog.

The consensus seems to be RStudio and EC2. P.S. If you haven’t got Andrew’s book Data Analysis Using Regression and Multilevel/Hierarchical Models (2007) go and steal one. Seriously. Anyway, here is a lovely intro on how to go down the R EC2 route.

There you have it. If you want to run R on a 64-bit server with googols of power, that is your answer. So long as you know your cat from your whoami and you don’t want to use JAGS or BUGS. Really? OK, so it wouldn’t be too hard to get JAGS talking to RStudio, but is there more to R web services than this?