Suppose I have a data frame like "dat" below. For some context, this
is the format that represents students taking a computer adaptive test.
first.item is the first item that a student was administered, score1 is
the student's response to that item, and so forth.
item.pool <- paste("item", 1:10, sep = "")
set.seed(54321)
dat <- data.frame(id = c(1,2,3,4,5),
    first.item = sample(item.pool, 5, replace=TRUE),
    second.item = sample(item.pool, 5, replace=TRUE),
    third.item = sample(item.pool, 5, replace=TRUE),
    score1 = sample(c(0,1), 5, replace=TRUE),
    score2 = sample(c(0,1), 5, replace=TRUE),
    score3 = sample(c(0,1), 5, replace=TRUE))
I need to restructure this into a new format. The new matrix df (after the loop)
is exactly what I want in the end. But, I'm annoyed at myself for not
thinking of a more efficient way to restructure this without using a loop.
df <- matrix(NA, ncol = length(item.pool), nrow = nrow(dat))
colnames(df) <- unique(item.pool)
for(i in 1:5){
    for(j in 2:4){
        rr <- which(dat[i,j] == colnames(df))
        df[i,rr] <- dat[i, (j+3)]
    }
}
Any thoughts?
Harold
> -----Original Message-----
> From: r-help-bounces at r-project.org
> [mailto:r-help-bounces at r-project.org] On Behalf Of Doran, Harold
> Sent: Thursday, February 25, 2010 10:35 AM
> To: r-help at r-project.org
> Subject: [R] Restructure some data

You can try subscripting by a 2-column matrix, the first
giving the row index and the second the column index.
E.g.,

> f <- function(dat) {
      allItems <- paste("item", 1:10, sep = "")
      items <- as.matrix(dat[2:4])
      scores <- as.matrix(dat[, 5:7])
      retval <- matrix(NA_real_, nrow = nrow(dat), ncol = 10,
                       dimnames = list(character(), allItems))
      retval[cbind(dat$id, match(items, allItems))] <- scores
      retval
  }
> identical(f(dat), df)
[1] TRUE

That was a very nice problem description, letting me
reproduce the example data and desired output with
just copy and paste.
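The matrix-subscripting approach above uses dat$id as the row index, which works here because the ids happen to be 1 through 5. A minimal sketch of a variant that does not depend on the id values follows. The function name restructure and its arguments pool, item.cols and score.cols are hypothetical names chosen for illustration; stringsAsFactors = FALSE is added only so the sketch behaves the same across R versions.

```r
# Rebuild the example data from the original post.
item.pool <- paste("item", 1:10, sep = "")
set.seed(54321)
dat <- data.frame(id = c(1, 2, 3, 4, 5),
                  first.item  = sample(item.pool, 5, replace = TRUE),
                  second.item = sample(item.pool, 5, replace = TRUE),
                  third.item  = sample(item.pool, 5, replace = TRUE),
                  score1 = sample(c(0, 1), 5, replace = TRUE),
                  score2 = sample(c(0, 1), 5, replace = TRUE),
                  score3 = sample(c(0, 1), 5, replace = TRUE),
                  stringsAsFactors = FALSE)

# Hypothetical helper: one row per student, one column per item in the pool.
restructure <- function(dat, pool, item.cols = 2:4, score.cols = 5:7) {
  items  <- as.matrix(dat[item.cols])    # character matrix of item names
  scores <- as.matrix(dat[score.cols])   # numeric matrix of responses
  out <- matrix(NA_real_, nrow = nrow(dat), ncol = length(pool),
                dimnames = list(NULL, pool))
  # row(items) is the row position of every cell, so the ids themselves
  # never have to be 1..n; match() turns item names into column positions.
  idx <- cbind(as.vector(row(items)), match(as.vector(items), pool))
  out[idx] <- as.vector(scores)
  out
}

res <- restructure(dat, item.pool)
```

If a student were administered the same item twice, the later administration overwrites the earlier one, which is also what the original double loop does.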
Bill Dunlap
Spotfire, TIBCO Software
wdunlap tibco.com

> ______________________________________________
> R-help at r-project.org mailing list
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.
Harold -
Here's what I came up with:
> tapply(as.vector(as.matrix(dat[5:7])),
+        list(rep(dat$id,3),as.vector(as.matrix(dat[2:4]))),I)
  item1 item10 item2 item3 item4 item5 item7 item9
1    NA     NA     1    NA    NA     1    NA     0
2     0     NA    NA    NA    NA     1     1    NA
3     1     NA     0     1    NA    NA    NA    NA
4    NA     NA    NA     1     0    NA     0    NA
5    NA      1    NA     0     1    NA    NA    NA
I thought there would be a way to use xtabs, but I had
trouble preserving the NAs.
The columns aren't in the right order, and the item6 column is
missing, but it's pretty close.
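One way to get the columns in pool order, including an all-NA column for any item nobody was administered, might be to turn the item names into a factor whose levels are the full item pool before calling tapply. This is only a sketch: the names ids, items, scores and wide are illustrative, stringsAsFactors = FALSE is added for consistency across R versions, and I is swapped for a small function that keeps the last score so that a repeated item would not break the result.

```r
# Rebuild the example data from the original post.
item.pool <- paste("item", 1:10, sep = "")
set.seed(54321)
dat <- data.frame(id = c(1, 2, 3, 4, 5),
                  first.item  = sample(item.pool, 5, replace = TRUE),
                  second.item = sample(item.pool, 5, replace = TRUE),
                  third.item  = sample(item.pool, 5, replace = TRUE),
                  score1 = sample(c(0, 1), 5, replace = TRUE),
                  score2 = sample(c(0, 1), 5, replace = TRUE),
                  score3 = sample(c(0, 1), 5, replace = TRUE),
                  stringsAsFactors = FALSE)

ids    <- rep(dat$id, times = 3)
# Using the whole pool as factor levels fixes the column order and keeps
# a column for every item, even one that never appears in the data.
items  <- factor(as.vector(as.matrix(dat[2:4])), levels = item.pool)
scores <- as.vector(as.matrix(dat[5:7]))

# Keep the last score in each (student, item) cell; empty cells stay NA.
wide <- tapply(scores, list(ids, items), function(s) s[length(s)])
```

The result still carries the ids as row names, so unname(wide) or rownames(wide) <- NULL would be needed before comparing it with the matrix built by the loop.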
Thanks for the easily reproducible example, and the interesting
puzzle.
- Phil Spector
Statistical Computing Facility
Department of Statistics
UC Berkeley
spector at stat.berkeley.edu
On Thu, 25 Feb 2010, Doran, Harold wrote:
> Any thoughts?
>
> Harold