Suppose I have a data frame like "dat" below. For some context, this is the format that represents students taking a computer adaptive test. first.item is the first item the student was administered, score1 is the student's response to that item, and so forth.

    item.pool <- paste("item", 1:10, sep = "")
    set.seed(54321)
    dat <- data.frame(id = c(1,2,3,4,5),
                      first.item  = sample(item.pool, 5, replace=TRUE),
                      second.item = sample(item.pool, 5, replace=TRUE),
                      third.item  = sample(item.pool, 5, replace=TRUE),
                      score1 = sample(c(0,1), 5, replace=TRUE),
                      score2 = sample(c(0,1), 5, replace=TRUE),
                      score3 = sample(c(0,1), 5, replace=TRUE))

I need to restructure this into a new format. The new matrix df (after the loop) is exactly what I want in the end. But I'm annoyed at myself for not thinking of a more efficient way to restructure this without using a loop.

    df <- matrix(NA, ncol = length(item.pool), nrow = nrow(dat))
    colnames(df) <- unique(item.pool)

    for(i in 1:5){
      for(j in 2:4){
        rr <- which(dat[i,j] == colnames(df))
        df[i,rr] <- dat[i, (j+3)]
      }
    }

Any thoughts?

Harold
> -----Original Message-----
> From: r-help-bounces at r-project.org
> [mailto:r-help-bounces at r-project.org] On Behalf Of Doran, Harold
> Sent: Thursday, February 25, 2010 10:35 AM
> To: r-help at r-project.org
> Subject: [R] Restructure some data
>
> Any thoughts?

You can try subscripting by a 2-column matrix, the first column giving the row index and the second the column index. E.g.,

    > f <- function(dat) {
          allItems <- paste("item", 1:10, sep = "")
          # item names administered to each student (columns 2 to 4)
          items <- as.matrix(dat[2:4])
          # matching responses (columns 5 to 7), in the same column-major order
          scores <- as.matrix(dat[, 5:7])
          retval <- matrix(NA_real_, nrow = nrow(dat), ncol = length(allItems),
                           dimnames = list(character(), allItems))
          # dat$id is recycled across the three item columns; this relies on
          # id being the row number 1..nrow(dat), as it is in the example
          retval[cbind(dat$id, match(items, allItems))] <- scores
          retval
      }
    > identical(f(dat), df)
    [1] TRUE

That was a very nice problem description, letting me reproduce the example data and desired output with just copy and paste.

Bill Dunlap
Spotfire, TIBCO Software
wdunlap tibco.com

> Harold
>
> ______________________________________________
> R-help at r-project.org mailing list
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide
> http://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.
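
To see the two-column-matrix subscripting in isolation, here is a minimal sketch on a toy matrix. The names m, idx and picked and the values 10, 20, 30 are made up for illustration and do not come from the thread.

```r
# Toy illustration of indexing a matrix with a two-column (row, col) matrix.
m <- matrix(NA_real_, nrow = 3, ncol = 4,
            dimnames = list(NULL, paste("item", 1:4, sep = "")))

# Each row of idx names one cell: (row 1, col 2), (row 2, col 4), (row 3, col 1).
idx <- cbind(row = c(1, 2, 3), col = c(2, 4, 1))

# A single assignment fills all three cells; every other cell stays NA.
m[idx] <- c(10, 20, 30)

# Reading with the same index matrix returns the values in the same order.
picked <- m[idx]
```

The function f above follows the same pattern, with match() translating item names into column numbers.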
Harold -
   Here's what I came up with:

    > tapply(as.vector(as.matrix(dat[5:7])),
    +        list(rep(dat$id,3),as.vector(as.matrix(dat[2:4]))),I)
      item1 item10 item2 item3 item4 item5 item7 item9
    1    NA     NA     1    NA    NA     1    NA     0
    2     0     NA    NA    NA    NA     1     1    NA
    3     1     NA     0     1    NA    NA    NA    NA
    4    NA     NA    NA     1     0    NA     0    NA
    5    NA      1    NA     0     1    NA    NA    NA

I thought there would be a way to use xtabs, but I had trouble preserving the NAs. The columns aren't in the right order, and the columns for items nobody was administered (item6 and item8) are missing, but it's pretty close.

Thanks for the easily reproducible example, and the interesting puzzle.

                                        - Phil Spector
                                          Statistical Computing Facility
                                          Department of Statistics
                                          UC Berkeley
                                          spector at stat.berkeley.edu

On Thu, 25 Feb 2010, Doran, Harold wrote:

> Any thoughts?
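
One possible way to get the tapply result into pool order with every item present is to pass the item names as a factor whose levels are the full item pool. The sketch below is only an illustration under that assumption: the data frame long and the result wide are hypothetical names, and the six scores are copied from the first two rows of the output posted above rather than regenerated from the random draw.

```r
item.pool <- paste("item", 1:10, sep = "")

# Hypothetical long-format data: one row per student/item pair. The values
# are the non-NA cells of rows 1 and 2 in the tapply output shown above.
long <- data.frame(
  id    = rep(1:2, times = 3),
  item  = c("item2", "item1", "item5", "item5", "item9", "item7"),
  score = c(1, 0, 1, 1, 0, 1)
)

# With the full pool as factor levels, tapply creates one column per pool
# item, in pool order, and leaves NA where a student never saw the item.
wide <- tapply(long$score,
               list(long$id, factor(long$item, levels = item.pool)),
               I)
```

Applied to the thread's data, the same idea would mean wrapping the as.vector(as.matrix(dat[2:4])) term in factor(..., levels = item.pool).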