thr3ads.net - R help - [R] Restructure some data [Feb 2010]

Doran, Harold

2010-Feb-25 18:34 UTC

[R] Restructure some data

Suppose I have a data frame like "dat" below. For some context, this
is the format that represents students taking a computer adaptive test.
first.item is the first item that the student was administered, score1 is
the student's response to that item, and so forth.

item.pool <- paste("item", 1:10, sep = "")
set.seed(54321)
dat <- data.frame(id = c(1,2,3,4,5),
                first.item = sample(item.pool, 5, replace=TRUE),
                second.item = sample(item.pool, 5, replace=TRUE),
                third.item = sample(item.pool, 5, replace=TRUE),
                score1 = sample(c(0,1), 5, replace=TRUE),
                score2 = sample(c(0,1), 5, replace=TRUE),
                score3 = sample(c(0,1), 5, replace=TRUE))

I need to restructure this into a new format. The new matrix df (after the loop)
is exactly what I want in the end. But, I'm annoyed at myself for not
thinking of a more efficient way to restructure this without using a loop.

df <- matrix(NA, ncol = length(item.pool), nrow = nrow(dat))
colnames(df) <- unique(item.pool)

for(i in 1:5){
    for(j in 2:4){
        rr <- which(dat[i,j] == colnames(df))
        df[i,rr] <- dat[i, (j+3)]
    }
}

Any thoughts?

Harold


William Dunlap

2010-Feb-25 20:59 UTC

You can try subscripting by a 2-column matrix, the first
giving the row index and the second the column index.  E.g.,

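  > f <- function(dat) {
      allItems <- paste("item", 1:10, sep = "")
      items <- as.matrix(dat[, 2:4])    # items administered, one row per student
      scores <- as.matrix(dat[, 5:7])   # responses, in the same order as items
      retval <- matrix(NA_real_, nrow = nrow(dat), ncol = 10,
          dimnames = list(character(), allItems))
      # Each row of the 2-column index matrix is a (row, column) pair;
      # dat$id serves as the row index, which works because id is 1:nrow(dat).
      retval[cbind(dat$id, match(items, allItems))] <- scores
      retval
  }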
  > identical(f(dat), df)
  [1] TRUE
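
The function above uses dat$id as the row index, which is fine here because
the ids are 1 through 5. As a rough sketch of the same two-column-matrix
subscripting for ids that are not consecutive row numbers, one could take the
row index from row(items) instead. The names toy and restructure below are
made up for illustration, and the toy data is typed in by hand rather than
sampled:

```r
# Hypothetical toy data in the same layout as dat:
# item names in columns 2:4, scores in columns 5:7. The ids are NOT 1..n.
item.pool <- paste("item", 1:10, sep = "")
toy <- data.frame(id = c(101, 205, 309),
                  first.item  = c("item2", "item5", "item1"),
                  second.item = c("item9", "item1", "item3"),
                  third.item  = c("item5", "item7", "item2"),
                  score1 = c(1, 1, 1),
                  score2 = c(0, 0, 1),
                  score3 = c(1, 1, 0),
                  stringsAsFactors = FALSE)

# restructure() is an illustrative helper, not part of any package.
restructure <- function(dat, pool) {
  items  <- as.matrix(dat[, 2:4])   # character matrix of item names
  scores <- as.matrix(dat[, 5:7])   # numeric matrix of responses
  out <- matrix(NA_real_, nrow = nrow(dat), ncol = length(pool),
                dimnames = list(NULL, pool))
  # One (row, column) pair per response: row(items) gives the row number of
  # every cell, match() gives the column of each item name in the pool.
  out[cbind(as.vector(row(items)), match(items, pool))] <- scores
  out
}

wide <- restructure(toy, item.pool)
```

Both index vectors and the scores are laid out in column-major order, so the
pairs line up with the responses they belong to.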

That was a very nice problem description, letting me
reproduce the example data and desired output with
just copy and paste.

Bill Dunlap
Spotfire, TIBCO Software
wdunlap tibco.com 

Phil Spector

2010-Feb-25 22:38 UTC

Harold -
    Here's what I came up with:
> tapply(as.vector(as.matrix(dat[5:7])),
+        list(rep(dat$id,3),as.vector(as.matrix(dat[2:4]))),I)
   item1 item10 item2 item3 item4 item5 item7 item9
1    NA     NA     1    NA    NA     1    NA     0
2     0     NA    NA    NA    NA     1     1    NA
3     1     NA     0     1    NA    NA    NA    NA
4    NA     NA    NA     1     0    NA     0    NA
5    NA      1    NA     0     1    NA    NA    NA

I thought there would be a way to use xtabs, but I had
trouble preserving the NAs.

The columns aren't in the right order, and the item6 and item8 columns
are missing (no student was given those items), but it's pretty close.
Thanks for the easily reproducible example, and the interesting
puzzle.
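
One possible way to address both the column order and the missing columns is
to turn the item names into a factor whose levels are the full item pool,
since tapply builds its result from the factor levels. This is only a sketch
on hand-typed toy data (toy and wide2 are made-up names), and it assumes no
student sees the same item twice, because I would then return more than one
value per cell:

```r
item.pool <- paste("item", 1:10, sep = "")
# Hypothetical toy data in the same layout as dat.
toy <- data.frame(id = c(101, 205, 309),
                  first.item  = c("item2", "item5", "item1"),
                  second.item = c("item9", "item1", "item3"),
                  third.item  = c("item5", "item7", "item2"),
                  score1 = c(1, 1, 1),
                  score2 = c(0, 0, 1),
                  score3 = c(1, 1, 0),
                  stringsAsFactors = FALSE)

# Stack items and scores column by column so they stay aligned.
items  <- factor(as.vector(as.matrix(toy[2:4])), levels = item.pool)
scores <- as.vector(as.matrix(toy[5:7]))

# Levels that never occur (e.g. item6) still get a column, filled with NA,
# and the columns follow the order of item.pool.
wide2 <- tapply(scores, list(rep(toy$id, 3), items), I)
```

The row names of wide2 are the ids, so unlike matrix subscripting this does
not depend on the ids being row numbers.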

 					- Phil Spector
 					 Statistical Computing Facility
 					 Department of Statistics
 					 UC Berkeley
 					 spector at stat.berkeley.edu


