In a research project we are using a web-based tool for collecting data from questionnaires. The system generates files that are simple to read as a data frame in the "long" format, which is simple to convert to the "wide" format.

Two things might happen: (a) there are two (or more) references to the same cell, and (b) there are missing values. So, the data set below has two references to S2/T2 and none to the S2/T1 combination:

> d
   values person time
1       1     S1   T1
2       2     S1   T2
3       3     S1   T3
4       4     S1   T4
5      22     S2   T2
6       6     S2   T2
7       7     S2   T3
8       8     S2   T4
9       9     S3   T1
10     10     S3   T2
11     11     S3   T3
12     12     S3   T4
> reshape(d, idvar="person", v.names=c("values"), timevar="time", direction="wide")
  person values.T1 values.T2 values.T3 values.T4
1     S1         1         2         3         4
5     S2        NA        22         7         8
9     S3         9        10        11        12

The missing cell gets an NA as expected. But the surprise is in the case where there are two references to the same cell. The *first* is used (22 rather than 6).

Is there some way of forcing reshape() to use the *last* value?

Tom
On Tue, Jun 17, 2008 at 9:28 AM, Tom Backer Johnsen <backer at psych.uib.no> wrote:
> The missing cell gets an NA as expected. But the surprise is in the case
> where there are two references to the same cell. The *first* is used
> (22 rather than 6).

You might try using the reshape package instead:

    last <- function(x) x[length(x)]
    names(d) <- c("value", "person", "time")
    cast(d, person ~ time, last)

You can find out more at http://had.co.nz/reshape

Hadley

--
http://had.co.nz/
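For comparison, here is a minimal base R sketch of the same idea that stays with reshape(). It assumes that the row order in d reflects the order in which the answers arrived, so that the later row for a person/time pair is the one that should win. The name d_last is only illustrative and does not appear in the original posts.

```r
# Rebuild the example data from the original post
d <- data.frame(
  values = c(1, 2, 3, 4, 22, 6, 7, 8, 9, 10, 11, 12),
  person = rep(c("S1", "S2", "S3"), each = 4),
  time   = c("T1", "T2", "T3", "T4",
             "T2", "T2", "T3", "T4",
             "T1", "T2", "T3", "T4"),
  stringsAsFactors = FALSE
)

# Keep only the last row for each person/time combination.
# Assumption: later rows should override earlier ones.
d_last <- d[!duplicated(d[c("person", "time")], fromLast = TRUE), ]

# Same reshape() call as in the original post, on the de-duplicated data
wide <- reshape(d_last, idvar = "person", v.names = "values",
                timevar = "time", direction = "wide")
print(wide)
```

With the duplicate removed this way, the S2/T2 cell holds 6 rather than 22, and the S2/T1 cell is still NA.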
hadley wickham wrote:
> You might try using the reshape package instead:
>
> last <- function(x) x[length(x)]
> names(d) <- c("value", "person", "time")
> cast(d, person ~ time, last)

The first and the last lines are clear, I think, although I will have to experiment more to understand the call to cast() better. However, what I do not understand is the purpose of the second line. I can print names(d) right after reading the data frame with read.table(). If I print names(d) again right after that statement has been executed, I see no difference. Even so, the statement seems to be necessary for the call to cast() to work.

It seems that "names" is not the same as "names". Something along the lines of with() or attach(), perhaps?

Tom
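One possible explanation, offered as an assumption rather than a confirmed answer: the second line renames the first column from "values" (plural, as in the original data frame) to "value" (singular), a one-letter change that is easy to miss when the names are printed. The reshape package works with data in its "molten" form, where the measured column is called "value", so cast() may depend on that name. The small base R check below only compares the two sets of names; old_names, new_names and changed are illustrative names.

```r
# Column names of d as shown in the original post
old_names <- c("values", "person", "time")

# Column names after: names(d) <- c("value", "person", "time")
new_names <- c("value", "person", "time")

# Flag the positions where the name actually changed
changed <- old_names != new_names
print(data.frame(old = old_names, new = new_names, changed = changed))
```

Only the first column differs, so the statement is a plain rename rather than anything like with() or attach().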