Dear list,

I have a dataset that consists of duplicated sequences within each day for each patient (see the data below), and I want to reshape the data with patient as the time variable. However, the reshape function only takes the first sequence of the replicates and ignores the second. How can I 1) average the duplicates and 2) give the duplicated sequences unique names before reshaping the data?

> data
   patient day  seq           y
1       10   1 acdf -0.52416066
2       10   1 cdsv  0.62551539
3       10   1 dlfg -1.54668047
4       10   1 acdf  0.82404978
5       10   1 cdsv -1.17459914
6       10   2 acdf  0.47238216
7       10   2 cdsv -0.92364896
8       10   2 dlfg  1.19273992
9       10   2 acdf  0.03759663
10      10   2 cdsv  1.05106783
11      12   1 acdf  0.43575105
12      12   1 cdsv  1.01675547
13      12   1 dlfg -1.54601413
14      12   1 acdf  1.03384654
15      12   1 cdsv  0.32197671
16      12   2 acdf  0.37355285
17      12   2 cdsv -0.39780850
18      12   2 dlfg -0.37693499
19      12   2 acdf -1.28989165
20      12   2 cdsv -0.06938098
21      23   1 acdf -0.68486972
22      23   1 cdsv -1.08035660
23      23   1 dlfg  0.93124685
24      23   1 acdf -0.78737514
25      23   1 cdsv -1.56315904
26      23   2 acdf -2.30913270
27      23   2 cdsv -1.64583577
28      23   2 dlfg  1.87435485
29      23   2 acdf -1.99671825
30      23   2 cdsv  0.62995993

> redata <- reshape(data, idvar = c("day", "seq"), timevar = "patient", direction = "wide")

The reshaped data has only three sequences for each day and does not take into account the value of the second replicate.

> redata
  day  seq       y.10       y.12       y.23
1   1 acdf -0.5241607  0.4357510 -0.6848697
2   1 cdsv  0.6255154  1.0167555 -1.0803566
3   1 dlfg -1.5466805 -1.5460141  0.9312469
6   2 acdf  0.4723822  0.3735529 -2.3091327
7   2 cdsv -0.9236490 -0.3978085 -1.6458358
8   2 dlfg  1.1927399 -0.3769350  1.8743548

Another problem I have is that I want to check for duplicates in the dataset and, if there are any, print out the duplicated sequences. I tried the code below, but the output is not very nice. How can I make the output look nicer, or is there a better way to do this?

pat <- subset(data, data$patient == 10 & data$day == 1)
if (any(duplicated(pat$seq, MARGIN = 1)) == FALSE) cat("No duplicates", "\n", sep = "") else {
  cat("duplicates", "\n", sep = "") & print(pat$seq[duplicated(pat$seq)])
}

I got this output:

duplicates
[1] acdf cdsv
Levels: acdf cdsv dlfg
[1] NA NA
Warning message:
& not meaningful for factors in: Ops.factor(cat("duplicates", "\n", sep = ""), print(pat$seq[duplicated(pat$seq)]))

But I would like the output to be something like:

duplicates
[1] acdf cdsv

Thanks a lot for any help.
Have a nice weekend!

Tom
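
The extra lines in that output come from the `&`: it tries to combine the return value of cat() with the factor returned by print(), which produces the NA values and the warning, while printing a factor adds the "Levels:" line. A minimal sketch of one way to get the desired output is below, assuming a small stand-in `pat` built from the patient 10, day 1 rows of the post; the helper name `dups` is made up for illustration.

```r
# Stand-in for the patient 10 / day 1 subset (values copied from the post).
pat <- data.frame(
  patient = 10,
  day     = 1,
  seq     = factor(c("acdf", "cdsv", "dlfg", "acdf", "cdsv")),
  y       = c(-0.52416066, 0.62551539, -1.54668047, 0.82404978, -1.17459914)
)

# Sequences that occur more than once, converted to plain character
# strings so that print() does not append a "Levels:" line.
dups <- unique(as.character(pat$seq[duplicated(pat$seq)]))

# Separate statements inside braces instead of joining them with `&`.
if (length(dups) == 0) {
  cat("No duplicates\n")
} else {
  cat("duplicates\n")
  print(dups, quote = FALSE)
}
```

With the stand-in data this prints "duplicates" followed by "[1] acdf cdsv".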
hadley wickham
2007-Jul-13 18:25 UTC
[R] help with handling replicates before reshaping data
Hi Tom,

> I have a dataset consists of duplicated sequences within day for each patient (see below data) and I want to reshape the data with patient as time variable. However the reshape function only takes the first sequence of the replicates and ignores the second. How can I 1) average the duplicates and 2) give the duplicated sequences unique names before reshaping the data ?
>
> > data
>   patient day seq y
> 1 10 1 acdf -0.52416066
> 2 10 1 cdsv 0.62551539
> 3 10 1 dlfg -1.54668047
> 4 10 1 acdf 0.82404978
> 5 10 1 cdsv -1.17459914
> 6 10 2 acdf 0.47238216

You might find that the functions in the reshape package give you a bit more flexibility.

library(reshape)

# The reshape package expects the data to have
# the value variable named "value"
d2 <- rename(data, c("y" = "value"))

# I think this is the format you want, which will average over the reps
cast(d2, day + seq ~ patient, mean)

Hadley
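
For comparison, both parts of the question can also be sketched in base R without the reshape package. This is only an illustration on a cut-down copy of the posted data (patients 10 and 12, day 1); the column names `rep` and `seq_rep` and the objects `avg`, `wide_avg`, and `wide_rep` are made up here and are not part of the original thread.

```r
# Cut-down copy of the posted data: patients 10 and 12, day 1 only.
data <- data.frame(
  patient = rep(c(10, 12), each = 5),
  day     = 1,
  seq     = rep(c("acdf", "cdsv", "dlfg", "acdf", "cdsv"), times = 2),
  y       = c(-0.52416066, 0.62551539, -1.54668047, 0.82404978, -1.17459914,
               0.43575105, 1.01675547, -1.54601413, 1.03384654, 0.32197671)
)

# 1) Average the replicates first, then reshape to one column per patient.
avg <- aggregate(y ~ patient + day + seq, data = data, FUN = mean)
wide_avg <- reshape(avg, idvar = c("day", "seq"), timevar = "patient",
                    direction = "wide")

# 2) Keep every replicate instead: number the repeats within each
#    patient/day/seq combination ...
data$rep <- ave(seq_along(data$y), data$patient, data$day, data$seq,
                FUN = seq_along)
# ... and build a unique label such as "acdf.1" and "acdf.2".
data$seq_rep <- paste(data$seq, data$rep, sep = ".")
wide_rep <- reshape(data[, c("patient", "day", "seq_rep", "y")],
                    idvar = c("day", "seq_rep"), timevar = "patient",
                    direction = "wide")
```

Here `wide_avg` has one row per day and sequence with the replicate means in `y.10` and `y.12`, while `wide_rep` keeps a separate row for each replicate label.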