thr3ads.net - R help - [R] help with handling replicates before reshaping data [Jul 2007]


Tom Cohen

2007-Jul-13 18:04 UTC

[R] help with handling replicates before reshaping data

Dear list,
  I have a dataset that consists of duplicated sequences within each day for each
patient (see the data below), and I want to reshape the data with patient as the
time variable. However, the reshape function only takes the first of the
replicates and ignores the second. How can I 1) average the duplicates and 2)
give the duplicated sequences unique names before reshaping the data?
   
  > data
     patient day  seq           y
  1       10   1 acdf -0.52416066
  2       10   1 cdsv  0.62551539
  3       10   1 dlfg -1.54668047
  4       10   1 acdf  0.82404978
  5       10   1 cdsv -1.17459914
  6       10   2 acdf  0.47238216
  7       10   2 cdsv -0.92364896
  8       10   2 dlfg  1.19273992
  9       10   2 acdf  0.03759663
  10      10   2 cdsv  1.05106783
  11      12   1 acdf  0.43575105
  12      12   1 cdsv  1.01675547
  13      12   1 dlfg -1.54601413
  14      12   1 acdf  1.03384654
  15      12   1 cdsv  0.32197671
  16      12   2 acdf  0.37355285
  17      12   2 cdsv -0.39780850
  18      12   2 dlfg -0.37693499
  19      12   2 acdf -1.28989165
  20      12   2 cdsv -0.06938098
  21      23   1 acdf -0.68486972
  22      23   1 cdsv -1.08035660
  23      23   1 dlfg  0.93124685
  24      23   1 acdf -0.78737514
  25      23   1 cdsv -1.56315904
  26      23   2 acdf -2.30913270
  27      23   2 cdsv -1.64583577
  28      23   2 dlfg  1.87435485
  29      23   2 acdf -1.99671825
  30      23   2 cdsv  0.62995993
   
   
  > redata<-reshape(data,idvar=c("day","seq"),timevar="patient",direction="wide")
   
   
   
  The reshaped data has only three sequences for each day and does not take
into account the value of the second replicate.
   
  > redata
    day  seq       y.10       y.12       y.23
  1   1 acdf -0.5241607  0.4357510 -0.6848697
  2   1 cdsv  0.6255154  1.0167555 -1.0803566
  3   1 dlfg -1.5466805 -1.5460141  0.9312469
  6   2 acdf  0.4723822  0.3735529 -2.3091327
  7   2 cdsv -0.9236490 -0.3978085 -1.6458358
  8   2 dlfg  1.1927399 -0.3769350  1.8743548
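
One possible way to give the replicates unique names before reshaping, sketched in base R. The helper columns `rep` and `seq_rep` and the "acdf.1" / "acdf.2" naming scheme are assumptions made for illustration, not something from the original post. Only the patient 10, day 1 rows of the posted data are used here.

```r
# Patient 10, day 1 rows copied from the data above.
data <- data.frame(
  patient = 10,
  day     = 1,
  seq     = c("acdf", "cdsv", "dlfg", "acdf", "cdsv"),
  y       = c(-0.52416066, 0.62551539, -1.54668047, 0.82404978, -1.17459914),
  stringsAsFactors = FALSE
)

# Number the replicates within each patient/day/seq group:
# 1 for the first occurrence, 2 for the second, and so on.
data$rep <- ave(seq_along(data$y), data$patient, data$day, data$seq,
                FUN = seq_along)

# Hypothetical naming scheme: sequence name plus replicate number.
data$seq_rep <- paste(data$seq, data$rep, sep = ".")

# seq_rep is now unique within each patient and day,
# so reshape() keeps every replicate as its own row.
redata <- reshape(data[, c("patient", "day", "seq_rep", "y")],
                  idvar = c("day", "seq_rep"), timevar = "patient",
                  direction = "wide")
```

If patients have different numbers of replicates for a sequence, the wide table will contain NA where a replicate is missing.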
   
  Another problem I have is that I want to check for duplicates in the dataset
and, if there are any, print out the duplicated sequences. I tried the code
below, but the output is messy. How can I make the output look nicer, or is
there a better way to do this?
   
  pat<-subset(data, data$patient==10 & data$day==1)
  if(any(duplicated(pat$seq,MARGIN=1)) ==FALSE) 
  cat("No duplicates","\n", sep="")  else {cat ("duplicates" ,"\n",sep="") &
print(pat$seq[duplicated(pat$seq)]) }
   
  I got this output:
   
  duplicates
  [1] acdf cdsv
  Levels: acdf cdsv dlfg
  [1] NA NA
  Warning message:
  & not meaningful for factors in: Ops.factor(cat("duplicates",
"\n", sep = ""), print(pat$seq[duplicated(pat$seq)]))
   
  But would like the output to be something like:
   
  duplicates
  [1] acdf cdsv
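
A minimal sketch of one way to get that output. The warning above arises because `&` is a logical operator that is applied to the values returned by cat() and print(); writing the two calls as separate statements inside the braces avoids it. The variable `dups` is a name introduced here for illustration, and `pat` is rebuilt by hand as a stand-in for the subset in the post.

```r
# Stand-in for the subset pat (patient 10, day 1); seq is a factor as in the post.
pat <- data.frame(seq = factor(c("acdf", "cdsv", "dlfg", "acdf", "cdsv")))

# Sequences that occur more than once, converted to plain character strings
# so that no "Levels:" line is printed.
dups <- unique(as.character(pat$seq[duplicated(pat$seq)]))

if (length(dups) == 0) {
  cat("No duplicates\n")
} else {
  cat("duplicates\n")
  # quote = FALSE prints the names without quotation marks.
  print(dups, quote = FALSE)
}
```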
   
  Thanks a lot for any help,
  Have a nice weekend !
  Tom

 	      

hadley wickham

2007-Jul-13 18:25 UTC


[R] help with handling replicates before reshaping data

Hi Tom,
>   I have a dataset that consists of duplicated sequences within each day for
> each patient (see the data below), and I want to reshape the data with patient
> as the time variable. However, the reshape function only takes the first of
> the replicates and ignores the second. How can I 1) average the duplicates and
> 2) give the duplicated sequences unique names before reshaping the data?
>
>   > data
>      patient day  seq           y
>   1       10   1 acdf -0.52416066
>   2       10   1 cdsv  0.62551539
>   3       10   1 dlfg -1.54668047
>   4       10   1 acdf  0.82404978
>   5       10   1 cdsv -1.17459914
>   6       10   2 acdf  0.47238216
You might find that the functions in the reshape package give you a bit
more flexibility.

library(reshape)

# The reshape package expects the data to have
# the value variable named "value"
d2 <- rename(data, c("y" = "value"))

# I think this is the format you want, which will average over the reps
cast(d2, day + seq ~ patient, mean)


Hadley
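
For comparison, the averaging step can also be sketched in base R without the reshape package, by aggregating first and then calling reshape(). This is an alternative to the cast() call above, not the method from the reply. The names `avg` and `wide` are introduced here for illustration, and only the day 1 rows for patients 10 and 12 from the posted data are used.

```r
# Day 1 rows for patients 10 and 12, copied from the posted data.
data <- data.frame(
  patient = rep(c(10, 12), each = 5),
  day     = 1,
  seq     = rep(c("acdf", "cdsv", "dlfg", "acdf", "cdsv"), times = 2),
  y       = c(-0.52416066, 0.62551539, -1.54668047, 0.82404978, -1.17459914,
               0.43575105, 1.01675547, -1.54601413, 1.03384654,  0.32197671)
)

# Average the replicates: one row per patient/day/seq combination.
avg <- aggregate(y ~ patient + day + seq, data = data, FUN = mean)

# With the replicates collapsed, each day/seq pair appears once per patient,
# so the wide reshape no longer drops any values.
wide <- reshape(avg, idvar = c("day", "seq"), timevar = "patient",
                direction = "wide")
```

The result has one row per day and sequence, with columns y.10 and y.12 holding the replicate means.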
