Dear list,
I have a dataset that consists of duplicated sequences within each day for each
patient (see the data below), and I want to reshape it to wide format with patient
as the time variable. However, the reshape function keeps only the first replicate
of each sequence and ignores the second. How can I 1) average the duplicates and
2) give the duplicated sequences unique names before reshaping the data?
> data
   patient day  seq           y
 1      10   1 acdf -0.52416066
 2      10   1 cdsv  0.62551539
 3      10   1 dlfg -1.54668047
 4      10   1 acdf  0.82404978
 5      10   1 cdsv -1.17459914
 6      10   2 acdf  0.47238216
 7      10   2 cdsv -0.92364896
 8      10   2 dlfg  1.19273992
 9      10   2 acdf  0.03759663
10      10   2 cdsv  1.05106783
11      12   1 acdf  0.43575105
12      12   1 cdsv  1.01675547
13      12   1 dlfg -1.54601413
14      12   1 acdf  1.03384654
15      12   1 cdsv  0.32197671
16      12   2 acdf  0.37355285
17      12   2 cdsv -0.39780850
18      12   2 dlfg -0.37693499
19      12   2 acdf -1.28989165
20      12   2 cdsv -0.06938098
21      23   1 acdf -0.68486972
22      23   1 cdsv -1.08035660
23      23   1 dlfg  0.93124685
24      23   1 acdf -0.78737514
25      23   1 cdsv -1.56315904
26      23   2 acdf -2.30913270
27      23   2 cdsv -1.64583577
28      23   2 dlfg  1.87435485
29      23   2 acdf -1.99671825
30      23   2 cdsv  0.62995993
> redata <- reshape(data, idvar = c("day", "seq"), timevar = "patient", direction = "wide")
The reshaped data have only three sequences for each day and do not take
into account the values of the second replicates.
> redata
  day  seq       y.10       y.12       y.23
1   1 acdf -0.5241607  0.4357510 -0.6848697
2   1 cdsv  0.6255154  1.0167555 -1.0803566
3   1 dlfg -1.5466805 -1.5460141  0.9312469
6   2 acdf  0.4723822  0.3735529 -2.3091327
7   2 cdsv -0.9236490 -0.3978085 -1.6458358
8   2 dlfg  1.1927399 -0.3769350  1.8743548
Another problem is that I want to check for duplicates in the dataset and, if
there are any, print the duplicated sequences. I tried the code below, but the
output is not very clean. How can I make the output look nicer, or is there a
better way to do this?
pat <- subset(data, data$patient == 10 & data$day == 1)
if (any(duplicated(pat$seq, MARGIN = 1)) == FALSE)
cat("No duplicates", "\n", sep = "") else {cat("duplicates", "\n", sep = "") &
print(pat$seq[duplicated(pat$seq)]) }
I got this output:
duplicates
[1] acdf cdsv
Levels: acdf cdsv dlfg
[1] NA NA
Warning message:
& not meaningful for factors in: Ops.factor(cat("duplicates",
"\n", sep = ""), print(pat$seq[duplicated(pat$seq)]))
But would like the output to be something like:
duplicates
[1] acdf cdsv
Thanks a lot for any help,
Have a nice weekend!
Tom
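
One possible fix for the duplicate check above, offered as a sketch rather than a definitive answer. The warning in the posted output comes from joining `cat()` and `print()` with `&`, which tries to combine their return values; putting the two calls on separate lines inside the braces avoids that. Converting the factor to character removes the "Levels:" line. The small `pat` data frame below is a stand-in for the patient 10, day 1 rows, and the name `dups` is only illustrative.

```r
# Stand-in for subset(data, patient == 10 & day == 1) from the posted data
pat <- data.frame(
  seq = factor(c("acdf", "cdsv", "dlfg", "acdf", "cdsv")),
  y   = c(-0.52416066, 0.62551539, -1.54668047, 0.82404978, -1.17459914)
)

# Sequences that occur more than once, as plain strings (no factor levels)
dups <- unique(as.character(pat$seq[duplicated(pat$seq)]))

if (length(dups) == 0) {
  cat("No duplicates\n")
} else {
  cat("duplicates\n")
  # quote = FALSE prints the strings without surrounding quotes
  print(dups, quote = FALSE)
}
```

With the stand-in rows, `dups` holds "acdf" and "cdsv", the two sequences that appear twice.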
hadley wickham
2007-Jul-13 18:25 UTC
[R] help with handling replicates before reshaping data
Hi Tom,

> I have a dataset that consists of duplicated sequences within each day for each
> patient (see the data below), and I want to reshape it to wide format with patient
> as the time variable. However, the reshape function keeps only the first replicate
> of each sequence and ignores the second. How can I 1) average the duplicates and
> 2) give the duplicated sequences unique names before reshaping the data?
>
> > data
>   patient day  seq           y
> 1      10   1 acdf -0.52416066
> 2      10   1 cdsv  0.62551539
> 3      10   1 dlfg -1.54668047
> 4      10   1 acdf  0.82404978
> 5      10   1 cdsv -1.17459914
> 6      10   2 acdf  0.47238216

You might find that the functions in the reshape package give you a bit more
flexibility.

library(reshape)

# The reshape package expects the data to have
# the value variable named "value"
d2 <- rename(data, c("y" = "value"))

# I think this is the format you want, which will average over the reps
cast(d2, day + seq ~ patient, mean)

Hadley
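
For readers who prefer to stay in base R, the two options from the original question can be sketched as follows. This is an illustration on a toy subset of the posted data (patients 10 and 12, day 1 only), not the method proposed in the thread. The column names `repl` and `seq_id` and the object names `avg`, `wide_avg` and `wide_rep` are assumptions introduced here.

```r
# Toy subset of the posted data: patients 10 and 12, day 1 only
data <- data.frame(
  patient = rep(c(10, 12), each = 5),
  day     = 1,
  seq     = rep(c("acdf", "cdsv", "dlfg", "acdf", "cdsv"), times = 2),
  y       = c(-0.52416066, 0.62551539, -1.54668047, 0.82404978, -1.17459914,
               0.43575105, 1.01675547, -1.54601413, 1.03384654, 0.32197671)
)

# 1) Average the replicates first, then reshape to wide format
avg <- aggregate(y ~ patient + day + seq, data = data, FUN = mean)
wide_avg <- reshape(avg, idvar = c("day", "seq"), timevar = "patient",
                    direction = "wide")

# 2) Number the replicates within patient/day/seq and build unique names
data$repl <- ave(seq_along(data$y), data$patient, data$day, data$seq,
                 FUN = seq_along)
data$seq_id <- paste(data$seq, data$repl, sep = ".")
wide_rep <- reshape(data[, c("patient", "day", "seq_id", "y")],
                    idvar = c("day", "seq_id"), timevar = "patient",
                    direction = "wide")
```

Here `wide_avg` has one row per day and sequence with the replicate means, while `wide_rep` keeps every replicate as its own row under names such as acdf.1 and acdf.2. Sequences without a second replicate (dlfg here) only get a ".1" row.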