ALAN SMITH
2007-Mar-07 23:25 UTC
[R] how to "apply" functions to unbalanced data in long format by factors......cant get "by" or "aggregate" to work
Hello R users,
Problem.......I do not understand how to use
"aggregate","by", or the
appropriate "apply" to perform a function on data with more than one
factor on unbalanced data...
I have a data frame in the long format that does not contain balanced
data. The ID is a unique identifier corresponding to the experimental
unit that will later be examined by ANOVA, T-tests etc. Y is the data
generated from the experiment. The factors represent the differences
between each sample or "run" measured.
str(mydata) ### sample of table at bottom of email ###
'data.frame': 129982 obs. of 6 variables:
$ ID : num 7 7 7 7 7 7 8 8 8 8 ...
$ time : Factor w/ 2 levels "120hr","24hr": 1 1 1 1 2 2
2 1 1 1 ...
$ treatment: Factor w/ 2 levels "control","trt": 1 1 1 2 2
1 1 2 1 1 ...
$ expREP : Factor w/ 3 levels "expREP1","expREP2",..: 1 1
1 3 1 1 1 1 2 2 ...
$ techREP : Factor w/ 3 levels "techREP1","techREP2",..: 3
2 1 1 1 3
1 3 3 2 ...
$ Y : num 14.4 14.1 14.2 13.8 14.1 ...
Could someone please help with doing something like the following
1. I would like to find the median for each unique combination of
factors using the data in the long format (like finding the median
of a single column of data).
2. Create a new column where the median is repeated for the number of
rows of the unique factor combination
3. I would like to learn the most efficient way to do this because I
want to avoid recreating the table from scratch with many commands
like the series below. I will have to perform this operation on many
different data sets some, with many more factors then this example.
### help me learn to use an apply or other command that will do the
following #####
m0<-mydata$cpdID[mydata$time=="24hr" &
mydata$treatment=="control" &
mydata$expREP=="expREP1" & mydata$techREP=="techREP1"]
m1<-mydata$Y[mydata$time=="24hr" &
mydata$treatment=="control" &
mydata$expREP=="expREP1" & mydata$techREP=="techREP1"]
m2<-median(m1)
m3<-cbind(ID=m0,time=rep("24hr",length(m1)),
treatment=rep("control",length(m1)),
expREP=rep("expREP1",length(m1)),
techREP=rep("techREP1",length(m1)),Y=m1,Y50=rep(m2,length(m1)))
######### I would like to avoid writing the above hundreds of times ######
I am able to reshape into wide format and then find the column
medians. However restacking the data and regenerating the factors
becomes very very messy on data sets with 150 columns. I am able to
preform this analysis is SAS easily using BY, but I would like to know
how to do it in R.
I have tried these commands in a number of different variations with
no luck and similar error messages
test1<-aggregate(mydata[,-1],
list(mydata$time,mydata$treatment,mydata$expREP,mydata$techREP)
,median, na.rm=T)
Error in median.default(X[[1]], ...) : need numeric data ### Y in numeric####
test1<-by(mydata[,-1],
list(mydata$time,mydata$treatment,mydata$expREP,mydata$techREP)
,median, na.rm=T)
Error in median.default(data[x, ], ...) : need numeric data
Thanks
Alan
winXP R 2.4.1
#####Example data frame######
mydata<-as.data.frame(structure(list(cpdID = c(7, 7, 7, 7, 7, 7, 8, 8,
8, 8, 8, 8,
8, 8, 8, 8, 8, 8, 8, 10, 10, 10, 10, 10, 10, 10, 10, 10, 10,
10, 10, 10, 10, 10, 10, 10, 10, 10, 10, 10, 10, 10, 10, 10, 10,
10, 10, 10, 10, 10, 10, 19, 19, 19, 19, 19, 19, 23, 23, 23, 23,
23, 23, 23, 23, 33, 33, 33, 33, 33, 33, 33, 33, 33, 33, 33, 33,
33, 33, 33, 33, 33, 33, 33, 33, 33, 33, 33, 33, 33, 33, 33, 33,
33, 33, 33, 33, 33, 40, 40, 40, 40, 40, 40, 40, 40, 40, 40, 40,
40, 40, 40, 40, 40, 40, 40, 40, 40, 40, 40, 40, 40, 40, 40, 40,
40, 40, 40, 40, 40, 40, 40, 40, 40, 40, 40, 42, 42, 42, 42, 42,
42, 42, 42, 42, 42, 42, 42, 42, 42, 42, 42, 42, 42, 42, 42, 42,
42, 42, 42, 42, 42, 42, 42, 42, 42, 42, 42, 42, 42, 42, 42, 42,
42, 42, 42, 42, 42, 42, 47, 47, 47, 47, 47, 47, 47, 47, 47, 47,
47, 47, 47, 47, 47, 47, 47, 47, 47, 47, 47), time = structure(as.integer(c(1,
1, 1, 1, 2, 2, 2, 1, 1, 1, 1, 1, 2, 2, 2, 2, 2, 2, 2, 1, 1, 2,
1, 2, 2, 1, 1, 2, 2, 1, 2, 2, 2, 1, 2, 1, 2, 2, 2, 2, 1, 2, 2,
1, 2, 1, 2, 2, 1, 1, 1, 2, 2, 2, 2, 2, 2, 1, 2, 1, 2, 2, 2, 2,
2, 1, 1, 1, 1, 1, 2, 2, 2, 2, 2, 1, 2, 1, 1, 2, 1, 2, 2, 2, 2,
2, 2, 2, 2, 2, 1, 1, 2, 2, 1, 1, 1, 1, 2, 1, 1, 1, 1, 1, 1, 2,
2, 2, 2, 1, 2, 1, 2, 2, 1, 2, 1, 2, 2, 1, 1, 2, 2, 2, 2, 2, 2,
2, 1, 1, 1, 2, 2, 2, 2, 2, 2, 1, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2,
1, 2, 2, 2, 2, 2, 2, 2, 2, 1, 2, 2, 2, 2, 2, 1, 2, 2, 2, 2, 1,
1, 1, 1, 1, 2, 1, 1, 1, 1, 1, 1, 1, 1, 2, 2, 1, 2, 2, 2, 2, 2,
2, 2, 1, 2, 2, 1, 2, 2, 1, 2)), .Label = c("120hr", "24hr"),
class = "factor"),
treatment = structure(as.integer(c(1, 1, 1, 2, 2, 1, 1, 2,
1, 1, 2, 2, 1, 2, 2, 1, 2, 2, 2, 1, 2, 1, 2, 2, 1, 2, 1,
2, 2, 2, 2, 2, 1, 1, 1, 1, 2, 2, 2, 1, 2, 1, 1, 2, 2, 2,
1, 1, 1, 1, 1, 2, 2, 1, 1, 2, 1, 2, 1, 1, 1, 2, 2, 1, 1,
1, 1, 2, 2, 2, 2, 1, 1, 2, 2, 2, 2, 2, 1, 1, 1, 2, 1, 2,
2, 1, 1, 2, 1, 1, 2, 2, 1, 2, 2, 1, 1, 1, 2, 1, 2, 2, 2,
1, 1, 1, 1, 2, 2, 1, 2, 1, 2, 1, 2, 1, 2, 1, 1, 2, 2, 2,
2, 2, 2, 1, 1, 1, 1, 1, 2, 2, 1, 1, 2, 1, 2, 2, 2, 2, 1,
1, 2, 2, 1, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 1, 2, 2,
2, 2, 1, 2, 1, 1, 1, 1, 2, 2, 2, 2, 2, 1, 2, 1, 1, 1, 1,
2, 2, 1, 1, 1, 2, 2, 2, 2, 2, 2, 1, 2, 2, 2, 2, 1, 1, 1,
2, 1)), .Label = c("control", "trt"), class =
"factor"),
expREP = structure(as.integer(c(1, 1, 1, 3, 1, 1, 1, 1, 2,
2, 1, 1, 2, 2, 1, 3, 1, 3, 3, 3, 1, 2, 1, 2, 2, 2, 2, 3,
3, 2, 2, 1, 2, 3, 3, 1, 1, 2, 3, 1, 3, 3, 3, 3, 1, 3, 1,
1, 2, 1, 1, 2, 3, 2, 2, 1, 3, 2, 2, 2, 3, 2, 1, 2, 2, 2,
2, 1, 1, 1, 3, 2, 2, 3, 3, 2, 2, 2, 3, 2, 3, 2, 3, 1, 2,
3, 3, 1, 1, 1, 3, 3, 1, 1, 3, 1, 1, 1, 1, 1, 3, 3, 3, 1,
1, 1, 2, 3, 2, 2, 3, 2, 2, 2, 1, 1, 1, 3, 3, 2, 2, 2, 1,
3, 1, 2, 3, 1, 3, 3, 1, 2, 3, 1, 2, 1, 3, 1, 3, 3, 2, 2,
2, 1, 2, 2, 2, 1, 2, 2, 2, 2, 1, 2, 2, 2, 1, 3, 1, 1, 1,
1, 3, 3, 1, 1, 1, 1, 3, 3, 3, 3, 3, 1, 3, 1, 1, 1, 1, 3,
3, 1, 1, 1, 3, 2, 1, 1, 2, 1, 3, 2, 1, 2, 1, 3, 1, 1, 2,
3)), .Label = c("expREP1", "expREP2",
"expREP3"), class = "factor"),
techREP = structure(as.integer(c(3, 2, 1, 1, 1, 3, 1, 3,
3, 2, 2, 1, 1, 3, 2, 3, 3, 1, 2, 1, 2, 1, 3, 1, 3, 2, 2,
3, 1, 1, 3, 3, 2, 3, 3, 3, 2, 2, 2, 2, 1, 1, 2, 3, 1, 2,
3, 1, 3, 2, 1, 1, 2, 2, 3, 3, 3, 2, 1, 2, 1, 2, 3, 2, 3,
2, 3, 1, 3, 2, 2, 1, 2, 1, 3, 2, 2, 1, 1, 3, 3, 3, 1, 3,
1, 3, 2, 2, 2, 1, 2, 1, 3, 1, 3, 2, 1, 3, 1, 3, 2, 3, 1,
1, 2, 3, 1, 1, 3, 3, 2, 2, 1, 2, 3, 2, 2, 3, 2, 2, 1, 2,
2, 3, 3, 3, 1, 1, 1, 3, 1, 1, 2, 3, 2, 3, 3, 1, 1, 3, 2,
3, 3, 3, 3, 1, 3, 2, 1, 3, 3, 1, 3, 2, 1, 2, 2, 1, 2, 1,
2, 1, 1, 1, 2, 2, 3, 3, 1, 2, 3, 2, 3, 3, 3, 1, 2, 2, 1,
3, 2, 3, 3, 2, 2, 2, 3, 2, 1, 3, 1, 3, 1, 3, 1, 1, 1, 2,
2, 3)), .Label = c("techREP1", "techREP2",
"techREP3"), class = "factor"),
log2Abun = c(14.4233129144089, 14.052822741429, 14.2281422686467,
13.8492096005693, 14.076481601207, 14.2139395740777, 14.3399195756207,
14.3625602954496, 14.0141948668145, 14.0980320829605, 14.3152203363759,
14.4528846974866, 13.9591869268449, 14.4064043323413, 14.0403753485321,
14.2285932517829, 14.1259784261721, 13.5925738310379, 13.5830827675029,
13.0280787227049, 15.0198078807043, 12.8423503434138, 12.645883554519,
13.4644181177386, 12.8399910705399, 12.7879025593914, 12.4978518369511,
14.3949985145017, 12.8670856466168, 12.9749522735341, 13.3456824481868,
13.4557125040673, 12.8989792046225, 16.0609491915918, 13.6795900568273,
16.456466720182, 13.6145948287653, 13.2604785448039, 14.8573006848798,
13.1382718001722, 13.690761908446, 14.0557060971613, 13.7495552174335,
13.6336764098923, 13.7844303674846, 15.9518993688317, 13.2452555803066,
13.1930632791304, 12.1919845133603, 13.8710388986595, 13.6375305515253,
12.5919897676151, 17.4797250127015, 17.4014712120155, 17.5948202702163,
12.6031626795344, 17.8287811089804, 11.3613955331659, 15.8064741020529,
15.1007855146758, 16.0553036215393, 15.7553570530353, 15.9747058600332,
15.776715745005, 15.8588066550904, 16.2935434944118, 16.271207673964,
16.3660489506706, 16.3273070282017, 15.7632383068689, 14.6030467398838,
14.7118820283521, 14.7577545959238, 14.7315311764619, 14.8250084466403,
15.6652803936783, 15.8249587405285, 15.6558660906456, 15.5387042614836,
14.8487696278309, 15.5477380355109, 15.9451465974129, 16.196755792715,
15.9999119421954, 15.8660714836595, 15.9406577104549, 15.8754613979164,
16.0358944927638, 16.1785092456522, 16.1992122284106, 15.8087128474547,
15.9373968104322, 16.1432636222427, 16.2412011305004, 15.9488234774507,
15.7820255767261, 15.7730361533934, 15.7459893802453, 20.7777738189812,
21.7489122647969, 21.0374490930058, 20.9765158780184, 21.0464959041766,
21.6790715518273, 21.8021013715842, 20.7652083875471, 20.6663696521617,
20.3963413756589, 20.7983642126234, 20.1864915044977, 20.4422216681915,
20.59064186918, 20.6964531077756, 20.6822196619653, 20.4532414913665,
20.8126113450884, 20.4397608946311, 21.4603719009067, 21.5318145314919,
21.0400816517662, 21.0466431076593, 20.7459819969019, 20.6723053403015,
20.4793421418014, 20.6432035537608, 20.6831942471622, 21.6913537667357,
20.6562913013787, 21.0940693071186, 20.9473294479256, 20.5087271424267,
16.0871520250047, 16.3816612332698, 16.998645516939, 15.7912392142223,
14.5058735666446, 13.6035104425928, 14.4369066987207, 14.6998435295626,
14.6818972267862, 14.1086877961546, 14.3539049235617, 15.40862828087,
15.0657947671893, 14.8615716011254, 14.5538692431961, 14.2397476835569,
13.4381420777437, 13.4499224158638, 13.6887966810545, 14.6550275257018,
13.500966330283, 14.9271297886953, 14.7405186421119, 15.0047910398043,
14.7051463678038, 14.8325933769599, 12.9854861991046, 13.4203550220891,
15.399010832952, 15.4064707685293, 15.0953970227926, 15.0712109416537,
15.7587957644032, 15.0013202225009, 15.7608498673217, 14.7604080920677,
14.2478533598602, 14.4140245098782, 14.7936541075062, 14.7684428120549,
14.595607155062, 16.1507389488284, 16.4915712924337, 14.490161446684,
14.721633263063, 14.4341721012904, 15.8747652729112, 14.543333961671,
14.8633635585377, 14.6696601802386, 13.3020676725265, 14.0190694293311,
15.2168973938334, 12.6304946615056, 12.1972166931101, 12.7960396088298,
14.4285564621952, 14.5308330346953, 14.1496677436943, 14.0823985634278,
12.8407779235951, 14.6543003749437, 14.3202364452416, 15.1723493709662,
14.0744760007345, 14.8132801684508, 12.9183042336999, 14.5225202325766,
13.742309436084)), .Names = c("cpdID", "time",
"treatment",
"expREP", "techREP", "Y")))
jim holtman
2007-Mar-08 00:16 UTC
[R] how to "apply" functions to unbalanced data in long format by factors......cant get "by" or "aggregate" to work
Here is one way of doing it:> # create the rows for each unique combination > x.split <- split(seq(nrow(mydata)), list(mydata$time, mydata$treatment,+ mydata$expREP, mydata$techREP), drop=TRUE)> # now go through the list of indices and add the median > mydata$Y50 <- 0 # add the dummy median column > for (i in x.split){+ mydata$Y50[i] <- median(mydata$Y[i]) # median for each group + }> head(mydata,20)cpdID time treatment expREP techREP Y Y50 1 7 120hr control expREP1 techREP3 14.42331 15.74599 2 7 120hr control expREP1 techREP2 14.05282 15.10810 3 7 120hr control expREP1 techREP1 14.22814 14.63248 4 7 120hr trt expREP3 techREP1 13.84921 15.08641 5 7 24hr trt expREP1 techREP1 14.07648 15.17235 6 7 24hr control expREP1 techREP3 14.21394 14.63314 7 8 24hr control expREP1 techREP1 14.33992 14.81328 8 8 120hr trt expREP1 techREP3 14.36256 15.34493 9 8 120hr control expREP2 techREP3 14.01419 15.14270 10 8 120hr control expREP2 techREP2 14.09803 15.10079 11 8 120hr trt expREP1 techREP2 14.31522 15.39152 12 8 120hr trt expREP1 techREP1 14.45288 14.65430 13 8 24hr control expREP2 techREP1 13.95919 14.71188 14 8 24hr trt expREP2 techREP3 14.40640 14.36332 15 8 24hr trt expREP1 techREP2 14.04038 14.42856 16 8 24hr control expREP3 techREP3 14.22859 15.08463 17 8 24hr trt expREP1 techREP3 14.12598 14.53840 18 8 24hr trt expREP3 techREP1 13.59257 14.69984 19 8 24hr trt expREP3 techREP2 13.58308 14.85730 20 10 120hr control expREP3 techREP1 13.02808 14.07448>On 3/7/07, ALAN SMITH <alansmith2@gmail.com> wrote:> > Hello R users, > > Problem.......I do not understand how to use "aggregate","by", or the > appropriate "apply" to perform a function on data with more than one > factor on unbalanced data... > I have a data frame in the long format that does not contain balanced > data. The ID is a unique identifier corresponding to the experimental > unit that will later be examined by ANOVA, T-tests etc. Y is the data > generated from the experiment. The factors represent the differences > between each sample or "run" measured. > > str(mydata) ### sample of table at bottom of email ### > 'data.frame': 129982 obs. of 6 variables: > $ ID : num 7 7 7 7 7 7 8 8 8 8 ... > $ time : Factor w/ 2 levels "120hr","24hr": 1 1 1 1 2 2 2 1 1 1 ... > $ treatment: Factor w/ 2 levels "control","trt": 1 1 1 2 2 1 1 2 1 1 ... > $ expREP : Factor w/ 3 levels "expREP1","expREP2",..: 1 1 1 3 1 1 1 1 2 > 2 ... > $ techREP : Factor w/ 3 levels "techREP1","techREP2",..: 3 2 1 1 1 3 > 1 3 3 2 ... > $ Y : num 14.4 14.1 14.2 13.8 14.1 ... > > Could someone please help with doing something like the following > 1. I would like to find the median for each unique combination of > factors using the data in the long format (like finding the median > of a single column of data). > 2. Create a new column where the median is repeated for the number of > rows of the unique factor combination > 3. I would like to learn the most efficient way to do this because I > want to avoid recreating the table from scratch with many commands > like the series below. I will have to perform this operation on many > different data sets some, with many more factors then this example. > > ### help me learn to use an apply or other command that will do the > following ##### > m0<-mydata$cpdID[mydata$time=="24hr" & mydata$treatment=="control" & > mydata$expREP=="expREP1" & mydata$techREP=="techREP1"] > m1<-mydata$Y[mydata$time=="24hr" & mydata$treatment=="control" & > mydata$expREP=="expREP1" & mydata$techREP=="techREP1"] > m2<-median(m1) > m3<-cbind(ID=m0,time=rep("24hr",length(m1)), > treatment=rep("control",length(m1)), expREP=rep("expREP1",length(m1)), > techREP=rep("techREP1",length(m1)),Y=m1,Y50=rep(m2,length(m1))) > ######### I would like to avoid writing the above hundreds of times ###### > > I am able to reshape into wide format and then find the column > medians. However restacking the data and regenerating the factors > becomes very very messy on data sets with 150 columns. I am able to > preform this analysis is SAS easily using BY, but I would like to know > how to do it in R. > > > I have tried these commands in a number of different variations with > no luck and similar error messages > test1<-aggregate(mydata[,-1], > list(mydata$time,mydata$treatment,mydata$expREP,mydata$techREP) > ,median, na.rm=T) > Error in median.default(X[[1]], ...) : need numeric data ### Y in > numeric#### > > test1<-by(mydata[,-1], > list(mydata$time,mydata$treatment,mydata$expREP,mydata$techREP) > ,median, na.rm=T) > Error in median.default(data[x, ], ...) : need numeric data > > Thanks > Alan > winXP R 2.4.1 > > > #####Example data frame###### > mydata<-as.data.frame(structure(list(cpdID = c(7, 7, 7, 7, 7, 7, 8, 8, > 8, 8, 8, 8, > 8, 8, 8, 8, 8, 8, 8, 10, 10, 10, 10, 10, 10, 10, 10, 10, 10, > 10, 10, 10, 10, 10, 10, 10, 10, 10, 10, 10, 10, 10, 10, 10, 10, > 10, 10, 10, 10, 10, 10, 19, 19, 19, 19, 19, 19, 23, 23, 23, 23, > 23, 23, 23, 23, 33, 33, 33, 33, 33, 33, 33, 33, 33, 33, 33, 33, > 33, 33, 33, 33, 33, 33, 33, 33, 33, 33, 33, 33, 33, 33, 33, 33, > 33, 33, 33, 33, 33, 40, 40, 40, 40, 40, 40, 40, 40, 40, 40, 40, > 40, 40, 40, 40, 40, 40, 40, 40, 40, 40, 40, 40, 40, 40, 40, 40, > 40, 40, 40, 40, 40, 40, 40, 40, 40, 40, 40, 42, 42, 42, 42, 42, > 42, 42, 42, 42, 42, 42, 42, 42, 42, 42, 42, 42, 42, 42, 42, 42, > 42, 42, 42, 42, 42, 42, 42, 42, 42, 42, 42, 42, 42, 42, 42, 42, > 42, 42, 42, 42, 42, 42, 47, 47, 47, 47, 47, 47, 47, 47, 47, 47, > 47, 47, 47, 47, 47, 47, 47, 47, 47, 47, 47), time = structure(as.integer > (c(1, > 1, 1, 1, 2, 2, 2, 1, 1, 1, 1, 1, 2, 2, 2, 2, 2, 2, 2, 1, 1, 2, > 1, 2, 2, 1, 1, 2, 2, 1, 2, 2, 2, 1, 2, 1, 2, 2, 2, 2, 1, 2, 2, > 1, 2, 1, 2, 2, 1, 1, 1, 2, 2, 2, 2, 2, 2, 1, 2, 1, 2, 2, 2, 2, > 2, 1, 1, 1, 1, 1, 2, 2, 2, 2, 2, 1, 2, 1, 1, 2, 1, 2, 2, 2, 2, > 2, 2, 2, 2, 2, 1, 1, 2, 2, 1, 1, 1, 1, 2, 1, 1, 1, 1, 1, 1, 2, > 2, 2, 2, 1, 2, 1, 2, 2, 1, 2, 1, 2, 2, 1, 1, 2, 2, 2, 2, 2, 2, > 2, 1, 1, 1, 2, 2, 2, 2, 2, 2, 1, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, > 1, 2, 2, 2, 2, 2, 2, 2, 2, 1, 2, 2, 2, 2, 2, 1, 2, 2, 2, 2, 1, > 1, 1, 1, 1, 2, 1, 1, 1, 1, 1, 1, 1, 1, 2, 2, 1, 2, 2, 2, 2, 2, > 2, 2, 1, 2, 2, 1, 2, 2, 1, 2)), .Label = c("120hr", "24hr"), class > "factor"), > treatment = structure(as.integer(c(1, 1, 1, 2, 2, 1, 1, 2, > 1, 1, 2, 2, 1, 2, 2, 1, 2, 2, 2, 1, 2, 1, 2, 2, 1, 2, 1, > 2, 2, 2, 2, 2, 1, 1, 1, 1, 2, 2, 2, 1, 2, 1, 1, 2, 2, 2, > 1, 1, 1, 1, 1, 2, 2, 1, 1, 2, 1, 2, 1, 1, 1, 2, 2, 1, 1, > 1, 1, 2, 2, 2, 2, 1, 1, 2, 2, 2, 2, 2, 1, 1, 1, 2, 1, 2, > 2, 1, 1, 2, 1, 1, 2, 2, 1, 2, 2, 1, 1, 1, 2, 1, 2, 2, 2, > 1, 1, 1, 1, 2, 2, 1, 2, 1, 2, 1, 2, 1, 2, 1, 1, 2, 2, 2, > 2, 2, 2, 1, 1, 1, 1, 1, 2, 2, 1, 1, 2, 1, 2, 2, 2, 2, 1, > 1, 2, 2, 1, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 1, 2, 2, > 2, 2, 1, 2, 1, 1, 1, 1, 2, 2, 2, 2, 2, 1, 2, 1, 1, 1, 1, > 2, 2, 1, 1, 1, 2, 2, 2, 2, 2, 2, 1, 2, 2, 2, 2, 1, 1, 1, > 2, 1)), .Label = c("control", "trt"), class = "factor"), > expREP = structure(as.integer(c(1, 1, 1, 3, 1, 1, 1, 1, 2, > 2, 1, 1, 2, 2, 1, 3, 1, 3, 3, 3, 1, 2, 1, 2, 2, 2, 2, 3, > 3, 2, 2, 1, 2, 3, 3, 1, 1, 2, 3, 1, 3, 3, 3, 3, 1, 3, 1, > 1, 2, 1, 1, 2, 3, 2, 2, 1, 3, 2, 2, 2, 3, 2, 1, 2, 2, 2, > 2, 1, 1, 1, 3, 2, 2, 3, 3, 2, 2, 2, 3, 2, 3, 2, 3, 1, 2, > 3, 3, 1, 1, 1, 3, 3, 1, 1, 3, 1, 1, 1, 1, 1, 3, 3, 3, 1, > 1, 1, 2, 3, 2, 2, 3, 2, 2, 2, 1, 1, 1, 3, 3, 2, 2, 2, 1, > 3, 1, 2, 3, 1, 3, 3, 1, 2, 3, 1, 2, 1, 3, 1, 3, 3, 2, 2, > 2, 1, 2, 2, 2, 1, 2, 2, 2, 2, 1, 2, 2, 2, 1, 3, 1, 1, 1, > 1, 3, 3, 1, 1, 1, 1, 3, 3, 3, 3, 3, 1, 3, 1, 1, 1, 1, 3, > 3, 1, 1, 1, 3, 2, 1, 1, 2, 1, 3, 2, 1, 2, 1, 3, 1, 1, 2, > 3)), .Label = c("expREP1", "expREP2", "expREP3"), class = "factor"), > techREP = structure(as.integer(c(3, 2, 1, 1, 1, 3, 1, 3, > 3, 2, 2, 1, 1, 3, 2, 3, 3, 1, 2, 1, 2, 1, 3, 1, 3, 2, 2, > 3, 1, 1, 3, 3, 2, 3, 3, 3, 2, 2, 2, 2, 1, 1, 2, 3, 1, 2, > 3, 1, 3, 2, 1, 1, 2, 2, 3, 3, 3, 2, 1, 2, 1, 2, 3, 2, 3, > 2, 3, 1, 3, 2, 2, 1, 2, 1, 3, 2, 2, 1, 1, 3, 3, 3, 1, 3, > 1, 3, 2, 2, 2, 1, 2, 1, 3, 1, 3, 2, 1, 3, 1, 3, 2, 3, 1, > 1, 2, 3, 1, 1, 3, 3, 2, 2, 1, 2, 3, 2, 2, 3, 2, 2, 1, 2, > 2, 3, 3, 3, 1, 1, 1, 3, 1, 1, 2, 3, 2, 3, 3, 1, 1, 3, 2, > 3, 3, 3, 3, 1, 3, 2, 1, 3, 3, 1, 3, 2, 1, 2, 2, 1, 2, 1, > 2, 1, 1, 1, 2, 2, 3, 3, 1, 2, 3, 2, 3, 3, 3, 1, 2, 2, 1, > 3, 2, 3, 3, 2, 2, 2, 3, 2, 1, 3, 1, 3, 1, 3, 1, 1, 1, 2, > 2, 3)), .Label = c("techREP1", "techREP2", "techREP3"), class > "factor"), > log2Abun = c(14.4233129144089, 14.052822741429, 14.2281422686467, > 13.8492096005693, 14.076481601207, 14.2139395740777, 14.3399195756207, > 14.3625602954496, 14.0141948668145, 14.0980320829605, 14.3152203363759, > 14.4528846974866, 13.9591869268449, 14.4064043323413, 14.0403753485321, > 14.2285932517829, 14.1259784261721, 13.5925738310379, 13.5830827675029, > 13.0280787227049, 15.0198078807043, 12.8423503434138, 12.645883554519, > 13.4644181177386, 12.8399910705399, 12.7879025593914, 12.4978518369511, > 14.3949985145017, 12.8670856466168, 12.9749522735341, 13.3456824481868, > 13.4557125040673, 12.8989792046225, 16.0609491915918, 13.6795900568273, > 16.456466720182, 13.6145948287653, 13.2604785448039, 14.8573006848798, > 13.1382718001722, 13.690761908446, 14.0557060971613, 13.7495552174335, > 13.6336764098923, 13.7844303674846, 15.9518993688317, 13.2452555803066, > 13.1930632791304, 12.1919845133603, 13.8710388986595, 13.6375305515253, > 12.5919897676151, 17.4797250127015, 17.4014712120155, 17.5948202702163, > 12.6031626795344, 17.8287811089804, 11.3613955331659, 15.8064741020529, > 15.1007855146758, 16.0553036215393, 15.7553570530353, 15.9747058600332, > 15.776715745005, 15.8588066550904, 16.2935434944118, 16.271207673964, > 16.3660489506706, 16.3273070282017, 15.7632383068689, 14.6030467398838, > 14.7118820283521, 14.7577545959238, 14.7315311764619, 14.8250084466403, > 15.6652803936783, 15.8249587405285, 15.6558660906456, 15.5387042614836, > 14.8487696278309, 15.5477380355109, 15.9451465974129, 16.196755792715, > 15.9999119421954, 15.8660714836595, 15.9406577104549, 15.8754613979164, > 16.0358944927638, 16.1785092456522, 16.1992122284106, 15.8087128474547, > 15.9373968104322, 16.1432636222427, 16.2412011305004, 15.9488234774507, > 15.7820255767261, 15.7730361533934, 15.7459893802453, 20.7777738189812, > 21.7489122647969, 21.0374490930058, 20.9765158780184, 21.0464959041766, > 21.6790715518273, 21.8021013715842, 20.7652083875471, 20.6663696521617, > 20.3963413756589, 20.7983642126234, 20.1864915044977, 20.4422216681915, > 20.59064186918, 20.6964531077756, 20.6822196619653, 20.4532414913665, > 20.8126113450884, 20.4397608946311, 21.4603719009067, 21.5318145314919, > 21.0400816517662, 21.0466431076593, 20.7459819969019, 20.6723053403015, > 20.4793421418014, 20.6432035537608, 20.6831942471622, 21.6913537667357, > 20.6562913013787, 21.0940693071186, 20.9473294479256, 20.5087271424267, > 16.0871520250047, 16.3816612332698, 16.998645516939, 15.7912392142223, > 14.5058735666446, 13.6035104425928, 14.4369066987207, 14.6998435295626, > 14.6818972267862, 14.1086877961546, 14.3539049235617, 15.40862828087, > 15.0657947671893, 14.8615716011254, 14.5538692431961, 14.2397476835569, > 13.4381420777437, 13.4499224158638, 13.6887966810545, 14.6550275257018, > 13.500966330283, 14.9271297886953, 14.7405186421119, 15.0047910398043, > 14.7051463678038, 14.8325933769599, 12.9854861991046, 13.4203550220891, > 15.399010832952, 15.4064707685293, 15.0953970227926, 15.0712109416537, > 15.7587957644032, 15.0013202225009, 15.7608498673217, 14.7604080920677, > 14.2478533598602, 14.4140245098782, 14.7936541075062, 14.7684428120549, > 14.595607155062, 16.1507389488284, 16.4915712924337, 14.490161446684, > 14.721633263063, 14.4341721012904, 15.8747652729112, 14.543333961671, > 14.8633635585377, 14.6696601802386, 13.3020676725265, 14.0190694293311, > 15.2168973938334, 12.6304946615056, 12.1972166931101, 12.7960396088298, > 14.4285564621952, 14.5308330346953, 14.1496677436943, 14.0823985634278, > 12.8407779235951, 14.6543003749437, 14.3202364452416, 15.1723493709662, > 14.0744760007345, 14.8132801684508, 12.9183042336999, 14.5225202325766, > 13.742309436084)), .Names = c("cpdID", "time", "treatment", > "expREP", "techREP", "Y"))) > > ______________________________________________ > R-help@stat.math.ethz.ch mailing list > https://stat.ethz.ch/mailman/listinfo/r-help > PLEASE do read the posting guide > http://www.R-project.org/posting-guide.html > and provide commented, minimal, self-contained, reproducible code. >-- Jim Holtman Cincinnati, OH +1 513 646 9390 What is the problem you are trying to solve? [[alternative HTML version deleted]]
Mendiburu, Felipe (CIP)
2007-Mar-08 01:51 UTC
[R] how to "apply" functions to unbalanced data in long format byfactors......cant get "by" or "aggregate" to work
Dear Alan,
I think that podria to be of utility the function tapply.stat () of the package
agricolae.
see ?tapply.stat
Regards,
Felipe.
for example:
library(agricolae)
attach(mydata)
set1<-tapply.stat(mydata[,2:5],Y,median)
set2<-tapply.stat(time,Y,function(x) median(x))
set3<-tapply.stat(mydata[,c(2,3)],Y,function(x) median(x))
set2
time Y
1 120hr 14.94159
2 24hr 14.81914
set3
time treatment Y
1 120hr control 15.31974
2 120hr trt 14.82851
3 24hr control 15.03627
4 24hr trt 14.70249
________________________________
From: r-help-bounces at stat.math.ethz.ch on behalf of ALAN SMITH
Sent: Wed 3/7/2007 6:25 PM
To: r-help at stat.math.ethz.ch
Subject: [R] how to "apply" functions to unbalanced data in long
format byfactors......cant get "by" or "aggregate" to work
Hello R users,
Problem.......I do not understand how to use
"aggregate","by", or the
appropriate "apply" to perform a function on data with more than one
factor on unbalanced data...
I have a data frame in the long format that does not contain balanced
data. The ID is a unique identifier corresponding to the experimental
unit that will later be examined by ANOVA, T-tests etc. Y is the data
generated from the experiment. The factors represent the differences
between each sample or "run" measured.
str(mydata) ### sample of table at bottom of email ###
'data.frame': 129982 obs. of 6 variables:
$ ID : num 7 7 7 7 7 7 8 8 8 8 ...
$ time : Factor w/ 2 levels "120hr","24hr": 1 1 1 1 2 2
2 1 1 1 ...
$ treatment: Factor w/ 2 levels "control","trt": 1 1 1 2 2
1 1 2 1 1 ...
$ expREP : Factor w/ 3 levels "expREP1","expREP2",..: 1 1
1 3 1 1 1 1 2 2 ...
$ techREP : Factor w/ 3 levels "techREP1","techREP2",..: 3
2 1 1 1 3
1 3 3 2 ...
$ Y : num 14.4 14.1 14.2 13.8 14.1 ...
Could someone please help with doing something like the following
1. I would like to find the median for each unique combination of
factors using the data in the long format (like finding the median
of a single column of data).
2. Create a new column where the median is repeated for the number of
rows of the unique factor combination
3. I would like to learn the most efficient way to do this because I
want to avoid recreating the table from scratch with many commands
like the series below. I will have to perform this operation on many
different data sets some, with many more factors then this example.
### help me learn to use an apply or other command that will do the
following #####
m0<-mydata$cpdID[mydata$time=="24hr" &
mydata$treatment=="control" &
mydata$expREP=="expREP1" & mydata$techREP=="techREP1"]
m1<-mydata$Y[mydata$time=="24hr" &
mydata$treatment=="control" &
mydata$expREP=="expREP1" & mydata$techREP=="techREP1"]
m2<-median(m1)
m3<-cbind(ID=m0,time=rep("24hr",length(m1)),
treatment=rep("control",length(m1)),
expREP=rep("expREP1",length(m1)),
techREP=rep("techREP1",length(m1)),Y=m1,Y50=rep(m2,length(m1)))
######### I would like to avoid writing the above hundreds of times ######
I am able to reshape into wide format and then find the column
medians. However restacking the data and regenerating the factors
becomes very very messy on data sets with 150 columns. I am able to
preform this analysis is SAS easily using BY, but I would like to know
how to do it in R.
I have tried these commands in a number of different variations with
no luck and similar error messages
test1<-aggregate(mydata[,-1],
list(mydata$time,mydata$treatment,mydata$expREP,mydata$techREP)
,median, na.rm=T)
Error in median.default(X[[1]], ...) : need numeric data ### Y in numeric####
test1<-by(mydata[,-1],
list(mydata$time,mydata$treatment,mydata$expREP,mydata$techREP)
,median, na.rm=T)
Error in median.default(data[x, ], ...) : need numeric data
Thanks
Alan
winXP R 2.4.1
#####Example data frame######
mydata<-as.data.frame(structure(list(cpdID = c(7, 7, 7, 7, 7, 7, 8, 8,
8, 8, 8, 8,
8, 8, 8, 8, 8, 8, 8, 10, 10, 10, 10, 10, 10, 10, 10, 10, 10,
10, 10, 10, 10, 10, 10, 10, 10, 10, 10, 10, 10, 10, 10, 10, 10,
10, 10, 10, 10, 10, 10, 19, 19, 19, 19, 19, 19, 23, 23, 23, 23,
23, 23, 23, 23, 33, 33, 33, 33, 33, 33, 33, 33, 33, 33, 33, 33,
33, 33, 33, 33, 33, 33, 33, 33, 33, 33, 33, 33, 33, 33, 33, 33,
33, 33, 33, 33, 33, 40, 40, 40, 40, 40, 40, 40, 40, 40, 40, 40,
40, 40, 40, 40, 40, 40, 40, 40, 40, 40, 40, 40, 40, 40, 40, 40,
40, 40, 40, 40, 40, 40, 40, 40, 40, 40, 40, 42, 42, 42, 42, 42,
42, 42, 42, 42, 42, 42, 42, 42, 42, 42, 42, 42, 42, 42, 42, 42,
42, 42, 42, 42, 42, 42, 42, 42, 42, 42, 42, 42, 42, 42, 42, 42,
42, 42, 42, 42, 42, 42, 47, 47, 47, 47, 47, 47, 47, 47, 47, 47,
47, 47, 47, 47, 47, 47, 47, 47, 47, 47, 47), time = structure(as.integer(c(1,
1, 1, 1, 2, 2, 2, 1, 1, 1, 1, 1, 2, 2, 2, 2, 2, 2, 2, 1, 1, 2,
1, 2, 2, 1, 1, 2, 2, 1, 2, 2, 2, 1, 2, 1, 2, 2, 2, 2, 1, 2, 2,
1, 2, 1, 2, 2, 1, 1, 1, 2, 2, 2, 2, 2, 2, 1, 2, 1, 2, 2, 2, 2,
2, 1, 1, 1, 1, 1, 2, 2, 2, 2, 2, 1, 2, 1, 1, 2, 1, 2, 2, 2, 2,
2, 2, 2, 2, 2, 1, 1, 2, 2, 1, 1, 1, 1, 2, 1, 1, 1, 1, 1, 1, 2,
2, 2, 2, 1, 2, 1, 2, 2, 1, 2, 1, 2, 2, 1, 1, 2, 2, 2, 2, 2, 2,
2, 1, 1, 1, 2, 2, 2, 2, 2, 2, 1, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2,
1, 2, 2, 2, 2, 2, 2, 2, 2, 1, 2, 2, 2, 2, 2, 1, 2, 2, 2, 2, 1,
1, 1, 1, 1, 2, 1, 1, 1, 1, 1, 1, 1, 1, 2, 2, 1, 2, 2, 2, 2, 2,
2, 2, 1, 2, 2, 1, 2, 2, 1, 2)), .Label = c("120hr", "24hr"),
class = "factor"),
treatment = structure(as.integer(c(1, 1, 1, 2, 2, 1, 1, 2,
1, 1, 2, 2, 1, 2, 2, 1, 2, 2, 2, 1, 2, 1, 2, 2, 1, 2, 1,
2, 2, 2, 2, 2, 1, 1, 1, 1, 2, 2, 2, 1, 2, 1, 1, 2, 2, 2,
1, 1, 1, 1, 1, 2, 2, 1, 1, 2, 1, 2, 1, 1, 1, 2, 2, 1, 1,
1, 1, 2, 2, 2, 2, 1, 1, 2, 2, 2, 2, 2, 1, 1, 1, 2, 1, 2,
2, 1, 1, 2, 1, 1, 2, 2, 1, 2, 2, 1, 1, 1, 2, 1, 2, 2, 2,
1, 1, 1, 1, 2, 2, 1, 2, 1, 2, 1, 2, 1, 2, 1, 1, 2, 2, 2,
2, 2, 2, 1, 1, 1, 1, 1, 2, 2, 1, 1, 2, 1, 2, 2, 2, 2, 1,
1, 2, 2, 1, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 1, 2, 2,
2, 2, 1, 2, 1, 1, 1, 1, 2, 2, 2, 2, 2, 1, 2, 1, 1, 1, 1,
2, 2, 1, 1, 1, 2, 2, 2, 2, 2, 2, 1, 2, 2, 2, 2, 1, 1, 1,
2, 1)), .Label = c("control", "trt"), class =
"factor"),
expREP = structure(as.integer(c(1, 1, 1, 3, 1, 1, 1, 1, 2,
2, 1, 1, 2, 2, 1, 3, 1, 3, 3, 3, 1, 2, 1, 2, 2, 2, 2, 3,
3, 2, 2, 1, 2, 3, 3, 1, 1, 2, 3, 1, 3, 3, 3, 3, 1, 3, 1,
1, 2, 1, 1, 2, 3, 2, 2, 1, 3, 2, 2, 2, 3, 2, 1, 2, 2, 2,
2, 1, 1, 1, 3, 2, 2, 3, 3, 2, 2, 2, 3, 2, 3, 2, 3, 1, 2,
3, 3, 1, 1, 1, 3, 3, 1, 1, 3, 1, 1, 1, 1, 1, 3, 3, 3, 1,
1, 1, 2, 3, 2, 2, 3, 2, 2, 2, 1, 1, 1, 3, 3, 2, 2, 2, 1,
3, 1, 2, 3, 1, 3, 3, 1, 2, 3, 1, 2, 1, 3, 1, 3, 3, 2, 2,
2, 1, 2, 2, 2, 1, 2, 2, 2, 2, 1, 2, 2, 2, 1, 3, 1, 1, 1,
1, 3, 3, 1, 1, 1, 1, 3, 3, 3, 3, 3, 1, 3, 1, 1, 1, 1, 3,
3, 1, 1, 1, 3, 2, 1, 1, 2, 1, 3, 2, 1, 2, 1, 3, 1, 1, 2,
3)), .Label = c("expREP1", "expREP2",
"expREP3"), class = "factor"),
techREP = structure(as.integer(c(3, 2, 1, 1, 1, 3, 1, 3,
3, 2, 2, 1, 1, 3, 2, 3, 3, 1, 2, 1, 2, 1, 3, 1, 3, 2, 2,
3, 1, 1, 3, 3, 2, 3, 3, 3, 2, 2, 2, 2, 1, 1, 2, 3, 1, 2,
3, 1, 3, 2, 1, 1, 2, 2, 3, 3, 3, 2, 1, 2, 1, 2, 3, 2, 3,
2, 3, 1, 3, 2, 2, 1, 2, 1, 3, 2, 2, 1, 1, 3, 3, 3, 1, 3,
1, 3, 2, 2, 2, 1, 2, 1, 3, 1, 3, 2, 1, 3, 1, 3, 2, 3, 1,
1, 2, 3, 1, 1, 3, 3, 2, 2, 1, 2, 3, 2, 2, 3, 2, 2, 1, 2,
2, 3, 3, 3, 1, 1, 1, 3, 1, 1, 2, 3, 2, 3, 3, 1, 1, 3, 2,
3, 3, 3, 3, 1, 3, 2, 1, 3, 3, 1, 3, 2, 1, 2, 2, 1, 2, 1,
2, 1, 1, 1, 2, 2, 3, 3, 1, 2, 3, 2, 3, 3, 3, 1, 2, 2, 1,
3, 2, 3, 3, 2, 2, 2, 3, 2, 1, 3, 1, 3, 1, 3, 1, 1, 1, 2,
2, 3)), .Label = c("techREP1", "techREP2",
"techREP3"), class = "factor"),
log2Abun = c(14.4233129144089, 14.052822741429, 14.2281422686467,
13.8492096005693, 14.076481601207, 14.2139395740777, 14.3399195756207,
14.3625602954496, 14.0141948668145, 14.0980320829605, 14.3152203363759,
14.4528846974866, 13.9591869268449, 14.4064043323413, 14.0403753485321,
14.2285932517829, 14.1259784261721, 13.5925738310379, 13.5830827675029,
13.0280787227049, 15.0198078807043, 12.8423503434138, 12.645883554519,
13.4644181177386, 12.8399910705399, 12.7879025593914, 12.4978518369511,
14.3949985145017, 12.8670856466168, 12.9749522735341, 13.3456824481868,
13.4557125040673, 12.8989792046225, 16.0609491915918, 13.6795900568273,
16.456466720182, 13.6145948287653, 13.2604785448039, 14.8573006848798,
13.1382718001722, 13.690761908446, 14.0557060971613, 13.7495552174335,
13.6336764098923, 13.7844303674846, 15.9518993688317, 13.2452555803066,
13.1930632791304, 12.1919845133603, 13.8710388986595, 13.6375305515253,
12.5919897676151, 17.4797250127015, 17.4014712120155, 17.5948202702163,
12.6031626795344, 17.8287811089804, 11.3613955331659, 15.8064741020529,
15.1007855146758, 16.0553036215393, 15.7553570530353, 15.9747058600332,
15.776715745005, 15.8588066550904, 16.2935434944118, 16.271207673964,
16.3660489506706, 16.3273070282017, 15.7632383068689, 14.6030467398838,
14.7118820283521, 14.7577545959238, 14.7315311764619, 14.8250084466403,
15.6652803936783, 15.8249587405285, 15.6558660906456, 15.5387042614836,
14.8487696278309, 15.5477380355109, 15.9451465974129, 16.196755792715,
15.9999119421954, 15.8660714836595, 15.9406577104549, 15.8754613979164,
16.0358944927638, 16.1785092456522, 16.1992122284106, 15.8087128474547,
15.9373968104322, 16.1432636222427, 16.2412011305004, 15.9488234774507,
15.7820255767261, 15.7730361533934, 15.7459893802453, 20.7777738189812,
21.7489122647969, 21.0374490930058, 20.9765158780184, 21.0464959041766,
21.6790715518273, 21.8021013715842, 20.7652083875471, 20.6663696521617,
20.3963413756589, 20.7983642126234, 20.1864915044977, 20.4422216681915,
20.59064186918, 20.6964531077756, 20.6822196619653, 20.4532414913665,
20.8126113450884, 20.4397608946311, 21.4603719009067, 21.5318145314919,
21.0400816517662, 21.0466431076593, 20.7459819969019, 20.6723053403015,
20.4793421418014, 20.6432035537608, 20.6831942471622, 21.6913537667357,
20.6562913013787, 21.0940693071186, 20.9473294479256, 20.5087271424267,
16.0871520250047, 16.3816612332698, 16.998645516939, 15.7912392142223,
14.5058735666446, 13.6035104425928, 14.4369066987207, 14.6998435295626,
14.6818972267862, 14.1086877961546, 14.3539049235617, 15.40862828087,
15.0657947671893, 14.8615716011254, 14.5538692431961, 14.2397476835569,
13.4381420777437, 13.4499224158638, 13.6887966810545, 14.6550275257018,
13.500966330283, 14.9271297886953, 14.7405186421119, 15.0047910398043,
14.7051463678038, 14.8325933769599, 12.9854861991046, 13.4203550220891,
15.399010832952, 15.4064707685293, 15.0953970227926, 15.0712109416537,
15.7587957644032, 15.0013202225009, 15.7608498673217, 14.7604080920677,
14.2478533598602, 14.4140245098782, 14.7936541075062, 14.7684428120549,
14.595607155062, 16.1507389488284, 16.4915712924337, 14.490161446684,
14.721633263063, 14.4341721012904, 15.8747652729112, 14.543333961671,
14.8633635585377, 14.6696601802386, 13.3020676725265, 14.0190694293311,
15.2168973938334, 12.6304946615056, 12.1972166931101, 12.7960396088298,
14.4285564621952, 14.5308330346953, 14.1496677436943, 14.0823985634278,
12.8407779235951, 14.6543003749437, 14.3202364452416, 15.1723493709662,
14.0744760007345, 14.8132801684508, 12.9183042336999, 14.5225202325766,
13.742309436084)), .Names = c("cpdID", "time",
"treatment",
"expREP", "techREP", "Y")))
______________________________________________
R-help at stat.math.ethz.ch mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.
Petr Pikal
2007-Mar-08 07:00 UTC
[R] how to "apply" functions to unbalanced data in long format by factors......cant get "by" or "aggregate" to work
Hi you can use aggregate to create table of medians with(mydata, aggregate(Y, list(time, tratment, expRep,....), median) repeats of unique factors either by rle or aggregate with length function Then you can do replication by norep <- rep(your.median, each = your replicates) Regards Petr On 7 Mar 2007 at 17:25, ALAN SMITH wrote: Date sent: Wed, 7 Mar 2007 17:25:54 -0600 From: "ALAN SMITH" <alansmith2 at gmail.com> To: r-help at stat.math.ethz.ch Subject: [R] how to "apply" functions to unbalanced data in long format by factors......cant get "by" or "aggregate" to work> Hello R users, > > Problem.......I do not understand how to use "aggregate","by", or the > appropriate "apply" to perform a function on data with more than one > factor on unbalanced data... I have a data frame in the long format > that does not contain balanced data. The ID is a unique identifier > corresponding to the experimental unit that will later be examined by > ANOVA, T-tests etc. Y is the data generated from the experiment. The > factors represent the differences between each sample or "run" > measured. > > str(mydata) ### sample of table at bottom of email ### > 'data.frame': 129982 obs. of 6 variables: > $ ID : num 7 7 7 7 7 7 8 8 8 8 ... > $ time : Factor w/ 2 levels "120hr","24hr": 1 1 1 1 2 2 2 1 1 1 > ... $ treatment: Factor w/ 2 levels "control","trt": 1 1 1 2 2 1 1 2 > 1 1 ... $ expREP : Factor w/ 3 levels "expREP1","expREP2",..: 1 1 1 > 3 1 1 1 1 2 2 ... $ techREP : Factor w/ 3 levels > "techREP1","techREP2",..: 3 2 1 1 1 3 > 1 3 3 2 ... > $ Y : num 14.4 14.1 14.2 13.8 14.1 ... > > Could someone please help with doing something like the following 1. I > would like to find the median for each unique combination of factors > using the data in the long format (like finding the median of a > single column of data). 2. Create a new column where the median is > repeated for the number of rows of the unique factor combination 3. I > would like to learn the most efficient way to do this because I want > to avoid recreating the table from scratch with many commands like the > series below. I will have to perform this operation on many different > data sets some, with many more factors then this example. > > ### help me learn to use an apply or other command that will do the > following ##### > m0<-mydata$cpdID[mydata$time=="24hr" & mydata$treatment=="control" & > mydata$expREP=="expREP1" & mydata$techREP=="techREP1"] > m1<-mydata$Y[mydata$time=="24hr" & mydata$treatment=="control" & > mydata$expREP=="expREP1" & mydata$techREP=="techREP1"] m2<-median(m1) > m3<-cbind(ID=m0,time=rep("24hr",length(m1)), > treatment=rep("control",length(m1)), expREP=rep("expREP1",length(m1)), > techREP=rep("techREP1",length(m1)),Y=m1,Y50=rep(m2,length(m1))) > ######### I would like to avoid writing the above hundreds of times > ###### > > I am able to reshape into wide format and then find the column > medians. However restacking the data and regenerating the factors > becomes very very messy on data sets with 150 columns. I am able to > preform this analysis is SAS easily using BY, but I would like to know > how to do it in R. > > > I have tried these commands in a number of different variations with > no luck and similar error messages > test1<-aggregate(mydata[,-1], > list(mydata$time,mydata$treatment,mydata$expREP,mydata$techREP) > ,median, na.rm=T) > Error in median.default(X[[1]], ...) : need numeric data ### Y in > numeric#### > > test1<-by(mydata[,-1], > list(mydata$time,mydata$treatment,mydata$expREP,mydata$techREP) > ,median, na.rm=T) > Error in median.default(data[x, ], ...) : need numeric data > > Thanks > Alan > winXP R 2.4.1 > > > #####Example data frame###### > mydata<-as.data.frame(structure(list(cpdID = c(7, 7, 7, 7, 7, 7, 8, 8, > 8, 8, 8, 8, 8, 8, 8, 8, 8, 8, 8, 10, 10, 10, 10, 10, 10, 10, 10, 10, > 10, 10, 10, 10, 10, 10, 10, 10, 10, 10, 10, 10, 10, 10, 10, 10, 10, > 10, 10, 10, 10, 10, 10, 19, 19, 19, 19, 19, 19, 23, 23, 23, 23, 23, > 23, 23, 23, 33, 33, 33, 33, 33, 33, 33, 33, 33, 33, 33, 33, 33, 33, > 33, 33, 33, 33, 33, 33, 33, 33, 33, 33, 33, 33, 33, 33, 33, 33, 33, > 33, 33, 40, 40, 40, 40, 40, 40, 40, 40, 40, 40, 40, 40, 40, 40, 40, > 40, 40, 40, 40, 40, 40, 40, 40, 40, 40, 40, 40, 40, 40, 40, 40, 40, > 40, 40, 40, 40, 40, 40, 42, 42, 42, 42, 42, 42, 42, 42, 42, 42, 42, > 42, 42, 42, 42, 42, 42, 42, 42, 42, 42, 42, 42, 42, 42, 42, 42, 42, > 42, 42, 42, 42, 42, 42, 42, 42, 42, 42, 42, 42, 42, 42, 42, 47, 47, > 47, 47, 47, 47, 47, 47, 47, 47, 47, 47, 47, 47, 47, 47, 47, 47, 47, > 47, 47), time = structure(as.integer(c(1, 1, 1, 1, 2, 2, 2, 1, 1, 1, > 1, 1, 2, 2, 2, 2, 2, 2, 2, 1, 1, 2, 1, 2, 2, 1, 1, 2, 2, 1, 2, 2, 2, > 1, 2, 1, 2, 2, 2, 2, 1, 2, 2, 1, 2, 1, 2, 2, 1, 1, 1, 2, 2, 2, 2, 2, > 2, 1, 2, 1, 2, 2, 2, 2, 2, 1, 1, 1, 1, 1, 2, 2, 2, 2, 2, 1, 2, 1, 1, > 2, 1, 2, 2, 2, 2, 2, 2, 2, 2, 2, 1, 1, 2, 2, 1, 1, 1, 1, 2, 1, 1, 1, > 1, 1, 1, 2, 2, 2, 2, 1, 2, 1, 2, 2, 1, 2, 1, 2, 2, 1, 1, 2, 2, 2, 2, > 2, 2, 2, 1, 1, 1, 2, 2, 2, 2, 2, 2, 1, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, > 1, 2, 2, 2, 2, 2, 2, 2, 2, 1, 2, 2, 2, 2, 2, 1, 2, 2, 2, 2, 1, 1, 1, > 1, 1, 2, 1, 1, 1, 1, 1, 1, 1, 1, 2, 2, 1, 2, 2, 2, 2, 2, 2, 2, 1, 2, > 2, 1, 2, 2, 1, 2)), .Label = c("120hr", "24hr"), class = "factor"), > treatment = structure(as.integer(c(1, 1, 1, 2, 2, 1, 1, 2, > 1, 1, 2, 2, 1, 2, 2, 1, 2, 2, 2, 1, 2, 1, 2, 2, 1, 2, 1, > 2, 2, 2, 2, 2, 1, 1, 1, 1, 2, 2, 2, 1, 2, 1, 1, 2, 2, 2, > 1, 1, 1, 1, 1, 2, 2, 1, 1, 2, 1, 2, 1, 1, 1, 2, 2, 1, 1, > 1, 1, 2, 2, 2, 2, 1, 1, 2, 2, 2, 2, 2, 1, 1, 1, 2, 1, 2, > 2, 1, 1, 2, 1, 1, 2, 2, 1, 2, 2, 1, 1, 1, 2, 1, 2, 2, 2, > 1, 1, 1, 1, 2, 2, 1, 2, 1, 2, 1, 2, 1, 2, 1, 1, 2, 2, 2, > 2, 2, 2, 1, 1, 1, 1, 1, 2, 2, 1, 1, 2, 1, 2, 2, 2, 2, 1, > 1, 2, 2, 1, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 1, 2, 2, > 2, 2, 1, 2, 1, 1, 1, 1, 2, 2, 2, 2, 2, 1, 2, 1, 1, 1, 1, > 2, 2, 1, 1, 1, 2, 2, 2, 2, 2, 2, 1, 2, 2, 2, 2, 1, 1, 1, > 2, 1)), .Label = c("control", "trt"), class = "factor"), > expREP = structure(as.integer(c(1, 1, 1, 3, 1, 1, 1, 1, 2, > 2, 1, 1, 2, 2, 1, 3, 1, 3, 3, 3, 1, 2, 1, 2, 2, 2, 2, 3, > 3, 2, 2, 1, 2, 3, 3, 1, 1, 2, 3, 1, 3, 3, 3, 3, 1, 3, 1, > 1, 2, 1, 1, 2, 3, 2, 2, 1, 3, 2, 2, 2, 3, 2, 1, 2, 2, 2, > 2, 1, 1, 1, 3, 2, 2, 3, 3, 2, 2, 2, 3, 2, 3, 2, 3, 1, 2, > 3, 3, 1, 1, 1, 3, 3, 1, 1, 3, 1, 1, 1, 1, 1, 3, 3, 3, 1, > 1, 1, 2, 3, 2, 2, 3, 2, 2, 2, 1, 1, 1, 3, 3, 2, 2, 2, 1, > 3, 1, 2, 3, 1, 3, 3, 1, 2, 3, 1, 2, 1, 3, 1, 3, 3, 2, 2, > 2, 1, 2, 2, 2, 1, 2, 2, 2, 2, 1, 2, 2, 2, 1, 3, 1, 1, 1, > 1, 3, 3, 1, 1, 1, 1, 3, 3, 3, 3, 3, 1, 3, 1, 1, 1, 1, 3, > 3, 1, 1, 1, 3, 2, 1, 1, 2, 1, 3, 2, 1, 2, 1, 3, 1, 1, 2, > 3)), .Label = c("expREP1", "expREP2", "expREP3"), class > "factor"), techREP = structure(as.integer(c(3, 2, 1, 1, 1, 3, 1, > 3, 3, 2, 2, 1, 1, 3, 2, 3, 3, 1, 2, 1, 2, 1, 3, 1, 3, 2, 2, 3, 1, > 1, 3, 3, 2, 3, 3, 3, 2, 2, 2, 2, 1, 1, 2, 3, 1, 2, 3, 1, 3, 2, 1, > 1, 2, 2, 3, 3, 3, 2, 1, 2, 1, 2, 3, 2, 3, 2, 3, 1, 3, 2, 2, 1, 2, > 1, 3, 2, 2, 1, 1, 3, 3, 3, 1, 3, 1, 3, 2, 2, 2, 1, 2, 1, 3, 1, 3, > 2, 1, 3, 1, 3, 2, 3, 1, 1, 2, 3, 1, 1, 3, 3, 2, 2, 1, 2, 3, 2, 2, > 3, 2, 2, 1, 2, 2, 3, 3, 3, 1, 1, 1, 3, 1, 1, 2, 3, 2, 3, 3, 1, 1, > 3, 2, 3, 3, 3, 3, 1, 3, 2, 1, 3, 3, 1, 3, 2, 1, 2, 2, 1, 2, 1, 2, > 1, 1, 1, 2, 2, 3, 3, 1, 2, 3, 2, 3, 3, 3, 1, 2, 2, 1, 3, 2, 3, 3, > 2, 2, 2, 3, 2, 1, 3, 1, 3, 1, 3, 1, 1, 1, 2, 2, 3)), .Label > c("techREP1", "techREP2", "techREP3"), class = "factor"), log2Abun > = c(14.4233129144089, 14.052822741429, 14.2281422686467, > 13.8492096005693, 14.076481601207, 14.2139395740777, > 14.3399195756207, 14.3625602954496, 14.0141948668145, > 14.0980320829605, 14.3152203363759, 14.4528846974866, > 13.9591869268449, 14.4064043323413, 14.0403753485321, > 14.2285932517829, 14.1259784261721, 13.5925738310379, > 13.5830827675029, 13.0280787227049, 15.0198078807043, > 12.8423503434138, 12.645883554519, 13.4644181177386, > 12.8399910705399, 12.7879025593914, 12.4978518369511, > 14.3949985145017, 12.8670856466168, 12.9749522735341, > 13.3456824481868, 13.4557125040673, 12.8989792046225, > 16.0609491915918, 13.6795900568273, 16.456466720182, > 13.6145948287653, 13.2604785448039, 14.8573006848798, > 13.1382718001722, 13.690761908446, 14.0557060971613, > 13.7495552174335, 13.6336764098923, 13.7844303674846, > 15.9518993688317, 13.2452555803066, 13.1930632791304, > 12.1919845133603, 13.8710388986595, 13.6375305515253, > 12.5919897676151, 17.4797250127015, 17.4014712120155, > 17.5948202702163, 12.6031626795344, 17.8287811089804, > 11.3613955331659, 15.8064741020529, 15.1007855146758, > 16.0553036215393, 15.7553570530353, 15.9747058600332, > 15.776715745005, 15.8588066550904, 16.2935434944118, > 16.271207673964, 16.3660489506706, 16.3273070282017, > 15.7632383068689, 14.6030467398838, 14.7118820283521, > 14.7577545959238, 14.7315311764619, 14.8250084466403, > 15.6652803936783, 15.8249587405285, 15.6558660906456, > 15.5387042614836, 14.8487696278309, 15.5477380355109, > 15.9451465974129, 16.196755792715, 15.9999119421954, > 15.8660714836595, 15.9406577104549, 15.8754613979164, > 16.0358944927638, 16.1785092456522, 16.1992122284106, > 15.8087128474547, 15.9373968104322, 16.1432636222427, > 16.2412011305004, 15.9488234774507, 15.7820255767261, > 15.7730361533934, 15.7459893802453, 20.7777738189812, > 21.7489122647969, 21.0374490930058, 20.9765158780184, > 21.0464959041766, 21.6790715518273, 21.8021013715842, > 20.7652083875471, 20.6663696521617, 20.3963413756589, > 20.7983642126234, 20.1864915044977, 20.4422216681915, > 20.59064186918, 20.6964531077756, 20.6822196619653, > 20.4532414913665, 20.8126113450884, 20.4397608946311, > 21.4603719009067, 21.5318145314919, 21.0400816517662, > 21.0466431076593, 20.7459819969019, 20.6723053403015, > 20.4793421418014, 20.6432035537608, 20.6831942471622, > 21.6913537667357, 20.6562913013787, 21.0940693071186, > 20.9473294479256, 20.5087271424267, 16.0871520250047, > 16.3816612332698, 16.998645516939, 15.7912392142223, > 14.5058735666446, 13.6035104425928, 14.4369066987207, > 14.6998435295626, 14.6818972267862, 14.1086877961546, > 14.3539049235617, 15.40862828087, 15.0657947671893, > 14.8615716011254, 14.5538692431961, 14.2397476835569, > 13.4381420777437, 13.4499224158638, 13.6887966810545, > 14.6550275257018, 13.500966330283, 14.9271297886953, > 14.7405186421119, 15.0047910398043, 14.7051463678038, > 14.8325933769599, 12.9854861991046, 13.4203550220891, > 15.399010832952, 15.4064707685293, 15.0953970227926, > 15.0712109416537, 15.7587957644032, 15.0013202225009, > 15.7608498673217, 14.7604080920677, 14.2478533598602, > 14.4140245098782, 14.7936541075062, 14.7684428120549, > 14.595607155062, 16.1507389488284, 16.4915712924337, > 14.490161446684, 14.721633263063, 14.4341721012904, > 15.8747652729112, 14.543333961671, 14.8633635585377, > 14.6696601802386, 13.3020676725265, 14.0190694293311, > 15.2168973938334, 12.6304946615056, 12.1972166931101, > 12.7960396088298, 14.4285564621952, 14.5308330346953, > 14.1496677436943, 14.0823985634278, 12.8407779235951, > 14.6543003749437, 14.3202364452416, 15.1723493709662, > 14.0744760007345, 14.8132801684508, 12.9183042336999, > 14.5225202325766, 13.742309436084)), .Names = c("cpdID", "time", > "treatment", > "expREP", "techREP", "Y"))) > > ______________________________________________ > R-help at stat.math.ethz.ch mailing list > https://stat.ethz.ch/mailman/listinfo/r-help > PLEASE do read the posting guide > http://www.R-project.org/posting-guide.html and provide commented, > minimal, self-contained, reproducible code.Petr Pikal petr.pikal at precheza.cz
ALAN SMITH
2007-Mar-08 17:34 UTC
[R] how to "apply" functions to unbalanced data in long format by factors......cant get "by" or "aggregate" to work
Hello R-users
The help I received from Petr helped me created this solution to my problems.
t1<-with(mydata ,aggregate(mydata$Y,
list(mydata$time,mydata$treatment, mydata$expREP, mydata$techREP) ,
median, na.rm=T)) ### find median by factors ####
colnames(t1)<-c("time","treatment","expREP","techREP","Y50")
### column name ##
newdata<-merge(mydata, t1, by.x= names(mydata)[2:5],
by.y=names(t1)[1:4], all=T)
Thank you,
Alan
###############################################################
Message: 97
Date: Thu, 08 Mar 2007 08:00:53 +0100
From: "Petr Pikal" <petr.pikal at precheza.cz>
Subject: Re: [R] how to "apply" functions to unbalanced data in long
format by factors......cant get "by" or
"aggregate" to work
To: "ALAN SMITH" <alansmith2 at gmail.com>, r-help at
stat.math.ethz.ch
Message-ID: <45EFC2B5.29775.2FD750 at localhost>
Content-Type: text/plain; charset=US-ASCII
Hi
you can use aggregate to create table of medians
with(mydata, aggregate(Y, list(time, tratment, expRep,....), median)
repeats of unique factors
either by rle or aggregate with length function
Then you can do replication by
norep <- rep(your.median, each = your replicates)
Regards
Petr
submitted question abrigded> Hello R users,
> #####Example data frame######
> mydata<-as.data.frame(structure(list(cpdID = c(7, 7, 7, 7, 7, 7, 8, 8,
> 8, 8, 8, 8, 8, 8, 8, 8, 8, 8, 8, 10, 10, 10, 10, 10, 10, 10, 10, 10,
> 10, 10, 10, 10, 10, 10, 10, 10, 10, 10, 10, 10, 10, 10, 10, 10, 10,
> 10, 10, 10, 10, 10, 10, 19, 19, 19, 19, 19, 19, 23, 23, 23, 23, 23,
> 23, 23, 23, 33, 33, 33, 33, 33, 33, 33, 33, 33, 33, 33, 33, 33, 33,
> 33, 33, 33, 33, 33, 33, 33, 33, 33, 33, 33, 33, 33, 33, 33, 33, 33,
> 33, 33, 40, 40, 40, 40, 40, 40, 40, 40, 40, 40, 40, 40, 40, 40, 40,
> 40, 40, 40, 40, 40, 40, 40, 40, 40, 40, 40, 40, 40, 40, 40, 40, 40,
> 40, 40, 40, 40, 40, 40, 42, 42, 42, 42, 42, 42, 42, 42, 42, 42, 42,
> 42, 42, 42, 42, 42, 42, 42, 42, 42, 42, 42, 42, 42, 42, 42, 42, 42,
> 42, 42, 42, 42, 42, 42, 42, 42, 42, 42, 42, 42, 42, 42, 42, 47, 47,
> 47, 47, 47, 47, 47, 47, 47, 47, 47, 47, 47, 47, 47, 47, 47, 47, 47,
> 47, 47), time = structure(as.integer(c(1, 1, 1, 1, 2, 2, 2, 1, 1, 1,
> 1, 1, 2, 2, 2, 2, 2, 2, 2, 1, 1, 2, 1, 2, 2, 1, 1, 2, 2, 1, 2, 2, 2,
> 1, 2, 1, 2, 2, 2, 2, 1, 2, 2, 1, 2, 1, 2, 2, 1, 1, 1, 2, 2, 2, 2, 2,
> 2, 1, 2, 1, 2, 2, 2, 2, 2, 1, 1, 1, 1, 1, 2, 2, 2, 2, 2, 1, 2, 1, 1,
> 2, 1, 2, 2, 2, 2, 2, 2, 2, 2, 2, 1, 1, 2, 2, 1, 1, 1, 1, 2, 1, 1, 1,
> 1, 1, 1, 2, 2, 2, 2, 1, 2, 1, 2, 2, 1, 2, 1, 2, 2, 1, 1, 2, 2, 2, 2,
> 2, 2, 2, 1, 1, 1, 2, 2, 2, 2, 2, 2, 1, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2,
> 1, 2, 2, 2, 2, 2, 2, 2, 2, 1, 2, 2, 2, 2, 2, 1, 2, 2, 2, 2, 1, 1, 1,
> 1, 1, 2, 1, 1, 1, 1, 1, 1, 1, 1, 2, 2, 1, 2, 2, 2, 2, 2, 2, 2, 1, 2,
> 2, 1, 2, 2, 1, 2)), .Label = c("120hr", "24hr"), class
= "factor"),
> treatment = structure(as.integer(c(1, 1, 1, 2, 2, 1, 1, 2,
> 1, 1, 2, 2, 1, 2, 2, 1, 2, 2, 2, 1, 2, 1, 2, 2, 1, 2, 1,
> 2, 2, 2, 2, 2, 1, 1, 1, 1, 2, 2, 2, 1, 2, 1, 1, 2, 2, 2,
> 1, 1, 1, 1, 1, 2, 2, 1, 1, 2, 1, 2, 1, 1, 1, 2, 2, 1, 1,
> 1, 1, 2, 2, 2, 2, 1, 1, 2, 2, 2, 2, 2, 1, 1, 1, 2, 1, 2,
> 2, 1, 1, 2, 1, 1, 2, 2, 1, 2, 2, 1, 1, 1, 2, 1, 2, 2, 2,
> 1, 1, 1, 1, 2, 2, 1, 2, 1, 2, 1, 2, 1, 2, 1, 1, 2, 2, 2,
> 2, 2, 2, 1, 1, 1, 1, 1, 2, 2, 1, 1, 2, 1, 2, 2, 2, 2, 1,
> 1, 2, 2, 1, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 1, 2, 2,
> 2, 2, 1, 2, 1, 1, 1, 1, 2, 2, 2, 2, 2, 1, 2, 1, 1, 1, 1,
> 2, 2, 1, 1, 1, 2, 2, 2, 2, 2, 2, 1, 2, 2, 2, 2, 1, 1, 1,
> 2, 1)), .Label = c("control", "trt"), class =
"factor"),
> expREP = structure(as.integer(c(1, 1, 1, 3, 1, 1, 1, 1, 2,
> 2, 1, 1, 2, 2, 1, 3, 1, 3, 3, 3, 1, 2, 1, 2, 2, 2, 2, 3,
> 3, 2, 2, 1, 2, 3, 3, 1, 1, 2, 3, 1, 3, 3, 3, 3, 1, 3, 1,
> 1, 2, 1, 1, 2, 3, 2, 2, 1, 3, 2, 2, 2, 3, 2, 1, 2, 2, 2,
> 2, 1, 1, 1, 3, 2, 2, 3, 3, 2, 2, 2, 3, 2, 3, 2, 3, 1, 2,
> 3, 3, 1, 1, 1, 3, 3, 1, 1, 3, 1, 1, 1, 1, 1, 3, 3, 3, 1,
> 1, 1, 2, 3, 2, 2, 3, 2, 2, 2, 1, 1, 1, 3, 3, 2, 2, 2, 1,
> 3, 1, 2, 3, 1, 3, 3, 1, 2, 3, 1, 2, 1, 3, 1, 3, 3, 2, 2,
> 2, 1, 2, 2, 2, 1, 2, 2, 2, 2, 1, 2, 2, 2, 1, 3, 1, 1, 1,
> 1, 3, 3, 1, 1, 1, 1, 3, 3, 3, 3, 3, 1, 3, 1, 1, 1, 1, 3,
> 3, 1, 1, 1, 3, 2, 1, 1, 2, 1, 3, 2, 1, 2, 1, 3, 1, 1, 2,
> 3)), .Label = c("expREP1", "expREP2",
"expREP3"), class > "factor"), techREP =
structure(as.integer(c(3, 2, 1, 1, 1, 3, 1,
> 3, 3, 2, 2, 1, 1, 3, 2, 3, 3, 1, 2, 1, 2, 1, 3, 1, 3, 2, 2, 3, 1,
> 1, 3, 3, 2, 3, 3, 3, 2, 2, 2, 2, 1, 1, 2, 3, 1, 2, 3, 1, 3, 2, 1,
> 1, 2, 2, 3, 3, 3, 2, 1, 2, 1, 2, 3, 2, 3, 2, 3, 1, 3, 2, 2, 1, 2,
> 1, 3, 2, 2, 1, 1, 3, 3, 3, 1, 3, 1, 3, 2, 2, 2, 1, 2, 1, 3, 1, 3,
> 2, 1, 3, 1, 3, 2, 3, 1, 1, 2, 3, 1, 1, 3, 3, 2, 2, 1, 2, 3, 2, 2,
> 3, 2, 2, 1, 2, 2, 3, 3, 3, 1, 1, 1, 3, 1, 1, 2, 3, 2, 3, 3, 1, 1,
> 3, 2, 3, 3, 3, 3, 1, 3, 2, 1, 3, 3, 1, 3, 2, 1, 2, 2, 1, 2, 1, 2,
> 1, 1, 1, 2, 2, 3, 3, 1, 2, 3, 2, 3, 3, 3, 1, 2, 2, 1, 3, 2, 3, 3,
> 2, 2, 2, 3, 2, 1, 3, 1, 3, 1, 3, 1, 1, 1, 2, 2, 3)), .Label >
c("techREP1", "techREP2", "techREP3"), class =
"factor"), log2Abun
> = c(14.4233129144089, 14.052822741429, 14.2281422686467,
> 13.8492096005693, 14.076481601207, 14.2139395740777,
> 14.3399195756207, 14.3625602954496, 14.0141948668145,
> 14.0980320829605, 14.3152203363759, 14.4528846974866,
> 13.9591869268449, 14.4064043323413, 14.0403753485321,
> 14.2285932517829, 14.1259784261721, 13.5925738310379,
> 13.5830827675029, 13.0280787227049, 15.0198078807043,
> 12.8423503434138, 12.645883554519, 13.4644181177386,
> 12.8399910705399, 12.7879025593914, 12.4978518369511,
> 14.3949985145017, 12.8670856466168, 12.9749522735341,
> 13.3456824481868, 13.4557125040673, 12.8989792046225,
> 16.0609491915918, 13.6795900568273, 16.456466720182,
> 13.6145948287653, 13.2604785448039, 14.8573006848798,
> 13.1382718001722, 13.690761908446, 14.0557060971613,
> 13.7495552174335, 13.6336764098923, 13.7844303674846,
> 15.9518993688317, 13.2452555803066, 13.1930632791304,
> 12.1919845133603, 13.8710388986595, 13.6375305515253,
> 12.5919897676151, 17.4797250127015, 17.4014712120155,
> 17.5948202702163, 12.6031626795344, 17.8287811089804,
> 11.3613955331659, 15.8064741020529, 15.1007855146758,
> 16.0553036215393, 15.7553570530353, 15.9747058600332,
> 15.776715745005, 15.8588066550904, 16.2935434944118,
> 16.271207673964, 16.3660489506706, 16.3273070282017,
> 15.7632383068689, 14.6030467398838, 14.7118820283521,
> 14.7577545959238, 14.7315311764619, 14.8250084466403,
> 15.6652803936783, 15.8249587405285, 15.6558660906456,
> 15.5387042614836, 14.8487696278309, 15.5477380355109,
> 15.9451465974129, 16.196755792715, 15.9999119421954,
> 15.8660714836595, 15.9406577104549, 15.8754613979164,
> 16.0358944927638, 16.1785092456522, 16.1992122284106,
> 15.8087128474547, 15.9373968104322, 16.1432636222427,
> 16.2412011305004, 15.9488234774507, 15.7820255767261,
> 15.7730361533934, 15.7459893802453, 20.7777738189812,
> 21.7489122647969, 21.0374490930058, 20.9765158780184,
> 21.0464959041766, 21.6790715518273, 21.8021013715842,
> 20.7652083875471, 20.6663696521617, 20.3963413756589,
> 20.7983642126234, 20.1864915044977, 20.4422216681915,
> 20.59064186918, 20.6964531077756, 20.6822196619653,
> 20.4532414913665, 20.8126113450884, 20.4397608946311,
> 21.4603719009067, 21.5318145314919, 21.0400816517662,
> 21.0466431076593, 20.7459819969019, 20.6723053403015,
> 20.4793421418014, 20.6432035537608, 20.6831942471622,
> 21.6913537667357, 20.6562913013787, 21.0940693071186,
> 20.9473294479256, 20.5087271424267, 16.0871520250047,
> 16.3816612332698, 16.998645516939, 15.7912392142223,
> 14.5058735666446, 13.6035104425928, 14.4369066987207,
> 14.6998435295626, 14.6818972267862, 14.1086877961546,
> 14.3539049235617, 15.40862828087, 15.0657947671893,
> 14.8615716011254, 14.5538692431961, 14.2397476835569,
> 13.4381420777437, 13.4499224158638, 13.6887966810545,
> 14.6550275257018, 13.500966330283, 14.9271297886953,
> 14.7405186421119, 15.0047910398043, 14.7051463678038,
> 14.8325933769599, 12.9854861991046, 13.4203550220891,
> 15.399010832952, 15.4064707685293, 15.0953970227926,
> 15.0712109416537, 15.7587957644032, 15.0013202225009,
> 15.7608498673217, 14.7604080920677, 14.2478533598602,
> 14.4140245098782, 14.7936541075062, 14.7684428120549,
> 14.595607155062, 16.1507389488284, 16.4915712924337,
> 14.490161446684, 14.721633263063, 14.4341721012904,
> 15.8747652729112, 14.543333961671, 14.8633635585377,
> 14.6696601802386, 13.3020676725265, 14.0190694293311,
> 15.2168973938334, 12.6304946615056, 12.1972166931101,
> 12.7960396088298, 14.4285564621952, 14.5308330346953,
> 14.1496677436943, 14.0823985634278, 12.8407779235951,
> 14.6543003749437, 14.3202364452416, 15.1723493709662,
> 14.0744760007345, 14.8132801684508, 12.9183042336999,
> 14.5225202325766, 13.742309436084)), .Names = c("cpdID",
"time",
> "treatment",
> "expREP", "techREP", "Y")))
hadley wickham
2007-Mar-08 18:30 UTC
[R] how to "apply" functions to unbalanced data in long format by factors......cant get "by" or "aggregate" to work
> Hello R-users > The help I received from Petr helped me created this solution to my problems. > > t1<-with(mydata ,aggregate(mydata$Y, > list(mydata$time,mydata$treatment, mydata$expREP, mydata$techREP) , > median, na.rm=T)) ### find median by factors #### > > colnames(t1)<-c("time","treatment","expREP","techREP","Y50") ### column name ## > > newdata<-merge(mydata, t1, by.x= names(mydata)[2:5], > by.y=names(t1)[1:4], all=T) >Another way is to use the reshape package, http://had.co.nz/reshape library(reshape) molten <- melt(mydata, m="log2Abun") cast(molten, time + treatment + expREP + techREP ~ ., median) # You can also create many other "shapes" easily: cast(molten, expREP + techREP ~ time + treatment , median) cast(molten, expREP + techREP ~ time + treatment , median, margins=TRUE) Hadley