ALAN SMITH
2007-Mar-07 23:25 UTC
[R] how to "apply" functions to unbalanced data in long format by factors......cant get "by" or "aggregate" to work
Hello R users,

Problem: I do not understand how to use "aggregate", "by", or the appropriate "apply" function to perform a calculation on unbalanced data with more than one factor.

I have a data frame in long format that does not contain balanced data. The ID is a unique identifier corresponding to the experimental unit that will later be examined by ANOVA, t-tests, etc. Y is the data generated from the experiment. The factors represent the differences between each sample or "run" measured.

str(mydata) ### sample of table at bottom of email ###
'data.frame': 129982 obs. of 6 variables:
 $ ID       : num 7 7 7 7 7 7 8 8 8 8 ...
 $ time     : Factor w/ 2 levels "120hr","24hr": 1 1 1 1 2 2 2 1 1 1 ...
 $ treatment: Factor w/ 2 levels "control","trt": 1 1 1 2 2 1 1 2 1 1 ...
 $ expREP   : Factor w/ 3 levels "expREP1","expREP2",..: 1 1 1 3 1 1 1 1 2 2 ...
 $ techREP  : Factor w/ 3 levels "techREP1","techREP2",..: 3 2 1 1 1 3 1 3 3 2 ...
 $ Y        : num 14.4 14.1 14.2 13.8 14.1 ...

Could someone please help with doing something like the following:

1. I would like to find the median for each unique combination of factors using the data in the long format (like finding the median of a single column of data).
2. Create a new column where the median is repeated for the number of rows of the unique factor combination.
3. I would like to learn the most efficient way to do this, because I want to avoid recreating the table from scratch with many commands like the series below. I will have to perform this operation on many different data sets, some with many more factors than this example.

### help me learn to use an apply or other command that will do the following #####
m0<-mydata$cpdID[mydata$time=="24hr" & mydata$treatment=="control" & mydata$expREP=="expREP1" & mydata$techREP=="techREP1"]
m1<-mydata$Y[mydata$time=="24hr" & mydata$treatment=="control" & mydata$expREP=="expREP1" & mydata$techREP=="techREP1"]
m2<-median(m1)
m3<-cbind(ID=m0,time=rep("24hr",length(m1)),
  treatment=rep("control",length(m1)), expREP=rep("expREP1",length(m1)),
  techREP=rep("techREP1",length(m1)),Y=m1,Y50=rep(m2,length(m1)))
######### I would like to avoid writing the above hundreds of times ######

I am able to reshape into wide format and then find the column medians. However, restacking the data and regenerating the factors becomes very messy on data sets with 150 columns. I am able to perform this analysis in SAS easily using BY, but I would like to know how to do it in R.

I have tried these commands in a number of different variations with no luck and similar error messages:

test1<-aggregate(mydata[,-1], list(mydata$time,mydata$treatment,mydata$expREP,mydata$techREP), median, na.rm=T)
Error in median.default(X[[1]], ...) : need numeric data ### Y is numeric ###

test1<-by(mydata[,-1], list(mydata$time,mydata$treatment,mydata$expREP,mydata$techREP), median, na.rm=T)
Error in median.default(data[x, ], ...)
: need numeric data Thanks Alan winXP R 2.4.1 #####Example data frame###### mydata<-as.data.frame(structure(list(cpdID = c(7, 7, 7, 7, 7, 7, 8, 8, 8, 8, 8, 8, 8, 8, 8, 8, 8, 8, 8, 10, 10, 10, 10, 10, 10, 10, 10, 10, 10, 10, 10, 10, 10, 10, 10, 10, 10, 10, 10, 10, 10, 10, 10, 10, 10, 10, 10, 10, 10, 10, 10, 19, 19, 19, 19, 19, 19, 23, 23, 23, 23, 23, 23, 23, 23, 33, 33, 33, 33, 33, 33, 33, 33, 33, 33, 33, 33, 33, 33, 33, 33, 33, 33, 33, 33, 33, 33, 33, 33, 33, 33, 33, 33, 33, 33, 33, 33, 33, 40, 40, 40, 40, 40, 40, 40, 40, 40, 40, 40, 40, 40, 40, 40, 40, 40, 40, 40, 40, 40, 40, 40, 40, 40, 40, 40, 40, 40, 40, 40, 40, 40, 40, 40, 40, 40, 40, 42, 42, 42, 42, 42, 42, 42, 42, 42, 42, 42, 42, 42, 42, 42, 42, 42, 42, 42, 42, 42, 42, 42, 42, 42, 42, 42, 42, 42, 42, 42, 42, 42, 42, 42, 42, 42, 42, 42, 42, 42, 42, 42, 47, 47, 47, 47, 47, 47, 47, 47, 47, 47, 47, 47, 47, 47, 47, 47, 47, 47, 47, 47, 47), time = structure(as.integer(c(1, 1, 1, 1, 2, 2, 2, 1, 1, 1, 1, 1, 2, 2, 2, 2, 2, 2, 2, 1, 1, 2, 1, 2, 2, 1, 1, 2, 2, 1, 2, 2, 2, 1, 2, 1, 2, 2, 2, 2, 1, 2, 2, 1, 2, 1, 2, 2, 1, 1, 1, 2, 2, 2, 2, 2, 2, 1, 2, 1, 2, 2, 2, 2, 2, 1, 1, 1, 1, 1, 2, 2, 2, 2, 2, 1, 2, 1, 1, 2, 1, 2, 2, 2, 2, 2, 2, 2, 2, 2, 1, 1, 2, 2, 1, 1, 1, 1, 2, 1, 1, 1, 1, 1, 1, 2, 2, 2, 2, 1, 2, 1, 2, 2, 1, 2, 1, 2, 2, 1, 1, 2, 2, 2, 2, 2, 2, 2, 1, 1, 1, 2, 2, 2, 2, 2, 2, 1, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 1, 2, 2, 2, 2, 2, 2, 2, 2, 1, 2, 2, 2, 2, 2, 1, 2, 2, 2, 2, 1, 1, 1, 1, 1, 2, 1, 1, 1, 1, 1, 1, 1, 1, 2, 2, 1, 2, 2, 2, 2, 2, 2, 2, 1, 2, 2, 1, 2, 2, 1, 2)), .Label = c("120hr", "24hr"), class = "factor"), treatment = structure(as.integer(c(1, 1, 1, 2, 2, 1, 1, 2, 1, 1, 2, 2, 1, 2, 2, 1, 2, 2, 2, 1, 2, 1, 2, 2, 1, 2, 1, 2, 2, 2, 2, 2, 1, 1, 1, 1, 2, 2, 2, 1, 2, 1, 1, 2, 2, 2, 1, 1, 1, 1, 1, 2, 2, 1, 1, 2, 1, 2, 1, 1, 1, 2, 2, 1, 1, 1, 1, 2, 2, 2, 2, 1, 1, 2, 2, 2, 2, 2, 1, 1, 1, 2, 1, 2, 2, 1, 1, 2, 1, 1, 2, 2, 1, 2, 2, 1, 1, 1, 2, 1, 2, 2, 2, 1, 1, 1, 1, 2, 2, 1, 2, 1, 2, 1, 2, 1, 2, 1, 1, 2, 2, 2, 2, 2, 2, 1, 
1, 1, 1, 1, 2, 2, 1, 1, 2, 1, 2, 2, 2, 2, 1, 1, 2, 2, 1, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 1, 2, 2, 2, 2, 1, 2, 1, 1, 1, 1, 2, 2, 2, 2, 2, 1, 2, 1, 1, 1, 1, 2, 2, 1, 1, 1, 2, 2, 2, 2, 2, 2, 1, 2, 2, 2, 2, 1, 1, 1, 2, 1)), .Label = c("control", "trt"), class = "factor"), expREP = structure(as.integer(c(1, 1, 1, 3, 1, 1, 1, 1, 2, 2, 1, 1, 2, 2, 1, 3, 1, 3, 3, 3, 1, 2, 1, 2, 2, 2, 2, 3, 3, 2, 2, 1, 2, 3, 3, 1, 1, 2, 3, 1, 3, 3, 3, 3, 1, 3, 1, 1, 2, 1, 1, 2, 3, 2, 2, 1, 3, 2, 2, 2, 3, 2, 1, 2, 2, 2, 2, 1, 1, 1, 3, 2, 2, 3, 3, 2, 2, 2, 3, 2, 3, 2, 3, 1, 2, 3, 3, 1, 1, 1, 3, 3, 1, 1, 3, 1, 1, 1, 1, 1, 3, 3, 3, 1, 1, 1, 2, 3, 2, 2, 3, 2, 2, 2, 1, 1, 1, 3, 3, 2, 2, 2, 1, 3, 1, 2, 3, 1, 3, 3, 1, 2, 3, 1, 2, 1, 3, 1, 3, 3, 2, 2, 2, 1, 2, 2, 2, 1, 2, 2, 2, 2, 1, 2, 2, 2, 1, 3, 1, 1, 1, 1, 3, 3, 1, 1, 1, 1, 3, 3, 3, 3, 3, 1, 3, 1, 1, 1, 1, 3, 3, 1, 1, 1, 3, 2, 1, 1, 2, 1, 3, 2, 1, 2, 1, 3, 1, 1, 2, 3)), .Label = c("expREP1", "expREP2", "expREP3"), class = "factor"), techREP = structure(as.integer(c(3, 2, 1, 1, 1, 3, 1, 3, 3, 2, 2, 1, 1, 3, 2, 3, 3, 1, 2, 1, 2, 1, 3, 1, 3, 2, 2, 3, 1, 1, 3, 3, 2, 3, 3, 3, 2, 2, 2, 2, 1, 1, 2, 3, 1, 2, 3, 1, 3, 2, 1, 1, 2, 2, 3, 3, 3, 2, 1, 2, 1, 2, 3, 2, 3, 2, 3, 1, 3, 2, 2, 1, 2, 1, 3, 2, 2, 1, 1, 3, 3, 3, 1, 3, 1, 3, 2, 2, 2, 1, 2, 1, 3, 1, 3, 2, 1, 3, 1, 3, 2, 3, 1, 1, 2, 3, 1, 1, 3, 3, 2, 2, 1, 2, 3, 2, 2, 3, 2, 2, 1, 2, 2, 3, 3, 3, 1, 1, 1, 3, 1, 1, 2, 3, 2, 3, 3, 1, 1, 3, 2, 3, 3, 3, 3, 1, 3, 2, 1, 3, 3, 1, 3, 2, 1, 2, 2, 1, 2, 1, 2, 1, 1, 1, 2, 2, 3, 3, 1, 2, 3, 2, 3, 3, 3, 1, 2, 2, 1, 3, 2, 3, 3, 2, 2, 2, 3, 2, 1, 3, 1, 3, 1, 3, 1, 1, 1, 2, 2, 3)), .Label = c("techREP1", "techREP2", "techREP3"), class = "factor"), log2Abun = c(14.4233129144089, 14.052822741429, 14.2281422686467, 13.8492096005693, 14.076481601207, 14.2139395740777, 14.3399195756207, 14.3625602954496, 14.0141948668145, 14.0980320829605, 14.3152203363759, 14.4528846974866, 13.9591869268449, 14.4064043323413, 14.0403753485321, 14.2285932517829, 14.1259784261721, 
13.5925738310379, 13.5830827675029, 13.0280787227049, 15.0198078807043, 12.8423503434138, 12.645883554519, 13.4644181177386, 12.8399910705399, 12.7879025593914, 12.4978518369511, 14.3949985145017, 12.8670856466168, 12.9749522735341, 13.3456824481868, 13.4557125040673, 12.8989792046225, 16.0609491915918, 13.6795900568273, 16.456466720182, 13.6145948287653, 13.2604785448039, 14.8573006848798, 13.1382718001722, 13.690761908446, 14.0557060971613, 13.7495552174335, 13.6336764098923, 13.7844303674846, 15.9518993688317, 13.2452555803066, 13.1930632791304, 12.1919845133603, 13.8710388986595, 13.6375305515253, 12.5919897676151, 17.4797250127015, 17.4014712120155, 17.5948202702163, 12.6031626795344, 17.8287811089804, 11.3613955331659, 15.8064741020529, 15.1007855146758, 16.0553036215393, 15.7553570530353, 15.9747058600332, 15.776715745005, 15.8588066550904, 16.2935434944118, 16.271207673964, 16.3660489506706, 16.3273070282017, 15.7632383068689, 14.6030467398838, 14.7118820283521, 14.7577545959238, 14.7315311764619, 14.8250084466403, 15.6652803936783, 15.8249587405285, 15.6558660906456, 15.5387042614836, 14.8487696278309, 15.5477380355109, 15.9451465974129, 16.196755792715, 15.9999119421954, 15.8660714836595, 15.9406577104549, 15.8754613979164, 16.0358944927638, 16.1785092456522, 16.1992122284106, 15.8087128474547, 15.9373968104322, 16.1432636222427, 16.2412011305004, 15.9488234774507, 15.7820255767261, 15.7730361533934, 15.7459893802453, 20.7777738189812, 21.7489122647969, 21.0374490930058, 20.9765158780184, 21.0464959041766, 21.6790715518273, 21.8021013715842, 20.7652083875471, 20.6663696521617, 20.3963413756589, 20.7983642126234, 20.1864915044977, 20.4422216681915, 20.59064186918, 20.6964531077756, 20.6822196619653, 20.4532414913665, 20.8126113450884, 20.4397608946311, 21.4603719009067, 21.5318145314919, 21.0400816517662, 21.0466431076593, 20.7459819969019, 20.6723053403015, 20.4793421418014, 20.6432035537608, 20.6831942471622, 21.6913537667357, 20.6562913013787, 
21.0940693071186, 20.9473294479256, 20.5087271424267, 16.0871520250047, 16.3816612332698, 16.998645516939, 15.7912392142223, 14.5058735666446, 13.6035104425928, 14.4369066987207, 14.6998435295626, 14.6818972267862, 14.1086877961546, 14.3539049235617, 15.40862828087, 15.0657947671893, 14.8615716011254, 14.5538692431961, 14.2397476835569, 13.4381420777437, 13.4499224158638, 13.6887966810545, 14.6550275257018, 13.500966330283, 14.9271297886953, 14.7405186421119, 15.0047910398043, 14.7051463678038, 14.8325933769599, 12.9854861991046, 13.4203550220891, 15.399010832952, 15.4064707685293, 15.0953970227926, 15.0712109416537, 15.7587957644032, 15.0013202225009, 15.7608498673217, 14.7604080920677, 14.2478533598602, 14.4140245098782, 14.7936541075062, 14.7684428120549, 14.595607155062, 16.1507389488284, 16.4915712924337, 14.490161446684, 14.721633263063, 14.4341721012904, 15.8747652729112, 14.543333961671, 14.8633635585377, 14.6696601802386, 13.3020676725265, 14.0190694293311, 15.2168973938334, 12.6304946615056, 12.1972166931101, 12.7960396088298, 14.4285564621952, 14.5308330346953, 14.1496677436943, 14.0823985634278, 12.8407779235951, 14.6543003749437, 14.3202364452416, 15.1723493709662, 14.0744760007345, 14.8132801684508, 12.9183042336999, 14.5225202325766, 13.742309436084)), .Names = c("cpdID", "time", "treatment", "expREP", "techREP", "Y")))
jim holtman
2007-Mar-08 00:16 UTC
[R] how to "apply" functions to unbalanced data in long format by factors......cant get "by" or "aggregate" to work
Here is one way of doing it:

> # create the rows for each unique combination
> x.split <- split(seq(nrow(mydata)), list(mydata$time, mydata$treatment,
+     mydata$expREP, mydata$techREP), drop=TRUE)
> # now go through the list of indices and add the median
> mydata$Y50 <- 0  # add the dummy median column
> for (i in x.split){
+     mydata$Y50[i] <- median(mydata$Y[i])  # median for each group
+ }
> head(mydata,20)
   cpdID  time treatment  expREP  techREP        Y      Y50
1      7 120hr   control expREP1 techREP3 14.42331 15.74599
2      7 120hr   control expREP1 techREP2 14.05282 15.10810
3      7 120hr   control expREP1 techREP1 14.22814 14.63248
4      7 120hr       trt expREP3 techREP1 13.84921 15.08641
5      7  24hr       trt expREP1 techREP1 14.07648 15.17235
6      7  24hr   control expREP1 techREP3 14.21394 14.63314
7      8  24hr   control expREP1 techREP1 14.33992 14.81328
8      8 120hr       trt expREP1 techREP3 14.36256 15.34493
9      8 120hr   control expREP2 techREP3 14.01419 15.14270
10     8 120hr   control expREP2 techREP2 14.09803 15.10079
11     8 120hr       trt expREP1 techREP2 14.31522 15.39152
12     8 120hr       trt expREP1 techREP1 14.45288 14.65430
13     8  24hr   control expREP2 techREP1 13.95919 14.71188
14     8  24hr       trt expREP2 techREP3 14.40640 14.36332
15     8  24hr       trt expREP1 techREP2 14.04038 14.42856
16     8  24hr   control expREP3 techREP3 14.22859 15.08463
17     8  24hr       trt expREP1 techREP3 14.12598 14.53840
18     8  24hr       trt expREP3 techREP1 13.59257 14.69984
19     8  24hr       trt expREP3 techREP2 13.58308 14.85730
20    10 120hr   control expREP3 techREP1 13.02808 14.07448

-- 
Jim Holtman
Cincinnati, OH

What is the problem you are trying to solve?
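The loop above can also be written as a single call to ave(), which applies a function within each combination of grouping factors and returns a vector aligned with the original rows. What follows is only a minimal sketch on a small made-up data frame (the name toy and its values are hypothetical, chosen so the example runs on its own):

```r
# Hypothetical stand-in for mydata: two factors, unbalanced group sizes
toy <- data.frame(
  time      = factor(c("24hr", "24hr", "24hr", "24hr", "120hr", "120hr", "120hr")),
  treatment = factor(c("control", "control", "control", "trt", "control", "trt", "trt")),
  Y         = c(1, 3, 5, 10, 7, 4, 8)
)

# ave() computes FUN within each time x treatment group and puts the
# result back in the position of every row belonging to that group.
# FUN must be passed by name, because the unnamed arguments are the factors.
toy$Y50 <- ave(toy$Y, toy$time, toy$treatment, FUN = median)

print(toy)
```

On the poster's data the equivalent call would presumably be mydata$Y50 <- ave(mydata$Y, mydata$time, mydata$treatment, mydata$expREP, mydata$techREP, FUN = median); further factors can simply be appended before FUN.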
Mendiburu, Felipe (CIP)
2007-Mar-08 01:51 UTC
[R] how to "apply" functions to unbalanced data in long format by factors......cant get "by" or "aggregate" to work
Dear Alan,

I think the function tapply.stat() from the agricolae package might be useful here. See ?tapply.stat

Regards,
Felipe.

For example:

library(agricolae)
attach(mydata)
set1<-tapply.stat(mydata[,2:5],Y,median)
set2<-tapply.stat(time,Y,function(x) median(x))
set3<-tapply.stat(mydata[,c(2,3)],Y,function(x) median(x))

set2
   time        Y
1 120hr 14.94159
2  24hr 14.81914

set3
   time treatment        Y
1 120hr   control 15.31974
2 120hr       trt 14.82851
3  24hr   control 15.03627
4  24hr       trt 14.70249
Petr Pikal
2007-Mar-08 07:00 UTC
[R] how to "apply" functions to unbalanced data in long format by factors......cant get "by" or "aggregate" to work
Hi

You can use aggregate to create a table of medians:

  with(mydata, aggregate(Y, list(time, treatment, expREP, ...), median))

The number of repeats of each unique factor combination can be found either with rle or with aggregate and the length function. Then you can do the replication with

  norep <- rep(your.median, times = your.replicates)

Regards
Petr

Petr Pikal
petr.pikal at precheza.cz
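A minimal sketch of the steps Petr describes, using a small made-up data frame (`toy`, hypothetical values) rather than the posted data. Passing only the numeric column to aggregate is what avoids the "need numeric data" error from the original post, where mydata[,-1] still contained the factor columns. The rep() step only lines up with the data if the rows are sorted by group, so the sketch also shows ave(), a base R function that is not mentioned in the thread and is offered here only as an assumption about what may be convenient: it returns the group statistic already repeated in the original row order.

```r
# Toy data (hypothetical values) with unbalanced groups
toy <- data.frame(
  time      = factor(c("24hr", "24hr", "24hr", "120hr", "120hr")),
  treatment = factor(c("control", "control", "trt", "trt", "trt")),
  Y         = c(1, 3, 10, 4, 8)
)

# Median of Y for each factor combination that actually occurs
meds <- with(toy, aggregate(Y, list(time = time, treatment = treatment), median))

# Number of rows in each of those combinations
counts <- with(toy, aggregate(Y, list(time = time, treatment = treatment), length))

# One-step alternative: group median repeated for every row, in row order
toy$Y50 <- ave(toy$Y, toy$time, toy$treatment, FUN = median)
```

With more factors, the extra grouping columns are simply added to the list passed to aggregate, or as further arguments to ave before FUN.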
ALAN SMITH
2007-Mar-08 17:34 UTC
[R] how to "apply" functions to unbalanced data in long format by factors......cant get "by" or "aggregate" to work
Hello R-users

The help I received from Petr helped me create this solution to my problem.

  ### find median by factors ###
  t1 <- with(mydata, aggregate(Y, list(time, treatment, expREP, techREP), median, na.rm = TRUE))

  ### column names ###
  colnames(t1) <- c("time", "treatment", "expREP", "techREP", "Y50")

  newdata <- merge(mydata, t1, by.x = names(mydata)[2:5], by.y = names(t1)[1:4], all = TRUE)

Thank you,
Alan
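The aggregate-then-merge pattern above can be tried on a few rows to see what it produces. This is only a sketch on a made-up data frame (`toy`, hypothetical values); selecting the grouping columns by name instead of by position is an assumption about what is easier to reuse across data sets with different numbers of factors.

```r
# Toy data (hypothetical values) with unbalanced groups
toy <- data.frame(
  cpdID     = c(7, 7, 8, 8, 8),
  time      = factor(c("24hr", "24hr", "24hr", "120hr", "120hr")),
  treatment = factor(c("control", "control", "trt", "trt", "trt")),
  Y         = c(1, 3, 10, 4, 8)
)

# Grouping columns, referred to by name
keys <- c("time", "treatment")

# One row per occurring factor combination, holding the median of Y
t1 <- aggregate(toy["Y"], toy[keys], median, na.rm = TRUE)
names(t1)[names(t1) == "Y"] <- "Y50"

# Attach the group median to every original row
newdata <- merge(toy, t1, by = keys, all = TRUE)
```

Note that merge sorts the result by the key columns and moves them to the front, so the row order of newdata generally differs from that of the input.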
hadley wickham
2007-Mar-08 18:30 UTC
[R] how to "apply" functions to unbalanced data in long format by factors......cant get "by" or "aggregate" to work
> Hello R-users
> The help I received from Petr helped me created this solution to my problems.
>
> t1<-with(mydata ,aggregate(mydata$Y,
> list(mydata$time,mydata$treatment, mydata$expREP, mydata$techREP) ,
> median, na.rm=T)) ### find median by factors ####
>
> colnames(t1)<-c("time","treatment","expREP","techREP","Y50") ### column name ##
>
> newdata<-merge(mydata, t1, by.x= names(mydata)[2:5],
> by.y=names(t1)[1:4], all=T)

Another way is to use the reshape package, http://had.co.nz/reshape

  library(reshape)
  molten <- melt(mydata, measure.vars = "Y")
  cast(molten, time + treatment + expREP + techREP ~ ., median)

  # You can also create many other "shapes" easily:
  cast(molten, expREP + techREP ~ time + treatment, median)
  cast(molten, expREP + techREP ~ time + treatment, median, margins = TRUE)

Hadley
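For readers without the reshape package installed, a rough base R analogue of the row-by-column "shapes" can be sketched with tapply. This is not Hadley's method, only an approximation under the assumption that a plain matrix of medians is enough; the data frame `toy` and its values are made up for illustration.

```r
# Toy data (hypothetical values) with unbalanced groups
toy <- data.frame(
  time      = factor(c("24hr", "24hr", "24hr", "120hr", "120hr")),
  treatment = factor(c("control", "control", "trt", "trt", "trt")),
  Y         = c(1, 3, 10, 4, 8)
)

# Rows are levels of time, columns are levels of treatment,
# cells hold the median of Y; combinations with no data are NA
wide <- with(toy, tapply(Y, list(time = time, treatment = treatment), median))
```

Unlike cast, tapply keeps empty factor combinations as NA cells and does not compute margins.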