Hello R Community,

I don't post to these things often, so excuse me if I stumble on my forum etiquette. This is a complex problem for me, which may require two forum entries, but I will try my best to be concise. Also, I am a self-taught coder, so if my code is not to convention, your constructive criticism is always welcome.

I need to split up a data frame by participant (gpsarc - factor), status (testpo - factor), and date (LocalDate), then sum the distance (Dist_Bef_m - numeric) records for each date and average those daily sums across each status state for each participant. Each participant has several records for each date and for at least 1 of 3 possible status types (max 3). In the end, I want a table with participant number and the status states as my column headings, with the means for each status state under their appropriate heading (see example below). I am confident I made this way more complicated than it needs to be. I really appreciate any help you can offer.

Here is my relevant coding so far:

    library(plyr)  # provides ldply()

    s1 <- split(data[,c(4,10,20,42)], data$gpsarc)
    for(i in 1:length(s1))
      s1[[i]] <- split(s1[[i]], s1[[i]]$testpo)

    s2 <- vector("list", length(s1))
    for(i in 1:length(s2))
      s2[[i]] <- ldply(s1[[i]], function(x) {
        #### if one status state does not exist, but is still accounted for in
        #### the split sublist because it's a factor, I would get an error, so I
        #### added this if/else portion to remove those entries with no records
        if(nrow(x) == 0) {
          remove(x)
        } else {
          by(x, x[["LocalDate"]], function(x1) {
            sum(x1[["Dist_Bef_m"]])
          })
        }
      })

    s3 <- vector("list", length(s2))
    for(i in 1:length(s3))
      s3[[i]] <- data.frame(mean = apply(s2[[i]][,-1], 1, mean, na.rm=TRUE),
                            row.names = as.character(s2[[i]][,1]))

Here is a sample of the s3 result:

    [[1]]
          mean
    2  12533.2

    [[2]]
          mean
    2 26300.96
    3 25313.93

    [[3]]
          mean
    1 48489.15
    3 27398.23

    [[4]]
          mean
    1 34783.97

    [[5]]
          mean
    1 21293.19
    2 21962.41
    3 18272.67

##### I really want it to look like this:

    ppt         1         2         3
    1          NA   12533.2        NA
    2          NA  26300.96  25313.93
    3    48489.15        NA  27398.23
    4    34783.97        NA        NA
    5    21293.19  21962.41  18272.67
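For the last step only (turning the s3 list into the wanted table), here is a minimal sketch. It assumes s3 is a list of one-column data frames whose row names are the status codes, as in the sample output; the list is rebuilt by hand from those sample values, and the helper mk() and the names statuses and wide are made up for this illustration.

```r
# Hypothetical helper: one-column data frame with status codes as row names
mk <- function(status, mean) {
  d <- data.frame(mean = mean)
  rownames(d) <- status
  d
}

# s3 rebuilt by hand from the sample output (one element per participant)
s3 <- list(
  mk("2", 12533.2),
  mk(c("2", "3"), c(26300.96, 25313.93)),
  mk(c("1", "3"), c(48489.15, 27398.23)),
  mk("1", 34783.97),
  mk(c("1", "2", "3"), c(21293.19, 21962.41, 18272.67))
)

statuses <- c("1", "2", "3")

# Look up each status by exact row name; a missing status becomes NA
wide <- t(sapply(s3, function(d) d$mean[match(statuses, rownames(d))]))
dimnames(wide) <- list(ppt = seq_along(s3), status = statuses)
wide
```

match() is used rather than indexing the data frame by row name because character row indexing on a data frame allows partial matches, which could matter if status codes ever had more than one digit.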
Ivan Calandra
2012-Mar-21 16:28 UTC
[R] Forloop/ifelse program problem and list of dataframes
Hi Ian,

I haven't read your post in detail because you don't provide a reproducible example (see ?dput), but you might want to take a look at the aggregate() and doBy::summaryBy() functions.

HTH,
Ivan

--
Ivan CALANDRA
Université de Bourgogne
UMR CNRS/uB 6282 Biogéosciences
6 Boulevard Gabriel
21000 Dijon, FRANCE
+33(0)3.80.39.63.06
ivan.calandra at u-bourgogne.fr
http://biogeosciences.u-bourgogne.fr/calandra

On 21/03/12 16:44, Ian Craig wrote:
> Hello R Community,
> [...]
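To make the aggregate() suggestion concrete, here is a minimal sketch using base R only. The column names are the ones from the original post, but the data frame dat and all of its values are invented for illustration, so treat this as one possible approach under those assumptions rather than a tested solution for the real data.

```r
# Toy data with the column names from the original post; values are invented
dat <- data.frame(
  gpsarc     = factor(c(1, 1, 1, 1, 2, 2, 2)),
  testpo     = factor(c(1, 1, 1, 2, 2, 2, 2)),
  LocalDate  = as.Date(c("2012-01-01", "2012-01-01", "2012-01-02",
                         "2012-01-03", "2012-01-01", "2012-01-02",
                         "2012-01-02")),
  Dist_Bef_m = c(100, 200, 500, 50, 10, 20, 30)
)

# Step 1: total distance per participant, status and date
daily <- aggregate(Dist_Bef_m ~ gpsarc + testpo + LocalDate,
                   data = dat, FUN = sum)

# Step 2: mean of the daily totals per participant and status, as a wide
# table; participant/status combinations with no records come out as NA
wide <- with(daily,
             tapply(Dist_Bef_m, list(ppt = gpsarc, status = testpo), mean))
wide
```

Because the formula interface of aggregate() only returns combinations that actually occur, there is no need to guard against empty status groups, and tapply() fills the missing cells with NA on its own.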