Hello R Community,
I don't post to these things often so excuse me if I stumble on my forum
etiquette. This is a complex problem for me, which may require two forum
entries, but I will try my best to be concise. Also, I am a self-taught
coder, so if my code does not follow convention, your constructive criticism
is always welcome.
I need to split up a data frame by participant (gpsarc - factor), status
(testpo - factor), and date (LocalDate), then sum the distance
(Dist_Bef_m - numeric) records for each date and average those daily sums
within each status state for each participant. Each participant has several
records for each date, and has records for at least 1 of the 3 possible
status types. In the end, I want a table with one row per participant and
one column per status state, with the mean for each status state under its
heading (see example below). I am confident I made this way more
complicated than it needs to be. I really appreciate any help you can
offer.
Here is my relevant code so far:
library(plyr)  # provides ldply()

# Split by participant, then split each participant's records by status
s1 <- split(data[, c(4, 10, 20, 42)], data$gpsarc)
for (i in seq_along(s1))
  s1[[i]] <- split(s1[[i]], s1[[i]]$testpo)

# For each participant and status, sum the distance for each date
s2 <- vector("list", length(s1))
for (i in seq_along(s2))
  s2[[i]] <- ldply(s1[[i]], function(x) {
    # If a status state does not exist for a participant, it is still
    # present in the split sublist because testpo is a factor, and I would
    # get an error, so I added this if/else to skip entries with no records.
    if (nrow(x) == 0) {
      remove(x)
    } else {
      by(x, x[["LocalDate"]], function(x1) sum(x1[["Dist_Bef_m"]]))
    }
  })

# Average the daily sums for each status (one row per status)
s3 <- vector("list", length(s2))
for (i in seq_along(s3))
  s3[[i]] <- data.frame(mean = apply(s2[[i]][, -1], 1, mean, na.rm = TRUE),
                        row.names = as.character(s2[[i]][, 1]))
Here is a sample of the s3 result:
[[1]]
mean
2 12533.2
[[2]]
mean
2 26300.96
3 25313.93
[[3]]
mean
1 48489.15
3 27398.23
[[4]]
mean
1 34783.97
[[5]]
mean
1 21293.19
2 21962.41
3 18272.67
##### I really want it to look like this:
ppt         1         2         3
1          NA   12533.2        NA
2          NA  26300.96  25313.93
3    48489.15        NA  27398.23
4    34783.97        NA        NA
5    21293.19  21962.41  18272.67
Ivan Calandra
2012-Mar-21 16:28 UTC
[R] Forloop/ifelse program problem and list of dataframes
Hi Ian,
I haven't read this in detail because you don't provide a reproducible example (see ?dput), but you might want to take a look at the aggregate() and doBy::summaryBy() functions.
HTH,
Ivan
--
Ivan CALANDRA
Université de Bourgogne
UMR CNRS/uB 6282 Biogéosciences
6 Boulevard Gabriel
21000 Dijon, FRANCE
+33(0)3.80.39.63.06
ivan.calandra at u-bourgogne.fr
http://biogeosciences.u-bourgogne.fr/calandra
On 21/03/12 16:44, Ian Craig wrote:
> Hello R Community,
> [...]
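For later readers, here is a minimal sketch of the aggregate() route suggested in the reply, using base R only. It is not tested on the original data, which was never posted. The data frame `dat` and all of its values are made up for illustration; only the column names (gpsarc, testpo, LocalDate, Dist_Bef_m) come from the question. The idea is to sum the distance per participant, status and date first, then average those daily sums into a participant-by-status table.

```r
# Toy data: column names follow the original post, the values are invented.
dat <- data.frame(
  gpsarc     = factor(c(1, 1, 1, 2, 2, 2, 2)),
  testpo     = factor(c(2, 2, 2, 2, 2, 3, 3), levels = 1:3),
  LocalDate  = as.Date(c("2012-03-01", "2012-03-01", "2012-03-02",
                         "2012-03-01", "2012-03-02",
                         "2012-03-03", "2012-03-03")),
  Dist_Bef_m = c(100, 200, 500, 10, 30, 40, 60)
)

# Step 1: total distance per participant, status and date.
# Only combinations that actually occur in the data produce a row.
daily <- aggregate(Dist_Bef_m ~ gpsarc + testpo + LocalDate,
                   data = dat, FUN = sum)

# Step 2: mean of the daily totals, laid out as participant x status.
# Cells with no records (e.g. an unused status level) come out as NA.
wide <- with(daily,
             tapply(Dist_Bef_m, list(ppt = gpsarc, status = testpo), mean))

# Participant 1, status 2: daily totals are 300 and 500, so the mean is 400.
print(wide)
```

Because testpo keeps all three factor levels, tapply() returns a column for every status even when a participant has no records for it, which gives the NA-padded layout asked for without the explicit nrow(x) == 0 check. Running dput(head(dat, 20)) on the real data would produce the kind of reproducible example the reply asks for.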