Dear All,

I would like to compute partial sums (or means, or any other function) of the values in intervals along a sequence (a spatial transect) where groups are defined.

For instance:

habitats <- rep(c("meadow","forest","meadow","pasture"), c(10,5,12,6))
observations <- rpois(length(habitats), 2)
transect <- data.frame(observations=observations, habitats=habitats)

aggregate() is not suitable for my purpose because I want a result that respects the order of the habitats encountered, even though some of them may have the same name (and does not pool each group on each level of the factor created). For instance, the output of the ideal function mynicefunction() would be something like:

mynicefunction(transect$observations, by=list(transect$habitats), sum)
meadow   16
forest    9
meadow   21
pasture  17

and not

aggregate(transect$observations, by=list(transect$habitats), sum)
  Group.1  x
1  forest  9
2  meadow 37
3 pasture 17

Has anybody heard of such a function already written in R? If not, any idea how to write it simply and elegantly?

Cheers,

Patrick Giraudoux
Create another variable that gives the run number, aggregate on both the habitat and the run number, and drop the run-number column after aggregating:

runno <- cumsum(c(TRUE, diff(as.numeric(factor(transect[,2]))) != 0))
aggregate(transect[,1], list(obs = transect[,2], runno = runno), sum)[,-2]

This does not give the same result as your example, but I think there are some errors in your example output.

On 2/26/06, Patrick Giraudoux <patrick.giraudoux at univ-fcomte.fr> wrote:
> I would like to make partial sums (or means or any other function) of
> the values in intervals along a sequence (spatial transect) where groups
> are defined.
>
> ______________________________________________
> R-help at stat.math.ethz.ch mailing list
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide! http://www.R-project.org/posting-guide.html
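The run-number idea in the reply above can be checked by hand on a small fixed example. This is only a sketch: the vectors `hab` and `obs` and their values are made up for illustration (they are not from the original post), and the labels are compared as character strings so that it works whether the habitat column is a factor or plain text.

```r
# Hypothetical toy data (an assumption, not from the thread): fixed counts
# so the per-run sums can be verified by hand.
hab <- rep(c("meadow", "forest", "meadow", "pasture"), c(3, 2, 2, 1))
obs <- c(1, 2, 3, 4, 5, 6, 7, 8)

# A new run starts at the first element and wherever the label differs
# from the previous one.
lab <- as.character(hab)
runno <- cumsum(c(TRUE, lab[-1] != lab[-length(lab)]))

# Aggregate on habitat and run number, then drop the run-number column.
# Each run holds a single habitat, so rows come out in run order.
res <- aggregate(obs, list(habitat = lab, runno = runno), sum)[, -2]
print(res)
```

With these made-up values the two meadow stretches stay separate (sums 6 and 13) instead of being pooled into 19.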
On Sun, 26 Feb 2006, Patrick Giraudoux wrote:
> Did anybody hear about such a function already written in R? If no, any
> idea to make it simple and elegant to write?

I got as far as:

rle.habs <- rle(habitats)
habitats1 <- rep(make.names(rle.habs$values, unique=TRUE), rle.habs$lengths)
aggregate(observations, by=list(habitats1), sum)

which makes an extra habitats vector with a unique label for each run. Since I don't know your seed, the results are not the same, but rle() is quite good for runs.

Roger

-- 
Roger Bivand
Economic Geography Section, Department of Economics,
Norwegian School of Economics and Business Administration,
Helleveien 30, N-5045 Bergen, Norway.
voice: +47 55 95 93 55; fax +47 55 95 95 43
e-mail: Roger.Bivand at nhh.no
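One point worth noting about the rle() approach above: aggregate() returns its rows sorted by the grouping labels, so with labels such as "meadow", "forest", "meadow.1", "pasture" the forest row comes first rather than in transect order. A possible way to keep the order, sketched below, is to group on an integer run id and attach the labels afterwards. The helper name `run_apply`, its arguments, and the toy data are all assumptions made for illustration, not an existing R function, and the sketch assumes FUN returns a single value per run.

```r
# Hypothetical helper (name and signature are assumptions for illustration).
run_apply <- function(x, groups, FUN, ...) {
  r <- rle(as.character(groups))
  # One integer id per run, repeated over the length of that run.
  id <- rep(seq_along(r$values), r$lengths)
  # tapply() returns one value per id, ordered by id, i.e. in transect order.
  out <- tapply(x, id, FUN, ...)
  data.frame(group = r$values, value = as.vector(out))
}

# Made-up data with fixed counts so the result can be checked by hand.
hab <- rep(c("meadow", "forest", "meadow", "pasture"), c(3, 2, 2, 1))
obs <- c(1, 2, 3, 4, 5, 6, 7, 8)
res <- run_apply(obs, hab, sum)
print(res)
```

Passing `mean` or another summary function instead of `sum` works the same way, as long as it returns one number per run.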