Dear R-Help List,

I have a question about data manipulation. I tried to write the code myself, but it was too much for me, so I would greatly appreciate your help.

I have a data set consisting of sites (1 to N1) and distances, with several variables (1 to N2) collected at each sampling site. I would like to compute the cumulative sum of each variable within each site, accumulated over distance, as shown below.

Can anyone help me create a function to do this? The cumulative sums do not need to be combined with the original data; it would be better to get only the cumulative sums together with the indicator variables of interest.

site  distance  var1  …  var(n2)  cu.sum.v1  …  cu.sum.v(n2)
1     10        1        0        1             0
1     20        0        1        1             1
1     30        1        2        2             3
1     40        3        3        5             6
1     50        1        4        6             10
2     10        1        1        1             1
2     20        2        1        3             2
2     30        4        1        7             3
2     40        0        1        7             4
2     50        1        1        8             5
...
n1    10        1        1        1             1
n1    20        1        0        2             1
n1    30        1        1        3             2
n1    40        1        0        4             2
n1    50        1        1        5             3

Thank you very much for reading and for your time!

Steve Hong
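A minimal sketch of the requested function, assuming the data sit in a data frame with columns named site and distance and that the variables to accumulate are passed by name. The helper name cum_by_site and the cu.sum. prefix are illustrative choices (the prefix mirrors the header above), not an existing R function. It sorts by site and distance, then uses base R's ave() to run cumsum() separately within each site, and returns only the indicator columns plus the cumulative sums:

```r
# Example data from the question (the full data set would have N1 sites
# and N2 variables; only var1 and var2 are shown here)
x <- data.frame(
  site     = rep(c("1", "2", "n1"), each = 5),
  distance = rep(seq(10, 50, by = 10), times = 3),
  var1     = c(1, 0, 1, 3, 1,  1, 2, 4, 0, 1,  1, 1, 1, 1, 1),
  var2     = c(0, 1, 2, 3, 4,  1, 1, 1, 1, 1,  1, 0, 1, 0, 1),
  stringsAsFactors = FALSE
)

# Hypothetical helper (not part of base R): cumulative sum of each variable
# in `vars` within each site, accumulated in order of increasing distance.
cum_by_site <- function(dat, vars, group = "site", order_by = "distance") {
  # sort so that, within each site, rows run from the smallest distance up
  dat <- dat[order(dat[[group]], dat[[order_by]]), ]
  # keep only the indicator columns ...
  out <- dat[, c(group, order_by)]
  for (v in vars) {
    # ... and add one cumulative-sum column per variable; ave() applies
    # cumsum() separately within each site and keeps the rows aligned
    out[[paste0("cu.sum.", v)]] <- ave(dat[[v]], dat[[group]], FUN = cumsum)
  }
  rownames(out) <- NULL
  out
}

res <- cum_by_site(x, vars = c("var1", "var2"))
print(res)
```

With many variables, vars = grep("^var", names(x), value = TRUE) would pick up every column whose name starts with var, assuming the real variables are named that way. ave() is used here because it returns a vector already aligned with the sorted rows, so no re-binding step is needed.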
Here is one way of doing it:

x <- read.table(textConnection("site distance var1 var2
1 10 1 0
1 20 0 1
1 30 1 2
1 40 3 3
1 50 1 4
2 10 1 1
2 20 2 1
2 30 4 1
2 40 0 1
2 50 1 1
n1 10 1 1
n1 20 1 0
n1 30 1 1
n1 40 1 0
n1 50 1 1"), header = TRUE)
closeAllConnections()
# split by site, compute the cumsum of the data columns
x.1 <- lapply(split(x, x$site), function(.site) {
    # get the cumsum of the data columns (using 3:4 for example)
    .cum <- do.call(cbind, lapply(.site[, 3:4], cumsum))
    cbind(.site, .cum)
})
# put back together (you can change the column names)
do.call(rbind, x.1)

which gives:

      site distance var1 var2 var1 var2
1.1      1       10    1    0    1    0
1.2      1       20    0    1    1    1
1.3      1       30    1    2    2    3
1.4      1       40    3    3    5    6
1.5      1       50    1    4    6   10
2.6      2       10    1    1    1    1
2.7      2       20    2    1    3    2
2.8      2       30    4    1    7    3
2.9      2       40    0    1    7    4
2.10     2       50    1    1    8    5
n1.11   n1       10    1    1    1    1
n1.12   n1       20    1    0    2    1
n1.13   n1       30    1    1    3    2
n1.14   n1       40    1    0    4    2
n1.15   n1       50    1    1    5    3

On Wed, Jun 17, 2009 at 11:48 AM, SEUNG CHEON HONG <seunghong@wisc.edu> wrote:
> Dear R-Help List,
> [...]
> Steve Hong
>
> ______________________________________________
> R-help@r-project.org mailing list
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.

-- 
Jim Holtman
Cincinnati, OH
+1 513 646 9390

What is the problem that you are trying to solve?
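The split()/lapply()/rbind() approach above leaves the duplicated var1/var2 column names to the reader. A sketch of the same pipeline that selects the variables by name (assuming, as in the example, that they all start with "var"), renames the cumulative columns with the cu.sum. prefix from the original question, and keeps only site, distance and the sums. Like the original, it accumulates in row order, so it assumes the rows within each site are already sorted by distance:

```r
x <- data.frame(
  site     = rep(c("1", "2", "n1"), each = 5),
  distance = rep(seq(10, 50, by = 10), times = 3),
  var1     = c(1, 0, 1, 3, 1,  1, 2, 4, 0, 1,  1, 1, 1, 1, 1),
  var2     = c(0, 1, 2, 3, 4,  1, 1, 1, 1, 1,  1, 0, 1, 0, 1),
  stringsAsFactors = FALSE
)

# Assumption: every variable to accumulate has a name starting with "var",
# which generalises "3:4 for example" to all N2 variables
vars <- grep("^var", names(x), value = TRUE)

# split by site, compute the cumsum of each selected column within the site
x.1 <- lapply(split(x, x$site), function(.site) {
  .cum <- do.call(cbind, lapply(.site[, vars, drop = FALSE], cumsum))
  # give the cumulative columns distinct names instead of repeating var1, var2
  colnames(.cum) <- paste0("cu.sum.", vars)
  # keep only the indicator columns plus the cumulative sums
  cbind(.site[, c("site", "distance")], .cum)
})

# put back together and replace the "site.row" row names with 1..n
res <- do.call(rbind, x.1)
rownames(res) <- NULL
print(res)
```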
Dear Steve,

Using Jim Holtman's x data, you can also try the following for columns 3 and 4:

# for each of columns 3:4, tapply() splits the column by site and applies cumsum()
a <- with(x, apply(x[, 3:4], 2, tapply, site, function(x) cumsum(x)))
# flatten each per-site list into one vector and add the results as new columns
x[, c('cvar1', 'cvar2')] <- do.call(cbind, lapply(a, function(x) do.call(c, x)))
x

HTH,
Jorge
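The do.call(c, x) step concatenates the per-site results in the order of the site levels, so it lines up with the rows of x only when the rows are already grouped by site in that order, as they are in the example. A sketch of an order-safe variant, swapping in base R's unsplit() to put each site's cumulative sums back into the rows they came from. The shuffled row order below is made up purely for illustration, and the sketch still assumes that, within each site, rows appear in order of increasing distance:

```r
x <- data.frame(
  site     = rep(c("1", "2", "n1"), each = 5),
  distance = rep(seq(10, 50, by = 10), times = 3),
  var1     = c(1, 0, 1, 3, 1,  1, 2, 4, 0, 1,  1, 1, 1, 1, 1),
  var2     = c(0, 1, 2, 3, 4,  1, 1, 1, 1, 1,  1, 0, 1, 0, 1),
  stringsAsFactors = FALSE
)

# Illustration only: interleave the sites while keeping each site's rows
# in order of increasing distance
x <- x[c(11, 1, 6, 12, 2, 7, 13, 3, 8, 14, 4, 9, 15, 5, 10), ]

for (v in c("var1", "var2")) {
  # split the column by site, accumulate within each site, then let
  # unsplit() return every value to the row it came from
  x[[paste0("c", v)]] <- unsplit(lapply(split(x[[v]], x$site), cumsum), x$site)
}
print(x)
```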