james.foadi at diamond.ac.uk
2010-Jan-25 16:07 UTC
[R] summing a large, partitioned data frame
Dear R community,

I'm trying to develop a fast way of summing specific rows of a large data frame. Here is an example of the kind of data frames I'm dealing with:

> refls
      H K L M/ISYM BATCH          I     SIGI
43247 1 0 5     21    79   61.44117  2.20553
1040  1 0 5    257     6   15.16316  0.54431
2324  1 0 5    257     5   46.76152  1.67858
31515 1 0 5    259    60   57.97305  2.08104
35158 1 0 5    259    61    3.15614  0.11329
51575 1 0 6    259    88  380.04477  8.08878
51846 1 0 6    259    89  624.90802 13.30038
28946 1 1 4      1    42 2517.79492 55.37144
23199 1 1 4      5    31 2525.67407 55.54472
23198 1 1 4     21    39 2519.44653 55.40777
............................................
............................................

I need to add up all I's with the same H, K, L and M/ISYM. The new data frame coming out of this partial summing should look, in this case, like:

      H K L M/ISYM BATCH          I     SIGI
43247 1 0 5     21    79   61.44117  2.20553
1040  1 0 5    257     6   61.92468  0.54431
31515 1 0 5    259    60   61.12919  2.08104
51575 1 0 6    259    88 1004.95279  8.08878
28946 1 1 4      1    42 2517.79492 55.37144
23199 1 1 4      5    31 2525.67407 55.54472
23198 1 1 4     21    39 2519.44653 55.40777
............................................
............................................

Essentially I only add those I's with the same H, K, L, M/ISYM and put the sum in a unique row in the new data frame. In other words, there's first a partition and then a sum.

I have tried with a for loop, but it really takes too long.

I was wondering whether anyone knows of a better and faster way of doing this operation.

J

Dr James Foadi PhD
Membrane Protein Laboratory (MPL)
Diamond Light Source Ltd
Diamond House
Harewell Science and Innovation Campus
Chilton, Didcot
Oxfordshire OX11 0DE

Email : james.foadi at diamond.ac.uk
Alt Email: j.foadi at imperial.ac.uk
Check aggregate() (the examples are quite helpful).

b
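For readers following the aggregate() suggestion, here is a minimal sketch of how it might be applied to the ten example rows from the original post. A few things are assumptions for illustration only: the column M/ISYM is renamed M_ISYM so that it is a syntactically valid R name, the helper key_of() is hypothetical, and keeping BATCH and SIGI from the first row of each group is inferred from the desired output shown in the question rather than stated by the poster.

```r
# Example rows copied from the post; M/ISYM is renamed M_ISYM (assumption).
refls <- data.frame(
  H      = c(1, 1, 1, 1, 1, 1, 1, 1, 1, 1),
  K      = c(0, 0, 0, 0, 0, 0, 0, 1, 1, 1),
  L      = c(5, 5, 5, 5, 5, 6, 6, 4, 4, 4),
  M_ISYM = c(21, 257, 257, 259, 259, 259, 259, 1, 5, 21),
  BATCH  = c(79, 6, 5, 60, 61, 88, 89, 42, 31, 39),
  I      = c(61.44117, 15.16316, 46.76152, 57.97305, 3.15614,
             380.04477, 624.90802, 2517.79492, 2525.67407, 2519.44653),
  SIGI   = c(2.20553, 0.54431, 1.67858, 2.08104, 0.11329,
             8.08878, 13.30038, 55.37144, 55.54472, 55.40777),
  row.names = c("43247", "1040", "2324", "31515", "35158",
                "51575", "51846", "28946", "23199", "23198")
)

keys <- c("H", "K", "L", "M_ISYM")

# Partition by the four key columns and sum I within each group.
# aggregate() returns one row per group, sorted by the keys.
sums <- aggregate(refls["I"], by = refls[keys], FUN = sum)

# Hypothetical helper: build one string key per row from the key columns.
key_of <- function(df) {
  do.call(paste, c(unname(as.list(df[keys])), sep = "_"))
}

# Keep the first row of each group (original order and row names),
# then overwrite I with the group sum looked up from 'sums'.
result <- refls[!duplicated(refls[keys]), ]
result$I <- sums$I[match(key_of(result), key_of(sums))]

print(result)
```

On these rows the result should have seven rows, with the summed I values 61.92468, 61.12919 and 1004.95279 in the rows named 1040, 31515 and 51575, matching the table in the question. Whether this is fast enough on the full data frame is not something the thread establishes; it only replaces the explicit for loop with a single grouped call.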