Nathan S. Watson-Haigh
2009-Mar-26 01:48 UTC
[R] Splitting Area under curve into equal portions
I have some data generated as follows:

<code>
n <- 2000
work <- vector()
for(x in 1:n) {
  work[x] <- sum(1:(n-x+1))
}
plot(work)
</code>

What I want to do
-----------------
I want to split work into a number of unequal chunks such that the sum of the values in each chunk is approximately equal.

The numbers in "work" are proportional to the amount of work to be performed for each value of x by a function I've written, i.e. for each value of x, there are work[x] * y calculations to be performed (where y is a constant).

I've written a parallel version of my function where I simply assign z consecutive x values to each slave. This is not ideal, since a slave that gets the 1:z smallest values of x will take longer to compute than one that gets the (n-z+1):n set of x values. For example, if I have 4 slaves available:

slave 1 processes x in 1:500
slave 2 processes x in 501:1000
slave 3 processes x in 1001:1500
slave 4 processes x in 1501:2000

This means the total work performed by each slave is:

slave 1 sum(work[1:500]) = 771708500
slave 2 sum(work[501:1000]) = 396458500
slave 3 sum(work[1001:1500]) = 146208500
slave 4 sum(work[1501:2000]) = 20958500

Manually splitting work into chunks where the sum of the values for the chunks is approximately equal, I get the following:

> sum(work[1:184])
[1] 335533384
> sum(work[185:415])
[1] 334897871
> sum(work[416:745])
[1] 334672085
> sum(work[746:2000])
[1] 330230660

I need to be able to do this automatically for any value of n, and I think I should be able to do this by calculating the area under the curve and slicing it into equally sized regions, but I don't really know how to get there from what I've said above!

Cheers,
Nathan

-- 
--------------------------------------------------------
Dr. Nathan S. Watson-Haigh
OCE Post Doctoral Fellow
CSIRO Livestock Industries
Queensland Bioscience Precinct
St Lucia, QLD 4067
Australia

Tel: +61 (0)7 3214 2922
Fax: +61 (0)7 3214 2900
Web: http://www.csiro.au/people/Nathan.Watson-Haigh.html
--------------------------------------------------------
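
One way to automate the manual split described above is to treat the cumulative sum of work as the "area so far" and cut it at equally spaced levels. The following is only a minimal sketch under that assumption, not a method given in the thread: `split_equal_work` is a hypothetical helper name, and the closed form `m * (m + 1) / 2` is used in place of the loop because it produces the same values.

```r
# Hypothetical helper (not from the thread): split seq_along(work) into k
# contiguous chunks whose sums are as close as possible to sum(work) / k.
split_equal_work <- function(work, k) {
  cum <- cumsum(work)                        # running total of work
  targets <- cum[length(cum)] * (1:(k - 1)) / k   # ideal cumulative level at each cut
  # For each target, pick the index whose running total is nearest to it.
  # Assumes work is positive and k is small relative to length(work), so the
  # cut points are distinct and no chunk is empty.
  ends <- vapply(targets, function(target) which.min(abs(cum - target)), integer(1))
  starts <- c(1, ends + 1)
  ends <- c(ends, length(work))
  data.frame(start = starts,
             end   = ends,
             total = mapply(function(s, e) sum(work[s:e]), starts, ends))
}

n <- 2000
m <- n:1
work <- m * (m + 1) / 2    # same values as sum(1:(n - x + 1)) for x in 1:n

chunks <- split_equal_work(work, 4)
print(chunks)
```

Each row gives the first and last x for one slave together with the work it would receive. Because the cuts can only fall between whole x values, the totals are approximately rather than exactly equal.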
Hi Nathan,

I am not sure that I understood what you need, and I know that this is not an elegant solution, but it may do the job.

n <- 1991
work <- vector()
for(x in 1:n) {
  work[x] <- sum(1:(n-x+1))
}
plot(work)

number.groups <- 5
number.groups.list <- NULL
# label the first (number.groups - 1) groups, each with round(n / number.groups) values
for (i in 1:(number.groups-1)) {
  number.groups.list <- c(number.groups.list, rep(i, round(length(work)/number.groups, 0)))
}
# the last group takes whatever values remain
number.groups.list <- c(number.groups.list, rep(number.groups, (length(work)-length(number.groups.list))))

# sum of work within each group, and a plot coloured by group
aggregate(work, list(number.groups.list), sum)
plot(work, col=number.groups.list)

Best regards,
miltinho
Brazil
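
The grouping in the reply above assigns labels by position, so each group holds roughly the same number of x values. As a rough sketch of how that compares with labels based on cumulative work, the code below builds both label vectors and reports each group's share of the total. The names `equal.size`, `balanced`, and `shares` are illustrative assumptions, and the use of `cut()` on the cumulative sum is one possible approach rather than anything proposed in the thread.

```r
n <- 1991
m <- n:1
work <- m * (m + 1) / 2          # same values as sum(1:(n - x + 1)) for x in 1:n
number.groups <- 5

# Position-based labels, built the same way as in the reply:
# round(n / number.groups) values per group, remainder in the last group.
size <- round(n / number.groups, 0)
equal.size <- c(rep(1:(number.groups - 1), each = size),
                rep(number.groups, n - size * (number.groups - 1)))

# Work-based labels: bin the running total into number.groups equal-width bands.
cum <- cumsum(work)
total <- cum[n]
balanced <- cut(cum, breaks = total * (0:number.groups) / number.groups,
                labels = FALSE)

# Fraction of the total work that lands in each group under each labelling.
shares <- data.frame(
  group      = 1:number.groups,
  equal.size = round(as.numeric(tapply(work, equal.size, sum)) / total, 3),
  balanced   = round(as.numeric(tapply(work, balanced, sum)) / total, 3)
)
print(shares)
```

Either label vector can be passed to `aggregate(work, list(labels), sum)` or `plot(work, col = labels)` exactly as in the reply.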