Hello,

One more question.. I have the data.frame "pop":

  xloc yloc   gonad ind  Ene     W Area
1   23   20  516.74   1 0.02 20.21    1
2   23   20 1143.20   1 0.02 20.21    1
3   23   20  250.00   1 0.02 20.21    1
4   22   15  251.98   1 0.02 18.69    2
5   22   15  598.08   1 0.02 18.69    2
6   21   19  250.00   1 0.02 20.21    3
7   22   20  251.98   1 0.02 18.69    4
8   22   20  598.08   1 0.02 18.69    4

and I need to extract 50% (or rounded) of the rows for each Area (from Area 1 to 3 only):

  xloc yloc   gonad ind  Ene     W Area
1   23   20  516.74   1 0.02 20.21    1
2   23   20 1143.20   1 0.02 20.21    1
4   22   15  251.98   1 0.02 18.69    2
6   21   19  250.00   1 0.02 20.21    3

I did this within a loop, but considering my data.frame has more than 10,000 rows and within other loops it makes my code run forever! Any hints?

Thanks!!

Nico

Try this:

    subset(pop, (ave(Area, Area, FUN = length) == 1 |
                 ave(Area, Area, FUN = function(x) cumsum(prop.table(x))) < 0.7 &
                 Area %in% 1:3))

On Fri, Mar 18, 2011 at 2:48 PM, Nicolas Gutierrez <nicolasg at uw.edu> wrote:
> and I need to extract 50% (or rounded) of the rows for each Area (from Area
> 1 to 3 only):

--
Henrique Dallazuanna
Curitiba-Paraná-Brasil
25° 25' 40" S 49° 16' 22" O

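A note on the expression above, offered as a sketch and not as a correction of the reply: the 0.7 cutoff on the cumulative proportion keeps the same number of rows as ceiling(n/2) for the small groups in the posted example, but the two counts can differ for larger groups. Also, in R `&` binds tighter than `|`, so the `length == 1` clause is not restricted by `Area %in% 1:3`. The helper names `kept_by_threshold` and `kept_by_half` and the data frame `toy` are made up for this illustration.

```r
# Sketch only: helper names below are hypothetical, not from the reply.
# Rows kept in a group of n rows by the rule in the reply.
kept_by_threshold <- function(n) {
  if (n == 1) return(1)                  # handled by the `length == 1` clause
  sum(cumsum(prop.table(rep(1, n))) < 0.7)
}
# Rows kept when taking half of the group, rounded up.
kept_by_half <- function(n) ceiling(n / 2)

n <- 1:9
counts <- data.frame(n = n,
                     threshold = sapply(n, kept_by_threshold),
                     half = sapply(n, kept_by_half))
print(counts)

# `A | B & C` is parsed as `A | (B & C)`, so a single-row Area
# outside 1:3 (here Area 4) passes the filter as well.
toy <- data.frame(Area = c(1, 1, 2, 4))
kept <- subset(toy, ave(Area, Area, FUN = length) == 1 |
                    ave(Area, Area, FUN = function(x) cumsum(prop.table(x))) < 0.7 &
                    Area %in% 1:3)
print(kept)
```

For groups of up to five rows the two counts agree, which covers the posted example; from six rows onward they can differ (a group of six keeps four rows under the 0.7 rule, against three for ceiling(n/2)).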
On Fri, Mar 18, 2011 at 10:48:44AM -0700, Nicolas Gutierrez wrote:
> I did this within a loop, but considering my data.frame has more than
> 10,000 rows and within other loops it makes my code run forever! Any
> hints? Thanks!!

Hello.

Let me use a data frame with one column only, but more rows. The following code contains a loop over 1:3, but is otherwise vectorized.

    pop <- data.frame(Area=c(1,1,1,1,1,2,2,2,2,3,3,3,4,4))
    final <- rep(FALSE, times=nrow(pop))
    for (k in 1:3) {
        is.k <- pop$Area == k
        accept <- is.k & (cumsum(is.k) <= ceiling(sum(is.k)/2))
        final <- final | accept
    }
    pop[final, , drop=FALSE] # "drop=" not needed, if there are more columns

Hope this helps.

Petr Savicky.

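As a check, the same loop can be run on the data frame from the original question. This is a minimal sketch: the data frame is retyped from the posted table, and the name `half` is introduced here only for the result.

```r
# The "pop" data frame retyped from the question.
pop <- data.frame(
  xloc  = c(23, 23, 23, 22, 22, 21, 22, 22),
  yloc  = c(20, 20, 20, 15, 15, 19, 20, 20),
  gonad = c(516.74, 1143.20, 250.00, 251.98, 598.08, 250.00, 251.98, 598.08),
  ind   = 1,
  Ene   = 0.02,
  W     = c(20.21, 20.21, 20.21, 18.69, 18.69, 20.21, 18.69, 18.69),
  Area  = c(1, 1, 1, 2, 2, 3, 4, 4)
)

final <- rep(FALSE, nrow(pop))
for (k in 1:3) {
  is.k <- pop$Area == k
  # Keep the first ceiling(n/2) rows of Area k, in order of appearance.
  final <- final | (is.k & cumsum(is.k) <= ceiling(sum(is.k) / 2))
}
half <- pop[final, ]   # `half` is a name chosen for this sketch
print(half)
```

This selects rows 1, 2, 4 and 6, the rows listed in the question. The loop runs once per Area, not once per row, so its cost grows with the number of areas times the number of rows.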
On Fri, Mar 18, 2011 at 10:48:44AM -0700, Nicolas Gutierrez wrote:
> and I need to extract 50% (or rounded) of the rows for each Area (from
> Area 1 to 3 only):

Let me suggest one more solution, which is a modification of the solution by Henrique Dallazuanna.

    pop <- data.frame(Area=c(1,1,1,1,1,2,2,2,2,3,3,3,4,4))
    g <- function(x) { seq(along=x) <= ceiling(length(x)/2) }
    subset(pop, ave(Area, Area, FUN=g) & Area %in% 1:3)

       Area
    1     1
    2     1
    3     1
    6     2
    7     2
    10    3
    11    3

Hope this helps.

Petr Savicky.

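The ave() idea above can be wrapped in a small function so that the grouping column, the areas to keep, and the fraction become arguments. This is only a sketch: the function `first_fraction` and its argument names are hypothetical and do not appear in the thread.

```r
# Hypothetical helper: keep the first ceiling(frac * n) rows of each
# selected group, in the order the rows appear in the data frame.
first_fraction <- function(df, group, keep, frac = 0.5) {
  g <- df[[group]]
  # Position of each row within its own group (1, 2, 3, ...).
  pos <- ave(seq_along(g), g, FUN = seq_along)
  # Number of rows in the group each row belongs to.
  size <- ave(seq_along(g), g, FUN = length)
  df[g %in% keep & pos <= ceiling(frac * size), , drop = FALSE]
}

pop <- data.frame(Area = c(1,1,1,1,1,2,2,2,2,3,3,3,4,4))
res <- first_fraction(pop, "Area", keep = 1:3)
print(res)
```

With the default frac = 0.5 this returns rows 1, 2, 3, 6, 7, 10 and 11, matching the output shown above. Computing the position on the row index instead of on the Area values keeps the intermediate vectors as integers and works whatever the type of the grouping column.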
Perfect, thanks Henrique!

Nico

On 3/18/2011 11:17 AM, Henrique Dallazuanna wrote:
> Try this:
>
> subset(pop, (ave(Area, Area, FUN = length) == 1 | ave(Area, Area, FUN
> = function(x)cumsum(prop.table(x))) < 0.7 & Area %in% 1:3))