ivo welch
2011-Jul-24 17:07 UTC
[R] split data frame temporary and work with only part of it?
Dear R wizards: I have a large data frame, a million rows, 40 columns. In this data frame, there are some (about 100,000) rows which I want to recompute (update), while I want to leave the others just as is. Which rows to update is determined by a condition that I need to compute from what is in a few of the columns. What is the right R way to do this?

I could subset out the rows that I want to recompute into a new data frame (A), subset out the rows I don't want to recompute (B), operate on the first data frame (A), then rbind the two (A and B) back together and re-sort into original order. Is this the recommended way?

Sincerely,

/iaw
----
Ivo Welch (ivo.welch at gmail.com)
Peter Ehlers
2011-Jul-24 18:53 UTC
Can't you just make an index vector of the rows that will be changed, then make your new data frame for these rows, then change the original data frame by indexing the rows in your index vector?

  set.seed(1)
  dat <- data.frame(x = 20:1, y = sample(20))
  idx <- which(dat$y > 17)    # row numbers of the rows to be changed
  dat2 <- data.frame(x = c(999, -889, 777), y = c(-44, NA, 0))  # replacement rows
  dat[idx, ] <- dat2          # overwrite only those rows, in place
  dat

Peter Ehlers
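The index-vector approach also works when the new values are computed from the selected rows themselves rather than supplied as a separate data frame. A minimal sketch, assuming hypothetical columns a and b and a made-up condition (none of these names come from the original post):

```r
# Hypothetical example data; columns a and b are placeholders
dat <- data.frame(a = 1:10,
                  b = c(5, 20, 3, 18, 7, 19, 1, 2, 30, 4))

# Condition computed from a few of the columns
needs_update <- dat$b > 17 & dat$a %% 2 == 0

# which() turns the logical condition into row numbers and drops any NAs
idx <- which(needs_update)

# Recompute b for the selected rows only; all other rows are untouched
dat[idx, "b"] <- dat[idx, "a"] * 100

dat
```

Because only the indexed rows are assigned to, the original row order is preserved and no rbind or re-sort step is needed.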
Daniel Malter
2011-Jul-24 19:55 UTC
My recommendation would be not to "subset out" the data, because binding the new data back together with the old data introduces a potential source of error. Preferably, I would select subsets of the dataset using indices (as suggested in the previous post) and just do the computations for these subsets without separating the datasets.

Alternatively, you can split() the data, do your computations, and later unsplit() the data.

HTH,
Daniel
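The split()/unsplit() alternative can be sketched as follows. This is only an illustration under assumed names: the columns a and b and the condition b > 17 are hypothetical and not taken from the original post.

```r
# Hypothetical example data; columns a and b are placeholders
dat <- data.frame(a = 1:6,
                  b = c(5, 20, 3, 18, 7, 19))

# Grouping vector: TRUE for rows to recompute, FALSE for rows to leave alone
needs_update <- dat$b > 17

# split() returns a list of two data frames named "FALSE" and "TRUE"
parts <- split(dat, needs_update)

# Operate only on the part that needs recomputing
parts[["TRUE"]]$b <- parts[["TRUE"]]$b * 2

# unsplit() reassembles the pieces using the same grouping vector
out <- unsplit(parts, needs_update)

out
```

unsplit() uses the grouping vector to put each row back in its original position, so no manual re-sort is needed.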