sneaffer
2011-Apr-24 00:37 UTC
[R] How to erase (replace) certain elements in the data.frame?
Hello R-world,

Please help me get out of my little mess. I have a data.frame in which I would like some values to be NA for a later imputation step.

I've come up with the following piece of code:

random.del <- function (x, n.keeprows, del.percent){
  n.items <- ncol(x)
  k <- n.items*(del.percent/100)
  x.del <- x
  for (i in (n.keeprows+1):nrow(x)){
    j <- sample(1:n.items, k)
    x.del[i,j] <- NA
  }
  return (x.del)
}

The problem is that random.del turns out to be slow on huge samples. Is there a more efficient/charming way to do the same?

Thanks,
Sergey

--
View this message in context: http://r.789695.n4.nabble.com/How-to-erase-replace-certain-elements-in-the-data-frame-tp3470883p3470883.html
Sent from the R help mailing list archive at Nabble.com.
Thomas Levine
2011-Apr-24 06:35 UTC
[R] How to erase (replace) certain elements in the data.frame?
This should do the same thing:

random.del <- function (x, n.keeprows, del.percent){
  # Set a random del.percent of the entries of one column to NA.
  # (This samples rows within each column, not columns within each row.)
  del <- function(col){
    col[sample.int(length(col), length(col)*del.percent/100)] <- NA
    col
  }
  # Only touch the rows after the first n.keeprows, as in the original loop
  change <- (n.keeprows+1):nrow(x)
  x[change,] <- lapply(x[change,], del)
  x
}

This is faster because it's vectorized.

[1] "Mine"
   user  system elapsed
  0.004   0.000   0.002
[1] "Yours"
   user  system elapsed
  1.172   0.020   1.193

Tom
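For comparison, here is a minimal sketch of another vectorized variant that keeps the original per-row behaviour (the same number of NAs in every affected row). It is not from the thread: the name random.del.mask is hypothetical, and it assumes k should be rounded down the way sample() truncates a fractional size. It builds a logical mask because `[<-.data.frame` accepts a logical matrix of the same dimensions as x for replacement.

```r
# Hypothetical helper (not posted in the thread): mark k random columns in
# each row after the first n.keeprows, then assign NA in a single step.
random.del.mask <- function(x, n.keeprows, del.percent) {
  n.items <- ncol(x)
  k <- floor(n.items * del.percent / 100)   # assumption: round down
  rows <- seq_len(max(nrow(x) - n.keeprows, 0)) + n.keeprows
  if (length(rows) == 0 || k == 0) return(x)

  # One draw of k distinct column indices per affected row
  cols <- vapply(rows, function(i) sample.int(n.items, k), integer(k))

  # Logical mask with the same dimensions as x; TRUE marks cells to blank out
  mask <- matrix(FALSE, nrow(x), n.items)
  mask[cbind(rep(rows, each = k), as.vector(cols))] <- TRUE

  x[mask] <- NA   # works for both matrices and data frames
  x
}
```

A call such as random.del.mask(df, 100, 50) should leave the first 100 rows untouched and blank out half of the columns in each remaining row. Whether it beats the versions above on a given data set would need to be timed with system.time().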
Joshua Wiley
2011-Apr-24 07:40 UTC
[R] How to erase (replace) certain elements in the data.frame?
Hi Sergey,

This is not an answer to your exact question, but can you use a matrix? If you can use a matrix instead of a data frame, you should get a considerable performance boost. Even for very large matrices (at least on my system), it is fast enough that I find it hard to believe it is a bottleneck in the overall imputation process. For example, for a 1000 by 100 object as a data frame:

> system.time(r0 <- random.del(mat, 100, 50))
   user  system elapsed
   1.09    0.02    1.12

and as a matrix:

> system.time(r0 <- random.del(mat, 100, 50))
   user  system elapsed
   0.02    0.00    0.01

Beyond that, for very large objects, this revision gives a slight performance increase (around 5 seconds for a 1 million row by 100 column object on my system). That gain is small for matrices and completely dwarfed by other bottlenecks for data frames, and it comes at the cost of readability/flexibility:

rdel <- function (x, n.keeprows, del.percent){
  n.items <- ncol(x)
  k <- as.integer(n.items * del.percent / 100)
  cols <- 1:n.items
  lcols <- length(cols)
  for (i in (n.keeprows+1):nrow(x)){
    # .Internal() is not part of R's public API and may change between versions
    j <- cols[.Internal(sample(lcols, k, FALSE, NULL))]
    x[i,j] <- NA
  }
  return(x)
}

If you must use a data frame, you can gain some performance (for a 10000 by 100 data frame, it takes about 30 seconds on my system versus 40 for your original function) by using:

random.del2 <- function (x, n.keeprows, del.percent){
  n.items <- ncol(x)
  k <- n.items*(del.percent/100)
  for (i in (n.keeprows+1):nrow(x)){
    j <- sample(1:n.items, k)
    # Call the data frame method directly; the result must be assigned back to x
    x <- `[<-.data.frame`(x, i, j, NA)
  }
  return(x)
}

which basically just saves R the trouble of figuring out which assignment method to use. Of course the problem is that your function becomes extremely specialized. If you pass anything to it but a data frame, good things will not happen.
Cheers,
Josh

--
Joshua Wiley
Ph.D. Student, Health Psychology
University of California, Los Angeles
http://www.joshuawiley.com/
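The matrix suggestion can be sketched as a small wrapper, assuming every column of the data frame is numeric so that the round trip through as.matrix() does not turn the data into character strings. The name random.del.viamatrix is hypothetical and not from the thread; the loop itself is the original one, just run on a matrix copy.

```r
# Hypothetical wrapper (not posted in the thread): run the original row loop
# on a matrix copy of the data frame, then convert back.
random.del.viamatrix <- function(x, n.keeprows, del.percent) {
  # Assumption: all columns are numeric. Integer columns become double
  # if they are mixed with double columns.
  stopifnot(is.data.frame(x), all(vapply(x, is.numeric, logical(1))))

  m <- as.matrix(x)
  n.items <- ncol(m)
  k <- floor(n.items * del.percent / 100)

  if (k > 0 && n.keeprows < nrow(m)) {
    for (i in (n.keeprows + 1):nrow(m)) {
      # Matrix assignment avoids the per-row overhead of `[<-.data.frame`
      m[i, sample.int(n.items, k)] <- NA
    }
  }
  as.data.frame(m)
}
```

If the data frame mixes numeric and factor or character columns, this shortcut does not apply as written and the check at the top stops with an error rather than silently coercing.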