thr3ads.net - R help - [R] How to erase (replace) certain elements in the data.frame? [Apr 2011]


sneaffer

2011-Apr-24 00:37 UTC

[R] How to erase (replace) certain elements in the data.frame?

Hello R-world,
Please help me get out of my little mess.
I have a data.frame in which I'd like some values to be NA for a
later imputation process.

I've come up with the following piece of code:

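random.del <- function (x, n.keeprows, del.percent){
  n.items <- ncol(x)
  k <- n.items*(del.percent/100)   # number of cells to blank in each row
  x.del <- x
  # keep the first n.keeprows rows intact; blank k random cells in each later row
  for (i in (n.keeprows+1):nrow(x)){
    j <- sample(1:n.items, k)      # k distinct random columns
    x.del[i,j] <- NA
  }
  return (x.del)
}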
 
The problem is that random.del turns out to be slow on huge samples.
Is there a more efficient/elegant way to do the same?

Thanks,
Sergey

--
View this message in context:
http://r.789695.n4.nabble.com/How-to-erase-replace-certain-elements-in-the-data-frame-tp3470883p3470883.html
Sent from the R help mailing list archive at Nabble.com.

Thomas Levine

2011-Apr-24 06:35 UTC


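This should do the same thing:

random.del <- function (x, n.keeprows, del.percent){
  # blank del.percent of the values in one column
  del <- function(col){
    col[sample.int(length(col), length(col)*del.percent/100)] <- NA
    col
  }
  change <- (n.keeprows+1):nrow(x)        # rows that may receive NAs, as in the original
  x[change,] <- lapply(x[change,], del)   # x must be a data frame: lapply() walks its columns
  x
}

This is faster because it's vectorized: each column is blanked in one
step instead of assigning row by row. Note that it deletes del.percent of
the values in each column, whereas the original deletes del.percent of the
values in each row, so the NA pattern differs even though the overall
proportion is about the same.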

[1] "Mine"
   user  system elapsed
  0.004   0.000   0.002
[1] "Yours"
   user  system elapsed
  1.172   0.020   1.193
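If the original per-row rule matters (exactly k NAs in every row after n.keeprows), one possible sketch, not from the thread, keeps that rule but avoids a separate assignment per row: it picks the columns for each row with sample.int(), marks them in a single logical mask via matrix indexing, and writes all NAs with one x[mask] <- NA. The name random.del.mask is hypothetical. Replacement with a logical matrix of the same dimensions is supported for data frames as well as matrices (see ?Extract.data.frame), although for data frames R still replaces one column at a time, so the speed-up there is an assumption worth checking with system.time().

```r
# Hypothetical helper (not from the thread): same per-row rule as the original
# random.del, but all NA cells are collected in one logical mask and written
# in a single assignment instead of one x[i, j] <- NA call per row.
random.del.mask <- function(x, n.keeprows, del.percent) {
  nr <- nrow(x)
  nc <- ncol(x)
  k  <- floor(nc * del.percent / 100)   # NAs per affected row (truncated, like sample())
  m  <- nr - n.keeprows                 # number of rows that may receive NAs
  if (m <= 0 || k <= 0) return(x)       # nothing to delete

  # Column i of the vapply() result holds the k random columns for affected
  # row i; as.vector() concatenates those columns in row order.
  cols <- as.vector(vapply(seq_len(m), function(i) sample.int(nc, k), integer(k)))
  rows <- rep((n.keeprows + 1):nr, each = k)

  mask <- matrix(FALSE, nrow = nr, ncol = nc)
  mask[cbind(rows, cols)] <- TRUE       # matrix indexing: one (row, col) pair per line
  x[mask] <- NA                         # works for both matrices and data frames
  x
}
```

Usage is the same as the original, e.g. random.del.mask(df, 100, 50).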

Tom

______________________________________________
R-help at r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.

Joshua Wiley

2011-Apr-24 07:40 UTC


Hi Sergey,

This is not an answer to your exact question, but can you use a
matrix?  If you can use a matrix instead of a data frame, you should
get a considerable performance boost.  Even for very large matrices
(at least on my system), it is fast enough that I find it hard to believe
it is a bottleneck in the overall imputation process.  For example,
for a 1000 by 100 object, as a data frame:

> system.time(r0 <- random.del(mat, 100, 50))
   user  system elapsed
   1.09    0.02    1.12

and as a matrix:

> system.time(r0 <- random.del(mat, 100, 50))
   user  system elapsed
   0.02    0.00    0.01
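A minimal sketch of the matrix route, assuming every column of the data frame is numeric (as.matrix() on a data frame with mixed column types produces a character matrix, which would not be what you want): convert once, run the original row loop on the matrix, and convert back. The wrapper name random.del.via.matrix is hypothetical; the original random.del is repeated so the block runs on its own.

```r
# The original random.del from the first post, unchanged in behaviour.
random.del <- function(x, n.keeprows, del.percent) {
  n.items <- ncol(x)
  k <- n.items * (del.percent / 100)
  x.del <- x
  for (i in (n.keeprows + 1):nrow(x)) {
    j <- sample(1:n.items, k)
    x.del[i, j] <- NA
  }
  x.del
}

# Hypothetical wrapper: do the row-by-row assignments on a matrix, where
# x[i, j] <- NA is cheap, then turn the result back into a data frame.
random.del.via.matrix <- function(df, n.keeprows, del.percent) {
  # assumption: all columns numeric, otherwise as.matrix() gives characters
  stopifnot(all(vapply(df, is.numeric, logical(1))))
  mat <- as.matrix(df)                          # column names are kept
  as.data.frame(random.del(mat, n.keeprows, del.percent))
}
```

Whether the round trip pays off depends on how large the data are and how often you call it; timing both versions with system.time() on your own data is the safest check.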

Beyond that, for very large objects, this revision gives a slight
performance increase (around 5 seconds for a 1 million by 100 object on my
system), which is small for matrices and completely dwarfed by other
bottlenecks for data frames, and it comes at the cost of
readability/flexibility:

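rdel <- function (x, n.keeprows, del.percent){
  n.items <- ncol(x)
  k <- as.integer(n.items * del.percent / 100)   # cells to blank per row
  cols <- 1:n.items
  lcols <- length(cols)
  for (i in (n.keeprows+1):nrow(x)){
    # .Internal(sample(n, size, replace, prob)) is the internal routine behind
    # sample.int(); calling it directly skips the R-level wrappers, but
    # .Internal is not a supported public interface and may change
    j <- cols[.Internal(sample(lcols, k, FALSE, NULL))]
    x[i,j] <- NA
  }
  return(x)
}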
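Because .Internal is reserved for R's own code (its help page says as much), a variant that stays on the documented API may be preferable. The sketch below, with the hypothetical name rdel.public, calls sample.int() directly, which is what sample() itself forwards to, so it still skips sample()'s wrapper and the 1:n.items indexing. It also guards against n.keeprows >= nrow(x), where (n.keeprows+1):nrow(x) would count downward. How close it comes to the .Internal timing on a given system is an assumption to verify with system.time().

```r
# Hypothetical variant of rdel: same row loop, public sample.int() instead of
# .Internal(sample(...)).
rdel.public <- function(x, n.keeprows, del.percent) {
  n.items <- ncol(x)
  k <- as.integer(n.items * del.percent / 100)   # cells to blank per row
  # rows after n.keeprows; empty when there are none (avoids a descending ':')
  rows <- if (n.keeprows < nrow(x)) (n.keeprows + 1):nrow(x) else integer(0)
  for (i in rows) {
    x[i, sample.int(n.items, k)] <- NA           # k distinct random columns in row i
  }
  x
}
```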

If you must use a data frame, you can gain some performance increase
(for a 10000 by 100 data frame, it takes about 30 seconds on my system
versus 40 for your original function) by using:

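random.del2 <- function (x, n.keeprows, del.percent){
  n.items <- ncol(x)
  k <- n.items*(del.percent/100)
  for (i in (n.keeprows+1):nrow(x)){
    j <- sample(1:n.items, k)
    # call the data frame method directly, skipping dispatch; the result must be
    # assigned back, because replacement functions return a modified copy
    x <- `[<-.data.frame`(x, i, j, value = NA)
  }
  return(x)
}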

which basically just saves R the trouble of figuring out which
assignment method to use.  Of course the problem is that your function
becomes extremely specialized.  If you pass anything to it but a data
frame, good things will not happen.
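A small illustration of why a direct call to `[<-.data.frame` only has an effect when its result is assigned: replacement functions return a modified copy rather than changing their argument in place, and x[i, j] <- NA is evaluated roughly as x <- `[<-`(x, i, j, value = NA). The toy data frame df is made up for the example.

```r
df <- data.frame(a = 1:3, b = c(4, 5, 6))

# Calling the replacement method directly returns a modified copy...
copy <- `[<-.data.frame`(df, 2, "b", value = NA)
# ...and leaves df itself untouched:
df$b[2]     # still 5

# Assigning the result back is what the x[i, j] <- NA syntax does for you:
df <- `[<-.data.frame`(df, 2, "b", value = NA)
df$b[2]     # now NA
```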

Cheers,

Josh


-- 
Joshua Wiley
Ph.D. Student, Health Psychology
University of California, Los Angeles
http://www.joshuawiley.com/
