Hello, I would like to delete some values at random in a data frame. Does anyone know how I could do this? With best regards, Caroline
Caroline,

You probably want to look at ?sample. Use sample() to choose the rows for deletion, then use:

    df.new = df[-sampled, ]

Sean

On Mar 1, 2005, at 9:04 AM, Caroline TRUNTZER wrote:
> Hello
> I would like to delete some values at random in a data frame. Does
> anyone know how I could do?
> With best regards
> Caroline
>
> ______________________________________________
> R-help at stat.math.ethz.ch mailing list
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide!
> http://www.R-project.org/posting-guide.html
On Tue, 2005-03-01 at 15:04 +0100, Caroline TRUNTZER wrote:
> Hello
> I would like to delete some values at random in a data frame. Does
> anyone know how I could do?
> With best regards
> Caroline

The basic process is to randomly select row indices from the possible number of rows.

If your data frame is 'df' and you want to randomly delete 10 rows:

    df.new <- df[-sample(1:nrow(df), 10), ]

The sample() function in this case randomly selects 10 values in the range 1:nrow(df). Using the '-' then removes these rows in the subsetting process, returning the new, smaller, data frame in df.new.

See ?sample and ?Extract for more information.

HTH,

Marc Schwartz
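The one-liner above can be tried on a small made-up data frame. This is only a minimal sketch: the data frame 'df', its columns 'id' and 'x', and the seed are assumptions chosen for illustration, not part of the original question.

```r
# Hypothetical example data; 'df' and its columns are made up for illustration.
set.seed(42)                       # only so that a rerun gives the same rows
df <- data.frame(id = 1:20, x = rnorm(20))

drop.rows <- sample(nrow(df), 10)  # 10 distinct row numbers drawn from 1..nrow(df)
df.new <- df[-drop.rows, ]         # negative index removes exactly those rows

nrow(df.new)                       # 10 rows remain
```

Storing the sampled row numbers in a variable first (here 'drop.rows') makes it possible to check afterwards which rows were removed.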
On Tue, 01 Mar 2005 15:04:17 +0100, Caroline TRUNTZER <caroline.truntzer at chu-lyon.fr> wrote:
> Hello
> I would like to delete some values at random in a data frame. Does
> anyone know how I could do?

Assuming your data.frame is named df:

To delete the row at index i, use df[-i, ] (i.e. the row selection is the negative of the index you don't want). This also works when i is a vector, so

    df[-sample(1:N, n), ]

would delete a random selection of n rows from a data frame with N rows.

Duncan Murdoch
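One edge case of negative indexing is worth guarding against: when n is 0, -sample(1:N, 0) is an empty index, and an empty index selects no rows at all instead of keeping all of them. The wrapper below is a rough sketch of one way to handle that; the function name 'delete_random_rows' is hypothetical and not from any package.

```r
# Hypothetical helper; the name 'delete_random_rows' is an assumption for illustration.
delete_random_rows <- function(df, n) {
  stopifnot(n >= 0, n <= nrow(df))
  if (n == 0) return(df)                # df[-integer(0), ] would return zero rows
  drop <- sample(seq_len(nrow(df)), n)  # n distinct row numbers
  df[-drop, , drop = FALSE]             # drop = FALSE keeps a one-column result a data frame
}

df <- data.frame(a = 1:5)
nrow(delete_random_rows(df, 0))         # 5: nothing deleted
nrow(delete_random_rows(df, 2))         # 3
nrow(df[-integer(0), , drop = FALSE])   # 0: the pitfall the guard avoids
```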
Caroline TRUNTZER wrote:
> Hello
> I would like to delete some values at random in a data frame. Does
> anyone know how I could do?

What about sample()-ing (if I understand "at random" correctly) a certain number of values from 1:nrow(data) and using the result as a negative index into the data.frame?

Uwe Ligges
Hello,

    d <- data.frame(a=c(2,3,4), b=c(2,4,1), c=c(3,5,6))

    ## one NA
    s.r <- sample(dim(d)[1], 1)   # random row index
    s.c <- sample(dim(d)[2], 1)   # random column index
    d.na <- d
    d.na[s.r, s.c] <- NA
    d.na

A matrix is more convenient here when using sample(). For multiple NAs you would have to write a loop, but if you want to choose exactly 4 values, for example, the same cell may be chosen more than once. To handle this, you can look up the existing NAs with which(is.na(d.na), arr.ind=TRUE) and, inside a while loop, count a draw only if the chosen index is not already NA.

This is not an elegant way, but perhaps it helps you a little.

Best,
Matthias

> -----Original Message-----
> From: r-help-bounces at stat.math.ethz.ch
> [mailto:r-help-bounces at stat.math.ethz.ch] On Behalf Of
> Caroline TRUNTZER
> Sent: Tuesday, 1 March 2005 15:04
> To: R-help at stat.math.ethz.ch
> Subject: [R] Help : delete at random
>
> Hello
> I would like to delete some values at random in a data frame.
> Does anyone know how I could do? With best regards Caroline
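The loop described above can probably be avoided. As a sketch under the assumption that exactly k distinct cells should become NA, one can sample cell positions without replacement and mark them in a logical matrix of the same shape as the data frame. The names 'mask' and 'k' are introduced here for illustration only.

```r
d <- data.frame(a = c(2, 3, 4), b = c(2, 4, 1), c = c(3, 5, 6))
k <- 4                                    # number of cells to set to NA (assumed)

mask <- matrix(FALSE, nrow(d), ncol(d))   # one logical flag per cell of d
mask[sample(length(mask), k)] <- TRUE     # k distinct cells, so no cell is drawn twice
d.na <- d
d.na[mask] <- NA                          # assignment through a logical matrix

sum(is.na(d.na))                          # 4
```

Because sample() draws without replacement by default, no cell can be picked twice, so no bookkeeping of already-missing cells is needed.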
Might be slightly more interesting.

If we want to generate values which are missing completely at random, then we can simply sample from all available indices of a 2-d array.

    # simulate data
    set.seed(1) # for reproducibility
    m <- matrix( rnorm(12), nrow=4, ncol=3 )
    m
               [,1]       [,2]       [,3]
    [1,] -0.6264538  0.3295078  0.5757814
    [2,]  0.1836433 -0.8204684 -0.3053884
    [3,] -0.8356286  0.4874291  1.5117812
    [4,]  1.5952808  0.7383247  0.3898432

    indices <- expand.grid( row=1:nrow(m), col=1:ncol(m) ) # generate all possible indices
    N <- ncol(m)*nrow(m) # number of total elements

Now suppose you want to generate 25% missing values, then

    k <- round( 0.25 * N )
    w <- as.matrix( indices[ sample( 1:N, k ), ] )
    w # shows the row and column numbers that will be set to NA
      row col
    4   4   1
    5   1   2
    1   1   1

    m[ w ] <- NA # set the selected cells to NA
    m
               [,1]       [,2]       [,3]
    [1,]         NA         NA  0.5757814
    [2,]  0.1836433 -0.8204684 -0.3053884
    [3,] -0.8356286  0.4874291  1.5117812
    [4,]         NA  0.7383247  0.3898432

Regards, Adai

On Tue, 2005-03-01 at 15:30 +0100, Uwe Ligges wrote:
> Caroline TRUNTZER wrote:
> > Hello
> > I would like to delete some values at random in a data frame. Does
> > anyone know how I could do?
>
> What about sample()-ing (if I understand "at random" correctly) a
> certain number of values from 1:nrow(data) and using the result as
> negative index the data.frame?
>
> Uwe Ligges
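For a matrix, the same selection can likely be written more compactly, because a matrix can also be indexed by a single linear (column-major) position. The sketch below is a variation on the approach above, not Adai's code; it uses arrayInd() only to show which (row, col) pairs the linear positions correspond to.

```r
set.seed(1)                            # for reproducibility
m <- matrix(rnorm(12), nrow = 4, ncol = 3)
k <- round(0.25 * length(m))           # 25% of the 12 cells, i.e. 3

idx <- sample(length(m), k)            # k distinct linear positions in 1..length(m)
w <- arrayInd(idx, dim(m))             # the same cells expressed as (row, col) pairs
m[idx] <- NA                           # set the selected cells to NA

sum(is.na(m))                          # 3
all(is.na(m[w]))                       # TRUE: both index forms address the same cells
```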