Vickie S
2011-May-13 13:42 UTC
[R] Quick question: Omitting rows and cols with certain percents of missing values
Hi,

A naive question. There are R commands for omitting rows or columns that contain any missing values. But if I want to omit rows or columns with, e.g., more than 20% missing values, I couldn't find any package-based command, probably because it is too simple for anyone to do manually, though not for me. Can anyone please help me?

- vickie
David Winsemius
2011-May-13 14:12 UTC
[R] Quick question: Omitting rows and cols with certain percents of missing values
On May 13, 2011, at 9:42 AM, Vickie S wrote:

> Hi
> naive question.
> It is possible to get R command for omitting rows or cols with
> missing values present.
>
> But
> if i want to omit rows or cols with i.e. >20% missing values, I
> couldn't find any package-based command, probably because it is too
> simple for anyone to do that manually, though not for me. Can anyone
> please help me ?

?is.na

> str(fil)
'data.frame':   8 obs. of  5 variables:
 $ X1  : int  2 3 4 5 6 NA NA 6
 $ X5  : int  6 7 NA NA NA NA NA NA
 $ X8  : int  9 NA NA NA NA NA NA NA
 $ X   : logi  NA NA NA NA NA NA ...
 $ X1.1: Factor w/ 6 levels "","2","3","5",..: 2 3 1 4 5 6 1 1

> is.na(fil)
        X1    X5    X8    X  X1.1
[1,] FALSE FALSE FALSE TRUE FALSE
[2,] FALSE FALSE  TRUE TRUE FALSE
[3,] FALSE  TRUE  TRUE TRUE FALSE
[4,] FALSE  TRUE  TRUE TRUE FALSE
[5,] FALSE  TRUE  TRUE TRUE FALSE
[6,]  TRUE  TRUE  TRUE TRUE FALSE
[7,]  TRUE  TRUE  TRUE TRUE FALSE
[8,] FALSE  TRUE  TRUE TRUE FALSE

> str(is.na(fil))
 logi [1:8, 1:5] FALSE FALSE FALSE FALSE FALSE TRUE ...
 - attr(*, "dimnames")=List of 2
  ..$ : NULL
  ..$ : chr [1:5] "X1" "X5" "X8" "X" ...

So is.na() applied to a data frame returns a logical matrix. You can run your tests for percentages with apply(), using the appropriate margin argument (1 for rows, 2 for columns), to generate logical indices for selecting rows or columns.

--
David Winsemius, MD
West Hartford, CT
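A minimal sketch of the apply() approach described in the message above. The data frame `fil` here is a made-up stand-in (not the one shown in the console output), and the 20% threshold comes from the original question; the variable names are assumptions for illustration.

```r
# Hypothetical example data: 6 rows, 4 columns, some NAs
fil <- data.frame(
  a = c(1, 2, 3, 4, 5, 6),
  b = c(NA, NA, NA, 4, 5, 6),
  c = c(1, 2, 3, 4, 5, NA),
  d = c(1, NA, 3, 4, 5, 6)
)

na_mat <- is.na(fil)                  # logical matrix, same shape as fil

# MARGIN = 1 applies the function to each row, MARGIN = 2 to each column.
# The mean of a logical vector is the proportion of TRUE values.
row_frac <- apply(na_mat, 1, mean)    # proportion of NAs in each row
col_frac <- apply(na_mat, 2, mean)    # proportion of NAs in each column

keep_rows <- row_frac <= 0.2          # logical index over rows
keep_cols <- col_frac <= 0.2          # logical index over columns

fil_rows <- fil[keep_rows, , drop = FALSE]   # rows with at most 20% missing
fil_cols <- fil[, keep_cols, drop = FALSE]   # columns with at most 20% missing
```

Using `drop = FALSE` keeps the result a data frame even if only one row or column survives the filter.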
John Kane
2011-May-13 14:15 UTC
[R] Quick question: Omitting rows and cols with certain percents of missing values
--- On Fri, 5/13/11, Vickie S <isvik at live.com> wrote:

> Hi
> naive question.
> It is possible to get R command for omitting rows or col
> with missing values present.

http://tolstoy.newcastle.edu.au/R/help/04/11/6887.html

Slightly adapted:

(mydata <- data.frame(matrix(c(1,2,3,4,5,6,7,8, NA),3)))
mydata[apply(mydata, 1, function(x)!any(is.na(x))), , drop=TRUE]

> But
> if i want to omit rows or cols with i.e. >20% missing
> values, I couldn't find any package-based command, probably because
> it is too simple for anyone to do that manually, though not for me.
> Can anyone please help me ?
>
> - vickie

I'd have to think about this. Hopefully a guru can come up with something quickly.
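For the simpler case of dropping rows with any missing value, base R also provides complete.cases() and na.omit(). The sketch below reuses the `mydata` example and compares them with the apply() call; the result names (`by_apply`, `by_cc`, `by_omit`) are made up for illustration.

```r
mydata <- data.frame(matrix(c(1, 2, 3, 4, 5, 6, 7, 8, NA), 3))

# Row-wise test via apply(), as in the adapted snippet
by_apply <- mydata[apply(mydata, 1, function(x) !any(is.na(x))), , drop = FALSE]

# complete.cases() returns TRUE for rows with no missing values
by_cc <- mydata[complete.cases(mydata), , drop = FALSE]

# na.omit() drops incomplete rows and records them in an "na.action" attribute
by_omit <- na.omit(mydata)
```

All three keep the same two rows here; none of them takes a percentage threshold, which is why the 20% case needs rowMeans() or apply() on is.na().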
Jorge Ivan Velez
2011-May-13 14:32 UTC
[R] Quick question: Omitting rows and cols with certain percents of missing values
Hi Vickie,

You might try the following:

# some data
set.seed(123)
X <- matrix(rnorm(1000), ncol = 20)
X[sample(1000, 100)] <- NA

# excluding rows with NA >20%
X[!(rowMeans(is.na(X)) > 0.2), ]

# excluding columns with NA >10%
X[, !(colMeans(is.na(X)) > 0.1)]

See ?is.na, ?rowMeans and ?colMeans for more information.

HTH,
Jorge
Peter Ehlers
2011-May-13 14:43 UTC
[R] Quick question: Omitting rows and cols with certain percents of missing values
On 2011-05-13 06:42, Vickie S wrote:

> Hi
> naive question.
> It is possible to get R command for omitting rows or cols with missing values present.
>
> But
> if i want to omit rows or cols with i.e. >20% missing values, I
> couldn't find any package-based command, probably because it is too
> simple for anyone to do that manually, though not for me. Can anyone
> please help me ?

Example:

set.seed(2718)
m <- matrix(sample(1:9, 100, TRUE), 10, 10)
is.na(m) <- sample(100, 20)
d <- as.data.frame(m)

# keep rows with at most 20% missing (a row has ncol(d) entries)
d[rowSums(is.na(d)) / ncol(d) <= 0.2, ]

# keep columns with at most 20% missing (a column has nrow(d) entries)
d[, colSums(is.na(d)) / nrow(d) <= 0.2]

Peter Ehlers
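Row and column proportions use different denominators (the number of columns for a row, the number of rows for a column), which is easy to miss on a square 10 x 10 example. The sketch below uses a non-square data frame and wraps both filters in a hypothetical helper, `drop_sparse()`. The helper is not from any package; its name, its arguments, and the choice to filter columns before rows are assumptions for illustration.

```r
# Hypothetical helper (not from any package): drop columns, then rows,
# whose proportion of missing values exceeds the given thresholds.
drop_sparse <- function(d, max_col = 0.2, max_row = 0.2) {
  # colMeans(is.na(d)) is colSums(is.na(d)) / nrow(d)
  d <- d[, colMeans(is.na(d)) <= max_col, drop = FALSE]
  # rowMeans(is.na(d)) is rowSums(is.na(d)) / ncol(d)
  d[rowMeans(is.na(d)) <= max_row, , drop = FALSE]
}

# Made-up 6 x 3 data frame
d <- data.frame(
  x = c(1, 2, 3, 4, 5, 6),
  y = c(NA, NA, NA, NA, 5, 6),
  z = c(1, NA, 3, 4, 5, 6)
)

res <- drop_sparse(d)
```

Order matters: once column `y` is removed, the row proportions are computed over the two remaining columns, so a different order of the two steps can keep a different set of rows.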