Josh B
2009-Jan-18 14:55 UTC
[R] Deleting columns based on the number of non-blank observations
Hello,

I have a dataset (named "x") with many (966) columns. What I would like to do is delete any columns that do not have at least 375 non-blank observations (i.e., the cells have some value in them besides NA).

How can I do this? I have come up with the following code to _count_ the non-blank observations in each column, but how would I adapt this code to _delete_ columns from the dataset if they do not have at least 375 non-blank observations?

    lapply(x, function(d)
    {
      d.2 <- na.omit(d)
      count <- length(d.2)
    }
    )

Many thanks in advance,
Josh B.
jim holtman
2009-Jan-18 15:53 UTC
[R] Deleting columns based on the number of non-blank observations
Something like this should work:

    # TRUE for each column with at least 375 non-NA values
    keep <- apply(yourData, 2, function(x) sum(!is.na(x)) >= 375)
    # drop = FALSE keeps a data frame even if only one column survives
    yourData <- yourData[, keep, drop = FALSE]

--
Jim Holtman
Cincinnati, OH
+1 513 646 9390

What is the problem that you are trying to solve?
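As a rough illustration of the suggestion above, here is a small sketch on made-up data. The toy data frame, the name `min_obs`, and the threshold of 3 (standing in for 375) are assumptions for the example, not part of the original posts. It reuses the counting idea from the question (`length(na.omit(d))`) and turns the counts into a column filter.

```r
# Toy data: a hypothetical stand-in for the 966-column dataset "x"
x <- data.frame(
  a = c(1, 2, 3, 4, 5),
  b = c(NA, NA, NA, 4, 5),
  c = c(1, NA, 3, 4, NA),
  d = c(NA, NA, NA, NA, NA)
)
min_obs <- 3  # assumed threshold for the toy data; the thread uses 375

# Count non-NA cells per column, as the lapply() code in the question does
counts <- vapply(x, function(d) length(na.omit(d)), integer(1))

# Keep only the columns that meet the threshold
x_kept <- x[, counts >= min_obs, drop = FALSE]
```

Here `vapply()` is used instead of `lapply()` so the counts come back as a plain integer vector that can be compared against the threshold directly.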
David Winsemius
2009-Jan-18 16:01 UTC
[R] Deleting columns based on the number of non-blank observations
colSums(!is.na(x)) can replace your function (it counts the non-NA cells in each column), and logical indexing can keep only the wanted columns:

    x[colSums(!is.na(x)) >= 375]

or, with negative indexing on the column positions:

    x[-which(colSums(!is.na(x)) < 375)]

The logical form is safer: if no column falls below the threshold, which() returns an empty vector and the negative-index form returns no columns at all. Note also that the minus sign must be applied to positions from which(), not to the logical vector itself.

You could (destructively) assign the result to x if you are brave.

--
David Winsemius
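A short check of the colSums() approach on toy data may help here. It counts non-NA cells so that the threshold matches the question, and it also shows what happens when a minus sign is applied directly to a logical vector. The data frame and the names `min_obs`, `n_obs`, `keep`, `by_logical`, and `wrong` are made up for illustration.

```r
# Toy data: hypothetical, three columns with different numbers of NAs
x <- data.frame(
  a = c(1, 2, 3, 4),
  b = c(NA, NA, NA, 4),
  c = c(1, NA, 3, 4)
)
min_obs <- 2  # assumed threshold for the toy data; the thread uses 375

n_obs <- colSums(!is.na(x))  # non-NA count per column
keep  <- n_obs >= min_obs    # logical flag per column

by_logical <- x[keep]        # keeps a and c

# Negating a logical vector yields -1 for TRUE and 0 for FALSE, so this
# drops column 1 whenever any column is flagged, whichever one it is.
wrong <- x[-(n_obs < min_obs)]
```

In this sketch only column b is below the threshold, yet the negated-logical form removes column a instead, which is why logical indexing (or `-which(...)`) is the form to prefer.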