Rita Carreira
2011-Mar-18 22:35 UTC
[R] How do I delete multiple blank variables from a data frame?
Dear List Members,

I have 55 data frames, each of which has 272 variables and 267 observations. Some of these variables are blank, but which ones are blank differs from one data frame to the next. I would like to write a procedure that imports a data frame, checks which variables are blank, and deletes those variables. My data frames have variables named P1 to P136 and Q1 to Q136.

I have a couple of questions about this:

1) Is a loop an efficient way to address this problem? If not, what are the alternatives and how do I implement them?

2) I have been experimenting with a single data frame, trying to figure out how to have R go through the columns and decide which ones to delete. I have figured out how to delete rows with missing data (newdata <- na.omit(olddata)), but how do I do it for columns?

Thank you very much for your help and have a great weekend!

Rita
________________________________________
"If you think education is expensive, try ignorance" --Derek Bok
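A minimal sketch of a loop-free approach to question 1, assuming the data frames can be held together in one list: lapply() applies a column filter to every frame in the list. The helper names is_blank_col() and drop_blank_cols() and the two toy frames are hypothetical, and "blank" is assumed to mean that every value is NA or, for character columns, the empty string.

```r
## Hypothetical helper: TRUE when a column is entirely NA,
## or (for character columns) entirely NA or empty strings.
is_blank_col <- function(x) {
  if (is.character(x)) all(is.na(x) | x == "") else all(is.na(x))
}

## Hypothetical helper: keep only the non-blank columns.
## drop = FALSE keeps a data frame even if a single column survives.
drop_blank_cols <- function(df) {
  df[, !vapply(df, is_blank_col, logical(1)), drop = FALSE]
}

## Two small stand-in frames (invented data, not Rita's files).
frames <- list(
  a = data.frame(P1 = 1:3, P2 = NA, Q1 = c("x", "", "z"),
                 stringsAsFactors = FALSE),
  b = data.frame(P1 = NA, P2 = 4:6, Q1 = c("", "", ""),
                 stringsAsFactors = FALSE)
)

## lapply() takes the place of an explicit for loop over the frames.
cleaned <- lapply(frames, drop_blank_cols)
lapply(cleaned, names)
```

The same lapply() call works whether the list holds two frames or 55. Building the list with something like lapply(files, read.csv), where files is a vector of file names, is one possible approach, assuming the data live in CSV files. For a few dozen frames an explicit for loop would likely be just as fast. lapply() mainly keeps the code shorter.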
Joshua Wiley
2011-Mar-19 01:35 UTC
[R] How do I delete multiple blank variables from a data frame?
Hi Rita,

This is far from the most efficient or elegant way, but:

## two-column data frame, one column all NAs
d <- data.frame(1:10, NA)
## use apply to create a logical vector and subset d
d[, apply(d, 2, function(x) !all(is.na(x)))]

I am just apply()ing the function !all(is.na(x)) to each column (the 2) of d. It returns FALSE if all of x is missing and TRUE otherwise. The result is a logical vector with one entry per column of d, which is then used to keep only the columns of d with at least some non-missing values. For documentation see:

?apply
?is.na
?all
?"["
?Logic

HTH,

Josh

--
Joshua Wiley
Ph.D. Student, Health Psychology
University of California, Los Angeles
http://www.joshuawiley.com/
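This is a quick check of the apply() version on Josh's own d, with one caveat. Only one column survives in this toy example, so `[` simplifies the result to a plain vector. Passing drop = FALSE keeps a data frame. The names keep, res1 and res2 are illustrative only.

```r
## Josh's example data: one integer column and one all-NA column
d <- data.frame(1:10, NA)

## TRUE for columns that contain at least one non-missing value
keep <- apply(d, 2, function(x) !all(is.na(x)))

res1 <- d[, keep]                # one column survives, so `[` returns a plain vector
res2 <- d[, keep, drop = FALSE]  # drop = FALSE keeps the data frame structure

class(res1)  # "integer"
class(res2)  # "data.frame"
```

With Rita's 272-column frames, more than one column will usually survive, so the simplification may never appear. Adding drop = FALSE is a cheap safeguard either way.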
Allan Engelhardt
2011-Mar-19 08:36 UTC
[R] How do I delete multiple blank variables from a data frame?
On 19/03/11 01:35, Joshua Wiley wrote:
> Hi Rita,
>
> This is far from the most efficient or elegant way, but:
>
> ## two column data frame, one all NAs
> d <- data.frame(1:10, NA)
> ## use apply to create logical vector and subset d
> d[, apply(d, 2, function(x) !all(is.na(x)))]

This works, but apply() converts d to a matrix, which is not needed, so try

d[, sapply(d, function(x) !all(is.na(x)))]

if performance is an issue (apply is about 3x slower on your test data frame d, and more so for larger data frames).

For the related problem of removing columns whose values are constant or NA, the best I could come up with is

zv.1 <- function(x) {
  ## The literal approach
  y <- var(x, na.rm = TRUE)
  return(is.na(y) || y == 0)
}
sapply(train, zv.1)  ## train is the data frame being checked; TRUE marks columns to drop

See http://www.cybaea.net/Blogs/Data/R-Eliminating-observed-values-with-zero-variance.html for the benchmarks.

Allan
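This example applies Allan's zv.1() to a small stand-in for his train data frame. The data here are invented for illustration. The function flags a constant column and an all-NA column, and negating the result keeps only the column that varies. drop_me and kept are hypothetical names. The exact y == 0 test assumes that a constant column's variance comes out as exactly zero, which holds for the small values used here.

```r
## Allan's zero-variance test, as posted
zv.1 <- function(x) {
  ## The literal approach
  y <- var(x, na.rm = TRUE)
  return(is.na(y) || y == 0)
}

## Hypothetical stand-in for `train`: one varying column,
## one constant column (ignoring NA), and one all-NA column
train <- data.frame(a = c(1, 2, 3, NA),
                    b = c(5, 5, NA, 5),
                    c = NA_real_)

drop_me <- sapply(train, zv.1)           # TRUE marks constant-or-NA columns
kept <- train[, !drop_me, drop = FALSE]  # keep the varying columns as a data frame
names(kept)
```

The all-NA column is caught by the is.na(y) branch, because var() has no non-missing values to work with after na.rm = TRUE. A column with only one non-missing value is flagged the same way.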
Reasonably Related Threads
- What does class "call" mean? How do I make class "formula" into a "call"?
- Function for deleting variables with >=50% missing obs from a data frame
- Subsetting a data frame by dropping correlated variables
- df with max function applied to 6 lags of a variable?!?
- Package mice: Error in if (meth[j] != "") { : argument is of length zero