Samir Benzerfa
2011-Oct-12 15:35 UTC
[R] exclude columns with at least three consecutive zeros
Hi everyone,

I have a large data set with about 3,000 columns, and I would like to exclude all columns that contain three or more consecutive zeros (see the example below). A further complication is that NA values, if any, should simply be skipped. How can I do this?

In the example below, R should exclude columns C and D (in D, skipping the NA leaves three consecutive zeros).

I would appreciate any solutions to this issue.

Many thanks!
S.B.

  Date   A   B   C   D
  1980   2  75  12  41
  1981   9  NA   7   0
  1982  18  15   0   0
  1983   0  16   0  NA
  1984  12  43   0   0
  1985  48   3  26  21
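A minimal sketch of the rule as stated, applied to column D with base R's rle(). First the NA is dropped ("skipped"), then the zeros are marked, and finally the lengths of the runs of zeros are measured. The variable names d, is_zero and runs are chosen only for this illustration.

```r
# Column D from the example table; the NA sits between the second and third zero
d <- c(41, 0, 0, NA, 0, 21)

# Skip the NA first, then mark which remaining values are zero
is_zero <- d[!is.na(d)] == 0   # FALSE TRUE TRUE TRUE FALSE

# rle() collapses the logical vector into runs; keep only the runs of TRUE (zeros)
runs <- rle(is_zero)
runs$lengths[runs$values]      # 3, so column D should be excluded
```

Column C gives the same result without any NA handling, because its zeros are already adjacent.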
William Dunlap
2011-Oct-12 17:23 UTC
[R] exclude columns with at least three consecutive zeros
First define a function that returns TRUE if a column should be dropped, e.g.,

  has3Zeros.1 <- function(x) {
      x <- x[!is.na(x)] == 0  # drop NA's, convert 0's to TRUE, others to FALSE
      if (length(x) < 3) {
          FALSE  # you may want to further test short vectors
      } else {
          i <- seq_len(length(x) - 2)
          any(x[i] & x[i + 1] & x[i + 2])
      }
  }

or

  has3Zeros.2 <- function(x) {
      x <- x[!is.na(x)] == 0
      r <- rle(x)
      any(r$lengths[r$values] >= 3)
  }

Then use sapply on your data.frame with this function to see which columns to omit, and use [ to omit them:

  > e <- data.frame(Date=1980:1985,
  +                 A = c(2, 9, 18, 0, 12, 48),
  +                 B = c(75, NA, 15, 16, 43, 3),
  +                 C = c(12, 7, 0, 0, 0, 26),
  +                 D = c(41, 0, 0, NA, 0, 21))
  > e[, !sapply(e, has3Zeros.1), drop=FALSE]
    Date  A  B
  1 1980  2 75
  2 1981  9 NA
  3 1982 18 15
  4 1983  0 16
  5 1984 12 43
  6 1985 48  3

Bill Dunlap
Spotfire, TIBCO Software
wdunlap tibco.com
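For a data set with about 3,000 columns, the same approach can be parameterised. The sketch below is essentially has3Zeros.2 with the run length turned into an argument. The name hasZeroRun and its k argument are hypothetical, not from the thread. It uses vapply() instead of sapply() so that each column is required to yield exactly one logical value.

```r
# Hypothetical generalisation of has3Zeros.2: TRUE if, after skipping NAs,
# x contains a run of at least k consecutive zeros (k = 3 in the question).
hasZeroRun <- function(x, k = 3) {
  is_zero <- x[!is.na(x)] == 0     # skip NAs; TRUE where the value is zero
  r <- rle(is_zero)                # run-length encode the TRUE/FALSE sequence
  any(r$lengths[r$values] >= k)    # is any run of zeros k or more long?
}

e <- data.frame(Date = 1980:1985,
                A = c(2, 9, 18, 0, 12, 48),
                B = c(75, NA, 15, 16, 43, 3),
                C = c(12, 7, 0, 0, 0, 26),
                D = c(41, 0, 0, NA, 0, 21))

# vapply() errors if a column does not return a single logical value,
# instead of silently returning a list the way sapply() can
to_drop <- vapply(e, hasZeroRun, logical(1), k = 3)
kept <- e[, !to_drop, drop = FALSE]
names(kept)   # "Date" "A" "B"
```

With k = 4, no column in this example would be dropped, because the longest run of zeros (in C, and in D once its NA is skipped) has length 3.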