Ana Marija
2020-Jun-03 14:55 UTC
[R] how to filter variables which appear in any row but do not include
Hello.

I am trying to filter only rows that have ANY of these variables:
E109, E119, E149

so I did:
controls=t %>% filter_all(any_vars(. %in% c("E109", "E119","E149")))

then I checked what I got:
> s0 <- sapply(controls, function(x) grep('^E10', x, value = TRUE))
> d0=unlist(s0)
> d10=unique(d0)
> d10
 [1] "E10"  "E103" "E104" "E109" "E101" "E108" "E105" "E100" "E106" "E102"
[11] "E107"
> s1 <- sapply(controls, function(x) grep('^E11', x, value = TRUE))
> d1=unlist(s1)
> d11=unique(d1)
> d11
 [1] "E11"  "E119" "E113" "E115" "E111" "E114" "E110" "E118" "E116" "E112"
[11] "E117"

I need help with changing this command
controls=t %>% filter_all(any_vars(. %in% c("E109", "E119","E149")))

so that in the output I do not have any rows that include E102 or E112.

Thanks
Ana
Bert Gunter
2020-Jun-03 16:00 UTC
[R] how to filter variables which appear in any row but do not include
I suggest that you forget all that fancy stuff (and this is not a use case for regular expressions). Use %in% with logical subscripting instead -- basic R functionality that can be found in any good R tutorial.

> x <- c("ab","bc","cd")
> x[x %in% c("ab","cd")]
[1] "ab" "cd"
> x[!x %in% c("ab","cd")]
[1] "bc"

Bert Gunter

"The trouble with having an open mind is that people keep coming along and sticking things into it."
-- Opus (aka Berkeley Breathed in his "Bloom County" comic strip )
Ana Marija
2020-Jun-03 16:28 UTC
[R] how to filter variables which appear in any row but do not include
Hi Bert,

The issue is that I have around 2000 columns, so I cannot check "by hand", so to speak, whether those two values are absent from each column of every row. And I need my output to be a data frame where neither E102 nor E112 is present. Basically, from the data frame that I already created, just remove any row that contains any of those variables.

Thanks
Ana
Rui Barradas
2020-Jun-03 16:50 UTC
[R] how to filter variables which appear in any row but do not include
Hello,

If you want to filter out rows with any of the values in an 'unwanted' vector, try the following.

First, create a test data set.

x <- scan(what = character(), text = '
"E10" "E103" "E104" "E109" "E101" "E108" "E105" "E100" "E106" "E102"
"E107" "E11" "E119" "E113" "E115" "E111" "E114" "E110" "E118" "E116"
"E112" "E117"
')

set.seed(2020)
dat <- replicate(5, sample(x, 20, TRUE))
dat <- as.data.frame(dat)

Now, remove all rows that have at least one of "E102" or "E112".

unwanted <- c("E102", "E112")
# For each column, flag the entries that match any unwanted value (regex match)
no <- sapply(dat, function(x){
  grepl(paste(unwanted, collapse = "|"), x)
})
# A row is dropped if any of its columns was flagged
no <- apply(no, 1, any)
dat[!no, ]

That's it, if I understood the problem.

Hope this helps,

Rui Barradas
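One point worth keeping in mind about a pattern built with paste(..., collapse = "|"): grepl() does regular-expression matching, so an unanchored pattern also matches longer codes that merely contain an unwanted value. A minimal sketch, assuming a hypothetical longer code such as "E1021" could occur in the data (no such code appears in this thread), compares the unanchored pattern with an anchored one:

```r
unwanted <- c("E102", "E112")
codes <- c("E102", "E1021", "E112", "E11")  # "E1021" is a hypothetical longer code

# Unanchored alternation: TRUE for any string that contains E102 or E112
loose <- grepl(paste(unwanted, collapse = "|"), codes)

# Anchored alternation: TRUE only for strings that are exactly E102 or E112
strict <- grepl(paste0("^(", paste(unwanted, collapse = "|"), ")$"), codes)

loose   # TRUE TRUE TRUE FALSE
strict  # TRUE FALSE TRUE FALSE
```

With the codes shown in the original question (E10, E100 to E109, E11, E110 to E119) both patterns flag the same entries, so the difference only matters if longer codes exist.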
William Michels
2020-Jun-03 17:19 UTC
[R] how to filter variables which appear in any row but do not include
#Below returns long list of TRUE/FALSE values,
#Note: "IDs" is a column name,
#Wrap with head() to shorten:
df$IDs %in% c("ident_1", "ident_2")

#Below returns index of IDs that are TRUE,
#Wrap with head() to shorten:
which(df$IDs %in% c("ident_1", "ident_2"))

#Below returns short TRUE/FALSE table:
table(df$IDs %in% c("ident_1", "ident_2"))

#Below check df to see unique IDs returned by "%in%" code above,
#(Good for identifying missing "desired" IDs):
unique(df[df$IDs %in% c("ident_1", "ident_2"), "IDs"])

#Below returns dimensions of dataframe "filtered" (retained) by desired IDs,
#(Note rows below should equal number of TRUE in table above):
dim(df[df$IDs %in% c("ident_1", "ident_2"), ])

#Create filtered dataframe object:
df_filtered <- df[df$IDs %in% c("ident_1", "ident_2"), ]

#Below returns row counts per "IDs" ("ident_1", "ident_2", etc.) in df_filtered:
aggregate(df_filtered$IDs, by=list(df_filtered$IDs), FUN = "length")

HTH, Bill.

W. Michels, Ph.D.
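The snippets in this message assume a data frame df with a column IDs. A small sketch with made-up data (the ID values and the value column are assumptions for illustration only, not taken from the thread) runs the same steps end to end:

```r
# Hypothetical example data; not from the original thread
df <- data.frame(
  IDs = c("ident_1", "ident_2", "ident_3", "ident_1", "ident_4"),
  value = 1:5,
  stringsAsFactors = FALSE
)
wanted <- c("ident_1", "ident_2")

keep <- df$IDs %in% wanted   # logical vector, one entry per row
table(keep)                  # how many rows match and how many do not

df_filtered <- df[keep, ]    # rows whose ID is one of the wanted values
df_dropped  <- df[!keep, ]   # negation: rows whose ID is none of them

nrow(df_filtered)            # 3
nrow(df_dropped)             # 2
```

Negating the logical vector, as in the last subset, is the single-column form of the exclusion asked about in the original question.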
Bert Gunter
2020-Jun-03 17:34 UTC
[R] how to filter variables which appear in any row but do not include
Regexes are not needed. Using Rui's example:

> bad <- mapply(function(x) x %in% unwanted,dat)
> dat[!rowSums(bad),]
     V1   V2   V3   V4   V5
2  E117 E113 E119 E100  E10
4  E114  E11 E119 E119 E114
5  E109 E111 E103 E103 E100
7  E108 E113 E119 E117  E11
8  E114 E105  E10 E109 E110
9  E119 E116 E108 E118 E119
10 E100 E110 E104 E111 E101
13 E111 E116 E101 E110 E116
15 E103  E11 E108  E10 E113
16 E111 E117 E103 E115 E119
17 E104 E110 E104 E117 E114
19 E100 E108  E10 E111 E105
20 E109 E115 E117 E108 E106

Bert Gunter
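The original question combined two conditions: keep rows that contain any of E109, E119, E149, and drop rows that contain E102 or E112. A base R sketch of both conditions together, using exact matching with %in% on a small hand-made data frame (the data and the helper name row_has are hypothetical, chosen only for illustration):

```r
# Hypothetical stand-in for the real data
dat <- data.frame(
  V1 = c("E109", "E102", "E100", "E119"),
  V2 = c("E11",  "E109", "E103", "E112"),
  V3 = c("E105", "E10",  "E149", "E101"),
  stringsAsFactors = FALSE
)
wanted   <- c("E109", "E119", "E149")
unwanted <- c("E102", "E112")

# TRUE for each row in which at least one column holds one of `values`
row_has <- function(d, values) {
  hits <- vapply(d, function(col) col %in% values, logical(nrow(d)))
  rowSums(matrix(hits, nrow = nrow(d))) > 0
}

# Keep rows with a wanted value and without any unwanted value
controls <- dat[row_has(dat, wanted) & !row_has(dat, unwanted), ]
controls   # rows 1 and 3 remain
```

Because the check is done column by column, the same code should work unchanged for a data frame with a few thousand columns; nothing has to be inspected by hand.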
Ana Marija
2020-Jun-03 17:49 UTC
[R] how to filter variables which appear in any row but do not include
Hi Rui,

thank you so much, that is exactly what I needed!

Cheers,
Ana