Avi Gross
2021-Sep-15 05:22 UTC
[R] How to remove all rows that have a numeric in the first (or any) column
You are correct, Gregg, I am aware of that trick of asking something to not be evaluated in certain ways. And you can indeed use base R to play with contents of beta as defined above. Here is a sort of incremental demo:

> sapply(mydf$beta, is.numeric)
[1] FALSE  TRUE  TRUE FALSE
> !sapply(mydf$beta, is.numeric)
[1]  TRUE FALSE FALSE  TRUE
> keeping <- !sapply(mydf$beta, is.numeric)
> mydf[keeping, ]
# A tibble: 2 x 2
  alpha beta
  <int> <list>
1     1 <chr [1]>
2     4 <chr [1]>
> str(mydf[keeping, ])
tibble [2 x 2] (S3: tbl_df/tbl/data.frame)
 $ alpha: int [1:2] 1 4
 $ beta :List of 2
  ..$ : chr "Hello"
  ..$ : chr "bye"

Now for the bad news. The original request was for ANY column. But presumably one way to do it, neither efficiently nor the best, would be to loop on the names of all the columns and starting with the original data.frame, whittle away at it column by column and adjust which column you search each time until what is left had nothing numeric anywhere.

Now if I was using dplyr, I wonder if there is a nice way to use rowwise() to evaluate across a row. Using your technique I made the following data.frame:

mydf <- data.frame(alpha=I(list("first", 2, 3.3, "Last")),
                   beta=I(list(1, "second", 3.3, "Lasting")))

> mydf
  alpha    beta
1 first       1
2     2  second
3   3.3     3.3
4  Last Lasting

Do we agree only the fourth row should be kept as the others have one or two numeric values?
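The column-by-column whittling described above can also be done in a single pass in base R. What follows is only a sketch and not code from this thread: the helper name cell_is_numeric and the variables any_numeric and kept are made up for illustration, and it assumes every column should be tested cell by cell (so an ordinary numeric column, such as an integer id, would flag every row).

```r
# Same two list columns as the example data.frame above.
mydf <- data.frame(alpha = I(list("first", 2, 3.3, "Last")),
                   beta  = I(list(1, "second", 3.3, "Lasting")))

# Hypothetical helper: test each cell of one column separately.
cell_is_numeric <- function(col) vapply(col, is.numeric, logical(1))

# TRUE for a row if any column holds a numeric in that row.
any_numeric <- Reduce(`|`, lapply(mydf, cell_is_numeric))

# Keep only the rows with no numeric cell anywhere.
kept <- mydf[!any_numeric, , drop = FALSE]
print(kept)
```

Combining the per-column results with Reduce() avoids shrinking the data.frame repeatedly, and drop = FALSE keeps the result a data.frame even if only one column is present.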
Here is some code I cobbled together that seems to work:

rowwise(mydf) %>%
  mutate(alphazoid=!is.numeric(unlist(alpha)),
         betazoid=!is.numeric(unlist(beta))) %>%
  filter(alphazoid & betazoid) -> result

str(result)
print(result)
result[[1,1]]
result[[1,2]]
as.data.frame(result)

The results below show that only the fourth row was kept:

> rowwise(mydf) %>%
+   mutate(alphazoid=!is.numeric(unlist(alpha)),
+          betazoid=!is.numeric(unlist(beta))) %>%
+   filter(alphazoid & betazoid) -> result
> str(result)
rowwise_df [1 x 4] (S3: rowwise_df/tbl_df/tbl/data.frame)
 $ alpha    :List of 1
  ..$ : chr "Last"
  ..- attr(*, "class")= chr "AsIs"
 $ beta     :List of 1
  ..$ : chr "Lasting"
  ..- attr(*, "class")= chr "AsIs"
 $ alphazoid: logi TRUE
 $ betazoid : logi TRUE
 - attr(*, "groups")= tibble [1 x 1] (S3: tbl_df/tbl/data.frame)
  ..$ .rows: list<int> [1:1]
  .. ..$ : int 1
  .. ..@ ptype: int(0)
> print(result)
# A tibble: 1 x 4
# Rowwise:
  alpha     beta      alphazoid betazoid
  <I<list>> <I<list>> <lgl>     <lgl>
1 <chr [1]> <chr [1]> TRUE      TRUE
> result[[1,1]]
[[1]]
[1] "Last"

> result[[1,2]]
[[1]]
[1] "Lasting"

> as.data.frame(result)
  alpha    beta alphazoid betazoid
1  Last Lasting      TRUE     TRUE

Of course, the temporary columns for alphazoid and betazoid can trivially be removed.

From: Andrew Simmons <akwsimmo at gmail.com>
Sent: Wednesday, September 15, 2021 12:44 AM
To: Avi Gross <avigross at verizon.net>
Cc: Gregg Powell via R-help <r-help at r-project.org>
Subject: Re: [R] How to remove all rows that have a numeric in the first (or any) column

I'd like to point out that base R can handle a list as a data frame column, it's just that you have to make the list of class "AsIs". So in your example

temp <- list("Hello", 1, 1.1, "bye")
data.frame(alpha = 1:4, beta = I(temp))

means that column "beta" will still be a list.

On Wed, Sep 15, 2021, 00:40 Avi Gross via R-help <r-help at r-project.org> wrote:

Calling something a data.frame does not make it a data.frame.
The abbreviated object shown below is a list of singletons. If it is a column in a larger object that is a data.frame, then it is a list column, which is valid but can be ticklish to handle within base R, though less so in the tidyverse.

For example, if I try to make a data.frame the normal way, the list gets made into multiple columns and copied to each row. Not what was expected. I think some tidyverse functionality does better. Like this:

library(tidyverse)
temp=list("Hello", 1, 1.1, "bye")

Now making a data.frame has an odd result:

> mydf=data.frame(alpha=1:4, beta=temp)
> mydf
  alpha beta..Hello. beta.1 beta.1.1 beta..bye.
1     1        Hello      1      1.1        bye
2     2        Hello      1      1.1        bye
3     3        Hello      1      1.1        bye
4     4        Hello      1      1.1        bye

But a tibble handles it:

> mydf=tibble(alpha=1:4, beta=temp)
> mydf
# A tibble: 4 x 2
  alpha beta
  <int> <list>
1     1 <chr [1]>
2     2 <dbl [1]>
3     3 <dbl [1]>
4     4 <chr [1]>

So the data can look like this, with a list column, but access can be tricky, as subsetting a list with [] returns a list and you need [[]] to get at the element itself.

I found a somewhat odd solution like this:

mydf %>%
  filter(!map_lgl(beta, is.numeric)) -> mydf2

# A tibble: 2 x 2
  alpha beta
  <int> <list>
1     1 <chr [1]>
2     4 <chr [1]>

When I saved that result into mydf2, I got this.

Original:

> str(mydf)
tibble [4 x 2] (S3: tbl_df/tbl/data.frame)
 $ alpha: int [1:4] 1 2 3 4
 $ beta :List of 4
  ..$ : chr "Hello"
  ..$ : num 1
  ..$ : num 1.1
  ..$ : chr "bye"

Output when any row with a numeric is removed:

> str(mydf2)
tibble [2 x 2] (S3: tbl_df/tbl/data.frame)
 $ alpha: int [1:2] 1 4
 $ beta :List of 2
  ..$ : chr "Hello"
  ..$ : chr "bye"

So if you try variations on your code motivated by what I show, good luck. I am sure there are many better ways but I repeat, it can be tricky.
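The [] versus [[]] distinction mentioned above is easy to check directly. Here is a small base R sketch that reuses the temp list from the message; the variable names piece, item, base_df and keep are only for illustration and do not come from the thread.

```r
temp <- list("Hello", 1, 1.1, "bye")

piece <- temp[2]     # single brackets: still a list, of length one
item  <- temp[[2]]   # double brackets: the element itself

is.list(piece)       # TRUE
is.numeric(piece)    # FALSE, a list is never numeric
is.numeric(item)     # TRUE

# Wrapping the list in I() keeps it as one list column in base R.
base_df <- data.frame(alpha = 1:4, beta = I(temp))

# sapply() hands each element to is.numeric(), as [[ ]] would.
keep <- !sapply(base_df$beta, is.numeric)
base_df[keep, ]
```

This is why a per-element function such as sapply() or map_lgl() is needed: it does the [[ ]] extraction for each row before the test is applied.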
-----Original Message-----
From: R-help <r-help-bounces at r-project.org> On Behalf Of Jeff Newmiller
Sent: Tuesday, September 14, 2021 11:54 PM
To: Gregg Powell <g.a.powell at protonmail.com>
Cc: Gregg Powell via R-help <r-help at r-project.org>
Subject: Re: [R] How to remove all rows that have a numeric in the first (or any) column

You cannot apply vectorized operators to list columns... you have to use a map function like sapply or purrr::map_lgl to obtain a logical vector by running the function once for each list element:

sapply( VPN_Sheet1$HVA, is.numeric )

On September 14, 2021 8:38:35 PM PDT, Gregg Powell <g.a.powell at protonmail.com> wrote:
>Here is the output:
>
>> str(VPN_Sheet1$HVA)
>List of 2174
> $ : chr "Email: fffd at fffffffffff.com"
> $ : num 1
> $ : chr "Eloisa Libas"
> $ : chr "Percival Esquejo"
> $ : chr "Louchelle Singh"
> $ : num 2
> $ : chr "Charisse Anne Tabarno, RN"
> $ : chr "Sol Amor Mucoy"
> $ : chr "Josan Moira Paler"
> $ : num 3
> $ : chr "Anna Katrina V. Alberto"
> $ : chr "Nenita Velarde"
> $ : chr "Eunice Arrances"
> $ : num 4
> $ : chr "Catherine Henson"
> $ : chr "Maria Carla Daya"
> $ : chr "Renee Ireine Alit"
> $ : num 5
> $ : chr "Marol Joseph Domingo - PS"
> $ : chr "Kissy Andrea Arriesgado"
> $ : chr "Pia B Baluyut, RN"
> $ : num 6
> $ : chr "Gladys Joy Tan"
> $ : chr "Frances Zarzua"
> $ : chr "Fairy Jane Nery"
> $ : num 7
> $ : chr "Gladys Tijam, RMT"
> $ : chr "Sarah Jane Aramburo"
> $ : chr "Eve Mendoza"
> $ : num 8
> $ : chr "Gloria Padolino"
> $ : chr "Joyce Pearl Javier"
> $ : chr "Ayza Padilla"
> $ : num 9
> $ : chr "Walfredson Calderon"
> $ : chr "Stephanie Anne Militante"
> $ : chr "Rennua Oquilan"
> $ : num 10
> $ : chr "Neil John Nery"
> $ : chr "Maria Reyna Reyes"
> $ : chr "Rowella Villegas"
> $ : num 11
> $ : chr "Katelyn Mendiola"
> $ : chr "Maria Riza Mariano"
> $ : chr "Marie Vallianne Carantes"
> $ : num 12
>
>------- Original Message -------
>
>On Tuesday, September 14th, 2021 at 8:32 PM, Jeff Newmiller <jdnewmil at dcn.davis.ca.us> wrote:
>
>> An atomic column of data by design has exactly one mode, so if any
>> values are non-numeric then the entire column will be non-numeric.
>> What does
>>
>> str(VPN_Sheet1$HVA)
>>
>> tell you? It is likely either a factor or character data.
>>
>> On September 14, 2021 7:01:53 PM PDT, Gregg Powell via R-help r-help at r-project.org wrote:
>>
>> > Stuck on this problem - How does one remove all rows in a dataframe that have a numeric in the first (or any) column?
>> >
>> > Seems straight forward - but I'm having trouble.
>> >
>> > I've attempted to use:
>> >
>> > VPN_Sheet1 <- VPN_Sheet1[!is.numeric(VPN_Sheet1$HVA),]
>> >
>> > and
>> >
>> > VPN_Sheet1 <- VPN_Sheet1[!is.integer(VPN_Sheet1$HVA),]
>> >
>> > Neither work - Neither throw an error.
>> >
>> > class(VPN_Sheet1$HVA) returns:
>> >
>> > [1] "list"
>> >
>> > So, the HVA column returns a list.
>> >
>> > Data looks like the attached screen grab -
>> >
>> > The ONLY rows I need to delete are the rows where there is a numeric in the HVA column.
>> >
>> > There are some 5000+ rows in the actual data.
>> >
>> > Would be grateful for a solution to this problem.
>> >
>> > How to get R to detect whether the value in column 1 is a number so the rows with the number values can be deleted?
>> >
>> > Thanks in advance to any and all willing to help on this problem.
>> >
>> > Gregg Powell
>> >
>> > Sierra Vista, AZ
>>
>> --
>>
>> Sent from my phone. Please excuse my brevity.

--
Sent from my phone. Please excuse my brevity.

______________________________________________
R-help at r-project.org mailing list -- To UNSUBSCRIBE and more, see
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.

[[alternative HTML version deleted]]
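As a footnote to the original attempt quoted above, here is a minimal sketch of why it neither fails nor removes anything. The data is a made-up three-row stand-in for VPN_Sheet1 (the object name sheet, the id column and the values are all hypothetical), with HVA stored as a list column.

```r
# Hypothetical stand-in for VPN_Sheet1; only the shape matters.
sheet <- data.frame(id = 1:3, HVA = I(list("Ann", 1, "Bob")))

# is.numeric() looks at the column as a whole: a list is not numeric.
whole <- is.numeric(sheet$HVA)    # a single FALSE

# !FALSE is a single TRUE, which is recycled, so every row is kept.
unchanged <- sheet[!whole, ]

# Testing each element separately gives one logical value per row.
per_cell <- sapply(sheet$HVA, is.numeric)
cleaned <- sheet[!per_cell, ]
```

Under these assumptions unchanged still has all three rows, while cleaned keeps only the rows whose HVA cell is not numeric, which is the behaviour the sapply() suggestion in the thread relies on.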
Avi Gross
2021-Sep-15 05:25 UTC
[R] How to remove all rows that have a numeric in the first (or any) column
My apologies. My reply was to Andrew, not Gregg. Enough damage for one night.

Here is hoping we finally understood a question that could have been better phrased. List columns are not normally considered common data structures but quite possibly will be more as time goes on and the tools to handle them become better or at least better understood.