Ana Marija
2020-May-15 19:24 UTC
[R] how to extract strings in any column and in any row that start with
Hello,
this command was running for more than 2 hours
grep("E10",tot,value=T)
and no output
and this command
df1 <- tot %>% filter_all(any_vars(grepl( '^E10', .)))
gave me a subset (a data frame) of tot with the rows where some value starts with E10.
What I need is just a vector of all values in tot which start with E10.
Thanks
Ana
On Fri, May 15, 2020 at 12:13 PM Jeff Newmiller
<jdnewmil at dcn.davis.ca.us> wrote:
>
> Read about regular expressions... they are extremely useful.
>
> df1 <- tot %>% filter_all(any_vars(grepl( '^E10', .)))
>
> It is bad form not to put spaces around the <- assignment.
>
>
> On May 15, 2020 10:00:04 AM PDT, Ana Marija <sokovic.anamarija at gmail.com> wrote:
> >Hello,
> >
> >I have a data frame:
> >
> >> dim(tot)
> >[1] 502536 1093
> >
> >How would I extract from it all strings that start with E10?
> >
> >I know how to extract all rows that contain E10
> >df0<-tot %>% filter_all(any_vars(. %in% c('E10')))
> >> dim(df0)
> >[1] 5105 1093
> >
> >but I just need a vector of strings that start with E10...
> >it would look something like this:
> >
> >[1] "E102" "E109" "E108" "E103" "E104" "E105" "E101" "E106" "E107"
> >
> >Thanks
> >Ana
> >
> >______________________________________________
> >R-help at r-project.org mailing list -- To UNSUBSCRIBE and more, see
> >https://stat.ethz.ch/mailman/listinfo/r-help
> >PLEASE do read the posting guide
> >http://www.R-project.org/posting-guide.html
> >and provide commented, minimal, self-contained, reproducible code.
>
> --
> Sent from my phone. Please excuse my brevity.
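The difference between the exact match in the original question (`%in% 'E10'`) and the anchored pattern `'^E10'` suggested in the reply can be seen on a small made-up vector. This is only an illustrative sketch; the vector `x` is hypothetical and not taken from the poster's data.

```r
# Hypothetical values, chosen to show how the three tests differ.
x <- c("E10", "E102", "AE10", "E11")

# Exact match: only strings equal to "E10".
exact <- x %in% "E10"

# Unanchored regex: "E10" anywhere in the string (also matches "AE10").
anywhere <- grepl("E10", x)

# Anchored regex: "^" requires the match to begin at the start of the string.
prefix <- grepl("^E10", x)

x[prefix]   # "E10" "E102"
```

Note that `'^E10'` also matches the bare string "E10" itself, which may or may not be wanted.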
Jeff Newmiller
2020-May-15 20:34 UTC
[R] how to extract strings in any column and in any row that start with
If you want to treat your data frame as if it were a vector, then convert it to a vector before you give it to grep.

unlist(tot)
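A minimal sketch of the unlist() suggestion, using a tiny made-up data frame in place of the real 502536 x 1093 one (the column names and values are assumptions for illustration only). As far as I understand it, grep() coerces its x argument with as.character(), and for a list such as a data frame that yields one long deparsed string per column rather than one string per cell, which would explain why the direct call on tot was slow and unhelpful.

```r
# Toy stand-in for the real data frame (hypothetical data).
tot <- data.frame(
  a = c("E101", "K210", "E11"),
  b = c("I10", "E109", "E101"),
  stringsAsFactors = FALSE
)

# Coercing the data frame directly gives one string per COLUMN, not per cell.
n_strings <- length(as.character(tot))   # 2, the number of columns

# unlist() flattens the columns into a single character vector, column by column.
flat <- unlist(tot, use.names = FALSE)

# '^E10' anchors the match at the start of each string.
hits <- grep("^E10", flat, value = TRUE)

# Drop repeats if only the distinct values are wanted.
codes <- unique(hits)
```

Here `hits` keeps duplicates ("E101" appears twice), while `codes` holds each matching value once.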
cpoiw@rt m@iii@g oii chemo@org@uk
2020-May-15 20:43 UTC
[R] how to extract strings in any column and in any row that start with
This is almost certainly not the most efficient way:
tot <- data.frame(v1 = paste0(LETTERS[1:5], 1:10),
                  v2 = paste0(LETTERS[1:5], 101:110),
                  v3 = paste0(LETTERS[1:5], 111:120),
                  v4 = paste0(LETTERS[1:5], 121:130),
                  v5 = paste0(LETTERS[1:5], 131:140),
                  v6 = paste0(LETTERS[1:5], 101:110))
# set a variable to hold the result
myResult <- NULL
# iterate through each variable (column) of the data frame
for (v in seq_along(tot)) {
  # keep the values in column v that start with E10
  thisResult <- as.character(tot[grepl('^E10', tot[, v]), v])
  myResult <- c(myResult, thisResult)
}
myResult <- unique( myResult )
==
Indeed, as I wrote this Jeff popped along with unlist!
Using my example above:
unique(as.character(unlist(tot))[grepl('^E10', as.character(unlist(tot)))])
does what you wanted (you may not need the as.character() calls if you are on R
4.0, or if your df has chars rather than factors).
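To illustrate the remark about factors, here is a small sketch under the assumption that the columns are factors, as they would be by default before R 4.0. The data are made up, and startsWith() is swapped in for the regular expression purely as an alternative way to test a fixed prefix; it is not what the one-liner above uses.

```r
# Hypothetical data; stringsAsFactors = TRUE mimics the pre-4.0 default.
tot <- data.frame(
  v1 = c("E101", "A1"),
  v2 = c("B102", "E105"),
  stringsAsFactors = TRUE
)

# unlist() on a list of factors returns a factor, hence the as.character().
flat <- unlist(tot, use.names = FALSE)
vals <- as.character(flat)   # flatten and convert once, then reuse

# startsWith() tests a literal prefix without regex syntax.
res <- unique(vals[startsWith(vals, "E10")])
```

Storing the flattened vector in `vals` avoids calling unlist() twice on a large data frame.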
Rui Barradas
2020-May-15 22:12 UTC
[R] how to extract strings in any column and in any row that start with
Hello,
I have tried several options and with large dataframes this one was the
fastest (in my tests, of the ones I have tried).
s1 <- sapply(tot, function(x) grep('^E10', x, value = TRUE))
Then unlist(s1).
A close second (15% slower) was
s2 <- tot[sapply(tot, function(x) grepl('^E10', x))]
grep/unlist was 3.7 times slower:
grep("^E10", unlist(tot), value = TRUE)
Hope this helps,
Rui Barradas
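A small sketch of the per-column approach, on made-up data, to check that it returns the same values as the grep/unlist version. It uses lapply() instead of sapply(); that is a substitution on my part, made because sapply() can simplify its result to a matrix when every column yields the same number of matches, whereas lapply() always returns a list. No timings are shown, since they depend on the data and the machine.

```r
# Toy stand-in for the real data frame (hypothetical data).
tot <- data.frame(
  a = c("E101", "K210", "E11"),
  b = c("I10", "E109", "E101"),
  stringsAsFactors = FALSE
)

# Search each column separately; the result is a list with one element per column.
per_col <- lapply(tot, function(x) grep("^E10", x, value = TRUE))

# Flatten the per-column matches into one character vector.
s1 <- unlist(per_col, use.names = FALSE)

# The flatten-first variant, for comparison.
s3 <- grep("^E10", unlist(tot, use.names = FALSE), value = TRUE)

same <- identical(s1, s3)
```

Both variants walk the columns in the same order, so the matches come out in the same order as well.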
Ana Marija
2020-May-15 22:28 UTC
[R] how to extract strings in any column and in any row that start with
Hi Rui,
thank you so much, that is exactly what I needed!
Cheers,
Ana