Ana Marija
2019-Oct-05 18:50 UTC
[R] how to select all columns that contain in any of their rows a partial match for a string?
Hello,

I have a data frame tot which has many columns and many rows.

I am trying to find all columns that have a value in any of their rows that STARTS WITH "E94".

For example, there are columns like this:

> unique(tot$diagnoses_icd9_f41271_0_44)
[1] NA "E9420"

I tried:
s=select(tot,starts_with("E94"))

but this didn't return anything. The data type in those columns is character.

Thanks
Ana
Rui Barradas
2019-Oct-05 19:24 UTC
[R] how to select all columns that contain in any of their rows a partial match for a string?
Hello,
Try the following
cols <- sapply(tot, function(x) any(grepl("^E94", x)))
To have the column numbers,
which(cols)
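For context, dplyr's starts_with() matches column names rather than cell values, which is why the select() call returned nothing. The approach above can be sketched on a small made-up data frame; the column names and values below are assumptions for illustration, not the poster's data.

```r
# Toy data; column names and values are invented for illustration.
tot <- data.frame(
  eid    = c(1, 2, 3),
  diag_a = c(NA, "E9420", "V103"),
  diag_b = c("4019", NA, "2724"),
  stringsAsFactors = FALSE
)

# TRUE for each column with at least one value starting with "E94".
# grepl() returns FALSE for NA, so missing values need no special handling.
cols <- sapply(tot, function(x) any(grepl("^E94", x)))

which(cols)                        # named index of the matching columns
hits <- tot[, cols, drop = FALSE]  # keep only the matching columns
```

Using drop = FALSE keeps the result a data frame even when only one column matches.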
Hope this helps,
Rui Barradas
Rui Barradas
2019-Oct-05 21:05 UTC
[R] how to select all columns that contain in any of their rows a partial match for a string?
Hello,
Please CC the list.
The following code does what you want.
tot <- data.frame(a = c("E10123", "F123", "G4567"),
                  b = c("a123", "E112345", "b456"))
e10 <- sapply(tot, function(x) grepl("^E10", x))
e10 <- rowSums(e10) > 0
e11 <- sapply(tot, function(x) grepl("^E11", x))
e11 <- rowSums(e11) > 0
tot$newcol <- -9
tot$newcol[e10] <- 1
tot$newcol[e11] <- 2
In both cases, the two sapply/rowSums lines can be combined into one with
rowSums(sapply(...)) > 0
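The one-line form can be sketched as a small helper. The name has_prefix is hypothetical (it is not part of any package), and the sketch assumes the prefix contains no regular-expression metacharacters and that the data frame has more than one row, so that sapply() returns a matrix.

```r
tot <- data.frame(a = c("E10123", "F123", "G4567"),
                  b = c("a123", "E112345", "b456"),
                  stringsAsFactors = FALSE)

# Hypothetical helper: TRUE for each row where any column starts with `prefix`.
has_prefix <- function(df, prefix) {
  rowSums(sapply(df, function(x) grepl(paste0("^", prefix), x))) > 0
}

e10 <- has_prefix(tot, "E10")
e11 <- has_prefix(tot, "E11")

tot$newcol <- -9
tot$newcol[e10] <- 1
tot$newcol[e11] <- 2  # applied last, so E11 overrides E10 if a row has both
```

On a wide data frame, the scan could be limited to the columns already identified with which(cols), for example by passing tot[cols] instead of tot.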
Hope this helps,
Rui Barradas
At 20:52 on 05/10/19, Ana Marija wrote:
> Hi Rui,
>
> thank you so much for getting back to me.
>
> I did what you told me:
> cols <- sapply(tot, function(x) any(grepl("^E10", x)))
> a=which(cols)
>
> so this gives me the names of 49 columns that have that particular string.
>
> But how do I create a new column in my tot data frame (the column
> would be called "TD") which has 1 in the row where the subject
> (designated in the "eid" column) has a string which starts with "E10",
> 2 if it starts with "E11", and -9 otherwise?
>
>> head(tot)[1:3,1:3]
>       eid sex_f31_0_0 year_of_birth_f34_0_0
> 1 1000017      Female                  1938
> 2 1000025      Female                  1951
> 3 1000038        Male                  1961
>
> Thank you so much!
Ana Marija
2019-Oct-05 21:41 UTC
[R] how to select all columns that contain in any of their rows a partial match for a string?
Thank you so much, this worked wonderfully!