Hello,
A base R solution with strsplit, like in your code.
F1$Y1 <- +grepl("_", F1$text)
tmp <- strsplit(as.character(F1$text), "_")
tmp <- lapply(tmp, function(x) if(length(x) == 1) c(x, ".") else x)
tmp <- do.call(rbind, tmp)
colnames(tmp) <- c("X1", "X2")
F1 <- cbind(F1[-3], tmp) # remove the original column
rm(tmp)
F1
#  ID1 ID2 Y1   X1 X2
#1  A1  B1  0 NONE  .
#2  A1  B1  1   cf 12
#3  A1  B1  0 NONE  .
#4  A2  B2  1   X2 25
#5  A2  B3  1   fd 15
Note that cbind dispatches on F1, an object of class "data.frame".
Therefore the method cbind.data.frame is called and the result
is also a data frame, even though tmp is a matrix.
Hope this helps,
Rui Barradas
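
The point about cbind and data frames can be checked with a minimal sketch. It assumes the sample data from the original question; the matrix m and the names df_result and mat_result are hypothetical stand-ins for illustration, not part of the original code. According to the cbind documentation, the data frame method is used when at least one argument is a data frame, so the first call should return a data frame and the second a matrix.

```r
# Sample data from the original question
F1 <- read.table(text = "ID1 ID2 text
A1 B1 NONE
A1 B1 cf_12
A1 B1 NONE
A2 B2 X2_25
A2 B3 fd_15", header = TRUE, stringsAsFactors = FALSE)

# A character matrix with named columns (hypothetical stand-in for tmp)
m <- matrix(c("NONE", "cf", "NONE", "X2", "fd",
              ".", "12", ".", "25", "15"),
            ncol = 2, dimnames = list(NULL, c("X1", "X2")))

# A data frame is among the arguments: the data frame method is used
df_result <- cbind(F1, m)

# Only matrices are involved: the result stays a matrix
mat_result <- cbind(as.matrix(F1), m)

is.data.frame(df_result)
is.matrix(mat_result)
```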
At 20:07 on 22/09/20, Rui Barradas wrote:
> Hello,
>
> Something like this?
>
>
> F1$Y1 <- +grepl("_", F1$text)
> F1 <- F1[c(1, 2, 4, 3)]
> F1 <- tidyr::separate(F1, text, into = c("X1", "X2"), sep = "_", fill = "right")
> F1
>
>
> Hope this helps,
>
> Rui Barradas
>
> At 19:55 on 22/09/20, Val wrote:
>> Hi All,
>>
>> I am trying to create new columns based on the string content of another
>> column. First I want to identify rows that contain a particular
>> string. If a row contains it, I want to split the string and create two
>> variables.
>>
>> Here is my sample of data.
>> F1<-read.table(text="ID1 ID2 text
>> A1 B1 NONE
>> A1 B1 cf_12
>> A1 B1 NONE
>> A2 B2 X2_25
>> A2 B3 fd_15 ",header=TRUE,stringsAsFactors=F)
>> If the variable "text" contains "_", I want to create an indicator
>> variable as shown below
>>
>> F1$Y1 <- ifelse(grepl("_", F1$text),1,0)
>>
>>
>> Then I want to split that string into two, before "_" and after "_",
>> and create two variables as shown below
>> x1= strsplit(as.character(F1$text),'_',2)
>>
>> My problem is how to combine this with the original data frame. The
>> desired output is shown below,
>>
>>
>> ID1 ID2 Y1 X1   X2
>> A1  B1  0  NONE .
>> A1  B1  1  cf   12
>> A1  B1  0  NONE .
>> A2  B2  1  X2   25
>> A2  B3  1  fd   15
>>
>> Any help?
>> Thank you.
>>
>> ______________________________________________
>> R-help at r-project.org mailing list -- To UNSUBSCRIBE and more, see
>> https://stat.ethz.ch/mailman/listinfo/r-help
>> PLEASE do read the posting guide
>> http://www.R-project.org/posting-guide.html
>> and provide commented, minimal, self-contained, reproducible code.
>>
>
To be clear, I think Rui's solution is perfectly fine and probably better
than what I offer below. But just for fun, I wanted to do it without the
lapply(). Here is one way. I think my comments suffice to explain.

## which are the non "_" indices?
wh <- grep("_", F1$text, fixed = TRUE, invert = TRUE)
## paste "_." to these
F1[wh, "text"] <- paste(F1[wh, "text"], ".", sep = "_")
## Now strsplit() and unlist() them to get a vector
z <- unlist(strsplit(F1$text, "_"))
## now cbind() to the data frame
F1 <- cbind(F1, matrix(z, ncol = 2, byrow = TRUE))
F1
#   ID1 ID2   text    1  2
# 1  A1  B1 NONE_. NONE  .
# 2  A1  B1  cf_12   cf 12
# 3  A1  B1 NONE_. NONE  .
# 4  A2  B2  X2_25   X2 25
# 5  A2  B3  fd_15   fd 15
## You can change the names of the 2 columns yourself

Cheers,
Bert

Bert Gunter

"The trouble with having an open mind is that people keep coming along and
sticking things into it."
-- Opus (aka Berkeley Breathed in his "Bloom County" comic strip )
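
The lapply()-free approach above can be carried through to the layout requested in the original question. What follows is only a sketch under stated assumptions: it works on a copy of the text column (txt) rather than modifying F1, the names Y1, txt, z and out are chosen here for illustration, and it assumes every value contains at most one "_" (with more, strsplit() returns more than two pieces per row and the byrow matrix fill would misalign).

```r
# Sample data from the original question
F1 <- read.table(text = "ID1 ID2 text
A1 B1 NONE
A1 B1 cf_12
A1 B1 NONE
A2 B2 X2_25
A2 B3 fd_15", header = TRUE, stringsAsFactors = FALSE)

# Indicator: 1 where the text contains an underscore, 0 otherwise
has_us <- grepl("_", F1$text, fixed = TRUE)
Y1 <- as.integer(has_us)

# Work on a copy so the original column is left untouched
txt <- F1$text
txt[!has_us] <- paste(txt[!has_us], ".", sep = "_")

# Every element now splits into exactly two pieces (assumes at most one "_")
z <- matrix(unlist(strsplit(txt, "_", fixed = TRUE)),
            ncol = 2, byrow = TRUE,
            dimnames = list(NULL, c("X1", "X2")))

# Drop the text column and bind the indicator and the two new columns
out <- cbind(F1[c("ID1", "ID2")], Y1 = Y1, z)
out
```

Naming the matrix columns through dimnames gives the new columns the names X1 and X2 directly, so no renaming step is needed after cbind().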
Thank you all for the help!

LMH,
Yes, I would like to see the alternative. I am using this for a large data
set, and if the alternative is more efficient than this then I would be
happy.
Another way to make columns out of the stuff before and after the
underscore, with NAs if there is no underscore, is
utils::strcapture("([^_]*)_(.*)", F1$text,
    proto=data.frame(Before_=character(), After_=character()))
-Bill
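
As a rough sketch of how the strcapture() result could be combined with the original data frame: rows without an underscore come back as NA in both columns, so they can be filled in afterwards to match the desired output. The column names X1 and X2 in proto and the helper names parts, no_us and res are choices made here for illustration. The as.character() coercion is an assumption to guard against factor columns in R versions before 4.0.

```r
# Sample data from the original question
F1 <- read.table(text = "ID1 ID2 text
A1 B1 NONE
A1 B1 cf_12
A1 B1 NONE
A2 B2 X2_25
A2 B3 fd_15", header = TRUE, stringsAsFactors = FALSE)

# Capture the parts before and after the underscore; no match gives NA
parts <- utils::strcapture("([^_]*)_(.*)", F1$text,
                           proto = data.frame(X1 = character(),
                                              X2 = character()))

# Make sure both columns are character (they may be factors in R < 4.0)
parts[] <- lapply(parts, as.character)

# Rows where the pattern did not match, i.e. no underscore
no_us <- is.na(parts$X1)

# Indicator column plus the captured pieces
res <- cbind(F1[c("ID1", "ID2")], Y1 = as.integer(!no_us), parts)

# Fill unmatched rows as in the desired output: original text and "."
res$X1[no_us] <- F1$text[no_us]
res$X2[no_us] <- "."
res
```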