thr3ads.net - R help - [R] Split [Sep 2020]

Rui Barradas

2020-Sep-22 19:16 UTC

[R] Split

Hello,

A base R solution with strsplit, like in your code.

F1$Y1 <- +grepl("_", F1$text)

tmp <- strsplit(as.character(F1$text), "_")
tmp <- lapply(tmp, function(x) if(length(x) == 1) c(x, ".") else x)
tmp <- do.call(rbind, tmp)
colnames(tmp) <- c("X1", "X2")
F1 <- cbind(F1[-3], tmp)    # remove the original column
rm(tmp)

F1
#  ID1 ID2 Y1   X1 X2
#1  A1  B1  0 NONE  .
#2  A1  B1  1   cf 12
#3  A1  B1  0 NONE  .
#4  A2  B2  1   X2 25
#5  A2  B3  1   fd 15


Note that cbind dispatches on F1, an object of class "data.frame".
Therefore it's the method cbind.data.frame that is called and the result 
is also a df, though tmp is a "matrix".
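
The dispatch note above can be checked directly. This is a minimal sketch, assuming the sample data from the original question; the names `m` and `res` are illustrative and not part of the thread. Under R 4.0 or later the new columns come out as character; older versions may convert them to factors.

```r
# Sample data from the original question
F1 <- read.table(text = "ID1 ID2 text
A1 B1 NONE
A1 B1 cf_12
A1 B1 NONE
A2 B2 X2_25
A2 B3 fd_15", header = TRUE, stringsAsFactors = FALSE)

# A character matrix with named columns (filled column by column)
m <- matrix(c("NONE", "cf", "NONE", "X2", "fd",
              ".", "12", ".", "25", "15"),
            ncol = 2, dimnames = list(NULL, c("X1", "X2")))

# One argument is a data frame, so cbind.data.frame is used
res <- cbind(F1[-3], m)

is.matrix(m)        # TRUE
is.data.frame(res)  # TRUE
names(res)          # "ID1" "ID2" "X1" "X2"
```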


Hope this helps,

Rui Barradas


At 20:07 on 22/09/20, Rui Barradas wrote:
> Hello,
> 
> Something like this?
> 
> 
> F1$Y1 <- +grepl("_", F1$text)
> F1 <- F1[c(1, 2, 4, 3)]
> F1 <- tidyr::separate(F1, text, into = c("X1", "X2"), sep = "_", fill = "right")
> F1
> 
> 
> Hope this helps,
> 
> Rui Barradas
> 
> At 19:55 on 22/09/20, Val wrote:
>> Hi all,
>>
>> I am trying to create new columns based on the string content of
>> another column. First I want to identify rows that contain a particular
>> string. If a row contains it, I want to split the string and create two
>> variables.
>>
>> Here is my sample of data.
>> F1<-read.table(text="ID1  ID2  text
>> A1 B1   NONE
>> A1 B1   cf_12
>> A1 B1   NONE
>> A2 B2   X2_25
>> A2 B3   fd_15  ",header=TRUE,stringsAsFactors=F)
>> If the variable "text" contains this "_" I want to create an indicator
>> variable as shown below
>>
>> F1$Y1 <- ifelse(grepl("_", F1$text),1,0)
>>
>>
>> Then I want to split that string into two, before "_" and after "_",
>> and create two variables as shown below
>> x1= strsplit(as.character(F1$text),'_',2)
>>
>> My problem is how to combine this with the original data frame. The
>> desired output is shown below,
>>
>>
>> ID1 ID2 Y1   X1 X2
>> A1  B1   0 NONE  .
>> A1  B1   1   cf 12
>> A1  B1   0 NONE  .
>> A2  B2   1   X2 25
>> A2  B3   1   fd 15
>>
>> Any help?
>> Thank you.
>>
>> ______________________________________________
>> R-help at r-project.org mailing list -- To UNSUBSCRIBE and more, see
>> https://stat.ethz.ch/mailman/listinfo/r-help
>> PLEASE do read the posting guide 
>> http://www.R-project.org/posting-guide.html
>> and provide commented, minimal, self-contained, reproducible code.
>>
> 

Bert Gunter

2020-Sep-22 23:25 UTC

[R] Split

To be clear, I think Rui's solution is perfectly fine and probably better
than what I offer below. But just for fun, I wanted to do it without the
lapply().  Here is one way. I think my comments suffice to explain.

> ## which are the non "_" indices?
> wh <- grep("_", F1$text, fixed = TRUE, invert = TRUE)
> ## paste "_." to these
> F1[wh, "text"] <- paste(F1[wh, "text"], ".", sep = "_")
> ## Now strsplit() and unlist() them to get a vector
> z <- unlist(strsplit(F1$text, "_"))
> ## now cbind() to the data frame
> F1 <- cbind(F1, matrix(z, ncol = 2, byrow = TRUE))
> F1
  ID1 ID2   text    1  2
1  A1  B1 NONE_. NONE  .
2  A1  B1  cf_12   cf 12
3  A1  B1 NONE_. NONE  .
4  A2  B2  X2_25   X2 25
5  A2  B3  fd_15   fd 15
> ## You can change the names of the 2 columns yourself

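
The closing comment about renaming the columns can be handled when the matrix is built. This is a sketch only, assuming the sample data from the original question and that each value contains at most one underscore; the `parts` name, the `dimnames` argument and the length check are additions for illustration, not part of the post.

```r
# Sample data from the original question
F1 <- read.table(text = "ID1 ID2 text
A1 B1 NONE
A1 B1 cf_12
A1 B1 NONE
A2 B2 X2_25
A2 B3 fd_15", header = TRUE, stringsAsFactors = FALSE)

# Indicator column, computed before "text" is modified
F1$Y1 <- as.integer(grepl("_", F1$text, fixed = TRUE))

# Append "_." to the values that have no underscore
wh <- grep("_", F1$text, fixed = TRUE, invert = TRUE)
F1[wh, "text"] <- paste(F1[wh, "text"], ".", sep = "_")

# Split everything and check that each value gave exactly two pieces
z <- unlist(strsplit(F1$text, "_", fixed = TRUE))
stopifnot(length(z) == 2 * nrow(F1))

# Name the columns while building the matrix, then drop "text"
parts <- matrix(z, ncol = 2, byrow = TRUE,
                dimnames = list(NULL, c("X1", "X2")))
F1 <- cbind(F1[c("ID1", "ID2", "Y1")], parts)
F1
```

The length check is there because `matrix(z, ncol = 2, byrow = TRUE)` would silently misalign rows if some value contained two or more underscores.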
Cheers,
Bert

Bert Gunter

"The trouble with having an open mind is that people keep coming along and
sticking things into it."
-- Opus (aka Berkeley Breathed in his "Bloom County" comic strip )



Val

2020-Sep-23 00:00 UTC

[R] Split

Thank you all for the help!

LMH, yes, I would like to see the alternative. I am using this for a
large data set, and if the alternative is more efficient than this
then I would be happy.
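
For the large-data concern, one option that avoids both lapply() and strsplit() is to use vectorised sub() calls. This is a sketch only, assuming the sample data from the original question and that only the first underscore matters; `has_us` is an illustrative name. Whether it is actually faster on a given data set is untested here and would need to be measured, for example with system.time().

```r
# Sample data from the original question
F1 <- read.table(text = "ID1 ID2 text
A1 B1 NONE
A1 B1 cf_12
A1 B1 NONE
A2 B2 X2_25
A2 B3 fd_15", header = TRUE, stringsAsFactors = FALSE)

has_us <- grepl("_", F1$text, fixed = TRUE)   # TRUE where "_" is present
F1$Y1 <- as.integer(has_us)                   # 0/1 indicator

# Everything before the first "_" (unchanged when there is none)
F1$X1 <- sub("_.*$", "", F1$text)

# Everything after the first "_", or "." when there is none
F1$X2 <- ifelse(has_us, sub("^[^_]*_", "", F1$text), ".")

F1$text <- NULL                               # drop the original column
F1
```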


Bill Dunlap

2020-Sep-23 00:45 UTC

[R] Split

Another way to make columns out of the stuff before and after the
underscore, with NAs if there is no underscore, is

utils::strcapture("([^_]*)_(.*)", F1$text,
proto=data.frame(Before_=character(), After_=character()))
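
Because strcapture() returns NA in both columns for rows with no underscore, getting the layout requested in the original question takes one more step. This is a sketch under the assumption that the sample data from the question is used; the column names X1 and X2 and the helper names `parts`, `no_us` and `res` are illustrative choices, not part of the post.

```r
# Sample data from the original question
F1 <- read.table(text = "ID1 ID2 text
A1 B1 NONE
A1 B1 cf_12
A1 B1 NONE
A2 B2 X2_25
A2 B3 fd_15", header = TRUE, stringsAsFactors = FALSE)

# Capture the pieces before and after the first underscore
parts <- utils::strcapture("([^_]*)_(.*)", F1$text,
                           proto = data.frame(X1 = character(),
                                              X2 = character(),
                                              stringsAsFactors = FALSE))

# Rows without an underscore come back as NA in both columns
no_us <- is.na(parts$X2)
parts$X1[no_us] <- F1$text[no_us]   # keep the original string
parts$X2[no_us] <- "."              # placeholder used in the question

res <- cbind(F1[c("ID1", "ID2")], Y1 = as.integer(!no_us), parts)
res
```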

-Bill

