thr3ads.net - R help - [R] Split [Sep 2020]

If this information is useful, please help other people find it:
Share via:

Bill Dunlap

2020-Sep-23 00:45 UTC

[R] Split

Another way to make columns out of the stuff before and after the
underscore, with NAs if there is no underscore, is

utils::strcapture("([^_]*)_(.*)", F1$text,
proto=data.frame(Before_=character(), After_=character()))

-Bill

On Tue, Sep 22, 2020 at 4:25 PM Bert Gunter <bgunter.4567 at gmail.com>
wrote:
> To be clear, I think Rui's solution is perfectly fine and probably
better
> than what I offer below. But just for fun, I wanted to do it without the
> lapply().  Here is one way. I think my comments suffice to explain.
>
> > ## which are the  non "_" indices?
> > wh <- grep("_",F1$text, fixed = TRUE, invert = TRUE)
> > ## paste "_." to these
> > F1[wh,"text"] <-
paste(F1[wh,"text"],".",sep = "_")
> > ## Now strsplit() and unlist() them to get a vector
> > z <- unlist(strsplit(F1$text, "_"))
> > ## now cbind() to the data frame
> > F1 <- cbind(F1, matrix(z, ncol = 2, byrow = TRUE))
> > F1
>   ID1 ID2   text    1  2
> 1  A1  B1 NONE_. NONE  .
> 2  A1  B1  cf_12   cf 12
> 3  A1  B1 NONE_. NONE  .
> 4  A2  B2  X2_25   X2 25
> 5  A2  B3  fd_15   fd 15
> >## You can change the names of the 2 columns yourself
>
> Cheers,
> Bert
>
> Bert Gunter
>
> "The trouble with having an open mind is that people keep coming along
and
> sticking things into it."
> -- Opus (aka Berkeley Breathed in his "Bloom County" comic strip
)
>
>
> On Tue, Sep 22, 2020 at 12:19 PM Rui Barradas <ruipbarradas at
sapo.pt>
> wrote:
>
> > Hello,
> >
> > A base R solution with strsplit, like in your code.
> >
> > F1$Y1 <- +grepl("_", F1$text)
> >
> > tmp <- strsplit(as.character(F1$text), "_")
> > tmp <- lapply(tmp, function(x) if(length(x) == 1) c(x,
".") else x)
> > tmp <- do.call(rbind, tmp)
> > colnames(tmp) <- c("X1", "X2")
> > F1 <- cbind(F1[-3], tmp)    # remove the original column
> > rm(tmp)
> >
> > F1
> > #  ID1 ID2 Y1   X1 X2
> > #1  A1  B1  0 NONE  .
> > #2  A1  B1  1   cf 12
> > #3  A1  B1  0 NONE  .
> > #4  A2  B2  1   X2 25
> > #5  A2  B3  1   fd 15
> >
> >
> > Note that cbind dispatches on F1, an object of class
"data.frame".
> > Therefore it's the method cbind.data.frame that is called and the
result
> > is also a df, though tmp is a "matrix".
> >
> >
> > Hope this helps,
> >
> > Rui Barradas
> >
> >
> > ?s 20:07 de 22/09/20, Rui Barradas escreveu:
> > > Hello,
> > >
> > > Something like this?
> > >
> > >
> > > F1$Y1 <- +grepl("_", F1$text)
> > > F1 <- F1[c(1, 2, 4, 3)]
> > > F1 <- tidyr::separate(F1, text, into = c("X1",
"X2"), sep = "_", fill > > > "right")
> > > F1
> > >
> > >
> > > Hope this helps,
> > >
> > > Rui Barradas
> > >
> > > ?s 19:55 de 22/09/20, Val escreveu:
> > >> HI All,
> > >>
> > >> I am trying to create   new columns based on another column
string
> > >> content. First I want to identify rows that contain a
particular
> > >> string.  If it contains, I want to split the string and
create two
> > >> variables.
> > >>
> > >> Here is my sample of data.
> > >> F1<-read.table(text="ID1  ID2  text
> > >> A1 B1   NONE
> > >> A1 B1   cf_12
> > >> A1 B1   NONE
> > >> A2 B2   X2_25
> > >> A2 B3   fd_15  ",header=TRUE,stringsAsFactors=F)
> > >> If the variable "text" contains this "_"
I want to create an indicator
> > >> variable as shown below
> > >>
> > >> F1$Y1 <- ifelse(grepl("_", F1$text),1,0)
> > >>
> > >>
> > >> Then I want to split that string in to two, before
"_" and after "_"
> > >> and create two variables as shown below
> > >> x1= strsplit(as.character(F1$text),'_',2)
> > >>
> > >> My problem is how to combine this with the original data
frame. The
> > >> desired  output is shown   below,
> > >>
> > >>
> > >> ID1 ID2  Y1   X1    X2
> > >> A1  B1    0   NONE   .
> > >> A1  B1   1    cf        12
> > >> A1  B1   0  NONE   .
> > >> A2  B2   1    X2    25
> > >> A2  B3   1    fd    15
> > >>
> > >> Any help?
> > >> Thank you.
> > >>
> > >> ______________________________________________
> > >> R-help at r-project.org mailing list -- To UNSUBSCRIBE and
more, see
> > >> https://stat.ethz.ch/mailman/listinfo/r-help
> > >> PLEASE do read the posting guide
> > >> http://www.R-project.org/posting-guide.html
> > >> and provide commented, minimal, self-contained, reproducible
code.
> > >>
> > >
> > > ______________________________________________
> > > R-help at r-project.org mailing list -- To UNSUBSCRIBE and more,
see
> > > https://stat.ethz.ch/mailman/listinfo/r-help
> > > PLEASE do read the posting guide
> > > http://www.R-project.org/posting-guide.html
> > > and provide commented, minimal, self-contained, reproducible
code.
> >
> > ______________________________________________
> > R-help at r-project.org mailing list -- To UNSUBSCRIBE and more, see
> > https://stat.ethz.ch/mailman/listinfo/r-help
> > PLEASE do read the posting guide
> > http://www.R-project.org/posting-guide.html
> > and provide commented, minimal, self-contained, reproducible code.
> >
>
>         [[alternative HTML version deleted]]
>
> ______________________________________________
> R-help at r-project.org mailing list -- To UNSUBSCRIBE and more, see
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide
> http://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.
>
	[[alternative HTML version deleted]]

Bert Gunter

2020-Sep-23 01:47 UTC

head link

[R] Split

That was still slower and doesn't quite give what was requested:
> cbind(F1,utils::strcapture("([^_]*)_(.*)", F1$text,proto=data.frame(Before_=character(), After_=character())))
  ID1 ID2  text Before_ After_
1  A1  B1  NONE    <NA>   <NA>
2  A1  B1 cf_12      cf     12
3  A1  B1  NONE    <NA>   <NA>
4  A2  B2 X2_25      X2     25
5  A2  B3 fd_15      fd     15
> system.time({+ cbind(F2,utils::strcapture("([^_]*)_(.*)", F2$text,
proto=data.frame(Before_=character(), After_=character())))
+ }
+ )
   user  system elapsed
 32.712   0.736  33.587

Cheers,
Bert




On Tue, Sep 22, 2020 at 5:45 PM Bill Dunlap <williamwdunlap at gmail.com>
wrote:
> Another way to make columns out of the stuff before and after the
> underscore, with NAs if there is no underscore, is
>
> utils::strcapture("([^_]*)_(.*)", F1$text,
> proto=data.frame(Before_=character(), After_=character()))
>
> -Bill
>
> On Tue, Sep 22, 2020 at 4:25 PM Bert Gunter <bgunter.4567 at
gmail.com>
> wrote:
>
>> To be clear, I think Rui's solution is perfectly fine and probably
better
>> than what I offer below. But just for fun, I wanted to do it without
the
>> lapply().  Here is one way. I think my comments suffice to explain.
>>
>> > ## which are the  non "_" indices?
>> > wh <- grep("_",F1$text, fixed = TRUE, invert = TRUE)
>> > ## paste "_." to these
>> > F1[wh,"text"] <-
paste(F1[wh,"text"],".",sep = "_")
>> > ## Now strsplit() and unlist() them to get a vector
>> > z <- unlist(strsplit(F1$text, "_"))
>> > ## now cbind() to the data frame
>> > F1 <- cbind(F1, matrix(z, ncol = 2, byrow = TRUE))
>> > F1
>>   ID1 ID2   text    1  2
>> 1  A1  B1 NONE_. NONE  .
>> 2  A1  B1  cf_12   cf 12
>> 3  A1  B1 NONE_. NONE  .
>> 4  A2  B2  X2_25   X2 25
>> 5  A2  B3  fd_15   fd 15
>> >## You can change the names of the 2 columns yourself
>>
>> Cheers,
>> Bert
>>
>> Bert Gunter
>>
>> "The trouble with having an open mind is that people keep coming
along and
>> sticking things into it."
>> -- Opus (aka Berkeley Breathed in his "Bloom County" comic
strip )
>>
>>
>> On Tue, Sep 22, 2020 at 12:19 PM Rui Barradas <ruipbarradas at
sapo.pt>
>> wrote:
>>
>> > Hello,
>> >
>> > A base R solution with strsplit, like in your code.
>> >
>> > F1$Y1 <- +grepl("_", F1$text)
>> >
>> > tmp <- strsplit(as.character(F1$text), "_")
>> > tmp <- lapply(tmp, function(x) if(length(x) == 1) c(x,
".") else x)
>> > tmp <- do.call(rbind, tmp)
>> > colnames(tmp) <- c("X1", "X2")
>> > F1 <- cbind(F1[-3], tmp)    # remove the original column
>> > rm(tmp)
>> >
>> > F1
>> > #  ID1 ID2 Y1   X1 X2
>> > #1  A1  B1  0 NONE  .
>> > #2  A1  B1  1   cf 12
>> > #3  A1  B1  0 NONE  .
>> > #4  A2  B2  1   X2 25
>> > #5  A2  B3  1   fd 15
>> >
>> >
>> > Note that cbind dispatches on F1, an object of class
"data.frame".
>> > Therefore it's the method cbind.data.frame that is called and
the result
>> > is also a df, though tmp is a "matrix".
>> >
>> >
>> > Hope this helps,
>> >
>> > Rui Barradas
>> >
>> >
>> > ?s 20:07 de 22/09/20, Rui Barradas escreveu:
>> > > Hello,
>> > >
>> > > Something like this?
>> > >
>> > >
>> > > F1$Y1 <- +grepl("_", F1$text)
>> > > F1 <- F1[c(1, 2, 4, 3)]
>> > > F1 <- tidyr::separate(F1, text, into = c("X1",
"X2"), sep = "_", fill
>> >> > > "right")
>> > > F1
>> > >
>> > >
>> > > Hope this helps,
>> > >
>> > > Rui Barradas
>> > >
>> > > ?s 19:55 de 22/09/20, Val escreveu:
>> > >> HI All,
>> > >>
>> > >> I am trying to create   new columns based on another
column string
>> > >> content. First I want to identify rows that contain a
particular
>> > >> string.  If it contains, I want to split the string and
create two
>> > >> variables.
>> > >>
>> > >> Here is my sample of data.
>> > >> F1<-read.table(text="ID1  ID2  text
>> > >> A1 B1   NONE
>> > >> A1 B1   cf_12
>> > >> A1 B1   NONE
>> > >> A2 B2   X2_25
>> > >> A2 B3   fd_15  ",header=TRUE,stringsAsFactors=F)
>> > >> If the variable "text" contains this
"_" I want to create an
>> indicator
>> > >> variable as shown below
>> > >>
>> > >> F1$Y1 <- ifelse(grepl("_", F1$text),1,0)
>> > >>
>> > >>
>> > >> Then I want to split that string in to two, before
"_" and after "_"
>> > >> and create two variables as shown below
>> > >> x1= strsplit(as.character(F1$text),'_',2)
>> > >>
>> > >> My problem is how to combine this with the original data
frame. The
>> > >> desired  output is shown   below,
>> > >>
>> > >>
>> > >> ID1 ID2  Y1   X1    X2
>> > >> A1  B1    0   NONE   .
>> > >> A1  B1   1    cf        12
>> > >> A1  B1   0  NONE   .
>> > >> A2  B2   1    X2    25
>> > >> A2  B3   1    fd    15
>> > >>
>> > >> Any help?
>> > >> Thank you.
>> > >>
>> > >> ______________________________________________
>> > >> R-help at r-project.org mailing list -- To UNSUBSCRIBE
and more, see
>> > >> https://stat.ethz.ch/mailman/listinfo/r-help
>> > >> PLEASE do read the posting guide
>> > >> http://www.R-project.org/posting-guide.html
>> > >> and provide commented, minimal, self-contained,
reproducible code.
>> > >>
>> > >
>> > > ______________________________________________
>> > > R-help at r-project.org mailing list -- To UNSUBSCRIBE and
more, see
>> > > https://stat.ethz.ch/mailman/listinfo/r-help
>> > > PLEASE do read the posting guide
>> > > http://www.R-project.org/posting-guide.html
>> > > and provide commented, minimal, self-contained, reproducible
code.
>> >
>> > ______________________________________________
>> > R-help at r-project.org mailing list -- To UNSUBSCRIBE and more,
see
>> > https://stat.ethz.ch/mailman/listinfo/r-help
>> > PLEASE do read the posting guide
>> > http://www.R-project.org/posting-guide.html
>> > and provide commented, minimal, self-contained, reproducible code.
>> >
>>
>>         [[alternative HTML version deleted]]
>>
>> ______________________________________________
>> R-help at r-project.org mailing list -- To UNSUBSCRIBE and more, see
>> https://stat.ethz.ch/mailman/listinfo/r-help
>> PLEASE do read the posting guide
>> http://www.R-project.org/posting-guide.html
>> and provide commented, minimal, self-contained, reproducible code.
>>
>
	[[alternative HTML version deleted]]

Rui Barradas

2020-Sep-23 10:58 UTC

head link

[R] Split

Hello,

If speed is important, and following the previous discussion and Bert's 
tests, here are two other alternatives, both faster.

1. Bert2 is Bert's original but with scan(., sep = "_")
substituted for
unlist/strsplit.
2. A package data.table solution. These are always fast, many times the 
fastest. But have the inconvenience of coercing the data to class 
"data.table" and the rest of the code needs to be adapted to handle 
data.tables. Namely, the second index in dt[i, j] is no longer a column 
index.

Unlike Bert, I time my first code, the one with package tidyr and its 
performance clearly beats the second one.
I define a test function, running several input sizes. It doesn't take 
much time to complete, only several minutes. The times' differences are 
not as impressive as Bert's, probably due to be on a different OS. I'm 
running R 4.0.2 on Ubuntu 20.04, sessionInfo at the end.

Also, I find X$Y1 <- as.integer(grepl("_", X$text)) more readable
than
coercion to numeric with +grepl(.).



library(data.table)
library(microbenchmark)
library(ggplot2)

Rui1 <- function(X){
   #X$Y1 <- as.integer(grepl("_", X$text))
   tidyr::separate(X, text, into = c("X1", "X2"), sep =
"_", fill = "right")
}
Bert <- function(X){
   ## which are the  non "_" indices?
   wh <- grep("_",X$text, fixed = TRUE, invert = TRUE)
   ## paste "_." to these
   X[wh,"text"] <- paste(X[wh,"text"],".",sep =
"_")
   ## Now strsplit() and unlist() them to get a vector
   z <- unlist(strsplit(X$text, "_"))
   ## now cbind() to the data frame
   cbind(X, matrix(z, ncol = 2, byrow = TRUE))
}
Bert2 <- function(X){
   wh <- grep("_",X$text, fixed = TRUE, invert = TRUE)
   X[wh,"text"] <- paste(X[wh,"text"],".",sep =
"_")
   z <- scan(what = character(), text = X$text, sep = "_")
   cbind(X, matrix(z, ncol = 2, byrow = TRUE))
}
DT <- function(X){
   Y <- as.data.table(X)
   Y[, c("X1", "X2") := tstrsplit(text, "_", fixed
= TRUE)]
}

testSeparate <- function(X, size = 0:6, times = 10){
   row_nums <- seq_len(nrow(X))
   res <- lapply(size, function(s){
     Y <- X[rep(row_nums, 10^s), ]
     mb <- microbenchmark(
       Rui = Rui1(Y),
       Bert = Bert(Y),
       Bert2 = Bert2(Y),
       DT = DT(Y),
       times = times
     )
     mb$size <- s
     mb
   })
   # return median times
   res <- do.call(rbind, res)
   aggregate(time ~ size + expr, res, median)
}

F1 <- read.table(text="ID1  ID2  text
A1 B1   NONE
A1 B1   cf_12
A1 B1   NONE
A2 B2   X2_25
A2 B3   fd_15  ",header=TRUE,stringsAsFactors=F)

agg <- testSeparate(F1, times = 5)

ggplot(agg, aes(size, time, color = expr)) +
   geom_line() + geom_point() +
   scale_y_continuous(trans = "log10") +
   xlab(expression(log[10] ~ "(size)")) +
   ylab(expression(log[10] ~ "(time)"))


sessionInfo()
R version 4.0.2 (2020-06-22)
Platform: x86_64-pc-linux-gnu (64-bit)
Running under: Ubuntu 20.04.1 LTS

Matrix products: default
BLAS:   /usr/lib/x86_64-linux-gnu/blas/libblas.so.3.9.0
LAPACK: /usr/lib/x86_64-linux-gnu/lapack/liblapack.so.3.9.0

locale:
  [1] LC_CTYPE=pt_PT.UTF-8       LC_NUMERIC=C
  [3] LC_TIME=pt_PT.UTF-8        LC_COLLATE=pt_PT.UTF-8
  [5] LC_MONETARY=pt_PT.UTF-8    LC_MESSAGES=pt_PT.UTF-8
  [7] LC_PAPER=pt_PT.UTF-8       LC_NAME=C
  [9] LC_ADDRESS=C               LC_TELEPHONE=C
[11] LC_MEASUREMENT=pt_PT.UTF-8 LC_IDENTIFICATION=C

attached base packages:
[1] stats     graphics  grDevices utils     datasets  methods   base

other attached packages:
[1] ggplot2_3.3.2        microbenchmark_1.4-7 data.table_1.12.8

loaded via a namespace (and not attached):
  [1] Rcpp_1.0.5       magrittr_1.5     tidyselect_1.1.0 munsell_0.5.0
  [5] colorspace_1.4-1 R6_2.4.1         rlang_0.4.7      dplyr_1.0.2
  [9] tools_4.0.2      grid_4.0.2       gtable_0.3.0     withr_2.2.0
[13] ellipsis_0.3.1   digest_0.6.25    tibble_3.0.3     lifecycle_0.2.0
[17] crayon_1.3.4     purrr_0.3.4      farver_2.0.3     tidyr_1.0.2
[21] vctrs_0.3.4      glue_1.4.2       labeling_0.3     stringi_1.4.6
[25] compiler_4.0.2   pillar_1.4.6     generics_0.0.2   scales_1.1.0
[29] pkgconfig_2.0.3


Hope this helps,

Rui Barradas


?s 02:47 de 23/09/20, Bert Gunter escreveu:> That was still slower and doesn't quite give what was requested:
> 
>  > cbind(F1,utils::strcapture("([^_]*)_(.*)", F1$text, 
> proto=data.frame(Before_=character(), After_=character())))
>  ? ID1 ID2 ?text Before_ After_
> 1 ?A1 ?B1 ?NONE ? ?<NA> ? <NA>
> 2 ?A1 ?B1 cf_12 ? ? ?cf ? ? 12
> 3 ?A1 ?B1 ?NONE ? ?<NA> ? <NA>
> 4 ?A2 ?B2 X2_25 ? ? ?X2 ? ? 25
> 5 ?A2 ?B3 fd_15 ? ? ?fd ? ? 15
> 
>  > system.time({
> + cbind(F2,utils::strcapture("([^_]*)_(.*)", F2$text, 
> proto=data.frame(Before_=character(), After_=character())))
> + }
> + )
>  ? ?user ?system elapsed
>  ?32.712 ? 0.736 ?33.587
> 
> Cheers,
> Bert
> 
> 
> 
> 
> On Tue, Sep 22, 2020 at 5:45 PM Bill Dunlap <williamwdunlap at gmail.com
> <mailto:williamwdunlap at gmail.com>> wrote:
> 
>     Another way to make columns out of the stuff before and after the
>     underscore, with NAs if there is no underscore, is
> 
>     utils::strcapture("([^_]*)_(.*)", F1$text,
>     proto=data.frame(Before_=character(), After_=character()))
> 
>     -Bill
> 
>     On Tue, Sep 22, 2020 at 4:25 PM Bert Gunter <bgunter.4567 at
gmail.com
>     <mailto:bgunter.4567 at gmail.com>> wrote:
> 
>         To be clear, I think Rui's solution is perfectly fine and
>         probably better
>         than what I offer below. But just for fun, I wanted to do it
>         without the
>         lapply().? Here is one way. I think my comments suffice to explain.
> 
>          > ## which are the? non "_" indices?
>          > wh <- grep("_",F1$text, fixed = TRUE, invert =
TRUE)
>          > ## paste "_." to these
>          > F1[wh,"text"] <-
paste(F1[wh,"text"],".",sep = "_")
>          > ## Now strsplit() and unlist() them to get a vector
>          > z <- unlist(strsplit(F1$text, "_"))
>          > ## now cbind() to the data frame
>          > F1 <- cbind(F1, matrix(z, ncol = 2, byrow = TRUE))
>          > F1
>          ? ID1 ID2? ?text? ? 1? 2
>         1? A1? B1 NONE_. NONE? .
>         2? A1? B1? cf_12? ?cf 12
>         3? A1? B1 NONE_. NONE? .
>         4? A2? B2? X2_25? ?X2 25
>         5? A2? B3? fd_15? ?fd 15
>          >## You can change the names of the 2 columns yourself
> 
>         Cheers,
>         Bert
> 
>         Bert Gunter
> 
>         "The trouble with having an open mind is that people keep
coming
>         along and
>         sticking things into it."
>         -- Opus (aka Berkeley Breathed in his "Bloom County"
comic strip )
> 
> 
>         On Tue, Sep 22, 2020 at 12:19 PM Rui Barradas
>         <ruipbarradas at sapo.pt <mailto:ruipbarradas at
sapo.pt>> wrote:
> 
>          > Hello,
>          >
>          > A base R solution with strsplit, like in your code.
>          >
>          > F1$Y1 <- +grepl("_", F1$text)
>          >
>          > tmp <- strsplit(as.character(F1$text), "_")
>          > tmp <- lapply(tmp, function(x) if(length(x) == 1) c(x,
".")
>         else x)
>          > tmp <- do.call(rbind, tmp)
>          > colnames(tmp) <- c("X1", "X2")
>          > F1 <- cbind(F1[-3], tmp)? ? # remove the original column
>          > rm(tmp)
>          >
>          > F1
>          > #? ID1 ID2 Y1? ?X1 X2
>          > #1? A1? B1? 0 NONE? .
>          > #2? A1? B1? 1? ?cf 12
>          > #3? A1? B1? 0 NONE? .
>          > #4? A2? B2? 1? ?X2 25
>          > #5? A2? B3? 1? ?fd 15
>          >
>          >
>          > Note that cbind dispatches on F1, an object of class
>         "data.frame".
>          > Therefore it's the method cbind.data.frame that is called
and
>         the result
>          > is also a df, though tmp is a "matrix".
>          >
>          >
>          > Hope this helps,
>          >
>          > Rui Barradas
>          >
>          >
>          > ?s 20:07 de 22/09/20, Rui Barradas escreveu:
>          > > Hello,
>          > >
>          > > Something like this?
>          > >
>          > >
>          > > F1$Y1 <- +grepl("_", F1$text)
>          > > F1 <- F1[c(1, 2, 4, 3)]
>          > > F1 <- tidyr::separate(F1, text, into =
c("X1", "X2"), sep >         "_", fill >    
> > "right")
>          > > F1
>          > >
>          > >
>          > > Hope this helps,
>          > >
>          > > Rui Barradas
>          > >
>          > > ?s 19:55 de 22/09/20, Val escreveu:
>          > >> HI All,
>          > >>
>          > >> I am trying to create? ?new columns based on another
>         column string
>          > >> content. First I want to identify rows that contain
a
>         particular
>          > >> string.? If it contains, I want to split the string
and
>         create two
>          > >> variables.
>          > >>
>          > >> Here is my sample of data.
>          > >> F1<-read.table(text="ID1? ID2? text
>          > >> A1 B1? ?NONE
>          > >> A1 B1? ?cf_12
>          > >> A1 B1? ?NONE
>          > >> A2 B2? ?X2_25
>          > >> A2 B3? ?fd_15?
",header=TRUE,stringsAsFactors=F)
>          > >> If the variable "text" contains this
"_" I want to create
>         an indicator
>          > >> variable as shown below
>          > >>
>          > >> F1$Y1 <- ifelse(grepl("_",
F1$text),1,0)
>          > >>
>          > >>
>          > >> Then I want to split that string in to two, before
"_" and
>         after "_"
>          > >> and create two variables as shown below
>          > >> x1= strsplit(as.character(F1$text),'_',2)
>          > >>
>          > >> My problem is how to combine this with the original
data
>         frame. The
>          > >> desired? output is shown? ?below,
>          > >>
>          > >>
>          > >> ID1 ID2? Y1? ?X1? ? X2
>          > >> A1? B1? ? 0? ?NONE? ?.
>          > >> A1? B1? ?1? ? cf? ? ? ? 12
>          > >> A1? B1? ?0? NONE? ?.
>          > >> A2? B2? ?1? ? X2? ? 25
>          > >> A2? B3? ?1? ? fd? ? 15
>          > >>
>          > >> Any help?
>          > >> Thank you.
>          > >>
>          > >> ______________________________________________
>          > >> R-help at r-project.org <mailto:R-help at
r-project.org> mailing
>         list -- To UNSUBSCRIBE and more, see
>          > >> https://stat.ethz.ch/mailman/listinfo/r-help
>          > >> PLEASE do read the posting guide
>          > >> http://www.R-project.org/posting-guide.html
>          > >> and provide commented, minimal, self-contained,
>         reproducible code.
>          > >>
>          > >
>          > > ______________________________________________
>          > > R-help at r-project.org <mailto:R-help at
r-project.org> mailing
>         list -- To UNSUBSCRIBE and more, see
>          > > https://stat.ethz.ch/mailman/listinfo/r-help
>          > > PLEASE do read the posting guide
>          > > http://www.R-project.org/posting-guide.html
>          > > and provide commented, minimal, self-contained,
>         reproducible code.
>          >
>          > ______________________________________________
>          > R-help at r-project.org <mailto:R-help at
r-project.org> mailing
>         list -- To UNSUBSCRIBE and more, see
>          > https://stat.ethz.ch/mailman/listinfo/r-help
>          > PLEASE do read the posting guide
>          > http://www.R-project.org/posting-guide.html
>          > and provide commented, minimal, self-contained, reproducible
>         code.
>          >
> 
>          ? ? ? ? [[alternative HTML version deleted]]
> 
>         ______________________________________________
>         R-help at r-project.org <mailto:R-help at r-project.org>
mailing list
>         -- To UNSUBSCRIBE and more, see
>         https://stat.ethz.ch/mailman/listinfo/r-help
>         PLEASE do read the posting guide
>         http://www.R-project.org/posting-guide.html
>         and provide commented, minimal, self-contained, reproducible code.
>

R help - Sep 2020 - Split

[R] Split

[R] Split

[R] Split