Hi all,

I am trying to create new columns based on the string content of another
column. First I want to identify the rows that contain a particular
string. If a row contains it, I want to split the string and create two
variables.

Here is a sample of my data.

F1<-read.table(text="ID1 ID2 text
A1 B1 NONE
A1 B1 cf_12
A1 B1 NONE
A2 B2 X2_25
A2 B3 fd_15 ",header=TRUE,stringsAsFactors=F)

If the variable "text" contains "_", I want to create an indicator
variable as shown below

F1$Y1 <- ifelse(grepl("_", F1$text),1,0)

Then I want to split that string in two, before "_" and after "_",
and create two variables as shown below

x1= strsplit(as.character(F1$text),'_',2)

My problem is how to combine this with the original data frame. The
desired output is shown below,

ID1 ID2 Y1 X1   X2
A1  B1  0  NONE .
A1  B1  1  cf   12
A1  B1  0  NONE .
A2  B2  1  X2   25
A2  B3  1  fd   15

Any help?
Thank you.
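For context, here is a small sketch of why the result of strsplit() is awkward to attach to the data frame: it returns a list with one character vector per row, and those vectors have different lengths for this data. Note that the third positional argument of strsplit() is `fixed`, not a limit on the number of pieces, so the sketch names it explicitly. The object name `x1` follows the question; everything else is only illustrative.

```r
# Sample data from the question
F1 <- read.table(text = "ID1 ID2 text
A1 B1 NONE
A1 B1 cf_12
A1 B1 NONE
A2 B2 X2_25
A2 B3 fd_15", header = TRUE, stringsAsFactors = FALSE)

# Split on a literal underscore; the result is a list, one element per row
x1 <- strsplit(F1$text, "_", fixed = TRUE)

# Rows without "_" yield one piece, the others yield two
str(x1)
lengths(x1)
```

Because the pieces are ragged, they have to be padded to a common length before they can become columns.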
Hello,

Something like this?


# 1 if text contains "_", 0 otherwise
F1$Y1 <- +grepl("_", F1$text)
# move Y1 in front of text
F1 <- F1[c(1, 2, 4, 3)]
# split text at "_"; rows without "_" get NA in X2
F1 <- tidyr::separate(F1, text, into = c("X1", "X2"), sep = "_", fill = "right")
F1


Hope this helps,

Rui Barradas

At 19:55 on 22/09/20, Val wrote:
> [...]
>
> ______________________________________________
> R-help at r-project.org mailing list -- To UNSUBSCRIBE and more, see
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.
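One detail worth spelling out: with fill = "right", tidyr::separate() pads the missing piece with NA, while the desired output in the question shows ".". The following is a minimal sketch of the same approach with that one extra step, assuming the tidyr package is installed; selecting the columns by name instead of by position is my own choice, not part of the reply above.

```r
# Assumes the tidyr package is installed
library(tidyr)

F1 <- read.table(text = "ID1 ID2 text
A1 B1 NONE
A1 B1 cf_12
A1 B1 NONE
A2 B2 X2_25
A2 B3 fd_15", header = TRUE, stringsAsFactors = FALSE)

# Indicator: 1 if text contains "_", 0 otherwise
F1$Y1 <- +grepl("_", F1$text)

# Put Y1 before text so the new columns end up last
F1 <- F1[c("ID1", "ID2", "Y1", "text")]

# Split at "_"; rows with a single piece get NA in X2
F1 <- separate(F1, text, into = c("X1", "X2"), sep = "_", fill = "right")

# Replace the NA padding with "." to match the requested layout
F1$X2[is.na(F1$X2)] <- "."
F1
```

If a real missing value is more useful downstream than a "." placeholder, the last assignment can simply be left out.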
Hello,

A base R solution with strsplit, like in your code.


F1$Y1 <- +grepl("_", F1$text)

tmp <- strsplit(as.character(F1$text), "_")
# pad rows that have no "_" with "." so every element has two pieces
tmp <- lapply(tmp, function(x) if(length(x) == 1) c(x, ".") else x)
# stack the pieces into a two-column character matrix
tmp <- do.call(rbind, tmp)
colnames(tmp) <- c("X1", "X2")
F1 <- cbind(F1[-3], tmp)    # remove the original column
rm(tmp)

F1
#  ID1 ID2 Y1   X1 X2
#1  A1  B1  0 NONE  .
#2  A1  B1  1   cf 12
#3  A1  B1  0 NONE  .
#4  A2  B2  1   X2 25
#5  A2  B3  1   fd 15


Note that cbind dispatches on F1, an object of class "data.frame".
Therefore the method cbind.data.frame is called and the result is
also a data frame, even though tmp is a "matrix".


Hope this helps,

Rui Barradas

At 20:07 on 22/09/20, Rui Barradas wrote:
> [...]
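The strsplit approach above relies on every value containing at most one "_". As a possible alternative, here is a minimal base R sketch that uses sub() instead, so X1 is everything before the first "_" and X2 is everything after it, however many underscores there are. This is a swapped-in technique, not the method from the reply, and the "." placeholder simply mirrors the desired output in the question.

```r
F1 <- read.table(text = "ID1 ID2 text
A1 B1 NONE
A1 B1 cf_12
A1 B1 NONE
A2 B2 X2_25
A2 B3 fd_15", header = TRUE, stringsAsFactors = FALSE)

# Indicator: 1 if text contains a literal "_", 0 otherwise
F1$Y1 <- as.integer(grepl("_", F1$text, fixed = TRUE))

# X1: drop the first "_" and everything after it (no-op when there is no "_")
F1$X1 <- sub("_.*$", "", F1$text)

# X2: drop everything up to and including the first "_", or "." when absent
F1$X2 <- ifelse(F1$Y1 == 1, sub("^[^_]*_", "", F1$text), ".")

# Keep the columns in the requested order, without the original text column
F1 <- F1[c("ID1", "ID2", "Y1", "X1", "X2")]
F1
```

Since the new columns are assigned directly to F1, no intermediate matrix or cbind step is needed.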
Sometimes it just makes more sense to pre-process your data and get it
into the format you need. It depends on whether you are more comfortable
programming in R or in some other text manipulation tool such as bash,
sed, awk or grep.

If you know how to do this with other tools, you could write a script
and call it from R. I could post a sample if you are interested.

LMH