Hi,

I am using colsplit (package = reshape2) to split all strings in a column according to the same pattern. Here is an example:

library(reshape2)

df1 <- data.frame(x=c("str1_name2", "str3_name5"))
df2 <- data.frame(df1, colsplit(df1$x, pattern = "_", names=c("str","name")))

This is nearly what I want, but I want to remove the words "str" and "name" from the values, because the columns are already named with those words. Is there a way to remove them using colsplit? Or any other simple way?

/johannes
Hi Johannes,

On Thu, Sep 27, 2012 at 7:25 AM, Johannes Radinger <johannesradinger at gmail.com> wrote:
> This is nearly what I want but I want to remove the words "str" and
> "name" from the values, because the columns are already named with
> those words. Is there a way to remove them using colsplit? Or any other
> simple way?

You can remove them afterwards, e.g.,

df2$str <- gsub("[^0-9]", "", df2$str)
df2$name <- gsub("[^0-9]", "", df2$name)

Best,
Ista

> ______________________________________________
> R-help at r-project.org mailing list
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.
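As a small base-R sketch of another simple way (no reshape2 needed), the two numbers can be pulled straight out of the original strings with sub() and capture groups. This is only an illustration: it assumes every value has exactly the form str<digits>_name<digits>, and the object names are chosen for the example.

```r
# Assumption: every value looks like "str<digits>_name<digits>".
x <- c("str1_name2", "str3_name5")

# One shared pattern with a capture group for each number.
pat <- "^str([0-9]+)_name([0-9]+)$"

# sub() replaces the whole match with the chosen capture group;
# as.integer() then converts the remaining digits to numbers.
df2 <- data.frame(
  x    = x,
  str  = as.integer(sub(pat, "\\1", x)),
  name = as.integer(sub(pat, "\\2", x))
)
df2
```

Unlike the gsub() calls, which leave character strings behind, this version stores the values as integers. A string that does not fit the assumed form would be returned unchanged by sub() and turn into NA (with a warning) in as.integer().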
Hello,

By looking at the output of

pat <- "(str)|(_name)|( name)"
strsplit(c("str1_name2", "str3_name5"), pat)
[[1]]
[1] ""  "1" "2"

[[2]]
[1] ""  "3" "5"

I could understand why colsplit includes NAs as the column 'str' values: each string starts with a match, so the first piece is empty. So the hack is to pretend we want three columns and then set the first one to NULL.

df2 <- data.frame(df1, colsplit(df1$x, pattern = pat, names=c("Null", "str","name")))
df2$Null <- NULL
df2

I don't like it very much but it's simple and it works.

Hope this helps,

Rui Barradas

On 27-09-2012 12:25, Johannes Radinger wrote:
> Is there a way to remove them using colsplit? Or any other
> simple way?
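The empty first element is ordinary strsplit() behaviour whenever the pattern matches at the very start of a string. A base-R sketch of the same "split into three pieces, drop the first" idea without colsplit might look like this (the object names are illustrative, and the pattern is shortened to the two alternatives the example data actually needs):

```r
x   <- c("str1_name2", "str3_name5")
pat <- "(str)|(_name)"

# Each string begins with a match, so each result begins with "".
parts <- strsplit(x, pat)

# Drop the empty leading piece, convert the rest, and bind row-wise.
vals <- do.call(rbind, lapply(parts, function(p) as.integer(p[-1])))
colnames(vals) <- c("str", "name")

df2 <- data.frame(x = x, vals)
df2
```

This assumes every string yields exactly two numeric pieces after the empty one; rows with a different number of pieces would make rbind() recycle values, so real data may need a length check first.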
Hi,

You can also try this:

df2 <- data.frame(df1, colsplit(df1$x, pattern = "_", names=c("str","name")))
df2list <- list(df2$str, df2$name)
df2[,2:3] <- sapply(df2list, function(x) gsub(".*(\\d)", "\\1", x))
df2
#           x str name
#1 str1_name2   1    2
#2 str3_name5   3    5

A.K.

----- Original Message -----
From: Ista Zahn <istazahn at gmail.com>
To: Johannes Radinger <johannesradinger at gmail.com>
Cc: r-help at r-project.org
Sent: Thursday, September 27, 2012 7:43 AM
Subject: Re: [R] Colsplit, removing parts of a string

You can remove them afterwards, e.g.,

df2$str <- gsub("[^0-9]", "", df2$str)
df2$name <- gsub("[^0-9]", "", df2$name)
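The two gsub() patterns that appear in this message agree on the single-digit example but may differ on longer numbers: the ".*" in ".*(\\d)" is greedy, so only the last digit survives, while "[^0-9]" deletes every non-digit character. A small sketch of the difference, using a made-up value "str12" that is not part of the original data:

```r
x <- c("str1", "str12")   # "str12" is a hypothetical multi-digit value

last_digit <- gsub(".*(\\d)", "\\1", x)   # greedy .* leaves only the final digit
all_digits <- gsub("[^0-9]", "", x)       # strips every non-digit character

last_digit
all_digits
```

Here last_digit is "1" "2" and all_digits is "1" "12", so for data with multi-digit numbers the "[^0-9]" form is probably the safer choice.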
Thank you, this works perfectly...

best regards,
Johannes

On Thu, Sep 27, 2012 at 1:43 PM, Ista Zahn <istazahn at gmail.com> wrote:
> You can remove them afterwards, e.g.,
>
> df2$str <- gsub("[^0-9]", "", df2$str)
> df2$name <- gsub("[^0-9]", "", df2$name)