Suppose I had the following string, whose length is an integer multiple of some value n. So, say n=2, and the example string has a length of (2x4) = 8 characters.

str <- "ABCDEFGH"

What I'm trying to figure out is a simple, base-R coded way (which I heuristically call StrSubset in the following) to extract every nth character from the string, to generate a new string.

So

str <- "ABCDEFGH"
new_str <- StrSubset(str);
print(new_str)

which would yield

"ACEG"

Best I could come up with is something like the following, where I extract every odd character from the string:

StrSubset <- function(string)
{
  paste(unlist(strsplit(string,""))[seq(1,nchar(string),2)],collapse="")
}

Anything more elegant come to mind? Trying to avoid regex if possible (harder to explain to end-users), but if that meets the 'more elegant' sniff test, happy to consider...

Thanks in advance...
Gabor Grothendieck
2015-Sep-06 15:26 UTC
[R] extracting every nth character from a string...
This uses a regular expression but is shorter:

> gsub("(.).", "\\1", "ABCDEFG")
[1] "ACEG"

It replaces each successive pair of characters with the first of that pair. If there is an odd number of characters then the last character is not matched and therefore kept -- thus it works properly for both even and odd.

On Sat, Sep 5, 2015 at 4:59 PM, Evan Cooch <evan.cooch at gmail.com> wrote:
> [...]

--
Statistics & Software Consulting
GKX Group, GKX Associates Inc.
tel: 1-877-GKX-GROUP
email: ggrothendieck at gmail.com

______________________________________________
R-help at r-project.org mailing list -- To UNSUBSCRIBE and more, see
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.
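The pair-replacement idea above can be extended from n=2 to any n. What follows is only a sketch, not something posted in the thread: the helper name every_nth_regex and the use of perl = TRUE are assumptions made for illustration.

```r
# Hypothetical helper (not from the thread): keep the first character of
# each successive block of n characters and drop the rest of the block.
every_nth_regex <- function(x, n = 2) {
  stopifnot(n >= 1)
  # "(.)" captures one character; ".{0,n-1}" greedily consumes up to
  # n-1 following characters, so a short final block is still handled.
  pattern <- sprintf("(.).{0,%d}", n - 1)
  gsub(pattern, "\\1", x, perl = TRUE)
}

every_nth_regex("ABCDEFGH", 2)
# [1] "ACEG"
every_nth_regex("ABCDEFG", 3)
# [1] "ADG"
```

With n=1 the pattern consumes nothing beyond the captured character, so the string comes back unchanged.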
No regex, but not much less complicated than your original; just a different approach:

> i <- seq(1, nchar(str), 2)
> paste0(mapply(substr, str, i, i), collapse="")
[1] "ACEG"

-------------------------------------
David L Carlson
Department of Anthropology
Texas A&M University
College Station, TX 77840-4352
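The position-based approach (build the index vector, then pull one character per index) can also be written with substring(), which is vectorized over its start and stop arguments, so no mapply is needed. A minimal sketch, using the example string from the original question; the variable names s and res are chosen here for illustration.

```r
s <- "ABCDEFGH"                 # example string from the original question
i <- seq(1, nchar(s), by = 2)   # positions 1, 3, 5, 7

# substring() recycles s over the index vector and returns one
# single-character string per position.
res <- paste(substring(s, i, i), collapse = "")
res
# [1] "ACEG"
```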
# or:
strsplit("junk",split=NULL)[[1]][(1:nchar("junk"))%%2==1]
strsplit("junk",split=NULL)[[1]][(1:nchar("junk"))%%2==0]

--
View this message in context: http://r.789695.n4.nabble.com/extracting-every-nth-character-from-a-string-tp4711908p4711962.html
Sent from the R help mailing list archive at Nabble.com.
# as a complete function
StrSubset<-function(junk,n){
  ifelse(n==1,junk,
         paste(strsplit(junk,split=NULL)[[1]][(1:nchar(junk))%%n==1],collapse=''))
}

--
View this message in context: http://r.789695.n4.nabble.com/extracting-every-nth-character-from-a-string-tp4711908p4711963.html
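The function above needs a special case for n==1 because x %% 1 is always 0 and so never equals 1. One way around that, sketched here under assumptions, is to index with seq(start, length, by = n) instead of a modulo test. The name every_nth and the start argument are hypothetical additions, not part of the thread.

```r
# Hypothetical variant (not from the thread): index with seq() so that
# n = 1 needs no special case, and accept a vector of strings.
every_nth <- function(x, n = 2, start = 1) {
  stopifnot(is.character(x), n >= 1, start >= 1)
  vapply(strsplit(x, ""), function(chars) {
    # Guard: seq() would fail if start lies beyond the string's length.
    if (length(chars) < start) return("")
    paste(chars[seq(start, length(chars), by = n)], collapse = "")
  }, character(1), USE.NAMES = FALSE)
}

every_nth("ABCDEFGH", 2)
# [1] "ACEG"
every_nth(c("ABCDEFGH", "junk"), 2, start = 2)
# [1] "BDFH" "uk"
```

Setting start = 2 gives the even-position characters, which corresponds to the %%2==0 variant shown earlier in the thread.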
> rawToChar( charToRaw( str)[ c( TRUE, FALSE)])
[1] "ACEG"

Regards
charToRaw is not good here because it splits up multibyte characters; strsplit(str, "") will split str into its characters. E.g.,

> str <- c("ggaammmmaa12:\u03b3, OOmmeeggaa12:\u03A9...")
> rawToChar( charToRaw( str)[ c( TRUE, FALSE)])
[1] "gamma1:? Omega1:?."
> paste(collapse="", strsplit(str,split=NULL)[[1]][(1:nchar(str))%%2==1])
[1] "gamma1:, Omega2?."
> gsub("(.)(.)", "\\1", str)
[1] "gamma1:, Omega2?."

Bill Dunlap
TIBCO Software
wdunlap tibco.com

On Tue, Sep 8, 2015 at 2:37 PM, Frank Schwidom <schwidom at gmx.net> wrote:
> > rawToChar( charToRaw( str)[ c( TRUE, FALSE)])
> [1] "ACEG"
>
> Regards
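The byte-versus-character point can be checked on a small string. This is a sketch only: the test string below is made up for illustration (it is not the one from the thread), and a UTF-8 session is assumed.

```r
# Made-up test string: Greek gamma, "a", Greek Omega, "b".
# Each Greek letter takes two bytes in UTF-8.
s <- "\u03b3a\u03a9b"

nchar(s, type = "chars")   # 4 characters
nchar(s, type = "bytes")   # 6 bytes

# Character-based: split into characters, keep positions 1 and 3.
chars_odd <- paste(strsplit(s, "")[[1]][c(TRUE, FALSE)], collapse = "")

# Byte-based: keeps bytes 1, 3 and 5, cutting both Greek letters in half.
bytes_odd <- rawToChar(charToRaw(s)[c(TRUE, FALSE)])

chars_odd == "\u03b3\u03a9"
# [1] TRUE
validUTF8(bytes_odd)
# [1] FALSE
```

The byte-based result is no longer valid UTF-8, which is why it tends to print with placeholder characters, while the strsplit version keeps whole characters.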