I have genetic data as follows (simple example, actual data is much larger):

comb

ID1 A A T G C T G C G T C G T A
ID2 G C T G C C T G C T G T T T

And I wish to get an output like this:

ID1 AA TG CT GC GT CG TA
ID2 GC TG CC TG CT GT TT

That is, paste every two columns together.

I have this code, but I get the error:

Error in seq.default(2, nchar(x), 2) : 'to' must be of length 1

conc <- function(x) {
  s <- seq(2, nchar(x), 2)
  paste0(x[s], x[s+1])
}

combn <- as.data.frame(lapply(comb, conc), stringsAsFactors=FALSE)

Thanks in advance!
Hi Kate,

Maybe you want:

seq(2,length(x),by=2)

Jim

On Thu, Jan 29, 2015 at 10:55 AM, Kate Ignatius <kate.ignatius at gmail.com> wrote:
> I have this code, but I get the error:
>
> Error in seq.default(2, nchar(x), 2) : 'to' must be of length 1
>
> conc <- function(x) {
>   s <- seq(2, nchar(x), 2)
>   paste0(x[s], x[s+1])
> }
>
> ______________________________________________
> R-help at r-project.org mailing list -- To UNSUBSCRIBE and more, see
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.
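To illustrate why swapping nchar() for length() removes the error, here is a minimal sketch on a toy vector (the vector and the variable names are assumptions, not the poster's data). nchar() returns one count per element, so it cannot be used as the single 'to' value of seq(), while length() returns one number. Note that if the sequence starts at 2, pairing x[s] with x[s+1] would join positions 2 and 3, 4 and 5, and so on, and run past the end of the vector, so the sketch shows two index choices that give the intended pairs.

```r
# Toy vector standing in for one row of the data (values are illustrative).
x <- c("A", "A", "T", "G", "C", "T")

nchar(x)    # one count per element, so it cannot serve as seq()'s 'to'
length(x)   # a single number: how many elements x has

# Option 1: start at 1 and pair each odd position with the next one.
s1 <- seq(1, length(x), by = 2)
pairs_from_1 <- paste0(x[s1], x[s1 + 1])

# Option 2: keep the start at 2 and pair each even position with the previous one.
s2 <- seq(2, length(x), by = 2)
pairs_from_2 <- paste0(x[s2 - 1], x[s2])

pairs_from_1  # "AA" "TG" "CT"
pairs_from_2  # "AA" "TG" "CT"
```

Both options assume the vector has an even number of elements.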
Hi,

Here is my implementation:

> combine <- function(x){
+   odd <- x[1:length(x) %% 2 == 1]
+   even <- x[1:length(x) %%2 == 0]
+   paste0(odd,even)}
> temp <- letters[1:24]
> temp
[1] "a" "b" "c" "d" "e" "f" "g" "h" "i" "j" "k" "l" "m" "n" "o" "p" "q" "r" "s" "t" "u" "v" "w" "x"
> combine(temp)
[1] "ab" "cd" "ef" "gh" "ij" "kl" "mn" "op" "qr" "st" "uv" "wx"
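One thing to watch with the odd/even split: if the input has an odd number of elements, paste0() recycles the shorter vector and the last pair is silently wrong. A possible guarded variant might look like the sketch below; the name combine_checked is hypothetical and the length check is an addition, not part of the implementation above.

```r
# Hypothetical variant of combine() that refuses odd-length input
# instead of letting paste0() recycle the shorter vector.
combine_checked <- function(x) {
  stopifnot(length(x) %% 2 == 0)   # every element needs a partner
  idx <- seq_along(x)              # safe even when x is empty
  odd  <- x[idx %% 2 == 1]         # positions 1, 3, 5, ...
  even <- x[idx %% 2 == 0]         # positions 2, 4, 6, ...
  paste0(odd, even)
}

checked_pairs <- combine_checked(letters[1:6])
checked_pairs  # "ab" "cd" "ef"

# An odd-length vector now raises an error rather than a wrong answer.
odd_input_fails <- inherits(try(combine_checked(letters[1:5]), silent = TRUE),
                            "try-error")
```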
I am using just the first row of your data (i.e. ID1).

> ID1 <- c("A", "A", "T", "G", "C", "T", "G", "C", "G", "T", "C", "G", "T", "A")
> do.call(c,lapply(tapply(ID1, gl(7,2), c), paste, collapse=""))
   1    2    3    4    5    6    7
"AA" "TG" "CT" "GC" "GT" "CG" "TA"
>

Is this what you are looking for? I hope this helps.

Chel Hee Lee
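The gl(7,2) call above hard-codes seven pairs. As a rough sketch of the same grouping idea for a vector of any even length, the number of groups could be derived from the data (n_pairs, groups and pairs are illustrative names, and calling paste directly inside tapply() is a simplification of the original call, not the same code):

```r
ID1 <- c("A", "A", "T", "G", "C", "T", "G", "C", "G", "T", "C", "G", "T", "A")

# Derive the number of pairs from the data instead of hard-coding 7.
n_pairs <- length(ID1) %/% 2

# gl(n, 2) yields the factor 1,1,2,2,...,n,n: one level per pair.
groups <- gl(n_pairs, 2)

# Collapse each group of two letters into one string; as.vector() drops
# the group labels that tapply() attaches to the result.
pairs <- as.vector(tapply(ID1, groups, paste, collapse = ""))
pairs
```

This assumes an even number of elements; with an odd length the grouping factor and the data no longer match in length and tapply() stops with an error.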
eek! Chel Hee, anything that complicated should engender fear and trembling.

Much simpler and more efficient (if I understand correctly):

i <- seq.int(1L,length(ID1),by = 2L)
paste0(ID1[i],ID1[i+1])

That gives a vector of paired letters. If you want a single character string, just collapse with a " " (space):

paste0(ID1[i],ID1[i+1],collapse= " ")

Cheers,
Bert

Bert Gunter
Genentech Nonclinical Biostatistics

On Wed, Jan 28, 2015 at 7:41 PM, Chel Hee Lee <chl948 at mail.usask.ca> wrote:
> I am using just the first row of your data (i.e. ID1).
>
>> do.call(c,lapply(tapply(ID1, gl(7,2), c), paste, collapse=""))
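The same indexing idea can be applied to all rows at once, which is what the original question asks for. The sketch below assumes that comb is a data frame with an ID column followed by single-letter columns; that layout, the column names, and the variable names alleles, paired and result are assumptions, since the thread does not show how comb is stored.

```r
# Assumed layout: an ID column followed by one column per letter.
comb <- data.frame(
  ID = c("ID1", "ID2"),
  rbind(
    c("A", "A", "T", "G", "C", "T", "G", "C", "G", "T", "C", "G", "T", "A"),
    c("G", "C", "T", "G", "C", "C", "T", "G", "C", "T", "G", "T", "T", "T")
  ),
  stringsAsFactors = FALSE
)

# Work on a character matrix so whole columns can be pasted in one call.
alleles <- as.matrix(comb[, -1])

# Indices of the first column of every pair: 1, 3, 5, ...
i <- seq.int(1L, ncol(alleles), by = 2L)

# paste0() is vectorised and returns a plain vector in column-major order,
# so restoring the dimensions gives one row per ID and one column per pair.
paired <- paste0(alleles[, i], alleles[, i + 1L])
dim(paired) <- c(nrow(alleles), length(i))

result <- data.frame(ID = comb$ID, paired, stringsAsFactors = FALSE)
result

# One space-separated string per row, if that form is preferred.
collapsed <- apply(paired, 1, paste, collapse = " ")
```

Pasting whole columns avoids looping over rows, which may matter for the much larger real data; it assumes an even number of letter columns.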
Hi:

Don't know about performance, but this is fairly simple for operating on atomic vectors:

x <- c("A", "A", "G", "T", "C", "G")
apply(embed(x, 2), 1, paste0, collapse = "")
[1] "AA" "GA" "TG" "CT" "GC"

Check the help page of embed() for details.

Dennis
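Note that embed(x, 2) produces a sliding window: every adjacent pair appears, and within each row the later element comes first, which is why the output above has five overlapping strings such as "GA". A possible way to get the non-overlapping pairs from the original question out of the same matrix is sketched below (m, odd_rows and non_overlapping are illustrative names):

```r
x <- c("A", "A", "G", "T", "C", "G")

# Row k of embed(x, 2) holds (x[k + 1], x[k]): later element first.
m <- embed(x, 2)

# Keep every second window (rows 1, 3, 5, ...) so the pairs do not overlap.
odd_rows <- seq(1, nrow(m), by = 2)

# Paste column 2 before column 1 to restore the original left-to-right order.
non_overlapping <- paste0(m[odd_rows, 2], m[odd_rows, 1])
non_overlapping  # "AA" "GT" "CG"
```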