Dear R-Help Team!

I have some trouble with R. It's probably nothing big, but I can't find a solution.
My problem is the following:
I am trying to download some sequences from NCBI using the ape package.

seq1 <- paste("DQ", seq(060054, 060060), sep = "")

sequences <- read.GenBank(seq1,
                          seq.names = seq1,
                          species.names = TRUE,
                          gene.names = FALSE,
                          as.character = TRUE)

write.dna(sequences, "mysequences.fas", format = "fasta")

My problem is that R doesn't take the whole sequence number as "060054"; it produces DQ60054 instead (missing the leading zero, which is essential).

Could you please tell me how I can get R to keep the zero at the beginning of the accession number?

Thank you very much in advance and all the best!

Nabila
Try this:

seq1 <- paste("DQ0", seq(60054, 60060), sep = "")

Jean

On Sun, Feb 5, 2017 at 7:50 PM, Nabila Arbi <nabilaelarbi1912 at gmail.com> wrote:
> [...]
> seq1 <- paste("DQ", seq(060054, 060060), sep = "")
> [...]
> My problem is, that R doesn't take the whole sequence number as "060054"
> but it puts it as DQ60054 (missing the zero in the beginning, which is
> essential).
> [...]
> ______________________________________________
> R-help at r-project.org mailing list -- To UNSUBSCRIBE and more, see
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.
Hi Nabila,

This is because you create the sequence with seq(), which works on numbers, and a number cannot carry a leading zero: R reads 060054 as 60054. That's why the 0 is trimmed.

One alternative would be:
seq2 <- paste("DQ0", seq(60054, 60060), sep = "")

Would that work for you?

HTH,
Ivan

--
Ivan Calandra, PhD
MONREPOS Archaeological Research Centre and Museum for Human Behavioural Evolution
Schloss Monrepos
56567 Neuwied, Germany
calandra at rgzm.de
+49 (0) 2631 9772-243
https://www.researchgate.net/profile/Ivan_Calandra
https://rgzm.academia.edu/IvanCalandra
https://publons.com/author/705639/
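A short sketch of the point made above, using nothing beyond base R: a numeric literal written with a leading zero is stored as the plain number, so the zero is already gone before paste() ever sees it. The variable names x and id are only illustrative.

```r
# A numeric literal with a leading zero is parsed as the plain number.
x <- 060054
x == 60054            # the two literals denote the same value
as.character(x)       # converting back to text cannot restore the zero

# Keeping the identifier as a string preserves every character.
id <- "060054"
nchar(id)             # six characters, leading zero included
paste0("DQ", id)
```

So the zero has to be put back when the number is turned into text, either by hand or with a format string.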
You need the leading zeros, and numerics just give the number without leading zeros. You can use 'sprintf' to create a character string with the leading zeros:

> # this is using 'numeric' and drops leading zeros
> seq1 <- paste("DQ", seq(060054, 060060), sep = "")
> seq1
[1] "DQ60054" "DQ60055" "DQ60056" "DQ60057" "DQ60058" "DQ60059" "DQ60060"
>
> # use 'sprintf' to create leading zeros
> seq2 <- paste0("DQ", sprintf("%06d", seq(060054, 060060)))
> seq2
[1] "DQ060054" "DQ060055" "DQ060056" "DQ060057" "DQ060058" "DQ060059" "DQ060060"

Jim Holtman
Data Munger Guru

What is the problem that you are trying to solve?
Tell me what you want to do, not how you want to do it.
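For readers unfamiliar with the format string, here is a small illustration of what each part of "%06d" contributes. The vector nums is made up for the example; only base R's sprintf() is used.

```r
nums <- c(7L, 54L, 60054L)

plain  <- sprintf("%d",   nums)  # no minimum width, no padding
spaces <- sprintf("%6d",  nums)  # minimum width 6, padded with spaces
zeros  <- sprintf("%06d", nums)  # minimum width 6, padded with zeros

plain
spaces
zeros
```

The "6" sets the minimum field width and the "0" flag selects zeros instead of spaces as the fill character.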
Two methods, among others:

seq1 <- paste("DQ", sprintf("%0*d", 6, seq(060054, 060060)), sep = "")

or

seq1 <- paste("DQ", formatC(seq(060054, 060060), width = 6, format = "d", flag = "0"), sep = "")

Hth,
Adrian

--
Adrian Dusa
University of Bucharest
Romanian Social Data Archive
Soseaua Panduri nr. 90-92
050663 Bucharest sector 5
Romania
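A possible way to compare the two approaches side by side. The name pad_width is hypothetical; the point of the "*" form is that the width can live in a variable rather than in the format string. formatC() is called here with width, format and flag, which is one common way to zero-pad integers.

```r
pad_width <- 6L                      # hypothetical: digits in the numeric part
n <- seq(60054, 60060)

via_sprintf <- sprintf("%0*d", pad_width, n)
via_formatC <- formatC(n, width = pad_width, format = "d", flag = "0")

# Both vectors should hold the same zero-padded strings.
all(via_sprintf == via_formatC)

seq1 <- paste0("DQ", via_sprintf)
seq1
```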
I think it is important to point out that whenever R treats a number as a numeric (integer or double) it loses any base 10 concept of "leading zero" in that internal representation, so in this expression

seq2 <- paste0("DQ", sprintf("%06d", seq(060054, 060060)))

the arguments to seq have leading zeros that are ignored by R and have nothing to do with getting the desired output. That is, the same result can be obtained using

seq2 <- paste0("DQ", sprintf("%06d", seq(60054, 60060)))

or

seq2 <- paste0("DQ", sprintf("%06d", seq(0060054, 00060060)))

since only the zero inside the format string is key to success. (If it makes you more comfortable to put the zeros there for readability, that is your choice, but R ignores them.)

Also note that the paste0 function is not needed when you use sprintf:

seq2 <- sprintf("DQ%06d", seq(60054, 60060))

or

myprefix <- "DQ"
seq2 <- sprintf("%s%06d", myprefix, seq(60054, 60060))

--
Sent from my phone. Please excuse my brevity.

On February 6, 2017 5:45:43 AM PST, jim holtman <jholtman at gmail.com> wrote:
>You need the leading zeros, and 'numerics' just give the number without
>leading zeros. You can use 'sprintf' to create a character string with
>the leading zeros:
>[...]
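If the same pattern is needed for several prefixes or ranges, the sprintf() approach could be wrapped in a small helper. This is only a sketch: make_accessions and its arguments are hypothetical names, not part of ape or base R, and the default width of 6 is an assumption based on the accession numbers in this thread.

```r
# Hypothetical helper: build zero-padded accession numbers.
make_accessions <- function(prefix, from, to, width = 6L) {
  stopifnot(is.character(prefix), length(prefix) == 1L, from <= to)
  numbers <- seq.int(from, to)
  # Assemble a format such as "%s%06d" from the requested width.
  fmt <- paste0("%s%0", width, "d")
  sprintf(fmt, prefix, numbers)
}

ids <- make_accessions("DQ", 60054, 60060)
ids
```

The resulting character vector can then be used in place of seq1 in the read.GenBank() call from the original question.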
No need for sprintf(). Simply:

> paste0("DQ0", seq.int(60054, 60060))
[1] "DQ060054" "DQ060055" "DQ060056" "DQ060057" "DQ060058" "DQ060059"
[7] "DQ060060"

Cheers,
Bert

Bert Gunter

"The trouble with having an open mind is that people keep coming along
and sticking things into it."
-- Opus (aka Berkeley Breathed in his "Bloom County" comic strip )
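One caveat worth keeping in mind, shown with a made-up range that is not from the original question: pasting a literal "0" works as long as every number has the same count of digits, as in the range asked about here. If a range crosses a digit boundary, the hard-coded zero gives identifiers of uneven length, whereas a width-based format does not.

```r
# Hypothetical range that crosses from five to six digits.
n <- 99998:100001

hard_coded <- paste0("DQ0", n)       # always prepends exactly one zero
formatted  <- sprintf("DQ%06d", n)   # pads only up to six digits

nchar(hard_coded)
nchar(formatted)
```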