Dear R-Help Team!
I have some trouble with R. It's probably nothing big, but I can't find a solution.
My problem is the following:
I am trying to download some sequences from NCBI using the ape package.
library(ape)
seq1 <- paste("DQ", seq(060054, 060060), sep = "")
sequences <- read.GenBank(seq1,
seq.names = seq1,
species.names = TRUE,
gene.names = FALSE,
as.character = TRUE)
write.dna(sequences, "mysequences.fas", format = "fasta")
My problem is that R doesn't keep the whole sequence number as "060054" but turns it into DQ60054 (dropping the leading zero, which is essential).
Could you please tell me how I can get R to keep the leading zero of the accession number?
Thank you very much in advance and all the best!
Nabila
Try this:
seq1 <- paste("DQ0", seq(60054, 60060), sep = "")
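Prefixing a literal "DQ0" works for this particular range because every number from 60054 to 60060 has exactly five digits. A minimal base-R sketch of where that assumption would break; the range 99999:100000 is a made-up illustration, not an accession range from this thread:

```r
# Two ways to build the IDs for the range in the question
ids_fixed  <- paste("DQ0", seq(60054, 60060), sep = "")  # hardcoded zero
ids_padded <- sprintf("DQ%06d", seq(60054, 60060))       # pad number to 6 digits

# For this range both give the same character vector
stopifnot(identical(ids_fixed, ids_padded))

# Hypothetical range that crosses from 5 to 6 digits
x <- 99999:100000
paste("DQ0", x, sep = "")  # "DQ099999" "DQ0100000" (second ID gets 7 digits)
sprintf("DQ%06d", x)       # "DQ099999" "DQ100000"
```

Padding to a fixed width keeps every ID the same length, which matches GenBank nucleotide accessions with a two-letter prefix such as DQ (two letters followed by six digits).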
Jean
On Sun, Feb 5, 2017 at 7:50 PM, Nabila Arbi <nabilaelarbi1912 at gmail.com> wrote:
> ______________________________________________
> R-help at r-project.org mailing list -- To UNSUBSCRIBE and more, see
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.
Hi Nabila,
This is because you create the sequence with seq(), which works on numbers, not on character data such as accession IDs. A number has no leading zeros (to R, 060054 is simply 60054), which is why the 0 is dropped.
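To see this concretely: the parser reads 060054 as an ordinary double (unlike C, R does not treat a leading 0 as octal), and converting it back to text regenerates the digits without the zero. A short base-R illustration:

```r
# A numeric literal with a leading zero is just an ordinary number
x <- 060054
is.numeric(x)              # TRUE
x == 60054                 # TRUE
as.character(x)            # "60054": digits are regenerated, no leading zero

# Only a character string keeps the digits exactly as typed
id <- "060054"
nchar(id)                  # 6
paste("DQ", id, sep = "")  # "DQ060054"
```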
One alternative would be:
seq2 <- paste("DQ0", seq(60054, 60060), sep = "")
Would that work for you?
HTH,
Ivan
--
Ivan Calandra, PhD
MONREPOS Archaeological Research Centre and
Museum for Human Behavioural Evolution
Schloss Monrepos
56567 Neuwied, Germany
calandra at rgzm.de
+49 (0) 2631 9772-243
https://www.researchgate.net/profile/Ivan_Calandra
https://rgzm.academia.edu/IvanCalandra
https://publons.com/author/705639/
On 06/02/2017 02:50, Nabila Arbi wrote:
You need the leading zeros, and 'numerics' just give the number without leading zeros. You can use 'sprintf' to create a character string with the leading zeros:

> # this is using 'numeric' and drops leading zeros
>
> seq1 <- paste("DQ", seq(060054, 060060), sep = "")
> seq1
[1] "DQ60054" "DQ60055" "DQ60056" "DQ60057" "DQ60058" "DQ60059" "DQ60060"
>
> # use 'sprintf' to create leading zeros
> seq2 <- paste0("DQ", sprintf("%06d", seq(060054, 060060)))
> seq2
[1] "DQ060054" "DQ060055" "DQ060056" "DQ060057" "DQ060058" "DQ060059" "DQ060060"
>

Jim Holtman
Data Munger Guru

What is the problem that you are trying to solve?
Tell me what you want to do, not how you want to do it.

On Sun, Feb 5, 2017 at 8:50 PM, Nabila Arbi <nabilaelarbi1912 at gmail.com> wrote:
Two methods, among others:
seq1 <- paste("DQ", sprintf("%0*d", 6, seq(060054, 060060)), sep = "")
or
seq1 <- paste("DQ", formatC(seq(060054, 060060), width = 6, format = "d", flag = "0"), sep = "")
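Both approaches pad to a fixed field width. A small base-R check that the two agree and that shorter numbers simply receive more zeros; in formatC() it is the `width` argument, together with flag = "0", that sets the padded length:

```r
x <- seq(60054, 60060)

# sprintf(): '*' takes the field width (6) from the argument list
a <- paste("DQ", sprintf("%0*d", 6, x), sep = "")

# formatC(): pad integers ("d" format) to width 6 with zeros
b <- paste("DQ", formatC(x, width = 6, format = "d", flag = "0"), sep = "")

stopifnot(identical(a, b))
a[1]   # "DQ060054"

# Shorter numbers get as many zeros as needed to reach the width
sprintf("%0*d", 6, c(5L, 123L))                            # "000005" "000123"
formatC(c(5L, 123L), width = 6, format = "d", flag = "0")  # "000005" "000123"
```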
Hth,
Adrian
On Mon, Feb 6, 2017 at 3:50 AM, Nabila Arbi <nabilaelarbi1912 at gmail.com> wrote:
--
Adrian Dusa
University of Bucharest
Romanian Social Data Archive
Soseaua Panduri nr. 90-92
050663 Bucharest sector 5
Romania
I think it is important to point out that whenever R treats a number as a
numeric (integer or double) it loses any base 10 concept of "leading
zero" in that internal representation, so in this expression
seq2 <- paste0("DQ", sprintf("%06d", seq(060054, 060060)))
the arguments to seq have leading zeros that are ignored by R and have nothing
to do with getting the desired output. That is, the same result can be obtained
using
seq2 <- paste0("DQ", sprintf("%06d", seq(60054, 60060)))
or
seq2 <- paste0("DQ", sprintf("%06d", seq(0060054, 00060060)))
since only the zero inside the format string is key to success. (If it makes you more comfortable to put the zeros there for readability, that is your choice, but R ignores them.)
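The point above can be checked directly: the three spellings of the range give identical strings, and dropping the 0 flag from the format pads with a space instead of a zero. A base-R sketch:

```r
# Leading zeros in the numeric literals make no difference to the result
a  <- sprintf("%06d", seq(060054, 060060))
b  <- sprintf("%06d", seq(60054, 60060))
c2 <- sprintf("%06d", seq(0060054, 00060060))
stopifnot(identical(a, b), identical(b, c2))

# The "0" flag in the format string is what produces the zero
sprintf("%6d", 60054L)    # " 60054" (padded with a space)
sprintf("%06d", 60054L)   # "060054" (padded with a zero)
```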
Also note that the paste0 function is not needed when you use sprintf:
seq2 <- sprintf("DQ%06d", seq(60054, 60060))
or
myprefix <- "DQ"
seq2 <- sprintf("%s%06d", myprefix, seq(60054, 60060))
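If the same pattern is needed for other prefixes or widths, it can be wrapped in a small function. `make_accessions()` below is a hypothetical helper written for illustration, not part of base R or ape:

```r
# Hypothetical helper: zero-padded IDs such as "DQ060054" for a numeric range
make_accessions <- function(prefix, from, to, width = 6) {
  # Build a format such as "%s%06d": %s takes the prefix,
  # %0<width>d pads the number with zeros to 'width' digits
  fmt <- paste0("%s%0", width, "d")
  sprintf(fmt, prefix, seq(from, to))
}

ids <- make_accessions("DQ", 60054, 60060)
ids[c(1, 7)]   # "DQ060054" "DQ060060"
```

The result is an ordinary character vector, so it can be passed to read.GenBank() in place of seq1 from the original post (that step needs a network connection and is not shown).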
--
Sent from my phone. Please excuse my brevity.
On February 6, 2017 5:45:43 AM PST, jim holtman <jholtman at gmail.com> wrote:
No need for sprintf(). Simply:

> paste0("DQ0",seq.int(60054,60060))
[1] "DQ060054" "DQ060055" "DQ060056" "DQ060057" "DQ060058" "DQ060059"
[7] "DQ060060"

Cheers,
Bert

Bert Gunter

"The trouble with having an open mind is that people keep coming along and sticking things into it."
-- Opus (aka Berkeley Breathed in his "Bloom County" comic strip)

On Mon, Feb 6, 2017 at 5:45 AM, jim holtman <jholtman at gmail.com> wrote: