thr3ads.net - R help - [R] "ACCTGMX" to "1223400" in R? [Jul 2010]

John1983

2010-Jul-19 21:31 UTC

[R] "ACCTGMX" to "1223400" in R?

Hi,

I am a newbie in R and was working on some DNA data represented as strings
of A, C, T and G (also wildcard characters like M and X). I use the
Bioconductor package in R. Currently I need to convert a string of the form
"ACCTGMX" to "1223400", i.e. A is replaced by 1, C with 2, T with 3, G with 4
and any other character with a 0. I checked with 'replace' and also with a
function called 'copySubstitute' found in the Biobase package, but this is
only for files.
The data here is a string ("ACCTGMX") and we need to convert it to yet
another string ("1223400"). Now I use the strsplit function to split
"ACCTGM" into "A" "C" "C" "T" "G" "M" and then use 'which' to assign the
corresponding numbers.
Is there a faster way to do this or some function I can make use of?
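A minimal sketch of the split-and-look-up approach described above, assuming the input is a character vector of uppercase sequences. Here `match` does the per-character lookup that the `which` calls were doing; `encode_split` is a hypothetical helper name, not a Bioconductor function.

```r
# Hypothetical helper: encode DNA strings as digits with A->1, C->2, T->3, G->4,
# and any other character (M, X, ...) -> 0. Assumes uppercase input.
encode_split <- function(x) {
  bases <- c("A", "C", "T", "G")  # position in this vector is the digit
  vapply(strsplit(x, ""), function(chars) {
    idx <- match(chars, bases)    # 1-4 for known bases, NA otherwise
    idx[is.na(idx)] <- 0L         # unknown symbols become 0
    paste(idx, collapse = "")
  }, character(1))
}

encode_split(c("ACCTGMX", "GATTACA"))
# [1] "1223400" "4133121"
```

Because `strsplit` and `vapply` work on whole vectors, this handles many sequences in one call without an explicit loop over characters.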

Please advise.

Thank you.
-- 
View this message in context:
http://r.789695.n4.nabble.com/ACCTGMX-to-1223400-in-R-tp2294636p2294636.html
Sent from the R help mailing list archive at Nabble.com.

David Winsemius

2010-Jul-20 01:37 UTC

On Jul 19, 2010, at 5:31 PM, John1983 wrote:
>
> Hi,
>
> I am a newbie in R and was working on some DNA data represented as
> strings of A,C,T and G (also wild-character like M and X). I use the
> Bioconductor package in R.

Well, I guess it's sort of a "meta" package, but it is really more of a
subculture. It also has its own mailing list.

> Currently I need to convert a string of the form "ACCTGMX" to
> "1223400" i.e. A is replaced by 1, C with 2, T with 3, G with 4 and
> any other character with a 0. I checked with 'replace' and also with a
> function called 'copySubstitute' found in the Biobase package but this
> is only for files.
> The data here is a string ("ACCTGMX" ) and we need to convert it to
> yet another string ("1223400"). Now I use the strsplit function to
> split "ACCTGM" into "A" "C" "C" "T" "G" "M" and then use 'which' to
> assign the corresponding numbers.
> Is there a faster way to do this or some function I can make use of?
 > tst <- rep("ACCTGMX", 5)
 > newtst <- gsub("A", "1", tst)
 > newtst <- gsub("C", "2", newtst)
 > newtst <- gsub("T", "3", newtst)
 > newtst <- gsub("G", "4", newtst)
 > newtst <- gsub("[[:alpha:]]", "0", newtst)  # any letter left over -> 0
 > newtst
[1] "1223400" "1223400" "1223400" "1223400" "1223400"
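The same chain of gsub calls can be driven by a lookup table so the mapping lives in one place. A minimal sketch; `codes` and `encode_gsub` are hypothetical names. The final step only catches letters, so a non-letter symbol such as "-" would pass through unchanged, and lowercase letters would become 0.

```r
# Hypothetical lookup table (mapping taken from the question): letter -> digit
codes <- c(A = "1", C = "2", T = "3", G = "4")

encode_gsub <- function(x, table = codes) {
  # One fixed-string substitution per table entry, like the chain above
  for (base in names(table)) {
    x <- gsub(base, table[[base]], x, fixed = TRUE)
  }
  # Letters still present (M, X, ...) are not in the table, so map them to 0
  gsub("[[:alpha:]]", "0", x)
}

encode_gsub(rep("ACCTGMX", 5))
```

Using `fixed = TRUE` skips regular-expression matching for the single-letter substitutions.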

There is also a rollapply function in the zoo package and a strapply
function in the gsubfn package that might be even more powerful, but I am
insufficiently talented to give you a one-liner using them.

David Winsemius, MD
West Hartford, CT

jim holtman

2010-Jul-20 02:44 UTC

Here is another way of doing it with 'chartr'. I assume that you have only
uppercase characters, but you can add to the strings to cover any others:
> tst <- rep("ACCTGMX", 5)
> chartr("ACTGBDEFHIJKLMNOPQRSUVWXYZ",
+        "12340000000000000000000000", tst)
[1] "1223400" "1223400" "1223400" "1223400" "1223400"
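Rather than typing the 26-character translation strings by hand, they can be built from R's built-in `LETTERS` and `letters` vectors, which also makes it easy to cover lowercase input. A minimal sketch, assuming lowercase a, c, t, g should map to the same digits; `digit_for` and `encode_chartr` are hypothetical names. Characters that are not letters (digits, "-") are left untouched by chartr.

```r
# Build chartr()'s 'old' and 'new' strings programmatically
all_letters <- c(LETTERS, letters)
digit_for <- c(A = "1", C = "2", T = "3", G = "4")

key <- toupper(all_letters)                  # compare case-insensitively
new_chars <- rep("0", length(all_letters))   # default: any letter -> 0
known <- key %in% names(digit_for)
new_chars[known] <- digit_for[key[known]]    # A/C/T/G in either case -> digit

old <- paste(all_letters, collapse = "")
new <- paste(new_chars, collapse = "")

encode_chartr <- function(x) chartr(old, new, x)

encode_chartr(c("ACCTGMX", "acctgmx"))
```

Both strings end up 52 characters long, so chartr sees one replacement for every letter.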


-- 
Jim Holtman
Cincinnati, OH
+1 513 646 9390

What is the problem that you are trying to solve?

Gabor Grothendieck

2010-Jul-20 02:46 UTC

Here are a few alternatives. The first uses chartr, which translates the
ith character in the first string to the ith character in the second
string. If speed is a consideration, note that this alternative is by far
the fastest.

The second alternative translates just A, C, T and G using chartr and then
uses gsub to translate everything else to 0. Like the first, it uses only
core R functionality. It is intermediate in speed and simplicity between
the other two.

The third uses gsubfn, which is like gsub but allows the replacement to be
a list. In that case, if the match equals a name in the list, it is
replaced with that component; if no name is matched, the unnamed component
at the end is used as the replacement. This one has the advantage that it
is particularly simple to specify.

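#1 translate every uppercase letter in a single chartr call
chartr("ABCDEFGHIJKLMNOPQRSTUVWXYZ",
       "10200040000000000003000000", "ACCTGMX")

#2 map A, C, T, G to 1, 2, 3, 4, then turn anything that is not 1-4 into 0
gsub("[^1-4]", "0", chartr("ACTG", "1234", "ACCTGMX"))

#3 named list: each name is replaced by its value, the unnamed 0 is the default
library(gsubfn)
gsubfn(".", list(A = 1, C = 2, T = 3, G = 4, 0), "ACCTGMX")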
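Because both core-R alternatives hard-code the mapping, running them side by side on a few sample strings is a cheap way to confirm they agree. The sample strings below are made up for illustration. The order of the first chartr string in #2 has to follow the requested mapping (T -> 3, G -> 4), i.e. "ACTG".

```r
# Made-up sample inputs, including one with no known bases and an empty string
x <- c("ACCTGMX", "GATTACA", "NNNN", "")

alt1 <- chartr("ABCDEFGHIJKLMNOPQRSTUVWXYZ",
               "10200040000000000003000000", x)
alt2 <- gsub("[^1-4]", "0", chartr("ACTG", "1234", x))

identical(alt1, alt2)
# [1] TRUE
```

The two agree on uppercase input but not in general: #1 leaves lowercase letters untouched, while #2 turns them into 0, so chartr("ACTG", ...) plus gsub is the safer choice if mixed-case sequences can appear.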
