Hi,

I have a question about data handling. I have a dataset as follows:

ID      snp1    snp2    snp3
1001    0/0     1/1     1/1
1002    2/2     3/3     1/1
1003    4/4     3/3     2/2

I want to convert the dataset to the following format:

ID      snp1    snp2    snp3
1001    00      AA      AA
1002    GG      CC      AA
1003    TT      CC      GG

Things to be done:
1) Remove the '/'.
2) Replace the numbers with letters, leaving 0 unchanged: 1=A, 2=G, 3=C, 4=T.

What is the most efficient way to do this?

Thank you very much,

karena

--
View this message in context: http://r.789695.n4.nabble.com/questions-about-string-handling-tp2314335p2314335.html
Sent from the R help mailing list archive at Nabble.com.

Tena koe Karena

See ?sub and ?gsub

HTH ...

Peter Alspach
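
For readers following the pointer to ?gsub, here is a minimal sketch of how the two steps from the question might look in base R. The helper name `recode_genotype` and the data frame `df` (rebuilt here from the sample data) are illustrative assumptions, not part of the original reply. It uses `chartr` for the digit-to-letter step, which is a substitute for chaining several `gsub` calls.

```r
# Sample data from the question; stringsAsFactors = FALSE keeps genotypes as character.
df <- data.frame(
  ID   = c(1001, 1002, 1003),
  snp1 = c("0/0", "2/2", "4/4"),
  snp2 = c("1/1", "3/3", "3/3"),
  snp3 = c("1/1", "1/1", "2/2"),
  stringsAsFactors = FALSE
)

# Hypothetical helper: strip the "/" and map allele codes to letters.
recode_genotype <- function(x) {
  x <- gsub("/", "", as.character(x), fixed = TRUE)  # step 1: "1/1" becomes "11"
  chartr("1234", "AGCT", x)                          # step 2: 1=A, 2=G, 3=C, 4=T; 0 is left alone
}

out <- df
out[-1] <- lapply(df[-1], recode_genotype)  # recode every column except ID
print(out)
```

Because the recoding works character by character, this sketch would also translate mixed genotypes such as "1/2", should the real data contain any.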

On 5 August 2010 10:55, karena <dr.jzhou at gmail.com> wrote:
> what is the most efficient way to do it?

How about this ("df" is your input data.frame):

    # Look up each genotype string in a fixed table of codes and
    # return the matching two-letter label; ID is carried over unchanged.
    data.frame(ID = df[, 1],
               apply(df[, 2:4], 2, function(x)
                 c("00", "AA", "GG", "CC", "TT")[
                   match(x, c("0/0", "1/1", "2/2", "3/3", "4/4"))]))

Michael
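
The lookup idea above can also be written with a named vector, which keeps each code next to its label. This is only a sketch under assumptions: `df` is rebuilt from the sample data in the question, and the names `lookup` and `snp_cols` are made up for illustration. As with `match`, any genotype that is not in the table (for example "1/2") comes back as NA, so the sketch checks for that explicitly.

```r
# Sample data from the question.
df <- data.frame(
  ID   = c(1001, 1002, 1003),
  snp1 = c("0/0", "2/2", "4/4"),
  snp2 = c("1/1", "3/3", "3/3"),
  snp3 = c("1/1", "1/1", "2/2"),
  stringsAsFactors = FALSE
)

# Lookup table keyed by the raw genotype string (only the five codes from the thread).
lookup <- c("0/0" = "00", "1/1" = "AA", "2/2" = "GG", "3/3" = "CC", "4/4" = "TT")

snp_cols <- names(df)[-1]  # every column except ID
out <- df
out[snp_cols] <- lapply(df[snp_cols], function(x) unname(lookup[as.character(x)]))

# Genotypes missing from the lookup become NA; stop rather than pass them on silently.
stopifnot(!anyNA(unlist(out[snp_cols])))
print(out)
```

Selecting columns by name rather than by position (2:4) means the same code should keep working if more SNP columns are added.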