Hello,

I have the following problem: I have a large string sequence consisting of the four letters "A", "C", "G" and "T" (as before). Additionally, I have a second string sequence of the same length giving a label for each character. The labels are "+" and "-".

Now I would like to create an 8x8 matrix containing the counts of how often each possible pairwise combination occurs, for example "A" with the label "+" followed by "C" with the label "+", or "T"->"C" with the labels "-"->"+", etc.

Of course I could just use loops to "walk" along the sequence, but since you showed me so many better solutions in response to my last mail, I thought you might be able to help me improve my R skills even further.

Thanks for your ideas!
Cheers, Winnie
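For comparison with the vectorised answers in this thread, here is a minimal sketch of the loop approach the question mentions. It assumes the letters and labels are character vectors with one element per position; the function name `count_transitions_loop` and the toy data are hypothetical.

```r
# Loop-based baseline: count transitions between consecutive
# (letter, label) pairs. `seq_chars` and `labels` are assumed to be
# character vectors of equal length, one element per position.
count_transitions_loop <- function(seq_chars, labels) {
  states <- as.vector(outer(c("A", "C", "G", "T"), c("+", "-"), paste0))
  counts <- matrix(0L, nrow = 8, ncol = 8,
                   dimnames = list(from = states, to = states))
  s <- paste0(seq_chars, labels)          # e.g. "A" and "+" -> "A+"
  for (i in seq_len(length(s) - 1)) {
    # add one to the cell (current state, next state)
    counts[s[i], s[i + 1]] <- counts[s[i], s[i + 1]] + 1L
  }
  counts
}

# Toy data, split from single strings into one character per position
x   <- strsplit("ACGTTA", "")[[1]]
lab <- strsplit("++--+-", "")[[1]]
m <- count_transitions_loop(x, lab)
m["A+", "C+"]  # [1] 1
sum(m)         # [1] 5, i.e. length(x) - 1
```

This walks the sequence once, so it is correct but slow in interpreted R for very long sequences, which is why the replies below avoid the explicit loop.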
Have you done a search of "www.r-project.org" -> search -> "R site search" for "Markov Chain"? I just got "138 documents matching your query". The fifth one suggested "chapter 5 of Jim Lindsey's online document 'The statistical analysis of stochastic processes in Time', at his website www.luc.ac.be/~jlindsey". I found this document mentioned under "recent publications". The book may no longer be downloadable, but his examples still are. There are probably other tools of interest to you in that list, and perhaps someone else will enlighten both of us on this.

There may be an easier way to do what you ask, but if I understand your question correctly, the following seems to do it for me:

bases <- c("A","C","G","T")
sgn <- c("+", "-")
# all 8 signed bases: "A+" "C+" "G+" "T+" "A-" "C-" "G-" "T-"
signedBases <- as.vector(outer(bases, sgn, paste, sep=""))
sBnum <- 1:8
names(sBnum) <- signedBases
# simulated example sequence of signed bases
set.seed(1)
seqLen <- 100
sBaseSeq <- sample(x=signedBases, size=seqLen, replace=TRUE)
# count each observed (this base, next base) pair
nextBase <- aggregate(sBaseSeq[-seqLen],
    list(thisBase=sBaseSeq[-seqLen], nextBase=sBaseSeq[-1]), length)
transFreq <- array(0, dim=c(8,8))
dimnames(transFreq) <- list(signedBases, signedBases)
# turn the pair names into (row, column) numbers and fill in the counts
nBnum <- array(sBnum[as.matrix(nextBase[1:2])], dim=dim(nextBase[1:2]))
transFreq[nBnum] <- nextBase[[3]]

> transFreq
   A+ C+ G+ T+ A- C- G- T-
A+  1  2  1  2  0  2  0  1
C+  2  3  1  0  0  3  1  1
G+  0  0  2  5  2  1  2  0
T+  1  2  2  1  1  3  8  2
A-  0  0  0  1  1  1  1  1
C-  2  1  1  5  0  2  2  2
G-  3  1  2  4  2  2  1  2
T-  0  2  2  2  0  1  2  1

hope this helps.
spencer graves

______________________________________________
R-help at stat.math.ethz.ch mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide! http://www.R-project.org/posting-guide.html

--
Spencer Graves, PhD, Senior Development Engineer
O: (408)938-4420; mobile: (408)655-4567
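As an alternative to the aggregate() and matrix-indexing steps above (a sketch, not the method used in this reply), each signed base can be mapped to an integer code with match(), consecutive codes combined into a single cell number from 1 to 64, and the cells counted with tabulate(). Since the reply points to Markov chains, the counts are then row-normalised with prop.table() to give estimated first-order transition probabilities. The function name `transition_counts` and the toy sequence are hypothetical.

```r
# All 8 signed bases, in the same order as signedBases above
states <- as.vector(outer(c("A", "C", "G", "T"), c("+", "-"), paste0))

transition_counts <- function(signed_seq, states) {
  k <- length(states)
  code <- match(signed_seq, states)        # integer code 1..k per position
  stopifnot(!anyNA(code))                  # reject unexpected symbols
  from <- code[-length(code)]
  to   <- code[-1]
  # Column-major cell number for row `from`, column `to`
  idx <- from + (to - 1L) * k
  matrix(tabulate(idx, nbins = k * k), nrow = k, ncol = k,
         dimnames = list(from = states, to = states))
}

# Hypothetical toy sequence; with real data use paste0(letters, labels)
s <- c("A+", "C+", "G-", "T-", "T+", "A-", "A+")
counts <- transition_counts(s, states)

# Estimated transition probabilities: each row divided by its total.
# Rows for states that are never left are 0/0, i.e. NaN.
probs <- prop.table(counts, margin = 1)
probs["A+", "C+"]  # [1] 1
```

tabulate() counts every cell in a single pass, and the fixed `states` vector guarantees an 8x8 result even when some combinations never occur.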
This is quite similar to your prior question. Use this as your factor, with s1 and s2 being the letter and label sequences (one element per position):

f <- factor(paste(s1, s2, sep = "."),
            levels = levels(interaction(c("A","C","G","T"), c("-","+"))))

and process it with the same table expression as last time:

table(f[-length(f)], f[-1])
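Here is a small worked example of the factor + table() approach. It assumes the two sequences arrive as single strings and are split into one character per position with strsplit(). The names `seq_str` and `lab_str` and the toy data are hypothetical.

```r
# Hypothetical input: one string of letters, one string of labels
seq_str <- "ACGTTA"
lab_str <- "++--+-"
s1 <- strsplit(seq_str, "")[[1]]
s2 <- strsplit(lab_str, "")[[1]]
stopifnot(length(s1) == length(s2))

# interaction() supplies all 8 letter/label combinations as levels,
# so the table stays 8x8 even if some combinations never occur.
lev <- levels(interaction(c("A", "C", "G", "T"), c("-", "+")))
f <- factor(paste(s1, s2, sep = "."), levels = lev)

# Rows: current position; columns: following position
tab <- table(from = f[-length(f)], to = f[-1])
dim(tab)  # [1] 8 8
```

Because both shifted copies of `f` keep the full set of levels, table() returns zeros for unseen transitions instead of dropping them.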
Well, flattery and all that ... Anyway, the following is an example of how it can be done. You can cut & paste all of the following.

# Artificial example of pairs, one of "A","C","T","G" paired
# with one of "-","+"
S <- sample(c("A","C","G","T"), 1000, replace=TRUE)
T <- sample(c("-","+"), 1000, replace=TRUE)  # note: masks the built-in T (TRUE)
U <- apply(cbind(S, T), 1, paste, collapse="")
U[1:10]
## [1] "C+" "T-" "G+" "T+" "C+" "T+" "T-" "C+" "C-" "C-"
## Shows the first few of the pairs

# Construct 4-character items, each consisting of a pair
# (e.g. "C+") pasted to its successor (e.g. "T-")
V <- apply(cbind(U[1:999], U[2:1000]), 1, paste, collapse="")
V[1:7]
## [1] "C+T-" "T-G+" "G+T+" "T+C+" "C+T+" "T+T-" "T-C+"
## Shows the first few of these. Compare with U above.

## Now this is where the real gurus can show their mettle.
##
## One way to get the counts is simply
table(V)
## but this is not a nice layout. Another is the loop:
for (i in sort(unique(V))) { print(paste(i, ":", sum(V == i))) }
## and I had hoped to think of a solution that did not
## involve a vulgar loop but would also avoid the unhelpful
## layout of table(V). (This is not your 8x8 matrix, but
## converting the output of the loop to one should not be
## impossible ... )

Pending the elegant solution which someone will come up with, working through the above and consulting "?" for anything not understood will reveal a few things about R ...

Best wishes,
Ted.

--------------------------------------------------------------------
E-Mail: (Ted Harding) <Ted.Harding at nessie.mcc.ac.uk>
Fax-to-email: +44 (0)870 094 0861 [NB: New number!]
Date: 30-Dec-04 Time: 23:37:35
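One loop-free way to get Ted's counts into an 8x8 layout is a sketch that splits each 4-character item of V back into its "from" and "to" halves with substr() and cross-tabulates them. Fixing the factor levels keeps the table 8x8 even when a combination is absent. Here the short U is Ted's printed U[1:10], used as a stand-in for his 1000-element simulation.

```r
# Stand-in for Ted's U: the first ten pairs he printed
U <- c("C+", "T-", "G+", "T+", "C+", "T+", "T-", "C+", "C-", "C-")
# Same result as Ted's apply(cbind(U[-n], U[-1]), 1, paste, collapse = "")
V <- paste0(U[-length(U)], U[-1])

# All 8 signed bases, used as fixed levels for both margins
states <- as.vector(outer(c("A", "C", "G", "T"), c("-", "+"), paste0))
from <- factor(substr(V, 1, 2), levels = states)  # characters 1-2
to   <- factor(substr(V, 3, 4), levels = states)  # characters 3-4
m <- table(from, to)
m["C+", "T-"]  # [1] 1
```

The nonzero cells of `m` hold exactly the counts that table(V) reports, just arranged as a from-by-to matrix.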