Hi there,

I have a large amino acid csv file like this:

input.txt:
P,LV,Q,Z
P,VL,Q,Z
P,ML,QL,Z

There is a problem with this file, since LV and VL are in fact the same thing.
How do I order each element according to alphabetical order so that the
desired output would look like:

output.txt:
P,LV,Q,Z
P,LV,Q,Z
P,LM,LQ,Z

--
View this message in context: http://r.789695.n4.nabble.com/how-to-order-each-element-according-to-alphabet-tp3668997p3668997.html
Sent from the R help mailing list archive at Nabble.com.
On Jul 14, 2011, at 9:18 PM, onthetopo wrote:

> Hi there,
>
> I have a large amino acid csv file like this:
>
> input.txt:
> P,LV,Q,Z
> P,VL,Q,Z
> P,ML,QL,Z

Are you also asking how to read a comma separated file?

    ?read.csv   # and read more introductory material

> There is a problem with this file, since LV and VL are in fact the
> same thing.
> How do I order each element according to alphabetical order so that
> the desired output would look like:
>
> output.txt:
> P,LV,Q,Z
> P,LV,Q,Z
> P,LM,LQ,Z

That is not a reproducible example without input code. Perhaps:

    as.data.frame(lapply(input_dfrm, gsub, pattern="VL", replacement="LV"))

--
David Winsemius, MD
West Hartford, CT
Hi,

There are many more patterns than VL to LV. In fact, too many to be listed
manually.

For example, ML should be ordered as LM, and QL should be ordered as LQ.
The order is according to the alphabet.
On Jul 14, 2011, at 11:19 PM, onthetopo wrote:

> Hi,
>
> There are many more patterns than VL to LV. In fact, too many to be
> listed manually.
>
> For example ML should be ordered as LM, QL should be ordered as LQ.
> The order is according to the alphabet.

A more complete (reproducible) answer would have been appreciated, and note
that local custom dictates that context is offered for ongoing threads.
Nabble provides a mechanism for doing so.

    > lets2 <- paste(LETTERS[sample(20, replace=TRUE)],
    +                LETTERS[sample(20, replace=TRUE)], sep="")
    > lets2
    [1] "IA" "EP" "TE" "IT" "PS" "DO" "RO" "EJ" "DR" "DD" "LM" "OF" "RJ" "OA" "JD" "QB" "AS" "TG" "MK" "IM"
    > sapply( lapply( strsplit(lets2, split=""), sort), paste, collapse="")
    [1] "AI" "EP" "ET" "IT" "PS" "DO" "OR" "EJ" "DR" "DD" "LM" "FO" "JR" "AO" "DJ" "BQ" "AS" "GT" "KM" "IM"

> Sent from the R help mailing list archive at Nabble.com.

Nabble is NOT rhelp. So PLEASE, PLEASE, PLEASE:

> .... do read the posting guide http://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.

--
David Winsemius, MD
West Hartford, CT
> dd
     [,1] [,2]
[1,] "OP" "SU"
[2,] "XA" "YQ"

> sapply( lapply(
+ strsplit(dd, split=""), sort),
+ paste, collapse="")
[1] "OP" "AX" "SU" "QY"

The result is not what I intended since it is a single line.
It should be:

     [,1] [,2]
[1,] "OP" "SU"
[2,] "AX" "QY"
Hi:

Is this what you're looking for?

Lines <- "
ASG,UXW,AFODJEL
E,TDIWE,ROFD"

# Read in the above lines (for purposes of this example only)
# Note the stringsAsFactors = FALSE option!
df <- read.csv(textConnection(Lines), header = FALSE, stringsAsFactors = FALSE)
closeAllConnections()
dm <- as.matrix(df)   # convert to a character matrix

# Function to sort a character string in alphabetical (lexical) order
sortfun <- function(x) paste(sort(unlist(strsplit(x, ''))), collapse = '')

# Apply to the rows of the matrix
t(apply(dm, 1, function(x) sapply(x, sortfun)))

Result:
     V1    V2      V3
[1,] "AGS" "UWX"   "ADEFJLO"
[2,] "E"   "DEITW" "DFOR"

If you need to do this for only a subset of your variables, create a
character submatrix and follow the script above on that, after which you
would need to do some post-processing on your own.

HTH,
Dennis
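For the file-to-file workflow in the original question (input.txt in, output.txt out), a minimal sketch might look like the following. It is not taken from the thread: the names sort_letters, infile and outfile are illustrative, and temporary files stand in for input.txt and output.txt so that the example is self-contained. It also assumes the file has no header row and that every field should be treated as text.

```r
# Sort the letters within one string, e.g. "VL" -> "LV".
sort_letters <- function(s) paste(sort(strsplit(s, "")[[1]]), collapse = "")

# Hypothetical stand-ins for input.txt and output.txt.
infile  <- tempfile(fileext = ".txt")
outfile <- tempfile(fileext = ".txt")
writeLines(c("P,LV,Q,Z", "P,VL,Q,Z", "P,ML,QL,Z"), infile)

# colClasses = "character" reads every field as text.
df <- read.csv(infile, header = FALSE, colClasses = "character")

# Assigning into df[] replaces each column but keeps the data frame's shape.
df[] <- lapply(df, function(col)
  vapply(col, sort_letters, character(1), USE.NAMES = FALSE))

# Write back as plain comma-separated text without quotes or headers.
write.table(df, outfile, sep = ",", quote = FALSE,
            row.names = FALSE, col.names = FALSE)

result <- readLines(outfile)
print(result)
```

With real data, infile and outfile would simply be replaced by the paths of the actual files.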
On Jul 15, 2011, at 12:23 AM, onthetopo wrote:

>> dd
>      [,1] [,2]
> [1,] "OP" "SU"
> [2,] "XA" "YQ"
>
> sapply( lapply(
> + strsplit(dd, split=""), sort),
> + paste, collapse="")
>
> [1] "OP" "AX" "SU" "QY"
>
> The result is not what I intended since it is a single line.
> It should be:
>      [,1] [,2]
> [1,] "OP" "SU"
> [2,] "AX" "QY"

sortvec <- function(x) paste( sapply( strsplit(x, split=""), sort), collapse="")
apply(dd, 1:2, sortvec)
     [,1] [,2]
[1,] "OP" "SU"
[2,] "AX" "QY"

--
David Winsemius, MD
West Hartford, CT
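The earlier sapply() call returned a single line because strsplit() treats the matrix as a plain character vector and returns a list, so the dim attribute is lost along the way. As an alternative to apply(dd, 1:2, ...), one possible sketch is to compute the sorted strings as a flat vector and assign them back into a copy of the matrix. The names sort_letters and sorted are illustrative and not from the thread; dd is rebuilt here to match the example above.

```r
# Rebuild the 2 x 2 example matrix (filled column by column).
dd <- matrix(c("OP", "XA", "SU", "YQ"), nrow = 2)

# Sort the letters within one string, e.g. "XA" -> "AX".
sort_letters <- function(s) paste(sort(strsplit(s, "")[[1]]), collapse = "")

# Copy dd so that dim (and any dimnames) carry over, then overwrite
# only the contents; sorted[] keeps the matrix attributes in place.
sorted <- dd
sorted[] <- vapply(dd, sort_letters, character(1), USE.NAMES = FALSE)
print(sorted)
```

Because vapply() walks the matrix in column order and sorted[] is filled in the same order, each element lands back in its original position.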