Benjamin Wright
2011-Oct-03 15:40 UTC
[R] Parsing variable-length delimited strings into a matrix
I'm struggling to find a way of parsing a vector of data in this sort of form:

A,B,C
B,B
A,AA,C
A,B,BB,BBB,B,B

into a matrix (or data frame). The catch is that I don't know a priori how many entries there will be in each element, nor how many characters each entry will have. strsplit(vec, ",") gets me a list, but I can't find a way of turning the list into a matrix. unlist(lst) destroys the length information, and do.call("rbind", lst) fails because of the uneven lengths. It is possible to go through the vector element by element, but that has proved too slow for my purposes.

Is there a reasonably quick method of achieving this in a vector-oriented way?

Cheers,
Ben
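For anyone reproducing the problem, here is a minimal sketch of the situation described above. The names vec and lst follow the post, and the values are the four sample strings. As far as I know, rbind() does not pad uneven rows: it recycles the shorter ones to the longest length, and it only warns when the lengths do not divide evenly, so the result can be silently wrong.

```r
# Sample input from the post (vec and lst are the names used there)
vec <- c("A,B,C", "B,B", "A,AA,C", "A,B,BB,BBB,B,B")
lst <- strsplit(vec, ",", fixed = TRUE)

# Each list element has a different length
n_fields <- vapply(lst, length, integer(1))

# unlist() flattens everything, so the per-row boundaries are lost
flat <- unlist(lst)

# rbind() recycles short rows instead of padding them with NA
recycled <- do.call("rbind", lst)
```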
R. Michael Weylandt
2011-Oct-03 20:15 UTC
[R] Parsing variable-length delimited strings into a matrix
Well, how do you want it to be made into a matrix if the rows are all different lengths? Methinks you are finding this tricky for a reason...

Michael
jim holtman
2011-Oct-04 12:43 UTC
[R] Parsing variable-length delimited strings into a matrix
Will this do it for you:

> x <- readLines(textConnection("A,B,C
+ B,B
+ A,AA,C
+ A,B,BB,BBB,B,B"))
> closeAllConnections()
> x.s <- strsplit(x, ',')
> # determine max length
> x.max <- max(sapply(x.s, length))
> # create character matrix
> x.mat <- matrix(
+     sapply(x.s, function(a) c(a, rep(NA, x.max - length(a))))
+     , byrow = TRUE
+     , ncol = x.max
+     )
> x.mat
     [,1] [,2] [,3] [,4]  [,5] [,6]
[1,] "A"  "B"  "C"  NA    NA   NA
[2,] "B"  "B"  NA   NA    NA   NA
[3,] "A"  "AA" "C"  NA    NA   NA
[4,] "A"  "B"  "BB" "BBB" "B"  "B"

--
Jim Holtman
Data Munger Guru

What is the problem that you are trying to solve?
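A variation on the same padding idea, offered as a sketch and not as part of the original reply: indexing a character vector past its end yields NA, so each split row can be padded by subsetting it with positions 1 to the maximum length. The names vec, parts, n_max and padded are illustrative only.

```r
# Sample input from the original question
vec <- c("A,B,C", "B,B", "A,AA,C", "A,B,BB,BBB,B,B")

# fixed = TRUE treats "," as a literal separator, not a regular expression
parts <- strsplit(vec, ",", fixed = TRUE)

# Longest row determines the number of columns
n_max <- max(vapply(parts, length, integer(1)))

# Out-of-range positions come back as NA, which pads the short rows
padded <- matrix(
    unlist(lapply(parts, `[`, seq_len(n_max))),
    ncol = n_max,
    byrow = TRUE
)

# Optional: the same data as a data frame of character columns
padded_df <- as.data.frame(padded, stringsAsFactors = FALSE)
```

Building the matrix from unlist() with byrow = TRUE keeps one row per input string even when every string has a single field, a case where sapply() would return a plain vector and no matrix.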