Hello,

I have a dataset with multiple entries in one field separated by "/" characters. (The true dataset has long names, 20-odd variables, and hundreds of observations.)

     v1 v2
1     A  L
2   A/B  M
3     C  N
4 D/E/F  O
5     A  P
6     C  L

What I would like is to have a dataset that looks like this instead:

> my.df
  v1 v2
1  A  L
2  A  M
3  B  M
4  C  N
5  D  O
6  E  O
7  F  O
8  A  P
9  C  L

My original thought was to break the string into variables using strsplit(), create new columns in the data frame using cbind(), and then reshape the dataset with the melt() function.

> v1.new <- as.character(my.df$v1)
> v1.new <- strsplit(v1.new, "/")
> v1.new
[[1]]
[1] "A"

[[2]]
[1] "A" "B"

[[3]]
[1] "C"

[[4]]
[1] "D" "E" "F"

[[5]]
[1] "A"

[[6]]
[1] "C"

My next thought was to coerce the list into a data frame, but I ran into an error because the list output from strsplit() does not contain equal length vectors.

> v1.cols <- data.frame(v1.new, check.rows=FALSE)
Error in data.frame("A", c("A", "B"), "C", c("D", "E", "F"), "A", "C",  :
  arguments imply differing number of rows: 1, 2, 3

How can I create a data frame from the unequal length vectors that result from strsplit(my.df$v1)?

Am I going about this the wrong way? I have also tried to use colsplit{reshape} without success.

Thank you for any advice you can offer. I hope the answer to this question is not too obvious.
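The error in the question arises because data.frame() treats each list element as a separate column, and those columns would need 1, 2, and 3 rows. A minimal sketch of the literal fix is to pad every element with NA up to a common length before binding. Note that this gives a wide layout (one column per piece), which would still need reshaping to reach the long layout requested. The example data are rebuilt from the post; stringsAsFactors = FALSE and the column names v1.1, v1.2, ... are assumptions made for illustration.

```r
# Example data rebuilt from the post (stringsAsFactors = FALSE is an assumption)
my.df <- data.frame(
  v1 = c("A", "A/B", "C", "D/E/F", "A", "C"),
  v2 = c("L", "M", "N", "O", "P", "L"),
  stringsAsFactors = FALSE
)

v1.new <- strsplit(as.character(my.df$v1), "/")

# Number of pieces per row; these differ, which is what data.frame() rejects
n.pieces <- sapply(v1.new, length)

# Pad each element with NA up to the longest one, then bind row-wise
padded <- lapply(v1.new, function(p) {
  c(p, rep(NA_character_, max(n.pieces) - length(p)))
})
v1.cols <- as.data.frame(do.call(rbind, padded), stringsAsFactors = FALSE)

# Hypothetical column names for the split pieces
names(v1.cols) <- paste0("v1.", seq_len(ncol(v1.cols)))

# Wide result: one row per original observation
wide <- cbind(my.df["v2"], v1.cols)
```

Rows with fewer pieces carry NA in the unused columns, so any later reshape to long form would need to drop those NA entries.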
try this:

> x
     v1 v2
1     A  L
2   A/B  M
3     C  N
4 D/E/F  O
5     A  P
6     C  L
> as.data.frame(do.call(rbind, apply(x, 1, function(.row){
+     cbind(strsplit(.row[1], '/')[[1]], .row[2])
+ })),row.names='')
  V1 V2
1  A  L
2  A  M
3  B  M
4  C  N
5  D  O
6  E  O
7  F  O
8  A  P
9  C  L
>

On Mon, Jul 20, 2009 at 4:46 PM, Ben Mazzotta<benjamin.mazzotta at tufts.edu> wrote:
> How can I create a data frame from the unequal length vectors that
> result from strsplit(my.df$v1)?
>
> ______________________________________________
> R-help at r-project.org mailing list
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.
>

--
Jim Holtman
Cincinnati, OH
+1 513 646 9390

What is the problem that you are trying to solve?
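The apply() call in this reply coerces the data frame to a matrix, so with mixed column types every value is handed to the function as a character string, and only the two columns named in the function are carried over. For the real dataset with 20-odd variables, one possible variation is to loop over row indices and keep the rows as data frames. This is only a sketch: expand.row is a hypothetical helper name, and the example data are rebuilt from the post with stringsAsFactors = FALSE as an assumption.

```r
# Example data rebuilt from the post (stringsAsFactors = FALSE is an assumption)
x <- data.frame(
  v1 = c("A", "A/B", "C", "D/E/F", "A", "C"),
  v2 = c("L", "M", "N", "O", "P", "L"),
  stringsAsFactors = FALSE
)

# Hypothetical helper: turn row i into one row per "/"-separated piece of v1,
# repeating all the other columns unchanged
expand.row <- function(i) {
  pieces <- strsplit(as.character(x$v1[i]), "/", fixed = TRUE)[[1]]
  others <- x[rep(i, length(pieces)), names(x) != "v1", drop = FALSE]
  data.frame(v1 = pieces, others, stringsAsFactors = FALSE)
}

# Stack the per-row pieces and reset the row names to 1..n
long <- do.call(rbind, lapply(seq_len(nrow(x)), expand.row))
rownames(long) <- NULL
```

Because each piece stays a data frame, column types other than v1 are preserved, at the cost of one small data.frame() call per row. For a few hundred observations that overhead should be negligible.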
Henrique Dallazuanna
2009-Jul-21 01:10 UTC
[R] data frame from list of lists with unequal lengths
Try this:

r <- strsplit(as.character(x$v1), "/")
cbind(unlist(r), rep(x$v2, sapply(r, length)))

On Mon, Jul 20, 2009 at 5:46 PM, Ben Mazzotta <benjamin.mazzotta@tufts.edu> wrote:
> How can I create a data frame from the unequal length vectors that
> result from strsplit(my.df$v1)?

--
Henrique Dallazuanna
Curitiba-Paraná-Brasil
25° 25' 40" S 49° 16' 22" O
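The strsplit()/rep() idea in this reply extends naturally to a data frame with many columns: repeat each row index once per piece, then overwrite v1 with the unlisted pieces. The sketch below rebuilds the example data from the post (stringsAsFactors = FALSE is an assumption) and returns a data frame instead of a matrix. One caveat worth checking on the real data: cbind() on plain vectors discards classes, so if v2 is stored as a factor the matrix version would show the factor's internal codes instead of its labels.

```r
# Example data rebuilt from the post (stringsAsFactors = FALSE is an assumption)
x <- data.frame(
  v1 = c("A", "A/B", "C", "D/E/F", "A", "C"),
  v2 = c("L", "M", "N", "O", "P", "L"),
  stringsAsFactors = FALSE
)

# Split v1 and count the pieces in each row
r <- strsplit(as.character(x$v1), "/", fixed = TRUE)
n <- sapply(r, length)

# Repeat every row once per piece; all other columns come along unchanged
long <- x[rep(seq_len(nrow(x)), n), , drop = FALSE]

# Replace the combined field with the individual pieces, in the same order
long$v1 <- unlist(r)

# Reset the row names to 1..n
rownames(long) <- NULL
```

Indexing the whole data frame by repeated row numbers avoids naming each of the other columns, so the same four lines should work unchanged when there are 20-odd variables.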