Cliff Clive
2011-Apr-14 21:04 UTC
[R] Creating a dataframe from a vector of character strings
I have a vector of character strings that I would like to split in two, and place in columns of a dataframe.

So for example, I start with this:

beatles <- c("John Lennon", "Paul McCartney", "George Harrison", "Ringo Starr")

and I want to end up with a data frame that looks like this:

> Beatles = data.frame(firstName=c("John", "Paul", "George", "Ringo"),
                       lastName=c("Lennon", "McCartney", "Harrison", "Starr"))
> Beatles
  firstName  lastName
1      John    Lennon
2      Paul McCartney
3    George  Harrison
4     Ringo     Starr

I tried string-splitting the first vector on the spaces between first and last names, and it returned a list:

> strsplit(beatles, " ")
[[1]]
[1] "John"   "Lennon"

[[2]]
[1] "Paul"      "McCartney"

[[3]]
[1] "George"   "Harrison"

[[4]]
[1] "Ringo" "Starr"

Is there a fast way to convert this list into a data frame? Right now all I can think of is using a for loop, which I would like to avoid, since the real application I am working on involves a much larger dataset.

--
View this message in context: http://r.789695.n4.nabble.com/Creating-a-dataframe-from-a-vector-of-character-strings-tp3450716p3450716.html
Sent from the R help mailing list archive at Nabble.com.
Tóth Dénes
2011-Apr-14 22:33 UTC
[R] Creating a dataframe from a vector of character strings
You could use ?unlist:

structure(data.frame(
  matrix(unlist(strsplit(beatles, " ")), length(beatles), 2, byrow=TRUE)),
  names=c("FirstName", "LastName"))

Note that this compact code does not guard against malformed entries, that is, names with more or fewer than two elements. Because the matrix is filled row by row from one flat vector, a single such entry shifts every later element into the wrong cell.

Hope that helps,
Denes
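The caveat above could be addressed by checking the pieces before building the matrix. What follows is only a sketch under stated assumptions: `split_names()` is a hypothetical helper, not something posted in the thread, and it relies on `lengths()` from base R (available from R 3.2.0). It stops with an error rather than silently misaligning the columns.

```r
# Hypothetical helper (not from the thread): split "First Last" strings
# into a two-column data frame, stopping on malformed entries.
split_names <- function(x, sep = " ") {
  parts <- strsplit(x, sep, fixed = TRUE)
  bad <- lengths(parts) != 2   # entries with too few or too many pieces
  if (any(bad)) {
    stop("cannot split into exactly two parts: ",
         paste(x[bad], collapse = ", "))
  }
  # Fill the matrix row by row so that each name occupies one row.
  m <- matrix(unlist(parts), nrow = length(x), ncol = 2, byrow = TRUE)
  data.frame(firstName = m[, 1], lastName = m[, 2],
             stringsAsFactors = FALSE)
}

beatles <- c("John Lennon", "Paul McCartney", "George Harrison", "Ringo Starr")
Beatles <- split_names(beatles)
```

Whether an error is the right reaction is a design choice; for a large dataset it may be preferable to collect the offending entries and inspect them separately.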
Rolf Turner
2011-Apr-14 22:41 UTC
[R] Creating a dataframe from a vector of character strings
On 15/04/11 09:04, Cliff Clive wrote:
> Is there a fast way to convert this list into a data frame? Right now all I
> can think of is using a for loop, which I would like to avoid, since the
> real application I am working on involves a much larger dataset

Whenever you think of using a for loop, stop and think about using some flavour of apply() instead:

melvin <- strsplit(beatles, " ")
clyde <- data.frame(firstName=sapply(melvin, function(x){x[1]}),
                    lastName=sapply(melvin, function(x){x[2]}))

    cheers,

        Rolf Turner
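A possible variation on the apply-style approach, offered as a sketch rather than as what was posted: `vapply()` does the same job as `sapply()` but also checks that each result is a single character string, and the subsetting function `[` can be passed directly instead of an anonymous function. The names `parts` and `pick` are illustrative only.

```r
beatles <- c("John Lennon", "Paul McCartney", "George Harrison", "Ringo Starr")
parts <- strsplit(beatles, " ", fixed = TRUE)

# `[` is the subsetting function, so this extracts element i of each piece;
# vapply() verifies that every result is one character value.
pick <- function(i) vapply(parts, `[`, character(1), i)

Beatles <- data.frame(firstName = pick(1), lastName = pick(2),
                      stringsAsFactors = FALSE)
```

As with the `sapply()` version, an entry with only one piece yields `NA` in the second column, and pieces beyond the second are dropped without warning.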
Brian Diggs
2011-Apr-14 22:55 UTC
[R] Creating a dataframe from a vector of character strings
On 4/14/2011 2:04 PM, Cliff Clive wrote:
> Is there a fast way to convert this list into a data frame? Right now all I
> can think of is using a for loop, which I would like to avoid, since the
> real application I am working on involves a much larger dataset.

Another approach, in addition to the ones you have already been given, is to use the colsplit function in the reshape package. This is the sort of thing it is designed to do.

library("reshape")
colsplit(beatles, " ", names=c("firstName", "lastName"))

Similar caveats apply, though, in that it assumes only 2 names that are separated by one space (and will give a warning if that is not the case).

--
Brian S. Diggs, PhD
Senior Research Associate, Department of Surgery
Oregon Health & Science University
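If the data may contain more than two words per name, or more than one space between them, one base R option is to split on the first run of whitespace only. This is a sketch under assumptions: it swaps in a regular-expression technique for colsplit, `split_first_rest()` is a hypothetical helper, the last three input strings are illustrative and not from the thread, and `trimws()` requires R 3.2.0 or later.

```r
# Hypothetical helper (not from the thread): everything before the first
# run of whitespace becomes firstName, everything after it lastName.
split_first_rest <- function(x) {
  x <- trimws(x)   # drop leading and trailing whitespace
  pattern <- "^([^[:space:]]+)[[:space:]]+([^[:space:]].*)$"
  matched <- grepl(pattern, x)
  data.frame(
    firstName = ifelse(matched, sub(pattern, "\\1", x), x),
    # Entries without any whitespace have no second part.
    lastName  = ifelse(matched, sub(pattern, "\\2", x), NA_character_),
    stringsAsFactors = FALSE
  )
}

beatles <- c("John Lennon", "Paul McCartney", "George Harrison", "Ringo Starr")
# Illustrative extra inputs: three words, a double space, a single word.
res <- split_first_rest(c(beatles, "Ludwig van Beethoven",
                          "Billy  Preston", "Madonna"))
```

Treating everything after the first word as the last name is itself an assumption; names with a middle name would need a different rule.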