sunny
2011-May-04 23:08 UTC
[R] split character vector by multiple keywords simultaneously
Hi. I have a character vector that looks like this:

> temp <- c("Company name: The first company General Manager: John Doe I Managers: John Doe II, John Doe III",
            "Company name: The second company General Manager: Jane Doe I",
            "Company name: The third company Managers: Jane Doe II, Jane Doe III")
> temp
[1] "Company name: The first company General Manager: John Doe I Managers: John Doe II, John Doe III"
[2] "Company name: The second company General Manager: Jane Doe I"
[3] "Company name: The third company Managers: Jane Doe II, Jane Doe III"

I know all the keywords, i.e. "Company name:", "General Manager:", "Managers:" etc. I'm looking for a way to split this character vector into multiple character vectors, with one column for each keyword and the corresponding values for each, i.e.

  Company name        General Manager  Managers
1 The first company   John Doe I       John Doe II, John Doe III
2 The second company  Jane Doe I
3 The third company                    Jane Doe II, Jane Doe III

I have tried a lot to find something suitable but haven't so far. Any help will be greatly appreciated. I am running R-2.12.1 on x86_64 linux.

Thanks.

--
View this message in context: http://r.789695.n4.nabble.com/split-character-vector-by-multiple-keywords-simultaneously-tp3497033p3497033.html
Sent from the R help mailing list archive at Nabble.com.
Andrew Robinson
2011-May-05 02:22 UTC
[R] split character vector by multiple keywords simultaneously
A hack would be to use gsub() to prepend e.g. XXX to the keywords that you want, perform a strsplit() to break the lines into component strings, and then substr() to extract the pieces that you want from those strings.

Cheers

Andrew

--
Andrew Robinson
Program Manager, ACERA
Department of Mathematics and Statistics    Tel: +61-3-8344-6410
University of Melbourne, VIC 3010 Australia (prefer email)
http://www.ms.unimelb.edu.au/~andrewpr      Fax: +61-3-8344-4599
http://www.acera.unimelb.edu.au/
Forest Analytics with R (Springer, 2011) http://www.ms.unimelb.edu.au/FAwR/
Introduction to Scientific Programming and Simulation using R (CRC, 2009): http://www.ms.unimelb.edu.au/spuRs/
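The gsub() / strsplit() / substr() hack described in this reply could be sketched roughly as follows. This is only an illustration under stated assumptions: the marker "XXX" is assumed never to occur in the data, the keywords are assumed to contain no regular-expression metacharacters, and the helper name extract_value is made up for this example. A keyword missing from a line yields NA.

```r
# Sample data from the original post.
temp <- c(
  "Company name: The first company General Manager: John Doe I Managers: John Doe II, John Doe III",
  "Company name: The second company General Manager: Jane Doe I",
  "Company name: The third company Managers: Jane Doe II, Jane Doe III"
)
keywords <- c("Company name:", "General Manager:", "Managers:")

# Step 1: gsub() prepends a marker to every keyword.
# "XXX" is an assumed marker; pick one that cannot occur in the data.
pattern <- paste("(", paste(keywords, collapse = "|"), ")", sep = "")
marked <- gsub(pattern, "XXX\\1", temp)

# Step 2: strsplit() breaks each line at the marker, so every piece
# (apart from a leading empty string) starts with one keyword.
pieces <- strsplit(marked, "XXX", fixed = TRUE)

# Step 3: substr() finds the piece that starts with a given keyword and
# cuts the keyword off. extract_value is a hypothetical helper name.
extract_value <- function(p, kw) {
  hit <- p[substr(p, 1, nchar(kw)) == kw]
  if (length(hit) == 0) return(NA_character_)   # keyword absent in this line
  value <- substr(hit[1], nchar(kw) + 1, nchar(hit[1]))
  gsub("^ +| +$", "", value)                    # trim surrounding spaces
}

# One column per keyword, one row per input string.
result <- sapply(keywords, function(kw) {
  sapply(pieces, extract_value, kw = kw)
})
colnames(result) <- sub(":$", "", keywords)
df <- data.frame(result, check.names = FALSE, stringsAsFactors = FALSE)
print(df)
```

Because each piece is matched against its keyword by name rather than by position, the sketch does not depend on the keywords appearing in a fixed order.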
Greg Snow
2011-May-06 16:11 UTC
[R] split character vector by multiple keywords simultaneously
Will all the keywords always be present in the same order? Or are you looking for the keywords, but some may be absent or in different orders?

Look into the gsubfn package for some tools that could help.
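If some keywords may be absent or appear in a different order, one possible approach is to locate each keyword by position and take the text up to the start of the next keyword found. The sketch below uses base R only (it does not use gsubfn); the function name extract_fields and the fourth, reordered test string are assumptions added for illustration. It also assumes no keyword is a substring of another keyword and that each keyword occurs at most once per string.

```r
# Sample data from the original post, plus one made-up string with the
# keywords in a different order (an assumption, not from the thread).
temp <- c(
  "Company name: The first company General Manager: John Doe I Managers: John Doe II, John Doe III",
  "Company name: The second company General Manager: Jane Doe I",
  "Company name: The third company Managers: Jane Doe II, Jane Doe III",
  "Managers: A, B Company name: Fourth"
)
keywords <- c("Company name:", "General Manager:", "Managers:")

# extract_fields is a hypothetical helper: for one string, return a named
# character vector with one entry per keyword (NA when the keyword is absent).
extract_fields <- function(x, keywords) {
  # Start position of each keyword in x; regexpr() gives -1 when not found.
  starts <- sapply(keywords, function(kw) as.integer(regexpr(kw, x, fixed = TRUE)))
  out <- rep(NA_character_, length(keywords))
  names(out) <- keywords
  present <- starts[starts > 0]
  for (kw in names(present)) {
    from <- present[[kw]] + nchar(kw)
    # The value ends just before the next keyword that follows this one,
    # or at the end of the string if there is none.
    later <- present[present > present[[kw]]]
    to <- if (length(later) > 0) min(later) - 1 else nchar(x)
    out[kw] <- gsub("^ +| +$", "", substr(x, from, to))
  }
  out
}

# One row per input string, one column per keyword.
fields <- t(sapply(temp, extract_fields, keywords = keywords, USE.NAMES = FALSE))
print(fields)
```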
______________________________________________ R-help at r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.