ZeMajik
2010-Feb-08 23:31 UTC
[R] Dividing one column of form xx-yy into two columns, xx and yy
I have a data set where one column consists of two numerical factors, separated by a "-". So my data looks something like this:

43-156
43-43
1267-18
.
.
.

There are additional columns consisting of single factors as well, so reading the csv file (where the data is stored) with sep="-" won't work, since the rest of the factors are separated by commas. So first of all, is there any way to import a file which is separated by "," OR "-"?

If this is not possible, does anyone have any ideas on how I could go about separating these? I could use a text editor to replace the - with , and import, but I would prefer doing this inside of R so that the script can be reused in the future.

Just to clarify, I would like the above to turn out as two separate columns (or vectors), where the first would be (43,43,1267,....) and the second (156,43,18,.....). The dataset is rather large, with a few hundred thousand lines, so it would be preferable to keep resource-intensive methods to a minimum if possible.

Thanks in advance!
Mike
Peter Alspach
2010-Feb-09 00:09 UTC
[R] Dividing one column of form xx-yy into two columns, xx and yy
Tena koe Mike

See ?strsplit for separating the column after input. AFAIK one cannot specify multiple separators, but:

    library(fortunes)
    fortune('this is R')

HTH ....

Peter Alspach
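As a minimal sketch of what the strsplit() pointer amounts to (the vector x below is a hypothetical stand-in for the xx-yy column after it has been read in as text, and the sketch assumes every entry contains exactly one dash):

```r
# Hypothetical example vector standing in for the xx-yy column
x <- c("43-156", "43-43", "1267-18")

# strsplit() returns a list with one character vector per element of x;
# fixed = TRUE treats "-" as a literal string rather than a regular expression
parts <- strsplit(x, "-", fixed = TRUE)

# Take the first and second piece of each element and convert to numeric
xx <- as.numeric(vapply(parts, `[`, character(1), 1))
yy <- as.numeric(vapply(parts, `[`, character(1), 2))
```

If the column was read in as a factor, wrapping it in as.character() before splitting should be needed, since strsplit() expects a character vector.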
Bill.Venables at csiro.au
2010-Feb-09 02:07 UTC
[R] Dividing one column of form xx-yy into two columns, xx and yy
Here is one way.

> dat
       V1
1  43-156
2   43-43
3 1267-18
> dat <- within(dat, {
+   m <- do.call("rbind", strsplit(as.character(V1), "-"))
+   XX <- as.numeric(m[,1])
+   YY <- as.numeric(m[,2])
+   rm(m)
+ })
> dat
       V1  YY   XX
1  43-156 156   43
2   43-43  43   43
3 1267-18  18 1267

Bill Venables
CSIRO/CMIS Cleveland Laboratories
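For a column of a few hundred thousand rows, a possible variation is to skip the intermediate list and matrix and strip each half with sub() instead. This is only a sketch, not something timed against the approach above, and it assumes each entry has exactly one dash and no negative numbers. The data frame dat here is rebuilt by hand to match the example.

```r
# Rebuild the small example data frame (assumed layout: one column V1)
dat <- data.frame(V1 = c("43-156", "43-43", "1267-18"))

# Work on a character copy in case V1 was read in as a factor
v <- as.character(dat$V1)

# XX: delete the dash and everything after it
dat$XX <- as.numeric(sub("-.*$", "", v))

# YY: delete everything up to and including the first dash
dat$YY <- as.numeric(sub("^[^-]*-", "", v))
```

Whether this is actually lighter on resources than strsplit() would need to be checked with system.time() on the real data.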
Gabor Grothendieck
2010-Feb-09 02:19 UTC
[R] Dividing one column of form xx-yy into two columns, xx and yy
If you are willing to use an outside utility, tr, and you are using UNIX then you can pipe the input through it so read.csv sees all the dashes as commas:

> cat("A,B-C,D
+ 1,2-3,4
+ 5,6-7,8
+ ", file = "dashcomma.dat")
>
> read.csv(pipe("tr - , < dashcomma.dat"))
  A B C D
1 1 2 3 4
2 5 6 7 8

On Windows tr is available from Duncan Murdoch's Rtools distribution. (google to find it).
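The same substitution can probably be done without an external utility by reading the raw lines, translating the dashes with chartr(), and handing the result to read.csv() through its text argument. The sketch below assumes a version of R in which read.csv() accepts text=, and writes the example to a temporary file rather than dashcomma.dat. Like tr, it replaces every dash in the file, so it would break negative numbers or dashes inside other fields.

```r
# Write the small example to a temporary file
f <- tempfile(fileext = ".dat")
cat("A,B-C,D\n1,2-3,4\n5,6-7,8\n", file = f)

# Read the raw lines and turn every dash into a comma
lines <- readLines(f)
lines <- chartr("-", ",", lines)

# Parse the modified lines as ordinary comma-separated data
dat <- read.csv(text = lines)
```

For a file with a few hundred thousand lines this holds the whole file in memory as text once before parsing, which may or may not matter on a given machine.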