Jim Bouldin
2010-May-06 12:12 UTC
[R] splitting character strings and converting to numeric vectors
This seemingly should be quite simple but I can't solve it:

I have a long character vector of geographic data (data frame column named "XY") whose elements vary in length (from 11 to 14 chars). Each element is structured as a set of digits, then an underscore, then more digits, e.g.:

> data.frame(head(as.character(XY)))
  head.as.character.XY..
1         -448623_854854
2         -448563_854850
3         -448442_854842
4         -448301_854833
5         -448060_854818
6         -446828_854736

I simply need to separate the two sets of digits from each other and assign them into new columns. The closest I've been able to get is by:

> test=t(as.matrix(data.frame(head(strsplit(as.character(XY), "\\_")))))
> test
                       [,1]      [,2]
c...448623....854854.. "-448623" "854854"
c...448563....854850.. "-448563" "854850"
c...448442....854842.. "-448442" "854842"
c...448301....854833.. "-448301" "854833"
c...448060....854818.. "-448060" "854818"
c...446828....854736.. "-446828" "854736"

So far so good, but columns 1:2 will not coerce to either numeric or integer, for unknown reasons. Thanks for any help (and/or suggestions on a better way to code this).

Jim Bouldin, PhD
Research Ecologist
Department of Plant Sciences, UC Davis
Davis CA, 95616
530-554-1740
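One possible cause of the coercion problem, offered as an assumption because the post does not show the conversion that was tried: a matrix holds a single type, so numbers assigned back into a column of a character matrix are turned into strings again. A minimal sketch using the first two values from the post (the names `m` and `m2` are illustrative only):

```r
# Character matrix like the one produced by strsplit() plus t() in the post
m <- matrix(c("-448623", "-448563", "854854", "854850"), ncol = 2)

# Assigning numbers into a single column does not change the matrix type:
# the values are converted back to character on assignment
m2 <- m
m2[, 1] <- as.numeric(m2[, 1])
is.character(m2)

# Converting the whole matrix at once keeps its dimensions
storage.mode(m) <- "double"
is.numeric(m)
dim(m)
```

Converting the whole object in one step avoids the column-by-column assignment altogether.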
Petr PIKAL
2010-May-06 12:43 UTC
[R] Re: splitting character strings and converting to numeric vectors
Hi,

Probably not the best way:

> read.table("clipboard", header=T)
  head.as.character.XY..
1         -448623_854854
2         -448563_854850
3         -448442_854842
4         -448301_854833
5         -448060_854818
6         -446828_854736
> test=read.table("clipboard", header=T)

Now do the split:

> ls<-strsplit(as.character(test[,1]), "_")
> ls
[[1]]
[1] "-448623" "854854"

[[2]]
[1] "-448563" "854850"

and change the list to numeric:

> t(sapply(ls, as.numeric))
        [,1]   [,2]
[1,] -448623 854854
[2,] -448563 854850
[3,] -448442 854842
[4,] -448301 854833
[5,] -448060 854818
[6,] -446828 854736

The result is a numeric matrix. You could probably start directly from your XY object, but it depends on what it is. And maybe you could read it the way Gabor recently posted, with read.table(...., sep= "_", ....)

Regards
Petr
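The same split-and-convert steps can be run directly on a data frame column and the results stored as new columns, which is what the original question asked for. A minimal sketch, assuming a data frame called `dat` (a hypothetical name; the thread never names the data frame) whose XY column may be a factor, with `X` and `Y` as illustrative names for the new columns:

```r
# Hypothetical data frame standing in for the original data; only the
# six XY values shown in the thread are used
dat <- data.frame(XY = c("-448623_854854", "-448563_854850", "-448442_854842",
                         "-448301_854833", "-448060_854818", "-446828_854736"),
                  stringsAsFactors = TRUE)

# as.character() guards against XY being stored as a factor
parts <- strsplit(as.character(dat$XY), "_", fixed = TRUE)

# vapply() stops with an error unless every element splits into exactly two pieces
xy <- t(vapply(parts, as.numeric, numeric(2)))

# Store the two halves as new numeric columns
dat$X <- xy[, 1]
dat$Y <- xy[, 2]
str(dat)
```

Using `vapply()` rather than `sapply()` is a design choice here: a malformed element (for example one with no underscore) raises an error instead of silently changing the shape of the result.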
Gabor Grothendieck
2010-May-06 12:58 UTC
[R] splitting character strings and converting to numeric vectors
Try this:

> x <- c("-448623_854854", "-448563_854850", "-448442_854842", "-448301_854833",
+ "-448060_854818", "-446828_854736")
>
> read.table(textConnection(x), sep = "_")
       V1     V2
1 -448623 854854
2 -448563 854850
3 -448442 854842
4 -448301 854833
5 -448060 854818
6 -446828 854736

Here is another way.

> library(gsubfn) # see http://gsubfn.googlecode.com
>
> data.frame(strapply(x, "[-0-9]+", as.numeric, simplify = rbind))
       X1     X2
1 -448623 854854
2 -448563 854850
3 -448442 854842
4 -448301 854833
5 -448060 854818
6 -446828 854736

If you omit data.frame then it will return a matrix.
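A variation on the read.table() approach, sketched under the assumption that a later R release is in use: there `read.table()` accepts the strings through its `text` argument, so no `textConnection()` is needed, and `col.names` can label the two columns in the same call. The column names `X` and `Y` are illustrative and do not come from the thread.

```r
x <- c("-448623_854854", "-448563_854850", "-448442_854842",
       "-448301_854833", "-448060_854818", "-446828_854736")

# text = takes the place of textConnection(); col.names labels the result
xy <- read.table(text = x, sep = "_", col.names = c("X", "Y"))

# read.table() chooses the column types itself, so check what it picked
sapply(xy, class)
```

The resulting columns can be attached to an existing data frame with `cbind()`, provided the rows are in the same order as the strings that were read.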