emorway
2012-Feb-11 21:51 UTC
[R] obtaining a true/false vector with combination of strsplit, length, unlist,
Hi,

A pared down version of the dataset I'm working with:

edm<-read.table(textConnection("WELLID X_GRID Y_GRID LAYER ROW COLUMN SPECIES CALCULATED OBSERVED
w301_3 4428. 1389 2 6 18 1 3558 6490.
w304_12 4836. 6627 2 27 20 1 3509 3228.
02_10_12080 3.6125E+04 13875 1 56 145 1 2774 -999.0
02_10_12080 3.6125E+04 13875 1 56 145 1 2774 -999.0
02_10_12081 3.6375E+04 13875 1 56 146 1 3493 -999.0
02_10_12092 3.9125E+04 13875 1 56 157 1 4736 -999.0
w305_12 2962. 7326 2 30 12 1 4575 5899."),header=T)
closeAllConnections()

I'm having a hard time coming up with the R code that would produce a TRUE/FALSE vector based on whether or not the first column of the data.frame "edm" has a length of 2 or 3? To show what I mean going row-by-row, I could do the following:

> length(strsplit(as.character(edm$WELLID),"_")[[1]])==3
[1] FALSE
> length(strsplit(as.character(edm$WELLID),"_")[[2]])==3
[1] FALSE
> length(strsplit(as.character(edm$WELLID),"_")[[3]])==3
[1] TRUE
> length(strsplit(as.character(edm$WELLID),"_")[[4]])==3
[1] TRUE
> length(strsplit(as.character(edm$WELLID),"_")[[5]])==3
[1] TRUE
> length(strsplit(as.character(edm$WELLID),"_")[[6]])==3
[1] TRUE
> length(strsplit(as.character(edm$WELLID),"_")[[7]])==3
[1] FALSE

I've fumbled around trying to come up with a line of R code that would create a vector that looks like: "FALSE FALSE TRUE TRUE TRUE TRUE FALSE"

The final goal is to use this vector to create two new data.frames, where, for example, the first contains all the rows of edm in which the first column has a length of 2 when split using a "_" character. The second data.frame would contain all the rows in which the first column has a length of 3 when split using a "_" character.

Thanks,
Eric
Sarah Goslee
2012-Feb-11 21:58 UTC
[R] obtaining a true/false vector with combination of strsplit, length, unlist,
You are so very close:

> sapply(edm[,1], function(x)length(strsplit(as.character(x), "_")[[1]]) == 3)
[1] FALSE FALSE  TRUE  TRUE  TRUE  TRUE FALSE

Thanks for providing a small reproducible example. dput() tends to work better for that than textConnection(), because many email clients add arbitrary newlines, messing up the text formatting.

Sarah

--
Sarah Goslee
http://www.functionaldiversity.org
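For readers following along in a current R session, the sapply() call above can also be written without an anonymous function. This is only a sketch and not something posted in the thread: it assumes R 3.2.0 or later (for lengths()), and it rebuilds just the WELLID column of edm instead of the full table.

```r
# Assumption: only the WELLID column matters here, so the other columns of edm are omitted.
edm <- data.frame(
  WELLID = c("w301_3", "w304_12", "02_10_12080", "02_10_12080",
             "02_10_12081", "02_10_12092", "w305_12"),
  stringsAsFactors = FALSE
)

# strsplit() returns a list with one character vector per row.
# fixed = TRUE treats "_" as a literal string rather than a regular expression.
parts <- strsplit(as.character(edm$WELLID), "_", fixed = TRUE)

# lengths() gives the length of every list element in a single call,
# so the comparison yields one TRUE/FALSE value per row.
is_three <- lengths(parts) == 3
print(is_three)
```

On this sample the result matches the vector Sarah shows. The as.character() call is kept so the same line also works when WELLID was read in as a factor.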
Phil Spector
2012-Feb-11 22:00 UTC
[R] obtaining a true/false vector with combination of strsplit, length, unlist,
It sounds like the problem boils down to counting the number of "_"s in the WELLID variable, and seeing if there are two:

> nchar(gsub('[^_]','',edm$WELLID)) == 2
[1] FALSE FALSE  TRUE  TRUE  TRUE  TRUE FALSE

- Phil Spector
  Statistical Computing Facility
  Department of Statistics
  UC Berkeley
  spector at stat.berkeley.edu
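Neither reply spells out the last step Eric asked about, which is splitting edm into two data.frames. A minimal sketch of that step, using the underscore count from Phil's suggestion, might look like the following. The names n_underscores, edm_two and edm_three are made up for illustration, and read.table(text = ...) is used in place of textConnection() so that no connection needs closing.

```r
# Rebuild the sample data from the original post.
edm <- read.table(text = "WELLID X_GRID Y_GRID LAYER ROW COLUMN SPECIES CALCULATED OBSERVED
w301_3 4428. 1389 2 6 18 1 3558 6490.
w304_12 4836. 6627 2 27 20 1 3509 3228.
02_10_12080 3.6125E+04 13875 1 56 145 1 2774 -999.0
02_10_12080 3.6125E+04 13875 1 56 145 1 2774 -999.0
02_10_12081 3.6375E+04 13875 1 56 146 1 3493 -999.0
02_10_12092 3.9125E+04 13875 1 56 157 1 4736 -999.0
w305_12 2962. 7326 2 30 12 1 4575 5899.",
  header = TRUE, stringsAsFactors = FALSE)

# Delete every character that is not "_", then count what remains.
n_underscores <- nchar(gsub("[^_]", "", edm$WELLID))

# One underscore means two pieces; two underscores mean three pieces.
edm_two   <- edm[n_underscores == 1, ]
edm_three <- edm[n_underscores == 2, ]

print(edm_two$WELLID)
print(edm_three$WELLID)
```

Selecting on an explicit count, rather than negating a single logical vector, keeps any IDs with some other number of underscores out of both results. split(edm, n_underscores) would be another way to get the same pieces as a list.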