emorway
2012-Feb-11 21:51 UTC
[R] obtaining a true/false vector with combination of strsplit, length, unlist,
Hi,
A pared-down version of the dataset I'm working with:
edm<-read.table(textConnection("WELLID X_GRID Y_GRID LAYER ROW COLUMN SPECIES CALCULATED OBSERVED
w301_3      4428.      1389  2 6  18  1 3558 6490.
w304_12     4836.      6627  2 27 20  1 3509 3228.
02_10_12080 3.6125E+04 13875 1 56 145 1 2774 -999.0
02_10_12080 3.6125E+04 13875 1 56 145 1 2774 -999.0
02_10_12081 3.6375E+04 13875 1 56 146 1 3493 -999.0
02_10_12092 3.9125E+04 13875 1 56 157 1 4736 -999.0
w305_12     2962.      7326  2 30 12  1 4575 5899."),header=TRUE)
closeAllConnections()
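The same data can also be read without opening a connection, so closeAllConnections() is not needed. This is a minimal sketch, assuming an R version whose read.table() accepts the text= argument; stringsAsFactors = FALSE is an addition not in the original, set so that WELLID comes back as character regardless of the R version's default.

```r
# Read the example table straight from a string; no connection is left open.
edm <- read.table(text = "WELLID X_GRID Y_GRID LAYER ROW COLUMN SPECIES CALCULATED OBSERVED
w301_3      4428.      1389  2 6  18  1 3558 6490.
w304_12     4836.      6627  2 27 20  1 3509 3228.
02_10_12080 3.6125E+04 13875 1 56 145 1 2774 -999.0
02_10_12080 3.6125E+04 13875 1 56 145 1 2774 -999.0
02_10_12081 3.6375E+04 13875 1 56 146 1 3493 -999.0
02_10_12092 3.9125E+04 13875 1 56 157 1 4736 -999.0
w305_12     2962.      7326  2 30 12  1 4575 5899.",
  header = TRUE, stringsAsFactors = FALSE)

# Inspect column types: WELLID is character, X_GRID and OBSERVED are numeric.
str(edm)
```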
I'm having a hard time coming up with the R code that would produce a
TRUE/FALSE vector based on whether each entry in the first column of the
data.frame "edm" splits into 2 or 3 pieces on the "_" character. To show
what I mean going row by row, I could do the following:
> length(strsplit(as.character(edm$WELLID),"_")[[1]])==3
[1] FALSE
> length(strsplit(as.character(edm$WELLID),"_")[[2]])==3
[1] FALSE
> length(strsplit(as.character(edm$WELLID),"_")[[3]])==3
[1] TRUE
> length(strsplit(as.character(edm$WELLID),"_")[[4]])==3
[1] TRUE
> length(strsplit(as.character(edm$WELLID),"_")[[5]])==3
[1] TRUE
> length(strsplit(as.character(edm$WELLID),"_")[[6]])==3
[1] TRUE
> length(strsplit(as.character(edm$WELLID),"_")[[7]])==3
[1] FALSE
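Since strsplit() is already vectorised (it returns a list with one character vector per row), the row-by-row calls above collapse into one line by applying length() to every list element. A minimal sketch; the variable names pieces, n_pieces and is3 are illustrative only.

```r
edm <- read.table(text = "WELLID X_GRID Y_GRID LAYER ROW COLUMN SPECIES CALCULATED OBSERVED
w301_3      4428.      1389  2 6  18  1 3558 6490.
w304_12     4836.      6627  2 27 20  1 3509 3228.
02_10_12080 3.6125E+04 13875 1 56 145 1 2774 -999.0
02_10_12080 3.6125E+04 13875 1 56 145 1 2774 -999.0
02_10_12081 3.6375E+04 13875 1 56 146 1 3493 -999.0
02_10_12092 3.9125E+04 13875 1 56 157 1 4736 -999.0
w305_12     2962.      7326  2 30 12  1 4575 5899.",
  header = TRUE, stringsAsFactors = FALSE)

# One call splits every WELLID; the result is a list of character vectors.
pieces <- strsplit(as.character(edm$WELLID), "_")

# Count the pieces in each list element (R >= 3.2.0 also has lengths(pieces)).
n_pieces <- sapply(pieces, length)

# Compare the whole vector of counts against 3 at once.
is3 <- n_pieces == 3
is3
# [1] FALSE FALSE  TRUE  TRUE  TRUE  TRUE FALSE
```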
I've fumbled around trying to come up with a single line of R code that
would create a vector that looks like: "FALSE FALSE TRUE TRUE TRUE TRUE FALSE"
The final goal is to use this vector to create two new data.frames, where,
for example, the first contains all the rows of edm in which the first
column has a length of 2 when split using a "_" character. The second
data.frame would contain all the rows in which the first column has a length
of 3 when split using a "_" character.
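A logical (or comparison) vector can index the rows of a data.frame directly, which gives the two data.frames described above. This is a sketch; edm2 and edm3 are hypothetical names. Comparing the piece count against 2 and 3 separately, rather than using !is3, keeps any WELLID with some other number of pieces out of both results.

```r
edm <- read.table(text = "WELLID X_GRID Y_GRID LAYER ROW COLUMN SPECIES CALCULATED OBSERVED
w301_3      4428.      1389  2 6  18  1 3558 6490.
w304_12     4836.      6627  2 27 20  1 3509 3228.
02_10_12080 3.6125E+04 13875 1 56 145 1 2774 -999.0
02_10_12080 3.6125E+04 13875 1 56 145 1 2774 -999.0
02_10_12081 3.6375E+04 13875 1 56 146 1 3493 -999.0
02_10_12092 3.9125E+04 13875 1 56 157 1 4736 -999.0
w305_12     2962.      7326  2 30 12  1 4575 5899.",
  header = TRUE, stringsAsFactors = FALSE)

# Number of "_"-separated pieces in each WELLID.
n_pieces <- sapply(strsplit(as.character(edm$WELLID), "_"), length)

# Rows whose WELLID has 2 pieces (e.g. "w301_3") and 3 pieces (e.g. "02_10_12080").
edm2 <- edm[n_pieces == 2, ]
edm3 <- edm[n_pieces == 3, ]
```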
Thanks,
Eric
--
View this message in context:
http://r.789695.n4.nabble.com/obtaining-a-true-false-vector-with-combination-of-strsplit-length-unlist-tp4380050p4380050.html
Sent from the R help mailing list archive at Nabble.com.
Sarah Goslee
2012-Feb-11 21:58 UTC
You are so very close:

> sapply(edm[,1], function(x) length(strsplit(as.character(x), "_")[[1]]) == 3)
[1] FALSE FALSE TRUE TRUE TRUE TRUE FALSE

Thanks for providing a small reproducible example. dput() tends to work better for that than textConnection(), because many email clients add arbitrary newlines, messing up the text formatting.

Sarah

--
Sarah Goslee
http://www.functionaldiversity.org
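To illustrate the dput() suggestion: dput() prints an R expression that rebuilds the object, so line wrapping by an email client cannot break the columns the way it broke the whitespace table. A minimal sketch; the round-trip check through capture.output() and parse() is added here only to show that the pasted text recreates the same data.frame.

```r
edm <- read.table(text = "WELLID X_GRID Y_GRID LAYER ROW COLUMN SPECIES CALCULATED OBSERVED
w301_3      4428.      1389  2 6  18  1 3558 6490.
w304_12     4836.      6627  2 27 20  1 3509 3228.
02_10_12080 3.6125E+04 13875 1 56 145 1 2774 -999.0
02_10_12080 3.6125E+04 13875 1 56 145 1 2774 -999.0
02_10_12081 3.6375E+04 13875 1 56 146 1 3493 -999.0
02_10_12092 3.9125E+04 13875 1 56 157 1 4736 -999.0
w305_12     2962.      7326  2 30 12  1 4575 5899.",
  header = TRUE, stringsAsFactors = FALSE)

# This printed structure(...) expression is what would be pasted into an email.
dput(edm)

# Round trip: capture the dput() text, parse it, and evaluate it again.
txt <- paste(capture.output(dput(edm)), collapse = "\n")
edm_copy <- eval(parse(text = txt))
all.equal(edm, edm_copy)
```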
Phil Spector
2012-Feb-11 22:00 UTC
It sounds like the problem boils down to counting the number of "_"s in the WELLID variable and checking whether there are two:

> nchar(gsub('[^_]','',edm$WELLID)) == 2
[1] FALSE FALSE TRUE TRUE TRUE TRUE FALSE

                                        - Phil Spector
                                         Statistical Computing Facility
                                         Department of Statistics
                                         UC Berkeley
                                         spector at stat.berkeley.edu
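Phil's count can also drive split(), which returns one data.frame per distinct number of underscores in a single step. A sketch; n_us and by_count are hypothetical names, and the list names are the counts as strings ("1" for two-piece IDs, "2" for three-piece IDs).

```r
edm <- read.table(text = "WELLID X_GRID Y_GRID LAYER ROW COLUMN SPECIES CALCULATED OBSERVED
w301_3      4428.      1389  2 6  18  1 3558 6490.
w304_12     4836.      6627  2 27 20  1 3509 3228.
02_10_12080 3.6125E+04 13875 1 56 145 1 2774 -999.0
02_10_12080 3.6125E+04 13875 1 56 145 1 2774 -999.0
02_10_12081 3.6375E+04 13875 1 56 146 1 3493 -999.0
02_10_12092 3.9125E+04 13875 1 56 157 1 4736 -999.0
w305_12     2962.      7326  2 30 12  1 4575 5899.",
  header = TRUE, stringsAsFactors = FALSE)

# Delete every character that is not "_" and count what is left.
n_us <- nchar(gsub("[^_]", "", edm$WELLID))

# split() groups the rows by the underscore count into a named list.
by_count <- split(edm, n_us)
names(by_count)
by_count[["2"]]
```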