Jason Rupert
2009-Sep-25 17:58 UTC
[R] grep or other complex string matching approach to capture necessary information...
Say I have the following data:

house_number<-floor(runif(100, 200, 600))
water_evaluation<-c("No water damage", "Water damage", "Water On", "Water off", "water pipes damaged", "leaking water")
water_evaluation_selection<-floor(runif(100, 1,6))
house_info<-data.frame(water_evaluation[water_evaluation_selection], house_number)

And, that I only want to pull out the ones with negative water evaluations, i.e. Water damage, water pipes damaged, and leaking water.

Should/could I use grep in order to pull the house numbers out of house_info with those negative water evaluations?

I guess I want to know the house numbers from house_info where the water evaluation is negative. Is there a way to use grep or another R function in order to acquire that information?

Thank you again in advance for any insights.
Tony Plate
2009-Sep-25 18:13 UTC
[R] grep or other complex string matching approach to capture necessary information...
You could use grep, but it's probably easier to use %in% (see also is.element()), e.g.:

> house_info[ house_info[,1] %in% c("Water damage", "water pipes damaged", "leaking water"), ]
   water_evaluation.water_evaluation_selection. house_number
6                           water pipes damaged          489
8                           water pipes damaged          512
11                          water pipes damaged          597
19                                 Water damage          478
21                          water pipes damaged          373
23                                 Water damage          465
....
> house_info[ house_info[,1] %in% c("Water damage", "water pipes damaged", "leaking water"), 2]
 [1] 489 512 597 478 373 465 337 362 234 535 551 351 415 495 220 216 317 443 346 577 585 268 463 441 225 200 304 486 390 476 485 247
[33] 399 504 262 551 575 359 538
> sort(unique(house_info[ house_info[,1] %in% c("Water damage", "water pipes damaged", "leaking water"), 2]))
 [1] 200 216 220 225 234 247 262 268 304 317 337 346 351 359 362 373 390 399 415 441 443 463 465 476 478 485 486 489 495 504 512 535
[33] 538 551 575 577 585 597

Also, an easier way to generate random integers is sample(), e.g.

> sample(1:3, size=5, rep=T)
[1] 3 1 2 1 1

(This is more straightforward, and more easily avoids possibly unintended errors such as floor(runif(100, 1,6)) never generating a 6, but do be careful of the gotcha that sample(2:3, ...) will generate a selection of 2's and 3's, while sample(3,...) will generate samples from 1, 2, and 3.)

-- Tony Plate
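Since the question asked about grep specifically, here is a minimal sketch of a pattern-matching version using grepl(). The column name water_evaluation, the six house numbers, and the pattern itself are assumptions made for illustration, not taken from the thread. The thing to watch is that "No water damage" contains "water damage", so the pattern anchors that alternative at the start of the string.

```r
# Illustrative data: one row per evaluation label.
# The column names and house numbers are assumptions, not from the thread.
house_info <- data.frame(
  water_evaluation = c("No water damage", "Water damage", "Water On",
                       "Water off", "water pipes damaged", "leaking water"),
  house_number = c(201, 202, 203, 204, 205, 206),
  stringsAsFactors = FALSE
)

# "^" applies only to the first alternative, so "No water damage" is not matched.
negative_pattern <- "^water damage|pipes damaged|leaking"

# grepl() returns one TRUE/FALSE per row, which can index the data frame directly.
is_negative <- grepl(negative_pattern, house_info$water_evaluation,
                     ignore.case = TRUE)

negative_houses <- house_info$house_number[is_negative]
negative_houses
```

With a short, fixed list of labels, exact matching with %in% is less error-prone; a pattern is mainly useful when the labels vary in wording or capitalisation.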
John Kane
2009-Sep-26 15:40 UTC
[R] grep or other complex string matching approach to capture necessary information...
?subset

problems <- c( "Water damage", "Water off", "water pipes damaged", "leaking water")
damaged <- subset(house_info, house_info[,1]==problems[1] | house_info[,1]==problems[2] |
                  house_info[,1]==problems[3] | house_info[,1]==problems[4])

Or am I misunderstanding the question?

Or perhaps use %in%, which probably does the job more elegantly, but I forget the syntax at the moment.
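For what it's worth, the subset() call can be combined with %in% roughly as follows. This is only a sketch: the column name water_evaluation and the seed are assumptions, and the data are generated with sample() rather than floor(runif(...)) so that all six labels can occur.

```r
set.seed(1)  # arbitrary seed, only so the sketch is repeatable

water_evaluation <- c("No water damage", "Water damage", "Water On",
                      "Water off", "water pipes damaged", "leaking water")

# sample() draws from all six labels; floor(runif(100, 1, 6)) only reaches 1..5.
# The column names here are assumptions, not from the thread.
house_info <- data.frame(
  water_evaluation = sample(water_evaluation, size = 100, replace = TRUE),
  house_number = sample(200:599, size = 100, replace = TRUE),
  stringsAsFactors = FALSE
)

problems <- c("Water damage", "Water off", "water pipes damaged", "leaking water")

# Inside subset(), water_evaluation refers to the data frame column.
damaged <- subset(house_info, water_evaluation %in% problems)

# Distinct house numbers with a problem, in ascending order.
damaged_houses <- sort(unique(damaged$house_number))
```

The %in% test replaces the chain of == comparisons joined by |, so adding or removing a label only means editing the problems vector.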