Rich Shepard
2011-Oct-04 18:39 UTC
[R] How to subset() from data frame using specific rows
I have a data frame called chemdata with this structure:

> str(chemdata)
'data.frame':   14886 obs. of  4 variables:
 $ site    : Factor w/ 148 levels "BC-0.5","BC-1",..: 104 145 126 115 114 128 124 2 3 3 ...
 $ sampdate: Date, format: "1996-12-27" "1996-08-22" ...
 $ param   : Factor w/ 8 levels "As","Ca","Cl",..: 1 1 1 1 1 1 1 1 1 1 ...
 $ quant   : num  0.06 0.01 0.01 0.01 0.01 0.01 0.01 0.01 0.01 0.01 ...

I've looked in the R Cookbook and Dalgaard's intro book without finding a way to use wildcards (e.g., like "BC-*") instead of explicitly writing out each site ID when subsetting a data frame.

I need to create subsets (as data frames) based on sites, but including all sites on each stream. For example, using the initial site factor shown above, I want a subset containing all data for sites "BC-0.5", "BC-1", "BC-2", "BC-3", "BC-4", "BC-5", and "BC-6".

Pointers appreciated,

Rich
Sarah Goslee
2011-Oct-04 18:46 UTC
[R] How to subset() from data frame using specific rows
Hi Rich,

You can use something like this:

> testdata <- c("A1", "A2", "A3", "B1", "B2", "B3")
> grep("^A", testdata)
[1] 1 2 3
> grepl("^A", testdata)
[1]  TRUE  TRUE  TRUE FALSE FALSE FALSE

Sarah

-- 
Sarah Goslee
http://www.functionaldiversity.org
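To connect the grepl() result above to the original question, here is a minimal sketch of using that logical vector to pick rows of a data frame. The small chemdata below is an invented stand-in (the real one has 14886 rows), and the names is_bc, bc, and bc2 are only illustrative.

```r
# Small stand-in for chemdata; the rows and values are invented for illustration.
chemdata <- data.frame(
  site     = factor(c("BC-0.5", "BC-1", "JC-1", "BC-2", "SC-3")),
  sampdate = as.Date(c("1996-12-27", "1996-08-22", "1997-01-15",
                       "1997-03-02", "1997-05-19")),
  param    = factor(c("As", "As", "Ca", "Cl", "As")),
  quant    = c(0.06, 0.01, 12.4, 3.1, 0.02)
)

# TRUE for rows whose site ID starts with "BC-".
# grepl() coerces the factor to character before matching.
is_bc <- grepl("^BC-", chemdata$site)

# Keep the matching rows, then drop factor levels that no longer occur.
bc <- droplevels(chemdata[is_bc, ])

# The same selection written with subset().
bc2 <- droplevels(subset(chemdata, grepl("^BC-", site)))
```

Without droplevels(), the site factor in the subset would still carry every level of the full data set, which tends to show up as empty groups in later tables and plots.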
R. Michael Weylandt
2011-Oct-04 18:46 UTC
[R] How to subset() from data frame using specific rows
This isn't going to be the most elegant, but it should work:

## Get the factors as characters
ff <- as.character(chemdata$site)

## Identify those that match what you want
## (grepl() takes the pattern first, then the vector to search)
ff <- grepl("BC-", ff)

## Now use this logical vector to subset
chemdata[ff, ]

Can't test, but should be good to go assuming that "BC-" entirely identifies those sites you want. If you have other "BC-" things, read through the ?regex documentation, which I think describes how to do selective wildcards.

Michael
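On the "selective wildcards" point, a rough sketch of how a plain "BC-" pattern differs from anchored and more restrictive ones. The site IDs below are hypothetical (only "BC-0.5" and "BC-1" appear in the original post), chosen to show where each pattern would over-match.

```r
# Hypothetical site IDs, picked to expose over-matching.
sites <- c("BC-0.5", "BC-1", "BC-10", "ABC-1", "JC-2")

# Unanchored: matches "BC-" anywhere in the string, so "ABC-1" is caught too.
loose <- grepl("BC-", sites)

# Anchored at the start: only IDs that begin with "BC-".
anchored <- grepl("^BC-", sites)

# Restrictive: exactly BC-0.5 or BC-1 through BC-6, nothing after the number.
picky <- grepl("^BC-(0\\.5|[1-6])$", sites)

# Show the three results side by side.
data.frame(sites, loose, anchored, picky)
```

Whether the anchored or the restrictive form is the right one depends on what other site IDs exist in the full set of 148 levels, which the thread does not show.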
Jeff Newmiller
2011-Oct-04 19:08 UTC
[R] How to subset() from data frame using specific rows
?grep
?names

Use indexing by name [, namevector]

---------------------------------------------------------------------------
Jeff Newmiller
Research Engineer (Solar/Batteries/Software/Embedded Controllers)
---------------------------------------------------------------------------
Sent from my phone. Please excuse my brevity.
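One possible reading of the name-vector idea, applied to rows rather than columns: build the vector of matching site names from the factor levels with grep(), then select rows with %in%. This is a sketch on invented data. The second half assumes that the stream code is whatever precedes the first hyphen in a site ID, which the thread does not confirm.

```r
# Invented stand-in data; only the site naming scheme follows the original post.
chemdata <- data.frame(
  site  = factor(c("BC-0.5", "BC-1", "JC-1", "BC-2", "SC-3", "JC-2")),
  quant = c(0.06, 0.01, 12.4, 3.1, 0.02, 0.5)
)

# Vector of site names for one stream, taken from the factor levels.
bc_sites <- grep("^BC-", levels(chemdata$site), value = TRUE)

# Rows whose site is in that vector, with unused levels dropped.
bc <- droplevels(chemdata[chemdata$site %in% bc_sites, ])

# Assumption: the stream code is the text before the first hyphen.
stream <- sub("-.*$", "", as.character(chemdata$site))

# One data frame per stream, collected in a named list.
by_stream <- lapply(split(chemdata, stream), droplevels)
names(by_stream)
```

With the list in hand, by_stream$BC or by_stream[["JC"]] gives the data frame for a single stream, so no pattern has to be written per stream.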