I am so happy about learning how to read in multiple Excel files, that I have to try and make another improvement. I know what I have been doing is clumsy, but it works. Hopefully, someone can suggest a more elegant solution. As a novice, I have been using MS-Word and mail merge to write my code. I start with about 2 pages of code, and end up with 2,220 merged pages that I copy and paste into R. You can probably guess that I am not a programmer. ## here is the start of my merge document. The "x116" line has the merge field, in this case "Bases-K Ammonium Acetate-2008-116". This changes for each soil sample, and for each type of analysis. ## napt <- read.table(file = "C:/Documents and Settings/jfloren/My Documents/R_Statistics/NAPT/NAPT_09/CertIn2010/Data_for_R/readin_all_for_2010_cert.csv ", header = TRUE, sep = ",") attach(napt) x116 <- subset(napt, Analysis_Soil %in% c("Bases-K Ammonium Acetate-2008-116")) detach(napt) attach(x116) #### End of merge document section for selecting the subset ### Once I get the subset isolated, I have no problems calculating the necessary statistics and can generate some wonderful graphs. I have two questions. 1. How do I select different subsets from a large table without resorting to using Word's Mail Merge? 2. I prefer to only analyze the results if at least nine labs submitted results for a particular test. How would I tell R to skip the analysis if the number of labs running a particular test is less than nine? Thanks, Jerry Floren Minnesota Department of Agriculture -- View this message in context: http://n4.nabble.com/Help-with-subset-tp1049883p1049883.html Sent from the R help mailing list archive at Nabble.com.
Tena koe Jerry I'm not sure exactly what you wish to do, but it would seem you could create a character vector (myVars) containing all your merge field names and then use a for() loop. Something like: for (myV in myVars) { x116 <- subset(napt, Analysis_Soil %in% myV) if (nrow(x116) > 8) { # steps for your analysis } } HTH ...... Peter Alspach> -----Original Message----- > From: r-help-bounces at r-project.org > [mailto:r-help-bounces at r-project.org] On Behalf Of Jerry Floren > Sent: Friday, 22 January 2010 5:04 a.m. > To: r-help at r-project.org > Subject: [R] Help with subset > > > I am so happy about learning how to read in multiple Excel > files, that I have to try and make another improvement. I > know what I have been doing is clumsy, but it works. > Hopefully, someone can suggest a more elegant solution. As a > novice, I have been using MS-Word and mail merge to write my > code. I start with about 2 pages of code, and end up with > 2,220 merged pages that I copy and paste into R. You can > probably guess that I am not a programmer. > > ## here is the start of my merge document. The "x116" line > has the merge field, in this case "Bases-K Ammonium > Acetate-2008-116". This changes for each soil sample, and for > each type of analysis. ## > > napt <- read.table(file = "C:/Documents and > Settings/jfloren/My > Documents/R_Statistics/NAPT/NAPT_09/CertIn2010/Data_for_R/read > in_all_for_2010_cert.csv > ", header = TRUE, sep = ",") > attach(napt) > > x116 <- subset(napt, Analysis_Soil %in% c("Bases-K Ammonium > Acetate-2008-116")) > > detach(napt) > attach(x116) > > #### End of merge document section for selecting the subset ### > > Once I get the subset isolated, I have no problems > calculating the necessary statistics and can generate some > wonderful graphs. > > I have two questions. > 1. How do I select different subsets from a large table > without resorting to using Word's Mail Merge? > 2. I prefer to only analyze the results if at least nine labs > submitted results for a particular test. How would I tell R > to skip the analysis if the number of labs running a > particular test is less than nine? > > Thanks, > > Jerry Floren > Minnesota Department of Agriculture > > > -- > View this message in context: > http://n4.nabble.com/Help-with-subset-tp1049883p1049883.html > Sent from the R help mailing list archive at Nabble.com. > > ______________________________________________ > R-help at r-project.org mailing list > https://stat.ethz.ch/mailman/listinfo/r-help > PLEASE do read the posting guide > http://www.R-project.org/posting-guide.html > and provide commented, minimal, self-contained, reproducible code. >
Thank you Peter. I am really new to this. The spreadsheet I am working with has 12,379 rows with the first row consisting of the variable names and 12,378 rows of data. There are seven columns, and the 7th column is the only one with numerical data ("Results"). I need to match up the variable Results with the variable "Anlysis_Soil", which is the type of test performed by the labs on one of 20 different soil samples. Here are some examples of the Anlysis_Soil variable: Anlysis_Soil Bases-Aluminum KCL Extr-2008-116 Bases-Aluminum KCL Extr-2008-116 Bases-Aluminum KCL Extr-2008-117 Bases-Aluminum KCL Extr-2008-118 Bases-Aluminum KCL Extr-2008-118 Bases-Aluminum KCL Extr-2008-119 Bases-Aluminum KCL Extr-2008-120 Bases-Aluminum KCL Extr-2008-120 Bases-Aluminum KCL Extr-2009-101 Actually, I am not interested in any of the above, because there are too few (less than 9). I think I need to first identify the unique Anlysis_Soil from the entire list, and I thought using "list" might work:> anlyses <- list(Anlysis_Soil) > str(anlyses)List of 1 $ : Factor w/ 1695 levels "Bases-Aluminum KCL Extr-2008-116",..: 1 1 2 3 3 4 5 5 6 6 ...>It does correctly identify there are 1695 unique "Anlysis_Soil" variables. However, "anlyses" contains all 12,378 "Anlysis_Soil" variables. For example: print(anlyses) ... ... ... [12374] Soil pH & EC-Soil EC (1to2)-2009-115 [12375] Soil pH & EC-Soil EC (1to2)-2009-115 [12376] Soil pH & EC-Soil EC (1to2)-2009-115 [12377] Soil pH & EC-Soil EC (1to2)-2009-115 [12378] Soil pH & EC-Soil EC (1to2)-2009-115 1695 Levels: Bases-Aluminum KCL Extr-2008-116 ... And once again shows correctly that there are 1,695 unique "Anlysis_Soil" variables. Once the unique Anlysis_Soil variables are identified, I need to determine the ones greater than 8, and I see how that could be done with your code. I am not clear what you mean by, "for (myV in myVars)" ? Is myV the name of one of the unique variables that has at least 9 Results? Is myVars the entire column of "Anlysis_Soil" ? I am not sure if this is any clearer. Thanks, Jerry Floren Minnesota Department of Agriculture -- View this message in context: http://n4.nabble.com/Help-with-subset-tp1049883p1058242.html Sent from the R help mailing list archive at Nabble.com.