I don't find how to do what I need to do in Dalgaard or 'R Cookbook', so I'm asking here.

I have a data frame with water chemistry data and I want to start exploring these data. There are three factors (site, date, chemical) associated with each measurement. The data frame looks like this:

> summary(chemdata)
 site_id.sample_date.param.quant
 BC-0.5|1996-04-19|Arsenic|0.01                 :    1
 BC-0.5|1996-04-19|Calcium|76.56                :    1
 BC-0.5|1996-04-19|Chloride|12                  :    1
 BC-0.5|1996-04-19|Magnesium|43.23              :    1
 BC-0.5|1996-04-19|Sulfate|175                  :    1
 BC-0.5|1996-04-19|Total Dissolved Solids|460   :    1
 (Other)                                        :14880

I want first to calculate (and plot) descriptive stats by chemical, ignoring site and date and telling R to ignore missing data. (Incorporating those factors will occur later.) What I have not been able to figure out is how to specify the command to, for example, calculate mean and sd for Arsenic. My floundering and thrashing includes attempts like these:

> mean(chemdata.param="Arsenic")
Error in is.numeric(x) : 'x' is missing
> mean(chemdata.quant, param="Arsenic")
Error in mean(chemdata.quant, param = "Arsenic") :
  object 'chemdata.quant' not found
> mean(chemdata$quant, param="Arsenic")
[1] NA
Warning message:
In mean.default(chemdata$quant, param = "Arsenic") :
  argument is not numeric or logical: returning NA

As a newcomer to R I've done a lot of reading, yet all the examples use nicely structured data to illustrate the point being made. I need to work with my data and learn how to specify columns and write correct commands for the analyses I need.

Please point me in the right direction.

Rich

Hi Rich,

It is a bit hard to read the "summary" output you posted. Please consider pasting the output of:

ls.str(chemdata)

Regarding your question, please start by checking whether this works (I'm not sure, since it seems you have made some changes to the summary output, and I am only guessing how things look):

mean(chemdata$quant[chemdata$param == "Arsenic"])
sd(chemdata$quant[chemdata$param == "Arsenic"])

Cheers,
Tal

----------------Contact Details:----------------
Contact me: Tal.Galili@gmail.com | 972-52-7275845
Read me: www.talgalili.com (Hebrew) | www.biostatistics.co.il (Hebrew) | www.r-statistics.com (English)
------------------------------------------------

On Wed, Aug 31, 2011 at 12:00 AM, Rich Shepard <rshepard@appl-ecosys.com> wrote:
> mean(chemdata.param="Arsenic")
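
Since the original question also asks R to ignore missing data, here is a minimal sketch of the same subsetting idea with na.rm = TRUE added. It assumes the data frame already has separate param and quant columns (which the posted summary suggests is not yet the case); the toy values below are illustrative only and are not taken from the real data set.

```r
# Toy stand-in for chemdata. The column names param and quant are assumed
# from the thread; the values (including the NA) are made up for illustration.
chemdata <- data.frame(
  param = c("Arsenic", "Arsenic", "Calcium", "Arsenic", "Calcium"),
  quant = c(0.01, 0.03, 76.56, NA, 80.00)
)

# Logical index: TRUE for rows whose chemical is Arsenic.
is_arsenic <- chemdata$param == "Arsenic"

# na.rm = TRUE drops missing measurements before computing the statistic.
arsenic_mean <- mean(chemdata$quant[is_arsenic], na.rm = TRUE)
arsenic_sd   <- sd(chemdata$quant[is_arsenic], na.rm = TRUE)

print(arsenic_mean)
print(arsenic_sd)
```

Without na.rm = TRUE, a single NA among the Arsenic measurements would make both mean() and sd() return NA.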

On Aug 30, 2011, at 5:00 PM, Rich Shepard wrote:

> I have a data frame with water chemistry data and I want to start
> exploring these data. There are three factors (site, date, chemical)
> associated with each measurement. The data frame looks like this:
>
>> summary(chemdata)
> site_id.sample_date.param.quant
> BC-0.5|1996-04-19|Arsenic|0.01 : 1
> BC-0.5|1996-04-19|Calcium|76.56 : 1
> BC-0.5|1996-04-19|Chloride|12 : 1
> BC-0.5|1996-04-19|Magnesium|43.23 : 1
> BC-0.5|1996-04-19|Sulfate|175 : 1
> BC-0.5|1996-04-19|Total Dissolved Solids|460: 1
> (Other) :14880

It appears that your original file was delimited by "|" and you used something else, perhaps the default white-space setting? I think you need to go back and do your input operations again with sep="|". (Or you could provide str() on the data.frame rather than making us guess.)

-- 
David

> ______________________________________________
> R-help at r-project.org mailing list
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.

David Winsemius, MD
West Hartford, CT
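
The advice to re-read the file with sep="|" could look roughly like the sketch below. The header line and the inline sample rows are assumptions reconstructed from the posted summary; with a real file one would pass its path to read.table() instead of text =. The read.csv() call is only a guess at one way a single fused column could have arisen, not a statement of what was actually done.

```r
# A few lines in the shape suggested by the summary output. The header line
# is an assumption inferred from the fused column name.
raw_lines <- c(
  "site_id|sample_date|param|quant",
  "BC-0.5|1996-04-19|Arsenic|0.01",
  "BC-0.5|1996-04-19|Calcium|76.56",
  "BC-0.5|1996-04-19|Total Dissolved Solids|460"
)

# Reading with a separator that never occurs in the data (here a comma)
# leaves everything in one column, similar to the posted summary.
fused <- read.csv(text = raw_lines)
print(names(fused))

# Reading with sep = "|" splits each line into four columns.
chemdata <- read.table(text = raw_lines, sep = "|", header = TRUE,
                       stringsAsFactors = FALSE)
str(chemdata)
```

After such a re-read, str(chemdata) should list four columns, with quant numeric, which is what mean() and sd() need.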

Hi Rich,

I am not sure what you really want; it seems to me that you want to calculate the mean of all rows where the chemical is Arsenic? Try this to get a little more insight:

mean(chemdata$quant[chemdata$param=="Arsenic"])

The expression chemdata$param=="Arsenic" is a logical vector that is TRUE for every row in which the variable param takes the value "Arsenic". Used as a row index, as in chemdata[chemdata$param=="Arsenic",], it returns just those rows of the data frame. Try it in your R editor to see it and understand the R concept!

If you now want all values of a certain column for the rows that have "Arsenic" as param, you just write:

chemdata$COLUMNNAME[chemdata$param=="Arsenic"]

I do not know if this helps, as it seems to me that Arsenic occurs only once in your data frame?

Good luck,
Simon

On Aug 30, 2011, at 11:00 PM, Rich Shepard wrote:

> What I have not been able to figure out is
> how to specify the command to, for example, calculate mean and sd for
> Arsenic.
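
To make the indexing concrete, the sketch below separates the logical vector from the row subset it selects, and then extends the idea to the per-chemical statistics asked about in the original post using tapply(). The toy data frame and its values are illustrative assumptions, as are the column names param and quant; tapply() is one base R option among several (aggregate() is another).

```r
# Toy stand-in for chemdata; values are made up for illustration.
chemdata <- data.frame(
  param = c("Arsenic", "Calcium", "Arsenic", "Chloride", "Calcium"),
  quant = c(0.01, 76.56, 0.02, 12, NA)
)

# The comparison itself yields a logical vector, one element per row.
is_arsenic <- chemdata$param == "Arsenic"

# Using that vector as a row index returns a data frame of matching rows.
arsenic_rows <- chemdata[is_arsenic, ]

# Using it on a single column returns a plain vector of values.
arsenic_quant <- chemdata$quant[is_arsenic]

# Statistics for every chemical at once, ignoring missing values.
means_by_param <- tapply(chemdata$quant, chemdata$param, mean, na.rm = TRUE)
sds_by_param   <- tapply(chemdata$quant, chemdata$param, sd, na.rm = TRUE)

print(means_by_param)
print(sds_by_param)
```

A chemical with only one non-missing measurement gets NA for its standard deviation, since sd() needs at least two values.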