Hello,

I have tried to find this out some other way, but without success, so I am trying this list. I assume this should be quite simple.

I have a dataset with 4 columns, "Sample_no", "Species", "Nitrogen", "Carbon", in csv format. In the Species column I have many different species, with a varying number of observations per species.

E.g.

"Sample_no"  "Species"  "Nitrogen"  "Carbon"
1            Cod        15.2        -19.0
2            Haddock    14.8        -20.2
3            Cod        15.6        -18.5
4            Cod        13.2        -20.1
5            Haddock    14.3        -18.8
Etc.

I want to calculate the mean, standard deviation etc. per species for the observations "Nitrogen" and "Carbon", and later do plots and stats with the different species. I will in the end have many species, so it needs to be "automatic"; I can't enter code for every species separately.

Can anyone help me with this? Or if this is the wrong list to send this question to, where do I send it?

Thank you very much in advance.

Best regards

Silje Ramsvatn
PhD candidate
University of Tromsø
Norway
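For anyone who wants to try the suggestions in this thread, here is a minimal sketch that rebuilds the five example rows as a data frame. The object name dfrm and the inline CSV text are assumptions made for illustration; with a real file, the read.csv() call would take the file path instead.

```r
# Hypothetical reconstruction of the example rows from the post.
# With a real file this would be read.csv("<path to your file>").
csv_text <- "Sample_no,Species,Nitrogen,Carbon
1,Cod,15.2,-19.0
2,Haddock,14.8,-20.2
3,Cod,15.6,-18.5
4,Cod,13.2,-20.1
5,Haddock,14.3,-18.8"

# stringsAsFactors = TRUE turns the Species column into a factor,
# which is convenient for grouping functions such as tapply().
dfrm <- read.csv(text = csv_text, stringsAsFactors = TRUE)

str(dfrm)            # column types
table(dfrm$Species)  # number of observations per species
```

Storing Species as a factor is a choice, not a requirement; most grouping functions will also accept a character column.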
On Nov 4, 2010, at 8:28 AM, Ramsvatn Silje wrote:

> And I want to calculate, mean, standard dev etc per species for the
> observations "Nitrogen" and "Carbon". And later do plots and stats with
> the different species. I will in the end have many species, so need it to
> be "automatic" I can't enter code for every species separate.

http://finzi.psych.upenn.edu/R/library/prettyR/html/brkdn.html
http://finzi.psych.upenn.edu/R/library/Hmisc/html/describe.html

e.g.

    library(Hmisc)
    with( dfrm, describe( ~Species) )

I think you could also probably do

    lapply(split(dfrm, dfrm$Species), describe)

The Hmisc::describe function is especially good at first examining a vector and applying the appropriate methods to the type of data. There are several other packages with different describe functions. And there are several other packages, such as doBy and plyr, that will offer other concise methods for doing your by-category statistics.

-- 
David.

David Winsemius, MD
West Hartford, CT
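The split/lapply pattern David mentions also works without any add-on package. The sketch below is only an illustration: it swaps in base R's summary() in place of Hmisc::describe, and the data frame dfrm is rebuilt by hand from the example rows in the original post.

```r
# Example rows from the original post (dfrm is an assumed object name).
dfrm <- data.frame(
  Sample_no = 1:5,
  Species   = factor(c("Cod", "Haddock", "Cod", "Cod", "Haddock")),
  Nitrogen  = c(15.2, 14.8, 15.6, 13.2, 14.3),
  Carbon    = c(-19.0, -20.2, -18.5, -20.1, -18.8)
)

# Split the two measurement columns into one data frame per species ...
by_species <- split(dfrm[, c("Nitrogen", "Carbon")], dfrm$Species)

# ... then apply the same summary function to every piece.
summaries <- lapply(by_species, summary)

summaries$Cod  # min, quartiles, median and mean for the Cod rows
```

Because split() creates one piece per level of Species, new species in the data are picked up without any change to the code.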
Hi

r-help-bounces at r-project.org wrote on 04.11.2010 13:28:06:

> And I want to calculate, mean, standard dev etc per species for the
> observations "Nitrogen" and "Carbon". And later do plots and stats with
> the different species. I will in the end have many species, so need it to
> be "automatic" I can't enter code for every species separate.

No need for sorting. You can use R, particularly the ?tapply, ?by or ?aggregate commands. Regarding plots, you can consider lattice or ggplot2, but you can also get good results with base graphics.

    aggregate(your.data[,3:4], list(your.data$Species), function(x) c(mean(x), sd(x)))

    library(lattice)
    xyplot(Nitrogen~Carbon|Species, data=your.data)

Regards
Petr
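As a worked illustration of the aggregate() suggestion, the sketch below uses the formula interface on the example rows from the original post. The object name your.data follows Petr's message; the names mean and sd given to the returned values, and the flattening step with do.call(), are additions for readability rather than part of his reply.

```r
# Example rows from the original post.
your.data <- data.frame(
  Sample_no = 1:5,
  Species   = c("Cod", "Haddock", "Cod", "Cod", "Haddock"),
  Nitrogen  = c(15.2, 14.8, 15.6, 13.2, 14.3),
  Carbon    = c(-19.0, -20.2, -18.5, -20.1, -18.8)
)

# Mean and standard deviation of both measurements, one row per species.
res <- aggregate(cbind(Nitrogen, Carbon) ~ Species, data = your.data,
                 FUN = function(x) c(mean = mean(x), sd = sd(x)))

# aggregate() stores the two statistics as matrix columns; this step
# flattens them into ordinary columns such as Nitrogen.mean and Nitrogen.sd.
flat <- do.call(data.frame, res)
flat
```

The flattened form is usually easier to export or plot than the nested matrix columns.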
Try tapply(). For example:

    tapply(data$Nitrogen, factor(data$Species), mean)

This calculates the mean of the Nitrogen column for each Species (assuming the data frame below is stored in the object data).

Regards,
Annemarie Eigenhuis

-----Original Message-----
From: r-help-bounces at r-project.org [mailto:r-help-bounces at r-project.org] On Behalf Of Ramsvatn Silje
Sent: Thursday, 4 November 2010 13:28
To: R-help at r-project.org
Subject: [R] Sorting data from one column with strings

I have a dataset with 4 columns, "Sample_no", "Species", "Nitrogen", "Carbon" in csv format. In the species column I have many different species with varying number of obs per species

Eg

"Sample_no"  "Species"  "Nitrogen"  "Carbon"
1            Cod        15.2        -19.0
2            Haddock    14.8        -20.2
3            Cod        15.6        -18.5
4            Cod        13.2        -20.1
5            Haddock    14.3        -18.8
Etc..

And I want to calculate, mean, standard dev etc per species for the observations "Nitrogen" and "Carbon". And later do plots and stats with the different species. I will in the end have many species, so need it to be "automatic" I can't enter code for every species separate.
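The tapply() call can be repeated for several statistics and bound into one table. The following is a rough sketch under the same assumption as the message above (the data frame is stored in an object called data); the helper name stats_for is hypothetical.

```r
# Example rows from the original post, stored under the name used above.
data <- data.frame(
  Sample_no = 1:5,
  Species   = c("Cod", "Haddock", "Cod", "Cod", "Haddock"),
  Nitrogen  = c(15.2, 14.8, 15.6, 13.2, 14.3),
  Carbon    = c(-19.0, -20.2, -18.5, -20.1, -18.8)
)

# Hypothetical helper: count, mean and standard deviation of x
# within each level of the grouping variable g.
stats_for <- function(x, g) {
  g <- factor(g)
  cbind(n    = tapply(x, g, length),
        mean = tapply(x, g, mean),
        sd   = tapply(x, g, sd))
}

nitrogen_stats <- stats_for(data$Nitrogen, data$Species)
carbon_stats   <- stats_for(data$Carbon,   data$Species)
nitrogen_stats
```

Each result is a matrix with one row per species, so it keeps working unchanged as more species are added.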
(apologies for any double hits; forgot to reply all...)

Or, you could just go back to basics and write yourself a general loop that goes through the levels of a variable and gives you back whatever statistics you want. Below is an example where you estimate the mean for each level, but you could estimate any number of statistical parameters.

    # Example data: a grouping column with levels A, B, C and the values 1 to 15.
    # factor() makes the grouping column a factor so that levels() works.
    dat <- data.frame(factor(c(rep("A",5), rep("B",5), rep("C",5))), c(1:15))
    results <- NULL
    for (i in levels(dat[,1])) {
      sub.dat <- subset(dat, dat[,1] == i)   # rows belonging to level i
      res <- mean(sub.dat[,2])               # statistic for this level
      results <- c(results, i, res)          # append level name and mean
    }
    # One row per level; note that the values are stored as character strings
    results.mat <- matrix(results, ncol=2, byrow=TRUE)
    results.mat

HTH,
Mike

On Thu, Nov 4, 2010 at 7:28 AM, Ramsvatn Silje <silje.ramsvatn@uit.no> wrote:

> And I want to calculate, mean, standard dev etc per species for the
> observations "Nitrogen" and "Carbon". And later do plots and stats with
> the different species. I will in the end have many species, so need it to
> be "automatic" I can't enter code for every species separate.
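One possible variation on the loop above, sketched here only as an illustration: filling a pre-allocated data frame keeps the statistics numeric instead of converting them to character strings. The column names group, value, mean and sd are assumptions chosen for this example.

```r
# Same toy data as in the loop above, with named columns for readability.
dat <- data.frame(group = factor(rep(c("A", "B", "C"), each = 5)),
                  value = 1:15)

groups <- levels(dat$group)

# Pre-allocate one row per level; NA_real_ keeps the columns numeric.
results <- data.frame(group = groups, mean = NA_real_, sd = NA_real_)

for (k in seq_along(groups)) {
  vals <- dat$value[dat$group == groups[k]]  # values for the k-th level
  results$mean[k] <- mean(vals)
  results$sd[k]   <- sd(vals)
}

results
```

Further statistics can be added by creating another column in results and assigning to it inside the loop.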
try sqldf:

    > x
      Sample_no Species Nitrogen Carbon
    1         1     Cod     15.2  -19.0
    2         2 Haddock     14.8  -20.2
    3         3     Cod     15.6  -18.5
    4         4     Cod     13.2  -20.1
    5         5 Haddock     14.3  -18.8
    > require(sqldf)
    > sqldf("select Species, avg(Nitrogen) Nitrogen, avg(Carbon) Carbon from x group by Species")
      Species Nitrogen Carbon
    1     Cod 14.66667  -19.2
    2 Haddock 14.55000  -19.5

On Thu, Nov 4, 2010 at 8:28 AM, Ramsvatn Silje <silje.ramsvatn at uit.no> wrote:

> And I want to calculate, mean, standard dev etc per species for the
> observations "Nitrogen" and "Carbon". And later do plots and stats with
> the different species. I will in the end have many species, so need it to
> be "automatic" I can't enter code for every species separate.

-- 
Jim Holtman
Cincinnati, OH
+1 513 646 9390

What is the problem that you are trying to solve?