Alexander.Herr@csiro.au
2002-Dec-18 03:15 UTC
[R] summary stats including NA's into new dataframe
List,

I am trying to extract summary statistics from a data frame with several variables (and NAs) into a data frame with the columns: Variablename (i.e. the colnames of the original data), mean, stdev, max, min, Valid N, Missing Values.

Extracting the statistics is straightforward using stack and aggregate. However, I haven't succeeded in obtaining the number of Missing Values. I can extract these from describe (Hmisc library), but surely there is a simpler way, similar to obtaining the mean using aggregate?

Suggestions are much appreciated.

Thanks
Herry

--------------------------------------------
Alexander Herr - Herry
CSIRO Sustainable Ecosystems: http://www.cse.csiro.au/
--------------------------------------------
Alexander.Herr at csiro.au wrote:

> List,
>
> I am trying to extract summary statistics from a data frame with several
> variables (and NAs) into a dataframe with the columns: Variablename (ie the
> colnames of original data), mean, stdev, max, min, Valid N, Missing Values.
>
> Extracting the statistics is straightforward using stack and aggregate.
> However, I haven't succeeded in obtaining the number of Missing Values. I
> can extract these from describe (Hmisc library), but surely there is a
> simpler way similar to obtaining the mean using aggregate?

The similar way is:

    aggregate(......., function(x) sum(is.na(x)))

Uwe Ligges

> Suggestions are much appreciated
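For readers following along, here is a minimal sketch of how that suggestion might be applied to a stacked data frame. The data frame `dat` and its columns `a` and `b` are made up for illustration and are not from the thread; the elided arguments in the reply above are left as written.

```r
# Hypothetical example data (not from the thread)
dat <- data.frame(a = c(1, NA, 3, 4), b = c(NA, NA, 2, 5))

# stack() gives one row per value, with the source column name in `ind`
long <- stack(dat)

# FUN must be a function; here it counts the NAs within each group
n_missing <- aggregate(long$values,
                       by = list(Variable = long$ind),
                       FUN = function(x) sum(is.na(x)))
print(n_missing)
```

The count ends up in the column `x` of the result, alongside the grouping column `Variable`.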
Alexander.Herr@csiro.au
2002-Dec-19 02:19 UTC
[R] summary stats including NA's into new dataframe
Thanks Uwe,

Can't seem to get your formula to work...

I should have made this clearer. I am after a listing of the number of NAs and Valid Ns (or total N) for export to csv, e.g.:

    Variable, mean, Missing Values, Valid N
    test, 6.00000,2,18
    bummer,5.44444,1,19

from:

    x<-c(1,4,2,6,8,3,5,6,7,8,7,2,4,7,5,1,8,9,8,9)
    labl<-gl(2,2,length=20,labels=c("test","bummer"))
    x[3]<-NA
    x[5]<-NA
    x[6]<-NA

    aggregate(x,by=list(labl),mean, sum(is.na(x)))
    #  Group.1  x
    #1    test NA
    #2  bummer NA

    aggregate(x,by=list(labl),mean, na.rm=T)
    #  Group.1        x
    #1    test 6.000000
    #2  bummer 5.444444

    aggregate(x,by=list(labl),sum(is.na(x)))
    # Error in FUN(X[[1]], ...) : Argument "INDEX" is missing, with no default

Cheers
Herry

--------------------------------------------
Alexander Herr - Herry
Northern Futures
Davies Laboratory
PMB, Aitkenvale, QLD 4814
Phone (07) 4753 8510
Fax (07) 4753 8650
Home: http://batcall.csu.edu.au/~aherr
CSIRO Sustainable Ecosystems: http://www.cse.csiro.au/
--------------------------------------------
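One possible way to get the table described above is sketched here, using the same `x` and `labl`. The first and third attempts pass the value `sum(is.na(x))` where `aggregate` expects a function, so the sketch wraps the counts in anonymous functions instead. The names `means`, `missing`, `valid` and `stats` are illustrative only, and this is one approach under those assumptions rather than the definitive one.

```r
x <- c(1, 4, 2, 6, 8, 3, 5, 6, 7, 8, 7, 2, 4, 7, 5, 1, 8, 9, 8, 9)
labl <- gl(2, 2, length = 20, labels = c("test", "bummer"))
x[c(3, 5, 6)] <- NA

# One aggregate() call per statistic; each result has columns Group.1 and x
means   <- aggregate(x, by = list(labl), FUN = mean, na.rm = TRUE)
missing <- aggregate(x, by = list(labl), FUN = function(v) sum(is.na(v)))
valid   <- aggregate(x, by = list(labl), FUN = function(v) sum(!is.na(v)))

# The groups come back in the same (factor level) order in all three results
stats <- data.frame(Variable = means$Group.1,
                    mean     = means$x,
                    Missing  = missing$x,
                    ValidN   = valid$x)

# With no file argument write.csv() prints to the console;
# pass a file name as the second argument to write to disk instead
write.csv(stats, row.names = FALSE)
```

Note that each group in this data holds 10 observations, so the valid counts come out as 8 and 9 rather than the 18 and 19 used in the illustration.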
I think my dstats function does what you want, if I understand you correctly. You could apply it over rows or columns:

http://home.earthlink.net/~bmagill/MyMisc.html

It is there along with several other functions.

On Thu, 19 Dec 2002 11:18:25 +1000 Alexander.Herr at csiro.au wrote:

> Thanks Uwe,
> Can't seem to get your formula to work...
> I should have made this clearer. I am after a listing of the number of NAs
> and Valid Ns (or total N) for export to csv, e.g.:
> Variable, mean, Missing Values, Valid N
> test, 6.00000,2,18
> bummer,5.44444,1,19