Hi,

I have the following data about courses (504) in a university: two attributes about the proportion of resources used (#resources_used / #resources_available), namely the average and the standard deviation.

Thus I have:
[1] n=504 rows
[2] 1 id column and 2 attributes

Here's a sample of the data:

courseid,average,std
12741,1,0
17161,1,0
12514,1,0
12316,0.8666666692648178,0.26090261464799325
2467,0.8623188442510107,0.24920700355307424
3047,0.85,0.2314550249431379
1747,0.8481481481481481,0.23078446747051584
2487,0.8383838455333854,0.20429589057565342
13869,0.8181818181818182,0.2522624895547565
1706,0.8158730235364702,0.19332287915878024
2041,0.8095238095238095,0.24880667576405963
1864,0.8080808141014793,0.17456052968726046
2106,0.784444437623024,0.2475808839379094
....
.....

My question is how I can sensibly visualise this data.

In this context, it does not make sense to find the population mean or population std. However, what would make sense is showing the cdf of the mean. So I'm thinking of doing this using ecdf(). But what about the standard deviation? How can I visualise the standard deviation as well as the mean? Would that make sense on just one plot?

Any ideas?

Thanks
Gawesh

R. Michael Weylandt
2011-Oct-23 13:22 UTC
[R] how to plot a distribution of mean and standard deviation
It seems like the relevant plot would depend on what you are trying to investigate, but usually a scatterplot works well for bivariate data, with no other assumptions needed.

I usually find ecdf() plots rather hard to interpret without playing around with the data elsewhere first, and I'm not sure they make much sense for bivariate data in your case, since they reorder inputs.

Michael
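
A minimal sketch of the two plots discussed in this thread, using only the 13 sample rows from the original post (not the full 504 courses). The names `courses` and `avg_ecdf`, the axis labels, and the side-by-side layout are illustrative assumptions rather than anything given in the thread. The scatterplot keeps each course's average paired with its standard deviation, whereas ecdf() sorts the averages on their own.

```r
# Sample rows copied from the original post (13 of the 504 courses).
# The data frame name 'courses' is an assumption made for this sketch.
courses <- read.csv(text = "courseid,average,std
12741,1,0
17161,1,0
12514,1,0
12316,0.8666666692648178,0.26090261464799325
2467,0.8623188442510107,0.24920700355307424
3047,0.85,0.2314550249431379
1747,0.8481481481481481,0.23078446747051584
2487,0.8383838455333854,0.20429589057565342
13869,0.8181818181818182,0.2522624895547565
1706,0.8158730235364702,0.19332287915878024
2041,0.8095238095238095,0.24880667576405963
1864,0.8080808141014793,0.17456052968726046
2106,0.784444437623024,0.2475808839379094")

# Draw two panels side by side; remember the old settings to restore them.
op <- par(mfrow = c(1, 2))

# Scatterplot: one point per course, so the (average, std) pairing is kept.
plot(courses$average, courses$std,
     xlab = "Average proportion of resources used",
     ylab = "Standard deviation",
     main = "Average vs. std per course")

# Empirical CDF of the per-course averages only (std is not shown here).
avg_ecdf <- ecdf(courses$average)
plot(avg_ecdf,
     xlab = "Average proportion of resources used",
     ylab = "Fraction of courses",
     main = "ECDF of course averages")

# Restore the previous graphics settings.
par(op)
```

With the full data, the same calls would apply after reading the file with read.csv(); with 504 points, overplotting near average = 1, std = 0 may be worth checking.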