Dear List: I am trying to find the area between two ECDFs. I am examining the gap in performance between two groups, males and females, on a student achievement test in math, which is scored on a continuous metric. I start by creating a subset of the data frame for each group:

# "==" tests equality; with a single "=" subset() ignores the condition and returns every row
male <- subset(datafile, female == "Male")
female <- subset(datafile, female == "Female")

I then plot the two ECDFs via

plot.ecdf(male$math)
plot.ecdf(female$math, add = TRUE)

This produces a visual display that reveals a gap in performance. What I would like to do is learn to perform the integration between the two ECDFs to examine the size of this gap.

I would also like to examine the horizontal distance between the two CDFs via another visual display: for example, the distance between the 50th percentiles of the two CDFs, or more generally the distance along the x-axis from cdf1 to cdf2 at each percentile. Ideally, I would like to plot this horizontal gap at each percentile.

Secondly, I would like to measure and plot the vertical gap, i.e., the distance along the y-axis from cdf1 to cdf2 at each value along the x-axis.

I am not sure whether I first need to smooth the ECDFs before performing these operations. Any help would be appreciated. I hope this makes sense.

Harold
------
Harold C. Doran
Director of Research and Evaluation
New American Schools
675 N. Washington Street, Suite 220
Alexandria, Virginia 22314
703.647.1628
<http://www.edperform.net/>
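A minimal sketch of the horizontal gap described above, assuming two numeric score vectors. The names male_math and female_math are hypothetical stand-ins for male$math and female$math, filled here with simulated scores purely for illustration. The sketch takes the difference between the two groups' empirical quantiles at each percentile. With quantile(..., type = 1) the quantile is the exact inverse of the ECDF, so no smoothing is required; the default type = 7 interpolates between order statistics if a smoother curve is preferred.

```r
set.seed(1)
male_math   <- rnorm(200, mean = 0.2)   # simulated stand-in for male$math
female_math <- rnorm(180, mean = 0)     # simulated stand-in for female$math

# Percentiles at which to compare the two groups
p <- seq(0.01, 0.99, by = 0.01)

# type = 1 inverts the ECDF: the smallest observed score x with Fn(x) >= p
q_male   <- quantile(male_math,   probs = p, type = 1, names = FALSE)
q_female <- quantile(female_math, probs = p, type = 1, names = FALSE)

# Horizontal gap: distance along the x-axis between the two ECDFs at each percentile
h_gap <- q_male - q_female

plot(p, h_gap, type = "s",
     xlab = "Percentile", ylab = "Male minus female score",
     main = "Horizontal gap between the two ECDFs")
abline(h = 0, lty = 2)   # reference line: no gap
```

The value of h_gap at p = 0.5 is the gap between the two (type 1) medians, i.e. the 50th-percentile comparison mentioned above.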
# male and female must be numeric score vectors here (e.g. male$math, female$math)
mf <- c(male, female)
ord <- order(mf)
# each male score raises Fmale - Ffemale by 1/n; each female score lowers it by 1/m
v <- c(rep(1/length(male), length(male)), rep(-1/length(female), length(female)))
mf <- mf[ord]
v <- v[ord]
# cumsum(v) is Fmale - Ffemale on each interval between consecutive sorted scores
sum(diff(mf) * (cumsum(v)[1:(length(v)-1)]))

You may not want to integrate cdfs. They're already probabilities. :) Nice analytic statistics exist for just the maximum distance between the cdfs, for example.

-Frank

-----Original Message-----
From: Harold Doran [mailto:hdoran at nasdc.org]
Sent: Wednesday, February 18, 2004 2:21 PM
To: R Help
Subject: [R] Area between CDFs

______________________________________________
R-help at stat.math.ethz.ch mailing list
https://www.stat.math.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide! http://www.R-project.org/posting-guide.html
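Frank's sort-and-cumsum computation above can be wrapped in a small function. The following is a sketch only: ecdf_area() and its signed argument are hypothetical names, and the scores are simulated for illustration. The signed area, the integral of Fx - Fy, equals mean(y) - mean(x) exactly, which is Thomas's point later in the thread. The unsigned area integrates |Fx - Fy| and equals the absolute difference in means only when the two ECDFs do not cross.

```r
# Area between the ECDFs of x and y, using the same sort-and-cumsum idea
# as the code above. ecdf_area() is a hypothetical helper name.
ecdf_area <- function(x, y, signed = TRUE) {
  pooled <- c(x, y)
  ord    <- order(pooled)
  # Each x score raises Fx - Fy by 1/n; each y score lowers it by 1/m
  steps  <- c(rep(1 / length(x), length(x)),
              rep(-1 / length(y), length(y)))[ord]
  pooled <- pooled[ord]
  # Value of Fx - Fy on each interval between consecutive sorted scores
  gap <- cumsum(steps)[-length(steps)]
  if (!signed) gap <- abs(gap)
  # Step function integral: interval width times height, summed
  sum(diff(pooled) * gap)
}

set.seed(1)
male_math   <- rnorm(200, mean = 0.2)   # simulated, illustrative only
female_math <- rnorm(180, mean = 0)     # simulated, illustrative only

area <- ecdf_area(male_math, female_math)
area
mean(female_math) - mean(male_math)     # same value, up to rounding
ecdf_area(male_math, female_math, signed = FALSE)
```

Tied scores are handled naturally: consecutive equal values give an interval of width zero, so they contribute nothing to the sum.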
Thanks. I have been able to create the following simple function to examine the vertical gap between two CDFs at a value along the x-axis that I specify. For example, I create the ECDFs:

male.ecdf <- ecdf(egmale$math)
female.ecdf <- ecdf(egfemale$math)

I then define the following function:

dif.cdf <- function(x) {
  return(abs(female.ecdf(x) - male.ecdf(x)))
}

Now I can use the function to measure the gap at values along the x-axis (i.e., gap = |F(x) - G(x)|; the CDFs do not cross at any point):

dif.cdf(0)

which returns the size of the gap between males and females at that score. What I would like to be able to do is measure the vertical gap at each point along the x-axis and then plot the gap. This would illustrate how large the differences in student achievement are at different score values in a nice visual display. The brute-force way seems to be to call the function above for each score value. However, this is, of course, inefficient. Any ideas on how I might create a function that would be more efficient?

Many thanks,
Harold

-----Original Message-----
From: Thomas Lumley [mailto:tlumley at u.washington.edu]
Sent: Thursday, February 19, 2004 10:49 AM
To: Samuelson, Frank*
Cc: r-help at stat.math.ethz.ch
Subject: RE: [R] Area between CDFs

On Wed, 18 Feb 2004, Samuelson, Frank* wrote:

> You may not want to integrate cdfs. They're already probabilities. :)
> Nice analytic statistics exist for just the maximum distance between
> the cdfs, for example.

And for the area between cdfs, which is perhaps better known as the difference in means.

-thomas
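The functions returned by ecdf() are vectorized, so Harold's dif.cdf() already accepts a whole vector of scores and no loop is needed. The following is a sketch with simulated scores standing in for egmale$math and egfemale$math. Both ECDFs are step functions that jump only at observed scores, so evaluating the gap at the pooled unique scores captures every value it takes. The largest of these gaps is the two-sample Kolmogorov-Smirnov statistic D, presumably the kind of maximum-distance statistic Frank had in mind, and ks.test() reports it.

```r
set.seed(1)
male_math   <- rnorm(200, mean = 0.2)   # simulated stand-in for egmale$math
female_math <- rnorm(180, mean = 0)     # simulated stand-in for egfemale$math

male.ecdf   <- ecdf(male_math)
female.ecdf <- ecdf(female_math)
dif.cdf     <- function(x) abs(female.ecdf(x) - male.ecdf(x))

# Both ECDFs change only at observed scores, so the pooled unique scores
# cover every value the vertical gap can take.
scores <- sort(unique(c(male_math, female_math)))
gap    <- dif.cdf(scores)   # one vectorized call, no loop

plot(scores, gap, type = "s",
     xlab = "Math score", ylab = "|F(x) - G(x)|",
     main = "Vertical gap between the two ECDFs")

# The largest vertical gap is the two-sample Kolmogorov-Smirnov statistic D
max(gap)
ks.test(male_math, female_math)$statistic
```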