I have some Bernoulli data something like this: x<-sort(runif(100,1,20)) p<-pnorm(x,10,3) y<-as.numeric(runif(x)<p) plot(x,y) lines(x,p) This plot is not very satisfactory because the ogive does not visually fit the (0,1) points very well, and also because the points tend to fall on top of one another. The second problem can be eliminated by adding vertical jitter. However I was thinking about the following plot. Instead of plotting all the 0,1 points, instead divide the x axis into bins. In each bin, find the average y value. Then plot (x=average of x values in bin, y=average of 0,1 values in bin). So if I use 10 bins I have 10 points in the plot and now the y-values are proportions instead of 0/1. Is this a plot that other people have used (refs appreciated)? If so maybe someone has code to do this. Otherwise, I am not sure of how to do this in R. Could someone help me? Thanks very much. Bill -.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.- r-help mailing list -- Read http://www.ci.tuwien.ac.at/~hornik/R/R-FAQ.html Send "info", "help", or "[un]subscribe" (in the "body", not the subject !) To: r-help-request at stat.math.ethz.ch _._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._
> x<-sort(runif(100,1,20)) > p<-pnorm(x,10,3) > y<-as.numeric(runif(x)<p) > plot(x,y) > lines(x,p)df<-data.frame(x,y) aggregate(df,list(x=(x<5),(x>5)&(x<10),(x>10) & (x<15),(x>15)), FUN=mean) gives me what I want but if anyone has a better way to collect the observations into bins I'd like to hear it. It would be nice to pass along something like breaks<-c(5,10,15,20) Thanks Bill Simpson -.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.- r-help mailing list -- Read http://www.ci.tuwien.ac.at/~hornik/R/R-FAQ.html Send "info", "help", or "[un]subscribe" (in the "body", not the subject !) To: r-help-request at stat.math.ethz.ch _._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._
The loess smoother, with outlier detection turned off, is an excellent way to estimate the relationship between a continuous variable and the probability of an event, based on a binary dependent variable. Use lowess(x,y,iter=0). I do wonder about the way in which your data are generated, however. You might think about p <- whatever # and check that if you use pnorm the # first arg to pnorm spans the right metric y <- 1*(runif(n) <= p) # n=100 in your example Bill Simpson wrote:> > I have some Bernoulli data something like this: > x<-sort(runif(100,1,20)) > p<-pnorm(x,10,3) > y<-as.numeric(runif(x)<p) > plot(x,y) > lines(x,p) > > This plot is not very satisfactory because the ogive does not visually > fit the (0,1) points very well, and also because the points tend to fall > on top of one another. The second problem can be eliminated by adding > vertical jitter. However I was thinking about the following plot. Instead > of plotting all the 0,1 points, instead divide the x axis into bins. In > each bin, find the average y value. Then plot (x=average of x values in > bin, y=average of 0,1 values in bin). So if I use 10 bins I have 10 points > in the plot and now the y-values are proportions instead of 0/1. Is this > a plot that other people have used (refs appreciated)? If so maybe someone > has code to do this. Otherwise, I am not sure of how to do this in R. > Could someone help me? Thanks very much. > > Bill > > -.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.- > r-help mailing list -- Read http://www.ci.tuwien.ac.at/~hornik/R/R-FAQ.html > Send "info", "help", or "[un]subscribe" > (in the "body", not the subject !) To: r-help-request at stat.math.ethz.ch > _._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._-- Frank E Harrell Jr Prof. of Biostatistics & Statistics Div. of Biostatistics & Epidem. Dept. of Health Evaluation Sciences U. Virginia School of Medicine http://hesweb1.med.virginia.edu/biostat -.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.- r-help mailing list -- Read http://www.ci.tuwien.ac.at/~hornik/R/R-FAQ.html Send "info", "help", or "[un]subscribe" (in the "body", not the subject !) To: r-help-request at stat.math.ethz.ch _._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._
In the next release of the gregmisc library there is a function "wapply" that applies a specified function over 'windows' of x-y data (code below). You would use it in this case as: x<-sort(runif(100,1,20)) p<-pnorm(x,10,3) y<-as.numeric(runif(x)<p) plot(x,y) lines(x,p) # draw a line with 10 points, each computed on regions spanning 1/5th of the x range lines(wapply(x,y,n=10,width=1/5),col="red") -Greg # $Id: wapply.R,v 1.2 2001/08/31 23:45:45 warneg Exp $ # "wapply" _ function( x, y, fun=mean, method="range", width=1/10, n=50, ...) { method <- match.arg(method, c("width","range","nobs","fraction")) if(method == "width" || method == "range" ) { if(method=="range") width <- width * range(x) pts _ seq(min(x),max(x),length.out=n) result _ sapply( pts, function(pts,y,width,fun,XX,...) { low _ min((pts-width/2),max(XX)) high _ max((pts+width/2), min(XX)) return (fun(y[(XX>= low) & (XX<=high)],...)) }, y=y, width=width, fun=fun, XX = x, ...) return(x=pts,y=result) } else # method=="nobs" || method=="fraction" { if( method=="fraction") width <- floor(length(x) * width) ord <- order(x) x <- x[ord] y <- y[ord] n <- length(x) center <- 1:n below <- sapply(center - width/2, function(XX) max(1,XX) ) above <- sapply(center + width/2, function(XX) min(n,XX) ) retval <- list() retval$x <- x retval$y <- apply(cbind(below,above), 1, function(x) fun(y[x[1]:x[2]],...) ) return(retval) } } > -----Original Message----- > From: Bill Simpson [mailto:wsi at gcal.ac.uk] > Sent: Tuesday, October 02, 2001 7:57 AM > To: r-help at stat.math.ethz.ch > Subject: Re: [R] plot of Bernoulli data > > > > x<-sort(runif(100,1,20)) > > p<-pnorm(x,10,3) > > y<-as.numeric(runif(x)<p) > > plot(x,y) > > lines(x,p) > df<-data.frame(x,y) > aggregate(df,list(x=(x<5),(x>5)&(x<10),(x>10) & > (x<15),(x>15)), FUN=mean) > gives me what I want but if anyone has a better way to collect the > observations into bins I'd like to hear it. It would be nice to > pass along something like > breaks<-c(5,10,15,20) > > Thanks > > Bill Simpson > > -.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-. > -.-.-.-.-.-.-.-.-.- > r-help mailing list -- Read > http://www.ci.tuwien.ac.at/~hornik/R/R-FAQ.html > Send "info", "help", or "[un]subscribe" > (in the "body", not the subject !) To: > r-help-request at stat.math.ethz.ch > _._._._._._._._._._._._._._._._._._._._._._._._._._._._._._. > _._._._._._._._._._ > LEGAL NOTICE Unless expressly stated otherwise, this message is confidential and may be privileged. It is intended for the addressee(s) only. Access to this E-mail by anyone else is unauthorized. If you are not an addressee, any disclosure or copying of the contents of this E-mail or any action taken (or not taken) in reliance on it is unauthorized and may be unlawful. If you are not an addressee, please inform the sender immediately. -.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.- r-help mailing list -- Read http://www.ci.tuwien.ac.at/~hornik/R/R-FAQ.html Send "info", "help", or "[un]subscribe" (in the "body", not the subject !) To: r-help-request at stat.math.ethz.ch _._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._
Hi Frank, Actually the 'wapply' function I have does 'overlapping' windows. (It would have to if it uses 1/5 of the range at each of 10 points, n'est-pas?) The main reason I created it was to add a 'local variance' estimate to plots in an attempt to more easily detect changes in variability. EG: x <- rnorm(500,sd=10) y <- rnorm(500,sd=sqrt(abs(x))) plot(x,y) sdplus <- function(x,sign=1) { if(length(x)>0) mean(x) + sign * sqrt(var(x)) else NA } lines(wapply(x,y,fun=sdplus,width=1/5),col="red") lines(wapply(x,y,fun=sdplus,sign=-1,width=1/5),col="red") Not very pretty, but does what I need. I'm not familiar with the trellace/lattice functions. Is there a better or more elegant way to do this? -Greg > -----Original Message----- > From: Frank E Harrell Jr [mailto:fharrell at virginia.edu] > Sent: Thursday, October 04, 2001 9:06 AM > To: Warnes, Gregory R > Cc: 'Bill Simpson'; 'r-help at stat.math.ethz.ch' > Subject: Re: [R] plot of Bernoulli data > > > Greg - My model for this type of application is to > either use > > - shingle objects with Trellis/Lattice to allow overlapping > intervals (and for this application there is good reason > for intervals to overlap if you are not using the > better loess/lowess solution) > or > - the cut2 function in the soon (I hope) to be beta-released > Hmisc library, with tapply and all the other stratified > computation functions. cut2 has the same arguments as > cut but also arguments g (number of quantile groups) and > m (number of observations per interval) > > cut2 at present does not handle fixed windows as you > describe, but in general such windows are problematic. > > The bigger point though is that there may be a slight > advantage in separating the stratification code (e.g., > cut2) from the computation code (e.g., tapply and > summarize (in Hmisc)). > > Frank Harrell > > > "Warnes, Gregory R" wrote: > > > > In the next release of the gregmisc library there is a > function "wapply" > > that applies a specified function over 'windows' of x-y > data (code below). > > You would use it in this case as: > > > > x<-sort(runif(100,1,20)) > > p<-pnorm(x,10,3) > > y<-as.numeric(runif(x)<p) > > plot(x,y) > > lines(x,p) > > > > # draw a line with 10 points, each computed on regions > spanning 1/5th of > > the x range > > lines(wapply(x,y,n=10,width=1/5),col="red") > > > > -Greg > > > > # $Id: wapply.R,v 1.2 2001/08/31 23:45:45 warneg Exp $ > > # > > "wapply" _ function( x, y, fun=mean, method="range", > > width=1/10, n=50, ...) > > { > > method <- match.arg(method, > c("width","range","nobs","fraction")) > > > > if(method == "width" || method == "range" ) > > { > > if(method=="range") > > width <- width * range(x) > > > > > > pts _ seq(min(x),max(x),length.out=n) > > > > result _ sapply( pts, function(pts,y,width,fun,XX,...) > > { > > low _ min((pts-width/2),max(XX)) > > high _ max((pts+width/2), min(XX)) > > return (fun(y[(XX>= low) & > (XX<=high)],...)) > > }, > > y=y, > > width=width, > > fun=fun, > > XX = x, > > ...) > > > > return(x=pts,y=result) > > } > > else # method=="nobs" || method=="fraction" > > { > > if( method=="fraction") > > width <- floor(length(x) * width) > > > > ord <- order(x) > > x <- x[ord] > > y <- y[ord] > > > > n <- length(x) > > center <- 1:n > > below <- sapply(center - width/2, function(XX) max(1,XX) ) > > above <- sapply(center + width/2, function(XX) min(n,XX) ) > > > > retval <- list() > > retval$x <- x > > retval$y <- apply(cbind(below,above), 1, > > function(x) fun(y[x[1]:x[2]],...) ) > > > > return(retval) > > } > > > > } > > > > > -----Original Message----- > > > From: Bill Simpson [mailto:wsi at gcal.ac.uk] > > > Sent: Tuesday, October 02, 2001 7:57 AM > > > To: r-help at stat.math.ethz.ch > > > Subject: Re: [R] plot of Bernoulli data > > > > > > > > > > x<-sort(runif(100,1,20)) > > > > p<-pnorm(x,10,3) > > > > y<-as.numeric(runif(x)<p) > > > > plot(x,y) > > > > lines(x,p) > > > df<-data.frame(x,y) > > > aggregate(df,list(x=(x<5),(x>5)&(x<10),(x>10) & > > > (x<15),(x>15)), FUN=mean) > > > gives me what I want but if anyone has a better way > to collect the > > > observations into bins I'd like to hear it. It would > be nice to > > > pass along something like > > > breaks<-c(5,10,15,20) > > > > > > Thanks > > > > > > Bill Simpson > > > > > > -.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-. > > > -.-.-.-.-.-.-.-.-.- > > > r-help mailing list -- Read > > > http://www.ci.tuwien.ac.at/~hornik/R/R-FAQ.html > > > Send "info", "help", or "[un]subscribe" > > > (in the "body", not the subject !) To: > > > r-help-request at stat.math.ethz.ch > > > _._._._._._._._._._._._._._._._._._._._._._._._._._._._._._. > > > _._._._._._._._._._ > > > > > > > LEGAL NOTICE > > Unless expressly stated otherwise, this message is > confidential and may be privileged. It is intended for the > addressee(s) only. Access to this E-mail by anyone else is > unauthorized. If you are not an addressee, any disclosure > or copying of the contents of this E-mail or any action > taken (or not taken) in reliance on it is unauthorized and > may be unlawful. If you are not an addressee, please inform > the sender immediately. > > > -.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-. > -.-.-.-.-.-.-.-.-.- > > r-help mailing list -- Read http://www.ci.tuwien.ac.at/~hornik/R/R-FAQ.html> Send "info", "help", or "[un]subscribe" > (in the "body", not the subject !) To: r-help-request at stat.math.ethz.ch >_._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._. _._ -- Frank E Harrell Jr Prof. of Biostatistics & Statistics Div. of Biostatistics & Epidem. Dept. of Health Evaluation Sciences U. Virginia School of Medicine http://hesweb1.med.virginia.edu/biostat -.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-. -.- r-help mailing list -- Read http://www.ci.tuwien.ac.at/~hornik/R/R-FAQ.html Send "info", "help", or "[un]subscribe" (in the "body", not the subject !) To: r-help-request at stat.math.ethz.ch _._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._. _._ -.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.- r-help mailing list -- Read http://www.ci.tuwien.ac.at/~hornik/R/R-FAQ.html Send "info", "help", or "[un]subscribe" (in the "body", not the subject !) To: r-help-request at stat.math.ethz.ch _._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._