thr3ads.net - R help - [R] plot of Bernoulli data [Oct 2001]

If this information is useful, please help other people find it:
Share via:

Bill Simpson

2001-Oct-02 09:23 UTC

[R] plot of Bernoulli data

I have some Bernoulli data something like this:
 x<-sort(runif(100,1,20))
 p<-pnorm(x,10,3)
 y<-as.numeric(runif(x)<p)
 plot(x,y)
 lines(x,p)

This plot is not very satisfactory because the ogive does not visually
fit the (0,1) points very well, and also because the points tend to fall
on top of one another. The second problem can be eliminated by adding
vertical jitter. However I was thinking about the following plot. Instead
of plotting all the 0,1 points, instead divide the x axis into bins. In
each bin, find the average y value. Then plot (x=average of x values in
bin, y=average of 0,1 values in bin). So if I use 10 bins I have 10 points
in the plot and now the y-values are proportions instead of 0/1.  Is this
a plot that other people have used (refs appreciated)? If so maybe someone
has code to do this. Otherwise, I am not sure of how to do this in R.
Could someone help me?  Thanks very much.

Bill

-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-
r-help mailing list -- Read http://www.ci.tuwien.ac.at/~hornik/R/R-FAQ.html
Send "info", "help", or "[un]subscribe"
(in the "body", not the subject !)  To: r-help-request at
stat.math.ethz.ch
_._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._

Bill Simpson

2001-Oct-02 11:57 UTC

head link

[R] plot of Bernoulli data

>  x<-sort(runif(100,1,20))
>  p<-pnorm(x,10,3)
>  y<-as.numeric(runif(x)<p)
>  plot(x,y)
>  lines(x,p)df<-data.frame(x,y)
aggregate(df,list(x=(x<5),(x>5)&(x<10),(x>10) &
(x<15),(x>15)), FUN=mean)
gives me what I want but if anyone has a better way to collect the
observations into bins I'd like to hear it. It would be nice to
pass along something like
breaks<-c(5,10,15,20)

Thanks

Bill Simpson

-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-
r-help mailing list -- Read http://www.ci.tuwien.ac.at/~hornik/R/R-FAQ.html
Send "info", "help", or "[un]subscribe"
(in the "body", not the subject !)  To: r-help-request at
stat.math.ethz.ch
_._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._

Frank E Harrell Jr

2001-Oct-02 13:15 UTC

head link

[R] plot of Bernoulli data

The loess smoother, with outlier detection turned off,
is an excellent way to estimate the relationship
between a continuous variable and the probability
of an event, based on a binary dependent variable.
Use lowess(x,y,iter=0).  I do wonder about the way in
which your data are generated, however.  You
might think about

 p <- whatever  # and check that if you use pnorm the
                # first arg to pnorm spans the right metric
 y <- 1*(runif(n) <= p)  # n=100 in your example

Bill Simpson wrote:> 
> I have some Bernoulli data something like this:
>  x<-sort(runif(100,1,20))
>  p<-pnorm(x,10,3)
>  y<-as.numeric(runif(x)<p)
>  plot(x,y)
>  lines(x,p)
> 
> This plot is not very satisfactory because the ogive does not visually
> fit the (0,1) points very well, and also because the points tend to fall
> on top of one another. The second problem can be eliminated by adding
> vertical jitter. However I was thinking about the following plot. Instead
> of plotting all the 0,1 points, instead divide the x axis into bins. In
> each bin, find the average y value. Then plot (x=average of x values in
> bin, y=average of 0,1 values in bin). So if I use 10 bins I have 10 points
> in the plot and now the y-values are proportions instead of 0/1.  Is this
> a plot that other people have used (refs appreciated)? If so maybe someone
> has code to do this. Otherwise, I am not sure of how to do this in R.
> Could someone help me?  Thanks very much.
> 
> Bill
> 
>
-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-
> r-help mailing list -- Read http://www.ci.tuwien.ac.at/~hornik/R/R-FAQ.html
> Send "info", "help", or "[un]subscribe"
> (in the "body", not the subject !)  To: r-help-request at
stat.math.ethz.ch
>
_._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._
-- 
Frank E Harrell Jr              Prof. of Biostatistics & Statistics
Div. of Biostatistics & Epidem. Dept. of Health Evaluation Sciences
U. Virginia School of Medicine  http://hesweb1.med.virginia.edu/biostat
-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-
r-help mailing list -- Read http://www.ci.tuwien.ac.at/~hornik/R/R-FAQ.html
Send "info", "help", or "[un]subscribe"
(in the "body", not the subject !)  To: r-help-request at
stat.math.ethz.ch
_._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._

Warnes, Gregory R

2001-Oct-03 13:50 UTC

head link

[R] plot of Bernoulli data

In the next release of the gregmisc library there is a function
"wapply"
that applies a specified function over 'windows' of x-y data (code
below).
You would use it in this case as:

 x<-sort(runif(100,1,20))
 p<-pnorm(x,10,3)
 y<-as.numeric(runif(x)<p)
 plot(x,y)
 lines(x,p)

 # draw a line with 10 points, each computed on regions spanning 1/5th of
the x range
 lines(wapply(x,y,n=10,width=1/5),col="red")

-Greg



# $Id: wapply.R,v 1.2 2001/08/31 23:45:45 warneg Exp $
#
"wapply" _ function( x, y, fun=mean, method="range",
                    width=1/10, n=50, ...)
{
  method <- match.arg(method,
c("width","range","nobs","fraction"))

  if(method == "width" || method == "range" )
    {
      if(method=="range")
        width <- width * range(x)
      
      
      pts _ seq(min(x),max(x),length.out=n)
      
      result _ sapply( pts, function(pts,y,width,fun,XX,...)
                      {
                        low _ min((pts-width/2),max(XX))
                        high _ max((pts+width/2), min(XX))
                        return (fun(y[(XX>= low) & (XX<=high)],...))
                      },
                      y=y,
                      width=width,
                      fun=fun,
                      XX = x,
                      ...)
      
      return(x=pts,y=result)
    }
  else # method=="nobs" || method=="fraction"
    {
      if( method=="fraction")
        width <- floor(length(x) * width)
      
      ord <- order(x)
      x  <- x[ord]
      y  <- y[ord]

      n  <- length(x)
      center  <- 1:n
      below <- sapply(center - width/2, function(XX) max(1,XX) )
      above <- sapply(center + width/2, function(XX) min(n,XX) )

      retval  <- list()
      retval$x  <- x
      retval$y  <- apply(cbind(below,above), 1,
                         function(x) fun(y[x[1]:x[2]],...) )
                         
      return(retval)
    }
      
}



 >  -----Original Message-----
 >  From: Bill Simpson [mailto:wsi at gcal.ac.uk]
 >  Sent: Tuesday, October 02, 2001 7:57 AM
 >  To: r-help at stat.math.ethz.ch
 >  Subject: Re: [R] plot of Bernoulli data
 >  
 >  
 >  >  x<-sort(runif(100,1,20))
 >  >  p<-pnorm(x,10,3)
 >  >  y<-as.numeric(runif(x)<p)
 >  >  plot(x,y)
 >  >  lines(x,p)
 >  df<-data.frame(x,y)
 >  aggregate(df,list(x=(x<5),(x>5)&(x<10),(x>10) & 
 >  (x<15),(x>15)), FUN=mean)
 >  gives me what I want but if anyone has a better way to collect the
 >  observations into bins I'd like to hear it. It would be nice to
 >  pass along something like
 >  breaks<-c(5,10,15,20)
 >  
 >  Thanks
 >  
 >  Bill Simpson
 >  
 >  -.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.
 >  -.-.-.-.-.-.-.-.-.-
 >  r-help mailing list -- Read 
 >  http://www.ci.tuwien.ac.at/~hornik/R/R-FAQ.html
 >  Send "info", "help", or "[un]subscribe"
 >  (in the "body", not the subject !)  To: 
 >  r-help-request at stat.math.ethz.ch
 >  _._._._._._._._._._._._._._._._._._._._._._._._._._._._._._.
 >  _._._._._._._._._._
 >  


LEGAL NOTICE
Unless expressly stated otherwise, this message is confidential and may be
privileged. It is intended for the addressee(s) only. Access to this E-mail by
anyone else is unauthorized. If you are not an addressee, any disclosure or
copying of the contents of this E-mail or any action taken (or not taken) in
reliance on it is unauthorized and may be unlawful. If you are not an addressee,
please inform the sender immediately.
-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-
r-help mailing list -- Read http://www.ci.tuwien.ac.at/~hornik/R/R-FAQ.html
Send "info", "help", or "[un]subscribe"
(in the "body", not the subject !)  To: r-help-request at
stat.math.ethz.ch
_._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._

Warnes, Gregory R

2001-Oct-04 15:54 UTC

head link

[R] plot of Bernoulli data

Hi Frank,

Actually the 'wapply' function I have does 'overlapping'
windows.  (It would
have to if it uses 1/5 of the range at each of 10 points, n'est-pas?)  The
main reason I created it was to add a 'local variance' estimate to plots
in
an attempt to more easily detect changes in variability. EG:

x <- rnorm(500,sd=10)
y <- rnorm(500,sd=sqrt(abs(x)))
plot(x,y)

sdplus <- function(x,sign=1) 
  {
    if(length(x)>0) 
        mean(x) + sign * sqrt(var(x)) 
    else 
        NA
  }
lines(wapply(x,y,fun=sdplus,width=1/5),col="red")
lines(wapply(x,y,fun=sdplus,sign=-1,width=1/5),col="red")

Not very pretty, but does what I need. 

I'm not familiar with the trellace/lattice functions.  Is there a better or
more elegant way to do this?

-Greg

 >  -----Original Message-----
 >  From: Frank E Harrell Jr [mailto:fharrell at virginia.edu]
 >  Sent: Thursday, October 04, 2001 9:06 AM
 >  To: Warnes, Gregory R
 >  Cc: 'Bill Simpson'; 'r-help at stat.math.ethz.ch'
 >  Subject: Re: [R] plot of Bernoulli data
 >  
 >  
 >  Greg - My model for this type of application is to
 >  either use
 >  
 >  - shingle objects with Trellis/Lattice to allow overlapping
 >    intervals (and for this application there is good reason
 >    for intervals to overlap if you are not using the
 >    better loess/lowess solution)
 >  or
 >  - the cut2 function in the soon (I hope) to be beta-released
 >    Hmisc library, with tapply and all the other stratified
 >    computation functions.  cut2 has the same arguments as
 >    cut but also arguments g (number of quantile groups) and
 >    m (number of observations per interval)
 >  
 >  cut2 at present does not handle fixed windows as you
 >  describe, but in general such windows are problematic.
 >  
 >  The bigger point though is that there may be a slight
 >  advantage in separating the stratification code (e.g.,
 >  cut2) from the computation code (e.g., tapply and
 >  summarize (in Hmisc)).
 >  
 >  Frank Harrell
 >  
 >  
 >  "Warnes, Gregory R" wrote:
 >  > 
 >  > In the next release of the gregmisc library there is a 
 >  function "wapply"
 >  > that applies a specified function over 'windows' of x-y 
 >  data (code below).
 >  > You would use it in this case as:
 >  > 
 >  >  x<-sort(runif(100,1,20))
 >  >  p<-pnorm(x,10,3)
 >  >  y<-as.numeric(runif(x)<p)
 >  >  plot(x,y)
 >  >  lines(x,p)
 >  > 
 >  >  # draw a line with 10 points, each computed on regions 
 >  spanning 1/5th of
 >  > the x range
 >  >  lines(wapply(x,y,n=10,width=1/5),col="red")
 >  > 
 >  > -Greg
 >  > 
 >  > # $Id: wapply.R,v 1.2 2001/08/31 23:45:45 warneg Exp $
 >  > #
 >  > "wapply" _ function( x, y, fun=mean,
method="range",
 >  >                     width=1/10, n=50, ...)
 >  > {
 >  >   method <- match.arg(method, 
 > 
c("width","range","nobs","fraction"))
 >  > 
 >  >   if(method == "width" || method == "range" )
 >  >     {
 >  >       if(method=="range")
 >  >         width <- width * range(x)
 >  > 
 >  > 
 >  >       pts _ seq(min(x),max(x),length.out=n)
 >  > 
 >  >       result _ sapply( pts, function(pts,y,width,fun,XX,...)
 >  >                       {
 >  >                         low _ min((pts-width/2),max(XX))
 >  >                         high _ max((pts+width/2), min(XX))
 >  >                         return (fun(y[(XX>= low) & 
 >  (XX<=high)],...))
 >  >                       },
 >  >                       y=y,
 >  >                       width=width,
 >  >                       fun=fun,
 >  >                       XX = x,
 >  >                       ...)
 >  > 
 >  >       return(x=pts,y=result)
 >  >     }
 >  >   else # method=="nobs" || method=="fraction"
 >  >     {
 >  >       if( method=="fraction")
 >  >         width <- floor(length(x) * width)
 >  > 
 >  >       ord <- order(x)
 >  >       x  <- x[ord]
 >  >       y  <- y[ord]
 >  > 
 >  >       n  <- length(x)
 >  >       center  <- 1:n
 >  >       below <- sapply(center - width/2, function(XX) max(1,XX) )
 >  >       above <- sapply(center + width/2, function(XX) min(n,XX) )
 >  > 
 >  >       retval  <- list()
 >  >       retval$x  <- x
 >  >       retval$y  <- apply(cbind(below,above), 1,
 >  >                          function(x) fun(y[x[1]:x[2]],...) )
 >  > 
 >  >       return(retval)
 >  >     }
 >  > 
 >  > }
 >  > 
 >  >  >  -----Original Message-----
 >  >  >  From: Bill Simpson [mailto:wsi at gcal.ac.uk]
 >  >  >  Sent: Tuesday, October 02, 2001 7:57 AM
 >  >  >  To: r-help at stat.math.ethz.ch
 >  >  >  Subject: Re: [R] plot of Bernoulli data
 >  >  >
 >  >  >
 >  >  >  >  x<-sort(runif(100,1,20))
 >  >  >  >  p<-pnorm(x,10,3)
 >  >  >  >  y<-as.numeric(runif(x)<p)
 >  >  >  >  plot(x,y)
 >  >  >  >  lines(x,p)
 >  >  >  df<-data.frame(x,y)
 >  >  >  aggregate(df,list(x=(x<5),(x>5)&(x<10),(x>10)
&
 >  >  >  (x<15),(x>15)), FUN=mean)
 >  >  >  gives me what I want but if anyone has a better way 
 >  to collect the
 >  >  >  observations into bins I'd like to hear it. It would 
 >  be nice to
 >  >  >  pass along something like
 >  >  >  breaks<-c(5,10,15,20)
 >  >  >
 >  >  >  Thanks
 >  >  >
 >  >  >  Bill Simpson
 >  >  >
 >  >  >  -.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.
 >  >  >  -.-.-.-.-.-.-.-.-.-
 >  >  >  r-help mailing list -- Read
 >  >  >  http://www.ci.tuwien.ac.at/~hornik/R/R-FAQ.html
 >  >  >  Send "info", "help", or
"[un]subscribe"
 >  >  >  (in the "body", not the subject !)  To:
 >  >  >  r-help-request at stat.math.ethz.ch
 >  >  >  _._._._._._._._._._._._._._._._._._._._._._._._._._._._._._.
 >  >  >  _._._._._._._._._._
 >  >  >
 >  > 
 >  > LEGAL NOTICE
 >  > Unless expressly stated otherwise, this message is 
 >  confidential and may be privileged. It is intended for the 
 >  addressee(s) only. Access to this E-mail by anyone else is 
 >  unauthorized. If you are not an addressee, any disclosure 
 >  or copying of the contents of this E-mail or any action 
 >  taken (or not taken) in reliance on it is unauthorized and 
 >  may be unlawful. If you are not an addressee, please inform 
 >  the sender immediately.
 >  > 
 >  -.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.
 >  -.-.-.-.-.-.-.-.-.-
 >  > r-help mailing list -- Read 
http://www.ci.tuwien.ac.at/~hornik/R/R-FAQ.html> Send "info", "help", or "[un]subscribe"
> (in the "body", not the subject !)  To: r-help-request at
stat.math.ethz.ch
>_._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._.
_._

-- 
Frank E Harrell Jr              Prof. of Biostatistics & Statistics
Div. of Biostatistics & Epidem. Dept. of Health Evaluation Sciences
U. Virginia School of Medicine  http://hesweb1.med.virginia.edu/biostat
-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.
-.-
r-help mailing list -- Read http://www.ci.tuwien.ac.at/~hornik/R/R-FAQ.html
Send "info", "help", or "[un]subscribe"
(in the "body", not the subject !)  To: r-help-request at
stat.math.ethz.ch
_._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._.
_._
-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-
r-help mailing list -- Read http://www.ci.tuwien.ac.at/~hornik/R/R-FAQ.html
Send "info", "help", or "[un]subscribe"
(in the "body", not the subject !)  To: r-help-request at
stat.math.ethz.ch
_._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._

Seemingly Similar Threads

Search for more seemingly similar threads

R help - Oct 2001 - plot of Bernoulli data

[R] plot of Bernoulli data

[R] plot of Bernoulli data

[R] plot of Bernoulli data

[R] plot of Bernoulli data

[R] plot of Bernoulli data

Seemingly Similar Threads