Dear all,

I have a surprising problem with the representation of frequencies in a histogram.

Consider, for example, the R code:

    b<-rnorm(2000,3.5,0.3)
    hist(b,freq=F)

When I plotted the histogram, I expected the values on the y-axis (the probability) to vary between 0 and 1. Instead, they varied within the range 0-1.3.

Have you got any suggestions for obtaining a correct graph with probability within the range 0-1?

Thank you very much!

Best,
/Cristian/

============================================
Cristian Pattaro
============================================
Unit of Epidemiology & Medical Statistics
Department of Medicine and Public Health
University of Verona

http://biometria.univr.it
cristian@biometria.univr.it
============================================
Cristian Pattaro wrote:

> b<-rnorm(2000,3.5,0.3)
> hist(b,freq=F)
>
> When I plotted the histogram, I expected that values in the y-axis (the
> probability) varied between 0 and 1. Instead, they varied within the
> range 0-1.3.
>
> Have you got any suggestion for obtaining a correct graph with
> probability within the range 0-1?

Note that in a histogram it is width * height (and *not* the height alone) that corresponds to the probability.

Uwe Ligges

> ______________________________________________
> R-help at stat.math.ethz.ch mailing list
> https://www.stat.math.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide! http://www.R-project.org/posting-guide.html
On Tue, 25 May 2004 12:27:40 +0200 Cristian Pattaro wrote:

> b<-rnorm(2000,3.5,0.3)
> hist(b,freq=F)
>
> When I plotted the histogram, I expected that values in the y-axis
> (the probability) varied between 0 and 1. Instead, they varied within
> the range 0-1.3.

The y-axis gives the density, not the probability! And the density you are sampling from has

R> dnorm(3.5, mean = 3.5, sd = 0.3)
[1] 1.329808

so you shouldn't be surprised by this.

Z
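A quick way to check this point, as a minimal sketch: the peak of a normal density is 1/(sigma*sqrt(2*pi)), which exceeds 1 for small sigma, while the area under the curve stays 1. The variable names and the finite integration range (mean plus or minus 10 standard deviations) are illustrative choices, not part of the thread.

```r
sigma <- 0.3

# Height of the N(3.5, 0.3^2) density at its mean, two ways
peak         <- dnorm(3.5, mean = 3.5, sd = sigma)
peak_formula <- 1 / (sigma * sqrt(2 * pi))

# Area under the density over mean +/- 10 sd (effectively the whole curve)
area <- integrate(dnorm, lower = 3.5 - 10 * sigma, upper = 3.5 + 10 * sigma,
                  mean = 3.5, sd = sigma)$value

print(c(peak = peak, peak_formula = peak_formula, area = area))
```

The peak is above 1 and the area is still 1 (up to numerical error), which is why a density axis is not bounded by 1.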
On 25-May-04 Cristian Pattaro wrote:

> b<-rnorm(2000,3.5,0.3)
> hist(b,freq=F)
>
> When I plotted the histogram, I expected that values in
> the y-axis (the probability) varied between 0 and 1.
> Instead, they varied within the range 0-1.3.
>
> Have you got any suggestion for obtaining a correct graph
> with probability within the range 0-1?

It depends on the widths of the bins, since what is plotted in the histogram when freq=F is vertically scaled so that

    sum over bins of h*(width of bin) = 1

where h is the height of the histogram bar according to the vertical scale. In other words, hist plots a per-bin estimate of the probability density in the sense of "amount of probability per bin divided by width of bin".

If your bin widths are narrow (and your SD above is 0.3, so you will get quite narrow bins, 0.2 in this case), you may well get values exceeding 1. This is exactly as for the density of the normal distribution itself:

    (1/(sqrt(2*pi)*sigma))*exp(-0.5* ... )

where small values of sigma give density > 1 near the mean.

If you need the actual values of the probabilities in the bins (i.e. n_i/N) then you can force it by constructing a new hist object along the lines of

    h<-hist(b,freq=F)
    h$counts <- h$counts/sum(h$counts)
    plot(h)

When I do this with your example above, the original gives a y-axis from 0 to 1.2 with the tallest bar at about 1.3, whereas "plot(h)" gives exactly the same graph but with the y-axis labelled from 0 to 0.25 and the tallest bar at 0.2625, which shows the probabilities.

Best wishes,
Ted.

--------------------------------------------------------------------
E-Mail: (Ted Harding) <Ted.Harding at nessie.mcc.ac.uk>
Fax-to-email: +44 (0)870 167 1972
Date: 25-May-04                                       Time: 12:15:20
------------------------------ XFMail ------------------------------
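The relation between bar heights, bin widths and per-bin probabilities can be checked numerically. The following is a minimal sketch, assuming an arbitrary seed (only for reproducibility) and using plot = FALSE so that hist returns its object without drawing; the names widths, area and probs are illustrative.

```r
set.seed(1)  # arbitrary seed, only so the sketch is reproducible
b <- rnorm(2000, 3.5, 0.3)

# plot = FALSE returns the histogram object without drawing it
h <- hist(b, plot = FALSE)

widths <- diff(h$breaks)            # bin widths
area   <- sum(h$density * widths)   # heights times widths: sums to 1
probs  <- h$counts / sum(h$counts)  # per-bin probabilities n_i / N

# For every bin, probability = density * width
print(area)
print(max(h$density))
print(max(probs))
```

The densities may exceed 1 when the bins are narrower than 1, but each per-bin probability (density times width) lies between 0 and 1.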
anoly
2004-May-28 16:05 UTC
[R] how to pass defined function with more than one arguments to apply?
Dear all:

I have a problem with the apply function. I have a matrix called tb:

> tb
   V1 V2 V3
1   0  3  1
2   1  4  0
3   0  3  0
4   0  4  0
5   0  3  1
6   1  4  1
7   1  1  0
8   1  3  0
9   0  1  1
10  0  3  1

I want to get the number of rows that match c(0,3,1). I do it this way:

> length(apply(t(tb) == c(0,3,1), 2, all))

I also defined a function, compare<-function(vector1, vector2){...}. For example, compare(1:3, 1:3) will return TRUE and compare(1:3, 2:4) will return FALSE. Then I hoped to call apply(tb,1,compare), but this does not work, because apply only passes one argument to the compare function. Does anyone know how to solve this problem?

Thanks so much.
Anoly
Gabor Grothendieck
2004-May-29 01:13 UTC
[R] how to pass defined function with more than one arguments to apply?
Not sure what you intend with regard to length, but to get a logical vector indicating which rows equal a particular vector:

f1 <- function(tb, row) apply(tb,1,function(x)all(x==row))

# or without using apply:

f2 <- function(tb, row) colSums( abs( (t(tb) - row) ) ) == 0

# or in terms of a general compare function:

f3 <- function(tb, row, compare = function(x)all(x==row)) apply(tb,1,compare)

row <- c(0,3,1)
f1(tb,row)
f2(tb,row)
f3(tb,row)

# If you want the number of matching rows:

length(which(f1(tb,row)))

etc.
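Another option for the original question: apply forwards any arguments given after FUN to FUN itself, so a two-argument function can be used directly. The sketch below is illustrative only; the body of compare is an assumption (the original post elided it as {...}), and tb is rebuilt here from the values in the question.

```r
# tb rebuilt from the values shown in the question
tb <- matrix(c(0, 3, 1,
               1, 4, 0,
               0, 3, 0,
               0, 4, 0,
               0, 3, 1,
               1, 4, 1,
               1, 1, 0,
               1, 3, 0,
               0, 1, 1,
               0, 3, 1),
             ncol = 3, byrow = TRUE,
             dimnames = list(NULL, c("V1", "V2", "V3")))

# Hypothetical stand-in for the poster's compare(); the real body was not shown
compare <- function(vector1, vector2) all(vector1 == vector2)

# Each row is passed as vector1; vector2 is forwarded through apply's '...'
matches   <- apply(tb, 1, compare, vector2 = c(0, 3, 1))
n_matches <- sum(matches)  # number of matching rows

print(which(matches))
print(n_matches)
```

Using sum on the logical vector counts the TRUE entries, which gives the same count as length(which(...)).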