Hello, I have a data set of about 300.000 measurements made by an STM which should apporximately fix a normal (Gaussian) distribution. I have imported the data in R and used plot(density()) to get a nice plot of the distribution which in fact looks like a real Gaussian. However, the integral over the surface is not equal to one (I know since some of the plots extend to numbers greater then 1). Is there a way to normalize the data so the density function will actualy yield the probability of x (a height in my case)? This is my code so far: #Input path path <- "G:\\C\\Data txt\\1au300.txt" #Dataverwerking data <- read.table(path, header=TRUE) rows <- length(data$height) height <- data$height[1:rows] dens <-density(height) mean <- mean(height) sd <- sd(height) min <- min(hnorm) max <- max(hnorm) #Plot par(new=FALSE) curve(dnorm(x,m=mean,sd=sd),from=min,to=max, xlab="", ylab="", col="white", lwd=2) points(dens, type="h", col="grey" ) par(new=TRUE) curve(dnorm(x,m=mean,sd=sd),from=min,to=max, xlab="Height (nm)", ylab="Density", lwd=2, col="darkred") Thanks [[alternative HTML version deleted]]
I am not sure what you mean when you say it does not integrate to 1. Here are a couple of cases, and it seems fine to me:> x <- density(1:30) > str(x)List of 7 $ x : num [1:512] -11.0 -10.9 -10.8 -10.7 -10.6 ... $ y : num [1:512] 6.66e-05 7.22e-05 7.84e-05 8.49e-05 9.20e-05 ... $ bw : num 4.01 $ n : int 30 $ call : language density.default(x = 1:30) $ data.name: chr "1:30" $ has.na : logi FALSE - attr(*, "class")= chr "density"> plot(x) > sum(diff(x$x) * (head(x$y,-1) + tail(x$y,-1))/2) # integrate[1] 1.000823> x <- density(rnorm(1000)) > sum(diff(x$x) * (head(x$y,-1) + tail(x$y,-1))/2) # integrate[1] 1.000974>On Nov 15, 2007 2:47 PM, Joren Heit <jorenheit at gmail.com> wrote:> Hello, > > I have a data set of about 300.000 measurements made by an STM which should > apporximately fix a normal (Gaussian) distribution. > I have imported the data in R and used plot(density()) to get a nice plot of > the distribution which in fact looks like a real Gaussian. > However, the integral over the surface is not equal to one (I know since > some of the plots extend to numbers greater then 1). Is there a way to > normalize the data so the density function will actualy yield the > probability of x (a height in my case)? > This is my code so far: > > #Input path > path <- "G:\\C\\Data txt\\1au300.txt" > > #Dataverwerking > data <- read.table(path, header=TRUE) > rows <- length(data$height) > height <- data$height[1:rows] > dens <-density(height) > > mean <- mean(height) > sd <- sd(height) > min <- min(hnorm) > max <- max(hnorm) > > #Plot > par(new=FALSE) > curve(dnorm(x,m=mean,sd=sd),from=min,to=max, xlab="", ylab="", col="white", > lwd=2) > points(dens, type="h", col="grey" ) > par(new=TRUE) > curve(dnorm(x,m=mean,sd=sd),from=min,to=max, xlab="Height (nm)", > ylab="Density", lwd=2, col="darkred") > > > Thanks > > [[alternative HTML version deleted]] > > ______________________________________________ > R-help at r-project.org mailing list > https://stat.ethz.ch/mailman/listinfo/r-help > PLEASE do read the posting guide http://www.R-project.org/posting-guide.html > and provide commented, minimal, self-contained, reproducible code. >-- Jim Holtman Cincinnati, OH +1 513 646 9390 What is the problem you are trying to solve?
Maybe Joren means that the y axis has values greater than 1? If that is the case, that is certainly not evidence of any problem; the density can have values larger than 1 and still integrate to 1. (And, just as a silly example, try "dnorm(0, mean = 0, sd = 0.1)"). Best, R. On Nov 15, 2007 11:05 PM, jim holtman <jholtman at gmail.com> wrote:> I am not sure what you mean when you say it does not integrate to 1. > Here are a couple of cases, and it seems fine to me: > > > x <- density(1:30) > > str(x) > List of 7 > $ x : num [1:512] -11.0 -10.9 -10.8 -10.7 -10.6 ... > $ y : num [1:512] 6.66e-05 7.22e-05 7.84e-05 8.49e-05 9.20e-05 ... > $ bw : num 4.01 > $ n : int 30 > $ call : language density.default(x = 1:30) > $ data.name: chr "1:30" > $ has.na : logi FALSE > - attr(*, "class")= chr "density" > > plot(x) > > sum(diff(x$x) * (head(x$y,-1) + tail(x$y,-1))/2) # integrate > [1] 1.000823 > > x <- density(rnorm(1000)) > > sum(diff(x$x) * (head(x$y,-1) + tail(x$y,-1))/2) # integrate > [1] 1.000974 > > > > > On Nov 15, 2007 2:47 PM, Joren Heit <jorenheit at gmail.com> wrote: > > Hello, > > > > I have a data set of about 300.000 measurements made by an STM which should > > apporximately fix a normal (Gaussian) distribution. > > I have imported the data in R and used plot(density()) to get a nice plot of > > the distribution which in fact looks like a real Gaussian. > > However, the integral over the surface is not equal to one (I know since > > some of the plots extend to numbers greater then 1). Is there a way to > > normalize the data so the density function will actualy yield the > > probability of x (a height in my case)? > > This is my code so far: > > > > #Input path > > path <- "G:\\C\\Data txt\\1au300.txt" > > > > #Dataverwerking > > data <- read.table(path, header=TRUE) > > rows <- length(data$height) > > height <- data$height[1:rows] > > dens <-density(height) > > > > mean <- mean(height) > > sd <- sd(height) > > min <- min(hnorm) > > max <- max(hnorm) > > > > #Plot > > par(new=FALSE) > > curve(dnorm(x,m=mean,sd=sd),from=min,to=max, xlab="", ylab="", col="white", > > lwd=2) > > points(dens, type="h", col="grey" ) > > par(new=TRUE) > > curve(dnorm(x,m=mean,sd=sd),from=min,to=max, xlab="Height (nm)", > > ylab="Density", lwd=2, col="darkred") > > > > > > Thanks > > > > [[alternative HTML version deleted]] > > > > ______________________________________________ > > R-help at r-project.org mailing list > > https://stat.ethz.ch/mailman/listinfo/r-help > > PLEASE do read the posting guide http://www.R-project.org/posting-guide.html > > and provide commented, minimal, self-contained, reproducible code. > > > > > > -- > Jim Holtman > Cincinnati, OH > +1 513 646 9390 > > What is the problem you are trying to solve? > > ______________________________________________ > R-help at r-project.org mailing list > https://stat.ethz.ch/mailman/listinfo/r-help > PLEASE do read the posting guide http://www.R-project.org/posting-guide.html > and provide commented, minimal, self-contained, reproducible code. >-- Ramon Diaz-Uriarte Statistical Computing Team Structural Biology and Biocomputing Programme Spanish National Cancer Centre (CNIO) http://ligarto.org/rdiaz