thr3ads.net - R help - [R] geometric mean of probability density functions [Mar 2009]

If this information is useful, please help other people find it:
Share via:

Aaron Spivak

2009-Mar-18 17:54 UTC

[R] geometric mean of probability density functions

Hi,
This is my first time posting to the mailing list, so if I'm doing something
wrong, just let me know.  I've taken ~1000 samples from 8 biological
replicates, and I want to somehow combine the density functions of the
replicates.  Currently, I can plot the density function for each biological
replicate, and I'd like to see how pool of replicates compares to a
simulation I conducted earlier.  I can compare each replicate to the
simulation, but there's a fair amount of variability between replicates.
I'd like to take the geometric mean of the density functions at each point
along the x-axis, but when I compute:
> a<-density(A[,1][A[,1]>=0], n=2^15)
> b<-density(A[,3][A[,3]>=0], n=2^15)
> a$x[1]
[1] -70.47504> b$x[1][1] -69.28902

So I can't simply compute the mean across y-values, because the x-values
don't match.  Is there a way to set the x-values to be the same for multiple
density plots?  Also, there are no negative values in the dataset, so I'd
like to bound the x-axis at 0 if at all possible?  Is there a standard way
to combine density functions?  Thanks for the advice.
-Aaron Spivak

ps. I thought about just pooling all measurements, but I don't think
that's
appropriate because they are from different replicates and the smoothing
kernel depends on the variance in the sample to calculate the distribution.

	[[alternative HTML version deleted]]

David Winsemius

2009-Mar-18 18:33 UTC

head link

[R] geometric mean of probability density functions

If you make your calls to density with common lenth and interval  
parameters you should be able to get better "registration":

  ?density
# this example sums the squared differences
  x <- rnorm(200,1,1)
  x2 <- rnorm(200,1,1)
  d1 <- density(x, n=512, from=-1, to= 4)
  d2 <- density(x2, n=512, from=-1, to= 4)
  ssq <- sum( (d1$y - d2$y)^2 )
  ssq

-- 
David Winsemius

On Mar 18, 2009, at 1:54 PM, Aaron Spivak wrote:
> Hi,
> This is my first time posting to the mailing list, so if I'm doing  
> something
> wrong, just let me know.  I've taken ~1000 samples from 8 biological
> replicates, and I want to somehow combine the density functions of the
> replicates.  Currently, I can plot the density function for each  
> biological
> replicate, and I'd like to see how pool of replicates compares to a
> simulation I conducted earlier.  I can compare each replicate to the
> simulation, but there's a fair amount of variability between  
> replicates.
> I'd like to take the geometric mean of the density functions at each  
> point
> along the x-axis, but when I compute:
>
>> a<-density(A[,1][A[,1]>=0], n=2^15)
>> b<-density(A[,3][A[,3]>=0], n=2^15)
>> a$x[1]
> [1] -70.47504
>> b$x[1]
> [1] -69.28902
>
> So I can't simply compute the mean across y-values, because the x- 
> values
> don't match.  Is there a way to set the x-values to be the same for  
> multiple
> density plots?  Also, there are no negative values in the dataset,  
> so I'd
> like to bound the x-axis at 0 if at all possible?  Is there a  
> standard way
> to combine density functions?  Thanks for the advice.
> -Aaron Spivak
>
> ps. I thought about just pooling all measurements, but I don't think  
> that's
> appropriate because they are from different replicates and the  
> smoothing
> kernel depends on the variance in the sample to calculate the  
> distribution.
>
> 	[[alternative HTML version deleted]]
>
> ______________________________________________
> R-help at r-project.org mailing list
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide
http://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.
David Winsemius, MD
Heritage Laboratories
West Hartford, CT

Mike Lawrence

2009-Mar-18 23:18 UTC

head link

[R] geometric mean of probability density functions

If you're interested in comparing empirical to simulation
distributions, I see two alternatives to your density() approach
(which will be sensitive to your choice of bandwidth). Both of the
following have been used in my field to look at the fit of empirical
response time data to models of human information processing:

(1) estimate some number of quantiles for each set of 1000 replicates,
average the quantile points, then compare the results to a similar set
of quantiles generated from your simulation data.

(2) fit an a priori distribution to each set of 1000 replicates,
within each distribution parameter, average across sets, then compare
to parameters obtained from fitting the simulation data.

#generate some fake simulation data
sim.x = rnorm(1e5,100,20)

#make up some fake empirical data
a=data.frame(
	set = rep(1:8,each=1e3)
	,x=rnorm(8*1e3,100,20)+rexp(8*1e3,50)
)

#define a function to compute the geometric mean
##(fails with negative numbers!)
geometric.mean = function(x) exp(mean(log(x)))


########
# Quantile approach
########

#set up a data frame to collect empirical quantile data
emp.q=expand.grid(
	set=unique(a$set)
	,q=seq(.1,.9,.1)
	,x=NA
)

#estimate empirical quantiles
for(set in unique(a$set)){
	emp.q$x[emp.q$set==set] = quantile(a$x[a$set==set],probs=unique(emp.q$q))
}

#compute the geomentric mean for each empirical quantile
emp.q.means = with(
	emp.q
	,aggregate(
		x
		,list(
			q=q
		)
		,geometric.mean
	)
)

#estimate the quantiles from the simulation data
sim.q = quantile(sim.x,unique(emp.q$q))

#assess the fit using sum squared scaled error
quantile.sum.sq.sc.err = sum(((emp.q.means$x-sim.q)/sim.q)^2)
print(quantile.sum.sq.sc.err)


########
# Fitting approach using the Generalized Lambda distribution
## GLD chosen because it's flexible and easily fit using the
### gld package
########

#install the gld package if necessary
if(!('gld' %in% installed.packages())){
	install.packages('gld')
}

#load the gld package
library(gld)

#set up a data frame to collect empirical GLD parameters
emp.gld.par=expand.grid(
	set=unique(a$set)
	,lambda=1:4
	,x=NA
)

#estimate empirical GLD parameters
## print-out keeps an eye on convergence (hopefully 0)
## takes a while to complete
for(set in unique(a$set)){
	fit = starship(a$x[a$set==set])
	cat('Set:',set,',
convergence:',fit$optim.results$convergence,'\n')
	emp.gld.par$x[emp.gld.par$set==set] = fit$lambda
}

#compute the geomentric mean for each empirical GLD parameter
emp.gld.par.means = with(
	emp.gld.par
	,aggregate(
		x
		,list(
			lambda=lambda
		)
		,geometric.mean
	)
)

#estimate the GLD parameters from the simulation data
sim.fit = starship(sim.x)
cat('Sim convergence:',sim.fit$optim.results$convergence,'\n')
sim.gld.par = sim.fit$lambda

#assess the fit using sum squared scaled error
gld.par.sum.sq.sc.err = sum(((emp.gld.par.means$x-sim.gld.par)/sim.gld.par)^2)
print(gld.par.sum.sq.sc.err)



-- 
Mike Lawrence
Graduate Student
Department of Psychology
Dalhousie University

Looking to arrange a meeting? Check my public calendar:
http://tinyurl.com/mikes-public-calendar

~ Certainty is folly... I think. ~

Possibly Parallel Threads

Search for more maybe matching threads

R help - Mar 2009 - geometric mean of probability density functions

[R] geometric mean of probability density functions

[R] geometric mean of probability density functions

[R] geometric mean of probability density functions

Possibly Parallel Threads