On Wednesday 01 July 2009, roger koenker wrote:> url: www.econ.uiuc.edu/~roger Roger Koenker
> email rkoenker at uiuc.edu Department of Economics
> vox: 217-333-4558 University of Illinois
> fax: 217-244-6678 Urbana, IL 61801
>
> On Jul 1, 2009, at 4:52 PM, Dylan Beaudette wrote:
> > On Wednesday 01 July 2009, roger koenker wrote:
> >> It's not clear to me whether you are looking for an
exploratory tool
> >> or something more like formal inference. For the former, it seems
> >> that estimating a few weighted quantiles would be quite useful.
at
> >> least
> >> it is rather Tukeyesque. While I'm appealing to authorities,
I can't
> >> resist recalling that Galton's "invention of
correlation" paper: Co-
> >> relations
> >> and their measurement, Proceedings of the Royal Society, 1888-89,
> >> used medians and interquartile ranges.
> >
> > Thanks Roger.
> >
> > We are interested in a formal inference type of comparison. I have
> > found that
> > weighted density plots to be a helpful exploratory graphic-- but
> > ultimately
> > we would like to (as objectively as possible) determine if
> > distributions are
> > different.
> >
> > Unfortunately the distributions are very poorly behaved. Once such
> > example can
> > be seen here:
> > http://169.237.35.250/~dylan/temp/ACTIVITY-method_comparision.pdf
> >
> > I was imagine using something like the number of 'significant'
> > differences in
> > n-quantiles would be used to determine which treatment group was most
> > different than the control group.
>
> sure, you could do that but to be formal and account for multiple
> testing you
> are going to have some headaches. The much maligned two sample KS
> tests might be a worthwhile alternative to consider.
Right... forgot about that. OK. I had considered the KS test, but have not yet
been able to identify how one could include weights... There appear to be
methods listed on the internet, but I haven't seen one implemented in R.
Still looking...
Dylan
>
> > Cheers,
> > Dylan
> >
> >> url: www.econ.uiuc.edu/~roger Roger Koenker
> >> email rkoenker at uiuc.edu Department of Economics
> >> vox: 217-333-4558 University of Illinois
> >> fax: 217-244-6678 Urbana, IL 61801
> >>
> >> On Jul 1, 2009, at 2:48 PM, Dylan Beaudette wrote:
> >>> Thanks Roger. Your comments were very helpful. Unfortunately,
each
> >>> of
> >>> the 'groups' in this example are derived from the same
set of data,
> >>> two of
> >>> which were subsets-- so it is not that unlikely that the
weighted
> >>> medians
> >>> were the same in some cases.
> >>>
> >>> This all leads back to an operation attempting to answer the
> >>> question:
> >>>
> >>> Of the 2 subsetting methods, which one produces a distribution
most
> >>> like the
> >>> complete data set? Since the distributions are not normal, and
there
> >>> are
> >>> area-weights involved others on the list suggested quantile-
> >>> regression. For a
> >>> more complete picture of how 'different' the
distributions are, I
> >>> have tried
> >>> looking at the differences between weighted quantiles: (0.1,
0.25,
> >>> 0.5, 0.75,
> >>> 0.9) as a more complete 'description' of each
distribution.
> >>>
> >>> I imagine that there may be a better way to perform this
> >>> comparison...
> >>>
> >>> Cheers,
> >>> Dylan
> >>>
> >>> On Tuesday 30 June 2009, roger koenker wrote:
> >>>> Admittedly this seemed quite peculiar.... but if you look
at the
> >>>> entrails
> >>>> of the following code you will see that with the weights
the first
> >>>> and
> >>>> second levels of your x$method variable have the same
(weighted)
> >>>> median
> >>>> so the contrast that you are estimating SHOULD be zero.
Perhaps
> >>>> there is something fishy about the data construction that
would
> >>>> have
> >>>> allowed us to anticipate this? Regarding the
"fn" option, and the
> >>>> non-uniqueness warning, this is covered in the
(admittedly
> >>>> obscure)
> >>>> faq on quantile regression available at:
> >>>>
> >>>> http://www.econ.uiuc.edu/~roger/research/rq/FAQ
> >>>>
> >>>> # example:
> >>>> library(quantreg)
> >>>>
> >>>> # load data
> >>>> x <-
read.csv(url('http://169.237.35.250/~dylan/temp/test.csv'))
> >>>>
> >>>> # with weights
> >>>> summary(rq(sand ~ method, data=x, weights=area_fraction,
tau=0.5),
> >>>> se='ker')
> >>>>
> >>>> #Reproduction with more convenient notation:
> >>>>
> >>>> X0 <- model.matrix(~method, data = x)
> >>>> y <- x$sand
> >>>> w <- x$area_fraction
> >>>> f0 <- summary(rq(y ~ X0 - 1, weights = w),se =
"ker")
> >>>>
> >>>> #Second reproduction with orthogonal design:
> >>>>
> >>>> X1 <- model.matrix(~method - 1, data = x)
> >>>> f1 <- summary(rq(y ~ X1 - 1, weights = w),se =
"ker")
> >>>>
> >>>> #Comparing f0 and f1 we see that they are consistent!!
How can
> >>>> that
> >>>> be??
> >>>> #Since the columns of X1 are orthogonal estimation of the
3
> >>>> parameters
> >>>> are separable
> >>>> #so we can check to see whether the univariate weighted
medians are
> >>>> reproducible.
> >>>>
> >>>> s1 <- X1[,1] == 1
> >>>> s2 <- X1[,2] == 1
> >>>> g1 <- rq(y[s1] ~ X1[s1,1] - 1, weights = w[s1])
> >>>> g2 <- rq(y[s2] ~ X1[s2,2] - 1, weights = w[s2])
> >>>>
> >>>> #Now looking at the g1 and g2 objects we see that they are
equal
> >>>> and
> >>>> agree with f1.
> >>>>
> >>>>
> >>>> url: www.econ.uiuc.edu/~roger Roger Koenker
> >>>> email rkoenker at uiuc.edu Department of
Economics
> >>>> vox: 217-333-4558 University of
Illinois
> >>>> fax: 217-244-6678 Urbana, IL 61801
> >>>>
> >>>> On Jun 30, 2009, at 3:54 PM, Dylan Beaudette wrote:
> >>>>> Hi,
> >>>>>
> >>>>> I am trying to use quantile regression to perform
weighted-
> >>>>> comparisons of the
> >>>>> median across groups. This works most of the time,
however I am
> >>>>> seeing some
> >>>>> odd output in summary(rq()):
> >>>>>
> >>>>> Call: rq(formula = sand ~ method, tau = 0.5, data = x,
weights > >>>>> area_fraction)
> >>>>> Coefficients:
> >>>>> Value Std. Error t value
Pr(>|t|)
> >>>>> (Intercept) 45.44262 3.64706 12.46007
0.00000
> >>>>> methodmukey-HRU 0.00000 4.67115 0.00000
1.00000
> >>>>> ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
> >>>>>
> >>>>> When I do not include the weights, I get something a
little closer
> >>>>> to a
> >>>>> weighted comparison of means, along with an error
message:
> >>>>>
> >>>>> Call: rq(formula = sand ~ method, tau = 0.5, data = x)
> >>>>> Coefficients:
> >>>>> Value Std. Error t value
Pr(>|t|)
> >>>>> (Intercept) 44.91579 2.46341 18.23318
0.00000
> >>>>> methodmukey-HRU 9.57601 9.29348 1.03040
0.30380
> >>>>> Warning message:
> >>>>> In rq.fit.br(x, y, tau = tau, ...) : Solution may be
nonunique
> >>>>>
> >>>>>
> >>>>> I have noticed that the error message goes away when
specifying
> >>>>> method='fn' to
> >>>>> rq(). An example is below. Could this have something
to do with
> >>>>> replication
> >>>>> in the data?
> >>>>>
> >>>>>
> >>>>> # example:
> >>>>> library(quantreg)
> >>>>>
> >>>>> # load data
> >>>>> x <-
read.csv(url('http://169.237.35.250/~dylan/temp/test.csv'))
> >>>>>
> >>>>> # with weights
> >>>>> summary(rq(sand ~ method, data=x,
weights=area_fraction, tau=0.5),
> >>>>> se='ker')
> >>>>>
> >>>>> # without weights
> >>>>> # note error message
> >>>>> summary(rq(sand ~ method, data=x, tau=0.5),
se='ker')
> >>>>>
> >>>>> # without weights, no error message
> >>>>> summary(rq(sand ~ method, data=x, tau=0.5,
method='fn'), se='ker')
> >
--
Dylan Beaudette
Soil Resource Laboratory
http://casoilresource.lawr.ucdavis.edu/
University of California at Davis
530.754.7341