thr3ads.net - R help - [R] mixtures as outcome variables [Mar 2005]


Jason W. Martinez

2005-Mar-22 23:11 UTC

[R] mixtures as outcome variables

Dear R-users,

I have an outcome variable and I'm unsure about how to treat it. Any
advice?

I have spending data for each county in the state of California (N=58).
Each county has been allocated money to spend on any one of the
following four categories: A, B, C, and D.

Each county may spend the money in any way they see fit. This also means
that the county need not spend all the money that was allocated to them.
The data structure looks something like the one below:

COUNTY         A        B        C        D     Total
------------------------------------------------------
alameda  2534221  1555592  2835475  3063249   9988537
alpine      3174     8500        0    45558     55232
amador         0        0        0        0         0
....


The goal is to explain variation in spending patterns, which are
presumably the result of characteristics for each county.

I may treat the problem like a simple linear regression problem for each
category, but by definition, money spent in one category will take away
the amount of money that can be spent in any other category---and each
county is not allocated the same amount of money to spend.

I have constructed proportions of the amount spent on each category and
have conducted quasibinomial regression on each dependent outcome, but
that does not seem very convincing to me.

Would anyone have any advice about how to treat an outcome variable of
this sort?

Thanks for any hints!

Jason





-- 
Jason W. Martinez, Graduate Student
University of California, Riverside
Department of Sociology
E-mail: jmartinez5 at verizon.net

Kjetil Brinchmann Halvorsen

2005-Mar-23 16:36 UTC


[R] mixtures as outcome variables

If you concentrate only on the relative proportions, these are called
compositional data. If your data are in
mydata (n x 4), you obtain compositions by
sweep(mydata, 1, apply(mydata, 1, sum), "/")
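
A minimal sketch of that step on the three counties shown in the original post. The matrix name mydata follows the text above; the handling of the all-zero row is an assumption about what one would want, not something stated in the thread. A county that spent nothing has no defined composition (0/0), so it is set aside here.

```r
# The three rows shown in the original post, as an n x 4 matrix.
mydata <- matrix(c(2534221, 1555592, 2835475, 3063249,
                   3174,    8500,    0,       45558,
                   0,       0,       0,       0),
                 ncol = 4, byrow = TRUE,
                 dimnames = list(c("alameda", "alpine", "amador"),
                                 c("A", "B", "C", "D")))

# Row-wise proportions; same result as
# sweep(mydata, 1, apply(mydata, 1, sum), "/").
comp <- prop.table(mydata, margin = 1)

# A county with zero total spending gives 0/0 = NaN in every column.
undefined <- rowSums(mydata) == 0
comp_ok <- comp[!undefined, , drop = FALSE]
```

Whether such counties should be dropped or modelled separately (for example as "spent nothing" versus "spent something") depends on the question being asked.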

There are no (AFAIK) specific functions/packages in R for
compositional data, but you can try googling. Aitchison has a
monograph (Chapman & Hall) and a paper in JRSS B.

One way to start might be lm's or anova on the symmetric logratio
transform of the compositions. The R function lm can take a
multivariate response, but some extra programming will be needed
for interpretation. With simulated data:

> slr
function(y) { # y should sum to 1
          v <- log(y)
          return( v - mean(v) ) }
> testdata <- matrix( rgamma(120, 2,3), 30, 4)
> str(testdata)
 num [1:30, 1:4] 0.200 0.414 0.311 2.145 0.233 ...
> comp <- sweep(testdata, 1, apply(testdata,1,sum), "/")
> # To get the symmetric logratio transform:
> comp <- t(apply(comp, 1, slr))
> # Observe:
> apply(cov(comp), 1, sum)
[1] -5.551115e-17  2.775558e-17  5.551115e-17 -2.775558e-17
> lm( comp ~ 1)

Call:
lm(formula = comp ~ 1)

Coefficients:
             [,1]      [,2]      [,3]      [,4]   
(Intercept)   0.17606   0.06165  -0.03783  -0.19988

 > summary(lm( comp ~ 1))
Response Y1 :

Call:
lm(formula = Y1 ~ 1)

Residuals:
     Min       1Q   Median       3Q      Max
-1.29004 -0.46725 -0.07657  0.55834  1.20551

Coefficients:
     Estimate Std. Error t value Pr(>|t|)
[1,]   0.1761     0.1265   1.391    0.175

Residual standard error: 0.6931 on 29 degrees of freedom


Response Y2 :

Call:
lm(formula = Y2 ~ 1)

Residuals:
    Min      1Q  Median      3Q     Max
-1.2982 -0.5711 -0.1355  0.5424  1.6598

Coefficients:
     Estimate Std. Error t value Pr(>|t|)
[1,]  0.06165    0.15049    0.41    0.685

Residual standard error: 0.8242 on 29 degrees of freedom


Response Y3 :

Call:
lm(formula = Y3 ~ 1)

Residuals:
     Min       1Q   Median       3Q      Max
-1.97529 -0.41115  0.03666  0.42785  0.88567

Coefficients:
     Estimate Std. Error t value Pr(>|t|)
[1,] -0.03783    0.11623  -0.325    0.747

Residual standard error: 0.6366 on 29 degrees of freedom


Response Y4 :

Call:
lm(formula = Y4 ~ 1)

Residuals:
    Min      1Q  Median      3Q     Max
-2.8513 -0.3955  0.2815  0.5939  1.2475

Coefficients:
     Estimate Std. Error t value Pr(>|t|)
[1,]  -0.1999     0.1620  -1.234    0.227

Residual standard error: 0.8872 on 29 degrees of freedom
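
The intercept-only fits above extend directly to a model with predictors. The following is a sketch only, on simulated data: the covariate x and the way category 1 depends on it are invented for illustration. It also shows two practical points: the coefficients across the four transformed columns sum to zero, and a zero share (such as alpine's C in the original table) makes the transform non-finite, so zeros need some treatment before this approach can be used.

```r
set.seed(1)  # arbitrary seed, only to make the sketch reproducible

# Symmetric (centred) log-ratio transform of one composition.
slr <- function(y) {
  v <- log(y)
  v - mean(v)
}

n <- 30
x <- rnorm(n)                                   # hypothetical county covariate
raw <- matrix(rgamma(4 * n, shape = 2, rate = 3), n, 4)
raw[, 1] <- raw[, 1] * exp(0.5 * x)             # let category 1 depend on x

comp <- raw / rowSums(raw)                      # compositions, rows sum to 1
z <- t(apply(comp, 1, slr))                     # n x 4 matrix of log-ratios

fit <- lm(z ~ x)                                # one regression per column
coef(fit)                                       # 2 x 4 coefficient matrix

# Each row of z sums to zero, so each row of coefficients does too.
rowSums(coef(fit))

# A zero share breaks the transform: log(0) is -Inf, so nothing finite
# comes back.
slr(c(0.2, 0.8, 0))
```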


Sorry for not being of more help!

Kjetil


-- 

Kjetil Halvorsen.

Peace is the most effective weapon of mass construction.
               --  Mahdi Elmandjra






Greg Snow

2005-Mar-23 17:52 UTC


[R] mixtures as outcome variables

>>> "Jason W. Martinez" <jmartinez5 at verizon.net> 03/22/05 04:11PM >>>
>>  Dear R-users,
>>  
>>  I have an outcome variable and I'm unsure about how to treat it. Any
>>  advice?
Below are a couple of ideas/suggestions of things to think about
>>  
>>  I have spending data for each county in the state of California (N=58).
>>  Each county has been allocated money to spend on any one of the
>>  following four categories: A, B, C, and D.
>>  
>>  Each county may spend the money in any way they see fit. This also means
>>  that the county need not spend all the money that was allocated to them.
>>  The data structure looks something like the one below:
You might want to include a category for the amount of money not spent
(for a total of 5 possibilities).
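
A sketch of how the fifth category could be built. The spending figures are the three rows from the original post; the allocated column is not in the post (its Total column is the sum of A to D, i.e. the amount spent), so the allocation values below are invented purely for illustration.

```r
# 'allocated' is hypothetical: the original post does not give allocations.
spend <- data.frame(
  county    = c("alameda", "alpine", "amador"),
  A         = c(2534221, 3174, 0),
  B         = c(1555592, 8500, 0),
  C         = c(2835475, 0, 0),
  D         = c(3063249, 45558, 0),
  allocated = c(10500000, 60000, 40000)
)

cats <- c("A", "B", "C", "D")
spend$unspent <- spend$allocated - rowSums(spend[, cats])

# Shares of the allocation across five categories. These are defined even
# for a county that spent nothing, as long as its allocation is positive.
shares <- as.matrix(spend[, c(cats, "unspent")]) / spend$allocated
rownames(shares) <- spend$county
```

With the unspent share included, a county like amador becomes an ordinary observation (all of its allocation in the fifth category) rather than an undefined 0/0 composition.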
>>  COUNTY    A        B       C       D        Total
>>  ----------------------------------------------------
>>  alameda  2534221  1555592 2835475  3063249  9988537
>>  alpine   3174     8500    0        45558    55232
>>  amador    0       0        0        0       0
>>  ....
>>  
>>  
>>  The goal is to explain variation in spending patterns, which are
>>  presumably the result of characteristics for each county.
Do you have data representing these characteristics?  The predictor
values in a regression type model?

Starting with some good graphics may help determine and show 
interesting patterns.

The maptools package can read in shapefiles and plot the maps.  You can
download a shapefile with the county boundaries from:
http://www.census.gov/geo/www/cob/co2000.html

Then you could use the symbols function to plot a star in the center of
each county (use get.Pcent from maptools to find the coordinates of the
centers).

Then just look for groups of counties with similar looking stars, or
stars that are different from those close by (I would use the percentage
spent in each category for the lengths of the star spokes).
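
A rough sketch of the star idea using only the base graphics function symbols. The centre coordinates and spending shares below are invented; in practice the centres would come from the county shapefile, and no map outline is drawn here.

```r
set.seed(2)  # arbitrary seed for the invented data

# Hypothetical county centres (longitude, latitude) and spending shares.
centres <- cbind(lon = c(-121.9, -119.8, -120.7, -117.4, -122.4),
                 lat = c(37.6, 38.6, 38.4, 33.9, 40.8))
raw <- matrix(rgamma(20, shape = 2), 5, 4,
              dimnames = list(NULL, c("A", "B", "C", "D")))
shares <- raw / rowSums(raw)

# Draw to a null device when run as a script, so no file is written.
if (!interactive()) pdf(NULL)

# symbols() draws one star per row of 'stars'; each column is one spoke.
# 'inches' scales the largest symbol in the plot.
symbols(centres[, "lon"], centres[, "lat"], stars = shares,
        inches = 0.4, xlab = "longitude", ylab = "latitude")
```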

Another graph that may prove interesting is the trilinear plot (see the
article in Chance from the summer of 2002).  Combine your categories into
3 groups (e.g. A&B vs. C&D vs. not spent; or A vs. B vs. all others) then
plot each county's spending on the trilinear plot (functions to do the
plot are: triangle.plot in ade4, triplot in klaR, or I have some code that
I wrote (not on CRAN yet)).

Look for clusters of counties in these plots.
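
If none of those packages is at hand, a trilinear plot can be sketched in base graphics by converting each three-part composition to coordinates inside an equilateral triangle. The grouping into AB, CD and unspent follows the first example above; the data are simulated and the variable names are illustrative.

```r
set.seed(3)  # arbitrary seed for the invented data

# Hypothetical three-part shares (A+B, C+D, unspent) for ten counties.
raw <- matrix(rgamma(30, shape = 2), 10, 3,
              dimnames = list(NULL, c("AB", "CD", "unspent")))
p <- raw / rowSums(raw)

# Map each composition into a triangle with corners
# AB = (0, 0), CD = (1, 0), unspent = (0.5, sqrt(3)/2).
tx <- p[, "CD"] + 0.5 * p[, "unspent"]
ty <- sqrt(3) / 2 * p[, "unspent"]

# Draw to a null device when run as a script, so no file is written.
if (!interactive()) pdf(NULL)

plot(c(0, 1, 0.5, 0), c(0, 0, sqrt(3) / 2, 0), type = "l",
     asp = 1, axes = FALSE, xlab = "", ylab = "")
points(tx, ty, pch = 19)
text(c(0, 1, 0.5), c(0, 0, sqrt(3) / 2), colnames(p), pos = c(1, 1, 3))
```

A county that put everything into one group would sit exactly on that group's corner; counties near the centre split their money evenly.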
>>  I may treat the problem like a simple linear regression problem for each
>>  category, but by definition, money spent in one category will take away
>>  the amount of money that can be spent in any other category---and each
>>  county is not allocated the same amount of money to spend.
>>  
>>  I have constructed proportions of the amount spent on each category and
>>  have conducted quasibinomial regression on each dependent outcome, but
>>  that does not seem very convincing to me.
>>  
>>  Would anyone have any advice about how to treat an outcome variable of
>>  this sort?
Here are a couple of thoughts (there may be better options).

Assuming that you have some predictor (x) variables about each county:

use the multinom function in the nnet package, the idea being that each
dollar spent follows a multinomial with certain probabilities as to which
category it will be spent in and the predictors tell you what the
probabilities are.
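
A minimal sketch of the multinom idea on simulated data. multinom accepts a matrix response, which it reads as counts per category; everything else here (the covariate x, the effect size, and counting spending in units of $1000) is an assumption made for illustration.

```r
library(nnet)  # recommended package shipped with R

set.seed(4)  # arbitrary seed for the invented data
n <- 58
x <- rnorm(n)                                   # hypothetical county covariate

# Simulate spending in units of $1000 across four categories, with the
# share going to A increasing in x (all numbers invented).
eta <- cbind(A = 0.8 * x, B = 0, C = 0, D = 0)
prob <- exp(eta) / rowSums(exp(eta))
total <- rpois(n, 500) + 1
counts <- t(sapply(seq_len(n), function(i) rmultinom(1, total[i], prob[i, ])))
colnames(counts) <- colnames(eta)

# A matrix response is treated as counts for each category in each row.
fit <- multinom(counts ~ x, trace = FALSE)
coef(fit)                      # log-odds of B, C, D relative to A

# Fitted category probabilities at three values of the covariate.
p_hat <- predict(fit, newdata = data.frame(x = c(-1, 0, 1)), type = "probs")
```

One caveat with real data: treating every dollar (or every $1000) as an independent draw will make the standard errors look far too small, so the point estimates are more trustworthy than the reported precision.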

Similarly you could use package rpart to do a tree model, use the category
as the outcome and the percentage spent on the category as the weights
(each county would be spread across 4 or 5 lines of the dataset with the
predictors replicated on each line).  rpart gives the
probabilities/proportions for each category based on splits of the
predictor variables.
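
The data layout for the tree model can be sketched as follows, again on simulated data with a single invented covariate x. Each county contributes one line per category, with its spending share as the case weight.

```r
library(rpart)  # recommended package shipped with R

set.seed(5)  # arbitrary seed for the invented data
n <- 58
x <- rnorm(n)                                   # hypothetical county covariate
raw <- matrix(rgamma(4 * n, shape = 2), n, 4,
              dimnames = list(NULL, c("A", "B", "C", "D")))
raw[, "A"] <- raw[, "A"] * exp(x)               # share of A rises with x
shares <- raw / rowSums(raw)

# One line per county and category; the predictor is replicated on each line.
# as.vector() stacks the columns, so the first n lines are category A.
long <- data.frame(
  x        = rep(x, times = 4),
  category = factor(rep(colnames(shares), each = n)),
  share    = as.vector(shares)
)

fit <- rpart(category ~ x, data = long, weights = share, method = "class")

# Estimated category proportions for two values of the covariate.
p_hat <- predict(fit, newdata = data.frame(x = c(-1, 1)), type = "prob")
```

The simulated shares are all strictly positive; with real data, lines with a zero share carry no weight and could simply be left out of the long table.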

>>  Thanks for any hints!
>>  
>>  Jason
>>  
>>  
>>  -- 
>>  Jason W. Martinez, Graduate Student
>>  University of California, Riverside
>>  Department of Sociology
>>  E-mail: jmartinez5 at verizon.net 
>>  
hope this helps,

Greg Snow, Ph.D.
Statistical Data Center
greg.snow at ihc.com
(801) 408-8111

James Reilly

2005-Mar-25 16:43 UTC


[R] mixtures as outcome variables

A collection of functions for compositional data analysis was posted on 
the S-news mailing list about a year ago.

Basic Compositional Data Analysis functions for S+/R
http://www.biostat.wustl.edu/archives/html/s-news/2003-12/msg00139.html

James
> Date: Thu, 24 Mar 2005 20:28:51 -0400 
> From: Kjetil Brinchmann Halvorsen <kjetil at acelerate.com> 
> Subject: Re: [R] mixtures as outcome variables 
> Cc: r-help at stat.math.ethz.ch, "Jason W. Martinez" <jmartinez5 at verizon.net>
> 
> Kjetil Brinchmann Halvorsen wrote:
> 
> 
> Followup:
> 
>  > mmod <- manova(comp ~ x)
>  > summary(mmod)
> Error in summary.manova(mmod) : residuals have rank 3 < 4
>  >
> 
> So the manova() function cannot be used. I guess MANOVA for 
> compositional data should be a straightforward extension, but it must
> be programmed; standard manova cannot be used.
> 
> Kjetil
> 
> -- 
> Kjetil Halvorsen.
> Peace is the most effective weapon of mass construction.
>                --  Mahdi Elmandjra
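
One possible workaround for the rank error, offered as a sketch rather than as the method anyone in the thread used: the four log-ratio columns sum to zero in every row, so one of them is redundant. Dropping any single column loses no information and leaves a full-rank response that manova can handle. The data and the covariate x are simulated.

```r
set.seed(6)  # arbitrary seed for the invented data

# Symmetric (centred) log-ratio transform of one composition.
slr <- function(y) {
  v <- log(y)
  v - mean(v)
}

n <- 30
x <- rnorm(n)                                   # hypothetical covariate
raw <- matrix(rgamma(4 * n, shape = 2, rate = 3), n, 4)
comp <- t(apply(raw / rowSums(raw), 1, slr))    # every row sums to zero

# The four columns are linearly dependent, hence the rank-3 residuals.
# Keep three of them; the fourth is minus the sum of the others.
y3 <- comp[, -4]
mmod <- manova(y3 ~ x)
res <- summary(mmod, test = "Pillai")
res$stats
```

Because the multivariate test statistics are unchanged by invertible linear transformations of the response, it should not matter which column is dropped.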
