thr3ads.net - R help - [R] split and rlm [Oct 2004]

If this information is useful, please help other people find it:
Share via:

Stuart Luppescu

2004-Oct-11 14:27 UTC

[R] split and rlm

Hello, I'm trying to do a little rlm of some data that looks like this:

UNIT    COHORT     perdo     adjodds
 1010      96      0.39890    1.06894
 1010      97      0.48113    1.57500
 1010      98      0.36328    1.21498
 1010      99      0.44391    1.38608

It works fine like this: rlm(perdo ~ COHORT, psi=psisquare)
But the problem is that I have about 100 UNITs, and I want to do a
separate rlm for each one. I tried to use split and lapply but it didn't
work at all. Is this possible?

In addition, I'm trying to extract the t statistic for the slope
coefficient and the degrees of freedom so I can put them into dt() to
get the p-value. I can get the t from coef(summary(u))[2,3] (where u is
my rlm object), but u$df.residual gives me NULL. Also, the help for
summary.lm says it returns coefficients, which contains a 4x4 matrix
including the p-values, but when I do summary(u)$coefficients I get:

 summary(u)$coefficients
                                       Value Std. Error    t value
(Intercept)                      0.151756859 3.00972988 0.05042209
drops$COHORT[drops$UNIT == unit] 0.002769108 0.03086700 0.08971097

Any help with this, and on getting the degrees of freedom or the p-value
would be much appreciated. 

-- 
Stuart Luppescu -=- s-luppescu .at. uchicago.edu        
University of Chicago -=- CCSR 
$B:MJ8$HCRF`H~$NIc(B -=-    Kernel 2.6.8-gentoo-r3                
"I don't remember debates.  I don't think we spent 
 a lot of time debating it. Maybe we did, but I
 don't remember."  George W. Bush July 27, 1999
 Referring to whether he had discussions about the 
 Vietnam War while an undergraduate at Yale

Prof Brian Ripley

2004-Oct-11 14:54 UTC

head link

[R] split and rlm

On Mon, 11 Oct 2004, Stuart Luppescu wrote:
> Hello, I'm trying to do a little rlm of some data that looks like this:
> 
> UNIT    COHORT     perdo     adjodds
>  1010      96      0.39890    1.06894
>  1010      97      0.48113    1.57500
>  1010      98      0.36328    1.21498
>  1010      99      0.44391    1.38608
> 
> It works fine like this: rlm(perdo ~ COHORT, psi=psisquare)
> But the problem is that I have about 100 UNITs, and I want to do a
> separate rlm for each one. I tried to use split and lapply but it
didn't
> work at all. Is this possible?
> 
> In addition, I'm trying to extract the t statistic for the slope
> coefficient and the degrees of freedom so I can put them into dt() to
> get the p-value. I can get the t from coef(summary(u))[2,3] (where u is
> my rlm object), but u$df.residual gives me NULL. Also, the help for
> summary.lm says it returns coefficients, which contains a 4x4 matrix
> including the p-values, but when I do summary(u)$coefficients I get:
> 
>  summary(u)$coefficients
>                                        Value Std. Error    t value
> (Intercept)                      0.151756859 3.00972988 0.05042209
> drops$COHORT[drops$UNIT == unit] 0.002769108 0.03086700 0.08971097
> 
> Any help with this, and on getting the degrees of freedom or the p-value
> would be much appreciated. 
0) Do use extractor function like coef() and df.residual().

1) for the 100 fits try using the default interface, not the formula one.

2) df.residual is not a relevant concept, and the coefficients are not 
t-distributed.  Where did you read that it had?

You can try a normal approximation, but beware it may be rough.  If you
look up MASS4 (the book this software supports) you will see better ideas
illustrated.

3) I get in ?summary.rlm

coefficients: A matrix with three columns, containing the coefficients,
          their standard errors and the corresponding t statistic.

What has summary.lm got to do with rlm fits?

-- 
Brian D. Ripley,                  ripley at stats.ox.ac.uk
Professor of Applied Statistics,  http://www.stats.ox.ac.uk/~ripley/
University of Oxford,             Tel:  +44 1865 272861 (self)
1 South Parks Road,                     +44 1865 272866 (PA)
Oxford OX1 3TG, UK                Fax:  +44 1865 272595

Liaw, Andy

2004-Oct-11 14:56 UTC

head link

[R] split and rlm

Here's one example:

[BTW, please tell us when you're using functions in contributed packages.]
> library(MASS)
> dat <- data.frame(unit=rep(1:5, each=10), y = rnorm(50), x1=rnorm(50),
+                   x2 = rnorm(50))> fit <- by(dat, dat$unit, function(dat) rlm(y ~ x1 + x2, dat,+                                           
psi=psi.bisquare))> lapply(fit, function(m) coefficients(summary(m)))$"1"
                 Value Std. Error   t value
(Intercept) -0.4353587  0.3657624 -1.190277
x1           0.7605763  0.2717837  2.798462
x2           0.2196208  0.3236791  0.678514

$"2"
                  Value Std. Error    t value
(Intercept) -0.26731627 0.04553002 -5.8712088
x1          -0.12206044 0.05581878 -2.1867272
x2           0.01277407 0.04808985  0.2656293

$"3"
                  Value Std. Error    t value
(Intercept)  0.07899741  0.4420049  0.1787252
x1          -0.14290441  0.4790551 -0.2983048
x2          -0.19731260  0.4246512 -0.4646463

$"4"
                 Value Std. Error   t value
(Intercept)  0.4925298  0.3332106  1.478134
x1           0.4938363  0.3866812  1.277115
x2          -0.3475754  0.3281346 -1.059246

$"5"
                  Value Std. Error    t value
(Intercept) -0.10928768  0.2582368 -0.4232073
x1           0.33719681  0.3028123  1.1135506
x2           0.04587871  0.2680164  0.1711788

For residual df, you need to look at the object returned by the summary()
method for rlm.  You should not be reading the help page for summary.lm when
you're dealing with rlm objects.
> names(summary(fit[[1]])) [1] "call"         "residuals"    "coefficients"
"sigma"        "stddev"

 [6] "df"           "r.squared"    "cov.unscaled"
"correlation"  "terms"


HTH,
Andy
> From: Stuart Luppescu
> 
> Hello, I'm trying to do a little rlm of some data that looks 
> like this:
> 
> UNIT    COHORT     perdo     adjodds
>  1010      96      0.39890    1.06894
>  1010      97      0.48113    1.57500
>  1010      98      0.36328    1.21498
>  1010      99      0.44391    1.38608
> 
> It works fine like this: rlm(perdo ~ COHORT, psi=psisquare)
> But the problem is that I have about 100 UNITs, and I want to do a
> separate rlm for each one. I tried to use split and lapply 
> but it didn't
> work at all. Is this possible?
> 
> In addition, I'm trying to extract the t statistic for the slope
> coefficient and the degrees of freedom so I can put them into dt() to
> get the p-value. I can get the t from coef(summary(u))[2,3] 
> (where u is
> my rlm object), but u$df.residual gives me NULL. Also, the help for
> summary.lm says it returns coefficients, which contains a 4x4 matrix
> including the p-values, but when I do summary(u)$coefficients I get:
> 
>  summary(u)$coefficients
>                                        Value Std. Error    t value
> (Intercept)                      0.151756859 3.00972988 0.05042209
> drops$COHORT[drops$UNIT == unit] 0.002769108 0.03086700 0.08971097
> 
> Any help with this, and on getting the degrees of freedom or 
> the p-value
> would be much appreciated. 
> 
> -- 
> Stuart Luppescu -=- s-luppescu .at. uchicago.edu        
> University of Chicago -=- CCSR 
> $B:MJ8$HCRF`H~$NIc(B -=-    Kernel 2.6.8-gentoo-r3                
> "I don't remember debates.  I don't think we spent 
>  a lot of time debating it. Maybe we did, but I
>  don't remember."  George W. Bush July 27, 1999
>  Referring to whether he had discussions about the 
>  Vietnam War while an undergraduate at Yale
> 
> ______________________________________________
> R-help at stat.math.ethz.ch mailing list
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide! 
> http://www.R-project.org/posting-guide.html
> 
>

Peter Dalgaard

2004-Oct-11 15:17 UTC

head link

[R] split and rlm

Stuart Luppescu <s-luppescu at uchicago.edu> writes:
> Hello, I'm trying to do a little rlm of some data that looks like this:
> 
> UNIT    COHORT     perdo     adjodds
>  1010      96      0.39890    1.06894
>  1010      97      0.48113    1.57500
>  1010      98      0.36328    1.21498
>  1010      99      0.44391    1.38608
> 
> It works fine like this: rlm(perdo ~ COHORT, psi=psisquare)
> But the problem is that I have about 100 UNITs, and I want to do a
> separate rlm for each one. I tried to use split and lapply but it
didn't
> work at all. Is this possible?
How about by()?

 > In addition, I'm trying to extract the t statistic for the slope
> coefficient and the degrees of freedom so I can put them into dt() to
pt(), I would hope....
> get the p-value. I can get the t from coef(summary(u))[2,3] (where u is
> my rlm object), but u$df.residual gives me NULL. Also, the help for
> summary.lm says it returns coefficients, which contains a 4x4 matrix
> including the p-values, but when I do summary(u)$coefficients I get:
> 
>  summary(u)$coefficients
>                                        Value Std. Error    t value
> (Intercept)                      0.151756859 3.00972988 0.05042209
> drops$COHORT[drops$UNIT == unit] 0.002769108 0.03086700 0.08971097
> 
> Any help with this, and on getting the degrees of freedom or the p-value
> would be much appreciated. 

As Brian usually puts it: This is support software for a book. You
might want to read it...

Looking at the code, it seems to explicitly set the residual DF to NA,
which - along with the absense of p values - usually suggests that the
author(s) do not trust them and I think I can see why. Apparently the
lm() methods which get invoked through inheritance have ways of
sticking the residual DF back in, but that doesn't make them any more
sensible. If you must, have a look inside print.summary.lm and see
where "rdf" comes from, but I'd suspect that you're just as
well off
by using the normal approximation.

-- 
   O__  ---- Peter Dalgaard             Blegdamsvej 3  
  c/ /'_ --- Dept. of Biostatistics     2200 Cph. N   
 (*) \(*) -- University of Copenhagen   Denmark      Ph: (+45) 35327918
~~~~~~~~~~ - (p.dalgaard at biostat.ku.dk)             FAX: (+45) 35327907

R help - Oct 2004 - split and rlm

[R] split and rlm

[R] split and rlm

[R] split and rlm

[R] split and rlm