Hello, I'm trying to do a little rlm of some data that looks like this: UNIT COHORT perdo adjodds 1010 96 0.39890 1.06894 1010 97 0.48113 1.57500 1010 98 0.36328 1.21498 1010 99 0.44391 1.38608 It works fine like this: rlm(perdo ~ COHORT, psi=psisquare) But the problem is that I have about 100 UNITs, and I want to do a separate rlm for each one. I tried to use split and lapply but it didn't work at all. Is this possible? In addition, I'm trying to extract the t statistic for the slope coefficient and the degrees of freedom so I can put them into dt() to get the p-value. I can get the t from coef(summary(u))[2,3] (where u is my rlm object), but u$df.residual gives me NULL. Also, the help for summary.lm says it returns coefficients, which contains a 4x4 matrix including the p-values, but when I do summary(u)$coefficients I get: summary(u)$coefficients Value Std. Error t value (Intercept) 0.151756859 3.00972988 0.05042209 drops$COHORT[drops$UNIT == unit] 0.002769108 0.03086700 0.08971097 Any help with this, and on getting the degrees of freedom or the p-value would be much appreciated. -- Stuart Luppescu -=- s-luppescu .at. uchicago.edu University of Chicago -=- CCSR $B:MJ8$HCRF`H~$NIc(B -=- Kernel 2.6.8-gentoo-r3 "I don't remember debates. I don't think we spent a lot of time debating it. Maybe we did, but I don't remember." George W. Bush July 27, 1999 Referring to whether he had discussions about the Vietnam War while an undergraduate at Yale
On Mon, 11 Oct 2004, Stuart Luppescu wrote:> Hello, I'm trying to do a little rlm of some data that looks like this: > > UNIT COHORT perdo adjodds > 1010 96 0.39890 1.06894 > 1010 97 0.48113 1.57500 > 1010 98 0.36328 1.21498 > 1010 99 0.44391 1.38608 > > It works fine like this: rlm(perdo ~ COHORT, psi=psisquare) > But the problem is that I have about 100 UNITs, and I want to do a > separate rlm for each one. I tried to use split and lapply but it didn't > work at all. Is this possible? > > In addition, I'm trying to extract the t statistic for the slope > coefficient and the degrees of freedom so I can put them into dt() to > get the p-value. I can get the t from coef(summary(u))[2,3] (where u is > my rlm object), but u$df.residual gives me NULL. Also, the help for > summary.lm says it returns coefficients, which contains a 4x4 matrix > including the p-values, but when I do summary(u)$coefficients I get: > > summary(u)$coefficients > Value Std. Error t value > (Intercept) 0.151756859 3.00972988 0.05042209 > drops$COHORT[drops$UNIT == unit] 0.002769108 0.03086700 0.08971097 > > Any help with this, and on getting the degrees of freedom or the p-value > would be much appreciated.0) Do use extractor function like coef() and df.residual(). 1) for the 100 fits try using the default interface, not the formula one. 2) df.residual is not a relevant concept, and the coefficients are not t-distributed. Where did you read that it had? You can try a normal approximation, but beware it may be rough. If you look up MASS4 (the book this software supports) you will see better ideas illustrated. 3) I get in ?summary.rlm coefficients: A matrix with three columns, containing the coefficients, their standard errors and the corresponding t statistic. What has summary.lm got to do with rlm fits? -- Brian D. Ripley, ripley at stats.ox.ac.uk Professor of Applied Statistics, http://www.stats.ox.ac.uk/~ripley/ University of Oxford, Tel: +44 1865 272861 (self) 1 South Parks Road, +44 1865 272866 (PA) Oxford OX1 3TG, UK Fax: +44 1865 272595
Here's one example: [BTW, please tell us when you're using functions in contributed packages.]> library(MASS) > dat <- data.frame(unit=rep(1:5, each=10), y = rnorm(50), x1=rnorm(50),+ x2 = rnorm(50))> fit <- by(dat, dat$unit, function(dat) rlm(y ~ x1 + x2, dat,+ psi=psi.bisquare))> lapply(fit, function(m) coefficients(summary(m)))$"1" Value Std. Error t value (Intercept) -0.4353587 0.3657624 -1.190277 x1 0.7605763 0.2717837 2.798462 x2 0.2196208 0.3236791 0.678514 $"2" Value Std. Error t value (Intercept) -0.26731627 0.04553002 -5.8712088 x1 -0.12206044 0.05581878 -2.1867272 x2 0.01277407 0.04808985 0.2656293 $"3" Value Std. Error t value (Intercept) 0.07899741 0.4420049 0.1787252 x1 -0.14290441 0.4790551 -0.2983048 x2 -0.19731260 0.4246512 -0.4646463 $"4" Value Std. Error t value (Intercept) 0.4925298 0.3332106 1.478134 x1 0.4938363 0.3866812 1.277115 x2 -0.3475754 0.3281346 -1.059246 $"5" Value Std. Error t value (Intercept) -0.10928768 0.2582368 -0.4232073 x1 0.33719681 0.3028123 1.1135506 x2 0.04587871 0.2680164 0.1711788 For residual df, you need to look at the object returned by the summary() method for rlm. You should not be reading the help page for summary.lm when you're dealing with rlm objects.> names(summary(fit[[1]]))[1] "call" "residuals" "coefficients" "sigma" "stddev" [6] "df" "r.squared" "cov.unscaled" "correlation" "terms" HTH, Andy> From: Stuart Luppescu > > Hello, I'm trying to do a little rlm of some data that looks > like this: > > UNIT COHORT perdo adjodds > 1010 96 0.39890 1.06894 > 1010 97 0.48113 1.57500 > 1010 98 0.36328 1.21498 > 1010 99 0.44391 1.38608 > > It works fine like this: rlm(perdo ~ COHORT, psi=psisquare) > But the problem is that I have about 100 UNITs, and I want to do a > separate rlm for each one. I tried to use split and lapply > but it didn't > work at all. Is this possible? > > In addition, I'm trying to extract the t statistic for the slope > coefficient and the degrees of freedom so I can put them into dt() to > get the p-value. I can get the t from coef(summary(u))[2,3] > (where u is > my rlm object), but u$df.residual gives me NULL. Also, the help for > summary.lm says it returns coefficients, which contains a 4x4 matrix > including the p-values, but when I do summary(u)$coefficients I get: > > summary(u)$coefficients > Value Std. Error t value > (Intercept) 0.151756859 3.00972988 0.05042209 > drops$COHORT[drops$UNIT == unit] 0.002769108 0.03086700 0.08971097 > > Any help with this, and on getting the degrees of freedom or > the p-value > would be much appreciated. > > -- > Stuart Luppescu -=- s-luppescu .at. uchicago.edu > University of Chicago -=- CCSR > $B:MJ8$HCRF`H~$NIc(B -=- Kernel 2.6.8-gentoo-r3 > "I don't remember debates. I don't think we spent > a lot of time debating it. Maybe we did, but I > don't remember." George W. Bush July 27, 1999 > Referring to whether he had discussions about the > Vietnam War while an undergraduate at Yale > > ______________________________________________ > R-help at stat.math.ethz.ch mailing list > https://stat.ethz.ch/mailman/listinfo/r-help > PLEASE do read the posting guide! > http://www.R-project.org/posting-guide.html > >
Stuart Luppescu <s-luppescu at uchicago.edu> writes:> Hello, I'm trying to do a little rlm of some data that looks like this: > > UNIT COHORT perdo adjodds > 1010 96 0.39890 1.06894 > 1010 97 0.48113 1.57500 > 1010 98 0.36328 1.21498 > 1010 99 0.44391 1.38608 > > It works fine like this: rlm(perdo ~ COHORT, psi=psisquare) > But the problem is that I have about 100 UNITs, and I want to do a > separate rlm for each one. I tried to use split and lapply but it didn't > work at all. Is this possible?How about by()?> In addition, I'm trying to extract the t statistic for the slope > coefficient and the degrees of freedom so I can put them into dt() topt(), I would hope....> get the p-value. I can get the t from coef(summary(u))[2,3] (where u is > my rlm object), but u$df.residual gives me NULL. Also, the help for > summary.lm says it returns coefficients, which contains a 4x4 matrix > including the p-values, but when I do summary(u)$coefficients I get: > > summary(u)$coefficients > Value Std. Error t value > (Intercept) 0.151756859 3.00972988 0.05042209 > drops$COHORT[drops$UNIT == unit] 0.002769108 0.03086700 0.08971097 > > Any help with this, and on getting the degrees of freedom or the p-value > would be much appreciated.As Brian usually puts it: This is support software for a book. You might want to read it... Looking at the code, it seems to explicitly set the residual DF to NA, which - along with the absense of p values - usually suggests that the author(s) do not trust them and I think I can see why. Apparently the lm() methods which get invoked through inheritance have ways of sticking the residual DF back in, but that doesn't make them any more sensible. If you must, have a look inside print.summary.lm and see where "rdf" comes from, but I'd suspect that you're just as well off by using the normal approximation. -- O__ ---- Peter Dalgaard Blegdamsvej 3 c/ /'_ --- Dept. of Biostatistics 2200 Cph. N (*) \(*) -- University of Copenhagen Denmark Ph: (+45) 35327918 ~~~~~~~~~~ - (p.dalgaard at biostat.ku.dk) FAX: (+45) 35327907