R-users
E-mail: r-help@r-project.org
Hi! R-users.
I am just wondering what the definition of "dffits" in R is.
Let me show you a simple example.
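# Small example data set
xx <- c(1, 2, 3, 4, 5)
yy <- c(1, 3, 4, 2, 4)
data1 <- data.frame(x = xx, y = yy)

# Fit the simple linear regression
lm.out <- lm(y ~ ., data = data1, x = TRUE)

# Influence measures: leverages and leave-one-out sigma estimates
infl1 <- lm.influence(lm.out)
lev1 <- infl1$hat
sig1 <- infl1$sigma
res1 <- residuals(lm.out)

ey <- fitted(lm.out)
# My attempt at the prediction for each point with that point deleted
py <- ey + res1 / (1 - lev1)

# Value returned by R
df1 <- dffits(lm.out, infl = infl1)
print("df1: dffits")
print(df1)

# What I expected from the Wikipedia definition
my_df1 <- (ey - py) / (sig1 * sqrt(lev1))
print("my_df1")
print(my_df1)

# What actually matches dffits()
my_df2 <- -lev1 * (ey - py) / (sig1 * sqrt(lev1))
print("my_df2")
print(my_df2)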
[1] "df1: dffits"
         1          2          3          4          5
-1.3333333  0.4082483  0.6000000 -1.0475699  0.2672612
[1] "my_df1"
         1          2          3          4          5
 2.2222222 -1.3608276 -3.0000000  3.4918995 -0.4454354
[1] "my_df2"
         1          2          3          4          5
-1.3333333  0.4082483  0.6000000 -1.0475699  0.2672612
I think that "my_df1" is "dffits" ( http://en.wikipedia.org/wiki/DFFITS ),
but in R, "my_df2" is what reproduces the result of dffits().
Please let me know why.
--
***** r.otasuke@gmail.com *****
http://cse.naro.affrc.go.jp/takezawa/intro.html
Check out: http://en.wikipedia.org/wiki/DFFITS
Dear Kunio,
The approach in dffits() in R is equivalent to the definition of DFFITS_i in
Belsley, Kuh, and Welsch, Regression Diagnostics (which is, I believe, the
original source, or close to it), generalized to WLS. Possibly a more
transparent definition would be
dfs <- function(mod) {
  rs <- rstudent(mod)   # externally studentized residuals
  h <- hatvalues(mod)   # leverages (diagonal of the hat matrix)
  sqrt(h / (1 - h)) * rs
}
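As a quick check, here is a minimal sketch that applies this function to the data from the original post and compares it with dffits(). It also suggests where the discrepancy in the original example seems to come from: the deleted-case prediction is the observed y_i minus e_i/(1 - h_i), not the fitted value plus e_i/(1 - h_i). With that change, the Wikipedia-style formula (fitted value minus deleted-case prediction, divided by s_(i) * sqrt(h_i)) agrees with dffits(), and the extra factor of -h_i in "my_df2" is no longer needed. The names py_loo, py_refit, and dffits_manual are illustrative only and are not part of R.

```r
# Data from the original post
xx <- c(1, 2, 3, 4, 5)
yy <- c(1, 3, 4, 2, 4)
fit <- lm(yy ~ xx)

infl <- lm.influence(fit)
h    <- infl$hat        # leverages h_i
s_i  <- infl$sigma      # leave-one-out residual standard deviations s_(i)
e    <- residuals(fit)  # ordinary residuals e_i
yhat <- fitted(fit)

# The formulation given above: studentized residual scaled by sqrt(h / (1 - h))
dfs <- function(mod) {
  rs <- rstudent(mod)
  h <- hatvalues(mod)
  sqrt(h / (1 - h)) * rs
}

# Deleted-case prediction from the identity y_i - yhat_(i) = e_i / (1 - h_i)
py_loo <- yy - e / (1 - h)

# Brute-force version: refit without observation i, then predict at x_i
py_refit <- sapply(seq_along(xx), function(i) {
  d <- data.frame(x = xx[-i], y = yy[-i])
  predict(lm(y ~ x, data = d), newdata = data.frame(x = xx[i]))
})

# Wikipedia-style DFFITS: change in fitted value over s_(i) * sqrt(h_i)
dffits_manual <- (yhat - py_loo) / (s_i * sqrt(h))

# Each comparison should report TRUE
all.equal(unname(py_loo), unname(py_refit))
all.equal(unname(dffits(fit)), unname(dfs(fit)))
all.equal(unname(dffits(fit)), unname(dffits_manual))
```

The brute-force refit is included only to confirm the shortcut formula for the deleted-case prediction; for larger data sets the closed form based on the leverages is the cheaper choice.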
I hope this helps,
John
------------------------------
John Fox, Professor
Department of Sociology
McMaster University
Hamilton, Ontario, Canada
web: socserv.mcmaster.ca/jfox
> -----Original Message-----
> From: r-help-bounces at r-project.org [mailto:r-help-bounces at r-project.org]
> On Behalf Of Kunio takezawa
> Sent: October-19-08 1:27 AM
> To: r-help at r-project.org
> Subject: [R] definition of "dffits"