The r newbie Fred
2011-Mar-07 09:03 UTC
[R] Difference between the S-plus influence and R empinf functions
Hello everyone!
I am currently trying to convert a program from S-plus to R, and I am having some trouble with the S-plus function influence(data, statistic, ...).
This function, which is meant to "calculate empirical influence values and related quantities", is part of the Resample library, which I cannot find for R.
However, two similar functions are available in R:
- lm.influence(model, ...), from the stats package,
- empinf(data, statistic, ...), from the boot package.
I didn't manage to use lm.influence() correctly, because it needs a fitted linear model (lm, glm) as input, whereas what I have as input is a function. I don't know the R/S-plus languages well, so I may be mistaken, but I believe lm.influence() is not what I should use for my problem...?
I have tried to use empinf() instead, but I am stuck on the input argument "group", which I cannot translate into R.
Here is a copy of the S-plus influence() help for this argument:
group: vector of length equal to the number of observations in data, for stratified sampling or multiple-sample problems. Sampling is done separately for each group (determined by unique values of this vector). If data is a data frame, this may be a variable in the data frame, or expression involving such variables.
empinf() accepts an argument called "strata", but it does not seem to correspond to "group".
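For reference, this is what "sampling is done separately for each group" means in practice. boot's documentation describes strata as specifying the strata for multi-sample problems, which appears to be the same design. A minimal base-R sketch, where resample_within_groups() is a hypothetical helper, not a function from boot or S-PLUS:

```r
# Hypothetical helper (not part of boot or S-PLUS): draw one bootstrap
# resample of row indices, sampling with replacement inside each group.
resample_within_groups <- function(group) {
  idx_by_group <- split(seq_along(group), group)
  # idx[sample.int(length(idx), ...)] avoids the sample() pitfall for a
  # one-element group (sample(5) would draw from 1:5, not return 5).
  unlist(lapply(idx_by_group, function(idx)
    idx[sample.int(length(idx), replace = TRUE)]), use.names = FALSE)
}

group <- c(1, 1, 1, 1, 0)
set.seed(1)
i <- resample_within_groups(group)
table(group[i])  # each group keeps its original size: one row of 0, four of 1
```

If both arguments describe this same resampling scheme, the difference between the outputs below is more likely to come from how the influence values are scaled than from how the groups are formed.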
Below is a sample test showing my problem:
testinflu <- function(data, weights) { sum(data[,1]*weights) }
mydata <- cbind(c(1,2,3,4,5), c(1,1,1,1,0))
library(boot)  # R only: empinf() is in the boot package
# In S-plus:
> testinflu(data=mydata, weights=rep(1,length(mydata[,1])))
15
# In R:
> testinflu(data=mydata, weights=rep(1,length(mydata[,1])))
15
# In S-plus:
> influence(data = mydata, statistic=testinflu)$L
     testinflu
[1,] -2.000000e+000
[2,] -1.000000e+000
[3,] -1.776357e-013
[4,]  1.000000e+000
[5,]  2.000000e+000
# In R:
> empinf(data = mydata, statistic=testinflu)
[1] -2.000000e+00 -1.000000e+00  2.220446e-12  1.000000e+00  2.000000e+00
# ==> OK
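The values empinf() returns here look like a numerical derivative of the statistic with respect to one observation's weight (an infinitesimal-jackknife style calculation). The sketch below is a reading of the numbers, not a copy of boot's code: it assumes weights are normalised to sum to 1 within each stratum, and inf_sketch() is a hypothetical name. Under that assumption it reproduces both empinf() outputs shown in this message, the ungrouped one and the stratified one.

```r
# Statistic from the post: a weighted sum of the first column.
testinflu <- function(data, weights) sum(data[, 1] * weights)
mydata <- cbind(c(1, 2, 3, 4, 5), c(1, 1, 1, 1, 0))

# Hypothetical helper: empirical influence values by finite differences.
# ASSUMPTION: weights are normalised to sum to 1 within each stratum.
inf_sketch <- function(data, statistic, strata = rep(1, nrow(data)),
                       eps = 1e-3) {
  n <- nrow(data)
  w0 <- 1 / as.vector(table(strata)[as.character(strata)])
  t0 <- statistic(data, w0)
  L <- numeric(n)
  for (i in seq_len(n)) {
    same <- strata == strata[i]
    w <- w0
    w[same] <- (1 - eps) * w[same]  # shrink the weights of i's stratum...
    w[i] <- w[i] + eps              # ...and move eps of mass onto obs i
    L[i] <- (statistic(data, w) - t0) / eps
  }
  L
}

round(inf_sketch(mydata, testinflu), 6)                       # -2 -1 0 1 2
round(inf_sketch(mydata, testinflu, strata = mydata[, 2]), 6) # -1.5 -0.5 0.5 1.5 0
```

For this linear statistic the derivative reduces to x_i minus the mean of observation i's stratum, which is why the stratified R values are centred within each group.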
# In S-plus :> influence(data = mydata, statistic=testinflu, group = mydata[, 2])$L
testinflu
[1,] -1.2
[2,] -0.4
[3,] 0.4
[4,] 1.2
[5,] 0.0
# In R:> empinf(data = mydata, statistic=testinflu, strata = mydata[, 2])
[1] -1.5 -0.5 0.5 1.5 0.0
# ==> NOT OK
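Comparing the two grouped outputs, every S-PLUS value is 0.8 times the R value (-1.2/-1.5, -0.4/-0.5, 0.4/0.5, 1.2/1.5), and 0.8 = 4/5 is the share of the observations that fall in group 1. Scaling each value by its group's share n_g/n also leaves the ungrouped example unchanged (5/5 = 1), so it fits everything shown here. It is still a conjecture from one small example, not documented S-PLUS behaviour. A minimal sketch, using the closed form "x minus its group mean" that matches empinf() for this linear statistic (L_r, share and L_splus_guess are hypothetical names):

```r
# Data and groups from the example above.
x <- c(1, 2, 3, 4, 5)
group <- c(1, 1, 1, 1, 0)

# For this linear statistic, R's stratified values are x minus its group mean.
L_r <- x - ave(x, group)  # -1.5 -0.5 0.5 1.5 0

# CONJECTURE: S-PLUS multiplies by the group's share of the data, n_g / n.
share <- ave(rep(1, length(x)), group, FUN = sum) / length(x)
L_splus_guess <- L_r * share  # -1.2 -0.4 0.4 1.2 0
```

With boot loaded, the same share vector could be applied to the output of empinf(data = mydata, statistic = testinflu, strata = mydata[, 2]). Whether the rule holds for non-linear statistics or for designs with more groups would need checking against further S-PLUS output.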
So I have a few questions:
- Has anyone already experienced the same kind of problem with the influence() function?
- Is it possible to mimic the "group" argument with empinf()?
I have looked for answers on the web but couldn't find anything really helpful, so if anyone has an idea I would really appreciate it! :)
Thanks,
Fred
P.S.: sorry for my broken English...
