The r newbie Fred
2011-Mar-07 09:03 UTC
[R] Difference between the S-plus influence and R empinf functions
Hello everyone!

I am currently trying to convert a program from S-plus to R, and I am having some trouble with the S-plus function "influence(data, statistic, ...)". This function aims to "calculate empirical influence values and related quantities", and is part of the Resample library, which I cannot find for R.

However, two similar functions are available in R:
- the lm.influence(model, ...) function,
- the empinf(data, statistic, ...) function.

I didn't manage to use lm.influence() correctly, because it needs a fitted linear model as input (lm, glm), and what I have as input is a function. (I don't know the R/S-plus languages well, so I may be mistaken, but I believe lm.influence() is not what I should use for my problem?)

I have tried to use the R empinf() function instead, but I am stuck on the input argument "group", which I cannot translate to R. Here is a copy of the S-plus influence() help concerning this argument:

  group: vector of length equal to the number of observations in data, for stratified sampling or multiple-sample problems. Sampling is done separately for each group (determined by unique values of this vector). If data is a data frame, this may be a variable in the data frame, or expression involving such variables.

empinf() accepts an argument called "strata", but it doesn't seem to correspond to "group".
Below is a sample test showing my problem:

testinflu <- function(data, weights) { sum(data[,1]*weights) }
mydata <- cbind(c(1,2,3,4,5), c(1,1,1,1,0))

# In S-plus:
> testinflu(data=mydata, weights=rep(1,length(mydata[,1])))
15

# In R:
> testinflu(data=mydata, weights=rep(1,length(mydata[,1])))
15

# In S-plus:
> influence(data = mydata, statistic=testinflu)$L
     testinflu
[1,] -2.000000e+000
[2,] -1.000000e+000
[3,] -1.776357e-013
[4,]  1.000000e+000
[5,]  2.000000e+000

# In R:
> empinf(data = mydata, statistic=testinflu)
[1] -2.000000e+00 -1.000000e+00 2.220446e-12 1.000000e+00 2.000000e+00
# ==> OK

# In S-plus:
> influence(data = mydata, statistic=testinflu, group = mydata[, 2])$L
     testinflu
[1,] -1.2
[2,] -0.4
[3,]  0.4
[4,]  1.2
[5,]  0.0

# In R:
> empinf(data = mydata, statistic=testinflu, strata = mydata[, 2])
[1] -1.5 -0.5 0.5 1.5 0.0
# ==> NOT OK

So I have a few questions:
- has anyone already experienced the same kind of problem with the influence function?
- is it possible to mimic the use of the "group" argument in empinf()?

I have looked for answers on the web but couldn't find anything really helpful, so if someone has an idea I would really appreciate it!! :)

Thanks,
Fred

PS: sorry for my broken English ...
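One possible reading of the two grouped outputs, offered as a sketch and not as documented behavior of either package: the S-plus values are exactly 4/5 of the R values in the group of four observations, which is what one would get if R's weights sum to 1 inside each stratum while S-plus's weights sum to 1 over the whole sample (so that a group of size n_h carries total weight n_h/n). The helper `fd_influence` below is hypothetical, written only for this illustration. It is a plain finite-difference calculation in base R, and the two weight vectors are assumptions inferred from the printed numbers.

```r
# Statistic and data from the post.
testinflu <- function(data, weights) sum(data[, 1] * weights)
mydata <- cbind(c(1, 2, 3, 4, 5), c(1, 1, 1, 1, 0))

# Hypothetical helper (not part of boot or of the S-plus Resample library):
# finite-difference influence values. For observation i, the weights of its
# group are shrunk by (1 - eps) and the removed mass is given back to i, so
# the total weight of every group stays unchanged.
fd_influence <- function(data, statistic, group, w0, eps = 1e-3) {
  t0 <- statistic(data, w0)
  sapply(seq_len(nrow(data)), function(i) {
    same <- group == group[i]
    w <- w0
    w[same] <- (1 - eps) * w[same]
    w[i] <- w[i] + eps * sum(w0[same])
    (statistic(data, w) - t0) / eps
  })
}

g   <- mydata[, 2]
n   <- nrow(mydata)
n_h <- as.vector(table(g)[as.character(g)])  # size of each observation's group

w_within  <- 1 / n_h        # assumption: weights sum to 1 inside every group
w_overall <- rep(1 / n, n)  # assumption: weights sum to 1 over the whole sample

# Matches the R output in the post: -1.5 -0.5 0.5 1.5 0.0
round(fd_influence(mydata, testinflu, g, w_within), 6)

# Matches the S-plus output in the post: -1.2 -0.4 0.4 1.2 0.0
round(fd_influence(mydata, testinflu, g, w_overall), 6)
```

If that reading is right, one way to approximate "group" with empinf() might be to keep strata = mydata[, 2] and pass a wrapper statistic that multiplies the incoming weights by n_h/n before calling testinflu. This is a suggestion based on one linear example only; whether S-plus follows the same convention for other statistics is not something the outputs above establish.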