Cecile De Cat
2012-Jul-31 10:26 UTC
[R] phantom NA/NaN/Inf in foreign function call (or something altogether different?)
Dear experts,

Please forgive the puzzled title and the length of this message - I thought it would be best to be as complete as possible and to show the avenues I have explored.

I'm trying to fit a linear model to data with a binary dependent variable (i.e. Target.ACC: accuracy of response) using lrm, and thought I would start from the most complex model (of which "sample1.lrm1" is a trimmed version). I got the error shown below. (sample1 is available at http://tinyurl.com/bwqq7ya)

For info:

> str(sample1)
'data.frame': 14022 obs. of 5 variables:
 $ Target.ACC : Factor w/ 2 levels "0","1": 2 2 2 2 2 2 2 2 2 2 ...
 $ Word.Order : Factor w/ 2 levels "HeadMod*","ModHead": 1 1 1 1 1 1 1 1 1 1 ...
 $ Target.RESP: Factor w/ 2 levels "1","2": 2 2 2 2 2 2 2 2 2 2 ...
 $ L1         : Factor w/ 3 levels "English","German",..: 2 2 2 2 2 2 2 2 2 2 ...
 $ Relation   : Factor w/ 4 levels "For","From","MadeOf",..: 1 1 1 1 1 1 1 1 1 1 ...

Commands and error message:

> sample1.dd = datadist(sample1)
> options(datadist="sample1.dd")
> sample1.lrm = lrm(Target.ACC ~ (L1 + Relation + Target.RESP + Word.Order)^2, sample1, x=T, y=T)
Error in lrm(Target.ACC ~ (L1 + Relation + Target.RESP + Word.Order)^2, :
  Unable to fit model using "lrm.fit"

So I tried to narrow down the error by looking at all the combinations manually, and the problem appears to be specifically with the interaction between Word.Order and Target.RESP. Models including interactions of these variables with other variables (e.g. L1, Relation) can be fitted without problem.

> sample1.lrm = lrm(Target.ACC ~ (Target.RESP + Word.Order)^2, sample1, x=T, y=T)
Error in lrm(Target.ACC ~ (Target.RESP + Word.Order)^2, dat, x = T, y = T) :
  Unable to fit model using "lrm.fit"

Unproblematic:

> sample1.lrm1 = lrm(Target.ACC ~ (L1 + Relation + Target.RESP)^2, sample1, x=T, y=T)
> sample1.lrm2 = lrm(Target.ACC ~ (L1 + Relation + Word.Order)^2, sample1, x=T, y=T)

When running the problematic analysis on a smaller sample of the same data, I get a different (more precise?) error message:

> sample2 <- sample1[1:500,]
> sample2.lrm = lrm(Target.ACC ~ (Target.RESP + Word.Order)^2, sample2, x=T, y=T)
Error in fitter(X, Y, penalty.matrix = penalty.matrix, tol = tol, weights = weights, :
  NA/NaN/Inf in foreign function call (arg 1)

But I cannot find any NA in the data:

> table(complete.cases(sample2))
TRUE
 500

Some portions of the data don't appear to contain any of the offending "bit":

> sample3 <- sample1[12500:13000,]
> sample3.lrm = lrm(Target.ACC ~ (Target.RESP + Word.Order)^2, sample3, x=T, y=T)

Could one of you shine a light on this puzzle, please? If that includes pointing me towards some background reading, that would be great too.

Many thanks in advance.

Cecile De Cat
Linguistics - University of Leeds
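The thread does not establish what causes the failure. One cheap check when a logistic fit breaks only once a particular interaction is included is to cross-tabulate the outcome against the two predictors and see whether any cell is empty or contains a single outcome value. The sketch below uses base R only and a made-up data frame, toy (hypothetical, not the poster's sample1), in which accuracy is by construction fully determined by the combination of response and word order. Whether the real data behave this way is an assumption that would need to be verified.

```r
# Toy data (hypothetical, NOT the poster's sample1): 25 trials per
# combination of response and word order.
toy <- expand.grid(
  Target.RESP = factor(c("1", "2")),
  Word.Order  = factor(c("HeadMod", "ModHead")),
  trial       = 1:25
)

# By construction, the response is "accurate" exactly when response "1"
# is paired with "HeadMod" or response "2" with "ModHead".
toy$Target.ACC <- factor(as.integer(
  (toy$Target.RESP == "1") == (toy$Word.Order == "HeadMod")
))

# Three-way table of counts: predictors by outcome.
cells <- xtabs(~ Target.RESP + Word.Order + Target.ACC, data = toy)
ftable(cells)

# Proportion of outcome "1" within each Target.RESP x Word.Order cell.
p_correct <- prop.table(cells, margin = c(1, 2))[, , "1"]

# TRUE when every cell contains only one outcome value, i.e. the
# interaction predicts the outcome perfectly in this toy data.
separated <- all(p_correct %in% c(0, 1))
separated
```

Running the same xtabs() call on the real data frame would show directly whether any Target.RESP by Word.Order cell lacks one of the two outcome values.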
R. Michael Weylandt
2012-Jul-31 14:23 UTC
[R] phantom NA/NaN/Inf in foreign function call (or something altogether different?)
On Tue, Jul 31, 2012 at 5:26 AM, Cecile De Cat <c.decat at leeds.ac.uk> wrote:
> [...]
> When running the problematic analysis on a smaller sample of the same
> data, I get a different (more precise?) error message:
>
>> sample2 <- sample1[1:500,]
>> sample2.lrm = lrm(Target.ACC ~ (Target.RESP + Word.Order)^2, sample2, x=T, y=T)
> Error in fitter(X, Y, penalty.matrix = penalty.matrix, tol = tol,
> weights = weights, :
> NA/NaN/Inf in foreign function call (arg 1)
>
> But I cannot find any NA in the data:
>> table(complete.cases(sample2))
> TRUE
> 500

Not a complete answer, but complete.cases() won't pick up +/- Inf.

    x <- data.frame(1:5, letters[1:5], c(NA, NaN, Inf, -Inf, 0))
    x[complete.cases(x),]

You could perhaps use something like sapply(x, is.finite) with any()/all() to hunt them down (is.finite requires "real" numbers: it gives FALSE for NA, NaN, Inf, and -Inf).

Best,
Michael

> [...]
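The suggestion above can be sketched as follows. The column names id, label and value, and the restriction to numeric columns, are illustrative choices and not part of the original reply.

```r
# Small example frame; the third column holds NA, NaN, Inf and -Inf.
x <- data.frame(
  id    = 1:5,
  label = letters[1:5],
  value = c(NA, NaN, Inf, -Inf, 0),
  stringsAsFactors = FALSE
)

# complete.cases() drops the NA and NaN rows but keeps the +/-Inf rows.
kept <- x[complete.cases(x), ]

# Check only the numeric columns, since is.finite() is FALSE for every
# character value and would flag those columns wholesale.
num_cols <- vapply(x, is.numeric, logical(1))

# Row indices with at least one non-finite numeric entry.
bad_rows <- which(rowSums(!sapply(x[num_cols], is.finite)) > 0)
bad_rows
```

Here kept still contains the two infinite values, while bad_rows flags all four problematic rows.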
Cecile De Cat
2012-Aug-02 16:11 UTC
[R] phantom NA/NaN/Inf in foreign function call (or something altogether different?)
Sorry. I've used:

> library(rms)

I realise I still have a lot to learn to ask questions well - it took me a long time to compile this one, but I've obviously missed important things. Please see below for the session info.

> sessionInfo()
R version 2.15.0 (2012-03-30)
Platform: i386-pc-mingw32/i386 (32-bit)

locale:
[1] LC_COLLATE=English_United Kingdom.1252  LC_CTYPE=English_United Kingdom.1252
[3] LC_MONETARY=English_United Kingdom.1252 LC_NUMERIC=C
[5] LC_TIME=English_United Kingdom.1252

attached base packages:
[1] splines stats graphics grDevices utils datasets methods base

other attached packages:
[1] rms_3.5-0 Hmisc_3.9-3 survival_2.36-12

loaded via a namespace (and not attached):
[1] cluster_1.14.2 grid_2.15.0 lattice_0.20-6 tools_2.15.0

Many thanks for your help.

Cecile

On 2 August 2012 01:00, R. Michael Weylandt <michael.weylandt@gmail.com> wrote:
> What package(s) are the functions in question from?
>
> This might also help:
>
> http://stackoverflow.com/questions/5963269/how-to-make-a-great-r-reproducible-example
>
> Michael
>
> On Wed, Aug 1, 2012 at 2:57 AM, Cecile De Cat <c.decat@leeds.ac.uk> wrote:
> > You're right, it's just the 2 columns that are characters that return
> > false. But I don't use them in the analysis (it's the experiments'
> > names and the participants' names).
> >
> > So I guess I'm back to my original question (although I can discard
> > one possible cause thanks to you): there appears to be only "real
> > numbers" in the data used for the lrm analysis, and yet it falls over.
> >
> > Thanks a lot for your help.
> >
> > Cecile
> >
> > On 31 July 2012 16:42, R. Michael Weylandt <michael.weylandt@gmail.com> wrote:
> >> What classes are the columns of your data frame?
> >>
> >> Note that
> >>
> >> is.finite("a") # FALSE
> >> is.finite(factor("a")) # TRUE
> >>
> >> M
> >>
> >> On Tue, Jul 31, 2012 at 10:34 AM, Cecile De Cat <c.decat@leeds.ac.uk> wrote:
> >>> Thank you. This is very useful. I do indeed get the following:
> >>>
> >>>> table(sapply(dat, is.finite))
> >>> FALSE TRUE
> >>> 28164 253476
> >>>
> >>> But the number of observations returned baffles me, as there should
> >>> only be 14082 in the data. And when I look at each variable
> >>> individually, none appear to violate "is.finite": e.g.
> >>>
> >>>> table(sapply(dat$Proficiency, is.finite))
> >>> TRUE
> >>> 14082
> >>>
> >>> Sorry if this is a dumb question, but can you help me understand
> >>> what's going on?
> >>>
> >>> Many thanks.
> >>>
> >>> Cecile
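The counts quoted above are consistent with sapply() returning one logical per cell rather than per row: 28164 + 253476 = 281640 = 20 x 14082, and 28164 = 2 x 14082, which matches two character columns out of twenty. The sketch below illustrates this on a small made-up data frame (the columns Participant, L1 and RT are hypothetical stand-ins, not the poster's data) and shows a per-column summary that is easier to read than a single pooled table.

```r
# Made-up frame: one character column, one factor, one numeric with an Inf.
dat <- data.frame(
  Participant = c("p1", "p2", "p3"),
  L1          = factor(c("English", "German", "German")),
  RT          = c(512, Inf, 634),
  stringsAsFactors = FALSE
)

# sapply() over a data frame applies is.finite() column by column and
# binds the results into a rows x columns logical matrix, so table()
# on it counts cells, not observations.
cellwise <- sapply(dat, is.finite)
dim(cellwise)

# Non-finite entries per column: every character value counts as
# non-finite, factor values count as finite, and only the Inf is
# flagged in the numeric column.
non_finite <- colSums(!cellwise)
non_finite
```

A per-column view like non_finite makes it clear at a glance whether the FALSE entries come from character columns that are not used in the model or from genuine non-finite values in the predictors.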