Chris Beeley
2012-Jan-09 14:11 UTC
[R] as.numeric() generates NAs inside an apply call, but fine outside of it
Hello-

I have rather a messy SPSS file which I have imported to R, I've dput'd some of the columns at the end of this message. I wish to get rid of all the labels and have numeric values using as.numeric. The funny thing is it works like this:

as.numeric(mydata[,2]) # generates correct numbers

however, if I pass the whole dataframe at once like this:

apply(mydata, 1:2, function(x) as.numeric(x))

This same column, column 2, generates NAs with a "in FUN(newX[, i], ...) : NAs introduced by coercion" message.

Meanwhile column 3 works fine like this:

as.numeric(mydata[,3]) # generates correct numbers

And generates numeric results out of the apply function.

I think I basically know why, the str() command tells me that the variables which work okay are "labelled" whereas the ones that don't are "Factor". However, I can't figure out what's special about the apply call that generates the NAs when as.numeric(mydata[,2]) doesn't and I'm not sure what to do about it in future.

I realise I can just loop over the columns, but I would rather get to the bottom of this if I can so I know for future.
Thanks in advance for any advice

Chris Beeley
Institute of Mental Health, UK

dput() gives-

structure(list(id = structure(1:79, label = structure("Participant",
.Names = "id"), class = "labelled"),
    item2.jan11 = structure(c(4L, 3L, 6L, 4L, 6L, 6L, 2L, 6L,
    2L, 2L, 3L, 3L, 1L, 6L, 2L, 6L, 4L, 2L, 6L, 2L, 6L, 6L, 6L,
    4L, 4L, 6L, 2L, 6L, 2L, 6L, 2L, 3L, 6L, 6L, 3L, 6L, 5L, 6L,
    3L, 6L, 1L, 3L, 3L, 3L, 6L, 4L, 1L, 3L, 6L, 2L, 6L, 2L, 6L,
    6L, 6L, 4L, 3L, 6L, 6L, 6L, 6L, 6L, 3L, 6L, 2L, 6L, 6L, 2L,
    4L, 6L, 2L, 5L, 6L, 6L, 6L, 6L, 1L, 6L, 4L), .Label = c("Not at all",
    "a little", "somewhat", "quite a lot", "very much", "missing data"
    ), class = c("labelled", "factor"), label = structure("The patients care for each other", .Names = "item2_jan11")),
    item12.jan11 = structure(c(5L, 5L, 999L, 5L, 999L, 999L,
    2L, 999L, 5L, 2L, 5L, 3L, 3L, 999L, 2L, 999L, 5L, 5L, 999L,
    5L, 999L, 999L, 999L, 5L, 5L, 999L, 3L, 999L, 5L, 999L, 3L,
    4L, 999L, 999L, 4L, 999L, 5L, 999L, 5L, 999L, 3L, 5L, 4L,
    4L, 999L, 3L, 2L, 4L, 999L, 5L, 999L, 5L, 999L, 999L, 999L,
    4L, 5L, 999L, 999L, 999L, 999L, 999L, 4L, 999L, 3L, 999L,
    999L, 1L, 5L, 999L, 3L, 5L, 999L, 999L, 999L, 999L, 4L, 999L,
    0L), value.labels = structure(c(999, 5, 4, 3, 2, 1), .Names = c("missing data",
    "very much", "quite a lot", "somewhat", "a little", "Not at all"
    )), label = structure("At times, members of staff are afraid of some of the patients", .Names = "item12_jan11"), class = "labelled")), .Names = c("id",
"item2.jan11", "item12.jan11"), class = "data.frame", row.names = c(NA,
-79L))
peter dalgaard
2012-Jan-09 14:29 UTC
[R] as.numeric() generates NAs inside an apply call, but fine outside of it
On Jan 9, 2012, at 15:11 , Chris Beeley wrote:

> Hello-
>
> I have rather a messy SPSS file which I have imported to R, I've dput'd some of the columns at the end of this message. I wish to get rid of all the labels and have numeric values using as.numeric. The funny thing is it works like this:
>
> as.numeric(mydata[,2]) # generates correct numbers
>
> however, if I pass the whole dataframe at once like this:
>
> apply(mydata, 1:2, function(x) as.numeric(x))

This is your problem. apply(mydata,....) implies as.matrix(mydata) and that turns everything to character, and in the case of a factor column that means the _levels_. I.e., this effect:

> as.matrix(mydata)
     id   item2.jan11    item12.jan11
[1,] " 1" "quite a lot"  " 5"
[2,] " 2" "somewhat"     " 5"
[3,] " 3" "missing data" "999"
[4,] " 4" "quite a lot"  " 5"
....

You might be looking for

as.data.frame(lapply(mydata, as.numeric))

--
Peter Dalgaard, Professor
Center for Statistics, Copenhagen Business School
Solbjerg Plads 3, 2000 Frederiksberg, Denmark
Phone: (+45)38153501
Email: pd.mes at cbs.dk Priv: PDalgd at gmail.com
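
The effect described above can be reproduced on a much smaller data frame. The following is only a minimal sketch: the data frame `df` and its columns `id` and `rating` are made up for illustration and are not the poster's data.

```r
# Small illustrative data frame: one integer column and one factor column.
# (df, id and rating are hypothetical names, not from the original post.)
df <- data.frame(
  id     = 1:3,
  rating = factor(c("a little", "somewhat", "a little"),
                  levels = c("Not at all", "a little", "somewhat"))
)

# apply() first coerces a data frame with as.matrix(). Because a factor is
# present, the result is a character matrix holding the level labels.
m <- as.matrix(df)

# Each cell is now a string; labels such as "a little" cannot be parsed as
# numbers, so as.numeric() returns NA (normally with a coercion warning).
via_apply <- suppressWarnings(apply(df, 1:2, function(x) as.numeric(x)))

# lapply() visits the columns one at a time, so the factor is still a factor
# when as.numeric() sees it and its integer codes are returned.
via_lapply <- as.data.frame(lapply(df, as.numeric))

print(m)
print(via_apply)
print(via_lapply)
```

The design point is that `lapply()` never builds an intermediate matrix, so each column keeps its own type until it is converted.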
Chris Beeley
2012-Jan-09 14:35 UTC
[R] as.numeric() generates NAs inside an apply call, but fine outside of it
Perfect, many thanks for explanation and correct line of code.

On 09/01/2012 14:29, peter dalgaard wrote:
> as.data.frame(lapply(mydata, as.numeric))
Petr PIKAL
2012-Jan-09 14:41 UTC
[R] as.numeric() generates NAs inside an apply call, but fine outside of it
Hi

> Hello-
>
> I have rather a messy SPSS file which I have imported to R, I've dput'd some of the columns at the end of this message. I wish to get rid of all the labels and have numeric values using as.numeric. The funny thing is it works like this:
>
> as.numeric(mydata[,2]) # generates correct numbers
>
> however, if I pass the whole dataframe at once like this:
>
> apply(mydata, 1:2, function(x) as.numeric(x))
>
> This same column, column 2, generates NAs with a "in FUN(newX[, i], ...) : NAs introduced by coercion" message.
>
> Meanwhile column 3 works fine like this:
>
> as.numeric(mydata[,3]) # generates correct numbers
>
> And generates numeric results out of the apply function.
>
> I think I basically know why, the str() command tells me that the variables which work okay are "labelled" whereas the ones that don't are "Factor". However, I can't figure out what's special about the apply call that generates the NAs when as.numeric(mydata[,2]) doesn't and I'm not sure what to do about it in future.

The Details section of the apply help page tells you that the input object is coerced to a matrix, which can hold values of only one type. Here everything is therefore converted to character, and the values that are not numbers cannot be coerced to numeric. From the help page:

  If X is not an array but an object of a class with a non-null dim value (such as a data frame), apply attempts to coerce it to an array via as.matrix if it is two-dimensional (e.g., a data frame) or via as.array.

Column two was originally a factor, which is stored as an integer vector of codes with the labels attached as levels, therefore as.numeric(mydata[,2]) works. Inside apply it was changed to character, and the other columns were converted too.

It is possible to change character values to numeric if they represent numbers, see

> as.numeric(letters)
 [1] NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA
[26] NA
Warning message:
NAs introduced by coercion
> as.numeric(as.character(1:10))
 [1]  1  2  3  4  5  6  7  8  9 10

If you really want to change factor values in a data frame to the underlying numeric codes, use

sapply(mydata, as.numeric)

Regards
Petr

> I realise I can just loop over the columns, but I would rather get to the bottom of this if I can so I know for future.
>
> Thanks in advance for any advice
>
> Chris Beeley
> Institute of Mental Health, UK
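
To make the difference between a factor's codes and its labels concrete, here is a short sketch. The factor `f` and the data frame `df` are invented for illustration and have nothing to do with the SPSS data in this thread.

```r
# A factor is stored as integer codes plus a character vector of levels.
# (f and df are hypothetical examples, not the poster's data.)
f <- factor(c("10", "20", "10", "30"))

# as.numeric() on the factor returns the internal codes ...
codes <- as.numeric(f)

# ... whereas going through as.character() parses the labels as numbers.
values <- as.numeric(as.character(f))

# sapply() simplifies its result to a numeric matrix, while
# as.data.frame(lapply(...)) gives a data frame back.
df <- data.frame(id = 1:4, f = f)
as_matrix <- sapply(df, as.numeric)
as_frame  <- as.data.frame(lapply(df, as.numeric))

print(codes)
print(values)
print(as_matrix)
print(as_frame)
```

Which of the two conversions is wanted depends on whether the labels themselves carry numeric meaning. For labels such as "a little" or "somewhat", only the codes can be converted.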