Jeff Brown
2010-Apr-30 15:08 UTC
[R] Why do data frame column types vary across apply, lapply?
Hi,

I still have little ability to predict how these functions will treat the columns of data frames:

> # Here's a data frame with a column "a" of integers,
> # and a column "b" of characters:
> df <- data.frame(
+   a = 1:2,
+   b = c("a","b")
+ )
> df
  a b
1 1 a
2 2 b
>
> # Except -- both columns are characters:
> apply (df, 2, typeof)
          a           b
"character" "character"
>
> # Except -- they're both integers:
> lapply (df, typeof)
$a
[1] "integer"

$b
[1] "integer"

>
> # Except -- only one of those integers is numeric:
> lapply (df, is.numeric)
$a
[1] TRUE

$b
[1] FALSE

Many thanks,
Jeff
Henrique Dallazuanna
2010-Apr-30 15:42 UTC
[R] Why do data frame column types vary across apply, lapply?
Hi,

On Fri, Apr 30, 2010 at 12:08 PM, Jeff Brown <dopethatwantscash@yahoo.com> wrote:

> Hi,
>
> I still have little ability to predict how these functions will treat the
> columns of data frames:
>
> > # Here's a data frame with a column "a" of integers,
> > # and a column "b" of characters:
> > df <- data.frame(
> +   a = 1:2,
> +   b = c("a","b")
> + )
> > df
>   a b
> 1 1 a
> 2 2 b
> >
> > # Except -- both columns are characters:
> > apply (df, 2, typeof)
>           a           b
> "character" "character"

apply converts everything to character.

> > # Except -- they're both integers:
> > lapply (df, typeof)
> $a
> [1] "integer"
>
> $b
> [1] "integer"

data.frame has an argument 'stringsAsFactors', which converts character columns to factor columns. Factors are integers with labels.

> > # Except -- only one of those integers is numeric:
> > lapply (df, is.numeric)
> $a
> [1] TRUE
>
> $b
> [1] FALSE

df$b is a factor.

> Many thanks,
> Jeff

--
Henrique Dallazuanna
Curitiba-Paraná-Brasil
25° 25' 40" S 49° 16' 22" O
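The point about 'stringsAsFactors' can be checked directly. The lines below are only a sketch, and the names df_factor and df_char are made up for illustration. Note that the default of this argument was TRUE when this thread was written but is FALSE from R 4.0.0 on, so it is spelled out in both calls.

```r
# Same data, built once with factors and once without.
df_factor <- data.frame(a = 1:2, b = c("a", "b"), stringsAsFactors = TRUE)
df_char   <- data.frame(a = 1:2, b = c("a", "b"), stringsAsFactors = FALSE)

# With factors, column b is stored as integer codes.
sapply(df_factor, typeof)

# Without factors, column b stays a character vector.
sapply(df_char, typeof)

# A factor is a set of integer codes plus the labels they point to.
as.integer(df_factor$b)   # the codes
levels(df_factor$b)       # the labels
```

So whether lapply(df, typeof) reports "integer" for column b depends on whether the column was turned into a factor when the data frame was built.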
Erik Iverson
2010-Apr-30 15:45 UTC
[R] Why do data frame column types vary across apply, lapply?
> I still have little ability to predict how these functions will treat the
> columns of data frames:

All of this is explained by knowing what class of data functions *work on*, and what class of data *you have*.

>> # Here's a data frame with a column "a" of integers,
>> # and a column "b" of characters:
>> df <- data.frame(
> +   a = 1:2,
> +   b = c("a","b")
> + )
>> df
>   a b
> 1 1 a
> 2 2 b

First, let's see what we have. Use str(df):

str(df)
'data.frame':   2 obs. of  2 variables:
 $ a: int  1 2
 $ b: Factor w/ 2 levels "a","b": 1 2

So we have a data.frame with two variables, one of class integer and one of class factor. Notice that neither is of class character.

>> # Except -- both columns are characters:
>> apply (df, 2, typeof)
>           a           b
> "character" "character"

See ?apply. The apply function works on *matrices*. You're not passing it a matrix, you're passing a data.frame. Matrices are two-dimensional vectors and are of *ONE* type. So apply could either

1) report an error saying "give me a matrix", or
2) try to convert whatever you gave it to a matrix.

Apply does (2), and converts it to the best thing it can, a character matrix. It can't be a numeric matrix since you have mixed types of data, so it goes to the "lowest common denominator", a matrix of characters. This is all explained in the first paragraph of ?apply.

>> # Except -- they're both integers:
>> lapply (df, typeof)
> $a
> [1] "integer"
>
> $b
> [1] "integer"

typeof is probably not very useful for casual R use. I've never used it. More useful is class. typeof shows you how R stores the object at a low level. Factors are just integer codes with labels, and you have an integer variable and a factor variable, so typeof reports "integer" for both.

Try lapply(df, class)

>> # Except -- only one of those integers is numeric:
>> lapply (df, is.numeric)
> $a
> [1] TRUE
>
> $b
> [1] FALSE

Yes, because you have a factor, and in the first 3 paragraphs of ?as.numeric, you'd see:

  Factors are handled by the default method, and there are methods for
  classes "Date" and "POSIXt" (in all three cases the result is false).
  Methods for 'is.numeric' should only return true if the base type of
  the class is 'double' or 'integer' _and_ values can reasonably be
  regarded as numeric (e.g. arithmetic on them makes sense).

See, it all makes perfect sense :).

My advice? Don't worry about typeof. *Always* know what class your objects are, and what class the functions you're using expect. Use ?str liberally.
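To see the three results side by side, something like the following should work. It is a sketch rather than the only way to do it: the variable names (col_class and so on) are invented here, sapply is used in place of lapply only to get compact named vectors, and stringsAsFactors = TRUE is written out because it is no longer the default from R 4.0.0 on.

```r
df <- data.frame(a = 1:2, b = c("a", "b"), stringsAsFactors = TRUE)

# apply() coerces the whole data frame to a single-type matrix first,
# so every column is seen as character.
apply(df, 2, typeof)

# sapply()/lapply() visit each column as it is stored in the data frame.
col_class  <- sapply(df, class)       # what each column *is*
col_typeof <- sapply(df, typeof)      # how each column is stored
col_isnum  <- sapply(df, is.numeric)  # whether arithmetic makes sense

col_class
col_typeof
col_isnum
```

Here class separates the integer column from the factor column, while typeof reports the same storage type for both, which is why class (or str) is usually the more informative check.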
Erik Iverson
2010-Apr-30 15:52 UTC
[R] Why do data frame column types vary across apply, lapply?
> See ?apply. The apply function works on *matrices*.

Actually arrays, and matrices are arrays with 2 dimensions.

> characters. This is all explained in the first paragraph of ?apply.

Also see ?as.matrix
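As a rough illustration of what ?as.matrix describes, compare a data frame whose columns are all numeric with a mixed one. The data frames num_df and mixed_df are made up for this sketch, and stringsAsFactors = TRUE is written out because it is not the default in R 4.0.0 and later.

```r
num_df   <- data.frame(a = 1:2, c = c(0.5, 1.5))
mixed_df <- data.frame(a = 1:2, b = c("a", "b"), stringsAsFactors = TRUE)

num_m   <- as.matrix(num_df)    # all columns numeric: numeric matrix
mixed_m <- as.matrix(mixed_df)  # one non-numeric column: character matrix

typeof(num_m)
typeof(mixed_m)

# A matrix is an array with two dimensions.
is.array(num_m)
length(dim(num_m))

# Because num_df converts to a numeric matrix, apply() behaves as expected.
col_sums <- apply(num_df, 2, sum)
col_sums
```

In other words, apply on a data frame is only safe when every column can share one type without losing meaning; for mixed columns, lapply or sapply keeps each column as it is.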