Robert A. LaBudde
2011-Jun-05 04:31 UTC
[R] How to convert a factor column into a numeric one?
I have a data frame:

> head(df)
  Time Temp Conc Repl    Log10
1    0  -20    H    1 6.406547
2    2  -20    H    1 5.738683
3    7  -20    H    1 5.796394
4   14  -20    H    1 4.413691
5    0    4    H    1 6.406547
7    7    4    H    1 5.705433
> str(df)
'data.frame':   177 obs. of  5 variables:
 $ Time : Factor w/ 4 levels "0","2","7","14": 1 2 3 4 1 3 4 1 3 4 ...
 $ Temp : Factor w/ 4 levels "-20","4","25",..: 1 1 1 1 2 2 2 3 3 3 ...
 $ Conc : Factor w/ 3 levels "H","L","M": 1 1 1 1 1 1 1 1 1 1 ...
 $ Repl : Factor w/ 5 levels "1","2","3","4",..: 1 1 1 1 1 1 1 1 1 1 ...
 $ Log10: num  6.41 5.74 5.8 4.41 6.41 ...
> levels(df$Temp)
[1] "-20" "4"   "25"  "45"
> levels(df$Time)
[1] "0"  "2"  "7"  "14"

As you can see, "Time" and "Temp" are currently factors, not numeric.

I would like to change these columns into numerical ones.

df$Time<- as.numeric(df$Time)

doesn't work, as it changes to the factor level indices (1,2,3,4) instead of the values (0,2,7,14).

There must be a direct way of doing this in R.

I tried recode() in 'car':

> df$Temp<- recode(df$Temp, '1=-20;2=25;3=4;4=45',as.factor.result=FALSE)
> head(df)
  Time Temp Conc Repl     Freq
1    0  -20    H    1 6.406547
2    2  -20    H    1 5.738683
3    7  -20    H    1 5.796394
4   14  -20    H    1 4.413691
5    0   45    H    1 6.406547
7    7   45    H    1 5.705433

but note that the values for 'Temp' in rows 5 and 7 are 45 and not 4, as expected, although the result is numeric. The same happens if I use the order given by levels(df$Temp) instead of the sort order in the recode() 2nd argument.

Any hints?

================================================================
Robert A. LaBudde, PhD, PAS, Dpl. ACAFS  e-mail: ral at lcfltd.com
Least Cost Formulations, Ltd.            URL: http://lcfltd.com/
824 Timberlake Drive                     Tel: 757-467-0954
Virginia Beach, VA 23464-3239            Fax: 757-467-2947

"Vere scire est per causas scire"
Jorge Ivan Velez
2011-Jun-05 04:49 UTC
[R] How to convert a factor column into a numeric one?
Dr. LaBudde,

Perhaps as.numeric(as.character(x)) is what you are looking for.

HTH,
Jorge
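
A minimal sketch of the difference between the two conversions, using a small factor with the same levels as the Temp column in the original post. The vector `temp` is made up for illustration and is not the poster's data.

```r
# Hypothetical factor mirroring levels(df$Temp) from the question
temp <- factor(c("-20", "-20", "4", "4", "25", "45"),
               levels = c("-20", "4", "25", "45"))

# as.numeric() on a factor returns the internal integer codes
codes <- as.numeric(temp)

# Converting to character first recovers the labelled values
values <- as.numeric(as.character(temp))

# Equivalent form described in ?factor: convert the levels once,
# then index that vector by the factor's codes
values2 <- as.numeric(levels(temp))[temp]

print(codes)
print(values)
```

The second form converts each level only once, which can matter for long vectors with few levels; for a data frame of 177 rows either form should be fine.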
Dennis Murphy
2011-Jun-05 04:49 UTC
[R] How to convert a factor column into a numeric one?
Hi:

Try this:

> dd <- data.frame(a = factor(rep(1:5, each = 4)),
+                  b = factor(rep(rep(1:2, each = 2), 5)),
+                  y = rnorm(20))
> str(dd)
'data.frame':   20 obs. of  3 variables:
 $ a: Factor w/ 5 levels "1","2","3","4",..: 1 1 1 1 2 2 2 2 3 3 ...
 $ b: Factor w/ 2 levels "1","2": 1 1 2 2 1 1 2 2 1 1 ...
 $ y: num  0.6396 1.467 1.8403 -0.0915 0.2711 ...
> de <- within(dd, {
+     a <- as.numeric(as.character(a))
+     b <- as.numeric(as.character(b))
+ } )
> str(de)
'data.frame':   20 obs. of  3 variables:
 $ a: num  1 1 1 1 2 2 2 2 3 3 ...
 $ b: num  1 1 2 2 1 1 2 2 1 1 ...
 $ y: num  0.6396 1.467 1.8403 -0.0915 0.2711 ...

HTH,
Dennis
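
The data frame in the question also has a factor column (Conc) whose levels are not numbers, so converting every factor would turn it into NA. One possible sketch that converts only the factor columns whose levels all parse as numbers is below. The helper name `factor_cols_to_numeric` and the data frame `ex` are hypothetical, introduced only for this illustration.

```r
# Hypothetical helper: convert only factor columns whose levels all parse as numbers
factor_cols_to_numeric <- function(data) {
  is_numeric_factor <- vapply(data, function(col) {
    is.factor(col) && !anyNA(suppressWarnings(as.numeric(levels(col))))
  }, logical(1))
  if (any(is_numeric_factor)) {
    data[is_numeric_factor] <- lapply(data[is_numeric_factor],
                                      function(col) as.numeric(levels(col))[col])
  }
  data
}

# Made-up data frame shaped like the one in the question
ex <- data.frame(Time  = factor(c(0, 2, 7, 14)),
                 Temp  = factor(c(-20, -20, 4, 4)),
                 Conc  = factor(c("H", "H", "L", "M")),
                 Log10 = c(6.41, 5.74, 5.80, 4.41))

ex2 <- factor_cols_to_numeric(ex)
str(ex2)
```

Note that this would also convert a column such as Repl, whose levels look numeric but which may be meant as a categorical label, so naming the columns explicitly (as in the within() call above) is the safer choice when the intent differs by column.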
Hi Robert,

Try this:

## Example data converting mtcars to factors
testdf <- as.data.frame(lapply(mtcars, factor))
str(testdf)

## taking advantage of assignment methods to avoid an explicit call to as.data.frame
## convert factor to numeric using the technique recommended in ?factor
testdf[] <- lapply(testdf, function(x) as.numeric(levels(x))[x])
str(testdf)

If you do not want to convert all columns, just use a subset. Here is one way:

testdf[, c("mpg", "cyl", "disp")] <- lapply(testdf[, c("mpg", "cyl", "disp")],
    function(x) as.numeric(levels(x))[x])

I would also look into *why* those numeric columns are being stored as factors in the first place. If you are reading the data in with read.table() or one of its wrapper functions (like read.csv), then it would be better to preempt the storage as a factor altogether rather than converting back to numeric. For example, perhaps something is being used to indicate missing data that R does not recognize (e.g., SAS uses "."). Specifying na.strings = "." would fix this. See ?read.table for some of the options available.

Hope this helps,
Josh

--
Joshua Wiley
Ph.D. Student, Health Psychology
University of California, Los Angeles
http://www.joshuawiley.com/
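
As a rough illustration of the na.strings point, the sketch below reads a small made-up CSV in which "." marks a missing value. The text in `csv_text` is an assumption for demonstration and is not the poster's file.

```r
# Made-up CSV text in which "." marks a missing value (assumption for illustration)
csv_text <- "Time,Temp,Log10
0,-20,6.41
2,-20,.
7,4,5.80"

# Without na.strings, the "." forces Log10 to be read as text, which becomes
# a factor when stringsAsFactors = TRUE (the default before R 4.0.0)
raw <- read.csv(text = csv_text, stringsAsFactors = TRUE)

# Declaring "." as missing lets read.csv() parse the column as numeric
clean <- read.csv(text = csv_text, na.strings = ".")

str(raw)
str(clean)
```

Since R 4.0.0 the default is stringsAsFactors = FALSE, so the same file would arrive as a character column rather than a factor, but the na.strings fix applies either way.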