Lopez, Dan
2013-Feb-21 00:09 UTC
[R] Having trouble converting a dataframe of character vectors to factors
R Experts,

I have a dataframe made up of character vectors--these are results from survey questions. I need to convert them to factors.

I tried the following which did not work:
scs2<-sapply(scs2,as.factor)
also this didn't work:
scs2<-sapply(scs2,function(x) as.factor(x))

After doing either of above I end up with
> str(scs2)
 chr [1:10, 1:10] "very important" "very important" "very important" "very important" ...
 - attr(*, "dimnames")=List of 2
  ..$ : NULL
  ..$ : chr [1:10] "Q1_1" "Q1_2" "Q1_3" "Q1_4" ...
> class(scs2)
"matrix"

But when I do it one at a time it works:
scs2$Q1_1<-as.factor(scs2$Q1_1)
scs2$Q1_2<- as.factor(scs2$Q1_2)

What am I doing wrong? How do I accomplish this with sapply or similar function?

Data for reproducibility:

scs2<-structure(list(Q1_1 = c("very important", "very important", "very important",
"very important", "very important", "very important", "very important",
"somewhat important", "important", "very important"), Q1_2 = c("important",
"somewhat important", "very important", "important", "important",
"very important", "somewhat important", "somewhat important",
"very important", "very important"), Q1_3 = c("very important",
"important", "very important", "very important", "important",
"very important", "very important", "somewhat important", "not important",
"important"), Q1_4 = c("very important", "important", "very important",
"very important", "important", "important", "important", "very important",
"somewhat important", "important"), Q1_5 = c("very important",
"not important", "important", "very important", "not important",
"important", "somewhat important", "important", "somewhat important",
"not important"), Q1_6 = c("very important", "not important",
"important", "very important", "somewhat important", "very important",
"very important", "very important", "important", "important"),
Q1_7 = c("very important", "somewhat important", "important",
"somewhat important", "important", "important", "very important",
"very important", "somewhat important", "not important"),
Q2 = c("Somewhat", "Very Much", "Somewhat", "Very Much",
"Very Much", "Very Much", "Very Much", "Very Much", "Very Much",
"Very Much"), Q3 = c("yes", "yes", "yes", "yes", "yes", "yes",
"yes", "yes", "yes", "yes"), Q4 = c("None", "None", "None",
"None", "Confirmed Field of Study", "Confirmed Field of Study",
"Confirmed Field of Study", "None", "None", "None")), .Names = c("Q1_1",
"Q1_2", "Q1_3", "Q1_4", "Q1_5", "Q1_6", "Q1_7", "Q2", "Q3", "Q4"
), row.names = c(78L, 46L, 80L, 196L, 188L, 197L, 39L, 195L,
172L, 110L), class = "data.frame")
Bert Gunter
2013-Feb-21 00:24 UTC
[R] Having trouble converting a dataframe of character vectors to factors
Please re-read ?sapply and pay particular attention to the "simplify" argument. The following should help explain the issues:

> z <- data.frame(a=letters[1:3],b=letters[4:6],stringsAsFactors=FALSE)
> sapply(z,class)
          a           b
"character" "character"
> z1 <- sapply(z,as.factor)
> sapply(z1,class)
          a           b           c           d           e           f
"character" "character" "character" "character" "character" "character"
> z2 <- sapply(z,factor, simplify = FALSE)
> sapply(z2,class)
       a        b
"factor" "factor"
> z3 <- lapply(z,factor)
> sapply(z3,class)
       a        b
"factor" "factor"
> z3
$a
[1] a b c
Levels: a b c

$b
[1] d e f
Levels: d e f

## Note that both z2 and z3 are lists, and would have to be converted back to data frames.

-- Bert

Bert Gunter
Genentech Nonclinical Biostatistics
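The point about "simplify" can be sketched on the same kind of toy data. This is only an illustration, not part of the original reply: the names z and m are made up here, and assigning into z[] is one common way to convert the list back to a data frame.

```r
# Toy data frame of character columns (same shape as the example above).
z <- data.frame(a = letters[1:3], b = letters[4:6], stringsAsFactors = FALSE)

# sapply() simplifies equal-length results into a matrix; building that
# matrix strips the factor class, so the values come back as character.
m <- sapply(z, as.factor)
is.matrix(m)
is.character(m)

# lapply() never simplifies, so every element stays a factor.
# Assigning into z[] replaces the columns but keeps z a data frame.
z[] <- lapply(z, factor)
str(z)
```

Because the assignment targets z[], the result is still a data frame, so no separate conversion step is needed afterwards.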
Mark Lamias
2013-Feb-21 02:50 UTC
[R] Having trouble converting a dataframe of character vectors to factors
How about this?

scs2<-data.frame(lapply(scs2, factor))
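When a data frame mixes character columns with other types (for example an integer id column), one might want to convert only the character columns. The following is a sketch under that assumption; the data frame survey and its columns are hypothetical and not taken from the thread.

```r
# Hypothetical data: an integer id column plus character survey answers.
survey <- data.frame(
  id = 1:3,
  Q1 = c("yes", "no", "yes"),
  Q2 = c("Somewhat", "Very Much", "Somewhat"),
  stringsAsFactors = FALSE
)

# Flag the character columns, then convert just those in place.
is_chr <- vapply(survey, is.character, logical(1))
survey[is_chr] <- lapply(survey[is_chr], factor)

str(survey)
```

The id column is left untouched because it is never passed to factor().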
Lopez, Dan
2013-Feb-21 22:50 UTC
[R] Having trouble converting a dataframe of character vectors to factors
Hi Bill,

Great info. The problem is that what was originally given to me looks like DPUT1 below (random sample of 25). This is the only format they can give me this in, and the data already looks molten. So I applied reshape2::dcast, which resulted in a dataframe made of character vectors, except for the first column, which is an integer vector. After dropping columns full of "" (blanks) and reordering columns, I figured I needed factors to accomplish my goal (refer below) and converted everything to factors with:

> x2[,-1]<-as.data.frame(lapply(x[,-1],as.factor))

and ended up with DPUT2 below (random sample of 25).

Now, after reading your last email, I figure I've done well, since no attributes got dropped, no levels got dropped (I just need to add some in because they couldn't be derived from the original dataframe), and the column names seem fine.

Now I have a new problem, which is how to reorder levels in a dataframe and possibly add some unused ones. After seeing the contents using Hmisc::contents I figured the next logical step is to handle like vectors a chunk at a time. For example, subsetting to grepl("Q1_",names(scs.c2)) gives these vectors, which all have identical levels except for one:

$Q1_1 thru $Q1_7 except $Q1_3
[1] "" "important" "not important" "somewhat important" "very important"
$Q1_3
[1] "important" "not important" "somewhat important" "very important"

#So I tried this, which had no effect
keepcols<- grepl("Q1_",names(scs.c2))
levels(scs.c2[,keepcols])<-list(NoResp="",NotImportant="not important",SomewhatImpt="somewhat important",Important="important",VeryImpt="very important")

#then this, which also failed. It coerced a bunch of NA's and turned the vectors back to character vectors
scs.c2[,keepcols]<-sapply(scs.c2[,keepcols],function(x) factor(x,levels(x)[c(NoResp="",NotImportant="not important",SomewhatImpt="somewhat important",Important="important",VeryImpt="very important")]))

Mind you, I can easily do this in MS Excel, and that is probably what I am going to break down and do fairly soon. But I wanted to give this a good solid shot in R because I want to learn to handle these situations in R. I've been using R for almost a year.

__________________________
ADDITIONAL BACKGROUND

MY GOAL
I ultimately want to get started with some basic correlation analysis for some of the columns. Taking your example (slightly modified), I hope to be able to do this:

xx <- data.frame(stringsAsFactors=FALSE, check.names=FALSE,
    "No/Yes" = factor(c("Yes","No","No","No"), levels=c("No","Yes")),
    "Size" = ordered(c("Small","Large","Medium","Medium"), levels=c("Small","Medium","Large")),
    "Name" = c("Adam","Bill","Chuck","Larry"))
> cor(sapply(xx[,1:2],as.numeric))
           No/Yes       Size
No/Yes  1.0000000 -0.8164966
Size   -0.8164966  1.0000000

DPUT1
structure(list(svaID = c(771L, 771L, 775L, 775L, 774L, 776L, 774L, 771L, 771L, 771L, 771L, 774L, 774L, 775L, 765L, 775L, 765L, 775L, 771L, 777L, 775L, 771L, 774L, 776L, 776L),
question = structure(c(19L, 12L, 23L, 3L, 10L, 36L, 25L, 1L, 30L, 7L, 21L, 13L, 16L, 32L, 6L, 5L, 18L, 19L, 14L, 2L, 2L, 9L, 37L, 28L, 24L), .Label = c("Q1", "Q1_1", "Q1_2", "Q1_3", "Q1_4", "Q1_5", "Q1_6", "Q1_7", "Q10", "Q11", "Q12", "Q13", "Q14", "Q15", "Q16", "Q17", "Q17_1", "Q17_2", "Q17_3", "Q17_4", "Q17_5", "Q18", "Q19", "Q2", "Q20", "Q3", "Q4", "Q5", "Q6", "Q6_A_1", "Q6_A_2", "Q6_A_3", "Q6_A_4", "Q6_A_5", "Q7", "Q8", "Q9"), class = "factor"),
answer = structure(c(11L, 29L, 29L, 26L, 29L, 29L, 1L, 1L, 1L, 13L, 11L, 1L, 1L, 1L, 26L, 26L, 11L, 11L, 29L, 13L, 13L, 29L, 29L, 29L, 27L), .Label = c("", "1", "2", "3", "4", "5", "Change of College/University", "Change of Field of Study", "Confirmed Field of Study", "did not meet expectations", "exceeded expectations", "Family/Friend", "important", "Live Locally", "LLNL Contact", "LLNL Housing page", "Local Newspaper", "met expectations", "no", "None", "Not at All", "not important", "Pursue an Advanced Degree", "Somewhat", "somewhat important", "very important", "Very Much", "Web", "yes"), class = "factor")),
.Names = c("svaID", "question", "answer"), row.names = c(68L, 62L, 147L, 113L, 97L, 168L, 111L, 45L, 51L, 43L, 70L, 100L, 108L, 127L, 5L, 115L, 30L, 142L, 64L, 186L, 112L, 59L, 95L, 160L, 157L), class = "data.frame")

DPUT2
structure(list(svaID = c(765L, 771L, 774L, 775L, 776L, 777L, 778L, 779L, 782L, 783L, 786L, 788L, 789L, 790L, 791L, 793L, 794L, 795L, 797L, 801L, 803L, 804L, 805L, 807L, 808L),
Q1_1 = structure(c(5L, 5L, 5L, 2L, 5L, 2L, 5L, 5L, 5L, 5L, 5L, 5L, 5L, 5L, 5L, 2L, 5L, 5L, 5L, 5L, 5L, 5L, 2L, 2L, 2L), .Label = c("", "important", "not important", "somewhat important", "very important"), class = "factor"),
Q1_2 = structure(c(2L, 5L, 2L, 5L, 2L, 4L, 3L, 5L, 4L, 2L, 2L, 5L, 2L, 3L, 5L, 2L, 2L, 5L, 5L, 5L, 5L, 2L, 1L, 2L, 3L ), .Label = c("", "important", "not important", "somewhat important", "very important"), class = "factor"),
Q1_3 = structure(c(4L, 4L, 4L, 4L, 4L, 1L, 1L, 4L, 1L, 4L, 4L, 4L, 4L, 4L, 4L, 1L, 4L, 4L, 1L, 4L, 4L, 4L, 4L, 1L, 4L), .Label = c("important", "not important", "somewhat important", "very important"), class = "factor"),
Q1_4 = structure(c(5L, 5L, 5L, 5L, 5L, 2L, 2L, 5L, 2L, 2L, 5L, 5L, 5L, 5L, 5L, 2L, 5L, 5L, 5L, 5L, 5L, 5L, 5L, 5L, 5L ), .Label = c("", "important", "not important", "somewhat important", "very important"), class = "factor"),
Q1_5 = structure(c(5L, 3L, 5L, 5L, 3L, 2L, 2L, 3L, 2L, 3L, 5L, 5L, 5L, 4L, 4L, 5L, 5L, 5L, 3L, 3L, 3L, 5L, 2L, 2L, 4L), .Label = c("", "important", "not important", "somewhat important", "very important"), class = "factor"),
Q1_6 = structure(c(5L, 2L, 2L, 2L, 5L, 2L, 4L, 5L, 4L, 5L, 5L, 5L, 5L, 5L, 5L, 2L, 5L, 2L, 4L, 2L, 4L, 5L, 2L,
4L, 4L ), .Label = c("", "important", "not important", "somewhat important", "very important"), class = "factor"), Q1_7 = structure(c(3L, 2L, 5L, 2L, 2L, 5L, 2L, 5L, 5L, 5L, 2L, 5L, 5L, 2L, 4L, 2L, 5L, 2L, 3L, 5L, 4L, 5L, 2L, 2L, 4L), .Label = c("", "important", "not important", "somewhat important", "very important"), class = "factor"), Q2 = structure(c(4L, 3L, 4L, 4L, 4L, 4L, 4L, 4L, 3L, 4L, 4L, 4L, 4L, 4L, 3L, 4L, 4L, 4L, 4L, 4L, 4L, 4L, 3L, 3L, 4L ), .Label = c("", "Not at All", "Somewhat", "Very Much"), class = "factor"), Q3 = structure(c(2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 1L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L ), .Label = c("", "yes"), class = "factor"), Q4 = structure(c(4L, 5L, 6L, 4L, 5L, 5L, 5L, 4L, 5L, 4L, 4L, 4L, 4L, 6L, 3L, 5L, 4L, 4L, 5L, 5L, 4L, 4L, 5L, 5L, 4L), .Label = c("", "Change of College/University", "Change of Field of Study", "Confirmed Field of Study", "None", "Pursue an Advanced Degree"), class = "factor"), Q5 = structure(c(3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L), .Label = c("", "no", "yes"), class = "factor"), Q6 = structure(c(3L, 5L, 2L, 2L, 7L, 5L, 7L, 5L, 4L, 5L, 3L, 4L, 4L, 5L, 7L, 5L, 5L, 3L, 4L, 5L, 2L, 3L, 5L, 5L, 4L), .Label = c("", "Family/Friend", "Live Locally", "LLNL Contact", "LLNL Housing page", "Local Newspaper", "Web"), class = "factor"), Q6_A_1 = structure(c(1L, 1L, 1L, 6L, 1L, 1L, 3L, 1L, 1L, 1L, 1L, 1L, 3L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L), .Label = c("", "1", "2", "3", "4", "5"), class = "factor"), Q6_A_2 = structure(c(1L, 1L, 1L, 1L, 2L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L), .Label = c("", "4", "5"), class = "factor"), Q6_A_3 = structure(c(1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L), .Label = c("", "5"), class = "factor"), Q6_A_4 = structure(c(1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 
1L, 1L, 1L, 1L, 1L), .Label = c("", "5"), class = "factor"), Q6_A_5 = structure(c(1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 3L), .Label = c("", "2", "3", "4", "5"), class = "factor"), Q8 = structure(c(3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 2L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L ), .Label = c("", "no", "yes"), class = "factor"), Q9 = structure(c(3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 2L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L), .Label = c("", "no", "yes"), class = "factor"), Q10 = structure(c(3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L), .Label = c("", "no", "yes"), class = "factor"), Q11 = structure(c(3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 2L ), .Label = c("", "no", "yes"), class = "factor"), Q12 = structure(c(3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 2L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L), .Label = c("", "no", "yes"), class = "factor"), Q13 = structure(c(3L, 3L, 3L, 3L, 3L, 3L, 1L, 3L, 3L, 1L, 1L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 1L, 3L, 1L, 1L, 3L, 1L, 3L), .Label = c("", "no", "yes"), class = "factor"), Q14 = structure(c(3L, 1L, 1L, 3L, 2L, 3L, 1L, 3L, 3L, 1L, 1L, 3L, 3L, 3L, 3L, 3L, 3L, 1L, 1L, 1L, 1L, 3L, 3L, 1L, 2L ), .Label = c("", "no", "yes"), class = "factor"), Q15 = structure(c(2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 1L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 1L, 2L, 2L, 2L, 2L, 2L, 2L), .Label = c("", "yes" ), class = "factor"), Q16 = structure(c(4L, 4L, 4L, 3L, 3L, 3L, 4L, 4L, 3L, 3L, 3L, 3L, 3L, 4L, 4L, 4L, 4L, 3L, 4L, 3L, 3L, 3L, 4L, 3L, 4L), .Label = c("", "did not meet expectations", "exceeded expectations", "met expectations"), class = "factor"), Q17_1 = structure(c(3L, 4L, 4L, 3L, 3L, 3L, 4L, 4L, 4L, 3L, 3L, 3L, 3L, 4L, 2L, 4L, 4L, 3L, 3L, 3L, 3L, 3L, 4L, 4L, 4L ), .Label = c("", "did not meet expectations", "exceeded expectations", "met 
expectations"), class = "factor"), Q17_2 = structure(c(3L, 4L, 4L, 3L, 3L, 3L, 4L, 4L, 4L, 3L, 4L, 3L, 3L, 3L, 4L, 4L, 4L, 3L, 3L, 3L, 3L, 4L, 3L, 3L, 4L), .Label = c("", "did not meet expectations", "exceeded expectations", "met expectations"), class = "factor"), Q17_3 = structure(c(3L, 3L, 4L, 3L, 3L, 4L, 4L, 3L, 4L, 4L, 4L, 4L, 3L, 4L, 4L, 4L, 4L, 3L, 4L, 3L, 3L, 3L, 3L, 3L, 4L ), .Label = c("", "did not meet expectations", "exceeded expectations", "met expectations"), class = "factor"), Q17_4 = structure(c(4L, 4L, 4L, 3L, 2L, 3L, 4L, 3L, 3L, 3L, 4L, 4L, 3L, 4L, 3L, 4L, 4L, 3L, 2L, 4L, 3L, 3L, 3L, 3L, 4L), .Label = c("", "did not meet expectations", "exceeded expectations", "met expectations"), class = "factor"), Q17_5 = structure(c(3L, 3L, 4L, 3L, 4L, 4L, 4L, 3L, 4L, 4L, 4L, 3L, 4L, 4L, 4L, 4L, 4L, 4L, 3L, 4L, 4L, 3L, 4L, 4L, 3L ), .Label = c("", "did not meet expectations", "exceeded expectations", "met expectations"), class = "factor"), Q18 = structure(c(3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 1L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 1L, 3L, 3L, 3L, 3L), .Label = c("", "no", "yes"), class = "factor"), Q19 = structure(c(3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L), .Label = c("", "no", "yes"), class = "factor")), .Names = c("svaID", "Q1_1", "Q1_2", "Q1_3", "Q1_4", "Q1_5", "Q1_6", "Q1_7", "Q2", "Q3", "Q4", "Q5", "Q6", "Q6_A_1", "Q6_A_2", "Q6_A_3", "Q6_A_4", "Q6_A_5", "Q8", "Q9", "Q10", "Q11", "Q12", "Q13", "Q14", "Q15", "Q16", "Q17_1", "Q17_2", "Q17_3", "Q17_4", "Q17_5", "Q18", "Q19" ), row.names = c(NA, 25L), class = "data.frame") Thanks. 
Dan

-----Original Message-----
From: William Dunlap [mailto:wdunlap at tibco.com]
Sent: Thursday, February 21, 2013 8:33 AM
To: Mark Lamias; Lopez, Dan; R help (r-help at r-project.org)
Subject: RE: [R] Having trouble converting a dataframe of character vectors to factors

> scs2<-data.frame(lapply(scs2, factor))

Calling data.frame() on the output of lapply() can result in changing column names and will drop attributes that the input data.frame may have had. I prefer to modify the original data.frame instead of making a new one from scratch to avoid these problems.

Also, calling factor() on a factor will drop any unused levels, which you may not want to do. Calling as.factor will not.

Compare the following three methods

  f1 <- function (dataFrame) {
      dataFrame[] <- lapply(dataFrame, factor)
      dataFrame
  }
  f2 <- function (dataFrame) {
      dataFrame[] <- lapply(dataFrame, as.factor)
      dataFrame
  }
  f3 <- function (dataFrame) {
      data.frame(lapply(dataFrame, factor))
  }

on the following data.frame

  x <- data.frame(stringsAsFactors=FALSE, check.names=FALSE,
      "No/Yes" = factor(c("Yes","Yes","Yes"), levels=c("No","Yes")),
      "Size" = ordered(c("Small","Large","Medium"), levels=c("Small","Medium","Large")),
      "Name" = c("Adam","Bill","Chuck"))
  attr(x, "Date") <- as.POSIXlt("2013-02-21")

  > str(x)
  'data.frame': 3 obs. of 3 variables:
   $ No/Yes: Factor w/ 2 levels "No","Yes": 2 2 2
   $ Size  : Ord.factor w/ 3 levels "Small"<"Medium"<..: 1 3 2
   $ Name  : chr "Adam" "Bill" "Chuck"
   - attr(*, "Date")= POSIXlt, format: "2013-02-21"
  > str(f1(x)) # drops unused levels
  'data.frame': 3 obs. of 3 variables:
   $ No/Yes: Factor w/ 1 level "Yes": 1 1 1
   $ Size  : Ord.factor w/ 3 levels "Small"<"Medium"<..: 1 3 2
   $ Name  : Factor w/ 3 levels "Adam","Bill",..: 1 2 3
   - attr(*, "Date")= POSIXlt, format: "2013-02-21"
  > str(f2(x))
  'data.frame': 3 obs. of 3 variables:
   $ No/Yes: Factor w/ 2 levels "No","Yes": 2 2 2
   $ Size  : Ord.factor w/ 3 levels "Small"<"Medium"<..: 1 3 2
   $ Name  : Factor w/ 3 levels "Adam","Bill",..: 1 2 3
   - attr(*, "Date")= POSIXlt, format: "2013-02-21"
  > str(f3(x)) # mangles column names, drops unused levels, drops Date attribute
  'data.frame': 3 obs. of 3 variables:
   $ No.Yes: Factor w/ 1 level "Yes": 1 1 1
   $ Size  : Ord.factor w/ 3 levels "Small"<"Medium"<..: 1 3 2
   $ Name  : Factor w/ 3 levels "Adam","Bill",..: 1 2 3

Bill Dunlap
Spotfire, TIBCO Software
wdunlap tibco.com
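The level-reordering question in the follow-up is not answered in the thread. One possible approach, sketched here on hypothetical data, is to rebuild each selected column with factor() and an explicit levels vector, again using lapply() rather than sapply(). The data frame scs, the vectors lvls and labs, and the choice of an ordered factor are all assumptions made for illustration. The two attempts in the follow-up probably fail for different reasons: levels<- applied to a data frame subset does not reach the individual columns, and levels(x)[c(NoResp="", ...)] indexes by names that levels(x) does not have, which yields NA.

```r
# Hypothetical stand-in for two of the Q1_ columns: one has a blank ("")
# level, the other does not, mirroring Q1_1 and Q1_3 in the follow-up.
scs <- data.frame(
  svaID = 1:4,
  Q1_1 = factor(c("very important", "", "important", "not important")),
  Q1_3 = factor(c("important", "somewhat important", "very important", "important"))
)

# Desired order, lowest to highest, including levels a column may not use.
lvls <- c("", "not important", "somewhat important", "important", "very important")
labs <- c("NoResp", "NotImportant", "SomewhatImpt", "Important", "VeryImpt")

keepcols <- grepl("^Q1_", names(scs))

# Rebuild each selected column with the same explicit levels and labels.
# as.character() makes it explicit that matching is done on the text values.
scs[keepcols] <- lapply(scs[keepcols], function(x)
  factor(as.character(x), levels = lvls, labels = labs, ordered = TRUE))

levels(scs$Q1_3)   # now includes the unused "NoResp" level

# Integer codes follow the new level order, so they can feed cor().
cor(sapply(scs[keepcols], as.numeric))
```

Whether the blank response should count as the lowest level or be treated as missing is a judgment call; setting it to NA before computing correlations may be more appropriate for real survey data.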