Hi,

I want to get a clean succinct list of all levels for all my factor variables.

I have a data frame that's something like #1 below. This is just an example subset of my data; my actual dataset has 70 variables. I know how to narrow down my list of variables to just my factor variables by using #2 below (thanks to Bert Gunter). I can also get a list of all levels for all my factor variables using #3 below. But what I want to find out is whether there is a way to get this list in a fashion similar to what the str function returns: without all the extra spacing and carriage returns. That's what I mean by "clean succinct list".

BTW, I also tried playing around with several of the parameters of the str function itself but could not find a way to accomplish what I want.

1. DATAFRAME

> str(mydata)
'data.frame':   11868 obs. of  26 variables:
 $ EMPLID          : int  431108 32709 19730 10850 48786 2004 237628 558 3423 743175 ...
 $ NAME            : Factor w/ 6402 levels "Aaron Cathy E",..: 2777 242 161 104 336 4254 1595 1244 3669 4760 ...
 $ TRAIN           : int  1 1 1 1 1 1 1 1 1 1 ...
 $ TARGET          : int  0 0 0 0 0 0 0 0 0 0 ...
 $ APPT_TYP_CD_LL  : Factor w/ 3 levels "FX","IN","IP": 2 2 2 2 2 2 2 2 2 2 ...
 $ ORG_NAM_LL      : Factor w/ 18 levels "Business","Chief Financial Officer",..: 11 7 7 9 4 4 18 18 8 4 ...
 $ NEW_DISCIPLINE  : Factor w/ 15 levels "100s","300s",..: 14 6 4 1 11 11 14 2 1 1 ...
 $ SERIES          : Factor w/ 10 levels "100s","300s",..: 9 6 4 1 9 9 9 2 1 1 ...
 $ AGE             : int  62 53 46 62 55 59 50 36 34 53 ...
 $ SERVICE         : int  13 29 16 26 18 9 19 11 8 26 ...
 $ AGE_SERVICE     : int  75 82 62 87 73 69 69 47 42 79 ...
 $ HIEDUCLV        : Factor w/ 6 levels "Associate","Bachelor",..: 5 6 6 6 5 2 3 2 2 1 ...
 $ GENDER          : Factor w/ 2 levels "F","M": 2 2 2 1 2 2 2 2 2 1 ...
 $ RETCD           : Factor w/ 2 levels "TCP1","TCP2": 2 1 2 2 2 1 1 2 1 2 ...
 $ FLSASTATUS      : Factor w/ 2 levels "E","N": 1 2 2 1 1 1 1 1 1 1 ...
 $ MONTHLY_RT      : int  17640 6932 5845 9809 11473 8719 19190 8986 7231 6758 ...
 $ RETSTATUSDERIVED: Factor w/ 4 levels "401K","DOUBLE DIPPERS",..: 2 4 3 2 3 4 4 3 4 3 ...
 $ ETHNIC_GRP_CD   : Factor w/ 8 levels "AMIND","ASIAN",..: 8 8 8 8 8 8 8 8 8 8 ...
 $ COMMUTE_BIN     : Factor w/ 7 levels "","<15","15 - 24",..: 5 7 2 2 4 3 3 6 3 2 ...
 $ EEO_CLASS       : Factor w/ 4 levels "M","S1","S2",..: 1 2 4 4 4 4 1 2 4 2 ...
 $ WRK_SCHED       : Factor w/ 6 levels "12HR","4/10s",..: 3 3 3 3 3 3 3 3 4 4 ...
 $ FWT_MAR_STATUS  : Factor w/ 2 levels "M","S": 1 1 1 1 2 1 1 1 1 2 ...
 $ COVERED_DP      : int  2 2 4 0 1 3 1 2 0 0 ...
 $ YRS_IN_SERIES   : int  13 29 16 26 18 9 19 3 7 26 ...
 $ SAVINGS_PCT     : int  10 0 6 19 8 0 10 15 15 18 ...
 $ Generation      : Factor w/ 4 levels "Baby Boomers",..: 1 1 2 1 1 1 1 2 2 1 ...

2. Create mydataF to only include factor variables (and exclude NAME, which I am not interested in)

> mydataF<-mydata[,sapply(mydata,function(x)is.factor(x))][,-1]

3. Get a list of all levels

> sapply(mydataF,function(x)levels(x))
$APPT_TYP_CD_LL
[1] "FX" "IN" "IP"

$ORG_NAM_LL
 [1] "Business" "Chief Financial Officer" "Chief Information Office" "Computation" "Engineering" "ESH and Quality"
 [7] "Facilities and Infrastructure" "Global Security" "NIF" "NO" "Office of the Director" "Operations and Business Office"
[13] "Physical and Life Sciences" "Planning and Financial Services" "ST" "Security Organization" "Strategic Human Resources Mgmt" "WCI"

$NEW_DISCIPLINE
 [1] "100s" "300s" "400s" "500s" "600s" "800s" "900s"
 [8] "Chem Science" "Engineering" "Life Sciences" "Math Computer Science IT" "Physics" "pre100s" "PSTS Other"
[15] "Re"

$SERIES
......

Daniel Lopez
Workforce Analyst
HRIM - Workforce Analytics & Metrics

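Steps 2 and 3 can be tried without the original data. The following is only a sketch on a made-up miniature of mydata (the values are invented; only the column names echo the post). It selects the factor columns with vapply(), drops NAME by name rather than by position, and uses lapply() so that the result is always a list.

```r
# Made-up miniature of mydata; only the column names echo the post.
mydata <- data.frame(
  EMPLID = c(1L, 2L, 3L),
  NAME   = factor(c("Ann", "Bob", "Cy")),
  GENDER = factor(c("F", "M", "M")),
  AGE    = c(62L, 53L, 46L),
  RETCD  = factor(c("TCP2", "TCP1", "TCP2"))
)

# Step 2: keep the factor columns. drop = FALSE keeps a data frame
# even if only one column survives the selection.
is_fac  <- vapply(mydata, is.factor, logical(1))
mydataF <- mydata[, is_fac, drop = FALSE]

# Drop NAME by name instead of relying on it being the first factor.
mydataF <- mydataF[, setdiff(names(mydataF), "NAME"), drop = FALSE]

# Step 3: levels of each remaining factor, as a named list.
lv <- lapply(mydataF, levels)
str(lv)
```

Dropping by name avoids surprises if the column order of the real data changes.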
On Tue, Oct 16, 2012 at 4:19 PM, Lopez, Dan <lopez235 at llnl.gov> wrote:
> [...]
>
> 3. Get a list of all levels
>
>> sapply(mydataF,function(x)levels(x))
>

I think you want to unlist() the result of this call.

RMW

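To see what unlist() does to such a result, here is a small sketch. toyF is a hypothetical stand-in for mydataF with invented values, not the poster's data. The list of levels is flattened into a single named character vector, and base R appends a running index to the name wherever a factor has more than one level.

```r
# Hypothetical stand-in for mydataF: a small data frame of factors.
toyF <- data.frame(
  GENDER = factor(c("M", "F", "M")),
  RETCD  = factor(c("TCP2", "TCP1", "TCP2")),
  STATUS = factor(c("E", "E", "E"))
)

# lapply() always returns a list; sapply() would simplify to a matrix
# whenever every factor happens to have the same number of levels.
lv <- lapply(toyF, levels)

# unlist() flattens the list into one named character vector.
flat <- unlist(lv)
print(flat)
```

The flattened vector is compact, but the grouping by variable survives only in the element names.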
Hi,

You can also try this:

set.seed(1)
dat1<-data.frame(col1=factor(sample(1:25,10,replace=TRUE)),col2=sample(letters[1:10],10,replace=TRUE),col3=factor(rep(1:5,each=2)))
sapply(lapply(mapply(c,lapply(names(sapply(dat1,levels)),function(x) x),sapply(dat1,levels)),function(x) paste(x[1],":",paste(x[-1],collapse=" "))),print)
#[1] "col1 : 2 6 7 10 15 16 17 23 24"
#[1] "col2 : b c d e g h j"
#[1] "col3 : 1 2 3 4 5"
#[1] "col1 : 2 6 7 10 15 16 17 23 24" "col2 : b c d e g h j"
#[3] "col3 : 1 2 3 4 5"

A.K.

----- Original Message -----
From: "Lopez, Dan" <lopez235 at llnl.gov>
To: "R help (r-help at r-project.org)" <r-help at r-project.org>
Sent: Tuesday, October 16, 2012 11:19 AM
Subject: [R] List of Levels for all Factor variables

[...]

______________________________________________
R-help at r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.
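
The same "name : levels" layout can be produced in fewer steps. What follows is a sketch under stated assumptions, not a replacement for the code above: dat and its columns are made up, and stringsAsFactors = TRUE is set explicitly because from R 4.0.0 onwards data.frame() no longer converts character columns to factors by default.

```r
# Toy data; the column names are made up for illustration.
# stringsAsFactors = TRUE matters in R >= 4.0.0, where character columns
# are no longer converted to factors by default.
dat <- data.frame(
  grp  = c("b", "a", "c", "a"),
  size = c("S", "L", "S", "L"),
  n    = 1:4,
  stringsAsFactors = TRUE
)

# Keep only the factor columns, then extract their levels as a list.
lv <- lapply(Filter(is.factor, dat), levels)

# One "name : level level ..." string per factor.
level_lines <- sprintf("%s : %s", names(lv),
                       vapply(lv, paste, character(1), collapse = " "))

# writeLines() prints each string on its own line, without quotes or indices.
writeLines(level_lines)
```

Using writeLines() instead of print() avoids the second, duplicated block of output that appears when sapply(..., print) also returns its result at the console.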