thr3ads.net - R help - [R] List of Levels for all Factor variables [Oct 2012]


Lopez, Dan

2012-Oct-16 15:19 UTC

[R] List of Levels for all Factor variables

Hi,

I want to get a clean, succinct list of all levels for all my factor variables.

I have a data frame that's something like #1 below. This is just an example
subset of my data; my actual dataset has 70 variables. I know how to narrow my
list of variables down to just the factor variables by using #2 below (thanks
to Bert Gunter). I can also get a list of all levels for all my factor variables
using #3 below. But what I want to find out is whether there is a way to get this
list in a format similar to what the str function returns: without all the
extra spacing and carriage returns. That's what I mean by a "clean,
succinct list".

BTW, I also tried playing around with several of the str function's parameters
but could not find a way to accomplish what I want.
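Since str() already produces the compact, one-line-per-variable layout being asked for, one option is to apply it to the list of levels rather than to the data frame itself. A minimal sketch, assuming a small hypothetical data frame (the columns GENDER and RETCD and their values are made up for illustration). str()'s vec.len argument controls how many elements of each vector are shown before it truncates with "...", so very long level lists will still be abbreviated unless vec.len is raised.

```r
# Hypothetical factor-only data frame standing in for mydataF
mydataF <- data.frame(
  GENDER = factor(c("M", "F", "M")),
  RETCD  = factor(c("TCP2", "TCP1", "TCP2"))
)

# str() of the list of levels: one compact line per variable
str(lapply(mydataF, levels))

# A larger vec.len shows more elements before str() truncates with "..."
str(lapply(mydataF, levels), vec.len = 10)
```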



1. DATAFRAME
> str(mydata)
'data.frame':  11868 obs. of  26 variables:
$ EMPLID          : int  431108 32709 19730 10850 48786 2004 237628 558 3423 743175 ...
$ NAME            : Factor w/ 6402 levels "Aaron Cathy E",..: 2777 242 161 104 336 4254 1595 1244 3669 4760 ...
$ TRAIN           : int  1 1 1 1 1 1 1 1 1 1 ...
$ TARGET          : int  0 0 0 0 0 0 0 0 0 0 ...
$ APPT_TYP_CD_LL  : Factor w/ 3 levels "FX","IN","IP": 2 2 2 2 2 2 2 2 2 2 ...
$ ORG_NAM_LL      : Factor w/ 18 levels "Business","Chief Financial Officer",..: 11 7 7 9 4 4 18 18 8 4 ...
$ NEW_DISCIPLINE  : Factor w/ 15 levels "100s","300s",..: 14 6 4 1 11 11 14 2 1 1 ...
$ SERIES          : Factor w/ 10 levels "100s","300s",..: 9 6 4 1 9 9 9 2 1 1 ...
$ AGE             : int  62 53 46 62 55 59 50 36 34 53 ...
$ SERVICE         : int  13 29 16 26 18 9 19 11 8 26 ...
$ AGE_SERVICE     : int  75 82 62 87 73 69 69 47 42 79 ...
$ HIEDUCLV        : Factor w/ 6 levels "Associate","Bachelor",..: 5 6 6 6 5 2 3 2 2 1 ...
$ GENDER          : Factor w/ 2 levels "F","M": 2 2 2 1 2 2 2 2 2 1 ...
$ RETCD           : Factor w/ 2 levels "TCP1","TCP2": 2 1 2 2 2 1 1 2 1 2 ...
$ FLSASTATUS      : Factor w/ 2 levels "E","N": 1 2 2 1 1 1 1 1 1 1 ...
$ MONTHLY_RT      : int  17640 6932 5845 9809 11473 8719 19190 8986 7231 6758 ...
$ RETSTATUSDERIVED: Factor w/ 4 levels "401K","DOUBLE DIPPERS",..: 2 4 3 2 3 4 4 3 4 3 ...
$ ETHNIC_GRP_CD   : Factor w/ 8 levels "AMIND","ASIAN",..: 8 8 8 8 8 8 8 8 8 8 ...
$ COMMUTE_BIN     : Factor w/ 7 levels "","<15","15 - 24",..: 5 7 2 2 4 3 3 6 3 2 ...
$ EEO_CLASS       : Factor w/ 4 levels "M","S1","S2",..: 1 2 4 4 4 4 1 2 4 2 ...
$ WRK_SCHED       : Factor w/ 6 levels "12HR","4/10s",..: 3 3 3 3 3 3 3 3 4 4 ...
$ FWT_MAR_STATUS  : Factor w/ 2 levels "M","S": 1 1 1 1 2 1 1 1 1 2 ...
$ COVERED_DP      : int  2 2 4 0 1 3 1 2 0 0 ...
$ YRS_IN_SERIES   : int  13 29 16 26 18 9 19 3 7 26 ...
$ SAVINGS_PCT     : int  10 0 6 19 8 0 10 15 15 18 ...
$ Generation      : Factor w/ 4 levels "Baby Boomers",..: 1 1 2 1 1 1 1 2 2 1 ...

2. Create mydataF to only include factor variables (and exclude NAME, which I am not interested in)
> mydataF<-mydata[,sapply(mydata,function(x)is.factor(x))][,-1]
3. Get a list of all levels
> sapply(mydataF,function(x)levels(x))
$APPT_TYP_CD_LL
[1] "FX" "IN" "IP"

$ORG_NAM_LL
 [1] "Business" "Chief Financial Officer" "Chief Information Office" "Computation" "Engineering" "ESH and Quality"
 [7] "Facilities and Infrastructure" "Global Security" "NIF" "NO" "Office of the Director" "Operations and Business Office"
[13] "Physical and Life Sciences" "Planning and Financial Services" "ST" "Security Organization" "Strategic Human Resources Mgmt" "WCI"

$NEW_DISCIPLINE
 [1] "100s" "300s" "400s" "500s" "600s" "800s" "900s"
 [8] "Chem  Science" "Engineering" "Life Sciences" "Math  Computer Science  IT" "Physics" "pre100s" "PSTS Other"
[15] "Re"

$SERIES   ......

Daniel Lopez
Workforce Analyst
HRIM - Workforce Analytics & Metrics
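A minimal sketch of steps 2 and 3 that does not depend on NAME being the first factor column, using a small made-up data frame (the columns id, NAME, GENDER, RETCD and their values are hypothetical). Filter() keeps the factor columns as a data frame even when only one matches, whereas mydata[, cols] drops to a plain vector in that case unless drop = FALSE is given. lapply() always returns a named list, whereas sapply() collapses the result into a matrix when every factor happens to have the same number of levels.

```r
# Hypothetical toy data standing in for mydata
mydata <- data.frame(
  id     = 1:4,
  NAME   = factor(c("Aaron Cathy E", "Baker Jo", "Cole Al", "Diaz Ana")),
  GENDER = factor(c("M", "F", "M", "M")),
  RETCD  = factor(c("TCP2", "TCP1", "TCP2", "TCP2"))
)

# Step 2: keep only the factor columns, then drop NAME by name, not position
mydataF <- Filter(is.factor, mydata)
mydataF$NAME <- NULL

# Step 3: lapply() returns a named list of level vectors
lev <- lapply(mydataF, levels)
print(lev)

# sapply() simplifies to a 2 x 2 matrix here, since both factors have 2 levels
str(sapply(mydataF, levels))
```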



R. Michael Weylandt

2012-Oct-16 15:27 UTC


On Tue, Oct 16, 2012 at 4:19 PM, Lopez, Dan <lopez235 at llnl.gov> wrote:
> 3. Get a list of all levels
>
>> sapply(mydataF,function(x)levels(x))
>
I think you want to unlist() the result of this call.
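A minimal sketch of the unlist() suggestion, on a small hypothetical factor-only data frame (GENDER, RETCD and WRK are made-up columns). unlist() flattens the list of level vectors into one named character vector, and R builds each name from the column name plus a position (GENDER1, GENDER2, ...), so the result prints compactly while still showing which variable each level came from. Flattening the output of lapply() rather than sapply() avoids the matrix case when all factors share the same number of levels.

```r
# Hypothetical factor-only data frame standing in for mydataF
mydataF <- data.frame(
  GENDER = factor(c("M", "F", "M")),
  RETCD  = factor(c("TCP2", "TCP1", "TCP2")),
  WRK    = factor(c("12HR", "4/10s", "9/80"))
)

# One named character vector holding every level of every factor
flat <- unlist(lapply(mydataF, levels))
print(flat)
```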

RMW

arun

2012-Oct-16 17:08 UTC


Hi,
You can also try this:
set.seed(1)
# factor() on col2: since R 4.0.0, data.frame() no longer converts strings to factors
dat1 <- data.frame(col1 = factor(sample(1:25, 10, replace = TRUE)),
                   col2 = factor(sample(letters[1:10], 10, replace = TRUE)),
                   col3 = factor(rep(1:5, each = 2)))

# lapply() always returns a named list of levels, whatever the level counts
sapply(lapply(mapply(c, lapply(names(lapply(dat1, levels)), function(x) x),
                     lapply(dat1, levels)),
              function(x) paste(x[1], ":", paste(x[-1], collapse = " "))),
       print)
# Output as posted in 2012; sample() draws differ from R 3.6.0 onward
#[1] "col1 : 2 6 7 10 15 16 17 23 24"
#[1] "col2 : b c d e g h j"
#[1] "col3 : 1 2 3 4 5"
#[1] "col1 : 2 6 7 10 15 16 17 23 24" "col2 : b c d e g h j"
#[3] "col3 : 1 2 3 4 5"

A.K.
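A shorter sketch that builds the same "name : levels" strings directly, one per factor column, and prints them one per line in the spirit of str(). It assumes a small hypothetical data frame in which every column is a factor; on real data, run it on the factor-only subset. vapply() with character(1) guarantees exactly one string per column, and cat(..., sep = "\n") prints them without the quotes and index markers that print() adds.

```r
# Hypothetical data frame; col2 is wrapped in factor() because
# data.frame() no longer converts strings to factors by default (R >= 4.0.0)
dat1 <- data.frame(
  col1 = factor(c(2, 6, 7, 2)),
  col2 = factor(c("b", "c", "b", "d")),
  col3 = factor(rep(1:2, each = 2))
)

# One "name : levels" string per factor column, named by column
summ <- vapply(
  names(dat1),
  function(nm) paste(nm, ":", paste(levels(dat1[[nm]]), collapse = " ")),
  character(1)
)

# Print one line per variable
cat(summ, sep = "\n")
```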




______________________________________________
R-help at r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.
