thr3ads.net - R help - [R] dput vs unclass to see what a factor really is composed of [Jul 2008]

Jacob Wegelin

2008-Jul-31 21:21 UTC

[R] dput vs unclass to see what a factor really is composed of

I used read.dta() to read in a Stata 9 dataset to R. The "Sex01" variable
takes on two values in Stata: 0 and 1, and it is labeled "M" and "F"
respectively, analogous to an R factor. Thus, read.dta reads it in as a
factor.

Now, I wanted to see what this variable *really* is, in R. For instance,
sometimes R converts a 0/1 variable into a 1/2 variable when it considers it
a factor. The "dput" function often enables me to see what a variable really
is, but in this case it tells me that the components of the factor are "1L"
and "2L", that is, "one uppercase ell" and "two uppercase ell."  What does
the uppercase ell mean?

In this case, "unclass" seems to enable me to see what the variable really
consists of. But what do "1L" and "2L" mean?

> summary(DAT$Sex01[1:150])
  M   F
137  13
> dput(DAT$Sex01[1:150])
structure(c(1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L,
1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L,
1L, 1L, 1L, 1L, 1L, 1L, 1L, 2L, 1L, 1L, 2L, 2L, 1L, 1L, 1L, 1L,
1L, 2L, 2L, 1L, 1L, 1L, 2L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L,
1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 2L, 2L,
2L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L,
1L, 1L, 1L, 2L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L,
1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L,
1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 2L,
2L, 2L, 1L, 1L, 1L, 1L, 1L, 1L, 1L), .Label = c("M", "F"),
class = "factor")
> unclass(junk)
  [1] 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1
1 2 1 1 2 2 1 1 1 1 1 2 2 1 1 1 2 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1
1 1 2 2 2 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 2 1 1 1 1 1 1 1 1 1 1 1 1 1 1
1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 2 2 2 1 1 1 1 1
[149] 1 1
attr(,"levels")
[1] "M" "F"
> str(junk)
 Factor w/ 2 levels "M","F": 1 1 1 1 1 1 1 1 1 1 ...


Jake
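
For readers following along without the Stata file, the 0/1 to 1/2 shift described above can be reproduced with a small made-up vector. This is only a sketch: the name `sex01` and its values are hypothetical, and the `factor()` call is an assumption about how such a variable could be built, not a description of what read.dta() does internally.

```r
# Hypothetical stand-in for the Stata variable: 0 is labelled "M", 1 is "F".
sex01 <- c(0, 0, 1, 0, 1)

# Attach the labels; R stores a factor as integer codes that index into
# the levels vector, so the codes start at 1 rather than 0.
sex <- factor(sex01, levels = c(0, 1), labels = c("M", "F"))

typeof(sex)        # "integer"
levels(sex)        # "M" "F"
as.integer(sex)    # 1 1 2 1 2

# unclass() drops the "factor" class and leaves the codes plus the levels.
codes <- unclass(sex)

# Subtracting 1L recovers the original 0/1 coding as integers.
original <- as.integer(sex) - 1L
original           # 0 0 1 0 1
```

So the 1 and 2 are positions in `levels(sex)`, not the values stored in the Stata file.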

Prof Brian Ripley

2008-Jul-31 22:00 UTC

See ?NumericConstants.

-- 
Brian D. Ripley,                  ripley at stats.ox.ac.uk
Professor of Applied Statistics,  http://www.stats.ox.ac.uk/~ripley/
University of Oxford,             Tel:  +44 1865 272861 (self)
1 South Parks Road,                     +44 1865 272866 (PA)
Oxford OX1 3TG, UK                Fax:  +44 1865 272595

Duncan Murdoch

2008-Jul-31 22:10 UTC

On 31/07/2008 5:21 PM, Jacob Wegelin wrote:
> The "dput" function often enables me to see what a variable really
> is, but in this case it tells me that the components of the factor are "1L"
> and "2L", that is, "one uppercase ell" and "two uppercase ell."  What does
> the uppercase ell mean?
>
> In this case, "unclass" seems to enable me to see what the variable really
> consists of. But what do "1L" and "2L" mean?

Those are integer constants.  Compare:

 > typeof(1)
[1] "double"
 > typeof(1L)
[1] "integer"

A bare 1 is taken to be "double" (or "numeric"), i.e. a 64-bit double
precision floating point number.  A constant 1L is "integer", i.e. a 32-bit
signed integer.  Factors only contain integer values, so we use integers
to store them, and dput() writes them out that way so that they are not
converted to floating point when you read them back in.

Duncan Murdoch
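
A short sketch of the point about reading the values back in, using a made-up three-element factor `f` (hypothetical, not the poster's data). It uses deparse() to capture the same text that dput() would print, then parses and evaluates that text to rebuild the object.

```r
# The L suffix changes the storage type, not the value.
typeof(1)           # "double"
typeof(1L)          # "integer"
1 == 1L             # TRUE
identical(1, 1L)    # FALSE, because the types differ

# A small hypothetical factor.
f <- factor(c("M", "F", "M"), levels = c("M", "F"))

# deparse() returns as character strings what dput() writes to the console.
txt <- paste(deparse(f), collapse = "")
grepl("1L", txt)    # TRUE: the codes are written with the L suffix

# Evaluating that text rebuilds the factor with integer codes intact.
g <- eval(parse(text = txt))
typeof(g)           # "integer"
identical(f, g)     # TRUE
```

Depending on the R version, the deparsed text names the levels attribute either `.Label` (as in the 2008 output above) or `levels`; both forms are accepted by structure().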
