thr3ads.net - R help - [R] When creating a data frame with data.frame() transforms "integers" into "factors" [May 2013]


António Camacho

2013-May-25 19:36 UTC

[R] When creating a data frame with data.frame() transforms "integers" into "factors"

Hello


I am a novice in R, and I was learning how to make a scatter plot in R using
an example from a website.

My setup is an iMac with Mac OS X 10.8.3 and R 3.0.1 (default install,
with no additional packages loaded).

I created a .csv file in vim with the following content:
userID,user,posts
1,user1,581
2,user2,281
3,user3,196
4,user4,150
5,user5,282
6,user6,184
7,user7,90
8,user8,74
9,user9,45
10,user10,20
11,user11,3
12,user12,1
13,user13,345
14,user14,123

I imported the file into R using:
  df <- read.csv('file.csv')
To confirm the data types I did:
  sapply(df, class)
which returns "userID" --> "integer"; "user" --> "factor"; "posts" --> "integer".
Then I tried to create another data frame with the number of posts and their
frequencies, so I did:
  postFreqCount <- data.frame(table(df['posts']))
This gives me the postFreqCount data frame with two columns: one called
'Var1', which holds the number of posts each user made, and another column,
'Freq', with the frequency of each number of posts.
The problem is that if I do:
  sapply(postFreqCount['Var1'], class)
it returns "factor".
So the data.frame() function transformed a variable that was "integer"
(posts) into a variable (Var1) that has the same values but is a "factor".
I want to know how to prevent this from happening. How do I keep the values
from being transformed from "integer" to "factor"?
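A self-contained sketch of the steps described above, assuming the CSV contents are supplied inline through read.csv(text = ...) rather than read from file.csv. Two caveats: since R 4.0.0, read.csv() no longer converts strings to factors by default, so "user" comes back as "character" there; and the name of the tabulated column (Var1 or posts) may differ between R versions, so the sketch indexes it by position.

```r
# Assumption: same contents as the file.csv built in vim, pasted inline
csv <- "userID,user,posts
1,user1,581
2,user2,281
3,user3,196
4,user4,150
5,user5,282
6,user6,184
7,user7,90
8,user8,74
9,user9,45
10,user10,20
11,user11,3
12,user12,1
13,user13,345
14,user14,123"
df <- read.csv(text = csv)
sapply(df, class)             # userID and posts are "integer"

# Frequency table of posts, converted to a data frame
postFreqCount <- data.frame(table(df["posts"]))
sapply(postFreqCount, class)  # first column "factor", Freq "integer"
```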

Thank you for your help

António


Thomas Stewart

2013-May-26 00:34 UTC


[R] When creating a data frame with data.frame() transforms "integers" into "factors"

Antonio-

What exactly do you want as output?  You stated you wanted a scatter plot,
but which variable do you want on the X axis and which variable do you want
on the Y axis?

-tgs



Bert Gunter

2013-May-26 00:44 UTC


[R] When creating a data frame with data.frame() transforms "integers" into "factors"

Huh?
> z <- sample(1:10,30,rep=TRUE)
> tbl <- table(z)
> tbl
z
 1  2  3  4  5  6  7  8  9 10
 4  3  2  6  3  3  2  2  2  3
> data.frame(z)
    z
1   5
2   2
3   4
4   1
5   6
6   4
7  10
8   4
9   3
10  8
11 10
12  4
13  3
14  9
15  2
16  2
17  6
18  1
19  4
20  7
21  9
22 10
23  7
24  5
25  5
26  6
27  8
28  1
29  1
30  4
> sapply(data.frame(z),class)
        z
"integer"

Your error: you used df['posts']  . You should have used
df[,'posts'] .

The former is a data frame. The latter is a vector. Read the
"Introduction to R tutorial" or ?"[" if you don't
understand why.
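A short sketch of the indexing distinction described above, using a small hypothetical data frame with the same column name. As the later messages in this thread establish, the indexing form does not change the outcome for the tabulated values: data.frame(table(...)) returns a factor column either way.

```r
# Hypothetical stand-in for the thread's df (same column name, made-up values)
df <- data.frame(userID = 1:4, posts = c(581L, 281L, 196L, 281L))

class(df["posts"])     # "data.frame": one index selects columns, keeps the frame
class(df[, "posts"])   # "integer": matrix-style indexing drops to a vector
class(df[["posts"]])   # "integer": same as df$posts

# Tabulating the vector still yields a factor once converted to a data frame
freq <- data.frame(table(df[, "posts"]))
class(freq[[1]])       # "factor"
freq$Freq              # counts for 196, 281, 581
```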

-- Bert




-- 

Bert Gunter
Genentech Nonclinical Biostatistics

Internal Contact Info:
Phone: 467-7374
Website:
http://pharmadevelopment.roche.com/index/pdb/pdb-functional-groups/pdb-biostatistics/pdb-ncb-home.htm

Bert Gunter

2013-May-26 14:00 UTC


[R] When creating a data frame with data.frame() transforms "integers" into "factors"

1. Please always cc the list; do not reply just to me.

2. OK, I see. I ERRED. Had you cc'ed the list, someone might have
pointed this out. The correct example reproduces what you saw.

z<- sample(1:10,30,rep=TRUE)
table(z)
w <- data.frame(table(z))
w

     z  Freq
1   1    2
2   2    3
3   3    1
4   4    3
5   5    5
6   6    3
7   7    5
8   8    4
9   9    1
10 10    3
> sapply(w,class)
        z      Freq
 "factor" "integer"

This is exactly what is expected and documented; see ?table. So the
question is: what do you expect? table() produces an array whose
dimensions are the cross-classifying factors. data.frame() (which calls
as.data.frame() on the table) converts this into a data frame with one
factor column per dimension plus a Freq column of counts. Perhaps the
following will help clarify:
> z <- data.frame(fac1 = sample(LETTERS[1:3],10,rep=TRUE), fac2 =
+   sample(c("j","k"),10,rep=TRUE))
> z
   fac1 fac2
1     A    k
2     B    k
3     C    k
4     C    k
5     B    k
6     C    k
7     C    k
8     A    j
9     A    j
10    C    j
> table(z)
    fac2
fac1 j k
   A 2 1
   B 0 2
   C 1 4
> data.frame(table(z))
  fac1 fac2 Freq
1    A    j    2
2    B    j    0
3    C    j    1
4    A    k    1
5    B    k    2
6    C    k    4
> table(z['fac1'])
A B C
3 2 5
> data.frame(table(z['fac1']))
  Var1 Freq
1    A    3
2    B    2
3    C    5
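For the original goal of getting integers back, one common approach (R FAQ 7.10, "How do I convert factors to numeric?") is to go through the level labels. The sketch below uses a hypothetical posts vector; note that as.integer() applied directly to the factor returns the internal level codes, not the original values.

```r
# Hypothetical integer vector of per-user post counts
posts <- c(581L, 281L, 196L, 281L, 20L)
w <- data.frame(table(posts))   # columns: posts (factor), Freq (integer)

levels(w$posts)        # "20" "196" "281" "581"
as.integer(w$posts)    # 1 2 3 4: level codes, NOT the post counts

# Convert through the labels instead
w$posts <- as.integer(as.character(w$posts))

# Alternative: ask as.data.frame() for character columns, then convert
w2 <- as.data.frame(table(posts), stringsAsFactors = FALSE)
w2$posts <- as.integer(w2$posts)

sapply(w, class)       # both columns "integer"
```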

Cheers,
Bert

On Sat, May 25, 2013 at 6:54 PM, António Camacho <toinobc at gmail.com>
wrote:
> Hello Bert,
> Thanks for your prompt reply.
> I tried your example and it worked without a problem.
>
> But what I want is to create a data frame from the output of the function
> table(), so in your example I tried "sapply(data.frame(tbl),class)" and the
> output was z --> factor and Freq --> integer.
> What is happening in the table() function that is transforming the integers
> in z into values with labels?
> Because when I do "names(tbl)" it returns each value of z as a name...
>
> I read the manual for "[" but I didn't understand it completely. I have to
> read the Introduction to R more carefully.
>
> I also tried using "[,", "[[" and "$" to extract the values from
> the 'posts' column, but the problem persisted.
>
> Like I said, this code was taken from an example on a webpage. I contacted
> the author and he confirmed that the code worked on his machine, which was
> running R 2.15.1...
> Maybe something changed in data.frame() between versions?
>
> I really don't understand what I am doing wrong.
>
> António
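The behaviour asked about above comes from table() itself: it treats its arguments as factors (see ?table) and keeps only the distinct values, as character labels in its dimnames, alongside integer counts. A small sketch with hypothetical data:

```r
z <- c(3L, 1L, 3L, 10L, 1L, 3L)   # hypothetical integer data
tbl <- table(z)

names(tbl)       # "1" "3" "10": the values survive only as character labels
dimnames(tbl)    # list(z = c("1", "3", "10"))
as.vector(tbl)   # 2 3 1: the counts are the only numeric content
```

So the integer class is already gone at the table() step, before data.frame() sees the result.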
