Sean MacEachern
2008-Mar-28 14:20 UTC
[R] Beginner help with retrieving frequency and transforming a matrix
Hi All,

Just hoping someone can give me a hand with a problem...

I have a dataframe (DF) with about 5 million entries that looks something like the following:

> DF
    ID  Cl Co  Brd    Ind A AB AB
1  S-3 IND  A BR_F BR_F01 1  0  0
2  S-3 IND  A BR_F BR_F01 1  0  0
3  S-3 IND  A BR_F BR_F01 1  0  0
4  S-3 IND  A BR_F BR_F01 1  0  0
5  S-3 IND  A BR_F BR_F01 1  0  0
6  S-3 IND  A BR_F BR_F01 0  1  0
7  S-3 IND  A BR_F BR_F02 0  0  1
8  S-3 IND  A BR_F BR_F02 0  1  0
9  S-3 IND  A BR_F BR_F02 1  0  0
10 S-3 IND  A BR_F BR_F02 1  0  0
11 S-3 IND  A BR_F BR_F02 0  1  0
12 S-3 IND  A BR_F BR_F02 1  0  0

I am interested in retrieving the frequency of A for everything with the same Ind code.

I have initially created a column called 'frq' that calculates the individual A frequency:

> DF$frq=apply(DF,1,function(x) if(x[6]==1)1 else if (x[7]==1)0.5 else 0)
> DF
    ID  Cl Co  Brd    Ind A AB AB frq
1  S-3 IND  A BR_F BR_F01 1  0  0   1
2  S-3 IND  A BR_F BR_F01 1  0  0   1
3  S-3 IND  A BR_F BR_F01 1  0  0   1
4  S-3 IND  A BR_F BR_F01 1  0  0   1
5  S-3 IND  A BR_F BR_F01 1  0  0   1
6  S-3 IND  A BR_F BR_F01 0  1  0 0.5
7  S-3 IND  A BR_F BR_F02 0  0  1   0
8  S-3 IND  A BR_F BR_F02 0  1  0 0.5
9  S-3 IND  A BR_F BR_F02 1  0  0   1
10 S-3 IND  A BR_F BR_F02 1  0  0   1
11 S-3 IND  A BR_F BR_F02 0  1  0 0.5
12 S-3 IND  A BR_F BR_F02 1  0  0   1

I've created a new DF that contains the info I'm interested in:

> DF2 = cbind(DF[1],DF[5],DF[9])
> DF2
    ID    Ind frq
1  S-3 BR_F01   1
2  S-3 BR_F01   1
...
11 S-3 BR_F02 0.5
12 S-3 BR_F02   1

I am wondering whether there is a method I can call to calculate the frequency of A (or frq) for all individuals with the same Ind code, so that the DF (matrix) looks something like the following? (I saw something in a tutorial based on t-tests that I thought would work, but no joy...)

> NewDF
   ID    Ind    frq
1 S-3 BR_F01 0.9167
2 S-3 BR_F02 0.6667

Further, is there a way to then transform the matrix to look something like the following?

> FinalDF
Ind       S-3 S-4 S-5 .... S-1000000
BR_F01 0.9167 0.5   1         0.6667
BR_F02 0.6667 0.2   1            0.5
...
BR_Z98    0.5   1 0.3              1
BR_Z99      1 0.6   1            0.5

Thanks in advance for any help you can offer, and please let me know if there is any further information I can provide.

Sean

> sessionInfo()
R version 2.6.0 (2007-10-03)
i386-apple-darwin8.10.1

locale:
en_US.UTF-8/en_US.UTF-8/en_US.UTF-8/C/en_US.UTF-8/en_US.UTF-8

attached base packages:
[1] stats     graphics  grDevices utils     datasets  methods   base
Roland Rau
2008-Mar-28 15:52 UTC
[R] Beginner help with retrieving frequency and transforming a matrix
Hi Sean,

is this roughly what you are looking for? (Please note that in the
example data you provided there is only one level of ID given, no
"S-4", ...)

> DF
    ID  Cl Co  Brd    Ind A AB AB.1 frq
1  S-3 IND  A BR_F BR_F01 1  0    0 1.0
2  S-3 IND  A BR_F BR_F01 1  0    0 1.0
3  S-3 IND  A BR_F BR_F01 1  0    0 1.0
4  S-3 IND  A BR_F BR_F01 1  0    0 1.0
5  S-3 IND  A BR_F BR_F01 1  0    0 1.0
6  S-3 IND  A BR_F BR_F01 0  1    0 0.5
7  S-3 IND  A BR_F BR_F02 0  0    1 0.0
8  S-3 IND  A BR_F BR_F02 0  1    0 0.5
9  S-3 IND  A BR_F BR_F02 1  0    0 1.0
10 S-3 IND  A BR_F BR_F02 1  0    0 1.0
11 S-3 IND  A BR_F BR_F02 0  1    0 0.5
12 S-3 IND  A BR_F BR_F02 1  0    0 1.0
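
A side note on the frq column shown above: the row-wise apply() in the original post converts every row to character and calls a function once per row, which is likely to be slow on roughly 5 million entries. Below is a minimal vectorised sketch. It assumes the indicator columns are named A, AB and AB.1 as printed here, and it rebuilds only the columns it needs from the 12 example rows.

```r
# Rebuild the relevant columns of the 12 example rows
# (column names A, AB, AB.1 are taken from the printout above)
DF <- data.frame(
  ID   = "S-3",
  Ind  = rep(c("BR_F01", "BR_F02"), each = 6),
  A    = c(1, 1, 1, 1, 1, 0,  0, 0, 1, 1, 0, 1),
  AB   = c(0, 0, 0, 0, 0, 1,  0, 1, 0, 0, 1, 0),
  AB.1 = c(0, 0, 0, 0, 0, 0,  1, 0, 0, 0, 0, 0)
)

# Vectorised equivalent of the row-wise rule:
# 1 if A is set, 0.5 if AB is set, 0 otherwise
DF$frq <- ifelse(DF$A == 1, 1, ifelse(DF$AB == 1, 0.5, 0))
print(DF)
```

Because ifelse() works on whole columns, nothing is coerced to character and no per-row function call is needed.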
> DF2 <- aggregate(x=DF$frq, by=list(ID=DF$ID, Ind=DF$Ind), FUN=mean)
> DF2
   ID    Ind         x
1 S-3 BR_F01 0.9166667
2 S-3 BR_F02 0.6666667
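
If the generic column name x is a nuisance, the formula interface of aggregate() keeps the name frq in the result. This is only a sketch on the rebuilt example values. It assumes a newer R than the 2.6.0 shown in the original post, since the formula method was added to aggregate() in a later release.

```r
# Example frq values for the 12 rows shown in the thread
DF <- data.frame(
  ID  = "S-3",
  Ind = rep(c("BR_F01", "BR_F02"), each = 6),
  frq = c(1, 1, 1, 1, 1, 0.5,  0, 0.5, 1, 1, 0.5, 1)
)

# Mean frq for every ID/Ind combination; the result column stays named "frq"
NewDF <- aggregate(frq ~ ID + Ind, data = DF, FUN = mean)
print(NewDF)
```

On older versions, renaming after the by= call (for example names(DF2)[3] <- "frq") should give the same layout.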
> FinalDF <- tapply(X=DF$frq, INDEX=list(Ind=DF$Ind, ID=DF$ID), FUN=mean)
> FinalDF
        ID
Ind            S-3
  BR_F01 0.9166667
  BR_F02 0.6666667

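To see what the wide table looks like with more than one ID, here is a sketch that adds a made-up S-4. The S-4 values are hypothetical and are not from the original data. Ind/ID combinations that never occur come back as NA.

```r
# frq values for S-3 from the thread, plus four invented rows for a
# hypothetical S-4 that only has observations for BR_F01
DF <- data.frame(
  ID  = rep(c("S-3", "S-4"), times = c(12, 4)),
  Ind = c(rep(c("BR_F01", "BR_F02"), each = 6), rep("BR_F01", 4)),
  frq = c(1, 1, 1, 1, 1, 0.5,  0, 0.5, 1, 1, 0.5, 1,  0, 0.5, 0.5, 1)
)

# Rows are Ind codes, columns are IDs, cells are mean frq
FinalDF <- tapply(DF$frq, list(Ind = DF$Ind, ID = DF$ID), mean)
print(round(FinalDF, 4))
```

The result is a plain numeric matrix with one cell per Ind/ID pair, so with very many IDs and Ind codes it may become large; as.data.frame(FinalDF) would turn it into a data frame if that is what is needed downstream.
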
Best,
Roland