thr3ads.net - R help - [R] calculating the occurrences of distinct observations in the subsets of a dataframe [Mar 2011]


Bodnar Laszlo EB_HU

2011-Mar-17 09:48 UTC

[R] calculating the occurrences of distinct observations in the subsets of a dataframe

Hello everybody,

I have a data frame in R which is similar to the one below. My real
'df' data frame is much bigger than this one, but I do not want to
confuse anybody, so I have simplified things as much as possible.

So here's the data frame.

id <-c(1,1,1,1,1,1,1,1,1,1,2,2,2,2,2,2,2,2,2,2,3,3,3,3,3,3,3,3,3,3)
a <-c(3,1,3,3,1,3,3,3,3,1,3,2,1,2,1,3,3,2,1,1,1,3,1,3,3,3,2,1,1,3)
b <-c(3,2,1,1,1,1,1,1,1,1,1,2,1,3,2,1,1,1,2,1,3,1,2,2,1,3,3,2,3,2)
c <-c(1,3,2,3,2,1,2,3,3,2,2,3,1,2,3,3,3,1,1,2,3,3,1,2,2,3,2,2,3,2)
d <-c(3,3,3,1,3,2,2,1,2,3,2,2,2,1,3,1,2,2,3,2,3,2,3,2,1,1,1,1,1,2)
e <-c(2,3,1,2,1,2,3,3,1,1,2,1,1,3,3,2,1,1,3,3,2,2,3,3,3,2,3,2,1,3)
df <-data.frame(id,a,b,c,d,e)
df

Basically what I would like to do is to get the occurrences of numbers for each
column (a,b,c,d,e) and for each id group (1,2,3) (for this latter grouping see
my column 'id').

So, for column 'a' and for id number '1' (for the latter see
column 'id') the code would be something like this:
as.numeric(table(df[1:10,2]))

The results are:
[1] 3 7

Just to briefly explain my results: in column 'a' (and regarding only
those records which have number '1' in column 'id') we can say
that:
number 1 occurred 3 times, and
number 3 occurred 7 times.
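
Since the number of rows may change, the hard-coded row range 1:10 can be replaced by a logical condition on the 'id' column. This is only a minimal sketch of that idea, using column 'a' of the example data; the name 'counts_a_id1' is made up for illustration.

```r
# Example data from the post (only the columns needed here)
id <- c(1,1,1,1,1,1,1,1,1,1,2,2,2,2,2,2,2,2,2,2,3,3,3,3,3,3,3,3,3,3)
a  <- c(3,1,3,3,1,3,3,3,3,1,3,2,1,2,1,3,3,2,1,1,1,3,1,3,3,3,2,1,1,3)
df <- data.frame(id, a)

# Select the rows by the value of 'id' instead of by position
counts_a_id1 <- table(df$a[df$id == 1])

names(counts_a_id1)       # "1" "3"  (the values that actually occur)
as.numeric(counts_a_id1)  # 3 7
```

Note that table() only reports values that occur in the subset, so the value 2 is silently skipped here; names() shows which value each count belongs to.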

Again, just to show you another example. For column 'a' and for id
number '2' (for the latter grouping see again column 'id'):
as.numeric(table(df[11:20,2]))

After running the code, the results are: [1] 4 3 3

Let me explain again: in column 'a' (and regarding only those
observations which have number '2' in column 'id') we can say
that
number 1 occurred 4 times,
number 2 occurred 3 times, and
number 3 occurred 3 times.

Last example: for column 'e' and for id number '3' the code
would be:
as.numeric(table(df[21:30,6]))

With the results:
[1] 1 4 5

...meaning that number '1' occurred once, number '2' occurred four
times and number '3' occurred 5 times.

So this is what I would like to do: calculate the occurrences of numbers for
each custom-defined subset (and then collect these values into a data
frame). I know it is NOT a difficult task, but the PROBLEM is that I will
have to change the input 'df' data frame on a regular basis, and hence
both the overall number of rows and columns might CHANGE over time...

What I have done so far is that I have separated the 'df' dataframe by
columns, like this:
for (z in (2:ncol(df))) assign(paste("df",z,sep="."),df[,z])

So df.2 will refer to df$a, df.3 will equal df$b, df.4 will equal df$c etc. But
I'm really stuck now and I don't know how to move forward, you know,
getting the occurrences for each column and each group of ids.
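
One way to avoid creating separate df.2, df.3, ... objects with assign() is to loop over the column names and cross-tabulate each column against 'id'. The sketch below is one possible approach, not the only one; it assumes the grouping column is called 'id', and the names 'value_cols' and 'counts' are made up for illustration. It adapts to any number of rows and value columns.

```r
id <- c(1,1,1,1,1,1,1,1,1,1,2,2,2,2,2,2,2,2,2,2,3,3,3,3,3,3,3,3,3,3)
a <- c(3,1,3,3,1,3,3,3,3,1,3,2,1,2,1,3,3,2,1,1,1,3,1,3,3,3,2,1,1,3)
b <- c(3,2,1,1,1,1,1,1,1,1,1,2,1,3,2,1,1,1,2,1,3,1,2,2,1,3,3,2,3,2)
c <- c(1,3,2,3,2,1,2,3,3,2,2,3,1,2,3,3,3,1,1,2,3,3,1,2,2,3,2,2,3,2)
d <- c(3,3,3,1,3,2,2,1,2,3,2,2,2,1,3,1,2,2,3,2,3,2,3,2,1,1,1,1,1,2)
e <- c(2,3,1,2,1,2,3,3,1,1,2,1,1,3,3,2,1,1,3,3,2,2,3,3,3,2,3,2,1,3)
df <- data.frame(id, a, b, c, d, e)

# Every column except the grouping column
value_cols <- setdiff(names(df), "id")

# For each column: cross-tabulate id against the values, turn the table
# into a long data frame, and label the rows with the column name
counts <- do.call(rbind, lapply(value_cols, function(col) {
  tab <- as.data.frame(table(id = df$id, value = df[[col]]),
                       stringsAsFactors = FALSE)
  data.frame(column = col, tab, stringsAsFactors = FALSE)
}))

# Counts for column 'a' within id group 1 (values 1, 2, 3)
counts[counts$column == "a" & counts$id == "1", ]
```

The result has one row per combination of column, id and value, with the count in 'Freq'. Combinations that never occur (for example value 2 in column 'a' for id 1) appear with a count of 0 rather than being dropped.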

Do you have any ideas?
Best regards,
Laszlo


Tóth Dénes

2011-Mar-17 11:43 UTC

[R] calculating the occurrences of distinct observations in the subsets of a dataframe

Hi!

Sorry, I made an error in the previous e-mail.
So try this:
by(df[,-1],df$id,function(x) apply(x,2,tabulate))

This gives you a list. You can rearrange it into a data frame or a 3d
array if you wish.
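
Two things may be worth knowing about this approach. First, tabulate() counts every integer from 1 up to the largest value it sees, so absent values show up as 0 (3 0 7 for column 'a', id 1, rather than the 3 7 that table() gives). Second, apply() only returns a matrix if every column yields a vector of the same length, which holds for the example data but may not hold in general. The sketch below is a variation on the idea, not the code from the reply: it fixes the number of bins with 'nbins' and collects the result into a 3d array. It assumes all values are positive integers; 'n_values', 'groups' and 'arr' are illustrative names.

```r
id <- c(1,1,1,1,1,1,1,1,1,1,2,2,2,2,2,2,2,2,2,2,3,3,3,3,3,3,3,3,3,3)
a <- c(3,1,3,3,1,3,3,3,3,1,3,2,1,2,1,3,3,2,1,1,1,3,1,3,3,3,2,1,1,3)
b <- c(3,2,1,1,1,1,1,1,1,1,1,2,1,3,2,1,1,1,2,1,3,1,2,2,1,3,3,2,3,2)
c <- c(1,3,2,3,2,1,2,3,3,2,2,3,1,2,3,3,3,1,1,2,3,3,1,2,2,3,2,2,3,2)
d <- c(3,3,3,1,3,2,2,1,2,3,2,2,2,1,3,1,2,2,3,2,3,2,3,2,1,1,1,1,1,2)
e <- c(2,3,1,2,1,2,3,3,1,1,2,1,1,3,3,2,1,1,3,3,2,2,3,3,3,2,3,2,1,3)
df <- data.frame(id, a, b, c, d, e)

# Largest value over all non-id columns; used as a fixed number of bins
n_values <- max(unlist(df[-1]))

# One data frame per id group, without the id column
groups <- split(df[-1], df$id)

# For each group, tabulate every column with the same number of bins.
# Result: array with dimensions value x column x id group
arr <- sapply(groups,
              function(g) sapply(g, tabulate, nbins = n_values),
              simplify = "array")

dim(arr)          # 3 5 3
arr[, "a", "1"]   # 3 0 7
arr[, "e", "3"]   # 1 4 5
```

Indexing the array by column name and id, as in the last two lines, retrieves the counts for one column within one group.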

Regards,
  Denes



