Using R version 2.4.1 (2006-12-18) on Windows, I have a dataset which resembles this:

id att1 att2 att3
 1    1    1    0
 2    1    0    0
 3    0    1    1
 4    1    1    1

ratings <- data.frame(id = c(1,2,3,4), att1 = c(1,1,0,1), att2 = c(1,0,1,1), att3 = c(0,0,1,1))

I would like to get a cross tab of counts of co-occurrence, which might resemble this:

     att1 att2 att3
att1         2    1
att2    2         2
att3    1    2

with the hope of understanding, at least pairwise, what things "hang together". (Yes, there are much, much better ways to do this statistically, including clustering and binary corrected correlation, but the audience I am working with asked for this version for a specific reason.)

(Later on, I would also like to convert to percentages of the total unique population, so the final version of the table would be

     att1 att2 att3
att1       50%  25%
att2  50%       50%
att3  25%  50%

But I can do this in Excel if I can get the first table out.)

I have tried the reshape library, but could not get anything resembling this (either on its own or feeding its output into table()). (I have also played with transposing and with some comments from this list from 2002 and 2004, but the questioners appear to assume more knowledge of R than I have; the example in the posting guide was also more complex than I was ready for, I'm afraid.)

Sample of some of my efforts:

library(reshape)
melt(ratings,id=c("id"))

ds1 <- melt(ratings,id=c("id"))
table(ds1$variable, ds1$variable) # returns only row counts, 3 along the diagonal
xtabs(formula = value ~ ds1$variable + ds1$variable , data=ds1) # returns only a single row of collapsed counts; appears not to allow one variable to be used twice

I suspect I am close, so any nudges in the right direction would be helpful.

Thanks much, Michael

PS: www.rseek.org is very impressive, I heartily encourage its use.
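The long-format attempt above fails because crossing a variable with itself only tabulates each attribute against itself. A minimal base-R sketch of one way to stay in long format is to join the selected (value 1) rows to themselves by id, so that each respondent contributes every pair of attributes they chose. The data values are taken from the table in the post; the names `long`, `sel`, and `pairs` are illustrative, and stack() and merge() are used here in place of the reshape package.

```r
# Data as shown in the table of the original post.
ratings <- data.frame(id   = 1:4,
                      att1 = c(1, 1, 0, 1),
                      att2 = c(1, 0, 1, 1),
                      att3 = c(0, 0, 1, 1))

# Long format: one row per (id, attribute) pair.
# stack() concatenates the columns in order, so id is repeated once per column.
long <- data.frame(id = rep(ratings$id, times = ncol(ratings) - 1),
                   stack(ratings[, -1]))

# Keep only the attributes each respondent actually selected.
sel <- long[long$values == 1, c("id", "ind")]

# Self-join on id: every pair of attributes chosen by the same respondent.
pairs <- merge(sel, sel, by = "id")   # columns: id, ind.x, ind.y

# Cross-tabulate the pairs; the diagonal holds each attribute's own total.
ct <- table(pairs$ind.x, pairs$ind.y)
ct
```

The self-join grows with the square of the number of attributes selected per respondent, so for wide data a matrix-based approach is likely to be cheaper.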
Try this:

   tab <- crossprod(as.matrix(ratings[,-1]))   # pairwise co-occurrence counts
   tab <- tab - diag(diag(tab))                # zero out the diagonal
   tab
   tab / nrow(ratings)                         # as proportions of all respondents

On 2/22/07, Michael Wexler <wexler at yahoo.com> wrote:
> I would like to get a cross tab of counts of co-occurrence [...]
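Why this works: for a 0/1 matrix X with one row per respondent, entry (j, k) of t(X) %*% X is the sum over respondents of x[i, j] * x[i, k], and that product is 1 only when respondent i selected both attributes. The sketch below checks crossprod() against a brute-force count. It assumes the data values from the table in the original post, and `pair_count` is a hypothetical helper introduced only for the check.

```r
# Data as shown in the table of the original post.
ratings <- data.frame(id   = 1:4,
                      att1 = c(1, 1, 0, 1),
                      att2 = c(1, 0, 1, 1),
                      att3 = c(0, 0, 1, 1))

X <- as.matrix(ratings[, -1])   # drop the id column, keep the 0/1 indicators

# crossprod(X) computes t(X) %*% X.
tab <- crossprod(X)

# Brute-force count of respondents who selected both attribute j and k.
pair_count <- function(j, k) sum(X[, j] == 1 & X[, k] == 1)
idx   <- seq_len(ncol(X))
brute <- outer(idx, idx, Vectorize(pair_count))
dimnames(brute) <- dimnames(tab)

# The two methods should agree cell by cell.
stopifnot(all(tab == brute))
tab
```

The diagonal of `tab` holds each attribute's own total, which is why the replies blank it out before presenting the table.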
> res <- crossprod( as.matrix( ratings[ , -1] ) )
> diag(res) <- ""
> print(res, quote=F)
     att1 att2 att3
att1      2    1
att2 2         2
att3 1    2
>
> res2 <- crossprod(as.matrix( ratings[ , -1])) * 100 / nrow( ratings )
> res2[] <- paste( res2, "%", sep="" )
> diag(res2) <- ""
> print(res2, quote=F)
     att1 att2 att3
att1      50%  25%
att2 50%       50%
att3 25%  50%
>

Be sure to bone up on format and sprintf before taking this into production.

On Thu, 22 Feb 2007, Michael Wexler wrote:

> I would like to get a cross tab of counts of co-occurrence [...]

Charles C. Berry                        (858) 534-2098
                                         Dept of Family/Preventive Medicine
E mailto:cberry at tajo.ucsd.edu         UC San Diego
http://biostat.ucsd.edu/~cberry/         La Jolla, San Diego 92093-0901
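On the sprintf advice: paste() prints whatever digits the division produces, so percentages that do not come out as whole numbers would appear with long decimal tails. A minimal sketch using sprintf() to fix the number of decimals follows; the data values are assumed from the table in the original post, and the names `pct` and `out` are illustrative.

```r
# Data as shown in the table of the original post.
ratings <- data.frame(id   = 1:4,
                      att1 = c(1, 1, 0, 1),
                      att2 = c(1, 0, 1, 1),
                      att3 = c(0, 0, 1, 1))

X   <- as.matrix(ratings[, -1])
pct <- crossprod(X) / nrow(ratings) * 100   # percent of all respondents

# "%.0f" rounds to whole numbers; "%%" is a literal percent sign.
# Assigning through out[] keeps the matrix dimensions and dimnames.
out   <- pct
out[] <- sprintf("%.0f%%", pct)
diag(out) <- ""                             # blank the self-pairs

print(out, quote = FALSE)
```

Changing the format string to "%.1f%%" would show one decimal place instead.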
Thanks to Charles and Gabor, and to Frank E Harrell for a private message with some good ideas and help. The crossprod approach was very clever; I would never have thought of it.

Best, Michael

----- Original Message ----
From: Charles C. Berry <cberry@tajo.ucsd.edu>
To: Michael Wexler <wexler@yahoo.com>
Cc: r-help@stat.math.ethz.ch
Sent: Thursday, February 22, 2007 1:17:44 PM
Subject: Re: [R] Crosstabbing multiple response data
--- John Kane <jrkrideau at yahoo.ca> wrote:
> Thanks to everyone for this. I was looking at the same problem last night
> and was just going to write a posting to R-help when I saw this.