Hi,

I have a few methodological and implementation questions for you all. Thank you in advance for your help. I have a dataset that reflects people's preference choices, and I want to see whether there is any kind of clustering effect among certain preference choices (e.g. do people who pick choice A also pick choice D?).

The data set has one record per user ID, per preference choice. It's a "long" data set that looks like this:

ID  | Page
123 | Choice A
123 | Choice B
456 | Choice A
456 | Choice B
...

I thought that I should do the following:

1. Make the data set "wide", counting the observations so the data look like this:

ID  | Count of Preference A | Count of Preference B
123 | 1                     | 1
...

using

table1 <- dcast(data, ID ~ Page, fun.aggregate = length, value_var = 'Page')

2. Create a correlation matrix of the preferences:

cor(table1[, -1])

How would I restrict my correlation to preferences that meet a minimum sample threshold? Can you confirm whether the two following commands do the same thing? And what would I do from here (or am I taking the wrong approach)?

table1 <- dcast(data, Page ~ Page, fun.aggregate = length, value_var = 'Page')
table2 <- with(data, table(Page, Page))

Many thanks,
Peter
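For reference, a minimal sketch of the reshape-and-correlate approach described above, assuming the long data frame is named data with columns ID and Page, and using reshape2's current value.var argument spelling; the threshold of 30 users is purely illustrative:

library(reshape2)

## Step 1: one row per ID, one column per choice, cells = count of that choice
wide <- dcast(data, ID ~ Page, fun.aggregate = length, value.var = "Page")

## Keep only choices picked by at least min_n distinct users
min_n <- 30
keep  <- colSums(wide[, -1] > 0) >= min_n   # drop the ID column before counting
wide  <- wide[, c(TRUE, keep)]

## Step 2: correlation matrix of the remaining choice columns
cor(wide[, -1])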
On a methodological level, if the choices do not correspond to a cardinal, or at least ordinal, scale, you don't want to use correlations. Instead you should probably use Cramér's V, in particular if the choices are multinomial. Whether the wide format is necessary will depend on the input format that the function you end up using expects.

HTH,
Daniel
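For concreteness, a minimal sketch of Cramér's V for a pair of choices, assuming the 0/1 indicator columns produced by the dcast step above; the column names "Choice A" and "Choice B" are placeholders:

## Cramér's V for two categorical variables (here 0/1 membership indicators)
cramers_v <- function(x, y) {
  tab  <- table(x, y)
  chi2 <- suppressWarnings(chisq.test(tab, correct = FALSE))$statistic
  n    <- sum(tab)
  unname(sqrt(chi2 / (n * (min(dim(tab)) - 1))))
}

## e.g. association between picking choice A and picking choice B
cramers_v(wide[["Choice A"]] > 0, wide[["Choice B"]] > 0)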
Your question sounds like association rule mining for frequent item sets. Check out the arules package.

Weidong Gu
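A minimal sketch of the arules approach, again assuming a long data frame named data with columns ID and Page; the support and confidence thresholds are illustrative:

library(arules)

## One transaction per user: the set of choices that user picked
trans <- as(split(as.character(data$Page), data$ID), "transactions")

## Sets of choices that frequently co-occur
itemsets <- apriori(trans,
                    parameter = list(target = "frequent itemsets",
                                     supp = 0.05, minlen = 2))
inspect(head(itemsets, n = 10, by = "support"))

## Or association rules of the form "people who pick A also pick D"
rules <- apriori(trans,
                 parameter = list(supp = 0.05, conf = 0.5, minlen = 2))
inspect(head(rules, n = 10, by = "lift"))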
Hi Peter,

An easy way to visualize set intersections is the intersectDiagram function in the plotrix package. It displays the counts or percentages of each type of intersection. Your data could be passed like this:

library(plotrix)
choices <- data.frame(IDs = sample(1:20, 50, TRUE),
                      sample(LETTERS[1:4], 50, TRUE))
intersectDiagram(choices)

This example is a bit messy, as it generates quite a few repeated choices that will be ignored by intersectDiagram, but it should give you the idea.

Jim
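Applied to the original long-format data (assuming a data frame named data with columns ID and Page), the analogue of the example above would presumably be:

library(plotrix)
## pass the ID column plus the categorical choice column, as in the example above
intersectDiagram(data[, c("ID", "Page")])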