sylvain willart
2014-Feb-04 09:19 UTC
[R] Which analysis for a set of dummy variables alone ?
Dear R-users,

I have a dataset I would like to analyze and plot. It consists of 100 dummy variables (0/1) for about 2,000,000 observations. There is no quantitative variable at all, nor anything I could use as an explained variable in a regression analysis.

The dataset represents the patronage of 2 million customers across 100 stores. Each variable equals 1 if the consumer goes to the store and 0 if he doesn't, with no further information.

As the variables look like factors (0/1), I thought I could go for a Multiple Correspondence Analysis (MCA). However, the resulting plot consists of 2 points for each variable (one for 1 and one for 0), which is not easily interpretable. (Or is there a way to leave certain points out of an MCA plot?)

I also tried to treat my dataset as a bipartite network (consumer-store). However, the plot is not really insightful, as I am especially looking for links between stores (along the lines of "if a consumer goes to that store, he probably also goes to this one...").

So, I have a simple question: which method would you choose for computing and plotting the links between a set of dummy variables?

Thanks in advance,

Sylvain
PhD Marketing
Associate Professor
University of Lille - FR
This sounds more like a statistics question than an R question. You may have better luck posting to a different forum, e.g., Cross Validated, http://stats.stackexchange.com/.

Jean
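
The store-to-store links asked about in the original post ("if a consumer goes to that store, he probably also goes to this one") could be quantified directly from the 0/1 matrix. What follows is only a minimal sketch, not a recommendation from the thread: it assumes the data fit in memory as a customers-by-stores matrix, uses simulated data in place of the real dataset, and the names `X`, `co`, `jaccard`, `cond` and `phi` are hypothetical. It computes co-patronage counts, Jaccard similarity, conditional shares and phi coefficients with base R, then draws a dendrogram of the stores.

```r
# Simulated stand-in for the real data: rows = customers, columns = stores.
# Sizes are hypothetical; the post mentions about 2,000,000 rows and 100 stores.
set.seed(1)
n_customers <- 1000
n_stores    <- 5
X <- matrix(rbinom(n_customers * n_stores, 1, 0.3),
            nrow = n_customers,
            dimnames = list(NULL, paste0("store", seq_len(n_stores))))

# Co-patronage counts: co[i, j] = number of customers who visit both i and j.
# The diagonal holds each store's total number of customers.
co <- crossprod(X)

# Jaccard similarity: customers visiting both / customers visiting either.
visits  <- diag(co)
either  <- outer(visits, visits, "+") - co
jaccard <- co / either

# Conditional share: cond[i, j] = share of store i's customers who also visit j.
# Dividing by a vector recycles down the columns, so row i is divided by visits[i].
cond <- co / visits

# Phi coefficient: Pearson correlation computed on 0/1 columns.
phi <- cor(X)

# One possible plot: hierarchical clustering of stores on 1 - Jaccard.
hc <- hclust(as.dist(1 - jaccard), method = "average")
plot(hc, main = "Stores clustered by shared customers")
```

Only the 100 x 100 store-level matrices are needed for plotting, so the number of customers affects the cost of `crossprod()` but not the size of what is displayed. `cond` is deliberately asymmetric, which matches the "if he goes to that store, he also goes to this one" reading, while `jaccard` and `phi` are symmetric measures.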