sweepingplains at gmail.com
2008-May-08 06:59 UTC
[R] Finding dependencies and clusters in live survey data with a mix of independent variable types
I have a set of live data about customer satisfaction and desires from a live ecommerce site. There are only 311 survey responses. There were approximately 154 questions.

A large fraction of these questions had numerical answers (e.g. on a scale of 1 to 10 how satisfied are you with our service, how many months have you been a customer for, how old are you, how many computers do you own). A second large fraction of the questions had binary answers (e.g. do you own an iPod, do you think blogging will be more or less popular in 5 years' time than it is now, do you use online video sites). The remaining data were multinomial answers (e.g. from which of these sources did you first find out about this site, which of these most closely describes the industry you are in).

I am mostly interested in finding subsets of customers for whom some subset of survey answers best correlates with their answer to the question "On a scale of 1 to 10, how would you rate our overall service?" I am also interested in identifying market segments of like-minded individuals with similar interests and views, and in finding out what they, as a group, most want from the service in the future.

I am aware of how to perform multiple linear regression using R but I am not sure how to

1. handle the binary variables and multinomial variables as independent variables
2. find a set of canonical independent variables which most closely correlate in combination to the "overall service rating" data
3. find market segments among the data by looking for clusters of like interests and views

Are any of the above suitable for analysis by R? If so, are there example programs available which achieve similar things that I can study as guides?

Thanks in advance for your contemplation.

Charlie
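For orientation, the three numbered points can be roughed out in R along the following lines. This is only a sketch under stated assumptions, not a recommended analysis: the data are simulated, every column name (overall, months, age, owns_ipod, source) is hypothetical, stepwise AIC selection and Gower dissimilarity with PAM are just one possible choice for points 2 and 3, and k = 3 segments is arbitrary. It uses the stats functions lm() and step() and the daisy() and pam() functions from the recommended cluster package.

```r
# Simulated stand-in for the survey; all column names are hypothetical.
set.seed(1)
n <- 311
survey <- data.frame(
  overall   = sample(1:10, n, replace = TRUE),   # response: overall service rating
  months    = rpois(n, 12),                      # numerical answer
  age       = round(runif(n, 18, 70)),           # numerical answer
  owns_ipod = factor(sample(c("no", "yes"), n, replace = TRUE)),  # binary answer
  source    = factor(sample(c("search", "friend", "advert"), n,
                            replace = TRUE))     # multinomial answer
)

# 1. Binary and multinomial predictors: store them as factors and pass them
#    to lm(); R expands each factor into dummy (treatment-contrast) columns.
full <- lm(overall ~ ., data = survey)
summary(full)

# 2. One simple way to pick a subset of predictors: stepwise selection by AIC.
#    (An assumption for illustration; other selection methods exist.)
reduced <- step(full, trace = 0)
formula(reduced)

# 3. Segments from mixed variable types: Gower dissimilarity handles numeric
#    and factor columns together, and PAM clusters on that dissimilarity.
library(cluster)
d   <- daisy(survey[, -1], metric = "gower")  # drop the response column
seg <- pam(d, k = 3)                          # k = 3 is an arbitrary choice
table(seg$clustering)                         # segment sizes
```

With the real survey, the factor() calls would be applied to the actual binary and multinomial columns, and with 311 responses against roughly 154 questions some reduction of the predictor set before fitting would likely be needed.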