Hi,

Is there a way to take a dataset, extract its numeric columns, and create interaction columns from them automatically? For example, there are 5 columns of data: A, B, C, D, E. C, D and E are numeric. Can someone provide code to automatically create more columns such as C*D, C*E, C*D*E, (C+E)/(D+.01), (D+E)/(C+.01) and (C+D)/(E+.01)? The .01 is added to avoid dividing by zero. I know that in glm multiplying can create interaction terms, but I want the columns to be part of the data set so that I can feed it into a random forest to pick out predictive interaction terms, since regression cannot reliably handle correlated interaction terms. If anyone has some simple code that can do this, that would be helpful.

Thanks,
Dhruv
Dear Dhruv,

You could create interaction variables manually (assuming A is your dependent variable). Just multiply the variables together.

cd.int <- C*D
ce.int <- C*E
cde.int <- C*D*E
# what about D*E, or interactions with B?

Include those in your model, such as A~B+C+D+E+cd.int+ce.int+cde.int. Then you can compare that model to the results you get when you specify the interactions in the model formula directly using the documented syntax. In your R console, type ?formula or help("formula") for details.

Sincerely,
KeithC.

-----Original Message-----
From: Sharma, Dhruv [mailto:Dhruv.Sharma at PenFed.org]
Sent: Saturday, March 06, 2010 10:30 AM
To: r-help at r-project.org
Subject: [R] r code to generate interaction columns
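Keith's suggestion (hand-made product columns, then a comparison against formula syntax) can be sketched as follows. The toy data, the use of lm() as the model, and the choice to store the new columns inside the data frame are assumptions made for illustration; the thread itself does not specify them.

```r
# Hypothetical toy data: A is the response, B a factor, C/D/E numeric.
set.seed(1)
dat <- data.frame(
  A = rnorm(20),
  B = factor(rep(c("x", "y"), 10)),
  C = runif(20),
  D = runif(20),
  E = runif(20)
)

# Hand-made interaction columns, stored in the data frame itself.
dat$cd.int  <- dat$C * dat$D
dat$ce.int  <- dat$C * dat$E
dat$cde.int <- dat$C * dat$D * dat$E

# Model built from the hand-made columns.
fit.manual <- lm(A ~ B + C + D + E + cd.int + ce.int + cde.int, data = dat)

# The same terms requested through formula syntax (see ?formula).
fit.formula <- lm(A ~ B + C + D + E + C:D + C:E + C:D:E, data = dat)

# Both fits use the same design columns, so the fitted values should agree.
all.equal(fitted(fit.manual), fitted(fit.formula))
```

The formula version never adds columns to dat, which is why the manual version is the one that matters when the columns have to be passed on to another method.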
Thanks Keith. I wanted generic code that checks each column's data type, loops through the numeric columns, and creates the interaction columns automatically, as I want to test this out as a new algorithm for data mining. Traditional regression may give misleading results under multicollinearity, so I wanted to take the interaction terms and run them through random forests and rpart, since they would need the interaction terms to be created manually. Hope that clarifies.

Dhruv

-----Original Message-----
From: kMan [mailto:kchamberln at gmail.com]
Sent: Sunday, March 07, 2010 8:08 PM
To: Sharma, Dhruv; r-help at r-project.org
Subject: RE: [R] r code to generate interaction columns
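The generic version Dhruv asks for (detect the numeric columns, then loop over them to build the products and ratios) might look like the sketch below. The function add_interactions, its arguments and the column naming scheme are hypothetical and not taken from any package. Note also that adding .01 only protects against division by zero when the data are non-negative.

```r
# Hypothetical helper: append product and ratio columns built from
# the numeric columns of a data frame.
#   exclude   - numeric columns to leave alone (e.g. the response)
#   max_order - largest number of columns multiplied together
#   eps       - small constant added to each denominator
add_interactions <- function(df, exclude = character(0),
                             max_order = 3, eps = 0.01) {
  is_num <- vapply(df, is.numeric, logical(1))
  num <- setdiff(names(df)[is_num], exclude)
  out <- df

  # Products of every pair, triple, ... up to max_order columns.
  if (length(num) >= 2) {
    for (k in 2:min(max_order, length(num))) {
      for (v in combn(num, k, simplify = FALSE)) {
        out[[paste(v, collapse = "_x_")]] <- Reduce(`*`, df[v])
      }
    }
  }

  # Ratios of the form (a + b) / (d + eps) for each column d and
  # each pair (a, b) of the remaining numeric columns.
  if (length(num) >= 3) {
    for (d in num) {
      for (v in combn(setdiff(num, d), 2, simplify = FALSE)) {
        nm <- paste0(v[1], "_plus_", v[2], "_over_", d)
        out[[nm]] <- (df[[v[1]]] + df[[v[2]]]) / (df[[d]] + eps)
      }
    }
  }
  out
}

# Toy data shaped like the example in the thread.
dat <- data.frame(
  A = c(1, 0, 1, 0),
  B = c("u", "v", "u", "v"),
  C = c(1, 2, 3, 4),
  D = c(0, 1, 2, 3),
  E = c(2, 2, 1, 1)
)
new <- add_interactions(dat, exclude = "A")
names(new)
```

The returned data frame keeps the original columns and could then be handed to a tree-based method, for example randomForest::randomForest(A ~ ., data = new). The number of generated columns grows quickly with the number of numeric columns, so max_order is worth keeping small.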