I still need to do some repetitive statistical analysis on some outcomes from a dataset.

Take the following as an example:

id  sex  hiv  age  famsize  bmi  resprate
1   M    Pos  23   2        16   15
2   F    Neg  24   5        18   14
3   F    Pos  56   14       23   24
4   F    Pos  67   3        33   31
5   M    Neg  34   2        21   23

I want to know if there are statistically detectable differences in all of the continuous variables in my data set when subdivided by sex or hiv status (i.e. are age, family size, bmi and resprate different in my male and female patients or in hiv pos/neg patients).

Of course I can use Wilcoxon or t-tests, e.g.:

    wilcox.test(age~sex)
    wilcox.test(famsize~sex)
    wilcox.test(bmi~sex)
    wilcox.test(resprate~sex)
    wilcox.test(age~hiv)
    wilcox.test(famsize~hiv)
    wilcox.test(bmi~hiv)
    wilcox.test(resprate~hiv)

but there must be some easy way of looping/automating this code (i.e. get all the continuous variables analysed one by one by sex, then analysed one by one by hiv status). Obviously my actual dataset is considerably bigger than what is shown here. I have many variables to assess, making the longhand instruction to do every test pretty unsatisfactory.

I think I can use "for" or some other looping command for this purpose but I can't work out how. I think I don't properly understand how loops work yet as I'm still quite new to R.

Please could someone help, ideally with an explanation and some quick sample code?

Derek

--
View this message in context: http://r.789695.n4.nabble.com/Using-functions-loops-for-repetitive-commands-tp3498006p3498006.html
Sent from the R help mailing list archive at Nabble.com.
Hello, Derek,

see below.

On Thu, 5 May 2011, dereksloan wrote:

> I still need to do some repetitive statistical analysis on some outcomes
> from a dataset.
>
> Take the following as an example;
>
> id sex hiv age famsize bmi resprate
> 1 M Pos 23 2 16 15
> 2 F Neg 24 5 18 14
> 3 F Pos 56 14 23 24
> 4 F Pos 67 3 33 31
> 5 M Neg 34 2 21 23
>
> I want to know if there are statistically detectable differences in all
> of the continuous variables in my data set when subdivided by sex or hiv
> status (ie are age, family size, bmi and resprate different in my male
> and female patients or in hiv pos/neg patients) Of course I can use
> wilcoxon or t-tests e.g:
>
> wilcox.test( age~sex)
> wilcox.test(famsize~sex)
> wilcox.test(bmi~sex)
> wilcox.test(resprate~sex)
> wilcox.test( age~hiv)
> wilcox.test(famsize~hiv)
> wilcox.test(bmi~hiv)
> wilcox.test(resprate~hiv)
>
.... [snip]

Define, e. g.,

    my.wilcox.tests <- function( var.names, groupvar.name, data) {
      lapply( var.names,
              function( v) {
                # build the formula "v ~ groupvar.name" from the two strings
                form <- as.formula( paste( v, "~", groupvar.name))
                wilcox.test( form, data = data)
              } )
    }

and call something like

    my.wilcox.tests( <character vector with relevant variable names>,
                     <character string with relevant grouping variable>,
                     data = <your data set as data frame>)

Caveat: untested!

Hth -- Gerrit

---------------------------------------------------------------------
Dr. Gerrit Eichner                   Mathematical Institute, Room 212
gerrit.eichner at math.uni-giessen.de   Justus-Liebig-University Giessen
Tel: +49-(0)641-99-32104          Arndtstr. 2, 35392 Giessen, Germany
Fax: +49-(0)641-99-32109        http://www.uni-giessen.de/cms/eichner
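A minimal runnable sketch of the approach above, applied to the five example rows from the question. The names `patients`, `run.wilcox.tests`, `cont.vars` and `group.vars` are made up for illustration, and `exact = FALSE` is an assumption added here so that the ties in `famsize` do not trigger warnings; drop it to get the default behaviour of wilcox.test().

```r
# Example data copied from the original question.
patients <- data.frame(
  id       = 1:5,
  sex      = c("M", "F", "F", "F", "M"),
  hiv      = c("Pos", "Neg", "Pos", "Pos", "Neg"),
  age      = c(23, 24, 56, 67, 34),
  famsize  = c(2, 5, 14, 3, 2),
  bmi      = c(16, 18, 23, 33, 21),
  resprate = c(15, 14, 24, 31, 23)
)

# Hypothetical helper: one Wilcoxon test per variable name in var.names,
# each against the single grouping variable groupvar.name.
run.wilcox.tests <- function(var.names, groupvar.name, data) {
  tests <- lapply(var.names, function(v) {
    form <- as.formula(paste(v, "~", groupvar.name))
    # exact = FALSE is an assumption (normal approximation, no tie warnings)
    wilcox.test(form, data = data, exact = FALSE)
  })
  names(tests) <- var.names   # label each result with its variable
  tests
}

cont.vars  <- c("age", "famsize", "bmi", "resprate")
group.vars <- c("sex", "hiv")

# Outer loop over the grouping variables, inner loop inside the helper.
all.tests <- lapply(group.vars, function(g) run.wilcox.tests(cont.vars, g, patients))
names(all.tests) <- group.vars

# Collect the p-values: one row per variable, one column per grouping.
p.values <- sapply(all.tests, function(tests) sapply(tests, function(res) res$p.value))
print(p.values)
```

A single full test result is still available by name, e.g. `all.tests$sex$age`. With only five rows the p-values themselves mean very little; the point is only the looping pattern.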
Hi Derek,

You can accomplish your loop jobs by the following means:
(a) use a for loop
(b) use a while loop
(c) use lapply, tapply, or sapply (I feel lapply is the elegant way)

---------------For Loop-----------------------------

"for" loops are pretty simple to use and work much like the loops in other scripting languages you may know (I am referring to Matlab).

(Example 1) Let's say you know that you have to run 10 iterations; then you can run it as

    for(i in 1:10) print(i)   # prints the numbers from 1 to 10

(Example 2) You don't know how many iterations you need to run. The only thing you have is some vector, and you want to do some operation on each element of that vector. You can do something like this:

    myVector <- c(20,45,23,45,89)
    for(i in seq_along(myVector)) print(myVector[i])

-------------Using lapply-------------------------

In "lapply" you need to provide mainly two things:
(1) First parameter: a vector or list (for example a sequence of numbers)
(2) Second parameter: a function, which could be a user-defined function or some built-in function.

lapply will call the function once for every element given in the first parameter and return the results as a list. For example:

    x <- c(10,20,20)
    lapply(seq_along(x), function(i) {
      # your logic
    })

As you can see, the first parameter I have sent is seq_along(x). The outcome of seq_along(x) will be 1, 2, 3. Now lapply will take each of these numbers and call the function with it. That means lapply calls the function three times for the current data set: once with i = 1, once with i = 2 and once with i = 3. So your logic inside the function will be executed for each and every value specified in the first parameter of lapply.

I hope it helps you in some way.

For your problem, I am guessing that you are using a data frame or matrix to store the data and that you want to automate the tests on it, right? You can try using "lapply"; I think that would be efficient. Let me also try.

Regards,
Som Shekhar
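To make the comparison concrete, here is a small sketch that does the same job (squaring each element of `myVector`) with a for loop, with lapply and with sapply. The squaring operation and the result names `squares.loop`, `squares.list` and `squares.vec` are only placeholders for "your logic".

```r
myVector <- c(20, 45, 23, 45, 89)

# for loop: create the result vector first, then fill it element by element
squares.loop <- numeric(length(myVector))
for (i in seq_along(myVector)) {
  squares.loop[i] <- myVector[i]^2
}

# lapply: the function is called once per index; the result is a list
squares.list <- lapply(seq_along(myVector), function(i) myVector[i]^2)

# sapply: like lapply, but simplifies the result to a plain vector when it can;
# here the function receives the elements themselves instead of their indices
squares.vec <- sapply(myVector, function(x) x^2)

print(squares.loop)
print(unlist(squares.list))
print(squares.vec)
```

All three give the same numbers; the difference is only in how the result is stored (a vector you fill yourself, a list, or a simplified vector).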
Your code may be untested but it works, and it is also slowly helping me to start understanding how to write functions. Thank you.

However I still have difficulty. I also have some categorical variables to analyse by sex & hiv status, i.e. my dataset expands to (for example):

id  sex  hiv  age  famsize  bmi  resprate  smoker  alcohol
1   M    Pos  23   2        16   15        Y       Y
2   F    Neg  24   5        18   14        Y       Y
3   F    Pos  56   14       23   24        Y       N
4   F    Pos  67   3        33   31        N       N
5   M    Neg  34   2        21   23        N       N

Using the template for the code you sent me I thought I could analyse the categorical variables by sex & hiv status using a chi-squared test. Long-hand this would be:

    chisq.test(smoker,sex)
    chisq.test(alcohol,sex)
    chisq.test(smoker,hiv)
    chisq.test(alcohol,hiv)

Again I wanted to use a function to automate it with a loop and thought I could write:

    categ<-c(smoker,alcohol)
    group.name<-c(sex,hiv)
    bl.chisq<-function(categ,group.name,<dataframe name>){
      lapply(categ, function(y){
        form2<-as.formula(paste(y,group.name))
        chisq.test(form2,<dataframe name>)
      })
    }
    bl.chisq(categ,group.name,<data frame name>)

but I get an error message:

    Error in parse(text = x) : unexpected symbol in "smoker sex"

What is wrong with the code? Is it because wilcox.test takes a formula (with a ~ symbol for modelling) whilst chisq.test simply requires me to list raw data? If so, how can I change my code to automate the chisq.test in the same way I did for the wilcox.test?

Many thanks for any help!

Derek
Hello, Derek,

first of all, be very aware of what David Winsemius said; you are about to enter the area of "unprincipled data-mining" (as he called it) with its trap -- one of many -- of multiple testing. So, *if* you know what the consequences and possible remedies are, a purely R-syntactic "solution" to your problem might be the (again not fully tested) hack below.

> If so how can I change my code to automate the chisq.test in the same
> way I did for the wilcox.test?

Try

    lapply( <your_data_frame>[<selection_of_relevant_components>],
            function( y) chisq.test( y, <your_data_frame>$<group_name>) )

or even shorter:

    lapply( <your_data_frame>[<selection_of_relevant_components>],
            chisq.test, <your_data_frame>$<group_name> )

However, in the resulting output you will not be seeing the names of the variables that went into the first argument of chisq.test(). This is a little bit more complicated to resolve:

    lapply( names( <your_data_frame>[<selection_of_relevant_components>]),
            function( y)
              eval( substitute( chisq.test( <your_data_frame>$y0,
                                            <your_data_frame>$<group_name>),
                                list( y0 = y) ) ) )

Still another possibility is to use xtabs() (with its summary-method) which has a formula argument.

Hoping that you know what to do with the results -- Gerrit
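A note on the error in the question, followed by a runnable sketch. paste(y, group.name) without a "~" produces the string "smoker sex", which as.formula() cannot parse; and chisq.test() has no formula interface in any case, so it needs the two variables themselves. The sketch below is one way to run every categorical variable against every grouping variable on the example rows while keeping the variable names next to each result. The names `patients`, `combos`, `categ.vars` and `group.vars` are made up for illustration, the Holm adjustment via p.adjust() is just one possible remedy for the multiple-testing problem mentioned above (an assumption, not a recommendation from the thread), and the warnings are suppressed only because five rows are far too few for the chi-squared approximation.

```r
# Example data copied from the follow-up question (categorical columns only).
patients <- data.frame(
  sex     = c("M", "F", "F", "F", "M"),
  hiv     = c("Pos", "Neg", "Pos", "Pos", "Neg"),
  smoker  = c("Y", "Y", "Y", "N", "N"),
  alcohol = c("Y", "Y", "N", "N", "N")
)

categ.vars <- c("smoker", "alcohol")
group.vars <- c("sex", "hiv")

# One row per (categorical variable, grouping variable) pair.
combos <- expand.grid(categ = categ.vars, group = group.vars,
                      stringsAsFactors = FALSE)

# Run chisq.test() for each pair; columns are picked by name with [[ ]].
combos$p.value <- mapply(function(cv, gv) {
  # Warnings about the approximation are expected with so few rows and are
  # suppressed here for the demonstration only.
  suppressWarnings(chisq.test(patients[[cv]], patients[[gv]]))$p.value
}, combos$categ, combos$group, USE.NAMES = FALSE)

# Assumption: Holm correction as one way to account for multiple testing.
combos$p.holm <- p.adjust(combos$p.value, method = "holm")
print(combos)

# The xtabs() route: formula interface, test reported by summary().
print(summary(xtabs(~ smoker + sex, data = patients)))
```

Because the results are stored in a data frame alongside the variable names, there is no need for the eval(substitute(...)) construction when only the p-values are wanted.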