Amelia Vettori
2010-Dec-15 15:18 UTC
[R] Applying function to a TABLE and also "apply, tapply, sapply etc"
Dear R-help forum members,

Suppose I have a data frame with two variables and a single value for each of them, as shown below.

variable_1   variable_2
        10           20

I have written a function, say 'fun', which takes the inputs 10 and 20 and gives me the desired result.

fun = function(X, Y) {
  X + Y   # I am just giving an example of the process. The actual process is quite different.
}

result = fun(variable_1[1], variable_2[1])
# Thus, I should get the answer 30, which I store in, say, 'ans1.csv'

# ____________________________________________________________________

# My problem

Suppose that, instead of the above data frame with a single value for variable_1 and variable_2, I have the following data:

variable_1   variable_2
        10           20
        40           30
         3           11

I need to run the function 'fun' separately for each pair of values taken by variable_1 and variable_2. Also, the results (30, 70 and 14) obtained for these pairs should be stored in separate CSV files, say "ans1.csv", "ans2.csv" and "ans3.csv" respectively, which I can use for further analysis. (In reality, each of these output files will consist of 1000 records.)

As I mentioned in my earlier mail, I am new to R, and I think I should be using apply, sapply, tapply etc. I have tried them, but I cannot proceed further because I do not understand them properly.

It would be a great help to me if I could receive guidance on:

(a) how to tackle the above problem, i.e. how to apply the function to a table so that it generates separate CSV files for the pairs of values "10 and 20", "40 and 30" and "3 and 11";

(b) how "apply", "sapply", "tapply", "mapply" etc. can be used. I do not pick up the programming aspects of R that easily, though I am really keen to learn it, so I would be highly obliged if someone could help me understand them with some simple examples.

I am sure this will go a long way in helping new learners like me to understand the proper use of these wonderful commands.
I hope I have put my problem across properly.

Thanking all in advance for the anticipated guidance,
Amelia Vettori, Auckland
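A runnable version of the setup described in the question may help anyone answering. This is a minimal sketch: the question does not name its data frame, so `df` and `df3` are hypothetical names, and writing to `tempdir()` is an assumption made only so the sketch does not write into the working directory. Note that `variable_1` and `variable_2` are columns, so they have to be reached through the data frame.

```r
# 'df' is a hypothetical name; the question does not name the data frame
df <- data.frame(variable_1 = 10, variable_2 = 20)

# Stand-in for the real computation, as in the question
fun <- function(X, Y) {
  X + Y
}

# The columns are accessed through the data frame (df$...), because
# variable_1 and variable_2 do not exist as separate objects
result <- fun(df$variable_1[1], df$variable_2[1])  # 30

# Store the single result in ans1.csv (tempdir() is only an illustrative location)
ans1 <- file.path(tempdir(), "ans1.csv")
write.csv(data.frame(result = result), ans1, row.names = FALSE)

# The three-row data from "My problem" (hypothetical name 'df3')
df3 <- data.frame(variable_1 = c(10, 40, 3),
                  variable_2 = c(20, 30, 11))
```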
Liviu Andronic
2010-Dec-15 16:24 UTC
On Wed, Dec 15, 2010 at 4:18 PM, Amelia Vettori <amelia_vettori at yahoo.co.nz> wrote:
> (a) how do I tackle above problem i.e. how do I apply the function to a
> table so that it will generate different csv files pertaining to pair of
> values "10 and 20", "40 and 30" and "3 and 11";

Say you have the following data frame:

> df
  Var1 V2
1   10 20
2   40 30
3    3 11
> str(df)
'data.frame':	3 obs. of  2 variables:
 $ Var1: num  10 40 3
 $ V2  : num  20 30 11

Then:

> apply(df, 1, sum)  ##compute sum() for each row
 1  2  3 
30 70 14 
> apply(df, 2, sum)  ##compute sum() for each column
Var1   V2 
  53   61 

> (b) I am not that sharp to understand the programming aspects of R that
> easily, though I am really keen to learn R, so I will be highly obliged if
> someone helps me understand with some simple examples as to how "apply",
> "sapply", "tapply", "mapply" etc can be used?

Here are only some examples that I understand well.

##apply a function to each element of a list (data frames are lists)
##compute sum() for each column
> lapply(df, sum)
$Var1
[1] 53

$V2
[1] 61

##sapply() is a variation of lapply(); see the docs
> sapply(df, sum)
Var1   V2 
  53   61 

##using the 'iris' data frame, for each Species level compute mean() of the Sepal.Length column
> with(iris, tapply(Sepal.Length, Species, mean))
    setosa versicolor  virginica 
     5.006      5.936      6.588 

##a friendlier interface is provided by by()
> with(iris, by(Sepal.Length, Species, mean))
Species: setosa
[1] 5.006
------------------------------------------------------------
Species: versicolor
[1] 5.936
------------------------------------------------------------
Species: virginica
[1] 6.588

##the same, now for four variables at the same time
##(colMeans() computes the mean of each column)
> by(iris[1:4], iris$Species, colMeans)
iris$Species: setosa
Sepal.Length  Sepal.Width Petal.Length  Petal.Width 
       5.006        3.428        1.462        0.246 
------------------------------------------------------------
iris$Species: versicolor
Sepal.Length  Sepal.Width Petal.Length  Petal.Width 
       5.936        2.770        4.260        1.326 
------------------------------------------------------------
iris$Species: virginica
Sepal.Length  Sepal.Width Petal.Length  Petal.Width 
       6.588        2.974        5.552        2.026 

For an example of mapply, see this
recent post: http://r.789695.n4.nabble.com/calculating-mean-of-list-components-tp3088986p3089057.html

For more on vectorization, see sections 3 and 4 of 'The R Inferno' [1]. Also check 'Some Hints for the R Beginner' [2].

[1] http://www.burns-stat.com/pages/Tutor/R_inferno.pdf
[2] http://www.burns-stat.com/pages/Tutor/hints_R_begin.html

Regards
Liviu
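The examples above can be collected into one script that runs on its own. This sketch rebuilds the small data frame `df` from the console output and uses the built-in `iris` data. The mapply() line is an illustration added here (the reply only links to an external example); it pairs up the two columns element by element, which is exactly the row-wise pattern asked about in (a).

```r
# The small data frame from the console output, rebuilt so the sketch runs on its own
df <- data.frame(Var1 = c(10, 40, 3), V2 = c(20, 30, 11))

# apply(): MARGIN = 1 loops over rows, MARGIN = 2 over columns.
# The data frame is converted to a matrix first.
row_sums <- apply(df, 1, sum)   # 30 70 14
col_sums <- apply(df, 2, sum)   # Var1 = 53, V2 = 61

# lapply()/sapply(): a data frame is a list of columns, so these loop over columns
col_sums_list <- lapply(df, sum)  # list(Var1 = 53, V2 = 61)
col_sums_vec  <- sapply(df, sum)  # the same, simplified to a named numeric vector

# mapply(): walks over several vectors in parallel and calls the function
# once per position, i.e. f(10, 20), f(40, 30), f(3, 11)
pair_sums <- mapply(function(x, y) x + y, df$Var1, df$V2)  # 30 70 14

# tapply(): splits a vector by a grouping factor and applies a function to each group
sepal_means <- with(iris, tapply(Sepal.Length, Species, mean))
```

mapply() is the natural fit when a function takes several arguments that all change together, as with `fun(X, Y)` in the question.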
Seeliger.Curt at epamail.epa.gov
2010-Dec-15 16:47 UTC
Amelia of Auckland writes:
> Suppose instead of having above dataframe having single data for
> variable 1 and variable 2, I have following data as
>
> variable_1   variable_2
>
>         10           20
>         40           30
>          3           11

And so you do:

foo <- data.frame(variable_1=c(10,40,3), variable_2=c(20,30,11))

> I need to run the function 'fun' for each pair of values taken by
> variable_1 and variable_2 separately. Also, the results (= 30, 70
> and 14) obtained for each of above pairs should be stored in
> different csv files, say "ans1.csv", "ans2.csv" and "ans3.csv"
> respectively which I can use for further analysis. (In reality each
> of these output files will consist of 1000 records).

No function is necessary, though apply() would work, as has been pointed out. Addition is already vectorized, so it works on whole columns at once:

foo$answer <- foo$variable_1 + foo$variable_2

Writing to a CSV file is a separate step:

write.csv(foo, 'foo.csv')

--
Curt Seeliger, Data Ranger
Raytheon Information Services - Contractor to ORD
seeliger.curt@epa.gov
541/754-4638
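The write.csv() call above puts all rows into a single file, whereas the question asks for one file per pair. A minimal sketch of that step, assuming the same `foo`, could loop over the row numbers after the vectorized addition. Writing into `tempdir()` is an assumption so the sketch does not touch the working directory, and `files` is a hypothetical name. For side effects such as writing files, a plain `for` loop is often clearer than lapply().

```r
foo <- data.frame(variable_1 = c(10, 40, 3), variable_2 = c(20, 30, 11))

# Vectorized: `+` works element-wise on whole columns, no loop needed
foo$answer <- foo$variable_1 + foo$variable_2

# One file per row: ans1.csv, ans2.csv, ans3.csv
# (tempdir() is only an illustrative output location)
out_dir <- tempdir()
files <- file.path(out_dir, paste0("ans", seq_len(nrow(foo)), ".csv"))
for (i in seq_len(nrow(foo))) {
  # drop = FALSE keeps the single row as a data frame
  write.csv(foo[i, , drop = FALSE], files[i], row.names = FALSE)
}
```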
Brian Diggs
2010-Dec-15 19:25 UTC
On 12/15/2010 7:18 AM, Amelia Vettori wrote:
> (a) how do I tackle above problem i.e. how do I apply the function to
> a table so that it will generate different csv files pertaining to
> pair of values "10 and 20", "40 and 30" and "3 and 11";

# a slightly more complicated demonstration function, which
# gives a result that makes sense for writing to a CSV file.
fun <- function(X, Y) {
  data.frame(result=X + Y)
}

foo <- data.frame(variable_1=c(10,40,3), variable_2=c(20,30,11))

# using apply
# This only really works if the columns in foo are the same
# type, because foo will be transformed into a matrix (which
# can hold only one type). Also, since the column names of the
# data.frame don't match the arguments of fun, the unname is needed.
# do.call is a somewhat advanced function that lets you call
# a function with arguments that are stored in some other
# list.
apply(foo, 1, function(x) do.call("fun", as.list(unname(x))))

# version using apply, where foo has been transformed into
# something more like what apply would expect.
foo.m <- as.matrix(foo)
colnames(foo.m) <- c("X","Y")
apply(foo.m, 1, function(x) {do.call("fun", as.list(x))})

# using lapply
# lapply takes a list, which for this looping purpose would have
# to be the row indexes of foo. This version does not require
# the different arguments to be the same type.
lapply(1:nrow(foo), function(i) {fun(foo[i,1], foo[i,2])})

# using mapply
# This one is designed more for when multiple arguments to a
# function are changing.
mapply(fun, foo[,1], foo[,2])

# using Vectorize
# A different approach: instead of creating the looping
# structure, create a new function which is vectorized over its
# arguments.
fun.v <- Vectorize(fun)
fun.v(foo[,1], foo[,2])

# storing the results to disk
results <- mapply(fun, foo[,1], foo[,2])
# results is a list, each element of which is one of the returned
# sets of results corresponding to a row in the original data.frame
lapply(1:length(results), function(r) {write.csv(results[r], file=paste("ans",r,".csv",sep=""))})

# if you didn't need different file names (the name of which depends on
# the position of the result in the list, not anything in the result
# itself), it could be simpler.
lapply(results, summary)

--
Brian S. Diggs, PhD
Senior Research Associate, Department of Surgery
Oregon Health & Science University
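By default mapply() tries to simplify what it returns, so the elements of `results` above are not necessarily the data frames that `fun` produced. A variant of the storage step, sketched here under the assumption of the same `fun` and `foo`, passes `SIMPLIFY = FALSE` to keep one data frame per row and writes `results[[r]]` (the element itself) rather than `results[r]` (a one-element sub-list). The output folder `tempdir()` and the names `out_dir`, `mixed` and `n_classes` are illustrative assumptions. The last lines demonstrate the caveat in the comments about apply(): a data frame with mixed column types becomes a character matrix.

```r
fun <- function(X, Y) {
  data.frame(result = X + Y)
}
foo <- data.frame(variable_1 = c(10, 40, 3), variable_2 = c(20, 30, 11))

# SIMPLIFY = FALSE stops mapply() from merging the returned data frames,
# so each element of 'results' stays a one-row data frame
results <- mapply(fun, foo$variable_1, foo$variable_2, SIMPLIFY = FALSE)

# Write each data frame to ans<r>.csv; seq_along() is also safe for an empty list.
# tempdir() is an illustrative location; use your own output folder instead.
out_dir <- tempdir()
for (r in seq_along(results)) {
  write.csv(results[[r]],
            file = file.path(out_dir, paste0("ans", r, ".csv")),
            row.names = FALSE)
}

# The apply() caveat: with a character column, as.matrix() yields a character
# matrix, so the numeric column reaches the function as strings
mixed <- data.frame(n = c(10, 40), label = c("a", "b"))
n_classes <- apply(mixed, 1, function(row) class(row[["n"]]))
```

The explicit `for` loop is used only because writing files is a side effect; the lapply() form in the post works equally well.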