Hi,

I have data stored in a list that I would like to aggregate and compute some basic statistics on. However, I would like to apply a condition so that not all of the data are used. Basically, I want to take a specific variable and apply a basic function (such as a mean), but only to the values in each list element that match the condition. The code I used is below:

> result<-sapply(res, function(.df) { #res is the list containing file data
+ if(.df$Volume>0)mean(.df$Volume) #only have the mean function calculate on values greater than 0
+ })

I did get a numeric output; however, when I checked the output value, the conditional was ignored (i.e. it had no effect on the calculation).

I also obtained these warning messages:

Warning messages:
1: In if (.df$Volume > 0) mean(.df$Volume) :
  the condition has length > 1 and only the first element will be used
2: In if (.df$Volume > 0) mean(.df$Volume) :
  the condition has length > 1 and only the first element will be used

Please let me know what I am doing wrong and how I can apply a conditional statement inside the sapply function.

Thanks

Mark
Ling, Gary (Electronic Trading)
2008-Aug-13 22:21 UTC
[R] Conditional statement used in sapply()
Hi Mark,

How about this?

result <- sapply(split(res, res$Volume>0)$`TRUE`, mean)

One thing I'm not sure about: is res$Volume a vector or a single number?

-gary

-----Original Message-----
From: r-help-bounces at r-project.org [mailto:r-help-bounces at r-project.org] On Behalf Of Altaweel, Mark R.
Sent: Wednesday, August 13, 2008 6:03 PM
To: r-help at r-project.org
Subject: [R] Conditional statement used in sapply()

> result<-sapply(res, function(.df) { #res is the list containing file data
+ if(.df$Volume>0)mean(.df$Volume) #only have the mean function calculate on values greater than 0
+ })

______________________________________________
R-help at r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.
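For readers following along, here is a minimal sketch of what the split() suggestion does, assuming res is a single data frame whose columns are all numeric. The sample data and the Price column are made up for illustration. If res is instead a list of data frames, as the original post describes, res$Volume would typically be NULL and this approach would not apply directly.

```r
# Hypothetical data: `res` here is ONE data frame, not a list of data frames.
res <- data.frame(Volume = c(0, 2, 4, -1), Price = c(10, 20, 30, 40))

# split() on a logical vector produces groups named "FALSE" and "TRUE";
# the "TRUE" group holds the rows where Volume > 0.
positive <- split(res, res$Volume > 0)$`TRUE`

# sapply() over a data frame iterates over its columns,
# so this gives one mean per column of the filtered rows.
col_means <- sapply(positive, mean)
print(col_means)
```

Note that this returns the mean of every column for the rows with positive Volume, not only the mean of Volume itself.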
> -----Original Message-----
> From: r-help-bounces at r-project.org [mailto:r-help-bounces at r-project.org] On Behalf Of Altaweel, Mark R.
> Sent: Wednesday, August 13, 2008 3:03 PM
> To: r-help at r-project.org
> Subject: [R] Conditional statement used in sapply()
>
> > result<-sapply(res, function(.df) { #res is the list containing file data
> + if(.df$Volume>0)mean(.df$Volume) #only have the mean function calculate on values greater than 0
> + })
>

You probably want something such as

result<-sapply(res, function(.df) {
  mean(.df$Volume[.df$Volume>0])
})

HTH

Steve McKinney
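A small self-contained sketch of the logical-subsetting approach, run on made-up data (the list res and its contents below are hypothetical stand-ins for the poster's file data). It also adds na.rm = TRUE, which is an assumption about how missing values should be handled and is not part of the suggestion above.

```r
# Hypothetical stand-in for the poster's list of data frames.
res <- list(
  a = data.frame(Volume = c(0, 2, 4)),
  b = data.frame(Volume = c(-1, 5, NA)),
  c = data.frame(Volume = c(0, 0))
)

result <- sapply(res, function(.df) {
  v <- .df$Volume
  # v > 0 is a logical vector; indexing with it keeps only matching values.
  # An NA in v yields an NA in the subset, so na.rm = TRUE drops it (assumption).
  mean(v[v > 0], na.rm = TRUE)
})

print(result)
```

An element with no positive values (c above) gives NaN, because the mean of an empty numeric vector is NaN; that case may need separate handling depending on the analysis.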
Hello -

Altaweel, Mark R. wrote:
>> result<-sapply(res, function(.df) { #res is the list containing file data
> + if(.df$Volume>0)mean(.df$Volume) #only have the mean function calculate on values greater than 0
> + })
>
> Warning messages:
> 1: In if (.df$Volume > 0) mean(.df$Volume) :
>   the condition has length > 1 and only the first element will be used
>

Before you think about sapply, what would you do if you had just one element of this list? Write a function to do that.

You wouldn't do:

if(x$Volume > 0) mean(x$Volume)

because x$Volume > 0 creates a logical vector of length greater than 1 (assuming x$Volume has length greater than 1), and "if" then issues the warning and uses only the first element.

You might do

mean(x$Volume[x$Volume > 0])

and turn it into a function. Then use sapply.

Hopefully that gets you started!

Erik
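The "one element first, then sapply" workflow might look like the sketch below. The function name mean_positive and the sample data are hypothetical, chosen only for illustration. As far as I know, in R 4.2.0 and later a condition of length greater than one in if() is an error rather than a warning, so the original code would stop outright on a current R.

```r
# Step 1: look at a single (hypothetical) list element.
x <- data.frame(Volume = c(0, 1, 3))

# The comparison is vectorised: one logical value per row, not a single TRUE/FALSE.
cond <- x$Volume > 0
print(length(cond))

# Step 2: write a function that handles one element.
# mean_positive is a made-up name, not something from the thread.
mean_positive <- function(x) {
  vol <- x$Volume
  mean(vol[vol > 0])   # keep only the values that satisfy the condition
}
print(mean_positive(x))

# Step 3: apply the same function to every element of the list.
res <- list(first = x, second = data.frame(Volume = c(10, -10, 20)))
result <- sapply(res, mean_positive)
print(result)
```

Testing the function on one element before wrapping it in sapply makes it easier to see whether the filter does what is intended.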