Dear list,

I have quite a small data set in which I need certain values to be ignored: not used when performing an analysis, but still included later in the report that I write.

Can anyone suggest how this can be accomplished?

Values to be ignored:

0 (zero) and 1, in addition to NA (null).

The reason is that I need to use the log10 of the values when performing the calculation.

Currently I hand-massage the data set of about 100 values, of which fewer than 5 to 10 are in this category.

The NA values are NOT the problem.

What I was hoping was that I would not have to use a series of if and ifelse statements. Perhaps there is a more elegant solution.

Any ideas would be welcomed.

Regards
Steve

Steve Sidney <sbsidney <at> mweb.co.za> writes:

> Dear list
>
> I have quite a small data set in which I need to have the following
> values ignored - not used when performing an analysis but they need to
> be included later in the report that I write.
>
> Can anyone help with a suggestion as to how this can be accomplished
>
> Values to be ignored
>
> 0 - zero and 1 this is in addition to NA (null)
>
> The reason is that I need to use the log10 of the values when performing
> the calculation.
>
> Currently I hand massage the data set, about a 100 values, of which less
> than 5 to 10 are in this category.
>
> The NA values are NOT the problem
>
> What I was hoping was that I did not have to use a series of if and
> ifelse statements. Perhaps there is a more elegant solution.

It would help to have a more precise/reproducible example, but if your data set (a data frame) is d, and you want to ignore cases where the response variable x is either 0 or 1, you could say

    ds <- subset(d, !x %in% c(0,1))

Some modeling functions (such as lm()), but not all of them, have a 'subset' argument, so you can provide this criterion on the fly:

    lm(..., subset=(!x %in% c(0,1)))
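
As a minimal sketch of how the subset approach can keep the ignored values available for the report, one option is to flag the rows rather than delete them. The names d and x follow the reply above; the column names lab, excluded and logx, and all the values, are made up for illustration and are not from the original data.

```r
# Hypothetical data: 'd' and 'x' follow the reply above; everything else
# (the 'lab' column, the values) is invented for this example.
d <- data.frame(lab = 1:8,
                x   = c(120, 0, 45, 1, NA, 300, 15, 1))

# Flag the rows to leave out of the analysis. NA is not flagged here,
# because %in% returns FALSE for NA.
d$excluded <- d$x %in% c(0, 1)

# Take log10 only where the value is not excluded; excluded rows stay NA.
d$logx <- NA_real_
d$logx[!d$excluded] <- log10(d$x[!d$excluded])

# Analysis subset: same rows that subset(d, !x %in% c(0, 1)) would give.
ds <- subset(d, !excluded)

# The full data frame 'd' still holds every raw value for the report.
mean(ds$logx, na.rm = TRUE)
```

The flag column can also be passed directly to a modeling function that accepts a 'subset' argument, for example subset = !excluded, so the raw data frame never has to be edited by hand.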

Thanks for the questions.

1) The data represents micro-organism counts, and a count of zero in this case is highly unlikely given the info we have, including the results from the other participants.

2) The data is submitted in duplicate, and then a standardised sum and difference is established and used to calculate a Z-score, which is used as a measure of performance.

Given both 1) and 2), it is necessary to exclude a raw count of zero (since the log of 0 is meaningless) and a count of one (since the log of 1 is of course zero).

I guess one can think of these values as outliers, and that is what I am trying to exclude. There is ample evidence that such an approach is acceptable.

Thanks for the interest
Steve

On 2010/12/13 06:47 PM, Stavros Macrakis wrote:
> If you need to take the log of the values for your calculation, then
> what does it mean that you have 0 values in the input?
>
> And why do you need to exclude the 1 values?
>
> Are you sure that a) you are doing the correct kind of analysis and b)
> the analysis is correct if you exclude 0 and 1?
>
> -s
>
> On Mon, Dec 13, 2010 at 10:38, Steve Sidney <sbsidney at mweb.co.za> wrote:
>> Dear list
>>
>> I have quite a small data set in which I need to have the following values
>> ignored - not used when performing an analysis but they need to be included
>> later in the report that I write.
>>
>> Can anyone help with a suggestion as to how this can be accomplished
>>
>> Values to be ignored
>>
>> 0 - zero and 1 this is in addition to NA (null)
>>
>> The reason is that I need to use the log10 of the values when performing the
>> calculation.
>>
>> Currently I hand massage the data set, about a 100 values, of which less
>> than 5 to 10 are in this category.
>>
>> The NA values are NOT the problem
>>
>> What I was hoping was that I did not have to use a series of if and ifelse
>> statements. Perhaps there is a more elegant solution.
>>
>> Any ideas would be welcomed.
>>
>> Regards
>> Steve
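
For the duplicate counts described in this reply, a rough sketch of an alternative to subsetting is to mask the 0 and 1 values as NA in a copy of the data, so that the raw counts stay untouched for the report. The data frame counts, its columns rep1 and rep2, and the values are hypothetical. The thread does not say how the sum and difference are standardised, so this stops at the plain sums and differences of the log10 values.

```r
# Hypothetical duplicate counts; column names and values are made up.
counts <- data.frame(rep1 = c(150, 0, 32, 1, 2000),
                     rep2 = c(170, 12, 1, 3, 1800))

# The two values that cause trouble: log10(0) is -Inf and log10(1) is 0.
log10(c(0, 1))

# Work on a copy: set 0 and 1 to NA there, then take logs.
masked <- counts
masked[] <- lapply(masked, function(v) replace(v, v %in% c(0, 1), NA))
logs <- log10(masked)

# NA propagates through arithmetic, so a pair only yields a sum and
# difference when both replicates are valid.
pair_sum  <- logs$rep1 + logs$rep2
pair_diff <- logs$rep1 - logs$rep2
usable    <- complete.cases(logs)
```

Because the NA values propagate, later steps can use na.rm = TRUE or the usable flag instead of a chain of if and ifelse statements, while counts itself is left as submitted.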