Hi everyone,

I have a very long list of data points (2300+), and I know from my histogram that there are outliers which are affecting my mean.

I was wondering if anyone on here knows a way I can quickly get R to calculate and remove data that are more than 3 standard deviations from the mean? I am hoping this will tidy my data and give me a repeatable method of tidying for future data collection.

If you do post code, please make it as user-friendly as possible! I am not a very good programmer; I can load my data into R and do basic stats on it, but I haven't tried much else.

Thank you in advance for any advice given :)

--
View this message in context: http://r.789695.n4.nabble.com/Remove-data-3-standard-deviatons-from-the-mean-using-R-tp4663745.html
Sent from the R help mailing list archive at Nabble.com.
Berend Hasselman
2013-Apr-09 13:25 UTC
[R] Remove data 3 standard deviations from the mean using R?
On 09-04-2013, at 13:12, Lorna <lornam at essex.ac.uk> wrote:
> I was wondering if anyone on here knows a way i can quickly get R to
> calculate and remove data which is 3 standard deviations from the mean?

    # some test data + standard deviation of same
    testdata <- rnorm(100, 0, 5)
    sd.td <- sd(testdata)

    # threshold (set to 3.0 for your specific situation)
    alpha <- 1.5

    # determine which items fall within bounds and select them
    pidx <- (testdata < mean(testdata) + alpha * sd.td) &
            (testdata > mean(testdata) - alpha * sd.td)
    testdata[pidx]

Berend
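Berend's recipe can be wrapped in a small function so that the same cutoff is applied the same way to each new batch of data, which is the "repeatable method" the original question asks for. This is only a sketch: the function name `remove_sd_outliers`, its argument `k`, and the choice to drop missing values are assumptions, not part of the reply above. The test `abs(x - m) < k * s` is equivalent to Berend's pair of one-sided comparisons.

```r
# Hypothetical helper (not from the thread): keep only values strictly
# within k standard deviations of the mean, as in Berend's reply.
remove_sd_outliers <- function(x, k = 3) {
  x <- x[!is.na(x)]            # assumption: missing values are simply dropped
  m <- mean(x)
  s <- sd(x)
  keep <- abs(x - m) < k * s   # same test as Berend's two comparisons
  x[keep]
}

# usage with simulated data; replace y with your own numeric vector,
# e.g. a column of a data frame read in with read.csv()
set.seed(42)
y <- c(rnorm(2300), 50)        # 2300 ordinary values plus one gross error
cleaned <- remove_sd_outliers(y, k = 3)
length(y) - length(cleaned)    # number of values removed
```

Note that `mean(x)` and `sd(x)` are computed with the suspect values still included, so a few very large values inflate the SD and widen the cutoff. The function also makes a single pass and does not re-check the data that remain.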
David Winsemius
2013-Apr-09 13:46 UTC
On Apr 9, 2013, at 4:12 AM, Lorna wrote:
> I was wondering if anyone on here knows a way i can quickly get R to
> calculate and remove data which is 3 standard deviations from the
> mean?

This plan has no statistical justification. Around here we have reverence for data. Outliers are often meaningful. Requests to distort your data should be accompanied by a coherent argument.

--
David Winsemius, MD
Alameda, CA, USA
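One way to respect the point above while still following up on the 3-SD idea is to flag suspect values instead of deleting them, so they can be inspected and their effect on the mean reported. The sketch below is a minimal illustration with simulated data; the data frame `df`, its column `value`, and the flag column `flag_3sd` are hypothetical names.

```r
# Flag (rather than delete) values more than 3 SDs from the mean.
set.seed(7)
# hypothetical data: 2300 ordinary measurements plus two extreme ones
df <- data.frame(value = c(rnorm(2300, mean = 50, sd = 5), 120, 130))

z <- (df$value - mean(df$value)) / sd(df$value)  # z-score of each value
df$flag_3sd <- abs(z) > 3                        # TRUE = candidate outlier

# inspect the flagged rows before deciding what to do with them
print(df[df$flag_3sd, , drop = FALSE])

# report the mean with and without the flagged values
print(c(all = mean(df$value), unflagged = mean(df$value[!df$flag_3sd])))
```

Whether a flagged value is an error or a genuine observation still has to be decided from knowledge of how the data were collected; the code only makes the candidates visible.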
S Ellison
2013-Apr-11 17:19 UTC
> -----Original Message-----
> I have a very long list of data-points (+2300) and i know
> from my histogram that there are outliers which are affecting my mean.

If extreme values are known to be unreliable*, take a look at robust statistical methods (or even the median) instead of the mean. huber() in MASS and huberM() in robustbase are often appropriate for that kind of situation.

If the extreme values are not known to be unreliable, see previous posts...

S Ellison

*Reverence for data is fundamental, but that doesn't mean being blind to the apparently limitless number of ways people, instruments and the universe at large can foul up observations and data collection. A degree of scepticism about extreme values is healthy; dogmatic retention or rejection is not.
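As an illustration of the robust alternatives mentioned above, the sketch below compares the ordinary mean with the median and with Huber M-estimates of location on simulated data containing a few gross errors. It assumes MASS is installed (it ships with standard R distributions as a recommended package); robustbase is optional and is only used if `requireNamespace()` finds it. The data, including the value 100 used as a "gross error", are made up for illustration.

```r
# Compare the mean with robust location estimates on simulated data.
library(MASS)   # provides huber()

set.seed(1)
# hypothetical data: 2300 ordinary values plus 10 gross errors
x <- c(rnorm(2300, mean = 10, sd = 2), rep(100, 10))

est <- c(
  mean   = mean(x),      # pulled upwards by the 10 extreme values
  median = median(x),
  huber  = huber(x)$mu   # MASS::huber(): Huber M-estimate, default k = 1.5
)

# robustbase::huberM() is an alternative implementation; use it if installed
if (requireNamespace("robustbase", quietly = TRUE)) {
  est["huberM"] <- robustbase::huberM(x)$mu
}

print(round(est, 3))
```

Unlike the 3-SD cutoff, none of these estimates deletes data: the median ignores the magnitude of extreme values, and the Huber estimate down-weights them by clipping values that lie more than k scale units (MAD-based in MASS::huber) from the current estimate.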