AbouEl-Makarim Aboueissa
2023-Apr-20 18:55 UTC
[R] detect and replace outliers by the average
Dear All: the data file is attached in .txt format.
*Re:* detect and replace outliers by the average
The dataset (please see the attachment) contains a grouping factor column
"factor" and two data columns "x1" and "x2" with some NA values. I need
some help to detect the outliers and replace them, and the NAs, with the
average within each level (0, 1, 2) of "factor", separately for "x1" and "x2".
I tried the code below, but it did not accomplish what I want to do.
The average within each level should be computed after discarding the outliers.
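data <- read.csv("G:/20-Spring_2023/Outliers/data.csv", header = TRUE)
data
# Replaces boxplot.stats() outliers by the column mean. As written it is
# applied to every column (including "factor") rather than within groups,
# the mean still includes the outliers, and NAs are left unchanged.
replace_outlier_with_mean <- function(x) {
  replace(x, x %in% boxplot.stats(x)$out, mean(x, na.rm = TRUE))  #### , na.rm=TRUE NOT working
}
data[] <- lapply(data, replace_outlier_with_mean)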
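A minimal base-R sketch of the per-group version, assuming the columns are named "factor", "x1" and "x2" as described, and that "outlier" means a point flagged by boxplot.stats() (Tukey's 1.5 * IQR rule, the same rule as in the attempt above). The helper names replace_outliers_and_na() and fix_by_group() are hypothetical. Within each level, the mean is taken over the values that are neither outliers nor NA, and both outliers and NAs are overwritten with it; stats::ave() does the split by level and puts the results back in the original row order.

```r
# Sketch: "outlier" = a point flagged by boxplot.stats()
# (Tukey's 1.5 * IQR rule, with hinges computed by fivenum()).
replace_outliers_and_na <- function(x) {
  # %in% is FALSE for NA entries, so NAs are never flagged as outliers here
  is_out <- x %in% grDevices::boxplot.stats(x)$out
  # Mean of the values that are neither outliers nor NA
  clean_mean <- mean(x[!is_out], na.rm = TRUE)
  # Overwrite both the outliers and the NAs with that mean
  x[is_out | is.na(x)] <- clean_mean
  x
}

# Hypothetical helper: apply the replacement separately within each level
# of the grouping column, for the listed numeric columns only.
fix_by_group <- function(data, group = "factor", vars = c("x1", "x2")) {
  for (v in vars) {
    # ave() splits data[[v]] by group, applies FUN to each piece, and
    # reassembles the result in the original row order
    data[[v]] <- stats::ave(data[[v]], data[[group]],
                            FUN = replace_outliers_and_na)
  }
  data
}
```

With the attachment read into `data` (for example `read.table("data.txt", header = TRUE)`, which assumes the file is whitespace-separated), `fix_by_group(data)` returns a copy with "x1" and "x2" cleaned and "factor" untouched. Because boxplot.stats() takes its quartiles from fivenum() (Tukey's hinges), very small groups may flag different points than a quantile()-based rule would, and outliers are detected only once per group, not iteratively.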
Thank you all very much for your help in advance.
with many thanks
abou
______________________
*AbouEl-Makarim Aboueissa, PhD*
*Professor, Mathematics and Statistics*
*Graduate Coordinator*
*Department of Mathematics and Statistics*
*University of Southern Maine*
-------------- next part --------------
An embedded and charset-unspecified text was scrubbed...
Name: data.txt
URL: <https://stat.ethz.ch/pipermail/r-help/attachments/20230420/392e459a/attachment.txt>
What does it mean when one column is just blank, neither a number nor NA, just nothing?
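On the blank-cell question: in read.csv()/read.table(), blank fields in numeric columns are already read as NA (as documented in ?read.table), and `na.strings = c("NA", "")` makes blanks count as missing in character columns too. This only works for delimited files; with the whitespace separator (`sep = ""`, the read.table() default) an empty field cannot be represented, so the row either fails to parse or, with `fill = TRUE`, its remaining values shift left into the wrong columns. The three-row sample below is made up for illustration and is not the attached data.

```r
# Made-up comma-separated sample: row 2 has a blank x1,
# row 3 has an explicit NA in x2.
txt <- "factor,x1,x2
0,1.5,2.0
0,,2.5
1,3.0,NA"

# Treat both the literal string "NA" and empty fields as missing
dat <- read.csv(text = txt, header = TRUE, na.strings = c("NA", ""))
str(dat)

# Count missing values per column
colSums(is.na(dat))
```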