AbouEl-Makarim Aboueissa
2023-Apr-20 18:55 UTC
[R] detect and replace outliers by the average
Dear All:

The attached file is in the .txt format.

*Re:* detect and replace outliers by the average

The dataset (please see attached) contains a grouping column "factor" and two columns of data, "x1" and "x2", with some NA values. I need some help to detect the outliers and replace them and the NAs with the average within each level (0, 1, 2) for each variable "x1" and "x2".

I tried the code below, but it did not accomplish what I want to do.

The average within each level should be computed after discarding the outliers.

data <- read.csv("G:/20-Spring_2023/Outliers/data.csv", header=TRUE)
data

replace_outlier_with_mean <- function(x) {
  replace(x, x %in% boxplot.stats(x)$out, mean(x, na.rm=TRUE))  #### , na.rm=TRUE NOT working
}

data[] <- lapply(data, replace_outlier_with_mean)

Thank you all very much for your help in advance.

with many thanks
abou
______________________
*AbouEl-Makarim Aboueissa, PhD*
*Professor, Mathematics and Statistics*
*Graduate Coordinator*
*Department of Mathematics and Statistics*
*University of Southern Maine*

-------------- next part --------------
An embedded and charset-unspecified text was scrubbed...
Name: data.txt
URL: <https://stat.ethz.ch/pipermail/r-help/attachments/20230420/392e459a/attachment.txt>
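One possible reading of the request, as a minimal sketch rather than a definitive answer: flag outliers with the same boxplot rule used in the post, compute the mean of the remaining non-missing values, and write that mean over both the outliers and the NAs, separately within each level of "factor". The posted code appears to fall short for three reasons: it runs over whole columns (including "factor") instead of within each level, it never touches the NAs, and its mean still includes the outliers. The toy data frame below is made up for illustration and is not the attached data.txt; the column names "factor", "x1" and "x2" are taken from the post.

```r
# Replace outliers and NAs in x by the mean of the remaining values.
# Outliers are flagged with the boxplot rule (boxplot.stats), as in the post.
replace_outlier_with_mean <- function(x) {
  out <- boxplot.stats(x)$out     # values beyond 1.5 * IQR from the hinges
  bad <- is.na(x) | x %in% out    # positions to overwrite: NAs and outliers
  x[bad] <- mean(x[!bad])         # mean of non-missing, non-outlying values
  x
}

# Made-up toy data (NOT the attached file): three levels, six rows each
data <- data.frame(
  factor = rep(0:2, each = 6),
  x1 = c(10, 11, 12, 13, 100, NA,
         20, 21, 22, 23, NA, -50,
         30, 31, 32, 33, 34, 500),
  x2 = c(1.0, 1.2, NA, 0.9, 1.1, 9.9,
         2.0, 2.2, 2.1, NA, 1.9, 2.0,
         3.1, 2.9, 3.0, -7, 3.2, NA)
)

# Apply the function within each level of `factor`, one data column at a time.
# ave() splits by group, applies FUN, and puts results back in the original row order.
for (v in c("x1", "x2")) {
  data[[v]] <- ave(data[[v]], data$factor, FUN = replace_outlier_with_mean)
}

data
```

In this toy example the value 100 and the NA in level 0 of x1 are both replaced by 11.5, the mean of 10, 11, 12 and 13. If a level contains only NAs, the replacement value is NaN, so that case would need separate handling.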
What does it mean when one column is just blank, neither a number nor NA, just nothing?
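On the blank cells: what they mean in the data is for the poster to say, but how R reads them can be checked. The sketch below assumes a comma-separated layout, as implied by the read.csv() call in the post; the three-row text is hypothetical and is not the attached file. By default, read.csv() treats a blank field in a numeric column as NA, the same as an explicit NA.

```r
# Hypothetical three-row CSV: row 2 has a blank x1, row 3 has an explicit NA in x2
txt <- "factor,x1,x2
0,1.5,2.0
1,,3.0
2,4.0,NA"

d <- read.csv(text = txt, header = TRUE)

str(d)        # x1 and x2 are both read as numeric columns
is.na(d$x1)   # the blank field shows up as NA
is.na(d$x2)   # so does the explicit NA
```

If that also holds for the real file, blank cells and NAs would be handled identically by any per-level replacement. This applies to delimited formats such as CSV; a whitespace-separated .txt file may behave differently, since a blank cell there is not a distinct field.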