thr3ads.net - R help - [R] detect and replace outliers by the average [Apr 2023]


AbouEl-Makarim Aboueissa

2023-Apr-20 18:36 UTC

[R] detect and replace outliers by the average

Dear All:

*Re:* detect and replace outliers by the average

The dataset (please see attached) contains a grouping column, "*factor*", and two data columns, "x1" and "x2", with some NA values. I need some help detecting the outliers and replacing them, and the NAs, with the average within each level (0, 1, 2), for each of the variables "x1" and "x2".

I tried the code below, but it did not accomplish what I want to do.

data <- read.csv("G:/20-Spring_2023/Outliers/data.csv", header = TRUE)

data

replace_outlier_with_mean <- function(x) {
  replace(x, x %in% boxplot.stats(x)$out, mean(x, na.rm = TRUE))  #### , na.rm=TRUE NOT working
}

data[] <- lapply(data, replace_outlier_with_mean)

Thank you all very much for your help in advance.

with many thanks

abou


______________________


*AbouEl-Makarim Aboueissa, PhD*

*Professor, Mathematics and Statistics*
*Graduate Coordinator*

*Department of Mathematics and Statistics*
*University of Southern Maine*

Rui Barradas

2023-Apr-20 18:44 UTC

[R] detect and replace outliers by the average

At 19:36 on 20/04/2023, AbouEl-Makarim Aboueissa wrote:
> Dear All:
> [...]

Hello,

There is no data set attached; see the posting guide for which file
extensions are allowed as attachments.

As for the question, try computing mean(x, na.rm = TRUE) first, then
use this value in the replace() call. Without data I'm just guessing.
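A minimal base-R sketch of this suggestion, applied within each level of the grouping column with `ave()`. The data frame is a made-up stand-in for the missing attachment, assuming the column names `factor`, `x1` and `x2` from the question. As in the original function, the replacement value is the mean of all non-missing values in the group, so it still includes the outliers themselves.

```r
# Made-up stand-in for the missing attachment (column names as in the question)
data <- data.frame(
  factor = rep(0:2, each = 5),
  x1 = c(1, 2, 3, 2, 50,   10, NA, 12, 11, 10,   5, 6, 5, NA, 5),
  x2 = c(3, NA, 4, 3, 4,   7, 8, 7, 8, 100,      2, 3, 1, 2, NA)
)

# Compute the mean first (NAs ignored), then use it in replace()
replace_outlier_with_mean <- function(x) {
  m <- mean(x, na.rm = TRUE)
  is_out <- x %in% boxplot.stats(x)$out   # boxplot rule, coef = 1.5
  replace(x, is_out | is.na(x), m)        # outliers and NAs both become m
}

# Apply within each level of `factor`, to the data columns only
for (v in c("x1", "x2")) {
  data[[v]] <- ave(data[[v]], data$factor, FUN = replace_outlier_with_mean)
}
data
```

With these made-up numbers, the value 100 in level 1 of `x2` is flagged, but it also pulls that level's mean, and hence its replacement, up to 26. Averaging only the values that are neither missing nor flagged, e.g. `mean(x[!(is_out | is.na(x))])`, is one alternative.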

Hope this helps,

Rui Barradas

Richard O'Keefe

2023-Apr-21 23:30 UTC

[R] detect and replace outliers by the average

This can be seen as three steps:
(1) identify outliers
(2) replace them with NA (trivial)
(3) impute missing values.
There are packages for imputing missing data.
See
https://www.analyticsvidhya.com/blog/2016/03/tutorial-powerful-packages-imputing-missing-values/
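The three steps can be kept as separate functions, so that step (3) can later be swapped for a proper imputation method. In this sketch the function names and data are hypothetical, step (1) uses the boxplot rule as implemented by `boxplot.stats()` (points more than 1.5 box lengths beyond the hinges), and step (3) is only a placeholder (group-mean imputation), not what the imputation packages do.

```r
# Step 1: flag outliers with the boxplot rule (boxplot.stats, coef = 1.5)
flag_outliers <- function(x) x %in% boxplot.stats(x)$out

# Step 2: replace flagged values with NA
outliers_to_na <- function(x) replace(x, flag_outliers(x), NA)

# Step 3: impute missing values. Placeholder only: the mean of observed values.
impute_mean <- function(x) replace(x, is.na(x), mean(x, na.rm = TRUE))

df <- data.frame(
  group = rep(c("a", "b"), each = 5),
  x     = c(1, 2, 3, 2, 50,  10, NA, 12, 11, 10)
)

# Run steps 1-3 separately within each group
df$x <- ave(df$x, df$group, FUN = function(v) impute_mean(outliers_to_na(v)))
df$x
```

Because flagged values become NA in step (2), they no longer contribute to the mean used in step (3).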

Here I just want to address the first step.
An observation is only an outlier relative to some model.
Outliers can indicate:
- data that are just wrong (data entry error, failing battery in measurement
  device, all sorts of stuff).  In this case, deletion + imputation makes
  sense.
- data that are generated by a mixture of two or more processes,
  not the single process you thought was there.  In this case,
  deletion + imputation is dangerous.  The world is trying to tell
  you something and you are ignoring it.
- the model is wrong.  Here again, deletion + imputation is
  dangerous.  You need a better model.
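To illustrate, a sketch with made-up numbers: the same boxplot rule from `boxplot.stats()` flags different points depending on whether the data are modelled as one population or as one population per group.

```r
x <- c(1, 2, 3, 2, 10,  100, 101, 102, 103, 104)
g <- rep(c("a", "b"), each = 5)

# Model 1: a single population, groups ignored
pooled <- boxplot.stats(x)$out

# Model 2: a separate population per group
by_group <- unlist(lapply(split(x, g), function(v) boxplot.stats(v)$out),
                   use.names = FALSE)

pooled     # numeric(0): nothing flagged
by_group   # 10: flagged within group "a"
```

Pooled, the gap between the two groups inflates the box length so much that nothing is flagged; within group "a", the value 10 is an outlier.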

"Detecting outliers in R" as a web query turned up
https://statsandr.com/blog/outliers-detection-in-r/
on the first page of results.  There's plenty of material
about finding outliers.

But please give very VERY serious consideration to the
possibility that some or even all of your outliers are
actually GOOD data telling you something you need to know.
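One way to act on this advice, assuming the layout from the question (`factor` as the grouping column; the numbers and the `x1_out` column name are made up for illustration), is to record which values are flagged instead of overwriting them, so the flagged rows can be inspected before deciding what to do.

```r
# Made-up example data; `factor` is the grouping column as in the question
data <- data.frame(
  factor = rep(0:1, each = 5),
  x1     = c(1, 2, 3, 2, 50,  10, 11, 12, 11, 10)
)

# Flag (do not overwrite) values caught by the boxplot rule within each group
flag_outliers <- function(x) as.numeric(x %in% boxplot.stats(x)$out)
data$x1_out <- ave(data$x1, data$factor, FUN = flag_outliers) == 1

# Inspect the flagged rows, with their original values, before deciding
data[data$x1_out, ]
```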


On Fri, 21 Apr 2023 at 06:38, AbouEl-Makarim Aboueissa <abouelmakarim1962 at gmail.com> wrote:
> Dear All:
> [...]
