thr3ads.net - R help - [R] detect and replace outliers by the average [Apr 2023]


AbouEl-Makarim Aboueissa

2023-Apr-20 18:46 UTC

[R] detect and replace outliers by the average

Hi Rui:


here is the dataset

factor x1 x2
0 700 700
0 700 500
0 470 470
0 710 560
0 5555 520
0 610 720
0 710 670
0 610 9999
1 690 620
1 580 540
1 690 690
1 NA 401
1 450 580
1 700 700
1 400 8888
1 6666 600
1 500 400
1 680 650
2 117 63
2 120 68
2 130 73
2 120 69
2 125 54
2 999 70
2 165 62
2 130 987
2 123 70
2 78
2 98
2 5
2 321 NA

with many thanks
abou
______________________


*AbouEl-Makarim Aboueissa, PhD*

*Professor, Mathematics and Statistics*
*Graduate Coordinator*

*Department of Mathematics and Statistics*
*University of Southern Maine*



On Thu, Apr 20, 2023 at 2:44 PM Rui Barradas <ruipbarradas at sapo.pt> wrote:
> At 19:36 on 20/04/2023, AbouEl-Makarim Aboueissa wrote:
> > Dear All:
> >
> >
> >
> > *Re:* detect and replace outliers by the average
> >
> >
> >
> > The dataset, please see attached, contains a group factoring column
> > "factor" and two columns of data "x1" and "x2" with some NA values. I need
> > some help to detect the outliers and replace them and the NAs with the
> > average within each level (0,1,2) for each variable "x1" and "x2".
> >
> >
> >
> > I tried the below code, but it did not accomplish what I want to do.
> >
> >
> >
> >
> >
> > data <- read.csv("G:/20-Spring_2023/Outliers/data.csv", header=TRUE)
> >
> > data
> >
> > replace_outlier_with_mean <- function(x) {
> >
> >    replace(x, x %in% boxplot.stats(x)$out, mean(x, na.rm=TRUE))  #### , na.rm=TRUE NOT working
> >
> > }
> >
> > data[] <- lapply(data, replace_outlier_with_mean)
> >
> >
> >
> >
> >
> > Thank you all very much for your help in advance.
> >
> >
> >
> >
> >
> > with many thanks
> >
> > abou
> >
> >
> > ______________________________________________
> > R-help at r-project.org mailing list -- To UNSUBSCRIBE and more, see
> > https://stat.ethz.ch/mailman/listinfo/r-help
> > PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
> > and provide commented, minimal, self-contained, reproducible code.
> Hello,
>
> There is no data set attached, see the posting guide on what file
> extensions are allowed as attachments.
>
> As for the question, try to compute mean(x, na.rm = TRUE)  first, then
> use this value in the replace instruction. Without data I'm just guessing.
>
> Hope this helps,
>
> Rui Barradas
>
>
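A minimal sketch of what the suggestion above amounts to, and of why the NAs survive in the original function: `NA %in% out` is FALSE, so `replace()` never touches the missing values unless they are flagged explicitly. The vector `x` is an illustrative assumption (level 0 of x1 with one NA appended), not the poster's file, and the names `out`, `m`, `kept_na` and `filled` are hypothetical.

```r
# Illustrative vector: level 0 of x1 from the thread, plus one NA
x <- c(700, 700, 470, 710, 5555, 610, 710, 610, NA)

# Compute the outliers and the mean first, as suggested
out <- boxplot.stats(x)$out      # values flagged by the boxplot rule
m   <- mean(x, na.rm = TRUE)     # mean of the non-missing values

# Original approach: only outliers are replaced, the NA is left in place
kept_na <- replace(x, x %in% out, m)

# Flagging the NAs explicitly replaces them as well
filled <- replace(x, is.na(x) | x %in% out, m)

print(kept_na)
print(filled)
```

Note that this still works on a whole column at once; it does not yet compute the average within each level of the factor.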

Rui Barradas

2023-Apr-20 18:58 UTC


[R] detect and replace outliers by the average

Hello,

Here is a way. The function uses ave() to apply the replacement separately within each level of the factor.


df1 <- "factor x1 x2
0 700 700
0 700 500
0 470 470
0 710 560
0 5555 520
0 610 720
0 710 670
0 610 9999
1 690 620
1 580 540
1 690 690
1 NA 401
1 450 580
1 700 700
1 400 8888
1 6666 600
1 500 400
1 680 650
2 117 63
2 120 68
2 130 73
2 120 69
2 125 54
2 999 70
2 165 62
2 130 987
2 123 70
2 78 NA
2 98 NA
2 5 NA
2 321 NA"
df1 <- read.table(text = df1, header = TRUE,
                   colClasses = c("factor", "numeric", "numeric"))


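replace_outlier_with_mean <- function(x, f) {
   # ave() applies FUN to x separately within each level of f
   ave(x, f, FUN = \(y) {
     # flag NAs and the values boxplot.stats() reports as outliers
     i <- is.na(y) | y %in% boxplot.stats(y, do.conf = FALSE)$out
     # the group mean is taken over all non-NA values, outliers included
     y[i] <- mean(y, na.rm = TRUE)
     y
   })
}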

lapply(df1[-1], replace_outlier_with_mean, f = df1$factor)
#> $x1
#>  [1]  700.0000  700.0000  470.0000  710.0000 1258.1250  610.0000  710.0000
#>  [8]  610.0000  690.0000  580.0000  690.0000 1261.7778  450.0000  700.0000
#> [15]  400.0000 1261.7778  500.0000  680.0000  117.0000  120.0000  130.0000
#> [22]  120.0000  125.0000  194.6923  194.6923  130.0000  123.0000  194.6923
#> [29]   98.0000  194.6923  194.6923
#>
#> $x2
#>  [1]  700.0000  500.0000  470.0000  560.0000  520.0000  720.0000  670.0000
#>  [8] 1767.3750  620.0000  540.0000  690.0000  401.0000  580.0000  700.0000
#> [15] 1406.9000  600.0000  400.0000  650.0000   63.0000   68.0000   73.0000
#> [22]   69.0000   54.0000   70.0000   62.0000  168.4444   70.0000  168.4444
#> [29]  168.4444  168.4444  168.4444
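The lapply() call above only prints a list. To store the cleaned columns back in the data frame, in the spirit of the `data[] <- lapply(...)` line from the original question, something like the following should work. This is a sketch on a subset (levels 0 and 1 only) built with data.frame() for brevity, and it uses `function(y)` instead of `\(y)` so it also runs on R versions before 4.1.

```r
# Illustrative subset: levels 0 and 1 of the thread's data
df1 <- data.frame(
  factor = factor(c(0, 0, 0, 0, 0, 0, 0, 0, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1)),
  x1 = c(700, 700, 470, 710, 5555, 610, 710, 610,
         690, 580, 690, NA, 450, 700, 400, 6666, 500, 680),
  x2 = c(700, 500, 470, 560, 520, 720, 670, 9999,
         620, 540, 690, 401, 580, 700, 8888, 600, 400, 650)
)

# Same logic as the function in the thread
replace_outlier_with_mean <- function(x, f) {
  ave(x, f, FUN = function(y) {
    i <- is.na(y) | y %in% boxplot.stats(y, do.conf = FALSE)$out
    y[i] <- mean(y, na.rm = TRUE)
    y
  })
}

# Overwrite only the data columns; the factor column is left untouched
df1[-1] <- lapply(df1[-1], replace_outlier_with_mean, f = df1$factor)
print(df1)
```

Indexing with `df1[-1]` on both sides keeps the grouping column out of the replacement, which the original `data[] <- lapply(data, ...)` did not do.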


Hope this helps,

Rui Barradas
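One point worth noting about the output above: the replacement values (for example 1258.125 for level 0 of x1) are group means that still include the outlier itself, because mean(y, na.rm = TRUE) is computed before anything is replaced. If the intent is instead the average of the non-outlying values, a variant could look like the sketch below. That intent is an assumption, not something stated in the thread, and the name `replace_outlier_with_clean_mean` is hypothetical.

```r
# Variant: the replacement value is the mean of the values that are
# neither NA nor flagged as outliers by boxplot.stats()
replace_outlier_with_clean_mean <- function(x, f) {
  ave(x, f, FUN = function(y) {
    i <- is.na(y) | y %in% boxplot.stats(y, do.conf = FALSE)$out
    y[i] <- mean(y[!i])   # mean over the retained values only
    y
  })
}

# Level 0 of x1 from the thread
x1 <- c(700, 700, 470, 710, 5555, 610, 710, 610)
f  <- factor(rep(0, length(x1)))

res <- replace_outlier_with_clean_mean(x1, f)
print(res)
```

If every value in a group were flagged, mean(y[!i]) would be NaN, so a real data set may need a guard for that case.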
