R: Grubbs Test to detect all outliers per group for all columns in a data frame

Dear All: good morning

I have a dataset (as an example) with two factor columns (factor1 and factor2) and 5 numerical columns (X, Y, Z, U, V). The X and Y columns have the same length as factor1; Z, U, and V have the same length as factor2. The dataset is copied below. Please note that all dataset columns have NA values.

Need help on this:

Can we use the grubbs.test() function to detect all outliers and replace them with NA in the X and Y columns per group in factor1, and in the Z, U, and V columns per group in factor2? The columns in the data frame have different lengths, but when I read the .csv file, R added NA values for the shorter columns.

If you need the .csv data file, please let me know.

Thank you very much for your help in advance.

install.packages("outliers")
library(outliers)

datafortest<-read.csv("G:/data_for_test.csv", header=TRUE)
datafortest

datafortest<-data.frame(datafortest)

datafortest$factor1<-as.factor(datafortest$factor1)
datafortest$factor2<-as.factor(datafortest$factor2)

str(datafortest)

##### tried to use grubbs.test() on a single column of the dataframe, but still not working
tests.for.outliers.X<- grubbs.test(datafortest$X, na.rm = TRUE, type=11)

####################################

grubbs.test() on a single dataset: but this can only detect whether the min and the max are outliers.

xx999<-c(0.088,1,2,3,4,5,6,7,8,9,88,98,99)
grubbs.test(xx999, type=11)

With many thanks
Abou

factor1         X     Y  factor2     Z    U      V
      1  4455.077   888        1   999   NA    999
      1  4348.031   333        1   475   NA    240
      1  9999.789   618        1   507  252    394
      1  3813.139   417        1   603  332    265
      1   7512.65   344        1   442  216     NA
      1  5642.667    NA        1   486  217    275
      1  6684.386   341        1   927  698    479
      2  5165.731   999        1   971  311    562
      2        NA   265        1   388  999    512
      2  3259.241   557        2   888  444    777
      2  3288.383   234        2   514   NA    322
      2  1997.878   383        2   409  311     NA
      2  99990.61    NA        2   546  327    728
      2  2655.977    NA        2   523  228    653
      3   3189.49  7777        2   313  456    450
      3  1826.851   287        2   296  412    576
      3  4386.002   352        2   320  251     NA
      3  3295.091   308        2   388  888  396.5
      3  2120.902   526        3  9999  398    888
      3        NA   489        3   677  438    307
      3  2056.123   291        3   555  428    219
      3  1995.088   444        3    NA  319     NA
      3        NA   349        3   479   NA    321
      3  2539.873   333        3   257  406    417
                               3   313  334    409
                               3   296  465    546
                               3   320  180    523
                               3   388  999    313

______________________

AbouEl-Makarim Aboueissa, PhD
Professor, Mathematics and Statistics
Graduate Coordinator
Department of Mathematics and Statistics
University of Southern Maine

At 14:09 on 28/04/2023, AbouEl-Makarim Aboueissa wrote:
> [...]

Hello,

Please post the output of

dput(datafortest)

Your data is difficult to read into an R session.

Hope this helps,

Rui Barradas
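
For readers unfamiliar with the request above, here is a minimal sketch of what dput() is for. It rebuilds a few rows of the posted data by hand (the object names `small`, `txt`, and `rebuilt` are made up for this illustration) and shows that the text produced by dput() can be evaluated to recreate the same data frame, which is why it is the preferred way to share data on the list.

```r
# A few rows and columns of the posted data, typed in by hand
small <- data.frame(
  factor1 = c(1L, 1L, 2L),
  X = c(4455.077, 4348.031, NA),
  Y = c(888L, 333L, 265L)
)

# dput() prints R code that rebuilds the object; this is the text to paste
# into a message (captured here instead of printed to the console)
txt <- capture.output(dput(small))

# A reader recreates the data frame by evaluating that text
rebuilt <- eval(parse(text = txt))
all.equal(small, rebuilt)
```

In practice one would simply run dput(datafortest) and paste the printed result into the email body.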

At 14:09 on 28/04/2023, AbouEl-Makarim Aboueissa wrote:
> [...]

Hello,

With the data file you attached I cannot reproduce any errors; everything ran on the first try.

library(outliers)

fl <- "~/data_for_test.csv"
datafortest <- read.csv(fl)

# these are not needed to run the test
datafortest$factor1 <- as.factor(datafortest$factor1)
datafortest$factor2 <- as.factor(datafortest$factor2)

str(datafortest)
#> 'data.frame': 28 obs. of 7 variables:
#>  $ factor1: Factor w/ 3 levels "1","2","3": 1 1 1 1 1 1 1 2 2 2 ...
#>  $ X      : num 4455 4348 10000 3813 7513 ...
#>  $ Y      : int 888 333 618 417 344 NA 341 999 265 557 ...
#>  $ factor2: Factor w/ 3 levels "1","2","3": 1 1 1 1 1 1 1 1 1 2 ...
#>  $ Z      : int 999 475 507 603 442 486 927 971 388 888 ...
#>  $ U      : int NA NA 252 332 216 217 698 311 999 444 ...
#>  $ V      : num 999 240 394 265 NA 275 479 562 512 777 ...

head(datafortest)
#>   factor1        X   Y factor2   Z   U   V
#> 1       1 4455.077 888       1 999  NA 999
#> 2       1 4348.031 333       1 475  NA 240
#> 3       1 9999.789 618       1 507 252 394
#> 4       1 3813.139 417       1 603 332 265
#> 5       1 7512.650 344       1 442 216  NA
#> 6       1 5642.667  NA       1 486 217 275

##### tried to use grubbs.test() on a single column of the dataframe, but
##### still not working
grubbs.test(datafortest$X, type = 11)
#>
#>  Grubbs test for two opposite outliers
#>
#> data: datafortest$X
#> G = 4.6640014, U = 0.0091756, p-value = 0.02867
#> alternative hypothesis: 1826.851 and 99990.608 are outliers

Hope this helps,

Rui Barradas
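
The thread does not go on to answer the per-group part of the question, so the following is only a sketch of one possible approach, not a method given by anyone in the thread. It assumes that repeatedly applying the one-outlier test (type = 10) and blanking the flagged value is acceptable for the analysis, which is a judgment call because repeated testing inflates the error rate. The helper name `grubbs_to_na`, its `alpha` argument, and the column name `X_clean` are hypothetical. Note that grubbs.test() takes no na.rm argument (its arguments are x, type, opposite, and two.sided), which is the likely reason the call in the original post failed while the call without na.rm worked.

```r
library(outliers)  # provides grubbs.test()

# Hypothetical helper (not part of 'outliers'): apply the one-outlier
# Grubbs test (type = 10) repeatedly, replacing the flagged value with NA
# until the test is no longer significant at level `alpha`.
grubbs_to_na <- function(x, alpha = 0.05) {
  repeat {
    ok <- !is.na(x)
    # stop when fewer than 3 values remain or they are all identical
    if (sum(ok) < 3 || sd(x[ok]) == 0) break
    # no na.rm argument exists, so missing values are dropped explicitly
    p <- grubbs.test(x[ok], type = 10)$p.value
    if (!isTRUE(p < alpha)) break
    # type = 10 tests the observation farthest from the mean
    x[which.max(abs(x - mean(x[ok])))] <- NA
  }
  x
}

# X values for factor1 groups 1 and 2, copied from the posted data
d <- data.frame(
  factor1 = factor(rep(1:2, each = 7)),
  X = c(4455.077, 4348.031, 9999.789, 3813.139, 7512.65, 5642.667, 6684.386,
        5165.731, NA, 3259.241, 3288.383, 1997.878, 99990.61, 2655.977)
)

# ave() applies the helper separately within each level of factor1
d$X_clean <- ave(d$X, d$factor1, FUN = grubbs_to_na)
d
```

On the full data frame the same idea would extend to several columns, for example with lapply() over c("X", "Y") grouped by factor1 and over c("Z", "U", "V") grouped by factor2. With groups of only six or seven values the test has little power and can behave erratically, so the flagged values deserve a manual look before being discarded.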