thr3ads.net - similar to: "implementing Grubbs outlier test on a large dataframe"

Displaying 20 results from an estimated 1100 matches similar to: "implementing Grubbs outlier test on a large dataframe"

cochran-grubbs tests results

2010 Sep 15

Hello, I'm new to the R world and I don't know much about statistics, but now I have to analyze some data and I already have some first queries: I have 5 sets of area measurements and each set has 5 repetitions. My first step is to check the data for outliers. I've used the outliers package. I have to use the Cochran test, and the Grubbs test in case I find any outlier. The problem

Outlier statistics question

2010 Nov 30

I have a statistical question. The data sets I am working with are right-skewed so I have been plotting the log transformations of my data. I am using a Grubbs Test to detect outliers in the data, but I get different outcomes depending on whether I run the test on the original data or the log(data). Here is one of the problematic sets: fgf2p50=c(1.563,2.161,2.529,2.726,2.442,5.047)
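
The discrepancy described in this post can be reproduced with a few lines of base R. This is a minimal sketch, not the poster's code: `grubbs_stat` and `grubbs_crit` are hypothetical helper names, the critical value uses the standard two-sided Grubbs formula based on the t distribution, and the 0.05 significance level is an assumption.

```r
# Data quoted in the post
fgf2p50 <- c(1.563, 2.161, 2.529, 2.726, 2.442, 5.047)

# Grubbs statistic: largest absolute deviation from the mean, in units of sd
grubbs_stat <- function(x) max(abs(x - mean(x))) / sd(x)

# Two-sided critical value at level alpha (0.05 is an assumed choice)
grubbs_crit <- function(n, alpha = 0.05) {
  t2 <- qt(1 - alpha / (2 * n), df = n - 2)^2
  (n - 1) / sqrt(n) * sqrt(t2 / (n - 2 + t2))
}

g_raw  <- grubbs_stat(fgf2p50)
g_log  <- grubbs_stat(log(fgf2p50))
g_crit <- grubbs_crit(length(fgf2p50))

# Compare both statistics against the same critical value
c(raw = g_raw, log = g_log, critical = g_crit)
c(raw_flagged = g_raw > g_crit, log_flagged = g_log > g_crit)
```

The log transformation pulls in the right tail, so the largest value sits fewer standard deviations from the mean on the log scale than on the raw scale. That alone is enough for the two runs to disagree.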

grubbs.test

2005 Apr 14

Dear All, I have small samples of data (between 6 and 15) for numerous time series points. I am assuming the data for each time point is normally distributed. The problem is that the data arrives sporadically and I would like to detect the number of outliers after I have six data points for any time period. Essentially, I would like to detect the number of outliers when I have 6 data points then

outlier tests

2004 Jun 30

I have been learning about some outlier tests -- Dixon and Grubbs, specifically -- for small data sets. When I try help.start() and search for outlier tests, the only response I manage to find is the Bonferroni test available from the car package... are there any other packages that offer outlier tests? Are the Dixon and Grubbs tests "good" for small samples or are others more

(robust) mixed-effects model with covariate

2006 Jul 20

Dear all, I am unsure about how to specify a model in R and I thought of asking the list for some advice. I have two groups ("Group" = A, B) of subjects, with each subject undertaking a test before and after a certain treatment ("Time" = pre, post). Additionally, I want to enter the age of the subject as a covariate (the performance on the test is affected by age),

Using grubbs test for residuals to find outliers

2013 May 17

Hi, I am a new user of R. This is a conceptual doubt regarding screening out outliers from the dataset in regression. I read that Cook's distance can be used and, if we want to remove influential observations, we can use the cutoff (>4/n) (n = number of observations) to remove any outliers. I also came across Grubbs' test to identify outliers in univariate distributions (assumed normal) but I
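
The 4/n rule of thumb mentioned in this post can be sketched in base R as follows. The built-in `cars` data set and the model `dist ~ speed` are stand-ins chosen only for illustration; they are assumptions, not the poster's data or model.

```r
# Built-in example data, used only for illustration (not the poster's data)
fit <- lm(dist ~ speed, data = cars)

n     <- nrow(cars)
cooks <- cooks.distance(fit)

# Rule of thumb from the post: flag observations with Cook's distance > 4/n
influential <- which(cooks > 4 / n)

# Inspect the flagged rows before deciding whether to drop anything
cars[influential, ]
```

Cook's distance measures how much each observation moves the fitted model, whereas Grubbs' test looks at a single variable, so the two can flag different points.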

Pierce's criterion

2012 Apr 18

Hello all, I would like to rigorously test whether observations in my dataset are outliers. I guess all the main tests in R (Grubbs) impose the assumption of normality. My data is surely not normal, so I would like to use something else. As far as I can tell from Wikipedia, Peirce's criterion is just that. The data I am interested in testing is: 1) Continuous on the unit interval 2)

good method of removing outliers?

2011 Dec 30

Happy holidays all! I know it's very subjective to determine whether some data point is an outlier or not... But are there reasonably good and realistic methods of identifying outliers in R? Thanks a lot!

How to identify and exclude the outliers with R?

2007 Apr 25

Hello, everyone, I want to ask a simple question. If I have a set of data, and I want to identify how many outliers there are in the data, which packages and functions can I use? Thanks. Shao chunxuan.

detection of outliers

2004 Sep 23

Hi, this is both a statistical and an R question... what would be the best way / test to detect an outlier value among a series of 10 to 30 values? For instance, if we have the following dataset: 10,11,12,15,20,22,25,30,500 I'd like to have a way to identify the last value as an outlier (only one direction). One way would be to calculate abs(mean - median) and if elevated (to what extent?) delete the
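
One robust alternative to comparing mean and median is a median/MAD score. This is a swapped-in technique, not the approach the poster describes, and the cutoff of 3 is an assumed conventional choice. A minimal sketch on the dataset quoted in the post:

```r
x <- c(10, 11, 12, 15, 20, 22, 25, 30, 500)

# Robust centre and spread; mad() rescales by 1.4826 by default
centre <- median(x)
spread <- mad(x)

# One-sided robust score: only unusually large values are of interest
score <- (x - centre) / spread

# Cutoff of 3 is an assumption; returns 500 for this dataset
x[score > 3]
```

Because the median and MAD are barely affected by the extreme value, 500 stands out clearly, whereas the mean and standard deviation would both be inflated by it.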

New var

2017 Jun 03

Thank you all for the useful suggestion. I did some of my homework. library(data.table) DFM <- read.table(header=TRUE, text='obs start end 1 2/1/2015 1/1/2017 2 4/11/2010 1/1/2011 3 1/4/2006 5/3/2007 4 10/1/2007 1/1/2008 5 6/1/2011 1/1/2012 6 10/5/2004 12/1/2004',stringsAsFactors = FALSE) DFM DFM$D =as.numeric(difftime(as.Date(DFM$end,format="%m/%d/%Y"),

New var

2017 Jun 04

Thank you Jeff and All, Within a given time period (say 700 days from the start day), I am expecting measurements taken at each time interval. In this case "0" means measurement taken, "1" not taken (stopped or opted out), and "-1" don't consider that time period for that individual. This will be compared with the actual measurements taken (Observed-

New var

2017 Jun 04

# read.table is NOT part of the data.table package #library(data.table) DFM <- read.table( text= 'obs start end 1 2/1/2015 1/1/2017 2 4/11/2010 1/1/2011 3 1/4/2006 5/3/2007 4 10/1/2007 1/1/2008 5 6/1/2011 1/1/2012 6 10/5/2004 12/1/2004 ',header = TRUE, stringsAsFactors = FALSE) # cleaner way to compute D DFM$start <- as.Date( DFM$start, format="%m/%d/%Y" ) DFM$end
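
The excerpt above is cut off after `DFM$end`, so what follows is a hedged reconstruction of the idea rather than the poster's actual code: convert both columns to `Date` once, then subtract. The multi-line layout of the `text` argument is needed for `read.table` to parse one record per row.

```r
# Same six records as in the post, one per line so read.table can parse them
DFM <- read.table(text = "obs start end
1 2/1/2015 1/1/2017
2 4/11/2010 1/1/2011
3 1/4/2006 5/3/2007
4 10/1/2007 1/1/2008
5 6/1/2011 1/1/2012
6 10/5/2004 12/1/2004",
  header = TRUE, stringsAsFactors = FALSE)

# Convert the character columns to Date once
DFM$start <- as.Date(DFM$start, format = "%m/%d/%Y")
DFM$end   <- as.Date(DFM$end,   format = "%m/%d/%Y")

# Subtracting Dates gives a difftime in days; as.numeric drops the class
DFM$D <- as.numeric(DFM$end - DFM$start)
DFM$D
```

Converting the columns in place avoids repeating the `format` string every time the dates are used later.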

New var

2017 Jun 04

Since the number of choices is small (6), how about this? Starting with Jeff's initial DFM: DFM <- structure(list(obs = 1:6, start = structure(c(16467, 14710, 13152, 13787, 15126, 12696), class = "Date"), end = structure(c(17167, 14975, 13636, 13879, 15340, 12753), class = "Date"), D = c(700, 265, 484, 92, 214, 57), bin = structure(c(6L, 3L, 5L, 1L, 3L, 1L), .Label

bug in rbind?

2017 Jan 17

I suspect there may be a bug in base::rbind.data.frame. Below is a minimal example of the problem: m <- matrix (1:12, 3) dfm <- data.frame (c = 1 : 3, m = I (m)) str (dfm) m.names <- m rownames (m.names) <- letters [1:3] dfm.names <- data.frame (c = 1 : 3, m = I (m.names)) str (dfm.names) rbind (m, m.names) rbind (m.names, m) rbind (dfm, dfm.names) #not working rbind

Am having trouble calling a function

2011 Aug 27

In my main R program, I have source("retaanalysis/Functions/doAirport.R") .... stuff to read data and calculate ads sapply(ads, function(x) {doAirport(x, base)} ) And doAirport has # analyze the flights for a given airport doAirport = function(df, base) { # Get rid of unused runway factor levels (from other airports) df$lrw <- drop.levels(df$lrw) # In gdata package #

About doing figures

2017 Jul 16

Hi R users, I still have a problem with plotting. I wanted to put the datasets on one figure: the x-axis represents values B, the y-axis represents values C, and different colors label column A. Each record is drawn as a circle, with hollow circles representing DF=1 and solid circles representing DF=2. I put my code below, but the A labels do not correspond to the true record, so I don't know

About doing figures

2017 Jul 16

Hi Jim, By true color, I meant that the points in the figure do not correspond to the values from the dataframe. Also, why use rainbow(9) here? And the legend is right in the middle; is it possible to move it to the very bottom? Thanks again. On Sun, Jul 16, 2017 at 2:50 AM, Jim Lemon <drjimlemon at gmail.com> wrote: > Hi lily, > As I have no idea of what the "true

About doing figures

2017 Jul 16

Hi lily, As I have no idea of what the "true record" is, I can only guess. Maybe this will help: # get some fairly distinct colors rainbow_colors<-rainbow(9) # this should sort the numbers in dfm$A dfm$Acolor<-factor(dfm$A) plot(dfm$B,dfm$C,pch=ifelse(dfm$DF==1,1,19), col=rainbow_colors[as.numeric(dfm$Acolor)]) legend("bottom",legend=sort(unique(dfm$A)),

About doing figures

2017 Jul 16

For more than 10 records, how to reformat the colors? Also, how to show the first legend only, but at the bottom, while the second legend in your code is not necessary? In all, the same A values have the same color, but different symbols in DF==1 and DF==2. Thanks for your help. On Sun, Jul 16, 2017 at 9:28 AM, lily li <chocold12 at gmail.com> wrote: > Hi Jim, > > For true color,
