Hock Ann Lim
2010-Oct-03 06:12 UTC
[R] How to programme R to randomly replace some X values with Outliers
Dear experts, I am a beginner of R. I'm looking for experts to guide me how to do programming in R in order to randomly replace 5 observations in X explanatory variable with outliers drawn from U(15,20) in sample size n=100. The replacement subject to y < 15. The ultimate goal of my study is to compare the std of y with and without the presence of outliers based on average of 1000 simulation. Info : X~U(0,10) Y=6+2X+norm(0,1) Thank you. Hock Ann [[alternative HTML version deleted]]
Joshua Wiley
2010-Oct-03 07:28 UTC
[R] How to programme R to randomly replace some X values with Outliers
Dear Hock Ann, I am not sure of all your requirements, but this should at least get you started. I show it by hand and also wrapped up in a function. In the function I made two density plots that I thought might be interesting to you, you can just delete those two lines of code if you do not want them. The code follows below. Cheers, Josh ## X is pseudo-random numbers from the uniform distribution ## ~U(0, 10) X <- runif(n = 100, min = 0, max = 10) ## We can check that X is about what we would expect ## The mean should be (1/2) * (10 + 0) mean(x = X) ## The variance should be (1/12) * ((10 - 0)^2) var(x = X) ## I am assuming norm(0, 1) to be representing the standard normal distribution ## Create a vectory of these numbers to be used in the formula for Y ## this is important since you will be create Y twice ## (with X and then X replaced with some outliers) ## You do not want R to regenerate the random normal values and change things that way Z <- rnorm(n = 100, mean = 0, sd = 1) ## Create Y from your formula, where Z = norm(0, 1) ## Y = 6 + 2X + norm(0, 1) Y <- 6 + 2 * X + Z ## Now I am using sample() to randomly select some values ## between 1 and the length of X, these will be the positions ## of the elements of X to be replaced toreplace <- sample(x = seq_along(X), size = 5, replace = FALSE) ## Now replace the X values X[toreplace] <- runif(n = 5, min = 15, max = 20) ## Create Ynew based off updated X Ynew <- 6 + 2 * X + Z ## Calculate the standard deviations of Y and Ynew ## and store in a named vector called "results" results <- c("SD_Y" = sd(Y), "SD_Ynew" = sd(Ynew)) ## print the results vector to screen to look at it results ## Now if you wanted to do this many times ## and potentially change a few values easily ## we can put it in a function ## n is the number in each sample ## a and b are the min and max of the uniform distribution for X ## a.outlier and b.outlier are the same but for the outliers ## nreplace is how many values of X you want to replace ## reps is how many times you want to run it ## I have written the values to default to what you said in your emamil ## but obviously it would be easy to change any one of them mysampler <- function(n = 100, a = 0, b = 10, a.outlier = 15, b.outlier = 20, nreplace = 5, reps = 1000) { if(any(c(n, nreplace, reps) < 1)) { stop("n, nreplace, and reps must all be at least 1") } results <- matrix(0, nrow = reps, ncol = 2, dimnames = list(NULL, c("SD_Y", "SD_Ynew"))) for(i in 1:reps) { X <- runif(n = n, min = a, max = b) Z <- rnorm(n = n, mean = 0, sd = 1) Y <- 6 + 2 * X + Z toreplace <- sample(x = seq_along(X), size = nreplace, replace = FALSE) X[toreplace] <- runif(n = nreplace, min = a.outlier, max = b.outlier) Ynew <- 6 + 2 * X + Z results[i, ] <- c(sd(Y), sd(Ynew)) } dev.new() par(mfrow = c(2, 1)) plot(density(results[,"SD_Y"]), xlim = range(results)) plot(density(results[,"SD_Ynew"]), xlim = range(results)) return(results) } ## You might find the following documentation helpful ?runif # generate random values from uniform ?rnorm # from normal ?for # to do your simulation On Sat, Oct 2, 2010 at 11:12 PM, Hock Ann Lim <lim_ha at yahoo.com> wrote:> Dear experts, > I am a beginner of R. > I'm looking for experts to guide me how to?do programming in R in order to > randomly replace?5?observations in X?explanatory variable with outliers?drawn > from U(15,20)?in sample size n=100. The replacement subject to?y < 15. > > The ultimate goal of my?study is?to compare?the std of y?with and without the > presence of outliers based on average of 1000 simulation. > > Info : > X~U(0,10) > Y=6+2X+norm(0,1) > > Thank you. > > Hock Ann > > > > ? ? ? ?[[alternative HTML version deleted]] > > > ______________________________________________ > R-help at r-project.org mailing list > https://stat.ethz.ch/mailman/listinfo/r-help > PLEASE do read the posting guide http://www.R-project.org/posting-guide.html > and provide commented, minimal, self-contained, reproducible code. > >-- Joshua Wiley Ph.D. Student, Health Psychology University of California, Los Angeles http://www.joshuawiley.com/
Michael Bedward
2010-Oct-03 07:33 UTC
[R] How to programme R to randomly replace some X values with Outliers
N <- 100 Nrep <- 5 X <- runif(N, 0, 10) Y <- 6 + 2*X + rnorm(N, 0, 1) X[ sample(which(Y < 15), Nrep) ] <- runif(Nrep, 15, 20) Hope this helps, Michael On 3 October 2010 16:12, Hock Ann Lim <lim_ha at yahoo.com> wrote:> Dear experts, > I am a beginner of R. > I'm looking for experts to guide me how to?do programming in R in order to > randomly replace?5?observations in X?explanatory variable with outliers?drawn > from U(15,20)?in sample size n=100. The replacement subject to?y < 15. > > The ultimate goal of my?study is?to compare?the std of y?with and without the > presence of outliers based on average of 1000 simulation. > > Info : > X~U(0,10) > Y=6+2X+norm(0,1) > > Thank you. > > Hock Ann > > > > ? ? ? ?[[alternative HTML version deleted]] > > > ______________________________________________ > R-help at r-project.org mailing list > https://stat.ethz.ch/mailman/listinfo/r-help > PLEASE do read the posting guide http://www.R-project.org/posting-guide.html > and provide commented, minimal, self-contained, reproducible code. > >