Hock Ann Lim
2010-Oct-03 06:12 UTC
[R] How to programme R to randomly replace some X values with Outliers
Dear experts,
I am a beginner in R.
I'm looking for guidance on how to program R to
randomly replace 5 observations of the explanatory variable X with outliers drawn
from U(15,20), in a sample of size n=100. The replacement is subject to y < 15.
The ultimate goal of my study is to compare the std of y with and without
outliers, based on the average of 1000 simulations.
Info :
X~U(0,10)
Y=6+2X+norm(0,1)
Thank you.
Hock Ann
Joshua Wiley
2010-Oct-03 07:28 UTC
[R] How to programme R to randomly replace some X values with Outliers
Dear Hock Ann,
I am not sure of all your requirements, but this should at least get
you started. I show it by hand and also wrapped up in a function. In
the function I made two density plots that I thought might be
interesting to you; you can just delete those two plot() lines (and the
dev.new() and par() calls above them) if you do not want them. The code
follows below.
Cheers,
Josh
## X is pseudo-random numbers from the uniform distribution
## ~U(0, 10)
X <- runif(n = 100, min = 0, max = 10)
## We can check that X is about what we would expect
## The mean should be
(1/2) * (10 + 0)
mean(x = X)
## The variance should be
(1/12) * ((10 - 0)^2)
var(x = X)
## I am assuming norm(0, 1) represents the standard normal distribution
## Create a vector of these numbers to be used in the formula for Y
## This is important since you will be creating Y twice
## (with X and then with X replaced by some outliers)
## You do not want R to regenerate the random normal values and change
## things that way
Z <- rnorm(n = 100, mean = 0, sd = 1)
## Create Y from your formula, where Z = norm(0, 1)
## Y = 6 + 2X + norm(0, 1)
Y <- 6 + 2 * X + Z
## Now I am using sample() to randomly select some values
## between 1 and the length of X, these will be the positions
## of the elements of X to be replaced
toreplace <- sample(x = seq_along(X), size = 5, replace = FALSE)
## Now replace the X values
X[toreplace] <- runif(n = 5, min = 15, max = 20)
## Create Ynew based off updated X
Ynew <- 6 + 2 * X + Z
## Calculate the standard deviations of Y and Ynew
## and store in a named vector called "results"
results <- c("SD_Y" = sd(Y), "SD_Ynew" = sd(Ynew))
## print the results vector to screen to look at it
results
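As a sanity check on SD_Y, the same kind of theoretical comparison made for X above can be made for Y. The sketch below assumes X and the noise term are independent, so that Var(Y) = 2^2 * Var(X) + Var(Z). The names `var_X`, `sd_Y_theory`, `X_big` and `Y_big` and the seed are illustrative and not part of the original code.

```r
## Theoretical SD of Y = 6 + 2X + Z, with X ~ U(0, 10) and Z ~ N(0, 1)
## Assumes X and Z are independent: Var(Y) = 2^2 * Var(X) + Var(Z)
var_X <- (1/12) * ((10 - 0)^2)
sd_Y_theory <- sqrt(4 * var_X + 1)
sd_Y_theory

## Compare against a large simulated sample (seed chosen arbitrarily)
set.seed(1)
X_big <- runif(n = 1e5, min = 0, max = 10)
Y_big <- 6 + 2 * X_big + rnorm(n = 1e5, mean = 0, sd = 1)
sd(Y_big)
```

With only n = 100 observations, sd(Y) will scatter around the theoretical value more widely than this large sample does, which is one reason to average over many simulations.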
## Now if you wanted to do this many times
## and potentially change a few values easily
## we can put it in a function
## n is the number in each sample
## a and b are the min and max of the uniform distribution for X
## a.outlier and b.outlier are the same but for the outliers
## nreplace is how many values of X you want to replace
## reps is how many times you want to run it
## I have written the values to default to what you said in your email
## but obviously it would be easy to change any one of them
mysampler <- function(n = 100, a = 0, b = 10,
                      a.outlier = 15, b.outlier = 20,
                      nreplace = 5, reps = 1000) {
  if(any(c(n, nreplace, reps) < 1)) {
    stop("n, nreplace, and reps must all be at least 1")
  }
  results <- matrix(0, nrow = reps, ncol = 2,
                    dimnames = list(NULL, c("SD_Y", "SD_Ynew")))
  for(i in 1:reps) {
    X <- runif(n = n, min = a, max = b)
    Z <- rnorm(n = n, mean = 0, sd = 1)
    Y <- 6 + 2 * X + Z
    toreplace <- sample(x = seq_along(X), size = nreplace, replace = FALSE)
    X[toreplace] <- runif(n = nreplace, min = a.outlier, max = b.outlier)
    Ynew <- 6 + 2 * X + Z
    results[i, ] <- c(sd(Y), sd(Ynew))
  }
  dev.new()
  par(mfrow = c(2, 1))
  plot(density(results[,"SD_Y"]), xlim = range(results))
  plot(density(results[,"SD_Ynew"]), xlim = range(results))
  return(results)
}
## You might find the following documentation helpful
?runif # generate random values from uniform
?rnorm # from normal
?"for" # to do your simulation (reserved words must be quoted after ?)
--
Joshua Wiley
Ph.D. Student, Health Psychology
University of California, Los Angeles
http://www.joshuawiley.com/
Michael Bedward
2010-Oct-03 07:33 UTC
[R] How to programme R to randomly replace some X values with Outliers
N <- 100
Nrep <- 5
X <- runif(N, 0, 10)
Y <- 6 + 2*X + rnorm(N, 0, 1)
X[ sample(which(Y < 15), Nrep) ] <- runif(Nrep, 15, 20)

Hope this helps,
Michael
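The two replies can be combined into one simulation that respects the y < 15 condition and averages over 1000 runs. What follows is only a sketch under stated assumptions: as in Joshua's code, Y is recomputed from the modified X using the same noise, and the helper name `one_run`, its arguments, and the seed are hypothetical choices rather than anything given in the thread.

```r
## Hypothetical helper: simulate one data set and return sd(Y) before and
## after replacing nrep X values, chosen only among rows with Y < cutoff
one_run <- function(n = 100, nrep = 5, cutoff = 15) {
  X <- runif(n, 0, 10)
  Z <- rnorm(n, 0, 1)
  Y <- 6 + 2 * X + Z
  candidates <- which(Y < cutoff)
  if (length(candidates) < nrep) {
    stop("fewer than nrep observations satisfy Y < cutoff")
  }
  ## Index candidates explicitly: sample(v, k) treats a length-one numeric v
  ## as 1:v, which would pick the wrong positions
  idx <- candidates[sample.int(length(candidates), nrep)]
  X[idx] <- runif(nrep, 15, 20)
  Ynew <- 6 + 2 * X + Z   # same noise, so only the X values change
  c(SD_Y = sd(Y), SD_Ynew = sd(Ynew))
}

set.seed(123)                        # arbitrary seed, for reproducibility only
sims <- replicate(1000, one_run())   # 2 x 1000 matrix, one column per run
rowMeans(sims)                       # average SD without and with outliers
```

Because `replicate()` returns one column per run here, `rowMeans()` gives the two averages; with the `reps` x 2 matrix returned by `mysampler()`, `colMeans()` would play the same role.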