Hi Ma Teresa,
Sorry, but I can't understand what you're trying to achieve.
On a statistical note, I'd tend to think more in terms of medians and would
think hard before replacing any outliers, but that's another matter.
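To show what I mean about medians, here is a rough sketch only. The two rows are typed in from your printout (X. = 3872 and X. = 5823), the names `ie`, `row_mean`, `row_med` and `flag_med` are mine, and the factor 4 is simply borrowed from your own rule rather than being a recommendation.

```r
# Two rows copied from the posted data (X. = 3872 and X. = 5823),
# columns IE.2005 to IE.2010
ie <- rbind(c(288.6, 113, 116, 90, 94, 12036.6),
            c(117, 70, 80, 74, 69, 72))

# Row means and row medians of the yearly values, ignoring any NA
row_mean <- apply(ie, 1, mean, na.rm = TRUE)
row_med  <- apply(ie, 1, median, na.rm = TRUE)

# The single extreme value pulls the mean of the first row far away
# from the bulk of the data, while the median (114.5) barely moves
row_mean
row_med

# Flag cells more than 4 times their row median; row_med has one element
# per row, so it is recycled down the columns of ie
flag_med <- ie / row_med > 4
flag_med
```

With these two rows only the IE.2010 value for X. = 3872 is flagged. Whether a ratio of 4 is a sensible threshold for your data is a separate question.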
Here I created the matrix dd with the first column of D (the identifier X.) in
its first column, and then filled the other columns with a 1 whenever the value
of D for that cell was greater than 4 times the mean for that row, which is
your definition of 'outlier'.
> dd <- rep(0,15*7)
> dim(dd) <- c(15,7)
> dd[,1]<- D[,1]
> for (i in 1:15){
+ for (j in 2:7){
+ dd[i,j] <- D[i,(j+1)]/D[i,2]>4
+ }
+ }
> dd
       [,1] [,2] [,3] [,4] [,5] [,6] [,7]
 [1,]  1108    0    0    0    0    0    0
 [2,]  1479   NA   NA    0    1    0    0
 [3,]  1591    0    0    0    0    0    0
 [4,]  3408    0    0    0    0    0    0
 [5,]  3423   NA   NA   NA    1    0    0
 [6,]  3872    0    0    0    0    0    1
 [7,]  5823    0    0    0    0    0    0
 [8,]  6051   NA   NA   NA   NA    0    0
 [9,]  8099    0    0    0    0    0    0
[10,]  8100   NA   NA   NA   NA    0    0
[11,] 10640    1    1    1    0    0    0
[12,] 12600    0    0    0    0    0    0
[13,] 14680    0    0    1    0    0    1
[14,] 14698    0    0    0    0    0    0
[15,] 17143    0    0    0    0    0    0
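The same flags can probably be had without the loops, because a vector with one element per row is recycled down the columns of a matrix, so each cell ends up divided by the value for its own row. A minimal sketch, using three rows typed in from your printout (the object names `media`, `ie` and `flags` are mine, not anything in your data):

```r
# Rows X. = 1479, 3872 and 10640 from the posted data
media <- c(110, 103.25, 67.33333)
ie <- rbind(c(NA, NA, 53, 1166, 344.8, 110),
            c(288.6, 113, 116, 90, 94, 12036.6),
            c(1256.6, 1152, 664.2, 74, 77, 51))
colnames(ie) <- paste0("IE.", 2005:2010)

# Divide every cell by the mean of its own row and compare with 4;
# multiplying by 1 turns TRUE/FALSE into 1/0, and NA cells stay NA
flags <- (ie / media > 4) * 1
flags
```

On your D this would presumably be something like `(as.matrix(D[, 3:8]) / D[, 2] > 4) * 1`, assuming column 2 is media and columns 3 to 8 are the IE years, as in the loop above.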
So, you encounter four situations:
a) as in row 2, you have an outlier preceded and followed by values
b) as in row 5, you have an outlier preceded by an NA
c) as in row 6, there is an outlier in the last column
d) as in row 11, there are two or more consecutive outliers
The replacement rule you described would only apply to situations a) (i.e.
replacing the outlier by the mean of the preceding and subsequent values) and
b) (replacing it by the mean for the row).
But what of situations c) and d)?
And, because this is just a chunk of a bigger dataset, you can also get an
outlier in the first column, followed by a number. Your rule does not
account for this situation either.
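Whatever you decide, the rule has to say something for every one of these situations before it can be coded. Purely as an illustration, here is a sketch under an assumption of my own: use the mean of the two neighbours when both exist and neither is itself an outlier, and fall back to the row mean in every other case. The function name `replace_outliers` and its arguments are hypothetical, and the fallback is my guess at what you might want, not something you stated.

```r
# Hypothetical helper: x is one row of yearly values, m its row mean (media),
# k the ratio above which a value counts as an outlier (4 in your rule).
# Assumed rule: mean of the two neighbours when both are present and not
# flagged themselves; otherwise the row mean m.
replace_outliers <- function(x, m, k = 4) {
  out <- !is.na(x) & x / m > k
  res <- x
  for (j in which(out)) {
    # Neighbours are usable only away from the edges, when neither is NA
    # and neither is an outlier itself
    ok <- j > 1 && j < length(x) &&
      !is.na(x[j - 1]) && !is.na(x[j + 1]) &&
      !out[j - 1] && !out[j + 1]
    res[j] <- if (ok) (x[j - 1] + x[j + 1]) / 2 else m
  }
  res
}

# Row X. = 1479, situation a): 1166 becomes the mean of 53 and 344.8
replace_outliers(c(NA, NA, 53, 1166, 344.8, 110), 110)

# Row X. = 10640, situation d): all three outliers fall back to the row mean
replace_outliers(c(1256.6, 1152, 664.2, 74, 77, 51), 67.33333)
```

Incidentally, the assignment in your loop is written `m[m[i,j]] <- ...`, which uses the value of the cell as an index into m rather than its position. `m[i,j] <- ...` is probably what you intended, and that may be why you see no change where you expect one.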
Hope this helps,
José
-----Original Message-----
From: r-help-bounces at r-project.org [mailto:r-help-bounces at r-project.org]
On Behalf Of Mª Teresa Martinez Soriano
Sent: 30 August 2013 09:13
To: r-help at r-project.org
Subject: [R] Outliers Help
This is a part of my data set
> D[1:15,c(1,5:10)]
      X.      media IE.2005 IE.2006 IE.2007 IE.2008 IE.2009 IE.2010
1   1108   22.00000    60.0      39     4.0     8.0    16.0     5.0
2   1479  110.00000      NA      NA    53.0  1166.0   344.8   110.0
3   1591   86.60000   247.0      87    95.0    94.0    81.0    76.0
4   3408  807.00000   302.0     322   621.0  1071.0  1301.0  1225.0
5   3423    9.00000      NA      NA      NA   410.8     7.0    11.0
6   3872  103.25000   288.6     113   116.0    90.0    94.0 12036.6
7   5823   73.00000   117.0      70    80.0    74.0    69.0    72.0
8   6051   73.00000      NA      NA      NA      NA    60.0    86.0
9   8099  125.16667   196.0     161   150.0    94.0    72.0    78.0
10  8100   70.00000      NA      NA      NA      NA    48.0    92.0
11 10640   67.33333  1256.6    1152   664.2    74.0    77.0    51.0
12 12600 2417.00000  1960.0    2383  2453.0  2506.0  2758.0  2442.0
13 14680   38.00000    30.0      61   373.6    42.0    19.0   220.8
14 14698  698.16667   553.0     664   847.0   800.0   679.0   646.0
15 17143  392.16667   323.0     322   434.0   383.0   459.0   432.0
I have done multiple imputation and now I have some outliers, which I would
like to replace with the mean of the row or, if possible, with the mean of the
previous and the next value in the row. For instance:
value 1 - outlier - value 2
I would like to replace the outlier with the mean of value 1 and value 2. The
problem is that these values could be NA (NA after the imputation because they
don't exist); in this case I would like to replace the outlier with the mean of
the row.
Another problem I have is detecting outlier values correctly. For instance, in
this example data set we can see an outlier for X=3872 and IE.2010. I have
thought of comparing the values with the mean (column media).
I have tried this code:
D<-datos[, c(1,16:24)]
m<-as.matrix(D)
for( i in 1: nrow(D))
{
  # I would change this in the new data set, because I will have more years
  # than 2010
  for( j in 5:(ncol(D)-1))
  {
    # Here I would like to find the values that are much bigger than the mean
    # of this row, and replace them by the mean of the previous and the next
    # values of the same row.
    if(!is.na(m[i,j])&& !is.na(m[i,j+1])&&!is.na(m[i,j-1])&&!is.na(m[i,2])&&((m[i,j]/m[i,2])>4)){
      m[m[i,j]]<-(m[i,j-1]+m[i,j+1])/2
      #if( !is.na(m[i,j])
    }
  }
}
D<-as.data.frame(m)
But I get the same data frame that I had before; nothing changes.
Any ideas are welcome.
Thanks a lot, Teresa