Hi All,
I want to take a matrix (or data frame) and winsorize each variable
so that I can, for example, correlate the winsorized variables.
The code below will winsorize a single vector, but it returns the
values sorted in ascending order. When it is applied to several
vectors, each one is sorted independently, so a given observation is
no longer on the same row in each vector.
So I need to winsorize each variable and then return it to its
original order, or find another solution that takes a data frame,
winsorizes each variable, and returns a new data frame with all the
variables in their original order.
Thanks for any help!
-Karl
#The function I'm working from
win<-function(x,tr=.2,na.rm=F){
if(na.rm)x<-x[!is.na(x)]
y<-sort(x)
n<-length(x)
ibot<-floor(tr*n)+1
itop<-length(x)-ibot+1
xbot<-y[ibot]
xtop<-y[itop]
y<-ifelse(y<=xbot,xbot,y)
y<-ifelse(y>=xtop,xtop,y)
win<-y
win
}
#Produces an example data frame; ss is the observation id, var1-var4
#are the variables I want to winsorize.
ss=c(1:5);var1=rnorm(5);var2=rnorm(5);var3=rnorm(5);var4=rnorm(5)
as.data.frame(cbind(ss,var1,var2,var3,var4))->data
data
#Winsorizes each variable, but sorts them independently so the
#observations no longer line up.
sapply(data,win)
___________________________
M. Karl Healey
Ph.D. Student
Department of Psychology
University of Toronto
Sidney Smith Hall
100 St. George Street
Toronto, ON
M5S 3G3
karl at psych.utoronto.ca
Might work better to determine top and bottom for each column with
quantile() using an appropriate quantile option, and then process
each variable "in place" with your ifelse logic.
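A minimal sketch of that suggestion, assuming the cutoffs are the tr and 1-tr sample quantiles as computed by quantile() with its default type. These are not identical to the order-statistic cutoffs in the original win(), so results can differ slightly. The name win_q is made up for illustration.

```r
# Hypothetical helper: clamp x at its tr and 1-tr quantiles without sorting.
win_q <- function(x, tr = 0.2, na.rm = FALSE) {
  # quantile() stops with an error if x contains NAs and na.rm is FALSE
  cuts <- quantile(x, probs = c(tr, 1 - tr), na.rm = na.rm, names = FALSE)
  x <- ifelse(x < cuts[1], cuts[1], x)  # raise the low tail to the lower cutoff
  x <- ifelse(x > cuts[2], cuts[2], x)  # lower the high tail to the upper cutoff
  x                                     # same length and order as the input
}

set.seed(1)        # arbitrary seed, only so the example is reproducible
x  <- rnorm(10)
xw <- win_q(x)
cbind(x, xw)       # each row still refers to the same observation
```

Because only the values outside the cutoffs are replaced, every element stays in its original position.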
I did find a somewhat different definition of winsorization, with no
sorting, in this code copied from a Patrick Burns posting from earlier
this year on R-SIG-Finance:
function(x, winsorize=5) {
s <- mad(x) * winsorize
top <- median(x) + s
bot <- median(x) - s
x[x > top] <- top
x[x < bot] <- bot
x
}
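As a rough illustration of how the MAD-based version could be applied column by column, here is a sketch. The function name win_mad, the example data frame, and the choice winsorize = 3 are assumptions made for this example and are not from the posting.

```r
# The MAD-based function, given a name here (win_mad is hypothetical).
win_mad <- function(x, winsorize = 5) {
  s   <- mad(x) * winsorize
  top <- median(x) + s
  bot <- median(x) - s
  x[x > top] <- top   # pull extreme high values down to the upper bound
  x[x < bot] <- bot   # pull extreme low values up to the lower bound
  x
}

set.seed(7)  # arbitrary seed for reproducibility
# Made-up data: one large outlier in a, one small outlier in b
df <- data.frame(id = 1:6, a = c(rnorm(5), 50), b = c(-40, rnorm(5)))

vars <- c("a", "b")
df_w <- df
df_w[vars] <- lapply(df[vars], win_mad, winsorize = 3)
df_w         # id column untouched, rows in their original order
```

Since the function only overwrites values beyond the bounds, no reordering happens and the rows stay aligned across variables.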
--
David Winsemius
On Jan 16, 2009, at 3:50 PM, Karl Healey wrote:
> [snip]
Don't sort: use y<-x in place of y<-sort(x), then calculate xbot and
xtop using
xtemp<-quantile(y,c(tr,1-tr),na.rm=na.rm)
xbot<-xtemp[1]
xtop<-xtemp[2]
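Put together, the change might look like the sketch below. It assumes quantile()'s default quantile type is acceptable for the cutoffs (they will not always match the order statistics used in the original function), and the seed and the names vars and wdata are made up for the example.

```r
# The original win() with the sort removed and quantile() cutoffs swapped in.
win <- function(x, tr = 0.2, na.rm = FALSE) {
  if (na.rm) x <- x[!is.na(x)]   # as in the original, this shortens x
  xtemp <- quantile(x, c(tr, 1 - tr), na.rm = na.rm, names = FALSE)
  xbot <- xtemp[1]
  xtop <- xtemp[2]
  y <- ifelse(x <= xbot, xbot, x)  # x is never sorted, so order is kept
  y <- ifelse(y >= xtop, xtop, y)
  y
}

set.seed(42)  # arbitrary seed for reproducibility
data <- data.frame(ss = 1:5, var1 = rnorm(5), var2 = rnorm(5),
                   var3 = rnorm(5), var4 = rnorm(5))

# Winsorize every column except the observation id
vars  <- setdiff(names(data), "ss")
wdata <- data
wdata[vars] <- lapply(data[vars], win)

wdata             # rows still line up with ss
cor(wdata[vars])  # correlations of the winsorized variables
```

Note that with na.rm=TRUE the function drops missing values, so columns containing NAs would come back shorter than the data frame; that case needs separate handling.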
-----Original Message-----
From: r-help-bounces at r-project.org [mailto:r-help-bounces at r-project.org]
On Behalf Of Karl Healey
Sent: Friday, January 16, 2009 2:51 PM
To: r-help at r-project.org
Subject: [R] Winsorizing Multiple Variables
[snip]
______________________________________________
R-help at r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.