thr3ads.net - R help - [R] Vectorization [Jun 2005]


ManojW

2005-Jun-16 00:45 UTC

[R] Vectorization

Greetings,
    Can anyone suggest whether the following problem can be vectorized
effectively?

    I have two datasets: one is a portfolio of stock returns on a
historical basis, and the other consists of a bunch of factors (again on
a historical basis). I intend to compute rolling n-day sensitivities for
each stock to each factor, so the output will be a data frame with
<ticker><dt><sensitivities>.

    How would you go about vectorizing this effectively?

    I end up with pseudocode like this:

            # For each date
            For curr_dt in all dates
                # Get the universe of stocks as of that date
                univ <- universe for curr_dt
                # Calculate the sensitivity to each factor between
                # n_days_back_dt and curr_dt
                sensitivity <- sapply(univ{ticker}, CalcSensitivity, n_days_back_dt, curr_dt)
            Next date

    I would greatly appreciate it if the above logic could be improved (if
at all) by a more effective solution, since I run into such situations on a
regular basis.

    Thanks in advance

Cheers

Manoj
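
For readers of the archive, here is a minimal sketch of one way to remove the inner per-ticker sapply(). It assumes that "sensitivity" means the OLS coefficients from regressing each stock's returns on the factors over the trailing n days, and that the universe is fixed with no missing values. The question does not state either, and the matrices `returns` and `factors` and the function `rolling_sensitivity` are hypothetical names with simulated data. lm.fit() accepts a matrix response, so all stocks are fitted in one call per window and only the loop over dates remains.

```r
# Hypothetical inputs (simulated): rows are dates, columns are tickers/factors
set.seed(1)
n_dates <- 60
returns <- matrix(rnorm(n_dates * 3), n_dates, 3,
                  dimnames = list(NULL, c("AAA", "BBB", "CCC")))
factors <- matrix(rnorm(n_dates * 2), n_dates, 2,
                  dimnames = list(NULL, c("f1", "f2")))

# For each window end, regress every stock on every factor in a single call
rolling_sensitivity <- function(returns, factors, n) {
  ends <- n:nrow(returns)
  out <- lapply(ends, function(end) {
    idx  <- (end - n + 1):end
    X    <- cbind(intercept = 1, factors[idx, , drop = FALSE])
    beta <- lm.fit(X, returns[idx, , drop = FALSE])$coefficients
    beta <- beta[-1, , drop = FALSE]          # drop the intercept row
    # beta is factors x tickers; unroll it column by column into long format
    data.frame(ticker      = rep(colnames(beta), each = nrow(beta)),
               dt          = end,             # row index standing in for a date
               factor      = rep(rownames(beta), times = ncol(beta)),
               sensitivity = as.vector(beta))
  })
  do.call(rbind, out)
}

sens <- rolling_sensitivity(returns, factors, n = 20)
head(sens)
```

If the universe changes from date to date, the column subset of `returns` would have to be chosen inside the loop, which this sketch does not attempt.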

Dimitri Joe

2005-Jun-17 17:00 UTC


[R] vectorization

Hi there,

I have a data frame (mydata) with 1 numeric variable (income) and 1 factor
(education). I want a new column in this data with the median income for each
education level. An obviously inefficient way to do this is

for (k in 1:nrow(mydata)) {
    l <- mydata$education[k]
    mydata$md[k] <- median(mydata$income[mydata$education == l], na.rm = TRUE)
}

Since mydata has nearly 30,000 rows, this will not be done until the end of
this month. I thus need some help vectorizing this, please.

Thanks,

Dimitri


Liaw, Andy

2005-Jun-17 18:21 UTC


[R] vectorization

Here I go again with ave():

mydata$md <- ave(mydata$income, mydata$education, FUN=function(x) median(x, na.rm=TRUE))

IMHO it's one of the most under-rated helper functions in R.
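
A small sketch of how ave() behaves, using made-up toy data rather than the poster's. One point worth knowing: every argument to ave() other than x and FUN is interpreted as a grouping variable, so an option such as na.rm has to be supplied through a wrapper function instead of being passed alongside FUN.

```r
# Toy data (made up for illustration), with one missing income
mydata <- data.frame(
  income    = c(10, 20, 30, 40, NA, 60),
  education = factor(c("low", "low", "low", "high", "high", "high"))
)

# ave() splits income by education, applies FUN to each group, and returns a
# vector as long as the input, so it can be assigned as a column directly.
mydata$md <- ave(mydata$income, mydata$education,
                 FUN = function(x) median(x, na.rm = TRUE))

mydata$md
# 20 20 20 50 50 50
```

The row with the missing income still receives its group's median, because the NA is dropped only inside the median computation.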

Andy
> ______________________________________________
> R-help at stat.math.ethz.ch mailing list
> stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide! 
> R-project.org/posting-guide.html

Huntsinger, Reid

2005-Jun-17 18:35 UTC


[R] vectorization

You can use tapply() to compute the medians, as in

meds <- tapply(mydata$income, INDEX=mydata$education, FUN=median, na.rm=TRUE)

then create a new column with the medians as

mydata$medianEd <- meds[mydata$education]
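
A minimal sketch of why the lookup step works, on made-up toy data. tapply() returns a named array with one entry per factor level, in level order, so indexing it by the factor (which uses the integer codes) and indexing it by the level labels give the same result. The names `by_name` and `by_code` are illustrative only.

```r
mydata <- data.frame(
  income    = c(10, 20, 30, 40, 50, 60),
  education = factor(c("low", "high", "low", "high", "low", "high"))
)

# One median per level, named by level: high = 40, low = 30
meds <- tapply(mydata$income, mydata$education, median, na.rm = TRUE)

# Lookup by level label (also works if education is a character column)
by_name <- as.vector(meds[as.character(mydata$education)])

# Lookup by the factor itself, i.e. by its integer codes
by_code <- as.vector(meds[mydata$education])

stopifnot(identical(by_name, by_code))
mydata$medianEd <- by_name
mydata$medianEd
# 30 40 30 40 30 40
```

The as.vector() call only strips the array dimension and names so that the new column is a plain numeric vector.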


Reid Huntsinger


Kevin Bartz

2005-Jun-17 18:35 UTC


[R] vectorization

These two lines worked for me:

rst <- tapply(mydata$income, mydata$education, median)
mydata$md <- rst[mydata$education]

Here's my cheesy example:

> mydata <- data.frame(income    = round(rnorm(30000, 55000, 10000)),
+                      education = letters[rbinom(30000, 4, 1/2)+1])
> rst <- tapply(mydata$income, mydata$education, median)
> mydata$md <- rst[mydata$education]
> head(mydata)
  income education      md
1  66223         e 55094.5
2  56830         c 54966.0
3  58035         b 54937.5
4  74045         a 55213.5
5  61327         b 54937.5
6  64150         b 54937.5

Is this what you wanted?

Kevin


Rau, Roland

2005-Jun-17 18:53 UTC


[R] vectorization

Hi,

I guess the attached code (incl. simulating your data structure) is not
the most efficient way to do this, but at least (I hope so!) it does
what you wanted it to do:


####################### Beginning of Example Code

income <- runif(100)
education <- as.factor(sample(c("high", "middle", "low"),
                              size=length(income), replace=TRUE))
mydata <- data.frame(inc=income, edu=education)


mymedians <- tapply(X=mydata$inc, INDEX=mydata$edu, FUN=median)

mydata$medians <- ifelse(mydata$edu=="high", mymedians["high"], 0)
mydata$medians <- ifelse(mydata$edu=="middle", mymedians["middle"],
                         mydata$medians)
mydata$medians <- ifelse(mydata$edu=="low", mymedians["low"],
                         mydata$medians)

head(mydata)
mymedians

####################### End of Example Code

Maybe one can increase the speed, but I think it is fast enough for your
30,000 rows, as you can see from the timing on my desktop
computer here (WinXP Pro SP2, P4, 3GHz, 512MB RAM):
> time.check <- function(){
+   income <- runif(30000)
+   education <- as.factor(sample(c("high", "middle", "low"),
+                                 size=length(income), replace=TRUE))
+   mydata <- data.frame(inc=income, edu=education)
+   
+   mymedians <- tapply(X=mydata$inc, INDEX=mydata$edu, FUN=median)
+ 
+   mydata$medians <- ifelse(mydata$edu=="high", mymedians["high"], 0)
+   mydata$medians <- ifelse(mydata$edu=="middle", mymedians["middle"],
+                            mydata$medians)
+   mydata$medians <- ifelse(mydata$edu=="low", mymedians["low"],
+                            mydata$medians)
+   return(NULL)
+ }
> system.time(time.check())
[1] 0.36 0.02 0.38   NA   NA
> version
         _              
platform i386-pc-mingw32
arch     i386           
os       mingw32        
system   i386, mingw32  
status   beta           
major    2              
minor    1.0            
year     2005           
month    04             
day      04             
language R              
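
As a rough cross-check of the approaches in this thread, the sketch below runs the original loop, ave(), tapply() with a lookup, and the ifelse() chain on the same simulated data and confirms that they agree. The sample size is kept small (an arbitrary choice) so that the loop finishes quickly; timings depend on the machine, so none are quoted here.

```r
set.seed(42)
n <- 2000                       # small enough for the loop to finish quickly
mydata <- data.frame(
  inc = runif(n),
  edu = factor(sample(c("high", "middle", "low"), n, replace = TRUE))
)

# 1. Row-by-row loop from the original question
loop_time <- system.time({
  loop_md <- numeric(n)
  for (k in seq_len(n)) {
    loop_md[k] <- median(mydata$inc[mydata$edu == mydata$edu[k]], na.rm = TRUE)
  }
})

# 2. ave(), with na.rm passed inside FUN
ave_md <- ave(mydata$inc, mydata$edu,
              FUN = function(x) median(x, na.rm = TRUE))

# 3. tapply() followed by a lookup by level name
vec_time <- system.time({
  meds   <- tapply(mydata$inc, mydata$edu, median, na.rm = TRUE)
  tap_md <- as.vector(meds[as.character(mydata$edu)])
})

# 4. ifelse() chain, one step per level
ife_md <- ifelse(mydata$edu == "high",   meds["high"],   0)
ife_md <- ifelse(mydata$edu == "middle", meds["middle"], ife_md)
ife_md <- ifelse(mydata$edu == "low",    meds["low"],    ife_md)

# All four approaches should produce the same column
stopifnot(isTRUE(all.equal(loop_md, ave_md)),
          isTRUE(all.equal(loop_md, tap_md)),
          isTRUE(all.equal(loop_md, ife_md)))

print(loop_time["elapsed"])
print(vec_time["elapsed"])
```

The ifelse() chain needs one line per level, whereas the ave() and tapply() versions do not change as the number of education levels grows.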


Best,
Roland


