thr3ads.net - R help - [R] Behavior of self-defined function within ddply [Jan 2014]

Amitabh Dugar

2014-Jan-15 18:19 UTC

[R] Behavior of self-defined function within ddply

I have a data frame "small" which has 5,000 rows and contains data for
several tickers every month, as below:

    monthend_n ticker wgtdiff    ret interval      b1       b2       b3      b4      b5    b6
  1   19990228     AA  0.7172  -2.58  0.33896 -0.5868 -0.24784  0.09112 0.43008 0.76904 1.108
  2   19990228   AAPL -0.0828 -15.48  0.33896 -0.5868 -0.24784  0.09112 0.43008 0.76904 1.108
  3   19990228   ABCW  0.0966  -7.36  0.33896 -0.5868 -0.24784  0.09112 0.43008 0.76904 1.108
...
705   19990331     AA  0.1932    1.7  0.31602 -0.7641 -0.44808 -0.13206 0.18396 0.49998 0.816
706   19990331   AAPL   0.033   3.23  0.31602 -0.7641 -0.44808 -0.13206 0.18396 0.49998 0.816
707   19990331    ABF   0.154 -20.51  0.31602 -0.7641 -0.44808 -0.13206 0.18396 0.49998 0.816
708   19990331    ABI   0.286   8.33  0.31602 -0.7641 -0.44808 -0.13206 0.18396 0.49998 0.816
etc.
Variables b1 through b6 are break points that I want to use in the
"cut" function. They vary each month according to the distribution
of the variable "wgtdiff" during that month.
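As a small illustration of how six break points map values to five bins, here is a sketch using the February 1999 break points and the three "wgtdiff" values from the rows shown above. The variable names `breaks`, `wgtdiff` and `bins` are chosen only for this example. By default cut() uses intervals that are open on the left and closed on the right, so a value equal to the lowest break point, or outside the range, gets NA.

```r
# February 1999 break points (b1..b6) taken from the rows shown above
breaks  <- c(-0.5868, -0.24784, 0.09112, 0.43008, 0.76904, 1.108)
wgtdiff <- c(0.7172, -0.0828, 0.0966)   # AA, AAPL, ABCW

# labels = FALSE returns integer bin codes instead of a factor
bins <- cut(wgtdiff, breaks = breaks, labels = FALSE)
bins
# [1] 4 2 3

# Values on the lowest break point or outside the range are NA
edge <- cut(c(-0.5868, 2), breaks = breaks, labels = FALSE)
edge
# [1] NA NA

# include.lowest = TRUE puts the lowest break point into bin 1
lowest <- cut(-0.5868, breaks = breaks, labels = FALSE, include.lowest = TRUE)
lowest
# [1] 1
```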

To handle this I wrote the function below:
cutfunc <- function(df)
{
  vec <- df$wgtdiff
  # unique() is needed because the break points are the same for all
  # tickers within a month (b1-b6 are constant within each month)
  breaks <- c(unique(df$b1), unique(df$b2), unique(df$b3), unique(df$b4),
              unique(df$b5), unique(df$b6))
  bin <- cut(vec, breaks, labels = FALSE)
  bin
}
Then I tried:
temp4 <- ddply(small, .(monthend_n), summarize, bins=cutfunc(small))
I was expecting to get back a data frame with 5,000 rows with bin assignments
for each ticker; with 6 break points the bin numbers should range from 1
to 5.
Instead I get a data frame with 40,000 rows and bin numbers ranging from
1 to 40, as below:
     monthend_n bins
1      19990228   40
2      19990228   17
3      19990228   22
...
5000   19990228   17
5001   19990331   40
5002   19990331   17
5003   19990331   22

etc

It seems ddply doesn't pass monthly pieces of the data frame
"small" into my "cutfunc" in the way I expect.

Any guidance is appreciated.
Thanks

arun

2014-Jan-16 09:02 UTC

[R] Behavior of self-defined function within ddply

Hi,
Maybe this helps:

library(plyr)  # provides ddply, dlply, mutate and summarize
# cutfunc() is used as defined in the question

small <- read.table(text="monthend_n ticker wgtdiff ret interval b1 b2 b3 b4 b5 b6
1 19990228 AA 0.7172 -2.58 0.33896 -0.5868 -0.24784 0.09112 0.43008 0.76904 1.108
2 19990228 AAPL -0.0828 -15.48 0.33896 -0.5868 -0.24784 0.09112 0.43008 0.76904 1.108
3 19990228 ABCW 0.0966 -7.36 0.33896 -0.5868 -0.24784 0.09112 0.43008 0.76904 1.108
705 19990331 AA 0.1932 1.7 0.31602 -0.7641 -0.44808 -0.13206 0.18396 0.49998 0.816
706 19990331 AAPL 0.033 3.23 0.31602 -0.7641 -0.44808 -0.13206 0.18396 0.49998 0.816
707 19990331 ABF 0.154 -20.51 0.31602 -0.7641 -0.44808 -0.13206 0.18396 0.49998 0.816
708 19990331 ABI 0.286 8.33 0.31602 -0.7641 -0.44808 -0.13206 0.18396 0.49998 0.816",sep="",header=TRUE,stringsAsFactors=FALSE)
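# Option 1: apply cutfunc to each monthly piece with dlply, then attach the result
res <- mutate(small,bins=unlist(dlply(small,.(monthend_n),cutfunc)))
res$bins
#[1] 4 2 3 4 3 3 4

# Option 2: call cut() directly on the columns of each monthly piece
ddply(small,.(monthend_n),summarize,
      bins=cut(wgtdiff,breaks=unique(c(b1,b2,b3,b4,b5,b6)),labels=FALSE))[,2]
#[1] 4 2 3 4 3 3 4

# Option 3: base R only, split by month and apply cutfunc to each piece
unlist(lapply(split(small,small$monthend_n),cutfunc),use.names=FALSE)
#[1] 4 2 3 4 3 3 4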

A.K.
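
A likely reason the original call misbehaved: in `summarize, bins=cutfunc(small)`, the name `small` is not a column of the monthly piece, so it resolves to the full data frame. Every month then returns one value per row of the whole frame, with the break points of all months pooled. That would be consistent with 40,000 rows if the data cover eight months, though the thread does not state the number of months. The sketch below rebuilds the seven example rows by hand (the names `whole`, `per_month` and `piece` are chosen for this example) and shows one way to hand each piece to `cutfunc` through an anonymous function.

```r
library(plyr)

# The seven example rows from the thread, rebuilt by hand
small <- data.frame(
  monthend_n = rep(c(19990228, 19990331), times = c(3, 4)),
  ticker  = c("AA", "AAPL", "ABCW", "AA", "AAPL", "ABF", "ABI"),
  wgtdiff = c(0.7172, -0.0828, 0.0966, 0.1932, 0.033, 0.154, 0.286),
  b1 = rep(c(-0.5868,  -0.7641),  times = c(3, 4)),
  b2 = rep(c(-0.24784, -0.44808), times = c(3, 4)),
  b3 = rep(c( 0.09112, -0.13206), times = c(3, 4)),
  b4 = rep(c( 0.43008,  0.18396), times = c(3, 4)),
  b5 = rep(c( 0.76904,  0.49998), times = c(3, 4)),
  b6 = rep(c( 1.108,    0.816),   times = c(3, 4)),
  stringsAsFactors = FALSE
)

# Same logic as the cutfunc in the question
cutfunc <- function(df) {
  breaks <- c(unique(df$b1), unique(df$b2), unique(df$b3),
              unique(df$b4), unique(df$b5), unique(df$b6))
  cut(df$wgtdiff, breaks, labels = FALSE)
}

# Called on the whole frame, cutfunc pools the break points of both months
# and returns one value per row of `small`, regardless of month
whole <- cutfunc(small)
length(whole)
# [1] 7

# Passing a function of the piece lets ddply supply one month at a time
per_month <- ddply(small, .(monthend_n), function(piece) {
  piece$bins <- cutfunc(piece)
  piece
})
per_month$bins
# [1] 4 2 3 4 3 3 4
```

This variant keeps all original columns next to the new `bins` column, which makes it easy to check each ticker's assignment.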


