Ira,
You could also try ddply():
dat2<- data.frame(S1=rep(Pred1[,1],ncol(Pred1)-1),
                  variable=rep(colnames(Pred1)[-1],each=nrow(Pred1)),
                  Predict=unlist(Pred1[,-1],use.names=FALSE),
                  Actual=unlist(Actual1[,-1],use.names=FALSE),
                  stringsAsFactors=FALSE)
identical(dat,dat2)
#[1] TRUE
dat2New<- dat2[!(is.na(dat2$Predict)|is.na(dat2$Actual)),]
dat3<- dat2New[order(dat2New$S1,dat2New$Predict),]
library(plyr)
res2<- ddply(dat3,.(S1),summarize,
             cbind(c(head(rev(Predict),5),head(Predict,5)),
                   c(head(rev(Actual),5),head(Actual,5))))
#in the example data this works
res2New<- data.frame(S1=res2[,1],Predict=res2[,2][,1],Actual=res2[,2][,2])
res3<- res2New[res2New$Predict!=0,]
row.names(res3)<- 1:nrow(res3)
identical(res3,res[,-2])
#[1] TRUE
But if a date has fewer than five positive or five negative values, then the
loop method, or handling the positive and negative values separately with
ddply(), would be more appropriate.
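As a rough illustration of that case, here is a minimal base R sketch on made-up toy data (the values are not from arun.RData, and pick_extremes() is a hypothetical helper, not part of plyr). It relies on head() returning fewer rows when fewer are available, so a date with fewer than five positive or negative predictions needs no special branch:

```r
# Toy data in long form; all values are invented for illustration only
toy <- data.frame(
  S1      = rep(c("1/3/2006", "1/4/2006"), each = 8),
  Predict = c(3, 2, 1, 0, -1, -2, NA, -3,
              9, 8, 7, 6, 5, 4, -1, -2),
  Actual  = c(1.5, -0.2, 0.7, 0.1, -0.4, NA, 0.3, -1.1,
              0.9, 0.8, -0.7, 0.6, 0.5, 0.4, 0.2, -0.3),
  stringsAsFactors = FALSE
)

# pick_extremes() is a hypothetical helper: for one date, keep up to n of the
# largest positive predictions and up to n of the most negative predictions.
pick_extremes <- function(x, n = 5) {
  x <- x[!(is.na(x$Predict) | is.na(x$Actual)), ]  # drop rows with NA
  pos <- x[x$Predict > 0, ]                        # strictly positive only
  neg <- x[x$Predict < 0, ]                        # strictly negative only
  high <- head(pos[order(-pos$Predict), ], n)      # largest first
  low  <- head(neg[order(neg$Predict), ], n)       # most negative first
  rbind(high, low)
}

# Apply the helper to each date and stack the pieces
res_toy <- do.call(rbind, lapply(split(toy, toy$S1), pick_extremes))
row.names(res_toy) <- NULL
res_toy
```

On this toy data the first date keeps only three positive and two negative rows, while the second keeps five positive and two negative rows, so the result has fewer than 10 rows per date.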
A.K.
----- Original Message -----
From: arun <smartpink111 at yahoo.com>
To: Ira Sharenow <irasharenow100 at yahoo.com>
Cc: R help <r-help at r-project.org>
Sent: Wednesday, September 25, 2013 4:24 PM
Subject: Re: Best and worst values for each date
Hi,
Maybe you can try this:
obj_name<- load("arun.RData")
Pred1<- get(obj_name[1])
Actual1<- get(obj_name[2])
library(reshape2)
# reshape to long form
dat<-cbind(melt(Pred1,id.vars="S1"),value2=melt(Actual1,id.vars="S1")[,3])
colnames(dat)[3:4]<- c("Predict","Actual")
dat$variable<- as.character(dat$variable) # not strictly needed
# remove the NA values in columns "Predict" and "Actual"
dat1<- dat[!(is.na(dat$Predict)|is.na(dat$Actual)),]
res<- do.call(rbind,lapply(split(dat1,dat1$S1),function(x){
    x1<- x[order(x$Predict),]
    xlow<- if(sum(x1$Predict<0) <5){
        # in cases where you don't have 5 negative numbers
        x1[x1$Predict<0,]
    } else {
        x1[x1$Predict<0,][1:5,]  # select first five rows
    }
    xhigh<- if(sum(x1$Predict>0) <5){
        # not having 5 positive numbers
        x1[x1$Predict>0,]
    } else {
        tail(x1[x1$Predict>0,],5)
    }
    # reverse the order of high values
    rbind(xhigh[rev(order(xhigh$Predict)),],xlow)
}))
dim(res)
#[1] 480   4
A.K.
________________________________
From: Ira Sharenow <irasharenow100 at yahoo.com>
To: arun <smartpink111 at yahoo.com>
Sent: Wednesday, September 25, 2013 12:55 PM
Subject: Best and worst values for each date
Arun,
I hope you have been doing well.
I have a new problem.
I have two data frames, one for predictions and one for the actual returns.
Each day I act on the returns that have the 5 highest values and the five lowest
values. I then want to compare to the actual values. So I need to subset my two
original data frames so that the stocks and their prices that remain after each
day are the ones I want. At the end of filtering there will be one data frame
for predictions and one data frame for actual values.
Now for an enhancement. NA values cannot be part of the reduced data frames but
will occur in great proportion in the original data frames. Each day I need to
check that the top five are positive; otherwise I need to reduce that number as
needed. Similarly, I need the bottom five to be negative. At the end of 50 days
each reduced data frame would have 5 * 2 * 50 = 500 rows, but this step may
reduce that number.
I attached a smallish file with the two data frames. The real ones have hundreds
of columns and over 1,000 rows.
Please aim for simplicity. If the solution is complex, please explain.
Do you want me to use a different email address?
Thanks.
Ira
Example. But the stocks are not set up this way.
The highlighted stocks are in the first data frames.
Date      Stock  Predict  Actual
1/3/2006  S1           3  -1.943
1/3/2006  S20          4  10.376
1/3/2006  S3           2   8.611
1/3/2006  S4           1   7.465
1/3/2006  S5           0   1.648
1/3/2006  S6          -1   5.36
1/3/2006  S7          -2   4.36
1/3/2006  S8          -3   3.574
1/3/2006  S9          -4   2.748
1/3/2006  S10         -5   1.933
1/3/2006  S11         -6   0.548
1/3/2006  S12         -7  -0.66
1/3/2006  S13         -8  -1.793
1/3/2006  S14         -9  -2.163
1/3/2006  S15        -10  -3.077
1/3/2006  S16        -11  -4.723
1/3/2006  S17        -12  -5.919
1/3/2006  S18        -13  -6.529
1/3/2006  S19        -14  -7.979
1/3/2006  S20        -15  -8.064
After making sure that only positives remain among the top 5 predictions and
only negatives among the bottom 5 predictions:
1/3/2006  S1           3  -1.943
1/3/2006  S20          4  10.376
1/3/2006  S3           2   8.611
1/3/2006  S4           1   7.465
1/3/2006  S16        -11  -4.723
1/3/2006  S17        -12  -5.919
1/3/2006  S18        -13  -6.529
1/3/2006  S19        -14  -7.979
1/3/2006  S20        -15  -8.064
Note that the next day different stocks may be selected. Also, there cannot be
any NA in either the Predict or Actual column.