rivercode
2010-Oct-04 21:29 UTC
[R] Loop too slow for Bid calc - BUT cannot figure out how to do with matrix
Hi,

I am trying to create Bid/Ask prices for each second from a high-volume stock, and the only way I have been able to solve this is by using loops to build the target matrix from the source tick-data matrix. Looping is too slow and not practical to use on multiple stocks. For example:

Bids Matrix (a real one has 400,000+ rows):

  Bid     Time
  10.03   11:05:03.124
  10.04   11:05:03.348
  10.05   11:05:04.010

One Second Bid Matrix (bid price for every second of the day):

  Bid     Second
  10.02   11:05:03
  ??      11:05:04   <---- last bid price before 11:05:04.xxx, which is 10.04 at 11:05:03.348

The challenge is how to create the One Second Bid Matrix without looping through the Bids Matrix to find the first timestamp that is greater than the one-second timestamp and then taking the price from the previous row of the Bids Matrix, which would have been the bid at the beginning of that second.

I am new to R, so I need some help to do this "properly".

  # onesec: the matrix above called "One Second Bid Matrix"
  # bids:   the matrix above called "Bids Matrix"

  bidrow <- 1

  # loop through each second
  for (sec in seq_along(onesec$Second)) {
    t <- as.POSIXlt(onesec$Second[sec], origin = "1970-01-01")
    sec.onesec <- as.numeric(format(t, "%H%M%S"))  # date/time as the number HHMMSS

    # find the bid for this second: the last bid before the second changes
    for (r in bidrow:length(bids$Price)) {
      # convert the bids timestamp to a number of the form HHMMSS
      bidTS <- unlist(strsplit(as.character(bids$Time[r]), split = "\\."))[1]  # drop milliseconds
      bidTS <- gsub(":", "", bidTS)  # remove ":" from the time
      bidTS <- as.numeric(bidTS)     # convert to a number

      if (bidTS > sec.onesec) {
        onesec$Bid[sec] <- bids$Price[r - 1]  # price of the previous bid
        bidrow <- r  # start the search for the next second here
        break
      }
    }
  }

Hope this is clear, and thanks for your help.
Chris
-- 
Sent from the R help mailing list archive at Nabble.com.
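A vectorized alternative to the inner loop, offered as a sketch under stated assumptions: the tick times are "HH:MM:SS.mmm" strings from a single day, already sorted ascending, and to_seconds() is a hypothetical helper introduced only for this example. Base R's findInterval() returns, for each whole second on the grid, the index of the last bid at or before that second, which is the lookup the inner loop performs one row at a time. The three rows are the sample Bids Matrix from the post.

```r
# Hypothetical helper: "HH:MM:SS.mmm" strings -> numeric seconds since midnight
to_seconds <- function(x) {
  p <- matrix(as.numeric(unlist(strsplit(x, ":", fixed = TRUE))),
              ncol = 3, byrow = TRUE)
  p[, 1] * 3600 + p[, 2] * 60 + p[, 3]
}

# Sample Bids Matrix from the post (a real one has 400,000+ rows)
bids <- data.frame(Price = c(10.03, 10.04, 10.05),
                   Time  = c("11:05:03.124", "11:05:03.348", "11:05:04.010"),
                   stringsAsFactors = FALSE)

bid_secs <- to_seconds(bids$Time)                          # assumed sorted ascending
grid <- seq(floor(min(bid_secs)), ceiling(max(bid_secs)))  # every whole second

# Index of the last bid at or before each second (0 when there is none yet)
idx <- findInterval(grid, bid_secs)

onesec <- data.frame(
  Second = sprintf("%02d:%02d:%02d",
                   grid %/% 3600, (grid %% 3600) %/% 60, grid %% 60),
  Bid = c(NA, bids$Price)[idx + 1],  # index 0 maps to NA
  stringsAsFactors = FALSE
)
```

For 11:05:03 the sketch gives NA, because the earlier 10.02 bid is not among the three sample rows; for 11:05:04 it gives 10.04, the bid at 11:05:03.348. Note that a bid stamped exactly on a whole second counts as "at or before" that second.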
Duncan Murdoch
2010-Oct-04 21:47 UTC
On 04/10/2010 5:29 PM, rivercode wrote:
> Looping is too slow and not practical to use on multiple stocks.

I would use loops, but something like this:

  lastrow <- Bids[1,]
  time <- [ some starting time ]
  for (i in 2:400000) {
    thisrow <- Bids[i,]
    while (thisrow[,2] > time) {
      output lastrow for time
      time <- time + 1
    }
    lastrow <- thisrow
  }

I don't see how that would be too slow, but if it was, just rewrite it in C.

Duncan Murdoch
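A runnable version of the single-pass loop above, as a sketch: it assumes the bid times have already been converted to numeric seconds since midnight and sorted, and it uses ceiling() of the first bid time as the "[ some starting time ]" that the pseudocode leaves open. loop_bids() is a hypothetical name. Each tick is visited once and the output vectors are preallocated, so the cost grows linearly with the number of ticks plus the number of seconds emitted.

```r
# Last bid at or before each whole second, in one pass over sorted ticks.
# price: bid prices; secs: matching times in seconds since midnight (sorted).
loop_bids <- function(price, secs) {
  n <- length(secs)
  time <- ceiling(secs[1])            # assumed starting second
  m <- ceiling(secs[n]) - time        # number of seconds that will be emitted
  out_time <- numeric(m)
  out_bid <- numeric(m)
  k <- 0
  last_price <- price[1]
  for (i in seq_len(n)[-1]) {
    # tick i is later than 'time', so the previous tick is the bid at 'time'
    while (secs[i] > time) {
      k <- k + 1
      out_time[k] <- time
      out_bid[k] <- last_price
      time <- time + 1
    }
    last_price <- price[i]
  }
  data.frame(Second = out_time, Bid = out_bid)
}

# The three sample bids from the original post, as seconds since midnight
res <- loop_bids(c(10.03, 10.04, 10.05), c(39903.124, 39903.348, 39904.010))
```

As in the pseudocode, the final tick is never written out, because no later tick arrives to close its second.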
Martin Morgan
2010-Oct-04 22:28 UTC
On 10/04/2010 02:29 PM, rivercode wrote:
> The challenge is how to create the one second bid matrix, without looping
> through the Bids Matrix to find the first timestamp that is greater than the
> OneSecond timestamp then getting the previous row price from
> BidsMatrix...which would have been the bid at the beginning of that second.

Not sure that I understand, but here

  y = as.POSIXlt(runif(400000, 0, 8 * 60 * 60), origin="1970-01-01")

are 400,000 times over an 8-hour window, at sub-second intervals. Order these (order()), find the second in which each occurs (floor(as.numeric())), identify the last record in each second (diff() != 0), including the last record of the day (c()), and keep only those (o[]):

  o = order(y)
  i = o[ c(diff(floor(as.numeric(y)[o])) != 0, TRUE) ]

and view the times (two different ways):

  > head(y[i])
  [1] "1969-12-31 16:00:00 PST" "1969-12-31 16:00:01 PST"
  [3] "1969-12-31 16:00:02 PST" "1969-12-31 16:00:03 PST"
  [5] "1969-12-31 16:00:04 PST" "1969-12-31 16:00:05 PST"
  > head(as.numeric(y)[i])
  [1] 0.9551883 1.8336520 2.8745100 3.9695889 4.8229001 5.8056079

If y were a column of a data frame df, then

  y <- df$y

and after the above

  df[i,]

Martin
-- 
Computational Biology
Fred Hutchinson Cancer Research Center
1100 Fairview Ave. N.
PO Box 19024 Seattle, WA 98109

Location: M1-B861
Telephone: 206 667-2793
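Martin's indexing applied to a small hand-made tick set (the times and prices below are illustrative, not market data) shows what it returns: the last tick in each second that has ticks. A second with no ticks, second 5 here, gets no row at all, so a complete one-second grid would still need the previous bid carried forward. The last tick in second s is also the bid at the start of second s + 1, which is the quantity the original question asks for.

```r
# Illustrative tick times (seconds) and bids, deliberately out of order
secs  <- c(3.348, 3.124, 4.010, 6.500, 6.900)
price <- c(10.04, 10.03, 10.05, 10.06, 10.07)

o <- order(secs)                                # sort the ticks by time
i <- o[c(diff(floor(secs[o])) != 0, TRUE)]      # last tick in each second

last_in_second <- data.frame(Second = floor(secs[i]), Bid = price[i])
```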
Newbie question: I am looking for something equivalent to read.delim that accepts a line of text as a parameter instead of a file. Below is my problem. I am unable to get the exact output I want, which is a simple data frame of the lines where the delimiter exists, though I am coming quite close.

I have 10 lines of data in MF_Data:

> MF_Data[1:10]
 [1] "Scheme Code;Scheme Name;Net Asset Value;Repurchase Price;Sale Price;Date"
 [2] ""
 [3] "Open Ended Schemes ( Liquid )"
 [4] ""
 [5] ""
 [6] "AIG Global Investment Group Mutual Fund"
 [7] "106506;AIG India Liquid Fund-Institutional Plan-Daily Dividend Option;1001.0000;1001.0000;1001.0000;02-Oct-2010"
 [8] "106511;AIG India Liquid Fund-Institutional Plan-Growth Option;1210.4612;1210.4612;1210.4612;02-Oct-2010"
 [9] "106507;AIG India Liquid Fund-Institutional Plan-Weekly Dividend Option;1001.8765;1001.8765;1001.8765;02-Oct-2010"
[10] "106503;AIG India Liquid Fund-Retail Plan-DailyDividend Option;1001.0000;1001.0000;1001.0000;02-Oct-2010"

The lines from [7] onward are delimited by ";". I am using:

  tempTxt <- MF_Data[7]
  MF_Data_F <- unlist(strsplit(tempTxt, ";", fixed = TRUE))
  tempTxt <- MF_Data[8]
  MF_Data_F1 <- unlist(strsplit(tempTxt, ";", fixed = TRUE))
  MF_Data_F <- rbind(MF_Data_F, MF_Data_F1)

But MF_Data_F is not the simple 2x6 data frame that I want.
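One way to parse lines of text directly, sketched under an assumption: the data rows are taken to be exactly the lines that start with a numeric scheme code. read.table() (which read.delim() wraps) accepts a character vector through its text argument in current R; passing textConnection(rows) as the file argument is the older route. The MF_Data values below are copied from the printout above.

```r
# The first 10 lines of MF_Data, as printed in the question
MF_Data <- c(
  "Scheme Code;Scheme Name;Net Asset Value;Repurchase Price;Sale Price;Date",
  "",
  "Open Ended Schemes ( Liquid )",
  "",
  "",
  "AIG Global Investment Group Mutual Fund",
  "106506;AIG India Liquid Fund-Institutional Plan-Daily Dividend Option;1001.0000;1001.0000;1001.0000;02-Oct-2010",
  "106511;AIG India Liquid Fund-Institutional Plan-Growth Option;1210.4612;1210.4612;1210.4612;02-Oct-2010",
  "106507;AIG India Liquid Fund-Institutional Plan-Weekly Dividend Option;1001.8765;1001.8765;1001.8765;02-Oct-2010",
  "106503;AIG India Liquid Fund-Retail Plan-DailyDividend Option;1001.0000;1001.0000;1001.0000;02-Oct-2010"
)

# Header line plus the data rows (assumed: data rows start with digits then ";")
rows <- c(MF_Data[1], MF_Data[grepl("^[0-9]+;", MF_Data)])

# Parse the character vector as if it were a ";"-delimited file;
# quote = "" and comment.char = "" keep apostrophes or "#" in names intact
MF_Data_F <- read.table(text = rows, sep = ";", header = TRUE,
                        quote = "", comment.char = "",
                        stringsAsFactors = FALSE)
```

By default read.table() turns header names such as "Scheme Code" into syntactic names such as Scheme.Code. The rbind() in the question returns a character matrix rather than a data frame; closer to that approach, as.data.frame(do.call(rbind, strsplit(rows, ";", fixed = TRUE))) also builds a table, but every column stays character.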