Hello R-Users,

I am new to R and trying my best; however, I need help with this simple task.
I have a dataset, YM1207:

       X.Symbol       Date     Time Exchange TickType ReferenceNumber Price Size
12491 3:YMZ7.EC 12/03/2007 08:32:50       EC        B        85985770 13379    7
12492 3:YMZ7.EC 12/03/2007 08:32:50       EC        A        85985771 13380    4
12493 3:YMZ7.EC 12/03/2007 08:32:50       EC        T        85985845 13379    1
12494 3:YMZ7.EC 12/03/2007 08:32:50       EC        B        85985846 13379    7
12495 3:YMZ7.EC 12/03/2007 08:32:50       EC        A        85985847 13380    4
12496 3:YMZ7.EC 12/03/2007 08:32:50       EC        B        85986222 13379    6
12497 3:YMZ7.EC 12/03/2007 08:32:50       EC        A        85986223 13380    4

I want to insert a column called NPrice which takes a pair of B,A rows and
calculates their average Price, and then puts that number in the B row and the
A row of the new column NPrice. Each B,A pair is separated by +1 on the
ReferenceNumber. I want to skip T entries. T's do not come in between
corresponding Bs and As. The other columns are not of interest. I would really
appreciate it if I could get some help on this, or a pointer to a source that
may help.

Thank you,
Neil Gupta

Neil Gupta wrote:
> Hello R-Users,
>
> I am new to R and trying my best; however, I need help with this simple task.
> I have a dataset, YM1207:
>
>        X.Symbol       Date     Time Exchange TickType ReferenceNumber Price Size
> 12491 3:YMZ7.EC 12/03/2007 08:32:50       EC        B        85985770 13379    7
> 12492 3:YMZ7.EC 12/03/2007 08:32:50       EC        A        85985771 13380    4
> 12493 3:YMZ7.EC 12/03/2007 08:32:50       EC        T        85985845 13379    1
> 12494 3:YMZ7.EC 12/03/2007 08:32:50       EC        B        85985846 13379    7
> 12495 3:YMZ7.EC 12/03/2007 08:32:50       EC        A        85985847 13380    4
> 12496 3:YMZ7.EC 12/03/2007 08:32:50       EC        B        85986222 13379    6
> 12497 3:YMZ7.EC 12/03/2007 08:32:50       EC        A        85986223 13380    4
>
> I want to insert a column called NPrice which takes a pair of B,A rows and
> calculates their average Price, and then puts that number in the B row and the
> A row of the new column NPrice. Each B,A pair is separated by +1 on the
> ReferenceNumber. I want to skip T entries. T's do not come in between
> corresponding Bs and As. The other columns are not of interest. I would really
> appreciate it if I could get some help on this, or a pointer to a source that
> may help.
>

I think this is a case where what you really need to do is to become aware of
the tools you have in the toolbox. E.g., I already showed you one way to do it
if the T's were absent:

    N <- nrow(YM1207)
    ix <- gl(N/2, 2)                        # pair labels: 1 1 2 2 3 3 ...
    YM1207$NPrice <- ave(YM1207$Price, ix)  # pair mean, repeated on both rows

(OK, I forgot $Price last time...)

so how about making them disappear using

    isAB <- YM1207$TickType %in% c("A", "B")
    ABprice <- YM1207$Price[isAB]

then do as before

    N <- length(ABprice)
    ix <- gl(N/2, 2)
    NPrice <- ave(ABprice, ix)

and put it back using

    YM1207$NPrice <- NA
    YM1207$NPrice[isAB] <- NPrice

There are several ways to do this sort of thing.

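The filter, average, and put-back steps above can be tried end to end on the seven sample rows from the question. This is only a minimal sketch: the data frame is retyped by hand with just the three relevant columns, and it assumes that the A/B rows always arrive as complete, consecutive B,A pairs once the T rows are dropped.

```r
# Sample rows from the question (only the columns that matter here).
YM1207 <- data.frame(
  TickType        = c("B", "A", "T", "B", "A", "B", "A"),
  ReferenceNumber = c(85985770, 85985771, 85985845, 85985846,
                      85985847, 85986222, 85986223),
  Price           = c(13379, 13380, 13379, 13379, 13380, 13379, 13380)
)

# Keep only bid/ask rows; T rows are skipped.
isAB    <- YM1207$TickType %in% c("A", "B")
ABprice <- YM1207$Price[isAB]

# One group label per consecutive B,A pair: 1 1 2 2 3 3
N <- length(ABprice)
stopifnot(N %% 2 == 0)   # assumption: no unmatched B or A rows
ix <- gl(N / 2, 2)

# ave() returns the group mean repeated for each member of the group.
YM1207$NPrice       <- NA
YM1207$NPrice[isAB] <- ave(ABprice, ix)

print(YM1207)
```

The T row keeps NA in NPrice, and every B,A pair in this sample ends up with 13379.5 on both rows.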
Another variation, closer to your original suggestion, would be to do

    isA <- YM1207$TickType == "A"
    isB <- YM1207$TickType == "B"
    nPrice <- (YM1207$Price[isA] + YM1207$Price[isB])/2
    YM1207$NPrice <- NA
    YM1207$NPrice[isA] <- YM1207$NPrice[isB] <- nPrice

(you probably don't really need the NA assignment, but strange things can
happen when you make subassignments into non-existing columns)

-- 
   O__  ---- Peter Dalgaard             Øster Farimagsgade 5, Entr.B
  c/ /'_ --- Dept. of Biostatistics     PO Box 2099, 1014 Cph. K
 (*) \(*) -- University of Copenhagen   Denmark      Ph:  (+45) 35327918
~~~~~~~~~~ - (p.dalgaard at biostat.ku.dk)              FAX: (+45) 35327907

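The isA/isB variation pairs the k-th A row with the k-th B row purely by position, so it may be worth checking the "+1 on the ReferenceNumber" rule from the question before trusting the result. The sketch below adds such a check; the check itself is not part of the reply above, and the data frame is a hand-typed copy of the sample rows.

```r
# Sample rows from the question (only the columns that matter here).
YM1207 <- data.frame(
  TickType        = c("B", "A", "T", "B", "A", "B", "A"),
  ReferenceNumber = c(85985770, 85985771, 85985845, 85985846,
                      85985847, 85986222, 85986223),
  Price           = c(13379, 13380, 13379, 13379, 13380, 13379, 13380)
)

isA <- YM1207$TickType == "A"
isB <- YM1207$TickType == "B"

# Added sanity check (an assumption about what "paired" should mean):
# same number of A and B rows, and each A is exactly one reference
# number after the B it is matched with.
refA <- YM1207$ReferenceNumber[isA]
refB <- YM1207$ReferenceNumber[isB]
stopifnot(length(refA) == length(refB), all(refA - refB == 1))

# Midpoint of each pair, written back to both rows of the pair.
nPrice <- (YM1207$Price[isA] + YM1207$Price[isB]) / 2
YM1207$NPrice <- NA
YM1207$NPrice[isA] <- YM1207$NPrice[isB] <- nPrice
```

If the check fails on the full dataset, the positional pairing is off somewhere (for example an unmatched B), and the averages after that point would mix quotes from different pairs.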
Hello R-users,

I have a very simple problem I want to solve. I have a large dataset like
this:

  Lag  X.Symbol     Time TickType ReferenceNumber  Price Size X.Symbol.1   Time.1 TickType.1 ReferenceNumber.1
1  ES 3:ESZ7.GB 08:30:00        B        74390987 151075   44  3:ESZ7.GB 08:30:00          A          74390988
2  ES 3:YMZ7.EC 08:30:00        B        74390993  13686   17  3:YMZ7.EC 08:30:00          A          74390994
3  YM 3:ESZ7.GB 08:30:00        B        74391135 151075   49  3:ESZ7.GB 08:30:00          A          74391136
4  YM 3:YMZ7.EC 08:30:00        B        74390998  13686   17  3:YMZ7.EC 08:30:00          A          74390999
5  YM 3:ESZ7.GB 08:30:00        B        74391135 151075   49  3:ESZ7.GB 08:30:00          A          74391136
6  YM 3:YMZ7.EC 08:30:00        B        74391000  13686   14  3:YMZ7.EC 08:30:00          A          74391001

  Price.1 Size.1 LeadTime   MidPoint Spread
1  151100     22 08:30:00 *151087.5*     25
2   13688     27 08:30:00    13687.0      2
3  151100     22 08:30:00 *151087.5*     25
4   13688     27 08:30:00    13687.0      2
5  151100     22 08:30:00   151087.5     25
6   13688     27 08:30:00    13687.0      2

All I want to do is take log(MidPoint[2]) - log(MidPoint[1]) for a symbol such
as "3:ESZ7.GB", where [1] and [2] are consecutive rows for that symbol. So the
first one would be log(151087.5) - log(151087.5). I want to do this throughout
the dataset and add the result as another column. I would appreciate any help.

Regards,
Neil Gupta

On Fri, 6 Jun 2008, Neil Gupta wrote:

> Hello R-users,
>
> I have a very simple problem I want to solve. I have a large dataset like
> this:
>
>   Lag  X.Symbol     Time TickType ReferenceNumber  Price Size X.Symbol.1   Time.1 TickType.1 ReferenceNumber.1
> 1  ES 3:ESZ7.GB 08:30:00        B        74390987 151075   44  3:ESZ7.GB 08:30:00          A          74390988
> 2  ES 3:YMZ7.EC 08:30:00        B        74390993  13686   17  3:YMZ7.EC 08:30:00          A          74390994
> 3  YM 3:ESZ7.GB 08:30:00        B        74391135 151075   49  3:ESZ7.GB 08:30:00          A          74391136
> 4  YM 3:YMZ7.EC 08:30:00        B        74390998  13686   17  3:YMZ7.EC 08:30:00          A          74390999
> 5  YM 3:ESZ7.GB 08:30:00        B        74391135 151075   49  3:ESZ7.GB 08:30:00          A          74391136
> 6  YM 3:YMZ7.EC 08:30:00        B        74391000  13686   14  3:YMZ7.EC 08:30:00          A          74391001
>
>   Price.1 Size.1 LeadTime   MidPoint Spread
> 1  151100     22 08:30:00 *151087.5*     25
> 2   13688     27 08:30:00    13687.0      2
> 3  151100     22 08:30:00 *151087.5*     25
> 4   13688     27 08:30:00    13687.0      2
> 5  151100     22 08:30:00   151087.5     25
> 6   13688     27 08:30:00    13687.0      2
>
> All I want to do is take log(MidPoint[2]) - log(MidPoint[1]) for a symbol such
> as "3:ESZ7.GB", where [1] and [2] are consecutive rows for that symbol. So the
> first one would be log(151087.5) - log(151087.5). I want to do this throughout
> the dataset and add the result as another column. I would appreciate any help.

See

    example( split )

Note the "### data frame variation", which should serve as a template for
your problem.

HTH,

Chuck

>
> Regards,
>
> Neil Gupta
>
> ______________________________________________
> R-help at r-project.org mailing list
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.
>

Charles C. Berry                            (858) 534-2098
                                            Dept of Family/Preventive Medicine
E mailto:cberry at tajo.ucsd.edu            UC San Diego
http://famprevmed.ucsd.edu/faculty/cberry/  La Jolla, San Diego 92093-0901

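For readers who want to see where the `example( split )` hint leads, here is one way the split / lapply / unsplit pattern from that help page might be adapted to the question. It is a sketch under assumptions, not Chuck's own code: the data frame `dat` is a hand-typed cut-down copy of the sample rows, the column name `LogDiff` is made up for illustration, and the first row of each symbol is given NA because it has no earlier row to difference against.

```r
# Cut-down copy of the sample data: symbol and midpoint only.
dat <- data.frame(
  X.Symbol = c("3:ESZ7.GB", "3:YMZ7.EC", "3:ESZ7.GB",
               "3:YMZ7.EC", "3:ESZ7.GB", "3:YMZ7.EC"),
  MidPoint = c(151087.5, 13687.0, 151087.5, 13687.0, 151087.5, 13687.0),
  stringsAsFactors = FALSE
)

# Split by symbol, add the within-symbol log difference, and reassemble
# the pieces in the original row order.
g    <- dat$X.Symbol
l    <- split(dat, g)
l    <- lapply(l, transform, LogDiff = c(NA, diff(log(MidPoint))))
dat2 <- unsplit(l, g)

# The same column computed in one step with ave(), as a cross-check.
LogDiff2 <- ave(log(dat$MidPoint), dat$X.Symbol,
                FUN = function(x) c(NA, diff(x)))

print(dat2)
```

In this sample the midpoints never change within a symbol, so every non-NA difference is zero; on the full dataset the same code would give the log return between consecutive quotes of each symbol.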