Dear R-experts,

Sorry if I've overlooked a simple solution here. I have calculated the proportion of observations that meet a criterion, applied to five years of data. How can I break down this proportion statistic for each year?

For example (data in zoo format):

           open high  low close hc  lc
2004-12-29 4135 4135 4106  4116  8 -21
2004-12-30 4120 4131 4115  4119 15  -1
2004-12-31 4123 4124 4114  4117  5  -5
2005-01-04 4106 4137 4103  4137 20 -14
2005-01-06 4085 4110 4085  4096 10 -15
2005-01-10 4133 4148 4122  4139 15 -11
2005-01-11 4142 4158 4127  4130 19 -12
2005-01-12 4113 4138 4112  4127 18   8

The statistic of interest is the proportion of days on which "hc" is positive and "lc" is negative. I am looking to return something like:

Yr   Prop
2004 1.0
2005 0.8

Along these lines, if I have datasets A and B, where B is a subset of A, can I use the number of matching dates to calculate the yearly proportions in question?

Thanks,
Alfonso Sammassimo
Melbourne Australia

Here is one way to break it down by year.

> library(zoo)
> x <- " open high low close hc lc
+ 2004-12-29 4135 4135 4106 4116 8 -21
+ 2004-12-30 4120 4131 4115 4119 15 -1
+ 2004-12-31 4123 4124 4114 4117 5 -5
+ 2005-01-04 4106 4137 4103 4137 20 -14
+ 2005-01-06 4085 4110 4085 4096 10 -15
+ 2005-01-10 4133 4148 4122 4139 15 -11
+ 2005-01-11 4142 4158 4127 4130 19 -12
+ 2005-01-12 4113 4138 4112 4127 18 8"
> xIn <- read.table(textConnection(x), header=TRUE)
> x.zoo <- zoo(xIn, as.POSIXct(row.names(xIn)))
> # per year: days with hc > 0 and lc < 0, divided by the number of days
> sapply(split(x.zoo, format(index(x.zoo), "%Y")), function(.year){
+     sum(.year[,'hc'] > 0 & .year[,'lc'] < 0) / nrow(.year)
+ })
2004 2005
 1.0  0.8

On 5/27/07, Alfonso Sammassimo <cincinattikid@bigpond.com> wrote:
> How can I break down this proportion statistic for each year?

--
Jim Holtman
Cincinnati, OH

What is the problem you are trying to solve?

Here are a couple of solutions:

1. Using the zoo package

First add Date to the header so that there are the same number of column headers as columns, then read the data in using read.zoo. Then aggregate over years using mean. For more on zoo try library(zoo); vignette("zoo"), and for more on dates see the R News 4/1 help desk article.

# added Date to the header
Lines <- "Date open high low close hc lc
2004-12-29 4135 4135 4106 4116 8 -21
2004-12-30 4120 4131 4115 4119 15 -1
2004-12-31 4123 4124 4114 4117 5 -5
2005-01-04 4106 4137 4103 4137 20 -14
2005-01-06 4085 4110 4085 4096 10 -15
2005-01-10 4133 4148 4122 4139 15 -11
2005-01-11 4142 4158 4127 4130 19 -12
2005-01-12 4113 4138 4112 4127 18 8
"
library(zoo)
# z <- read.zoo("myfile.dat", header = TRUE)
z <- read.zoo(textConnection(Lines), header = TRUE)
aggregate(z[,"hc"] > 0 & z[,"lc"] < 0, function(x) format(x, "%Y"), mean)

2. Using data frames and tapply

Read in as a data frame, calculate the year and tapply the mean by year:

# Lines is from above
# dat <- read.table("myfile.dat", header = TRUE)
dat <- read.table(textConnection(Lines), header = TRUE)
year <- as.numeric(format(as.Date(dat$Date), "%Y"))
tapply(dat$hc > 0 & dat$lc < 0, year, mean)
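
On the second part of the question (B a subset of A), matching dates can probably be used in the same way. What follows is only a minimal sketch in base R, assuming A and B are available as vectors of dates; the names dates_A, dates_B, matched and prop are hypothetical and do not come from the posts above. It flags each date of A that also occurs in B and takes the mean of that flag within each year of A.

```r
# Hypothetical data: dates_A holds every date in A; dates_B holds the dates
# of B, the subset of A that meets the criterion (names are assumptions).
dates_A <- as.Date(c("2004-12-29", "2004-12-30", "2004-12-31", "2005-01-04",
                     "2005-01-06", "2005-01-10", "2005-01-11", "2005-01-12"))
dates_B <- dates_A[-8]   # every date except 2005-01-12, where lc is positive

# TRUE where a date of A also appears in B
matched <- dates_A %in% dates_B

# The mean of a logical vector is the proportion of TRUEs, taken per year of A
prop <- tapply(matched, format(dates_A, "%Y"), mean)
prop
```

Because the grouping uses the years of A, a year in which B has no dates at all still appears in the result, with proportion 0.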