Hi,

I have several files with data in this format:

20070102
20070102
20070106
20070201
...

The data is sorted and each line represents a date (YYYYMMDD). I would like to analyze this data using R. For instance, I would like to have a histogram by year, month or day.

I've already made a simple Perl script that aggregates this data, but I believe that R can be much more powerful and easier to use for this kind of work.

Any suggestions on where to start?

Thanks in advance,
Sérgio Nunes

Here is a start on what you want to do. This generates some test data and then does a couple of summaries:

> # generate some data
> N <- 1000
> x <- data.frame(date=as.character(20070000 + sample(1:4, N, TRUE) * 100 +
+                                   sample(1:31, N, TRUE)),
+                 value=runif(N))
> head(x)  # display the data
      date      value
1 20070124 0.07904540
2 20070117 0.17864565
3 20070109 0.86078870
4 20070205 0.93952259
5 20070112 0.87904425
6 20070323 0.01717623
> # assuming you read it in as character, convert to Date for processing
> x$date <- as.Date(strptime(x$date, "%Y%m%d"))
> x <- x[order(x$date), ]  # order by date for plotting
> plot(x$date, x$value, type='l')  # plot the data
> # show counts by month
> table(months(x$date))

   April February  January    March
     238      236      253      237
> # average by month
> aggregate(x$value, list(months(x$date)), mean)
   Group.1         x
1    April 0.4791387
2 February 0.5010831
3  January 0.5114135
4    March 0.4695668
>

On 2/15/07, Sérgio Nunes <snunes@gmail.com> wrote:
> The data is sorted and each line represents a date (YYYYMMDD). I would
> like to analyze this data using R. For instance, I would like to have
> a histogram by year, month or day.

-- 
Jim Holtman
Cincinnati, OH
+1 513 646 9390

What is the problem you are trying to solve?

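The reply above works on generated test data. As a minimal sketch of how the same conversion might be applied to dates in the original YYYYMMDD format and turned into counts by year, month and day, consider the following. The sample vector `raw` and the file name `dates.txt` are hypothetical and only stand in for one of the real files.

```r
# Hypothetical sample standing in for the contents of one data file.
# With a real file this could be: raw <- readLines("dates.txt")
raw <- c("20070102", "20070102", "20070106", "20070201")

# Convert the character strings to Date objects
dates <- as.Date(raw, format = "%Y%m%d")

# Counts per year, per year-month, and per individual day
by_year  <- table(format(dates, "%Y"))
by_month <- table(format(dates, "%Y-%m"))
by_day   <- table(dates)

print(by_year)
print(by_month)
print(by_day)

# Bar chart of the monthly counts
barplot(by_month, main = "Dates per month", ylab = "Count")
```

Tabulating first and then calling barplot() keeps the bins explicit. hist() also has a method for Date objects that accepts breaks such as "months", which may be more convenient for longer series.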
Sérgio Nunes wrote:
> The data is sorted and each line represents a date (YYYYMMDD). I would
> like to analyze this data using R. For instance, I would like to have
> a histogram by year, month or day.

Analysing a particular day is easy: just call, e.g.,

    hist(your.variable[day == 20070101])

if 'day' contains the date stored as an integer. If you want to do this for all days, you could loop over unique(day).

Finally, to do a monthly/yearly analysis, just create 'month' and 'year':

    month <- trunc(day/100)    # year and month combined, e.g. 200701
    year <- trunc(day/10000)   # e.g. 2007

HTH
Petr Klasterecky

-- 
Dept. of Probability and Statistics
Charles University in Prague
Czech Republic

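The integer arithmetic suggested above can be sketched as follows. This is only an illustration under assumptions: the vectors `day` and `value` are hypothetical sample data, `yearmonth` is a name introduced here for what the reply calls 'month', and table() and tapply() are used in place of an explicit loop over unique(day).

```r
# Hypothetical sample: dates stored as integers in YYYYMMDD form,
# with one made-up measurement per date
day   <- c(20070102L, 20070102L, 20070106L, 20070201L)
value <- c(0.2, 0.5, 0.9, 0.4)

# Integer division gives the same results as trunc(day/100) and
# trunc(day/10000) for positive values
yearmonth <- day %/% 100          # e.g. 200701
year      <- day %/% 10000        # e.g. 2007
month     <- yearmonth %% 100     # month number alone, 1 to 12

# Counts per distinct day and per year-month, without an explicit loop
day_counts   <- table(day)
month_counts <- table(yearmonth)
print(day_counts)
print(month_counts)

# Mean of the measurements within each year-month
month_means <- tapply(value, yearmonth, mean)
print(month_means)
```

Keeping the year in `yearmonth` avoids mixing, say, January 2006 with January 2007 when the files span several years.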
Hi again,

I'm still trying to read my data, but I'm having some difficulty converting it to dates. My data file has one date per line, in the format >2007/02/16< (without the >,<). I've tried the following:

> d <- readLines("file.dat")
> d
[1] "2006/08/09" "2004/02/11" "2004/06/09" ...
[2] ...
> d2 <- as.Date(d, format="%Y/%m/$d")
> d2
[1] NA NA NA ...
...

I'm surely doing something wrong. Any advice would be welcome.

Thanks!
Sérgio Nunes
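
A likely cause of the NA values is the format string in the call above: it ends in `$d`, which is matched as literal text, where the day-of-month conversion is written `%d`. A minimal sketch of the difference, using the three values shown in the message (the names `bad` and `good` are introduced here for illustration):

```r
# The three values shown in the message above
d <- c("2006/08/09", "2004/02/11", "2004/06/09")

# "$d" is treated as literal text, so no input matches and every result is NA
bad <- as.Date(d, format = "%Y/%m/$d")
print(bad)

# "%d" is the day-of-month conversion specification
good <- as.Date(d, format = "%Y/%m/%d")
print(good)

# Once parsed, the dates can be tabulated, for example by year
print(table(format(good, "%Y")))
```

The conversion specifications accepted in the format argument are listed in the help page for strptime.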