R I am taking an excel dataset and reading it into R using read.table. (actually I am dumping the data into a .txt file first and then reading data in to R). Here is snippet:> head(data);Date Price Open.Int. Comm.Long Comm.Short net.comm 1 15-Jan-86 673.25 175645 65910 28425 37485 2 31-Jan-86 677.00 167350 54060 27120 26940 3 14-Feb-86 680.25 157985 37955 25425 12530 4 28-Feb-86 691.75 162775 49760 16030 33730 5 14-Mar-86 706.50 163495 54120 27995 26125 6 31-Mar-86 709.75 164120 54715 30390 24325 The dataset runs from 1986 to 2007. I want to be able to take subsets of my data based on date e.g. data between 2000 - 2005. As it stands, I can't work with the dates as they are not in correct format. I tried successfully converting the dates to just the year using: transform(data, Yr = format(as.Date(as.character(Date),format = '%d-%b-%y'), "%y"))) This gives the following format: Date Price Open.Int. Comm.Long Comm.Short net.comm Yr 1 15-Jan-86 673.25 175645 65910 28425 37485 86 2 31-Jan-86 677.00 167350 54060 27120 26940 86 3 14-Feb-86 680.25 157985 37955 25425 12530 86 4 28-Feb-86 691.75 162775 49760 16030 33730 86 5 14-Mar-86 706.50 163495 54120 27995 26125 86 6 31-Mar-86 709.75 164120 54715 30390 24325 86 I can subset for a single year e.g: head(subset(df, Yr =="00") But how can I subset for multiple periods e.g 00- 05? The following won't work: head(subset(df, Yr =="00" & Yr=="01") or head(subset(df, Yr = c("00","01","02","03") I can't help but feeling that I am missing something and there is a simpler route. I leafed through R newletter 4.1 which deals with dates and times but it seemed that strptime and POSIXct / POSIXlt are not what I need either. Can anybody help me? Regards Alex
Try some of the following: head(subset(df, Yr %in% c("00","01","02","03"))) subset(df, (Yr >= '00') & (Yr <= '03')) # same as above subset(df, (Yr == '00') | (Yr == '01') | (Yr == '02') |(Yr == '03')) # same On 7/19/07, Alex Park <alex.park1 at ntlworld.com> wrote:> R > > I am taking an excel dataset and reading it into R using read.table. > (actually I am dumping the data into a .txt file first and then reading data > in to R). > > Here is snippet: > > > head(data); > Date Price Open.Int. Comm.Long Comm.Short net.comm > 1 15-Jan-86 673.25 175645 65910 28425 37485 > 2 31-Jan-86 677.00 167350 54060 27120 26940 > 3 14-Feb-86 680.25 157985 37955 25425 12530 > 4 28-Feb-86 691.75 162775 49760 16030 33730 > 5 14-Mar-86 706.50 163495 54120 27995 26125 > 6 31-Mar-86 709.75 164120 54715 30390 24325 > > The dataset runs from 1986 to 2007. > > I want to be able to take subsets of my data based on date e.g. data between > 2000 - 2005. > > As it stands, I can't work with the dates as they are not in correct format. > > I tried successfully converting the dates to just the year using: > > transform(data, Yr = format(as.Date(as.character(Date),format = '%d-%b-%y'), > "%y"))) > > This gives the following format: > > Date Price Open.Int. Comm.Long Comm.Short net.comm Yr > 1 15-Jan-86 673.25 175645 65910 28425 37485 86 > 2 31-Jan-86 677.00 167350 54060 27120 26940 86 > 3 14-Feb-86 680.25 157985 37955 25425 12530 86 > 4 28-Feb-86 691.75 162775 49760 16030 33730 86 > 5 14-Mar-86 706.50 163495 54120 27995 26125 86 > 6 31-Mar-86 709.75 164120 54715 30390 24325 86 > > I can subset for a single year e.g: > > head(subset(df, Yr =="00") > > But how can I subset for multiple periods e.g 00- 05? The following won't > work: > > head(subset(df, Yr =="00" & Yr=="01") > > or > > head(subset(df, Yr = c("00","01","02","03") > > I can't help but feeling that I am missing something and there is a simpler > route. > > I leafed through R newletter 4.1 which deals with dates and times but it > seemed that strptime and POSIXct / POSIXlt are not what I need either. > > Can anybody help me? > > Regards > > > Alex > > ______________________________________________ > R-help at stat.math.ethz.ch mailing list > https://stat.ethz.ch/mailman/listinfo/r-help > PLEASE do read the posting guide http://www.R-project.org/posting-guide.html > and provide commented, minimal, self-contained, reproducible code. >-- Jim Holtman Cincinnati, OH +1 513 646 9390 What is the problem you are trying to solve?
Alex:> I am taking an excel dataset and reading it into R using read.table.This sets up a "data.frame" object. The data you have are probably more conveniently represented as a time series, storing the date in an appropriate format, e.g., in class "Date".> (actually I am dumping the data into a .txt file first and then reading data > in to R).Then you can do both steps (calling read.table() and transformation to a time series) in one go using the function read.zoo() from package "zoo". If your text file looks like Date Price Open.Int. Comm.Long Comm.Short net.comm 15-Jan-86 673.25 175645 65910 28425 37485 31-Jan-86 677.00 167350 54060 27120 26940 14-Feb-86 680.25 157985 37955 25425 12530 28-Feb-86 691.75 162775 49760 16030 33730 14-Mar-86 706.50 163495 54120 27995 26125 31-Mar-86 709.75 164120 54715 30390 24325 then you can read it in via z <- read.zoo("mydata.txt", format = "%d-%b-%y", header = TRUE) Then you can do all sorts of standard things for time series, such as plot(z) or...> The dataset runs from 1986 to 2007. > > I want to be able to take subsets of my data based on date e.g. data between > 2000 - 2005....subsetting z2 <- window(z, start = as.Date("2000-01-01"), end = as.Date("2005-12-31")) etc. Look at the "zoo" package vignettes for more information vignette("zoo-quickref", package = "zoo") vignette("zoo", package = "zoo") hth, Z> As it stands, I can't work with the dates as they are not in correct format. > > I tried successfully converting the dates to just the year using: > > transform(data, Yr = format(as.Date(as.character(Date),format = '%d-%b-%y'), > "%y"))) > > This gives the following format: > > Date Price Open.Int. Comm.Long Comm.Short net.comm Yr > 1 15-Jan-86 673.25 175645 65910 28425 37485 86 > 2 31-Jan-86 677.00 167350 54060 27120 26940 86 > 3 14-Feb-86 680.25 157985 37955 25425 12530 86 > 4 28-Feb-86 691.75 162775 49760 16030 33730 86 > 5 14-Mar-86 706.50 163495 54120 27995 26125 86 > 6 31-Mar-86 709.75 164120 54715 30390 24325 86 > > I can subset for a single year e.g: > > head(subset(df, Yr =="00") > > But how can I subset for multiple periods e.g 00- 05? The following won't > work: > > head(subset(df, Yr =="00" & Yr=="01") > > or > > head(subset(df, Yr = c("00","01","02","03") > > I can't help but feeling that I am missing something and there is a simpler > route. > > I leafed through R newletter 4.1 which deals with dates and times but it > seemed that strptime and POSIXct / POSIXlt are not what I need either. > > Can anybody help me? > > Regards > > > Alex > > ______________________________________________ > R-help at stat.math.ethz.ch mailing list > https://stat.ethz.ch/mailman/listinfo/r-help > PLEASE do read the posting guide http://www.R-project.org/posting-guide.html > and provide commented, minimal, self-contained, reproducible code. > >
> I am taking an excel dataset and reading it into R using read.table. > (actually I am dumping the data into a .txt file first and then reading data > in to R).If you are on *windows* you could also try my xlsReadWrite package which contains some datetime functions. Exceldates (e.g. formatted as dd-mmm-yy) can be read as COleDateTime (floating point) values or as character strings. The first one is preferable imo as it avoids a typecast and it is the type commonly used in OLE automation.> But how can I subset for multiple periods e.g 00- 05?Floating point numbers dates can be converted to year-strings with dateTimeToStr( value, "yy" ) and then subset as shown in a previous post. The (paid) pro version contains many more date functions, e.g. yearOf. Details see http://treetron.googlepages.com/xls.oledatetimeex.html. -- Regards, Hans-Peter