Hi everyone,

I have some records that include a date attribute holding the date and time, but I need to analyze the data separately in GIS by month and year, so I need to pull these attributes out into their own attribute fields.

So the input, RawData2.., returns

  ID   period_end_date
1 22 9/10/2007 0:00:00
2 44  2/2/2006 0:00:00

and I need to get

  ID   period_end_date Month Year
  22 9/10/2007 0:00:00     9 2007
  44  2/2/2006 0:00:00     2 2006

The code below gets me this in list form, which I can then add back into the initial data frame, BUT I have over 4.5 million records, and when I ran it, it ran for more than 18 hours and only got through about 2.7 million records before I gave up and ended the process.

So how can I make this more efficient, and possibly add the new attributes (month/year) to the data frame on the fly?

Thanks guys....

#Create sample data
RawData2..<-data.frame(ID=c(22,44),period_end_date=c("9/10/2007 0:00:00","2/2/2006 0:00:00"))

#Create lists to store month and year results
Data.Month_<-list()
Data.Year_<-list()
#Pull out year/month attribute and put in own column
for(i in 1:length(RawData2..$ID)){
     #Select record
     Data.X<-RawData2..[i,]
     #Separate date into month, day, and year
     #(as.character() guards against the column being stored as a factor)
     DateSplit<-strsplit(as.character(Data.X$period_end_date),"/")
     #Select month
     Month<-unlist(DateSplit)[1]
     #Separate year from time attribute
     Year.X<-strsplit(unlist(DateSplit)[3]," ")
     Year.Y<-unlist(Year.X)[1]
     Data.Month_[[i]]<-Month
     Data.Year_[[i]]<-Year.Y
}

--
View this message in context: http://r.789695.n4.nabble.com/Alter-character-attribute-tp3018202p3018202.html
Sent from the R help mailing list archive at Nabble.com.
Try this:

> x <- read.table(textConnection("  ID      date    time
+ 1 22 9/10/2007 0:00:00
+ 2 44  2/2/2006 0:00:00"), header = TRUE)
> closeAllConnections()
> x
  ID      date    time
1 22 9/10/2007 0:00:00
2 44  2/2/2006 0:00:00
> x$month <- sub("^([[:digit:]]+).*", "\\1", x$date)
> x$year <- sub(".*?([[:digit:]]+)$", "\\1", x$date)
> x
  ID      date    time month year
1 22 9/10/2007 0:00:00     9 2007
2 44  2/2/2006 0:00:00     2 2006

On Thu, Oct 28, 2010 at 6:40 PM, LCOG1 <jroll at lcog.org> wrote:
> Hi everyone
>
> I have some records that include a date attribute for the date and time but
> i need to separate the data and analyze it separately in GIS by Month and
> Year, so i need to pull these attributes out and create their own attribute
> field.
> [...]

--
Jim Holtman
Cincinnati, OH
+1 513 646 9390

What is the problem that you are trying to solve?
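
For comparison, the splitting logic from the original loop can also be applied to the whole column at once instead of row by row. The following is only a minimal sketch under stated assumptions: the dates are stored as character strings in month/day/year order, and `parts` is a hypothetical helper name that does not appear in the thread. Avoiding the per-row data frame indexing and the growing lists is what should make the difference on 4.5 million records.

```r
# Sample data from the original post; stringsAsFactors = FALSE keeps the
# dates as character strings regardless of the R version in use.
RawData2.. <- data.frame(
  ID = c(22, 44),
  period_end_date = c("9/10/2007 0:00:00", "2/2/2006 0:00:00"),
  stringsAsFactors = FALSE
)

# Split every date string on "/" or " " in a single vectorized call.
# Each element of `parts` (hypothetical name) is c(month, day, year, time).
parts <- strsplit(RawData2..$period_end_date, "[/ ]")

# Take the 1st piece (month) and the 3rd piece (year) from every element.
RawData2..$Month <- as.integer(vapply(parts, `[`, character(1), 1))
RawData2..$Year  <- as.integer(vapply(parts, `[`, character(1), 3))
```

The new columns are written straight into the data frame, so no intermediate lists need to be merged back afterwards.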
I didn't see your test data, so here is the solution with your data:

> RawData2..<-data.frame(ID=c(22,44),period_end_date=c("9/10/2007 0:00:00",
+ "2/2/2006 0:00:00"))
> RawData2..$month <- sub("^([[:digit:]]+).*", "\\1", RawData2..$period_end_date)
> RawData2..$year <- sub(".*/([[:digit:]]+) .*", "\\1", RawData2..$period_end_date)
> RawData2..
  ID   period_end_date month year
1 22 9/10/2007 0:00:00     9 2007
2 44  2/2/2006 0:00:00     2 2006

--
Jim Holtman
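
One point worth keeping in mind with the `sub()` approach is that `sub()` returns its input unchanged when the pattern does not match, so a malformed date would end up verbatim in the month or year column. The sketch below is one hedged way to guard against that. The names `dates`, `pattern`, `ok`, `month`, and `year` are hypothetical, the combined pattern is an assumption rather than the one used above, and the third value is invented purely to illustrate a non-matching record.

```r
# Two values from the thread plus one made-up malformed entry.
dates <- c("9/10/2007 0:00:00", "2/2/2006 0:00:00", "not a date")

# One pattern capturing month, day, and year (assumes month/day/year order).
pattern <- "^([[:digit:]]+)/([[:digit:]]+)/([[:digit:]]+) .*$"

# Flag the entries that actually match before extracting anything.
ok <- grepl(pattern, dates)

# Start from NA and fill in only the rows that matched.
month <- rep(NA_integer_, length(dates))
year  <- rep(NA_integer_, length(dates))
month[ok] <- as.integer(sub(pattern, "\\1", dates[ok]))
year[ok]  <- as.integer(sub(pattern, "\\3", dates[ok]))
```

Counting `sum(!ok)` afterwards gives a quick check of how many records did not fit the expected layout.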
If you convert the dates to R date objects, I think things will be easier:

> rawdata2$period_end_date = as.Date(rawdata2$period_end_date,format='%m/%d/%Y')
> rawdata2$mon = as.numeric(format(rawdata2$period_end_date,'%m'))
> rawdata2$year = as.numeric(format(rawdata2$period_end_date,'%Y'))

(I'm assuming you're using month/day/year.)

I can pretty much guarantee it will run in less than 18 hours :-)

                                        - Phil Spector
                                         Statistical Computing Facility
                                         Department of Statistics
                                         UC Berkeley
                                         spector at stat.berkeley.edu
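
As a variation on the date-object approach, the month and year can also be read directly from the fields of a `POSIXlt` object, which avoids formatting the dates back into strings. This is only a sketch under assumptions not stated in the thread: the column always carries a time part in `H:M:S` form, the time zone is irrelevant (UTC is used here), the original column is left unmodified, and `parsed` is a hypothetical name.

```r
# Sample data from the original post, kept as character strings.
RawData2.. <- data.frame(
  ID = c(22, 44),
  period_end_date = c("9/10/2007 0:00:00", "2/2/2006 0:00:00"),
  stringsAsFactors = FALSE
)

# Parse the whole column in one vectorized call.
# Assumption: month/day/year order followed by hours:minutes:seconds.
parsed <- as.POSIXlt(RawData2..$period_end_date,
                     format = "%m/%d/%Y %H:%M:%S", tz = "UTC")

# POSIXlt stores months as 0-11 and years as years since 1900,
# so both need an offset to give calendar values.
RawData2..$Month <- parsed$mon + 1L
RawData2..$Year  <- parsed$year + 1900L
```

Strings that fail to parse come back as `NA`, so `anyNA(RawData2..$Month)` is a quick check for records that do not match the assumed format.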
Changing the field into date format and then pulling out the month/year worked best. Thanks, I knew it was gonna be easy.

Cheers

--
View this message in context: http://r.789695.n4.nabble.com/Alter-character-attribute-tp3018202p3018255.html