Hi,

I have a data frame, data, containing two columns: one, the TimeStamp (formatted using data$TimeStamp <- as.POSIXct(as.character(data$TimeStamp), format = "%d/%m/%Y %H:%M")), and two, the data value.

The data frame has been read from a .csv file and should contain 48 values for each day of the year (values sampled at 30 minute intervals). However, there are only 15,948 observations, i.e. only approx. 332 days' worth of data. I would therefore like to remove any days that do not contain all 48 values.

My question: how would I go about doing this?

Many thanks,

-A.
Hi,

Maybe this helps:

set.seed(45)
# 201 half-hourly timestamps: four complete days plus nine values on a fifth day
df1 <- data.frame(datetime = as.POSIXct("2011-05-25", tz = "GMT") + 0:200*30*60,
                  value = sample(1:40, 201, replace = TRUE),
                  value2 = sample(45:90, 201, replace = TRUE))
# ave() returns, for every row, the number of rows that share its date
df2 <- df1[ave(1:nrow(df1), as.Date(df1[,1]), FUN = length) == 48, ]
dim(df2)
#[1] 192   3

#or
library(plyr)
df3 <- df1[ddply(df1, .(as.Date(datetime)), mutate, Ldt = length(datetime) == 48)$Ldt, ]
identical(df3, df2)
#[1] TRUE

A.K.

----- Original Message -----
From: "aj409 at bath.ac.uk" <aj409 at bath.ac.uk>
To: r-help at r-project.org
Sent: Friday, October 4, 2013 11:03 AM
Subject: [R] Subsetting Timestamped data

______________________________________________
R-help at r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.
Here is an approach using base R tools (not tested, so I hope I don't embarrass myself!)

dayid <- format(data$TimeStamp, '%Y-%m-%d')
day.counts <- table(dayid)
good.days <- names(day.counts)[day.counts == 48]
subset(data, dayid %in% good.days)

This could be written as a one-liner, but it's much easier to understand and to check if done step by step.

(And I'll indulge in a side comment ... as a matter of personal opinion, I think it's beneficial to learn how to do basic data manipulation using base R tools before delving into the use of more sophisticated functions from various packages. This helps build R skills.)

-Don

--
Don MacQueen
Lawrence Livermore National Laboratory
7000 East Ave., L-627
Livermore, CA 94550
925-423-1062
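
For anyone wanting to check the step-by-step base R approach before running it on real data, here is a minimal sketch on made-up data. The three-day sample, the five deleted readings, the result name complete, and the use of tz = "GMT" are all assumptions for illustration; none of them come from the original poster's file.

```r
# Hypothetical sample data: three days at 30-minute intervals (144 rows),
# written as text in the same "%d/%m/%Y %H:%M" layout as the question.
stamps <- as.POSIXct("2013-01-01", tz = "GMT") + 0:143 * 30 * 60
data <- data.frame(TimeStamp = format(stamps, "%d/%m/%Y %H:%M"),
                   value = seq_along(stamps),
                   stringsAsFactors = FALSE)

# Remove five readings from the second day so that it is incomplete.
data <- data[-(60:64), ]

# Parse the text column; tz = "GMT" is an assumption made for this sketch.
data$TimeStamp <- as.POSIXct(data$TimeStamp,
                             format = "%d/%m/%Y %H:%M", tz = "GMT")

# Label each row with its calendar day, count rows per day,
# and keep only the rows belonging to days with all 48 readings.
dayid <- format(data$TimeStamp, "%Y-%m-%d")
day.counts <- table(dayid)
good.days <- names(day.counts)[day.counts == 48]
complete <- data[dayid %in% good.days, ]

nrow(complete)                                   # 96 rows: two full days
unique(format(complete$TimeStamp, "%Y-%m-%d"))   # the days that were kept
```

One thing to watch for: if the timestamps are parsed in a time zone that observes daylight saving, the changeover days may hold 46 or 50 half-hour slots rather than 48, so a test for exactly 48 would drop them as well.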