Hi R users,

I have the following data frame:

      Datetime             Temperature   (and many more columns)
1     2008-6-1 00:00:00    5
2     2008-6-1 02:00:00    5
3     2008-6-1 03:00:00    6
4     2008-6-1 03:00:00    0
5     2008-6-1 04:00:00    6
6     2008-6-1 04:00:00    0
7     2008-6-1 05:00:00    7
8     2008-6-1 06:00:00    7
.     .                    .
.     .                    .
.     .                    .
3000  2008-8-31 00:00:00   3

The problem is that rows 3 and 4, and rows 5 and 6, have the same "Datetime" value but differ in the "Temperature" column. For the whole data frame, I would like to delete every row that has the same "Datetime" value as the previous row. I have tried unique(dataframe), but it does not work here because the rows are not true duplicates of each other.

Thanks in advance for your help!

Antje
Antje Nöthlich <antno <at> web.de> writes:

[...]
> Now for the whole dataframe i would like to delete rows that have the same
> "Datetime" value as the prior row.

Well, if you do this, you lose data. Is that really what you want, to throw away data?

It would make sense if all columns were equal, so that unique() could be used: then you would only discard data that is already recorded in your data frame. But if you discard differing values just because they share a date-time, the question is WHICH rows to discard. All but the first? Or do you want to select the maximum or the minimum?

Your approach looks strange to me...

Ciao,
Oliver
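To make Oliver's question concrete, here is a minimal sketch of the "maximum or minimum" alternatives. It assumes a data frame called DF (a hypothetical name) that holds only the first eight rows shown in the original post; with more columns, each one would need its own rule.

```r
# Hypothetical data frame mirroring the first eight rows of the original post
DF <- data.frame(
  Datetime = c("2008-6-1 00:00:00", "2008-6-1 02:00:00", "2008-6-1 03:00:00",
               "2008-6-1 03:00:00", "2008-6-1 04:00:00", "2008-6-1 04:00:00",
               "2008-6-1 05:00:00", "2008-6-1 06:00:00"),
  Temperature = c(5, 5, 6, 0, 6, 0, 7, 7),
  stringsAsFactors = FALSE
)

# Collapse rows that share a Datetime, keeping the highest Temperature
max_temp <- aggregate(Temperature ~ Datetime, data = DF, FUN = max)

# Same idea, keeping the lowest Temperature instead
min_temp <- aggregate(Temperature ~ Datetime, data = DF, FUN = min)

print(max_temp)
print(min_temp)
```

Both calls return one row per distinct Datetime; which summary is appropriate depends on why the duplicate readings exist in the first place.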
Antje -

I may be missing something, but I usually do this with the negation of duplicated() instead of unique(). So, as an example:

    # toy data: columns a and b repeat in pairs, c is random
    test <- data.frame(a = rep(1:5, each = 2), b = rep(1:5, each = 2), c = rnorm(10))
    # keep only the first row of each (a, b) combination
    test[!duplicated(test[c("a", "b")]), ]

Hope that helps!
Erik

Antje Nöthlich wrote:
> [...]
>
> ______________________________________________
> R-help at r-project.org mailing list
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.
This should do it for you:

> x <- read.table(textConnection("Date time Temperature
+ 1 2008-6-1 00:00:00 5
+ 2 2008-6-1 02:00:00 5
+ 3 2008-6-1 03:00:00 6
+ 4 2008-6-1 03:00:00 0
+ 5 2008-6-1 04:00:00 6
+ 6 2008-6-1 04:00:00 0
+ 7 2008-6-1 05:00:00 7
+ 8 2008-6-1 06:00:00 7"), header=TRUE)
> closeAllConnections()
> # create datetime
> x$dt <- as.POSIXct(paste(x$Date, x$time))
> # flag rows whose datetime equals that of the previous row
> dup <- c(FALSE, diff(x$dt) == 0)
> # remove the flagged rows
> x[!dup,]
      Date     time Temperature                  dt
1 2008-6-1 00:00:00           5 2008-06-01 00:00:00
2 2008-6-1 02:00:00           5 2008-06-01 02:00:00
3 2008-6-1 03:00:00           6 2008-06-01 03:00:00
5 2008-6-1 04:00:00           6 2008-06-01 04:00:00
7 2008-6-1 05:00:00           7 2008-06-01 05:00:00
8 2008-6-1 06:00:00           7 2008-06-01 06:00:00

On Sun, Nov 16, 2008 at 1:10 PM, Antje Nöthlich <antno at web.de> wrote:
> [...]

--
Jim Holtman
Cincinnati, OH
+1 513 646 9390

What is the problem that you are trying to solve?
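One point worth spelling out about the diff() approach: it flags only rows whose datetime equals that of the immediately preceding row, which is exactly what Antje asked for, whereas duplicated() flags any repeat, adjacent or not. The two agree when the data are sorted by time. A small sketch with made-up timestamps (not taken from the original data; the variable names are illustrative) shows where they differ:

```r
# Made-up timestamps: 03:00 appears twice in a row and then once more later
dt <- as.POSIXct(c("2008-06-01 03:00:00", "2008-06-01 03:00:00",
                   "2008-06-01 04:00:00", "2008-06-01 03:00:00"),
                 tz = "UTC")

# TRUE where a timestamp equals the one directly before it
adjacent_dup <- c(FALSE, as.numeric(diff(dt)) == 0)

# TRUE where a timestamp has already occurred anywhere earlier
any_dup <- duplicated(dt)

print(adjacent_dup)
print(any_dup)
```

Only the second vector marks the last element, because that 03:00 reading repeats an earlier value without directly following it.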
Here are three possibilities:

# 1
DF[!duplicated(DF$Datetime), ]

# 2
aggregate(DF[-1], DF[1], head, 1)

These give the first row of each group of duplicates; if you want the last one, use the fromLast= argument of duplicated, or tail instead of head with aggregate.

# 3
# The zoo package can read in data, convert the
# first column to datetime and remove
# duplicates via aggregation all at once:

Lines <- 'Datetime,Temperature
2008-6-1 00:00:00,5
2008-6-1 02:00:00,5
2008-6-1 03:00:00,6
2008-6-1 03:00:00,0
2008-6-1 04:00:00,6
2008-6-1 04:00:00,0
2008-6-1 05:00:00,7
2008-6-1 06:00:00,7
'
library(zoo)
library(chron)
# z <- read.zoo("myfile.csv", sep = ",", header = TRUE, FUN = as.chron,
#   aggregate = function(x) head(x, 1))
z <- read.zoo(textConnection(Lines), sep = ",", header = TRUE, FUN = as.chron,
  aggregate = function(x) head(x, 1))

# or use tz = "" in place of FUN=as.chron if you want POSIXct.

See the three zoo vignettes, and on dates and times see R News 4/1.
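For completeness, the "last one" variants of #1 and #2 mentioned above can be sketched as follows. DF here is a hypothetical data frame built from the eight sample rows of the original post, with Datetime as the first column so that DF[1] and DF[-1] split it the same way as in #2.

```r
# Hypothetical data frame built from the eight sample rows of the original post
DF <- data.frame(
  Datetime = c("2008-6-1 00:00:00", "2008-6-1 02:00:00", "2008-6-1 03:00:00",
               "2008-6-1 03:00:00", "2008-6-1 04:00:00", "2008-6-1 04:00:00",
               "2008-6-1 05:00:00", "2008-6-1 06:00:00"),
  Temperature = c(5, 5, 6, 0, 6, 0, 7, 7),
  stringsAsFactors = FALSE
)

# Variant of #1: scan from the end, so the last row of each group survives
last_rows <- DF[!duplicated(DF$Datetime, fromLast = TRUE), ]

# Variant of #2: take the last value of every non-key column per Datetime
last_agg <- aggregate(DF[-1], DF[1], tail, 1)

print(last_rows)
print(last_agg)
```

The first form keeps whole original rows (and their row names), while the aggregate form rebuilds the data frame column by column, which matters once there are many columns.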