thr3ads.net - R help - [R] duplicate values [Nov 2008]


Antje Nöthlich

2008-Nov-16 18:10 UTC

[R] duplicate values

Hi R users,

I have the following data frame:

      Datetime             Temperature   (and many more columns)
1     2008-6-1 00:00:00    5
2     2008-6-1 02:00:00    5
3     2008-6-1 03:00:00    6
4     2008-6-1 03:00:00    0
5     2008-6-1 04:00:00    6
6     2008-6-1 04:00:00    0
7     2008-6-1 05:00:00    7
8     2008-6-1 06:00:00    7
.     .                    .
.     .                    .
.     .                    .
3000  2008-8-31 00:00:00   3


The problem is that rows 3 & 4 and rows 5 & 6 have the same
"Datetime" value but differ in the "Temperature" column.
Across the whole data frame, I would like to delete every row that has the
same "Datetime" value as the previous row.
I have tried unique(dataframe), but it does not work here because the rows are
not exact duplicates of each other.
Thanks in advance for your help!

Antje

Oliver Bandel

2008-Nov-16 18:21 UTC

Antje Nöthlich <antno <at> web.de> writes:

[...]
> Now for the whole dataframe i would like to delete rows that have the same
> "Datetime" value as the prior row.

Well, if you do this, then you lose data.
Is this really what you want?
Throwing away data?
I would think it makes sense if all columns are equal, so that unique()
could be used; then you only throw away data that is already recorded
in your data frame.

But when you throw away different values because of the same date-time,
the question is: WHICH would you throw away?
All but the first? Or do you want to select a maximum or minimum?
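
The choice raised here can be made explicit in code. The following is only a sketch: the data frame DF is a hypothetical reconstruction of the eight rows shown in the question, and the result names (first_per_time, max_per_time, min_per_time) are made up for illustration.

```r
# Hypothetical data frame mirroring the eight rows from the question
DF <- data.frame(
  Datetime = c("2008-6-1 00:00:00", "2008-6-1 02:00:00",
               "2008-6-1 03:00:00", "2008-6-1 03:00:00",
               "2008-6-1 04:00:00", "2008-6-1 04:00:00",
               "2008-6-1 05:00:00", "2008-6-1 06:00:00"),
  Temperature = c(5, 5, 6, 0, 6, 0, 7, 7),
  stringsAsFactors = FALSE
)

# Keep all but the repeats: the first row for each Datetime survives
first_per_time <- DF[!duplicated(DF$Datetime), ]

# Alternatively, keep the maximum or minimum Temperature per Datetime
max_per_time <- aggregate(Temperature ~ Datetime, data = DF, FUN = max)
min_per_time <- aggregate(Temperature ~ Datetime, data = DF, FUN = min)
```

Each variant returns six rows for this data, but they disagree on the Temperature kept for 03:00 and 04:00, which is exactly why the question of WHICH row to drop matters.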


Your attempt looks strange to me...


Ciao,
   Oliver

Erik Iverson

2008-Nov-16 18:24 UTC

Antje -

I may be missing something, but I usually do this with the negation of 
duplicated instead of unique.

So, as an example:

test <- data.frame(a = rep(1:5, each = 2),
    b = rep(1:5, each = 2), c = rnorm(10))

test[!duplicated(test[c("a", "b")]), ]

Hope that helps!

Erik
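
To make the mechanics of the example above visible, here is a small sketch of the same idea with the intermediate logical vector exposed. The set.seed() call and the names dup and kept are additions for illustration and are not part of the original message.

```r
set.seed(1)  # only so that the random column c is reproducible
test <- data.frame(a = rep(1:5, each = 2),
                   b = rep(1:5, each = 2), c = rnorm(10))

# duplicated() looks only at the key columns a and b; it flags the
# second and later occurrence of each (a, b) combination
dup <- duplicated(test[c("a", "b")])

# Negating the flags keeps the first row of every (a, b) combination,
# whatever value column c holds
kept <- test[!dup, ]
```

Because the comparison ignores column c, rows that differ only in c still count as duplicates, which is the behaviour unique(test) does not give.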


jim holtman

2008-Nov-16 18:24 UTC

This should do it for you:
> x <- read.table(textConnection("Date time Temperature
+ 1        2008-6-1 00:00:00      5
+ 2        2008-6-1 02:00:00      5
+ 3        2008-6-1 03:00:00      6
+ 4        2008-6-1 03:00:00      0
+ 5        2008-6-1 04:00:00      6
+ 6        2008-6-1 04:00:00      0
+ 7        2008-6-1 05:00:00      7
+ 8        2008-6-1 06:00:00      7"), header=TRUE)
> closeAllConnections()
> # create datetime
> x$dt <- as.POSIXct(paste(x$Date, x$time))
> # create list of duplicate values next to each other
> dup <- c(FALSE, diff(x$dt) == 0)
> # remove
> x[!dup,]
      Date     time Temperature                  dt
1 2008-6-1 00:00:00           5 2008-06-01 00:00:00
2 2008-6-1 02:00:00           5 2008-06-01 02:00:00
3 2008-6-1 03:00:00           6 2008-06-01 03:00:00
5 2008-6-1 04:00:00           6 2008-06-01 04:00:00
7 2008-6-1 05:00:00           7 2008-06-01 05:00:00
8 2008-6-1 06:00:00           7 2008-06-01 06:00:00
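
Note that comparing each timestamp with the one directly before it is not the same test as duplicated(), which flags a repeat anywhere earlier in the vector. The sketch below uses a made-up, deliberately unsorted vector dt to show the difference; the names adjacent_dup and any_dup and the tz = "UTC" setting are assumptions for illustration.

```r
# Made-up timestamps: 03:00 appears twice in a row and once more later
dt <- as.POSIXct(c("2008-06-01 03:00:00", "2008-06-01 03:00:00",
                   "2008-06-01 04:00:00", "2008-06-01 03:00:00"),
                 tz = "UTC")

# TRUE where a value equals the value immediately before it
adjacent_dup <- c(FALSE, dt[-1] == dt[-length(dt)])

# TRUE where a value has occurred anywhere earlier in the vector
any_dup <- duplicated(dt)
```

Here adjacent_dup flags only the second element, while any_dup also flags the fourth. For data sorted by time, as in the question, the two tests should select the same rows.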




-- 
Jim Holtman
Cincinnati, OH
+1 513 646 9390

What is the problem that you are trying to solve?

Gabor Grothendieck

2008-Nov-16 18:33 UTC

Here are three possibilities:

# 1
DF[!duplicated(DF$Datetime), ]

# 2
aggregate(DF[-1], DF[1], head, 1)

These give the first one but if you want the
last one use the fromLast= arg of duplicated
or tail instead of head with aggregate.
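
As a sketch of the "last one" variants just described, assuming DF is a data frame whose first column is Datetime (rebuilt here from the eight rows in the question; the result names are made up):

```r
# Hypothetical data frame mirroring the eight rows from the question
DF <- data.frame(
  Datetime = c("2008-6-1 00:00:00", "2008-6-1 02:00:00",
               "2008-6-1 03:00:00", "2008-6-1 03:00:00",
               "2008-6-1 04:00:00", "2008-6-1 04:00:00",
               "2008-6-1 05:00:00", "2008-6-1 06:00:00"),
  Temperature = c(5, 5, 6, 0, 6, 0, 7, 7),
  stringsAsFactors = FALSE
)

# Variant of 1: scan from the end, so the last row per Datetime is kept
last_by_duplicated <- DF[!duplicated(DF$Datetime, fromLast = TRUE), ]

# Variant of 2: tail(x, 1) picks the last value within each Datetime group
last_by_aggregate <- aggregate(DF[-1], DF[1], tail, 1)
```

Both keep the Temperature value 0 for 03:00 and 04:00, where the head/first versions keep 6.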

# 3
# The zoo package can read in data, convert the
# first column to datetime and remove
# duplicates via aggregation all at once:

Lines <- 'Datetime,Temperature
2008-6-1 00:00:00,5
2008-6-1 02:00:00,5
2008-6-1 03:00:00,6
2008-6-1 03:00:00,0
2008-6-1 04:00:00,6
2008-6-1 04:00:00,0
2008-6-1 05:00:00,7
2008-6-1 06:00:00,7 '

library(zoo)
library(chron)
# z <- read.zoo("myfile.csv", sep = ",", header = TRUE, FUN = as.chron,
#   aggregate = function(x) head(x, 1))

z <- read.zoo(textConnection(Lines), sep = ",", header = TRUE, FUN = as.chron,
  aggregate = function(x) head(x, 1))


# or use tz = "" in place of FUN=as.chron if you want POSIXct.
# See the three zoo vignettes and on dates and times see R News 4/1.

