Morway, Eric
2015-Jan-29 00:54 UTC
[R] adding an additional column for preserving uniqueness
The two datasets below are excerpts from much larger datasets. Note that
there are duplicate dates in both dat1 and dat2, e.g., "2009-10-14".
dat1 <- read.table(textConnection("Date ConcAve
2009-07-08 7
2009-08-26 1
2009-08-26 2
2009-09-15 2
2009-10-14 2
2009-10-14 2
2009-10-16 101
2009-10-16 93
2009-11-18 4
2009-11-18 3
2010-01-04 4"),header=T)
closeAllConnections()
dat2 <- read.table(textConnection("Date ConcAve
2009-08-26 4.84e-05
2009-09-15 4.58e-05
2009-10-14 3.86e-05
2009-10-14 3.55e-05
2009-10-16 3.07e-05
2009-10-16 2.35e-05
2009-11-18 2.00e-05
2009-11-18 1.96e-05
2010-01-04 1.52e-05
2010-01-04 1.53e-05
2010-02-10 2.23e-05"),header=T)
closeAllConnections()
I'm seeking an R operation that will append a third column to both
data.frame's such that it makes these duplicates unique when I run merge().
The desired result for dat1 would be:
Date ConcAve item
2009-07-08 7 1
2009-08-26 1 1
2009-08-26 2 2
2009-09-15 2 1
2009-10-14 2 1
2009-10-14 2 2
2009-10-16 101 1
2009-10-16 93 2
2009-11-18 4 1
2009-11-18 3 2
2010-01-04 4 1
this way, I don't get this:
merge(dat1, dat2, by="Date")
# Date ConcAve.x ConcAve.y
#1 2009-08-26 1 4.84e-05
#2 2009-08-26 2 4.84e-05
#3 2009-09-15 2 4.58e-05
#4 2009-10-14 2 3.55e-05
#5 2009-10-14 2 3.86e-05
#6 2009-10-14 2 3.55e-05
#7 2009-10-14 2 3.86e-05
#8 2009-10-16 101 3.07e-05
#9 2009-10-16 101 2.35e-05
#10 2009-10-16 93 3.07e-05
#11 2009-10-16 93 2.35e-05
#12 2009-11-18 4 1.96e-05
#13 2009-11-18 4 2.00e-05
#14 2009-11-18 3 1.96e-05
#15 2009-11-18 3 2.00e-05
#16 2010-01-04 4 1.52e-05
#17 2010-01-04 4 1.53e-05
With the new column, which I've inserted manually in this small example, I
instead get the merge result below, which is what I'm after for the larger
problem:
dat3 <- read.table(textConnection("Date ConcAve item
2009-07-08 7 1
2009-08-26 1 1
2009-08-26 2 2
2009-09-15 2 1
2009-10-14 2 1
2009-10-14 2 2
2009-10-16 101 1
2009-10-16 93 2
2009-11-18 4 1
2009-11-18 3 2
2010-01-04 4 1"),header=T)
closeAllConnections()
dat4 <- read.table(textConnection("Date ConcAve item
2009-08-26 4.84e-05 1
2009-09-15 4.58e-05 1
2009-10-14 3.86e-05 1
2009-10-14 3.55e-05 2
2009-10-16 3.07e-05 1
2009-10-16 2.35e-05 2
2009-11-18 2.00e-05 1
2009-11-18 1.96e-05 2
2010-01-04 1.52e-05 1
2010-01-04 1.53e-05 2
2010-02-10 2.23e-05 1"),header=T)
closeAllConnections()
merge(dat3, dat4, by=c("Date","item"))
# Date item ConcAve.x ConcAve.y
#1 2009-08-26 1 1 4.84e-05
#2 2009-09-15 1 2 4.58e-05
#3 2009-10-14 1 2 3.86e-05
#4 2009-10-14 2 2 3.55e-05
#5 2009-10-16 1 101 3.07e-05
#6 2009-10-16 2 93 2.35e-05
#7 2009-11-18 1 4 2.00e-05
#8 2009-11-18 2 3 1.96e-05
#9 2010-01-04 1 4 1.52e-05
[[alternative HTML version deleted]]
William Dunlap
2015-Jan-29 01:38 UTC
[R] adding an additional column for preserving uniqueness
> with(dat1, ave(integer(length(Date)), Date, FUN=seq_along))[1] 1 1 2 1 1 2 1 2 1 2 1 Bill Dunlap TIBCO Software wdunlap tibco.com On Wed, Jan 28, 2015 at 4:54 PM, Morway, Eric <emorway at usgs.gov> wrote:> The two datasets below are excerpts from much larger datasets. Note that > there are duplicate dates in both dat1 and dat2, e.g., "2009-10-14". > > dat1 <- read.table(textConnection("Date ConcAve > 2009-07-08 7 > 2009-08-26 1 > 2009-08-26 2 > 2009-09-15 2 > 2009-10-14 2 > 2009-10-14 2 > 2009-10-16 101 > 2009-10-16 93 > 2009-11-18 4 > 2009-11-18 3 > 2010-01-04 4"),header=T) > closeAllConnections() > > dat2 <- read.table(textConnection("Date ConcAve > 2009-08-26 4.84e-05 > 2009-09-15 4.58e-05 > 2009-10-14 3.86e-05 > 2009-10-14 3.55e-05 > 2009-10-16 3.07e-05 > 2009-10-16 2.35e-05 > 2009-11-18 2.00e-05 > 2009-11-18 1.96e-05 > 2010-01-04 1.52e-05 > 2010-01-04 1.53e-05 > 2010-02-10 2.23e-05"),header=T) > closeAllConnections() > > I'm seeking an R operation that will append a third column to both > data.frame's such that it makes these duplicates unique when I run merge(). > The desired result for dat1 would be: > > Date ConcAve item > 2009-07-08 7 1 > 2009-08-26 1 1 > 2009-08-26 2 2 > 2009-09-15 2 1 > 2009-10-14 2 1 > 2009-10-14 2 2 > 2009-10-16 101 1 > 2009-10-16 93 2 > 2009-11-18 4 1 > 2009-11-18 3 2 > 2010-01-04 4 1 > > this way, I don't get this: > > merge(dat1, dat2, by="Date") > # Date ConcAve.x ConcAve.y > #1 2009-08-26 1 4.84e-05 > #2 2009-08-26 2 4.84e-05 > #3 2009-09-15 2 4.58e-05 > #4 2009-10-14 2 3.55e-05 > #5 2009-10-14 2 3.86e-05 > #6 2009-10-14 2 3.55e-05 > #7 2009-10-14 2 3.86e-05 > #8 2009-10-16 101 3.07e-05 > #9 2009-10-16 101 2.35e-05 > #10 2009-10-16 93 3.07e-05 > #11 2009-10-16 93 2.35e-05 > #12 2009-11-18 4 1.96e-05 > #13 2009-11-18 4 2.00e-05 > #14 2009-11-18 3 1.96e-05 > #15 2009-11-18 3 2.00e-05 > #16 2010-01-04 4 1.52e-05 > #17 2010-01-04 4 1.53e-05 > > With the new column, which I've inserted manually in this small example, I > instead get the merge result below, which is what I'm after for the larger > problem: > > dat3 <- read.table(textConnection("Date ConcAve item > 2009-07-08 7 1 > 2009-08-26 1 1 > 2009-08-26 2 2 > 2009-09-15 2 1 > 2009-10-14 2 1 > 2009-10-14 2 2 > 2009-10-16 101 1 > 2009-10-16 93 2 > 2009-11-18 4 1 > 2009-11-18 3 2 > 2010-01-04 4 1"),header=T) > closeAllConnections() > > dat4 <- read.table(textConnection("Date ConcAve item > 2009-08-26 4.84e-05 1 > 2009-09-15 4.58e-05 1 > 2009-10-14 3.86e-05 1 > 2009-10-14 3.55e-05 2 > 2009-10-16 3.07e-05 1 > 2009-10-16 2.35e-05 2 > 2009-11-18 2.00e-05 1 > 2009-11-18 1.96e-05 2 > 2010-01-04 1.52e-05 1 > 2010-01-04 1.53e-05 2 > 2010-02-10 2.23e-05 1"),header=T) > closeAllConnections() > > merge(dat3, dat4, by=c("Date","item")) > # Date item ConcAve.x ConcAve.y > #1 2009-08-26 1 1 4.84e-05 > #2 2009-09-15 1 2 4.58e-05 > #3 2009-10-14 1 2 3.86e-05 > #4 2009-10-14 2 2 3.55e-05 > #5 2009-10-16 1 101 3.07e-05 > #6 2009-10-16 2 93 2.35e-05 > #7 2009-11-18 1 4 2.00e-05 > #8 2009-11-18 2 3 1.96e-05 > #9 2010-01-04 1 4 1.52e-05 > > [[alternative HTML version deleted]] > > ______________________________________________ > R-help at r-project.org mailing list -- To UNSUBSCRIBE and more, see > https://stat.ethz.ch/mailman/listinfo/r-help > PLEASE do read the posting guide > http://www.R-project.org/posting-guide.html > and provide commented, minimal, self-contained, reproducible code. >[[alternative HTML version deleted]]
Chel Hee Lee
2015-Jan-29 03:20 UTC
[R] adding an additional column for preserving uniqueness
I like the way presented by William Dunlap in the previous post. You
may also try this:
> dat1$item <- Reduce(c,lapply(table(dat1$Date), seq_len))
> dat2$item <- Reduce(c,lapply(table(dat2$Date), seq_len))
> dat1
Date ConcAve item
1 2009-07-08 7 1
2 2009-08-26 1 1
3 2009-08-26 2 2
4 2009-09-15 2 1
5 2009-10-14 2 1
6 2009-10-14 2 2
7 2009-10-16 101 1
8 2009-10-16 93 2
9 2009-11-18 4 1
10 2009-11-18 3 2
11 2010-01-04 4 1
> dat2
Date ConcAve item
1 2009-08-26 4.84e-05 1
2 2009-09-15 4.58e-05 1
3 2009-10-14 3.86e-05 1
4 2009-10-14 3.55e-05 2
5 2009-10-16 3.07e-05 1
6 2009-10-16 2.35e-05 2
7 2009-11-18 2.00e-05 1
8 2009-11-18 1.96e-05 2
9 2010-01-04 1.52e-05 1
10 2010-01-04 1.53e-05 2
11 2010-02-10 2.23e-05 1
>
I hope this helps.
Chel Hee Lee
On 01/28/2015 07:38 PM, William Dunlap wrote:>> with(dat1, ave(integer(length(Date)), Date, FUN=seq_along))
> [1] 1 1 2 1 1 2 1 2 1 2 1
>
>
> Bill Dunlap
> TIBCO Software
> wdunlap tibco.com
>
> On Wed, Jan 28, 2015 at 4:54 PM, Morway, Eric <emorway at usgs.gov>
wrote:
>
>> The two datasets below are excerpts from much larger datasets. Note
that
>> there are duplicate dates in both dat1 and dat2, e.g.,
"2009-10-14".
>>
>> dat1 <- read.table(textConnection("Date ConcAve
>> 2009-07-08 7
>> 2009-08-26 1
>> 2009-08-26 2
>> 2009-09-15 2
>> 2009-10-14 2
>> 2009-10-14 2
>> 2009-10-16 101
>> 2009-10-16 93
>> 2009-11-18 4
>> 2009-11-18 3
>> 2010-01-04 4"),header=T)
>> closeAllConnections()
>>
>> dat2 <- read.table(textConnection("Date ConcAve
>> 2009-08-26 4.84e-05
>> 2009-09-15 4.58e-05
>> 2009-10-14 3.86e-05
>> 2009-10-14 3.55e-05
>> 2009-10-16 3.07e-05
>> 2009-10-16 2.35e-05
>> 2009-11-18 2.00e-05
>> 2009-11-18 1.96e-05
>> 2010-01-04 1.52e-05
>> 2010-01-04 1.53e-05
>> 2010-02-10 2.23e-05"),header=T)
>> closeAllConnections()
>>
>> I'm seeking an R operation that will append a third column to both
>> data.frame's such that it makes these duplicates unique when I run
merge().
>> The desired result for dat1 would be:
>>
>> Date ConcAve item
>> 2009-07-08 7 1
>> 2009-08-26 1 1
>> 2009-08-26 2 2
>> 2009-09-15 2 1
>> 2009-10-14 2 1
>> 2009-10-14 2 2
>> 2009-10-16 101 1
>> 2009-10-16 93 2
>> 2009-11-18 4 1
>> 2009-11-18 3 2
>> 2010-01-04 4 1
>>
>> this way, I don't get this:
>>
>> merge(dat1, dat2, by="Date")
>> # Date ConcAve.x ConcAve.y
>> #1 2009-08-26 1 4.84e-05
>> #2 2009-08-26 2 4.84e-05
>> #3 2009-09-15 2 4.58e-05
>> #4 2009-10-14 2 3.55e-05
>> #5 2009-10-14 2 3.86e-05
>> #6 2009-10-14 2 3.55e-05
>> #7 2009-10-14 2 3.86e-05
>> #8 2009-10-16 101 3.07e-05
>> #9 2009-10-16 101 2.35e-05
>> #10 2009-10-16 93 3.07e-05
>> #11 2009-10-16 93 2.35e-05
>> #12 2009-11-18 4 1.96e-05
>> #13 2009-11-18 4 2.00e-05
>> #14 2009-11-18 3 1.96e-05
>> #15 2009-11-18 3 2.00e-05
>> #16 2010-01-04 4 1.52e-05
>> #17 2010-01-04 4 1.53e-05
>>
>> With the new column, which I've inserted manually in this small
example, I
>> instead get the merge result below, which is what I'm after for the
larger
>> problem:
>>
>> dat3 <- read.table(textConnection("Date ConcAve item
>> 2009-07-08 7 1
>> 2009-08-26 1 1
>> 2009-08-26 2 2
>> 2009-09-15 2 1
>> 2009-10-14 2 1
>> 2009-10-14 2 2
>> 2009-10-16 101 1
>> 2009-10-16 93 2
>> 2009-11-18 4 1
>> 2009-11-18 3 2
>> 2010-01-04 4 1"),header=T)
>> closeAllConnections()
>>
>> dat4 <- read.table(textConnection("Date ConcAve item
>> 2009-08-26 4.84e-05 1
>> 2009-09-15 4.58e-05 1
>> 2009-10-14 3.86e-05 1
>> 2009-10-14 3.55e-05 2
>> 2009-10-16 3.07e-05 1
>> 2009-10-16 2.35e-05 2
>> 2009-11-18 2.00e-05 1
>> 2009-11-18 1.96e-05 2
>> 2010-01-04 1.52e-05 1
>> 2010-01-04 1.53e-05 2
>> 2010-02-10 2.23e-05 1"),header=T)
>> closeAllConnections()
>>
>> merge(dat3, dat4, by=c("Date","item"))
>> # Date item ConcAve.x ConcAve.y
>> #1 2009-08-26 1 1 4.84e-05
>> #2 2009-09-15 1 2 4.58e-05
>> #3 2009-10-14 1 2 3.86e-05
>> #4 2009-10-14 2 2 3.55e-05
>> #5 2009-10-16 1 101 3.07e-05
>> #6 2009-10-16 2 93 2.35e-05
>> #7 2009-11-18 1 4 2.00e-05
>> #8 2009-11-18 2 3 1.96e-05
>> #9 2010-01-04 1 4 1.52e-05
>>
>> [[alternative HTML version deleted]]
>>
>> ______________________________________________
>> R-help at r-project.org mailing list -- To UNSUBSCRIBE and more, see
>> https://stat.ethz.ch/mailman/listinfo/r-help
>> PLEASE do read the posting guide
>> http://www.R-project.org/posting-guide.html
>> and provide commented, minimal, self-contained, reproducible code.
>>
>
> [[alternative HTML version deleted]]
>
> ______________________________________________
> R-help at r-project.org mailing list -- To UNSUBSCRIBE and more, see
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide
http://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.
>