Morway, Eric
2015-Jan-29 00:54 UTC
[R] adding an additional column for preserving uniqueness
The two datasets below are excerpts from much larger datasets. Note that there are duplicate dates in both dat1 and dat2, e.g., "2009-10-14". dat1 <- read.table(textConnection("Date ConcAve 2009-07-08 7 2009-08-26 1 2009-08-26 2 2009-09-15 2 2009-10-14 2 2009-10-14 2 2009-10-16 101 2009-10-16 93 2009-11-18 4 2009-11-18 3 2010-01-04 4"),header=T) closeAllConnections() dat2 <- read.table(textConnection("Date ConcAve 2009-08-26 4.84e-05 2009-09-15 4.58e-05 2009-10-14 3.86e-05 2009-10-14 3.55e-05 2009-10-16 3.07e-05 2009-10-16 2.35e-05 2009-11-18 2.00e-05 2009-11-18 1.96e-05 2010-01-04 1.52e-05 2010-01-04 1.53e-05 2010-02-10 2.23e-05"),header=T) closeAllConnections() I'm seeking an R operation that will append a third column to both data.frame's such that it makes these duplicates unique when I run merge(). The desired result for dat1 would be: Date ConcAve item 2009-07-08 7 1 2009-08-26 1 1 2009-08-26 2 2 2009-09-15 2 1 2009-10-14 2 1 2009-10-14 2 2 2009-10-16 101 1 2009-10-16 93 2 2009-11-18 4 1 2009-11-18 3 2 2010-01-04 4 1 this way, I don't get this: merge(dat1, dat2, by="Date") # Date ConcAve.x ConcAve.y #1 2009-08-26 1 4.84e-05 #2 2009-08-26 2 4.84e-05 #3 2009-09-15 2 4.58e-05 #4 2009-10-14 2 3.55e-05 #5 2009-10-14 2 3.86e-05 #6 2009-10-14 2 3.55e-05 #7 2009-10-14 2 3.86e-05 #8 2009-10-16 101 3.07e-05 #9 2009-10-16 101 2.35e-05 #10 2009-10-16 93 3.07e-05 #11 2009-10-16 93 2.35e-05 #12 2009-11-18 4 1.96e-05 #13 2009-11-18 4 2.00e-05 #14 2009-11-18 3 1.96e-05 #15 2009-11-18 3 2.00e-05 #16 2010-01-04 4 1.52e-05 #17 2010-01-04 4 1.53e-05 With the new column, which I've inserted manually in this small example, I instead get the merge result below, which is what I'm after for the larger problem: dat3 <- read.table(textConnection("Date ConcAve item 2009-07-08 7 1 2009-08-26 1 1 2009-08-26 2 2 2009-09-15 2 1 2009-10-14 2 1 2009-10-14 2 2 2009-10-16 101 1 2009-10-16 93 2 2009-11-18 4 1 2009-11-18 3 2 2010-01-04 4 1"),header=T) closeAllConnections() dat4 <- read.table(textConnection("Date ConcAve item 2009-08-26 4.84e-05 1 2009-09-15 4.58e-05 1 2009-10-14 3.86e-05 1 2009-10-14 3.55e-05 2 2009-10-16 3.07e-05 1 2009-10-16 2.35e-05 2 2009-11-18 2.00e-05 1 2009-11-18 1.96e-05 2 2010-01-04 1.52e-05 1 2010-01-04 1.53e-05 2 2010-02-10 2.23e-05 1"),header=T) closeAllConnections() merge(dat3, dat4, by=c("Date","item")) # Date item ConcAve.x ConcAve.y #1 2009-08-26 1 1 4.84e-05 #2 2009-09-15 1 2 4.58e-05 #3 2009-10-14 1 2 3.86e-05 #4 2009-10-14 2 2 3.55e-05 #5 2009-10-16 1 101 3.07e-05 #6 2009-10-16 2 93 2.35e-05 #7 2009-11-18 1 4 2.00e-05 #8 2009-11-18 2 3 1.96e-05 #9 2010-01-04 1 4 1.52e-05 [[alternative HTML version deleted]]
William Dunlap
2015-Jan-29 01:38 UTC
[R] adding an additional column for preserving uniqueness
> with(dat1, ave(integer(length(Date)), Date, FUN=seq_along))[1] 1 1 2 1 1 2 1 2 1 2 1 Bill Dunlap TIBCO Software wdunlap tibco.com On Wed, Jan 28, 2015 at 4:54 PM, Morway, Eric <emorway at usgs.gov> wrote:> The two datasets below are excerpts from much larger datasets. Note that > there are duplicate dates in both dat1 and dat2, e.g., "2009-10-14". > > dat1 <- read.table(textConnection("Date ConcAve > 2009-07-08 7 > 2009-08-26 1 > 2009-08-26 2 > 2009-09-15 2 > 2009-10-14 2 > 2009-10-14 2 > 2009-10-16 101 > 2009-10-16 93 > 2009-11-18 4 > 2009-11-18 3 > 2010-01-04 4"),header=T) > closeAllConnections() > > dat2 <- read.table(textConnection("Date ConcAve > 2009-08-26 4.84e-05 > 2009-09-15 4.58e-05 > 2009-10-14 3.86e-05 > 2009-10-14 3.55e-05 > 2009-10-16 3.07e-05 > 2009-10-16 2.35e-05 > 2009-11-18 2.00e-05 > 2009-11-18 1.96e-05 > 2010-01-04 1.52e-05 > 2010-01-04 1.53e-05 > 2010-02-10 2.23e-05"),header=T) > closeAllConnections() > > I'm seeking an R operation that will append a third column to both > data.frame's such that it makes these duplicates unique when I run merge(). > The desired result for dat1 would be: > > Date ConcAve item > 2009-07-08 7 1 > 2009-08-26 1 1 > 2009-08-26 2 2 > 2009-09-15 2 1 > 2009-10-14 2 1 > 2009-10-14 2 2 > 2009-10-16 101 1 > 2009-10-16 93 2 > 2009-11-18 4 1 > 2009-11-18 3 2 > 2010-01-04 4 1 > > this way, I don't get this: > > merge(dat1, dat2, by="Date") > # Date ConcAve.x ConcAve.y > #1 2009-08-26 1 4.84e-05 > #2 2009-08-26 2 4.84e-05 > #3 2009-09-15 2 4.58e-05 > #4 2009-10-14 2 3.55e-05 > #5 2009-10-14 2 3.86e-05 > #6 2009-10-14 2 3.55e-05 > #7 2009-10-14 2 3.86e-05 > #8 2009-10-16 101 3.07e-05 > #9 2009-10-16 101 2.35e-05 > #10 2009-10-16 93 3.07e-05 > #11 2009-10-16 93 2.35e-05 > #12 2009-11-18 4 1.96e-05 > #13 2009-11-18 4 2.00e-05 > #14 2009-11-18 3 1.96e-05 > #15 2009-11-18 3 2.00e-05 > #16 2010-01-04 4 1.52e-05 > #17 2010-01-04 4 1.53e-05 > > With the new column, which I've inserted manually in this small example, I > instead get the merge result below, which is what I'm after for the larger > problem: > > dat3 <- read.table(textConnection("Date ConcAve item > 2009-07-08 7 1 > 2009-08-26 1 1 > 2009-08-26 2 2 > 2009-09-15 2 1 > 2009-10-14 2 1 > 2009-10-14 2 2 > 2009-10-16 101 1 > 2009-10-16 93 2 > 2009-11-18 4 1 > 2009-11-18 3 2 > 2010-01-04 4 1"),header=T) > closeAllConnections() > > dat4 <- read.table(textConnection("Date ConcAve item > 2009-08-26 4.84e-05 1 > 2009-09-15 4.58e-05 1 > 2009-10-14 3.86e-05 1 > 2009-10-14 3.55e-05 2 > 2009-10-16 3.07e-05 1 > 2009-10-16 2.35e-05 2 > 2009-11-18 2.00e-05 1 > 2009-11-18 1.96e-05 2 > 2010-01-04 1.52e-05 1 > 2010-01-04 1.53e-05 2 > 2010-02-10 2.23e-05 1"),header=T) > closeAllConnections() > > merge(dat3, dat4, by=c("Date","item")) > # Date item ConcAve.x ConcAve.y > #1 2009-08-26 1 1 4.84e-05 > #2 2009-09-15 1 2 4.58e-05 > #3 2009-10-14 1 2 3.86e-05 > #4 2009-10-14 2 2 3.55e-05 > #5 2009-10-16 1 101 3.07e-05 > #6 2009-10-16 2 93 2.35e-05 > #7 2009-11-18 1 4 2.00e-05 > #8 2009-11-18 2 3 1.96e-05 > #9 2010-01-04 1 4 1.52e-05 > > [[alternative HTML version deleted]] > > ______________________________________________ > R-help at r-project.org mailing list -- To UNSUBSCRIBE and more, see > https://stat.ethz.ch/mailman/listinfo/r-help > PLEASE do read the posting guide > http://www.R-project.org/posting-guide.html > and provide commented, minimal, self-contained, reproducible code. >[[alternative HTML version deleted]]
Chel Hee Lee
2015-Jan-29 03:20 UTC
[R] adding an additional column for preserving uniqueness
I like the way presented by William Dunlap in the previous post. You may also try this: > dat1$item <- Reduce(c,lapply(table(dat1$Date), seq_len)) > dat2$item <- Reduce(c,lapply(table(dat2$Date), seq_len)) > dat1 Date ConcAve item 1 2009-07-08 7 1 2 2009-08-26 1 1 3 2009-08-26 2 2 4 2009-09-15 2 1 5 2009-10-14 2 1 6 2009-10-14 2 2 7 2009-10-16 101 1 8 2009-10-16 93 2 9 2009-11-18 4 1 10 2009-11-18 3 2 11 2010-01-04 4 1 > dat2 Date ConcAve item 1 2009-08-26 4.84e-05 1 2 2009-09-15 4.58e-05 1 3 2009-10-14 3.86e-05 1 4 2009-10-14 3.55e-05 2 5 2009-10-16 3.07e-05 1 6 2009-10-16 2.35e-05 2 7 2009-11-18 2.00e-05 1 8 2009-11-18 1.96e-05 2 9 2010-01-04 1.52e-05 1 10 2010-01-04 1.53e-05 2 11 2010-02-10 2.23e-05 1 > I hope this helps. Chel Hee Lee On 01/28/2015 07:38 PM, William Dunlap wrote:>> with(dat1, ave(integer(length(Date)), Date, FUN=seq_along)) > [1] 1 1 2 1 1 2 1 2 1 2 1 > > > Bill Dunlap > TIBCO Software > wdunlap tibco.com > > On Wed, Jan 28, 2015 at 4:54 PM, Morway, Eric <emorway at usgs.gov> wrote: > >> The two datasets below are excerpts from much larger datasets. Note that >> there are duplicate dates in both dat1 and dat2, e.g., "2009-10-14". >> >> dat1 <- read.table(textConnection("Date ConcAve >> 2009-07-08 7 >> 2009-08-26 1 >> 2009-08-26 2 >> 2009-09-15 2 >> 2009-10-14 2 >> 2009-10-14 2 >> 2009-10-16 101 >> 2009-10-16 93 >> 2009-11-18 4 >> 2009-11-18 3 >> 2010-01-04 4"),header=T) >> closeAllConnections() >> >> dat2 <- read.table(textConnection("Date ConcAve >> 2009-08-26 4.84e-05 >> 2009-09-15 4.58e-05 >> 2009-10-14 3.86e-05 >> 2009-10-14 3.55e-05 >> 2009-10-16 3.07e-05 >> 2009-10-16 2.35e-05 >> 2009-11-18 2.00e-05 >> 2009-11-18 1.96e-05 >> 2010-01-04 1.52e-05 >> 2010-01-04 1.53e-05 >> 2010-02-10 2.23e-05"),header=T) >> closeAllConnections() >> >> I'm seeking an R operation that will append a third column to both >> data.frame's such that it makes these duplicates unique when I run merge(). >> The desired result for dat1 would be: >> >> Date ConcAve item >> 2009-07-08 7 1 >> 2009-08-26 1 1 >> 2009-08-26 2 2 >> 2009-09-15 2 1 >> 2009-10-14 2 1 >> 2009-10-14 2 2 >> 2009-10-16 101 1 >> 2009-10-16 93 2 >> 2009-11-18 4 1 >> 2009-11-18 3 2 >> 2010-01-04 4 1 >> >> this way, I don't get this: >> >> merge(dat1, dat2, by="Date") >> # Date ConcAve.x ConcAve.y >> #1 2009-08-26 1 4.84e-05 >> #2 2009-08-26 2 4.84e-05 >> #3 2009-09-15 2 4.58e-05 >> #4 2009-10-14 2 3.55e-05 >> #5 2009-10-14 2 3.86e-05 >> #6 2009-10-14 2 3.55e-05 >> #7 2009-10-14 2 3.86e-05 >> #8 2009-10-16 101 3.07e-05 >> #9 2009-10-16 101 2.35e-05 >> #10 2009-10-16 93 3.07e-05 >> #11 2009-10-16 93 2.35e-05 >> #12 2009-11-18 4 1.96e-05 >> #13 2009-11-18 4 2.00e-05 >> #14 2009-11-18 3 1.96e-05 >> #15 2009-11-18 3 2.00e-05 >> #16 2010-01-04 4 1.52e-05 >> #17 2010-01-04 4 1.53e-05 >> >> With the new column, which I've inserted manually in this small example, I >> instead get the merge result below, which is what I'm after for the larger >> problem: >> >> dat3 <- read.table(textConnection("Date ConcAve item >> 2009-07-08 7 1 >> 2009-08-26 1 1 >> 2009-08-26 2 2 >> 2009-09-15 2 1 >> 2009-10-14 2 1 >> 2009-10-14 2 2 >> 2009-10-16 101 1 >> 2009-10-16 93 2 >> 2009-11-18 4 1 >> 2009-11-18 3 2 >> 2010-01-04 4 1"),header=T) >> closeAllConnections() >> >> dat4 <- read.table(textConnection("Date ConcAve item >> 2009-08-26 4.84e-05 1 >> 2009-09-15 4.58e-05 1 >> 2009-10-14 3.86e-05 1 >> 2009-10-14 3.55e-05 2 >> 2009-10-16 3.07e-05 1 >> 2009-10-16 2.35e-05 2 >> 2009-11-18 2.00e-05 1 >> 2009-11-18 1.96e-05 2 >> 2010-01-04 1.52e-05 1 >> 2010-01-04 1.53e-05 2 >> 2010-02-10 2.23e-05 1"),header=T) >> closeAllConnections() >> >> merge(dat3, dat4, by=c("Date","item")) >> # Date item ConcAve.x ConcAve.y >> #1 2009-08-26 1 1 4.84e-05 >> #2 2009-09-15 1 2 4.58e-05 >> #3 2009-10-14 1 2 3.86e-05 >> #4 2009-10-14 2 2 3.55e-05 >> #5 2009-10-16 1 101 3.07e-05 >> #6 2009-10-16 2 93 2.35e-05 >> #7 2009-11-18 1 4 2.00e-05 >> #8 2009-11-18 2 3 1.96e-05 >> #9 2010-01-04 1 4 1.52e-05 >> >> [[alternative HTML version deleted]] >> >> ______________________________________________ >> R-help at r-project.org mailing list -- To UNSUBSCRIBE and more, see >> https://stat.ethz.ch/mailman/listinfo/r-help >> PLEASE do read the posting guide >> http://www.R-project.org/posting-guide.html >> and provide commented, minimal, self-contained, reproducible code. >> > > [[alternative HTML version deleted]] > > ______________________________________________ > R-help at r-project.org mailing list -- To UNSUBSCRIBE and more, see > https://stat.ethz.ch/mailman/listinfo/r-help > PLEASE do read the posting guide http://www.R-project.org/posting-guide.html > and provide commented, minimal, self-contained, reproducible code. >