Abdi, Abdulhakim
2010-Jul-05 20:01 UTC
[R] data.frame: adding a column that is based on ranges of values in another column
Dear List, I've been looking tirelessly for a solution to this dilemma but without success. Perhaps someone has an idea that will guide me in the right direction. Suppose I have the following data.frame: DF = data.frame(X = c(114.5508, 114.6468, 114.6596, 114.6957, 114.6828, 114.8903, 114.9519, 114.8842, 114.8579, 114.8489), Y = c(47.14094, 46.98874, 46.91235, 46.88265, 46.80584, 46.67022, 46.53264, 46.47727, 46.46457, 46.47032), Date = as.Date(c('2009-01-01', '2009-01-03', '2009-01-05', '2009-01-10', '2009-01-14', '2009-01-15', '2009-01-16', '2009-01-17', '2009-01-22', '2009-01-29'))) DF X Y Date 1 114.5508 47.14094 2009-01-01 2 114.6468 46.98874 2009-01-03 3 114.6596 46.91235 2009-01-05 4 114.6957 46.88265 2009-01-10 5 114.6828 46.80584 2009-01-14 6 114.8903 46.67022 2009-01-15 7 114.9519 46.53264 2009-01-16 8 114.8842 46.47727 2009-01-17 9 114.8579 46.46457 2009-01-22 10 114.8489 46.47032 2009-01-29 I also have two objects that contain the dates of the first and last fortnight of the month of January 2009. s.d1 = '2009-01-01' e.d1 = '2009-01-14' f.n1 = seq(from = as.Date(s.d1) , to = as.Date(e.d1), by = 1) f.n1 [1] "2009-01-01" "2009-01-02" "2009-01-03" "2009-01-04" "2009-01-05" "2009-01-06" "2009-01-07" "2009-01-08" "2009-01-09" "2009-01-10" "2009-01-11" "2009-01-12" "2009-01-13" "2009-01-14" s.d2 = '2009-01-15' e.d2 = '2009-01-31' f.n2 = seq(from = as.Date(s.d2) , to = as.Date(e.d2), by = 1) f.n2 [1] "2009-01-15" "2009-01-16" "2009-01-17" "2009-01-18" "2009-01-19" "2009-01-20" "2009-01-21" "2009-01-22" "2009-01-23" "2009-01-24" "2009-01-25" "2009-01-26" "2009-01-27" "2009-01-28" "2009-01-29" "2009-01-30" "2009-01-31" I'm trying to add a column called "Fortnight" to the existing data.frame. The components of the new "Fortnight" column are based on the existing "Date" column so that if the value in "Date" falls within the first fortnight (f.n1) then the value of the new "Fortnight" column would be "FN1", and if the value of the "Date" column falls within the second fortnight (f.n2), then the value of the "Fortnight" column would be "FN2", and so on. The end result should look like: X Y Date Fortnight 1 114.5508 47.14094 2009-01-01 FN1 2 114.6468 46.98874 2009-01-03 FN1 3 114.6596 46.91235 2009-01-05 FN1 4 114.6957 46.88265 2009-01-10 FN1 5 114.6828 46.80584 2009-01-14 FN1 6 114.8903 46.67022 2009-01-15 FN2 7 114.9519 46.53264 2009-01-16 FN2 8 114.8842 46.47727 2009-01-17 FN2 9 114.8579 46.46457 2009-01-22 FN2 10 114.8489 46.47032 2009-01-29 FN2 I manually entered the above values for the "Fortnight" column to illustrate my point, however, that would be quite tiresome for 500+ rows of data ;-) The only other similar issue I found on the list was https://stat.ethz.ch/pipermail/r-help/2008-February/153995.html but that particular problem is slightly different than what I'm trying to accomplish here. I appreciate your time and assistance. Thanks in advance. Regards, Hakim Abdi _________________________________ Abdulhakim Abdi, M.Sc. Research Intern Conservation GIS/Remote Sensing Lab Smithsonian Conservation Biology Institute 1500 Remount Road Front Royal, VA 22630 phone: +1 540 635 6578 mobile: +1 747 224 7006 fax: +1 540 635 6506 (Attn:GIS Lab) email: abdia@si.edu http://nationalzoo.si.edu/SCBI/ConservationGIS/ [[alternative HTML version deleted]]
jim holtman
2010-Jul-05 20:55 UTC
[R] data.frame: adding a column that is based on ranges of values in another column
use 'merge':> DF = data.frame(X = c(114.5508, 114.6468, 114.6596, 114.6957, 114.6828, 114.8903, 114.9519, 114.8842,+ 114.8579, 114.8489), Y = c(47.14094, 46.98874, 46.91235, 46.88265, 46.80584, 46.67022, 46.53264, 46.47727, + 46.46457, 46.47032), Date = as.Date(c('2009-01-01', '2009-01-03', '2009-01-05', '2009-01-10', '2009-01-14', + '2009-01-15', '2009-01-16', '2009-01-17', '2009-01-22', '2009-01-29')))> > s.d1 = '2009-01-01' > e.d1 = '2009-01-14' > f.n1 = seq(from = as.Date(s.d1) , to = as.Date(e.d1), by = 1) > > s.d2 = '2009-01-15' > e.d2 = '2009-01-31' > f.n2 = seq(from = as.Date(s.d2) , to = as.Date(e.d2), by = 1) > x.new <- data.frame(Date=c(f.n1, f.n2),+ Fortnight=c(rep("FN1", length(f.n1)), rep("FN2", length(f.n2))))> > merge(DF, x.new, all.x=TRUE)Date X Y Fortnight 1 2009-01-01 114.5508 47.14094 FN1 2 2009-01-03 114.6468 46.98874 FN1 3 2009-01-05 114.6596 46.91235 FN1 4 2009-01-10 114.6957 46.88265 FN1 5 2009-01-14 114.6828 46.80584 FN1 6 2009-01-15 114.8903 46.67022 FN2 7 2009-01-16 114.9519 46.53264 FN2 8 2009-01-17 114.8842 46.47727 FN2 9 2009-01-22 114.8579 46.46457 FN2 10 2009-01-29 114.8489 46.47032 FN2 On Mon, Jul 5, 2010 at 4:01 PM, Abdi, Abdulhakim <AbdiA at si.edu> wrote:> Dear List, > > I've been looking tirelessly for a solution to this dilemma but without success. Perhaps someone has an idea that will guide me in the right direction. > > Suppose I have the following data.frame: > > DF = data.frame(X = c(114.5508, 114.6468, 114.6596, 114.6957, 114.6828, 114.8903, 114.9519, 114.8842, > 114.8579, 114.8489), Y = c(47.14094, 46.98874, 46.91235, 46.88265, 46.80584, 46.67022, 46.53264, 46.47727, > 46.46457, 46.47032), Date = as.Date(c('2009-01-01', '2009-01-03', '2009-01-05', '2009-01-10', '2009-01-14', > '2009-01-15', '2009-01-16', '2009-01-17', '2009-01-22', '2009-01-29'))) > > DF > ? ? ? ? ?X ? ? ? ?Y ? ? ? Date > 1 ?114.5508 47.14094 2009-01-01 > 2 ?114.6468 46.98874 2009-01-03 > 3 ?114.6596 46.91235 2009-01-05 > 4 ?114.6957 46.88265 2009-01-10 > 5 ?114.6828 46.80584 2009-01-14 > 6 ?114.8903 46.67022 2009-01-15 > 7 ?114.9519 46.53264 2009-01-16 > 8 ?114.8842 46.47727 2009-01-17 > 9 ?114.8579 46.46457 2009-01-22 > 10 114.8489 46.47032 2009-01-29 > > I also have two objects that contain the dates of the first and last fortnight of the month of January 2009. > > s.d1 = '2009-01-01' > e.d1 = '2009-01-14' > f.n1 = seq(from = as.Date(s.d1) ?, to = ?as.Date(e.d1), by = 1) > > f.n1 > [1] "2009-01-01" "2009-01-02" "2009-01-03" "2009-01-04" "2009-01-05" "2009-01-06" "2009-01-07" "2009-01-08" "2009-01-09" "2009-01-10" "2009-01-11" "2009-01-12" "2009-01-13" "2009-01-14" > > s.d2 = '2009-01-15' > e.d2 = '2009-01-31' > f.n2 = seq(from = as.Date(s.d2) ?, to = ?as.Date(e.d2), by = 1) > > f.n2 > [1] "2009-01-15" "2009-01-16" "2009-01-17" "2009-01-18" "2009-01-19" "2009-01-20" "2009-01-21" "2009-01-22" "2009-01-23" "2009-01-24" "2009-01-25" "2009-01-26" "2009-01-27" "2009-01-28" "2009-01-29" "2009-01-30" "2009-01-31" > > > I'm trying to add a column called "Fortnight" to the existing data.frame. The components of the new "Fortnight" column are based on the existing "Date" column so that if the value in "Date" falls within the first fortnight (f.n1) then the value of the new "Fortnight" column would be "FN1", and if the value of the "Date" column falls within the second fortnight (f.n2), then the value of the "Fortnight" column would be "FN2", and so on. > > The end result should look like: > > ? ? ? ? ?X ? ? ? ?Y ? ? ? Date Fortnight > 1 ?114.5508 47.14094 2009-01-01 ? ? ? FN1 > 2 ?114.6468 46.98874 2009-01-03 ? ? ? FN1 > 3 ?114.6596 46.91235 2009-01-05 ? ? ? FN1 > 4 ?114.6957 46.88265 2009-01-10 ? ? ? FN1 > 5 ?114.6828 46.80584 2009-01-14 ? ? ? FN1 > 6 ?114.8903 46.67022 2009-01-15 ? ? ? FN2 > 7 ?114.9519 46.53264 2009-01-16 ? ? ? FN2 > 8 ?114.8842 46.47727 2009-01-17 ? ? ? FN2 > 9 ?114.8579 46.46457 2009-01-22 ? ? ? FN2 > 10 114.8489 46.47032 2009-01-29 ? ? ? FN2 > > I manually entered the above values for the "Fortnight" column to illustrate my point, however, that would be quite tiresome for 500+ rows of data ;-) > > The only other similar issue I found on the list was https://stat.ethz.ch/pipermail/r-help/2008-February/153995.html but that particular problem is slightly different than what I'm trying to accomplish here. > > I appreciate your time and assistance. > > Thanks in advance. > > Regards, > > > Hakim Abdi > > > > _________________________________ > Abdulhakim Abdi, M.Sc. > Research Intern > > Conservation GIS/Remote Sensing Lab > Smithsonian Conservation Biology Institute > 1500 Remount Road > Front Royal, VA 22630 > phone: +1 540 635 6578 > mobile: +1 747 224 7006 > fax: +1 540 635 6506 (Attn:GIS Lab) > email: abdia at si.edu > http://nationalzoo.si.edu/SCBI/ConservationGIS/ > > > > > > > ? ? ? ?[[alternative HTML version deleted]] > > ______________________________________________ > R-help at r-project.org mailing list > https://stat.ethz.ch/mailman/listinfo/r-help > PLEASE do read the posting guide http://www.R-project.org/posting-guide.html > and provide commented, minimal, self-contained, reproducible code. >-- Jim Holtman Cincinnati, OH +1 513 646 9390 What is the problem that you are trying to solve?
Dennis Murphy
2010-Jul-05 20:55 UTC
[R] data.frame: adding a column that is based on ranges of values in another column
Hi: Since you've been looking tirelessly :) For your stated problem, the following will work: DF$Fortnight <- with(DF, ifelse(Date %in% f.n1, 'FN1', ifelse(Date %in% f.n2, 'FN2', 'FN3'))) However, if you have a number of fortnights (perhaps stretching over several years), you need a different approach. The idea is to use the last day of 2008 as the origin in your example and compute the number of days past it for each observation. We then integer divide the number of days past the the origin by 14 and add one to get the fortnight number. origin <- as.Date('2008-12-31') # set origin DF$doy <- DF$Date - origin # days past origin DF$FN <- 1 + as.numeric(DF$doy)%/% 14 # fortnight number past origin> DFX Y Date Fortnight doy FN 1 114.5508 47.14094 2009-01-01 FN1 1 days 1 2 114.6468 46.98874 2009-01-03 FN1 3 days 1 3 114.6596 46.91235 2009-01-05 FN1 5 days 1 4 114.6957 46.88265 2009-01-10 FN1 10 days 1 5 114.6828 46.80584 2009-01-14 FN1 14 days 2 6 114.8903 46.67022 2009-01-15 FN2 15 days 2 7 114.9519 46.53264 2009-01-16 FN2 16 days 2 8 114.8842 46.47727 2009-01-17 FN2 17 days 2 9 114.8579 46.46457 2009-01-22 FN2 22 days 2 10 114.8489 46.47032 2009-01-29 FN3 29 days 3 This should be less tedious than the ifelse() approach. HTH, Dennis On Mon, Jul 5, 2010 at 1:01 PM, Abdi, Abdulhakim <AbdiA@si.edu> wrote:> Dear List, > > I've been looking tirelessly for a solution to this dilemma but without > success. Perhaps someone has an idea that will guide me in the right > direction. > > Suppose I have the following data.frame: > > DF = data.frame(X = c(114.5508, 114.6468, 114.6596, 114.6957, 114.6828, > 114.8903, 114.9519, 114.8842, > 114.8579, 114.8489), Y = c(47.14094, 46.98874, 46.91235, 46.88265, > 46.80584, 46.67022, 46.53264, 46.47727, > 46.46457, 46.47032), Date = as.Date(c('2009-01-01', '2009-01-03', > '2009-01-05', '2009-01-10', '2009-01-14', > '2009-01-15', '2009-01-16', '2009-01-17', '2009-01-22', '2009-01-29'))) > > DF > X Y Date > 1 114.5508 47.14094 2009-01-01 > 2 114.6468 46.98874 2009-01-03 > 3 114.6596 46.91235 2009-01-05 > 4 114.6957 46.88265 2009-01-10 > 5 114.6828 46.80584 2009-01-14 > 6 114.8903 46.67022 2009-01-15 > 7 114.9519 46.53264 2009-01-16 > 8 114.8842 46.47727 2009-01-17 > 9 114.8579 46.46457 2009-01-22 > 10 114.8489 46.47032 2009-01-29 > > I also have two objects that contain the dates of the first and last > fortnight of the month of January 2009. > > s.d1 = '2009-01-01' > e.d1 = '2009-01-14' > f.n1 = seq(from = as.Date(s.d1) , to = as.Date(e.d1), by = 1) > > f.n1 > [1] "2009-01-01" "2009-01-02" "2009-01-03" "2009-01-04" "2009-01-05" > "2009-01-06" "2009-01-07" "2009-01-08" "2009-01-09" "2009-01-10" > "2009-01-11" "2009-01-12" "2009-01-13" "2009-01-14" > > s.d2 = '2009-01-15' > e.d2 = '2009-01-31' > f.n2 = seq(from = as.Date(s.d2) , to = as.Date(e.d2), by = 1) > > f.n2 > [1] "2009-01-15" "2009-01-16" "2009-01-17" "2009-01-18" "2009-01-19" > "2009-01-20" "2009-01-21" "2009-01-22" "2009-01-23" "2009-01-24" > "2009-01-25" "2009-01-26" "2009-01-27" "2009-01-28" "2009-01-29" > "2009-01-30" "2009-01-31" > > > I'm trying to add a column called "Fortnight" to the existing data.frame. > The components of the new "Fortnight" column are based on the existing > "Date" column so that if the value in "Date" falls within the first > fortnight (f.n1) then the value of the new "Fortnight" column would be > "FN1", and if the value of the "Date" column falls within the second > fortnight (f.n2), then the value of the "Fortnight" column would be "FN2", > and so on. > > The end result should look like: > > X Y Date Fortnight > 1 114.5508 47.14094 2009-01-01 FN1 > 2 114.6468 46.98874 2009-01-03 FN1 > 3 114.6596 46.91235 2009-01-05 FN1 > 4 114.6957 46.88265 2009-01-10 FN1 > 5 114.6828 46.80584 2009-01-14 FN1 > 6 114.8903 46.67022 2009-01-15 FN2 > 7 114.9519 46.53264 2009-01-16 FN2 > 8 114.8842 46.47727 2009-01-17 FN2 > 9 114.8579 46.46457 2009-01-22 FN2 > 10 114.8489 46.47032 2009-01-29 FN2 > > I manually entered the above values for the "Fortnight" column to > illustrate my point, however, that would be quite tiresome for 500+ rows of > data ;-) > > The only other similar issue I found on the list was > https://stat.ethz.ch/pipermail/r-help/2008-February/153995.html but that > particular problem is slightly different than what I'm trying to accomplish > here. > > I appreciate your time and assistance. > > Thanks in advance. > > Regards, > > > Hakim Abdi > > > > _________________________________ > Abdulhakim Abdi, M.Sc. > Research Intern > > Conservation GIS/Remote Sensing Lab > Smithsonian Conservation Biology Institute > 1500 Remount Road > Front Royal, VA 22630 > phone: +1 540 635 6578 > mobile: +1 747 224 7006 > fax: +1 540 635 6506 (Attn:GIS Lab) > email: abdia@si.edu > http://nationalzoo.si.edu/SCBI/ConservationGIS/ > > > > > > > [[alternative HTML version deleted]] > > ______________________________________________ > R-help@r-project.org mailing list > https://stat.ethz.ch/mailman/listinfo/r-help > PLEASE do read the posting guide > http://www.R-project.org/posting-guide.html > and provide commented, minimal, self-contained, reproducible code. >[[alternative HTML version deleted]]
Bill.Venables at csiro.au
2010-Jul-05 23:48 UTC
[R] data.frame: adding a column that is based on ranges of values in another column
Here is one way> checkList <- data.frame(Day = c(f.n1, f.n2),+ FN = rep(c("FN1","FN2"), + c(length(f.n1), length(f.n2))))> m <- match(DF$Date, checkList$Day) > DF <- cbind(DF, Fortnight = checkList$FN[m]) > DFX Y Date Fortnight 1 114.5508 47.14094 2009-01-01 FN1 2 114.6468 46.98874 2009-01-03 FN1 3 114.6596 46.91235 2009-01-05 FN1 4 114.6957 46.88265 2009-01-10 FN1 5 114.6828 46.80584 2009-01-14 FN1 6 114.8903 46.67022 2009-01-15 FN2 7 114.9519 46.53264 2009-01-16 FN2 8 114.8842 46.47727 2009-01-17 FN2 9 114.8579 46.46457 2009-01-22 FN2 10 114.8489 46.47032 2009-01-29 FN2>-----Original Message----- From: r-help-bounces at r-project.org [mailto:r-help-bounces at r-project.org] On Behalf Of Abdi, Abdulhakim Sent: Tuesday, 6 July 2010 6:01 AM To: r-help at r-project.org Subject: [R] data.frame: adding a column that is based on ranges of values in another column Dear List, I've been looking tirelessly for a solution to this dilemma but without success. Perhaps someone has an idea that will guide me in the right direction. Suppose I have the following data.frame: DF = data.frame(X = c(114.5508, 114.6468, 114.6596, 114.6957, 114.6828, 114.8903, 114.9519, 114.8842, 114.8579, 114.8489), Y = c(47.14094, 46.98874, 46.91235, 46.88265, 46.80584, 46.67022, 46.53264, 46.47727, 46.46457, 46.47032), Date = as.Date(c('2009-01-01', '2009-01-03', '2009-01-05', '2009-01-10', '2009-01-14', '2009-01-15', '2009-01-16', '2009-01-17', '2009-01-22', '2009-01-29'))) DF X Y Date 1 114.5508 47.14094 2009-01-01 2 114.6468 46.98874 2009-01-03 3 114.6596 46.91235 2009-01-05 4 114.6957 46.88265 2009-01-10 5 114.6828 46.80584 2009-01-14 6 114.8903 46.67022 2009-01-15 7 114.9519 46.53264 2009-01-16 8 114.8842 46.47727 2009-01-17 9 114.8579 46.46457 2009-01-22 10 114.8489 46.47032 2009-01-29 I also have two objects that contain the dates of the first and last fortnight of the month of January 2009. s.d1 = '2009-01-01' e.d1 = '2009-01-14' f.n1 = seq(from = as.Date(s.d1) , to = as.Date(e.d1), by = 1) f.n1 [1] "2009-01-01" "2009-01-02" "2009-01-03" "2009-01-04" "2009-01-05" "2009-01-06" "2009-01-07" "2009-01-08" "2009-01-09" "2009-01-10" "2009-01-11" "2009-01-12" "2009-01-13" "2009-01-14" s.d2 = '2009-01-15' e.d2 = '2009-01-31' f.n2 = seq(from = as.Date(s.d2) , to = as.Date(e.d2), by = 1) f.n2 [1] "2009-01-15" "2009-01-16" "2009-01-17" "2009-01-18" "2009-01-19" "2009-01-20" "2009-01-21" "2009-01-22" "2009-01-23" "2009-01-24" "2009-01-25" "2009-01-26" "2009-01-27" "2009-01-28" "2009-01-29" "2009-01-30" "2009-01-31" I'm trying to add a column called "Fortnight" to the existing data.frame. The components of the new "Fortnight" column are based on the existing "Date" column so that if the value in "Date" falls within the first fortnight (f.n1) then the value of the new "Fortnight" column would be "FN1", and if the value of the "Date" column falls within the second fortnight (f.n2), then the value of the "Fortnight" column would be "FN2", and so on. The end result should look like: X Y Date Fortnight 1 114.5508 47.14094 2009-01-01 FN1 2 114.6468 46.98874 2009-01-03 FN1 3 114.6596 46.91235 2009-01-05 FN1 4 114.6957 46.88265 2009-01-10 FN1 5 114.6828 46.80584 2009-01-14 FN1 6 114.8903 46.67022 2009-01-15 FN2 7 114.9519 46.53264 2009-01-16 FN2 8 114.8842 46.47727 2009-01-17 FN2 9 114.8579 46.46457 2009-01-22 FN2 10 114.8489 46.47032 2009-01-29 FN2 I manually entered the above values for the "Fortnight" column to illustrate my point, however, that would be quite tiresome for 500+ rows of data ;-) The only other similar issue I found on the list was https://stat.ethz.ch/pipermail/r-help/2008-February/153995.html but that particular problem is slightly different than what I'm trying to accomplish here. I appreciate your time and assistance. Thanks in advance. Regards, Hakim Abdi _________________________________ Abdulhakim Abdi, M.Sc. Research Intern Conservation GIS/Remote Sensing Lab Smithsonian Conservation Biology Institute 1500 Remount Road Front Royal, VA 22630 phone: +1 540 635 6578 mobile: +1 747 224 7006 fax: +1 540 635 6506 (Attn:GIS Lab) email: abdia at si.edu http://nationalzoo.si.edu/SCBI/ConservationGIS/ [[alternative HTML version deleted]] ______________________________________________ R-help at r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.