Abdi, Abdulhakim
2010-Jul-05 20:01 UTC
[R] data.frame: adding a column that is based on ranges of values in another column
Dear List,
I've been looking tirelessly for a solution to this dilemma but without
success. Perhaps someone has an idea that will guide me in the right direction.
Suppose I have the following data.frame:
DF = data.frame(X = c(114.5508, 114.6468, 114.6596, 114.6957, 114.6828,
114.8903, 114.9519, 114.8842,
114.8579, 114.8489), Y = c(47.14094, 46.98874, 46.91235, 46.88265, 46.80584,
46.67022, 46.53264, 46.47727,
46.46457, 46.47032), Date = as.Date(c('2009-01-01',
'2009-01-03', '2009-01-05', '2009-01-10',
'2009-01-14',
'2009-01-15', '2009-01-16', '2009-01-17',
'2009-01-22', '2009-01-29')))
DF
X Y Date
1 114.5508 47.14094 2009-01-01
2 114.6468 46.98874 2009-01-03
3 114.6596 46.91235 2009-01-05
4 114.6957 46.88265 2009-01-10
5 114.6828 46.80584 2009-01-14
6 114.8903 46.67022 2009-01-15
7 114.9519 46.53264 2009-01-16
8 114.8842 46.47727 2009-01-17
9 114.8579 46.46457 2009-01-22
10 114.8489 46.47032 2009-01-29
I also have two objects that contain the dates of the first and last fortnight
of the month of January 2009.
s.d1 = '2009-01-01'
e.d1 = '2009-01-14'
f.n1 = seq(from = as.Date(s.d1) , to = as.Date(e.d1), by = 1)
f.n1
[1] "2009-01-01" "2009-01-02" "2009-01-03"
"2009-01-04" "2009-01-05" "2009-01-06"
"2009-01-07" "2009-01-08" "2009-01-09"
"2009-01-10" "2009-01-11" "2009-01-12"
"2009-01-13" "2009-01-14"
s.d2 = '2009-01-15'
e.d2 = '2009-01-31'
f.n2 = seq(from = as.Date(s.d2) , to = as.Date(e.d2), by = 1)
f.n2
[1] "2009-01-15" "2009-01-16" "2009-01-17"
"2009-01-18" "2009-01-19" "2009-01-20"
"2009-01-21" "2009-01-22" "2009-01-23"
"2009-01-24" "2009-01-25" "2009-01-26"
"2009-01-27" "2009-01-28" "2009-01-29"
"2009-01-30" "2009-01-31"
I'm trying to add a column called "Fortnight" to the existing
data.frame. The components of the new "Fortnight" column are based on
the existing "Date" column so that if the value in "Date"
falls within the first fortnight (f.n1) then the value of the new
"Fortnight" column would be "FN1", and if the value of the
"Date" column falls within the second fortnight (f.n2), then the value
of the "Fortnight" column would be "FN2", and so on.
The end result should look like:
X Y Date Fortnight
1 114.5508 47.14094 2009-01-01 FN1
2 114.6468 46.98874 2009-01-03 FN1
3 114.6596 46.91235 2009-01-05 FN1
4 114.6957 46.88265 2009-01-10 FN1
5 114.6828 46.80584 2009-01-14 FN1
6 114.8903 46.67022 2009-01-15 FN2
7 114.9519 46.53264 2009-01-16 FN2
8 114.8842 46.47727 2009-01-17 FN2
9 114.8579 46.46457 2009-01-22 FN2
10 114.8489 46.47032 2009-01-29 FN2
I manually entered the above values for the "Fortnight" column to
illustrate my point, however, that would be quite tiresome for 500+ rows of data
;-)
The only other similar issue I found on the list was
https://stat.ethz.ch/pipermail/r-help/2008-February/153995.html but that
particular problem is slightly different than what I'm trying to accomplish
here.
I appreciate your time and assistance.
Thanks in advance.
Regards,
Hakim Abdi
_________________________________
Abdulhakim Abdi, M.Sc.
Research Intern
Conservation GIS/Remote Sensing Lab
Smithsonian Conservation Biology Institute
1500 Remount Road
Front Royal, VA 22630
phone: +1 540 635 6578
mobile: +1 747 224 7006
fax: +1 540 635 6506 (Attn:GIS Lab)
email: abdia@si.edu
http://nationalzoo.si.edu/SCBI/ConservationGIS/
[[alternative HTML version deleted]]
jim holtman
2010-Jul-05 20:55 UTC
[R] data.frame: adding a column that is based on ranges of values in another column
use 'merge':> DF = data.frame(X = c(114.5508, 114.6468, 114.6596, 114.6957, 114.6828, 114.8903, 114.9519, 114.8842,+ 114.8579, 114.8489), Y = c(47.14094, 46.98874, 46.91235, 46.88265, 46.80584, 46.67022, 46.53264, 46.47727, + 46.46457, 46.47032), Date = as.Date(c('2009-01-01', '2009-01-03', '2009-01-05', '2009-01-10', '2009-01-14', + '2009-01-15', '2009-01-16', '2009-01-17', '2009-01-22', '2009-01-29')))> > s.d1 = '2009-01-01' > e.d1 = '2009-01-14' > f.n1 = seq(from = as.Date(s.d1) , to = as.Date(e.d1), by = 1) > > s.d2 = '2009-01-15' > e.d2 = '2009-01-31' > f.n2 = seq(from = as.Date(s.d2) , to = as.Date(e.d2), by = 1) > x.new <- data.frame(Date=c(f.n1, f.n2),+ Fortnight=c(rep("FN1", length(f.n1)), rep("FN2", length(f.n2))))> > merge(DF, x.new, all.x=TRUE)Date X Y Fortnight 1 2009-01-01 114.5508 47.14094 FN1 2 2009-01-03 114.6468 46.98874 FN1 3 2009-01-05 114.6596 46.91235 FN1 4 2009-01-10 114.6957 46.88265 FN1 5 2009-01-14 114.6828 46.80584 FN1 6 2009-01-15 114.8903 46.67022 FN2 7 2009-01-16 114.9519 46.53264 FN2 8 2009-01-17 114.8842 46.47727 FN2 9 2009-01-22 114.8579 46.46457 FN2 10 2009-01-29 114.8489 46.47032 FN2 On Mon, Jul 5, 2010 at 4:01 PM, Abdi, Abdulhakim <AbdiA at si.edu> wrote:> Dear List, > > I've been looking tirelessly for a solution to this dilemma but without success. Perhaps someone has an idea that will guide me in the right direction. > > Suppose I have the following data.frame: > > DF = data.frame(X = c(114.5508, 114.6468, 114.6596, 114.6957, 114.6828, 114.8903, 114.9519, 114.8842, > 114.8579, 114.8489), Y = c(47.14094, 46.98874, 46.91235, 46.88265, 46.80584, 46.67022, 46.53264, 46.47727, > 46.46457, 46.47032), Date = as.Date(c('2009-01-01', '2009-01-03', '2009-01-05', '2009-01-10', '2009-01-14', > '2009-01-15', '2009-01-16', '2009-01-17', '2009-01-22', '2009-01-29'))) > > DF > ? ? ? ? ?X ? ? ? ?Y ? ? ? Date > 1 ?114.5508 47.14094 2009-01-01 > 2 ?114.6468 46.98874 2009-01-03 > 3 ?114.6596 46.91235 2009-01-05 > 4 ?114.6957 46.88265 2009-01-10 > 5 ?114.6828 46.80584 2009-01-14 > 6 ?114.8903 46.67022 2009-01-15 > 7 ?114.9519 46.53264 2009-01-16 > 8 ?114.8842 46.47727 2009-01-17 > 9 ?114.8579 46.46457 2009-01-22 > 10 114.8489 46.47032 2009-01-29 > > I also have two objects that contain the dates of the first and last fortnight of the month of January 2009. > > s.d1 = '2009-01-01' > e.d1 = '2009-01-14' > f.n1 = seq(from = as.Date(s.d1) ?, to = ?as.Date(e.d1), by = 1) > > f.n1 > [1] "2009-01-01" "2009-01-02" "2009-01-03" "2009-01-04" "2009-01-05" "2009-01-06" "2009-01-07" "2009-01-08" "2009-01-09" "2009-01-10" "2009-01-11" "2009-01-12" "2009-01-13" "2009-01-14" > > s.d2 = '2009-01-15' > e.d2 = '2009-01-31' > f.n2 = seq(from = as.Date(s.d2) ?, to = ?as.Date(e.d2), by = 1) > > f.n2 > [1] "2009-01-15" "2009-01-16" "2009-01-17" "2009-01-18" "2009-01-19" "2009-01-20" "2009-01-21" "2009-01-22" "2009-01-23" "2009-01-24" "2009-01-25" "2009-01-26" "2009-01-27" "2009-01-28" "2009-01-29" "2009-01-30" "2009-01-31" > > > I'm trying to add a column called "Fortnight" to the existing data.frame. The components of the new "Fortnight" column are based on the existing "Date" column so that if the value in "Date" falls within the first fortnight (f.n1) then the value of the new "Fortnight" column would be "FN1", and if the value of the "Date" column falls within the second fortnight (f.n2), then the value of the "Fortnight" column would be "FN2", and so on. > > The end result should look like: > > ? ? ? ? ?X ? ? ? ?Y ? ? ? Date Fortnight > 1 ?114.5508 47.14094 2009-01-01 ? ? ? FN1 > 2 ?114.6468 46.98874 2009-01-03 ? ? ? FN1 > 3 ?114.6596 46.91235 2009-01-05 ? ? ? FN1 > 4 ?114.6957 46.88265 2009-01-10 ? ? ? FN1 > 5 ?114.6828 46.80584 2009-01-14 ? ? ? FN1 > 6 ?114.8903 46.67022 2009-01-15 ? ? ? FN2 > 7 ?114.9519 46.53264 2009-01-16 ? ? ? FN2 > 8 ?114.8842 46.47727 2009-01-17 ? ? ? FN2 > 9 ?114.8579 46.46457 2009-01-22 ? ? ? FN2 > 10 114.8489 46.47032 2009-01-29 ? ? ? FN2 > > I manually entered the above values for the "Fortnight" column to illustrate my point, however, that would be quite tiresome for 500+ rows of data ;-) > > The only other similar issue I found on the list was https://stat.ethz.ch/pipermail/r-help/2008-February/153995.html but that particular problem is slightly different than what I'm trying to accomplish here. > > I appreciate your time and assistance. > > Thanks in advance. > > Regards, > > > Hakim Abdi > > > > _________________________________ > Abdulhakim Abdi, M.Sc. > Research Intern > > Conservation GIS/Remote Sensing Lab > Smithsonian Conservation Biology Institute > 1500 Remount Road > Front Royal, VA 22630 > phone: +1 540 635 6578 > mobile: +1 747 224 7006 > fax: +1 540 635 6506 (Attn:GIS Lab) > email: abdia at si.edu > http://nationalzoo.si.edu/SCBI/ConservationGIS/ > > > > > > > ? ? ? ?[[alternative HTML version deleted]] > > ______________________________________________ > R-help at r-project.org mailing list > https://stat.ethz.ch/mailman/listinfo/r-help > PLEASE do read the posting guide http://www.R-project.org/posting-guide.html > and provide commented, minimal, self-contained, reproducible code. >-- Jim Holtman Cincinnati, OH +1 513 646 9390 What is the problem that you are trying to solve?
Dennis Murphy
2010-Jul-05 20:55 UTC
[R] data.frame: adding a column that is based on ranges of values in another column
Hi:
Since you've been looking tirelessly :)
For your stated problem, the following will work:
DF$Fortnight <- with(DF, ifelse(Date %in% f.n1, 'FN1',
ifelse(Date %in% f.n2, 'FN2', 'FN3')))
However, if you have a number of fortnights (perhaps stretching over several
years), you need a different approach. The idea is to use the last day of
2008 as the origin in your example and compute the number of days past it
for each observation. We then integer divide the number of days past the the
origin by 14 and add one to get the fortnight number.
origin <- as.Date('2008-12-31') # set origin
DF$doy <- DF$Date - origin # days past origin
DF$FN <- 1 + as.numeric(DF$doy)%/% 14 # fortnight number past origin
> DF
X Y Date Fortnight doy FN
1 114.5508 47.14094 2009-01-01 FN1 1 days 1
2 114.6468 46.98874 2009-01-03 FN1 3 days 1
3 114.6596 46.91235 2009-01-05 FN1 5 days 1
4 114.6957 46.88265 2009-01-10 FN1 10 days 1
5 114.6828 46.80584 2009-01-14 FN1 14 days 2
6 114.8903 46.67022 2009-01-15 FN2 15 days 2
7 114.9519 46.53264 2009-01-16 FN2 16 days 2
8 114.8842 46.47727 2009-01-17 FN2 17 days 2
9 114.8579 46.46457 2009-01-22 FN2 22 days 2
10 114.8489 46.47032 2009-01-29 FN3 29 days 3
This should be less tedious than the ifelse() approach.
HTH,
Dennis
On Mon, Jul 5, 2010 at 1:01 PM, Abdi, Abdulhakim <AbdiA@si.edu> wrote:
> Dear List,
>
> I've been looking tirelessly for a solution to this dilemma but without
> success. Perhaps someone has an idea that will guide me in the right
> direction.
>
> Suppose I have the following data.frame:
>
> DF = data.frame(X = c(114.5508, 114.6468, 114.6596, 114.6957, 114.6828,
> 114.8903, 114.9519, 114.8842,
> 114.8579, 114.8489), Y = c(47.14094, 46.98874, 46.91235, 46.88265,
> 46.80584, 46.67022, 46.53264, 46.47727,
> 46.46457, 46.47032), Date = as.Date(c('2009-01-01',
'2009-01-03',
> '2009-01-05', '2009-01-10', '2009-01-14',
> '2009-01-15', '2009-01-16', '2009-01-17',
'2009-01-22', '2009-01-29')))
>
> DF
> X Y Date
> 1 114.5508 47.14094 2009-01-01
> 2 114.6468 46.98874 2009-01-03
> 3 114.6596 46.91235 2009-01-05
> 4 114.6957 46.88265 2009-01-10
> 5 114.6828 46.80584 2009-01-14
> 6 114.8903 46.67022 2009-01-15
> 7 114.9519 46.53264 2009-01-16
> 8 114.8842 46.47727 2009-01-17
> 9 114.8579 46.46457 2009-01-22
> 10 114.8489 46.47032 2009-01-29
>
> I also have two objects that contain the dates of the first and last
> fortnight of the month of January 2009.
>
> s.d1 = '2009-01-01'
> e.d1 = '2009-01-14'
> f.n1 = seq(from = as.Date(s.d1) , to = as.Date(e.d1), by = 1)
>
> f.n1
> [1] "2009-01-01" "2009-01-02" "2009-01-03"
"2009-01-04" "2009-01-05"
> "2009-01-06" "2009-01-07" "2009-01-08"
"2009-01-09" "2009-01-10"
> "2009-01-11" "2009-01-12" "2009-01-13"
"2009-01-14"
>
> s.d2 = '2009-01-15'
> e.d2 = '2009-01-31'
> f.n2 = seq(from = as.Date(s.d2) , to = as.Date(e.d2), by = 1)
>
> f.n2
> [1] "2009-01-15" "2009-01-16" "2009-01-17"
"2009-01-18" "2009-01-19"
> "2009-01-20" "2009-01-21" "2009-01-22"
"2009-01-23" "2009-01-24"
> "2009-01-25" "2009-01-26" "2009-01-27"
"2009-01-28" "2009-01-29"
> "2009-01-30" "2009-01-31"
>
>
> I'm trying to add a column called "Fortnight" to the existing
data.frame.
> The components of the new "Fortnight" column are based on the
existing
> "Date" column so that if the value in "Date" falls
within the first
> fortnight (f.n1) then the value of the new "Fortnight" column
would be
> "FN1", and if the value of the "Date" column falls
within the second
> fortnight (f.n2), then the value of the "Fortnight" column would
be "FN2",
> and so on.
>
> The end result should look like:
>
> X Y Date Fortnight
> 1 114.5508 47.14094 2009-01-01 FN1
> 2 114.6468 46.98874 2009-01-03 FN1
> 3 114.6596 46.91235 2009-01-05 FN1
> 4 114.6957 46.88265 2009-01-10 FN1
> 5 114.6828 46.80584 2009-01-14 FN1
> 6 114.8903 46.67022 2009-01-15 FN2
> 7 114.9519 46.53264 2009-01-16 FN2
> 8 114.8842 46.47727 2009-01-17 FN2
> 9 114.8579 46.46457 2009-01-22 FN2
> 10 114.8489 46.47032 2009-01-29 FN2
>
> I manually entered the above values for the "Fortnight" column to
> illustrate my point, however, that would be quite tiresome for 500+ rows of
> data ;-)
>
> The only other similar issue I found on the list was
> https://stat.ethz.ch/pipermail/r-help/2008-February/153995.html but that
> particular problem is slightly different than what I'm trying to
accomplish
> here.
>
> I appreciate your time and assistance.
>
> Thanks in advance.
>
> Regards,
>
>
> Hakim Abdi
>
>
>
> _________________________________
> Abdulhakim Abdi, M.Sc.
> Research Intern
>
> Conservation GIS/Remote Sensing Lab
> Smithsonian Conservation Biology Institute
> 1500 Remount Road
> Front Royal, VA 22630
> phone: +1 540 635 6578
> mobile: +1 747 224 7006
> fax: +1 540 635 6506 (Attn:GIS Lab)
> email: abdia@si.edu
> http://nationalzoo.si.edu/SCBI/ConservationGIS/
>
>
>
>
>
>
> [[alternative HTML version deleted]]
>
> ______________________________________________
> R-help@r-project.org mailing list
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide
> http://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.
>
[[alternative HTML version deleted]]
Bill.Venables at csiro.au
2010-Jul-05 23:48 UTC
[R] data.frame: adding a column that is based on ranges of values in another column
Here is one way> checkList <- data.frame(Day = c(f.n1, f.n2),+ FN = rep(c("FN1","FN2"), + c(length(f.n1), length(f.n2))))> m <- match(DF$Date, checkList$Day) > DF <- cbind(DF, Fortnight = checkList$FN[m]) > DFX Y Date Fortnight 1 114.5508 47.14094 2009-01-01 FN1 2 114.6468 46.98874 2009-01-03 FN1 3 114.6596 46.91235 2009-01-05 FN1 4 114.6957 46.88265 2009-01-10 FN1 5 114.6828 46.80584 2009-01-14 FN1 6 114.8903 46.67022 2009-01-15 FN2 7 114.9519 46.53264 2009-01-16 FN2 8 114.8842 46.47727 2009-01-17 FN2 9 114.8579 46.46457 2009-01-22 FN2 10 114.8489 46.47032 2009-01-29 FN2>-----Original Message----- From: r-help-bounces at r-project.org [mailto:r-help-bounces at r-project.org] On Behalf Of Abdi, Abdulhakim Sent: Tuesday, 6 July 2010 6:01 AM To: r-help at r-project.org Subject: [R] data.frame: adding a column that is based on ranges of values in another column Dear List, I've been looking tirelessly for a solution to this dilemma but without success. Perhaps someone has an idea that will guide me in the right direction. Suppose I have the following data.frame: DF = data.frame(X = c(114.5508, 114.6468, 114.6596, 114.6957, 114.6828, 114.8903, 114.9519, 114.8842, 114.8579, 114.8489), Y = c(47.14094, 46.98874, 46.91235, 46.88265, 46.80584, 46.67022, 46.53264, 46.47727, 46.46457, 46.47032), Date = as.Date(c('2009-01-01', '2009-01-03', '2009-01-05', '2009-01-10', '2009-01-14', '2009-01-15', '2009-01-16', '2009-01-17', '2009-01-22', '2009-01-29'))) DF X Y Date 1 114.5508 47.14094 2009-01-01 2 114.6468 46.98874 2009-01-03 3 114.6596 46.91235 2009-01-05 4 114.6957 46.88265 2009-01-10 5 114.6828 46.80584 2009-01-14 6 114.8903 46.67022 2009-01-15 7 114.9519 46.53264 2009-01-16 8 114.8842 46.47727 2009-01-17 9 114.8579 46.46457 2009-01-22 10 114.8489 46.47032 2009-01-29 I also have two objects that contain the dates of the first and last fortnight of the month of January 2009. s.d1 = '2009-01-01' e.d1 = '2009-01-14' f.n1 = seq(from = as.Date(s.d1) , to = as.Date(e.d1), by = 1) f.n1 [1] "2009-01-01" "2009-01-02" "2009-01-03" "2009-01-04" "2009-01-05" "2009-01-06" "2009-01-07" "2009-01-08" "2009-01-09" "2009-01-10" "2009-01-11" "2009-01-12" "2009-01-13" "2009-01-14" s.d2 = '2009-01-15' e.d2 = '2009-01-31' f.n2 = seq(from = as.Date(s.d2) , to = as.Date(e.d2), by = 1) f.n2 [1] "2009-01-15" "2009-01-16" "2009-01-17" "2009-01-18" "2009-01-19" "2009-01-20" "2009-01-21" "2009-01-22" "2009-01-23" "2009-01-24" "2009-01-25" "2009-01-26" "2009-01-27" "2009-01-28" "2009-01-29" "2009-01-30" "2009-01-31" I'm trying to add a column called "Fortnight" to the existing data.frame. The components of the new "Fortnight" column are based on the existing "Date" column so that if the value in "Date" falls within the first fortnight (f.n1) then the value of the new "Fortnight" column would be "FN1", and if the value of the "Date" column falls within the second fortnight (f.n2), then the value of the "Fortnight" column would be "FN2", and so on. The end result should look like: X Y Date Fortnight 1 114.5508 47.14094 2009-01-01 FN1 2 114.6468 46.98874 2009-01-03 FN1 3 114.6596 46.91235 2009-01-05 FN1 4 114.6957 46.88265 2009-01-10 FN1 5 114.6828 46.80584 2009-01-14 FN1 6 114.8903 46.67022 2009-01-15 FN2 7 114.9519 46.53264 2009-01-16 FN2 8 114.8842 46.47727 2009-01-17 FN2 9 114.8579 46.46457 2009-01-22 FN2 10 114.8489 46.47032 2009-01-29 FN2 I manually entered the above values for the "Fortnight" column to illustrate my point, however, that would be quite tiresome for 500+ rows of data ;-) The only other similar issue I found on the list was https://stat.ethz.ch/pipermail/r-help/2008-February/153995.html but that particular problem is slightly different than what I'm trying to accomplish here. I appreciate your time and assistance. Thanks in advance. Regards, Hakim Abdi _________________________________ Abdulhakim Abdi, M.Sc. Research Intern Conservation GIS/Remote Sensing Lab Smithsonian Conservation Biology Institute 1500 Remount Road Front Royal, VA 22630 phone: +1 540 635 6578 mobile: +1 747 224 7006 fax: +1 540 635 6506 (Attn:GIS Lab) email: abdia at si.edu http://nationalzoo.si.edu/SCBI/ConservationGIS/ [[alternative HTML version deleted]] ______________________________________________ R-help at r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.