Hi,

I would appreciate it if somebody could help me with this small issue...
I have a data frame like this (the original has more than 100 000 rows):

> subz
         jul                time    dtime      fix    ddawn    ddusk day
101608 15006 2011-02-01 19:14:49 19.24694     noon 7.916667 19.88333   1
101609 15006 2011-02-01 19:24:49 19.41361 midnight 7.916667 19.56667   1
101610 15006 2011-02-01 19:24:49 19.41361     noon 7.916667 19.88333   1
101611 15006 2011-02-01 19:34:49 19.58028 midnight 7.916667 19.56667   0
101612 15006 2011-02-01 19:34:49 19.58028     noon 7.916667 19.88333   1
101613 15006 2011-02-01 19:44:49 19.74694 midnight 7.916667 19.56667   0
101614 15006 2011-02-01 19:44:49 19.74694     noon 7.916667 19.88333   1
101615 15006 2011-02-01 19:54:49 19.91361 midnight 7.916667 19.56667   0
101616 15006 2011-02-01 19:54:49 19.91361     noon 7.916667 19.88333   0
101617 15006 2011-02-01 20:04:49 20.08028 midnight 7.916667 19.56667   0
101618 15006 2011-02-01 20:04:49 20.08028     noon 7.916667 19.88333   0

> dput(subz)
structure(list(jul = c(15006, 15006, 15006, 15006, 15006, 15006,
15006, 15006, 15006, 15006, 15006), time = structure(c(1296587689,
1296588289, 1296588289, 1296588889, 1296588889, 1296589489, 1296589489,
1296590089, 1296590089, 1296590689, 1296590689), class = c("POSIXct",
"POSIXt"), tzone = "GMT"), dtime = c(19.2469444444444, 19.4136111111111,
19.4136111111111, 19.5802777777778, 19.5802777777778, 19.7469444444444,
19.7469444444444, 19.9136111111111, 19.9136111111111, 20.0802777777778,
20.0802777777778), fix = structure(c(2L, 1L, 2L, 1L, 2L, 1L,
2L, 1L, 2L, 1L, 2L), .Label = c("midnight", "noon"), class = "factor"),
    ddawn = c(7.91666666666667, 7.91666666666667, 7.91666666666667,
    7.91666666666667, 7.91666666666667, 7.91666666666667, 7.91666666666667,
    7.91666666666667, 7.91666666666667, 7.91666666666667, 7.91666666666667
    ), ddusk = c(19.8833333333333, 19.5666666666667, 19.8833333333333,
    19.5666666666667, 19.8833333333333, 19.5666666666667, 19.8833333333333,
    19.5666666666667, 19.8833333333333, 19.5666666666667, 19.8833333333333
    ), day = c(1, 1, 1, 0, 1, 0, 1, 0, 0, 0, 0)), .Names = c("jul",
"time", "dtime", "fix", "ddawn", "ddusk", "day"), row.names = 101608:101618,
class = "data.frame")

where "day" is calculated as

subz$day <- ifelse( subz$dtime > subz$ddusk | subz$dtime < subz$ddawn, 0, 1 )

The way I would like to calculate "day" is this:
- for the same "time", "day" is calculated for the "noon" row as shown
above, but the "midnight" row just copies the value from the "noon" row.
So for the same "time", the "day" value should be the same for "noon" and
"midnight".
Something like this:

         jul                time    dtime      fix    ddawn    ddusk day
101608 15006 2011-02-01 19:14:49 19.24694     noon 7.916667 19.88333   1
101609 15006 2011-02-01 19:24:49 19.41361 midnight 7.916667 19.56667   1
101610 15006 2011-02-01 19:24:49 19.41361     noon 7.916667 19.88333   1
101611 15006 2011-02-01 19:34:49 19.58028 midnight 7.916667 19.56667   1
101612 15006 2011-02-01 19:34:49 19.58028     noon 7.916667 19.88333   1
101613 15006 2011-02-01 19:44:49 19.74694 midnight 7.916667 19.56667   1
101614 15006 2011-02-01 19:44:49 19.74694     noon 7.916667 19.88333   1
101615 15006 2011-02-01 19:54:49 19.91361 midnight 7.916667 19.56667   0
101616 15006 2011-02-01 19:54:49 19.91361     noon 7.916667 19.88333   0
101617 15006 2011-02-01 20:04:49 20.08028 midnight 7.916667 19.56667   0
101618 15006 2011-02-01 20:04:49 20.08028     noon 7.916667 19.88333   0

Where I get stuck is that I don't know how to get the value for "midnight".

Any suggestion is welcome. Thanks

Zuzana
It sounds like, although your "noon" and "midnight" data are in separate
rows, they are not fully independent. If I understand correctly, the
operation you want to perform would be simple if you had (at least
temporarily) a single row per "time" with columns ddawn.midnight,
ddusk.midnight, ddawn.noon and ddusk.noon, rather than two separate rows.

I recommend you check out the reshape package, http://had.co.nz/reshape/,
and read the paper Hadley wrote about it for a conceptual understanding of
wide vs. long data.

On Fri, Mar 22, 2013 at 11:18 AM, zuzana zajkova <zuzulaz@gmail.com> wrote:
> [...]
> ______________________________________________
> R-help@r-project.org mailing list
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide
> http://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.
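The wide layout suggested above can be sketched in base R. This is only an illustration under stated assumptions, not the reshape package's own interface: it uses merge() as a stand-in, the data frame is rebuilt from the dput() output in the original post (with jul left out), and the names cols, noon, midnight and wide are made up for the example.

```r
# Example rows rebuilt from the dput() output in the original post.
subz <- data.frame(
  time  = as.POSIXct(c(1296587689, 1296588289, 1296588289, 1296588889,
                       1296588889, 1296589489, 1296589489, 1296590089,
                       1296590089, 1296590689, 1296590689),
                     origin = "1970-01-01", tz = "GMT"),
  dtime = c(19.2469444444444, 19.4136111111111, 19.4136111111111,
            19.5802777777778, 19.5802777777778, 19.7469444444444,
            19.7469444444444, 19.9136111111111, 19.9136111111111,
            20.0802777777778, 20.0802777777778),
  fix   = factor(rep(c("noon", "midnight"), length.out = 11)),
  ddawn = 7.91666666666667,
  ddusk = rep(c(19.8833333333333, 19.5666666666667), length.out = 11)
)

# Split by fix type (hypothetical helper objects, not from the thread).
cols     <- c("time", "dtime", "ddawn", "ddusk")
noon     <- subz[subz$fix == "noon", cols]
midnight <- subz[subz$fix == "midnight", cols]

# One row per timestamp, with .noon and .midnight columns side by side.
# all = TRUE keeps timestamps that have only one of the two fixes.
wide <- merge(noon, midnight, by = "time", all = TRUE,
              suffixes = c(".noon", ".midnight"))
wide <- wide[order(wide$time), ]

# A single "day" value per timestamp, taken from the noon columns only.
wide$day <- ifelse(wide$dtime.noon > wide$ddusk.noon |
                   wide$dtime.noon < wide$ddawn.noon, 0, 1)
```

With one row per "time", there is a single "day" value to compute, so "noon" and "midnight" cannot disagree. The first timestamp has no "midnight" row in the sample, so its .midnight columns are NA.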
Hello,

Try the following.

# rows that hold a "noon" fix
idx <- which(subz$fix == "noon")
# drop the last index if it is the final row, so idx + 1 stays in range
if(idx[length(idx)] == nrow(subz)) idx <- idx[-length(idx)]
# copy each "noon" value of day to the row that follows it
subz$day[idx + 1] <- subz$day[idx]

Hope this helps,

Rui Barradas

On 22-03-2013 18:18, zuzana zajkova wrote:
> [...]
Hi Rui,

thank you for your code, but unfortunately it doesn't work correctly.
What I got is this:

> subz
         jul                time    dtime      fix    ddawn    ddusk day
101608 15006 2011-02-01 19:14:49 19.24694     noon 7.916667 19.88333   1
101609 15006 2011-02-01 19:24:49 19.41361 midnight 7.916667 19.56667   1
101610 15006 2011-02-01 19:24:49 19.41361     noon 7.916667 19.88333   1
101611 15006 2011-02-01 19:34:49 19.58028 midnight 7.916667 19.56667   1
101612 15006 2011-02-01 19:34:49 19.58028     noon 7.916667 19.88333   1
101613 15006 2011-02-01 19:44:49 19.74694 midnight 7.916667 19.56667   1
101614 15006 2011-02-01 19:44:49 19.74694     noon 7.916667 19.88333   1
101615 15006 2011-02-01 19:54:49 19.91361 midnight 7.916667 19.56667   1
101616 15006 2011-02-01 19:54:49 19.91361     noon 7.916667 19.88333   0
101617 15006 2011-02-01 20:04:49 20.08028 midnight 7.916667 19.56667   0
101618 15006 2011-02-01 20:04:49 20.08028     noon 7.916667 19.88333   0

Here "day" for "time" 19:54:49 is 1 for midnight and 0 for noon.
Both are supposed to be 0 (as "dtime" 19.91361 > "ddusk" for noon, 19.88333).

Probably the problem is adding 1 to the index in

subz$day[idx + 1] <- subz$day[idx]

So far, I haven't found a solution...

Zuzana

On 22 March 2013 20:01, Rui Barradas <ruipbarradas@sapo.pt> wrote:
> [...]
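In the sample rows, the "midnight" row comes before the "noon" row with the same "time", so idx + 1 picks up the "noon" value of the previous timestamp. One way to avoid depending on row order is to look up the "noon" row by its "time" value instead of by position. What follows is a minimal sketch, assuming each "time" has at most one "noon" row; the data frame is rebuilt from the dput() output in the original post (with jul left out), and the names noon, pos and has_noon are introduced here for illustration only.

```r
# Example rows rebuilt from the dput() output in the original post.
subz <- data.frame(
  time  = as.POSIXct(c(1296587689, 1296588289, 1296588289, 1296588889,
                       1296588889, 1296589489, 1296589489, 1296590089,
                       1296590089, 1296590689, 1296590689),
                     origin = "1970-01-01", tz = "GMT"),
  dtime = c(19.2469444444444, 19.4136111111111, 19.4136111111111,
            19.5802777777778, 19.5802777777778, 19.7469444444444,
            19.7469444444444, 19.9136111111111, 19.9136111111111,
            20.0802777777778, 20.0802777777778),
  fix   = factor(rep(c("noon", "midnight"), length.out = 11)),
  ddawn = 7.91666666666667,
  ddusk = rep(c(19.8833333333333, 19.5666666666667), length.out = 11)
)

# Step 1: compute "day" row by row, as in the original post.
subz$day <- ifelse(subz$dtime > subz$ddusk | subz$dtime < subz$ddawn, 0, 1)

# Step 2: for every row, find the "noon" row with the same time.
# Assumption: at most one "noon" row per time value.
noon <- subz[subz$fix == "noon", ]
pos  <- match(as.numeric(subz$time), as.numeric(noon$time))

# Overwrite "day" with the matching "noon" value; rows whose time has
# no "noon" row keep the value computed in step 1.
has_noon <- !is.na(pos)
subz$day[has_noon] <- noon$day[pos[has_noon]]
```

On these rows the "midnight" values at 19:34:49 and 19:44:49 change from 0 to 1, and 19:54:49 stays 0 for both fixes. Because the lookup is by value, it should also hold if the rows are sorted differently in the full data.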