Aurelie Cosandey Godin
2011-Nov-01 17:16 UTC
[R] Removal/selecting specific rows in a dataframe conditional on 2 columns
Dear list,

After reading various mails and blogs and trying a few different pieces of code without success, I am asking for your help!

I have the following data frame, where each row represents a survey unit with the following variables:

> names(RV09)
 [1] "record.t"  "trip"      "set"       "month"     "stratum"   "NAFO"
 [7] "unit.area" "time"      "dur.set"   "distance"  "operation" "mean.d"
[13] "min.d"     "max.d"     "temp.d"    "slat"      "slong"     "spp"
[19] "number"    "weight"    "elat"      "elong"

Each survey unit generates one set record, denoted by a 5 in column "record.t". Each species identified in that survey unit generates an additional set record, denoted by a 6.

> unique(RV09$record.t)
[1] 5 6

Each survey unit is identified by a specific "trip" and "set" number, so if there is a type 5 record with no associated type 6 records, no species were observed in that survey unit. I would like to select all and only these survey units, which represent my zeros.

As an example, in trip number 913, sets 1, 3, and 4 would be part of my "zeros" data.frame, since they appear with no record.t 6, meaning that no species were observed in those survey units.

> head(RV09)
    record.t trip set month stratum NAFO unit.area time dur.set distance
585        5  913   1    10     351   3O       R31 1044      17        9
586        5  913   2    10     351   3O       R31 1440      17        9
587        6  913   2    10     351   3O       R31 1440      17        9
588        5  913   3    10     340   3O       Q31 1800      18        9
589        5  913   4    10     340   3O       Q32 2142      17        9

Any tips on how to extract this "zero" data.frame in R?
Thank you very much in advance!

Best,
~Aurelie

Aurelie Cosandey-Godin
Ph.D. student, Department of Biology
Industrial Graduate Fellow, WWF-Canada
Dalhousie University | Email: godina@dal.ca
R. Michael Weylandt
2011-Nov-01 21:16 UTC
[R] Removal/selecting specific rows in a dataframe conditional on 2 columns
Perhaps use tapply() to split by the survey unit and write a little identity function that returns only those rows you want, then patch them all back together with something like simplify2array().

Michael
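The split / filter / recombine idea suggested above might look roughly like the sketch below. It is only an illustration under some assumptions: it swaps in split() and do.call(rbind, ...) for tapply() and simplify2array(), because those are base functions known to work on data-frame rows, and it uses a toy RV09 holding just the five rows and three columns shown in the question. The helper name keep_if_empty is made up for this example.

```r
# Toy data mirroring the five rows shown in the question (other columns omitted).
RV09 <- data.frame(
  record.t = c(5, 5, 6, 5, 5),
  trip     = c(913, 913, 913, 913, 913),
  set      = c(1, 2, 2, 3, 4)
)

# Split the rows by survey unit (trip + set); drop = TRUE skips
# trip/set combinations that do not occur in the data.
units <- split(RV09, list(RV09$trip, RV09$set), drop = TRUE)

# Hypothetical helper: return the unit unchanged if it has no species
# record (record.t == 6), otherwise return NULL.
keep_if_empty <- function(d) if (any(d$record.t == 6)) NULL else d

# rbind() on data frames skips the NULL entries, so only the "zero" units remain.
zeros <- do.call(rbind, lapply(units, keep_if_empty))
rownames(zeros) <- NULL
print(zeros)
```

On the toy data this should leave sets 1, 3, and 4 of trip 913, matching the units the question describes as zeros.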
Dennis Murphy
2011-Nov-01 23:32 UTC
[R] Removal/selecting specific rows in a dataframe conditional on 2 columns
Does this work?

library('plyr')

# Function to return a data frame if it has one row, else return NULL:
f <- function(d) if (nrow(d) == 1L) d else NULL

> ddply(RV09, .(set, month), f)
  record.t trip set month stratum NAFO unit.area time dur.set distance
1        5  913   1    10     351   3O       R31 1044      17        9
2        5  913   3    10     340   3O       Q31 1800      18        9
3        5  913   4    10     340   3O       Q32 2142      17        9

ddply() is an apply-like function that takes a data frame as input and returns a data frame as output (hence the dd). The first argument is the data frame, the second is the set of grouping variables, and the third is the function to be called on each group (f, in this application).

HTH,
Dennis
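For anyone without plyr installed, the same "keep groups that have exactly one row" idea can be sketched in base R with ave(). This is only an illustration, not the method from the reply above: it groups on trip and set, since the question says those two columns identify a survey unit (the reply grouped on set and month), and it uses a toy RV09 with just the five rows and three columns shown in the question. The name n_rows is made up for this example.

```r
# Toy data mirroring the five rows shown in the question (other columns omitted).
RV09 <- data.frame(
  record.t = c(5, 5, 6, 5, 5),
  trip     = c(913, 913, 913, 913, 913),
  set      = c(1, 2, 2, 3, 4)
)

# For every row, count how many rows share its trip/set combination.
n_rows <- ave(seq_len(nrow(RV09)), RV09$trip, RV09$set, FUN = length)

# A unit with a single row that is a set record (5) had no species records (6).
zeros <- RV09[n_rows == 1 & RV09$record.t == 5, ]
print(zeros)
```

The extra record.t == 5 check is a small safeguard in case a lone type 6 row ever appears without its type 5 set record.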