hi all,
I have a data frame such as:

1 blue  0.3
1 NA    0.4
1 red   NA
2 blue  NA
2 green NA
2 blue  NA
3 red   0.5
3 blue  NA
3 NA    1.1

I wish to find the last non-missing value in every triple, i.e. I want a 3-by-3 data.frame such as:

1 red  0.4
2 blue NA
3 blue 1.1

I have written a little script:

data = structure(list(V1 = c(1L, 1L, 1L, 2L, 2L, 2L, 3L, 3L, 3L),
    V2 = structure(c(1L, NA, 3L, 1L, 2L, 1L, 3L, 1L, NA),
        .Label = c("blue", "green", "red"), class = "factor"),
    V3 = c(0.3, 0.4, NA, NA, NA, NA, 0.5, NA, 1.1)),
    .Names = c("V1", "V2", "V3"), class = "data.frame",
    row.names = c(NA, -9L))

# last non-missing element of a vector
cl = function(x) x[max(which(!is.na(x)))]
# apply cl within each group defined by the first column of data
choose.last = function(x) tapply(x, data[, 1], cl)

# now function choose.last works properly on numeric vectors:
> choose.last(data[,3])
  1   2   3
0.4  NA 1.1

# but not on factors (I lose the factor labels):
> choose.last(data[,2])
1 2 3
3 1 1

# moreover, if I apply this function to the whole data.frame,
# the output is a character matrix:
> apply(data,2,choose.last)
  V1  V2     V3
1 "1" "red"  "0.4"
2 "2" "blue" NA
3 "3" "blue" "1.1"

# and if I sapply, I lose the factor labels:
> sapply(data,choose.last)
  V1 V2  V3
1  1  3 0.4
2  2  1  NA
3  3  1 1.1

any hint?

Thanks in advance,
Patrizio

+-------------------------------------------------
| Patrizio Frederic, PhD
| Research associate in Statistics,
| Department of Economics,
| University of Modena and Reggio Emilia,
| Via Berengario 51,
| 41100 Modena, Italy
|
| tel: +39 059 205 6727
| fax: +39 059 205 6947
| mail: patrizio.frederic at unimore.it
+-------------------------------------------------
Yes. Read the help pages **carefully**! E.g., ?tapply says that the first argument is an **atomic** vector. A factor is not an atomic vector, so tapply treats it as one by looking only at its underlying representation, which is a set of integer codes. apply works on **arrays**, which must be of a single type, so it silently converts the data frame to the simplest common type it "can," which here is an array of characters. Etc.

I admit that these details are somewhat obscure and even annoying, but they **are** documented. I think that's all we can expect. Some have lamented the language's lack of perfect consistency in these matters, but I cannot see how that would be possible given its nature: it is intended to be **easily** used for high-level data manipulation, graphics, statistical analysis, etc., as well as for programming. There are just too many possible data structures to expect logical consistency in their handling throughout (if one can even define what that means in specific instances!).

All these little inconveniences can be worked around easily, of course. For example, if f.new is your new vector of integer factor codes and f.old is your original factor, levels(f.old)[f.new] converts f.new to the appropriate character vector. And so forth.

So the key is: pay **careful** attention to the docs.
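A minimal sketch of the points above on toy data (the names f.old, grp, f.new, f.labels and df are illustrative only, not from the original post). It shows tapply() handing back a factor's integer codes rather than its labels, levels(f.old)[f.new] turning those codes back into labels, and apply() seeing only character columns once a data frame with a factor column has been coerced to a matrix.

```r
# Toy factor and grouping vector (hypothetical data, modelled on column V2).
f.old <- factor(c("blue", NA, "red", "blue", "green", "blue"),
                levels = c("blue", "green", "red"))
grp <- c(1, 1, 1, 2, 2, 2)

# Last non-missing value in each group; tapply() stores the results in a
# plain integer array, so only the factor's integer codes survive.
f.new <- tapply(f.old, grp, function(x) x[max(which(!is.na(x)))])
print(f.new)

# Indexing the level set by those codes recovers the labels.
f.labels <- levels(f.old)[f.new]
print(f.labels)

# apply() first coerces a data frame to a matrix (via as.matrix()); with a
# factor column present that matrix is character, so every column is too.
df <- data.frame(id = c(1, 2), colour = factor(c("a", "b")))
col.classes <- apply(df, 2, class)
print(col.classes)
```

The same levels(...)[codes] trick works on any integer vector of codes, for example one column of the sapply() result in the original post.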
-- Bert Gunter

-----Original Message-----
From: r-help-bounces at r-project.org [mailto:r-help-bounces at r-project.org] On Behalf Of Patrizio Frederic
Sent: Wednesday, December 10, 2008 2:09 PM
To: r-help at r-project.org
Subject: [R] repeated searching of no-missing values
______________________________________________
R-help at r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.
On Wed, Dec 10, 2008 at 4:09 PM, Patrizio Frederic <frederic.patrizio at gmail.com> wrote:
> cl = function(x) x[max(which(!is.na(x)))]

It's easy to do this with ddply from plyr:

library(plyr)
ddply(data, .(V1), colwise(cl))

In brief, this says: take the data frame called data and break it up into pieces defined by the variable V1. Then, for each piece, apply cl to each column, and finally join all the pieces back together.

Hadley

--
http://had.co.nz/
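For readers without plyr installed, the same split-apply-combine idea can be sketched in base R with split(), lapply() and do.call(rbind, ...). This is an alternative to Hadley's ddply() call, not a description of what plyr does internally. last_non_na is a hypothetical variant of cl that also guards against all-NA groups: for V3 in group 2, cl calls max() on an empty vector, which returns -Inf with a warning. Because each column is reduced by subsetting, a factor column comes back as a factor with its labels intact.

```r
# Example data from the thread, built with data.frame() instead of structure().
data <- data.frame(
  V1 = c(1L, 1L, 1L, 2L, 2L, 2L, 3L, 3L, 3L),
  V2 = factor(c("blue", NA, "red", "blue", "green", "blue", "red", "blue", NA),
              levels = c("blue", "green", "red")),
  V3 = c(0.3, 0.4, NA, NA, NA, NA, 0.5, NA, 1.1)
)

# Hypothetical helper: last non-missing element of x. Subsetting keeps x's
# type and class (a factor stays a factor); an all-NA input yields a typed
# NA instead of evaluating max() on an empty vector.
last_non_na <- function(x) {
  ok <- which(!is.na(x))
  if (length(ok) == 0) x[NA_integer_] else x[max(ok)]
}

# Split: one data frame per value of V1.
pieces <- split(data, data$V1)

# Apply: reduce every column of each piece to a single value, giving a
# one-row data frame per piece.
reduced <- lapply(pieces, function(d) as.data.frame(lapply(d, last_non_na)))

# Combine: stack the one-row data frames; factor columns sharing the same
# levels stay factors.
result <- do.call(rbind, reduced)
rownames(result) <- NULL
print(result)
```

With the thread's data this should give V1 = 1, 2, 3, V2 = red, blue, blue (still a factor) and V3 = 0.4, NA, 1.1, matching the table asked for in the original post.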
> graphics, statistical analysis etc. as well as programming. There are just
> too many possible data structures to expect logical consistency in their
> handling throughout (if one can even define what that means in specific
> instances!).

I disagree with this claim: I think it is possible to create a logical and consistent set of functions for working with all the basic data structures in R, and this is what I have attempted to do with my plyr package. Any remaining inconsistencies are due to my failings, not to the fundamental difficulty of the task.

Hadley

--
http://had.co.nz/
Hadley:

Perhaps... But plyr works only on **basic** data structures, and I referred to all **possible** data structures (deliberately); so I stand by my statement and note that you did not contradict it.

-- Bert

-----Original Message-----
From: r-help-bounces at r-project.org [mailto:r-help-bounces at r-project.org] On Behalf Of hadley wickham
Sent: Wednesday, December 10, 2008 3:52 PM
To: Bert Gunter
Cc: r-help at r-project.org
Subject: Re: [R] repeated searching of no-missing values