Dear R-Listers,

My question concerns indexing vectors by logical vectors that are based on the original vector. Consider the following simple example to hopefully make clear what I mean:

    a <- rnorm(10)
    a[a<0] <- NA

However, I am now working with multiple data frames that I received, where each of them has nicely descriptive, yet long names(). In my scripts there are many instances where operations similar to the one above are required. Again a simple example:

    some.data.frame <- data.frame(some.long.variable.name=rnorm(10),
                                  some.other.long.variable.name=rnorm(10))

    some.data.frame$some.other.long.variable.name[some.data.frame$some.other.long.variable.name < 0] <- NA

The fact that the names are so long makes things not very readable in the script and hard to debug. Is there a way in R to refer to the "self" of whatever is being indexed? I am looking for something like

    some.data.frame$some.other.long.variable.name[.self < 0] <- NA

that would accomplish the same result as above. Or is there another concise, but less messy way to do this? I prefer not attaching the data.frames, and partial matching makes things even more messy since many names() are very similar. I know I could just rename everything, but I'd like to learn if there is an easy or obvious way to do this in R that I have missed so far.

I would appreciate any advice, and I apologize if this topic has been discussed before.

> sessionInfo()
R version 2.11.0 (2010-04-22)
x86_64-redhat-linux-gnu

locale:
 [1] LC_CTYPE=en_US.UTF-8       LC_NUMERIC=C
 [3] LC_TIME=en_US.UTF-8        LC_COLLATE=en_US.UTF-8
 [5] LC_MONETARY=C              LC_MESSAGES=en_US.UTF-8
 [7] LC_PAPER=en_US.UTF-8       LC_NAME=C
 [9] LC_ADDRESS=C               LC_TELEPHONE=C
[11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C

attached base packages:
[1] stats     graphics  grDevices utils     datasets  methods   base

--
Christian Raschke
Department of Economics
and
ISDS Research Lab (HSRG)
Louisiana State University
crasch2 at lsu.edu
On Jul 19, 2010, at 7:16 PM, Christian Raschke wrote:

> Dear R-Listers,
>
> My question concerns indexing vectors by logical vectors that are
> based on the original vector. Consider the following simple example
> to hopefully make clear what I mean:
>
> a <- rnorm(10)
> a[a<0] <- NA
>
> However, I am now working with multiple data frames that I received,
> where each of them has nicely descriptive, yet long names(). In my
> scripts there are many instances where operations similar to the one
> above are required. Again a simple example:
>
> some.data.frame <- data.frame(some.long.variable.name=rnorm(10),
> some.other.long.variable.name=rnorm(10))
>
> some.data.frame$some.other.long.variable.name[some.data.frame
> $some.other.long.variable.name < 0] <- NA
>
> The fact that the names are so long makes things not very readable
> in the script and hard to debug. Is there a way in R to refer to the
> "self" of whatever is being indexed? I am looking for something like
>
> some.data.frame$some.other.long.variable.name[.self < 0] <- NA

There is an alternative, "is.na()<-" which I think is a bit more readable:

    is.na($some.other.long.variable.name) <- some.data.frame$some.other.long.variable.name < 0

But do _not_ do:

    with(some.data.frame, is.na(some.other.long.variable.name) <- some.other.long.variable.name < 0 )

--
David.

> that would accomplish the same result as above. Or is there another
> concise, but less messy way to do this? I prefer not attaching the
> data.frames and partial matching makes things even more messy since
> many names() are very similar.
>
> ______________________________________________
> R-help at r-project.org mailing list
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.

David Winsemius, MD
West Hartford, CT
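
A note on the "do _not_ do" warning above: the message does not say why, but the likely reason is that with() evaluates its expression in a temporary environment built from the data frame, so a replacement made there never reaches the original object. The following is a minimal sketch of that behaviour, assuming a data frame `df` with the column names from the question; the seed and the helper variables `n_neg`, `unchanged`, and `n_na` are only for illustration.

```r
# Hypothetical data for illustration; set.seed() only makes the run repeatable.
set.seed(1)
df <- data.frame(some.long.variable.name = rnorm(10),
                 some.other.long.variable.name = rnorm(10))
n_neg <- sum(df$some.other.long.variable.name < 0)

# with() works on a temporary copy of the columns, so this runs without
# error but leaves df itself untouched.
with(df, is.na(some.other.long.variable.name) <- some.other.long.variable.name < 0)
unchanged <- !any(is.na(df$some.other.long.variable.name))

# within() evaluates the same expression and returns a modified copy of df,
# which has to be assigned back to take effect.
df <- within(df, is.na(some.other.long.variable.name) <- some.other.long.variable.name < 0)
n_na <- sum(is.na(df$some.other.long.variable.name))
```

After this runs, `unchanged` should be TRUE (the with() call had no lasting effect) and `n_na` should equal `n_neg` (the within() call did).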
As far as I know the answer to your question is "No", but there are things you can do to improve the readability of your code. One thing I find useful is to avoid using "$" as much as possible and to favour things like with() and within().

The first thing you might do is think about choosing shorter names, of course. If that's not possible, you could try something like this.

    ensureNN <- function(x) {  # "ensure non-negative"
      is.na(x[x < 0]) <- TRUE
      x
    }

    some.data.frame <- within(some.data.frame, {
      some.long.variable.name <- ensureNN(some.long.variable.name)
      some.other.long.variable.name <- ensureNN(some.other.long.variable.name)
    })

Of course if you wanted to do this to all variables in a data frame you could do

    some.data.frame <- data.frame(lapply(some.data.frame, ensureNN))

and it all happens, no questions asked. (I can see a generic function emerging here, perhaps...)

W.

-----Original Message-----
From: r-help-bounces at r-project.org [mailto:r-help-bounces at r-project.org] On Behalf Of Christian Raschke
Sent: Tuesday, 20 July 2010 9:16 AM
To: r-help at r-project.org
Subject: [R] Indexing by logical vectors

[...]
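
One way the "generic function" hinted at in the reply above might look is sketched below. The name `na_where` and its `bad` argument are hypothetical, not something proposed in the thread. The sketch uses which() so that values that are already NA are skipped, which a plain logical index in a subscripted assignment does not always tolerate.

```r
# Hypothetical helper (not from the thread): set to NA every element of x
# for which the predicate function `bad` returns TRUE.
na_where <- function(x, bad) {
  is.na(x) <- which(bad(x))  # which() drops NA, so existing NAs are harmless
  x
}

# Example data with the column names used in the question.
set.seed(1)
some.data.frame <- data.frame(some.long.variable.name = rnorm(10),
                              some.other.long.variable.name = rnorm(10))

# Apply the helper to selected columns only; assigning into
# some.data.frame[cols] keeps the result a data frame.
cols <- c("some.other.long.variable.name")
some.data.frame[cols] <- lapply(some.data.frame[cols], na_where,
                                bad = function(v) v < 0)
```

The long column name is typed once, in `cols`, and the condition is written once, as the predicate. Passing all of names(some.data.frame) as `cols` would give the same effect as the lapply() over the whole data frame.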
On Mon, 2010-07-19 at 19:46 -0400, David Winsemius wrote:
> On Jul 19, 2010, at 7:16 PM, Christian Raschke wrote:
>
> > The fact that the names are so long makes things not very readable
> > in the script and hard to debug. Is there a way in R to refer to the
> > "self" of whatever is being indexed? I am looking for something like
> >
> > some.data.frame$some.other.long.variable.name[.self < 0] <- NA
>
> There is an alternative, "is.na()<-" which I think is a bit more
> readable:
>
> is.na($some.other.long.variable.name) <- some.data.frame
> $some.other.long.variable.name < 0

Thanks, David! As written, this throws an error. However,

    is.na(some.data.frame$some.other.long.variable.name) <- some.data.frame$some.other.long.variable.name < 0

works, but does not seem like much of an improvement to me.

> But do _not_ do:
>
> with(some.data.frame, is.na(some.other.long.variable.name) <-
> some.other.long.variable.name < 0 )
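
Since the corrected is.na() form still spells out the long name twice, another possibility, not proposed in the thread, is to store the column name in a short character variable and index with `[[`. This is only a sketch: the alias `v` is hypothetical, and the data frame name is still repeated.

```r
# Example data with the column names used in the question.
set.seed(1)
some.data.frame <- data.frame(some.long.variable.name = rnorm(10),
                              some.other.long.variable.name = rnorm(10))

# Type the long name once; `v` is a hypothetical short alias.
v <- "some.other.long.variable.name"

# [[v]] selects the column by its full name (no partial matching involved).
some.data.frame[[v]][some.data.frame[[v]] < 0] <- NA
```

Because `[[` with a character string matches names exactly by default, this avoids the partial-matching concern raised in the original question.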