Joanne Demmler
2011-Aug-26 14:16 UTC
[R] elegant way to check if 2 values are in 3 columns?
Dear all,

I'm trying to rerun some data linkage exercises in R (they are designed to be done in SPSS, SAS or STATA)
The exercise in question is to relabel the column "treat" to "1", if "yearsep" is smaller than 1988 and columns "proc1"-"proc3" contain the values 56.36 or 59.81.

My pathetic solution to do this in R currently looks like this:

    vaslinks4$treat <- 0

    vaslinks4$treat[vaslinks4$yearsep < 1988 && (vaslinks4$proc1 %in% c(56.36,59.81)
               || vaslinks4$proc2 %in% c(56.36,59.81)
               || vaslinks4$proc3 %in% c(56.36,59.81))] <- 1

But I'm sure there is a more elegant solution for this, in which I would not have to call all three columns separately.

Anyone?
Yours Joanne

Solution in SPSS:

    COMPUTE treat=0.
    FORMATS treat (F1).
    DO REPEAT proc=proc1 to proc3.
    DO IF (yearsep LT 1988).
    IF (proc EQ 56.36 OR proc EQ 59.81) treat = 1.
    END IF.
    END REPEAT.

Solution in SAS:

    do i = 1 to 3 until (treat > 0);
    if yearsep < 1988 then do;
    if procs{i} in (56.36, 59.81) then treat = 1;
    else treat = 0;
    end;

Solution in STATA:

    generate treat=0
    foreach x in proc1 proc2 proc3 {
    recode treat(0=1) if ((`x'==56.36 | `x'==59.81) & yearsep<1988)
        | ((`x'>=63.70 &`x'<=63.79) & yearsep>=1988)
    }
    tab treat

--
Joanne Demmler Ph.D.
Research Assistant
College of Medicine
Swansea University
Singleton Park
Swansea SA2 8PP
UK
tel: +44 (0)1792 295674
fax: +44 (0)1792 513430
email: j.demmler at swansea.ac.uk
DECIPHer: www.decipher.uk.net
Your solution is not bad since it tries to make use of vectorized operations, but there are some problems in it.

1) You should be using "&" instead of "&&" and "|" instead of "||" (look at the help page to understand the difference).

2) FAQ 7.31: You are checking against floating point values (... %in% c(56.36,59.81)). You need to use all.equal:

    mask <- with(vaslinks4, (yearsep < 1988) &
        (all.equal(proc1, 56.36) |
         all.equal(proc1, 59.81) |
         all.equal(proc2, 56.36) |
         all.equal(proc2, 59.81) |
         all.equal(proc3, 56.36) |
         all.equal(proc3, 59.81)
        )
    )
    vaslinks4$treat <- 0  # initialize
    vaslinks4$treat[mask] <- 1

You can probably also come up with a solution using "any(sapply(.....))", but it would not necessarily be any better than the above. The most important issue is FAQ 7.31.

--
Jim Holtman
Data Munger Guru

What is the problem that you are trying to solve?
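To make the two points above concrete, here is a minimal sketch rather than a definitive answer. The data frame values are invented for illustration, and `near()` is a hypothetical helper (not a base R function) that compares with an absolute tolerance; the tolerance of 1e-8 is an assumption. It uses the elementwise operators `&` and `|` throughout.

```r
# Hypothetical toy data: column names follow the thread, values are invented.
vaslinks4 <- data.frame(
  yearsep = c(1985, 1987, 1990, 1986),
  proc1   = c(56.36, 10.00, 59.81, 12.00),
  proc2   = c(11.00, 59.81, 13.00, 14.00),
  proc3   = c(15.00, 16.00, 17.00, 18.00)
)

# near() is a hypothetical helper: elementwise TRUE where x lies within
# `tol` of any of the target values (tolerance comparison, cf. FAQ 7.31).
near <- function(x, targets, tol = 1e-8) {
  Reduce(`|`, lapply(targets, function(t) abs(x - t) < tol))
}

codes <- c(56.36, 59.81)

# "&" and "|" work element by element and return one value per row.
# "&&" and "||" are meant for single values only: on longer vectors older
# R versions looked at the first element, and R 4.3.0 and later raise an error.
mask <- with(vaslinks4,
  (yearsep < 1988) &
    (near(proc1, codes) | near(proc2, codes) | near(proc3, codes)))

vaslinks4$treat <- as.integer(mask)
print(vaslinks4)
```

With this toy data only the first two rows satisfy both conditions, so `treat` is 1 for them and 0 otherwise.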
David Winsemius
2011-Aug-26 14:39 UTC
[R] elegant way to check if 2 values are in 3 columns?
On Aug 26, 2011, at 10:16 AM, Joanne Demmler wrote:

> Dear all,
>
> I'm trying to rerun some data linkage exercises in R (they are
> designed to be done in SPSS, SAS or STATA)
> The exercise in question is to relabel the column "treat" to "1", if
> "yearsep" is smaller than 1988 and columns "proc1"-"proc3" contain
> the values 56.36 or 59.81.
>
> My pathetic solution to do this in R currently looks like this:
>
> vaslinks4$treat <- 0

Why not just?

    # skip the setting to 0 step
    vaslinks4$treat <- as.integer( with( vaslinks4,
        yearsep < 1988 & (proc1 %in% c(56.36,59.81) |
                          proc2 %in% c(56.36,59.81) |
                          proc3 %in% c(56.36,59.81) ) ) )

Maybe, but watch out about testing for equality or set membership when using floating point numbers.

> vaslinks4$treat[vaslinks4$yearsep < 1988 && (vaslinks4$proc1 %in%
> c(56.36,59.81)
> || vaslinks4$proc2 %in% c(56.36,59.81)
> || vaslinks4$proc3 %in% c(56.36,59.81))] <- 1

A doomed strategy. Don't use "&&" or "||" when working with vectors.

> But I'm sure there is a more elegant solution for this, in which I
> would not have to call all three columns separately.
>
> Anyone?
> Yours Joanne

Snipped SAS and Stata code.

--
David Winsemius, MD
West Hartford, CT
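On the original wish not to name all three columns separately, one possible sketch follows. It is only an illustration: the toy values are invented, and it assumes the procedure codes always carry exactly two decimals, so that they can be compared as integer hundredths and the floating point issue is sidestepped. The names `proc_cols`, `codes100` and `hit` are made up for this example.

```r
# Hypothetical toy data: column names follow the thread, values are invented.
vaslinks4 <- data.frame(
  yearsep = c(1985, 1987, 1990, 1986),
  proc1   = c(56.36, 10.00, 59.81, 12.00),
  proc2   = c(11.00, 59.81, 13.00, 14.00),
  proc3   = c(15.00, 16.00, 17.00, 18.00)
)

# Pick up every column called proc<number> instead of listing them by hand.
proc_cols <- grep("^proc[0-9]+$", names(vaslinks4), value = TRUE)
m <- as.matrix(vaslinks4[proc_cols])

# Assumption: codes have two decimals, so 56.36 and 59.81 become 5636 and 5981.
codes100 <- c(5636L, 5981L)

# %in% drops the matrix shape, so matrix() restores it (column-major order).
hit <- matrix(as.integer(round(m * 100)) %in% codes100, nrow = nrow(m))

# A row qualifies if any proc column matches and yearsep is before 1988.
vaslinks4$treat <- as.integer(vaslinks4$yearsep < 1988 & rowSums(hit) > 0)
print(vaslinks4)
```

The same code should keep working unchanged if further columns such as proc4 were added, since the columns are selected by name pattern.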
Slight change; forgot that 'all.equal' only accepts a single value (and it is a little shorter):

    # all.equal returns TRUE or a character description of the difference,
    # so wrap each call in isTRUE before combining the results
    f.x <- function(x) isTRUE(all.equal(x, 56.36)) || isTRUE(all.equal(x, 59.81))
    mask <- with(vaslinks4, (yearsep < 1988) &
        (sapply(proc1, f.x) |
         sapply(proc2, f.x) |
         sapply(proc3, f.x)
        )
    )
    vaslinks4$treat <- 0  # initialize
    vaslinks4$treat[mask] <- 1

--
Jim Holtman
Data Munger Guru

What is the problem that you are trying to solve?
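A short sketch of why `all.equal` needs care here; the variable names are made up for the illustration. It shows that `==` can fail on floating point values (FAQ 7.31), that `all.equal` compares whole objects and returns a character description rather than FALSE when they differ, and that a per-element helper wrapped in `isTRUE` behaves as intended with `sapply`.

```r
# Classic FAQ 7.31 case: the sum is not exactly the literal 0.3.
x <- 0.1 + 0.2
exact_match <- x == 0.3                      # FALSE
tol_match   <- isTRUE(all.equal(x, 0.3))     # TRUE

# When the values differ, all.equal returns a message, not FALSE.
msg <- all.equal(1, 2)
msg_is_text <- is.character(msg)             # TRUE

# all.equal compares whole objects, so a vector against a single value
# is reported as a difference rather than checked element by element.
whole_object <- isTRUE(all.equal(c(56.36, 1), 56.36))   # FALSE

# Per-element check: apply a single-value test to each element in turn.
f.x <- function(x) isTRUE(all.equal(x, 56.36)) || isTRUE(all.equal(x, 59.81))
per_element <- sapply(c(56.36, 59.81, 1), f.x)          # TRUE TRUE FALSE

print(list(exact_match, tol_match, msg_is_text, whole_object, per_element))
```

Inside `f.x` the scalar operator `||` is appropriate, because each call receives exactly one value.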