Arin Basu
2007-May-17 17:59 UTC
[R] How to select specific rows from a data frame based on values
Dear Group:

I am working with a data frame containing 316 rows of individuals with 79 variables. Each of these 79 variables has values that range from -4 to +4, and I want to subset this data frame so that, in the resulting new data frame, the values of _all_ of these variables range between -3 and +3.

Let's say I have the following data frame (it's a toy example with 4 individuals and 5 variables):

subj1 <- cbind(-4, -3, -1, -5, -7)
subj2 <- cbind(-2, -1, -1, -2, +2)
subj3 <- cbind(+2, +1, +2, +1, +2)
subj4 <- cbind(-4, -1, -2, +2, +1)

mydf <- as.data.frame(rbind(subj1, subj2, subj3, subj4))

From mydf, I want to generate a new data frame (let's call it mydf1) which will have records of only subj2 and subj3 in it, since only these two individuals have all values for variables V1 through V5 in the range -3 to +3.

The documentation on subsetting and indexing data frames did not help me solve this specific problem. There may be an obvious solution, but I just cannot seem to find it.

I would greatly appreciate your input.

[relevant information: R version 2.4.1, running on Windows XP]

/Arin Basu
jim holtman
2007-May-17 18:13 UTC
[R] How to select specific rows from a data frame based on values
Try this:

> subj1 <- cbind(-4, -3, -1, -5, -7)
> subj2 <- cbind(-2, -1, -1, -2, +2)
> subj3 <- cbind(+2, +1, +2, +1, +2)
> subj4 <- cbind(-4, -1, -2, +2, +1)
> mydf <- as.data.frame(rbind(subj1, subj2, subj3, subj4))
> mydf
  V1 V2 V3 V4 V5
1 -4 -3 -1 -5 -7
2 -2 -1 -1 -2  2
3  2  1  2  1  2
4 -4 -1 -2  2  1
> apply(mydf, 1, function(x)all(x>-3) & all(x < 3))
[1] FALSE  TRUE  TRUE FALSE
> mydf[apply(mydf, 1, function(x)all(x>-3) & all(x < 3)),]
  V1 V2 V3 V4 V5
2 -2 -1 -1 -2  2
3  2  1  2  1  2

-- 
Jim Holtman
Cincinnati, OH

What is the problem you are trying to solve?
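A note on the bounds: the row-wise test in this reply uses strict inequalities, so a value of exactly -3 or +3 would cause a row to be dropped. If the bounds are meant to be inclusive (an assumption about the original intent), a minimal sketch of the same apply() approach might look like the following. The helper name `in_range` and its `lo`/`hi` arguments are hypothetical and do not come from the thread.

```r
# Toy data as used in the reply (five values per subject).
mydf <- as.data.frame(rbind(
  c(-4, -3, -1, -5, -7),
  c(-2, -1, -1, -2,  2),
  c( 2,  1,  2,  1,  2),
  c(-4, -1, -2,  2,  1)
))

# Hypothetical helper: TRUE when every value of x lies in [lo, hi].
in_range <- function(x, lo = -3, hi = 3) all(x >= lo & x <= hi)

# apply(..., 1, f) calls f once per row and returns one logical per row.
keep <- apply(mydf, 1, in_range)

# Index rows with the logical vector; the empty column slot keeps all columns.
mydf1 <- mydf[keep, ]
print(mydf1)
```

On the toy data both the strict and the inclusive version keep rows 2 and 3, because the only row containing a -3 is also out of range for other reasons.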
Chuck Cleland
2007-May-17 18:19 UTC
[R] How to select specific rows from a data frame based on values
Arin Basu wrote:
> [...] I want to subset this data frame so that in the
> resulting new dataframe, values of _all_ of these variables should
> range between -3 and +3.

mydf1 <- mydf[apply(mydf >= -3 & mydf <= 3, MARGIN=1, FUN=all),]

-- 
Chuck Cleland, Ph.D.
NDRI, Inc.
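The one-liner in this reply first builds a logical matrix (mydf >= -3 & mydf <= 3) and then reduces each row with all(). A sketch of an equivalent vectorized form counts the out-of-range cells per row with rowSums() and keeps the rows where that count is zero. It assumes every column is numeric and that there are no missing values (an NA would make the row count NA); the names `out_of_range` and `n_bad` are illustrative only.

```r
# Toy data from the thread (five values per subject).
mydf <- as.data.frame(rbind(
  c(-4, -3, -1, -5, -7),
  c(-2, -1, -1, -2,  2),
  c( 2,  1,  2,  1,  2),
  c(-4, -1, -2,  2,  1)
))

# Logical matrix: TRUE where a cell falls outside [-3, 3].
out_of_range <- abs(mydf) > 3

# Number of offending cells per row.
n_bad <- rowSums(out_of_range)

# Keep only the rows with no offending cells.
mydf1 <- mydf[n_bad == 0, ]
print(mydf1)
```

For 316 rows and 79 columns either form should be fast enough; the rowSums() variant mainly helps when you also want to see how many values in each row fall outside the range.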