Rui Barradas
2022-May-09 09:22 UTC
[R] Filtering an Entire Dataset based on Several Conditions
Hello,

Something like this?

First normalize the data. Then an apply loop creates a logical matrix showing which values lie strictly between -3 and 3. If all values in a row are TRUE, the row sum equals the number of columns; testing for that equality creates a logical index i. Use that index i to subset the scaled data set.

# test data set, remove the Species column (not numeric)
df1 <- iris[-5]

df1_norm <- scale(df1)
i <- rowSums(apply(df1_norm, 2, \(x) x > -3 & x < 3)) == ncol(df1_norm)

# returns a matrix
df1_norm[i, ]

# returns a data.frame
as.data.frame(df1_norm[i, ])

Hope this helps,

Rui Barradas

At 09:23 on 09/05/2022, Paul Bernal wrote:
> Dear friends,
>
> I have a dataframe which every single (i,j) entry (i standing for ith row,
> j for jth column) has been normalized (converted to z-scores).
>
> Now I want to filter or subset the dataframe so that I only end up with a
> dataframe containing only entries greater than -3 or less than 3.
>
> How could I accomplish this?
>
> Best,
> Paul
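The row filter described above can also be written without apply(), since comparisons on a numeric matrix are already element-wise. The following is only a sketch of that variant, not part of the original reply; the names z, keep and kept are illustrative, and it assumes every column is numeric and contains no missing values.

```r
# Sketch: the same row filter, written without apply().
# Assumes all columns are numeric and there are no NAs; iris[-5] is example data.
df1 <- iris[-5]
z <- scale(df1)                        # matrix of column-wise z-scores

# abs(z) < 3 is a logical matrix; a row is kept when every entry is TRUE
keep <- rowSums(abs(z) < 3) == ncol(z)

# drop = FALSE keeps a matrix even if only one row survives
kept <- as.data.frame(z[keep, , drop = FALSE])

# Cross-check against the apply() version from the message
i <- rowSums(apply(z, 2, function(x) x > -3 & x < 3)) == ncol(z)
all(keep == i)
```

Both versions build the same logical index, so the choice between them is mostly a matter of readability.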
Paul Bernal
2022-May-09 16:44 UTC
[R] Filtering an Entire Dataset based on Several Conditions
Dear Rui,

I was trying to dput() the dataset I am working on, but since it is rather large (42,000 rows by 60 columns) I couldn't capture the full structure of the data to include it here, so I am attaching a couple of files. One is the raw data (called trainFeatures42k), which is the data I need to normalize, and the other is Normalized_Data, which is the normalized data (or at least I think I managed to normalize it).

Normalized_Data.csv <drive.google.com/file/d/143I1O710gAqWjzx48Gt1bwUbrG0mbpfa/view?usp=drive_web>
trainFeatures42k.xls <drive.google.com/file/d/1deMzGMkJyeVsnRzTKirmm4VqIBRzbvzV/view?usp=drive_web>

I have tried some of the code you and other friends from the community have kindly shared, but I have not been able to filter the data down to values > -3 and < 3.

Thank you all for your valuable help, as always.

Best,
Paul
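For a larger data frame that may contain non-numeric columns, missing values or constant columns, the same idea can be wrapped in a small function. The sketch below is not from the thread: filter_by_z() is a hypothetical helper, and the simulated data frame raw merely stands in for the attached files, whose actual structure is not known here.

```r
# Hypothetical helper (not from the thread): keep the rows of a data frame
# whose numeric columns all have z-scores strictly between -limit and limit.
filter_by_z <- function(df, limit = 3) {
  num <- vapply(df, is.numeric, logical(1))   # which columns are numeric
  z <- scale(as.matrix(df[num]))              # column-wise z-scores
  inside <- abs(z) < limit                    # NA where the z-score is undefined
  # na.rm = TRUE ignores undefined z-scores (missing values, constant columns)
  # so that they do not remove a row on their own
  keep <- rowSums(!inside, na.rm = TRUE) == 0
  df[keep, , drop = FALSE]                    # original values, all columns
}

# Simulated stand-in for the real data: 1000 rows, 5 numeric columns, 1 label
set.seed(1)
raw <- as.data.frame(matrix(rnorm(5000), ncol = 5))
raw$label <- sample(c("a", "b"), nrow(raw), replace = TRUE)
raw[1, 1] <- 50                               # plant an obvious outlier

filtered <- filter_by_z(raw)
nrow(raw) - nrow(filtered)                    # number of rows dropped
```

Because standardizing data that is already standardized leaves it unchanged, the same call should behave the same way on raw and on already normalized input. With the real data, a starting point might be filter_by_z(read.csv("Normalized_Data.csv")), assuming the file has been downloaded to the working directory and is a plain CSV with a header row.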