thr3ads.net - R help - [R] Filtering an Entire Dataset based on Several Conditions [May 2022]

Rui Barradas

2022-May-09 09:22 UTC

[R] Filtering an Entire Dataset based on Several Conditions

Hello,

Something like this?
First normalize the data.
Then an apply loop creates a logical matrix showing which values lie 
strictly between -3 and 3.
If all values in a row are TRUE, the row sum of that matrix equals the 
number of columns. Comparing the two gives a logical index i.
Use that index i to subset the scaled data set.

# test data set, remove the Species column (not numeric)
df1 <- iris[-5]

df1_norm <- scale(df1)
i <- rowSums(apply(df1_norm, 2, \(x) x > -3 & x < 3)) ==
  ncol(df1_norm)

# returns a matrix
df1_norm[i, ]

# returns a data.frame
as.data.frame(df1_norm[i,])
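
The same row filter can probably also be written without the apply loop, since comparisons on a matrix are already elementwise. A minimal sketch, assuming the same iris-based test data; the names z, keep and z_filtered are illustrative only:

```r
# Same test data as above: the four numeric iris columns, scaled to z-scores
z <- scale(iris[-5])

# abs(z) < 3 is a logical matrix; a row is kept only if every entry is TRUE
keep <- rowSums(abs(z) < 3) == ncol(z)

# drop = FALSE keeps the matrix shape even if only one row survives
z_filtered <- z[keep, , drop = FALSE]

# Compare the number of rows before and after filtering
nrow(z)
nrow(z_filtered)
```

If the data contain missing values, the comparison yields NA and the index becomes NA for those rows, so they would need to be handled first (for example with complete.cases()).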


Hope this helps,

Rui Barradas

At 09:23 on 09/05/2022, Paul Bernal wrote:
> Dear friends,
> 
> I have a dataframe in which every single (i,j) entry (i standing for the
> ith row, j for the jth column) has been normalized (converted to z-scores).
> 
> Now I want to filter or subset the dataframe so that I end up with a
> dataframe containing only entries greater than -3 and less than 3.
> 
> How could I accomplish this?
> 
> Best,
> Paul
> 
> ______________________________________________
> R-help at r-project.org mailing list -- To UNSUBSCRIBE and more, see
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.

Paul Bernal

2022-May-09 16:44 UTC

[R] Filtering an Entire Dataset based on Several Conditions

Dear Rui,

I was trying to dput() the datasets I am working on, but since they are a bit
large (42,000 rows by 60 columns) I couldn't retrieve the full structure of
the data to include it here, so I am attaching a couple of files. One is
the raw data (called trainFeatures42k), which is the data I need to
normalize, and the other is Normalized_Data, which is the normalized data
(or at least I think I managed to normalize it).

 Normalized_Data.csv
<https://drive.google.com/file/d/143I1O710gAqWjzx48Gt1bwUbrG0mbpfa/view?usp=drive_web>
 trainFeatures42k.xls
<https://drive.google.com/file/d/1deMzGMkJyeVsnRzTKirmm4VqIBRzbvzV/view?usp=drive_web>

I have tried some of the code you and other friends from the community have
kindly shared, but have not been able to filter values > -3 and < 3.

Thank you all for your valuable help always.
Best,
Paul
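
For a data frame read from a file, which may mix numeric and non-numeric columns, the filter could be applied along the following lines. This is only a sketch on simulated data, since the attached files are not reproduced here; the data frame raw and its columns id, x1 and x2 are hypothetical stand-ins, not the actual structure of trainFeatures42k.

```r
# Simulated stand-in for the raw data: one label column, two numeric columns
set.seed(1)
raw <- data.frame(
  id = sprintf("row%03d", 1:200),
  x1 = rnorm(200),
  x2 = rnorm(200),
  stringsAsFactors = FALSE
)
raw$x1[5] <- 10  # plant one obvious outlier

# Scale only the numeric columns to z-scores
num_cols <- vapply(raw, is.numeric, logical(1))
z <- scale(raw[num_cols])

# Keep rows whose z-scores all lie strictly between -3 and 3
keep <- rowSums(abs(z) < 3) == ncol(z)

# Subset the original data frame, so the non-numeric columns are preserved
filtered <- raw[keep, ]

# Alternative reading of the question: keep every row,
# but replace the out-of-range entries with NA
z_masked <- z
z_masked[abs(z_masked) >= 3] <- NA
```

The first variant drops a whole row as soon as one of its entries is out of range; the second keeps the shape of the data and only blanks the individual entries. Which one is appropriate depends on what "filtering" is meant to achieve.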

