thr3ads.net - R help - [R] subsetting a data.frame based on a specific group of columns [Nov 2015]

Assa Yeroslaviz

2015-Nov-06 10:40 UTC

[R] subsetting a data.frame based on a specific group of columns

Hi,

I have a data frame with multiple columns that belong to several groups,
like this:
X1    X2    X3    Y1    Y2    Y3
1232    357    23    0    9871    72
0    71    9    811    795    743
43    919    1111    0    76    14

I would like to filter out such rows, where the sum in one group is lower
than a specific value. For example, I would like to set all the values in a
group of columns to zero if the sum in that group is less than 100.
In my example table I would like to set the values in the second row for
the three X columns to 0, so that the table looks like this:

X1    X2    X3    Y1    Y2    Y3
1232    357    23    0    9871    72
0    0    0    811    795    743
43    919    1111    0    0    0

The same applies to the Y values in the last row.
Is there a more efficient way of doing this than going row by row and using
the apply function on each of the subgroups I have in the columns?

thanks
Assa


jim holtman

2015-Nov-06 13:29 UTC

Is this what you want:
> x <- read.table(text = "X1    X2    X3    Y1    Y2    Y3
+ 1232    357    23    0    9871    72
+ 0    71    9    811    795    743
+ 43    919    1111    0    76    14", header = TRUE)
> x
    X1  X2   X3  Y1   Y2  Y3
1 1232 357   23   0 9871  72
2    0  71    9 811  795 743
3   43 919 1111   0   76  14
>
> # create indices of columns that start with the same character
> indx <- split(seq(ncol(x)), substring(colnames(x), 1, 1))
> names(indx) <- NULL  # remove names so output not messed up
>
> result <- lapply(indx, function(a){
+     row_sum <- rowSums(x[, a])
+     x[row_sum < 100, a] <- 0
+     x[, a]
+ })
> # combine back together
> do.call(cbind, result)
    X1  X2   X3  Y1   Y2  Y3
1 1232 357   23   0 9871  72
2    0   0    0 811  795 743
3   43 919 1111   0    0   0


Jim Holtman
Data Munger Guru

What is the problem that you are trying to solve?
Tell me what you want to do, not how you want to do it.
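
As a quick check on the result above, the per-group row sums can be computed directly to see which cells fall under the threshold. This is only a minimal sketch that rebuilds the example table; the names group, sums and below are made up for illustration and are not part of the original reply.

```r
# Rebuild the example table from the thread
x <- read.table(text = "X1 X2 X3 Y1 Y2 Y3
1232 357 23 0 9871 72
0 71 9 811 795 743
43 919 1111 0 76 14", header = TRUE)

# Group label = first character of each column name ("X" or "Y")
group <- substring(colnames(x), 1, 1)

# One column of row sums per group; rows keep the data frame's row names
sums <- sapply(split(seq_along(x), group),
               function(cols) rowSums(x[, cols, drop = FALSE]))

# TRUE marks a row/group combination whose sum is below 100
below <- sums < 100

sums   # X in row 2 sums to 80, Y in row 3 sums to 90
below
```

Only the X group in row 2 and the Y group in row 3 are below 100, which matches the cells set to zero in the output above.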


Assa Yeroslaviz

2015-Nov-06 13:53 UTC

Sorry for the misunderstanding. Here is a more elaborate description of
what I would like to achieve.

I have a data set of counts from an RNA-Seq experiment and would like to
filter out reads with low counts. I don't want to set everything to 0
automatically.

I would like to set each categorical group (e.g. a condition) to 0 if and
only if all replicates in the group together have fewer than 100 reads.
In my examples I used X and Y to represent the categories. Usually they
have more distinct names like "control", "knockout1", "dKo", etc.

So what I would really like to do is check whether the sum of all the
"control" samples is lower than 100. If so, set all control samples to 0.
I would like to check this *for each category* in every row of the data set.

I hope it is clearer now.

thanks
Assa
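
One possible way to apply the same idea to named conditions is sketched below. It is not from the thread: the counts table, its values, and the column naming scheme (condition name followed by "_" and a replicate number) are all assumptions made for illustration. Real sample names would probably need a different rule, or an explicit vector of condition labels.

```r
# Hypothetical counts table; names and values are invented for this sketch
counts <- data.frame(
  control_1   = c(10, 500, 0),
  control_2   = c(20, 300, 5),
  knockout1_1 = c(400, 30, 60),
  knockout1_2 = c(100, 40, 70),
  dKo_1       = c(0, 900, 1),
  dKo_2       = c(99, 100, 2)
)

# Assumption: the condition is the column name without the trailing "_<number>"
condition <- sub("_[0-9]+$", "", colnames(counts))
threshold <- 100

filtered <- counts
for (cond in unique(condition)) {
  cols <- which(condition == cond)
  # Rows where all replicates of this condition together are below the threshold
  low <- rowSums(counts[, cols, drop = FALSE]) < threshold
  filtered[low, cols] <- 0
}
filtered
```

The loop runs once per condition rather than once per row, and assigning into a copy keeps the original column order, so no cbind step is needed. The drop = FALSE argument keeps the subset a data frame even when a condition has a single replicate.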


