Assa Yeroslaviz
2015-Nov-06 10:40 UTC
[R] subsetting a data.frame based on a specific group of columns
Hi,

I have a data frame with multiple columns that belong to several groups, like this:

X1   X2   X3   Y1   Y2   Y3
1232 357  23   0    9871 72
0    71   9    811  795  743
43   919  1111 0    76   14

I would like to filter out rows where the sum within one group is lower than a specific value. For example, I would like to set all the values in a group of columns to zero if the sum of that group in a given row is less than 100.

In my example table I would like to set the values in the second row for the three X columns to 0, so that the table looks like this:

X1   X2   X3   Y1   Y2   Y3
1232 357  23   0    9871 72
0    0    0    811  795  743
43   919  1111 0    0    0

The same applies to the Y values in the last row.

Is there a more efficient way of doing this than going row by row and using the apply function on each of the subgroups I have in the columns?

Thanks,
Assa
jim holtman
2015-Nov-06 13:29 UTC
[R] subsetting a data.frame based on a specific group of columns
Is this what you want:

> x <- read.table(text = "X1 X2 X3 Y1 Y2 Y3
+ 1232 357 23 0 9871 72
+ 0 71 9 811 795 743
+ 43 919 1111 0 76 14", header = TRUE)
> x
    X1  X2   X3  Y1   Y2  Y3
1 1232 357   23   0 9871  72
2    0  71    9 811  795 743
3   43 919 1111   0   76  14
>
> # create indices of columns that start with the same character
> indx <- split(seq(ncol(x)), substring(colnames(x), 1, 1))
> names(indx) <- NULL  # remove names so output not messed up
>
> result <- lapply(indx, function(a){
+     row_sum <- rowSums(x[, a])
+     x[row_sum < 100, a] <- 0
+     x[, a]
+ })
> # combine back together
> do.call(cbind, result)
    X1  X2   X3  Y1   Y2  Y3
1 1232 357   23   0 9871  72
2    0   0    0 811  795 743
3   43 919 1111   0    0   0

Jim Holtman
Data Munger Guru

What is the problem that you are trying to solve?
Tell me what you want to do, not how you want to do it.
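As a possible variation on the reply above, the same idea can be wrapped in a small function that modifies the data frame in place of rebuilding it with cbind, so the original column order is kept even if the columns of different groups are interleaved. This is only a sketch: the function name `zero_low_groups` and its `groups` and `threshold` arguments are made up for illustration and do not come from the thread.

```r
# Hypothetical helper (not from the thread): for every column group, set the
# group's values to 0 in each row where the group's row sum is below `threshold`.
zero_low_groups <- function(x, groups, threshold = 100) {
  # split() gives one vector of column positions per group label
  for (cols in split(seq_along(groups), groups)) {
    low <- rowSums(x[, cols, drop = FALSE]) < threshold
    if (any(low)) {
      x[low, cols] <- 0
    }
  }
  x
}

x <- read.table(text = "X1 X2 X3 Y1 Y2 Y3
1232 357 23 0 9871 72
0 71 9 811 795 743
43 919 1111 0 76 14", header = TRUE)

# Group label = first character of the column name, as in the reply above
res <- zero_low_groups(x, substring(colnames(x), 1, 1))
print(res)
```

The loop runs once per group, not once per row, so the cost is still dominated by the vectorised rowSums() calls.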
Assa Yeroslaviz
2015-Nov-06 13:53 UTC
[R] subsetting a data.frame based on a specific group of columns
Sorry for the misunderstanding. Here is a more elaborate description of what I would like to achieve.

I have a data set of counts from an RNA-Seq experiment and would like to filter out reads with low counts. I don't want to set everything to 0 automatically. I would like to set each categorical group (e.g. a condition) to 0 if and only if all replicates in the group together have fewer than 100 reads.

In my examples I used X and Y to represent the categories. Usually they have more distinctive names like "control", "knockout1", "dKo", etc.

So what I would really like to do is check whether the sum of all the "control" samples is lower than 100 and, if so, set all control samples to 0. I would like to check this *for each category* in every row of the data set.

I hope it is clearer now.

Thanks,
Assa
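For groups with longer names such as "control" or "dKo", one option is to give the group label of each column explicitly and not derive it from the first character. The sketch below assumes a made-up count table; the sample names (`control_1`, `dKo_1`, ...), the values, and the `condition` vector are all hypothetical. It uses base R's rowsum() to compute every per-group total in one call and then zeroes the matching cells.

```r
# Hypothetical count table: rows are genes, columns are replicate samples
counts <- data.frame(
  control_1 = c(50, 10, 400),
  control_2 = c(60, 20, 300),
  dKo_1     = c(5, 500, 30),
  dKo_2     = c(7, 600, 80)
)

# One group label per column, supplied explicitly (assumed known by the user)
condition <- c("control", "control", "dKo", "dKo")

# rowsum() adds up the rows of the transposed table that share a label;
# transposing back gives one row per gene and one column per condition
group_sums <- t(rowsum(t(as.matrix(counts)), condition))

# Expand to one column per sample and flag cells whose group total is < 100
low <- group_sums[, condition, drop = FALSE] < 100

filtered <- counts
filtered[low] <- 0
print(filtered)
```

Here `condition` is kept as a character vector on purpose: indexing `group_sums` by a factor would use the factor's integer codes, not its labels.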