David McPearson
2020-Apr-17 05:49 UTC
[R] calculate row median of every three columns for a dataframe
Anna wrote:
>
> Hi all,
> I need to calculate a row median for every three columns of a
> dataframe. I made it work using the following script, but not happy
> with the script. Is there a simpler way for doing this?
>

To which Jim L responded:
>
> Hi Anna,
> I can't think of a simple way, but this function may make you happier:
>
> step_median<-function(x,window) {
>  x<-unlist(x)
>  stop<-length(x)-window+1
>  xout<-NA
>  nindx<-1
>  for(i in seq(1,stop,by=window)) {
>   xout[nindx]<-do.call("median",list(x[i:(i+window-1)]))
>   nindx<-nindx+1
>  }
>  return(xout)
> }
> apply(df,1,step_median,3)
>
> This should return a matrix where the columns are the medians
> calculated from blocks of "window" width on each row of "df". As Bert
> noted, you may want to think about a "rolling" median where the
> "windows" overlap. This can be done like so:
>
> library(zoo)
> apply(df,1,rollmedian,3)
>
> Jim

Another approach you might try is multiple calls to sapply/lapply. This
won't rid you of loops, but it will hide them:

# Example data. Some names changed to avoid collisions between
# R functions (collisions are in the gap between the headphones,
# not in R).

dfr <- data.frame(a = c(2,3,4), b = c(3,5,1), c = c(1,3,6),
                  d = c(7,2,1), e = c(2,5,3), f = c(4,5,1))

# Turn each of the three-column groups into their own element
# in a list. Note: the subsetting fails with an "undefined columns
# selected" error if ncol(dfr) is not a multiple of 3.

dlist <- lapply(seq(1, ncol(dfr), by = 3), function(enn)
  dfr[ , enn + 0:2])

# Then you can use sapply to calculate the row medians for each
# of the elements.

# Both of the following seem to work. I'm not sure which is
# more readable.

sapply(dlist, function(xx) apply(xx, 1, median))

sapply(dlist, apply, 1, median)

# I'm sure the cognoscenti will have a much more elegant way
# of doing this.

Cheers y'all,
DMcP
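DMcP's comment that the subsetting breaks when ncol(dfr) is not a multiple of 3 can be sidestepped by splitting the columns on a group index instead of indexing with enn + 0:2. The sketch below assumes all columns are numeric; the helper name row_group_medians and the decision to keep a shorter trailing group (rather than drop it or raise an error) are illustrative choices, not something proposed in the thread.

```r
# Hypothetical helper: row medians of consecutive k-column groups.
# Assumes all columns are numeric; a trailing group with fewer than
# k columns is kept instead of raising an error.
row_group_medians <- function(dfr, k = 3) {
  # Group index: columns 1..k -> 1, k+1..2k -> 2, and so on
  grp <- (seq_len(ncol(dfr)) - 1) %/% k + 1
  # split.default() treats the data frame as a list of columns
  groups <- split.default(dfr, grp)
  # One column of the result per group, one row per row of dfr
  sapply(groups, function(g) apply(g, 1, median))
}

dfr <- data.frame(a = c(2,3,4), b = c(3,5,1), c = c(1,3,6),
                  d = c(7,2,1), e = c(2,5,3), f = c(4,5,1))

row_group_medians(dfr)         # same values as sapply(dlist, apply, 1, median)
row_group_medians(dfr[, 1:5])  # second group holds only columns d and e
```

With five columns the second group is just d and e, so its medians are averages of two values.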
PIKAL Petr
2020-Apr-17 06:53 UTC
[R] calculate row median of every three columns for a dataframe
Hi

As usual in R, things can be done in different ways.

idx <- (0:(ncol(dfr)-1))%/%3

aggregate(t(dfr), list(idx), median)
  Group.1 V1 V2 V3
1       0  2  3  4
2       1  4  5  1

The results should be OK, although the structure is different (one row
per group of columns, with V1..V3 holding the rows of dfr); performance
has not been tested.

Cheers
Petr
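Petr's idx assigns column j to group (j - 1) %/% 3, which matches how R stores a matrix column by column. That suggests one more variant, sketched here under the assumptions that every column is numeric and that ncol(dfr) is an exact multiple of 3: reshape the data into a rows x 3 x groups array and take medians while keeping the first and third margins. This is not from the thread, just another way to express the same grouping.

```r
dfr <- data.frame(a = c(2,3,4), b = c(3,5,1), c = c(1,3,6),
                  d = c(7,2,1), e = c(2,5,3), f = c(4,5,1))

k <- 3                      # columns per group (assumed to divide ncol)
m <- as.matrix(dfr)         # numeric matrix, stored column by column
stopifnot(ncol(m) %% k == 0)

# Column j of m lands in slice (j - 1) %/% k + 1 of the third dimension,
# i.e. the same grouping as idx <- (0:(ncol(dfr)-1)) %/% 3
arr <- array(m, dim = c(nrow(m), k, ncol(m) / k))

# Keep margins 1 (rows) and 3 (groups); the median collapses margin 2,
# the k columns inside each group
res <- apply(arr, c(1, 3), median)
res
```

The result has rows of dfr as rows and groups as columns, so it needs no transposing.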
Eric Berger
2020-Apr-17 09:36 UTC
[R] calculate row median of every three columns for a dataframe
Some comments on the contributions:
a) For Petr's suggestion, to return the desired structure, modify the
statement to:
t(aggregate(t(dfr), list(idx), median)[,-1])
And, although it is less readable, it can be written as a one-liner
by inlining the idx definition:
t(aggregate(t(dfr), list((0:(ncol(dfr)-1))%/%3), median)[,-1])
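A quick check that the transposed aggregate() call and DMcP's sapply() version agree on the example data is sketched below. Wrapping the one-liner in a function with a width argument k, and the name agg_group_median, are assumptions made for illustration; they are not part of Eric's post.

```r
dfr <- data.frame(a = c(2,3,4), b = c(3,5,1), c = c(1,3,6),
                  d = c(7,2,1), e = c(2,5,3), f = c(4,5,1))

# Eric's one-liner, generalised to groups of k columns (hypothetical name)
agg_group_median <- function(dfr, k = 3) {
  idx <- (0:(ncol(dfr) - 1)) %/% k
  # t(dfr) turns each group of columns into a group of rows for aggregate();
  # [, -1] drops the Group.1 label column and the outer t() restores
  # the original rows as rows
  t(aggregate(t(dfr), list(idx), median)[, -1])
}

# DMcP's lapply/sapply version for comparison
dlist <- lapply(seq(1, ncol(dfr), by = 3), function(enn) dfr[ , enn + 0:2])
via_sapply <- sapply(dlist, apply, 1, median)

via_aggregate <- agg_group_median(dfr)
# Same numbers; only the dimnames can differ (aggregate adds V1..V3)
all.equal(unname(via_aggregate), unname(via_sapply))
```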
b) To DMcP: "# I'm sure the cognoscenti will have a much more elegant way"
+1 for elegance (in my view)
c) To Jim: I think your code is instructive. From a style viewpoint, I would
recommend against naming a local variable 'stop', since it shares its name
with the base function stop() :-)
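Following the style point about 'stop', one possible restyling of Jim's step_median() is sketched below. R still finds the function stop() when it is called even if a local variable has that name, because lookup for a call skips non-function objects, so this is about readability rather than a bug. The name last_start, dropping the do.call() wrapper, and using vapply() instead of growing xout are this sketch's own choices, not Jim's; like the original, it ignores a trailing partial block.

```r
# Medians of non-overlapping blocks of `window` values,
# a restyled version of Jim's step_median()
step_median <- function(x, window) {
  x <- unlist(x)
  # last position at which a full block can start
  last_start <- length(x) - window + 1
  starts <- seq(1, last_start, by = window)
  # one median per block; vapply() checks each result is a single number
  vapply(starts, function(i) median(x[i:(i + window - 1)]), numeric(1))
}

dfr <- data.frame(a = c(2,3,4), b = c(3,5,1), c = c(1,3,6),
                  d = c(7,2,1), e = c(2,5,3), f = c(4,5,1))

# As with Jim's version, apply() returns one column per row of dfr,
# so t() puts rows of dfr as rows and blocks as columns
t(apply(dfr, 1, step_median, 3))
```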
Best,
Eric