Hi All,

There are a variety of functions that can be applied to a variable
(column) in a data frame: mean, min, max, sd, range, IQR, etc.

I am aware of only two that work on the rows, using q1-q3 as example
variables:

rowMeans(cbind(q1,q2,q3),na.rm=T)   #mean of multiple variables
rowSums (cbind(q1,q2,q3),na.rm=T)   #sum of multiple variables

Can the standard column functions (listed in the first sentence) be
applied to rows, with the use of correct indexes to reference the
columns of interest? Or, must these summary functions be programmed
separately to work on a row?

Thanks,

Gerard
Robert A LaBudde
2007-Sep-15 16:09 UTC
[R] applying math/stat functions to rows in data frame
At 12:02 PM 9/15/2007, Gerard wrote:
>Can the standard column functions (listed in the first sentence) be
>applied to rows, with the use of correct indexes to reference the
>columns of interest? Or, must these summary functions be programmed
>separately to work on a row?

Try using t() to transpose the matrix, so that the rows become columns,
and then apply the column function of interest.

Robert A. LaBudde, PhD, PAS, Dpl. ACAFS
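A minimal sketch of the transpose suggestion, assuming the three variables live in a data frame. The name `df` and its values are invented for illustration; only q1-q3 come from the question. After transposing, each original row is a column, so any column-wise tool such as sapply() can be used:

```r
# Hypothetical example data; q1-q3 are the poster's variable names,
# the values are made up.
df <- data.frame(q1 = c(1, 4, 7),
                 q2 = c(2, 5, 8),
                 q3 = c(3, 6, NA))

# t() on a data frame returns a matrix in which each original row is a column.
# Converting back to a data frame lets sapply() visit one column
# (= one original row) at a time.
tdf <- as.data.frame(t(df[, c("q1", "q2", "q3")]))

row_sd  <- sapply(tdf, sd,  na.rm = TRUE)   # standard deviation of each original row
row_max <- sapply(tdf, max, na.rm = TRUE)   # maximum of each original row
```

This only works cleanly when the selected columns are all numeric; with mixed types t() would produce a character matrix.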
Marc Schwartz
2007-Sep-15 16:32 UTC
On Sat, 2007-09-15 at 09:02 -0700, Gerard Smits wrote:
> Can the standard column functions (listed in the first sentence) be
> applied to rows, with the use of correct indexes to reference the
> columns of interest? Or, must these summary functions be programmed
> separately to work on a row?

The answer is: it depends.

If the row can be coerced to a numeric vector, then yes. This presumes
that the data frame contains a single data type, or that the subset of
columns you need contains a single data type.

If the row contains multiple data types, then the row becomes a
single-row data frame or a list, and you would have to consider other
possible approaches.

For example, taking the first row of the 'iris' dataset gives a
single-row data frame:

> str(iris[1, ])
'data.frame':   1 obs. of  5 variables:
 $ Sepal.Length: num 5.1
 $ Sepal.Width : num 3.5
 $ Petal.Length: num 1.4
 $ Petal.Width : num 0.2
 $ Species     : Factor w/ 3 levels "setosa","versicolor",..: 1

or, if you set 'drop = TRUE', a list:

> str(iris[1, , drop = TRUE])
List of 5
 $ Sepal.Length: num 5.1
 $ Sepal.Width : num 3.5
 $ Petal.Length: num 1.4
 $ Petal.Width : num 0.2
 $ Species     : Factor w/ 3 levels "setosa","versicolor",..: 1

If, however, you remove the last column, Species, which is a factor,
you can coerce the remaining object to a numeric matrix:

> str(as.matrix(iris[, -5]))
 num [1:150, 1:4] 5.1 4.9 4.7 4.6 5 5.4 4.6 5 4.4 4.9 ...
 - attr(*, "dimnames")=List of 2
  ..$ : NULL
  ..$ : chr [1:4] "Sepal.Length" "Sepal.Width" "Petal.Length" "Petal.Width"

Some functions will do this coercion internally, but they still fail
if the result is not numeric. For example:

> rowSums(iris)
Error in rowSums(x, prod(dn), p, na.rm) : 'x' must be numeric

However:

> head(rowSums(iris[, -5]))
[1] 10.2  9.5  9.4  9.4 10.2 11.4

HTH,

Marc Schwartz
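The point about mixed column types can be sketched as follows. This is one possible approach, not the only one: instead of dropping the factor column by position, it selects whichever columns are numeric and then applies a function across rows. The names `num_cols` and `row_sd` are introduced here for illustration:

```r
# Identify the numeric columns of the built-in 'iris' data frame.
num_cols <- vapply(iris, is.numeric, logical(1))

# Coerce only those columns to a numeric matrix, then compute the
# standard deviation of each row (MARGIN = 1 means "by row").
num_mat <- as.matrix(iris[, num_cols])
row_sd  <- apply(num_mat, 1, sd)

head(row_sd)
```

Selecting by type rather than by index keeps the code working if the column order changes, at the cost of silently ignoring any non-numeric column.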
Gavin Simpson
2007-Sep-15 16:36 UTC
On Sat, 2007-09-15 at 09:02 -0700, Gerard Smits wrote:
> There are a variety of functions that can be applied to a variable
> (column) in a data frame: mean, min, max, sd, range, IQR, etc.

But on their own, these are not equivalents to rowMeans, rowSums etc.
below.

> I am aware of only two that work on the rows, using q1-q3 as example
> variables:
>
> rowMeans(cbind(q1,q2,q3),na.rm=T) #mean of multiple variables
> rowSums (cbind(q1,q2,q3),na.rm=T) #sum of multiple variables

If you really want to apply a function to the individual rows of a
matrix-like object then apply() is your friend. ?rowMeans states:

  Details:

     These functions are equivalent to use of 'apply' with 'FUN = mean'
     or 'FUN = sum' with appropriate margins, but are a lot faster.

So see ?apply and its argument 'MARGIN'. For rows use MARGIN = 1, e.g.:

dat <- matrix(runif(1000), ncol = 100)
apply(dat, 1, mean)
rowMeans(dat)

> Can the standard column functions (listed in the first sentence) be
> applied to rows, with the use of correct indexes to reference the
> columns of interest? Or, must these summary functions be programmed
> separately to work on a row?

You can only use those functions on a column via subsetting, e.g.:

mean(dat[,4])
min(dat[,4])

If all you want is a single row (the equivalent of what you seem to be
asking) then these also work:

mean(dat[4,])
min(dat[4,])

HTH

G
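As a sketch of the apply() approach applied to the original q1-q3 question: the values below are invented, and the result names (`row_sd`, `row_min`, `row_iqr`, `row_range`) are hypothetical. Extra arguments such as na.rm = TRUE are passed through apply() to the summary function:

```r
# Hypothetical values for the poster's q1-q3 variables.
q1 <- c(1, 4, 7, 2)
q2 <- c(2, 5, NA, 2)
q3 <- c(3, 9, 8, 2)
m  <- cbind(q1, q2, q3)

# MARGIN = 1 applies the function to each row; na.rm is forwarded to it.
row_sd  <- apply(m, 1, sd,  na.rm = TRUE)
row_min <- apply(m, 1, min, na.rm = TRUE)
row_iqr <- apply(m, 1, IQR, na.rm = TRUE)

# range() returns two values per row, so apply() gives a 2 x nrow matrix;
# transpose it to get one (min, max) pair per original row.
row_range <- t(apply(m, 1, range, na.rm = TRUE))

# apply() with mean should agree with the faster rowMeans().
same_means <- isTRUE(all.equal(apply(m, 1, mean, na.rm = TRUE),
                               rowMeans(m, na.rm = TRUE)))
```

For means and sums, rowMeans() and rowSums() remain the better choice because they are much faster; apply() is the general fallback for functions that have no dedicated row version.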