Hello,
I'm trying to create a custom function that "mean-centers" data and can be
applied across many columns.
Here is an example dataset, which is similar to my dataset:
Location,TimePeriod,Units,AveragePrice
Los Angeles,5/1/11,61,5.42
Los Angeles,5/8/11,49,4.69
Los Angeles,5/15/11,40,5.05
New York,5/1/11,259,6.4
New York,5/8/11,187,5.3
New York,5/15/11,177,5.7
Paris,5/1/11,672,6.26
Paris,5/8/11,514,5.3
Paris,5/15/11,455,5.2
I want to mean-center the "Units" and "AveragePrice" columns.
So, I created this function:
specialFunction <- function(x){ log(x) - colMeans(log(x), na.rm = T) }
If I use only one column in the first argument of the "by" function,
everything is fine. For example, the following code works:
by(data[c("Units")],
data["Location"],
specialFunction)
But the following code does not work when I have two columns in the
first argument:
by(data[c("Units", "AveragePrice")],
data["Location"],
specialFunction)
Does anyone have any ideas as to what I am doing wrong?
Please note that I'm trying to get the following results (for the
"Los Angeles" group):
Los Angeles "Units" variable (Mean-Centered)
0.213682659
-0.005370907
-0.208311751
Los Angeles "AveragePrice" variable (Mean-Centered)
0.071790268
-0.072872965
0.001082696
Best Regards,
Ray DiGiacomo, Jr.
Healthcare Predictive Analytics Specialist
President, Lion Data Systems LLC
President, The Orange County R User Group
Board Member, TDWI
rayd@liondatasystems.com
(m) 408-425-7851
San Juan Capistrano, California USA
twitter.com/liondatasystems
linkedin.com/in/raydigiacomojr
youtube.com/user/liondatasystems/videos
On Dec 8, 2012, at 3:54 PM, Ray DiGiacomo, Jr. wrote:

> Hello,
>
> I'm trying to create a custom function that "mean-centers" data and
> can be applied across many columns.
>
> Here is an example dataset, which is similar to my dataset:
>

dat <- read.table(text="Location,TimePeriod,Units,AveragePrice
Los Angeles,5/1/11,61,5.42
Los Angeles,5/8/11,49,4.69
Los Angeles,5/15/11,40,5.05
New York,5/1/11,259,6.4
New York,5/8/11,187,5.3
New York,5/15/11,177,5.7
Paris,5/1/11,672,6.26
Paris,5/8/11,514,5.3
Paris,5/15/11,455,5.2", header=TRUE, sep=",")

>
> I want to mean-center the "Units" and "AveragePrice" Columns.
>
> So, I created this function:
>
> specialFunction <- function(x){ log(x) - colMeans(log(x), na.rm = T) }

I needed to modify this to avoid errors relating to how colMeans is
expecting its arguments:

specialFunction2 <- function(x){ log(x) - mean(log(x), na.rm = T) }
aggregate(dat[3:4], dat[1], FUN=specialFunction2)
     Location    Units.1    Units.2    Units.3 AveragePrice.1 AveragePrice.2
1 Los Angeles  0.2136827 -0.0053709 -0.2083118      0.0717903     -0.0728730
2    New York  0.2354659 -0.0902535 -0.1452124      0.1014743     -0.0871168
3       Paris  0.2193320 -0.0487031 -0.1706289      0.1173316     -0.0491417
  AveragePrice.3
1      0.0010827
2     -0.0143575
3     -0.0681899

>
> If I use only "one" column in the first argument of the "by" function,
> everything is in fine. For example the following code will work fine:
>
> by(data[c("Units")],
> data["Location"],
> specialFunction)
>
> But the following code will "not" work, because I have "two" columns
> in the first argument...
>
> by(data[c("Units", "AveragePrice")],
> data["Location"],
> specialFunction)

OK. So then I tried this with your function and was surprised to see
that it also works:

> by(dat[c("Units", "AveragePrice")],
+ dat["Location"],
+ specialFunction)
Location: Los Angeles
     Units AveragePrice
1  0.21368    0.0717903
2  2.27351   -2.3517586
3 -0.20831    0.0010827
------------------------------------------------------------------
Location: New York
     Units AveragePrice
4  0.23547     0.101474
5  3.47628    -3.653655
6 -0.14521    -0.014357
------------------------------------------------------------------
Location: Paris
     Units AveragePrice
7  0.21933      0.11733
8  4.52537     -4.62322
9 -0.17063     -0.06819

>
> Does anyone have any ideas as to what I am doing wrong?

I guess I don't. Cannot reproduce and my other methods worked as well.

This also works with your version and with mine but I get the
deprecation message for `mean.data.frame` from mine:

> lapply( split(dat[3:4], dat[1]) , FUN=specialFunction )
$`Los Angeles`
     Units AveragePrice
1  0.21368    0.0717903
2  2.27351   -2.3517586
3 -0.20831    0.0010827

$`New York`
     Units AveragePrice
4  0.23547     0.101474
5  3.47628    -3.653655
6 -0.14521    -0.014357

$Paris
     Units AveragePrice
7  0.21933      0.11733
8  4.52537     -4.62322
9 -0.17063     -0.06819

>
> Please note that I'm trying to get the following results (for the "Los
> Angeles" group):
>
> Los Angeles "Units" variable (Mean-Centered)
> 0.213682659
> -0.005370907
> -0.208311751
>
> Los Angeles "AveragePrice" variable (Mean-Centered)
> 0.071790268
> -0.072872965
> 0.001082696

--
David Winsemius, MD
Alameda, CA, USA
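A possible explanation for the by() output above, where row 2 of the Los Angeles group shows 2.27351 rather than the requested -0.005370907: when a data frame has a plain vector subtracted from it, R recycles the vector down the columns, so the length-2 result of colMeans() gets alternated between cells instead of being applied one mean per column. The sketch below reproduces that on the Los Angeles rows and shows one way around it using sweep(). The name centerLogCols is made up for this illustration and is not from the thread.

```r
# Example data from the original post (TimePeriod omitted for brevity).
dat <- data.frame(
  Location     = rep(c("Los Angeles", "New York", "Paris"), each = 3),
  Units        = c(61, 49, 40, 259, 187, 177, 672, 514, 455),
  AveragePrice = c(5.42, 4.69, 5.05, 6.4, 5.3, 5.7, 6.26, 5.3, 5.2)
)

# Function from the original post. With two columns, the two column means
# are recycled cell by cell, so some cells get the other column's mean.
specialFunction <- function(x) log(x) - colMeans(log(x), na.rm = TRUE)

# Hypothetical replacement: sweep() subtracts each column's own mean
# from every row of that column.
centerLogCols <- function(x) {
  lx <- log(as.matrix(x))
  sweep(lx, 2, colMeans(lx, na.rm = TRUE))
}

la    <- dat[dat$Location == "Los Angeles", c("Units", "AveragePrice")]
wrong <- specialFunction(la)   # row 2 is off, as in the output above
right <- centerLogCols(la)     # row 2 matches the requested values

# The same function can be passed to by() for all groups at once.
res <- by(dat[c("Units", "AveragePrice")], dat["Location"], centerLogCols)
```

With a single column, colMeans() returns one value, so recycling is harmless. That would be consistent with the one-column call behaving as expected.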
Hi,
It works for me also:
by(dat1[c("Units","AveragePrice")],dat1[,1],specialFunction)
#dat1[, 1]: Los Angeles
#       Units AveragePrice
#1  0.2136827  0.071790268
#2  2.2735148 -2.351758623
#3 -0.2083118  0.001082696
----------------------------------------------
#or
by(cbind(Units=dat1[,3],AveragePrice=dat1[,4]),dat1[,1],specialFunction)
#INDICES: Los Angeles
#       Units AveragePrice
#1  0.2136827  0.071790268
#2  2.2735148 -2.351758623
#3 -0.2083118  0.001082696
--------------------------------------------
A.K.
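If the goal is to keep the centered values lined up with the original rows, another option is ave(), which applies a function within groups and returns a vector of the same length as its input. This is only a sketch, not something proposed in the thread. The helper name centerLog and the result name centered are made up for this illustration.

```r
# Example data from the original post (TimePeriod omitted for brevity).
dat <- data.frame(
  Location     = rep(c("Los Angeles", "New York", "Paris"), each = 3),
  Units        = c(61, 49, 40, 259, 187, 177, 672, 514, 455),
  AveragePrice = c(5.42, 4.69, 5.05, 6.4, 5.3, 5.7, 6.26, 5.3, 5.2)
)

# Hypothetical helper: log-transform one numeric vector and subtract its mean.
centerLog <- function(v) log(v) - mean(log(v), na.rm = TRUE)

# Apply the helper to each column separately, within each Location.
cols <- c("Units", "AveragePrice")
centered <- dat
centered[cols] <- lapply(dat[cols], function(v) {
  ave(v, dat$Location, FUN = centerLog)
})
```

Because each column is handled as a plain vector, no recycling across columns can occur, and further columns can be added to cols without changing the helper.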
----- Original Message -----
From: "Ray DiGiacomo, Jr." <rayd at liondatasystems.com>
To: R Help <r-help at r-project.org>
Cc:
Sent: Saturday, December 8, 2012 6:54 PM
Subject: [R] Mean-Centering Question
______________________________________________
R-help at r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.