Hello,

I'm trying to create a custom function that "mean-centers" data and can be applied across many columns.

Here is an example dataset, which is similar to my dataset:

Location,TimePeriod,Units,AveragePrice
Los Angeles,5/1/11,61,5.42
Los Angeles,5/8/11,49,4.69
Los Angeles,5/15/11,40,5.05
New York,5/1/11,259,6.4
New York,5/8/11,187,5.3
New York,5/15/11,177,5.7
Paris,5/1/11,672,6.26
Paris,5/8/11,514,5.3
Paris,5/15/11,455,5.2

I want to mean-center the "Units" and "AveragePrice" columns.

So, I created this function:

specialFunction <- function(x){ log(x) - colMeans(log(x), na.rm = T) }

If I use only one column in the first argument of the "by" function, everything is fine. For example, the following code works:

by(data[c("Units")],
   data["Location"],
   specialFunction)

But the following code does not work, because I have two columns in the first argument:

by(data[c("Units", "AveragePrice")],
   data["Location"],
   specialFunction)

Does anyone have any ideas as to what I am doing wrong?

Please note that I'm trying to get the following results (for the "Los Angeles" group):

Los Angeles "Units" variable (Mean-Centered)
0.213682659
-0.005370907
-0.208311751

Los Angeles "AveragePrice" variable (Mean-Centered)
0.071790268
-0.072872965
0.001082696

Best Regards,

Ray DiGiacomo, Jr.
Healthcare Predictive Analytics Specialist
President, Lion Data Systems LLC
President, The Orange County R User Group
Board Member, TDWI
rayd@liondatasystems.com
(m) 408-425-7851
San Juan Capistrano, California USA
twitter.com/liondatasystems
linkedin.com/in/raydigiacomojr
youtube.com/user/liondatasystems/videos
On Dec 8, 2012, at 3:54 PM, Ray DiGiacomo, Jr. wrote:

> Hello,
>
> I'm trying to create a custom function that "mean-centers" data and can be
> applied across many columns.
>
> Here is an example dataset, which is similar to my dataset:

dat <- read.table(text="Location,TimePeriod,Units,AveragePrice
Los Angeles,5/1/11,61,5.42
Los Angeles,5/8/11,49,4.69
Los Angeles,5/15/11,40,5.05
New York,5/1/11,259,6.4
New York,5/8/11,187,5.3
New York,5/15/11,177,5.7
Paris,5/1/11,672,6.26
Paris,5/8/11,514,5.3
Paris,5/15/11,455,5.2", header=TRUE, sep=",")

> I want to mean-center the "Units" and "AveragePrice" Columns.
>
> So, I created this function:
>
> specialFunction <- function(x){ log(x) - colMeans(log(x), na.rm = T) }

I needed to modify this to avoid errors relating to how colMeans is expecting its arguments:

specialFunction2 <- function(x){ log(x) - mean(log(x), na.rm = T) }
aggregate(dat[3:4], dat[1], FUN=specialFunction2)

     Location    Units.1    Units.2    Units.3 AveragePrice.1 AveragePrice.2
1 Los Angeles  0.2136827 -0.0053709 -0.2083118      0.0717903     -0.0728730
2    New York  0.2354659 -0.0902535 -0.1452124      0.1014743     -0.0871168
3       Paris  0.2193320 -0.0487031 -0.1706289      0.1173316     -0.0491417
  AveragePrice.3
1      0.0010827
2     -0.0143575
3     -0.0681899

> If I use only "one" column in the first argument of the "by" function,
> everything is in fine. For example the following code will work fine:
>
> by(data[c("Units")],
>    data["Location"],
>    specialFunction)
>
> But the following code will "not" work, because I have "two" columns in the
> first argument...
>
> by(data[c("Units", "AveragePrice")],
>    data["Location"],
>    specialFunction)

OK. So then I tried this with your function and was surprised to see that it also works:

> by(dat[c("Units", "AveragePrice")],
+    dat["Location"],
+    specialFunction)
Location: Los Angeles
     Units AveragePrice
1  0.21368    0.0717903
2  2.27351   -2.3517586
3 -0.20831    0.0010827
------------------------------------------------------------------
Location: New York
     Units AveragePrice
4  0.23547     0.101474
5  3.47628    -3.653655
6 -0.14521    -0.014357
------------------------------------------------------------------
Location: Paris
     Units AveragePrice
7  0.21933      0.11733
8  4.52537     -4.62322
9 -0.17063     -0.06819

> Does anyone have any ideas as to what I am doing wrong?

I guess I don't. Cannot reproduce and my other methods worked as well.

This also works with your version and with mine but I get the deprecation message for `mean.data.frame` from mine:

> lapply( split(dat[3:4], dat[1]) , FUN=specialFunction )
$`Los Angeles`
     Units AveragePrice
1  0.21368    0.0717903
2  2.27351   -2.3517586
3 -0.20831    0.0010827

$`New York`
     Units AveragePrice
4  0.23547     0.101474
5  3.47628    -3.653655
6 -0.14521    -0.014357

$Paris
     Units AveragePrice
7  0.21933      0.11733
8  4.52537     -4.62322
9 -0.17063     -0.06819

> Please note that I'm trying to get the following results (for the "Los
> Angeles" group):
>
> Los Angeles "Units" variable (Mean-Centered)
> 0.213682659
> -0.005370907
> -0.208311751
>
> Los Angeles "AveragePrice" variable (Mean-Centered)
> 0.071790268
> -0.072872965
> 0.001082696

--
David Winsemius, MD
Alameda, CA, USA
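A note on the by() and lapply() output above: the second row of each group (for example 2.27351 for Los Angeles "Units") does not match the requested value (-0.005370907), even though the call runs without error. A likely explanation, offered here as an assumption rather than something stated in the thread, is that colMeans() returns one mean per column and R recycles that two-element vector down each column of the data frame, so in a three-row group the middle row has the other column's mean subtracted. The sketch below uses sweep() so that each column is centered on its own mean; centerLog is a hypothetical name introduced for illustration.

```r
# Example data copied from the thread.
dat <- read.table(text = "Location,TimePeriod,Units,AveragePrice
Los Angeles,5/1/11,61,5.42
Los Angeles,5/8/11,49,4.69
Los Angeles,5/15/11,40,5.05
New York,5/1/11,259,6.4
New York,5/8/11,187,5.3
New York,5/15/11,177,5.7
Paris,5/1/11,672,6.26
Paris,5/8/11,514,5.3
Paris,5/15/11,455,5.2", header = TRUE, sep = ",")

# Hypothetical helper (not from the thread): take logs, then subtract
# each column's own mean. MARGIN = 2 tells sweep() to match the vector
# of means to columns; the default FUN is "-".
centerLog <- function(x) {
  lx <- log(as.matrix(x))
  sweep(lx, 2, colMeans(lx, na.rm = TRUE))
}

# One centered matrix per location, returned as a named list.
centered <- lapply(split(dat[c("Units", "AveragePrice")], dat$Location),
                   centerLog)
centered[["Los Angeles"]]
```

With this version the Los Angeles group should reproduce the values requested in the original post for both columns.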
Hi,

It works for me also:

by(dat1[c("Units","AveragePrice")],dat1[,1],specialFunction)
#dat1[, 1]: Los Angeles
#       Units AveragePrice
#1  0.2136827  0.071790268
#2  2.2735148 -2.351758623
#3 -0.2083118  0.001082696
#----------------------------------------------

#or
by(cbind(Units=dat1[,3],AveragePrice=dat1[,4]),dat1[,1],specialFunction)
#INDICES: Los Angeles
#       Units AveragePrice
#1  0.2136827  0.071790268
#2  2.2735148 -2.351758623
#3 -0.2083118  0.001082696
#--------------------------------------------

A.K.

----- Original Message -----
From: "Ray DiGiacomo, Jr." <rayd at liondatasystems.com>
To: R Help <r-help at r-project.org>
Cc:
Sent: Saturday, December 8, 2012 6:54 PM
Subject: [R] Mean-Centering Question
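For readers who want the centered values as extra columns in the original row order, rather than as one block per location, a per-column approach with ave() is one option. This is only a sketch and was not proposed in the thread; the helper centerWithin and the column names UnitsCentered and PriceCentered are hypothetical.

```r
# Example data copied from the thread.
dat <- read.table(text = "Location,TimePeriod,Units,AveragePrice
Los Angeles,5/1/11,61,5.42
Los Angeles,5/8/11,49,4.69
Los Angeles,5/15/11,40,5.05
New York,5/1/11,259,6.4
New York,5/8/11,187,5.3
New York,5/15/11,177,5.7
Paris,5/1/11,672,6.26
Paris,5/8/11,514,5.3
Paris,5/15/11,455,5.2", header = TRUE, sep = ",")

# Hypothetical helper (not from the thread): center log(v) on its mean
# within each level of the grouping vector g. ave() applies FUN group by
# group and returns the results in the original row order.
centerWithin <- function(v, g) {
  ave(log(v), g, FUN = function(z) z - mean(z, na.rm = TRUE))
}

dat$UnitsCentered <- centerWithin(dat$Units, dat$Location)
dat$PriceCentered <- centerWithin(dat$AveragePrice, dat$Location)

dat[dat$Location == "Los Angeles",
    c("Location", "UnitsCentered", "PriceCentered")]
```

Because each column is handled as a plain vector, there is no vector recycling across columns to worry about, and further columns can be added with one line each.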