Janka VANSCHOENWINKEL
2015-Dec-24 13:03 UTC
[R] Weighted demean by group on only a selection of the dataset
Dear colleagues,

I am trying to find simple code to demean
1) only certain variables of a dataset,
2) by group,
3) and in a weighted fashion.

Currently, I can only demean all the numeric variables in the dataset:

Data[,sapply(Data, is.numeric)] <- apply(Data[sapply(Data, is.numeric)], 2, function(x) scale(x, scale = FALSE))

Assume that my dataset looks like this:

Country<- c('BE','BE','DE','GR','IT','ES','DE','NL')
Landvalue<- c(21000, 23400, 26800, 15000,18000,23000,19000,23000)
Temperature_spring <- c('15','16','14','18','23','21','12','15')
Temperature_summer <- c('25','18','19','23','24','22','15','19')
Temperature_autumn <- c('14','12','12','10','20','20','11','13')
Temperature_winter <- c('9','4','12','14','15','13','17','12')
Weight<-c('5','20','3','2','15','21','13','8')
Data <- data.frame(Country, Landvalue, Temperature_spring, Temperature_summer, Temperature_autumn, Temperature_winter, Weight)

Now imagine I only want to demean the temperature variables, grouped by country and weighted by Weight. By "grouped by country" I mean that I want to subtract only the mean of Belgium from an observation in Belgium.

Does somebody know how to add these three requirements to the code line I already have? Or, if this does not work, what code should I use?

Thank you very much and have a nice Christmas!
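A minimal base-R sketch covering all three requirements, assuming the data exactly as posted. The helper names `wdemean` and `to_num` are hypothetical, not from the thread. The temperature and weight columns are converted to numbers first. `as.numeric(as.character(x))` is used because in R versions before 4.0 `data.frame()` turned character vectors into factors by default, and `as.numeric()` on a factor returns the level codes rather than the values. Each temperature x is then replaced by x - sum(w*x)/sum(w), computed over the rows of the same country.

```r
# Data as posted in the question (temperatures and weights are stored as text)
Country <- c('BE','BE','DE','GR','IT','ES','DE','NL')
Landvalue <- c(21000, 23400, 26800, 15000, 18000, 23000, 19000, 23000)
Temperature_spring <- c('15','16','14','18','23','21','12','15')
Temperature_summer <- c('25','18','19','23','24','22','15','19')
Temperature_autumn <- c('14','12','12','10','20','20','11','13')
Temperature_winter <- c('9','4','12','14','15','13','17','12')
Weight <- c('5','20','3','2','15','21','13','8')
Data <- data.frame(Country, Landvalue, Temperature_spring, Temperature_summer,
                   Temperature_autumn, Temperature_winter, Weight)

# Hypothetical helper: convert text or factor columns to their numeric values.
# as.character() first avoids getting factor level codes.
to_num <- function(v) as.numeric(as.character(v))

# Hypothetical helper: subtract the weighted mean of x within each group g.
# ave() only hands x to FUN, so we pass row indices and look up x and w by index.
wdemean <- function(x, w, g) {
  wmean <- ave(seq_along(x), g,
               FUN = function(i) weighted.mean(x[i], w[i]))
  x - wmean
}

# 1) select only the temperature columns, 2) group by Country, 3) weight by Weight
temp_cols <- grep("^Temperature_", names(Data))
w <- to_num(Data$Weight)
Data[temp_cols] <- lapply(Data[temp_cols], function(col)
  wdemean(to_num(col), w, Data$Country))
```

With these data, Belgium's spring temperatures 15 and 16 (weights 5 and 20) become -0.8 and 0.2, because the weighted Belgian mean is 15.8. Countries with a single observation end up at 0, and `Landvalue` is left untouched.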
David Winsemius
2015-Dec-24 14:40 UTC
> On Dec 24, 2015, at 5:03 AM, Janka VANSCHOENWINKEL <janka.vanschoenwinkel at uhasselt.be> wrote:
>
> Dear colleagues,
>
> I am trying to find simple code to demean
> 1) only certain variables of a dataset,
> 2) by group,
> 3) and in a weighted fashion.
>
> Currently, I can only demean all the numeric variables in the dataset:
>
> Data[,sapply(Data, is.numeric)] <- apply(Data[sapply(Data,
> is.numeric)], 2, function(x) scale(x, scale = FALSE))
>
> Assume that my dataset looks like this:
> Country<- c('BE','BE','DE','GR','IT','ES','DE','NL')
> Landvalue<- c(21000, 23400, 26800, 15000,18000,23000,19000,23000)
> Temperature_spring <- c('15','16','14','18','23','21','12','15')
> Temperature_summer <- c('25','18','19','23','24','22','15','19')
> Temperature_autumn <- c('14','12','12','10','20','20','11','13')
> Temperature_winter <- c('9','4','12','14','15','13','17','12')
> Weight<-c('5','20','3','2','15','21','13','8')
> Data <- data.frame(Country, Landvalue,
> Temperature_spring,Temperature_summer,
> Temperature_autumn,Temperature_winter, Weight)

Do note that only the `Landvalue` column is numeric above. You would need to use as.numeric on the vectors that you are nominating for processing below.

>
> Now imagine I only want to demean the temperature variables, grouped
> by country and weighted by Weight. By "grouped by country" I mean that
> I want to subtract only the mean of Belgium from an observation in
> Belgium.
>

Generally when one wants to use multiple columns in a calculation with grouping, the method needs to be along the lines of:

Data[ , grepl("Temp", names(Data)) ] <- lapply( split(Data, Data$Country), FUN= ...)

> Does somebody know how to add these three requirements to the code line I
> already have? Or, if this does not work, what code should I use?
>
> Thank you very much and have a nice Christmas!
>

--
David Winsemius
Alameda, CA, USA
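The split()/lapply() pattern in the reply produces a list with one data frame per country, i.e. organized by country rather than by column, so one way to carry it through is to demean inside each piece and then use unsplit() to restore the original row order. This is a sketch under stated assumptions, not the code from the reply: the temperature and weight columns are assumed to have already been made numeric (as advised above), and `demean_group` is a hypothetical name.

```r
# Data with the temperature and weight columns already numeric
Data <- data.frame(
  Country = c('BE','BE','DE','GR','IT','ES','DE','NL'),
  Landvalue = c(21000, 23400, 26800, 15000, 18000, 23000, 19000, 23000),
  Temperature_spring = c(15, 16, 14, 18, 23, 21, 12, 15),
  Temperature_summer = c(25, 18, 19, 23, 24, 22, 15, 19),
  Temperature_autumn = c(14, 12, 12, 10, 20, 20, 11, 13),
  Temperature_winter = c(9, 4, 12, 14, 15, 13, 17, 12),
  Weight = c(5, 20, 3, 2, 15, 21, 13, 8)
)

# Select columns by matching on the column *names*
temp_cols <- grepl("Temp", names(Data))

# Hypothetical helper: demean the temperature columns of one country's rows,
# using the weighted mean computed with that country's weights
demean_group <- function(d) {
  d[temp_cols] <- lapply(d[temp_cols], function(x) x - weighted.mean(x, d$Weight))
  d
}

# Process each country separately, then reassemble in the original row order
pieces <- lapply(split(Data, Data$Country), demean_group)
Data <- unsplit(pieces, Data$Country)
```

unsplit() uses the same grouping vector as split(), which is what puts each row back where it started; the Belgian spring values again come out as -0.8 and 0.2.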