Roger Koenker
2018-Jul-17 08:02 UTC
[R] Scaling - does it get any better results than not scaling?
In certain fields this sort of standardization has become customary based on some sort of (misguided) notion that it induces "normality." For example, in anthropometric studies based on the international Demographic and Health Surveys (DHS), children's heights are often transformed to Z-scores prior to subsequent analysis under the dubious presumption that variability around the Z-scores at various ages will be Gaussian. In my experience this is rarely justified, and analysts would be better off modeling the original data rather than doing the preliminary transformation. This is discussed in further detail here: https://projecteuclid.org/euclid.bjps/1313973394

> On Jul 17, 2018, at 5:53 AM, Michael Thompson <michael.thompson at manukau.ac.nz> wrote:
>
> Hi,
> I seem to remember from classes that one effect of scaling / standardising data was to get better results in any analysis. But what I'm seeing when I study various explanations on scaling is that we get exactly the same results, just that when we look at standardised data it's easier to see proportionate effects.
> This is all very well for the data scientist to further investigate, but from a practical point of view (especially IF it doesn't improve the accuracy of the result), surely it adds complication to 'telling the story' of the model to non-DS people?
> So, is scaling a technique for the DS to use to find effects, while eventually delivering a non-scaled version to the users?
> I'd like to be able to give the true story to my students, not some fairy story based on my misunderstanding. Hope you can help with this.
> Michael
>
> ______________________________________________
> R-help at r-project.org mailing list -- To UNSUBSCRIBE and more, see
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.
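[Editorial illustration, not part of the original posts.] The two claims in this exchange can be checked numerically. The sketch below is a minimal example on simulated data, written in plain Python rather than R so it has no dependencies. It assumes a single skewed predictor and a simple least-squares line; the helper names (`zscore`, `fit_line`, `r_squared`, `skewness`) are hypothetical and chosen for this illustration only. It shows that Z-scoring a predictor leaves the fit unchanged (same R-squared, with the slope simply rescaled by the predictor's standard deviation), and that a plain Z-score is a linear transformation, so it cannot make a skewed variable any more Gaussian. It does not reproduce the reference-based, age-specific Z-scores used in the DHS studies mentioned above.

```python
import math
import random


def mean(values):
    return sum(values) / len(values)


def sd(values):
    """Sample standard deviation (n - 1 denominator)."""
    m = mean(values)
    return math.sqrt(sum((v - m) ** 2 for v in values) / (len(values) - 1))


def zscore(values):
    """Centre on the sample mean and divide by the sample standard deviation."""
    m, s = mean(values), sd(values)
    return [(v - m) / s for v in values]


def fit_line(x, y):
    """Ordinary least squares for y = a + b * x; returns (a, b)."""
    mx, my = mean(x), mean(y)
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    sxx = sum((xi - mx) ** 2 for xi in x)
    b = sxy / sxx
    return my - b * mx, b


def r_squared(x, y):
    """Share of the variance of y explained by the fitted line."""
    a, b = fit_line(x, y)
    my = mean(y)
    ss_res = sum((yi - (a + b * xi)) ** 2 for xi, yi in zip(x, y))
    ss_tot = sum((yi - my) ** 2 for yi in y)
    return 1 - ss_res / ss_tot


def skewness(values):
    """Moment-based skewness; zero for a symmetric distribution."""
    m = mean(values)
    n = len(values)
    m2 = sum((v - m) ** 2 for v in values) / n
    m3 = sum((v - m) ** 3 for v in values) / n
    return m3 / m2 ** 1.5


# Simulated data (an assumption of this sketch): a right-skewed predictor
# and a response that depends on it linearly, plus Gaussian noise.
random.seed(1)
x = [random.expovariate(0.1) for _ in range(200)]
y = [3.0 + 0.5 * xi + random.gauss(0, 2) for xi in x]
z = zscore(x)

a_raw, b_raw = fit_line(x, y)
a_std, b_std = fit_line(z, y)

# Same quality of fit whether or not the predictor is standardised.
print(abs(r_squared(x, y) - r_squared(z, y)) < 1e-9)
# The standardised slope is the raw slope times sd(x): a change of units only.
print(abs(b_std - b_raw * sd(x)) < 1e-9)
# Z-scoring does not change the shape of the distribution.
print(abs(skewness(x) - skewness(z)) < 1e-9)
```

All three checks print `True`. Reporting the raw slope (response units per predictor unit) or the standardised slope (response units per standard deviation of the predictor) is therefore a presentation choice for this kind of model, not a matter of accuracy.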
Bert Gunter
2018-Jul-17 15:02 UTC
[R] Scaling - does it get any better results than not scaling?
Prof. Koenker's response probably settles the matter, but if not, this thread should really be taken offlist, as it is primarily about statistics and not R programming. stats.stackexchange.com might be an alternative place to post; indeed, I suspect the issue has already been addressed in their archives.

Cheers,
Bert

Bert Gunter

"The trouble with having an open mind is that people keep coming along and sticking things into it."
-- Opus (aka Berkeley Breathed in his "Bloom County" comic strip)
Michael Thompson
2018-Jul-18 05:36 UTC
[R] Scaling - does it get any better results than not scaling?
My thanks to all contributors, and while I was not in the right place, I certainly got the answers I needed. My students will benefit, so thank you all.

Regards,
Michael Thompson
M.Prof.Studies Data Science
09 975 4678
Senior Lecturer, Digital Technologies
Manukau Campus
We all, like sheep, have gone astray
Isaiah 53
Personal profile: https://www.manukau.ac.nz/about/faculties-schools/business-and-information-technology/more-information-for-students/lecturer-profiles/michael-thompson