Hi all. I'm brand new to R.

My dataset (stored in MySQL) is a list of weather stations in rows by year, with various weather variables in columns, for example:

STNID     YEAR  TEMP  DEWP
station1  1990  54    50
station1  1991  23    10
station1  1992  34    18
station2  1990  45    41
station2  1991  32    25
station2  1992  21    11

I'm trying to run a linear regression and get the basic output (i.e. intercept, slope, and significance) for each station. I'm able to run the regression on the entire dataset using:

    lm(TEMP~DEWP, data=select)

But is there a way to aggregate the data ("group by" in MySQL) by STNID during the regression? Ideally I would just have a list of stations and their appropriate summary output, which I could use for further analysis.

I've searched the manual, etc. for solutions, but have been unsuccessful. Any assistance is greatly appreciated.

Thank you,
Ryan
On Mon, Apr 14, 2008 at 2:03 PM, Ryan Lauritsen <ryanlauritsen@gmail.com> wrote:
> But is there a way to aggregate the data ("group by" in MySQL) by
> STNID during the regression?

Try:

    by(x, list(x$STNID), function(.x) lm(TEMP ~ DEWP, data = .x))

--
Henrique Dallazuanna
Curitiba-Paraná-Brasil
25° 25' 40" S 49° 16' 22" O
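A minimal sketch of how the by() call above might be turned into the per-station table asked for in the question (intercept, slope, and significance). The data frame x is rebuilt here from the six sample rows in the original post; the name station_summary and the column names intercept, slope, and p_value are illustrative assumptions, not part of the reply.

```r
# Sample rows copied from the original question; in practice x would be
# the data frame pulled from MySQL.
x <- data.frame(
  STNID = rep(c("station1", "station2"), each = 3),
  YEAR  = rep(1990:1992, times = 2),
  TEMP  = c(54, 23, 34, 45, 32, 21),
  DEWP  = c(50, 10, 18, 41, 25, 11)
)

# One lm fit per station, exactly as suggested in the reply.
fits <- by(x, list(x$STNID), function(.x) lm(TEMP ~ DEWP, data = .x))

# Pull intercept, slope, and the slope's p-value out of each fit.
# coef(summary(fit)) is the usual coefficient table with columns
# "Estimate", "Std. Error", "t value", and "Pr(>|t|)".
station_summary <- t(sapply(fits, function(fit) {
  ct <- coef(summary(fit))
  c(intercept = ct["(Intercept)", "Estimate"],
    slope     = ct["DEWP", "Estimate"],
    p_value   = ct["DEWP", "Pr(>|t|)"])
}))

print(station_summary)
```

The result is a numeric matrix with one row per station, which can be passed to as.data.frame() for further analysis. With only three years per station, as in the sample, each fit has a single residual degree of freedom, so the p-values mean little until more years are included.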
At 18:03 14/04/2008, Ryan Lauritsen wrote:
>But is there a way to aggregate the data ("group by" in MySQL) by
>STNID during the regression? Ideally I would just have a list of
>stations and their appropriate summary output, which I could use for
>further analysis.

In this particular case you might consider using lmList from the nlme package (or from lme4). More generally you could look at the family of apply functions: apply, tapply, sapply, and so on.

Michael Dewey
http://www.aghmed.fsnet.co.uk
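Both suggestions can be sketched roughly as follows, assuming the sample rows from the original post are held in a data frame called x (an illustrative name). The apply-family version uses split() together with sapply(), which is one possible combination among several. The lmList() call is wrapped in a requireNamespace() check in case nlme is not installed, and STNID is stored as a factor on the assumption that a factor is the safest grouping type for it.

```r
# Sample rows copied from the original question.
x <- data.frame(
  STNID = factor(rep(c("station1", "station2"), each = 3)),
  YEAR  = rep(1990:1992, times = 2),
  TEMP  = c(54, 23, 34, 45, 32, 21),
  DEWP  = c(50, 10, 18, 41, 25, 11)
)

# Apply-family route: split the data frame by station, fit each piece,
# and collect the coefficients. The result is a matrix with one column
# per station and rows "(Intercept)" and "DEWP".
per_station <- sapply(split(x, x$STNID), function(d) {
  coef(lm(TEMP ~ DEWP, data = d))
})
print(per_station)

# lmList route: the "| STNID" part of the formula names the grouping
# variable, so one regression is fitted per station.
if (requireNamespace("nlme", quietly = TRUE)) {
  fits <- nlme::lmList(TEMP ~ DEWP | STNID, data = x)
  print(coef(fits))  # one row per station: intercept and slope
}
```

The lmList route keeps the fitted models together in one object, so summary(fits) gives standard errors and p-values as well; by default those use a residual standard error pooled across stations, which may or may not be what is wanted.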