Luis Sisamón
2010-Apr-27 11:48 UTC
[R] Problem calculating multiple regressions on a data frame.
Hi there,

I am stuck trying to solve what should be a fairly easy problem. I have a data frame that essentially consists of (ID, time as seqMonth, variable, value), and I want to find the regression coefficient of value vs. time for each combination of ID and variable. I have tried several approaches and none of them works as I expected.

For example, I have tried:

    theSplit <- split(theTestLineal, list(as.factor(theTestLineal$ids),
                                          as.factor(theTestLineal$variable)))

I can then use

    lm(value ~ seqMonth, data = theSplit[[1]])
    ...
    lm(value ~ seqMonth, data = theSplit[[4]])

That works well (it fails for some combinations of ID and variable where there is only one data point). However, when I try to use lapply:

    lapply(theSplit, function(x) lm(value ~ seqMonth, data = x, na.action = na.exclude))

it fails with the error message:

    Error in lm.fit(x, y, offset = offset, singular.ok = singular.ok, ...) :
      0 (non-NA) cases

I have tried changing na.action with no success (na.pass, na.fail, na.exclude... all give the same error message).

I have also tried to follow the approach suggested by Charles Sharpsteen (http://www.mail-archive.com/r-help@r-project.org/msg74759.html), with similar results. The code is as follows:

    theModels <- by(theTestLineal, list(theTestLineal$ids, theTestLineal$variable),
      function(dataSlice) {
        linMod <- lm(value ~ seqMonth, data = dataSlice)

        # Slope and intercept may be recovered from the output of the coef() function:
        intercept <- coef(linMod)[1]
        slope <- coef(linMod)[2]

        # The R-squared value is returned by the summary() function:
        rsq <- summary(linMod)[['r.squared']]

        # The summary function also provides statistics for the F-distribution;
        # extract them, reformat as a list, rename and feed to pf() using do.call()
        # in order to get the p-value:
        fStats <- as.list(summary(linMod)[['fstatistic']])
        names(fStats) <- c('q', 'df1', 'df2')
        fStats[['lower.tail']] <- FALSE

        pVal <- do.call(pf, fStats)

        return(data.frame(slope, intercept, rsq, pVal))
      })

Any help will be appreciated!
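The lapply() error described in this post can be reproduced on a small data frame. The sketch below uses invented toy data (only the column names follow the post) and shows that split() on two factors keeps, by default, an empty data frame for every ID/variable combination that never occurs; lm() then stops on those with "0 (non-NA) cases". Passing drop = TRUE to split() and skipping groups with fewer than two rows is one possible way around it, not necessarily the fix adopted later in the thread. The names toy, allGroups, usedGroups, fittable and slopes are illustrative.

```r
# Toy data, invented for illustration: the combinations b.y and c.x never
# occur, and c.y has a single row.
toy <- data.frame(
  ids      = c("a", "a", "a", "a", "a", "a", "b", "b", "b", "c"),
  variable = c("x", "x", "x", "y", "y", "y", "x", "x", "x", "y"),
  seqMonth = c(1, 2, 3, 1, 2, 3, 1, 2, 3, 1),
  value    = c(1.0, 2.1, 2.9, 5.0, 4.2, 2.9, 0.5, 0.4, 0.7, 9.0)
)

# Default split(): all 3 x 2 = 6 combinations are kept, including empty ones.
allGroups <- split(toy, list(toy$ids, toy$variable))
groupSizes <- sapply(allGroups, nrow)

# lm() on an empty group raises an error; try() captures it for inspection.
emptyFit <- try(lm(value ~ seqMonth, data = allGroups[["b.y"]]), silent = TRUE)

# drop = TRUE discards combinations that do not occur in the data.
usedGroups <- split(toy, list(toy$ids, toy$variable), drop = TRUE)

# Keep only groups with at least two rows, then fit one model per group
# and extract the slope on seqMonth.
fittable <- Filter(function(g) nrow(g) >= 2, usedGroups)
slopes <- sapply(fittable, function(g) {
  coef(lm(value ~ seqMonth, data = g))[["seqMonth"]]
})
```

Printing groupSizes makes the empty combinations visible before any model is fitted, which may be a quicker diagnostic than changing na.action.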
Petr PIKAL
2010-Apr-27 12:13 UTC
[R] Re: Problem calculating multiple regressions on a data frame.
Hi,

what about

    fit <- lm(value ~ seqMonth + ids + variable, data = theTestLineal)

or a similar approach using ?lme. See also ?interaction.

Regards
Petr
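The additive model suggested in this reply fits a single common slope for seqMonth. One way to read the pointer to ?interaction, offered as a sketch rather than as what Petr necessarily had in mind, is to build one grouping factor from ids and variable and let both the intercept and the slope vary by group; the coefficient estimates then match those of separate per-group regressions. The toy data and the names grp, pooled and single are invented for illustration.

```r
# Toy data, invented for illustration (column names follow the post).
toy <- data.frame(
  ids      = c("a", "a", "a", "a", "a", "a", "b", "b", "b"),
  variable = c("x", "x", "x", "y", "y", "y", "x", "x", "x"),
  seqMonth = c(1, 2, 3, 1, 2, 3, 1, 2, 3),
  value    = c(1.0, 2.1, 2.9, 5.0, 4.2, 2.9, 0.5, 0.4, 0.7)
)

# One factor level per observed ID/variable combination.
toy$grp <- interaction(toy$ids, toy$variable, drop = TRUE)

# Separate intercept (grp) and separate slope (grp:seqMonth) for every group;
# "0 +" removes the shared intercept so each coefficient is group-specific.
pooled <- lm(value ~ 0 + grp + grp:seqMonth, data = toy)
pooledCoef <- coef(pooled)

# Stand-alone regression on one group, for comparison.
single <- lm(value ~ seqMonth, data = subset(toy, grp == "a.x"))
```

The point estimates agree with the per-group fits, but the standard errors and p-values do not, because the single model pools the residual variance across all groups.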
Gabor Grothendieck
2010-Apr-27 12:27 UTC
[R] Problem calculating multiple regressions on a data frame.
Replace lm(...) with try(lm(...))
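A sketch of how this suggestion might be applied inside lapply(), assuming invented toy data: try() turns the error from a failing group into an object of class "try-error", so the loop keeps going, and the failed fits can be dropped afterwards with inherits(). The names toy, groups, fits, ok and coefs are illustrative.

```r
# Toy data, invented for illustration (column names follow the post).
toy <- data.frame(
  ids      = c("a", "a", "a", "b", "b", "b"),
  variable = c("x", "x", "x", "y", "y", "y"),
  seqMonth = c(1, 2, 3, 1, 2, 3),
  value    = c(1.0, 2.1, 2.9, 5.0, 4.2, 2.9)
)

# Default split() keeps the empty combinations "b.x" and "a.y".
groups <- split(toy, list(toy$ids, toy$variable))

# try() returns a "try-error" object instead of stopping the loop;
# silent = TRUE suppresses printing of the error message.
fits <- lapply(groups, function(x) {
  try(lm(value ~ seqMonth, data = x), silent = TRUE)
})

# Keep only the fits that succeeded and collect their coefficients,
# one row per group.
ok <- !sapply(fits, inherits, what = "try-error")
coefs <- t(sapply(fits[ok], coef))
```

Because try() hides every error, not only the empty-group one, it seems worth checking names(fits)[!ok] to see which groups were skipped.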
Luis Sisamon
2010-Apr-27 15:47 UTC
[R] Problem calculating multiple regressions on a data frame.
Gabor Grothendieck <ggrothendieck <at> gmail.com> writes:
> Replace lm(...) with try(lm(...))

Thanks for all the replies. I managed to make it work with the try() trick: I wrapped the lm() call in two levels of try(), and the Sharpsteen approach is working now.
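The follow-up does not show the final code. The sketch below is one way the per-group statistics could be guarded, assuming invented toy data and a hypothetical helper named fitSlice. It uses tryCatch() in place of the nested try() calls described, so that a group whose model or F statistic cannot be computed yields a row of NA values instead of stopping the whole run.

```r
# Toy data, invented for illustration: group c.x has a single row.
toy <- data.frame(
  ids      = c("a", "a", "a", "a", "b", "b", "b", "b", "c"),
  variable = c("x", "x", "x", "x", "y", "y", "y", "y", "x"),
  seqMonth = c(1, 2, 3, 4, 1, 2, 3, 4, 1),
  value    = c(1.0, 2.1, 2.9, 4.2, 5.0, 4.2, 2.9, 2.1, 9.0)
)

# Hypothetical helper: one row of statistics, or NAs when any step fails.
fitSlice <- function(dataSlice) {
  tryCatch({
    linMod <- lm(value ~ seqMonth, data = dataSlice)
    modSum <- summary(linMod)
    fStat  <- modSum$fstatistic
    # summary() reports no F statistic when no slope could be estimated.
    if (is.null(fStat)) stop("slope could not be estimated")
    data.frame(
      slope     = coef(linMod)[["seqMonth"]],
      intercept = coef(linMod)[["(Intercept)"]],
      rsq       = modSum$r.squared,
      pVal      = pf(fStat[["value"]], fStat[["numdf"]], fStat[["dendf"]],
                     lower.tail = FALSE)
    )
  }, error = function(e) {
    data.frame(slope = NA_real_, intercept = NA_real_,
               rsq = NA_real_, pVal = NA_real_)
  })
}

# One row per observed ID/variable combination, named after the group.
groups  <- split(toy, list(toy$ids, toy$variable), drop = TRUE)
results <- do.call(rbind, lapply(groups, fitSlice))
```

The same fitSlice could be handed to by() in place of the anonymous function from the original post; split() with drop = TRUE is used here only because it yields one named row per group when the results are bound together.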