Luis Sisamón
2010-Apr-27 11:48 UTC
[R] Problem calculating multiple regressions on a data frame.
Hi there,
I am stuck trying to solve what should be a fairly easy problem.
I have a data frame that essentially consists of (ID, time as seqMonth,
variable, value), and I want to find the regression coefficient of value vs.
time for each combination of ID and variable.
I have tried several approaches, and none of them seems to work as I
expected.
For example, I have tried:
theSplit <- split(theTestLineal, list(as.factor(theTestLineal$ids),
as.factor(theTestLineal$variable)))
I can then use
lm(value~seqMonth,data=theSplit[[1]])
...
lm(value~seqMonth,data=theSplit[[4]])
That works well (although it fails for some combinations of ID and variable
where there is only one data point).
However, when I try to use lapply:
lapply(theSplit,function(x)lm(value~seqMonth,data=x,na.action=na.exclude))
it fails with the error message:
Error in lm.fit(x, y, offset = offset, singular.ok = singular.ok, ...) :
0 (non-NA) cases
I have tried changing na.action with no success (na.pass, na.fail,
na.exclude... all give the same error message).
I have also tried to follow the approach suggested by Charles Sharpsteen
(http://www.mail-archive.com/r-help@r-project.org/msg74759.html) with
similar results.
The code is as follows:
theModels <- by( theTestLineal, list( theTestLineal$ids,
    theTestLineal$variable ), function( dataSlice ){
  linMod <- lm( value ~ seqMonth, data = dataSlice )
  # Slope and intercept may be recovered from the output of the coef() function:
  intercept <- coef( linMod )[1]
  slope <- coef( linMod )[2]
  # The R-Squared value is returned by the summary() function:
  rsq <- summary( linMod )[[ 'r.squared' ]]
  # The summary function also provides statistics for the F-distribution,
  # extract them, reformat as a list, rename and feed to pf() using do.call()
  # in order to get the p-value:
  fStats <- as.list( summary( linMod )[[ 'fstatistic' ]] )
  names( fStats ) <- c( 'q', 'df1', 'df2' )
  fStats[[ 'lower.tail' ]] <- FALSE
  pVal <- do.call( pf, fStats )
  return(data.frame( slope, intercept, rsq, pVal ))
})
Any help will be appreciated!
Petr PIKAL
2010-Apr-27 12:13 UTC
[R] Re: Problem calculating multiple regressions on a data frame.
Hi,
what about
fit <- lm(value~seqMonth+ids+variable, data=theTestLineal)
or a similar approach using ?lme
See also ?interaction
Regards
Petr
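One possible reading of this suggestion, sketched on made-up data (not necessarily what Petr had in mind): the formula as posted, with main effects only, estimates a single common slope for seqMonth, so getting one slope per (ids, variable) combination needs an interaction between the group and seqMonth. The data frame `toy` and the column `grp` are hypothetical names introduced here; only ids, variable, seqMonth and value come from the original post.

```r
# Hypothetical toy data with two (ids, variable) combinations.
toy <- data.frame(
  ids      = c("a", "a", "a", "b", "b", "b"),
  variable = c("x", "x", "x", "x", "x", "x"),
  seqMonth = c(1, 2, 3, 1, 2, 3),
  value    = c(1, 3, 5, 2, 2.5, 3)
)

# One factor level per (ids, variable) combination that actually occurs.
toy$grp <- interaction(toy$ids, toy$variable, drop = TRUE)

# "0 + grp" gives every group its own intercept,
# "grp:seqMonth" gives every group its own slope.
fit <- lm(value ~ 0 + grp + grp:seqMonth, data = toy)

# Pick out the per-group slopes by coefficient name.
slopes <- coef(fit)[grep(":seqMonth$", names(coef(fit)))]
print(slopes)
```

The coefficients match what separate per-group fits would give, but the pooled model shares one residual variance across groups, so standard errors and p-values will generally differ from those of the individual regressions.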
Gabor Grothendieck
2010-Apr-27 12:27 UTC
[R] Problem calculating multiple regressions on a data frame.
Replace lm(...) with try(lm(...))
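A minimal sketch of this suggestion on made-up data. The data frame `toy` and the helper `safeFit` are hypothetical; only the column names come from the original post. By default, split() on two factors also creates the combinations that never occur in the data, and lm() on such an empty data frame is what raises the "0 (non-NA) cases" error. Wrapping the call in try() lets lapply() carry on past those groups.

```r
# Hypothetical toy data: the combination (ids = "b", variable = "y") has no
# rows, so split() returns an empty data frame for it.
toy <- data.frame(
  ids      = c("a", "a", "a", "a", "a", "a", "b", "b", "b"),
  variable = c("x", "x", "x", "y", "y", "y", "x", "x", "x"),
  seqMonth = c(1, 2, 3, 1, 2, 3, 1, 2, 3),
  value    = c(1.0, 2.1, 2.9, 5.0, 4.1, 3.2, 0.5, 0.4, 0.7)
)

toySplit <- split(toy, list(toy$ids, toy$variable))

# try() turns the error from lm() into a "try-error" object instead of
# aborting the whole lapply() call.
safeFit <- function(x) try(lm(value ~ seqMonth, data = x), silent = TRUE)
fits <- lapply(toySplit, safeFit)

# Keep only the fits that succeeded and collect their coefficients,
# one row per group.
ok <- !sapply(fits, inherits, what = "try-error")
coefs <- t(sapply(fits[ok], coef))
print(coefs)
```

The failed groups stay in `fits` as "try-error" objects, so `names(fits)[!ok]` shows which combinations could not be fitted.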
Luis Sisamon
2010-Apr-27 15:47 UTC
[R] Problem calculating multiple regressions on a data frame.
Gabor Grothendieck <ggrothendieck <at> gmail.com> writes:
> Replace lm(...) with try(lm(...))
Thanks for all the replies. I managed to make it work with the try() trick. I actually wrapped the lm() call in two levels of try(), and the Sharpsteen approach is working now.
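For comparison, a sketch of an alternative that avoids try() by removing the problem groups before fitting, again on made-up data. The names `toy`, `groups`, `minRows` and `results` are hypothetical. It relies on split(..., drop = TRUE) to omit combinations that never occur, and on Filter() to discard groups too small for a slope estimate.

```r
# Hypothetical toy data: (b, y) never occurs and (a, y) has a single point.
toy <- data.frame(
  ids      = c("a", "a", "a", "b", "b", "b", "a"),
  variable = c("x", "x", "x", "x", "x", "x", "y"),
  seqMonth = c(1, 2, 3, 1, 2, 3, 1),
  value    = c(1, 3, 5, 2, 2, 2, 9)
)

# drop = TRUE omits the (ids, variable) combinations that never occur.
groups <- split(toy, list(toy$ids, toy$variable), drop = TRUE)

# A slope cannot be estimated from a single point, so keep only groups
# with at least minRows rows.
minRows <- 2
groups <- Filter(function(g) nrow(g) >= minRows, groups)

# Fit each remaining group and bind the results into one data frame,
# one row per group.
results <- do.call(rbind, lapply(groups, function(g) {
  fit <- lm(value ~ seqMonth, data = g)
  data.frame(intercept = unname(coef(fit)[1]), slope = unname(coef(fit)[2]))
}))
print(results)
```

If value or seqMonth can contain NA, nrow() overstates the usable data; counting complete rows instead (for example with complete.cases()) would be the safer filter in that case.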