Jim Bouldin
2009-Nov-24 20:25 UTC
[R] linear regression on groups of consecutive rows of a matrix
I want to perform linear regression on groups of consecutive rows--say 5 to 10 such--of two matrices. There are many such potential groups because the matrices have thousands of rows. The matrices are both of the form:

    > shp[1:5,16:20]
          SL495B SL004C SL005C SL005A SL017A
    -2649   1.06   0.56     NA     NA     NA
    -2648   0.97   0.57     NA     NA     NA
    -2647   0.46   0.30     NA     NA     NA
    -2646   0.92   0.48     NA     NA     NA
    -2645   0.82   0.48     NA     NA     NA

That is, they both have NA values, and non-NA values, in the same matrix positions. In my attempts so far, I have had two problems.

First, in using the split function (which I assume is essential here), I am unable to split the matrices by groups of rows (say rows 1 to 5, 6 to 10, etc.):

    > shp_split = split(shp,row(shp))

will split the matrix by rows but not by groups thereof. Stumped.

Second, I cannot seem to get rid of the NA values, which would prevent the regression even if I could figure out how to split the matrices correctly, e.g.:

    > shp_split = split(shp,row(shp))
    > shp_split = shp_split[!is.na(shp_split)]
    > shp_split[1]
    $`1`
     [1] 0.68 0.28 0.43 0.47 0.64 0.40 0.69 0.56 0.62 0.40 1.01 0.67 0.17 1.36
    [15] 1.84 1.06 0.56   NA   NA   NA   NA   NA   NA   NA   NA   NA   NA   NA
    [29]   NA   NA   NA   NA etc

If I solve these problems, will I in fact be able to perform individual linear regressions on the (numerous) collections of 5 to 10 rows?

Thanks as always for any insight.

Jim Bouldin
Research Ecologist
Department of Plant Sciences, UC Davis
Davis CA, 95616
530-554-1740
David Winsemius
2009-Nov-24 20:52 UTC
[R] linear regression on groups of consecutive rows of a matrix
Perhaps along these lines. First, you need to decide what your group width is; the second number inside the extraction call will be that width minus 1:

    for (x in seq(1, 1000, by = 6)) {
      temp <- na.omit(shp[x:(x + 5), ])   # the parens in x:(x + 5) are needed
      lm(formula, data = as.data.frame(temp))   # formula is a placeholder for your model
    }

Or, depending on what you actually meant:

    for (x in seq(1, 1000, by = 5)) {
      blk <- shp[x:(x + 4), ]
      temp <- blk[, colSums(is.na(blk)) == 0]   # keep columns with no NA in this block
      lm(formula, data = as.data.frame(temp))
    }

But I do feel compelled to ask: Do you really get meaningful information from lm applied to 5 cases? Especially when the predictors used may not be the same from subset to subset?

--
David

David Winsemius, MD
Heritage Laboratories
West Hartford, CT
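The split() route from the original question can also be made to work by splitting the row indices on a group label rather than splitting the matrix itself. The following is only a minimal sketch on simulated data: the matrix contents, the column names T1 to T6, and the group width of 5 are assumptions made for illustration, not values from the poster's data.

```r
# Simulated stand-in for the poster's matrix: 20 years (rows) by 6 trees (columns).
set.seed(1)
shp <- matrix(round(runif(120, 0.2, 1.5), 2), nrow = 20,
              dimnames = list(as.character(-2649:-2630), paste0("T", 1:6)))
shp[1:10, 5:6] <- NA   # hypothetical: trees T5 and T6 have no data in the first 10 years

width <- 5  # rows per group (assumed; the thread mentions 5 to 10)

# One group label per row: rows 1-5 -> 0, rows 6-10 -> 1, and so on.
grp <- (seq_len(nrow(shp)) - 1) %/% width

# Split the row indices, not the matrix, so that each piece remains a matrix.
row_sets <- split(seq_len(nrow(shp)), grp)
blocks <- lapply(row_sets, function(i) shp[i, , drop = FALSE])

# Within each block, keep only the columns that contain no missing values.
blocks_complete <- lapply(blocks, function(b) {
  b[, colSums(is.na(b)) == 0, drop = FALSE]
})

# Number of usable columns (trees) in each block of rows.
sapply(blocks_complete, ncol)
```

Splitting indices avoids the flattening that split(shp, row(shp)) produces, and drop = FALSE keeps each block two-dimensional even if only one column survives.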
Jim Bouldin
2009-Nov-24 21:06 UTC
[R] linear regression on groups of consecutive rows of a matrix
> But I do feel compelled to ask: Do you really get meaningful
> information from lm applied to 5 cases? Especially when the predictors
> used may not be the same from subset to subset???

Thanks again for your help David. Your question is a good one. It's a bit complicated, but here are the basics. The predictors are the same between subsets, in the sense that, for each group of rows (which represent tree ring years), the predictors and predictands are always from the same set of trees, even though that set changes slightly between consecutive subsets. Typically there will be 20+ observations per year (row), so for 5 rows I have n = 100+. For my purposes (removing the effect of tree size on ring width for small groups of years) that is more than good enough.

Now to try out your suggestion...

Jim

Jim Bouldin, PhD
Research Ecologist
Department of Plant Sciences, UC Davis
Davis CA, 95616
530-554-1740
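The pooling described above (all trees in a 5-year block treated as one sample) might be sketched as follows. This is an illustration under stated assumptions, not the poster's actual analysis: the matrices `ring` and `size`, their dimensions, and the simulated relationship between them are all hypothetical, and a simple model of ring width on tree size is assumed.

```r
# Hypothetical data: 20 years (rows) by 25 trees (columns), with NA in the
# same positions of both matrices, as described in the thread.
set.seed(42)
n_years <- 20
n_trees <- 25
size <- matrix(runif(n_years * n_trees, 10, 100), nrow = n_years)
ring <- 2 - 0.01 * size +
  matrix(rnorm(n_years * n_trees, sd = 0.1), nrow = n_years)
size[1:10, 21:25] <- NA          # five trees are absent in the first 10 years
ring[is.na(size)] <- NA

width <- 5                                   # years per block (assumed)
grp <- (seq_len(n_years) - 1) %/% width      # block label for each row

# For each block, flatten both matrices into paired vectors and fit one model.
fits <- lapply(split(seq_len(n_years), grp), function(i) {
  d <- data.frame(ring = as.vector(ring[i, ]),
                  size = as.vector(size[i, ]))
  lm(ring ~ size, data = d)   # incomplete cases are dropped (default na.omit)
})

t(sapply(fits, coef))   # intercept and slope for each block
sapply(fits, nobs)      # observations actually used in each block
```

Because both matrices are flattened in the same column-major order, each ring width stays paired with the size of the same tree in the same year. The residuals of each fit, available through residuals(), would then be the size-adjusted ring widths for that block.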