Hi: I have a dataset and need to interpolate for missing days. In Excel I either average from sampled days from above and below the missing days or use the TREND function to make up for the missing values. I have been reading about na.approx, is this function similar to the TREND function? Which is the best recommendable way to make up for missing data? Here's my dataset: weeks 17,18,26 and 46 have 0 daysSamp. Year Week daysSamp Lower TotalPD Upper varTotalPD 2006 47 6 126988 188259 249530 1045878675 2006 48 7 189155 253350 317545 1148102355 2006 49 7 103300 132741 162182 241480186 2006 50 6 11801 252576 493352 16151006813 2006 51 7 2348 3671 4994 487926 2006 52 5 2606 29901 57196 215454181 2006 2 7 2968 4513 6058 664723 2006 3 7 1128 1889 2650 161231 2006 4 7 479 963 1447 65196 2006 5 7 2819 4413 6007 708094 2006 6 6 -1009 3128 7264 4766743 2006 7 7 -5239 10769 26777 71387835 2006 8 7 150 503 856 34685 2006 9 7 1858 2989 4120 356562 2006 10 7 193 494 795 25281 2006 11 7 125 346 567 13627 2006 12 7 432 767 1102 31189 2006 13 7 1229 1867 2505 113569 2006 14 7 813 1339 1865 77140 2006 15 4 -66 124 315 10105 2006 16 7 152 903 1654 157242 2006 17 0 2006 18 0 2006 19 5 0 0 0 0 2006 20 4 0 0 0 0 2006 21 5 0 0 0 0 2006 22 6 0 0 0 0 2006 23 7 -65 285 635 34112 2006 24 6 0 0 0 0 2006 25 7 0 0 0 0 2006 26 0 2006 27 4 228 931 1634 137726 2006 28 4 801 2231 3662 569977 2006 29 4 4544 9242 13939 6147522 2006 30 5 15798 28465 41131 44697915 2006 31 5 25398 41049 56701 68245523 2006 32 5 48197 82216 116235 322416917 2006 33 5 142980 230411 317841 2129630128 2006 34 5 227141 360468 493794 4952314336 2006 35 5 467244 756325 1045405 23281569629 2006 36 5 281049 463331 645614 9256900449 2006 37 2 227636 620330 1013023 42961663047 2006 38 3 478990 983472 1487954 70903343603 2006 39 7 539690 846522 1153354 26228718974 2006 40 7 320959 457866 594773 5221891252 2006 41 7 427561 582452 737343 6683813344 2006 42 7 271788 351103 430418 1752614293 2006 43 7 165019 208853 252687 535301133 2006 44 7 91514 117390 143266 186537178 2006 45 7 59061 79187 99313 112842787 2006 46 0 Felipe D. Carrillo Supervisory Fishery Biologist Department of the Interior US Fish & Wildlife Service California, USA
Hi, I would be careful with imputing values. Imputing is a disputed practice. Also, you seem to have huge variation in your data with a nonlinear trend. Since your data is time series data, you may consider modeling the time series and then to predict n-time periods ahead and use these predicted values (look up arima and garch models for that, which are the most common). An alternative approach would be to fit a smoothing spline (package mgcv, function gam() ) through your time trend (leave out the missing data) and then predict for your missing data periods. BUT: Are you sure that your weeks 19 to 25 are correctly recorded? To predict weeks 17, 18, and 26 looks particularly odd given the observations you have for 19 to 25. 17, 18, and 26 could either be connected to the trend from the previous observation, i.e. be nonzero for 17 and 18, and zero for 26, or they also could be "connected" to the following observation and be zero for 17 and 18, and nonzero for 26. This is because the plain zeros in 19-25 are really odd. Are you sure that the observation was a definitve zero? If so, it does not make imputation look like a good strategy for these three values. For period 46 prediction with a time series model looks more promising, but the question here is, is it necessary (as it is the last observation anyway)? You may just want to avoid doing something wrong by using an error prone prediction for your last period. Just my two cents, Daniel Felipe Carrillo wrote:> > Hi: > I have a dataset and need to interpolate for missing days. In Excel I > either average from sampled days from above and below the missing days or > use the TREND function to make up for the missing values. I have been > reading about na.approx, is this function similar to the TREND function? > Which is the best recommendable way to make up for missing data? > Here's my dataset: weeks 17,18,26 and 46 have 0 daysSamp. > > Year Week daysSamp Lower TotalPD Upper varTotalPD > 2006 47 6 126988 188259 249530 1045878675 > 2006 48 7 189155 253350 317545 1148102355 > 2006 49 7 103300 132741 162182 241480186 > 2006 50 6 11801 252576 493352 16151006813 > 2006 51 7 2348 3671 4994 487926 > 2006 52 5 2606 29901 57196 215454181 > 2006 2 7 2968 4513 6058 664723 > 2006 3 7 1128 1889 2650 161231 > 2006 4 7 479 963 1447 65196 > 2006 5 7 2819 4413 6007 708094 > 2006 6 6 -1009 3128 7264 4766743 > 2006 7 7 -5239 10769 26777 71387835 > 2006 8 7 150 503 856 34685 > 2006 9 7 1858 2989 4120 356562 > 2006 10 7 193 494 795 25281 > 2006 11 7 125 346 567 13627 > 2006 12 7 432 767 1102 31189 > 2006 13 7 1229 1867 2505 113569 > 2006 14 7 813 1339 1865 77140 > 2006 15 4 -66 124 315 10105 > 2006 16 7 152 903 1654 157242 > 2006 17 0 > 2006 18 0 > 2006 19 5 0 0 0 0 > 2006 20 4 0 0 0 0 > 2006 21 5 0 0 0 0 > 2006 22 6 0 0 0 0 > 2006 23 7 -65 285 635 34112 > 2006 24 6 0 0 0 0 > 2006 25 7 0 0 0 0 > 2006 26 0 > 2006 27 4 228 931 1634 137726 > 2006 28 4 801 2231 3662 569977 > 2006 29 4 4544 9242 13939 6147522 > 2006 30 5 15798 28465 41131 44697915 > 2006 31 5 25398 41049 56701 68245523 > 2006 32 5 48197 82216 116235 322416917 > 2006 33 5 142980 230411 317841 2129630128 > 2006 34 5 227141 360468 493794 4952314336 > 2006 35 5 467244 756325 1045405 23281569629 > 2006 36 5 281049 463331 645614 9256900449 > 2006 37 2 227636 620330 1013023 42961663047 > 2006 38 3 478990 983472 1487954 70903343603 > 2006 39 7 539690 846522 1153354 26228718974 > 2006 40 7 320959 457866 594773 5221891252 > 2006 41 7 427561 582452 737343 6683813344 > 2006 42 7 271788 351103 430418 1752614293 > 2006 43 7 165019 208853 252687 535301133 > 2006 44 7 91514 117390 143266 186537178 > 2006 45 7 59061 79187 99313 112842787 > 2006 46 0 > > Felipe D. Carrillo > Supervisory Fishery Biologist > Department of the Interior > US Fish & Wildlife Service > California, USA > > ______________________________________________ > R-help at r-project.org mailing list > https://stat.ethz.ch/mailman/listinfo/r-help > PLEASE do read the posting guide > http://www.R-project.org/posting-guide.html > and provide commented, minimal, self-contained, reproducible code. > >-- View this message in context: http://www.nabble.com/Excel-Trend-Function-tp18423045p18423327.html Sent from the R help mailing list archive at Nabble.com.
Hi Felipe, Daniel mentions imputation is a disputed practice. There are recommendations and rules of thumb for its use. I am not sure that imputation is disputed. I would be interested to see some links to articles recommending against its use. Paul ----- Original Message ----- From: "Felipe Carrillo" <mazatlanmexico at yahoo.com> To: <r-help at stat.math.ethz.ch> Sent: Sunday, July 13, 2008 5:46 AM Subject: [R] Excel Trend Function> Hi: > I have a dataset and need to interpolate for missing days. In Excel I > either average from sampled days from above and below the missing days or > use the TREND function to make up for the missing values. I have been > reading about na.approx, is this function similar to the TREND function? > Which is the best recommendable way to make up for missing data? > Here's my dataset: weeks 17,18,26 and 46 have 0 daysSamp. > > Year Week daysSamp Lower TotalPD Upper varTotalPD > 2006 47 6 126988 188259 249530 1045878675 > 2006 48 7 189155 253350 317545 1148102355 > 2006 49 7 103300 132741 162182 241480186 > 2006 50 6 11801 252576 493352 16151006813 > 2006 51 7 2348 3671 4994 487926 > 2006 52 5 2606 29901 57196 215454181 > 2006 2 7 2968 4513 6058 664723 > 2006 3 7 1128 1889 2650 161231 > 2006 4 7 479 963 1447 65196 > 2006 5 7 2819 4413 6007 708094 > 2006 6 6 -1009 3128 7264 4766743 > 2006 7 7 -5239 10769 26777 71387835 > 2006 8 7 150 503 856 34685 > 2006 9 7 1858 2989 4120 356562 > 2006 10 7 193 494 795 25281 > 2006 11 7 125 346 567 13627 > 2006 12 7 432 767 1102 31189 > 2006 13 7 1229 1867 2505 113569 > 2006 14 7 813 1339 1865 77140 > 2006 15 4 -66 124 315 10105 > 2006 16 7 152 903 1654 157242 > 2006 17 0 > 2006 18 0 > 2006 19 5 0 0 0 0 > 2006 20 4 0 0 0 0 > 2006 21 5 0 0 0 0 > 2006 22 6 0 0 0 0 > 2006 23 7 -65 285 635 34112 > 2006 24 6 0 0 0 0 > 2006 25 7 0 0 0 0 > 2006 26 0 > 2006 27 4 228 931 1634 137726 > 2006 28 4 801 2231 3662 569977 > 2006 29 4 4544 9242 13939 6147522 > 2006 30 5 15798 28465 41131 44697915 > 2006 31 5 25398 41049 56701 68245523 > 2006 32 5 48197 82216 116235 322416917 > 2006 33 5 142980 230411 317841 2129630128 > 2006 34 5 227141 360468 493794 4952314336 > 2006 35 5 467244 756325 1045405 23281569629 > 2006 36 5 281049 463331 645614 9256900449 > 2006 37 2 227636 620330 1013023 42961663047 > 2006 38 3 478990 983472 1487954 70903343603 > 2006 39 7 539690 846522 1153354 26228718974 > 2006 40 7 320959 457866 594773 5221891252 > 2006 41 7 427561 582452 737343 6683813344 > 2006 42 7 271788 351103 430418 1752614293 > 2006 43 7 165019 208853 252687 535301133 > 2006 44 7 91514 117390 143266 186537178 > 2006 45 7 59061 79187 99313 112842787 > 2006 46 0 > > Felipe D. Carrillo > Supervisory Fishery Biologist > Department of the Interior > US Fish & Wildlife Service > California, USA > > ______________________________________________ > R-help at r-project.org mailing list > https://stat.ethz.ch/mailman/listinfo/r-help > PLEASE do read the posting guide > http://www.R-project.org/posting-guide.html > and provide commented, minimal, self-contained, reproducible code. >