Sashikanth Chandrasekaran
2013-Oct-30 00:39 UTC
[R] Fitting multiple horizontal lines to data
Dear R-users,

I am trying to fit my data using one or more horizontal lines. If my data are in "y", I understand that "lm(y~1)" will fit a single horizontal line at mean(y). However, I want to fit the data with multiple horizontal lines if that reduces the error, while keeping the number of horizontal lines as small as possible.

Concretely, assume:

    y=c(2,4,2,4,2,4,2,4,8,10,8,10,8,10,8,10)

lm(y~1) fits a single horizontal line at y=6. A better fit using multiple horizontal lines would be 2 horizontal lines at y = 3 and y = 9. An even better fit (if the objective is solely to minimize error, without penalizing the number of horizontal lines) would be 4 horizontal lines at y=2, y=4, y=8, y=10.

kmeans works for the simple example I have shown, but I would like advice on whether there is a better method that will work when the data do not fit horizontal lines exactly.

Thanks,
-sashi.
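The toy comparison in this message can be reproduced in a few lines of base R. This is only a minimal sketch: the names fit1, rss1, km and rss2 are illustrative, and kmeans() clusters the values alone, ignoring the order in which they occur.

```r
# Toy data from the message
y <- c(2, 4, 2, 4, 2, 4, 2, 4, 8, 10, 8, 10, 8, 10, 8, 10)

# One horizontal line: the intercept-only model sits at mean(y)
fit1 <- lm(y ~ 1)
coef(fit1)                    # intercept is 6
rss1 <- sum(resid(fit1)^2)    # squared error around the single line

# Two horizontal lines: 1-D k-means on the values (order is ignored)
set.seed(1)                   # kmeans() starts from random centers
km <- kmeans(y, centers = 2, nstart = 10)
sort(as.numeric(km$centers))  # levels 3 and 9
rss2 <- km$tot.withinss       # squared error around the two lines

c(one_line = rss1, two_lines = rss2)
```

The squared error drops from 160 with one line to 16 with two, and it reaches 0 with four lines, which is why some penalty on the number of lines is needed to keep the fit from using one line per distinct value.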
Your question doesn't make much sense if you really believe that the best fit is to draw a horizontal line at every unique value of y. What is the actual problem you are trying to solve? Clearly it's not a matter of linear fits, so forget about using "lm" or other regression tools.
Sashikanth Chandrasekaran
2013-Nov-06 17:19 UTC
[R] Fitting multiple horizontal lines to data
I am not trying to fit a horizontal line at every unique value of y. I am trying to fit the y values with as few horizontal lines as possible, trading off the number of horizontal lines against the error. The actual problem I am trying to solve is to smooth data in a time series. Here is a realistic example of y:

    y=c(134.45,141.82,143.81,141.81,145,141.61,143.72,145.71,200,175,140,200,148.77,71.64,111.57,118.15,119.15,112.8,111.64,111.64,157.26,143.8,40.19,64.99,64.99,129.98,64.99,65,64.98,64.99)

An example fit for y using multiple horizontal lines (it may not be the best fit in terms of squared error or another error metric, but I have included the y values for concreteness):

1. A horizontal line at approximately y=140 (to fit the first 13 values - 134.45 to 148.77)
2. A horizontal line at approximately y=110 (to fit the next 7 values - 71.64 to 111.64)
3. A horizontal line at approximately y=150 (to fit the next 2 values - 157.26 to 143.8)
4. A horizontal line at approximately y=65 (to fit the last 8 values - 40.19 to 64.99)

-sashi.
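One way to make this trade-off concrete is to minimize the squared error plus a fixed penalty per line. That assumes each line should cover a contiguous run of the series and that squared error is the metric of interest. The sketch below does this with dynamic programming in base R. The function fit_steps and its penalty argument are hypothetical names, not from this thread or from any package, and the penalty value in the usage line is arbitrary and would need tuning.

```r
# Hypothetical helper: piecewise-constant least-squares fit over contiguous
# segments, minimizing (sum of within-segment squared error) + penalty * (number
# of segments) by dynamic programming.
fit_steps <- function(y, penalty) {
  n  <- length(y)
  s1 <- c(0, cumsum(y))       # prefix sums of y
  s2 <- c(0, cumsum(y^2))     # prefix sums of y^2
  # Squared error of fitting mean(y[i:j]); vectorized over the start index i
  sse <- function(i, j) {
    m <- j - i + 1
    (s2[j + 1] - s2[i]) - (s1[j + 1] - s1[i])^2 / m
  }
  best <- c(0, rep(Inf, n))   # best[j + 1]: optimal cost for y[1:j]
  prev <- integer(n)          # prev[j]: start of the last segment of y[1:j]
  for (j in seq_len(n)) {
    i    <- seq_len(j)        # candidate starts for the last segment
    cost <- best[i] + sse(i, j) + penalty
    k    <- which.min(cost)
    best[j + 1] <- cost[k]
    prev[j]     <- k
  }
  # Walk back from the end to recover the segment starts
  starts <- integer(0)
  j <- n
  while (j > 0) {
    starts <- c(prev[j], starts)
    j <- prev[j] - 1
  }
  ends <- c(starts[-1] - 1, n)
  seg  <- rep(seq_along(starts), ends - starts + 1)
  data.frame(start = starts, end = ends,
             level = as.numeric(tapply(y, seg, mean)))
}

# Realistic series from the message
y <- c(134.45, 141.82, 143.81, 141.81, 145, 141.61, 143.72, 145.71, 200, 175,
       140, 200, 148.77, 71.64, 111.57, 118.15, 119.15, 112.8, 111.64, 111.64,
       157.26, 143.8, 40.19, 64.99, 64.99, 129.98, 64.99, 65, 64.98, 64.99)

# Arbitrary penalty (an assumption): larger values give fewer lines
fit_steps(y, penalty = 1000)
```

Raising the penalty merges segments and lowering it splits them, so trying a few values shows how the number of lines trades off against the error. Unlike k-means on the values, this keeps each line tied to a stretch of time, so an isolated spike either gets its own short segment or is absorbed into its neighbours.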