Hi.

I need to run a regression analysis for groups of data of fixed length: 100 As, 100 Bs, 100 Cs etc.

e.g.

x
Key  Value
A    1
A    21.2
A    4
A    6.5
...repeat 96 times with differing values of A
B    1
B    2.3
B    NA
B    6.5
...repeat 96 times with differing values of B
etc.

I run these against a linear model using

    tapply(data$Value, data$Key, FUN = regr, 100)

where

    regr <- function(x, w)
    {
      # run the model against the last w values of x
      y <- x[(length(x) - w + 1):length(x)]
      lm(y ~ myModel(w))
    }

In the results, I want to return NA for any Key group where one or more of the values is NA. If I run the above, lm silently drops the missing values and returns a fitted regression even for groups that contain NA. Using na.action=na.fail or na.action=NULL raises an error that aborts the whole tapply call, and I get nothing. Is there a way to get lm to return NA if any of the values in a group are NA, but a valid fit for complete data?

I realise that I could remove the groups with NAs, but I'm running the regressions over multiple time periods and most of the data groups will have a full complement of data for at least some of these periods. It becomes a pain to manage NAs if I do that.

Sorry if the above is a little unclear.

Thanks

Neil
See ?na.exclude

On Fri, 23 Jan 2009, Neil Beddoe wrote:

> In the results, I want to return NA for any Key group where one or more of the values is NA. If I run the above I get a regression structure ignoring the missing values and returning values for data that contains NA. Using na.action=na.fail or na.action=NULL causes the whole tapply function to fail and I get nothing. Is there a way I can get lm to return NA if any of the values in the data are NA but valid numbers for complete data?

-- 
Brian D. Ripley,                  ripley at stats.ox.ac.uk
Professor of Applied Statistics,  http://www.stats.ox.ac.uk/~ripley/
University of Oxford,             Tel:  +44 1865 272861 (self)
1 South Parks Road,                     +44 1865 272866 (PA)
Oxford OX1 3TG, UK                Fax:  +44 1865 272595
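
For readers following this thread, here is a minimal sketch of how the pieces might fit together. It is not taken from either poster. The toy data, the stand-in myModel() (a plain time trend) and the names regr_na, regr_fail and fit_b are all assumptions made for illustration. It shows two ways to get NA for a whole group that contains missing values (an explicit anyNA() check, or na.fail wrapped in tryCatch() so the error does not abort tapply), and then what na.exclude does: the incomplete case is still dropped from the fit, but residuals are padded with NA back to the original length.

```r
# Hypothetical toy data: two groups of 10 values; group B has one NA.
set.seed(1)
dat <- data.frame(
  Key   = rep(c("A", "B"), each = 10),
  Value = rnorm(20)
)
dat$Value[13] <- NA   # third observation of group B

# Stand-in for the poster's myModel(w) (an assumption): a linear time trend.
myModel <- function(w) seq_len(w)

# Option 1: check for missing values before fitting.
regr_na <- function(x, w) {
  y <- tail(x, w)               # last w values of x
  if (anyNA(y)) return(NA)      # whole group becomes NA
  lm(y ~ myModel(w))
}

# Option 2: let na.fail raise the error, and convert the error to NA.
regr_fail <- function(x, w) {
  y <- tail(x, w)
  tryCatch(lm(y ~ myModel(w), na.action = na.fail),
           error = function(e) NA)
}

# Because the results are not scalars, tapply returns a list-like array.
fits  <- tapply(dat$Value, dat$Key, FUN = regr_na, 10)
fits2 <- tapply(dat$Value, dat$Key, FUN = regr_fail, 10)

# na.exclude: the fit still uses the 9 complete cases of group B,
# but residuals() and fitted() keep the original length of 10.
y_b   <- dat$Value[dat$Key == "B"]
fit_b <- lm(y_b ~ myModel(10), na.action = na.exclude)
res_b <- residuals(fit_b)
```

With either option, fits[["A"]] is an ordinary lm object and fits[["B"]] is NA, so the incomplete groups can be picked out afterwards with something like sapply(fits, inherits, "lm").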