Removing rows with NAs, using na.omit(), doesn't seem to be working for me.

Dataset:

> str ( ex10s )
'data.frame':   2189576 obs. of  5 variables:
 $ LOPNR  : int  58 58 58 58 64 64 64 64 64 64 ...
 $ DIAGNOS: Factor w/ 173 levels "F20","F200","F2000",..: 128 128 128 128 105 105 105 160 105 105 ...
 $ X_DATE : int  20060821 20061207 20080102 20090904 20010327 20010925 20020307 20021007 20021007 20030320 ...
 $ SOURCE : int  2 2 2 2 2 2 2 2 2 1 ...
 $ dg     : Factor w/ 7 levels "0","1","2","3",..: 6 6 6 6 5 5 5 6 5 5 ...

The only NAs are in the factor dg (put in by 'recode' from the car library; I'm trying to eliminate cases with particular factor levels):

> table ( ex10s$dg )

      0       1       2       3       4       5      NA
   2851  271501   63112   98425  335593 1257299  160795

So, I remove the rows with NAs, to a new data frame ex10ss:

> ex10ss<-na.omit(ex10s)

Check that all the NAs have been removed:

> table(ex10ss$dg)

      0       1       2       3       4       5      NA
   2851  271501   63112   98425  335593 1257299  160795

> dim(ex10s)
[1] 2189576       5
> dim(ex10ss)
[1] 2189576       5

Nothing seems to have changed. I want all the rows containing NA removed.

I am clearly doing something wrong.

The only alternative I could find is pretty similar:

use <- complete.cases ( ex10 )
ex10ss<-ex10s[use,]

which leads to the same result.

Stuart

Dr Stuart John Leask DM FRCPsych MB Mchir
Clinical Senior Lecturer and Honorary Consultant Psychiatrist
Institute of Mental Health, Innovation Park
Triumph Road, Nottingham, Notts. NG7 2TU. UK
Tel. +44 115 82 30419
stuart.leask@nottingham.ac.uk
Google 'Dr Stuart Leask'
Hi

Both na.omit() and complete.cases() work smoothly for me when NA is not a valid level of the factor. If NA is a valid level, as seems to be the case here, you need to reset the factor levels so that NA is no longer one of them:

ex10s$dg <- factor( ex10s$dg )

Both commands should work then.

Regards
Petr

> Removing rows with NAs, using na.omit(), doesn't seem to be working for me.
> [...]
> ______________________________________________
> R-help at r-project.org mailing list
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.
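A minimal sketch of the factor() reset on made-up data may help here. The data frame toy and its columns are hypothetical, not the original ex10s, and the sketch assumes NA has been made an explicit factor level (as addNA() does). If the level is instead the literal string "NA", re-creating the factor alone would not turn it into a missing value.

```r
# Hypothetical toy data: NA is an explicit level of the factor (via addNA()).
toy <- data.frame(id = 1:4,
                  dg = addNA(factor(c("0", "1", NA, "5"))))

# is.na() looks at the integer codes, which are all non-missing here,
# so na.omit() keeps every row.
rows_before <- nrow(na.omit(toy))

# Re-creating the factor drops NA from the levels, turning those
# entries back into genuine missing values.
toy$dg <- factor(toy$dg)
rows_after <- nrow(na.omit(toy))

print(c(before = rows_before, after = rows_after))
```

On this toy data the row with the missing dg is dropped only after the factor has been re-created.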
On 22/06/2012 09:41, Stuart Leask wrote:
> Removing rows with NAs, using na.omit(), doesn't seem to be working for me.

It won't if NA is a level of the factor, which is what you seem to have here. For

> table(as.factor(c(1,2,NA)))

1 2
1 1

omits NAs by default.

> [...]

--
Brian D. Ripley,                  ripley at stats.ox.ac.uk
Professor of Applied Statistics,  http://www.stats.ox.ac.uk/~ripley/
University of Oxford,             Tel:  +44 1865 272861 (self)
1 South Parks Road,                     +44 1865 272866 (PA)
Oxford OX1 3TG, UK                Fax:  +44 1865 272595
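As a rough illustration of the point about table(), the sketch below contrasts a genuinely missing value with a factor level spelled "NA". The vectors x and s are made up for the example, and the useNA argument is shown only as one way to make genuine NAs visible in a tabulation.

```r
# Hypothetical vectors for illustration only.
x <- c(1, 2, NA)                    # a genuine missing value
s <- factor(c("1", "2", "NA"))      # the string "NA" as an ordinary level

# By default table() leaves genuine NAs out of the counts.
default_counts <- table(as.factor(x))

# useNA = "ifany" adds a column for genuine NAs when there are any.
with_na_counts <- table(as.factor(x), useNA = "ifany")

# A level that is merely the string "NA" is counted like any other level,
# and is.na() does not flag it.
string_counts <- table(s)

print(default_counts)
print(with_na_counts)
print(string_counts)
print(sum(is.na(s)))
```

If a count appears under NA in a default table() call, that suggests the value is being treated as a level rather than as a missing value.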
On 2012-06-22 01:41, Stuart Leask wrote:
> Removing rows with NAs, using na.omit(), doesn't seem to be working for me.
> [...]
> The only NAs are in the factor dg (put in by 'recode' from the car library; I'm trying to eliminate cases with particular factor levels)
>
>> table ( ex10s$dg )
>
>       0       1       2       3       4       5      NA
>    2851  271501   63112   98425  335593 1257299  160795

This shows that what you think are missing values (NAs) R considers to be values at the factor level "NA". If you do

levels(ex10s$dg)

you should see "NA" as one of the levels. This probably resulted from incorrect data import. When you print ex10s$dg, you should see missing values printed as <NA>, not NA.

Either re-import the data or run

is.na(ex10s$dg) <- ex10s$dg == "NA"
ex10s$dg <- factor(ex10s$dg)  ## to remove the superfluous level

Peter Ehlers

> [...]
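The two-step fix can be tried end to end on a small made-up data frame. This is only a sketch: toy and its columns are hypothetical stand-ins for ex10s, and it assumes the unwanted level is the literal string "NA".

```r
# Hypothetical toy data in which "NA" is an ordinary string level.
toy <- data.frame(id = 1:6,
                  dg = factor(c("0", "1", "NA", "5", "NA", "4")))

# Before the fix nothing is missing as far as R is concerned.
n_before <- nrow(na.omit(toy))

# Step 1: mark the entries at level "NA" as genuinely missing.
is.na(toy$dg) <- toy$dg == "NA"

# Step 2: re-create the factor to drop the now-unused "NA" level.
toy$dg <- factor(toy$dg)

# na.omit() can now find and remove the incomplete rows.
cleaned <- na.omit(toy)

print(n_before)
print(nrow(cleaned))
print(levels(cleaned$dg))
```

After both steps, levels(cleaned$dg) no longer contains "NA" and the two affected rows are gone.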