raphael.felber at art.admin.ch
2012-Dec-13 08:20 UTC
[R] remove NA in df results in NA, NA.1 ... rows
Good morning!

I have the following data frame (df):

    X.outer  Y.outer   X.PAD1   Y.PAD1   X.PAD2 Y.PAD2   X.PAD3 Y.PAD3   X.PAD4 Y.PAD4
73 574690.0 179740.0 574690.2 179740.0 574618.3 179650 574729.2 179674 574747.1 179598
74 574680.6 179737.0 574693.4 179740.0 574719.0 179688 574831.8 179699 574724.9 179673
75 574671.0 179734.0 574696.2 179740.0 574719.0 179688 574807.8 179787 574729.2 179674
76 574663.6 179736.0 574699.1 179734.0 574723.5 179678 574703.4 179760 574831.8 179699
77 574649.9 179734.0 574704.7 179724.0 574724.9 179673 574702.4 179755 574852.3 179626
78 574647.3 179742.0 574706.9 179719.0 574747.1 179598 574702.0 179754 574747.1 179598
79 574633.6 179739.0 574711.4 179710.0 574641.8 179570 574698.0 179747       NA     NA
80 574634.9 179732.0 574716.6 179698.0 574639.6 179573 574700.2 179738       NA     NA
81 574616.5 179728.6 574716.7 179695.0 574618.3 179650 574704.4 179729       NA     NA
82 574615.4 179731.0 574718.2 179690.0       NA     NA 574708.1 179724       NA     NA
83 574614.4 179733.6 574719.1 179688.0       NA     NA 574709.3 179720       NA     NA
...
44 574702.0 179754.0       NA       NA       NA     NA       NA     NA       NA     NA
45 574695.1 179751.0       NA       NA       NA     NA       NA     NA       NA     NA
46 574694.4 179752.0       NA       NA       NA     NA       NA     NA       NA     NA

Which I subset to

df2 <- df[,c("X.PAD2","Y.PAD2")]
df2

     X.PAD2 Y.PAD2
73 574618.3 179650
74 574719.0 179688
75 574719.0 179688
76 574723.5 179678
77 574724.9 179673
78 574747.1 179598
79 574641.8 179570
80 574639.6 179573
81 574618.3 179650
82       NA     NA
83       NA     NA
...
44       NA     NA
45       NA     NA
46       NA     NA

followed by removing the NA's using

df2 <- df2[!is.na(df2),]

If I now call df2, I get:

       X.PAD2 Y.PAD2
73   574618.3 179650
74   574719.0 179688
75   574719.0 179688
76   574723.5 179678
77   574724.9 179673
78   574747.1 179598
79   574641.8 179570
80   574639.6 179573
81   574618.3 179650
NA         NA     NA
NA.1       NA     NA
NA.2       NA     NA
NA.3       NA     NA
NA.4       NA     NA
NA.5       NA     NA
NA.6       NA     NA
NA.7       NA     NA
NA.8       NA     NA

It seems there are still NA's in my data frame. How can I get rid of them? What is the meaning of the rows numbered NA, NA.1 and so on?

Thanks for any hints.

Best regards
Raphael Felber

Hi Raphael,

see below.

> I have the following data frame (df):
> ...
>
> df2
>
>      X.PAD2 Y.PAD2
> 73 574618.3 179650
> 74 574719.0 179688
> 75 574719.0 179688
> 76 574723.5 179678
> 77 574724.9 179673
> 78 574747.1 179598
> 79 574641.8 179570
> 80 574639.6 179573
> 81 574618.3 179650
> 82       NA     NA
> 83       NA     NA
> ...
> 44       NA     NA
> 45       NA     NA
> 46       NA     NA
>
> followed by removing the NA's using
>
> df2 <- df2[!is.na(df2),]
>
> ...

is.na(df2) produces a logical matrix (!) with one entry per cell of the data frame, not one entry per row. You are then indexing the rows of your data frame with that matrix, which is "converted" into a vector of all its elements, producing far too many logical indices for your task (so to say). I assume you should be using

na.omit(df2)

instead.

Hth -- Gerrit

---------------------------------------------------------------------
Dr. Gerrit Eichner                    Mathematical Institute, Room 212
gerrit.eichner at math.uni-giessen.de  Justus-Liebig-University Giessen
Tel: +49-(0)641-99-32104              Arndtstr. 2, 35392 Giessen, Germany
Fax: +49-(0)641-99-32109              http://www.uni-giessen.de/cms/eichner
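
The effect Gerrit describes can be reproduced on a small scale, which may also answer the question about the rows labelled NA, NA.1 and so on. The sketch below uses a three-row toy data frame (made-up stand-in for df2, not the poster's full data). The logical matrix has twice as many elements as there are rows, so every TRUE in its second half asks for a row that does not exist; R returns an all-NA row for each of those and makes the repeated "NA" row names unique by appending .1, .2, ...

```r
# Toy stand-in for the poster's df2 (assumed values, only three rows).
df2 <- data.frame(X.PAD2 = c(574618.3, 574719.0, NA),
                  Y.PAD2 = c(179650, 179688, NA))

m   <- !is.na(df2)      # 3 x 2 logical matrix, one entry per cell
idx <- as.vector(m)     # what the row index effectively becomes
length(idx)             # 6, i.e. twice the number of rows

# Positions 4 and 5 are TRUE but point past row 3, so they yield all-NA rows.
bad <- df2[m, ]
rownames(bad)           # the out-of-range rows carry NA-style row names

# na.omit() works row-wise and drops every row containing at least one NA.
good <- na.omit(df2)
```

With the poster's data the same thing happens nine times over: nine non-NA entries in Y.PAD2 sit in the second half of the index, giving the nine rows NA, NA.1, ..., NA.8.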

df2 <- df2[!is.na(df2),]

isn't doing what you want it to do, because df2 is a data.frame and not a vector. To solve your problem, review

http://stackoverflow.com/questions/4862178/r-remove-rows-with-nas-in-data-frame

______________________________________________
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.
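
To illustrate the data.frame-versus-vector point in this reply: row indexing needs one logical value per row. A minimal sketch, assuming a made-up three-row df2, of two ways to build such a vector by hand:

```r
# Toy stand-in for df2 (assumed values).
df2 <- data.frame(X.PAD2 = c(574618.3, NA, 574719.0),
                  Y.PAD2 = c(179650, NA, 179688))

# One logical per row: TRUE when the row contains no NA in any column.
keep <- rowSums(is.na(df2)) == 0
kept <- df2[keep, ]

# Testing a single column also works, because df2$X.PAD2 is a plain vector.
kept_x <- df2[!is.na(df2$X.PAD2), ]
```

The single-column test only gives the same result as the all-column test when the NAs line up across columns, as they appear to in the posted df2.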

is.na(df2) is not doing what you think it is doing. Perhaps you should read ?na.omit.

---------------------------------------------------------------------------
Jeff Newmiller                        The     .....       .....  Go Live...
DCN:<jdnewmil at dcn.davis.ca.us>      Basics: ##.#.       ##.#.  Live Go...
                                      Live:   OO#.. Dead: OO#..  Playing
Research Engineer (Solar/Batteries            O.O#.       #.O#.  with
/Software/Embedded Controllers)               .OO#.       .OO#.  rocks...1k
---------------------------------------------------------------------------
Sent from my phone. Please excuse my brevity.
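
As a companion to the pointer to ?na.omit: besides dropping incomplete rows, na.omit() records which rows it removed in an "na.action" attribute of the result. A short sketch on made-up data (the four-row df2 here is an assumption for illustration):

```r
# Toy stand-in for df2 (assumed values).
df2 <- data.frame(X.PAD2 = c(574618.3, NA, 574719.0, NA),
                  Y.PAD2 = c(179650, NA, 179688, NA))

clean <- na.omit(df2)

# Positions of the removed rows, stored with class "omit".
dropped <- attr(clean, "na.action")
as.integer(dropped)
```

This can be handy for checking afterwards that only the expected rows were discarded.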

You can use "complete.cases":

df <- df[complete.cases(df), ]

-- 
Jim Holtman
Data Munger Guru

What is the problem that you are trying to solve?
Tell me what you want to do, not how you want to do it.
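
One caveat worth sketching: applied to the full df, complete.cases() looks at every column, so in the posted data it would presumably also drop rows such as 79 to 81, where only X.PAD4 and Y.PAD4 are NA. If the aim is to filter on the PAD2 columns alone, the test can be restricted to them. The four-column df below is a made-up, shortened stand-in:

```r
# Shortened stand-in for the poster's df (assumed values, fewer columns).
df <- data.frame(X.outer = c(574690.0, 574680.6, 574615.4),
                 X.PAD2  = c(574618.3, 574719.0, NA),
                 Y.PAD2  = c(179650, 179688, NA),
                 X.PAD4  = c(574747.1, NA, NA))

# Every column must be non-NA: only the first row survives.
all_cols <- df[complete.cases(df), ]

# Only the PAD2 columns are tested: rows 1 and 2 survive, all columns kept.
pad2_only <- df[complete.cases(df[, c("X.PAD2", "Y.PAD2")]), ]
```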

Hi,

You could use either:

na.omit(df2)  # this option was already suggested
# or
df2[complete.cases(df2), ]

# In this case, this should also work:
sapply(df2, function(x) x[!is.na(x)])
# or
apply(df2, 2, function(x) x[!is.na(x)])

# If the NAs are not in the same rows, the output will be a list whose elements differ in length.

A.K.
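
The remark about the list output can be checked with two small made-up data frames (same_rows and diff_rows are hypothetical names, not from the thread). When every column loses the same number of values, sapply() simplifies the result to a matrix; when the counts differ, it cannot simplify and returns a list:

```r
# NAs in the same row of both columns.
same_rows <- data.frame(a = c(1, 2, NA), b = c(4, 5, NA))
# NAs in different rows, so the columns keep different numbers of values.
diff_rows <- data.frame(a = c(1, NA, NA), b = c(4, 5, NA))

drop_na <- function(x) x[!is.na(x)]

res_same <- sapply(same_rows, drop_na)  # equal lengths: a 2 x 2 matrix
res_diff <- sapply(diff_rows, drop_na)  # unequal lengths: a list
```

Note that even in the first case the result is a matrix rather than a data frame, and the original row names are not carried over, so na.omit() or complete.cases() is probably the safer choice when the row labels matter.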