Hello! I'm trying to create a subset of a dataset and then remove all rows with NAs in them. Ultimately, I am running phylogenetic analyses with trees that require the tree tiplabels to match exactly with the rows in the dataframe. But when I use na.omit to delete the rows with NAs, there is still a trace of those omitted rows in the data.frame, which then causes an error in the phylogenetic analyses. Is there any way to completely scrub those omitted rows from the dataframe?

The code is below. As you can see from the result of the final str(Protect1) line, there are attributes with the omitted features still in the dataframe (356 species names in the UphamComplBinomial factor, but only 319 observations). These traces are causing errors with the phylo analyses.

> Protect1=as.data.frame(cbind(UphamComplBinomial, DarkEum, NoctCrep, Shade)) #Create the dataframe with variables of interest from an attached dataset
> row.names(Protect1)=Protect1$UphamComplBinomial #assign species names as rownames
> Protect1=as.data.frame(na.omit(Protect1)) #drop rows with missing data
> str(Protect1)
'data.frame': 319 obs. of 4 variables:
 $ UphamComplBinomial: Factor w/ 356 levels "Allenopithecus_nigroviridis_CERCOPITHECIDAE_PRIMATES",..: 1 2 3 4 5 8 9 10 11 12 ...
 $ DarkEum : Factor w/ 2 levels "0","1": 2 1 2 2 2 2 2 2 2 2 ...
 $ NoctCrep : Factor w/ 2 levels "0","1": 1 2 1 1 1 1 1 1 1 1 ...
 $ Shade : Factor w/ 59 levels "0.1","0.2","0.25",..: 10 58 53 17 49 52 52 39 39 41 ...
 - attr(*, "na.action")= 'omit' Named int 6 7 23 36 37 40 42 50 51 60 ...
 ..- attr(*, "names")= chr "Alouatta_macconnelli_ATELIDAE_PRIMATES" "Alouatta_nigerrima_ATELIDAE_PRIMATES" "Ateles_fusciceps_ATELIDAE_PRIMATES" "Callicebus_baptista_PITHECIIDAE_PRIMATES" ...

Dr. Ted Stankowich
Associate Professor
Department of Biological Sciences
California State University Long Beach
Long Beach, CA 90840
theodore.stankowich at csulb.edu
562-985-4826
http://www.csulb.edu/mammal-lab/
@CSULBMammalLab
Does droplevels() help?

> d <- data.frame(size = factor(c("S","M","M","L","L"), levels=c("S","M","L")), id=c(101,NA,NA,104,105))
> str(d)
'data.frame': 5 obs. of 2 variables:
 $ size: Factor w/ 3 levels "S","M","L": 1 2 2 3 3
 $ id : num 101 NA NA 104 105
> str(na.omit(d))
'data.frame': 3 obs. of 2 variables:
 $ size: Factor w/ 3 levels "S","M","L": 1 3 3
 $ id : num 101 104 105
 - attr(*, "na.action")= 'omit' Named int [1:2] 2 3
 ..- attr(*, "names")= chr [1:2] "2" "3"
> str(droplevels(na.omit(d)))
'data.frame': 3 obs. of 2 variables:
 $ size: Factor w/ 2 levels "S","L": 1 2 2
 $ id : num 101 104 105
 - attr(*, "na.action")= 'omit' Named int [1:2] 2 3
 ..- attr(*, "names")= chr [1:2] "2" "3"

Bill Dunlap
TIBCO Software
wdunlap tibco.com

On Thu, Jun 4, 2020 at 12:18 PM Ted Stankowich <Theodore.Stankowich at csulb.edu> wrote:
> Hello! I'm trying to create a subset of a dataset and then remove all rows
> with NAs in them. [...] Is there any way to completely scrub
> those omitted rows from the dataframe?
> [...]
>
> ______________________________________________
> R-help at r-project.org mailing list -- To UNSUBSCRIBE and more, see
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide
> http://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.
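To make the effect of droplevels() easier to check than by reading str() output, here is a minimal sketch on the same toy data frame. The name `cleaned` is only an illustration and is not from the thread. It suggests that droplevels() shrinks the factor to the levels actually present, while the "na.action" attribute added by na.omit() stays attached.

```r
# Toy data mirroring the example above: a factor column and a numeric column with NAs.
d <- data.frame(
  size = factor(c("S", "M", "M", "L", "L"), levels = c("S", "M", "L")),
  id   = c(101, NA, NA, 104, 105)
)

# Drop incomplete rows, then drop factor levels that no longer occur.
cleaned <- droplevels(na.omit(d))

# The unused level "M" is gone from the factor ...
print(levels(cleaned$size))

# ... but the bookkeeping attribute from na.omit() is still listed here.
print(names(attributes(cleaned)))
```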
Thanks, but no, that doesn't work. The na.omit attributes are still in the dataframe, which you can see in the str outputs from the post. The problem line is likely:

 - attr(*, "na.action")= 'omit' Named int [1:2] 2 3

From: William Dunlap [mailto:wdunlap at tibco.com]
Sent: Thursday, June 4, 2020 12:39 PM
To: Ted Stankowich <Theodore.Stankowich at csulb.edu>
Cc: r-help at r-project.org
Subject: Re: [R] na.omit not omitting rows

> Does droplevels() help?
> [...]
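If the remaining "na.action" attribute really is what the downstream code objects to (that is an assumption; the thread does not confirm the cause of the error), there are two simple ways to end up with a data frame that carries no trace of the dropped rows. The sketch below uses the toy data from the reply above; `opt1` and `opt2` are illustrative names only.

```r
# Same toy data as in the reply above.
d <- data.frame(
  size = factor(c("S", "M", "M", "L", "L"), levels = c("S", "M", "L")),
  id   = c(101, NA, NA, 104, 105)
)

# Option 1: keep na.omit(), drop unused levels, then remove the
# "na.action" attribute by assigning NULL to it.
opt1 <- droplevels(na.omit(d))
attr(opt1, "na.action") <- NULL

# Option 2: subset with complete.cases(), which selects the complete rows
# without attaching an "na.action" attribute in the first place.
opt2 <- droplevels(d[complete.cases(d), , drop = FALSE])

# Neither result should list "na.action" or the unused level "M".
str(opt1)
str(opt2)
```

For the data frame in the original post, the same two steps would presumably be applied to Protect1 in place of d.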