Dear R-community,
I am using R (V 2.14.1) on Windows 7. I have a dataset which consists of 19
variables for 91 individuals (rows). Two of my variables are Age
(adult/chick, with no NA values) and Sex (0 for females/1 for males, with
quite a few NA values). The sex of many adult birds is unknown (entered as
NA in the dataframe). At some point in my analyses I need to
work with only adult males, so I tried subsetting the dataframe as follows
(see code below), but I get a new dataframe containing all the males plus
a lot of unneeded rows filled with NAs, such as rows 1-7, 13, 14, 19 and
21-30. I suspect this is caused by NAs in the variable Sex, because
everything works fine (I get a dataframe containing adults) if I run the same
code without the "& Data$Sex == 1" part.
How can I fix this problem? Is there a straightforward way of subsetting
efficiently when NAs are present in the original dataset?
Thank you so much!
Luciano
adult.males <- Data[Data$Category == "Adult" & Data$Sex == 1,]
adult.males
ID Category Sex Beak Head
NA <NA> <NA> NA NA NA
NA.1 <NA> <NA> NA NA NA
NA.2 <NA> <NA> NA NA NA
NA.3 <NA> <NA> NA NA NA
NA.4 <NA> <NA> NA NA NA
NA.5 <NA> <NA> NA NA NA
NA.6 <NA> <NA> NA NA NA
9 LAA10 Adult 1 57.40 121.95
10 LAA11 Adult 1 56.40 113.00
11 LAA12 Adult 1 52.00 111.85
13 LAA14 Adult 1 56.55 124.85
15 LAA16 Adult 1 57.15 120.10
NA.7 <NA> <NA> NA NA NA
NA.8 <NA> <NA> NA NA NA
21 LAA22 Adult 1 56.85 117.35
22 LAA23 Adult 1 54.80 117.45
27 LAA28 Adult 1 59.00 116.75
28 LAA29 Adult 1 55.95 124.25
NA.9 <NA> <NA> NA NA NA
30 LAA31 Adult 1 57.70 112.80
NA.10 <NA> <NA> NA NA NA
NA.11 <NA> <NA> NA NA NA
NA.12 <NA> <NA> NA NA NA
NA.13 <NA> <NA> NA NA NA
NA.14 <NA> <NA> NA NA NA
NA.15 <NA> <NA> NA NA NA
NA.16 <NA> <NA> NA NA NA
NA.17 <NA> <NA> NA NA NA
NA.18 <NA> <NA> NA NA NA
NA.19 <NA> <NA> NA NA NA
Hello,
Try ?is.na.
In the example below I've changed your first row name from NA to NA.0
x <- read.table(text="
ID Category Sex Beak Head
NA.0 <NA> <NA> NA NA NA
NA.1 <NA> <NA> NA NA NA
NA.2 <NA> <NA> NA NA NA
NA.3 <NA> <NA> NA NA NA
NA.4 <NA> <NA> NA NA NA
NA.5 <NA> <NA> NA NA NA
NA.6 <NA> <NA> NA NA NA
9 LAA10 Adult 1 57.40 121.95
10 LAA11 Adult 1 56.40 113.00
11 LAA12 Adult 1 52.00 111.85
13 LAA14 Adult 1 56.55 124.85
15 LAA16 Adult 1 57.15 120.10
NA.7 <NA> <NA> NA NA NA
NA.8 <NA> <NA> NA NA NA
21 LAA22 Adult 1 56.85 117.35
22 LAA23 Adult 1 54.80 117.45
27 LAA28 Adult 1 59.00 116.75
28 LAA29 Adult 1 55.95 124.25
NA.9 <NA> <NA> NA NA NA
30 LAA31 Adult 1 57.70 112.80
NA.10 <NA> <NA> NA NA NA
NA.11 <NA> <NA> NA NA NA
NA.12 <NA> <NA> NA NA NA
NA.13 <NA> <NA> NA NA NA
NA.14 <NA> <NA> NA NA NA
NA.15 <NA> <NA> NA NA NA
NA.16 <NA> <NA> NA NA NA
NA.17 <NA> <NA> NA NA NA
NA.18 <NA> <NA> NA NA NA
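NA.19 <NA> <NA> NA NA NA
", header=TRUE, na.strings=c("NA", "<NA>"))
# Replace the printed row names with 1..n
rownames(x) <- seq.int(nrow(x))
head(x)
# Flag the rows where Category or Sex is missing
i1 <- is.na(x$Category)
i2 <- is.na(x$Sex)
# Keep only the rows where both are present
x[!i1 & !i2, ]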
Hope this helps,
Rui Barradas
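A minimal sketch of why the NA rows show up in the first place, and of folding the is.na() test into the original condition. The data frame below is a made-up toy example (the column names follow the thread, the values are invented, not Luciano's data). Comparing NA with == gives NA rather than FALSE, and an NA in a logical row index returns a row filled with NA.

```r
# Hypothetical toy data: column names follow the thread, values are invented
Data <- data.frame(
  ID       = c("A1", "A2", "A3", "C1", "A4"),
  Category = c("Adult", "Adult", "Adult", "Chick", "Adult"),
  Sex      = c(1, NA, 0, NA, 1),
  stringsAsFactors = FALSE
)

# TRUE & NA is NA, while FALSE & NA is FALSE, so only adults of
# unknown sex end up as NA in the row index (chicks become FALSE)
cond <- Data$Category == "Adult" & Data$Sex == 1
cond

# Each NA in a logical row index produces one all-NA row
with_na_rows <- Data[cond, ]
with_na_rows

# Requiring a non-missing Sex turns those NA entries into FALSE
adult.males <- Data[Data$Category == "Adult" &
                    !is.na(Data$Sex) & Data$Sex == 1, ]
adult.males
```

This would also explain why dropping the "& Data$Sex == 1" part works: Category has no NA values, so the condition never evaluates to NA.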
Hi,
I usually do it in 2 lines:
selection <- which(Data$Category == "Adult" & Data$Sex == 1)
Data[selection, ]
could be what you want.
Or you can do
adult.males <- adult.males[!is.na(adult.males$Sex), ]
Regards
Petr
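For comparison, here is a small sketch of three ways to subset so that NA in the condition is not carried into the result: which(), subset(), and %in%. The toy data frame is hypothetical (invented values, column names taken from the thread), and the object names by_which, by_subset and by_in are only illustrative.

```r
# Hypothetical toy data: column names follow the thread, values are invented
Data <- data.frame(
  ID       = c("A1", "A2", "A3", "C1", "A4"),
  Category = c("Adult", "Adult", "Adult", "Chick", "Adult"),
  Sex      = c(1, NA, 0, NA, 1),
  stringsAsFactors = FALSE
)

# which() returns only the positions where the condition is TRUE,
# so both FALSE and NA entries are left out
selection <- which(Data$Category == "Adult" & Data$Sex == 1)
by_which <- Data[selection, ]

# subset() treats NA in the condition as FALSE
by_subset <- subset(Data, Category == "Adult" & Sex == 1)

# %in% never returns NA: a missing Sex simply fails to match 1
by_in <- Data[Data$Category == "Adult" & Data$Sex %in% 1, ]

by_which
by_subset
by_in
```

All three should return the same adult males here. subset() is convenient for interactive work, while which() or an explicit !is.na() test may be the safer choice inside functions.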