Hello,

I need to subset my data to look only at the parts that have "holes" in them. I already have a formula to get rid of inconsistencies, but now I need to look only at the problem data to reconfigure it. My data set has multiple "cycles" per "patient," and I want to highlight the patients who have a variable that was not measured every cycle.

Here's a similar example of the data:

Patient, Cycle, Variable1, Variable2
A, 1, 4, 5
A, 2, 3, 3
A, 3, 4, NA
B, 1, 6, 6
B, 2, NA, 6
C, 1, 6, 5
C, 3, 2, 2

So in this case, I would want Patient A and Patient B, but not Patient C.

Thanks!
Tena koe Lib

In case you haven't received a reply to this (I didn't notice one), here is one option:

> lib
  A X1 X4 X5
1 A  2  3  3
2 A  3  4 NA
3 B  1  6  6
4 B  2 NA  6
5 C  1  6  5
6 C  3  2  2
> str(lib)
'data.frame': 6 obs. of 4 variables:
 $ A : chr "A" "A" "B" "B" ...
 $ X1: num 2 3 1 2 1 3
 $ X4: num 3 4 6 NA 6 2
 $ X5: num 3 NA 6 6 5 2
> lib1 <- aggregate(lib[,-1], list(lib[,1]), function(x) length(x[is.na(x)])>0)
> lib1[apply(lib1[,-1], 1, sum)>0,1]
[1] "A" "B"

HTH ...

Peter Alspach

-----Original Message-----
From: r-help-bounces at r-project.org [mailto:r-help-bounces at r-project.org] On Behalf Of Lib Gray
Sent: Thursday, 19 July 2012 7:30 a.m.
To: r-help at r-project.org
Subject: [R] Subsetting problem data

______________________________________________
R-help at r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.
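A sketch of the same per-patient aggregation, assuming the data are read with the header row intact so that the columns carry the names from the original post (Patient, Cycle, Variable1, Variable2). The object names `lib`, `per_var` and `flagged` are only illustrative, and only the two measured variables are checked here, not Cycle.

```r
# Example data from the original post, with the header row kept as column names
lib <- data.frame(
  Patient   = c("A", "A", "A", "B", "B", "C", "C"),
  Cycle     = c(1, 2, 3, 1, 2, 1, 3),
  Variable1 = c(4, 3, 4, 6, NA, 6, 2),
  Variable2 = c(5, 3, NA, 6, 6, 5, 2),
  stringsAsFactors = FALSE
)

# One row per patient; each variable column is TRUE if that patient
# has at least one NA in it. The data-frame method of aggregate() is
# used, so rows containing NA are not dropped before FUN is applied.
per_var <- aggregate(
  lib[, c("Variable1", "Variable2")],
  by  = list(Patient = lib$Patient),
  FUN = function(v) any(is.na(v))
)

# Patients with a missing value in at least one variable
flagged <- as.character(per_var$Patient[rowSums(per_var[, -1]) > 0])
flagged
```

The intermediate `per_var` table may be useful on its own, since it shows which variable is missing for which patient.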
Hello,

I'm not sure whether I understand the question correctly. If you want your output to include only Patients A and B, this should work:

dat1<-read.table(text="
Patient Cycle Variable1 Variable2
A 1 4 5
A 2 3 3
A 3 4 NA
B 1 6 6
B 2 NA 6
C 1 6 5
C 3 2 2
",sep="",header=TRUE)
subset(dat1,!dat1$Patient=="C")
#  Patient Cycle Variable1 Variable2
#1       A     1         4         5
#2       A     2         3         3
#3       A     3         4        NA
#4       B     1         6         6
#5       B     2        NA         6

But, if you want the patient rows that have NAs in either Variable1 or Variable2:

subset(dat1,is.na(dat1$Variable1)|is.na(dat1$Variable2))
#  Patient Cycle Variable1 Variable2
#3       A     3         4        NA
#5       B     2        NA         6

This will help to locate the patients that have missing values.

Hope it helps.
A.K.
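The two subsets in the reply above can be combined so that Patient C is not hard-coded: first find the patients that have an NA anywhere, then keep every row for those patients. This is only a sketch under the assumption that "holes" means NA values; `bad_patients` and `problem_rows` are names made up for the example.

```r
dat1 <- read.table(text = "
Patient Cycle Variable1 Variable2
A 1 4 5
A 2 3 3
A 3 4 NA
B 1 6 6
B 2 NA 6
C 1 6 5
C 3 2 2
", header = TRUE, stringsAsFactors = FALSE)

# Patients with at least one incomplete row (an NA in any column)
bad_patients <- unique(dat1$Patient[!complete.cases(dat1)])

# All rows, complete or not, that belong to those patients
problem_rows <- dat1[dat1$Patient %in% bad_patients, ]
problem_rows
```

With the example data this keeps the five rows for Patients A and B and drops both rows for Patient C.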
Hello,

Try the following.

d <- read.csv(text="
Patient, Cycle, Variable1, Variable2
A, 1, 4, 5
A, 2, 3, 3
A, 3, 4, NA
B, 1, 6, 6
B, 2, NA, 6
C, 1, 6, 5
C, 3, 2, 2
", header=TRUE)
d

compl <- lapply(split(d, d$Patient), function(x) if(all(diff(x$Cycle) == 1)) x)
holes <- lapply(split(d, d$Patient), function(x) if(any(diff(x$Cycle) != 1)) x)
do.call(rbind, compl)
do.call(rbind, holes)

In the meantime, you have posted another question, similar but apparently more complete. I'll look at it, but tell me: is this answer completely off?

If you just want to know whether there are holes, as TRUE/FALSE answers, this other version might do it.

aggregate(Cycle ~ Patient, data=d, function(x) any(diff(x) != 1))

Hope this helps,

Rui Barradas
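The replies in this thread read "holes" in two ways: an NA in a recorded cycle (Patients A and B) and a skipped cycle number (Patient C). A rough sketch that reports both flags per patient, so either reading can be picked afterwards, might look like this. The names `by_patient`, `summary_tbl`, `has_na` and `skipped_cycle` are invented for the example, and a cycle missing at the start or end of a patient's sequence would not be detected by the `diff()` check.

```r
d <- data.frame(
  Patient   = c("A", "A", "A", "B", "B", "C", "C"),
  Cycle     = c(1, 2, 3, 1, 2, 1, 3),
  Variable1 = c(4, 3, 4, 6, NA, 6, 2),
  Variable2 = c(5, 3, NA, 6, 6, 5, 2),
  stringsAsFactors = FALSE
)

# One data frame per patient
by_patient <- split(d, d$Patient)

summary_tbl <- data.frame(
  Patient = names(by_patient),
  # TRUE if any recorded cycle has an NA in Variable1 or Variable2
  has_na = vapply(
    by_patient,
    function(x) any(is.na(x[, c("Variable1", "Variable2")])),
    logical(1)
  ),
  # TRUE if consecutive recorded cycles differ by something other than 1
  skipped_cycle = vapply(
    by_patient,
    function(x) any(diff(sort(x$Cycle)) != 1),
    logical(1)
  ),
  row.names = NULL
)
summary_tbl
```

Filtering on `has_na` gives the patients the original question asks for, while `skipped_cycle` corresponds to the `diff()` check above.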