Christopher Desjardins
2013-Aug-16 19:02 UTC
[R] Randomly drop a percent of data from a data.frame
Hi,

I have the following data.

> set.seed(6245)
> data <- data.frame(x1=rnorm(5),x2=rnorm(5),x3=rnorm(5),x4=rnorm(5))
> round(data,digits=3)
      x1     x2     x3     x4
1  0.482  1.320 -0.859 -0.142
2 -0.753 -0.041 -0.063  0.886
3  0.028 -0.256 -0.069  0.354
4 -0.086  0.475  0.244  0.781
5  0.690 -0.181  1.274  1.633

What I would like to do is drop 20% of the data. But I want this 20% to only come from dropping data from x3 and x4. It doesn't have to be evenly, i.e. I don't care to drop 2 from x3 and 2 from x4 or make sure only one observation has missing data on only one variable. I just want to drop 20% of the data through x3 and x4 only.

In other words,

      x1     x2     x3     x4
1  0.482  1.320 -0.859     NA
2 -0.753 -0.041 -0.063  0.886
3  0.028 -0.256     NA  0.354
4 -0.086  0.475     NA  0.781
5  0.690 -0.181     NA  1.633

OR

      x1     x2     x3     x4
1  0.482  1.320     NA -0.142
2 -0.753 -0.041 -0.063  0.886
3  0.028 -0.256     NA     NA
4 -0.086  0.475  0.244     NA
5  0.690 -0.181  1.274  1.633

OR

      x1     x2     x3     x4
1  0.482  1.320 -0.859 -0.142
2 -0.753 -0.041 -0.063     NA
3  0.028 -0.256 -0.069     NA
4 -0.086  0.475  0.244     NA
5  0.690 -0.181  1.274     NA

ETC. are all fine.

Any ideas how I can do this?

Chris
Hi,

Maybe this helps:

# data1 (changed `data` to `data1`)
set.seed(6245)
data1 <- data.frame(x1=rnorm(5),x2=rnorm(5),x3=rnorm(5),x4=rnorm(5))
data1 <- round(data1,digits=3)
data2 <- data1

# For each of x3 and x4: keep a value only if it appears in a random
# 80% sample of the pooled x3/x4 values; everything else becomes NA.
data1[,3:4] <- lapply(data1[,3:4],function(x){
  x1 <- match(x,sample(unlist(data1[,3:4]),round(0.8*length(unlist(data1[,3:4])))))
  x[is.na(x1)] <- NA
  x
})
data1
#      x1     x2     x3     x4
#1  0.482  1.320     NA -0.142
#2 -0.753 -0.041 -0.063  0.886
#3  0.028 -0.256 -0.069  0.354
#4 -0.086  0.475  0.244  0.781
#5  0.690 -0.181  1.274  1.633

# or
data2[,3:4] <- lapply(data2[,3:4],function(x){
  x1 <- match(x,sample(unlist(data2[,3:4]),round(0.8*length(unlist(data2[,3:4])))))
  x[is.na(x1)] <- NA
  x
})
data2
#      x1     x2     x3     x4
#1  0.482  1.320 -0.859 -0.142
#2 -0.753 -0.041     NA     NA
#3  0.028 -0.256 -0.069  0.354
#4 -0.086  0.475  0.244  0.781
#5  0.690 -0.181  1.274  1.633

A.K.
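A follow-up sketch, not part of the original reply: the code above calls sample() once per column, so the number of NAs varies from run to run (one in the first output, two in the second), whereas each example in the question blanks four cells, i.e. 20% of all 20 cells. Assuming that reading of "20% of the data", and assuming the target columns are numeric, one possible way to get an exact count is to draw the cell positions once across x3 and x4 together. The helper name drop_cells and its arguments are made up for illustration.

```r
# Hypothetical helper (not from the thread): set a fixed share of ALL cells
# in a data frame to NA, choosing those cells only from the columns in `cols`.
# Assumes the columns in `cols` are numeric.
drop_cells <- function(df, cols, prop = 0.2) {
  n_drop  <- round(prop * nrow(df) * ncol(df))  # e.g. 20% of all cells
  n_avail <- nrow(df) * length(cols)            # cells eligible for dropping
  stopifnot(n_drop <= n_avail)
  m <- as.matrix(df[, cols, drop = FALSE])
  m[sample.int(n_avail, n_drop)] <- NA          # single draw, no replacement
  df[, cols] <- m
  df
}

set.seed(6245)
data1 <- round(data.frame(x1 = rnorm(5), x2 = rnorm(5),
                          x3 = rnorm(5), x4 = rnorm(5)), digits = 3)
data3 <- drop_cells(data1, c("x3", "x4"))
sum(is.na(data3))      # 4, i.e. 20% of the 20 cells
colSums(is.na(data3))  # NAs fall only in x3 and x4
```

Because the positions come from a single sample.int() call without replacement, the count is always exact, and the split between x3 and x4 is left to chance, which is what the question allows.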