Hello folks,

I'm trying to clean out a large file with data I don't need. The column I'm manipulating in the file is called "legal_status". There are three kinds of rows I want to remove: those that have "Private", "Private (Op", or "Unknown" in the legal_status column.

I wrote this code but I get errors, and it says I'm missing a TRUE/FALSE thingy... I'm lost. Here's the code:

cleanse <- function(a){
  data1 <- a

  for (i in 1:dim(data1)[1])
  {
    if (data1[i,"legal_status"] == "Private")
    {
      data1[i,"legal_status"] <- data1[-i,"legal_status"]
    }
    if (data1[i,"legal_status"] == "Private (Op"){
      data1[i,"legal_status"] <- data1[-i,"legal_status"]
    }
    if (data1[i,"legal_status"] == "Unknown"){
      data1[i,"legal_status"] <- data1[-i,"legal_status"]
    }
  }

  return(data1)
}
new_data <- cleanse(data)

Any ideas?

--
View this message in context: http://old.nabble.com/cleanse-columns-and-unwanted-rows-tp26342169p26342169.html
Sent from the R help mailing list archive at Nabble.com.
?subset

----- Original message -----
From: "frenchcr" <frenchcr at btinternet.com>
To: r-help at r-project.org
Date: Fri, 13 Nov 2009 11:32:35 -0800 (PST)
Subject: [R] cleanse columns and unwanted rows

______________________________________________
R-help at r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.
On Nov 13, 2009, at 2:32 PM, frenchcr wrote:

> hello folks,
>
> Im trying to clean out a large file with data i dont need.
> The column im manipulating in the file is called "legal_status"
> There are three kinds of rows i want to remove.
> Those that have "Private", "Private (Op", or "Unknown" in the
> legal_status column.
>
> I wrote this code but it says im missing a TRUE/ False thingy...im
> lost...heres the code...

Come on, "frenchcr". Just copy and post the damned error message.

> cleanse <- function(a){
>   data1 <- a
>
>   for (i in 1:dim(data1)[1])
>   {
>     if (data1[i,"legal_status"] == "Private")
>     {
>       data1[i,"legal_status"] <- data1[-i,"legal_status"]

That will return everything but one particular row.

>     }
>     if (data1[i,"legal_status"] == "Private (Op"){
>       data1[i,"legal_status"] <- data1[-i,"legal_status"]

Ditto.

>     }
>     if (data1[i,"legal_status"] == "Unknown"){
>       data1[i,"legal_status"] <- data1[-i,"legal_status"]
>     }
>   }

That makes for a lot of data.frame copying, even if you hadn't messed up the registration of the indexing with the shrinking dataframe.

>   return(data1)
> }
> new_data <- cleanse(data)

new_data <- subset(data, legal_status != "Private" &
                         legal_status != "Private (Op" &
                         legal_status != "Unknown")

Or maybe:

"%not-in%" <- function(x, table) match(x, table, nomatch = 0) == 0

new_data <- subset(data, legal_status %not-in% c("Private", "Private (Op", "Unknown"))

--
David Winsemius, MD
Heritage Laboratories
West Hartford, CT
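For anyone wanting to try the two subset() forms above before running them on the real file, here is a minimal sketch on a made-up data frame. The data frame `toy`, its `id` column, the values "Public" and "Charity", and the operator name `%notin%` are all invented for illustration; only the three unwanted values come from the thread.

```r
# Toy data standing in for the real file (hypothetical values).
toy <- data.frame(
  id = 1:6,
  legal_status = c("Public", "Private", "Private (Op",
                   "Unknown", "Charity", "Public"),
  stringsAsFactors = FALSE
)

unwanted <- c("Private", "Private (Op", "Unknown")

# Form 1: chained comparisons inside subset().
kept_a <- subset(toy, legal_status != "Private" &
                      legal_status != "Private (Op" &
                      legal_status != "Unknown")

# Form 2: a negated-match operator (the name %notin% is hypothetical).
"%notin%" <- function(x, table) match(x, table, nomatch = 0) == 0
kept_b <- subset(toy, legal_status %notin% unwanted)

# Both forms should keep the same rows.
print(kept_a)
stopifnot(identical(kept_a, kept_b))
```

Either form filters all rows in one vectorised step, so nothing is deleted while a loop index is still walking over the data.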
The full code and error message I get is...

> cleanse <- function(a){
+ data1<-a
+ for (i in 1:dim(data1)[1])
+ {
+ if (data1[i,"legal_status"] == "Private"){
+ data1[i,"legal_status"]<-data1[-i,]
+ if (data1[i,"legal_status"] == "Private (Op"){
+ data1[i,"legal_status"]<-data1[-i,]
+ if (data1[i,"legal_status"] == "Unknown"){
+ data1[i,"legal_status"]<-data1[-i,]
+ }
+ }
+ }
+ }
+ return(data1)
+ }
> new_data<-cleanse(data)
Error in if (data1[i, "legal_status"] == "Private (Op") { :
  missing value where TRUE/FALSE needed
In addition: There were 50 or more warnings (use warnings() to see the first 50)
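R raises the error in the transcript above whenever the condition of an `if` evaluates to NA rather than TRUE or FALSE. Where the NA came from is not shown in the thread. One possibility, and it is only an assumption, is that legal_status was read in as a factor and the assignment inside the loop tried to store a value that is not an existing level. The sketch below reproduces both pieces in isolation; the variable names are made up.

```r
# An NA in the condition is enough to trigger the error.
status <- NA_character_
cond <- status == "Private (Op"   # NA, neither TRUE nor FALSE

msg <- tryCatch(
  {
    if (cond) "matched" else "not matched"
  },
  error = function(e) conditionMessage(e)
)
print(msg)

# Assumption: if the column is a factor, assigning a value that is not
# one of its levels produces NA (with a warning, suppressed here).
f <- factor(c("Public", "Private"))
suppressWarnings(f[1] <- "not an existing level")
print(is.na(f[1]))

# Two common guards: isTRUE() for one value, %in% for a set of values.
safe_single <- isTRUE(status == "Private (Op")
safe_set <- status %in% c("Private", "Private (Op", "Unknown")
```

Both guards return FALSE for an NA input instead of stopping, which is usually what a row filter wants.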
The solution is much simpler (thanks Phil!):

new_data = data[!data$legal_status %in% c("Private", "Private (Op", "Unknown"), ]

...works nicely.
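One detail worth checking on real data is how missing values are treated. The sketch below, on a made-up data frame (`toy`, `id` and the value "Public" are invented for illustration), suggests that `%in%` indexing keeps rows where legal_status is NA, while the subset() form with `!=` suggested earlier in the thread drops them. It also assumes the column is a factor, to show that removed categories linger as unused levels until droplevels() is called (available in R 2.12.0 and later).

```r
# Toy data with one missing status (hypothetical values).
toy <- data.frame(
  id = 1:5,
  legal_status = factor(c("Public", "Private", NA, "Unknown", "Private (Op"))
)
unwanted <- c("Private", "Private (Op", "Unknown")

# Logical indexing with %in%: NA counts as "not in the set", so it stays.
kept_in <- toy[!toy$legal_status %in% unwanted, ]

# subset() with != : NA != "Private" is NA, and subset() drops such rows.
kept_ne <- subset(toy, legal_status != "Private" &
                       legal_status != "Private (Op" &
                       legal_status != "Unknown")

print(kept_in$id)
print(kept_ne$id)

# The removed categories are still listed as levels until dropped.
print(levels(kept_in$legal_status))
kept_in$legal_status <- droplevels(kept_in$legal_status)
print(levels(kept_in$legal_status))
```

Which behaviour is right depends on whether rows with a missing legal_status should survive the clean-up.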