Hello folks,
I'm trying to clean out a large file with data I don't need.
The column I'm manipulating in the file is called "legal_status".
There are three kinds of rows I want to remove: those that have
"Private", "Private (Op", or "Unknown" in the legal_status column.
I wrote this code, but I get errors and it says I'm missing a TRUE/FALSE
thingy... I'm lost... here's the code:
cleanse <- function(a){
  data1 <- a
  for (i in 1:dim(data1)[1])
  {
    if (data1[i,"legal_status"] == "Private")
    {
      data1[i,"legal_status"] <- data1[-i,"legal_status"]
    }
    if (data1[i,"legal_status"] == "Private (Op"){
      data1[i,"legal_status"] <- data1[-i,"legal_status"]
    }
    if (data1[i,"legal_status"] == "Unknown"){
      data1[i,"legal_status"] <- data1[-i,"legal_status"]
    }
  }
  return(data1)
}
new_data <- cleanse(data)
Any ideas?
--
View this message in context:
http://old.nabble.com/cleanse-columns-and-unwanted-rows-tp26342169p26342169.html
Sent from the R help mailing list archive at Nabble.com.
?subset
----- Original message -----
From: "frenchcr" <frenchcr at btinternet.com>
To: r-help at r-project.org
Date: Fri, 13 Nov 2009 11:32:35 -0800 (PST)
Subject: [R] cleanse columns and unwanted rows
______________________________________________
R-help at r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.
On Nov 13, 2009, at 2:32 PM, frenchcr wrote:

> hello folks,
>
> Im trying to clean out a large file with data i dont need.
> The column im manipulating in the file is called "legal status"
> There are three kinds of rows i want to remove.
> Those that have "Private", "Private (Op", or "Unknown" in the
> legal_status column.
>
> I wrote this code but it says im missing a TRUE/ False thingy...im
> lost...heres the code...

Come on, "frenchcr". Just copy and post the damned error message.

> cleanse <- function(a){
> data1<-a
>
> for (i in 1:dim(data1)[1])
> {
> if (data1[i,"legal_status"] == "Private")
> {
> data1[i,"legal_status"]<-data1[-i,"legal_status"]

That will return everything but one particular row.

> }
> if (data1[i,"legal_status"] == "Private (Op"){
> data1[i,"legal_status"]<-data1[-i,"legal_status"]

Ditto.

> }
> if (data1[i,"legal_status"] == "Unknown"){
> data1[i,"legal_status"]<-data1[-i,"legal_status"]
> }
> }

Makes for a lot of data.frame copying, even if you hadn't sabotaged the
registration of the indexing with the shrinking dataframe.

> return(data1)
> }
> new_data<-cleanse(data)

new_data <- subset(data, legal_status != "Private" &
                         legal_status != "Private (Op" &
                         legal_status != "Unknown")

Or maybe:

"%not-in%" <- function(x, table) match(x, table, nomatch = 0) == 0
new_data <- subset(data, legal_status %not-in%
                         c("Private", "Private (Op", "Unknown"))

--
David Winsemius, MD
Heritage Laboratories
West Hartford, CT
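
To make the remarks above about the negative index concrete, here is a minimal sketch on invented data. The data frame `df`, its `id` column, and its values are assumptions for illustration, not the poster's file. It shows that a negative index returns all rows except the one named, and that one vectorized test over the column can stand in for the loop.

```r
# Toy data: the column name follows the thread, the values are made up.
df <- data.frame(
  id = 1:5,
  legal_status = c("Public", "Private", "Unknown", "Private (Op", "Public"),
  stringsAsFactors = FALSE
)

# A negative index drops that row and returns all the others,
# so this is a vector of 4 values, not a way to delete row 2 in place.
all_but_second <- df[-2, "legal_status"]
length(all_but_second)

# One vectorized test over the whole column replaces the loop.
unwanted <- c("Private", "Private (Op", "Unknown")
cleaned <- subset(df, !(legal_status %in% unwanted))
cleaned
```

Because the filter is computed once for the whole column, no row numbers shift while rows are being dropped.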
The full code and error message I get is...

> cleanse <- function(a){
+ data1<-a
+ for (i in 1:dim(data1)[1])
+ {
+ if (data1[i,"legal_status"] == "Private"){
+ data1[i,"legal_status"]<-data1[-i,]
+ if (data1[i,"legal_status"] == "Private (Op"){
+ data1[i,"legal_status"]<-data1[-i,]
+ if (data1[i,"legal_status"] == "Unknown"){
+ data1[i,"legal_status"]<-data1[-i,]
+ }
+ }
+ }
+ }
+ return(data1)
+ }
> new_data<-cleanse(data)
Error in if (data1[i, "legal_status"] == "Private (Op") { :
  missing value where TRUE/FALSE needed
In addition: There were 50 or more warnings (use warnings() to see the first 50)
>
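
The error above is what `if` reports when its condition evaluates to NA rather than TRUE or FALSE. Here is a minimal sketch, assuming the column contains (or ends up containing) a missing value. The vector `status` is invented for illustration. The thread does not show whether the NA was already in the file or was produced by the assignments inside the loop.

```r
# Invented example vector with one missing value.
status <- c("Public", NA, "Private")

# Comparing NA with a string gives NA, not FALSE.
status == "Private"

# if() cannot branch on NA; capture the error message instead of stopping.
msg <- tryCatch(
  {
    if (status[2] == "Private") "matched" else "not matched"
  },
  error = function(e) conditionMessage(e)
)
msg

# %in% never returns NA, so it is safe for filtering or inside if().
status %in% "Private"
```

Checking the column with `sum(is.na(...))` before looping is a quick way to see whether missing values are present.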
The solution is much simpler (thanks Phil!)
new_data = data[!data$"legal status" %in%
                c("Private", "Private (Op", "Unknown"), ]
...works nicely.
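
One difference between the `%in%` form and a `subset()` call built from `!=` comparisons only shows up when the column has missing values. The sketch below assumes the column is named `legal_status`, as in the original code. The data frame `df` and its values are invented for illustration.

```r
# Invented data with one missing legal_status.
df <- data.frame(
  id = 1:4,
  legal_status = c("Public", NA, "Private", "Unknown"),
  stringsAsFactors = FALSE
)
unwanted <- c("Private", "Private (Op", "Unknown")

# %in% treats NA as "not in the set", so the NA row is kept.
kept_in <- df[!df$legal_status %in% unwanted, ]

# subset() with != drops rows where the comparison is NA.
kept_ne <- subset(df, legal_status != "Private" & legal_status != "Unknown")

kept_in$id
kept_ne$id
```

Which behaviour is wanted depends on whether rows with a missing status should survive the clean-up.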