Given that you know which columns should be numeric and which should be
character, finding letters in numeric columns or digits in character
columns is not difficult. Because of the polluted entries, read.table
reads all three columns of your data frame as character, so you can use
regular expressions as Bert mentioned. First, strip the whitespace
around the fields when you read the data (strip.white=TRUE):
dat1 <-read.table(text="Name, Age, Weight
Alex, 20, 13X
Bob, 25, 142
Carol, 24, 120
John, 3BC, 175
Katy, 35, 160
Jack3, 34, 140",sep=",", header=TRUE, stringsAsFactors=FALSE,
strip.white=TRUE)
Now check to see if all of the fields are character as expected.
sapply(dat1, typeof)
#        Name         Age      Weight
# "character" "character" "character"
Now identify character variables containing numbers and numeric variables
containing characters:
BadName <- which(grepl("[[:digit:]]", dat1$Name))
BadAge <- which(grepl("[[:alpha:]]", dat1$Age))
BadWeight <- which(grepl("[[:alpha:]]", dat1$Weight))
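Note that "[[:alpha:]]" only flags letters, so an entry polluted with
punctuation would not be caught. As a rough sketch of a stricter check,
you could flag every entry that as.numeric() cannot parse. The vector x
and its values below are made up for illustration and are not part of
the original data:

```r
# Hypothetical example values: "20?" contains no letters but is not a number
x <- c("20", "3BC", "20?", "35")

# Letter-based check, as used above: misses "20?"
bad_letters <- which(grepl("[[:alpha:]]", x))

# Coercion-based check: NA after as.numeric() means the entry did not parse
bad_parse <- which(is.na(suppressWarnings(as.numeric(x))))

bad_letters  # 2
bad_parse    # 2 3
```

Keep in mind that this variant also flags genuinely missing values (empty
fields or NA), which may or may not be what you want.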
Next remove those rows:
(dat2 <- dat1[-unique(c(BadName, BadAge, BadWeight)), ])
#    Name Age Weight
# 2   Bob  25    142
# 3 Carol  24    120
# 5  Katy  35    160
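One caveat with negative indexing: if no rows are bad, the index vector
is empty and dat1[-integer(0), ] returns zero rows instead of all of
them. A sketch of the same filter with a logical index, which does not
have that problem (the name bad is just chosen for this example):

```r
dat1 <- read.table(text="Name, Age, Weight
Alex, 20, 13X
Bob, 25, 142
Carol, 24, 120
John, 3BC, 175
Katy, 35, 160
Jack3, 34, 140", sep=",", header=TRUE, stringsAsFactors=FALSE,
strip.white=TRUE)

# TRUE for every row that has a problem in at least one column
bad <- grepl("[[:digit:]]", dat1$Name) |
  grepl("[[:alpha:]]", dat1$Age) |
  grepl("[[:alpha:]]", dat1$Weight)

# Keep the rows that are not flagged; works even when nothing is flagged
dat2 <- dat1[!bad, ]

# The pitfall with an empty negative index: zero rows come back
nrow(dat1[-integer(0), ])  # 0
```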
You still need to convert Age and Weight to numeric, e.g. dat2$Age <-
as.numeric(dat2$Age).
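To convert both columns in one step, something along these lines should
work. The data frame is rebuilt here only so the sketch runs on its own,
and num_cols is a name made up for the example:

```r
# Stand-in for the cleaned data frame from the previous step
dat2 <- data.frame(Name = c("Bob", "Carol", "Katy"),
                   Age = c("25", "24", "35"),
                   Weight = c("142", "120", "160"),
                   stringsAsFactors = FALSE)

# Columns that are supposed to be numeric
num_cols <- c("Age", "Weight")

# Convert each of them from character to numeric
dat2[num_cols] <- lapply(dat2[num_cols], as.numeric)

# Confirm the result: Name stays character, Age and Weight are numeric
sapply(dat2, class)
```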
David Carlson
On Fri, Jan 28, 2022 at 11:59 PM Bert Gunter <bgunter.4567 at gmail.com>
wrote:
> As character 'polluted' entries will cause a column to be read in (via
> read.table and relatives) as factor or character data, this sounds like a
> job for regular expressions. If you are not familiar with this subject,
> time to learn. And, yes, some heavy lifting will be required.
> See ?regexp for a start maybe? Or the stringr package?
>
> Cheers,
> Bert
>
> On Fri, Jan 28, 2022, 7:08 PM Val <valkremk at gmail.com> wrote:
>
> > Hi All,
> >
> > I want to remove rows that contain a character string in an integer
> > column or a digit in a character column.
> >
> > Sample data
> >
> > dat1 <-read.table(text="Name, Age, Weight
> > Alex, 20, 13X
> > Bob, 25, 142
> > Carol, 24, 120
> > John, 3BC, 175
> > Katy, 35, 160
> > Jack3, 34, 140",sep=",",header=TRUE,stringsAsFactors=F)
> >
> > If the Age or Weight column contains any non-numeric character(s),
> > remove that row; if the Name column contains a digit, remove that row.
> > Desired output
> >
> > Name Age weight
> > 1 Bob 25 142
> > 2 Carol 24 120
> > 3 Katy 35 160
> >
> > Thank you,
> >
> > ______________________________________________
> > R-help at r-project.org mailing list -- To UNSUBSCRIBE and more, see
> > https://stat.ethz.ch/mailman/listinfo/r-help
> > PLEASE do read the posting guide
> > http://www.R-project.org/posting-guide.html
> > and provide commented, minimal, self-contained, reproducible code.
> >