Here is one way to fix the data:
# First note that "value" is a factor so we need to convert it to character
> str(zp)
'data.frame':   20 obs. of  2 variables:
 $ variable: Factor w/ 5 levels "ZP.1","ZP.3",..: 1 1 1 1 2 2 2 2 3 3 ...
 $ value   : Factor w/ 19 levels "<0.030","<1.2",..: 3 4 2 1 7 8 6 5 12 11 ...
> zp$value <- as.character(zp$value)
> str(zp)
'data.frame':   20 obs. of  2 variables:
 $ variable: Factor w/ 5 levels "ZP.1","ZP.3",..: 1 1 1 1 2 2 2 2 3 3 ...
 $ value   : chr  "1160" "27.3" "<1.2" "<0.030" ...
# Next we need to see which values are preceded by "<", and record that in
# a new variable, "note"
> zp$note <- ifelse(grepl("<", zp$value), "Limit", "Measured")
# Finally we strip the "<" off and convert "value" to numeric
> zp$value <- as.numeric(gsub("<", "", zp$value))
> str(zp)
'data.frame':   20 obs. of  3 variables:
 $ variable: Factor w/ 5 levels "ZP.1","ZP.3",..: 1 1 1 1 2 2 2 2 3 3 ...
 $ value   : num  1160 27.3 1.2 0.03 1870 45.7 0.85 0.025 695 31.9 ...
 $ note    : chr  "Measured" "Measured" "Limit" "Limit" ...
> head(zp)
  variable   value     note
1     ZP.1 1160.00 Measured
2     ZP.1   27.30 Measured
3     ZP.1    1.20    Limit
4     ZP.1    0.03    Limit
5     ZP.3 1870.00 Measured
6     ZP.3   45.70 Measured
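
For a large data set, the same three steps (convert to character, flag the
prefix, strip it and convert to numeric) could be wrapped in a function.
The sketch below also handles a leading ">", since the subject line mentions
both symbols even though the sample data only contain "<". The function name
split_censored and the labels "Below limit" / "Above limit" are made up for
illustration and do not come from any package.

```r
# Hypothetical helper: name and labels are illustrative only.
# Takes a factor or character vector such as c("1160", "<1.2", "> 50") and
# returns a data frame with the numeric value and a note on the prefix.
split_censored <- function(x) {
  x <- trimws(as.character(x))            # factor -> character, drop padding
  note <- ifelse(startsWith(x, "<"), "Below limit",
          ifelse(startsWith(x, ">"), "Above limit", "Measured"))
  value <- as.numeric(sub("^[<>]\\s*", "", x))  # remove one leading < or >
  data.frame(value = value, note = note, stringsAsFactors = FALSE)
}

# Made-up input, not taken from the posted data
res <- split_censored(c("1160", "<1.2", "> 50"))
```

Applied to the original (unconverted) data frame, something like
zp <- cbind(zp["variable"], split_censored(zp$value)) should give the same
layout as above. Anything that still fails to convert shows up as NA in
"value" with a warning, which is a useful check on a large data set.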
-------------------------------------
David L Carlson
Department of Anthropology
Texas A&M University
College Station, TX 77840-4352
-----Original Message-----
From: R-help [mailto:r-help-bounces at r-project.org] On Behalf Of Sam Albers
Sent: Monday, January 26, 2015 12:41 PM
To: r-help at r-project.org
Subject: [R] Working with < and > is data sets
Hello,
I am having some trouble figuring out how to deal with data that has some
observations that are detection limits and others that are integers denoted
by greater and less than symbols. Ideally I would like a column that has
the data as numbers then another column with values "Measured" or "Limit"
or something like that. Data and further clarification below.
##Data
zp <- structure(list(
  variable = structure(c(1L, 1L, 1L, 1L, 2L, 2L, 2L, 2L, 3L, 3L, 3L, 3L,
                         4L, 4L, 4L, 4L, 5L, 5L, 5L, 5L),
    .Label = c("ZP.1", "ZP.3", "ZP.5", "ZP.7", "ZP.9"), class = "factor"),
  value = structure(c(3L, 4L, 2L, 1L, 7L, 8L, 6L, 5L, 12L, 11L, 10L, 9L,
                      15L, 16L, 14L, 13L, 19L, 18L, 17L, 9L),
    .Label = c("<0.030", "<1.2", "1160", "27.3", "<0.025", "<0.85", "1870",
               "45.7", "<0.0020", "<0.050", "31.9", "695", "<0.0060",
               "<0.20", "311", "8.84", "<0.090", "12", "646"),
    class = "factor")),
  .Names = c("variable", "value"), row.names = c(NA, -20L),
  class = "data.frame")
## As expected, converting everything to numeric results in a slew of NA values
zp$valuefactor<-as.numeric(as.character(zp$value))
## At this point I am unsure how to proceed.
zp
###
So I am just wondering how folks deal with this type of data. Any advice
would be much appreciated as I am looking for something that will reliably
work on a large data set.
Thanks in advance!
Sam